Download Supplemental Information Supplementary Text Exome sequencing

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Supplemental Information
Supplementary Text
10
Exome sequencing was performed using llumina TruSeq Exome Enrichment Kit or
Agilent SureSelectXT Human All Exon 50Mb kit. Illumina HiSeq2000 was used for
sequencing as 100bp paired-end runs at the UCLA Clinical Genomics Center or the
UCLA Broad Stem Cell Research Center. Sequence reads were aligned to the human
reference genome (Human GRCh37 (hg19) build) using Novoalign (v2.07,
http://www.novocraft.com/main/index.php). PCR duplicates were identified by Picard
(v1.42, http://picard.sourceforge.net/) and GATK (Genome Analysis Toolkit) (v1.1,
http://www.broadinstitute.org/gatk/)1 was used to realign indels, recalibrate the quality
scores, call, filter, recalibrate and evaluate the variants. SNVs and INDELs across the
sequenced protein-coding regions and flanking junctions were annotated using Variant
Annotator X (VAX), a customized Ensembl Variant Effect Predictor 2.
Figures
20
Figure S1: Histogram of variants with allele frequencies <1% in 1000 Genomes but >1%
in the CEU. It shows that allele frequencies can be highly structured for rare variants and
that averaging across too many genetically dissimilar populations can have a downwardbias effect on frequency estimates of alleles present in a given population.
30
40
Figure S2: Reference panel size impacts the efficacy of filtering in exome sequencing in
African-American simulations from the EVS data.
Figure S3: Principle component analysis of CEU, YRI, CHB and 101 self-reported
Middle Eastern (ME) individuals. Analysis is based on 24,791 variants where there is no
missing data for any of the Middle Eastern exomes and where the variants were called in
the 1000 Genomes data. The YRI, CEU and CHB populations are well separated and the
Middle Eastern population clusters with the CEU and trails towards the YRI. This is
similar to what is observed in previous work3. Removing the two most extreme ME
individuals (far left and far bottom blue dots) results in a marginal increase in the number
of variants remaining in the Middle Eastern individuals (see Table S3).
Tables
50
Ind
Country
Presumed
Inheritance
Causal
Variant
zygosity
1
Jordan
Recessive
Homo
2
Turkey
Recessive
Homo
3
Turkey
Recessive
Homo
4
Iran
Recessive
Homo
5
Syria
Recessive
Homo
6
Tunisia
7
Jordan
None
Given
Recessive
8
Jordan
Recessive
9
Iran
10
Jordan
11
Jordan
12
Turkey
Recessive
Het
13
Tunisia
Dominant
Het
14
Iran
Dominant
Het
15
Iran
Dominant
Het
16
Jordan
17
Jordan
18
Jordan
19
Iran
20
Turkey
None
Given
None
Given
None
Given
None
Given
None
Given
None
Given
None
Given
Recessive
Homo
Homo
Homo
Homo
Homo
Het
Het
Het
Het
Het
CompHet
Disease
Brachydactyly, type
A2
LADD Syndrome
Oral-facial-digital
syndrome type VI
Spastic paraplegia 11,
autosomal recessive
Mitchell-Riley
syndrome
Steroid-11 betahydroxylase deficiency
Microcephaly
Rickets vitamin Ddependent type 1A
Oral-facial-digital
syndrome type VI
Cranioectodermal
dysplasia 3
Microphthalmia,
syndromic 5
Split-hand/foot
malformation 1 with
sensorineural hearing
loss
Palmoplantar MSSE
Parkinson's Disease
Ataxia, sensory, 1,
autosomal dominant
Cerebral cavernous
malformations 3
Brachydactyly, type
B1
Anophthamia
Spinocerebellar ataxia
6
Oral-Facial-Digital
type VI
Variants
Remaining
Pre-Frequency
Filtering
No
Ancestry,
f>1%
PopMatched,
FNR <5%
AllPop,
FNR
<5%
5016
65
43
33
5049
50
20
20
4752
25
23
20
4674
39
15
13
4751
26
18
16
4834
37
19
11
4722
22
11
5
4973
104
102
56
5653
112
67
52
5938
97
83
66
15694
872
750
532
12171
619
445
361
14111
813
747
505
13727
726
535
446
13417
661
512
410
14713
871
690
545
13472
668
543
421
13789
748
623
494
15128
769
598
495
12919
604
426
370
Table S1. Analysis of real data by individual exome. Variants remaining pre-frequency
filtering reflects the number of variants remaining when all variants without damaging
annotations are filtered out and when variants inconsistent with the disorder architecture
are removed but before frequency-based filtering has been preformed.
Ind
Genomic
Position
(hg19)
Gene
HGVSc
HGVSp
Causal
Variant
Zygosity
Variant
reported in
Literature
?
ExAC
EVS
HGMD
variant types
SIFT
Polyphen
Prediction*
1
4:96035914
BMPR1B
c.188_207de
lGGTTGCC
TGTGGTCA
CTTCT
p.Val66ArgfsX22
Hom
No4
NP
NP
Missense/
nonsense
NA
NA
Likely
pathogenic
2
10:123256236
FGFR2
c.1400G>A
p.Gly467Glu
Hom
No 5
1 het
NP
Missense
Deleterious
Probably
damaging
Likely
pathogenic
3
5:37205557
C5orf42
-
Hom
Yes6
NP
NP
NA
NA
Pathogenic
4
15:44941193
SPG11
p.Leu57AspfsX66
Hom
No 7
1 het
NP
NA
NA
5
6:117240406
RFX6
c.1129C>T
p.Arg377X
Hom
No 8
NP
NP
NA
NA
6
8:143956714
CYP11B1
c.1136G>T
p.Gly379Val
Hom
Yes9
NP
NP
Deleterious
Benign
Pathogenic
7
8:6266892
MCPH1
c.114+1G>T
-
Hom
No10
NP
NP
NA
NA
Likely
pathogenic
8
12:58157481
CYP27B1
c.1319_1325
dupCCCAC
CC
p.Phe443GlyfsX24
Hom
Yes11
NP
NP
Missense/
nonsense
NA
NA
Pathogenic
9
5:37121819
C5orf42
c.5616delA
p.Leu1872PhefsTer44
Hom
No6
NP
NP
Missense/
nonsense
NA
NA
10
14:76455258
IFT43
c.85G>T
p.Glu29X
Hom
No12
NP
NP
Missense
NA
NA
11
14:57270998
OTX2
c.181C>G
p.Leu61Val
Het
No13
NP
NP
Missense/
nonsense
Deleterious
Probably
damaging
Likely
pathogenic
Likely
pathogenic
Likely
pathogenic
12
7:96650164
DLX5
c.754T>C
p.Tyr252His
Het
No14
NP
NP
Missense/
nonsense
Deleterious
Possibly
damaging
Likely
pathogenic
13
-
-
-
-
Het
No15
-
-
-
-
-
14
12:40653350
LRRK2
c.1487C>T
p.Thr496Ile
Het
No16
NP
NP
Missense/
nonsense
Tolerated
Benign
15
8:42729096
RNF170
c.191G>A
p.Arg64Gln
Het
No17
NP
NP
Missense
Deleterious
Probably
damaging
16
3:167414800
PDCD10
c.264dupA
p.Glu89ArgfsX6
Het
No18
NP
NP
Missense/
nonsense
NA
NA
Continued on next page.
c.31501G>T
c.169_170de
lCT
Missense/
nonsense
Mostly
nonsense
Missense/
nonsense
Missense/
nonsense
Missense/
nonsense
Likely
pathogenic
Likely
pathogenic
Likely
pathogenic
Likely
pathogenic
Likely
pathogenic
Likely
pathogenic
Ind
Genomic
Position
(hg19)
Gene
HGVSc
HGVSp
Causal
Variant
Zygosity
Variant
reported in
Literature
?
ExAC
EVS
HGMD
variant types
SIFT
Polyphen
17
9:94486233
ROR2
c.2543C>T
p.Pro848Leu
Het
No19
15 het†
2 het
Nonsense
Deleterious
Possibly
damaging
p.Asn110GlufsX6
Het
No13
NP
NP
Missense
NA
Het
No20
NP
NP
Missense
Deleterious
3 hom†
3 het
Comp het
No6
NP
NP
18
14:57269015
OTX2
c.382_331de
lAATG
19
19:13411385
CACNA1
A
c.2261C>A
p.Ala754Glu
c.4793C>A
p.Thr1598Lys
5:37183490
20
C5orf42
5:37226725
c.1972G>C
p.Gly658Arg
Missense/
nonsense
Deleterious
Tolerated
NA
Probably
damaging
Probably
damaging
Probably
damaging
Prediction*
VUS
Likely
pathogenic
Likely
pathogenic
VUS
VUS
Table S2: Summary of evidence for identifying causal variants in real exome sequencing data. Variants not found in a database are
signified as not present (NP). Variants that have exact matches to published variants are signified as reported in literature with the
citation given. Variants without exact matches cite the literature of the gene in which the variant falls that is associated to the
phenotype. The HGMD variant types list frame shifts and splice site variants as nonsense variants. *Variants reported before in the
literature were predicted to be pathogenic. Variants not reported in literature were evaluated in the context of 1) how well the
phenotypes match, 2) is the variant absent or extremely rare in the population, 3) does the variant type match the known or predicted
mechanism of how the gene can be disrupted, 4) is the in silico prediction concordant (for missense variants) and predicted to be likely
pathogenic if all four were in agreement. †Phenotypic data is not publically available from ExAC database. Patient #17’s phenotype is
relatively mild and it is likely that 15 individuals in ExAC with the same variant are affected or carriers. One variant in Patient #20 is
observed as homozygous in 3 individuals in ExAC but phenotypic data on these 3 individuals are not available. A follow-up
functional study is warranted to call the potential compound heterozygous variants likely pathogenic. Information on Individual #13
is withheld because it is a novel finding and in preparation for publication; the citation corresponding to it clinically defines the
disease and maps it to one locus.
Compound
Recessive
Dominant
Heterozygous
Method
(#cases=10)
(#cases=9)
(#cases=1)
NoAncestry, f>1%
58.6 (35.4)
758.0 (92.8)
614
PopMatched, FNR<5%
40.8 (33.2)
614.4 (110.8)
431
AllPop, FNR <5%
29.4 (21.5)
471.3 (62.8)
374
Table S3. Average number of variants that remain for follow-up post-filtering in real
exome studies of 20 individuals with monogenic disorders after controlling the Middle
Eastern reference panel by removing the two most extreme PCA outliers.
References
1. McKenna, A., Hanna, M., Banks, E., et al. The Genome Analysis Toolkit: a
MapReduce framework for analyzing next-generation DNA sequencing data.
Genome Res 2010; 20, 1297-1303.
2. Yourshaw, M., Taylor, S.P., Rao, A.R., Martin, M.G., and Nelson, S.F. Rich
annotation of DNA sequencing variants by leveraging the Ensembl Variant Effect
Predictor with plugins. Brief Bioinform 2014.
3. Yang, X., Al-Bustan, S., Feng, Q., et al. The influence of admixture and consanguinity
on population genetic diversity in Middle East. J Hum Genet 2014; 59, 615-622.
4. Lehmann, K., Seemann, P., Stricker, S., et al. Mutations in bone morphogenetic
protein receptor 1B cause brachydactyly type A2. Proc Natl Acad Sci U S A
2003; 100, 12277-12282.
5. Rohmann, E., Brunner, H.G., Kayserili, H., et al. Mutations in different components of
FGF signaling in LADD syndrome. Nat Genet 2006; 38, 414-417.
6. Lopez, E., Thauvin-Robinet, C., Reversade, B., et al. C5orf42 is the major gene
responsible for OFD syndrome type VI. Hum Genet 2014; 133, 367-377.
7. Stevanin, G., Paternotte, C., Coutinho, P., et al. A new locus for autosomal recessive
spastic paraplegia (SPG32) on chromosome 14q12-q21. Neurology 2007; 68,
1837-1840.
8. Smith, S.B., Qu, H.Q., Taleb, N., et al. Rfx6 directs islet formation and insulin
production in mice and humans. Nature 2010; 463, 775-780.
9. Kharrat, M., Trabelsi, S., Chaabouni, M., et al. Only two mutations detected in 15
Tunisian patients with 11beta-hydroxylase deficiency: the p.Q356X and the novel
p.G379V. Clin Genet 2010; 78, 398-401.
10. Darvish, H., Esmaeeli-Nieh, S., Monajemi, G.B., et al. A clinical and molecular
genetic study of 112 Iranian families with primary microcephaly. J Med Genet
2010; 47, 823-828.
11. Wang, J.T., Lin, C.J., Burridge, S.M., et al. Genetics of vitamin D 1alphahydroxylase deficiency in 17 families. Am J Hum Genet 1998; 63, 1694-1702.
12. Arts, H.H., Bongers, E.M., Mans, D.A., et al. C14ORF179 encoding IFT43 is
mutated in Sensenbrenner syndrome. J Med Genet 2011; 48, 390-395.
13. Schilter, K.F., Schneider, A., Bardakjian, T., et al. OTX2 microphthalmia syndrome:
four novel mutations and delineation of a phenotype. Clin Genet 2011; 79, 158168.
14. Shamseldin, H.E., Faden, M.A., Alashram, W., and Alkuraya, F.S. Identification of a
novel DLX5 mutation in a family with autosomal recessive split hand and foot
malformation. J Med Genet 2012; 49, 16-20.
15. Mamai, O., Boussofara, L., Denguezli, M., et al. Multiple self-healing palmoplantar
carcinoma: a familial predisposition to skin cancer with primary palmoplantar and
conjunctival lesions. J Invest Dermatol 2015; 135, 304-308.
16. Zimprich, A., Biskup, S., Leitner, P., et al. Mutations in LRRK2 cause autosomaldominant parkinsonism with pleomorphic pathology. Neuron 2004; 44, 601-607.
17. Valdmanis, P.N., Dupre, N., Lachance, M., et al. A mutation in the RNF170 gene
causes autosomal dominant sensory ataxia. Brain 2011; 134, 602-607.
18. Bergametti, F., Denier, C., Labauge, P., et al. Mutations within the programmed cell
death 10 gene cause cerebral cavernous malformations. Am J Hum Genet 2005;
76, 42-51.
19. Oldridge, M., Fortuna, A.M., Maringa, M., et al. Dominant mutations in ROR2,
encoding an orphan receptor tyrosine kinase, cause brachydactyly type B. Nat
Genet 2000; 24, 275-278.
20. Yue, Q., Jen, J.C., Nelson, S.F., and Baloh, R.W. Progressive ataxia due to a missense
mutation in a calcium-channel gene. Am J Hum Genet 1997; 61, 1078-1087.
Related documents