Medical sequencing
Principle and application of exome sequencing
Yu Sun

• Sanger sequencing of known disease-causing genes
• For MR/MCA: SNP array
• Exome sequencing
• Genome sequencing

Outline
• Introduction of exome sequencing
  • Technique, workflow, strategy
• An example case solved by exome sequencing following the standard strategy
  • SCAR7 and TPP1
• What to do when the standard strategy fails
  • “Silent” variation can be disease causing
  • Exome sequencing might catch a branch point mutation
• Genome sequencing

Exome sequencing
Exome: all exons in a region
Procedure
• Exome capture: SureSelect, NimbleGen, TruSeq, etc.
• NGS sequencing: Illumina, 454, etc.
Usage
• Find the mutation for a Mendelian disease
• Find rare alleles for complex disease

[Figure: sporadic, familial and de novo cases]

Identify Mendelian disease genes
• By exome sequencing
• A simple “sample-sequence-compare” loop
Bamshad, Nature Reviews Genetics 12, 745-755

http://www.genomics.agilent.com/GenericA.aspx?PageType=Custom&SubPageType=Custom&PageID=2098

Data analysis of exome sequencing
• Short reads
• Find the genomic location of each short read
• Genomic location, nucleotide changes, genotype, etc. (no biological info)
• Add biological info: gene, variant function, conservation, etc.
• Find the disease genes

Standard filter and comparison strategy
• Tens of thousands of variants are found in each exome
• Filtering and comparison shorten the candidate list

Example case: SCAR7 and TPP1
Human Mutation, 2013, 34 (5), 706-713
STANDARD STRATEGY

SCAR7
• Autosomal recessive spinocerebellar ataxia 7 (SCAR7), OMIM 609207
• Phenotypes:
  • difficulty walking and writing
  • dysarthria
  • limb ataxia
  • cerebellar atrophy

Method
• SureSelect 50 Mb all exons + Illumina HiSeq
• 100 bp paired-end sequencing
• Filtering
  • In linkage region 11p15
  • Low frequency (<5%) in NHLBI ESP exomes
  • Homozygous/compound heterozygous
  • Nonsense, missense, frameshift, coding indel, splice site

What’s known?
• 11p15, 5.8 cM, >200 genes
Breedveld, J Med Genet 2004, 858-866

Candidate list
Gene      Chr  Position  Ref base  Sample genotype  HGVS nomenclature            Function (GVS)  Note
TPP1      11   6636430   A         A/C              NM_000391.3:c.1397T>G        missense
TPP1      11   6638385   C         C/G              NM_000391.3:c.509-1G>C       splice-3
DCHS1     11   6645264   G         A/G              NM_003737.2:c.7643C>T        missense        Does not cosegregate
DCHS1     11   6662466   C         C/T              NM_003737.2:c.379G>A         missense        Does not cosegregate
C11orf40  11   4594558   -         -/G              NM_144663.1:c.286_287insC    frameshift      Does not cosegregate
C11orf40  11   4598956   C         C/T              NM_144663.1:c.95G>A          nonsense        Does not cosegregate
TPP1: encodes the lysosomal serine protease with tripeptidyl-peptidase 1 activity

TPP1 variants
[Figure: TPP1 variants]

TPP1 variants
Cosegregate within families
[Figure: families A and B; Sanger traces for wild type, c.1397T>G and c.509-1G>C]

The first example: TOD and FLNA
“Silent” variation can be disease causing
The American Journal of Human Genetics, 2010, 87(1), 146-153
WHAT IF THE STANDARD STRATEGY DOES NOT WORK

Terminal Osseous Dysplasia
• Terminal Osseous Dysplasia, TOD (OMIM 300244)
• Rare, all patients female
• X-linked dominant, male-lethal disease
• Phenotypes:
  • pigmentary anomalies of the skin
  • skeletal abnormalities of the limbs
  • recurring digital fibroma

Xq
• Zhang et al, 2000
• Linkage analysis: Xq27.3-q28
• 8.7 Mb in total
• 219 genes

Method
• The probands of the Dutch and Italian families
• SureSelect X-exome by Agilent
• Sequencing by Genome Analyzer II, Illumina
[Figure: pedigrees of the Dutch and Italian families]

Variant Filtering
• In Xq27.3-q28
• Heterozygous
• Low frequency in the European population
• Missense, nonsense, frameshift, in-frame indel, splice site
• Include silent mutations
• 1 gene, 1 variant in common
• c.5217G>A in FLNA
  • Not in dbSNP; not reported before; not present in the 1000 Genomes Project; not found in >400 control X chromosomes

Sanger sequencing confirmation
c.5217G>A
[Figure: Sanger traces (wild type vs. mutation) and pedigrees of the Dutch, Italian and Israeli-Arab families, annotated with genotypes G/G, G and G/A]
• 3 sporadic cases: G/A

FLNA summary
• Encodes the cytoskeletal protein filamin A
• Spans 26 kb
Transcript     Exons  Protein length
NM_001456      47     2639 a.a.
NM_001110556   48     2647 a.a.

Altered Splicing
• Fibroma cells from 1III:2
• Altered splicing
[Figure: pedigree of family 1]

The second example: Aarskog Scott Syndrome and FGD1
Exome sequencing can detect a branch point mutation
Human Mutation, 2012, 34 (3), 430-434
WHAT IF THE STANDARD STRATEGY DOES NOT WORK

Aarskog Scott Syndrome
• OMIM 305400; most cases show X-linked inheritance; females mildly affected
• FGD1 gene (Xp11.22), 19 exons, 100 kb
• Mutation detection rate approximately 20%
• Phenotypes
  • Short stature (-1 to -2 SD)
  • Facial dysmorphism
  • Small hands and feet
  • Shawl scrotum
  • Mental retardation rare

Method
• Two affected boys, negative for FGD1 mutations by Sanger sequencing
• SureSelect all exons (Agilent)
• Sequencing by Genome Analyzer II

Candidate list
Sample  Variant                       Gene     Function    Depth  dbSNP  OMIM  Note
III-1   NM_018325.2:c.607G>C          C9orf72  Missense    9      none   No    Associated with FTLD/ALS
III-2   NM_018325.2:c.607G>C          C9orf72  Missense    9      none   No    Associated with FTLD/ALS
III-1   NM_080818.3:c.512_513insA     OXGR1    Frameshift  43     none   No    Does not cosegregate
III-2   NM_080818.3:c.512_513insA     OXGR1    Frameshift  67     none   No    Does not cosegregate
FTLD/ALS = frontotemporal dementia and/or amyotrophic lateral sclerosis, not similar to the Aarskog Scott Syndrome phenotype
Neither seems to be the causative variant
Extend the region 50 nt into the intron

Sample  Variant                       Gene     Function    Depth  dbSNP  OMIM  Note
III-1   NM_018325.2:c.607G>C          C9orf72  Missense    9      none   No    Associated with FTLD/ALS
III-2   NM_018325.2:c.607G>C          C9orf72  Missense    9      none   No    Associated with FTLD/ALS
III-1   NM_080818.3:c.512_513insA     OXGR1    Frameshift  43     none   No    Does not cosegregate
III-2   NM_080818.3:c.512_513insA     OXGR1    Frameshift  67     none   No    Does not cosegregate
III-1   NM_004463.2:c.2016-35delA     FGD1     Intron      11     none   Yes
III-2   NM_004463.2:c.2016-35delA     FGD1     Intron      11     none   Yes

Sanger Sequencing
[Figure: Sanger traces with genotypes delA, delA/A, delA, delA]
• Not present in controls or the 1000 Genomes Project
• Not reported before in the literature
• Prediction: disrupts the splicing branch site

[Figure: FGD1 exons 12, 13 and 14 in III-1, III-2, II-2, I-1 and controls]

Exome sequencing
Pros
• Successes in disease gene detection
• High throughput
• Exonic, so the effect is easier to understand
• Cheaper, with smaller data sets than genome sequencing
Cons
• Only exonic regions; other genetic information is missed
• Capture bias, which might cause false negatives
• High deviation in depth, making copy number changes hard to detect
• Unsolved cases

                     Exome sequencing                    Genome sequencing
Sample prep          -                                   Easier (no capture)
Capture bias         Yes                                 No
Genetic information  Around selected exons               Whole genome
Data size            Much smaller (1~2% of the genome)   -
Depth                More deviation                      Less deviation
Price                Cheaper                             More expensive

[Figure: exome vs. genome]
Less depth deviation in genome sequencing than in exome sequencing

[Figure: exome vs. genome]
Two variants missed by exome sequencing
Possibly due to capture failure

Summary
• Exome sequencing is a high-throughput technique for finding causative mutations of Mendelian diseases.
• Sample choice
  • De novo mutation: family trio
  • Inherited mutation: sporadic cases or distantly related family members
• Standard filtering and comparison give a general solution for disease gene detection by exome sequencing.

Summary (continued)
• Step-wise filtering and comparison strategy:
  • Inheritance pattern: dominant – heterozygous; recessive – homozygous/compound heterozygous
  • Predicted function: nonsense, splice site, missense, frameshift, coding indel
  • Novelty: databases, etc.
  • Comparison: familial – variant level; sporadic – gene level
• The standard strategy does not always work.

Summary (continued)
• Less stringent filtering helps to recover some data (allow silent mutations, extend into the intronic regions, etc.).
• RNA-seq tools can be applied to find large indels from the short reads.
• RNA provides valuable proof of a variant's effect with simple experiments.
• Some genetic information is missed by exome sequencing. Genome sequencing might overcome this.
• Some imagination and luck are needed.
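
Sketch: step-wise filtering and comparison
The step-wise strategy from the summary (region, frequency, predicted function, inheritance pattern, then comparison between samples) can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the studies above. The Variant record, its field names, the function labels, the gene names and the toy data are all assumptions made for the example. The relaxed function set stands in for "allow silent mutations, extend into the intronic regions".

```python
from collections import defaultdict
from dataclasses import dataclass

# Function classes kept by the standard strategy (labels are assumptions).
STANDARD_FUNCTIONS = frozenset(
    {"nonsense", "missense", "frameshift", "coding-indel", "splice-site"}
)
# Relaxed strategy: also keep silent variants and variants in flanking introns.
RELAXED_FUNCTIONS = STANDARD_FUNCTIONS | {"silent", "intron"}


@dataclass(frozen=True)
class Variant:
    """Hypothetical annotated variant record."""
    gene: str
    chrom: str
    pos: int
    genotype: str    # "het" or "hom"
    function: str    # one of the labels above
    pop_freq: float  # allele frequency in a reference panel (0.0 if unseen)


def filter_variants(variants, region, inheritance, max_freq=0.05,
                    functions=STANDARD_FUNCTIONS):
    """Apply the region, frequency, function and inheritance filters in turn."""
    chrom, start, end = region
    kept = [v for v in variants
            if v.chrom == chrom and start <= v.pos <= end
            and v.pop_freq < max_freq
            and v.function in functions]
    if inheritance == "dominant":
        return [v for v in kept if v.genotype == "het"]
    if inheritance == "recessive":
        by_gene = defaultdict(list)
        for v in kept:
            by_gene[v.gene].append(v)
        result = []
        for gene_variants in by_gene.values():
            hom = [v for v in gene_variants if v.genotype == "hom"]
            het = [v for v in gene_variants if v.genotype == "het"]
            result.extend(hom)
            # Two or more het variants in one gene are compound heterozygous
            # candidates; phase is unknown here and needs family data.
            if len(het) >= 2:
                result.extend(het)
        return result
    raise ValueError(f"unknown inheritance pattern: {inheritance}")


def genes_in_common(candidate_lists):
    """Gene-level comparison: genes with candidates in every sample."""
    gene_sets = [{v.gene for v in candidates} for candidates in candidate_lists]
    return set.intersection(*gene_sets) if gene_sets else set()


# Toy data with made-up genes and positions.
variants = [
    Variant("GENE_A", "11", 1000, "het", "missense", 0.0),
    Variant("GENE_A", "11", 2000, "het", "splice-site", 0.0),
    Variant("GENE_B", "11", 3000, "het", "missense", 0.0),  # single het
    Variant("GENE_C", "11", 4000, "hom", "missense", 0.20),  # too common
    Variant("GENE_D", "11", 5000, "hom", "silent", 0.0),     # silent
    Variant("GENE_E", "2", 1500, "hom", "nonsense", 0.0),    # outside region
]
region = ("11", 1, 10000)
standard = filter_variants(variants, region, "recessive")
relaxed = filter_variants(variants, region, "recessive",
                          functions=RELAXED_FUNCTIONS)
print([v.gene for v in standard])
print([v.gene for v in relaxed])
```

With the standard function set only the two GENE_A variants survive; the relaxed set additionally recovers the silent GENE_D variant, which mirrors how the FLNA and FGD1 variants were found once the filter was loosened. For sporadic cases, genes_in_common compares candidate lists at the gene level; familial comparison at the variant level would instead intersect (chrom, pos) pairs.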