Candidate gene screening using long-read sequencing

Jenny Ekholm, Ting Hon, Yu-Chih Tsai, David Greenberg, Tyson A. Clark, Steve Kujawa
PacBio, Menlo Park, CA USA

SMRT Sequencing overview

We have developed several candidate gene screening applications for both neuromuscular and neurological disorders. The power behind these applications comes from long-read sequencing, which gives access to previously unresolvable and even unsequenceable genomic regions. SMRT Sequencing offers uniform coverage, a lack of sequence-context bias, and very high accuracy. It can also directly detect epigenetic signatures and characterize full-length gene transcripts through assembly-free isoform sequencing.

Long-read sequencing data properties

[Figure panels: Accuracy; Read Length.]
Data generated with a 20 kb size-selected human library using 6-hour movies with P6-C4 chemistry on the PacBio RS II, analyzed with SMRT® Analysis v2.3. Each SMRT Cell generates ~55,000 reads.
E. coli 20 kb-insert library, SMRT Analysis v2.3.

In addition to calling the bases, SMRT Sequencing uses the kinetic information from each nucleotide to distinguish between modified and native bases.
[Figure: kinetic signal along the template strand. Example: N6-methyladenine.]

Gene isoform characterization

[Figure: Iso-Seq pipelines.]
Experimental pipeline: poly(A) mRNA; cDNA synthesis with adapters; size partitioning and PCR amplification; SMRTbell ligation; sequencing (Sequel/PacBio RS II).
Informatics pipeline: PacBio raw sequence reads; remove adapters and artifacts; clean sequence reads; reads clustering; isoform clusters; consensus calling; nonredundant transcript isoforms; quality filtering; final isoforms; map to reference genome; evidence-based gene models.
SampleNet: Iso-Seq Method with Clontech cDNA Synthesis Kit
DevNet: Iso-Seq wiki page

Figure 1. gDNA and transcripts from the SK-BR-3 cell line captured with the NimbleGen Oncology Panel; example: AURKA gene.
[Figure panels: Sample 1 All; Sample 1 Phase 0; Sample 1 Phase 1.]
Long reads and heterozygous SNPs allow you to phase reads into two haplotypes. Phased transcripts reveal retained introns and skipped exons.

Targeted sequencing and multiplexing

[Figure: targeted sequencing workflow.]
Barcoding options:
1. Barcoded adapters
2. Barcoded Universal Primer

Alzheimer's disease (AD): TOMM40 gene

A variable-length poly-T repeat at the rs10524523 SNP, within intron 6 of the TOMM40 gene, affects the age of onset of AD in combination with the APOE3 allele (1).

We successfully captured and sequenced the associated poly-T repeat in the TOMM40 gene and were able to determine that the sample had one short (15) allele and one Very Long (34) allele.

Schizophrenia: C4 gene

The C4 structural variant is highly complex (2):
- Two functionally distinct genes (isotypes): C4A and C4B
- Both isotypes can have 1 - 3 functional copies
- A human endogenous retroviral (HERV) insertion in intron 9 changes the length of the gene

[Figure: C4 gene structure. The HERV insertion determines whether the gene is Long or Short; a 2 aa difference in exon 26 determines the C4A or C4B isotype. Different combinations are possible (L/S/C4A/C4B).]

C4 plays a role in signaling which connections between neurons should be “pruned”, or removed, as the brain develops after childhood. The more C4 that was present, the higher the risk of developing schizophrenia. Certain versions of the C4 gene seem to increase people’s risk of developing schizophrenia by 27 to 50 percent.

We successfully captured and sequenced the C4 gene as part of an MHC capture panel. We were able to see the two different C4 isotypes (C4A and C4B) as well as the 7 kb HERV insertion in intron 9.
[Figure panels: Phase 0; Phase 1; Intron 9.]
Example: isotype C4A/L was successfully called for a sample using long-read sequencing.

Amplification-free targeted sequencing using CRISPR/Cas9

Repeat expansion disorders are challenging to interrogate because of their long repetitive regions. Using CRISPR/Cas9, we are able to access the repeat counts, interruption sequences, and epigenetic information without introducing PCR bias.
[Figure: HTT gene in HD sample. The Huntingtin gene CAG repeat counts in HD patients.]
[Figure: Direct methylation detection of FMR1 pre-mutation sample. The CGG repeat region appears to be heavily methylated (5mC).]

References

1) Roses AD, et al. (2010). A TOMM40 variable-length polymorphism predicts the age of late-onset Alzheimer’s disease. Pharmacogenomics J. 10(5):375-84.
2) Sekar A, et al. (2016). Schizophrenia risk from complex variation of complement component 4. Nature. 530(7589):177-83.

For Research Use Only. Not for use in diagnostics procedures. © Copyright 2016 by Pacific Biosciences of California, Inc. All rights reserved. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell, Iso-Seq, and Sequel are trademarks of Pacific Biosciences. BluePippin and SageELF are trademarks of Sage Science. NGS-go and NGSengine are trademarks of GenDx. All other trademarks are the sole property of their respective owners.
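Supplementary sketch: the repeat readouts described above (the TOMM40 poly-T allele lengths and the HTT CAG repeat counts) come down to counting tandem copies of a repeat unit in reads that span the whole repeat. The Python below is a minimal illustration of that idea only, not the analysis used for this poster. The function names longest_repeat_run and call_two_alleles are hypothetical, the reads are made-up toy sequences, and the "two most frequent counts" rule is an assumed simplification; a real workflow would typically anchor on flanking sequence and work from consensus reads.

```python
import re
from collections import Counter


def longest_repeat_run(read: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `read`.

    Hypothetical helper for illustration: it scans the raw read and ignores
    alignment, strand, and sequencing errors.
    """
    pattern = re.compile(f"(?:{re.escape(unit.upper())})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(read.upper())]
    return max(runs, default=0)


def call_two_alleles(reads, unit):
    """Report the two most frequently observed repeat counts, smallest first.

    Assumed simplification: a heterozygous sample shows up as two dominant
    counts, and less frequent counts are treated as noise.
    """
    counts = Counter(longest_repeat_run(r, unit) for r in reads)
    return sorted(n for n, _ in counts.most_common(2))


if __name__ == "__main__":
    # Toy reads: invented flanks around poly-T runs of 15 and 34, plus one noisy read.
    left, right = "GACCA", "GGCAC"
    toy_reads = (
        [left + "T" * 15 + right] * 3
        + [left + "T" * 34 + right] * 3
        + [left + "T" * 14 + right]
    )
    print(call_two_alleles(toy_reads, "T"))

    # Toy CAG tract followed by an interruption (CAA) and one more CAG.
    print(longest_repeat_run("GG" + "CAG" * 42 + "CAACAG", "CAG"))
```

Because the count is taken per read, the same helper applies to any repeat unit (T, CAG, CGG) as long as a single read covers the entire repeat, which is the property long reads provide here.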
