Single Molecule, Real-Time Sequencing for Base Modification Detection in Eukaryotic Organisms: Coprinopsis cinerea

Khai Luong¹, Tyson A. Clark¹, Matthew Boitano¹, Yi Song¹, Stephen W. Turner¹, Jonas Korlach¹, Lukas Chavez², Patricia J Pukkila³, Yun Huang², Virginia K. Hench³, William Pastor², Lakshminarayan M. Iyer⁵, Suneet Agarwal⁴, L. Aravind Iyer⁵, Anjana Rao²

¹ Pacific Biosciences, 1380 Willow Road, Menlo Park, CA 94025
² La Jolla Institute for Allergy & Immunology, 9420 Athena Circle, La Jolla, CA 92037
³ University of North Carolina at Chapel Hill, Department of Biology, CB#3280, Chapel Hill, NC 27599-3280
⁴ Harvard Stem Cell Institute, Holyoke Center, Suite 727W, 1350 Massachusetts Ave., Cambridge, MA 02138
⁵ NCBI-CBB, 8600 Rockville Pike, MSC 6075, Bethesda, MD 20894-6075

Introduction

Single Molecule Real-Time (SMRT®) DNA sequencing provides a wealth of kinetic information beyond the primary DNA sequence, and this kinetic information allows direct detection of modified bases present in genomic DNA. The method has been demonstrated for base modification detection in prokaryotes at base and strand resolution. In eukaryotes, the common base modifications known to exist are the cytosine variants, including the methyl, hydroxymethyl, formyl and carboxyl forms. Each of these modifications exhibits a different signature in SMRT kinetic data, which opens unprecedented possibilities for differentiating between them in direct sequencing data. We present early results of directly sequencing different base modifications in eukaryotic genomic DNA using this method.

Base Modification Detection by SMRT® Sequencing

[Figure, panels A and B: fluorescence intensity (a.u.) vs. time (s) for two sequencing traces; the template in panel A carries an mA, and the IPD is marked on the trace.]

• Inter-pulse duration (IPD) is the time between the previous and current base incorporations
• The IPD for a base incorporated opposite a modified base is, on average, longer than for a base incorporated opposite a canonical base
• At every position, compare the observed IPDs to the expected IPD distribution

SMRT® Sequencing of the Four Forms of Cytosine

[Figure: IPD ratio vs. template position for 5mC, 5hmC, 5fC and 5caC.]

• IPD ratio kinetograms show reproducible footprint signatures for different modifications
• Salient Feature #1: Variants of C have their three strongest peaks at positions 0, +2, and +6 in the 5’ direction

Heat Map of IPD Ratios

[Figure: heat map of IPD ratios by upstream NN and downstream NN context, with columns for positions 0 (C), 1 (G), +2 and +6.]

• Using template sequence context NN*CGNN, the heat map shows sequence-dependent variability in the kinetic signature
• Salient Feature #2: IPD ratio intensity increases from 5-mC to higher oxidative states of the methyl group in nearly all sequence contexts

Signal Enhancement Strategies

• Chemical or enzymatic treatment of the DNA sample can be used to increase signal intensity

[Figure: reaction scheme with labels 5-mC, oxidation by TET, 5-hmC, 5-caC, glucosyltransferase (two steps) and diglucosylated 5-hmC.]

Model Organism

• Coprinus cinereus (C. cinerea okayama 7#130): a multicellular basidiomycete fungus with a typical mushroom form that undergoes a complete sexual cycle
• Small genome ~36 Mb: 13 chromosomes

Methods

• Used SMRT sequencing to sequence the complete C. cinerea genome to ~100x coverage using both 800 bp and 8 kb libraries
    • Native genome
    • Diglucosylation (enhance 5-hmC)

Results

Variants of cytosine in genome

[Figure: dC modifications (relative amounts) for 5-hmdC, 5-fodC and 5-cadC; IPD ratio vs. template position panels for CG, CHG, CHH, mCG, mCHG and mCHH.]

Correlation with other Methods

[Figure: recoverable labels are hmC and digluc5hmC.]

• Global comparison shows that SMRT detection of diglucosylation agrees with other methods, in particular in CpG-rich regions

Correlation with Gene Annotations

• Modification-rich regions overlap with TET/JBP transposons
• The pattern of oxidized 5-mC enrichment is predictive for centromeres
• Discovered a unique motif of ~200 bp that is methylated across several chromosomes and overlaps with novel retroposons:
  5’-…CACAGGTTACTGCGGAGCGCAGCAGAGATAAATTAGAGAA…-3’
• Comparison of base modification analysis of native vs. diglucosylated samples reveals prominence of caC in the LTR of novel retroposons (6 complete copies, on chr. 1, 4, 7, 9, 11, and 12)

[Figure: retroposon (approx. 8704 nt) with domains LTR, GAG, Zn-Knuckle, aspartyl peptidase, RTase, RNaseH, ZnF, Integrase, Chromo, LTR; the motif with 5caC is marked in both LTRs. 6 complete copies in the Cc genome (chromosomes 1, 4, 7, 9, 11, 12), all embedded in clusters of modifications.]

Conclusions

• SMRT Sequencing can detect the four known variants of C found in eukaryotic genomes: 5-mC, 5-hmC, 5-fC, and 5-caC
• An enhancement strategy combined with native genome sequencing can increase differentiation between some variants of C
• There are 40 copies of TET/JBP transposons in C. cinerea, contained within large regions of oxidized 5-mC in CpG regions
• Centromeres are also strongly marked with oxidized methylcytosines
• A novel set of retroposons bears caC in a specific motif in its long terminal repeat

Pacific Biosciences, PacBio, SMRT, SMRTbell and the Pacific Biosciences logo are trademarks of Pacific Biosciences of California, Inc. All other trademarks are the property of their respective owners. © 2013 Pacific Biosciences of California, Inc. All rights reserved.
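
Illustrative note on the IPD ratio. The comparison of observed and expected IPDs described under Base Modification Detection by SMRT® Sequencing can be illustrated with a toy calculation. The sketch below is not the analysis pipeline used for this poster. It assumes that the expected IPD at each position is available as a single mean value (for example from an unmodified control), that the ratio is taken between means, and that a fixed cutoff of 2.0 marks a candidate position. The function names, the cutoff and the sample numbers are all hypothetical.

```python
from statistics import mean


def ipd_ratios(observed, expected):
    """Return {position: mean observed IPD / expected mean IPD}.

    observed: {position: [IPD in seconds, one per read]}  (hypothetical input)
    expected: {position: expected mean IPD in seconds}    (hypothetical input)
    Positions with no reads or no usable expectation are skipped.
    """
    ratios = {}
    for pos, ipds in observed.items():
        exp = expected.get(pos)
        if not ipds or exp is None or exp <= 0:
            continue
        ratios[pos] = mean(ipds) / exp
    return ratios


def flag_positions(ratios, threshold=2.0):
    """Return sorted positions whose IPD ratio meets the assumed cutoff."""
    return sorted(pos for pos, r in ratios.items() if r >= threshold)


# Made-up example: position 11 shows slowed incorporation in every read.
observed = {
    10: [0.25, 0.5, 0.75],
    11: [1.0, 2.0, 3.0],
    12: [0.5, 0.5, 0.5],
}
expected = {10: 0.5, 11: 0.5, 12: 0.5}

ratios = ipd_ratios(observed, expected)
flagged = flag_positions(ratios)
print(ratios)
print(flagged)
```

A real analysis would compare the full observed distribution against the expected one with a statistical test rather than a fixed cutoff on the ratio of means, and would combine the signal at neighboring positions (such as the 0, +2 and +6 peaks noted for cytosine variants).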