Download Single Molecule, Real-Time Sequencing for Base Modification

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Single Molecule, Real-Time Sequencing for Base Modification
Detection in Eukaryotic Organisms: Coprinopsis cinerea
Khai Luong1, Tyson A. Clark1, Matthew Boitano1, Yi Song1, Stephen W. Turner1, Jonas Korlach1
Lukas Chavez2, Patricia J Pukkila3, Yun Huang2, Virginia K. Hench3, Willaim Pastor2, Lakshminarayan M. Iyer5, Suneet Agarwal4, L. Aravind Iyer5, Anjana Rao2
1Pacific Biosciences, 1380 Willow Road, Menlo Park, CA 94025
2 La Jolla Insitute for Allergy & Immunology, 9420 Athena Circle, La Jolla, CA 92037
3 University of North Carolina at Chapel Hill, Department of Biology, CB#3280, Chapel Hill, NC 27599-3280
4 Harvard Stem Cell Institute, Holyoke Center, Suite 727W, 1350 Massaachusetts Ave., Cambridge, MA 02138
5 NCBI-CBB, 8600 Rockville Pike, MSC 6075, Bethesda, MD 20894-6075
• Global comparison shows that SMRT detection
of diglucosylation agrees with other methods, in
particular in CpG-rich regions.
Signal Enhancement Strategies
Introduction
hmC
Methods
Fluorescence
intensity (a.u.)
400
T
C
G
A
T C
300
mA
200
IPD
transferase
transferase
Correlation with Gene Annotations
Diglucosylated
5-hmC
5-hmC
Oxidation by TET
5-caC
5-mC
• Chemical or enzymatic treatment of the DNA
sample can be used to increase signal intensity
Heat Map of IPD Ratios
0=C
1=G
+2
+6
5mC
5hmC
5fC
5caC
• Modification rich regions overlap with TET/JBP
transposons
digluc5hmC
downstream NN
• Using template sequence context NN*CGNN,
the heat map shows sequence-dependent
variability in the kinetic signature
• Salient Feature #2: IPD ratio intensity increases
from 5-mC to higher oxidative states the of
methyl group in nearly all sequence contexts
Base Modification Detection
by SMRT® Sequencing
A
Glucosyl-
Upstream NN
Single Molecule Real-Time (SMRT®) DNA
sequencing provides a wealth of kinetic
information beyond the extraction of the
primary DNA sequence, and this kinetic
information can provide for the direct detection
of modified bases present in genomic DNA. This
method has been demonstrated for base
modification detection in prokaryotes at base
and strand resolutions. In eukaryotes, the
common base modifications known to exist are
the cytosine variants including methyl,
hydroxymethyl, formyl and carboxyl forms.
Each of these modifications exhibits different
signatures in SMRT kinetic data, allowing for
unprecedented possibilities to differentiate
between them in direct sequencing data. We
present early results of directly sequencing
different base modifications in eukaryotic
genomic DNA using this method.
Glucosyl-
A A
Model Organism
100
• Discovered unique motif of ~200 bp that is
methylated across several chromosomes and
overlaps with novel retroposons
5’-…CACAGGTTACTGCGGAGCGCAGCAGAGATAAATTAGAGAA…-3
• Comparison of base modification analysis of
native vs. diglucosylated samples reveals
prominence of caC in LTR of novel retroposons
(6 complete copies in chr. 1,4,7,9,11, and 12)
motif
with 5caC
motif
with 5caC
Retroposon (approx. 8704 nt)
0
70.5
71.0
71.5
72.0
72.5
73.0
73.5
74.0
74.5
• Coprinus cinereus (C. cinerea okayama 7#130):
a multicellular basidiomycete fungus with a
typical mushroom form that undergoes a
complete sexual cycle.
Time (s)
B
Fluorescence
intensity (a.u.)
400
G
C T
C
G
A
TC
A AG T
A
C
A
A
300
200
100
0
104.5
105.0
105.5
106.0
106.5
107.0
107.5
108.0
108.5
Time (s)
• Inter-pulse duration (IPD) is the time between
the previous and current base incorporations
4
IPD ratio
IPD ratio
4
3
CHG
2
1
1
20
30
40
50
60
70
80
90
100
110
120
130
30
40
50
6
5
5
4
4
3
2
1
1
30
40
50
60
70
80
90
Template position
80
90
100
110
120
130
100
110
120
130
20
30
40
50
60
70
80
90
100
110
ZnF
Integrase
Chromo
6 complete copies in the Cc genome: chromosomes 1, 4, 7, 9, 11, 12
all embedded in clusters of modifications
LTR
• Pattern of oxidized 5-mC enrichment is
predictive for centromeres
10.0
8.0
6.0
4.0
2.0
0.0
5-hmdC
5-fodC
5-cadC
• Used SMRT sequencing to sequence complete
C. cinerea genome to ~100x coverage using both
800 bp and 8 kb libraries
• Native genome
• Diglucosylation (enhance 5-hmC)
Correlation with other Methods
3
2
20
70
Template position
IPD ratio
IPD ratio
Template position
60
RNaseH
12.0
Results
20
6
mCG
dC modifications (relative amounts)
CHH
3
2
mCHH
CG
SMRT® Sequencing of the
Four Forms of Cytosine
5
RTase
14.0
• At every position, compare the observed IPDs
to the expected IPD distribution
5
aspartyl peptidase
Variants of cytosine in genome
mCHG
6
LTR
Zn-Knuckle
• Small genome ~36 Mb:
13 chromosomes
• IPD for a base complement to a modified base
is, on average, longer than to a canonical base
6
GAG
120
130
Template position
• IPD ratio kinetograms show reproducible
footprint signatures for different modifications
• Salient Feature #1: Variants of C have three
strongest peaks at position 0, +2, and +6 in the
5’ direction
Conclusions
• SMRT Sequencing can detect the four known
variants of C found in eukaryote genomes, 5-mC,
5-hmC, 5-fC, and 5-caC
• Enhancement strategy combined with native
genome sequencing can increase differentiation
between some variants of C
• There are 40 copies of TET/JBP transposons in
C. cinerea, contained within large regions of
oxidized 5-mC in CpG regions
• Centromeres are also strongly marked with
oxidized methylcytosines
• A novel set of retroposons bears caC in a specific
motif in its long terminal repeat
Pacific Biosciences, PacBio, SMRT, SMRTbell and the Pacific Biosciences logo are trademarks of Pacific Biosciences of California, Inc. All other trademarks are the property of their respective owners. © 2013 Pacific Biosciences of California, Inc. All rights reserved.