Download An Improved cDNA Library Generation Protocol for Transcriptome

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Signal transduction wikipedia , lookup

Cell growth wikipedia , lookup

Mitosis wikipedia , lookup

Cell culture wikipedia , lookup

Organ-on-a-chip wikipedia , lookup

Cell cycle wikipedia , lookup

Amitosis wikipedia , lookup

Cellular differentiation wikipedia , lookup

List of types of proteins wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

RNA-Seq wikipedia , lookup

Transcript
46
Clontech Laboratories, Inc.
An Improved cDNA Library Generation Protocol for Transcriptome Analysis from a Single Cell
A Takara Bio Company
Rachel Fish, Sally Zhang, Magnolia Bostick, Cynthia Chang, Suvarna Gandlur, Andrew Farmer1
Corresponding Author
Clontech Laboratories, Inc., 1290 Terra Bella Ave., Mountain View, CA 94043
1
A new SMARTer Ultra Low kit has been developed that is simpler and faster while improving
the quality and yield of the cDNA produced. The full-length cDNA from this method may be
used as a template for library sample preparation for Ion Torrent and Illumina® NGS platforms.
Sequencing results for libraries created from single cells or from equivalent amounts of total RNA
demonstrate that approximately 90% of the reads map to RefSeq, less than 0.5% of the total reads
map to rRNA, and the average transcript coverage is uniform. Improvements in the protocol following first strand synthesis and during cDNA amplification show higher sensitivity with an increase
in gene counts and improved representation from GC-rich genes. These data indicate that the
​improved SMART cDNA protocol is an ideal choice for single cell transcriptome analysis.
cDNA libraries were generated according to the identified protocols. For UL-HV, cDNA was purified with AMPure XP beads and amplified with Advantage® 2 polymerase. For UL-v3, cDNA was
not purified prior to amplification by SeqAmp polymerase.
Illumina adapters and indices were added using the Nextera XT protocol with 100–250 pg input
cDNA. cDNA libraries were sequenced on an Illumina MiSeq® platform with 1 x 50 reads.
®
XX
XXX
5'
5' blocked
primer
35
XX 5'
XXX
XX
XXX
5'
Double-stranded
cDNA
1.9
4.4%
80%
70%
57%
43%
11,539
5
0
400 500 600 1000 2000
10,380
35
[bp]
100 150
3
UL-HV
3.8
2.7%
78%
68%
67%
33%
8,518
2.0
1.4%
89%
78%
76%
24%
12,003
UL-v3
4.2
0.3%
86%
75%
75%
25%
9,136
HeLa
1 cell
2.1
0.2%
94%
76%
85%
15%
9,261
1,000 cells
2.5
0.4%
91%
70%
92%
8.4%
12,439
103
0.973
C
101
104
100
10
10
3
-1
R=0.981
ρ=0.837
100
101
102
UL-HV-1 RPKM
103
104
UL-v3-1 vs. UL-v3-2
104
6
Genes 36-63% GC
Genes < 36% GC
Genes > 63% GC
102
Genes 36-63% GC
Genes < 36% GC
Genes > 63% GC
UL-v3 RPKM
10-1
B
UL-v3-2 RPKM
Jurkat
1 cell
2.4
0.2%
80%
70%
75%
25%
4,199
8
101
100
10-2
10-2
R=0.907
ρ=0.828
10-1
100
100
101
UL-HV RPKM
102
103
104
R=0.982
ρ=0.861
10-1
100
101
102
103
Jurkat (1 cell)
104
UL-v3-1 RPKM
Expression levels by gene GC content comparing two cDNA synthesis protocols. The libraries made
from 100 pg HBR using either the UL-HV protocol or the UL-v3 protocol were compared. Genes were
binned by GC content and correlation plots were used to evaluate the two protocols. The average
gene counts are very reproducible for replicate samples using UL-HV (Panel A) or UL-v3 (Panel B).
Genes with a high GC content (shown in red) show higher expression with the UL-v3 protocol while
genes with a median or low GC content (shown in gray and blue, respectively) show an even distribution when the two protocols are compared (Panel C). Pearson (R) and Spearman (r) correlations are
indicated for all genes regardless of GC content.
0
35
150 300
500
10,380
35
[FU]
100
200
300 400 500 700
C
140
2000
10,380 [bp]
UL-v3
120
100
80
60
40
20
0
35
100
200
300 400 500 700
2000
10,380 [bp]
Primary Sequencing Metrics Are Similar
Between UL-v3 and SMART-Seq2
Input
10 pg MBR
Protocol
Number of reads (millions)
Reads discarded
Mapped to RefSeq
Mapped uniquely to RefSeq
Mapped to exons
Mapped to introns
Number of genes
9
Jurkat (1,000 cells)
1
101
10-1
102
HeLa (1,000 cells)
20
Table III: Primary Sequencing Metrics from Two cDNA Synthesis Protocols
Gene Body Coverage Is Even and
Reproducible for Different Cell Types
HeLa (1 cell)
40
SMART-Seq2
5.0
64%
75%
64%
71%
29%
7,776
UL-v3
3.3
3%
83%
72%
76%
24%
7,421
Sequencing metrics for cDNA generated by two protocols from very small amounts of total RNA.
Sequencing metrics generated using the Illumina MiSeq platform are similar between the two protocols. However, the % reads discarded is much higher for SMART-Seq2 generated libraries. It is likely
that the concatamers in the SMART-Seq2 libraries (Figure 7B) contribute to this high fraction.
Sequencing metrics from cells. cDNA libraries made from 1 cell or 1,000 cells (HeLa or Jurkat) using
the UL-v3 protocol were sequenced on an Illumina MiSeq platform (Table IIA). cDNA libraries made
from 1 cell (HeLa or Jurkat) using the UL-v3 protocol were also sequenced on an Ion Torrent PGM
platform (Table IIB). When using whole cells as input, the UL-v3 protocol maintains the high mapping
to RefSeq, high exon:intron ratio, and low mapping to rRNA seen with low-input purified total RNA,
even for a single cell.
UL-HV vs. UL-v3
100
50
0
80
60
Electropherograms of cDNA generated from
two cDNA synthesis protocols. Panel A shows
a Bioanalyzer trace of cDNA generated from a
single T cell (human) using the SMART-Seq2
protocol as described by Picelli et al. (5, 6).
Picelli et al. suggest that the ‘hedgehog’ pattern shown in the trace is most likely due to
the formation of concatamers of the template
switching oligonucleotide (5). cDNA libraries
were made from 10 pg MBR using either the
SMART-Seq2 protocol (5, 6; Panel B) or the ULv3 protocol (Panel C) and analyzed on an Agilent 2100 Bioanalyzer. The UL-v3 protocol has
been optimized to prevent the formation of
concatamers; therefore no ‘hedgehog’ pattern
is observed.
PGM
HeLa
1 cell
2.3
0.2%
94%
83%
85%
15%
9,256
250
200
150
DNA size (bp)
Genes 36-63% GC
Genes < 36% GC
Genes > 63% GC
102
10-2
Jurkat
1,000 cells
1.8
0.3%
86%
68%
84%
16%
12,611
0.876
1 cell
1.1
0.2%
83%
66%
77%
23%
4,200
Picelli et al.
300
[bp]
MiSeq
Cell type
Input
Millions of reads
Mapped to rRNA
Mapped to RefSeq
Mapped uniquely to RefSeq
Mapped to exons
Mapped to introns
Number of genes
UL-HV-1 vs. UL-HV-2
104
10,380
Table IIB: Primary Sequencing Metrics from Ion Torrent Sequencing
Expression Levels of GC-Rich Genes
Are Increased Using UL-v3
A
300 400 500 600 1000 2000
Primary Sequencing Metrics Are Similar for
a Single Cell and 1,000 Cells
Cell type
Input
Millions of reads
Mapped to rRNA
Mapped to RefSeq
Mapped uniquely to RefSeq
Mapped to exons
Mapped to introns
Number of genes
Pearson correlation (R)
Sequencing metrics from very small amounts of total RNA comparing two cDNA synthesis protocols. Compared to UL-HV, the UL-v3 kit results in significant improvements in the number of genes
detected, the exon:intron ratio, and mapping to RefSeq. The increased sensitivity gained using the
UL-v3 protocol is most significant at the lowest RNA inputs.
10-2 -2
10
For Research Use Only. Not for use in diagnostic or therapeutic procedures. Not for resale. Clontech, the Clontech logo, Advantage, SeqAmp,
SMART, SMARTer, SMARTScribe, and Ultra are trademarks of Clontech Laboratories, Inc. Takara and the Takara logo are trademarks of TAKARA
HOLDINGS, Kyoto, Japan. Illumina, MiSeq, and Nextera are registered trademarks of Illumina, Inc. All other marks are the property of their
respective owners. Certain trademarks may not be registered in all jurisdictions. ©2014 Clontech Laboratories, Inc.
20
Sequencing platform
10 pg MBR
2.0
1.3%
89%
79%
75%
25%
11,992
300
40
Table IIA: Primary Sequencing Metrics from Illumina Sequencing
UL-v3
2.2
4.2%
79%
69%
58%
42%
11,631
100 150 200
60
3. cDNA amplification. The SMARTer anchor sequence
and poly A sequence
serve as universal priming
sites
Electropherograms
of amplified
SMARTer cDNA. Individual cells or 1,000 cells (HeLa or Jurkat) were
for end-to-end cDNA amplification. cDNA contains
used as input for SMARTer cDNA synthesis using the UL-v3 kit. The cDNA samples were analyzed for
the complete 5' end of the mRNA plus sequences
purity and yield on an Agilent 2100 Bioanalyzer. The single main peak indicates the purity and yield of
complementary to the SMARTer oligo. Since blocked
cDNA
(the additional
peak at ~1 kb in the HeLa cell samples is the contribution of a single highly
primers are used, the
adapter
sequences
are not modified
expressed
gene,
FTH1).and
Theare
reproducibility of these traces shows the reliability and sensitivity of the
by the Illumina primers
in library
production
therefore not sequenced.
Finally,
poly A–tailed
cDNAfrom whole cells.
UL-v3 kit
for cDNA
synthesis
is sheared to the appropriate library size prior
to sequencing.
Amplify cDNA by LD PCR
with blocked PCR primer
UL-HV
10-1
1.Deng, Q., et al. (2014) Science 343(193):193-196.
2.Shalek A.K., et al. (2013), Nature 498(7453):236-240.
3.Mortazavi, A., et al. (2008) Nature Methods 5(7):621-628.
4.Chenchik, A., et al. (1998) In RT-PCR Methods for Gene Cloning and Analysis.
Eds. Siebert, P. & Larrick, J. (BioTechniques Books, MA), pp. 305–319.
5.Picelli, S., et al. (2013) Nature Methods 10(11):1096–1098.
6.Picelli, S., et. al. (2014) Nature Protocols 9(1):171–181.
Template switching
and extension by RT
SMART-Seq2
100
1 cell (21 PCR cycles)
1 cell (21 PCR cycles)
1,000 cells (13 PCR cycles)
100
2. Template switching300
and extension by SMARTScribe
Reverse Transcriptase. When SMARTScribe reaches
200 its terminal transferase activity
the 5' end of the RNA,
adds a few additional nucleotides to the 3' end of the
100Oligonucleotide base-pairs with
cDNA. The SMARTer
this non-template nucleotide stretch, creating an extended
0
template to enable SMARTScribe
to continue replicating
to the end of the oligonucleotide (1).
XX
100 pg HBR
103
References
120
400
5'
XXX
B
[FU]
Jurkat
Sequencing platform
10
• Contaminating sequences are minimized—The new kit is optimized to maintain
the representation of the original mRNA transcripts and prevent amplification of
contaminating sequences (Figure 7, Table III).
HeLa
A
120
[FU]
80
-2
• Optimized for single-cell RNA-seq—The UL-v3 kit produces cDNA of high yield and
high quality from single cells and can be used for either Illumina or Ion Torrent library
construction (Figure 4, Table II).
[FU]
First-strand synthesis
and tailing by RT
7
B
1. First strand synthesis
and tailing by SMARTScribe™
600
Reverse Transcriptase. The SMARTer
Ultra
Low RNA
1 cell (20
PCR cycles)
1 cellwith
(20 PCR
cycles)
Kit for Illumina Sequencing
starts
total
RNA
500
1,000 cells (12 PCR cycles)
and an oligo(dT) primer called the SMART CDS Primer.
Table I: Primary Sequencing Metrics from Very Small Amounts of Total RNA
Comparing Two cDNA Synthesis Protocols
UL-HV-2 RPKM
• Higher expression of GC-rich genes—While the UL-HV and UL-v3 kits both generate highly
reproducible data, the new PCR conditions in the UL-v3 kit result in higher expression of
GC-rich genes (Figure 3).
poly A 3'
SMART
CDS primer
Primary Sequencing Metrics Are
Improved with the UL-v3 Kit
Protocol
Number of reads (millions)
Mapped to rRNA
Mapped to RefSeq
Mapped uniquely to RefSeq
Mapped to exons
Mapped to introns
Number of genes
Gene body coverage was determined using the geneBody_coverage.py module of RSeQC. Read
coverage was normalized using Excel.
• Improved sensitivity—The new kit results in higher mapping to RefSeq, a higher exon:intron
ratio, a greater number of genes detected, lower mapping to rRNA (Table I), and more even
gene body coverage (Figures 6 and 9).
5'
SMARTer IIA
oligonucleotide
Input
Reads were trimmed by CLC Genomics Workbench and mapped to rRNA and the mitochondrial
genome with CLC. Unmapped reads were then mapped with CLC to the human or mouse genome
with RefSeq masking, producing uniquely mapped reads and % reads that mapped to RefSeq annotations. The number of genes identified in each library was determined by the number of genes
with a RPKM ≥0.1. Expression levels of different genes were obtained using CLC Genomics Workbench after mapping to RefSeq.
The SMARTer Ultra Low Input RNA Kit for Sequencing - v3 produces accurate, full-length,
unbiased cDNA, even from single cells or single-cell levels of total RNA (10 pg). When used
with Illumina or Ion Torrent NGS platforms, this kit generates robust libraries, accurate gene
quantification, and reproducible, high-quality sequence data. This allows researchers to have
greater confidence in their RNA-seq data.
XX
XXX
A blocked PCR primer
prevents adapters from being
ligated and sequenced.
Ion Torrent adapters and barcodes were added using the Ion Xpress Plus Fragment Library Preparation Kit and Ion Xpress Barcode Adapter Kit with 1–10 ng cDNA input. cDNA libraries were size
selected for 200 base reads and sequenced on an Ion Torrent PGM platform.
Conclusions
5'
The SMARTer oligonucleotide
and primer sequences serve
as universal priming sites for
PCR amplification.
2
A
Total RNA or cell(s)
The SMARTer oligonucleotide
pairs with these additional
nucleotides and allows the RT to
continue to generate cDNA (4).
The SMARTer Ultra Low Input RNA Kit for Sequencing - v3 is the latest in a series of Clontech®
products targeted at generating cDNA for RNA-seq from very low inputs. This third generation
oligo(dT)-primed kit has been especially designed for transcriptome profiling of 1–1,000 cells or
10 pg–10 ng of purified total RNA. The major changes in this kit include improvements in the cell
lysis step, optimization of primers and oligonucleotides, elimination of the clean-up step prior to
cDNA amplification, and using the more robust SeqAmp™ PCR polymerase for cDNA amplification.
Materials and Methods
cDNA is generated with the
SMARTScribe™RT and SMART
CDS primer.
The RT adds non-templated
nucleotides to the 3’ end of the
cDNA.
Introduction
4
The UL-v3 Protocol Is Optimized for
Accurate Generation of cDNA
(fluorescence units)
1
UL-v3 Robustly Generates cDNA from
Individual HeLa and Jurkat Cells
DNA amount
SMART™technology is a powerful method for cDNA synthesis that enables library preparation
from very small amounts of starting material. Indeed, the SMARTer® Ultra™ Low RNA method allows researchers to readily obtain high quality data from a single cell or 10 pg of total RNA—the
approximate amount of total RNA in a single cell. Recent studies have used this method to investigate heterogeneity among individual cells based on RNA expression patterns (1, 2).
SMART Technology for cDNA Synthesis
from Small Amounts of Total RNA
Gene Body Coverage Is Comparable for
UL-v3 and SMART-Seq2
SMART-Seq2
UL-v3
1
Normalized Read Coverage
As Next Generation Sequencing (NGS) technologies and transcriptome profiling using NGS mature, they are increasingly being used for more sensitive applications that have only limited sample availability. The ability to analyze the transcriptome of a single cell consistently and meaningfully has only recently been realized.
Normalized Read Coverage
Abstract
0.8
0.6
0.4
0.2
0
0
10
20
30
40
50
60
70
80
90
100
Normalized Transcript Length, 5' → 3' (%)
0.8
Comparison of gene body coverage for two cDNA synthesis protocols. The gene body coverage from
libraries made with 10 pg MBR using either the SMART-Seq2 protocol (blue line) or the UL-v3 protocol (orange line) is shown. Both protocols give similar, unbiased gene body coverage profiles.
0.6
0.4
Abbreviations
0.2
0
0
10
20
30
40
50
60
70
80
90
100
Normalized Transcript Length, 5' → 3' (%)
Gene body coverage from different cell inputs. The gene body coverage of cDNA libraries made from
1 or 1,000 HeLa cells (blue and red lines, respectively), or 1 or 1,000 Jurkat cells (green and purple
lines, respectively) using the UL-v3 kit is shown. Independent of the cell type used, there is good coverage without strong 5’ or 3’ biases. Coverage is similar for a given cell type regardless of the number
of cells used as input for cDNA synthesis.
RPKM—Read per Kilobase of Exon per Million Reads (3)
UL-HV—SMARTer Ultra Low Input RNA Kit for Illumina Sequencing - HV
UL-v3—SMARTer Ultra Low Input RNA Kit for Sequencing - v3
RT—Reverse Transcriptase
HBR—Human Brain Total RNA
MBR—Mouse Brain Total RNA
SMART—Switching Mechanism at 5’ End of RNA Template
PS633686
1290 Terra Bella Ave., Mountain View, CA 94043
Orders and Customer/Technical Service: 800.662.2566
Visit us at www.clontech.com