Supplemental Data
Material and Methods:
Human saliva collection: 5 mL of unstimulated whole saliva (WS) and cell-free saliva
(CFS) samples were collected from healthy individuals between 9 AM and 10 AM in
accordance with published protocols (1). Subjects were asked to refrain from eating,
drinking, and oral hygiene procedures for at least 1 hour prior to saliva collection. Saliva
samples were kept on ice during collection. Briefly, for the WS samples, 5 mL of saliva
was collected, preserved with SUPERase-In (Ambion), and stored at -80 °C until
analyzed. The CFS samples were centrifuged at the time of collection for 15 min at
2600 × g (4 °C); the supernatant phase was removed, preserved with SUPERase-In
(Ambion), and stored at -80 °C until analyzed. Low-speed centrifugation of the saliva
pellets whole cells and large cell debris, so the supernatant is referred to as
cell-free (1, 2).
Multiple saliva collections were performed over several weeks from the 2 volunteers in
order to optimize the RNA isolation protocol. After RNA could be isolated reproducibly,
8 libraries were prepared from 3 independent collections; 2 collections from volunteer 1
and 1 collection from volunteer 2.
Salivary RNA extraction: Briefly, starting with 500 µL per sample, 1.5 volumes of 2X
Denaturing Solution were added, and the samples were thoroughly mixed and incubated
on ice for 5 minutes. All subsequent steps were performed at room temperature. Next,
1.25 volumes of room temperature 100% ethanol were added, and the samples were
passed through the filter cartridge. The phenol-chloroform extraction step was omitted.
On-column DNase digestion (80 µL, QIAGEN, RNase-free DNase Set, Cat# 79254) was
performed for 15 minutes at room temperature between the first and second wash steps.
The filter cartridge was then washed twice, and the RNA was eluted using 100 µL of
nuclease-free water preheated to 95 °C. Due to the low yield, RNA concentration was
determined using the Agilent 2100 Bioanalyzer (Agilent Technologies, Agilent RNA
6000 Pico Kit, Cat# 5067-1513). Additionally, RNase and DNase digestions were
performed to confirm that the nucleic acids were RNA.
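The reagent volumes implied by the "volumes" quoted in the extraction protocol can be worked out with a short calculation. The sketch below is illustrative only. The text does not state whether the 1.25 volumes of ethanol are relative to the starting sample or to the sample plus Denaturing Solution, so the function name and the ethanol_basis parameter are assumptions introduced here to show both readings.

```python
def reagent_volumes_ul(sample_ul=500.0, ethanol_basis="lysate"):
    """Return reagent volumes in microliters for one extraction.

    ethanol_basis is a hypothetical switch (not from the protocol text):
      "lysate" -> ethanol is 1.25 x (sample + Denaturing Solution)
      "sample" -> ethanol is 1.25 x the starting sample volume
    """
    denaturing_ul = 1.5 * sample_ul          # 1.5 volumes of 2X Denaturing Solution
    lysate_ul = sample_ul + denaturing_ul    # mixture before ethanol is added

    if ethanol_basis == "lysate":
        ethanol_ul = 1.25 * lysate_ul
    elif ethanol_basis == "sample":
        ethanol_ul = 1.25 * sample_ul
    else:
        raise ValueError("ethanol_basis must be 'lysate' or 'sample'")

    return {
        "denaturing_ul": denaturing_ul,
        "lysate_ul": lysate_ul,
        "ethanol_ul": ethanol_ul,
        "total_loaded_ul": lysate_ul + ethanol_ul,  # volume passed through the filter
    }
```

For a 500 µL sample, both readings give 750 µL of Denaturing Solution; they differ only in the ethanol volume, so the basis should be confirmed against the kit manual before use.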
RNA Library Preparation: Briefly, RNA was first fragmented. Because the salivary RNA
was partially degraded, the RNA fragmentation reaction for the CFS samples was
performed for 0, 3, and 10 minutes to determine the optimal fragmentation time.
Next, adapters were ligated to the RNA fragments, and the fragments were reverse
transcribed to generate a cDNA library. Gel size selection was performed to obtain
cDNAs ranging from 100 to 200 base pairs (bp), followed by 18 cycles of in-gel PCR
amplification. Each sample was barcoded with SOLiD™ 3’ PCR Primers from the
SOLiD™ RNA Barcoding Kit. Libraries were quantitated using the Agilent High
Sensitivity DNA Kit (Agilent Technologies). Smear analysis was performed to
determine that the percentage of cDNA 25-200 bp was less than 25%.
SOLiD™ Total RNA-Sequencing: A detailed description of the alignment strategy is
provided in the BioScope user manual and can be found at (3).
Reproducibility of Replicates: The reproducibility between the replicates for the 3 min
CFS and 10 min CFS samples was quite high (±15%), so for most of the analysis, the
sequences from the two replicates of each sample were pooled prior to calculating RPKMs
to increase sequencing depth. The
Spearman correlations (R2) for technical replicates for 3 and 10 minutes were 0.99 and
0.98, respectively.
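The pooling and correlation steps described above can be sketched in a few lines. This is a minimal illustration, not the BioScope pipeline: it uses the standard RPKM definition (reads per kilobase of transcript per million mapped reads), and it assumes that the total mapped read count is the sum of the per-gene counts supplied. All function names and input dictionaries are hypothetical.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def pooled_rpkm(counts_a, counts_b, lengths_bp):
    """Sum per-gene counts from two replicates, then compute RPKM.

    Assumption: total mapped reads = sum of the pooled per-gene counts.
    """
    pooled = {g: counts_a.get(g, 0) + counts_b.get(g, 0) for g in lengths_bp}
    total = sum(pooled.values())
    if total == 0:
        raise ValueError("no mapped reads in either replicate")
    return {g: rpkm(pooled[g], lengths_bp[g], total) for g in lengths_bp}


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

In this sketch the correlation would be computed on the per-gene RPKM values of each replicate separately, and pooling would be applied only once the replicates are judged to agree.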
Network Analysis: Genetic network-based analysis was performed using Ingenuity
software, version 9 (IPA 9.0-3206). IPA was applied in this study to explore the genes
detected in human saliva profiles using core analyses.
Results
cDNA Library Preparation
Preparation of cDNA libraries from RNA requires RNA fragmentation in order to
generate cDNAs with inserts narrowly distributed in size. With the assumption that RNA
isolated from saliva was already fragmented, we varied the time of fragmentation of the
human samples from 0 to 10 minutes to determine if additional fragmentation was
necessary.
RNA Fragmentation
The only fragmentation protocol used was the protocol described in the user’s manual
that accompanied the SOLiD Total RNA-Seq Kit (Cat# 4445374). The only variation was
the digestion time, which was adjusted to prevent over-fragmentation of the salivary RNA.
There were two reasons for varying the digestion time:
1. The amount of RNA used in the fragmentation reaction was significantly lower
than the amount of RNA recommended in the protocol
2. The salivary RNA was highly fragmented
Over-fragmentation of the RNA would result in cDNA inserts that were too short for 50
bp sequencing. We found that fragmentation was necessary because the 3’ ends of the
fragmented RNA were not amenable to ligation of the adapters. We did try to
phosphorylate the ends of the fragmented RNA, but found that the ligation reaction was
less efficient than enzymatically fragmenting the RNA for 3 minutes (data not shown).
Additional fragmentation of the RNA appeared to increase the percentage of ribosomal
RNA reads within the library. Both fragmented and unfragmented samples showed
similar percentages of uniquely aligned sequences. The unfragmented sample did contain
a higher percentage of unaligned reads, suggesting that additional fragmentation may be
needed to make the ends of the RNA fragments available for ligation of the adapter to the
RNA.
Exon-specific coverage
Figure 1: The following 5 panels provide more examples showing that structural RNA integrity
has been preserved and that sequencing reads align to the gene annotation. These examples
complement the MALAT1 data.
A: Alignment of the reads to the genomic locus containing DUSP1 shows that RNA integrity
has been preserved. The sample was from volunteer #1, and RNA was isolated from cell-free
saliva.
B: Alignment of reads to the genomic locus containing S100A9 (8737 bp). While the plot
only shows data from 2 volunteers, the 6 samples represent independent saliva
collections, library preparations, and sequencing events. The first line in each sample
shows reads aligning to the negative DNA strand and the second line shows reads
aligning to the positive strand. The gene is expressed on the positive strand.
C: Alignment of reads to the genomic locus containing GAS5. GAS5 is not expressed, but
snoRNAs within the introns of GAS5 are highly expressed. The snoRNAs are
differentially expressed and the RPKM values are placed above each peak. A similar
alignment and expression pattern is observed with HeLa cells (data not shown).
D: Alignment of reads to a 3.3 Mb locus on chromosome 2 matches the gene annotation
for this region. Both the positive and negative strands are shown, and read alignment
correlates with annotated transcript expression on both strands.
E: Alignment of reads to the genomic locus containing MALAT1 (110 kb). Upstream of
MALAT1 a human EST is expressed. Expression of this transcript would not have been
detected if microarrays or PCR had been used to measure gene expression.
Network-based Gene Analysis:
We expanded the salivary gene investigation further by performing a network-based
analysis using 840 genes that were detected in both cell-free and whole human saliva
samples at a noise threshold of 1 RPKM.
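Selecting genes detected in both sample types at a given noise threshold amounts to a set intersection. The following sketch shows one way to do this and to count detected genes across several thresholds. The function names and the input format (a dictionary mapping gene name to RPKM) are assumptions for illustration, as is the choice to treat a gene as detected when its RPKM is greater than or equal to the threshold.

```python
def detected_genes(rpkm_by_gene, threshold=1.0):
    """Return the set of genes whose RPKM meets the noise threshold.

    Assumption: "detected" means RPKM >= threshold.
    """
    return {gene for gene, value in rpkm_by_gene.items() if value >= threshold}


def detected_in_both(cfs_rpkm, ws_rpkm, threshold=1.0):
    """Genes detected in both cell-free saliva (CFS) and whole saliva (WS)."""
    return detected_genes(cfs_rpkm, threshold) & detected_genes(ws_rpkm, threshold)


def detection_curve(rpkm_by_gene, thresholds):
    """Number of genes detected at each RPKM threshold."""
    return {t: len(detected_genes(rpkm_by_gene, t)) for t in thresholds}


# Placeholder values for illustration only; these are not data from the study.
cfs = {"geneA": 5.0, "geneB": 0.4, "geneC": 12.0}
ws = {"geneA": 1.0, "geneB": 3.0, "geneD": 9.0}
shared = detected_in_both(cfs, ws, threshold=1.0)
```

The resulting gene set is the kind of list that would then be passed to a pathway tool for core analysis.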
A network-based Ingenuity core analysis was applied to the human gene list, indicating
pathways that are enriched in saliva. The network-based analysis suggested that a key
function of the salivary genes was related to the “inflammatory response”. This network
involved 47 salivary genes (p=1.57E-06).
Furthermore, this panel of genes showed involvement in processes related to all phases of
the cell cycle: cellular growth and proliferation (p=5.14E-10), cell death (p=4.59E-08),
cellular development (p=6.46E-08), post-transcriptional modification (p=1.20E-06), as
well as cellular function and maintenance (p=1.63E-06).
The fact that our panel of salivary genes was related to all phases of the cell cycle
suggests a transcriptomic signature in human saliva that may represent an
abridged mirror image of biological processes within the body. Overall, this
network-based analysis indicates that the RNA found in saliva represents more than just a
by-product of cellular death or an RNA degradation pathway. This observation is
important since it supports the diagnostic value of saliva for detection of biomarkers for
distal and systemic diseases.
Discussion
In addition, in a separate study not described in this manuscript, salivary RNA from 9
healthy volunteers was sequenced. Comparison of the two RNA-Seq data sets showed the
same overall trends as described in this manuscript.
1. Size distribution analysis of salivary RNA showed the RNA was fragmented.
However, complete sequence coverage across entire exons was consistently
observed, independent of the exon length. There was no over- or under-representation
of specific regions of the transcript, indicating no fragmentation bias of
the RNA prior to library generation and sequencing.
2. Figure 2 provides an overview of genes detected at different RPKM thresholds in
each sample (volunteer 1-10). The fraction of uniquely mapped reads ranged from
10-25%, with the largest fraction of reads being unmapped to the human reference
genome, most likely due to the high microbial sequence content.
3. Multiple species of RNA were consistently expressed, including mRNAs, long
ncRNAs, miRNAs, and snoRNAs. Noncoding RNAs represented a minimum of
90% of the most highly expressed genes, with snoRNAs being the highest fraction
represented relative to the other RNA species.
4. Sequence alignment showed that structural integrity had been preserved. Sequenced
reads aligned to annotated exons in the human reference genome (Figure 3 Panel A
and B).
References Cited:
1. Navazesh M. Methods for collecting saliva. Ann N Y Acad Sci 1993;694:72-7.
2. St John MA, Li Y, Zhou X, Denny P, Ho CM, Montemagno C, et al. Interleukin 6
and interleukin 8 as potential biomarkers for oral cavity and oropharyngeal
squamous cell carcinoma. Arch Otolaryngol Head Neck Surg 2004;130:929-35.
3. Tuch BB, Laborde RR, Xu X, Gu J, Chung CB, Monighetti CK, et al. Tumor
transcriptome sequencing reveals allelic expression imbalances associated with
copy number alterations. PLoS One 2010;5:e9317.
Figures:
Figure 1: Examples of genes where RNA structural integrity has been preserved and
sequenced reads align to the annotated reference genome. The transcript is expressed
from the negative DNA strand and counts align to this strand. No reads aligned to the
positive strand, as expected.
A. Reads align to the annotated locus of DUSP1.
B. Six independent saliva samples collected on different days from the two volunteers
described in the manuscript show reads aligning only to the positive strand, as expected. The
Y axis is 10 to 100 reads for all samples. Read counts are higher in the CFS samples
compared to WS samples.
Gene: S100A9
C. GAS5 is not expressed in saliva, but snoRNAs are expressed from the introns within GAS5.
This same expression pattern is observed in HeLa cells.
[Panel C image: RPKM values printed above the snoRNA peaks, in extraction order: 8960, 161, 3585, 1048, 1958, 1759, 388, 903, 2993]
D. Sequenced reads aligned to chromosome 2 (3.3 Mb region) show the directionality of
transcript expression and sequence read alignment to annotated genes.
E. Transcriptional activity upstream of MALAT1. Sequenced reads align to a human
EST, which would not be identified if PCR or a microarray were used.
Figure 2: Gene detection as a function of increasing RPKM noise thresholds.
(A) RefSeq gene detection in saliva samples. There were 21,208 RefSeq genes for
human. The two replicates for the 3 and 10 min CFS samples were pooled prior to
calculating RPKM values. The Spearman correlations for technical replicates for 3 and 10
minutes were 0.99 and 0.98, respectively. At 1 RPKM, the two 3 min CFS samples
showed 5504 and 6256 genes detected, with a Spearman correlation of 0.98, while the
two 10 min CFS samples showed 5067 and 5133 genes detected, with a Spearman
correlation of 0.99.
(B) MicroRNA detection in the saliva samples.
(C) Number of different microbial species detected in the human WS and CFS samples
based on 16S rRNA gene clone analysis.
Figure 3: RNA structural integrity is preserved in saliva.
Examples of transcript expression at two loci for 3 additional volunteers. Plotted across
each locus is the normalized sequence coverage on both DNA strands. The x-axis is in
groups of two for each sample. The first line is the negative strand and the second line is
the positive strand. Samples CFS and WS are data from the manuscript and samples 5, 6,
and 10 are from 3 additional samples. (A) MALAT1. The y-axis for the CFS samples is
0 to 1000 while the y-axis for the remaining samples is 0 to 10. (B) IL8. The y-axis scale
for sample CFS is 0 to 100 and the y-axis for the remaining 4 samples is 0 to 20.