Supplemental Data

Material and Methods:

Human saliva collection: 5 mL of unstimulated whole saliva (WS) and cell-free saliva (CFS) samples were collected from healthy individuals between 9 AM and 10 AM in accordance with published protocols (1). Subjects were asked to refrain from eating, drinking, and oral hygiene procedures for at least 1 hour prior to saliva collection. Saliva samples were kept on ice during collection. Briefly, for the WS samples, 5 mL of saliva was collected, preserved with SUPERase-In (Ambion), and stored at -80 °C until analyzed. The CFS samples were centrifuged at the time of collection for 15 min at 2,600 × g (4 °C); the supernatant phase was removed, preserved with SUPERase-In (Ambion), and stored at -80 °C until analyzed. Low-speed centrifugation of saliva pellets whole cells and large cell debris, which is why the supernatant is referred to as cell-free (1, 2). Multiple saliva collections were performed over several weeks from the 2 volunteers in order to optimize the RNA isolation protocol. Once RNA could be isolated reproducibly, 8 libraries were prepared from 3 independent collections: 2 collections from volunteer 1 and 1 collection from volunteer 2.

Salivary RNA extraction: Briefly, starting with 500 µL per sample, 1.5 volumes of 2X Denaturing Solution were added, and the samples were thoroughly mixed and incubated on ice for 5 minutes. All subsequent steps were performed at room temperature. Next, 1.25 volumes of room-temperature 100% ethanol were added, and the samples were passed through the filter cartridge. The phenol-chloroform extraction step was omitted. On-column DNase digestion (80 µL, QIAGEN, RNase-free DNase Set, Cat# 79254) was performed for 15 minutes at room temperature between the first and second wash steps. The filter cartridge was then washed twice, and the RNA was eluted using 100 µL of nuclease-free water preheated to 95 °C.
Due to the low yield, RNA concentration was determined using the Agilent 2100 Bioanalyzer (Agilent Technologies, Agilent RNA 6000 Pico Kit, Cat# 5067-1513). Additionally, RNase and DNase digestions were performed to confirm that the nucleic acids were RNA.

RNA Library Preparation: Briefly, RNA was first fragmented. Because the salivary RNA was partially degraded, the RNA fragmentation reaction for the CFS samples was performed for 0, 3, and 10 minutes to determine the optimal fragmentation time. Next, adapters were ligated to the RNA fragments, and the fragments were reverse transcribed to generate a cDNA library. Gel size selection was performed to obtain cDNAs ranging from 100 to 200 base pairs (bp), followed by 18 cycles of in-gel PCR amplification. Each sample was barcoded with SOLiD™ 3′ PCR Primers from the SOLiD™ RNA Barcoding Kit. Libraries were quantitated using the Agilent High Sensitivity DNA Kit (Agilent Technologies). Smear analysis was performed to determine that the percentage of cDNA of 25-200 bp was less than 25%.

SOLiD™ Total RNA-Sequencing: The alignment strategy is described in detail in the BioScope user manual and in (3).

Reproducibility of Replicates: The reproducibility between the replicates for the 3 min CFS and 10 min CFS samples was high (±15%), so for most of the analysis the sequences from the replicates were pooled prior to calculating RPKMs to increase sequencing depth. The Spearman correlations (R²) for technical replicates for 3 and 10 minutes were 0.99 and 0.98, respectively.

Network Analysis: Genetic network-based analysis was performed using Ingenuity software, version 9 (IPA 9.0-3206). IPA core analyses were used in this study to explore the genes detected in the human saliva profiles.
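The pooling and correlation steps described under Reproducibility of Replicates can be illustrated with a minimal sketch. It assumes the standard RPKM definition (reads per kilobase of transcript per million mapped reads) and a rank correlation with average ranks for ties; the supplement does not state how BioScope or the authors implemented these steps. The gene names, lengths, and read counts are hypothetical, and the total mapped read count is simplified to the sum over the toy genes.

```python
from statistics import mean


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def pool_counts(*replicates):
    """Sum per-gene read counts across replicate libraries."""
    pooled = {}
    for rep in replicates:
        for gene, count in rep.items():
            pooled[gene] = pooled.get(gene, 0) + count
    return pooled


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical gene lengths (bp) and read counts for two technical replicates.
lengths = {"geneA": 2000, "geneB": 1000, "geneC": 500}
rep1 = {"geneA": 40, "geneB": 10, "geneC": 5}
rep2 = {"geneA": 60, "geneB": 30, "geneC": 5}

genes = sorted(lengths)
rho = spearman([rep1[g] for g in genes], [rep2[g] for g in genes])

# Pool the replicates first, then compute RPKM on the pooled counts.
pooled = pool_counts(rep1, rep2)
total = sum(pooled.values())  # simplification: toy genes only
pooled_rpkm = {g: rpkm(pooled[g], lengths[g], total) for g in genes}
```

Pooling before normalization means that each gene's RPKM is computed from the combined depth of both libraries, rather than being an average of two noisier per-replicate values.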
Results

cDNA Library Preparation

Preparation of cDNA libraries from RNA requires RNA fragmentation in order to generate cDNAs with inserts narrowly distributed in size. On the assumption that RNA isolated from saliva was already fragmented, we varied the fragmentation time of the human samples from 0 to 10 minutes to determine whether additional fragmentation was necessary.

RNA Fragmentation

The only fragmentation protocol used was the one described in the user's manual that accompanied the SOLiD Total RNA-Seq Kit (Cat# 4445374). The only variation was the digestion time, which was adjusted to prevent over-fragmentation of the salivary RNA. There were two reasons for varying the digestion time:
1. The amount of RNA used in the fragmentation reaction was significantly lower than the amount of RNA recommended in the protocol.
2. The salivary RNA was highly fragmented.
Over-fragmentation of the RNA would result in cDNA inserts that were too short for 50 bp sequencing. We found that fragmentation was necessary because the 3′ ends of the fragmented RNA were not amenable to ligation of the adapters. We did try to phosphorylate the ends of the fragmented RNA, but found that the ligation reaction was less efficient than after enzymatically fragmenting the RNA for 3 minutes (data not shown). Additional fragmentation of the RNA appeared to increase the percentage of ribosomal RNA reads within the library. Both fragmented and unfragmented samples showed similar percentages of uniquely aligned sequences. The unfragmented sample did contain a higher percentage of unaligned reads, suggesting that additional fragmentation may be needed to make the ends of the RNA fragments available for ligation of the adapter to the RNA.

Exon-specific coverage

Figure 1: The following 5 panels provide more examples showing that RNA structural integrity has been preserved and that sequencing reads align to the gene annotation. These examples complement the MALAT1 data.
A: Alignment of the reads to the genomic locus containing DUSP1 shows that RNA integrity has been preserved. The sample was from volunteer #1, and RNA was isolated from cell-free saliva.
B: Alignment of reads to the genomic locus containing S100A9 (8737 bp). While the plot only shows data from 2 volunteers, the 6 samples represent independent saliva collections, library preparations, and sequencing events. The first line in each sample shows reads aligning to the negative DNA strand and the second line shows reads aligning to the positive strand. The gene is expressed on the positive strand.
C: Alignment of reads to the genomic locus containing GAS5. GAS5 is not expressed, but snoRNAs within the introns of GAS5 are highly expressed. The snoRNAs are differentially expressed, and the RPKM values are placed above each peak. A similar alignment and expression pattern is observed in HeLa cells (data not shown).
D: Alignment of reads to a 3.3 Mb locus on chromosome 2 matches the gene annotation for this region. Both the positive and negative strands are shown, and read alignment correlates with annotated transcript expression on both strands.
E: Alignment of reads to the genomic locus containing MALAT1 (110 kb). Upstream of MALAT1, a human EST is expressed. Expression of this transcript would not have been detected if microarrays or PCR had been used to measure gene expression.

Network-based Gene Analysis: We expanded the salivary gene investigation further by performing a network-based analysis using 840 genes that were detected in both cell-free and whole human saliva samples at a noise threshold of 1 RPKM. A network-based Ingenuity core analysis was applied to the human gene list to identify pathways that are enriched in saliva. The network-based analysis suggested that a key function of the salivary genes was related to the "inflammatory response". This network involved 47 salivary genes (p=1.57E-06).
Furthermore, this panel of genes showed involvement in processes related to all phases of the cell cycle: cellular growth and proliferation (p=5.14E-10), cell death (p=4.59E-08), cellular development (p=6.46E-08), post-transcriptional modification (p=1.20E-06), and cellular function and maintenance (p=1.63E-06). The fact that our panel of salivary genes was related to all phases of the cell cycle suggests a transcriptomic signature in human saliva that may represent an abridged mirror image of biological processes within the body. Overall, this network-based analysis indicates that the RNA found in saliva represents more than just a by-product of cellular death or an RNA degradation pathway. This observation is important because it supports the diagnostic value of saliva for the detection of biomarkers for distal and systemic diseases.

Discussion

In addition, in a separate study not described in this manuscript, salivary RNA from 9 healthy volunteers was sequenced. Comparison of the two RNA-Seq data sets showed the same overall trends as described in this manuscript.
1. Size distribution analysis of salivary RNA showed that the RNA was fragmented. However, complete sequence coverage across entire exons was consistently observed, independent of exon length. There was no over- or under-representation of specific regions of the transcript, indicating no fragmentation bias of the RNA prior to library generation and sequencing.
2. Figure 2 provides an overview of genes detected at different RPKM thresholds in each sample (volunteers 1-10). The fraction of uniquely mapped reads ranged from 10% to 25%, with the largest fraction of reads being unmapped to the human reference genome, most likely due to the high microbial sequence content.
3. Multiple species of RNA were consistently expressed, including mRNAs, long ncRNAs, miRNAs, and snoRNAs.
Noncoding RNAs represented a minimum of 90% of the most highly expressed genes, with snoRNAs being the largest fraction relative to the other RNA species.
4. Sequence alignment showed that structural integrity had been preserved. Sequenced reads aligned to annotated exons in the human reference genome (Figure 3, Panels A and B).

References Cited:
1. Navazesh M. Methods for collecting saliva. Ann N Y Acad Sci 1993;694:72-7.
2. St John MA, Li Y, Zhou X, Denny P, Ho CM, Montemagno C, et al. Interleukin 6 and interleukin 8 as potential biomarkers for oral cavity and oropharyngeal squamous cell carcinoma. Arch Otolaryngol Head Neck Surg 2004;130:929-35.
3. Tuch BB, Laborde RR, Xu X, Gu J, Chung CB, Monighetti CK, et al. Tumor transcriptome sequencing reveals allelic expression imbalances associated with copy number alterations. PLoS One 2010;5:e9317.

Figures:

Figure 1: Examples of genes where RNA structural integrity has been preserved and sequenced reads align to the annotated reference genome.
A. Reads align to the annotated locus of DUSP1. The transcript is expressed from the negative DNA strand and counts align to this strand. No reads aligned to the positive strand, as expected.
B. Gene: S100A9. Six independent saliva samples collected on different days from the two volunteers described in the manuscript show reads aligning only to the positive strand, as expected. The y-axis is 10 to 100 reads for all samples. Read counts are higher in the CFS samples than in the WS samples.
C. GAS5 is not expressed in saliva, but snoRNAs are expressed from the introns within GAS5. This same expression pattern is observed in HeLa cells. RPKM values labeled above the peaks in the figure: 8960, 161, 3585, 1048, 1958, 1759, 388, 903, 2993.
D. Sequenced reads aligned to chromosome 2 (3.3 Mb region) show directionality of transcript expression and sequence read alignment to annotated genes.
E. Transcriptional activity upstream of MALAT1. Sequenced reads align to a human EST, which would not be identified if PCR or a microarray were used.
Figure 2: Gene detection as a function of increasing RPKM noise thresholds. (A) RefSeq gene detection in saliva samples. There were 21,208 RefSeq genes for human. The two replicates for the 3 and 10 min CFS samples were pooled prior to calculating RPKM values. The Spearman correlations for technical replicates for 3 and 10 minutes were 0.99 and 0.98, respectively. At 1 RPKM, the two 3 min CFS samples showed 5504 and 6256 genes detected, with a Spearman correlation of 0.98, while the two 10 min CFS samples showed 5067 and 5133 genes detected, with a Spearman correlation of 0.99. (B) MicroRNA detection in the saliva samples. (C) Number of different microbial species detected in the human WS and CFS samples based on 16S rRNA gene clone analysis.

Figure 3: RNA structural integrity is preserved in saliva. Examples of transcript expression at two loci for 3 additional volunteers. Plotted across each locus is the normalized sequence coverage on both DNA strands. The x-axis is in groups of two for each sample: the first line is the negative strand and the second line is the positive strand. Samples CFS and WS are data from the manuscript, and samples 5, 6, and 10 are from 3 additional volunteers. (A) MALAT1. The y-axis for the CFS samples is 0 to 1000, while the y-axis for the remaining samples is 0 to 10. (B) IL8. The y-axis scale for sample CFS is 0 to 100, and the y-axis for the remaining 4 samples is 0 to 20.