New Methods in Cardiovascular Biology

Deep mRNA Sequencing for In Vivo Functional Analysis of Cardiac Transcriptional Regulators: Application to Gαq

Scot J. Matkovich, Yan Zhang, Derek J. Van Booven, Gerald W. Dorn II

Downloaded from http://circres.ahajournals.org/ by guest on June 15, 2017

Rationale: Transcriptional profiling can detect subclinical heart disease and provide insight into disease etiology and functional status. Current microarray-based methods are expensive and subject to artifact.

Objective: To develop RNA sequencing methodologies using next generation massively parallel platforms for high throughput comprehensive analysis of individual mouse cardiac transcriptomes. To compare the results of sequencing- and array-based transcriptional profiling in the well-characterized Gαq transgenic mouse hypertrophy/cardiomyopathy model.

Methods and Results: The techniques for preparation of individually bar-coded mouse heart RNA libraries for Illumina Genome Analyzer II resequencing are described. RNA sequencing showed that 234 high-abundance transcripts (>60 copies/cell) comprised 55% of total cardiac mRNA. Parallel transcriptional profiling of Gαq transgenic and nontransgenic hearts by Illumina RNA sequencing and Affymetrix Mouse Gene 1.0 ST arrays revealed superior dynamic range for mRNA expression and enhanced specificity for reporting low-abundance transcripts by RNA sequencing. Differential mRNA expression in Gαq and nontransgenic hearts correlated well between microarrays and RNA sequencing for highly abundant transcripts. RNA sequencing was superior to arrays for accurately quantifying lower-abundance genes, which represented the majority of the regulated genes in the Gαq transgenic model.

Conclusions: RNA sequencing is rapid, accurate, and sensitive for identifying both abundant and rare cardiac transcripts, and has significant advantages in time- and cost-efficiencies over microarray analysis. (Circ Res. 2010;106:1459-1467.)
Key Words: RNA sequencing ■ microarray ■ gene regulation ■ Gq

Signaling factors that mediate cardiac hypertrophy and heart failure do so in large part by altering gene expression. Accordingly, physiological hypertrophy, pathological hypertrophy and heart failure are associated with distinct transcriptional signatures in human disease and experimental mouse models.1,2 The ability to detect early changes in myocardial gene expression is essential to understanding pathophysiological mechanisms in experimental models, and is predicted to provide crucial diagnostic and prognostic information in human heart disease.3

We developed techniques for broad, accurate, and inexpensive characterization of individual mouse cardiac transcriptional signatures by massively parallel resequencing of heart cDNA libraries on the Illumina Genome Analyzer II. The digital readouts enable parallel quantification and annotation of myocardial transcripts, including mRNAs expressed at levels below 1 copy per cell. Here, we describe the components of a molecular and bioinformatic “RNA sequencing pipeline” optimized for comparative evaluation of transcript signatures in genetic mouse heart models. We compare transcriptional profiling by Illumina mRNA sequencing with Affymetrix microarray RNA profiling in mouse hearts with a well characterized transcriptional signature of pathological hypertrophy, the Gαq transgenic mouse,4,5 to illustrate the advantages of RNA sequencing over array-based platforms.

Methods

An expanded Methods section is available in the Online Data Supplement at http://circres.ahajournals.org.

Mouse Models

Generation of the Gαq-40 transgenic mouse line has been described previously.4,5 Four pairs of 8-week-old male nontransgenic FVB/N and Gαq-40 transgenic mouse hearts were used for RNA sequencing studies.
Echocardiographic and cardiac catheterization studies were performed using standard methods.4,6

Preparation and Quantification of Total Myocardial RNA

Total RNA was isolated from flash-frozen mouse hearts using TRIzol (Invitrogen) according to the directions of the manufacturer, except that isopropyl alcohol precipitation of RNA was allowed to proceed for 30 minutes at room temperature. RNA was quantified via NanoDrop or UV spectrometer and integrity (28S:18S ratio) was assessed on 1% agarose.

Original received January 19, 2010; revision received March 18, 2010; accepted March 19, 2010.
From the Center for Pharmacogenomics, Department of Medicine, Washington University School of Medicine, St Louis, Mo.
Correspondence to Gerald W. Dorn II, MD, Washington University Center for Pharmacogenomics, 660 S Euclid Ave, Campus Box 8220, St Louis, Mo 63110. E-mail [email protected]
© 2010 American Heart Association, Inc.
Circulation Research is available at http://circres.ahajournals.org
DOI: 10.1161/CIRCRESAHA.110.217513

Circulation Research May 14, 2010

Non-standard Abbreviations and Acronyms
Ct: cycle threshold
RPKM: reads per kilobase of exon per million mapped reads
RT-qPCR: reverse transcription–quantitative polymerase chain reaction

Reverse Transcription and Preparation of cDNA

Preparation of cDNA fragments from poly(A)+ RNA was modified from a previously described protocol.7 Four μg of total cardiac RNA was twice oligo(dT) selected using the Dynabead mRNA purification system (Invitrogen). Cardiac mRNA (200 ng) was fragmented to ≈200 nt by heating to 94°C for 2.5 minutes in 40 mmol/L Tris acetate pH 8.2, 100 mmol/L potassium acetate, 30 mmol/L magnesium acetate, and immediately chilled on ice.
After purification on Ambion NucAway columns, 100 ng of fragmented cardiac mRNA was reverse-transcribed using random hexamers, followed by second-strand cDNA synthesis using the Just cDNA double-stranded cDNA synthesis kit (Stratagene, catalog no. 200453).

Construction of Barcoded Short-Read Libraries for Illumina Sequencing

cDNAs were end-repaired using the End-It End-Repair kit (Epicentre Biotechnologies, no. ER0270) and 3′ A-overhangs added using 3′-5′ exo- Klenow polymerase (New England Biolabs, no. M0212) and 0.2 mmol/L dATP. Illumina adapters with T-overhangs and customized to include 3-nt “barcodes” were ligated to the cDNA at 10:1 molar excess using the Promega LigaFast kit (no. M8221); different barcoded adapters were ligated to individual mouse heart cDNAs. Following column purification (Qiagen) to remove excess unligated adapter, DNA in the 200- to 400-bp range was isolated via gel purification (Qiagen) on 2% low-melting agarose and amplified with 11 cycles of Phusion polymerase–mediated (New England Biolabs, no. F531) PCR (10 seconds 98°C, 30 seconds 65°C, 30 seconds 72°C) using oligonucleotides complementary to Illumina sequencing adapters. Barcoded sequencing adapters and PCR oligonucleotides are described in Online Table IX. The final amplified libraries were again column-purified and quantified using PicoGreen (Quant-It, Invitrogen).

Sequencing and Data Processing

Four barcoded libraries were combined in equimolar (10 nmol/L) amounts and diluted to 4 pmol/L for cluster formation on a single Illumina Genome Analyzer II flowcell lane. Base-calling of DNA clusters was performed using Illumina processing pipeline software (version 1.5) and 36-nt sequences, with quality scores, were obtained in the SCARF text format of Illumina.
UNIX functions were used to sort the sequences according to barcode, remove the 4 barcoding nucleotides, and convert sequence data to FASTQ format (example shell script given in the expanded Methods section in the Online Data Supplement). After barcode removal, the 32-base mRNA sequence reads were mapped to transcripts annotated in NCBI release 37 of the mouse genome using the publicly available packages Bowtie (release 0.12.0) (http://bowtie-bio.sourceforge.net/index.shtml),8 TopHat (release 1.0.12) (http://tophat.cbcb.umd.edu),9 and Cufflinks (release 0.7.0) (http://cufflinks.cbcb.umd.edu).10 TopHat and Cufflinks map known and novel splice junctions, use annotation files to compute which aligned sequences map to the known transcriptome and take into account transcript isoform diversity (alternative splicing). Cufflinks may be used with gene annotation files to calculate overall gene expression in terms of RPKM (reads per kilobase of exon per million mapped reads), a previously defined parameter.7 We used the default options supplied with these software packages in our analyses (example shell script given in the expanded Methods section in the Online Data Supplement). Annotation files in gtf format (http://mblab.wustl.edu/GTF22.html), including mRNAs, expressed sequence tags and noncoding elements such as ribosomal RNAs and microRNA precursors, were downloaded from Ensembl (ftp://ftp.ensembl.org/pub/current_gtf/mus_musculus). We used the Cufflinks module (above) with annotation files from which expressed sequence tags, rRNAs, and microRNA precursors had been manually removed, to focus on sequence reads mapping to coding genes. We analyzed only those RNA elements that had expression signals in at least 2 of 4 biological replicates. The gtf annotation files from Ensembl contain many separate transcripts (identified with ENST labels) that contribute to the sequence defined for each gene (identified with ENSG labels). 
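The RPKM measure and the replicate-detection criterion described above can be illustrated with a minimal sketch. This is the plain textbook definition of RPKM, not a reproduction of what Cufflinks computes internally (Cufflinks additionally models isoform ambiguity); the function names `rpkm` and `is_detected` and the example numbers are hypothetical.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon per million mapped reads.

    read_count: reads assigned to the gene (or transcript)
    exon_length_bp: summed exon length in base pairs
    total_mapped_reads: all mapped reads in the library
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    # reads / (length in kb) / (mapped reads in millions)
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


def is_detected(replicate_values, min_replicates=2):
    """True if a signal (> 0) is present in at least min_replicates samples.

    Mirrors the 'at least 2 of 4 biological replicates' rule; treating
    any value above zero as a signal is an assumption of this sketch.
    """
    return sum(v > 0 for v in replicate_values) >= min_replicates


# Hypothetical gene: 500 reads, 2 kb of exon, 10 million mapped reads.
example = rpkm(500, 2000, 10_000_000)
```

Calling `is_detected` on the four per-heart RPKM values of one genotype then decides whether the gene enters the downstream comparison.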
To assess alternative splicing events, we used the Cufflinks module with these same gtf annotation files, but forced calculation of RPKM values for separate transcripts rather than for entire genes (example shell script given in the expanded Methods section in the Online Data Supplement). The presence of alternatively spliced products was evaluated by manually examining Cufflinks output files for ENST entries corresponding to an ENSG entry of interest.

Comparative Analysis of RNA Sequencing and Microarray Results

To compare gene expression data obtained with RNA sequencing to the most current microarray technology, RNA from the same samples was analyzed on Affymetrix Mouse Gene 1.0 ST arrays at the Multiplexed Gene Analysis core of Washington University. Microarray data were analyzed using Partek Genomics Suite version 6.4 (Partek, St Louis, Mo) as previously described11,12; RNA sequencing data (gene symbols and RPKM values) were imported into Partek and analyzed similarly. Biological Networks Gene Ontology (BiNGO13) was used to assign genes into gene-ontology categories.

Reverse-Transcription Quantitative PCR

One microgram of total cardiac RNA was reverse-transcribed into cDNA using oligo-dT priming (qScript Flex cDNA kit, Quanta Biosciences). One-twentieth of each preparation was used for each individual qPCR, using TaqMan probes (Applied Biosystems), detailed in the expanded Methods section in the Online Data Supplement.

Statistical Methods

Paired and unpaired data were compared with Student t test. Correlation coefficients and linear regressions for comparison of microarray and RNA sequencing data were calculated with GraphPad Prism (San Diego, Calif). Probability values and false discovery rates for gene expression data were calculated using Partek Genomics Suite 6.4 software. A probability value of <0.05 was defined as significant unless indicated otherwise.
Results

RNA Sequencing of the Normal and Pathological Hypertrophied Heart Transcriptomes

Microarray profiling has been widely used in mouse studies, but is relatively expensive and requires special equipment and expertise. Because our laboratory has found that high throughput human DNA sequencing on next-generation massively parallel systems is fast, highly reproducible between technical replicates, accurate, and inexpensive,14 we explored the utility of deep mRNA sequencing in experimental mouse cardiac models.

Matkovich et al RNA Sequencing and Transcriptional Regulation

Figure 1. Phenotype of the adult Gαq-40 transgenic heart. A, Formalin-fixed intact hearts (top) and 4-chamber views (bottom). B, Representative M-mode echocardiograms. C, Response to graded infusion of dobutamine during cardiac catheterization (means±SEM, n=3 each genotype). D, Representative RT-qPCR fluorescence curves for nontransgenic (ntg) and Gαq cardiac gene expression (dotted line indicates Ct).

For initial proof-of-concept studies, we RNA sequenced four pairs of 8-week-old male nontransgenic FVB/N and Gαq transgenic mouse hearts. Gαq is the essential signaling transducer of pressure overload hypertrophy.15 Cardiac-specific overexpression of Gαq at 4 to 5 times endogenous levels intrinsically activates pathological hypertrophy signaling in a manner that closely mimics pressure overload in mice. Thus, Gαq transgenic mice exhibit cardiac hypertrophy, ventricular enlargement, diminished ejection performance, and characteristic increased expression of fetal cardiac genes.4,6,16–18 Figure 1 illustrates these features of the Gαq transgenic mouse model. Previous analyses of mRNA expression in the cardiac Gαq transgenic model used RNA dot blotting4,19–22 or Incyte microarrays,17,18 neither of which provides a comprehensive examination of gene expression.
Thus, we compared array- and sequencing-based methods of comprehensive transcriptome profiling in Gαq mouse hearts.

The few RNA sequencing studies published to date have typically used large numbers of sequence reads to map the transcriptomes at very high resolution (eg, see elsewhere23,24). Because one or more sequencing lanes are devoted to a single sample, this can be very expensive. Furthermore, this degree of resolution is not required for comparative analysis of transcriptomes in a case-control study design (as with nontransgenic versus transgenic, or normal versus diseased). For this reason, we developed procedures to individually DNA-barcode RNA sequence libraries, pool multiple libraries into a single sequencing lane, and segregate the sequence data post hoc according to barcode. To assess the efficacy of barcoding, library pooling, and sorting of the sequence reads, we tracked a rare DNA marker polymorphism (SNP) in 24 sequencing libraries individually barcoded, pooled, and sequenced in a single Illumina GA II lane. (The marker SNP was present in 2 of the 24 sequencing libraries, as determined by Sanger sequencing.) Deconvolution of the barcoded, pooled DNA sequence correctly identified both libraries containing the marker SNP. To ensure that this technique was applicable to quantitative analysis of transcriptome levels by RNA sequencing, we individually barcoded a Gαq and a nontransgenic heart cDNA library, combined them, and resequenced them in a single Illumina GA II lane. Here, high expression of the transgene (Gnaq) and other characteristic molecular markers of this model (Myh7, Acta1) marked the Gαq library, which was correctly classified as such according to barcode. For comparative transcriptome profiling of Gαq and nontransgenic hearts, four barcoded libraries were analyzed in a single sequencing lane.
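The post hoc segregation of pooled reads by barcode can be sketched as follows. This is an illustration only, not the shell script from the Online Data Supplement: it assumes the 4 barcoding nucleotides sit at the start of each 36-nt read and are matched exactly, and the barcode strings, sample names, and function names are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_len=4):
    """Group (sequence, quality) pairs by leading barcode and trim it off.

    Reads whose prefix matches no known barcode are kept, untrimmed,
    under the key "unassigned".
    """
    by_sample = defaultdict(list)
    for seq, qual in reads:
        sample = barcode_to_sample.get(seq[:barcode_len])
        if sample is None:
            by_sample["unassigned"].append((seq, qual))
        else:
            # Drop the barcode bases and their quality characters together.
            by_sample[sample].append((seq[barcode_len:], qual[barcode_len:]))
    return dict(by_sample)


def to_fastq(name, seq, qual):
    """Format one read as a four-line FASTQ record."""
    return f"@{name}\n{seq}\n+\n{qual}\n"


# Hypothetical barcodes and reads (36 nt in, 32 nt out).
barcodes = {"AAGT": "ntg_1", "CCTT": "gq_1"}
reads = [
    ("AAGT" + "ACGT" * 8, "I" * 36),
    ("CCTT" + "TTTT" * 8, "I" * 36),
    ("GGGG" + "A" * 32, "I" * 36),
]
sorted_reads = demultiplex(reads, barcodes)
```

Exact matching is the simplest choice; tolerating one mismatch would rescue more reads at the cost of possible misassignment between similar barcodes.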
The total number of barcoded RNA sequence reads from the eight cardiac libraries was 18.2 million, of which 10.5 million (57.8%) aligned to NCBI release 37 of the mouse genome (Online Table I). Using the criterion that an RNA element must be detected in at least 2 of 4 biological replicates, these sequences mapped to 14 109 different annotated RNA elements (including noncoding RNAs and microRNA precursors), of which 11 180 were coding mRNAs. The remaining sequences corresponded largely to ribosomal and transfer RNA. 10 844 coding mRNAs were detected in nontransgenic hearts, whereas 10 775 coding mRNAs were detected in Gαq hearts. 10 437 mRNAs were common to both nontransgenic and Gαq hearts. Gene expression values in nontransgenic hearts ranged from ≈1 copy per cell (corresponding to an RPKM value <37) for Casp8 (caspase 8) to 3873 copies per cell for mt-Co1 (mitochondrial cytochrome C oxidase subunit I) (Online Table II), consistent with myocardium being mitochondrial-rich25 and having low rates of caspase-mediated apoptosis.26 The most abundant transcript in Gαq hearts (2604 copies per cell) was Nppa encoding atrial natriuretic factor, consistent with the cardiomyopathic phenotype of this model.4,16,27 Gαq was the 24th-most abundant transcript in Gαq hearts, expressed at 574 copies per cell (Online Table III). Complete gene expression data for each individual heart are in Online Table IV.

Figure 2. Gene ontology (GO) analysis of highly expressed genes in nontransgenic mouse heart. The 234 highest-expressing genes in nontransgenic mouse hearts (≥60 copies per cell) were classified into 15 GO categories using BiNGO.13 Size of the pie slice corresponds to the number of matches to a given GO category (shown at the edge of each slice). A total of 360 matches were made to the 15 categories shown. Classification of each gene is given in Online Table III.

Recent PMAGE (polony multiplex analysis of gene expression) analysis indicated that approximately 200 unique heart mRNAs were expressed at >60 copies/cell, and that this small fraction (<1%) of the ≈25 000 mouse gene transcripts comprised ≈65% of total cardiac mRNA.28 Our RNA sequencing confirms this surprising observation, identifying 234 very high-abundance mRNAs expressed at or greater than 60 copies per cell, which comprised 55% of the total myocardial mRNA complement of normal hearts (Online Table II). In Gαq hearts, 236 mRNAs were expressed at or greater than 60 copies per cell, representing 52% of total myocardial mRNA. 206 of these 236 very high-abundance Gαq heart mRNAs were among the 234 very high-abundance transcripts in nontransgenic hearts (88% concordance), indicating a relatively modest impact of the hypertrophy gene expression program on the most prevalent cardiac transcripts. Gene ontology analysis of the 234 most abundant cardiac mRNAs reveals a preponderance of mitochondrial, transport, cytoskeletal, and contractile genes (Figure 2 and Online Table V).
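The share of total mRNA contributed by very high-abundance transcripts, as reported above, amounts to a simple sum over per-gene copy numbers. The sketch below shows the arithmetic on a toy table; the gene labels and copy numbers are hypothetical and are not taken from the Online Tables.

```python
def high_abundance_share(copies_per_cell, threshold=60.0):
    """Return (number of genes at or above threshold, their share of total copies).

    copies_per_cell: dict mapping gene label -> mean copies per cell
    """
    total = sum(copies_per_cell.values())
    high = [c for c in copies_per_cell.values() if c >= threshold]
    share = sum(high) / total if total else 0.0
    return len(high), share


# Hypothetical toy transcriptome with five genes.
toy = {"geneA": 3000.0, "geneB": 120.0, "geneC": 60.0, "geneD": 20.0, "geneE": 0.5}
n_high, share = high_abundance_share(toy)
```

With real data the same function would be applied to the per-genotype mean copy numbers, once for nontransgenic and once for Gαq hearts.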
Among the abundant transcripts, those differentially expressed in Gαq hearts were Nppb (BNP), Acta1 (α-skeletal actin), Ankrd1, Atp2a2 (SERCA2a), Myom2, Mybpc3, Hspb6, Idh3b, Pdha1, Hadha, Acadvl, Ndufv2, Acaa2, Fhl2, ie, several members of the “fetal cardiac gene program” that is nonspecifically regulated in myocardial disease.29 Thus, quantitative analysis of the cardiac transcriptome by RNA sequencing shows that a relatively few, highly abundant cardiac transcripts encoding homeostatic genes account for the majority of cardiac mRNAs, but that measuring the relative expression of these abundant transcripts provides little insight into cardiac status.1

RNA Sequencing Reveals Greater Depth of the Normal and Hypertrophy mRNA Signature

To directly compare results of RNA sequencing and conventional microarray transcript profiling of mouse hearts, we performed parallel microarray surveys of mRNA expression on the same eight mouse cardiac RNA samples studied above using Affymetrix Mouse Gene 1.0 ST arrays with probe sets for 25 175 annotated genes. The arrays reported signals for all ≈25 000 probe sets, compared to only 11 180 coding cardiac mRNAs to which RNA sequencing reads were mapped. Thus, unfiltered array data generated “results” for 125% more RNAs than were present by RNA sequencing. To determine the reason for this discrepancy, expression levels of all mRNAs detected by either or both methods (≈20 000 genes with matching gene symbols between the array platform and RNA sequencing gene annotation files) were compared for each of the eight hearts (Figure 3 and Online Figure I).

Figure 3. Correspondence of gene expression, determined by RNA sequencing vs microarray, for individual hearts. Gene expression determined by RNA sequencing (Illumina) is plotted against gene expression determined by microarrays (Affymetrix). Values shown are log2(RPKM+1) for RNA sequencing (x axis) and log2(Affymetrix signal units+1) for microarrays. A value of 1 was added to both RPKM and Affymetrix signal units to avoid taking the log of 0. Red circles highlight genes reported to be expressed by microarrays, but not by RNA sequencing. Red boxes show genes expressed below a typical cutoff level for microarray analysis. One nontransgenic heart and 1 Gαq heart are shown; plots are representative of those for all 8 hearts (shown in Online Figure I).

The correlation between mRNA expression levels reported by array and RNA sequencing was generally good (Spearman correlation coefficient, r=0.811±0.006, mean±SEM, n=8 hearts). However, the regression lines do not go through the y-axis origin, but rather intercept the y axis (microarray data) at ≈5. Also, the slopes of the regression lines are significantly less than one (0.74±0.011, P=1×10⁻⁶), indicating compression of expression values by microarray analysis. Finally, a number of mRNAs for which RNA sequencing showed no reads (Illumina sequencing values (x axis) of 0) were reproducibly indicated as highly expressed by the microarray studies (Affymetrix array value (y axis) of ≈6 to ≈12; circumscribed in red on Figure 3). These types of problems have been attributed to background hybridization and false-positive hybridization on microarrays.23 Indeed, background hybridization on microarrays complicates establishing the correct threshold for mRNA detection.11,30,31 In our own data set, if we use the y-axis value at the regression line x-axis intercept to establish a threshold value for the microarray data, then hundreds of low-abundance mRNAs (851 in the nontransgenic sample and 757 in the Gαq sample shown in Figure 3 [red boxes]) detected by RNA sequencing would be incorrectly filtered.
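The per-heart comparison just described (log2(x+1) transform, Spearman correlation, and a linear regression of array signal on sequencing signal) can be sketched in a few lines. The authors used GraphPad Prism; the pure-Python helpers below are a stand-in with hypothetical names, and the input numbers are invented so that the fitted line has a slope below 1 and a nonzero intercept, the pattern reported above.

```python
import math


def log2p1(values):
    """log2(x + 1), so that zero counts map to zero instead of -infinity."""
    return [math.log2(v + 1.0) for v in values]


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical RPKM values and already log-scaled array signals.
seq_log = log2p1([0, 1, 3, 7])
array_log = [5.0, 5.75, 6.5, 7.25]
slope, intercept = linear_fit(seq_log, array_log)
rho = spearman(seq_log, array_log)
```

A slope below 1 on these axes means the array reports a narrower range than sequencing for the same genes, and the intercept is the array signal assigned to genes with no sequencing reads.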
Thus, unfiltered microarray data report expression levels for mRNAs that are not really there (vertical ovals on Figure 3), but correcting for this problem by increasing the threshold for mRNA calling eliminates genuine low-abundance mRNAs within the hybridization noise (horizontal squares in Figure 3).

Case–Control Design and “Fold-Expression” Reporting Does Not Correct the Limitations of Microarray-Based mRNA Profiling

We considered that nonspecific microarray hybridization signals in a case-control comparison might cancel each other out when the data are reported as the “fold change,” rather than as absolute transcript expression values. Accordingly, we compared the results of our microarray and RNA sequencing data expressed as fold-regulation between Gαq and nontransgenic mouse hearts. Microarray analysis of Gαq transgenic heart mRNA identified 841 differentially expressed transcripts (false discovery rate, <0.03; P<0.001; fold change, ≥1.3; 382 upregulated and 459 downregulated) (Online Table VI). At the same probability value cutoff and fold change threshold, RNA sequencing identified differential expression of only 125 genes; 77 upregulated and 48 downregulated (Online Tables VI and VII). Of the 752 genes reported as differentially regulated by microarrays, but not by RNA sequencing, 45 were not detected in any hearts by RNA sequencing, and the majority (n=674) were present at or less than 20 copies per cell, ie, in the group where we had shown that microarrays provide the least confident data. Unfortunately, one of the disadvantages of the fold-change analytic approach is that these critical absolute expression data are not evident, making it difficult to assess the validity of individual results as a function of absolute mRNA levels.
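The fold-change and probability cutoffs quoted above can be expressed as a small filter. The sketch assumes the signed convention that the Table appears to follow (a decrease is written as the negative reciprocal of the ratio) and takes the ratio of group means, which may differ from how Partek derives its values; the function names are hypothetical. The Atp2a2 copy numbers in the example (811 nontransgenic, 458 Gαq) are those given in the Table.

```python
def signed_fold_change(control, treated):
    """Ratio treated/control, written as a negative reciprocal for decreases.

    Example: a halving is reported as -2.0 rather than 0.5.
    """
    if control <= 0 or treated <= 0:
        raise ValueError("both expression values must be positive")
    ratio = treated / control
    return ratio if ratio >= 1.0 else -1.0 / ratio


def is_regulated(fold, p_value, min_fold=1.3, max_p=0.001):
    """Apply the |fold change| >= 1.3 and P < 0.001 cutoffs."""
    return abs(fold) >= min_fold and p_value < max_p


# Atp2a2 copies per cell from the Table: nontransgenic 811, Gαq 458.
atp2a2_fold = signed_fold_change(811, 458)
```

Genes that are undetected in one group have no defined ratio, which is why the function rejects zero values rather than returning infinity.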
To better understand the striking disparities between microarray and RNA-sequencing based differential mRNA profiling, we plotted the fold expression values provided by either microarray or RNA sequencing of only the regulated mRNAs (ie, those whose Gαq/nontransgenic expression was P<0.001 by RNA sequencing), stratified by mRNA abundance in normal hearts. Thus, 3 individual regression lines were generated, separately reporting fold mRNA expression change in Gαq hearts for very high-abundance transcripts (>60 copies/cell, n=14), common transcripts (20 to 60 copies/cell, n=17), and low-abundance transcripts (<20 copies/cell, n=94) (Figure 4). The correlation of “fold regulation” between RNA sequencing and microarray data was good for the very high-abundance and common mRNAs (Pearson r²=0.92 and 0.96, respectively), but was suboptimal for low-abundance mRNAs, even though there were more data points in this group (r²=0.74). Furthermore, the slopes of all 3 linear regressions were very shallow (0.49 to 0.63), again reflecting a compressed range of expression data for microarray results in comparison to RNA sequencing.

Quantification of mRNA counts by RNA sequencing is critically dependent on correct assignment of sequence reads to their proper genes, ie, on accurate annotation. Thus, mis-identification might explain some discrepancies between RNA sequencing and microarrays. To assess this possibility, we performed directed reverse transcription–quantitative polymerase chain reaction (RT-qPCR) for 10 transcripts representing a range of absolute expression values and relative regulation in Gαq and nontransgenic hearts (Figure 5), and compared the results to those we obtained on the same RNA samples by sequencing and arrays (Table).
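The abundance stratification used for the three regression lines above can be sketched as a simple binning step. The text gives the classes as >60, 20 to 60, and <20 copies per cell; assigning a value of exactly 20 to the middle class and exactly 60 to the middle class is an assumption of this sketch, and the function names are hypothetical.

```python
from collections import Counter


def abundance_class(copies_per_cell):
    """Assign a transcript to an abundance stratum by nontransgenic copy number."""
    if copies_per_cell > 60:
        return "very high"
    if copies_per_cell >= 20:
        return "common"
    return "low"


def count_by_class(copy_numbers):
    """Tally how many transcripts fall into each stratum."""
    return Counter(abundance_class(c) for c in copy_numbers)


# Nontransgenic copies per cell for Acta1, Nppa, Bcl2 and Atp2a2, as listed in the Table.
tally = count_by_class([127, 43, 0.4, 811])
```

Each stratum's (sequencing fold change, array fold change) pairs would then be passed to a separate regression.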
When all 3 methods reliably detected mRNAs in both experimental groups, the increase in mRNA content in Gαq versus nontransgenic hearts, described as “fold expression,” was similar between RT-qPCR and RNA sequencing, but these values were compressed by microarray (Table). These data suggest that mis-identification of sequence reads is not responsible for differences in fold-expression reported from RNA sequencing and microarrays. However, RNA sequencing detected transcripts from both Gαq and nontransgenic hearts for only 5 mRNAs, and RT-qPCR was either below the threshold for accuracy (Ct≥35) or could not detect a further four transcripts. On the other hand, RT-qPCR and microarrays agreed for detection of Tuba1a, which was not observed by RNA sequencing. This single instance likely represents a misannotation artifact.

We also examined the degree of intersample variability in the 3 assay techniques, comparing expression of genes in the Table between the 4 nontransgenic, or the 4 Gαq, biological replicates. RT-qPCR exhibited the greatest variance, with the lower-abundance genes recording higher variation. RNA sequencing had less variance, and microarrays showed the least variance (because of data range compression). In short, RNA sequencing has less variance than RT-qPCR without the data compression inherent in microarrays.

The utility of RNA sequencing is to derive greater understanding of gene regulation pathways in normal and diseased tissue. We used Ingenuity Pathways Analysis software (http://www.ingenuity.com) to examine possible signaling relationships between the regulated genes in Gαq hearts (Figure 4 and Online Table VII). Seventy-two of 77 Gαq-upregulated genes and 42 of 48 Gαq-downregulated genes were assigned to signaling networks (Online Table VIII). Interestingly, separating the data into low- and high-abundance transcripts generated networks with a distinctly different focus (Figure 6 and Online Table VIII). The most intriguing networks focused on cellular growth/proliferation and on cell death. A signaling pathway involving members of these 2 networks suggests relationships between Gnaq, Rcan (regulator of calcineurin), Nfatc2, and Abra (actin-binding Rho activating protein), as well as signals involving the EGF receptor and the MAP kinase family (Figure 6).

Figure 4. Fold changes in gene expression, determined by RNA sequencing vs microarray, in Gαq-overexpressing compared to normal hearts. Fold change in gene expression determined by RNA sequencing (Illumina) is plotted against fold change in gene expression determined by microarrays (Affymetrix), using a log2 scale, for the 125 significantly regulated genes defined in the Table. Red squares denote genes expressed at or above 60 copies/cell in nontransgenic hearts; blue triangles, genes expressed between 20 and 60 copies/cell; green circles, genes expressed at or less than 20 copies/cell. Inset, Comparison of fold changes in all genes detected using RNA sequencing and microarrays, regardless of significance.

Figure 5. Comparison of gene expression by microarray, RNA sequencing, and RT-qPCR. A, Gene expression in nontransgenic hearts, determined by RNA sequencing, plotted against gene expression determined by microarrays. Values shown are log2(RPKM+1) for RNA sequencing (x axis) and log2(Affymetrix signal units+1) for microarrays. Light gray indicates expressed genes with regression line as in Figure 3; red squares, genes regulated by both arrays and sequencing; blue triangles, genes regulated on arrays but poorly detected by sequencing; green circle, highly expressed on arrays but not detected by sequencing. B, Representative TaqMan qPCR traces for genes shown in A. Dotted line indicates fluorescence threshold for Ct determination.

Table. Comparison of Gene Regulation Measured by Microarray, RNA Sequencing, and RT-qPCR Methods
Columns: Gene; RNA Sequencing, Nontransgenic (Copies/Cell); RNA Sequencing, Gαq (Copies/Cell); RNA Sequencing, Fold Change; Microarray (Fold Change); RT-qPCR (Fold Change)
Nppa 43 2604 60.2 13.4
Acta1 127 897 7.0 2.6 7.9
Atp2a2 811 458 −1.8 −1.2 −2.2
Fhl2 267 50 −3.4 −2.6 −4.0
2 5.4 2.8 4.6 3.1 2.5 Bcl2 0.4 Edn3 ND Nrg1 ND Aqp4 0.4 1.6 Ct 98.8 >35 ND >35 ND −1.6 −11.7 >35 >35 1.0 2.9
Tac1 ND ND −1.7 −4.0
Tuba1a ND ND 1.3 1.1
Fold change indicates the difference in gene expression observed between Gαq and ntg samples for a particular method. ND indicates not detected.

Discussion

Here, we describe methods for RNA sequencing and transcript analysis in mouse cardiac models, and demonstrate that RNA sequencing has advantages for quantitative analysis of cardiac-expressed transcripts in normal and hypertrophied mouse hearts.

The idea that transcriptional signatures can provide insight into the nature and/or cause of cardiac disease is decades old, but has yet to completely fulfill its promise. The laboratories of Nadal-Ginard and Chien were among the first to describe prototypical transcriptional changes in heart disease.29,32 There followed the description of a handful of “fetal genes” whose expression was strikingly increased in cardiac hypertrophy and/or failure.33–35 Measures of these few transcripts by Northern blotting, and more recently real-time quantitative PCR, became standard as early markers of cardiac pathology.1,2 In-depth analysis of the transcriptome has revealed specific transcriptional signatures for physiological versus pathological hypertrophy, and for early versus late heart failure.3

RNA sequencing takes advantage of massively parallel next generation DNA sequencing platforms that are increasingly available at academic institutions and in industry.
After development, optimization, and implementation of these techniques, preparation of bar-coded cDNA sequencing libraries from cardiac RNA took one individual 3 days to complete, the Illumina sequencing took 2 days, and sequence alignment and initial analysis were completed over a 2-day period. The total cost for sequencing eight mouse heart mRNA libraries (2 Illumina GA II sequencing lanes) was $2100, including reagents and instrument time. By comparison, the Core turnover for Affymetrix microarray results was 2 weeks, at a cost of $3600 for 8 cardiac mRNA profiles.

Most importantly, RNA sequencing provided accurate quantitative digital data on lower abundance transcripts where most of the gene regulation was found, but that were measured less reliably on microarrays. The arrays appear to be reporting changes in relative gene expression for large numbers of mRNAs that are expressed at absolute levels below those that are optimal for array-based measurements, 20 copies per cell. In contrast, RNA sequencing appeared exquisitely sensitive to changes in the abundance of these rare transcripts, as 94 of the 125 Gαq-regulated mRNAs reported by this technique were present at less than 20 copies per cell in the control, nontransgenic hearts. Thus, the major disparity between the microarray transcriptional profiling and RNA sequencing occurs with mRNAs expressed at low levels: microarrays falsely reported regulation of the rare transcripts, but also failed to detect changes in almost one quarter (36 of 125) of significantly regulated low-abundance mRNAs. We do note that whole-organ transcriptomes, such as we have studied here, are not cell-type specific and thus some low-abundance mRNAs (especially at or below 1 copy/cell) may not originate from cardiac myocytes. However, cardiac myocyte-specific transgenesis in mouse hearts can provide cell-autonomous data in the in vivo context, and even after physiological modeling, that is not possible with cultured cells.
This is especially important because elucidation of new signaling pathways from RNA sequencing studies will likely involve transcripts expressed at lower abundance, in accordance with our observation that high- and low-abundance Gαq-regulated transcripts generated different signaling networks.

Another potential advantage of RNA sequencing over traditional microarrays is the ability to easily identify alternatively spliced transcripts. Although the newer generation of Affymetrix microarrays uses probe sets that cover multiple exons of a gene, the limited ability of array hybridization techniques to provide quantitative information on gene or exon expression complicates nonbiased identification and quantification of alternatively spliced isoforms. In addition, the TopHat/Cufflinks analytic software we use provides the ability to detect both known and novel splicing events by aligning sequence reads to the entire genome, whereas microarrays are limited by the probe sets placed on the arrays. Although this aspect of RNA sequencing was not the focus of the present studies, we explored the potential of this application for several cardiac-expressed genes known to undergo alternative splicing: Both transcripts for Atp2a2 (SERCA2),36 both cardiac-expressed transcripts of the creatine transporter Slc6a8,37 3 of 5 known splice forms of Cacna1c (Cav1.2),38 3 of 5 alternative transcripts of Ank2 (ankyrin B),39 and both transcripts of Bnip3l (Nix)18 were detected, although we failed to observe alternative splicing of Ccrk40 that is expressed at less than 1 copy per cell. These findings suggest that RNA sequencing can readily detect alternative splicing events, but that more sequencing reads may be required to detect rare alternatively spliced isoforms.

Unbiased transcriptional profiling of mouse cardiac models has increasingly been used to identify the impact of molecular perturbations on heart development, adaptation, and function, and to define specific molecular mediators of these effects.41 The potential for RNA sequencing to detect subtle regulated events is great, but the technology is relatively new, the creation and bar-coding of libraries is unfamiliar to many laboratories, and the analytic platforms can be challenging. Here, we describe an RNA-sequencing pipeline that, once it is implemented, is essentially turn-key in its operation. Although RNA sequencing is the de facto “gold standard” technique for identifying and quantifying transcripts, to our knowledge it has not been applied to profiling of mouse cardiac genes, and so we were careful to compare the results of RNA sequencing to microarray profiling in the well-characterized Gαq model of cardiomyopathy. We found that Illumina RNA sequencing is rapid, accurate, highly sensitive for identifying both abundant and rare transcripts, and has significant advantages in time- and cost-efficiencies over Affymetrix microarray analysis. We expect that these advantages will not only be similar with any massively parallel RNA sequencing platform but that the relative benefits of RNA sequencing over microarrays will continue to increase as the technology advances.

Figure 6. Signaling networks of Gαq-regulated transcripts. Ingenuity Pathways Analysis software (http://www.ingenuity.com) was used to depict potential signaling pathways between high-abundance (A) and low-abundance (B) Gαq-regulated gene products. Lines with arrowheads indicate molecule acts on a target; lines without arrowheads, binding only; solid lines, direct interaction; dotted lines, indirect interaction; blue background, high-abundance genes; green background, low-abundance genes; white background, member of signaling pathway but not regulated in Gαq hearts.

Sources of Funding
Supported by NIH R01 grants HL087871 and HL059888 and National Center for Research Resources grant UL1 RR024992.

Disclosures
None.

References
1. Dorn GW II, Robbins J, Sugden PH. Phenotyping hypertrophy - eschew obfuscation. Circ Res. 2003;92:1171–1175.
2. Dorn GW II. The fuzzy logic of physiological cardiac hypertrophy. Hypertension. 2007;49:962–970.
3. Dorn GW II, Matkovich SJ. Put your chips on transcriptomics. Circulation. 2008;118:216–218.
4. D’Angelo DD, Sakata Y, Lorenz JN, Boivin GP, Walsh RA, Liggett SB, Dorn GW II. Transgenic Gαq overexpression induces cardiac contractile failure in mice. Proc Natl Acad Sci U S A. 1997;94:8121–8126.
5. Dorn GW II, Tepe NM, Lorenz JN, Koch WJ, Liggett SB. Low- and high-level transgenic expression of β2-adrenergic receptors differentially affect cardiac hypertrophy and function in Gαq-overexpressing mice. Proc Natl Acad Sci U S A. 1999;96:6400–6405.
6. Diwan A, Wansapura J, Syed FM, Matkovich SJ, Lorenz JN, Dorn GW II. Nix-mediated apoptosis links myocardial fibrosis, cardiac remodeling, and hypertrophy decompensation. Circulation. 2008;117:396–404.
7. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5:621–628.
8. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25.
9. Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009;25:1105–1111.
10. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L.
Transcript assembly and abundance estimation from RNA-Seq reveals thousands of new transcripts and switching among isoforms. Nat Biotechnol. In press.
11. Matkovich SJ, Van Booven DJ, Youker KA, Torre-Amione G, Diwan A, Eschenbacher WH, Dorn LE, Watson MA, Margulies KB, Dorn GW II. Reciprocal regulation of myocardial microRNAs and messenger RNA in human cardiomyopathy and reversal of the microRNA signature by biomechanical support. Circulation. 2009;119:1263–1271.
12. Matkovich SJ, Wang W, Tu Y, Eschenbacher WH, Dorn LE, Condorelli G, Diwan A, Nerbonne JM, Dorn GW II. MicroRNA-133a protects against myocardial fibrosis and modulates electrical repolarization without affecting hypertrophy in pressure-overloaded adult hearts. Circ Res. 2010;106:166–175.
13. Maere S, Heymans K, Kuiper M. BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics. 2005;21:3448–3449.
14. Matkovich SJ, Van Booven DJ, Hindes A, Kang MY, Druley TE, Vallania FL, Mitra RD, Reilly MP, Cappola TP, Dorn GW II. Cardiac signaling genes exhibit unexpected sequence diversity in sporadic cardiomyopathy, revealing HSPB7 polymorphisms associated with disease. J Clin Invest. 2010;120:280–289.
15. Akhter SA, Luttrell LM, Rockman HA, Iaccarino G, Lefkowitz RJ, Koch WJ. Targeting the receptor-Gq interface to inhibit in vivo pressure overload myocardial hypertrophy. Science. 1998;280:574–577.
16. Adams JW, Sakata Y, Davis MG, Sah VP, Wang Y, Liggett SB, Chien KR, Brown JH, Dorn GW II. Enhanced Gαq signaling: a common pathway mediates cardiac hypertrophy and apoptotic heart failure. Proc Natl Acad Sci U S A. 1998;95:10140–10145.
17. Aronow BJ, Toyokawa T, Canning A, Haghighi K, Delling U, Kranias E, Molkentin JD, Dorn GW II. Divergent transcriptional responses to independent genetic causes of cardiac hypertrophy. Physiol Genomics. 2001;6:19–28.
18. Yussman MG, Toyokawa T, Odley A, Lynch RA, Wu G, Colbert MC, Aronow BJ, Lorenz JN, Dorn GW II. Mitochondrial death protein Nix is induced in cardiac hypertrophy and triggers apoptotic cardiomyopathy. Nat Med. 2002;8:725–730.
19. Sakata Y, Hoit BD, Liggett SB, Walsh RA, Dorn GW II. Decompensation of pressure-overload hypertrophy in Gαq-overexpressing mice. Circulation. 1998;97:1488–1495.
20. Wu G, Toyokawa T, Hahn H, Dorn GW II. Epsilon protein kinase C in pathological myocardial hypertrophy. Analysis by combined transgenic expression of translocation modifiers and Galphaq. J Biol Chem. 2000;275:29927–29930.
21. Syed F, Odley A, Hahn HS, Brunskill EW, Lynch RA, Marreez Y, Sanbe A, Robbins J, Dorn GW II. Physiological growth synergizes with pathological genes in experimental cardiomyopathy. Circ Res. 2004;95:1200–1206.
22. Satoh M, Matter CM, Ogita H, Takeshita K, Wang CY, Dorn GW II, Liao JK. Inhibition of apoptosis-regulated signaling kinase-1 and prevention of congestive heart failure by estrogen. Circulation. 2007;115:3197–3204.
23. Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 2008;18:1509–1517.
24. Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, Burge CB. Alternative isoform regulation in human tissue transcriptomes. Nature. 2008;456:470–476.
25. Goffart S, Kleist-Retzow JC, Wiesner RJ. Regulation of mitochondrial proliferation in the heart: power-plant failure contributes to cardiac failure in hypertrophy. Cardiovasc Res. 2004;64:198–207.
26. Dorn GW II. Apoptotic and non-apoptotic programmed cardiomyocyte death in ventricular remodelling. Cardiovasc Res. 2009;81:465–473.
27. Dorn GW II, Brown JH. Gq signaling in cardiac adaptation and maladaptation. Trends Cardiovasc Med. 1999;9:26–34.
28. Kim JB, Porreca GJ, Song L, Greenway SC, Gorham JM, Church GM, Seidman CE, Seidman JG. Polony multiplex analysis of gene expression (PMAGE) in mouse hypertrophic cardiomyopathy. Science. 2007;316:1481–1484.
29. Chien KR, Knowlton KU, Zhu H, Chien S. Regulation of cardiac gene expression during myocardial growth and hypertrophy: molecular studies of an adaptive physiologic response. FASEB J. 1991;5:3037–3046.
30. Draghici S, Khatri P, Eklund AC, Szallasi Z. Reliability and reproducibility issues in DNA microarray measurements. Trends Genet. 2006;22:101–109.
31. Lister R, Gregory BD, Ecker JR. Next is now: new technologies for sequencing of genomes, transcriptomes, and beyond. Curr Opin Plant Biol. 2009;12:107–118.
32. Izumo S, Lompre AM, Matsuoka R, Koren G, Schwartz K, Nadal-Ginard B, Mahdavi V. Myosin heavy chain messenger RNA and protein isoform transitions during cardiac hypertrophy: interaction between hemodynamic and thyroid hormone-induced signals. J Clin Invest. 1987;79:970–977.
33. Bishopric NH, Simpson PC, Ordahl CP. Induction of the skeletal alpha-actin gene in alpha 1-adrenoceptor-mediated hypertrophy of rat cardiac myocytes. J Clin Invest. 1987;80:1194–1199.
34. Stockmann PT, Will DH, Sides SD, Brunnert SR, Wilner GD, Leahy KM, Wiegand RC, Needleman P. Reversible induction of right ventricular atriopeptin synthesis in hypertrophy due to hypoxia. Circ Res. 1988;63:207–213.
35. Waspe LE, Ordahl CP, Simpson PC. The cardiac β-myosin heavy chain isogene is induced selectively in α1-adrenergic receptor-stimulated hypertrophy of cultured rat heart myocytes. J Clin Invest. 1990;85:1206–1214.
36. Ver Heyen M, Heymans S, Antoons G, Reed T, Periasamy M, Awede B, Lebacq J, Vangheluwe P, Dewerchin M, Collen D, Sipido K, Carmeliet P, Wuytack F.
Replacement of the muscle-specific sarcoplasmic reticulum Ca2+-ATPase isoform SERCA2a by the nonmuscle SERCA2b homologue causes mild concentric hypertrophy and impairs contraction-relaxation of the heart. Circ Res. 2001;89:838–846.
37. Martinez-Munoz C, Rosenberg EH, Jakobs C, Salomons GS. Identification, characterization and cloning of SLC6A8C, a novel splice variant of the creatine transporter gene. Gene. 2008;418:53–59.
38. Liao P, Yong TF, Liang MC, Yue DT, Soong TW. Splicing for alternative structures of Cav1.2 Ca2+ channels in cardiac and smooth muscles. Cardiovasc Res. 2005;68:197–203.
39. Hashemi SM, Hund TJ, Mohler PJ. Cardiac ankyrins in health and disease. J Mol Cell Cardiol. 2009;47:203–209.
40. Qiu H, Dai H, Jain K, Shah R, Hong C, Pain J, Tian B, Vatner DE, Vatner SF, Depre C. Characterization of a novel cardiac isoform of the cell cycle-related kinase that is regulated during heart failure. J Biol Chem. 2008;283:22157–22165.
41. Genomics of Cardiovascular Development, Adaptation, and Remodeling. NHLBI Program for Genomic Applications, Harvard Medical School. http://www.cardiogenomics.org. Accessed December 16, 2009.

Novelty and Significance

What Is Known?
● Accurate, unbiased gene transcription profiling is necessary for understanding disease mechanisms and can assist in determining diagnosis and prognosis.
● Large-scale transcriptional profiling has been performed using microarrays for several years, but expense, limited dynamic range, and background correction remain problematic.

What New Information Does This Article Contribute?
● Next-generation (high-throughput) sequencing of RNA was compared to microarray analysis in a case-control design, using the established Gq-overexpression mouse model of heart failure.
● DNA-barcoding of individual heart samples permits multiple samples to be sequenced simultaneously, bringing the cost of RNA sequencing below that of microarrays.
● RNA sequencing provides quantitative gene expression data and reveals a greater dynamic range of gene regulation.
● RNA sequencing offers insight into regulation of low-abundance genes that microarrays cannot.

RNA profiling offers detailed insight into critical transcriptional regulatory mechanisms in health and disease. The value of transcriptional profiling is affected by the accuracy of the data and the sensitivity to detect changes in expression of uncommon mRNAs. RNA sequencing using massively parallel next generation platforms has the potential to enhance accuracy and depth of transcriptional signatures. We compared cardiac gene expression profiling using RNA sequencing and the present state-of-the-art microarrays, applying both platforms to analysis of mRNA expression in the Gq-overexpression mouse model of cardiac hypertrophy. We describe how to individually DNA barcode RNA sequencing libraries from individual hearts for batch sequencing that reduces cost while providing digital mRNA expression data at or below the level of 1 RNA copy per cell. We found that RNA sequencing and microarrays provided comparable data on regulation of high-abundance genes, but that RNA sequencing was superior for detection and quantitation of low-abundance genes, which represent the majority of regulated genes in the Gαq model. The widespread implementation of RNA sequencing in disease studies should enhance diagnostic and prognostic profiling, facilitating a more detailed description of signaling mechanisms involving low-abundance genes that were previously missed with microarrays.

Deep mRNA Sequencing for In Vivo Functional Analysis of Cardiac Transcriptional Regulators: Application to Gαq
Scot J. Matkovich, Yan Zhang, Derek J. Van Booven and Gerald W. Dorn II
Circ Res. 2010;106:1459-1467; originally published online April 1, 2010; doi: 10.1161/CIRCRESAHA.110.217513
Circulation Research is published by the American Heart Association, 7272 Greenville Avenue, Dallas, TX 75231. Copyright © 2010 American Heart Association, Inc. All rights reserved. Print ISSN: 0009-7330. Online ISSN: 1524-4571
The online version of this article, along with updated information and services, is located on the World Wide Web at: http://circres.ahajournals.org/content/106/9/1459
Data Supplement (unedited) at: http://circres.ahajournals.org/content/suppl/2010/04/01/CIRCRESAHA.110.217513.DC1

Supplemental Material

Detailed Methods

Mouse models
The Gαq-40 transgenic mice were generated on the FVB/N background in the Dorn laboratory, and have been described previously 1, 2. Four pairs of 8-week-old male nontransgenic FVB/N and Gαq-40 transgenic mouse hearts were used for RNA sequencing studies. Echocardiographic and cardiac catheterization studies were performed using standard methods 1, 3.

Sequencing and data processing
Four barcoded libraries were combined in equimolar (10 nmol/L) amounts and diluted to 4 pmol/L for cluster formation on a single Illumina Genome Analyzer II flowcell lane.
Basecalling of DNA clusters was performed using Illumina’s processing pipeline software (version 1.5), and 36-nt sequences with quality scores were obtained in Illumina’s SCARF text format. The first 4 nucleotides of each 36-nt read corresponded to the barcode (3 nucleotides plus an additional, invariant T). The UNIX utilities grep, sed and awk were used to sort the sequences according to barcode, remove the barcoding nucleotides, and convert sequence data to FASTQ format. After barcode removal, the 32-base mRNA sequence reads were mapped to transcripts annotated in NCBI release 37 of the mouse genome using the publicly available packages Bowtie (release 0.12.0) (http://bowtie-bio.sourceforge.net/index.shtml) 4, TopHat (release 1.0.12) (http://tophat.cbcb.umd.edu/) 5, and Cufflinks (release 0.7.0) (http://cufflinks.cbcb.umd.edu/) 6. Bowtie maps short reads to reference sequences such as whole genomes. TopHat and Cufflinks map known and novel splice junctions, use annotation files to compute which aligned sequences map to the known transcriptome, and take into account transcript isoform diversity (alternative splicing). Cufflinks may be used with gene annotation files to calculate overall gene expression in terms of Reads Per Kb of exon per Million mapped reads (RPKM), a parameter previously defined in 7. (The current release of Cufflinks (0.8.1) uses the term Fragments Per Kb of exon per Million mapped reads (FPKM) 6, which is equivalent to RPKM for single-end sequencing, as we have performed here.) We used the default options supplied with these software packages in our analyses. Annotation files in gtf format (http://mblab.wustl.edu/GTF22.html), including mRNAs, ESTs and noncoding elements such as ribosomal RNAs and miRNA precursors, were downloaded from Ensembl (ftp://ftp.ensembl.org/pub/current_gtf/mus_musculus/).
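The RPKM measure named above can be illustrated with a short calculation. The following is only a sketch under stated assumptions: the function names and the example read counts are hypothetical and are not part of Cufflinks, and the conversion to copies per cell assumes the correspondence quoted in the Supplemental Table II legend (1 copy per cell for an RPKM value of 3, after Mortazavi et al.).

```shell
#!/bin/sh
# Illustrative sketch only; function names are hypothetical, not Cufflinks commands.

# rpkm READS EXON_LENGTH_BP TOTAL_MAPPED_READS
# RPKM = reads * 10^9 / (exon length in bp * total mapped reads)
rpkm() {
  awk -v reads="$1" -v len="$2" -v total="$3" \
    'BEGIN { printf "%.2f\n", (reads * 1e9) / (len * total) }'
}

# copies_per_cell RPKM
# Assumption: 1 copy per cell corresponds to an RPKM value of 3.
copies_per_cell() {
  awk -v value="$1" 'BEGIN { printf "%.2f\n", value / 3 }'
}

# Hypothetical gene: 300 reads on 2000 bp of exon, 5,000,000 mapped reads
rpkm 300 2000 5000000     # prints 30.00
copies_per_cell 30        # prints 10.00
```

In practice these values come directly from the Cufflinks output; the sketch is meant only to make the unit explicit.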
We used the Cufflinks module (above) with annotation files from which ESTs, rRNAs and miRNA precursors had been manually removed, to focus on sequence reads mapping to coding genes. We analyzed only those RNA elements that had expression signals in at least 2 of 4 biological replicates.

The gtf annotation files from Ensembl contain all possible separate transcripts (identified with ENST labels) for each gene (identified with ENSG labels). In order to assess alternative splicing events, we used the Cufflinks module with these same gtf annotation files, but forced calculation of RPKM values for separate transcripts (ENSTs) rather than for entire genes (ENSGs). The presence of alternatively spliced products was evaluated by manually examining Cufflinks output files for ENST entries corresponding to an ENSG entry of interest.

Example shell script for extraction of two barcoded sequences from Illumina pipeline 1.5 SCARF text file (rawsequence.txt), removal of 4-nucleotide barcodes, and conversion to FASTQ format

Extract from SCARF text file (rawsequence.txt):

    …
    HWI-EAS158:7:1:18:1532#0/1:ATGTGCCGGGCGAGCCGCTCTGCGCTAGGCGCTCAG:ab`aaaabbabb`baababaaa`aaaaaaaaa`aaa
    HWI-EAS158:7:1:17:190#0/1:TGATGTTGACTTTGTCCACCTGGAACTTGGTCTCAA:aaaaabaaaaaaaaa`a`a^__`aa`^`a]a]_`__
    …

Shell script:

    # Select reads carrying the ATGT barcode
    grep ':ATGT' rawsequence.txt > ATGT-barcoded.seq
    # Strip the barcode, split fields on ':', and write FASTQ records (quality string trimmed by 4 characters)
    sed 's/:ATGT/:/' ATGT-barcoded.seq | awk '{gsub(":","\t");print}' | awk '{print "@" $1 "_" $2 "_" $3 "_" $4 "_" $5 "\n" $6 "\n" "+" "\n" substr($7,5,36) }' > ATGTsequences.fastq
    # Repeat for the TGAT barcode
    grep ':TGAT' rawsequence.txt > TGAT-barcoded.seq
    sed 's/:TGAT/:/' TGAT-barcoded.seq | awk '{gsub(":","\t");print}' | awk '{print "@" $1 "_" $2 "_" $3 "_" $4 "_" $5 "\n" $6 "\n" "+" "\n" substr($7,5,36) }' > TGATsequences.fastq

Example shell script for mapping of FASTQ sequences to the mouse genome by TopHat, followed by annotation and RPKM calculations for whole genes, using Cufflinks with a gtf annotation file

    tophat --solexa1.3-quals -o ~/ATGTsequences-processed ~/bowtie_database/m_musculus/indexname ~/ATGTsequences.fastq
    cd ~/ATGTsequences-processed
    cufflinks -G ~/bowtie_database/m_musculus/index.gtf accepted_hits.sam

Example shell script for mapping of FASTQ sequences to the mouse genome by TopHat, followed by annotation and RPKM calculations for separate transcript components to permit alternative splicing analysis

    tophat --solexa1.3-quals -o ~/ATGTsequences-processed ~/bowtie_database/m_musculus/indexname ~/ATGTsequences.fastq
    cd ~/ATGTsequences-processed
    cufflinks accepted_hits.sam
    cuffcompare -o summstats.txt -r ~/bowtie_database/m_musculus/index.gtf transcripts.gtf

Reverse-transcription quantitative PCR
One microgram of total cardiac RNA was reverse-transcribed into cDNA using oligo-dT priming (qScript Flex cDNA kit, Quanta Biosciences). One-twentieth of each preparation was used for each individual qPCR, using one of the following TaqMan probes (Applied Biosystems): Nppa, Mm01255747_g1; Atp2a2, Mm00437634_m1; Acta1, Mm00808218_g1; Myh7, Mm00600555_m1; Aqp4, Mm00802131_m1; Bcl2, Mm00477631_m1; Edn3, Mm00432986_m1; Fhl2, Mm00515781_m1; Nrg1, Mm00626552_m1; Tac1, Mm00436880_m1; Tuba1a, Mm00846967_g1; Gapdh, 4352339E.

Supplemental Figure I. Correspondence of gene expression determined by RNA sequencing vs microarray, for 8 individual hearts. Gene expression determined by RNA sequencing (Illumina) is plotted against gene expression determined by microarrays (Affymetrix). Values shown are log2(RPKM+1) for RNA sequencing (x-axis) and log2(Affymetrix signal units+1) for microarrays. A value of 1 was added to RPKM or Affymetrix signal units to avoid taking the log of 0.

Supplemental Table I: Sequencing read totals and alignment to reference genome for barcoded RNASeq samples.
Lane | Sample | Recognized barcoded reads | Total raw reads | % barcode detection | Reads aligned to reference genome | % alignment
Lane 1 | ntgATGT | 3426712 | | | 1885165 | 55.0
Lane 1 | ntgTGAT | 2349661 | | | 1546833 | 65.8
Lane 1 | ntgTGCT | 1735078 | | | 1304290 | 75.2
Lane 1 | ntgTGTT | 1694057 | | | 766577 | 45.3
Lane 1 | Total | 9205508 | 11216862 | 82.1 | 5502865 | 59.8
Lane 2 | GqTCAT | 2882044 | | | 1676644 | 58.2
Lane 2 | GqTCCT | 2020058 | | | 1334172 | 66.0
Lane 2 | GqTCTT | 1343763 | | | 411077 | 30.6
Lane 2 | GqTGGT | 2765911 | | | 1607088 | 58.1
Lane 2 | Total | 9011776 | 9591887 | 94.0 | 5028981 | 55.8
Both lanes | Total | 18217284 | 20808749 | 87.5 | 10531846 | 57.8

Supplemental Table I. Alignment of sequencing reads to the mouse genome. The number of sequencing reads with recognizable barcodes was compared to the total number of sequencing reads obtained for the entire sequencing lane. In addition, the percentage of barcoded reads that were successfully aligned to the mouse genome was calculated.

Supplemental Table II is supplied as an Excel datasheet.
Supplemental Table II. Highly expressed genes in nontransgenic mouse heart, determined by RNA sequencing. The 234 highest-expressing genes (equal or greater than 60 copies per cell) in nontransgenic mouse hearts (mean ± SEM, n=4) are shown, ranked from highest to lowest expression. 1 copy per cell corresponds to an RPKM value of 3 according to Mortazavi et al. 7.

Supplemental Table III is supplied as an Excel datasheet.
Supplemental Table III. Highly expressed genes in Gq mouse hearts, determined by RNA sequencing. The 236 highest-expressing genes (equal or greater than 60 copies per cell) in Gq transgenic mouse hearts (mean ± SEM, n=4) are shown, ranked from highest to lowest expression.

Supplemental Table IV is supplied as an Excel datasheet.
Supplemental Table IV. Complete gene expression profile in individual nontransgenic and Gq mouse hearts, determined by RNA sequencing. All genes detected by RNA sequencing in individual hearts are shown, ranked from highest to lowest copy number per cell in nontransgenic hearts (mean ± SEM).
Genes detected in only 1 of the 4 biological replicates (no value for standard error) were excluded from further analysis.

Supplemental Table V: Highly-expressed genes in mouse hearts by GO category

GO-ID | Description | Genes
6915 | apoptosis | ACTC1,BAG3,ITM2B,NDUFA13,PSAP,SOD2,TNS1,VDAC1
7166 | cell surface receptor linked signal transduction | AES,CRYAB,GNAS,GNB2L1
5856 | cytoskeleton | ACTA1,ACTB,ACTC1,ACTN2,CSRP3,DES,GSN,HSPB7,IVNS1ABP,LDB3,LMOD2,MAP1LC3A,MYBPC3,MYH6,MYH7,MYL2,MYL3,MYL4,MYL7,MYOM1,MYOZ2,PAM,TNNT2,TPM1,TTN,TUBA4A
5783 | endoplasmic reticulum | ATP2A2,HERPUD1,HRC,MGST3,PLN,PTGDS,S100A1,SLN,SRL
6006 | glucose metabolic process | ALDOA,ENO3,LDHA,LDHB,MDH1,MDH2,OGDH,PDHA1,PDK2,PFKM,PGAM2,TPI1
6629 | lipid metabolic process | ACAA2,ANKRD23,APOE,ACADL,ACADM,ATP5A1,ACADVL,ATP5B,CPT1B,ECH1,FABP3,HADHA,LPL,PHYH,PSAP,PTGDS,TPI1
5746 | mitochondrial respiratory chain | COX6B1,COX7B,CYC1,COX6C,COX7A1,MT-CO1,COX7A2,MT-CO2,MT-CYTB,MT-ND3,MT-ND4,NDUFA1,NDUFA10,NDUFA11,NDUFA13,NDUFA2,NDUFA3,NDUFA4,NDUFA6,NDUFA8,NDUFB10,NDUFB5,NDUFB7,NDUFB8,NDUFB9,NDUFC1,NDUFS2,NDUFS6,NDUFS7,NDUFV1,NDUFV2,UQCR,UQCRC1,UQCRC2,UQCRFS1,UQCRH,UQCRQ
5739 | mitochondrion | ACAA2,ACO2,AK1,ACADL,ACADM,ATP5A1,ACADVL,ATP5B,ATP5C1,ATP5D,ATP5E,ATP5F1,ATP5G1,ATP5G3,ATP5H,ATP5J,ATP5J2,ATP5K,ATP5O,BRP44,CHCHD10,CKMT2,COQ9,COX17,COX4I1,COX5A,COX5B,COX6A2,COX6B1,COX6C,COX7A1,COX7A2,COX7B,COX8B,CPT1B,CS,CTSD,CYC1,ECH1,ETFB,GBAS,HADHA,HSP90AB1,IDH2,IDH3B,IDH3G,LARS2,MDH2,MT-ATP6,MT-CO1,MT-CO2,MT-CYTB,MT-ND1,MT-ND3,MT-ND4,MT-ND6,NDUFA1,NDUFA10,NDUFA11,NDUFA13,NDUFA2,NDUFA3,NDUFA4,NDUFA6,NDUFA8,NDUFB10,NDUFB5,NDUFB7,NDUFB8,NDUFB9,NDUFC1,NDUFS2,NDUFS6,NDUFS7,NDUFV1,NDUFV2,NNT,OGDH,OXCT1,PDHA1,PDK2,PINK1,PSAP,SDHA,SDHB,SDHC,SLC25A3,SLC25A4,SOD2,UQCR,UQCRC1,UQCRC2,UQCRFS1,UQCRH,UQCRQ,VDAC1
30016 | myofibril | ACTA1,ACTC1,ACTN2,ANKRD1,ANKRD23,CRYAB,CSRP3,DES,HSPB1,LDB3,MYBPC3,MYH6,MYH7,MYL2,MYOM1,MYOZ2,PDLIM5,PYGM,TCAP,TNNI3,TNNT2,TPM1,TTN
5634 | nucleus | AES,ANKRD1,ANKRD23,CS,CSRP3,EEF1A2,FABP4,FHL2,HOPX,HSPB1,IVNS1ABP,LDB3,NDUFA13,PAM,PDE4DIP,PTGDS,TTN
7242 | intracellular signaling cascade | ATP2A2,GNAS,GNB2L1,MT1,PINK1,PSAP,SOD2,TNS1
45449 | regulation of transcription | AES,ANKRD1,FABP4,FHL2,HOPX,PAM
6950 | response to stress | ACADM,APOE,CRYAB,CTSD,GPX3,HERPUD1,HSP90AB1,HSPB1,HSPB6,HSPB7,HSPB8,IDH2,MB,SOD2,TNS1,VDAC1
6416 | translation | EEF1A2,EEF2,LARS2,RPL8,RPLP1,RPS5,RPS9
6810 | transport | APOE,ATP1A1,ATP2A2,ATP5A1,ATP5B,ATP5C1,ATP5D,ATP5E,ATP5F1,ATP5G1,ATP5G3,ATP5H,ATP5J,ATP5J2,ATP5K,ATP5O,COX17,CPT1B,CYC1,ETFB,FABP3,FABP4,FTH1,FXYD1,GSN,HBA-A1,HBA-A2,HBB-B1,HBB-B2,MB,MT-ATP6,MT-CO1,MT-CO2,MT-CYTB,MT-ND3,MT-ND4,MYH6,NDUFA1,NDUFA10,NDUFA11,NDUFA13,NDUFA2,NDUFA3,NDUFA4,NDUFA6,NDUFA8,NDUFB10,NDUFB5,NDUFB7,NDUFB8,NDUFB9,NDUFC1,NDUFS2,NDUFS6,NDUFS7,NDUFV1,NDUFV2,NNT,PLN,PSAP,PTGDS,SDHA,SDHB,SDHC,SLC25A3,SLC25A4,TUBA4A,UQCR,UQCRC1,UQCRC2,UQCRFS1,UQCRH,UQCRQ,VDAC1

Supplemental Table V. Genes expressed at greater than 60 copies per cell in nontransgenic hearts, determined by RNA sequencing, are presented by Gene Ontology categories (GO-IDs).

Supplemental Table VI is supplied as an Excel datasheet.
Supplemental Table VI. Differential gene expression in Gq-overexpressing hearts compared to nontransgenic hearts, as determined by microarrays. Genes were defined as differentially expressed at P<0.001 (FDR<0.03), magnitude of fold-change ≥1.3. Genes are ranked by fold-change in expression, from highest to lowest. Data from RNA sequencing for the same genes are shown, and significant regulation reported by RNA sequencing is indicated.
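The selection rule stated in this legend (P<0.001 and magnitude of fold-change ≥1.3) can be sketched as a small awk filter. This is an illustration only, not the authors' analysis code: the three-column tab-separated layout (gene, signed fold change, p-value) and the function name filter_regulated are assumptions, and the example rows are microarray values transcribed from the extracted Supplemental Table VII, so they should be treated as illustrative.

```shell
#!/bin/sh
# Illustrative sketch; the TSV layout (gene, fold change, p-value) and the
# function name are assumptions, not the analysis code used in the study.

filter_regulated() {
  awk -F '\t' -v p_max=0.001 -v fc_min=1.3 '
    NR == 1 { next }                       # skip the header row
    {
      fc  = $2 + 0                         # signed fold change
      p   = $3 + 0                         # p-value, scientific notation accepted
      mag = (fc < 0) ? -fc : fc            # magnitude of fold change
      if (p < p_max && mag >= fc_min) print $1
    }'
}

printf 'gene\tfold_change\tp_value\nNppa\t13.4\t2.6E-10\nArc\t1.1\t2.3E-01\nKlhl34\t1.6\t1.8E-03\nFhl2\t-2.6\t5.1E-08\nIdh3b\t-1.2\t1.2E-04\n' | filter_regulated
# prints Nppa and Fhl2, one per line
```

Negative fold changes are handled by comparing the magnitude, which matches the signed convention used for downregulated genes in the tables. The FDR criterion is not reproduced here because it depends on the full set of tested genes.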
Supplemental Table VII: Differential gene expression in Gq-overexpressing compared to nontransgenic hearts

Gene        ntg      Gq       Fold-     p-value   Fold-     p-value
symbol      copies/  copies/  change    (RNASeq)  change    (array)
            cell     cell     (RNASeq)            (array)

Upregulated
Gnaq        2        574      365.9     1.4E-06   15.1      6.9E-15
Nppa        43       2604     60.2      5.9E-04   13.4      2.6E-10
Gdf15       2        62       26.5      4.2E-04   4.0       9.8E-07
Fgf6        0.4      5        11.2      3.4E-05   4.1       7.5E-08
Thbs4       0.7      6        8.6       6.3E-04   9.7       3.8E-09
Acta1       127      897      7.0       2.5E-05   2.6       1.8E-06
Synpo2l     20       117      5.9       1.4E-04   2.3       5.1E-07
Bcl2        0.4      2        5.4       1.6E-04   2.8       9.3E-09
Nppb        124      635      5.1       2.0E-06   3.0       1.0E-07
Phlda1      7        30       4.5       8.0E-04   1.9       9.4E-05
Scgb1c1     4        16       4.1       7.5E-04   2.7       1.2E-05
Rcan1       19       75       4.0       6.5E-05   2.3       1.1E-06
Etv5        1        4        3.7       4.4E-04   2.6       1.0E-05
Arc         0.3      1        3.7       4.4E-04   1.1       2.3E-01
Casq1       11       41       3.6       1.3E-05   2.5       8.4E-09
Klhl34      2        6        3.6       7.1E-04   1.6       1.8E-03
Dusp27      3        9        3.4       7.7E-04   2.4       7.6E-06
Dbn1        1        4        3.3       5.6E-06   1.6       5.3E-07
Nuak1       5        16       3.3       2.8E-04   2.4       1.4E-05
Ankrd1      204      671      3.3       2.0E-06   1.5       9.3E-06
Cyr61       15       47       3.2       5.3E-05   2.3       4.0E-05
Ppm1e       0.5      2        3.1       2.0E-04   2.8       3.4E-07
Fbxl7       0.2      0.6      3.1       3.4E-04   1.6       3.1E-05
Tnfrsf12a   20       64       3.1       2.9E-04   2.3       3.9E-07
Ckap4       3        10       3.0       9.3E-05   1.9       2.1E-05
Mybpc2      4        13       3.0       1.6E-05   2.3       1.2E-06
Otud1       7        22       3.0       3.1E-05   2.4       2.6E-04
Masp1       2        6        2.9       6.8E-05   1.7       1.4E-04
Spry4       1        3        2.8       8.2E-05   1.3       3.3E-02
Irx5        0.7      2        2.8       1.8E-04   1.4       1.2E-03
Uck2        5        13       2.8       6.4E-04   2.3       5.3E-04
Col1a1      11       29       2.7       3.2E-04   2.3       2.0E-09
Col15a1     9        23       2.6       6.2E-04   1.7       1.1E-05
Abra        11       29       2.6       1.1E-04   2.1       8.8E-06
Pfkp        4        9        2.6       8.6E-05   1.7       1.7E-04
Arhgap24    2        4        2.4       5.0E-04   1.2       3.2E-03
Rasa3       1        3        2.4       7.7E-05   1.5       5.6E-04
Frmd6       2        4        2.4       2.4E-04   1.5       6.4E-04
Napepld     0.3      0.6      2.3       9.6E-04   1.5       7.9E-04
Thap1       0.9      2        2.3       1.1E-03   -1.1      2.5E-01
Xirp2       19       44       2.3       4.8E-04   2.3       2.9E-07
Serpinb6b   3        7        2.3       1.7E-04   1.9       1.9E-07
Itgb5       10       22       2.3       7.8E-05   1.8       3.5E-08
Surf6       0.6      1        2.2       2.4E-05   1.0       4.7E-01
Tjap1       5        12       2.2       4.8E-04   1.7       2.2E-05
Camta1      5        12       2.2       5.6E-05   1.7       3.4E-07
Itga9       4        9        2.2       3.0E-04   1.8       8.4E-07
Qsox1       4        9        2.2       1.2E-04   1.5       1.4E-06
Btg2        4        8        2.1       6.8E-04   1.7       9.4E-04
Hn1         12       26       2.1       1.6E-04   1.7       6.1E-06
Nfatc2      2        3        2.0       2.4E-04   1.5       1.5E-05
Myom2       66       135      2.0       1.8E-05   1.6       2.7E-03
Enah        18       37       2.0       3.1E-04   1.7       2.2E-05
Nrap        55       112      2.0       4.8E-05   1.6       7.4E-06
Col1a2      11       21       1.9       7.6E-04   1.9       3.5E-08
Lrrc10      55       105      1.9       2.3E-04   1.7       8.9E-05
Prkab2      3        5        1.9       1.1E-03   1.5       1.5E-03
Slc38a2     26       46       1.8       6.6E-04   1.5       3.4E-05
Tsc22d1     32       57       1.8       8.3E-04   1.3       5.9E-04
Heyl        2        4        1.8       1.0E-03   1.3       2.9E-05
Ttpal       2        3        1.8       9.5E-04   1.3       1.5E-04
Grb10       11       19       1.7       5.5E-05   1.4       4.3E-05
Sorbs2      39       67       1.7       3.3E-04   1.6       5.9E-05
Prmt2       2        3        1.7       9.1E-04   1.4       1.3E-04
Mmp2        8        13       1.7       1.5E-04   1.5       9.0E-05
Mybpc3      261      424      1.6       2.3E-05   1.2       7.1E-05
Lamb2       41       65       1.6       5.6E-04   1.3       3.2E-05
Bambi       1        2        1.6       7.5E-04   1.2       5.7E-03
Myh10       2        3        1.6       7.4E-04   1.4       2.6E-05
Habp4       5        8        1.5       5.5E-04   1.2       2.2E-03
Plxnb2      9        14       1.5       1.0E-03   1.3       2.1E-04
Msn         11       17       1.5       6.1E-04   1.5       8.8E-07
Hspb6       182      267      1.5       5.6E-04   1.0       7.9E-01
Dnaja4      13       20       1.5       4.1E-04   1.3       1.0E-03
Plxna1      2        3        1.4       7.5E-04   1.1       2.0E-01
Cxcr7       10       14       1.4       7.2E-04   1.1       4.0E-01
Arl6ip5     20       26       1.3       2.0E-04   1.1       4.5E-01

Downregulated
Idh3b       85       63       -1.3      6.6E-04   -1.2      1.2E-04
Dlat        27       20       -1.3      1.0E-03   -1.2      3.6E-04
Fmo1        5        4        -1.4      5.0E-04   -1.2      1.6E-01
Ank         21       15       -1.4      2.8E-04   -1.3      9.7E-06
Pdha1       78       54       -1.4      9.3E-04   -1.2      2.5E-05
Hadha       128      88       -1.4      7.3E-04   -1.3      5.3E-06
Acadvl      99       68       -1.5      1.1E-04   -1.6      2.4E-07
Strn3       6        4        -1.5      3.8E-04   1.1       1.2E-01
Rxrg        8        5        -1.6      6.9E-04   -1.5      9.5E-05
Fyco1       22       13       -1.6      4.8E-04   -1.5      8.9E-07
Sh3kbp1     8        5        -1.7      9.1E-04   -1.2      1.0E-03
Ndufv2      99       58       -1.7      3.6E-04   -1.2      2.6E-04
Rhobtb1     28       16       -1.7      8.8E-04   -1.4      1.4E-04
Mapkapk3    3        2        -1.7      8.1E-04   -1.6      9.3E-05
Atp2a2      811      458      -1.8      1.8E-04   -1.2      2.1E-05
Acaa2       96       51       -1.9      4.6E-05   -1.7      3.9E-07
Npc1        8        4        -1.9      2.7E-05   -1.8      4.4E-07
Bcs1l       4        2        -1.9      5.4E-04   -1.1      3.4E-02
Qrsl1       3        2        -2.0      2.3E-04   -1.5      2.5E-04
Hsdl2       15       7        -2.0      8.2E-04   -1.5      5.2E-06
Nphp3       18       9        -2.0      7.6E-05   1.0       9.6E-01
Aldh6a1     20       10       -2.0      2.4E-04   -1.5      3.7E-07
Decr1       35       17       -2.0      5.3E-04   -1.6      1.6E-07
Suclg2      11       5        -2.0      8.1E-04   -1.5      1.5E-06
Epm2aip1    2        1        -2.1      6.2E-04   -1.1      9.5E-02
Grb14       24       12       -2.1      2.5E-04   -1.7      4.1E-08
Car14       20       9        -2.2      2.0E-04   -2.0      1.2E-07
Mtr         23       10       -2.3      3.2E-04   -1.0      7.3E-01
Fah         6        2        -2.5      9.3E-05   -2.3      3.9E-05
Nqo1        8        3        -2.6      8.6E-04   -1.7      2.2E-05
Scn4a       3        1        -2.6      6.9E-05   -2.5      7.0E-06
A2bp1       7        2        -2.6      4.8E-04   -1.9      4.0E-07
Slc22a3     0.8      0.3      -2.8      3.3E-04   -2.2      2.4E-06
Lrtm1       33       12       -2.8      1.9E-04   -2.6      6.8E-07
Pex11a      2        0.8      -2.8      3.8E-04   -1.8      6.6E-05
Olfml2b     5        2        -2.9      6.6E-04   -1.9      2.3E-06
Mylk4       14       5        -3.1      2.7E-04   -2.1      1.1E-05
Fhl2        167      50       -3.4      8.7E-05   -2.6      5.1E-08
Slc38a3     10       3        -3.5      1.6E-05   -2.5      8.3E-07
Nr4a3       3        0.7      -3.5      8.3E-05   -2.5      1.7E-04
Efnb3       11       3        -3.9      4.7E-05   -2.4      4.5E-07
Dusp18      7        1        -5.2      6.7E-05   -4.2      1.6E-07
Scn4b       5        1        -5.2      3.8E-04   -3.9      3.9E-06
Whrn        4        0.7      -5.3      6.8E-04   -2.2      6.7E-07
Dixdc1      0.5      0.1      -6.3      3.3E-04   -1.5      1.9E-04
Ces3        19       2        -10.0     2.5E-04   -7.7      2.4E-11
Cfd         9        0.7      -12.6     1.2E-04   1.2       1.7E-02
Tmem163     5        0.2      -28.3     6.9E-04   1.2       5.7E-02

Regulated (array): Y for 56 of the 77 upregulated genes and 33 of the 48 downregulated genes.

125 genes were defined as differentially expressed using RNA sequencing (P<0.001, FDR=0.1, magnitude of fold-change ≥1.3). Genes are listed with numbers of copies per cell, and ranked by fold-change in expression, from highest to lowest. Data from microarrays for the same genes are shown, and significant regulation reported by microarrays (P<0.001, fold-change ≥1.3) is indicated.

Supplemental Table VIII is supplied as an Excel datasheet.
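The thresholds stated for Supplemental Table VII (P<0.001 and fold-change magnitude ≥1.3) can be illustrated with a short Python sketch. This is not the authors' analysis pipeline: the function names are hypothetical, the FDR step is not implemented, and the signed fold-change convention (negative reciprocal for decreases) is an assumption inferred from the tabulated values. Fold-changes recomputed from the rounded copies/cell figures only approximate the tabulated ones.

```python
def signed_fold_change(ntg_copies: float, gq_copies: float) -> float:
    """Return the Gq/ntg ratio as a signed fold-change.

    Assumed convention (inferred from the table, not stated by the authors):
    increases are the plain ratio, decreases the negative reciprocal.
    """
    if ntg_copies <= 0 or gq_copies <= 0:
        raise ValueError("copies/cell must be positive")
    ratio = gq_copies / ntg_copies
    return ratio if ratio >= 1.0 else -1.0 / ratio


def meets_criteria(fold_change: float, p_value: float,
                   min_fold: float = 1.3, max_p: float = 0.001) -> bool:
    """Apply the caption's thresholds: P < 0.001 and |fold-change| >= 1.3."""
    return abs(fold_change) >= min_fold and p_value < max_p


# A few rows copied from Supplemental Table VII:
# (gene, ntg copies/cell, Gq copies/cell, array fold-change, array p-value)
rows = [
    ("Nppa", 43, 2604, 13.4, 2.6e-10),
    ("Arc", 0.3, 1, 1.1, 2.3e-01),
    ("Ces3", 19, 2, -7.7, 2.4e-11),
    ("Cfd", 9, 0.7, 1.2, 1.7e-02),
]

for gene, ntg, gq, array_fc, array_p in rows:
    approx_fc = signed_fold_change(ntg, gq)
    flagged = meets_criteria(array_fc, array_p)
    print(f"{gene}: RNASeq fold-change ~{approx_fc:.1f}, array regulated: {flagged}")
```

With these example rows, Nppa and Ces3 pass the array thresholds while Arc and Cfd do not, which matches the pattern described in the caption: some genes called by RNA sequencing are not reported as regulated by the arrays.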
Supplemental Table VIII. Ingenuity Pathways Analysis networks. The 77 upregulated and 48 downregulated genes in Gq hearts, defined by RNA sequencing, were used as input for automated signaling network analysis. Gene input lists were subdivided into a high-expressing (>20 copies/cell) set (first datasheet tab), a low-expressing (<20 copies/cell) set (second datasheet tab), or not divided (third worksheet tab). The ‘Focus Molecules’ column denotes the number of genes from the input lists that participate in a given signaling network.

Barcode sense oligo      5'-ACACTCTTTCCCTACACGACGCTCTTCCGATCTagtT-3'
Barcode antisense oligo  5Phos-actAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG-3'

Forward PCR primer       5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-3'
Reverse PCR primer       5'-CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT-3'

Sequencing primer        5'-ACACTCTTTCCCTACACGACGCTCTTCCGATCT-3'

Supplemental Table IX. Nucleotide sequences for custom ‘barcoded’ Illumina sequencing adapters. Top: Example oligonucleotides for Illumina sequencing adapters, using the barcode ‘agt’. Middle: Generic primers, independent of barcode, used for PCR amplification of gel-purified Illumina libraries. Bottom: Generic oligonucleotide for sequencing libraries on the Illumina Genome Analyzer.

References

1. D'Angelo DD, Sakata Y, Lorenz JN, Boivin GP, Walsh RA, Liggett SB, Dorn GW, II. Transgenic Gαq overexpression induces cardiac contractile failure in mice. Proc Natl Acad Sci USA. 1997;94:8121-8126.
2. Dorn GW, II, Tepe NM, Lorenz JN, Koch WJ, Liggett SB. Low- and high-level transgenic expression of β2-adrenergic receptors differentially affect cardiac hypertrophy and function in Gαq-overexpressing mice. Proc Natl Acad Sci USA. 1999;96:6400-6405.
3. Diwan A, Wansapura J, Syed FM, Matkovich SJ, Lorenz JN, Dorn GW, II. Nix-mediated apoptosis links myocardial fibrosis, cardiac remodeling, and hypertrophy decompensation. Circulation. 2008;117:396-404.
4. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25.
5. Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009;25:1105-1111.
6. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L. Transcript assembly and abundance estimation from RNA-Seq reveals thousands of new transcripts and switching among isoforms. Nat Biotechnol. 2010;in press.
7. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5:621-628.
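The adapter layout shown in Supplemental Table IX for the barcode ‘agt’ can be sketched in Python. In that example, the sense oligo is the sequencing primer followed by the barcode and a trailing T, and the antisense oligo begins with the reverse complement of the barcode (‘act’). Extending this pattern to other barcodes is an assumption drawn from that single example, and the function names below are hypothetical, so treat this as a sketch and not as the authors' design procedure.

```python
from typing import Tuple

# Constant parts copied from Supplemental Table IX.
SEQUENCING_PRIMER = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
ANTISENSE_TAIL = "AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG"
REVERSE_PCR_PRIMER = "CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT"

# Translation table for complementing DNA while preserving letter case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def barcoded_adapter_oligos(barcode: str) -> Tuple[str, str]:
    """Build (sense, antisense) adapter oligos for one barcode.

    Assumption: other barcodes follow the same layout as the 'agt' example,
    i.e. primer + barcode + 'T' and reverse-complemented barcode + tail.
    """
    if not barcode or set(barcode.lower()) - set("acgt"):
        raise ValueError("barcode must be a non-empty string of A/C/G/T")
    sense = SEQUENCING_PRIMER + barcode + "T"
    antisense = reverse_complement(barcode) + ANTISENSE_TAIL
    return sense, antisense


sense, antisense = barcoded_adapter_oligos("agt")
print("5'-" + sense + "-3'")
print("5Phos-" + antisense + "-3'")
```

For the barcode ‘agt’ this reproduces the two example oligonucleotides in the table. The constant antisense tail is also the reverse complement of the reverse PCR primer, which `reverse_complement(ANTISENSE_TAIL) == REVERSE_PCR_PRIMER` confirms.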