Download New Methods in Cardiovascular Biology Deep mRNA Sequencing

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
New Methods in Cardiovascular Biology
Deep mRNA Sequencing for In Vivo Functional Analysis of
Cardiac Transcriptional Regulators
Application to G␣q
Scot J. Matkovich, Yan Zhang, Derek J. Van Booven, Gerald W. Dorn II
Downloaded from http://circres.ahajournals.org/ by guest on June 15, 2017
Rationale: Transcriptional profiling can detect subclinical heart disease and provide insight into disease etiology
and functional status. Current microarray-based methods are expensive and subject to artifact.
Objective: To develop RNA sequencing methodologies using next generation massively parallel platforms for high
throughput comprehensive analysis of individual mouse cardiac transcriptomes. To compare the results of
sequencing- and array-based transcriptional profiling in the well-characterized G␣q transgenic mouse hypertrophy/
cardiomyopathy model.
Methods and Results: The techniques for preparation of individually bar-coded mouse heart RNA libraries for Illumina
Genome Analyzer II resequencing are described. RNA sequencing showed that 234 high-abundance transcripts (>60
copies/cell) comprised 55% of total cardiac mRNA. Parallel transcriptional profiling of G␣q transgenic and
nontransgenic hearts by Illumina RNA sequencing and Affymetrix Mouse Gene 1.0 ST arrays revealed superior
dynamic range for mRNA expression and enhanced specificity for reporting low-abundance transcripts by RNA
sequencing. Differential mRNA expression in G␣q and nontransgenic hearts correlated well between microarrays and
RNA sequencing for highly abundant transcripts. RNA sequencing was superior to arrays for accurately quantifying
lower-abundance genes, which represented the majority of the regulated genes in the G␣q transgenic model.
Conclusions: RNA sequencing is rapid, accurate, and sensitive for identifying both abundant and rare cardiac transcripts,
and has significant advantages in time- and cost-efficiencies over microarray analysis. (Circ Res. 2010;106:1459-1467.)
Key Words: RNA sequencing 䡲 microarray 䡲 gene regulation 䡲 Gq
S
ignaling factors that mediate cardiac hypertrophy and
heart failure do so in large part by altering gene expression. Accordingly, physiological hypertrophy, pathological
hypertrophy and heart failure are associated with distinct
transcriptional signatures in human disease and experimental
mouse models.1,2 The ability to detect early changes in
myocardial gene expression is essential to understanding
pathophysiological mechanisms in experimental models, and
is predicted to provide crucial diagnostic and prognostic
information in human heart disease.3
We developed techniques for broad, accurate, and inexpensive characterization of individual mouse cardiac transcriptional signatures using massively parallel resequencing of
heart cDNA libraries using the Illumina Genome Analyzer II.
The digital readouts enable parallel quantification and annotation of myocardial transcripts, including mRNAs expressed
at levels below 1 copy per cell. Here, we describe the
components of a molecular and bioinformatic “RNA sequencing pipeline” optimized for comparative evaluation of
transcript signatures in genetic mouse heart models. We compare transcriptional profiling by Illumina mRNA sequencing
with Affymetrix microarray RNA in mouse hearts with a well
characterized transcriptional signature of pathological hypertrophy, the G␣q transgenic mouse,4,5 to illustrate the advantages of
RNA sequencing over array-based platforms.
Methods
An expanded Methods section is available in the Online Data
Supplement at http://circres.ahajournals.org.
Mouse Models
Generation of the G␣q-40 transgenic mouse line has been described
previously.4,5 Four pairs of 8-week-old male nontransgenic FVB/N
and G␣q-40 transgenic mouse hearts were used for RNA sequencing
studies. Echocardiographic and cardiac catheterization studies were
performed using standard methods.4,6
Preparation and Quantification of Total
Myocardial RNA
Total RNA was isolated from flash-frozen mouse hearts using TRIzol
(Invitrogen) according to the directions of the manufacturer, except that
isopropyl alcohol precipitation of RNA was allowed to proceed for 30
minutes at room temperature. RNA was quantified via NanoDrop or UV
spectrometer and integrity (28S:18S ratio) was assessed on 1% agarose.
Original received January 19, 2010; revision received March 18, 2010; accepted March 19, 2010.
From the Center for Pharmacogenomics, Department of Medicine, Washington University School of Medicine, St Louis, Mo.
Correspondence to Gerald W. Dorn II, MD, Washington University Center for Pharmacogenomics, 660 S Euclid Ave, Campus Box 8220, St Louis,
Mo 63110. E-mail [email protected]
© 2010 American Heart Association, Inc.
Circulation Research is available at http://circres.ahajournals.org
DOI: 10.1161/CIRCRESAHA.110.217513
1459
1460
Circulation Research
May 14, 2010
Non-standard Abbreviations and Acronyms
Ct
RPKM
RT-qPCR
cycle threshold
reads per kilobase of exon per million mapped reads
reverse transcription– quantitative polymerase chain reaction
Reverse Transcription and Preparation of cDNA
Downloaded from http://circres.ahajournals.org/ by guest on June 15, 2017
Preparation of cDNA fragments from poly(A)⫹ RNA was modified
from a previously described protocol.7 Four ␮g of total cardiac RNA
was twice oligo(dT) selected using the Dynabead mRNA purification
system (Invitrogen). Cardiac mRNA (200 ng) was fragmented to
⬇200 nt by heating to 94°C for 2.5 minutes in 40 mmol/L Tris
acetate pH 8.2, 100 mmol/L potassium acetate, 30 mmol/L magnesium acetate, and immediately chilled on ice. After purification on
Ambion NucAway columns, 100 ng of fragmented cardiac mRNA
was reverse-transcribed using random hexamers, followed by
second-strand cDNA synthesis using the Just cDNA double-stranded
cDNA synthesis kit (Stratagene, catalog no. 200453).
Construction of Barcoded Short-Read Libraries
for Illumina Sequencing
cDNAs were end-repaired using the End-It End-Repair kit (Epicentre
Biotechnologies, no. ER0270) and 3⬘ A-overhangs added using 3⬘-5⬘
exo- Klenow polymerase (New England Biolabs, no. M0212) and
0.2 mmol/L dATP. Illumina adapters with T-overhangs and customized to include 3-nt “barcodes” were ligated to the cDNA at 10:1
molar excess using the Promega LigaFast kit (no. M8221); different
barcoded adapters were ligated to individual mouse heart cDNAs.
Following column purification (Qiagen) to remove excess unligated
adapter, DNA in the 200- to 400-bp range was isolated via gel
purification (Qiagen) on 2% low-melting agarose and amplified with 11
cycles of Phusion polymerase–mediated (New England Biolabs, no.
F531) PCR (10 seconds 98°C, 30 seconds 65°C, 30 seconds 72°C) using
oligonucleotides complementary to Illumina sequencing adapters. Barcoded sequencing adapters and PCR oligonucleotides are described in
Online Table IX. The final amplified libraries were again columnpurified and quantified using PicoGreen (Quant-It, Invitrogen).
Sequencing and Data Processing
Four barcoded libraries were combined in equimolar (10 nmol/L)
amounts and diluted to 4 pmol/L for cluster formation on a single
Illumina Genome Analyzer II flowcell lane. Base-calling of DNA
clusters was performed using Illumina processing pipeline software
(version 1.5) and 36-nt sequences, with quality scores, were obtained
in the SCARF text format of Illumina. UNIX functions were used to
sort the sequences according to barcode, remove the 4 barcoding
nucleotides, and convert sequence data to FASTQ format (example
shell script given in the expanded Methods section in the Online Data
Supplement). After barcode removal, the 32-base mRNA sequence
reads were mapped to transcripts annotated in NCBI release 37 of the
mouse genome using the publicly available packages Bowtie (release
0.12.0) (http://bowtie-bio.sourceforge.net/index.shtml),8 TopHat (release 1.0.12) (http://tophat.cbcb.umd.edu),9 and Cufflinks (release
0.7.0) (http://cufflinks.cbcb.umd.edu).10 TopHat and Cufflinks map
known and novel splice junctions, use annotation files to compute
which aligned sequences map to the known transcriptome and take
into account transcript isoform diversity (alternative splicing). Cufflinks may be used with gene annotation files to calculate overall gene
expression in terms of RPKM (reads per kilobase of exon per million
mapped reads), a previously defined parameter.7 We used the default
options supplied with these software packages in our analyses
(example shell script given in the expanded Methods section in the
Online Data Supplement).
Annotation files in gtf format (http://mblab.wustl.edu/GTF22.html), including mRNAs, expressed sequence tags and noncoding elements such
as ribosomal RNAs and microRNA precursors, were downloaded from
Ensembl (ftp://ftp.ensembl.org/pub/current_gtf/mus_musculus). We
used the Cufflinks module (above) with annotation files from which
expressed sequence tags, rRNAs, and microRNA precursors had
been manually removed, to focus on sequence reads mapping to
coding genes. We analyzed only those RNA elements that had
expression signals in at least 2 of 4 biological replicates.
The gtf annotation files from Ensembl contain many separate transcripts (identified with ENST labels) that contribute to the sequence
defined for each gene (identified with ENSG labels). To assess alternative splicing events, we used the Cufflinks module with these same gtf
annotation files, but forced calculation of RPKM values for separate
transcripts rather than for entire genes (example shell script given in the
expanded Methods section in the Online Data Supplement). The
presence of alternatively spliced products was evaluated by manually
examining Cufflinks output files for ENST entries corresponding to an
ENSG entry of interest.
Comparative Analysis of RNA Sequencing and
Microarray Results
To compare gene expression data obtained with RNA sequencing to
the most current microarray technology, RNA from the same samples
was analyzed on Affymetrix Mouse Gene 1.0 ST arrays at the Multiplexed Gene Analysis core of Washington University. Microarray data
were analyzed using Partek Genomics Suite version 6.4 (Partek, St
Louis, Mo) as previously described11,12; RNA sequencing data (gene
symbols and RPKM values) were imported into Partek and analyzed
similarly. Biological Networks Gene Ontology (BiNGO13) was used to
assign genes into gene-ontology categories.
Reverse-Transcription Quantitative PCR
One microgram of total cardiac RNA was reverse-transcribed into
cDNA using oligo-dT priming (qScript Flex cDNA kit, Quanta
Biosciences). One-twentieth of each preparation was used for each
individual qPCR, using TaqMan probes (Applied Biosystems), detailed
in the expanded Methods section in the Online Data Supplement.
Statistical Methods
Paired and unpaired data were compared with Student t test.
Correlation coefficients and linear regressions for comparison of
microarray and RNA sequencing data were calculated with GraphPad Prism (San Diego, Calif). Probability values and false discovery
rates for gene expression data were calculated using Partek Genomics Suite 6.4 software. A probability value of ⬍0.05 was defined as
significant unless indicated otherwise.
Results
RNA Sequencing of the Normal and Pathological
Hypertrophied Heart Transcriptomes
Microarray profiling has been widely used in mouse studies,
but is relatively expensive and requires special equipment and
expertise. Because our laboratory has found that high
throughput human DNA sequencing on next-generation massively parallel systems is fast, highly reproducible between
technical replicates, accurate, and inexpensive,14 we explored
the utility of deep mRNA sequencing in experimental mouse
cardiac models. For initial proof-of-concept studies, we RNA
sequenced four pairs of 8-week-old male nontransgenic
FVB/N and G␣q transgenic mouse hearts. G␣q is the essential signaling transducer of pressure overload hypertrophy.15
Cardiac-specific overexpression of G␣q at 4 to 5 times
endogenous levels intrinsically activates pathological hypertrophy signaling in a manner that closely mimics pressure
overload in mice. Thus, G␣q transgenic mice exhibit cardiac
hypertrophy, ventricular enlargement, diminished ejection
performance, and characteristic increased expression of fetal
Matkovich et al
RNA Sequencing and Transcriptional Regulation
1461
Figure 1. Phenotype of the adult
G␣q-40 transgenic heart. A, Formalinfixed intact hearts (top) and 4-chamber
views (bottom). B, Representative
M-mode echocardiograms. C, Response
to graded infusion of dobutamine during
cardiac catheterization (means⫾SEM,
n⫽3 each genotype). D, Representative
RT-qPCR fluorescence curves for nontransgenic (ntg) and G␣q cardiac gene
expression (dotted line indicates Ct).
Downloaded from http://circres.ahajournals.org/ by guest on June 15, 2017
cardiac genes.4,6,16 –18 Figure 1 illustrates these features of the
G␣q transgenic mouse model. Previous analyses of mRNA
expression in the cardiac G␣q transgenic model used RNA
dot blotting4,19 –22 or Incyte microarrays,17,18 neither of which
provides a comprehensive examination of gene expression.
Thus, we compared array- and sequencing-based methods of
comprehensive transcriptome profiling in G␣q mouse hearts.
The few RNA sequencing studies published to date have
typically used large numbers of sequence reads to map the
transcriptomes at very high resolution (eg, see elsewhere23,24).
Because one or more sequencing lanes are devoted to a single
sample, this can be very expensive. Furthermore, this degree
of resolution is not required for comparative analysis of
transcriptomes in a case-control study design (as with nontransgenic verses transgenic, or normal verses diseased). For
this reason, we developed procedures to individually DNAbarcode RNA sequence libraries and pool multiple libraries
into a single sequencing lane, and segregate the sequence data
post hoc according to barcode. To assess the efficacy of
barcoding, library pooling, and sorting of the sequence reads,
we tracked a rare DNA marker polymorphism (SNP) in 24
sequencing libraries individually barcoded, pooled, and sequenced in a single Illumina GA II lane. (The marker SNP
was present in 2 of the 24 sequencing libraries, as determined
by Sanger sequencing.) Deconvolution of the barcoded, pooled
DNA sequence correctly identified both libraries containing the
marker SNP. To ensure that this technique was applicable to
quantitative analysis of transcriptome levels by RNA sequencing, we individually barcoded a G␣q and a nontransgenic heart
cDNA library, combined them, and resequenced them in a single
Illumina GA II lane. Here, high expression of the transgene
(Gnaq) and other characteristic molecular markers of this model
(Myh7, Acta1) marked the G␣q library, which was correctly
classified as such according to barcode.
For comparative transcriptome profiling of G␣q and nontransgenic hearts, four barcoded libraries were analyzed in a
single sequencing lane. The total number of barcoded RNA
sequence reads from the eight cardiac libraries was 18.2
million, of which 10.5 million (57.8%) aligned to NCBI
release 37 of the mouse genome (Online Table I). Using the
criterion that an RNA element must be detected in at least 2
of 4 biological replicates, these sequences mapped to 14 109
different annotated RNA elements (included noncoding
RNAs and microRNA precursors), of which 11 180 were
coding mRNAs. The remaining sequences corresponded
largely to ribosomal and transfer RNA. 10 844 coding
mRNAs were detected in nontransgenic hearts, whereas
10 775 coding mRNAs were detected in G␣q hearts. 10 437
mRNAs were common to both nontransgenic and G␣q hearts.
Gene expression values in nontransgenic hearts ranged
from ⬇1 copy per cell (corresponding to an RPKM value
⬍37) for Casp8 (caspase 8) to 3873 copies per cell for mt-Co1
(mitochondrial cytochrome C oxidase subunit I) (Online Table
II), consistent with myocardium being mitochondrial-rich25 and
having low rates of caspase-mediated apoptosis.26 The most
abundant transcript in G␣q hearts (2604 copies per cell) was
Nppa encoding atrial natriuretic factor, consistent with the
cardiomyopathic phenotype of this model.4,16,27 G␣q was the
24th-most abundant transcript in G␣q hearts, expressed at 574
copies per cell (Online Table III.) Complete gene expression
data for each individual heart are in Online Table IV.
Recent PMAGE (polony multiplex analysis of gene expression) analysis indicated that approximately 200 unique
heart mRNAs were expressed at ⬎60 copies/cell, and that this
small fraction (⬍1%) of the ⬇25 000 mouse gene transcripts
comprised ⬇65% of total cardiac mRNA.28 Our RNA sequencing confirms this surprising observation, identifying
234 very high-abundance mRNAs expressed at or greater
1462
Circulation Research
May 14, 2010
Figure 2. Gene ontology (GO) analysis
of highly expressed genes in nontransgenic mouse heart. The 234 highestexpressing genes in nontransgenic mouse
hearts (ⱖ60 copies per cell) were classified into 15 GO categories using
BiNGO.13 Size of the pie slice corresponds to the number of matches to a
given GO category (shown at the edge of
each slice). A total of 360 matches were
made to the 15 categories shown. Classification of each gene is given in Online
Table III.
Downloaded from http://circres.ahajournals.org/ by guest on June 15, 2017
than 60 copies per cell, which comprised 55% of the total
myocardial mRNA complement of normal hearts (Online
Table II). In G␣q hearts, 236 mRNAs were expressed at or
greater than 60 copies per cell, representing 52% of total
myocardial mRNA. 206 of these 236 very high-abundance
G␣q heart mRNAs were among the 234 very high-abundance
transcripts in nontransgenic hearts (88% concordance), indicating a relatively modest impact of the hypertrophy gene
expression program on the most prevalent cardiac transcripts.
Gene ontology analysis of the 234 most abundant cardiac
mRNAs reveals a preponderance of mitochondrial, transport,
cytoskeletal, and contractile genes (Figure 2 and Online Table
V). Among the abundant transcripts, those differentially
expressed in G␣q hearts were Nppb (BNP), Acta1 (␣-skeletal
actin), Ankrd1, Atp2a2 (SERCA2a), Myom2, Mybpc3, Hspb6,
Idh3b, Pdha1, Hadha, Acadvl, Ndufv2, Acaa2, Fhl2, ie,
several members of the “fetal cardiac gene program” that is
nonspecifically regulated in myocardial disease.29 Thus,
quantitative analysis of the cardiac transcriptome by RNA
sequencing shows that a relatively few, highly abundant
cardiac transcripts encoding homeostatic genes account for
the majority of cardiac mRNAs, but that measuring the
relative expression of these abundant transcripts provides
little insight into cardiac status.1
RNA Sequencing Reveals Greater Depth of the
Normal and Hypertrophy mRNA Signature
To directly compare results of RNA sequencing and conventional microarray transcript profiling of mouse hearts, we
performed parallel microarray surveys of mRNA expression
on the same eight mouse cardiac RNA samples studied above
using Affymetrix Mouse Gene 1.0 ST arrays with probe sets
for 25 175 annotated genes. The arrays reported signals for all
⬇25 000 probe sets, compared to only 11 180 coding cardiac
mRNAs to which RNA sequencing reads were mapped. Thus,
unfiltered array data generated “results” for 125% more
RNAs than were present by RNA sequencing. To determine
the reason for this discrepancy, expression levels of all
mRNAs detected by either or both methods (⬇20 000 genes
with matching gene symbols between the array platform and
RNA sequencing gene annotation files) were compared for
each of the eight hearts (Figure 3 and Online Figure I). The
Figure 3. Correspondence of gene expression, determined by RNA sequencing vs microarray, for individual hearts. Gene
expression determined by RNA sequencing (Illumina) is plotted against gene expression determined by microarrays (Affymetrix). Values
shown are log2(RPKM⫹1) for RNA sequencing (x axis) and log2(Affymetrix signal units⫹1) for microarrays. A value of 1 was added to
both RPKM and Affymetrix signal units to avoid taking the log of 0. Red circles highlight genes reported to be expressed by microarrays, but not by RNA sequencing. Red boxes show genes expressed below a typical cutoff level for microarray analysis. One nontransgenic heart and 1 Gq heart are shown; plots are representative of those for all 8 hearts (shown in Online Figure I).
Matkovich et al
Downloaded from http://circres.ahajournals.org/ by guest on June 15, 2017
correlation between mRNA expression levels reported by
array and RNA sequencing was generally good (Spearman
correlation coefficient, r⫽0.811⫾0.006, mean⫾SEM, n⫽8
hearts). However, the regression lines do not go through the
y-axis origin, but rather intercept the y axis (microarray data)
at ⬇5. Also, the slopes of the regression lines are significantly
less than one (0.74⫾0.011, P⫽1⫻10⫺6), indicating compression of expression values by microarray analysis. Finally, a
number of mRNAs for which RNA sequencing showed no
reads (Illumina sequencing values (x axis) of 0) were
reproducibly indicated as highly expressed by the microarray studies (Affymetrix array value (y axis) of ⬇6 to ⬇12;
circumscribed in red on Figure 3). These types of problems
have been attributed to background hybridization and
false-positive hybridization on microarrays.23 Indeed,
background hybridization on microarrays complicates establishing the correct threshold for mRNA detection.11,30,31
In our own data set, if we use the y-axis value at the
regression line x-axis intercept to establish a threshold
value for the microarray data, then hundreds of lowabundance mRNAs (851 in the nontransgenic sample and
757 in the G␣q sample shown in Figure 3 [red boxes])
detected by RNA sequencing would be incorrectly filtered.
Thus, unfiltered microarray data report expression levels
for mRNAs that are not really there (vertical ovals on
Figure 3), but correcting for this problem by increasing the
threshold for mRNAs calling eliminates genuine lowabundance mRNAs within the hybridization noise (horizontal squares in Figure 3).
Case–Control Design and “Fold-Expression”
Reporting Does Not Correct the Limitations of
Microarray-Based mRNA Profiling
We considered that nonspecific microarray hybridization
signals in a case– control comparison might cancel each other
out when the data are reported as the “fold change,” rather
than as absolute transcript expression values. Accordingly,
we compared the results of our microarray and RNA sequencing data expressed as fold-regulation between Gq and nontransgenic mouse hearts. Microarray analysis of G␣q transgenic heart mRNA identified 841 differentially expressed
transcripts (false discovery rate, ⬍0.03; P⬍0.001; fold
change, ⱖ1.3; 382 upregulated and 459 downregulated)
(Online Table VI). At the same probability value cutoff and
fold change threshold, RNA sequencing identified differential
expression of only 125 genes; 77 upregulated and 48 downregulated (Online Tables VI and VII). Of the 752 genes
reported as differentially regulated by microarrays, but not by
RNA sequencing, 45 were not detected in any hearts by RNA
sequencing, and the majority (n⫽674) were present at or less
than 20 copies per cell, ie, in the group where we had shown
that microarrays provide the least confident data. Unfortunately, one of the disadvantages of the fold-change analytic
approach is that these critical absolute expression data are not
evident, making it difficult to assess the validity of individual
results as a function of absolute mRNA levels.
To better understand the striking disparities between microarray and RNA-sequencing based differential mRNA profiling, we plotted the fold expression values provided either
RNA Sequencing and Transcriptional Regulation
1463
by microarray or RNA sequencing of only the regulated
mRNAs (ie, those whose G␣q/nontransgenic expression was
P⬍0.001 by RNA sequencing), stratified by mRNA abundance in normal hearts. Thus, 3 individual regression lines
were generated, separately reporting fold mRNA expression
change in G␣q hearts for very high-abundance transcripts
(⬎60 copies/cell, n⫽14), common transcripts (20 to 60
copies/cell, n⫽17), and low-abundance transcripts (⬍20 copies/cell, n⫽94) (Figure 4). The correlation of “fold regulation” between RNA sequencing and microarray data was
good for the very high-abundance and common mRNAs
(Pearson r2⫽0.92 and 0.96, respectively), but was suboptimal
for low-abundance mRNAs, even though there were more
data points in this group (r2⫽0.74). Furthermore, the slopes
of all 3 linear regressions were very shallow (0.49 to 0.63),
again reflecting a compressed range of expression data for
microarray results in comparison to RNA sequencing.
Quantification of mRNA counts by RNA sequencing is
critically dependent on correct assignment of sequence reads
to their proper genes, ie, on accurate annotation. Thus,
mis-identification might explain some discrepancies between
RNA sequencing and microarrays. To assess this possibility,
we performed directed reverse transcription– quantitative
polymerase chain reaction (RT-qPCR) for 10 transcripts
representing a range of absolute expression values and
relative regulation in G␣q and nontransgenic hearts (Figure
5), and compared the results to those we obtained on the same
RNA samples by sequencing and arrays (Table). When all 3
methods reliably detected mRNAs in both experimental
groups, the increase in mRNA content in G␣q versus nontransgenic hearts, described as “fold expression,” was similar
between RT-qPCR and RNA sequencing, but these values
were compressed by microarray (Table). These data suggest
that mis-identification of sequence reads is not responsible
for differences in fold-expression reported from RNA sequencing and microarrays. However, RNA sequencing detected transcripts from both G␣q and nontransgenic hearts for
only 5 mRNAs, and RT-qPCR was either below the threshold
for accuracy (Ctⱖ35) or could not detect a further four
transcripts. On the other hand, RT-qPCR and microarrays
agreed for detection of Tuba1a, which was not observed by
RNA sequencing. This single instance likely represents a misannotation artifact. We also examined the degree of intersample
variability in the 3 assay techniques, comparing expression of
genes in the Table between the 4 nontransgenic, or the 4 G␣q,
biological replicates. RT-qPCR exhibited the greatest variance,
with the lower-abundance genes recording higher variation.
RNA sequencing had less variance, and microarrays showed the
least variance (because of data range compression). In short,
RNA sequencing has less variance than RT-qPCR without the
data compression inherent in microarrays.
The utility of RNA sequencing is to derive greater
understanding of gene regulation pathways in normal and
diseased tissue. We used Ingenuity Pathways Analysis
software (http://www.ingenuity.com) to examine possible signaling relationships between the regulated genes in G␣q hearts
(Figure 4 and Online Table VII). Seventy-two of 77 G␣qupregulated genes and 42 of 48 G␣q-downregulated genes were
assigned to signaling networks (Online Table VIII). Interest-
1464
Circulation Research
May 14, 2010
Downloaded from http://circres.ahajournals.org/ by guest on June 15, 2017
Figure 4. Fold changes in gene expression, determined by RNA sequencing vs microarray, in Gq-overexpressing compared to
normal hearts. Fold change in gene expression determined by RNA sequencing (Illumina) is plotted against fold change in gene
expression determined by microarrays (Affymetrix), using a log2 scale, for the 125 significantly regulated genes defined in the Table.
Red squares denote genes expressed at or above 60 copies/cell in nontransgenic hearts; blue triangles, genes expressed between
20 to 60 copies/cell; green circles, genes expressed at or less than 20 copies/cell. Inset, Comparison of fold changes in all genes
detected using RNA sequencing and microarrays, regardless of significance.
ingly, separating the data into low- and high- abundance transcripts generated networks with a distinctly different focus
(Figure 6 and Online Table VIII). The most intriguing networks
focused on cellular growth/proliferation and on cell death. A
signaling pathway involving members of these 2 networks
suggests relationships between Gnaq, Rcan (regulator of calcineurin), Nfatc2, and Abra (actin-binding Rho activating pro-
tein), as well as signals involving the EGF receptor and the MAP
kinase family (Figure 6).
Discussion
Here, we describe methods for RNA sequencing and transcript
analysis in mouse cardiac models, and demonstrate that RNA
Figure 5. Comparison of gene expression by microarray, RNA sequencing,
and RT-qPCR. A, Gene expression in
nontransgenic hearts, determined by RNA
sequencing, plotted against gene expression determined by microarrays. Values
shown are log2(RPKM⫹1) for RNA sequencing (x axis) and log2(Affymetrix signal
units⫹1) for microarrays. Light gray indicates expressed genes with regression
line as in Figure 3; red squares, genes
regulated by both arrays and sequencing;
blue triangles, genes regulated on arrays
but poorly detected by sequencing; green
circle, highly expressed on arrays but not detected by sequencing. B, Representative TaqMan qPCR traces for genes shown in A. Dotted line indicates fluorescence threshold for Ct determination.
Matkovich et al
RNA Sequencing and Transcriptional Regulation
1465
Table. Comparison of Gene Regulation Measured by Microarray, RNA
Sequencing, and RT-qPCR Methods
RNA Sequencing
Gene
Nontransgenic
(Copies/Cell)
Gq
(Copies/Cell)
Fold
Change
Microarray
(Fold Change)
RT-qPCR
(Fold Change)
Nppa
43
2604
60.2
13.4
Acta1
127
897
7.0
2.6
7.9
Atp2a2
811
458
⫺1.8
⫺1.2
⫺2.2
Fhl2
267
50
⫺3.4
⫺2.6
⫺4.0
2
5.4
2.8
4.6
3.1
2.5
Bcl2
0.4
Edn3
ND
Nrg1
ND
Aqp4
0.4
1.6
Ct
98.8
⬎35
ND
⬎35
ND
⫺1.6
⫺11.7
⬎35
⬎35
1.0
2.9
Tac1
ND
ND
⫺1.7
⫺4.0
Tuba1a
ND
ND
1.3
1.1
Downloaded from http://circres.ahajournals.org/ by guest on June 15, 2017
Fold change indicates the difference in gene expression observed between Gq and ntg samples for
a particular method. ND indicates not detected.
sequencing has advantages for quantitative analysis of cardiacexpressed transcripts in normal and hypertrophied mouse hearts.
The idea that transcriptional signatures can provide information into the nature and/or cause of cardiac disease is decades
old, but has yet to completely fulfill its promise. The laboratories
of Nidal-Ginard and Chien were among the first to describe
prototypical transcriptional changes in heart disease.29,32 There
followed the description of a handful of “fetal genes” whose
expression was strikingly increased in cardiac hypertrophy
and/or failure.33–35 Measures of these few transcripts by Northern blotting, and more recently real-time quantitative PCR,
became standard as early markers of cardiac pathology.1,2
In-depth analysis of the transcriptome has revealed specific
transcriptional signatures for physiological versus pathological
hypertrophy, and for early versus late heart failure.3
RNA sequencing takes advantage of massively parallel
next generation DNA sequencing platforms that are increasingly available at academic institutions and industry. After
development, optimization, and implementation of these
techniques, preparation of bar-coded cDNA sequencing libraries from cardiac RNA took one individual 3 days to
complete, the Illumina sequencing took 2 days, and sequence
alignment and initial analysis were completed over a 2-day
period. The total cost for sequencing eight mouse heart
mRNA libraries (2 Illumina GA II sequencing lanes) was
$2100, including reagents and instrument time. By comparison, the Core turnover for Affymetrix microarray results was
2 weeks, at a cost of $3600 for 8 cardiac mRNA profiles.
Most importantly, RNA sequencing provided accurate quantitative digital data on lower abundance transcripts where
most of the gene regulation was found, but that were
measured less reliably on microarrays.
The arrays appear to be reporting changes in relative gene
expression for large numbers of mRNAs that are expressed at
absolute levels below those that are optimal for array-based
measurements, 20 copies per cell. In contrast, RNA sequencing appeared exquisitely sensitive to changes in the abundance of these rare transcripts, as 94 of the 125 G␣qregulated mRNAs reported by this technique were present at
less than 20 copies per cell in the control, nontransgenic
hearts. Thus, the major disparity between the microarray
transcriptional profiling and RNA sequencing occurs with
mRNAs expressed at low levels: microarrays falsely reported
regulation of the rare transcripts, but also failed to detect
changes in almost one quarter (36 of 125) of significantly
regulated low-abundance mRNAs. We do note that wholeorgan transcriptomes, such as we have studied here, are not
cell-type specific and thus some low-abundance mRNAs
(especially at or below 1 copy/cell) may not originate from
cardiac myocytes. However, cardiac myocyte-specific transgenesis in mouse hearts can provide cell-autonomous data in
the in vivo context, and even after physiological modeling,
that is not possible with cultured cells. This is especially
important because elucidation of new signaling pathways
from RNA sequencing studies will likely involve transcripts
expressed at lower abundance, in accordance with our observation that high- and low-abundance G␣q-regulated transcripts generated different signaling networks.
Another potential advantage of RNA sequencing over
traditional microarrays is the ability to easily identify alternatively spliced transcripts. Although the newer generation of
Affymetrix microarrays uses probe sets that cover multiple
exons of a gene, the limited ability of array hybridization
techniques to provide quantitative information on gene or
exon expression complicates nonbiased identification and
quantification of alternatively spliced isoforms. In addition,
the Tophat/Cufflinks analytic software we use provide the
ability to detect both known and novel splicing events via
aligning sequence reads to the entire genome, whereas
microarrays are limited by the probe sets placed on the arrays.
Although this aspect of RNA sequencing was not the focus of
the present studies, we explored the potential of this application for several cardiac-expressed genes known to undergo
alternative splicing: Both transcripts for Atp2a2 (SERCA2),36
both cardiac-expressed transcripts of the creatine transporter
Slc6a8,37 3 of 5 known splice forms of Cacna1c (Cav1.2),38
3 of 5 alternative transcripts of Ank2 (ankyrin B),39 and both
transcripts of Bnip3l (Nix)18 were detected, although we
1466
Circulation Research
May 14, 2010
for identifying both abundant and rare transcripts, and has
significant advantages in time- and cost-efficiencies over
Affymetrix microarray analysis. We expect that these advantages will not only be similar with any massively parallel
RNA sequencing platform but that the relative benefits of
RNA sequencing over microarrays will continue to increase
as the technology advances.
Sources of Funding
Supported by NIH R01 grants HL087871 and HL059888 and
National Center for Research Resources grant UL1 RR024992.
Disclosures
None.
References
Downloaded from http://circres.ahajournals.org/ by guest on June 15, 2017
Figure 6. Signaling networks of G␣q-regulated transcripts.
Ingenuity Pathways Analysis software (http://www.ingenuity.com) was used to depict potential signaling pathways between
high-abundance (A) and low-abundance (B) G␣q-regulated gene
products. Lines with arrowheads indicate molecule acts on a
target; lines without arrowheads, binding only; solid lines,
direct interaction; dotted lines, indirect interaction; blue background, high-abundance genes; green background, lowabundance genes; white background, member of signaling
pathway but not regulated in G␣q hearts.
failed to observe alternative splicing of Ccrk40 that is expressed at less than 1 copy per cell. These findings suggest
that RNA sequencing can readily detect alternative splicing
events, but that more sequencing reads may be required to
detect rare alternately spliced isoforms.
Unbiased transcriptional profiling of mouse cardiac models
has increasingly been used to identify the impact of molecular
perturbations on heart development, adaptation, and function,
and to define specific molecular mediators of these effects.41
The potential for RNA sequencing to detect subtle regulated
events is great, but the technology is relatively new, the
creation and bar-coding of libraries is unfamiliar to many
laboratories, and the analytic platforms can be challenging.
Here, we describe an RNA-sequencing pipeline that, once it
is implemented, is essentially turn-key in its operation.
Although RNA sequencing is the defacto “gold standard”
technique for identifying and quantifying transcripts, to our
knowledge it has not been applied to profiling of mouse
cardiac genes, and so we were careful to compare the results
of RNA sequencing to microarray profiling in the wellcharacterized G␣q model of cardiomyopathy. We found that
Illumina RNA sequencing is rapid, accurate, highly sensitive
1. Dorn GW II, Robbins J, Sugden PH. Phenotyping hypertrophy - eschew
obfuscation. Circ Res. 2003;92:1171–1175.
2. Dorn GW II. The fuzzy logic of physiological cardiac hypertrophy.
Hypertension. 2007;49:962–970.
3. Dorn GW II, Matkovich SJ. Put your chips on transcriptomics. Circulation.
2008;118:216–218.
4. D’Angelo DD, Sakata Y, Lorenz JN, Boivin GP, Walsh RA, Liggett SB,
Dorn GW II. Transgenic G␣q overexpression induces cardiac contractile
failure in mice. Proc Natl Acad Sci U S A. 1997;94:8121– 8126.
5. Dorn GW II, Tepe NM, Lorenz JN, Koch WJ, Liggett SB. Low- and
high-level transgenic expression of ␤2-adrenergic receptors differentially
affect cardiac hypertrophy and function in G␣q-overexpressing mice.
Proc Natl Acad Sci U S A. 1999;96:6400 – 6405.
6. Diwan A, Wansapura J, Syed FM, Matkovich SJ, Lorenz JN, Dorn GW
II. Nix-mediated apoptosis links myocardial fibrosis, cardiac remodeling,
and hypertrophy decompensation. Circulation. 2008;117:396 – 404.
7. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping
and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods.
2008;5:621– 628.
8. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memoryefficient alignment of short DNA sequences to the human genome.
Genome Biol. 2009;10:R25.
9. Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions
with RNA-Seq. Bioinformatics. 2009;25:1105–1111.
10. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren
MJ, Salzberg SL, Wold BJ, Pachter L. Transcript assembly and
abundance estimation from RNA-Seq reveals thousands of new transcripts and switching among isoforms. Nat Biotechnol. In press.
11. Matkovich SJ, Van Booven DJ, Youker KA, Torre-Amione G, Diwan A,
Eschenbacher WH, Dorn LE, Watson MA, Margulies KB, Dorn GW II.
Reciprocal regulation of myocardial microRNAs and messenger RNA in
human cardiomyopathy and reversal of the microRNA signature by biomechanical support. Circulation. 2009;119:1263–1271.
12. Matkovich SJ, Wang W, Tu Y, Eschenbacher WH, Dorn LE, Condorelli
G, Diwan A, Nerbonne JM, Dorn GW II. MicroRNA-133a protects
against myocardial fibrosis and modulates electrical repolarization
without affecting hypertrophy in pressure-overloaded adult hearts. Circ
Res. 2010;106:166 –175.
13. Maere S, Heymans K, Kuiper M. BiNGO: a Cytoscape plugin to assess
overrepresentation of gene ontology categories in biological networks.
Bioinformatics. 2005;21:3448 –3449.
14. Matkovich SJ, Van Booven DJ, Hindes A, Kang MY, Druley TE,
Vallania FL, Mitra RD, Reilly MP, Cappola TP, Dorn GW II. Cardiac
signaling genes exhibit unexpected sequence diversity in sporadic cardiomyopathy, revealing HSPB7 polymorphisms associated with disease.
J Clin Invest. 2010;120:280 –289.
15. Akhter SA, Luttrell LM, Rockman HA, Iaccarino G, Lefkowitz RJ, Koch
WJ. Targeting the receptor-Gq interface to inhibit in vivo pressure
overload myocardial hypertrophy. Science. 1998;280:574 –577.
16. Adams JW, Sakata Y, Davis MG, Sah VP, Wang Y, Liggett SB, Chien
KR, Brown JH, Dorn GW II. Enhanced G␣q signaling: a common
pathway mediates cardiac hypertrophy and apoptotic heart failure. Proc
Natl Acad Sci U S A. 1998;95:10140 –10145.
17. Aronow BJ, Toyokawa T, Canning A, Haghighi K, Delling U, Kranias E,
Molkentin JD, Dorn GW II. Divergent transcriptional responses to inde-
Matkovich et al
pendent genetic causes of cardiac hypertrophy. Physiol Genomics. 2001;
6:19 –28.
Yussman MG, Toyokawa T, Odley A, Lynch RA, Wu G, Colbert MC,
Aronow BJ, Lorenz JN, Dorn GW II. Mitochondrial death protein Nix is
induced in cardiac hypertrophy and triggers apoptotic cardiomyopathy.
Nat Med. 2002;8:725–730.
Sakata Y, Hoit BD, Liggett SB, Walsh RA, Dorn GW II. Decompensation of
pressure-overload hypertrophy in G␣q-overexpressing mice. Circulation.
1998;97:1488–1495.
Wu G, Toyokawa T, Hahn H, Dorn GW II. Epsilon protein kinase C in
pathological myocardial hypertrophy. Analysis by combined transgenic
expression of translocation modifiers and Galphaq. J Biol Chem. 2000;
275:29927–29930.
Syed F, Odley A, Hahn HS, Brunskill EW, Lynch RA, Marreez Y, Sanbe A,
Robbins J, Dorn GW II. Physiological growth synergizes with pathological
genes in experimental cardiomyopathy. Circ Res. 2004;95:1200–1206.
Satoh M, Matter CM, Ogita H, Takeshita K, Wang CY, Dorn GW II, Liao
JK. Inhibition of apoptosis-regulated signaling kinase-1 and prevention of
congestive heart failure by estrogen. Circulation. 2007;115:3197–3204.
Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y. RNA-seq: an
assessment of technical reproducibility and comparison with gene
expression arrays. Genome Res. 2008;18:1509 –1517.
Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C,
Kingsmore SF, Schroth GP, Burge CB. Alternative isoform regulation in
human tissue transcriptomes. Nature. 2008;456:470 – 476.
Goffart S, Kleist-Retzow JC, Wiesner RJ. Regulation of mitochondrial
proliferation in the heart: power-plant failure contributes to cardiac failure
in hypertrophy. Cardiovasc Res. 2004;64:198 –207.
Dorn GW II. Apoptotic and non-apoptotic programmed cardiomyocyte
death in ventricular remodelling. Cardiovasc Res. 2009;81:465– 473.
Dorn GW II, Brown JH. Gq signaling in cardiac adaptation and maladaptation. Trends Cardiovasc Med. 1999;9:26 –34.
Kim JB, Porreca GJ, Song L, Greenway SC, Gorham JM, Church GM,
Seidman CE, Seidman JG. Polony multiplex analysis of gene expression
(PMAGE) in mouse hypertrophic cardiomyopathy. Science. 2007;316:
1481–1484.
Chien KR, Knowlton KU, Zhu H, Chien S. Regulation of cardiac gene
expression during myocardial growth and hypertrophy: molecular studies
of an adaptive physiologic response. FASEB J. 1991;5:3037–3046.
18.
19.
20.
21.
22.
23.
Downloaded from http://circres.ahajournals.org/ by guest on June 15, 2017
24.
25.
26.
27.
28.
29.
RNA Sequencing and Transcriptional Regulation
1467
30. Draghici S, Khatri P, Eklund AC, Szallasi Z. Reliability and reproducibility
issues in DNA microarray measurements. Trends Genet. 2006;22:101–109.
31. Lister R, Gregory BD, Ecker JR. Next is now: new technologies for
sequencing of genomes, transcriptomes, and beyond. Curr Opin Plant
Biol. 2009;12:107–118.
32. Izumo S, Lompre AM, Matsuoka R, Koren G, Schwartz K, Nadal-Ginard
B, Mahdavi V. Myosin heavy chain messenger RNA and protein isoform
transitions during cardiac hypertrophy: interaction between hemodynamic
and thyroid hormone-induced signals. J Clin Invest. 1987;79:970 –977.
33. Bishopric NH, Simpson PC, Ordahl CP. Induction of the skeletal
alpha-actin gene in alpha 1-adrenoceptor-mediated hypertrophy of rat
cardiac myocytes. J Clin Invest. 1987;80:1194 –1199.
34. Stockmann PT, Will DH, Sides SD, Brunnert SR, Wilner GD, Leahy KM,
Wiegand RC, Needleman P. Reversible induction of right ventricular
atriopeptin synthesis in hypertrophy due to hypoxia. Circ Res. 1988;63:
207–213.
35. Waspe LE, Ordahl CP, Simpson PC. The cardiac ␤-myosin heavy chain
isogene is induced selectively in ␣1-adrenergic receptor-stimulated hypertrophy of cultured rat heart myocytes. J Clin Invest. 1990;85:
1206 –1214.
36. Ver Heyen M, Heymans S, Antoons G, Reed T, Periasamy M, Awede B,
Lebacq J, Vangheluwe P, Dewerchin M, Collen D, Sipido K, Carmeliet
P, Wuytack F. Replacement of the muscle-specific sarcoplasmic
reticulum Ca2⫹-ATPase isoform SERCA2a by the nonmuscle SERCA2b
homologue causes mild concentric hypertrophy and impairs contractionrelaxation of the heart. Circ Res. 2001;89:838 – 846.
37. Martinez-Munoz C, Rosenberg EH, Jakobs C, Salomons GS. Identification, characterization and cloning of SLC6A8C, a novel splice variant
of the creatine transporter gene. Gene. 2008;418:53–59.
38. Liao P, Yong TF, Liang MC, Yue DT, Soong TW. Splicing for alternative
structures of Cav1.2 Ca2⫹ channels in cardiac and smooth muscles.
Cardiovasc Res. 2005;68:197–203.
39. Hashemi SM, Hund TJ, Mohler PJ. Cardiac ankyrins in health and
disease. J Mol Cell Cardiol. 2009;47:203–209.
40. Qiu H, Dai H, Jain K, Shah R, Hong C, Pain J, Tian B, Vatner DE, Vatner
SF, Depre C. Characterization of a novel cardiac isoform of the cell
cycle-related kinase that is regulated during heart failure. J Biol Chem.
2008;283:22157–22165.
41. Genomics of Cardiovascular Development, Adaptation, and Remodeling.
NHLBI Program for Genomic Applications, Harvard Medical School.
http:///www.cardiogenomics.org. Accessed December 16, 2009.
Novelty and Significance
What Is Known?
●
●
Accurate, unbiased gene transcription profiling is necessary for
understanding disease mechanisms and can assist in determining
diagnosis and prognosis.
Large-scale transcriptional profiling has been performed using microarrays for several years, but expense, limited dynamic range,
and background correction remain problematic.
What New Information Does This Article Contribute?
●
●
●
●
Next-generation (high-throughput) sequencing of RNA was compared
to microarray analysis in a case– control design, using the
established Gq-overexpression mouse model of heart failure.
DNA-barcoding of individual heart samples permits multiple samples
to be sequenced simultaneously, bringing the cost of RNA
sequencing below that of microarrays.
RNA sequencing provides quantitative gene expression data and
reveals a greater dynamic range of gene regulation.
RNA sequencing offers insight into regulation of low-abundance
genes that microarrays cannot.
RNA profiling offers detailed insight into critical transcriptional
regulatory mechanisms in health and disease. The value of
transcriptional profiling is affected by the accuracy of the data
and the sensitivity to detect changes in expression of uncommon
mRNAs. RNA sequencing using massively parallel next generation platforms has the potential to enhance accuracy and depth
of transcriptional signatures. We compared cardiac gene expression profiling using RNA sequencing and the present
state-of-the art microarrays, applying both platforms to analysis
of mRNA expression in the Gq-overexpression mouse model of
cardiac hypertrophy. We describe how to individually DNA
barcode RNA sequencing libraries from individual hearts for
batch sequencing that reduces cost while providing digital
mRNA expression data at or below the level of 1 RNA copy per
cell. We found that RNA sequencing and microarrays provided
comparable data on regulation of high-abundance genes, but
that RNA sequencing was superior for detection and quantitation
of low-abundance genes, which represent the majority of
regulated genes in the G␣q model. The widespread implementation of RNA sequencing in disease studies should enhance
diagnostic and prognostic profiling, facilitating a more detailed
description of signaling mechanisms involving low-abundance
genes that were previously missed with microarray.
Deep mRNA Sequencing for In Vivo Functional Analysis of Cardiac Transcriptional
Regulators: Application to G αq
Scot J. Matkovich, Yan Zhang, Derek J. Van Booven and Gerald W. Dorn II
Downloaded from http://circres.ahajournals.org/ by guest on June 15, 2017
Circ Res. 2010;106:1459-1467; originally published online April 1, 2010;
doi: 10.1161/CIRCRESAHA.110.217513
Circulation Research is published by the American Heart Association, 7272 Greenville Avenue, Dallas, TX 75231
Copyright © 2010 American Heart Association, Inc. All rights reserved.
Print ISSN: 0009-7330. Online ISSN: 1524-4571
The online version of this article, along with updated information and services, is located on the
World Wide Web at:
http://circres.ahajournals.org/content/106/9/1459
Data Supplement (unedited) at:
http://circres.ahajournals.org/content/suppl/2010/04/01/CIRCRESAHA.110.217513.DC1
Permissions: Requests for permissions to reproduce figures, tables, or portions of articles originally published
in Circulation Research can be obtained via RightsLink, a service of the Copyright Clearance Center, not the
Editorial Office. Once the online version of the published article for which permission is being requested is
located, click Request Permissions in the middle column of the Web page under Services. Further information
about this process is available in the Permissions and Rights Question and Answer document.
Reprints: Information about reprints can be found online at:
http://www.lww.com/reprints
Subscriptions: Information about subscribing to Circulation Research is online at:
http://circres.ahajournals.org//subscriptions/
Supplemental Material
Detailed Methods
Mouse models
The Gαq-40 transgenic mice were generated on the FVB/N background in the Dorn laboratory,
and have been described previously 1, 2. Four pairs of 8 week old male nontransgenic FVB/N and Gαq-40
transgenic mouse hearts were used for RNA sequencing studies. Echocardiographic and cardiac
catheterization studies were performed using standard methods 1, 3.
Sequencing and data processing
Four barcoded libraries were combined in equimolar (10 nmol/L) amounts and diluted to 4
pmol/L for cluster formation on a single Illumina Genome Analyzer II flowcell lane. Basecalling of DNA
clusters was performed using Illumina’s processing pipeline software (version 1.5) and 36-nt sequences,
with quality scores, were obtained in Illumina’s SCARF text format. The first 4 nucleotides of each 36-nt
read corresponded to barcodes (3 nucleotides with an additional, invariant T). The UNIX functions grep,
sed and awk were used to sort the sequences according to barcode, remove the barcoding nucleotides, and
convert sequence data to FASTQ format. After barcode removal, the 32 base mRNA sequence reads
were mapped to transcripts annotated in NCBI release 37 of the mouse genome using the publicly
available packages Bowtie (release 0.12.0) (http://bowtie-bio.sourceforge.net/index.shtml) 4, (release
1.0.12) (http://tophat.cbcb.umd.edu/) 5, and Cufflinks (release 0.7.0) (http://cufflinks.cbcb.umd.edu/) 6.
Bowtie maps short reads to reference sequences such as whole genomes. TopHat and Cufflinks map
known and novel splice junctions, use annotation files to compute which aligned sequences map to the
known transcriptome, and take into account transcript isoform diversity (alternative splicing). Cufflinks
may be used with gene annotation files to calculate overall gene expression in terms of Reads Per Kb of
exon per Million mapped reads (RPKM), a parameter previously defined in 7. (The current release of
Cufflinks (0.8.1) uses the term Fragments Per Kb of exon per Million mapped reads (FPKM) 6, which is
equivalent to RPKM for single-end sequencing, as we have performed here.) We used the default options
supplied with these software packages in our analyses.
Annotation files in gtf format (http://mblab.wustl.edu/GTF22.html), including mRNAs, ESTs and
noncoding elements such as ribosomal RNAs and miRNA precursors, were downloaded from Ensembl
(ftp://ftp.ensembl.org/pub/current_gtf/mus_musculus/). We used the Cufflinks module (above) with
annotation files from which ESTs, rRNAs and miRNA precursors had been manually removed, to focus
on sequence reads mapping to coding genes. We analyzed only those RNA elements that had expression
signals in at least 2 of 4 biological replicates.
The gtf annotation files from Ensembl contain all possible separate transcripts (identified with
ENST labels) for each gene (identified with ENSG labels). In order to assess alternative splicing events,
we used the Cufflinks module with these same gtf annotation files, but forced calculation of RPKM
values for separate transcripts (ENSTs) rather than for entire genes (ENSGs). The presence of
alternatively spliced products was evaluated by manually examining Cufflinks output files for ENST
entries corresponding to an ENSG entry of interest.
Example shell script for extraction of two barcoded sequences from Illumina pipeline 1.5 SCARF text file
(rawsequence.txt), removal of 4-nucleotide barcodes, and conversion to FASTQ format
Extract from SCARF text file (rawsequence.txt):
1
…
HWIEAS158:7:1:18:1532#0/1:ATGTGCCGGGCGAGCCGCTCTGCGCTAGGCGCTCAG:ab`aaaabbabb`ba
ababaaa`aaaaaaaaa`aaa
HWI-EAS158:7:1:17:190#0/1:TGATGTTGACTTTGTCCACCTGGAACTTGGTCTCAA:
aaaaabaaaaaaaaa`a`a^__`aa`^`a]a]_`__
…
Shell script:
grep ':ATGT' rawsequence.txt > ATGT-barcoded.seq
sed 's/:ATGT/:/' ATGT-barcoded.seq | awk '{gsub(":","\t");print}' | awk '{print "@" $1 "_" $2 "_" $3 "_"
$4 "_" $5 "\n" $6 "\n" "+" "\n" substr($7,5,36) }' > ATGTsequences.fastq
grep ':TGAT' rawsequence.txt > TGAT-barcoded.seq
sed 's/:TGAT/:/' TGAT-barcoded.seq | awk '{gsub(":","\t");print}' | awk '{print "@" $1 "_" $2 "_" $3 "_"
$4 "_" $5 "\n" $6 "\n" "+" "\n" substr($7,5,36) }' > TGATsequences.fastq
Example shell script for mapping of FASTQ sequences to the mouse genome by Tophat, followed by
annotation and RPKM calculations for whole genes, using Cufflinks with a gtf annotation file
tophat --solexa1.3-quals -o ~/ATGTsequences-processed ~/bowtie_database/m_musculus/indexname
~/ATGTsequences.fastq
cd ~/ATGTsequences-processed
cufflinks -G ~/bowtie_database/m_musculus/index.gtf accepted_hits.sam
Example shell script for mapping of FASTQ sequences to the mouse genome by TopHat, followed by
annotation and RPKM calculations for separate transcript components to permit alternative splicing
analysis
tophat --solexa1.3-quals -o ~/ATGTsequences-processed ~/bowtie_database/m_musculus/indexname
~/ATGTsequences.fastq
cd ~/ATGTsequences-processed
cufflinks accepted_hits.sam
cuffcompare -o summstats.txt -r ~/bowtie_database/m_musculus/index.gtf transcripts.gtf
Reverse-transcription quantitative PCR
One microgram of total cardiac RNA was reverse-transcribed into cDNA using oligo-dT priming
(qScript Flex cDNA kit, Quanta Biosciences). One-twentieth of each preparation was used for each
individual qPCR, using one of the following TaqMan probes (Applied Biosystems): Nppa,
Mm01255747_g1; Atp2a2, Mm00437634_m1; Acta1, Mm00808218_g1; Myh7, Mm00600555_m1;
Aqp4, Mm00802131_m1; Bcl2, Mm00477631_m1; Edn3, Mm00432986_m1; Fhl2, Mm00515781_m1;
Nrg1, Mm00626552_m1; Tac1, Mm00436880_m1; Tuba1a, Mm00846967_g1; Gapdh, 4352339E.
2
Supplemental Figure I. Correspondence of gene expression determined by RNA sequencing vs
microarray, for 8 individual hearts.U Gene expression determined by RNA sequencing (Illumina) is
plotted against gene expression determined by microarrays (Affymetrix). Values shown are
log2(RPKM+1) for RNA sequencing (x-axis) and log2(Affymetrix signal units+1) for microarrays. A
value of 1 was added to RPKM or Affymetrix signal units to avoid taking the log of 0.
3
Supplemental Table I: Sequencing read totals and alignment to reference genome for barcoded RNASeq samples.
Lane 1
ntgATGT ntgTGAT ntgTGCT ntgTGTT Total
Recognized
barcoded reads
Total raw reads
% barcode
detection
Reads aligned to
reference
genome
% alignment
3426712
2349661
1735078
1694057
9205508
11216862
Lane 2
GqTCAT
GqTCCT
2882044
2020058
GqTCTT GqTGGT Total
1343763
82.1
1885165
55.0
1546833
65.8
1304290
75.2
766577
45.3
5502865
59.8
1676644
58.2
1334172
66.0
411077
30.6
Both lanes
Total
2765911 9011776
9591887
18217284
20808749
94.0
87.5
1607088 5028981
58.1
55.8
10531846
57.8
Supplemental Table I. Alignment of sequencing reads to the mouse genome. The number of sequencing reads with recognizable barcodes was
compared to the total number of sequencing reads obtained for the entire sequencing lane. In addition, the percentage of barcoded reads that were
successfully aligned to the mouse genome was calculated.
4
Supplemental Table II is supplied as an Excel datasheet.
Supplemental Table II. Highly expressed genes in nontransgenic mouse heart, determined by RNA
sequencing. The 234 highest-expressing genes (equal or greater than 60 copies per cell) in nontransgenic
mouse hearts (mean ± SEM, n=4) are shown, ranked from highest to lowest expression. 1 copy per cell
corresponds to an RPKM value of 3 according to Mortazavi et al. 7.
Supplemental Table III is supplied as an Excel datasheet.
Supplemental Table III. Highly expressed genes in Gq mouse hearts, determined by RNA sequencing.
The 236 highest-expressing genes (equal or greater than 60 copies per cell) in Gq transgenic mouse hearts
(mean ± SEM, n=4) are shown, ranked from highest to lowest expression.
Supplemental Table IV is supplied as an Excel datasheet.
Supplemental Table IV. Complete gene expression profile in individual nontransgenic and Gq mouse
hearts, determined by RNA sequencing. All genes detected by RNA sequencing in individual hearts are
shown, ranked from highest to lowest copy number per cell in nontransgenic hearts (mean ± SEM).
Genes detected in only 1 of the 4 biological replicates (no value for standard error) were excluded from
further analysis.
5
Supplemental Table V: Highly-expressed genes in mouse hearts by GO category
GO-ID
6915
7166
Description
apoptosis
cell surface
receptor linked
signal
transduction
Genes
ACTC1,BAG3,ITM2B,NDUFA13,PSAP,SOD2,TNS1,VDAC1
AES,CRYAB,GNAS,GNB2L1
5856
cytoskeleton
5783
endoplasmic
reticulum
glucose
metabolic process
lipid metabolic
process
ACTA1,ACTB,ACTC1,ACTN2,CSRP3,DES,GSN,HSPB7,IVNS1A
BP,LDB3,LMOD2,MAP1LC3A,MYBPC3,MYH6,MYH7,MYL2,M
YL3,MYL4,MYL7,MYOM1,MYOZ2,PAM,TNNT2,TPM1,TTN,TU
BA4A
ATP2A2,HERPUD1,HRC,MGST3,PLN,PTGDS,S100A1,SLN,SRL
6006
6629
5746
mitochondrial
respiratory chain
5739
mitochondrion
30016
myofibril
5634
nucleus
ALDOA,ENO3,LDHA,LDHB,MDH1,MDH2,OGDH,PDHA1,PDK2,
PFKM,PGAM2,TPI1
ACAA2,ANKRD23,APOE,ACADL,ACADM,ATP5A1,ACADVL,A
TP5B,CPT1B,ECH1,FABP3,HADHA,LPL,PHYH,PSAP,PTGDS,TP
I1
COX6B1,COX7B,CYC1,COX6C,COX7A1,MT-CO1,COX7A2,MTCO2,MT-CYTB,MT-ND3,MTND4,NDUFA1,NDUFA10,NDUFA11,NDUFA13,NDUFA2,NDUFA
3,NDUFA4,NDUFA6,NDUFA8,NDUFB10,NDUFB5,NDUFB7,ND
UFB8,NDUFB9,NDUFC1,NDUFS2,NDUFS6,NDUFS7,NDUFV1,N
DUFV2,UQCR,UQCRC1,UQCRC2,UQCRFS1,UQCRH,UQCRQ
ACAA2,ACO2,AK1,ACADL,ACADM,ATP5A1,ACADVL,ATP5B,
ATP5C1,ATP5D,ATP5E,ATP5F1,ATP5G1,ATP5G3,ATP5H,ATP5J
,ATP5J2,ATP5K,ATP5O,BRP44,CHCHD10,CKMT2,COQ9,COX17
,COX4I1,COX5A,COX5B,COX6A2,COX6B1,COX6C,COX7A1,C
OX7A2,COX7B,COX8B,CPT1B,CS,CTSD,CYC1,ECH1,ETFB,GB
AS,HADHA,HSP90AB1,IDH2,IDH3B,IDH3G,LARS2,MDH2,MTATP6,MT-CO1,MT-CO2,MT-CYTB,MT-ND1,MT-ND3,MTND4,MT-ND6,
NDUFA1,NDUFA10,NDUFA11,NDUFA13,NDUFA2,NDUFA3,ND
UFA4,NDUFA6,NDUFA8,NDUFB10,NDUFB5,NDUFB7,NDUFB8
,NDUFB9,NDUFC1,NDUFS2,NDUFS6,NDUFS7,NDUFV1,NDUF
V2,NNT,OGDH,OXCT1,PDHA1,PDK2,PINK1,PSAP,SDHA,SDHB
,SDHC,SLC25A3,SLC25A4,SOD2,UQCR,UQCRC1,UQCRC2,UQC
RFS1,UQCRH,UQCRQ,VDAC1
ACTA1,ACTC1,ACTN2,ANKRD1,ANKRD23,CRYAB,CSRP3,DE
S,HSPB1,LDB3,MYBPC3,MYH6,MYH7,MYL2,MYOM1,MYOZ2,
PDLIM5,PYGM,TCAP,TNNI3,TNNT2,TPM1,TTN
AES,ANKRD1,ANKRD23,CS,CSRP3,EEF1A2,FABP4,FHL2,HOP
X,HSPB1,IVNS1ABP,LDB3,NDUFA13,PAM,PDE4DIP,PTGDS,TT
N
6
GO-ID
7242
6950
Description
intracellular
signaling cascade
regulation of
transcription
response to stress
6416
6810
translation
transport
45449
Genes
ATP2A2,GNAS,GNB2L1,MT1,PINK1,PSAP,SOD2,TNS1
AES,ANKRD1,FABP4,FHL2,HOPX,PAM
ACADM,APOE,CRYAB,CTSD,GPX3,HERPUD1,HSP90AB1,HSP
B1,HSPB6,HSPB7,HSPB8,IDH2,MB,SOD2,TNS1,VDAC1
EEF1A2,EEF2,LARS2,RPL8,RPLP1,RPS5,RPS9
APOE,ATP1A1,ATP2A2,ATP5A1,ATP5B,ATP5C1,ATP5D,ATP5E
,ATP5F1,ATP5G1,ATP5G3,ATP5H,ATP5J,ATP5J2,ATP5K,ATP5O
,COX17,CPT1B,CYC1,ETFB,FABP3,FABP4,FTH1,FXYD1,GSN,H
BA-A1,HBA-A2,HBB-B1,HBB-B2,MB,MT-ATP6,MT-CO1,MTCO2,MT-CYTB,MT-ND3,MTND4,MYH6,NDUFA1,NDUFA10,NDUFA11,NDUFA13,NDUFA2,
NDUFA3,NDUFA4,NDUFA6,NDUFA8,NDUFB10,NDUFB5,NDU
FB7,NDUFB8,NDUFB9,NDUFC1,NDUFS2,NDUFS6,NDUFS7,ND
UFV1,NDUFV2,NNT,PLN,PSAP,PTGDS,SDHA,SDHB,SDHC,SLC
25A3,SLC25A4,TUBA4A,UQCR,UQCRC1,UQCRC2,UQCRFS1,U
QCRH,UQCRQ,VDAC1
Supplemental Table V. Genes expressed at greater than 60 copies per cell in nontransgenic hearts,
determined by RNA sequencing, are presented by Gene Ontology categories (GO-IDs).
7
Supplemental Table VI is supplied as an Excel datasheet.
Supplemental Table VI. Differential gene expression in Gq-overexpressing hearts compared to
nontransgenic hearts, as determined by microarrays. Genes were defined as differentially expressed at
P<0.001 (FDR<0.03), magnitude of fold-change ≥1.3. Genes are ranked by fold-change in expression,
from highest to lowest. Data from RNA sequencing for the same genes are shown, and significant
regulation reported by RNA sequencing is indicated.
8
Supplemental Table VII: Differential gene expression in Gq-overexpressing compared to
nontransgenic hearts
Gene
ntg
Gq
symbol
copies/cell copies/cell
Upregulated
Gnaq
2
574
Nppa
43
2604
Gdf15
2
62
Fgf6
0.4
5
Thbs4
0.7
6
Acta1
127
897
Synpo2l
20
117
Bcl2
0.4
2
Nppb
124
635
Phlda1
7
30
Scgb1c1
4
16
Rcan1
19
75
Etv5
1
4
Arc
0.3
1
Casq1
11
41
Klhl34
2
6
Dusp27
3
9
Dbn1
1
4
Nuak1
5
16
Ankrd1
204
671
Cyr61
15
47
Ppm1e
0.5
2
Fbxl7
0.2
0.6
Tnfrsf12a
20
64
Ckap4
3
10
Mybpc2
4
13
Otud1
7
22
Masp1
2
6
Spry4
1
3
Irx5
0.7
2
Uck2
5
13
Col1a1
11
29
Col15a1
9
23
Abra
11
29
Pfkp
4
9
Fold-change
(RNASeq)
p-value
(RNASeq)
Fold-change
(array)
p-value
(array)
regulated
(array)
365.9
60.2
26.5
11.2
8.6
7.0
5.9
5.4
5.1
4.5
4.1
4.0
3.7
3.7
3.6
3.6
3.4
3.3
3.3
3.3
3.2
3.1
3.1
3.1
3.0
3.0
3.0
2.9
2.8
2.8
2.8
2.7
2.6
2.6
2.6
1.4E-06
5.9E-04
4.2E-04
3.4E-05
6.3E-04
2.5E-05
1.4E-04
1.6E-04
2.0E-06
8.0E-04
7.5E-04
6.5E-05
4.4E-04
4.4E-04
1.3E-05
7.1E-04
7.7E-04
5.6E-06
2.8E-04
2.0E-06
5.3E-05
2.0E-04
3.4E-04
2.9E-04
9.3E-05
1.6E-05
3.1E-05
6.8E-05
8.2E-05
1.8E-04
6.4E-04
3.2E-04
6.2E-04
1.1E-04
8.6E-05
15.1
13.4
4.0
4.1
9.7
2.6
2.3
2.8
3.0
1.9
2.7
2.3
2.6
1.1
2.5
1.6
2.4
1.6
2.4
1.5
2.3
2.8
1.6
2.3
1.9
2.3
2.4
1.7
1.3
1.4
2.3
2.3
1.7
2.1
1.7
6.9E-15
2.6E-10
9.8E-07
7.5E-08
3.8E-09
1.8E-06
5.1E-07
9.3E-09
1.0E-07
9.4E-05
1.2E-05
1.1E-06
1.0E-05
2.3E-01
8.4E-09
1.8E-03
7.6E-06
5.3E-07
1.4E-05
9.3E-06
4.0E-05
3.4E-07
3.1E-05
3.9E-07
2.1E-05
1.2E-06
2.6E-04
1.4E-04
3.3E-02
1.2E-03
5.3E-04
2.0E-09
1.1E-05
8.8E-06
1.7E-04
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
9
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Gene
symbol
Arhgap24
Rasa3
Frmd6
Napepld
Thap1
Xirp2
Serpinb6b
Itgb5
Surf6
Tjap1
Camta1
Itga9
Qsox1
Btg2
Hn1
Nfatc2
Myom2
Enah
Nrap
Col1a2
Lrrc10
Prkab2
Slc38a2
Tsc22d1
Heyl
Ttpal
Grb10
Sorbs2
Prmt2
Mmp2
Mybpc3
Lamb2
Bambi
Myh10
Habp4
Plxnb2
Msn
Hspb6
Dnaja4
Plxna1
ntg
Gq
copies/cell copies/cell
2
4
1
3
2
4
0.3
0.6
0.9
2
19
44
3
7
10
22
0.6
1
5
12
5
12
4
9
4
9
4
8
12
26
2
3
66
135
18
37
55
112
11
21
55
105
3
5
26
46
32
57
2
4
2
3
11
19
39
67
2
3
8
13
261
424
41
65
1
2
2
3
5
8
9
14
11
17
182
267
13
20
2
3
Fold-change
(RNASeq)
2.4
2.4
2.4
2.3
2.3
2.3
2.3
2.3
2.2
2.2
2.2
2.2
2.2
2.1
2.1
2.0
2.0
2.0
2.0
1.9
1.9
1.9
1.8
1.8
1.8
1.8
1.7
1.7
1.7
1.7
1.6
1.6
1.6
1.6
1.5
1.5
1.5
1.5
1.5
1.4
10
p-value
(RNASeq)
5.0E-04
7.7E-05
2.4E-04
9.6E-04
1.1E-03
4.8E-04
1.7E-04
7.8E-05
2.4E-05
4.8E-04
5.6E-05
3.0E-04
1.2E-04
6.8E-04
1.6E-04
2.4E-04
1.8E-05
3.1E-04
4.8E-05
7.6E-04
2.3E-04
1.1E-03
6.6E-04
8.3E-04
1.0E-03
9.5E-04
5.5E-05
3.3E-04
9.1E-04
1.5E-04
2.3E-05
5.6E-04
7.5E-04
7.4E-04
5.5E-04
1.0E-03
6.1E-04
5.6E-04
4.1E-04
7.5E-04
Fold-change
(array)
1.2
1.5
1.5
1.5
-1.1
2.3
1.9
1.8
1.0
1.7
1.7
1.8
1.5
1.7
1.7
1.5
1.6
1.7
1.6
1.9
1.7
1.5
1.5
1.3
1.3
1.3
1.4
1.6
1.4
1.5
1.2
1.3
1.2
1.4
1.2
1.3
1.5
1.0
1.3
1.1
p-value
(array)
3.2E-03
5.6E-04
6.4E-04
7.9E-04
2.5E-01
2.9E-07
1.9E-07
3.5E-08
4.7E-01
2.2E-05
3.4E-07
8.4E-07
1.4E-06
9.4E-04
6.1E-06
1.5E-05
2.7E-03
2.2E-05
7.4E-06
3.5E-08
8.9E-05
1.5E-03
3.4E-05
5.9E-04
2.9E-05
1.5E-04
4.3E-05
5.9E-05
1.3E-04
9.0E-05
7.1E-05
3.2E-05
5.7E-03
2.6E-05
2.2E-03
2.1E-04
8.8E-07
7.9E-01
1.0E-03
2.0E-01
regulated
(array)
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Gene
ntg
Gq
symbol
copies/cell copies/cell
Cxcr7
10
14
Arl6ip5
20
26
Downregulated
Idh3b
85
63
Dlat
27
20
Fmo1
5
4
Ank
21
15
Pdha1
78
54
Hadha
128
88
Acadvl
99
68
Strn3
6
4
Rxrg
8
5
Fyco1
22
13
Sh3kbp1
8
5
Ndufv2
99
58
Rhobtb1
28
16
Mapkapk3
3
2
Atp2a2
811
458
Acaa2
96
51
Npc1
8
4
Bcs1l
4
2
Qrsl1
3
2
Hsdl2
15
7
Nphp3
18
9
Aldh6a1
20
10
Decr1
35
17
Suclg2
11
5
Epm2aip1
2
1
Grb14
24
12
Car14
20
9
Mtr
23
10
Fah
6
2
Nqo1
8
3
Scn4a
3
1
A2bp1
7
2
Slc22a3
0.8
0.3
Lrtm1
33
12
Pex11a
2
0.8
Olfml2b
5
2
Mylk4
14
5
Fold-change
(RNASeq)
1.4
1.3
p-value
(RNASeq)
7.2E-04
2.0E-04
Fold-change
(array)
1.1
1.1
p-value
(array)
4.0E-01
4.5E-01
-1.3
-1.3
-1.4
-1.4
-1.4
-1.4
-1.5
-1.5
-1.6
-1.6
-1.7
-1.7
-1.7
-1.7
-1.8
-1.9
-1.9
-1.9
-2.0
-2.0
-2.0
-2.0
-2.0
-2.0
-2.1
-2.1
-2.2
-2.3
-2.5
-2.6
-2.6
-2.6
-2.8
-2.8
-2.8
-2.9
-3.1
6.6E-04
1.0E-03
5.0E-04
2.8E-04
9.3E-04
7.3E-04
1.1E-04
3.8E-04
6.9E-04
4.8E-04
9.1E-04
3.6E-04
8.8E-04
8.1E-04
1.8E-04
4.6E-05
2.7E-05
5.4E-04
2.3E-04
8.2E-04
7.6E-05
2.4E-04
5.3E-04
8.1E-04
6.2E-04
2.5E-04
2.0E-04
3.2E-04
9.3E-05
8.6E-04
6.9E-05
4.8E-04
3.3E-04
1.9E-04
3.8E-04
6.6E-04
2.7E-04
-1.2
-1.2
-1.2
-1.3
-1.2
-1.3
-1.6
1.1
-1.5
-1.5
-1.2
-1.2
-1.4
-1.6
-1.2
-1.7
-1.8
-1.1
-1.5
-1.5
1.0
-1.5
-1.6
-1.5
-1.1
-1.7
-2.0
-1.0
-2.3
-1.7
-2.5
-1.9
-2.2
-2.6
-1.8
-1.9
-2.1
1.2E-04
3.6E-04
1.6E-01
9.7E-06
2.5E-05
5.3E-06
2.4E-07
1.2E-01
9.5E-05
8.9E-07
1.0E-03
2.6E-04
1.4E-04
9.3E-05
2.1E-05
3.9E-07
4.4E-07
3.4E-02
2.5E-04
5.2E-06
9.6E-01
3.7E-07
1.6E-07
1.5E-06
9.5E-02
4.1E-08
1.2E-07
7.3E-01
3.9E-05
2.2E-05
7.0E-06
4.0E-07
2.4E-06
6.8E-07
6.6E-05
2.3E-06
1.1E-05
11
regulated
(array)
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Y
Gene
symbol
Fhl2
Slc38a3
Nr4a3
Efnb3
Dusp18
Scn4b
Whrn
Dixdc1
Ces3
Cfd
Tmem163
ntg
Gq
copies/cell copies/cell
167
50
10
3
3
0.7
11
3
7
1
5
1
4
0.7
0.5
0.1
19
2
9
0.7
5
0.2
Fold-change
(RNASeq)
-3.4
-3.5
-3.5
-3.9
-5.2
-5.2
-5.3
-6.3
-10.0
-12.6
-28.3
p-value
(RNASeq)
8.7E-05
1.6E-05
8.3E-05
4.7E-05
6.7E-05
3.8E-04
6.8E-04
3.3E-04
2.5E-04
1.2E-04
6.9E-04
Fold-change
(array)
-2.6
-2.5
-2.5
-2.4
-4.2
-3.9
-2.2
-1.5
-7.7
1.2
1.2
p-value
(array)
5.1E-08
8.3E-07
1.7E-04
4.5E-07
1.6E-07
3.9E-06
6.7E-07
1.9E-04
2.4E-11
1.7E-02
5.7E-02
regulated
(array)
Y
Y
Y
Y
Y
Y
Y
Y
Y
Supplemental Table VII. Differential gene expression in Gq-overexpressing compared to nontransgenic
hearts. 125 genes were defined as differentially expressed using RNA sequencing (P<0.001, FDR=0.1,
magnitude of fold-change ≥1.3). Genes are listed with numbers of copies per cell, and ranked by foldchange in expression, from highest to lowest. Data from microarrays for the same genes are shown, and
significant regulation reported by microarrays (P<0.001, fold-change ≥1.3) is indicated.
12
Supplemental Table VIII is supplied as an Excel datasheet.
Supplemental Table VIII. Ingenuity Pathways Analysis networks. The 77 upregulated and 48
downregulated genes in Gq hearts, defined by RNA sequencing, were used as input for automated
signaling network analysis. Gene input lists were subdivided into a high-expressing (>20 copies/cell) set
(first datasheet tab), a low-expressing (<20 copies/cell) set (second datasheet tab), or not divided (third
worksheet tab). The ‘Focus Molecules’ column denotes the number of genes from the input lists that
participate in a given signaling network.
13
Barcode
sense oligo
Barcode
antisense
oligo
Forward
PCR primer
Reverse
PCR primer
Sequencing
primer
5'-ACACTCTTTCCCTACACGACGCTCTTCCGATCTagtT-3'
5Phos-actAGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG-3'
5'AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGAC
GCTCTTCCGATCT-3'
5'-CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT-3'
5'-ACACTCTTTCCCTACACGACGCTCTTCCGATCT-3'
Supplemental Table IX. Nucleotide sequences for custom ‘barcoded’ Illumina sequencing adapters.
Top: Example oligonucleotides for Illumina sequencing adapters, using the barcode ‘agt’. Middle:
Generic primers, independent of barcode, used for PCR amplification of gel-purified Illumina libraries.
Bottom: Generic oligonucleotide for sequencing libraries on the Illumina Genome Analyzer.
14
References
1.
2.
3.
4.
5.
6.
7.
D'Angelo DD, Sakata Y, Lorenz JN, Boivin GP, Walsh RA, Liggett SB, Dorn GW, II.
Transgenic Gαq overexpression induces cardiac contractile failure in mice. Proc Natl
Acad Sci USA. 1997;94:8121-8126.
Dorn GW, II, Tepe NM, Lorenz JN, Koch WJ, Liggett SB. Low- and high-level
transgenic expression of β2-adrenergic receptors differentially affect cardiac hypertrophy
and function in Gαq-overexpressing mice. Proc Natl Acad Sci USA. 1999;96:6400-6405.
Diwan A, Wansapura J, Syed FM, Matkovich SJ, Lorenz JN, Dorn GW, II. Nix-mediated
apoptosis links myocardial fibrosis, cardiac remodeling, and hypertrophy
decompensation. Circulation. 2008;117:396-404.
Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment
of short DNA sequences to the human genome. Genome Biol. 2009;10:R25.
Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq.
Bioinformatics. 2009;25:1105-1111.
Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL,
Wold BJ, Pachter L. Transcript assembly and abundance estimation from RNA-Seq
reveals thousands of new transcripts and switching among isoforms. Nat Biotechnol.
2010;in press.
Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying
mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5:621-628.
15