Biochemical Society Transactions (2003) Volume 31, part 6
Extracting meaning from microarray data
R.K. Curtis¹ and M.D. Brand
MRC Dunn Human Nutrition Unit, Hills Road, Cambridge CB2 2XY, U.K.
Abstract
Gene expression is complex: many mRNAs change in abundance in response to a new condition. But while
some of these expression changes may be direct, many may be downstream, indirect effects. One of the
major problems of microarray data analysis is distinguishing between these changes. Some of the most
common methods of analysis are discussed, in the context of their ability to distinguish between direct
and indirect expression changes. The application of modular control analysis to microarray data in order to
partition and quantify the importance of mRNA clusters in mediating responses is described.
Introduction
Microarrays can be used to profile the expression of every
gene in a sample. Their use is increasing and microarrays
have been used to investigate many aspects of biology, such as
the cell cycle [1], differentiation [2], and how environmental
stress [3] and diseases such as cancer [4] affect gene expression.
Microarray experiments reveal that hundreds and sometimes
thousands of mRNAs change in abundance between
experimental conditions. The datasets produced are very large, and
there are many different ways of analysing the data produced,
depending upon the aim of the experiments. For example,
Alizadeh et al. [4] were interested in diagnosing cancer
subgroups by classifying samples, Hughes et al. [3] predicted
the function of novel genes, and Ideker et al. [5] used
expression data to investigate the regulation of a metabolic
pathway.
Microarray data analysis
When cells are given a stimulus (such as the addition of a
hormone) and a response (such as growth rate) is measured,
there may be more than one pathway involved in effecting that
response. Some of these pathways may require transcription
of genes. If gene expression during a response has been
profiled using microarrays, often many mRNAs change in
abundance, revealing that gene expression is complex. The
stimulus may have direct effects on the mRNAs, for example a
hormone binds a transcription factor, which directly increases
the expression of three mRNAs. There may also be indirect
effects: the transcription factor regulates those three mRNAs,
which in turn indirectly regulate hundreds of other genes.
Both the direct and indirect effects may be involved in causing
the eventual response to the new experimental condition. One
of the major problems of microarray data analysis is finding
a method of analysis that is able to distinguish between these
changes, and can quantify the importance of each of
these direct and indirect effects on the end response, and
on the cascade of mRNA responses.
Visual inspection
The simplest way to analyse microarray data is to look at the
mRNAs that change. But if the expression of a gene of interest
changes, you still do not know if this is a direct change or a
downstream indirect effect, or how important it is in causing
the response. However, this is the commonest method in
practice, with large changes in the expression of a set of
mRNAs thought to be relevant greeted with satisfaction, and
the lack of such changes greeted with disappointment and
frustration. Nonetheless, this method does throw up
unexpected genes for further study.
Key words: clustering, metabolic control analysis, microarray, modular control analysis.
¹To whom correspondence should be addressed (e-mail [email protected]).
© 2003 Biochemical Society
Clustering
Clustering is often used to predict the function of unknown
genes, on the basis that genes with similar functions tend to
be co-expressed and, in that sense, cluster together [6]. There
are several types of cluster analysis.
Hierarchical clustering involves repeatedly merging
mRNAs or mRNA clusters, based on a distance measure,
to form a new cluster. The distance measures reflect the
absolute expression values (Euclidean), or expression trends
(correlation-based). Genes of similar function tend to cluster
together, and clustering is widely used to predict the function
of novel genes [3], and to classify samples such as tumours
[4]. Two-dimensional clustering (or biclustering) is used by
Kluger et al. [7]. This involves clustering by both gene expression and by sample in order to improve the overall result.
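The difference between the two families of distance measure can be illustrated with a minimal Python sketch. It is not taken from any of the cited studies: the expression profiles are invented, the correlation-based distance is taken to be one minus the Pearson correlation (one common choice, assumed here), and average linkage is assumed as the rule for comparing clusters.

```python
import math

def euclidean_distance(a, b):
    """Distance based on absolute expression values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def correlation_distance(a, b):
    """1 - Pearson correlation: small when two profiles share a trend."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)

def average_linkage(c1, c2, profiles, dist):
    """Mean distance between all pairs of members of two clusters (assumed rule)."""
    pairs = [(i, j) for i in c1 for j in c2]
    return sum(dist(profiles[i], profiles[j]) for i, j in pairs) / len(pairs)

def agglomerate(profiles, dist, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        _, i, j = min(
            (average_linkage(clusters[i], clusters[j], profiles, dist), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters

# Hypothetical expression profiles over four conditions.
profiles = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [11.0, 12.0, 13.0, 14.0],  # same trend as gene_a, higher level
    "gene_c": [4.0, 3.0, 2.0, 1.0],      # similar level to gene_a, opposite trend
}

by_level = agglomerate(profiles, euclidean_distance, 2)
by_trend = agglomerate(profiles, correlation_distance, 2)
```

With these invented profiles, the Euclidean measure groups gene_a with gene_c (similar absolute values), whereas the correlation-based measure groups gene_a with gene_b (same trend), so the choice of distance measure determines what "similar" means.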
Self-organizing maps [8] use a neural network learning
algorithm to group mRNAs into clusters. As with hierarchical clustering, co-expression is used to predict functions of
unknowns.
Cluster analysis is able to describe patterns of expression,
to find genes which are co-expressed, and to discover new or
unexpected genes that may be worth investigating to see if
they are important in causing the response. However, these
clustering methods cannot distinguish between direct and
indirect expression changes, or identify which mRNAs are
important for the response.
Principal components analysis [9] is used to find the genes
(or experiments) that explain the differences in observations
of a multidimensional dataset. It can be used to summarize
the observed variability into a small number of components
(variables), but does not provide information about how
those changes propagate to a system response from a stimulus.
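As a rough illustration of how a dataset can be summarized by a component, the Python sketch below estimates the first principal component by power iteration on the covariance matrix. This is a simple stand-in, not the procedure of the cited study; the function name and the data are hypothetical.

```python
import math

def first_principal_component(rows, iterations=100):
    """Estimate the direction of greatest variance by power iteration.

    rows: one list of expression values per experiment (columns are mRNAs).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(row[k] for row in rows) / n for k in range(d)]
    centred = [[row[k] - means[k] for k in range(d)] for row in rows]
    # Sample covariance between every pair of mRNAs.
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Start from an arbitrary unit vector; this fails only if the start
    # happens to be orthogonal to the leading component.
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        product = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:  # no variance along the current vector
            break
        vec = [x / norm for x in product]
    return vec

# Hypothetical data: four experiments, two mRNAs, the second always
# changing twice as much as the first.
experiments = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
component = first_principal_component(experiments)
```

Here a single component, weighted one part to the first mRNA and two parts to the second, accounts for all of the variation. The component describes the pattern of variability only; it says nothing about which mRNA drives the other.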
Genetic networks
Gene expression data from knockout experiments can be used
to infer genetic networks [5,10]. Using information about the
expression of which genes change when another is knocked
out, a graph of nodes (genes) connected by edges (interactions
between the genes) is constructed. This predicts relationships between genes, such as transcriptional regulation,
and some methods are able to distinguish between direct and
indirect interactions. However, it is semi-quantitative at best,
and as with cluster analysis, does not describe the transmission of signals/responses through a system.
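The idea of building a graph from knockout data can be sketched in a few lines of Python. The sketch is an illustration only, not the algorithm of refs [5,10]: it assumes an acyclic network, treats any gene whose expression changes in a knockout as a target, and uses one simple heuristic (dropping an edge that can be explained through an intermediate gene) to separate direct from indirect interactions. The gene names and data are hypothetical.

```python
def build_network(knockout_effects):
    """Map each knocked-out gene to the set of genes whose expression changed."""
    return {gene: set(targets) for gene, targets in knockout_effects.items()}

def direct_edges(network):
    """Keep only edges that cannot be explained via an intermediate gene.

    Heuristic, assuming no feedback loops: if knocking out `a` changes `b`
    and knocking out `b` changes `c`, the effect of `a` on `c` is treated
    as indirect.
    """
    direct = {}
    for a, targets in network.items():
        indirect = set()
        for b in targets:
            # Genes never knocked out contribute no downstream information.
            indirect |= network.get(b, set())
        direct[a] = targets - indirect
    return direct

# Hypothetical knockout results: gene -> genes whose expression changed.
knockout_effects = {
    "TF": ["g1", "g2", "g3"],
    "g1": ["g2", "g3"],
    "g2": ["g3"],
}

network = build_network(knockout_effects)
direct = direct_edges(network)
```

For this invented dataset the full graph has six edges, but the heuristic reduces it to the chain TF, g1, g2, g3. The result is purely qualitative: it says which genes are connected, not how much of a response each connection transmits.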
Despite all the methods described above, there is still a
need for a means of analysing this type of microarray data that
is able to distinguish direct and indirect effects, and can find
and quantify pathways that effect a response. Once important
mRNAs have been identified, this may help to indicate the
mechanisms of a response, such as the change to a disease state,
and those genes that are targets for modifying that response.
Modular control analysis
We have been applying modular control analysis [11,12] to
microarray data. This is a subset of metabolic control analysis
[13], which has been used to investigate the control and
regulation of biological systems such as metabolic pathways.
The modular approach involves simplifying a system by
dividing it into a small number of biologically meaningful
modules. Ainscow and Brand [14] investigated the response
of hepatocyte metabolism to the effectors glucagon and
adrenaline. Using control analysis of a nine-module system,
the response was partitioned and quantified, revealing which
metabolic modules responded directly to the effector. Indirect
routes of the response were also quantified, revealing, for
example, that glucagon acted directly on the glucose-release
module, while adrenaline acted by increasing the glucose 6-phosphate concentration, which in turn stimulated glucose
release [14].
Application to microarray data
Modular regulation analysis can be applied to microarray
data in order to partition and quantify the importance of
expression changes that are involved in a response [15]. First,
the microarray data are simplified by grouping mRNAs
together into clusters, based on similarities in their expression
patterns. Next, how the input to the system affects the mRNA
clusters is calculated. The input might be a change in growth
medium, a genetic modification, the addition of an effector
(e.g. hormone, drug or protein), or the change to a new
state (e.g. disease). This acts to change the expression of each
mRNA cluster and is described using integrated response
coefficients, which are easy to calculate from the microarray
data.
How the mRNA clusters affect the output response is
described using elasticity coefficients. The output can be any
quantifiable response, such as the rate of an enzyme; the
concentration of a metabolite; or a physiological marker, such
as growth rate, cell volume or mortality. Expression data from
a series of genetic modulation experiments (knockouts) are
used to calculate elasticity coefficients, which describe how
much an mRNA cluster matters for the output response.
Once integrated responses (how the input affects each
cluster) and elasticity coefficients (how much a cluster matters
for the response) have been calculated, these are multiplied together to give partial response coefficients. These are the main
result of the analysis and describe how much of the response
to the input is transmitted by each mRNA cluster. Clusters
with large partial response coefficients are important for
the response. Once important clusters have been identified,
they can be opened up to see which mRNAs they contain;
these mRNAs are then possible targets for modifying the
response.
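The final multiplication step can be sketched in Python as follows. The coefficient values, cluster names and function name are invented for illustration; the sketch takes the integrated response and elasticity coefficients as already calculated and does not show how they are obtained from the data.

```python
def partial_responses(integrated_responses, elasticities):
    """Multiply, cluster by cluster, the effect of the input on the cluster
    by the effect of the cluster on the output."""
    return {
        cluster: integrated_responses[cluster] * elasticities[cluster]
        for cluster in integrated_responses
    }

# Hypothetical coefficients for three mRNA clusters.
integrated = {"cluster_1": 0.50, "cluster_2": -0.20, "cluster_3": 0.10}
elasticity = {"cluster_1": 0.80, "cluster_2": 0.50, "cluster_3": 0.00}

partial = partial_responses(integrated, elasticity)

# Rank clusters by the size of the response they transmit.
ranked = sorted(partial, key=lambda c: abs(partial[c]), reverse=True)
```

In this invented example, cluster_3 changes in response to the input but has no effect on the output, so its partial response coefficient is zero. Cluster_1 both responds strongly and matters for the output, so it would be the first cluster to open up.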
Conclusions
Unlike other methods, modular control analysis of microarray data does not require any prior information about the
function of genes or the position of knockouts. It can be
used to quantify the importance of mRNAs in physiological
responses such as growth. It is highly applicable to microarray
data and is transferable to proteome and metabolome data.
References
1 Spellman, P.T., Sherlock, G., Zhang, M.Q., Vishwanath, R.I., Anders, K.,
Eisen, M.B., Brown, P.O., Botstein, D. and Futcher, B. (1998) Mol. Biol.
Cell 9, 3273–3297
2 Le Naour, F., Hohenkirk, L., Grolleau, A., Misek, D.E., Lescure, P., Geiger,
J.D., Hanash, S. and Beretta, L. (2001) J. Biol. Chem. 276, 17920–17931
3 Hughes, T.R., Marton, M.J., Jones, A.R., Roberts, C.J., Stoughton, R.,
Armour, C.D., Bennett, H.A., Coffey, E., Dai, H., He, Y.D. et al. (2000) Cell
102, 109–126
4 Alizadeh, A.A., Eisen, M.B., Davis, R.E., Ma, C., Lossos, I.S., Rosenwald, A.,
Boldrick, J.C., Sabet, H., Tran, T., Yu, X. et al. (2000) Nature (London) 403,
503–511
5 Ideker, T., Thorsson, V., Ranish, J.A., Christmas, R., Buhler, J., Eng, J.K.,
Bumgarner, R., Goodlett, D.R., Aebersold, R. and Hood, L. (2001) Science
292, 929–934
6 Eisen, M.B., Spellman, P.T., Brown, P.O. and Botstein, D. (1998) Proc.
Natl. Acad. Sci. U.S.A. 95, 14863–14868
7 Kluger, Y., Basri, R., Chang, J.T. and Gerstein, M. (2003) Genome Res. 13,
703–716
8 Toronen, P., Kolehmainen, M., Wong, G. and Castren, E. (1999) FEBS Lett.
451, 142–146
9 Raychaudhuri, S., Stuart, J.M. and Altman, R.B. (2000) Pac. Symp.
Biocomput. 2000, 455–466
10 Schlitt, T. and Brazma, A. (2002) Comp. Func. Genomics 3, 499–503
11 Brand, M.D. (1996) J. Theor. Biol. 182, 351–360
12 Brand, M.D. (1997) J. Exp. Biol. 200, 193–202
13 Fell, D. (1997) Understanding the Control of Metabolism, Portland Press,
London
14 Ainscow, E.K. and Brand, M.D. (1999) Eur. J. Biochem. 265, 1043–1055
15 Curtis, R.K. and Brand, M.D. (2002) Mol. Biol. Rep. 29, 67–71
Received 30 June 2003