Biopharma
Systems Biology Solutions to Microarray Nightmare
By Teresa Sardón, Cristina Segú and José Manuel Mas at Anaxomics
Systems biology approaches may hold the key to overcoming the difficulties of analysing and interpreting data from microarray technology. Here, a tool for understanding a drug's mechanism of action from a microarray experiment is outlined.
In spite of the numerous benefits that microarray technology brings to pathology characterisation and drug development, it has some associated drawbacks at different levels. Gene expression microarrays present particular difficulties in terms of data analysis and interpretation.
Different approaches to overcoming the difficulties of interpreting microarray results have been put forward. Those based on systems biology seem to deliver more informative results; one example is a newly available tool that integrates microarray data into mathematical models to unveil the underlying mechanisms of action.
Keywords: Gene expression; DNA microarrays; Systems biology; Mathematical modelling
Microarray Drawbacks
The concept of the DNA microarray, developed in the early 1990s, evolved from the Southern blotting technique and is based on solid-phase hybridisation technology. This method relies on the immobilisation of probe molecules onto a solid surface and the recognition of their complementary DNA target sequence by hybridisation.
Microarray technologies have dramatically accelerated many types of investigation, ranging from basic science to clinical applications (1,2). Gene expression microarrays are an extremely powerful tool for researchers, enabling them to monitor the expression of thousands of genes simultaneously. However, implementing the technique entails some difficulties at different stages of the process.
Experimental Design
Microarrays are not always useful, and sometimes no clear conclusions can be drawn from them. To avoid this, one has to make sure that the variation to be measured depends on differences in gene expression, and that those differences can take place within the timeframe of the experiment. For example, a gene expression microarray is not adequate for measuring the response of the coagulation cascade to a specific intervention, since activation of the coagulation factors, which occurs by proteolytic cleavage of existing proteins, does not involve changes in expression.
Once the suitability of the methodology has been established, the microarray experiments must be designed correctly (3). Several issues need to be considered to obtain a well-designed experiment: blocking to avoid confounding; blinding and randomisation to avoid bias; the type of data analysis to be employed; and the number of replicates needed to reach statistically significant conclusions (4).
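As a rough illustration of estimating replicates, the standard normal-approximation formula for comparing two group means gives the number of biological replicates per group as n = 2 (z(1 - alpha/2) + z(power))^2 sd^2 / delta^2, where delta is the log2 fold change to be detected and sd the per-gene standard deviation on the log2 scale. The sketch below is an example under stated assumptions, not a recommended design tool: the function name, the assumed sd and the Bonferroni-style division of alpha by the number of genes are illustrative choices, and the normal approximation slightly underestimates n when samples are small.

```python
from math import ceil
from statistics import NormalDist


def replicates_per_group(log2_fold_change, sd, alpha=0.05, power=0.8, n_genes=1):
    """Approximate replicates per group for a two-sided, two-sample comparison.

    Hypothetical helper. Uses n = 2 * (z_{1-a/2} + z_{power})**2 * sd**2 / delta**2,
    with a = alpha / n_genes (a Bonferroni-style correction assumed here for
    illustration). Ignores t-distribution tails, so small results are optimistic.
    """
    if log2_fold_change == 0 or sd <= 0:
        raise ValueError("fold change must be non-zero and sd positive")
    a = alpha / n_genes
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - a / 2)
    z_beta = z.inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / log2_fold_change ** 2
    return ceil(n)


if __name__ == "__main__":
    # Two-fold change (log2 = 1) with an assumed per-gene sd of 0.5 (log2 scale).
    print(replicates_per_group(1.0, 0.5))                  # -> 4
    print(replicates_per_group(1.0, 0.5, n_genes=10_000))  # many genes tested at once
```

Dividing alpha across 10,000 genes, as in the second call, pushes the estimate well above the single-gene figure, which is one reason false discovery rate based designs are often preferred for microarrays.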
Non-Biological Factors
Another drawback is that the method itself can affect the results, with non-biological factors contributing to the variability of the data. Several characteristics of both the method and the platform can be sources of variation in gene expression measurements, including sample quality, differences in labelling and hybridisation efficiency, and spatial biases across the microarray surface (5). These difficulties have been studied extensively and are normally addressed by running enough replicates and applying normalisation methods (6).
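Quantile normalisation is one widely used normalisation method: it forces every array to share the same distribution of intensities, removing array-wide differences in labelling or hybridisation efficiency. The plain-Python sketch below assumes each array is given as a list of intensities for the same genes in the same order; the function name is hypothetical, ties are broken by position rather than averaged, and in practice intensities are usually log-transformed and processed with an established package.

```python
from statistics import mean


def quantile_normalise(arrays):
    """Quantile-normalise a list of arrays (one list of intensities per array).

    The k-th smallest value in every array is replaced by the mean of the k-th
    smallest values across all arrays, so all arrays end up with the same
    distribution while each gene keeps its rank within its own array.
    """
    n_genes = len(arrays[0])
    if any(len(a) != n_genes for a in arrays):
        raise ValueError("all arrays must measure the same genes")
    # For each array, gene indices ordered from lowest to highest intensity
    # (ties keep their original order, a simplification).
    orders = [sorted(range(n_genes), key=a.__getitem__) for a in arrays]
    # Reference distribution: mean of the values that share the same rank.
    reference = [
        mean(a[order[rank]] for a, order in zip(arrays, orders))
        for rank in range(n_genes)
    ]
    normalised = []
    for order in orders:
        values = [0.0] * n_genes
        for rank, gene in enumerate(order):
            values[gene] = reference[rank]
        normalised.append(values)
    return normalised


if __name__ == "__main__":
    # Three arrays, four genes each (made-up intensities).
    raw = [[5, 2, 3, 4], [4, 1, 4, 2], [3, 4, 6, 8]]
    for row in quantile_normalise(raw):
        print([round(v, 2) for v in row])
```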
Innovations in Pharmaceutical Technology Issue 45 (iptonline.com)
Analysis Complexity
Another difficulty is that the only conclusions obtained may be ones that are already known, or extremely abstract ones. When researchers obtain microarray data and try to draw conclusions from them, they tend to search for differences in the expression of proteins known to be involved in the process under study. This normally supports the already suspected mechanism but does not allow a new one to be identified. In many other cases, the complexity of the results makes it difficult to draw further conclusions.
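Part of the statistical difficulty is multiple testing: when thousands of genes are tested at once, a nominal p < 0.05 cut-off flags roughly 5% of genes that do not change at all. One widely used remedy, not specific to this article, is the Benjamini-Hochberg procedure, which controls the false discovery rate (the expected proportion of false positives among the genes called significant). The sketch below is a minimal version assuming independent or positively dependent tests; the function name and p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of genes called significant at the given FDR level.

    Step-up procedure: rank p-values from smallest to largest, find the
    largest rank k with p_(k) <= k / m * fdr, and reject ranks 1..k.
    """
    m = len(p_values)
    # Gene indices ordered by increasing p-value.
    ranked = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for k, i in enumerate(ranked, start=1):
        if p_values[i] <= k / m * fdr:
            cutoff = k
    # Every gene ranked at or below the cut-off is rejected.
    return sorted(ranked[:cutoff])


if __name__ == "__main__":
    # Made-up p-values for eight genes.
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
    print(benjamini_hochberg(p))  # -> [0, 1]
```

With a naive p < 0.05 threshold, five of these eight genes would be called; the procedure keeps only two.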
In addition, it can be difficult to discover unexpected patterns beyond the ideas that informed the study design. One of the advantages of microarrays is their capacity to measure the expression of thousands of genes at once. Nevertheless, this is also a handicap when drawing conclusions. Standard statistical techniques fail to summarise much of the information contained in all the gene measurements across the samples, owing to the high number of possible combinations of genes differentially expressed between control and intervention, and the small number of samples analysed. The resulting gene list often depends on the statistical test used, and most of these tests lack the ability to control the expected number of false positives at a desirable level.
Systems Biology Tools
Systems biology can be used as an analytical tool for microarray data. There are two ways in which systems biology approaches can facilitate the analysis of results: contextualising and mathematical modelling.
Contextualising
Systems biology methods seek to understand not only each constituent of a biological network, but also how all the constituents of a network function together, by studying their complex interactions (topology). The network can restrict the number of possible conclusions obtained from a microarray experiment. By observing the links between two significant proteins, one may begin to suspect that they represent much more than a chance association in the results, and that they are listed because of an underlying biological process.
Mathematical Modelling
In order to obtain a complete and useful output from a gene expression microarray experiment, the large amount of information obtained can be filtered by making use of what is already known about the biological system under study. One possible way of combining this existing knowledge with the newly generated microarray data is by using systems biology approaches (7).
There are different mathematical modelling approaches: those based only on topological information (topological modelling) and those that exploit additional information, such as function or expression. Although topological information is not always sufficiently informative (see the pros and cons box), integrating other pieces of information generates more complete models. Computational analysis of these models allows the identification of new biological restrictions which, when applied to the microarray analysis, narrow the data down to the significant results.
Box: Pros and cons of topological mathematical modelling
Pros
• Simple approach
Cons
• Fast-changing databases
• Lack of complete link information
• Errors in links reported
Topology depends on the information used to construct the map. Labile information leads to labile conclusions.
Box figures: analysis of hubs (proteins with many connections within a network) and bottlenecks (proteins connecting different networks); a network in which all links are weighted equally (W1 = W2 = W3 = W4 = W5 = W6 = W7 = W8).
Case Study
One example of a systems biology approach involving a mathematical model that integrates known biological information with microarray information is the online SimsCells.com software. It uses a therapeutic performance mapping system to generate and explore mathematical models representing different organisms and cell types (8,9).
To model biological processes, the technology exploits available biological and medical information in addition to topological data. Data relating specific inputs (for example, drug Z treatment or gene Y activation) to their corresponding biological outputs (such as clinical effect X or protein W inhibition) are collated in a database and used to train and validate the models (see Figure 1, Step 1).
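To make the training and validation idea concrete, the sketch below encodes a hypothetical truth table of (stimulus, expected response) rows for a four-node toy network, searches at random for link weights that reproduce the training rows, and then checks a held-out validation row. This is not the SimsCell algorithm, which the article does not describe at this level: the network, the linear signed propagation rule, the discretisation threshold and all function names are illustrative assumptions.

```python
import random

# Hypothetical toy network (a DAG); nodes are listed in topological order.
NODES = ["Drug", "A", "B", "Effect"]
EDGES = [("Drug", "A"), ("Drug", "B"), ("A", "Effect"), ("B", "Effect")]


def respond(weights, stimulus):
    """Propagate a stimulus through the network with linear, signed links."""
    level = {node: stimulus.get(node, 0.0) for node in NODES}
    for node in NODES:  # parents are final before their children are visited
        for (src, dst), w in zip(EDGES, weights):
            if dst == node:
                level[node] += w * level[src]
    return level


def sign(x, eps=0.05):
    """Discretise a response: -1 inhibited, 0 unchanged, +1 activated."""
    return 0 if abs(x) < eps else (1 if x > 0 else -1)


def satisfies(weights, truth_table):
    """True if the model reproduces every row of the truth table."""
    for stimulus, expected in truth_table:
        level = respond(weights, stimulus)
        if any(sign(level[node]) != s for node, s in expected.items()):
            return False
    return True


def train(training_rows, tries=10_000, seed=0):
    """Random search for link weights in [-1, 1] satisfying all training rows."""
    rng = random.Random(seed)
    for _ in range(tries):
        weights = [rng.uniform(-1, 1) for _ in EDGES]
        if satisfies(weights, training_rows):
            return weights
    return None


if __name__ == "__main__":
    truth_table = [
        ({"Drug": 1.0}, {"A": 1, "Effect": -1}),  # drug activates A, inhibits effect
        ({"A": 1.0}, {"Effect": -1}),             # activating A alone inhibits effect
        ({"B": 1.0}, {"Effect": 1}),              # activating B alone promotes effect
    ]
    training, validation = truth_table[:2], truth_table[2:]
    w = train(training)
    print("trained weights:", w and [round(x, 2) for x in w])
    print("passes validation:", w is not None and satisfies(w, validation))
```

Whether the held-out row passes depends on which weights the search happens to find, which is exactly why a validation step is useful.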
The mathematical models generated in this way help to overcome the disadvantages of topological modelling methods: the lability of the conclusions and the equal weighting of the links.
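The equal-weighting problem can be shown with a toy example: if every link counts the same, the best route from a drug to an effect is simply the one with the fewest links, whereas weighting links by confidence can favour a longer but better-supported route. The sketch below is an illustration rather than the article's method; it treats hypothetical weights in (0, 1] as confidences and finds the path with the highest product of weights using Dijkstra's algorithm on -log(weight).

```python
import heapq
from math import log


def most_likely_path(edges, source, target):
    """Return the path from source to target maximising the product of weights.

    `edges` maps (from, to) -> weight in (0, 1]. Maximising a product of
    weights equals minimising the sum of -log(weight), which is non-negative,
    so Dijkstra's algorithm applies.
    """
    graph = {}
    for (a, b), w in edges.items():
        if not 0 < w <= 1:
            raise ValueError("weights must lie in (0, 1]")
        graph.setdefault(a, []).append((b, -log(w)))
    queue = [(0.0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt, c in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (cost + c, nxt, path + [nxt]))
    return None  # target unreachable


if __name__ == "__main__":
    # Hypothetical network: a short route via A and a longer route via B and C.
    equal = {("Drug", "A"): 0.5, ("A", "Effect"): 0.5,
             ("Drug", "B"): 0.5, ("B", "C"): 0.5, ("C", "Effect"): 0.5}
    weighted = {("Drug", "A"): 0.9, ("A", "Effect"): 0.1,
                ("Drug", "B"): 0.9, ("B", "C"): 0.8, ("C", "Effect"): 0.9}
    print(most_likely_path(equal, "Drug", "Effect"))     # -> ['Drug', 'A', 'Effect']
    print(most_likely_path(weighted, "Drug", "Effect"))  # -> ['Drug', 'B', 'C', 'Effect']
```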
Once a mathematical model has been trained to behave like the biological system it represents, it can be queried about the mechanistic pathways that link a stimulus to its associated outcome. For instance, one may ask about the mechanism of action (MoA) that drives the side-effect of a drug, or about the network that links the down-regulation of a gene with its downstream effect on other proteins.
Figure 1: Scheme of mathematical modelling process
Step 1: Initial modelling. Previous knowledge about humans from different scientific sources (interaction data, gene/protein networks, clinical analysis and microarrays, cell biology, clinical trials and drugs, biochemistry, physiology and molecular biology) is used to generate a mathematical model, constrained by a truth table, whose responses to any stimulus comply with the biological restrictions. (A) Generated models are questioned about mechanistic solutions linking a new stimulus to its corresponding biological outcome, giving a universe of solutions (Solution 1 to Solution n) that fulfil the network and truth table restrictions, each with its own set of link weights. (B) Microarray experimental results can further restrict the models' biological solutions.
Example link weights shown for three solutions (Solution 1, Solution 2, Solution 3):
W1n: 0.2, 0.3, 0.1
W2n: 0.6, 0.4, 0.6
W3n: 0.8, 0.9, 0.5
W4n: 0.7, 0.5, 0.6
W5n: 0.5, 0.7, 0.3
W6n: 0.2, 0.3, 0.2
W7n: 0.8, 0.9, 0.8
W8n: 0.5, 0.4, 0.9
Step 2: Sampling methods. All the possible solutions are compared and grouped in a 2D representation according to their common mechanistic patterns. This clustering makes it possible to identify groups of similar outcomes. Alternatively, common patterns from the different clusters can be summarised in a unique 'average' solution.
Interestingly, when models generated through this approach are questioned, they do not provide one unique solution, but rather identify a universe of possible solutions that satisfy the restrictions set by the topology and the database.
In a similar way, nature shows different molecular responses to the same stimulus (for example, the different side-effects observed in individuals treated with the same drug) and different mechanistic explanations for the same biological response, as in multifactorial diseases.
To gain mechanistic insight into the pool of model outcomes, the software applies a strategy based on sampling methods that identifies the common patterns in the response to an intervention. In Figure 1, the image in Step 2 shows the 2D distribution of all the MoAs obtained. The response of each MoA to a set of stimuli is computed and reduced to two dimensions with a conservative mathematical transformation. The closer two MoAs are placed in the 2D representation, the more similar their responses to a stimulus.
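The article does not name the conservative transformation used for the 2D map. Principal component analysis is one common choice that approximately preserves distances between points, so the sketch below uses it as an assumed stand-in: each row is a hypothetical response profile of one MoA (its responses to a set of stimuli), and the output gives each MoA a 2D position in which similar profiles lie close together. Power iteration in plain Python keeps the example dependency-free; real analyses would use a linear-algebra library.

```python
import random


def pca_2d(profiles, iterations=500, seed=0):
    """Project each response profile onto the first two principal components.

    `profiles` is a list of equal-length lists (one per MoA). Components are
    found by power iteration on the covariance matrix, keeping the second
    component orthogonal to the first. Assumes the data vary in at least two
    directions.
    """
    n, d = len(profiles), len(profiles[0])
    if n < 2:
        raise ValueError("need at least two profiles")
    means = [sum(p[j] for p in profiles) / n for j in range(d)]
    centred = [[p[j] - means[j] for j in range(d)] for p in profiles]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _component in range(2):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _step in range(iterations):
            # One power-iteration step: multiply by the covariance matrix.
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            # Remove any part along components already found.
            for c in components:
                dot = sum(a * b for a, b in zip(w, c))
                w = [a - dot * b for a, b in zip(w, c)]
            norm = sum(x * x for x in w) ** 0.5
            if norm < 1e-12:
                raise ValueError("data must vary in at least two directions")
            v = [x / norm for x in w]
        components.append(v)
    # 2D coordinates of each MoA.
    return [[sum(a * b for a, b in zip(r, c)) for c in components] for r in centred]


def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


if __name__ == "__main__":
    # Hypothetical responses of six MoAs to four stimuli.
    profiles = [
        [0.9, 0.1, 0.8, 0.0], [0.8, 0.2, 0.9, 0.1], [0.9, 0.0, 0.7, 0.1],
        [0.1, 0.9, 0.2, 0.8], [0.2, 0.8, 0.1, 0.9], [0.0, 0.9, 0.3, 0.7],
    ]
    for i, (x, y) in enumerate(pca_2d(profiles)):
        print(f"MoA {i + 1}: ({x:+.2f}, {y:+.2f})")
```

The signs of the axes are arbitrary in PCA, so only relative positions in the map carry meaning.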
By extracting common patterns, it is possible to draw a unique 'central' solution ('average' MoA) which, with little variation, includes most of the possible solutions. In addition, the MoAs can be grouped into clusters and each cluster studied separately (see Figure 1, Step 2). Each cluster contains a set of MoAs that, without being identical, share a common pattern in their responses. Considering the inter-individual mechanistic variation observed in nature, the different groups of calculated solutions could represent the most common patterns of response, in the sense that in both cases the easiest solution is prioritised.
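The 'average' MoA and the clusters can be illustrated with elementary tools: below, the central solution is taken as the element-wise mean of the solution profiles, and clusters are found with k-means (Lloyd's algorithm). Both are assumptions for illustration; the article does not specify how the central solution or the clusters are computed, and the profiles and function names are hypothetical.

```python
import random


def average_solution(profiles):
    """Element-wise mean of the profiles: one 'central' solution."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(profiles, k, iterations=100, seed=0):
    """Lloyd's k-means. Returns (labels, centroids); labels[i] is the cluster of profiles[i]."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(profiles, k)]  # start from k random profiles
    labels = None
    for _step in range(iterations):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in profiles]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: each centroid becomes the average of its members.
        for c in range(k):
            members = [p for p, label in zip(profiles, labels) if label == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = average_solution(members)
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical response profiles of six MoAs.
    profiles = [
        [0.9, 0.1, 0.8], [0.8, 0.2, 0.9], [0.9, 0.0, 0.7],
        [0.1, 0.9, 0.2], [0.2, 0.8, 0.1], [0.0, 0.9, 0.3],
    ]
    print("average MoA:", [round(x, 2) for x in average_solution(profiles)])
    labels, centroids = k_means(profiles, k=2)
    print("cluster labels:", labels)
```

Each centroid plays the role of a per-cluster 'average' MoA, so the two views in the text correspond to k = 1 and k > 1.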
Figure 2: Microarray analysis using systems biology. The comparison of the microarrays becomes a comparison of mathematical models that take into account systems biology and other biological data. Panels: control microarray and intervention microarray; control model and intervention model, built with systems biology data; key proteins.
Drawing Conclusions
The size and complexity of microarray experiments often result in a wide variety of possible interpretations. Moreover, biologically significant changes in expression can be missed by expression arrays owing to technical limitations. Mathematical models therefore become an invaluable tool for analysing microarray results from the perspective of global cellular behaviour.
To extract the biologically relevant information from gene expression microarray experiments, the case study approach outlined here integrates the microarray data into the training of the mathematical model, together with the topology and the database. In this way, gene expression data further restrict the possible outcomes of the model.
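One simple way to picture how expression data restrict the outcomes is as a consistency filter over candidate solutions. The sketch below assumes, as a simplification, that each solution predicts an up, down or unchanged call per protein that can be compared directly with the direction measured on the array (in reality mRNA and protein activity do not always move together). The function names, the tolerance parameter and the data are hypothetical.

```python
def consistent_with_microarray(solution, observed, tolerance=0):
    """Return True if a model solution agrees with the microarray calls.

    solution: protein -> predicted change (-1 down, 0 unchanged, +1 up)
    observed: gene -> direction measured on the array, using the same names
    Names missing from the solution are ignored; `tolerance` is the number of
    disagreements accepted to allow for noisy arrays.
    """
    mismatches = sum(1 for name, call in observed.items()
                     if name in solution and solution[name] != call)
    return mismatches <= tolerance


def restrict_solutions(solutions, observed, tolerance=0):
    """Keep only the solutions compatible with the microarray observations."""
    return [s for s in solutions if consistent_with_microarray(s, observed, tolerance)]


if __name__ == "__main__":
    solutions = [
        {"A": 1, "B": -1, "C": 0},
        {"A": 1, "B": 1, "C": 0},
        {"A": -1, "B": -1, "C": 1},
    ]
    observed = {"A": 1, "B": -1, "D": 1}  # "D" is not in the model and is ignored
    print(restrict_solutions(solutions, observed))  # -> [{'A': 1, 'B': -1, 'C': 0}]
```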
When the results of a microarray experiment are integrated into a mathematical model compiling all known biological information, the discrete information provided by the microarray (coming from a small number of samples) is converted into a mathematical model composed of thousands of 'individuals', thereby increasing the power of the analysis. By comparing the models corresponding to the control and the intervention, it is possible to identify key proteins that are differentially activated between the two groups of samples (see Figure 2). These differentially activated proteins are candidates for having different expression levels in the microarray and are worth following up.
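A minimal sketch of the comparison step, assuming each model is represented by an ensemble of 'individuals' with one activation value per protein. Proteins are ranked by a standardised difference in mean activation (Cohen's d with a pooled standard deviation) and kept above an arbitrary threshold; this scoring rule is a stand-in chosen for illustration, since the article does not say how differential activation is scored, and the protein names and values are hypothetical.

```python
from math import copysign, inf
from statistics import mean, stdev


def key_proteins(control, intervention, min_effect=1.0):
    """Rank proteins by how differently they are activated in two model ensembles.

    `control` and `intervention` map protein -> list of activation values
    (at least two per protein, one per model 'individual'). Returns
    (protein, effect size) pairs with |effect| >= min_effect, largest first.
    """
    results = []
    for protein in sorted(control.keys() & intervention.keys()):
        a, b = control[protein], intervention[protein]
        pooled_var = (((len(a) - 1) * stdev(a) ** 2 + (len(b) - 1) * stdev(b) ** 2)
                      / (len(a) + len(b) - 2))
        diff = mean(b) - mean(a)
        if pooled_var == 0:
            d = 0.0 if diff == 0 else copysign(inf, diff)
        else:
            d = diff / pooled_var ** 0.5
        if abs(d) >= min_effect:
            results.append((protein, d))
    return sorted(results, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    control = {"P1": [0.10, 0.20, 0.15, 0.12],
               "P2": [0.50, 0.60, 0.55, 0.52],
               "P3": [0.30, 0.31, 0.29, 0.30]}
    intervention = {"P1": [0.80, 0.85, 0.90, 0.82],
                    "P2": [0.51, 0.58, 0.56, 0.54],
                    "P3": [0.10, 0.12, 0.11, 0.09]}
    for protein, d in key_proteins(control, intervention):
        print(f"{protein}: effect size {d:+.1f}")
```

Here P1 (activated) and P3 (inhibited) stand out, while P2 barely changes and is filtered out.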
By using this method, researchers can identify the key proteins relevant to the measured outcome of the experiment, reducing the number of false positives in the microarray data while obtaining a logical, biologically supported conclusion. The method provides insight that goes beyond what is already known from direct investigation of the phenomenon being studied, predicting properties that might not be evident to the experimenter.
Acknowledgement
The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement number HEALTH-F4-2012-305869 (SysMalVac).
References
1. Sassolas A, Leca-Bouvier BD and Blum LJ, DNA biosensors and microarrays, Chemical Reviews 108(1): pp109-139, 2008
2. Russell S, Meadows LA and Russell RR, Microarray Technology in Practice, Academic Press/Elsevier, 2009
3. Falciani F, Microarray Technology Through Applications, Taylor & Francis Group, 2007
4. Stekel D, Microarray Bioinformatics, Cambridge University Press, 2003
5. Draghici S, Khatri P, Eklund AC and Szallasi Z, Reliability and reproducibility issues in DNA microarray measurements, Trends in Genetics: TIG 22(2): pp101-109, 2006
6. Reimers M, Making informed choices about microarray data analysis, PLoS Computational Biology 6(5): e1000786, 2010
7. Kozhenkov S, Dubinina Y, Sedova M, Gupta A, Ponomarenko J and Baitaluk M, Biological Networks 2.0 – an integrative view of genome biology data, BMC Bioinformatics 11: p610, 2010
8. Visit: www.simscell.com
9. Mas JM, Pujol A, Aloy P and Farrés J, Methods and systems for identifying molecules or processes of biological interest by using knowledge discovery in biological data, US Patent Application No. 12/912,535, 2010
Teresa Sardón is Head of Analytical Services at Anaxomics, where she is responsible for the business development and commercialisation of the SimsCell analysis platform. She graduated in Pharmacy at the University of the Basque Country and holds an MS and PhD in Biochemistry from the Universitat Autònoma de Barcelona. Her background includes five years of postdoctoral research at the European Molecular Biology Laboratory in Germany and four at the Centre for Genomic Regulation in Barcelona, working in the fields of cellular and molecular biology.
Cristina Segú is a Project Manager at Anaxomics. She has a degree in Biotechnology from the Universitat Autònoma de Barcelona. As a member of the Molecular Health Department, she has gained extensive experience in building computable descriptions of available molecular, biochemical and physiological data.
José Manuel Mas is a Founder and Chief Operations Officer at Anaxomics. He was previously the EU Head of R&D at RPS and Founder and Chief Technology Officer at Infociencia. José holds a degree in Biochemistry, an MSc in Biotechnology and a PhD in Computer Sciences. He has wide experience in the development of biocomputational tools and artificial intelligence techniques.