A FRAMEWORK FOR MODELING IN REGULATORY NETWORKS
Mohsen Ben Hassine1, Radhi Mhiri2, Lamine Mili3
1 ISET de Nabeul, Computer Engineering, Tunisia
2 Faculté des Sciences de Tunis, Electrical Engineering, Tunisia
3 Virginia Tech, Electrical and Computer Engineering, USA
Abstract
The study of regulatory networks in systems biology, and of their dynamics, is
critical to understanding the huge volume of genomic data currently being collected. Advances in
nanotechnology enable scientists, for the first time, to trace biological processes at the
nanoscale by tracking the movements of molecules. Projecting the real system onto
graphical and mathematical models enables biologists to understand it better, and even to predict its
behaviour. Nevertheless, the lack of a general framework that leads the biologist to
efficient modelling remains a great challenge. In this paper we try to clarify the main issues
in the modelling of regulatory networks using a straightforward method.
Keywords: auto-regulation, synthetic circuits, delay time, sensitivity analysis, homeostasis,
noise, data mining.
1. MODELLING IN SYSTEMS BIOLOGY
Most kinds of systems that are likely to be of interest involve entities
(proteins, metabolites, signaling molecules, etc.) that can be cast as “nodes”
interacting with each other via “edges” representing reactions that may be catalyzed
via other substances such as enzymes. These will also typically involve feedback
loops in which some of the nodes interact directly with the edges. We refer to the
basic constitution of this kind of representation as a structural model. The classical
modelling strategy in biology (and in engineering), the ordinary differential
equation (ODE) approach, contains three initial phases. It starts with this kind of
structural model, in which the reactions and effectors are known. The second level
comprises the kinetic rate equations describing the local properties of each edge. The
third level involves the parameterization of the model, that is, providing values
for the parameters.
Armed with such knowledge, any number of software packages can predict
the time evolution of the variables (the concentrations) until they may reach a steady
state. This is done (internally) by recasting the system as a series of coupled
ordinary differential equations which are then solved numerically. We refer to this
type of operation as forward modelling, and provided that the structural model,
equations, and values of the parameters are known, it is comparatively easy to
produce such models and compare them with experimental reality. In many cases,
however, the experimental data that are most readily available do not include the
parameters at all, and are simply measurements of the (time-dependent) variables, of
which fluxes and concentrations are the most common. Comparison of the data with
the forward model is then much more difficult, as we have to solve an inverse modelling,
reverse engineering or system identification problem.
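As an illustration of forward modelling, the following minimal sketch integrates a hypothetical one-node model (constant production, first-order decay) with a classical Runge-Kutta step until it settles near its steady state. The model, the rate values and the function names are assumptions chosen for illustration; they are not taken from any of the networks discussed in this paper.

```python
def rk4_step(f, x, t, dt):
    """Advance dx/dt = f(t, x) by one classical Runge-Kutta (RK4) step."""
    k1 = f(t, x)
    k2 = f(t + dt / 2, x + dt * k1 / 2)
    k3 = f(t + dt / 2, x + dt * k2 / 2)
    k4 = f(t + dt, x + dt * k3)
    return x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6


def simulate(f, x0, t_end, dt=0.01):
    """Return the list of (t, x) pairs from t = 0 to t_end."""
    t, x = 0.0, x0
    trajectory = [(t, x)]
    for _ in range(int(round(t_end / dt))):
        x = rk4_step(f, x, t, dt)
        t += dt
        trajectory.append((t, x))
    return trajectory


# Hypothetical parameters (assumed values, for illustration only).
K_PROD = 2.0  # production rate
K_DEG = 0.5   # first-order decay constant


def rate(t, x):
    """Structural model + kinetics: dx/dt = production - decay."""
    return K_PROD - K_DEG * x


trajectory = simulate(rate, x0=0.0, t_end=40.0)
final_x = trajectory[-1][1]
print(f"x(40) = {final_x:.4f}; analytic steady state = {K_PROD / K_DEG:.4f}")
```

Dedicated simulation packages do the same thing with adaptive solvers; a fixed step is used here only to keep the sketch self-contained.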
Direct solution of such problems is essentially impossible, as they are
normally hugely underdetermined and do not have an analytical solution. The
normal approach is thus an iterative one in which a candidate set of parameters is
proposed, the system run in the forward direction, and on the basis of some metric of
closeness to the desired output a new set of parameters is tested. Eventually
(assuming that the structural model and the equations are adequate), a satisfactory
set of parameters, and hence solutions, will be found. These methods are much more
computer-intensive than those required for simple forward modelling, as potentially
many thousands or even millions of candidate models must be tested. We note,
however, that there are a number of other modelling strategies and issues that may
lead one to wish to choose different types of model from that described. First, the
ODE model assumes that compartments are well stirred and that the concentrations
of the participants are sufficiently great as to permit fluctuations to be ignored. If
this is not the case then stochastic simulations (SS) are required. If flow of
substances between many contiguous compartments is involved, and knowledge of
the spatial dynamics is required (as is common in computational fluid dynamics),
partial differential equations (PDEs) are necessary. SS and PDE models are again
much more computationally intensive, although in the latter case the designation of
a smaller subset of representative compartments may be effective (Mendes and Kell,
2001).
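The iterative inverse-modelling loop described above (propose candidate parameters, run the forward model, score the fit, propose again) can be sketched as follows. This is a deliberately simple coordinate search on a hypothetical two-parameter model with synthetic, noise-free "measurements"; the model, the start point and all names are assumptions, and real problems usually call for more robust optimisers.

```python
def forward(k_prod, k_deg, t_end=10.0, dt=0.1, sample_every=10):
    """Forward model: Euler integration of dx/dt = k_prod - k_deg * x, x(0) = 0.
    Returns x sampled every `sample_every` steps (here at t = 1, 2, ..., 10)."""
    x, samples = 0.0, []
    for step in range(1, int(round(t_end / dt)) + 1):
        x += dt * (k_prod - k_deg * x)
        if step % sample_every == 0:
            samples.append(x)
    return samples


def sum_squared_error(params, data):
    """Closeness metric between the forward run and the measurements."""
    return sum((m - d) ** 2 for m, d in zip(forward(*params), data))


def fit(data, start, step=0.5, tol=1e-5, max_iter=5000):
    """Perturb one parameter at a time, keep any improvement,
    and halve the step when no candidate does better."""
    best = list(start)
    best_err = sum_squared_error(best, data)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                candidate = list(best)
                candidate[i] += delta
                if candidate[i] <= 0:
                    continue  # rate constants must stay positive
                err = sum_squared_error(candidate, data)
                if err < best_err:
                    best, best_err, improved = candidate, err, True
        if not improved:
            step /= 2
    return best, best_err


# Synthetic "measurements" generated from hypothetical true parameters.
TRUE_PARAMS = (2.0, 0.5)
data = forward(*TRUE_PARAMS)
estimate, error = fit(data, start=(1.0, 1.0))
print("estimated (k_prod, k_deg):", [round(v, 3) for v in estimate])
```

Each candidate costs one full forward simulation, which is why such searches become expensive when thousands or millions of candidates are needed.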
Data mining is the process of discovering meaningful new correlations,
patterns and trends by sifting through large amounts of data stored in repositories,
using pattern recognition technologies as well as statistical and mathematical
techniques. It enables the modeller to construct networks in a reduced and optimized
way, and thus offers a “middle-out” strategy that retains insight from both approaches,
bottom-up and top-down (fig.1).
Figure.1: Middle-out modeling strategy
2. MATHEMATICAL MODELLING AND SYSTEM DYNAMICS THEORY
In order to turn the static map (the biologist's graphical representation of the system) into
a dynamic model that can provide insight into the temporal evolution of biochemical
reaction networks, a set of differential equations is needed. The general rule for
expressing the evolution of a specific biochemical species x is: dx/dt = rate of
production – rate of decay ± rate of transport. To each interaction between
species we can attribute a specific function. For example, consider the Goldbeter
model (fig.2) of the mitotic oscillator (Goldbeter 1991).

Figure.2: The Mitotic Oscillator
The rate of cyclin production is a linear process (vi). The decay process is
composed of two parts: a natural exponential decay (death) and an induced decay
caused by a protease-cyclin complex (X, C). We can convert our ODEs into a set of
block diagrams and nodes as used in engineering sciences (fig.3); for example, the
cyclin equation can be modelled as in fig.4.
Figure.3: Examples of block diagrams and nodes used in engineering sciences
[Block diagram of the cyclin equation; labels in the figure: Vi, C, kd, C/(C+kd), X, vd]
Figure.4: The Cyclin circuit
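Written out as an equation, the cyclin balance described above takes the form below. The extracted figure labels both constants "kd"; the sketch assumes Goldbeter's notation, in which $K_d$ is the saturation constant of the protease-driven term and $k_d$ is the first-order (natural) decay constant.

```latex
\frac{dC}{dt} = v_i \;-\; v_d\, X\, \frac{C}{K_d + C} \;-\; k_d\, C
```

Each term maps onto one branch of the block diagram: a constant input $v_i$, a saturating block $C/(K_d + C)$ multiplied by $X$ and the gain $v_d$, and a linear feedback with gain $k_d$.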
One important feature of a biochemical network is its robustness, which
reflects how weakly the system responds to perturbations, that is, its ability to preserve
homeostasis (equilibrium). As in engineering, we can stimulate the system by varying an input
signal (parameter) and observing the effect on its output; we can also test the validity and
efficiency of a feedback by opening the loop involved (D. Angeli et al. 2004).
3. MATHEMATICAL FUNCTIONS AND CELL PHENOTYPES
Regulatory networks are governed by the same mathematical functions
that modellers usually use to express positive vs negative feedback, activation vs
repression and inhibition, the fraction of free operator, etc. The next table (table 1)
gives the most useful functions for regulation.
Function | Use | Expression
Yagil rules | Inducible enzyme (e.g. lactose) | $F(O) = (1 + k_1 \cdot Ep)/(k + k_1 \cdot Ep)$
Yagil rules | Repressible enzyme (e.g. trp) | $F(O) = (1 + k_1 \cdot Ep)/(1 + k \cdot k_1 \cdot Ep)$
Michaelis-Menten | Enzyme-catalysed reaction | $F(S) = V_{max} S/(S + K_m)$
Hill functions | Activation | $F(X) = V_{max}\, TF^n/(K_m + TF^n)$
Hill functions | Repression | $F(X) = V_{max}/(K_m + TF^n)$
MM with ... | Competitive inhibition | $F(S) = V_{max} S/(S + K_S(1 + K_i))$
MM without... | Competitive inhibition | $F(S) = V_{max} S/((S + K_S)(1 + I/K_i))$
Hill functions | Multiple TF activation (OR gate) | $F(S) = (TF_1/K_1)^n/(1 + (TF_1/K_1)^n + (TF_2/K_2)^n)$
Hill functions | Multiple TF repression (OR gate) | $F(S) = 1/(1 + (TF_1/K_1)^n + (TF_2/K_2)^n)$
Gaussian function | Internal noise | $F(X) = N(\mu, \sigma^2)$
Delay function | Time delay (transcription, translation initiation) | $X(t) = F(Y(t - \tau))$
Table 1: Useful functions of regulation
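Several rows of Table 1 translate directly into small functions. The sketch below implements the Michaelis-Menten, Hill activation, Hill repression and two-TF repression forms exactly as written in the table; the function and argument names are hypothetical, and the numerical values in the usage lines are arbitrary.

```python
def michaelis_menten(s, vmax, km):
    """Enzyme-catalysed reaction: F(S) = Vmax * S / (S + Km)."""
    return vmax * s / (s + km)


def hill_activation(tf, vmax, km, n):
    """Activation by a transcription factor: F = Vmax * TF^n / (Km + TF^n)."""
    return vmax * tf ** n / (km + tf ** n)


def hill_repression(tf, vmax, km, n):
    """Repression by a transcription factor: F = Vmax / (Km + TF^n)."""
    return vmax / (km + tf ** n)


def two_tf_repression(tf1, tf2, k1, k2, n):
    """Multiple-TF repression: F = 1 / (1 + (TF1/K1)^n + (TF2/K2)^n)."""
    return 1.0 / (1.0 + (tf1 / k1) ** n + (tf2 / k2) ** n)


# Arbitrary illustration values: scan a transcription-factor level.
for tf in (0.0, 0.5, 1.0, 2.0, 4.0):
    act = hill_activation(tf, vmax=1.0, km=1.0, n=4)
    rep = hill_repression(tf, vmax=1.0, km=1.0, n=4)
    print(f"TF = {tf:3.1f}  activation = {act:.3f}  repression = {rep:.3f}")
```

A larger exponent n makes both Hill curves steeper around TF^n = Km, which is what gives these functions their switch-like character.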
Long experience in modelling regulatory, transduction and metabolic networks
now allows a number of rules to be stated about cell phenotypes: apoptosis,
proliferation, differentiation, stress response, mitosis, bifurcation, etc.
a. Negative feedback loops
Negative feedback loops, common in biochemical pathways, are known to
provide stability and to withstand considerable variations and random perturbations of
biochemical parameters.
b. Positive feedback loops
Positive-feedback networks form the basis for cellular memory,
allowing cells of identical genotype to achieve different phenotypes depending on
the external signals received. The behaviour of the system therefore depends on its
history, and it can exhibit hysteresis.
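The history dependence of a positive-feedback loop can be illustrated with a hypothetical self-activating gene: dx/dt = basal + Vmax x^n/(K^n + x^n) - gamma x. With the assumed parameter values below (chosen only so that two stable states exist), the same equation settles to a low or a high state depending solely on where it starts.

```python
def settle(x0, basal=0.05, vmax=1.0, k=0.5, n=4, gamma=1.0,
           dt=0.01, t_end=50.0):
    """Euler integration of a self-activating gene; returns x at t_end.
    All parameter values are illustrative assumptions."""
    x = x0
    for _ in range(int(round(t_end / dt))):
        production = basal + vmax * x ** n / (k ** n + x ** n)
        x += dt * (production - gamma * x)
    return x


low = settle(x0=0.1)   # starts below the threshold between the two states
high = settle(x0=0.8)  # starts above it
print(f"from x0 = 0.1 -> {low:.3f};  from x0 = 0.8 -> {high:.3f}")
```

The two end states play the role of two phenotypes reached by genetically identical cells with different histories.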
c. Delay time
A generic feature of all intracellular biochemical processes is the time
required to complete the whole sequence of reactions that yields any observable
quantity in a biological function. Theoretically, time delay is known to be a source of
instability, and has been shown to lead to oscillations or transient dynamics in
several biological functions. The delay in repression, for example, has been shown
theoretically and experimentally to be the primary factor inducing increased
inter-cellular heterogeneity in gene expression in a population.
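How a delay destabilises a negative feedback loop can be sketched with the delay function of Table 1 applied to a Hill-type repression: dx/dt = beta / (1 + (x(t - tau)/K)^n) - gamma x. The model, its parameter values and the constant initial history are assumptions made for illustration; the sketch compares a run without delay to a run with a delay of three time units.

```python
def simulate_delayed_repression(tau, beta=2.0, k=1.0, n=6, gamma=1.0,
                                x0=0.5, dt=0.01, t_end=100.0):
    """Euler integration of a self-repressing gene with delay tau.
    The history for t < 0 is assumed constant and equal to x0."""
    delay_steps = int(round(tau / dt))
    xs = [x0]
    for i in range(int(round(t_end / dt))):
        x_delayed = xs[max(0, i - delay_steps)]
        production = beta / (1.0 + (x_delayed / k) ** n)
        xs.append(xs[i] + dt * (production - gamma * xs[i]))
    return xs


def late_amplitude(xs, fraction=0.3):
    """Peak-to-peak amplitude over the last `fraction` of the trajectory."""
    tail = xs[int(len(xs) * (1 - fraction)):]
    return max(tail) - min(tail)


amp_no_delay = late_amplitude(simulate_delayed_repression(tau=0.0))
amp_delay = late_amplitude(simulate_delayed_repression(tau=3.0))
print(f"late amplitude without delay: {amp_no_delay:.4f}")
print(f"late amplitude with delay   : {amp_delay:.4f}")
```

Without delay the loop relaxes to its steady state; with a sufficiently long delay the repression always arrives "too late" and the concentration keeps overshooting.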
d. Noise
Genetically identical cells exposed to the same environmental conditions
can show significant variation in molecular content and marked differences in
phenotypic characteristics. This variability is linked to stochasticity in gene
expression, which is generally viewed as having detrimental effects on cellular
function with potential implications for disease. However, stochasticity in gene
expression can also be advantageous. It can provide the flexibility needed by cells to
adapt to fluctuating environments or respond to sudden stresses, and a mechanism
by which population heterogeneity can be established during cellular differentiation
and development. Negative feedback reduces fluctuations by increasing expression
when protein numbers are low and decreasing expression when protein numbers are
high. It is more likely to evolve as an attenuator of stochasticity in
systems dominated by extrinsic fluctuations (Paulsson, 2004; Hooshangi and Weiss,
2006). Alternatively, intrinsic fluctuations could be reduced by an additional
positive feedback loop to maintain high protein copy numbers despite the negative
feedback needed to attenuate extrinsic fluctuations.
4. SENSITIVITY, SYNTHETIC CIRCUITS AND MEASUREMENT TECHNIQUES
Sensitivity analysis represents a cornerstone in the analysis of complex
systems. It treats the effect of changing some parameter P (in the model) on the
response of some system variables. The goals of this analysis are to:
- determine the factors that contribute most to output variability and so need the most
consideration;
- find parameters that can be eliminated in order to simplify the model without
grossly altering its behaviour;
- find the optimal region for use in a calibration study;
- check which groups of factors interact with each other;
- evaluate the model, thus creating an output distribution or response;
- assess the influence of each variable or group of variables using
correlation/regression, Bayesian inference, machine learning, or other methods (data
mining).
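A common way to quantify the first two goals is the normalised (logarithmic) local sensitivity, d ln(output) / d ln(parameter), estimated by finite differences. The sketch below applies it to a hypothetical model whose output is the steady state k_prod / k_deg; the model and the names are assumptions, and any function mapping a parameter dictionary to a scalar output could be substituted.

```python
def steady_state(params):
    """Hypothetical model output: steady state of dx/dt = k_prod - k_deg * x."""
    return params["k_prod"] / params["k_deg"]


def relative_sensitivity(model, params, name, rel_step=1e-4):
    """Central finite-difference estimate of d ln(output) / d ln(parameter)."""
    up = dict(params)
    up[name] = params[name] * (1 + rel_step)
    down = dict(params)
    down[name] = params[name] * (1 - rel_step)
    return (model(up) - model(down)) / (2 * rel_step * model(params))


params = {"k_prod": 2.0, "k_deg": 0.5}
sensitivities = {
    name: relative_sensitivity(steady_state, params, name) for name in params
}
# Rank parameters by the magnitude of their influence on the output.
for name, s in sorted(sensitivities.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {s:+.3f}")
```

A value near zero flags a parameter that could be fixed or eliminated; a value of magnitude one means a 1% change in the parameter changes the output by about 1%.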
To break down the complexity of regulatory networks, the forward
engineering of gene circuits and its associated experimental techniques (mutant cells,
such as the cdc25Δ and wee1 mutations in yeast; B. Novak 2001) enable modellers to build
desired networks with specific properties predicted from mathematical models, using
knowledge from biochemistry, molecular biology, and genetics. Consequently, we
can engineer new cellular behaviour and improve our understanding of naturally
occurring networks (Bratsun et al. 2005). The next figure (fig.5) presents some
samples:
Figure.5: synthetic genetic networks.
Massive amounts of data are being generated by genomics and proteomics
projects, thanks to sophisticated genetic engineering tools (gene knock-outs and
insertions, PCR) and measurement technologies (fluorescent proteins, microarrays,
blotting, FRET). Polymerase chain reaction (PCR) is a technique that amplifies
DNA (typically a gene or part of a gene). Creating multiple copies of a piece of
DNA, which would otherwise be present in too small a quantity to detect, PCR
enables the use of measurement techniques. Suppose that we wish to know at what
rate a certain gene X is being transcribed under a particular set of conditions in
which the cell finds itself. Fluorescent proteins may be used for that purpose. For
instance, green fluorescent protein (GFP) is a protein with the property that it
fluoresces in green when exposed to UV light. It is produced by the jellyfish
Aequorea victoria, and its gene has been isolated so that it can be used as a reporter
gene. The GFP gene is inserted (cloned) into the chromosome, adjacent to or very
close to the location of gene X, so both are controlled by the same promoter region.
Thus, gene X and GFP are transcribed simultaneously and then translated (Fig. 6),
and so by measuring the intensity of the GFP light emitted one can estimate how
much of X is being expressed. Fluorescent protein methods are particularly useful
when combined with flow cytometry. Flow cytometry devices can be used to sort
individual cells into different groups, on the basis of characteristics such as cell size,
shape, or amount of measured fluorescence, and at rates of up to thousands of cells
per second. In this manner, it is possible, for instance, to count how many cells in a
population express a particular gene under a specific set of conditions.
Figure. 6: Fluorescent protein method
5. NOISE IN GENETIC NETWORKS
Biochemical networks are stochastic: fluctuations in numbers of molecules
are generated intrinsically by the dynamics of the network and extrinsically by
interactions of the network (fig. 7) with other stochastic systems (Elowitz et al,
2002; Swain et al, 2002). Stochastic effects in protein numbers can drive
developmental decisions (Arkin et al, 1998; Maamar et al, 2007; Nachman et al,
2007; Suel et al, 2007), be inherited for several generations (Rosenfeld et al, 2005;
Kaufmann et al, 2007), and have perhaps influenced the organization of the genome
(Swain, 2004; Becskei et al, 2005). Intrinsic fluctuations are generated by
intermolecular collisions affecting the timing of individual reactions. Their strength
is increased by low copy numbers. The source of extrinsic fluctuations, however, is
mostly unknown (Kaern et al, 2005), although cell cycle effects (Rosenfeld et al,
2005; Volfson et al, 2006) and upstream networks (Volfson et al, 2006) contribute.
Yet extrinsic fluctuations dominate cellular variation in both prokaryotes (Elowitz et
al, 2002) and eukaryotes (Raser and O’Shea, 2004). They are colored, having a
lifetime that is not negligible but comparable to the cell cycle (Rosenfeld et al,
2005), and they are nonspecific, potentially affecting equally many molecules in the
system (Pedraza and van Oudenaarden, 2005). They are thus difficult to model and
their effects hard to predict (Austin et al, 2006; Cox et al, 2006; Geva-Zatorsky et
al, 2006; Scott et al, 2006; Sigal et al, 2006; Tanase-Nicola et al, 2006; Tsimring et
al, 2006; Volfson et al, 2006; Maithreye and Sinha, 2007).
Intrinsic and extrinsic stochasticity can be measured by creating a copy of
the network of interest in the same cellular environment as the original network
(Elowitz et al, 2002). We can then define intrinsic and extrinsic variables, and their
fluctuations generate intrinsic and extrinsic stochasticity (Swain et al, 2002).
Intrinsic variables typically specify the copy numbers of the molecular components
of the network. Their values differ for each copy of the network. Extrinsic variables
often describe molecules that affect equally each copy of the network. Their values
are therefore the same for each copy.
Figure.7: Noise in regulatory network
Noise strength is usually reported in terms of the standard deviation $\sigma_q$ of a stochastic
variable q. The Fano factor, defined as $F = \sigma_q^2 / \langle q \rangle$, is related to the relative
standard deviation by $\sigma_q / \langle q \rangle = (F / \langle q \rangle)^{1/2}$; because q measures molecule number, F is a
dimensionless quantity. When number fluctuations are due to a Poisson process, we
have F = 1. The Fano factor of an arbitrary stochastic system reveals deviations from
Poissonian behaviour. It is a sensitive measure of noise and the unit in which we
report our results. If we consider a single gene, we can represent the ODEs as
follows (fig.8):

KP: translational efficiency
KR: transcriptional efficiency
Figure.8: Single Gene Expression
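The average number of proteins synthesized per mRNA transcript is $N = K_P/\gamma_R$,
the mean number of proteins is $K_R N/\gamma_P$, and the Fano factor is approximately $N + 1$,
where $\gamma_R$ and $\gamma_P$ denote the mRNA and protein decay rates.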
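These single-gene estimates can be checked against a stochastic simulation. The following sketch uses the Gillespie algorithm on the two-stage model (transcription, mRNA decay, translation, protein decay) and compares the time-averaged protein mean and Fano factor with the expressions above. The rate values, the burn-in period and the function name are assumptions; the N + 1 estimate is expected to hold only approximately, and best when proteins live much longer than mRNAs.

```python
import random


def simulate_single_gene(k_r, gamma_r, k_p, gamma_p, t_end,
                         burn_in=100.0, seed=1):
    """Gillespie simulation of mRNA (m) and protein (p) copy numbers.
    Returns the time-averaged protein mean and Fano factor after burn-in."""
    rng = random.Random(seed)
    t, m, p = 0.0, 0, 0
    span = sum_p = sum_p2 = 0.0
    while t < t_end:
        rates = (k_r, gamma_r * m, k_p * m, gamma_p * p)
        total = sum(rates)
        dwell = min(rng.expovariate(total), t_end - t)
        if t >= burn_in:  # accumulate time-weighted statistics
            span += dwell
            sum_p += p * dwell
            sum_p2 += p * p * dwell
        t += dwell
        u = rng.random() * total
        if u < rates[0]:
            m += 1  # transcription
        elif u < rates[0] + rates[1]:
            m -= 1  # mRNA decay
        elif u < rates[0] + rates[1] + rates[2]:
            p += 1  # translation
        else:
            p -= 1  # protein decay
    mean = sum_p / span
    fano = (sum_p2 / span - mean * mean) / mean
    return mean, fano


# Assumed rates (arbitrary time units): K_R, gamma_R, K_P, gamma_P.
K_R, GAMMA_R, K_P, GAMMA_P = 1.0, 1.0, 5.0, 0.1
mean_p, fano_p = simulate_single_gene(K_R, GAMMA_R, K_P, GAMMA_P, t_end=20000.0)
n_burst = K_P / GAMMA_R
print(f"predicted mean = {K_R * n_burst / GAMMA_P:.1f}, simulated = {mean_p:.1f}")
print(f"predicted Fano ~ {n_burst + 1:.1f}, simulated = {fano_p:.2f}")
```

Raising the translational efficiency K_P while lowering K_R keeps the mean fixed but increases the burst size N, and with it the noise strength.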
If we take into account the possibility of mutual activation and repression of
the promoter and tune the transcriptional and translational efficiencies, we
can obtain different noise behaviours (fig.9).
Figure.9: Slow promoter transitions and transcriptional bursting
(M. Kærn 2005)
Intrinsic and extrinsic noise can be measured and distinguished with two
reporter genes (cfp, yfp) controlled by identical regulatory sequences. In the
absence of intrinsic noise, the two fluorescent proteins fluctuate in a correlated
fashion over time in a single cell. Thus, in a population, each cell will have the same
amount of both proteins, although that amount will differ from cell to cell because of
extrinsic noise. Expression of the two genes may become uncorrelated in individual
cells because of intrinsic noise, giving rise to a population in which some cells
express more of one fluorescent protein than the other.
The next scatter plot (fig. 10) illustrates the fluorescence technique using two
strains of E. coli: one quiet (M22) and one noisy (D22). Each point represents the
mean fluorescence intensities from one cell. Spread of points perpendicular to the
diagonal line on which CFP and YFP intensities are equal corresponds to intrinsic
noise, whereas spread parallel to this line is increased by extrinsic
noise. The total noise is defined by $\eta_{tot}^2 = \eta_{int}^2 + \eta_{ext}^2$.
Figure.10: Experimental quantification of noise (Elowitz et al. 2002)
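The decomposition of total noise into intrinsic and extrinsic parts can be computed directly from paired single-cell intensities. The sketch below uses the dual-reporter estimators of Elowitz et al. (2002), as we understand them, on made-up intensity values; the data and the function name are hypothetical.

```python
def noise_decomposition(cfp, yfp):
    """Return (eta_int^2, eta_ext^2, eta_tot^2) from paired intensities,
    one (cfp, yfp) pair per cell."""
    n = len(cfp)
    mean_c = sum(cfp) / n
    mean_y = sum(yfp) / n
    norm = mean_c * mean_y
    pairs = list(zip(cfp, yfp))
    # Intrinsic: spread perpendicular to the CFP = YFP diagonal.
    eta_int2 = sum((c - y) ** 2 for c, y in pairs) / n / (2 * norm)
    # Extrinsic: covariance of the two reporters (spread along the diagonal).
    eta_ext2 = (sum(c * y for c, y in pairs) / n - norm) / norm
    # Total noise.
    eta_tot2 = (sum(c * c + y * y for c, y in pairs) / n - 2 * norm) / (2 * norm)
    return eta_int2, eta_ext2, eta_tot2


# Made-up fluorescence intensities for five cells (arbitrary units).
cfp = [100.0, 120.0, 80.0, 150.0, 90.0]
yfp = [110.0, 105.0, 95.0, 140.0, 100.0]
eta_int2, eta_ext2, eta_tot2 = noise_decomposition(cfp, yfp)
print(f"intrinsic^2 = {eta_int2:.4f}, extrinsic^2 = {eta_ext2:.4f}, "
      f"total^2 = {eta_tot2:.4f}")

# With perfectly correlated reporters the intrinsic part vanishes.
only_ext = noise_decomposition(cfp, cfp)
```

With these estimators the intrinsic and extrinsic terms add up to the total by construction, which matches the relation given in the text.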
Finally, we can state some important results concerning the study of
noise in genetic networks:
- Extrinsic noise is not gene-specific, but intrinsic noise is.
- Extrinsic noise is predominant over intrinsic noise.
- Noise does not depend on the regulatory pathway, nor on the absolute rate of
expression.
- Noise depends on the rate of a slow upstream promoter transition, such as
chromatin remodelling.
- Downstream effects of noise can have profound phenotypic consequences,
drastically affecting the stability of gene expression.
- Noise (and consequently cell-to-cell variability) is amplified at transitions in long
cascades.
- Autoregulation in gene circuits (in particular negative feedback loops) provides
stability.
- Noise can be controlled by kinetic parameters.
6. CONCLUSION
Drawing on many recent research articles, this paper presents an overview of
the engineering methods used for the modelling of regulatory networks. With this
framework, it is easier for the modeller to abstract a network, analyse the
dependencies between its parts, study the sensitivity of the system (the effect of tuning
key parameters and of noise) and even predict its behaviour (synthetic circuits,
mutants…). For more details, many worked paradigms are available (2,3,4,5,6,8,12).
References
1. Armen R. Kherlopian, Ting Song, Qi Duan. A review of imaging techniques for
systems biology. BMC Systems Biology 2008, 2:74.
2. A. Goldbeter. A minimal cascade model for the mitotic oscillator involving
cyclin and cdc2 kinase. Proc. Natl. Acad. Sci. USA, Vol. 88, pp. 9107-9111, October
1991.
3. Somdatta Sinha. A Simple Approach to Study Designs in Complex
Biochemical Pathways. 74th Annual Meeting, New Delhi, Oct. 31 – Nov. 2, 2008.
4. David Angeli, James E. Ferrell, Jr., and Eduardo D. Sontag. Detection of
multistability, bifurcations, and hysteresis in a large class of biological
positive-feedback systems. PNAS, 1822–1827, February 17, 2004.
5. Michael C. Mackey, Moisés Santillán, Necmettin Yildirim. Modeling operon
dynamics: the tryptophan and lactose operons as paradigms. C. R. Biologies 327
(2004) 211–224.
6. Vahid Shahrezaei, Julien F. Ollivier and Peter S. Swain. Colored extrinsic
fluctuations and stochastic gene expression. Molecular Systems Biology 4; Article
number 196; doi:10.1038/msb.2008.31.
7. Mads Kærn, William J. Blake, and J. J. Collins. The Engineering of Gene
Regulatory Networks. Annu. Rev. Biomed. Eng. 2003. 5:179–206.
8. Mads Kærn, Timothy C. Elston, William J. Blake and James J. Collins.
Stochasticity in Gene Expression: from theories to phenotypes. Nature Reviews
Genetics, volume 6, 451-463.
9. Ozbudak, Thattai, Kurtser, Grossman, van Oudenaarden. Regulation of noise in
the expression of a single gene. Nat Genet 31: 69-73, 2002.
10. Rosenfeld, Young, Alon, Swain, Elowitz. Gene regulation at the single-cell
level. Science 307: 1962-1965, 2005.
11. Pedraza, van Oudenaarden. Noise propagation in gene networks. Science 307:
1965-69, 2005.
12. Dmitri Bratsun, Dmitri Volfson, Lev S. Tsimring, and Jeff Hasty.
Delay-induced stochastic oscillations in gene regulation. PNAS, vol. 102, no. 4,
14593–14598, October 2005.
13. David Sprinzak & Michael B. Elowitz. Reconstruction of genetic circuits.
Nature, Vol 438, 24 November 2005.
14. Michael B. Elowitz, Arnold J. Levine, Eric D. Siggia, Peter S. Swain. Stochastic
Gene Expression in a Single Cell. Science, Vol 297, 16 August 2002.
15. Nicholas J. Guido, Xiao Wang, David Adalsteinsson. A bottom-up approach to
gene regulation. Nature, Vol 439, 16 February 2006.
16. Jeff Hasty, David McMillen & J. J. Collins. Engineered gene circuits. Nature,
Vol 420, 14 November 2002.
17. Timothy S. Gardner, Charles R. Cantor & James J. Collins. Construction of a
genetic toggle switch in Escherichia coli. Nature, Vol 403, 20 January 2000.
18. Pratap R. Patnaik. External, extrinsic and intrinsic noise in cellular systems:
analogies and implications for protein synthesis. Biotechnology and Molecular
Biology Review, Vol. 1, pp. 121-127, December 2006.