Modeling Biological Pathways: an Object-oriented
like Methodology Based on Mean Field Analysis
Cordero Francesca
Department of Clinical and Biological Sciences
University of Torino
Turin, ITALY
[email protected]
Abstract—In this paper we propose an object-oriented methodology based on Mean Field Analysis that can be used to describe
in an intuitive manner the behavior of systems composed of a
large number of interacting objects. For instance, this technique
is well suited to the study of complex biopathways. We apply this
approach to model the lac operon gene regulatory mechanism and
the glycolysis pathway. Numerical results obtained from the analysis
of the model are presented.
Index Terms—Mean Field Analysis; Biological Pathways
I. INTRODUCTION
Growing interest in biology has led to the development of an
increasing number of models and methodologies for the
analysis of biological systems. Starting from [9], many formalisms
have been proposed; see, for example, [6] and [10]. In
this paper we propose an object-oriented like methodology
based on Mean Field Analysis. Our methodology can be used
to describe in an intuitive manner the behavior of systems
with a large number of interacting objects, such as complex
biopathways. Among the works presented in the literature, [4],
[8] and [5] contain proposals that are close to our formalism. The
main difference with [8] lies in the formalism adopted to
describe the phenomenon under study. In particular, Matsuno
et al. represent the lac operon gene regulatory mechanism and
glycolysis with Hybrid Functional Petri Nets, where each fluid
place models an entity involved in the biological process and
discrete places are used to describe system states. In
our work we consider an extension of the same biological
system (taking into account the Escherichia coli metabolism
switch), and we provide both a more abstract view of the
biopathway and a tractable analysis tool. Indeed, we directly
describe each entity by an object class, and the evolution of
the whole system is derived from the interaction among these
objects. The Bio-PEPA framework presented in [5] is very
similar to our approach; in fact, both works are based on a
high-level abstraction and exploit compositionality and interactions.
Our approach considers only continuous solutions and does
not require fixing an upper bound on the entity concentrations.
The paper is structured as follows: the formal description
of the proposed methodology is presented in Section II. In
Section III we report the biological case study, and in Section
IV we analyze it using the proposed approach. We report some
results in Section V, and we conclude the paper in Section VI.
Manini Daniele, Gribaudo Marco
Department of Computer Science
University of Torino
Turin, ITALY
{manini,marcog}@di.unito.it
II. OBJECT-ORIENTED LIKE MEAN FIELD ANALYSIS
An Object-oriented like Mean Field Model is a representation that describes the behavior of a system as a net composed
of a large number of interacting objects. Objects are divided
into classes: all the objects belonging to a given class have
exactly the same behavior. Objects might be influenced by the
distribution of the other objects in the system.
Each object is modeled by a Continuous Time Markov
Chain (CTMC), whose transition rates may depend on the
state of the whole system. All the objects that belong to the
same class are characterized by exactly the same infinitesimal
generator and the same parameters. If two objects perform the
same actions at different rates, they must belong to different
classes. In order to ease the description of complex systems,
classes are further grouped into meta-classes. All the classes
that derive from the same meta-class are characterized by the
same structure, but different rates.
The number of objects in every class changes dynamically:
new objects might be formed at a given rate (expressed as
quantity of new objects created per unit of time), and each
object has an exponentially distributed maximum lifetime.
More formally, we call an Object-oriented like Mean Field
Model $M$ a tuple:

$$M = (MC, OC) \qquad (1)$$

where $MC = \{mc^{(1)}, \ldots, mc^{(k)}\}$ is a set of $k$ meta-classes
and $OC = \{oc^{[1]}, \ldots, oc^{[m]}\}$ is a set of $m$ object classes.
Each meta-class $mc^{(i)}$ is in turn defined by a tuple:

$$mc^{(i)} = (c^{(i)}, n^{(i)}, L^{(i)}, \Lambda^{(i)}, \mathbf{C}^{(i)}, \mathbf{b}^{(i)}, \mathbf{D}^{(i)}) \qquad (2)$$

where $c^{(i)}$ is a label corresponding to the name of the meta-class,
$n^{(i)}$ is the number of states of the CTMC, $L^{(i)} = \{l^{(i)}\}$
is a set of labels (the names of the states) and
$\Lambda^{(i)} = \{\lambda_1^{(i)}, \ldots, \lambda_{p_i}^{(i)}\}$ is the set of formal parameters.
$\mathbf{C}^{(i)} = |c_{ul}^{(i)}|$ is the $n^{(i)} \times n^{(i)}$ infinitesimal generator of the CTMC, where
$c_{ul}^{(i)}$ is the transition rate from state $u$ to state $l$. $\mathbf{b}^{(i)} = |b_l^{(i)}|$ is
the size $n^{(i)}$ birth vector: its element $b_l^{(i)}$ represents the rate
at which new objects are created in state $l$. $\mathbf{D}^{(i)} = \mathrm{diag}(d_{ll}^{(i)})$
is an $n^{(i)} \times n^{(i)}$ diagonal matrix, such that $1 / d_{ll}^{(i)}$ represents the
mean exponential lifetime of an object in state $l$. The entries of
$\mathbf{C}^{(i)}$, $\mathbf{b}^{(i)}$ and $\mathbf{D}^{(i)}$ may depend on the actual values assigned
to the parameters $\Lambda^{(i)}$. An object class $oc^{[j]}$ is also a tuple:

$$oc^{[j]} = (o^{[j]}, c^{[j]}, \Gamma^{[j]}, N^{[j]}, \pi_0^{[j]}) \qquad (3)$$

where $o^{[j]}$ is a label representing the name of the class;
$c^{[j]}$ is the name of the meta-class from which the class derives;
$\Gamma^{[j]} = \{\gamma_1^{[j]}, \ldots, \gamma_{p_i}^{[j]}\}$ is the set of actual parameters assigned
to each of the formal parameters of the meta-class defined by
$\Lambda^{(i)}$; $N^{[j]}$ is the initial number of objects; $\pi_0^{[j]}$ is a probability
vector of size $n^{[j]}$ that defines the initial state probability
for the objects belonging to this class. We define $n^{[j]}$ as the
number of states of class $j$ inherited from its meta-class, that is
$n^{[j]} = n^{(\text{meta-class of } j)}$. Note that we use round brackets
in superscripts for elements corresponding to meta-classes and
square brackets to denote elements belonging to classes. The
value of each actual parameter can depend on the distribution
of the number of objects among the states of all the classes
that compose the model.
Note that our approach differs from classical Markovian
compositional approaches such as the one cited in [1]. In
conventional compositional approaches the state space grows
exponentially, whereas our mean field based methodology provides
approximations of the system that scale linearly with respect
to the number of objects.
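As an illustration of the tuples in Eqs. (2) and (3), the following Python sketch shows one possible way to hold a meta-class and an object class in memory. It is not part of the original formalism: the names `MetaClass`, `ObjectClass`, `build` and the toy `Toggle` example are assumptions introduced here, and the actual parameters are taken as constants, whereas in the model they may depend on the state of the whole system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]
Vector = List[float]
# Hypothetical convention: a builder maps parameter values to (C, b, D).
Builder = Callable[[Dict[str, float]], Tuple[Matrix, Vector, Matrix]]


@dataclass
class MetaClass:
    """Meta-class mc^(i) of Eq. (2)."""
    name: str                 # c^(i)
    states: List[str]         # L^(i); n^(i) is len(states)
    formal_params: List[str]  # Lambda^(i)
    build: Builder            # produces C^(i), b^(i), D^(i)


@dataclass
class ObjectClass:
    """Object class oc^[j] of Eq. (3)."""
    name: str                        # o^[j]
    meta: MetaClass                  # c^[j], held as a reference
    actual_params: Dict[str, float]  # Gamma^[j], constants in this sketch
    n_initial: float                 # N^[j]
    pi0: Vector                      # pi_0^[j]

    def initial_state(self) -> Vector:
        # N^[j](0) = N^[j] * pi_0^[j]
        return [self.n_initial * p for p in self.pi0]

    def instantiate(self) -> Tuple[Matrix, Vector, Matrix]:
        # Insert the actual parameters into the meta-class definitions.
        missing = set(self.meta.formal_params) - set(self.actual_params)
        if missing:
            raise ValueError(f"unassigned formal parameters: {sorted(missing)}")
        return self.meta.build(self.actual_params)


# Toy two-state meta-class, invented only to exercise the structures.
def build_toggle(p: Dict[str, float]) -> Tuple[Matrix, Vector, Matrix]:
    C = [[-p["on"], p["on"]], [p["off"], -p["off"]]]
    b = [p["birth"], 0.0]
    D = [[0.0, 0.0], [0.0, p["death"]]]
    return C, b, D


toggle = MetaClass("Toggle", ["Idle", "Busy"],
                   ["on", "off", "birth", "death"], build_toggle)
worker = ObjectClass("Worker", toggle,
                     {"on": 2.0, "off": 1.0, "birth": 0.5, "death": 0.1},
                     100.0, [1.0, 0.0])
```

Two classes that share the structure of `toggle` but use different rates would simply be two `ObjectClass` instances pointing to the same `MetaClass`, which mirrors the role of meta-classes described above.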
A. Analysis
The model is analyzed using mean field analysis [3], which
takes advantage of the result proposed in [2] to consider
the evolution of each class separately. Initially, object classes
are instantiated: matrix $\mathbf{C}^{[j]}(\cdot)$, vector $\mathbf{b}^{[j]}(\cdot)$ and matrix
$\mathbf{D}^{[j]}(\cdot)$ are computed for each $oc^{[j]}$ by inserting the actual
parameters $\Gamma^{[j]}$ in the definitions of $\mathbf{C}^{(i)}$, $\mathbf{b}^{(i)}$ and $\mathbf{D}^{(i)}$. We
call $\mathbf{N}^{[j]}(t) = |N_l^{[j]}(t)|$ a vector of size $n^{[j]}$, whose element
$N_l^{[j]}(t)$ represents the number of objects of class $j$ in state $l$ at
time $t$. Formal parameters can depend on the number of objects
in each state, and thus we have $\mathbf{C}^{[j]}(\mathbf{N}^{[1]}(t), \ldots, \mathbf{N}^{[m]}(t))$,
$\mathbf{b}^{[j]}(\mathbf{N}^{[1]}(t), \ldots, \mathbf{N}^{[m]}(t))$ and $\mathbf{D}^{[j]}(\mathbf{N}^{[1]}(t), \ldots, \mathbf{N}^{[m]}(t))$.
The evolution of the system can then be studied by solving, for
$j = 1, \ldots, m$:

$$\frac{d\mathbf{N}^{[j]}(t)}{dt} = \mathbf{N}^{[j]}(t) \left[ \mathbf{C}^{[j]}(\cdot) - \mathbf{D}^{[j]}(\cdot) \right] + \mathbf{b}^{[j]}(\cdot) \qquad (4)$$

with $\mathbf{N}^{[j]}(0) = N^{[j]} \pi_0^{[j]}$. Note that, due to the presence of
the birth and death terms $\mathbf{b}^{[j]}(\cdot)$ and $\mathbf{D}^{[j]}(\cdot)$, the equation is
no longer a standard CTMC equation, and in general we have
that $\sum_{l=1}^{n^{[j]}} N_l^{[j]}(t) \neq N^{[j]}$ for $t > 0$. The derivation of Eq. (4)
can be summarized as follows. To simplify the presentation
we drop the $[j]$ superscript and the state dependencies $(\cdot)$. The
number of objects of class $j$ in state $l$ at time $t + \Delta t$ can be
approximated by:

$$N_l(t + \Delta t) \approx N_l(t) + \sum_{u \neq l} N_u(t) c_{ul} \Delta t - N_l(t) \sum_{u \neq l} c_{lu} \Delta t - N_l(t) d_{ll} \Delta t + b_l \Delta t \qquad (5)$$

The second and third terms on the r.h.s. of Eq. (5) represent
objects entering and leaving state $l$, while the last two terms
consider the death and the birth of objects. By applying
the definition $c_{ll} = -\sum_{u \neq l} c_{lu}$, rearranging the terms, and
dividing by $\Delta t$ we obtain:

$$\frac{N_l(t + \Delta t) - N_l(t)}{\Delta t} \approx \sum_{u} N_u(t) c_{ul} - N_l(t) d_{ll} + b_l \qquad (6)$$

Eq. (4) can be obtained by letting $\Delta t \to 0$, and using vector
notation.
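To make Eq. (4) concrete, the sketch below applies a fixed-step explicit Euler update to a single class. It is a minimal illustration under a simplifying assumption that is not in the model: the entries of C, D and b are constants here, while in the methodology they may depend on the state of all classes. The function names `euler_step` and `integrate` are hypothetical.

```python
def euler_step(N, C, D, b, dt):
    """One explicit Euler step of dN/dt = N (C - D) + b, with N a row vector."""
    n = len(N)
    new_N = []
    for l in range(n):
        # Sum over all u of N_u * c_ul; the u == l term carries c_ll <= 0,
        # which accounts for objects leaving state l (see Eq. (6)).
        flow = sum(N[u] * C[u][l] for u in range(n))
        new_N.append(N[l] + dt * (flow - N[l] * D[l][l] + b[l]))
    return new_N


def integrate(N0, C, D, b, dt, steps):
    """Repeat euler_step `steps` times starting from N0 (constant rates assumed)."""
    N = list(N0)
    for _ in range(steps):
        N = euler_step(N, C, D, b, dt)
    return N
```

With `D` and `b` set to zero the update reduces to the usual CTMC equation and the total number of objects is conserved; with a single state, birth rate `b` and death rate `d`, the iteration settles at `b / d`, which is a quick sanity check for an implementation.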
B. Methodology
We can summarize a methodology to define object-oriented
like mean field models in three steps. First, we identify the
different types of entities that compose the system and we
abstract their behavior into meta-classes. Second, we define the
Markov chains and the formal parameters of the meta-classes.
Finally, we define a class for each type of entity. Each class
is derived from a meta-class by assigning appropriate rates to
the formal parameters.
III. LAC OPERON GENE REGULATORY MECHANISM AND GLYCOLYSIS
All organisms respond to changing conditions in their
environment by controlling the expression of their genes.
Depending on the circumstances, bacteria can regulate
their metabolic pathways through the expression, and consequently
the concentration, of specific enzymes. Escherichia coli is a
typical example of this behavior, since it can alter its enzyme
concentrations to take full advantage of sugar fluctuations
in its environment. The bacterium thus avoids synthesizing the
enzymes of a pathway in the absence of the substrate, but it is
ready to produce them if the substrate appears.
Escherichia coli can use two types of sugar: glucose and
lactose. When glucose is abundant, the bacterium uses only
glucose, even if other types of sugar are present.
However, when all the glucose is consumed, the bacterium is
able to metabolize an alternative sugar, lactose. The ability of
this bacterium to switch from one metabolite to another was
first described in [7].
The key element in the switch between the two sugar
metabolisms is the lac operon. This unit is a sequence of DNA
formed by a promoter and an encoding region. The promoter
region consists of critical elements that work in
concert to direct the level of transcription of a given gene.
The encoding region consists of three genes (LacZ, LacY
and LacA), which are translated into three different proteins
(also called enzymes) that catalyze chemical reactions. These
proteins are β-galactosidase, lactose permease and galactoside
O-acetyltransferase. The speeds of both transcription
and translation depend on the concentrations of glucose and
lactose. The regulation depends on the concentrations of both
Cyclic Adenosine Monophosphate (cAMP) and a protein
called the repressor.
In the absence of lactose the concentration of the repressor is
high; the repressor has a high affinity for the lac promoter, which
is therefore under negative regulation, and consequently the
transcription of the lac operon genes is inhibited.
In the presence of both lactose and glucose, since the bacterium has basal
levels of permease and β-galactosidase, it can transport a small
quantity of lactose into the cell and convert it into allolactose.
Allolactose binds the repressor with high affinity and
diminishes the repressor's affinity for the promoter site, resulting
in a small increase in the amount of lac metabolic enzymes
produced. Thus, when both glucose and lactose are provided
to Escherichia coli, the bacterium preferentially metabolizes
glucose until it is depleted.
If glucose is absent, there is a high level of cAMP, which binds
a catabolite protein (CAP); this complex binds the
DNA sequence in the promoter region, leading to positive
regulation. This type of regulation increases the amount of lac
metabolic enzymes generated by a factor of 50. When glucose
becomes available again, the cAMP level drops and cAMP therefore
dissociates from CAP.
Lactose is hydrolysed by β-galactosidase into allolactose
and glucose. The glucose inside the bacterium is broken
down into two molecules of pyruvate during glycolysis.
Glycolysis is the initial step in any respiratory system: it is a
cascade of ten biochemical reactions that yields
energy.
IV. MODELING BIOLOGICAL PATHWAYS
We aim to provide a model of the mechanism described
in Section III, taking into account the interaction between the
events that take place in the environment outside the bacterium and
the biological pathways that occur inside it. We define an
object-oriented like model that exploits the Mean Field Analysis
presented in Section II, following the methodology presented
in Section II-B.
A. First-Step: classes and meta-classes identification
We first identify the entities (classes) that characterize
this phenomenon, and we look for similarities to abstract
their behavior and to define an appropriate number of
meta-classes (see Tab. I). Our model is composed of 36 entities
that can be grouped into six meta-classes: Bacteria,
Promoters, Proteins, Enzymes, Energy-rich Molecules
and Metabolites. Bacteria has just one class, Escherichia coli.
Promoters has two classes, lac and Repressor.
Metabolites includes Glucose and Lactose, and
all the sugars participating in glycolysis. The meta-class Enzymes
has one class for each enzyme that catalyzes a glycolysis
reaction, plus the enzymes involved both in the uptake
from the environment and in the conversion of Glucose and
Lactose. The meta-class Proteins has four classes, LacZ,
LacY, LacA and LacI, which correspond to the genes that encode
the proteins β-Galactosidase, Permease, Galactoside
O-acetyltransferase and Transcription Repressor, respectively.
Finally, Energy-rich Molecules has only one class,
Adenosine.
TABLE I
THE BIOLOGICAL MODEL.

CLASS | META-CLASS
Escherichia coli | Bacteria
Glucose | Metabolites
Lactose | Metabolites
Allolactose | Metabolites
Allolactose-repressor | Metabolites
Glucose 6-phosphate | Metabolites
Fructose 6-phosphate | Metabolites
Fructose 1,6-bisphosphate | Metabolites
Glyceraldehyde 3-phosphate | Metabolites
Dihydroxyacetone phosphate | Metabolites
1,3-bisphosphoglycerate | Metabolites
3-phosphoglycerate | Metabolites
2-phosphoglycerate | Metabolites
Phosphoenolpyruvate | Metabolites
Pyruvate | Metabolites
β-Galactosidase | Enzymes
Permease | Enzymes
Galactoside O-acetyltransferase | Enzymes
Transcription Repressor | Enzymes
Hexokinase | Enzymes
Phospho Glucose Isomerase | Enzymes
Phospho FructoKinase | Enzymes
Aldolase | Enzymes
Triosephosphate Isomerase | Enzymes
Glyceraldehyde-3-Phosphate Dehydrogenase | Enzymes
Phosphoglycerate Kinase | Enzymes
Phosphoglycerate Mutase | Enzymes
Enolase | Enzymes
Pyruvate Kinase | Enzymes
LacZ | Proteins
LacY | Proteins
LacA | Proteins
LacI | Proteins
Adenosine | Energy-rich Molecules
lac | Promoters
Repressor | Promoters
B. Second-Step: meta-classes specification
Next, we define the Markov chains (depicted in Fig.
1) corresponding to the meta-classes identified before. Note
that we use arrows entering (exiting) a state to denote the
birth (death) of an object.
Bacteria can be in states UsingGlucose, UsingLactose
and SugarsEnded. The switch from UsingGlucose to
UsingLactose occurs with rate $\lambda_{GlucoseEnded}$, which indicates
the absence of Glucose. The return to
state UsingGlucose happens with rate $\lambda_{GlucoseAdded}$ and
represents the injection of Glucose. Moreover, the switch
from UsingLactose to SugarsEnded, due to the absence
of both sugars in the environment, happens with rate
$\lambda_{LactANDGlucEnded}$.
Promoters can be in states Basal, Activated and
Repressed. The rates that determine the switching depend
on the positive and negative regulation of the lac operon as
described in Section III.
Proteins can be in states Transcribed and Translated.
$\lambda_{Transcription}$ indicates the transcription rate of mRNA from
the respective gene. The translation of the protein by decoding
the mRNA occurs with rate $\lambda_{Translation}$, and $\lambda_{DegrProt}$ defines
the rate at which the protein degrades.
Enzymes can be in states Deactivated and Activated.
The production rate of each enzyme is determined by
$\lambda_{ProdEnz}$. The enzyme is activated and deactivated with
rates $\lambda_{Aff}$ and $\lambda_{Dis}$ respectively. These parameters reflect
the capability of the enzyme to bind and to release the
substrates. Finally, $\lambda_{DegrEnz}$ defines the rate at which the
enzyme degrades.
Energy-rich Molecules can be in states Di, Mono and
Cycle. These states describe the three main energetic levels
that can be reached by adenosine. The degradation of the
molecules is determined by rates $\lambda_{ConsDI}$, $\lambda_{ConsMONO}$
and $\lambda_{ConsCYCLE}$, while the production rate is $\lambda_{ProdDI}$.
The switching between states Di and Mono is defined by
rates $\lambda_{KinaseD}$ and $\lambda_{PhosphorylaseD}$, whereas the switching
between states Mono and Cycle depends on rates $\lambda_{KinaseM}$
and $\lambda_{PhosphorylaseM}$.
Metabolites can be in states Substrate and Product. The
production rate of each metabolite is determined by $\lambda_{ProdMet}$.
The switching from Substrate to Product depends on the
reaction kinetics defined by the rate $\lambda_{Transf}$. Finally, $\lambda_{DegrMet}$
defines the rate at which the metabolite degrades.
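As a hedged illustration of how such a chain becomes an infinitesimal generator, the sketch below builds the matrix C for the Bacteria meta-class from a list of transitions. Only the three transitions named in the text are encoded; any further arcs drawn in Fig. 1 are not recoverable here. The helper name `generator` and the numeric rate values are placeholders, not values from the paper.

```python
def generator(states, transitions):
    """Build an infinitesimal generator from {(source, target): rate}.

    Off-diagonal entries hold the transition rates; each diagonal entry is
    set to minus the sum of the other entries in its row, so rows sum to 0.
    """
    index = {s: k for k, s in enumerate(states)}
    n = len(states)
    C = [[0.0] * n for _ in range(n)]
    for (src, dst), rate in transitions.items():
        C[index[src]][index[dst]] += rate
    for u in range(n):
        C[u][u] = -sum(C[u][l] for l in range(n) if l != u)
    return C


bacteria_states = ["UsingGlucose", "UsingLactose", "SugarsEnded"]
# Placeholder numbers standing in for lambda_GlucoseEnded,
# lambda_GlucoseAdded and lambda_LactANDGlucEnded.
bacteria_C = generator(bacteria_states, {
    ("UsingGlucose", "UsingLactose"): 0.5,
    ("UsingLactose", "UsingGlucose"): 0.2,
    ("UsingLactose", "SugarsEnded"): 0.1,
})
```

Births and deaths (the arrows entering and exiting states) are not part of this matrix; they belong to the vector b and the diagonal matrix D of Eq. (2).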
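All the meta-classes presented above can be formally expressed using the tuple reported in Eq. (2). For example, the
meta-class Metabolites ($mc^{(4)}$) is defined by:

$$mc^{(4)} = \left( \text{'Metabolites'},\ 2,\ \{\text{'Substrate'}, \text{'Product'}\},\ \{\lambda_{ProdMet}, \lambda_{Transf}, \lambda_{DegrMet}\},\ \begin{vmatrix} -\lambda_{Transf} & \lambda_{Transf} \\ 0 & 0 \end{vmatrix},\ |\lambda_{ProdMet}, 0|,\ \begin{vmatrix} 0 & 0 \\ 0 & \lambda_{DegrMet} \end{vmatrix} \right)$$

The first element of the tuple, $c^{(4)} = \text{'Metabolites'}$, is the name
of the meta-class; $n^{(4)} = 2$ indicates the number of states,
whose names are defined in $L^{(4)} = \{\text{'Substrate'}, \text{'Product'}\}$.
The term $\Lambda^{(4)} = \{\lambda_{ProdMet}, \lambda_{Transf}, \lambda_{DegrMet}\}$ lists the
formal parameters used in the specification of

$$\mathbf{C}^{(4)} = \begin{vmatrix} -\lambda_{Transf} & \lambda_{Transf} \\ 0 & 0 \end{vmatrix}, \quad \mathbf{b}^{(4)} = |\lambda_{ProdMet}, 0| \quad \text{and} \quad \mathbf{D}^{(4)} = \begin{vmatrix} 0 & 0 \\ 0 & \lambda_{DegrMet} \end{vmatrix}.$$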
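Substituting the Metabolites matrices into Eq. (4) gives two coupled equations, one per state. The sketch below writes the matrices out and evaluates the right-hand side N (C - D) + b for a given state vector. The function names are hypothetical and the rates are treated as plain numbers, although in the model they are expressions that depend on other classes.

```python
def metabolites_matrices(prod_met, transf, degr_met):
    """C, b and D of the Metabolites meta-class (states: Substrate, Product)."""
    C = [[-transf, transf], [0.0, 0.0]]
    b = [prod_met, 0.0]
    D = [[0.0, 0.0], [0.0, degr_met]]
    return C, b, D


def metabolites_rhs(N, prod_met, transf, degr_met):
    """Right-hand side of Eq. (4) for N = [N_Substrate, N_Product].

    Expanded by hand this reads:
      dN_Substrate/dt = -transf * N_Substrate + prod_met
      dN_Product/dt   =  transf * N_Substrate - degr_met * N_Product
    """
    C, b, D = metabolites_matrices(prod_met, transf, degr_met)
    return [
        sum(N[u] * (C[u][l] - D[u][l]) for u in range(2)) + b[l]
        for l in range(2)
    ]
```

The expanded form in the docstring makes the roles visible: new objects are born in Substrate, move to Product at the transformation rate, and die only from Product.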
C. Third-Step: classes and parameters specification
The crucial phase of this work is the definition of the rates
that determine the relations and the interactions among all
the class objects of the model. The formal rates (depicted in Fig.
1) must be instantiated for each class. In the following we
focus on the formalization of class Glucose 6-phosphate
(Glucose6P) and the definition of the actual rates that determine the behavior of this class. We denote each class state with
the notation Class.State, whereas #(Class.State) represents
the number of objects of Class in State, and #(Class)
expresses the total number of objects of Class. The class
Glucose6P ($oc^{[6]}$) can be formally expressed using the tuple
reported in (3):

$$oc^{[6]} = \left( \text{'Glucose6P'},\ \text{'Metabolites'},\ \{\lambda_{ProdGlucose6P}, \lambda_{TransfGlucose6P}, \lambda_{DegrGlucose6P}\},\ 0,\ |1, 0| \right)$$

where $o^{[6]} = \text{'Glucose6P'}$ is the name of the class and
$c^{[6]} = \text{'Metabolites'}$ is the name of its meta-class. The term
$\Gamma^{[6]} = \{\lambda_{ProdGlucose6P}, \lambda_{TransfGlucose6P}, \lambda_{DegrGlucose6P}\}$ lists the
actual parameters assigned to the formal parameters indicated
by $\Lambda^{(4)}$ (coming from Metabolites, that is $mc^{(4)}$). $N^{[6]} = 0$
is the initial number of Glucose6P objects and $\pi_0^{[6]} = |1, 0|$
is the initial state probability vector (in this case all objects
start from state Substrate).
Fig. 1. Markov Chains representing object meta-classes
To define the actual rates $\lambda_{ProdGlucose6P}$, $\lambda_{TransfGlucose6P}$
and $\lambda_{DegrGlucose6P}$ we consider that Glucose6P is involved
in the glycolysis cascade. To represent this series of biochemical reactions, in which the products of one reaction
are consumed in the next, we overlap metabolite states. In
particular, the Product state of a metabolite corresponds to the
Substrate state of the metabolite involved in the subsequent
reaction. Fig. 2 reports the conversion of Glucose
into Glucose6P; the overlapped states are contained in the
dashed box. We are able to calculate the rates $\lambda_{TransfGlucose}$,
$\lambda_{TransfGlucose6P}$ and $\lambda_{TransfFructose6P}$ by exploiting the
Michaelis-Menten kinetics:

$$\lambda_{TransfGlucose} = \frac{k_2 \cdot \#(\text{Glucose.Substrate}) \cdot \#(\text{Hexo})}{k_{MM} + \#(\text{Glucose.Substrate})} \qquad (7)$$

$$\lambda_{TransfGlucose6P} = \frac{k_2 \cdot \#(\text{Glucose6P.Substrate}) \cdot \#(\text{PhGlIs})}{k_{MM} + \#(\text{Glucose6P.Substrate})} \qquad (8)$$

$$\lambda_{TransfFructose6P} = \frac{k_2 \cdot \#(\text{Fructose6P.Substrate}) \cdot \#(\text{PFrK})}{k_{MM} + \#(\text{Fructose6P.Substrate})} \qquad (9)$$

where $k_2$ and $k_{MM}$ are respectively the kinetic parameter of
the metabolite production and the Michaelis-Menten constant.
Fructose6P is the metabolite that follows Glucose6P in
glycolysis. (We point out that the kinetics we adopted is suited for
cases in which the number of substrates increases.)
Fig. 2. Conversion from Glucose to Glucose6P. States in the dashed box are
overlapped.
Taking advantage of the equivalence between overlapped states, we can define the following identities:
$\lambda_{ProdGlucose6P} = \lambda_{TransfGlucose}$ and $\lambda_{DegrGlucose6P} =
\lambda_{TransfFructose6P}$, which allow us to derive all the actual
parameters of class Glucose6P.
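The rate expressions of Eqs. (7)-(9) and the two identities for overlapped states can be sketched in a few lines of Python. This is an illustration only: the function names, the dictionary keys used for the counts, and every numeric value in the usage note are assumptions, not parameters taken from the paper.

```python
def michaelis_menten(k2, k_mm, substrate, enzyme):
    """Rate of the form k2 * #S * #E / (k_MM + #S), as in Eqs. (7)-(9)."""
    return k2 * substrate * enzyme / (k_mm + substrate)


def glucose6p_parameters(counts, k2, k_mm):
    """Actual parameters of class Glucose6P from the current object counts.

    `counts` maps hypothetical keys such as "Glucose.Substrate" or "Hexo"
    to #(Class.State) or #(Class).
    """
    transf_glucose = michaelis_menten(
        k2, k_mm, counts["Glucose.Substrate"], counts["Hexo"])         # Eq. (7)
    transf_glucose6p = michaelis_menten(
        k2, k_mm, counts["Glucose6P.Substrate"], counts["PhGlIs"])     # Eq. (8)
    transf_fructose6p = michaelis_menten(
        k2, k_mm, counts["Fructose6P.Substrate"], counts["PFrK"])      # Eq. (9)
    return {
        "ProdGlucose6P": transf_glucose,       # identity with TransfGlucose
        "TransfGlucose6P": transf_glucose6p,
        "DegrGlucose6P": transf_fructose6p,    # identity with TransfFructose6P
    }
```

For instance, with the made-up values `k2 = 2.0`, `k_mm = 10.0`, ten Glucose objects in Substrate and three Hexo objects, Eq. (7) evaluates to 2 * 10 * 3 / (10 + 10) = 3.0. Because the counts change over time, these parameters would be recomputed at every integration step.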
For the sake of brevity we present only the derivation
of the rates for the class Glucose6P. Parameter
derivation is the most crucial step in the development of
object-oriented like mean field models. For each class we
had to define appropriate expressions to capture the specific
biological behavior, and these are in general significantly different
from Eq. (8). This requires an accurate study
of the interdependency among related entities. For instance,
in the case of Glucose6P the interdependency with classes
Hexokinase (Hexo), Phospho Glucose Isomerase (PhGlIs)
and Phospho FructoKinase (PFrK) is considered in terms
of #(Hexo), #(PhGlIs) and #(PFrK).
V. RESULTS
In this section we show some of the results obtained from
the analysis of the model with a set of parameters derived
from the literature. All results have been computed by solving
Eq. (4) with standard numerical techniques. Using Euler's
method with a fixed step size we were able to obtain stable
solutions in a few minutes on a standard PC. We assume that the
system starts with initial concentrations of glucose and lactose
greater than 0. Moreover, new molecules of glucose arrive after a
given amount of time. Due to the initial presence of glucose,
Escherichia coli begins to consume this metabolite until it
is depleted. Then the bacterium starts consuming lactose until
new glucose is injected into the environment. When the cell
consumes glucose there is a basal level of expression of the
lac operon; when instead the bacterium uses lactose, the
promoter is activated. Fig. 3 plots the status of promoter lac.
Note that there is a gap between the switching instants of the
two classes (Escherichia coli and lac). Indeed, when glucose
ends the enzyme activity appears very rapidly. On the other
hand, when lactose finishes the synthesis of enzymes stops
as rapidly as it had originally started. However, the enzymes
are more stable than the mRNA, so their activity remains at
the induced level for longer, producing the gap.
Fig. 3. Time evolution of promoter lac (plot) and Escherichia coli status
(vertical dashed lines)
Fig. 4. Time evolution of the concentration of regulators of the lac status
(plots) and Escherichia coli status (vertical dashed lines)
The entities that regulate the lac status are cAMP
(Adenosine.Cycle) and the allolactose-repressor complex
(Allolactose-repressor.Product). Their evolution is reported in Fig. 4. Finally, Fig. 5 reports the evolution of some
of the metabolites involved in the glycolysis cascade. This
figure points out that metabolites are produced sequentially
according to glycolysis: glucose 6-phosphate, fructose 6-phosphate,
1,3-bisphosphoglycerate, 3-phosphoglycerate and
2-phosphoglycerate.
VI. CONCLUSION
In this work we presented a methodology, based on an
object-oriented like analysis, that describes the lac operon gene
regulatory mechanism and the glycolysis pathway. The
object-oriented like approach presented in this paper simplifies the
use of mean field analysis to model systems characterized
by a large number of interacting objects. The proposed
formalism allowed us to define a high-level abstraction of the
biopathway, providing experimenters with a direct view of the
entities that form the phenomenon.
Fig. 5. Time evolution of the concentration of some metabolites involved in
the glycolysis cascade
ACKNOWLEDGMENT
Part of this work was supported by grants from the Italian Association
for Cancer Research and the Regione Piemonte. F. Cordero
is the recipient of a research fellowship supported by Regione
Piemonte and Università di Torino.
REFERENCES
[1] K. Atif and B. Plateau, “Stochastic automata network for modeling
parallel systems,” IEEE Transactions on Software Engineering, vol. 17,
no. 10, 1991.
[2] A. Bobbio, M. Gribaudo, and M. Telek, “Analysis of large scale interacting systems by mean field method,” in 5th International Conference
on Quantitative Evaluation of Systems - QEST2008, St. Malo, 2008, pp.
215–224.
[3] J.-Y. Le Boudec, D. McDonald, and J. Mundinger, “A generic mean field
convergence result for systems of interacting objects,” in 4th International Conference on Quantitative Evaluation of Systems - QEST2007,
Edinburgh, 2007, pp. 3–18.
[4] H. Busch, W. Sandmann, and V. Wolf, “A numerical aggregation
algorithm for the enzyme-catalyzed substrate conversion,” in The 4th
Conference on Computational Methods in Systems Biology, 2006, pp.
298–311.
[5] F. Ciocchetta and J. Hillston, “Bio-PEPA: a framework for the modelling
and analysis of biological systems,” Theoretical Computer Science, 2008.
[6] R. Hofestädt, “A Petri net application of metabolic processes,” Journal
of System Analysis, Modeling and Simulation, vol. 16, pp. 113–122,
1994.
[7] F. Jacob and J. Monod, “On the regulation of gene activity,” Cold Spring
Harb. Symp. Quant. Biol., vol. 26, pp. 193–211, 1961.
[8] H. Matsuno, S. Fujita, A. Doi, M. Nagasaki, and S. Miyano, “Towards
pathway modelling and simulation,” in Proceedings of the ICATPN 2003,
ser. LNCS 2679. Eindhoven, Netherlands: Springer, 2003, pp. 3–22.
[9] V. Reddy, M. Mavrovouniotis, and M. Liebman, “Qualitative analysis of
biochemical reaction systems,” Comput. Biol. Med., vol. 26, pp. 9–24,
1996.
[10] K. Voss, M. Heiner, and I. Koch, “Steady state analysis of metabolic
pathways using Petri nets,” In Silico Biology, vol. 3, pp. 46–61, 2003.