Modeling Biological Pathways: an Object-oriented like Methodology Based on Mean Field Analysis

Cordero Francesca
Department of Clinical and Biological Sciences, University of Torino, Turin, ITALY

Manini Daniele, Gribaudo Marco
Department of Computer Science, University of Torino, Turin, ITALY
{manini,marcog}@di.unito.it

Abstract: In this paper we propose an object-oriented like methodology based on Mean Field Analysis that can be used to describe in an intuitive manner the behavior of systems composed of a large number of interacting objects. For instance, this technique is well suited to the study of complex biopathways. We apply this approach to model the lac operon gene regulatory mechanism and the glycolysis pathway. Numerical results obtained from the analysis of the model are presented.

Index Terms: Mean Field Analysis; Biological Pathways

I. INTRODUCTION

Growing interest in biology has led to the development of an increasing number of models and methodologies for the analysis of biological systems. Starting from [9], many formalisms have been proposed; to mention a few, see [6], [10]. In this paper we propose an object-oriented like methodology based on Mean Field Analysis. Our methodology can be used to describe in an intuitive manner the behavior of systems with a large number of interacting objects, such as complex biopathways.

Among the works presented in the literature, [4], [8] and [5] contain proposals that are close to our formalism. The main difference with [8] lies in the formalism that has been adopted to describe the phenomenon under study. In particular, Matsuno et al. represent the lac operon gene regulatory mechanism and glycolysis with Hybrid Functional Petri Nets, where each fluid place models an entity involved in the biological process and discrete places are used to describe system states. In our work we consider an extension of the same biological system (taking into account the Escherichia coli metabolism switch), and we provide both a more abstract view of the biopathway and a tractable analysis tool. Indeed, we directly describe each entity by an object class, and the evolution of the whole system is derived from the interaction among these objects. The Bio-PEPA framework presented in [5] is very similar to our approach: both works are based on a high-level abstraction and exploit compositionality and interactions. Our approach considers only continuous solutions, and it does not require fixing an upper bound on the entity concentrations.

The paper is structured as follows: the formal description of the proposed methodology is presented in Section II. In Section III we report the biological case study, and in Section IV we analyze it using the proposed approach. We report some results in Section V, and we conclude the paper in Section VI.

II. OBJECT-ORIENTED LIKE MEAN FIELD ANALYSIS

An Object-oriented like Mean Field Model is a representation that describes the behavior of a system as a net composed of a large number of interacting objects. Objects are divided into classes: all the objects belonging to a given class have exactly the same behavior. Objects might be influenced by the distribution of the other objects in the system. Each object is modeled by a Continuous Time Markov Chain (CTMC), whose transition rates may depend on the state of the whole system. All the objects that belong to the same class are characterized by exactly the same infinitesimal generator and the same parameters. If two objects perform the same actions at different rates, they must belong to different classes. In order to ease the description of complex systems, classes are further grouped into meta-classes. All the classes that derive from the same meta-class are characterized by the same structure, but different rates.
The number of objects in every class changes dynamically: new objects might be formed at a given rate (expressed as the quantity of new objects created per unit of time), and each object has an exponentially distributed maximum lifetime. More formally, we call an Object-oriented like Mean Field Model $\mathcal{M}$ a tuple:

$$\mathcal{M} = (MC, OC) \tag{1}$$

where $MC = \{mc^{(1)}, \ldots, mc^{(k)}\}$ is a set of $k$ meta-classes and $OC = \{oc^{[1]}, \ldots, oc^{[m]}\}$ is a set of $m$ object classes. Each meta-class $mc^{(i)}$ is in turn defined by a tuple:

$$mc^{(i)} = (c^{(i)}, n^{(i)}, L^{(i)}, \Lambda^{(i)}, \mathbf{C}^{(i)}, \mathbf{b}^{(i)}, \mathbf{D}^{(i)}) \tag{2}$$

where $c^{(i)}$ is a label corresponding to the name of the meta-class, $n^{(i)}$ is the number of states of the CTMC, $L^{(i)} = \{l^{(i)}\}$ is a set of labels (the names of the states) and $\Lambda^{(i)} = \{\lambda_1^{(i)}, \ldots, \lambda_{p_i}^{(i)}\}$ is the set of formal parameters. $\mathbf{C}^{(i)} = |c_{ul}^{(i)}|$ is the $n^{(i)} \times n^{(i)}$ infinitesimal generator of the CTMC, where $c_{ul}^{(i)}$ is the transition rate from state $u$ to state $l$. $\mathbf{b}^{(i)} = |b_l^{(i)}|$ is the birth vector of size $n^{(i)}$: its element $b_l^{(i)}$ represents the rate at which new objects are created in state $l$. $\mathbf{D}^{(i)} = \mathrm{diag}(d_{ll}^{(i)})$ is an $n^{(i)} \times n^{(i)}$ diagonal matrix, such that $1/d_{ll}^{(i)}$ represents the mean exponential lifetime of an object in state $l$. The entries of $\mathbf{C}^{(i)}$, $\mathbf{b}^{(i)}$ and $\mathbf{D}^{(i)}$ may depend on the actual values assigned to the parameters $\Lambda^{(i)}$.

An object class $oc^{[j]}$ is also a tuple:

$$oc^{[j]} = (o^{[j]}, c^{[j]}, \Gamma^{[j]}, N^{[j]}, \pi_0^{[j]}) \tag{3}$$

where $o^{[j]}$ is a label representing the name of the class; $c^{[j]}$ is the name of the meta-class from which the class derives; $\Gamma^{[j]} = \{\gamma_1^{[j]}, \ldots, \gamma_{p_i}^{[j]}\}$ is the set of actual parameters assigned to each of the formal parameters of the meta-class defined by $\Lambda^{(i)}$; $N^{[j]}$ is the initial number of objects; $\pi_0^{[j]}$ is a probability vector of size $n^{[j]}$ that defines the initial state probability for the objects belonging to this class. We define $n^{[j]}$ as the number of states of class $j$ inherited from its meta-class, that is $n^{[j]} = n^{(\text{meta-class of } j)}$.
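The tuples of Eqs. (2) and (3) map naturally onto two small record types. The following is only a minimal sketch of that mapping, not the authors' implementation: the names `MetaClass`, `ObjectClass`, `instantiate` and `initial_counts`, as well as the two-state "Toy" example and its rate values, are hypothetical and introduced purely for illustration. As a simplification, the actual parameters are plain constants here, whereas in the formalism they may depend on the state of the whole system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]      # actual values bound to formal parameter names
Vector = List[float]
Matrix = List[List[float]]


@dataclass
class MetaClass:
    """Meta-class tuple of Eq. (2): (c, n, L, Lambda, C, b, D)."""
    name: str                          # c: label of the meta-class
    states: List[str]                  # L: state names
    formal_params: List[str]           # Lambda: names of the formal parameters
    C: Callable[[Params], Matrix]      # infinitesimal generator, given parameters
    b: Callable[[Params], Vector]      # birth vector, given parameters
    D: Callable[[Params], Matrix]      # diagonal death-rate matrix, given parameters

    @property
    def n(self) -> int:
        """Number of CTMC states (n in Eq. (2))."""
        return len(self.states)


@dataclass
class ObjectClass:
    """Object-class tuple of Eq. (3): (o, c, Gamma, N, pi0)."""
    name: str                  # o: label of the class
    meta: MetaClass            # c: kept as a reference instead of a label
    actual_params: Params      # Gamma: actual parameters
    N0: float                  # N: initial number of objects
    pi0: Vector                # initial state probability vector

    def instantiate(self) -> Tuple[Matrix, Vector, Matrix]:
        """Insert the actual parameters into the meta-class definitions."""
        missing = set(self.meta.formal_params) - set(self.actual_params)
        if missing:
            raise ValueError(f"unassigned formal parameters: {sorted(missing)}")
        g = self.actual_params
        return self.meta.C(g), self.meta.b(g), self.meta.D(g)

    def initial_counts(self) -> Vector:
        """Initial number of objects per state: N * pi0."""
        return [self.N0 * p for p in self.pi0]


# Hypothetical two-state meta-class: objects are born in A, move to B, die in B.
toy = MetaClass(
    name="Toy",
    states=["A", "B"],
    formal_params=["birth", "move", "death"],
    C=lambda p: [[-p["move"], p["move"]], [0.0, 0.0]],
    b=lambda p: [p["birth"], 0.0],
    D=lambda p: [[0.0, 0.0], [0.0, p["death"]]],
)
toy_class = ObjectClass(
    name="ToyClass",
    meta=toy,
    actual_params={"birth": 2.0, "move": 0.5, "death": 0.1},
    N0=10.0,
    pi0=[1.0, 0.0],
)
C, b, D = toy_class.instantiate()
print(C, b, D, toy_class.initial_counts())
```

Storing $\mathbf{C}$, $\mathbf{b}$ and $\mathbf{D}$ as functions of the parameters keeps the meta-class purely structural, so that several classes with different rates can share one definition.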
Note that we use round brackets in superscripts for elements corresponding to meta-classes and square brackets to denote elements belonging to classes. The value of each actual parameter can depend on the distribution of the number of objects among the states of all the classes that compose the model. Note that our approach is different from classical Markovian compositional approaches like the one cited in [1]. The state space grows exponentially in conventional compositional approaches, whereas our mean field based methodology provides approximations of the system that scale linearly with respect to the number of objects.

A. Analysis

The model is analyzed using mean field analysis [3], which takes advantage of the result proposed in [2] to consider the evolution of each class separately. Initially, object classes are instantiated: matrix $\mathbf{C}^{[j]}(\cdot)$, vector $\mathbf{b}^{[j]}(\cdot)$ and matrix $\mathbf{D}^{[j]}(\cdot)$ are computed for each $oc^{[j]}$ by inserting the actual parameters $\Gamma^{[j]}$ in the definitions of $\mathbf{C}^{(i)}$, $\mathbf{b}^{(i)}$ and $\mathbf{D}^{(i)}$. We call $\mathbf{N}^{[j]}(t) = |N_l^{[j]}(t)|$ a vector of size $n^{[j]}$, whose element $N_l^{[j]}(t)$ represents the number of objects of class $j$ in state $l$ at time $t$. Since the parameters can depend on the number of objects in each state, we have $\mathbf{C}^{[j]}(\mathbf{N}^{[1]}(t), \ldots, \mathbf{N}^{[m]}(t))$, $\mathbf{b}^{[j]}(\mathbf{N}^{[1]}(t), \ldots, \mathbf{N}^{[m]}(t))$ and $\mathbf{D}^{[j]}(\mathbf{N}^{[1]}(t), \ldots, \mathbf{N}^{[m]}(t))$. The evolution of the system can then be studied by solving, for $j = 1, \ldots, m$:

$$\frac{d\mathbf{N}^{[j]}(t)}{dt} = \mathbf{N}^{[j]}(t) \left[ \mathbf{C}^{[j]}(\cdot) - \mathbf{D}^{[j]}(\cdot) \right] + \mathbf{b}^{[j]}(\cdot) \tag{4}$$

with $\mathbf{N}^{[j]}(0) = N^{[j]} \pi_0^{[j]}$. Note that, due to the presence of the birth and death terms $\mathbf{b}^{[j]}(\cdot)$ and $\mathbf{D}^{[j]}(\cdot)$, the equation is no longer a standard CTMC equation, and in general we have $\sum_{l=1}^{n^{[j]}} N_l^{[j]}(t) \neq N^{[j]}$ for $t > 0$.

The derivation of Eq. (4) can be summarized as follows. To simplify the presentation we drop the $[j]$ superscript and the state dependencies $(\cdot)$.
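As a concrete reference point before the derivation, Eq. (4) can be integrated numerically with a fixed-step forward Euler scheme. The sketch below is illustrative only and rests on several assumptions: it handles a single class, uses plain Python lists, and the names `rhs`, `euler_solve`, `C_toy`, `D_toy`, `b_toy` and all rate values are hypothetical rather than taken from the model. The rates are passed as functions of the current counts so that state-dependent parameters could be plugged in; the toy rates here happen to be constant.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def rhs(N: Vector,
        C: Callable[[Vector], Matrix],
        D: Callable[[Vector], Matrix],
        b: Callable[[Vector], Vector]) -> Vector:
    """Right-hand side of Eq. (4), N (C - D) + b, with N treated as a row vector."""
    n = len(N)
    Cm, Dm, bv = C(N), D(N), b(N)   # evaluate the (possibly state-dependent) rates
    return [
        sum(N[u] * (Cm[u][l] - Dm[u][l]) for u in range(n)) + bv[l]
        for l in range(n)
    ]


def euler_solve(N0: Vector,
                C: Callable[[Vector], Matrix],
                D: Callable[[Vector], Matrix],
                b: Callable[[Vector], Vector],
                dt: float,
                steps: int) -> Vector:
    """Advance Eq. (4) by `steps` forward Euler steps of size `dt`."""
    N = list(N0)
    for _ in range(steps):
        dN = rhs(N, C, D, b)
        N = [x + dt * dx for x, dx in zip(N, dN)]
    return N


# Hypothetical two-state class: born in state 0, moves to state 1, dies in state 1.
def C_toy(N: Vector) -> Matrix:
    return [[-0.5, 0.5], [0.0, 0.0]]


def D_toy(N: Vector) -> Matrix:
    return [[0.0, 0.0], [0.0, 0.1]]


def b_toy(N: Vector) -> Vector:
    return [2.0, 0.0]


# Start with 10 objects, all in state 0, and integrate up to t = 200.
N_final = euler_solve([10.0, 0.0], C_toy, D_toy, b_toy, dt=0.01, steps=20000)
print(N_final)
```

With these constant rates the steady state solves $\mathbf{N}(\mathbf{C} - \mathbf{D}) + \mathbf{b} = 0$, which gives $\mathbf{N} = (4, 20)$; the total of 24 objects differs from the initial 10, in line with the remark that the number of objects is not conserved.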
The number of objects of class $j$ in state $l$ at time $t + \Delta t$ can be approximated by:

$$N_l(t + \Delta t) \approx N_l(t) + \sum_{u \neq l} N_u(t) c_{ul} \Delta t - N_l(t) \sum_{u \neq l} c_{lu} \Delta t - N_l(t) d_{ll} \Delta t + b_l \Delta t \tag{5}$$

The second and third terms on the r.h.s. of Eq. (5) represent objects entering and leaving state $l$, while the last two terms account for the death and the birth of objects. By applying the definition $c_{ll} = -\sum_{u \neq l} c_{lu}$, rearranging the terms, and dividing by $\Delta t$ we obtain:

$$\frac{N_l(t + \Delta t) - N_l(t)}{\Delta t} \approx \sum_{u} N_u(t) c_{ul} - N_l(t) d_{ll} + b_l \tag{6}$$

Eq. (4) can be obtained by letting $\Delta t \to 0$ and using vector notation.

B. Methodology

We can summarize the methodology to define object-oriented like mean field models in three steps. First, we identify the different types of entities that compose the system and we abstract their behavior into meta-classes. Second, we define the Markov chains and the formal parameters of the meta-classes. Finally, we define a class for each type of entity. Each class is derived from a meta-class by assigning appropriate rates to the formal parameters.

III. LAC OPERON GENE REGULATORY MECHANISM AND GLYCOLYSIS

All organisms respond to changing conditions in their environment by controlling the expression of their genes. Depending on the circumstances, bacteria can regulate their metabolic pathways through the expression, and consequently the concentration, of specific enzymes. Escherichia coli is a typical example of this behavior, since it can alter its enzyme concentrations to take full advantage of sugar fluctuations in its environment. Thus the bacterium avoids synthesizing the enzymes of a pathway in the absence of the substrates, but it is ready to produce the enzymes if the substrate should appear. Escherichia coli can use two types of sugar: glucose and lactose. When glucose is abundant, the bacterium uses only glucose even if other types of sugar are present. However, when all the glucose is consumed, the bacterium is able to metabolize an alternative sugar, lactose.
The ability of this bacterium to switch from one metabolite to another was first described in [7]. The crucial element for switching between the two sugar metabolisms is the lac operon. This unit is a sequence of DNA formed by a promoter and an encoding region. The promoter region consists of critical elements that work in concert to direct the level of transcription of a given gene. The encoding region consists of three genes (LacZ, LacY and LacA), which are translated into three different proteins (also called enzymes) that catalyze chemical reactions. These proteins are β-galactosidase, lactose permease and galactoside O-acetyltransferase. Both the speed of transcription and the speed of translation depend on the concentrations of glucose and lactose. The regulation depends on the concentration of both Cyclic Adenosine Monophosphate (cAMP) and a protein called repressor.

In the absence of lactose the concentration of the repressor is high; the repressor has a high affinity for the lac promoter, which is therefore under negative regulation, and consequently the transcription of the lac operon genes is inhibited. In the presence of both lactose and glucose, since the bacterium has basal levels of permease and β-galactosidase, it can transport a small quantity of lactose inside the cell and convert it into allolactose. Allolactose binds the repressor with high affinity and diminishes the repressor's affinity for the promoter site, resulting in a small increase in the amount of lac metabolic enzymes produced. Thus, when both glucose and lactose are provided to Escherichia coli, the bacterium preferentially metabolizes glucose until it is depleted. If glucose is absent, there is a high level of cAMP, which binds a catabolite protein (CAP); this complex binds the DNA sequence in the promoter region, leading to positive regulation. This type of regulation increases the amount of lac metabolic enzymes produced 50-fold.
When glucose becomes available again, the cAMP level drops; cAMP therefore dissociates from CAP. Lactose, through β-galactosidase, is hydrolysed into allolactose and glucose. The glucose inside the bacterium is broken down into two molecules of pyruvate during glycolysis. Glycolysis is the initial step in any respiratory system, and it is a cascade of ten biochemical reactions that yields energy.

IV. MODELING BIOLOGICAL PATHWAYS

We aim to provide a model of the mechanism described in Section III, taking into account the interaction between the events occurring in the environment outside the bacterium and the biological pathways that take place inside it. We define an object-oriented like model that exploits the Mean Field Analysis presented in Section II, following the methodology presented in Section II-B.

A. First-Step: classes and meta-classes identification

We first identify the entities (classes) that characterize this phenomenon, and we look for similarities in order to abstract their behavior and to define an appropriate number of meta-classes (see Tab. I). Our model is composed of 36 entities that can be grouped into six meta-classes: Bacteria, Promoters, Proteins, Enzymes, Energy-rich Molecules and Metabolites. Bacteria has just one class, Escherichia coli. Promoters has two classes, lac and repressor. Metabolites includes Glucose and Lactose, and all the sugars participating in glycolysis. The meta-class Enzymes has one class for each enzyme that catalyzes a glycolysis reaction, together with the enzymes involved both in the recruitment from the environment and in the conversion of Glucose and Lactose. The meta-class Proteins has four classes, lacZ, lacY, lacA and lacI, which correspond to the genes encoding the proteins β-galactosidase, Permease, Galactoside O-acetyltransferase and Transcription Repressor, respectively. Finally, Energy-rich Molecules has only one class, Adenosine.
TABLE I. THE BIOLOGICAL MODEL.

CLASS                                       META-CLASS
Escherichia coli                            Bacteria
Glucose                                     Metabolites
Lactose                                     Metabolites
Allolactose                                 Metabolites
Allolactose-repressor                       Metabolites
Glucose 6-phosphate                         Metabolites
Fructose 6-phosphate                        Metabolites
Fructose 1,6-bisphosphate                   Metabolites
Glyceraldehyde 3-phosphate                  Metabolites
Dihydroxyacetone phosphate                  Metabolites
1,3-bisphosphoglycerate                     Metabolites
3-phosphoglycerate                          Metabolites
2-phosphoglycerate                          Metabolites
Phosphoenolpyruvate                         Metabolites
Pyruvate                                    Metabolites
β-Galactosidase                             Enzymes
Permease                                    Enzymes
Galactoside O-acetyltransferase             Enzymes
Transcription Repressor                     Enzymes
Hexokinase                                  Enzymes
Phosphoglucose Isomerase                    Enzymes
Phosphofructokinase                         Enzymes
Aldolase                                    Enzymes
Triosephosphate Isomerase                   Enzymes
Glyceraldehyde-3-Phosphate Dehydrogenase    Enzymes
Phosphoglycerate Kinase                     Enzymes
Phosphoglycerate Mutase                     Enzymes
Enolase                                     Enzymes
Pyruvate Kinase                             Enzymes
LacZ                                        Proteins
LacY                                        Proteins
LacA                                        Proteins
LacI                                        Proteins
Adenosine                                   Energy-rich Molecules
lac                                         Promoters
Repressor                                   Promoters

B. Second-Step: meta-classes specification

Afterwards we define the Markov chains (depicted in Fig. 1) corresponding to the meta-classes identified before. Note that we use arrows entering (exiting) a state to denote the birth (death) of an object.

Bacteria can be in states UsingGlucose, UsingLactose and SugarsEnded. The switch from UsingGlucose to UsingLactose occurs with rate $\lambda_{GlucoseEnded}$, which indicates the absence of Glucose. Conversely, the return to state UsingGlucose happens with rate $\lambda_{GlucoseAdded}$ and represents the injection of Glucose. Moreover, the switch from UsingLactose to SugarsEnded, due to the absence of both sugars in the environment, happens with rate $\lambda_{LactANDGlucEnded}$.

Promoters can be in states Basal, Activated and Repressed. The rates that determine the switching depend on the positive and negative regulation of the lac operon, as described in Section III.

Proteins can be in states Transcribed and Translated.
$\lambda_{Transcription}$ indicates the transcription rate of mRNA from the respective gene. The translation of the protein by decoding of mRNA occurs with rate $\lambda_{Translation}$, and $\lambda_{DegrProt}$ defines the rate at which the protein degrades.

Enzymes can be in states Deactivated and Activated. The production rate of each enzyme is determined by $\lambda_{ProdEnz}$. The enzyme is activated and deactivated with rates $\lambda_{Aff}$ and $\lambda_{Dis}$ respectively. These parameters reflect the capability of the enzyme to bind and to release the substrates. Finally, $\lambda_{DegrEnz}$ defines the rate at which the enzyme degrades.

Energy-rich Molecules can be in states Di, Mono and Cycle. These states describe the three main energetic levels that can be reached by adenosine. The degradation of the molecules is determined by rates $\lambda_{ConsDI}$, $\lambda_{ConsMONO}$ and $\lambda_{ConsCYCLE}$, while the production rate is $\lambda_{ProdDI}$. The switching between states Di and Mono is defined by rates $\lambda_{KinaseD}$ and $\lambda_{PhosphorylaseD}$, whereas the switching between states Mono and Cycle depends on rates $\lambda_{KinaseM}$ and $\lambda_{PhosphorylaseM}$.

Metabolites can be in states Substrate and Product. The production rate of each metabolite is determined by $\lambda_{ProdMet}$. The switching from Substrate to Product depends on the reaction kinetics defined by the rate $\lambda_{Transf}$. Finally, $\lambda_{DegrMet}$ defines the rate at which the metabolite degrades.

All the meta-classes presented above can be formally expressed using the tuple reported in Eq. (2). For example, the meta-class Metabolites ($mc^{(4)}$) is defined by:

$$mc^{(4)} = \left( \text{'Metabolites'},\ 2,\ \{\text{'Substrate'}, \text{'Product'}\},\ \{\lambda_{ProdMet}, \lambda_{Transf}, \lambda_{DegrMet}\},\ \left| \begin{matrix} -\lambda_{Transf} & \lambda_{Transf} \\ 0 & 0 \end{matrix} \right|,\ |\lambda_{ProdMet}, 0|,\ \left| \begin{matrix} 0 & 0 \\ 0 & \lambda_{DegrMet} \end{matrix} \right| \right)$$

The first element of the tuple, $c^{(4)}$ = 'Metabolites', is the name of the meta-class, and $n^{(4)} = 2$ indicates the number of states, whose names are defined in $L^{(4)}$ = {'Substrate', 'Product'}.
The term $\Lambda^{(4)} = \{\lambda_{ProdMet}, \lambda_{Transf}, \lambda_{DegrMet}\}$ lists the formal parameters used in the specification of $\mathbf{C}^{(4)} = \left| \begin{matrix} -\lambda_{Transf} & \lambda_{Transf} \\ 0 & 0 \end{matrix} \right|$, $\mathbf{b}^{(4)} = |\lambda_{ProdMet}, 0|$ and $\mathbf{D}^{(4)} = \left| \begin{matrix} 0 & 0 \\ 0 & \lambda_{DegrMet} \end{matrix} \right|$.

Fig. 1. Markov Chains representing object meta-classes

C. Third-Step: classes and parameters specification

The crucial phase of this work is the definition of the rates that determine the relations and the interactions among all class objects of the model. The formal rates (depicted in Fig. 1) must be instantiated for each class. In the following we focus on the formalization of class Glucose 6-phosphate (Glucose6P) and the definition of the actual rates that determine the behavior of this class. We denote each class state with the notation Class.State, whereas #(Class.State) represents the number of objects of Class in State, and #(Class) expresses the total number of objects of Class. The class Glucose6P ($oc^{[6]}$) can be formally expressed using the tuple reported in (3):

$$oc^{[6]} = \left( \text{'Glucose6P'},\ \text{'Metabolites'},\ \{\lambda_{ProdGlucose6P}, \lambda_{TransfGlucose6P}, \lambda_{DegrGlucose6P}\},\ 0,\ |1, 0| \right)$$

where $o^{[6]}$ = 'Glucose6P' is the name of the class and $c^{[6]}$ = 'Metabolites' is the name of its meta-class. The term $\Gamma^{[6]} = \{\lambda_{ProdGlucose6P}, \lambda_{TransfGlucose6P}, \lambda_{DegrGlucose6P}\}$ lists the actual parameters assigned to the formal parameters indicated by $\Lambda^{(4)}$ (coming from the meta-class Metabolites, that is $mc^{(4)}$). $N^{[6]} = 0$ is the initial number of Glucose6P objects and $\pi_0^{[6]} = |1, 0|$ is the initial state probability vector (in this case all objects start from state Substrate).

To define the actual rates $\lambda_{ProdGlucose6P}$, $\lambda_{TransfGlucose6P}$ and $\lambda_{DegrGlucose6P}$ we consider that Glucose6P is involved in the glycolysis cascade. To represent this series of biochemical reactions, in which the products of one reaction are consumed in the next, we overlap metabolite states. In particular, the Product state of a metabolite corresponds to the Substrate state of the metabolite involved in the subsequent reaction. In Fig.
2 we report the conversion of Glucose into Glucose6P; the overlapped states are contained in the dashed box.

Fig. 2. Conversion from Glucose to Glucose6P. States in the dashed box are overlapped.

We are able to calculate the rates $\lambda_{TransfGlucose}$, $\lambda_{TransfGlucose6P}$ and $\lambda_{TransfFructose6P}$ by exploiting the Michaelis-Menten kinetics (see footnote 1):

$$\lambda_{TransfGlucose} = \frac{k_2 \cdot \#(\mathit{Glucose.Substrate}) \cdot \#(\mathit{Hexo})}{k_{MM} + \#(\mathit{Glucose.Substrate})} \tag{7}$$

$$\lambda_{TransfGlucose6P} = \frac{k_2 \cdot \#(\mathit{Glucose6P.Substrate}) \cdot \#(\mathit{PhGlIs})}{k_{MM} + \#(\mathit{Glucose6P.Substrate})} \tag{8}$$

$$\lambda_{TransfFructose6P} = \frac{k_2 \cdot \#(\mathit{Fructose6P.Substrate}) \cdot \#(\mathit{PFrK})}{k_{MM} + \#(\mathit{Fructose6P.Substrate})} \tag{9}$$

where $k_2$ and $k_{MM}$ are respectively the kinetic parameter of the metabolite production and the Michaelis-Menten constant. Fructose6P is the metabolite that follows Glucose6P in glycolysis.

Footnote 1: We point out that the kinetics we adopted is suited for cases in which the number of substrates increases.

Taking advantage of the equivalence between overlapped states, we can define the following identities: $\lambda_{ProdGlucose6P} = \lambda_{TransfGlucose}$ and $\lambda_{DegrGlucose6P} = \lambda_{TransfFructose6P}$, which allow us to derive all the actual parameters of class Glucose6P.

For the sake of brevity we present only the derivation of the rates of class Glucose6P. Parameter derivation is the most crucial step in the development of object-oriented like mean field models. For each class we had to define appropriate expressions to capture the specific biological behavior, and these expressions are in general significantly different from Eq. (8). This requires an accurate study of the interdependency among related entities. For instance, in the case of Glucose6P the interdependency with classes Hexokinase (Hexo), PhosphoGlucoseIsomerase (PhGlIs) and PhosphoFructoKinase (PFrK) is expressed in terms of #(Hexo), #(PhGlIs) and #(PFrK).

V. RESULTS

In this section we show some of the results obtained from the analysis of the model with a set of parameters derived from the literature. All results have been computed by solving Eq. (4) with standard numerical techniques. Using Euler's method with a fixed step size we were able to obtain stable solutions in a few minutes on a standard PC.

We assume that the system starts with initial concentrations of glucose and lactose greater than 0. Moreover, molecules of glucose arrive after a given amount of time. Due to the initial presence of glucose, Escherichia coli begins to consume this metabolite until it is depleted. Then the bacterium starts consuming lactose until new glucose is injected into the environment. When the cell consumes glucose there is a basal level of expression of the lac operon. When instead the bacterium uses lactose, the promoter is activated. Fig. 3 plots the status of promoter lac. Note that there is a gap between the switching instants of the two classes (Escherichia coli and lac). Indeed, when glucose ends the enzyme activity appears very rapidly. On the other hand, when lactose finishes, the synthesis of enzymes stops as rapidly as it had originally started. However, the enzymes are more stable than the mRNA, so their activity remains at the induced level for longer, producing the gap.

Fig. 3. Time evolution of promoter lac (plot) and Escherichia coli status (vertical dashed lines)

Fig. 4. Time evolution of the concentration of regulators of the lac status (plots) and Escherichia coli status (vertical dashed lines)

The entities that regulate the lac status are cAMP (Adenosine.Cycle) and the allolactose-repressor complex (Allolactose-repressor.Product). Their evolution is reported in Fig. 4. Finally, Fig. 5 reports the evolution of some of the metabolites involved in the glycolysis cascade.
This figure points out that metabolites are produced sequentially according to the order of glycolysis: glucose 6-phosphate, fructose 6-phosphate, 1,3-bisphosphoglycerate, 3-phosphoglycerate and 2-phosphoglycerate.

Fig. 5. Time evolution of the concentration of some metabolites involved in the glycolysis cascade

VI. CONCLUSION

In this work we presented a methodology, based on an object-oriented like analysis, that describes the lac operon gene regulatory mechanism and the glycolysis pathway. The object-oriented like approach presented in this paper simplifies the application of mean field analysis to the modeling of systems characterized by a large number of interacting objects. The proposed formalism allowed us to define a high-level abstraction of the biopathway, providing experimenters with a direct view of the entities that form the phenomenon.

ACKNOWLEDGMENT

Part of this work was supported by grants from the Italian Association for Cancer Research and the Regione Piemonte. Cordero is a recipient of a research fellowship supported by Regione Piemonte and Università di Torino.

REFERENCES

[1] K. Atif and B. Plateau, "Stochastic automata network for modeling parallel systems," IEEE Transactions on Software Engineering, vol. 17, no. 10, 1991.
[2] A. Bobbio, M. Gribaudo, and M. Telek, "Analysis of large scale interacting systems by mean field method," in 5th International Conference on Quantitative Evaluation of Systems - QEST2008, St. Malo, 2008, pp. 215–224.
[3] J.-Y. Le Boudec, D. McDonald, and J. Mundinger, "A generic mean field convergence result for systems of interacting objects," in 4th International Conference on Quantitative Evaluation of Systems - QEST2007, Edinburgh, 2007, pp. 3–18.
[4] H. Busch, W. Sandmann, and V. Wolf, "A numerical aggregation algorithm for the enzyme-catalyzed substrate conversion," in The 4th Conference on Computational Methods in Systems Biology, 2006, pp. 298–311.
[5] F. Ciocchetta and J. Hillston, "Bio-PEPA: a framework for the modelling and analysis of biological systems," Theoretical Computer Science, 2008.
[6] R. Hofestädt, "A Petri net application of metabolic processes," Journal of System Analysis, Modeling and Simulation, vol. 16, pp. 113–122, 1994.
[7] F. Jacob and J. Monod, "On the regulation of gene activity," Cold Spring Harb. Symp. Quant. Biol., vol. 26, pp. 193–211, 1961.
[8] H. Matsuno, S. Fujita, A. Doi, M. Nagasaki, and S. Miyano, "Towards pathway modelling and simulation," in Proceedings of the ICATPN 2003, ser. LNCS 2679. Eindhoven, Netherlands: Springer, 2003, pp. 3–22.
[9] V. Reddy, M. Mavrovouniotis, and M. Liebman, "Qualitative analysis of biochemical reaction systems," Comput. Biol. Med., vol. 26, pp. 9–24, 1996.
[10] K. Voss, M. Heiner, and I. Koch, "Steady state analysis of metabolic pathways using Petri nets," In Silico Biology, vol. 3, pp. 46–61, 2003.