A FRAMEWORK FOR MODELING IN REGULATORY NETWORKS

Mohsen Ben Hassine (1), Radhi Mhiri (2), Lamine Mili (3)
(1) ISET de Nabeul, Computer Engineering, Tunisia
(2) Faculté des Sciences de Tunis, Electrical Engineering, Tunisia
(3) Virginia Tech, Electrical and Computer Engineering, USA

Abstract

The study of regulatory networks in systems biology, and of their ensuing dynamics, is critical for understanding the huge volume of genomic data currently being collected. Advances in nanotechnology enable scientists, for the first time, to trace biological processes on the nanoscale by tracking the movements of molecules. Projecting the real system onto graphical and mathematical tools enables biologists to understand it better and even to predict its behaviour. Nevertheless, the lack of a general framework that leads the biologist to efficient modelling remains a great challenge. In this paper we try to clarify several issues concerning the modelling of regulatory networks, using a straightforward method.

Keywords: auto-regulation, synthetic circuits, delay time, sensitivity analysis, homeostasis, noise, data mining.

1. MODELLING IN SYSTEMS BIOLOGY

Most kinds of systems that are likely to be of interest involve entities (proteins, metabolites, signaling molecules, etc.) that can be cast as “nodes” interacting with each other via “edges” representing reactions that may be catalyzed by other substances such as enzymes. These will also typically involve feedback loops in which some of the nodes interact directly with the edges. We refer to the basic constitution of this kind of representation as a structural model. The classical modelling strategy in biology (and in engineering), the ordinary differential equation (ODE) approach, contains three initial phases and starts with this kind of structural model, in which the reactions and effectors are known.
The next level refers to the kinetic rate equations describing the local properties of each edge, and the third level involves the parameterization of the model, that is, providing values for the parameters. Armed with such knowledge, any number of software packages can predict the time evolution of the variables (the concentrations) until they reach a steady state, if one exists. This is done (internally) by recasting the system as a series of coupled ordinary differential equations, which are then solved numerically. We refer to this type of operation as forward modelling; provided that the structural model, equations, and values of the parameters are known, it is comparatively easy to produce such models and compare them with experimental reality.

In many cases, however, the experimental data that are most readily available do not include the parameters at all, and are simply measurements of the (time-dependent) variables, of which fluxes and concentrations are the most common. Comparison of the data with the forward model is then much more difficult, as we have to solve an inverse modelling, reverse engineering or system identification problem. Direct solution of such problems is essentially impossible, as they are normally hugely underdetermined and do not have an analytical solution. The normal approach is thus an iterative one in which a candidate set of parameters is proposed, the system is run in the forward direction, and, on the basis of some metric of closeness to the desired output, a new set of parameters is tested. Eventually (assuming that the structural model and the equations are adequate), a satisfactory set of parameters, and hence solutions, will be found. These methods are much more computer-intensive than those required for simple forward modelling, as potentially many thousands or even millions of candidate models must be tested.
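The forward/inverse loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular software package: the one-variable model dx/dt = k_prod - k_deg * x, the explicit Euler integrator, the sum-of-squares metric, the random-search proposal rule and all function names are hypothetical choices made for the example.

```python
import random


def simulate(k_prod, k_deg, x0=0.0, dt=0.02, steps=500):
    """Forward model: integrate dx/dt = k_prod - k_deg * x with explicit Euler steps."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x += dt * (k_prod - k_deg * x)
        trajectory.append(x)
    return trajectory


def sum_squared_error(model, data):
    """Metric of closeness between a simulated and a measured time course."""
    return sum((m - d) ** 2 for m, d in zip(model, data))


def fit_by_random_search(data, start=(1.0, 1.0), n_iter=500, seed=0):
    """Inverse modelling: propose parameters, run forward, keep improvements."""
    rng = random.Random(seed)
    best = start
    best_err = sum_squared_error(simulate(*best), data)
    for _ in range(n_iter):
        # Perturb each parameter by up to +/-20 % and keep it positive.
        candidate = tuple(max(1e-6, p * (1.0 + rng.uniform(-0.2, 0.2))) for p in best)
        err = sum_squared_error(simulate(*candidate), data)
        if err < best_err:
            best, best_err = candidate, err
    return best, best_err


if __name__ == "__main__":
    # Synthetic "experiment" generated with known parameters (assumed values).
    measured = simulate(2.0, 0.5)
    params, err = fit_by_random_search(measured)
    print("estimated (k_prod, k_deg):", params, "error:", err)
```

Every candidate costs one full forward simulation, which is why the inverse problem is so much more expensive than a single forward run; realistic networks would replace the random search with a proper optimiser.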
We note, however, that there are a number of other modelling strategies and issues that may lead one to choose a different type of model from that described. First, the ODE model assumes that compartments are well stirred and that the concentrations of the participants are sufficiently great to permit fluctuations to be ignored. If this is not the case, then stochastic simulations (SS) are required. If flow of substances between many contiguous compartments is involved, and knowledge of the spatial dynamics is required (as is common in computational fluid dynamics), partial differential equations (PDEs) are necessary. SS and PDE models are again much more computationally intensive, although in the latter case the designation of a smaller subset of representative compartments may be effective (Mendes and Kell, 2001).

Data mining is the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques. This enables the modeller to construct a network in a reduced and optimized way, thus offering a “middle-out” strategy that retains insight from both the bottom-up and the top-down approaches (fig.1).

Figure.1: Middle-out modeling strategy

2. MATHEMATICAL MODELLING AND SYSTEM DYNAMICS THEORY

In order to turn the static map (the biologist's graphical representation of the system) into a dynamic model that can provide insight into the temporal evolution of biochemical reaction networks, a set of differential equations is needed. The general rule for expressing the evolution of a specific biochemical species x is:

dx/dt = rate of production - rate of decay ± rate of transport.
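As a small illustration of this rule, the sketch below applies it to a hypothetical species present in two compartments, where the transport term leaves one balance with a minus sign and enters the other with a plus sign. The compartment names, the first-order rate laws and all parameter values are assumptions made for the example only.

```python
def step(state, k_prod, k_deg, k_trans, dt):
    """Advance (cytoplasm, nucleus) amounts by one explicit Euler step."""
    cyto, nuc = state
    # Cytoplasm: production - decay - transport out of the compartment.
    d_cyto = k_prod - k_deg * cyto - k_trans * cyto
    # Nucleus: no local production, - decay + transport into the compartment.
    d_nuc = -k_deg * nuc + k_trans * cyto
    return (cyto + dt * d_cyto, nuc + dt * d_nuc)


def integrate(k_prod=1.0, k_deg=0.2, k_trans=0.3, dt=0.01, steps=5000):
    """Integrate from an empty cell and return the final (cytoplasm, nucleus) state."""
    state = (0.0, 0.0)
    for _ in range(steps):
        state = step(state, k_prod, k_deg, k_trans, dt)
    return state


if __name__ == "__main__":
    cyto, nuc = integrate()
    print("cytoplasm:", cyto, "nucleus:", nuc)
```

With these assumed values the balances give the steady state cyto = k_prod / (k_deg + k_trans) = 2 and nuc = k_trans * cyto / k_deg = 3, which the integration approaches.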
For each interaction between species we can assign a specific function. Consider, for example, the Goldbeter model of the mitotic oscillator (Goldbeter 1991), shown in fig.2.

Figure.2: The Mitotic Oscillator

The rate of cyclin production is a linear process (vi); the decay process is composed of two parts: a natural exponential decay (death) and an induced decay caused by a protease-cyclin complex (X, C). We can convert our ODEs into a set of block diagrams and nodes as used in the engineering sciences (fig.3); for example, the cyclin equation can be modelled as in fig.4.

Figure.3: Examples of block diagrams and nodes used in engineering sciences

Figure.4: The Cyclin circuit [block diagram; labels: vi, kd, vd, X, C, C + kd]

One important feature of a biochemical network is its robustness, which evaluates the sensitivity of the system, or its ability to preserve its homeostasis (equilibrium). As in engineering, we can stimulate the system by varying the input signal (parameter) and observe the effect on its output; we can also test the validity and efficiency of a feedback by opening the loop involved (D. Angeli et al. 2003).

3. MATHEMATICAL FUNCTIONS AND CELL PHENOTYPES

Regulatory networks are governed by the same mathematical functions that modellers usually use to express positive vs negative feedback, activation vs repression and inhibition, the fraction of free operator, etc. Table 1 gives the most useful functions for regulation.

Function | Use | Expression
Yagil rules | Inducible enzyme (e.g. lactose) | F(O) = (1 + k1*E^p) / (k + k1*E^p)
Yagil rules | Repressible enzyme (e.g. trp) | F(O) = (1 + k1*E^p) / (1 + k*k1*E^p)
Michaelis-Menten | Enzyme-catalysed reaction | F(S) = Vmax*S / (S + Km)
Hill function | Activation | F(X) = Vmax*TF^n / (Km + TF^n)
Hill function | Repression | F(X) = Vmax / (Km + TF^n)
MM with ... | Competitive inhibition | F(S) = Vmax*S / (S + KS(1 + Ki))
MM without... | Competitive inhibition | F(S) = Vmax*S / ((S + KS)*(1 + I/Ki))
Hill function | Multiple TF activation (OR gate) | F(S) = (TF1/K1)^n / (1 + (TF1/K1)^n + (TF2/K2)^n)
Hill function | Multiple TF repression (OR gate) | F(S) = 1 / (1 + (TF1/K1)^n + (TF2/K2)^n)
Gaussian function | Internal noise | F(X) = N(μ, σ^2)
Delay function | Time delay (transcription, translation initiation) | X(t) = F(Y(t - τ))

Table 1: Useful functions for regulation

After long experience with the modelling of regulatory, transduction and metabolic networks, a number of rules about cell phenotypes (apoptosis, proliferation, differentiation, stress response, mitosis, bifurcation, etc.) can now be deduced.

a. Negative feedback loops

Negative feedback loops, common in biochemical pathways, are known to provide stability and to withstand considerable variations and random perturbations of biochemical parameters.

b. Positive feedback loops

A positive-feedback network forms the basis for cellular memory, allowing cells of identical genotype to achieve different phenotypes depending on the external signals received. The behaviour of the system therefore depends on its history, which can lead to hysteresis.

c. Delay time

A generic feature of all intracellular biochemical processes is the time required to complete the whole sequence of reactions that yields any observable quantity in biological functions. Theoretically, time delay is known to be a source of instability, and has been held responsible for oscillations or transient dynamics in several biological functions. For example, delay in repression has been shown, theoretically and experimentally, to be the primary factor inducing increased inter-cellular heterogeneity in gene expression in a population.

d. Noise

Genetically identical cells exposed to the same environmental conditions can show significant variation in molecular content and marked differences in phenotypic characteristics. This variability is linked to stochasticity in gene expression, which is generally viewed as having detrimental effects on cellular function, with potential implications for disease. However, stochasticity in gene expression can also be advantageous. It can provide the flexibility needed by cells to adapt to fluctuating environments or respond to sudden stresses, and a mechanism by which population heterogeneity can be established during cellular differentiation and development. Negative feedback reduces fluctuations by increasing expression when protein numbers are low and decreasing expression when protein numbers are high; it is more likely to evolve as an attenuator of stochasticity in systems dominated by extrinsic fluctuations (Paulsson, 2004; Hooshangi and Weiss, 2006). Alternatively, intrinsic fluctuations could be reduced by an additional positive feedback loop that maintains high protein copy numbers despite the negative feedback needed to attenuate extrinsic fluctuations.

4. SENSITIVITY, SYNTHETIC CIRCUITS AND MEASUREMENT TECHNIQUES

Sensitivity analysis is a cornerstone of the analysis of complex systems. It treats the effect of changing some parameter P of the model on the response of some system variables. The goals of this analysis are to:
- Determine the factors that may contribute to output variability and so need the most consideration.
- Find parameters that can be eliminated in order to simplify the model without grossly altering its behaviour.
- Find the optimal region for use in a calibration study.
- Check which groups of factors interact with each other.
- Evaluate the model, thus creating an output distribution or response.
- Assess the influence of each variable or group of variables using correlation/regression, Bayesian inference, machine learning, or other methods (data mining).

In order to break down the complexity of regulatory networks, the forward engineering of gene circuits and the associated experimental techniques (mutant cells, such as the cdc25Δ and wee1 mutations in yeast; B. Novak 2001) enable modellers to build a desired network with specific properties predicted from mathematical models, using knowledge from biochemistry, molecular biology, and genetics. Consequently we can engineer new cellular behaviour and improve our understanding of naturally occurring networks (Bratsun et al. 2005). Fig.5 presents some samples.

Figure.5: Synthetic genetic networks

Massive amounts of data are being generated by genomics and proteomics projects, thanks to sophisticated genetic engineering tools (gene knock-outs and insertions, PCR) and measurement technologies (fluorescent proteins, microarrays, blotting, FRET). The polymerase chain reaction (PCR) is a technique that amplifies DNA (typically a gene or part of a gene). By creating multiple copies of a piece of DNA that would otherwise be present in too small a quantity to detect, PCR enables the use of measurement techniques.

Suppose that we wish to know at what rate a certain gene X is being transcribed under a particular set of conditions in which the cell finds itself. Fluorescent proteins may be used for that purpose. For instance, green fluorescent protein (GFP) is a protein that fluoresces green when exposed to UV light. It is produced by the jellyfish Aequorea victoria, and its gene has been isolated so that it can be used as a reporter gene. The GFP gene is inserted (cloned) into the chromosome, adjacent to or very close to the location of gene X, so that both are controlled by the same promoter region. Thus, gene X and GFP are transcribed simultaneously and then translated (fig.6), and so by measuring the intensity of the GFP light emitted one can estimate how much of X is being expressed.

Fluorescent protein methods are particularly useful when combined with flow cytometry. Flow cytometry devices can be used to sort individual cells into different groups on the basis of characteristics such as cell size, shape, or amount of measured fluorescence, at rates of up to thousands of cells per second. In this manner it is possible, for instance, to count how many cells in a population express a particular gene under a specific set of conditions.

Figure.6: Fluorescent protein method

5. NOISE IN GENETIC NETWORKS

Biochemical networks are stochastic: fluctuations in numbers of molecules are generated intrinsically by the dynamics of the network and extrinsically by interactions of the network (fig.7) with other stochastic systems (Elowitz et al, 2002; Swain et al, 2002). Stochastic effects in protein numbers can drive developmental decisions (Arkin et al, 1998; Maamar et al, 2007; Nachman et al, 2007; Suel et al, 2007), be inherited for several generations (Rosenfeld et al, 2005; Kaufmann et al, 2007), and have perhaps influenced the organization of the genome (Swain, 2004; Becskei et al, 2005).

Intrinsic fluctuations are generated by intermolecular collisions affecting the timing of individual reactions. Their strength is increased by low copy numbers. The source of extrinsic fluctuations, however, is mostly unknown (Kaern et al, 2005), although cell cycle effects (Rosenfeld et al, 2005; Volfson et al, 2006) and upstream networks (Volfson et al, 2006) contribute. Yet extrinsic fluctuations dominate cellular variation in both prokaryotes (Elowitz et al, 2002) and eukaryotes (Raser and O’Shea, 2004). They are colored, having a lifetime that is not negligible but comparable to the cell cycle (Rosenfeld et al, 2005), and they are nonspecific, potentially affecting equally many molecules in the system (Pedraza and van Oudenaarden, 2005).
They are thus difficult to model and their effects hard to predict (Austin et al, 2006; Cox et al, 2006; Geva-Zatorsky et al, 2006; Scott et al, 2006; Sigal et al, 2006; Tanase-Nicola et al, 2006; Tsimring et al, 2006; Volfson et al, 2006; Maithreye and Sinha, 2007).

Intrinsic and extrinsic stochasticity can be measured by creating a copy of the network of interest in the same cellular environment as the original network (Elowitz et al, 2002). We can then define intrinsic and extrinsic variables, whose fluctuations generate intrinsic and extrinsic stochasticity (Swain et al, 2002). Intrinsic variables typically specify the copy numbers of the molecular components of the network. Their values differ for each copy of the network. Extrinsic variables often describe molecules that affect each copy of the network equally. Their values are therefore the same for each copy.

Figure.7: Noise in regulatory network

Noise strength is usually reported in terms of the standard deviation $\sigma_q$ of a stochastic variable $q$. The Fano factor, defined as $F = \sigma_q^2 / \langle q \rangle$, is related to the standard deviation by $\sigma_q / \langle q \rangle = (F / \langle q \rangle)^{1/2}$; because $q$ measures molecule number, $F$ is a dimensionless quantity. When number fluctuations are due to a Poisson process, we have $F = 1$. The Fano factor of an arbitrary stochastic system therefore reveals deviations from Poissonian behaviour. It is a sensitive measure of noise and the unit in which we report our results.
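The two noise measures defined above are easy to compute from a list of molecule counts. The following Python sketch is illustrative only: the function names are hypothetical, the population variance is used for $\sigma_q^2$, and the Poisson sampler (Knuth's multiplication method) is included just to check that Poissonian counts give $F$ close to 1.

```python
import math
import random
import statistics


def fano_factor(counts):
    """F = variance / mean of the molecule counts (population variance)."""
    mean = statistics.fmean(counts)
    return statistics.pvariance(counts, mu=mean) / mean


def relative_noise(counts):
    """sigma_q / <q>, computed through the Fano factor as (F / <q>) ** 0.5."""
    mean = statistics.fmean(counts)
    return math.sqrt(fano_factor(counts) / mean)


def poisson_sample(lam, rng):
    """Draw one Poisson-distributed integer with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


if __name__ == "__main__":
    rng = random.Random(1)
    # Assumed mean copy number of 10 molecules per cell, 5000 simulated cells.
    counts = [poisson_sample(10.0, rng) for _ in range(5000)]
    print("Fano factor:", fano_factor(counts))
    print("relative noise:", relative_noise(counts))
```

A Fano factor clearly above 1 in measured counts would indicate super-Poissonian noise, for example from bursty expression, while a value below 1 would indicate noise suppression.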
If we consider a single gene, we can write the ODEs as follows (fig.8):

Figure.8: Single Gene Expression (KP: translational efficiency; KR: transcriptional efficiency)

The average number of proteins synthesized per mRNA transcript is $N = K_P / \gamma_R$, the mean number of proteins is $K_R N / \gamma_P$, and the Fano factor is approximately $N + 1$ (with $\gamma_R$ and $\gamma_P$ the mRNA and protein degradation rates). If we take into account the possibility of mutual activation and repression of the promoter, and tune the transcriptional and translational efficiencies, we can obtain different noise behaviours (fig.9).

Figure.9: Slow promoter transitions and transcriptional bursting (M. Kærn 2005)

Intrinsic and extrinsic noise can be measured and distinguished with two reporter genes (cfp, yfp) controlled by identical regulatory sequences. In the absence of intrinsic noise, the two fluorescent proteins fluctuate in a correlated fashion over time in a single cell. Thus, in a population, each cell will have the same amount of both proteins, although that amount will differ from cell to cell because of extrinsic noise. Expression of the two genes may become uncorrelated in individual cells because of intrinsic noise, giving rise to a population in which some cells express more of one fluorescent protein than the other. The scatter plot in fig.10 illustrates the fluorescence technique using two strains of E. coli: one quiet (M22) and one noisy (D22). Each point represents the mean fluorescence intensities from one cell. The spread of points perpendicular to the diagonal line on which CFP and YFP intensities are equal corresponds to intrinsic noise, whereas the spread parallel to this line is increased by extrinsic noise. The total noise generated is defined by $\eta_{tot}^2 = \eta_{int}^2 + \eta_{ext}^2$.

Figure.10: Experimental quantification of noise (Elowitz et al. 2002)

Finally, we can state some important results concerning the study of noise in genetic networks:
- Extrinsic noise is not gene-specific, but intrinsic noise is.
- Extrinsic noise is predominant over intrinsic noise.
- Noise does not depend on the regulatory pathway, nor on the absolute rate of expression.
- Noise depends on the rate of a slow upstream promoter transition, such as chromatin remodelling.
- Downstream effects of noise can have profound phenotypic consequences, drastically affecting the stability of gene expression.
- Noise (and consequently cell-to-cell variability) is amplified at transitions in long cascades.
- Autoregulation in gene circuits (in particular negative feedback loops) provides stability.
- Noise can be controlled by kinetic parameters.

6. CONCLUSION

Based on many recent research articles, this paper presents an overview of the engineering methods used for the modelling of regulatory networks. With this framework, it is easy for the modeller to abstract a network, analyse the dependencies between its parts, study the sensitivity of the system (the effect of tuning key parameters, and of noise) and even predict its behaviour (synthetic circuits, mutants, etc.). For more details, many paradigms are available (2, 3, 4, 5, 6, 8, 12).

References

1. Armen R. Kherlopian, Ting Song, Qi Duan. A review of imaging techniques for systems biology. BMC Systems Biology 2008, 2:74.
2. A. Goldbeter. A minimal cascade model for the mitotic oscillator involving cyclin and cdc2 kinase. Proc. Natl. Acad. Sci. USA, Vol. 88, pp. 9107-9111, October 1991.
3. Somdatta Sinha. A Simple Approach to Study Designs in Complex Biochemical Pathways. 74th Annual Meeting, New Delhi, Oct. 31 – Nov. 2, 2008.
4. David Angeli, James E. Ferrell, Jr., and Eduardo D. Sontag. Detection of multistability, bifurcations, and hysteresis in a large class of biological positive-feedback systems. PNAS, 1822–1827, February 17, 2004.
5. Michael C. Mackey, Moisés Santillán, Necmettin Yildirim. Modeling operon dynamics: the tryptophan and lactose operons as paradigms. C. R. Biologies 327 (2004) 211–224.
6. Vahid Shahrezaei, Julien F. Ollivier and Peter S. Swain. Colored extrinsic fluctuations and stochastic gene expression. Molecular Systems Biology 4; Article number 196; doi:10.1038/msb.2008.31.
7. Mads Kærn, William J. Blake, and J. J. Collins. The Engineering of Gene Regulatory Networks. Annu. Rev. Biomed. Eng. 2003. 5:179–206.
8. Mads Kærn, Timothy C. Elston, William J. Blake and James J. Collins. Stochasticity in Gene Expression: from theories to phenotypes. Nature Reviews Genetics, volume 6, 451-463.
9. Ozbudak, Thattai, Kurtser, Grossman, van Oudenaarden. Regulation of noise in the expression of a single gene. Nat Genet 31: 69-73, 2002.
10. Rosenfeld, Young, Alon, Swain, Elowitz. Gene regulation at the single-cell level. Science 307: 1962-1965, 2005.
11. Pedraza, van Oudenaarden. Noise propagation in gene networks. Science 307: 1965-69, 2005.
12. Dmitri Bratsun, Dmitri Volfson, Lev S. Tsimring, and Jeff Hasty. Delay-induced stochastic oscillations in gene regulation. PNAS, vol. 102, no. 4, 14593–14598, October 2005.
13. David Sprinzak & Michael B. Elowitz. Reconstruction of genetic circuits. Nature, Vol 438, 24 November 2005.
14. Michael B. Elowitz, Arnold J. Levine, Eric D. Siggia, Peter S. Swain. Stochastic Gene Expression in a Single Cell. Science, Vol 297, 16 August 2002.
15. Nicholas J. Guido, Xiao Wang, David Adalsteinsson. A bottom-up approach to gene regulation. Nature, Vol 439, 16 February 2006.
16. Jeff Hasty, David McMillen & J. J. Collins. Engineered gene circuits. Nature, Vol 420, 14 November 2002.
17. Timothy S. Gardner, Charles R. Cantor & James J. Collins. Construction of a genetic toggle switch in Escherichia coli. Nature, Vol 403, 20 January 2000.
18. Pratap R. Patnaik. External, extrinsic and intrinsic noise in cellular systems: analogies and implications for protein synthesis. Biotechnology and Molecular Biology Review, Vol. 1, pp. 121-127, December 2006.