Postdoctoral Research Work
Quantitative Models for Cellular Signaling Pathways
Mano Ram Maurya
Cells respond to their environment through cellular signaling pathways, which process changes in the
environment and transmit their effects to the nucleus. Proper functioning of these pathways is required for
various cellular- and organ-level activities such as homeostasis, metabolism, growth, learning, and cell
death (apoptosis). Malfunctions in these pathways often result in a variety of diseases, ranging from less
lethal ones such as cholera to the deadliest ones such as cancer. Hence, a deep understanding of these
pathways is essential for designing effective drugs for various diseases. This deeper understanding involves
an array of experimental and analysis tools. Experimental tools include genomics for sequence analysis,
microarrays for gene expression, X-ray crystallography for determining protein structure, identifying
binding sites, and characterizing complex molecular interactions, and mass spectrometry and fluorescence
microscopy for accurate measurement of species concentrations. Mathematical analyses include analysis of
microarray data to identify differentially regulated genes and the effects of gene knockouts, statistical
analysis, ab initio methods for protein structure and function prediction, and qualitative and quantitative
modeling. My research focuses on quantitative modeling.
Signaling pathways are composed of several well-defined modules with complex interactions among
them. The modules themselves are immensely complex and exhibit nonlinear behavior. Although detailed
quantitative models are only slowly emerging for most signaling modules, some modules and pathways
are well understood. This rich understanding can be exploited to develop compact models of these
well-studied subsystems in a given context (e.g., to predict certain observations), so that computational
resources can be directed toward a detailed understanding of the modules (or parts of the pathway) that
are not yet well understood. Hence, there is an opportunity for coarse-graining the models of
well-understood subsystems. Every model involves parameters such as rate constants, diffusion
coefficients, and structural parameters. In most models discussed in the literature, the parameter values
are chosen based on information about comparable systems in a general context. These values lie in the
physiological range of interest, and the models can make qualitatively accurate predictions, but their
ability to make quantitative predictions for a particular system may be limited. Hence, parameter
estimation using system-specific experimental data becomes an essential task in modeling.
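As a minimal illustration of what system-specific parameter estimation involves, the sketch below fits the rate constant k and initial concentration A0 of a hypothetical first-order reaction A → B to concentration measurements, using linear least squares on log A(t) = log A0 - k t. The data are synthetic (k = 0.3 s^-1 and A0 = 2.0 µM are chosen purely for illustration); realistic signaling models are nonlinear and require numerical optimizers such as the GA-based estimators discussed in this statement.

```python
import numpy as np

def estimate_first_order_rate(t, conc):
    """Estimate k and A0 for A(t) = A0 * exp(-k * t) by least squares on log A(t)."""
    t = np.asarray(t, dtype=float)
    log_c = np.log(np.asarray(conc, dtype=float))
    # log A = log A0 - k t is linear in t: a degree-1 fit gives slope = -k.
    slope, intercept = np.polyfit(t, log_c, 1)
    return -slope, np.exp(intercept)

# Synthetic "measurements" for a hypothetical system (k = 0.3 1/s, A0 = 2.0 uM).
t = np.linspace(0.0, 10.0, 21)
data = 2.0 * np.exp(-0.3 * t)
k_hat, a0_hat = estimate_first_order_rate(t, data)
print(f"k = {k_hat:.3f} 1/s, A0 = {a0_hat:.3f} uM")  # prints k = 0.300 1/s, A0 = 2.000 uM
```

A literature value taken from a comparable system would reproduce the qualitative decay but not necessarily the measured time scale; fitting to the specific data is what pins k down.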
Based on the above philosophy for gaining a deeper understanding of signaling pathways, my research
spans three interrelated areas: modeling, coarse-graining (model reduction), and parameter estimation.
The target biochemical systems in this work are the GTPase cycle module comprising the m1 muscarinic
acetylcholine receptor, Gq, and regulator of G-protein signaling 4 (RGS4, a GTPase-activating protein
(GAP)), and the calcium signaling pathway. The GTPase cycle mediates signal transduction from primary
messengers such as ligands (stimuli at the cell surface) to second messengers such as calcium and to
other downstream signaling components such as protein-kinase cascades. Both G-protein signaling and
calcium signaling are among the most ubiquitous signaling systems in eukaryotes. I have also developed
a novel data-mining approach to reduce the number of false positives.
The first part of my work dealt with parameter estimation for a detailed model of the GTPase cycle module,
followed by the development of a reduced-order (simplified) model of the same module. The detailed model
contained 48 reaction rate parameters and 17 distinct chemical species. The main focus was on the
development of methodologies for model reduction for systems with unknown parameters. In one
methodology, a multiparametric variability analysis (MPVA)-based approach is used to systematically
eliminate some of the reactions from the detailed model. The parameters are estimated using a hybrid
genetic-algorithm (GA)-based optimizer, and an implicit MPVA is performed using the results of the
GA-based parameter estimation. A GA-based optimization presents the user with several competing
(near-best or pseudo-global) solutions (parameter-value sets). In this approach, a parameter is
characterized as important if its value varies little across the parameter-value sets that fit the data of
interest well. Such parameters are less likely to be eliminated during model reduction. Conversely,
parameters whose values vary considerably across these sets are termed less important and are
more likely to be eliminated.
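The importance criterion above can be sketched as follows. The exact variability statistic used in the original MPVA is not stated here, so this sketch assumes one plausible choice: the standard deviation of log10 parameter values across the near-best solutions, where "near-best" means a fit error within a hypothetical factor `tol` of the best error. The function name and the synthetic solution sets are illustrative only.

```python
import numpy as np

def rank_parameter_importance(param_sets, fit_errors, tol=1.1):
    """Rank parameters by how tightly they are constrained across near-best solutions.

    param_sets: (n_solutions, n_params) positive parameter values, e.g. the competing
                solutions returned by a GA-based estimator.
    fit_errors: (n_solutions,) fit errors of those solutions.
    tol:        a solution counts as a good fit if its error <= tol * best error
                (hypothetical threshold).
    """
    param_sets = np.asarray(param_sets, dtype=float)
    fit_errors = np.asarray(fit_errors, dtype=float)
    good = fit_errors <= tol * fit_errors.min()
    # Rate constants often span orders of magnitude, so measure spread in log10 units.
    spread = np.log10(param_sets[good]).std(axis=0)
    # Small spread: well constrained, "important" (keep).
    # Large spread: "less important" (candidate for elimination).
    return np.argsort(spread), spread

# Synthetic example: three hypothetical parameters, 50 GA solutions.
rng = np.random.default_rng(0)
k1 = 10 ** rng.normal(0.0, 0.05, 50)   # tightly determined by the data
k2 = 10 ** rng.normal(-1.0, 1.0, 50)   # barely determined by the data
k3 = 10 ** rng.normal(1.0, 0.3, 50)
sets = np.column_stack([k1, k2, k3])
# 40 good fits followed by 10 poor fits that the criterion should ignore.
errors = np.concatenate([1.0 + 0.05 * rng.random(40), np.full(10, 5.0)])
order, spread = rank_parameter_importance(sets, errors)
print("most to least important:", order, "log10 spread:", np.round(spread, 2))
```

Working in log10 units makes a parameter that wanders between 0.01 and 10 count as highly variable even though its absolute spread is small.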
I also developed a mixed-integer nonlinear optimization-based approach in which both the reduced
network topology and the unknown parameters are determined simultaneously using a GA (more
generally, a stochastic search). Thus, no separate iterations between model reduction and parameter
re-estimation are needed. In this approach, binary variables indicate whether or not each parameter is
retained in the model. Complex conditions, in which some parameters must be retained or eliminated
together, can be handled by introducing appropriate constraints. The key idea is to substitute each
parameter, say k, by the product kret*k, and then to optimize with respect to both k and kret to minimize
the error between experimental data and model predictions. The relevant constraints are also
reformulated accordingly. kret = 1 or kret = 0 means that the parameter is retained or eliminated,
respectively. The computational cost of the overall process is only about twice that of parameter
estimation for the detailed model, so this approach is much faster than the MPVA approach.
Nevertheless, the MPVA approach has its own utility, since it provides an intuitive and global
characterization of the nonlinear parametric variability or sensitivity.
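The kret*k substitution can be sketched on a hypothetical three-parameter toy model. Two assumptions differ from the approach described above: an elitist (1+1) stochastic search stands in for the GA, and a small per-parameter penalty (the `penalty` argument, hypothetical) is added to the fit error so that eliminating a parameter is rewarded when it does not hurt the fit. Coupling constraints between parameters are not shown.

```python
import numpy as np

def model(x, k, kret):
    """Hypothetical toy model in which every parameter k_i enters only as kret_i * k_i."""
    ke = kret * k  # effective parameters; kret_i = 0 removes that term from the model
    return ke[0] * x + ke[1] * x**2 + ke[2] * np.sin(x)

def objective(k, kret, x, y, penalty=0.05):
    """Fit error plus a (hypothetical) cost per retained parameter."""
    return float(np.sum((model(x, k, kret) - y) ** 2) + penalty * kret.sum())

def joint_search(x, y, n_params=3, n_iter=5000, seed=0):
    """Elitist (1+1) stochastic search over continuous k and binary kret simultaneously."""
    rng = np.random.default_rng(seed)
    log_k = np.zeros(n_params)               # search log k so that k stays positive
    kret = np.ones(n_params, dtype=int)      # start from the detailed (full) model
    best = objective(np.exp(log_k), kret, x, y)
    history = [best]
    for _ in range(n_iter):
        cand_log_k = log_k + rng.normal(0.0, 0.1, n_params)
        flip = rng.random(n_params) < 0.05   # occasionally toggle a retain/eliminate bit
        cand_kret = np.where(flip, 1 - kret, kret)
        f = objective(np.exp(cand_log_k), cand_kret, x, y)
        if f < best:                          # accept only improvements (elitism)
            log_k, kret, best = cand_log_k, cand_kret, f
        history.append(best)
    return np.exp(log_k), kret, best, history

# Synthetic data generated without the x**2 term.
x = np.linspace(0.0, 3.0, 30)
y = 1.5 * x + 0.8 * np.sin(x)
k_hat, kret_hat, best, history = joint_search(x, y)
print("kret:", kret_hat, "k:", np.round(k_hat, 3), "objective:", round(best, 4))
```

Because k and kret are searched together, a single optimization run yields both the reduced topology (the kret bits) and its parameters. A constraint such as "forward and reverse rates of a reaction are eliminated together" could be imposed by tying the corresponding kret bits.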
The second project is on data mining. A high false-positive rate is very common in data mining. Based
on the concept of minimal models (i.e., models with as few predictors as possible), I developed a novel
approach for reducing the number of false positives. The model developed is essentially a simple
input/output (I/O) model. In this approach, the significant components (inputs or predictors) are first
identified using principal component regression (PCR)-based I/O modeling and by comparing the
coefficients with the standard deviation of the coefficients of a population of models with random
outputs (random models). Then, using an exhaustive combinatorial search, the model with all the
significant predictors is further simplified by excluding some of the predictors while keeping the fit
error of the minimal models statistically the same as that of the detailed model with all the predictors.
The approach was applied to identify the main signaling pathways active during (responsible for) the
release of different cytokines in RAW 264.7 macrophages upon stimulation with different ligands. Since
not all signaling pathways were measured, a two-part model was developed. The first part, in which the
measured signaling activities serve as inputs, strives to capture most of the output (cytokine release). In
the second part, significant variations in the residuals of the first part are further captured by using the
ligands as inputs. The significant ligands in this residual model, which are far fewer than the measured
pathways, provide a way to estimate the contribution of the unmeasured pathways. In this study, data
from stimulation by Toll-like receptor (TLR) ligands and by non-TLR ligands were analyzed separately
to unmask the strong effect of the pathways specifically activated through the TLRs. The models were
then combined with information gleaned from ANOVA about significant ligands to prepare a global
(simplified) network map for cytokine release.
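The two stages of the minimal-model approach (PCR significance against random models, then an exhaustive subset search) can be sketched with NumPy. Several choices here are assumptions rather than details from the original work: random outputs are generated by permuting y, a coefficient counts as significant if it exceeds `z` standard deviations of the random-model coefficients, subsets are refit by ordinary least squares, and "statistically the same fit error" is approximated by a simple ratio threshold `tol` (a formal F-test would be the more rigorous choice). The residual/ligand second stage is not shown, and the data are synthetic.

```python
import itertools
import numpy as np

def pcr_coefficients(X, y, n_comp):
    """Regress y on the leading n_comp principal components of X, then map the
    coefficients back onto the original predictors."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    V = vt[:n_comp].T                        # loadings, shape (n_predictors, n_comp)
    gamma, *_ = np.linalg.lstsq(Xc @ V, yc, rcond=None)
    return V @ gamma

def significant_predictors(X, y, n_comp, n_random=200, z=2.0, seed=0):
    """Keep predictors whose |coefficient| exceeds z standard deviations of the
    coefficients of "random models" fitted to randomly permuted outputs."""
    rng = np.random.default_rng(seed)
    beta = pcr_coefficients(X, y, n_comp)
    rand = np.array([pcr_coefficients(X, rng.permutation(y), n_comp)
                     for _ in range(n_random)])
    return [int(i) for i in np.flatnonzero(np.abs(beta) > z * rand.std(axis=0))]

def sse(X, y, cols):
    """Residual sum of squares of an ordinary least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return float(r @ r)

def minimal_models(X, y, sig, tol=1.05):
    """Search subsets of the significant predictors, smallest first, and return every
    smallest subset whose fit error is within tol of the full significant model."""
    full = sse(X, y, sig)
    for size in range(1, len(sig) + 1):
        hits = [c for c in itertools.combinations(sig, size) if sse(X, y, c) <= tol * full]
        if hits:
            return hits
    return [tuple(sig)]

# Synthetic data: y depends on predictors 0 and 2; predictor 3 is a near copy of 0.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 6))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=60)
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + 0.3 * rng.normal(size=60)
sig = significant_predictors(X, y, n_comp=5)
print("significant:", sig, "minimal:", minimal_models(X, y, sig))
```

The near-duplicate predictor 3 illustrates the kind of redundancy the second stage is meant to remove: PCR tends to spread weight over correlated inputs, while the subset search keeps only as many of them as the fit actually needs.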
The third project deals with the development of a detailed model for calcium signaling in RAW 264.7
cells. RAW 264.7 is a macrophage-like, Abelson leukemia virus-transformed cell line derived from
BALB/c mice (AfCS data center, http://www.signaling-gateway.org/, protocol ID: PP00000159). Models
of the calcium response in other cells, e.g., cardiac muscle cells (myocytes) and neuronal cells, serve as
initial models to be explored further. However, since calcium concentrations in macrophages are quite
different from those in myocytes and neuronal cells, many parameters are expected to be different. One of the
challenges is that different repeats of the experiment (activation by a ligand (stimulus)) result in
different quantitative responses, because several factors inside the cell that affect the calcium response
cannot be controlled. Hence, responses from several control experiments are used for parameter
estimation. Knockdown data are also used for parameter estimation, since knockdowns essentially act as
local perturbations of the network. The only available measurement is the concentration of free calcium
in the cytosol. Hence, the size of the model and the number of parameters should be kept small. Toward
this end, an expandable simplified model structure with some lumped reactions/steps has been used. To
account for unmeasurable variability inside the cell, some of the initial states are allowed to vary within
the physiological range, reflecting that different cells can be at different stages of the cell cycle. Thus,
the unknown parameters include both kinetic parameters and unknown initial states. Using multiple
datasets (from control and knockdown experiments) required the development of dedicated computer
models. In this data-specific computer model, the unknown parameters that can vary from cell to cell are
instantiated separately for each dataset. This increases the number of unknown parameters to be
estimated, but it provides a logical way of using the full range of experimental data. To expedite the
modeling process, a prototype modeling tool has been developed in MATLAB that semi-automatically
generates model-specific C++ code, which is combined with a stochastic-search-based optimization
program to estimate the model parameters. The optimization program has been parallelized using the
Message Passing Interface (MPI). The resulting quantitative model can be used to predict novel
knockdown phenotypes.
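The data-specific parameterization described above (kinetic parameters shared across all datasets, initial states instantiated once per dataset, knockdowns as local perturbations) can be sketched with a deliberately tiny, hypothetical one-variable calcium model, dC/dt = knockdown·J - k·(C - c_rest), whose analytic solution avoids an ODE solver. The model, the resting level `c_rest`, the knockdown factor of 0.3, and all numeric values are assumptions for illustration; the real model is larger and is fitted with the parallel stochastic-search optimizer rather than evaluated at known values.

```python
import numpy as np

N_SHARED = 2  # hypothetical shared kinetic parameters: influx J and removal rate k

def unpack(theta, n_datasets):
    """Split the flat parameter vector into shared kinetics and per-dataset initial states."""
    theta = np.asarray(theta, dtype=float)
    if theta.size != N_SHARED + n_datasets:
        raise ValueError("expected shared parameters plus one initial state per dataset")
    return theta[:N_SHARED], theta[N_SHARED:]

def simulate(t, shared, c0, knockdown=1.0, c_rest=0.1):
    """Analytic solution of dC/dt = knockdown*J - k*(C - c_rest) with C(0) = c0."""
    J, k = shared
    c_ss = c_rest + knockdown * J / k
    return c_ss + (c0 - c_ss) * np.exp(-k * t)

def total_error(theta, datasets):
    """Sum of squared errors over all datasets: kinetics shared, initial states not."""
    shared, c0s = unpack(theta, len(datasets))
    return sum(float(np.sum((simulate(t, shared, c0, kd) - y) ** 2))
               for (t, y, kd), c0 in zip(datasets, c0s))

# Two controls and one knockdown that scales influx to 30% (all values hypothetical).
t = np.linspace(0.0, 20.0, 41)
theta_true = np.array([0.5, 0.4, 0.10, 0.15, 0.12])  # J, k, then c0 for each dataset
shared, c0s = theta_true[:N_SHARED], theta_true[N_SHARED:]
datasets = [(t, simulate(t, shared, c0s[i], kd), kd)
            for i, kd in enumerate([1.0, 1.0, 0.3])]
print("unknowns:", theta_true.size, "error at true values:", total_error(theta_true, datasets))
```

With D datasets the vector has N_SHARED + D entries, which shows the trade-off noted above: more unknowns, but every control and knockdown trace constrains the same shared kinetics.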
Summary:

- Developed a detailed model for the GTPase cycle module of the m1 muscarinic acetylcholine
  receptor, Gq, and regulator of G-protein signaling 4 (RGS4, a GTPase-activating protein (GAP)).
  Major emphasis was on parameter estimation while ensuring that all relevant thermodynamic
  constraints were satisfied.
- Developed a multiparametric variability analysis-based methodology for model reduction and
  applied it to develop a reduced-order model for the above system (GTPase cycle module).
- Developed a stochastic-search-based optimization program for parameter estimation, and
  parallelized it so that it can run on a supercomputer or a cluster of workstations to handle
  large-scale problems.
- Developed a mixed-integer nonlinear-optimization-based approach for model reduction and used
  it to develop a reduced-order model for the GTPase cycle module.
- Developed a data mining and analysis framework to reduce the number of false positives using
  principal component regression and model-size minimization; used the framework to identify
  important signaling pathways involved in cytokine release in macrophages and to develop an
  input/output model.
- Developed prototype modeling software in MATLAB for using knockdown/knockout data with
  cell-to-cell variation in kinetic modeling of biochemical reaction networks.
- Developed an expandable simplified model for calcium signaling in RAW 264.7 cells using the
  above parameter estimator.
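The thermodynamic constraints mentioned for the GTPase cycle model are not spelled out here. One standard class of such constraints for networks of reversible reactions is the Wegscheider (detailed-balance) condition: around any closed cycle of reversible steps, the product of the equilibrium constants K_i = kf_i / kr_i must equal 1. It applies to cycles of reversible binding steps, not to cycles driven by GTP hydrolysis. A common way to honor it during parameter estimation is to treat one rate constant per cycle as dependent and compute it from the others. The sketch below assumes that strategy; the cycle and its rate values are hypothetical.

```python
import math

def cycle_product(kf, kr):
    """Product of equilibrium constants K_i = kf_i / kr_i around a closed reaction cycle."""
    return math.prod(f / r for f, r in zip(kf, kr))

def enforce_detailed_balance(kf, kr, dependent=0):
    """Recompute one reverse rate constant so that the cycle product equals 1.

    With prod_i K_i = 1 and K_d = kf_d / kr_d, it follows that
    kr_d = kf_d * prod_{i != d} K_i.
    """
    d = dependent % len(kf)
    others = math.prod(kf[i] / kr[i] for i in range(len(kf)) if i != d)
    fixed = list(kr)
    fixed[d] = kf[d] * others
    return fixed

# Hypothetical cycle R -> RL -> RLG -> RG -> R, with rates listed in the direction of
# traversal; the values are illustrative only and assumed to be in consistent units.
kf = [1.0e6, 2.0e5, 5.0e6, 1.0e7]
kr = [1.0e-2, 3.0e-1, 2.0e-3, 4.0e-2]
print("cycle product before:", cycle_product(kf, kr))
kr_fixed = enforce_detailed_balance(kf, kr, dependent=3)
print("cycle product after:", cycle_product(kf, kr_fixed))
```

Making one rate constant per cycle a derived quantity also removes one free parameter from the search, so the optimizer never proposes thermodynamically inconsistent parameter sets.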