Network Inference, With an Application to Yeast Systems Biology
Center for Genomic Sciences
Cuernavaca, Mexico
September 25, 2006
Reinhard Laubenbacher
Virginia Bioinformatics Institute
And
Department of Mathematics
Virginia Tech
http://polymath.vbi.vt.edu
Contributors and Collaborators
Applied Discrete Mathematics Group
(http://polymath.vbi.vt.edu)
• Miguel Colòn-Velez
• Elena Dimitrova (now at Clemson U)
• Luis Garcia (now at Texas A&M)
• Abdul Jarrah
• John McGee (now at Radford U)
• Brandy Stigler (now at MBI)
• Paola Vera-Licona
Collaborators
• Diogo Camacho (VBI)
• Ana Martins (VBI)
• Pedro Mendes (VBI)
• Wei Sha (VBI)
• Vladimir Shulaev (VBI)
• Michael Stillman (Cornell)
• Bernd Sturmfels (UC Berkeley)
Funding: NIH, NSF, Commonwealth of VA
“All processes in organisms,
from the interaction of molecules to the complex
functions of the brain and other whole organs,
strictly obey […] physical laws.
“Where organisms differ from inanimate matter is
in the organization of their systems and especially
in the possession of coded information.”
E. Mayr, 1988
A multiscale system
[Figure: levels of the system: environment, organism, molecular networks, genome; axis: increasing complexity]
Discrete models
“[The] transcriptional control of a gene
can be described by a discrete-valued function
of several discrete-valued variables.”
“A regulatory network, consisting of many
interacting genes and transcription factors,
can be described as a collection
of interrelated discrete functions
and depicted by a wiring diagram
similar to the diagram of a digital logic circuit.”
Karp, 2002
Model Types
Ideker, Lauffenburger, Trends in Biotech 21, 2003
Biochemical Networks
[Figure: a network drawn across three layers: gene space (Genes 1 to 4), protein space (Proteins 1 to 4, Complex 3:4) and metabolic space (Metabolites 1 and 2)]
Brazhnik, P., de la Fuente, A. and Mendes, P. Trends in Biotechnology 20, 2002
Introduction to oxidative stress and CHP
• Oxidative stress is a general term for the steady-state level of oxidative damage in a cell, tissue, or organ, caused by species with high oxidative potential.
• Cumene hydroperoxide (CHP) is an organic peroxide and thus has high oxidative potential. CHP is very reactive and can easily oxidize molecules such as lipids, proteins and DNA.
• Oxidation by CHP:
  cumene hydroperoxide (CHP) + X → cumyl alcohol (COH) + oxidized X
Courtesy of Wei Sha
Glutathione-glutaredoxin antioxidant defense system
[Figure: pathway diagram; reactions and enzymes as labelled]
• Glu + Cys → γ-GluCys: γ-glutamylcysteine synthetase (GSH1), with feedback inhibition
• γ-GluCys + Gly → GSH: glutathione synthetase (GSH2)
• GSH + ROOH (peroxides) → GSSG + ROH (alcohol or water): glutathione peroxidase (GPX1, GPX2, GPX3)
• GSH + RX → HX + R-SG: glutathione S-transferase (GTT1, GTT2)
• GSSG → GSH, coupled to NADPH → NADP+: glutathione oxidoreductase (GLR1), thioredoxin reductase (TRR1)
• Also shown: glutaredoxin (GRX1, GRX2)
Courtesy of Wei Sha
Saccharomyces cerevisiae systems biology at VBI
[Figure: experimental workflow]
Experimentation
• Cell growth in controlled batch (in fermentors)
• Experimental treatment (i.e. oxidative stress)
Sample Prep
• Quench metabolism in cold buffered methanol
• Separate cells from the media
• Freeze-dry
• Break cells with high frequency sound waves
• Samples for metabolites, RNA and proteins
Data
• Metabolite extraction: GC-MS, LC-MS, CE-MS
• Protein extraction: 2D PAGE, MALDI-MS
• RNA extraction: Affymetrix GeneChip™
Analysis
Modeling
Courtesy V. Shulaev
Experimental design
[Figure: fermentor that contains yeast cell culture; samples hybridized to the Affymetrix Yeast Genome S98 array]
• CHP treated samples: cumene hydroperoxide (CHP) added to wild type yeast cultures 1, 2 and 3; sampled at 0, 3, 6, 12, 20, 40, 70 and 120 min
• Control samples: buffer (EtOH) added to wild type yeast cultures 1, 2 and 3; sampled at 0, 3, 6, 12, 20, 40, 70 and 120 min
Courtesy W. Sha
Why is it important to use control samples?

Control samples
Comparison                   Significantly changed genes (p<0.01)   Upregulated genes   Downregulated genes
Cont_3min vs. Cont_0min      1                                      0                   1
Cont_6min vs. Cont_0min      4                                      2                   2
Cont_12min vs. Cont_0min     2                                      1                   1
Cont_20min vs. Cont_0min     18                                     12                  6
Cont_40min vs. Cont_0min     1054                                   571                 483
Cont_70min vs. Cont_0min     2709                                   1343                1366
Cont_120min vs. Cont_0min    2829                                   1344                1485

CHP treated samples
Comparison                   Significantly changed genes (p<0.01)   Upregulated genes   Downregulated genes
CHP_3min vs. CHP_0min        26                                     25                  1
CHP_6min vs. CHP_0min        235                                    170                 65
CHP_12min vs. CHP_0min       1093                                   512                 581
CHP_20min vs. CHP_0min       1646                                   867                 779
CHP_40min vs. CHP_0min       1643                                   932                 711
CHP_70min vs. CHP_0min       1800                                   1067                733
CHP_120min vs. CHP_0min      2465                                   1344                1121
Courtesy W. Sha
Cumene hydroperoxide (CHP) and cumyl alcohol (COH) progress curves
[Figure: CHP and COH concentration (mM) versus time (h); panels: in medium, in yeast cell culture]
Courtesy W. Sha
Pathways induced by oxidative stress were identified

GO term                           3min    6min        12min       20min
Response_to_stress                >0.1    0.016645    0           0
Carbohydrate_metabolism           >0.1    >0.1        0           0
Sporulation                       >0.1    >0.1        0.02106     0.000644
Protein_catabolism                >0.1    >0.1        0.030576    0
Signal_transduction               >0.1    >0.1        0.032308    0.000449

KEGG term                         3min    6min        12min       20min
Glutathione metabolism            >0.1    0.002443    0.013898    5.4E-05
Glycerolipid metabolism           >0.1    0.01106     0.00064     0.020554
Starch and sucrose metabolism     >0.1    >0.1        0.000485    0.000364
Fructose and mannose metabolism   >0.1    >0.1        0.006715    0.028673
Proteasome                        >0.1    >0.1        >0.1        8.81E-15
Ubiquitin mediated proteolysis    >0.1    >0.1        >0.1        0.004418
Courtesy W. Sha
Pathways repressed by oxidative stress were identified

GO term                                  3min    6min        12min        20min
Nuclear_organization_and_biogenesis      >0.1    0.022552    0.000632     0.036472
Ribosome_biogenesis_and_assembly         >0.1    0.093905    0            0
Organelle_organization_and_biogenesis    >0.1    >0.1        0            0
RNA_metabolism                           >0.1    >0.1        0            0
Cell_cycle                               >0.1    >0.1        0.00014      0.036506
Cytokinesis                              >0.1    >0.1        0            0
Electron_transport                       >0.1    >0.1        >0.1         0

KEGG term                                3min    6min        12min        20min
cell cycle                               >0.1    0.006487    1.232E-07    0.001035
purine metabolism                        >0.1    0.009725    5.656E-10    1.133E-09
RNA polymerase                           >0.1    >0.1        5.396E-13    2.423E-09
pyrimidine metabolism                    >0.1    >0.1        8.983E-11    6.318E-09
Courtesy W. Sha
k-means clustering analysis result
[Figure: five clusters, numbered 1 to 5]
Courtesy W. Sha
Pathway analysis for each cluster
[Figure: expression profile of each of the five clusters over time, annotated with pathways]
• Proteasome; Ubiquitin mediated proteolysis; MAPK signaling pathway
• ATP synthesis; Starch and sucrose metabolism
• Galactose metabolism; Oxidative phosphorylation
• Ribosome; Cell cycle
• RNA polymerase; Purine metabolism; Pyrimidine metabolism
Where are the oxidative stress defense pathways?
Courtesy W. Sha
Genotype
YAP1 was successfully knocked out in yap1 mutant yeast
[Figure: time series of YAP1 gene expression level in wild type control, wild type CHP treated, yap1 mutant control and yap1 mutant CHP treated samples; axes: expression level versus time (min)]
Phenotype
The transformation of CHP to COH in wild type and in yap1 mutant
[Figure: CHP and COH concentration (mM) versus time (min); panels: in wild type, in yap1 mutant]
Courtesy W. Sha
Claytor Lake Network
M1
M2
M23
Courtesy P. Mendes
“Bottom-up modeling:” Model individual
pathways and aggregate to system-level
models
“Top-down modeling:” Develop network
inference methods for system-level
phenomenological models
Courtesy P. Mendes
Genetic Regulation
Courtesy P. Mendes
http://web.mit.edu/esgbio/www/pge/lac.html
I = lac repressor: protein which regulates transcription of lac mRNA (genes in blue)
Z = beta-galactosidase: protein which cleaves lactose to produce glucose, galactose, and allolactose
Y = lactose permease: protein which transports lactose into the cell
Discrete Model for lac Operon
fM = A
fB = M
fA = A ∨ (L ∧ B)
fL = P ∨ (L ∧ ¬B)
fP = M
M = mRNA for lac genes: LacZ, LacY, LacA
B = beta-galactosidase
A = allolactose, an isomer of lactose (inducer)
L = lactose (intracellular)
P = lactose permease
Model assumptions
• Transcription/translation require 1 time unit
• mRNA/protein degradation require 1 time unit
• Extracellular lactose always available
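The Boolean model above can be simulated directly. The following is a minimal sketch, assuming synchronous updates of all five variables and assuming the operators lost in extraction read fA = A OR (L AND B) and fL = P OR (L AND NOT B); the function names `step` and `trajectory` are illustrative, not taken from the slides.

```python
from itertools import product

def step(state):
    """One synchronous update of the (M, B, A, L, P) state, values 0 or 1."""
    M, B, A, L, P = state
    return (
        A,                    # fM = A
        M,                    # fB = M
        A | (L & B),          # fA = A OR (L AND B)      (assumed operators)
        P | (L & (1 - B)),    # fL = P OR (L AND NOT B)  (assumed operators)
        M,                    # fP = M
    )

def trajectory(state):
    """Follow the dynamics until a state repeats; return the states visited."""
    visited = []
    while state not in visited:
        visited.append(state)
        state = step(state)
    return visited

# Enumerate all 2^5 states and keep those that map to themselves.
fixed_points = [s for s in product((0, 1), repeat=5) if step(s) == s]

if __name__ == "__main__":
    print("fixed points:", fixed_points)
    print("trajectory from (0, 0, 1, 1, 0):", trajectory((0, 0, 1, 1, 0)))
```

Under these assumptions, a state with allolactose present and lactose inside the cell moves to the all-on state, while states without allolactose, mRNA and enzymes stay switched off.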
Discrete Model with Dynamics
(M, B, A, L, P)
Variables x1, … , xn with values in a finite set X.
(s1, t1), … , (sr, tr) state transition observations with sj, tj ∈ X^n.
Goal: Identify a collection of “best” dynamical systems
f = (f1, … , fn): X^n → X^n
such that f(sj) = tj.
(1) Wiring diagram
(2) Dynamics
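To make the condition f(sj) = tj concrete, here is a minimal sketch that builds one function satisfying it when X is the finite field F_p with p prime. It uses the standard indicator polynomial 1 - (x - s)^(p-1), which equals 1 at x = s and 0 elsewhere by Fermat's little theorem. This yields just one particular solution and is not the model-selection procedure of the papers cited below; the transition data and the names `indicator` and `interpolate` are hypothetical.

```python
P = 5  # number of states per variable; assumed prime so that X = F_p is a field

def indicator(x, s, p=P):
    """Return 1 if x == s and 0 otherwise, evaluated as a polynomial over F_p."""
    value = 1
    for xi, si in zip(x, s):
        # (xi - si)^(p-1) is 0 when xi == si and 1 otherwise (Fermat).
        value = value * (1 - pow((xi - si) % p, p - 1, p)) % p
    return value

def interpolate(transitions, p=P):
    """Build f with f(s_j) = t_j for every observed transition (s_j, t_j)."""
    def f(x):
        out = [0] * len(x)
        for s, t in transitions:
            w = indicator(x, s, p)
            for k, tk in enumerate(t):
                out[k] = (out[k] + w * tk) % p
        return tuple(out)
    return f

# Hypothetical observations in two variables over F_5.
TRANSITIONS = [((0, 1), (1, 1)), ((1, 1), (2, 0)), ((2, 0), (2, 0))]
model = interpolate(TRANSITIONS)

if __name__ == "__main__":
    for s, t in TRANSITIONS:
        print(s, "->", model(s), "expected", t)
```

Many other functions fit the same data, since any function that vanishes on all observed inputs can be added without changing the fit. That is why the goal asks for a collection of “best” systems rather than a single one.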
R. Laubenbacher and B. Stigler, A computational
algebra approach to the reverse-engineering of
gene regulatory networks, J. Theor. Biol. 229
(2004)
A. Jarrah, R. Laubenbacher, B. Stigler, and M.
Stillman, Reverse-engineering of polynomial
dynamical systems, Adv. in Appl. Math. (2006)
in press
Method Validation:
Simulated gene network
Pandapas network
• 10 genes, 3 external biochemicals
• 17 interactions
Time course data: 9 time points
• Generated 8 time series for wildtype, knockouts G1, G2, G5
• 192 data points
• G6, G9 constant
Data discretization
• 5 states per node
• 95 data points
– 49% reduction
– < 0.00001% of 5^13 total states
Courtesy B. Stigler
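The size of the state space relative to the data can be checked with a few lines of arithmetic. This sketch assumes the total is 5^13 (5 states for each of the 10 genes and 3 external biochemicals); the variable names are illustrative.

```python
states_per_node = 5
nodes = 10 + 3          # 10 genes plus 3 external biochemicals
observed_points = 95    # data points left after discretization

total_states = states_per_node ** nodes
fraction_percent = 100 * observed_points / total_states

if __name__ == "__main__":
    print(f"total states: {total_states}")
    print(f"observed fraction: {fraction_percent:.2e} %")
```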
Method Validation:
Simulated gene network
Minimal Sets Algorithm
• 77% of the interactions recovered
• Identified targets of P2, P3 (x12, x13)
• 11 false positives, 4 false negatives
[Figure: Pandapas network next to the reverse-engineered network]
Courtesy B. Stigler
Example: Gene Regulatory Networks
\[ \frac{dG_1}{dt} = \frac{0.01\left(1 + \frac{G_1(t)}{0.01 + G_1(t)}\right)}{0.01 + G_3(t)} - G_1(t) \]
\[ \frac{dG_2}{dt} = \frac{0.01\left(1 + \frac{G_1(t)}{0.01 + G_1(t)}\right)}{0.01 + G_3(t)} - G_2(t) \]
\[ \frac{dG_3}{dt} = \frac{10^{-6}}{(0.01 + G_1(t))(0.01 + G_3(t))(0.01 + G_5(t))} - G_3(t) \]
\[ \frac{dG_4}{dt} = \frac{0.01}{0.01 + G_3(t)} - G_4(t) \]
\[ \frac{dG_5}{dt} = \left(1 + \frac{G_1(t)}{0.01 + G_1(t)}\right)\left(1 + \frac{G_3(t)}{0.01 + G_3(t)}\right) - G_5(t) \]
Stable steady states:
(1.99006, 1.99006, 0.000024814, 0.997525, 1.99994)
(-0.00493694, -0.00493694, -0.0604538, -0.198201, 0.0547545)
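The first steady state listed above can be checked numerically. The sketch below integrates the five-gene system with a hand-written fourth-order Runge-Kutta step, starting from (1, 1, 1, 1, 1). The right-hand side is a reading of the equations on this slide, with each gene produced at a rate depending on G1 and G3 and degraded at rate 1. The step size, end time and function names are assumptions made for illustration.

```python
def rhs(g):
    """Right-hand side of the five-gene ODE system (as read from the slide)."""
    g1, g2, g3, g4, g5 = g
    act = 1 + g1 / (0.01 + g1)   # activation term driven by G1
    return (
        0.01 * act / (0.01 + g3) - g1,
        0.01 * act / (0.01 + g3) - g2,
        1e-6 / ((0.01 + g1) * (0.01 + g3) * (0.01 + g5)) - g3,
        0.01 / (0.01 + g3) - g4,
        act * (1 + g3 / (0.01 + g3)) - g5,
    )

def rk4_step(g, dt):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    def shifted(k, h):
        return tuple(gi + h * ki for gi, ki in zip(g, k))
    k1 = rhs(g)
    k2 = rhs(shifted(k1, dt / 2))
    k3 = rhs(shifted(k2, dt / 2))
    k4 = rhs(shifted(k3, dt))
    return tuple(
        gi + dt / 6 * (a + 2 * b + 2 * c + d)
        for gi, a, b, c, d in zip(g, k1, k2, k3, k4)
    )

def integrate(g0, t_end, dt=0.01):
    """Integrate from t = 0 to t_end and return the final state."""
    g = tuple(g0)
    for _ in range(round(t_end / dt)):
        g = rk4_step(g, dt)
    return g

if __name__ == "__main__":
    final = integrate((1.0, 1.0, 1.0, 1.0, 1.0), t_end=40.0)
    print("state at t = 40:", final)
```

Starting from all genes at 1, the trajectory should settle near the positive steady state listed above. A fixed-step integrator is enough here because all time scales of the system are of order 1.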
Data (discretized to 5 states)

Continuous values (columns G1 to G5, one row per time point):
G1          G2          G3          G4          G5
1           1           1           1           1
0.203145    0.203145    0.135339    0.169883    3.469657
0.415507    0.415507    0.018334    0.220206    3.478608
1.192199    1.192199    0.002502    0.600941    2.773302
1.760581    1.760581    0.00036     0.883442    2.223943
1.941092    1.941092    7E-05       0.973211    2.047744
1.980977    1.980977    3.09E-05    0.993022    2.008786
1.988499    1.988499    2.56E-05    0.996752    2.001455
1.989805    1.989805    2.49E-05    0.997398    2.000187
1.990021    1.990021    2.48E-05    0.997505    1.999978
1.990056    1.990056    2.48E-05    0.997522    1.999944

Discretized values (one row per time point):
3   1   2   2   1
1   0   1   1   2
1   0   0   1   4
3   0   0   1   3
3   1   0   1   2
4   2   0   1   2
4   2   0   2   2
4   2   0   2   2
4   2   0   2   2
4   2   0   2   2
4   2   0   2   2
Algorithm input: 7 such time courses, 60 state transitions
A model for 1 wildtype time series
f1 = – x4 + 1          var1 = {4}
f2 = 1                 var2 = {}
f3 = x4 + 1            var3 = {4}
f4 = 1                 var4 = {}
f5 = – x5^3 – 2x5^2 + 2x4 – 2x5 – 2          var5 = {4, 5}
[Figure: wiring diagrams on the nodes G1, G2, G3, G4, G5]
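The wiring diagram follows from the polynomials by reading off which variables appear in each fi. A minimal sketch, assuming coefficients are taken mod 5 and reading the flattened exponents in f5 as powers (x53 as x5^3, x52 as x5^2); the dictionary encoding and the function names are illustrative choices, not the authors' software.

```python
P = 5  # states per variable

# Each polynomial maps an exponent tuple (e1, ..., e5) to its coefficient.
MODEL = {
    1: {(0, 0, 0, 1, 0): -1, (0, 0, 0, 0, 0): 1},            # f1 = -x4 + 1
    2: {(0, 0, 0, 0, 0): 1},                                 # f2 = 1
    3: {(0, 0, 0, 1, 0): 1, (0, 0, 0, 0, 0): 1},             # f3 = x4 + 1
    4: {(0, 0, 0, 0, 0): 1},                                 # f4 = 1
    5: {(0, 0, 0, 0, 3): -1, (0, 0, 0, 0, 2): -2,            # f5 = -x5^3 - 2x5^2
        (0, 0, 0, 1, 0): 2, (0, 0, 0, 0, 1): -2,             #      + 2x4 - 2x5
        (0, 0, 0, 0, 0): -2},                                #      - 2
}

def support(poly, p=P):
    """Indices (1-based) of the variables that occur in a polynomial."""
    return {
        i + 1
        for exps, coeff in poly.items()
        if coeff % p != 0
        for i, e in enumerate(exps)
        if e > 0
    }

def wiring_diagram(model):
    """Directed edges (source, target): x_source occurs in f_target."""
    return {(src, tgt) for tgt, poly in model.items() for src in support(poly)}

def evaluate(poly, x, p=P):
    """Evaluate a polynomial at the state x, reducing the result mod p."""
    total = 0
    for exps, coeff in poly.items():
        term = coeff
        for xi, e in zip(x, exps):
            term *= xi ** e
        total += term
    return total % p

if __name__ == "__main__":
    for i, poly in MODEL.items():
        print(f"var{i} =", sorted(support(poly)))
    print("edges:", sorted(wiring_diagram(MODEL)))
```

With this encoding, the edge (5, 5) is the self-loop on G5 and every other edge leaves G4, matching the var sets listed above.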
[Figure: wiring diagrams on G1 to G5 in three panels: adding another wildtype time series, adding a knockout time series, all time series]
Using 10 random variable orders
[Figure: wiring diagram on G1 to G5]
• Wiring diagram missing two (20%) edges; includes 5 indirect interactions.
• Network has 5^5 = 3125 possible state transitions.
• Input: 60 (approx. 2%) state transitions.
Dynamics
[Figure: wild type time series 1; gene expression of G1, G2, G3, G4 and G5 versus time]
Stable steady state:
(1.99006, 1.99006, 0.000024814, 0.997525, 1.99994)
Fixed point:
(4, 4, 2, 4, 2)
Dynamics
f1 = 3x3x5^3 + x5^4 + 4x3^3 + x1^2x5 + 4x3x5^2 + 2x1^2 + 3x3^2 + 4x1x5 + x5^2 + 4x3 + 4x5 + 3
f2 = 4x3x5^4 + 4x3x5^3 + 4x5^4 + x3^3 + 4x1^2x5 + 2x3x5^2 + 4x5^3 + 2x1^2 + 2x1x3 + 3x3^2 + 2x3x5 + 4x5^2 + 4x1 + 4x5 + 1
f3 = x1^3x4 + 4x1x4^3 + 3x1^2x4 + 4x1x4^2 + x4^3 + x1^2 + x1x4 + x4^2 + x1 + x4 + 4
f4 = x3x5^4 + 2x3x5^3 + 3x5^4 + 4x3^3 + x1^2x5 + 3x3^2x5 + 2x3x5^2 + 2x1x3 + 2x3^2 + x5^2 + 4x1 + x3 + 4x5 + 4
f5 = 4x3x5^4 + 3x3x5^3 + 2x5^3 + x1^2 + 2x1x3 + 2x3^2 + 4x3 + 4x5 + 4
Phase space: There are 4 components and 4 fixed point(s)
Component   Size   Cycle Length
1           2200   1
2           890    1
3           10     1
4           25     1
TOTAL: 3125 = 5^5 nodes
Printing fixed point(s)...
[ 0 1 2 1 0 ] lies in a component of size 25.
[ 2 2 4 2 3 ] lies in a component of size 10.
[ 4 4 2 2 3 ] lies in a component of size 890.
[ 4 4 2 4 2 ] lies in a component of size 2200.
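A phase-space summary of this kind can be recomputed by brute force, since the system has only 5^5 = 3125 states. The sketch below assumes arithmetic mod 5 and reads the flattened exponents in f1 to f5 as powers (for example x54 as x5^4 and x12 as x1^2). It is an illustrative enumeration, not the tool that produced the printout above.

```python
from collections import Counter
from itertools import product

P = 5  # states per variable

def f(x, p=P):
    """Apply the polynomial dynamical system once (arithmetic mod p)."""
    x1, x2, x3, x4, x5 = x  # x2 does not occur in any of the five polynomials
    f1 = (3*x3*x5**3 + x5**4 + 4*x3**3 + x1**2*x5 + 4*x3*x5**2 + 2*x1**2
          + 3*x3**2 + 4*x1*x5 + x5**2 + 4*x3 + 4*x5 + 3)
    f2 = (4*x3*x5**4 + 4*x3*x5**3 + 4*x5**4 + x3**3 + 4*x1**2*x5 + 2*x3*x5**2
          + 4*x5**3 + 2*x1**2 + 2*x1*x3 + 3*x3**2 + 2*x3*x5 + 4*x5**2
          + 4*x1 + 4*x5 + 1)
    f3 = (x1**3*x4 + 4*x1*x4**3 + 3*x1**2*x4 + 4*x1*x4**2 + x4**3 + x1**2
          + x1*x4 + x4**2 + x1 + x4 + 4)
    f4 = (x3*x5**4 + 2*x3*x5**3 + 3*x5**4 + 4*x3**3 + x1**2*x5 + 3*x3**2*x5
          + 2*x3*x5**2 + 2*x1*x3 + 2*x3**2 + x5**2 + 4*x1 + x3 + 4*x5 + 4)
    f5 = (4*x3*x5**4 + 3*x3*x5**3 + 2*x5**3 + x1**2 + 2*x1*x3 + 2*x3**2
          + 4*x3 + 4*x5 + 4)
    return (f1 % p, f2 % p, f3 % p, f4 % p, f5 % p)

states = list(product(range(P), repeat=5))
succ = {s: f(s) for s in states}

# Fixed points are the states that map to themselves.
fixed_points = [s for s in states if succ[s] == s]

# Assign every state to the limit cycle it eventually reaches.
attractor = {}
for s in states:
    path, on_path, cur = [], set(), s
    while cur not in attractor and cur not in on_path:
        path.append(cur)
        on_path.add(cur)
        cur = succ[cur]
    if cur in attractor:
        rep = attractor[cur]
    else:
        rep = min(path[path.index(cur):])  # smallest state on the new cycle
    for q in path:
        attractor[q] = rep

component_sizes = Counter(attractor.values())

if __name__ == "__main__":
    print("fixed points:", fixed_points)
    print("component sizes:", dict(component_sizes))
```

Each state has exactly one successor, so every component of the phase space contains exactly one limit cycle, and grouping states by the cycle they reach gives the component sizes.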
Summary
• To use “omics” data sets to their full potential, network inference methods are useful.
• Cellular processes are dynamical systems, so
we need methods for the inference of
dynamical systems models.
• Special data requirements.
• Models are useful to generate new hypotheses.
• Validation of modeling technologies is crucial.