Learning rule-based models
from gene expression time
profiles annotated with
Gene Ontology terms
Jan Komorowski and
Astrid Lägreid
Joint work with
• Torgeir R. Hvidsten, Herman Midelfart,
Astrid Lægreid and Arne K. Sandvik
J. Komorowski and A. Lägreid
Selected Challenges in Gene Expression Analysis
• Function similarity corresponds to expression similarity, but:
– Functionally correlated genes may be expression-wise dissimilar
(e.g. anti-coregulated)
– Genes usually have multiple functions
– Measurements may be approximate and contradictory
• Can we obtain clusters of biologically related genes?
• Can we build models that classify unknown genes to
functional classes, that are human legible, and that
handle approximate and often contradictory data?
• How can we re-use biological knowledge?
Data
• Data material
– Serum starved fibroblasts, 8,613 genes
• Added serum to medium at time = 0
• Used starved fibroblasts as reference
• Measured gene activity at various time points
– 493 genes found to be differentially expressed
• Results
– 278 genes known (3 repeats)
– 212 genes unknown (uncharacterized)
– 211 genes given hypothetical function with 88% quality
Fibroblast - serum response
[Figure: timeline of the serum response. Serum is added at 0 and samples for microarray analysis are taken along an axis marked 0, 1, 4, 8, 24; the cells go from quiescent through non-proliferating to proliferating.]
Processes
[Figure: processes placed along the time axis (0, 1, 4, 8, 24; quiescent, non-proliferating, proliferating): stress response, protein synthesis, transcription, organelle biogenesis, lipid synthesis, cell cycle re-entry, cell motility.]
Dynamic processes
[Figure: response classes along the time axis (0, 1, 4, 8, 24; quiescent, non-proliferating, proliferating): immediate early, delayed immediate early, intermediate and late, with the labels primary, secondary and tertiary.]
Protein appears after the transcript
[Figure: primary, secondary and tertiary responses along the time axis (0, 1, 4, 8, 24; quiescent, non-proliferating, proliferating).]
Protein dynamics are not always similar to transcript dynamics
[Figure: gene, transcript and protein curves along the time axis (0, 1, 4, 8, 24).]
Molecular mechanisms of transcriptional response
[Diagram with the elements: serum (= signal), effectors, immediate early response factors, immediate early response genes, secondary transcription factors, delayed immediate early response genes, intermediate/late response genes (= cellular response).]
The dynamics of cellular processes
[Figure: cellular processes along the time axis (1, 4, 8, 24; quiescent, non-proliferating, proliferating): stress response, cell motility, cell adhesion, DNA synthesis, energy metabolism, protein synthesis, cell cycle regulation, lipid synthesis, and cell proliferation (negative regulation). DNA synthesis and cell motility each appear twice.]
Methodology
[Diagram: an ontology of processes (Defense response, Transport, Positive control of cell proliferation, Cell cycle control) with genes attached: g2 under Defense response and under Transport, g4 and g5 under Positive control of cell proliferation, g3 under Cell cycle control.]
Gene   0HR  15MIN  30MIN    1HR    2HR    4HR    6HR    8HR   12HR   16HR   20HR   24HR  Process
g1    0.00  -0.47  -3.32  -0.81   0.11  -0.60  -1.36  -1.03  -1.84  -1.00  -0.60  -0.94  Unknown
g2    0.00   0.66   0.07   0.20   0.29  -0.89  -0.45  -0.29  -0.29  -0.15  -0.45  -0.42  Transport and defense response
g3    0.00   0.14  -0.04   0.00  -0.15  -0.58  -0.30  -0.18  -0.38  -0.49  -0.81  -1.12  Cell cycle control
g4    0.00  -0.04   0.00  -0.23  -0.25  -0.47  -0.60  -0.56  -1.09  -0.71  -0.76  -0.62  Positive control of cell proliferation
g5    0.00   0.28   0.37   0.11  -0.17  -0.18  -0.60  -0.23  -0.58  -0.79  -0.29  -0.74  Positive control of cell proliferation
...
1. Mining functional classes from an ontology
2. Extracting features for learning
3. Inducing minimal decision rules using rough sets
0 - 4(Increasing) AND 6 - 10(Decreasing)
AND 14 - 18(Constant) => GO(cell proliferation)
[Plot: example expression profile, time axis 0 to 24, expression axis -2 to 1.5]
4. The function of unknown genes is predicted using the rules
Gene Ontology
[Diagram labels: GENE FUNCTION, with the aspects FUNCTION, PROCESS and CELLULAR COMPARTMENT. The PROCESS terms shown are listed below.]
Cell growth and maintenance
Metabolism
Energy pathways
Nucleotide and nucleic acid metabolism
DNA metabolism
Mutagenesis
DNA repair
DNA packaging
Transcription
Protein metabolism and modification
Amino-acid and derivative metabolism
Protein targeting
Lipid metabolism
Transport
Ion homeostasis
Intracellular protein traffic
Cell death
Cell motility
Stress response
Organelle organization and biogenesis
Oncogenesis
Cell proliferation
Cell cycle
Cell communication
Cell adhesion
Signal transduction
Cell surface receptor linked signal transduction
Intracellular signalling cascade
Developmental processes
Physiological processes
Blood coagulation
Circulation
Biological processes from GO
Energy pathways
DNA metabolism
Amino acid and derivative metabolism
Protein targeting
Lipid metabolism
Transport
Ion homeostasis
Intracellular traffic
Cell death
Cell motility
Stress response
Oncogenesis
Cell cycle
Cell adhesion
Cell surface receptor linked signal transduction
Developmental processes
Blood coagulation
Circulation
Intracellular signaling cascade
Organelle organization and biogenesis
Hierarchical Clustering of the
Fibroblast Data
It’s not a cluster!
Gene Ontology vs. Clusters found
by Iyer et al.
Template-based feature synthesis
• Templates: Increasing, Decreasing, Constant
• The templates are tried on all possible subintervals in the time series
• Templates + gene expression time series data => MATCH
• Result: groups containing genes matching the same templates over the same subinterval
• 12 measurement points, 55 possible intervals of length >2 (that is, spanning at least 3 consecutive measurement points)
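The interval count on this slide can be checked with a few lines of Python. This is only a sketch: the function name `subintervals` and the index-pair representation are assumptions made for illustration, and the 12 time-point labels are the ones from the data table in the talk.

```python
from itertools import combinations

# The 12 measurement points from the expression table in the talk.
TIME_POINTS = ["0HR", "15MIN", "30MIN", "1HR", "2HR", "4HR",
               "6HR", "8HR", "12HR", "16HR", "20HR", "24HR"]


def subintervals(n_points, min_points=3):
    """Return (start, end) index pairs that span at least min_points measurements."""
    return [(i, j) for i, j in combinations(range(n_points), 2)
            if j - i + 1 >= min_points]


intervals = subintervals(len(TIME_POINTS))
print(len(intervals))  # 55
```

There are 66 index pairs in total; dropping the 11 pairs of adjacent points leaves the 55 intervals quoted above.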
Examples of template definitions
[Figure: Increasing-template drawn over the 2HR, 4HR, 6HR, 8HR, 12HR measurements, with threshold labels MIN. 0, MIN. 0.1, MIN. 0.6, MAX 0.2, MIN. 0.1, MIN. 0]
[Figure: Constant-template drawn over the 4HR, 6HR, 8HR, 12HR measurements, with labels MIN. 0.2, MEAN, MIN. 0.2]
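To make the idea of a template concrete, here is a deliberately simplified sketch of matching a profile segment against the three templates. The slide's exact threshold semantics are not recoverable from the transcript, so the parameters `min_change` and `max_spread`, their default values, and the function `matches` are all hypothetical and not the authors' definitions. The profile is gene g4 from the example table in the talk.

```python
# Expression values of g4 at the 12 measurement points (0HR ... 24HR).
g4 = [0.00, -0.04, 0.00, -0.23, -0.25, -0.47,
      -0.60, -0.56, -1.09, -0.71, -0.76, -0.62]


def matches(template, values, min_change=0.5, max_spread=0.3):
    """Check one template on one segment (hypothetical thresholds).

    Increasing / Decreasing: net change from first to last point is at
    least min_change. Constant: all values lie within max_spread.
    """
    first, last = values[0], values[-1]
    if template == "Increasing":
        return last - first >= min_change
    if template == "Decreasing":
        return first - last >= min_change
    if template == "Constant":
        return max(values) - min(values) <= max_spread
    raise ValueError(f"unknown template: {template}")


print(matches("Constant", g4[0:3]))    # 0HR..30MIN -> True
print(matches("Decreasing", g4[3:9]))  # 1HR..12HR  -> True
print(matches("Increasing", g4[3:9]))  # 1HR..12HR  -> False
```

Each (subinterval, template) pair that matches becomes one Boolean feature of the gene, which is the form the decision rules later in the talk are written in.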
Rule example 1
[Plot: expression profiles of the covered genes, time axis 0 to 24, expression axis -1 to 3]
Rule
0 - 4(Constant) AND 0 - 10(Increasing) =>
GO(protein metabolism and
modification) OR
GO(mesoderm development) OR
GO(protein biosynthesis)
Covered genes
M35296, J02783, D13748, X05130, X60957, U90918 (unknown)
Rule example 2
[Plot: expression profiles of the covered genes, time axis 0 to 24, expression axis -2 to 1.5]
Rule
0 - 4(Increasing) AND 6 - 10(Decreasing)
AND 14 - 18(Constant) =>
GO(cell proliferation) OR
GO(cell-cell signaling) OR
GO(intracellular signaling cascade) OR
GO(oncogenesis)
Covered genes
Y07909, X58377, U66468, X85106
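A rule of this form can be represented as a set of (start, end, template) conditions plus the GO processes it concludes. The sketch below is one possible encoding, not the ROSETTA representation: the `Rule` class and the two feature sets are hypothetical, and only the rule's conditions and processes come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    conditions: frozenset  # (start, end, template) triples, all must hold
    processes: tuple       # GO processes on the right-hand side

    def fires(self, features):
        """A rule fires when every condition is among the gene's features."""
        return self.conditions <= features


rule = Rule(
    conditions=frozenset({(0, 4, "Increasing"),
                          (6, 10, "Decreasing"),
                          (14, 18, "Constant")}),
    processes=("cell proliferation", "cell-cell signaling",
               "intracellular signaling cascade", "oncogenesis"),
)

# Hypothetical feature sets of two genes (templates matched per interval).
gene_a = frozenset({(0, 4, "Increasing"), (6, 10, "Decreasing"),
                    (14, 18, "Constant"), (0, 2, "Increasing")})
gene_b = frozenset({(0, 4, "Increasing")})

print(rule.fires(gene_a))  # True
print(rule.fires(gene_b))  # False
```

Because the right-hand side is a disjunction of processes, a firing rule supports several candidate functions at once rather than a single class.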
Classification using template-based rules
[Figure: the expression profile of gene X60957 (time axis 0 to 24, expression axis -1 to 3) is checked against a list of rules of the form IF … THEN …; one matching rule is shown in full and is labeled +4.]
IF 0 - 4(Constant) AND 0 - 10(Increasing) THEN GO(prot. met. and mod.) OR …
Process                                Votes
protein metabolism and modification    6
protein amino acid phosphorylation     3
proteolysis and peptidolysis           2
transcription                          1
transport                              1
vision                                 1
…
Votes are normalized and processes with vote fractions higher
than a selection-threshold are chosen as predictions
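The voting step can be sketched as follows. The vote counts are the six rows visible on the slide for gene X60957 (the elided rows are ignored here, so the fractions are only illustrative), and the threshold value 0.2 and the function name `predict` are assumptions, not values from the talk.

```python
# Votes shown on the slide for gene X60957 (elided rows not included).
votes = {
    "protein metabolism and modification": 6,
    "protein amino acid phosphorylation": 3,
    "proteolysis and peptidolysis": 2,
    "transcription": 1,
    "transport": 1,
    "vision": 1,
}


def predict(votes, threshold):
    """Normalize votes to fractions and keep processes above the threshold."""
    total = sum(votes.values())
    if total == 0:
        return []
    return [process for process, count in votes.items()
            if count / total > threshold]


# Hypothetical selection-threshold of 0.2.
print(predict(votes, 0.2))
# ['protein metabolism and modification', 'protein amino acid phosphorylation']
```

Raising the threshold yields fewer, more confident predictions; lowering it yields more predictions per gene, which is the trade-off between precision and coverage reported in the cross validation results.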
Cross validation estimates Iyer et al.
PROCESS                                 AUC    SE
Ion homeostasis                         1.00   0.00
Protein targeting                       0.99   0.03
Blood coagulation                       0.96   0.08
DNA metabolism                          0.94   0.09
Intracellular signaling cascade         0.94   0.06
Energy pathways                         0.93   0.12
Cell cycle                              0.93   0.04
Oncogenesis                             0.92   0.11
Circulation                             0.91   0.11
Cell death                              0.90   0.10
Developmental processes                 0.90   0.07
Transcription                           0.88   0.11
Defense (immune) response               0.88   0.05
Cell adhesion                           0.87   0.09
Stress response                         0.86   0.15
Protein metabolism and modification     0.85   0.10
Cell motility                           0.84   0.11
Cell surface rec linked signal transd   0.82   0.15
Lipid metabolism                        0.81   0.14
Transport                               0.79   0.17
Cell organization and biogenesis        0.79   0.11
Cell proliferation                      0.79   0.06
Amino acid and derivative metabolism    0.69   0.06
AVERAGE                                 0.88   0.09
A: Coverage: 84%, Precision: 50%
B: Coverage: 71%, Precision: 60%
C: Coverage: 39%, Precision: 90%
Coverage = TP/(TP+FN)
Precision = TP/(TP+FP)
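The two measures defined on this slide translate directly into code. The counts in the example are hypothetical and chosen only to show the arithmetic; they are not taken from the study.

```python
def coverage(tp, fn):
    """Coverage = TP / (TP + FN): share of true annotations that were predicted."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Precision = TP / (TP + FP): share of predictions that were correct."""
    return tp / (tp + fp)


# Hypothetical counts: 6 true positives, 2 false negatives, 4 false positives.
tp, fn, fp = 6, 2, 4
print(coverage(tp, fn))   # 0.75
print(precision(tp, fp))  # 0.6
```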
Cross validation estimates Cho et al.
Process                                 GO           AUC    SE
apoptosis*                              GO:0006915   0.81   0.01
carbohydrate metabolism                 GO:0005975   0.72   0.02
cell adhesion*                          GO:0007155   0.77   0.02
cell cycle control*                     GO:0000074   0.83   0.01
cell motility*                          GO:0006928   0.81   0.01
cell proliferation                      GO:0008283   0.80   0.01
cell surface rec linked signal transd   GO:0007166   0.79   0.01
cell-cell signaling                     GO:0007267   0.80   0.01
DNA metabolism                          GO:0006259   0.78   0.02
energy pathways                         GO:0006091   0.76   0.02
humoral immune response                 GO:0006959   0.77   0.02
immune response                         GO:0006955   0.81   0.01
intracellular signaling cascade         GO:0007242   0.81   0.02
lipid metabolism                        GO:0006629   0.71   0.02
mesoderm development                    GO:0007498   0.77   0.02
mitotic cell cycle*                     GO:0000278   0.84   0.01
neurogenesis                            GO:0007399   0.78   0.01
oncogenesis                             GO:0007048   0.77   0.01
phototransduction                       GO:0007602   0.85   0.01
physiological processes                 GO:0007582   0.77   0.01
protein biosynthesis                    GO:0006412   0.80   0.02
protein metabolism and modification     GO:0006411   0.77   0.01
protein amino acid phosphorylation      GO:0006468   0.82   0.01
proteolysis and peptidolysis            GO:0006508   0.80   0.02
transcription                           GO:0006350   0.71   0.01
transport                               GO:0006810   0.71   0.01
vision                                  GO:0007601   0.83   0.01
AVERAGE                                              0.78   0.01
Coverage: 58%
Precision: 61%
Coverage = TP/(TP+FN)
Precision = TP/(TP+FP)
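The slides report an AUC per process but do not say how it was computed. As a reminder of what the number means, here is a sketch of the standard rank-based definition of the area under the ROC curve: the probability that a randomly chosen positive gene scores higher than a randomly chosen negative one, with ties counted as one half. The function name and the scores below are hypothetical.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve from pairwise comparisons (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical vote fractions for one process: genes annotated with the
# process (positives) versus genes without that annotation (negatives).
positives = [0.9, 0.8, 0.4]
negatives = [0.7, 0.4, 0.2, 0.1]
print(auc(positives, negatives))  # 0.875
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the per-process values in the tables.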
Protein Metabolism and Modification
[Figure with regions labeled A to E]
A – annotations
B – false negatives
C – false positives
D – true positives
E – pred. unknown gene
Re-classification of the Known Genes
Co-classifications for the Unknown Genes
Conclusions
• Our methodology
– Incorporates background biological knowledge
– Handles noise and incompleteness in the
microarray data well
– Can be objectively evaluated
– Predicts multiple functions per gene
– Can reclassify known genes and provide possible
new functions of the known genes
– Can provide hypotheses about the function of
unknown genes
• Experimental work needs to be done to
confirm our predictions
Genomic ROSETTA:
http://www.idi.ntnu.no/~aleks/rosetta