Harvard Medical School
Massachusetts Institute of Technology
Inferring Nonstationary Gene Networks
from Temporal Gene Expression Data
Hsun-Hsien Chang¹, Jonathan J. Smith², Marco F. Ramoni¹
¹ Children’s Hospital Informatics Program, Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School
² Department of Mathematics, Massachusetts Institute of Technology
IEEE Workshop on Signal Processing Systems
October 7, 2010
Background
• Genetic information flows from DNA to RNA through transcription.
• Gene expression measures RNA abundance in cells, revealing gene activity.
• Modern microarray technologies can assess the expression of 50K genes in parallel.
Clinical Applications
[Figure: gene expression sampled at times T0 through T5 for multiple patients in distinct biological conditions.]
• As costs fall, more samples can be collected in a single study, enabling a new clinical application:
– Monitor time-series gene expression in response to drugs, treatments, vaccines, virus infection, etc.
Time-Series Gene Expression Analysis
• Since genes interact with each other in cells, an intriguing analysis is to infer gene networks:
– Detailed models (e.g., differential equations).
– Abstract models (e.g., Boolean networks, in which each gene is either on or off).
– Probabilistic graphical models (e.g., dynamic Bayesian networks), which:
   • Do not require densely sampled data.
   • Model expression levels as random variables to handle noisy expression measurements and biological variability.
   • Allow the inferred networks to be used for prediction.
Data Representation by Bayesian Networks
• Bayesian networks are directed acyclic graphs where:
– Nodes correspond to random variables (i.e., expressions of genes, clinical variables).
– Directed arcs encode the conditional probabilities of the target (child) nodes given the source (parent) nodes.
[Figure: example Bayesian network over nodes A, B, C, D, E.]
• Dynamic Bayesian networks use arcs to indicate temporal dependency.
– Example: variables X and Y at time T modulate variable Z at time T+1.
– The network model can serve as a prediction tool.
[Figure: X_T and Y_T point to Z_{T+1}; with X_T and Y_T given, Z_{T+1} is predicted.]
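The prediction step in the example above can be sketched in a few lines. This is a minimal illustration, assuming expression levels have been discretized into "lo" and "hi"; the function names `fit_cpt` and `predict` and the training triples are hypothetical and do not come from the talk.

```python
from collections import Counter, defaultdict

def fit_cpt(samples):
    """Estimate P(Z at T+1 | X at T, Y at T) by counting (x, y, z_next) triples."""
    counts = defaultdict(Counter)
    for x, y, z_next in samples:
        counts[(x, y)][z_next] += 1
    cpt = {}
    for parents, ctr in counts.items():
        total = sum(ctr.values())
        cpt[parents] = {z: n / total for z, n in ctr.items()}
    return cpt

def predict(cpt, x, y):
    """Return the most probable value of Z at T+1 given X and Y at T."""
    dist = cpt[(x, y)]
    return max(dist, key=dist.get)

# Hypothetical discretized levels, invented purely for illustration.
training = [
    ("hi", "hi", "hi"), ("hi", "hi", "hi"), ("hi", "hi", "lo"),
    ("hi", "lo", "lo"), ("lo", "hi", "lo"), ("lo", "lo", "lo"),
]
cpt = fit_cpt(training)
print(predict(cpt, "hi", "hi"))  # prints: hi
```

The table `cpt` plays the role of the arcs into Z_{T+1}: once it is estimated, any given pair (X_T, Y_T) seen in training yields a predicted Z_{T+1}.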
Network Inference Engine
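[Figure: transition network from the clinical variable V_T and the genes A_T, B_T, C_T, ..., N_T at time T to V_{T+1}, A_{T+1}, B_{T+1}, C_{T+1}, ..., N_{T+1} at time T+1.]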
• First-order Markov process: data at time T+1 depend only on the preceding time T.
• For each variable at time T+1, search for the set of variables at time T that has the highest likelihood of modulating its value at T+1.
• Step-wise search algorithm.
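A step-wise parent search of this kind might look like the sketch below. The slides do not name the scoring function, so the K2-style Bayesian score used here is an assumption, as are the function names, the `max_parents` cap, and the toy data; the sketch adds one candidate parent at a time and stops when the score no longer improves.

```python
import math
from collections import Counter, defaultdict

def k2_log_score(prev_rows, next_values, parents, n_states=2):
    """Log K2 score of a candidate parent set (assumed metric, uniform prior)."""
    counts = defaultdict(Counter)
    for row, z in zip(prev_rows, next_values):
        counts[tuple(row[p] for p in parents)][z] += 1
    score = 0.0
    for ctr in counts.values():
        n_j = sum(ctr.values())
        score += math.lgamma(n_states) - math.lgamma(n_j + n_states)
        score += sum(math.lgamma(n + 1) for n in ctr.values())
    return score

def stepwise_parent_search(prev_rows, next_values, candidates, max_parents=3):
    """Greedily add the parent at time T that most improves the score."""
    parents = []
    best = k2_log_score(prev_rows, next_values, parents)
    while len(parents) < max_parents:
        scored = [(k2_log_score(prev_rows, next_values, parents + [c]), c)
                  for c in candidates if c not in parents]
        if not scored:
            break
        top_score, top = max(scored)
        if top_score <= best:
            break  # no candidate improves the score, so stop
        parents.append(top)
        best = top_score
    return parents, best

# Toy data (invented): the target at T+1 simply copies gene A at T.
prev_rows = [{"A": a, "B": b, "C": c}
             for a in (0, 1) for b in (0, 1) for c in (0, 1)]
next_values = [row["A"] for row in prev_rows]
parents, score = stepwise_parent_search(prev_rows, next_values, ["A", "B", "C"])
print(parents)
```

On this toy data the search selects A alone, because adding B or C splits the data further without explaining anything new.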
Inference of Whole Dynamic Gene Network
• Infer a transition network between every pair of times.
[Figure: chained transition networks over V, A, B, C, ..., N at times T, T+1, and T+2.]
Parallelize Learning Individual Transition Nets
[Figure: the transition networks T → T+1 and T+1 → T+2 shown as separate networks over V, A, B, C, ..., N.]
Parallelize Parent Searching of Individual Variables
[Figure: a single transition network from V_T, A_T, B_T, C_T, ..., N_T to V_{T+1}, A_{T+1}, B_{T+1}, C_{T+1}, ..., N_{T+1}.]
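Both levels of parallelism (one task per transition network, and within it one task per target variable) can be expressed as a flat list of independent tasks. The sketch below is only an illustration: `find_parents` is a hypothetical stand-in that picks the single best-matching variable rather than running a real parent search, and a thread pool is used for portability, whereas a CPU-bound search in Python would more likely use a process pool.

```python
from concurrent.futures import ThreadPoolExecutor

def find_parents(task):
    """Hypothetical stand-in for the parent search of one target variable."""
    t, target, before, after = task

    def agreement(var):
        # Number of patients whose value of `var` at t equals the target at t+1.
        return sum(b[var] == a[target] for b, a in zip(before, after))

    best = max(sorted(before[0]), key=agreement)
    return (t, target), [best]

def learn_all_transitions(series, max_workers=4):
    """series[t] is a list of per-patient dicts {variable: level} at time t."""
    tasks = [(t, target, series[t], series[t + 1])
             for t in range(len(series) - 1)      # one transition network per t
             for target in sorted(series[t + 1][0])]  # one search per variable
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(find_parents, tasks))

# Toy data (invented): four patients, variables V and A, three time points.
t0 = [{"V": 0, "A": 0}, {"V": 0, "A": 1}, {"V": 1, "A": 0}, {"V": 1, "A": 1}]
t1 = [{"V": p["A"], "A": p["V"]} for p in t0]   # V and A swap roles
t2 = [dict(p) for p in t1]                      # each variable persists
network = learn_all_transitions([t0, t1, t2])
print(network)
```

Because each task reads only the data at times t and t+1, no task depends on the result of another, which is what makes the search straightforward to distribute.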
Step-by-Step Prediction
[Figure: networks over V, A, B, C, ..., N at times T, T+1, and T+2. Given data at T, the values at T+1 are predicted; given data at T+1, the values at T+2 are predicted.]
Forecasting by Initial Data
[Figure: networks over V, A, B, C, ..., N at times T, T+1, and T+2. Only the data at T are given; the values at T+1 and T+2 are both predicted.]
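The difference between step-by-step prediction and forecasting from initial data can be made concrete with a small sketch. It assumes each learned transition network is available as a function from the state at one time to the predicted state at the next; the two transition rules and the observed values below are hypothetical.

```python
def predict_step_by_step(transitions, observed):
    """Predict each time T+1 from the data actually observed at time T."""
    return [step(state) for step, state in zip(transitions, observed)]

def forecast_from_initial(transitions, initial):
    """Predict every later time from the initial data only, feeding each
    prediction into the next transition."""
    state, trajectory = initial, []
    for step in transitions:
        state = step(state)
        trajectory.append(state)
    return trajectory

# Hypothetical nonstationary rules: a different rule for each transition.
transitions = [
    lambda s: {"V": s["A"], "A": s["A"]},       # T   -> T+1
    lambda s: {"V": s["V"], "A": 1 - s["V"]},   # T+1 -> T+2
]
# Invented observations at T and T+1.
observed = [{"V": 0, "A": 1}, {"V": 0, "A": 1}]

print(predict_step_by_step(transitions, observed))
print(forecast_from_initial(transitions, observed[0]))
```

In this toy case the observed state at T+1 differs from what the first rule predicts, so the two procedures disagree at T+2: step-by-step prediction is corrected by the new observation, while the forecast carries its earlier prediction forward.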
Clinical Study: HIV Viral Load Tracking
• The global AIDS epidemic is one of the greatest threats to human health, causing 2 million deaths every year.
• Viral load (i.e., virus density in blood) is:
– associated with clinical outcomes.
– an indicator of which treatment physicians should provide.
• With a tool to predict/forecast the viral load trajectory, physicians could foresee how patients progress to AIDS and could allocate the best treatments upfront.
• Data: Fourteen untreated adult patients (12 Africans, 2 Americans) during acute infection.
[Figure: viral load and gene expression measured at time points labeled Enroll, 1, 2, 4, 12, and 24.]
Dynamic Gene Network of HIV Viral Load
Accuracy of HIV Viral Load Tracking
• Prediction accuracy:
                                Fitted Validation    Cross Validation
                                (Accuracy)           (Robustness)
   Dynamic Gene Network         97.8%                95.8%
   Viral Load Auto-Regression   90.1%                89.5%
• Forecasting accuracy:
                                Fitted Validation    Cross Validation
                                (Accuracy)           (Robustness)
   Dynamic Gene Network         92.9%                91.8%
   Viral Load Auto-Regression   88.7%                87.0%
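For readers unfamiliar with the baseline, the sketch below shows one way an auto-regressive viral load model could be cross-validated. Everything in it is an assumption: the slides do not state the order of the auto-regression, the cross-validation scheme, or how the accuracy percentages were computed, so the sketch uses a first-order model, leave-one-patient-out validation, and mean absolute error, on invented trajectories.

```python
def fit_ar1(trajectories):
    """Least-squares fit of y[t+1] = a * y[t] + b over all patients (assumed AR(1))."""
    pairs = [(y[t], y[t + 1]) for y in trajectories for t in range(len(y) - 1)]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    a = sxy / sxx
    return a, mean_y - a * mean_x

def leave_one_out_error(trajectories):
    """Mean absolute one-step error, refitting with each patient held out."""
    errors = []
    for i, held_out in enumerate(trajectories):
        a, b = fit_ar1(trajectories[:i] + trajectories[i + 1:])
        errors += [abs(a * held_out[t] + b - held_out[t + 1])
                   for t in range(len(held_out) - 1)]
    return sum(errors) / len(errors)

# Invented trajectories that follow y[t+1] = 0.5 * y[t] + 1 exactly.
patients = [[4.0, 3.0, 2.5], [6.0, 4.0, 3.0], [8.0, 5.0, 3.5]]
a, b = fit_ar1(patients)
cv_error = leave_one_out_error(patients)
print(a, b, cv_error)
```

Fitting on all patients corresponds loosely to the "fitted validation" column and the held-out error to the "cross validation" column, although the talk's exact definitions may differ.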
30 Genes Dynamically Interact with Viral Load
AMY1A: amylase, alpha 1a; salivary
OTOF: otoferlin
TNFAIP6: tumor necrosis factor, alpha-induced protein 6
KIR2DL3: killer cell immunoglobulin-like receptor, two domains, long cytoplasmic tail, 3
NBPF14: neuroblastoma breakpoint family, member 14
OSBP2: oxysterol binding protein 2
IRF7: interferon regulatory factor 7
CFD: complement factor d (adipsin)
HLA-DQA1: major histocompatibility complex, class ii, dq alpha 1
HLA-DRB1: major histocompatibility complex, class ii, dr beta 1
RPS23: ribosomal protein s23
GPR56: g protein-coupled receptor 56
IFI44L: interferon-induced protein 44-like
CCL23: chemokine (c-c motif) ligand 23
KLRC2: killer cell lectin-like receptor subfamily c, member 2
IFIT3: interferon-induced protein with tetratricopeptide repeats 3
SOS1: son of sevenless homolog 1 (drosophila)
G1P2: interferon, alpha-inducible protein (clone ifi-15k)
LOC652775: similar to ig kappa chain v-v region l7 precursor
CCL3L1: chemokine (c-c motif) ligand 3-like 1
MBP: myelin basic protein
S100P: s100 calcium binding protein p
IFITM3: interferon induced transmembrane protein 3 (1-8u)
MX1: myxovirus (influenza virus) resistance 1, interferon-inducible protein p78 (mouse)
HERC5: hect domain and rld 5
NME4: non-metastatic cells 4, protein expressed in
HLA-DQB1: major histocompatibility complex, class ii, dq beta 1
LOC653157: similar to iduronate 2-sulfatase precursor (alpha-l-iduronate sulfate sulfatase) (idursulfase)
LOC643313: similar to hypothetical protein loc284701
RSAD2: radical s-adenosyl methionine domain containing 2
Conclusions
• A Bayesian network framework to infer dynamic gene networks from time-series gene expression microarrays:
– Does not require densely sampled microarray data.
– Handles measurement noise and biological variability.
– Captures temporal dependency with a first-order Markov process.
– Finds the optimal network model with a parallelized search algorithm.
• Application to HIV viral load tracking shows how our method can be used in clinical studies:
– Our network model tracks viral load trajectories with higher accuracy than a viral load auto-regressive model.
– Our model provides candidate gene targets for drug/vaccine development.
Acknowledgements
Supported by Center for HIV/AIDS Vaccine Immunology (CHAVI) # U19 AI067854-06:
• National Institute of Allergy and Infectious Diseases (NIAID)
• National Institutes of Health (NIH)
• Division of AIDS (DAIDS)
• U.S. Department of Health and Human Services (HHS)
Stationary Network Inference
• All networks between pairs of times are identical.
[Figure: networks over VL, A, B, C, ..., N at times T, T+1, T+2, and T+3, with the same transition network repeated between each pair of times.]
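One practical consequence of the stationarity assumption is how the training data are grouped. The sketch below is an illustration only, with hypothetical function names and invented values: a nonstationary model learns each transition from its own set of (time t, time t+1) pairs, while a stationary model pools all pairs into one set because a single network is shared across time.

```python
def transition_pairs(series):
    """Nonstationary view: one list of (state at t, state at t+1) per transition."""
    return [list(zip(series[t], series[t + 1])) for t in range(len(series) - 1)]

def pooled_pairs(series):
    """Stationary view: every transition contributes to a single training set."""
    return [pair for pairs in transition_pairs(series) for pair in pairs]

# Invented data: series[t][i] is the state of patient i at time t.
series = [
    [{"VL": 5, "A": 1}, {"VL": 4, "A": 0}],   # T
    [{"VL": 4, "A": 1}, {"VL": 4, "A": 1}],   # T+1
    [{"VL": 3, "A": 0}, {"VL": 5, "A": 1}],   # T+2
]
per_transition = transition_pairs(series)
pooled = pooled_pairs(series)
print(len(per_transition), [len(p) for p in per_transition], len(pooled))
```

Pooling gives each parent search more samples, at the cost of assuming that the same dependencies hold at every time point.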
Pathway: Immune Response (16/30 genes, p < 10^-6)
[Slide repeats the 30-gene list with the genes in this pathway highlighted.]
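The slides do not say which test produced the pathway p-values. A common choice for this kind of over-representation analysis is the hypergeometric tail probability, sketched below under that assumption; the function name, the background size, and the pathway size are hypothetical and are not taken from the talk.

```python
from math import comb

def enrichment_p_value(n_background, n_pathway, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn at random from
    n_background genes, of which n_pathway belong to the pathway."""
    total = comb(n_background, n_selected)
    upper = min(n_pathway, n_selected)
    tail = sum(comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
               for k in range(n_overlap, upper + 1))
    return tail / total

# Small check that can be verified by hand: (36 + 4) / 120 = 1/3.
p_small = enrichment_p_value(10, 4, 3, 2)
print(p_small)

# 16 of 30 selected genes in a pathway; the background of 20000 genes and
# the pathway size of 1000 are invented for illustration only.
p_example = enrichment_p_value(20000, 1000, 30, 16)
print(p_example)
```

The result depends strongly on the assumed background and pathway sizes, so the second number should not be read as a reproduction of the p-value on the slide.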
Pathway: Antiviral Defense (8/30 genes, p < 10^-3)
[Slide repeats the 30-gene list with the genes in this pathway highlighted.]
Pathway: Inflammatory Response (5/30 genes, p<0.05)
[Slide repeats the 30-gene list with the genes in this pathway highlighted.]
Interferon Family Dominates
[Slide repeats the 30-gene list with a legend marking genes that belong to 3 pathways, 2 pathways, or 1 pathway.]