Harvard Medical School, Massachusetts Institute of Technology

Inferring Nonstationary Gene Networks from Temporal Gene Expression Data
Hsun-Hsien Chang (1), Jonathan J. Smith (2), Marco F. Ramoni (1)
(1) Children’s Hospital Informatics Program, Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School
(2) Department of Mathematics, Massachusetts Institute of Technology
IEEE Workshop on Signal Processing Systems, October 7, 2010

Background
• Genetic information flows from DNA to RNA through transcription.
• Gene expression is the measure of RNA abundance in cells and reveals gene activity.
• Modern microarray technologies can assess the expression of 50K genes in parallel.

Clinical Applications
Multiple patients in distinct biological conditions.
[Figure: gene expression sampled at times T0, T1, T2, T3, T4, T5]
• As costs have fallen, more samples can be collected in a single study.
• A new clinical application:
  – Monitor time-series gene expression in response to drugs, treatments, vaccines, virus infection, etc.

Time-Series Gene Expression Analysis
• Since genes interact with each other in cells, an intriguing analysis is to infer gene networks:
  – Detailed models (e.g., differential equations).
  – Abstract models (e.g., Boolean networks, where each gene is either on or off).
  – Probabilistic graphical models (e.g., dynamic Bayesian networks), which:
    • Do not require densely sampled data.
    • Model expression levels as random variables to handle noisy expression measurements and biological variability.
    • Use the inferred networks to make predictions.

Data Representation by Bayesian Networks
• Bayesian networks are directed acyclic graphs where:
  – Nodes correspond to random variables (i.e., expressions of genes, clinical variables).
  – Directed arcs encode conditional probabilities of the target (child) nodes given the source (parent) nodes.
[Figure: example directed acyclic graph over nodes A, B, C, D, E]
• Dynamic Bayesian networks are Bayesian networks whose arcs indicate temporal dependency.
  – Example: variables X and Y at time T modulate variable Z at time T+1.
  – The network model can serve as a prediction tool.
[Figure: X and Y at time T are given; Z at time T+1 is predicted]

Network Inference Engine
[Figure: clinical variable V and genes A, B, C, ..., N at time T, with arcs to the same variables at time T+1]
• First-order Markov process: data at time T+1 depend only on data at the preceding time T.
• For each variable at time T+1, search for the set of variables at time T that has the highest likelihood of modulating its value at T+1.
• Step-wise search algorithm.

Inference of Whole Dynamic Gene Network
• Infer a transition network between each pair of consecutive time points.
[Figure: V, A, B, C, ..., N at times T, T+1, T+2, with one transition network per pair of consecutive times]

Parallelize Learning Individual Transition Nets
[Figure: the transition network from T to T+1 and the transition network from T+1 to T+2, learned separately]

Parallelize Parent Searching of Individual Variables
[Figure: the parents at time T of each variable at time T+1, searched separately]

Step-by-Step Prediction
[Figure: measured data at time T are given and T+1 is predicted; measured data at T+1 are then given and T+2 is predicted]

Forecasting by Initial Data
[Figure: only the data at time T are given; T+1 is predicted, and T+2 is predicted from the predicted T+1]

Clinical Study: HIV Viral Load Tracking
• The global AIDS epidemic is one of the greatest threats to human health, causing 2 million deaths every year.
• Viral load (i.e., virus density in blood) is:
  – associated with clinical outcomes.
  – an indicator of which treatment physicians should provide.
• With a tool to predict or forecast the viral load trajectory, physicians could foresee how patients progress to AIDS and allocate the best treatments up front.
• Data: fourteen untreated adult patients (12 Africans, 2 Americans) during acute infection.
[Figure: viral load and gene expression measured at enrollment and at time points 1, 2, 4, 12, 24]

Dynamic Gene Network of HIV Viral Load

Accuracy of HIV Viral Load Tracking
• Prediction accuracy:
                                Fitted Validation   Cross Validation
                                (Accuracy)          (Robustness)
  Dynamic Gene Network          97.8%               95.8%
  Viral Load Auto-Regression    90.1%               89.5%
• Forecasting accuracy:
                                Fitted Validation   Cross Validation
                                (Accuracy)          (Robustness)
  Dynamic Gene Network          92.9%               91.8%
  Viral Load Auto-Regression    88.7%               87.0%

30 Genes Dynamically Interact with Viral Load
AMY1A: amylase, alpha 1a; salivary
OTOF: otoferlin
TNFAIP6: tumor necrosis factor, alpha-induced protein 6
KIR2DL3: killer cell immunoglobulin-like receptor, two domains, long cytoplasmic tail, 3
NBPF14: neuroblastoma breakpoint family, member 14
OSBP2: oxysterol binding protein 2
IRF7: interferon regulatory factor 7
CFD: complement factor d (adipsin)
HLA-DQA1: major histocompatibility complex, class ii, dq alpha 1
HLA-DRB1: major histocompatibility complex, class ii, dr beta 1
RPS23: ribosomal protein s23
GPR56: g protein-coupled receptor 56
IFI44L: interferon-induced protein 44-like
CCL23: chemokine (c-c motif) ligand 23
KLRC2: killer cell lectin-like receptor subfamily c, member 2
IFIT3: interferon-induced protein with tetratricopeptide repeats 3
SOS1: son of sevenless homolog 1 (drosophila)
G1P2: interferon, alpha-inducible protein (clone ifi-15k)
LOC652775: similar to ig kappa chain v-v region l7 precursor
CCL3L1: chemokine (c-c motif) ligand 3-like 1
MBP: myelin basic protein
S100P: s100 calcium binding protein p
IFITM3: interferon induced transmembrane protein 3 (1-8u)
MX1: myxovirus (influenza virus) resistance 1, interferon-inducible protein p78 (mouse)
HERC5: hect domain and rld 5
NME4: non-metastatic cells 4, protein expressed in
HLA-DQB1: major histocompatibility complex, class ii, dq beta 1
LOC653157: similar to iduronate 2-sulfatase precursor (alpha-l-iduronate sulfate sulfatase) (idursulfase)
LOC643313: similar to hypothetical protein loc284701
RSAD2: radical s-adenosyl methionine domain containing 2

Conclusions
• A Bayesian network framework to infer dynamic gene networks from time-series gene expression microarrays:
  – Does not require densely sampled microarray data.
  – Handles measurement noise and biological variability.
  – Captures temporal dependency with a first-order Markov process.
  – Finds the network model with a parallelized search algorithm.
• Application to HIV viral load tracking shows how our method can be used in clinical studies:
  – Our network model tracks viral load trajectories with higher accuracy than a viral load auto-regressive model.
  – Our model provides candidate gene targets for drug/vaccine development.

Acknowledgements
Supported by Center for HIV/AIDS Vaccine Immunology (CHAVI) # U19 AI067854-06:
• National Institute of Allergy and Infectious Diseases (NIAID)
• National Institutes of Health (NIH)
• Division of AIDS (DAIDS)
• U.S. Department of Health and Human Services (HHS)

Stationary Network Inference
• All networks between pairs of times are identical.
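The difference between a stationary network and the nonstationary one inferred in this talk can be illustrated with a small sketch. This is not the authors' implementation: it assumes a linear least-squares transition model as a stand-in for a learned transition network, and the function names and synthetic data are hypothetical. The nonstationary fit estimates one transition per pair of consecutive times; the stationary fit pools every pair and estimates a single shared transition.

```python
import numpy as np

def fit_transition(x_prev, x_next):
    """Least-squares W with x_next ~ x_prev @ W (stand-in for one transition network)."""
    W, _, _, _ = np.linalg.lstsq(x_prev, x_next, rcond=None)
    return W

def fit_nonstationary(data):
    """data has shape (patients, times, variables); returns one W per consecutive time pair."""
    return [fit_transition(data[:, t], data[:, t + 1])
            for t in range(data.shape[1] - 1)]

def fit_stationary(data):
    """Pools all (T, T+1) pairs and fits a single W shared by every transition."""
    n_vars = data.shape[2]
    x_prev = data[:, :-1].reshape(-1, n_vars)
    x_next = data[:, 1:].reshape(-1, n_vars)
    return fit_transition(x_prev, x_next)

# Synthetic example (hypothetical numbers): the dynamics change between time pairs.
rng = np.random.default_rng(0)
x0 = rng.normal(size=(50, 2))
x1 = x0 * 0.5   # transition T -> T+1 halves every variable
x2 = x1 * 2.0   # transition T+1 -> T+2 doubles every variable
data = np.stack([x0, x1, x2], axis=1)

per_pair = fit_nonstationary(data)  # one matrix per transition
shared = fit_stationary(data)       # a single compromise matrix
```

In this toy setting the per-pair fits recover the two different transitions, while the pooled fit returns one matrix that matches neither of them, which is the limitation the stationary assumption imposes when the underlying dynamics change over time.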
[Figure: one shared transition network repeated between viral load VL and genes A, B, C, ..., N at times T, T+1, T+2, T+3]

Pathway: Immune Response (16/30 genes, p < 10^-6)
[Figure: the 30-gene list shown above]

Pathway: Antiviral Defense (8/30 genes, p < 10^-3)
[Figure: the 30-gene list shown above]

Pathway: Inflammatory Response (5/30 genes, p < 0.05)
[Figure: the 30-gene list shown above]

Interferon Family Dominates
[Figure: the 30-gene list shown above; legend: 3 pathways; 2 pathways; 1 pathway]
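The pathway slides report counts such as 16 of 30 genes together with a p-value, but they do not say how the p-values were computed. One common choice for this kind of over-representation question is a hypergeometric test, sketched below under that assumption. The background size (20000 genes) and the number of genes annotated to the pathway (1000) are hypothetical placeholders, so the result is illustrative and is not meant to reproduce the figures on the slides.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which are annotated to the pathway."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Hypothetical background: 20000 genes on the array, 1000 annotated to the pathway.
# Observed: 16 of the 30 selected genes fall in the pathway.
p_value = hypergeom_sf(16, 20000, 1000, 30)
print(f"enrichment p-value under the assumed background: {p_value:.3g}")
```

The function uses exact integer arithmetic from the standard library, so it needs no external dependency; for large gene sets a library routine would be a reasonable substitute.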