System approaches for complex diseases
Bayesian network and its applications
Jun Zhu
Genetics, Rosetta Inpharmatics, Merck & Co.

Outline
• Methods
– Integration of genetics and gene expression
– Integration of data from multiple tissues
– Construction of causal graphic networks
– Integration of transcription factor binding sites and protein-protein interaction (PPI) data
• Applications
– Target selection and prioritization
– Integrate with siRNA screening data
– Integrate with proteomics data

Biological networks/pathways
[Figure: model types arranged along two axes, "Data required to train models" and "Biological details revealed": gene sets, association networks, probabilistic causal networks, mechanism-based models]

Biological networks/pathways
[Figure: the same axes showing only association networks and probabilistic causal networks]

[Figure: 4267 top genes in BxH liver (female), rescan QTL overlap (num(p(GGC)<1e-15)>100 ~ abs(cor)>0.5886)]
1. How do genes in the same module interact?
2. How do genes in different modules interact?
3. Can we make causal inferences to elucidate signaling pathways for disease targets?

A framework for data integration
• Knowledge: Medline, Biocarta/Biopathway, biologists
• High-throughput data: microarray data, proteomic data
• How to integrate them?
[Diagram labels: Database, Genomics, Genetics, GUI, Hypothesis, test]

Bayesian network
• Decompose the joint distribution based on conditional independence:
$p(G) = p(X_1, X_2, \ldots, X_n) = \prod_i p(X_i \mid \mathrm{Pa}(X_i))$
• Find the maximum likelihood of G given data D, p(D|G)

Bayesian network
• How is it reconstructed? The data are fixed; search for the best model p(D|G)
– Local search method (edge insertion, deletion, reversal)
– Complexity penalty (BIC score)
– Bayesian average (1000 independent runs to explore possible space)

Bayesian network: practical issues
• How is it reconstructed?
– NP-hard problem
– Limit numbers of nodes
– Limit search space

BN: Markov equivalent
• A Bayesian network is just a graphical model
• By itself it does not reveal causal information
• A → B and B → A are equivalent: $p(A, B) = p(B \mid A)\,p(A) = p(A \mid B)\,p(B)$

Bayesian network: A, B and C are correlated, but through different mechanisms.
[Figure: alternative graph structures over A, B, C and a locus L]

BN: priors of causal information
• Break Markov equivalence by introducing priors for structures
• Set priors so that p(A → B) is different from p(B → A)
• Priors were derived from genetic information

Integration of genetics and gene expression
Experimental design, experimental data:
• Genetic map
• Genotype
• Gene expression of relevant tissues
• Clinical end points

Ingredients for inferring causality
• Perturbations with a causal anchor
– KOs/transgenics present a known perturbation (causal anchor) where response can be studied
– Natural variation in a segregating population provides the same type of causal anchor (ability to identify DNA variations associated with response)
[Figure: DNA variants AACGGTT and AACAGTT supporting gene X. Variation in DNA leads to variation in mRNA: high expression, alt. splicing, codon change, etc. versus low expression, no alt. splicing, no codon change, etc.]
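The Markov-equivalence point from the "BN: Markov equivalent" and "BN: priors of causal information" slides can be checked numerically. The following is a minimal sketch, assuming two binary variables with maximum-likelihood parameters estimated from counts. The toy observations and the 0.8/0.2 structure prior are hypothetical values chosen for illustration (for example, a prior favoring A as the parent because A has a cis-acting eQTL); they are not taken from the talk.

```python
import math
from collections import Counter

def max_log_likelihood(data, parent_axis):
    """Maximized log-likelihood of the two-node graph parent -> child.

    data: list of (a, b) observations; parent_axis 0 means A -> B, 1 means B -> A.
    Parameters are the maximum-likelihood estimates (relative frequencies).
    """
    n = len(data)
    pair_counts = Counter(data)
    parent_counts = Counter(obs[parent_axis] for obs in data)
    total = 0.0
    for obs in data:
        parent_state = obs[parent_axis]
        p_parent = parent_counts[parent_state] / n
        p_child_given_parent = pair_counts[obs] / parent_counts[parent_state]
        total += math.log(p_parent) + math.log(p_child_given_parent)
    return total

# Hypothetical observations of (A, B).
data = [(0, 0), (0, 0), (1, 1), (0, 1), (1, 0), (1, 1), (1, 1), (0, 0)]

ll_a_to_b = max_log_likelihood(data, parent_axis=0)
ll_b_to_a = max_log_likelihood(data, parent_axis=1)

# Hypothetical structure prior; in the talk such priors come from genetic information.
log_prior = {"A->B": math.log(0.8), "B->A": math.log(0.2)}
score_a_to_b = ll_a_to_b + log_prior["A->B"]
score_b_to_a = ll_b_to_a + log_prior["B->A"]

print(f"log-likelihood A->B: {ll_a_to_b:.6f}, B->A: {ll_b_to_a:.6f}")
print(f"score with prior A->B: {score_a_to_b:.6f}, B->A: {score_b_to_a:.6f}")
```

Both orientations reach the same likelihood because each factorization multiplies back to the same joint frequencies, so the data alone cannot pick a direction; only the structure prior separates the two scores.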
Variation in mRNA leads to variation in protein, which in turn can lead to disease.

Distinguishing Causal from Reactive Genes
[Figure: mouse examples labeled ob/ob, db/db, Avy, leptin, obesity, eumelanin RNAs]
• Causative model (L → T1 → T2): $P(L, T_1, T_2) = P(L)\,P(T_1 \mid L)\,P(T_2 \mid T_1)$
• Reactive model (L → T2 → T1): $P(L, T_1, T_2) = P(L)\,P(T_2 \mid L)\,P(T_1 \mid T_2)$
• Independent model (T1 ← L → T2): $P(L, T_1, T_2) = P(L)\,P(T_2 \mid L)\,P(T_1 \mid L)$
Legend: L: DNA locus controlling RNA levels and/or clinical traits; T1 (R): quantitative trait 1; T2 (C): quantitative trait 2
Schadt E, et al., Nature Genetics, 2005

Inferring causal relationships
• Gene A has a cis-acting QTL (Chr 1, Locus 1)
– Gene expression of A and D correlate
– A and D have overlapping eQTL on Chr 1 at Locus 1
– Gene A controls Gene D
• Genes with complex trans-acting QTLs (Chr 2, Locus 1, Locus 2, Locus 3)
– Gene expression of B, C & E correlate
– B, C and E have overlapping eQTL on Chr 2
– Genes B & C control Gene E

Bayesian network: integrating genetics
• Experimental Hsd11b1 signature: mice treated with Hsd1 inhibitor
• Prediction of Hsd1 signatures based on BxD data
– Correlation to Hsd1: 10% of predicted signature overlaps with the experimental one
– BN without genetics: 20% of predicted signature overlaps with the experimental one
– BN with genetics: 52% of predicted signature overlaps with the experimental one
Zhu J, et al., Cytogenet Genome Res. 2004

BN: simulation study
• Genetics information is critical when sample size is small
Zhu J, et al., PLoS Comput Biol. 2007

How to integrate protein-protein interaction data?
• Can we find overlapped information better?
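The three candidate models on the "Distinguishing Causal from Reactive Genes" slide (causative, reactive, independent) can be compared by their maximized likelihoods. The sketch below is illustrative only: it assumes linear-Gaussian conditionals and simulated data in which the causative chain L → T1 → T2 is the truth, and the names `gaussian_loglik`, `simulate`, and `score_models` are hypothetical helpers, not the procedure of Schadt et al. Since P(L) is common to all three models it is dropped, and since each model fits two regressions with the same number of parameters, comparing raw likelihoods ranks them the same way a BIC comparison would.

```python
import math
import random

def gaussian_loglik(y, x):
    """Maximized Gaussian log-likelihood of the regression y ~ intercept + slope * x."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / n  # maximum-likelihood variance estimate
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)

def simulate(n, seed=0):
    """Simulate genotype L (0/1/2), then T1 driven by L, then T2 driven by T1."""
    rng = random.Random(seed)
    locus = [rng.choice([0, 1, 2]) for _ in range(n)]
    t1 = [2.0 * g + rng.gauss(0.0, 1.0) for g in locus]
    t2 = [a + rng.gauss(0.0, 1.0) for a in t1]
    return locus, t1, t2

def score_models(locus, t1, t2):
    """Log-likelihood of each model, omitting the shared P(L) term."""
    return {
        "causative": gaussian_loglik(t1, locus) + gaussian_loglik(t2, t1),
        "reactive": gaussian_loglik(t2, locus) + gaussian_loglik(t1, t2),
        "independent": gaussian_loglik(t1, locus) + gaussian_loglik(t2, locus),
    }

locus, t1, t2 = simulate(1000)
scores = score_models(locus, t1, t2)
best = max(scores, key=scores.get)
for name, value in scores.items():
    print(f"{name:12s} log-likelihood = {value:.1f}")
print("best-supported model:", best)
```

The genotype acts as the causal anchor here: without L, the chains T1 → T2 and T2 → T1 would be Markov equivalent and score identically.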
[Figure: 4-cliques, 3-cliques, and a clique community (partial clique)]

Comparing protein-protein interactions with gene co-expression
[Figure values: 0.51, 0.50, 0.29, 0.19]

Integrating transcription factor (TF) binding data and PPI
• Introducing scale-free priors for TF and large PPI complex
[Equation: prior p(T → g) defined through a weight w(T) involving a logarithm over genes g_i, correlations r(T, g), and a threshold r_cutoff]
• Fixed prior for small PPI complex

Application to yeast cross

BN                               KO data   GO terms   TF data
w/o any priors                   125       55         26
w/ genetics priors               139       59         34
w/ genetics, TF and PPI priors   152       66         52

The network that integrates genetics, TF and PPI data has better prediction power.

Mechanism for a QTL hot spot
[Figure legend: red: TF; green: PPI]
Zhu J, et al., Nature Genetics, 2008

Applications
• How to use networks to prioritize candidates?
• How to use networks to identify causal genes in genome-wide association studies?

Driver potential
[Figure: query gene, hypergeometric test]

Validating connections in human cohorts
• Study of the genetics of gene expression in pedigrees using blood samples.
• Blood was collected from 455 individuals from 51 Icelandic families (most families were dense three-generation pedigrees).
• Samples were expression profiled against a common reference pool.
• Samples were genotyped for 1000 markers across the genome.
• Each of the 455 individuals was scored for 40 clinical traits.

RG1003 falls under linkage peak for obesity in females
• RG1003 supported by obesity/diabetes linkages in the published literature
• RG1003 supported by Decode linkage
[Figure labels: Kissebah et al. 2000; obese females; RG1003]

RG1033 has cis-acting QTL in Decode family blood expression data
• High-expressor allele for RG1003 associates with high BMI
• Overlap between cQTL and eQTL
[Figure: cis eQTL for RG1003 (aka GPR105) on C03; LOD curve with BMI>35 trace; markers RG1033, LD1, LD2]

GPR105 expression: association analysis
The best single marker association
• Clinical trait: marker D3S1279, allele 10, p-value 8 x 10^-6, RR 2.2
– Top 50% BMI: 205, Aff freq 0.21
– Bottom 50% BMI: 205, Ctrl freq 0.11
• Expression trait (GPR105): marker D3S1279, allele 10, p-value 1 x 10^-6, R^2 0.05
– Expressor: high, allele 10; low, allele 6
[Figure: -log p for markers across the GPR105 locus (LD1, LD2)]

ASO experiment in DIO mice
[Figure: weight gain in DIO C57BL/6 mice by week (0.5 to 4.5 wk); groups: vehicle, GPR105, SCD1, scrambled]

These same approaches can be used to assign function to the large number of GWA studies being released into the public domain today
• The WTCCC paper reports GWA results for 7 common diseases; alongside it came a paper focusing on the T1D associations, in which genes corresponding to the associations are identified.

In the T1D Paper, Genes Corresponding to the Associations in the WTCCC Paper Are Identified
• But what functional support is provided for these identifications?
• Consider the chr 12q13 association and the identification of ERBB3:
– The gene was closest to the associated SNP
– SH2B3 binds ERBB3, where ITAMs bind proteins like SH2B3 with SH2 signaling domains involved in immune inflammatory events that lead to autoimmune pancreatic beta-cell destruction in T1D

[Figure: 1 MB window around rs11171739; genes adjacent to rs11171739; cis eSNP distribution (liver), > 10% of cis eSNPs]

Rps26, but NOT ERBB3, Is Significantly Associated with rs11171739 in Cis
All rows: snp_chr 12, snp_pos 54756892

gene_symbol           log10_kw_pvalue   gene_chr   cis_trans
MMT00321              36.076895         8          trans
MMT17394              35.9099           18         trans
MMT12973              35.418233         10         trans
MMT21703              34.886122         5          trans
MMT00741              34.774948         9          trans
MMT09493              34.746478         1          trans
MMT15828              34.496097         X          trans
ERBB3 Proximal Gene   34.418756         12         cis
MMT12163              34.01855          15         trans
MMT23083              33.982509         8          trans
MMT20493              32.994015         7          trans
MMT10434              32.594523         8          trans
MMT06311              32.578843         7          trans
MMT15103              24.875821         10         trans
ERBB3                 0.39735           12         cis

• ERBB3 expression activity has 2 suggestive trans eQTL, but is not at all linked to the T1D SNP
• The Rps26 expression trait is very strongly linked to the T1D SNP; nearly 40% of the in vivo expression of this gene is explained by this SNP
• Other genes strongly linked to the T1D SNP in trans are homologs of the Rps26 gene

But now look at probabilistic causal networks
All crosses, all tissues
• Liver
• Adipose
• Skeletal muscle
• Islets
• Whole brain
• Hypothalamus
[Figure: network around Rps26 with T1D KEGG pathway genes marked]
Schadt E, et al., PLoS Biology, 2008

Functional Enrichment of Rps26 Mouse Bayesian Network Genes
Columns: Similar Set; Expectation; Input Identifiers (some identifier lists are truncated in the source)
1. Major histocompatibility complex antigen; 3.59615679443374E-11; H2-Aa;H2-Ab1;H2-Eb1;H2-M3;H2-DMa;H2-DMb1;H2-Q2;MMT00082085;H2
2. T-cell mediated immunity; 4.11814903693412E-11; C2ta;Cd2;Ctss;H2-Aa;H2-Ab1;H2-Eb1;H2-M3;H2-DMa;H2-DMb1;H2
3. antigen processing; 4.35718665292356E-10; Rmcs1;Ctss;H2-Aa;H2-Ab1;H2-Eb1;H2-DMa;H2-DMb1;MMT00082085;Hfe;Psmb8
4. MHCII-mediated immunity; 2.19156051592854E-09; C2ta;Ctss;H2-Aa;H2-Ab1;H2-Eb1;H2-DMa;H2-DMb1;MMT00072401
5. antigen processing, exogenous antigen via; 1.31842207155735E-08; Rmcs1;Ctss;H2-Aa;H2-Ab1;H2-Eb1;H2-DMa;H2-DMb1
6. antigen presentation, exogenous antigen; 1.48086534305264E-08; Rmcs1;Fcgr3;H2-Aa;H2-Ab1;H2-Eb1;H2-DMa;H2-DMb1;Psmb8
7. Type I diabetes mellitus; 2.60246885295535E-08; H2-Aa;H2-Ab1;H2-Eb1;H2-M3;H2-DMa;H2-DMb1;H2-Q2;MMT00082085;H2-T9;Hspd1
8. Antigen processing and presentation; 3.024258456011E-08; C2ta;Ctss;H2-Aa;H2-Ab1;H2-Eb1;H2-M3;H2-DMa;H2-DMb1;H2
9. MHC class II receptor activity; 5.66821865604424E-07; Rmcs1;H2-Aa;H2-Ab1;H2-Eb1;H2-DMa;H2-DMb1
10. Cell adhesion molecules (CAMs); 6.26076191318305E-07; Cd2;H2-Aa;H2-Ab1;H2-Eb1;H2-M3;H2-DMa;H2-DMb1;H2-Q2;MMT00082085;H2
11. antigen presentation; 8.01315717611796E-07; Rmcs1;Fcgr3;H2-Aa;H2-Ab1;H2-Eb1;H2-DMa;H2-DMb1;MMT00082085;Hfe;Psmb8
12. antigen presentation, exogenous antigen; 1.42894483510369E-05; H2-Aa;H2-Ab1;H2-Eb1;H2-DMa;H2-DMb1

What about the ERBB3 network in mouse?
• No functional enrichment in network genes (no T1D association)

AD sub-network
[Figure: AD sub-network with GO annotations: inflammation; anti-apoptosis; synaptic transmission (p-value = 1.3e-12). Legend: red: risk factor; yellow: progression marker (proteomic candidates); rectangle: association marker (proteomic data). Labeled genes: APOE, NPTXR, VGF, CDK5R2, MAPT, BDNF, A2M, APBB1IP]

How to understand phosphorylation changes detected by proteomics?
• 16 proteins' phosphorylation states changed after inducing PIN1 siRNA (16 proteomic hits);
• Gene expression signature of PIN1 siRNA is also defined;
• Phosphorylation change is the primary signal, gene expression change is the amplified signature.
• Do the two types of signals match?
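One common way to score whether two gene sets "match" is the hypergeometric test named on the driver-potential slide. The sketch below is a standard-library illustration under stated assumptions: the gene universe, the "network neighborhood of the proteomic hits", and the "expression signature" are hypothetical integer gene IDs invented for the example, and the talk does not say that this exact calculation was used for the PIN1 data.

```python
from math import comb

def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favorable / comb(population, draws)

# Hypothetical gene universe and gene sets (integer IDs stand in for gene symbols).
universe = set(range(1000))
neighborhood = set(range(0, 50))    # assumed network neighborhood of proteomic hits
signature = set(range(30, 130))     # assumed siRNA expression signature

overlap = len(neighborhood & signature)
p_value = hypergeom_sf(
    overlap,
    population=len(universe),
    successes=len(signature),
    draws=len(neighborhood),
)
print(f"overlap = {overlap}, enrichment p-value = {p_value:.3e}")
```

A small p-value says the overlap is larger than expected if the neighborhood were a random draw from the universe; the choice of universe (all genes on the array versus all genes in the network) changes the result and should be stated alongside any reported value.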
The two types of signals match around PIN1
[Figure: network around PIN1; diamond: phosphorylation; red: gene expression]

MP Rutper Vessey
Informatics, Biosoft, Biology GEL, GEM

Genetics: Eric Schadt
Biology/New Targets: John Lamb, Pek Lum, Valur Emilsson, Jonathan Derry, Michael Coon, I-Ming Wang, Debraj GuhaThakurta, Tao Xie, Xia Yang
Network/Systems Biology: Jun Zhu, Bin Zhang, Radu Dobrin, Zhidong Tu, Dmitri Volfson, Mani Narayanan
Data management/HP computing: Andrew Kasarskis, Archie Russell, Xavier Schildwachter, Eugene Chudin
Statistical Genetics: Cliona Molony, Solly Sieberts, Josh Millstein, Ke Hao, Hunter Fraiser
Finance/admin, PMs: Sonia, Christine, and Rob*, Chunsheng Zhang*

Merck Collaborators
Obesity/Diabetes: Marc Reitman, Nancy Thornberry, Doug MacNeil, Charles Rosenblum, Su Chen, Shirly Pinto, Brian Kennedy, Joe Mancini, Joel Berger, Sajjad A. Qureshi
Cardiovascular: Sam Wright, Carl Sparrow, Marty Springer, Gerry Waters, Kenny Wong
Sleep: John Renger
Alzheimer's: David Stone
Cancer: Stephen Friend, Theresa Zhang, Joseph Marszalek, Andrew Bloecher, Vinayak Kulkarni
ACSM: Jeff Sachs, Arthur Fridman, Matthew C. Wiener, Eric Minch
Metabolite/Toxicogenomics: Frank Sisteria, Bill Scheffer, Ethan Xu, Qiuwei Xue
Other Merck Collaborators: Andy Plump, Larry Peterson, Erik Lund

External Collaborators
UW: Steve Schwartz, Roger Baumgarner
UWisc: Attie group
UCLA: Jake Lusis
UNL/UNC: Daniel Pomp
Decode: Kari Stefansson
NSI: Yanqing Chen
Harvard: Jun Liu
Berkeley: Rachel Brem
Princeton: Leonid Kruglyak