The Measure of Synergy as a Tool in Systems Biology
D. Anastassiou
C2B2/MAGNet Center Third Annual Retreat, 4/11/2008

Synergy
Definition: “The interaction of two or more agents or forces so that their combined effect is greater than the sum of their individual effects” (American Heritage Dictionary).
Natural application in systems biology (a holistic, as opposed to reductionist, paradigm): we wish to analyze multiple interacting factors in terms of the purely cooperative nature of their contributions towards an outcome.
D. Anastassiou, “Computational Analysis of the Synergy Among Multiple Interacting Genes” (Review Article), Molecular Systems Biology, Vol. 3, No. 83, February 2007.

Information-theoretic definition
The synergy of two factors Gi, Gj with respect to an outcome C is the information they provide jointly (the whole) minus the sum of the information each provides individually (the sum of the parts):

$$\underbrace{I(G_i, G_j; C)}_{\text{whole}} - \underbrace{\left[ I(G_i; C) + I(G_j; C) \right]}_{\text{sum of parts}}$$

Synergy can be positive or negative (negative synergy indicates redundancy), and the definition can be extended to more than two factors.

Example: Synergy of two genes with respect to a phenotype
Given a large set of gene expression data in both the presence and absence of a phenotype such as cancer, we can estimate the information I(Gi; C) that any gene Gi provides about cancer C, as well as the information I(Gi,Gj; C) that any pair of genes (Gi, Gj) jointly provides about cancer C.
[Figure: expression data, genes versus conditions, with conditions labeled “health” and “cancer”]

Best gene pairs for classification
Extension of “gene ranking” based on I(Gi; C) to “gene-pair ranking” based on I(Gi,Gj; C).
Observation: sometimes high-ranked gene pairs do not include any of the high-ranked single genes, suggesting that the correlation of the gene pair with cancer is due to a purely cooperative effect of the two genes.
V. Varadan and D. Anastassiou, “Inference of Disease-Related Molecular Logic from Systems-Based Microarray Analysis,” PLoS Computational Biology, Vol. 2, Issue 6, June 2006, pp. 585-597.
This purely cooperative effect can be quantified!

What is the “cancer interactome”?
High correlation: I(Gi,Gj; C) >> 0 implies that the two genes can be jointly used for classification.
High synergy: I(Gi,Gj; C) >> I(Gi; C) + I(Gj; C) ≥ 0 further implies that the two genes Gi and Gj “interact” with respect to cancer. Such pairs can be used to construct a “synergy network”: a graph whose nodes represent genes and whose edges connect gene pairs with significantly high synergy.
J. Watkinson, X. Wang, T. Zheng, D. Anastassiou, “Identification of gene interactions associated with disease from gene expression data using synergy networks,” BMC Systems Biology, February 2008.

Example (prostate cancer)
[Figure: scatter plot for the highest-synergy gene pair from prostate cancer data; 50 green (healthy) and 52 red (cancerous) dots. Cancer = (Low RBP1) AND (High EEF1B2)]

Using synergy for inference of gene regulatory interactions
The “phenotype” can be the expression level of a third gene.

Application to “Challenge 5” of the DREAM2 conference
Given: a “blinded” compendium of 300 normalized Affymetrix microarray experiments from E. coli, involving 3,456 genes, of which 120 (also blinded) are transcription factors.
Challenge: reconstruct a genome-scale transcriptional network (identify TF-target interactions). Scoring is based on a known “ground truth” of transcription factor (TF)-target interactions verified by chromatin immunoprecipitation or other experiments (from RegulonDB).

“Three-way” mutual information (common to three genes)

$$I(G_1;G_2;G_3) = E\left[\log \frac{p_{12}\,p_{23}\,p_{13}}{p_1\,p_2\,p_3\,p_{123}}\right]$$

It can be estimated from continuous data.

Three-way mutual information is the opposite of synergy!
I(G1;G2;G3) can be negative, in which case no Venn diagram is possible. It turns out that -I(G1;G2;G3) equals the synergy of any two of the genes with respect to the third:

$$
\begin{aligned}
-I(G_1;G_2;G_3) &= I(G_1,G_2;G_3) - \left[I(G_1;G_3) + I(G_2;G_3)\right] \\
&= I(G_2,G_3;G_1) - \left[I(G_2;G_1) + I(G_3;G_1)\right] \\
&= I(G_1,G_3;G_2) - \left[I(G_1;G_2) + I(G_3;G_2)\right]
\end{aligned}
$$
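The pair synergy and the claim that it equals -I(G1;G2;G3) can be checked numerically. The following is a minimal sketch, assuming discretized (e.g., binarized) expression values and simple plug-in (empirical-frequency) entropy estimates; these are not necessarily the estimators used in the cited work, which also handles continuous data. The function names and the XOR-style toy data are hypothetical and chosen only to illustrate a purely cooperative effect.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Plug-in (empirical-frequency) joint entropy, in bits, of one or more
    equal-length discrete columns."""
    joint = list(zip(*columns))
    n = len(joint)
    counts = Counter(joint)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_info(x, y):
    """Two-way mutual information I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def joint_mutual_info(x, y, z):
    """I(X,Y;Z): information that the pair (X, Y) jointly provides about Z."""
    return entropy(x, y) + entropy(z) - entropy(x, y, z)


def synergy(x, y, z):
    """Synergy of X and Y with respect to Z: I(X,Y;Z) - [I(X;Z) + I(Y;Z)]."""
    return joint_mutual_info(x, y, z) - (mutual_info(x, z) + mutual_info(y, z))


def three_way_mi(x, y, z):
    """I(X;Y;Z) = E[log p12 p23 p13 / (p1 p2 p3 p123)], written via entropies."""
    return (entropy(x) + entropy(y) + entropy(z)
            - entropy(x, y) - entropy(y, z) - entropy(x, z)
            + entropy(x, y, z))


# Hypothetical toy data: binarized "genes" G1, G2 and a phenotype C = G1 XOR G2,
# so neither gene alone says anything about C, but the pair determines it.
g1 = [0, 0, 1, 1] * 5
g2 = [0, 1, 0, 1] * 5
c = [a ^ b for a, b in zip(g1, g2)]

print(mutual_info(g1, c), mutual_info(g2, c))  # 0.0 0.0
print(synergy(g1, g2, c))                      # 1.0
print(-three_way_mi(g1, g2, c))                # 1.0
```

With plug-in estimates the identity holds exactly, up to floating-point rounding, for any three discrete columns: expanding both sides into entropies gives the same sum. This is why synergy is symmetric in which of the three genes plays the role of the “outcome.”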
Synergistic “entanglement” of three genes
If I(G1;G2;G3) << 0, this suggests that some interaction mechanism connects the three genes, and the positive quantity -I(G1;G2;G3) can be seen as measuring their synergistic “entanglement.” In that case, one likely scenario is that one of the three genes is, at least partly or indirectly, synergistically regulated by the other two.

Most-likely regulated gene in a synergistically entangled triplet
Gj is the most likely regulated gene in the triplet (Gi, Gj, Gk) if

$$I(G_i,G_k;G_j) \ge \max\{I(G_i,G_j;G_k),\ I(G_k,G_j;G_i)\}$$

or, as it turns out, equivalently:

$$I(G_i;G_k) \le \min\{I(G_i;G_j),\ I(G_k;G_j)\}$$

Synergistic regulation index

$$S(i,j) = \max_{\substack{k \ne i,\ k \ne j \\ I(G_i;G_k) < I(G_i;G_j) \\ I(G_i;G_k) < I(G_k;G_j)}} \left[-I(G_i;G_j;G_k)\right]$$

S(i,j) measures the degree of confidence that gene Gi cooperatively regulates gene Gj. It also identifies Gk as the best synergistic partner of Gi for the regulation of Gj.
It can be used to augment the traditional MI measure M(i,j) = I(Gi;Gj).

Final score for Gi → Gj regulation

$$\text{Score} = M(i,j) + S(i,j)$$

computed from 2-way and 3-way MI values. It turns out to be equal to:

$$\text{Score} = \max_{\substack{k \ne i,\ k \ne j \\ I(G_i;G_k) < I(G_i;G_j) \\ I(G_i;G_k) < I(G_k;G_j)}} I(G_i;G_j \mid G_k)$$

Results
Team        Score (combined log10(P value))
GISL        40.5
Team 121    25.2
Team 73     24.1
Team 41     18.7
Team 58     10.0
GISL: among the top 150 predictions, 106 were in the “ground truth.”

Potential for biological discovery using synergy
A large number of statistically significant entangled triplets.

Conclusions and acknowledgements
Synergy-based methodologies have the potential to help systems biology achieve genuine biological discovery by identifying multiple interacting contributing factors, such as genes, SNPs and CNVs.
Co-authors: Prof. Tian Zheng, Statistics, Columbia University; Prof. Xiaodong Wang, EE, Columbia University
Ph.D. students: John Watkinson, Kuo-ching Liang
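The synergistic regulation index S(i, j) and the final score for Gi → Gj regulation can be sketched as follows. This is a minimal illustration, assuming discrete expression values and plug-in entropy estimates rather than the continuous-data estimators mentioned in the slides. The names `valid_partners`, `regulation_index` and `score`, the dictionary-of-genes layout, and the toy data (target T = A AND B, plus a redundant copy A2 of A) are all hypothetical. Returning `None` when no partner Gk satisfies the constraints is also our assumption; the slides do not cover that case.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Plug-in joint entropy, in bits, of one or more equal-length discrete columns."""
    n = len(columns[0])
    counts = Counter(zip(*columns))
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mi(x, y):
    """Two-way mutual information I(X;Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def mi3(x, y, z):
    """Three-way mutual information I(X;Y;Z); negative values indicate synergy."""
    return (entropy(x) + entropy(y) + entropy(z)
            - entropy(x, y) - entropy(y, z) - entropy(x, z)
            + entropy(x, y, z))


def cond_mi(x, y, z):
    """Conditional mutual information I(X;Y|Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def valid_partners(genes, i, j):
    """Genes k != i, j with I(Gi;Gk) < I(Gi;Gj) and I(Gi;Gk) < I(Gk;Gj),
    i.e. triplets in which Gj is the most likely regulated gene."""
    m_ij = mi(genes[i], genes[j])
    partners = []
    for k in genes:
        if k in (i, j):
            continue
        m_ik = mi(genes[i], genes[k])
        if m_ik < m_ij and m_ik < mi(genes[k], genes[j]):
            partners.append(k)
    return partners


def regulation_index(genes, i, j):
    """S(i, j) = max over valid k of -I(Gi;Gj;Gk), plus the best partner k.
    Returns (None, None) if no k qualifies (assumption, not from the slides)."""
    best, best_k = None, None
    for k in valid_partners(genes, i, j):
        s = -mi3(genes[i], genes[j], genes[k])
        if best is None or s > best:
            best, best_k = s, k
    return best, best_k


def score(genes, i, j):
    """Score for Gi -> Gj: M(i, j) + S(i, j), with M(i, j) = I(Gi;Gj)."""
    s, _ = regulation_index(genes, i, j)
    return None if s is None else mi(genes[i], genes[j]) + s


# Hypothetical toy data: T is regulated by A AND B; A2 is a redundant copy of A.
A = [0, 0, 1, 1, 0, 0, 1, 1]
B = [0, 1, 0, 1, 0, 1, 0, 1]
T = [a & b for a, b in zip(A, B)]
genes = {"A": A, "A2": list(A), "B": B, "T": T}

s, k = regulation_index(genes, "A", "T")
print(k, round(s, 4))                       # B 0.1887
print(round(score(genes, "A", "T"), 6))     # 0.5
print(round(cond_mi(A, T, B), 6))           # 0.5
```

The redundant copy A2 is excluded because I(A;A2) exceeds I(A;T), so it cannot be the synergistic partner. The score equals the maximum conditional MI because, for each k, I(Gi;Gj) - I(Gi;Gj;Gk) = I(Gi;Gj|Gk), and adding the constant M(i, j) does not change which k attains the maximum.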