The Measure of Synergy
as a Tool in Systems Biology
D. Anastassiou
C2B2/MAGNet Center
Third Annual Retreat, 4/11/2008
Synergy
Definition: “The interaction of two or more agents or
forces so that their combined effect is greater than the
sum of their individual effects” (American Heritage Dictionary)
Natural application in systems biology
(holistic as opposed to reductionist paradigm): We
wish to analyze multiple interacting factors in terms of
the purely cooperative nature of their contributions
towards an outcome.
D. Anastassiou, "Computational Analysis of the Synergy
Among Multiple Interacting Genes" (Review Article),
Molecular Systems Biology, Vol. 3, No. 83, February 2007.
Information-theoretic definition
Synergy of two factors Gi, Gj with respect to
an outcome C:
I(Gi ,Gj ; C) - I(Gi ; C)+I(Gj ; C)
whole
sum of parts
Synergy can be positive or negative (redundancy)
and extended to more than two factors.
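The definition above can be illustrated with a minimal sketch. It assumes discretized (here binary) values and a simple plug-in entropy estimate; the function names and the toy data are hypothetical and are not taken from the slides.

```python
from collections import Counter
from math import log2

def entropy(samples):
    """Plug-in (empirical) entropy in bits of a sequence of hashable values."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def synergy(gi, gj, c):
    """I(Gi,Gj; C) - [I(Gi; C) + I(Gj; C)]: the whole minus the sum of parts."""
    whole = mutual_information(list(zip(gi, gj)), c)
    parts = mutual_information(gi, c) + mutual_information(gj, c)
    return whole - parts

# Toy data (assumption): two binary factors observed in all four combinations.
gi = [0, 0, 1, 1]
gj = [0, 1, 0, 1]

# Purely cooperative outcome: C = Gi XOR Gj. Neither factor alone is informative.
c_xor = [a ^ b for a, b in zip(gi, gj)]
positive = synergy(gi, gj, c_xor)

# Redundant case: the second factor duplicates the first and C copies it.
negative = synergy(gi, gi, list(gi))

print(positive, negative)
```

In this toy setting the XOR outcome gives a synergy of +1 bit and the duplicated factor gives -1 bit (redundancy).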
Example: Synergy of two genes
with respect to a phenotype
Given a large set of gene expression
data in both presence and absence
of a phenotype such as cancer, we
can estimate the information I(Gi; C)
that any gene Gi provides about
cancer C,
as well as the information I(Gi,Gj; C)
that any pair of genes (Gi, Gj)
jointly provides about cancer C.
[Figure: expression matrix of GENES by CONDITIONS, with the conditions grouped into HEALTH and CANCER]
Best gene pairs for classification
Extension of “gene ranking” based on I(Gi; C)
to “gene-pair ranking” based on I(Gi,Gj; C)
Observation: Sometimes high-ranked gene pairs do not
include any of the high-ranked single genes, suggesting
that the correlation of the gene pair with cancer is due
to a purely cooperative effect of the two genes.
V. Varadan and D. Anastassiou, “Inference of Disease-Related Molecular Logic
from Systems-Based Microarray Analysis,” PLoS Computational Biology, Vol. 2,
Issue 6, June 2006, pp. 585-597.
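The observation above can be reproduced on toy data. The sketch below assumes binarized expression values and a plug-in entropy estimate; the gene names A, B, D and all values are made up for illustration, and this is not the procedure used in the cited paper.

```python
from collections import Counter
from itertools import combinations
from math import log2

def H(*cols):
    """Plug-in joint entropy (bits) of one or more discrete sample columns."""
    rows = list(zip(*cols))
    n = len(rows)
    return -sum(k / n * log2(k / n) for k in Counter(rows).values())

def info_single(g, c):
    """I(G; C)."""
    return H(g) + H(c) - H(g, c)

def info_pair(gi, gj, c):
    """I(Gi,Gj; C)."""
    return H(gi, gj) + H(c) - H(gi, gj, c)

# Hypothetical binarized expression of three genes over eight samples.
genes = {
    "A": [0, 0, 1, 1, 0, 0, 1, 1],
    "B": [0, 1, 0, 1, 0, 1, 0, 1],
    "D": [0, 1, 1, 0, 0, 1, 0, 1],
}
# Hypothetical phenotype: present exactly when A and B differ.
c = [a ^ b for a, b in zip(genes["A"], genes["B"])]

# "Gene ranking" by I(Gi; C) and "gene-pair ranking" by I(Gi,Gj; C).
gene_ranking = sorted(genes, key=lambda g: info_single(genes[g], c), reverse=True)
pair_ranking = sorted(
    combinations(genes, 2),
    key=lambda p: info_pair(genes[p[0]], genes[p[1]], c),
    reverse=True,
)

print("top gene:", gene_ranking[0])
print("top pair:", pair_ranking[0])
```

Here D is the only gene that is individually informative, yet the top-ranked pair is (A, B), which contains neither of its members among the informative single genes.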
This purely cooperative effect can be quantified!
What is the “cancer interactome”?
High correlation:
I(Gi,Gj; C) >> 0
implies that the two genes can be jointly used for classification
High synergy:
I(Gi,Gj; C) >> I(Gi; C) + I(Gj; C) ≥ 0
further implies that the two genes Gi and Gj “interact” with
respect to cancer. Such pairs can be used to construct a
“synergy network,” a graph in which nodes represent genes and
edges connect significantly high-synergy gene pairs.
J. Watkinson, X. Wang, T. Zheng, D. Anastassiou,
“Identification of gene interactions associated with disease
from gene expression data using synergy networks,”
BMC Systems Biology, February 2008
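A synergy network can be sketched as an edge list over gene pairs. The version below assumes binarized data, a plug-in entropy estimate, and a fixed `threshold` as a stand-in for a proper significance test; the names and data are hypothetical and do not come from the cited paper.

```python
from collections import Counter
from itertools import combinations
from math import log2

def H(*cols):
    """Plug-in joint entropy (bits) of one or more discrete sample columns."""
    rows = list(zip(*cols))
    n = len(rows)
    return -sum(k / n * log2(k / n) for k in Counter(rows).values())

def synergy(gi, gj, c):
    """I(Gi,Gj; C) - [I(Gi; C) + I(Gj; C)]."""
    whole = H(gi, gj) + H(c) - H(gi, gj, c)
    parts = (H(gi) + H(c) - H(gi, c)) + (H(gj) + H(c) - H(gj, c))
    return whole - parts

def synergy_network(genes, c, threshold):
    """Return edges (gene_a, gene_b, synergy) for pairs whose synergy exceeds threshold.

    The fixed threshold is an assumption standing in for a significance test.
    """
    edges = []
    for (a, ga), (b, gb) in combinations(genes.items(), 2):
        s = synergy(ga, gb, c)
        if s > threshold:
            edges.append((a, b, s))
    return edges

# Hypothetical binarized expression data and phenotype labels.
genes = {
    "A": [0, 0, 1, 1, 0, 0, 1, 1],
    "B": [0, 1, 0, 1, 0, 1, 0, 1],
    "D": [0, 1, 1, 0, 0, 1, 0, 1],
}
c = [a ^ b for a, b in zip(genes["A"], genes["B"])]

edges = synergy_network(genes, c, threshold=0.5)
print(edges)
```

With this toy data only the pair (A, B) clears the threshold, so the network has a single edge.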
Example (prostate cancer)
Example of scatter plot for highest-synergy
gene pair from prostate cancer data
50 green (healthy) and 52 red (cancerous) dots
Cancer = (Low RBP1) AND (High EEF1B2)
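To see why AND-type logic of this kind yields positive synergy, consider an idealized version. The assumptions here are not from the data: A = [RBP1 low] and B = [EEF1B2 high] are treated as independent, equiprobable binary indicators, and C = A AND B holds exactly.

```latex
\begin{align*}
H(C) &= -\tfrac14\log_2\tfrac14 - \tfrac34\log_2\tfrac34 \approx 0.811 \text{ bits} \\
I(A,B;C) &= H(C) \approx 0.811 \quad \text{(C is determined by the pair)} \\
I(A;C) &= H(C) - H(C \mid A) \approx 0.811 - 0.5 = 0.311 = I(B;C) \\
\text{Synergy} &= I(A,B;C) - [I(A;C) + I(B;C)] \approx 0.811 - 0.623 = 0.189 \text{ bits}
\end{align*}
```

Each gene alone carries some information, but the pair carries more than the sum of the two.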
Using synergy for inference of
gene regulatory interactions
The “phenotype” can be the
expression level of a third gene
Application to “Challenge 5”
of the DREAM2 conference
Given: A “blinded” compendium of 300 normalized
Affymetrix microarray experiments from E. coli, involving
3,456 genes, of which 120 are (also blinded) transcription
factors.
Challenge: Reconstruct a genome-scale transcriptional
network (identify TF-target interactions).
Score based on known “ground truth” from chromatin
precipitation and otherwise experimentally verified
Transcription Factor (TF)-target interactions (from
RegulonDB).
“Three-way” mutual information
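(common to three genes)
$$I(G_1;G_2;G_3) = E\left[\log \frac{p_{12}\, p_{23}\, p_{13}}{p_1\, p_2\, p_3\, p_{123}}\right]$$
where each p denotes the marginal or joint probability of the genes indicated by its subscripts.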

Can be estimated from
continuous data
Three-way mutual information is
the opposite of synergy!
I(G1;G2;G3) can be negative, in which case there is no
Venn diagram possible.
It turns out that -I(G1;G2;G3) is equal to
I(G1,G2; G3) - [I(G1;G3)+I(G2;G3)] =
I(G2,G3; G1) - [I(G2;G1)+I(G3;G1)] =
I(G1,G3; G2) - [I(G1;G2)+I(G3;G2)]
the synergy of two of the genes with respect to the third.
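The identity above can be checked numerically. The sketch below assumes three binary variables with an arbitrary (randomly drawn) joint distribution, and takes the three-way mutual information as the expectation of log[p12 p23 p13 / (p1 p2 p3 p123)]; all names are hypothetical.

```python
import itertools
import random
from math import log2

random.seed(0)

# Arbitrary joint pmf over (G1, G2, G3), each binary (assumption for illustration).
states = list(itertools.product([0, 1], repeat=3))
weights = [random.random() for _ in states]
norm = sum(weights)
p123 = {s: w / norm for s, w in zip(states, weights)}

def marginal(axes):
    """Marginal pmf over the given axes (0, 1, 2 index G1, G2, G3)."""
    out = {}
    for s, p in p123.items():
        key = tuple(s[a] for a in axes)
        out[key] = out.get(key, 0.0) + p
    return out

def mi(a, b):
    """I(A;B) in bits, where a and b are tuples of axes."""
    pa, pb, pab = marginal(a), marginal(b), marginal(a + b)
    return sum(
        p * log2(p / (pa[k[:len(a)]] * pb[k[len(a):]]))
        for k, p in pab.items() if p > 0
    )

def three_way_mi():
    """I(G1;G2;G3) = E[log (p12 p23 p13) / (p1 p2 p3 p123)]."""
    p1, p2, p3 = marginal((0,)), marginal((1,)), marginal((2,))
    p12, p23, p13 = marginal((0, 1)), marginal((1, 2)), marginal((0, 2))
    acc = 0.0
    for (a, b, c), p in p123.items():
        num = p12[(a, b)] * p23[(b, c)] * p13[(a, c)]
        den = p1[(a,)] * p2[(b,)] * p3[(c,)] * p
        acc += p * log2(num / den)
    return acc

def synergy(pair, target):
    """I(pair; target) minus the sum of the two individual informations."""
    return mi(pair, target) - (mi((pair[0],), target) + mi((pair[1],), target))

neg_i3 = -three_way_mi()
s_wrt_g3 = synergy((0, 1), (2,))
s_wrt_g1 = synergy((1, 2), (0,))
s_wrt_g2 = synergy((0, 2), (1,))
print(neg_i3, s_wrt_g3, s_wrt_g1, s_wrt_g2)
```

All four printed values agree up to floating-point error, whichever gene is treated as the "outcome."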
Synergistic “entanglement”
of three genes
If I(G1;G2;G3) << 0, this suggests that there is
some interaction mechanism connecting the three
genes, and the positive quantity -I(G1;G2;G3) can
be seen as measuring their synergistic
“entanglement.”
In that case, one likely scenario is that one of the
three genes is, at least partly or indirectly,
synergistically regulated by the other two.
Most-likely regulated gene
in a synergistically entangled triplet
I(Gi,Gk; Gj) ≥ max {I(Gi,Gj; Gk), I(Gk,Gj; Gi)}
Or, as it turns out, equivalently:
I(Gi;Gk) ≤ min {I(Gi;Gj) , I(Gk;Gj)}
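The equivalence follows from the fact that the synergy of a triplet is the same whichever gene is taken as the outcome. A short derivation, using only that symmetry:

```latex
\begin{align*}
I(G_i,G_k;G_j) - [I(G_i;G_j) + I(G_k;G_j)]
  &= I(G_i,G_j;G_k) - [I(G_i;G_k) + I(G_j;G_k)] \\
\Rightarrow\quad I(G_i,G_k;G_j) - I(G_i,G_j;G_k)
  &= I(G_i;G_j) - I(G_i;G_k), \\
\text{and likewise}\quad I(G_i,G_k;G_j) - I(G_k,G_j;G_i)
  &= I(G_k;G_j) - I(G_i;G_k).
\end{align*}
```

Both left-hand sides are nonnegative exactly when I(Gi;Gk) is no larger than either I(Gi;Gj) or I(Gk;Gj).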
Synergistic regulation index
$$S(i,j) = \max_{\substack{k \ne i,\; k \ne j \\ I(G_i;G_k) < I(G_i;G_j) \\ I(G_i;G_k) < I(G_k;G_j)}} \bigl[-I(G_i;G_j;G_k)\bigr]$$
Measures the degree of confidence that gene Gi
cooperatively regulates gene Gj
It also identifies Gk as the best synergistic
partner of Gi for the regulation of Gj
Can be used to augment the traditional MI measure:
M(i, j) = I(Gi;Gj)
Final score for Gi → Gj regulation
Score = M(i, j) + S(i, j)
computed from 2-way and 3-way MI values.
It turns out that the score is equal to:
$$\max_{\substack{k \ne i,\; k \ne j \\ I(G_i;G_k) < I(G_i;G_j) \\ I(G_i;G_k) < I(G_k;G_j)}} I(G_i;G_j \mid G_k)$$
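The score can be sketched end to end on toy data. This is only an illustration under stated assumptions: values are binarized rather than continuous, entropies are plug-in estimates, S is taken to be 0 when no partner k satisfies the constraints (the slides do not say what happens in that case), and the gene names and data are hypothetical.

```python
from collections import Counter
from math import log2

def H(*cols):
    """Plug-in joint entropy (bits) of one or more discrete sample columns."""
    rows = list(zip(*cols))
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in Counter(rows).values())

def mi(x, y):
    """Two-way mutual information I(X;Y)."""
    return H(x) + H(y) - H(x, y)

def cmi(x, y, z):
    """Conditional mutual information I(X;Y|Z)."""
    return H(x, z) + H(y, z) - H(x, y, z) - H(z)

def three_way_mi(x, y, z):
    """Three-way mutual information I(X;Y;Z) = I(X;Y) - I(X;Y|Z)."""
    return mi(x, y) - cmi(x, y, z)

def regulation_score(data, i, j):
    """Return (M(i,j) + S(i,j), best partner k) for the candidate edge Gi -> Gj."""
    gi, gj = data[i], data[j]
    m = mi(gi, gj)
    best_k, s = None, 0.0  # assumption: S = 0 if no partner qualifies
    for k, gk in data.items():
        if k in (i, j):
            continue
        # Constraints from the slide: Gj must be the most likely regulated gene.
        if mi(gi, gk) < m and mi(gi, gk) < mi(gk, gj):
            s_k = -three_way_mi(gi, gj, gk)
            if best_k is None or s_k > s:
                best_k, s = k, s_k
    return m + s, best_k

# Hypothetical data: Gj is on only when both Gi and Gk are on; Gn is unrelated.
data = {
    "Gi": [0, 0, 1, 1, 0, 0, 1, 1],
    "Gk": [0, 1, 0, 1, 0, 1, 0, 1],
    "Gj": [0, 0, 0, 1, 0, 0, 0, 1],
    "Gn": [0, 0, 0, 0, 1, 1, 1, 1],
}

score, partner = regulation_score(data, "Gi", "Gj")
conditional = cmi(data["Gi"], data["Gj"], data[partner])
print(score, partner, conditional)
```

In this example the best partner is Gk, and the score M + S coincides with I(Gi;Gj | Gk) = 0.5 bits, as the identity above predicts.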
Results
TEAM        SCORE (combined log10(P value))
GISL        40.5
Team 121    25.2
Team 73     24.1
Team 41     18.7
Team 58     10.0
GISL:
Among top 150 predictions,
106 were in “ground truth.”
Potential for biological discovery using synergy:
Large number of statistically significant entangled triplets
Conclusions and acknowledgements
Synergy-based methodologies have the potential to
contribute towards empowering systems biology to
achieve genuine biological discovery by identifying
multiple interacting contributing factors, such as genes,
SNPs and CNVs.
Co-authors:
Prof. Tian Zheng, Statistics, Columbia University
Prof. Xiaodong Wang, EE, Columbia University
Ph.D. students: John Watkinson, Kuo-ching Liang