Comparative
Expression
Moran Yassour
Goal
Build a multi-species gene-coexpression
network
Find functions of unknown genes
Discover how the genes interact
Distinguish accidentally
regulated genes from those
that are physiologically important
Construction of a gene-coexpression network.
Evolutionarily diverse organisms with
extensive microarray data:
Homo sapiens
Drosophila melanogaster
Caenorhabditis elegans
Saccharomyces cerevisiae.
We first associated genes from one organism
with their orthologous counterparts in other
organisms.
Evolution 101
Paralogs vs. Orthologs
Construct a metagene
identify
connected
components
ignore nonreciprocal hits
[Figure legend: human gene, fly gene, worm gene, yeast gene; links are best BLAST hits; each connected component is a metagene (MEG).]
Using this method, we assigned each gene to
at most a single metagene.
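The construction above (keep only reciprocal best BLAST hits, then take connected components) can be sketched as follows. This is a minimal illustration, not the original pipeline: the function name `build_metagenes`, the `(organism, gene_id)` tuple encoding, and the `best_hit` dictionary layout are all assumptions made for the example.

```python
from collections import defaultdict

def build_metagenes(best_hit):
    """Group genes into metagenes from best-BLAST-hit data.

    best_hit maps (gene, target_organism) -> the best hit of `gene` in
    `target_organism`; a gene is an (organism, gene_id) tuple.
    (Hypothetical data layout, chosen for this sketch.)
    """
    adjacency = defaultdict(set)
    for (gene, _target_org), hit in best_hit.items():
        gene_org = gene[0]
        # Ignore nonreciprocal hits: keep the link only if the hit's own
        # best hit in the gene's organism points back to the gene.
        if best_hit.get((hit, gene_org)) == gene:
            adjacency[gene].add(hit)
            adjacency[hit].add(gene)

    # Each connected component of the reciprocal-hit graph is one metagene.
    seen, metagenes = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        metagenes.append(component)
    return metagenes
```

Because connected components are disjoint, every gene ends up in at most one metagene, and genes with no reciprocal hit are left out entirely.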
Some numbers
In total we have 6307 metagenes (6591
human genes, 5180 worm genes, 5802 fly
genes, and 2434 yeast genes).
We sought to identify pairs of metagenes that
not only were coexpressed in one experiment
and in one organism but that also showed
correlation in diverse experiments in multiple
organisms.
Edges in the graph
[Figure: genes ranked 1 to 5 in each of human, fly, and worm; relative to MEG1, MEG2 receives the ranks {2, 4, 2}.]
Are the ranks {2, 4, 2} significant (P-value < 0.05)?
If so, draw an edge between MEG1 and MEG2.
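The slide asks whether a rank set such as {2, 4, 2} is significant but does not say how the P-value is computed. As a stand-in (not necessarily the test used to build the network), the sketch below runs a simple Monte Carlo test: under the null hypothesis each organism's rank is uniform, and the statistic is the product of normalized ranks. The function name, the statistic, and the parameters are assumptions for illustration.

```python
import random
from math import prod

def rank_set_p_value(ranks, n_genes, n_draws=20000, seed=0):
    """Monte Carlo P-value for a set of per-organism ranks.

    ranks[i]   : rank of the second metagene in organism i (1 = most correlated)
    n_genes[i] : number of ranked genes in organism i
    Null model (an assumption): ranks are independent and uniform.
    """
    rng = random.Random(seed)
    observed = prod(r / n for r, n in zip(ranks, n_genes))
    hits = 0
    for _ in range(n_draws):
        null = prod(rng.randint(1, n) / n for n in n_genes)
        if null <= observed:  # null draw at least as extreme as the data
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_draws + 1)

def has_edge(ranks, n_genes, alpha=0.05):
    """Draw an edge when the rank set is significant at level alpha."""
    return rank_set_p_value(ranks, n_genes) < alpha
```

For example, `has_edge([2, 4, 2], [1000, 1000, 1000])` asks whether ranks of 2, 4, and 2 out of 1000 genes per organism would be surprising by chance.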
Statistical tests (1) –
permuted metagenes
Construction of a network
from a set of permuted
metagenes (random
collection of genes from
each organism)
At P < 0.05, the real
networks contained 3.5 ±
0.03 times as many
interactions as the
random networks.
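One way to build the permuted metagenes described above is to shuffle genes within each organism, so that every random metagene still draws one gene per organism but the orthology grouping is destroyed. The sketch assumes, for simplicity, that a metagene is a dictionary with at most one gene per organism; that layout and the function name are illustrative assumptions.

```python
import random

def permute_metagenes(metagenes, seed=0):
    """Return random metagenes with the same shape as the input.

    metagenes: list of dicts mapping organism -> gene (hypothetical layout).
    Genes are shuffled among metagenes separately for each organism.
    """
    rng = random.Random(seed)
    organisms = sorted({org for mg in metagenes for org in mg})
    permuted = [dict() for _ in metagenes]
    for org in organisms:
        # Only metagenes that contain this organism exchange genes.
        slots = [i for i, mg in enumerate(metagenes) if org in mg]
        genes = [metagenes[i][org] for i in slots]
        rng.shuffle(genes)
        for i, gene in zip(slots, genes):
            permuted[i][org] = gene
    return permuted
```

Running the same edge-calling procedure on the real and the permuted metagenes and dividing the two edge counts gives a ratio comparable to the one quoted on this slide.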
Statistical tests (2) –
half the data
Split the microarray data into
halves, giving two networks
We then counted the fraction of
interactions that were
significant in one network
(P < 0.05), given that they
were significant in the other
network at P < p for various
values of p.
At P = 0.05: 41% significant
expression interactions
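The conditional fraction described above can be computed directly once each half-network assigns a P-value to every candidate interaction. The sketch below assumes the two networks are given as dictionaries from an interaction (any hashable key) to its P-value; the names are illustrative.

```python
def conditional_reproducibility(p_first, p_second, p_cutoff, alpha=0.05):
    """Fraction of interactions significant in the first network (P < alpha),
    given that they are significant in the second network at P < p_cutoff.

    p_first, p_second: dict mapping interaction -> P-value (assumed layout).
    Returns None when no interaction passes p_cutoff in the second network.
    """
    selected = [edge for edge, p in p_second.items() if p < p_cutoff]
    if not selected:
        return None
    # Interactions missing from the first network count as not significant.
    confirmed = sum(1 for edge in selected if p_first.get(edge, 1.0) < alpha)
    return confirmed / len(selected)
```

Sweeping `p_cutoff` over a range of values traces out the curve that this slide summarizes with a single point.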
Statistical tests (3) –
noise stability
We added increasing
levels of Gaussian
noise to the entire data
set for each of the
organisms.
[Figure axes: noise negative log P-value; real network negative log P-value.]
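Adding Gaussian noise to an expression data set is a one-line perturbation. The sketch below assumes the data are a list of rows (one per gene) of numeric expression values and that the noise level is given as a standard deviation `sigma`; both choices are assumptions, since the slide does not give the noise scale.

```python
import random

def add_gaussian_noise(expression, sigma, seed=0):
    """Return a copy of the expression matrix with N(0, sigma^2) noise
    added independently to every measurement."""
    rng = random.Random(seed)
    return [[value + rng.gauss(0.0, sigma) for value in row]
            for row in expression]

def noise_series(expression, sigmas, seed=0):
    """One noisy copy of the data per noise level, keyed by sigma."""
    return {sigma: add_gaussian_noise(expression, sigma, seed) for sigma in sigmas}
```

Rebuilding the network from each noisy copy and comparing its negative log P-values with those of the real network gives the kind of stability plot named on this slide.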
Visualization
x-y plane – negative logarithm of P value
K-means clustering
z axis – density of genes in the region
function region
function network
Example – Component 5
A total of 241 metagenes
110 of which were previously known
to be involved in the cell cycle.
202 cell cycle metagenes in the
network.
P-value < 10^-85
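An enrichment P-value of this kind is commonly obtained from a hypergeometric tail: the chance of seeing at least the observed number of annotated metagenes in a component of the given size. The slide does not name the test, so treating it as hypergeometric is an assumption, and the total network size (`total` below) is not given here.

```python
from fractions import Fraction
from math import comb

def hypergeom_tail(total, annotated, drawn, observed):
    """P(X >= observed) when `drawn` items are sampled without replacement
    from `total` items, of which `annotated` carry the annotation."""
    upper = min(annotated, drawn)
    numerator = sum(
        comb(annotated, k) * comb(total - annotated, drawn - k)
        for k in range(observed, upper + 1)
    )
    # Exact rational arithmetic avoids overflow before the final conversion.
    return float(Fraction(numerator, comb(total, drawn)))
```

With the numbers on this slide the call would be `hypergeom_tail(total=N, annotated=202, drawn=241, observed=110)`, where `N`, the number of metagenes in the whole network, has to be supplied from the underlying study.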
Of the 241 metagenes:
30 – regulating the cell cycle.
80 – terminal cell cycle functions.
131 – unknown.
Experimental validation (1) –
expression data
Five metagenes with a
significant number of links
to known cell proliferation
genes.
Measuring expression
levels in dividing
pancreatic cancer cells
and in nondividing normal
cells.
Experimental validation (2) –
loss-of-function mutant
loss-of-function mutant
phenotype for one of these
genes (C. elegans gene
ZK652.1)
RNA interference (RNAi) of
ZK652.1 resulted in excess
nuclei in the germ line,
suggesting that the wild-type function of this gene is
to suppress germline
proliferation.
Multi-species vs.
single species (1)
For each gene (of the five metagenes), we
constructed an organism-specific
neighborhood.
On average, the neighborhoods of these five
genes were over four times more enriched for
cell proliferation and cell cycle genes in the
multiple-species network than they were in
the best single-species neighborhood.
Multi-species vs.
single species (2)
Trying to link together
genes that were
previously known to be
involved in a single
function (coverage)
excluding genes not
known to participate in
that function (accuracy)
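Read as set overlaps, coverage is the share of known genes of a function that the network links together, and accuracy is the share of linked genes that are actually known to have the function. This reading (essentially recall and precision) is an interpretation of the slide, and the function name is illustrative.

```python
def coverage_and_accuracy(linked, known):
    """Compare a set of genes linked by the network with the genes
    previously known to take part in one function.

    coverage = |linked & known| / |known|
    accuracy = |linked & known| / |linked|
    """
    linked, known = set(linked), set(known)
    shared = linked & known
    coverage = len(shared) / len(known) if known else 0.0
    accuracy = len(shared) / len(linked) if linked else 0.0
    return coverage, accuracy
```

Computing both numbers for the multiple-species network and for each single-species network gives the comparison this slide refers to.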
Huge data
The multiple-species network was built from
more DNA microarray data (3182 microarrays).
Construction of the network out of only 979
DNA microarrays (as in the worm data set)
gave similar results.
Summary - Multi is good
We map only genes that have orthologs in
other species and thus focus strongly on
core, conserved biological processes;
Interactions in the multiple-species network
imply a functional relationship based on
evolutionary conservation.
Nice to have – analysis of other components.
Goal
Comparative study of large datasets of expression
profiles from six evolutionarily distant organisms.
Goal
Coexpression is often conserved.
Comparing the regulatory relationships
between particular functional groups in the
different organisms.
Comparing global topological properties of
the transcription networks derived from the
expression data, using a graph theoretical
approach.
Homologous gene with
preserved function
Coexpression conservation
Coexpressed groups - yeast transcription
modules
For each yeast module we constructed five
“homologue modules”.
Refining homologue modules
The signature algorithm
identifies those
homologues that are
coexpressed under a
subset of the experimental
conditions.
Furthermore, it reveals
additional genes that are
not homologous with any
of the original genes, but
display a similar
expression pattern under
those conditions
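The refinement step can be illustrated with a single, heavily simplified pass in the spirit of the signature algorithm: score conditions by the average expression of the input homologues, keep the conditions that stand out, then score all genes over those conditions and keep the genes that stand out. The thresholds, the scoring details, and the assumption that the expression values are already normalized are simplifications for this sketch, not the published procedure.

```python
from statistics import mean, pstdev

def signature_step(expression, input_genes, cond_threshold=1.0, gene_threshold=1.0):
    """One simplified pass: input gene set -> (refined gene set, conditions).

    expression: dict gene -> list of (pre-normalized) values per condition.
    """
    n_conditions = len(next(iter(expression.values())))
    # Condition score: average expression of the input genes.
    cond_scores = [mean(expression[g][c] for g in input_genes)
                   for c in range(n_conditions)]
    mu_c, sd_c = mean(cond_scores), pstdev(cond_scores)
    conditions = [c for c, s in enumerate(cond_scores)
                  if sd_c > 0 and abs(s - mu_c) > cond_threshold * sd_c]
    if not conditions:
        return set(), []
    # Gene score: expression averaged over the selected conditions,
    # weighted by each condition's score.
    gene_scores = {g: mean(values[c] * cond_scores[c] for c in conditions)
                   for g, values in expression.items()}
    scores = list(gene_scores.values())
    mu_g, sd_g = mean(scores), pstdev(scores)
    genes = {g for g, s in gene_scores.items()
             if sd_g > 0 and s - mu_g > gene_threshold * sd_g}
    return genes, conditions
```

In this form an input homologue that is not coexpressed with the rest drops out, while a non-homologous gene with the same expression pattern under the selected conditions is pulled in.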
Correlation distribution
The distribution of the Z-scores for the average
gene–gene correlation of all the “homologue
modules”.
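A Z-score for the average gene–gene correlation of a module can be obtained by comparing it with random gene sets of the same size. The sketch below uses Pearson correlation and random sampling from the same data set as the background; both are assumptions, since the slide does not specify the correlation measure or the null model.

```python
import random
from itertools import combinations
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

def average_correlation(expression, genes):
    """Mean Pearson correlation over all gene pairs in the set."""
    pairs = list(combinations(sorted(genes), 2))
    return mean(pearson(expression[a], expression[b]) for a, b in pairs)

def module_z_score(expression, module, n_random=200, seed=0):
    """Z-score of a module's average correlation against random gene sets
    of the same size drawn from the same expression data."""
    rng = random.Random(seed)
    all_genes = sorted(expression)
    observed = average_correlation(expression, module)
    background = [average_correlation(expression, rng.sample(all_genes, len(module)))
                  for _ in range(n_random)]
    spread = pstdev(background)
    return (observed - mean(background)) / spread if spread > 0 else 0.0
```

A homologue module whose genes are still coexpressed in the other organism should give a Z-score well above zero; one whose coexpression is lost should be indistinguishable from the background.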
Higher-order regulatory
structures
Cell Cycle Experiments
Subsets of the data
Correlations between
the sets of conditions
for randomly selected
subsets of the data.
Although the data is
sparse, the findings
reflect real properties of
the expression network.
Decomposition of the
expression data
Decomposition of the
expression data into a set of
transcription modules using
the iterative signature
algorithm (ISA)
Modules are colored
according to the fraction of
homologues they possess in
the other organism
Protein
synthesis
Power-law connectivity
distribution
$n(k) \sim k^{-\gamma}$, with $\gamma \approx 1.1$ to $1.8$
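A power-law exponent can be estimated roughly from a list of gene connectivities by fitting a straight line to the connectivity histogram on log-log axes. This least-squares fit is a crude, commonly used estimator and is only an illustration; the slides do not say how the exponents were obtained.

```python
from collections import Counter
from math import log

def fit_power_law_exponent(degrees):
    """Estimate gamma in n(k) ~ k^(-gamma) by least squares on
    log n(k) versus log k, using genes with connectivity k > 0."""
    counts = Counter(k for k in degrees if k > 0)
    if len(counts) < 2:
        raise ValueError("need at least two distinct connectivities")
    xs = [log(k) for k in counts]
    ys = [log(n) for n in counts.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope  # the fitted slope is -gamma
```

Applying the same function to the connectivity list of each organism's network allows the exponents to be compared across organisms.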
Connections & Connectivity
Connections between
genes of similar
connectivity are
enhanced (red regions)
Connections between
highly and weakly
connected genes are
suppressed (blue)
Essentiality & Connectivity
The likelihood of a
gene to be
essential
increases with its
connectivity.
Homology & Connectivity
The highly
connected genes
are more likely to
have homologues
in the other
organisms
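Both trends above (essentiality and homology versus connectivity) amount to the same calculation: bin genes by connectivity and report, per bin, the fraction that carries some property. The sketch below assumes a dictionary of connectivities and a set of flagged genes (essential genes, or genes with homologues); the names and bin layout are illustrative.

```python
from bisect import bisect_right

def fraction_by_connectivity(connectivity, flagged, bin_edges):
    """Fraction of flagged genes in each connectivity bin.

    connectivity: dict gene -> connectivity k
    flagged     : set of genes with the property of interest
    bin_edges   : sorted list; bin i holds bin_edges[i-1] <= k < bin_edges[i],
                  with open-ended first and last bins.
    Returns one fraction per bin, or None for an empty bin.
    """
    totals = [0] * (len(bin_edges) + 1)
    hits = [0] * (len(bin_edges) + 1)
    for gene, k in connectivity.items():
        b = bisect_right(bin_edges, k)
        totals[b] += 1
        if gene in flagged:
            hits[b] += 1
    return [h / t if t else None for h, t in zip(hits, totals)]
```

A fraction that rises from the low-connectivity bins to the high-connectivity bins corresponds to the trend stated on these two slides.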
Summary
Similarity at lower resolution, differences at
higher resolution:
All expression networks share common
topological properties (scale-free connectivity
distribution, high degree of modularity).
The modular components of each transcription
program as well as their higher-order organization
appear to vary significantly between organisms
and are likely to reflect organism-specific
requirements.
Future
Gene expression studies
Evolution studies
Thank you …