Download Network Motifs

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Gene expression profiling wikipedia , lookup

Metabolic network modelling wikipedia , lookup

Gene expression programming wikipedia , lookup

Transcript
Detecting Network Motifs
in Gene Co-expression
Networks
Shahar Artzi
Outline
Introduction to Network Motifs
Introduction to Detecting Network Motifs
Co-expression Network
Network Motif Discovery
Scale-Free Networks
Result
Discussion
Network Motifs
What is “Network Motifs” ?
Network Motifs are defined as patterns of
interconnections that recur in many
different parts of a network at frequencies
much higher than those found in
randomized networks.
Why we need them?
* To help us understand how biological networks work.
* Exact forecasting of operation and reaction in the
network under given situations.
The concept of “network motifs” was first
proposed by Uri Alon’s group :
Schematic View of Network Motif Detection
Examples for motifs
• FeedForward Loop
)‫(היזון קדמי‬
Found in neural networks.
It seems to be Used for neutralize
“Biological Noise”.
• Single-Input Module )‫(פס יצור‬
Implemented in gene control networks
Examples for motifs
• Parallel paths
Found in neural networks.
(and not so much in gene networks)
INTRODUCTION
to Detecting Network Motifs
Functionally related genes could be
clustered together based on similar expression profiles.
Problem :
General clustering algorithms produce clusters of
relatively large size, making it difficult to test the cluster
of interest using wet-lab experiments.
General clustering algorithms do not provide reasonably
detailed information about the relationship among genes
in a cluster.
This makes it even more difficult for individual
researchers to verify the predictions experimentally.
INTRODUCTION
to Detecting Network Motifs
We need a way to break big clusters into smaller ones
and thereby provide more detailed insights into
relationships among genes within subclusters.
To achieve this goal:
we need additional independent information.
In this research they used Protein sequence information
The algorithms are able to decompose the clusters of
genes into smaller ones by integrating protein domain
information into the clustering algorithm.
Co-expression Network
The genes are vertices (nodes).
An unweighted, and undirected edge (connection) is put
between two genes if they are co-expressed with
correlation higher than a specified threshold.
Correlation Matrix
Adjacency Matrix
Cutoff : 0.8
Network Motif Discovery
Here we extended the concept of network motif )Alon’s
group proposal) to the labeled graph by also considering
patterns of vertex labels.
Network Motif Discovery
Figure B shows 7 hypothetical genes in a co-expression
network forming three distinct instances of the network
motif as described in Figure A.
II and III are overlapping
I and II are non-overlapping
(I and III are also non – overlapping)
Network Motif Discovery Schematic View
< G, k, f >
Enumeration of k-vertex cliques
Protein domain
Groups of cliques
f := number of non-overlapping cliques
Network motifs
Network Motif Discovery
Scoring Method
First step :
Starting from a labeled graph whose vertices (genes)
were labeled with their corresponding annotated Pfam
protein domain information.
A generic clique-based clustering algorithm was applied
to this labeled co-expression network to search for
patterns of highly co-expressed genes or network motifs.
We’ll go over the clustering algorithm briefly.
The algorithm work in Scoring Method :
Network Motif Discovery
Scoring Method
Our first step is to determine which genes appear to
discriminate best among sample types.
To accomplish this, a discrimination score is calculated
for each gene.
Only the best genes (those with the highest scores) are
retained for subsequent steps.
Subtracting the sum of the standard deviations of a gene
within each group allows us to eliminate, or at least
diminish, the importance of any gene whose expression
levels vary excessively.
The data is obtained in an n x m matrix.
Network Motif Discovery
Scoring Method
The algorithm can be described in pidgin ALGOL as
follows:
From : “A COMBINATORIAL APPROACH TO THE ANALYSIS OF DIFFERENTIAL GENE EXPRESSION DATA
The Use of Graph Algorithms for Disease Prediction and Screening”
*Michael A. Langston1, Lan Lin1, Xinxia Peng2, Nicole E. Baldwin1, Christopher T. Symons1, Bing Zhang3 and Jay R. Snoddy3
Network Motif Discovery
Scoring Method
Alternative approach :
Elimination of outliers before computing the scores outliers might affect both the median and the standard
deviation.
we modified our approach by adding a screening phase,
in which we first compute the medians and the standard
deviations for each gene within each group, then check
the expression values corresponding to that gene,
discarding those at least three standard deviations away
from the group median.
We subsequently re-compute all medians and standard
deviations using only the retained values.
Network Motif Discovery
Scoring Method
We describe this modified algorithm in pidgin ALGOL:
continuance
From : “A COMBINATORIAL APPROACH TO THE ANALYSIS OF DIFFERENTIAL GENE EXPRESSION DATA
The Use of Graph Algorithms for Disease Prediction and Screening”
*Michael A. Langston1, Lan Lin1, Xinxia Peng2, Nicole E. Baldwin1, Christopher T. Symons1, Bing Zhang3 and Jay R. Snoddy3
Network Motif Discovery
Scoring Method
Algorithm cont’ :
Network Motif Discovery
Subsequent Steps
For a specified k, we scanned all k-vertex
cliques and grouped all cliques found based on the
protein domain information.
Next a parameter f specifying the minimum number of
mutual non-overlapping instances in a network motif was
used to trim the list of putative network motifs.
Only the putative network motifs having at least
f non-overlapping instances were kept as network motifs.
We further assessed the statistical significance of each
detected network motif by comparison to randomized
networks.
Network Motif Discovery
Subsequent Steps
Starting from the real co-expression network, we
generated a randomized network by randomly permuting
the domain labels of all genes while leaving the
connection structure of the graph untouched, and then
ran the same network motif detection procedure on the
resulting randomized network.
This process was repeated 1,000 times and the
percentage of times the same network motif was found
in the randomized networks was defined as the p-value
for the network motif.
Network Motif Discovery
domain matching levels
Here we propose only domain matching levels:
Matching level 2 : requires that two proteins have the
exact same type of domain, the same number of each
type of domain and all domains in the same order in the
respective protein sequences.
Matching level 4 :only requires the same type of domain.
The network motif detection procedure was run
separately using different domain matching levels.
Examples
Domain Matching level 2
1
A
B
C
1’
B
A
C
Domain Matching Level 4
2
A
A
C
2’
C
A
Scale-Free Networks
DEFINITION: Scale-free networks, including the Internet,
are characterized by an uneven distribution of
connectedness. Instead of the nodes of these networks
having a random pattern of connections, some nodes act
as "very connected" hubs, a fact that dramatically
influences the way the network operates.
Barabasi and his colleagues mapped the connectedness
of the Web.
Their experiment yielded a connectivity map that they
christened "scale-free."
Scale Free Vs Random
Random networks suffer from random failures
because each node important as any other
"scale free" networks are more immune
to random failure due to the redundancy
of paths linking nodes
connectivity ensured by few
highly connected nodes
"scale free" networks are prone to
catastrophic failure when key "hubs"
are attacked
Scale Free Vs Random
(a) constructed by laying down
N nodes and connecting each pair with probability p.
This network has N = 10 and p = 0.2.
(b) A new node (red) connects to two existing nodes
in the network (black) at time t + 1. This new node is
much more likely to connect to highly connected
nodes, a phenomenon called preferential attachment.
(c) The network connectivity can be characterized by
the probability P(k) that a node has k links.
For random graphs P(k) is strongly peaked at
k = <k> and decays exponentially for large k.
(d) A scale-free network does not have a peak in P(k),
and decays as a power law P(k) ~ kg at large k.
(e) A random network - most nodes have approximately
the same number of links.
(f) The majority of nodes in a scale-free network have
one or two links, but a few nodes have a large number
of links; this guarantees that the system
is fully connected
Scale Free Vs Random
• More than 60% of nodes (green) can be reached from the
five most connected nodes (red)
• compared with only 27% in the random network.
• This demonstrates the key role that hubs
play in the scale-free network.
• Both networks contain the 130 nodes and 430 links.
Results
To convert a correlation matrix to
a corresponding co-expression
network, a suitable cutoff value
for the correlation coefficient must
be chosen.
Based on the previous reports
that biological networks, including
co-expression networks, follow a
scale-free distribution of
connectivities, we chose a cutoff
value which gave fewer vertices
with higher degree (connectivity).
• The correlation cutoff value of 0.95 is appropriate
Results
Using a series of values for parameters k and f, we found a number
of putative network motifs under different domain matching levels
Results
As shown in Table 1, both increasing k, the size of
network motifs and f, the minimum number of non
overlapping instances, decrease the number of network
motifs detected.
To gain further confidence in our predictions, we used
yeast protein interaction data that includes rich protein
complex information, and searched within this data for
instances of putative malaria network motifs.
We can see (in the next slides) that more malaria
network motifs were supported by yeast interaction data
as the parameters became more stringent.
Results
Confirmation of Prediction by Yeast Protein Interactions
Percentage of network motifs having instance
in yeast PPI network.
Domain matching level 2
Results
Confirmation of Prediction by Yeast Protein Interactions
Percentage of network motifs having instance
in yeast PPI network.
Domain matching level 4
Results -
Example 1
This figure shows a putative network motif detected under domain
matching level 2, k = 6 and f = 2. This motif consists of six highly
co-expressed genes.
Results -
Example 1
This figure shows the instances formed by different combinations
of 27 genes detected in the yeast protein interaction network for
the malaria network motif shown in the last figure.
• Protein complex 11635 contains six genes forming an exact instance of
the predicted network motif.
Results -
Example 2
We hypothesized that individual instances of a network
motif could function in different locations and times,
dependent upon regulation.
The malaria time series data enables
us to test this hypothesis by examining the temporal
expression profiles of instances of network motifs.
Next figure shows such an example network motif
detected with parameter values at domain matching
level 2, k = 3 and f = 2.
Results -
Example 2
Results -
Example 2
Apparently these six genes all have similar
expression profiles and the only major difference
is the timing.
There is a phase difference between two
instances while all three genes within each of
two instances have the same expression
profiles.
Having instances in yeast protein interaction
data provide further support that these genes do
interact directly
• More Results
http://mouse.ornl.gov/~xpv/camda04/index.html
DISCUSSION
massive amounts of experimental
data have been collected
New computational approaches are needed to analyze
these data in an integrative way
provide more reliable results with finer resolution for
experimental verification
Here we propose a new strategy to analyze gene
expression data by integrating a diversity of additional
information, such as:
• primary sequence information
• protein interaction data.
DISCUSSION
Our approach can easily make use of cross-species
information.
Biological hypothesis
Modularity of biological networks
How genes within a module are orchestrated at mRNA
level is our next research question.
THE
END
References
Detecting Network Motifs in Gene Coexpression Networks Xinxia Peng et al
http://www.weizmann.ac.il/mcb/UriAlon/index.ht
ml
Network Motifs: Simple Building Blocks of
Complex Networks R. Milo et al