Gene and Protein Networks
Monday, April 10 2006
CSCI 7000-005:
Computational Genomics
Debra Goldberg
[email protected]
What is a network?
• A collection of objects (nodes, vertices)
• Binary relationships (edges)
• May be directed
• Also called a graph
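As a minimal sketch of the definition above, a network can be held as a map from each node to the set of its neighbors. The function name `build_graph`, the `directed` flag, and the toy node names are illustrative assumptions, not part of the slides.

```python
from collections import defaultdict

def build_graph(edges, directed=False):
    """Return an adjacency map {node: set(neighbors)} from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v]  # touch v so that pure targets still appear as nodes
        if not directed:
            adj[v].add(u)
    return dict(adj)

# Hypothetical toy data, not taken from the slides.
friendships = [("Ann", "Bob"), ("Bob", "Cy"), ("Ann", "Cy"), ("Cy", "Di")]
social = build_graph(friendships)                               # undirected
regulation = build_graph([("geneA", "geneB")], directed=True)   # directed
```

The later measures in these slides (degree, clustering coefficient, path length) can all be computed from a map of this shape.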
Networks are everywhere
Social networks
Nodes: People
Edges: Friendship
from www.liberality.org
Sexual networks
Nodes: People
Edges: Romantic and sexual relations
Transportation networks
Nodes: Locations
Edges: Roads
Power grids
Nodes: Power stations
Edges: High-voltage transmission lines
Airline routes
Nodes: Airports
Edges: Flights
Internet
Nodes: MBone routers
Edges: Physical connections
Internet
Nodes: Autonomous systems
Edges: Physical connections
World-Wide-Web
Nodes: Web documents
Edges: Hyperlinks
Gene and protein networks
Metabolic networks
Nodes: Metabolites
Edges: Biochemical reactions (enzymes)
• Drug targets predicted
Figures from web.indstate.edu and www.bact.wisc.edu
Protein interaction networks
Nodes: Proteins
Edges: Observed interactions
• Gene function predicted
from www.embl.de
Gene regulatory networks
Nodes: Genes or gene products
Edges: Regulation of expression
• Inferred from error-prone gene expression data
from Wyrick et al. 2002
Signaling networks
Nodes: Molecules (e.g., proteins or neurotransmitters)
Edges: Activation or deactivation
Figures from pharyngula.org and www.life.uiuc.edu
Synthetic sick or lethal (SSL)
• X and Y both intact (wild type): cells live
• X disrupted alone: cells live
• Y disrupted alone: cells live
• X and Y both disrupted: cells die or grow slowly
SSL networks
Nodes: Nonessential genes
Edges: Genes co-lethal
• Gene function, drug targets predicted
from Tong et al. 2001
Other biological networks
• Coexpression
– Nodes: genes
– Edges: transcribed at same times, conditions
• Gene knockout / knockdown
– Nodes: genes
– Edges: similar phenotype (defects) when suppressed
What they really look like…
We need models!
Traditional graph modeling
• Random
• Regular
from GD2002
Introduce small-world networks
Small-world Networks
• Six degrees of separation
• 100 – 1000 friends each
• Six steps: 10^12 – 10^18
• But… we live in communities
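The six-step figures follow from simple arithmetic, sketched below under the unrealistic assumption that no two people share a friend. The function name `naive_reach` is illustrative only.

```python
def naive_reach(friends_per_person: int, steps: int = 6) -> int:
    """Upper bound on people reached if friend circles never overlap."""
    return friends_per_person ** steps

low = naive_reach(100)     # 100**6  == 10**12
high = naive_reach(1000)   # 1000**6 == 10**18
```

Both bounds far exceed the world population, so most of those "new" contacts must be repeats. That overlap is the community structure the last bullet points to.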
Small-world measures
• Typical separation between two vertices
– Measured by characteristic path length
• Cliquishness of a typical neighborhood
– Measured by clustering coefficient
[Figure: two example neighborhoods of a node v, one with Cv = 1.00 and one with Cv = 0.33]
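Both measures can be sketched in a few lines. The code below assumes an undirected graph stored as `{node: set(neighbors)}` and takes the characteristic path length to be the mean shortest-path length over connected pairs. The two toy graphs are chosen only to reproduce the values Cv = 1.00 and Cv = 0.33; they are not necessarily the graphs drawn on the slide.

```python
from collections import deque
from itertools import combinations

def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves linked."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)

def characteristic_path_length(adj):
    """Mean BFS distance over all connected ordered pairs of nodes."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

# Toy neighborhoods (assumed): all three neighbors linked vs. one linked pair.
full = {"v": {"a", "b", "c"}, "a": {"v", "b", "c"},
        "b": {"v", "a", "c"}, "c": {"v", "a", "b"}}
sparse = {"v": {"a", "b", "c"}, "a": {"v", "b"}, "b": {"v", "a"}, "c": {"v"}}
cv_full = clustering_coefficient(full, "v")      # 3 of 3 pairs linked
cv_sparse = clustering_coefficient(sparse, "v")  # 1 of 3 pairs linked
```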
Watts-Strogatz small-world model
Measures of the W-S model
• Path length drops faster than cliquishness
• Wide range of p has both small-world properties
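The slides do not spell out the construction, so the following is a sketch of the commonly described version: start from a ring in which every node is linked to its k nearest neighbors, then rewire each lattice edge with probability p. The parameter names `n`, `k`, and `seed` are assumptions; only p appears on the slide.

```python
import random

def watts_strogatz(n, k, p, seed=None):
    """Ring lattice of n nodes with k nearest neighbors each (k even),
    where each lattice edge is rewired to a random node with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    for step in range(1, k // 2 + 1):
        for i in range(n):
            j = (i + step) % n
            if j in adj[i] and rng.random() < p:
                # Candidate endpoints: no self-loops, no duplicate edges.
                choices = [w for w in range(n) if w != i and w not in adj[i]]
                if choices:
                    new = rng.choice(choices)
                    adj[i].discard(j)
                    adj[j].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj

regular = watts_strogatz(20, 4, 0.0)            # p = 0: the regular ring
rewired = watts_strogatz(20, 4, 0.3, seed=1)    # intermediate p
```

With p = 0 this returns the regular lattice and with p = 1 every edge is rewired. Rewiring moves edges without adding or removing any, so the edge count stays at n·k/2.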
Small-world measures of various graph types

                     Cliquishness   Characteristic Path Length
Regular graph        High           Long
Random graph         Low            Short
Small-world graph    High           Short
Another network property: Degree distribution P(k)
• The degree (notation: k) of a node is the number of its neighbors
• The degree distribution is a histogram showing the frequency of nodes having each degree
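A minimal sketch of the histogram described above, assuming the graph is stored as `{node: set(neighbors)}`. The star graph is a made-up example.

```python
from collections import Counter

def degree_distribution(adj):
    """Return {k: P(k)}, the fraction of nodes having each degree k."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}

# Hypothetical star: one hub of degree 3 and three leaves of degree 1.
star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
dist = degree_distribution(star)   # {1: 0.75, 3: 0.25}
```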
Degree distribution of E-R random networks
Erdős-Rényi random graphs
• Binomial degree distribution, well-approximated by a Poisson
[Figure: P(k) versus degree k]
Network figures from Strogatz, Nature 2001
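To see how close the approximation is, the sketch below compares the two distributions directly. It assumes the G(n, p) variant of the model, in which each of the other n - 1 nodes is a neighbor independently with probability p, so a node's degree is Binomial(n - 1, p). The values of `n_nodes` and `p_edge` are arbitrary illustrations.

```python
import math

def binomial_pmf(k, n, p):
    """P(degree = k) when each of n possible neighbors is linked with prob. p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Poisson probability of k with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

n_nodes, p_edge = 1000, 0.01      # assumed example values
lam = (n_nodes - 1) * p_edge      # mean degree
max_gap = max(
    abs(binomial_pmf(k, n_nodes - 1, p_edge) - poisson_pmf(k, lam))
    for k in range(51)
)
```

For a large, sparse graph like this one, `max_gap` is tiny, which is why the Poisson curve is usually drawn in place of the binomial.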
Degree distribution of many real-world networks
Scale-free networks
• Degree distribution follows a power law: P(k = x) = α x^(-β)
[Figure: P(k) versus degree k on linear axes, and log P(k) versus log k on logarithmic axes]
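A power law appears as a straight line on log-log axes, and the slope of that line is the negative of the exponent. The sketch below estimates the slope by ordinary least squares on a synthetic distribution; the exponent 2.5 is an arbitrary choice. Least squares on log-log data is a crude estimator for real, noisy networks and is used here only to show the idea.

```python
import math

def loglog_slope(dist):
    """Least-squares slope of log P(k) against log k (needs k > 0, P(k) > 0)."""
    pts = [(math.log(k), math.log(p)) for k, p in dist.items() if k > 0 and p > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx

# Synthetic, exactly power-law distribution with an assumed exponent of 2.5.
raw = {k: k ** -2.5 for k in range(1, 101)}
norm = sum(raw.values())
power_law = {k: v / norm for k, v in raw.items()}
slope = loglog_slope(power_law)   # close to -2.5
```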
Hierarchical Networks
Ravasz, et al., Science 2002
Properties of hierarchical networks
1. Scale-free
2. Clustering coefficient independent of N
3. Scaling clustering coefficient (DGM)
C of 43 metabolic networks
• Independent of N
Ravasz, et al., Science 2002
Scaling of the clustering coefficient C(k)
• Metabolic networks
Ravasz, et al., Science 2002
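C(k) is the average clustering coefficient of all nodes that have degree k. In a hierarchical network it falls as k grows, because hubs bridge many small dense groups. A minimal sketch, assuming the graph is stored as `{node: set(neighbors)}` and using a made-up toy graph:

```python
from collections import defaultdict
from itertools import combinations

def clustering_by_degree(adj):
    """Return {k: mean clustering coefficient of the nodes with degree k}."""
    buckets = defaultdict(list)
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # clustering is undefined for fewer than two neighbors
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        buckets[k].append(2 * links / (k * (k - 1)))
    return {k: sum(cs) / len(cs) for k, cs in sorted(buckets.items())}

# Hypothetical toy: hub "h" joins two otherwise separate linked pairs.
toy = {"h": {"a", "b", "c", "d"}, "a": {"h", "b"}, "b": {"h", "a"},
       "c": {"h", "d"}, "d": {"h", "c"}}
c_of_k = clustering_by_degree(toy)   # degree-2 nodes: 1.0, the hub: 1/3
```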
Many real-world networks are small-world, scale-free
• World-wide-web
• Collaboration of film actors (Kevin Bacon)
• Mathematical collaborations (Erdős number)
• Power grid of US
• Syntactic networks of English
• Neural network of C. elegans
• Metabolic networks
• Protein-protein interaction networks
There is information in a gene’s position in the network
We can use this to predict
• Relationships
– Interactions
– Regulatory relationships
• Protein function
– Process
– Complex / “molecular machine”
Confidence assessment
• Traditionally, biological networks determined individually
– High confidence
– Slow
• New methods look at entire organism
– Lower confidence ( 50% false positives)
• Inferences made based on this data
Confidence assessment
• Can use topology to assess confidence if true edges and false edges have different network properties
• Assess how well each edge fits topology of true network
• Can also predict unknown relations
Goldberg and Roth, PNAS 2003
Use clustering coefficient, a local property
• Number of triangles = |N(v) ∩ N(w)|
• Normalization factor?
N(x) = the neighborhood of node x
[Figure: nodes v and w with their common neighbors x, y, ...]
Mutual clustering coefficient
Jaccard Index: |N(v) ∩ N(w)| / |N(v) ∪ N(w)|
Meet / Min: |N(v) ∩ N(w)| / min( |N(v)| , |N(w)| )
Geometric: |N(v) ∩ N(w)|^2 / ( |N(v)| · |N(w)| )
Hypergeometric: a p-value
Mutual clustering coefficient
Hypergeometric: -log P(intersection at least as large by chance), computed from the neighbors of node v, the neighbors of node w, and the nodes in the graph
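The four scores can be sketched together. The hypergeometric version below is an interpretation of the slide's wording, not a transcription of the paper's formula: it treats the smaller neighborhood as a random draw from all nodes in the graph and asks how likely an overlap at least this large would be. Graph storage as `{node: set(neighbors)}` and all names are assumptions.

```python
import math

def mutual_clustering(adj, v, w):
    """Four scores for how strongly the neighborhoods of v and w overlap."""
    nv, nw = adj[v], adj[w]
    shared = len(nv & nw)
    total = len(adj)
    small, large = sorted((len(nv), len(nw)))
    # P(overlap >= shared) when `small` nodes are drawn at random from
    # `total` nodes, `large` of which belong to the other neighborhood.
    tail = sum(
        math.comb(large, i) * math.comb(total - large, small - i)
        for i in range(shared, small + 1)
    ) / math.comb(total, small)
    union = len(nv | nw)
    return {
        "jaccard": shared / union if union else 0.0,
        "meet_min": shared / small if small else 0.0,
        "geometric": shared**2 / (small * large) if small else 0.0,
        "hypergeometric": -math.log(tail),
    }

# Hypothetical graph: v and w share neighbors a and b; e and f are isolated.
toy = {"v": {"a", "b", "c"}, "w": {"a", "b", "d"}, "a": {"v", "w"},
       "b": {"v", "w"}, "c": {"v"}, "d": {"w"}, "e": set(), "f": set()}
scores = mutual_clustering(toy, "v", "w")
```

A pair that is not yet connected but scores highly on these measures is a candidate for a missing edge. A reported edge with a very low score is a candidate false positive.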
Prediction
• A v-w edge would have a high clustering coefficient
Confidence assessment
• Integrate experimental details with local topology
– Degree
– Clustering coefficient
– Degree of neighbors
– Etc.
Bader, et al., Nature Biotechnology 2003
The synthetic lethal network has many triangles
Xiaofeng Xin, Boone Lab
2-hop predictors for SSL
• SSL – SSL (S-S)
• Homology – SSL (H-S)
• Co-expressed – SSL (X-S)
• Physical interaction – SSL (P-S)
• 2 physical interactions (P-P)
S: Synthetic sickness or lethality (SSL)
H: Sequence homology
X: Correlated expression
P: Stable physical interaction
Wong, et al., PNAS 2004
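A 2-hop predictor asks whether v and w are joined through some intermediate gene by two edges of given types. The sketch below only enumerates which type pairs connect a given pair of genes; how Wong et al. weight or combine them is not shown on the slide and is not modeled here. The edge list, gene names, and function name are hypothetical.

```python
from collections import defaultdict

def two_hop_types(edges, v, w):
    """Return the set of edge-type pairs found on paths v - u - w."""
    typed = defaultdict(lambda: defaultdict(set))
    for a, b, kind in edges:
        typed[a][b].add(kind)
        typed[b][a].add(kind)
    found = set()
    for u, first_kinds in typed[v].items():
        if u == w or w not in typed[u]:
            continue
        for t1 in first_kinds:
            for t2 in typed[u][w]:
                # Put "S" last so labels read like the slide's H-S, X-S, P-S.
                pair = sorted((t1, t2), key=lambda t: (t == "S", t))
                found.add("-".join(pair))
    return found

# Hypothetical typed edges: (gene, gene, type) with types S, H, X, P.
edges = [("g1", "g3", "S"), ("g3", "g2", "S"),
         ("g1", "g4", "P"), ("g4", "g2", "S"),
         ("g1", "g5", "X"), ("g5", "g2", "S")]
support = two_hop_types(edges, "g1", "g2")
```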
Multi-color motifs
[Figure: examples of a network motif and a network theme among the genes Hir1, Hir2, Hhf1, Hht1, Hhf2, Hht2, Hta1, Hta2, Htb1 and Htb2, with edges labeled by type]
Counts for the two motifs shown:
Nreal: 5.6×10^3 and 1.5×10^4
Nrand: (4.3±0.5)×10^2 and (3.6±0.2)×10^3
Edge types:
S: Synthetic sickness or lethality
H: Sequence homology
X: Correlated expression
P: Stable physical interaction
R: Transcriptional regulation
Zhang, et al., Journal of Biology 2005
SSL “hubs” might be good cancer drug targets
• Normal cell: alive
• Cancer cells w/ random mutations: dead
(Tong et al, Science, 2004)
Predict protein function from function of neighboring proteins
• “Guilt by association”
• Consider immediate neighbors
– Schwikowski, et al., Nature Biotechnology 2001
• Consider a given radius
– Hishigaki, et al., Yeast 2001
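The immediate-neighbor idea can be sketched as a simple vote: count how often each function appears among a protein's annotated neighbors and report the most frequent. This illustrates the principle only and is not the scoring used in either cited paper. The protein names and annotations are made up.

```python
from collections import Counter

def predict_function(adj, annotations, protein, top=1):
    """Rank functions by how many direct neighbors of `protein` carry them."""
    votes = Counter()
    for nbr in adj[protein]:
        votes.update(annotations.get(nbr, ()))
    return [function for function, _ in votes.most_common(top)]

# Hypothetical data: p1 is unannotated, its neighbors are annotated.
adj = {"p1": {"p2", "p3", "p4"}, "p2": {"p1"}, "p3": {"p1"}, "p4": {"p1"}}
annotations = {"p2": ["transport"],
               "p3": ["transport", "signaling"],
               "p4": ["transport"]}
guess = predict_function(adj, annotations, "p1")   # ["transport"]
```

Extending the vote from direct neighbors to all proteins within a given radius gives the second variant on the slide.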
Predict protein function from neighboring proteins (2)
• Minimize interactions between proteins with different annotations
– Vazquez, et al., Nature Biotechnology 2003
– Karaoz, et al., PNAS 2004
• Use network flow algorithm to “transport” function annotation
– Nabieva, et al., Bioinformatics 2005
Lethality
• Hubs are more likely to be essential
Jeong, et al., Nature 2001
Degree anti-correlation
• Few edges directly between hubs
• Edges between hubs and low-degree genes are favored
Maslov and Sneppen, Science 2002
Beware of bias
Protein abundance
• Abundant proteins are
– more likely to be represented in some types of experiments
– more likely to be essential
• Correlation between degree (hubs) and essentiality disappears or is reduced when corrected for protein abundance
Bloom and Adami, BMC Evolutionary Biology 2003
Degree correlation
• Anti-correlation of degrees of interacting proteins disappears in un-biased data
[Figure: average degree K1 versus degree k, for essential and non-essential proteins]
Coulomb, et al., Proceedings of the Royal Society B 2005
Community structure
Partitioning methods
Community structure
• Proteins in a community may be involved in a common process or function
Finding the communities
• Hierarchical clustering
• “Betweenness” centrality
• Dense subgraphs
• Similar subgraphs
• Spectral clustering
• Party and date hubs
Hierarchical clustering (1)
Using natural edge weights
• Gene co-expression
• e.g., Eisen MB, et al., PNAS 1998
from www.medscape.com
Hierarchical clustering (2)
Topological overlap
• A measure of neighborhood similarity
l_{i,j} is 1 if there is a direct link between i and j, 0 otherwise
Ravasz, et al., Science 2002
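The overlap formula itself did not survive in this transcript. The sketch below therefore assumes one commonly cited form: the number of neighbors shared by i and j, plus l_{i,j}, divided by the smaller of the two degrees. Treat it as an illustration of neighborhood similarity rather than the exact expression on the slide.

```python
def topological_overlap(adj, i, j):
    """(shared neighbors + l_ij) / min(k_i, k_j); an assumed form, see text."""
    l_ij = 1 if j in adj[i] else 0
    shared = len(adj[i] & adj[j])
    smallest = min(len(adj[i]), len(adj[j]))
    return (shared + l_ij) / smallest if smallest else 0.0

# Hypothetical path graph a - b - c - d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
overlap_bc = topological_overlap(path, "b", "c")   # linked, nothing shared
overlap_ad = topological_overlap(path, "a", "d")   # no link, nothing shared
```

Hierarchical clustering can then treat one minus the overlap as a distance between nodes.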
Hierarchical clustering (3)
Adjacency vector
• Function cluster: Tong et al., Science 2004
• Find drug targets: Parsons et al., Nature Biotechnology 2004
“Betweenness” centrality
• Consider the shortest path(s) between all pairs of nodes
• “Betweenness” centrality of an edge is a measure of how many shortest paths traverse this edge
• Edges between communities have higher centrality
Girvan, et al., PNAS 2002
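Edge betweenness can be sketched with one breadth-first search per node, splitting credit evenly when several shortest paths tie (a Brandes-style accumulation). This is a swapped-in standard technique for unweighted graphs, not code from the cited paper. The two-triangle graph is a made-up example.

```python
from collections import defaultdict, deque

def edge_betweenness(adj):
    """Shortest-path count per undirected edge, ties shared fractionally."""
    score = defaultdict(float)
    for s in adj:
        dist = {s: 0}
        sigma = defaultdict(float)   # number of shortest paths from s
        sigma[s] = 1.0
        preds = defaultdict(list)
        order, queue = [], deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = defaultdict(float)
        for w in reversed(order):    # farthest nodes first
            for u in preds[w]:
                share = sigma[u] / sigma[w] * (1.0 + delta[w])
                score[frozenset((u, w))] += share
                delta[u] += share
    # Every unordered pair was counted once from each endpoint.
    return {edge: value / 2 for edge, value in score.items()}

# Hypothetical graph: triangles a-b-c and d-e-f joined by the bridge c-d.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
       "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
betweenness = edge_betweenness(toy)
bridge = max(betweenness, key=betweenness.get)   # the c-d edge
```

Repeatedly removing the highest-scoring edge and recomputing splits the graph into communities. In the toy graph the first edge removed would be the bridge.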
Dense subgraphs
• Spirin and Mirny, PNAS 2003
– Find fully connected subgraphs (cliques), OR
– Find subgraphs that maximize density: 2m / (n(n-1))
• Bader and Hogue, BMC Bioinformatics 2003
– Weight vertices by neighborhood density, connectedness
– Find connected communities with high weights
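The density formula is the number of edges m inside a set of n nodes divided by the maximum possible, n(n-1)/2. A minimal sketch, assuming `{node: set(neighbors)}` storage and a made-up graph:

```python
def subgraph_density(adj, nodes):
    """2m / (n (n - 1)) for the subgraph induced by `nodes`."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    # Each internal edge is seen from both of its endpoints.
    m = sum(len(adj[u] & nodes) for u in nodes) // 2
    return 2 * m / (n * (n - 1))

# Hypothetical graph: triangle a-b-c with a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
clique_density = subgraph_density(toy, {"a", "b", "c"})        # 1.0
looser_density = subgraph_density(toy, {"a", "b", "c", "d"})   # 4 of 6 edges
```

A clique has density 1, so the clique search on the slide is the special case in which the density is required to be exactly 1.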
Similar subgraphs
• Across species
• Interaction network and genome sequence
• e.g., Ogata, et al., Nucleic Acids Research 2000
Spectral clustering
• Compute adjacency matrix eigenvectors
• Each eigenvector defines a cluster:
– Proteins with high magnitude contributions
[Figure: example clusters for a positive eigenvalue and for a negative eigenvalue]
Bu, et al., Nucleic Acids Research 2003
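As a limited sketch of the idea, the code below finds only the leading eigenvector of the adjacency matrix by power iteration and reads off the nodes with the largest-magnitude entries. A full analysis would compute many eigenvectors, including those with negative eigenvalues, usually with a linear-algebra library. The identity shift, the iteration count, and the toy graph are assumptions made here for illustration.

```python
def leading_eigenvector(adj, iterations=200):
    """Power iteration on (A + I); the eigenvector is the same as for A."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # The added identity term prevents oscillation on bipartite graphs.
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = sum(val * val for val in y.values()) ** 0.5
        x = {v: val / norm for v, val in y.items()}
    return x

def top_contributors(vector, size):
    """Nodes with the largest-magnitude entries in the eigenvector."""
    ranked = sorted(vector, key=lambda v: abs(vector[v]), reverse=True)
    return set(ranked[:size])

# Hypothetical graph: a 4-clique a, b, c, d with a tail d - e - f - g.
toy = {"a": {"b", "c", "d"}, "b": {"a", "c", "d"}, "c": {"a", "b", "d"},
       "d": {"a", "b", "c", "e"}, "e": {"d", "f"}, "f": {"e", "g"},
       "g": {"f"}}
cluster = top_contributors(leading_eigenvector(toy), 4)
```

In this toy graph the four largest entries pick out the clique, which is the kind of dense cluster a positive-eigenvalue eigenvector tends to highlight.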
Party and date hubs
• Protein interaction network
• Partition hubs by expression correlation of neighbors
Han, et al., Nature 2004
Network connectivity
• Scale-free networks are:
– Robust to random failures
– Vulnerable to attacks on hubs
• Removing hubs quickly disconnects a network and reduces the size of the largest component
Albert, et al., Nature 2000
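The contrast between random failure and targeted attack can be sketched by deleting nodes and measuring the largest connected component that remains. The double-star graph below is a deliberately extreme, made-up example, and the helper names are illustrative.

```python
import random

def largest_component(adj, removed=()):
    """Size of the largest connected component after deleting `removed`."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best

def hubs(adj, count):
    """The `count` highest-degree nodes."""
    return sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:count]

# Hypothetical network: two linked hubs, each with five leaves (12 nodes).
net = {"h1": {"h2"}, "h2": {"h1"}}
for hub, prefix in (("h1", "a"), ("h2", "b")):
    for i in range(5):
        leaf = f"{prefix}{i}"
        net[hub].add(leaf)
        net[leaf] = {hub}

intact = largest_component(net)                    # 12
after_attack = largest_component(net, hubs(net, 2))  # only isolated leaves remain
after_failure = largest_component(net, random.Random(0).sample(sorted(net), 2))
```

Removing the two hubs leaves only isolated nodes. Removing two leaves instead barely changes the largest component.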
Removing date hubs shatters network into communities
[Figure: panels labeled “Date Hubs”, “Many sub-networks”, and “A single main component”]
Temporal partitioning
Luscombe, et al., Nature 2004
Final words
• Network analysis has become an essential tool for analyzing complex systems
– There is still much biologists can learn from scientists in other disciplines
• The references mentioned are representative, and not comprehensive