Biological Networks
Feng Luo
1
Copyright notice
• Many of the images in this PowerPoint
presentation are from other people. The
copyright belongs to the original
authors. Thanks!
2
Biological Networks
Biological Systems
Made of many non-identical elements
that interact with each other in diverse ways.
Biological Networks
Biological networks as framework for the study of biological systems
3
Why Study Networks?
• It is increasingly recognized that complex
systems cannot be described in a
reductionist view.
• Understanding the behavior of such
systems starts with understanding the
topology of the corresponding network.
• Topological information is fundamental in
constructing realistic models for the
function of the network.
4
Graph Terminology
Node
Edge
Directed/Undirected
Degree
Shortest Path/Geodesic distance
Neighborhood
Subgraph
Complete Graph
Clique
Degree Distribution
Hubs
5
Types of Biological Networks
• Protein interaction networks
• Gene regulatory networks
• Metabolic networks
• Gene co-expression networks
• Signal transduction networks
• Genetic interaction networks
6
Protein Interactions
P. Uetz, et al. Nature, 2000; Ito et al., PNAS, 2001; …
7
Protein Interaction Network
Nodes: proteins
Links: physical interactions
(Jeong et al., 2001)
8
9
Metabolic network (KEGG)
Graph
Node: object, e.g. chemical compound
Edge: relation between objects, e.g. chemical reaction
10
Metabolic Network
Nodes: chemicals (substrates)
Links: chemical reactions
11
Metabolic Network
Nodes: chemicals (substrates)
Links: chemical reactions
(Ravasz et al., 2002)
12
Gene Regulation
• Proteins are encoded by the DNA of the organism.
• Proteins regulate the expression of other proteins by
interacting with the DNA.
[Figure labels: protein; inducer (external signal); DNA; promoter region (ACCGTTGCAT); coding region]
Activators increase gene production
[Figure labels: activator X; X binding site; gene Y; no transcription; Sx; X* (bound activator); increased transcription of Y]
Repressors decrease gene production
[Figure labels: X; Sx; X* (bound repressor); no transcription; unbound repressor X; Y]
Gene Regulatory Networks
• Nodes are proteins (or the genes that encode them)
[Figure: nodes X and Y]
The gene regulatory network of E. coli
Shen-Orr et al., Nature Genetics, 2002
• Shallow network, few long cascades
• Modular
• Compact in-degree, scale-free out-degree (promoter size limitation)
Gene regulatory networks
18
Co-expression Network Revealed
from Yeast Cell Cycle Data
[Figure: co-expression network with numbered modules]
1. Protein fate
2. Amino acid synthesis
3. Galactose metabolism
4. Protein glycosylation and transport
5. Amino acid metabolism
6. Mating
7. Glucogenesis
8. Unknown
9. Cell cycle regulation
10. Stress response
11. Cell differentiation
12. Protein synthesis
13. Cell wall organization
14. Energy transport
15. Ribosomal biogenesis
Unnumbered labels: Y’-cluster; histone; cell wall organization; protein degradation; mitochondrion; ribosomal proteins
Yeast cell cycle microarray data (Spellman et al., 1998)
19
Signal transduction networks
(BD BioScience)
20
Properties of Biological Networks
• Scale-free
• Small world
• Hierarchical
• Modular
• Robust
• Motif
21
Scale-Free Network
• Degree of a node: the number of adjacent nodes
• Degree distribution P(k): frequency of nodes with degree k
• Scale-free network: P(k) follows a power law, $P(k) \sim k^{-\gamma}$, different from random networks
[Figure: example nodes with degree = 5, degree = 2, and degree = 3]
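As an illustration of the two definitions on this slide, here is a minimal Python sketch that computes node degrees and the empirical degree distribution P(k) of an undirected graph. The toy edge list and the function names are made up for this note and are not taken from the slides.

```python
from collections import Counter

def degrees(edges):
    """Return {node: degree} for an undirected edge list (isolated nodes are not listed)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def degree_distribution(edges):
    """Return {k: fraction of nodes with degree k}."""
    deg = degrees(edges)
    counts = Counter(deg.values())
    n = len(deg)
    return {k: c / n for k, c in sorted(counts.items())}

# Hypothetical toy graph: node "a" has five neighbours, plus one extra edge b-c.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"), ("a", "f"), ("b", "c")]
print(degrees(edges))
print(degree_distribution(edges))
```

On a real network, plotting P(k) against k on log-log axes is the usual first check for a power-law tail.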

22
Erdős-Rényi model
(1960)
Connect each pair of nodes with
probability p
[Figure: example with N = 10, p = 1/6, $\langle k \rangle \approx 1.5$]
Pál Erdős
(1913-1996)
Poisson distribution
- Democratic
- Random
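A minimal sketch of the random-graph construction, assuming the usual G(N, p) reading of the slide: every pair of nodes is linked independently with probability p. The function names are illustrative only.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=None):
    """Return the edge list of a G(n, p) random graph on nodes 0..n-1."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]

def mean_degree(n, edges):
    """Average degree: each edge contributes to two nodes."""
    return 2 * len(edges) / n

n, p = 10, 1 / 6
expected_mean_degree = p * (n - 1)  # about 1.5 for the slide's N = 10, p = 1/6
sample = erdos_renyi(n, p, seed=42)
print(expected_mean_degree, mean_degree(n, sample))
```

A single small sample will scatter around the expected mean degree; averaging over many samples brings the estimate close to p(N - 1).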
23
SCALE-FREE NETWORKS
(1) The number of nodes (N) is NOT fixed.
Networks continuously expand by
the addition of new nodes
Examples:
WWW : addition of new documents
Citation : publication of new papers
(2) The attachment is NOT uniform.
A node is linked with higher probability to a node
that already has a large number of links.
Examples :
WWW : new documents link to well-known sites
(CNN, Yahoo, New York Times, etc.)
Citation : well-cited papers are more likely to be cited again
24
Scale-free model
(1) GROWTH:
At every timestep we add a new node with m edges
(connected to the nodes already present in the system).
(2) PREFERENTIAL ATTACHMENT:
The probability Π that a new node will be connected to
node i depends on the connectivity k_i of that node:
$\Pi(k_i) = \frac{k_i}{\sum_j k_j}$
$P(k) \sim k^{-3}$
A.-L. Barabási & R. Albert, Science, 1999
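The two ingredients (growth and preferential attachment) can be sketched in a few lines of Python. This is a simplified illustration, not the authors' code: the seed configuration (m initial nodes with no links) and the rule that each new node's m targets are distinct are assumptions made here.

```python
import random

def barabasi_albert(n, m, seed=None):
    """Grow a graph to n nodes; each new node attaches m edges preferentially."""
    if m < 1 or n <= m:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    edges = []
    # 'stubs' lists each node once per incident edge, so a uniform draw
    # from it selects node i with probability k_i / sum_j k_j.
    stubs = []
    # Assumption: the first new node links to all m seed nodes.
    targets = list(range(m))
    for new in range(m, n):
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
        # Pick m distinct targets for the next node, proportional to degree.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(stubs))
        targets = sorted(chosen)
    return edges

edges = barabasi_albert(1000, 2, seed=0)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
print(max(degree.values()), sum(degree.values()) / len(degree))
```

The printed maximum degree is typically far above the mean, which is the hub signature that a random graph of the same size lacks.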
25
Metabolic network
Archaea
Bacteria
Eukaryotes
The metabolic networks of organisms
from all three
domains of life
are scale-free!
26
H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai, and A.L. Barabasi, Nature, 2000
Topology of the protein network
$P(k) \sim (k + k_0)^{-\gamma} \exp\left(-\frac{k + k_0}{k_\tau}\right)$
H. Jeong, S.P. Mason, A.-L. Barabasi & Z.N. Oltvai, Nature, 2001
27
Nature 408 307 (2000)
28
p53 network (mammals)
29
Local clustering
Clustering: My friends will likely know each other!
Networks are clustered [large C]
30
Clustering Coefficient
The density of the network
surrounding node I, characterized as
the number of triangles through I.
Related to network modularity
$C_I = \frac{n_I}{\binom{k}{2}} = \frac{2 n_I}{k(k-1)}$
k: number of neighbors of I
n_I: number of edges between node I’s neighbors
Example: the center node has 8 (grey) neighbors
and there are 4 edges between the neighbors, so
C = 4 /((8*(8-1)) /2) = 4/28 = 1/7
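The worked example on this slide (8 neighbors, 4 edges among them) can be checked with a short Python sketch. The adjacency-dictionary representation and the particular choice of which neighbors are linked are assumptions for illustration; nodes with fewer than two neighbors are given C = 0 by convention here.

```python
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def clustering_coefficient(adj, node):
    """C_I = 2 * n_I / (k * (k - 1)), with n_I the edges among the neighbours."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention assumed here; the formula is undefined for k < 2
    n_i = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2 * n_i / (k * (k - 1))

# Centre node 0 with 8 neighbours (1..8) and 4 edges between those neighbours.
edges = [(0, i) for i in range(1, 9)] + [(1, 2), (3, 4), (5, 6), (7, 8)]
adj = build_adjacency(edges)
print(clustering_coefficient(adj, 0))  # 4/28 = 1/7, about 0.143
```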
31
Shortest-Path between nodes
32
Shortest-Path between nodes
33
Small-world Network
• Every node can be reached from every other by
a small number of hops or steps
• High clustering coefficient and low mean shortest path length
– Random graphs don’t necessarily have high clustering
coefficients
• Social networks, the Internet, and biological
networks all exhibit small-world network
characteristics
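To make "low mean shortest path length" concrete, here is a hedged Python sketch that measures it by breadth-first search on an unweighted, undirected graph. The 6-node ring and the added shortcut are invented examples; unreachable pairs are simply skipped, which is one of several possible conventions.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def mean_shortest_path(adj):
    """Average hop count over all ordered pairs of distinct, connected nodes."""
    total = pairs = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

# Hypothetical example: a ring of 6 nodes, then the same ring with one shortcut.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
shortcut = {u: set(vs) for u, vs in ring.items()}
shortcut[0].add(3)
shortcut[3].add(0)
print(mean_shortest_path(ring), mean_shortest_path(shortcut))
```

A single long-range link already shortens the average path, which is the intuition behind small-world behavior.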
34
Modularity in Cellular Networks
• Hypothesis:
Biological functions are carried out by discrete functional modules.
Hartwell, L. H., Hopfield, J. J., Leibler, S., & Murray, A. W., Nature, 1999.
• Traditional view of modularity:
35
Modular vs. Scale-free Topology
(a)
Scale-free
(b)
Modular
36
How do we know that metabolic networks are modular?
• clustering coefficient is the same across metabolic networks in
different species with the same substrate
• corresponding randomized scale-free network:
$C(N) \sim N^{-0.75}$ (simulation, no analytical result)
[Figure legend: bacteria; archaea (extreme-environment single-cell organisms); eukaryotes (plants, animals, fungi, protists); scale-free network of the same size]
37
Real Networks Have a Hierarchical Topology
What does it mean?
Many highly connected small clusters
combine into
few larger but less connected clusters
combine into
even larger and even less connected clusters
• The degree of clustering follows:
38
Properties of hierarchical networks
1. Scale-free
2. Clustering coefficient independent of N
3. Clustering coefficient scales
39
Hierarchy in biological systems
Metabolic networks
Protein networks
40
Can we identify the modules?
Topological overlap:
$O_T(i,j) = \frac{J(i,j)}{\min(k_i, k_j)}$
J(i,j): number of nodes that both i and j link to; +1 if there is a direct (i,j) link
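The topological overlap defined on this slide translates directly into code. A minimal sketch, assuming an undirected graph stored as adjacency sets; the toy graph and function name are illustrative, and pairs involving an isolated node are given overlap 0 by convention here.

```python
def topological_overlap(adj, i, j):
    """O_T(i, j) = J(i, j) / min(k_i, k_j).

    J(i, j) counts the common neighbours of i and j, plus 1 if i and j
    are directly linked.
    """
    common = len((adj[i] & adj[j]) - {i, j})
    direct = 1 if j in adj[i] else 0
    k_min = min(len(adj[i]), len(adj[j]))
    if k_min == 0:
        return 0.0  # convention assumed here for isolated nodes
    return (common + direct) / k_min

# Hypothetical toy graph: a triangle a-b-c attached via c-d to a small star at d.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e", "f"},
    "e": {"d"},
    "f": {"d"},
}
print(topological_overlap(adj, "a", "b"))  # inside the triangle
print(topological_overlap(adj, "c", "d"))  # bridge between the two groups
```

Pairs inside a tightly knit group score high and pairs joined only by a bridge score low, which is why clustering the overlap matrix can expose modules.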
41
Modules in the E. coli metabolism
E. Ravasz et al., Science, 2002
42
Robustness
Complex systems maintain their basic functions
even under errors and failures
(cell → mutations; Internet → router breakdowns)
[Figure: node failure. S (from 0 to 1) plotted against the fraction of removed nodes, f (up to 1), with threshold fc]
43
Robustness of scale-free networks
Failures: topological error tolerance
$\gamma \le 3$: $f_c = 1$ (R. Cohen et al., PRL, 2000)
Attacks
R. Albert et al., Nature, 2000
[Figure: S (from 0 to 1) versus f, with threshold fc, for failures and attacks]
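The contrast between failures and attacks can be explored with a small Python sketch. It assumes that S denotes the fraction of the original nodes left in the largest connected component, that a failure removes nodes uniformly at random, and that an attack removes the highest-degree nodes first (ranked once, by their initial degree). All names are illustrative.

```python
import random
from collections import deque

def largest_component_fraction(adj, removed):
    """S: share of the original nodes in the largest component after removal."""
    alive = set(adj) - set(removed)
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        size, queue = 0, deque([start])
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best / len(adj)

def random_failure(adj, f, seed=None):
    """Choose a fraction f of the nodes uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(sorted(adj), int(f * len(adj)))

def targeted_attack(adj, f):
    """Choose the fraction f of nodes with the highest initial degree."""
    ranked = sorted(adj, key=lambda u: len(adj[u]), reverse=True)
    return ranked[: int(f * len(adj))]

# Extreme toy case: a star with hub 0 and nine leaves.
star = {0: set(range(1, 10))}
star.update({leaf: {0} for leaf in range(1, 10)})
print(largest_component_fraction(star, [1]))                          # one leaf fails
print(largest_component_fraction(star, targeted_attack(star, 0.1)))   # hub attacked
```

Losing a leaf barely changes S, while removing the hub shatters the star; scale-free networks show a milder version of the same asymmetry.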
44
Path Length
Attack Tolerance
45
Yeast protein network
- lethality and topological position -
Highly connected proteins are more essential (lethal)...
H. Jeong, S.P. Mason, A.-L. Barabasi & Z.N. Oltvai, Nature, 2001
46
Network Motifs
47
Network motifs
• Comparable to electronic circuit types (i.e.,
logic gates)
• The notion of motif, widely used for sequence
analysis, is generalizable to the level of
networks.
• Network Motifs are defined as recurring
patterns of interconnections found within
networks at frequencies much higher than
those found in randomized networks.
48
Random vs designed/evolved
features
• Large networks may contain information about
design principles and/or evolution of the
complex system
• Which features are there for a reason?
– Design principles (e.g. feed-forward loops)
– Constraints (e.g. all nodes on the Internet must
be connected to each other)
– Evolution, growth dynamics (e.g. network growth is
mainly due to gene duplication)
49
Network motifs
• Uri Alon et al.: “Network Motifs: Simple Building
Blocks of Complex Networks”; Science, 2002.
• Different networks were found to have different
motif abundances.
• The motifs reflect the underlying processes that
generate each type of network.
50
Motifs in the network
graph
motif to be found
motif matches in the target graph
51
http://mavisto.ipk-gatersleben.de/frequency_concepts.html
Detecting network motifs
There are three main tasks in detecting
network motifs:
(1) Generating an ensemble of proper
random networks
(2) Counting the subgraphs in the real
network and in random networks
(3) Searching for subgraphs that appear
disproportionately often in one list vs. the other
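The three tasks can be sketched for a single candidate motif, the feed-forward loop (x→y, y→z, x→z). This is an illustrative Python outline under stated assumptions, not the published procedure: the random ensemble is generated by degree-preserving edge swaps, the graph is directed with no self-loops or duplicate edges, and significance is summarized as a z-score.

```python
import random
from statistics import mean, pstdev

def count_ffl(edges):
    """Task 2: count feed-forward loops x->y, y->z, x->z in a directed graph."""
    edge_set = set(edges)
    out = {}
    for u, v in edge_set:
        out.setdefault(u, set()).add(v)
    return sum(
        1
        for x, y in edge_set
        for z in out.get(y, ())
        if z != x and (x, z) in edge_set
    )

def randomize(edges, n_swaps, rng):
    """Task 1: swap edge targets, keeping every node's in- and out-degree."""
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Reject swaps that would create a self-loop or a duplicate edge.
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

def motif_z_score(edges, n_random=100, seed=0):
    """Task 3: compare the real count with the random ensemble."""
    rng = random.Random(seed)
    real = count_ffl(edges)
    counts = [count_ffl(randomize(edges, 10 * len(edges), rng)) for _ in range(n_random)]
    mu, sigma = mean(counts), pstdev(counts)
    z = (real - mu) / sigma if sigma > 0 else None  # undefined if no spread
    return real, mu, z
```

The number of swaps per random network (10 per edge) and the ensemble size are arbitrary choices for the sketch.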
52
All 3-node connected
subgraphs
• 13 different isomorphism types of 3-node connected directed subgraphs
• There are:
199 4-node subgraphs,
9,364 5-node subgraphs,
etc.
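These counts can be reproduced by brute force for small n. The Python sketch below assumes the subgraphs are directed, without self-loops, and connected when edge directions are ignored. It enumerates every edge set and keeps one canonical representative per isomorphism class; the approach is only practical for n up to about 4.

```python
from itertools import combinations, permutations

def weakly_connected(n, edges):
    """True if the graph on nodes 0..n-1 is connected ignoring edge direction."""
    adj = {u: set() for u in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    stack, seen = [0], {0}
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == n

def count_connected_digraphs(n):
    """Number of non-isomorphic weakly connected directed graphs on n nodes."""
    nodes = range(n)
    possible = [(u, v) for u in nodes for v in nodes if u != v]
    perms = list(permutations(nodes))
    classes = set()
    for r in range(n - 1, len(possible) + 1):  # at least n-1 edges to connect
        for edges in combinations(possible, r):
            if not weakly_connected(n, edges):
                continue
            # Canonical form: the smallest relabelled, sorted edge list.
            canon = min(tuple(sorted((p[u], p[v]) for u, v in edges)) for p in perms)
            classes.add(canon)
    return len(classes)

print(count_connected_digraphs(3))  # 13
print(count_connected_digraphs(4))  # 199
```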
53
Motifs detected
• Two significant motifs appearing numerous
times in non-homologous gene systems that
perform diverse biological functions
54
Motifs II
S. Wuchty, Z. Oltvai & A.-L. Barabasi, Nature Genetics, 2003
55
Probabilistic algorithm for subgraph
sampling
The problem :
•Exhaustive subgraph enumeration complexity scales as # of subgraphs
•Exponential in subgraph size
•Infeasible for large networks with hubs
Solution :
An efficient sampling algorithm
Probabilistic algorithm for subgraph
sampling
• Instead of examining absolute subgraph counts
we define the subgraph concentration:
$C_i = \frac{\text{number of } n\text{-node connected subgraphs of type } i}{\text{total number of all } n\text{-node connected subgraphs}}$
• Sampling algorithm:
Different probabilities of
sampling different subgraphs
58
Weight of each sample corrects for its
sampling probability
[Figure: two sampled subgraphs, one with P = 0.33, W = 3 and one with P = 0.14, W = 7]
$C_i = \frac{\sum \text{weighted samples of subgraph } i}{\sum \text{weighted samples of all } n\text{-node subgraph types}}$
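The weighting step alone can be sketched as follows. This Python fragment assumes each sample arrives as a pair (subgraph type, probability with which it was sampled) and uses W = 1/P, which matches the figure's P = 0.33, W = 3 and P = 0.14, W = 7. It does not implement the sampling procedure itself; the type labels "A" and "B" are placeholders.

```python
from collections import defaultdict

def estimate_concentrations(samples):
    """Estimate C_i from (subgraph_type, sampling_probability) pairs."""
    weighted = defaultdict(float)
    for subgraph_type, prob in samples:
        weighted[subgraph_type] += 1.0 / prob  # weight W = 1 / P
    total = sum(weighted.values())
    return {t: w / total for t, w in weighted.items()}

# Hypothetical samples: one of type "A" drawn with P = 1/3,
# two of type "B" drawn with P = 1/7 each.
samples = [("A", 1 / 3), ("B", 1 / 7), ("B", 1 / 7)]
print(estimate_concentrations(samples))
```

Subgraphs that are unlikely to be sampled receive a larger weight, so the estimate is not biased toward the easy-to-reach ones.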
Rapid convergence to real
concentration
Kashtan et al., Bioinformatics, 2004
Runtime almost independent of
network size
Kashtan et al., Bioinformatics, 2004