Download Protein interaction networks: a mathematical approach A Annibale June 26, 2013

Document related concepts

Airborne Networking wikipedia , lookup

Transcript
Protein interaction networks: a mathematical
approach
A Annibale
June 26, 2013
Outline
1
Networks as infrastructure of signalling processes
2
Graphs: definitions and topology measures
3
Tailored random graph ensembles
4
Generation of graphs
5
Current research: Sampling from Graphs and inference
6
Conclusions
Outline
1
Networks as infrastructure of signalling processes
2
Graphs: definitions and topology measures
3
Tailored random graph ensembles
4
Generation of graphs
5
Current research: Sampling from Graphs and inference
6
Conclusions
Protein interaction networks (PIN)
Signalling processes
Protein interaction networks (PIN)
Signalling processes
Transport
Protein interaction networks (PIN)
Signalling processes
Transport
Protein complexes
(interact for a long time)
Protein interaction networks (PIN)
Signalling processes
Transport
Protein complexes
(interact for a long time)
Protein modification
(e.g. protein kinase)
Networks as infrastructure of signaling processes
Errors in e.g. cellular information
processing are responsible for
diseases as cancer, autoimmunity
or diabetes
Networks as infrastructure of signaling processes
Errors in e.g. cellular information
processing are responsible for
diseases as cancer, autoimmunity
or diabetes
Understand how changes in structure
of underlying signalling networks
may affect the flow of information
Normal or abnormal?
Tools to quantify structure of a network
Normal or abnormal?
Tools to quantify structure of a network
Compare networks, detect abnormalities:
Normal or abnormal?
Tools to quantify structure of a network
Compare networks, detect abnormalities:
Is this signalling network abnormal?
Normal or abnormal?
Tools to quantify structure of a network
Compare networks, detect abnormalities:
Is this signalling network abnormal?
Assess significance of an observed pattern: “trivial” or
“special” element of structure?
Normal or abnormal?
Tools to quantify structure of a network
Compare networks, detect abnormalities:
Is this signalling network abnormal?
Assess significance of an observed pattern: “trivial” or
“special” element of structure?
Crucially, our answers depend on our model of
“normal”/“random”!
Normal or abnormal?
Tools to quantify structure of a network
Compare networks, detect abnormalities:
Is this signalling network abnormal?
Assess significance of an observed pattern: “trivial” or
“special” element of structure?
Crucially, our answers depend on our model of
“normal”/“random”!
Define good “reference” or Null models
Normal or abnormal?
Tools to quantify structure of a network
Compare networks, detect abnormalities:
Is this signalling network abnormal?
Assess significance of an observed pattern: “trivial” or
“special” element of structure?
Crucially, our answers depend on our model of
“normal”/“random”!
Define good “reference” or Null models
Generate null models as ‘proxies’ to analyse processes:
hypothesis testing and prediction
Outline
1
Networks as infrastructure of signalling processes
2
Graphs: definitions and topology measures
3
Tailored random graph ensembles
4
Generation of graphs
5
Current research: Sampling from Graphs and inference
6
Conclusions
Definitions
size N : number of ’nodes’, i, j, k = 1...N
Definitions
size N : number of ’nodes’, i, j, k = 1...N
1 if link i −−j;
’links’:
cij =
0 otherwise

Connectivity matrix




c=




c11
c21
..
.
c12
c22
..
.
···
···
..
.
c1j
c2j
..
.
···
···
..
.
c1N
c2N
..
.
ci1
..
.
ci2
..
.
···
..
.
cij
..
.
···
..
.
ciN
..
.
cN j
···
cN N
cN 1 cN 2 · · ·










Definitions
size N : number of ’nodes’, i, j, k = 1...N
1 if link i −−j;
’links’:
cij =
0 otherwise

Connectivity matrix




c=




c11
c21
..
.
c12
c22
..
.
···
···
..
.
c1j
c2j
..
.
···
···
..
.
c1N
c2N
..
.
ci1
..
.
ci2
..
.
···
..
.
cij
..
.
···
..
.
ciN
..
.
cN j
···
cN N
cN 1 cN 2 · · ·
symmetry, i.e. graph undirected cij = cji ∀ i, j










Definitions
size N : number of ’nodes’, i, j, k = 1...N
1 if link i −−j;
’links’:
cij =
0 otherwise

Connectivity matrix




c=




c11
c21
..
.
c12
c22
..
.
···
···
..
.
c1j
c2j
..
.
···
···
..
.
c1N
c2N
..
.
ci1
..
.
ci2
..
.
···
..
.
cij
..
.
···
..
.
ciN
..
.
cN j
···
cN N
cN 1 cN 2 · · ·
symmetry, i.e. graph undirected cij = cji ∀ i, j
no self-interaction, i.e. cii = 0 ∀i










1
3
•
2
•
•
•
4

0
 1
c=
 1
0
1
0
0
1
1
0
0
1

0
1 

1 
0
1
3
•
2
•
•
•
4

0
 1
c=
 1
0
So, how different are these two?
1
0
0
1
1
0
0
1

0
1 

1 
0
1
3
•
2
•
•
•
4

0
 1
c=
 1
0
So, how different are these two?
Need tools to quantify structure!
1
0
0
1
1
0
0
1

0
1 

1 
0
Local structure measures
degree ki (nr of partners): ki =
2
•
•
3
@
@
@•
ki = 4
i
1
•
•4
5
•
A
A
A•
7
6
•
P
j cij
Local structure measures
degree ki (nr of partners): ki =
2
•
•
3
@
@
@•
ki = 4
i
1
•
•4
P
j cij
ki = ci1 + ci2 + ci3 + ci4 + ci5 + ci6
= 1 + 1 + 0 + 1 + 1+ 0
5
•
A
A
A•
7
6
•
Local structure measures
degree ki (nr of partners): ki =
2
•
•
3
@
@
@•
ki = 4
i
1
•
•4
P
j cij
ki = ci1 + ci2 + ci3 + ci4 + ci5 + ci6
= 1 + 1 + 0 + 1 + 1+ 0
5
•
A
A
A•
7
•
Only the neighbours of i contribute
6
Local structure measures
degree ki (nr of partners): ki =
2
•
•
3
@
@
@•
ki = 4
i
1
•
•4
P
j cij
ki = ci1 + ci2 + ci3 + ci4 + ci5 + ci6
= 1 + 1 + 0 + 1 + 1+ 0
5
•
A
A
A•
7
•
Only the neighbours of i contribute
6
(1)
generalized degrees: ki =
P
j cij ,
(2)
ki =
P
js cij cjs ,
etc
Local structure measures
degree ki (nr of partners): ki =
2
•
•
3
@
@
@•
ki = 4
i
1
•4
j cij
ki = ci1 + ci2 + ci3 + ci4 + ci5 + ci6
= 1 + 1 + 0 + 1 + 1+ 0
5
•
•
A
A
A•
7
•
Only the neighbours of i contribute
6
(1)
generalized degrees: ki =
(2)
ki :
P
P
j cij ,
(2)
ki =
P
js cij cjs ,
Contributions only from j, s such that
•
i
•
j
•
s
etc
Local structure measures
degree ki (nr of partners): ki =
2
•
•
3
@
@
@•
ki = 4
i
1
•4
ki = ci1 + ci2 + ci3 + ci4 + ci5 + ci6
5
•
A
A
A•
7
•
Only the neighbours of i contribute
6
(1)
generalized degrees: ki =
P
j cij ,
(2)
ki =
P
js cij cjs ,
Contributions only from j, s such that
•
i
(`)
j cij
= 1 + 1 + 0 + 1 + 1+ 0
•
(2)
ki :
P
•
j
•
s
(ki : nr of paths of length ` away from i)
etc
clustering coefficient Ci
P
j<k
P
Ci =
j<k cij cik
AA
2•
@
@
@•
i
1•
cij cik cjk
A•3
•
4
A
AA
clustering coefficient Ci
P
j<k
P
Ci =
=
j<k cij cik
number of connected pairs among neighbours of i
number of pairs among neighbours of i
AA
2•
@
@
@•
i
1•
cij cik cjk
A•3
•
4
A
AA
clustering coefficient Ci
P
j<k
P
Ci =
=
j<k cij cik
number of connected pairs among neighbours of i
number of pairs among neighbours of i
AA
2•
@
@
@•
i
1•
cij cik cjk
A•3
•
4
A
AA
→
Possible pairs among neighbours of i:
(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)
clustering coefficient Ci
P
j<k
P
Ci =
=
j<k cij cik
number of connected pairs among neighbours of i
number of pairs among neighbours of i
AA
2•
@
@
@•
i
1•
cij cik cjk
A•3
•
4
A
AA
→
Possible pairs among neighbours of i:
(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)
→
only (3, 4) is connected
clustering coefficient Ci
P
j<k
P
Ci =
=
cij cik cjk
j<k cij cik
number of connected pairs among neighbours of i
number of pairs among neighbours of i
AA
2•
@
@
@•
A•3
•
i
1•
Ci = 1/6
4
A
AA
→
Possible pairs among neighbours of i:
(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)
→
only (3, 4) is connected
Friendship network
How many friends
do you have?
Friendship network
How many friends
do you have?
Do your friends
know each other?
Friendship network
How many friends
do you have?
Do your friends
know each other?
How many are
the friends of
your friends ?
Friendship network
How many friends
do you have?
Do your friends
know each other?
How many are
the friends of
your friends ?
Answer: Calculate
Friendship network
How many friends
do you have?
Do your friends
know each other?
How many are
the friends of
your friends ?
Answer: Calculate
Degree ki
Friendship network
How many friends
do you have?
Do your friends
know each other?
How many are
the friends of
your friends ?
Answer: Calculate
Degree ki
Clustering Coefficient Ci
Friendship network
How many friends
do you have?
Do your friends
know each other?
How many are
the friends of
your friends ?
Answer: Calculate
Degree ki
Clustering Coefficient Ci
(2)
Generalized degree ki
Friendship network
How many friends
do you have?
Do your friends
know each other?
How many are
the friends of
your friends ?
Answer: Calculate
Degree ki
Clustering Coefficient Ci
(2)
Generalized degree ki
Impractical for large N !
Friendship network
from a “non-egocentric” point of view
How many friends do people
have on average?
Friendship network
from a “non-egocentric” point of view
How many friends do people
have on average?
Picking up at random one person
what is the probability of their
having 50 friends?
Friendship network
from a “non-egocentric” point of view
How many friends do people
have on average?
Picking up at random one person
what is the probability of their
having 50 friends?
What is the average path length
between two people in the world?
Friendship network
from a “non-egocentric” point of view
How many friends do people
have on average?
Picking up at random one person
what is the probability of their
having 50 friends?
What is the average path length
between two people in the world?
“Six degrees of separation”?
Friendship network
from a “non-egocentric” point of view
How many friends do people
have on average?
Picking up at random one person
what is the probability of their
having 50 friends?
What is the average path length
between two people in the world?
“Six degrees of separation”?
How likely is that hubs
interact with each other?
Global Structure Measures
Average degree hki =
1
N
P
i ki
Global Structure Measures
P
Average degree hki = N1 i ki
Degree distribution e.g. histogram of degree sequence
(k1 , k2 , ..., kN )
p(k)
N = 24
k
Global Structure Measures
P
Average degree hki = N1 i ki
Degree distribution e.g. histogram of degree sequence
(k1 , k2 , ..., kN )
p(k)
N = 24
k
p(k) =
1 X
δki ,k
N i
δx,y =
1
0
x=y
x 6= y
Global Structure Measures
P
Average degree hki = N1 i ki
Degree distribution e.g. histogram of degree sequence
(k1 , k2 , ..., kN )
p(k)
N = 24
k
p(k) =
X
k
1 X
δki ,k
N i
p(k) = 1
δx,y =
1
0
x=y
x 6= y
Global Structure Measures
P
Average degree hki = N1 i ki
Degree distribution e.g. histogram of degree sequence
(k1 , k2 , ..., kN )
p(k)
N = 24
k
p(k) =
X
k
1 X
δki ,k
N i
p(k) = 1
δx,y =
hki =
X
k
p(k)k
1
0
x=y
x 6= y
Facebook, PINs etc: shape of p(k)?
0.05
hki = 50:
0.04
0.03
Poissonian (random):
0.02
p(k) = e−hki hkik /k!
0.01
20
40
60
80
100
friends
or this?
0.04
hki = 50:
0.03
‘hubs’
0.02
Power law (preferential attachment):
0.01
p(k) ∼ k −γ
?
0
20
40
60
80
100
friends
Degree Correlations
W (k, k 0 ): prob link between nodes with degrees (k, k 0 )
P
0
W (k, k ) =
0
ij cij δk,ki δk ,kj
P
ij cij
@
@
•
@
@
@
ki = k
•
@
@
@
kj = k 0
Degree Correlations
W (k, k 0 ): prob link between nodes with degrees (k, k 0 )
P
0
W (k, k ) =
0
ij cij δk,ki δk ,kj
@
P
ij cij
@
•
@
@
@
@
@
ki = k
Marginal
W (k) =
X
k0
W (k, k 0 ) =
•
@
kp(k)
hki
kj = k 0
Degree Correlations
W (k, k 0 ): prob link between nodes with degrees (k, k 0 )
P
0
W (k, k ) =
0
ij cij δk,ki δk ,kj
@
P
ij cij
@
•
@
@
@
@
@
ki = k
Marginal
W (k) =
X
k0
W (k, k 0 ) =
•
@
kp(k)
hki
No degree correlations ⇒ W (k, k 0 ) = W (k)W (k 0 )
kj = k 0
Degree Correlations
W (k, k 0 ): prob link between nodes with degrees (k, k 0 )
P
0
W (k, k ) =
0
ij cij δk,ki δk ,kj
@
P
ij cij
@
•
@
@
@
@
@
ki = k
Marginal
W (k) =
X
k0
W (k, k 0 ) =
•
@
kp(k)
hki
No degree correlations ⇒ W (k, k 0 ) = W (k)W (k 0 )
Define
Π(k, k 0 ) = W (k, k 0 )/W (k)W (k 0 )
kj = k 0
Degree Correlations
W (k, k 0 ): prob link between nodes with degrees (k, k 0 )
P
0
W (k, k ) =
0
ij cij δk,ki δk ,kj
@
P
ij cij
@
•
@
@
@
@
@
ki = k
Marginal
W (k) =
X
k0
W (k, k 0 ) =
•
@
kp(k)
hki
No degree correlations ⇒ W (k, k 0 ) = W (k)W (k 0 )
Define
Π(k, k 0 ) = W (k, k 0 )/W (k)W (k 0 )
Π 6= 1 signals presence of degrees correlations (beyond p(k))
kj = k 0
Protein interaction networks
Π(k, k0 )
p(k)
0.30
cam jejuni PIN
0.20
N = 1325
hki = 17.50
0.10
Pi
0.00
0
10
20
30
40
50
60
1.5
0.30
1.4
1.3
1.2
human PIN
1.1
1.0
0.20
0.9
k‘
0.8
0.7
N = 9463
0.6
0.5
0.10
0.4
0.3
0.2
0.1
0.00
0.0
0
10
20
30
40
50
60
k
hki = 7.40
Protein interaction networks
Π(k, k0 )
p(k)
0.30
cam jejuni PIN
0.20
N = 1325
hki = 17.50
0.10
Pi
0.00
0
10
20
30
40
50
60
1.5
0.30
1.4
1.3
1.2
human PIN
1.1
1.0
0.20
0.9
k‘
0.8
0.7
N = 9463
0.6
0.5
0.10
0.4
0.3
0.2
0.1
0.00
0.0
0
10
20
30
40
50
60
k
how ‘special’ is a network with topology {p, Π}?
hki = 7.40
Protein interaction networks
Π(k, k0 )
p(k)
0.30
cam jejuni PIN
0.20
N = 1325
hki = 17.50
0.10
Pi
0.00
0
10
20
30
40
50
60
1.5
0.30
1.4
1.3
1.2
human PIN
1.1
1.0
0.20
0.9
k‘
0.8
0.7
N = 9463
0.6
0.5
0.10
0.4
0.3
0.2
0.1
0.00
0.0
0
10
20
30
40
50
60
k
how ‘special’ is a network with topology {p, Π}?
‘null models’ with topology {p, Π}?
hki = 7.40
Protein interaction networks
Π(k, k0 )
p(k)
0.30
cam jejuni PIN
0.20
N = 1325
hki = 17.50
0.10
Pi
0.00
0
10
20
30
40
50
60
1.5
0.30
1.4
1.3
1.2
human PIN
1.1
1.0
0.20
0.9
k‘
0.8
0.7
N = 9463
0.6
0.5
0.10
0.4
0.3
0.2
0.1
0.00
0.0
0
10
20
30
40
50
60
k
how ‘special’ is a network with topology {p, Π}?
‘null models’ with topology {p, Π}?
distance between networks {p, Π} and {p0 , Π0 }?
hki = 7.40
Outline
1
Networks as infrastructure of signalling processes
2
Graphs: definitions and topology measures
3
Tailored random graph ensembles
4
Generation of graphs
5
Current research: Sampling from Graphs and inference
6
Conclusions
Ensembles of random graphs
Random graphs: each link i−−j has a prescribed probability
Prob(cij ), so c with Prob(c)
Ensembles of random graphs
Random graphs: each link i−−j has a prescribed probability
Prob(cij ), so c with Prob(c)
Ensemble of random graphs: Prob(c)
tells how likely is to draw a graph c from set G of allowed graphs
Ensembles of random graphs
Random graphs: each link i−−j has a prescribed probability
Prob(cij ), so c with Prob(c)
Ensemble of random graphs: Prob(c)
tells how likely is to draw a graph c from set G of allowed graphs
G={all possible graphs}
'
$
Prob(c)
9
c
&
%
Ensembles of random graphs
Random graphs: each link i−−j has a prescribed probability
Prob(cij ), so c with Prob(c)
Ensemble of random graphs: Prob(c)
tells how likely is to draw a graph c from set G of allowed graphs
G={all possible graphs}
'
$
Prob(c)
9
c
Useful?
&
%
Ensembles of random graphs
Random graphs: each link i−−j has a prescribed probability
Prob(cij ), so c with Prob(c)
Ensemble of random graphs: Prob(c)
tells how likely is to draw a graph c from set G of allowed graphs
G={all possible graphs}
'
$
Prob(c)
9
c
Useful?
&
%
Hard to measure/handle all microcopic variables {cij }
Ensembles of random graphs
Random graphs: each link i−−j has a prescribed probability
Prob(cij ), so c with Prob(c)
Ensemble of random graphs: Prob(c)
tells how likely is to draw a graph c from set G of allowed graphs
G={all possible graphs}
'
$
Prob(c)
9
c
Useful?
&
%
Hard to measure/handle all microcopic variables {cij }
But for large N expect microcopic details less important
(distribution of velocities in a gas are always Maxwell!)
Ensembles of random graphs
Random graphs: each link i−−j has a prescribed probability
Prob(cij ), so c with Prob(c)
Ensemble of random graphs: Prob(c)
tells how likely is to draw a graph c from set G of allowed graphs
G={all possible graphs}
'
$
Prob(c)
9
c
Useful?
&
%
Hard to measure/handle all microcopic variables {cij }
But for large N expect microcopic details less important
(distribution of velocities in a gas are always Maxwell!)
Assume probability distribution Prob(c) ⇒ average over it
gives macroscopic behaviour
Tailored Random Graphs Ensembles
Tailored random graph ensemble: demand that all graphs c in
the ensemble have prescribed properties
Tailored Random Graphs Ensembles
Tailored random graph ensemble: demand that all graphs c in
the ensemble have prescribed properties
'
&
all possible graphs
$
%
Tailored Random Graphs Ensembles
Tailored random graph ensemble: demand that all graphs c in
the ensemble have prescribed properties
'
Demand properties
&
all possible graphs
$
%
Tailored Random Graphs Ensembles
Tailored random graph ensemble: demand that all graphs c in
the ensemble have prescribed properties
'
Demand properties
hki
'
&
&
all possible graphs
hki = ...
$
$
%
%
Tailored Random Graphs Ensembles
Tailored random graph ensemble: demand that all graphs c in
the ensemble have prescribed properties
'
Demand properties
hki
p(k)
'
'
&
&
&
all possible graphs
hki = ...
p(k) = . . .
$
$
$
%
%
%
Tailored Random Graphs Ensembles
Tailored random graph ensemble: demand that all graphs c in
the ensemble have prescribed properties
'
Demand properties
hki
p(k)
k = (k1 , . . . , kN )
all possible graphs
'
hki = ...
'
'
$
$
$
$
p(k) = . . .
k = (k1 , . . . , kN )
&
&
&
&
%
%
%
%
Tailored Random Graphs Ensembles
Tailored random graph ensemble: demand that all graphs c in
the ensemble have prescribed properties
'
Demand properties
hki
p(k)
k = (k1 , . . . , kN )
W (k, k 0 )
all possible graphs
'
hki = ...
'
'
#
$
$
$
$
p(k) = . . .
k = (k1 , . . . , kN )
W (k, k0 ) = . . .
"
&
&
&
&
!%
%
%
%
Tailored Random Graphs Ensembles
Tailored random graph ensemble: demand that all graphs c in
the ensemble have prescribed properties
'
Demand properties
hki
p(k)
k = (k1 , . . . , kN )
W (k, k 0 )
all possible graphs
'
hki = ...
'
'
#
Prob(c|hki),
$
$
$
p(k) = . . .
k = (k1 , . . . , kN )
W (k, k0 ) = . . .
"
&
&
&
&
Need to specify:
$
!%
%
%
%
Tailored Random Graphs Ensembles
Tailored random graph ensemble: demand that all graphs c in
the ensemble have prescribed properties
'
Demand properties
hki
p(k)
k = (k1 , . . . , kN )
all possible graphs
'
hki = ...
'
'
#
$
$
$
$
p(k) = . . .
k = (k1 , . . . , kN )
W (k, k0 ) = . . .
W (k, k 0 )
"
&
&
&
&
Need to specify:
Prob(c|hki), Prob(c|p),
!%
%
%
%
Tailored Random Graphs Ensembles
Tailored random graph ensemble: demand that all graphs c in
the ensemble have prescribed properties
'
Demand properties
hki
p(k)
k = (k1 , . . . , kN )
W (k, k 0 )
all possible graphs
'
hki = ...
'
'
#
$
$
$
$
p(k) = . . .
k = (k1 , . . . , kN )
W (k, k0 ) = . . .
"
&
&
&
&
Need to specify:
Prob(c|hki), Prob(c|p), Prob(c|k)
!%
%
%
%
Tailored Random Graphs Ensembles
Tailored random graph ensemble: demand that all graphs c in
the ensemble have prescribed properties
'
Demand properties
hki
p(k)
k = (k1 , . . . , kN )
W (k, k 0 )
all possible graphs
'
hki = ...
'
'
#
$
$
$
$
p(k) = . . .
k = (k1 , . . . , kN )
W (k, k0 ) = . . .
"
&
&
&
&
!%
%
%
%
Need to specify:
Prob(c|hki), Prob(c|p), Prob(c|k) Easy
Tailored Random Graphs Ensembles
Tailored random graph ensemble: demand that all graphs c in
the ensemble have prescribed properties
'
Demand properties
hki
p(k)
k = (k1 , . . . , kN )
W (k, k 0 )
all possible graphs
'
hki = ...
'
'
#
$
$
$
p(k) = . . .
k = (k1 , . . . , kN )
W (k, k0 ) = . . .
"
&
&
&
&
!%
%
%
%
Need to specify:
Prob(c|hki), Prob(c|p), Prob(c|k) Easy
Prob(c|k, W ),
$
Tailored Random Graphs Ensembles
Tailored random graph ensemble: demand that all graphs c in
the ensemble have prescribed properties
'
Demand properties
hki
p(k)
k = (k1 , . . . , kN )
W (k, k 0 )
all possible graphs
'
hki = ...
'
'
#
$
$
$
$
p(k) = . . .
k = (k1 , . . . , kN )
W (k, k0 ) = . . .
"
&
&
&
&
!%
%
%
%
Need to specify:
Prob(c|hki), Prob(c|p), Prob(c|k) Easy
Prob(c|k, W ), Prob(c|p, W )
Tailored Random Graphs Ensembles
Tailored random graph ensemble: demand that all graphs c in
the ensemble have prescribed properties
'
Demand properties
hki
p(k)
k = (k1 , . . . , kN )
W (k, k 0 )
all possible graphs
'
hki = ...
'
'
#
$
$
$
$
p(k) = . . .
k = (k1 , . . . , kN )
W (k, k0 ) = . . .
"
&
&
&
&
!%
%
%
%
Need to specify:
Prob(c|hki), Prob(c|p), Prob(c|k) Easy
Prob(c|k, W ), Prob(c|p, W ) Less easy...
Tailored Random Graphs Ensembles
Tailored random graph ensemble: demand that all graphs c in
the ensemble have prescribed properties
'
Demand properties
hki
p(k)
k = (k1 , . . . , kN )
W (k, k 0 )
all possible graphs
'
hki = ...
'
'
#
$
$
$
$
p(k) = . . .
k = (k1 , . . . , kN )
W (k, k0 ) = . . .
"
&
&
&
&
!%
%
%
%
Need to specify:
Prob(c|hki), Prob(c|p), Prob(c|k) Easy
Prob(c|k, W ), Prob(c|p, W ) Less easy...
Tailored graph ensembles as proxies
Tailored graph ensembles as proxies
protein
network
Measure
properties
e.g. p(k), W (k, k 0 ), . . .
Tailored graph ensembles as proxies
protein
network
-
e.g. p(k), W (k, k 0 ), . . .
Measure
properties
define random
graph ensemble
with same
properties
'?
$
&
%
Prob(c|p, W, . . .) =
Tailored graph ensembles as proxies
protein
network
-
e.g. p(k), W (k, k 0 ), . . .
Measure
properties
hypothesis testing
and prediction
define random
graph ensemble
with same
properties
'?
$
&
%
Prob(c|p, W, . . .) =
analyze processes
on graph ensemble
Graph Counting
How special?: count graphs in the ensemble!
Graph Counting
How special?: count graphs in the ensemble!
Number of graphs in the ensemble defined by Prob(c|p, W ):
N [p, W ] = eN S[p,W ]
1 X
Prob(c|p, W ) log Prob(c|p, W )
S[p, W ] = −
N c
Graph Counting
How special?: count graphs in the ensemble!
Number of graphs in the ensemble defined by Prob(c|p, W ):
N [p, W ] = eN S[p,W ]
1 X
Prob(c|p, W ) log Prob(c|p, W )
S[p, W ] = −
N c
Measure of uncertainty in a random variable
x ∈ {x1 , . . . , xn }, with distribution p(x):
Shannon entropy
S[p] = −
X
i
p(xi ) log p(xi )
Graph Counting
How special?: count graphs in the ensemble!
Number of graphs in the ensemble defined by Prob(c|p, W ):
N [p, W ] = eN S[p,W ]
1 X
Prob(c|p, W ) log Prob(c|p, W )
S[p, W ] = −
N c
Measure of uncertainty in a random variable
x ∈ {x1 , . . . , xn }, with distribution p(x):
Shannon entropy
S[p] = −
X
i
p(xi ) log p(xi ) ≥ 0
Graph Counting
How special?: count graphs in the ensemble!
Number of graphs in the ensemble defined by Prob(c|p, W ):
N [p, W ] = eN S[p,W ]
1 X
Prob(c|p, W ) log Prob(c|p, W )
S[p, W ] = −
N c
Measure of uncertainty in a random variable
x ∈ {x1 , . . . , xn }, with distribution p(x):
Shannon entropy
S[p] = −
X
p(xi ) log p(xi ) ≥ 0
i
Maximal (subject to
P
i
p(xi ) = 1) for p(xi ) = 1/n ∀ i
Graph Counting
How special?: count graphs in the ensemble!
Number of graphs in the ensemble defined by Prob(c|p, W ):
N [p, W ] = eN S[p,W ]
1 X
Prob(c|p, W ) log Prob(c|p, W )
S[p, W ] = −
N c
Measure of uncertainty in a random variable
x ∈ {x1 , . . . , xn }, with distribution p(x):
Shannon entropy
S[p] = −
X
p(xi ) log p(xi ) ≥ 0
i
Maximal (subject to
P
i
S = 0 for p(xi ) = δxi ,x?
p(xi ) = 1) for p(xi ) = 1/n ∀ i
Distances
Distance between two probability distributions p(x), q(x)
Kullback − Leibler distance D(p||q)
=
X
x
p(x) log
p(x)
q(x)
Distances
Distance between two probability distributions p(x), q(x)
Kullback − Leibler distance D(p||q)
=
X
x
p(x) log
p(x)
q(x)
Distance between cA and cB : measure e.g. pA , WA , . . . and
pB , WB , . . .
DAB =
1 X
Prob(c|pA , WA , . . .)
Prob(c|pA , WA , . . .) log
2N c
Prob(c|pB , WB , . . .)
Dendrograms
A
B
Distance
0
2
4
6
8
10
2
4
6
8
10
S.cerevisiae VIII
C.jejuni
S.cerevisiae VIII
T.pallidum
S.cerevisiae X
H.sapiens III
S.cerevisiae V
E.coli
D.melanogaster
H.sapiens IV
S.cerevisiae XI
H.pylori
S.cerevisiae IV
S.cerevisiae IX
P.falciparum
S.cerevisiae VI
H.sapiens II
H.sapiens I
S.cerevisiae VII
M.loti
S.cerevisiae III
C.elegans
S.cerevisiae I
S.cerevisiae II
S.cerevisiae XII
Synechocystis
Yeast-two-Hybrid
Distance
0
12
S.cerevisiae X
AP-MS
S.cerevisiae III
S.cerevisiae XII
S.cerevisiae I
S.cerevisiae II
S.cerevisiae XI
Y2H
PCA
S.cerevisiae IV
S.cerevisiae IX
S.cerevisiae VI
Affinity Purification-Mass Spectrometry
Database Datasets
AP-MS
Protein Complementation Assay
Data Integration
suggests strong biases in the experiments..
Outline
1
Networks as infrastructure of signalling processes
2
Graphs: definitions and topology measures
3
Tailored random graph ensembles
4
Generation of graphs
5
Current research: Sampling from Graphs and inference
6
Conclusions
Generating graphs from ensembles
Generating random graphs with specified probabilities Prob(c)
is in general highly non trivial
Generating graphs from ensembles
Generating random graphs with specified probabilities Prob(c)
is in general highly non trivial
common method to generate graphs with prescribed degrees:
Generating graphs from ensembles
Generating random graphs with specified probabilities Prob(c)
is in general highly non trivial
common method to generate graphs with prescribed degrees:
construct ad hoc graph
with (k1 , . . . , kN )
Generating graphs from ensembles
Generating random graphs with specified probabilities Prob(c)
is in general highly non trivial
common method to generate graphs with prescribed degrees:
construct ad hoc graph
with (k1 , . . . , kN )
shuffle links (randomize)
while preserving degrees
via ‘edge swaps’
•
•
•
•
•
•
−→
•
•
Generating graphs from ensembles
Generating random graphs with specified probabilities Prob(c)
is in general highly non trivial
common method to generate graphs with prescribed degrees:
construct ad hoc graph
with (k1 , . . . , kN )
shuffle links (randomize)
while preserving degrees
via ‘edge swaps’
Problems:
•
•
•
•
•
•
−→
•
•
Generating graphs from ensembles
Generating random graphs with specified probabilities Prob(c)
is in general highly non trivial
common method to generate graphs with prescribed degrees:
construct ad hoc graph
with (k1 , . . . , kN )
shuffle links (randomize)
while preserving degrees
via ‘edge swaps’
•
•
•
•
•
•
−→
•
Problems:
algorithm equilibration, when to stop?
•
Generating graphs from ensembles
Generating random graphs with specified probabilities Prob(c)
is in general highly non trivial
common method to generate graphs with prescribed degrees:
construct ad hoc graph
with (k1 , . . . , kN )
shuffle links (randomize)
while preserving degrees
via ‘edge swaps’
•
•
•
•
•
•
−→
•
•
Problems:
algorithm equilibration, when to stop?
algorithm-induced bias: am I generating each graph with the
right probability? I shouldn’t generate some graphs more
frequenlty than others!!
Possible Bias
Where will the mouse spend most of its time?
(i)
Possible Bias
Where will the mouse spend most of its time?
(i)
And now?
(ii)
Possible Bias
Where will the mouse spend most of its time?
(i)
And now?
(ii)
Accounting for mobilities
“Room” = graph configuration
“A”= c
“B”= c0
c
c0
Accounting for mobilities
“Room” = graph configuration
“A”= c
“B”= c0
c
Possible moves
n(c) 6= n(c0 )
c0
Accounting for mobilities
“Room” = graph configuration
“A”= c
“B”= c0
c
c0
Possible moves
n(c) 6= n(c0 )
Difference in mobility may lead to non-uniform sampling
Accounting for mobilities
“Room” = graph configuration
“A”= c
“B”= c0
c
c0
Possible moves
n(c) 6= n(c0 )
Difference in mobility may lead to non-uniform sampling
To restore uniformity choose properly transition probability
P (c → c0 )
Accounting for mobilities
“Room” = graph configuration
“A”= c
“B”= c0
c
c0
Possible moves
n(c) 6= n(c0 )
Difference in mobility may lead to non-uniform sampling
To restore uniformity choose properly transition probability
P (c → c0 )
Can target any Prob(c) with appropriate P (c → c0 )!
Accounting for mobilities
“Room” = graph configuration
“A”= c
“B”= c0
c
c0
Possible moves
n(c) 6= n(c0 )
Difference in mobility may lead to non-uniform sampling
To restore uniformity choose properly transition probability
P (c → c0 )
Can target any Prob(c) with appropriate P (c → c0 )!
Prob(c)P (c → c0 ) = Prob(c0 )P (c0 → c)
Generating graphs from P (c|k, W )
P (k)
Π(k, k 0 |c0 )
N = 4000
hki = 5
Π(k, k 0 ) (theory)
Π(k, k 0 |cfinal )
After 75,000
accepted moves
Outline
1
Networks as infrastructure of signalling processes
2
Graphs: definitions and topology measures
3
Tailored random graph ensembles
4
Generation of graphs
5
Current research: Sampling from Graphs and inference
6
Conclusions
Sampling from PINs
PI may be detected via different experiments
Sampling from PINs
PI may be detected via different experiments
Assume we have L biologial species; M experiments;
Sampling from PINs
PI may be detected via different experiments
Assume we have L biologial species; M experiments;
Biological
Experimental data
:
c1 X
XXXz
X
:
c11
..
.
cM
1
c12
.
.
c2 X
XXX- .
z cM
X
2
..
.
1
: c.L
.
cL .
XX
XXX
z cM
L
experiment 1
..
.
experiment M
Different experiments yield
different data
Experiments are imperfect!
Sampling from PINs
PI may be detected via different experiments
Assume we have L biologial species; M experiments;
Biological
Experimental data
:
c1 X
XXXz
X
:
c11
..
.
cM
1
experiment 1
..
.
experiment M
Different experiments yield
different data
Experiments are imperfect!
c12
.
.
c2 X
XXX- .
z cM
X
2
..
.
1
: c.L
.
cL .
XX
XXX
z cM
L
Infer p` , W` , ` = 1, . . . , L of true underlying biological PINs
from noisy/inconsistent data? Need mathematical model!
Experimental biases
Model each experiment α = 1, . . . , M in terms of
θα = {xα , y α , z α }
Experimental biases
Model each experiment α = 1, . . . , M in terms of
θα = {xα , y α , z α }
probability to miss a protein: xα
Experimental biases
Model each experiment α = 1, . . . , M in terms of
θα = {xα , y α , z α }
probability to miss a protein: xα
probability to miss a link: y α
Experimental biases
Model each experiment α = 1, . . . , M in terms of
θα = {xα , y α , z α }
probability to miss a protein: xα
probability to miss a link: y α
probability to create a non-existing link: z α
Bayesian inference
Objective: infer {θα , p` , W` } given data
Bayesian inference
Objective: infer {θα , p` , W` } given data
Maximimum likelihood: maximize “posterior”
p({θα }, {p` }, {W` }|{cα` }) over p` , W` , θα
Bayesian inference
Objective: infer {θα , p` , W` } given data
Maximimum likelihood: maximize “posterior”
p({θα }, {p` }, {W` }|{cα` }) over p` , W` , θα
P (X)
P (X,Y )
}|
{
}|
{ zX
z
P (Y )P (X|Y )
Use Bayes P (Y |X) = P (Y )P (X|Y ) /
Y
Bayesian inference
Objective: infer {θα , p` , W` } given data
Maximimum likelihood: maximize “posterior”
p({θα }, {p` }, {W` }|{cα` }) over p` , W` , θα
P (X)
P (X,Y )
}|
{
}|
{ zX
z
P (Y )P (X|Y )
Use Bayes P (Y |X) = P (Y )P (X|Y ) /
Y
only need
p(cα` |θα , p` , W` ),
p(θα , p` , W` )
Bayesian inference
Objective: infer {θα , p` , W` } given data
Maximimum likelihood: maximize “posterior”
p({θα }, {p` }, {W` }|{cα` }) over p` , W` , θα
P (X)
P (X,Y )
}|
{
}|
{ zX
z
P (Y )P (X|Y )
Use Bayes P (Y |X) = P (Y )P (X|Y ) /
Y
only need
p(cα` |θα , p` , W` ),
p(θα , p` , W` )
No “apriori” knowledge: uniform prior p(p` , W` ), p(θα )
Bayesian inference
Objective: infer {θα , p` , W` } given data
Maximimum likelihood: maximize “posterior”
p({θα }, {p` }, {W` }|{cα` }) over p` , W` , θα
P (X)
P (X,Y )
}|
{
}|
{ zX
z
P (Y )P (X|Y )
Use Bayes P (Y |X) = P (Y )P (X|Y ) /
Y
only need
p(cα` |θα , p` , W` ),
p(θα , p` , W` )
No “apriori” knowledge: uniform prior p(p` , W` ), p(θα )
P
P (X) = Y P (Y )P (X|Y )
Bayesian inference
Objective: infer {θα , p` , W` } given data
Maximimum likelihood: maximize “posterior”
p({θα }, {p` }, {W` }|{cα` }) over p` , W` , θα
P (X)
P (X,Y )
}|
{
}|
{ zX
z
P (Y )P (X|Y )
Use Bayes P (Y |X) = P (Y )P (X|Y ) /
Y
only need
p(cα` |θα , p` , W` ),
p(θα , p` , W` )
No “apriori” knowledge: uniform prior p(p` , W` ), p(θα )
P
P (X) = Y P (Y )P (X|Y )
Bayesian inference
Objective: infer {θα , p` , W` } given data
Maximimum likelihood: maximize “posterior”
p({θα }, {p` }, {W` }|{cα` }) over p` , W` , θα
P (X)
P (X,Y )
}|
{
}|
{ zX
z
P (Y )P (X|Y )
Use Bayes P (Y |X) = P (Y )P (X|Y ) /
Y
only need
p(cα` |θα , p` , W` ),
p(θα , p` , W` )
No “apriori” knowledge: uniform prior p(p` , W` ), p(θα )
P
P (X) = Y P (Y )P (X|Y )
p(cα
` |θα , p` , W` ) =
X
c`
P (c` |W` , p` ) p(cα
|θα , c` )
|
{z
} | ` {z
}
known!
easy
Bayesian inference
Objective: infer {θα , p` , W` } given data
Maximimum likelihood: maximize “posterior”
p({θα }, {p` }, {W` }|{cα` }) over p` , W` , θα
P (X)
P (X,Y )
}|
{
}|
{ zX
z
P (Y )P (X|Y )
Use Bayes P (Y |X) = P (Y )P (X|Y ) /
Y
only need
p(cα` |θα , p` , W` ),
p(θα , p` , W` )
No “apriori” knowledge: uniform prior p(p` , W` ), p(θα )
P
P (X) = Y P (Y )P (X|Y )
p(cα
` |θα , p` , W` ) =
X
c`
P (c` |W` , p` ) p(cα
|θα , c` )
|
{z
} | ` {z
}
Decontamination of data sets!
known!
easy
Conclusions
Networks as infrastructure of signalling processes
Conclusions
Networks as infrastructure of signalling processes
Maths proved useful to:
Conclusions
Networks as infrastructure of signalling processes
Maths proved useful to:
Quantify structure of networks
Conclusions
Networks as infrastructure of signalling processes
Maths proved useful to:
Quantify structure of networks
Define “good” reference models for hypothesis testing and
quantitative predictions
Conclusions
Networks as infrastructure of signalling processes
Maths proved useful to:
Quantify structure of networks
Define “good” reference models for hypothesis testing and
quantitative predictions
Assess significance of observed patterns
Conclusions
Networks as infrastructure of signalling processes
Maths proved useful to:
Quantify structure of networks
Define “good” reference models for hypothesis testing and
quantitative predictions
Assess significance of observed patterns
Define distance between graphs
Conclusions
Networks as infrastructure of signalling processes
Maths proved useful to:
Quantify structure of networks
Define “good” reference models for hypothesis testing and
quantitative predictions
Assess significance of observed patterns
Define distance between graphs
Generate random graphs with the right probabilities
Conclusions
Networks as infrastructure of signalling processes
Maths proved useful to:
Quantify structure of networks
Define “good” reference models for hypothesis testing and
quantitative predictions
Assess significance of observed patterns
Define distance between graphs
Generate random graphs with the right probabilities
Network inference from noisy and incosistent data
Conclusions
Networks as infrastructure of signalling processes
Maths proved useful to:
Quantify structure of networks
Define “good” reference models for hypothesis testing and
quantitative predictions
Assess significance of observed patterns
Define distance between graphs
Generate random graphs with the right probabilities
Network inference from noisy and incosistent data
Maths of 21st century can help greatly bio-medical sciences
Conclusions
Networks as infrastructure of signalling processes
Maths proved useful to:
Quantify structure of networks
Define “good” reference models for hypothesis testing and
quantitative predictions
Assess significance of observed patterns
Define distance between graphs
Generate random graphs with the right probabilities
Network inference from noisy and incosistent data
Maths of 21st century can help greatly bio-medical sciences
Thanks!