Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Protein interaction networks: a mathematical approach A Annibale June 26, 2013 Outline 1 Networks as infrastructure of signalling processes 2 Graphs: definitions and topology measures 3 Tailored random graph ensembles 4 Generation of graphs 5 Current research: Sampling from Graphs and inference 6 Conclusions Outline 1 Networks as infrastructure of signalling processes 2 Graphs: definitions and topology measures 3 Tailored random graph ensembles 4 Generation of graphs 5 Current research: Sampling from Graphs and inference 6 Conclusions Protein interaction networks (PIN) Signalling processes Protein interaction networks (PIN) Signalling processes Transport Protein interaction networks (PIN) Signalling processes Transport Protein complexes (interact for a long time) Protein interaction networks (PIN) Signalling processes Transport Protein complexes (interact for a long time) Protein modification (e.g. protein kinase) Networks as infrastructure of signaling processes Errors in e.g. cellular information processing are responsible for diseases as cancer, autoimmunity or diabetes Networks as infrastructure of signaling processes Errors in e.g. cellular information processing are responsible for diseases as cancer, autoimmunity or diabetes Understand how changes in structure of underlying signalling networks may affect the flow of information Normal or abnormal? Tools to quantify structure of a network Normal or abnormal? Tools to quantify structure of a network Compare networks, detect abnormalities: Normal or abnormal? Tools to quantify structure of a network Compare networks, detect abnormalities: Is this signalling network abnormal? Normal or abnormal? Tools to quantify structure of a network Compare networks, detect abnormalities: Is this signalling network abnormal? Assess significance of an observed pattern: “trivial” or “special” element of structure? Normal or abnormal? Tools to quantify structure of a network Compare networks, detect abnormalities: Is this signalling network abnormal? Assess significance of an observed pattern: “trivial” or “special” element of structure? Crucially, our answers depend on our model of “normal”/“random”! Normal or abnormal? Tools to quantify structure of a network Compare networks, detect abnormalities: Is this signalling network abnormal? Assess significance of an observed pattern: “trivial” or “special” element of structure? Crucially, our answers depend on our model of “normal”/“random”! Define good “reference” or Null models Normal or abnormal? Tools to quantify structure of a network Compare networks, detect abnormalities: Is this signalling network abnormal? Assess significance of an observed pattern: “trivial” or “special” element of structure? Crucially, our answers depend on our model of “normal”/“random”! Define good “reference” or Null models Generate null models as ‘proxies’ to analyse processes: hypothesis testing and prediction Outline 1 Networks as infrastructure of signalling processes 2 Graphs: definitions and topology measures 3 Tailored random graph ensembles 4 Generation of graphs 5 Current research: Sampling from Graphs and inference 6 Conclusions Definitions size N : number of ’nodes’, i, j, k = 1...N Definitions size N : number of ’nodes’, i, j, k = 1...N 1 if link i −−j; ’links’: cij = 0 otherwise Connectivity matrix c= c11 c21 .. . c12 c22 .. . ··· ··· .. . c1j c2j .. . ··· ··· .. . c1N c2N .. . ci1 .. . ci2 .. . ··· .. . cij .. . ··· .. . ciN .. . cN j ··· cN N cN 1 cN 2 · · · Definitions size N : number of ’nodes’, i, j, k = 1...N 1 if link i −−j; ’links’: cij = 0 otherwise Connectivity matrix c= c11 c21 .. . c12 c22 .. . ··· ··· .. . c1j c2j .. . ··· ··· .. . c1N c2N .. . ci1 .. . ci2 .. . ··· .. . cij .. . ··· .. . ciN .. . cN j ··· cN N cN 1 cN 2 · · · symmetry, i.e. graph undirected cij = cji ∀ i, j Definitions size N : number of ’nodes’, i, j, k = 1...N 1 if link i −−j; ’links’: cij = 0 otherwise Connectivity matrix c= c11 c21 .. . c12 c22 .. . ··· ··· .. . c1j c2j .. . ··· ··· .. . c1N c2N .. . ci1 .. . ci2 .. . ··· .. . cij .. . ··· .. . ciN .. . cN j ··· cN N cN 1 cN 2 · · · symmetry, i.e. graph undirected cij = cji ∀ i, j no self-interaction, i.e. cii = 0 ∀i 1 3 • 2 • • • 4 0 1 c= 1 0 1 0 0 1 1 0 0 1 0 1 1 0 1 3 • 2 • • • 4 0 1 c= 1 0 So, how different are these two? 1 0 0 1 1 0 0 1 0 1 1 0 1 3 • 2 • • • 4 0 1 c= 1 0 So, how different are these two? Need tools to quantify structure! 1 0 0 1 1 0 0 1 0 1 1 0 Local structure measures degree ki (nr of partners): ki = 2 • • 3 @ @ @• ki = 4 i 1 • •4 5 • A A A• 7 6 • P j cij Local structure measures degree ki (nr of partners): ki = 2 • • 3 @ @ @• ki = 4 i 1 • •4 P j cij ki = ci1 + ci2 + ci3 + ci4 + ci5 + ci6 = 1 + 1 + 0 + 1 + 1+ 0 5 • A A A• 7 6 • Local structure measures degree ki (nr of partners): ki = 2 • • 3 @ @ @• ki = 4 i 1 • •4 P j cij ki = ci1 + ci2 + ci3 + ci4 + ci5 + ci6 = 1 + 1 + 0 + 1 + 1+ 0 5 • A A A• 7 • Only the neighbours of i contribute 6 Local structure measures degree ki (nr of partners): ki = 2 • • 3 @ @ @• ki = 4 i 1 • •4 P j cij ki = ci1 + ci2 + ci3 + ci4 + ci5 + ci6 = 1 + 1 + 0 + 1 + 1+ 0 5 • A A A• 7 • Only the neighbours of i contribute 6 (1) generalized degrees: ki = P j cij , (2) ki = P js cij cjs , etc Local structure measures degree ki (nr of partners): ki = 2 • • 3 @ @ @• ki = 4 i 1 •4 j cij ki = ci1 + ci2 + ci3 + ci4 + ci5 + ci6 = 1 + 1 + 0 + 1 + 1+ 0 5 • • A A A• 7 • Only the neighbours of i contribute 6 (1) generalized degrees: ki = (2) ki : P P j cij , (2) ki = P js cij cjs , Contributions only from j, s such that • i • j • s etc Local structure measures degree ki (nr of partners): ki = 2 • • 3 @ @ @• ki = 4 i 1 •4 ki = ci1 + ci2 + ci3 + ci4 + ci5 + ci6 5 • A A A• 7 • Only the neighbours of i contribute 6 (1) generalized degrees: ki = P j cij , (2) ki = P js cij cjs , Contributions only from j, s such that • i (`) j cij = 1 + 1 + 0 + 1 + 1+ 0 • (2) ki : P • j • s (ki : nr of paths of length ` away from i) etc clustering coefficient Ci P j<k P Ci = j<k cij cik AA 2• @ @ @• i 1• cij cik cjk A•3 • 4 A AA clustering coefficient Ci P j<k P Ci = = j<k cij cik number of connected pairs among neighbours of i number of pairs among neighbours of i AA 2• @ @ @• i 1• cij cik cjk A•3 • 4 A AA clustering coefficient Ci P j<k P Ci = = j<k cij cik number of connected pairs among neighbours of i number of pairs among neighbours of i AA 2• @ @ @• i 1• cij cik cjk A•3 • 4 A AA → Possible pairs among neighbours of i: (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4) clustering coefficient Ci P j<k P Ci = = j<k cij cik number of connected pairs among neighbours of i number of pairs among neighbours of i AA 2• @ @ @• i 1• cij cik cjk A•3 • 4 A AA → Possible pairs among neighbours of i: (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4) → only (3, 4) is connected clustering coefficient Ci P j<k P Ci = = cij cik cjk j<k cij cik number of connected pairs among neighbours of i number of pairs among neighbours of i AA 2• @ @ @• A•3 • i 1• Ci = 1/6 4 A AA → Possible pairs among neighbours of i: (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4) → only (3, 4) is connected Friendship network How many friends do you have? Friendship network How many friends do you have? Do your friends know each other? Friendship network How many friends do you have? Do your friends know each other? How many are the friends of your friends ? Friendship network How many friends do you have? Do your friends know each other? How many are the friends of your friends ? Answer: Calculate Friendship network How many friends do you have? Do your friends know each other? How many are the friends of your friends ? Answer: Calculate Degree ki Friendship network How many friends do you have? Do your friends know each other? How many are the friends of your friends ? Answer: Calculate Degree ki Clustering Coefficient Ci Friendship network How many friends do you have? Do your friends know each other? How many are the friends of your friends ? Answer: Calculate Degree ki Clustering Coefficient Ci (2) Generalized degree ki Friendship network How many friends do you have? Do your friends know each other? How many are the friends of your friends ? Answer: Calculate Degree ki Clustering Coefficient Ci (2) Generalized degree ki Impractical for large N ! Friendship network from a “non-egocentric” point of view How many friends do people have on average? Friendship network from a “non-egocentric” point of view How many friends do people have on average? Picking up at random one person what is the probability of their having 50 friends? Friendship network from a “non-egocentric” point of view How many friends do people have on average? Picking up at random one person what is the probability of their having 50 friends? What is the average path length between two people in the world? Friendship network from a “non-egocentric” point of view How many friends do people have on average? Picking up at random one person what is the probability of their having 50 friends? What is the average path length between two people in the world? “Six degrees of separation”? Friendship network from a “non-egocentric” point of view How many friends do people have on average? Picking up at random one person what is the probability of their having 50 friends? What is the average path length between two people in the world? “Six degrees of separation”? How likely is that hubs interact with each other? Global Structure Measures Average degree hki = 1 N P i ki Global Structure Measures P Average degree hki = N1 i ki Degree distribution e.g. histogram of degree sequence (k1 , k2 , ..., kN ) p(k) N = 24 k Global Structure Measures P Average degree hki = N1 i ki Degree distribution e.g. histogram of degree sequence (k1 , k2 , ..., kN ) p(k) N = 24 k p(k) = 1 X δki ,k N i δx,y = 1 0 x=y x 6= y Global Structure Measures P Average degree hki = N1 i ki Degree distribution e.g. histogram of degree sequence (k1 , k2 , ..., kN ) p(k) N = 24 k p(k) = X k 1 X δki ,k N i p(k) = 1 δx,y = 1 0 x=y x 6= y Global Structure Measures P Average degree hki = N1 i ki Degree distribution e.g. histogram of degree sequence (k1 , k2 , ..., kN ) p(k) N = 24 k p(k) = X k 1 X δki ,k N i p(k) = 1 δx,y = hki = X k p(k)k 1 0 x=y x 6= y Facebook, PINs etc: shape of p(k)? 0.05 hki = 50: 0.04 0.03 Poissonian (random): 0.02 p(k) = e−hki hkik /k! 0.01 20 40 60 80 100 friends or this? 0.04 hki = 50: 0.03 ‘hubs’ 0.02 Power law (preferential attachment): 0.01 p(k) ∼ k −γ ? 0 20 40 60 80 100 friends Degree Correlations W (k, k 0 ): prob link between nodes with degrees (k, k 0 ) P 0 W (k, k ) = 0 ij cij δk,ki δk ,kj P ij cij @ @ • @ @ @ ki = k • @ @ @ kj = k 0 Degree Correlations W (k, k 0 ): prob link between nodes with degrees (k, k 0 ) P 0 W (k, k ) = 0 ij cij δk,ki δk ,kj @ P ij cij @ • @ @ @ @ @ ki = k Marginal W (k) = X k0 W (k, k 0 ) = • @ kp(k) hki kj = k 0 Degree Correlations W (k, k 0 ): prob link between nodes with degrees (k, k 0 ) P 0 W (k, k ) = 0 ij cij δk,ki δk ,kj @ P ij cij @ • @ @ @ @ @ ki = k Marginal W (k) = X k0 W (k, k 0 ) = • @ kp(k) hki No degree correlations ⇒ W (k, k 0 ) = W (k)W (k 0 ) kj = k 0 Degree Correlations W (k, k 0 ): prob link between nodes with degrees (k, k 0 ) P 0 W (k, k ) = 0 ij cij δk,ki δk ,kj @ P ij cij @ • @ @ @ @ @ ki = k Marginal W (k) = X k0 W (k, k 0 ) = • @ kp(k) hki No degree correlations ⇒ W (k, k 0 ) = W (k)W (k 0 ) Define Π(k, k 0 ) = W (k, k 0 )/W (k)W (k 0 ) kj = k 0 Degree Correlations W (k, k 0 ): prob link between nodes with degrees (k, k 0 ) P 0 W (k, k ) = 0 ij cij δk,ki δk ,kj @ P ij cij @ • @ @ @ @ @ ki = k Marginal W (k) = X k0 W (k, k 0 ) = • @ kp(k) hki No degree correlations ⇒ W (k, k 0 ) = W (k)W (k 0 ) Define Π(k, k 0 ) = W (k, k 0 )/W (k)W (k 0 ) Π 6= 1 signals presence of degrees correlations (beyond p(k)) kj = k 0 Protein interaction networks Π(k, k0 ) p(k) 0.30 cam jejuni PIN 0.20 N = 1325 hki = 17.50 0.10 Pi 0.00 0 10 20 30 40 50 60 1.5 0.30 1.4 1.3 1.2 human PIN 1.1 1.0 0.20 0.9 k‘ 0.8 0.7 N = 9463 0.6 0.5 0.10 0.4 0.3 0.2 0.1 0.00 0.0 0 10 20 30 40 50 60 k hki = 7.40 Protein interaction networks Π(k, k0 ) p(k) 0.30 cam jejuni PIN 0.20 N = 1325 hki = 17.50 0.10 Pi 0.00 0 10 20 30 40 50 60 1.5 0.30 1.4 1.3 1.2 human PIN 1.1 1.0 0.20 0.9 k‘ 0.8 0.7 N = 9463 0.6 0.5 0.10 0.4 0.3 0.2 0.1 0.00 0.0 0 10 20 30 40 50 60 k how ‘special’ is a network with topology {p, Π}? hki = 7.40 Protein interaction networks Π(k, k0 ) p(k) 0.30 cam jejuni PIN 0.20 N = 1325 hki = 17.50 0.10 Pi 0.00 0 10 20 30 40 50 60 1.5 0.30 1.4 1.3 1.2 human PIN 1.1 1.0 0.20 0.9 k‘ 0.8 0.7 N = 9463 0.6 0.5 0.10 0.4 0.3 0.2 0.1 0.00 0.0 0 10 20 30 40 50 60 k how ‘special’ is a network with topology {p, Π}? ‘null models’ with topology {p, Π}? hki = 7.40 Protein interaction networks Π(k, k0 ) p(k) 0.30 cam jejuni PIN 0.20 N = 1325 hki = 17.50 0.10 Pi 0.00 0 10 20 30 40 50 60 1.5 0.30 1.4 1.3 1.2 human PIN 1.1 1.0 0.20 0.9 k‘ 0.8 0.7 N = 9463 0.6 0.5 0.10 0.4 0.3 0.2 0.1 0.00 0.0 0 10 20 30 40 50 60 k how ‘special’ is a network with topology {p, Π}? ‘null models’ with topology {p, Π}? distance between networks {p, Π} and {p0 , Π0 }? hki = 7.40 Outline 1 Networks as infrastructure of signalling processes 2 Graphs: definitions and topology measures 3 Tailored random graph ensembles 4 Generation of graphs 5 Current research: Sampling from Graphs and inference 6 Conclusions Ensembles of random graphs Random graphs: each link i−−j has a prescribed probability Prob(cij ), so c with Prob(c) Ensembles of random graphs Random graphs: each link i−−j has a prescribed probability Prob(cij ), so c with Prob(c) Ensemble of random graphs: Prob(c) tells how likely is to draw a graph c from set G of allowed graphs Ensembles of random graphs Random graphs: each link i−−j has a prescribed probability Prob(cij ), so c with Prob(c) Ensemble of random graphs: Prob(c) tells how likely is to draw a graph c from set G of allowed graphs G={all possible graphs} ' $ Prob(c) 9 c & % Ensembles of random graphs Random graphs: each link i−−j has a prescribed probability Prob(cij ), so c with Prob(c) Ensemble of random graphs: Prob(c) tells how likely is to draw a graph c from set G of allowed graphs G={all possible graphs} ' $ Prob(c) 9 c Useful? & % Ensembles of random graphs Random graphs: each link i−−j has a prescribed probability Prob(cij ), so c with Prob(c) Ensemble of random graphs: Prob(c) tells how likely is to draw a graph c from set G of allowed graphs G={all possible graphs} ' $ Prob(c) 9 c Useful? & % Hard to measure/handle all microcopic variables {cij } Ensembles of random graphs Random graphs: each link i−−j has a prescribed probability Prob(cij ), so c with Prob(c) Ensemble of random graphs: Prob(c) tells how likely is to draw a graph c from set G of allowed graphs G={all possible graphs} ' $ Prob(c) 9 c Useful? & % Hard to measure/handle all microcopic variables {cij } But for large N expect microcopic details less important (distribution of velocities in a gas are always Maxwell!) Ensembles of random graphs Random graphs: each link i−−j has a prescribed probability Prob(cij ), so c with Prob(c) Ensemble of random graphs: Prob(c) tells how likely is to draw a graph c from set G of allowed graphs G={all possible graphs} ' $ Prob(c) 9 c Useful? & % Hard to measure/handle all microcopic variables {cij } But for large N expect microcopic details less important (distribution of velocities in a gas are always Maxwell!) Assume probability distribution Prob(c) ⇒ average over it gives macroscopic behaviour Tailored Random Graphs Ensembles Tailored random graph ensemble: demand that all graphs c in the ensemble have prescribed properties Tailored Random Graphs Ensembles Tailored random graph ensemble: demand that all graphs c in the ensemble have prescribed properties ' & all possible graphs $ % Tailored Random Graphs Ensembles Tailored random graph ensemble: demand that all graphs c in the ensemble have prescribed properties ' Demand properties & all possible graphs $ % Tailored Random Graphs Ensembles Tailored random graph ensemble: demand that all graphs c in the ensemble have prescribed properties ' Demand properties hki ' & & all possible graphs hki = ... $ $ % % Tailored Random Graphs Ensembles Tailored random graph ensemble: demand that all graphs c in the ensemble have prescribed properties ' Demand properties hki p(k) ' ' & & & all possible graphs hki = ... p(k) = . . . $ $ $ % % % Tailored Random Graphs Ensembles Tailored random graph ensemble: demand that all graphs c in the ensemble have prescribed properties ' Demand properties hki p(k) k = (k1 , . . . , kN ) all possible graphs ' hki = ... ' ' $ $ $ $ p(k) = . . . k = (k1 , . . . , kN ) & & & & % % % % Tailored Random Graphs Ensembles Tailored random graph ensemble: demand that all graphs c in the ensemble have prescribed properties ' Demand properties hki p(k) k = (k1 , . . . , kN ) W (k, k 0 ) all possible graphs ' hki = ... ' ' # $ $ $ $ p(k) = . . . k = (k1 , . . . , kN ) W (k, k0 ) = . . . " & & & & !% % % % Tailored Random Graphs Ensembles Tailored random graph ensemble: demand that all graphs c in the ensemble have prescribed properties ' Demand properties hki p(k) k = (k1 , . . . , kN ) W (k, k 0 ) all possible graphs ' hki = ... ' ' # Prob(c|hki), $ $ $ p(k) = . . . k = (k1 , . . . , kN ) W (k, k0 ) = . . . " & & & & Need to specify: $ !% % % % Tailored Random Graphs Ensembles Tailored random graph ensemble: demand that all graphs c in the ensemble have prescribed properties ' Demand properties hki p(k) k = (k1 , . . . , kN ) all possible graphs ' hki = ... ' ' # $ $ $ $ p(k) = . . . k = (k1 , . . . , kN ) W (k, k0 ) = . . . W (k, k 0 ) " & & & & Need to specify: Prob(c|hki), Prob(c|p), !% % % % Tailored Random Graphs Ensembles Tailored random graph ensemble: demand that all graphs c in the ensemble have prescribed properties ' Demand properties hki p(k) k = (k1 , . . . , kN ) W (k, k 0 ) all possible graphs ' hki = ... ' ' # $ $ $ $ p(k) = . . . k = (k1 , . . . , kN ) W (k, k0 ) = . . . " & & & & Need to specify: Prob(c|hki), Prob(c|p), Prob(c|k) !% % % % Tailored Random Graphs Ensembles Tailored random graph ensemble: demand that all graphs c in the ensemble have prescribed properties ' Demand properties hki p(k) k = (k1 , . . . , kN ) W (k, k 0 ) all possible graphs ' hki = ... ' ' # $ $ $ $ p(k) = . . . k = (k1 , . . . , kN ) W (k, k0 ) = . . . " & & & & !% % % % Need to specify: Prob(c|hki), Prob(c|p), Prob(c|k) Easy Tailored Random Graphs Ensembles Tailored random graph ensemble: demand that all graphs c in the ensemble have prescribed properties ' Demand properties hki p(k) k = (k1 , . . . , kN ) W (k, k 0 ) all possible graphs ' hki = ... ' ' # $ $ $ p(k) = . . . k = (k1 , . . . , kN ) W (k, k0 ) = . . . " & & & & !% % % % Need to specify: Prob(c|hki), Prob(c|p), Prob(c|k) Easy Prob(c|k, W ), $ Tailored Random Graphs Ensembles Tailored random graph ensemble: demand that all graphs c in the ensemble have prescribed properties ' Demand properties hki p(k) k = (k1 , . . . , kN ) W (k, k 0 ) all possible graphs ' hki = ... ' ' # $ $ $ $ p(k) = . . . k = (k1 , . . . , kN ) W (k, k0 ) = . . . " & & & & !% % % % Need to specify: Prob(c|hki), Prob(c|p), Prob(c|k) Easy Prob(c|k, W ), Prob(c|p, W ) Tailored Random Graphs Ensembles Tailored random graph ensemble: demand that all graphs c in the ensemble have prescribed properties ' Demand properties hki p(k) k = (k1 , . . . , kN ) W (k, k 0 ) all possible graphs ' hki = ... ' ' # $ $ $ $ p(k) = . . . k = (k1 , . . . , kN ) W (k, k0 ) = . . . " & & & & !% % % % Need to specify: Prob(c|hki), Prob(c|p), Prob(c|k) Easy Prob(c|k, W ), Prob(c|p, W ) Less easy... Tailored Random Graphs Ensembles Tailored random graph ensemble: demand that all graphs c in the ensemble have prescribed properties ' Demand properties hki p(k) k = (k1 , . . . , kN ) W (k, k 0 ) all possible graphs ' hki = ... ' ' # $ $ $ $ p(k) = . . . k = (k1 , . . . , kN ) W (k, k0 ) = . . . " & & & & !% % % % Need to specify: Prob(c|hki), Prob(c|p), Prob(c|k) Easy Prob(c|k, W ), Prob(c|p, W ) Less easy... Tailored graph ensembles as proxies Tailored graph ensembles as proxies protein network Measure properties e.g. p(k), W (k, k 0 ), . . . Tailored graph ensembles as proxies protein network - e.g. p(k), W (k, k 0 ), . . . Measure properties define random graph ensemble with same properties '? $ & % Prob(c|p, W, . . .) = Tailored graph ensembles as proxies protein network - e.g. p(k), W (k, k 0 ), . . . Measure properties hypothesis testing and prediction define random graph ensemble with same properties '? $ & % Prob(c|p, W, . . .) = analyze processes on graph ensemble Graph Counting How special?: count graphs in the ensemble! Graph Counting How special?: count graphs in the ensemble! Number of graphs in the ensemble defined by Prob(c|p, W ): N [p, W ] = eN S[p,W ] 1 X Prob(c|p, W ) log Prob(c|p, W ) S[p, W ] = − N c Graph Counting How special?: count graphs in the ensemble! Number of graphs in the ensemble defined by Prob(c|p, W ): N [p, W ] = eN S[p,W ] 1 X Prob(c|p, W ) log Prob(c|p, W ) S[p, W ] = − N c Measure of uncertainty in a random variable x ∈ {x1 , . . . , xn }, with distribution p(x): Shannon entropy S[p] = − X i p(xi ) log p(xi ) Graph Counting How special?: count graphs in the ensemble! Number of graphs in the ensemble defined by Prob(c|p, W ): N [p, W ] = eN S[p,W ] 1 X Prob(c|p, W ) log Prob(c|p, W ) S[p, W ] = − N c Measure of uncertainty in a random variable x ∈ {x1 , . . . , xn }, with distribution p(x): Shannon entropy S[p] = − X i p(xi ) log p(xi ) ≥ 0 Graph Counting How special?: count graphs in the ensemble! Number of graphs in the ensemble defined by Prob(c|p, W ): N [p, W ] = eN S[p,W ] 1 X Prob(c|p, W ) log Prob(c|p, W ) S[p, W ] = − N c Measure of uncertainty in a random variable x ∈ {x1 , . . . , xn }, with distribution p(x): Shannon entropy S[p] = − X p(xi ) log p(xi ) ≥ 0 i Maximal (subject to P i p(xi ) = 1) for p(xi ) = 1/n ∀ i Graph Counting How special?: count graphs in the ensemble! Number of graphs in the ensemble defined by Prob(c|p, W ): N [p, W ] = eN S[p,W ] 1 X Prob(c|p, W ) log Prob(c|p, W ) S[p, W ] = − N c Measure of uncertainty in a random variable x ∈ {x1 , . . . , xn }, with distribution p(x): Shannon entropy S[p] = − X p(xi ) log p(xi ) ≥ 0 i Maximal (subject to P i S = 0 for p(xi ) = δxi ,x? p(xi ) = 1) for p(xi ) = 1/n ∀ i Distances Distance between two probability distributions p(x), q(x) Kullback − Leibler distance D(p||q) = X x p(x) log p(x) q(x) Distances Distance between two probability distributions p(x), q(x) Kullback − Leibler distance D(p||q) = X x p(x) log p(x) q(x) Distance between cA and cB : measure e.g. pA , WA , . . . and pB , WB , . . . DAB = 1 X Prob(c|pA , WA , . . .) Prob(c|pA , WA , . . .) log 2N c Prob(c|pB , WB , . . .) Dendrograms A B Distance 0 2 4 6 8 10 2 4 6 8 10 S.cerevisiae VIII C.jejuni S.cerevisiae VIII T.pallidum S.cerevisiae X H.sapiens III S.cerevisiae V E.coli D.melanogaster H.sapiens IV S.cerevisiae XI H.pylori S.cerevisiae IV S.cerevisiae IX P.falciparum S.cerevisiae VI H.sapiens II H.sapiens I S.cerevisiae VII M.loti S.cerevisiae III C.elegans S.cerevisiae I S.cerevisiae II S.cerevisiae XII Synechocystis Yeast-two-Hybrid Distance 0 12 S.cerevisiae X AP-MS S.cerevisiae III S.cerevisiae XII S.cerevisiae I S.cerevisiae II S.cerevisiae XI Y2H PCA S.cerevisiae IV S.cerevisiae IX S.cerevisiae VI Affinity Purification-Mass Spectrometry Database Datasets AP-MS Protein Complementation Assay Data Integration suggests strong biases in the experiments.. Outline 1 Networks as infrastructure of signalling processes 2 Graphs: definitions and topology measures 3 Tailored random graph ensembles 4 Generation of graphs 5 Current research: Sampling from Graphs and inference 6 Conclusions Generating graphs from ensembles Generating random graphs with specified probabilities Prob(c) is in general highly non trivial Generating graphs from ensembles Generating random graphs with specified probabilities Prob(c) is in general highly non trivial common method to generate graphs with prescribed degrees: Generating graphs from ensembles Generating random graphs with specified probabilities Prob(c) is in general highly non trivial common method to generate graphs with prescribed degrees: construct ad hoc graph with (k1 , . . . , kN ) Generating graphs from ensembles Generating random graphs with specified probabilities Prob(c) is in general highly non trivial common method to generate graphs with prescribed degrees: construct ad hoc graph with (k1 , . . . , kN ) shuffle links (randomize) while preserving degrees via ‘edge swaps’ • • • • • • −→ • • Generating graphs from ensembles Generating random graphs with specified probabilities Prob(c) is in general highly non trivial common method to generate graphs with prescribed degrees: construct ad hoc graph with (k1 , . . . , kN ) shuffle links (randomize) while preserving degrees via ‘edge swaps’ Problems: • • • • • • −→ • • Generating graphs from ensembles Generating random graphs with specified probabilities Prob(c) is in general highly non trivial common method to generate graphs with prescribed degrees: construct ad hoc graph with (k1 , . . . , kN ) shuffle links (randomize) while preserving degrees via ‘edge swaps’ • • • • • • −→ • Problems: algorithm equilibration, when to stop? • Generating graphs from ensembles Generating random graphs with specified probabilities Prob(c) is in general highly non trivial common method to generate graphs with prescribed degrees: construct ad hoc graph with (k1 , . . . , kN ) shuffle links (randomize) while preserving degrees via ‘edge swaps’ • • • • • • −→ • • Problems: algorithm equilibration, when to stop? algorithm-induced bias: am I generating each graph with the right probability? I shouldn’t generate some graphs more frequenlty than others!! Possible Bias Where will the mouse spend most of its time? (i) Possible Bias Where will the mouse spend most of its time? (i) And now? (ii) Possible Bias Where will the mouse spend most of its time? (i) And now? (ii) Accounting for mobilities “Room” = graph configuration “A”= c “B”= c0 c c0 Accounting for mobilities “Room” = graph configuration “A”= c “B”= c0 c Possible moves n(c) 6= n(c0 ) c0 Accounting for mobilities “Room” = graph configuration “A”= c “B”= c0 c c0 Possible moves n(c) 6= n(c0 ) Difference in mobility may lead to non-uniform sampling Accounting for mobilities “Room” = graph configuration “A”= c “B”= c0 c c0 Possible moves n(c) 6= n(c0 ) Difference in mobility may lead to non-uniform sampling To restore uniformity choose properly transition probability P (c → c0 ) Accounting for mobilities “Room” = graph configuration “A”= c “B”= c0 c c0 Possible moves n(c) 6= n(c0 ) Difference in mobility may lead to non-uniform sampling To restore uniformity choose properly transition probability P (c → c0 ) Can target any Prob(c) with appropriate P (c → c0 )! Accounting for mobilities “Room” = graph configuration “A”= c “B”= c0 c c0 Possible moves n(c) 6= n(c0 ) Difference in mobility may lead to non-uniform sampling To restore uniformity choose properly transition probability P (c → c0 ) Can target any Prob(c) with appropriate P (c → c0 )! Prob(c)P (c → c0 ) = Prob(c0 )P (c0 → c) Generating graphs from P (c|k, W ) P (k) Π(k, k 0 |c0 ) N = 4000 hki = 5 Π(k, k 0 ) (theory) Π(k, k 0 |cfinal ) After 75,000 accepted moves Outline 1 Networks as infrastructure of signalling processes 2 Graphs: definitions and topology measures 3 Tailored random graph ensembles 4 Generation of graphs 5 Current research: Sampling from Graphs and inference 6 Conclusions Sampling from PINs PI may be detected via different experiments Sampling from PINs PI may be detected via different experiments Assume we have L biologial species; M experiments; Sampling from PINs PI may be detected via different experiments Assume we have L biologial species; M experiments; Biological Experimental data : c1 X XXXz X : c11 .. . cM 1 c12 . . c2 X XXX- . z cM X 2 .. . 1 : c.L . cL . XX XXX z cM L experiment 1 .. . experiment M Different experiments yield different data Experiments are imperfect! Sampling from PINs PI may be detected via different experiments Assume we have L biologial species; M experiments; Biological Experimental data : c1 X XXXz X : c11 .. . cM 1 experiment 1 .. . experiment M Different experiments yield different data Experiments are imperfect! c12 . . c2 X XXX- . z cM X 2 .. . 1 : c.L . cL . XX XXX z cM L Infer p` , W` , ` = 1, . . . , L of true underlying biological PINs from noisy/inconsistent data? Need mathematical model! Experimental biases Model each experiment α = 1, . . . , M in terms of θα = {xα , y α , z α } Experimental biases Model each experiment α = 1, . . . , M in terms of θα = {xα , y α , z α } probability to miss a protein: xα Experimental biases Model each experiment α = 1, . . . , M in terms of θα = {xα , y α , z α } probability to miss a protein: xα probability to miss a link: y α Experimental biases Model each experiment α = 1, . . . , M in terms of θα = {xα , y α , z α } probability to miss a protein: xα probability to miss a link: y α probability to create a non-existing link: z α Bayesian inference Objective: infer {θα , p` , W` } given data Bayesian inference Objective: infer {θα , p` , W` } given data Maximimum likelihood: maximize “posterior” p({θα }, {p` }, {W` }|{cα` }) over p` , W` , θα Bayesian inference Objective: infer {θα , p` , W` } given data Maximimum likelihood: maximize “posterior” p({θα }, {p` }, {W` }|{cα` }) over p` , W` , θα P (X) P (X,Y ) }| { }| { zX z P (Y )P (X|Y ) Use Bayes P (Y |X) = P (Y )P (X|Y ) / Y Bayesian inference Objective: infer {θα , p` , W` } given data Maximimum likelihood: maximize “posterior” p({θα }, {p` }, {W` }|{cα` }) over p` , W` , θα P (X) P (X,Y ) }| { }| { zX z P (Y )P (X|Y ) Use Bayes P (Y |X) = P (Y )P (X|Y ) / Y only need p(cα` |θα , p` , W` ), p(θα , p` , W` ) Bayesian inference Objective: infer {θα , p` , W` } given data Maximimum likelihood: maximize “posterior” p({θα }, {p` }, {W` }|{cα` }) over p` , W` , θα P (X) P (X,Y ) }| { }| { zX z P (Y )P (X|Y ) Use Bayes P (Y |X) = P (Y )P (X|Y ) / Y only need p(cα` |θα , p` , W` ), p(θα , p` , W` ) No “apriori” knowledge: uniform prior p(p` , W` ), p(θα ) Bayesian inference Objective: infer {θα , p` , W` } given data Maximimum likelihood: maximize “posterior” p({θα }, {p` }, {W` }|{cα` }) over p` , W` , θα P (X) P (X,Y ) }| { }| { zX z P (Y )P (X|Y ) Use Bayes P (Y |X) = P (Y )P (X|Y ) / Y only need p(cα` |θα , p` , W` ), p(θα , p` , W` ) No “apriori” knowledge: uniform prior p(p` , W` ), p(θα ) P P (X) = Y P (Y )P (X|Y ) Bayesian inference Objective: infer {θα , p` , W` } given data Maximimum likelihood: maximize “posterior” p({θα }, {p` }, {W` }|{cα` }) over p` , W` , θα P (X) P (X,Y ) }| { }| { zX z P (Y )P (X|Y ) Use Bayes P (Y |X) = P (Y )P (X|Y ) / Y only need p(cα` |θα , p` , W` ), p(θα , p` , W` ) No “apriori” knowledge: uniform prior p(p` , W` ), p(θα ) P P (X) = Y P (Y )P (X|Y ) Bayesian inference Objective: infer {θα , p` , W` } given data Maximimum likelihood: maximize “posterior” p({θα }, {p` }, {W` }|{cα` }) over p` , W` , θα P (X) P (X,Y ) }| { }| { zX z P (Y )P (X|Y ) Use Bayes P (Y |X) = P (Y )P (X|Y ) / Y only need p(cα` |θα , p` , W` ), p(θα , p` , W` ) No “apriori” knowledge: uniform prior p(p` , W` ), p(θα ) P P (X) = Y P (Y )P (X|Y ) p(cα ` |θα , p` , W` ) = X c` P (c` |W` , p` ) p(cα |θα , c` ) | {z } | ` {z } known! easy Bayesian inference Objective: infer {θα , p` , W` } given data Maximimum likelihood: maximize “posterior” p({θα }, {p` }, {W` }|{cα` }) over p` , W` , θα P (X) P (X,Y ) }| { }| { zX z P (Y )P (X|Y ) Use Bayes P (Y |X) = P (Y )P (X|Y ) / Y only need p(cα` |θα , p` , W` ), p(θα , p` , W` ) No “apriori” knowledge: uniform prior p(p` , W` ), p(θα ) P P (X) = Y P (Y )P (X|Y ) p(cα ` |θα , p` , W` ) = X c` P (c` |W` , p` ) p(cα |θα , c` ) | {z } | ` {z } Decontamination of data sets! known! easy Conclusions Networks as infrastructure of signalling processes Conclusions Networks as infrastructure of signalling processes Maths proved useful to: Conclusions Networks as infrastructure of signalling processes Maths proved useful to: Quantify structure of networks Conclusions Networks as infrastructure of signalling processes Maths proved useful to: Quantify structure of networks Define “good” reference models for hypothesis testing and quantitative predictions Conclusions Networks as infrastructure of signalling processes Maths proved useful to: Quantify structure of networks Define “good” reference models for hypothesis testing and quantitative predictions Assess significance of observed patterns Conclusions Networks as infrastructure of signalling processes Maths proved useful to: Quantify structure of networks Define “good” reference models for hypothesis testing and quantitative predictions Assess significance of observed patterns Define distance between graphs Conclusions Networks as infrastructure of signalling processes Maths proved useful to: Quantify structure of networks Define “good” reference models for hypothesis testing and quantitative predictions Assess significance of observed patterns Define distance between graphs Generate random graphs with the right probabilities Conclusions Networks as infrastructure of signalling processes Maths proved useful to: Quantify structure of networks Define “good” reference models for hypothesis testing and quantitative predictions Assess significance of observed patterns Define distance between graphs Generate random graphs with the right probabilities Network inference from noisy and incosistent data Conclusions Networks as infrastructure of signalling processes Maths proved useful to: Quantify structure of networks Define “good” reference models for hypothesis testing and quantitative predictions Assess significance of observed patterns Define distance between graphs Generate random graphs with the right probabilities Network inference from noisy and incosistent data Maths of 21st century can help greatly bio-medical sciences Conclusions Networks as infrastructure of signalling processes Maths proved useful to: Quantify structure of networks Define “good” reference models for hypothesis testing and quantitative predictions Assess significance of observed patterns Define distance between graphs Generate random graphs with the right probabilities Network inference from noisy and incosistent data Maths of 21st century can help greatly bio-medical sciences Thanks!