Chapter 4 Centralities for undirected graphs

The next step in network analysis is to define, using mathematical formalisms, the features we want to compute. We therefore present indexes, called centralities, that describe the importance of a node in a network. This weight can be interpreted as a measure of the role a node plays in a particular biological process, given its particular topology. As noted for the metabolic network, a chemical reaction has a direction, so it is easy to see why we need centralities designed for a directed context. Furthermore, new data describing directed interactions are now available, and the need to interpret them led to our work. In this setting it is important to highlight that in a directed network a node cannot necessarily be reached by every other node; in that case we set the distance to ∞:

$$\mathrm{dist}(v, w) := \infty \quad \text{if } v \text{ does not reach } w.$$

This specification is relevant only for the distance-based centralities: Eccentricity, Closeness and Radiality. Fig. 4.1 presents the two networks used to show how the centrality computations work.

Figure 4.1: Networks used in the Eccentricity, Radiality and Closeness examples. (a) Undirected network. (b) Directed network.

4.1 Eccentricity

Eccentricity, presented by Hage and Harary[30], describes the neighborhood of a node using the distance between the root and all the other vertices in a graph. With eccentricity we can find the node that minimizes the maximum distance to any other node in the network. This node is the one with the highest eccentricity value.

Classic definition

Using the old definition, the index is computed as:

$$C_{ecc}(v) := \frac{1}{\max\{\mathrm{dist}(v, w) : w \in V\}}$$

New definition

We use the same formalism in directed networks, but we have to introduce a specification: eccentricity is computed only on the nodes that are reached by v.
This is important because it only gives us information on the neighborhood of a node: a node whose neighbors are at distance one has a high eccentricity but influences few other nodes, whereas a node with a low eccentricity is connected to distant nodes and can influence a cascade of reactions. Eccentricity is calculated by computing the shortest paths between v and all the other nodes in the graph; the longest of these paths is then chosen and the eccentricity is computed from it. Zero is the lowest value of eccentricity, assigned when a node does not reach any other node. The highest value is one, obtained when the longest shortest path between v and its neighbors has length one.

Example

Using the graph in fig. 4.1 we can show how eccentricity works. If we analyse the green node and its neighbors, it is easy to see that in the undirected network the yellow node is its farthest node. In the directed case the lightblue nodes are the farthest, and the distance is one. With this information we can compute the eccentricity of the green node in the two networks, using the distances dist(green, w) = {1, 1, 1, 1, 1, 2} in the undirected context and dist(green, w) = {1, 1, 1, 1} in the directed context:

$$C_{ecc}(\text{undirected}) = \frac{1}{2} = 0.5$$

$$C_{ecc}(\text{directed}) = \frac{1}{1} = 1$$

These values give us information about the topology of the graph. In the undirected case in fig. 4.1, the green node has a central role with respect to the lightblue and yellow ones, which appear to be more marginal. But if we compute eccentricity in a directed context, we note that the green node has a higher value than the yellow nodes. This is expected: each of the two yellow nodes can reach all the other nodes only along longer paths, whereas green has nearer neighbors, and this has great influence on eccentricity, as mentioned above.
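The eccentricity values of the green node can be reproduced with a breadth-first search. The following is a minimal sketch, not the implementation used in this work: fig. 4.1 is not reproduced here, so the edge lists are an assumption inferred from the distances quoted in the text, and the names `build_adjacency`, `bfs_distances` and `eccentricity` are hypothetical.

```python
from collections import deque

# ASSUMPTION: topology inferred from the distances quoted in the text,
# not taken from fig. 4.1 itself. "lb" stands for a lightblue node.
UNDIRECTED_EDGES = [("green", "lb1"), ("green", "lb2"), ("green", "lb3"),
                    ("green", "lb4"), ("green", "orange"), ("orange", "yellow")]
DIRECTED_EDGES = [("yellow1", "yellow2"), ("yellow2", "green"),
                  ("green", "lb1"), ("green", "lb2"),
                  ("green", "lb3"), ("green", "lb4")]


def build_adjacency(edges, directed):
    """Turn an edge list into an adjacency dict holding every node as a key."""
    adj = {}
    for u, w in edges:
        adj.setdefault(u, []).append(w)
        adj.setdefault(w, [])
        if not directed:
            adj[w].append(u)
    return adj


def bfs_distances(adj, source):
    """Hop distances from source to every node it reaches (source included)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def eccentricity(adj, v):
    """1 / max distance over the nodes reached by v; 0 if v reaches no node."""
    reached = [d for node, d in bfs_distances(adj, v).items() if node != v]
    return 1 / max(reached) if reached else 0.0


undirected = build_adjacency(UNDIRECTED_EDGES, directed=False)
directed = build_adjacency(DIRECTED_EDGES, directed=True)
print(eccentricity(undirected, "green"))  # 0.5
print(eccentricity(directed, "green"))    # 1.0
```

Restricting the maximum to the nodes actually reached is what keeps the directed value finite: unreachable nodes never appear in the result of `bfs_distances`, so they cannot push the maximum distance to ∞.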
In this way we gain an idea of the real neighborhood of a node, and this tells us that the green node plays an important role for its immediate neighbors, because it regulates them directly.

Biological interpretation[29]

The eccentricity of a node in a biological network, for instance a protein-signalling network, can be interpreted as the ease with which a protein can be functionally reached by all other proteins in the network. Thus, a protein with high eccentricity, compared to the average eccentricity of the network, will be more easily influenced by the activity of other proteins (the protein is subject to a more stringent or complex regulation) or, conversely, could easily influence several other proteins. In contrast, a low eccentricity, compared to the average eccentricity of the network, could indicate a marginal functional role (although this should also be evaluated with other parameters and contextualized to the network annotations).

4.2 Closeness

Closeness[31] is quite similar to eccentricity, because both work on distances between nodes, but it is used to find the minimal sum of all the distances in a network. With eccentricity we want to minimize the longest path, whereas with closeness we want to find the node that minimizes the distance to any other node in the graph. Nodes with high closeness values therefore have low distances to their neighbors.

Classic definition

The old closeness:

$$C_{clo}(v) := \frac{1}{\sum_{w \in V} \mathrm{dist}(v, w)}$$

New definition

The new formulation:

$$C_{clo}(v) := \sum_{w \in V} \frac{1}{\mathrm{dist}(v, w)}$$

We modified the definition to account for the fact that several nodes may be unable to reach other nodes. By using the value ∞ when a node does not reach another node, we add a contribution of zero to the total sum; in this way each node adds its contribution to the total Closeness.
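The difference between the two closeness definitions can be illustrated with a short sketch. It assumes the readings that match the worked example (classic: reciprocal of the sum of distances; new: sum of reciprocal distances) and an edge list inferred from the distances quoted in the text, since fig. 4.1 is not reproduced here. All function and variable names are hypothetical.

```python
import math
from collections import deque

# ASSUMPTION: topology inferred from the quoted distances, not from fig. 4.1.
UNDIRECTED_EDGES = [("green", "lb1"), ("green", "lb2"), ("green", "lb3"),
                    ("green", "lb4"), ("green", "orange"), ("orange", "yellow")]
DIRECTED_EDGES = [("yellow1", "yellow2"), ("yellow2", "green"),
                  ("green", "lb1"), ("green", "lb2"),
                  ("green", "lb3"), ("green", "lb4")]


def build_adjacency(edges, directed):
    """Turn an edge list into an adjacency dict holding every node as a key."""
    adj = {}
    for u, w in edges:
        adj.setdefault(u, []).append(w)
        adj.setdefault(w, [])
        if not directed:
            adj[w].append(u)
    return adj


def bfs_distances(adj, source):
    """Hop distances from source to every node it reaches (source included)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness_classic(adj, v):
    """1 / sum of distances; a single unreachable node makes the sum infinite."""
    dist = bfs_distances(adj, v)
    total = sum(dist.get(w, math.inf) for w in adj if w != v)
    return 1 / total


def closeness_new(adj, v):
    """Sum of 1 / distance; unreachable nodes contribute 1/inf = 0."""
    dist = bfs_distances(adj, v)
    return sum(1 / dist[w] for w in adj if w != v and w in dist)


undirected = build_adjacency(UNDIRECTED_EDGES, directed=False)
directed = build_adjacency(DIRECTED_EDGES, directed=True)
print(round(closeness_classic(undirected, "green"), 4))  # 0.1429
print(closeness_classic(directed, "green"))              # 0.0
print(closeness_new(directed, "green"))                  # 4.0
```

On the directed graph the classic form collapses to zero as soon as one node is unreachable, which is the situation the new form is meant to avoid.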
In directed networks, as for eccentricity, we assume that a node that cannot be reached is at infinite distance from the root. Under this assumption such a node contributes a vanishing term, 1/∞ = 0, to the closeness sum.

Example

Fig. 4.1 provides an example for computing this centrality. The distances are the same as those used for eccentricity: dist(green, w) = {1, 1, 1, 1, 1, 2} in the undirected context and dist(green, w) = {1, 1, 1, 1, ∞, ∞} in the directed context. The computed values are:

$$C_{clo}(\text{undirected}) = \frac{1}{1+1+1+1+1+2} = \frac{1}{7} = 0.1429$$

$$C_{clo}(\text{directed}) = \frac{1}{1} + \frac{1}{1} + \frac{1}{1} + \frac{1}{1} + \frac{1}{\infty} + \frac{1}{\infty} = 4$$

These values tell us that the yellow node has a neighborhood that is more compact than that of the green one. In a real biological network it is possible to find more than one shortest path between two nodes. For closeness and radiality we decided to consider only one shortest path when computing the centrality. This matters because what we want to investigate with these values is the distance between the root and all the other nodes; so we consider only one shortest path between, for example, the green and yellow nodes, because its length gives us the distance. We do not want to know how many paths reach yellow starting from the green node, only the distance between them. High closeness values become interesting when supported by high eccentricity values, compared to the average values in the network. This is not confirmed here, because in the directed case a node with high eccentricity has a very low closeness value compared to the yellow one.

Biological interpretation[29]

The closeness of a node in a biological network, for instance a protein-signalling network, can be interpreted as the probability of a protein being functionally relevant for several other proteins, but with the possibility of being irrelevant for a few other proteins.
Thus, a protein with high closeness, compared to the average closeness of the network, will easily be central to the regulation of other proteins, but with some proteins not influenced by its activity. Notably, in biological networks it could also be of interest to analyze proteins with low closeness, compared to the average closeness of the network, as these proteins, although less relevant to that specific network, are possibly behaving as intersecting boundaries with other networks. Accordingly, a signalling network with a very high average closeness is more likely to be organizing functional units or modules, whereas a signalling network with a very low average closeness is more likely to behave as an open cluster of proteins connecting different regulatory modules.

4.3 Radiality

Radiality[32], like closeness, is distance based and is useful for understanding whether a node is integrated into the network: the closer the node is to other nodes, the better it is integrated into the graph. As with closeness, high values of radiality suggest that the node can easily reach other nodes.

Classic definition

The classic definition, where n is the number of nodes in the network and Δ_G is the diameter of the graph, that is, the longest shortest path found in the network:

$$C_{rad}(v) := \frac{\sum_{w \in V} \left(\Delta_G + 1 - \mathrm{dist}(v, w)\right)}{n - 1}$$

In our case n = 7 and Δ_G = 2.

New definition

The new formulation drops the term that describes the number of nodes in the network:

$$C_{rad}(v) := \frac{1}{\sum_{w \in V} \left(\Delta_G - \frac{1}{\mathrm{dist}(v, w)}\right)}$$

where Δ_G = 1. The new definition, as already said for closeness, allows us to assign ∞ when a node does not reach another node. In that case 1/dist(v, w) = 0, so the term for w reduces to Δ_G; its reciprocal, 1/Δ_G, is the lowest partial value. Each node in the network adds its contribution to Radiality.

Example

Using the graph in fig. 4.1 it is possible to compute the radiality for the yellow and green nodes.
The distances are, once again, dist(green, w) = {1, 1, 1, 1, 1, 2} in the undirected context and dist(green, w) = {1, 1, 1, 1, ∞, ∞} in the directed context, so Δ_G is two in the first example but one in the second.

$$C_{rad}(\text{undirected}) = \frac{(3-1) \cdot 5 + (3-2)}{7-1} = \frac{11}{6} = 1.8333$$

$$C_{rad}(\text{directed}) = \frac{1}{\left(1 - \frac{1}{1}\right) \cdot 4 + \left(1 - \frac{1}{\infty}\right) \cdot 2} = \frac{1}{2} = 0.5$$

Eccentricity, closeness and radiality are influenced by the distances between nodes and have to be compared with average values. Together they highlight the role of a node, marginal or central, but they may be influenced by opposing situations that depend on the context, directed or undirected.

Biological interpretation[29]

The radiality of a node in a biological network, for instance a protein-signalling network, can be interpreted as the probability of a protein being functionally relevant for some proteins, but with the possibility of being irrelevant for others. Thus, a protein with high radiality, compared to the average radiality of the network, will be relatively central to the regulation of other proteins, though some proteins will not be influenced by its activity. Notably, in biological networks it could also be of interest to analyze proteins with low radiality, compared to the average radiality of the network, as these proteins, although less relevant for that specific network, are possibly behaving as intersecting boundaries with other networks. Accordingly, a signalling network with a very high average radiality is more likely to be organizing functional units or modules, whereas a signalling network with a very low average radiality is more likely to behave as an open cluster of proteins connecting different regulatory modules. All these interpretations should be accompanied by the simultaneous evaluation of eccentricity and closeness.

4.4 Stress

Stress centrality[33] is based on shortest paths.
Eccentricity, Radiality and Closeness are based on distances between nodes; this difference allows those three centralities to be evaluated together, while stress values are interpreted together with Betweenness centrality, which is also shortest-path based. Stress gives us information about the number of shortest paths passing through a node and tells us how much work a vertex has to do in a graph.

Classic definition

The value δ_st(v) is the number of shortest paths that start from node s, end in node t and pass through node v:

$$C_{str}(v) := \sum_{s \neq v \in V} \sum_{t \neq v \in V} \delta_{st}(v)$$

Shortest paths starting or ending in v are excluded, because stress measures the paths that pass through a node. This assumption is also used for Betweenness centrality.

New definition

The formalism we use is the same; the difference, as said in Section 2, lies in the different definition of degree. In a directed network we have in-degree and out-degree, which determine the Stress value of a node. This behaviour is highlighted by the results shown in the Appendix.

Example

The stress of the green node in fig. 4.1 is easily computed. In the undirected example each node reaches every other node in the graph by passing through the green one, except for the yellow node reaching the orange one. So there are twenty-eight paths that pass through the green node, because each lightblue node reaches five nodes through green, while the orange and the yellow nodes each reach only four. In the directed network, only paths starting from the yellow nodes pass through the green node, and there are only eight of them, because the yellow nodes reach only the lightblue ones through green. By comparison, the stress of the lightblue nodes is zero, and this tells us that the green node has a relevant role in connecting nodes in this graph.
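The two stress values quoted in the example can be checked with a short sketch. It counts, for every ordered pair (s, t), the shortest paths that pass through v, using the standard identity that a shortest s-t path goes through v exactly when dist(s, v) + dist(v, t) = dist(s, t). The edge lists are an assumption inferred from the text, since fig. 4.1 is not reproduced here, and all names are hypothetical.

```python
from collections import deque

# ASSUMPTION: topology inferred from the text, not taken from fig. 4.1.
UNDIRECTED_EDGES = [("green", "lb1"), ("green", "lb2"), ("green", "lb3"),
                    ("green", "lb4"), ("green", "orange"), ("orange", "yellow")]
DIRECTED_EDGES = [("yellow1", "yellow2"), ("yellow2", "green"),
                  ("green", "lb1"), ("green", "lb2"),
                  ("green", "lb3"), ("green", "lb4")]


def build_adjacency(edges, directed):
    """Turn an edge list into an adjacency dict holding every node as a key."""
    adj = {}
    for u, w in edges:
        adj.setdefault(u, []).append(w)
        adj.setdefault(w, [])
        if not directed:
            adj[w].append(u)
    return adj


def shortest_path_counts(adj, source):
    """BFS returning (dist, sigma): hop distance and number of shortest paths."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:          # first time w is seen
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u is a predecessor of w
                sigma[w] += sigma[u]
    return dist, sigma


def stress(adj, v):
    """Number of shortest s-t paths through v, over ordered pairs with s, t != v."""
    dist_v, sigma_v = shortest_path_counts(adj, v)
    total = 0
    for s in adj:
        if s == v:
            continue
        dist_s, sigma_s = shortest_path_counts(adj, s)
        if v not in dist_s:            # s cannot reach v at all
            continue
        for t in adj:
            if t == s or t == v or t not in dist_s or t not in dist_v:
                continue
            if dist_s[v] + dist_v[t] == dist_s[t]:
                total += sigma_s[v] * sigma_v[t]
    return total


undirected = build_adjacency(UNDIRECTED_EDGES, directed=False)
directed = build_adjacency(DIRECTED_EDGES, directed=True)
print(stress(undirected, "green"))  # 28
print(stress(directed, "green"))    # 8
print(stress(undirected, "lb1"))    # 0
```

Pairs are counted in both directions, which is the convention the example uses to arrive at twenty-eight paths in the undirected case.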
Biological interpretation[29]

The stress of a node in a biological network, for instance a protein-signalling network, can indicate the relevance of a protein as functionally capable of holding together communicating nodes. The higher the value, the higher the relevance of the protein in connecting regulatory molecules. Due to the nature of this centrality, it is possible that the stress simply indicates a molecule heavily involved in cellular processes but not relevant to maintaining communications between other proteins.

4.5 Betweenness

Betweenness[34], like Stress, is based on the number of shortest paths. This centrality is computed by counting the number of shortest paths that start from one node and end in another, v1 and v2, and that pass through a third node, n. Betweenness is computed by pairs of nodes: two nodes are chosen and the number of shortest paths between them is counted. That number is then compared to the number of those paths that pass through a certain node, and then through another node, for all the nodes in the network, obtaining a partial betweenness for each. Then another pair is chosen and the computation restarts. In the end we sum all the partial betweenness values for each node. This centrality is interesting because it gives us not only the amount of work of every node, but also tells us whether a node is essential in maintaining connections in the graph.

Classic definition

$$C_{bet}(v) := \sum_{s \neq v \in V} \sum_{t \neq v \in V} \delta_{st}(v)$$

where

$$\delta_{st}(v) := \frac{\sigma_{st}(v)}{\sigma_{st}}$$

if s reaches t; otherwise δ_st(v) = 0. Here σ_st is the number of shortest paths from s to t and σ_st(v) is the number of those paths that pass through v.

New definition

As already said for Stress, we kept the same definition, but the results vary because of the different nature of directed and undirected networks.

Example

Here we show how to compute betweenness for the green node in fig. 4.1. δ_st(green) depends on the number of shortest paths starting from s and ending in t and on the number of those shortest paths that pass through the green node.
So in the undirected case we have:

$$\delta_{lightblue,yellow}(\text{green}) = (1/1) \cdot 4 = 4$$
$$\delta_{yellow,lightblue}(\text{green}) = (1/1) \cdot 4 = 4$$
$$\delta_{yellow,orange}(\text{green}) = 0/1 = 0$$
$$\delta_{lightblue,orange}(\text{green}) = (1/1) \cdot 4 = 4$$
$$\delta_{orange,lightblue}(\text{green}) = (1/1) \cdot 4 = 4$$
$$\delta_{lightblue,lightblue}(\text{green}) = (1/1) \cdot 4 = 4$$

whereas in the directed case:

$$\delta_{yellow,yellow}(\text{green}) = 0/1 = 0$$
$$\delta_{yellow,lightblue}(\text{green}) = ((1/1) \cdot 4) \cdot 2 = 8$$

So C_bet(green, undirected) = 20 and C_bet(green, directed) = 8. In our example 0/1 means that the shortest path between s and t does not pass through the green node, and 1/1 means that the shortest path passes through the green node.

Biological interpretation[29]

The S.-P. Betweenness of a node in a biological network, for instance a protein-signalling network, can indicate the relevance of a protein as functionally capable of holding together communicating proteins. The higher the value, the higher the relevance of the protein as an organizing regulatory molecule. The S.-P. Betweenness of a protein effectively indicates the capability of a protein to bring distant proteins into communication. In signalling modules, proteins with high S.-P. Betweenness are likely to be crucial to maintaining the function and coherence of signalling mechanisms.
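In the networks of fig. 4.1 every pair of nodes is joined by a single shortest path, so each δ_st(v) is either 0/1 or 1/1. The ratio σ_st(v)/σ_st matters once several shortest paths exist between the same pair. The sketch below illustrates this on a hypothetical four-node diamond graph that is not part of this work; the function names are likewise assumptions, and pairs are counted in both directions.

```python
from collections import deque

# HYPOTHETICAL example graph (not from fig. 4.1): a - b - d and a - c - d,
# so the pair (a, d) has two shortest paths, one through b and one through c.
DIAMOND = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c"],
}


def shortest_path_counts(adj, source):
    """BFS returning (dist, sigma): hop distance and number of shortest paths."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:          # first time w is seen
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u is a predecessor of w
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(adj, v):
    """Sum over ordered pairs (s, t), s, t != v, of sigma_st(v) / sigma_st."""
    dist_v, sigma_v = shortest_path_counts(adj, v)
    total = 0.0
    for s in adj:
        if s == v:
            continue
        dist_s, sigma_s = shortest_path_counts(adj, s)
        if v not in dist_s:            # s cannot reach v
            continue
        for t in adj:
            if t == s or t == v or t not in dist_s or t not in dist_v:
                continue               # delta_st(v) = 0 when s does not reach t
            if dist_s[v] + dist_v[t] == dist_s[t]:
                # shortest s-t paths through v = (paths s->v) * (paths v->t)
                total += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return total


print(betweenness(DIAMOND, "b"))  # 1.0
print(betweenness(DIAMOND, "a"))  # 1.0
```

Node b lies on one of the two shortest paths between a and d, so each of the ordered pairs (a, d) and (d, a) contributes 1/2. Its stress on the same graph would be 2, which shows how betweenness discounts paths that have alternatives.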