Chapter 4
Centralities for undirected graphs
The next step in network analysis is to define, using mathematical formalisms, the different features we want to compute. We therefore present indexes, called centralities, that are used to describe the importance of a node in a network. This weight can be interpreted as a measure of the role a node plays in a particular biological process, with reference to its particular topology. As we said for the metabolic network, a chemical reaction has a direction, so it is easy to understand why we need centralities that are designed for a directed context. Furthermore, new data describing directed interactions are now available, and the need to interpret them led to our work. In these terms it is important to highlight that in a directed network a node cannot necessarily be reached from every other node; in this case we introduce a distance that is set to ∞: dist(v, w) := ∞ if v doesn't reach w. This specification is useful only for the distance-based centralities: Eccentricity, Closeness and Radiality.
In fig.4.1 we present the two networks that are used to show how the computation of centralities works.
(a) Undirected network
(b) Directed network
Figure 4.1: Networks used in Eccentricity, Radiality and Closeness examples
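The convention dist(v, w) := ∞ for unreachable nodes can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `distances` and the four-node toy graph are assumptions made here, not part of the original work, and all edges are taken to be unweighted.

```python
from collections import deque
import math

def distances(adj, source):
    """Breadth-first distances from source; unreachable nodes keep math.inf."""
    dist = {node: math.inf for node in adj}
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if dist[w] == math.inf:      # first time w is discovered
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

# Hypothetical directed graph: d -> a -> b -> c (adjacency lists of successors).
toy = {"a": ["b"], "b": ["c"], "c": [], "d": ["a"]}
dist_from_a = distances(toy, "a")
print(dist_from_a)
```

Here `a` reaches `b` and `c`, but not `d`, so the entry for `d` stays at infinity, which is the situation the distance-based centralities have to handle in the directed case.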
CHAPTER 4. CENTRALITIES FOR UNDIRECTED GRAPHS
4.1 Eccentricity
Eccentricity, presented by Hage and Harary[30], describes the neighborhood of a node using the distance between the root and all the other vertices in a graph. With eccentricity we can find the node that minimizes the maximum distance to any other node in the network. This node is the one with the highest eccentricity value.
Classic definition
The index, using the old definition, is computed by:
$$C_{ecc}(v) := \frac{1}{\max\{\mathrm{dist}(v, w) : w \in V\}}$$
New definition
We use the same formalism in directed networks, but we have to introduce a specification: eccentricity is computed only on the nodes that are reached by v. This is important because the index then gives us information only on the neighborhood of a node. A node whose neighbors are all at distance one has a high eccentricity but influences few other nodes, whereas a node with a low eccentricity is connected to distant nodes and can influence a cascade of reactions.
Eccentricity is calculated by computing all the shortest paths between v and all the other nodes in the graph. Then the longest of these paths is chosen and the eccentricity is computed. Zero is the lowest value of eccentricity, obtained when a node does not reach any other node. The highest value is one, obtained when the longest shortest path between v and all its neighbors is one.
Example
Using the graph in fig.4.1 we can show how eccentricity works. If we analyse the green node and its neighbors, it is easy to see that in the undirected network its farthest node is the yellow one. In the directed case the lightblue nodes are the farthest, and in this case the distance is one. With this information we can compute the eccentricity of the green node in the different networks using the distances dist(green, w) = {1, 1, 1, 1, 1, 2} in the undirected context and dist(green, w) = {1, 1, 1, 1} in the directed context:
$$C_{ecc}(\text{undirected}) = \frac{1}{2} = 0.5$$
$$C_{ecc}(\text{directed}) = \frac{1}{1} = 1$$
These values give us information about the topology of the graph. In the undirected case, in fig.4.1, the green node has a central role with respect to the lightblue and yellow ones, which appear to be more marginal. But if we compute eccentricity in a directed context we note that the green node has a higher value than the yellow nodes. This is expected, because each of the two yellow nodes reaches the other nodes only through longer paths, whereas green has nearer neighbors, and this has great influence on eccentricity, as mentioned above. In this way we gain an idea of the real neighborhood of a node: the green node has an important role to play for the nodes that are its immediate neighbors because it regulates them directly.
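The computation of eccentricity restricted to reached nodes can be sketched as follows. The two adjacency lists are an assumed reading of the networks of fig.4.1, reconstructed only from the distances and the description in the text (the figure itself is not available here); the node names and the helper `reached_distances` are hypothetical.

```python
from collections import deque

def reached_distances(adj, source):
    """BFS distances from source to every node it reaches (source excluded)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    del dist[source]
    return dist

def eccentricity(adj, v):
    """1 / (longest shortest path from v) over reached nodes; 0 if none is reached."""
    dist = reached_distances(adj, v)
    return 1 / max(dist.values()) if dist else 0.0

# Assumed undirected network: green is linked to four lightblue nodes and to
# orange; yellow is linked to orange only (each edge listed in both directions).
undirected = {
    "green": ["lb1", "lb2", "lb3", "lb4", "orange"],
    "lb1": ["green"], "lb2": ["green"], "lb3": ["green"], "lb4": ["green"],
    "orange": ["green", "yellow"],
    "yellow": ["orange"],
}
# Assumed directed network: yellow_a -> yellow_b -> green -> four lightblue nodes.
directed = {
    "yellow_a": ["yellow_b"],
    "yellow_b": ["green"],
    "green": ["lb1", "lb2", "lb3", "lb4"],
    "lb1": [], "lb2": [], "lb3": [], "lb4": [],
}

ecc_undirected = eccentricity(undirected, "green")  # farthest node at distance 2
ecc_directed = eccentricity(directed, "green")      # farthest node at distance 1
print(ecc_undirected, ecc_directed)
```

Under these assumptions the sketch gives 0.5 and 1 for the green node, and a lower value for `yellow_a` in the directed graph, whose farthest reached node is three steps away.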
Biological interpretation[29]
The eccentricity of a node in a biological network, for instance a protein-signalling network, can be interpreted as the ease with which a protein can be functionally reached by all other proteins in the network. Thus, a protein with high eccentricity, compared to the average eccentricity of the network, will be more easily influenced by the activity of other proteins (the protein is subject to a more stringent or complex regulation) or, conversely, could easily influence several other proteins. In contrast, a low eccentricity, compared to the average eccentricity of the network, could indicate a marginal functional role (although this should also be evaluated with other parameters and contextualized to the network annotations).
4.2 Closeness
Closeness[31] is quite similar to eccentricity, because both work on distances
between nodes, but it is used to find the minimal sum of all the distances in a
network. With eccentricity we want to minimize the longest path, but with
closeness we want to find the node that minimizes the distance to any other
node in the graph. So nodes with high closeness values have low distances
to their neighbors.
Classic definition
The old closeness:
$$C_{clo}(v) := \frac{1}{\sum_{w \in V} \mathrm{dist}(v, w)}$$
New definition
The new formulation:
$$C_{clo}(v) := \sum_{w \in V} \frac{1}{\mathrm{dist}(v, w)}$$
We modified the definition in order to handle the fact that several nodes may be unable to reach other nodes. Using the value ∞ when a node does not reach another node, we add a contribution of zero to the total sum; in this way each node adds its contribution to the total Closeness.
In directed networks, as for eccentricity, we assume that a node that cannot be reached has an infinite distance from the root. Under this assumption, its contribution to the closeness sum tends to zero.
Example
In fig.4.1 there is an example used to compute this centrality. The distances are the same as those used for eccentricity: dist(green, w) = {1, 1, 1, 1, 1, 2} in the undirected context and dist(green, w) = {1, 1, 1, 1, ∞, ∞} in the directed context.
The computed values are:
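$$C_{clo}(\text{undirected}) = \frac{1}{1+1+1+1+1+2} = \frac{1}{7} = 0.1429$$
$$C_{clo}(\text{directed}) = \frac{1}{1} + \frac{1}{1} + \frac{1}{1} + \frac{1}{1} + \frac{1}{\infty} + \frac{1}{\infty} = 4$$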
These values tell us that the yellow node has a neighborhood that is more compact than the green one. In a real biological network it is possible to find more than one shortest path between two nodes. We decided to consider, for closeness and radiality, only one shortest path when computing these two centralities. This matters because what we want to investigate is the distance between the root and all the other nodes, so we consider only one shortest path between, for example, the green and yellow nodes, since its length gives us the distance. We do not want to know how many paths reach yellow starting from the green node, but only the distance between them. High closeness values become interesting when supported by high eccentricity values, compared to the average values in the network. This is not confirmed here, because a node with high eccentricity in the directed case has a very low closeness value compared to the yellow one.
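The two closeness formulations can be compared with a short sketch. As before, the adjacency lists are an assumed reading of fig.4.1 built from the distances quoted in the text, and the function names are hypothetical. The sketch relies on the fact that, with Python floats, `1 / math.inf` evaluates to `0.0`, which mirrors the zero contribution of unreachable nodes.

```python
from collections import deque
import math

def distances_to_others(adj, source):
    """BFS distances from source to all other nodes; math.inf if unreachable."""
    dist = {node: math.inf for node in adj}
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if dist[w] == math.inf:
                dist[w] = dist[u] + 1
                queue.append(w)
    return [d for node, d in dist.items() if node != source]

def closeness_classic(adj, v):
    """Reciprocal of the sum of distances (0.0 as soon as one node is unreachable)."""
    return 1 / sum(distances_to_others(adj, v))

def closeness_new(adj, v):
    """Sum of reciprocal distances; unreachable nodes contribute 1/inf = 0."""
    return sum(1 / d for d in distances_to_others(adj, v))

# Assumed topologies for the two networks of fig.4.1.
undirected = {
    "green": ["lb1", "lb2", "lb3", "lb4", "orange"],
    "lb1": ["green"], "lb2": ["green"], "lb3": ["green"], "lb4": ["green"],
    "orange": ["green", "yellow"],
    "yellow": ["orange"],
}
directed = {
    "yellow_a": ["yellow_b"],
    "yellow_b": ["green"],
    "green": ["lb1", "lb2", "lb3", "lb4"],
    "lb1": [], "lb2": [], "lb3": [], "lb4": [],
}

clo_undirected = closeness_classic(undirected, "green")  # 1 / 7
clo_directed = closeness_new(directed, "green")          # 1 + 1 + 1 + 1 + 0 + 0
print(round(clo_undirected, 4), clo_directed)
```

Applying the classic formula to the directed graph would give zero for the green node, because a single infinite distance makes the whole sum infinite; the new formulation avoids this by summing the reciprocals term by term.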
Biological interpretation[29]
The closeness of a node in a biological network, for instance a protein-signalling network, can be interpreted as the probability that a protein is functionally relevant for several other proteins, but with the possibility of being irrelevant for a few other proteins. Thus, a protein with high closeness, compared to the average closeness of the network, will easily be central to the regulation of other proteins, but with some proteins not influenced by its activity. Notably, in biological networks it could also be of interest to
analyze proteins with low closeness, compared to the average closeness of
the network, as these proteins, although less relevant to that specific network, are possibly behaving as intersecting boundaries with other networks.
Accordingly, a signalling network with a very high average closeness is more
likely organizing functional units or modules, whereas a signalling network
with very low average closeness will behave more likely as an open cluster
of proteins connecting different regulatory modules.
4.3 Radiality
Radiality[32], like closeness, is distance based and is useful to understand if
a node is integrated into the network. This means that the closer the node
is to other nodes the better it is integrated into the graph. Like closeness,
high values of radiality suggest that the node can easily reach other nodes.
Classic definition
The classic definition, where n is the number of nodes in the network and ΔG is the diameter of the graph, that is, the longest shortest path found in the network:
$$C_{rad}(v) := \frac{\sum_{w \in V} (\Delta_G + 1 - \mathrm{dist}(v, w))}{n - 1}$$
In our case n = 7 and ΔG = 2.
New definition
The new formulation loses the term that describes the number of nodes in the network.
$$C_{rad}(v) := \frac{1}{\sum_{w \in V} \left(\Delta_G - \frac{1}{\mathrm{dist}(v, w)}\right)}$$
where ΔG = 1. The new definition, as already said for closeness, allows us to assign ∞ when a node does not reach another node. This adds a 1/ΔG term to the total sum of Radiality, which is the lowest partial value. Each node in the network adds its contribution to Radiality.
Example
Using the graph in fig.4.1 it is possible to compute the radiality for the yellow and green nodes. The distances are, once again, dist(green, w) = {1, 1, 1, 1, 1, 2} in the undirected context and dist(green, w) = {1, 1, 1, 1, ∞, ∞} in the directed context, so ΔG is two in the first example but one in the second.
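$$C_{rad}(\text{undirected}) = \frac{(3-1) * 5 + (3-2)}{7-1} = \frac{11}{6} = 1.8333$$
$$C_{rad}(\text{directed}) = \frac{1}{\left(1 - \frac{1}{1}\right) * 4 + \left(1 - \frac{1}{\infty}\right) * 2} = \frac{1}{2} = 0.5$$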
Eccentricity, closeness and radiality are influenced by the distances between nodes and have to be compared with average values. Together they highlight the role of a node, marginal or central, but may be influenced by opposing situations that depend on the context, directed or undirected.
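The arithmetic of the classic radiality can be checked with a minimal sketch. The function name `radiality_classic` is hypothetical, and the distances, the diameter ΔG = 2 and n = 7 are simply the values quoted in the text for the green node in the undirected network; the sketch does not recompute the diameter from a graph.

```python
def radiality_classic(dists, diameter, n):
    """Sum of (diameter + 1 - dist) over the other nodes, divided by n - 1."""
    return sum(diameter + 1 - d for d in dists) / (n - 1)

# Values quoted in the text for the green node (undirected network).
dists_green = [1, 1, 1, 1, 1, 2]
rad_green = radiality_classic(dists_green, diameter=2, n=7)
print(round(rad_green, 4))
```

Each of the five nodes at distance one contributes 3 − 1 = 2 and the node at distance two contributes 3 − 2 = 1, so the sum is 11 and the result is 11/6.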
Biological interpretation[29]
The radiality of a node in a biological network, for instance a protein-signalling network, can be interpreted as the probability that a protein is functionally relevant for some proteins, but with the possibility of being irrelevant for others. Thus, a protein with high radiality, compared to the average radiality of the network, will be relatively central to the regulation of other proteins, though some proteins will not be influenced by its activity. Notably, in biological networks it could also be of interest to analyze proteins with low radiality, compared to the average radiality of the network, as these proteins, although less relevant for that specific network, are possibly behaving as intersecting boundaries with other networks. Accordingly, a signalling network with a very high average radiality is more likely organizing functional units or modules, whereas a signalling network with very low average radiality will behave more likely as an open cluster of proteins connecting different regulatory modules. All these interpretations should be accompanied by the simultaneous evaluation of eccentricity and closeness.
4.4 Stress
Stress centrality[33] is based on shortest paths, whereas Eccentricity, Radiality and Closeness are based on distances between nodes. This difference allows those three centralities to be evaluated together, while stress values are interpreted together with Betweenness centrality, which is also shortest-path based. Stress gives us information about the number of shortest paths passing through a node and tells us how much work a vertex has to do in a graph.
Classic definition
The value δst(v) is the number of shortest paths that start from node s, end in node t and pass through node v:
$$C_{str}(v) := \sum_{s \neq v \in V} \sum_{t \neq v \in V} \delta_{st}(v)$$
Shortest paths starting or ending in v are excluded because stress measures the paths that pass through a node. This assumption is also used for Betweenness centrality.
New definition
The formalism we use is the same; the difference, as said in Section 2, lies in the different definition of degree. In a directed network we have in-degree and out-degree, which determine the Stress value of a node. This behaviour is highlighted by the results shown in the Appendix.
Example
The stress of the green node in fig.4.1 is easily computable. In the undirected example each node reaches every other node in the graph by passing through the green one, except for the yellow node reaching the orange one. So there are twenty-eight paths that pass through the green node, because each lightblue node reaches five nodes, while the orange and the yellow nodes reach only four nodes each, passing through green. In the directed network, instead, only paths starting from the yellow nodes pass through the green one, and there are only eight of them, because the yellow nodes reach only the lightblue ones. For comparison, the stress of the lightblue nodes is zero, and this tells us that the green node has a relevant role in connecting nodes in this graph.
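The counts of twenty-eight and eight can be reproduced with a small sketch that counts shortest paths over ordered pairs (s, t). The adjacency lists are an assumed reading of fig.4.1 derived from the description above, and the helper names are hypothetical. A shortest path from s to t passes through v exactly when dist(s, v) + dist(v, t) = dist(s, t), and the number of such paths is the product of the shortest-path counts of the two halves.

```python
from collections import deque

def bfs_counts(adj, source):
    """Return (dist, sigma): BFS distances and numbers of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:             # w discovered for the first time
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:    # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma

def stress(adj, v):
    """Number of shortest s-t paths through v, over ordered pairs with s, t != v."""
    info = {s: bfs_counts(adj, s) for s in adj}
    dist_v, sigma_v = info[v]
    total = 0
    for s in adj:
        dist_s, sigma_s = info[s]
        if s == v or v not in dist_s:
            continue
        for t in adj:
            if t in (s, v) or t not in dist_s or t not in dist_v:
                continue
            if dist_s[v] + dist_v[t] == dist_s[t]:
                total += sigma_s[v] * sigma_v[t]
    return total

# Assumed topologies for the two networks of fig.4.1.
undirected = {
    "green": ["lb1", "lb2", "lb3", "lb4", "orange"],
    "lb1": ["green"], "lb2": ["green"], "lb3": ["green"], "lb4": ["green"],
    "orange": ["green", "yellow"],
    "yellow": ["orange"],
}
directed = {
    "yellow_a": ["yellow_b"],
    "yellow_b": ["green"],
    "green": ["lb1", "lb2", "lb3", "lb4"],
    "lb1": [], "lb2": [], "lb3": [], "lb4": [],
}

stress_undirected = stress(undirected, "green")
stress_directed = stress(directed, "green")
print(stress_undirected, stress_directed)
```

With these assumed graphs the green node obtains 28 in the undirected case and 8 in the directed one, while every lightblue node obtains 0.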
Biological interpretation[29]
The stress of a node in a biological network, for instance a protein-signalling
network, can indicate the relevance of a protein as functionally capable of
holding together communicating nodes. The higher the value the higher
the relevance of the protein in connecting regulatory molecules. Due to the
nature of this centrality, it is possible that the stress simply indicates a
molecule heavily involved in cellular processes but not relevant to maintaining communications between other proteins.
4.5 Betweenness
Betweenness[34], like Stress, is based on the number of shortest paths. This centrality is computed by counting the number of shortest paths that start from a node v1, end in another node v2 and pass through a third node n. Betweenness is computed over pairs of nodes: two nodes are chosen and the number of shortest paths between them is counted. This value is then compared to the number of those paths that pass through a certain node, and then through another node, for all the nodes in the network, obtaining partial betweenness values. Then another pair is chosen and
the computation restarts. In the end we have to sum all partial betweenness
for each node.
This centrality is interesting because it gives us not only the amount of
work of every node, but also it tells us if a node is essential in maintaining
connections in the graph.
Classic definition
$$C_{bet}(v) := \sum_{s \neq v \in V} \sum_{t \neq v \in V} \delta_{st}(v)$$
where:
$$\delta_{st}(v) := \frac{\sigma_{st}(v)}{\sigma_{st}}$$
if s reaches t. Else δst(v) = 0.
New definition
As already said for Stress, we maintained the same definition, but the results vary because of the different nature of directed and undirected networks.
Example
Here we show how to compute betweenness for the green node in fig.4.1.
δst (green) depends on the number of shortest paths starting from s and
ending in t and on the number of shortest paths that pass through the green
node. So we have in the undirected case:
δlightblue,yellow (green) = (1/1) ∗ 4 = 4
δyellow,lightblue (green) = (1/1) ∗ 4 = 4
δyellow,orange (green) = 0/1 = 0
δlightblue,orange (green) = (1/1) ∗ 4 = 4
δorange,lightblue (green) = (1/1) ∗ 4 = 4
δlightblue,lightblue (green) = (1/1) ∗ 4 = 4
In the directed case, instead:
δyellow,yellow (green) = 0/1 = 0
δyellow,lightblue (green) = ((1/1) ∗ 4) ∗ 2 = 8
So Cbet(green, undirected) = 20 and Cbet(green, directed) = 8. In our example 0/1 means that the shortest path between s and t does not pass through the green node and 1/1 means that the shortest path passes through the green node.
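In the example above every pair of nodes is joined by a single shortest path, so each ratio is either 0/1 or 1/1. To show how the ratio σst(v)/σst behaves when several shortest paths exist, the following sketch uses a hypothetical four-node graph that is not taken from fig.4.1; the function names are also assumptions. Pairs (s, t) are ordered, as in the definition.

```python
from collections import deque

def bfs_counts(adj, source):
    """Return (dist, sigma): BFS distances and numbers of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness(adj, v):
    """Sum over ordered pairs (s, t), s, t != v, of sigma_st(v) / sigma_st."""
    info = {s: bfs_counts(adj, s) for s in adj}
    dist_v, sigma_v = info[v]
    total = 0.0
    for s in adj:
        dist_s, sigma_s = info[s]
        if s == v or v not in dist_s:
            continue
        for t in adj:
            if t in (s, v) or t not in dist_s or t not in dist_v:
                continue  # unreachable pairs contribute 0
            if dist_s[v] + dist_v[t] == dist_s[t]:
                total += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return total

# Hypothetical undirected "diamond": a-b, a-c, b-d, c-d.
diamond = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}

bet_b = betweenness(diamond, "b")
print(bet_b)
```

Between `a` and `d` there are two shortest paths and only one of them passes through `b`, so each of the ordered pairs (a, d) and (d, a) contributes 1/2. A stress count for `b` on the same graph would be 2, which illustrates how betweenness discounts nodes that can be bypassed.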
Biological interpretation[29]
The S.-P. Betweenness of a node in a biological network, for instance a
protein-signalling network, can indicate the relevance of a protein as functionally capable of holding together communicating proteins. The higher
the value the higher the relevance of the protein as an organizing regulatory molecule. The S.-P. Betweenness of a protein effectively indicates the
capability of a protein to bring into communication distant proteins. In
signalling modules, proteins with high S.-P. Betweenness are likely to be
crucial to maintaining the function and coherence of signalling mechanisms.