Web Intelligence
Complex Networks II
Some images in this lecture are from the Strogatz paper.
This Week
Metrics (measures) of complex networks, and how these metrics seem to relate to properties of the networks:
• Degree distributions
• Cluster Coefficient and Cluster Function
• Small-world and scale-free networks
• Modularity and Hierarchy
• And, Assignment two
Remember: simple measures of networks
• Number of nodes
• Each node has a degree
• There is some number of edges
• Between two nodes, we have the length of the shortest path
• Diameter: the longest of the shortest paths between any two nodes.
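To make these measures concrete, here is a minimal Python sketch (not from the original notes) that counts nodes and edges and computes the diameter with breadth-first search. The four-node edge list is a hypothetical example, and pairs of nodes with no path between them are simply ignored.

```python
from collections import deque

def shortest_path_lengths(adj, source):
    """Breadth-first search: number of hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(adj):
    """Longest of the shortest paths; pairs with no path between them are ignored."""
    return max(max(shortest_path_lengths(adj, s).values()) for s in adj)

# Hypothetical 4-node example: A-B, A-C, A-D, B-C.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print("nodes:", len(adj), "edges:", len(edges), "diameter:", diameter(adj))
# nodes: 4 edges: 4 diameter: 2
```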
Degree Histograms
(mainly we will be talking about undirected networks)
Recall the degree of a node:
A has degree 3, B has degree 2, C has degree 2, D has degree 1.
[Figure: a small graph on nodes A, B, C and D]
The degree histogram of this tiny graph is:
[Figure: bar chart of the number of nodes with each degree, 0 to 5]
I.e. it has 0 nodes of degree 0, 1 of degree 1, 2 of degree 2, 1 of degree 3, and 0 of degree >3.
You might also think of the degree histogram as a table, e.g.:
Degree:     0  1  2  3  4  5
Frequency:  0  1  2  1  0  0
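A minimal sketch of building the histogram from an edge list, assuming the tiny graph's edges are A-B, A-C, A-D and B-C (the only edge set that gives the stated degrees). The function name `degree_histogram` is our own, not from the notes.

```python
def degree_histogram(nodes, edges):
    """h[k] = number of nodes with degree k, for k = 0 .. max degree."""
    degree = {n: 0 for n in nodes}          # start at 0 so isolated nodes are counted
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    hist = [0] * (max(degree.values()) + 1)
    for d in degree.values():
        hist[d] += 1
    return hist

nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]   # assumed edge list
print(degree_histogram(nodes, edges))   # [0, 1, 2, 1]
```

Passing the node list separately matters: a node with no edges never appears in the edge list, but it still belongs in the degree-0 column.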
Degree distributions
Degree:        0  1     2    3     4  5
Frequency:     0  1     2    1     0  0
Distribution:  0  0.25  0.5  0.25  0  0
The degree distribution is a function P(k), which gives the probability that a randomly chosen node from the graph has degree k.
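Turning the frequency row into the distribution row is a single division by the number of nodes. The sketch below (our own, using exact fractions) also checks that the probabilities sum to 1 and that the mean of P(k) equals the average degree 2m/n.

```python
from fractions import Fraction

def degree_distribution(hist):
    """P(k) = h[k] / N, where N = sum(h) is the number of nodes."""
    n_nodes = sum(hist)
    return [Fraction(count, n_nodes) for count in hist]

hist = [0, 1, 2, 1]                          # histogram of the four-node example
P = degree_distribution(hist)
print([float(p) for p in P])                 # [0.0, 0.25, 0.5, 0.25]
print(sum(P))                                # 1: probabilities sum to one
print(sum(k * p for k, p in enumerate(P)))   # 2: mean degree, equal to 2m/n = 2*4/4
```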
What is the degree distribution of the complete graph on 1000 nodes?
Imagine I have a graph with 1000 nodes, but no links. Now I start
adding links randomly, one by one. After 10 random additions, what
do you expect the degree distribution to be?
What will the average node degree be after 1000 additions?
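One way to explore the two questions above is to simulate them. This is a hypothetical sketch: it assumes each added link joins two distinct, uniformly chosen nodes and that a link already present is not added again. Each link adds 1 to the degree of two nodes, so after m additions the average degree is exactly 2m/n.

```python
import random
from collections import Counter

def degrees_after_random_additions(n, m, seed=None):
    """Start with n isolated nodes and add m links one by one, each between two
    distinct nodes chosen uniformly at random. Assumption: a link that already
    exists is not added twice (the draw is simply repeated)."""
    rng = random.Random(seed)
    links = set()
    while len(links) < m:
        u, v = rng.sample(range(n), 2)      # two different nodes
        links.add((min(u, v), max(u, v)))
    degree = [0] * n
    for u, v in links:
        degree[u] += 1
        degree[v] += 1
    return degree

for m in (10, 1000):
    deg = degrees_after_random_additions(1000, m, seed=42)
    print(f"m={m}: average degree {sum(deg) / len(deg)}, "
          f"degree counts {sorted(Counter(deg).items())}")
```

After 10 additions at most 20 of the 1000 nodes can have non-zero degree, so P(0) is at least 0.98; after 1000 additions the average degree is 2.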
Example degree distributions
[Figure: example degree distributions, P(k) against k]
This is the standard situation in a network where links are added completely at random. If there are n nodes and m randomly added edges, then the peak of this distribution is at (or very close to) 2m/n, the average degree.
So, for a randomly picked node, the most likely degree is (about) the average one. The probabilities then drop quickly on either side.
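The claim about the peak can be checked numerically. The sketch below uses the G(n, p) model (each possible edge present independently with probability p) as a stand-in for adding m random edges, choosing p so that the mean degree is 2m/n; a node's degree is then binomially distributed. The values n = 1000 and m = 5000 are arbitrary.

```python
from math import comb

def random_graph_degree_pmf(n, m, k):
    """P(k) for one node of a G(n, p) random graph, with p chosen so that the mean
    degree is 2m/n: the degree is Binomial(n - 1, p), p = 2m / (n (n - 1))."""
    p = 2 * m / (n * (n - 1))
    return comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)

n, m = 1000, 5000
pmf = [random_graph_degree_pmf(n, m, k) for k in range(40)]
most_likely = max(range(40), key=lambda k: pmf[k])
print("average degree:", 2 * m / n, "most likely degree:", most_likely)
# average degree: 10.0 most likely degree: 10
```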
The directorships figure is from Strogatz. Notice the stretched-out tail.
Unlike in random graphs, there are quite a few very highly connected nodes.
Consider what this means. A few people have influence over many companies. These might just be very busy people, or controllers. What kind of person might have 20 co-directors, rather than 40?
The Tails
As you know, it is the tails of the degree distribution that seem interesting. Some notes:
In real-world networks, these tails are much fatter and longer than in random networks of the same size.
It seems that, in this tail region, P(k) follows a power law. That simply means that the way the probability decreases with k is a reasonably close fit to $k^{-\lambda}$ for some $\lambda$, i.e.: $P(k) \propto k^{-\lambda}$
But if so, note that: $P(k) \propto k^{-\lambda} \;\Rightarrow\; \log P(k) = -\lambda \log k + \text{const}$
So, if there really is a good fit between the tail and a power law,
then when we plot log(P(k)) against log(k) we should get a straight
line sloping downwards towards the right.
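A minimal sketch of the straight-line test: fit a least-squares line to log P(k) against log k and read lambda off as minus the slope. The data here are synthetic (an exact power law with lambda = 2.5, made up for illustration); with real, noisy tails, maximum-likelihood estimators are generally more reliable than a line fit on a log-log plot.

```python
import math

def fit_power_law_exponent(ks, pks):
    """Least-squares slope of log P(k) against log k; returns lambda = -slope."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(p) for p in pks]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx

# Synthetic tail following P(k) = 0.5 * k**-2.5 exactly (made-up data).
ks = list(range(10, 31))
pks = [0.5 * k ** -2.5 for k in ks]
print(round(fit_power_law_exponent(ks, pks), 6))   # 2.5
```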
[Figure: Power law for exponents -1 to -3, P(k) plotted against k]
A `random’ 10-node 10-edge Random Graph
[Figure: random graph on nodes A to J]
Degree distribution (degrees 1 to 4): 0.4 0.3 0.2 0.1
Longest shortest path: DF, length 6
Let’s build a non-random 10-node 10-edge Small-World Graph
[Figure: nodes A to J]
We will build this by a Preferential Attachment process: the chance of a new edge being incident at a node increases with the degree of that node.
This is stochastic, but not uniformly random: it is biased.
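The notes do not spell out the exact attachment rule, so the following is only one plausible reading of the process: a hypothetical sketch in which each endpoint of a new edge is drawn with probability proportional to (degree + 1). The "+ 1" is our assumption, needed so that nodes of degree 0 can receive edges at all. Unlike the graph in the figure, the result is not guaranteed to be connected.

```python
import random

def preferential_attachment(n, m, seed=None):
    """Add m edges, one at a time, to n initially isolated nodes.
    Each endpoint is drawn with probability proportional to (degree + 1), so the
    chance that a new edge lands on a node grows with that node's degree.
    The '+ 1' (letting degree-0 nodes be chosen at all) is an assumption."""
    rng = random.Random(seed)
    degree = [0] * n
    edges = set()
    while len(edges) < m:
        weights = [d + 1 for d in degree]
        u = rng.choices(range(n), weights=weights)[0]
        weights[u] = 0                            # no self-loops
        v = rng.choices(range(n), weights=weights)[0]
        edge = (min(u, v), max(u, v))
        if edge in edges:                         # no repeated edges: draw again
            continue
        edges.add(edge)
        degree[u] += 1
        degree[v] += 1
    return edges, degree

edges, degree = preferential_attachment(10, 10, seed=3)
print("degrees:", degree)
```

Running it with different seeds and comparing the largest degree against uniformly random graphs of the same size is a quick way to see the bias towards hubs.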
A 10-node 10-edge Small-World Graph
[Figure: the graph on nodes A to J built by preferential attachment]
Degree distribution (degrees 1 to 6): 0.6 0.2 0.0 0.1 0.0 0.1
Longest shortest path: AE, length 4
Notice that:
This was only a tiny example, but it is illustrative …
The degree distribution of the graph generated by preferential attachment had a longer tail than that of the random graph.
Its diameter was smaller (4 rather than 6). PA leads to graphs that have more high-degree nodes; such a node is one hop away from many others, so a short path is usually available via such a node.
Such graphs with small diameter (longest shortest path, or average shortest path) are called small-world networks.
Such networks also seem highly clustered.
Scale-free networks
We have learned that:
$P(k) \propto k^{-\lambda}$
for many real networks, i.e. real networks seem to have a long `tail’ in their degree distribution, with significant numbers of nodes having high degree.
In a random network, most nodes will have their degree close to the average, so there is a characteristic or typical degree. But this is not the case if the power law (above) holds: there is no `typical’ degree, and degree values vary over a very wide range. So such a network is called scale-free.
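To see the `no typical degree' point at a larger scale, the sketch below compares a growth model with preferential attachment (in the style of the Barabási-Albert model, which differs from the fixed-node process in the small example) with a uniformly random graph having the same numbers of nodes and edges. The starting clique and the choice of 2 links per new node are assumptions.

```python
import random

def grown_preferential_degrees(n, m, seed=None):
    """Barabasi-Albert-style growth: start from a complete graph on m + 1 nodes
    (an assumption), then add nodes one at a time, each linking to m distinct
    existing nodes chosen with probability proportional to their degree."""
    rng = random.Random(seed)
    degree = [0] * n
    pool = []   # node i appears degree[i] times, so a uniform pick is degree-biased
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            degree[u] += 1
            degree[v] += 1
            pool += [u, v]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(pool))
        for t in targets:
            degree[new] += 1
            degree[t] += 1
            pool += [new, t]
    return degree

def uniform_random_degrees(n, num_edges, seed=None):
    """Same numbers of nodes and edges, but every edge placed uniformly at random."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < num_edges:
        u, v = rng.sample(range(n), 2)
        edges.add((min(u, v), max(u, v)))
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree

n = 10_000
grown = grown_preferential_degrees(n, 2, seed=0)
uniform = uniform_random_degrees(n, sum(grown) // 2, seed=0)
for name, deg in (("preferential", grown), ("uniform", uniform)):
    print(f"{name}: mean degree {sum(deg) / n:.2f}, max degree {max(deg)}")
```

Both graphs have a mean degree of about 4, but the largest degree in the preferential-attachment graph is typically many times larger than in the uniform one.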
Clustered (or modular) graphs
This graph is clearly clustered – there are groups (clusters) of nodes
that are highly interconnected amongst themselves, but have few
connections to other clusters.
Would such a graph tend to have a high or low diameter?
`Hierarchical’ graphs
This graph on the left is called modular. The graph on the right is
also clearly modular. E.g. there are three distinct modules (the
things that are copies of the graph on the left). However, each of
these modules seems to have a modular structure of its own. This is
called hierarchical modularity.
More metrics
So far we can characterise graphs by:
•Number of nodes
•Density (number of edges divided by number of possible edges)
•Average path length, longest shortest path length (diameter)
•Degree distribution.
But we need more (graphs which are the same in all these respects
could still be different in terms of the modular and hierarchical
aspects of their structure).
To capture these aspects, there are:
•Cluster coefficient
•Cluster function
Defined next …
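Density and average path length are easy to compute directly. A minimal sketch, reusing a hypothetical four-node graph (edges A-B, A-C, A-D, B-C); `density` assumes a simple undirected graph, and the average is taken only over pairs that are connected.

```python
from collections import deque

def density(n_nodes, n_edges):
    """Number of edges divided by the n(n - 1)/2 possible edges (simple undirected graph)."""
    return n_edges / (n_nodes * (n_nodes - 1) / 2)

def hop_counts(adj, source):
    """BFS hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def average_path_length(adj):
    """Mean shortest-path length over ordered pairs of distinct, connected nodes
    (each unordered pair is counted twice, which does not change the mean)."""
    total = pairs = 0
    for s in adj:
        for t, d in hop_counts(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs

adj = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
print(density(len(adj), n_edges), average_path_length(adj))  # 4 of 6 edges; mean 8/6
```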
The Cluster Coefficient
[Figure: graph on nodes A to J]
Consider node B.
It has 5 neighbours (can you define `neighbour’?): D, G, J, C, I.
Every distinct pair of neighbours (there are 5 x 4 / 2 = 10 distinct pairs) forms a potential triangle with B. The triangle BJCB exists, because edge CJ exists, but none of the other 9 exist. The cluster coefficient of node B is therefore 1/10. What is the CC of node C?
The Cluster Coefficient: a proper definition
Suppose node i has $n_i$ neighbours. Therefore there are $n_i(n_i - 1)/2$ possible triangles (edges that link node i’s neighbours).
Suppose $t_i$ of these edges are in the graph. The clustering coefficient of node i is defined as:
$c_i = \frac{2 t_i}{n_i (n_i - 1)}$
The mean of this over a graph is called the CC of the network, C, i.e.:
$C = \frac{1}{N} \sum_{i=1}^{N} c_i$
where $c_i$ is the cluster coefficient of node i, and N is the number of nodes in the graph.
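The definition translates almost line by line into code. In this sketch the graph contains only the edges mentioned for node B (B to D, G, J, C and I, plus C-J), not the full graph in the figure, and nodes with fewer than two neighbours are given $c_i = 0$, a common convention that the notes do not state.

```python
from itertools import combinations

def local_clustering(adj, i):
    """c_i = 2 t_i / (n_i (n_i - 1)), where t_i counts edges between neighbours of i."""
    nbrs = adj[i]
    n_i = len(nbrs)
    if n_i < 2:
        return 0.0          # assumed convention: no possible triangles
    t_i = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * t_i / (n_i * (n_i - 1))

def network_clustering(adj):
    """C = (1/N) * sum of c_i over all N nodes."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)

# Only the edges mentioned for node B in the example (hypothetical partial graph).
edges = [("B", "D"), ("B", "G"), ("B", "J"), ("B", "C"), ("B", "I"), ("C", "J")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(local_clustering(adj, "B"))   # 0.1
print(network_clustering(adj))
```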
Some related things …
The Cluster Function (with respect to node degree) C(k) is defined as follows:
$C(k) = \frac{1}{|N_k|} \sum_{i \in N_k} c_i$
where $N_k$ is the set of nodes with degree k.
In words: C(k) is the mean cluster coefficient over all nodes with degree k.
(Note that the cluster function leads to a distribution)
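Grouping nodes by degree gives the cluster function. A minimal sketch on the same hypothetical partial graph (only B's edges and C-J), again assuming $c_i = 0$ for nodes of degree below 2.

```python
from collections import defaultdict
from itertools import combinations

def cluster_function(adj):
    """C(k): mean local clustering coefficient over N_k, the nodes of degree k."""
    by_degree = defaultdict(list)
    for i, nbrs in adj.items():
        n_i = len(nbrs)
        if n_i < 2:
            c_i = 0.0                        # assumed convention for degree 0 or 1
        else:
            t_i = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
            c_i = 2 * t_i / (n_i * (n_i - 1))
        by_degree[n_i].append(c_i)
    return {k: sum(cs) / len(cs) for k, cs in sorted(by_degree.items())}

# Hypothetical partial graph: B's five edges plus C-J.
edges = [("B", "D"), ("B", "G"), ("B", "J"), ("B", "C"), ("B", "I"), ("C", "J")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(cluster_function(adj))   # {1: 0.0, 2: 1.0, 5: 0.1}
```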
A high C (compared with a random graph of the same size) indicates modularity.
From Albert & Barabasi, 02
From Albert & Barabasi, 02
C for real networks is much higher than C for random networks
with the same number of nodes and edges
From Albert & Barabasi, 02
Diameters of large networks tend to be much smaller than you would imagine, whether real or random. But compare with `mesh’ networks.
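The contrast with mesh networks can be illustrated by comparing a square grid (a simple mesh) with a uniformly random graph that has the same number of nodes and edges. This is a hypothetical sketch: the 20 by 20 grid size and the seed are arbitrary, and for the random graph the longest shortest path is taken over connected pairs only.

```python
import random
from collections import deque

def eccentricity(adj, source):
    """Largest BFS hop count from source to any node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def diameter(adj):
    """Longest shortest path over all pairs of nodes that are connected."""
    return max(eccentricity(adj, s) for s in adj)

def grid(side):
    """A square mesh of side x side nodes, each linked to its right and lower neighbours."""
    adj = {(r, c): set() for r in range(side) for c in range(side)}
    for r in range(side):
        for c in range(side):
            for nr, nc in ((r, c + 1), (r + 1, c)):
                if nr < side and nc < side:
                    adj[(r, c)].add((nr, nc))
                    adj[(nr, nc)].add((r, c))
    return adj

def random_graph(n, m, seed=None):
    """n nodes and m distinct edges, each placed between uniformly chosen nodes."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    added = 0
    while added < m:
        u, v = rng.sample(range(n), 2)
        if v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            added += 1
    return adj

side = 20
mesh = grid(side)
m = sum(len(nbrs) for nbrs in mesh.values()) // 2   # 2 * side * (side - 1) = 760
rand = random_graph(side * side, m, seed=0)
print("edges:", m, "mesh diameter:", diameter(mesh), "random diameter:", diameter(rand))
```

The grid's diameter grows like 2(side - 1), here 38, while the random graph's stays far smaller.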
Notice:
Cluster coefficients for these graphs are much higher than for equivalent random graphs. Does this indicate modularity?
Most of them display the small-world property. In some cases the average path length may be longer than in a random network, but dense random networks have the small-world property anyway.
The power grid graph has much longer paths than the equivalent random graph. Why?
[Figure: log P(k) against log k for lambda = 1, 2 and 3]
The exponents of the power law tail seem to vary between 1 and 3.
The lower lambda, the larger the number of highly connected nodes and the larger the range of degrees.
Some interesting facts
If the Cluster Function follows a power law (i.e. C(k) falls as $k^{-\lambda}$ for some lambda), then this is evidence for a hierarchical modular structure.
Highly connected nodes are called hubs. The power law exponent reveals something about the importance of hubs in a given network.
If lambda > 3, the tail is short and hubs are few and not very heavily connected.
Lambda between 2 and 3 suggests a hierarchy of hubs, with the most heavily connected hub being connected to a relatively small fraction of the other nodes, but many of these will be hubs themselves.
Lambda <= 2 suggests hubs that connect to large fractions of the nodes, acting like control centres.
Scale-free networks are in general robust to random damage; however, the presence of hubs (especially when lambda is …?) suggests vulnerability to targeted attacks on those hubs.
Assignment 2
Read the brief paper and provide a 4-slide presentation that
conveys the main points to busy scientists.
Paper = Diameter of the WWW
Marking: Completeness (1), Presentation (2), Brevity (1), Wow (1)