Random Graphs
Aristotle University, School of Mathematics
Master in Web Science
A random graph is obtained by starting with a set V = {1, 2, …, n} of n vertices and adding edges between them at random. (There are at most N = C(n,2) = n(n−1)/2 edges.)

Different random graph models produce different probability distributions on graphs. Most commonly studied are:

- Model G(n,M), known as the Erdős–Rényi model
- Model G(n,p), proposed by Edgar Gilbert
Networks and Discrete Mathematics
Introduction to Random Graphs and Real-World Networks
Chronis Moyssiadis – Vassilis Karagiannis
WS.04 Webscience
WS.04 lecture on Random Graphs - C. Moyssiadis – V. Karagiannis
The Model

Model G(n,p) (proposed by Edgar Gilbert), in which every possible edge occurs independently with probability p, where 0 ≤ p ≤ 1. An element G0 of the sample space, having n vertices and m edges, has probability of occurrence

p^m · (1−p)^(N−m), where 0 ≤ m ≤ N.

Model G(n,M) (known as the Erdős–Rényi model) assigns equal probability to all graphs with exactly M edges (0 ≤ M ≤ N). Because there are C(N,M) selections of the M edges, the sample space contains C(N,M) elements, each with probability 1/C(N,M).

If p = 1/2, then G(n,1/2) is the sample space containing all 2^N graphs on n vertices, each with the same probability 2^(−N).

Almost always M is a function of n, i.e. M = M(n). We then denote the model as G(n, M(n)).
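Both models can be sampled directly. A minimal sketch in Python (the function names are illustrative, not from any particular library):

```python
import itertools
import random

def gnp(n, p, rng=random):
    """Sample from G(n, p): each of the N = C(n,2) possible edges
    is included independently with probability p."""
    return {(i, j) for i, j in itertools.combinations(range(n), 2)
            if rng.random() < p}

def gnM(n, M, rng=random):
    """Sample from G(n, M): choose uniformly among all C(N, M)
    graphs with exactly M edges."""
    possible = list(itertools.combinations(range(n), 2))
    return set(rng.sample(possible, M))

rng = random.Random(0)
g = gnM(10, 15, rng)
print(len(g))   # exactly 15 edges by construction
```

For G(n,M) every M-subset of the N possible edges is equally likely, matching the uniform distribution described above.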
The Sample Space of G(3)

Example (a graph process). For n = 3 there are

N = C(3,2) = 3!/(2! · (3−2)!) = 3

possible edges, hence 2^3 = 8 graphs on the vertex set {1, 2, 3}.

A graph process is a nested sequence G0 ⊂ G1 ⊂ ⋯ ⊂ GN, with Gt having precisely t edges. There are N! = 3! = 6 such sequences in G(3) overall.

[Figure: the six graph processes on the vertices 1, 2, 3.]

Bollobás (1985): the three spaces (G(n,p), G(n,M), and the space of graph processes) are closely related.
Sample Spaces: the spaces G(3, m)

Example (the spaces G(3, m), m = 0, 1, 2, 3). The sample space for each m, i.e. the subgraphs of K3 with m edges and their probabilities in G(3, p), is presented below:

- G(3,3): 1 graph (the triangle), probability p^3
- G(3,2): 3 graphs, each with probability p^2 · (1−p)
- G(3,1): 3 graphs, each with probability p · (1−p)^2
- G(3,0): 1 graph (the empty graph), probability (1−p)^3

Example – Random Variables in G(3,p)

Example: Let the properties (events) Q1: G in G(3,p) is connected, and Q2: G in G(3,p) is bipartite. Are these two properties (events) independent?
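The independence question can be settled by enumerating all 2^3 = 8 graphs of the sample space with the probabilities p^m · (1−p)^(3−m) listed above. A small sketch (the helper names are illustrative):

```python
import itertools

def is_connected(edges, nodes=(0, 1, 2)):
    """DFS connectivity check on the 3 vertices."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w); stack.append(w)
    return len(seen) == len(nodes)

def is_bipartite(edges):
    # on 3 vertices only the triangle (an odd cycle) is non-bipartite
    return len(edges) < 3

def event_probs(p):
    """P(Q1), P(Q2), P(Q1 and Q2) in G(3, p) by full enumeration."""
    possible = list(itertools.combinations(range(3), 2))   # the N = 3 edges
    pq1 = pq2 = pq12 = 0.0
    for r in range(4):
        for edges in itertools.combinations(possible, r):
            prob = p**r * (1 - p)**(3 - r)
            q1, q2 = is_connected(edges), is_bipartite(edges)
            pq1 += prob * q1
            pq2 += prob * q2
            pq12 += prob * (q1 and q2)
    return pq1, pq2, pq12

pq1, pq2, pq12 = event_probs(0.5)
print(abs(pq12 - pq1 * pq2) < 1e-12)   # False: Q1 and Q2 are not independent
```

At p = 0.5 one gets P(Q1) = 0.5, P(Q2) = 0.875, but P(Q1 ∩ Q2) = 0.375 ≠ 0.4375, so the two events are not independent there.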
The Degree Distribution in G(n,p)

What is the probability that a given node has degree d (or, in other words, what is the expected fraction of nodes of G that have degree d)? Each node may have at most n−1 neighbors; hence, by the independence between links, we get the binomial expression:

P(d) = C(n−1, d) · p^d · (1−p)^(n−1−d)

The average degree and the variance of the degrees (under the binomial model) are d̄ = (n−1)·p and Var(d) = (n−1)·p·(1−p), respectively.

As n → ∞ we get the Poisson distribution as an approximation to the binomial:

P(d) = e^(−λ) · λ^d / d!, where λ = (n−1)·p and d! = 1·2·3·…·d.

Invariants or basic parameters of a graph, like k-connectivity, λ-connectivity, the clique number, the independence number, etc., in the context of random graphs are considered random variables (r.v.).

Studying the model G(n,p)

What about the other properties (topological characteristics) of the network?

- Is it connected?
- Does it contain cycles, or is it a forest/tree?
- What about the distribution of the components?
- What about distances in there?
- Does any correlation exist between the degrees of adjacent nodes?
- Are there nodes that may play central roles?
- What about the existence of cliques or communities in there?
- What about the vulnerability of such a network?

What parameters of the network define the answers?

Example: Let G be a random graph in G(50, 0.02). (i) What fraction of nodes would be expected to have degree 1? (ii) What fraction of nodes would be expected to have at least degree 1? (iii) What is the probability that there exists a node of degree 15 in G?

Degree distribution (example output):
P(0) = 0.35345468, P(1) = 0.36759287, P(2) = 0.19114829, P(3) = 0.06626474, P(4) = 0.01722883
Min 0.00 | 1st Qu. 0.00 | Median 1.00 | Mean 1.04 | 3rd Qu. 2.00 | Max 4.00 | Variance 1.02
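The binomial expression and its Poisson approximation, applied to the G(50, 0.02) example, can be sketched as follows. For part (iii) only a union-bound style estimate is shown, since node degrees are not independent:

```python
import math

def binom_deg(n, p, d):
    """P(a given node has degree d) in G(n, p): Binomial(n-1, p)."""
    return math.comb(n - 1, d) * p**d * (1 - p)**(n - 1 - d)

def poisson_deg(lam, d):
    """Poisson approximation with lambda = (n-1)*p."""
    return math.exp(-lam) * lam**d / math.factorial(d)

n, p = 50, 0.02
lam = (n - 1) * p                      # average degree = 0.98
p1 = binom_deg(n, p, 1)                # (i)  expected fraction with degree 1
p_ge1 = 1 - binom_deg(n, p, 0)         # (ii) expected fraction with degree >= 1
p15 = binom_deg(n, p, 15)              # P(a given node has degree 15)
bound = n * p15                        # (iii) union bound: P(some node) <= n*p15
print(round(p1, 4), round(p_ge1, 4))   # 0.3716 0.6284
```

The union bound in (iii) is tiny here, confirming that a degree-15 node is extremely unlikely when the average degree is only 0.98.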
Properties of the model: Threshold Functions

Any property Q of random graphs in G(n,p) defines a class of graphs closed under isomorphism (a set of graphs with n nodes, or an event in the sample space of random graphs). Thus a property Q is satisfied by the random graph G if G is in the set of graphs that Q defines. Usually an indicator variable X is associated with each property, and proofs are based on the average E(X).

We study the model (usually G(n,p) or G(n,M)) by growing the set of nodes n and forming the probability of a link as a function of n, p(n). The model is completely specified by p(n). Most properties that are studied are referred to as monotone increasing properties. These are properties that are invariant under the addition of edges in the network:

- The property of being connected is a monotone increasing property.
- The property of containing a cycle is an increasing one.

Many of these properties arise suddenly when p(n) exceeds some threshold function pQ(n); when such a threshold function exists, it is said that a phase transition occurs at that threshold.

We found that if p(n)/(1/n) → 0 as n → ∞, then P(G has at least 1 triangle) → 0, while if p(n)/(1/n) → ∞ as n → ∞, then P(G has at least 1 triangle) → 1. Hence when p(n) is near 1/n there is a change in the random graph: if p(n) ≪ 1/n then G is almost surely triangle-free, while if p(n) ≫ 1/n then G almost surely contains a triangle. Let t(n) = 1/n; a function t(n) with the previous properties is called a threshold function.

Generally, if Q is any monotone property of graphs (such that Q is preserved when edges are added), the function t(n) is a threshold function for the property Q if: when p(n)/t(n) → 0, G almost surely does not have Q, while when p(n)/t(n) → ∞, G almost surely has Q.

There is not a unique threshold function. In the example with the triangle, any c/n with c > 0 is a threshold function.
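The sudden appearance of triangles around p = 1/n can be illustrated by simulation; a rough sketch, with sample sizes chosen small enough to run quickly:

```python
import itertools
import random

def has_triangle(n, p, rng):
    """Sample G(n, p) and report whether it contains a triangle."""
    adj = [set() for _ in range(n)]
    for i, j in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[i].add(j); adj[j].add(i)
    # a triangle exists iff some edge (i, j) has a common neighbor
    return any(adj[i] & adj[j] for i in range(n) for j in adj[i])

def triangle_freq(n, c, trials, seed=0):
    """Fraction of sampled graphs G(n, c/n) that contain a triangle."""
    rng = random.Random(seed)
    return sum(has_triangle(n, c / n, rng) for _ in range(trials)) / trials

n = 200
print(triangle_freq(n, 0.1, 50) < 0.2 < 0.8 < triangle_freq(n, 10.0, 50))  # True
```

With p = c/n the expected number of triangles is about c^3/6, so c = 0.1 gives almost no triangles while c = 10 gives them almost surely, matching the threshold behavior above.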
Example

Find a threshold function for the property Q: in a Poisson random graph G with n nodes, node 1 has at least one link.

The average degree of each node is E(d) = (n−1)·p(n), and the probability that a node has at least one link is

P(d ≥ 1) = 1 − P(0) = 1 − e^(−λ) · λ^0/0! = 1 − e^(−λ), with λ = (n−1)·p(n).

If p(n)/(1/(n−1)) → 0, then λ = (n−1)·p(n) → 0 and P(d ≥ 1) → 0, while if p(n)/(1/(n−1)) → ∞, then λ = (n−1)·p(n) → ∞ and P(d ≥ 1) → 1.

Hence pQ(n) = 1/(n−1) is a threshold function for the property Q. Moreover, any function pQ(n) = c/(n−1) with c > 0 may be a threshold function as well.

Threshold functions for subgraphs

Let Q be the property: F is a subgraph of order k and size l in some Poisson random graph G of the space G(n,p). In G it is expected that either no copy of F exists or a number of copies (isomorphic to F) exist. The average number of copies of F is

λ = E(F) = C(n,k) · (k!/a) · p^l ≈ (n^k/k!) · (k!/a) · p^l = (n^k/a) · p^l,

where a accounts for the isomorphic copies of F (a is the number of automorphisms of F, so k!/a counts the distinct copies of F on a fixed set of k vertices).

As each link is independent of each other, the probability that at least one copy of F exists is

P(number of F ≥ 1) = 1 − P(number of F = 0) = 1 − e^(−λ) · λ^0/0! = 1 − e^(−λ).

Then if p(n) · n^(k/l) → 0 we get λ → 0 and P(number of F ≥ 1) → 0, while if p(n) · n^(k/l) → ∞ we get λ → ∞ and P(number of F ≥ 1) → 1. Hence a threshold function is pQ(n) = c · n^(−k/l).
Evolution of specific subgraphs

The threshold probabilities at which different subgraphs appear in a random graph (Barabási 2002): below p ≅ n^(−2) the graph consists of isolated nodes, while around p ≅ n^(−2) some edges appear. For p ≅ n^(−3/2) trees of order 3 appear, while for p ≅ n^(−4/3) trees of order 4 appear; at p ≅ n^(−1) trees of all orders are present, and at the same time cycles of all orders appear. The probability p ≅ n^(−2/3) marks the appearance of complete subgraphs of order 4, and p ≅ n^(−1/2) corresponds to complete subgraphs of order 5. As the exponent approaches 0, the graph contains complete subgraphs of increasing order.

Evolution in Poisson RG – Component Distribution

Below the threshold of 1/n there are a lot of small components, and the largest component of them includes no more than a factor times log(n) of the nodes (Jackson 2008).

Above the threshold of 1/n, a giant component emerges, which is the largest component and contains a nontrivial fraction of all nodes, i.e., at least cn nodes for some constant c, or about n^(2/3) nodes at the threshold itself (while the other components are either small or isolated nodes).

The giant component grows in size until the threshold of log(n)/n, at which point the network becomes connected. (For n = 100: log(100)/100 ≈ 0.05.)
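The emergence of the giant component above the 1/n threshold can be checked by simulation, measuring the largest component with a depth-first search:

```python
import itertools
import random

def largest_component_fraction(n, p, rng):
    """Sample G(n, p) and return (size of largest component) / n."""
    adj = [[] for _ in range(n)]
    for i, j in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[i].append(j); adj[j].append(i)
    seen, best = [False] * n, 0
    for s in range(n):
        if seen[s]:
            continue
        stack, size = [s], 0
        seen[s] = True
        while stack:
            v = stack.pop(); size += 1
            for w in adj[v]:
                if not seen[w]:
                    seen[w] = True; stack.append(w)
        best = max(best, size)
    return best / n

rng = random.Random(0)
n = 1000
sub = largest_component_fraction(n, 0.5 / n, rng)   # below 1/n: no giant component
sup = largest_component_fraction(n, 2.0 / n, rng)   # above 1/n: giant component
print(sub < 0.1 < 0.5 < sup)   # True
```

For average degree 2 the giant component covers roughly 80% of the nodes, while for average degree 0.5 the largest component stays logarithmically small.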
Diameter

Because of the phase transition in connectivity at p = log(n)/n, the problem of determining the diameter of G seems to be difficult for certain ranges of p. Some results are:

- If p < 1/n, the diameter is that of a tree component, and a rough estimate is log(n)/log(p(n−1)).
- If p > 1/n, a giant component appears, and if p ≥ 3.5/n the diameter equals the diameter of that component, which is proportional to log(n)/log(p(n−1)).
- If p > log(n)/n, the graph is almost surely connected and the diameter is concentrated around log(n)/log(p(n−1)).

Average degree, size and Component Size

Relation between the spaces G(n,p) and G(n,M) via the average degree d̄ and the size m (number of links):

m = p · n(n−1)/2, hence p = 2m/(n(n−1)) and d̄ = 2m/n = p(n−1).

In a Poisson random graph we have d̄ = p(n−1) and Var(d) = d̄ (from the Poisson degree distribution), so from Var(d) = E(d²) − (d̄)² we get E(d²) = d̄ + (d̄)². Hence (Jackson, 2008) the average size of the component in which a randomly selected node lies is:

1 + d̄²/(2d̄ − E(d²)) = 1/(1 − d̄) = 1/(1 − p(n−1))   (for d̄ < 1).
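The formula 1/(1 − p(n−1)) for the average component size of a randomly selected node can be compared against simulation in the subcritical regime. Note that a uniformly chosen node lies in a component of size s with probability s/n:

```python
import itertools
import random

def mean_component_size_of_random_node(n, p, trials, seed=0):
    """Average size of the component containing a uniformly chosen node:
    each sampled graph contributes sum(s^2)/n over its component sizes s,
    since a random node lies in a size-s component with probability s/n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        adj = [[] for _ in range(n)]
        for i, j in itertools.combinations(range(n), 2):
            if rng.random() < p:
                adj[i].append(j); adj[j].append(i)
        seen = [False] * n
        acc = 0
        for s0 in range(n):
            if seen[s0]:
                continue
            stack, size = [s0], 0
            seen[s0] = True
            while stack:
                v = stack.pop(); size += 1
                for w in adj[v]:
                    if not seen[w]:
                        seen[w] = True; stack.append(w)
            acc += size * size
        total += acc / n
    return total / trials

n, p = 300, 0.5 / 300                  # average degree ~ 0.498 (subcritical)
theory = 1 / (1 - p * (n - 1))
sim = mean_component_size_of_random_node(n, p, 30)
print(round(theory, 2))                # 1.99
```

The simulated value fluctuates around the theoretical one; the formula is asymptotic, so small finite-size deviations are expected.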
Average path length

Fronczak et al. (2004) show that for large random graphs the average path length (or average distance) can be estimated as

ℓ ≈ (log n − γ)/log(pn) + 0.5,

where γ is the Euler constant (≈ 0.5772).

Clustering coefficient

Suppose that we have a Poisson random social network. Then P(i and j know each other) = P(link {i,j} exists) = p. Moreover,

P(i and j know each other given that both are friends of k)
  = P(i and j know each other and both are friends of k) / P(i and j are friends of k)
  = p³/p² = p.

The existence of any acquaintance between i and j is independent of any other acquaintance between i and k or between j and k. Accordingly, as n grows the clustering coefficient stays very low (equal to p).
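The claim that the clustering coefficient of a Poisson random graph stays near p can be checked numerically; a sketch:

```python
import itertools
import random

def avg_clustering(n, p, seed=0):
    """Average local clustering coefficient of one sample of G(n, p)."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for i, j in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[i].add(j); adj[j].add(i)
    coeffs = []
    for v in range(n):
        nb = list(adj[v])
        d = len(nb)
        if d < 2:
            continue          # local coefficient undefined; skip
        links = sum(1 for a, b in itertools.combinations(nb, 2) if b in adj[a])
        coeffs.append(2 * links / (d * (d - 1)))
    return sum(coeffs) / len(coeffs)

n, p = 300, 0.05
c = avg_clustering(n, p)
print(abs(c - p) < 0.02)   # True: clustering of G(n,p) stays near p
```

Each pair of neighbors of a node is itself linked with probability p, independently, which is exactly why the measured coefficient concentrates around p.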
Similarities and differences with real-world nets

A Poisson random network, in which links are formed independently, has many features in common with a model where we force the network to have the expected number of links (a homogeneous network – relatively unrealistic) (Jackson, 2008, book).

Even when, with low probability, the network gets connected, the distributions of its centrality indices reveal symmetric shapes that exclude real-world characteristics such as the existence of hubs or, on the opposite side, cohesive social communities. The "evolution" of such random-type nets, based on a growing but otherwise static population, can almost surely be estimated with high probability from its monotone increasing properties (events). All properties are related to n or to p(n); hence the network is scale dependent.

Although it reproduces the small-diameter characteristic of almost all real-world nets, it does not support the tendency toward transitivity (a friend of my friend is a friend).

Still, as a static model which functions on a given number of nodes of some given order, it can serve as a benchmark or a null hypothesis for relevant real-world networks with the same order and size.

[Figure: Bearman, Moody and Stovel, from the Add Health data set (revisited) – an example of a real-world network whose growth was very close to a Poisson random network.]
Centrality Measures

Which are the "most important" vertices? With respect to which criteria?

- Number of connections
- Importance of connections
- Smaller distances
- Nodal importance

Various centrality measures have been defined.

Degree Centrality

The vertex degree deg(v), i.e., the number of edges incident to the vertex, is a centrality index. Degree Centrality is defined as the quotient

deg(v) / (n − 1).

Replacing the degree with the in- (or out-) degree (in directed graphs), we obtain, correspondingly, in- (or out-) degree centrality. In social networks this means that the more friendships (acquaintances, relations) a person (vertex) has, the more important it is in the network.
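A minimal sketch of degree centrality deg(v)/(n−1); the 5-node example graph is hypothetical, chosen only for illustration:

```python
def degree_centrality(n, edges):
    """Degree centrality: deg(v) / (n - 1) for each vertex v."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1; deg[v] += 1
    return [d / (n - 1) for d in deg]

# Hypothetical example: a star centered at vertex 0, plus one extra edge.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]
print(degree_centrality(5, edges))   # [1.0, 0.5, 0.5, 0.25, 0.25]
```

The star center is connected to every other vertex, so its centrality is the maximum value 1.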
A network from a round-robin tournament

n players (vertices; in the figure n = 6) play in pairs until a final win (there are no ties). The direction of an edge indicates victory; e.g., an edge from vertex 3 to vertex 1 means that player 3 beat player 1. The degree-centrality vector (using the out-degrees, i.e. the wins) is:

(0.8, 0.6, 0.6, 0.4, 0.4, 0.2)′

The best player is 1, followed by players 2 and 3, who are equivalent.

The Medici network

A well-known social network with nodes the great families of 15th-century Florence and links between the families representing marriages between them. Degree centrality:

Medici (0.4), Guadagni (0.267), Strozzi (0.267), Albizzi (0.2), …

which is also apparent from the figure, and which gave great prestige to the Medici family.

Eigenvector Centrality

It is a natural generalization of degree centrality (it will be defined below). In the figure, the vertices 1 and 6 both have degree 3. However, the neighbors of vertex 1 have a total degree sum of 5, while the neighbors of vertex 6 have a total degree sum of 11. This can be interpreted as vertex 6 having greater importance than vertex 1 (it has degree-wise more important neighbors).

Eigenvector Centrality (cont.)

Player 1 has 4 wins, while player 3 has 3 wins. However, the players beaten by player 1 have 8 wins in total, while the players beaten by player 3 have 9 wins in total. This can be interpreted as player 3 possibly being stronger than player 1, if we take into account whom he beat. In social networks, the importance of our friends counts toward the final importance attributed to us.
The powers of the adjacency matrix

Let A be the adjacency matrix of G, with a_uv = 1 if {u, v} is an edge and 0 otherwise. If x0 = 1 (the all-ones vector), then:

- A·1 gives the vertex degrees;
- A²·1 gives the sums of the degrees of the immediate neighbors;
- continuing, A^n·1 gives the n-th order sums.

It can be shown that in the limit A^n·1 ~ λ1^n · x, where λ1 is the largest eigenvalue of the matrix A.

Eigenvector centrality

The normalized form of the eigenvector x of the largest eigenvalue λ1 of the matrix A, i.e. the one that results from dividing its components by its norm, is the eigenvector centrality of the vertices of the graph that A represents.
Existence of eigenvector centrality

- The existence of the vector x, and the fact that it is real and positive, is guaranteed by the Perron–Frobenius (P–F) theorem.
- Indeed, in connected simple graphs the adjacency matrix is symmetric and primitive (i.e., there exists a power of A, say A^k, with all its entries different from 0) — the conditions for the P–F theorem to hold.
- In directed graphs: the eigenvector of the largest eigenvalue is not uniquely defined, and the matrix A is non-symmetric. Thus eigenvector centrality does not work well there.

Eigenvector centrality was first defined by Bonacich in 1987.

Example

igraph has the function evcent for computing eigenvector centrality; the sna package of R likewise has a function evcent.

Application to the figure:

(0.147, 0.045, 0.214, 0.214, 0.544, 0.443, 0.443, 0.443)′

That is, vertex 5 is the most central, while vertices 6, 7, 8 share the same value, much larger than that of vertex 1.
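The limiting relation A^n·1 ~ λ1^n·x above is precisely the power-iteration method. A sketch (the 4-vertex example graph is hypothetical, not the figure from the slides):

```python
import math

def eigenvector_centrality(n, edges, iters=200):
    """Power iteration: repeatedly apply A to the all-ones vector and
    normalize; for a connected non-bipartite graph this converges to
    the principal eigenvector (Perron-Frobenius)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v); adj[v].append(u)
    x = [1.0] * n
    for _ in range(iters):
        y = [sum(x[w] for w in adj[v]) for v in range(n)]
        norm = math.sqrt(sum(t * t for t in y))
        x = [t / norm for t in y]
    return x

# Hypothetical example: triangle {0,1,2} plus a pendant vertex 3 on 2.
x = eigenvector_centrality(4, [(0, 1), (1, 2), (0, 2), (2, 3)])
print(max(range(4), key=lambda v: x[v]))   # 2: the best-connected vertex
```

Vertices 0 and 1 receive identical scores by symmetry, while vertex 2 scores highest both by degree and by the importance of its neighbors.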
Example

For the Medici network we find:

Medici 0.430, Strozzi 0.356, Ridolfi 0.342, Tornabuoni 0.326, Guadagni 0.289, Bischeri 0.283

That is, the Medici family, which was central in degree centrality, remains central in eigenvector centrality as well. However, the two families Guadagni and Strozzi, which both have degree 4 in degree centrality (i.e., they tie for second place), hold in eigenvector centrality the 2nd place (Strozzi) and the 5th place (Guadagni). In this centrality, the families Ridolfi and Tornabuoni (with vertex degree 3) surpass the Guadagni family.

Katz centrality

It is a generalization of eigenvector centrality. If we consider the relation

x = α·A·x + β·1

for positive quantities α, β, then it is as if we give a "quantity of centrality" to all nodes (so that none is 0). It was proposed in 1957 by Katz and is computed from the relation:

x = β·(I − α·A)^(−1)·1

For β we may take the value 1, while for α we take a value smaller than 1/λ1 (the value at which I − α·A becomes singular).
PageRank centrality

It is an improvement of Katz centrality. Larry Page and Sergey Brin, who created Google, developed a new algorithm for evaluating the importance of the pages connected with Google, which they named PageRank centrality, exploiting Page's name.

Katz centrality has the following drawback: if a node has high centrality, then all the nodes connected with it also get high centrality. For example, a node that is connected with Google or Yahoo, which have high centrality, will itself have high centrality. This was corrected by dividing with the out-degree, i.e.:

x_i = α · Σ_j a_ij · x_j / d_j^out + β

Relation of the four measures

- With division by the out-degree, with a constant term: PageRank centrality.
- With division by the out-degree, without a constant term: degree centrality.
- Without division, with a constant term: Katz centrality.
- Without division, without a constant term: eigenvector centrality.

Closeness Centrality
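The PageRank recursion with division by the out-degree can be sketched as follows. The value α = 0.85 is the conventional damping choice, not one stated in the text, and the 3-page web is hypothetical:

```python
def pagerank(n, edges, alpha=0.85, iters=100):
    """PageRank on a directed graph: the Katz-style recursion in which
    each node's contribution is divided by its out-degree."""
    out = [[] for _ in range(n)]
    for u, v in edges:                 # directed link u -> v
        out[u].append(v)
    x = [1.0 / n] * n
    for _ in range(iters):
        y = [(1 - alpha) / n] * n      # the constant term
        for u in range(n):
            if out[u]:
                share = alpha * x[u] / len(out[u])
                for v in out[u]:
                    y[v] += share
            else:                      # dangling node: spread uniformly
                for v in range(n):
                    y[v] += alpha * x[u] / n
        x = y
    return x

# Hypothetical 3-page web: pages 0 and 1 link to 2; page 2 links back to 0.
x = pagerank(3, [(0, 2), (1, 2), (2, 0)])
print(max(range(3), key=lambda v: x[v]))   # 2: the page everyone links to
```

Page 1, which nobody links to, receives only the constant term (1 − α)/n, illustrating why the constant term keeps every node's centrality above zero.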
If d(v, v_i), i = 1, 2, …, n−1, are the distances of vertex v from the remaining vertices, then the average distance

ℓ_v = (1/(n−1)) · Σ_i d(v, v_i)

is small for the central vertices and larger for the remote vertices. The inverse quantity C_v = 1/ℓ_v is the closeness centrality. If no path exists between two vertices, we set as their distance the total number of vertices (arbitrary).

Closeness Centrality (cont.)

This centrality is used a lot in social networks, but it has some drawbacks:

- The range of its values is very small, so a good separation and ranking of the vertices cannot be made, especially in large networks.
- It changes easily with the addition or deletion of vertices.
- Other definitions have been proposed, such as using the harmonic mean of the distances, which has nice properties and no problem with vertices that are not connected, since their reciprocal is 0 and does not alter the centrality.
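Closeness centrality, with the arbitrary distance-n convention for unreachable vertices, can be sketched via breadth-first search (the path-graph example is hypothetical):

```python
from collections import deque

def closeness(n, edges):
    """Closeness centrality: inverse of the average distance to the other
    vertices; unreachable vertices get distance n, as in the text."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v); adj[v].append(u)
    result = []
    for s in range(n):
        dist = [None] * n
        dist[s] = 0
        q = deque([s])
        while q:                       # BFS from s
            v = q.popleft()
            for w in adj[v]:
                if dist[w] is None:
                    dist[w] = dist[v] + 1
                    q.append(w)
        total = sum(d if d is not None else n
                    for i, d in enumerate(dist) if i != s)
        result.append((n - 1) / total)
    return result

# Hypothetical path graph 0-1-2-3-4: the middle vertex is closest to all.
c = closeness(5, [(0, 1), (1, 2), (2, 3), (3, 4)])
print(max(range(5), key=lambda v: c[v]))   # 2
```

On the path the end vertices score 4/10 while the middle vertex scores 4/6, a small spread that illustrates the narrow value range mentioned above.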
Betweenness centrality

If g_st(i) = 1 when vertex i lies on the geodesic that connects the vertices s and t, and 0 otherwise, then the betweenness centrality of i is defined as

x_i = Σ_{s,t} g_st(i).

If n_st(i) is the number of geodesics from s to t that pass through i, and g_st is the number of geodesics from s to t, then the betweenness centrality of vertex i is written as:

x_i = Σ_{s,t} n_st(i) / g_st

Betweenness centrality measures the extent to which a vertex lies on paths (geodesics) between other vertices. It was proposed by Freeman in 1977, but it had already been proposed by Anthonisse in an unpublished work of his.

[Figure: the intermediate vertex has small degree centrality but large betweenness centrality.]

Generalized Poisson Random Graph models

The Configuration Model (Bender and Canfield). The aim is to generate random networks with a given degree sequence (d1, d2, …, dn).
Construct a sequence where node 1 is listed d1 times, node 2 is listed d2 times, etc. We randomly pick two elements of the sequence and form a link between the two nodes corresponding to those entries. We delete those entries from the sequence and repeat.

- The sum of the degrees needs to be even (or else an entry will be left over at the end).
- It is possible to have more than one link between two nodes (thus generating a multigraph).
- Self-loops are possible.
- Higher-degree nodes are involved in a higher percentage of the links (not Poisson).
- If we delete loops or multiple edges, we have to remember that d_n / (n·d̄_n)^(1/3) → 0, where d_n is the maximum degree up to node n.
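The stub-pairing procedure above can be sketched directly; shuffling the stub list once and pairing consecutive entries is equivalent to repeatedly picking two random entries:

```python
import random

def configuration_model(degrees, seed=0):
    """Configuration model (Bender and Canfield): build the stub list,
    then pair stubs uniformly at random. Self-loops and multi-edges are
    possible, so the result is a multigraph."""
    if sum(degrees) % 2:
        raise ValueError("the sum of the degrees must be even")
    rng = random.Random(seed)
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)                 # one shuffle = a uniform random pairing
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]

degrees = [3, 2, 2, 2, 1]
edges = configuration_model(degrees)
realized = [0] * len(degrees)
for u, v in edges:
    realized[u] += 1; realized[v] += 1
print(realized == degrees)   # True: every stub is used exactly once
```

The realized degrees always match the prescribed sequence (a self-loop contributes 2 to its node), which is exactly the point of the construction.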
Generalized Poisson Random Graph models

The Expected Degree Model (Chung and Lu). Start with n nodes and a given degree sequence (d1, d2, …, dn). Form a link between nodes i and j with probability d_i·d_j / Σ_k d_k, under the restriction max_i d_i² ≤ Σ_k d_k, so that each of the above probabilities is less than 1. Taking the summation for the expected frequency of node i, we get that it has expected degree d_i; but the realized degree distribution is significantly different from the one in the configuration model (ch. 4, Jackson).

While the configuration model adheres more closely to the starting degree sequence (under the previous restrictions), the Expected Degree model is nearest to the Poisson model. Both of these static models have shown properties similar to the Poisson model (based on related threshold functions), and they are useful mainly for generating random graphs from different initial distributions (even a power-law distribution).

Real-World Networks – Small World Model

While random graphs exhibit some features of real-world networks (small diameters or average distances relative to a growing average degree), they lack other characteristics.

Stanley Milgram (1967, Harvard): what is the probability that two randomly selected people would know each other? On average 5.5 hops (in a sparse network) — "six degrees of separation". Why?

In social networks (which are sparse networks) it has been observed that our friends are usually also friends of one another; in other words, people tend to group into relatively small (or large) clusters, in a way that in some cases does not depend on the place of living. The main problem is that models like the Poisson model miss the inclusion of the functional mechanisms that form and influence social, economic, or other real-world relationships, because growth in these models is based on independent links.
Real‐World Networks – Small World Model

Suppose that we have a Poisson random social network. Then P(i and j know each other) = P(link {i,j} exists) = p. Moreover, P(i and j know each other when both are friends of k) = P(i and j know each other and both are friends of k) / P(i and j are friends of k) = p³/p² = p. The existence of any acquaintance between i and j is independent of any other acquaintance between i and k or between j and k.

Usually a friendship network having ourselves as a root may have a picture like a tree.

[Figure: a friendship network rooted at ourselves.]

A way to define the previous probability is by the enumeration of the connected triples and the triangles in the network. A connected triple centered at node i is defined as a path of length 2 having node i as the intermediate node. The number of all possible connected triples at node i, having degree d_i, is:

T(i) = C(d_i, 2) = d_i(d_i − 1)/2

The transitivity of a node i, also called the clustering coefficient of node i, is defined as the number of triangles containing i divided by T(i).

Example: This network has one triangle and eight connected triples. The individual vertices have local clustering coefficients 1, 1, 1/6, 0 and 0; hence the clustering coefficient is equal to C(G) = (1 + 1 + 1/6 + 0 + 0)/5 = 13/30.
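The example above (one triangle, eight connected triples, local coefficients 1, 1, 1/6, 0, 0) can be reproduced. The concrete edge list is a hypothetical realization consistent with those counts:

```python
from itertools import combinations
from fractions import Fraction

def local_clustering(n, edges):
    """Local clustering coefficient of each vertex, as exact fractions."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)
    coeffs = []
    for i in range(n):
        d = len(adj[i])
        if d < 2:
            coeffs.append(Fraction(0))   # convention: 0 when T(i) = 0
            continue
        links = sum(1 for a, b in combinations(adj[i], 2) if b in adj[a])
        coeffs.append(Fraction(links, d * (d - 1) // 2))
    return coeffs

# Hypothetical realization: a triangle {0,1,2} plus pendant edges 2-3, 2-4.
# Vertex 2 has degree 4, giving C(4,2) = 6 triples with 1 closed pair.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (2, 4)]
cc = local_clustering(5, edges)
print(sum(cc) / 5)   # 13/30
```

The triple counts are T = (1, 1, 6, 0, 0), summing to the eight connected triples of the example.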
Real‐World Networks – Small World Model

Watts and Strogatz (1998). A one-dimensional lattice is a graph on n nodes such that, if we order them into a ring, each node is connected to the first k left-hand neighbors and to the first k right-hand neighbors. Such a lattice with n = 20 nodes and k = 2 is presented below (a 2k-regular graph). Then, with probability p, we decide independently for each link {u,v} whether to replace it with a link {u,w}, where w is chosen uniformly at random from the set of nodes.

For a network G we compute the distance from each node i to every other node, and then we take the average distance for node i. Doing the same for each node in the network, we get a set of |V(G)| average distances. The characteristic path length of the network G is the median of all these average distances; it is sometimes used instead of the average path length (which is the average distance of the graph).

The importance of the Watts–Strogatz model is due to the fact that it started the active and important field of modeling large-scale networks by random graphs defined by simple rules.
The WS(n, k, p) graph

For any Watts–Strogatz graph G from WS(n, k, 0), the clustering coefficient of G (as also of each vertex of G) is equal to

C(G) = 3(k − 1) / (2(2k − 1)),

while for G from WS(n, k, p) it is

C(G) ≈ [3(k − 1) / (2(2k − 1))] · (1 − p)³.

For any Watts–Strogatz graph G from WS(n, k, 0), the average path length ℓ(u) from a given vertex u to any other vertex in G is approximated by

ℓ(u) ≈ (n − 1)(n + k − 1) / (2kn),

while for G from WS(n, k, p) with p near 1 it is near

ℓ ≈ (log n − γ) / log(pn) + 0.5,

where γ is the Euler constant (which is approximately equal to 0.5772).

The degree distribution, which shows a different form than the one corresponding to random graphs, can be found in Newman (2003, review).
Real‐World Networks – Small World Model

Using n = 20, k = 2, and rewiring probability p = 0.1, the diameter goes from 5 to 4, while theoretically it was expected near log 20 = 2.996 ≈ 3.

Two main points of the small-world phenomenon:

ℓ ≈ ℓ_Random and C ≫ C_Random
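The WS(n, k, p) construction and the p = 0 clustering formula can be checked with a short sketch (the rewiring step here is a simplified variant that avoids duplicate links):

```python
import random

def watts_strogatz(n, k, p, seed=0):
    """WS(n, k, p): ring lattice linking each node to its k nearest
    neighbors on each side; each link is then rewired with probability p
    to a uniformly chosen endpoint (avoiding self-loops and duplicates)."""
    rng = random.Random(seed)
    edges = set()
    for u in range(n):
        for j in range(1, k + 1):
            edges.add(tuple(sorted((u, (u + j) % n))))
    rewired = set()
    for u, v in sorted(edges):
        if rng.random() < p:
            w = rng.randrange(n)
            while w == u or tuple(sorted((u, w))) in rewired:
                w = rng.randrange(n)
            rewired.add(tuple(sorted((u, w))))
        else:
            rewired.add((u, v))
    return rewired

def avg_clustering(n, edges):
    """Average local clustering coefficient."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)
    total = 0.0
    for i in range(n):
        nb = list(adj[i])
        d = len(nb)
        if d < 2:
            continue                    # convention: contributes 0
        links = sum(1 for x in range(d) for y in range(x + 1, d)
                    if nb[y] in adj[nb[x]])
        total += links / (d * (d - 1) / 2)
    return total / n

g0 = watts_strogatz(20, 2, 0.0)
print(len(g0), avg_clustering(20, g0))   # 40 0.5  = 3(k-1)/(2(2k-1)) for k = 2
```

For k = 2 the formula gives 3·1/(2·3) = 0.5, matching the lattice exactly; rewiring with small p destroys little of this clustering while shortening distances.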
Fundamentals

Beyond the degenerate degree distribution of a k-regular network (where P(d) = 1 for d = k and 0 otherwise), the Poisson random network, and the generalized random networks, the distribution that has long been associated with a great number of real events is the so-called scale-free distribution or power distribution P(d), which satisfies:

P(d) = c · d^(−γ),

where c > 0 is a scalar that depends on the nature of the variable d (for d = 1, 2, … it is the inverse of the Riemann zeta function ζ(γ) = Σ_{d=1}^∞ d^(−γ)).

Scale-free distributions are often said to exhibit a power law, which is due to the power function d^(−γ). A characteristic of these distributions is the "fat tails" they have: they tend to have many nodes with very small and very large degrees, which one cannot observe in the usual Poisson random networks in which links are created independently. They have to be distinguished from the exponential distributions of the type

P(d) = c · e^(−a·d),

which show a cut-off; that is, degrees above some value are unlikely to be met.

Increasing the degree by a factor a, the frequency goes down by a factor of a^(−γ). That means that, regardless of the scale of the degrees, degrees with a fixed relative ratio have equal relative probabilities, e.g. P(2)/P(1) = P(20)/P(10) = 2^(−γ). Hence the term scale-free.

Example
Basic Properties and Claims
The main properties of SF graphs that appear in the existing literature can be summarized as:
1. SF networks have scaling (power law) degree distribution.
2. SF networks can be generated by certain random processes, the foremost among which is preferential attachment.
3. SF networks have highly connected “hubs” which “hold the network together” and give the “robust yet fragile” feature of error tolerance but attack vulnerability (in contrast to Poisson Random Networks).
4. SF networks are generic in the sense of being preserved under random degree preserving rewiring.
5. SF networks are self‐similar.
6. SF networks are universal in the sense of not depending on domain‐specific details.
Example (from Adamic)

Preferential Attachment BA model
Searching for Real-World Network Models
To find suitable models for the real world is the primary goal here. The analyzed real-world networks mostly fall into three categories.
The biggest fraction of research work is devoted to the Internet and the WWW: HTML pages and their links (WWW), newsgroups and the messages posted to two or more of them (USENET), routers and their physical connections, autonomous systems, …
In biology, in particular chemical biology and genetics, networks also arise. Some of these show their net structure directly, at least under a microscope. But some of the most notorious biological networks, namely the metabolic networks, are formed a little more subtly. Here the nodes model certain molecules, and links represent chemical reactions between these molecules in the metabolism of a certain organism. In the simplest case, two vertices are connected if there is a reaction between those molecules.
Sociological networks often appear without scientific help: networks in politics and the economy, actors, sexual contacts, friendship, …

Preferential Attachment BA model
For m = 1 the BA model creates no cycles (the graph grows as a tree), so other models are proposed…
Example: World Wide Web
Nodes: WWW documents. Links: URL links.
ROBOT: collects all URLs found in a document and follows them recursively (over 3 billion documents).
Expected: an exponential network. Found: a scale-free network, P(k) ~ k^(−γ).
R. Albert, H. Jeong, A.-L. Barabási, Nature, 401, 130 (1999).

Scientific Collaboration Network
The BA model: growth and preferential attachment
Growth: start with a complete graph on m nodes. Nodes are born over time and indexed by their time of birth, i = 0, 1, 2, …, t, …. Each new node forms m links with pre-existing nodes.
Preferential attachment: the probability that a node receives a link is proportional to its degree. Hence the probability that node i receives one of the m new links at time t is
m · d_i(t) / Σ_j d_j(t) = m · d_i(t) / (2mt) = d_i(t) / (2t).
In a continuous-time approximation, the rate of change of node i's degree is
∂d_i(t)/∂t = d_i(t) / (2t), with initial condition d_i(i) = m,
which has the solution
d_i(t) = m (t/i)^(1/2), or equivalently i = t (m / d_i(t))².
Hence the fraction of nodes with degree less than d is F_t(d) = 1 − m²/d², and the corresponding density distribution is f(d) = 2m² d^(−3) (which is the P(d) = C d^(−3) given by Barabási).

Example: Sexual Relationships
Nodes: people (females; males). Links: sexual relationships.
4781 Swedes; ages 18–74; 59% response rate.
Liljeros et al., Nature, 2001.
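The growth rule above can be simulated directly. A minimal sketch (the seed, node indexing, and the repeated-targets sampling trick are implementation choices of this sketch, not taken from the slides):

```python
import random

def ba_degrees(n, m, seed=0):
    """Barabási-Albert growth: start from a complete graph on m nodes;
    each new node attaches m edges, choosing targets with probability
    proportional to degree (sampled from a list in which node i
    appears degree[i] times)."""
    rng = random.Random(seed)
    degree = [0] * n
    targets = []                      # node i appears degree[i] times here
    for i in range(m):                # seed: complete graph on m nodes
        for j in range(i + 1, m):
            degree[i] += 1
            degree[j] += 1
            targets += [i, j]
    for v in range(m, n):             # growth with preferential attachment
        chosen = set()
        while len(chosen) < m:        # m distinct, degree-biased targets
            chosen.add(rng.choice(targets))
        for u in chosen:
            degree[u] += 1
            degree[v] += 1
            targets += [u, v]
    return degree
```

In line with d_i(t) = m (t/i)^(1/2), the earliest-born nodes end up with the highest degrees ("the older are richer").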
The role of growth and preferential attachment
(a) Due to preferential attachment, the new vertex links to vertices that already have high connectivity.
(b) When preferential attachment is absent, the new vertex connects with equal probability to any vertex in the system.
(c) At every step a new edge is introduced; one end is attached to a randomly selected vertex, while the other end follows preferential attachment.
Albert-László Barabási, 2008
For the BA model, the clustering coefficient of a node born at step s, measured t steps after the start (based on m initial nodes), was derived in closed form by Fronczak et al. (2003); averaged over the network it decays as C ∼ (ln t)²/t, far more slowly than the C ∼ 1/n of a comparable ER random graph.
For the BA model with m initial nodes, the average path length is (Fronczak et al., 2004):
ℓ = (ln(n) − ln(m/2) − 1 − γ) / (ln ln(n) + ln(m/2)) + 3/2,
where γ is the Euler constant (approximately equal to 0.5772).
The graph is connected, and the older nodes are richer.
Is it just that “rich get richer”?
Pareto (wealth in populations shows huge variability). Most individuals do not earn much, but there are rare individuals who earn a substantial part of the total income. Pareto's principle, the “80/20 rule”: 20 percent of the people earn 80 percent of the total income.
Zipf (frequencies of occurrence of words in large pieces of text). The most frequent word is about twice as frequent as the second most frequent word, and about three times as frequent as the third most frequent word, etc. In short, with k the rank of a word and f(k) the relative frequency of the k-th most frequent word, f(k) ~ k^(−r), where r is close to 1. This is called Zipf's law.
Lotka (Chemical Abstracts in the period 1901–1916). The number of scientists appearing with 2 entries is close to 1/2² = 1/4 of the number of scientists with just one entry. The number of scientists appearing with 3 entries is close to 1/3² = 1/9 times the number of scientists appearing with 1 entry, etc. Again, with f(k) denoting the number of scientists appearing in k entries, f(k) ~ k^(−r), where r is close to 2. This is dubbed Lotka's law.
Simon (the “Gibrat” principle). Gibrat argued that the proportional change in firm size is the same for all firms in an industry.
Merton (“Matthew effect”). For everyone who has will be given more, and he will have an abundance; whoever does not have, even what he has will be taken from him.
Price (“cumulative advantage”). In citation networks, newly arriving nodes tend to connect to already well-connected nodes rather than to poorly connected ones.
Example
Barabási, Albert and Jeong investigate the scale-free nature of the WWW and propose a preferential attachment model for it. In the proposed model, older vertices tend to have the highest degrees. On the WWW this is not necessarily the case, as Adamic and Huberman demonstrate: Google, for example, is a late arrival on the WWW but has managed to become one of the most popular web sites.
A possible fix for this problem is given by G. Bianconi and A.-L. Barabási through a notion of fitness of vertices, which enhances or decreases their preferential power. Under that assumption the probability that node i receives a link depends not only on its “age” (accumulated degree) but also on another parameter η_i:
Π_i = η_i d_i / Σ_j η_j d_j,
with the value η_i taken from some distribution. (In some other models there is fitness but no preferential attachment.)
[Figure: different realizations of the model. (a), (b), (c) have fitness distributions ρ giving power laws with exponents 2.5, 3, 4 respectively; (d) has ρ = exp(−x) and a threshold rule.]
History on growth and preferential attachment
[Figure: from Caldarelli.]

Properties of the Barabási-Albert model
Searching in Networks
[Figure (from Adamic): number of nodes found at each step when searching a power-law graph.]

Kleinberg Copying Model
Consider the WWW. What is the “microscopic” process of growth? Two steps:
1. GROWTH: at every time step, copy a vertex and its m links.
2. MUTATION (for each of the m links):
 With probability 1 − α you keep it.
 With probability α you change the destination vertex.
Then the rate of change of a node's degree is:
∂d/∂t = (1 − α) · d/n + α · m/n.
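The two steps can be sketched as follows (the seed graph, RNG, and adjacency-list layout are assumptions of this sketch):

```python
import random

def copying_model(n, m, alpha, seed=0):
    """Copying model: each new node picks a random prototype and copies
    its m out-links; each copied link is kept with probability 1 - alpha
    and redirected to a uniformly random earlier node with probability alpha."""
    rng = random.Random(seed)
    # seed graph: m+1 nodes, each linking to the m others
    out = {v: [u for u in range(m + 1) if u != v] for v in range(m + 1)}
    for v in range(m + 1, n):
        proto = rng.randrange(v)  # GROWTH: copy a random earlier vertex
        out[v] = [rng.randrange(v) if rng.random() < alpha else u  # MUTATION
                  for u in out[proto]]
    return out
```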
Vulnerability
Complex systems maintain their basic functions even under errors and failures (cell → mutations; Internet → router breakdowns).

Searching in Networks
[Figure (from Adamic): number of nodes found at each step when searching a Poisson graph.]

Vulnerability
A scale-free network is thus seen to be sensitive to a targeted attack, but just as robust as an ER random graph in the case of a random attack.
[Figure (scale-free graph, from Adamic): targeting and removing hubs can quickly break up the network.]
Three plots of the same power law
[Figure: Willinger et al., 2005.]

The CCDF plot of some distributions
[Figure: from Caldarelli, lectures in complex networks.]

Power Law – Lognormal
Mitzenmacher, 2004. Nevertheless, a power law with preferential attachment is not the only way to describe real events concerning the growth of some living organism, nor is it the only distribution that can be characterized as scale-free.
The lognormal distribution, beyond the applications found in biological networks and other disciplines (wherever multiplicative processes take place), may in the case of sufficiently large variance σ² appear to behave like a scale-free distribution over a large range of values:
f(d) = 1/(√(2π) σ d) · exp(−(ln d − μ)² / (2σ²)).
Taking logarithms:
ln f(d) = −ln d − ln(√(2π) σ) − (ln d − μ)²/(2σ²)
        = −(ln d)²/(2σ²) + (μ/σ² − 1) ln d − ln(√(2π) σ) − μ²/(2σ²).
When σ² is large, the quadratic term (ln d)²/(2σ²) is negligible over a wide range of d, so ln f(d) is nearly linear in ln d: the underlying terms show a power law with exponent 1 − μ/σ².
Moreover, in many cases there is no distinct way to decide which distribution best fits the real data (especially when one uses the complementary cumulative frequency distribution).
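The near-linearity of ln f in ln d for large σ² can be checked numerically. A sketch (the parameter values μ = 0, σ = 10 are arbitrary choices, not from the slides):

```python
import math

def lognormal_logpdf(x, mu, sigma):
    """ln f(x) for the lognormal density
    f(x) = exp(-(ln x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma x)."""
    return (-math.log(x)
            - math.log(math.sqrt(2 * math.pi) * sigma)
            - (math.log(x) - mu) ** 2 / (2 * sigma ** 2))

def loglog_slope(x, mu, sigma, h=1e-5):
    """Numerical d(ln f)/d(ln x): for |ln x| << sigma^2 this stays close
    to mu/sigma^2 - 1, i.e. an apparent power-law exponent."""
    lx = math.log(x)
    return (lognormal_logpdf(math.exp(lx + h), mu, sigma)
            - lognormal_logpdf(math.exp(lx - h), mu, sigma)) / (2 * h)
```

With μ = 0 and σ = 10, the slope stays within a few percent of −1 over several decades of x, so on a log-log plot the lognormal is hard to distinguish from a power law with exponent 1 − μ/σ².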
Power Law – Lognormal
Mitzenmacher, 2004. Power-law distributions arise from multiplicative processes once the observation time is random or a lower boundary is put into effect, while the lognormal arises when there is no lower boundary on the data (in contrast with the power law, it behaves more realistically in the left tail of the distribution).
Another case is the double Pareto distribution, which consists of two different power laws (something observed in collaboration networks).
[Figures (from Staša Milojević, 2010, with logarithmic binning): a lognormal and a power law each fitted by linear regression.]

Supplementary Material
Markov's Inequality
Let X be a nonnegative random variable and t a positive number; then:
P(X ≥ t) ≤ E[X]/t.
(Use of the inequality.) Let X be the number of triangles in G ~ G(n, p):
X = Σ_{S ⊆ V, |S|=3} X_S, where X_S = 1 if G[S] is a triangle and X_S = 0 otherwise.
Then E[X] = C(n,3) p³ ≤ (np)³/6, hence if np → 0 as n → ∞ we get E[X] → 0, so
P(X ≥ 1) ≤ E[X] → 0, i.e. P(X = 0) → 1:
below the threshold 1/n, almost every graph is triangle-free.

Chebyshev's Inequality
Let X be a random variable and t a positive number; then:
P(|X − E X| ≥ t) ≤ Var X / t².
(Use of the inequality.) For t = E X > 0:
P(X = 0) ≤ P(|X − E X| ≥ E X) ≤ Var X / (E X)²,
hence if Var X / (E X)² → 0 as n → ∞, then P(X = 0) → 0, i.e. P(X ≥ 1) → 1.
Let X again be the number of triangles in G ~ G(n, p). The covariance of X_S, X_T is Cov(X_S, X_T) = E[X_S X_T] − E[X_S] E[X_T], while for an indicator random variable X_S² = X_S. Then:
Var X = Σ_{S,T} Cov(X_S, X_T).
If S and T have 0 or 1 node in common, the two triangles share no edge, so Cov(X_S, X_T) = p⁶ − p⁶ = 0. If S and T have 2 nodes in common, they share one edge, so Cov(X_S, X_T) = p⁵ − p⁶ ≤ p⁵. Hence:
Var X ≤ C(n,3) p³ + C(n,3) · 3(n−3) p⁵,
and therefore
Var X / (E X)² → 0 if np → ∞,
so as n → ∞ with np → ∞, P(X = 0) → 0: G almost surely has at least one triangle.
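The two regimes can be checked by direct simulation. A sketch (the sizes, probabilities, and seed below are arbitrary choices):

```python
import random
from itertools import combinations

def count_triangles_gnp(n, p, seed=0):
    """Sample G(n, p) and count triangles by brute force over all 3-sets."""
    rng = random.Random(seed)
    adj = [[False] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            adj[i][j] = adj[j][i] = True
    return sum(1 for a, b, c in combinations(range(n), 3)
               if adj[a][b] and adj[b][c] and adj[a][c])
```

Below the threshold (np → 0) the expected count C(n,3)p³ ≈ (np)³/6 vanishes, so triangles almost never appear; above it (np → ∞) they appear almost surely.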
Power Law Distribution (David Kempe)
Power Law Distribution in WWW (David Kempe)
Power Law Distribution (Clauset et al.)
Power Law and other Distributions (Clauset et al.)

Off-line and on-line models
The random graph models mentioned above are off-line, or static, models: all nodes are established at the same time. Distances, the clustering coefficient, connectedness, etc., depend on the order n of the network as well as on the probability p(n) of the existence of each link.
Since real-world graphs change dynamically, adding and deleting both links and nodes (in other words, they grow over time), there are several on-line random graph models in which the probability spaces change at the tick of the clock. In fact, in the study of complex real-world graphs, the on-line model came to attention first. An on-line random graph grows in size over time, according to given probabilistic rules, starting from a start graph G0. One can make statements about these graphs in the limit of long time (and hence large vertex number n, as n approaches infinity).
Web graphs are on-line models which try to explain the evolution of the entire complex system. Additionally, the topology of the network does not depend on n (self-similarity).
Preferential Attachment BA model (David Kempe)
The distribution of the eigenvalues of a Random Graph
The average degree of G is d̄ = 2m/n, and the density (1st definition) is ρ(G) = m/n.
A random graph has n real eigenvalues λ1 ≤ λ2 ≤ … ≤ λn (its spectrum) and n orthonormal eigenvectors forming a basis of R^n.
We have seen that eigenvalues are related to certain graph properties (walks of given length, average degree, chromatic number, subgraphs, …), while each eigenvector represents a weight function on the nodes of the graph.
The spectral density can be defined as
ρ(λ) = (1/n) Σ_{1≤i≤n} δ(λ − λi), where δ(λ − λi) = 1 for λ = λi and 0 otherwise.

Balanced and Unbalanced Subgraphs
A graph G is balanced if the average degree of any subgraph H ⊆ G does not exceed the average degree of G. Then:
Let H be a non-empty balanced graph with k nodes and l links. Then n^(−k/l) is a threshold function for the property that G contains H as a subgraph.
For unbalanced graphs: let H be an unbalanced graph, and set ρ_H = max{ρ_F : F ⊆ H}. Then
t(n) = n^(−1/ρ_H)
is a threshold function for the property that G contains H as a subgraph.
The distribution of the eigenvalues of a Random Graph
When G is a random graph with p << n^(−1), computer simulations show that the largest eigenvalue λn lies far apart from the other n − 1 eigenvalues; ignoring it, the shape of the empirical distribution function tends to a “semicircle”. In the figure the greatest eigenvalue is left out; the values on the x-axis are scaled by 1/σ and the values on the y-axis by σ, where
σ² = (1/(n−1)) Σ_{1≤i≤n} (λi − λ̄)², λ̄ = (1/n) Σ_{1≤i≤n} λi.
When G is a random graph with p > n^(−1) and n → ∞, a giant component emerges. Then ρ(λ) approaches a continuous function. Moreover, when p > log n / n almost every random graph is connected. In this case the spectral density converges to a semicircular distribution:
ρ(λ) = √(4np(1−p) − λ²) / (2π np(1−p)) if |λ| < 2√(np(1−p)), and ρ(λ) = 0 otherwise.
Also, the largest eigenvalue is isolated from the rest of the spectrum, and it increases as pn.
Again it has been observed that there are peaks in the empirical distribution (previous slide), which, it is argued, are due to trees grafted onto the giant component. Investigations show that the peaks are due to small connected components, which is the case also when pn >> 1. Moreover, the odd moments tend to 0, showing the non-existence of odd cycles and the presence of trees.
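Both facts (the bulk inside the semicircle support, the principal eigenvalue near pn) can be observed numerically. A sketch assuming NumPy; n, p, and the seed are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 0.1

# symmetric 0/1 adjacency matrix of one G(n, p) sample
upper = np.triu(rng.random((n, n)) < p, 1)
A = (upper | upper.T).astype(float)

eig = np.sort(np.linalg.eigvalsh(A))
radius = 2 * np.sqrt(n * p * (1 - p))  # semicircle support: |lambda| < radius
# eig[-1] sits near p*n, well outside the bulk; the rest stay near the semicircle
```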
Eigenvectors – The inverse participation ratio
Each eigenvector w = (w1, w2, …, wn)^T can be interpreted as a weight function w(i) on the nodes of a graph. When the eigenvectors are normalized, the inverse participation ratio of the j-th eigenvector is defined as:
I_j = Σ_{1≤k≤n} (w_j)_k⁴.
In the extreme cases: if the (w_j)_k are uniformly distributed ((w_j)_k = 1/√n for every k) then I_j = 1/n, while if (w_j)_k = 0 except for k = s, 1 ≤ s ≤ n, where (w_j)_s = 1, then I_j = 1.
A characteristic of random graphs is that a scatterplot of the inverse participation ratio against the corresponding eigenvalues shows the values evenly distributed in a small bandwidth.

HITS Algorithm
The authority scores of the vertices are defined as the principal eigenvector of transpose(A)·A, where A is the adjacency matrix of the graph.
The hub scores of the vertices are defined as the principal eigenvector of A·transpose(A), where A is the adjacency matrix of the graph.
[Figure (Dorogovtsev, Mendes, 2001): a small web graph around “Yamaha”, “Toyota”, “google”; red: best authority; black: best hub.]
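The two definitions can be sketched with power iteration (the graph, iteration count, and normalization are choices of this sketch):

```python
import numpy as np

def hits(A, iters=50):
    """HITS scores: authorities = principal eigenvector of A^T A,
    hubs = principal eigenvector of A A^T, via power iteration.
    A[i, j] = 1 means there is a link i -> j."""
    n = A.shape[0]
    a = np.ones(n)      # authority scores
    h = np.ones(n)      # hub scores
    for _ in range(iters):
        a = A.T @ h     # good authorities are pointed to by good hubs
        a /= np.linalg.norm(a)
        h = A @ a       # good hubs point to good authorities
        h /= np.linalg.norm(h)
    return a, h
```

On a tiny example where pages 0 and 1 both link to page 2, page 2 becomes the best authority and pages 0 and 1 tie as the best hubs.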
PageRank Algorithm
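The PageRank slides here are figure-only in this extraction. As a sketch of the standard formulation (the damping factor 0.85 and the uniform teleport vector are the usual defaults, not taken from the slides):

```python
import numpy as np

def pagerank(A, d=0.85, iters=100):
    """PageRank by power iteration: r = d * M r + (1 - d)/n, where M is the
    column-stochastic link matrix (A[i, j] = 1 for a link i -> j) and
    dangling nodes are treated as linking to every node uniformly."""
    n = A.shape[0]
    out = A.sum(axis=1)
    M = np.where(out[:, None] > 0,
                 A / np.maximum(out, 1)[:, None],  # rows normalized by out-degree
                 1.0 / n).T                         # dangling rows -> uniform
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = d * (M @ r) + (1 - d) / n
    return r
```

On a directed 3-cycle every page receives exactly one link, so all scores converge to 1/3.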
Graph spectra (adjacency matrix) – histogram
[Figures: spectra for N = 100, p = 0.05 and N = 300, p = 0.05.]
The largest (principal) eigenvalue, λ1, is isolated from the bulk of the spectrum, and it increases with the network size as pN.
Eigenvector centrality G(1000, 0.05)
What is the distribution of eigenvector centrality?

Closeness centrality G(1000, 0.05)
What is the distribution of closeness centrality?

Betweenness centrality G(1000, 0.05)
What is the distribution of betweenness centrality?
Regular Random Graphs
Regular random graphs G(n, r-reg): a way to define regular random graphs of degree r on n vertices is to partition a set of rn points into n subsets of r points (one subset per vertex) and consider all possible perfect matchings of the rn points, each matching chosen with equal probability.
[Figure: a G(50, 3-reg) graph.]
Theorem. Let r ≥ 3 and ε > 0 be fixed. Then for n → ∞:
P( (1−ε)·log n/log(r−1) ≤ L(G_{n,r-reg}) ≤ (1+ε)·log n/log(r−1) ) → 1.
The drawn G(50, 3-reg) has diameter 7, while log 50 / log(3−1) ≈ 5.7.
Directed Random Graphs
Let D(n, m) be a directed graph chosen at random from the set of all C(n(n−1), m) directed graphs with n nodes and m links.
The following theorem can be proved. Let ε > 0 be a positive constant and let ω(n) → ∞ (like log n, or log log n). Then:
(i) if ε < 1, in almost every random digraph each nontrivial component (i.e., one containing more than one vertex) of D(n, (1−ε)n) is a cycle smaller than ω;
(ii) there is a positive constant α = α(ε) such that D(n, (1+ε)n) almost surely contains a component of order larger than αn, and every other nontrivial component of D(n, (1+ε)n) is a cycle smaller than ω;
(iii) if m/n → ∞, then almost surely D(n, m) contains a unique nontrivial component of size (1 − o(n))n.

The following can also be proved (under some more conditions; Molloy and Reed, 1995). Given a sequence of nonnegative real numbers λ0, λ1, …, λn which sums to 1, a random graph having approximately λi·n nodes of degree di will, for n → ∞, have a giant component if the maximum degree is ≤ n^(1/4 − ε) and Σi di(di − 2) λi > 0, while if the maximum degree is ≤ n^(1/8 − ε) and Σi di(di − 2) λi < 0, there is no giant component.
The problem is then to generate a random graph with the given degree sequence.
Random Graphs for General Degree Distributions
Hence the configuration model also shows a phase transition, similar to that of the Bernoulli random graph, at which a giant component forms. Instead of a rigorous proof, consider a set of connected nodes together with the “boundary nodes” that are immediate neighbors of that set, and grow the set by adding the boundary nodes to it one by one. When we add a node of degree di, the number of boundary nodes goes down by 1, but it also increases by the number of new neighbors, which is di − 1; the total change in the number of boundary nodes is therefore −1 + (di − 1) = di − 2. The probability that a particular node is a boundary node is proportional to di, since there are di times as many edges by which a node of degree di could be connected to our set as for a node of degree 1. Therefore the expected change in the number of boundary nodes is about Σi di(di − 2) λi, which has to be > 0 in order to contain a giant component whose boundary is not exhausted very quickly.

How can we randomize a network while preserving the degree distribution?
 Stub reconnection algorithm (M. E. J. Newman et al., 2001; also known in the mathematical literature since the 1960s):
 Break every edge into two “edge stubs”: A–B becomes A– and –B.
 Randomly reconnect stubs.
 Problems:
 Leads to multiple edges.
 Cannot be modified to preserve additional topological properties.
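A sketch of the stub algorithm (the degree sequence and seed are arbitrary; note that, as the slide warns, self-loops and multiple edges can occur):

```python
import random
from collections import Counter

def stub_reconnect(degrees, seed=0):
    """Configuration-model randomization: write node v on degrees[v] stubs,
    shuffle all stubs, and pair them up. The degree sequence is preserved
    exactly, but self-loops and multiple edges are possible."""
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    assert len(stubs) % 2 == 0, "degree sum must be even"
    random.Random(seed).shuffle(stubs)
    return list(zip(stubs[::2], stubs[1::2]))
```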
Directed graphs with n nodes and m links: in this case, to prove the given results it is more convenient to use graph processes (Markov processes), where D(n, m) is the (m+1)-th stage of a graph process D(n, 0) ⊂ D(n, 1) ⊂ … ⊂ D(n, n(n−1)), with D(n, 0) having no links.
Another common distribution: power-law with an exponential cutoff
p(x) ~ x^(−α) e^(−x/κ): starts out as a power law, but could also be a lognormal or double exponential…
[Figure (from Adamic): log-log plot of p(x) against x.]

Local rewiring algorithm
 Randomly select and rewire two edges (Maslov, Sneppen, 2002; also known in the mathematical literature since the 1960s).
 Repeat many times.
 Preserves both the number of upstream and downstream neighbors of each node.
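A sketch of the double-edge swap on a directed edge list (the edge representation and the rejection rules for self-loops and duplicates are choices of this sketch):

```python
import random
from collections import Counter

def local_rewire(edges, n_swaps, seed=0):
    """Maslov-Sneppen local rewiring: repeatedly pick two edges (a, b), (c, d)
    and swap their endpoints to (a, d), (c, b). Every node keeps the same
    number of upstream and downstream neighbors."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = set(edges)
    done = 0
    while done < n_swaps:
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # reject swaps that would create self-loops or duplicate edges
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges
```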
Example on an artificially generated data set
 Take 1 million random numbers from a distribution with γ = 2.5.
 These can be generated using the so-called “transformation method”: generate random numbers r on the unit interval 0 ≤ r < 1; then x = (1 − r)^(−1/(γ−1)) is a random power-law-distributed real number in the range 1 ≤ x < ∞.

Linear scale plot of straight binning of the data
 How many times did the number 1, or 3843, or 99723 occur?
 The power-law relationship is not apparent this way.
 It only makes sense to look at the smallest bins.
[Figures (from Adamic): frequency against integer value, over the whole range and over the first few bins.]
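The generator described above can be sketched as follows (x_min = 1 and the seed are defaults of this sketch):

```python
import math
import random

def power_law_sample(gamma, n, x_min=1.0, seed=0):
    """Transformation method: if r is uniform on [0, 1), then
    x = x_min * (1 - r)**(-1/(gamma - 1)) is distributed as
    p(x) ~ x^(-gamma) on x_min <= x < infinity."""
    rng = random.Random(seed)
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0))
            for _ in range(n)]
```

One quick check: the maximum-likelihood exponent recovered from such a sample should come out close to the γ used to generate it.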
Log-log scale plot of straight binning of the data
 The same bins, but plotted on a log-log scale.
[Figure (from Adamic): frequency against integer value on log-log axes. We don't actually see all the zero-count values, because log(0) = −∞.]

 Fitting a straight line to it via least-squares regression will give values of the exponent γ that are too low.
[Figure (from Adamic): fitted γ against true γ. Here we have tens of thousands of observations when x < 10, but noise in the tail: only 0, 1 or 2 observations of values of x when x > 500.]
What goes wrong with straightforward binning
 Noise in the tail skews the regression result.
[Figure (from Adamic): data with a γ = 1.6 fit; there are many bins in the head of the distribution but few bins in the tail.]

First solution: logarithmic binning
 Bin the data into exponentially wider bins: 1, 2, 4, 8, 16, 32, …
 Normalize by the width of the bin.
[Figure (from Adamic): data with a γ = 2.41 fit; the datapoints are evenly spaced, with less noise in the tail of the distribution.]
Second solution: cumulative binning
 No need to bin: the cumulative distribution has a value at each observed value of x (i.e., how many of the values of x are at least X).
 No loss of information (the disadvantage of binning is that it smooths the data but also loses information).
 The cumulative probability of a power-law probability distribution p(x) = c x^(−γ) is also a power law, but with exponent γ − 1:
∫ c x^(−γ) dx = (c/(γ−1)) x^(−(γ−1)).
 Fitting via regression to the cumulative distribution gives γ − 1 = 1.43, i.e., a fitted exponent of 2.43, much closer to the actual 2.5.
[Figure (from Adamic): frequency of samples > x against x, with the γ − 1 = 1.43 fit.]
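The CCDF fit can be sketched on synthetic data (sample size, seed, and the use of NumPy's polyfit are choices of this sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 2.5
# power-law samples via the transformation method, x >= 1
x = (1.0 - rng.random(20000)) ** (-1.0 / (gamma - 1.0))

# empirical CCDF: fraction of samples >= each observed value
xs = np.sort(x)
ccdf = 1.0 - np.arange(len(xs)) / len(xs)

# straight-line fit on log-log axes; the slope should be about -(gamma - 1)
slope, intercept = np.polyfit(np.log(xs), np.log(ccdf), 1)
```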
Where to start fitting?
 Some data exhibit a power law only in the tail.
 After binning or taking the cumulative distribution you can fit to the tail, so you need to select an x_min, the value of x where you think the power law starts.
 Certainly x_min needs to be greater than 0, because x^(−γ) diverges at x = 0.

Example:
 Distribution of citations to papers.
 The power law is evident only in the tail (x_min > 100 citations).
[Figure (from Adamic): citation distribution with x_min marked.]
Maximum likelihood fitting – best
 You have to be sure you have a power-law distribution (this will just give you an exponent, not a goodness of fit):
γ = 1 + n [ Σ_{i=1}^{n} ln(x_i / x_min) ]^(−1)
 The x_i are all your datapoints, and you have n of them.
 For our data set we get γ = 2.503 – pretty close!
(from Adamic)
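The estimator on the slide, as a sketch (continuous-data form; x_min = 1 by default, and the test data set is regenerated with the transformation method described earlier):

```python
import math
import random

def mle_exponent(xs, x_min=1.0):
    """Maximum-likelihood power-law exponent:
    gamma_hat = 1 + n / sum_i ln(x_i / x_min), over the points x_i >= x_min."""
    tail = [x for x in xs if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

# artificial gamma = 2.5 data set via the transformation method
rng = random.Random(0)
data = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(100000)]
```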