Random Graphs
Aristotle University, School of Mathematics – Master in Web Science
Networks and Discrete Mathematics: Introduction to Random Graphs and Real-World Networks
Chronis Moyssiadis – Vassilis Karagiannis
WS.04 lecture on Random Graphs – C. Moyssiadis – V. Karagiannis

A random graph is obtained by starting with a set V = {1, 2, ..., n} of n vertices and adding edges between them at random. (There are at most N = C(n, 2) = n(n - 1)/2 possible edges.) Different random graph models produce different probability distributions on graphs. The two most commonly studied are:
* the model G(n, M), known as the Erdős–Rényi model, and
* the model G(n, p), proposed by Edgar Gilbert.

The model G(n, p)
In G(n, p) every possible edge occurs independently with probability p, where 0 <= p <= 1. An element G_0 of the sample space having n vertices and m edges therefore has probability of occurrence p^m * (1 - p)^(N - m), where 0 <= m <= N.

The model G(n, M)
The model G(n, M) assigns equal probability to all graphs with exactly M edges (0 <= M <= N). Because there are C(N, M) selections of the M edges, the sample space contains C(N, M) elements, each with probability 1/C(N, M). If p = 1/2, then G(n, 1/2) is the sample space containing all 2^N graphs on n vertices, each with the same probability. Almost always M is a function of n, i.e. M = M(n); we then denote the model G(n, M(n)).

The sample space G(3, M)
For n = 3 we have N = 3 possible edges, so |G(3, M)| = C(3, M). For example, G(3, 2) contains C(3, 2) = 3!/(2! * (3 - 2)!) = 3 graphs, each with probability 1/3.

Example (a graph process)
A graph process is a nested sequence G_0 ⊂ G_1 ⊂ ... ⊂ G_N in which G_t has precisely t edges. There are N! such sequences overall; for n = 3 there are 3! = 6 of them, each with probability 1/6. By Bollobás (1985), the three spaces G(n, p), G(n, M) and the space of graph processes are closely related.
Example (the space G(3, p))
The sample space of G(3, p) consists of the 2^3 = 8 subgraphs of K3: one graph with 3 edges, three with 2 edges, three with 1 edge, and the empty graph. A graph with m edges occurs with probability p^m * (1 - p)^(3 - m), and the eight elementary probabilities sum to 1.

Example in G(3, p)
Let Q1 be the property (event) "G in G(3, p) is connected" and Q2 the property "G in G(3, p) is bipartite". Are these two properties (events) independent?

Example – random variables in G(3, p)
Invariants or basic parameters of a graph, like k-connectivity, the clique number, the independence number, etc., are in the context of random graphs considered random variables (r.v.).

The degree distribution in G(n, p)
What is the probability that a given node has degree d (in other words, what is the expected fraction of nodes of G with degree d)? Each node may have at most n - 1 neighbors, hence, by the independence between links, we get the binomial expression
P(d) = C(n - 1, d) * p^d * (1 - p)^(n - 1 - d).
The average degree and the variance of the degrees (under the binomial model) are
d̄ = (n - 1) p  and  Var(d) = (n - 1) p (1 - p), respectively.
As n -> infinity we get the Poisson distribution as an approximation to the binomial:
P(d) = e^(-λ) * λ^d / d!,  where λ = (n - 1) p.

Example
Let G be a random graph in G(50, 0.02).
(i) What fraction of nodes would be expected to have degree 1?
(ii) What fraction of nodes would be expected to have degree at least 1?
(iii) What is the probability that there exists a node of degree 15 in G?

Studying the model G(n, p)
What about the other properties (topological characteristics) of the network? A simulated example gave the degree distribution
P(0) = 0.35345468, P(1) = 0.36759287, P(2) = 0.19114829, P(3) = 0.06626474, P(4) = 0.01722883
(these are the Poisson probabilities with λ equal to the observed mean 1.04), with summary statistics: Min. 0.00, 1st Qu. 0.00, Median 1.00, Mean 1.04, 3rd Qu. 2.00, Max. 4.00, Variance 1.02.
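The binomial degree distribution and the G(50, 0.02) example can be checked numerically. The following is a minimal sketch in plain Python (an illustration added here, not part of the original lecture); the function names are ad hoc:

```python
import math

def binom_pmf(d, n, p):
    """P(degree = d) in G(n, p): C(n-1, d) p^d (1-p)^(n-1-d)."""
    return math.comb(n - 1, d) * p**d * (1 - p) ** (n - 1 - d)

def poisson_pmf(d, lam):
    """Large-n approximation with lambda = (n-1)p."""
    return math.exp(-lam) * lam**d / math.factorial(d)

n, p = 50, 0.02
lam = (n - 1) * p                 # average degree d-bar = 0.98
p1 = binom_pmf(1, n, p)          # (i) fraction expected to have degree 1
p_ge1 = 1 - binom_pmf(0, n, p)   # (ii) fraction with degree at least 1
p15 = binom_pmf(15, n, p)        # (iii) a degree-15 node is essentially impossible
print(round(p1, 4), round(p_ge1, 4), p15)
print(round(poisson_pmf(1, lam), 4))  # Poisson approximation to (i)
```

So roughly 37% of the nodes are expected to have degree exactly 1, about 63% at least one neighbor, and the probability of a degree-15 node is astronomically small; the Poisson value is already close to the exact binomial one at n = 50.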
Studying the model G(n, p) (cont.)
Is it connected? Does it contain cycles, or is it a forest of trees? What is the distribution of the components? What about distances in it? Is there any correlation between the degrees of adjacent nodes? Are there nodes that may play central roles? What about the existence of cliques or communities? What about the vulnerability of such a network? Which parameters of the network determine the answers?

Properties of the model – threshold functions
Any property Q of random graphs in G(n, p) defines a class of graphs closed under isomorphism (a set of graphs with n nodes, i.e. an event in the sample space of random graphs). A property Q is satisfied by the random graph G if G lies in the set of graphs that Q defines. Usually an indicator variable X is associated with each property, and proofs are based on the expectation E(X).
We study the model (usually G(n, p) or G(n, M)) by growing the set of nodes n and taking the probability of a link to be a function of n, p(n); the model is then completely specified by p(n). Most of the properties studied are monotone increasing: they are invariant under the addition of edges to the network. Being connected is a monotone increasing property; containing a cycle is an increasing one.
Many of these properties arise suddenly when p(n) exceeds some threshold. For the triangle: if p(n)/n^(-1) -> 0 as n -> infinity, then P(G has at least one triangle) -> 0, while if p(n)/n^(-1) -> infinity, then P(G has at least one triangle) -> 1. Hence when p(n) is near 1/n there is a change in the random graph: if p(n) << 1/n then G is almost surely triangle-free, while if p(n) >> 1/n then G almost surely contains a triangle. Let t(n) = 1/n; a function t(n) with the previous properties is called a threshold function.

Generally, if Q is any monotone property of graphs (one that is preserved when edges are added), the function t(n) is a threshold function for the property Q if: whenever p(n)/t(n) -> 0, a graph from G(n, p) almost surely does not have Q, and whenever p(n)/t(n) -> infinity, it almost surely has Q. A threshold function is not unique: in the triangle example, any c/n with c > 0 is a threshold function. When such a threshold function pQ(n) exists, it is said that a phase transition occurs at that threshold.

Example
Find a threshold function for the property Q: in a Poisson random graph G with n nodes, node 1 has at least one link.
The average degree of each node is E(d) = (n - 1) p(n), and the probability that node 1 has at least one link is
P(d >= 1) = 1 - P(0) = 1 - e^(-(n - 1) p(n)).
If p(n)/n^(-1) -> 0, then (n - 1) p(n) -> 0 and P(d >= 1) -> 0, while if p(n)/n^(-1) -> infinity, then (n - 1) p(n) -> infinity and P(d >= 1) -> 1.
Hence pQ(n) = 1/(n - 1) is a threshold function for the property Q; moreover any function pQ(n) = c/(n - 1), c > 0, is a threshold function as well.

Threshold functions for subgraphs
Let Q be the property: F is a subgraph of order k and size l in some Poisson random graph G of the space G(n, p). In G it is expected that either no copy of F exists, or a number of copies isomorphic to F exist. The average number of copies of F is
E(F) = C(n, k) * (k!/a) * p^l  ~  (n^k / a) * p^l,
where a is the number of automorphisms of F (so that each isomorphic copy of F in G is counted once).

Evolution in Poisson RG – component distribution
Below the threshold of 1/n there are a lot of small components (of order log(n) nodes), and the largest of them includes no more than some constant factor times log(n) of the nodes (Jackson 2008). The threshold probabilities at which different subgraphs appear in a random graph are described next.
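The sudden appearance of triangles at t(n) = 1/n can be watched in simulation. The sketch below (an illustration added here, not from the lecture; sizes and the seed are arbitrary) estimates P(G(n, c/n) contains a triangle) for c well below and well above 1:

```python
import random

def gnp_adj(n, p, rng):
    """Sample G(n, p): each of the C(n, 2) edges appears independently."""
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def has_triangle(adj):
    """True if some edge {i, j} has a common neighbour."""
    for i in range(len(adj)):
        for j in adj[i]:
            if j > i and adj[i] & adj[j]:
                return True
    return False

def triangle_prob(n, c, trials, rng):
    """Monte Carlo estimate of P(G(n, c/n) contains a triangle)."""
    return sum(has_triangle(gnp_adj(n, c / n, rng)) for _ in range(trials)) / trials

rng = random.Random(1)
n = 200
low = triangle_prob(n, 0.1, 50, rng)    # p << 1/n: triangles almost never appear
high = triangle_prob(n, 10.0, 50, rng)  # p >> 1/n: a triangle is almost certain
print(low, high)
```

The expected number of triangles is about (np)^3/6, which is ~0.0002 in the first case and ~170 in the second, matching the 0-1 behaviour of the estimates.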
For F, since each link is independent of the others, the number of copies is approximately Poisson with mean E(F), so the probability that at least one copy of F exists is
P(number of F >= 1) = 1 - P(number of F = 0) = 1 - e^(-E(F)).
Then if p(n)/n^(-k/l) -> 0 we get P(number of F >= 1) -> 0, while if p(n)/n^(-k/l) -> infinity we get P(number of F >= 1) -> 1; hence a threshold function is pQ(n) = c * n^(-k/l).

Evolution of specific subgraphs
Before p ~ c n^(-2) the graph consists of isolated nodes; at p ~ c n^(-2) some edges appear. For p ~ c n^(-3/2) trees of order 3 appear, while for p ~ c n^(-k/(k - 1)) trees of order k appear. At p ~ c n^(-1) trees of all orders are present, and at the same time cycles of all orders appear. The threshold p ~ c n^(-2/3) marks the appearance of complete subgraphs of order 4, and p ~ c n^(-1/2) corresponds to complete subgraphs of order 5. As the exponent approaches 0, the graph contains complete subgraphs of increasing order (Barabási 2002).

Above the threshold of 1/n a giant component emerges: the largest component contains a nontrivial fraction of all nodes, i.e. at least cn nodes for some constant c (while the other components are either small or isolated nodes; right at the threshold the largest component has about n^(2/3) nodes). The giant component grows in size until the threshold of log(n)/n, at which point the network becomes connected. (For example, log(100)/100 ≈ 0.05.)

Diameter; average degree and size
The average degree and the size m (number of links) give the relation between the spaces G(n, p) and G(n, M):
p = 2m/(n(n - 1)),  m = p n(n - 1)/2,  hence  d̄ = p(n - 1) = 2m/n.
Because of the phase transition in the connectivity at p = log(n)/n, the problem of determining the diameter of G seems to be difficult for certain ranges of p.
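The component-size story above (small components below 1/n, a giant component above it, connectivity around log(n)/n) can also be watched numerically. A minimal sketch in plain Python (an illustration added here; n and the seed are arbitrary):

```python
import math
import random
from collections import deque

def gnp(n, p, rng):
    """Sample G(n, p) as adjacency lists."""
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def largest_component(adj):
    """Size of the largest connected component, via BFS."""
    n = len(adj)
    seen = [False] * n
    best = 0
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        q, size = deque([s]), 1
        while q:
            u = q.popleft()
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    size += 1
                    q.append(v)
        best = max(best, size)
    return best

rng = random.Random(2)
n = 2000
sub = largest_component(gnp(n, 0.5 / n, rng))               # below 1/n: only small components
giant = largest_component(gnp(n, 2.0 / n, rng))             # above 1/n: a giant component
conn = largest_component(gnp(n, 2 * math.log(n) / n, rng))  # above log(n)/n: connected
print(sub, giant, conn)
```

For average degree 2 the giant component should cover the fraction S solving S = 1 - e^(-2S) ≈ 0.80 of the nodes, while above log(n)/n the largest component is the whole graph.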
Some results are:
* If p < 1/n, then the diameter is that of a tree component, and a rough estimate is diam(G) ≈ log(n)/log((n - 1)p).
* If p > 1/n, then a giant component appears, and if p >= 3.5/n the diameter equals the diameter of that component, which is proportional to log(n)/log((n - 1)p).
* If p > log(n)/n, then the graph is almost surely connected and the diameter is concentrated around log(n)/log((n - 1)p), i.e. log(n)/log(d̄).

Component sizes
In a Poisson random graph we have d̄ = p(n - 1) and Var(d) = d̄ (from the Poisson degree distribution), so from Var(d) = E(d^2) - (d̄)^2 we get E(d^2) = d̄ + (d̄)^2.
Hence (Jackson, 2008), below the giant-component threshold, the average size of the component in which a randomly selected node lies is
1 + d̄/(1 - d̄) = 1/(1 - p(n - 1)).

Average path length
Fronczak et al. (2004) show that for large random graphs the average path length (average distance) can be estimated as
l ≈ (ln(n) - γ)/ln(pn) + 0.5,
where γ is the Euler constant (approximately 0.5772).

Clustering coefficient
Suppose that we have a Poisson random social network. Then
P(i and j know each other) = P(link {i, j} exists) = p,
and moreover
P(i and j know each other given both are friends of k) = P(i and j know each other and both are friends of k) / P(i and j are friends of k) = p^3 / p^2 = p.
The existence of an acquaintance between i and j is independent of any other acquaintance between i and k or between j and k. Accordingly, as n grows the clustering coefficient stays very low.

Similarities and differences with real-world nets
In any case, a Poisson random network, in which links are formed independently, has many features in common with a model where we force the network to have the expected number of links (a homogeneous network – relatively unrealistic) (Matthew O. Jackson, 2008, book).
(A figure from Bearman, Moody and Stovel, from the Add Health data set, revisited.)
Even when, with low probability, it gets connected, the distributions of its centrality indices reveal symmetric shapes that exclude real-world characteristics such as the existence of hubs or, on the opposite side, cohesive social communities.
The "evolution" of such random-type nets, based on a growing but static population, can almost surely be estimated with high probability from its monotone increasing properties (events). All properties are related to n or to p(n), hence the network is scale dependent.
Although it exhibits the small-diameter characteristic of almost all real-world nets, it does not support the tendency toward transitivity (a friend of my friend is a friend).
Still, as a static model that operates on a given number of nodes, it can serve as a benchmark or a null hypothesis for real-world networks of the same order and size.
The Add Health friendship network above is an example of a real-world network whose growth was very close to a Poisson random network.

Centrality measures
Which are the "most important" vertices, and by what criteria? Number of connections, importance of the connections, smaller distances, nodal importance: various centrality measures have been defined.

Degree centrality
The vertex degree deg(v), i.e. the number of edges incident to vertex v, is a centrality index. Degree centrality is defined as the quotient
C_D(v) = deg(v)/(n - 1).
Replacing the degree with the in- (or out-) degree, in directed graphs, we obtain in- (or out-) degree centrality accordingly. In social networks it means that the more friendships (acquaintances, relations) a person (vertex) has, the more important that person is in the network.
Example: the Medici network
A well-known social network has as nodes the great families of 15th-century Florence, with links between families representing marriages between them. The degree centralities are: Medici (0.4), Guadagni (0.267), Strozzi (0.267), Albizzi (0.2), ... This ranking is also apparent from the figure, and it gave great prestige to the Medici family.

Example: a round-robin tournament network
n players (vertices; in the figure n = 6) play in pairs until a final win (there are no draws). The direction of an edge denotes victory: an edge from vertex 3 to vertex 1 means that player 3 beat player 1. The degree-centrality vector (using the out-degrees, i.e. the wins) is
(0.8, 0.6, 0.6, 0.4, 0.4, 0.2)'.
Player 1 is the best, followed by players 2 and 3, who are equivalent.

Eigenvector centrality
It is a natural generalization of degree centrality (it will be defined below). In the figure, vertices 1 and 6 both have degree 3; however, the neighbors of vertex 1 have a total degree sum of 5, while the neighbors of vertex 6 have a total degree sum of 11. This can be interpreted as vertex 6 having greater importance than vertex 1 (it has degree-wise more important neighbors).

Eigenvector centrality (cont.)
Player 1 has 4 wins, while player 3 has 3 wins. However, the players beaten by player 1 have 8 wins in total, while the players beaten by player 3 have 9 wins in total. This can be interpreted as player 3 possibly being stronger than player 1, if we take into account whom he beat. In social networks, the importance of our friends counts toward the final importance attributed to us.
Powers of the adjacency matrix
Let A be the adjacency matrix of G. If x(0) = 1 (the all-ones vector), then x(1) = A x(0) gives the vertex degrees; x(2) = A x(1) = A^2 x(0) gives, for each vertex, the sum of the degrees of its immediate neighbors; continuing, x(t) = A^t x(0) gives t-th order degree sums. It can be shown that, in the limit, x(t)/λ1^t converges to the direction of the eigenvector of the largest eigenvalue λ1 of the matrix A.

Eigenvector centrality (definition)
The normalized form of the eigenvector of the largest eigenvalue λ1 of A (i.e. the one obtained by dividing its components by its norm) is the eigenvector centrality of the vertices of the graph that A represents.

Existence of eigenvector centrality
The existence of this vector, and the fact that it is real and positive, is guaranteed by the Perron–Frobenius (P-F) theorem. Indeed, in connected simple graphs the adjacency matrix is symmetric and primitive (i.e. there exists a power of A, say A^k, with all its elements different from 0), which are the conditions for the P-F theorem to hold.
In directed graphs the eigenvector of the maximum eigenvalue is not uniquely defined and the matrix A is non-symmetric, so eigenvector centrality does not work well there. Eigenvector centrality was first defined by Bonacich in 1987.

Example
igraph has the function evcent for computing eigenvector centrality; the R package sna likewise has a function evcent. Applying it to the figure gives
(0.147, 0.045, 0.214, 0.214, 0.544, 0.443, 0.443, 0.443)',
i.e. vertex 5 is the most central, while vertices 6, 7 and 8 share the same value, much greater than that of vertex 1.
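The limiting behaviour of A^t x(0) described above is exactly power iteration. A minimal sketch in plain Python (an illustration added here, not the lecture's R/igraph evcent; the 4-node path graph is a hypothetical example):

```python
import math

def eigenvector_centrality(adj, iters=200):
    """Power iteration for the leading eigenvector of the adjacency matrix.
    adj maps each vertex 0..n-1 to its neighbour list."""
    n = len(adj)
    x = [1.0] * n
    for _ in range(iters):
        # Multiply by (A + I): the identity shift guarantees convergence even
        # on bipartite graphs and leaves the leading eigenvector unchanged.
        y = [x[i] + sum(x[j] for j in adj[i]) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x

# Hypothetical example: the path graph 0-1-2-3; the two interior vertices
# should come out more central than the endpoints.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
c = eigenvector_centrality(path)
print([round(v, 3) for v in c])
```

By Perron-Frobenius the result is positive, and by symmetry the two interior vertices get the same value (about 0.601) and the endpoints a smaller one (about 0.372).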
Example
For the Medici network we find: Medici 0.430, Strozzi 0.356, Ridolfi 0.342, Tornabuoni 0.326, Guadagni 0.289, Bischeri 0.283. The Medici family, which was central in degree centrality, remains central in eigenvector centrality as well. However, the families Guadagni and Strozzi, which both have degree 4 (i.e. they are tied in second place in degree centrality), occupy the 2nd place (Strozzi) and the 5th place (Guadagni) in eigenvector centrality. In this centrality, the families Ridolfi and Tornabuoni (with vertex degree 3) overtake the Guadagni family.

Katz centrality
It is a generalization of eigenvector centrality. If we consider the relation
x = α A x + β 1
for positive quantities α, β, it is as if we give a "quantity of centrality" to all nodes (so that none of them is 0). It was proposed in 1957 by Katz and is computed from
x = β (I - α A)^(-1) 1.
As β we may take 1, and as α a value smaller than 1/λ1 (the value at which I - α A becomes singular).

PageRank centrality
It is an improvement of Katz centrality. Larry Page and Sergey Brin, who created Google, developed a new algorithm for evaluating the importance of the pages linked to Google, which they called PageRank centrality, exploiting Page's name. Katz centrality has the following drawback: if a node has high centrality, then all the nodes it points to also acquire high centrality; for example, any node linked from Google or Yahoo, which have high centrality, will itself get high centrality. This was corrected by dividing each contribution by the out-degree of the contributing node.

Relation of the four measures
* with constant term, with division by the out-degree: PageRank centrality
* with constant term, without division: Katz centrality
* without constant term, with division by the out-degree: a degree-centrality-like measure
* without constant term, without division: eigenvector centrality
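Both recursions can be sketched directly. The code below (an illustration added here; the tiny 3-page "web" and the parameter values are hypothetical) iterates Katz's x = αAx + β1 and the PageRank variant with division by the out-degree:

```python
def katz(adj, alpha=0.1, beta=1.0, iters=200):
    """Fixed-point iteration for x = alpha * A x + beta * 1.
    Converges when alpha < 1/lambda_max (here the graph is undirected)."""
    n = len(adj)
    x = [0.0] * n
    for _ in range(iters):
        x = [beta + alpha * sum(x[j] for j in adj[i]) for i in range(n)]
    return x

def pagerank(out_links, damping=0.85, iters=200):
    """Katz-style recursion, but each node's score is divided by its
    out-degree before being passed along; scores sum to 1."""
    n = len(out_links)
    pr = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - damping) / n] * n
        for i, targets in enumerate(out_links):
            if targets:
                share = damping * pr[i] / len(targets)
                for j in targets:
                    new[j] += share
            else:  # dangling node: spread its mass uniformly
                for j in range(n):
                    new[j] += damping * pr[i] / n
        pr = new
    return pr

# Hypothetical 3-page web: pages 1 and 2 both link to page 0.
out = [[1, 2], [0], [0]]
pr = pagerank(out)
print([round(v, 3) for v in pr])

# Katz on the undirected path 0-1-2-3: interior nodes score higher.
path_adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
kx = katz(path_adj)
print([round(v, 3) for v in kx])
```

For the 3-page web the stationary values are pr = (0.486, 0.257, 0.257): the page that everyone links to dominates.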
Closeness centrality
If d_ij, j = 1, 2, ..., n, are the distances of vertex i from the other vertices, then the mean distance
l_i = (1/(n - 1)) * Σ_j d_ij
is small for the central vertices and larger for the remote ones. The inverse quantity, C_i = 1/l_i = (n - 1)/Σ_j d_ij, is the closeness centrality. If no path exists between two vertices, we set their distance equal to the total number of vertices (an arbitrary convention).

Closeness centrality (cont.)
This centrality is used a lot in social networks, but it has some drawbacks. The range of its values is very small, so a good separation and ranking of the vertices is not possible, especially in large networks; and it changes easily with the addition or deletion of vertices. Other definitions have been proposed, such as using the harmonic mean of the distances, which has nice properties and no problem with unconnected vertices, since their inverse distance is 0 and does not alter the centrality.

Betweenness centrality
Betweenness centrality measures the extent to which a vertex lies on paths (geodesics) between other vertices. It was proposed by Freeman in 1977, but had already been proposed by Anthonisse in unpublished work. A bridging vertex has small degree centrality but large betweenness centrality.
If σ_st(i) = 1 when vertex i lies on the geodesic connecting the vertices s and t, and 0 otherwise, then the betweenness centrality is defined as
x_i = Σ_{s,t} σ_st(i).
If n_st(i) is the number of geodesics from s to t that pass through i, and g_st is the number of geodesics from s to t, then the betweenness centrality of vertex i is written
x_i = Σ_{s,t} n_st(i)/g_st.
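Closeness and its harmonic variant reduce to breadth-first search. A minimal sketch in plain Python (an illustration added here; the 5-vertex star is a hypothetical example):

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path distances from source in an unweighted graph."""
    dist = {source: 0}
    q = deque([source])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def closeness(adj, i):
    """Classic closeness: inverse of the mean distance from i
    (computed over reachable vertices)."""
    dist = bfs_distances(adj, i)
    others = [d for v, d in dist.items() if v != i]
    return len(others) / sum(others) if others else 0.0

def harmonic_closeness(adj, i):
    """Harmonic variant: unreachable vertices contribute 0, so no
    arbitrary distance convention is needed for disconnected graphs."""
    dist = bfs_distances(adj, i)
    return sum(1.0 / d for v, d in dist.items() if v != i)

# Hypothetical star on 5 vertices: the hub is closest to everyone.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(closeness(star, 0), closeness(star, 1))
print(harmonic_closeness(star, 0), harmonic_closeness(star, 1))
```

The hub gets closeness 1 (mean distance 1), a leaf gets 4/7; the harmonic values are 4 and 2.5 respectively.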
Generalized Poisson random graph models
The Configuration Model (Bender and Canfield). The aim is to generate random networks with a given degree sequence (d1, d2, ..., dn).
Construct a sequence in which node 1 is listed d1 times, node 2 is listed d2 times, and so on. We randomly pick two elements of the sequence and form a link between the two nodes corresponding to those entries; we delete those entries from the sequence and repeat. The sum of the degrees needs to be even (or else one entry will be left over at the end). It is possible to have more than one link between two nodes (thus generating a multigraph), and self-loops are possible. Higher-degree nodes are involved in a higher percentage of the links (the result is not Poisson). If we delete loops or multiple edges, we have to remember that the maximum degree d_n must stay small, in the sense d_n/(n d̄_n)^(1/3) -> 0 as n grows, where d_n is the maximum degree among the first n nodes.

Generalized Poisson random graph models (cont.)
The Expected Degree Model (Chung and Lu). Start with n nodes and a given degree sequence (d1, d2, ..., dn). Form a link between nodes i and j with probability
d_i * d_j / Σ_k d_k,  under the restriction  max_i d_i^2 < Σ_k d_k,
so that each of the above probabilities is less than 1. Taking the summation for the expected frequency of node i, we get that it has expected degree d_i, but the realized degree distribution is significantly different from the one in the configuration model (ch. 4, Jackson). While the configuration model stays closer to the starting degree sequence (under the previous restrictions), the Expected Degree Model is nearest to the Poisson model.

Real-World Networks – the Small World model
While random graphs exhibit some features of real-world networks (small diameters, or small average distances relative to a growing average degree), they lack other characteristics. Stanley Milgram (1967), Harvard, asked: what is the probability that two randomly selected people know each other?
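The stub-matching construction of the configuration model described above can be sketched as follows (an illustration added here; the degree sequence is a hypothetical example):

```python
import random
from collections import Counter

def configuration_model(degrees, rng):
    """Bender-Canfield stub matching: list node i exactly d_i times, then
    repeatedly pair up random entries.  Self-loops and multiple edges are
    possible, so the result is in general a multigraph."""
    if sum(degrees) % 2:
        raise ValueError("the degree sum must be even")
    stubs = [i for i, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)  # pairing a uniformly shuffled list = uniform matching
    return [(stubs[k], stubs[k + 1]) for k in range(0, len(stubs), 2)]

rng = random.Random(3)
edges = configuration_model([3, 2, 2, 2, 1], rng)
# Degrees are preserved by construction: each stub is used exactly once
# (a self-loop contributes 2 to its node's degree).
deg = Counter()
for u, v in edges:
    deg[u] += 1
    deg[v] += 1
print(edges, [deg[i] for i in range(5)])
```

Whatever pairing comes out, the realized (multi)degree sequence is exactly the requested one; only the identity of the neighbours is random.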
On average 5.5 hops were needed (in a sparse network): the famous "six degrees of separation". Why? In social networks (which are sparse networks) it has been observed that our friends are usually also friends of each other; in other words, people tend to group into relatively small (or large) clusters, in a way that in some cases does not depend on the place of living.

But: both of these static models (the configuration model and the expected degree model) have shown properties similar to the Poisson model (based on related threshold functions), and they are useful mainly for generating random graphs from different initial distributions (even a power-law distribution). The main problem is that, like the Poisson model, they miss the functional mechanisms that form and influence social, economic, or other real-world relationships, because growth in these models is based on independent links.

Real-World Networks – the Small World model (cont.)
Recall that in a Poisson random social network P(i and j know each other) = p and P(i and j know each other given both are friends of k) = p^3/p^2 = p: any acquaintance between i and j is independent of any other acquaintance between i and k or between j and k. A friendship network having ourselves as a root may, however, look like the tree drawn in the figure.
A way to quantify the previous probability is by the enumeration of the connected triples and the triangles in the network. A connected triple centered at node i is defined as a path of length 2 having node i as the intermediate node.
The number of all possible connected triples at a node i having degree d_i is
T(i) = C(d_i, 2) = d_i (d_i - 1)/2.
The transitivity of a node i, also called the clustering coefficient of node i, is defined as the fraction of those triples that are closed into triangles.
Example: the network in the figure has one triangle and eight connected triples. The individual vertices have local clustering coefficients 1, 1, 1/6, 0 and 0, hence the (average) clustering coefficient is C(G) = 13/30.

Real-World Networks – the Small World model: Watts and Strogatz (1998)
A one-dimensional lattice is a graph on n nodes such that, if we order them into a ring, each node is connected to its first k left-hand neighbors and its first k right-hand neighbors; such a lattice is a 2k-regular graph (the figure shows one with n = 20 and k = 2). Then, with probability p, we decide independently for each link {u, v} whether to replace it with a link {u, w}, where w is chosen uniformly at random from the set of nodes.

Characteristic path length
For a network G we compute the distance from each node i to every other node and take the average distance for node i. Doing the same for each node in the network, we get a set of |V(G)| average distances. The characteristic path length of G is the median of all these average distances, and it is sometimes used instead of the average path length (which is the average distance of the graph).
The importance of the Watts–Strogatz model is due to the fact that it started the active and important field of modeling large-scale networks by random graphs defined by simple rules.
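The clustering-coefficient example above can be reproduced in code. The graph below is a hypothetical one chosen to match the slide's counts (one triangle, eight connected triples); the sketch is an illustration added here:

```python
def local_clustering(adj, i):
    """Fraction of pairs of neighbours of i that are themselves linked:
    (links among neighbours) / C(d_i, 2)."""
    nbrs = list(adj[i])
    d = len(nbrs)
    if d < 2:
        return 0.0
    links = sum(1 for a in range(d) for b in range(a + 1, d)
                if nbrs[b] in adj[nbrs[a]])
    return links / (d * (d - 1) / 2)

def average_clustering(adj):
    """Average of the local clustering coefficients over all vertices."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)

# One triangle (0-1-2) plus two pendant edges at node 2: node 2 has
# degree 4, giving C(4,2) = 6 triples there and 8 triples in total.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2}, 4: {2}}
print([local_clustering(g, i) for i in range(5)])
print(average_clustering(g))
```

The local coefficients come out as 1, 1, 1/6, 0, 0 and the average as 13/30, exactly the numbers quoted above.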
The WS(n, k, p) graph
For any Watts–Strogatz graph G from WS(n, k, 0), the clustering coefficient of G (and of each vertex of G) equals
C(G) = 3(k - 1)/(2(2k - 1)),
while for G from WS(n, k, p) it is
C(G) ≈ [3(k - 1)/(2(2k - 1))] * (1 - p)^3.
For any Watts–Strogatz graph G from WS(n, k, 0), the average path length l(u) from a given vertex u to any other vertex in G is approximated by
l(u) ≈ (n - 1)(n + k - 1)/(2kn),
while for G from WS(n, k, p), with p > 0, it is near (ln(n) - γ)/ln(pn) + 0.5, where γ is the Euler constant (approximately 0.5772), as in the Poisson case.
The degree distribution, which has a different form than the one corresponding to random graphs, can be found in Newman's 2003 review.

Real-World Networks – the Small World model (example)
Using n = 20, k = 2, and rewiring probability p = 0.1, the diameter goes from 5 to 4, while theoretically it was expected near log(20) = 2.996 ≈ 3.

Two main points of the small-world phenomenon
l ≈ l_random  and  C >> C_random.

Fundamentals
Beyond the degenerate degree distribution of a k-regular network (where P(d) = 1 for d = k and 0 otherwise), the Poisson random network, and the generalized random networks, the distribution that has long been associated with a great number of real phenomena is the so-called scale-free distribution, or power distribution, P(d), which satisfies
P(d) = c * d^(-γ),
where c > 0 is a scalar that depends on the nature of the variable d (for d = 1, 2, ... it is the inverse of the Riemann zeta function, c = 1/ζ(γ), with ζ(γ) = Σ_{d>=1} d^(-γ)).

Fundamentals (cont.)
Scale-free distributions are often said to exhibit a power law, which is due to the power function d^(-γ). A characteristic of these distributions is the "fat tails" they have.
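The WS(n, k, p) construction and the clustering formula at p = 0 can be checked directly. A minimal sketch in plain Python (an illustration added here; sizes, p and the seed are arbitrary):

```python
import random

def ws_lattice(n, k):
    """Ring lattice: each node is linked to its k nearest neighbours
    on each side (a 2k-regular graph)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k + 1):
            adj[i].add((i + d) % n)
            adj[(i + d) % n].add(i)
    return adj

def rewire(adj, p, rng):
    """Watts-Strogatz rewiring: each link {u, v} is replaced, with
    probability p, by {u, w} for a uniformly random w (rewirings that
    would create a self-loop or a duplicate edge are simply skipped)."""
    n = len(adj)
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    for u, v in edges:
        if rng.random() < p:
            w = rng.randrange(n)
            if w != u and w not in adj[u]:
                adj[u].discard(v); adj[v].discard(u)
                adj[u].add(w); adj[w].add(u)
    return adj

def avg_clustering(adj):
    def local(i):
        nbrs = list(adj[i]); d = len(nbrs)
        if d < 2:
            return 0.0
        links = sum(1 for a in range(d) for b in range(a + 1, d)
                    if nbrs[b] in adj[nbrs[a]])
        return links / (d * (d - 1) / 2)
    return sum(local(i) for i in adj) / len(adj)

c0 = avg_clustering(ws_lattice(20, 2))  # p = 0: formula gives 3*1/(2*3) = 0.5
c2 = avg_clustering(rewire(ws_lattice(200, 2), 0.5, random.Random(5)))
print(c0, c2)
```

At p = 0 the measured value matches C(G) = 3(k - 1)/(2(2k - 1)) = 0.5 for k = 2 exactly; after heavy rewiring (p = 0.5) the clustering collapses toward the (1 - p)^3 prediction.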
That is, they tend to have many nodes with very small degrees together with some nodes with very large degrees, which one cannot observe in the usual Poisson random networks, where links are created independently. They are to be distinguished from exponential distributions of the type
P(d) = c * e^(-a d),
which show a cut-off: degrees larger than some value are unlikely to be met.
Increasing the degree by a factor a, the frequency goes down by a factor a^(-γ). This means that, regardless of the scale of the degrees, degrees with a fixed relative ratio have equal relative probabilities, e.g. P(2)/P(1) = P(20)/P(10) = c * 2^(-γ) / (c * 1^(-γ)) = 2^(-γ); hence the term scale-free.

Basic properties and claims
The main properties of SF graphs that appear in the existing literature can be summarized as:
1. SF networks have a scaling (power-law) degree distribution.
2. SF networks can be generated by certain random processes, the foremost among which is preferential attachment.
3. SF networks have highly connected "hubs" which "hold the network together" and give the "robust yet fragile" feature of error tolerance but attack vulnerability (in contrast to Poisson random networks).
4. SF networks are generic, in the sense of being preserved under random degree-preserving rewiring.
5. SF networks are self-similar.
6. SF networks are universal, in the sense of not depending on domain-specific details.

Example (from Adamic)

Preferential attachment – the BA model

Searching for real-world network models
To find suitable models for the real world is the primary goal here.
The analyzed real-world networks mostly fall into three categories:
* The biggest fraction of research work is devoted to the Internet and the WWW: HTML pages and their links (WWW), the newsgroups and the messages posted to two or more of them (USENET), the routers and their physical connections, the autonomous systems, ...
* In biology, in particular chemical biology and genetics. Some of these networks show their structure directly, at least under a microscope; but some of the most notorious biological networks, namely the metabolic networks, are formed a little more subtly. Here the nodes model certain molecules, and links represent chemical reactions between these molecules in the metabolism of a certain organism; in the simplest case, two vertices are connected if there is a reaction between those molecules.
* Sociological networks often appear without scientific help: networks in politics and economy, actors, sexual contacts, friendship, ...
When each new node attaches with a single link, no cycle is created under the BA model, so other models are also proposed.

Example: the World Wide Web
R. Albert, H. Jeong, A.-L. Barabási, Nature 401, 130 (1999). Nodes: WWW documents; links: URL links. A ROBOT collects all URLs found in a document and follows them recursively; over 3 billion documents. An exponential network was expected; a scale-free network with P(k) ~ k^(-γ) was found.

Example: a scientific collaboration network (figure).

The BA model: growth and preferential attachment
Example: sexual relationships. Nodes: people (females, males); links: sexual relationships.
Growth: start with a complete graph on m nodes. Nodes are born over time and are indexed by their time of birth, i = 0, 1, 2, ..., t, ... Each new node forms m links with pre-existing nodes.
4781 Swedes; ages 18-74; 59% response rate. Liljeros et al., Nature, 2001.

Preferential attachment: the probability that a node receives a link is proportional to its degree. Hence the probability that node i receives one of the m new links at time t is

  m · d_i(t) / Σ_j d_j(t) = m · d_i(t) / (2mt) = d_i(t) / (2t).

Following a continuous-time approximation, the rate of change of the degree of node i is

  ∂d_i(t)/∂t = d_i(t) / (2t),  with initial condition d_i(i) = m,

having the solution

  d_i(t) = m · (t/i)^(1/2).

Hence the fraction of nodes with degree less than d is

  F_t(d) = 1 − m²/d²,

and the corresponding density distribution is

  f(d) = 2m²·d^(−3)

(which is the P(d) = C·d^(−3) given by Barabási).

The role of growth and preferential attachment
(a) Due to preferential attachment, the new vertex is linked to vertices with already high connectivity.
(b) When preferential attachment is absent, the new vertex connects with equal probability to any vertex in the system.
(c) At every step a new edge is introduced, one end attached to a randomly selected vertex, the other end following preferential attachment.
Albert-László Barabási, 2008

The growth of the BA network (Barabási-Albert)

Properties of the Barabási-Albert model
For the BA model, Fronczak et al. (2003) derived the clustering coefficient C(v_s) of the node born at step s, measured t steps after the start (based on m initial nodes); the expression involves the terms ln²(t) and ln²(s), and the average clustering coefficient decays as (ln t)²/t.

For the BA model with m initial nodes, the average path length is

  L = [ln(n) − ln(m/2) − 1 − γ] / [ln ln(n) + ln(m/2)] + 3/2,

where γ is the Euler constant (approximately equal to 0.5772). Fronczak et al., 2004.

The graph is connected, and the older nodes are richer.

Is it just that the "rich get richer"?
Pareto (the wealth in populations: huge variability).
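The BA growth rule above (m links per new node, targets chosen with probability proportional to degree) can be simulated in a few lines of pure Python; n = 2000, m = 3 and the seed are arbitrary illustrative choices, not part of the original model specification:

```python
import random

# A minimal sketch of the BA model described above (not the authors' code).
# Each node appears in `endpoints` once per incident edge, so a uniform
# choice from `endpoints` is a degree-proportional (preferential) choice.
def barabasi_albert(n, m, seed=0):
    rng = random.Random(seed)
    targets = list(range(m))      # the m initial nodes
    endpoints, edges = [], []
    for new in range(m, n):
        for t in set(targets):    # duplicates collapse: occasionally < m links
            edges.append((new, t))
            endpoints.extend((new, t))
        targets = [rng.choice(endpoints) for _ in range(m)]
    return edges

edges = barabasi_albert(n=2000, m=3)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1

# "The older are richer": d_i(t) = m*(t/i)**0.5 predicts that early nodes
# collect far more links than late arrivals.
oldest = sum(degree[i] for i in range(10)) / 10
newest = sum(degree[i] for i in range(1990, 2000)) / 10
print(oldest, newest)
```

With these parameters the ten oldest nodes typically average tens of links while the newest average about m, in line with the d_i(t) = m·(t/i)^(1/2) solution.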
Most individuals do not earn very much, but there are rare individuals who earn a substantial part of the total income. Pareto's principle, the "80/20 rule": 20 percent of the people earn 80 percent of the total income.

Zipf (study of the frequencies of occurrence of words in large pieces of text). The most frequent word is about twice as frequent as the second most frequent word, and about three times as frequent as the third most frequent word, etc. In short, with k the rank of a word and f(k) the relative frequency of the k-th most frequent word, f(k) ~ k^(−r), where r is close to 1. This is called Zipf's law.

Lotka (Chemical Abstracts in the period 1901-1916). The number of scientists appearing with 2 entries is close to 1/2² = 1/4 of the number of scientists with just one entry. The number of scientists appearing with 3 entries is close to 1/3² = 1/9 times the number of scientists appearing with 1 entry, etc. Again, with f(k) denoting the number of scientists appearing in k entries, f(k) ~ k^(−r), where r is close to 2. This is dubbed Lotka's law.

Simon (the "Gibrat" principle). Gibrat argued that the proportional change in firm size is the same for all firms in an industry.

Merton ("Matthew effect"). "For everyone who has will be given more, and he will have an abundance. Whoever does not have, even what he has will be taken from him."

Price ("cumulative advantage"). In citation networks, newly arriving nodes tend to connect to already well-connected nodes rather than poorly connected nodes.

Example
Barabási, Albert and Jeong investigate the scale-free nature of the WWW and propose a preferential attachment model for it. In the proposed model for the WWW, older vertices tend to have the highest degrees. Different realizations of the model, (a), (b), (c), have a power-law ρ(x) with exponent 2.5, 3 and 4 respectively. On the WWW this is not necessarily the case, as Adamic and Huberman demonstrate.
For example, Google is a late arrival on the WWW, but has nevertheless managed to become one of the most popular web sites. Realization (d) has ρ(η) = exp(−η) and a threshold rule. A possible fix for this problem is given by G. Bianconi and A.-L. Barabási through a notion of fitness of the vertices, which enhances or decreases their preferential power. Under that assumption, the probability that a node receives a link depends not only on its "age" but also on another parameter η_i, whose value is drawn from some distribution (in some other models there is fitness but no preferential attachment):

  Π_i = η_i·d_i / Σ_j η_j·d_j.

History on growth and preferential attachment (from Caldarelli)

Kleinberg Copying Model
Consider the WWW. What is the "microscopic" process of growth? Two steps:
1. GROWTH: at every time step, copy a vertex and its m links.
2. MUTATION (for every one of the m links):
 - with probability (1 − α) keep it;
 - with probability α change the destination vertex.
Then the rate of change of the degree of a node is

  ∂d/∂t = [(1 − α)·d + α·m] / n,

with n ≈ t the current number of nodes.

Searching in Networks
[Figure, from Adamic: searching in a power-law (scale-free) graph; number of nodes found per step: 94, 67, 63, 54, 6, 2, 1.]

Vulnerability
Complex systems maintain their basic functions even under errors and failures (cell mutations; Internet router breakdowns).

Searching in Networks
[Figure, from Adamic: searching in a Poisson graph; number of nodes found per step: 93, 19, 15, 11, 7, 3, 1.]
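The two copying-model steps above can be sketched as follows (pure Python; the parameters n = 3000, m = 3, α = 0.2 and the tiny seed graph are illustrative assumptions):

```python
import random

# Sketch of the copying model described above: each new node copies the m
# out-links of a random existing prototype; each copied link is kept with
# probability 1 - alpha and rewired to a uniformly random node otherwise.
def copying_model(n, m=3, alpha=0.2, seed=1):
    rng = random.Random(seed)
    # seed graph: m+1 nodes, each linking to the m others
    out = {i: [j for j in range(m + 1) if j != i] for i in range(m + 1)}
    for new in range(m + 1, n):
        proto = rng.randrange(new)
        links = []
        for target in out[proto]:
            if rng.random() < alpha:
                target = rng.randrange(new)   # mutation step
            links.append(target)
        out[new] = links
    return out

out = copying_model(3000)
indeg = {}
for links in out.values():
    for t in links:
        indeg[t] = indeg.get(t, 0) + 1

# Copying implements preferential attachment implicitly: a node with high
# in-degree sits at the end of many copyable links, so the in-degree
# distribution develops a heavy tail.
print(max(indeg.values()), sum(indeg.values()) / len(indeg))
```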
A scale-free network is thus seen to be sensitive to a targeted attack, but just as robust as an ER random graph in the case of a random attack. (Scale-free example from Adamic: targeting and removing hubs can quickly break up the network.)

Three plots of the same power law (Willinger et al., 2005)

The CCDF plot of some distributions (from Caldarelli's lectures in complex networks)

Power Law - Lognormal (Mitzenmacher, 2004)
Nevertheless, a power law with preferential attachment is neither the only way to describe real events concerning the growth of some living organism, nor the only distribution that can be characterized as scale-free. The lognormal distribution, beyond the applications found in biological networks and other disciplines (where multiplicative processes take place), may in the case of sufficiently large variance σ² appear to behave like a scale-free distribution over a large range of values:

  f(d) = 1/(d·σ·√(2π)) · exp(−(ln d − μ)² / (2σ²)),

and taking logarithms:

  ln f(d) = −ln d − ln(σ√(2π)) − (ln d − μ)²/(2σ²)
          = −(ln d)²/(2σ²) + (μ/σ² − 1)·ln d − μ²/(2σ²) − ln(σ√(2π)).

When σ² is large, the quadratic term (ln d)²/(2σ²) is negligible over a wide range of d, and the remaining terms are linear in ln d, so the plot looks like a power law with exponent μ/σ² − 1. Moreover, in many cases there is no distinct way to decide which distribution fits the real data best (especially when one uses the complementary cumulative frequency distribution).

Power Law - Lognormal (Mitzenmacher, 2004)
Power-law distributions arise from multiplicative processes once the observation time is random or a lower boundary is put into effect, while the lognormal arises when there is no lower boundary on the data (in contrast with the power law, it behaves more realistically in the left tail of the distribution). Another case is the double Pareto distribution, which joins two different power laws (something observed in collaboration networks).

[Figure, from Staša Milojević, 2010, with logarithmic binning: a lognormal and a power law, each fitted by linear regression.]

Power Law Distribution (David Kempe)
Power Law Distribution in WWW (David Kempe)
Power Law Distribution (Clauset et al.)

Supplementary Material

Markov's Inequality
Let X be a nonnegative random variable and t a positive number. Then:

  P(X ≥ t) ≤ E[X]/t.

(Use of the inequality.) For t = 1 and X integer-valued: P(X ≥ 1) ≤ E[X]; hence if E[X] → 0 as n → ∞, then P(X = 0) → 1.

Let X be the number of triangles in G ~ G(n, p). Then X = Σ_{S⊂V, |S|=3} X_S, where X_S = 1 if G[S] is a triangle and X_S = 0 otherwise. Thus:

  E[X] = C(n,3)·p³ = [n(n−1)(n−2)/6]·p³ ≤ (np)³/6.

So if np → 0 (i.e. p = o(1/n)), then E[X] → 0 and almost every G(n, p) contains no triangle.

Chebyshev's Inequality
Let X be a random variable and t a positive number. Then:

  P(|X − E[X]| ≥ t) ≤ V[X]/t².

(Use of the inequality.) For t = E[X] > 0: P(X = 0) ≤ P(|X − E[X]| ≥ E[X]) ≤ V[X]/E[X]²; hence if V[X]/E[X]² → 0 as n → ∞, then P(X ≥ 1) → 1.

For the number of triangles X in G ~ G(n, p):

  V[X] = Σ_S Σ_T Cov(X_S, X_T), with Cov(X_S, X_T) = E[X_S·X_T] − E[X_S]·E[X_T],

while for an indicator variable V[X_S] = E[X_S] − E[X_S]² = p³(1 − p³). If S and T have 0 or 1 node in common, X_S and X_T are independent, so Cov(X_S, X_T) = p⁶ − p⁶ = 0. If |S ∩ T| = 2, then Cov(X_S, X_T) = p⁵ − p⁶ ≤ p⁵. Hence:

  V[X] ≤ C(n,3)·p³ + C(n,3)·3(n−3)·p⁵,

so V[X]/E[X]² = O(1/(np)³) + O(1/(n²p)) → 0 whenever np → ∞. Therefore if np → ∞, then P(X = 0) → 0 and G(n, p) almost surely has at least one triangle.
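The first-moment computation above, E[X] = C(n,3)·p³ for the number of triangles in G(n, p), can be verified by simulation; n = 30, p = 0.2 and the number of runs are arbitrary illustrative choices:

```python
import itertools
import math
import random

# Monte Carlo check of E[X] = C(n,3) * p**3 for triangles in G(n, p).
def triangles_in_gnp(n, p, rng):
    adj = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            adj[i][j] = adj[j][i] = rng.random() < p
    return sum(1 for a, b, c in itertools.combinations(range(n), 3)
               if adj[a][b] and adj[b][c] and adj[a][c])

rng = random.Random(42)
n, p, runs = 30, 0.2, 200
mean = sum(triangles_in_gnp(n, p, rng) for _ in range(runs)) / runs
expected = math.comb(n, 3) * p ** 3          # 4060 * 0.008 = 32.48
print(round(mean, 1), expected)
```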
Power Law Distribution (Clauset et al.)
Power Law and other Distributions (Clauset et al.)

Off-line and on-line models
The random graph models mentioned above are off-line, or static: all nodes are established at the same time. Distances, the clustering coefficient, connectedness, etc. depend on the order n of the network as well as on the probability p(n) of the existence of each link. Since real-world graphs change dynamically, both adding and deleting links and nodes, or in other words grow over time, there are several on-line random graph models in which the probability spaces change at every tick of the clock. In fact, in the study of complex real-world graphs, the on-line model came to attention first.

An on-line random graph is a graph which grows in size over time, according to given probabilistic rules, starting from a start graph G₀. One can make statements about these graphs in the limit of long time (and hence large vertex number n, as n approaches infinity). Web graphs are on-line models which try to explain the evolution of the entire complex system. Additionally, the topology of the network does not depend on n (self-similarity).

Preferential Attachment BA model (David Kempe)

The distribution of the eigenvalues of a Random Graph
The average degree of G is d̄ = 2m/n, and the density (1st definition) is ρ(G) = m/n. A random graph has n real eigenvalues λ₁ ≤ λ₂ ≤ ... ≤ λₙ (its spectrum) and n orthonormal eigenvectors forming a basis of ℝⁿ.

Balanced and Unbalanced Subgraphs
A graph G is balanced if the average degree of any subgraph H ⊆ G does not exceed the average degree of G. Then:

Let H be a non-empty balanced graph with k nodes and l links. Then t(n) = n^(−k/l) is a threshold function for the property that G contains H as a subgraph.

For unbalanced graphs: let H be an unbalanced graph and let m(H) = max{ρ(F) : F ⊆ H}. Then t(n) = n^(−1/m(H)) is a threshold function for the property that G contains H as a subgraph.

We have seen that eigenvalues are related to certain graph properties (walks of a given length, average degree, chromatic number, subgraphs, ...), while each eigenvector represents a weight function on the nodes of the graph. The spectral density can be defined as

  ρ(λ) = (1/n)·Σ_{1≤i≤n} δ(λ − λᵢ),

where δ(λ − λᵢ) = 1 for λ = λᵢ and 0 otherwise.

The distribution of the eigenvalues of a Random Graph
When G is a random graph with p << n^(−1), computer simulations show that the largest eigenvalue λₙ is far apart from the other n−1 eigenvalues and, ignoring it, the shape of the empirical distribution function tends to a "semicircle". In the figure the greatest eigenvalue is left out; values on the x-axis are scaled by 1/σ and values on the y-axis by σ, where

  σ² = [1/(n−1)]·Σ_{1≤i≤n−1} (λᵢ − λ̄)²,  λ̄ = [1/(n−1)]·Σ_{1≤i≤n−1} λᵢ.

When G is a random graph with p > n^(−1) and n → ∞, a giant component emerges. Then ρ(λ) approaches a continuous function. Moreover, when p > log n/n almost every random graph is connected. In this case the spectral density converges to the semicircular distribution

  ρ(λ) = √(4np(1−p) − λ²) / (2π·np(1−p))  if |λ| < 2√(np(1−p)),  and 0 otherwise.

Also, the largest eigenvalue is isolated from the rest of the spectrum, and it increases as pn. Again, peaks have been observed in the empirical distribution (previous slide), which, it is argued, are due to trees grafted onto the giant component.
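The semicircle law and the isolated largest eigenvalue quoted above are easy to observe numerically; a sketch with NumPy, with n = 1000 and p = 0.05 chosen to match the G(1000, 0.05) examples used later in the slides:

```python
import numpy as np

# Spectrum of a G(n, p) adjacency matrix: the bulk follows the semicircle
# of radius 2*sqrt(n*p*(1-p)), while the largest eigenvalue sits near p*n.
rng = np.random.default_rng(0)
n, p = 1000, 0.05
upper = np.triu(rng.random((n, n)) < p, 1)   # random upper triangle
A = (upper + upper.T).astype(float)          # symmetric 0/1 adjacency matrix
eig = np.linalg.eigvalsh(A)                  # sorted ascending

radius = 2 * np.sqrt(n * p * (1 - p))
print(eig[-1], n * p)                   # isolated principal eigenvalue ~ p*n
print(np.abs(eig[:-1]).max(), radius)   # bulk confined to the semicircle
```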
Investigations show that the peaks are due to small connected components, which is also the case when pn >> 1. Moreover, the odd moments tend to 0, showing the non-existence of odd cycles and the existence of trees.

Eigenvectors - the inverse participation ratio
Each eigenvector w_j = (w_{j,1}, w_{j,2}, ..., w_{j,n})ᵀ can be interpreted as a weight function w(i) on the nodes of the graph. When the eigenvectors are normalized, the inverse participation ratio of the j-th eigenvector is defined as

  I_j = Σ_{1≤k≤n} (w_j)_k⁴.

In the extreme cases: if the weights are uniform, (w_j)_k = 1/√n for all k, then I_j = 1/n, while if (w_j)_k = 0 except for k = s, 1 ≤ s ≤ n, where (w_j)_s = 1, then I_j = 1. A characteristic of random graphs is that the scatterplot of the inverse participation ratio against the corresponding eigenvalues shows values evenly distributed in a small bandwidth.

HITS Algorithm
(Example from Dorogovtsev and Mendes, 2001: queries "Yamaha", "Toyota", "google"; red: best authority, black: best hub.)
The authority scores of the vertices are defined as the principal eigenvector of AᵀA, where A is the adjacency matrix of the graph. The hub scores of the vertices are defined as the principal eigenvector of A·Aᵀ.

PageRank Algorithm
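The HITS definition above can be sketched by power iteration on a small link matrix (the 4-page graph is made up for illustration):

```python
import numpy as np

# HITS as defined above: authorities = principal eigenvector of A^T A,
# hubs = principal eigenvector of A A^T, computed by power iteration.
A = np.array([[0, 1, 1, 0],    # A[i, j] = 1 if page i links to page j
              [0, 0, 1, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 0]], dtype=float)

hubs = np.ones(4)
for _ in range(50):
    auth = A.T @ hubs          # good authorities are cited by good hubs
    hubs = A @ auth            # good hubs cite good authorities
    auth /= np.linalg.norm(auth)
    hubs /= np.linalg.norm(hubs)

# Page 2 is linked to by pages 0, 1 and 3, so it is the best authority;
# page 0 links to two strong authorities, so it is the best hub.
print(int(np.argmax(auth)), int(np.argmax(hubs)))
```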
Graph spectra (adjacency matrix) - histogram
(Red: the best authority.) N = 100, p = 0.05 and N = 300, p = 0.05. The largest (principal) eigenvalue λ₁ is isolated from the bulk of the spectrum, and it increases with the network size as pN.

Eigenvector centrality, G(1000, 0.05): what is the distribution of eigenvector centrality?
Closeness centrality, G(1000, 0.05): what is the distribution of closeness centrality?
Betweenness centrality, G(1000, 0.05): what is the distribution of betweenness centrality?

Regular Random Graphs
A way to define regular random graphs of degree r on n vertices, G_{n,r-reg}, is to consider all possible perfect matchings in a complete graph K_{rn}, where each matching is chosen with equal probability. A regular random graph is then obtained after partitioning the set of rn vertices into n subsets, each one containing r nodes, and contracting each subset to a single vertex. A G(50, 3-reg) is presented at the left.

Theorem. Let r ≥ 3 and ε > 0 be fixed. Then, for n → ∞:

  P( (1−ε)·log n/log(r−1) ≤ L(G_{n,r-reg}) ≤ (1+ε)·log n/log(r−1) ) → 1.

The drawn G_{50,3-reg} has diameter 7, while log 50/log(3−1) ≈ 5.6.

Random Graphs for General Degree Distributions
The following can be proved (under some more conditions; Molloy and Reed, 1995). Given a sequence of nonnegative real numbers λ₀, λ₁, ..., λₙ which sums to 1, a random graph having approximately λᵢ·n nodes of degree i will have a giant component for n → ∞ if the maximum degree is at most n^(1/4 − ε) and

  Σ_{i≥1} i·(i−2)·λᵢ > 0.

Directed Random Graphs
Let D(n, m) be a directed graph chosen at random from the set D = {D(n, m)} of all directed graphs with n nodes and m links; there are C(n(n−1), m) of them. In this case it is more convenient to prove the results using graph processes (Markov processes), where D(n, m) is the (m+1)-th stage of a graph process starting from D(n, 0), which has no links.

The following theorem can be proved. Let ε > 0 be a positive constant and let ω(n) → ∞ (like log n, or log log n).
Then:
(i) if ε < 1, in almost every random digraph each nontrivial component (i.e. a component containing more than one vertex) of D(n, (1−ε)n) is a cycle smaller than ω;
(ii) there is a positive constant α = α(ε) such that D(n, (1+ε)n) almost surely contains a component of order larger than αn, and every other nontrivial component of D(n, (1+ε)n) is a cycle smaller than ω;
(iii) if m/n → ∞, then almost surely D(n, m) contains a unique nontrivial component of size (1 − o(1))n.

Conversely, in the Molloy-Reed setting, if the maximum degree is at most n^(1/8 − ε) and Σ_{i≥1} i·(i−2)·λᵢ < 0, there is no giant component. The problem then is to generate a random graph with the given degree sequence.

Random Graphs for General Degree Distributions
Hence the configuration model also shows a phase transition, similar to that of the Bernoulli random graph, at which a giant component forms. Instead of a rigorous proof, consider a set of connected nodes and the "boundary nodes" that are immediate neighbors of that set. Grow that set by adding the boundary nodes to it one by one. When we add one such node, the number of boundary nodes goes down by 1. However, the number of boundary nodes also increases by the number of the new node's other neighbors, which is dᵢ − 1. Thus the total change in the number of boundary nodes is −1 + (dᵢ − 1) = dᵢ − 2. Moreover, the probability that a particular node is a boundary node is proportional to dᵢ, since there are dᵢ times as many edges by which a node of degree dᵢ could be connected to our set as for a node of degree 1. Therefore the expected change in the number of boundary nodes per step is proportional to Σ_{i≥1} i·(i−2)·λᵢ, which has to be > 0 in order for a giant component to exist (otherwise the boundary is exhausted very quickly).

How can we randomize a network while preserving the degree distribution?
Stub reconnection algorithm (M. E.
Newman et al., 2001; also known in the mathematical literature since the 1960s):
 - break every edge into two "edge stubs": A-B becomes A- and -B;
 - randomly reconnect the stubs.
Problems: this leads to multiple edges, and it cannot be modified to preserve additional topological properties. (From Adamic.)

Local rewiring algorithm
Randomly select two edges and rewire them (Maslov, Sneppen, 2002; also known in the mathematical literature since the 1960s). Repeat many times. This preserves both the number of upstream and the number of downstream neighbors of each node. (From Adamic.)

Another common distribution: power law with an exponential cutoff,

  p(x) ~ x^(−α)·e^(−x/κ),

which starts out as a power law but is eventually suppressed by the exponential cutoff; empirically it could also be a lognormal or a double exponential... [Figure, from Adamic: log-log plot of p(x) over x from 10⁰ to 10³, with p(x) falling from 10⁰ to below 10^(−15).]
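The local rewiring step described above can be sketched as a double edge swap that rejects self-loops and multiple edges; the small example graph is a made-up illustration:

```python
import random

# Degree-preserving rewiring (Maslov-Sneppen style double edge swap):
# pick edges (a, b) and (c, d), replace them by (a, d) and (c, b),
# rejecting swaps that would create a self-loop or a duplicate edge.
def rewire(edges, n_swaps, max_attempts=10000, seed=0):
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue                              # would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue                              # would create a multiple edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges

def degrees(edges):
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

graph = [(0, 1), (0, 2), (0, 3), (3, 4), (4, 5)]
randomized = rewire(graph, n_swaps=3)
print(degrees(randomized) == degrees(graph))   # every node keeps its degree
```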
Example on an artificially generated data set
Take 1 million random numbers from a distribution with γ = 2.5. They can be generated using the so-called "transformation method": generate random numbers r on the unit interval 0 ≤ r < 1; then

  x = (1 − r)^(−1/(γ−1))

is a random power-law-distributed real number in the range 1 ≤ x < ∞.

Linear-scale plot of straight binning of the data
How many times did the number 1, or 3843, or 99723 occur? The power-law relationship is not apparent; it only makes sense to look at the smallest bins. [Histograms, from Adamic: frequency against integer value, for the whole range and for the first few bins.]

Log-log scale plot of straight binning of the data
The same bins, but plotted on a log-log scale. Here we have tens of thousands of observations when x < 10, but noise in the tail: only 0, 1 or 2 observations for values of x > 500. (We actually don't see the zero values at all, because log(0) is undefined.) Fitting a straight line to this plot via least-squares regression will give values of the exponent that are too low. [Plots, from Adamic: true vs. fitted line.]
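The transformation method quoted above is one line of code; a sketch (sample size and seed are arbitrary):

```python
import random

# Transformation method: r uniform on [0, 1) maps to a power-law sample
# x = (1 - r)**(-1/(gamma - 1)) with density ~ x**-gamma on [1, inf).
gamma = 2.5
rng = random.Random(0)
xs = sorted((1 - rng.random()) ** (-1 / (gamma - 1)) for _ in range(100001))

# Sanity checks: the support starts at 1, and the sample median should sit
# near the theoretical median 2**(1/(gamma - 1)) = 2**(2/3) ~ 1.587.
print(xs[0], xs[len(xs) // 2])
```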
What goes wrong with straightforward binning: the noise in the tail skews the regression result; the fit gives γ = 1.6 instead of the true 2.5. [Plot, from Adamic.]

First solution: logarithmic binning
Bin the data into exponentially wider bins, 1, 2, 4, 8, 16, 32, ..., and normalize each count by the width of its bin. The data points are then evenly spaced on the log scale and there is less noise in the tail of the distribution; the fit gives γ = 2.41, much closer to the actual 2.5. Disadvantage: binning smooths out the data but also loses information. [Plot, from Adamic.]

Second solution: cumulative binning
No need to bin: the cumulative distribution has a value at each observed value of x, so no information is lost. Now we have the cumulative distribution, i.e. how many of the values of x are at least X. The cumulative distribution of a power-law probability distribution p(x) = c·x^(−γ) is also a power law, but with exponent γ − 1:

  P(X ≥ x) = ∫ₓ^∞ c·t^(−γ) dt = [c/(γ−1)]·x^(−(γ−1)).

Fitting via regression to the cumulative distribution gives a fitted exponent γ − 1 = 1.43, i.e. a fitted γ of 2.43, much closer to the actual 2.5. [Plot, from Adamic.]

Example: where to start fitting?
Some data exhibit a power law only in the tail. After binning or taking the cumulative distribution you can fit to the tail, so you need to select an x_min, the value of x where you think the power law starts. Certainly x_min needs to be greater than 0, because x^(−γ) diverges at x = 0. Example: the distribution of citations to papers is a power law only in the tail (x_min > 100 citations). [Plot, from Adamic.]
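Both fixes can be sketched on synthetic data with a known exponent (γ = 2.5, generated by the transformation method described earlier; sample size, bin scheme and subsampling step are arbitrary choices):

```python
import math
import random

# (a) logarithmic binning and (b) the CCDF, each followed by a least-squares
# fit of the exponent on log-log axes.
gamma = 2.5
rng = random.Random(1)
xs = sorted((1 - rng.random()) ** (-1 / (gamma - 1)) for _ in range(200000))

def fit_slope(points):
    # ordinary least squares on (log x, log y)
    lx = [math.log(x) for x, _ in points]
    ly = [math.log(y) for _, y in points]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    return (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
            / sum((a - mx) ** 2 for a in lx))

# (a) bins [1,2), [2,4), [4,8), ... with counts normalized by bin width
bins, lo = [], 1.0
while lo < xs[-1]:
    hi = 2 * lo
    count = sum(1 for x in xs if lo <= x < hi)
    if count:
        bins.append((math.sqrt(lo * hi), count / (hi - lo)))
    lo = hi
gamma_log = -fit_slope(bins)

# (b) empirical CCDF P(X >= x): a power law with exponent gamma - 1
ccdf = [(xs[i], (len(xs) - i) / len(xs)) for i in range(0, len(xs), 1000)]
gamma_ccdf = 1 - fit_slope(ccdf)

print(gamma_log, gamma_ccdf)   # both should land near 2.5
```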
Maximum likelihood fitting - best
You have to be sure you have a power-law distribution (this will just give you an exponent, but not a goodness of fit):

  γ = 1 + n·[ Σ_{i=1}^{n} ln(xᵢ/x_min) ]^(−1),

where the xᵢ are all your data points and you have n of them. For our data set we get γ = 2.503, pretty close! (From Adamic.)
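The estimator above can be checked on synthetic data with a known exponent (γ = 2.5, x_min = 1; the transformation method from the earlier slide generates the sample):

```python
import math
import random

# Maximum likelihood estimate gamma = 1 + n / sum(ln(x_i / x_min)).
gamma, xmin = 2.5, 1.0
rng = random.Random(2)
xs = [xmin * (1 - rng.random()) ** (-1 / (gamma - 1)) for _ in range(100000)]

gamma_hat = 1 + len(xs) / sum(math.log(x / xmin) for x in xs)
print(gamma_hat)   # close to the true 2.5
```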