Sapienza, Università di Roma
Dottorato di Ricerca in Computer Science, XXIII Ciclo – 2010

Algorithms and models for social networks

Silvio Lattanzi

Thesis Committee: Prof. Alessandro Panconesi (Advisor), D. Sivakumar, Prof. Angelo Monti

Author's address: Silvio Lattanzi, Computer Science Department, Sapienza, University of Rome, Via Salaria 113, 00198 Rome, Italy. e-mail: [email protected] www: http://sites.google.com/site/silviolattanzi/

To Angela and Barbara

Abstract

The coming-together of the Internet and large-scale social networks (e.g., Flickr, Facebook, MSN, Bebo, Twitter) is having a deep and fruitful impact on the study of social networks. Thanks to the amount of data available nowadays, it is now possible to observe social phenomena with greater precision and to study their evolution at a relatively fine temporal scale. These opportunities, together with the increasing economic importance of the Internet economy, create new interest in studying networks as a unifying theme of research across computer science, economics, sociology and the biological sciences. In this context, fully characterizing the statistical properties of these graphs and finding suitable stochastic models for them is now a central problem in theoretical and experimental computer science. Within this umbrella, our research focus is to develop mathematical models of behavioral social networks and to study the performance of algorithms on real-world graphs from both a theoretical and a practical point of view. More specifically, we first give a new model that explains the evolving properties of social networks and we analyze its algorithmic implications. Then we consider the diffusion of information in real networks and we give an explanation of rumor spreading based on a statistical property of social networks, namely the conductance.
Finally, we study the compressibility of the World Wide Web and of its models, and we use our findings to design an algorithm to compress other social networks.

Acknowledgments

I owe my deepest gratitude to Prof. Alessandro Panconesi for his great guidance and advice. He inspired me to work in this field and continuously transmitted excitement about research. Without his suggestions and help this thesis would not have been possible. I would like to thank Flavio Chierichetti for the many insights he shared with me and for the great time we spent together. I am grateful to my host at Yahoo! Research, Ravi Kumar, and my host at Google, D. Sivakumar, for the invaluable ideas they shared with me, for suggesting many wonderful problems and for working with me on them. I would like to thank Lorenzo Alvisi, Amitanand Aiyer, Allen Clement, Rafael Frongillo, Federico Mari, Michael Mitzenmacher, Ben Moseley, Prabhakar Raghavan, Siddarth Suri, Sergei Vassilvitskii and Andrea Vattani for being wonderful coauthors and great coworkers. I am grateful to Benjamin Doerr, Kevin McCurley, Patrick Nguyen and Mark Sandler for numerous insightful comments and discussions. I am indebted to many of my university colleagues for supporting me during my PhD. A special thanks goes to Federico Mari, Francesco Davì, Emanuele Fusco, Massimo Lauria, Gaia Maselli, Igor Melatti, Simone Silvestri, Blerina Sinaimeri and Julinda Stefa. I would like to show my gratitude to my family for their incredible support, help and understanding.
Finally, I would like to thank Livio Romano, Stefano Pozio, Alessandro D'Amico, Valeria Morra, Delia De Siervo, Simona Tramontana, Guido Bolognesi, Donna Alvarez, Riccardo Romano, Gabriele Carracoy, Paolo Pino, Matteo Bonavia, Barbara Lattanzi, Giancarlo Capezzuoli, Gianluca Gallo, Stefano Paoletti, Gioacchino Mendola, Andrea Giannantonio, Alessandro Bonelli, Amit Lavy, Sascha Trifunovic and Alex Loddengard for all the wonderful experiences we had together in the past three years.

Contents

1 Introduction
  1.1 Roadmap
2 Affiliation networks
  2.1 Introduction
  2.2 Our model
  2.3 Preliminaries
    2.3.1 Concentration Theorems
  2.4 Degree distribution of B(Q, U)
    2.4.1 Lipschitz condition for the random variable Xti
  2.5 Properties of the degree distribution of B(Q, U)
  2.6 Properties of the degree distributions of the graphs G(Q, E) and Ĝ(Q, Ê)
  2.7 Densification of edges
  2.8 Shrinking/stabilizing of the effective diameter
  2.9 Sparsification of G(Q, E)
    2.9.1 Sparsification with preservation of the distances from a set of relevant nodes
    2.9.2 Sparsification with a stretching of the distances
  2.10 Flexibility of the model
3 Navigability of Affiliation Networks
  3.1 Introduction
  3.2 Our model
  3.3 Preliminaries
    3.3.1 Concentration Theorems
  3.4 Properties of the model
  3.5 The crucial role of weak ties
  3.6 Local routing and the interest space
  3.7 Experiments
4 Gossip
  4.1 Introduction
  4.2 Related work
  4.3 Preliminaries
  4.4 Warm-up: a weak bound
  4.5 A tighter bound
  4.6 Push and Pull by themselves
  4.7 Optimality of Corollary 6.5.1
5 Compressibility of the Web graph
  5.1 Overview
  5.2 Preliminaries
  5.3 Incompressibility of the existing models
    5.3.1 Proving incompressibility
    5.3.2 Incompressibility of the preferential attachment model
    5.3.3 Incompressibility of the ACL model
    5.3.4 Incompressibility of the copying model
    5.3.5 Incompressibility of the Kronecker multiplication model
    5.3.6 Incompressibility of Kleinberg's small-world model
  5.4 The new web graph model
  5.5 Rich get richer
  5.6 Long get longer
  5.7 Compressibility of our model
  5.8 Other properties of our model
    5.8.1 Bipartite cliques
    5.8.2 Clustering coefficient
    5.8.3 Undirected diameter
6 Compressibility of social networks
  6.1 Introduction
  6.2 Related work
  6.3 Compression Schemes
    6.3.1 BV compression scheme
    6.3.2 Backlinks compression scheme
  6.4 Compression-friendly orderings
    6.4.1 Formulation
    6.4.2 Hardness results
  6.5 MLogA vs. MLinA vs. MLogGapA
  6.6 Hardness of MLogA
  6.7 Hardness of MLinGapA
  6.8 Lower bound: MLogA for expanders
    6.8.1 The shingle ordering heuristic
    6.8.2 Properties of shingle ordering
  6.9 Experimental results
    6.9.1 Data
    6.9.2 Baselines
    6.9.3 Compression performance
    6.9.4 Temporal analysis
    6.9.5 Why does shingle ordering work best?
    6.9.6 A cause of incompressibility

Chapter 1

Introduction

Over the past decade, the idea of networks as a unifying theme to study how social, technological, and natural systems are connected has emerged as an important and vibrant direction within computer science, biology, economics and sociology. Indeed, the growth and relevance of Internet-based networks has drawn the interest of many researchers to this new topic. Furthermore, thanks to the new technologies and the computing power available nowadays, it is possible for the first time to analyze, discover and explain the main properties of those networks.

Historically, sociological, economic and biological studies on the structure and evolution of networks were based only on local information. Thus their main results focused on the dynamics of small chunks of a network. They were based on small communities (the usual size of an analyzed network was about a hundred individuals) and consisted of just a few trials. Due to this lack of data it was hard to obtain any result on the global structure of networks, or at least to verify such results rigorously.
Furthermore, it was hard to analyze the interactions between nodes and communities on meaningful data-sets, while, as pointed out by Watts in [104], this information is crucial to understanding the structure and the dynamics of real networks.

In the Nineties, with the introduction of the Internet and the World Wide Web, it became possible for the first time to observe the dynamics and the evolution of these networks on a large scale. In addition, with the birth of Web 2.0 and the consequent introduction of the blogosphere and social networks, it is now possible to access large data-sets. These new opportunities, plus the computing power available nowadays, generated new interest in analyzing common patterns in real networks. These studies led to the conclusion that different kinds of networks share some macroscopic statistical properties. For example, studying the World Wide Web, the Internet, protein interaction graphs, Facebook and several other real graphs, it has been noticed that all of them have similar degree distributions and similar community structure. At the same time, the increasing economic relevance of the Internet-based economy and the proliferation of new social networks create an urge to study this interesting class of graphs more deeply, in order to design more efficient algorithms for them. In this new area of research three main trends arise independently:

• Statistical analysis of the data. In order to have a better understanding of the behavior of social networks, it is crucial to analyze the available data-sets and to find common patterns. The main challenge here is to cope with the size of the data-sets (often on the order of billions of nodes) and with the technical, bureaucratic and sometimes ethical issues involved in retrieving the data.

• Modeling the dynamics of social networks. The modeling effort goes hand in hand with the discovery of new statistical patterns in the data.
After the discovery of the first static properties of real networks, it was clear that existing models failed to explain several properties observed in real networks. So recently there has been a lot of effort to come up with models for social networks that can explain such properties mathematically.

• Analysis of algorithms for social networks. Once it is possible to describe and model this class of graphs, it becomes of great interest to analyze the performance of known and new algorithms on them.

In this context, the thesis focuses on developing mathematical models of behavioral social networks and studying the performance of algorithms on real-world graphs from both a theoretical and a practical point of view. In particular, we address the following fundamental questions:

• Can we formulate a stochastic random process that matches all the known static properties and the evolving properties of social networks? Can we use the new model to develop efficient algorithms?

• Can we build a model that explains both the statistical properties and the local routing properties of social networks?

• Can we explain the fast diffusion of information in social networks?

• Can we explain the compressibility of the Web? Can we achieve the same compression rate for other social graphs?

In the following sections we give a more precise definition of the problems studied in the thesis and an overview of our main contributions.

1.1 Roadmap

In this section we introduce our main results and give an overview of the organization of the thesis.

Affiliation Networks [Chapter 2]

As outlined in the previous section, the problem of finding a suitable stochastic process to explain the properties of social networks has attracted a lot of attention from the theory community. In 2005 Leskovec et al., in a breakthrough paper [69], studied for the first time the evolving properties of social networks.
The authors analyzed the behavior over time of the following social graphs: the ArXiv citation graph, the patent citation graph, the autonomous system graph and a few co-authorship networks. In particular, they focused their attention on the average degree and the diameter of the graph over time, with surprising results. The common wisdom before their work was that the average degree is constant in time and that the diameter grows slowly in time. Instead, they observed that the average degree is actually growing in time and, even more surprisingly, that the diameter of social networks tends to shrink and finally stabilize over time. These findings have several interesting implications for social network analysis and immediately invalidate all the previously known models for social networks.

In [67], we presented a new model that explains all the static and evolving properties of social networks, including densification and shrinking diameter. A nice aspect of our model is that it is based on the coevolution of a social network and an affiliation network, that is, a bipartite graph that captures the connections between people and interests. This idea has strong sociological roots and appeared for the first time in the groundbreaking work of Breiger [18]. More precisely, in our model there are two graphs that co-evolve in time: a bipartite graph on people and interests, and a social graph on people. Initially we start with a bipartite graph of people and interests, and a people graph with the property that if two people share an interest in the bipartite graph they are friends in the people graph. In every time step we add an interest to the bipartite graph with probability α, or a person to both graphs with probability 1 − α (see Figure 1.1).

• When a new interest is added, it selects a prototype and copies a "perturbation" of its edges.
Then the people graph is updated so that if two people share an interest in the bipartite graph they are friends.

• When a new person is added, he or she selects a prototype and connects to a "perturbation" of its neighborhood. Then the people graph is updated so that if two people share an interest in the bipartite graph they are friends. Finally, a constant number of preferential attachment edges going out from the new node in the people network are added.

We will call the two co-evolving graphs the people-interest graph and the friendship graph. In our model there are two kinds of social ties that arise independently: the first comes from a preferential attachment process, while the second comes from the existence of common interests. In this way we are able to combine the intuition of Breiger with the idea of centrality. Using this technique, we can prove formally that our model not only enjoys the usual static properties of real networks, but also the evolving ones observed in [71]. Intuitively, we get the static properties because we use a copying process to generate the bipartite graph, and we obtain the evolving properties because when we add an edge to the bipartite graph we add multiple edges to the people graph. In addition, by combining the effects of densification and of preferential attachment edges, we are able to prove that the effective diameter initially shrinks and then stabilizes in time.

Figure 1.1: Insertion of a new person in the affiliation network and the social network derived from it. (A) The initial affiliation network and the related social graph.
(B) Insertion of P4 in the affiliation network. (C) P4 selects P3 as its prototype. (D) P4 copies a perturbation of the edges of P3. (E) The social graph is updated. (F) P4 adds some preferential attachment edges in the social graph.

Finally, we also analyze some algorithmic consequences of our model. Once we understand the causes of the densification property of social networks, it is natural to ask whether we can produce sparse graphs that preserve the connectivity properties of the initial social network. Specifically, to overcome the difficulties of processing dense graphs, we study the performance of two simple sparsification algorithms in our model. We prove that there are sparsification algorithms that return a graph with a linear number of edges and that approximate all distances with constant distortion.

Navigability and Affiliation Networks [Chapter 3]

One of the main limitations of previously known evolving models is that they are not embedded in any space, so it is impossible to define the concept of local information and to study their navigability. This is not true for the Affiliation Networks model, where, using the concept of interest space, we are able to show that our model is navigable.

Figure 1.2: An affiliation network (A), the induced social network (B) and the hierarchy of interests (C). The dotted line from a to b in (A) represents that b is the prototype of a. In particular, in the figure I1 is the prototype of I2 and I3, and I3 is the prototype of I4. From those relationships we derive the interest tree represented in (C).

An interesting peculiarity of the Affiliation Networks model is that the friendship graph and the people-interest graph co-evolve at the same time. In addition, using the notion of prototype inspired by the copying model, it is possible to define a prototype interest tree, where every interest is connected to its prototype (see Figure 1.2).
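To make the co-evolution dynamics concrete, the following toy sketch simulates a simplified version of the process: with probability α a new interest copies a perturbation of a prototype's members; otherwise a new person copies a perturbation of a prototype's interests; the people graph is then updated so that any two people sharing an interest are friends, and a few degree-proportional (preferential attachment) edges are added. All parameter values and helper names are illustrative assumptions, not the model as analyzed in the thesis.

```python
import random

def evolve(steps, alpha=0.3, keep_prob=0.8, pa_edges=2, seed=0):
    rng = random.Random(seed)
    interests = {0: {0, 1}}            # interest -> set of affiliated people
    people = {0: {0}, 1: {0}}          # person -> set of interests
    friends = {frozenset((0, 1))}      # social graph as a set of edges

    def update_friends(members):
        # two people sharing an interest become friends
        m = list(members)
        for i in range(len(m)):
            for j in range(i + 1, len(m)):
                friends.add(frozenset((m[i], m[j])))

    for _ in range(steps):
        if rng.random() < alpha:
            # new interest copies a perturbation of a prototype's members
            proto = rng.choice(list(interests))
            new_i = max(interests) + 1
            members = {p for p in interests[proto] if rng.random() < keep_prob}
            members.add(rng.choice(list(people)))   # perturbation: one extra member
            interests[new_i] = members
            for p in members:
                people[p].add(new_i)
            update_friends(members)
        else:
            # new person copies a perturbation of a prototype's interests
            proto = rng.choice(list(people))
            new_p = max(people) + 1
            its = {i for i in people[proto] if rng.random() < keep_prob}
            its.add(rng.choice(list(interests)))    # perturbation: one extra interest
            people[new_p] = its
            for i in its:
                interests[i].add(new_p)
                update_friends(interests[i])
            # preferential attachment: endpoints weighted by current degree
            endpoints = [p for e in friends for p in e]
            for _ in range(pa_edges):
                q = rng.choice(endpoints)
                if q != new_p:
                    friends.add(frozenset((new_p, q)))
    return people, interests, friends
```

Note how a single new bipartite edge can create many new friendships at once, which is the intuition behind densification in the model.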
So in our model there are actually three graphs that evolve at the same time: a friendship graph, a people-interest graph and a hierarchy of interests. We refer to the latter as the "interest space". Using this characteristic, we can embed every node of the social graph in the interest space using its interests (i.e., every node is embedded at the positions of its interests, and the distance between two nodes is the minimum distance between any pair of interests of the first and of the second node). Using this definition, it is possible to explain for the first time the "small-world phenomenon" in a model that matches all the static and evolving properties of real-world graphs. Specifically, we prove that in our model, if every node knows the interest space and its neighborhood in the social graph, the greedy local routing algorithm routes a message from any node to any other node in at most polylogarithmically many steps. Furthermore, if the receiver is a high-degree node (i.e., a "hub"), the algorithm uses only a constant number of rounds. One of the most interesting features of our model is that it is the first attempt to build a bridge between the study of the small-world phenomenon and the study of other statistical properties of social networks. Finally, this is the first model that can explain Milgram's experiment if we include the presence of attrition [48], i.e., the unwillingness of people to forward messages.

In order to validate our model we ran a cyber-replica of Milgram's experiment. We performed a series of Milgram's experiments on the network of co-authorship in scientific papers, which naturally lends itself as a test-bed for evaluating our theory of social networks derived from affiliation networks.
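As a purely hypothetical illustration of greedy routing in an interest space, the sketch below places interests in a small prototype tree, gives each person a set of interests, defines the distance between two people as the minimum tree distance between any pair of their interests, and forwards a message greedily. The toy tree and network are invented for illustration; this is not the construction or the analysis from the thesis.

```python
def tree_dist(a, b, parent):
    # distance between two interests in the prototype tree (parent pointers)
    def ancestors(x):
        path, d = {x: 0}, 0
        while x in parent:
            x = parent[x]
            d += 1
            path[x] = d
        return path
    pa, pb = ancestors(a), ancestors(b)
    return min(pa[x] + pb[x] for x in pa if x in pb)

def person_dist(p, q, interests_of, parent):
    # minimum tree distance over all interest pairs of the two people
    return min(tree_dist(a, b, parent)
               for a in interests_of[p] for b in interests_of[q])

def greedy_route(src, dst, neighbors, interests_of, parent, max_steps=50):
    # forward to the neighbor closest to the destination in the interest space
    route, cur = [src], src
    for _ in range(max_steps):
        if cur == dst:
            return route
        cur = min(neighbors[cur],
                  key=lambda v: person_dist(v, dst, interests_of, parent))
        route.append(cur)
    return route

# Toy instance: interest 0 is the root; child -> prototype pointers.
parent = {1: 0, 2: 0, 3: 1}
interests_of = {"a": {3}, "b": {1}, "c": {2}}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
route = greedy_route("a", "c", neighbors, interests_of, parent)
```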
Furthermore, our experiments are the first attempt to make a cyber-replica of Milgram's experiment based on the interest space, and it is also the first time that some rudimentary data mining concepts are used to explore the navigability of social networks. The empirical findings of our experiments confirm Milgram's initial result and give stronger empirical evidence of the reliability of our model.

Gossiping in social networks [Chapter 4]

One of the aims of Milgram's experiment was to show that information can be delivered easily in social networks. A similar question, arising from everyday life, is whether information spreads efficiently in social networks. Indeed, in the real world it is possible to find many examples in which information, viruses or malware spread quickly in social graphs, so it is interesting to understand why this happens.

First, we give an algorithmic formalization of the problem. As a first step we study the well-known randomized broadcast algorithm, also known as rumor spreading. Demers et al. [30] defined three variants of this algorithm: PUSH, PULL and PUSH-PULL. In the PUSH strategy, in each round every informed node selects a neighbor uniformly at random and forwards the message to him or her. PULL is the symmetric variant: in each round, every node that does not yet have the message selects a neighbor uniformly at random and asks for the information. Finally, the PUSH-PULL strategy is a combination of the two: in each round every informed node performs a PUSH and every uninformed node performs a PULL.

Second, instead of giving an arbitrary definition of social networks or analyzing the problem only in some specific model, we study the correlation between information dissemination and the conductance of the underlying network. In [72] it is shown empirically that social networks have high conductance.
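The three strategies can be simulated directly. The sketch below is a minimal toy implementation of the synchronous round structure described above (adjacency-list graph, every node with at least one neighbor); it is illustrative code, not code from the thesis.

```python
import random

def spread(graph, start, strategy="push-pull", rng=None, max_rounds=10_000):
    # graph: dict node -> non-empty list of neighbors
    rng = rng or random.Random(0)
    informed = {start}
    rounds = 0
    while len(informed) < len(graph) and rounds < max_rounds:
        new = set()
        for v in graph:
            nbr = rng.choice(graph[v])  # each node contacts one random neighbor
            if v in informed and strategy in ("push", "push-pull"):
                new.add(nbr)            # PUSH: informed node forwards the message
            if v not in informed and strategy in ("pull", "push-pull"):
                if nbr in informed:
                    new.add(v)          # PULL: uninformed node asks and succeeds
        informed |= new
        rounds += 1
    return rounds

# Example: complete graph on 4 nodes; every strategy informs everyone quickly.
k4 = {i: [j for j in range(4) if j != i] for i in range(4)}
rounds = spread(k4, 0)
```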
In particular, we prove that high conductance implies that the PUSH-PULL strategy is fast: we show that if a connected graph with n nodes has conductance φ, then rumor spreading successfully broadcasts a message within Õ(φ⁻¹ · log n) rounds, with high probability. This result is almost tight, since there exist graphs with n nodes and conductance φ whose diameter is Ω(φ⁻¹ · log n). Furthermore, we also show that high conductance is not a sufficient condition for PUSH or PULL to be efficient by themselves.

Compressible models for the Web graph [Chapter 5]

Compressibility is a fundamental property of large-scale graphs: the ability to store the structure of these graphs using few bits has a great impact on the possibility of efficiently storing and manipulating these massive amounts of data. In an intriguing set of papers, Boldi, Santini and Vigna [9, 10] showed that the web is compressible using just a few bits per link (2-3 bits per link on average). These findings suggest that the Web is compressible using just O(1) bits per link. Starting from this observation, we studied the compressibility of various well-known models in order to understand whether they can explain the good compression rate observed in [9, 10]. More precisely, we study the entropy of several well-known web graph models and, using a min-entropy argument, we are able to prove that their entropy is too large to explain the compressibility of the Web: they need at least Θ(log n) bits per link, on average. For this reason we introduce, and analyze mathematically, a new evolving model for the Web graph that explains all the well-known static properties of the Web and can also explain the good compression rate. In particular, our model achieves O(1) bits per link, on average.
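To see informally why graphs whose adjacency lists have small gaps need few bits per link, one can sort each adjacency list, take consecutive gaps, and charge each gap the length of its Elias-gamma code. The sketch below is only a back-of-the-envelope illustration of this effect; it is not the Boldi-Vigna scheme, and the gap convention is an arbitrary assumption.

```python
def gamma_bits(g):
    # length in bits of the Elias-gamma code for an integer g >= 1
    return 2 * g.bit_length() - 1

def bits_per_link(graph):
    # charge each sorted adjacency-list gap its gamma code length
    total_bits = total_links = 0
    for u, nbrs in graph.items():
        if not nbrs:
            continue
        s = sorted(nbrs)
        gaps = [s[0] + 1] + [b - a for a, b in zip(s, s[1:])]
        total_bits += sum(gamma_bits(g) for g in gaps)
        total_links += len(s)
    return total_bits / max(total_links, 1)
```

On a graph whose neighbors cluster near each node the estimate is small, while a graph with scattered neighbors pays roughly log n bits per gap, mirroring the Θ(log n) vs. O(1) contrast discussed above.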
Compressibility of social networks [Chapter 6]

As underlined in the previous subsection, compressibility is a fundamental property of the Web graph, but it is unclear whether other social networks share it. We study this problem formally for the first time and introduce a new algorithm that outperforms all previously known compression techniques. First, we adapted the Boldi-Vigna technique to compress general social networks. The main problem with this approach is that Boldi and Vigna rely heavily on two properties of the URL ordering:

• Locality: web pages that are close in the ordering point to similar sets of pages.

• Proximity: the typical edge length is small.

Those properties arise naturally in the case of URL ordering, but it is not clear how to obtain them for social networks: there is no natural ordering that we can use to sort the nodes. Our first contribution is to define the concept of an optimal ordering for compression and to show that finding the optimal ordering is NP-hard. We then design heuristics to overcome this hardness. Our new heuristic uses shingles [20] to measure the similarity of the outgoing edges of two nodes, and then orders the nodes in such a way that similar nodes appear close in the ordering. We also propose a new compression method, inspired by the Boldi-Vigna algorithm, that in addition exploits link reciprocity in social networks. Finally, we perform an extensive set of experiments on four large real-world graphs, including two social networks. Our experimental results show that social networks and the Web graph exhibit different compressibility characteristics: even though we can compress those graphs using only about 10 bits per link on average, we cannot compress any of them using fewer than 8 bits per link.
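The shingle idea can be sketched as a min-hash over a node's out-neighbors: nodes whose neighbor sets overlap heavily are likely to receive the same shingle, so sorting by shingle tends to place similar nodes next to each other. The hash parameters below are arbitrary stand-ins, and this is a simplification of the heuristic studied in Chapter 6.

```python
def shingle(neighbors, a=1_000_003, b=17, m=2**31 - 1):
    # min-hash of the out-neighbor set under one fixed hash function
    return min((a * v + b) % m for v in neighbors)

def shingle_order(graph):
    # sort nodes by (shingle, node id); similar nodes become adjacent
    return sorted(graph, key=lambda u: (shingle(graph[u]), u))

# Toy example: nodes 0 and 1 have identical out-neighbors, so they share
# a shingle and end up next to each other in the ordering.
g = {0: [5, 6, 7], 1: [5, 6, 7], 2: [100, 200], 3: [5, 6, 8]}
order = shingle_order(g)
```

The min-hash property is that two sets receive the same shingle with probability equal to their Jaccard similarity (over a random hash), which is why this ordering tends to recover the locality that URL ordering provides for the Web.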
These findings support the intriguing idea that entropy can be used to distinguish different kinds of social networks that have similar static properties but differ in the amount of randomness in their structure (indeed, the Web graph seems to be far more structured, owing to the domain hierarchy).

We now move to the core of the thesis. The thesis is organized in six chapters (this introduction and the chapters described above); each chapter is self-contained, consisting of a brief introduction, a discussion of related work and the presentation of our results.

Chapter 2

Affiliation networks

In the last decade, structural properties of several naturally arising networks (the Internet, social networks, the web graph, etc.) have been studied intensively with a view to understanding their evolution. In recent empirical work, Leskovec, Kleinberg, and Faloutsos identified two new and surprising properties of the evolution of many real-world networks: densification (the ratio of edges to vertices grows over time), and shrinking diameter (the diameter decreases over time). These properties run counter to conventional wisdom, and are certainly inconsistent with graph models prior to their work. In this chapter, we present the first simple, realistic, and mathematically tractable generative model that intrinsically explains all the well-known properties of social networks, as well as densification and shrinking diameter. Our model is based on ideas studied empirically in the social sciences, primarily in the groundbreaking work of Breiger (1973) on bipartite models of social networks that capture the affiliation of agents to societies. We also present algorithms that harness the structural consequences of our model.
Specifically, we show how to overcome the bottleneck of densification in computing shortest paths between vertices by producing sparse subgraphs that preserve or approximate shortest distances to all or a distinguished subset of vertices. This is a rare example of an algorithmic benefit derived from a realistic graph model. Finally, our work also presents a modular approach to connecting random graph paradigms (preferential attachment, edge-copying, etc.) to structural consequences (heavy-tailed degree distributions, shrinking diameter, etc.).

2.1 Introduction

The aim of this chapter is to develop mathematical models of real-world "social" networks that are realistic, mathematically tractable, and — perhaps most importantly — algorithmically useful. There are several models of social networks that are natural and realistic (fit available data) but are hard from an analytical viewpoint; the ones that are amenable to mathematical analysis or that have algorithmic significance are often unnatural or unrealistic. In contrast, we present a model, rooted in sociology, that leads to clean mathematical analysis as well as algorithmic benefits. (The work described in this chapter is joint work with D. Sivakumar; its extended abstract appeared in the Proceedings of the 41st ACM Symposium on Theory of Computing (STOC 2009) [67].)

We now briefly outline the history of significant recent developments in modeling real-world networks that provide the immediate context for our work. The numerous references from and to these salient pieces of work will offer the reader a more comprehensive picture of this area.

Internet and Web Graphs. One of the first observations that led to interest in random graph models significantly different from the classical Erdős–Rényi models comes from the work of Faloutsos et al.
[38], who noticed that the degree distribution of the Internet graph is heavy-tailed, and roughly obeys a "power law": for some constant α > 0, the fraction of nodes of degree d is proportional to d^(−α). Similar observations were made about the web graph by Barabasi and Albert [5], who also presented models based on the notion of "preferential attachment," wherein a network evolves by new nodes attaching themselves to existing nodes with probability proportional to the degrees of those nodes. Both works draw their inspiration and mathematical precedents from classical works of Zipf [108], Mandelbrot [76], and Simon [97]. The latter work was formalized and studied rigorously in [14, 15, 28]. Broder et al. [21] made a rich set of observations about the degree and connectivity structure of the web graph, and showed that besides a power-law degree distribution, the web graph contains numerous dense bipartite subgraphs (often dubbed "communities"). Within theoretical computer science, Aiello et al. [2] and Kumar et al. [65] presented two models of random graphs, both of which offer rigorous explanations for power-law degree distributions; the models of [65] also led to graphs with numerous dense bipartite subgraphs, the first models to do so. The models of [65] are based on the notion of graph evolution by "copying," where each new vertex picks an existing vertex as its "prototype" and copies (according to some probabilistic model) its edges. Preferential attachment and edge copying are two basic paradigms that both lead to heavy-tailed degree distributions and small diameter. The former is simpler to analyze, and indeed, despite its shortcomings with respect to explaining community structure, it has been analyzed extensively [14, 15, 25, 26]. For an entirely different treatment, see [37]. Small-World Graphs. In another development, Watts and Strogatz [103], Kleinberg [58, 59], and Dodds et al.
[31] revisited a classic 1960s experiment of the sociologist Stanley Milgram [79], who discovered that, on average, pairs of people chosen at random from the population are only six steps apart in the network of first-name acquaintances. In Kleinberg's model, vertices reside in some metric space, and a vertex is usually connected to most other vertices in its metric neighborhood and, in addition, to a few "long range" neighbors. Kleinberg introduced an algorithmic twist, and proved the remarkable result that the network has small diameter and easily discoverable paths if and only if the long-range neighbors are chosen in a specific way. (Here the Internet graph is, loosely speaking, the graph whose vertices are computers and whose edges are network links; the web graph is the graph whose vertices are web pages and whose directed edges are hyperlinks among web pages.) Kleinberg's models, dubbed "small-world networks," offer a nice starting point for analyzing social networks. A piece of folklore wisdom about social networks is the observation that friendship is mostly transitive: if a and b are friends and b and c are friends, then there is a good chance that a and c are friends as well. Kleinberg's model certainly produces graphs that satisfy this condition, but because of its stylized nature, it is not applicable in developing an understanding of real social networks. The other limitation of Kleinberg's model is that it is static: it is not a model of graph evolution. Densification and Shrinking Diameter. Returning to the topic of evolving random graphs, the next significant milestone is the work of Leskovec et al. [71], who made two stunning empirical observations, both of which immediately invalidate prior models based on preferential attachment, edge copying, etc., as well as the small-world models. Namely, they reported that real-world networks become denser over time (super-constant average degree), and their diameters effectively decrease over time!
The dual pursuits of empirical observations and theoretical models go hand in hand, and the work of [71] poses new challenges for the mathematical modeling of real-world networks. Along with their observations, Leskovec et al. [71] present two graph models, called "community guided attachment" and the "forest fire model". The former is a hierarchical model, and the latter is based on an extension of edge copying. While several analytical results concerning these two models are proved in [71], the models are quite complex and do not admit analyses powerful enough to establish all the observed properties, most notably degree distribution, densification, and shrinking diameter simultaneously. The papers [69] and [75] study models explicitly contrived to be mathematically tractable and to yield the observed properties, without any claims of being realistic or intuitively natural. In the opposite direction, Leskovec et al. [68] propose a model that fits the data quite well, but that does not admit mathematical analysis. The crucial features of the latter model are that edges are created based on preferential attachment and by randomly "closing triangles." Affiliation Networks. Our design goals for a mathematical model of generic social networks are that it should be simple to state and intuitively natural, sufficiently flexible and modular in structure with respect to the paradigms employed, and, of course, by judicious choice of the paradigms, offer compelling explanations of the empirically observed phenomena. The underlying idea behind our model is that in social networks there are two types of entities, actors and societies, related by affiliation of the former in the latter.
These relationships can be naturally viewed as bipartite graphs, called affiliation networks; the social network among the actors that results from the bipartite graph is obtained by "folding" the graph, that is, replacing paths of length two between actors in the bipartite graph by an (undirected) edge. The central thesis in developing a social network as a folded affiliation network is that acquaintances among people often stem from one or more common or shared affiliations: living on the same street, working at the same place, being fans of the same football club, having coauthored a paper together, etc. (By social networks we mean collaboration networks among authors, email and instant messaging networks, as well as the ones underlying Friendster, LiveJournal, Orkut, LinkedIn, MySpace, FaceBook, Bebo, etc.; indeed, the work of [73] demonstrates interesting correlations of friendships on the LiveJournal network with geographic proximity as an underlying metric for a small-world model. See also Mitzenmacher's editorial [81] for an eloquent articulation of the interplay between empirical observations and theoretical models.) Affiliation networks are certainly not new; indeed, this terminology is prevalent in sociology, and a fundamental 1974 paper of Breiger [18] appears to be the first to explicitly address the duality of "persons and groups" in the context of "networks of interpersonal ties... [and] intergroup ties." Breiger notes that the metaphor of this "dualism" occurs as early as 1902 in the work of Cooley. Finally, the connectivity and the degree distribution of a similar static version of this model have been studied in previous papers [52, 54, 85]. Our model for the evolving affiliation network and the consequent social network incorporates elements of preferential attachment and edge copying in fairly natural ways.
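The folding operation just described is mechanical, so it may help to see it in code. The following Python sketch is our own illustration (the function name and the dictionary representation are assumptions, not from the thesis): it folds an affiliation network given as a map from societies to their member actors, producing both the multigraph (one parallel edge per shared society) and the underlying simple graph.

```python
from collections import defaultdict
from itertools import combinations

def fold(affiliations):
    """Fold a bipartite affiliation network.

    `affiliations` maps each society u in U to the actors in Q
    affiliated with it.  Folding replaces every length-2 path
    actor-society-actor by an edge, so two actors get one edge
    per society they share (a multigraph); the simple folded
    graph keeps only the distinct pairs."""
    multi = defaultdict(int)
    for members in affiliations.values():
        for a, b in combinations(sorted(set(members)), 2):
            multi[(a, b)] += 1  # one parallel edge per shared society
    return multi, set(multi)

# Toy data: actors sharing a workplace and a football club.
multi, simple = fold({
    "office": ["ann", "bob", "carl"],
    "club":   ["ann", "bob"],
})
```

In this toy example the pair ("ann", "bob") shares two societies, so it carries multiplicity 2 in the multigraph, while the other pairs carry multiplicity 1.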
The folding rule we analyze primarily in this chapter is the one that places an undirected edge between every pair of (actor) nodes connected by a length-2 path in the bipartite graph. We comment briefly on some extensions for which our analyses continue to work, and more generally on the flexibility of our model, in Section 2.10. We show that when an affiliation network B is generated according to our model and its folding G on n vertices is produced, the resulting graphs satisfy the following properties: (1) B has a power-law degree distribution, and G has a heavy-tailed degree distribution as well, with all but o(n) vertices of G having bounded degree; (2) under a mild condition on the ratio of the expected degrees of actor nodes and society nodes in B, the graph G has a superlinear number of edges; (3) under the same condition, the effective diameter of G stabilizes to a constant. Algorithmic Benefits, and an Application. Although they are very interesting, these structural properties do not yield any direct insight into the development of efficient algorithms for challenging problems on large-scale graphs. With our model of networks based on affiliation graphs, we take a significant step towards remedying this situation. We show how to approach path problems on our networks by taking advantage of a key feature of their structure: even though the ultimate network produced by the model is dense, there is a sparse (constant average degree) backbone of the network given by the underlying affiliation network. First, we show that if we are given a large random set R of distinguished nodes and we care about paths from arbitrary nodes to nodes in R, then we can sparsify the graph to have only a small constant fraction of its edges while preserving all shortest distances to vertices in R.
Secondly, we show that if we are allowed some distortion, we can sparsify the graph significantly via a simple algorithm for graph spanners: namely, we show that we can sparsify the graph to have a linear number of edges, while stretching distances by no more than a factor given by the ratio of the expected degrees of actor and society nodes in the affiliation network. Finally, we mention our motivating example: a "social" network that emerges from search engine queries, where these shortest path problems have considerable significance. The affiliation network here is the bipartite graph of queries and web pages (urls), with edges between queries and the urls that users clicked on for the query; by folding this network, we may produce a graph on just the queries, whose edges take on a natural semantics of relatedness. Now suppose we are given a distinguished subset of queries that possess some significance (high commercial value, topic names, people names, etc.). Given any query, we can: find the nearest commercial queries (to generate advertisements), classify the query into topics, or discover people associated with the query. We have empirically observed that our sparsification algorithms work well on these graphs with hundreds of millions of nodes. Critique of our work. In this chapter we only analyze the most basic folding rule, namely replacing each society node in the affiliation network by a complete graph on its actors in the folded graph. As noted in Section 2.10, this could be remedied somewhat without losing the structural properties; we leave a more detailed exploration of the possibilities here for future work. The next drawback of our models is that, given a social network (or other large graph), it is not at all clear how one can test the hypothesis that it was formed by the folding of an affiliation network.
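To illustrate why the sparse affiliation backbone helps with path problems: when the folded graph contains only folded edges, a length-d path in it corresponds to a length-2d path in the bipartite graph, so distances between actors can be computed by BFS on the sparse backbone alone, without ever materializing the dense folded graph. The sketch below is our own illustration under that simplification (names and representation are assumptions; with the model's extra preferentially chosen edges the value returned is only an upper bound on the true folded distance).

```python
from collections import deque

def folded_distance(B, src, dst):
    """Distance between two actors in the folded graph, via BFS on the
    sparse bipartite backbone B (dict: node -> list of neighbours).
    A folded edge is a length-2 path in B, so the folded distance is
    ceil(d_B / 2).  With extra preferentially chosen edges in the
    folded graph this is only an upper bound."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        if v == dst:
            return (dist[v] + 1) // 2  # ceil of d_B / 2
        for w in B[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return None  # dst unreachable from src

# Backbone: q1 - u1 - q2 - u2 - q3 (queries q*, urls u*).
backbone = {"q1": ["u1"], "u1": ["q1", "q2"], "q2": ["u1", "u2"],
            "u2": ["q2", "q3"], "q3": ["u2"]}
```

On this toy backbone, q1 and q2 share the url u1 (folded distance 1), while q1 reaches q3 through two shared urls (folded distance 2).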
The general problem of deciding, given a graph G on a set Q of vertices, whether it was obtained by folding an affiliation network on vertex sets Q and U, where |U| = O(|Q|), is NP-complete. Finally, our model of folded affiliation networks seems limited to social networks among people related by various attributes (the societies). A feature often seen in several large real networks that appears to be missed by our model is the presence of an approximately hierarchical structure (for example, the Internet graph exhibits an approximate hierarchy in the form of autonomous systems, domains, intra- and inter-domain edges via gateways, and so forth). 2.2 Our model In our model, two graphs evolve at the same time. The first one is a simple bipartite graph that represents the affiliation network; we refer to this graph as B(Q, U). The second one is the social network graph, which we call G(Q, E). The set Q is the same in both graphs. As defined, G(Q, E) is a multigraph, so we also analyze the underlying simple graph Ĝ(Q, Ê). For readability, we present the two evolution processes separately, even though the two graphs evolve together. More precisely, the bipartite graph B(Q, U) evolves independently, and at every step G(Q, E) is obtained by "folding" the edges of B(Q, U) and by adding some extra edges (the folding process is simply B(Q, U)²[Q], where B(Q, U)² is the usual product (composition) of B with itself and [Q] denotes the subgraph of B(Q, U)² induced by Q). To understand the intuition behind this evolving process, consider, for example, the citation graph among papers. In this case the bipartite graph consists of papers, the set Q, and topics, the set U. When an author writes a new paper, he probably has in mind some older paper that will be its prototype, and he is likely to write on (a subset of) the topics considered in this prototype.
Similarly, when a new topic emerges in the literature, it is usually inspired by an existing topic (its prototype), and it has probably been foreshadowed by older papers. To continue the example of citation networks, the intuition behind the construction of G(Q, E) is that when an author writes the references of a new paper, he will cite all, or most, of the papers on the same topics, and some other papers of general interest. The same ideas that suggest this model as a reasonable model for the citation graph can be applied to several other social graphs as well. The two coupled evolution processes are the following.

Evolution of B(Q, U). Fix two integers cq, cu > 0, and let β ∈ (0, 1). At time 0, the bipartite graph B0(Q, U) is a simple graph with at least cq·cu edges, in which each node in Q has at least cq edges and each node in U has at least cu edges. At time t > 0:
(Evolution of Q) With probability β:
- (Arrival) A new node q is added to Q.
- (Preferentially chosen prototype) A node q′ ∈ Q is chosen as prototype for the new node, with probability proportional to its degree.
- (Edge copying) cq edges are "copied" from q′; that is, cq neighbors of q′, denoted u1, ..., u_cq, are chosen uniformly at random (without replacement), and the edges (q, u1), ..., (q, u_cq) are added to the graph.
(Evolution of U) With probability 1 − β, a new node u is added to U following the symmetrical process, adding cu edges to u.

Evolution of G(Q, E). Fix integers cq, cu, s > 0, and let β ∈ (0, 1). At time 0, G0(Q, E) consists of the subset Q of the vertices of B0(Q, U), and two vertices have an edge between them for every neighbor in U that they have in common in B0(Q, U). At time t > 0:
(Evolution of Q) With probability β:
- (Arrival) A new node q is added to Q.
- (Edges via prototype) An edge between q and another node in Q is added for every neighbor that they have in common in B(Q, U) (note that this is done after the edges for q have been determined in B).
- (Preferentially chosen edges) A set of s nodes q_i1, ..., q_is is chosen, each node independently of the others (with replacement), by choosing vertices with probability proportional to their degrees, and the edges (q, q_i1), ..., (q, q_is) are added to G(Q, E).
(Edges via evolution of U) With probability 1 − β: a new edge is added between two nodes q1 and q2 whenever the new node u added to U is a neighbor of both q1 and q2 in B(Q, U).

We call folded any edge that is in G0(Q, E) or has been added to G(Q, E) via the prototype or by the evolution of U; the set of folded edges is denoted by F. In the next section we introduce some notation and some results that we will use throughout this chapter.

2.3 Preliminaries

We say that an event occurs with high probability (whp) if it happens with probability 1 − o(1), where the o(1) term goes to zero as n, the number of vertices, goes to ∞. We denote by ∆ the fraction 1/(4 + cu(1−β)/(cqβ)) and by ∆′ the fraction 1/(4 + cqβ/(cu(1−β))). We define e_B0 as the number of edges of the initial graph B0(Q, U). Finally, we denote by c* and c_* respectively max(cq, cu) and min(cq, cu).

2.3.1 Concentration theorems

We now recall two important properties of functions that make the task of establishing measure-concentration results easier, and present the relevant concentration results from the literature.

Definition 2.3.1 (Averaged Lipschitz Condition) A function f satisfies the averaged Lipschitz condition with parameters c_j, j ∈ [n], with respect to the random variables X1, ..., Xn if, for any a_j, a′_j and for 1 ≤ j ≤ n,

|E[f(X1, ..., Xn) | X1 = a1, ..., Xj = a_j] − E[f(X1, ..., Xn) | X1 = a1, ..., Xj = a′_j]| ≤ c_j.

Lemma 2.3.1 (cf. [77]) Assume f satisfies the averaged Lipschitz condition with respect to the variables X1, ..., Xn with parameters c_j, j ∈ [n]. Then Pr[|f − E[f]| > t] ≤ 2 exp(−t²/(2c)), where c = Σ_{j≤n} c_j².
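As an illustration, the evolving bipartite process of Section 2.2 can be simulated directly. The following Python sketch is our own (all names are assumptions, not from the thesis); it simplifies the seed graph to a complete bipartite graph, which satisfies the minimum-degree requirements, and implements degree-proportional prototype choice by drawing uniformly from the list of edge endpoints.

```python
import random

def evolve_affiliation(n_steps, cq=2, cu=2, beta=0.5, seed=0):
    """Toy simulation of the evolving bipartite graph B(Q, U):
    with probability beta a new Q-node copies cq edges from a
    preferentially chosen prototype in Q; otherwise a new U-node
    does the symmetric thing.  The seed graph is simplified to a
    complete bipartite graph (the model only requires minimum
    degrees cq and cu)."""
    rng = random.Random(seed)
    c0 = max(cq, cu)
    Q = {f"q{i}": [f"u{j}" for j in range(c0)] for i in range(c0)}
    U = {f"u{j}": [f"q{i}" for i in range(c0)] for j in range(c0)}

    def add_node(side, other, c, tag):
        # Degree-proportional prototype choice == uniform choice
        # from the list of edge endpoints on this side.
        proto = rng.choice([v for v, nbrs in side.items() for _ in nbrs])
        copied = rng.sample(side[proto], c)  # copy c edges, no repeats
        new = f"{tag}{len(side)}"
        side[new] = copied
        for w in copied:
            other[w].append(new)

    for _ in range(n_steps):
        if rng.random() < beta:
            add_node(Q, U, cq, "q")
        else:
            add_node(U, Q, cu, "u")
    return Q, U
```

Every new node copies its edges from a prototype, so edge counts on the two sides always agree, and every node keeps at least its initial cq (resp. cu) edges; this is a sketch for intuition, not an implementation of the full model (the folded graph G(Q, E) and the preferentially chosen extra edges are omitted).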
Definition 2.3.2 (Hereditary Property and Hereditary Function) A Boolean property ρ(x, J), where x is a sequence of n reals and J is a family of subsets of [n], is said to be a hereditary property of index sets if: (1) ρ is a property of index sets, that is, if x_j = y_j for every j ∈ J ∈ J, then ρ(x, J) = ρ(y, J); (2) ρ is non-increasing on the index sets, that is, if I ⊆ J, then ρ(x, J) ⇒ ρ(x, I). Let f_ρ(x) be the function determined by a hereditary property of index sets ρ, given by f_ρ = max_{J : ρ(x,J)} |J|; we call f_ρ a hereditary function of index sets.

The concentration result for hereditary functions of index sets is a consequence of Talagrand's inequality [34].

Theorem 2.3.1 ([34]) Let M[f] be the median of f(x) and f_ρ(x) be a hereditary function of index sets. Then for all t > 0, Pr[f > M[f] + t] ≤ 2 exp(−t²/(4(M[f] + t))), and Pr[f < M[f] − t] ≤ 2 exp(−t²/(4 M[f])).

The next proposition provides a passage from concentration around the median of a function to concentration around its mean.

Proposition 2.3.1 The following are equivalent for an arbitrary function f and random variables X1, ..., Xn: (1) For all t > 0, there exist c1, α1 > 0 such that Pr[|f − E[f]| > t] ≤ c1 e^(−α1 t). (2) For all t > 0, there exist c2, α2 > 0 such that Pr[|f − M[f]| > t] ≤ c2 e^(−α2 t).

2.4 Degree distribution of B(Q, U)

Theorem 2.4.1 For the bipartite graph B(Q, U) generated after n steps, almost surely, when n → ∞, the degree sequence of the nodes in Q (resp. U) follows a power-law distribution with exponent α = −2 − cqβ/(cu(1−β)) (resp. α = −2 − cu(1−β)/(cqβ)), for every degree smaller than n^γ, with γ < ∆′ (resp. γ < ∆).

Before we present the proof, we recall the following useful lemma from [2].

Lemma 2.4.1 ([2]) Suppose a sequence a_t satisfies the recursive formula a_{t+1} = (1 − b_t/t) a_t + c_t for t ≥ t0, where the limits lim_{t→∞} b_t = b > 0 and lim_{t→∞} c_t = c exist. Then lim_{t→∞} a_t/t exists and equals c/(1 + b).
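Lemma 2.4.1 is easy to check numerically: iterating the recursion a_{t+1} = (1 − b/t) a_t + c with constant b and c, the ratio a_t/t should approach c/(1 + b). A minimal sketch (the function name is ours):

```python
def limit_ratio(b, c, t0=10, T=200_000, a0=1.0):
    """Iterate a_{t+1} = (1 - b/t) * a_t + c from t0 to T and return
    a_T / T, which Lemma 2.4.1 predicts tends to c / (1 + b).
    (t0 should exceed b so that the factor 1 - b/t stays positive.)"""
    a = a0
    for t in range(t0, T):
        a = (1.0 - b / t) * a + c
    return a / T

# With b = 2 and c = 3 the predicted limit is 3 / (1 + 2) = 1.
```

One can verify that c/(1 + b) is an exact fixed point of the induced recursion on r_t = a_t/t, and that deviations from it decay polynomially fast, so even a modest horizon T gives an accurate value.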
Proof (of Theorem 2.4.1): Let X_t^i be the random variable that counts the number of nodes in Q of degree i at time t. We express E_t^i = E[X_t^i] in terms of E_{t−1}^i = E[X_{t−1}^i]. First, we analyze the case i = cq:

E_t^cq = E_{t−1}^cq + Pr[a new node is added to Q] − E[number of nodes in Q with degree cq at time t−1 whose degrees increase].

In the random process that generates B(Q, U), the degree of a node in Q can increase if and only if a node is added to U, so we have:

E_t^cq = E_{t−1}^cq + β − (1−β) E[number of nodes in Q of degree cq at time t−1 whose degrees increase | a node is added to U]
= E_{t−1}^cq + β − (1−β) Σ_{i=1}^{cu} Pr[a node whose degree is cq at time t is chosen as endpoint for the i-th edge],

where the second equality follows from linearity of expectation. Let E_t^cq | G(t−1) denote the expectation of the number of nodes in Q of degree cq at time t, given that G(Q, E) at time t−1 equals G(t−1). Noticing that in the random process every edge endpoint in Q is equally likely to be chosen as the destination of the i-th new edge, we have:

E_t^cq | G(t−1) = E_{t−1}^cq + β − (1−β) cu cq E_{t−1}^cq / (e_{t−1} + e_B0)
= E_{t−1}^cq (1 − (1−β) cu cq / (e_{t−1} + e_B0)) + β,

where e_{t−1} is the number of edges added by the process up to time t−1 and e_B0 is the number of edges in B0(Q, U). Using the Chernoff bound and summing over all possible graphs G(t−1), we obtain:

E_t^cq = E_{t−1}^cq (1 − (1−β) cu cq / ((cqβ + cu(1−β))(t−1) ± o(t) + e_B0)) + o(1) + β
= E_{t−1}^cq (1 − ((1−β) cu cq / ((cqβ + cu(1−β))(t−1))) (1 ± o(1))) + o(1) + β,

where the o(1) terms account for the graphs G(t−1) whose number of edges is far from the mean. Thus, using Lemma 2.4.1:

lim_{t→∞} E_t^cq / t = β / (1 + (1−β) cu cq / (cqβ + cu(1−β))) = β (cqβ + cu(1−β)) / (cqβ + cu(1−β) + (1−β) cu cq).

Now let us analyze the general case i > cq.
We have:

E_t^i = E_{t−1}^i − E[number of nodes in Q with degree i at time t−1 that increase their degree] + E[number of nodes in Q with degree i−1 at time t−1 that increase their degree to i].

Noticing that the bipartite graph has no multiple edges, and with an analysis similar to the case of cq, we get:

E_t^i = E_{t−1}^i (1 − (1−β) cu i / e_{t−1}) + ((1−β) cu (i−1) / e_{t−1}) E_{t−1}^{i−1} + o(1)
= E_{t−1}^i (1 − ((1−β) cu i / ((cqβ + cu(1−β))(t−1))) (1 ± o(1))) + ((1−β) cu (i−1) / ((cqβ + cu(1−β))(t−1))) (1 ± o(1)) E_{t−1}^{i−1} + o(1).

Define Y^i = lim_{t→∞} E_t^i / t. From Lemma 2.4.1 we have:

Y^i = ((1−β) cu (i−1) / (cqβ + cu(1−β))) Y^{i−1} / (1 + (1−β) cu i / (cqβ + cu(1−β)))
= (i−1) Y^{i−1} / (i + 1 + cqβ/(cu(1−β)))
= Y^cq Π_{k=cq+1}^{i} (k−1) / (k + 1 + cqβ/(cu(1−β)))
= Y^cq (Γ(i) Γ(cq + 2 + cqβ/(cu(1−β)))) / (Γ(cq) Γ(i + 2 + cqβ/(cu(1−β))))
∼ i^{−2 − cqβ/(cu(1−β))}.

Finally, to obtain Theorem 2.4.1 we need to prove that the variables X_t^i are concentrated around their expected values. To do this, we describe our random process in terms of two random choices: at each time step, first a biased coin is tossed and a node is added to Q or to U according to the outcome; then a set of endpoints is chosen. Let C_t and S_t be these two random choices at time t. We show that the random variables X_t^i satisfy a certain "bounded differences" property, established below in Lemma 2.4.2. Combining Lemma 2.4.2 and Lemma 2.3.1, we obtain the concentration result. Noticing that, by symmetry, all the proofs also hold for the degree distribution of the nodes in U, Theorem 2.4.1 follows.
2.4.1 Lipschitz condition for the random variables X_t^i

Lemma 2.4.2 The random variables X_t^i satisfy the averaged Lipschitz condition with parameters 2cu + (2cu + 2cq)(i + 1) with respect to the random variables C1, S1, ..., Cn, Sn.

Proof: We first establish the averaged Lipschitz condition for the variable X_t^cq, beginning with the effect of choosing a different set of endpoints at time j. The maximum possible difference at time j is cu: if the j-th node is added to Q, changing the endpoints of its starting edges has no effect on X_j^cq, while if it is added to U, changing the endpoints of its starting edges affects at most cu nodes of Q. Thus, writing Ê_t^i for the expectation under the alternative choice and recalling the notation E_t^i = E[X_t^i], we have:

∆_j^cq = |E_j^cq − Ê_j^cq| ≤ cu.

For t > j, conditioning on the number k of edges present at time t−1 (k ranges up to e_B0 + (t−1)(cu + cq)), and noticing that both conditional processes have the same distribution of the number of edges, we get:

∆_t^cq = |E_t^cq − Ê_t^cq| = (1 − Σ_k (1−β) cu (cq/k) Pr[k edges at time t−1]) |E_{t−1}^cq − Ê_{t−1}^cq| ≤ ∆_{t−1}^cq ≤ ··· ≤ cu.

Next, consider a different outcome of the coin tossed at time j: in one case the j-th node n_j is added to Q, in the other it is added to U. Here:

∆_j^cq = |E_j^cq − Ê_j^cq| = |X_{j−1}^cq + 1 − X_{j−1}^cq + cu cq X_{j−1}^cq / e_{j−1}| ≤ cu + 1.

For t > j we cannot reuse the previous argument, because the two conditional processes have different numbers of edges; instead, we use the fact that Pr[k − cq + cu edges at time t−1 | n_j ∈ U] = Pr[k edges at time t−1 | n_j ∈ Q], assuming without loss of generality that cu ≥ cq. Splitting |E_t^cq − Ê_t^cq| into the term governed by the common contraction factor and the term coming from the shift of the edge count, and assuming by induction that ∆_{t−1}^cq ≤ 2cu + 2cq, a case analysis on the sign of the shift term yields, in one case,

∆_t^cq ≤ max(∆_{t−1}^cq, cu − cq),

and in the other,

∆_t^cq ≤ 2cu + 2cq.

Thus the variable X_t^cq satisfies the (2cu + 2cq)-averaged Lipschitz condition with respect to the variables C1, S1, ..., Cn, Sn.

Now we analyze the general variable X_t^i. For a different choice of the set of endpoints at time j we again have ∆_j^i = |E_j^i − Ê_j^i| ≤ cu, and conditioning on the number of edges as before:

∆_t^i ≤ ∆_{t−1}^i − Σ_k (1−β) cu (1/k) Pr[k edges at time t−1] (i ∆_{t−1}^i − (i−1) ∆_{t−1}^{i−1}) ≤ ∆_{t−1}^{i−1} + 1.

For a different outcome of the coin tossed at time j, again assuming without loss of generality that cu ≥ cq:

∆_j^i = |E_j^i − Ê_j^i| ≤ cu + 1 + cu (i X_{j−1}^i + (i−1) X_{j−1}^{i−1}) / e_{j−1},

and, handling the shifted edge count as in the case of X_t^cq:

∆_t^i ≤ ∆_{t−1}^{i−1} + 2cu + 2cq.
By induction we obtain the (2cu + (2cu + 2cq)(i + 1))-averaged Lipschitz condition with respect to the variables C1, S1, ..., Cn, Sn for the random variable X_t^i.

2.5 Properties of the degree distribution of B(Q, U)

Here we explore several aspects of the evolution model. In particular, we start by showing that if a node in B(Q, U) has degree g(n) at time n, it had degree Θ(g(n)) already at time εn, for any constant ε > 0, and its degree increases by Θ(g(n)) between time εn and time n. First, we prove the following property.

Lemma 2.5.1 If a node u in B(Q, U) has degree g(n) ∈ Ω(log n) at time φn, for a constant 0 < φ < 1, then, with high probability, at the end of the process it has degree smaller than g(n)((n−1)/(φn))^{cqβ/c_*} and larger than g(n)((n−1)/(φn))^{cqβ/c*}.

Proof: We want to compute the number of nodes in Q that point to u at the end of the process, knowing that at time φn its degree was g(n). Let E_t^u be the expected degree of u at time t. We have:

E_t^u = E_{t−1}^u (1 + cqβ/e_{t−1}),

where e_{t−1} is the number of edges at time t−1. Since c_*(t−1) + e_B0 ≤ e_{t−1} ≤ c*(t−1) + e_B0, we have:

E_{t−1}^u (1 + cqβ/(c*(t−1) + e_B0)) < E_t^u < E_{t−1}^u (1 + cqβ/(c_*(t−1) + e_B0)).

Unrolling the two recurrences from time φn to time n gives:

E_{φn}^u · (Γ(n−1 + e_B0/c* + cqβ/c*) Γ(φn + e_B0/c*)) / (Γ(n−1 + e_B0/c*) Γ(φn + e_B0/c* + cqβ/c*)) < E_n^u < E_{φn}^u · (Γ(n−1 + e_B0/c_* + cqβ/c_*) Γ(φn + e_B0/c_*)) / (Γ(n−1 + e_B0/c_*) Γ(φn + e_B0/c_* + cqβ/c_*)),

that is, up to lower-order terms, g(n)((n−1)/(φn))^{cqβ/c*} < E_n^u < g(n)((n−1)/(φn))^{cqβ/c_*}.

Now we have to show that the degree of u is concentrated around its mean; to do so, we use Theorem 2.3.1 combined with Proposition 2.3.1.
Indeed, the degree of u can be seen as a hereditary function on the set of edges, where the Boolean property associated with the hereditary function is having all the endpoints in U equal to u. Further, both the lower bound and the mean value of f are in Θ(g(n)), so M[f] ∈ Θ(g(n)). Hence, by Proposition 2.3.1 and Theorem 2.3.1, E_t^u is concentrated.

The two following lemmas follow from the previous one by choosing its parameters carefully.

Lemma 2.5.2 If a node in B(Q, U) has degree g(n) at time n, with g(n) ∈ ω(log n), then with high probability it had degree Ω(g(n)) already at time εn, for any constant ε > 0.

Lemma 2.5.3 If a node u in B(Q, U) has degree Θ(n^λ) at the end of the process, then a δ fraction of the nodes pointing to u have been inserted after time φn, for any constants 0 < δ, λ < 1 and for a constant φ that depends on δ.

The last lemma is an upper bound on the number of edges in B(Q, U) that point to a node of U of degree at least i. This lemma is important in order to upper bound the probability of pointing to a high-degree node.

Lemma 2.5.4 At any time φn, for any 0 < φ ≤ 1, the number of edges in B(Q, U) that point to a node in U of degree at least i is Θ(n i^{−cu(1−β)/(cqβ)}), for any i up to n^γ, with γ < 1/(4 + cu(1−β)/(cqβ)).

Proof: Let Z_t^j be the number of edges whose endpoint in U has degree j at time t. We have Z_t^j = j X_t^j, where X_t^j is the number of nodes of degree j in U at time t. By Theorem 2.4.1, X_n^j / n ∈ Θ(j^{−2 − cu(1−β)/(cqβ)}) for j up to n^γ, with γ < 1/(4 + cu(1−β)/(cqβ)); hence Z_n^j ∈ Θ(n j^{−1 − cu(1−β)/(cqβ)}), and thus

Σ_{j=i}^{n} Z_n^j ∈ Θ(n i^{−cu(1−β)/(cqβ)} − n · n^{−cu(1−β)/(cqβ)}) = Θ(n i^{−cu(1−β)/(cqβ)}),

for i up to n^γ, with γ < 1/(4 + cu(1−β)/(cqβ)).
Finally, we have to prove that this holds at every time φn, for any 0 < φ ≤ 1. By Lemma 2.5.2 we know that if X_n^j = g(n) then X_{φn}^j = Θ(g(n)); thus, using the same technique, the same property holds for the variables Z_t^j, so Σ_{j=i}^{n} Z_t^j = Θ(Σ_{j=i}^{n} Z_n^j) for any t ≥ φn.

2.6 Properties of the degree distributions of the graphs G(Q, E) and Ĝ(Q, Ê)

Although G(Q, E) and Ĝ(Q, Ê) are derived from B(Q, U), computing their degree distributions is much harder; in this section we show some interesting properties of the degree distributions of the folded graphs. First, we show that the probability that a random node of G has high degree dominates the complementary cumulative distribution function of the degree distribution of the nodes of U in B(Q, U). Then, by construction, a similar theorem follows with respect to the nodes in Q. Together, these results imply:

Theorem 2.6.1 The degree distributions of the graphs G(Q, E) and Ĝ(Q, Ê) are heavy-tailed.

Proposition 2.6.1 For the folded graphs G(Q, E) and Ĝ(Q, Ê) generated after n steps, almost surely, when n → ∞, the complementary cumulative distribution function of the degrees of the nodes inserted after time φn, for any 0 < φ < 1, dominates the complementary cumulative distribution of a power law with exponent α = −2 − cu(1−β)/(cqβ), for every degree smaller than n^γ, with γ < 1/(4 + cu(1−β)/(cqβ)), and in ω(log n).

Proof: Let Q^i be the number of nodes inserted after time φn with degree at least i in G(Q, E). Instead of computing Q^i directly, we show that Q^i dominates a random variable which is in Θ(n i^{−cu(1−β)/(cqβ)}). Let S^i be the number of edges inserted after time φn that point to a node of degree at least i and are such that, for every (a, b) ∈ S^i with a ∈ Q, (a, b) is the oldest edge pointing to the node a. By definition the inequality S^i ≤ Q^i holds. Now by Lemma 2.5.2 we know
that all the nodes inserted after time φn will have degree in O(log n) in B(Q, U). So any node counted by Q^i has degree in O(log n) in B(Q, U), and only c_u of its neighbors can have degree in ω(log n). Hence if a node in Q′ (the set of nodes inserted after time φn) has degree i ∈ ω(log^{2+ε} n), at least one of its initial neighbors has degree in Θ(i) in B(Q, U).

Now by Lemma 2.5.4 there are Θ( n · i^{−c_u(1−β)/(c_q β)} ) edges pointing to a node of degree at least i, for i up to n^γ, with γ < 1/(4 + c_u(1−β)/(c_q β)); thus there is a p* ∈ Θ( i^{−c_u(1−β)/(c_q β)} ) such that p* < Pr[copying an edge pointing to a node of degree at least i at time t], for any t ≥ φn. Now S^i dominates the number of heads that we obtain if we flip Θ((1 − φ)n) times a biased coin that gives head with probability p*. Thus, applying the Chernoff bound, we have Θ(p*(1 − φ)n) ≤ Q^i. Hence Q^i ∈ Ω( n · i^{−c_u(1−β)/(c_q β)} ), and Pr[a node in Q′ has degree > i] ∈ Ω( i^{−c_u(1−β)/(c_q β)} ). □

Proposition 2.6.2 For the folded graphs G(Q, E) and Ĝ(Q, Ê) generated after n steps, almost surely, when n → ∞, the complementary cumulative distribution function of the degrees of the nodes dominates the complementary cumulative distribution function of a power law distribution with exponent α = −2 − c_q β/(c_u(1−β)), for every degree smaller than n^γ, with γ < 1/(4 + c_q β/(c_u(1−β))).

The proof follows from the definition of the co-evolution of the graphs. Finally, we show that most of the nodes have degree in Θ(1). Recall that F is the set of edges obtained by the folding process.

Proposition 2.6.3 For the folded graphs G(Q, E) and Ĝ(Q, Ê) generated after n steps, all but o(n) nodes have degree in Θ(1).

Proof: We start by noticing that we can restrict our attention to the edges in F, because |E − F| ∈ Θ(n) and only o(n) nodes have degree in ω(1) in the graph G(Q, E − F). Further, by Theorem 2.4.1, all but o(n) nodes in B(Q, U) have degree in Θ(1). In addition, recall that by Lemma 2.5.4 only an o(n) fraction of the edges in B(Q, U) point to a node of non-constant degree in U.
Hence only an o(n) fraction of the nodes increase their degree by more than a constant factor. □

2.7 Densification of edges

In this section we prove that the number of edges of the graphs G(Q, E) and Ĝ(Q, Ê) is in ω(|Q|).

Theorem 2.7.1 If c_u < (β/(1−β)) c_q, the number of edges in G(Q, E) is ω(n).

Proof: We notice that every node u in U of B(Q, U) gives rise in G(Q, E) to a clique where all neighbors of u are connected. Thus we can lower bound the number of edges in the graph G(Q, E) as follows:

  |E| > Σ_{i=1}^{n} (# of nodes of degree i in U) · (i choose 2) ≥ Σ_{i=1}^{N} (# of nodes of degree i in U) · (i choose 2),

where N = n^γ, with γ < 1/(4 + c_u(1−β)/(c_q β)). By Theorem 2.4.1, with high probability:

  |E| > Σ_{i=1}^{N} (1 ± o(1)) · n / ( ζ(−2 − c_u(1−β)/(c_q β)) · i^{2+c_u(1−β)/(c_q β)} ) · (i choose 2) ∈ ω(n). □

Theorem 2.7.2 If c_u < (β/(1−β)) c_q, the number of edges in Ĝ(Q, Ê) is in ω(n).

Proof: In order to prove the claim, we start by noticing that the number of edges of Ĝ(Q, Ê) increases only when a node is added to Q; indeed, when a new node is added to U, only multiple edges and self loops are introduced. In the proof we restrict our attention to the edges in F with one endpoint in a node of degree bigger than n^λ. First we compute the number of nodes of degree bigger than µn^λ; let us call this set H.

  |H| = n − (# of nodes of degree smaller than µn^λ)
      = Θ( n − n · Σ_{i=1}^{µn^λ−1} 1 / ( ζ(−2 − c_u(1−β)/(c_q β)) · i^{2+c_u(1−β)/(c_q β)} ) )
      = Θ( n · Σ_{i=µn^λ}^{n} 1 / ( ζ(−2 − c_u(1−β)/(c_q β)) · i^{2+c_u(1−β)/(c_q β)} ) )
      = Θ( n^{1−λ−λ·c_u(1−β)/(c_q β)} ),

where in the last passage we use the fact that Σ_{i=k}^{n} 1/i^α = Θ(k^{1−α} − n^{1−α}).

When a node q is added to Q, it adds c_q edges to nodes in U. Let us denote by e_q = (q, u_1) the first edge added by q. Then q will introduce in Ĝ(Q, Ê) a number of edges larger than the degree of u_1. So if u_1 is in H, q will introduce at least µn^λ new edges.
Let us call an edge interesting if it points to H and it is the first edge added by a new node in Q; we would like to lower bound the number of interesting edges added after time φn. We start by lower bounding the number of edges added before time φn and pointing to a node in H. Using Lemma 2.5.2 and the estimate of |H|, we get that the number of those edges is Θ( n^{1−λ·c_u(1−β)/(c_q β)} ). Thus the number of interesting edges added after time φn dominates the number of heads that we get if we toss (1 − φ)n times a biased coin which gives head with probability Θ( n^{1−λ·c_u(1−β)/(c_q β)−ε} / ((c_u + c_q)n) ) = Θ( n^{−λ·c_u(1−β)/(c_q β)−ε} ), for any small ε > 0.

So, using the Chernoff bound, we have that w.h.p. the number of interesting edges introduced after time φn is Ω( n^{1−λ·c_u(1−β)/(c_q β)−ε} ). Now, recalling that each one of these edges introduces in the folded graph at least Θ(n^λ) edges, the claim follows. □

2.8 Shrinking/stabilizing of the effective diameter

We use the definition of the ψ-effective diameter given in [71].

Definition 2.8.1 (Effective Diameter) For 0 < ψ < 1, we define the ψ-effective diameter as the minimum d_e such that, for at least a ψ fraction of the reachable node pairs, the shortest path between the pair is at most d_e.

In this section we show that the effective diameters of G(Q, E) and Ĝ(Q, Ê) shrink or stabilize over time. The intuition behind these proofs is that even if a person q is not interested in any popular topic, and so is not linked to any popular topic in B(Q, U), with high probability at least one friend of q is interested in a popular topic.

Theorem 2.8.1 If c_u < (β/(1−β)) c_q, the ψ-effective diameter of the graph G(Q, E) shrinks or stabilizes after time φn with high probability, for any 0 < φ < 1 and for any constant 0 < ψ < 1.

Proof: Let H be the set of nodes of U in B(Q, U) with degree ≥ n^α, for small α > 0.
By Lemma 2.5.2 every node in H has been inserted in the graph before time γn, for any 0 < γ < 1. Thus the diameter of the neighborhood of H in G(Q, E) shrinks or stabilizes after time γn. Now we want to show that all but o(n) of the nodes inserted after time φn have at least one neighbor in the neighborhood of H in B(Q, U). Hence we will be able to upper bound the ψ-effective diameter by diam(H) + 2, for any constant ψ < 1.

The number of edges that have as one endpoint a neighbor of H is lower bounded by the number of edges generated by the nodes in H. At any time after φn, the number of these edges can be lower bounded, as in Theorem 2.7.1, by

  Σ_{i=n^α}^{N} (1 ± o(1)) · n / ( ζ(−2 − c_u(1−β)/(c_q β)) · i^{2+c_u(1−β)/(c_q β)} ) · (i choose 2), where N = n^γ, with γ < 1/(4 + c_u(1−β)/(c_q β)),

thus they are in Ω( n^{1+γ(1−c_u(1−β)/(c_q β))} ). Instead, the number of edges whose endpoints are not neighbors of H can be upper bounded by

  Σ_{i=1}^{n^α} (1 ± o(1)) · n / ( ζ(−2 − c_u(1−β)/(c_q β)) · i^{2+c_u(1−β)/(c_q β)} ) · (i choose 2) + sn ∈ Θ( n^{1+α(1−c_u(1−β)/(c_q β))} ),

where the first term of the sum represents all the edges that are created by nodes of U not in H in B(Q, U), and the second term represents all the edges added to the graph by a choice based on preferential attachment in G(Q, E).

Now, when a new node v arrives at a time between φn and n, it chooses a set of nodes q_{i_1}, …, q_{i_s} independently, with probability proportional to their degrees, and it connects to those nodes. Thus, by fixing α < γ < 1/(4 + c_u(1−β)/(c_q β)), we have that v will point with high probability to a node that is a neighbor of H in B(Q, U). Hence for at least a ψ fraction of the reachable node pairs, the shortest path length between the pair is at most diam(H) + 2. □

Theorem 2.8.2 If c_u < (β/(1−β)) c_q, the ψ-effective diameter of the graph Ĝ(Q, Ê) is upper bounded by a constant with high probability at any time φn, for 0 < φ < 1.
Proof: This proof is mostly the same as the proof of Theorem 2.8.1; the only difference is that we cannot use the same lower bound for the edges that have an endpoint in the neighbors of H. Let φ > 0; using the same techniques of Theorem 2.7.2, we have that the number of edges pointing to a node of degree at least n^δ inserted between time 0 and φn is Ω( n^{1+δ(1−c_u(1−β)/(c_q β))−ε} ). Thus, fixing the δ and the α of the proof of Theorem 2.8.1 such that α is smaller than δ(1 − c_u(1−β)/(c_q β)) − ε, we have that also in this case the probability of choosing as destination of an edge a node that is not in the neighbors of H is o(1). Hence, using the same arguments of Theorem 2.8.1, the result follows. □

2.9 Sparsification of G(Q, E)

Several interesting algorithms (e.g. Dijkstra's algorithm) have complexity proportional to the number of edges in the graph. So, to harness the implicit hardness due to the densification of the edges in social networks, in this section we study the performances of two sparsification algorithms.

First we analyze a setting in which we have a set of relevant, or distinguished, nodes and we want to preserve all the distances between a relevant node and every other node. The set of relevant nodes has cardinality at most n/log n and is chosen uniformly at random. For this case we present an algorithm, Algorithm A, which, with high probability, generates from G(Q, E) a new graph G′(Q, E′), with |E′| ≤ δ|E| and 0 < δ < 1, such that for any node u in G and any relevant node v, a path of shortest distance in G is also present in G′.

In the second setting, in which a constant stretching of distances is allowed, we show that there exists an algorithm that reduces the number of edges to Θ(n) both in G(Q, E) and in Ĝ(Q, Ê).

2.9.1 Sparsification with preservation of the distances from a set of relevant nodes

We start by describing Algorithm A, the sparsification algorithm.

Input: G(Q, E) and a set R of relevant nodes.

(1) Initially, label all edges deletable.
(2) For each node a ∈ R:
  (a) Compute the breadth first search tree starting from node a, exploring the children of a node in increasing order of insertion.
  (b) Label all edges in the breadth first search tree of a as undeletable.
(3) Delete all edges labeled as deletable.

Theorem 2.9.1 Suppose the set of relevant nodes R has cardinality n/log n and suppose that the elements of R are chosen uniformly at random from Q. If c_u ≤ (β/(1−β)) c_q, Algorithm A with high probability generates from G(Q, E) a new graph G′(Q, E′), with |E′| ≤ δ|E|, for some small constant 0 < δ < 1, in which the distance between every pair of nodes (a, b) is preserved if at least one of the two nodes is in R.

Before proving Theorem 2.9.1 we introduce two useful lemmata.

Definition 2.9.1 (Useless Nodes) For a node u ∈ U in B(Q, U), we say that a set S_u of nodes of G(Q, E) is useless for u if every v ∈ S_u has an edge to u in B and, furthermore, if we compute a breadth first search in B(Q, U), starting from node u and analyzing the nodes in increasing order of insertion, no node in S_u is in a path between u and a relevant node in the breadth first search tree.

Lemma 2.9.1 Let u ∈ U and let S_u be a set of useless nodes for u; then Algorithm A will delete all edges in G(Q, E) that are between nodes in S_u and that are in the clique generated by the interest u.

Lemma 2.9.2 For ε > 0, if u has degree Ω(n^ε), then δ·deg(u) neighbors of u are in S_u, for some small constant 0 < δ < 1.

Proof: (of Theorem 2.9.1) It is easy to see that running Algorithm A will not change the distances between pairs of nodes (a, b) if at least one of the two nodes is in R. So we only have to prove that a constant fraction of the edges is deleted by the algorithm. First we notice that we can restrict our attention to the set F of folded edges; indeed, by construction, |E − F| ∈ Θ(n).
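As an aside, Algorithm A can be sketched in Python. This is a minimal sketch under our own assumptions: the graph is given as a dictionary of adjacency lists, nodes are numbered by insertion time (so sorting neighbors reproduces the "increasing order of insertion" exploration), and the function name sparsify is ours.

```python
from collections import deque

def sparsify(adj, relevant):
    """Sketch of Algorithm A: keep only the edges lying on a BFS tree
    rooted at some relevant node, exploring children in increasing
    order of insertion (nodes assumed numbered by insertion time)."""
    undeletable = set()                       # edges kept, as frozensets
    for a in relevant:
        parent = {a: None}                    # visited set / BFS tree
        queue = deque([a])
        while queue:
            u = queue.popleft()
            for v in sorted(adj[u]):          # children in insertion order
                if v not in parent:
                    parent[v] = u
                    undeletable.add(frozenset((u, v)))  # BFS-tree edge
                    queue.append(v)
    # step (3): drop every edge still labeled deletable
    return {u: [v for v in adj[u] if frozenset((u, v)) in undeletable]
            for u in adj}
```

On a 4-clique {0, 1, 2, 3} plus a pendant node 4 attached to 0, with R = {0}, only the four BFS-tree edges around node 0 survive, and every distance from node 0 is preserved.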
Now, recalling the description of the generating process given in Theorem 2.7.1, we have that all but an o(|E|) of the edges in F are part of cliques of polynomial size generated by a node u of degree Ω(n^ε), for small ε. Now by Lemma 2.9.1 and Lemma 2.9.2 we have that in every clique generated by such a node a δ fraction of the edges will be deleted, for any constant 0 < δ < 1; thus the claim follows. □

Proof: (of Lemma 2.9.1) First we notice that if an edge is deleted by A in G(Q, F), where F is the set of folded edges, it will be deleted also in G(Q, E). This is true because A deletes all edges that do not appear in any shortest path from any node to a node in R, and F ⊂ E. In the following we consider G(Q, F).

Let u ∈ U and let N_B(u) be the set of neighbors of u in B(Q, U). After running Algorithm A, we have that no node v ∈ S_u ⊂ N_B(u) appears as an intermediate node in a shortest path between a relevant node and a node in N_B(u). Indeed, suppose by contradiction that v appears as an intermediate node in the path between a relevant node r and a node t ∈ N_B(u); this would imply that no node h ∈ N_B(u) satisfies both d(h, r) ≤ d(v, r), where d(·, ·) is the distance function, and that h has been added to B(Q, U) before v. Thus the breadth first search tree in B(Q, U) rooted at u would have v in the path between u and r, thus v ∉ S_u, a contradiction. Thus each node in S_u belongs to a different branch in every breadth first search tree in G(Q, F) rooted at any relevant node; hence any edge between two nodes in S_u will be deleted. □

Proof: (of Lemma 2.9.2) By Lemma 2.5.2 and Lemma 2.5.3 we have that if a node u has degree n^λ ∈ Ω(n^ε) at the end of the process, it had degree µn^λ also at time φn, and a δ fraction of the nodes pointing to u have been inserted after time φn, for any constant 0 < δ < 1, for some constant 0 < µ ≤ 1 and for some constant 0 < φ < 1 that depends on δ.
We call this set of nodes L, the latecomers. We prove that, in the breadth first search from u, only o(|L|) of the vertices in L are used to reach a relevant node. Thus |S_u| ≥ |L| − o(|L|) ≥ (1 − δ)|N_B(u)|, for any constant 0 < δ < 1, and the lemma follows.

In order to prove this, we start by showing that the total number of nodes over the branches of the breadth first search tree rooted at u that contain a latecomer node is Θ(n^λ).⁵ We say that a node i is a child of u if the edge (i, u) exists in B(Q, U) and i has been inserted in B(Q, U) after u. Let the descendants of u be the set S such that a node v is in S if and only if v is a child of u or v is a child of a node in S. It is easy to notice that the number of nodes in branches of u that contain a latecomer at time t is upper bounded by the number of descendants of u. Let E_t^{desc} be the expected number of descendants of u. Notice that E_{φn}^{desc} = 0, so we have:

  E_t^{desc} = E_{t−1}^{desc} + (βc_q + (1 − β)c_u) · (E_{t−1}^{desc} + µn^λ) / (e_{t−1} + e_{B_0})

Instead of studying E_t^{desc} we study the function W_t, with W_{φn} = µn^λ and the recursive equation:

  W_t = W_{t−1} + (βc_q + (1 − β)c_u) · W_{t−1} / (e_{t−1} + e_{B_0})

It is easy to note that W_t > E_t^{desc}. So we have:

  E_t^{desc} < W_{t−1} ( 1 + (βc_q + (1 − β)c_u) / (e_{t−1} + e_{B_0}) )
            < W_{t−1} ( 1 + (βc_q + (1 − β)c_u) / (e_{φn} + c*(t − 1) − c*φn) )
            = W_{t−1} ( 1 + ((βc_q + (1 − β)c_u)/c*) / (t − 1 + (e_{φn} − c*φn)/c*) )

and, iterating from φn to n,

  E_n^{desc} < W_{φn} · ( Γ(n − 1 + (βc_q + (1 − β)c_u + e_{φn} − c*φn)/c*) · Γ(φn + (e_{φn} − c*φn)/c*) ) / ( Γ(n − 1 + (e_{φn} − c*φn)/c*) · Γ(φn + (βc_q + (1 − β)c_u + e_{φn} − c*φn)/c*) ) = Θ( n^λ · ((n − 1)/(φn))^{(βc_q + (1 − β)c_u)/c*} ) ∈ Θ(n^λ)

⁵ Note that when a node is added all its edges are copied from its prototype, so the distance between any couple of pre-existing nodes cannot shrink after the insertion of a new node. Thus in the breadth-first tree built by A it holds that, for any internal node i, all the sons of i have been inserted after i.

The final technical steps use the concentration results on hereditary functions.
Specifically, we notice that the number of descendants can be seen as a hereditary function on the set of edges, where the boolean property is being a descendant of u. In addition, M[number of descendants] < c_m n^λ for some 0 < c_m < 1. By Proposition 2.3.1 and Theorem 2.3.1, we have that E_t^{desc} is sharply concentrated.

Furthermore, the set of relevant nodes has cardinality n/log n and is chosen uniformly at random; hence with high probability only an o(|L|) of the latecomers and their descendants are relevant. Thus only an o(|L|) of the branches of the breadth first search tree rooted at u and containing a node inserted after time φn will lead to a relevant node. So all but an o(|L|) of the latecomers will be in S_u. □

2.9.2 Sparsification with a stretching of the distances

In the previous subsection we have shown that we can reduce the number of edges in G(Q, E) by a constant factor using Algorithm A. In this section we study what we can achieve if we allow some bounded stretching of the shortest distance between two nodes.

We start by noticing that the graph B(Q, U) has a linear number of edges and that any distance between two nodes of Q in this graph is equal to 2 times the distance between the same nodes in G(Q, F); so, adding the edges in E − F, it seems that we have the perfect solution to our problem. Unfortunately, the original bipartite graph may not be available to us; nevertheless, we are able to exploit the underlying backbone structure of G to prove the following theorem.

Theorem 2.9.2 There is a polynomial algorithm that, for any fixed c_u, c_q, β, finds a graph G′(Q, E′) with a linear number of edges, where the distance between two nodes is at most k times larger than the distance in G(Q, E) and in Ĝ(Q, Ê), where k is a function of c_u, c_q, β.

Proof: First we notice that we can restrict our attention only to the folded edges; indeed, by construction, |E − F| ∈ Θ(n).
Let us say that S is a k-spanner of the graph G if it is a subgraph of G in which every two vertices are no more than k times further apart than they are in G. The problem of finding k-spanners of a graph has been studied extensively in several papers — [3, 6, 87], to name a few. In our analysis we consider the algorithm proposed in [3] for the unit-weight case. This algorithm builds the set E_S of edges of the 2k-spanner as follows: at the beginning E_S = ∅. The edges are processed one by one, and an edge is added to E_S if and only if it does not close a cycle of length 2k or smaller in the graph induced by the current spanner edges E_S. At the end of the process, the graph G(Q, E_S) is a 2k-spanner of G(Q, E), by construction and by the fact that the girth of G(V, E_S) is at least 2k + 1. Since a graph with more than n^{1+1/k} edges must have a cycle of length at most 2k, the algorithm builds a spanner of size O(n^{1+1/k}).

It is important to notice that if we apply the algorithm described above to G(Q, E) and G(Q, F), analyzing the edges in F in the same order, every edge deleted in G(Q, F) is deleted also in G(Q, E). Now in G(Q, F) we have that any clique generated by a node of U of degree d contains, after the sparsification, O(d^{1+1/k}) edges. Thus, using the algorithm described, we have the following upper bound on the number of edges of a 2k-spanner of G(Q, F).
  |F_S| ≤ Σ_{i=1}^{n} (# of nodes of degree i in U) · i^{1+1/k}

By Theorem 2.4.1, we have with high probability (here ∆ = 1/(4 + c_u(1−β)/(c_q β)), the exponent up to which Theorem 2.4.1 and Lemma 2.5.4 apply, and ε > 0 is small):

  |F_S| ≤ Σ_{i<n^{∆−ε}} (1 ± o(1)) · n / ( ζ(−2 − c_u(1−β)/(c_q β)) · i^{2+c_u(1−β)/(c_q β)} ) · i^{1+1/k} + Σ_{i≥n^{∆−ε}} (# of nodes of degree i in U) · i^{1+1/k}
       = Σ_{i<n^{∆−ε}} (1 ± o(1)) · n / ( ζ(−2 − c_u(1−β)/(c_q β)) · i^{2+c_u(1−β)/(c_q β)} ) · i^{1+1/k} + Σ_{i≥n^{∆−ε}} (# of edges pointing to a node in U of degree i) · i^{1/k}
       ≤ Σ_{i<n^{∆−ε}} (1 ± o(1)) · n / ( ζ(−2 − c_u(1−β)/(c_q β)) · i^{2+c_u(1−β)/(c_q β)} ) · i^{1+1/k} + n^{1/k} · Σ_{i≥n^{∆−ε}} (# of edges pointing to a node in U of degree i)

By Lemma 2.5.4:

  |F_S| ≤ Σ_{i<n^{∆−ε}} (1 ± o(1)) · n / ( ζ(−2 − c_u(1−β)/(c_q β)) · i^{2+c_u(1−β)/(c_q β)} ) · i^{1+1/k} + n^{1/k} · Θ( n · n^{−(∆−ε)·c_u(1−β)/(c_q β)} )

So if k > (4c_q β + c_u(1−β)) / (c_u(1−β)) then |F_S| ∈ Θ(n). Thus also |F_S ∪ (E − F)| ∈ Θ(n). □

2.10 Flexibility of the model

In this section we consider some variations of the model for which it is easy to prove that the main theorems still hold. We will analyze the two following cases:

• Instead of generating only one bipartite graph B(Q, U), a list B_0(Q, U), …, B_k(Q, U) of bipartite graphs⁶ is generated. At the same time the multigraph G(Q, E) evolves in parallel; besides "folding" length-2 paths in B_0, …, B_k into edges, we also add to G(Q, E) a few preferentially attached neighbors.

• Instead of "folding" length-2 paths in B into edges, for every pair of nodes in Q and every shared common neighbor u ∈ U between them, we randomly and independently place an edge between the two nodes in G(Q, E) with probability proportional to the reciprocal of d(u)^α, where d(·) denotes the degree and 0 < α < 1.

⁶ In this model the choice of adding a node to U or to Q is the same for all the graphs, but the number of edges added (c_{u_0}, c_{q_0}, …, c_{u_k}, c_{q_k}) and their destinations differ.
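The second variation can be illustrated with a short sketch, hypothetical in all its names: each interest's neighborhood is folded not into a clique but into a G(n, p), with p = 1/d(u)^α.

```python
import random
from itertools import combinations

def probabilistic_fold(interest_neighbors, alpha, rng=random.Random(0)):
    """Sketch of the second variation: instead of turning the set of
    people sharing an interest u into a clique, each pair of u's
    neighbors is connected independently with probability
    min(1, 1 / d(u)**alpha)."""
    edges = set()
    for u, people in interest_neighbors.items():
        d = len(people)
        if d < 2:
            continue
        p = min(1.0, 1.0 / d ** alpha)
        for a, b in combinations(sorted(people), 2):
            if rng.random() < p:          # independent coin per pair
                edges.add((a, b))
    return edges
```

With α = 0 the sketch degenerates to the original clique folding, which gives a quick sanity check; for 0 < α < 1 the expected number of folded edges per interest drops from Θ(d²) to Θ(d^{2−α}).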
In the first case, if for at least one bipartite graph c_{u_i} < (β/(1−β)) c_{q_i}, the densification of the edges and the shrinking/stabilizing of the effective diameter follow using the same arguments used in the proofs of Theorems 2.7.1, 2.7.2, 2.8.1 and 2.8.2. Furthermore, if k is constant, all the theorems on the degree distributions of G(Q, E) and Ĝ(Q, Ê) continue to hold.

In the second case it is sufficient to notice that every node u in U of B(Q, U) is no longer substituted by a clique but by a G(n, p), with n = d(u) and p = 1/d(u)^α. Now, if c_u < (β/(1−β)) c_q (1−α), using the same argument of Theorem 2.7 and the Chernoff bound we obtain the densification of the edges. The shrinking/stabilizing diameter in this case follows from the fact that most of the nodes will point to a high degree node in G(Q, E)⁷ and that a G(n, p) with p = n^{−α}, for 0 < α < 1, has constant diameter by [13]. Finally, also in this case the degree distribution is heavy-tailed, because with high probability the complementary cumulative distribution function of the degrees of the nodes dominates the complementary cumulative distribution function of the degrees of Q in B(Q, U).

⁷ This can be proved using the same proof strategy as before.

Chapter 3

Navigability of Affiliation Networks

We demonstrate how the Affiliation Networks model offers powerful cues for local routing in social networks, a theme made famous by the sociologist Milgram's "six degrees of separation" experiments. This model posits the existence of an "interest space" that underlies a social network; we prove that in networks produced by this model, not only do short paths exist among all pairs of nodes, but natural local routing algorithms can discover them effectively. Specifically, we show that local routing can discover paths of length O(log² n) to targets chosen uniformly at random, and paths of length O(1) to targets chosen with probability proportional to their degrees.
Experiments on the co-authorship graph derived from DBLP data confirm our theoretical results, and shed light on the power of one step of lookahead in routing algorithms for social networks.

3.1 Introduction

Milgram's six-degrees-of-separation experiment [79, 101], and the fascinating small world hypothesis that follows from it, have generated a lot of interesting research in recent years. In this landmark experiment, human subjects were asked to deliver a letter to a target person in a far away city, but only if they knew the target on a first name basis. Otherwise, they had to pass the letter along to a friend who, recursively, would follow the same instructions. The surprising outcome was that a reasonably large fraction of the letters reached the target and, moreover, they did so in very few hops. This led to the fascinating small world hypothesis: take any two people in a social network, and they will be connected by a short chain of acquaintances. The extent to which the hypothesis is true is still actively debated, and no evolving model for social networks that exhibits their standard statistical properties (i.e. power law degree distribution [21, 38], high clustering coefficient [103], densification and shrinking diameter [71]) can at the same time explain the small world phenomenon.

The main contribution of this chapter is to create a bridge between the analysis of the small world phenomenon and the analysis of evolving models for social networks. In particular, we introduce a new dynamic model that explains the small world phenomenon together with all the standard properties of social networks. The model is based on affiliation networks, a very natural model for situations such as those considered in this chapter.

The work described in this chapter is joint work with A. Panconesi and D. Sivakumar.
This model, studied in the previous chapter, gives a very good explanation of evolutionary properties such as the densification and the shrinking diameter observed in [71], and of several other properties that arise in social network analysis [63]. In this chapter we introduce and analyze a more general version of the model and show that it naturally defines, in an implicit way, a space of interests that co-evolves with the social network, and that furthermore this space is navigable. The model is more general than the one introduced in the previous chapter because new interests that join in can be a perturbation of a mixture of pre-existing interests. In the original model, a new interest could only be a perturbation of one pre-existing interest. Similarly, a new person joining the network will share a subset of the interests of several friends, as opposed to just one of them. Thus, this extension is more natural and flexible. In this chapter we prove that this enhanced model has several strong properties that are especially relevant for modeling small worlds.

Our model is the first to exhibit simultaneously three different sets of properties of social networks: the small world property, the evolutionary properties, and the navigability of the interest space. In previous attempts, these features were somehow captured, but separately. For instance, the models in [41, 42, 58, 103] deal with the small world phenomenon, but they are static and unable to explain evolutionary properties, or even the heavy-tailed distribution of popularity (number of friends). Furthermore, they assume that every person knows the distance between its neighbors and the target; instead, we only assume that every person knows the closeness between two interests. There have also been some attempts to define and navigate an interest space instead of geographic information [60, 105], or to use a latent space of interests to define the friendship graph [89, 93].
But, again, these models are static (the number of nodes in the graph does not increase in time) and unable to explain evolutionary properties. In contrast, in our model all these different aspects come forth naturally from the same model.

Our model also matches the experimental evidence from a quantitative point of view. The effective diameter of the friendship graph is upper bounded by a constant. This is compatible with the empirical observations of [70], where a huge social network of hundreds of millions of nodes was analyzed and its effective diameter found to be a very small number. When we analyze the actual working of Milgram routing in the friendship graph (not to be confused with the mere existence of short paths), we find that when source and target are chosen at random, their expected routing distance is O(log² n). The novelty here is that to find this short chain we navigate the interest space associated with the affiliation network, and not the friendship graph itself. When the target is chosen by popularity, i.e. with probability proportional to the number of friends, the expected length of the chain can be upper bounded by a constant. This is quite in line with the experimental evidence with human subjects. It has been pointed out that the successful outcome of Milgram's experiment was due to the fact that the target was a person of high social status and had a profession that contributed even more than his status to establish and nurture many social connections. When the experiment was repeated using targets of low social status, the outcome was indeed quite different [61]. Our model captures these features of the real world very nicely. Further, in accordance with the observation of Granovetter [49], the proofs of the upper bounds for the diameter and the expected routing distance make heavy use of the presence of weak ties (i.e. preferential attachment edges in the model).
Finally, we point out that our model is the only one to capture, together with evolutionary properties and the notion of a navigable interest space, another crucial property of social networks: the heavy-tailed distribution of popularity (number of friends).

Another important issue with Milgram's small world hypothesis is the structural hardness of its verification. Milgram's painstaking work enabled him to collect data on a few hundred individuals; to overcome this limitation it is now possible to use large-scale social networking sites, since "in silico" experiments that make use of social networks can easily manage millions of individuals. Furthermore, the issue of attrition, the natural tendency of human subjects to drop out of the experiment, disappears. For such reasons, several "cyber replicas" of the experiment have been performed [70, 73]. These replicas confirm the small world hypothesis qualitatively, but they are very crude simulations of the experiment. In one such instance, for example [73], a snapshot of the social networking site LiveJournal was downloaded to obtain a social network of roughly 15 million individuals. The experiment was simulated by picking source and target at random, and by moving toward the target according to geographical proximity (geo-greedy): from the current node X we move to the neighbor of X that is closest to the target. In another instance [70], the effective diameter of the social network of IM chat exchanges was estimated and found to be compatible with the small world hypothesis. The main drawback of these approaches is that they only take into account geographical or positional information, while it is clear that other cues play a role. In the original experiment, subjects knew the profession of the target, and this information proved to be crucial.
This motivates the second question that we address in this chapter: Is it possible to perform a cyber-replica of Milgram's experiment in which a cognitive "space of interests" is navigated? We show here that this is possible. In our experiment we consider a social network of co-authorships of computer science papers. Two people in this network are "friends" if they are co-authors. We then extract a space of interests consisting of computer science topics. In simulating the experiment, we go from person to person by moving to the friend of the current person that has the most interests in common with the target. By and large, the outcome confirms the small-world hypothesis in general, and in particular our assertion that navigating the dual space of interests offers powerful cues in decentralized routing. Furthermore, our experiments strongly reinforce two significant pieces of work in the sociology literature: the importance of weak ties [49] and the significance of the social status of the target node in Milgram's experiment [61]. Finally, since our experiments are based on publicly available data, it should be possible for other researchers to replicate our work as well as derive additional insights into small-world routing.

3.2 Our model

The model that we consider in this chapter is a variation of the Affiliation Networks model presented in the previous chapter. In both models, two graphs evolve at the same time. The first is a bipartite graph, denoted as B(P, I), that represents the affiliation network, with a set P of people on one side and a set I of interests on the other. An edge (p, i) represents the fact that p is interested in i. The second graph is a friendship network, denoted as G(P, E), representing friendship relations within the same set P of people. In this graph, people can be friends for two different reasons: because they share an interest or because of preferential attachment.
Thus, G is the "folding" of B, plus a set of edges generated by preferential attachment. The two evolving processes are defined as follows.

B(P, I). Fix two integers k1 and k2, fix k1 + k2 positive integers c_{p_1}, ..., c_{p_{k1}}, c_{i_1}, ..., c_{i_{k2}} with Σ_{j=1}^{k1} c_{p_j} = c_p and Σ_{j=1}^{k2} c_{i_j} = c_i > 0, and let β ∈ (0, 1). At time 0, the bipartite graph B_0(P, I) is a simple graph with at least c_p c_i edges, where each node in P has at least c_p edges and each node in I has at least c_i edges. At time t > 0:

(Evolution of P) With probability β:
(Arrival) A new node p is added to P.
(Preferentially chosen prototypes) A set of nodes p_1, ..., p_{k1} ∈ P, with k1 ≥ 1, are chosen as prototypes for the new node, with probability proportional to their degrees.
(Edge copying) For 1 ≤ j ≤ k1, c_{p_j} edges are "copied" from p_j; that is, c_{p_j} neighbors of p_j, denoted by i_1, ..., i_{c_{p_j}}, are chosen uniformly at random (without replacement), and the edges (p, i_1), ..., (p, i_{c_{p_j}}) are added to the graph.

(Evolution of I) With probability 1 − β, a new node i is added to I following the symmetric process, adding c_i edges to i.

G(P, E). Fix, in addition, an integer s. At time 0, G_0(P, E) consists of the subset P of the vertices of B_0(P, I), and two vertices have an edge between them for every neighbor in I that they have in common in B_0(P, I). At time t > 0:

(Evolution of P) With probability β:
(Arrival) A new node p is added to P.
(Edges via prototype) An edge between p and another node of P is added for every neighbor that they have in common in B(P, I) (note that this is done after the edges for p are determined in B).
(Preferentially chosen edges) A set of s nodes p_{l_1}, ..., p_{l_s} is chosen, each node independently of the others (with replacement), by choosing vertices with probability proportional to their degrees, and the edges (p, p_{l_1}), ..., (p, p_{l_s}) are added to G(P, E).

(Edges via evolution of I) With probability 1 − β: a new edge is added between two nodes p_1 and p_2 if the new node i added to I is a neighbor of both p_1 and p_2 in B(P, I).

In the previous chapter the graph B evolves as follows: when a new interest (resp. person) comes in, it selects a prototype node among the existing interests (resp. people) and copies it with a small perturbation. In this new version, when a new node joins B it can select more than one prototype. A new interest, for example, will be a slightly perturbed mixture of a few existing interests, and a new person will be interested in a combination of the interests of his or her friends. This new model seems more realistic and, from the technical point of view, it presents a few complications that make it a non-straightforward extension of the previous one. Furthermore, in this new version of the model it is possible to prove some additional properties of the graph, such as the constant effective diameter and the navigability. The description above states the model precisely; for readability, we present the two evolution processes separately even though the two graphs evolve together.

Before proceeding, let us introduce some terminology. An edge of G between two people that comes from the fact that these two people share an interest in B is called a folded edge. The set of folded edges is denoted by F. In the next section we introduce some notation and some results that we will use in the rest of the chapter.

3.3 Preliminaries

We say that an event occurs with high probability (whp) if it happens with probability 1 − o(1), where the o(1) term goes to zero as n (the number of vertices) goes to ∞. Finally, recall that a random variable X is said to be heavy-tailed if lim_{x→∞} e^{λx} Pr[X > x] = ∞ for all constants λ > 0.
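As a concrete illustration, the evolution of the bipartite graph B(P, I) described in Section 3.2 can be simulated in a few lines. This is a minimal sketch under illustrative parameter values; as a simplification, neighbors copied from different prototypes are merged into one set, so a new node may receive slightly fewer than c_p (resp. c_i) edges when the prototypes overlap.

```python
import random

def evolve_B(n_steps, beta=0.5, cp_parts=(2, 1), ci_parts=(2, 1), seed=0):
    """Minimal sketch of the evolution of B(P, I): each new node picks one
    prototype per part, preferentially by degree, and copies a fixed number
    of that prototype's neighbors.  Returns the two adjacency maps."""
    rng = random.Random(seed)
    cp, ci = sum(cp_parts), sum(ci_parts)
    # Seed graph B_0: complete bipartite, so the minimum-degree requirements hold.
    P = {p: set(range(ci)) for p in range(cp)}   # person -> set of interests
    I = {i: set(range(cp)) for i in range(ci)}   # interest -> set of people
    for _ in range(n_steps):
        if rng.random() < beta:                  # a new person arrives
            side, other, parts = P, I, cp_parts
        else:                                    # a new interest arrives (symmetric)
            side, other, parts = I, P, ci_parts
        new = len(side)
        nodes = list(side)
        degrees = [len(side[v]) for v in nodes]
        # Preferentially chosen prototypes: one per part, weighted by degree.
        protos = rng.choices(nodes, weights=degrees, k=len(parts))
        copied = set()
        for proto, c in zip(protos, parts):
            # Copy c distinct neighbors of the prototype (without replacement).
            copied |= set(rng.sample(sorted(side[proto]), min(c, len(side[proto]))))
        side[new] = copied                       # duplicates across prototypes merge
        for u in copied:
            other[u].add(new)
    return P, I

P, I = evolve_B(500)
print(len(P), len(I))
```

With more steps one can inspect the degree sequence of I (or P) and compare its tail with the power law stated in Theorem 3.4.1 below.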
3.3.1 Concentration Theorems

Now we recall three important properties of functions that make the task of establishing measure concentration results easier, and we present the relevant concentration results from the literature (see [35]). First we present the simplest version of the method of bounded differences.

Definition 3.3.1 [Lipschitz Condition] A function f satisfies the Lipschitz condition with parameters d_j, j ∈ [n], with respect to the random variables X_1, ..., X_n if, for any a_j, a'_j and for every 1 ≤ j ≤ n,

|f(X_1 = a_1, ..., X_j = a_j, ..., X_n = a_n) − f(X_1 = a_1, ..., X_j = a'_j, ..., X_n = a_n)| ≤ d_j.

Theorem 3.3.1 [cf. [35, 77]] Assume f satisfies the Lipschitz condition with respect to the variables X_1, ..., X_n with parameters d_j, j ∈ [n]. Then Pr[|f − E[f]| > t] ≤ exp(−t²/(2d)), where d = Σ_{j≤n} d_j².

Now we recall an extension of the Lipschitz condition and of the method of bounded differences introduced in Chapter 2.

Definition 3.3.2 [Averaged Lipschitz Condition] A function f satisfies the averaged Lipschitz condition with parameters c_j, j ∈ [n], with respect to the random variables X_1, ..., X_n if, for any a_j, a'_j and for every 1 ≤ j ≤ n,

|E[f(X_1, ..., X_n) | X_1 = a_1, ..., X_j = a_j] − E[f(X_1, ..., X_n) | X_1 = a_1, ..., X_j = a'_j]| ≤ c_j.

Lemma 3.3.1 [cf. [35, 77]] Assume f satisfies the averaged Lipschitz condition with respect to the variables X_1, ..., X_n with parameters c_j, j ∈ [n]. Then Pr[|f − E[f]| > t] ≤ 2 exp(−t²/(2c)), where c = Σ_{j≤n} c_j².

Finally, we recall a concentration result on hereditary functions that will be used in the proof of Lemma 3.4.1 and that was already introduced in Chapter 2.
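The bounded-differences bound of Theorem 3.3.1 can be checked numerically on a toy function: the number of distinct values among n independent draws changes by at most 1 when a single coordinate changes, so it is Lipschitz with d_j = 1 for every j. The parameter values below are illustrative:

```python
import random, math

def distinct_count(xs):
    """f(x_1, ..., x_n) = number of distinct values; changing one coordinate
    changes f by at most 1, so f is Lipschitz with d_j = 1 for every j."""
    return len(set(xs))

rng = random.Random(1)
n, m, trials, t = 100, 50, 2000, 12
samples = [distinct_count([rng.randrange(m) for _ in range(n)]) for _ in range(trials)]
mean = sum(samples) / trials
empirical = sum(abs(s - mean) > t for s in samples) / trials
bound = math.exp(-t * t / (2 * n))      # Theorem 3.3.1 with d = sum_j d_j^2 = n
print(f"empirical tail {empirical:.4f}  <=  bound {bound:.4f}")
```

The empirical tail probability is far below the bound, which is typical: the method of bounded differences is convenient rather than tight.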
Definition 3.3.3 (Hereditary Property and Hereditary Function) A Boolean property ρ(x, J), where x is a sequence of n reals and J is a family of subsets of [n], is said to be a hereditary property of index sets if: (1) ρ is a property of index sets, that is, if x_j = y_j for every j ∈ J ∈ J, then ρ(x, J) = ρ(y, J); (2) ρ is non-increasing on the index sets, that is, if I ⊆ J, then ρ(x, J) ⇒ ρ(x, I). Let f_ρ be the function determined by a hereditary property of index sets ρ, given by f_ρ = max_{J : ρ(x, J)} |J|; we will call f_ρ a hereditary function of index sets.

The concentration result for hereditary functions of index sets is a consequence of Talagrand's inequality and was proven in [34].

Theorem 3.3.2 [[34]] Let f_ρ be a hereditary function of index sets. Then for all t > 0, Pr[f > M[f] + t] ≤ 2 exp(−t²/(4(M[f] + t))) and Pr[f < M[f] − t] ≤ 2 exp(−t²/(4 M[f])).

The next proposition relates the concentration theorems for the median value of a function to concentration theorems for its mean value.

Proposition 3.3.1 The following are equivalent for an arbitrary function f and random variables X_1, ..., X_n: (1) for all t > 0, there exist c_1, α_1 > 0 such that Pr[|f − E[f]| > t] ≤ c_1 e^{−α_1 t}; (2) for all t > 0, there exist c_2, α_2 > 0 such that Pr[|f − M[f]| > t] ≤ c_2 e^{−α_2 t}.

3.4 Properties of the model

In this section we give some definitions and then describe some relevant properties of the model. We first define the concepts of effective diameter, core and hubs of a graph.

Definition 3.4.1 [Effective Diameter] For 0 < q < 1, we define the q-effective diameter as the minimum d_e such that, for at least a q fraction of the node pairs, the shortest path between the pair is at most d_e.

Definition 3.4.2 [Core and hubs of B(P, I)] The core of B(P, I) is the set C ⊆ I of vertices for which there exist two constants ε, α > 0 such that for all v ∈ C, d(v) ≥ αn^ε. The hubs are the set of vertices in P that are at distance 1 from the core.
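The q-effective diameter of Definition 3.4.1 can be estimated by sampling node pairs and taking the appropriate quantile of the BFS distances. A minimal sketch (the toy path graph at the end is illustrative):

```python
import random
from collections import deque

def bfs_dist(adj, s, t):
    """Shortest-path (hop) distance from s to t by breadth-first search."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        if v == t:
            return dist[v]
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return float("inf")

def q_effective_diameter(adj, q=0.9, pairs=500, seed=0):
    """Smallest d such that at least a q fraction of the sampled node pairs
    are within distance d (Definition 3.4.1, estimated by sampling)."""
    rng = random.Random(seed)
    nodes = list(adj)
    dists = sorted(bfs_dist(adj, *rng.sample(nodes, 2)) for _ in range(pairs))
    return dists[min(int(q * pairs), pairs - 1)]

# Toy example: a path 0-1-2-...-9; most pairs are close, the extremes are not.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
print(q_effective_diameter(path, q=0.5))
```

Sampling is what makes the notion usable on graphs with millions of nodes, where computing all-pairs shortest paths is out of the question.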
Now we introduce some properties that our model shares with the original Affiliation Networks model introduced in Chapter 2. Most of the techniques that we use in the proofs are inspired by the results of Chapter 2.

Theorem 3.4.1 [General properties of the model] If c_i < (β/(1−β)) c_p, we have that:

(1) For the bipartite graph B(P, I) generated after n steps, almost surely, when n → ∞, the degree sequence of the nodes in P (resp. I) follows a power law distribution with exponent α = −2 − c_p β/(c_i(1−β)) (resp. α = −2 − c_i(1−β)/(c_p β)), for every degree smaller than n^γ, with γ < 1/(4 + c_p β/(c_i(1−β))) (resp. γ < 1/(4 + c_i(1−β)/(c_p β))), with high probability.

(2) The degree distribution of the graph G(P, E) is heavy-tailed with high probability.

(3) The number of edges in G(P, E) is ω(n) with high probability.

(4) The q-effective diameter of G(P, E) shrinks or stabilizes after time φn with high probability, for any constant 0 < φ < 1 and for any constant 0 < q < 1.

Proof: We show that our new model has many statistical properties in common with the model presented in Chapter 2; the basic idea is to show that the expected number of nodes of degree k evolves in the same way in our new model and in the Affiliation Networks model, so that we get similar properties.

Let X_t^i be the random variable that counts the number of nodes in P of degree i at time t. We want to express E_t^i = E[X_t^i] in terms of E_{t−1}^i = E[X_{t−1}^i]. In the case of the Affiliation Networks model, recall from Chapter 2 (Theorem 2.4.1) that

E_t^{c_p} = E_{t−1}^{c_p} (1 − (1−β) c_i c_p/e_{t−1}) + β    (3.1)

and

E_t^i = E_{t−1}^i (1 − (1−β) c_i i/e_{t−1}) + (1−β) c_i ((i−1)/e_{t−1}) E_{t−1}^{i−1}.    (3.2)

Similarly, for our new model we have that

E_t^{c_p} = E_{t−1}^{c_p} + Pr[a new node is added to P] − Pr[a new node is added to I] · E[number of nodes in P of degree c_p at time t − 1 whose degree increases | a node is added to I]
= E_{t−1}^{c_p} + β − (1−β) Σ_{l=1}^{k2} Σ_{j=1}^{c_{i_l}} Pr[a node counted by E_{t−1}^{c_p} is chosen as an endpoint of the j-th copied edge],

where the last equation follows from the linearity of expectation. In addition, if we focus on the addition of a single edge, every edge has the same probability of being copied by the process; thus we get that

E_t^{c_p} = E_{t−1}^{c_p} (1 − (1−β) c_i c_p/e_{t−1}) + β.

So in the base case the recursive equation for the new model is equal to the equation for the Affiliation Networks model of Chapter 2. Now let us consider the general case. We have that (here C(c_i, j) denotes the binomial coefficient):

E_t^i = E_{t−1}^i − E[number of nodes in P, of degree i at time t − 1, that increase their degree] + E[number of nodes in P of degree smaller than i at time t − 1 that increase their degree to i]
= E_{t−1}^i − E[number of nodes in P, of degree i at time t − 1, that increase their degree] + Σ_{j=1}^{k2} E[number of nodes in P of degree i − j at time t − 1 that increase their degree to i]
≤ E_{t−1}^i (1 − (1−β) c_i i/e_{t−1}) + (1−β) c_i ((i−1)/e_{t−1}) E_{t−1}^{i−1} + Σ_{j=2}^{k2} C(c_i, j) ((i−j)/e_{t−1})^j E_{t−1}^{i−j},

where the inequality follows from the same observations as above and the linearity of expectation. Furthermore, we can also prove a lower bound on the expectation using the same equalities:

E_t^i ≥ E_{t−1}^i (1 − (1−β) c_i i/e_{t−1}) + (1−β) c_i ((i−1)/e_{t−1}) E_{t−1}^{i−1}.

Thus in this case only the lower bound for E_t^i coincides with the equation for E_t^i in the original Affiliation Networks model, so we cannot use the results of Chapter 2 directly. Fortunately, we are interested in the behavior of E_t^i when t → ∞.
In particular we are interested in the value of

Y^i = lim_{t→∞} E_t^i / t.

In the upper bound on E_t^i the additional term Σ_{j=2}^{k2} C(c_i, j) ((i−j)/e_{t−1})^j E_{t−1}^{i−j} goes to zero when t → ∞, so the upper and the lower bound are tight when t → ∞. Hence, in the limit, the equation for Y^i is the same as for the original Affiliation Networks model; using the same techniques we get that the Y^i have the same value in the new and in the original model. More precisely, we get that

Y^i ∼ i^{−2 − c_p β/(c_i(1−β))}.

To finish the proof of property (1) we have to show that the number of nodes of degree i at time t, X_t^i, is concentrated around E_t^i. To do this we will use the concentration results of Section 3.3.1, as in Chapter 2; in particular, we would like to follow the same techniques used in Chapter 2, but unfortunately we cannot do so directly, because we have to take into account the additional terms in the upper bound on E_t^i. We define ∆_t^i = |E_t^i − Ê_t^i|, where E_t^i = E[X_t^i | x_1 = a_1, x_2 = a_2, ..., x_s = a_s] and Ê_t^i = E[X_t^i | x_1 = a_1, x_2 = a_2, ..., x_s = a'_s], with s > 0. In Chapter 2 it is shown that ∆_t^i ≤ ∆_{t−1}^i + 2c_i + 2c_p. In our case we note that the additive factor Σ_{j=2}^{k2} C(c_i, j) ((i−j)/e_{t−1})^j E_{t−1}^{i−j} is at most k2; thus, using the same algebraic manipulation presented in Chapter 2, we get that in our case ∆_t^i ≤ ∆_{t−1}^i + 2c_i + 2c_p + k2. So our variables satisfy the averaged Lipschitz condition with parameters 2c_p + (2c_p + 2c_i + k2)(i + 1); hence, combining the bounds for Y^i and using Lemma 3.3.1, we get property (1) for the set P. Finally, by the symmetry of the degree distributions of P and I, we obtain property (1).

The proofs of properties (2)-(4) follow directly from property (1) and the presence of the preferential attachment edges, as shown in Chapter 2. We refer to Chapter 2 for more details on those proofs.

Now we prove two technical lemmata on the evolution of our graph model that we will use in the following sections. In the following lemma we give an explicit relation between the degree of a node at time εn and its final degree.
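The limiting value Y^i can be checked numerically. Substituting the stationary ansatz E_t^i ≈ Y^i t and e_t ≈ (β c_p + (1−β) c_i) t into the lower-bound recursion gives Y^i (1 + A i) = A (i−1) Y^{i−1} + β·[i = c_p], with A = (1−β) c_i/(β c_p + (1−β) c_i); the tail of this sequence has log-log slope −2 − c_p β/(c_i(1−β)). A quick numerical sketch (parameter values are illustrative):

```python
import math

beta, cp, ci = 0.5, 3, 3
mu = beta * cp + (1 - beta) * ci            # expected edges added per step
A = (1 - beta) * ci / mu

# Stationary recurrence for Y^i = lim E_t^i / t:
#   Y^i (1 + A i) = A (i - 1) Y^{i-1} + beta * [i == cp]
N = 200_000
Y = [0.0] * (N + 1)
for i in range(cp, N + 1):
    Y[i] = (A * (i - 1) * Y[i - 1] + (beta if i == cp else 0.0)) / (1 + A * i)

# The log-log slope over the tail should approach -2 - cp*beta/(ci*(1-beta)).
i1, i2 = 1000, N
slope = (math.log(Y[i2]) - math.log(Y[i1])) / (math.log(i2) - math.log(i1))
print(slope, -2 - cp * beta / (ci * (1 - beta)))
```

With β = 0.5 and c_p = c_i the predicted exponent is −3, and the fitted slope agrees to within a fraction of a percent.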
Lemma 3.4.1 Let v be a node in B(P, I) with degree g(n) at time εn, with g(n) ∈ Ω(log² n). Then, with high probability, its degree at time n is smaller than C · g(n), for every constant ε > 0 and some constant C > 0. Furthermore, if a node v has degree o(log² n) at time εn or is inserted after time εn, for any constant ε > 0, then the final degree of v is in o(log² n) with high probability.

Proof: Let v be a node in B(P, I) with degree g(n) at time εn, with g(n) ∈ Ω(log² n); without loss of generality we assume that v ∈ P. First we find an upper bound on the expected final degree of a vertex of degree g(n). Let E_t be the expected degree of v at time t. We have that E_{εn} = g(n) and, for t > εn,

E_t = E_{t−1} + E[new edges pointing to v added at time t] ≤ E_{t−1} + (c_i/e_{t−1}) E_{t−1} ≤ E_{t−1} (1 + c_i/((t−1) c_min)),

where e_{t−1} is the number of edges at time t − 1 and c_min = min(c_i, c_p). Thus, letting a = c_i/c_min, we have:

E_t ≤ g(n) Π_{i=εn+1}^{t} (1 + a/(i−1)) = g(n) · (Γ(t + a)/Γ(t)) · (Γ(εn)/Γ(εn + a)) ≤ C' g(n) (t/(εn))^a = C'' g(n),

where C' and C'' are two positive constants. Furthermore, E_t ≥ g(n), and it is the expected value of a hereditary function with median Θ(g(n)). Thus, applying Theorem 3.3.2 and Proposition 3.3.1, we have that Pr[d(v) > C''' g(n)] = Θ(e^{−log² n}) = o(n^{−1}), for some positive constant C'''. Thus, using the union bound on the number of nodes, we get the first part of the claim.

Now let us consider the case in which v is inserted after time εn or has degree in o(log² n). Let h(n) be the degree of v at time αn, where αn is εn or the insertion time of v if the latter is bigger than εn. Using the same derivation as above we get that E_t ≤ K · h(n) for some positive constant K; thus, noticing that the median final degree of v is in Θ(h(n)) and applying Theorem 3.3.2, we get that Pr[d(v) ≥ K log² n] < e^{−K' log² n} = o(n^{−1}), for every constant K > 0 and some constant K' > 0.
Thus, using again the union bound on the number of nodes, we get the claim.

Finally, we prove a connectivity property of the nodes inserted after time φn, for some constant φ > 0.

Lemma 3.4.2 If c_i < (β/(1−β)) c_p, then any node of P inserted after time φn, for any constant φ > 0, will have, with probability 1 − o(1), at least one preferential attachment edge incident to a hub in G(P, E).

Proof: Our proof strategy is to lower bound the volume of the hubs in G at time φn and then to prove that with high probability every new node will add an edge to them. Note that by definition the volume of the hubs is bigger than or equal to the sum, over the interests in the core of B(P, I), of the number of pairs of their neighbors. Thus, using Lemma 3.4.1, we have that at time t > φn (here C(d, 2) denotes the binomial coefficient):

Σ_{v∈hubs} d_G(v) ≥ Σ_{i∈C} C(d(i), 2) ≥ Σ_{t^ε ≤ d ≤ t^γ} C(d, 2) · (number of interests of degree d at time t) ≥ Θ( Σ_{t^ε ≤ d ≤ t^γ} t · d^{−2 − c_i(1−β)/(c_p β)} · C(d, 2) ) ∈ Ω( t^{1+γ(1 − c_i(1−β)/(c_p β))} − t^{1+ε(1 − c_i(1−β)/(c_p β))} ) ∈ Ω( t^{1+γ(1 − c_i(1−β)/(c_p β))} ),

where the bound on the number of interests of degree d follows from property (1) of Theorem 3.4.1 and Lemma 3.4.1, and the last two passages follow from Σ_{i=k}^{n} i^α ∈ Θ(n^{1+α} − k^{1+α}) for −1 < α < 0. Similarly, for any t > φn, we have that:

Σ_{v∉hubs} d_G(v) ≤ Σ_{d < t^ε} C(d, 2) · (number of interests of degree d at time t) + (edges added via preferential attachment) ≤ Θ( Σ_{d < t^ε} t · d^{−2 − c_i(1−β)/(c_p β)} · C(d, 2) ) + n ∈ Θ( t^{1+ε(1 − c_i(1−β)/(c_p β))} ).

Thus, by taking a small enough constant ε > 0, all the nodes added after time φn will point to at least one hub with high probability.

3.5 The crucial role of weak ties

In this section we study the effective diameter of G(P, E) and show that it is upper bounded by a constant (it is unknown whether this property holds in the original Affiliation Networks model). This property is a consequence of the coexistence of folded and preferential attachment edges. Several studies have shown that links in a social network can be of two types: local ties and long-range, also called weak, ties [49].
Weak ties have several important structural properties: for instance, they form bridges between different communities and, in particular, they are the crucial ingredient that makes small worlds possible. It is thanks to them that Milgram's routing can be so effective and fast. In our model folded edges are local, for they connect people within a community of shared interests, while preferential attachment edges are the weak (or long-range) ties [58, 59]. Note that, in accordance with the previous literature and sociological intuition, in our model weak ties are very few compared to folded edges. In this section we show that weak ties play another interesting structural function that is in accordance with the empirical evidence: it is because of them that the diameter of the friendship graph shrinks to the point that the effective diameter is bounded by a constant. Our proof also uses in a fundamental way the presence of hubs. This might seem in contrast with the results in [31], where the authors suggest that their role is not relevant. A possible explanation is that they consider only the degree induced by the explored paths, and thus consider only a subgraph of the social network; it is therefore possible that in their experiments a high-degree node seems to have small degree just because only a few messages passed through it. In our proof, we consider the real degree of a node. We note that our results are in line with the original findings of Milgram [79] and also with our experiments, presented in Section 3.7. The main theorem is the following.

Theorem 3.5.1 Let c_i < (β/(1−β)) c_p. Then the q-effective diameter of G(P, E) is constant with high probability, for every constant q < 1.

To prove Theorem 3.5.1 we first show the following lemma on the maximum distance between two nodes of the core in B(P, I).

Lemma 3.5.1 Let c_i < (β/(1−β)) c_p.
Then, there exists a constant D such that for any pair of nodes u, v ∈ C the distance between u and v in B(P, I) is smaller than or equal to D with high probability.

Proof: The idea behind the proof is to show that B(P, I) contains a subgraph with properties similar to those of an Erdős-Rényi random graph. More specifically, we will show that a graph composed of the nodes in the core and some paths of length 2 between them behaves like an Erdős-Rényi random graph G(C, M), i.e. a graph chosen uniformly at random among all graphs having |C| nodes and |M| edges. In addition, we will prove that in this graph |M| = Ω(|C|^{1+α}), for some constant α. Thus, from [57] it follows that the diameter of G(C, M) is smaller than or equal to 1/α with high probability, and so we will get that the diameter of the core is bounded by 2/α with high probability.

Consider the following alternative description of the evolution of B(P, I). With probability β a new node v is added to P. Then the following steps take place:

• The new node v selects k1 edges (p_1, i_1), ..., (p_{k1}, i_{k1}) uniformly at random, and the edges (v, i_1), ..., (v, i_{k1}) are added to B(P, I);

• for j = 1, 2, ..., k1, c_{p_j} − 1 nodes are chosen uniformly at random in N(p_j) \ {i_j}, and v is connected to them, where N(p_j) is the neighborhood of p_j.

A symmetric process takes place when a node is added to I. Note that, from Definition 3.4.2 and from Lemma 3.4.1, all nodes in C are inserted before time φn with high probability, for any constant φ > 0. Let us fix a time δn for some fixed constant δ > 0 and let d be the minimum degree of a node in C at time δn. We say that an edge is fair if it is among the first d edges that are added to a node in C. When a new node, inserted after time δn, selects two fair edges in the first step of our alternative description of the process, we say that a pseudo-edge is added between the two endpoints in I of the fair edges.
Note that with this definition every node in I is selected as an endpoint of a pseudo-edge with the same probability. So we can define an Erdős-Rényi-like random graph G(C, M) that consists of the nodes in the core and the pseudo-edges. Now we want to find a bound on the number of pseudo-edges. From Theorem 3.4.1 and Lemma 3.4.1, we have with high probability that the number of fair edges is larger than or equal to d|C|. Thus, at any step, the probability of adding a pseudo-edge is larger than or equal to (d/(n(c_p + c_i))) |C|. By computing the expectation and applying the Chernoff bound, we get that the number of pseudo-edges in G(C, M) is in Ω(d|C|/(c_p + c_i)) with high probability. Noticing that only an o(1) fraction of the pseudo-edges are loops (the probability of introducing a loop at any step is (d/n)²), and using the bound on the diameter of a G(N, M) graph with M = N^{1+α} presented in [57], we get that the maximum distance between any two nodes in the core is upper bounded by a constant.

The following corollary is a consequence of Lemma 3.5.1.

Corollary 3.5.1 Let c_i < (β/(1−β)) c_p. Then the hubs are at constant distance in G(P, E) and in B(P, I) with high probability.

Now we prove Theorem 3.5.1.

Proof: Recall from Lemma 3.4.2 that all nodes in P inserted after time φn, for any φ > 0, will have at least one preferential attachment edge incident to a hub with probability 1 − o(1). Now, let X_i be the random variable such that X_i = 1 if node i has a hub in its neighborhood and X_i = 0 otherwise. The number of nodes that have at least one hub in their neighborhood is Σ_{i=1}^{n} X_i ≥ Σ_{i=φn}^{n} X_i. From Lemma 3.4.2 it follows that E[Σ_{i=φn}^{n} X_i] ≥ (1 − c)n, for any constant c > φ. Observe that each X_i satisfies the Lipschitz condition with d_i equal to 1. So by Theorem 3.3.1 we have that Σ_{i=φn}^{n} X_i ≥ (1 − c')n with high probability, for any constant c' > c. Hence the claim follows from Corollary 3.5.1.
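The ingredient borrowed from [57], namely that a uniform random graph with N nodes and N^{1+α} edges has constant diameter with high probability, can be sanity-checked on a small instance (the sizes below are illustrative):

```python
import random
from collections import deque

def gnm_diameter(n, m, seed=0):
    """Diameter of a uniform random graph with n nodes and m distinct edges."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < m:
        u, v = rng.sample(range(n), 2)
        edges.add((min(u, v), max(u, v)))
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    diam = 0
    for s in range(n):                       # exact diameter: BFS from every node
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        assert len(dist) == n, "graph should be connected at this edge density"
        diam = max(diam, max(dist.values()))
    return diam

n, alpha = 300, 0.5
d = gnm_diameter(n, int(n ** (1 + alpha)))   # m = n^(1+alpha) edges
print(d)
```

At this density the diameter is a small constant, which is the behavior the proof exploits for the pseudo-edge graph on the core.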
3.6 Local routing and the interest space

In this section we analyze the performance of a local routing algorithm based on the interests. We notice that it is not clear whether the model introduced in Chapter 2 is navigable; in particular, it is not even clear what its diameter is. Our model has two separate graphs that evolve together, the friendship graph and the affiliation network. In this section we show that the affiliation network naturally induces a space of interests that is navigable. This is a crucial feature of a model for Milgram's experiment, for it is known that cues other than geographic distance play a crucial role: in the experiment, for instance, the target was defined not only by a location but, crucially, by a profession. This is also the first study of the performance of a local routing algorithm on an evolving model: we study for the first time the navigability problem in an evolving graph with an evolving embedding. Furthermore, ours is the first model that can explain Milgram's experiment if we assume some constant attrition, as suggested in [48] (i.e. in this case only paths of constant length can be observed with high probability).

We start by defining a notion of distance between interests. In order to do this we first have to define the prototype graph G(I, Ẽ). The nodes of the prototype graph are the interests of the affiliation network, and two interests i_1, i_2 have an edge between them if i_1 has been selected as a prototype for i_2 or vice versa. Furthermore, two initial interests i' and i'' contained in the graph B_0(P, I) are connected if there is a person in B_0(P, I) who is interested in both i' and i''. Note that the prototype graph is composed of the clique of the initial interests plus a DAG, and every non-initial interest is connected only with topics related to it. Now we can define the distance between two interests.
Definition 3.6.1 [Distance between interests] Let i_1, i_2 ∈ I. We define the distance between i_1 and i_2 as the hop distance between the two nodes in the prototype graph. Further, we define the interest distance between two people p_1 and p_2 as the smallest distance between any pair of interests, where the first element of the pair is an interest of p_1 and the second is an interest of p_2.

In our analysis we assume that every person knows the distance between any two interests. In practice we are assuming that every person is able to estimate the similarity between any two interests and thus to decide which of his or her neighbors is closest to the target.¹ In particular, in our setting the message holder has knowledge of the distances between the interests of the destination and those of its neighbors. We define our routing algorithm as follows.

¹ Note that this assumption is made also in every previous navigation model. For example, in the Kleinberg model [58, 59] a node is always able to select the neighbor that is closest to the target in the metric space.

Definition 3.6.2 [Local Routing Algorithm] In each step the message holder u performs the following local algorithm:

• If the destination is a neighbor of u, the message is forwarded to it.

• Otherwise, u forwards the message to the neighbor that minimizes the interest distance to the destination.

We start by proving a basic property of our algorithm.

Lemma 3.6.1 In every step of the local routing algorithm, either the interest distance between the message holder and the destination is reduced or the message is delivered to the target.

Proof: If the message holder knows the target, the lemma is true by Definition 3.6.2. Otherwise, let v be any interest of the message holder and let w(v) be an interest connected to v in the prototype graph but with smaller distance from the target. Note that w(v) always exists because the graph is connected.
There are three cases: either v, w(v) ∈ B_0(P, I), so there is a person in B_0(P, I) interested in both v and w(v); or v is a prototype of w(v); or vice versa. The last two cases are symmetric, and in both of them v and w(v) have a neighbor in common in B(P, I) by definition of the evolving process. Thus, in any case, for any interest v of the message holder there is a person who is interested in both v and w(v); in particular, in the neighborhood of the message holder, for any interest v there is a person interested in w(v). So, using the local routing algorithm, it is always possible to forward the message to a neighbor closer to the target, and thus the claim follows.

We now show that for most source-destination pairs it is possible to route the message within a constant number of steps, provided that the destination is selected with a probability that is proportional to its degree, i.e. its "popularity" in the social network. This result is in accordance with the analysis of Milgram's experiment done by Kleinfeld [61], who pointed out that a successful outcome crucially depends on the social status of the target.²

² This point, too, is in contrast with the claim in [31]; but on this point Kleinfeld wrote in [61] of Milgram's experiment: "the selection of the sample. I found in the archives the original advertisement recruiting subjects for the Wichita, Kansas study. This advertisement was worded so as to attract not representative people but particularly sociable people proud of their social skills and confident of their powers to reach someone across class barriers." Besides this, there are other experiments suggesting that social barriers can actually stop Milgram's local routing algorithm [62, 74].

Theorem 3.6.1 Let c_i < (β/(1−β)) c_p. If the destination is selected with probability proportional to its degree and the source is selected uniformly at random then, with probability bigger than or equal to 1 − φ − o(1), for any constant φ > 0, the local routing algorithm routes the message in constantly many steps.

Proof: Let v be the destination. We first prove that with probability 1 − o(1), v is a hub. Let V(hubs, t) be the total volume of the hubs at time t, and V(G \ hubs, t) the total volume of the rest of the graph at time t. Recall that, as shown in Lemma 3.4.2, we have, for t > φn:

V(hubs, t) = Σ_{v∈hubs} d_G(v) ∈ Ω( t^{1+γ(1 − c_i(1−β)/(c_p β))} )

and

V(G \ hubs, t) = Σ_{v∉hubs} d_G(v) ∈ Θ( t^{1+ε(1 − c_i(1−β)/(c_p β))} ),

for some ε such that γ > ε. Thus, when the destination is selected with probability proportional to its degree, with probability 1 − o(1) it will be a hub. In addition, note that Lemma 3.5.1 implies that two hubs are at constant distance also in the interest space. So, by Lemma 3.6.1, it holds with high probability that if a message reaches a hub it will need only a constant additional number of steps to reach any other hub using the local routing algorithm of Definition 3.6.2. Now note that Lemma 3.4.1 implies that all the hubs are inserted before time φn with high probability, for every constant φ > 0. Further, by Lemma 3.4.2, every node inserted after time φn will have a hub in its neighborhood with probability 1 − o(1). So with probability 1 − φ − o(1) the destination is a hub and the source has at least one hub in its neighborhood. Thus the local routing algorithm of Definition 3.6.2 will deliver the message in a constant number of rounds with probability bigger than or equal to 1 − φ − o(1).

We now consider another interesting setting. In this case we expand the interests of the destination in such a way that they include the interests of its neighbors. We call this the expanded interests setting. This is an attempt to capture the additional knowledge that human subjects have about the destination, apart from its personal information.
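As a concrete illustration of Definitions 3.6.1 and 3.6.2, the interest distance and the local routing rule can be sketched on a hand-built toy instance; the prototype graph, the people and their interests below are illustrative assumptions:

```python
from collections import deque

# Toy prototype graph over interests (edges = prototype relations); illustrative.
proto = {
    "ml": ["stats", "ai"], "stats": ["ml"], "ai": ["ml", "nlp"], "nlp": ["ai"],
}
# Each person's interests; two people are friends iff they share an interest.
people = {
    "p1": {"stats"}, "p2": {"stats", "ml"}, "p3": {"ml", "ai"},
    "p4": {"ai", "nlp"}, "p5": {"nlp"},
}
friends = {p: [q for q in people if q != p and people[p] & people[q]] for p in people}

def hop(a, b):
    """Hop distance between two interests in the prototype graph (BFS)."""
    dist, queue = {a: 0}, deque([a])
    while queue:
        v = queue.popleft()
        if v == b:
            return dist[v]
        for w in proto[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return float("inf")

def interest_distance(p, q):
    """Smallest prototype-graph distance over pairs of interests (Def. 3.6.1)."""
    return min(hop(a, b) for a in people[p] for b in people[q])

def route(source, target, max_hops=10):
    """Local routing of Definition 3.6.2: greedy on the interest distance."""
    path = [source]
    while path[-1] != target and len(path) <= max_hops:
        cur = path[-1]
        if target in friends[cur]:
            path.append(target)
        else:
            path.append(min(friends[cur],
                            key=lambda f: interest_distance(f, target)))
    return path

print(route("p1", "p5"))
```

Each forwarding step strictly decreases the interest distance to the destination, which is exactly the invariant established by Lemma 3.6.1.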
This setting is interesting because it captures some features of the original Milgram experiment. For instance, in the first experiment presented by Milgram in [79], the sources also knew that the target was married to a divinity student at Cambridge. In this setting we can prove the following theorem.

Theorem 3.6.2 Let c_i < (β/(1−β)) c_p. In the expanded interests setting, when source and destination are selected uniformly at random then, with probability 1 − 2φ − o(1), the local routing algorithm will route the message in constantly many steps, for every constant φ > 0.

Proof: The proof strategy is similar to that of the previous theorem; the main difference is that there the hubs played the crucial role, while here the central role is played by the nodes in the core. Let v be the destination. We first prove that with probability 1 − o(1), v has a neighbor with an interest in the core. Let E_C(t) be the set of folded edges that are generated by an interest in the core at time t, and E(t) the set of edges at time t. Using the same strategy as in Lemma 3.4.2, we get, for t > φn:

|E_C(t)| ∈ Ω( t^{1+γ(1 − c_i(1−β)/(c_p β))} )

and

|E(t) \ E_C(t)| ∈ Θ( t^{1+ε(1 − c_i(1−β)/(c_p β))} ),

where γ > ε. Now, with probability 1 − φ the destination is a node inserted after time φn, and by Lemma 3.4.1 every interest in the core has been inserted before time φn, for every constant φ > 0. So with high probability the destination is connected by a preferential attachment edge to a node that is interested in a topic of the core. Thus, if we augment the interests of the destination with those of its neighbors, we have that with probability 1 − φ − o(1) the new set of interests contains an interest in the core. But now, as shown in the proof of Theorem 3.6.1, the source has a hub in its neighborhood with probability 1 − φ − o(1).
Furthermore, by Lemma 3.5.1 it follows that if the message is at a hub, then in a constant number of steps it can reach every node that has an interest in the core. Thus, using the same argument as in Lemma 3.6.1, we get the claim.

Now we study the most general case, when source and target are chosen adversarially and we do not extend the interest space of the destination. In this setting we are able to show the following upper bound on the running time of the local routing algorithm.

Theorem 3.6.3 If $c_i < \frac{\beta}{1-\beta} c_p$ then, for any source and any destination, the local routing algorithm routes the message within O(log² n) steps with high probability.

Proof: To prove the result we will bound the diameter of the interest prototype tree; by Lemma 3.6.1 the diameter is an easy upper bound on the running time of our local routing algorithm. In particular, we will show that, with high probability (whp), the diameter of the prototype graph is O(log² n). The general idea of the proof is to divide the random process into O(log n) macro-phases and to show that in each macro-phase the probability that the diameter increases by ω(log n) is o(1/log n). Thus, we get that the diameter is O(log² n) whp.

Let us divide our evolving process into O(log n) phases. In phase zero we group the first 600 log n steps. Phase one runs from the end of phase zero to step ⌊(1+ε)600 log n⌋, for a small constant ε > 0. Phase two is up to step ⌊(1+ε)² 600 log n⌋. In general, phase i starts after the end of phase i − 1 and ends at step ⌊(1+ε)^i 600 log n⌋.

Let us now consider a generic phase t > 0, and let T = (1+ε)^t 600 log n. First we want to bound the number of edges that we have at the beginning of each phase in B(P, I). Let A_t be the random variable that counts the number of edges at the beginning of phase t. We have that E[A_t] = (βc_p + (1−β)c_i)T. By the Chernoff bound we have that

$$\Pr\left[|A_t - E[A_t]| > \tfrac{1}{10} E[A_t]\right] \le \exp\left(-\frac{E[A_t]}{300}\right) \le \frac{1}{n^2}.$$

Thus, using the union bound on the number of macro-phases, it follows that at the beginning of each phase t, (9/10)E[A_t] ≤ A_t ≤ (11/10)E[A_t] with high probability. In the rest of the proof we will assume that (9/10)E[A_t] ≤ A_t ≤ (11/10)E[A_t].

To get a bound on the diameter, we start by studying the two following events ξ₁ and ξ₂:

ξ₁(j) = {interest j, inserted in phase t, of degree c_i is selected in a step during phase t as a prototype for the first time}

ξ₂(j) = {interest j, inserted in phase t, of degree c_i increases its degree in a step during phase t}

First notice that, from the definition of the evolving process, we have that Pr[ξ₁(j)] ≤ c_i/A_t ≤ 10c_i/(9T).

To bound Pr[ξ₂(j)], recall that interest j has degree c_i, so there are c_i people interested in it; denote them p₁, p₂, ..., p_{c_i}. Now, if j increases its degree, this implies that a new person arrives in the graph and copies the interest j from one of the persons interested in it, p₁, p₂, ..., p_{c_i}. This happens with probability

$$\Pr[\xi_2(j)] \le \sum_{i=1}^{c_i} \frac{10\, d_i}{9T}\left(1 - \left(1-\frac{1}{d_i}\right)^{c_p}\right).$$

By an application of calculus, it is possible to see that this probability is maximized when d₁ = ··· = d_{c_i} = T. Thus, using Bernoulli's inequality $(1-\tfrac{1}{T})^{c_p} \ge 1 - \tfrac{c_p}{T}$,

$$\Pr[\xi_2(j)] \le \frac{10}{9}\, c_i\left(1 - \left(1-\frac{1}{T}\right)^{c_p}\right) \le \frac{10}{9}\cdot\frac{c_p c_i}{T}.$$

So Pr[ξ₁(j) ∨ ξ₂(j)] ≤ 2c_p c_i/T. Let us define ξ(j) = ξ₁(j) ∨ ξ₂(j).

Now we can compute the probability that in phase t the diameter of the prototype graph increases by more than C, with C > e. Let us call this event τ_C. Note that τ_C implies that a sequence of C new interests added in phase t increases the diameter of the prototype tree by C. In order for this event to hold, ξ has to occur at least C times in a phase. So we can upper bound τ_C as follows.
Pr[τ_C] ≤ (# of steps in a phase) · (# of ways to choose C new interests added in the phase) · Pr[ξ(j) holds for an interest j of degree c_i]^C, that is,

$$\Pr[\tau_C] \le \lceil T\rceil \binom{\lceil T\rceil}{C}\left(\frac{2 c_p c_i}{T}\right)^{C} \le \lceil T\rceil \left(\frac{e\lceil T\rceil}{C}\right)^{C}\left(\frac{2 c_p c_i}{T}\right)^{C} \le \lceil T\rceil \left(\frac{e}{C}\right)^{C}\left(2 c_p c_i\right)^{C} < \lceil T\rceil \left(2 c_p c_i\right)^{C},$$

where in the second inequality we used Stirling's approximation [86, 106] (which yields $\binom{n}{C} \le (en/C)^C$), and the last inequality holds because C > e. Therefore the probability of τ_C decreases geometrically with C.

Finally, let us compute the probability that the final diameter is bigger than K = k log² n. After the first phase the diameter is at most 600 log n, so we can bound this probability by the probability that the diameter increases by at least K − 600 log n after phase 1. Hence

$$\Pr[\text{diameter is at least } K] \le \sum_{\substack{k_2,\dots,k_{\log_{(1+\epsilon)} n} \\ \sum_i k_i = K - 600\log n}} \;\prod_{i=2}^{\log_{(1+\epsilon)} n} \Pr[\tau_{k_i}] \le (K - 600\log n)^{\log_{(1+\epsilon)} n} \cdot \lceil T\rceil^{\log_{(1+\epsilon)} n} \cdot (2 c_p c_i)^{K-600\log n} \le \Theta\left(n^{\log n}\right)\cdot n^{-k\log n} \in o(1).$$

Thus, by fixing a big enough k, the claim follows.

3.7 Experiments

Our mathematical model of social networks, building on the affiliation network model, suggests natural decentralized routing algorithms in social networks. Namely, given a source vertex s and a target vertex t, identify the interests of s and t in the underlying affiliation network and identify the neighbor of s whose interests are closest to those of t (with respect to the hierarchy of interests implied by the prototype selection step). Inspired by this, one can define natural algorithms that perform decentralized routing in real-world social networks by suitably approximating the process of navigating the interest hierarchy. In this section we do precisely this, and report our findings based on simple experiments with a modestly-sized social network.
Our social network consists of authors as nodes, with edges defined by co-authorship of one or more articles. We downloaded a copy of the DBLP database of computer science papers, a database of roughly 735,000 authors and 1.24M articles, and constructed the co-authorship graph with about 4.63M edges (for an average degree of roughly 6.7 co-authors per node). On this network, we randomly selected about 575 source–target pairs and attempted to construct paths between them. The largest connected component in this network contains roughly 80% of the vertices, with the rest of the vertices in very small isolated components, so that the probability that two randomly selected nodes belong to the largest connected component is roughly 64%. The mean length of the shortest path between nodes in this component is roughly 6.3 (with a median length of 6).

Notice that in this way we construct an affiliation network where two authors are friends if they coauthor a paper; we now have to infer a metric on the interests in order to route the messages. Unfortunately this is not easy, because there is no clear definition of closeness between papers, and all the standard classification systems for papers are too coarse for our purpose. To overcome this difficulty we define the interest space not as the set of papers but as the set of bigrams and unigrams contained in the titles of the papers. In particular, we begin by segmenting article titles into one-word and two-word sequences (unigrams and bigrams) after suitably eliminating stopwords that occur commonly ('and', 'the', etc.). For instance, the title "Small world experiments for everyone" generates four unigrams ('small', 'world', 'experiments', and 'everyone') and two bigrams ('small world' and 'world experiments'). Both bigrams and unigrams are treated as interests, with the latter of a more generic kind; for instance, the unigram 'physics' is somewhat general, whereas the bigram 'particle physics' is much more specific.
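The title-segmentation step just described can be sketched as follows. This is our illustration, not the thesis code: the stopword list is a tiny stand-in, and note that bigrams are taken over consecutive words of the original title (dropping any pair containing a stopword), which is what makes the example above yield exactly two bigrams.

```python
# Sketch of the unigram/bigram interest extraction described above.
# STOPWORDS is a small illustrative stand-in for a real stopword list.
STOPWORDS = {"and", "the", "for", "of", "a", "an", "in", "on", "to"}

def title_interests(title):
    """Return (unigrams, bigrams) extracted from an article title."""
    words = title.lower().split()
    # Unigrams: all non-stopword words of the title.
    unigrams = [w for w in words if w not in STOPWORDS]
    # Bigrams: consecutive word pairs in which neither word is a stopword.
    bigrams = [
        f"{u} {v}"
        for u, v in zip(words, words[1:])
        if u not in STOPWORDS and v not in STOPWORDS
    ]
    return unigrams, bigrams

unigrams, bigrams = title_interests("Small world experiments for everyone")
print(unigrams)  # ['small', 'world', 'experiments', 'everyone']
print(bigrams)   # ['small world', 'world experiments']
```

Pairing before stopword removal matters: removing 'for' first would make 'experiments' and 'everyone' adjacent and produce a spurious bigram.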
In this fashion, for every author, an interest profile is identified; specifically, for author a and interest i, we define s(i, a), the strength of interest i for author a, as the number of occurrences of interest (unigram/bigram) i within author a's publications.

To simulate Milgram's experiment, our basic algorithm operates as follows: if we are currently at node x, we move to the neighbor y of x whose interest profile is closest to the target t, where the measure of proximity of y to t is computed according to the formula

$$\mathrm{proximity}(y, t) = \sum_{\text{interest } i} \frac{s(i, y)\, s(i, t)}{p(i)},$$

where p(i) denotes the overall popularity of interest i, defined by $p(i) = \sum_a s(i, a)$. If there is no neighbor with non-zero proximity, we either declare failure, or, in a variation of the experiment, proceed greedily to the neighbor of highest degree.

The most basic variant of the algorithm outlined insists that the proximity measure strictly increase in each step of the routing: this version is called Local-Monotone, and the version without this restriction is called Local. The next variation we consider allows one step of 'lookahead', where we not only evaluate the neighbors of x, but also the neighbors of neighbors of x, and route through the neighbor whose neighbor achieves the highest proximity to the target; this idea of 'lookahead', very common in computer science, captures the belief that in real social networks one not only has knowledge about one's friends, but often also has partial knowledge about friends-of-friends. The corresponding non-monotone and monotone variations are called, respectively, Lookahead and Lookahead-Monotone.
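The proximity measure and the greedy step of the Local variant can be sketched as follows. This is a minimal illustration under assumed toy data (the profiles, interest names, and popularity counts below are made up, not from the DBLP experiment).

```python
# Sketch of the proximity measure used by the greedy routing variants:
# proximity(y, t) = sum over interests i of s(i, y) * s(i, t) / p(i).

def proximity(s_y, s_t, popularity):
    """s_y, s_t: dicts interest -> strength; popularity: dict interest -> p(i)."""
    return sum(s_y[i] * s_t[i] / popularity[i] for i in s_y if i in s_t)

def best_neighbor(neighbors, s_t, popularity, profiles):
    """One greedy routing step: the neighbor whose profile is closest to t."""
    return max(neighbors, key=lambda y: proximity(profiles[y], s_t, popularity))

# Toy example with made-up interest profiles.
profiles = {
    "y1": {"graph": 2, "routing": 1},
    "y2": {"physics": 3},
}
target = {"routing": 4, "networks": 1}
popularity = {"graph": 10, "routing": 2, "physics": 5, "networks": 3}
print(best_neighbor(["y1", "y2"], target, popularity, profiles))  # y1
```

Dividing by p(i) downweights very common (hence uninformative) interests, so a shared rare interest counts for more than a shared ubiquitous one.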
In a third variation, we allow the algorithm knowledge not only of the target's interests, but also of those of its neighbors; this is a 'reverse' and limited form of lookahead, and has precedent in Milgram's experiment, where the sources had the knowledge that the target was the wife of a divinity student in Cambridge, Mass. This is naturally aimed at routing to hard-to-reach destinations by augmenting the algorithm with extra information. The corresponding variations of the four algorithms described above are Local-Expand, Local-Monotone-Expand, and so on.

Figures 3.1 and 3.2 report the percentage of successful chains for the eight variations of the decentralized routing algorithm we studied. For reference, we compare the performance of the decentralized routing algorithms to that of the omniscient algorithm that has full information about the network structure and employs a standard 'shortest path' computation. The 'success percentage' in Figures 3.1 and 3.2 is the percentage of source–target pairs successfully routed, divided by 0.64 (which is the success fraction of this omniscient algorithm). The results are presented in four groups, each corresponding to one value of a parameter called τ, which restricts the sampling of the target nodes to be uniform among all nodes of degree at least τ; this is done to explore the role of the centrality of the target in determining the success of decentralized routing.

[Figure 3.1: Success rate without extended interests. Success rate (%) vs. minimum degree of the destinations, for the Lookahead-Monotone, Local-Monotone, Lookahead, and Local variants.]

We briefly highlight some salient observations based on Figures 3.1, 3.2, 3.3 and 3.4 and other related experiments.
(1) Navigation based on interests is an extremely powerful paradigm; the success of the basic algorithm Local in achieving 21% successful routing is, a priori, unexpected, given how crude our construction of the interest space is. In particular, previous replicas of the small-world experiment always had lower success rates [?, ?].

(2) Adding even one of the two natural cues to local routing (either expanding the interests of the target or adding a step of lookahead) is enormously powerful: each cue raises the success rate to about 57%, and reduces the path length from about 24 to about 12.

(3) Adding both interest expansion and lookahead results in 80% successful routing, with extremely short paths (a median path length of 7).

(4) Insisting on monotonically better proximity to the target's interests typically reduces the success rate, but significantly improves the length of the path constructed, for each of the four variations of the algorithm.

(5) Picking the target from a distribution that is restricted to targets of a certain minimum degree dramatically improves the success rate and path length for decentralized routing algorithms. While this restriction might appear strange, it captures the idea that even modestly 'well-connected' nodes are significantly easier to reach than completely isolated ones.

[Figure 3.2: Success rate with extended interests. Success rate (%) vs. minimum degree of the destinations, for the Lookahead-Monotone, Local-Monotone, Lookahead, and Local variants.]

[Figure 3.3: Average path length without extended interests. Path length vs. minimum degree of the destinations, for the same four variants.]
When we place a minimum degree restriction of 15 (recall that the average degree is only 6.7), the best algorithm achieves a 97% success rate and produces paths almost as short as the shortest possible! Even the simplest of the algorithms, Local, succeeds in 50% of the cases; this reinforces the argument made by Kleinfeld, who, analyzing Milgram's experiments, suggests that the success of the routing depends, at least to some extent, on the fact that the target was not a completely isolated person but one well-connected in terms of geographic location, employment, social status, etc.

[Figure 3.4: Average path length with extended interests. Path length vs. minimum degree of the destinations, for the Lookahead-Monotone, Local-Monotone, Lookahead, and Local variants.]

Chapter 4

Gossip

In this chapter we show that if a connected graph with n nodes has conductance φ then rumour spreading, also known as randomized broadcast, successfully broadcasts a message within Õ(φ⁻¹ · log n) rounds with high probability, regardless of the source, by using the PUSH-PULL strategy. The Õ(···) notation hides a polylogarithmic factor in φ⁻¹. This result is almost tight, since there exist graphs with n nodes and conductance φ whose diameter is Ω(φ⁻¹ · log n). If, in addition, the network satisfies some kind of uniformity condition on the degrees, our analysis implies that both PUSH and PULL, by themselves, successfully broadcast the message to every node in the same number of rounds.

4.1 Introduction

Rumour spreading, also known as randomized broadcast or randomized gossip (all terms that will be used as synonyms throughout the chapter), refers to the following distributed algorithm. Starting with one source node with a message, the protocol proceeds in a sequence of synchronous rounds with the goal of broadcasting the message, i.e. delivering it to every node in the network.
In round t ≥ 0, every node that knows the message selects a neighbour uniformly at random, to which the message is forwarded. This is the so-called PUSH strategy. The PULL variant is symmetric: in round t ≥ 0, every node that does not yet have the message selects a neighbour uniformly at random and asks for the information, which is transferred provided that the queried neighbour knows it. Finally, the PUSH-PULL strategy is a combination of both: in round t ≥ 0, each node selects a random neighbour to perform a PUSH if it has the information, or a PULL in the opposite case.

These three strategies were introduced in [30] and have since been intensely investigated (see the related work section). One of the most studied questions concerns their completion time: how many rounds will it take for one of the above strategies to disseminate the information to all nodes in the graph, assuming a worst-case source? In this chapter we prove the following two results:

• If a network has conductance φ and n nodes, then, with high probability, PUSH-PULL reaches every node within O((log² φ⁻¹ / φ) · log n) many rounds, regardless of the source.

• If, in addition, the network satisfies the following condition for every edge uv and some constant α > 0:

$$\max\left(\frac{\deg(u)}{\deg(v)}, \frac{\deg(v)}{\deg(u)}\right) \le \alpha,$$

then both PUSH and PULL, by themselves¹, reach every node within O(c_α · φ⁻¹ · log n · log² φ⁻¹) many rounds with high probability, regardless of the source, where c_α is a constant depending only on α.

The first result is a significant improvement on the previously best bound of O(log⁴ n/φ⁶) [27]. (The proof of [27] is based on an interesting connection with spectral sparsification [98]. The approach followed here is entirely different.)

The work described in this chapter is joint work with F. Chierichetti and A. Panconesi; its extended abstract appeared in the Proceedings of the 42nd ACM Symposium on Theory of Computing (STOC'10) [27].
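The three strategies described above can be sketched as a small simulation. This is our illustration, not code from the chapter; the graph, node labels, and round structure are assumptions of the sketch.

```python
# Illustrative simulation of one synchronous round of PUSH, PULL, or
# PUSH-PULL on an undirected graph given as an adjacency list.
import random

def push_pull_round(adj, informed, use_push=True, use_pull=True):
    """Run one synchronous round; returns the new informed set."""
    new_informed = set(informed)
    for v, neighbours in adj.items():
        u = random.choice(neighbours)  # each node contacts one random neighbour
        if v in informed and use_push:
            new_informed.add(u)        # PUSH: v forwards the message to u
        elif v not in informed and use_pull and u in informed:
            new_informed.add(v)        # PULL: v asks u, who knows the message
    return new_informed

def broadcast_time(adj, source, seed=0):
    """Rounds until PUSH-PULL informs every node, starting from `source`."""
    random.seed(seed)
    informed, rounds = {source}, 0
    while len(informed) < len(adj):
        informed = push_pull_round(adj, informed)
        rounds += 1
    return rounds

# Star graph: centre 0 with leaves 1..5; every leaf pulls from the centre,
# so PUSH-PULL from the centre finishes in a single round.
star = {0: [1, 2, 3, 4, 5], **{i: [0] for i in range(1, 6)}}
print(broadcast_time(star, 0))  # 1
```

The same star graph also illustrates footnote 1 below: with PUSH alone from the centre (set `use_pull=False`), only one leaf is informed per round, so Ω(n) rounds are needed.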
The result is almost tight because Ω(log n/φ) is a lower bound²; in particular, the bound is tight in the case of constant conductance (for instance, this is the case for the almost-preferential-attachment graphs of [78]). The second result can be proved using the same approach we use for the main one.

Our main motivation comes from the study of social networks. Loosely stated, we are looking for a theorem of the form "rumour spreading is fast in social networks". There is some empirical evidence showing that real social networks have high conductance. The authors of [72] report that in many different social networks only cuts of small (logarithmic) size have small (inversely logarithmic) conductance; all other cuts appear to have larger conductance. That is, the conductance of the social networks they analyze is larger than a quantity seemingly proportional to an inverse logarithm.

Our work should also be viewed in the context of expansion properties of graphs, of which conductance is an important example, and their relationship with rumour spreading. In particular, we observe how, interestingly, the convergence time of the PUSH-PULL process on graphs of conductance φ is a factor of φ smaller than the worst-case mixing time of the uniform random walk on such graphs.

Conductance is one of the most studied measures of graph expansion; edge expansion and vertex expansion are two other notable measures. In the case of edge expansion there are classes of graphs for which the protocol is slow (see [26] for more details), while the problem remains open for vertex expansion.

¹We observe that the star, a graph of conductance O(1), is such that both the PUSH and the PULL strategy by themselves require Ω(n) many rounds to spread the information to each node, assuming a worst-case, or even uniformly random, source. That is, conductance alone is not enough to ensure that PUSH, or PULL, spreads the information fast.

²Indeed, choose any n, and any φ ≥ n^{−1+ε}.
Take any 3-regular graph of constant vertex expansion (a random 3-regular graph will suffice) on O(n · φ) nodes. Then, substitute each edge of the regular graph with a path of O(φ⁻¹) new nodes. The graph obtained is easily seen to have O(n) nodes, diameter O(φ⁻¹ · log n) and conductance Ω(φ).

In terms of message complexity, we observe first that it has been determined precisely only for very special classes of graphs (cliques [55] and Erdős–Rényi random graphs [36]). Apart from this, given the generality of our class, it is impossible to improve the trivial upper bound on the number of messages, that is, number of rounds times number of nodes. For instance, consider the "lollipop graph"³. Fix ω(n⁻¹) < φ < o(log⁻¹ n), and suppose we have a path of length φ⁻¹ connected to a clique of size n − φ⁻¹ = Θ(n). This graph has conductance ≈ φ. Let the source be any node in the clique. After Θ(log n) rounds each node in the clique will have the information. Furthermore, at least φ⁻¹ steps will be needed for the information to reach each node in the path. So, at least n − φ⁻¹ = Θ(n) messages are pushed (by the nodes in the clique) in each round, for at least φ⁻¹ − Θ(log n) = Θ(φ⁻¹) rounds. Thus, the total number of messages sent will be Ω(n · φ⁻¹). Observing that the running time is Θ(φ⁻¹ + log n) = Θ(φ⁻¹), we have that the total number of rounds times n is (asymptotically) at most the number of transmitted messages.

4.2 Related work

The literature on the gossip protocol and social networks is huge, and we confine ourselves to what appears to be most relevant to the present work. Clearly, at least as many rounds as the diameter are needed for the gossip protocol to reach all nodes. It has been shown that O(n log n) rounds are always sufficient for every connected graph on n nodes [39].
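The edge-subdivision construction of footnote 2 above can be sketched concretely. This is a toy version of our own making (K4 standing in for a larger 3-regular expander, and a fixed path length standing in for O(φ⁻¹)): replacing each edge with a path of k new nodes stretches distances by a factor of about k while driving the conductance down to order 1/k.

```python
# Toy sketch of the lower-bound construction: replace each edge of a
# 3-regular graph with a path of k new nodes.
def subdivide_edges(edges, n, k):
    """Replace each edge of a graph on nodes 0..n-1 with a path of k new
    nodes. Returns the new edge list and the new node count."""
    new_edges, next_node = [], n
    for u, v in edges:
        # Chain u -> (k fresh internal nodes) -> v.
        chain = [u] + list(range(next_node, next_node + k)) + [v]
        next_node += k
        new_edges += list(zip(chain, chain[1:]))
    return new_edges, next_node

# K4 is 3-regular; subdivide each of its 6 edges with 3 new nodes.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
edges, n_nodes = subdivide_edges(k4, 4, 3)
print(n_nodes)     # 4 original + 6*3 internal = 22 nodes
print(len(edges))  # each edge becomes a path of 4 edges: 24 edges
```

Every internal path node has degree 2, so a cut through the middle of one path has one crossing edge against a volume of order k, giving conductance Θ(1/k) as claimed.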
The problem has been studied on a number of graph classes, such as hypercubes, bounded-degree graphs, cliques and Erdős–Rényi random graphs (see [39, 44, 88]). Recently, there has been a lot of work on "quasi-regular" expanders (i.e., expander graphs for which the ratio between the maximum and minimum degree is constant): it has been shown in different settings [7, 32, 33, 43, 95] that O(log n) rounds are sufficient for the rumour to be spread throughout the graph. See also [56, 82]. Our work can be seen as an extension of these studies to graphs of arbitrary degree distribution. Observe that many real-world graphs (e.g., Facebook, the Internet, etc.) have a very skewed degree distribution; that is, the ratio between the maximum and the minimum degree is very high. In most graph models of social networks the ratio between the maximum and the minimum degree can be shown to be polynomial in the graph size.

Mihail et al. [78] study the edge expansion and the conductance of graphs that are very similar to preferential attachment (PA) graphs. We shall refer to these as "almost" PA graphs. They show that edge expansion and conductance are constant in these graphs. Their result and ours together imply that rumour spreading requires O(log n) rounds on almost PA graphs. As for the original PA graphs, in [26] it is shown that rumour spreading is fast (requires time O(log² n)) in those networks.

In [17] it is shown that high conductance implies that non-uniform (over neighbours) rumour spreading succeeds. By non-uniform we mean that, for every ordered pair of neighbours i and j, node i will select j with probability p_{ij} for the rumour-spreading step (in general, p_{ij} ≠ p_{ji}). This result does not extend to the case of uniform probabilities studied in this chapter.

³The lollipop graph is the graph obtained by joining a complete graph to a path graph with a bridge.
In our setting (but not in theirs), the existence of a non-uniform distribution that makes rumour spreading fast is a rather trivial matter. A graph of conductance φ has diameter bounded by O(φ⁻¹ log n). Thus, in a synchronous network, it is possible to elect a leader in O(φ⁻¹ log n) many rounds and set up a BFS tree originating from it. By assigning probability 1 to the edge between a node and its parent, one obtains the desired non-uniform probability distribution. Thus, from the point of view of this chapter, the existence of non-uniform probabilities is rather uninteresting.

In [82] the authors consider a problem that at first sight might appear equivalent to ours. They consider the conductance φ_P of the connection probability matrix P, whose entry P_{i,j}, 1 ≤ i, j ≤ n, gives the probability that i calls j in any given round. They show that if P is doubly stochastic then the running time of PUSH-PULL is O(φ_P⁻¹ · log n). This might seem to subsume our result, but this is not the case. The catch is that they consider the conductance of a doubly stochastic matrix instead of the actual conductance of the graph, as we do. Observe that there are graphs of high conductance that do not admit doubly stochastic matrices of high conductance. For instance, in the star, no matter how one sets the probabilities P_{ij}, there will always exist a leaf ℓ that will be contacted by the central node with probability ≤ 1/(n−1). Since the matrix is doubly stochastic, this implies that ℓ will contact the central node with probability O(n⁻¹). Thus, at least Ω(n) rounds will be needed. Therefore their result gives too weak a bound for the uniform PUSH-PULL process that we analyze in this chapter.

4.3 Preliminaries

Observe that ½ vol(V) = |E|. Given S ⊆ V and v ∈ S, we define N_S⁺(v) = {w | w ∈ V − S ∧ {v, w} ∈ E} and d_S⁺(v) = |N_S⁺(v)|. Analogously, we define N_S⁻(w) = N⁺_{V−S}(w) and d_S⁻(w) = |N_S⁻(w)|.
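The volume and cut quantities of these preliminaries combine into the conductance Φ(G), recalled next in the text. As a concrete companion, here is a brute-force computation of our own (illustrative only; it enumerates all cuts, so it is usable only on tiny graphs): it minimizes cut(S, V−S)/vol(S) over all S with vol(S) ≤ |E|.

```python
# Brute-force conductance: min over S with vol(S) <= |E| of cut(S)/vol(S).
from itertools import combinations

def conductance(nodes, edges):
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    best = float("inf")
    for r in range(1, len(deg)):
        for S in combinations(nodes, r):
            S = set(S)
            vol = sum(deg[v] for v in S)        # vol(S): sum of degrees in S
            if 0 < vol <= len(edges):           # only "small" sides of cuts
                cut = sum(1 for u, v in edges if (u in S) != (v in S))
                best = min(best, cut / vol)
    return best

# Star on 5 nodes: every admissible cut has conductance exactly 1.
star_edges = [(0, i) for i in range(1, 5)]
print(conductance(range(5), star_edges))  # 1.0
```

The star's conductance of 1 (while PUSH or PULL alone still need Ω(n) rounds on it) is exactly the separation exploited in footnote 1 of this chapter.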
Recall that the conductance (see [51]) of a graph G(V, E) is

$$\Phi(G) = \min_{S \subset V :\, \mathrm{vol}(S) \le |E|} \frac{\mathrm{cut}(S, V-S)}{\mathrm{vol}(S)},$$

where cut(S, V − S) is the number of edges in the cut between S and V − S, and vol(S) is the volume of S.

We recall three classic concentration results for random variables, using, respectively, the first moment, the second moment, and every moment of a random variable X.

Theorem 4.3.1 (Markov inequality) Let X be a random variable. Then, for every λ > 0,
$$\Pr[|X| \ge \lambda] \le \frac{E[|X|]}{\lambda}.$$

Theorem 4.3.2 (Chebyshev inequality) Let X be a random variable. Then, for every ε > 0,
$$\Pr\left[|X - E[X]| \ge \sqrt{\mathrm{Var}[X]/\epsilon}\right] \le \epsilon,$$
where Var[X] is the variance of X, Var[X] = E[X²] − E[X]².

Theorem 4.3.3 (Chernoff bound) Let $X = \sum_{i=1}^n X_i$, where the X_i are independently distributed random variables in [0, 1]. Then, for 0 < ε < 1,
$$\Pr\left[|X - E[X]| > \epsilon \cdot E[X]\right] \le \exp\left(-\frac{\epsilon^2}{3} \cdot E[X]\right).$$

We now state and prove some technical lemmas that we will use in our analysis. The first one can be seen as an "inversion" of Markov's inequality.

Lemma 4.3.1 Suppose X₁, X₂, ..., X_t are random variables, with X_i having co-domain {0, v_i} and such that X_i = v_i with probability p_i. Fix p ≤ min p_i. Then, for each 0 < q < p,
$$\Pr\left[\sum_i X_i \ge \left(1 - \frac{1-p}{1-q}\right) \sum_i v_i\right] \ge q.$$

Proof: Let X̄_i = v_i − X_i. Observe that each X_i and each X̄_i is a non-negative random variable, of expected value p_i · v_i and (1 − p_i) · v_i, respectively. We use μ to denote the expected sum of the X_i, μ = Σ (p_i · v_i), and μ̄ to denote the expected sum of the X̄_i, μ̄ = Σ ((1 − p_i) · v_i). Observe that μ̄ ≤ (1 − p) · Σ v_i. We have

$$\Pr\left[\sum_i X_i \le \left(1 - \frac{1-p}{1-q}\right)\sum_i v_i\right] = \Pr\left[\sum_i \bar X_i \ge \frac{1-p}{1-q}\sum_i v_i\right] \le \Pr\left[\sum_i \bar X_i \ge \frac{1}{1-q}\cdot \bar\mu\right] \le 1 - q,$$

where in the last step we applied Markov's inequality. Thus the claim.

The next lemma gives some probabilistic bounds on the sum of binary random variables having close expectations.

Lemma 4.3.2 Let p ∈ (0, 1). Suppose X₁, ..., X_t are independent 0/1 random variables, the i-th of which is such that Pr[X_i = 1] = p_i, with ½ · p ≤ p_i ≤ p. Then,
1. if pt/2 > 1, then Pr[Σ X_i ≥ pt/4] ≥ 1/32;

2. if pt/2 ≤ 1, then Pr[Σ X_i ≥ 1] ≥ pt/4;

3. in general, for P = min(1/32, pt/4), we have
$$\Pr\left[\sum X_i \ge \frac{pt}{128 \cdot P}\right] \ge P.$$

Proof: Let $X = \sum_{i=1}^t X_i$. In the first case, E[X] ≥ t · p/2; in particular, E[X] ≥ 1. Therefore, by Chernoff's bound, we have
$$\Pr\left[X < \frac{1}{2}\, E[X]\right] \le e^{-\frac{1}{16}E[X]} \le e^{-\frac{1}{16}} \le 1 - \frac{1}{32},$$
where the last inequality follows from e^{−x} ≤ 1 − x/2 if x ∈ [0, 1]. Hence X ≥ E[X]/2 ≥ pt/4 with probability at least 1/32.

In the second case, we compute the probability that no X_i equals 1:
$$\prod_{i=1}^t \Pr[X_i = 0] \le \prod_{i=1}^t \left(1 - \frac{p}{2}\right) = \left(1-\frac{p}{2}\right)^t \le e^{-\frac{p}{2}t} \le 1 - \frac{pt}{4}.$$
So, with probability ≥ pt/4, at least one X_i will be equal to 1.

The third case follows directly from the former two, by choosing, respectively, P = 1/32 and P = pt/4.

The following lemma, which we will use later in the analysis, gives a probability bound close to the one that could be obtained using the Bernstein inequality. We keep it in this form for simplicity of exposition of our later proofs.

Lemma 4.3.3 Suppose a player starts with a time budget of B time units. At each round i, an adversary (knowledgeable of the past) chooses a number of time units 1 ≤ ℓ_i ≤ L. If the remaining budget of the player is at least ℓ_i, then a game, lasting ℓ_i time units, is played. The outcome of the game is determined by an independent random coin flip: with probability p_i ≥ P the gain is equal to ℓ_i, the length of the round, and with probability 1 − p_i the gain is zero. The game is then repeated. If B ≥ 193 · (L/P) · ln(⌈log₂ L⌉/δ), then with probability at least 1 − δ the gain is at least (24/193) · B · P.

Proof: Let the game go on until the end. Suppose the adversary chose games' lengths ℓ₁, ℓ₂, ..., ℓ_t, with $\sum_{i=1}^t \ell_i > B - L \ge \frac{192}{193}B$. Let X_j be the set containing all the rounds whose ℓ_i's were such that 2^j ≤ ℓ_i < 2^{j+1}, X_j = {i | 2^j ≤ ℓ_i < 2^{j+1}}. The sets X₀, X₁, ..., X_{⌈log₂ L⌉} partition the rounds into O(log L) buckets.
Associate with each bucket X_j the total number S(X_j) of time units "spent" in that bucket, $S(X_j) = \sum_{i\in X_j} \ell_i$. Let 𝒳 be the set of buckets X_j for which $S(X_j) \ge \frac{12}{P} \cdot 2^{j+1} \cdot \ln\frac{\lceil\log_2 L\rceil}{\delta}$. The total number of time units spent in buckets not in 𝒳 is then at most

$$\sum_{j=0}^{\lceil \log_2 L\rceil} \frac{12}{P} \cdot 2^{j+1} \cdot \ln\frac{\lceil\log_2 L\rceil}{\delta} = \frac{12}{P}\cdot \ln\frac{\lceil\log_2 L\rceil}{\delta} \cdot \sum_{j=0}^{\lceil\log_2 L\rceil} 2^{j+1} \le \frac{96}{P} \cdot L \cdot \ln\frac{\lceil\log_2 L\rceil}{\delta}.$$

Therefore the total number of units spent in buckets of 𝒳, $S(\mathcal{X}) = \sum_{X_j\in \mathcal{X}} S(X_j)$, will be at least $S(\mathcal{X}) \ge \frac{96}{193}B$. Furthermore, the number of rounds |X_j| played in bucket X_j ∈ 𝒳 will be at least $S(X_j) \cdot 2^{-(j+1)} \ge \frac{12}{P} \cdot \ln\frac{\lceil\log_2 L\rceil}{\delta}$. Each such round yields a positive gain with probability at least P. Therefore, the expected number of rounds E[W(X_j)] in bucket X_j ∈ 𝒳 having positive gain will be at least $E[W(X_j)] \ge 2^{-(j+1)} \cdot S(X_j) \cdot P \ge 12 \cdot \ln\frac{\lceil\log_2 L\rceil}{\delta}$. By the Chernoff bound,

$$\Pr\left[W(X_j) < \frac{1}{2} \cdot 2^{-(j+1)} \cdot S(X_j) \cdot P\right] \le \Pr\left[W(X_j) < \frac{1}{2}\, E[W(X_j)]\right] \le \exp\left(-\frac{1}{12}\, E[W(X_j)]\right) \le \exp\left(-\ln\frac{\lceil\log_2 L\rceil}{\delta}\right) = \frac{\delta}{\lceil\log_2 L\rceil}.$$

Observe that the gain G(X_j) in bucket X_j ∈ 𝒳 is at least 2^j · W(X_j). By the union bound, the probability that at least one bucket X_j ∈ 𝒳 is such that G(X_j) < ¼ · S(X_j) · P is at most δ. Therefore, with probability at least 1 − δ, the total gain is at least

$$\sum_{X_j \in \mathcal{X}} \frac{P}{4}\, S(X_j) = \frac{P}{4} \cdot S(\mathcal{X}) \ge \frac{24}{193}\, B \cdot P.$$

Finally, we give a lemma that underlines a symmetry of the PUSH-PULL strategy. Let u →_t v be the event that a piece of information originated at u arrives at v within t rounds using the PUSH-PULL strategy, and let u ←_t v be the event that a piece of information originally at v arrives at u within t rounds using the PUSH-PULL strategy. We have:

Lemma 4.3.4 Let u, v ∈ V. Then
$$\Pr[u \to_t v] = \Pr[u \leftarrow_t v].$$

Proof: Consider each possible sequence of PUSH-PULL requests made by the nodes of G in t rounds.
We define the "inverse" of a sequence as the sequence obtained by reading it from the last round to the first and exchanging PUSHs and PULLs. Now the probability that the information spreads from u to v (resp., from v to u) in at most t steps is equal to the sum of the probabilities of the sequences of length at most t that manage to pass the information from u to v (resp., from v to u); given that the probability of a sequence and that of its inverse are the same, the claim follows.

4.4 Warm-up: a weak bound

In this section we prove a completion-time bound of O(φ⁻² · log n) for the PUSH-PULL strategy. Observe that this bound happens to be tight if φ ∈ Ω(1). The general strategy is as follows:

• We will prove that, given any set S of informed nodes having volume ≤ |E|, after O(φ⁻¹) rounds (which we call a phase) the new set S′ of informed vertices, S′ ⊇ S, will have volume vol(S′) ≥ (1 + Ω(φ)) · vol(S) with constant probability (over the random choices performed by nodes during those O(φ⁻¹) rounds); if this happens, we say that the phase was successful. This section is devoted to proving this lemma.

• Given the lemma, it follows that PUSH-PULL informs a set of nodes of volume larger than |E|, starting from any single node, in time O(φ⁻² · log n). Indeed, by applying the Chernoff bound one can prove that, by flipping c · φ⁻¹ · log n IID coins, each having Θ(1) head probability, the number of heads will be at least f(c) · φ⁻¹ · log n with high probability, with f(c) increasing, and unbounded, in c. This implies that we can get enough (that is, Θ(φ⁻¹ · log n)) successful phases to cover more than half of the graph's volume in at most Θ(φ⁻¹) · Θ(φ⁻¹ · log n) = Θ(φ⁻² · log n) rounds.

• Applying Lemma 4.3.4, we can then show that each uninformed node can get the information in the same number of steps, if a set S of volume > |E| is informed, completing the proof.
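The symmetry of Lemma 4.3.4, invoked in the last step above, can be checked exactly on a tiny graph by enumerating every joint choice of random neighbours over t rounds. This is our illustration (graph and round semantics are assumptions of the sketch); since every node's choice set has fixed size, all joint sequences are equiprobable and the probability is a ratio of counts.

```python
# Exact Pr[dst informed within t PUSH-PULL rounds from src], by brute force
# over all joint neighbour choices (one per node per round).
from itertools import product

def exact_prob(adj, src, dst, t):
    nodes = sorted(adj)
    # All ways the nodes can simultaneously pick one neighbour each.
    round_choices = list(product(*(adj[v] for v in nodes)))
    hits = total = 0
    for rounds in product(round_choices, repeat=t):
        total += 1
        informed = {src}
        for choices in rounds:
            cur = dict(zip(nodes, choices))
            new = set(informed)
            for v in nodes:
                if v in informed:
                    new.add(cur[v])        # PUSH from an informed node
                elif cur[v] in informed:
                    new.add(v)             # PULL from an informed neighbour
            informed = new
        if dst in informed:
            hits += 1
    return hits / total

# Path 0-1-2-3: forward and reverse probabilities agree, as the lemma states.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(exact_prob(path, 0, 2, 2), exact_prob(path, 2, 0, 2))  # 0.75 0.75
```

The path is asymmetric around the pair (0, 2), so the equality is not forced by any graph symmetry; it is exactly the reversal argument of the proof.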
Recall that the probability that the information spreads from any node $v$ to a set of nodes with more than half the volume of the graph is $1 - O(n^{-2})$. Then, with that probability, the source node $s$ spreads the information to a set of nodes of such volume. Furthermore, by Lemma 4.3.4, any uninformed node would get the information from some node, after node $s$ has successfully spread the information, with probability $1 - O(n^{-2})$. By the union bound, with probability $1 - O(n^{-1})$ PUSH-PULL will succeed in $O(\varphi^{-2} \cdot \log n)$ rounds.

Our first lemma shows how one can always find a subset of nodes in the "smallest" part of a good-conductance cut that happens to hit many of the edges in the cut, and whose elements have a large fraction of their degree crossing the cut.

Lemma 4.4.1 Let $G(V, E)$ be a simple graph. Let $A \subseteq B \subseteq V$, with $\mathrm{vol}(B) \le |E|$ and $\mathrm{cut}(A, V - B) \ge \frac{3}{4} \cdot \mathrm{cut}(B, V - B)$. Suppose further that the conductance of the cut $(B, V - B)$ is at least $\varphi$, $\mathrm{cut}(B, V - B) \ge \varphi \cdot \mathrm{vol}(B)$. If we let
$$U = U_B(A) = \left\{ v \in A \;\middle|\; \frac{d^+_B(v)}{d(v)} \ge \frac{\varphi}{2} \right\},$$
then $\mathrm{cut}(U, V - B) \ge \frac{1}{4} \cdot \mathrm{cut}(B, V - B)$.

Proof: We prove the lemma with the following derivation. Since
$$\sum_{v \in A} d^+_B(v) + \sum_{v \in B - A} d^+_B(v) = \mathrm{cut}(B, V - B),$$
we have
$$\sum_{v \in U} d^+_B(v) + \sum_{v \in A - U} d^+_B(v) = \sum_{v \in A} d^+_B(v) = \mathrm{cut}(A, V - B) \ge \frac{3}{4} \cdot \mathrm{cut}(B, V - B).$$
Then,
$$\sum_{v \in U} d^+_B(v) \ge \frac{3}{4} \cdot \mathrm{cut}(B, V - B) - \sum_{v \in A - U} d^+_B(v) \ge \frac{3}{4} \cdot \mathrm{cut}(B, V - B) - \frac{\varphi}{2} \cdot \sum_{v \in A - U} d(v)$$
$$\ge \frac{3}{4} \cdot \mathrm{cut}(B, V - B) - \frac{\varphi}{2} \cdot \mathrm{vol}(B) \ge \frac{3}{4} \cdot \mathrm{cut}(B, V - B) - \frac{1}{2} \cdot \mathrm{cut}(B, V - B) = \frac{1}{4} \cdot \mathrm{cut}(B, V - B).$$

Given $v \in U = U_B(A)$, we define $N^{\mathrm{push}}_B(v)$ (to be read "N-push-B-v") and $N^{\mathrm{pull}}_B(v)$ (to be read "N-pull-B-v") as follows:
$$N^{\mathrm{push}}_B(v) = \{ u \in N^+_B(v) \mid d(u) \ge \tfrac{1}{3} \cdot d^+_B(v) \} \quad\text{and}\quad N^{\mathrm{pull}}_B(v) = \{ u \in N^+_B(v) \mid d(u) < \tfrac{1}{3} \cdot d^+_B(v) \}.$$
Then,
$$U^{\mathrm{push}} = \{ v \in U \mid |N^{\mathrm{push}}_B(v)| \ge |N^{\mathrm{pull}}_B(v)| \} \quad\text{and}\quad U^{\mathrm{pull}} = \{ v \in U \mid |N^{\mathrm{pull}}_B(v)| > |N^{\mathrm{push}}_B(v)| \}.$$
Observe that $U^{\mathrm{push}} \cap U^{\mathrm{pull}} = \emptyset$ and $U^{\mathrm{push}} \cup U^{\mathrm{pull}} = U$.
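The set $U$ of Lemma 4.4.1 can be sanity-checked numerically. The sketch below is our own illustration: it takes $A = B$, so that the hypothesis $\mathrm{cut}(A, V-B) \ge \frac{3}{4}\,\mathrm{cut}(B, V-B)$ holds trivially, and sets $\varphi$ to the exact conductance of the chosen cut, then verifies the conclusion $\mathrm{cut}(U, V-B) \ge \frac{1}{4}\,\mathrm{cut}(B, V-B)$.

```python
import random

def cut(edges, X, Y):
    # number of edges with one endpoint in X and the other in Y
    return sum(1 for u, v in edges if (u in X and v in Y) or (u in Y and v in X))

random.seed(1)
n = 20
edges = [(u, v) for u in range(n) for v in range(u + 1, n) if random.random() < 0.3]
adj = {v: [] for v in range(n)}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

V = set(range(n))
B = set(range(8))
A = B                    # then cut(A, V-B) = cut(B, V-B), satisfying the hypothesis
out = V - B
c = cut(edges, B, out)
phi = c / sum(len(adj[v]) for v in B)   # exact conductance of this cut
U = {v for v in A
     if adj[v] and sum(1 for u in adj[v] if u in out) / len(adj[v]) >= phi / 2}
```

With these choices the conclusion is guaranteed by the derivation above, and the check passes on the random instance.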
In particular, (at least) one of $\mathrm{vol}(U^{\mathrm{push}}) \ge \frac{1}{2} \cdot \mathrm{vol}(U)$ and $\mathrm{vol}(U^{\mathrm{pull}}) \ge \frac{1}{2} \cdot \mathrm{vol}(U)$ holds. In the following, if $\mathrm{vol}(U^{\mathrm{push}}) \ge \frac{1}{2} \cdot \mathrm{vol}(U)$ we will "apply" the PUSH strategy on $U$; otherwise, we will "apply" the PULL strategy. Given a vertex $v \in U$, we will simulate either the PUSH or the PULL strategy for $O(\frac{1}{\varphi})$ steps over it. The "gain" $g(v)$ of node $v$ is then the volume of the node(s) that pull the information from $v$, or that $v$ pushes the information to.

Our aim is to get a bound on the gain of the whole original vertex set $S$. This cannot be done by simply summing the gains of the single vertices in $S$, because of the many dependencies in the process. For instance, different nodes $v, v' \in S$ might inform (or could be asked the information by) the same node in $V - S$. To overcome this difficulty, we use an idea similar in spirit to the principle of deferred decisions. First of all, let us remark that, given a vertex set $S$ having the information, we will run the PUSH-PULL process for $O(\frac{1}{\varphi})$ rounds. We will look at what happens to the neighbourhoods of different nodes in $S$ sequentially, by simulating the $O(\frac{1}{\varphi})$ steps (which we call a phase) of each $v \in S$ and of some of its peers in $N^+_S(v) \subseteq V - S$. Obviously, we will make sure that no node in $V - S$ performs more than $O(\frac{1}{\varphi})$ PULL steps in a single phase. Specifically, we consider Algorithm 1 with a phase of $k = \lceil \frac{10}{\varphi} \rceil$ steps.

Algorithm 1 The expansion process of the $O(\frac{\log n}{\varphi^2})$ bound, with a phase length of $k$ steps.
1: at step $i$, we consider the sets $A_i$, $B_i$; at the first step, $i = 0$, we take $A_0 = B_0 = S$ and $H_0 = \emptyset$;
2: if $\mathrm{cut}(A_i, V - B_i) < \frac{3}{4} \cdot \mathrm{cut}(B_i, V - B_i)$, or $\mathrm{vol}(B_i) > |E|$, we stop;
3: otherwise, apply Lemma 4.4.1 to $A_i$, $B_i$, obtaining the set $U_i = U_{B_i}(A_i)$; we take a node $v$ out of $U_i$, and we consider the effects of either the push or the pull strategy, repeated for $k$ steps, over $v$ and $N^+_{B_i}(v)$;
4: $H_{i+1} \leftarrow H_i$; each node $u \in N^+_{B_i}(v)$ that gets informed (either by a push of $v$, or by a pull from $v$) is added to the set of the "halted nodes" $H_{i+1}$;
5: $v$ is also added to the set $H_{i+1}$;
6: let $A_{i+1} = A_i - \{v\}$, and $B_{i+1} = B_i \cup H_{i+1}$; observe that $B_{i+1} - A_{i+1} = H_{i+1}$;
7: iterate the process.

Observe that, in Algorithm 1 with $k = \lceil \frac{10}{\varphi} \rceil$, no vertex in $V - S$ will make more than $O(\frac{1}{\varphi})$ PULL steps in a single phase. Indeed, each time we run step 3 of the process, we only disclose whether some node $u \in V - S$ actually makes, or does not make, a PULL from $v$. If the PULL does not go through, and node $u$ later tries to make a PULL to another node $v'$, the probability of this second batch of PULL's (and, in fact, of any subsequent batch) to succeed is actually larger than the probability of success of the first batch of PULL's of $u$ (since at that point, we already know that the previous PULL batches made by $u$ never reached any previous candidate node $v \in S$).

The next lemma summarizes the gain, in a single step, of a node $v \in U_i$.

Lemma 4.4.2 If $v \in U_i^{\mathrm{push}}$, then
$$\Pr\left[ g(v) \ge \frac{1}{3} \cdot d^+_{B_i}(v) \right] \ge \frac{\varphi}{4}.$$
On the other hand, if $v \in U_i^{\mathrm{pull}}$, then
$$\Pr\left[ g(v) \ge \frac{1}{20} \cdot d^+_{B_i}(v) \right] \ge \frac{1}{10}.$$
In general, if $v \in U_i$,
$$\Pr\left[ g(v) \ge \frac{1}{20} \cdot d^+_{B_i}(v) \right] \ge \frac{\varphi}{10}.$$

Proof: Suppose that $v \in U_i^{\mathrm{push}}$. Then, at least $\frac{1}{2} \cdot d^+_{B_i}(v)$ of the neighbours of $v$ that are not in $B_i$ have degree $\ge \frac{1}{3} \cdot d^+_{B_i}(v)$. Since $v \in U_i$, we have that $\frac{d^+_{B_i}(v)}{d(v)} \ge \frac{\varphi}{2}$. Thus, the probability that $v$ pushes the information to one of its neighbours of degree $\ge \frac{1}{3} \cdot d^+_{B_i}(v)$ is at least $\frac{1}{2} \cdot \frac{d^+_{B_i}(v)}{d(v)} \ge \frac{\varphi}{4}$.

Now, suppose that $v \in U_i^{\mathrm{pull}}$.
Recall that $g(v)$ is the random variable denoting the gain of $v$; that is,
$$g(v) = \sum_{u \in N^{\mathrm{pull}}_{B_i}(v)} g_u(v),$$
where $g_u(v)$ is a random variable equal to $d(u)$ if $u$ pulls the information from $v$, and $0$ otherwise. Observe that $\mathbf{E}[g_u(v)] = 1$, so that $\mathbf{E}[g(v)] = |N^{\mathrm{pull}}_{B_i}(v)|$, and that the variance of $g_u(v)$ is
$$\mathbf{Var}[g_u(v)] = \mathbf{E}[g_u(v)^2] - \mathbf{E}[g_u(v)]^2 = \frac{1}{d(u)} \cdot d(u)^2 - 1 = d(u) - 1.$$
Since the $g_{u_1}(v), g_{u_2}(v), \ldots$ are independent, we have
$$\mathbf{Var}[g(v)] = \sum_{u \in N^{\mathrm{pull}}_{B_i}(v)} \mathbf{Var}[g_u(v)] = \sum_{u \in N^{\mathrm{pull}}_{B_i}(v)} (d(u) - 1) \le \mathrm{vol}(N^{\mathrm{pull}}_{B_i}(v)) \le \frac{1}{3} \cdot d^+_{B_i}(v) \cdot |N^{\mathrm{pull}}_{B_i}(v)|.$$
In the following chain of inequalities we apply Chebyshev's inequality to bound the deviation of $g(v)$, using the variance bound we have just obtained:
$$\Pr\left[ g(v) \le \frac{1}{20} \cdot d^+_{B_i}(v) \right] \le \Pr\left[ |g(v) - \mathbf{E}[g(v)]| \ge |N^{\mathrm{pull}}_{B_i}(v)| - \frac{1}{20} \cdot d^+_{B_i}(v) \right] \le \frac{\frac{1}{3} \cdot d^+_{B_i}(v) \cdot |N^{\mathrm{pull}}_{B_i}(v)|}{\left( |N^{\mathrm{pull}}_{B_i}(v)| - \frac{1}{20} \cdot d^+_{B_i}(v) \right)^2}.$$
Since the last expression is decreasing in $|N^{\mathrm{pull}}_{B_i}(v)|$, and $|N^{\mathrm{pull}}_{B_i}(v)| \ge \frac{1}{2} \cdot d^+_{B_i}(v)$ (as $v \in U_i^{\mathrm{pull}}$), it is at most
$$\frac{\frac{1}{6} \cdot d^+_{B_i}(v)^2}{\left( \frac{1}{2} \cdot d^+_{B_i}(v) - \frac{1}{20} \cdot d^+_{B_i}(v) \right)^2} = \frac{1/6}{(9/20)^2} = \frac{400}{486} \le \frac{9}{10}.$$
This concludes the proof of the second claim. The third one is a combination of the other two.

Now we focus on $v$, and on its neighbourhood $N^+_{B_i}(v)$, for $\lceil \frac{10}{\varphi} \rceil$ many steps. What is the gain $G(v)$ of $v$ in these many steps?

Lemma 4.4.3 $\Pr\left[G(v) \ge \frac{1}{20} \cdot d^+_{B_i}(v)\right] \ge 1 - e^{-1}$.

Proof: Observe that the probability that the event "$g(v) \ge \frac{1}{20} \cdot d^+_{B_i}(v)$" happens at least once in $\lceil \frac{10}{\varphi} \rceil$ independent trials is lower bounded by
$$1 - \left( 1 - \frac{\varphi}{10} \right)^{\lceil 10/\varphi \rceil} \ge 1 - \left( 1 - \frac{\varphi}{10} \right)^{10/\varphi} \ge 1 - e^{-1}.$$
The claim follows.

We now prove the main theorem of the section:

Theorem 4.4.1 Let $S$ be the set of informed nodes, $\mathrm{vol}(S) \le |E|$. Then, if $S'$ is the set of informed nodes after $O(\varphi^{-1})$ steps, with $\Omega(1)$ probability, $\mathrm{vol}(S') \ge (1 + \Omega(\varphi)) \cdot \mathrm{vol}(S)$.

Proof: Consider Algorithm 1 with a phase of length $k = \lceil \frac{10}{\varphi} \rceil$.
For the process to finish, at some step $t$ it must happen that either $\mathrm{vol}(B_t) > |E|$ (in which case we are done, so we assume the contrary), or $\mathrm{cut}(A_t, V - B_t) < \frac{3}{4} \cdot \mathrm{cut}(B_t, V - B_t)$. In the latter case,
$$\frac{1}{4} \cdot \mathrm{cut}(B_t, V - B_t) \le \mathrm{cut}(B_t - A_t, V - B_t) = \mathrm{cut}(H_t, V - B_t).$$
But then,
$$\frac{1}{4} \cdot \varphi \cdot \mathrm{vol}(S) \le \frac{1}{4} \cdot \varphi \cdot \mathrm{vol}(B_t) \le \frac{1}{4} \cdot \mathrm{cut}(B_t, V - B_t) \le \mathrm{cut}(H_t, V - B_t)$$
$$= \sum_{v \in H_t \cap S} d^+_{B_t}(v) + \sum_{v \in H_t \cap (V - S)} d^+_{B_t}(v) \le \sum_{v \in H_t \cap S} d^+_{B_t}(v) + \sum_{v \in H_t \cap (V - S)} \mathrm{vol}(v).$$
Consider the following two inequalities (that might, or might not, hold):
(a) $\sum_{v \in H_t \cap (V - S)} \mathrm{vol}(v) \ge \frac{1}{1000} \cdot \varphi \cdot \mathrm{vol}(S)$, and
(b) $\sum_{v \in H_t \cap S} d^+_{B_t}(v) \ge \frac{249}{1000} \cdot \varphi \cdot \mathrm{vol}(S)$.
At least one of (a) and (b) has to be true. We call the disjunction of (a) and (b) the two-cases property. If (a) is true, we are done, in the sense that we have captured enough volume to cover a constant fraction of the cut induced by $S$. We therefore lower bound the probability of a large gain given the truth of (b), since the negation of (b) implies the truth of (a).

Recall Lemma 4.4.3. It states that, for each $v_i \in H_t \cap S$, we had probability at least $1 - e^{-1}$ of gaining at least $\frac{1}{20} \cdot d^+_{B_i}(v_i) \ge \frac{1}{20} \cdot d^+_{B_t}(v_i)$, since $i \le t$ implies $B_i \subseteq B_t$. For each $v_i$, let us define the random variable $X_i$ as follows: with probability $1 - e^{-1}$, $X_i$ has value $\frac{1}{20} \cdot d^+_{B_t}(v_i)$, and with the remaining probability it has value $0$. Then, the gain of $v_i$ is a random variable that dominates $X_i$. Choosing $q = 1 - 2e^{-1}$ in Lemma 4.3.1, we can conclude that
$$\Pr\left[ \sum_{i:\, v_i \in H_t \cap S} X_i \ge \frac{1}{2} \cdot \sum_{v_i \in H_t \cap S} \frac{1}{20} \cdot d^+_{B_t}(v_i) \right] \ge 1 - 2e^{-1}.$$
Thus, with constant probability ($\ge 1 - 2e^{-1}$) we gain at least $\frac{1}{40} \cdot \sum_{v_i} d^+_{B_t}(v_i)$, which in turn is at least
$$\frac{1}{40} \cdot \sum_{v_i \in H_t \cap S} d^+_{B_t}(v_i) \ge \frac{1}{40} \cdot \frac{249}{1000} \cdot \varphi \cdot \mathrm{vol}(S) \ge \frac{6}{1000} \cdot \varphi \cdot \mathrm{vol}(S).$$
Hence, in either case, with constant probability there is a gain of at least $\frac{1}{1000} \cdot \varphi \cdot \mathrm{vol}(S)$ in $O(\frac{1}{\varphi})$ steps.
Thus, using the proof strategy presented at the beginning of the section, we get an $O(\varphi^{-2} \cdot \log n)$ bound on the completion time.

4.5 A tighter bound

In this section we will present a tighter bound of
$$O\left( \frac{\log^2 \frac{1}{\varphi}}{\varphi} \cdot \log n \right) = \tilde{O}\left( \frac{\log n}{\varphi} \right).$$
Observe that, given the already noted diametral lower bound of $\Omega\left(\frac{\log n}{\varphi}\right)$ on graphs of conductance $\varphi \ge \frac{1}{n^{1 - \Omega(1)}}$, the bound is almost tight: the $\log^2 \varphi^{-1}$ factor we lose is exponentially smaller than $\varphi^{-1}$.

Our general strategy for showing the tighter bound will be close in spirit to the one we used for the $O\left(\frac{\log n}{\varphi^2}\right)$ bound of the previous section. The new strategy is as follows:

• we will prove in this section that, given any set $S$ of informed nodes having volume at most $|E|$, for some $p = p(S) \ge \Omega(\varphi)$, after $O(p^{-1})$ rounds (that we call a $p$-phase) the new set $S'$ of informed vertices, $S' \supseteq S$, will have volume $\mathrm{vol}(S') \ge \left( 1 + \Omega\left( \frac{\varphi}{p \cdot \log^2 \varphi^{-1}} \right) \right) \cdot \mathrm{vol}(S)$ with constant probability (over the random choices performed by nodes during those $O(p^{-1})$ rounds); if this happens, we say that the phase was successful;

• using the previous statement we can show that PUSH-PULL informs a set of nodes of volume larger than $|E|$, starting from any single node, in time $T \le O(\varphi^{-1} \cdot \log^2 \varphi^{-1} \cdot \log n)$ with high probability. Observe that at the end of a phase one has a multiplicative volume gain of $1 + \Omega\left( \frac{\varphi}{p \cdot \log^2 \varphi^{-1}} \right)$ with probability lower bounded by a positive constant $c$. If one averages that gain over the $O(p^{-1})$ rounds of the phase, one can say that with constant probability $c$, each round in the phase resulted in a multiplicative volume gain of $1 + \Omega\left( \frac{\varphi}{\log^2 \varphi^{-1}} \right)$. We therefore apply Lemma 4.3.3 with $L = \Theta\left( \frac{\log^2 \varphi^{-1}}{\varphi} \right)$, $B = \Theta\left( \frac{\log^2 \varphi^{-1}}{\varphi} \cdot \log n \right)$, $P = c$, and $\delta$ equal to any inverse polynomial in $n$, $\delta = n^{-\Theta(1)}$. Observe that $B \ge \Theta\left( \frac{L}{P} \cdot \log \frac{\log L}{\delta} \right)$. Thus, with probability $1 - \delta = 1 - n^{-\Theta(1)}$, we have $\Theta(B \cdot P) = \Theta\left( \frac{\log^2 \varphi^{-1}}{\varphi} \cdot \log n \right)$ successful steps. Since each successful step gives a multiplicative
volume gain of $1 + \Omega\left( \frac{\varphi}{\log^2 \varphi^{-1}} \right)$, we obtain a volume of
$$\left( 1 + \Omega\left( \frac{\varphi}{\log^2 \varphi^{-1}} \right) \right)^{\Theta\left( \frac{\log^2 \varphi^{-1}}{\varphi} \cdot \log n \right)} = e^{\Theta(\log n)},$$
which, by a suitable choice of the constants, is larger than $|E|$;

• by applying Lemma 4.3.4, we can then show by symmetry that each uninformed node can get the information in $T$ rounds, if a set $S$ of volume $> |E|$ is informed, completing the proof.

Given $v \in U = U_B(A)$, we define $\hat{N}^{\mathrm{push}}_B(v)$ (to be read "N-hat-push-B-v") and $\hat{N}^{\mathrm{pull}}_B(v)$ (to be read "N-hat-pull-B-v") as follows:
$$\hat{N}^{\mathrm{push}}_B(v) = \{ u \in N^+_B(v) \mid d(u) \ge \tfrac{1}{3} \cdot \varphi^{-1} \cdot d^+_B(v) \}$$
and
$$\hat{N}^{\mathrm{pull}}_B(v) = \{ u \in N^+_B(v) \mid d(u) < \tfrac{1}{3} \cdot \varphi^{-1} \cdot d^+_B(v) \}.$$
Then, we define
$$\hat{U}^{\mathrm{push}} = \{ v \in U \mid |\hat{N}^{\mathrm{push}}_B(v)| \ge |\hat{N}^{\mathrm{pull}}_B(v)| \} \quad\text{and}\quad \hat{U}^{\mathrm{pull}} = \{ v \in U \mid |\hat{N}^{\mathrm{pull}}_B(v)| > |\hat{N}^{\mathrm{push}}_B(v)| \}.$$
As before, if $\mathrm{vol}(\hat{U}^{\mathrm{push}}) \ge \frac{1}{2} \cdot \mathrm{vol}(U)$ we "apply" the PUSH strategy on $U$; otherwise, we "apply" the PULL strategy.

The following lemma is the crux of our analysis. It is a strengthening of Lemma 4.4.2. A corollary of the lemma is that there exists a $p = p_v \ge \Omega(\varphi)$ such that, after $p^{-1}$ rounds, with constant probability, node $v$ lets us gain a new volume proportional to $\Theta\left( \frac{d^+_{B_i}(v)}{p \cdot \log \varphi^{-1}} \right)$.

Lemma 4.5.1 Assume $v \in U_i$. Then, $\hat{N}^{\mathrm{pull}}_{B_i}(v)$ can be partitioned into at most $6 + \log \varphi^{-1}$ parts, $S_1, S_2, \ldots$, in such a way that for each part $S_i$ it holds that, for some $P_{S_i} \in (2^{-i}, 2^{-i+1}]$,
$$\Pr\left[ G_{S_i}(v) \ge \frac{|S_i|}{256 \cdot P_{S_i}} \right] \ge 1 - 2e^{-1},$$
where $G_{S_i}(v)$ is the total volume of the nodes in $S_i$ that perform a PULL from $v$ in $P_{S_i}^{-1}$ rounds.

Lemma 4.4.2, that we used previously, only stated that with probability $\Omega(\varphi)$ we gained a new volume of $\Theta(d^+_{B_i}(v))$ in a single step. If we do not allow $v$ to go on for more than one step, then the bounds of Lemma 4.4.2 are sharp⁴. The insight of Lemma 4.5.1 is that different nodes might require different numbers of rounds to give their "full" contribution in terms of new volume; but the longer we have to wait, the more we gain.

We now prove Lemma 4.5.1.

Proof (of Lemma 4.5.1):
We divide the nodes in $\hat{N}^{\mathrm{pull}}_{B_i}(v)$ into $K = K_{B_i}(v) = \left\lceil \lg \frac{d^+_{B_i}(v)}{3 \varphi} \right\rceil$ buckets in a power-of-two manner. That is, for $j = 1, \ldots, K$, $R_j$ contains all the nodes $u$ in $\hat{N}^{\mathrm{pull}}_{B_i}(v)$ having degree $2^{j-1} \le d(u) < 2^j$. Observe that the $R_j$'s are pairwise disjoint and that their union is equal to $\hat{N}^{\mathrm{pull}}_{B_i}(v)$.

Consider the buckets $R_j$, with $j > \lg \varphi^{-1}$. We will empty some of them, in such a way that the total number of nodes removed from the union of the buckets is a small fraction of the total (that is, of $d^+_{B_i}(v)$). This node-removal step is necessary for the following reason: the buckets $R_j$, with $j > \lg \varphi^{-1}$, contain nodes with a degree so high that any single one of them will perform a PULL operation on $v$ with probability strictly smaller than $\varphi$. Since we want to guarantee that the probability of a "gain" is at least proportional to $\varphi$, we are forced to remove such high-degree nodes whenever their number is so small that, overall, the probability that any single one of them actually performs a PULL on $v$ is smaller than $\varphi$.

The node-removal phase is as follows. If $R'_j$ is the set of nodes in the $j$-th bucket after the node-removal phase, then
$$R'_j = \begin{cases} R_j & \text{if } |R_j| \ge \frac{1}{16} \cdot 2^j \cdot \varphi, \\ \emptyset & \text{otherwise.} \end{cases}$$
Observe that the total number of nodes we remove is upper bounded by
$$\sum_{i=0}^{K} \frac{1}{16} \cdot 2^i \cdot \varphi \le \frac{\varphi}{16} \cdot 2^{K+1} = \frac{\varphi}{8} \cdot 2^K \le \frac{\varphi}{8} \cdot 4 \cdot \frac{d^+_{B_i}(v)}{3 \varphi} = \frac{1}{6} \cdot d^+_{B_i}(v).$$
Therefore $\sum_j |R'_j| \ge \frac{1}{3} \cdot d^+_{B_i}(v)$, since $\sum_j |R_j| \ge \frac{1}{2} \cdot d^+_{B_i}(v)$.

⁴To prove this, we give two examples. In the first one, we show that the probability of informing any new node might be as small as $O(\varphi)$. In the second, we show that in a single step the gain might be only a $\varphi$ fraction of the volume of the informed nodes. Lemma 4.5.1 implies that these two phenomena cannot happen together. For the first example, take two stars: a little one with $\Theta(\varphi^{-1})$ leaves, and a large one with $\Theta(n)$ leaves. Connect the centers of the two stars by an edge. The graph will have conductance $\Theta(\varphi)$.
Now, suppose that the center and the leaves of the little star are informed, while the nodes of the large star are not. Then, the probability of the information spreading to any new node (that is, to the center of the large star) in one step will be $O(\varphi)$. For the second example, keep the same two stars, but connect them with a path of length 2. Again, inform only the nodes of the little star. Then, in a single step, only the central node of the length-2 path can be informed. The multiplicative volume gain is then only $1 + O(\varphi)$.

Consider the random variable $g(v)$, which represents the total volume of the nodes in the different $R'_j$'s that manage to pull the information from $v$. If we denote by $g_j(v)$ the contribution of the nodes in bucket $R'_j$ to $g(v)$, we have $g(v) = \sum_{j=1}^{K} g_j(v)$.

Take any non-empty bucket $R'_j$. We want to show, via Lemma 4.3.2, that
$$\Pr\left[ g_j(v) \ge \frac{1}{128} \cdot \frac{|R'_j|}{p_j} \right] \ge p_j.$$
(If this event occurs, we say that bucket $j$ succeeds.) This claim follows directly from Lemma 4.3.2 by creating one $X_u$ in the lemma for each $u \in R'_j$, and letting $X_u = 1$ iff node $u$ pulls the information from $v$. The probability of this event is $\frac{1}{d(u)} \in (2^{-j}, 2^{-j+1}]$, so we can safely choose the $p$ of Lemma 4.3.2 to be $p = 2^{-j+1}$.

Consider now the $p_j$'s of the different buckets. Fix some $j$. If $p_j$ came out of case 2 of Lemma 4.3.2 then, since $|R'_j| \ge \frac{1}{16} \cdot 2^j \cdot \varphi$, we have $p_j \ge \frac{1}{32} \cdot \varphi$. If $p_j$ came out of case 1, then $p_j = \frac{1}{32}$. In general, $p_j \ge \frac{\varphi}{32}$, and $p_j \le 1$.

Let us divide the unit interval into segments of exponentially decreasing length: that is, $[\tfrac{1}{2}, 1], [\tfrac{1}{4}, \tfrac{1}{2}), \ldots, [2^{-\ell}, 2^{-\ell+1}), \ldots$. For each $j$, let us put each bucket $R'_j$ into the segment containing its $p_j$. Observe that there are at most $\left\lceil \lg \frac{32}{\varphi} \right\rceil \le 6 + \lg \varphi^{-1}$ segments.

Fix any non-empty segment $\ell$. Let $S_\ell$ be the union of the buckets in segment $\ell$.
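The power-of-two bucketing with node removal used above can be sketched as follows. This is our own illustration with made-up parameters ($d^+ = 64$, $\varphi = 1/8$); only the bucketing rule and the $\le d^+/6$ removal bound are taken from the proof.

```python
import math
import random

def prune_buckets(degrees, d_plus, phi):
    """Bucket pulling neighbours by degree (R_j holds degrees in
    [2**(j-1), 2**j)) and empty every bucket with fewer than
    (1/16) * 2**j * phi members."""
    K = math.ceil(math.log2(d_plus / (3 * phi)))
    R = {j: [d for d in degrees if 2 ** (j - 1) <= d < 2 ** j]
         for j in range(1, K + 1)}
    kept = {j: r for j, r in R.items() if len(r) >= (1 / 16) * 2 ** j * phi}
    removed = sum(len(r) for j, r in R.items() if j not in kept)
    return kept, removed

random.seed(2)
d_plus, phi = 64, 1 / 8
# degrees of the pulling neighbours: all below (1/3) * phi**-1 * d_plus
degrees = [random.randint(1, int(d_plus / (3 * phi)) - 1) for _ in range(d_plus)]
kept, removed = prune_buckets(degrees, d_plus, phi)
```

By construction the removal never discards more than $d^+/6$ nodes, and every surviving bucket is populated densely enough that some node in it pulls with probability proportional to $\varphi$.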
Observe that if we let the nodes in the buckets of $S_\ell$ run the process for $2^\ell$ rounds, we have that, for each bucket $R'_j$ in segment $\ell$,
$$\Pr\left[ G_j(v) \ge \frac{1}{128} \cdot \frac{|R'_j|}{p_j} \right] \ge 1 - (1 - p_j)^{2^\ell} \ge 1 - e^{-1},$$
where $G_j(v)$ is the total volume of the nodes in $R'_j$ that perform a PULL from $v$ in $2^\ell$ rounds. Now, we can apply Lemma 4.3.1, choosing $X_j$ to be equal to $v_j = \frac{1}{128} \cdot \frac{|R'_j|}{p_j}$ if bucket $R'_j$ in segment $\ell$ (buckets can be ordered arbitrarily) is such that $G_j(v) \ge v_j$, and $0$ otherwise. Choosing $p = 1 - e^{-1}$ and $q = 1 - 2e^{-1}$, we obtain:
$$\Pr\left[ G_{S_\ell}(v) \ge \frac{1}{256} \cdot \frac{|S_\ell|}{2^{-\ell}} \right] \ge 1 - 2e^{-1}.$$

The following corollary follows from Lemma 4.5.1. (We prove in Section 4.7 that, constants aside, it is the best possible.)

Corollary 4.5.1 Assume $v_i \in U_i$. Then, there exists $p_i \in \left[ \frac{\varphi}{64}, 1 \right]$ such that
$$\Pr\left[ G(v_i) \ge \frac{d^+_{B_i}(v_i)}{5000 \cdot p_i \cdot \lg \frac{2}{\varphi}} \right] \ge 1 - 2e^{-1},$$
where $G(v_i)$ is the total volume of the nodes in $N^+_{B_i}(v_i)$ that perform a PULL from $v_i$, or that $v_i$ pushes the information to, in $p_i^{-1}$ rounds.

Proof: If $v_i \in \hat{U}_i^{\mathrm{push}}$, the same reasoning as in Lemma 4.4.2 applies. If $v_i \in \hat{U}_i^{\mathrm{pull}}$, then we apply Lemma 4.5.1, choosing the part $S$ with the largest cardinality. By the bound on the number of parts, we will have
$$|S| \ge \frac{1}{3} \cdot \frac{d^+_{B_i}(v_i)}{6 + \lg \varphi^{-1}} \ge \frac{d^+_{B_i}(v_i)}{18 \cdot \lg \frac{2}{\varphi}},$$
which implies the corollary.

We now prove the main theorem of the section:

Theorem 4.5.1 Let $S$ be the set of informed nodes, $\mathrm{vol}(S) \le |E|$. Then there exists some $\Omega(\varphi) \le p \le 1$ such that, if $S'$ is the set of informed nodes after $O(p^{-1})$ steps, with $\Omega(1)$ probability,
$$\mathrm{vol}(S') \ge \left( 1 + \Omega\left( \frac{\varphi}{p \cdot \log^2 \frac{1}{\varphi}} \right) \right) \cdot \mathrm{vol}(S).$$

Corollary 4.5.1 is a generalization of Lemma 4.4.2, which would lead to our result if we could prove an analogue of the two-cases property of the previous section. Unfortunately, the final gain we might need to aim for could be larger than the cut; this inhibits the use of the two-cases property.
Still, by using a strengthening of the two-cases property, we will prove Theorem 4.5.1 with an approach similar to the one of Theorem 4.4.1.

Proof: We say that an edge in the cut $(S, V - S)$ is easy if its endpoint $w$ in $V - S$ is such that $\frac{d^-_S(w)}{d(w)} \ge \varphi$. Then, to overcome the issue just noted, we consider two cases separately:
(a) at least half of the edges in the cut are easy, or
(b) less than half of the edges in the cut are easy.

In case (a) we bucket the easy nodes in $\Gamma(S)$ (the neighbourhood of $S$) into $\lceil \lg \frac{1}{\varphi} \rceil$ buckets in the following way. Bucket $D_i$, $i = 1, \ldots, \lceil \lg \frac{1}{\varphi} \rceil$, will contain all the nodes $w$ in $\Gamma(S)$ such that $2^{-i} < \frac{d^-_S(w)}{d(w)} \le 2^{-i+1}$. Now let $D_j$ be the bucket of highest volume (breaking ties arbitrarily).

For any node $v \in D_j$ we have that its probability to pull the information in one step is at least $2^{-j}$. So, the probability of $v$ to pull the information in $2^j$ rounds is at least $1 - e^{-1}$. Hence, by applying Lemma 4.3.1, we get that with probability greater than or equal to $1 - 2e^{-1}$ we gain a set of new nodes of volume at least $\frac{\mathrm{vol}(D_j)}{2}$ in $2^j$ rounds. But
$$\frac{\mathrm{vol}(D_j)}{2} \ge 2^j \cdot \frac{\mathrm{cut}(S, D_j)}{2} \ge 2^j \cdot \frac{\mathrm{cut}(S, \Gamma(S))}{2 \lceil \lg \frac{1}{\varphi} \rceil} \ge 2^j \cdot \frac{\varphi \cdot \mathrm{vol}(S)}{2 \lceil \lg \frac{1}{\varphi} \rceil}.$$
Thus in this first case we gain, with probability at least $1 - 2e^{-1}$, a set of new nodes of volume at least $2^j \cdot \frac{\varphi \cdot \mathrm{vol}(S)}{2 \lceil \lg \frac{1}{\varphi} \rceil}$ in $2^j$ rounds. By the reasoning presented at the beginning of Section 4.5, the claim follows.

Now let us consider the second case; recall that in this case at least half of the edges in the cut point to nodes $u$ in $\Gamma(S)$ having $\frac{d^-_S(u)}{d(u)} < \varphi$. We then replace the original two-cases property with the strong two-cases property:
(a') $\sum_{v \in H_t \cap (V - S)} d^-_{B_t}(v) \ge \frac{1}{1000} \cdot \mathrm{cut}(S, V - S)$, and
(b') $\sum_{v \in H_t \cap S} d^+_{B_t}(v) \ge \frac{249}{1000} \cdot \mathrm{cut}(S, V - S)$.
As before, at least one of (a') and (b') has to be true. If (a') happens to be true, then we are done, since the total volume of the new nodes will be greater than or equal to
$$\sum_{v \in H_t \cap (V - S)} d(v) \ge \varphi^{-1} \cdot \sum_{v \in H_t \cap (V - S)} d^-_{B_t}(v) \ge \frac{1}{1000} \cdot \varphi^{-1} \cdot \mathrm{cut}(S, V - S).$$
By Corollary 4.5.1, we will wait at most $w$ rounds for the cut $(S, V - S)$, for some $w \le O(\varphi^{-1})$. Thus, if (a') holds, we are guaranteed to obtain a new set of nodes of total volume $\Omega\left( \frac{\mathrm{cut}(S, V - S)}{w} \right)$ in $w$ rounds, which implies our main claim.

We now show that if (b') holds, then with $\Theta(1)$ probability our total gain will be at least $\Omega\left( \frac{\mathrm{cut}(S, V - S)}{w \cdot \log^2 \varphi^{-1}} \right)$ in $w$ rounds, for some $w \le O(\varphi^{-1})$.

Observe that each $v_i \in H_t \cap S$, when it was considered by the process, was given some probability $p_i \in \left[ \frac{\varphi}{64}, 1 \right]$ by Corollary 4.5.1. We partition $H_t \cap S$ into buckets according to the probabilities $p_i$. The $j$-th bucket will contain all the nodes $v_i$ in $H_t \cap S$ such that $2^{-j} < p_i \le 2^{-j+1}$. Recalling that $B_i$ is the set of informed nodes when node $v_i$ is considered, we let $F$ be the bucket that maximizes $\sum_{v_i \in F} d^+_{B_t}(v_i)$. Then,
$$\sum_{v_i \in F} d^+_{B_t}(v_i) \ge \frac{249}{1000 \left\lceil \lg \frac{64}{\varphi} \right\rceil} \cdot \mathrm{cut}(S, V - S). \tag{4.1}$$
By Corollary 4.5.1, we have that for each $v_i \in F$ there exists $p = p(F) \ge \frac{\varphi}{64}$ such that, with probability at least $1 - 2e^{-1}$, after $\left\lceil \frac{2}{p} \right\rceil$ rounds we gain at least $\frac{1}{5000 \cdot p \cdot \lg \frac{2}{\varphi}} \cdot d^+_{B_i}(v_i) \ge \frac{1}{5000 \cdot p \cdot \lg \frac{2}{\varphi}} \cdot d^+_{B_t}(v_i)$ (since $i \le t$ implies $B_i \subseteq B_t$).

For each $v_i$, let us define the random variable $X_i$ as follows: with probability $1 - 2e^{-1}$, $X_i$ has value $\frac{1}{5000 \cdot p \cdot \lg \frac{2}{\varphi}} \cdot d^+_{B_t}(v_i)$, and with the remaining probability it has value $0$. Then, the gain of $v_i$ is a random variable that dominates $X_i$. Choosing $q = 1 - \frac{5}{2} e^{-1}$ in Lemma 4.3.1, we can conclude that
$$\Pr\left[ \sum_{i:\, v_i \in F} X_i \ge \frac{4}{5} \cdot \sum_{v_i \in F} \frac{d^+_{B_t}(v_i)}{5000 \cdot p \cdot \lg \frac{2}{\varphi}} \right] \ge 1 - \frac{5}{2} e^{-1} \ge 0.08.$$
Thus, in $\left\lceil \frac{2}{p} \right\rceil$ rounds, with constant probability we gain at least
$$\sum_{v_i \in F} \frac{d^+_{B_t}(v_i)}{6250 \cdot p \cdot \lg \frac{2}{\varphi}},$$
which, by equation (4.1), is lower bounded by
$$\frac{1}{6250 \cdot p \cdot \lg \frac{2}{\varphi}} \cdot \frac{249}{1000 \left\lceil \lg \frac{64}{\varphi} \right\rceil} \cdot \mathrm{cut}(S, V - S) \ge \Omega\left( \frac{\mathrm{cut}(S, V - S)}{p \cdot \log^2 \varphi^{-1}} \right).$$
Thus, applying the reasoning presented at the beginning of the section, the claim follows.
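All of the bounds in this chapter are parametrized by the conductance $\varphi$. For intuition, here is a brute-force computation of $\varphi$ on a toy graph; this is our own illustration (exhaustive over all cuts, so feasible only for tiny graphs) and follows the definition used here: minimize $\mathrm{cut}(S, V-S)/\mathrm{vol}(S)$ over sets $S$ with $\mathrm{vol}(S) \le |E|$.

```python
from itertools import combinations

def conductance(adj, edges):
    """Brute-force conductance: min over nonempty S with vol(S) <= |E|
    of cut(S, V-S) / vol(S)."""
    V = list(adj)
    m = len(edges)
    best = 1.0
    for k in range(1, len(V)):
        for comb in combinations(V, k):
            S = set(comb)
            volS = sum(len(adj[v]) for v in S)
            if volS == 0 or volS > m:
                continue
            c = sum(1 for u, v in edges if (u in S) != (v in S))
            best = min(best, c / volS)
    return best

# complete graph K6: the minimizing cut is a half split, giving phi = 3/5
n = 6
edges = [(u, v) for u in range(n) for v in range(u + 1, n)]
adj = {v: [u for u in range(n) if u != v] for v in range(n)}
phi = conductance(adj, edges)
```

For $K_n$ a set of $k$ nodes has $\mathrm{cut} = k(n-k)$ and $\mathrm{vol} = k(n-1)$, so the ratio $(n-k)/(n-1)$ is minimized at the largest feasible $k = n/2$, matching the computed value.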
4.6 Push and Pull by themselves

We now comment on how one can change our analysis to get a bound of $O(c_\alpha \cdot \varphi^{-1} \cdot \log^2 \varphi^{-1} \cdot \log n)$ on the completion time of PUSH or PULL by themselves, where $\alpha$ bounds the ratio between the degrees of neighbouring nodes and $c_\alpha$ is a constant depending only on $\alpha$. Observe that, if the degrees of neighbouring nodes have a constant ratio, then the probability that a single node $v \in S$ (in our analysis) makes a viable PUSH, or is hit by a PULL, is $\Theta_\alpha(\varphi)$: indeed, $v$ will have at least $\frac{\varphi}{2} \cdot d(v)$ neighbouring nodes in $V - S$, each having degree at most $\alpha$ times its own, and an easy calculation then shows the probability bound for both PUSH and PULL. Using this observation, our analysis can be concluded as in the previous section.

Lemma 4.6.1 If the PUSH strategy is used, then for each $v \in U_i$,
$$\Pr\left[ g(v) \ge \frac{1}{\alpha} \cdot d(v) \right] \ge \frac{\varphi}{4}.$$
If the PULL strategy is used, then for each $v \in U_i$,
$$\Pr\left[ g(v) \ge \frac{1}{\alpha} \cdot d(v) \right] \ge \frac{\varphi}{4\alpha}.$$

Proof: By the uniformity condition, (a) each of the $d^+_{B_i}(v)$ neighbours of $v$ that are outside of $B_i$ has degree between $\alpha^{-1} \cdot d(v)$ and $\alpha \cdot d(v)$; furthermore, since $v \in U_i$, (b) it holds that $\frac{d^+_{B_i}(v)}{d(v)} \ge \frac{\varphi}{2}$.

Suppose the PUSH strategy is used. By (b), the probability that $v$ pushes the information to some neighbour outside $B_i$ (obtaining a gain $g(v)$ of at least $\alpha^{-1} \cdot d(v)$, by (a)) is $\ge \frac{d^+_{B_i}(v)}{d(v)} \ge \frac{\varphi}{2} \ge \frac{\varphi}{4}$.

Suppose, instead, the PULL strategy is used. Then the probability that some neighbour of $v$ outside $B_i$ performs a PULL from $v$ is
$$1 - \prod_{u \in N^+_{B_i}(v)} \left( 1 - \frac{1}{d(u)} \right) \ge 1 - \prod_{u \in N^+_{B_i}(v)} \left( 1 - \frac{1}{\alpha \cdot d(v)} \right) \ge 1 - \left( 1 - \frac{1}{\alpha \cdot d(v)} \right)^{d(v) \cdot \varphi / 2} \ge 1 - e^{-\varphi/(2\alpha)} \ge \frac{\varphi}{4\alpha},$$
where the first inequality is justified by (a), the second by (b), and the remaining two are classic algebraic manipulations. Using (a) again, we obtain that the probability of having a gain $g(v)$ of at least $\alpha^{-1} \cdot d(v)$ is at least $\frac{\varphi}{4\alpha}$.

4.7 Optimality of Corollary 4.5.1

Figure 4.1: A construction showing that Corollary 4.5.1 is sharp. Each node is labeled with its degree.

Consider the cut in Figure 4.1, with $\varphi = 2^{-t}$, for some integer $t \ge 1$.
The set of informed nodes $S$ is a star; its central node (shown in the figure), having degree $\frac{\log_2 1/\varphi}{\varphi}$, is connected to $\frac{\log_2 1/\varphi}{\varphi} - \log_2 1/\varphi$ leaves inside $S$, and to $\log_2 1/\varphi$ nodes outside $S$. The volume of $S$ is then $\mathrm{vol}(S) = \left( \frac{2}{\varphi} - 1 \right) \log_2 1/\varphi$, and the conductance of the cut is then $\frac{\varphi}{2 - \varphi} = \Omega(\varphi)$. (It follows that, for any sufficiently large order, there exists a graph of that order with conductance $\Theta(\varphi)$ that contains the graph in Figure 4.1 as a subgraph.) Finally, the $i$-th neighbour of $S$, $i = 1, \ldots, \log_2 1/\varphi$, has degree $2^i$.

Corollary 4.5.1, applied to our construction, gives that there exists some $p$, $\Omega(\varphi) \le p \le 1$, such that the gain in $p^{-1}$ rounds is $\Omega(p^{-1})$ with constant probability. (One can get a direct proof of this statement by analyzing the PULL performance.) We will show that Corollary 4.5.1 is sharp in the sense that, for each fixed constant $c > 0$, each $\epsilon > 0$, and any $p$ in the range, the probability of having a gain of at least $c \cdot p^{-1}$ in no more than $\epsilon \cdot p^{-1}$ rounds is $O(\epsilon)$. Observe that the claim is trivial if $p > \epsilon$, since then no gain can be obtained in zero rounds. We will then assume $p \le \epsilon$; because of the $O(\cdot)$ notation, we can also assume $\epsilon \le c/8$.

Let us analyze the PUSH strategy. Observe that the probability of performing a PUSH from $S$ to the outside in $\epsilon \cdot \varphi^{-1} \ge \Omega(\epsilon \cdot p^{-1})$ rounds is $1 - (1 - \varphi)^{\epsilon \varphi^{-1}} \le \epsilon$. Therefore the probability of gaining anything with the PUSH strategy is at most $\epsilon$.

Now let us analyze the PULL strategy. Fix any $\Omega(\varphi) \le p \le 1$. Let $A$ be the set of the neighbours of $S$ having degree less than $\frac{c}{2} \cdot p^{-1}$, and let $B$ be the set of the remaining neighbours. Then, the total volume of (and, thus, the total PULL gain from) the nodes in $A$ is not more than $c \cdot p^{-1} - 1$. Therefore, to obtain the required gain, we need a node in $B$ to make a PULL from $S$. The probability that some node in $B$ makes a PULL from $S$ in one round is upper bounded by
$$\sum_{i = \lceil \log_2 \frac{c}{2} p^{-1} \rceil}^{\log_2 1/\varphi} 2^{-i} \le 2 \cdot 2^{-\lceil \log_2 \frac{c}{2} p^{-1} \rceil} \le \frac{4p}{c}.$$
It follows that the probability $P$ that some node in $B$ performs a PULL from $S$ in $k = \epsilon \cdot p^{-1}$ rounds is at most $P \le 1 - \left( 1 - \frac{4p}{c} \right)^k$. Since $p \le \epsilon \le c/8$, we have $\frac{4p}{c} \le \frac{1}{2}$, and hence $1 - \frac{4p}{c} \ge 4^{-4p/c}$. Therefore,
$$P \le 1 - 4^{-k \cdot \frac{4p}{c}} = 1 - 4^{-\epsilon \cdot \frac{1}{p} \cdot \frac{4p}{c}} = 1 - 4^{-\frac{4\epsilon}{c}} = \Theta\left( \frac{\epsilon}{c} \right) = \Theta(\epsilon).$$
The claim is thus proved.

Chapter 5

Compressibility of the Web graph

Graphs resulting from human behavior (the web graph, friendship graphs, etc.) have hitherto been viewed as a monolithic class of graphs with similar characteristics; for instance, their degree distributions are markedly heavy-tailed. In this chapter we take our understanding of behavioral graphs a step further by showing that an intriguing empirical property of web graphs — their compressibility — cannot be exhibited by well-known graph models for the web and for social networks. We then develop a more nuanced model for web graphs and show that it does exhibit compressibility, in addition to previously modeled web graph properties.

5.1 Overview

There are three main reasons for modeling and analyzing graphs arising from the Web and from social networks: (1) they model social and behavioral phenomena whose graph-theoretic analysis has led to significant societal impact (witnessed by the role of link analysis in web search); (2) from an empirical standpoint, these networks are several orders of magnitude larger than those studied hitherto (search companies are now working on crawls of 100 billion pages and beyond); (3) from a theoretical standpoint, stochastic processes built from independent random events — the classical basis of the design and analysis of computing artifacts — are no longer appropriate. The characteristics of such behavioral graphs (viz., graphs arising from human behavior) demand the design and analysis of new stochastic processes in which elementary events are highly dependent.
This in turn demands new analyses and insights that are likely to be of utility in many other applications of probability and statistics.

In such analysis, there has been a tendency to lump together behavioral graphs arising from a variety of contexts, to be studied using a common set of models and tools. It has been observed [5, 21, 66], for instance, that the directed graphs arising from such diverse phenomena as the web graph (pages are nodes and hyperlinks are edges), citation graphs, friendship graphs, and email traffic graphs all exhibit power laws in their degree distributions: the fraction of nodes with indegree $k > 0$ is proportional to $1/k^{\alpha}$, typically for some $\alpha > 1$; random graphs generated by classic Erdős–Rényi models cannot exhibit such power laws. To explain the power law degree distributions seen in behavioral graphs, several models have been developed for generating random graphs [2, 5, 15, 16, 23, 37, 58, 71] in which dependent events combine to deliver the observed power laws.

While the degree distribution is a fundamental but local property of such graphs, an important global property is their compressibility — the number of bits needed to store each edge in the graph. Compressibility determines the ability to efficiently store and manipulate these massive graphs [53, 99, 107]. An intriguing set of papers by Boldi, Santini, and Vigna [9, 10, 12] shows that the web graph is highly compressible: it can be stored such that each edge requires only a small constant number — between one and three — of bits on average; a more recent experimental study confirms these findings [22].

The work described in this chapter is joint work with F. Chierichetti, R. Kumar, A. Panconesi and P. Raghavan; its extended abstract appeared in the Proceedings of the 50th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2009) [25]. This work is also part of F. Chierichetti's PhD thesis.
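The role of the edge-length distribution in compressibility can be illustrated with a small experiment. This is entirely our own sketch: it uses the empirical entropy of the length histogram as a crude proxy for the per-edge cost a gap-based encoder could achieve, not the actual Boldi–Vigna codec, and the two synthetic edge sets are invented for contrast.

```python
import math
import random
from collections import Counter

def length_entropy(order, edges):
    """Empirical entropy (bits/edge) of the edge-length distribution,
    where the length of an edge is the distance between its endpoints
    in the given node ordering."""
    pos = {v: i for i, v in enumerate(order)}
    hist = Counter(abs(pos[u] - pos[v]) for u, v in edges)
    m = sum(hist.values())
    return -sum(c / m * math.log2(c / m) for c in hist.values())

random.seed(3)
n = 200
order = list(range(n))
# "local" edges (short lengths, as after sorting pages by URL) vs. random edges
local_edges = [(i, i + gap) for i in range(n - 3) for gap in (1, 2, 3)]
random_edges = [tuple(random.sample(range(n), 2)) for _ in range(len(local_edges))]
h_local = length_entropy(order, local_edges)
h_random = length_entropy(order, random_edges)
```

The locality-heavy edge set has a far cheaper length distribution (here exactly $\log_2 3$ bits per edge) than the random one, mirroring why a skewed, power-law length distribution is crucial for high compressibility.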
These empirical results suggest the intriguing possibility that the Web can be described with only $O(1)$ bits per edge on average. Two properties are at the heart of the compression algorithm of Boldi and Vigna [10]. First, once web pages are sorted lexicographically by URL, the set of outlinks of a page exhibits locality; this can plausibly be attributed to the fact that nearby pages are likely to come from the same web site's template. Second, the distribution of the lengths of edges follows a power law with exponent $> 1$ (the length of an edge is the distance of its endpoints in the ordering); this turns out to be crucial for high compressibility. This prompts the natural question: can we model the compressibility of the web graph, in particular mirroring the properties of locality and edge length distribution, while maintaining other well-known properties such as the power law degree distribution?

Main results. Our first set of results in this chapter shows that the best known models for the web graph cannot account for compressibility, in the sense that they require $\Omega(\log n)$ bits of storage per edge on average. This holds even when these graphs are represented just in terms of their topology (i.e., with all labels stripped away). Specifically, we show that the preferential attachment model [5, 15], the ACL model [2], the copying model [65], the Kronecker product model [69], and Kleinberg's model for navigability¹ on social networks [58] all have large entropy in the above sense.

We then show our main result: a new model for the web graph that has constant entropy per edge, while preserving crucial properties of previous models such as the power law distribution of indegrees, a large number of communities (i.e., bipartite cliques), small diameter, and a high clustering coefficient. In this model, nodes lie on the line and when a new node arrives it selects an existing node uniformly at random, placing itself on the line to the immediate left of the chosen node.
An edge from the new node to the chosen node is added, and moreover all outgoing edges of the chosen node but one are copied (these edges are chosen at random); thus, the edges have some locality. We then show a crucial property of our model: the power law distribution of edge lengths. Intuitively, this long-get-longer effect arises because a long edge is likely to receive the new node (which selects its position uniformly at random) under its protective wing, and the longer it gets, the more likely it is to attract new nodes. Using this, we show that the graphs generated by our model are compressible to O(1) bits per edge; we also provide a linear-time algorithm to compress an unlabeled graph generated by our model. (Since navigability is a crucial property of real-life social networks (cf. [31, 73, 101]), it is tempting to conjecture that social networks are incompressible; see, for instance, [24].)

Technical contributions and guided tour. In Section 5.3 we prove that several well-known web graph models are not compressible, i.e., they need Ω(log n) bits per edge. In fact, we prove incompressibility even after the labels of nodes and orientations of edges are removed. Section 5.4 presents our new model, and Sections 5.5, 5.6 and 5.8 present its basic properties. Although our new model might at first sight closely resemble a prior copying model of [65], it differs from it in fundamental respects. First, our new model admits the global property of compressibility, which the copying model provably does not. Second, while the analysis of the distribution of indegrees is rather standard, the crucial property that edge lengths are distributed according to a power law requires an entirely novel analysis; in particular, the proof requires a very delicate understanding of the structural properties of the graphs generated by our model in order to establish the concentration of measure.
Section 6.3 addresses the compressibility of our model; there we also provide an efficient algorithm to compress graphs generated by our model. It is difficult to distinguish experimentally between graphs that require only O(1) bits per edge and those requiring, say, log n bits. The point, however, is that the compressibility of our model relies upon other important structural properties of real web graphs that previous models, in view of our lower bounds, provably cannot have.

Related prior work. The observation of power law degree distributions in behavioral (and other) graphs has a long history [5, 66]; indeed, such distributions predate the modern interest in social networks through observations in linguistics [108] and sociology [97]; see the survey by Mitzenmacher [80]. Simon [97], Mandelbrot [76], Zipf [108] and others have provided a number of explanations for these distributions, attributing them to the dependencies between the interacting humans who collectively generate these statistics. These explanations have found new expression in the form of rich-get-richer and herd-mentality theories [5, 103]. Early rigorous analyses of such models include [2, 15, 28, 65]. Whereas Kumar et al. [65] and Borgs et al. [16] focused on modeling the web graph, the models of Aiello, Chung, and Lu (ACL) [2], Kleinberg [58], Lattanzi and Sivakumar [67], and Leskovec et al. [69] addressed social graphs in which people are nodes and the edges between them denote friendship. The ACL model is in fact known not to be a good representation of the web graph [66], but it is a plausible model for human social networks. Kleinberg's model of social networks focuses on their navigability: it is possible for a node to find a short route to a target using only local, myopic choices at each step of the route. The papers by Boldi, Santini and Vigna [9, 10, 12] suggest that the web graph is highly compressible (see also [1, 22, 24, 99]).
5.2 Preliminaries

The graph models we study will either have a fixed number of nodes or will be evolving models in which nodes arrive in a discrete-time stochastic process; for many of them, the number of edges will be linear in the number of nodes. We analyze the space needed to store a graph randomly generated by the models under study; this can be viewed in terms of the entropy of the graph generation process. Note that a naive representation of a graph would require Ω(log n) bits per edge; entropically, one can hope for no better for an Erdős–Rényi graph. We are particularly interested in the case when the amortized storage per edge can be reduced to a constant. As in the work of Boldi and Vigna [10, 12], we view the nodes as being arranged in a linear order. To prove compressibility we then study the distribution of edge lengths — the distance in this linear order between the endpoints of an edge.

Background. We now recall a concentration result introduced in the previous chapters. Given a function f : A_1 × ⋯ × A_n → R, we say that f satisfies the c-Lipschitz property if, for any sequence (a_1, …, a_n) ∈ A_1 × ⋯ × A_n, any i, and any a'_i ∈ A_i,

|f(a_1, …, a_{i−1}, a_i, a_{i+1}, …, a_n) − f(a_1, …, a_{i−1}, a'_i, a_{i+1}, …, a_n)| ≤ c.

In order to establish that certain events occur w.h.p., we will make use of the following concentration result, known as the method of bounded differences (cf. [35]).

Theorem 5.2.1 (Method of bounded differences) Let X_1, …, X_n be independent r.v.'s. Let f be a function on X_1, …, X_n satisfying the c-Lipschitz property. Then,

Pr[|f(X_1, …, X_n) − E[f(X_1, …, X_n)]| > t] ≤ 2e^{−t²/(c²n)}.

We also prove the following lemma about the Gamma function; we will use it in the compressibility analysis of our new model.

Lemma 5.2.1 Let a, b ∈ R_+ be such that b ≠ a + 1.
For each t ∈ Z_+, it holds that

∑_{i=1}^{t} Γ(i+a)/Γ(i+b) = (1/(b−a−1)) · (Γ(a+1)/Γ(b) − Γ(t+a+1)/Γ(t+b)).

Proof: We start by giving an expression of Γ(i+a)/Γ(i+b), for i ≥ 1, that we will use to telescope the sum. Consider the following chain of equations:

b − a − 1 = (i + b − 1) − (i + a)
(Γ(i+a)/Γ(i+b)) · (b − a − 1) = (Γ(i+a)/Γ(i+b)) · (i + b − 1) − (Γ(i+a)/Γ(i+b)) · (i + a)
(Γ(i+a)/Γ(i+b)) · (b − a − 1) = Γ(i+a)/Γ(i+b−1) − Γ(i+a+1)/Γ(i+b)
Γ(i+a)/Γ(i+b) = (1/(b−a−1)) · (Γ(i+a)/Γ(i+b−1) − Γ(i+a+1)/Γ(i+b)).

Then, by telescoping on the sum terms, we get:

∑_{i=1}^{t} Γ(i+a)/Γ(i+b) = (Γ(a+1)/Γ(b) − Γ(a+2)/Γ(b+1) + Γ(a+2)/Γ(b+1) − Γ(a+3)/Γ(b+2) + ⋯ + Γ(a+t)/Γ(b+t−1) − Γ(a+t+1)/Γ(b+t)) / (b − a − 1)
= (Γ(a+1)/Γ(b) − Γ(a+t+1)/Γ(b+t)) / (b − a − 1),

proving the claim.

5.3 Incompressibility of the existing models

In this section we prove the inherent incompressibility of commonly studied random graph models for social networks and the web. We show that, on average, Ω(log n) bits per edge are necessary to store graphs generated by several well-known models for web/social networks, including the preferential attachment and the copying models. In our lower bounds, we show that the random graphs produced by the models we consider are incompressible even after removing the labels of their nodes and the orientations of their edges. Given a labeled/directed graph and its unlabeled/undirected counterpart (the set of graphs obtained from the initial graph by applying an isomorphism), the latter is more compressible than the former; in fact, the gap can be arbitrarily large [84, 102]. Thus the task of proving incompressibility of unlabeled/undirected versions of graphs generated by various models is made more challenging. (Note that it is crucial to analyze the compressibility of unlabeled graphs — the experiments on the web graph [10, 12] show how just the edges can be compressed using only ≈ 2 bits per edge.)
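As a quick numerical sanity check, the telescoping identity of Lemma 5.2.1 can be evaluated for a few (a, b, t) triples. The helper names below are illustrative; log-gamma is used to avoid overflow:

```python
import math

def gamma_ratio(x, y):
    # Γ(x)/Γ(y), computed via log-gamma to avoid overflow
    return math.exp(math.lgamma(x) - math.lgamma(y))

def lhs(a, b, t):
    # left-hand side of Lemma 5.2.1: the partial sum of Γ(i+a)/Γ(i+b)
    return sum(gamma_ratio(i + a, i + b) for i in range(1, t + 1))

def rhs(a, b, t):
    # right-hand side: the closed form obtained by telescoping
    return (gamma_ratio(a + 1, b) - gamma_ratio(t + a + 1, t + b)) / (b - a - 1)

# b ≠ a + 1 in every case, as required by the lemma
for a, b, t in [(0.5, 3.0, 10), (1.0, 3.5, 100), (2.0, 1.5, 50)]:
    assert abs(lhs(a, b, t) - rhs(a, b, t)) < 1e-9 * max(1.0, abs(lhs(a, b, t)))
```

Note that the identity also holds when b < a + 1 (the third triple), where both sides grow with t.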
We now give some intuition on why one cannot preclude an incompressible directed/labeled graph from becoming very compressible after removing the labels and directions. Consider the following (non-graph-related) random process. Suppose we have two bins B_1 and B_2, and suppose we toss two independent fair coins c_1, c_2. If c_1 is heads (resp., tails), then we place a white (resp., black) ball in B_1. Analogously, if c_2 is heads (resp., tails), then we place a white (resp., black) ball in B_2. Now, consider the r.v. X describing the status of the two distinguishable bins. It has four possible outcomes ((W, W), (W, B), (B, W), (B, B)) and each of them is equally likely; thus H(X) = 2. Now, suppose we empty the bins B_1 and B_2 on a table, and let Y be the random variable describing the status of the table after the two balls are placed on it. Y has three possible outcomes ({W, W}, {W, B}, {B, B}) and its entropy is H(Y) = 3/2 < 2 = H(X). Similarly, for n coins and n bins, we have H(X_n) = n and H(Y_n) = Θ(log n). Thus, we can get an exponential gap between the entropies of the labeled (i.e., each outcome can be matched to the coin toss that determined it) and unlabeled processes. For a graph-related example, suppose we choose a labeled transitive tournament on n nodes u.a.r. There are n! such graphs, each equally likely, so that the entropy is log(n!) = Θ(n log n). On the other hand, there exists a single unlabeled transitive tournament, i.e., the entropy of the unlabeled version is zero.

5.3.1 Proving incompressibility

Let G_n denote the set of all directed labeled graphs on n nodes. Let P_n^θ : G_n → [0, 1] denote the probability distribution on G_n induced by the random graph model θ. In this chapter we consider the preferential attachment model (θ = pref), the ACL model (θ = acl), the copying model (θ = copy), the Kronecker multiplication model (θ = krm), and Kleinberg's model (θ = kl).
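The labeled/unlabeled entropy gap in the bins example above can be verified with a short computation (an illustrative sketch; the function name is ours):

```python
import math
from collections import Counter
from itertools import product

def entropy(dist):
    # Shannon entropy in bits of a dict mapping outcomes to probabilities
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def labeled_unlabeled_entropies(n):
    # X_n: contents of n distinguishable bins; Y_n: the multiset of colors
    X = {o: 2.0 ** -n for o in product("WB", repeat=n)}
    Y = Counter()
    for o, p in X.items():
        Y[tuple(sorted(o))] += p      # forget which bin held which ball
    return entropy(X), entropy(dict(Y))

hx, hy = labeled_unlabeled_entropies(2)
assert abs(hx - 2.0) < 1e-9 and abs(hy - 1.5) < 1e-9   # H(X) = 2, H(Y) = 3/2

hx, hy = labeled_unlabeled_entropies(16)
assert abs(hx - 16.0) < 1e-9 and hy < 2 * math.log2(16)  # H(Y_n) = Θ(log n)
```

Here Y_n is just the number of white balls, a Binomial(n, 1/2) variable, whose entropy indeed grows only logarithmically in n.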
For a given θ, let H(P_n^θ) denote the Shannon entropy of the distribution P_n^θ, that is, the average number of bits needed to represent a directed labeled random graph generated by θ. Our goal is to obtain lower bounds on the representation. This is accomplished by the following min-entropy argument.

Lemma 5.3.1 (Min-entropy argument) Let G_n^* ⊆ G_n, P^+ ≤ ∑_{G ∈ G_n^*} P_n^θ(G), and P^* ≥ max_{G ∈ G_n^*} P_n^θ(G). Then, H(P_n^θ) ≥ P^+ · log(1/P^*).

Proof:

H(P_n^θ) = ∑_{G ∈ G_n} P_n^θ(G) log(1/P_n^θ(G)) ≥ ∑_{G ∈ G_n^*} P_n^θ(G) log(1/P_n^θ(G)) ≥ ∑_{G ∈ G_n^*} P_n^θ(G) log(1/P^*) ≥ P^+ · log(1/P^*).

Thus, to obtain lower bounds on H(P_n^θ), we will lower bound ∑_{G ∈ G_n^*} P_n^θ(G) by P^+ and upper bound max_{G ∈ G_n^*} P_n^θ(G) by P^*, for a suitably chosen G_n^* ⊆ G_n. For good lower bounds on H(P_n^θ), G_n^* has to be chosen judiciously. For instance, choosing a large G_n^* (say, G_n itself) might only yield a P^* that is moderately small; at the same time, it is important to choose a G_n^* such that P^+ is large.

Let H_n denote the set of all undirected unlabeled graphs on n nodes. Let ϕ : G_n → H_n be the many-to-one map that discards node and edge labels and edge orientations. For a given model θ, let Q_n^θ : H_n → [0, 1] be the probability distribution such that Q_n^θ(H) = ∑_{ϕ(G)=H} P_n^θ(G). Clearly, H(Q_n^θ) ≤ H(P_n^θ), and therefore lower bounds on H(Q_n^θ) are stronger and harder to obtain.

In the following subsections we consider a number of web graph models, showing that each of them requires Ω(log n) bits per link — that is, that they are all incompressible. We consider, in this order, the preferential attachment model [15], the Aiello–Chung–Lu (ACL) model [2], the copying model [65], the Kronecker multiplication model [69] and Kleinberg's small-world model [58].

5.3.2 Incompressibility of the preferential attachment model

Consider the preferential attachment model (pref[k]) defined in [15]. This model is parametrized by an integer k ≥ 1.
At time 1, the (undirected) graph consists of a single node x_1 with one self-loop. At time t > 1,
(1) a new node x_t, labeled t, is added to the graph;
(2) a random node y is chosen from the graph with probability proportional to its current degree (in this phase, the degree of x_t is taken to be 1);
(3) the edge x_t → y, labeled t mod k, is added to the graph; and
(4) if t is a multiple of k, nodes t − k + 1, …, t are merged together, preserving self-loops and multi-edges.

For k = 1, note that the graphs generated by the above model are forests. Since there are 2^{O(n)} unlabeled forests on n nodes (e.g., [83]), whose edges can be directed in at most 2^n ways, H(Q_n^{pref[1]}) = O(n), i.e., the graph without labels and edge orientations is compressible to O(1) bits per edge. The more interesting case is k ≥ 2, for which we show an incompressibility bound.

We underscore the importance of a good choice of G_n^* in applying Lemma 5.3.1. Consider the graph G having the first node of degree k(n + 1) and the other n − 1 nodes of degree k. Clearly, P_n^{pref[k]}(G) = ∏_{i=k+1}^{nk} (k−1+i)/(2i−1) ≥ 2^{−nk}. Thus, choosing a set G_n^* containing G would force us to have P^* ≥ 2^{−nk}, so that the entropy bound given by Lemma 5.3.1 would only be H(P_n^{pref[k]}) ≥ nk = Θ(n). (A similar issue would be encountered in the unlabeled case as well.) A careful choice of G_n^*, however, yields a better lower bound.

Theorem 5.3.1 H(Q_n^{pref[k]}) = Ω(n log n), for k ≥ 2.

Proof: Let G be a graph generated by pref[k]. Let deg_t(x_i), for i ≤ t, be the degree of the i-th inserted node at time t in G. By [29, Lemma 6], with probability 1 − O(n^{−3}), for each 1 ≤ t ≤ n, each node x_i, 1 ≤ i ≤ t, will have degree deg_t(x_i) < √(t/i) · log³ n in G. Let t^* = ⌈n^{1/3}⌉, and let ξ be the event: "∃t ≥ t^*, ∑_{i=1}^{t^*} deg_t(x_i) ≥ n^{3/4}." At time n, the sum of the degrees of nodes x_1, …
, x_{t^*} can be upper bounded by

∑_{i=1}^{t^*} deg_n(x_i) ≤ ∑_{i=1}^{t^*} √(n/i) · log³ n = √n · log³ n · ∑_{i=1}^{t^*} i^{−1/2} < O(n^{3/4}),

w.h.p. Indeed, Pr[ξ] ≤ O(n^{−3}). Now define t^+ = ⌈εn⌉, for some small enough ε > 0; let n be large enough that t^* < t^+. We call a node added after time t^+ good if it is not connected to any of the first t^* nodes. To bound the number of good nodes from below, we condition on the complement ξ̄ of ξ, and we upper bound the number of bad nodes. Using a union bound, the probability that node x_t, for t ≥ t^+, is bad can be upper bounded by k · n^{3/4}/(εn) ≤ O(n^{−1/4}). Let ξ' be the event: "at least (1 − 2ε)n nodes are good"; by stochastic dominance, the event ξ' happens w.h.p. In our application of Lemma 5.3.1, we will choose G_n^* ⊆ G_n to be the set of graphs satisfying ξ̄ ∩ ξ'. Thus, P^+ = Pr[ξ̄ ∩ ξ'] = 1 − o(1). Moreover,

max_{G ∈ G_n^*} P_n^{pref[k]}(G) ≤ ( √(n/t^*) · log³ n / (εkn) )^{(1−2ε)kn} ≤ O(n^{−2/3+ε})^{2(1−2ε)n} ≤ n^{−(4/3)n + (14/3)εn} = ρ.

(Notice how, by applying Lemma 5.3.1 at this point, we already have that H(P_n^{pref[k]}) ≥ Ω(n log n).)

(In the original PA model, edges are both undirected and unlabeled; we direct and label them for simplicity of exposition. The entropy lower bound will hold for the undirected and unlabeled version of these graphs.)

Now, we proceed to lower bound H(Q_n^{pref[k]}) through an upper bound on |ϕ^{−1}(H)| for H ∈ H_n, by a careful counting argument. Given an H, it is possible to determine, for each of its edges, which of the two endpoints was responsible for adding the edge to the graph. This task is easy for edges incident to any node of degree k, as that node must have added all k of its edges to the graph. So, we can remove all degree-k nodes from the graph and repeat this process until the graph becomes empty. Thus, H could have been produced from at most n! · (k!)^n labeled graphs, since there are at most n! ways of labeling the nodes, and k!
ways of labeling each of the "outgoing" edges of each node. That is, |ϕ^{−1}(H)| ≤ n! · (k!)^n ≤ n^n · k^{kn}. Then, choosing H_n^* ⊆ H_n to be the set of unlabeled graphs obtained by removing labels from G_n^*, i.e., H_n^* = {ϕ(G) | G ∈ G_n^*}, we obtain P^+ = 1 − o(1), and

max_{H ∈ H_n^*} Q_n^{pref[k]}(H) ≤ ρ · n^n · k^{kn} = n^{−Ω(n)} · k^{kn} = P^*.

Finally, an application of Lemma 5.3.1 gives H(Q_n^{pref[k]}) ≥ P^+ · log(1/P^*) ≥ Ω(n log n), completing the proof.

5.3.3 Incompressibility of the ACL model

We recall the ACL model (model A in [2]). This model (acl[α]) is parametrized by some α ∈ (0, 1). At time 1, the graph consists of a single node. At time t + 1, a coin is tossed: with probability 1 − α, a new node is added to the graph, and with probability α, an edge from x to y is added to the graph, where node x is chosen with probability proportional to the outdegree of x, while node y is chosen with probability proportional to the indegree of y. We assume that α > 1/2. This is because the edge density of the graph generated by the model is α/(1 − α), w.h.p.; if α < 1/2, then there are many more nodes than edges, an uninteresting case both in theory and in practice. Under this assumption, we show H(P'_n^{acl[α]}) = Ω(n log n).

Theorem 5.3.2 H(Q'_n^{acl[α]}) = Ω(n log n), for α > 1/2.

Proof: Let α > 1/2 be the parameter of the acl[α] model. Let G'_n be the set of all time-labeled graphs that can be generated by the acl[α] model in n time steps, where the labels represent the times at which nodes and edges were added to the graph. Let H'_n be the set of all undirected and unlabeled graphs that can be obtained by removing the orientations and (time-)labels from the graphs in G'_n. Let P'_n^{acl[α]} : G'_n → [0, 1] denote the probability distribution induced on G'_n by the model acl[α]. We define the following two events. (Here we do not use the probability distribution Q_n on graphs of n nodes — in the acl[α] model the number of nodes is a r.v.;
Q'_n^{acl[α]} denotes the probability distribution on the graphs that can be generated by the acl[α] model in n steps. Note also that here it would be unnatural to consider the previously defined class G_n, as the number of nodes in the acl[α] model is a r.v.; the same holds for H_n.)

ξ: the number of edges is αn ± o(n), while the number of nodes is (1 − α)n ± o(n); and
ξ': the number of edges going from a node of O(1) outdegree to a node of O(1) indegree is at least (α − ε)n, for some ε > 0 to be fixed later.

Our plan is to first show (Lemma 5.3.2) that ξ ∧ ξ' occurs with probability 1 − o(1). Let G'^*_n ⊆ G'_n be the subset of G'_n containing the graphs satisfying ξ ∧ ξ'. Then, with the notation of Lemma 5.3.1, it holds that P^+ = 1 − o(1). We will then show (Lemma 5.3.3) that P^* = max_{G' ∈ G'^*_n} P'_n^{acl[α]}(G') ≤ n^{−(2α−ε)n}. Given these, we can complete the proof as follows. Let ϕ' : G'_n → H'_n be the map that removes edge and node labels from the graphs of G'_n. As before, Q'_n^{acl[α]}(H') = ∑_{ϕ'(G')=H'} P'_n^{acl[α]}(G'). Note that for each H' we have |ϕ'^{−1}(H')| ≤ n! (as each element of the graph has one label out of the set {1, …, n}). Thus,

max_{H' ∈ ϕ'(G'^*_n)} Q'_n^{acl[α]}(H') ≤ n! · n^{−(2α−ε)n} ≤ n^{−(2α−ε)n+n} = n^{(1−2α+ε)n}.

The proof can be concluded with an application of Lemma 5.3.1.

Lemma 5.3.2 Pr[ξ ∧ ξ'] = 1 − o(1).

Proof: By a Chernoff bound, Pr[ξ] = 1 − o(1). Thus it suffices to show that Pr[ξ'] = 1 − o(1). Let X_i^t (resp., Y_i^t) be the r.v. denoting the number of nodes having indegree (resp., outdegree) i at time t. The authors of [2] show that

E[X_i^t]/t = E[Y_i^t]/t = ((1−α)/α) · Γ(1 + 1/α) · Γ(i)/Γ(i + 1 + 1/α) ± O(1/t),

and that

Pr[|X_i^t − E[X_i^t]| > √(2t) · log n + 2] < exp(−log² n),
Pr[|Y_i^t − E[Y_i^t]| > √(2t) · log n + 2] < exp(−log² n).

Note that, by a union bound, all of the r.v.'s X_i^t, Y_i^t can be shown to deviate from their means by at most the stated error term w.h.p. Let j be an integer to be fixed later.
An edge is good if it goes from a node of outdegree ≤ j to a node of indegree ≤ j. Let us denote by Z_j^t the number of good edges at time t. Note that Z_j^{t−1} + 1 ≥ Z_j^t ≥ Z_j^{t−1} − 2j. This is because at most one edge is added in a single step, and adding an edge changes the degrees of at most 2 nodes; thus, the number of good edges can decrease by at most 2j in a single step, i.e., Z_j^t satisfies the (2j)-Lipschitz condition. Then,

E[Z_j^t] = E[Z_j^{t−1}] + Pr[Z_j^t = Z_j^{t−1} + 1] − ∑_{i=1}^{2j} i · Pr[Z_j^t = Z_j^{t−1} − i].

In order to increase the number of good edges, a node of indegree < j and a node of outdegree < j must be chosen as the ending and the starting points of the new edge:

Pr[Z_j^t = Z_j^{t−1} + 1] = α · (∑_{i=1}^{j−1} i X_i^{t−1}) (∑_{i=1}^{j−1} i Y_i^{t−1}) / (t − 1)².

For the number of good edges to decrease, either the origin of the new edge has outdegree j, or the destination of the new edge has indegree j. Thus,

Pr[Z_j^t < Z_j^{t−1}] ≤ j X_j^{t−1}/(t − 1) + j Y_j^{t−1}/(t − 1).

By calculation,

∑_{i=1}^{2j} i · Pr[Z_j^t = Z_j^{t−1} − i] ≤ 2j · ∑_{i=1}^{2j} Pr[Z_j^t = Z_j^{t−1} − i] ≤ 2j² · (X_j^{t−1} + Y_j^{t−1})/(t − 1).

Thus,

E[Z_j^t] ≥ E[Z_j^{t−1}] + α · E[(∑_{i=1}^{j−1} i X_i^{t−1})(∑_{i=1}^{j−1} i Y_i^{t−1})]/(t − 1)² − 2j² · (E[X_j^{t−1}] + E[Y_j^{t−1}])/(t − 1).

With probability 1 − o(1), for all log² n ≤ t ≤ n and 1 ≤ i ≤ j − 1, we have

X_i^t = Y_i^t = (1 ± o(1)) · ((1−α)/α) · Γ(1 + 1/α) · Γ(i)/Γ(i + 1 + 1/α) · t.

Thus w.h.p., for all t ≥ log² n,

∑_{i=1}^{j−1} i X_i^t/t = ∑_{i=1}^{j−1} i Y_i^t/t = 1 − Γ(j + 1)Γ(1 + 1/α)/Γ(j + 1/α) ± o(1),

where, as j is a constant, the error term is o(1). Then,

E[Z_j^t] ≥ E[Z_j^{t−1}] + α · (1 − Γ(j + 1)Γ(1 + 1/α)/Γ(j + 1/α))² − 4 · ((1−α)/α) · Γ(1 + 1/α) · j² Γ(j)/Γ(j + 1 + 1/α) ± o(1).

Note that, as j grows, both Γ(j + 1)Γ(1 + 1/α)/Γ(j + 1/α) and j² Γ(j)/Γ(j + 1 + 1/α) tend to 0. That is, for each ε₂ > 0 there exists a j = j(ε₂) such that

E[Z_j^t] ≥ E[Z_j^{t−1}] + (1 − ε₂)α.    (5.1)

For each j and for each t, we will define a B_j^t in such a way that, w.h.p., B_j^t ≤ E[Z_j^t]. Let B_j^t = 0 for t ≤ ⌈log³ n⌉, so that the base case is true; for t > ⌈log³ n⌉, define B_j^t = (t − ⌈log³ n⌉)(1 − ε₂)α.
This definition satisfies B_j^t ≤ E[Z_j^t] for all t — this can be shown by induction using (5.1). Recall that all these bounds hold w.h.p. As we have already argued, the r.v. Z_j^t satisfies the (2j)-Lipschitz condition; thus, using [2, Lemma 1],

Z_j^t = E[Z_j^t] ± o(t) ≥ t(1 − 2ε₂)α,

for every t ≥ ⌈log³ n⌉, w.h.p. In particular, for any ε₂ > 0 there exists a j = j(ε₂) such that Z_j^n ≥ n(1 − 2ε₂)α, w.h.p.

Lemma 5.3.3 Conditioned on ξ ∧ ξ', max_{G' ∈ G'^*_n} P'_n^{acl[α]}(G') ≤ n^{−(2α−ε)n}.

Proof: Since we condition on ξ', there are at least n(1 − 2ε₂)α good edges. These good edges are labeled with their order of arrival. For i ≥ ε₃αn, the probability that the i-th arrived edge is a given good edge is at most j²/i² ≤ (j/(ε₃αn))². The probability that all the edges with label at least ε₃αn match given good edges is therefore at most

(j/(ε₃αn))^{2(1−ε₃)αn} ≤ (j/(ε₃α))^{2(1−ε₃)αn} · n^{−2(1−ε₃)αn} ≤ n^{−(2α−ε)n}.

Thus, the maximum probability of generating a graph in G'^*_n, conditioned on ξ ∧ ξ', is at most n^{−(2α−ε)n}.

5.3.4 Incompressibility of the copying model

We now turn our attention to the (linear growth) copying model (copy[α, k]) of Kumar et al. [65]. This model is parametrized by an integer k ≥ 1 and an α ∈ (0, 1). Here, k represents the outdegree of the nodes and α determines the "copying rate" of the graph. At time t = 1, the graph consists of a single node with k self-loops. At time t > 1,
(1) a new node x_t is added to the graph;
(2) a node x is chosen uniformly at random among x_1, …, x_{t−1}; and
(3) for each i = 1, …, k, an α-biased coin is flipped: with probability α, the i-th outlink of x_t is chosen uniformly at random from x_1, …, x_{t−1}, and with probability 1 − α, the i-th outlink of x_t is set equal to the i-th outlink of x, i.e., the i-th outlink is "copied".

Theorem 5.3.3 H(Q_n^{copy[α,k]}) = Ω(n log n), for k > 2/α.

Proof: We start by noting that the copying model with outdegree k can be completely described by k independent versions of the copying model with outdegree 1.
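As an aside before the proof details, the copy[α, k] process defined above can be sketched in a few lines (an illustrative sketch with hypothetical names; nodes are 0-indexed here):

```python
import random

def copying_model(n, k, alpha, seed=0):
    """Linear-growth copying model of Kumar et al.: returns out[t],
    the list of the k outlinks of node t."""
    rng = random.Random(seed)
    out = [[0] * k]                 # node 0 starts with k self-loops
    for t in range(1, n):
        proto = rng.randrange(t)    # prototype chosen u.a.r.
        links = []
        for i in range(k):
            if rng.random() < alpha:
                links.append(rng.randrange(t))   # fresh u.a.r. endpoint
            else:
                links.append(out[proto][i])      # copy the i-th outlink
        out.append(links)
    return out

g = copying_model(1000, 3, 0.5)
assert len(g) == 1000 and all(len(ls) == 3 for ls in g)
assert all(z < t for t in range(1, 1000) for z in g[t])  # links point backwards
```

Note that the k coordinates evolve independently, which is exactly the decomposition into k outdegree-1 copies used in the proof.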
We use copy[α, k] to denote the copying model with k outlinks, G_{n,k} for the set of labeled graphs on n nodes that can be generated by copy[α, k], and H_{n,k} for the set of unlabeled graphs that can be obtained by removing labels and orientations from the graphs in G_{n,k}. (Nodes are labeled with 1, …, n and, for each node, its outlinks are labeled with 1, …, k.)

We start with the case k = 1. Let E[X_i^t] be the expected indegree at time t of the node inserted at time i ≤ t. Then,

E[X_i^t] = 0 for t = i, and E[X_i^t] = E[X_i^{t−1}] · (1 + (1−α)/(t−1)) + α/(t−1) for t > i.

Note that E[X_i^t] = (α/(1−α)) · Γ(t+1−α)Γ(i)/(Γ(i+1−α)Γ(t)) − α/(1−α). We now show that X_i^t satisfies an O(1)-Lipschitz condition, with the constant depending on i and α. Let Y_j^t denote the number of edges "copied", directly or indirectly, from the j-th edge until time t ≥ j. Precisely, let us define the singleton S_j^j = {e_j} containing the j-th added edge. The set S_j^t, t > j, is defined as follows: if the t-th edge e_t was copied from some edge in S_j^{t−1}, then S_j^t = {e_t} ∪ S_j^{t−1}; otherwise S_j^t = S_j^{t−1}. With this notation, Y_j^t = |S_j^t|. We now use the following concentration bound [35].

Theorem 5.3.4 (Method of average bounded differences) Suppose f is some function of (possibly dependent) r.v.'s X_1, …, X_n. Suppose that, for each i = 1, …, n, there exists a c_i such that, for all pairs x_i, x'_i of possible values of X_i, and for any assignment X_1 = x_1, …, X_{i−1} = x_{i−1}, it holds that |E − E'| ≤ c_i, where

E = E[f(X_1, …, X_n) | X_i = x_i, X_{i−1} = x_{i−1}, …, X_1 = x_1],
E' = E[f(X_1, …, X_n) | X_i = x'_i, X_{i−1} = x_{i−1}, …, X_1 = x_1].

Let c = ∑_{i=1}^n c_i². Then,

Pr[|f(X_1, …, X_n) − E[f(X_1, …, X_n)]| > t] ≤ 2 exp(−t²/(2c)).

Let j be fixed. Our goal is to bound c_j in such a way that Theorem 5.3.4 can be applied. Observe that E[Y_j^t] = E[Y_j^{t−1}] · (1 + (1−α)/(t−1)), for t > j, and Y_j^j = 1. It follows that E[Y_j^t] = Γ(t+1−α)Γ(j)/(Γ(t)Γ(j+1−α)).
Suppose we want to bound the degree of the i-th node x_i. Then, we are interested in bounding the maximum expected change c_j in the degree X_i^n of x_i over the possible choices of the j-th edge, for j = i + 1, …, n. We have c_j ≤ 2 E[Y_j^n]. Let us consider c = ∑_{j=i+1}^n c_j². We have

c ≤ ∑_{j=i+1}^n 4 E[Y_j^n]² ≤ 4 · (Γ(n+1−α)/Γ(n))² · ∑_{j=i+1}^n (Γ(j)/Γ(j+1−α))² ≤ a · n^{2−2α} · ∑_{j=i+1}^n (1/j)^{2−2α},

for some large enough constant a > 0. Thus we obtain

c ≤ a · n^{2−2α} · (i^{2α−1} − n^{2α−1})/(1−2α)   for α < 1/2,
c ≤ a · n · (log n + 1)   for α = 1/2,
c ≤ a · n^{2−2α} · ((n^{2α−1} − i^{2α−1})/(2α−1) + 1)   for α > 1/2.

Let us fix i = ⌈εn⌉. Then,

c ≤ a · n^{2−2α} · n^{2α−1} · (ε^{2α−1} − 1)/(1−2α) = a · n · (ε^{2α−1} − 1)/(1−2α)   for α < 1/2,
c ≤ a · n · (log n + 1)   for α = 1/2,
c ≤ a · n^{2−2α} · n^{2α−1}/(2α−1) + a · n^{2−2α} = a · n/(2α−1) + o(n)   for α > 1/2.

Thus, c ≤ O(n log n). Applying Theorem 5.3.4, we get

Pr[|X_i^n − E[X_i^n]| ≥ 2√(c log n)] ≤ 2 exp(−4c log n/(2c)) = 2 exp(−2 log n) = 2/n².

By the union bound, with probability 1 − O(1/n), each node i = ⌈εn⌉, ⌈εn⌉ + 1, …, n will have degree upper bounded by O(√(n log n)) (note that the expected degree of these nodes is constant). Conditioning (as in the proof of Theorem 5.3.1) on this event, we obtain, for k = 1, a set G^*_{n,1} ⊆ G_{n,1} with P^+ = 1 − o(1), and

max_{G ∈ G^*_{n,1}} P_n^{copy[α,1]}(G) ≤ (O(√(log n/n)))^{αn}.

Now let us consider copy[α, k], with k > 1. Since the outlinks are chosen independently, it holds that

max_{G ∈ G^*_{n,k}} P_n^{copy[α,k]}(G) ≤ (O(√(log n/n)))^{kαn}.

For constant k > 2/α, this upper bound is less than n^{−(1+ε)n} for some constant ε > 0. To show a lower bound on H(Q_n^{copy[α,k]}), we once again upper bound |ϕ^{−1}(H)| for H ∈ H_{n,k}. We proceed as in the proof of Theorem 5.3.1. Given H, for each of its nodes v, it is possible to determine which of the edges incident to v were its outlinks in all the G's such that ϕ(G) = H (this can be done by induction, noting that a node of degree k in H would have had indegree 0 in G).
As there are exactly k labels for the outlinks of each node, and the number of nodes is n, we have that, for each H ∈ H_{n,k}, |ϕ^{−1}(H)| ≤ n! · (k!)^n. The proof can be concluded as in Theorem 5.3.1.

5.3.5 Incompressibility of the Kronecker multiplication model

We now turn our attention to the Kronecker multiplication model (krm) of Leskovec et al. [69]. Given two matrices A ∈ R^{n×n} and B ∈ R^{m×m}, their Kronecker product A ⊗ B is the nm × nm matrix

A ⊗ B =
( a_{1,1}B  a_{1,2}B  ⋯  a_{1,n}B )
( a_{2,1}B  a_{2,2}B  ⋯  a_{2,n}B )
(    ⋮         ⋮      ⋱     ⋮    )
( a_{n,1}B  a_{n,2}B  ⋯  a_{n,n}B ),

where A = {a_{i,j}} and a_{i,j}B denotes the usual scalar multiplication. The Kronecker multiplication model is parametrized by a square matrix M ∈ [0,1]^{ℓ×ℓ} and a number s of multiplication "steps". The graph is composed of ℓ^s nodes, and its edges are generated as follows: for each pair of distinct nodes (i, j), an edge going from i to j is added independently with probability M_{i,j}^{[s]}, where M^{[s]} = M ⊗ M ⊗ ⋯ ⊗ M (s times).

It is clear that for some choices of the matrix M the graph will be compressible. Indeed, if M has only 0/1 values, then the random graph has zero entropy, as its construction is completely deterministic. On the other hand, we show here that there exist matrices M that make the graph incompressible. Indeed, even some 2 × 2 matrices M generate graphs requiring at least Ω(log n) bits per edge, and we expect many probability matrices to exhibit the same behavior. (Note that a 1 × 1 matrix can only produce graphs containing a single node.)

Theorem 5.3.5 Let ℓ ≥ 2, let J_ℓ be the ℓ × ℓ all-ones matrix, and let 1/ℓ < α < 1. Then, w.h.p., H(Q_n^{krm[M,s]}) = Ω(m log n), where n = ℓ^s, M = α · J_ℓ, and m is the number of edges.

Proof: Consider the original directed version of the graph. Note that M^{[s]} = α^s · J_{ℓ^s}. Thus the events "the edge i → j is added to the graph" are i.i.d. trials, each having probability of success α^s.
In the undirected and simple version of the graph, the events "the edge {i, j} is added to the graph", for i ≠ j, are again i.i.d. trials, each with probability β = 1 − (1 − α^s)² = Θ(α^s). Thus we obtain an Erdős–Rényi G_{n,p} graph with n = ℓ^s and p = Θ(α^s). By a Chernoff bound, m = Θ(n²p), w.h.p. Now,

m = Θ(n²p) = Θ(n · (ℓα)^s) = Θ(n · (ℓ^s)^{1−log_ℓ(1/α)}) = Θ(n^{2−log_ℓ(1/α)}).

Since α > ℓ^{−1}, we obtain log_ℓ(1/α) < 1; thus m = Θ(n²p) is a polynomial in n of degree > 1.

Recall that, for Lemma 5.3.1 to apply, we need to find a subset H_n^* ⊆ H_n having large total probability P^+, and such that each graph in H_n^* has probability upper bounded by a (small) P^*. The condition {m = Θ(n²p)} determines our H_n^*, giving us P^+ = 1 − o(1). To upper bound P^*, note that each labeled version of each graph in H_n^* has probability ≤ p^{Θ(n²p)} ≤ 2^{−Θ(s·n²p)}. There are at most n! ≤ 2^{O(n log n)} labeled versions of each fixed graph in H_n^*. Thus,

P^* ≤ 2^{O(n log n) − Θ(s·n²p)} = 2^{−Θ(s·n²p)}.

By Lemma 5.3.1, we have H(Q_n^{krm}) ≥ P^+ · log(1/P^*) ≥ Θ(s·n²p). Noting that s = Θ(log n) and m = Θ(n²p) concludes the proof.

5.3.6 Incompressibility of Kleinberg's small-world model

Recall Kleinberg's small-world model (kl) [58, 59] on the line, with nodes 1, …, n. A directed labeled random graph is generated by the following stochastic process. Each node x independently chooses a node y with probability proportional to 1/|x − y| and adds the directed edge x → y; these are the so-called long-range edges. In addition, node x has (fixed) directed edges to its neighbors x − 1 and x + 1 (the short-range edges). For simplicity, we start by proving the following weaker result; after the proof, we will comment on how one can obtain the stronger incompressibility bound of Ω(n log n).

Lemma 5.3.4 H(Q_n^{kl}) = Ω(n log log n).
(Note that an important difference between Kleinberg's small-world model and the other models considered in this chapter lies in their degree distributions: node degrees in Kleinberg's model are upper bounded by O(log n) w.h.p., whereas the other models we consider here have power law degree distributions, and thus nodes of polynomial degree, w.h.p.)

Proof: Note that in Kleinberg's one-dimensional model, the normalization factor for the probability distribution that generates long-range edges is Θ(log n). Hence, for every node x, the maximum probability of choosing a particular long-range edge x → y is at most c₁/log n, for some constant c₁. Since each node chooses its edges independently, the maximum probability of generating any labeled n-node graph is O((c₁/log n)^n), i.e., max_{G ∈ G_n} P_n^{kl}(G) ≤ (c₁/log n)^n. Using Lemma 5.3.1, we conclude H(P_n^{kl}) = Ω(n log log n).

To get a lower bound on H(Q_n^{kl}), we first obtain an upper bound on the number ρ(H) of Hamiltonian paths in an undirected graph H with m edges (this upper bound will hold for directed graphs as well). Suppose that H has degree sequence d₁ ≥ ⋯ ≥ d_n, with 2m = ∑_{i=1}^n d_i. Clearly, ρ(H) ≤ n · ∏_{i=1}^n d_i, where the leading n accounts for the different choices of the starting node. Applying the AM–GM inequality ((∏_{i=1}^n x_i)^{1/n} ≤ (1/n) ∑_{i=1}^n x_i, for non-negative x_i's), we have ρ(H) ≤ n · ∏_{i=1}^n d_i ≤ n · (2m/n)^n.

Let H ∈ H_n. By just considering all possible permutations of the node labels, we can see that |ϕ^{−1}(H)| ≤ n!. However, not all permutations are valid. In particular, a valid permutation preserves adjacency; hence the number of valid permutations is upper bounded by the number of Hamiltonian paths in H. Since m = O(n) in kl, by the above argument, ρ(H) ≤ c₂^n, for some constant c₂. Thus, |ϕ^{−1}(H)| ≤ c₂^n. We have

Q_n^{kl}(H) = ∑_{ϕ(G)=H} P_n^{kl}(G) ≤ |ϕ^{−1}(H)| · max_{G ∈ G_n} P_n^{kl}(G) ≤ c₂^n · (c₁/log n)^n = (O(1/log n))^n.

The proof is complete by appealing to Lemma 5.3.1.
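Kleinberg's one-dimensional process, as recalled above, is easy to simulate; the following sketch (hypothetical function name; the O(n²) weight computation is kept for clarity) builds the short-range and long-range edges:

```python
import random

def kleinberg_line(n, seed=0):
    """One-dimensional Kleinberg small world: each node has short-range
    edges to its line neighbors plus one long-range edge chosen with
    Pr[x -> y] proportional to 1/|x - y|."""
    rng = random.Random(seed)
    edges = []
    for x in range(n):
        if x > 0:
            edges.append((x, x - 1))            # short-range edges
        if x < n - 1:
            edges.append((x, x + 1))
        others = [y for y in range(n) if y != x]
        weights = [1.0 / abs(x - y) for y in others]
        y = rng.choices(others, weights=weights, k=1)[0]
        edges.append((x, y))                     # long-range edge
    return edges

es = kleinberg_line(200)
assert len(es) == 200 + 2 * 199   # n long-range + 2(n-1) short-range edges
```

The normalization factor ∑_{y≠x} 1/|x−y| is Θ(log n), which is exactly the quantity driving the Ω(n log log n) bound in the proof above.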
The above lower bound can be improved as follows. First, we only consider graphs in which $\Omega(n)$ of the edges exist between nodes that are $n^{\Omega(1)}$ apart. Using a Chernoff bound, we show that the graphs generated by Kleinberg's model satisfy this property w.h.p. (i.e., the $P^+$ of Lemma 5.3.1 is $\Omega(1)$). It can then be shown that the maximum probability of generating any one of these graphs is at most $P^* = n^{-\Omega(n)}$. Once again, applying Lemma 5.3.1, we can obtain the following theorem:

Theorem 5.3.6 $H(Q_n^{kl}) = \Omega(n \log n)$.

Finally, we note that similar incompressibility bounds can be obtained for the rank-based friendship model [73].

5.4 The new web graph model

In this section we present our new web graph model. Let $k \ge 2$ be a fixed positive integer. Our new model creates a directed simple graph (i.e., no self-loops or multi-edges) by the following process. The process starts at time $t_0$ with a simple directed seed graph $G_{t_0}$ whose nodes are arranged on a (discrete) line, or list. The graph $G_{t_0}$ has $t_0$ nodes, each of outdegree $k$. Here, $G_{t_0}$ could be, for instance, a complete directed graph with $t_0 = k + 1$ nodes. At time $t > t_0$, an existing node $y$ is chosen uniformly at random (u.a.r.) as a prototype:

(1) a new node $x$ is placed to the immediate left of $y$ (so that $y$, and all the nodes on its right, are shifted one position right in the ordering),
(2) a directed edge $x \to y$ is added to the graph, and
(3) $k - 1$ edges are "copied" from $y$, i.e., $k - 1$ successors (i.e., out-neighbors) of $y$, say $z_1, \ldots, z_{k-1}$, are chosen u.a.r. without replacement and the directed edges $x \to z_1, \ldots, x \to z_{k-1}$ are added to the graph.

Figure 5.1 (with $k = 2$): The new node $x = D$ chooses $y = C$ as its prototype. The edge $C \to B$ is copied and the new edge $D \to C$ is added for reference. Notice that all the edges incident to $C$ in $G_{t_0} = G_3$ increase their length by 1 in $G_{t_0+1} = G_4$.
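A minimal simulation of the three steps above may help; this is our own sketch (names like `copy_model` are not from the text), maintaining the left-to-right ordering as a Python list and the out-edges as ordered successor lists:

```python
import random

def copy_model(n, k=2, seed=0):
    # Sketch of the copying process: start from a complete directed seed
    # graph on k+1 nodes; each new node x picks a prototype y u.a.r., is
    # inserted immediately to y's left, links to y, and copies k-1 of y's
    # out-edges chosen without replacement.
    rng = random.Random(seed)
    t0 = k + 1
    order = list(range(t0))                  # positions on the line
    succ = {v: [u for u in range(t0) if u != v] for v in range(t0)}
    for x in range(t0, n):
        y = rng.choice(order)                # prototype chosen u.a.r.
        order.insert(order.index(y), x)      # place x just left of y
        copied = rng.sample(succ[y], k - 1)  # copy k-1 out-edges of y
        succ[x] = [y] + copied
    return order, succ

order, succ = copy_model(50, k=2, seed=1)
```

Since the copied targets are distinct successors of $y$ and differ from $y$ itself, every node keeps outdegree exactly $k$ and the graph stays simple.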
See Figure 5.1 for an illustration of our model. An intuitive explanation of this process is as follows. Consider the list of web pages ordered lexicographically by their URLs (for this ordering, a URL a.b.com/d/e is to be interpreted as com/b/a/d/e). A website owner might decide to add a new web page to her site; to do this, she could take one of the existing web pages from her site as a prototype, modify it as needed, add an edge to the prototype for reference, and publish the new page on her site. Thus the new web page and the prototype will be close in the URL ordering. In our model, we can show the following:

(1) The fraction of nodes of indegree $i$ is $\Theta(i^{-2-\frac{1}{k-1}})$; this power law is often referred to as "rich get richer."
(2) The fraction of edges of length$^7$ $\ell$ in the given embedding is $\Theta(\ell^{-1-\frac{1}{k}})$; analogously, we refer to this as "long get longer."

Boldi and Vigna [10] study the distribution of gaps in the web graph, defined as follows. Sort the web pages lexicographically by URL; this gives an embedding of the nodes on the line. Now, if a web page $x = z_0$ has edges to $z_1, \ldots, z_j$ in this order, the gaps are given by $|z_{i-1} - z_i|$, $1 \le i \le j$. They observe that the gap distribution in real web graph snapshots follows a power law with exponent $\approx 1.3$. Our model can capture a similar distribution for the edge lengths, by an appropriate choice of $k$. In fact, both the average edge length and the average gap in our model are small; intuitively, though not immediately, this leads to the compressibility result of Section 5.7. It turns out that a power law distribution of either the lengths or the gaps (with exponent $> 1$) is sufficient to show compressibility; for the sake of simplicity, we focus on the former in Section 5.6.

$^7$ The length of an edge $x \to y$ is the absolute difference between the positions of nodes $x$ and $y$ in the given embedding.
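The gap sequence of a node, as defined above, can be computed directly; the positions in this sketch are made up for illustration:

```python
def gaps(x_pos, successor_positions):
    # Gaps of a node: with z0 = the node's own position and z1..zj the
    # positions of its successors in list order, the gaps are |z_{i-1} - z_i|.
    zs = [x_pos] + list(successor_positions)
    return [abs(zs[i - 1] - zs[i]) for i in range(1, len(zs))]

# A node at position 10 pointing to positions 12, 13 and 40:
g = gaps(10, [12, 13, 40])   # -> [2, 1, 27]
```

Note that consecutive successors that are close on the line produce small gaps, which is exactly what a variable-length integer code rewards.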
5.5 Rich get richer

In this section we characterize the indegree distribution of our graph model. We show that the expected indegree distribution follows a power law. We then show that the distribution is tightly concentrated. Let
$$f(i) = \frac{k \cdot 2^{1+\frac{2}{k-1}} \, \Gamma\!\left(\frac{3}{2} + \frac{1}{k-1}\right)}{(k-1)\sqrt{\pi}} \cdot \frac{\Gamma\!\left(i + 1 + \frac{1}{k-1}\right)}{\Gamma\!\left(i + 3 + \frac{2}{k-1}\right)}.$$
It follows that
$$\lim_{i \to \infty} f(i) \cdot \left( \frac{k \cdot 2^{1+\frac{2}{k-1}} \, \Gamma\!\left(\frac{3}{2} + \frac{1}{k-1}\right)}{(k-1)\sqrt{\pi}} \cdot i^{-2-\frac{1}{k-1}} \right)^{\!-1} = 1,$$
i.e., $f(i) = \Theta(i^{-2-\frac{1}{k-1}})$. Let $X_i^t$ denote the number of nodes of indegree $i$ at time $t$. We first show that $\mathbb{E}[X_i^t]$ can be bounded by $f(i) \cdot t \pm c$, for some constant $c$.

Theorem 5.5.1 There is a constant $c = c(G_{t_0})$ such that
$$f(i) \cdot t - c \le \mathbb{E}[X_i^t] \le f(i) \cdot t + c, \quad \text{for all } t \ge t_0 \text{ and } i \in [t]. \qquad (1)$$

Proof: For now, assume $t > t_0$. Let $x$ be the new node, and let $y$ be the node we will copy edges from; recall that $y$ is chosen u.a.r. First, we focus on the case $i = 0$. We have
$$\mathbb{E}[X_0^t \mid X_0^{t-1}] = X_0^{t-1} - \Pr[y \text{ has indegree } 0 \mid X_0^{t-1}] + 1,$$
as at each time step a new node (i.e., $x$) of indegree 0 is added, and the only node that could change its indegree from 0 to 1 is $y$. The probability of the latter event is exactly $X_0^{t-1}/(t-1)$. By the linearity of expectation, we get
$$\mathbb{E}[X_0^t] = \left(1 - \frac{1}{t-1}\right)\mathbb{E}[X_0^{t-1}] + 1. \qquad (5.2)$$

Next, consider $i \ge 1$. According to our model, the nodes $z_1, \ldots, z_{k-1}$ will be chosen without replacement from $S(y)$, the set of successors of $y$. The successors of the new node $x$ will then be $S(x) = \{y, z_1, \ldots, z_{k-1}\}$. Since $z_1, \ldots, z_{k-1}$ are all distinct, the graph remains simple and $|S(x)| = k$. For each $j = 1, \ldots, k-1$, the node $z_j$ is chosen with probability proportional to its indegree; this follows since $z_j$ is the endpoint of an edge chosen u.a.r. The probability that a particular node of indegree $i \ge 1$ gets chosen as a successor is $\frac{1}{t-1} + \frac{i(k-1)}{k(t-1)}$ (recall that all the $k$ successors of $x$ will be distinct). Thus, for $i \ge 1$,
$$\mathbb{E}[X_i^t] = \left(1 - \frac{1}{t-1} - \frac{i}{t-1}\cdot\frac{k-1}{k}\right)\mathbb{E}[X_i^{t-1}] + \left(\frac{1}{t-1} + \frac{i-1}{t-1}\cdot\frac{k-1}{k}\right)\mathbb{E}[X_{i-1}^{t-1}]. \qquad (5.3)$$
For the base cases, note that $X_t^t = 0$ for each $t \ge t_0$.
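Since the closed form of $f$ had to be reconstructed here from a damaged source, it can be sanity-checked numerically against the base case $f(0) = \frac12$, the stationarity ratio implied by recurrence (5.3), and the claimed power-law tail. A sketch, using log-Gamma to avoid overflow at large $i$:

```python
from math import exp, lgamma, log, pi, isclose

def f(i, k):
    # Reconstructed closed form for the limiting indegree distribution;
    # evaluated via log-Gamma so large arguments do not overflow.
    a = 1.0 / (k - 1)
    log_c = (log(k) + (1 + 2 * a) * log(2) + lgamma(1.5 + a)
             - log(k - 1) - 0.5 * log(pi))
    return exp(log_c + lgamma(i + 1 + a) - lgamma(i + 3 + 2 * a))

k = 2
base = f(0, k)                                   # should equal 1/2
ratios_ok = all(                                 # stationarity of (5.3)
    isclose(f(i - 1, k) / f(i, k), (i * k - i + 2 * k) / (i * k - i + 1))
    for i in range(1, 50)
)
# i^(2 + 1/(k-1)) * f(i) should flatten to a constant as i grows.
tail = [i ** (2 + 1 / (k - 1)) * f(i, k) for i in (100, 200, 400)]
```

The three checks pass for every $k \ge 2$ we tried, which is evidence that the reconstruction is self-consistent with the recurrences in the proof.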
Also, the variables $X_i^{t_0}$ are completely determined by $G_{t_0}$. For each fixed $k$, we have $f(t) = \Theta(t^{-2-\frac{1}{k-1}})$. Thus, there is a constant $c_0$ such that for any $c \ge c_0$, and for all $t \ge t_0$, $\mathbb{E}[X_t^t]$ follows (1). The base cases $\mathbb{E}[X_i^{t_0}]$, $i = 1, 2, \ldots$, can also be covered with a sufficiently large $c$ (that has to be greater than some function of the initial graph $G_{t_0}$).

For the inductive case, we have $f(0) = \frac{1}{2}$ (by applying $\Gamma(x)\Gamma(x + \frac12) = \Gamma(2x)\, 2^{1-2x} \sqrt{\pi}$, and $\Gamma(2x + 1) = 2x\,\Gamma(2x)$, with $x = 1 + \frac{1}{k-1}$). Using this, (5.2), and calculations, we can show that if $\mathbb{E}[X_0^{t-1}]$ satisfies (1), then $\mathbb{E}[X_0^t]$ also satisfies (1). For $i \ge 1$, we have $f(i-1) = f(i) \cdot (ik - i + 2k)/(ik - i + 1)$. An induction on (5.3) completes the proof.

Thus, in expectation, the indegrees follow a power law with exponent $-2 - 1/(k-1)$. We now show an $O(1)$-Lipschitz property for the r.v.'s $X_i^t$ when $k = O(1)$. The concentration immediately follows using Theorem 5.2.1.

Lemma 5.5.1 Each r.v. $X_i^t$ satisfies the $(2k)$-Lipschitz property.

Proof: Our model can be interpreted as the following stochastic process: at step $t$, two independent dice, with $t - 1$ and $k$ faces respectively, are thrown. Let $Q_t$ and $R_t$ be the respective outcomes of these two trials. The new node $x$ positions itself to the immediate left of the node $y$ that was added at time $Q_t$. Suppose that the (ordered) list of successors of $y$ is $(z_1, \ldots, z_k)$. The ordered list of successors of $x$ will be composed of $y$ followed by the nodes $z_1, \ldots, z_k$ with the exception of node $z_{R_t}$. Thus, the number of nodes $X_i^\tau$ of indegree $i$ at time $\tau$ can be interpreted as a function of the trials $(Q_1, R_1), \ldots, (Q_\tau, R_\tau)$.

We want to show that changing the outcome of any single trial $(Q_{t'}, R_{t'})$ changes the r.v. $X_i^\tau$ (for fixed $i$) by an amount not greater than $2k$. Suppose we change $(q_{t'}, r_{t'})$ to $(q'_{t'}, r'_{t'})$, going from graph $G$ to $G'$.
Let $x$ be the node added at time $t'$ with the choice $(q_{t'}, r_{t'})$, and $x'$ be the node added with the choice $(q'_{t'}, r'_{t'})$. Let $S, S'$ be the successors of $x$ in $G$ and of $x'$ in $G'$, respectively. The proof is complete by showing inductively that at any time step $t$, and for any nodes $y, y'$ added at the same time in $G$ and $G'$ respectively, the (ordered) lists of successors of $y$ and $y'$ are close, i.e., in each of their positions, they either have the same successor, or they have two different elements of $S \cup S'$.

If $t \le t'$, then the proof is immediate. For $t > t'$, it follows that the only edges we need to consider are the copied edges. By induction, we know that at time $t - 1$, the lists of successors of the node we are copying from, in $G$ and $G'$, were close. Since the two lists are ordered, either the $i$-th copied edges in $G$ and $G'$ will be the same, or they will both point to nodes in $S \cup S'$. Thus the lists of the time-$t$ node are close, and the proof is complete.

5.6 Long get longer

In this section we analyze the edge length distribution in our graph model. We show that it follows a power law with exponent greater than 1. Later, we will use this to establish the compressibility of graphs generated by our model. Let
$$g(\ell) = \frac{\Gamma\!\left(\ell + 1 - \frac{1}{k}\right)}{\Gamma\!\left(2 - \frac{1}{k}\right)\Gamma(\ell + 2)}.$$
It holds that $\lim_{\ell \to \infty} g(\ell) \cdot \ell^{1+\frac{1}{k}} \, \Gamma\!\left(2 - \frac{1}{k}\right) = 1$, i.e., $g(\ell) = \Theta(\ell^{-1-\frac{1}{k}})$.

Recall that the length of an edge from a node in position $i$ to a node in position $j$ is equal to $|i - j|$; we define its circular directed length, denoted cd-length, to be $j - i$ if $j > i$, and $t - (i - j)$ otherwise. Let $Y_\ell^t$ be the number of edges of length $\ell$ at time $t$. We aim to show that $Y_\ell^t \approx g(\ell) \cdot t$. It turns out to be useful to consider a related r.v. $Z_\ell^t$, which denotes the number of edges of cd-length $\ell$ at time $t$. We will first show that, w.h.p., $Z_\ell^t \approx g(\ell) \cdot t$. We will then argue that $Y_\ell^t$ is very close to $Z_\ell^t$. The following shows that $\mathbb{E}[Z_\ell^t]$ is bounded by $g(\ell) \cdot t \pm O(1)$.
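The closed form of $g$, like that of $f$, had to be reconstructed here, so it is worth checking numerically: both the ratio $g(\ell-1)/g(\ell) = k(\ell+1)/(\ell k - 1)$ used in the induction below, and the limit just stated. A sketch, via log-Gamma to avoid overflow at large $\ell$:

```python
from math import exp, lgamma, isclose

def g(l, k):
    # Reconstructed closed form for the limiting cd-length distribution.
    return exp(lgamma(l + 1 - 1 / k) - lgamma(2 - 1 / k) - lgamma(l + 2))

k = 2
ratios_ok = all(                     # g(l-1) = k(l+1)/(lk-1) * g(l)
    isclose(g(l - 1, k) / g(l, k), k * (l + 1) / (l * k - 1))
    for l in range(1, 50)
)
# l^(1 + 1/k) * Gamma(2 - 1/k) * g(l) should tend to 1.
vals = [l ** (1 + 1 / k) * exp(lgamma(2 - 1 / k)) * g(l, k)
        for l in (100, 1000, 10000)]
```

The sequence `vals` increases monotonically towards 1, as the stated limit predicts.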
Theorem 5.6.1 There exists a constant $c = c(G_{t_0})$ such that
$$g(\ell) \cdot t - c \le \mathbb{E}[Z_\ell^t] \le g(\ell) \cdot t + c, \quad \text{for all } t \ge t_0 \text{ and } \ell \in [t].$$

Proof: As in the proof of Theorem 5.5.1, we start by obtaining a recurrence on the r.v.'s $Z_\ell^t$. Let $x$ be the node added at time $t$, and let $y, y'$ be the nodes to the immediate right and left of $x$, respectively (where $y'$ equals the last node in the ordering if $x$ is placed before the first node). Consider $Z_1^t$. For $t > t_0$,
$$\mathbb{E}[Z_1^t \mid Z_1^{t-1}] = Z_1^{t-1} - \Pr[x \text{ enlarges an edge of cd-length } 1 \mid Z_1^{t-1}] + 1,$$
as an edge $x \to y$ of cd-length 1 is necessarily added to the graph, and adding $x$ can enlarge at most one edge of cd-length 1 (that is, the edge $y' \to y$, if it exists). The probability of the latter event is equal to $Z_1^{t-1}/(t-1)$. By the linearity of expectation,
$$\mathbb{E}[Z_1^t] = \left(1 - \frac{1}{t-1}\right)\mathbb{E}[Z_1^{t-1}] + 1.$$

Now consider $Z_\ell^t$, for $\ell \ge 2$ and $t > t_0$. We have
$$\begin{aligned} \mathbb{E}[Z_\ell^t \mid Z_\ell^{t-1}, Z_{\ell-1}^{t-1}] = Z_\ell^{t-1} &- \mathbb{E}[\#\text{ of edges of cd-length } \ell \text{ that } x \text{ enlarged} \mid Z_\ell^{t-1}, Z_{\ell-1}^{t-1}] \\ &+ \mathbb{E}[\#\text{ of edges of cd-length } (\ell-1) \text{ that } x \text{ enlarged} \mid Z_\ell^{t-1}, Z_{\ell-1}^{t-1}] \\ &+ \mathbb{E}[\#\text{ of edges of cd-length } (\ell-1) \text{ that } x \text{ copied from } y \mid Z_\ell^{t-1}, Z_{\ell-1}^{t-1}]. \end{aligned}$$
Recall that $x$ is placed to the left of a node $y$ chosen u.a.r. Thus, given a fixed edge of cd-length $\ell$, the probability that this edge is enlarged by $x$ is $\ell/(t-1)$. Thus,
$$\mathbb{E}[\#\text{ of edges of cd-length } \ell \text{ that } x \text{ enlarged} \mid Z_\ell^{t-1}, Z_{\ell-1}^{t-1}] = \frac{\ell}{t-1} Z_\ell^{t-1},$$
$$\mathbb{E}[\#\text{ of edges of cd-length } (\ell-1) \text{ that } x \text{ enlarged} \mid Z_\ell^{t-1}, Z_{\ell-1}^{t-1}] = \frac{\ell-1}{t-1} Z_{\ell-1}^{t-1}, \quad \text{and}$$
$$\mathbb{E}[\#\text{ of edges of cd-length } (\ell-1) \text{ that } x \text{ copied from } y \mid Z_\ell^{t-1}, Z_{\ell-1}^{t-1}] = \sum_{j=1}^{k-1} \Pr[\text{the } j\text{-th copied edge had cd-length } (\ell-1) \mid Z_\ell^{t-1}, Z_{\ell-1}^{t-1}].$$
Note that, for each $j = 1, \ldots, k-1$, the $j$-th copied edge is chosen uniformly at random over all the edges (even if the $k - 1$ copied edges are not independent). Thus,
$$\sum_{j=1}^{k-1} \Pr[\text{the } j\text{-th copied edge had cd-length } (\ell-1) \mid Z_\ell^{t-1}, Z_{\ell-1}^{t-1}] = \frac{(k-1)Z_{\ell-1}^{t-1}}{k(t-1)}.$$
By the linearity of expectation, we get, for $\ell \ge 2$,
$$\mathbb{E}[Z_\ell^t] = \left(1 - \frac{\ell}{t-1}\right)\mathbb{E}[Z_\ell^{t-1}] + \left(\frac{\ell-1}{t-1} + \frac{1}{t-1}\cdot\frac{k-1}{k}\right)\mathbb{E}[Z_{\ell-1}^{t-1}].$$
The base cases can be handled as in Theorem 5.5.1. The inductive step for $\ell = 1$ can be directly verified. For $\ell \ge 2$, it suffices to note that $g(\ell - 1) = \frac{k(\ell+1)}{\ell k - 1} \cdot g(\ell)$.

Thus, the expected edge lengths follow a power law with exponent $-1 - 1/k$. To establish the concentration result, we need to analyze quite closely the combinatorial structure of the graphs generated by our model. Recall that the nodes in our graphs are placed contiguously on a discrete line (or list). At a generic time step, we use $x_i$ to refer to the $i$-th node in the ordering from left to right. Given an ordering $\pi = (x_1, x_2, \ldots, x_t)$ of the nodes, and an integer $0 \le r < t$, an $r$-rotation $\rho_r$ maps the generic node $x_i$, $1 \le i \le t$, to position $1 + ((i + r) \bmod t)$. We say that two nodes $x, x'$ are consecutive if there exists an $r$ such that $|\rho_r(x) - \rho_r(x')| = 1$, i.e., they are consecutive if in the ordering they are either adjacent, or one is the first node and the other the last. Further, we say that an edge $x'' \to x'''$ passes over a node $x$ if there exists an $r$ such that $\rho_r(x'') < \rho_r(x) < \rho_r(x''')$. Finally, two edges $x \to x'$ and $x'' \to x'''$ are said to cross if there exists an $r$ such that, after the $r$-rotation, exactly one of $x$ and $x'$ is within the positions $\rho_r(x'')$ and $\rho_r(x''')$. We prove the following characterization, which will be used later in the analysis.

Lemma 5.6.1 At any time, given any two consecutive nodes $x, x'$, and any positive integer $\ell$, the number of edges of cd-length $\ell$ that pass over $x$ or $x'$ (or both) is at most $C = (k+2)t_0 + 1$.

Proof: Let us define $G_t^-$ as the graph $G_t$ minus the edges incident to the nodes that were originally in $G_{t_0}$.
Note that, for each cd-length $\ell$, the number of edges of cd-length $\ell$ that we remove is upper bounded by $2t_0$, as each node can be incident to at most two edges of cd-length $\ell$: one going into, and one going out of, the node. Unless otherwise noted, we will consider $G_t^-$ for the rest of the proof.

Fix the time $t$, and take any rotation $\rho$; let $x_1, \ldots, x_t$ be the nodes in the list in the left-to-right order given by the rotation (i.e., node $x_i$ is in position $i$ according to $\rho$). For a set of edges of the same cd-length to pass over at least one of two consecutive nodes $x, x'$, it is necessary for every pair of them to cross. We will bound, for a generic edge $e$, the number of edges that cross $e$ and have the same cd-length as $e$.

Let $t(x_a)$ be the time when $x_a$ was added to the graph. First, by definition we have that if $x_a \to x_b$, then $t(x_a) > t(x_b)$. Second, we claim that if there exists a rotation $\rho'$ such that $x_a, x_b, x_c$ are three nodes with $\rho'(x_a) < \rho'(x_b) < \rho'(x_c)$ and $t(x_c) > t(x_b)$, then the edge $x_a \to x_c$ cannot exist. To see this, note that for $x_a \to x_c$ to exist it must be that $t(x_a) > t(x_c)$. We show inductively that all the nodes that point to $x_c$ lie both to the left of $x_c$ and to the right of $x_b$, in the ordering implied by $\rho'$. Note that $x_c$ was not in $G_{t_0}$, since its insertion time is larger than that of $x_b$. Thus, each node placed to the immediate left of $x_c$ will point to it, and will satisfy the induction hypothesis. Furthermore, each node that copies an edge to $x_c$ must be placed to the immediate left of a node pointing to $x_c$. Thus, the second claim is proved.

Third, we claim that if $x_a, x_b, x_c, x_d$ are four nodes such that the edges $x_a \to x_c$ and $x_b \to x_d$ exist, and cross each other, then there exists an edge $x_c \to x_d$. To see this, first note that none of these four nodes could have been part of $G_{t_0}$, for otherwise at least one of the two edges could not have been part of $G_t^-$. Fix a rotation $\rho''$ s.t.
ρ (xa ) < ρ (xb ) < ρ (xc ); by the second claim, it must be that t(xb ) > t(xc ). Thus, the edge xb → xd has necessarily been copied from some node, say xb1 . Note that ρ00 (xb1 ) ≤ ρ(xc ). Indeed by assumption ρ00 (xc ) > ρ00 (xb ) and it is impossible that ρ00 (xc ) < ρ00 (xb1 ), for otherwise xb could not have copied from xb1 as t(xb ) > t(xc ). Now, we know that the edge xb1 → xd exists (as before, xb1 is not part of Gt0 ). If xb1 = xc , then we are done. Otherwise, there must exist an xb2 pointing to xd from which xb1 has copied the edge. Note that ρ00 (xb1 ) < ρ00 (xb2 ) ≤ ρ00 (xc ). By iterating this reasoning, the claim follows. Take any set S of edges having the same length, and such that any pair of them cross. Given an arbitrary ρ000 , let x be the node with the smallest ρ000 (x) such that, for some x0 , the edge x → x0 is in S (the nodes x and x0 are unique). For any other edge y → y 0 in S, by the third claim, there must exist the edge x0 → y 0 . As x0 has outdegree k, it follows that |S| ≤ k + 1. Finally, since the seed graph Gt0 had k · t0 edges and we removed at most 2t0 edges of cd-length ` (for an arbitrary ` ≥ 1) in the cut [Gt0 , Gt \ Gt0 ], we have refrained from counting at most k · t0 + 2t0 edges of length ` passing over one of the nodes x, x0 . The proof follows. Now we prove the O(1)-Lipschitz property of the r.v.’s Z`t , if t0 , k = O(1). The concentration of the Z`t will follow from Theorem 5.2.1. Lemma 5.6.2 Each r.v. Z`t satisfies the ((k + 2)t0 + k + 1)-Lipschitz property. Proof: We use the stochastic interpretation as in the proof of Lemma 5.5.1. For each τ , let Z`τ be the r.v. representing the number of edges of cd-length ` at time τ . We consider Y`τ as a function of the trials (Q1 , R1 ), . . ., (Qτ , Rτ ). We show that changing the outcome of any single trial (Qt0 , Rt0 ), changes the r.v. Z`τ , for fixed `, by an amount not greater than C + k = (k + 2)t0 + k + 1. 
Suppose we change $(q_{t'}, r_{t'})$ to $(q'_{t'}, r'_{t'})$, going from graph $G$ to $G'$. Let $x$ be the node added at time $t'$ with the choice $(q_{t'}, r_{t'})$, and $x'$ be its equivalent with the choice $(q'_{t'}, r'_{t'})$. We show that choosing two different positions for $x$ and $x'$ can change the number of edges of cd-length $\ell$ by at most $C + k$ at any time step. Note that before time step $t'$, the cd-lengths in the two graphs are all equal. By Lemma 5.6.1, at any time $t > t'$, for all $\ell$, the number of edges of cd-length $\ell$ that pass over $x$ (resp., $x'$) is upper bounded by $C$.

For an edge $e$, let $S_e$ be the set of edges that have been copied from $e$, directly or indirectly, including $e$ itself; i.e., $e \in S_e$, and if an edge $e'$ is copied from some edge in $S_e$, then $e' \in S_e$. Note that no two edges in $S_e$ have the same cd-length, since they all start from different nodes but end at the same node. For any node $z$, if $e_1, \ldots, e_k$ are the edges leaving $z$, we define $S_z = S_{e_1} \cup \cdots \cup S_{e_k}$. The last observation implies that, for any fixed $\ell$, no more than $k$ edges of cd-length $\ell$ are in $S_x$ (or $S_{x'}$) at any single time step.

Now, consider the following edge bijection from $G$ to $G'$: the $i$-th edge of the $j$-th inserted node in $G$ is mapped to the $i$-th edge of the $j$-th inserted node in $G'$. It follows that if an edge $e$ in $G$ (resp., $G'$) does not pass over $x$ (resp., $x'$) and is not in $S_x$ (resp., $S_{x'}$), then $e$ gets mapped to an edge of the same cd-length in $G'$ (resp., $G$). Thus, the difference between the numbers of edges of cd-length $\ell$ in $G$ and $G'$ is at most $C + k$.

We now show that the number $D_t$ of edges whose length and cd-length differ (at time $t$) is very small. Since the maximum absolute difference between $Y_\ell^t$ and $Z_\ell^t$ is bounded by $D_t$, this will show that these r.v.'s are close to each other. First note that if an edge $x_i \to x_j$ has different length and cd-length, then $j < i$; call such an edge left-directed, and let $R_t$ be the number of left-directed edges.
Since $D_t \le R_t$, it suffices to bound the latter.

Lemma 5.6.3 With probability $1 - O\!\left(\frac{1}{t}\right)$, $R_t \le O\!\left(t^{1 - \frac{1}{k} + \epsilon}\right)$, for each constant $\epsilon > 0$.

Proof: Observe that each edge $x_i \to x_j$ counted by $R_t$ is such that $j < i$. Thus, $R_{t_0}$ is equal to the number of left-directed edges in $G_{t_0}$ with its given embedding. Further, $R_t$'s increase over $R_{t-1}$ equals the number of left-directed edges copied at step $t$ (the proximity edge is never left-directed). Thus, $\mathbb{E}[R_t \mid R_{t-1}] = \left(1 + (k-1)\cdot\frac{1}{k(t-1)}\right) R_{t-1}$ and $\mathbb{E}[R_t] = \left(1 + (k-1)\cdot\frac{1}{k(t-1)}\right)\mathbb{E}[R_{t-1}]$, for each $t > t_0$. Therefore,
$$\mathbb{E}[R_t] = R_{t_0} \cdot \prod_{i=t_0+1}^{t} \left(1 + \frac{k-1}{k \cdot i}\right) = R_{t_0} \cdot \prod_{i=t_0+1}^{t} \frac{i + \frac{k-1}{k}}{i} = R_{t_0} \cdot \frac{\Gamma\!\left(t + \frac{k-1}{k} + 1\right) \Gamma(t_0 + 1)}{\Gamma\!\left(t_0 + \frac{k-1}{k} + 1\right) \Gamma(t + 1)}.$$
Thus, $\mathbb{E}[R_t] = \Theta(t^{1-\frac{1}{k}})$. We note that an $O(1)$-Lipschitz condition holds (at most $k - 1$ new left-directed edges can be added at each step). Thus Theorem 5.2.1 can be applied with an error term of $O(\sqrt{t} \log t) \le O(t^{\frac12 + \epsilon}) \le O(t^{1 - \frac{1}{k} + \epsilon})$. The result follows.

Applying Theorem 5.2.1, Theorem 5.6.1, Lemma 5.6.2, and Lemma 5.6.3, we obtain the following.

Corollary 5.6.1 With probability $\ge 1 - O\!\left(\frac{1}{t}\right)$, it holds that

i. $\mathbb{E}[Z_\ell^t] - O(\sqrt{t} \log t) \le Z_\ell^t \le \mathbb{E}[Z_\ell^t] + O(\sqrt{t} \log t)$, and

ii. $\mathbb{E}[Z_\ell^t] - O(t^{1-1/k+\epsilon}) \le Y_\ell^t \le \mathbb{E}[Z_\ell^t] + O(t^{1-1/k+\epsilon})$.

Note that the concentration error term $O(\sqrt{t} \log t)$ is upper bounded by the bound on $R_t$, for each $k \ge 2$. Also, the corollary is vacuous if $\ell > t^{1/(k+2)}$.

5.7 Compressibility of our model

We now analyze the number of bits needed to compress the graphs generated by our model. Recall that the web graph has a natural embedding on the line, via the URL ordering, that experimentally gives very good compression [10, 12]. Our model generates a web-like random graph together with an embedding "à la URL" on the line. We work with the following BV-like compression scheme: a node at position $p$ on the line stores its list of successors at positions $p_1, \ldots, p_k$ as a list $(p_1 - p, \ldots, p_k - p)$ of compressed integers.
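For concreteness, the storage cost of such a gap list under the $\gamma$-code (which spends $2\lfloor \log_2 v \rfloor + 1$ bits on a positive integer $v$) can be sketched as follows; the zigzag map used for signed gaps is our assumption, not a detail fixed by the text:

```python
def gamma_code_len(v):
    # Elias gamma code length of a positive integer v: 2*floor(log2 v) + 1.
    assert v >= 1
    return 2 * (v.bit_length() - 1) + 1

def zigzag(d):
    # Hypothetical map for nonzero signed gaps: 1,-1,2,-2,... -> 1,2,3,4,...
    return 2 * d - 1 if d > 0 else -2 * d

def node_cost_bits(p, successor_positions):
    # Bits to store successors at positions p1..pk as gaps (p1-p, ..., pk-p).
    return sum(gamma_code_len(zigzag(q - p)) for q in successor_positions)

cost = node_cost_bits(10, [12, 13, 40])   # gaps 2, 3, 30 -> 3 + 5 + 11 bits
```

The cost of a gap grows only logarithmically with its magnitude, which is why a power-law length distribution with exponent $> 1$ yields $O(1)$ bits per edge.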
An integer $i \ne 0$ will be compressed using $O(\log(|i| + 1))$ bits, using the Elias $\gamma$-code, for instance [107]. We show that our graphs can be compressed using $O(1)$ bits per edge under the above scheme.

Theorem 5.7.1 The above BV-like scheme compresses the graphs generated by our model using $O(n)$ bits, with probability at least $1 - O\!\left(\frac{1}{n}\right)$.

Proof: Let $\epsilon > 0$ be a small constant. At time $n$, consider the number of edges of length at most $L = \lceil n^\epsilon \rceil$. Note that by Corollary 5.6.1, for each $1 \le \ell \le L$, it holds that $|Y_\ell^n - \mathbb{E}[Z_\ell^n]| \le O(n^{1-1/k+\epsilon})$, with probability $1 - O(n^{-1})$. For the rest of the proof, we implicitly condition on this event. Lower bounding $\mathbb{E}[Z_\ell^n]$ as in Theorem 5.6.1, we obtain the following lower bound on the number $S$ of edges of length at most $L$, using standard algebraic manipulation and Lemma$^8$ 5.2.1:
$$\begin{aligned} S &\ge \sum_{\ell=1}^{L} \left( \frac{\Gamma\!\left(\ell + 1 - \frac{1}{k}\right)}{\Gamma\!\left(2 - \frac{1}{k}\right)\Gamma(\ell+2)} \cdot n - c - O(n^{1-1/k+\epsilon}) \right) \\ &\ge nk\left(1 - \frac{\Gamma\!\left(L + 2 - \frac{1}{k}\right)}{\Gamma(L+2)\,\Gamma\!\left(2 - \frac{1}{k}\right)}\right) - O(L \cdot n^{1-1/k+\epsilon}) \\ &\ge nk - O(n \cdot k \cdot L^{-1/k}) - O(L \cdot n^{1-1/k+\epsilon}) \ge nk - O(n^{1-\epsilon_1}), \end{aligned}$$
where $\epsilon_1$ is a small constant. At time $n$, the total number of edges of the graph is $nk$. Thus the number of edges of length more than $L$ is at most $O(n^{1-\epsilon_1})$ (notice how, for this argument to work, it is crucial to have a very strong bound on the behavior of the $Y_\ell^n$ random variables; this is why we used the Gamma function in their expressions). The maximum edge length is $O(n)$, and so each such edge can be compressed in $O(\log n)$ bits. The overall contribution, in terms of bits, of the edges longer than $L$ will then be $o(n)$.

Now, we calculate the bit contribution $B$ of the edges of length at most $L$:
$$B \le \sum_{\ell=1}^{L} O(\log(\ell+1)) \left( \frac{\Gamma\!\left(\ell + 1 - \frac{1}{k}\right)}{\Gamma\!\left(2 - \frac{1}{k}\right)\Gamma(\ell+2)} \cdot n + c + O(n^{1-1/k+\epsilon}) \right) \le n \cdot O\!\left(\sum_{\ell=1}^{L} \log(\ell+1) \cdot \ell^{-1-1/k}\right) + O(L \cdot n^{1-1/k+\epsilon} \cdot \log L) \le O(n),$$

$^8$ Which we use to conclude that $\sum_{\ell=1}^{L} \frac{\Gamma(\ell + 1 - \frac{1}{k})}{\Gamma(2 - \frac{1}{k})\,\Gamma(\ell+2)} = k \left( 1 - \frac{\Gamma(L + 2 - \frac{1}{k})}{\Gamma(L+2)\,\Gamma(2 - \frac{1}{k})} \right)$.
where the penultimate inequality follows since the $\frac{\Gamma(\cdots)}{\Gamma(\cdots)\Gamma(\cdots)}$ fraction can be upper bounded by $O(\ell^{-1-1/k})$, and the last inequality from $O(\ell^{-1-1/k} \cdot \log \ell) \le O(\ell^{-1-\epsilon_2})$, for some constant $\epsilon_2 > 0$, and from the convergence of the Riemann series. The proof is complete.

Thus, given an ordering of the nodes, we can compress the graph to use $O(1)$ bits per edge using a linear-time algorithm. A natural question is whether it is still possible to compress this graph without knowing the ordering. We show that it is.

Theorem 5.7.2 The graphs generated by our model can be compressed using $O(n)$ bits in linear time, even if the ordering of the nodes is not available.

Proof: Given a node $v$ in $G$, just by looking at its two-neighborhood, we can either (a) find an out-neighbor $w$ of $v$ having exactly $k - 1$ out-neighbors in common with $v$, or (b) conclude that $v$ was part of the "seed" graph $G_{t_0}$ (having constant order). This step takes time $O(k^2) = O(1)$. Indeed, if $v$ was not part of $G_{t_0}$, during its insertion $v$ added a proximity edge to its "real prototype" $w$ and copied $k - 1$ of $w$'s outlinks. If more than one out-neighbor of $v$ has $k - 1$ out-neighbors in common with $v$, we choose one arbitrarily and call it the "possible prototype" of $v$.

For compressing, we create an unlabeled rooted forest out of the nodes of $G$. A node $v$ will look for a possible prototype $w$. If such a $w$ is found, then $v$ will choose $w$ as its parent. Otherwise, $v$ will be a root in the forest. To describe $G$, it will suffice to (a) describe the unlabeled rooted forest, (b) describe the subgraph induced by the roots of the trees in the forest, and (c) for each non-root node $v$ in the forest, use $\lceil \log k \rceil$ bits to describe which of its parent's out-neighbors was not copied by $v$ in $G$.
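The prototype-detection step in (a) can be sketched as follows; the toy successor map is invented to mirror Figure 5.1, and is not data from the text:

```python
def possible_prototype(v, succ, k):
    # Return an out-neighbor w of v sharing exactly k-1 out-neighbors with v,
    # or None if no such w exists (v is then taken to be a seed node).
    sv = set(succ[v])
    for w in succ[v]:
        if len(sv & set(succ[w])) == k - 1:
            return w
    return None

# Toy instance mirroring Figure 5.1: D was created with prototype C (k = 2).
succ = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"], "D": ["C", "B"]}
proto = possible_prototype("D", succ, k=2)
```

Since each check touches only $O(k)$ successor lists of size $k$, the $O(k^2)$ per-node cost claimed above is immediate.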
The forest can be described with $O(n)$ bits, for instance, by writing down the down/up steps made when visiting each tree in the forest, disregarding edge orientations (each edge of the forest is directed from the child to the parent). The graph induced by the roots of the trees (i.e., a subgraph of $G_{t_0}$) can be stored in a non-compressed way using $O(t_0^2) = O(1)$ bits. The third part of the encoding will require at most $O(n \log k) = O(n)$ bits. Note that it is possible to compute each of the three encodings in linear time.

5.8 Other properties of our model

In this section we prove some additional properties of our model: it has a large number of bipartite cliques, a high clustering coefficient, and a small undirected diameter.

5.8.1 Bipartite cliques

Recall that a bipartite clique $K(a, b)$ is a pair of sets, $A$ of $a$ nodes and $B$ of $b$ nodes, such that each node in $A$ has an edge to every node in $B$. We can show that the graphs generated by our model contain a large number of bipartite cliques. The proof is similar to the one of [65] for the linear growth model.

Theorem 5.8.1 There exists a $\beta > 0$ such that the number of bipartite cliques $K(\Omega(\log n), k)$ in our model is $\Omega(n^\beta)$, w.h.p.

Proof (Sketch): Take any fixed node $v$ of the seed graph $G_{t_0}$ and a subset $S$ of $k - 1$ of its successors. Divide the time steps after $t_0$ into disjoint epochs of exponentially increasing size, i.e., of sizes $c\tau, c^2\tau, c^3\tau, \ldots$, for a large enough $\tau$. Let $j$ be the number of epochs; then $j = \Omega(\log n)$. Note that for $i \le j$, the probability that at least one node added in epoch $i$ will attach itself to $v$ and copy exactly the edges in $S$ is at least a constant; also, for each $i \ne i'$, these events are independent. Thus, w.h.p., at least $\Omega(\log n)$ nodes will be good, i.e., will have $S \cup \{v\}$ as successors. Now, any subset of the good nodes will form a bipartite clique with $S \cup \{v\}$.
The number of subsets of size $\Omega(\log n)$ of the good nodes is easily shown to grow as $\Omega(n^\beta)$, for some $\beta > 0$.

5.8.2 Clustering coefficient

Watts and Strogatz [103] introduced the concept of clustering coefficient. The clustering coefficient $C(x)$ of a node $x$ is the ratio between the number of edges among the neighbors of $x$ and the maximum possible number$^9$ of such edges. The clustering coefficient $C(G)$ of a (simple) graph $G$ is the average of the clustering coefficients of its nodes. Snapshots of the real web graph have been observed to possess a rather high clustering coefficient. Thus, having a high (that is, constant) clustering coefficient is a desirable property of web graph models.

Theorem 5.8.2 Take a (directed) graph $G$ generated by our model. The clustering coefficient of $G$ is $\Theta(1)$ w.h.p.

Proof: By Theorem 5.5.1 and Lemma 5.5.1, there exist $q = \Theta(n)$ nodes of indegree 0, w.h.p. Take any node $x$ of indegree 0, and let $y$ be the node that $x$ copied edges from. Then $x$ and $y$ share $k - 1$ out-neighbors (the "copied" ones). The total degree of $x$ is $k$; thus the clustering coefficient of $x$ is $\ge \frac{k-1}{k(k-1)} = \frac{1}{k} \in \Omega(1)$. The clustering coefficient of the graph is the average of the clustering coefficients of its nodes; thus, in our case, it is $\ge \frac{1}{n} \cdot q \cdot \frac{1}{k} \ge \Omega(1)$. In general, the maximum value of the clustering coefficient is 1. The claim follows.

The previous proof also shows that, if we remove the orientations of the edges of our model's graphs, the clustering coefficient of the undirected graphs we obtain is $\Theta(1)$.

5.8.3 Undirected diameter

We now argue that, w.h.p., the undirected diameter of our random graphs is $O(\log n)$ (provided that the seed graph $G_{t_0}$ was weakly connected). By undirected diameter, we mean the diameter of the undirected graph obtained by removing edge orientations from our graphs.
Note that our graphs are almost DAGs, i.e., they are DAGs except perhaps for the nodes in the seed graph $G_{t_0}$; therefore the directed diameter is not a meaningful notion to consider.

Consider the so-called random recursive trees: the process starts with a single node; at each step, a node is chosen uniformly at random and a new leaf is added as a child of that node; the process ends at a generic time $n$. A result of Szymański [100] shows that random recursive trees on $n$ nodes have height $O(\log n)$ w.h.p. Consider the "proximity" edges added in step (2) of our model, i.e., those added from the new node to a node chosen uniformly at random. These edges induce a random recursive forest with $t_0$ different roots, corresponding to the nodes of the seed graph $G_{t_0}$. Thus, assuming that $G_{t_0}$ is weakly connected, the (undirected) diameter of our model's graphs is $O(\log n)$ w.h.p.

$^9$ That is, $\frac12 \deg(x)(\deg(x) - 1)$ in the undirected case, and $\deg(x)(\deg(x) - 1)$ in the directed case.

Chapter 6

Compressibility of social networks

Motivated by structural properties of the Web graph that support efficient data structures for in-memory adjacency queries, we study the extent to which a large network can be compressed. Boldi and Vigna (WWW 2004) showed that Web graphs can be compressed down to three bits of storage per edge; we study the compressibility of social networks, where again adjacency queries are a fundamental primitive. To this end, we propose simple combinatorial formulations that encapsulate efficient compressibility of graphs. We show that some of the problems are NP-hard, yet admit effective heuristics, some of which can exploit properties of social networks such as link reciprocity. Our extensive experiments show that social networks and the web graph exhibit vastly different compressibility characteristics.
6.1 Introduction

We study the extent to which social networks can be compressed. There are two distinct motivations for such a study. First, Web properties require high-speed indexes for serving adjacencies in the social network: a typical query seeks the neighbors of a node (member) of a social network. Maintaining these indexes in memory demands that the underlying graph be stored in a compressed form that facilitates efficient adjacency queries. Second, there is a wealth of evidence (e.g., [64]) that social networks are not random graphs in the usual sense: they exhibit certain distinctive local characteristics (such as degree sequences). Studying the compressibility of a social network is akin to studying the degree of "randomness" in the social network.

The Web graph (Web pages are nodes, hyperlinks are directed edges) is a special variant of a social network, in that we have a network of pages rather than of people. It is known that the Web graph is highly compressible [10, 22]. Particularly impressive results have been obtained by Boldi and Vigna [10], who exploit lexicographic locality in the Web graph: when pages are ordered lexicographically by URL, proximal pages have similar neighborhoods. More precisely, two properties of the ordering by URL are experimentally observed to hold:

• Similarity: pages that are proximal in the lexicographic ordering tend to have similar sets of neighbors.

• Locality: many links are intra-domain, and therefore likely to point to pages nearby in the lexicographic ordering.

The work described in this chapter is joint work with F. Chierichetti, R. Kumar, M. Mitzenmacher, A. Panconesi, and P. Raghavan; its extended abstract appeared in the Proceedings of the 15th Conference on Knowledge Discovery and Data Mining (KDD 2009) [24].
These two empirical observations are exploited in the BV algorithm to compress the Web graph down to an amortized storage of a few bits per link, leading to efficient in-memory data structures for Web page adjacency queries (a basic primitive in link analysis). Do these properties of locality and similarity extend to social networks in general? Whereas the Web graph has a natural lexicographic order (by URL) under which this locality holds, there is no such obvious ordering for social networks. Can we find such an ordering for social networks, leading to compression through lexicographic locality?

Our main contributions in this chapter are the following. We propose a new compression method that exploits link reciprocity in social networks. Motivated by this and by BV, we formulate a genre of graph node ordering problems that distill the essence of locality in BV-style algorithms. We develop a simple and practical heuristic, based on shingles, for obtaining an effective node ordering; this ordering can be used in BV-style compression algorithms. We then perform an extensive set of experiments on four large real-world graphs, including two social networks. Our main findings are that social networks appear far less compressible than Web graphs, yet closer to host graphs, and that exploiting link reciprocity in social networks can vastly help their compression.

The rest of the chapter is organized as follows. Section 6.2 discusses related work. Section 6.3 outlines the basic compression scheme of Boldi and Vigna and proposes a new scheme that exploits link reciprocity. Section 6.4 formalizes the optimal node ordering problem and supplies a simple and practical heuristic for it. Section 6.9 contains a detailed account of our experiments on four large real-world graphs.

6.2 Related work

Prior related work falls into three major categories: (1) compressing Web graphs; (2) compressed indexes; and (3) graph ordering problems.

Randall et al.
[91] suggested lexicographic ordering as a way to obtain good Web graph compression; some hardness results in this context were obtained by Adler and Mitzenmacher [1]. Raghavan and Garcia-Molina [90] considered a hierarchical view of the Web graph to achieve compression; see also Suel and Yuan [99] for a structural approach to compressing Web graphs. A major step was taken by Boldi and Vigna [10], who developed a generic Web graph compression framework that takes into account the locality and similarity of Web pages; our formulation is based on this framework. Boldi and Vigna [11] also developed ζ-codes to exploit power-law-distributed integer gaps. Recently, Buehrer and Chellapilla [22] used a frequent pattern mining approach to compress Web graphs; with this different approach, they were able to achieve a compression of under two bits per link.

The problem of assigning or reassigning document identifiers in order to compress text indexes has a long history. Blandford and Blelloch [8] considered the problem of compressing text indexes by permuting the document identifiers to create locality in an inverted index, i.e., to exploit the clustering property of posting lists. Silvestri, Perego, and Orlando [96] proposed a clustering approach for reassigning document identifiers. Shieh et al. [94] proposed a document identifier reassignment method based on a heuristic for the traveling salesman problem. Recently, Silvestri [95] showed that assigning document identifiers to Web documents based on URL lexicographic ordering improves compression.

There are several classical node ordering problems on graphs. The minimum bandwidth problem, where the goal is to order the nodes to minimize the maximum stretch of an edge, and the minimum linear arrangement problem, where the goal is to order the nodes to minimize the sum of the stretches of the edges, have a long history. We refer to [45] and to the online compendium at www.nada.kth.se/~viggo/wwwcompendium/node52.html.
6.3 Compression Schemes

In this section we outline the compression techniques used in the rest of the chapter. The framework is based on the algorithm of Boldi and Vigna for compressing Web graphs [10]; their algorithm achieved a compression down to about three bits per link on a snapshot of the Web graph. We henceforth refer to it as the BV compression scheme, and describe it first. Next, we describe what we call the backlinks compression (BL) scheme, which targets directed graphs that are highly reciprocal.

Notation. Let G = (V, E) be a directed graph and let |V| = n. The nodes in V are bijectively identified with the set [n] = {1, ..., n} of integers. For a node u ∈ V, let out(u) ⊆ V denote the set of outlinks of u, i.e., out(u) = {v | (u, v) ∈ E}. Likewise, let in(u) denote the set of inlinks of u. Let outdeg(u) = |out(u)| and indeg(u) = |in(u)|. If both (u, v) ∈ E and (v, u) ∈ E and u < v, then we call the edge (v, u) reciprocal. For a node u ∈ V, let rec(u) = {v | (v, u) is reciprocal}. Let lg denote log2.

We encode all integers using one of three different encoding schemes, namely, Elias's γ-code, Elias's δ-code, and the Boldi–Vigna ζ-code with parameter 4 (which we found to be the best in our experiments) [11]. These integer encoding schemes encode an integer x ∈ Z⁺ using close to the information-theoretic minimum of 1 + ⌊lg x⌋ bits. For example, the number of bits used by the γ-code to represent x is 1 + 2⌊lg x⌋. We refer to [107] for more background on these codes.

6.3.1 BV compression scheme

BV incorporates three main ideas. First, if the graph has many nodes whose neighborhoods are similar, then the neighborhood of a node can be expressed in terms of other nodes with similar neighborhoods. Second, if the destinations of edges exhibit locality, then small integers can be used to encode them (relative to their sources).
Third, rather than store the destination of each edge separately, one can use gap encodings to store a sequence of edge destinations. Given a sorted list of positive integers (say, the destinations of edges from a node), we write down the sequence of gaps between subsequent integers on the list, rather than the integers themselves. The idea is that even if the integers are big (requiring many bits to record), the gaps between integers on the list can be recorded with few bits.

We now detail the BV scheme for compressing Web graphs. The nodes are Web pages and the directed edges are the hyperlinks. First, we order Web pages lexicographically by URL. This assigns to each Web page a unique integer identifier (ID), which is its position in this ordering. Let w be a window parameter; for the Web, BV recommend w = 8. Let v be a Web page. Its encoding is as follows.

1. Copying. Check if the list out(v) of v's outlinks is a small variation on the list of one of the w − 1 preceding Web pages in the lexicographic ordering. Let u be such a prototype page, if it exists.

2. Encoding. Encode v's outlinks as follows. If the copying phase found a prototype u, then use lg w bits to encode the (backward) offset from v to u, followed by the changes from u's list to v's. If none of the w − 1 preceding pages in the lexicographic ordering offers a good prototype, set the first lg w bits to all 0's, then explicitly write down v's outlinks. (BV also optimize further by storing a list i, i + 1, ..., j − 1, j of consecutive outlinks as the interval [i, j].)

Note that locality and similarity are captured by the copying phase. By using clever gap encoding schemes (with the integer codes mentioned earlier) on top of the basic method above, BV obtain their best results. Note that the exploitation of lexicographic locality here hinges crucially on the natural ordering available on the Web pages (URLs).
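As a concrete illustration of the gap-encoding idea, here is a minimal sketch (the function names are ours, not the production BV code) of γ-coding the gaps of a sorted outlink list:

```python
def elias_gamma(x: int) -> str:
    # Elias gamma-code of x >= 1: floor(lg x) zeros followed by the binary
    # expansion of x, for a total of 1 + 2*floor(lg x) bits.
    assert x >= 1
    b = bin(x)[2:]
    return "0" * (len(b) - 1) + b

def gap_encode(destinations):
    # Replace a sorted list of distinct destination IDs by consecutive gaps;
    # big IDs become small integers when the list exhibits locality.
    # (BV actually encode the first entry relative to the source node.)
    s = sorted(destinations)
    return [s[0]] + [b - a for a, b in zip(s, s[1:])]

bits = "".join(elias_gamma(g) for g in gap_encode([1000, 1002, 1003, 1010]))
```

The gaps 2, 1, 7 cost only a handful of bits each under the γ-code, whereas the raw IDs would each cost about 1 + 2⌊lg 1000⌋ bits.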
For more details, we refer to the original paper [10] and to [107, Chapter 20]. This general method of compression has two nice properties. First, it depends only on locality in some canonical ordering. Second, adjacency queries (fetch all the outlinks of a given node) can be served fairly efficiently: given a Web page whose outlinks are sought, we enumerate these outlinks by decoding backwards through the chain of prototypes, until we arrive at a list whose encoding begins with at least lg w 0's. While in principle this chain could be arbitrarily long, in practice one can force the algorithm to cut chains whose length exceeds a given threshold t, and small values of t already provide a good compromise between compression ratio and decompression speed.

6.3.2 Backlinks compression scheme

We now describe a slightly different compression scheme that is motivated by the observed properties of social networks. This scheme, called BL, incorporates an additional idea on top of BV, namely, link reciprocity: reciprocal links are encoded in a special way. Since social networks are known to be mostly reciprocal (if Alice is Bob's friend, then Bob is very likely to be Alice's friend), this will turn out to be advantageous.

Suppose we obtain an ordering of the nodes in the graph through some process to be discussed later; we identify each node in the graph with its position in this ordering. Let v be a node. Its encoding consists of the following.

1. Base information. The outdegree |out(v)|, minus 1 if v has a self-loop, and minus the number of reciprocal edges from v. Also, a bit specifying whether v has a self-loop.

2. Prototype. The node u that v uses as a prototype to copy from: as u ≤ v in the ordering, u is encoded as the difference between u and v. If u = v, then no copying is performed. Otherwise, a bit is added for each outlink of u, representing whether or not that outlink of u is also an outlink of v.

3.
Residual edges. Let (v, v1), ..., (v, vk) be the outlinks of v that are yet to be encoded after the above step, with v1 ≤ ··· ≤ vk. We write one bit stating whether v > v1 or v < v1. Then we encode the gaps |v1 − v|, |v2 − v1|, ..., |vk − vk−1|.

4. Reciprocal edges. Finally, we encode the reciprocal outlinks of v. For each v′ ∈ out(v) such that v′ > v, we encode whether v′ ∈ rec(v) using one bit per link, and discard (v′, v).

Note that reciprocal edges are succinctly encoded by the last step. Thus, this method potentially outperforms BV in terms of compression. However, it has a drawback: unlike in BV, adjacency queries may be slower. This is because BV limits the "length" of prototype chains, whereas BL imposes no such limit, for the sake of best compression. If the compressed representation of a network bottlenecks adjacency query serving, then a limit on the length of copying chains can be introduced in BL as well.

6.4 Compression-friendly orderings

In both the BV and BL schemes, the ordering of nodes plays a crucial role in the performance of the compression scheme. The performance of BV suggests that the lexicographic ordering of URLs for the Web graph is both natural and crucial, begging the question: can we find such orderings for other graphs, in particular, social networks? If we could, we would be able to apply either the BV or the BL scheme. In this section we formulate ordering problems that are directly motivated by the BV and BL compression schemes.

6.4.1 Formulation

We first formalize the problem of finding the best ordering of nodes in a graph for the BV and BL schemes. As we saw earlier, both algorithms benefit if locality and similarity are captured by this ordering. This leads to the following natural combinatorial optimization problem, which we call minimum logarithmic arrangement.

Problem 6.4.1 (MLogA) Find a permutation π : V → [n] such that Σ_{(u,v)∈E} lg |π(u) − π(v)| is minimized.
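To illustrate the reciprocity step, here is a small sketch (our own helper, not the actual BL implementation) that separates reciprocal pairs, each of which BL stores once plus one extra bit, from the one-way edges that go through the gap encoding:

```python
def split_reciprocal(edges):
    # Partition a directed edge list: a pair (u,v),(v,u) is kept once as the
    # reciprocal edge (u,v) with u < v, mirroring BL's one-bit-per-link trick;
    # everything else is a one-way edge. Self-loops, which BL flags
    # separately in the base information, are ignored here.
    eset = set(edges)
    reciprocal = [(u, v) for (u, v) in eset if u < v and (v, u) in eset]
    one_way = [(u, v) for (u, v) in eset if (v, u) not in eset]
    return sorted(reciprocal), sorted(one_way)
```

On a mostly reciprocal graph (over 70% of LiveJournal's edges, per Table 6.1) nearly half the edge list collapses into single-bit flags.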
The motivation behind this definition is to minimize the sum of the logarithms of the edge lengths under the ordering (where the length of the edge u → v is |π(u) − π(v)|). Notice that this cost represents the compressed size of the length of an edge in an encoding that is information-theoretically optimal (or nearly so). Also note that if the term inside the summation were just |π(u) − π(v)|, then this would be the well-known minimum linear arrangement (MLinA) problem. MLinA is NP-hard [46]; little, however, is known about its approximability. The best algorithm [92] approximates MLinA with an O(√(log n) log log n) multiplicative error with respect to the optimal solution; further, this algorithm is not practical for large graphs. From the standpoint of hardness of approximation, only the existence of a PTAS has been ruled out [4]. One cannot hope to use an approximate solution to MLinA to solve MLogA, since we can show (Section 6.5) that these problems are very different in their structure.

In actually compressing the graph, it is more efficient to compress the gaps induced by the neighbors of a node. Suppose u < v1 < v2 and (u, v1), (u, v2) ∈ E. Then, compressing the gaps v1 − u and v2 − v1 is never more, and could be far less, expensive than compressing the lengths of the edges, namely, v1 − u and v2 − u. For this reason, we introduce a slightly modified problem, called minimum logarithmic gap arrangement. Let fπ(u, out(u)) be the cost of compressing out(u) under the ordering π, i.e., if out(u) = {u1, ..., uk} with π(u1) ≤ ··· ≤ π(uk), then

fπ(u, out(u)) = Σ_{i=1}^{k} lg |π(ui) − π(u_{i−1})|,

where u0 = u.

Problem 6.4.2 (MLogGapA) Find a permutation π : V → [n] such that Σ_{u∈V} fπ(u, out(u)) is minimized.

Once again, as a problem, MLogGapA turns out to be very different from MLinA and MLogA.
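To make the two objectives concrete, here is a minimal sketch (hypothetical helper names; graphs as adjacency dicts, no self-loops) computing both costs for a given permutation π:

```python
from math import log2

def mloga_cost(edges, pi):
    # MLogA objective: sum over edges of lg of the edge length under pi.
    return sum(log2(abs(pi[u] - pi[v])) for u, v in edges)

def mloggapa_cost(graph, pi):
    # MLogGapA objective: per node u, sort out(u) by position and sum the
    # lg of consecutive gaps, the first gap measured from u itself (u0 = u).
    total = 0.0
    for u, out in graph.items():
        prev = pi[u]
        for v in sorted(out, key=pi.__getitem__):
            total += log2(abs(pi[v] - prev))
            prev = pi[v]
    return total
```

For instance, for a star with three leaves and the center placed first (cf. the star example of Section 6.5), the MLogGapA cost is 0 while the MLogA cost is positive, since all gaps equal 1 but the edge lengths are 1, 2, 3.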
Both formulations, MLogA and MLogGapA, capture the essence of obtaining an ordering that benefits BV and BL compression. We believe a good approximation algorithm for either of these problems would be of practical interest.

6.4.2 Hardness results

In the following sections we exploit the structure of these problems and give some hardness results.

6.5 MLogA vs. MLinA vs. MLogGapA

Figure 6.1: An example showing the difference between MLogA and MLinA (a graph on the nodes 0, 1, ..., 6).

The graph in Figure 6.1 is an example showing that the MLinA and MLogA problems can have different solutions: there is no ordering that minimizes both objective functions simultaneously. The best solutions for MLinA have value 19, whereas the best solutions for MLogA have value lg 180. It can be checked that among the optimal MLinA orderings (with value 19), the best for MLogA has value lg 192 (e.g., the ordering 4, 5, 3, 2, 6, 1, 0). Among the optimal MLogA orderings (with value lg 180), the best for MLinA has value 20 (obtained by swapping 3 and 5 in the previous ordering).

It is easy to show, similarly, that MLogGapA can have solutions different from those of both the MLinA and MLogA problems. For instance, consider a star with three leaves. The optimum ordering for MLogGapA places the center of the star as the first (or last) node of the ordering, yielding a total cost of 0. On the other hand, this solution is suboptimal for both MLinA and MLogA, which place the center of the star either second or third in the ordering.

6.6 Hardness of MLogA

In this section we prove that the MLogA problem is NP-hard on multigraphs.

Theorem 6.6.1 The MLogA problem is NP-hard on multigraphs.

Proof: We prove the hardness of MLogA via a reduction making use of the inapproximability of MaxCut. Our starting point, from [50], is that MaxCut cannot be approximated to a factor greater than 16/17 + ε unless P = NP. In the reduction below we have not attempted to optimize parameters.
We start from a MaxCut instance (G(V, E), k), where the question is whether there exists a cut of size at least k in G. Let |V| = n and |E| = m. We build a graph G′ composed of a clique of size n^100 and a disjoint copy of the complement of G, denoted by Ḡ. Further, we add an edge between each node of the clique and each node of Ḡ. Each edge of the clique has multiplicity n^500 + 1; all other edges have unit multiplicity. Let C = Σ_{1≤i<j≤n^100+n+1} lg(j − i) and let X = n^500 Σ_{1≤i<j≤n^100} lg(j − i).

Now we would like to answer the following question Q: is it possible to find an ordering of G′ with MLogA cost smaller than Z? We show that answering questions of the form Q would allow us to approximate the corresponding MaxCut instance. First, note that in any ordering of G′ for which the answer to Q is yes when Z = C + X − k lg n^100, the nodes of the clique must be adjacent. Otherwise, at least one edge of the clique is stretched by at least 1; in this case, the overall cost of the clique edges is at least X − (n^500 + 1) lg n^100 + (n^500 + 1) lg(n^100 + 1), which is X + Ω(n^400). This is larger than the value allowed by the question Q.

We show that if the answer to Q when Z = C + X − k lg n^100 is positive, then there is a cut in G of size at least k(1 − 1/50), and otherwise there is no cut of size k. As this allows approximating MaxCut to a factor better than 16/17, it shows that an algorithm answering questions of the form Q can exist only if P = NP, proving the hardness of MLogA.

From our previous argument, we now need only consider orderings of G′ in which the clique nodes are laid out consecutively. Each such ordering naturally gives a cut of the original graph, and the value of the MLogA objective function equals C + X − Σ_{{u,v}∈E(G)} lg |π(u) − π(v)|. Consider the edges in G (corresponding to the missing edges in G′) that pass over the clique.
Each of these edges has length at least n^100, and hence the value of the MLogA objective function is smaller than C + X − k lg n^100 = Z. Hence, if there is a cut of size at least k in G, the answer to Q is yes. On the other hand, each of the other missing edges has length at most n (the order of G), and hence cost at most lg n. As the MaxCut value k is at least m/2, if G does not have a cut of size at least k(1 − 1/50), the smallest that the MLogA objective function can be is

C + X − k(1 − 1/50) lg(n^100 + n) − k lg n > Z

for n sufficiently large. This proves the claim.

6.7 Hardness of MLinGapA

While we are currently unable to show that MLogGapA is NP-hard, we can show that its "linear" version (i.e., without the logarithms), MLinGapA, is indeed hard.

Theorem 6.7.1 The MLinGapA problem is NP-hard.

Proof: We start from the (directed) MLinA problem, which is known to be NP-hard. Let (G(V, E), k) be a MLinA instance (is there a linear arrangement whose sum of edge lengths is ≤ k?). Let n = |V| and m = |E|. We create an instance of the (directed) MLinGapA problem as follows. The graph G′ is composed of n′ = n^{c+1} + 2m nodes (for some large enough constant c). For each node v ∈ V(G), two directed cliques K_{v,1} and K_{v,2} of equal size n^c are created. Also, a clique on the 2n nodes d_{v,1}, ..., d_{v,2n} (the "peer nodes" of v) is created for each v ∈ V(G). Each node in K_{v,1} and each node in K_{v,2} points to the node d_{v,i} for all i = 1, ..., deg(v), and vice versa. The set E(G′) contains 2m further edges, which we call the "original" edges. In particular, for each edge (v, u) ∈ E(G) the edges (d_{v,∗}, d_{u,∗}) and (d_{u,∗}, d_{v,∗}) are added (in such a way that each node d_{v,∗} has outdegree ≤ n).
Given an arbitrary node v, consider the following ordering (which we dub good) of its two cliques and of its peer nodes: the first clique laid out on n^c consecutive positions, followed by its 2n peers, and finally the second clique (using a total of 2n^c + 2n positions). Let F be the cost of the edges of the cliques, and of the edges from the cliques to the peers, in this ordering (F can be trivially computed in polynomial time).

Now we ask: does there exist an ordering with MLinGapA value at most nF + 3K(2n^c) + 3mn² = T?

If there exists a MLinA ordering π of cost at most K, it is easy to find a MLinGapA ordering of cost at most T. If v is the first node of π, place the first clique of v, followed by the peers of v and the second clique of v, at the beginning. Then do the same for the second node of π, and so on, until all nodes have been placed. What is the total MLinGapA cost? We have a fixed cost of nF (the ordering of the "node structures") for the non-original edges. As for the original edges, note that each node from which an original edge starts has outdegree 1, so encoding the "gap" induced by that edge has the same cost as encoding its length. What is its length? The number of cliques that an edge (of length ℓ in π) passes over in the new ordering is 2ℓ. Each such clique has size n^c. Thus, the cost in the new ordering of the edge is at most 2ℓn^c + ξ, where ξ is an error term equal to n² (the total number of peer nodes). Now, for any edge of length ℓ in the MLinA ordering, there are three gaps of cost at most 2ℓn^c + n². The total cost is thus at most nF + 3K(2n^c) + 3mn² = T.

Now suppose we have an ordering with MLinGapA value at most T. We show in turn that there is a MLinA ordering of cost at most K.
To show this, we first prove that for each v the ordering must be such that (a) the distance between any two nodes of K_{v,1} (resp., K_{v,2}) is at most n^c + n^4 (that is, the cliques are not spread out), and (b) the distance between each single peer of v and its nearest node of K_{v,1} (resp., K_{v,2}) is at most n^4.

Suppose this statement is true. We show by contradiction that there must exist a MLinA ordering of value at most K. First, notice that the minimum cost that we have to pay for the edges between nodes of V(G′) generated from one node v is at least F (in any ordering the gaps have length at least 1, and for any ordering the sum over the backward edges is at least their cost in the good ordering). Further, from properties (a) and (b) it follows that in all valid solutions, for each v ∈ V(G), each peer node of v must be placed at distance at most n^c + 2n^4 from each clique node of v. Now, the number of clique nodes generated by v is 2n^c, so each peer node has to be placed after at least n^c − 2n^4 nodes of one of its two cliques and before n^c − 2n^4 nodes of the other (as each peer node has to be at distance ≤ n^c + 2n^4 from each node of its cliques). Hence, the total cost of any ordering corresponding to a MLinA solution of cost K + 1 is at least nF + 3(K + 1)(2n^c − 4n^4) > T, a contradiction.

Now we have to prove properties (a) and (b). First we show (a): if the maximum distance between two nodes in any of the cliques of v is > n^c + n^4, the total cost of the ordering is > T. Indeed, if the distance between two nodes of a clique of v is > n^c + n^4, then the cost of the edges between the clique and the peer nodes of v is ≥ F + n^{c+4} − n^c, where the first term of the sum is due to the fact that all the gaps have length at least one and that there are at least n^c + n backlinks.
The n^{c+4} − n^c term is the added cost due to the spread of the clique (which is ≥ n^4, and the, say, rightmost node of the clique must go across all the non-clique nodes lying between clique nodes, for a total of at least n^c − 1 links). Hence, the cost of the ordering would be ≥ nF + n^{c+4} − n^c > T, contradicting the validity of the solution (as K ∈ O(n²)).

Finally we have to prove (b): for each v ∈ V(G), no peer node of v is at distance ≥ n^4 from each of the cliques of v. Proceeding as before, we lower bound the cost of the ordering restricted to the edges between the peer nodes and the cliques of v. This cost is F plus the cost due to the enlargement of the gaps between the peers of v and the cliques of v. Thus, the total cost of the ordering is ≥ nF + n^{c+4} > T, again a contradiction.

6.8 Lower bound: MLogA for expanders

We can also show a lower bound on the solution of MLogA for expander-like graphs, suggesting that they are not compressible with a constant number of bits per edge via the BV/BL schemes.

Lemma 6.8.1 If G has either constant edge expansion or constant conductance, then the value of MLogA on G is Ω(m log n). If, instead, G has constant node expansion, then the value of MLogA on G is Ω(n log n).

Proof: Let G be a simple graph with no isolated nodes. For the edge expansion case, note that for any S ⊆ V such that |S| = n/2, we have that Θ(n) edges are in the cut (S, G \ S). Now, if Θ(n) edges are in the cut, then there are Θ(n) gaps of length at least √n, because the graph is simple. Hence the claim follows. For the constant conductance case, note that if G has no isolated nodes, then constant conductance implies constant edge expansion. The node expansion case is analogous.

6.8.1 The shingle ordering heuristic

In this section we propose a simple and practical heuristic for both the MLogA and MLogGapA problems.
Our heuristic is based on obtaining a fingerprint of the outlinks of a node and ordering the nodes according to this fingerprint. If the fingerprint succinctly captures the locality and similarity of nodes, then it can be effective in the BV/BL compression schemes. To motivate our heuristic, we recall the Jaccard coefficient J(A, B) = |A ∩ B|/|A ∪ B|, a natural notion of similarity between two sets. Let σ be a random permutation of the elements of A ∪ B. For a set A, let Mσ(A) = σ^{−1}(min_{a∈A} σ(a)), the smallest element of A according to σ; we call it the shingle. It can be shown [19] that the probability that the shingles of A and B are identical is precisely the Jaccard coefficient J(A, B), i.e.,

Pr[Mσ(A) = Mσ(B)] = |A ∩ B| / |A ∪ B|.

Instead of fully random permutations, it was shown that a so-called min-wise independent family of permutations suffices [19]; in practice, even pairwise independent hash functions work well. It is also easy to boost the accuracy of this probabilistic estimator by combining multiple shingles obtained from independent hash functions.

The intuition behind our heuristic is to treat the outlinks out(u) of a node u as a set and to compute the shingle Mσ(out(u)) of this set for a suitably chosen permutation (or hash function) σ. The nodes of V can then be ordered by their shingles. By the property stated above, if two nodes have significantly overlapping outlinks, i.e., share a lot of common neighbors, then with high probability they will have the same shingle and hence be close to each other in a shingle-based ordering. Thus, the properties of locality and similarity are captured by the shingle ordering heuristic. (Gibson et al. [47] used a similar heuristic, but for identifying dense subgraphs of large graphs.)

6.8.2 Properties of shingle ordering

While shingle ordering might appear to be an unmotivated heuristic for obtaining a compression-friendly ordering, it has theoretical justification.
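The heuristic just described can be sketched in a few lines, using a seeded hash in place of the random permutation σ (the helper names are ours; a second shingle breaks ties of the first, as in the double shingle ordering used in our experiments):

```python
import hashlib

def shingle(outlinks, seed):
    # Min-hash fingerprint: the out-neighbor minimizing a seeded hash plays
    # the role of M_sigma(out(u)); nodes without outlinks get a sentinel.
    h = lambda v: hashlib.sha1(f"{seed}:{v}".encode()).hexdigest()
    return h(min(outlinks, key=h)) if outlinks else ""

def double_shingle_order(graph):
    # graph: dict node -> iterable of out-neighbors. Nodes with equal
    # shingles (i.e., likely-similar neighborhoods) end up adjacent;
    # remaining ties fall back to the node identifier.
    return sorted(graph, key=lambda u: (shingle(graph[u], 0),
                                        shingle(graph[u], 1), u))
```

Two nodes with identical outlink sets always receive the same shingle and are therefore placed consecutively, which is exactly what the copying phase of BV/BL can exploit.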
In this section we show that, using shingle ordering, it is possible to copy a constant fraction of the edges in a large class of random graphs with certain properties. The well-known preferential attachment (PA) model [5, 14], for instance, generates graphs in this class. Our analysis thus shows that it is indeed possible to obtain provable performance guarantees on shingle ordering with respect to copying (hence compression) in stylized models.

We first prove the following general statement about sufficient conditions under which shingle ordering copies a constant fraction of the edges.

Theorem 6.8.1 Let G = (V, E) be such that |E| = Θ(n) and suppose there exists S ⊆ V such that
(a) |S| = Θ(n),
(b) ∀v ∈ S, ∃v′ ∈ S, v ≠ v′, s.t. |out(v) ∩ out(v′)| ≥ 1,
(c) there exists a constant k s.t. ∀v ∈ S, outdeg(v) ≤ k,
(d) ∀v ∈ S, ∀w ∈ out(v), indeg(w) ≤ n^{1/2 − ε}.
Then, with probability 1 − o_{|V|}(1) (over the space of permutations), at least a constant fraction of the edges will be "copied" (even with a window of size 1) when using the shingle ordering.

Proof: We need the following concentration inequality, proved (in a stronger form) by McDiarmid [77].

Theorem 6.8.2 Let X be a non-negative random variable, not identically 0, which is determined by an independent random permutation σ and satisfies the following for some c, r > 0: interchanging two elements in the permutation can affect X by at most c, and for any s, if X ≥ s then there is a set of at most rs coordinates of σ whose values certify that X ≥ s. Then, for any 0 ≤ t ≤ E[X],

Pr[|X − E[X]| > t + 60c √(r E[X])] ≤ 4 exp(−t² / (8c² r E[X])).

Using this, we prove Theorem 6.8.1. Given an ordering and a node v, we say that v′ is the predecessor of v if it is placed immediately to the left of v in the ordering. Also, given an ordering and an arbitrary node v, we say that the edge (v, w) is "shingled" if the position of v is determined by w (that is, if the minimum out-neighbor of v, according to the random shingle permutation, is w).
Also, we say that a node v is shingled by w if w is the minimum out-neighbor of v according to the random shingle permutation. A node v ∈ S is "good" if there exists another node v′ ∈ S, v ≠ v′, such that v and v′ are shingled by the same node. Let X be the number of "good" nodes. How can we lower bound the expectation of X? By property (b), each node v in S has a common out-neighbor with at least one other node in S. As all nodes in S have outdegree bounded by k, with probability 1/(2k − 1) ≥ 1/(2k) one of their common out-neighbors is the smallest of both their out-neighborhoods according to the random shingle permutation — that is, the two nodes are shingled together. Thus, E[X] ≥ |S|/(2k).

We will later argue that X ≥ Ω(|S|) w.h.p. This entails that at least Ω(|S|) edges are copied. Indeed, partition the good nodes of S according to their shingling node. Each part contains at least two nodes (by the definition of good nodes), and in each part all the nodes but the first copy their edge pointing to their shingling node. Thus, the fraction of good nodes in a part copying at least one of their edges is ≥ 1/2. The claim follows.

To obtain the high-probability lower bound on X, we use Theorem 6.8.2. Note that here we only have the random shingle permutation (that is, no random trials). In order to use Theorem 6.8.2 we have to choose suitable c, r. Using property (d), we can upper bound the effect on X of a swap of two elements by c = 2n^{1/2 − ε} (the only nodes that can change their good or bad status are the in-neighbors of the two swapped nodes — and these number at most 2n^{1/2 − ε}). If a node v ∈ S is good, then there exists another node v′ ∈ S with the same shingling node w. Thus, to certify that v is good it suffices to reveal the positions of the nodes in N⁺(v) ∪ N⁺(v′) — v is good iff w is the first of the nodes of N⁺(v) ∪ N⁺(v′).
As the degrees of v, v′ are bounded by k, we can safely choose r = 2k. Plugging these c, r into Theorem 6.8.2 gives the high-probability lower bound on X.

It is easy to see that this holds even for undirected graphs: each undirected edge {u, v} can be substituted by the two directed edges (u, v), (v, u); then, for each node, its original set of neighbors equals its new sets of in- and out-neighbors.

We now show the main result of the section: using shingle ordering, it is possible to copy a constant fraction of the edges of graphs generated by the PA model.

Theorem 6.8.3 With high probability, the graphs generated by the PA model satisfy the properties of Theorem 6.8.1.

Proof: We start by removing the nodes incident to multi-edges or loops — these nodes (and their incident edges) are, altogether, o(n) in number.¹ Also, we remove all nodes of degree > k, for some constant k — by [15] only (2/k)·mn edges and nodes are removed this way. The resulting graph thus has at most n nodes and at least (1 − 2/k)mn ≥ (1 − 2/k)n edges, and its maximum degree is k. By averaging, a graph with these three properties contains at least (1 − 2/k)·n/(2k) nodes of degree at least 2.

Now take all the nodes v of this graph incident to a neighbor of degree ≥ 2. There are ≥ (1 − 2/k)·n/(2k) such neighbors, and each of them is connected to at most k such v's — thus the number of these v's is at least Ω(n/(2k²)) = Ω(n). The set of these v's is the S of Theorem 6.8.1.

As our experiments show, shingle ordering allows both the BL and BV schemes to take significant advantage of copying.

¹This can be easily shown by noting that the expected number of multiple edges and self-loops added by the n-th inserted node is O(m³/n^{1/2−ε}), conditioned on the fact that the highest degree at that point is O(n^{1/2+ε}) w.h.p. [40]. Then, by Markov's inequality we obtain the claim.
6.9 Experimental results

In this section we describe our experimental results. The goal of our experiments is two-fold: (1) to study the performance of the BV/BL schemes using shingle ordering on social networks; (2) to obtain insights into the differences between the Web and social networks in terms of their compressibility. We begin with a description of the data sets we use for our experiments. Next we discuss the baselines we compare shingle ordering against. Finally we present and discuss our experimental results.

6.9.1 Data

For our experiments, we chose four large directed graphs: (i) a 2008 snapshot of LiveJournal (a social network site, livejournal.com) and an induced subgraph of users, called LiveJournal (zip), for whom we know their zip codes; (ii) monthly snapshots of Flickr (a photo-sharing site, flickr.com) from March 2004 until April 2008; (iii) the host graph of a 2005 snapshot of the .uk Web graph; and (iv) the host graph of a 2004 snapshot of the India+China (.in, .cn) Web graph.

Graph               n           |E|          % reciprocal edges
UK-host             587,205     12,825,465   18.6
India+China host    19,123      233,380      10.6
LiveJournal         5,363,260   79,023,142   72.0
LiveJournal (zip)   1,314,288   8,040,562    79.0
Flickr (04/2008)    25,158,667  69,702,479   64.4
Flickr (03/2004)    4,708       7,694        83.6

Table 6.1: Basic properties of our graphs.

In Table 6.1, we summarize the properties of the graphs we have considered. Notice the magnitude of the reciprocity of the social networks (LiveJournal and Flickr). Our BL scheme critically leverages this property of such networks.

6.9.2 Baselines

We use the following orderings as baselines to compare against shingle ordering. (1) Random order. We use a random permutation of all the nodes in the graph. (2) Natural order. This is the most basic order that can be defined for a graph. For Web and host graphs, a natural order is the URL lexicographic ordering (used by BV).
For a snapshot of LiveJournal, a natural order is the order in which the user profiles were crawled. For Flickr, since we know the exact time at which each node and edge was created, a natural order is the order in which users joined the network. (3) Geographic order. In a social network, if geographic information is available in the form of a zip code, then this defines a geography-based order. Liben-Nowell et al. [73] showed that about 70% of social network links arise from geographical proximity, suggesting that friends can be grouped together using geographical information. Notice that this only defines a partial order (i.e., with ties). (4) DFS and BFS order. Here, the orderings are given by these common graph traversal algorithms. We also try the undirected versions of these traversals, where the edge direction is discarded.

To test the robustness of shingle ordering, we also use an ordering obtained from two shingles instead of just one, where the second shingle is used to break ties produced by the first. We call this the double shingle ordering. When only one shingle is used, ties are broken using the natural order. Our performance numbers are always measured in bits/link.

6.9.3 Compression performance

BV
Graph             Natural  Random  Shingle  Double Shingle
LiveJournal       14.435   23.566  15.956   15.828
Flickr            21.865   23.958  13.549   13.496
UK host           10.826   15.543   8.218    8.138
India+China host   9.224   10.543   7.367    7.120

BL
Graph             Natural      Random       Shingle      Double Shingle
LiveJournal        9.564 (ζ4)  15.169 (ζ4)  10.461 (ζ4)  10.435 (ζ4)
Flickr            16.382 (ζ4)  17.785 (ζ4)  10.952 (ζ4)  10.915 (ζ4)
UK host           10.574 (δ)   14.528 (δ)    8.243 (δ)    8.133 (δ)
India+China host   9.753 (ζ4)  10.823 (ζ4)   7.310 (δ)    7.126 (δ)

Table 6.2: Performance of the compression techniques under different orderings.

In Table 6.2, we present the results of the different compression/orderings on four of the graphs.
This table shows that double shingle ordering produces the best or near-best compression, for both BV and BL. In some cases, it cuts the number of bits used by the natural order almost in half. We also note that the improvement of BL over BV is significant for networks that are highly reciprocal, i.e., social networks. Finally, the numbers show interesting similarities between social networks and host graphs: their compressibility using the best compression (BL with double shingle order) is on par with one another. It is interesting to note how the best compression rates for the UK host and the India+China host graphs are almost as high as those of the social networks (only 2-3 bits less than the 10-11 bits needed for social networks), even though the host graphs are much smaller than the social networks. For comparison, we note that the snapshot of the UK domain (India+China domains) that we used to obtain the host graph was found to be compressible to 1.701 (1.472) bits/link (see [10] and http://law.dsi.unimi.it/). This seems to indicate that host graphs are very hard to compress. We also note (Table 6.3) that the BFS/DFS orderings are always suboptimal (almost as bad as a random order).

BV
Graph             DFS     Undir. DFS  BFS     Undir. BFS
LiveJournal       19.992  20.253      20.763  21.376
UK host           14.630  14.474      14.903  14.634
India+China host  10.172  10.210      10.231   9.810

BL
Graph             DFS          Undir. DFS   BFS          Undir. BFS
LiveJournal       12.924 (ζ4)  13.096 (ζ4)  13.401 (ζ4)  13.778 (ζ4)
UK host           13.774 (ζ4)  13.607 (ζ4)  13.978 (ζ4)  13.731 (ζ4)
India+China host  10.561 (ζ4)  10.317 (ζ4)  10.558 (ζ4)  10.105 (ζ4)

Table 6.3: Performance of the BFS/DFS orderings.

Scheme  Geographic   Shingle      Double Shingle
BV      17.258       17.042       16.975
BL      11.396 (ζ4)  10.964 (δ)   10.950 (δ)

Table 6.4: Performance of geographic ordering on LiveJournal (zip).

In Table 6.4, we show the performance of geographical ordering on
the induced subgraph of LiveJournal, restricted to users in the US with a known zip code. We see that ordering by zip code (i.e., in such a way that people at small geographic distance are close to each other in the ordering) is much worse than ordering by shingle, suggesting that geographic ordering is perhaps not useful for compression.

6.9.4 Temporal analysis

In Figure 6.2, we see how the different ordering and compression techniques perform on the monthly snapshots of the Flickr social network. The upper half of the figure shows how the Flickr network grew over time. Here, we see that BL with shingle ordering beats the competition uniformly over all the snapshots. We also see an interesting pattern: BL obtains a better compression rate than BV under each of the orderings. It is remarkable that even though the number of edges in Flickr grew enormously between March 2005 and April 2008, the compressibility of the network (under a variety of schemes and orderings) has remained robust.

6.9.5 Why does shingle ordering work best?

Figures 6.3 and 6.4 show one reason why the shingle ordering helps compression: in the LiveJournal, India+China host and UK-host graphs the number of small gaps is higher with shingle ordering than with any other ordering (with the notable exception of the LiveJournal graph, where the natural ordering is marginally better).

[Plot: bits/link (6–26) from March 2004 to March 2008, for BV and Back Links under the Joining, Shingle, and Random orderings.]

Figure 6.2: Performance on the temporal Flickr graph.

In Figure 6.3, the upper panel represents the number of gaps (y-axis) of a certain length (x-axis) for the LiveJournal graph. The lower panel represents a sub-sampled version of the same data: for each length i, we deleted the length-i point with probability 1 − Θ(1/i).
This way, in expectation, the number of points in each interval 10^k, ..., 10^{k+1} is the same, which makes the bottom panel more readable. Recall that in LiveJournal, the natural (crawl) ordering beats shingle ordering by a small amount.

Figure 6.3: Gap distribution in the LiveJournal graph.

In Figure 6.4, the upper (lower) panel represents the number of gaps (y-axis) of a certain length (x-axis) for (top to bottom) the UK-host and the India+China host graphs. These are sub-sampled versions of the actual data. Note that in both cases, the shingle ordering is best. That is, the shingle ordering creates many more gaps of small length than the other orderings. The smaller the length of a gap, the fewer bits it takes to encode. From these figures, we see that shingle ordering reduces gap lengths. As we argued earlier, shingle ordering also helps the BV and BL schemes exploit copying. These two benefits together appear to be the main reasons why shingle ordering almost always outperforms the other orderings.

Figure 6.4: Gap distribution in the UK-host and India+China host graphs.

6.9.6 A cause of incompressibility

We investigate what causes social networks to be far less compressible than web graphs (observed by [10] to be compressible to 2-3 bits per link). We ask the question: is the densest portion of a social network far more compressible than the rest of the graph? To study this, we analyze k-cores of the LiveJournal social network. Recall that a k-core of a graph is the largest induced subgraph whose minimum degree is at least k. For each k, the k-core of LiveJournal was extracted and compressed by itself. Then, the k-core edges were removed from the original LiveJournal graph, which was also compressed by itself. The results are shown in Figure 6.5. It is clear that as k increases, the k-core gets easier to compress but at the same time the remaining graph gets harder and harder to compress.
This suggests that the low-degree nodes in social networks are primarily responsible for their incompressibility.

[Plot: for k from 10 to 100, the bits/link of the total graph, of the k-core, and of the remaining graph, together with the k-core size (number of nodes, from 100K to 2M).]

Figure 6.5: Compressibility of k-cores.

k-cores can also be used to compress the social network. This is done by representing all the nodes in a k-core by a single virtual node, and compressing the k-core graph and the remainder graph (with the virtual node) separately. In our example, for k = 50, we obtain 9.435 bits/link compression. This is a slight improvement over the best numbers in Table 6.2.

Bibliography

[1] Adler, M., and Mitzenmacher, M. Towards compressing web graphs. In Proc. Data Compression Conference(DCC01) (2001), pp. 203–212.

[2] Aiello, W., Chung, F. R. K., and Lu, L. Random evolution in massive graphs. In Proc. 42nd IEEE Symposium on Foundations of Computer Science(FOCS01) (2001), pp. 510–519.

[3] Althöfer, I., Das, G., Dobkin, D., Joseph, D., and Soares, J. On sparse spanners of weighted graphs. Discrete and Computational Geometry 9 (1993), 81–100.

[4] Ambühl, C., Mastrolilli, M., and Svensson, O. Inapproximability results for sparsest cut, optimal linear arrangement, and precedence constrained scheduling. In Proc. 48th Annual IEEE Symposium on Foundations of Computer Science(FOCS07) (2007), pp. 329–337.

[5] Barabási, A.-L., and Albert, R. Emergence of scaling in random networks. Science 286, 5439 (1999), 509–512.

[6] Baswana, S., and Sen, S. A simple linear time algorithm for computing sparse spanners in weighted graphs. In Proc. 30th International Colloquium on Automata, Languages and Programming(ICALP03) (2003), pp. 384–396.

[7] Berenbrink, P., Elsässer, R., and Friedetzky, T. Efficient randomised broadcasting in random regular networks with applications in peer-to-peer systems. In Proc.
27th ACM symposium on Principles of distributed computing(PODC08) (2008), pp. 155–164. [8] Bladford, D., and Blelloch, G. Index compression through document reordering. In Proc. Data Compression Conference(DCC02) (2002), pp. 342–351. [9] Boldi, P., Santini, M., and Vigna, S. Permuting web graphs. In Proc. of the 6th International Workshop on Algorithms and Models for the Web-Graph(WAW09) (2009), pp. 116–126. [10] Boldi, P., and Vigna, S. The webgraph framework I: Compression techniques. In Proc. 13th International World Wide Web Conference(WWW04) (2004), pp. 595–601. 121 122 BIBLIOGRAPHY [11] Boldi, P., and Vigna, S. The Webgraph framework ii: Codes for the world-wide web. In Proc. Data Compression Conference(DCC04) (2004). [12] Boldi, P., and Vigna, S. Codes for the world-wide web. Internet Mathematics 4, 2 (2005), 405–427. [13] Bollobás, B. The diameter of random graphs. IEEE Trans. Inform.Theory 36, 2 (1990), 285–288. [14] Bollobás, B., and Riordan, O. The diameter of a scale-free random graph. Combinatorica 1, 24 (2004), 5–34. [15] Bollobás, B., Riordan, O., Spencer, J., and Tusnády, G. E. The degree sequence of a scale-free random graph process. Random Structures and Algorithms 18, 3 (2001), 279–290. [16] Borgs, C., Chayes, J. T., Daskalakis, C., and Roch, S. First to market is not everything: An analysis of preferential attachment with fitness. In Proc. 39th Annual ACM Symposium on Theory of Computing(STOC07) (2007), pp. 135–144. [17] Boyd, S. P., Ghosh, A., Prabhakar, B., and Shah, D. Gossip algorithms: design, analysis and applications. IEEE Transactions on Information Theory 52 (2006), 1653–1664. [18] Breiger, R. L. The duality of persons and groups. Social Forces (1974). [19] Broder, A., Charikar, M., Frieze, A., and Mitzenmacher, M. Min-wise independent permutations. J. Comput. Syst. Sci. 60, 3 (2000), 630–659. [20] Broder, A., Glassman, S., Manasse, M., and Zweig, G. Syntactic clustering of the web. Comput. Netw. ISDN Syst 29, 8-13 (1997), 1157–1166. 
[21] Broder, A. Z., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A., and Wiener, J. Graph structure in the web. Computer Networks 33 (2000), 309–320. [22] Buehrer, G., and Chellapilla, K. A scalable pattern mining approach to web graph compression with communities. In Proc. 1st International Conference on Web Search and Data Mining(WSDM08) (2008), pp. 95–106. [23] Carlson, J., and Doyle, J. Highly optimized tolerance: A mechanism for power laws in designed systems. Phys. Rev. E 60 (1999), 1412. [24] Chierichetti, F., Kumar, R., Lattanzi, S., Mitzenmacher, M., Panconesi, A., and Raghavan, P. On compressing social networks. In Proc. 15th Conference on Knowledge Discovery and Data Mining(KDD09) (2009), pp. 219–228. BIBLIOGRAPHY 123 [25] Chierichetti, F., Kumar, R., Lattanzi, S., Panconesi, A., and Raghavan, P. Models for the compressible web. In Proc. 50th Annual IEEE Symposium on Foundations of Computer Science(FOCS09) (2009), pp. 331–340. [26] Chierichetti, F., Lattanzi, S., and Panconesi, A. Rumor spreading in social networks. In Proc. 36th Internatilonal Collogquium on Automata, Languages and Programming: Part II(ICALP09) (2009), pp. 375–386. [27] Chierichetti, F., Lattanzi, S., and Panconesi, A. Rumour spreading and graph conductance. In Proc. 21st Annual ACM-SIAM Symposium on Discrete Algorithms(SODA10) (2010), pp. 1657–1663. [28] Cooper, C., and Frieze, A. M. A general model of web graphs. Random Structures and Algorithms 3, 22 (2003), 311–335. [29] Cooper, C., and Frieze, A. M. The cover time of the preferential attachment graph. Journal of Combinatorial Theory, Ser. B 97, 2 (2007), 269–290. [30] Demers, A. J., Greene, D. H., Hauser, C., Irish, W., Larson, J., Shenker, S., Sturgis, H. E., Swinehart, D. C., and Terry, D. B. Epidemic algorithms for replicated database maintenance. In Proc. 6th ACM symposium on Principles of distributed computing(PODC’87) (1987). [31] Dodds, P., Muhamad, R., and Watts, D. 
An experimental study of search in global social networks. Science 5634, 301 (2003), 827–829. [32] Doerr, B., Friedrich, T., and Sauerwald, T. Quasirandom rumor spreading. In Proc. 19th Annual ACM-SIAM Symposium on Discrete Algorithms(SODA08) (2008). [33] Doerr, B., Friedrich, T., and Sauerwald, T. Quasirandom rumor spreading: Expanders, push vs. pull, and robustness. In Proc. 36th International Colloquium on Automata, Languages and Programming(ICALP09) (2009), pp. 366–377. [34] Dubhashi, D. Talagrand’s inequality in hereditary settings. In Technical report, Dept. CS, Indian Istitute of Technology (1998). [35] Dubhashi, D., and Panconesi, A. Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press, 2009. [36] Elsässer, R. On the communication complexity of randomized broadcasting in random-like graphs. In Proc. 18th Annual ACM Symposium on Parallel Algorithms and Architectures SPAA (2006), pp. 148–157. [37] Fabrikant, A., Koutsoupias, E., and Papadimitriou, C. H. Heuristically optimized trade-offs: A new paradigm for power laws in the internet. In Proc. 29th International Colloquium on Automata, Languages and Programming(ICALP02) (2002), pp. 110–122. 124 BIBLIOGRAPHY [38] Faloutsos, M., Faloutsos, P., and Faloutsos, C. On power-law relationships of the internet topology. In Proc. Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication(SIGCOMM99) (1999), pp. 251– 262. [39] Feige, U., Peleg, D., Raghavan, P., and Upfal, E. Randomized broadcast in networks. Algorithms 1 (1990), 128–137. [40] Flaxman, A., Frieze, A. M., and Fenner, T. I. High degree vertices and eigenvalues in the preferential attachment graph. Internet Mathematics 2, 1 (2005). [41] Fraigniaud, P., and Giakkoupis, G. The effect of power-law degrees on the navigability of small worlds. In Proc. 28th ACM symposium on Principles of distributed computing(PODC’09) (2009), pp. 240–249. [42] Fraigniaud, P., and Giakkoupis, G. 
On the searchability of small-world networks with arbitrary underlying structure. In Proc. 42nd Annual ACM Symposium on the Theory of Computing(STOC10) (2010), pp. 389–398. [43] Friedrich, T., and Sauerwald, T. Near-perfect load balancing by randomized rounding. In Proc. 41st Annual ACM Symposium on Theory of Computing(STOC09) (2009), pp. 121–130. [44] Frieze, A., and Grimmett, G. The shortest-path problem for graphs with random arc-lengths. Algorithms 1, 10 (1985), 57–77. [45] Garey, M., and Johnson, D. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Company, 1979. [46] Garey, M. R., Johnson, D. S., and Stockmeyer, L. Some simplified NPcomplete graph problems. Theory of Computer Science 1 (1976), 237–267. [47] Gibson, D., Kumar, R., and Tomkins, A. Discovering large dense subgraphs in massive graphs. In Proc. 31st International Conference on Very Large Data Bases(VLDB05) (2005), pp. 721–732. [48] Goel, S., Muhamad, R., and Watts, D. J. Social search in “small-world” experiments. In Proc. 18th international conference on World wide web (WWW’09) (2009), pp. 701–710. [49] Granovetter, M. The strength of weak ties. American Journal of Sociology 78, 6 (1973), 1360–1380. [50] Håstad, J. Some optimal inapproximability results. Journal of the Assoc. Comp. Mach 4, 48 (2001), 798–859. [51] Jerrum, M., and Sinclair, A. Approximating the permanent. SIAM J. Comput. 18, 6 (1989), 1149–1178. BIBLIOGRAPHY 125 [52] J.L. Guillaume, M. L. Bipartite graphs as models of complex networks. In Proc. 1st Workshop on Combinatorial and Algorithmic Aspects of Networking(CAAN04) (2004), pp. 127–139. [53] Karande, C., Chellapilla, K., and Andersen, R. Speeding up algorithms on compressed web graphs. In Proc. 2nd International Conference on Web Search and Data Mining(WSDM09) (2009), pp. 272–281. [54] Karoński, M., Scheinerman, E. R., and Singer-Cohen, K. B. On random intersection graphs: The subgraph problem. 
Combinatorics, Probability and Computing 8, 1–2 (2006), 131–159. [55] Karp, R., Schindelhauer, C., Shenker, S., and Voecking, B. Randomized rumor spreading. In Proc. 41st Annual IEEE Symposium on Foundations of Computer Science(FOCS00) (2000), p. ***. [56] Kempe, D., Dobra, A., and Gehrke, J. Gossip-based computation of aggregate information. In Proc. 44th IEEE Symposium on Foundations of Computer Science(FOCS03) (2003), pp. 482–491. [57] Klee, V., and Larman, D. Diameters of random graphs. Canad. J. Math 33 (1981), 618–640. [58] Kleinberg, J. Navigation in a small world. Nature 406 (2000), 845. [59] Kleinberg, J. The small-world phenomenon: An algorithmic perspective. In Proc. 37th Annual ACM Symposium on Theory of Computing(STOC00) (2000), pp. 163–170. [60] Kleinberg, J. Small-world phenomena and the dynamics of information. In Proc. 14th Advances in Neural Information Processing Systems (NIPS01) (2001), pp. 431– 438. [61] Kleinfeld, J. Could it be a big world after all? Society 39 (2002), 61–66. [62] Korte, C., and Milgram, S. Acquaintance links between white and negro populations: Application of the small world method. Journal of Personality and Social Psychology 15, 2 (1970), 101–108. [63] Kossinets, G., and Watts, D. J. Empirical analysis of evolving social networks. Science 311, 5757 (2006), 88–90. [64] Kumar, R., Novak, J., and Tomkins, A. Structure and evolution of online social networks. In Proc. 12th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD06) (2006), pp. 611–617. [65] Kumar, R., Raghavan, P., Rajagopalan, S., Sivakumar, D., Tomkins, A., and Upfal, E. Stochastic models for the web graph. In Proc. 41st IEEE Symposium on Foundations of Computer Science(FOCS00) (2000), pp. 57–65. 126 BIBLIOGRAPHY [66] Kumar, R., Raghavan, P., Rajagopalan, S., and Tomkins, A. Trawling the web for emerging cybercommunities. In Proc. 8th International World Wide Web Conference(WWW09) (1999), pp. 403–416. 
[67] Lattanzi, S., and Sivakumar, D. Affiliation networks. In Proc. 41st ACM Symposium on Theory of Computing(STOC09) (2009), pp. 427–434. [68] Leskovec, J., Backstrom, L., Kumar, R., and Tomkins, A. Microscopic evolution of social networks. In Proc. 14th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD’08) (2008), pp. 462–470. [69] Leskovec, J., Chakrabarti, D., Kleinberg, J., and Faloutsos, C. Realistic, mathematically tractable graph generation and evolution, using kronecker multiplication. In European Conference on Principles and Practice of Knowledge Discovery in Databases(PKDD05) (2005), pp. 133–145. [70] Leskovec, J., and Horvitz, E. Planetary-scale views on a large instant-messaging network. In Proc. 17th international conference on World Wide Web (WWW’08) (2008), pp. 915–924. [71] Leskovec, J., Kleinberg, J. M., and Faloutsos, C. Graph evolution: Densification and shrinking diameters. ACM Transactions on the Web (TWEB) 1, 1 (2007), 1–41. [72] Leskovec, J., Lang, K. J., Dasgupta, A., and Mahoney, M. W. Statistical properties of community structure in large social and information networks. In Proc. 17th international conference on World Wide Web (WWW’08) (2008), pp. 695–704. [73] Liben-Nowell, D., Novak, J., Kumar, R., Raghavan, P., and Tomkins, A. Geographic routing in social networks. Proc. National Academy of Sciences 102, 33 (2005), 11623–11628. [74] Lin, N., Dayton, P., and Greenwald, P. The urban communication network and social stratification: A “small world experiment. Communication yearbook 1 (1978), 107–119. [75] Mahdian, M., and Xu, Y. Stochastic kronecker graphs. In Proc. 5th Workshop on Algorithms and Models for the Web-Graph (WAW’07) (2007), pp. 179–186. [76] Mandelbrot, B. An informational theory of the statistical structure of languages. In Communication Theory. 1953, pp. 486–502. [77] McDiarmid, C. J. H. On the method of bounded differences. In Proc. 12th British Combinatorial Conference (1989), pp. 148–188. 
[78] Mihail, M., Papadimitriou, C. H., and Saberi, A. On certain connectivity properties of the internet topology. J. Comput. Syst. Sci 72, 2 (2006), 239–251. BIBLIOGRAPHY 127 [79] Milgram, S. The small world problem. Psychology Today 2 (1967), 60–67. [80] Mitzenmacher, M. A brief history of generative models for power law and lognormal distributions. Internet Mathematics 1, 2 (2003). [81] Mitzenmacher, M. Editorial: The future of power law research. Internet Mathematics 2, 4 (2006), 525–534. [82] Mosk-Aoyama, D., and Shah, D. Fast distributed algorithms for computing separable functions. IEEE Transactions on Information Theory 54, 7 (2008), 2997–3007. [83] Mutafchiev, L. The largest tree in certain models of random forests. Random Structures and Algorithms 13, 3-4 (1998), 211–228. [84] Naor, M. Succinct representation of general unlabeled graphs. Discrete Applied Mathematics 28 (1990), 303–307. [85] Newman, M. Properties of highly clustered networks. Phys Rev E Stat Nonlin Soft Matter Phys 68 (2003). [86] Paris, R. B., and Kaminsky, D. Asymptotics and the Mellin-Barnes Integrals. Cambridge University Press, 2001. [87] Peleg, D., and Upfal, E. A trade-off between space and efficiency for routing tables. Journal of Assoc. Comp. Mach 3, 36 (1989), 510–530. [88] Pittel, B. On spreading a rumor. SIAM Journal on Applied Mathematics 47 (1987), 213–223. [89] Raftery, A. E., Handcock, M. S., and Hoff, P. D. Latent space approaches to social network analysis. J. Amer. Stat. Assoc. 15, 460 (2002). [90] Raghavan, S., and Garcia-Molina, H. Representing web graphs. In Proc. 19th International Conference on Data Engineering(ICDE03) (2003), pp. 405–416. [91] Randall, K. H., Stata, R., Wiener, J., and Wickremesinghe, R. The Link database: Fast access to graphs of the web. In Proc. Data Compression Conference(DCC02) (2002), pp. 122–131. [92] Rao, S., and Richa, A. W. New approximation techniques for some linear ordering problems. SIAM Journal on Computing 2, 34 (2004), 388–404. 
[93] Sarkar, P., and Moore, A. W. Dynamic social network analysis using latent space models. ACM SIGKDD Explorations Newsletter 7, 2 (2005), 31–40. [94] Shieh, W., Chen, T., Shann, J. J., and Chung, C. P. Inverted file compression through document identifier reassignment. Information Processing and Management 1, 39 (2003), 117–131. 128 BIBLIOGRAPHY [95] Silvestri, F. Sorting out the document identifier assignment problem. In Proc. Advances in Information Retrieval, 29th European Conference on IR Research(ECIR07) (2007), pp. 101–112. [96] Silvestri, F., Perego, R., and Orlando, S. Assigning document identifiers to enhance compressibility of web search indexes. In Proc. 2004 ACM Symposium on Applied Computing (SAC) (2004), pp. 600–605. [97] Simon, H. On a class of skew distribution functions. Biometrika 42 (1955), 425–440. [98] Spielman, D. A., and Teng, S. H. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In Proc. 36th Annual ACM Symposium on Theory of Computing(STOC04) (2004), pp. 81–90. [99] Suel, T., and Yuan, J. Compressing the graph structure of the web. In Proc. Data Compression Conference(DCC01) (2001), pp. 213–222. [100] Szymanski, J. On the complexity of algorithms on recursive trees. Theoretical Computer Science 3, 74 (1990), 355–361. [101] Travers, J., and Milgram, S. An experimental study of the small world problem. Sociometry 4, 32 (1969), 425–443. [102] Turán, G. On the succinct representation of graphs. Discrete Applied Mathematics 8, 3 (1984), 289–294. [103] Watts, D., and Strogatz, S. Collective dynamics of ’small-world’ networks. Nature 393 (1998), 409–410. [104] Watts, D. J. A twenty-first century science. Nature 445 (2007), 489. [105] Watts, D. J., Dodds, P. S., and Newman, M. E. J. Identity and search in social networks. Science 296 (2002), 1302–1305. [106] Whittaker, E., and Watson, G. A Course in Modern Analysis. Cambridge University Press, 1996. [107] Witten, I. 
H., Moffat, A., and Bell, T. C. Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kauffman Publishers, 1999. [108] Zipf, G. K. Human Behavior and the Principle of Least Effort. Addison-Wesley, 1949.