Sapienza, Università di Roma
Dottorato di Ricerca in Computer Science
XXIII Ciclo – 2010
Algorithms and models for social networks
Silvio Lattanzi
Thesis Committee
Prof. Alessandro Panconesi (Advisor)
D. Sivakumar
Prof. Angelo Monti
Author’s address:
Silvio Lattanzi
Computer Science department
Sapienza, University of Rome
Via Salaria 113, 00198 Rome, Italy
e-mail: [email protected]
www: http://sites.google.com/site/silviolattanzi/
To Angela
and Barbara
Abstract
The coming together of the Internet and large-scale social networks (e.g. Flickr, Facebook, MSN, Bebo, Twitter) is having a deep and fruitful impact on the study of social networks. Thanks to the amount of data available nowadays, it is now possible to observe social phenomena with greater precision and to study their evolution at a relatively fine temporal scale. These opportunities, together with the increasing economic importance of the Internet economy, have created new interest in studying networks as a unifying theme of research across computer science, economics, sociology and the biological sciences. In this context, fully characterizing the statistical properties of these kinds of graphs and finding suitable stochastic models for them is now a central problem in theoretical and experimental computer science.
Within this umbrella, our research focus is to develop mathematical models of behavioral social networks and to study the performance of algorithms on real-world graphs, both from a theoretical and from a practical point of view. More specifically, we first give a new model that explains the evolving properties of social networks and we analyze its algorithmic implications. Then we consider the diffusion of information in real networks, and we give an explanation of rumor spreading based on a statistical property of social networks, namely the conductance. Finally, we study the compressibility of the World Wide Web and of its models, and we use our findings to design an algorithm to compress other social networks.
Acknowledgments
I owe my deepest gratitude to Prof. Alessandro Panconesi for his great guidance and advice. He inspired me to work in this field and continuously conveyed excitement about research. Without his suggestions and help this thesis would not have been possible.
I would like to thank Flavio Chierichetti for many insights he shared with me and for
the great time we spent together.
I am grateful to my host at Yahoo! Research, Ravi Kumar, and my host at Google, D. Sivakumar, for the invaluable ideas they shared with me, for suggesting many wonderful problems and for working with me on them.
I would like to thank Lorenzo Alvisi, Amitanand Aiyer, Allen Clement, Rafael Frongillo,
Federico Mari, Michael Mitzenmacher, Ben Moseley, Prabhakar Raghavan, Siddarth Suri,
Sergei Vassilvitskii, Andrea Vattani for being wonderful coauthors and great coworkers.
I am grateful to Benjamin Doerr, Kevin McCurley, Patrick Nguyen and Mark Sandler
for numerous insightful comments and discussions.
I am indebted to many of my university colleagues for supporting me during my PhD.
A special thanks goes to Federico Mari, Francesco Davì, Emanuele Fusco, Massimo Lauria,
Gaia Maselli, Igor Melatti, Simone Silvestri, Blerina Sinaimeri and Julinda Stefa.
I would like to show my gratitude to my family for their incredible support, help and
understanding.
Finally, I would like to thank Livio Romano, Stefano Pozio, Alessandro D’Amico, Valeria Morra, Delia De Siervo, Simona Tramontana, Guido Bolognesi, Donna Alvarez, Riccardo Romano, Gabriele Carracoy, Paolo Pino, Matteo Bonavia, Barbara Lattanzi, Giancarlo
Capezzuoli, Gianluca Gallo, Stefano Paoletti, Gioacchino Mendola, Andrea Giannantonio,
Alessandro Bonelli, Amit Lavy, Sascha Trifunovic and Alex Loddengard for all the wonderful
experiences we had together in the past three years.
Contents
1 Introduction
  1.1 Roadmap

2 Affiliation networks
  2.1 Introduction
  2.2 Our model
  2.3 Preliminaries
    2.3.1 Concentration Theorems
  2.4 Degree distribution of B(Q, U)
    2.4.1 Lipschitz condition for the random variable X_t^i
  2.5 Properties of the degree distribution of B(Q, U)
  2.6 Properties of the degree distributions of the graphs G(Q, E) and Ĝ(Q, Ê)
  2.7 Densification of edges
  2.8 Shrinking/stabilizing of the effective diameter
  2.9 Sparsification of G(Q, E)
    2.9.1 Sparsification with preservation of the distances from a set of relevant nodes
    2.9.2 Sparsification with a stretching of the distances
  2.10 Flexibility of the model

3 Navigability of Affiliation Networks
  3.1 Introduction
  3.2 Our model
  3.3 Preliminaries
    3.3.1 Concentration Theorems
  3.4 Properties of the model
  3.5 The crucial role of weak ties
  3.6 Local routing and the interest space
  3.7 Experiments

4 Gossip
  4.1 Introduction
  4.2 Related work
  4.3 Preliminaries
  4.4 Warm-up: a weak bound
  4.5 A tighter bound
  4.6 Push and Pull by themselves
  4.7 Optimality of Corollary 6.5.1

5 Compressibility of the Web graph
  5.1 Overview
  5.2 Preliminaries
  5.3 Incompressibility of the existing models
    5.3.1 Proving incompressibility
    5.3.2 Incompressibility of the preferential attachment model
    5.3.3 Incompressibility of the ACL model
    5.3.4 Incompressibility of the copying model
    5.3.5 Incompressibility of the Kronecker multiplication model
    5.3.6 Incompressibility of Kleinberg's small-world model
  5.4 The new web graph model
  5.5 Rich get richer
  5.6 Long get longer
  5.7 Compressibility of our model
  5.8 Other properties of our model
    5.8.1 Bipartite cliques
    5.8.2 Clustering coefficient
    5.8.3 Undirected diameter

6 Compressibility of social networks
  6.1 Introduction
  6.2 Related work
  6.3 Compression Schemes
    6.3.1 BV compression scheme
    6.3.2 Backlinks compression scheme
  6.4 Compression-friendly orderings
    6.4.1 Formulation
    6.4.2 Hardness results
  6.5 MLogA vs. MLinA vs. MLogGapA
  6.6 Hardness of MLogA
  6.7 Hardness of MLinGapA
  6.8 Lower bound: MLogA for expanders
    6.8.1 The shingle ordering heuristic
    6.8.2 Properties of shingle ordering
  6.9 Experimental results
    6.9.1 Data
    6.9.2 Baselines
    6.9.3 Compression performance
    6.9.4 Temporal analysis
    6.9.5 Why does shingle ordering work best?
    6.9.6 A cause of incompressibility
Chapter 1
Introduction
Over the past decade, the idea of networks as a unifying theme to study how social, technological, and natural systems are connected has emerged as an important and active research direction within computer science, biology, economics and sociology. Indeed, the growth and the relevance of Internet-based networks have drawn the interest of many researchers to this new topic. Furthermore, thanks to the technologies and the computing power available nowadays, it is possible to analyze, discover and explain for the first time the main properties of those networks.
Historically, sociological, economic and biological studies on the structure and evolution of networks were based only on local information, so their main results focused on the dynamics of small chunks of a network. They were based on small communities (the typical size of an analyzed network was about a hundred individuals) and consisted of just a few trials. Due to this lack of data, it was hard to obtain any result on the global structure of networks, or at least to verify such results rigorously. Furthermore, it was hard to analyze the interactions between nodes and communities on meaningful data sets, although, as pointed out by Watts in [104], this information is crucial to understand the structure and the dynamics of real networks.
In the Nineties, with the introduction of the Internet and the World Wide Web, it became possible for the first time to observe the dynamics and the evolution of those networks on a large scale. In addition, with the birth of the Web 2.0 and the consequent introduction of the blogosphere and social networks, it is now possible to access large data sets. These new opportunities, plus the computing power available nowadays, generated new interest in analyzing common patterns in real networks. These studies led to the conclusion that different kinds of networks share some macroscopic statistical properties. For example, by studying the World Wide Web, the Internet, protein interaction graphs, Facebook and several other real graphs, it has been noticed that all of them have similar degree distributions and similar community structure.
At the same time, the increasing economic relevance of the Internet-based economy and the proliferation of new social networks create an urge to study this interesting class of graphs more deeply, in order to design more efficient and better-performing algorithms for them. In this new area of research, three main trends arise independently:
• Statistical analysis of the data. In order to have a better understanding of the behavior of social networks, it is crucial to analyze the available data sets and to find common patterns. The main challenge here is to cope with the size of the data sets (often on the order of billions of nodes) and with the technical, bureaucratic and sometimes ethical issues involved in retrieving the data.
• Modeling the dynamics of social networks. The modeling effort goes hand in hand with the discovery of new statistical patterns in the data. After the discovery of the first static properties of real networks, it was clear that existing models failed to explain several properties observed in real networks. So recently there has been a lot of effort to come up with models for social networks that can explain such properties mathematically.
• Analysis of algorithms for social networks. Once it is possible to describe and model this class of graphs, it becomes of great interest to analyze the performance of known and new algorithms on them.
In this context, the thesis focuses on developing mathematical models of behavioral social networks and studying the performance of algorithms on real-world graphs from a theoretical and practical point of view. In particular, we address the following fundamental questions:
• Can we formulate a stochastic process that matches all the known static properties and the evolving properties of social networks? Can we use the new model to develop efficient algorithms?
• Can we build a model that at the same time explains the statistical properties and the
local routing properties of social networks?
• Can we explain the fast diffusion of information in social networks?
• Can we explain the compressibility of the Web? Can we achieve the same compression
rate for other social graphs?
In the following section we give a more precise definition of the problems studied in the thesis and an overview of our main contributions.
1.1 Roadmap
In this section we introduce our main results and give an overview of the organization of the
thesis.
Affiliation Networks [Chapter 2] As already outlined in the previous section, the problem of finding a suitable stochastic process that explains the properties of social networks has attracted a lot of attention from the theory community. In 2005 Leskovec et al., in a breakthrough paper [69], studied for the first time the evolving properties of social networks. The authors analyzed the behavior over time of the following social graphs: the ArXiv citation graph, the patent citation graph, the autonomous systems graph and a few co-authorship networks. In particular, they focused their attention on the average degree and the diameter of the graph over time, with surprising results.
The common wisdom before their work was that the average degree is constant over time and that the diameter grows slowly over time. Instead, they observed that the average degree actually grows over time and, even more surprisingly, that the diameter of social networks tends to shrink and finally stabilize over time.
These findings have several interesting implications for social network analysis and immediately invalidate all the previously known models for social networks.
In [67], we presented a new model that explains all the static and evolving properties of social networks, including densification and shrinking diameter. A nice aspect of our model is that it is based on the coevolution of a social network and an affiliation network, that is, a bipartite graph that captures the connections between people and interests. This idea has strong sociological roots and appeared for the first time in the groundbreaking work of Breiger [18].
More precisely, in our model there are two graphs that co-evolve in time: a bipartite graph on people and interests, and a social graph on people. Initially we start with two graphs, a bipartite graph of people and interests and a people graph, with the property that if two people share an interest in the bipartite graph they are friends in the people graph. At every time step we add an interest to the bipartite graph with probability α, or a person to both graphs with probability 1 − α (see Figure 1.1).
• When a new interest is added, it selects a prototype and copies a "perturbation" of its edges. Then the people graph is updated so that if two people share an interest in the bipartite graph they are friends.
• When a new person is added, he or she selects a prototype and connects to a "perturbation" of the prototype's neighborhood. Then the people graph is updated so that if two people share an interest in the bipartite graph they are friends. Finally, a constant number of preferential attachment edges going out from the new node are added in the people graph.
Note that in our model there are two graphs that evolve at the same time: a bipartite graph on people and interests, which we call the people-interest graph, and a social network on people, which we call the friendship graph.
In our model there are two kinds of social ties that arise independently. The first comes from a preferential attachment process, while the second comes from the existence of common interests. In this way we are able to combine the intuition of Breiger with the idea of centrality. Using this technique, we can prove formally that our model not only enjoys the usual static properties of real networks, but also the evolving ones observed in [71]. Intuitively, we get the static properties because we use a copying process to generate the bipartite graph, and we obtain the evolving properties because when we add an edge to the bipartite graph we add multiple edges to the people graph. In addition, by combining the effects of densification and of the preferential attachment edges, we are able to prove that the effective diameter initially shrinks and then stabilizes in time.
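To make the process concrete, here is a minimal simulation sketch in Python. It is illustrative only: the parameters alpha, keep and s are our own names, prototypes are chosen uniformly rather than preferentially, and the "perturbation" is implemented as random deletion; the precise process analyzed in Chapter 2 differs in these details.

import random

def evolve(steps, alpha=0.5, keep=0.8, s=1):
    # Toy co-evolution of an affiliation (bipartite) network and the
    # people graph derived from it by folding.
    interests = {0: {0, 1}}      # interest id -> set of member people
    friends = {0: {1}, 1: {0}}   # person id -> neighbors in the people graph
    endpoints = [0, 1]           # edge endpoints, for degree-proportional sampling

    def fold(members):
        # Folding rule: people sharing an interest become friends.
        for p in members:
            for q in members:
                if p != q and q not in friends[p]:
                    friends[p].add(q); friends[q].add(p)
                    endpoints.extend([p, q])

    for _ in range(steps):
        if random.random() < alpha:
            # New interest: copy a perturbation of a prototype's members.
            proto = random.choice(list(interests.values()))
            members = {p for p in proto if random.random() < keep}
            interests[len(interests)] = members
            fold(members)
        else:
            # New person: copy a perturbation of a prototype's interests,
            # then add s preferential-attachment edges in the people graph.
            new = len(friends)
            friends[new] = set()
            proto = random.choice(list(friends))
            for members in interests.values():
                if proto in members and random.random() < keep:
                    members.add(new)
                    fold(members)
            for _ in range(s):
                tgt = random.choice(endpoints)
                if tgt != new and tgt not in friends[new]:
                    friends[new].add(tgt); friends[tgt].add(new)
                    endpoints.extend([new, tgt])
    return interests, friends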
Figure 1.1: Insertion of a new person in the affiliation network and the social network derived from it. (A) The initial affiliation network and the related social graph. (B) Insertion of P4 in the affiliation network. (C) P4 selects P3 as prototype. (D) P4 copies a perturbation of the edges of P3. (E) The social graph is updated. (F) P4 adds some preferential attachment edges in the social graph.
Finally, we also analyze some algorithmic consequences of our model. Once we understand the causes of the densification of social networks, it is natural to ask whether we can produce sparse graphs that preserve the connectivity properties of the initial social network. Specifically, to overcome the difficulties of processing dense graphs, we study the performance of two simple sparsification algorithms in our model. We prove that there are sparsification algorithms that return a graph with a linear number of edges and that approximate all distances with constant distortion.
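To give a taste of what a distance-preserving sparsifier looks like, the following is the classical greedy (2k − 1)-spanner for unweighted graphs; we show it only as an illustrative baseline of this kind of algorithm, not as the two procedures analyzed in Section 2.9.

from collections import deque

def greedy_spanner(nodes, edges, k):
    # Keep an edge (u, v) only if u and v are currently more than
    # 2k-1 hops apart in the spanner built so far; the result has
    # stretch at most 2k-1.
    adj = {v: [] for v in nodes}

    def hop_distance(s, t, limit):
        # BFS from s, giving up beyond `limit` hops.
        dist = {s: 0}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            if x == t:
                return dist[x]
            if dist[x] == limit:
                continue
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        return limit + 1  # farther than limit, or unreachable

    kept = []
    for u, v in edges:
        if hop_distance(u, v, 2 * k - 1) > 2 * k - 1:
            adj[u].append(v); adj[v].append(u)
            kept.append((u, v))
    return kept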
Navigability and Affiliation Networks [Chapter 3] One of the main limitations of previously known evolving models is that they are not embedded in any space, so it is impossible to define the concept of local information and to study their navigability. This is not true for the Affiliation Networks model, where, using the concept of interest space, we are able to show that our model is navigable.
Figure 1.2: An affiliation network (A), the induced social network (B) and the hierarchy of interests (C). The dotted line from a to b in (A) represents that b is the prototype of a. In particular, in the figure I1 is the prototype of I2 and I3, and I3 is the prototype of I4. From those relationships we derive the interest tree represented in (C).
Specifically, an interesting peculiarity of the Affiliation Networks model is that a friendship graph and a people-interest graph co-evolve at the same time. In addition, using the notion of prototype inspired by the copying model, it is possible to define a prototype interest tree, where every interest is connected to its prototype (see Figure 1.2). So in our model there are actually three graphs that evolve at the same time: a friendship graph, a people-interest graph and a hierarchy of interests. We will refer to the latter as the "interest space".
Using this characteristic, we can embed every node of the social graph in the interest space using its interests (i.e. every node is embedded in the positions of its interests, and we define the distance between two nodes as the minimum distance between any pair of interests of the first and of the second node). Using this definition, it is possible to explain for the first time the "small-world phenomenon" in a model that matches all the static and evolving properties of real-world graphs. Specifically, we prove that in our model, if every node knows the interest space and its neighborhood in the social graph, the greedy local routing algorithm routes a message from any node to any other node in at most polylogarithmically many steps. Furthermore, if the receiver is a high-degree node (i.e. a "hub"), the algorithm uses only a constant number of rounds.
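The routing rule itself is simple to state; here is a minimal sketch, where neighbors(v) and dist(u, v) are assumed as given helpers (dist being the interest-space distance just defined, and every node having at least one neighbor):

def greedy_route(source, target, neighbors, dist, max_steps=1000):
    # Greedy local routing: repeatedly forward the message to the
    # neighbor that is closest to the target in the interest space.
    path, current = [source], source
    for _ in range(max_steps):
        if current == target:
            return path
        nxt = min(neighbors(current), key=lambda w: dist(w, target))
        if dist(nxt, target) >= dist(current, target):
            return None  # stuck in a local minimum
        current = nxt
        path.append(current)
    return None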
One of the most interesting features of our model is that it is the first attempt to create
a bridge between the study of the small-world phenomenon and the study of other statistical
properties of social networks. Finally, this is the first model that can explain Milgram’s
experiment if we include the presence of attrition [48], i.e. the unwillingness of people to
forward the messages.
In order to validate our model, we ran a cyber-replica of Milgram's experiment. We performed a series of Milgram-style experiments on the network of co-authorship in scientific papers, which naturally lends itself as a test-bed for evaluating our theory of social networks derived from affiliation networks. Furthermore, our experiments are also the first attempt to make a cyber-replica of Milgram's experiment based on the interest space, and it is also the first time that some rudimentary concepts of data mining are used to explore the navigability of social networks. The empirical findings of our experiments confirm Milgram's initial result and give stronger empirical evidence of the reliability of our model.
Gossiping in social networks [Chapter 4] One of the aims of Milgram's experiment was to show that information can be easily delivered in social networks. A similar question, which arises from everyday life, is whether information spreads efficiently in social networks. Indeed, in the real world it is possible to find many examples in which information, viruses or malware spread quickly in social graphs, so it is interesting to understand why this happens.
First, we give an algorithmic formalization of the problem. As a first step we study the well-known randomized broadcast algorithm, also known as rumor spreading. Demers et al. [30] defined three variants of this algorithm: PUSH, PULL and PUSH-PULL. In the PUSH strategy, in each round every informed node selects a neighbor uniformly at random and forwards the message to her. PULL is the symmetric variant: in each round, every node that does not yet have the message selects a neighbor uniformly at random and asks for the information. Finally, the PUSH-PULL strategy is a combination of the two techniques: in each round every informed node performs a PUSH and every uninformed node performs a PULL.
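A minimal sketch of one synchronous round of these strategies (with the graph given as an adjacency dict and every node having at least one neighbor; the helper names are ours):

import random

def push_pull_round(G, informed, push=True, pull=True):
    # One round of randomized rumor spreading on G. Every node contacts
    # one uniformly random neighbor; informed nodes PUSH the rumor to
    # their contact, uninformed nodes PULL it if their contact knows it.
    new = set(informed)
    for v in G:
        u = random.choice(G[v])
        if push and v in informed:
            new.add(u)
        if pull and v not in informed and u in informed:
            new.add(v)
    return new

def rounds_to_spread(G, start):
    informed, t = {start}, 0
    while len(informed) < len(G):
        informed = push_pull_round(G, informed)
        t += 1
    return t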
Second, instead of giving an arbitrary definition of social networks or analyzing the problem only in some specific model, we study the correlation between information dissemination and the conductance of the underlying network. In [72] it is shown empirically that social networks have high conductance. We prove that high conductance implies that the PUSH-PULL strategy is fast; specifically, we show that if a connected graph with n nodes has conductance φ, then rumor spreading successfully broadcasts a message within Õ(φ⁻¹ · log n) rounds, with high probability. This result is almost tight, since there exist graphs with n nodes and conductance φ whose diameter is Ω(φ⁻¹ · log n). Furthermore, we also show that high conductance is not a sufficient condition for PUSH or PULL to be efficient by themselves.
Compressible models for the Web graph [Chapter 5] Compressibility is a fundamental property of large-scale graphs: indeed, the ability to store the structure of these graphs using few bits has a great impact on the possibility of efficiently storing and manipulating these massive amounts of data. In an intriguing pair of papers, Boldi, Santini and Vigna [9, 10] showed that the web is compressible using just a few bits per link (2-3 bits per link on average). These findings suggest that the Web is compressible using just O(1) bits per link. Starting from this observation, we studied the compressibility of various well-known models in order to understand whether they can explain the good compression rate observed in [9, 10].
More precisely, we study the entropy of several well-known web graph models, and using a min-entropy argument we are able to prove that their entropy is too large to explain the compressibility of the Web: specifically, they need to store at least Θ(log n) bits per link, on average. For this reason we introduce and analyze mathematically a new evolving model for the Web graph that explains all the well-known static properties of the Web and that can also explain the good compression rate. In particular, our model achieves O(1) bits per link, on average.
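To make "bits per link" concrete, here is a toy estimate in the spirit of gap encoding: neighbors are sorted and each gap is charged its Elias-gamma length. This is only a simplified illustration, not the actual Boldi-Vigna scheme (which also exploits reference copying and other codes).

import math

def gap_bits_per_link(adj):
    # adj: dict node -> iterable of integer out-neighbors.
    # Charge each encoded value 2*floor(log2(x)) + 1 bits (Elias gamma).
    total_bits = total_links = 0
    for v, out in adj.items():
        prev = None
        for w in sorted(out):
            # first neighbor: distance from the source; then consecutive gaps
            gap = abs(w - v) + 1 if prev is None else w - prev
            total_bits += 2 * int(math.log2(gap)) + 1
            prev = w
            total_links += 1
    return total_bits / max(total_links, 1)

# Orderings with locality give small gaps, hence few bits per link:
path = {i: [i + 1] for i in range(999)}
print(gap_bits_per_link(path))  # 3.0 bits per link on this path graph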
Compressibility of social networks [Chapter 6] As underlined in the previous subsection, compressibility is a fundamental property of the Web graph. It is unclear whether other social networks share it. We study this problem formally for the first time, and we introduce a new algorithm that outperforms all previously known compression techniques.
First, we adapted the Boldi and Vigna technique to compress general social networks. The main problem with this approach is that Boldi and Vigna rely heavily on two properties of the URL ordering:
• Locality: web pages that are close in the ordering point to a similar set of pages.
• Proximity: the typical edge length is small.
Those properties arise naturally in the case of URL ordering, but it is not clear how to obtain them in the case of social networks: there, no natural ordering is available to sort the nodes.
Our first contribution is to define the concept of an ordering that is optimal for compression, and to show that finding the optimal ordering is NP-hard. We then design heuristics to overcome this hardness. Our new heuristic uses shingles [20] to measure the similarity of the outgoing edges of two nodes, and then orders the nodes in such a way that similar nodes appear close in the ordering.
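A minimal sketch of the shingle idea, assuming integer node identifiers and a single min-wise hash (the heuristic of Section 6.8.1 may differ in such details as the number of hash functions):

import random

def shingle_order(adj, seed=0):
    # Order nodes by the min-hash ("shingle") of their out-neighbor sets:
    # nodes whose out-neighborhoods overlap heavily are likely to share
    # the same shingle, and thus to end up close in the ordering.
    rnd = random.Random(seed)
    p = (1 << 31) - 1  # a Mersenne prime
    a, b = rnd.randrange(1, p), rnd.randrange(p)

    def shingle(v):
        out = adj[v]
        if not out:
            return (p, v)  # nodes without out-edges go last
        return (min((a * w + b) % p for w in out), v)

    return sorted(adj, key=shingle)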
We also propose a new compression method, inspired by the Boldi and Vigna algorithm, that in addition exploits link reciprocity in social networks. Finally, we perform an extensive set of experiments on four large real-world graphs, including two social networks. Our experimental results show that social networks and the Web graph exhibit different compressibility characteristics: even though we can compress those graphs using only 10 bits per link on average, we cannot compress any of them using fewer than 8 bits per link. These findings support the intriguing idea that entropy can be used to distinguish different kinds of social networks that have similar static properties but differ in the amount of randomness in their structure (indeed, the Web graph seems to be much more structured, due to the domain hierarchy).
We now move to the core of the thesis. The thesis is organized in six chapters (this introduction and the chapters described above); each chapter is self-contained, consisting of a brief introduction, an explanation of related work and the presentation of our results.
Chapter 2
Affiliation networks
In the last decade, structural properties of several naturally arising networks (the Internet,
social networks, the web graph, etc.) have been studied intensively with a view to understanding their evolution. In recent empirical work, Leskovec, Kleinberg, and Faloutsos
identify two new and surprising properties of the evolution of many real-world networks:
densification (the ratio of edges to vertices grows over time), and shrinking diameter (the
diameter reduces over time). These properties run counter to conventional wisdom, and are
certainly inconsistent with graph models prior to their work.
In this chapter, we present the first simple, realistic, and mathematically tractable generative model that intrinsically explains all the well-known properties of social networks, as well as densification and shrinking diameter. Our model is based on ideas studied empirically in the social sciences, primarily in the groundbreaking work of Breiger (1974) on bipartite models of social networks that capture the affiliation of agents to societies.
We also present algorithms that harness the structural consequences of our model. Specifically, we show how to overcome the bottleneck of densification in computing shortest paths
between vertices by producing sparse subgraphs that preserve or approximate shortest distances to all or a distinguished subset of vertices. This is a rare example of an algorithmic
benefit derived from a realistic graph model.
Finally, our work also presents a modular approach to connecting random graph paradigms
(preferential attachment, edge-copying, etc.) to structural consequences (heavy-tailed degree
distributions, shrinking diameter, etc.).
2.1
Introduction
The aim of this chapter is to develop mathematical models of real-world "social" networks that are realistic, mathematically tractable, and — perhaps most importantly — algorithmically useful. There are several models of social networks that are natural and realistic (fit available data) but are hard from an analytical viewpoint; the ones that are amenable to mathematical analysis or that have algorithmic significance are often unnatural or unrealistic. In contrast, we present a model, rooted in sociology, that leads to clean mathematical analysis as well as algorithmic benefits. (The work described in this chapter is joint work with D. Sivakumar; its extended abstract appeared in the Proceedings of the 41st ACM Symposium on Theory of Computing (STOC09) [67].)
We now briefly outline the history of significant recent developments in modeling real-world networks that provide the immediate context for our work. The numerous references
from and to these salient pieces of work will offer the reader a more comprehensive picture
of this area.
Internet and Web Graphs. One of the first observations that led to the interest in random graph models significantly different from the classical Erdös–Rényi models comes in the work of Faloutsos et al. [38], who noticed that the degree distribution of the Internet graph^1 is heavy-tailed, and roughly obeys a "power law," that is, for some constant α > 0, the fraction of nodes of degree d is proportional to d^(−α). Similar observations were made about the web graph^2 by Barabasi and Albert [5], who also presented models based on the notion of "preferential attachment," wherein a network evolves by new nodes attaching themselves to existing nodes with probability proportional to the degrees of those nodes.
Both works draw their inspiration and mathematical precedents from classical works of
Zipf [108], Mandelbrot [76], and Simon [97]. The latter work was formalized and studied
rigorously in [14, 15, 28]. Broder et al. [21] made a rich set of observations about the degree
and connectivity structure of the web graph, and showed that besides power-law degree
distribution, the web graph consisted of numerous dense bipartite subgraphs (often dubbed
“communities”).
Within theoretical CS, Aiello et al. [2], and Kumar et al. [65] presented two models of
random graphs, both of which offer rigorous explanations for power-law degree distributions;
the models of [65] also led to graphs with numerous dense bipartite subgraphs, the first
models to do so. The models of [65] are based on the notion of graph evolution by “copying,”
where each new vertex picks an existing vertex as its “prototype,” and copies (according to
some probabilistic model) its edges.
Preferential attachment and edge copying are two basic paradigms that both lead to
heavy-tailed degree distributions and small diameter. The former is simpler to analyze, and
indeed, despite its shortcomings with respect to explaining community structure, it has been
analyzed extensively [14, 15, 25, 26]. For an entirely different treatment, see [37].
Small-World Graphs. In another development, Watts and Strogatz [103], Kleinberg
[58, 59], and Dodds et al. [31] revisited a classic 1960’s experiment of the sociologist Stanley
Milgram [79], who discovered that, on average, pairs of people chosen at random from the
population are only six steps apart in the network of first-name acquaintances. In Kleinberg’s
model, vertices reside in some metric space, and a vertex is usually connected to most
other vertices in its metric neighborhood, and, in addition, to a few “long range” neighbors.
Kleinberg introduced an algorithmic twist, and proved the remarkable result that the network
has small diameter and easily discoverable paths iff the long-range neighbors are chosen in
a specific way.
^1 Loosely speaking, this is the graph whose vertices are computers and whose edges are network links.
^2 This is the graph whose vertices are web pages, and whose directed edges are hyperlinks among web pages.
Kleinberg’s models, dubbed “small-world networks,” offer a nice starting point to analyze
social networks^3. A piece of folklore wisdom about social networks is the observation that
friendship is mostly transitive, that is, if a and b are friends and b and c are friends, then
there is a good chance that a and c are friends as well. Kleinberg’s model certainly produces
graphs that satisfy this condition, but because of its stylized nature, isn’t applicable in
developing an understanding of real social networks. The other limitation of Kleinberg’s
model is that it is static, and is not a model of graph evolution.
Densification and Shrinking Diameter. Returning to the topic of evolving random
graphs, the next significant milestone is the work of Leskovec et al. [71], who made two
stunning empirical observations, both of which immediately invalidate prior models based
on preferential attachment, edge copying, etc., as well as the small-world models. Namely,
they reported that real-world networks became denser over time (super-constant average
degree), and their diameters effectively decreased over time!
The dual pursuits of empirical observations and theoretical models go hand in hand^4,
and the work of [71] poses new challenges for mathematical modeling of real-world networks.
Along with their observations, Leskovec et al. [71] present two graph models called the
“community guided attachment” and “forest fire model”. The former is a hierarchical model,
and the latter is based on an extension of edge copying. While several analytical results
are proved concerning these two models in [71], the models are quite complex and do not
admit analyses powerful enough to establish all the observed properties, most notably degree
distribution, densification, and shrinking diameter simultaneously.
The papers [69] and [75] study models explicitly contrived to be mathematically tractable
and yielding the observed properties, without any claims of being realistic or intuitively
natural. In the opposite direction, Leskovec et al. [68] propose a model that fits the data quite well, but that does not admit mathematical analysis. The crucial features of the latter
model are that edges are created based on preferential attachment and by randomly “closing
triangles.”
Affiliation Networks. Our design goals for a mathematical model of generic social
networks are that it should be simple to state and intuitively natural, sufficiently flexible
and modular in structure with respect to the paradigms employed, and, of course, by judicious choice of the paradigms, offer compelling explanations of the empirically observed
phenomena.
The underlying idea behind our model is that in social networks there are two types of
entities — actors and societies — that are related by affiliation of the former in the latter.
These relationships can be naturally viewed as bipartite graphs, called affiliation networks;
the social network among the actors that results from the bipartite graph is obtained by
“folding” the graph, that is, replacing paths of length two in the bipartite graph among
actors by an (undirected) edge. The central thesis in developing a social network as a folded
affiliation network is that acquaintances among people often stem from one or more common or shared affiliations: living on the same street, working at the same place, being fans of the same football club, having coauthored a paper together, etc.
Affiliation networks are certainly not new; indeed, this terminology is prevalent in sociology, and a fundamental 1974 paper of Breiger [18] appears to be the first one to explicitly address the duality of "persons and groups" in the context of "networks of interpersonal ties... [and] intergroup ties." Breiger notes that the metaphor of this "dualism" occurs as early as 1902 in the work of Cooley. Finally, the connectivity and the degree distribution of a similar, static version of this model have been studied previously in [52, 54, 85].

^3 Collaboration networks among authors, email and instant messaging networks, as well as the ones underlying Friendster, LiveJournal, Orkut, LinkedIn, MySpace, Facebook, Bebo, etc. Indeed, the work of [73] demonstrates interesting correlations of friendships on the LiveJournal network with geographic proximity as an underlying metric for a small-world model.
^4 See Mitzenmacher's editorial [81] for an eloquent articulation of this phenomenon.
Our model for the evolving affiliation network and the consequent social network incorporates elements of preferential attachment and edge copying in fairly natural ways. The folding rule we analyze primarily in this chapter is the one that places an undirected edge between every pair of (actor) nodes connected by a length-2 path in the bipartite graph. We comment briefly on some extensions for which our analyses continue to work, and more generally on the flexibility of our model, in Section 2.10. We show that when an affiliation network B is generated according to our model and its folding G on n vertices is produced, the resulting graphs satisfy the following properties:
(1) B has a power-law degree distribution, and G has a heavy-tailed degree distribution as well, and all but o(n) vertices of G have bounded degree;
(2) under a mild condition on the ratio of the expected degrees of actor nodes and society nodes in B, the graph G has a superlinear number of edges;
(3) under the same condition, the effective diameter of G stabilizes to a constant.
Algorithmic Benefits, and an Application. Although they are very interesting,
these structural properties do not yield any direct insight into the development of efficient
algorithms for challenging problems on large-scale graphs. With our model of networks based
on affiliation graphs, we take a significant step towards remedying this situation. We show
how we can approach path problems on our networks by taking advantage of a key feature in
their structure. Namely, we utilize the fact that even though the ultimate network produced
by the model is dense, there is a sparse (constant average degree) backbone of the network
given by the underlying affiliation network.
First we show that if we are given a large random set R of distinguished nodes and we
care about paths from arbitrary nodes to nodes in R, then we can sparsify the graph to have
only a small constant fraction of its edges, yet preserving all shortest distances to vertices
in R. Secondly, we show that if we are allowed some distortion, we can sparsify the graph
significantly via a simple algorithm for graph spanners: namely, we show that we can sparsify
the graph to have a linear number of edges, while stretching distances by no more than a
factor given by the ratio of the expected degree of actor and society nodes in the affiliation
network.
Finally, we mention our motivating example: a “social” network that emerges from search
engine queries, where these shortest path problems have considerable significance. The affiliation network here is the bipartite graph of queries and web pages (urls), with edges
between queries and urls that users clicked on for the query; by folding this network, we
may produce a graph on just the queries, whose edges take on a natural semantics of relatedness. Now suppose we are given a distinguished subset of queries that possess some
significance (high commercial value, topic names, people names, etc.). Given any query, we
can: find the nearest commercial queries (to generate advertisements), classify the query
into topics, or discover people associated with the query. We have empirically observed that
our sparsification algorithms work well on these graphs with hundreds of millions of nodes.
Critique of our work. In this chapter we only analyze the most basic folding rule,
namely replacing each society node in the affiliation network by a complete graph on its actors
in the folded graph. As noted in Section 2.10, this could be remedied somewhat without
losing the structural properties; we leave for future work a more detailed exploration of the
possibilities here.
The next drawback of our models is that, given a social network (or other large graph), it is not at all clear how one can test the hypothesis that it was formed by the folding of an affiliation network. The general problem of deciding, given a graph G on a set Q of vertices, whether it was obtained by folding an affiliation network on vertex sets Q and U, where |U| = O(|Q|), is NP-complete.
Finally, our model of folded affiliation networks seems limited to social networks among people related together by various attributes (the societies). A feature that is often seen in several large real networks and that appears to be missed by our model is the presence of an approximately hierarchical structure (for example, the Internet graph exhibits an approximate hierarchy in the form of autonomous systems, domains, intra- and inter-domain edges via gateways, and so forth).
2.2 Our model
In our model, two graphs evolve at the same time. The first one is a simple bipartite graph that represents the affiliation network; we refer to this graph as B(Q, U). The second one is the social network graph; we call this graph G(Q, E). The set Q is the same in both graphs. As defined, G(Q, E) is a multigraph, so we also analyze the underlying simple graph Ĝ(Q, Ê). For readability, we present the two evolution processes separately, even though the two graphs evolve together. More precisely, the bipartite graph B(Q, U) evolves independently, and at every step G(Q, E) is obtained by "folding" the edges of B(Q, U) and by adding some extra edges (the folding process is simply B(Q, U)^2 [Q], where B(Q, U)^2 is the usual product (composition) of B with itself and [Q] denotes the subgraph of B(Q, U)^2 induced by Q).
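In code, the folding operation B(Q, U)^2 [Q] is straightforward. The sketch below is our own illustrative rendering, assuming the bipartite graph is given as a map from each node of U to the set of its neighbors in Q; it returns the folded multigraph as edge multiplicities.

from itertools import combinations

def fold(bipartite):
    # Two nodes of Q get one edge for every neighbor in U
    # that they have in common.
    edges = {}
    for members in bipartite.values():
        for p, q in combinations(sorted(members), 2):
            edges[(p, q)] = edges.get((p, q), 0) + 1
    return edges

print(fold({"I1": {"P1", "P2"}, "I2": {"P1", "P2", "P3"}}))
# {('P1', 'P2'): 2, ('P1', 'P3'): 1, ('P2', 'P3'): 1}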
In order to understand the intuition behind this evolving process, let us consider, for
example, the citation graph among papers. In this case the bipartite graph consists of
papers, the set Q, and topics, the set U. Now when an author writes a new paper, he probably has in mind some older paper that serves as a prototype, and he is likely to write on (a subset of) the topics considered in this prototype. Similarly, when a new topic emerges in the literature, it is usually inspired by an existing topic (the prototype) and has probably been foreshadowed by older papers.
To continue the example of citation networks, the intuition behind the construction of G(Q, E) is that when an author writes the references of a new paper he will cite all, or most, of the papers on the same topics and some other papers of general interest.
B(Q, U):
Fix two integers c_q, c_u > 0, and let β ∈ (0, 1). At time 0, the bipartite graph B_0(Q, U) is a simple graph with at least c_q·c_u edges, where each node in Q has at least c_q edges and each node in U has at least c_u edges. At time t > 0:
(Evolution of Q) With probability β:
(Arrival) A new node q is added to Q.
(Preferentially chosen Prototype) A node q′ ∈ Q is chosen as prototype for the new node, with probability proportional to its degree.
(Edge copying) c_q edges are "copied" from q′; that is, c_q neighbors of q′, denoted by u_1, ..., u_{c_q}, are chosen uniformly at random (without replacement), and the edges (q, u_1), ..., (q, u_{c_q}) are added to the graph.
(Evolution of U) With probability 1 − β, a new node u is added to U following a symmetrical process, adding c_u edges to u.

G(Q, E):
Fix integers c_q, c_u, s > 0, and let β ∈ (0, 1). At time 0, G_0(Q, E) consists of the subset Q of the vertices of B_0(Q, U), and two vertices have an edge between them for every neighbor in U that they have in common in B_0(Q, U). At time t > 0:
(Evolution of Q) With probability β:
(Arrival) A new node q is added to Q.
(Edges via Prototype) An edge between q and another node in Q is added for every neighbor that they have in common in B(Q, U) (note that this is done after the edges for q are determined in B).
(Preferentially Chosen Edges) A set of s nodes q_{i_1}, ..., q_{i_s} is chosen, each node independently of the others (with replacement), by choosing vertices with probability proportional to their degrees, and the edges (q, q_{i_1}), ..., (q, q_{i_s}) are added to G(Q, E).
(Edges via evolution of U) With probability 1 − β: a new edge is added between two nodes q_1 and q_2 if the new node u added to U is a neighbor of both q_1 and q_2 in B(Q, U).
The same ideas that suggest this model as a reasonable model for the citation graph can be applied to several other social graphs as well.
We call folded any edge that is in G_0(Q, E) or has been added to G(Q, E) via the prototype or by the evolution of U; the set of folded edges is denoted by F. In the next section we introduce some notation and some results that we will use in this chapter.
2.3 Preliminaries
We say that an event occurs with high probability (whp) if it happens with probability 1 − o(1), where the o(1) term goes to zero as n, the number of vertices, goes to ∞. We denote by Δ the fraction 1/(4 + c_qβ/(c_u(1 − β))) and by Δ′ the fraction 1/(4 + c_u(1 − β)/(c_qβ)). We define e_{B_0} as the number of edges of the initial graph B_0(Q, U). Finally, we denote by c* and c_* respectively max(c_q, c_u) and min(c_q, c_u).
2.3.1 Concentration Theorems
We now recall two important properties of functions that make the task of establishing measure concentration results easier, and present the relevant concentration results from the literature.
Definition 2.3.1 [Averaged Lipschitz Condition] A function f satisfies the averaged Lipschitz condition with parameters c_j, j ∈ [n], with respect to the random variables X_1, ..., X_n if, for any a_j, a′_j and for 1 ≤ j ≤ n,

|E[f(X_1, ..., X_n) | X_1 = a_1, ..., X_j = a_j] − E[f(X_1, ..., X_n) | X_1 = a_1, ..., X_j = a′_j]| ≤ c_j.
Lemma 2.3.1 [cf. [77]] Assume f satisfies the averaged Lipschitz condition with respect to the variables X_1, ..., X_n with parameters c_j, j ∈ [n]. Then Pr[|f − E[f]| > t] ≤ 2 exp(−t²/(2c)), where c = Σ_{j≤n} c_j².
Definition 2.3.2 (Hereditary Property and Hereditary Function) A Boolean property ρ(x, J), where x is a sequence of n reals and J is an index set (a subset of [n]), is said to be a hereditary property of index sets if:
(1) ρ is a property of index sets, that is, if x_j = y_j for every j ∈ J, then ρ(x, J) = ρ(y, J);
(2) ρ is non-increasing on the index sets, that is, if I ⊆ J, then ρ(x, J) ⇒ ρ(x, I).
Let f_ρ(x) be the function determined by a hereditary property of index sets ρ, given by f_ρ = max_{J: ρ(x,J)} |J|; we will call f_ρ a hereditary function of index sets.
The concentration result for hereditary functions of index sets is a consequence of Talagrand's inequality [34].

Theorem 2.3.1 [[34]] Let M[f] be the median of f(x) and f_ρ(x) be a hereditary function of index sets. Then for all t > 0,

Pr[f > M[f] + t] ≤ 2 exp(−t²/(4(M[f] + t))), and Pr[f < M[f] − t] ≤ 2 exp(−t²/(4 M[f])).
The next proposition gives a passage from concentration theorems for the median value of a function to concentration theorems for its mean value.

Proposition 2.3.1 The following are equivalent for an arbitrary function f and random variables X_1, ..., X_n:
(1) For all t > 0, there exist c_1, α_1 > 0 such that Pr[|f − E[f]| > t] ≤ c_1 e^{−α_1 t}.
(2) For all t > 0, there exist c_2, α_2 > 0 such that Pr[|f − M[f]| > t] ≤ c_2 e^{−α_2 t}.
2.4 Degree distribution of B(Q, U)
Theorem 2.4.1 For the bipartite graph B(Q, U) generated after n steps, almost surely, when n → ∞, the degree sequence of nodes in Q (resp. U) follows a power law distribution with exponent α = −2 − c_qβ/(c_u(1 − β)) (resp. α = −2 − c_u(1 − β)/(c_qβ)), for every degree smaller than n^γ, with γ < Δ′ (resp. γ < Δ).
Before we present the proof, we recall the following useful lemma from [2].

Lemma 2.4.1 [[2]] If a sequence a_t satisfies the recursive formula a_{t+1} = (1 − b_t/t) a_t + c_t for t ≥ t_0, where lim_{t→∞} b_t = b > 0 and lim_{t→∞} c_t = c exist, then lim_{t→∞} a_t/t exists and equals c/(1 + b).
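As a quick numerical sanity check of the lemma, one can iterate the recursion with constant sequences b_t = b and c_t = c and compare a_t/t with the predicted limit c/(1 + b) (the values below are arbitrary):

def check(b=2.0, c=3.0, T=10**6):
    # Iterate a_{t+1} = (1 - b/t) a_t + c and compare a_T / T
    # with the predicted limit c / (1 + b).
    a = 1.0
    for t in range(1, T):
        a = (1 - b / t) * a + c
    return a / T, c / (1 + b)

print(check())  # both values are approximately 1.0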
Proof: (of Theorem 2.4.1) Let X_t^i be the random variable that counts the number of nodes in Q of degree i at time t. We will write E_t^i = E[X_t^i] in terms of E_{t−1}^i = E[X_{t−1}^i]. First, we analyze the case i = c_q:

E_t^{c_q} = E_{t−1}^{c_q} + Pr[a new node is added to Q]
 − E[number of nodes in Q with degree c_q at time t − 1 whose degree increases].

In the random process that generates B(Q, U), the degree of a node in Q can increase if and only if a node is added to U, so we have:

E_t^{c_q} = E_{t−1}^{c_q} + Pr[a new node is added to Q]
 − (1 − β) E[number of nodes in Q of degree c_q at time t − 1 whose degree increases | a node is added to U]
 = E_{t−1}^{c_q} + β − (1 − β) Σ_{i=1}^{c_u} Pr[a node whose degree is c_q at time t is chosen as endpoint for the i-th edge],

where the second equation comes from the linearity of expectation.

Let E_t^{c_q} | G(t − 1) denote the expectation of the random variable that counts the number of nodes in Q of degree c_q at time t, given that G(Q, E) at time t − 1 equals G(t − 1). Noticing that in the random process every edge endpoint in Q is equally likely to be chosen as the destination of the i-th new edge, we have:

E_t^{c_q} | G(t − 1) = E_{t−1}^{c_q} + β − (1 − β) c_u · (c_q E_{t−1}^{c_q})/(e_{t−1} + e_{B_0})
 = E_{t−1}^{c_q} (1 − (1 − β) c_u c_q/(e_{t−1} + e_{B_0})) + β,

where e_{t−1} is the number of edges added by the process until time t − 1 and e_{B_0} is the number of edges in B_0(Q, U). Using the Chernoff bound and summing over all the possible graphs G(t − 1), we obtain:

E_t^{c_q} = E_{t−1}^{c_q} (1 − (1 − β) c_u c_q/((c_qβ + c_u(1 − β))(t − 1) ± o(t) + e_{B_0})) + o(1) + β
 = E_{t−1}^{c_q} (1 − (1 − β) c_u c_q/((c_qβ + c_u(1 − β))(t − 1) ± o(t))) + o(1) + β
 = E_{t−1}^{c_q} (1 − (1 − β) c_u c_q/((c_qβ + c_u(1 − β))(t − 1))) (1 ± o(1)) + o(1) + β,

where the o(1) term accounts for the graphs G(t − 1) in which the number of edges is far from its mean. Thus, using Lemma 2.4.1, we have:

lim_{t→∞} E_t^{c_q}/t = β/(1 + (1 − β) c_u c_q/(c_qβ + c_u(1 − β))) = β(c_qβ + c_u(1 − β))/(c_qβ + c_u(1 − β) + (1 − β) c_u c_q).
Now let us analyze the general case i > c_u. We have:

E_t^i = E_{t−1}^i − E[number of nodes in Q with degree i at time t − 1 that increase their degree]
 + E[number of nodes in Q with degree smaller than i at time t − 1 that increase their degree to i].

Noticing that in the bipartite graph there are no multiple edges, and with an analysis similar to the case of c_q, we get:

E_t^i = E_{t−1}^i (1 − (1 − β) c_u i/e_{t−1}) + (1 − β) c_u ((i − 1)/e_{t−1}) E_{t−1}^{i−1} + o(1)
 = E_{t−1}^i (1 − (1 − β) c_u i/((c_qβ + c_u(1 − β))(t − 1))) (1 ± o(1))
 + (1 − β) c_u ((i − 1)/((c_qβ + c_u(1 − β))(t − 1))) (1 ± o(1)) E_{t−1}^{i−1} + o(1).
Let us define Y^i = lim_{t→∞} E_t^i/t. Thus, from Lemma 2.4.1, we have:

Y^i = [(1 − β) c_u (i − 1)/(c_qβ + c_u(1 − β))] / [1 + (1 − β) c_u i/(c_qβ + c_u(1 − β))] · Y^{i−1}
 = [(1 − β) c_u (i − 1)/(c_qβ + c_u(1 − β) + (1 − β) c_u i)] · Y^{i−1}
 = [(i − 1)/(i + 1 + c_qβ/(c_u(1 − β)))] · Y^{i−1}
 = Y^{c_u} · Π_{k=c_u+1}^{i} (k − 1)/(k + 1 + c_qβ/(c_u(1 − β)))
 = Y^{c_u} · (Γ(i)/Γ(c_u)) · (Γ(c_u + 2 + c_qβ/(c_u(1 − β)))/Γ(i + 2 + c_qβ/(c_u(1 − β))))
 ∼ i^{−2 − c_qβ/(c_u(1−β))}.
Finally, to obtain Theorem 2.4.1 we need to prove that the variables X_t^i are concentrated around their expected values. In order to do this, we describe our random process in terms of two random choices: at each time step, first a biased coin is tossed and a node is added to Q or to U according to the outcome, and then a set of endpoints is chosen. Let C_t and S_t be these two random choices at time t.
We show that the random variables X_t^i satisfy a certain "bounded differences" property, stated below in Lemma 2.4.2. Before proving Lemma 2.4.2, we complete the proof of Theorem 2.4.1 using Lemma 2.3.1: combining Lemma 2.4.2 and Lemma 2.3.1, we obtain the concentration result. Noticing that, by symmetry, all proofs also hold for the degree distribution of the nodes in U, Theorem 2.4.1 follows.
2.4.1 Lipschitz condition for the random variable X_t^i

Lemma 2.4.2 The random variables X_t^i satisfy the averaged Lipschitz condition with parameters 2c_u + (2c_u + 2c_q)(i + 1) with respect to the random variables C_1, S_1, ..., C_n, S_n.
Proof: First, we compute the averaged Lipschitz condition for the variable X_t^{c_q}, beginning with the change caused by choosing a different set of endpoints. At time j the maximum possible difference is c_u: if a node is added to Q, changing the endpoints of its starting edges has no effect on X_j^{c_q}, while if a node is added to U, changing the endpoints of its starting edges affects at most c_u nodes. Thus, recalling the notation E_t^i = E[X_t^i], and writing Ê_t^i for the corresponding expectation after the change, we have:

Δ_j^{c_q} = |E_j^{c_q} − Ê_j^{c_q}| ≤ c_u.
Now we have:

Δ_t^{c_q} = |E_t^{c_q} − Ê_t^{c_q}|
 = |E_{t−1}^{c_q} (1 − Σ_{k=1}^{e_{B_0}+(t−1)(c_u+c_q)} (1 − β) c_u (c_q/k) Pr(k edges at time t − 1)) + β
 − Ê_{t−1}^{c_q} (1 − Σ_{k=1}^{e_{B_0}+(t−1)(c_u+c_q)} (1 − β) c_u (c_q/k) Pr(k edges at time t − 1)) − β|
 = (1 − Σ_{k=1}^{e_{B_0}+(t−1)(c_u+c_q)} (1 − β) c_u (c_q/k) Pr(k edges at time t − 1)) |E_{t−1}^{c_q} − Ê_{t−1}^{c_q}|
 ≤ |E_{t−1}^{c_q} − Ê_{t−1}^{c_q}| ≤ ··· ≤ c_u.
Let us now consider the case in which the tossed coin gives a different outcome: in one case the j-th node n_j is added to Q, and in the other n_j is added to U. We then have:

Δ_j^{c_q} = |E_j^{c_q} − Ê_j^{c_q}| = |X_{j−1}^{c_q} + 1 − X_{j−1}^{c_q} + c_u (c_q E_{j−1}^{c_q})/e_{j−1}| ≤ c_u + 1.

Thus:

Δ_t^{c_q} = |E_t^{c_q} − Ê_t^{c_q}|
 = |E_{t−1}^{c_q} (1 − Σ_{k=1}^{e_{B_0}+(t−1)(c_u+c_q)} (1 − β) c_u (c_q/k) Pr(k edges at time t − 1 | n_j ∈ Q)) + β
 − Ê_{t−1}^{c_q} (1 − Σ_{k=1}^{e_{B_0}+(t−1)(c_u+c_q)} (1 − β) c_u (c_q/k) Pr(k edges at time t − 1 | n_j ∈ U)) − β|.

Now we cannot use the same trick as before, because the two conditionings yield different numbers of edges; however, Pr(k − c_q + c_u edges at time t − 1 | n_j ∈ U) = Pr(k edges at time t − 1 | n_j ∈ Q). In this case we assume w.l.o.g. that c_u ≥ c_q. So we have:
$$\Delta_t^{c_q} = \left|E_{t-1}^{c_q}\Big(1-\sum_{k=1}^{m_t}(1-\beta)c_u\tfrac{c_q}{k+c_u-c_q}P_k^U\Big)-\hat{E}_{t-1}^{c_q}\Big(1-\sum_{k=1}^{m_t}(1-\beta)c_u\tfrac{c_q}{k}P_k^U\Big)\right|$$
$$= \left|\big(E_{t-1}^{c_q}-\hat{E}_{t-1}^{c_q}\big)\Big(1-\sum_{k=1}^{m_t}(1-\beta)c_u\tfrac{c_q}{k}P_k^U\Big)+E_{t-1}^{c_q}\sum_{k=1}^{m_t}(1-\beta)c_u c_q\Big(\tfrac{1}{k+c_u-c_q}-\tfrac{1}{k}\Big)P_k^U\right|.$$
Now suppose by induction that $\Delta_{t-1}^{c_q} \le 2c_u+2c_q$. We have two cases. If the expression inside the absolute value is $\ge 0$, then
$$\Delta_t^{c_q} = \big(E_{t-1}^{c_q}-\hat{E}_{t-1}^{c_q}\big)\Big(1-\sum_{k=1}^{m_t}(1-\beta)c_u\tfrac{c_q}{k}P_k^U\Big)-E_{t-1}^{c_q}\sum_{k=1}^{m_t}(1-\beta)c_u c_q\tfrac{c_u-c_q}{k(k+c_u-c_q)}P_k^U$$
$$\le \big(E_{t-1}^{c_q}-\hat{E}_{t-1}^{c_q}\big)\Big(1-\sum_{k=1}^{m_t}(1-\beta)c_u\tfrac{c_q}{k}P_k^U\Big)-(c_u+c_q)\sum_{k=1}^{m_t}(1-\beta)P_k^U \le \max\!\big(\Delta_{t-1}^{c_q},\,c_u-c_q\big).$$
Otherwise, if the expression inside the absolute value is $<0$,
$$\Delta_t^{c_q} \le \big(\hat{E}_{t-1}^{c_q}-E_{t-1}^{c_q}\big)\Big(1-\sum_{k=1}^{m_t}(1-\beta)c_u\tfrac{c_q}{k}P_k^U\Big)+E_{t-1}^{c_q}\sum_{k=1}^{m_t}(1-\beta)c_u c_q\tfrac{c_u-c_q}{k(k+c_u-c_q)}P_k^U$$
$$\le \Delta_{t-1}^{c_q}-\Big(\Delta_{t-1}^{c_q}-(c_u-c_q)\Big)\sum_{k=1}^{m_t}(1-\beta)c_u\tfrac{c_q}{k}P_k^U \le 2c_u+2c_q.$$
Thus we have the $(2c_u+2c_q)$-averaged Lipschitz condition with respect to the variables $C_1,S_1,\cdots,C_n,S_n$ for the random variable $X_t^{c_q}$. Now we have to analyze the case of $X_t^i$. First, we begin by analyzing what the change is if a different set of endpoints is chosen. We have:
$$\Delta_j^i \le \left|E_j^i-\hat{E}_j^i\right| \le c_u.$$
So:
$$\Delta_t^i = \left|E_t^i-\hat{E}_t^i\right| = \Bigg|E_{t-1}^i\Big(1-\sum_{k=1}^{m_t}(1-\beta)c_u\tfrac{i}{k}P_k\Big)+\sum_{k=1}^{m_t}(1-\beta)c_u\tfrac{i-1}{k}P_k\,E_{t-1}^{i-1}-\hat{E}_{t-1}^i\Big(1-\sum_{k=1}^{m_t}(1-\beta)c_u\tfrac{i}{k}P_k\Big)-\sum_{k=1}^{m_t}(1-\beta)c_u\tfrac{i-1}{k}P_k\,\hat{E}_{t-1}^{i-1}\Bigg|$$
$$= \left|E_{t-1}^i-\hat{E}_{t-1}^i-\sum_{k=1}^{m_t}(1-\beta)c_u\tfrac{i}{k}P_k\big(E_{t-1}^i-\hat{E}_{t-1}^i\big)+\sum_{k=1}^{m_t}(1-\beta)c_u\tfrac{i-1}{k}P_k\big(E_{t-1}^{i-1}-\hat{E}_{t-1}^{i-1}\big)\right|.$$
Thus:
$$\Delta_t^i \le \Delta_{t-1}^i-\sum_{k=1}^{m_t}(1-\beta)c_u\tfrac{1}{k}P_k\big(i\,\Delta_{t-1}^i-(i-1)\,\Delta_{t-1}^{i-1}\big) \le \Delta_{t-1}^{i-1}+1.$$
Now let us focus on the case in which the tossed coin gives a different result: in one case we add the j-th node $n_j$ to Q and in the other we add $n_j$ to U. As before we assume w.l.o.g. that $c_u\ge c_q$. So we have that:
$$\Delta_j^i = \left|E_j^i-\hat{E}_j^i\right| = \left|X_{j-1}^i+c_u\frac{(i-1)X_{j-1}^{i-1}}{e_{j-1}}-X_{j-1}^i-c_u\frac{i\,X_{j-1}^i}{e_{j-1}}\right| \le c_u+1.$$
As before, the two outcomes can yield different numbers of edges, so we analyze this case in the same way:
$$\Delta_t^i = \left|E_t^i-\hat{E}_t^i\right| = \Bigg|E_{t-1}^i\Big(1-\sum_{k=1}^{m_t}(1-\beta)c_u\tfrac{i}{k+c_u-c_q}P_k^U\Big)+E_{t-1}^{i-1}\sum_{k=1}^{m_t}(1-\beta)c_u\tfrac{i-1}{k+c_u-c_q}P_k^U-\hat{E}_{t-1}^i\Big(1-\sum_{k=1}^{m_t}(1-\beta)c_u\tfrac{i}{k}P_k^U\Big)-\hat{E}_{t-1}^{i-1}\sum_{k=1}^{m_t}(1-\beta)c_u\tfrac{i-1}{k}P_k^U\Bigg|$$
$$\le \Delta_{t-1}^i-\big(i\,\Delta_{t-1}^i-(i-1)\,\Delta_{t-1}^{i-1}\big)\sum_{k=1}^{m_t}(1-\beta)c_u\tfrac{1}{k}P_k^U+\big(E_{t-1}^{i-1}-E_{t-1}^i\big)\sum_{k=1}^{m_t}(1-\beta)c_u c_q\Big(\tfrac{1}{k+c_u-c_q}-\tfrac{1}{k}\Big)P_k^U$$
$$\le \Delta_{t-1}^i-\big(i\,\Delta_{t-1}^i-(i-1)\,\Delta_{t-1}^{i-1}\big)\sum_{k=1}^{m_t}(1-\beta)c_u\tfrac{1}{k}P_k^U+\sum_{k=1}^{m_t}(1-\beta)c_u\tfrac{c_u-c_q}{k}P_k^U$$
$$\le \Delta_{t-1}^{i-1}+2c_u+2c_q.$$
By induction we have the $(2c_u+(2c_u+2c_q)(i+1))$-averaged Lipschitz condition with respect to the variables $C_1,S_1,\cdots,C_n,S_n$ for the random variable $X_t^i$.
2.5 Properties of the degree distribution of B(Q, U)

Here we explore several aspects of the evolution model. In particular, we start by showing that if a node in B(Q,U) has degree g(n) at time n, then it already had degree $\Theta(g(n))$ at time $\epsilon n$, for any constant $\epsilon>0$, and its degree increases by $\Theta(g(n))$ between time $\epsilon n$ and time n. First, we prove the following property.
Lemma 2.5.1 If a node in B(Q,U) has degree $g(n)\in\Omega(\log n)$ at time $\phi n$, with constant $0<\phi<1$, then, with high probability, at the end of the process it will have degree smaller than $\frac{1}{\varphi}\,g(n)\left(\frac{n-1}{\phi n}\right)^{\frac{c_q\beta-e_{B_0}}{c_*}}$ and larger than $\varphi\,g(n)\left(\frac{n-1}{\phi n}\right)^{\frac{c_q\beta-e_{B_0}}{c_*}}$, for any constant $0<\varphi<1$.
Proof: We want to compute the number of nodes in Q that will point to u at the end of the process, knowing that at time $\phi n$ its degree was g(n). Let $E_t^u$ be the expected degree of u at time t. We have that:
$$E_t^u = E_{t-1}^u\left(1+\frac{c_q\beta}{e_{t-1}}\right),$$
where $e_{t-1}$ is the number of edges at time t−1. Thus we have that:
$$E_{t-1}^u\left(1+\frac{c_q\beta}{c^*(t-1)+e_{B_0}}\right) < E_t^u < E_{t-1}^u\left(1+\frac{c_q\beta}{c_*(t-1)+e_{B_0}}\right).$$
So we have:
$$E_{\phi n}^u\,\frac{\Gamma\!\left(t-1+\frac{c_q\beta}{c^*}\right)\Gamma\!\left(\phi n+\frac{e_{B_0}}{c^*}\right)}{\Gamma\!\left(t-1+\frac{e_{B_0}}{c^*}\right)\Gamma\!\left(\phi n+\frac{c_q\beta}{c^*}\right)} < E_t^u < E_{\phi n}^u\,\frac{\Gamma\!\left(t-1+\frac{c_q\beta}{c_*}\right)\Gamma\!\left(\phi n+\frac{e_{B_0}}{c_*}\right)}{\Gamma\!\left(t-1+\frac{e_{B_0}}{c_*}\right)\Gamma\!\left(\phi n+\frac{c_q\beta}{c_*}\right)}.$$
Thus:
$$E_{\phi n}^u\left(\frac{n-1}{\phi n}\right)^{\frac{c_q\beta-e_{B_0}}{c^*}} < E_n^u < E_{\phi n}^u\left(\frac{n-1}{\phi n}\right)^{\frac{c_q\beta-e_{B_0}}{c_*}}.$$
Now we have to show that the degree of u is concentrated around its mean; in order to do so we use Theorem 2.3.1 combined with Proposition 2.3.1. Indeed the degree of u can be seen as a hereditary function on the set of edges, where the boolean property associated with the hereditary function is: having all the endpoints in U equal to u. Further, we have that both the lower bound and the mean value of f are in $\Theta(g(n))$, so $M[f]\in\Theta(g(n))$. So by Proposition 2.3.1 and Theorem 2.3.1 we have that $E_t^u$ is concentrated.
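The growth factor appearing in the lemma can also be probed numerically. The following sketch iterates only the expected-degree recursion above under the deterministic approximation $e_t\approx c_* t$, where we take $c_* = c_q\beta+c_u(1-\beta)$ (an assumption made purely for illustration; it is not the full random process), and compares the result with the predicted factor $(1/\phi)^{c_q\beta/c_*}$:

```python
# Minimal sketch: expected-degree growth of a node of B(Q, U) from time phi*n to n,
# assuming e_t ~ c_star * t; parameter values are arbitrary illustrations.
c_q, c_u, beta = 3, 2, 0.6
c_star = beta * c_q + (1 - beta) * c_u     # expected number of edges added per step
n, phi = 10**6, 0.1

growth = 1.0                               # degree at time phi*n, normalized to 1
for t in range(int(phi * n) + 1, n + 1):
    growth *= 1 + c_q * beta / (c_star * (t - 1))

print(growth, (1 / phi) ** (c_q * beta / c_star))  # the two values nearly agree
```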
The two following corollaries are obtained from the previous lemma by selecting its parameters carefully.

Lemma 2.5.2 If a node in B(Q,U) has degree g(n) at time n, with $g(n)\in\omega(\log n)$, then, with high probability, it had degree $\Omega(g(n))$ also at time $\epsilon n$, for any constant $\epsilon>0$.

Lemma 2.5.3 If a node u in B(Q,U) has degree $\Theta(n^\lambda)$ at the end of the process, then a δ fraction of the nodes pointing to u have been inserted after time $\phi n$, for any constants $0<\delta,\lambda<1$ and for a constant φ that depends on δ.
The last lemma gives an upper bound on the number of edges in B(Q,U) that point to a node of U of degree at least i. This lemma is important in order to upper bound the probability of pointing to a high degree node.

Lemma 2.5.4 At any time $\phi n$, for any $0<\phi\le 1$, the number of edges in B(Q,U) that point to a node in U of degree at least i is $\Theta\!\left(n\,i^{-\frac{c_u(1-\beta)}{c_q\beta}}\right)$, for any i up to $n^\gamma$, with $\gamma<\frac{1}{4+\frac{c_u(1-\beta)}{c_q\beta}}$.
Proof: Let $Z_t^j$ be the number of edges with the endpoint in U of degree j at time t. We have that $Z_t^j = jX_t^j$, where $X_t^j$ is the number of nodes of degree j in U at time t. By Theorem 2.4.1, we have that $\frac{X_n^j}{n}\in\Theta\!\left(j^{-2-\frac{c_u(1-\beta)}{c_q\beta}}\right)$, for j up to $n^\gamma$, with $\gamma<\frac{1}{4+\frac{c_u(1-\beta)}{c_q\beta}}$. Hence $Z_n^j\in\Theta\!\left(n\,j^{-1-\frac{c_u(1-\beta)}{c_q\beta}}\right)$, thus
$$\sum_{j=i}^n Z_n^j \in \Theta\!\left(n\,i^{-\frac{c_u(1-\beta)}{c_q\beta}}-n\,n^{-\frac{c_u(1-\beta)}{c_q\beta}}\right) = \Theta\!\left(n\,i^{-\frac{c_u(1-\beta)}{c_q\beta}}\right)$$
for i up to $n^\gamma$, with $\gamma<\frac{1}{4+\frac{c_u(1-\beta)}{c_q\beta}}$.
Finally, we have to prove that this holds at every time $\phi n$, for any $0<\phi\le 1$. By Lemma 2.5.2 we know that if $X_n^j = g(n)$ then $X_{\phi n}^j = \Theta(g(n))$; thus, using the same technique, we can prove that the same property also holds for the variables $Z_t^j$. So $\sum_{j=i}^n Z_t^j = \Theta\!\left(\sum_{j=i}^n Z_n^j\right)$, for any $t>\phi n$.
2.6 Properties of the degree distributions of the graphs G(Q, E) and Ĝ(Q, Ê)

Although they are derived from B(Q,U), computing the degree distributions of G(Q,E) and of Ĝ(Q,Ê) is much harder; in this section we show some interesting properties of the degree distributions of the folded graphs. First we show that the probability of a random node of G having high degree dominates the complementary cumulative distribution function of the degree distribution of the nodes in U in B(Q,U). Then, by construction, a similar theorem follows with respect to the nodes in Q. Together, these results imply:

Theorem 2.6.1 The degree distributions of the graphs G(Q,E) and Ĝ(Q,Ê) are heavy-tailed.
Proposition 2.6.1 For the folded graphs G(Q,E) and Ĝ(Q,Ê) generated after n steps, almost surely, when $n\to\infty$, the complementary cumulative distribution function of the degrees of nodes inserted after time $\phi n$, for any $0<\phi<1$, dominates the complementary cumulative distribution of a power law with exponent $\alpha = -2-\frac{c_u(1-\beta)}{c_q\beta}$, for every degree smaller than $n^\gamma$, with $\gamma<\frac{1}{4+\frac{c_u(1-\beta)}{c_q\beta}}$, and in $\omega(\log^{2+\epsilon} n)$.
Proof: Let $Q^i$ be the number of nodes inserted after time $\phi n$ and with degree at least i in G(Q,E). Instead of computing $Q^i$ directly, we show that $Q^i$ is bigger than a random variable which is in $\Theta\!\left(n\,i^{-\frac{c_u(1-\beta)}{c_q\beta}}\right)$.
Let $S^i$ be the number of edges inserted after time $\phi n$, pointing to a node of degree at least i, and such that for every $(a,b)\in S^i$, if $a\in Q$ then $(a,b)$ is the oldest edge pointing to the node a. By definition the following inequality holds: $S^i\le Q^i$. Now, by Lemma 2.5.2, we know that all the nodes inserted after time $\phi n$ will have degree in $O(\log n)$ in B(Q,U). So any node counted by $Q^i$ has degree in $O(\log n)$ in B(Q,U), and only $c_u$ of its neighbors can have degree in $\omega(\log n)$. Hence if a node inserted after time $\phi n$ has degree $i\in\omega(\log^{2+\epsilon} n)$, at least one of its initial neighbors has degree in $\Theta(i)$ in B(Q,U).
Now, by Lemma 2.5.4, there are $\Theta\!\left(n\,i^{-\frac{c_u(1-\beta)}{c_q\beta}}\right)$ edges pointing to nodes of degree at least i, for i up to $n^\gamma$ with $\gamma<\frac{1}{4+\frac{c_u(1-\beta)}{c_q\beta}}$; thus there is a $p^*\in\Theta\!\left(i^{-\frac{c_u(1-\beta)}{c_q\beta}}\right)$ such that $p^* < \Pr[\text{copying an edge pointing to a node of degree at least }i\text{ at time }t]$, for any $t\ge\phi n$.
Now $S^i$ dominates the number of heads that we get if we flip $\Theta((1-\phi)n)$ times a biased coin that gives head with probability $p^*$. Thus, applying the Chernoff bound, we have with high probability $\Theta(p^*(1-\phi)n)\le Q^i$. Hence $Q^i\in\Omega\!\left(n\,i^{-\frac{c_u(1-\beta)}{c_q\beta}}\right)$, and $\Pr[\text{a node inserted after time }\phi n\text{ has degree}>i]\in\Omega\!\left(i^{-\frac{c_u(1-\beta)}{c_q\beta}}\right)$.
Proposition 2.6.2 For the folded graphs G(Q,E) and Ĝ(Q,Ê) generated after n steps, almost surely, when $n\to\infty$, the complementary cumulative distribution function of the degrees of the nodes dominates the complementary cumulative distribution function of a power law distribution with exponent $\alpha = -2-\frac{c_q\beta}{c_u(1-\beta)}$, for every degree smaller than $n^\gamma$, with $\gamma<\frac{1}{4+\frac{c_q\beta}{c_u(1-\beta)}}$.
The proof follows from the definition of the co-evolution of the graphs.
Finally, we show that most of the nodes have degree in Θ(1). Recall that F is the set of edges obtained by the folding process.

Proposition 2.6.3 For the folded graphs G(Q,E) and Ĝ(Q,Ê) generated after n steps, all but o(n) nodes have degree in Θ(1).

Proof: We start by noticing that we can restrict our attention to the edges in F, because $|E-F|\in\Theta(n)$: indeed, only o(n) nodes have degree in ω(1) in the graph G(Q,E−F).
Further, by Theorem 2.4.1, all but o(n) nodes in B(Q,U) have degree in Θ(1). In addition, recall that by Lemma 2.5.4 only an o(n) fraction of the edges in B(Q,U) point to a non-constant degree node in U. Hence only o(n) nodes increase their degree by more than a constant factor.
2.7 Densification of edges

In this section we prove that the numbers of edges of the graphs G(Q,E) and Ĝ(Q,Ê) are in ω(|Q|).

Theorem 2.7.1 If $c_u<\frac{\beta}{1-\beta}c_q$, the number of edges in G(Q,E) is ω(n).
Proof: We notice that every node u of U in B(Q,U) gives rise in G(Q,E) to a clique where all neighbors of u are connected. Thus we can lower bound the number of edges in the graph G(Q,E) as follows:
$$|E| > \sum_{i=1}^n (\#\text{ of nodes of degree } i \text{ in } U)\binom{i}{2} \ge \sum_{i=1}^N (\#\text{ of nodes of degree } i \text{ in } U)\binom{i}{2},$$
where $N = n^\gamma$, with $\gamma<\frac{1}{4+\frac{c_q\beta}{c_u(1-\beta)}}$. By Theorem 2.4.1, with high probability:
$$|E| > \sum_{i=1}^N \frac{n}{\zeta\!\left(-2-\frac{c_u(1-\beta)}{c_q\beta}\right)}\,\frac{1\pm o(1)}{i^{2+\frac{c_u(1-\beta)}{c_q\beta}}}\binom{i}{2} \in \omega(n).$$
Indeed, since $c_u<\frac{\beta}{1-\beta}c_q$ implies $\frac{c_u(1-\beta)}{c_q\beta}<1$, each summand is $\Theta\!\left(n\,i^{-\frac{c_u(1-\beta)}{c_q\beta}}\right)$, so the sum is $\Theta\!\left(n\,N^{1-\frac{c_u(1-\beta)}{c_q\beta}}\right)\in\omega(n)$.
Theorem 2.7.2 If $c_u<\frac{\beta}{1-\beta}c_q$, the number of edges in Ĝ(Q,Ê) is in ω(n).
Proof: In order to prove the claim we start by noticing that the number of edges of Ĝ(Q,Ê) increases only when a node is added to Q; indeed, when a new node is added to U only multiple edges and self loops are introduced. In the proof we restrict our attention to the edges in F with one endpoint in a node of degree bigger than $n^\lambda$.
First, we compute the number of nodes of degree bigger than $\mu n^\lambda$; let us call this set H.
$$|H| = n - \#\text{ of nodes of degree smaller than } \mu n^\lambda = \Theta\!\left(n-n\sum_{i=1}^{\mu n^\lambda-1}\frac{1}{\zeta\!\left(-2-\frac{c_u(1-\beta)}{c_q\beta}\right) i^{2+\frac{c_u(1-\beta)}{c_q\beta}}}\right)$$
$$= \Theta\!\left(n-n\left(1-\frac{1}{\zeta\!\left(-2-\frac{c_u(1-\beta)}{c_q\beta}\right)}\sum_{i=\mu n^\lambda}^n \frac{1}{i^{2+\frac{c_u(1-\beta)}{c_q\beta}}}\right)\right) = \Theta\!\left(n^{1-\lambda-\lambda\frac{c_u(1-\beta)}{c_q\beta}}\right),$$
where in the last passage we use the fact that $\sum_{i=k}^n \frac{1}{i^\alpha} = \Theta(k^{1-\alpha}-n^{1-\alpha})$.
When a node q is added to Q, it adds $c_q$ edges to nodes in U. Let us denote by $e_q = (q,u_1)$ the first edge added by q. Then q introduces in Ĝ(Q,Ê) a number of edges larger than the degree of $u_1$. So if $u_1$ is in H, q introduces at least $\mu n^\lambda$ new edges. Let us call an edge interesting if it points to H and it is the first edge added by a new node in Q; we would like to lower bound the number of interesting edges added after time $\phi n$.
We start by lower bounding the number of edges added before time $\phi n$ and pointing to a node in H. Using Lemma 2.5.2 and the estimate of |H| we get that the number of those edges is $\Theta\!\left(n^{1-\lambda\frac{c_u(1-\beta)}{c_q\beta}}\right)$. Thus the number of interesting edges added after time $\phi n$ dominates the number of heads that we get if we toss $(1-\phi)n$ times a biased coin which gives head with probability $\Theta\!\left(\frac{n^{1-\lambda\frac{c_u(1-\beta)}{c_q\beta}-\varepsilon}}{(c_u+c_q)n}\right) = \Theta\!\left(n^{-\lambda\frac{c_u(1-\beta)}{c_q\beta}-\varepsilon}\right)$, for any small $\varepsilon>0$.
So, using the Chernoff bound, we have that w.h.p. the number of interesting edges introduced after time $\phi n$ is $\Omega\!\left(n^{1-\lambda\frac{c_u(1-\beta)}{c_q\beta}-\varepsilon}\right)$. Now, recalling that each one of these edges introduces in the folded graph at least $\Theta(n^\lambda)$ edges, the claim follows.
2.8 Shrinking/stabilizing of the effective diameter

We use the definition of the ψ-effective diameter given in [71].

Definition 2.8.1 (Effective Diameter) For $0<\psi<1$, we define the ψ-effective diameter as the minimum $d_e$ such that, for at least a ψ fraction of the reachable node pairs, the shortest path between the pair is at most $d_e$.
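For concreteness, the ψ-effective diameter of a given graph can be estimated directly from this definition by sampling reachable pairs and taking the ψ-quantile of their shortest-path lengths. The sketch below is our illustration only; it assumes the networkx library, which is not a tool used in this thesis:

```python
import random
import networkx as nx

def effective_diameter(G, psi=0.9, samples=1000):
    """Estimate the psi-effective diameter: the smallest d such that at least
    a psi fraction of the sampled reachable pairs are within distance d."""
    nodes = list(G.nodes())
    dists = []
    while len(dists) < samples:
        u, v = random.sample(nodes, 2)
        try:
            dists.append(nx.shortest_path_length(G, u, v))
        except nx.NetworkXNoPath:
            continue                      # unreachable pair: not counted
    dists.sort()
    return dists[int(psi * (len(dists) - 1))]

# Example usage on a random graph with average degree about 10.
print(effective_diameter(nx.erdos_renyi_graph(2000, 0.005)))
```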
In this section we show that the effective diameters of G(Q,E) and Ĝ(Q,Ê) shrink or stabilize over time. The intuition behind these proofs is that even if a person q is not interested in any popular topic, and so is not linked to any popular topic in B(Q,U), with high probability at least one friend of q is interested in a popular topic.

Theorem 2.8.1 If $c_u<\frac{\beta}{1-\beta}c_q$, the ψ-effective diameter of the graph G(Q,E) shrinks or stabilizes after time $\phi n$ with high probability, for any $0<\phi<1$ and for any constant $0<\psi<1$.
Proof: Let H be the set of nodes of U in B(Q,U) with degree $\ge n^\alpha$, for a small $\alpha>0$. By Lemma 2.5.2 every node in H has been inserted in the graph before time $\gamma n$, for any $0<\gamma<1$. Thus the diameter of the neighborhood of H in G(Q,E) shrinks or stabilizes after time $\gamma n$.
Now we want to show that all but o(n) of the nodes inserted after time $\phi n$ have at least one neighbor that is in the neighborhood of H in B(Q,U). Hence we will be able to upper bound the ψ-effective diameter by $\mathrm{diam}(H)+2$, for any constant $\psi<1$.
The number of edges that have as one endpoint a node which is a neighbor of H is lower bounded by the number of edges generated by the existence of nodes in H. At any time after $\phi n$ the number of these edges can be lower bounded, as in Theorem 2.7.1, by
$$\sum_{i=n^\alpha}^N \frac{n}{\zeta\!\left(-2-\frac{c_u(1-\beta)}{c_q\beta}\right)}\,\frac{1\pm o(1)}{i^{2+\frac{c_u(1-\beta)}{c_q\beta}}}\binom{i}{2}, \qquad N = n^\gamma,\ \ \gamma<\frac{1}{4+\frac{c_q\beta}{c_u(1-\beta)}},$$
thus they are in $\Omega\!\left(n^{1+\gamma\left(1-\frac{c_u(1-\beta)}{c_q\beta}\right)}\right)$. Instead, the number of edges whose endpoints are not neighbors of H can be upper bounded by
$$\sum_{i=1}^{n^\alpha} \frac{n}{\zeta\!\left(-2-\frac{c_u(1-\beta)}{c_q\beta}\right)}\,\frac{1\pm o(1)}{i^{2+\frac{c_u(1-\beta)}{c_q\beta}}}\binom{i}{2}+sn \in \Theta\!\left(n^{1+\alpha\left(1-\frac{c_u(1-\beta)}{c_q\beta}\right)}\right),$$
where the first term of the sum represents all the edges created by nodes of U in B(Q,U) that are not in H, and the second term represents all the edges added to the graph by a choice based on preferential attachment in G(Q,E).
Now, when a new node v arrives at a time between $\phi n$ and n, it chooses a set of nodes $q_{i_1},\ldots,q_{i_s}$ independently with probability proportional to their degrees, and it connects to those nodes. Thus, by fixing $\alpha<\frac{1}{4+\frac{c_q\beta}{c_u(1-\beta)}}$, we have that v will point with high probability to a node that is a neighbor of H in B(Q,U). Hence for at least a ψ fraction of the reachable node pairs, the shortest path length between the pair is at most $\mathrm{diam}(H)+2$.
Theorem 2.8.2 If $c_u<\frac{\beta}{1-\beta}c_q$, the ψ-effective diameter of the graph Ĝ(Q,Ê) is upper bounded by a constant with high probability at any time $\phi n$, for $0<\phi<1$.

Proof: This proof is mostly the same as the proof of Theorem 2.8.1; the only difference is that we cannot use the same lower bound for the edges that have an endpoint in the neighbors of H. Let $0<\phi$; using the same techniques of Theorem 2.7.2, we have that the number of edges pointing to nodes of degree at least $n^\delta$ inserted between time 0 and $\phi n$ is $\Omega\!\left(n^{1+\delta\left(1-\frac{c_u(1-\beta)}{c_q\beta}\right)-\varepsilon}\right)$. Thus, fixing the δ and the α of the proof of Theorem 2.8.1 such that α is smaller than $\delta\left(1-\frac{c_u(1-\beta)}{c_q\beta}\right)-\varepsilon$, we have that also in this case the probability of choosing as destination of an edge a node that is not in the neighbors of H is o(1). Hence, using the same arguments of Theorem 2.8.1, the result follows.
2.9 Sparsification of G(Q, E)

Several interesting algorithms (e.g., Dijkstra's algorithm) have complexity proportional to the number of edges in the graph. So, to tame the hardness implicit in the densification of the edges of social networks, in this section we study the performance of two sparsification algorithms.
First we analyze a setting in which we have a set of several relevant, or distinguished, nodes and we want to preserve all the distances between a relevant node and every other node. The set of relevant nodes has cardinality at most $\frac{n}{\log n}$ and is chosen uniformly at random. For this case, we present an algorithm, algorithm A, which, with high probability, generates from G(Q,E) a new graph G′(Q,E′), with $|E'|\le\delta|E|$ and $0<\delta<1$, such that for any node u in G and any relevant node v, a path of shortest distance in G is also present in G′.
In the second setting, in which a constant stretching of distances is allowed, we show that there exists an algorithm that reduces the number of edges to Θ(n) both in G(Q,E) and in Ĝ(Q,Ê).
2.9.1 Sparsification with preservation of the distances from a set of relevant nodes

We start by describing algorithm A, the sparsification algorithm.

Input: G(Q,E) and a set R of relevant nodes.
(1) Initially, label all edges deletable.
(2) For each node a ∈ R:
(a) Compute the breadth first search tree starting from node a, exploring the children of a node in increasing order of insertion.
(b) Label all edges in the breadth first search tree of node a as undeletable.
(3) Delete all edges labeled as deletable.
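A direct implementation of algorithm A is straightforward. The sketch below is ours, for illustration only; it assumes that node identifiers reflect insertion order, so that visiting neighbors in increasing identifier order matches step (a), and it returns the set of undeletable edges (everything else is removed in step (3)):

```python
from collections import deque

def algorithm_A(adj, relevant):
    """Sketch of algorithm A: adj maps each node to the set of its neighbors;
    nodes are assumed to be numbered by insertion time."""
    undeletable = set()
    for root in relevant:
        seen = {root}
        queue = deque([root])
        while queue:                       # BFS rooted at the relevant node
            x = queue.popleft()
            for y in sorted(adj[x]):       # children in increasing insertion order
                if y not in seen:
                    seen.add(y)
                    undeletable.add((min(x, y), max(x, y)))  # BFS-tree edge
                    queue.append(y)
    return undeletable                     # all remaining edges are deleted
```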
Theorem 2.9.1 Suppose the set of relevant nodes R has cardinality $\frac{n}{\log n}$ and suppose that the elements of R are chosen uniformly at random from Q. If $c_u\le\frac{\beta}{1-\beta}c_q$, algorithm A with high probability generates from G(Q,E) a new graph G′(Q,E′), with $|E'|\le\delta|E|$ for some small constant $0<\delta<1$, in which the distance between every pair of nodes (a,b) is preserved if at least one of the two nodes is in R.
Before proving Theorem 2.9.1 we introduce two useful lemmata.

Definition 2.9.1 (Useless Nodes) For a node $u\in U$ in B(Q,U), we say that a set $S_u$ of nodes in G(Q,E) is useless for u if every $v\in S_u$ has an edge to u in B and, furthermore, if we compute a breadth first search in B(Q,U), starting from node u and analyzing the nodes in increasing order of insertion, no node in $S_u$ is on a path between u and a relevant node in the breadth first search tree.

Lemma 2.9.1 Let $u\in U$ and let $S_u$ be a set of useless nodes for u; then algorithm A deletes all edges in G(Q,E) that are between nodes in $S_u$ and that are in the clique generated by the interest u.

Lemma 2.9.2 For $\epsilon>0$, if u has degree $\Omega(n^\epsilon)$, then $\delta\deg(u)$ neighbors of u are in $S_u$, for some small constant $0<\delta<1$.
Proof:(of Theorem 2.9.1) It is easy to see that running algorithm A will not change distances between pairs of nodes (a,b) if at least one of the two nodes is in R. So we only have to prove that a constant fraction of the edges is deleted by the algorithm. First we notice that we can restrict our attention to the set F of folded edges; indeed, by construction, $|E-F|\in\Theta(n)$. Now, recalling the description of the generating process given in Theorem 2.7.1, we have that all but o(|E|) of the edges in F are part of cliques of polynomial size generated from a node u of degree $\Omega(n^\epsilon)$, for small $\epsilon>0$. Now, by Lemma 2.9.1 and Lemma 2.9.2, we have that in every clique generated from such a node a δ fraction of the edges is deleted, for any constant $0<\delta<1$; thus the claim follows.
Proof:(of Lemma 2.9.1) First we notice that if an edge is deleted by A in G(Q,F), where F is the set of folded edges, it will be deleted also in G(Q,E). This is true because A deletes all edges that do not appear in any shortest path from any node to a node in R, and $F\subset E$. In the following we will consider G(Q,F).
Let $u\in U$ and let $N_B(u)$ be the set of neighbors of u in B(Q,U). After running algorithm A, we have that any node $v\in S_u\subset N_B(u)$ does not appear as an intermediate node in a shortest path between a relevant node and a node in $N_B(u)$. Indeed, suppose by contradiction that v appears as an intermediate node in the path between a relevant node r and a node $t\in N_B(u)$; this would imply that no node $h\in N_B(u)$ satisfies $d(h,r)\le d(v,r)$, where $d(\cdot,\cdot)$ is the distance function, and that h has been added to B(Q,U) before v. Thus the breadth first search tree in B(Q,U) rooted at u would have v on the path between u and r, so $v\notin S_u$, a contradiction. Thus each node in $S_u$ belongs to a different branch in every breadth first search tree in G(Q,F) rooted at any relevant node; hence any edge between two nodes in $S_u$ will be deleted.
Proof:(of Lemma 2.9.2) By Lemma 2.5.2 and Lemma 2.5.3 we have that if a node u has degree $n^\lambda\in\Omega(n^\epsilon)$ at the end of the process, then it had degree $\mu n^\lambda$ also at time $\phi n$, and a δ fraction of the nodes pointing to u have been inserted after time $\phi n$, for any constant $0<\delta<1$, for some constant $0<\mu\le 1$ and for some constant $0<\phi<1$ that depends on δ. We call this set L of nodes the latecomers. We prove that, in the breadth first search from u, only o(|L|) of the vertices in L are used to reach a relevant node. Thus $|S_u|\ge|L|-o(|L|)\ge(1-\delta)|N_B(u)|$, for any constant $0<\delta<1$, so the lemma will follow. In order to prove this we start by showing that the number of nodes, summed over the branches of the breadth first search tree rooted at u that contain a latecomer node, is $\Theta(n^\lambda)$.⁵
We say that a node i is a child of u if the edge (i,u) exists in B(Q,U) and i has been inserted in B(Q,U) after u. Let the descendants of u be the set S such that a node v is in S if and only if v is a child of u or v is a child of a node in S. It is easy to notice that the number of nodes in branches of u that also contain a latecomer at time t is upper bounded by the number of descendants of u. Let $E_t^{desc}$ be the expected number of nodes that are descendants of u. Notice that $E_{\phi n}^{desc}=0$, so we have:
$$E_t^{desc} = E_{t-1}^{desc}+(\beta c_q+(1-\beta)c_u)\,\frac{E_{t-1}^{desc}+\mu n^\lambda}{e_{t-1}+e_{B_0}}.$$
Instead of studying $E_t^{desc}$ we study the function $W_t$, with $W_{\phi n}=\mu n^\lambda$ and the recursive equation:
$$W_t = W_{t-1}+(\beta c_q+(1-\beta)c_u)\,\frac{W_{t-1}}{e_{t-1}+e_{B_0}}.$$
It is easy to see that $W_t > E_t^{desc}$. So we have:
It easy to note that Wt > Etdesc . So we have:
Etdesc < Wt−1 (1 + (βcq + (1 − β)cu ) et−1 1+eB )
0
q +(1−β)cu
< Wt−1 1 + eφnβc
+c∗ (t−1)−c∗ φn)
βcq +(1−β)cu
∗
c
< Wt−1 1 +
e
−c∗ φn
t−1+ φn c∗
βc +(1−β)cu
t−1+ q c∗
< Wt−1
eφn −c∗ φn
Endesc <
t−1+
c∗
“
” “
”
βc +(1−β)cu
βc +(1−β)cu
Γ n−1+ q c∗
Γ φn+ q c∗
”
“
”
Wφn “
e
−c∗ φn
e
−c∗ φn
Γ n−1+ φn c∗
Γ φn+ φn c∗
= Θ nλ
5
n−1
φn
βcq +(1−β)ccu∗−eφn +c∗ φn
!
∈ Θ(nλ )
Note that when a node is added all its edges are copied from its prototype. So the distance between any
couple of pre-existing nodes cannot shrink after the insertion of a new node. Thus in the breadth-first tree
built by A it holds that: for any internal node i all the sons of i have been inserted after i.
2.9. SPARSIFICATION OF G(Q, E)
29
The final technical steps use the concentration results on hereditary functions. Specifically, we notice that the number of descendants can be seen as a hereditary function on the set of edges, where the boolean property is being a descendant of u. In addition, $M[\text{number of descendants}]<c_m n^\lambda$ for some $0<c_m<1$. By Proposition 2.3.1 and Theorem 2.3.1, we have that $E_t^{desc}$ is sharply concentrated.
Furthermore, the set of relevant nodes has cardinality $\frac{n}{\log n}$ and is chosen uniformly at random; hence with high probability only o(|L|) of the latecomers and their descendants are relevant. Thus only o(|L|) of the branches of the breadth first search tree rooted at u and containing a node inserted after time $\phi n$ lead to a relevant node. So all but o(|L|) of the latecomers will be in $S_u$.
2.9.2 Sparsification with a stretching of the distances

In the previous subsection we have shown that we can reduce the number of edges in G(Q,E) by a constant factor using algorithm A. In this section we study what we can achieve if we permit some bounded stretching of the shortest distance between two nodes.
We start by noticing that the graph B(Q,U) has a linear number of edges, and any distance between two nodes of Q in this graph is equal to 2 times their distance in G(Q,F); so, adding the edges in E−F, it would seem that we have the perfect solution to our problem. Unfortunately, the original bipartite graph may not be available to us; nevertheless, we are able to exploit the underlying backbone structure of G to prove the following theorem.
Theorem 2.9.2 There is a polynomial-time algorithm that, for any fixed $c_u, c_q, \beta$, finds a graph G′(Q,E′) with a linear number of edges, in which the distance between two nodes is at most k times larger than the distance in G(Q,E) and in Ĝ(Q,Ê), where k is a function of $c_u, c_q, \beta$.
Proof: First we notice that we can restrict our attention to the folded edges; indeed, by construction, $|E-F|\in\Theta(n)$.
Let us say that S is a k-spanner of the graph G if it is a subgraph of G in which every two vertices are no more than k times further apart than they are in G. The problem of finding k-spanners of a graph has been studied extensively in several papers, e.g., [3, 6, 87]. In our analysis, we will consider the algorithm proposed in [3] for the unit-weight case.
Their algorithm builds the set $E_S$ of edges of the 2k-spanner as follows: at the beginning $E_S=\emptyset$. The edges are processed one by one, and an edge is added to $E_S$ if and only if it does not close a cycle of length 2k or smaller in the graph induced by the current spanner edges $E_S$. At the end of the process the graph $G(Q,E_S)$ is a 2k-spanner of G(Q,E), by construction and by the fact that the girth of $G(V,E_S)$ is at least 2k+1. Since a graph with more than $n^{1+\frac1k}$ edges must have a cycle of length at most 2k, the algorithm builds a spanner of size $O\!\left(n^{1+\frac1k}\right)$.
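For reference, here is a sketch of this greedy construction (our own restatement of the procedure of [3], not code from that paper): an edge (u,v) is kept exactly when u and v are currently at distance at least 2k in the spanner, so no cycle of length 2k or smaller is ever created.

```python
from collections import deque

def greedy_2k_spanner(n, edges, k):
    """Process edges in order; keep (u, v) only if u and v are at distance
    >= 2k in the spanner built so far (checked by a depth-bounded BFS)."""
    adj = [[] for _ in range(n)]
    spanner = []
    for u, v in edges:
        dist = {u: 0}
        queue = deque([u])
        while queue:                       # BFS from u up to depth 2k - 1
            x = queue.popleft()
            if dist[x] == 2 * k - 1:
                continue
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        if v not in dist:                  # adding (u, v) closes no short cycle
            spanner.append((u, v))
            adj[u].append(v)
            adj[v].append(u)
    return spanner
```

Any edge this sketch discards has its endpoints within distance 2k−1 in the spanner, which is exactly the stretch and girth condition used in the size bound above.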
It is important to notice that if we apply the algorithm described above to G(Q,E) and G(Q,F), analyzing the edges in F in the same order, every edge deleted in G(Q,F) is deleted also in G(Q,E). Now, in G(Q,F), any clique generated by a node in U has a spanner with $O\!\left(n^{1+\frac1k}\right)$ edges. Thus, using the algorithm described, we have the following upper bound on the number of edges of a 2k-spanner of G(Q,F):
$$|F_S| \le \sum_{i=1}^n (\#\text{ of nodes of degree } i \text{ in } U)\, i^{1+\frac1k}.$$
By Theorem 2.4.1, we have with high probability:
$$|F_S| \le \sum_{i<n^{\Delta-\epsilon}} \frac{n}{\zeta\!\left(-2-\frac{c_u(1-\beta)}{c_q\beta}\right)}\,\frac{1\pm o(1)}{i^{2+\frac{c_u(1-\beta)}{c_q\beta}}}\, i^{1+\frac1k} + \sum_{i\ge n^{\Delta-\epsilon}} (\#\text{ of nodes of degree } i \text{ in } U)\, i^{1+\frac1k}$$
$$= \sum_{i<n^{\Delta-\epsilon}} \frac{n}{\zeta\!\left(-2-\frac{c_u(1-\beta)}{c_q\beta}\right)}\,\frac{1\pm o(1)}{i^{2+\frac{c_u(1-\beta)}{c_q\beta}}}\, i^{1+\frac1k} + \sum_{i\ge n^{\Delta-\epsilon}} (\#\text{ of edges pointing to a node in } U \text{ of degree } i)\,\frac{i^{1+\frac1k}}{i}$$
$$= \sum_{i<n^{\Delta-\epsilon}} \frac{n}{\zeta\!\left(-2-\frac{c_u(1-\beta)}{c_q\beta}\right)}\,\frac{1\pm o(1)}{i^{2+\frac{c_u(1-\beta)}{c_q\beta}}}\, i^{1+\frac1k} + \sum_{i\ge n^{\Delta-\epsilon}} (\#\text{ of edges pointing to a node in } U \text{ of degree } i)\, i^{\frac1k}$$
$$\le \sum_{i<n^{\Delta-\epsilon}} \frac{n}{\zeta\!\left(-2-\frac{c_u(1-\beta)}{c_q\beta}\right)}\,\frac{1\pm o(1)}{i^{2+\frac{c_u(1-\beta)}{c_q\beta}}}\, i^{1+\frac1k} + n^{\frac1k}\sum_{i\ge n^{\Delta-\epsilon}} (\#\text{ of edges pointing to a node in } U \text{ of degree } i).$$
By Lemma 2.5.4:
$$|F_S| \le \sum_{i<n^{\Delta-\epsilon}} \frac{n}{\zeta\!\left(-2-\frac{c_u(1-\beta)}{c_q\beta}\right)}\,\frac{1\pm o(1)}{i^{2+\frac{c_u(1-\beta)}{c_q\beta}}}\, i^{1+\frac1k} + n^{\frac1k}\cdot\Theta\!\left(n\cdot n^{-(\Delta-\epsilon)\frac{c_u(1-\beta)}{c_q\beta}}\right).$$
So if $k>\frac{4c_q\beta+c_u(1-\beta)}{c_u(1-\beta)}$ then $|F_S|\in\Theta(n)$. Thus also $|F_S\cup(E-F)|\in\Theta(n)$.

2.10 Flexibility of the model

In this section we consider some variations of the model for which it is easy to prove that the main theorems hold. We will analyze the two following cases:
• Instead of generating only one bipartite graph B(Q,U), a list $B_0(Q,U),\cdots,B_k(Q,U)$ of bipartite graphs⁶ is generated. At the same time the multigraph G(Q,E) evolves in parallel; besides "folding" length-2 paths in $B_0,\cdots,B_k$ into edges, we also add to G(Q,E) a few preferentially attached neighbors.

⁶ In this model the choice of adding a node to U or to Q is the same for all the graphs, but the numbers of edges added ($c_{u_0},c_{q_0},\cdots,c_{u_k},c_{q_k}$) and their destinations differ.
• Instead of "folding" length-2 paths in B into edges, for every pair of nodes in Q and every shared common neighbor $u\in U$ between them, we randomly and independently place an edge between the two nodes in G(Q,E) with probability proportional to the reciprocal of $d(u)^\alpha$, where $d(\cdot)$ denotes the degree and $0<\alpha<1$.
In the first case, if for at least one bipartite graph $c_{u_i}<\frac{\beta}{1-\beta}c_{q_i}$, the densification of the edges and the shrinking/stabilizing of the diameter follow by the same arguments used in the proofs of Theorems 2.7.1, 2.7.2, 2.8.1 and 2.8.2. Furthermore, if k is constant, all the theorems on the degree distributions of G(Q,E) and Ĝ(Q,Ê) continue to hold.
In the second case it is sufficient to notice that every node u of U in B(Q,U) is no longer substituted by a clique but by a $G(n,p)$ random graph, where $n=d(u)$ and $p=\frac{1}{d(u)^\alpha}$. Now, if $c_u<\frac{\beta}{1-\beta}c_q(1-\alpha)$, using the same argument of Theorem 2.7.1 and the Chernoff bound we obtain the densification of the edges. The shrinking/stabilizing diameter in this case follows from the fact that most of the nodes will point to a high degree node in G(Q,E)⁷ and that the $G(n,p)$ with $p=n^{-\alpha}$, for $0<\alpha<1$, has constant diameter by [13]. Finally, also in this case the degree distribution is heavy-tailed, because with high probability the complementary cumulative distribution function of the degrees of the nodes dominates the complementary cumulative distribution function of the degrees of Q in B(Q,U).
⁷ This can be proved using the same proof strategy as before.
Chapter 3
Navigability of Affiliation Networks
We demonstrate how the Affiliation Networks model offers powerful cues in local routing within social networks, a theme made famous by sociologist Milgram's "six degrees of separation" experiments. This model posits the existence of an "interest space" that underlies a social network; we prove that in networks produced by this model, not only do short paths exist among all pairs of nodes, but natural local routing algorithms can discover them effectively. Specifically, we show that local routing can discover paths of length $O(\log^2 n)$ to targets chosen uniformly at random, and paths of length O(1) to targets chosen with probability proportional to their degrees. Experiments on the co-authorship graph derived from DBLP data confirm our theoretical results, and shed light on the power of one step of lookahead in routing algorithms for social networks.
3.1 Introduction

Milgram's six-degrees-of-separation experiment [79, 101], and the fascinating small world hypothesis that follows from it, has generated a lot of interesting research in recent years. In this landmark experiment, human subjects were asked to deliver a letter to a target person in a far away city only if they knew the target on a first name basis. Otherwise, they would pass along the letter to a friend who, recursively, would follow the same instructions. The surprising outcome was that a reasonably large fraction of the letters reached the target and, moreover, they did so in very few hops. This led to the fascinating small world hypothesis: take any two people in a social network, and they will be connected by a short chain of acquaintances. The extent to which the hypothesis is true is still actively debated, and no evolving model for social networks that exhibits their standard statistical properties (i.e., power law degree distribution [21, 38], high clustering coefficient [103], densification and shrinking diameter [71]) can at the same time explain the small world phenomenon.
(The work described in this chapter is joint work with A. Panconesi and D. Sivakumar.)

The main contribution of this chapter is to create a bridge between the analysis of the small world phenomenon and the analysis of evolving models for social networks. In particular, we introduce a new dynamic model that explains the small world phenomenon together with all the standard properties of social networks. The model is based on affiliation networks,
a very natural model for situations such as those considered in this chapter. This model, studied in the previous chapter, gives a very good explanation of evolutionary properties such as the densification and shrinking diameter observed in [71], and of several other properties that arise in social network analysis [63]. In this chapter we introduce and analyze a more general version of the model and show that it naturally defines, in an implicit way, a space of interests that co-evolves with the social network, and that furthermore this space is navigable. The model is more general than the one introduced in the previous chapter because new interests that join in can be a perturbation of a mixture of pre-existing interests. In the original model, a new interest could only be a perturbation of one pre-existing interest. Similarly, a new person joining the network will share a subset of the interests of several friends, as opposed to just one of them. Thus, this extension is more natural and flexible. In this chapter we prove that this enhanced model has several strong properties that are especially relevant for modeling small worlds. Our model is the first to exhibit simultaneously three different sets of properties of social networks: the small world property, evolutionary properties, and navigability of the interest space. In previous attempts, these features were captured, but only separately. For instance, the models in [41, 42, 58, 103] deal with the small world phenomenon, but they are static and unable to explain evolutionary properties, or even the heavy-tailed distribution of popularity (number of friends). Furthermore, they assume that every person knows the distance between its neighbors and the target; we instead only assume that every person knows the closeness between two interests. There have also been some attempts to define and navigate an interest space instead of geographic information [60, 105], or to use a latent space of interests to define the friendship graph [89, 93]. But, again, these models are static (the number of nodes in the graph does not increase over time) and unable to explain evolutionary properties. In contrast, in our model all these different aspects come forth naturally from the same model.
Our model also matches the experimental evidence from a quantitative point of view. The effective diameter of the friendship graph is upper bounded by a constant. This is compatible with the empirical observations of [70], where a huge social network of hundreds of millions of nodes was analyzed and its effective diameter found to be a very small number. When we analyze the actual working of Milgram routing in the friendship graph (not to be confused with the mere existence of short paths), we find that when source and target are chosen at random, their expected routing distance is $O(\log^2 n)$. The novelty here is that to find this short chain we navigate the interest space associated with the affiliation network, and not the friendship graph itself. When the target is chosen by popularity, i.e., with probability proportional to the number of friends, then the expected length of the chain can be upper bounded by a constant. This is quite in line with the experimental evidence with human subjects. It has been pointed out that the successful outcome of Milgram's experiment was due to the fact that the target was a person of high social status, with a profession that contributed even more than his status to establishing and nurturing many social connections. When the experiment was repeated using targets of low social status the outcome was indeed quite different [61]. Our model captures these features of the real world very nicely. Further, in accordance with the observation of Granovetter [49], the proofs of the upper bounds on the diameter and on the expected routing distance use heavily the presence of weak ties (i.e., preferential attachment edges in the model). Finally, we point out that our model is the only one to capture, together with evolutionary properties and the notion of a navigable interest space, another crucial property of social networks: the heavy-tailed distribution of popularity (number of friends).
Another important issue with Milgram's small world hypothesis is the structural hardness of its verification. Milgram's painstaking work enabled him to collect data on a few hundred individuals; nowadays it is possible to use large-scale social networking sites instead: "in silico" experiments that make use of social networks can easily manage millions of individuals. Furthermore, the issue of attrition, the natural tendency of human subjects to drop out of the experiment, disappears. For such reasons, several "cyber replicas" of the experiment have been performed [70, 73]. These replicas confirm the small world hypothesis qualitatively, but they are very crude simulations of the experiment. In one such instance, for example [73], a snapshot of the social networking site LiveJournal was downloaded to obtain a social network of roughly 15 million individuals. The experiment was simulated by picking source and target at random, and by moving toward the target according to geographical proximity (geo-greedy): from the current node X we move to the neighbor of X that is closest to the target. In another instance [70], the effective diameter of the social network of IM chat exchanges was estimated and found to be compatible with the small world hypothesis. The main drawback of these approaches is that they only take into account geographical or positional information, while it is clear that other cues play a role. In the original experiment, subjects knew the profession of the target and this information proved to be crucial. This motivates the second question that we address in this chapter: Is it possible to perform a cyber-replica of Milgram's experiment in which a cognitive "space of interests" is navigated? We show here that this is possible. In our experiment we consider a social network of co-authorships of computer science papers. Two people in this network are "friends" if they are co-authors. We then extract a space of interests consisting of computer science topics. In simulating the experiment, we go from person to person by moving to the friend of the current person that has the most interests in common with the target. By and large, the outcome confirms the small-world hypothesis in general, and in particular our assertion that navigating the dual space of interests offers powerful cues in decentralized routing. Furthermore, our experiments strongly reinforce two significant pieces of work in the sociology literature: the importance of weak ties [49] and the significance of the social status of the target node in Milgram's experiment [61]. Finally, since our experiments are based on publicly available data, it should be possible for other researchers to replicate our work as well as derive additional insights underlying small-world routing.
3.2 Our model
The model that we consider in this chapter is a variation of the Affiliation Networks model presented in the previous chapter. In both models, two graphs evolve at the same time. The first is a bipartite graph, denoted B(P,I), that represents the affiliation network, with a set P of people on one side and a set I of interests on the other. An edge (p,i) represents the fact that p is interested in i. The second graph is a friendship network, denoted G(P,E), representing friendship relations within the same set P of people. In this graph, people can be friends for two different reasons: because they share an interest, or because of preferential attachment. Thus, G is the "folding" of B, plus a set of edges generated by preferential attachment.
Evolution of B(P, I). Fix two integers $k_1$ and $k_2$; fix $k_1+k_2$ integers with $\sum_{j=1}^{k_1}c_{p_j}=c_p$ and $\sum_{j=1}^{k_2}c_{i_j}=c_i>0$, and let $\beta\in(0,1)$.
At time 0, the bipartite graph $B_0(P,I)$ is a simple graph with at least $c_p c_i$ edges, where each node in P has at least $c_p$ edges and each node in I has at least $c_i$ edges.
At time $t>0$:
(Evolution of P) With probability β:
(Arrival) A new node p is added to P.
(Preferentially chosen prototypes) A set of nodes $p_1,\cdots,p_{k_1}\in P$ is chosen as prototypes for the new node, with probability proportional to their degrees.
(Edge copying) $c_{p_j}$ edges are "copied" from $p_j$, for $1\le j\le k_1$ with $\sum_{j=1}^{k_1}c_{p_j}=c_p$; that is, $c_{p_j}$ neighbors of $p_j$, denoted by $i_1,\ldots,i_{c_{p_j}}$, are chosen uniformly at random (without replacement), and the edges $(p,i_1),\cdots,(p,i_{c_{p_j}})$ are added to the graph.
(Evolution of I) With probability 1−β, a new node i is added to I following a symmetric process, adding $c_i$ edges to i.

Evolution of G(P, E). Fix three integers $k_1$, $k_2$ and s; fix $k_1+k_2$ integers with $\sum_{j=1}^{k_1}c_{p_j}=c_p$ and $\sum_{j=1}^{k_2}c_{i_j}=c_i>0$, and let $\beta\in(0,1)$.
At time 0, $G_0(P,E)$ consists of the subset P of the vertices of $B_0(P,I)$, and two vertices have an edge between them for every neighbor in I that they have in common in $B_0(P,I)$.
At time $t>0$:
(Evolution of P) With probability β:
(Arrival) A new node p is added to P.
(Edges via prototype) An edge between p and another node in P is added for every neighbor that they have in common in B(P,I) (note that this is done after the edges for p are determined in B).
(Preferentially chosen edges) A set of s nodes $p_{i_1},\ldots,p_{i_s}$ is chosen, each node independently of the others (with replacement), by choosing vertices with probability proportional to their degrees, and the edges $(p,p_{i_1}),\ldots,(p,p_{i_s})$ are added to G(P,E).
(Edges via evolution of I) With probability 1−β: a new edge is added between two nodes $p_1$ and $p_2$ if the new node i added to I is a neighbor of both $p_1$ and $p_2$ in B(P,I).
In the previous chapter the graph B evolves as follows. When a new interest (resp. person) comes in, it selects a prototype node among the existing interests (resp. people) and copies it with a small perturbation. In this new version, when a new node joins B it can select more than one prototype. A new interest, for example, will be a slightly perturbed mixture of a few existing interests, and a new person will be interested in a combination of interests of his/her friends. This new model seems more realistic and, from the technical point of view, it presents a few complications that make it a non-straightforward extension of the previous one. Furthermore, in this new version of the model it is possible to prove some additional properties of the graph, such as the constant diameter and the navigability.
The description above defines the model precisely. For readability, we present the two evolution processes separately even though the two graphs evolve together.
Before proceeding, let us introduce some terminology. An edge of G between two people that comes from the fact that these two people share an interest in B is called a folded edge. The set of folded edges is denoted by F. In the next section we introduce some notation and some results that we will use throughout the chapter.
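To make the evolution concrete, the following sketch simulates the bipartite graph B(P,I) in the simplest setting $k_1=k_2=1$ (one prototype per arrival, as in the model of Chapter 2); it is our illustration only, with arbitrary parameter values, and it omits the folding into G(P,E) and the preferential attachment edges:

```python
import random

def evolve_B(steps, c_p=2, c_i=2, beta=0.5):
    """Simplified evolution of B(P, I) with a single prototype per arrival.
    people[p] and interests[i] are the neighbor lists; B0 is a tiny seed."""
    people = {0: [0, 1], 1: [0, 1]}          # seed: 2 people, 2 interests
    interests = {0: [0, 1], 1: [0, 1]}
    for _ in range(steps):
        if random.random() < beta:           # a new person joins P
            # Prototype chosen with probability proportional to its degree.
            proto = random.choice([p for p, nbrs in people.items() for _ in nbrs])
            new = len(people)
            people[new] = random.sample(people[proto],
                                        min(c_p, len(people[proto])))
            for i in people[new]:            # copy c_p of the prototype's edges
                interests[i].append(new)
        else:                                # a new interest joins I
            proto = random.choice([i for i, nbrs in interests.items() for _ in nbrs])
            new = len(interests)
            interests[new] = random.sample(interests[proto],
                                           min(c_i, len(interests[proto])))
            for p in interests[new]:
                people[p].append(new)
    return people, interests
```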
3.3 Preliminaries
We say that an event occurs with high probability (whp) if it happens with probability 1−o(1), where the o(1) term goes to zero as n (the number of vertices) goes to ∞. Finally, recall that the distribution of a r.v. X with distribution function F is said to be heavy-tailed if $\lim_{x\to\infty} e^{\lambda x}\Pr[X>x]=\infty$ for all constants $\lambda>0$.
3.3.1 Concentration Theorems
Now we recall three important properties of functions that make the task of establishing measure concentration results easier, and present the relevant concentration results from the literature (see [35]). First we present the simplest version of the method of bounded differences.

Definition 3.3.1 [Lipschitz Condition] A function f satisfies the Lipschitz condition with parameters $d_j$, $j\in[n]$, with respect to the random variables $X_1,\cdots,X_n$ if, for any $a_j,a'_j$ and for $1\le j\le n$,
$$\left|f(X_1=a_1,\cdots,X_j=a_j,\cdots,X_n=a_n)-f(X_1=a_1,\cdots,X_j=a'_j,\cdots,X_n=a_n)\right| \le d_j.$$

Theorem 3.3.1 [cf. [35, 77]] Assume f satisfies the Lipschitz condition with respect to the variables $X_1,\cdots,X_n$ with parameters $d_j$, $j\in[n]$. Then $\Pr[|f-E[f]|>t]\le\exp(-t^2/2d)$, where $d=\sum_{j\le n}d_j^2$.
Now we recall an extension of the Lipschitz condition and of the method of bounded differences introduced in Chapter 2.

Definition 3.3.2 [Averaged Lipschitz Condition] A function f satisfies the averaged Lipschitz condition with parameters $c_j$, $j\in[n]$, with respect to the random variables $X_1,\cdots,X_n$ if, for any $a_j,a'_j$ and for $1\le j\le n$,
$$\left|E\big[f(X_1,\cdots,X_n)\mid X_1=a_1,\cdots,X_j=a_j\big]-E\big[f(X_1,\cdots,X_n)\mid X_1=a_1,\cdots,X_j=a'_j\big]\right| \le c_j.$$

Lemma 3.3.1 [cf. [35, 77]] Assume f satisfies the averaged Lipschitz condition with respect to the variables $X_1,\cdots,X_n$ with parameters $c_j$, $j\in[n]$. Then $\Pr[|f-E[f]|>t]\le\exp(-t^2/2c)$, where $c=\sum_{j\le n}c_j^2$.
Finally, we recall a concentration result on hereditary functions, already introduced in Chapter 2, that will be used in the proof of Lemma 3.4.1.

Definition 3.3.3 (Hereditary Property and Hereditary Function) A boolean property $\rho(x,\mathcal{J})$, where x is a sequence of n reals and $\mathcal{J}$ is a family of subsets of [n], is said to be a hereditary property of index sets if:
(1) ρ is a property of index sets, that is, if $x_j=y_j$ for every $j\in J\in\mathcal{J}$, then $\rho(x,\mathcal{J})=\rho(y,\mathcal{J})$;
(2) ρ is non-increasing on the index sets, that is, if $I\subseteq J$, then $\rho(x,J)\Rightarrow\rho(x,I)$.
Let $f_\rho$ be the function determined by a hereditary property of index sets ρ, given by $f_\rho=\max_{J:\rho(x,J)}|J|$; we call $f_\rho$ a hereditary function of index sets.

The concentration result for hereditary functions of index sets is a consequence of Talagrand's inequality and was proven in [34].

Theorem 3.3.2 [ [34]] Let $f_\rho$ be a hereditary function of index sets. Then for all $t>0$, $\Pr[f>M[f]+t]\le 2\exp(-t^2/(4(M[f]+t)))$, and $\Pr[f<M[f]-t]\le 2\exp(-t^2/(4M[f]))$.
The next proposition relates concentration around the median value of a function to concentration around its mean value.

Proposition 3.3.1 The following are equivalent for an arbitrary function f and random variables $X_1,\cdots,X_n$:
(1) For all $t>0$, there exist $c_1,\alpha_1>0$ such that $\Pr[|f-E[f]|>t]\le c_1 e^{-\alpha_1 t}$.
(2) For all $t>0$, there exist $c_2,\alpha_2>0$ such that $\Pr[|f-M[f]|>t]\le c_2 e^{-\alpha_2 t}$.
3.4 Properties of the model

In this section we give some definitions and then describe some relevant properties of the model. We first define the concepts of effective diameter, core and hubs of a graph.

Definition 3.4.1 [Effective Diameter] For $0<q<1$, we define the q-effective diameter as the minimum $d_e$ such that, for at least a q fraction of the node pairs, the shortest path between the pair is at most $d_e$.

Definition 3.4.2 [Core and hubs of B(P,I)] The core of B(P,I) is the set $C\subseteq I$ of vertices for which there exist two constants $\epsilon,\alpha>0$ such that $d(v)\ge\alpha n^\epsilon$ for all $v\in C$. The hubs are the set of vertices in P that are at distance 1 from the core.

Now we introduce some properties that our model shares with the original Affiliation Networks model introduced in Chapter 2. Most of the techniques that we will use in the proofs are inspired by the results in Chapter 2.
Theorem 3.4.1 [General properties of the model]
If $c_i<\frac{\beta}{1-\beta}c_p$, we have that:
(1) For the bipartite graph B(P,I) generated after n steps, almost surely, when $n\to\infty$, the degree sequence of the nodes in P (resp. I) follows a power law distribution with exponent $\alpha=-2-\frac{c_p\beta}{c_i(1-\beta)}$ (resp. $\alpha=-2-\frac{c_i(1-\beta)}{c_p\beta}$), for every degree smaller than $n^\gamma$, with $\gamma<\frac{1}{4+\frac{c_p\beta}{c_i(1-\beta)}}$ (resp. $\gamma<\frac{1}{4+\frac{c_i(1-\beta)}{c_p\beta}}$), with high probability.
(2) The degree distribution of the graph G(P,E) is heavy-tailed with high probability.
(3) The number of edges in G(P,E) is ω(n) with high probability.
(4) The q-effective diameter of G(P,E) shrinks or stabilizes after time $\phi n$ with high probability, for any constant $0<\phi<1$ and for any constant $0<q<1$.
Proof: We show that our new model has many statistical properties in common with the model presented in Chapter 2; the basic idea is to show that the expected number of nodes of degree k evolves in the same way in our new model and in the Affiliation Networks model, so that we get similar properties.
Let $X_t^i$ be the random variable that counts the number of nodes in P of degree i at time t. We want to express $E_t^i=E[X_t^i]$ in terms of $E_{t-1}^i=E[X_{t-1}^i]$.
In the case of the Affiliation Networks model, recall from Chapter 2 (Theorem 2.4.1) that:
$$E_t^{c_p} = E_{t-1}^{c_p}\left(1-(1-\beta)c_i\frac{c_p}{e_{t-1}}\right)+\beta \qquad (3.1)$$
and
$$E_t^i = E_{t-1}^i\left(1-(1-\beta)c_i\frac{i}{e_{t-1}}\right)+(1-\beta)c_i\frac{i-1}{e_{t-1}}\,E_{t-1}^{i-1}. \qquad (3.2)$$
Similarly, for our new model we have that:
$$E_t^{c_p} = E_{t-1}^{c_p}+\Pr[\text{a new node is added to }P]-\Pr[\text{a new node is added to }I]\cdot E[\text{num. of nodes in }P\text{ of degree }c_p\text{ at time }t-1\text{ whose degrees increase}\mid\text{a node is added to }I]$$
$$= E_{t-1}^{c_p}+\beta-(1-\beta)\sum_{l=1}^{k_2}\sum_{j=1}^{c_{i_l}}\Pr[\text{a node of degree }c_p\text{ is chosen as endpoint for the }j\text{-th edge}],$$
where the last equation follows from linearity of expectation. In addition, if we focus on the addition of a single edge, we have that every edge has the same probability of being copied by the process; thus we get that:
$$E_t^{c_p} = E_{t-1}^{c_p}\left(1-(1-\beta)c_i\frac{c_p}{e_{t-1}}\right)+\beta.$$
So in the base case the recursive equation for the new model is equal to the equation for the Affiliation Networks model of Chapter 2. Now let us consider the general case. We have that:
$$E_t^i = E_{t-1}^i-E[\text{num. of nodes in }P\text{ of degree }i\text{ at time }t-1\text{ that increase their degree}]+E[\text{num. of nodes in }P\text{ with degree smaller than }i\text{ at time }t-1\text{ that increase their degree to }i]$$
$$= E_{t-1}^i-E[\text{num. of nodes in }P\text{ of degree }i\text{ at time }t-1\text{ that increase their degree}]+\sum_{j=1}^{k_2}E[\text{num. of nodes in }P\text{ of degree }i-j\text{ at time }t-1\text{ that increase their degree to }i]$$
$$\le E_{t-1}^i-E_{t-1}^i(1-\beta)c_i\frac{i}{e_{t-1}}+(1-\beta)c_i\frac{i-1}{e_{t-1}}\,E_{t-1}^{i-1}+\sum_{j=2}^{k_2}\binom{c_i}{j}\left(\frac{i-j}{e_{t-1}}\right)^j E_{t-1}^{i-j}$$
$$= E_{t-1}^i\left(1-(1-\beta)c_i\frac{i}{e_{t-1}}\right)+(1-\beta)c_i\frac{i-1}{e_{t-1}}\,E_{t-1}^{i-1}+\sum_{j=2}^{k_2}\binom{c_i}{j}\left(\frac{i-j}{e_{t-1}}\right)^j E_{t-1}^{i-j},$$
where the inequality follows from the observations above and the linearity of expectation. Furthermore, we can also prove a lower bound on the expectation using the same equalities:
$$E_t^i \ge E_{t-1}^i\left(1-(1-\beta)c_i\frac{i}{e_{t-1}}\right)+(1-\beta)c_i\frac{i-1}{e_{t-1}}\,E_{t-1}^{i-1}.$$
Thus in this case only the lower bound for $E_t^i$ is equal to the equation for $E_t^i$ in the original Affiliation Networks model, so we cannot use the results of Chapter 2 directly. Fortunately, we are interested in the behavior of $E_t^i$ when $t\to\infty$; in particular we are interested in the value of
$$Y^i = \lim_{t\to\infty}\frac{E_t^i}{t}.$$
In the upper bound on $E_t^i$, the term $\frac{1}{t}\sum_{j=2}^{k_2}\binom{c_i}{j}\left(\frac{i-j}{e_{t-1}}\right)^j E_{t-1}^{i-j}$ goes to zero when $t\to\infty$, so the upper and the lower bound are tight when $t\to\infty$. Hence, when $t\to\infty$, the equation for $Y^i$ is the same as for the original Affiliation Networks model, and using the same techniques we get that the $Y^i$ have the same value in the new and in the original model. More precisely, we get that:
$$Y^i \sim i^{-2-\frac{c_p\beta}{c_i(1-\beta)}}.$$
To finish the proof of property (1) we have to show that the number of nodes of degree i at time t, $X_t^i$, is concentrated around $E_t^i$. To do so we will use Lemma 3.3.1, as in Chapter 2; in particular we would like to follow the same techniques used in Chapter 2, but unfortunately we cannot do so directly, because we have to take into account the additional terms in the upper bound on $E_t^i$.
We define $\Delta_t^i = \mathbf{E}_t^i-\hat{\mathbf{E}}_t^i$, where $\mathbf{E}_t^i = E[X_t^i\mid x_1=a_1,x_2=a_2,\cdots,x_s=a_s]$ and $\hat{\mathbf{E}}_t^i = E[X_t^i\mid x_1=a_1,x_2=a_2,\cdots,x_s=a'_s]$, with $s>0$. In Chapter 2 it is shown that $\Delta_t^i\le\Delta_{t-1}^i+2c_i+2c_p$. In our case we note that the additive factor satisfies $\sum_{j=2}^{k_2}\binom{c_i}{j}\left(\frac{i-j}{e_{t-1}}\right)^j E_{t-1}^{i-j}\le k_2$; thus, using the same algebraic manipulation presented in Chapter 2, we get that in our case $\Delta_t^i\le\Delta_{t-1}^i+2c_i+2c_p+k_2$. So we have the $(2c_p+(2c_p+2c_i+k_2)(i+1))$-averaged Lipschitz condition for our variables; hence, combining the bounds for $Y^i$ and using Lemma 3.3.1, we get property (1) for the set P. Finally, by the symmetry of the degree distributions of P and I, we obtain property (1).
The proof of properties (2)-(4) follows directly from property (1) and the presence of the preferential attachment edges, as shown in Chapter 2. We refer to Chapter 2 for more details on those proofs.
Now we prove two technical lemmata on the evolution of our graph model that we will use in the following sections. In the following lemma we give an explicit relation between the degree of a node at time $\epsilon n$ and its final degree.

Lemma 3.4.1 Let v be a node in B(P,I) with degree g(n) at time $\epsilon n$, with $g(n)\in\Omega(\log^2 n)$; then, with high probability, its degree at time n is smaller than $C\cdot g(n)$, for every constant $\epsilon>0$ and some constant $C>0$. Furthermore, if a node v has degree $o(\log^2 n)$ at time $\epsilon n$ or is inserted after time $\epsilon n$, for any constant $\epsilon>0$, then the final degree of v is in $o(\log^2 n)$ with high probability.
Proof: Let v be a node in B(P,I) with degree g(n) at time $\epsilon n$, with $g(n)\in\Omega(\log^2 n)$; without loss of generality we assume that $v\in P$. First we find an upper bound on the expected final degree of a vertex of degree g(n). We call $E_t$ the expected degree of v at time t; we have that $E_{\epsilon n}=g(n)$ and, for $t>\epsilon n$,
$$E_t = E_{t-1}+E[\text{new edges pointing to }v\text{ added at time }t] \le E_{t-1}+c_i\frac{E_{t-1}}{e_{t-1}} \le E_{t-1}\left(1+\frac{c_i}{(t-1)c_{min}}\right),$$
where $e_{t-1}$ is the number of edges at time t−1 and $c_{min}=\min(c_i,c_p)$. Thus we have:
$$E_t \le g(n)\prod_{i=\epsilon n}^t\left(1+\frac{c_i}{(i-1)c_{min}}\right) = g(n)\prod_{i=\epsilon n}^t\frac{(i-1)+\frac{c_i}{c_{min}}}{(i-1)} \le g(n)\,\frac{\Gamma\!\left(t-1+\frac{c_i}{c_{min}}\right)\Gamma(\epsilon n-1)}{\Gamma(t-1)\,\Gamma\!\left(\epsilon n-1+\frac{c_i}{c_{min}}\right)} \le C'g(n)\left(\frac{t-1}{\epsilon n-1}\right)^{\frac{c_i}{c_{min}}} \le C''g(n),$$
where C′ and C″ are two positive constants. Furthermore, $E_t\ge g(n)$ and it is the expected value of a hereditary function with median $\Theta(g(n))$. Thus, applying Theorem 3.3.2 and Proposition 3.3.1, we have that $\Pr[d(v)>C'''g(n)] = \Theta\!\left(e^{-\log^2 n}\right) = o(n^{-1})$, for some positive constant C‴. Thus, using the union bound over the number of nodes, we get the first part of the claim.
Now let us consider the case in which v is inserted after time $\epsilon n$ or has degree in $o(\log^2 n)$. Let h(n) be the degree of v at time $\alpha n$, where $\alpha n$ is $\epsilon n$ or the insertion time of v if this is bigger than $\epsilon n$. Using the same derivation as above we get that $E_t\le K\cdot h(n)$ for some positive constant K; thus, noticing that the median final degree of v is in $\Theta(h(n))$ and applying Theorem 3.3.2, we get that $\Pr[d(v)\ge K\log^2 n]<e^{-K'\log^2 n} = o(n^{-1})$, for every constant $K>0$ and some constant $K'>0$. Thus, using again the union bound over the number of nodes, we get the claim.
Finally, we prove a connectivity property of the nodes inserted after time $\phi n$, for some constant $\phi>0$.

Lemma 3.4.2 If $c_i<\frac{\beta}{1-\beta}c_p$, any node of P inserted after time $\phi n$, for any constant $\phi>0$, will have, with probability 1−o(1), at least one preferential attachment edge incident to a hub in G(P,E).
Proof: Our proof strategy is to lower bound the volume of the hubs in G at time $\phi n$, and then prove that with high probability every new node will add an edge to them. Note that by definition the volume of the hubs is bigger than or equal to the sum of the squares of the degrees of the interests in the core of B(P,I); thus, using Lemma 3.4.1, we have that at time $t>\phi n$
$$\sum_{v\in hubs} d_G(v) \ge \sum_{i\in C}\binom{d(i)}{2} \ge \sum_{t^\epsilon\le d\le t^\gamma}(\#\text{ nodes of degree }d\text{ at time }t)\binom{d}{2} \ge \sum_{t^\epsilon\le d\le t^\gamma}\Theta\!\left(t\cdot d^{-2-\frac{c_i(1-\beta)}{c_p\beta}}\right)\binom{d}{2}$$
$$\in \Omega\!\left(t^{1+\gamma\left(1-\frac{c_i(1-\beta)}{c_p\beta}\right)}-t^{1+\epsilon\left(1-\frac{c_i(1-\beta)}{c_p\beta}\right)}\right) \in \Omega\!\left(t^{1+\gamma\left(1-\frac{c_i(1-\beta)}{c_p\beta}\right)}\right),$$
where the bound on the number of nodes of degree d follows from property (1) of Theorem 3.4.1 and Lemma 3.4.1, and the last two passages follow from $\sum_{i=k}^n\frac{1}{i^\alpha}\in\Theta(k^{1-\alpha}-n^{1-\alpha})$.
Similarly, for any $t\ge\phi n$, we have that:
$$\sum_{v\notin hubs} d_G(v) \le \sum_{d<t^\epsilon}(\#\text{ nodes of degree }d\text{ at time }t)\binom{d}{2}+(\#\text{ edges added via preferential attachment}) \le \sum_{d<t^\epsilon}\Theta\!\left(t\cdot d^{-2-\frac{c_i(1-\beta)}{c_p\beta}}\right)\binom{d}{2}+n \in \Theta\!\left(t^{1+\epsilon\left(1-\frac{c_i(1-\beta)}{c_p\beta}\right)}+n\right).$$
Thus, by taking a small enough constant $\epsilon>0$, all the nodes added after $\phi n$ will point to at least one hub with high probability.
3.5 The crucial role of weak ties
In this section we study the effective diameter of G(P, E) and show that it is upper bounded by a constant (it is unknown whether this property holds in the original Affiliation Network model). This property is a consequence of the coexistence of folded and preferential attachment edges. Several studies have shown that links in a social network can be of two types: local ties and long-range, also called weak, ties [49]. Weak ties have several important structural properties,
for instance they form bridges between different communities and, in particular, they are
the crucial ingredient that makes small worlds possible. It is thanks to them that Milgram’s
routing can be so effective and fast.
In our model folded edges are local, for they connect people within a community of shared
interests, while preferential attachment edges are the weak (or long-range) ties [58,59]. Note
that, in accordance with the previous literature and sociological intuition, in our model
weak ties are very few compared to folded edges. In this section we show that weak ties play
another interesting structural function that is in accordance with the empirical evidence. It
is because of them that the diameter of the friendship graph shrinks to the point that the
effective diameter is bounded by a constant. Our proof also uses in a fundamental way the
presence of hubs. This might seem in contrast with the results in [31] where the authors
suggest that their role is not relevant. A possible explanation is that they consider only the degree induced by the explored paths, and thus only a subgraph of the social network; it is therefore possible that in their experiments a high-degree node appears to have small degree just because few messages passed through it. In our proof we consider the real degree of a node. We note that our results are in line with the original findings of Milgram [79] and also with our experiments, presented in Section 3.7. The main theorem is the following.
Theorem 3.5.1 Let $c_i < \frac{\beta}{1-\beta}\,c_p$; then the q-effective diameter of G(P, E) is constant with high probability, for every constant q < 1.
To prove Theorem 3.5.1 we first show the following lemma on the maximum distance between two nodes in the core of B(P, I).
Lemma 3.5.1 Let $c_i < \frac{\beta}{1-\beta}\,c_p$. Then there exists a constant D such that, for any pair of nodes u, v ∈ C, the distance between u and v in B(P, I) is smaller than or equal to D with high probability.
Proof: The idea behind the proof is to show that B(P, I) contains a subgraph with properties similar to an Erdös-Rényi random graph. More specifically, we will show that a graph composed of the nodes in the core and some paths of length 2 between them behaves as an Erdös-Rényi random graph G(C, M), i.e. a graph chosen uniformly at random among all graphs having |C| nodes and M edges. In addition we will prove that in this graph M = Ω(|C|^{1+α}), for some constant α. Thus from [57] it follows that the diameter of G(C, M) is smaller than or equal to 1/α with high probability, and so the diameter of the core is bounded by 2/α with high probability.
Consider the following alternative description of the evolution of B(P, I). With probability β a new node v is added to P. Then the following steps take place:
• The new node v selects k > 1 edges $(p_1, i_1), \dots, (p_k, i_k)$ uniformly at random, and the edges $(v, i_1), \dots, (v, i_k)$ are added to B(P, I);
• For $j = 1, 2, \dots, k$, $c_{p_j} - 1$ nodes are chosen uniformly at random in $N(p_j) \setminus \{i_j\}$, and v is connected to them, where $N(p_j)$ is the neighborhood of $p_j$.
A symmetric process takes place when a node is added to I.
Note that from Definition 3.4.2 and from Lemma 3.4.1 all nodes in C are inserted before time φn with high probability, for any constant φ > 0. Let us fix a time δn for some fixed constant δ > 0 and let d be the minimum degree of a node in C at time δn.
We say that an edge is fair if it is among the first d edges added to its endpoint in C. When a new node, inserted after time φn, selects two fair edges in the first step of our alternative definition of the process, we say that a pseudo-edge is added between the two endpoints in I of the fair edges. Note that with this definition every node in I is selected as an endpoint of a pseudo-edge with the same probability. So we can define an Erdös-Rényi-like random graph G(C, M) that consists of the nodes in the core and the pseudo-edges.
Now we want to find a bound on the number of pseudo-edges. From Theorem 3.4.1 and Lemma 3.4.1 we have, with high probability, that the number of fair edges is larger than or equal to d|C|. Thus, at any step, the probability of adding a pseudo-edge is larger than or equal to $\frac{d}{n(c_p + c_i)}\,|C|$. By computing the expectation and applying the Chernoff bound, we get that the number of pseudo-edges in G(C, M) is in $\Omega\!\left(\frac{d}{c_p + c_i}\,|C|\right)$ with high probability. By noticing that only an o(1) fraction of the pseudo-edges are loops (i.e., the probability of introducing a loop at any step is $(d/n)^2$), and using the bound on the diameter of a G(N, M) when $M = N^{1+\alpha}$ presented in [57], we get that the maximum distance between any two nodes in the core is upper bounded by a constant.
The following corollary is a consequence of Lemma 3.5.1.
Corollary 3.5.1 Let $c_i < \frac{\beta}{1-\beta}\,c_p$; then the hubs are at constant distance in G(P, E) and in B(P, I) with high probability.
Now we prove Theorem 3.5.1.
Proof: Recall that from Lemma 3.4.2 we have that all nodes in P inserted after time φn, for any φ > 0, will have at least one preferential attachment edge incident to a hub with probability 1 − o(1). Now, let $X_i$ be a random variable such that:
$$X_i = \begin{cases} 1 & \text{if } i \text{ has a hub in its neighborhood} \\ 0 & \text{otherwise.} \end{cases}$$
The number of nodes that have at least one hub in their neighborhood is $\sum_{i=1}^{n} X_i \ge \sum_{i=\phi n}^{n} X_i$. From Lemma 3.4.2 it follows that $\mathrm{E}\!\left[\sum_{i=\phi n}^{n} X_i\right] \ge (1-c)n$, for any constant c > φ. Observe that each $X_i$ satisfies the Lipschitz condition with $d_i$ equal to 1. So by Theorem 3.3.1 we have that $\sum_{i=\phi n}^{n} X_i \ge (1-c')n$, for any constant c' > c. Hence the claim follows from Corollary 3.5.1.
3.6 Local routing and the interest space
In this section we analyze the performance of a local routing algorithm based on the interests. We notice that it is not clear whether the model introduced in Chapter 2 is navigable; in particular, it is not even clear what its diameter is. Our model has two separate graphs that evolve together, the friendship graph and the affiliation network. In this section we show that the
affiliation network naturally induces a space of interests that is navigable. This is a crucial feature of a model for Milgram's experiment, for it is known that cues other than geographic distance play a crucial role. For instance, in the experiment the target was defined not only by a location but, crucially, by a profession.
This is also the first study of the performance of a local routing algorithm on an evolving model: we study the navigability problem in an evolving graph with an evolving embedding. Furthermore, ours is the first model that can explain Milgram's experiment if we assume some constant attrition, as suggested in [48] (i.e., in this case only paths of constant length can be observed with high probability).
We start by defining a notion of distance between interests. In order to do this we first have to define the prototype graph G(I, Ẽ). The nodes of the prototype graph are the interests of the Affiliation Network, and two interests $i_1, i_2$ have an edge between them if $i_1$ has been selected as a prototype for $i_2$ or vice versa. Furthermore, two initial interests $i'$ and $i''$ contained in the graph $B_0(P, I)$ are connected if there is a person in $B_0(P, I)$ who is interested in both $i'$ and $i''$. Note that the prototype graph is composed of the clique of the initial interests and a DAG, and every non-initial interest is connected only with topics related to it.
Now we can define the distance between two interests.
Definition 3.6.1 [Distance between interests] Let i1 , i2 ∈ I. We define the distance
between i1 and i2 as the hop distance of the two nodes in the prototype graph. Further, we
define the interest distance between two people p1 and p2 as the smallest distance between
any pair of interests, where the first element of the pair is an interest of p1 and the second
is an interest of p2 .
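To make Definition 3.6.1 concrete, here is a minimal sketch of the two distance computations; it assumes the prototype graph is available as an adjacency-list dictionary, and all names (`prototype_graph`, `interests_p1`, etc.) are illustrative rather than taken from the thesis.

```python
from collections import deque

def hop_distances(prototype_graph, source):
    """BFS over the prototype graph; returns hop distances from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in prototype_graph[u]:      # adjacency list: interest -> neighbours
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def interest_distance(prototype_graph, interests_p1, interests_p2):
    """Interest distance between two people (Definition 3.6.1): the minimum
    hop distance over pairs (i1, i2) of their respective interests."""
    best = float("inf")
    for i1 in interests_p1:
        dist = hop_distances(prototype_graph, i1)
        for i2 in interests_p2:
            best = min(best, dist.get(i2, float("inf")))
    return best
```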
In our analysis we assume that every person knows the distance between any two interests. In practice we are assuming that every person is able to compute the similarity between any two interests and to decide which of his(her) neighbors is closest to the target¹. In particular, in our setting the message holder has knowledge of the distances between the interests of the destination and those of its neighbors. We define our routing algorithm as follows.
Definition 3.6.2 [Local Routing algorithm] In each step the message holder u performs
the following local algorithm:
• If the destination is a neighbor of u, the message is forwarded to it.
• Otherwise, u forwards the message to the neighbor that minimizes the interest distance
to the destination.
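A minimal sketch of this routing rule, reusing the hypothetical `interest_distance` helper above; `friends` (friendship adjacency lists) and `interests_of` (person to interest set) are again illustrative names, not part of the model's formal definition.

```python
def local_route(friends, interests_of, prototype_graph, source, target,
                max_steps=1000):
    """Greedy local routing of Definition 3.6.2; returns the path found."""
    path, current = [source], source
    for _ in range(max_steps):
        if target in friends[current]:        # first rule: deliver directly
            path.append(target)
            return path
        # second rule: forward to the neighbour minimizing interest distance
        current = min(friends[current],
                      key=lambda u: interest_distance(prototype_graph,
                                                      interests_of[u],
                                                      interests_of[target]))
        path.append(current)
    return None                               # give up after max_steps hops
```

Lemma 3.6.1 below shows that, in the model, each iteration of this loop either strictly decreases the interest distance to the target or delivers the message.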
We start by proving a basic property of our algorithm.
Lemma 3.6.1 In every step of the local routing algorithm, either the interest distance between the message holder and the destination is reduced, or the message is delivered to the target.
¹ Note that this assumption is also made in every previous navigation model. For example, in the Kleinberg model [58, 59] a node is always able to select the neighbor that is closest in the metric space to the target.
Proof: If the message holder knows the target, the lemma is true by Definition 3.6.2. Otherwise, let v be any interest of the message holder and let w(v) be an interest connected to v in the prototype graph but with smaller distance from the target. Note that w(v) always exists because the graph is connected.
There are three cases. Either $v, w(v) \in B_0(P, I)$, so there is a person in $B_0(P, I)$ interested in both v and w(v); or v is a prototype of w(v), or vice versa. These last two cases are symmetric, and in both of them v and w(v) have a neighbor in common in B(P, I) by definition of the evolving process. Thus, in any case, for any interest v of the message holder there is a person in the people graph who is interested in both v and w(v). Hence, in the neighborhood of the message holder, for any interest v, there is a person interested in w(v). So, using the local routing algorithm, it is always possible to forward the message to a neighbor closer to the target, and thus the claim follows.
We now show that for most source–destination pairs it is possible to route the message within a constant number of steps, provided that the destination is selected with a probability that is proportional to its degree, i.e. its "popularity" in the social network. This result is in accordance with the analysis of Milgram's experiment done by Kleinfeld [61], who pointed out that a successful outcome crucially depends on the social status of the target².
Theorem 3.6.1 Let $c_i < \frac{\beta}{1-\beta}\,c_p$. If the destination is selected with probability proportional to its degree and the source is selected uniformly at random then, with probability bigger than or equal to 1 − φ − o(1), for any constant φ > 0, the local routing algorithm routes the message in constantly many steps.
Proof: Let v be the destination; we first prove that with probability 1 − o(1) v is a hub. Let V(hubs, t) be the total volume of the hubs at time t, and V(G∖hubs, t) the total volume of the rest of the graph at time t. Recall that, as shown in Lemma 3.4.2, we have that, for t > φn:
$$V(\text{hubs}, t) = \sum_{v \in \text{hubs}} d_G(v) \in \Omega\!\left(t^{\,1+\gamma\left(1-\frac{c_i(1-\beta)}{c_p\beta}\right)}\right)$$
and
$$V(G \setminus \text{hubs}, t) = \sum_{v \notin \text{hubs}} d_G(v) \in \Theta\!\left(t^{\,1+\varepsilon\left(1-\frac{c_i(1-\beta)}{c_p\beta}\right)}\right),$$
for some ε such that γ > ε. Thus, when the destination is selected with probability proportional to its degree, with probability 1 − o(1) it will be a hub. In addition, note that Lemma 3.5.1 implies that two hubs are at constant distance also in the interest space. So, by Lemma 3.6.1, it holds with high probability that if a message reaches a hub it will need only a constant additional number of steps to reach every other hub using the local routing algorithm of Definition 3.6.2.
² Also this point is in contrast with the claim in [31], but on this point Kleinfeld wrote in [61], about the selection of the sample in Milgram's experiment: "I found in the archives the original advertisement recruiting subjects for the Wichita, Kansas study. This advertisement was worded so as to attract not representative people but particularly sociable people proud of their social skills and confident of their powers to reach someone across class barriers." Besides this, there are other experiments suggesting that social barriers can actually stop Milgram's local routing algorithm [62, 74].
Now note that Lemma 3.4.1 implies that all the hubs are inserted before time φn with high probability, for every constant φ > 0. Further, by Lemma 3.4.2, every node inserted after time φn will have a hub in its neighborhood with probability 1 − o(1). So with probability 1 − φ − o(1) the destination is a hub and the source has at least one hub in its neighborhood. Thus the local routing algorithm of Definition 3.6.2 will deliver the message in a constant number of rounds with probability bigger than or equal to 1 − φ − o(1).
We now consider another interesting setting, in which we expand the interests of the destination so that they include the interests of its neighbors. We call this the expanded interests setting. It is an attempt to capture the additional knowledge that human subjects have about the destination, apart from its personal information, and it is interesting because it captures some features of the original Milgram experiment. For instance, in the first experiment presented by Milgram in [79], the sources also knew that the target was married to a divinity student at Cambridge.
In this setting we can prove the following theorem.
Theorem 3.6.2 Let $c_i < \frac{\beta}{1-\beta}\,c_p$. In the expanded interests setting, when source and destination are selected uniformly at random then, with probability 1 − 2φ − o(1), the local routing algorithm will route the message in constantly many steps, for every constant φ > 0.
Proof: The proof strategy is similar to that of the previous theorem; the main difference is that there the hubs played the crucial role, while here the central role is played by the nodes in the core.
Let v be the destination; we first prove that with probability 1 − o(1) v has a neighbor with an interest in the core. Let $E_C(t)$ be the number of folded edges generated by an interest in the core at time t, and E(t) the total number of edges at time t. Using the same strategy as in Lemma 3.4.2, we get that, for t > φn:
$$E_C(t) \in \Omega\!\left(t^{\,1+\gamma\left(1-\frac{c_i(1-\beta)}{c_p\beta}\right)}\right) \qquad\text{and}\qquad E(t) - E_C(t) \in \Theta\!\left(t^{\,1+\varepsilon\left(1-\frac{c_i(1-\beta)}{c_p\beta}\right)}\right),$$
where γ > ε. Now, with probability 1 − φ the destination is a node inserted after time φn, and by Lemma 3.4.1 every interest in the core has been inserted before time φn, for every constant φ > 0. So with high probability the destination is connected with a preferential attachment edge to a node that is interested in a topic in the core. Thus, if we augment the interests of the destination with those of its neighbors, we have that with probability 1 − φ − o(1) the new set of interests contains an interest in the core.
But now, as shown in Lemma 3.4.2, the source has a hub in its neighborhood with probability 1 − φ − o(1). Furthermore, by Lemma 3.5.1 it follows that if the message is at a hub, then in a constant number of steps it can reach every node that has an interest in the core. Thus, using the same argument as in Lemma 3.6.1, we get the claim.
Now we study the most general case, when source and target are chosen adversarially and we do not extend the interest space of the destination. In this setting we are able to show the following upper bound on the running time of the local routing algorithm.
Theorem 3.6.3 If $c_i < \frac{\beta}{1-\beta}\,c_p$ then, for any source and any destination, the local routing algorithm routes the message within O(log² n) steps with high probability.
Proof: To prove the result we will bound the diameter of the prototype graph; by Lemma 3.6.1, the diameter is an easy upper bound on the number of steps of our local routing algorithm. In particular we will show that, with high probability (whp), the diameter of the prototype graph is O(log² n).
The general idea of the proof is to divide the random process into O(log n) macro-phases and to show that in each macro-phase the probability that the diameter increases by ω(log n) is o(1/log n). Thus we get that the diameter is O(log² n) whp.
Let us divide our evolving process into O(log n) phases. In phase zero we group the first 600 log n steps. Phase one runs from the end of phase zero to step ⌊(1 + ε) 600 log n⌋, for a small constant ε > 0. Phase two is up to step ⌊(1 + ε)² 600 log n⌋. In general, phase i starts after the end of phase i − 1 and ends at step ⌊(1 + ε)^i 600 log n⌋.
Let us now consider a generic phase t > 0. Let $T = (1+\varepsilon)^t\, 600\log n$. First we want to bound the number of edges that we have at the beginning of each phase in B(P, I). Let $A_t$ be the random variable that counts the number of edges at the beginning of phase t. We have that $\mathrm{E}[A_t] = (\beta c_p + (1-\beta)c_i)\,T$. By the Chernoff bound we have that
$$\Pr\!\left[\,\left|\mathrm{E}[A_t] - A_t\right| > \frac{1}{10}\,\mathrm{E}[A_t]\right] \le \exp\!\left(-\frac{\mathrm{E}[A_t]}{300}\right) \le \frac{1}{n^2}.$$
Thus, using the union bound on the number of macro-phases, it follows that at the beginning of each phase t we have $\frac{9}{10}\mathrm{E}[A_t] \le A_t \le \frac{11}{10}\mathrm{E}[A_t]$ with high probability. In the rest of the proof we will assume that $\frac{9}{10}\mathrm{E}[A_t] \le A_t \le \frac{11}{10}\mathrm{E}[A_t]$.
To get a bound on the diameter, we start by studying the two following events $\xi_1$ and $\xi_2$:
$\xi_1(j)$ = {interest j, inserted in phase t, of degree $c_i$, is selected in a step during phase t as a prototype for the first time};
$\xi_2(j)$ = {interest j, inserted in phase t, of degree $c_i$, increases its degree in a step during phase t}.
First notice that, from the definition of the evolving process, we have that $\Pr[\xi_1(j)] \le \frac{c_i}{A_t} \le \frac{10\,c_i}{9\,T}$.
To bound $\Pr[\xi_2(j)]$, recall that interest j has degree $c_i$, so there are $c_i$ people interested in it; denote them by $p_1, p_2, \dots, p_{c_i}$. Now if j increases its degree, this implies that a new person arrives in the graph and copies the interest j from one of the people interested in it, $p_1, p_2, \dots, p_{c_i}$. This happens with probability:
$$\Pr[\xi_2(j)] \le \sum_{i=1}^{c_i} \frac{10\,d_i}{9\,T}\left(1 - \left(1 - \frac{1}{d_i}\right)^{c_p}\right).$$
By an application of calculus, it is possible to see that this probability is maximized when $d_1 = \dots = d_{c_i} = T$. Thus
$$\Pr[\xi_2(j)] \le c_i\left(1 - \left(1 - \frac{1}{T}\right)^{c_p}\right) \le c_i\left(1 - e^{-\frac{c_p}{T}}\right) \le \frac{c_p\,c_i}{T}.$$
So $\Pr[\xi_1(j) \vee \xi_2(j)] \le \frac{2\,c_p\,c_i}{T}$. Let us define $\xi(j) = \xi_1(j) \vee \xi_2(j)$.
Now we can compute the probability that in phase t the diameter of the prototype graph increases by more than C, with C > e. Let us call this event $\tau_C$. Note that $\tau_C$ implies that a sequence of C new interests added in phase t increases the diameter of the prototype graph by C; in order for this event to hold, ξ has to occur at least C times in a phase. So we can upper bound $\Pr[\tau_C]$ as follows:
$$\Pr[\tau_C] \;\le\; (\#\text{ of steps in a phase}) \cdot \binom{\#\text{ of new interests in a phase}}{C} \cdot \Pr[\xi(j)]^{C} \;\le\; \lceil T\rceil \binom{\lceil T\rceil}{C}\left(\frac{2\,c_p c_i}{T}\right)^{C} \;\le\; \lceil T\rceil \left(\frac{e\,T}{C}\right)^{C}\left(\frac{2\,c_p c_i}{T}\right)^{C} \;=\; \lceil T\rceil\left(\frac{2\,e\,c_p c_i}{C}\right)^{C},$$
where in the last inequality we used Stirling's approximation [86, 106]. Therefore the probability of $\tau_C$ decreases geometrically with C.
Finally, let us compute the probability that the final diameter is bigger than $K = k\log^2 n$. After the first phase the diameter is at most $600\log n$, so we can bound this probability by the probability that the diameter increases by at least $K - 600\log n$ after phase 1. Hence
$$\Pr[\text{diameter is at least } K] \;\le\; \sum_{\substack{k_2, k_3, \dots, k_{\log_{1+\varepsilon} n}\\ \sum_{i=2}^{\log_{1+\varepsilon} n} k_i = K - 600\log n}} \;\prod_{i=2}^{\log_{1+\varepsilon} n} \Pr[\tau_{k_i}] \;\le\; \log_{1+\varepsilon} n \cdot (K - 600\log n) \cdot \lceil T\rceil^{\log n} \cdot (2\,c_p c_i)^{K - 600\log n} \;\le\; \log_{1+\varepsilon} n \cdot (K - 600\log n) \cdot \Theta\!\left(n^{\log n} \cdot n^{-k\log n}\right) \in o(1).$$
Thus, by fixing a big enough k, the claim follows.
3.7 Experiments
Our mathematical model of social networks, building on the affiliation network model, suggests natural decentralized routing algorithms in social networks. Namely, given a source
vertex s and a target vertex t, identify the interests of s and t in the underlying affiliation network, and identify the neighbor of s whose interests are closest to those of t (with respect to the hierarchy of interests implied by the prototype selection step). Inspired by this, one can define natural algorithms that perform decentralized routing in real-world social networks by suitably approximating the process of navigating the interest hierarchy. In this section we do precisely this, and report our findings based on simple experiments with a modestly-sized social network.
Our social network consists of authors as nodes, with edges defined by co-authorship of one or more articles. We downloaded a copy of the DBLP database of computer science papers, a database of roughly 735,000 authors and 1.24M articles, and constructed the co-authorship graph with about 4.63M edges (for an average degree of roughly 6.7 co-authors per node). On this network, we randomly selected about 575 source–target pairs and attempted to construct paths between them. The largest connected component in this network has roughly 80% of the vertices, with the rest of the vertices in very small isolated components, so that the probability that two randomly selected nodes belong to the largest connected component is roughly 64%. The mean length of the shortest path between nodes in this component is roughly 6.3 (with a median length of 6).
Notice that in this way we obtain an affiliation network where two authors are friends if they coauthored a paper; we now have to infer a metric on the interests in order to route the messages. Unfortunately this is not easy, because there is no clear definition of closeness between papers, and all the standard classification systems for papers are too poor for our purpose. To overcome this difficulty we define the interest space not as the set of papers but as the set of bigrams and unigrams contained in the titles of the papers.
In particular, we begin by segmenting article titles into one-word and two-word sequences (unigrams and bigrams), after suitably eliminating stopwords that occur commonly ('and', 'the', etc.). For instance, the title "Small world experiments for everyone" generates four unigrams, 'small', 'world', 'experiments', and 'everyone', and two bigrams, 'small world' and 'world experiments'. Both bigrams and unigrams are treated as interests, with the former of a more generic kind; for instance, the unigram 'physics' is somewhat general, whereas the bigram 'particle physics' is much more specific. In this fashion, for every author, their interest profile is identified; specifically, for author a and interest i, we define s(i, a), the strength of interest i for author a, as the number of occurrences of interest (unigram/bigram) i within author a's publications.
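The following sketch illustrates this profile construction; the stopword list is a simplified placeholder and the function names are ours, not taken from the experimental code.

```python
import re
from collections import Counter

STOPWORDS = {"and", "the", "of", "for", "a", "an", "in", "on", "to"}  # simplified

def interests_from_title(title):
    """Unigrams and bigrams of a title, after dropping common stopwords.
    Bigrams use only words adjacent in the original title, matching the
    'small world experiments for everyone' example above."""
    raw = re.findall(r"[a-z]+", title.lower())
    unigrams = [w for w in raw if w not in STOPWORDS]
    bigrams = [f"{u} {v}" for u, v in zip(raw, raw[1:])
               if u not in STOPWORDS and v not in STOPWORDS]
    return unigrams + bigrams

def build_profiles(titles_by_author):
    """s(i, a): occurrences of interest i across author a's paper titles."""
    return {author: Counter(i for t in titles for i in interests_from_title(t))
            for author, titles in titles_by_author.items()}
```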
To simulate Milgram's experiment, our basic algorithm operates as follows: if we are currently at node x, we move to the neighbor y of x whose interest profile is closest to the target t, where the measure of proximity of y to t is computed according to the formula
$$\mathrm{proximity}(y, t) = \sum_{\text{interest } i} \frac{s(i, y)\, s(i, t)}{p(i)},$$
where p(i) denotes the overall popularity of interest i, defined by $p(i) = \sum_a s(i, a)$. If there is no neighbor with non-zero proximity, we either declare failure or, in a variation of the experiment, proceed greedily to the neighbor of highest degree.
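A sketch of this step, under the same illustrative names as above (`profiles[a]` maps interests to strengths s(i, a), `popularity[i]` is p(i)); it is one plausible rendering of the procedure just described, not the thesis's actual experimental code.

```python
def proximity(profile_y, profile_t, popularity):
    """proximity(y, t) = sum over common interests i of s(i,y)*s(i,t)/p(i)."""
    common = profile_y.keys() & profile_t.keys()
    return sum(profile_y[i] * profile_t[i] / popularity[i] for i in common)

def next_hop(friends, profiles, popularity, x, target, fallback=True):
    """One step of the basic experiment: move to the neighbour of x whose
    profile is closest to the target's; optionally fall back to the
    highest-degree neighbour when every proximity is zero."""
    scored = [(proximity(profiles[y], profiles[target], popularity), y)
              for y in friends[x]]
    best_score, best = max(scored, key=lambda sy: sy[0])
    if best_score > 0:
        return best
    if fallback:
        return max(friends[x], key=lambda y: len(friends[y]))
    return None                     # declare failure
```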
The most basic variant of the algorithm outlined insists that the proximity measure strictly increase in each step of the routing: this version is called Local-Monotone, and the version without this restriction is called Local. The next variation we consider is to allow one
step of 'lookahead', where we not only evaluate the neighbors of x but also the neighbors of neighbors of x, and route through the neighbor whose neighbor achieves the highest proximity to the target. This idea of 'lookahead', very common in computer science, captures the belief that in real social networks one not only has knowledge about one's friends, but often also partial knowledge about friends-of-friends. The corresponding non-monotone and monotone variations are called, respectively, Lookahead and Lookahead-Monotone.
In a third variation, we allow the algorithm knowledge not only of the target's interests, but also of those of its neighbors; this is a 'reverse' and limited form of lookahead, and has precedent in Milgram's experiment, where the sources had the knowledge that the target was the wife of a student of divinity in Cambridge, Mass. This variation is naturally aimed at routing to hard-to-reach destinations by augmenting the algorithm with extra information. The corresponding variations of the four algorithms described above are Local-Expand, Local-Monotone-Expand, and so on.
Figures 3.1 and 3.2 report the percentage of successful chains for the eight variations of the decentralized routing algorithm we studied. For reference, we compare the performance of the decentralized routing algorithms to that of the omniscient algorithm that has full information about the network structure and employs a standard 'shortest path' computation. The 'success percentage' in Figures 3.1 and 3.2 is the percentage of source–target pairs successfully routed, divided by 0.64 (which is the success fraction for the omniscient algorithm). The results are presented in four groups, each corresponding to one value of a parameter called τ, which restricts the sampling of the target nodes to be uniform among all nodes of degree at least τ; this is done to explore the role of the centrality of the target in determining the success of decentralized routing.
Figure 3.1: Success rate without extended interests (success rate vs. minimum degree of the destinations; curves: Lookahead Monotone, Local Monotone, Lookahead, Local).
We briefly highlight some salient observations based on Figures 3.1, 3.2, 3.3 and 3.4 and other related experiments.
(1) Navigation based on interests is an extremely powerful paradigm; the success of the basic algorithm Local in achieving 21% successful routing is, a priori, unexpected, given how crude our construction of the interest space is. In particular, previous replicas of the small-world experiment have always had lower success rates [?, ?].
(2) Adding even one of two natural cues to local routing (either expanding the interests of the target or adding a step of lookahead) is enormously powerful: each cue raises the success rate to about 57% and reduces the path length from about 24 to about 12.
Figure 3.2: Success rate with extended interests (success rate vs. minimum degree of the destinations; curves: Lookahead Monotone, Local Monotone, Lookahead, Local).
Figure 3.3: Average path length without extended interests (path length vs. minimum degree of the destinations; curves: Lookahead Monotone, Local Monotone, Lookahead, Local).
(3) Adding both interest expansion and lookahead results in 80% successful routing, with extremely short paths (a median path length of 7).
(4) Insisting on monotonically better proximity to the target's interests typically reduces the success rate, but significantly improves the length of the constructed path, for each of the four variations of the algorithm.
(5) Picking the target from a distribution that is restricted to targets of a certain minimum degree dramatically improves the success rate and path length for decentralized routing algorithms. While this restriction might appear strange, it captures the idea that even modestly 'well-connected' nodes are significantly easier to reach than completely isolated ones. When we place a minimum degree restriction of 15 (recall that the average degree is only 6.7), the best algorithm achieves a 97% success rate and produces paths almost as short as the shortest possible! Even the simplest of algorithms, Local, succeeds in 50% of the cases; this reinforces the argument made by Kleinfeld [61], who, analyzing Milgram's experiments, suggests that the success of the routing depends, at least to some extent, on the fact that the target was not a completely isolated person but one well-connected in terms of geographic location, employment, social status, etc.
Figure 3.4: Average path length with extended interests (path length vs. minimum degree of the destinations; curves: Lookahead Monotone, Local Monotone, Lookahead, Local).
Chapter 4
Gossip
In this chapter we show that if a connected graph with n nodes has conductance φ then rumour spreading, also known as randomized broadcast, successfully broadcasts a message within Õ(φ⁻¹ · log n) rounds with high probability, regardless of the source, by using the PUSH-PULL strategy. The Õ(·) notation hides a polylog(φ⁻¹) factor. This result is almost tight, since there exist graphs with n nodes and conductance φ whose diameter is Ω(φ⁻¹ · log n).
If, in addition, the network satisfies some kind of uniformity condition on the degrees, our analysis implies that both PUSH and PULL, by themselves, successfully broadcast the message to every node in the same number of rounds.
4.1 Introduction
Rumour spreading, also known as randomized broadcast or randomized gossip (all terms that will be used as synonyms throughout the chapter), refers to the following distributed algorithm. Starting with one source node with a message, the protocol proceeds in a sequence of synchronous rounds with the goal of broadcasting the message, i.e. delivering it to every node in the network. In round t ≥ 0, every node that knows the message selects a neighbour uniformly at random to which the message is forwarded. This is the so-called PUSH strategy. The PULL variant is symmetric: in round t ≥ 0, every node that does not yet have the message selects a neighbour uniformly at random and asks for the information, which is transferred provided that the queried neighbour knows it. Finally, the PUSH-PULL strategy is a combination of both: in round t ≥ 0, each node selects a random neighbour and performs a PUSH if it has the information or a PULL in the opposite case.
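As a concrete illustration, here is a minimal simulation of one synchronous round of PUSH-PULL on an adjacency-list graph (a sketch for intuition, not the object analysed below; it assumes the graph is connected).

```python
import random

def push_pull_round(graph, informed):
    """One synchronous round: every node contacts one uniformly random
    neighbour; informed nodes PUSH, uninformed nodes PULL."""
    newly = set()
    for u in graph:
        v = random.choice(graph[u])
        if u in informed and v not in informed:
            newly.add(v)            # u pushes the message to v
        elif u not in informed and v in informed:
            newly.add(u)            # u pulls the message from v
    return informed | newly

def broadcast_time(graph, source):
    """Rounds until every node is informed (assumes a connected graph)."""
    informed, rounds = {source}, 0
    while len(informed) < len(graph):
        informed = push_pull_round(graph, informed)
        rounds += 1
    return rounds
```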
These three strategies were introduced in [30] and have since been intensely investigated (see the related work section). One of the most studied questions concerns their completion time: how many rounds will it take for one of the above strategies to disseminate the information to all nodes in the graph, assuming a worst-case source? In this chapter we prove the following two results:
The work described in this chapter is joint work with F. Chierichetti and A. Panconesi; its extended abstract appeared in the Proceedings of the 42nd ACM Symposium on Theory of Computing (STOC 2010) [27].
• If a network has conductance φ and n nodes then, with high probability, PUSH-PULL reaches every node within $O\!\left(\frac{\log^2 φ^{-1}}{φ}\cdot\log n\right)$ many rounds, regardless of the source.
• If, in addition, the network satisfies the following condition for every edge uv and some constant α > 0:
$$\max\!\left(\frac{\deg(u)}{\deg(v)}, \frac{\deg(v)}{\deg(u)}\right) \le \alpha,$$
then both PUSH and PULL, by themselves¹, reach every node within $O\!\left(c_\alpha\cdot φ^{-1}\cdot\log n\cdot\log^2 φ^{-1}\right)$ many rounds with high probability, regardless of the source, where $c_\alpha$ is a constant depending only on α.
The first result is a significant improvement over the best previously known bound of O(log⁴ n/φ⁶) [27]. (The proof of [27] is based on an interesting connection with spectral sparsification [98]; the approach followed here is entirely different.) The result is almost tight because Ω(log n/φ) is a lower bound²; in particular, the bound is tight in the case of constant conductance (for instance, this is the case for the almost-preferential-attachment graphs of [78]). The second result can be proved using the same approach we use for the main one.
Our main motivation comes from the study of social networks. Loosely stated, we are
looking for a theorem of the form “Rumour spreading is fast in social networks”. There
is some empirical evidence showing that real social networks have high conductance. The
authors of [72] report that in many different social networks there exist only cuts of small
(logarithmic) size having small (inversely logarithmic) conductance – all other cuts appear
to have larger conductance. That is, the conductance of the social networks they analyze is
larger than a quantity seemingly proportional to an inverse logarithm.
Our work should also be viewed in the context of the expansion properties of graphs, of which conductance is an important example, and of their relationship with rumour spreading. In particular we observe how, interestingly, the convergence time of the PUSH-PULL process on graphs of conductance φ is a factor of φ smaller than the worst-case mixing time of the uniform random walk on such graphs.
Conductance is one of the most studied measures of graph expansion; edge expansion and vertex expansion are two other notable measures. In the case of edge expansion there are classes of graphs for which the protocol is slow (see [26] for more details), while the problem remains open for vertex expansion.
¹ We observe that the star, a graph of conductance O(1), is such that both the PUSH and the PULL strategies by themselves require Ω(n) many rounds to spread the information to each node, assuming a worst-case, or even uniformly random, source. That is, conductance alone is not enough to ensure that PUSH, or PULL, spreads the information fast.
² Indeed, choose any n and any φ ≥ n^{−1+ε}. Take any 3-regular graph of constant vertex expansion (a random 3-regular graph will suffice) on O(n · φ) nodes. Then substitute each edge of the regular graph with a path of O(φ⁻¹) new nodes. The graph obtained is easily seen to have O(n) nodes, diameter O(φ⁻¹ · log n) and conductance Ω(φ).
In terms of message complexity, we observe first that it has been determined precisely only for very special classes of graphs (cliques [55] and Erdös-Rényi random graphs [36]). Apart from this, given the generality of our class, it is impossible to improve the trivial upper bound on the number of messages, that is, number of rounds times number of nodes. For instance, consider the "lollipop graph"³. Fix ω(n⁻¹) < φ < o(log⁻¹ n), and suppose we have a path of length φ⁻¹ connected to a clique of size n − φ⁻¹ = Θ(n). This graph has conductance ≈ φ. Let the source be any node in the clique. After Θ(log n) rounds each node in the clique will have the information. Furthermore, at least φ⁻¹ steps will be needed for the information to be sent to each node in the path. So, at least n − φ⁻¹ = Θ(n) messages are pushed (by the nodes in the clique) in each round, for at least φ⁻¹ − Θ(log n) = Θ(φ⁻¹) rounds. Thus, the total number of messages sent will be Ω(n · φ⁻¹). Observing that the running time is Θ(φ⁻¹ + log n) = Θ(φ⁻¹), we have that the total number of rounds times n is (asymptotically) less than or equal to the number of transmitted messages.
4.2 Related work
The literature on the gossip protocol and on social networks is huge, and we confine ourselves to what appears to be most relevant to the present work.
Clearly, at least as many rounds as the diameter are needed for the gossip protocol to
reach all nodes. It has been shown that O(n log n) rounds are always sufficient for each
connected graph of n nodes [39]. The problem has been studied on a number of graph
classes, such as hypercubes, bounded-degree graphs, cliques and Erdös-Rényi random graphs
(see [39, 44, 88]). Recently, there has been a lot of work on “quasi-regular” expanders (i.e.,
expander graphs for which the ratio between the maximum and minimum degree is constant)
— it has been shown in different settings [7, 32, 33, 43, 95] that O(log n) rounds are sufficient
for the rumour to be spread throughout the graph. See also [56,82]. Our work can be seen as
an extension of these studies to graphs of arbitrary degree distribution. Observe that many
real world graphs (e.g., facebook, Internet, etc.) have a very skewed degree distribution —
that is, the ratio between the maximum and the minimum degree is very high. In most
social networks’ graph models the ratio between the maximum and the minimum degree can
be shown to be polynomial in the graph size.
Mihail et al. [78] study the edge expansion and the conductance of graphs that are very similar to preferential attachment (PA) graphs; we shall refer to these as "almost" PA graphs. They show that edge expansion and conductance are constant in these graphs. Their result and ours together imply that rumour spreading requires O(log n) rounds on almost PA graphs. As for the original PA graphs, in [26] it is shown that rumour spreading is fast (requires time O(log² n)) in those networks.
In [17] it is shown that high conductance implies that non-uniform (over neighbours) rumour spreading succeeds. By non-uniform we mean that, for every ordered pair of neighbours i and j, node i will select j with probability $p_{ij}$ for the rumour spreading step (in general, $p_{ij} \ne p_{ji}$). This result does not extend to the case of uniform probabilities studied in this chapter. In our setting (but not in theirs), the existence of a non-uniform distribution that makes rumour spreading fast is a rather trivial matter: a graph of conductance φ has diameter bounded by O(φ⁻¹ log n); thus, in a synchronous network, it is possible to elect a leader in O(φ⁻¹ log n) many rounds and set up a BFS tree rooted at it. By assigning probability 1 to the edge between a node and its parent one obtains the desired non-uniform probability distribution. Thus, from the point of view of this chapter, the existence of non-uniform probabilities is rather uninteresting.
In [82] the authors consider a problem that at first sight might appear equivalent to ours. They consider the conductance $φ_P$ of the connection probability matrix P, whose entry $P_{i,j}$, 1 ≤ i, j ≤ n, gives the probability that i calls j in any given round. They show that if P is doubly stochastic then the running time of PUSH-PULL is $O(φ_P^{-1}\cdot\log n)$. This might seem to subsume our result, but this is not the case. The catch is that they consider the conductance of a doubly stochastic matrix instead of the actual conductance of the graph, as we do. Observe that there are graphs of high conductance that do not admit doubly stochastic matrices of high conductance. For instance, in the star, no matter how one sets the probabilities $P_{ij}$, there will always exist a leaf ℓ that is contacted by the central node with probability ≤ 1/(n−1). Since the matrix is doubly stochastic, this implies that ℓ will contact the central node with probability O(n⁻¹). Thus, at least Ω(n) rounds will be needed. Therefore their result gives too weak a bound for the uniform PUSH-PULL process that we analyze in this chapter.
4.3 Preliminaries
Observe that $\frac{1}{2}\,\mathrm{vol}(V) = |E|$. Given S ⊆ V and v ∈ S, we define
$$N_S^+(v) = \{w \mid w \in V - S \,\wedge\, \{v, w\} \in E\}$$
and $d_S^+(v) = |N_S^+(v)|$. Analogously, we define $N_S^-(w) = N_{V-S}^+(w)$ and $d_S^-(w) = |N_S^-(w)|$.
Recall that the conductance (see [51]) of a graph G(V, E) is
$$\Phi(G) = \min_{S \subset V:\, \mathrm{vol}(S) \le |E|} \frac{\mathrm{cut}(S, V - S)}{\mathrm{vol}(S)},$$
where cut(S, V − S) is the number of edges in the cut between S and V − S and vol(S) is the volume of S.
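For small graphs, this definition can be checked directly by brute force; the following sketch (ours, exponential-time, assuming no isolated vertices) does exactly that.

```python
from itertools import combinations

def volume(graph, nodes):
    return sum(len(graph[v]) for v in nodes)

def cut_size(graph, S):
    S = set(S)
    return sum(1 for v in S for w in graph[v] if w not in S)

def conductance(graph):
    """Phi(G): minimum of cut(S, V-S)/vol(S) over cuts with vol(S) <= |E|.
    Enumerates all subsets, so only usable as a sanity check."""
    nodes = list(graph)
    m = volume(graph, nodes) // 2            # |E| = vol(V)/2
    best = float("inf")
    for k in range(1, len(nodes)):
        for S in combinations(nodes, k):
            vol_s = volume(graph, S)
            if 0 < vol_s <= m:
                best = min(best, cut_size(graph, S) / vol_s)
    return best
```

On the star this returns Φ = 1, while on a long path it returns Θ(1/n), matching the intuition that conductance measures bottlenecks.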
We recall three classic concentration results for random variables using, respectively, the
first moment, the second moment and every moment of a random variable X.
Theorem 4.3.1 (Markov inequality) Let X be a random variable. Then, for every ε > 0,
$$\Pr\!\left[|X| \ge \frac{\mathrm{E}[|X|]}{ε}\right] \le ε.$$
Theorem 4.3.2 (Chebyshev inequality) Let X be a random variable. Then, for every ε > 0,
$$\Pr\!\left[|X - \mathrm{E}[X]| \ge \sqrt{\mathrm{Var}[X]/ε}\,\right] \le ε,$$
where Var[X] is the variance of X, $\mathrm{Var}[X] = \mathrm{E}[X^2] - \mathrm{E}[X]^2$.
Theorem 4.3.3 (Chernoff bound) Let $X = \sum_{i=1}^{n} X_i$, where the $X_i$ are independently distributed random variables in [0, 1]. Then, for 0 < ε < 1,
$$\Pr\!\left[|X - \mathrm{E}[X]| > ε \cdot \mathrm{E}[X]\right] \le \exp\!\left(-\frac{ε^2}{3}\cdot\mathrm{E}[X]\right).$$
We now state and prove some technical lemmas that we will use in our analysis. The
first one can be seen as an “inversion” of Markov’s inequality.
Lemma 4.3.1 Suppose $X_1, X_2, \dots, X_t$ are random variables, with $X_i$ having co-domain $\{0, v_i\}$ and such that $X_i = v_i$ with probability $p_i$. Fix $p \le \min_i p_i$. Then, for each 0 < q < p,
$$\Pr\!\left[\sum_i X_i \ge \left(1 - \frac{1-p}{1-q}\right)\cdot\sum_i v_i\right] \ge q.$$
Proof: Let $\overline{X}_i = v_i - X_i$. Observe that each $X_i$ and each $\overline{X}_i$ is a non-negative random variable, of mean $p_i v_i$ and $(1 - p_i) v_i$, respectively. We use μ to denote the expected sum of the $X_i$, $\mu = \sum_i p_i v_i$, and $\overline{\mu}$ to denote the expected sum of the $\overline{X}_i$, $\overline{\mu} = \sum_i (1 - p_i) v_i$. Observe that $\overline{\mu} \le (1-p)\sum_i v_i$.
We have
$$\Pr\!\left[\sum_i X_i \le \left(1 - \frac{1-p}{1-q}\right)\sum_i v_i\right] \le \Pr\!\left[\sum_i X_i \le \sum_i v_i - \frac{1}{1-q}\,\overline{\mu}\right] = \Pr\!\left[\sum_i \overline{X}_i \ge \frac{1}{1-q}\,\overline{\mu}\right] \le 1 - q,$$
where in the last step we applied Markov's inequality. Thus the claim.
The next lemma gives some probabilistic bounds on the sum of binary random variables
having close expectations.
Lemma 4.3.2 Let p ∈ (0, 1). Suppose $X_1, \dots, X_t$ are independent 0/1 random variables, the i-th of which is such that $\Pr[X_i = 1] = p_i$, with $\frac{1}{2}\,p \le p_i \le p$. Then,
1. if $\frac{pt}{2} > 1$, then $\Pr\!\left[\sum X_i \ge \frac{pt}{4}\right] \ge \frac{1}{32}$;
2. if $\frac{pt}{2} \le 1$, then $\Pr\!\left[\sum X_i \ge 1\right] \ge \frac{pt}{4}$;
3. in general, for $P = \min\!\left(\frac{1}{32}, \frac{pt}{4}\right)$, we have
$$\Pr\!\left[\sum X_i \ge \frac{pt}{128 \cdot P}\right] \ge P.$$
Proof: Let $X = \sum_{i=1}^{t} X_i$. In the first case, $\mathrm{E}[X] \ge \frac{tp}{2}$; in particular, $\mathrm{E}[X] \ge 1$. Therefore, by the Chernoff bound, we have
$$\Pr\!\left[X < \frac{1}{2}\cdot\mathrm{E}[X]\right] \le e^{-\frac{1}{16}\mathrm{E}[X]} \le e^{-\frac{1}{16}} \le 1 - \frac{1}{32},$$
where the last inequality follows from $e^{-x} \le 1 - \frac{x}{2}$ if x ∈ [0, 1].
In the second case, we compute the probability that $X_i = 1$ for no i:
$$\prod_{i=1}^{t}\Pr[X_i = 0] \le \prod_{i=1}^{t}\left(1 - \frac{p}{2}\right) = \left(1 - \frac{p}{2}\right)^{t} \le e^{-\frac{p}{2}\cdot t} \le 1 - \frac{pt}{4}.$$
So, with probability ≥ $\frac{pt}{4}$, at least one $X_i$ will be equal to 1.
The third case follows directly from the former two, by choosing, respectively, $P = \frac{1}{32}$ and $P = \frac{pt}{4}$.
The following lemma, which we will use later in the analysis, gives a probability bound close to one that could be obtained using Bernstein's inequality. We keep it in this form for simplicity of exposition of our later proofs.
Lemma 4.3.3 Suppose a player starts with a time budget of B time units. At each round i, an adversary (knowledgeable of the past) chooses a number of time units $1 \le \ell_i \le L$. If the remaining budget of the player is at least $\ell_i$ then a game, lasting for $\ell_i$ time units, is played. The outcome of the game is determined by an independent random coin flip: with probability $p_i \ge P$ the gain is equal to $\ell_i$, the length of the round, and with probability $1 - p_i$ the gain is zero. The game is then repeated.
If $B \ge 193 \cdot \frac{L}{P} \cdot \ln\frac{\lceil \log_2 L\rceil}{\delta}$ then, with probability at least 1 − δ, the gain is at least $\frac{24}{193} \cdot B \cdot P$.
Proof: Let the game go on until the end. Suppose the adversary chose games' lengths $\ell_1, \ell_2, \dots, \ell_t$, with $\sum_{i=1}^{t} \ell_i > B - L \ge \frac{192}{193}\,B$.
Let $X_j$ be the set containing all the rounds whose $\ell_i$'s were such that $2^j \le \ell_i < 2^{j+1}$, $X_j = \{i \mid 2^j \le \ell_i < 2^{j+1}\}$. The sets $X_0, X_1, \dots, X_{\lceil \log_2 L\rceil}$ partition the rounds into O(log L) buckets.
Associate with each bucket $X_j$ the total number $S(X_j)$ of time units "spent" in that bucket, $S(X_j) = \sum_{i \in X_j} \ell_i$.
Let $\mathcal{X}$ be the set of buckets $X_j$ for which $S(X_j) \ge \frac{12}{P} \cdot 2^{j+1} \cdot \ln\frac{\lceil\log_2 L\rceil}{\delta}$. The total number of time units spent in buckets not in $\mathcal{X}$ will then be at most
$$\sum_{j=0}^{\lceil\log_2 L\rceil} \frac{12}{P} \cdot 2^{j+1} \cdot \ln\frac{\lceil\log_2 L\rceil}{\delta} = \frac{12}{P} \cdot \ln\frac{\lceil\log_2 L\rceil}{\delta} \cdot \sum_{j=0}^{\lceil\log_2 L\rceil} 2^{j+1} \le \frac{96}{P} \cdot L \cdot \ln\frac{\lceil\log_2 L\rceil}{\delta}.$$
Therefore the total number of units spent in buckets of $\mathcal{X}$, $S(\mathcal{X}) = \sum_{X_j \in \mathcal{X}} S(X_j)$, will be at least $S(\mathcal{X}) \ge \frac{96}{193}\,B$. Furthermore, the number of rounds $|X_j|$ played in bucket $X_j \in \mathcal{X}$ will be at least $S(X_j) \cdot 2^{-(j+1)} \ge \frac{12}{P} \cdot \ln\frac{\lceil\log_2 L\rceil}{\delta}$. Each such round will let us gain a positive amount with probability at least P. Therefore, the expected number of rounds $\mathrm{E}[W(X_j)]$ in bucket $X_j \in \mathcal{X}$ having positive gain will be at least
$$\mathrm{E}[W(X_j)] \ge 2^{-(j+1)} \cdot S(X_j) \cdot P \ge 12 \cdot \ln\frac{\lceil\log_2 L\rceil}{\delta}.$$
By the Chernoff bound,
$$\Pr\!\left[W(X_j) < \frac{1}{2}\cdot 2^{-(j+1)} \cdot S(X_j) \cdot P\right] \le \Pr\!\left[W(X_j) < \frac{1}{2}\,\mathrm{E}[W(X_j)]\right] \le \exp\!\left(-\frac{1}{12}\,\mathrm{E}[W(X_j)]\right) \le \exp\!\left(-\ln\frac{\lceil\log_2 L\rceil}{\delta}\right) = \frac{\delta}{\lceil\log_2 L\rceil}.$$
Observe that the gain $G(X_j)$ in bucket $X_j \in \mathcal{X}$ is at least $2^j \cdot W(X_j)$. By the union bound, the probability that at least one bucket $X_j \in \mathcal{X}$ is such that $G(X_j) < \frac{1}{4}\cdot S(X_j) \cdot P$ is at most δ. Therefore, with probability at least 1 − δ, the total gain is at least
$$\frac{P}{4}\sum_{X_j \in \mathcal{X}} S(X_j) = \frac{P}{4}\cdot S(\mathcal{X}) \ge \frac{24}{193}\,B\cdot P.$$
Finally we give a lemma that underlines a symmetry of the PUSH-PULL strategy. Let $u \xrightarrow{t} v$ be the event that a piece of information originated at u arrives at v within t rounds using the PUSH-PULL strategy, and let $u \xleftarrow{t} v$ be the event that a piece of information originally at v arrives at u within t rounds using the PUSH-PULL strategy. We have that:
Lemma 4.3.4 Let u, v ∈ V. Then
$$\Pr\!\left[u \xrightarrow{t} v\right] = \Pr\!\left[u \xleftarrow{t} v\right].$$
Proof: Look at each possible sequence of PUSH-PULL requests done by the nodes of G in t rounds. We define the "inverse" of a sequence as the sequence we would obtain by reading it from the last round to the first and exchanging PUSHes and PULLs. Now the probability that the information spreads from u to v (resp., from v to u) in at most t steps is equal to the sum of the probabilities of the sequences of length at most t that manage to pass the information from u to v (resp., from v to u). Given that the probability of a sequence and that of its inverse are the same, the claim follows.
4.4 Warm-up: a weak bound
In this section we prove a completion time bound for the PUSH-PULL strategy of O(φ⁻² · log n). Observe that this bound happens to be tight if φ ∈ Ω(1). The general strategy is as follows:
• we will prove that, given any set S of informed nodes having volume ≤ |E|, after O(φ⁻¹) rounds (that we call a phase) the new set S′ of informed vertices, S′ ⊇ S, will have volume vol(S′) ≥ (1 + Ω(φ)) · vol(S) with constant probability (over the random choices performed by nodes during those O(φ⁻¹) rounds); if this happens, we say that the phase was successful. This section is devoted to proving this lemma;
• given the lemma, it follows that PUSH-PULL informs a set of nodes of volume larger than |E|, starting from any single node, in time O(φ⁻² · log n). Indeed, by applying the Chernoff bound one can prove that, by flipping c · φ⁻¹ · log n IID coins, each having Θ(1) head probability, the number of heads will be at least f(c) · φ⁻¹ · log n with high probability, with f(c) increasing and unbounded in c (see the worked bound after this list). This implies that we can get enough (that is, Θ(φ⁻¹ · log n)) successful phases for covering more than half of the graph's volume in at most Θ(φ⁻¹) · Θ(φ⁻¹ · log n) = Θ(φ⁻² · log n) rounds;
• applying Lemma 4.3.4, we can then show that each uninformed node can get the information in the same number of steps, if a set S of volume > |E| is informed, completing the proof. Recall that the probability that the information spreads from any node v to a set of nodes with more than half the volume of the graph is 1 − O(n⁻²). Then, with that probability, the source node s spreads the information to a set of nodes with such volume. Furthermore, by Lemma 4.3.4, any uninformed node would get the information from some node, after node s successfully spreads the information, with probability 1 − O(n⁻²). By the union bound, we have that with probability 1 − O(n⁻¹) PUSH-PULL will succeed in O(φ⁻² · log n) rounds.
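To spell out the coin-flipping step used in the second point (a worked sketch, with the deviation parameter ε = 1/2 chosen for concreteness): let $X = \sum_{i=1}^{T} X_i$ count the successful phases among $T = c\,φ^{-1}\log n$ independent phases, each succeeding with probability at least a constant p. By Theorem 4.3.3,
$$\Pr\!\left[X < \tfrac{1}{2}\,pT\right] \le \exp\!\left(-\tfrac{1}{12}\,pT\right) = \exp\!\left(-\Theta(c)\,φ^{-1}\log n\right) \le n^{-\Theta(c)},$$
so with high probability at least $f(c)\,φ^{-1}\log n$ phases succeed, with $f(c) = pc/2$ increasing and unbounded in c.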
Our first lemma shows how one can always find a subset of nodes in the "smaller" side of a good conductance cut that hits many of the edges in the cut, and whose elements have a large fraction of their degree crossing the cut.
Lemma 4.4.1 Let G(V, E) be a simple graph.
Let A ⊆ B ⊆ V, with vol(B) ≤ |E| and $\mathrm{cut}(A, V - B) \ge \frac{3}{4}\cdot\mathrm{cut}(B, V - B)$. Suppose further that the conductance of the cut (B, V − B) is at least φ, i.e. $\mathrm{cut}(B, V - B) \ge φ\cdot\mathrm{vol}(B)$. If we let
$$U = U_B(A) = \left\{ v \in A \;\middle|\; \frac{d_B^+(v)}{d(v)} \ge \frac{φ}{2} \right\},$$
then $\mathrm{cut}(U, V - B) \ge \frac{1}{4}\cdot\mathrm{cut}(B, V - B)$.
Proof: We prove the lemma with the following derivation. First,
$$\sum_{v\in A} d_B^+(v) + \sum_{v\in B-A} d_B^+(v) = \mathrm{cut}(B, V-B),$$
so
$$\sum_{v\in A} d_B^+(v) = \mathrm{cut}(B, V-B) - \sum_{v\in B-A} d_B^+(v).$$
Since
$$\sum_{v\in U} d_B^+(v) + \sum_{v\in A-U} d_B^+(v) = \mathrm{cut}(A, V-B) \ge \frac{3}{4}\cdot\mathrm{cut}(B, V-B),$$
then
$$\sum_{v\in U} d_B^+(v) \ge \frac{3}{4}\cdot\mathrm{cut}(B, V-B) - \sum_{v\in A-U} d_B^+(v) \ge \frac{3}{4}\cdot\mathrm{cut}(B, V-B) - \sum_{v\in A-U}\frac{φ}{2}\cdot d(v) \ge \frac{3}{4}\cdot\mathrm{cut}(B, V-B) - \frac{φ}{2}\cdot\mathrm{vol}(B) \ge \frac{3}{4}\cdot\mathrm{cut}(B, V-B) - \frac{1}{2}\cdot\mathrm{cut}(B, V-B) = \frac{1}{4}\cdot\mathrm{cut}(B, V-B).$$
Given $v \in U = U_B(A)$, we define $N_B^{\mathrm{push}}(v)$ and $N_B^{\mathrm{pull}}(v)$ as follows:
$$N_B^{\mathrm{push}}(v) = \{u \in N_B^+(v) \mid d(u) \ge \tfrac{1}{3}\,d_B^+(v)\}$$
and
$$N_B^{\mathrm{pull}}(v) = \{u \in N_B^+(v) \mid d(u) < \tfrac{1}{3}\,d_B^+(v)\}.$$
Then,
$$U^{\mathrm{push}} = \{v \in U \mid |N_B^{\mathrm{push}}(v)| \ge |N_B^{\mathrm{pull}}(v)|\}$$
and
$$U^{\mathrm{pull}} = \{v \in U \mid |N_B^{\mathrm{pull}}(v)| > |N_B^{\mathrm{push}}(v)|\}.$$
Observe that $U^{\mathrm{push}} \cap U^{\mathrm{pull}} = \emptyset$ and $U^{\mathrm{push}} \cup U^{\mathrm{pull}} = U$. In particular, at least one of $\mathrm{vol}(U^{\mathrm{push}}) \ge \frac{1}{2}\,\mathrm{vol}(U)$ and $\mathrm{vol}(U^{\mathrm{pull}}) \ge \frac{1}{2}\,\mathrm{vol}(U)$ holds. In the following, if $\mathrm{vol}(U^{\mathrm{push}}) \ge \frac{1}{2}\,\mathrm{vol}(U)$ we will "apply" the PUSH strategy on U; otherwise, we will "apply" the PULL strategy.
Given a vertex v ∈ U, we will simulate either the PUSH or the PULL strategy for $O(\frac{1}{φ})$ steps over it. The "gain" g(v) of node v is then the volume of the node(s) that pull the information from v, or that v pushes the information to.
Our aim is to get a bound on the gain of the whole original vertex set S. This cannot be done by summing the gains of single vertices in S, because of the many dependencies in the process. For instance, different nodes v, v′ ∈ S might inform (or could be asked the information by) the same node in V − S.
To overcome this difficulty, we use an idea similar in spirit to the principle of deferred decisions. First of all, let us remark that, given a vertex set S having the information, we will run the PUSH-PULL process for $O(\frac{1}{φ})$ rounds. We will look at what happens to the neighbourhoods of different nodes in S sequentially, by simulating the $O(\frac{1}{φ})$ steps (which we call a phase) of each v ∈ S and some of its peers in $N_S^+(v) \subseteq V - S$. Obviously, we will make sure that no node in V − S performs more than $O(\frac{1}{φ})$ PULL steps in a single phase. Specifically, we consider Algorithm 1 with a phase of $k = \lceil\frac{10}{φ}\rceil$ steps.
Algorithm 1 The expansion process of the $O\!\left(\frac{\log n}{φ^2}\right)$ bound, with a phase length of k steps.
1: at step i, we consider the sets $A_i$, $B_i$; at the first step, i = 0, we take $A_0 = B_0 = S$ and $H_0 = \emptyset$;
2: if $\mathrm{cut}(A_i, V - B_i) < \frac{3}{4}\cdot\mathrm{cut}(B_i, V - B_i)$, or $\mathrm{vol}(B_i) > |E|$, we stop; otherwise, apply Lemma 4.4.1 to $A_i$, $B_i$, obtaining the set $U_i = U_{B_i}(A_i)$;
3: we take a node v out of $U_i$, and we consider the effects of either the push or the pull strategy, repeated for k steps, over v and $N_{B_i}^+(v)$;
4: $H_{i+1} \leftarrow H_i$;
5: each node $u \in N_{B_i}^+(v)$ that gets informed (either by a push of v, or by a pull from v) is added to the set of the "halted nodes" $H_{i+1}$; v is also added to the set $H_{i+1}$;
6: let $A_{i+1} = A_i - \{v\}$, and $B_{i+1} = B_i \cup H_{i+1}$; observe that $B_{i+1} - A_{i+1} = H_{i+1}$;
7: iterate the process.
Observe that, in Process 1 with $k = \lceil\frac{10}{φ}\rceil$, no vertex in V − S will make more than $O(\frac{1}{φ})$ PULL steps in a single phase. Indeed, each time we run point 3 in the process, we only disclose whether some node u ∈ V − S actually makes, or does not make, a PULL from v. If the PULL does not go through, and node u later tries to make a PULL to another node v′, the probability of this second batch of PULLs (and in fact, of any subsequent batch) to succeed is actually larger than the probability of success of the first batch of PULLs of u (since at that point, we already know that the previous PULL batches made by u never reached any previous candidate node v ∈ S).
The next lemma summarizes the gain, in a single step, of a node $v \in U_i$.
Lemma 4.4.2 If $v \in U_i^{\mathrm{push}}$, then
$$\Pr\!\left[g(v) \ge \frac{1}{3}\,d_{B_i}^+(v)\right] \ge \frac{φ}{4}.$$
On the other hand, if $v \in U_i^{\mathrm{pull}}$ then
$$\Pr\!\left[g(v) \ge \frac{1}{20}\,d_{B_i}^+(v)\right] \ge \frac{1}{10}.$$
In general, if $v \in U_i$,
$$\Pr\!\left[g(v) \ge \frac{1}{20}\,d_{B_i}^+(v)\right] \ge \frac{φ}{10}.$$
Proof: Suppose that $v \in U_i^{\mathrm{push}}$. Then at least $\frac{1}{2}\,d_{B_i}^+(v)$ of the neighbours of v that are not in $B_i$ have degree $\ge \frac{1}{3}\,d_{B_i}^+(v)$. Since $v \in U_i$, we have that $\frac{d_{B_i}^+(v)}{d(v)} \ge \frac{φ}{2}$. Thus, the probability that v pushes the information to one of its neighbours of degree $\ge \frac{1}{3}\,d_{B_i}^+(v)$ is $\ge \frac{\frac{1}{2}\,d_{B_i}^+(v)}{d(v)} \ge \frac{φ}{4}$.
Now, suppose that $v \in U_i^{\mathrm{pull}}$. Recall that g(v) is the random variable denoting the gain of v; that is,
$$g(v) = \sum_{u \in N_{B_i}^{\mathrm{pull}}(v)} g_u(v),$$
where $g_u(v)$ is a random variable equal to d(u) if u pulls the information from v, and 0 otherwise.
Observe that $\mathrm{E}[g_u(v)] = 1$, so that $\mathrm{E}[g(v)] = |N_{B_i}^{\mathrm{pull}}(v)|$, and that the variance of $g_u(v)$ is
$$\mathrm{Var}[g_u(v)] = \mathrm{E}[g_u(v)^2] - \mathrm{E}[g_u(v)]^2 = \frac{1}{d(u)}\cdot d(u)^2 - 1 = d(u) - 1.$$
Since the $g_{u_1}(v), g_{u_2}(v), \dots$ are independent, we have
$$\mathrm{Var}[g(v)] = \sum_{u \in N_{B_i}^{\mathrm{pull}}(v)} \mathrm{Var}[g_u(v)] = \sum_{u \in N_{B_i}^{\mathrm{pull}}(v)} (d(u) - 1) \le \mathrm{vol}\!\left(N_{B_i}^{\mathrm{pull}}(v)\right) \le \frac{1}{3}\,d_{B_i}^+(v)\,\left|N_{B_i}^{\mathrm{pull}}(v)\right|.$$
In the following chain of inequalities we apply Chebyshev's inequality to bound the deviation of g(v), using the variance bound we have just obtained (recall that $|N_{B_i}^{\mathrm{pull}}(v)| \ge \frac{1}{2}\,d_{B_i}^+(v)$):
$$\Pr\!\left[g(v) \le \frac{1}{20}\,d_{B_i}^+(v)\right] \le \Pr\!\left[\left|g(v) - \mathrm{E}[g(v)]\right| \ge \left|N_{B_i}^{\mathrm{pull}}(v)\right| - \frac{1}{20}\,d_{B_i}^+(v)\right] \le \frac{\frac{1}{3}\,d_{B_i}^+(v)\left|N_{B_i}^{\mathrm{pull}}(v)\right|}{\left(\left|N_{B_i}^{\mathrm{pull}}(v)\right| - \frac{1}{20}\,d_{B_i}^+(v)\right)^{2}} \le \frac{\frac{1}{6}\,d_{B_i}^+(v)^2}{\left(\frac{1}{2}\,d_{B_i}^+(v) - \frac{1}{20}\,d_{B_i}^+(v)\right)^{2}} = \left(\frac{20}{9\sqrt{6}}\right)^{2} \le \frac{9}{10}.$$
This concludes the proof of the second claim. The third one is a combination of the other two.
Now we focus on v and its neighbourhood $N_{B_i}^+(v)$ for $\lceil\frac{10}{φ}\rceil$ many steps. What is the gain G(v) of v in these many steps?
Lemma 4.4.3 $\Pr\!\left[G(v) \ge \frac{1}{20}\,d_{B_i}^+(v)\right] \ge 1 - e^{-1}$.
Proof: Observe that the probability that the event "$g(v) \ge \frac{1}{20}\,d_{B_i}^+(v)$" happens at least once in $\lceil\frac{10}{φ}\rceil$ independent trials is lower bounded by
$$1 - \left(1 - \frac{φ}{10}\right)^{\lceil\frac{10}{φ}\rceil} \ge 1 - e^{-1}.$$
The claim follows.
We now prove the main theorem of the section:
Theorem 4.4.1 Let S be the set of informed nodes, with vol(S) ≤ |E|, and let S′ be the set of informed nodes after Ω(φ⁻¹) further steps. Then, with Ω(1) probability,
$$\mathrm{vol}(S') \ge (1 + \Omega(φ))\cdot\mathrm{vol}(S).$$
l m
Proof: Consider Process 1 with a phase of length k = 10
. For the process to finish, at
φ
some step t it must happen that either vol(Bt ) > |E| (in which case, we are done — so we
assume the contrary), or cut(At , V − Bt ) < 43 · cut(Bt , V − Bt ). Analogously,
1
· cut(Bt , V − Bt ) ≤ cut(Bt − At , V − Bt ) = cut(Ht , V − Bt ).
4
But then,
1
1
· φ · vol(S) ≤
· φ · vol(Bt )
4
4
1
≤
· cut(Bt , V − Bt )
4
≤ cut(Ht , V − Bt )
X
X
=
d+
(v)
+
Bt
v∈Ht ∩S
≤
X
v∈Ht ∩S
d+
Bt (v)
v∈Ht ∩(V −S)
d+
Bt (v) +
X
vol(v)
v∈Ht ∩(V −S)
Consider the following two inequalities (that might, or might not, hold):
P
1
(a)
v∈Ht ∩(V −S) vol(v) ≥ 1000 · φ · vol(S), and
P
+
249
(b)
v∈Ht ∩S dBt (v) ≥ 1000 · φ · vol(S).
At least one of (a) and (b) has to be true. We call two-cases property the disjunction of
(a) and (b). If (a) is true, we are done, in the sense that we have captured enough volume
to cover a constant fraction of the cut induced by S.
We lower bound the probability of (a) to be false given the truth of (b), since the negation
of (b) implies the truth of (a).
Recall lemma 4.4.3. It states that — for each vi ∈ Ht ∩ S — we had probability at least
+
1
1
· d+
1 − e−1 of gaining at least 20
Bi (vi ) ≥ 20 · dBt (vi ), since i ≤ t implies Bi ⊆ Bt .
For each vi , let us define the random variable Xi as follows: with probability 1 − e−1 , Xi
1
has value 20
· d+
Bt (vi ) , and with the remaining probability it has value 0. Then, the gain of
vi is a random variable that dominates Xi . Choosing q = 1 − 2e−1 , in lemma 4.3.1, we can
conclude that
"
#
X
X 1
1
+
Pr
Xi ≥ ·
· dBt (vi )
≥ 1 − 2e−1 .
2 v ∈H ∩S 20
i: vi ∈Ht ∩S
t
i
P
1
Thus, with constant probability (≥ 1 − 2e−1 ) we gain at least 40
· vi d+
Bt (vi ), which in
turn, is at least
X
1
1 249
6
·
d+
·
· φ · vol(S) ≥
· φ · vol(S).
Bt (vi ) ≥
40 v ∈H ∩S
40 1000
1000
i
t
68
CHAPTER 4. GOSSIP
Hence, (a) is true with probability at least 1 − 2e−1 . So with constant probability there is
1
a gain of 1000
· φ · vol(S) in φ1 steps. Thus using the proof strategy presented at the beginning
of the section we get a O (φ−2 log n) bound on the completion time.
4.5
A tighter bound
In this section we will present a tighter bound of
!
log2 φ1
log n
O
· log n = Õ
.
φ
φ
Observe that, given the already noted diametral lower bound of Ω
conductance φ ≥
in φ−1 ).
1
,
n1−
log n
φ
on graphs of
the bound is almost tight (we only lose an exponentially small factor
Our generalstrategy
for showing the tighter bound will be close in spirit to the one we
log n
used for the O φ2 bound of the previous section.
The new strategy is as follows:
• we will prove in this section that, given any set S of informed nodes having volume at
most |E|, for some p = p(S) ≥ Ω(φ), after O(p−1 ) rounds (that we call a p-phase) the
new set S 0 of informed vertices, S 0 ⊇ S, will have volume vol(S 0 ) ≥ 1 + Ω p·logφ2 φ−1 ·
vol(S) with constant probability (over the random choices performed by nodes during
those O(p−1 ) rounds) — if this happens, we say that the phase was successful;
• using the previous statement we can show that PUSH-PULL informs a set of nodes of
volume larger than |E|, starting from any single node, in time T ≤ O(φ−1 · log2 φ−1 ·
log n) with high probability.
Observe that at the end of a phase one has a multiplicative volume gain of
φ
1+Ω
p · log2 φ−1
with probability lower bounded by a positive constant c. If one averages that gain over
the O(p−1 ) rounds of the phase, one can say that with constant probability
c, each
φ
round in the phase resulted in a multiplicative volume gain of 1 + Ω log2 φ−1 .
We therefore apply lemma 4.3.3 with L = Θ
log2 φ−1
φ
, B = Θ
log2 φ−1
φ
· log n ,
P = c and δ equal
to any inverse polynomial in n, δ = n−Θ(1) . Observe that
log
L
B ≥ Θ PL log δ . Thus, with probability 1 − δ = 1 − n−Θ(1) , we have Θ(B · P ) =
Θ
log2 φ−1
φ
· log n
successful steps. Since each successful step gives a multiplicative
4.5. A TIGHTER BOUND
volume gain of 1 + Ω
69
φ
log φ−1
2
, we obtain a volume of
„
1+Ω
φ
log φ−1
2
Θ
«
log2 φ−1
·log n
φ
= eΘ(log n) ,
which, by a suitable choice of the constants, is larger than |E|.
• by applying lemma 4.3.4, we can then then show by symmetry that each uninformed
node can get the information in T rounds, if a set S of volume > |E| is informed —
completing the proof.
b # (v) (to be read “N-hat-push-U-v”) and N
b # (v)(to
Given v ∈ U = UB (A), we define N
U
B
be read “N-hat-pull-U-v”) as follows
b # (v) = {u ∈ N + (v) | d(u) ≥ 1 · φ−1 · d+ (v)}
N
B
B
B
3
and
b # (v) = {u ∈ N + (v) | d(u) < 1 · φ−1 · d+ (v)}.
N
B
B
B
3
Then, we define,
o
n
b # b #
#
b
U = v ∈ U | NB (v) ≥ NB (v)
and
o
n
b # (v) .
b # (v) > N
b # = v ∈ U | N
U
B
B
b # ) ≥ 1 · vol(U ) we “apply” the PUSH strategy on U ; otherwise, we
As before, if vol(U
2
“apply” the PULL strategy.
The following lemma is the crux of our analysis. It is a strengthening of lemma 4.4.2. A
corollary of the lemma is that there exists a p = pv ≥ Ω(φ), such that afterp−1 rounds,
with
constant probability, node v lets us gain a new volume proportional to Θ
d+
B (v)
i
p·log φ−1
.
b # (v) can be partitioned in at most 6+log φ−1 parts,
Lemma 4.5.1 Assume v ∈ Ui . Then, N
Bi
S1 , S2 , . . ., in such a way that for each part i it holds that, for some PSi ∈ (2−i+1 , 2−i ],
|Si |
≥ 1 − 2e−1 ,
Pr GSi (v) ≥
256 · PSi
where GSi (v) is the total volume of nodes in Si that perform a PULL from v in PS−1
rounds.
i
Lemma 4.4.2, that we used previously, only stated that with probability Ω(φ) we gained
a new volume of Θ(d+
Bi (v)) in a single step. If we do not allow v to go on for more than one
70
CHAPTER 4. GOSSIP
step then the bounds of lemma 4.4.2 are sharp4 .
The insight of lemma 4.5.1 is that different nodes might require different numbers of
rounds to give their “full” contribution in terms of new volume, but the more we have to
wait for, the more we gain.
We now prove Lemma 4.5.1. Proof:[Proof of Lemma
4.5.1.]
+
b # (v) in K = KB (v) = lg dBi (v)
We divide the nodes in N
i
Bi
3·φ
buckets in a power-of-two
b # (v) having degree
manner. That is, for j = 1, . . . , K, Rj contains all the nodes u in N
Bi
2j−1 ≤ d(u) < 2j . Observe that the Rj ’s are pairwise disjoint and that their union is equal
b # (v).
to N
Bi
Consider the buckets Rj , with j > lg φ−1 . We will empty some of them, in such a way
that the total number of nodes removed from the union of the bucket is an fraction of the
total (that is, of d+
Bi (v)). This node removal step is necessary for the following reason: the
buckets Rj , with j > lg φ−1 , contain nodes with a degree so high that any single one of them
will perform a PULL operation on v itself with probability strictly smaller than φ. We want
to guarantee that the probability of a “gain” is at least φ so we are forced to remove nodes
having too high degree. If their number is so small that — overall — the probability of any
single one of them to actually perform a PULL on v is smaller than φ.
The node removal phase is as follows. If Rj0 is the set of nodes in the j-th bucket after
the node removal phase, then
1
Rj |Rj | ≥ 16
· 2j · φ
0
Rj =
∅
otherwise
Observe that the total number of nodes we remove is upper bounded by
K
K X
1
φ X i φ K+1
i
·2 ·φ
≤
2 ≤ 2
16
16
8
i=0
i=1
(v)
d+
φ
1
≤
· 4 · Bi
= · d+
(v).
8
3·φ
6 Bi
P P
1 +
Therefore, j Rj0 ≥ 31 d+
Bi (v), since
j |Rj | ≥ 2 dBi (v).
4
To prove this, we give two examples. In the first one, we show that the probability of informing any
new node might be as small as O(φ). In the second, we show that a single step the gain might be only
a φ fraction of the volume of the informed nodes. Lemma 4.5.1 implies that these two phenomena cannot
happen together.
For the first example, take two stars: a little one with Θ φ−1 leaves, and a large one with Θ(n) leaves.
Connect the centers of the two stars by an edge. The graph will have conductance Θ(φ). Now, suppose that
the center and the leaves of the little star are informed, while the nodes of the large star are not. Then, the
probability of the information to spread to any new node (that is, to the center of the large star), will be
O(φ).
For the second example, keep the same two stars, but connect them with a path of length 2. Again, inform
only the nodes of the little star. Then, in a single step, only the central node in the length-2 path can be
informed. The multiplicative volume gain is then only 1 + O(φ).
4.5. A TIGHTER BOUND
71
Consider the random variable g(v), which represents the total volume of the nodes in
the different Rj0 ’s that manage to pull the information from v. If we denote by gj (v) the
P
contribution of the nodes in bucket Rj0 to g(v), we have g(v) = K
j=1 gj (v).
Take any non-empty bucket Rj0 . We want to show, via lemma 4.3.2, that
"
#
1 Rj0 Pr gj (v) ≥
·
≥ pj .
128 pj
(If this event occurs, we say that bucket j succeeds.)
This claim follows directly from 4.3.2 by creating one Xi in the lemma for each u ∈ Rj0 ,
and letting Xi = 1 iff node u pulls the information from v. The probability of this event is
pi ∈ (2−j , 2−j+1 ], so we can safely choose the p of lemma 4.3.2 to be p = 2−j+1 .
Consider
different pj ’s of the buckets. Fix some j. If pj came out of case 2 then,
0 the
1
1
1
since Rj ≥ 16 · 2j · φ, we have pj ≥ 32
· φ. If pj came out of case 1, then pj = 32
. In general,
φ
pj ≥ 32 , and pj ≤ 1.
Let us divide the unit into segments of exponentially decreasing length: that is, 1, 21 , 12 , 14 ,
. . . , [2−j+1 , 2−j ) , . . .. For each j, let us put each bucket Rj0 into the segment containing its
m
l
≤ 6 + lg φ−1 segments.
pj . Observe that there are at most lg 32
φ
Fix any non-empty segment `. Let S` be the union of the buckets in segment `. Observe
that if we let the nodes in the buckets of S` run the process for 2` times, we have that, for
each bucket Rj0 ,
"
#
1 Rj0 `
Pr Gj (v) ≥
·
≥ 1 − (1 − pj )2 ≥ 1 − e−1 ,
128 pj
where Gj (v) is the total volume of nodes in Rj0 that perform a PULL from v in 2` rounds.
|R0 |
1
Now, we can apply lemma 4.3.1, choosing Xj to be equal vj = 128
· pjj if bucket Rj0
in segment ` (buckets can be ordered arbitrarily) is such that Gj (v) ≥ vj , and 0 otherwise.
Choosing p = 1 − e−1 and q = 1 − 2e−1 , we obtain:
1 |S` |
Pr GS` (v) ≥
·
≥ 1 − 2e−1 .
256 2−`
The following corollary follows from lemma 4.5.1. (We prove in Appendix 4.7 that,
constants aside, it is the best possible.)
φ
Corollary 4.5.1 Assume vi ∈ Ui . Then, there exists pi ∈ 64
, 1 such that
"
#
d+
Bi (vi )
Pr G(vi ) ≥
≥ 1 − 2e−1 .
5000 · pi · lg φ2
where G(vi ) is the total volume of nodes in NB+i (vi ) that perform a PULL from vi , or that vi
pushes the information to, in p−1
rounds.
i
72
CHAPTER 4. GOSSIP
b # , the same reasoning of lemma 4.4.2 applies. If vi ∈ U
b # , then we apply
Proof: If vi ∈ U
i
i
lemma 4.5.1 choosing the part S with the largest cardinality. By the bound on the number
of partitions, we will have
S≥
d+
d+
1
Bi (vi )
Bi (vi )
·
≥
,
3
6 + lg φ−1
18 · lg φ2
which implies the corollary.
We now prove the main theorem of the section:
Theorem 4.5.1 Let S be the set of informed nodes, vol(S) ≤ |E|. Then, if S 0 is the set
of informed nodes then there exists some Ω(φ) ≤ p ≤ 1 such that, after O (p−1 ) steps, then
with Ω(1) probability,
!!
φ
· vol(S).
vol(S 0 ) ≥ 1 + Ω
p · log2 φ1
Corollary 4.5.1 is a generalization of lemma 4.4.2, which would lead to our result if we could
prove an analogous of the two-cases property of the previous section. Unfortunately, the
final gain we might need to aim for, could be larger than the cut — this inhibits the use
of the two-cases property. Still, by using a strenghtening of the two-cases property, we will
prove Theorem 4.5.1 with an approach similar to the one of Theorem 4.4.1.
Proof:
We say that an edge in the cut (S, V − S) is easy if its endpoint w in V − S is such that
≥ φ. Then, to overcome the just noted issue, we consider two cases separately: (a) at
least half of the edges in the cut are easy, or (b) less than half of the edges in the cut are
easy.
l
m
In case (a) we bucket the easy nodes in Γ(S) (the neighbourhood of S) in lg φ1 buckets
l
m
in the following way. Bucket Di , i = 1, . . . , lg φ1 , will contain all the nodes w in Γ(S)
d−
S (w)
d(w)
such that 2−i <
arbitrarily).
d−
S (w)
d(w)
≤ 2−i+1 . Now let Dj be the bucket of highest volume (breaking ties
For any node v ∈ Dj we have that its probability to pull the information in one step
is at least 2−j . So, the probability of v to pull the information in 2j rounds is at least
1 − e−1 . Hence, by applying lemma 4.3.1, we get that with probability greater than or equal
vol(D )
to 1 − 2e−1 we gain a set of new nodes of volume at least 2 j in 2j rounds. But,
vol(Dj )
cut(S, Dj )
cut(S, Γ(S))
φ · vol(S)
l
m ≥ 2j · l
m .
≥ 2j ·
≥ 2j ·
2
2
2 lg φ1
2 lg φ1
Thus in this first case we gain with probability at least 1 − 2e−1 a set of new nodes of
volume at least 2j · φ·vol(S)
in 2j rounds. By the reasoning presented at the beginning of
1
2dlg φ
e
section 4.5 the claim follows.
4.5. A TIGHTER BOUND
73
Now let us consider the second case, recall that in this case half of the edges in the cut
− (u)
point to nodes u in Γ(S) having dd(u)
≥ φ1 .
We then replace the original two-cases property with the strong two-cases property:
P
−
1
(a’)
v∈Ht ∩(V −S) dBt (v) ≥ 1000 · cut(S, V − S), and
P
+
249
(b’)
v∈Ht ∩S dBt (v) ≥ 1000 · cut(S, V − S).
As before, at least one of (a’) and (b’) has to be true. If (a’) happens to be true then
we
obtains will be greater than or equal
P are done since the−1total
P volume of −the new nodes
1
−1
· cut(S, V − S). By Corollary 4.5.1,
v∈Ht ∩(V −S) d(v) ≥ φ
v∈Ht ∩(V −S) dBt (v) ≥ 1000 · φ
−1
we will wait at most w rounds for the cut (S, V − S), for some w ≤ O (φ
). Thus,if (a’)
cut(S,V −S)
holds, we are guaranteed to obtain a new set of nodes of total volume Ω
in w
w
rounds. Which implies our main claim.
We now show
that if (b’) holds, then with Θ(1) probability our total gain will be at least
cut(S,V −S)
Ω w log2 φ−1 in w rounds, for some w ≤ (φ−1 ).
Observe that each vi ∈ Ht ∩ S, when it was considered by the process, was given some
φ
probability pi ∈ 64
, 1 by Corollary 4.5.1. We partition Ht ∩ S in buckets according to
probabilities pi . The j-th bucket will contain all the nodes vi in Ht ∩ S such that 2−j < pi ≤
2−j+1 . Recalling that Bi is thePset of informed nodes when node vi is considered, we let F
be the bucket that maximizes vi ∈F d+
Bt (vi ).
Then,
X
d+
Bt (vi ) ≥
vi ∈F
249
l
m · cut(S, V − S)
1000 lg 64
φ
(4.1)
φ
By Corollary 4.5.1, we have that for each
l m vi ∈ F , there exists p = p(F ) ≥ 64 , such that
+
1
with probability at least 1 − 2e−1 , after p2 round, we gain at least 5000·p·lg
2 · dBi (vi ) ≥
φ
1
5000·p·lg
2
φ
· d+
Bt (vi ) (since i ≤ t implies Bi ⊆ Bt ). For each vi , let us define the random
variable Xi as follows: with probability 1 − 2e−1 , Xi has value
1
5000·p·lg
2
φ
· d+
Bt (vi ) , and with
the remaining probability it has value 0. Then, the gain of vi is a random variable that
dominates Xi . Choosing q = 1 − 52 e−1 , in lemma 4.3.1, we can conclude that
"
!#
X
1
4 X
+
· d (vi )
Pr
Xi ≥ ·
5 v ∈F 5000 · p · lg φ2 Bt
i: v ∈F
i
i
5
≥ 1 − e−1 ≥ 0.08.
2
Thus, in
l m
2
p
rounds, with constant probability we gain at least
X d+
1
Bt (vi )
·
,
2
p
6250 · lg φ v ∈F
i
74
CHAPTER 4. GOSSIP
which by equation 4.1, it is lower bounded by
1
6250·p·lg
2
φ
·
249
1000dlg
64
≥Ω
cut(S,V −S)
p·log2 φ−1
φ
e
· cut(S, V − S) ≥
Thus applying the reasoning presented at the beginning of the section the claim follows. 4.6
Push and Pull by themselves
We now comment on how one can change our analysis to get a bound of O(cα · φ−1 · log2 φ−1
log n) on the completion time of PUSH or PULL by themselves. Observe that, if degrees of
neighboring nodes have a constant ratio, then the probability that a single node vi ∈ S (in
our analysis) to make a viable PUSH or to be hit by a PULL is Θ(αφ) (indeed, v will have at
least φ · d(v) neighboring nodes in V − S, each having at most its degree times α — an easy
calculation shows that the probability bound for both PUSH and PULL is then Θ(αφ)). Using
this observation our analysis can be concluded as in the previous section.
Lemma 4.6.1 If the PUSH strategy is used then for each v ∈ Ui ,
1
φ
Pr g(v) ≥ · d(v) ≥ .
α
4
If the PULL strategy is used then for each v ∈ Ui ,
1
φ
Pr g(v) ≥ · d(v) ≥
.
α
4α
Proof:
By the uniformity condition (a) each of the d+
Bi (v) neighbours of v, that are outside of
−1
Bi , have degree within α · d(v) and α · d(v); furthermore, since v ∈ Ui , (b) it holds that
d+
B (v)
i
≥ φ2 .
Suppose the PUSH strategy is used. By (b) the probability that v pushes the information to
d(v)
some neighour outside Bi (obtaining a gain g(v) of at least α−1 ·d(v), by (a)) is ≥
1 +
d (v)
2 Bi
d(v)
≥ φ4 .
Suppose, instead, the PULL strategy is used. Then the probability that some neighbour
of v outside Bi performs a PULL from v is
d(v)·φ·1/2
Y Y 1
1
1
1−
1−
≥ 1−
1−
≥1− 1−
d(u)
α · d(v)
α · d(v)
u∈NBi (v)
u∈NBi (v)
φ
4α
where the first inequality is justified by (a), the second by (b), and the remaining two are
classic algebraic manipulations. Using (a) again, we obtain that the probability of having a
φ
gain g(v) of at least α−1 · d(v) is at least 4α
.
1
≥ 1 − e−φ· 2α ≥
4.7. OPTIMALITY OF COROLLARY 4.5.1
4.7
75
Optimality of Corollary 4.5.1
Figure 4.1: A construction showing that Corollary 4.5.1 is sharp. Each node is labeled with
its degree.
Consider the cut in Figure 4.1, with φ = 2−t , for some integer t ≥ 1. The set of informed
1
nodes S is a star; its central node (shown in the figure), having degree log2φ /φ , is connected
log2 1/φ
φ
− log2 1/φ leaves inside S, and log2 1/φ nodes outside S. The volume of S is then
φ
2
vol(S) = φ − 1 log2 1/φ, and the conductance of the cut is then 2+φ
= Ω(φ). (It follows
that, for any sufficiently large order, there exists a graph of that order with conductance
Θ(φ) that contains the graph in Figure 4.1 as a subgraph.) Finally, the i-th neighbour of S,
i = 1, . . . , log2 1/φ, has degree 2i .
to
Corollary 4.5.1, applied to our construction, gives that there exists some p, Ω(φ) ≤
p ≤ 1, such that the gain in p−1 rounds is Ω(p−1 ) with constant probability. (One can get
a direct proof of this statement by analyzing the PULL performance.) We will show that
Corollary 4.5.1 is sharp in the sense that, for each fixed constant c > 0, and for any p in
the range, the probability of having a gain of at least cp−1 with no more than p−1 rounds
is O().
Observe that the claim is trivial if p > , since no gain can be obtained in zero rounds.
We will then assume p ≤ . Because of the O (·) notation, we can also assume ≤ c/8.
Therefore we prove the statement for p ≤ 8c .
Let us analyze the PUSH strategy. Observe that the probability of performing a PUSH
76
CHAPTER 4. GOSSIP
−1
from S to the outside in φ−1 ≥ Ω(p−1 ) rounds is (1 − φ)φ
of gaining anything with the PUSH strategy is at most .
≤ . Therefore the probability
Now let us analyze the PULL strategy. Fix any Ω(φ−1 ) ≤ p ≤ 1. Let A be the set of the
neighbours of S having degree less than 2c p−1 , and let B be the set of remaining neighbours.
Then, the total volume of (and, thus, the total PULL gain from) nodes in A is not more than
cp−1 − 1. Therefore to obtain the required gain, we need a node in B to make a PULL from
S.
The probability that some node in B makes a PULL from S in one round is upperbounded
by
log2 1/φ
X
c −1
2
4p
1
≤ .
2−i = 2 · 1 − 2− log2 /φ−1 − 1 + 2−dlog2 2 p e ≤
c −1
c
2dlog2 2 p e
i=dlog2 2c p−1 e
l m
It follows that the probability P that some node in B performs a PULL from S in k = p
k
c
2
c/8 we have that 1 − 4p 4p ≥ 1
rounds is at most P ≥ 1 − 1 − 4p
.
Since
p
≤
= 41 .
c
c
2
Therefore,
1 4p
4p
4
= Θ().
P ≥ 1 − 4−k· c ≥ 1 − 4−· p · c = 1 − 4−· c = Θ
c
The claim is thus proved.
Chapter 5
Compressibility of the Web graph
Graphs resulting from human behavior (the web graph, friendship graphs, etc.) have hitherto
been viewed as a monolithic class of graphs with similar characteristics; for instance, their
degree distributions are markedly heavy-tailed. In this chapter we take our understanding
of behavioral graphs a step further by showing that an intriguing empirical property of
web graphs — their compressibility — cannot be exhibited by well-known graph models for
the web and for social networks. We then develop a more nuanced model for web graphs
and show that it does exhibit compressibility, in addition to previously modeled web graph
properties.
5.1
Overview
There are three main reasons for modeling and analyzing graphs arising from the Web and
from social networks: (1) they model social and behavioral phenomena whose graph-theoretic
analysis has led to significant societal impact (witnessed by the role of link analysis in web
search); (2) from an empirical standpoint, these networks are several orders of magnitude
larger than those studied hitherto (search companies are now working on crawls of 100
billion pages and beyond); (3) from a theoretical standpoint, stochastic processes built from
independent random events — the classical basis of the design and analysis of computing
artifacts — are no longer appropriate. The characteristics of such behavioral graphs (viz.,
graphs arising from human behavior) demand the design and analysis of new stochastic
processes in which elementary events are highly dependent. This in turn demands new
analysis and insights that are likely to be of utility in many other applications of probability
and statistics.
In such analysis, there has been a tendency to lump together behavioral graphs arising
from a variety of contexts, to be studied using a common set of models and tools. It has
been observed [5, 21, 66] for instance that the directed graphs arising from such diverse
phenomena as the web graph (pages are nodes and hyperlinks are edges), citation graphs,
The work described in this chapter is a joint work with F. Chierichetti, R. Kumar, A. Panconesi and
P. Raghavan and its extended abstract appeared in the Proceedings of 50th Annual IEEE Symposium on
Foundations of Computer Science(FOCS09) [25]. This work is also part of F. Chierichetti’s PhD Thesis.
77
78
CHAPTER 5. COMPRESSIBILITY OF THE WEB GRAPH
friendship graphs, and email traffic graphs all exhibit power laws in their degree distributions:
the fraction of nodes with indegree k > 0 is proportional to 1/k α typically for some α > 1;
random graphs generated by classic Erdös–Rényi models cannot exhibit such power laws.
To explain the power law degree distributions seen in behavioral graphs, several models have
been developed for generating random graphs [2, 5, 15, 16, 23, 37, 58, 71] in which dependent
events combine to deliver the observed power laws.
While the degree distribution is a fundamental but local property of such graphs, an
important global property is their compressibility — the number of bits needed to store each
edge in the graph. Compressibility determines the ability to efficiently store and manipulate
these massive graphs [53, 99, 107]. An intriguing set of papers by Boldi, Santini, and Vigna
[9, 10, 12] shows that the web graph is highly compressible: it can be stored such that
each edge requires only a small constant number — between one and three — of bits on
average; a more recent experimental study confirms these findings [22]. These empirical
results suggest the intriguing possibility that the Web can be described with only O(1)
bits per edge on average. Two properties are at the heart of the compression algorithm of
Boldi and Vigna [10]. First, once web pages are sorted lexicographically by URL, the set of
outlinks of a page exhibits locality; this can plausibly be attributed to the fact that nearby
pages are likely to come from the same web site’s template. Second, the distribution of
the lengths of edges follows a power law with exponent > 1 (the length of an edge is the
distance of its endpoints in the ordering); this turns out to be crucial for high compressibility.
This prompts the natural question: can we model the compressibility of the web graph, in
particular mirroring the properties of locality and edge length distribution, while maintaining
other well-known properties such as power law degree distribution.
Main results. Our first set of results in this chapter is to show that the best known models
for the web graph cannot account for compressibility, in the sense that they require Ω(log n)
bits storage per edge on average. This holds even when these graphs are represented just
in terms of their topology (i.e., with all labels stripped away). Specifically, we show that
the preferential attachment model [5, 15], the ACL model [2], the copying model [65], the
Kronecker product model [69], and Kleinberg’s model for navigability1 on social networks
[58], all have large entropy in the above sense.
We then show our main result: a new model for the web graph that has constant entropy per edge, while preserving crucial properties of previous models such as the power
law distribution of indegrees, a large number of communities (i.e., bipartite cliques), small
diameter, and a high clustering coefficient. In this model, nodes lie on the line and when
a new node arrives it selects an existing node uniformly at random, placing itself on the
line to the immediate left of the chosen node. An edge from the new to the chosen node is
added, and moreover all outgoing edges of the chosen node but one are copied (these edges
are chosen at random); thus, the edges have some locality. We then show a crucial property
of our model: the power law distribution of edge lengths. Intuitively, this long-get-longer
effect is caused since a long edge is likely to receive the new node (which selects its position
1
Since navigability is a crucial property of real-life social networks (cf. [31, 73, 101]), it is tempting to
conjecture that social networks are incompressible; see, for instance, [24].
5.1. OVERVIEW
79
uniformly at random) under its protective wing, and the longer it gets, the more likely it
is to attract new nodes. Using this, we show that the graphs generated by our model are
compressible to O(1) bits per edge; we also provide a linear-time algorithm to compress an
unlabeled graph generated by our model.
Technical contributions and guided tour. In Section 5.3 we prove that several wellknown web graph models are not compressible, i.e., they need Ω(log n) bits per edge. In
fact, we prove incompressibility even after the labels of nodes and orientations of edges are
removed.
Sections 5.4 presents our new model and Sections 5.5, 5.6 and 5.8 present the basic
properties of our model. Although our new model might at first sight closely resembles
a prior copying model of [65], it differs in fundamental respects. First, our new model
successfully admits the global property of compressibility which the copying model provably
does not. Second, while the analysis of the distribution of the in-degrees is rather standard,
the crucial property that edge lengths are distributed according to a power law requires
an entirely novel analysis; in particular, the proof requires a very delicate understanding
of the structural properties of the graphs generated by our model in order to establish the
concentration of measure. Section 6.3 addresses the compressibility of our model, where we
also provide an efficient algorithm to compress graphs generated by our model.
It is difficult to distinguish experimentally between graphs that require only O(1) bits per
edge and those requiring, say, log n bits. The point however is that the compressibility of
our model relies upon other important structural properties of real web graphs that previous
models, in view of our lower bounds, provably cannot have.
Related prior work. The observation of power law degree distributions in behavioral (and
other) graphs has a long history [5,66]; indeed, such distributions predate the modern interest
in social networks through observations in linguistics [108] and sociology [97]; see the survey
by Mitzenmacher [80]. Simon [97], Mandelbrot [76], Zipf [108] and others have provided a
number of explanations for these distributions, attributing them to the dependencies between
the interacting humans who collectively generate these statistics. These explanations have
found new expression in the form of rich-get-richer and herd-mentality theories [5, 103].
Early rigorous analyses of such models include [2, 15, 28, 65]. Whereas Kumar et al. [65] and
Borgs et al. [16] focused on modeling the web graph, the models of Aiello, Chung, and Lu
(ACL) [2], Kleinberg [58], Lattanzi and Sivakumar [67], and Leskovec et al. [69] addressed
social graphs in which people are nodes and the edges between them denote friendship. The
ACL model is in fact known not to be a good representation of the web graph [66], but is a
plausible model for human social networks. Kleinberg’s model of social networks focuses on
their navigability: it is possible for a node to find a short route to a target using only local,
myopic choices at each step of the route. The papers by Boldi, Santini and Vigna [9, 10, 12]
suggests that the web graph is highly compressible (see also [1, 22, 24, 99]).
80
5.2
CHAPTER 5. COMPRESSIBILITY OF THE WEB GRAPH
Preliminaries
The graph models we study will either have a fixed number of nodes or will be evolving
models in which nodes arrive in a discrete-time stochastic process; for many of them, the
number of edges will be linear in the number of nodes. We analyze the space needed to store
a graph randomly generated by the models under study; this can be viewed in terms of the
entropy of the graph generation process. Note that a naive representation of a graph would
require Ω(log n) bits per edge; entropically, one can hope for no better for an Erdös–Rényi
graph. We are particularly interested in the case when the amortized storage per edge can
be reduced to a constant. As in the work of Boldi and Vigna [10, 12], we view the nodes as
being arranged in a linear order. To prove compressibility we then study the distribution of
edge lengths — the distance in this linear order between the end-points of an edge.
Background.We now recall a concentration result introduced in the previous chapters.
Given a function f : A1 × · · · × An → R, we say that f satisfies the c-Lipschitz property
if, for any sequence (a1 , . . . , an ) ∈ A1 × · · · × An , and for any i and a0i ∈ Ai ,
|f (a1 , . . . , ai−1 , ai , ai+1 , . . . , an ) − f (a1 , . . . , ai−1 , a0i , ai+1 , . . . , an )| ≤ c.
In order to establish that certain events occur w.h.p., we will make use of the following
concentration result known as the method of bounded differences (cf. [35]).
Theorem 5.2.1 (Method of bounded differences) Let X1 , . . . , Xn be independent r.v.’s.
Let f be a function on X1 , . . . , Xn satisfying the c-Lipschitz property. Then,
2
2
Pr [|f (X1 , . . . , Xn ) − E [f (X1 , . . . , Xn )]| > t] ≤ 2e−t /(c n) .
We also prove the following lemma about the Gamma function. We will use it in the
compressibility analysis of our new model.
Lemma 5.2.1 Let a, b ∈ R+ be such that b 6= a + 1. For each t ∈ Z+ , it holds that
t
X
Γ(i + a)
i=1
1
=
·
Γ(i + b)
b−a−1
Γ(a + 1) Γ(t + a + 1)
−
Γ(b)
Γ(t + b)
.
Proof: We start by giving an expression of Γ(i+a)
, for i ≥ 1, that we will use to telescope
Γ(i+b)
the sum. Consider the following chain of equations:
Γ(i + a)
Γ(i + b)
Γ(i + a)
Γ(i + b)
b − a − 1 = (i + b − 1) − (i + a)
Γ(i + a)
Γ(i + a)
· (b − a − 1) =
· (i + b − 1) −
· (i + a)
Γ(i + b)
Γ(i + b)
Γ(i + a)
Γ(i + a + 1)
· (b − a − 1) =
−
Γ(i + b − 1)
Γ(i + b)
Γ(i + a)
1
Γ(i + a)
Γ(i + a + 1)
=
·
−
Γ(i + b)
b−a−1
Γ(i + b − 1)
Γ(i + b)
5.3. INCOMPRESSIBILITY OF THE EXISTING MODELS
Then, by telescoping on the sum terms, we get:
Γ(a+1)
Γ(a+2)
Γ(a+2)
Γ(a+3)
Γ(a+t)
t
−
+
−
+
·
·
·
+
−
X
Γ(b)
Γ(b+1)
Γ(b+1)
Γ(b+2)
Γ(b+t−1)
Γ(i + a)
=
Γ(i + b)
b−a−1
i=1
=
Γ(a+1)
Γ(b)
−
Γ(a+t+1)
Γ(b+t)
b−a−1
Γ(a+t+1)
Γ(b+t)
,
proving the claim.
5.3
81
Incompressibility of the existing models
In this section we prove the inherent incompressibility of commonly-studied random graph
models for social networks and the web. We show that on average Ω(log n) bits per edge are
necessary to store graphs generated by several well-known models for web/social networks,
including the preferential attachment and the copying models. In our lower bounds, we show
that the random graph produced by the models we consider are incompressible, even after
removing the labels of their nodes and orientations of their edges. Given a labeled/directed
graph and its unlabeled/undirected counterpart(the set of graphs obtained from the initial
graph by applying an isomorphism), the latter is more compressible than the former; in fact,
the gap can be arbitrarily large [84,102]. Thus the task of proving incompressibility of unlabeled/undirected versions of graphs generated by various models is made more challenging.
(Note that it is crucial to analyze the compressibility of unlabeled graphs — the experiments
on web graph [10, 12] show how just the edges can be compressed using only ≈ 2 bits per
edge.)
We now give some intuition on why one cannot preclude an incompressible directed/labeled
graph from becoming very compressible after removing the labels and directions.
Consider the following (non-graph related) random process.
Suppose we have two bins B1 and B2 and suppose we toss two independent fair coins
c1 , c2 . If c1 is head (resp., tail), then we place a white (resp., black) ball in B1 . Analogously,
if c2 is head (resp., tail), then we place a white (resp., black) ball in B2 . Now, consider the
r.v. X describing the status of the two distinguishable bins. It has four possible outcomes
((W, W ), (W, B), (B, W ), (B, B)) and each of them is equally likely; thus H(X) = 2. Now,
suppose we empty the bins B1 and B2 on a table, and let Y be the random variable describing
the status of the table after the two balls are placed on it. Y has three possible outcomes
({W, W }, {W, B}, {B, B}) and its entropy is H(Y ) = 32 < 2 = H(X).
Similarly, for n coins and n bins, we have H(Xn ) = n and H(Yn ) = Θ(log n). Thus, we
can get an exponential gap between the entropies of the labeled (i.e., each outcome can be
matched to the coin toss that determined it) and unlabeled processes.
For a graph-related example, suppose we choose a labeled transitive tournament on n
nodes u.a.r. There are n! such graph, each equally likely, so that the entropy would be
log(n!) = Θ(n log n). On the other hand, there exists a single unlabeled transitive tournament, i.e., the entropy of the unlabeled version is zero.
82
5.3.1
CHAPTER 5. COMPRESSIBILITY OF THE WEB GRAPH
Proving incompressibility
Let Gn denote the set of all directed labeled graphs on n nodes. Let Pnθ : Gn → [0, 1] denote
the probability distribution on Gn induced by the random graph model θ. In this chapter
we consider the preferential attachment model (θ = pref), the ACL model (θ = acl), the
copying model (θ = copy), the Kronecker multiplication model (θ = krm), and Kleinberg’s
model (θ = kl).
For a given θ, let H(Pnθ ) denote the Shannon entropy of the distribution Pnθ , that is, the
average number of bits needed to represent a directed labeled random graph generated by
θ. Our goal is to obtain lower bounds on the representation. This is accomplished by the
following min-entropy argument.
P
Lemma 5.3.1 (Min-entropy argument) Let Gn∗ ⊆ Gn , P + ≤ G∈Gn∗ Pnθ (G), and P ∗ ≥
maxG∈Gn∗ Pnθ (G). Then, H(Pnθ ) ≥ P + · log(1/P ∗ ).
Proof:
H(Pnθ ) =
X
G∈Gn
Pnθ (G) log
X
X
1
1
1
1
θ
P
(G)
log
Pnθ (G) log ∗ ≥ P + ·log ∗ . 2
≥
≥
n
θ
θ
Pn (G) G∈G ∗
Pn (G) G∈G ∗
P
P
n
n
by P and
we will upper bound
Thus, to obtain
P lowerθ bounds on
+
∗
lower bound G∈Gn∗ Pn (G) by P , for a suitably chosen Gn ⊆ Gn . For good lower bounds on
H(Pnθ ), Gn∗ has to be chosen judiciously. For instance, choosing a large Gn∗ (say, Gn ) might
only yield a P ∗ that is moderately small, while at the same time, it is important to choose
a Gn∗ such that P + is large.
Let Hn denote the set of all undirected unlabeled graphs on n nodes. Let ϕ : Gn → Hn
be the many-to-one map that discards node and edge labels and edge orientations. For
aPgiven model θ, let Qθn : Hn → [0, 1] be the probability distribution such that Qθn (H) =
θ
θ
θ
θ
ϕ(G)=H Pn (G). Clearly, H(Qn ) ≤ H(Pn ) and therefore, lower bounds on H(Qn ) are stronger
and harder to obtain.
In the following subsections we consider a number of Web Graph models, showing that
each of them requires Ω(log n) bits per link — that is, that they all are incompressible. We
consider, in this order, the Preferential Attachment model [15], the Aiello-Chung-Lu (ACL)
model [2], the copying model [65], the Kronecker multiplication model [69] and Kleinberg’s
small-world model [58].
H(Pnθ ),
5.3.2
maxG∈Gn∗ Pnθ (G)
∗
Incompressibility of the preferential attachment model
Consider the preferential attachment model (pref[k]) defined in [15]. This model is parametrized
by an integer k ≥ 1. At time 1, the (undirected) graph consists of a single node x1 with 1
self-loop. At time t > 1,
(1) a new node xt , labeled t, is added to the graph;
(2) a random node y is chosen from the graph with probability proportional to its current
degree (in this phase, the degree of xt is taken to be 1);
5.3. INCOMPRESSIBILITY OF THE EXISTING MODELS
83
(3) the edge xt → y, labeled t mod k, is added to the graph;2 and
(4) if t is a multiple of k, nodes t − k + 1, . . . , t are merged together, preserving self-loops
and multi-edges.
For k = 1, note that the graphs generated by the above model are forests. Since there
are 2O(n) unlabeled forests on n nodes (e.g., [83]), whose edges can be directed in at most 2n
pref[k]
ways, H(Qn
) = O(n), i.e., the graph without labels and edge orientations is compressible
to O(1) bits per edge. The more interesting case is when k ≥ 2 for which we show an
incompressibility bound.
We underscore the importance of a good choice of Gn∗ in applying Lemma 5.3.1. Consider
the graph G having the first node of degree k(n + 1) and the other n − 1 nodes of degree
Q
pref[k]
k−1+i
−nk
k. Clearly, Pn
(G) = nk
. Thus, choosing a set Gn∗ containing G, would
i=k+1 2i−1 ≥ 2
force us to have P ∗ ≥ 2−nk so that the entropy bound given by Lemma 5.3.1 would only be
pref[k]
H(Pn
) ≥ nk = Θ(n). (A similar issue would be encountered in the unlabeled case as
well.) A careful choice of Gn∗ , however, yields a better lower bound.
pref[k]
Theorem 5.3.1 H(Qn
) = Ω(n log n), for k ≥ 2.
Proof: Let G be a graph generated by pref[k]. Let degt (xi ), for i ≤ t, be the degree of the
i-th inserted node at time t in G. By [29, Lemma 6], with probability
1 − O(n−3 ), for each
p
1 ≤ t ≤ n, each node xi , 1 ≤ i ≤ t, will have degree degt (xi ) < ( t/i) log3 n in G.
P∗
√
In particular, let t∗ = d 3 ne. Let ξ be the event: “∃t ≥ t∗ , ti=1 degt (xi ) ≥ n3/4 .” At
time n, the sum of the degrees of nodes x1 , . . . , xt∗ can be upper bounded by
t∗
t∗ r
t∗
X
X
X
√
n
3
3
degn (xi ) ≤
log n = n log n
i−1/2 < O(n3/4 ),
i
i=1
i=1
i=1
w.h.p. Indeed, Pr [ξ] ≤ O(n−3 ).
Now define t+ = dne, for some small enough > 0; let n be large enough such that
∗
t < t+ . We call a node added after time t+ good if it is not connected to any of the first t∗
nodes. To bound the number of good nodes from below, we condition on ξ, and we upper
bound the number of bad nodes. Using a union bound, the probability that node xt for
t ≥ t∗ is bad can be upper bounded by k · n3/4 /(n) ≤ O(n−1/4 ).
Let ξ 0 be the event: “at least (1 − 2)n nodes are good”; by stochastic dominance, the
event ξ 0 happens w.h.p. In our application of Lemma 5.3.1, we will choose Gn∗ ⊆ Gn to be
the set of graphs satisfying ξ ∩ ξ 0 . Thus, P + = Pr [ξ ∩ ξ 0 ] = 1 − o(1). Moreover,
(1−2)kn
q n
3
√
3 n log n
2(1−2)n
4
14

≤ O(n−2/3+ )
≤ n− 3 n+ 3 n = ρ.
max∗ Pnpref[k] (G) ≤ 
G∈Gn
kn
pref[k]
(Notice how, by applying Lemma 5.3.1 at this point, we already have that H(Pn
Ω(n log n).)
2
) ≥
In the original PA model, edges are both undirected and unlabeled: we direct and label them for
simplicity of exposition. The entropy lower bound will hold for the undirected and unlabeled version of
these graphs.
84
CHAPTER 5. COMPRESSIBILITY OF THE WEB GRAPH
pref[k]
Now, we proceed to lower bound H(Qn
) through an upper bound on |ϕ−1 (H)| for
pref[k]
H ∈ Hn
, by a careful counting argument. Given a H, it is possible to determine for each
of its edges, which of the two endpoints of the edge was responsible for adding the edge to
the graph. This task is easy for edges incident to any node of degree k, as that node will
have necessarily added all k edges to the graph. So, we can remove all degree k nodes from
the graph and repeat this process until the graph becomes empty.
Thus, H could have been produced from at most n! · (k!)n labeled graphs, since there are
at most n! ways of labeling the nodes, and k! ways of labeling each of the “outgoing” edges
of each node. That is, |ϕ−1 (H)| ≤ n! · (k!)n ≤ nn k kn . Then, choosing Hn∗ ⊆ Hn to be the
set of unlabeled graphs obtained by removing labels from Gn∗ , Hn∗ = {ϕ(G) | G ∈ Gn∗ }, we
obtain P + = 1 − o(1), and
(H) ≤ ρ · nn · k kn = n−Ω(n) k kn = P ∗ .
max Qpref[k]
n
∗
H∈Hn
pref[k]
Finally, an application of Lemma 5.3.1 gives H(Qn
the proof.
5.3.3
) ≥ P + ·log P1∗ ≥ Ω(n log n), completing
Incompressibility of the ACL model
We recall the ACL model (model A in [2]). This model (acl[α]) is parametrized by some
α ∈ (0, 1). At time 1, the graph consists of a single node. At time t + 1, a coin is tossed:
with probability 1 − α, a new node is added to the graph and with probability α, an edge
from x to y is added to the graph, where node x is chosen with probability proportional to
the outdegree of x, while node y is chosen randomly with probability proportional to the
indegree of y.
We assume that α > 1/2. This is because the edge density of the graph generated
by model is α/(1 − α), w.h.p.; if α < 1/2, then there are many more nodes than edges,
an uninteresting case both in theory and in practice. Under this assumption, we show
acl[α]
H(Pn ) = Ω(n log n).
0acl[α]
Theorem 5.3.2 H(Qn
) = Ω(n log n), for3 α > 1/2.
Proof: Let α > 1/2 be the parameter of the acl[α] model. Let Gn0 be the set4 of all timelabeled graphs, that can be generated by acl[α] model in n time steps, where the label
represents the time when a node or an edge was added to the graph. Let Hn0 be the set of
all undirected and unlabeled graphs that can be obtained by removing the orientation and
(time-)labels from the graphs in Gn0 .
0acl[α]
Let Pn
: Gn0 → [0, 1] denote the probability distribution induced on Gn0 by the model
acl[α]. We define the following two events.
3
Here we do not use the probability distribution Q on the graphs of n nodes — in the acl[α] model the
0acl[α]
number of nodes is a r.v. Qn
denotes the probability distribution on the graphs that can be generated
by the acl[α] model in n steps.
4
Note that here it would be unnatural to consider the previously defined class Gn , as the number of nodes
in the acl[α] model is a r.v. The same holds for Hn .
5.3. INCOMPRESSIBILITY OF THE EXISTING MODELS
85
ξ: the number of edges is αn ± o(n), while the number of nodes is (1 − α)n ± o(n), and
ξ 0 : the number of edges going from a node of O(1) outdegree to a node of O(1) indegree
is at least (α − )n, for some > 0 to be fixed later.
Our plan is first show (Lemma 5.3.2) that ξ ∧ ξ 0 occurs with probability 1 − o(1). Let
Gn0∗ ⊆ Gn0 be the subset of Gn0 containing the graphs satisfying ξ ∧ ξ 0 . Then, with the notation
of Lemma 5.3.1, it holds that P + = 1 − o(1). We will then show (Lemma 5.3.3) that
0acl[α]
P ∗ = maxG0 ∈Gn0∗ Pn
(G0 ) ≤ n−(2α−)n . Given these, we can complete the proof as follows.
Let ϕ0 : Gn0 → Hn0 , be the map that removes edge and node labels from the graphs of Gn0 . As
P
0acl[α]
0acl[α]
before, Qn (H 0 ) = ϕ0 (G0 )=H 0 Pn
(G0 ). Note that for each H 0 we have that |ϕ0 (G0 )| ≤ n!
(as each element of the graph has one label out of the set {1, . . . , n}). Thus,
(G) ≤ n! · n−(2α−)n ≤ n−(2α−)n+n = n(1−2α+)n .
max Q0acl[α]
n
0∗
G0 ∈Gn
The proof can be concluded with an application of Lemma 5.3.1.
Lemma 5.3.2 Pr [ξ ∧ ξ 0 ] = 1 − o(1).
Proof: By Chernoff bound, Pr [ξ] = 1 − o(1). Thus it suffices to show that Pr [ξ 0 ] = 1 − o(1).
Let Xit (Yit ) be the r.v. denoting the number of nodes having indegree (outdegree) i at
time t. The authors of [2] show that
E [Xit ]
1
E [Yit ]
1−α
1
Γ(i)
±O
=
=
Γ 1+
,
1
t
t
α
α Γ i+1+ α
t
and that
i
h
t √
t
Pr Xi − E Xi > 2t log n + 2 < exp − log2 n ,
i
h
t √
t
Pr Yi − E Yi > 2t log n + 2 < exp − log2 n .
Note that, by union bound, each of the r.v.s Xit , Yit can be shown to deviate from their
mean by at most the stated error term w.h.p.
Let j be an integer to be fixed later. An edge is good if it goes from a node of outdegree
≤ j to a node of indegree ≤ j. Let us denote by Zjt the number of good edges at time t.
Note that Zjt−1 + 1 ≥ Zjt ≥ Zjt−1 − 2j. This is because at most one edge is added in a single
step and adding an edge changes the degree of at most 2 nodes. Thus, the number of good
edges can decrease at most 2j in a single step, i.e., Zjt satisfies the (2j)-Lipschitz condition.
Then,
2j
t
t−1 t
X
t−1
E Zj = E Zj
+ Pr Zj = Zj + 1 −
i Pr Zjt = Zkt−1 − i .
i=1
In order to increase the number of good edges, a node of indegree < j and a node of
outdegree < j must be chosen as the ending and the starting point of the new edge.
P
P
j−1
j−1
t−1
t−1
iX
iY
i
i
i=1
i=1
Pr Zjt = Zjt−1 + 1 = α
.
(t − 1)2
86
CHAPTER 5. COMPRESSIBILITY OF THE WEB GRAPH
For the number of good edges to decrease, either the origin of the new edge has outdegree
j, or the destination of the new edge has indegree j. Thus,
jXjt−1 jYjt−1
Pr Zjt < Zjt−1 ≤
+
.
t−1
t−1
By calculations,
2j
X
i=1
2j
X
t
t
X t−1 + Yjt−1
t−1
t−1
2 j
Pr Zj = Zj − i ≤ 2j
i Pr Zj = Zj − i ≤ 2j
.
t−1
i=1
Thus,
E
Zjt
≥E
Zjt−1
E
hP
+α
j−1
i=1
iXit−1
P
j−1
i=1
iYit−1
(t − 1)2
i
E Xjt−1 + E Yjt−1
.
− 2j
t−1
2
With probability 1 − o(1), for all log2 n ≤ t ≤ n and 1 ≤ i ≤ j − 1, we have
1−α
1
Γ(i)
t
t
t.
Xi = Yi = (1 ± o(1))
Γ 1+
α
α Γ i + 1 + α1
Thus w.h.p., for all t ≥ log2 n,
j−1
j−1
X
Γ(j + 1)Γ(1 + α1 )
Xit X Yit
i
=
=1−
± o(j 2 ).
i
1
t
t
Γ(j + α )
i=1
i=1
As j is a constant, the error term is o(1). Then,
2
t
t−1 Γ(j + 1)Γ(1 + α1 )
1−α
1
j 2 Γ(j)
E Zj ≥ E Zj
−
4
Γ
1
+
± o(1).
+α 1−
α
α Γ(j + 1 + α1 )
Γ(j + α1 )
Γ(j+1)Γ(1+ 1 )
2
j Γ(j)
α
Note that, as j grows, both
and Γ(j+1+
tend to 0. That is, for each 1 ,
1
1
Γ(j+ α
)
)
α
there exists a j = j(1 ) such that
(5.1)
E Zjt ≥ E Zjt−1 + (1 − 2 )α.
For each j, and for each t, we will define a Bjt in such a way that, w.h.p., Bjt ≤ E Zjt . Let
Bjt = 0 for t ≤ dlog3 ne so that the base case is true. Define Bjt = (t−dlog3 ne)(1−2 )α. This
definition satisfies Bjt ≤ E Zjt for all t — this can be shown by induction on (5.1). Recall
that all these hold w.h.p. As we have already
argued,
the r.v. Zkt satisfies the (2j)-Lipschitz
condition, i.e., using [2, Lemma 1], Zjt = E Zjt ± o(t) ≥ t(1 − 2 )α, for every t ≥ dlog3 ne,
w.h.p.
In particular for any 2 > 0 there exists a j = j() s.t. Zjn ≥ n(1 − 2 )α, w.h.p.
0acl[α]
Lemma 5.3.3 Conditioned on ξ ∧ ξ 0 , maxG0 ∈Gn0∗ Pn
(G0 ) ≤ n−(2α−)n .
5.3. INCOMPRESSIBILITY OF THE EXISTING MODELS
87
Proof: Since we condition on ξ 0 , there are at least n(1 − 2 )α good edges. These good edges
are labeled with their order of arrival. For i ≥ 3 αn, the probability that the i-th arrived
2
2
edge is good is at most ji2 ≤ (3jαn)2 .
The probability that all the edges with label at least 3 αn are good is at most
j
3 αn
2(1−3 )αn
≤
j
3 α
2(1−3 )αn
n−2(1−3 )αn ≤ n−(2α−)n .
Thus, the maximum probability of generating a graph in Gn0∗ , conditioned on ξ ∧ ξ 0 , is at
most n−(2α−)n .
5.3.4
Incompressibility of the copying model
We now turn our attention to the (linear growth) copying model (copy[α, k]) of Kumar et
al. [65]. This model is parametrized by an integer k ≥ 1 and an α ∈ (0, 1). Here, k represents
the outdegree of nodes and α determines the “copying rate” of the graph. At time t = 1, the
graph consists of a single node with k self-loops. At time t > 1,
(1) a new node xt is added to the graph;
(2) a node x is chosen uniformly at random among x1 , . . . , xt−1 ; and
(3) for each i = 1, . . . , k, a α-biased coin is flipped: with probability α, the i-th outlink
of xt is chosen uniformly at random from x1 , . . . , xt−1 and with probability 1 − α, the i-th
outlink of xt will be equal to the i-th outlink of x, i.e., the i-th outlink will be “copied”.
copy[α,k]
Theorem 5.3.3 H(Qn
) = Ω(n log n), for k > 2/α.
Proof: We start by noting that the copying model with outdegree k can be completely
described by k independent versions of the copying model with outdegree 1. We use copy[α, k]
to denote the copying model with k outlinks, Gn,k for the set of labeled5 graphs on n nodes
that can be generated by copy[α, k], and Hn,k for the set of unlabeled graphs that can be
obtained by removing labels and orientations from the graphs in Gn,k .
We start with the case k = 1. Let E [Xit ] be the expected indegree at time t of the node
inserted at time i ≤ t. Then,
t
0
t=k
E Xi =
α
+
t
> i.
E Xit−1 1 + 1−α
t−1
t−1
αΓ(t+1−α)Γ(i)
α
Note that E [Xit ] = (1−α)Γ(i+1−α)Γ(t)
− 1−α
. We now show that Xit satisfies a O(1)-Lipschitz
condition, with the constant depending on i and α.
Let Yjt denote the number of edges “copied”, directly or indirectly, from the j-th edge
until time t ≥ j. Precisely, let us define the singleton Sjj = {ej } containing the j-th added
edge. The set Sjt , t > j, will be defined as follows: if the t-th edge et was copied from some
edge in Sjt−1 , then Sjt = {et } ∪ Sjt−1 , otherwise Sjt = Sjt−1 . With this notation, Yjt = |Sjt |.
We now use the following concentration bound [35].
5
Nodes are labeled with 1, . . . , n and, for each node, its outlinks are labeled with 1, . . . k.
88
CHAPTER 5. COMPRESSIBILITY OF THE WEB GRAPH
Theorem 5.3.4 (Method of average bounded differences) Suppose f is some function of (possibly dependent) r.v.’s X1 , . . . , Xn . Suppose that, for each i = 1, . . . , n, there
exists a ci such that, for all pairs xi , x0i of possible values of Xi , and for any assignment
X1 = x1 , . . . , Xi−1 = xi−1 , it holds that |E − E 0 | ≤ ci , where
E = E [f (X1 , . . . , Xn ) | Xi = xi , Xi−1 = xi−1 , . . . , X1 = x1 ] ,
E 0 = E [f (X1 , . . . , Xn ) | Xi = x0i , Xi−1 = xi−1 , . . . , X1 = x1 ] .
Let c =
Pn
2
i=1 ci .
Then,
2
t
Pr [|f (X1 , . . . , Xn ) − E [f (X1 , . . . , Xn )]| > t] ≤ 2 exp −
.
2c
Let j be fixed.
to bound cj in
5.3.4 can be applied.
such a way that Theorem
goal
ist−1
Our
j
1−α
t
· 1 + t−1 , for t > j, and Yj = 1. Then, it follows that
Observe that E Yj = E Yj
t Γ(t+1−α)Γ(j)
E Yj = Γ(t)Γ(j+1−α) .
Suppose we want to bound the degree of the i-th node xi . Then, we are interested in
t
bounding the maximum expected change cj in the degree
nXi of xi over the possible choices
of the j-th edge, for j =Pi + 1, . . . , n. We have cj ≤ 2 E Yj .
Let us consider c = nj=i+1 c2j . We have
c ≤
n
X
2
2 E Yjn
j=i+1
≤ 4
Γ(n + 1 − α)
Γ(n)
≤ a · n2−2α
n
X
j=i+1
2
1
j 2−2α
·
n X
j=i+1
Γ(j)
Γ(j + 1 − α)
2
,
for some large enough constant a > 0. Thus we obtain,

i2α−1 −n2α−1
α<
 a · n2−2α ·
1−2α
a · n · (log n + 1)
α=
c≤

2−2α n2α−1 −i2α−1 +1
a·n
·
α>
2α−1
1
2
1
2
1
2
Let us fix i = dne. Then,

1−2α
1−2α
= a · n · 1−
α<
 a · n2−2α · n2α−1 1−
1−2α
1−2α
c≤
a · n · (log n + 1)
α=

1
1
a · n2−2α · n2α−1 2α−1
+ n2−2α = a · n · 2α−1
+ o(n) α >
1
2
1
2
1
2
Thus, c ≤ O(n log n). Applying Theorem 5.3.4, we get
h
i
p
t 4c
log
n
2
t
Pr Xi − E Xi ≥ 2 c log n ≤ 2 exp
= 2 exp(2 log n) = 2 .
2c
n
5.3. INCOMPRESSIBILITY OF THE EXISTING MODELS
89
By the union bound, with probability
√ 1 − O(1/n), each node i = dne, dne + 1, . . . , n
will have degree upper bounded by O( n log n) (note that the expected degree of these
nodes is constant). Conditioning (as in the proof of Theorem 5.3.1) on this event we obtain
∗
⊆ Gn,1 , P + = 1 − o(1), and for k = 1,
Gn,1
r
max Pncopy[α,1] (G) ≤
∗
G∈Gn,1
O
log n
n
!!αn
.
Now let us consider copy[α, k], with k > 1. Since outlinks are chosen independently, it
holds that
!!kαn
r
log
n
max
Pncopy[α,k] (G) ≤ O
.
∗
G∈Gn,k
n
For constant k > 2/α, this upper bound is less than n−(1+)n for some constant > 0.
copy[α,k]
To show a lower bound on H(Qn
), we once again upper bound |ϕ−1 (H)|, for H ∈
Hn,k . We proceed as in the proof of Theorem 5.3.1. Given H, for each of its nodes v, it is
possible to determine which of the edges incident to v were its outlinks in all the G’s such
that ϕ(G) = H (this can be done by induction, noting that a node of degree k in H in would
have had in-degree 0 in G). As there are exactly k labels for the outlinks of each node, and
the number of nodes is n, we have that, for each H ∈ Hn,k , |ϕ−1 (H)| ≤ n! · (k!)n . The proof
can be concluded as in Theorem 5.3.1.
5.3.5
Incompressibility of the Kronecker multiplication model
We now turn our attention to the Kronecker multiplication model (krm) of Leskovec et
al. [69].
Given two matrices A ∈ Rn×n and B ∈ Rm×m , their Kronecker product A ⊗ B is an
nm × nm matrix


a1,1 B a1,2 B · · · a1,n B
 a2,1 B a2,2 B · · · a2,n B 


A ⊗ B =  ..
..
..  ,
.
.
 .
.
.
. 
an,1 B an,2 B · · · an,n B
where A = {ai,j } and ai,j B is the usual scalar product.
The Kronecker multiplication model is parametrized by a square matrix M ∈ [0, 1]`×` , and
a number s of multiplication “steps”. The graph will be composed by `s nodes. The edges are
generated as follows. For each couple of distinct nodes (i, j) in the graph an edge going from
[s]
[s]
i to j will be added independently with probability Mi,j , where Mi,j = |M ⊗ M {z
⊗ · · · ⊗ M}.
s times
It is clear that for some choices of the matrix M , the graph will be compressible. Indeed,
if M has only 0/1 values then the random graph has zero entropy, as its construction is
completely deterministic. On the other hand, we show here that there exists a matrix M
that makes the graph incompressible. Indeed, even some 2 × 2 matrix M would generate
graphs requiring at least Ω(log n) bits per edge and we expect that a lot of probabilistic
90
CHAPTER 5. COMPRESSIBILITY OF THE WEB GRAPH
matrix will have the same behavior. (Note that a 1 × 1 matrix can only produce graphs
containing a single node.)
1 1
krm[M,s]
Theorem 5.3.5 Let ` ≥ 2, J =
and 1/` < α < 1. Then, w.h.p., H(Qn
)=
1 1
Ω(m log n), where n = `s , M = α · J` , and m is the number of edges.
Proof: Consider the original directed version of the graph. Note that M [s] = αs · J`s . Thus
the events “the edge i → j is added to the graph” are i.i.d. trials, each having probability of
success αs .
In the undirected and simple version of the graph, the events “the edge {i, j} is added to
the graph”, for i 6= j, are again i.i.d. trials, each of probability β = 1 − (1 − αs )2 = Θ(αs ).
Thus we obtain an Erdös–Rényi Gn,p graph with n = `s and p = Θ(αs ). By a Chernoff
bound, m = Θ(n2 p), w.h.p. Now,
s 1
1
m = Θ(n2 p) = Θ(n · (`α)s ) = Θ n · `1+log` α
= Θ n · (`s )1−log` α = Θ n2−log` α .
By α > `−1 , we obtain log` α1 < 1; thus m = Θ(n2 p) is a polynomial in n of degree > 1.
Recall that, for Lemma 5.3.1 to apply, we need to find a subset Hn∗ ⊆ Hn , having large
total probability P + , and such that each graph in Hn∗ has probability upper bounded by
a (small) P ∗ . The condition {m = Θ(n2 p)} determines our Hn∗ , giving us P + = 1 − o(1).
To upper bound P ∗ , note that each labeled version of each graph in Hn∗ has probability
2
2
≤ pΘ(n p) ≤ 2−Θ(s·n p) . There are at most n! ≤ 2O(n log n) many labeled versions of each fixed
graph in Hn∗ . Thus,
2
2
P ∗ ≤ 2O(n log n)−Θ(s·n p) = 2−Θ(s·n p) .
+
∗
2
By Lemma 5.3.1, we have that H(Qkrm
n ) ≥ P log(1/P ) ≥ Θ(s · n p). Noting that
s = Θ(log n) and m = Θ(n2 p) concludes the proof.
5.3.6
Incompressibility of Kleinberg’s small-world model
Recall Kleinberg’s small-world model6 (kl) [58, 59] on the line, with nodes 1, . . . , n. A directed labeled random graph is generated by the following stochastic process. Each node x
independently chooses a node y with probability proportional to 1/(|x − y|) and adds the
directed edge x → y; these are the so-called long-range edges. In addition, the node x has
(fixed) directed edges to its neighbors x − 1 and x + 1 (the short-range edges).
For simplicity, we start by proving the following weaker result. After the proof, we will
comment on how one can obtain the stronger incompressibility of Ω(n log n).
Lemma 5.3.4 H(Qkl
n ) = Ω(n log log n).
6
Note that an important difference between Kleinberg’s small-world model and other models considered
in this chapter lies in their degree distribution. Nodes’ degrees in Kleinberg’s model are upper bounded by
O(log n) w.h.p.; the other models we consider here have a power law degree distribution, and thus nodes of
polynomial degree, w.h.p.
5.4. THE NEW WEB GRAPH MODEL
91
Proof: Note that in Kleinberg’s one-dimensional model, the normalization factor for the
probability distribution that generates long-range edges is Θ(log n). Hence, for every node x,
the maximum probability of choosing a particular long-range edge x → y is at most c1 / log n,
for some constant c1 . Since each node chooses edges independently, the maximum probability
of generating any labeled n-node graph O((c1 / log n)n ), i.e., maxG∈Gn Pnkl (G) ≤ (c1 / log n)n .
Using Lemma 5.3.1, we conclude H(Pnkl ) = Ω(n log log n).
To get a lower bound on H(Q_n^{kl}), we first obtain an upper bound on the number ρ(H) of Hamiltonian paths in an undirected graph H with m edges (this upper bound will hold for directed graphs as well). Suppose that H has degree sequence d_1 ≥ · · · ≥ d_n, with 2m = Σ_{i=1}^n d_i. Clearly, ρ(H) ≤ n · Π_{i=1}^n d_i, where the leading n accounts for the different choices of the starting node. Applying the AM–GM inequality ($\sqrt[n]{\prod_{i=1}^n x_i} \le \frac{1}{n}\sum_{i=1}^n x_i$, for nonnegative x_i's), we have that ρ(H) ≤ n · Π_{i=1}^n d_i ≤ n · (2m/n)^n.
Let H ∈ H_n. By just considering all possible permutations of the node labels, we can see that |φ⁻¹(H)| ≤ n!. However, not all permutations are valid. In particular, a valid permutation preserves adjacency; hence the number of valid permutations is upper bounded by the number of Hamiltonian paths in H. Since m = O(n) in kl, by the above argument, ρ(H) ≤ c_2^n, for some constant c_2. Thus, |φ⁻¹(H)| ≤ c_2^n. We have
$$Q_n^{kl}(H) = \sum_{\substack{G \in \mathcal{G}_n \\ \varphi(G)=H}} P_n^{kl}(G) \;\le\; |\varphi^{-1}(H)| \cdot \max_{G \in \mathcal{G}_n} P_n^{kl}(G) \;\le\; c_2^n \left(\frac{c_1}{\log n}\right)^n = \left(O\!\left(\frac{1}{\log n}\right)\right)^{n}.$$
The proof is complete by appealing to Lemma 5.3.1.
The above lower bound can be improved as follows. First, we only consider graphs in which Ω(n) of the edges exist between nodes that are n^{Ω(1)} apart. Using a Chernoff bound, we show that the graphs generated by Kleinberg's model satisfy this property w.h.p. (i.e., the P⁺ of Lemma 5.3.1 is Ω(1)). It can then be shown that the maximum probability of generating any one of these graphs is at most P* = n^{−Ω(n)}. Once again, applying Lemma 5.3.1, we obtain the following theorem:
Theorem 5.3.6 H(Q_n^{kl}) = Ω(n log n).
Finally, we note that similar incompressibility bounds can be obtained for the rank-based friendship model [73].
5.4 The new web graph model
In this section we present our new web graph model. Let k ≥ 2 be a fixed positive integer.
Our new model creates a directed simple graph (i.e., no self-loops or multi-edges) by the
following process.
The process starts at time t0 with a simple directed seed graph Gt0 whose nodes are
arranged on a (discrete) line, or list. The graph Gt0 has t0 nodes, each of outdegree k. Here,
Gt0 could be, for instance, a complete directed graph with t0 = k + 1 nodes.
At time t > t0 , an existing node y is chosen uniformly at random (u.a.r.) as a prototype:
Figure 5.1: (k = 2) The new node x = D chooses y = C as its prototype. The edge C → B is copied and the new edge D → C is added for reference. Notice that all the edges incident to C in G_{t0} = G_3 increase their length by 1 in G_{t0+1} = G_4.
(1) a new node x is placed to the immediate left of y (so that y, and all the nodes on its
right, are shifted one position right in the ordering),
(2) a directed edge x → y is added to the graph, and
(3) k − 1 edges are “copied” from y, i.e., k − 1 successors (i.e., out-neighbors) of y,
say z1 , . . . , zk−1 , are chosen u.a.r. without replacement and the directed edges x →
z1 , . . . , x → zk−1 are added to the graph.
See Figure 5.1 for an illustration of our model.
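To make the process concrete, the following is a minimal Python sketch of the generation step just described; the function name and data layout are ours, chosen only for illustration, and the seed graph is taken to be the complete directed graph on k + 1 nodes.

```python
import random

def generate(k=3, n=1000, seed=0):
    """Minimal sketch of the copying model: returns the left-to-right
    ordering of the nodes on the line and, for each node, its ordered
    list of k out-neighbors."""
    rng = random.Random(seed)
    t0 = k + 1                                  # seed: complete directed graph
    line = list(range(t0))                      # nodes in left-to-right order
    succ = {v: [u for u in range(t0) if u != v] for v in range(t0)}
    for x in range(t0, n):
        y = rng.choice(line)                    # prototype y, chosen u.a.r.
        copied = rng.sample(succ[y], k - 1)     # copy k - 1 successors of y
        succ[x] = [y] + copied                  # plus the proximity edge x -> y
        line.insert(line.index(y), x)           # place x immediately left of y
    return line, succ
```

Since y is not among its own successors, the k successors of x are distinct and the graph stays simple. Tallying indegrees on graphs produced this way should reproduce the Θ(i^{−2−1/(k−1)}) behavior established in Section 5.5.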
An intuitive explanation of this process is as follows. Consider the list of web pages
ordered lexicographically by their URLs (for this ordering, a URL a.b.com/d/e is to be
interpreted as com/b/a/d/e.) A website owner might decide to add a new web page to her
site; to do this, she could take one of the existing web pages from her site as a prototype,
modify it as needed, add an edge to the prototype for reference, and publish the new page
on her site. Thus the new web page and the prototype will be close in the URL ordering.
In our model, we can show the following:
(1) The fraction of nodes of indegree i is asymptotic to Θ(i^{−2−1/(k−1)}); this power law is often referred to as "rich get richer."
(2) The fraction of edges of length⁷ ℓ in the given embedding is asymptotic to Θ(ℓ^{−1−1/k}); analogously, we refer to this as "long get longer."
⁷ The length of an edge x → y is the absolute difference between the positions of nodes x and y in the given embedding.
Boldi and Vigna [10] study the distribution of gaps in the web graph, defined as follows. Sort the web pages lexicographically by URL; this gives an embedding of the nodes on the line. Now, if a web page x = z_0 has edges to z_1, . . . , z_j in this order, the gaps are given by |z_{i−1} − z_i|, 1 ≤ i ≤ j. They observe that the gap distribution in real web graph snapshots follows a power law with exponent ≈ 1.3. Our model can capture a similar distribution for the edge lengths, by an appropriate choice of k. In fact, both the average edge length and the average gap in our model are small; intuitively, though not immediately, this leads to the compressibility result of Section 5.7. It turns out that a power law distribution of either
the lengths or the gaps (with exponent > 1) is sufficient to show compressibility; for the sake of simplicity, we focus on the former in Section 5.6.
5.5 Rich get richer
In this section we characterize the indegree distribution of our graph model. We show that
the expected indegree distribution follows a power law. We then show the distribution is
tightly concentrated.
Let
$$f(i) = \frac{k \, 2^{1+\frac{2}{k-1}} \, \Gamma\!\left(\frac{3}{2}+\frac{1}{k-1}\right)}{(k-1)\sqrt{\pi}} \cdot \frac{\Gamma\!\left(i+1+\frac{1}{k-1}\right)}{\Gamma\!\left(i+3+\frac{2}{k-1}\right)}.$$
It follows that
$$\lim_{i\to\infty} f(i) \bigg/ \left(\frac{k \, 2^{1+\frac{2}{k-1}} \, \Gamma\!\left(\frac{3}{2}+\frac{1}{k-1}\right)}{(k-1)\sqrt{\pi}} \cdot i^{-2-\frac{1}{k-1}}\right) = 1,$$
i.e., f(i) = Θ(i^{−2−1/(k−1)}). Let X_i^t denote the number of nodes of indegree i at time t. We first show that E[X_i^t] can be bounded by f(i) · t ± c, for some constant c.
Theorem 5.5.1 There is a constant c = c(G_{t0}) such that
$$f(i) \cdot t - c \;\le\; E[X_i^t] \;\le\; f(i) \cdot t + c, \qquad (5.1)$$
for all t ≥ t0 and i ∈ [t].
Proof: For now, assume t > t0. Let x be the new node, and let y be the node we will copy edges from; recall that y is chosen u.a.r. First, we focus on the case i = 0. We have
$$E[X_0^t \mid X_0^{t-1}] = X_0^{t-1} - \Pr[y \text{ had indegree } 0] + 1,$$
as at each time step a new node (i.e., x) of indegree 0 is added, and the only node that could change its indegree to 1 is y. The probability of the latter event is exactly X_0^{t−1}/(t − 1). By the linearity of expectation, we get
$$E[X_0^t] = \left(1 - \frac{1}{t-1}\right) E[X_0^{t-1}] + 1. \qquad (5.2)$$
Next, consider i ≥ 1. According to our model, nodes z_1, . . . , z_{k−1} will be chosen without replacement from S(y), the successors of y. The successors of the new node x will then be S(x) = {y, z_1, . . . , z_{k−1}}. Since z_1, . . . , z_{k−1} are all distinct, the graph remains simple and |S(x)| = k.
For each j = 1, . . . , k − 1, the node z_j is chosen with probability proportional to its indegree; this follows since node z_j was the endpoint of an edge chosen u.a.r. The probability that a particular node of indegree i ≥ 1 gets chosen as a successor is $\frac{1}{t-1} + \frac{i(k-1)}{k(t-1)}$ (recall that all the k successors of x will be distinct). Thus, for i ≥ 1,
$$E[X_i^t] = \left(1 - \frac{1}{t-1} - \frac{i}{t-1}\cdot\frac{k-1}{k}\right) E[X_i^{t-1}] + \left(\frac{1}{t-1} + \frac{i-1}{t-1}\cdot\frac{k-1}{k}\right) E[X_{i-1}^{t-1}]. \qquad (5.3)$$
For the base cases, note that X_t^t = 0 for each t ≥ t0. Also, the variables X_i^{t0} are completely determined by G_{t0}. For each fixed k, we have f(t) = Θ(t^{−2−1/(k−1)}). Thus, there is a constant c_0 such that for any c ≥ c_0, and for all t ≥ t0, E[X_t^t] follows (5.1). The base cases E[X_i^{t0}], i = 1, 2, . . ., can also be covered with a sufficiently large c (which has to be greater than some function of the initial graph G_{t0}).
For the inductive case, we have f(0) = 1/2 (by applying Γ(x)Γ(x + 1/2) = Γ(2x) 2^{1−2x} √π and Γ(2x + 1) = 2x Γ(2x), with x = 1 + 1/(k−1)). Using this, (5.2), and calculations, we can show that if X_0^{t−1} satisfies (5.1), then X_0^t also satisfies (5.1). For i ≥ 1, we have f(i − 1) = f(i) · (ik − i + 2k)/(ik − i + 1). An induction on (5.3) completes the proof.
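For completeness, the ratio used in this induction follows directly from the definition of f and the identity Γ(z + 1) = z Γ(z); writing a = 1/(k − 1),
$$\frac{f(i-1)}{f(i)} = \frac{\Gamma(i+a)}{\Gamma(i+2+2a)} \cdot \frac{\Gamma(i+3+2a)}{\Gamma(i+1+a)} = \frac{i+2+2a}{i+a} = \frac{i(k-1)+2k}{i(k-1)+1} = \frac{ik-i+2k}{ik-i+1}.$$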
Thus, in expectation, the indegrees follow a power law with exponent −2 − 1/(k − 1). We now show an O(1)-Lipschitz property for the r.v.'s X_i^t for k = O(1). The concentration then follows immediately from Theorem 5.2.1.
Lemma 5.5.1 Each r.v. X_i^t satisfies the (2k)-Lipschitz property.
Proof: Our model can be interpreted as the following stochastic process: at step t, two independent dice, with t − 1 and k faces respectively, are thrown. Let Q_t and R_t be the respective outcomes of these two trials. The new node x will position itself to the immediate left of the node y that was added at time Q_t. Suppose that the (ordered) list of successors of y is (z_1, . . . , z_k). The ordered list of successors of x will be composed of y followed by the nodes z_1, . . . , z_k with the exception of node z_{R_t}. Thus, the number of nodes X_i^τ of indegree i at time τ can be interpreted as a function of the trials (Q_1, R_1), . . . , (Q_τ, R_τ).
We want to show that changing the outcome of any single trial (Q_{t′}, R_{t′}) changes the r.v. X_i^τ (for fixed i) by an amount not greater than 2k. Suppose we change (q_{t′}, r_{t′}) to (q′_{t′}, r′_{t′}), going from graph G to G′. Let x be the node added at time t′ with the choice (q_{t′}, r_{t′}), and x′ be the node added with the choice (q′_{t′}, r′_{t′}).
Let S, S′ be the successors of x in G and of x′ in G′, respectively. The proof is complete by showing inductively that at any time step t, and for any nodes y, y′ added at the same time respectively in G, G′, the (ordered) lists of successors of y and y′ are close, i.e., in each of their positions, they either have the same successor, or they have two different elements of S ∪ S′.
If t ≤ t′, then the proof is immediate. For t > t′, it follows that the only edges we need to consider are the copied edges. By induction, we know that at time t − 1, the lists of successors of the node we are copying from, in G and G′, were close. Since the two lists are sorted, either the i-th copied edges in G and G′ will be the same, or they will both point to nodes in S ∪ S′. Thus the lists of the time-t node are close. Since |S ∪ S′| ≤ 2k, only the indegrees of the nodes in S ∪ S′ can differ in G and G′, hence X_i^τ changes by at most 2k, and the proof is complete.
5.6 Long get longer
In this section we analyze the edge length distribution in our graph model. We show that it follows a power law with exponent more than 1. Later, we will use this to establish the compressibility of graphs generated by our model. Let
$$g(\ell) = \frac{\Gamma\!\left(\ell + 1 - \frac{1}{k}\right)}{\Gamma\!\left(2 - \frac{1}{k}\right)\, \Gamma(\ell + 2)}.$$
It holds that
$$\lim_{\ell\to\infty} g(\ell) \bigg/ \left(\frac{\ell^{-1-\frac{1}{k}}}{\Gamma\!\left(2-\frac{1}{k}\right)}\right) = 1,$$
i.e., g(ℓ) = Θ(ℓ^{−1−1/k}). Recall that the length of an edge from a node in position i to a node in position j is equal to |i − j|; we define its circular directed length, denoted cd-length, to be j − i if j > i, and t − (i − j) otherwise. Let Y_ℓ^t be the number of edges of length ℓ at time t. We aim to show that Y_ℓ^t ≈ g(ℓ) · t. It turns out to be useful to consider a related r.v. Z_ℓ^t, which denotes the number of edges of cd-length ℓ at time t. We will first show that, w.h.p., Z_ℓ^t ≈ g(ℓ) · t. We will then argue that Y_ℓ^t is very close to Z_ℓ^t.
The following shows that E[Z_ℓ^t] is bounded by g(ℓ) · t ± O(1).
Theorem 5.6.1 There exists some constant c = c(G_{t0}) such that
$$g(\ell) \cdot t - c \;\le\; E[Z_\ell^t] \;\le\; g(\ell) \cdot t + c,$$
for all t ≥ t0 and ℓ ∈ [t].
Proof: As in the proof of Theorem 5.5.1, we start by obtaining a recurrence on the r.v.'s Z_ℓ^t. Let x be the node added at time t, and let y, y′ be the nodes to the immediate right and left of x, respectively (where y′ equals the last node in the ordering if x is placed before the first node y).
Consider Z_1^t. For t > t0,
$$E[Z_1^t \mid Z_1^{t-1}] = Z_1^{t-1} - \Pr[x \text{ enlarges an edge of cd-length } 1] + 1,$$
as an edge x → y of length 1 is necessarily added to the graph, and adding x can enlarge at most one edge of cd-length 1 (that is, the edge y′ → y, if it exists). The probability of the latter event is equal to Z_1^{t−1}/(t − 1). By the linearity of expectation,
$$E[Z_1^t] = \left(1 - \frac{1}{t-1}\right) E[Z_1^{t-1}] + 1.$$
Now consider Z_ℓ^t, for ℓ ≥ 2 and t > t0. We have
$$\begin{aligned} E[Z_\ell^t \mid Z_\ell^{t-1}, Z_{\ell-1}^{t-1}] = Z_\ell^{t-1} &- E[\#\text{ edges of cd-length } \ell \text{ that } x \text{ enlarged} \mid Z_\ell^{t-1}, Z_{\ell-1}^{t-1}] \\ &+ E[\#\text{ edges of cd-length } (\ell-1) \text{ that } x \text{ enlarged} \mid Z_\ell^{t-1}, Z_{\ell-1}^{t-1}] \\ &+ E[\#\text{ edges of cd-length } (\ell-1) \text{ that } x \text{ copied from } y \mid Z_\ell^{t-1}, Z_{\ell-1}^{t-1}]. \end{aligned}$$
Recall that x is placed to the left of a node y chosen u.a.r. Thus, given a fixed edge of cd-length ℓ, the probability that this edge is enlarged by x is ℓ/(t − 1). Thus,
$$E[\#\text{ edges of cd-length } \ell \text{ that } x \text{ enlarged} \mid Z_\ell^{t-1}, Z_{\ell-1}^{t-1}] = \frac{\ell}{t-1}\, Z_\ell^{t-1},$$
$$E[\#\text{ edges of cd-length } (\ell-1) \text{ that } x \text{ enlarged} \mid Z_\ell^{t-1}, Z_{\ell-1}^{t-1}] = \frac{\ell-1}{t-1}\, Z_{\ell-1}^{t-1}, \text{ and}$$
$$E[\#\text{ edges of cd-length } (\ell-1) \text{ that } x \text{ copied from } y \mid Z_\ell^{t-1}, Z_{\ell-1}^{t-1}] = \sum_{j=1}^{k-1} \Pr[\text{the } j\text{-th copied edge had cd-length } (\ell-1) \mid Z_\ell^{t-1}, Z_{\ell-1}^{t-1}].$$
Note that, for each j = 1, . . . , k − 1, the j-th copied edge is chosen uniformly at random over all the edges (even if the k − 1 copied edges are not independent). Thus,
$$\sum_{j=1}^{k-1} \Pr[\text{the } j\text{-th copied edge had cd-length } (\ell-1) \mid Z_\ell^{t-1}, Z_{\ell-1}^{t-1}] = \frac{(k-1)\, Z_{\ell-1}^{t-1}}{k(t-1)}.$$
By the linearity of expectation, we get, for ℓ ≥ 2,
$$E[Z_\ell^t] = \left(1 - \frac{\ell}{t-1}\right) E[Z_\ell^{t-1}] + \left(\frac{\ell-1}{t-1} + \frac{1}{t-1} \cdot \frac{k-1}{k}\right) E[Z_{\ell-1}^{t-1}].$$
The base cases can be handled as in Theorem 5.5.1. The inductive step for ℓ = 1 can be directly verified. For ℓ ≥ 2, it suffices to note that g(ℓ − 1) = k(ℓ + 1)/(ℓk − 1) · g(ℓ).
Thus, the expectation of the edge lengths follows a power law with exponent −1 − 1/k.
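For completeness, the identity g(ℓ − 1) = k(ℓ + 1)/(ℓk − 1) · g(ℓ) used above can be verified in one line from the definition of g:
$$\frac{g(\ell-1)}{g(\ell)} = \frac{\Gamma\!\left(\ell-\frac{1}{k}\right)}{\Gamma(\ell+1)} \cdot \frac{\Gamma(\ell+2)}{\Gamma\!\left(\ell+1-\frac{1}{k}\right)} = \frac{\ell+1}{\ell-\frac{1}{k}} = \frac{k(\ell+1)}{\ell k - 1}.$$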
To establish the concentration result, we need to analyze quite closely the combinatorial structure of the graphs generated by our model. Recall that the nodes in our graphs are placed contiguously on a discrete line (or list). At a generic time step, we use x_i to refer to the i-th node in the ordering from left to right. Given an ordering π = (x_1, x_2, . . . , x_t) of the nodes and an integer 0 ≤ k < t, a k-rotation ρ_k maps the generic node x_i, 1 ≤ i ≤ t, to position 1 + ((i + k) mod t).
We say that two nodes x, x′ are consecutive if there exists a k such that |ρ_k(x) − ρ_k(x′)| = 1, i.e., they are consecutive if in the ordering either they are adjacent or one is the first and the other the last. Further, we say that an edge x″ → x‴ passes over a node x if there exists a k such that ρ_k(x″) < ρ_k(x) < ρ_k(x‴). Finally, two edges x → x′ and x″ → x‴ are said to cross if there exists a k such that after a k-rotation exactly one of x and x′ is within the positions ρ_k(x″) and ρ_k(x‴). We prove the following characterization, which will be used later in the analysis.
Lemma 5.6.1 At any time, given any two consecutive nodes x, x′, and any positive integer ℓ, the number of edges of cd-length ℓ that pass over x or x′ (or both) is at most C = (k + 2)t0 + 1.
Proof: Let us define G_t^− as the graph G_t minus the edges incident to the nodes that were originally in G_{t0}. Note that, for each cd-length ℓ, the number of edges of cd-length ℓ that we remove is upper bounded by 2t0, as each node can be incident to at most two edges of cd-length ℓ, one going in and one going out of the node. Unless otherwise noted, we will consider G_t^− for the rest of the proof.
Fix the time t, and take any rotation ρ; let x_1, . . . , x_t be the nodes in the list in the left-right order given by the rotation (i.e., node x_i is in position i according to ρ). For a set of edges of the same cd-length to pass over at least one of two consecutive nodes x, x′, it is necessary for every pair of them to cross. We will bound, for a generic edge e, the number of edges that cross e and have the same cd-length as e. Let t(x_a) be the time when x_a was added to the graph. First, by definition we have that if x_a → x_b, then t(x_a) > t(x_b).
Second, we claim that if there exists a rotation ρ′ such that x_a, x_b, x_c are three nodes with ρ′(x_a) < ρ′(x_b) < ρ′(x_c) and t(x_c) > t(x_b), then the edge x_a → x_c cannot exist. To see this, note that for x_a → x_c to exist it must be that t(x_a) > t(x_c). We show inductively that all the nodes that will point to x_c will be both to the left of x_c and to the right of x_b, in the ordering implied by ρ′. Note that x_c was not in G_{t0}, since its insertion time is larger than that of x_b. Thus, each node placed to the immediate left of x_c will point to it, and will satisfy the induction hypothesis. Furthermore, each node that copies an edge to x_c must be placed to the immediate left of a node pointing to x_c. Thus, the second claim is proved.
Third, we claim that if x_a, x_b, x_c, x_d are four nodes such that the edges x_a → x_c and x_b → x_d exist, and cross each other, then there exists an edge x_c → x_d. To see this, first note that none of these four nodes could have been part of G_{t0}, for otherwise at least one of the two edges could not have been part of G_t^−. Fix a rotation ρ″ such that ρ″(x_a) < ρ″(x_b) < ρ″(x_c); by the second claim, it must be that t(x_b) > t(x_c). Thus, the edge x_b → x_d has necessarily been copied from some node, say x_{b1}. Note that ρ″(x_{b1}) ≤ ρ″(x_c). Indeed, by assumption ρ″(x_c) > ρ″(x_b), and it is impossible that ρ″(x_c) < ρ″(x_{b1}), for otherwise x_b could not have copied from x_{b1}, as t(x_b) > t(x_c). Now, we know that the edge x_{b1} → x_d exists (as before, x_{b1} is not part of G_{t0}). If x_{b1} = x_c, then we are done. Otherwise, there must exist an x_{b2} pointing to x_d from which x_{b1} has copied the edge. Note that ρ″(x_{b1}) < ρ″(x_{b2}) ≤ ρ″(x_c). By iterating this reasoning, the claim follows.
Take any set S of edges having the same cd-length, and such that any pair of them cross. Given an arbitrary rotation ρ‴, let x be the node with the smallest ρ‴(x) such that, for some x′, the edge x → x′ is in S (the nodes x and x′ are unique). For any other edge y → y′ in S, by the third claim, there must exist the edge x′ → y′. As x′ has outdegree k, it follows that |S| ≤ k + 1.
Finally, since the seed graph G_{t0} had k · t0 edges and we removed at most 2t0 edges of cd-length ℓ (for an arbitrary ℓ ≥ 1) in the cut [G_{t0}, G_t \ G_{t0}], we have refrained from counting at most k · t0 + 2t0 edges of cd-length ℓ passing over one of the nodes x, x′. The proof follows.
Now we prove the O(1)-Lipschitz property of the r.v.'s Z_ℓ^t, if t0, k = O(1). The concentration of the Z_ℓ^t will follow from Theorem 5.2.1.
Lemma 5.6.2 Each r.v. Z_ℓ^t satisfies the ((k + 2)t0 + k + 1)-Lipschitz property.
Proof: We use the stochastic interpretation as in the proof of Lemma 5.5.1. For each τ, let Z_ℓ^τ be the r.v. representing the number of edges of cd-length ℓ at time τ. We consider Z_ℓ^τ as a function of the trials (Q_1, R_1), . . . , (Q_τ, R_τ). We show that changing the outcome of any single trial (Q_{t′}, R_{t′}) changes the r.v. Z_ℓ^τ, for fixed ℓ, by an amount not greater than C + k = (k + 2)t0 + k + 1.
Suppose we change (q_{t′}, r_{t′}) to (q′_{t′}, r′_{t′}), going from graph G to G′. Let x be the node added at time t′ with the choice (q_{t′}, r_{t′}), and x′ be its equivalent with the choice (q′_{t′}, r′_{t′}). We show that choosing two different positions for x and x′ can change the number of edges of cd-length ℓ by at most C + k at any time step. Note that before time step t′, the cd-lengths are all equal.
By Lemma 5.6.1, at any time t > t′ and for all ℓ, the number of edges of cd-length ℓ that pass over x (resp., x′) is upper bounded by C. For an edge e, let S_e be the set of edges that have been copied from e, directly or indirectly, including e itself, i.e., e ∈ S_e, and if an edge e′ is copied from some edge in S_e, then e′ ∈ S_e. Note that no two edges in S_e have the same cd-length, since they all start from different nodes but end up at the same node.
For any node z, if e_1, . . . , e_k are the edges out of z, we define S_z = S_{e_1} ∪ · · · ∪ S_{e_k}. The last observation implies that, for any fixed ℓ, no more than k edges of cd-length ℓ are in S_x (or S_{x′}) at any single time step. Now, consider the following edge bijection from G to G′: the i-th edge of the j-th inserted node in G is mapped to the i-th edge of the j-th inserted node in G′. It follows that if an edge e in G (resp., G′) does not pass over x (resp., x′) and is not in S_x (resp., S_{x′}), then e gets mapped to an edge of the same cd-length in G′ (resp., G). Thus, the difference in the number of edges of cd-length ℓ in G and G′ is at most C + k.
We now show that the number D_t of edges whose length and cd-length are different (at time t) is very small. Since the maximum absolute difference between Y_ℓ^t and Z_ℓ^t is bounded by D_t, this will show that these r.v.'s are close to each other. First note that if an edge x_i → x_j has different length and cd-length, then j < i; call such an edge left-directed, and let R_t be the number of left-directed edges. Since D_t ≤ R_t, it suffices to bound the latter.
Lemma 5.6.3 With probability 1 − O(1/t), R_t ≤ O(t^{1−1/k+ε}), for each constant ε > 0.
Proof: Observe that each edge x_i → x_j counted by R_t is such that j < i. Thus, R_{t0} is equal to the number of left-directed edges in G_{t0} with its given embedding.
Further, R_t's increase over R_{t−1} equals the number of left-directed edges copied at step t (the proximity edge is never left-directed). Thus, $E[R_t \mid R_{t-1}] = \left(1 + \frac{k-1}{k(t-1)}\right) R_{t-1}$ and $E[R_t] = \left(1 + \frac{k-1}{k(t-1)}\right) E[R_{t-1}]$, for each t > t0. Therefore,
$$E[R_t] = R_{t_0} \cdot \prod_{i=t_0}^{t-1} \left(1 + \frac{k-1}{k\, i}\right) = R_{t_0} \cdot \prod_{i=t_0}^{t-1} \frac{i + \frac{k-1}{k}}{i} = R_{t_0} \cdot \frac{\Gamma\!\left(t + \frac{k-1}{k}\right) \Gamma(t_0)}{\Gamma\!\left(t_0 + \frac{k-1}{k}\right) \Gamma(t)}.$$
Thus, E[R_t] = Θ(t^{1−1/k}). We note that an O(1)-Lipschitz condition holds (at most k − 1 new left-directed edges can be added at each step). Thus Theorem 5.2.1 can be applied with an error term of $O(\sqrt{t \log t}) \le O(t^{\frac{1}{2}+\varepsilon}) \le O(t^{1-\frac{1}{k}+\varepsilon})$. The result follows.
Applying Theorem 5.2.1, Theorem 5.6.1, Lemma 5.6.2, and Lemma 5.6.3, we obtain the following.
Corollary 5.6.1 With probability ≥ 1 − O(1/t), it holds that
i. $E[Z_\ell^t] - O(\sqrt{t \log t}) \le Z_\ell^t \le E[Z_\ell^t] + O(\sqrt{t \log t})$, and
ii. $E[Z_\ell^t] - O(t^{1-1/k+\varepsilon}) \le Y_\ell^t \le E[Z_\ell^t] + O(t^{1-1/k+\varepsilon})$.
Note that the concentration error term, O(√(t log t)), is upper bounded by the error term coming from R_t, for each k ≥ 2. Also, the corollary is vacuous if ℓ > t^{1/(k+2)}.
5.7 Compressibility of our model
We now analyze the number of bits needed to compress the graphs generated by our model. Recall that the web graph has a natural embedding on the line via the URL ordering that experimentally gives very good compression [10, 12]. Our model generates a web-like random graph and an embedding "à-la-URL" on the line. We work with the following BV-like compression scheme: a node at position p on the line stores its list of successors at positions p_1, . . . , p_k as the list (p_1 − p, . . . , p_k − p) of compressed integers. An integer i ≠ 0 will be compressed using O(log(|i| + 1)) bits, using the Elias γ-code, for instance [107]. We show that our graphs can be compressed using O(1) bits per edge with the above scheme.
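As an illustration, here is a small Python sketch of this BV-like scheme; the mapping from signed offsets to positive integers is one standard choice, used here only for concreteness.

```python
def elias_gamma(x: int) -> str:
    """Elias gamma code of a positive integer x: floor(lg x) zeros followed
    by the binary expansion of x, for a total of 1 + 2*floor(lg x) bits."""
    assert x >= 1
    b = bin(x)[2:]
    return "0" * (len(b) - 1) + b

def encode_offset(i: int) -> str:
    """Map a nonzero signed offset to a positive integer, then gamma-code it;
    this takes O(log(|i| + 1)) bits, as required by the scheme above."""
    assert i != 0
    return elias_gamma(2 * abs(i) - (1 if i > 0 else 0))

def encode_node(p: int, successor_positions) -> str:
    """A node at position p stores the offsets p_1 - p, ..., p_k - p."""
    return "".join(encode_offset(q - p) for q in successor_positions)
```

By Theorem 5.7.1 below, the total length of these codewords, summed over all nodes, is O(n) bits w.h.p.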
Theorem 5.7.1 The above BV-like scheme compresses the graphs generated by our model using O(n) bits, with probability at least 1 − O(1/n).
Proof: Let ε > 0 be a small constant. At time n, consider the number S of edges of length at most L = ⌈n^ε⌉. Note that by Corollary 5.6.1, for each 1 ≤ ℓ ≤ L, it holds that |Y_ℓ^n − E[Z_ℓ^n]| ≤ O(n^{1−1/k+ε}), with probability 1 − O(n^{−1}). For the rest of the proof, we implicitly condition on this event.
Lower bounding E[Z_ℓ^n] as in Theorem 5.6.1, we obtain the following lower bound on the number of edges of length ≤ L, using standard algebraic manipulation and Lemma 5.2.1:⁸
$$\begin{aligned} S &\ge \sum_{\ell=1}^{L} \left( \frac{\Gamma\!\left(\ell+1-\frac{1}{k}\right)}{\Gamma\!\left(2-\frac{1}{k}\right) \Gamma(\ell+2)} \cdot n - c - O\!\left(n^{1-1/k+\varepsilon}\right) \right) \\ &\ge nk \left(1 - \frac{\Gamma\!\left(L+2-\frac{1}{k}\right)}{\Gamma(L+2)\, \Gamma\!\left(2-\frac{1}{k}\right)}\right) - O\!\left(L \cdot n^{1-1/k+\varepsilon}\right) \\ &\ge nk - O\!\left(n \cdot k \cdot L^{-1/k}\right) - O\!\left(L \cdot n^{1-1/k+\varepsilon}\right) \;\ge\; nk - O\!\left(n^{1-\varepsilon_1}\right), \end{aligned}$$
where ε_1 is a small constant.
At time n, the total number of edges of the graph is nk. Thus the number of edges of length more than L is at most O(n^{1−ε_1}) (notice how, for this argument to work, it is crucial to have a very strong bound on the behavior of the Y_ℓ^n random variables; this is why we used the Gamma function in their expressions). The maximum edge length is O(n), and so each edge can be compressed in O(log n) bits. The overall contribution, in terms of bits, of the edges longer than L will then be o(n).
Now, we calculate the bit contribution B of the edges of length at most L:
$$\begin{aligned} B &\le \sum_{\ell=1}^{L} O(\log(\ell+1)) \left( \frac{\Gamma\!\left(\ell+1-\frac{1}{k}\right)}{\Gamma\!\left(2-\frac{1}{k}\right) \Gamma(\ell+2)}\, n + c + O\!\left(n^{1-1/k+\varepsilon}\right) \right) \\ &\le n \cdot O\!\left(\sum_{\ell=1}^{L} \log(\ell+1) \cdot \ell^{-1-1/k}\right) + O\!\left(L \cdot n^{1-1/k+\varepsilon} \cdot \log L\right) \le O(n), \end{aligned}$$
where the penultimate inequality follows since the ratio of Gamma functions can be upper bounded by O(ℓ^{−1−1/k}), and the last inequality follows from O(ℓ^{−1−1/k} · log ℓ) ≤ O(ℓ^{−1−ε′}), for some constant ε′ > 0, and from the convergence of the Riemann series. The proof is complete.
⁸ Which we use to conclude that $\sum_{\ell=1}^{L} \frac{\Gamma\left(\ell+1-\frac{1}{k}\right)}{\Gamma\left(2-\frac{1}{k}\right)\Gamma(\ell+2)} = k \left(1 - \frac{\Gamma\left(L+2-\frac{1}{k}\right)}{\Gamma(L+2)\, \Gamma\left(2-\frac{1}{k}\right)}\right)$.
Thus, given an ordering of the nodes, we can compress the graph to use O(1) bits per edge using a linear-time algorithm. A natural question is whether it is still possible to compress the graph without knowing the ordering. We show that this is indeed possible.
Theorem 5.7.2 The graphs generated by our model can be compressed using O(n) bits in linear time, even if the ordering of the nodes is not available.
Proof: Given a node v in G, just by looking at its two-neighborhood, we can either (a) find an out-neighbor w of v having exactly k − 1 out-neighbors in common with v, or (b) conclude that v was part of the "seed" graph G_{t0} (which has constant order). This step takes time O(k²) = O(1).
Indeed, if v was not part of G_{t0}, then, during its insertion, v added a proximity edge to its "real prototype" w, and copied k − 1 of w's outlinks. If more than one out-neighbor of v has k − 1 out-neighbors in common with v, we choose one arbitrarily and call it the "possible prototype" of v.
For compressing, we create an unlabeled rooted forest out of the nodes of G. Each node v looks for a possible prototype w. If such a w is found, then v chooses w as its parent. Otherwise, v is a root in the forest.
To describe G, it then suffices to (a) describe the unlabeled rooted forest, (b) describe the subgraph induced by the roots of the trees in the forest, and (c) for each non-root node v in the forest, use ⌈lg k⌉ bits to describe which of its parent's out-neighbors was not copied by v in G. The forest can be described with O(n) bits, for instance, by writing down the down/up steps made when visiting each tree in the forest, disregarding edge orientations (as each edge is directed from the child to the parent). The graph induced by the roots of the trees (i.e., a subgraph of G_{t0}) can be stored in a non-compressed way using O(t0²) = O(1) bits. The third part of the encoding requires at most O(n log k) = O(n) bits. Note that it is possible to compute each of the three encodings in linear time.
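A short Python sketch of the first step of this encoder makes the prototype-recovery idea explicit; the function names are ours, and succ is assumed to map each node to the list of its out-neighbors.

```python
def find_prototype(v, succ, k):
    """Return an out-neighbor w of v sharing exactly k - 1 out-neighbors
    with v (a 'possible prototype'), or None when v behaves like a seed node."""
    sv = set(succ[v])
    for w in succ[v]:
        if len(sv & set(succ[w])) == k - 1:
            return w
    return None

def build_forest(succ, k):
    """Parent pointers of the rooted forest used by the encoding: nodes
    without a possible prototype become roots."""
    return {v: find_prototype(v, succ, k) for v in succ}
```

The forest returned here is then serialized as down/up steps, the subgraph induced on the roots is written verbatim, and ⌈lg k⌉ bits per non-root node record which outlink of the parent was not copied, exactly as in the proof.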
5.8 Other properties of our model
In this section we prove some additional properties of our model: that it has a large number
of bipartite cliques, high clustering coefficient, and small undirected diameter.
5.8.1 Bipartite cliques
Recall that a bipartite clique K(a, b) is a set A of a nodes and a set B of b nodes such that
each node in A has an edge to every node in B. We can show that the graphs generated by
our model contain a large number of bipartite cliques. The proof is similar to the one of [65]
for the linear growth model.
Theorem 5.8.1 There exists a β > 0 such that the number of bipartite cliques K(Ω(log n), k) in our model is Ω(n^β), w.h.p.
Proof (Sketch): Take any fixed node x of the seed graph G_{t0} and a subset S of k − 1 of its successors. Divide the time steps after t0 into disjoint epochs of exponentially increasing size, i.e., of sizes cτ, c²τ, c³τ, . . ., for a large enough τ. Let j be the number of epochs; then j = Ω(log n). Note that for i ≤ j, the probability that at least one node added in epoch i attaches itself to x and copies exactly the edges in S is at least a constant; also, for each i ≠ i′, these events are independent. Thus, w.h.p., at least Ω(log n) nodes will be good, i.e., will have S ∪ {x} as successors.
Now, any subset of the good nodes forms a bipartite clique with S ∪ {x}. The number of subsets of size Ω(log n) is easily shown to grow as Ω(n^β), for some β > 0.
5.8.2 Clustering coefficient
Watts and Strogatz [103] introduced the concept of clustering coefficient. The clustering coefficient C(x) of a node x is the ratio of the number of edges between neighbors of x and the maximum possible number⁹ of such edges. The clustering coefficient C(G) of a (simple) graph G is the average of the clustering coefficients of its nodes.
Snapshots of the real web graph have been observed to possess a fairly high clustering coefficient. Thus, having a high clustering coefficient (that is, a constant clustering coefficient) is a desirable property of web graph models.
⁹ That is, (1/2) deg(x)(deg(x) − 1) in the undirected case and deg(x)(deg(x) − 1) in the directed case.
Theorem 5.8.2 Take a (directed) graph G generated by our model. The clustering coefficient of G is Θ(1) w.h.p.
Proof: By Theorem 5.5.1 and Lemma 5.5.1, there exist q = Θ(n) nodes of indegree 0, w.h.p. Take any node x of indegree 0, and let y be the node that x copied from. Then x and y share k − 1 out-neighbors (the "copied" ones). The total degree of x is k; thus the clustering coefficient of x is at least (k − 1)/(k(k − 1)) = 1/k ∈ Ω(1). The clustering coefficient of the graph is the average of the clustering coefficients of its nodes; thus, in our case, it is at least (1/n) · q · (1/k) ≥ Ω(1). In general, the maximum value of the clustering coefficient is 1. The claim follows.
The previous proof also shows that, if we remove orientations from the edges of our model's graphs, the clustering coefficient of the undirected graphs we obtain is Θ(1).
5.8.3 Undirected diameter
We now argue that, w.h.p., the undirected diameter of our random graphs is O(log n) (provided that the seed graph G_{t0} was weakly-connected). By undirected diameter, we mean the diameter of the undirected graph obtained by removing edge orientations from our graphs. Note that our graphs are almost DAGs, i.e., they are DAGs except possibly for the nodes in the seed graph G_{t0}, and therefore the directed diameter is not a meaningful notion to consider.
Consider so-called random recursive trees: the process starts with a single node; at each step, a node is chosen uniformly at random and a new leaf is added as a child of that node; the process ends at the generic time n. A result of Szymanski [100] shows that random recursive trees on n nodes have height O(log n) w.h.p.
Now consider the "proximity" edges added in step (2) of our model, i.e., those added from the new node to a node chosen uniformly at random. These edges induce a random recursive forest with t0 different roots corresponding to the nodes of the seed graph G_{t0}; by the above result, each tree of this forest has height O(log n) w.h.p. Thus, assuming that G_{t0} is weakly-connected implies that the (undirected) diameter of our model's graphs is O(log n) w.h.p.
Chapter 6
Compressibility of social networks
Motivated by structural properties of the Web graph that support efficient data structures for in-memory adjacency queries, we study the extent to which a large network can be compressed. Boldi and Vigna (WWW 2004) showed that Web graphs can be compressed down to three bits of storage per edge; we study the compressibility of social networks, where again adjacency queries are a fundamental primitive. To this end, we propose simple combinatorial formulations that encapsulate efficient compressibility of graphs. We show that some of the problems are NP-hard yet admit effective heuristics, some of which can exploit properties of social networks such as link reciprocity. Our extensive experiments show that social networks and the web graph exhibit vastly different compressibility characteristics.
6.1 Introduction
We study the extent to which social networks can be compressed. There are two distinct
motivations for such studies. First, Web properties require high-speed indexes for serving
adjacencies in the social network: thus, a typical query seeks the neighbors of a node (member) of a social network. Maintaining these indexes in memory demands that the underlying
graph be stored in a compressed form that facilitates efficient adjacency queries. Secondly,
there is a wealth of evidence (e.g., [64]) that social networks are not random graphs in the
usual sense: they exhibit certain distinctive local characteristics (such as degree sequences).
Studying the compressibility of a social network is akin to studying the degree of “randomness” in the social network. The Web graph (Web pages are nodes, hyperlinks are directed
edges) is a special variant of a social network, in that we have a network of pages rather than
of people. It is known that the Web graph is highly compressible [10, 22]. Particularly
impressive results have been obtained by Boldi and Vigna [10], who exploit lexicographic locality in the Web graph: when pages are ordered lexicographically by URL, proximal pages
have similar neighborhoods. More precisely, two properties of the ordering by URL are
experimentally observed to hold:
The work described in this chapter is joint work with F. Chierichetti, R. Kumar, M. Mitzenmacher, A. Panconesi, and P. Raghavan; its extended abstract appeared in the Proceedings of the 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2009) [24].
• Similarity: pages that are proximal in the lexicographic ordering tend to have similar
sets of neighbors.
• Locality: many links are intra-domain, and therefore likely to point to pages nearby
in the lexicographic ordering.
These two empirical observations are exploited in the BV-algorithm to compress the Web
graph down to an amortized storage of a few bits per link, leading to efficient in-memory
data structures for Web page adjacency queries (a basic primitive in link analysis). Do these
properties of locality and similarity extend to social networks in general? Whereas the Web
graph has a natural lexicographic order (by URL) under which this locality holds, there is no
such obvious ordering for social networks. Can we find such an ordering for social networks,
leading to compression through lexicographic locality?
Our main contributions presented in this chapter are the following. We propose a new
compression method that exploits link reciprocity in social networks. Motivated by this
and BV, we formulate a genre of graph node ordering problems that distill the essence
of locality in BV-style algorithms. We develop a simple and practical heuristic based on
shingles for obtaining an effective node ordering; this ordering can be used in BV-style
compression algorithms. We then perform an extensive set of experiments on four large
real-world graphs, including two social networks. Our main findings are that social networks appear far less compressible than Web graphs, yet closer to host graphs, and that exploiting link reciprocity in social networks can vastly help their compression.
The rest of the chapter is organized as follows. Section 6.2 discusses the related work.
Section 6.3 outlines the basic compression scheme of Boldi and Vigna, and proposes a new
scheme that exploits link reciprocity. Section 6.4 formalizes the optimal node ordering problem and supplies a simple and practical heuristic for this problem. Section 6.9 contains a
detailed account of our experiments on four large real-world graphs.
6.2 Related work
Prior related work falls into three major categories: (1) compressing Web graphs; (2) compressed indexes; and (3) graph ordering problems.
Randall et al. [91] suggested lexicographic ordering as a way to obtain good Web graph
compression; some hardness results in this context were obtained by Adler and Mitzenmacher [1]. Raghavan and Garcia-Molina [90] considered a hierarchical view of the Web
graph to achieve compression; see also Suel and Yuan [99] for a structural approach to compressing Web graphs. A major step was taken by Boldi and Vigna [10], who developed a
generic Web graph compression framework that takes into account the locality and similarity of Web pages; our formulation is based on this framework. Boldi and Vigna [11] also
developed ζ-codes, to exploit power law distributed integer gaps. Recently, Buehrer and
Chellapilla [22] used the frequent pattern mining approach to compress Web graphs. Using
this different approach, they were able to achieve a compression of under two bits per link.
The problem of assigning or reassigning document identifiers in order to compress text
indexes has a long history. Blandford and Blelloch [8] considered the problem of compressing
text indexes by permuting the document identifiers to create locality in an inverted index, i.e.,
clustering property of posting lists. Silvestri, Perego, and Orlando [96] proposed a clustering
approach for reassigning document identifiers. Shieh et al. [94] proposed a document id
reassignment method based on a heuristic for the traveling salesman problem. Recently,
Silvestri [95] showed that assigning document identifiers to Web documents based on URL
lexicographic ordering improves compression.
There are several classical node ordering problems on graphs. The minimum bandwidth
problem, where the goal is to order the nodes to minimize the maximum stretch of edges, and
the minimum linear arrangement problem, where the goal is to order the nodes to minimize
the sum of stretch of edges, have a long history. We refer to [45] and the online compendium
at www.nada.kth.se/~viggo/wwwcompendium/node52.html.
6.3 Compression Schemes
In this section we outline the compression technique used in the rest of the chapter. The
framework is based on the algorithm of Boldi and Vigna for compressing Web graphs [10];
their algorithm achieved a compression down to about three bits per link on a snapshot of
the Web graph. We henceforth refer to this as the BV compression scheme, which we first
describe. Next, we describe what we call the backlinks compression (BL) scheme, which
targets directed graphs that are highly reciprocal.
Notation. Let G = (V, E) be a directed graph and let |V| = n. The nodes in V are bijectively identified with the set [n] = {1, . . . , n} of integers. For a node u ∈ V, let out(u) ⊆ V denote the set of outlinks of u, i.e., out(u) = {v | (u, v) ∈ E}. Likewise, let in(u) denote the set of inlinks of u. Let outdeg(u) = |out(u)| and indeg(u) = |in(u)|. If both (u, v) ∈ E and (v, u) ∈ E and u < v, then we call the edge (v, u) reciprocal. For a node u ∈ V, let rec(u) be {v | (v, u) is reciprocal}. Let lg denote log₂.
We will encode all integers using one of three different encoding schemes, namely, Elias's γ-code, Elias's δ-code, and the Boldi–Vigna ζ-code with parameter 4 (which we found to be the best in our experiments) [11]. These integer encoding schemes encode an integer x ∈ Z⁺ using close to the information-theoretic minimum of 1 + ⌊lg x⌋ bits. For example, the number of bits used by the γ-code to represent x is 1 + 2⌊lg x⌋. We refer to [107] for more background on these codes.
6.3.1 BV compression scheme
BV incorporates three main ideas. First, if the graph has many nodes whose neighborhoods
are similar, then the neighborhood of a node can be expressed in terms of other nodes
with similar neighborhoods. Second, if the destinations of edges exhibit locality, then small
integers can be used to encode them (relative to their sources). Third, rather than store the
destination of each edge separately, one can use gap encodings to store a sequence of edge
destinations. Given a sorted list of positive integers (say, the destinations of edges from a
node), we write down the sequence of gaps between subsequent integers on the list, rather
than the integers themselves. The idea is that even if the integers are big (requiring many
bits to record), the gaps between integers on the list could be recorded with few bits.
We now detail the BV scheme for compressing Web graphs. The nodes are Web pages
and the directed edges are the hyperlinks. First, we order Web pages lexicographically by
URL. This assigns to each Web page a unique integer identifier (ID), which is its position in
this ordering. Let w be a window parameter; for the Web, BV recommend w = 8.
Let v be a Web page. Its encoding will be as follows.
1. Copying. Check if the list out(v) of v’s outlinks is a small variation on the list of one
of the w − 1 preceding Web pages in the lexicographic ordering. Let u be such a prototype
page, if it exists.
2. Encoding. Encode v’s outlinks as follows. If the copying phase found a prototype u,
then use lg w bits to encode the (backward) offset from v to u, followed by the changes from
u's list to v's. If none of the w − 1 preceding pages in the lexicographic ordering offers a good prototype, set the first lg w bits to all 0's, then explicitly write down v's outlinks. (BV also optimize further by representing a run i, i + 1, . . . , j − 1, j of consecutive outlinks as the interval [i, j].)
Note that locality and similarity are captured by the copying phase. By using clever gap
encoding schemes (using the integer codes mentioned earlier) on top of the basic method
above, BV obtain their best results. Note that the exploitation of lexicographic locality
here hinges crucially on the natural ordering available on the Web pages (URLs). For more
details, we refer to the original paper [10] and [107, Chapter 20].
This general method of compression has two nice properties. First, it is dependent only
on locality in some canonical ordering. Second, adjacency queries (fetch all the outlinks of
a given node) can be served fairly efficiently. Given a Web page whose outlinks are sought,
we enumerate these outlinks by decoding backwards through the chain of prototypes, until
we arrive at a list whose encoding begins with at least lg w 0's. While in principle this chain could be arbitrarily long, in practice one can force the algorithm to cut chains whose length exceeds a given threshold t; small values of t already provide a good compromise between compression ratio and decompression speed.
6.3.2 Backlinks compression scheme
We now describe a slightly different compression scheme that is motivated by the observed
properties of social networks. This scheme, called BL, incorporates an additional idea on top
of BV, namely, link reciprocity. Here, reciprocal links are encoded in a special way. Since
social networks are known to be mostly reciprocal (if Alice is Bob’s friend, then Bob is very
likely to be Alice’s friend), this will turn out to be advantageous.
Suppose we obtain an ordering of the nodes in the graph through some process to be
discussed later; we will identify each node in the graph with its position in this ordering.
Let v be a node. Its encoding will consist of the following.
1. Base information. The outdegree |out(v)|, minus 1 if v has a self-loop, and minus the
number of reciprocal edges from v. Also, a bit specifying if v has a self-loop.
2. Prototype. The node u that v uses as a prototype to copy from: as u ≤ v in the
ordering, u is encoded as the difference between u and v. If u = v, then no copying is
performed. Otherwise, a bit is added for each outlink of u, representing whether or not that
outlink of u is also an outlink of v.
3. Residual edges. Let (v, v1 ), . . . , (v, vk ) be the outlinks of v that are yet to be encoded
after the above step. Let v1 ≤ · · · ≤ vk . We write one bit stating if v > v1 or v < v1 . Then
we encode the gaps |v1 − v| , |v2 − v1 | , . . . , |vk − vk−1 |.
4. Reciprocal edges. Finally, we encode the reciprocal outlinks of v. For each v 0 ∈ out(v)
such that v 0 > v, we encode whether v 0 ∈ rec(v) or not using one bit per link and discard
(v 0 , v).
Note that reciprocal edges are succinctly encoded by the last step. Thus, this method can potentially outperform BV in terms of compression. However, it has a drawback: unlike in BV, adjacency queries may be slower. This is because BV limits the "length" of prototype chains, but we do not impose such a limit in BL, for best compression. If the compressed representation of a network bottlenecks adjacency query serving, then a limit on the length of the copying chain can be introduced in BL as well.
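The following Python fragment sketches steps 1, 3, and 4 of the BL encoding for a single node; prototype copying (step 2) and the final integer codes are omitted, and edge_set, assumed to be a set of (source, destination) pairs, is our own device for this sketch.

```python
def bl_encode_node(v, out, edge_set):
    """Much-simplified BL per-node encoding: base information, residual
    gaps, and one reciprocity bit per outlink to a later node."""
    self_loop = v in out[v]
    # Reciprocal edges *from* v point to earlier mutual neighbors; they are
    # discarded here and recovered from the smaller endpoint's bits (step 4).
    discarded = {u for u in out[v] if u < v and (u, v) in edge_set}
    base = len(out[v]) - int(self_loop) - len(discarded)          # step 1
    residual = sorted(u for u in out[v] if u != v and u not in discarded)
    gaps = []                                                     # step 3
    if residual:
        gaps.append(abs(residual[0] - v))
        gaps.extend(residual[i] - residual[i - 1] for i in range(1, len(residual)))
    recip_bits = [(u, v) in edge_set for u in residual if u > v]  # step 4
    return base, self_loop, gaps, recip_bits
```

On a highly reciprocal graph, most back edges are covered by the one-bit step 4, which is exactly where BL gains over BV.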
6.4 Compression-friendly orderings
In both the BV and BL schemes, the ordering of nodes plays a crucial role in the performance of the compression scheme. The performance of BV suggests that the lexicographic ordering of URLs for the Web graph is both natural and crucial, begging the question: can we find such orderings for other graphs, in particular, social networks? If we could, we would be able to apply either the BV or the BL scheme. In this section we formulate ordering problems that are directly motivated by the BV and BL compression schemes.
6.4.1 Formulation
We first formalize the problem of finding the best ordering of nodes in a graph for the BV and BL schemes. As we saw earlier, both algorithms benefit if locality and similarity are captured by this ordering. This leads to the following natural combinatorial optimization problem, which we call minimum logarithmic arrangement.

Problem 6.4.1 (MLogA) Find a permutation π : V → [n] such that $\sum_{(u,v)\in E} \lg|\pi(u) - \pi(v)|$ is minimized.

The motivation behind this definition is to minimize the sum of the logarithms of the edge lengths according to the ordering (where the length of the edge u → v is |π(u) − π(v)|). Notice that this cost represents the compressed size of the length of the edge in an encoding that is information-theoretically optimal (or nearly so).
Also note that if the term inside the summation were just |π(u) − π(v)|, then this would be the well-known minimum linear arrangement (MLinA) problem. MLinA is NP-hard [46]; little, however, is known about its approximability. The best algorithm [92] approximates MLinA with an $O(\sqrt{\log n}\,\log\log n)$ multiplicative error with respect to the optimal solution; further, this algorithm is not practical for large graphs. From the standpoint of hardness of approximation, only the existence of a PTAS has been ruled out [4]. One cannot hope to
use an approximate solution to MLinA to solve MLogA since we can show (Section 6.5)
that these problems are very different in their structure.
In actually compressing the graph, it is more efficient to compress the gaps induced by the neighbors of a node. Suppose u < v_1 < v_2 and (u, v_1), (u, v_2) ∈ E. Then, compressing the gaps v_1 − u and v_2 − v_1 is always at most as expensive as, and could be far less expensive than, compressing the lengths of the edges, namely, v_1 − u and v_2 − u. For this reason, we introduce a slightly modified problem, called minimum logarithmic gap arrangement. Let f_π(u, out(u)) be the cost of compressing out(u) under the ordering π, i.e., if u_0 = u and out(u) = {u_1, . . . , u_k} with π(u_1) ≤ · · · ≤ π(u_k), then
$$f_\pi(u, \mathrm{out}(u)) = \sum_{i=1}^{k} \lg\left|\pi(u_i) - \pi(u_{i-1})\right|.$$

Problem 6.4.2 (MLogGapA) Find a permutation π : V → [n] such that $\sum_{u \in V} f_\pi(u, \mathrm{out}(u))$ is minimized.
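Both objectives are straightforward to evaluate for a given ordering. A small Python sketch follows; it assumes a simple directed graph, so that all lengths and gaps are at least 1.

```python
from math import log2

def mloga_cost(edges, pi):
    """MLogA objective: sum of lg of the edge lengths under the
    ordering pi, given as a dict mapping each node to its position."""
    return sum(log2(abs(pi[u] - pi[v])) for u, v in edges)

def mloggapa_cost(out, pi):
    """MLogGapA objective: for each node u, sum the lg of the gaps between
    the sorted positions of out(u), starting from u's own position."""
    total = 0.0
    for u, neighbors in out.items():
        prev = pi[u]
        for p in sorted(pi[w] for w in neighbors):
            total += log2(abs(p - prev))
            prev = p
    return total
```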
Once again, as a problem, MLogGapA turns out to be very different from MLinA and MLogA.
Both formulations MLogA and MLogGapA capture the essence of obtaining an ordering that will benefit BV and BL compression. We believe a good approximation algorithm for either of these problems will be of practical interest.
6.4.2 Hardness results
In this section we exploit some structure of these problems and give some hardness results.
6.5 MLogA vs. MLinA vs. MLogGapA
Figure 6.1: An example showing the difference between MLogA and MLinA.
The graph in Figure 6.1 is an example showing that the MLinA and the MLogA problems can have different solutions: there is no ordering that minimizes both objective functions simultaneously. The best solutions for MLinA have value 19, whereas the best solutions for MLogA have value lg 180. It can be checked that, among the optimal MLinA orderings (with value 19), the best for MLogA has value lg 192 (e.g., the ordering 4, 5, 3, 2, 6, 1, 0). Among the optimal MLogA orderings (with value lg 180), the best for MLinA has value 20 (obtained by swapping 3 and 5 in the previous ordering).
It is easy to similarly show that MLogGapA can have different solutions from both
MLinA and MLogA problems. For instance, consider a star with three leaves. The optimum
ordering for MLogGapA will place the center of the star as the first (or last) node of the
ordering, yielding a total cost of 0. On the other hand this solution is suboptimal for both
MLinA and MLogA, which would place the center of the star as either the second or the
third in the ordering.
6.6 Hardness of MLogA
In this subsection we prove that the MLogA problem is NP-hard on multi-graphs.
Theorem 6.6.1 The MLogA problem is NP-hard on multi-graphs.
Proof: We prove the hardness of MLogA via a reduction making use of the inapproximability of MaxCut. Our starting point, from [50], is that MaxCut cannot be approximated to a factor greater than 16/17 + ε unless P = NP. In the reduction below we have not attempted to optimize parameters.
We start from a MaxCut instance (G(V, E), k), where the question is whether there exists a cut of size at least k in G. Let |V| = n and |E| = m. We build the graph G′ composed of a clique of size n^{100} and a disjoint copy of the negation of G, denoted by Ḡ. Further, we add an edge between each node of the clique and each node of Ḡ. Each edge of the clique will have multiplicity n^{500} + 1; all other edges will have unit multiplicity.
Let $C = \sum_{1 \le i < j \le n^{100}+n} \lg(j-i)$ and let $X = n^{500} \sum_{1 \le i < j \le n^{100}} \lg(j-i)$. Now we would like to answer the following question Q: is it possible to find an ordering of G′ with MLogA cost smaller than Z? We show that answering questions of the form Q would allow us to approximate the corresponding MaxCut instance.
First, note that in any ordering of G′ for which the answer to Q is yes when Z = C + X − k lg n^{100}, the nodes in the clique must be adjacent. Otherwise, at least one edge of the clique will be enlarged by at least 1. In this case, the overall cost of the clique edges will be at least X − (n^{500} + 1) lg n^{100} + (n^{500} + 1) lg(n^{100} + 1), which is X + Ω(n^{400}). This is larger than the value allowed by the question Q.
We show that if the answer to Q when Z = C + X − k lg n^{100} is positive, then there is a cut in G of size at least k(1 − 1/50), and otherwise there is no cut of size k. As this allows approximating MaxCut to a factor better than 16/17, it shows that we can have an algorithm answering questions of the form Q only if P = NP, proving the hardness of MLogA. From our previous argument, we now need only consider orderings of G′ where the clique nodes are laid out consecutively. Each such ordering naturally gives a cut of the original graph, and the value of the MLogA objective function is equal to $C + X - \sum_{\{u,v\} \in E(G)} \lg|\pi(u) - \pi(v)|$. Consider the edges in G (corresponding to the missing edges in G′) that pass over the clique. Each of these edges will have length at least n^{100}, and hence the value of the MLogA objective function is smaller than C + X − k lg n^{100} = Z. Hence, if there is a cut of size at least k in G, the answer to Q is yes.
On the other hand, each of the other missing edges will have length at most n (the order of G), and hence cost at most lg n. As the MaxCut value k is at least m/2, if G does not have a cut of size at least k(1 − 1/50), the smallest that the MLogA objective function can be is
$$C + X - k\left(1 - \frac{1}{50}\right) \lg(n^{100} + n) - k \lg n > Z,$$
for n sufficiently large. This proves the claim.
6.7 Hardness of MLinGapA
While we are currently unable to show that MLogGapA is NP-hard, we can show that its
“linear” version (i.e., without the logarithms), MLinGapA, is indeed hard.
Theorem 6.7.1 The MLinGapA problem is NP-hard.
Proof: We start from the (directed) MLinA problem, which is known to be NP-hard. Let (G(V, E), K) be a MLinA instance (is there a linear arrangement whose sum of edge lengths is ≤ K?). Let n = |V| and m = |E|. We create the instance of the (directed) MLinGapA problem as follows.
The graph G′ will be composed of n′ = n^{c+1} + 2m nodes (for some large enough constant c). For each node v ∈ V(G), two directed cliques K_{v,1} and K_{v,2} of equal size n^c will be created. Also, 2n nodes d_{v,1}, . . . , d_{v,2n} (the "peer nodes" of v) will be created for each v ∈ V(G). Each node in K_{v,1} and each node in K_{v,2} will point to node d_{v,i} for all i = 1, . . . , deg(v), and vice versa.
The set E(G′) will contain 2m other edges, which we call the "original" edges. In particular, for each edge (v, u) ∈ E(G), the edges (d_{v,∗}, d_{u,∗}) and (d_{u,∗}, d_{v,∗}) will be added (in such a way that each node d_{v,∗} has outdegree ≤ n).
Given an arbitrary node v, consider the following ordering (which we dub good) of its two cliques and of its peer nodes: the first clique laid out on n^c consecutive positions, followed by the 2n peers, and finally the second clique (using a total of 2n^c + 2n positions). Let F be the cost of the edges of the cliques, and of the edges from the cliques to the peers, in this ordering (F can be trivially computed in polytime).
Now we ask: does there exist an ordering with MLinGapA value at most nF + 3K(2n^c) + 3mn² = T?
If there exists a MLinA ordering π of cost at most K, it is easy to find a MLinGapA ordering of cost at most T. If v is the first node of π, place the first clique of v, followed by the peers of v and the second clique of v, at the beginning. Then do the same for the second node of π, and so on, until all nodes have been placed. What is the total MLinGapA cost? We have a fixed cost of nF (the ordering of the "node structures") for the non-original edges. As for the original edges, note that each node from which an original edge starts has out-degree 1; thus encoding the "gap" induced by that edge has the same cost as encoding its length. What is its length? The number of cliques that an edge (that had length ℓ in π) passes over in the new ordering is 2ℓ. Each such clique has size n^c. Thus, the cost in the new ordering of the edge will be at most ℓ · 2n^c + ξ, where ξ is an error term that equals n² (the total number of peer nodes).
Now, for any edge of length ℓ in the MLinA ordering, there are three gaps of cost at most ℓ · 2n^c + n². The total cost will thus be at most nF + 3K(2n^c) + 3mn² = T.
show in turn that there is a MLinA ordering of cost at most K. To show this, we first prove
that for each v the ordering will be such that a) the distance between any two nodes of Kv,1
(resp., Kv,2 ) will be at most nc + n4 (that is, the cliques won’t be spread out), b) the distance
between each single peer of v and its nearest node of Kv,1 (Kv,2 ) will be at most n4 .
Suppose this statement is true. We show by contradiction that there must exists a
MLinA ordering of value at most K. First, notice that the minimum cost that we have to
pay for the edge between nodes in V (G0 ) that are generated from one node v is at least F (in
any ordering the gaps are of length at least 1 and for any ordering the sum of the backward
edges is at least their cost in the good ordering). Further, from properties (a) and (b) it
follows that in all valid solutions, for each v ∈ V (G), each peer node of v must be placed at
distance at most nc +2n4 from each clique node of v. Now, the number of nodes of the cliques
generated by v is 2nc so it’s necessary that each peer node has to be placed after at least
nc − 2n4 nodes of one of its two cliques and before nc − 2n4 nodes of its other (as each peer
node has to be at distance ≤ nc + 2n4 from each node of its cliques). Hence, the total cost
for any ordering of cost K +1 for the MLA problem is at least nF +3(K +1)(2nc −4n4 ) > T ,
a contradiction.
Now we have to prove properties (a) and (b). First we show (a): if the maximum distance between two nodes in any of the K_v cliques is > n^c + n⁴, the total cost of the ordering is > T. Indeed, if the distance between any two nodes of K_v is > n^c + n⁴, then the cost for the edges between the clique and peer nodes of v will be ≥ F + n^{c+4} − n^c, where the first term of the sum is due to the fact that all the gaps are of length at least one, and that there are at least n^c + n backlinks. The n^{c+4} − n^c term is the added cost due to the spread of the clique (which is ≥ n⁴, and the, say, rightmost node of the clique must go across all the non-clique nodes between clique nodes, for a total of at least n^c − 1 links). Hence, the cost of the ordering would be ≥ nF + n^{c+4} − n^c > T, contradicting the validity of the solution (as K ∈ O(n²)).
Finally, we have to prove (b): for each v ∈ V(G), no peer node of v is at distance ≥ n⁴ from each of the cliques of v. Proceeding as before, we lower bound the cost of the ordering for the edges between the peer nodes and the cliques of v. The cost of the ordering will be F plus the cost due to the enlargement of the gaps between v and K_v. Thus, the total cost of the ordering is ≥ nF + n^{c+4} > T, again a contradiction.
6.8 Lower bound: MLogA for expanders
We can also show a lower bound on the value of MLogA for expander-like graphs, suggesting that they are not compressible to a constant number of bits per edge via BV/BL schemes.
Lemma 6.8.1 If G has either constant edge expansion or constant conductance, then the
value of MLogA on G is Ω(m log n). If, instead, G has constant node expansion, then the
value of MLogA on G is Ω(n log n).
Proof:
Let G be a simple graph with no isolated nodes. For the edge expansion case, note that for any S ⊆ V such that |S| = n/2, we have that Θ(n) edges are in the cut (S, G \ S). Now if Θ(n) edges are in the cut then there are Θ(n) gaps of length at least √n, because the graph is simple. Hence the claim follows. For the constant conductance case, note that if G has no isolated nodes, then having constant conductance implies having constant edge expansion.
The node expansion case is analogous.
6.8.1 The shingle ordering heuristic
In this section we propose a simple and practical heuristic for both MLogA and MLogGapA problems. Our heuristic is based on obtaining a fingerprint of the outlinks of a node
and ordering the nodes according to this fingerprint. If the fingerprint can succinctly capture
the locality and similarity of nodes, then it can be effective in BV/BL compression schemes.
To motivate our heuristic, we recall the Jaccard coefficient J(A, B) = |A ∩ B|/|A ∪ B|, a natural notion of similarity of two sets. Let σ be a random permutation of the elements in A ∪ B. For a set A, let M_σ(A) = σ^{−1}(min_{a∈A} σ(a)), the smallest element of A according to σ; we call it the shingle. It can be shown [19] that the probability that the shingles of A and B are identical is precisely the Jaccard coefficient J(A, B), i.e.,

    Pr[M_σ(A) = M_σ(B)] = |A ∩ B| / |A ∪ B|.

Instead of using random permutations, it was shown that the so-called min-wise independent family suffices [19]; in practice, even pairwise independent hash functions work well. It is also easy to boost the accuracy of this probabilistic estimator by combining multiple shingles obtained from independent hash functions.
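To make the estimator concrete, here is a minimal Python sketch (the function names and the use of a seeded built-in hash in place of a truly random permutation are our own illustrative choices, not part of the scheme as defined above):

    import random

    def shingle(s, seed):
        # M_sigma(s): the element of s that is smallest under the
        # (pseudo-)permutation induced by hashing with this seed.
        return min(s, key=lambda x: hash((seed, x)))

    def estimate_jaccard(a, b, trials=1000):
        # By the min-wise property, the collision probability of the
        # two shingles approximates J(a, b) = |a ∩ b| / |a ∪ b|.
        seeds = [random.random() for _ in range(trials)]
        return sum(shingle(a, s) == shingle(b, s) for s in seeds) / trials

For instance, estimate_jaccard({1, 2, 3, 4}, {2, 3, 4, 5}) should return a value close to J = 3/5.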
The intuition behind our heuristic is to treat the outlinks out(u) of a node u as a set
and compute the shingle Mσ (out(u)) of this set for a suitably chosen permutation (or hash
function) σ. The nodes in V can then be ordered by the shingles. By the property stated
above, if two nodes have significantly overlapping outlinks, i.e., share a lot of common
neighbors, then with high probability they will have the same shingle and hence be close to
each other in a shingle-based ordering. Thus, the properties of locality and similarity are
captured by the shingle ordering heuristic. (Gibson et al. [47] used a similar heuristic, but
for identifying dense subgraphs of large graphs.)
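A minimal sketch of the heuristic itself, reusing the shingle function above (we assume, for illustration, that the graph is a dictionary mapping each node to its non-empty set of outlinks):

    def shingle_order(graph, seed=0):
        # Nodes with identical shingles become contiguous in the
        # ordering; ties are broken here by node identifier, a
        # stand-in for the natural order used in our experiments.
        return sorted(graph, key=lambda u: (shingle(graph[u], seed), u))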
6.8.2 Properties of shingle ordering
While shingle ordering might appear to be an unmotivated heuristic for obtaining a compression-friendly ordering, it has theoretical justification. In this section we show that using shingle
ordering, it is possible to copy a constant fraction of the edges in a large class of random
graphs with certain properties. The well-known preferential attachment (PA) model [5, 14],
for instance, generates graphs in this class. Our analysis thus shows that it is indeed possible to obtain provable performance guarantees on shingle ordering with respect to copying
(hence compression) in stylized models.
We first prove the following general statement about the sufficient conditions under which
using shingle ordering can copy a constant fraction of edges.
Theorem 6.8.1 Let G = (V, E) be such that |E| = Θ(n) and ∃S ⊆ V such that
(a) |S| = Θ(n),
(b) ∀v ∈ S, ∃v′ ∈ S, v ≠ v′, s.t. |out(v) ∩ out(v′)| ≥ 1,
(c) there exists a constant k, s.t. ∀v ∈ S, outdeg(v) ≤ k,
(d) ∀v ∈ S, ∀w ∈ out(v), indeg(w) ≤ n^{1/2−ε}.
Then, with probability 1 − o_{|V|}(1) (over the space of permutations), at least a constant fraction of the edges will be "copied" (even with a window of size 1) when using the shingle ordering.
Proof:
We need the following concentration inequality, proved (in a stronger form) by McDiarmid [77].

Theorem 6.8.2 Let X be a non-negative random variable, not identically 0, which is determined by a random permutation σ, satisfying the following for some c, r > 0: interchanging two elements in the permutation can affect X by at most c, and, for any s, if X ≥ s then there is a set of at most rs coordinates of σ whose values certify that X ≥ s. Then, for any 0 ≤ t ≤ E[X],

    Pr[ |X − E[X]| > t + 60c √(r E[X]) ] ≤ 4 exp( −t^2 / (8c^2 r E[X]) ).
Using this, we prove Theorem 6.8.1. Given an ordering and a node v, we say that v′ is the predecessor of v if it is placed immediately to the left of v in the ordering.
Also, given an ordering and an arbitrary node v, we say that the edge (v, w) is “shingled”
if the position of v is determined by w (that is, if the minimum out-neighbor of v, according
to the random shingle permutation, is w).
Also, we say that a node v is shingled by w if w is the minimum out-neighbor of v according to the random shingle permutation. A node v ∈ S is "good" if there exists another node v′ ∈ S, v ≠ v′, such that v and v′ are shingled by the same node.
Let X be the number of "good" nodes. How can we lower bound the expectation of X? By property (b), each node v in S has a common out-neighbor with at least one other node in S. As all nodes in S have outdegree bounded by k, with probability 1/(2k − 1) ≥ 1/(2k) one of their common out-neighbors will be the smallest of both their out-neighborhoods according to the random shingle permutation; that is, they will be shingled together. Thus, E[X] ≥ |S|/(2k).
We will later argue that X ≥ Ω(|S|) w.h.p. This entails that at least Ω(|S|) edges are copied. Indeed, partition the good nodes in S according to their shingling node. Each part will contain at least two nodes (by the definition of good nodes), and in each part all the nodes but the first will copy their edge pointing to their shingling node. Thus, the fraction of good nodes in a part copying at least one of their edges is ≥ 1/2. The claim follows.
To obtain the high probability lower bound on X we use Theorem 6.8.2. Note that here
we only have the random shingle permutation (that is, no random trials). In order to use
Theorem 6.8.2 we have to choose suitable c, r.
Using property (d), we can upper bound the effect on X of a swap of two elements with c = 2n^{1/2−ε} (the only nodes that can change their good or bad status are the in-neighbors of the two swapped nodes, and their number is at most 2n^{1/2−ε}).
If a node v ∈ S is good, then there exists one other node v′ ∈ S with the same shingling node w. Thus, to certify that v is good it suffices to reveal the positions of the nodes in N^+(v) ∪ N^+(v′): v is good iff w is the first of the nodes in N^+(v) ∪ N^+(v′). As the degrees of v, v′ are bounded by k, we can safely choose r = 2k.
By plugging c, r into Theorem 6.8.2 we get the high probability lower bound on X.
It is trivial to note that this holds even for undirected graphs; indeed, each undirected edge {u, v} can be substituted by two directed edges (u, v), (v, u). Then, for each node, its original set of neighbors will be the same as its new sets of in- and out-neighbors.
We now show the main result of the section: using shingle ordering it is possible to copy
a constant fraction of the edges of graphs generated by the PA model.
Theorem 6.8.3 With high probability, a graph generated by the PA model satisfies the properties of Theorem 6.8.1.
Proof: We start by removing the nodes incident to multi-edges or loops; these nodes (and their incident edges) are¹, altogether, o(n). Also, we remove all nodes of degree > k, for some constant k: by [15] only ε_k n edges and nodes will be removed this way, where ε_k → 0 as k grows.
The resulting graph will thus have at most n nodes and at least (1 − 2ε_k)mn ≥ (1 − 2ε_k)n edges. Also, its maximum degree will be k. By averaging, a graph having these three properties will contain at least (1 − 2ε_k) n/(2k) nodes of degree at least 2.
Now take all the nodes v in this graph incident to a neighbor of degree ≥ 2. There are ≥ (1 − 2ε_k) n/(2k) such neighbors, and each of them will be connected to at most k such v's; thus the number of these v's is at least Ω(n/(2k^2)) = Ω(n). The set of these v's is the S of Theorem 6.8.1.
As our experiments show, shingle ordering allows both BL and BV schemes to take
significant advantage of copying.
¹ This can be easily shown by noting that the expected number of multiple edges and self-loops added by the n-th inserted node is O(m^3/n^{1/2−ε}), conditioned on the fact that the highest degree at that point is O(n^{1/2+ε}) whp [40]. Then, by Markov's inequality we can obtain the claim.
6.9 Experimental results
In this section we describe the experimental results. The goal of our experiments is two-fold: (1) study the performance of BV/BL schemes using the shingle ordering on social networks; (2) obtain insights into the differences between the Web and social networks in terms of their compressibility. We begin with a description of the data sets we use for our experiments. Next we discuss the baselines against which we compare shingle ordering. Finally, we present and discuss our experimental results.
6.9.1 Data
For our experiments, we chose four large directed graphs: (i) a 2008 snapshot of LiveJournal (a social network site, livejournal.com) and an induced subgraph of its users with known zip codes, called LiveJournal (zip); (ii) monthly snapshots of Flickr (a photo-sharing site, flickr.com) from March 2004 until April 2008; (iii) the host graph of a 2005 snapshot of the .uk Web graph; and (iv) the host graph of a 2004 snapshot of the India+China (.in, .cn) Web graph.
Graph               n           |E|          % reciprocal edges
UK-host             587,205     12,825,465   18.6
India+China host    19,123      233,380      10.6
LiveJournal         5,363,260   79,023,142   72.0
LiveJournal (zip)   1,314,288   8,040,562    79.0
Flickr (04/2008)    25,158,667  69,702,479   64.4
Flickr (03/2004)    4,708       7,694        83.6

Table 6.1: Basic properties of our graphs.
In Table 6.1, we summarize the properties of the graphs we have considered. Notice the
magnitude of the reciprocity of social networks (LiveJournal and Flickr). Our BL scheme
critically leverages this property of such networks.
6.9.2 Baselines
We use the following orderings as our baselines to compare against the shingle ordering.
(1) Random order. We use a random permutation of all the nodes in the graph.
(2) Natural order. This is the most basic order that can be defined for a graph. For Web
and host graphs, a natural order is the URL lexicographic ordering (used by BV). For a
snapshot of LiveJournal, a natural order is the order in which the user profiles were crawled.
For Flickr, since we know the exact time at which each node and edge was created, a natural
order is the order in which users joined the network.
(3) Geographic order. In a social network, if geographic information is available in the form of a zip code, then this defines a geography-based order. Liben-Nowell et al. [73] showed that about 70% of social network links arise from geographical proximity, suggesting that friends can be grouped together using geographical information. Notice that this only defines a partial order (i.e., with ties).
(4) DFS and BFS order. Here, the orderings are given by these common graph traversal
algorithms. We also try the undirected versions of these traversals, where the edge direction
is discarded.
To test the robustness of shingle ordering, we also use an ordering obtained by two
shingles instead of just one, where the second shingle is used to break ties produced by the
first. We call this the double shingle ordering. When only one shingle was used, ties were
broken using the natural order.
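A minimal sketch of the tie-breaking, reusing the shingle function of Section 6.8.1 (the two-seed scheme is our own illustration):

    def double_shingle_order(graph, seed1=0, seed2=1):
        # The second shingle is consulted only to break ties among
        # nodes whose first shingles coincide.
        return sorted(graph, key=lambda u: (shingle(graph[u], seed1),
                                            shingle(graph[u], seed2)))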
Our performance numbers are always measured in bits/link.
6.9.3 Compression performance
BV
Graph               Natural   Random    Shingle   Double Shingle
LiveJournal         14.435    23.566    15.956    15.828
Flickr              21.865    23.958    13.549    13.496
UK host             10.826    15.543    8.218     8.138
India+China host    9.224     10.543    7.367     7.120

BL
Graph               Natural       Random        Shingle       Double Shingle
LiveJournal         9.564 (ζ4)    15.169 (ζ4)   10.461 (ζ4)   10.435 (ζ4)
Flickr              16.382 (ζ4)   17.785 (ζ4)   10.952 (ζ4)   10.915 (ζ4)
UK host             10.574 (δ)    14.528 (δ)    8.243 (δ)     8.133 (δ)
India+China host    9.753 (ζ4)    10.823 (ζ4)   7.310 (δ)     7.126 (δ)

Table 6.2: Performances of the compression techniques under different orderings.
In Table 6.2, we present the results of the different compression/orderings on four of
the graphs. This table shows that double shingle ordering produces the best or near-best
compression, for both BV and BL. In some cases, it cuts almost half the number of bits used
by the natural order. Also we note that the improvement of BL over BV is significant for networks that are highly reciprocal, i.e., social networks. Finally, the numbers show an interesting similarity between social networks and host graphs: their compressibility under the best scheme (BL with double shingle order) is on a par.
It is interesting to note that the best compression rates for the UK host and the India+China host graphs are almost as high as those of the social networks (only 2-3 bits below the 10-11 bits/link needed for the social networks), even though the host graphs are much smaller. For comparison, we note that the snapshot of the UK domain (resp., of the India+China domains) that we used to obtain the host graph was found to be compressible to 1.701 (resp., 1.472) bits/link (see [10] and http://law.dsi.unimi.it/). This seems to indicate that host graphs are quite hard to compress.
We also note (Table 6.3) that the BFS/DFS orderings are always suboptimal (almost as bad as a random order). In Table 6.4, we show the performance of geographical ordering on
BV
Graph               DFS       Undir. DFS   BFS       Undir. BFS
LiveJournal         19.992    20.253       20.763    21.376
UK host             14.630    14.474       14.903    14.634
India+China host    10.172    10.210       10.231    9.810

BL
Graph               DFS           Undir. DFS    BFS           Undir. BFS
LiveJournal         12.924 (ζ4)   13.096 (ζ4)   13.401 (ζ4)   13.778 (ζ4)
UK host             13.774 (ζ4)   13.607 (ζ4)   13.978 (ζ4)   13.731 (ζ4)
India+China host    10.561 (ζ4)   10.317 (ζ4)   10.558 (ζ4)   10.105 (ζ4)

Table 6.3: Performance of the BFS/DFS orderings.
Graph               Scheme   Geographic    Shingle      Double Shingle
LiveJournal (zip)   BV       17.258        17.042       16.975
LiveJournal (zip)   BL       11.396 (ζ4)   10.964 (δ)   10.950 (δ)

Table 6.4: Performance of geographic ordering on LiveJournal (zip).
the induced subgraph of LiveJournal, restricted to users in the US with a known zip code. We
see how ordering by zip code (i.e., in such a way that people at small geographic distance
are close to each other in the ordering) is much worse than ordering by shingle, suggesting
that geographic ordering is perhaps not useful for compression.
6.9.4 Temporal analysis
In Figure 6.2, we see how the different orderings and compression techniques fare on the monthly snapshots of the Flickr social network. The upper half of the figure shows how the Flickr network grew over time. Here, we see that BL with shingle ordering beats the competition uniformly over all the snapshots. We also see an interesting pattern: BL obtains a better compression rate than BV with each of the orderings. It is remarkable that even though the number of edges in Flickr grew enormously between March 2005 and April 2008, the compressibility of the network (under a variety of schemes and orderings) has remained robust.
6.9.5 Why does shingle ordering work best?
Figures 6.3 and 6.4 show one reason why the shingle ordering helps compression: in the LiveJournal, India+China host, and UK-host graphs the number of small gaps is higher with shingle ordering than with the other orderings (with the notable exception of the LiveJournal graph, where the natural ordering is marginally better).
[Figure: bits/link (y-axis) over time from Mar 2004 to Mar 2008 (x-axis), for the BV and BackLinks schemes under the Joining, Shingle, and Random orderings.]

Figure 6.2: Performance on the temporal Flickr graph.
In Figure 6.3, the upper panel reports the number of gaps (y-axis) of a certain length (x-axis) for the LiveJournal graph. The lower panel reports a sub-sampled version of the same data: for each length i, the corresponding point is kept with probability Θ(1/i). This way, in expectation, the number of points in each interval 10^k, . . . , 10^{k+1} is the same, which makes the bottom panel more readable. Recall that in LiveJournal, the natural (crawl) ordering beats shingle ordering by a small amount.
Figure 6.3: Gap distribution in LiveJournal graph.
In Figure 6.4, the upper and lower panels report the number of gaps (y-axis) of a certain length (x-axis) for the UK-host and the India+China host graphs, respectively. These are sub-sampled versions of the actual data. Note that in both cases the shingle ordering is best: it creates many more gaps of small length than the other orderings. The smaller the length of a gap, the fewer bits it takes to encode.
From these, we see that shingle ordering reduces gap lengths. As we argued earlier, shingle ordering also helps the BV and BL schemes exploit copying. These two benefits together appear to be the main reasons why shingle ordering almost always outperforms the other orderings.
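To quantify the first benefit directly, the following sketch (our own illustration; it charges an idealized log2(gap + 1) bits per gap rather than the exact cost of the ζ and δ codes) computes the total gap cost of an ordering, so that two orderings of the same graph can be compared:

    import math

    def log_gap_cost(graph, order):
        # graph: dict mapping each node to its set of out-neighbors.
        # order: list of all nodes, i.e., the ordering under test.
        pos = {u: i for i, u in enumerate(order)}
        cost = 0.0
        for u, outlinks in graph.items():
            # Encode each adjacency list as gaps between consecutive
            # out-neighbor positions, starting from the source node.
            prev = pos[u]
            for t in sorted(pos[v] for v in outlinks):
                cost += math.log2(abs(t - prev) + 1)
                prev = t
        return cost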
Figure 6.4: Gap distribution in UK-host and India+China host graphs.
6.9.6 A cause of incompressibility
We investigate what causes social networks to be far less compressible than web graphs
(observed by [10] to be compressible to 2-3 bits per link). We ask the question: is the
densest portion of a social network far more compressible than the rest of the graph? To
study this, we analyze k-cores of the LiveJournal social network. Recall that a k-core of a
graph is the largest induced subgraph whose minimum degree is at least k. For each k, the
k-core of LiveJournal was extracted and compressed by itself. Then, the k-core edges were
removed from the original LiveJournal, which was also compressed by itself. The results are
shown in Figure 6.5. It is clear that as k increases, the k-core gets easier to compress but at
the same time the remaining graph gets harder and harder to compress. This suggests that
the low-degree nodes in social networks are primarily responsible for their incompressibility.
[Figure: for k between 1 and 100, bits/link of the whole graph ("Total"), of the k-core, and of the remaining graph (left axis, roughly 9.4-9.8), together with the k-core size in nodes (right axis, 100K-2M).]

Figure 6.5: Compressibility of k-cores.
k-cores can also be used to compress the social network. This is done by representing all
the nodes in a k-core by a single virtual node, and compressing the k-core graph and the
remainder graph (with the virtual node) separately. In our example, for k = 50, we obtain
9.435 bits/link compression. This is a slight improvement over the best numbers in Table 6.2.
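For completeness, the k-core itself can be extracted with the standard peeling procedure, sketched below in Python (our own illustration, treating the graph as undirected):

    from collections import deque

    def k_core(adj, k):
        # adj: dict mapping each node to the set of its neighbors.
        # Repeatedly delete nodes of degree < k; the survivors form
        # the (unique, possibly empty) k-core.
        adj = {u: set(vs) for u, vs in adj.items()}
        queue = deque(u for u in adj if len(adj[u]) < k)
        while queue:
            u = queue.popleft()
            if u not in adj:
                continue  # already removed via a duplicate entry
            for v in adj.pop(u):
                adj[v].discard(u)
                if len(adj[v]) < k:
                    queue.append(v)
        return adj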
Bibliography
[1] Adler, M., and Mitzenmacher, M. Towards compressing web graphs. In Proc.
Data Compression Conference(DCC01) (2001), pp. 203–212.
[2] Aiello, W., Chung, F. R. K., and Lu, L. Random evolution in massive graphs. In
Proc. 42nd IEEE Symposium on Foundations of Computer Science(FOCS01) (2001),
pp. 510–519.
[3] Althöfer, I., Das, G., Dobkin, D., Joseph, D., and Soares, J. On sparse spanners of weighted graphs. Discrete and Computational Geometry 9 (1993), 81–100.
[4] Ambühl, C., Mastrolilli, M., and Svensson, O. Inapproximability results for
sparsest cut, optimal linear arrangement, and precedence constrained scheduling. In
Proc. 48th Annual IEEE Symposium on Foundations of Computer Science(FOCS07)
(2007), pp. 329–337.
[5] Barabási, A.-L., and Albert, R. Emergence of scaling in random networks. Science 286, 5439 (1999), 509–512.
[6] Baswana, S., and Sen, S. A simple linear time algorithm for computing sparse
spanners in weighted graphs. In Proc. 30th International Colloquium on Automata,
Languages and Programming(ICALP03) (2003), pp. 384–396.
[7] Berenbrink, P., Elsässer, R., and Friedetzky, T. Efficient randomised broadcasting in random regular networks with applications in peer-to-peer systems. In
Proc. 27th ACM symposium on Principles of distributed computing(PODC08) (2008),
pp. 155–164.
[8] Bladford, D., and Blelloch, G. Index compression through document reordering.
In Proc. Data Compression Conference(DCC02) (2002), pp. 342–351.
[9] Boldi, P., Santini, M., and Vigna, S. Permuting web graphs. In Proc. of the
6th International Workshop on Algorithms and Models for the Web-Graph(WAW09)
(2009), pp. 116–126.
[10] Boldi, P., and Vigna, S. The webgraph framework I: Compression techniques. In
Proc. 13th International World Wide Web Conference(WWW04) (2004), pp. 595–601.
[11] Boldi, P., and Vigna, S. The WebGraph framework II: Codes for the world-wide web. In Proc. Data Compression Conference(DCC04) (2004).
[12] Boldi, P., and Vigna, S. Codes for the world-wide web. Internet Mathematics 4,
2 (2005), 405–427.
[13] Bollobás, B. The diameter of random graphs. IEEE Trans. Inform.Theory 36, 2
(1990), 285–288.
[14] Bollobás, B., and Riordan, O. The diameter of a scale-free random graph. Combinatorica 24, 1 (2004), 5–34.
[15] Bollobás, B., Riordan, O., Spencer, J., and Tusnády, G. E. The degree
sequence of a scale-free random graph process. Random Structures and Algorithms 18,
3 (2001), 279–290.
[16] Borgs, C., Chayes, J. T., Daskalakis, C., and Roch, S. First to market is not
everything: An analysis of preferential attachment with fitness. In Proc. 39th Annual
ACM Symposium on Theory of Computing(STOC07) (2007), pp. 135–144.
[17] Boyd, S. P., Ghosh, A., Prabhakar, B., and Shah, D. Gossip algorithms: design, analysis and applications. IEEE Transactions on Information Theory 52 (2006),
1653–1664.
[18] Breiger, R. L. The duality of persons and groups. Social Forces (1974).
[19] Broder, A., Charikar, M., Frieze, A., and Mitzenmacher, M. Min-wise
independent permutations. J. Comput. Syst. Sci. 60, 3 (2000), 630–659.
[20] Broder, A., Glassman, S., Manasse, M., and Zweig, G. Syntactic clustering
of the web. Comput. Netw. ISDN Syst 29, 8-13 (1997), 1157–1166.
[21] Broder, A. Z., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S.,
Stata, R., Tomkins, A., and Wiener, J. Graph structure in the web. Computer
Networks 33 (2000), 309–320.
[22] Buehrer, G., and Chellapilla, K. A scalable pattern mining approach to web
graph compression with communities. In Proc. 1st International Conference on Web
Search and Data Mining(WSDM08) (2008), pp. 95–106.
[23] Carlson, J., and Doyle, J. Highly optimized tolerance: A mechanism for power
laws in designed systems. Phys. Rev. E 60 (1999), 1412.
[24] Chierichetti, F., Kumar, R., Lattanzi, S., Mitzenmacher, M., Panconesi,
A., and Raghavan, P. On compressing social networks. In Proc. 15th Conference
on Knowledge Discovery and Data Mining(KDD09) (2009), pp. 219–228.
[25] Chierichetti, F., Kumar, R., Lattanzi, S., Panconesi, A., and Raghavan,
P. Models for the compressible web. In Proc. 50th Annual IEEE Symposium on
Foundations of Computer Science(FOCS09) (2009), pp. 331–340.
[26] Chierichetti, F., Lattanzi, S., and Panconesi, A. Rumor spreading in social networks. In Proc. 36th International Colloquium on Automata, Languages and Programming: Part II(ICALP09) (2009), pp. 375–386.
[27] Chierichetti, F., Lattanzi, S., and Panconesi, A. Rumour spreading and
graph conductance. In Proc. 21st Annual ACM-SIAM Symposium on Discrete Algorithms(SODA10) (2010), pp. 1657–1663.
[28] Cooper, C., and Frieze, A. M. A general model of web graphs. Random Structures
and Algorithms 3, 22 (2003), 311–335.
[29] Cooper, C., and Frieze, A. M. The cover time of the preferential attachment
graph. Journal of Combinatorial Theory, Ser. B 97, 2 (2007), 269–290.
[30] Demers, A. J., Greene, D. H., Hauser, C., Irish, W., Larson, J., Shenker,
S., Sturgis, H. E., Swinehart, D. C., and Terry, D. B. Epidemic algorithms
for replicated database maintenance. In Proc. 6th ACM symposium on Principles of
distributed computing(PODC’87) (1987).
[31] Dodds, P., Muhamad, R., and Watts, D. An experimental study of search in global social networks. Science 301, 5634 (2003), 827–829.
[32] Doerr, B., Friedrich, T., and Sauerwald, T. Quasirandom rumor spreading. In
Proc. 19th Annual ACM-SIAM Symposium on Discrete Algorithms(SODA08) (2008).
[33] Doerr, B., Friedrich, T., and Sauerwald, T. Quasirandom rumor spreading:
Expanders, push vs. pull, and robustness. In Proc. 36th International Colloquium on
Automata, Languages and Programming(ICALP09) (2009), pp. 366–377.
[34] Dubhashi, D. Talagrand's inequality in hereditary settings. In Technical report, Dept. CS, Indian Institute of Technology (1998).
[35] Dubhashi, D., and Panconesi, A. Concentration of Measure for the Analysis of
Randomized Algorithms. Cambridge University Press, 2009.
[36] Elsässer, R. On the communication complexity of randomized broadcasting in
random-like graphs. In Proc. 18th Annual ACM Symposium on Parallel Algorithms
and Architectures SPAA (2006), pp. 148–157.
[37] Fabrikant, A., Koutsoupias, E., and Papadimitriou, C. H. Heuristically optimized trade-offs: A new paradigm for power laws in the internet. In Proc. 29th International Colloquium on Automata, Languages and Programming(ICALP02) (2002),
pp. 110–122.
[38] Faloutsos, M., Faloutsos, P., and Faloutsos, C. On power-law relationships
of the internet topology. In Proc. Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication(SIGCOMM99) (1999), pp. 251–
262.
[39] Feige, U., Peleg, D., Raghavan, P., and Upfal, E. Randomized broadcast in
networks. Algorithms 1 (1990), 128–137.
[40] Flaxman, A., Frieze, A. M., and Fenner, T. I. High degree vertices and eigenvalues in the preferential attachment graph. Internet Mathematics 2, 1 (2005).
[41] Fraigniaud, P., and Giakkoupis, G. The effect of power-law degrees on the
navigability of small worlds. In Proc. 28th ACM symposium on Principles of distributed
computing(PODC’09) (2009), pp. 240–249.
[42] Fraigniaud, P., and Giakkoupis, G. On the searchability of small-world networks
with arbitrary underlying structure. In Proc. 42nd Annual ACM Symposium on the
Theory of Computing(STOC10) (2010), pp. 389–398.
[43] Friedrich, T., and Sauerwald, T. Near-perfect load balancing by randomized
rounding. In Proc. 41st Annual ACM Symposium on Theory of Computing(STOC09)
(2009), pp. 121–130.
[44] Frieze, A., and Grimmett, G. The shortest-path problem for graphs with random arc-lengths. Discrete Applied Mathematics 10 (1985), 57–77.
[45] Garey, M., and Johnson, D. Computers and Intractability: A Guide to the Theory
of NP-Completeness. W.H. Freeman and Company, 1979.
[46] Garey, M. R., Johnson, D. S., and Stockmeyer, L. Some simplified NPcomplete graph problems. Theory of Computer Science 1 (1976), 237–267.
[47] Gibson, D., Kumar, R., and Tomkins, A. Discovering large dense subgraphs
in massive graphs. In Proc. 31st International Conference on Very Large Data
Bases(VLDB05) (2005), pp. 721–732.
[48] Goel, S., Muhamad, R., and Watts, D. J. Social search in “small-world” experiments. In Proc. 18th international conference on World wide web (WWW’09) (2009),
pp. 701–710.
[49] Granovetter, M. The strength of weak ties. American Journal of Sociology 78, 6
(1973), 1360–1380.
[50] Håstad, J. Some optimal inapproximability results. Journal of the ACM 48, 4 (2001), 798–859.
[51] Jerrum, M., and Sinclair, A. Approximating the permanent. SIAM J. Comput.
18, 6 (1989), 1149–1178.
[52] Guillaume, J.-L., and Latapy, M. Bipartite graphs as models of complex networks. In Proc. 1st Workshop on Combinatorial and Algorithmic Aspects of Networking(CAAN04) (2004), pp. 127–139.
[53] Karande, C., Chellapilla, K., and Andersen, R. Speeding up algorithms on
compressed web graphs. In Proc. 2nd International Conference on Web Search and
Data Mining(WSDM09) (2009), pp. 272–281.
[54] Karoński, M., Scheinerman, E. R., and Singer-Cohen, K. B. On random intersection graphs: The subgraph problem. Combinatorics, Probability and Computing 8, 1–2 (1999), 131–159.
[55] Karp, R., Schindelhauer, C., Shenker, S., and Voecking, B. Randomized
rumor spreading. In Proc. 41st Annual IEEE Symposium on Foundations of Computer
Science(FOCS00) (2000), p. ***.
[56] Kempe, D., Dobra, A., and Gehrke, J. Gossip-based computation of aggregate information. In Proc. 44th IEEE Symposium on Foundations of Computer Science(FOCS03) (2003), pp. 482–491.
[57] Klee, V., and Larman, D. Diameters of random graphs. Canad. J. Math 33 (1981),
618–640.
[58] Kleinberg, J. Navigation in a small world. Nature 406 (2000), 845.
[59] Kleinberg, J. The small-world phenomenon: An algorithmic perspective. In Proc.
37th Annual ACM Symposium on Theory of Computing(STOC00) (2000), pp. 163–170.
[60] Kleinberg, J. Small-world phenomena and the dynamics of information. In Proc.
14th Advances in Neural Information Processing Systems (NIPS01) (2001), pp. 431–
438.
[61] Kleinfeld, J. Could it be a big world after all? Society 39 (2002), 61–66.
[62] Korte, C., and Milgram, S. Acquaintance links between white and negro populations: Application of the small world method. Journal of Personality and Social
Psychology 15, 2 (1970), 101–108.
[63] Kossinets, G., and Watts, D. J. Empirical analysis of evolving social networks.
Science 311, 5757 (2006), 88–90.
[64] Kumar, R., Novak, J., and Tomkins, A. Structure and evolution of online social networks. In Proc. 12th ACM SIGKDD international conference on Knowledge
discovery and data mining (KDD06) (2006), pp. 611–617.
[65] Kumar, R., Raghavan, P., Rajagopalan, S., Sivakumar, D., Tomkins, A.,
and Upfal, E. Stochastic models for the web graph. In Proc. 41st IEEE Symposium
on Foundations of Computer Science(FOCS00) (2000), pp. 57–65.
[66] Kumar, R., Raghavan, P., Rajagopalan, S., and Tomkins, A. Trawling the
web for emerging cybercommunities. In Proc. 8th International World Wide Web
Conference(WWW09) (1999), pp. 403–416.
[67] Lattanzi, S., and Sivakumar, D. Affiliation networks. In Proc. 41st ACM Symposium on Theory of Computing(STOC09) (2009), pp. 427–434.
[68] Leskovec, J., Backstrom, L., Kumar, R., and Tomkins, A. Microscopic evolution of social networks. In Proc. 14th ACM SIGKDD international conference on
Knowledge discovery and data mining (KDD’08) (2008), pp. 462–470.
[69] Leskovec, J., Chakrabarti, D., Kleinberg, J., and Faloutsos, C. Realistic, mathematically tractable graph generation and evolution, using Kronecker multiplication. In Proc. European Conference on Principles and Practice of Knowledge Discovery in Databases(PKDD05) (2005), pp. 133–145.
[70] Leskovec, J., and Horvitz, E. Planetary-scale views on a large instant-messaging
network. In Proc. 17th international conference on World Wide Web (WWW’08)
(2008), pp. 915–924.
[71] Leskovec, J., Kleinberg, J. M., and Faloutsos, C. Graph evolution: Densification and shrinking diameters. ACM Transactions on the Web (TWEB) 1, 1 (2007),
1–41.
[72] Leskovec, J., Lang, K. J., Dasgupta, A., and Mahoney, M. W. Statistical
properties of community structure in large social and information networks. In Proc.
17th international conference on World Wide Web (WWW’08) (2008), pp. 695–704.
[73] Liben-Nowell, D., Novak, J., Kumar, R., Raghavan, P., and Tomkins, A.
Geographic routing in social networks. Proc. National Academy of Sciences 102, 33
(2005), 11623–11628.
[74] Lin, N., Dayton, P., and Greenwald, P. The urban communication network and social stratification: A "small world" experiment. Communication Yearbook 1 (1978), 107–119.
[75] Mahdian, M., and Xu, Y. Stochastic Kronecker graphs. In Proc. 5th Workshop on Algorithms and Models for the Web-Graph (WAW'07) (2007), pp. 179–186.
[76] Mandelbrot, B. An informational theory of the statistical structure of languages.
In Communication Theory. 1953, pp. 486–502.
[77] McDiarmid, C. J. H. On the method of bounded differences. In Proc. 12th British
Combinatorial Conference (1989), pp. 148–188.
[78] Mihail, M., Papadimitriou, C. H., and Saberi, A. On certain connectivity
properties of the internet topology. J. Comput. Syst. Sci 72, 2 (2006), 239–251.
[79] Milgram, S. The small world problem. Psychology Today 2 (1967), 60–67.
[80] Mitzenmacher, M. A brief history of generative models for power law and lognormal
distributions. Internet Mathematics 1, 2 (2003).
[81] Mitzenmacher, M. Editorial: The future of power law research. Internet Mathematics 2, 4 (2006), 525–534.
[82] Mosk-Aoyama, D., and Shah, D. Fast distributed algorithms for computing separable functions. IEEE Transactions on Information Theory 54, 7 (2008), 2997–3007.
[83] Mutafchiev, L. The largest tree in certain models of random forests. Random
Structures and Algorithms 13, 3-4 (1998), 211–228.
[84] Naor, M. Succinct representation of general unlabeled graphs. Discrete Applied
Mathematics 28 (1990), 303–307.
[85] Newman, M. Properties of highly clustered networks. Phys Rev E Stat Nonlin Soft
Matter Phys 68 (2003).
[86] Paris, R. B., and Kaminski, D. Asymptotics and Mellin-Barnes Integrals. Cambridge University Press, 2001.
[87] Peleg, D., and Upfal, E. A trade-off between space and efficiency for routing tables. Journal of the ACM 36, 3 (1989), 510–530.
[88] Pittel, B. On spreading a rumor. SIAM Journal on Applied Mathematics 47 (1987),
213–223.
[89] Raftery, A. E., Handcock, M. S., and Hoff, P. D. Latent space approaches
to social network analysis. J. Amer. Stat. Assoc. 15, 460 (2002).
[90] Raghavan, S., and Garcia-Molina, H. Representing web graphs. In Proc. 19th
International Conference on Data Engineering(ICDE03) (2003), pp. 405–416.
[91] Randall, K. H., Stata, R., Wiener, J., and Wickremesinghe, R. The Link
database: Fast access to graphs of the web. In Proc. Data Compression Conference(DCC02) (2002), pp. 122–131.
[92] Rao, S., and Richa, A. W. New approximation techniques for some linear ordering problems. SIAM Journal on Computing 34, 2 (2004), 388–404.
[93] Sarkar, P., and Moore, A. W. Dynamic social network analysis using latent space
models. ACM SIGKDD Explorations Newsletter 7, 2 (2005), 31–40.
[94] Shieh, W., Chen, T., Shann, J. J., and Chung, C. P. Inverted file compression
through document identifier reassignment. Information Processing and Management
1, 39 (2003), 117–131.
[95] Silvestri, F. Sorting out the document identifier assignment problem. In Proc. Advances in Information Retrieval, 29th European Conference on IR Research(ECIR07)
(2007), pp. 101–112.
[96] Silvestri, F., Perego, R., and Orlando, S. Assigning document identifiers to
enhance compressibility of web search indexes. In Proc. 2004 ACM Symposium on
Applied Computing (SAC) (2004), pp. 600–605.
[97] Simon, H. On a class of skew distribution functions. Biometrika 42 (1955), 425–440.
[98] Spielman, D. A., and Teng, S. H. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In Proc. 36th Annual ACM
Symposium on Theory of Computing(STOC04) (2004), pp. 81–90.
[99] Suel, T., and Yuan, J. Compressing the graph structure of the web. In Proc. Data
Compression Conference(DCC01) (2001), pp. 213–222.
[100] Szymanski, J. On the complexity of algorithms on recursive trees. Theoretical Computer Science 74, 3 (1990), 355–361.
[101] Travers, J., and Milgram, S. An experimental study of the small world problem. Sociometry 32, 4 (1969), 425–443.
[102] Turán, G. On the succinct representation of graphs. Discrete Applied Mathematics
8, 3 (1984), 289–294.
[103] Watts, D., and Strogatz, S. Collective dynamics of 'small-world' networks. Nature 393 (1998), 440–442.
[104] Watts, D. J. A twenty-first century science. Nature 445 (2007), 489.
[105] Watts, D. J., Dodds, P. S., and Newman, M. E. J. Identity and search in social
networks. Science 296 (2002), 1302–1305.
[106] Whittaker, E., and Watson, G. A Course in Modern Analysis. Cambridge
University Press, 1996.
[107] Witten, I. H., Moffat, A., and Bell, T. C. Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann Publishers, 1999.
[108] Zipf, G. K. Human Behavior and the Principle of Least Effort. Addison-Wesley, 1949.