P2P and multimedia applications over the Internet
Notes on the course
Fiandrino Claudio
July 4, 2011

Contents
1 P2P systems
  1.1 Introduction
  1.2 Definition
  1.3 Time evolution of applications
  1.4 Issues
    1.4.1 General Issues
    1.4.2 Issues for ISP
    1.4.3 Issues for Users
  1.5 Overlay network
  1.6 Family of systems
  1.7 Napster
  1.8 Gnutella
    1.8.1 Analysis
    1.8.2 Messages
    1.8.3 Characteristics
    1.8.4 Performance evaluation
  1.9 Chord
    1.9.1 Analysis
    1.9.2 Example
    1.9.3 Issues
    1.9.4 Load balance
    1.9.5 Comparison between Chord and Gnutella
  1.10 CAN
    1.10.1 Routing
    1.10.2 Join
    1.10.3 Performances
    1.10.4 Leaving of a node and failures
  1.11 Tapestry
  1.12 BitTorrent
    1.12.1 Analysis
    1.12.2 Policies
    1.12.3 Case study: Flash Crowd
  1.13 Skype
  1.14 P2P Streaming systems
    1.14.1 Tree-based systems
    1.14.2 Mesh-based systems
2 Random graphs
  2.1 Introduction and definitions
  2.2 Erdős-Rényi Model
    2.2.1 Average degree
    2.2.2 Degree distribution
  2.3 Bender-Canfield Model
    2.3.1 Node reachability
    2.3.2 Small-world effect
    2.3.3 Clustering
  2.4 Heavy-Tailed Distribution
  2.5 Watts-Strogatz model
    2.5.1 Clustering analysis
    2.5.2 Small-world analysis
  2.6 Theory of evolving networks
  2.7 Resume scheme

Chapter 1
P2P systems

1.1 Introduction
From the point of view of P2P analysis, the Internet is an already defined, perfectly working structure. Only the users are taken into account, and they are called hosts or peers.
Hosts communicate through the Internet, which can be seen as the transport medium that carries data. The analysis therefore focuses on layers 4 and 7 of the OSI stack. Knowledge of the transport layer is needed to understand and predict the behavior of the network. It is also necessary to know which features users may require from the application layer, since they interact with applications.

[Figure: OSI stack, highlighting layer 7 (application) and layer 4 (transport)]

1.2 Definition
P2P (peer-to-peer) systems are systems in which users both receive and provide part of the service. This is a general definition, because the concept of service still has to be specified. The important point is that hosts also contribute to providing the service. The service is therefore distributed rather than centralized, unlike in a web browsing application. Depending on the type of service, users contribute different things using their own resources.

Sharable resources
This section focuses on the kinds of resources that can be shared. A first type is content: users share content stored on their machines. If no other user holds a given content, the quality of service will be very poor. If many hosts share the same content, the service will be excellent. An example application is Napster, where the content is music. Types of content vary, and grouping them gives the following classification:
• file sharing;
• directories.
File sharing covers many possible contents: music, games, videos, films and e-books. Directories are typically parts of a distributed database: once received, a part is redistributed and anyone can access it (Skype).
Another sharable resource is the CPU: in this context, computational power is shared.
For example, an application may require a huge computational capacity that no single machine owns. The work can then be distributed among Internet hosts, each using its computational power to process a single part of the application (e.g. applications that search for new forms of life and share computing power for signal processing).
The last sharable resource is bandwidth. Consider a host that owns a very popular film requested by many other peers. If it had to send the film to everyone, its access link would need a very large bandwidth. It is better to distribute parts of the film to other users, who in turn redistribute them: in this way the aggregate bandwidth actually used is larger. Example applications are BitTorrent, P2P TV and gaming.

1.3 Time evolution of applications
In the beginning, the Internet was in a certain sense peer-to-peer: flat topology, distributed features and protocols. As it grew, it moved to the client-server paradigm, in which someone provides a service requested by other users; web browsing is a typical client-server application. ISPs developed their offer in that direction, and that choice implied asymmetric access. Upload and download are treated separately, with much more bandwidth typically assigned to download (ADSL), since usually there is one server with several clients. With the development of peer-to-peer applications the situation moved toward a fairly symmetric one. There is no longer a strict division between download and upload bandwidth, because peers that redistribute content need to exploit the upload bandwidth in particular.
Moreover, the technological evolution of devices has brought much greater computational power, which made it possible to push some tasks down from the core network to the edges.

1.4 Issues
1.4.1 General Issues
Peer-to-peer systems suffer from critical issues. One is churning, the high variability of the system over time.
Hosts can freely join or leave, so the amount of available content changes very frequently. For example, in P2P TV, resources have to be balanced between the amount a peer can redistribute and the amount it needs. Furthermore, knowledge of the participants is required, such as their IP addresses, which can change over time due to churning. This knowledge is not strictly necessary in other applications. If a peer is hidden behind a NAT or a firewall, further information is required, in particular the public IP address of the NAT. The reason is that NATs were designed for client-server applications. Firewalls, instead, can deny a machine access to the P2P application.
Every P2P system has to deal with the join issue. Users who want to join the network need some information, such as the address of the first neighbor. If there are no peers in the network at a certain moment, the service cannot be provided. To join, a user can:
• access a web page that contains a list of active or recently active peers, and contact them until one of them is found to be up;
• connect to some server that is always on.
These mechanisms are centralized techniques; an application that uses them is BitTorrent.

1.4.2 Issues for ISP
ISPs have to cope with the following problems:
• traffic engineering: to improve the service and satisfy user requirements, ISPs can balance traffic (symmetric or asymmetric access means a different amount of traffic in the network);
• capacity problems: many applications generate a lot of traffic, and ISPs exchanging traffic with other ISPs have to respect the agreed cost policies. Moreover, the amount of traffic can be huge because applications do not care about the physical topology. Being neighbors in the peer network does not imply belonging to the same ISP, so traffic generally crosses ISP boundaries many times;
• competitive services: an ISP may own a telephone company offering a paid service. The ISP also carries data traffic, and if that traffic belongs to Skype (a free VoIP service), the ISP may penalize it because it is a competitor.

1.4.3 Issues for Users
Users have to deal with:
• legal issues: some services, for example file sharing, may run into legal problems because contents are distributed in violation of copyright;
• security and privacy issues: some applications may be malicious and exchange potentially risky traffic (viruses, malware, spyware).

1.5 Overlay network
The layer 7 network that connects peers is called the overlay network. The overlay network is completely independent of the physical network and may or may not be fully meshed. It is not fully meshed when peers do not know all other peers but have only a partial view of the topology. An example:

[Figure: overlay network whose peers are spread across ISP 1, ISP 2 and ISP 3]

Links are of course logical. Two peers connected by an overlay link are neighbors, yet they may belong to different ISPs, which means they can be physically very far apart. Links can be created in different ways, for example with direct TCP connections, or with UDP plus some additional information.
The overlay network is used to implement functions that differ from application to application, and more than one overlay network can be nested together. Some examples:
• Gnutella: files are queried over the overlay network and retrieved through a direct TCP connection;
• BitTorrent: files are retrieved over the overlay network.

1.6 Family of systems
The following classification distinguishes:
• unstructured P2P systems: the topology is not regular but a random graph (neighbors are randomly chosen); an example is Gnutella;
• structured P2P systems: the topology is regular; an example is Chord;
• hierarchical P2P systems: a hierarchy is created among peers, distinguishing high-priority peers (super peers) from ordinary peers. Super peers are connected together in a structured way, while ordinary peers are connected with an unstructured topology; an example is Skype.

1.7 Napster
Napster can be considered the first P2P system. It was developed by Shawn Fanning with Sean Parker and released in 1999. It was not really a P2P system, because users were not connected to each other (they had to connect to servers), but it had some characteristics peculiar to P2P systems. The servers stored, in a database, the lists of sharable contents that users had on their PCs. The architecture was a star whose central nodes were the servers, as shown in the following picture:

[Figure: star architecture, with users connected to a server holding the database]

Properties
• Information that users declare: ID, IP address, port number, list of sharable contents.
• Fundamental function: query for a given content.

How it worked
A user who wanted to retrieve some content, such as a song, sent a request to the server. The server looked up the content in the database to find who held it. If someone had it, the server returned to the requesting user all the information about the holder. The two hosts could then exchange the content over a direct connection.

1.8 Gnutella
Gnutella is not an application or a system but a protocol that other applications implement (for example Shareaza, BearShare, LimeWire). The topology is unstructured and there is no distinction among peers: it is serverless. Moreover, each node can both request and distribute content; such peers are called servents. Users are assumed to share content stored on their PCs, so they first have to make their contents known to the network. The purpose of Gnutella is to perform queries in a smart way.
To discover the requested file, a query has to search the lists of contents held by peers. Such a search is performed by flooding: the initial node sends the request only to its neighbors, which forward it to their neighbors, and so on. Flooding does not require any node to have a global view of the network.

1.8.1 Analysis
To analyze a P2P protocol, attention has to be focused on the following aspects:
• join: how users join;
• maintenance: a fundamental task to deal with churning;
• search: discovering some content in the network (a typical task for file sharing applications);
• download: how the file is downloaded once a search succeeds.

Joining
The protocol does not specify a join procedure. Usually a web page holds a list of active or recently active peers. The new user connects to that page and downloads the list. He then tries to contact the users in the list until he finds one of them active. At that point he can open a connection and wait for the acknowledgment. To be contactable, each peer has to declare its ID, IP address and port number. The steps are:
1. contact the web page;
2. download the list of peers;
3. contact a peer (A);
4. wait for the acknowledgment;
5. the new user is now a peer.
Steps 1-4 are called the signalling procedure. Afterwards the new user becomes a peer with, at the beginning, just one neighbor: the peer contacted by means of the web page. In Gnutella, two peers are neighbors when they have established a TCP connection (at that time, using TCP was quite unusual). Since any peer in the list may be contacted, the topology is created randomly.
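A minimal sketch of the join step follows. It assumes the host list has already been downloaded from the web page. A hypothetical `is_alive` callback stands in for opening the TCP connection and waiting for the acknowledgment, since the protocol itself does not specify this procedure. The `bootstrap` name and the shuffling of the list are illustrative choices.

```python
import random
from typing import Callable, Optional

# (peer ID, IP address, port): what each peer declares to be contactable
Peer = tuple[str, str, int]

def bootstrap(host_list: list[Peer], is_alive: Callable[[Peer], bool],
              rng: random.Random) -> Optional[Peer]:
    """Try peers from the downloaded list until one answers.

    Returns the first neighbor, or None if no listed peer is active
    (in that case the service cannot be provided).
    """
    candidates = list(host_list)      # copy: the downloaded list is left untouched
    rng.shuffle(candidates)           # random choice -> randomly created topology
    for peer in candidates:
        if is_alive(peer):            # hypothetical: open TCP connection + wait for ack
            return peer
    return None
```

Shuffling before probing is one simple way to obtain the random topology the notes describe. Contacting the peers in list order would make early entries of the list popular neighbors.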
Maintenance
Once connected, a peer has to discover other neighbors to have good connectivity: if his only neighbor switches off, he is no longer connected to the network. The goals of the maintenance mechanism are:
• to guarantee good connectivity;
• to allow neighbors to be changed (in order to discover peers with more contents).
The second feature, together with churning, makes the overlay change a lot over time. To reach both goals, the following mechanism is provided:
• from time to time, a ping message is sent to check whether neighbors are alive;
• when a ping message is received:
  – the neighbor signals with a pong message that it is alive;
  – the peer forwards the ping to all its neighbors (they answer with a pong only to the peer that forwarded the ping, not to the initial sender);
• when a pong message is received, it is forwarded to the peer that previously sent the corresponding ping.
This mechanism is called flooding or discovery, because the new peer can discover new neighbors thanks to pings and pongs. The algorithm is stopped by the TTL field of both messages, which makes it possible to:
• avoid messages that travel forever in the network;
• discover a part of the topology rather than the whole network.
Each message has an almost unique identifier: it is selected randomly from a large set, so the probability of two messages having the same ID is negligible. A peer therefore must not forward a message (ping or pong) that it has already received. This choice has been made:
• to avoid useless propagation of messages;
• to keep small the cache in which messages are stored (possible only if useless messages are not propagated).
The mechanism does not specify how a new peer behaves once it has discovered new peers through pongs. It may contact all of them, a randomly chosen subset, or a subset chosen by some criterion.
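A minimal sketch of TTL-limited flooding with duplicate suppression follows. It assumes the overlay is given as a hypothetical adjacency dictionary and that pings propagate in breadth-first order, i.e. every hop takes the same time. The `parent` map records, for each reached peer, the neighbor it first heard the ping from. This is the state a peer needs to send a pong (or a query hit) back along the reverse path. The function names are illustrative.

```python
from collections import deque

def flood(graph: dict[str, list[str]], origin: str, ttl: int) -> tuple[dict[str, str], int]:
    """Flood one descriptor (e.g. a ping) from `origin` with the given TTL.

    Returns (parent, sent): `parent` maps every reached peer to the neighbor it
    first received the descriptor from; `sent` counts every transmission,
    duplicates included.
    """
    parent: dict[str, str] = {}
    sent = 0
    queue = deque([(origin, None, ttl)])  # (peer, neighbor it came from, remaining TTL)
    while queue:
        peer, came_from, remaining = queue.popleft()
        if remaining == 0:
            continue                      # TTL exhausted: not forwarded
        for neighbor in graph[peer]:
            if neighbor == came_from:
                continue                  # never sent back to the sender
            sent += 1
            if neighbor == origin or neighbor in parent:
                continue                  # descriptor ID already seen: dropped
            parent[neighbor] = peer
            queue.append((neighbor, peer, remaining - 1))
    return parent, sent

def reverse_path(parent: dict[str, str], origin: str, peer: str) -> list[str]:
    """Hops followed by a pong or query hit from `peer` back to `origin`."""
    path = [peer]
    while path[-1] != origin:
        path.append(parent[path[-1]])
    return path
```

Consider the overlay A-B, A-C, B-C, B-D, D-E. A ping from A with TTL 2 reaches B, C and D with 5 transmissions. Two of them (B to C and C to B) are duplicates that the receivers drop.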
Search mechanism
Like maintenance, search is implemented with flooding. A peer that wants to search for a given file sends a query message to its neighbors; the message contains all the essential information about the file. Nodes that receive the query check whether they have that content:
• if not, they forward the message to their neighbors (as before, the message has a unique ID, so a peer that receives it more than once just ignores the copies);
• if so, they answer with a query hit message.
The node that has the content does not forward the query message any further. The query hit uses the reverse path to reach the initial node. The reverse path is exactly the path followed by the query message. It is extremely important because, as mentioned, no node has a global view of the topology.

Download
When a query succeeds, the requester has to download the file. It can do so because the query hit message contains all the information needed to reach the node that holds the content, in particular:
• IP address;
• peer ID;
• port number.
The download uses the HTTP protocol and takes place directly between the requester and the peer that holds the file. Contents are not distributed over the overlay network; only queries are.

1.8.2 Messages
Messages, or descriptors, implement the functions mentioned above, such as maintenance and search. They are composed of a header (common to all messages) and a payload (different for each function):

Bytes 0-22     Header (23 bytes)
Bytes 23-...   Payload (variable length)

Header
The header is composed of:

Bytes 0-15     Descriptor ID
Byte 16        PT
Byte 17        TTL
Byte 18        Hops
Bytes 19-22    Length

where:
• descriptor ID is the unique identifier;
• PT is the payload type;
• TTL is a counter decremented at each hop;
• Hops is a counter incremented at each hop;
• Length specifies the length of the payload (since the payload is variable, its length is not known a priori).
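The sketch below shows how the 23-byte header above could be packed with Python's standard `struct` module, and how a forwarding peer updates TTL and Hops. The notes do not give the payload-type codes or the byte order of the Length field. The values used here (little-endian Length, 0x00/0x01/0x40/0x80/0x81) are taken from the Gnutella 0.4 specification as an assumption. `make_header` and `forward` are illustrative names.

```python
import os
import struct
from typing import Optional

# Payload-type codes assumed from the Gnutella 0.4 specification
PING, PONG, PUSH, QUERY, QUERY_HIT = 0x00, 0x01, 0x40, 0x80, 0x81

# 16-byte descriptor ID, 1-byte PT, 1-byte TTL, 1-byte Hops, 4-byte Length
HEADER = struct.Struct("<16sBBBI")   # "<": no padding, little-endian integers

def make_header(payload_type: int, ttl: int, payload_len: int, hops: int = 0,
                descriptor_id: Optional[bytes] = None) -> bytes:
    """Pack a header: ID (bytes 0-15), PT (16), TTL (17), Hops (18), Length (19-22)."""
    if descriptor_id is None:
        descriptor_id = os.urandom(16)   # random ID: duplicates are very unlikely
    return HEADER.pack(descriptor_id, payload_type, ttl, hops, payload_len)

def forward(header: bytes) -> Optional[bytes]:
    """Header as re-sent by a peer, or None when the decremented TTL reaches 0."""
    descriptor_id, payload_type, ttl, hops, length = HEADER.unpack(header)
    if ttl <= 1:
        return None
    return HEADER.pack(descriptor_id, payload_type, ttl - 1, hops + 1, length)
```

The descriptor ID is copied unchanged when forwarding. Peers rely on this to recognize and drop duplicates.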
Payload
Ping
This message has no payload.

Pong
Bytes 0-1      Port number
Bytes 2-5      IP address
Bytes 6-9      Number of shared files
Bytes 10-13    Number of shared KB

The last two fields represent the sharing capability of the node (in number of files and KB). This information helps to decide which peers it is convenient to connect to.

Query
Bytes 0-1      Minimum speed
Bytes 2-...    Search criterion (variable length)

where:
• minimum speed is the minimum rate at which the peer wants to receive the file (measured in kbit/s);
• search criterion contains the information used to search for the content. The protocol says nothing about it, so each application can specify its own policy. This is a good choice, because the more general the search criterion, the easier the search.

Query Hit
Byte 0         Number of hits
Bytes 1-2      Port number
Bytes 3-6      IP address
Bytes 7-10     Speed
Bytes 11-...   Result set (variable length)
Last 16 bytes  Servent ID

where:
• number of hits is how many contents satisfy the search;
• speed is the speed (in kbit/s) of the responding servent, to be compared with the minimum speed of the query;
• result set contains, for each hit:

  Bytes 0-3      File index
  Bytes 4-7      File size
  Bytes 8-...    File name (variable length)

Push
If the node that holds the file is behind a firewall, the requester cannot contact it. In this situation the requester sends a push message to its neighbors. The message is routed back along the path followed by the query hit. Once it reaches the holder, the connection between the two servents is opened by the holder and not by the requester. A push message is composed of:

Bytes 0-15     Servent ID
Bytes 16-19    File index
Bytes 20-23    IP address
Bytes 24-25    Port number

1.8.3 Characteristics
Network's aspects
From the network point of view, the main characteristics to keep in mind are:
• scalability with the number of peers: the system scales very well because it is completely distributed;
• robustness with respect to churning/failures: the system is very robust to both churning and failures, because maintenance is based on flooding and connectivity is very high.

User's aspects
From the users' point of view, the main characteristic of interest is efficiency, i.e. the response time.
It depends on the popularity of the content:
• if the content is very or quite popular, the hit will probably happen before the TTL reaches 0;
• if it is not popular, there is no guarantee of finding the content before the TTL reaches 0.
In the first case efficiency is guaranteed; in the second it is not.

Costs
There is a lot of traffic to deal with, so the protocol is extremely costly from the network point of view. This is the main drawback of Gnutella. For users, the algorithm is simple and cheap in terms of consumed resources, since little storage is devoted to the protocol. The only things to manage are:
• neighbors;
• cache.

1.8.4 Performance evaluation
To evaluate Gnutella's performance, the analysis focuses on the flooding procedure. Flooding builds a sort of tree in which each level corresponds to a different step:

[Figure: flooding tree rooted at A; A contacts B, C and D, which in turn contact E, F, G, H, I, L at the next step]

The analysis uses the following parameters:
• κ is the number of neighbors of each peer, assumed constant (in the previous picture κ = 3: for example, A can contact B, C and D, while C can contact G, H and A);
• H is the number of hops, i.e. the depth (number of levels) of the tree;
• N is the number of peers;
• T is the average time to contact a peer; it is a random variable that depends on:
  – layers 3-4;
  – physical distance;
  – number of routers crossed;
  – possible congestion in the network;
• p is the popularity of the file, i.e. the probability that a given peer holds that content.

Number of contacted peers
Since κ is constant, at each level of the tree each node can contact exactly κ other nodes. To approximate the number c of contacted peers, the following assumptions are made:
• common neighbors are neglected. At the second level, $\kappa(\kappa-1)$ peers are therefore contacted (each son contacts all its neighbors except its father), and in general $\kappa(\kappa-1)^{i-1}$ at level i;
• the value $\kappa(\kappa-1)$ is approximated by $\kappa^2$, and in general $\kappa(\kappa-1)^{i-1}$ by $\kappa^i$.
In conclusion, the number of contacted peers is:

$$c = \underbrace{\kappa}_{\text{1st step}} + \underbrace{\kappa^2}_{\text{2nd step}} + \underbrace{\kappa^3}_{\text{3rd step}} + \dots + \underbrace{\kappa^H}_{H\text{-th level}} = \sum_{i=1}^{H} \kappa^i$$

Example
Taking realistic values for H and κ:

$$\kappa = 4,\ H = 7 \implies c = \sum_{i=1}^{7} 4^i = 21844 \simeq 22\text{k}$$

If the message is a ping, peers answer with a pong. In a scenario like this one, each ping therefore causes about 44k messages to be exchanged.

Time needed to contact peers
Computing it requires an assumption: at each level of the tree, the time to contact peers (from a father node to its sons) is fixed and equal to T. Implicitly, the time required to send messages sequentially is negligible compared with the time needed to reach neighbors. Under this assumption, the propagations at each level of the tree happen in parallel, so:

$$\mathrm{Avg}\{\text{time}\} = \underbrace{H}_{\text{number of hops}} \cdot \underbrace{T}_{\text{time to cross a hop}}$$

In a time $H \cdot T$, the $\kappa^H$ nodes of the last level are reached.

Example

$$H = 7,\ T \simeq 200\ \text{ms} \implies \mathrm{Avg}\{\text{time}\} = 7 \cdot 0.2 = 1.4\ \text{s}$$

Therefore, responses from a huge number of peers are received quite quickly.

Probability of not finding a content
This is an inefficiency of the system perceived by users. Each peer independently holds a content of popularity p with probability p, so on average there are $N \cdot p$ copies of it. If c peers are contacted, the probability of not finding the content is:

$$P(\text{not find}) = (1-p)^c$$

Choose a target F that $P(\text{not find})$ must stay below:

$$P(\text{not find}) < F \iff (1-p)^c < F$$

Take the logarithm. Since $\log(1-p) < 0$, the inequality flips when dividing by it:

$$c \cdot \log(1-p) < \log F \implies c > \frac{\log F}{\log(1-p)}$$

Example
Considering κ = 4:

Value of H    1    2    3    4     5      6      7
Value of c    4    20   84   340   1364   5460   21844

Keeping κ = 4 and F = 0.01:

p = 0.05 (5%)  ⟹  c > 89.8 (so c ≥ 90): take H = 4
p = 0.01 (1%)  ⟹  c > 458.2 (so c ≥ 459): take H = 5
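The counts in the table and the thresholds in the example can be reproduced with a few lines of Python. `contacted_peers_exact` is an addition not found in the notes. It still neglects common neighbors but counts $\kappa(\kappa-1)^{i-1}$ peers per level instead of $\kappa^i$, to show how generous that approximation is. All function names are illustrative.

```python
import math

def contacted_peers(kappa: int, hops: int) -> int:
    """Approximate number of contacted peers: c = sum_{i=1..H} kappa**i."""
    return sum(kappa ** i for i in range(1, hops + 1))

def contacted_peers_exact(kappa: int, hops: int) -> int:
    """Tree count without the kappa**i approximation (common neighbors still
    neglected): kappa * (kappa - 1)**(i - 1) peers at level i."""
    return sum(kappa * (kappa - 1) ** (i - 1) for i in range(1, hops + 1))

def min_peers(p: float, target: float) -> float:
    """Threshold on c such that (1 - p)**c < F, i.e. c > log F / log(1 - p)."""
    return math.log(target) / math.log(1 - p)

def min_hops(kappa: int, p: float, target: float) -> int:
    """Smallest H whose number of contacted peers exceeds the threshold."""
    threshold = min_peers(p, target)
    hops = 1
    while contacted_peers(kappa, hops) <= threshold:
        hops += 1
    return hops
```

For κ = 4 and H = 7, `contacted_peers_exact` gives 4372 instead of 21844. $\kappa^i$ therefore acts as an upper bound rather than a close estimate at large depths.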
Performance
Here, performance mainly means the average number of hops needed before the first hit. Let $q_i = 1-(1-p)^{\kappa^i}$ be the probability that at least one of the $\kappa^i$ peers reached at hop i holds the file. Then:

$$P(1) = P(\text{find the file at the first hop}) = q_1 = 1-(1-p)^{\kappa}$$

Proceeding in the same way:

$$P(2) = (1-q_1)\cdot\left[1-(1-p)^{\kappa^2}\right]$$

$$P(3) = (1-q_1)(1-q_2)\cdot\left[1-(1-p)^{\kappa^3}\right]$$

and in general $P(i) = q_i \prod_{j<i}(1-q_j)$.

The average time to send a request is:

$$\left(\sum_{i=1}^{H} i \cdot P(i)\right) \cdot T$$

The average time to receive an answer is:

$$\left(\sum_{i=1}^{H} i \cdot P(i)\right) \cdot 2T$$

1.9 Chord
Chord is a structured system (at the overlay level). Since the topology is fixed, churning is a big issue. The choice of the topology is therefore very relevant. It cannot be a star, because a P2P system in general has no role distinctions like the one the star topology introduces with its central node. Moreover, some regular structured topologies are not suitable either, since they introduce a notion of priority based on geographical position. The topology actually used is a ring.
Attention must be focused on the P2P technology, so the application and network layers are not considered. As a stack:

Application
P2P Technology
Layer 3/4

The P2P technology concerns features like overlay creation and maintenance, the join operation and the management of messages.
Chord, like Gnutella, is a protocol. However, it distributes the information about where contents are located rather than the requests for a given file. For example, the peer that knows where a certain content is located may not be the one holding it: the two aspects are completely separate.

1.9.1 Analysis
A regular structure like the ring implicitly gives knowledge of the distance between nodes. This is very useful for the join operation: a new peer that wants to connect just has to know in which position it should be placed. This distance does not reflect physical distance, which would be too complex to manage.
Moreover, physical distance would introduce differences from one peer to another. If the application running this protocol became very popular in a given country, the nodes of that country would be physically close to each other compared with nodes in other countries, so the density would be uneven. Defining distance at the overlay level, instead, allows peers that are physically very far apart to be neighbors.
Nodes are placed on the ring by applying a function F to a list of information about the peer. The outcome is deterministic and uniformly distributed over an interval. It is an m-bit number, so the ring has $2^m$ positions and identifiers range over $[0, 2^m - 1]$.

Peer info → F → Node Id

The function F is implemented with a cryptographic hash (SHA-1), because it:
• makes it hard to recover the peer information from the Node Id;
• maps a lot of information into a uniformly distributed space, avoiding clustering of peers;
• is deterministic, even though its mapping over the interval $[0, 2^m - 1]$ looks random: two identical inputs always give the same output (while two different inputs may, rarely, collide).
The Node Id is the final position of the peer on the ring. In this topology each peer has just two neighbors, called predecessor (i − 1 in the following picture) and successor (i + 1 in the picture). Neighbors are thus the closest active peers of the considered node i.

[Figure: identifier ring from 0 to 2^m − 1, with node i between its predecessor i − 1 and its successor i + 1]

Join
So far, the join operation consists of the following steps:
• the new node applies the function F to its peer information and obtains its own position (N7);
• it must know another peer and contact it (N24);
• this peer contacts its successor, and so on, until the right position of the new node is reached;
• when the successor and predecessor of the new node are found, the connections are established and the node becomes a peer.

[Figure: new node N7 joining the ring through N24]

How information is distributed
Unlike in Gnutella, in Chord the information about where contents are located is distributed among peers. Peers handle this information through keys, generated by applying a function G to metadata (data that describe the content in synthetic form):

Metadata → G → Key

Keys are generated with the same properties as Node Ids, so they are uniformly distributed over the same interval $[0, 2^m - 1]$. Note that F and G start from different inputs (peer information and metadata), yet they map their different outputs (Node Ids and keys) into the same interval.
Each key is assigned to the first peer whose Node Id is equal to or follows the key value on the ring (its successor).

Queries
When node N wants to retrieve a content, it runs G over the metadata to obtain the key. N knows only its neighbors, so it forwards the query to its successor, which passes it on in turn. Sooner or later, the peer holding the searched key is found this way. With n peers in total, the expected number of hops to find the one holding the key is n/2; this holds because both keys and Node Ids are uniformly distributed. The complexity is therefore quite high compared with Gnutella, but Chord guarantees that the content is found (in Gnutella this is not guaranteed).

Shortcuts
The query process is improved by using shortcuts. In practice, each node knows not only its neighbors but the location of more peers. These peers are not chosen randomly but with a specific rule: each hop must halve the space in which the searched key can still be.

[Figure: shortcuts halving the remaining search space on the ring]
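Before shortcuts are examined in detail, the key-to-node rule and the lookup without shortcuts, described under Queries, can be sketched as follows. The sketch assumes m = 6 and the node set of the example ring of Section 1.9.2 (N4, N8, N14, N21, N32, N39, N42, N48, N51, N56). Reducing the 160-bit SHA-1 digest modulo $2^m$ is an illustrative choice, not something the notes prescribe. The function names are hypothetical.

```python
import hashlib

M = 6  # identifier bits, as in the example of Section 1.9.2

def ring_id(data: str, m: int = M) -> int:
    """F or G: SHA-1 digest reduced (by assumption, modulo 2**m) to the ring [0, 2**m - 1]."""
    digest = hashlib.sha1(data.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

def successor(node_ids: list[int], key: int) -> int:
    """Node in charge of `key`: first Node Id >= key, wrapping around past 2**m - 1."""
    ring = sorted(node_ids)
    for node in ring:
        if node >= key:
            return node
    return ring[0]

def linear_lookup(node_ids: list[int], start: int, key: int) -> tuple[int, int]:
    """Follow successor pointers from `start` to the node in charge of `key`.

    Returns (responsible node, number of hops).
    """
    ring = sorted(node_ids)
    target = successor(ring, key)
    node, hops = start, 0
    while node != target:
        node = ring[(ring.index(node) + 1) % len(ring)]   # move to the successor
        hops += 1
    return target, hops
```

On the example ring, a query for K54 issued by N8 needs 8 successor hops to reach N56. The finger tables discussed next reduce this to 3.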
The main advantage of shortcuts is that the search becomes dichotomic instead of linear (complexity n), so its complexity is log n. The main drawback is that a sort of routing table is required, called the finger table in Chord. For a given node N, the table has m entries and is built as follows (identifiers taken modulo $2^m$):

Index   Value            Successor
1       N + 2^0          successor(N + 1)
2       N + 2^1          successor(N + 2)
3       N + 2^2          successor(N + 4)
...     ...              ...
i       N + 2^(i−1)      successor(N + 2^(i−1))
...     ...              ...
m       N + 2^(m−1)      successor(N + 2^(m−1))

The value of m is critical. If it is large, the probability of conflicts (the same output obtained from different inputs) is negligible. On the other hand, high values of m imply:
• a large number of bits;
• a long finger table.

1.9.2 Example
Consider the following picture, with m = 6 bits and therefore 2^6 = 64 identifiers:

[Figure: ring with nodes N4, N8, N14, N21, N32, N39, N42, N48, N51, N56 and keys K10, K24, K30, K38, K54]

Consider the case in which N8 is looking for K54. The finger table of N8 is:

Index   Value             Successor
1       8+1 = 9           N14
2       8+2 = 10          N14
3       8+4 = 12          N14
4       8+8 = 16          N21
5       8+16 = 24         N32
6       8+32 = 40         N42

The query is forwarded to N42, the closest peer preceding the key. The finger table of N42 is:

Index   Value             Successor
1       42+1 = 43         N48
2       42+2 = 44         N48
3       42+4 = 46         N48
4       42+8 = 50         N51
5       42+16 = 58        N4
6       42+32 = 74 ≡ 10   N14

The closest preceding peer is now N51, whose finger table is:

Index   Value             Successor
1       51+1 = 52         N56
2       51+2 = 53         N56
3       51+4 = 55         N56
4       51+8 = 59         N4
5       51+16 = 67 ≡ 3    N4
6       51+32 = 83 ≡ 19   N21

The key lies between N51 and its successor N56 (54 falls between the values 53 and 55), so the peer selected is N56: the key is found in three hops.

Join procedure with shortcuts
A new node that wants to join the P2P application runs the function F to discover its Node Id; assume it is N26. In the example, it has to be placed between N21 and N32.
Suppose, for example, that it contacts N4 to discover its successor and predecessor: the search is carried out with shortcuts, exactly like a query. First the successor of N26 (N32) is found; then, by contacting N32, it is possible to discover N21, which will be the predecessor of N26 but, for the moment, is the predecessor of N32. After this preliminary step, all finger tables have to be updated.
Procedure
1. ask some nodes to retrieve successor(n) and predecessor(n);
2. create the finger table of n and update the finger tables of the other nodes (the update operation is very complex);
3. redistribute the keys.
1.9.3 Issues
A consistency problem may arise while finger tables are being updated: for example, if a node searches for a key held by node N, but the finger tables that should point to N have not been updated yet, the content will not be found.
Another issue is the failure of a peer. When a peer simply switches off, notifications are sent to the other nodes; but if a node fails, who sends the notifications?
To avoid some of those issues, it is possible to introduce some redundancy: each node maintains a list of several successors instead of knowing only one predecessor and one successor. If, for some reason, the immediate successor fails, the node contacts one of the other successors.
Stabilization procedure
It is run periodically: each peer n asks its successor n + 1 who its predecessor is. If the answer is n, then n is indeed the predecessor of n + 1. Otherwise, if the answer is p, two anomalies are possible:
1. p > n (order on the ring: n, p, n + 1): the information held by n is wrong, and n has to update its finger table because its successor is p and not n + 1;
2. p < n (order on the ring: p, n, n + 1): the information held by n + 1 is wrong, and n + 1 has to update its finger table because its predecessor is n and not p.
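A Python sketch of the stabilization step, written in the style of the stabilize/notify pair of the original Chord paper (assumed here; the notes only describe the two anomalies). The `Node` class and the three-node ring are hypothetical; the run reproduces the join of N26 between N21 and N32.

```python
def between(x, a, b):
    """Is x strictly inside the circular interval (a, b)?"""
    if a < b:
        return a < x < b
    return x > a or x < b  # wraps around 0 (a == b means the whole ring but a)

class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self
        self.predecessor = None

    def stabilize(self):
        # ask the successor for its predecessor p
        p = self.successor.predecessor
        # anomaly 1: p lies between us and our successor, so p is our real successor
        if p is not None and between(p.id, self.id, self.successor.id):
            self.successor = p
        # tell the (possibly new) successor that we may be its predecessor
        self.successor.notify(self)

    def notify(self, candidate):
        # anomaly 2: candidate is closer than the predecessor recorded so far
        if self.predecessor is None or between(candidate.id, self.predecessor.id, self.id):
            self.predecessor = candidate

# existing ring N4 -> N21 -> N32 -> N4
n4, n21, n32 = Node(4), Node(21), Node(32)
n4.successor, n21.successor, n32.successor = n21, n32, n4
n4.predecessor, n21.predecessor, n32.predecessor = n32, n4, n21

# N26 joins: a lookup told it that its successor is N32
n26 = Node(26)
n26.successor = n32

for _ in range(2):  # two rounds of periodic stabilization
    for node in (n4, n21, n26, n32):
        node.stabilize()

print(n21.successor.id, n26.predecessor.id, n26.successor.id, n32.predecessor.id)
```

In the first round N32 learns that its predecessor is N26 (anomaly 2); in the second round N21 discovers N26 as its new successor (anomaly 1), and the ring is consistent again.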
1.9.4 Load balance
The amount of work each peer has to deal with depends on how keys are associated with nodes. Let A be the predecessor of peer B on the ring, and let

$$x = \frac{B - A}{2^m}$$

This parameter x is simply the fraction of the ring that peer B is in charge of: the larger x is, the larger the number of keys that can be assigned to B, so that node has to deal with a larger amount of work. In other words, x is also the probability that B stores a given key: since keys are uniformly distributed in the space (normalized to $[0, 1]$ in the original figure), the probability of holding a key is proportional to the portion of space a node is in charge of.
Assuming that there are κ keys in the system, the probability that B holds no keys is:

$$P(B \text{ has no keys}) = (1 - x)^{\kappa}$$

while the probability that B holds exactly i keys is:

$$P(B \text{ has } i \text{ keys}) = \binom{\kappa}{i} x^{i} (1 - x)^{\kappa - i}$$

[Figure: distribution $f_B(n)$ of the number of keys per node, with region 1 on the left tail and region 2 on the right tail.]
Region 1 represents nodes that hold few keys, while region 2 describes peers with a huge amount of work to deal with; since the distribution is concentrated around its mean with a low variance, the load is assigned quite fairly to nodes. The mean number of keys stored in peer B is:

$$E[\#\text{keys}] = \kappa \cdot x$$

and, if there are N active peers, since they are uniformly distributed on the ring, on average $x = 1/N$. Therefore:

$$E[\#\text{keys}] = \kappa \cdot \frac{1}{N} = \frac{\kappa}{N}$$

An assignment that is fair on average is not necessarily good: if, for example, peer A has much more bandwidth than peer B, it would be better to assign more keys to A in order to provide a better service to all users.
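The binomial model above can be checked numerically with a short Python sketch; the values κ = 1000 keys and N = 50 peers are arbitrary, and `p_keys` is a hypothetical helper name.

```python
from math import comb

def p_keys(i, kappa, x):
    """P(a node owning a fraction x of the ring stores exactly i of kappa keys)."""
    return comb(kappa, i) * x ** i * (1 - x) ** (kappa - i)

kappa, n_peers = 1000, 50
x = 1 / n_peers  # average fraction of the ring per peer

pmf = [p_keys(i, kappa, x) for i in range(kappa + 1)]
mean = sum(i * p for i, p in enumerate(pmf))
std = sum((i - mean) ** 2 * p for i, p in enumerate(pmf)) ** 0.5

print(f"P(no keys) = {pmf[0]:.2e}")
print(f"E[#keys] = {mean:.2f} (kappa/N = {kappa / n_peers})")
print(f"std = {std:.2f}")
```

The mean matches κ/N = 20, and the standard deviation is the binomial value $\sqrt{\kappa x (1 - x)} \approx 4.4$ keys.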
1.9.5 Comparison between Chord and Gnutella

Property | Chord | Gnutella
scalability | very good | very good
robustness (to churn) | poor | very good
overlay maintenance | complex / less costly | simple / costly
performance (users) | service guaranteed | no service guaranteed
responsiveness | O(log n) | O(H)
performance (network) | efficient (shortcuts), O(log n) | inefficient (flooding), O(κ^H)
node: complexity | small | very small
node: storage size | order of m | order of κ
node: load | balanced | depends on κ
node: contents | no user dependency | user dependency

Robustness in Chord is poor because routing is deterministic (shortcuts): if churn is high, updating the finger tables leads to consistency problems. Indeed, structured systems suffer from an intrinsic issue due to the fact that peers have a fairly large knowledge of the topology: the state information is large, and it has to be very accurate, otherwise the system will not be reliable.
The response time is similar for both protocols, but they are not really comparable, because one is a structured system and the other an unstructured one: Chord uses deterministic routing to find contents, while Gnutella uses flooding.
1.10 CAN
CAN (Content Addressable Network) uses the same basic approach as Chord: peers are mapped, thanks to a hash function, onto the same space as keys. The main difference is that the space is not one-dimensional as in Chord, but can have d dimensions.
Peer info → F → Node Id
Contents → G → Keys
For example, with d = 2 the space has two dimensions, identified by two coordinates (x, y). Keys are assigned to peers on the basis of their position: the space is divided fairly among peers and each one controls its own region. This implies that all keys placed in a given region are assigned to the peer in charge of that region. [Figure: 2-d CAN space partitioned into regions; peers in blue, keys in orange.]
1.10.1 Routing
When a peer is looking for a given key, it follows the shortest path towards the peer in charge of the region where the key is placed. Implicitly, this means that peers have detailed knowledge of their neighbors (stored in a routing table): to select the shortest path, each peer has to choose among its neighbors the one that brings the query closest to the key.
1.10.2 Join
Once a new host has run the hash function, it knows its final position in the space. First it has to download a list of active peers (from a web page, for example). Then it contacts one of them: this node, by contacting its neighbors, determines the position of the new peer in the same way in which queries are performed. When the right position is found, the node in charge of that region has to partition it, assigning a portion to the new node. Regions describe the load each peer deals with: a large region means a high load. [Figure: region of peer A before and after the arrival of peer B (marked in yellow).]
At first, peer A was in charge of a large area containing 2 keys. After the arrival of peer B, the area has been reduced, and A and B each have to deal with one key. In practice, step 3 of the Chord join procedure (redistribution of keys, page 22) is performed implicitly, simply by dividing the area.
It can happen that the hash function returns very similar values for two different peers: in this scenario, one of the two nodes may be in charge of a region it does not physically belong to. [Figure: B is in charge of the yellow region although it does not belong to it.] This happens because the algorithm tries to obtain a fair distribution of the load and, therefore, to divide areas regularly.
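A Python sketch of how a join splits a region in a 2-d CAN and thereby redistributes keys without any extra step. The rule "halve the longest side" and the `Zone` class are assumptions for illustration; coordinates are normalized to [0, 1). The example mirrors the figure: A owns the whole space with two keys, and after B joins each owns one.

```python
from dataclasses import dataclass, field

@dataclass
class Zone:
    lo: tuple  # lower corner, e.g. (0.0, 0.0)
    hi: tuple  # upper corner (exclusive), e.g. (1.0, 1.0)
    keys: list = field(default_factory=list)

    def contains(self, point):
        return all(l <= c < h for l, c, h in zip(self.lo, point, self.hi))

    def split(self):
        """Halve the zone along its longest side; keys follow their coordinates."""
        sides = [h - l for l, h in zip(self.lo, self.hi)]
        dim = sides.index(max(sides))
        mid = (self.lo[dim] + self.hi[dim]) / 2
        lower_hi = list(self.hi)
        lower_hi[dim] = mid
        upper_lo = list(self.lo)
        upper_lo[dim] = mid
        kept = Zone(self.lo, tuple(lower_hi))
        given = Zone(tuple(upper_lo), self.hi)
        for k in self.keys:
            (kept if kept.contains(k) else given).keys.append(k)
        return kept, given

zone_a = Zone((0.0, 0.0), (1.0, 1.0), keys=[(0.2, 0.7), (0.8, 0.3)])
zone_a, zone_b = zone_a.split()  # peer B joins inside A's region
print(zone_a)
print(zone_b)
```

The keys are never moved explicitly: each one simply ends up in whichever half contains its coordinates.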
1.10.3 Performance
The complexity of a query or of a join can be evaluated through the average path length:

$$\overline{L} = \frac{d}{4} \cdot n^{1/d}$$

The formula says that, to keep the complexity low, d must be sufficiently large; but large values of d mean many dimensions and, therefore, many neighbors to contact each time a message is sent. The parameter d is much more critical than the parameter m analyzed in Chord: the complexity in Chord grows as log n independently of m, while the complexity of CAN depends directly on d.
1.10.4 Node departure and failures
When a node leaves, notifications must be sent to its neighbors in order to decide which of them has to take care of the leaving peer's region. Periodically, peers send messages to their neighbors containing information about themselves, among which the size of their area. The criterion peers use to take over a region is simply: the neighbor with the smallest area becomes the new owner. This is done to keep the space uniformly divided.
When a message is sent and a timeout expires without any answer, the peer realizes that a problem has occurred. To recover, a timer is started and the peer waits for further information about the neighbor that seems to have failed. If nothing arrives, the takeover procedure takes place. The timer is proportional to the area owned by the node that starts it (a neighbor of the failed node), so nodes in charge of a small area enter the recovery procedure quickly. The takeover consists of:
. sending takeover messages to all neighbors of the node assumed to have failed (this implies that each peer also knows the neighbors of its neighbors);
. assigning the area of the failed node to someone.
All these management mechanisms are asynchronous and are provided only in structured systems, which are very complex to manage.
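The path-length formula can be sanity-checked with a Python sketch that assumes an idealized CAN: the d-dimensional space is a torus split into $n = k^d$ equal regions, and each hop moves into the adjacent region that is one step closer to the destination (greedy routing). Under these assumptions, averaging over all source/destination pairs gives exactly $\frac{d}{4} n^{1/d}$ when k is even; the function names are hypothetical.

```python
import itertools

def greedy_hops(src, dst, k):
    """Hops of greedy routing on a k x ... x k torus of equal regions."""
    cur, hops = list(src), 0
    while tuple(cur) != tuple(dst):
        for i in range(len(cur)):
            if cur[i] != dst[i]:
                forward = (dst[i] - cur[i]) % k
                step = 1 if forward <= k - forward else -1  # shorter way round
                cur[i] = (cur[i] + step) % k
                hops += 1
                break
    return hops

def average_path_length(d, k):
    regions = list(itertools.product(range(k), repeat=d))
    total = sum(greedy_hops(s, t, k) for s in regions for t in regions)
    return total / len(regions) ** 2

def can_formula(d, n):
    return d / 4 * n ** (1 / d)

for d, k in [(1, 64), (2, 8), (3, 4)]:
    n = k ** d  # always 64 regions
    print(d, n, average_path_length(d, k), round(can_formula(d, n), 3))
```

With n fixed at 64 regions, the average path drops from 16 hops (d = 1) to 4 (d = 2) and 3 (d = 3), at the price of more neighbors per region.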
1.11 Tapestry
Tapestry adopts the same method as Chord and CAN: peers and keys are mapped onto the same space. The peculiarity is that identifiers are 160 bits long, organized into 40 hexadecimal digits. Distances among nodes are computed by comparing the digits of their identifiers, starting from the most significant one; for example, considering node 4227:
. node 4228 has distance 1, so it is a Layer 4 neighbor (only the last digit differs);
. node 42A2 has distance 2, so it is a Layer 3 neighbor (the last 2 digits differ);
. node 43C9 has distance 3, so it is a Layer 2 neighbor (the last 3 digits differ);
. node 6FA0 has distance 4, so it is a Layer 1 neighbor (all 4 digits differ).
Therefore:
. Layer 4: 422x;
. Layer 3: 42xx;
. Layer 2: 4xxx;
. Layer 1: xxxx;
where x ∈ [0, F]. Each peer knows the nodes close to it in great detail, while its knowledge becomes coarser further away: this mechanism is called mesh routing and allows complexity to be reduced.
Routing
It is very similar to longest prefix match: if peer 5230 queries 42A1, the route is
5230 → 400F (L1) → 4277 (L2) → 42A2 (L3) → 42A1 (L4).
The search space shrinks as the query goes deeper into the layers, but this advantage has a cost: the maintenance of tables that can be large. If β is the base of the digits, the complexity is $O(\log_\beta n)$.
It can happen that a table is not completely full, meaning that some digits are not associated with any peer. This is risky because the algorithm was designed for a stable set of peers, which implies that it is not robust to churn.
1.12 BitTorrent
BitTorrent is a very popular system and it is somewhat different from the systems mentioned previously. The objective is to distribute very large files to a potentially large number of customers. Its peculiar feature is that the content is not stored by a single user but is distributed among peers, which share their bandwidth with each other to download it. The overlay is therefore designed for this purpose and not for performing queries.
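A Python sketch of prefix routing over the node IDs of the example (4 hexadecimal digits instead of 40, as in the notes). Each hop forwards to a known node that shares at least one more leading digit with the destination. Choosing the candidate that resolves the fewest extra digits, with ties broken by the smallest ID, is a hypothetical stand-in for a real routing table, so the intermediate hops may differ from the figure; the sketch also assumes the destination is among the known nodes.

```python
def shared_prefix(a, b):
    """Number of leading hexadecimal digits two IDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route(src, dst, nodes):
    path = [src]
    while path[-1] != dst:
        level = shared_prefix(path[-1], dst)
        candidates = [n for n in nodes if shared_prefix(n, dst) > level]
        # resolve one more digit per hop when possible (hypothetical tie-break)
        path.append(min(candidates, key=lambda n: (shared_prefix(n, dst), n)))
    return path

NODES = ["5230", "400F", "4228", "4277", "42A2", "42A1", "43C9", "6FA0"]
print(" -> ".join(route("5230", "42A1", NODES)))
```

Each hop fixes one more digit, so a route needs at most as many hops as there are digits, in line with the $O(\log_\beta n)$ complexity.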
The content is divided into small pieces called chunks: to consume the file, all of them have to be downloaded, so, from a peer's point of view, they are all equally important. The usual chunk size is around 64-256 kB: chunks are quite small. The neighborhood (overlay) is established randomly, and peers are forced both to download (new chunks) and to upload (the chunks they hold). Transfers use TCP.
1.12.1 Analysis
The distributor that wants to share the file has to create a .torrent file by means of a hash function: the .torrent is simply a file that indexes all chunks, including the hash values that guarantee the correctness of the chunks and, therefore, of the file. The .torrent also contains other information, including the file name, the file size, the number of chunks into which the file is divided, and the address of the tracker. After creating the .torrent, the distributor has to upload it to a website, from which peers can download it and start receiving the file.
A central authority maintains the list of active peers that are sharing the content: it is called the tracker. The tracker is not part of the overlay; its purpose is just to help peers download the file and, for reliability, it is better to have more than one tracker managing the overlay of each file.
[Figure: 1. the distributor uploads the .torrent to the website; 2. peer A requests it; 3. A downloads the .torrent; 4. A contacts the tracker; 5. the tracker returns the list of peers.]
The list downloaded from the tracker usually contains 40 peers: they will become the neighborhood of peer A.
Definitions
. seeders: peers that hold the whole content; they are very important for the proper behaviour of the system because every chunk can be downloaded from a seeder;
. leechers: peers that hold only part of the content;
. swarm: all the peers (seeders and leechers) that share the file;
. choked peers: nodes that are not allowed to receive content from a given peer;
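A Python sketch of what the distributor computes when creating the .torrent: the file is cut into fixed-size chunks and each chunk is hashed, so that any peer can verify a chunk it has downloaded. The plain dictionary is a simplified stand-in for the real bencoded .torrent format; SHA-1 as the chunk hash follows the BitTorrent specification, while the 256 KiB chunk size and the tracker URL are hypothetical.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # bytes; hypothetical chunk size

def make_torrent(name, data, tracker_url, chunk_size=CHUNK_SIZE):
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return {
        "announce": tracker_url,  # address of the tracker
        "name": name,
        "length": len(data),      # file size in bytes
        "chunk_size": chunk_size,
        "num_chunks": len(chunks),
        "chunk_hashes": [hashlib.sha1(c).digest() for c in chunks],
    }

def chunk_is_valid(torrent, index, chunk):
    """A peer recomputes the hash of a received chunk and compares it."""
    return hashlib.sha1(chunk).digest() == torrent["chunk_hashes"][index]

data = bytes(range(256)) * 3000  # 768,000 bytes of sample content
meta = make_torrent("sample.bin", data, "http://tracker.example/announce")
print(meta["num_chunks"])                                  # 3
print(chunk_is_valid(meta, 0, data[:CHUNK_SIZE]))          # True
print(chunk_is_valid(meta, 0, b"x" + data[1:CHUNK_SIZE]))  # False
```

A corrupted or forged chunk fails the comparison and can simply be downloaded again from another peer.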
. unchoked peers: nodes that are allowed to receive content from a given peer.
Among the 40 peers in the list downloaded from the tracker, the node selects just 4: they are the ones it actually exchanges data with.
1.12.2 Policies
This section describes the policies a peer uses to select the 4 nodes to exchange traffic with and to select the chunks to download.
Selection of chunks
Peers distribute a map that shows which chunks they hold; this map is sent to the peer's neighbors, so they can decide which chunk should be downloaded. The policy is simple: the rarest chunk is selected, for two reasons:
. to avoid the risk that a rare chunk disappears from the network;
. to speed up the download.
Chunks are subdivided into sub-blocks of around 10 TCP packets (∼16 kB). If several neighbors have the same chunk, it is possible to open several TCP connections to download sub-blocks in parallel (typically 5 at a time). In this way a higher download rate is expected, because more bandwidth is aggregated: if the connection used to download one sub-block is very slow, its effect on the overall rate is mitigated by the other connections.
Selection of peers
Actually, BitTorrent introduces two overlays:
. one made of the list of 40 peers downloaded from the tracker (green peers);
. a second one that contains the 4 peers (marked in orange) that a given peer (the blue one) is in contact with.
[Figure: overlay 2 (4 active peers) and overlay 1 (40 known peers) on top of the physical network.]
The selection is based on the tit-for-tat technique: it depends on how much peers contributed in the past. The global advantage is that connections with large bandwidth are favoured; the local advantage is that the system forces each peer to share more, because in this way it receives a better service (this discourages free riders: peers that only want to download without contributing). In conclusion, tit-for-tat:
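A sketch of rarest-first chunk selection in Python, assuming each neighbor's map of chunks is available as a set of chunk indices; the function name and the tie-break on the lowest index are hypothetical.

```python
from collections import Counter

def rarest_first(my_chunks, neighbour_maps):
    """Return the missing chunk advertised by the fewest neighbours (or None)."""
    availability = Counter()
    for chunks in neighbour_maps.values():
        availability.update(chunks)
    missing = [c for c in availability if c not in my_chunks]
    if not missing:
        return None
    return min(missing, key=lambda c: (availability[c], c))

maps = {"a": {0, 1, 2}, "b": {1, 2}, "c": {1, 3}}
print(rarest_first({0}, maps))  # 3: only neighbour c has it
```

Chunk 1 is held by all three neighbors, so it can safely wait; chunk 3 would be lost if c left, so it is fetched first.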
. improves cooperation among peers;
. provides fairness.
Because of tit-for-tat, peers are divided into choked and unchoked: if a node has contributed very little in the past, it will probably be put in the choked list. Each peer has its own list, recomputed every time window (10 s, for example), in which nodes are ordered by how much they shared: the peers in the first positions are unchoked. The main drawback is that, at the beginning, each node would receive a very bad service, since it is not yet able to contribute much. This is avoided thanks to optimistic unchoking: each time, one choked peer is unchoked.
Moreover, when a peer receives requests from others, it serves the peers that have many chunks (they hold many rare chunks and can contribute very well to sharing). This means that new peers cannot use the rarest-first approach: they have to choose the chunks to download randomly and, once they hold enough chunks, they can start using rarest-first, since their contribution will be sufficient.
Tit-for-tat tries to improve fairness by balancing how much a peer contributes with the service it receives, but, due to the asymmetry of network paths, it may reduce the performance of the system. Imagine that two peers are exchanging chunks of the same content: if the two directions of the communication follow different paths, one of them may be bottlenecked. Then one of the two peers (A) has a much slower upload rate than the other (B), so B cannot fully exploit its bandwidth, because the mechanism punishes A for its low contribution.
To improve efficiency and performance, the end game mechanism has been introduced: for each chunk, the last sub-blocks are requested by the peer from all its neighbors at once. Once one of them arrives, the remaining requests are cancelled.
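A Python sketch of one choking round: peers are ranked by how much they uploaded to us in the last window, the best ones are unchoked, and one of the remaining (choked) peers is picked at random as the optimistic unchoke. The four regular slots come from the notes; having one optimistic slot in addition to them, and the function name, are assumptions.

```python
import random

def choking_round(received, slots=4, rng=random):
    """received: peer -> bytes received from that peer in the last window."""
    ranked = sorted(received, key=lambda p: (-received[p], p))  # best contributors first
    unchoked = ranked[:slots]
    choked = ranked[slots:]
    optimistic = rng.choice(choked) if choked else None  # gives newcomers a chance
    return unchoked, optimistic

window = {"a": 900, "b": 50, "c": 400, "d": 0, "e": 700, "f": 120}
unchoked, optimistic = choking_round(window, rng=random.Random(7))
print(unchoked, optimistic)
```

Peer d has uploaded nothing, yet it can still be chosen as the optimistic unchoke, which is how a newcomer gets its first chunks.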
This technique prevents the receiver, if it is unlucky, from waiting too long for the download from a slow peer: since only one chunk at a time can be downloaded, waiting just for the last sub-blocks is a waste of time that can be avoided. As a result, the download is sped up.
1.12.3 Case study: Flash Crowd
Suppose that a content is very popular and the purpose is to distribute it to as many customers as possible. Assume:
. the number of peers interested in it is $n = 2^{\kappa}$;
. two cases are considered:
1. a client/server scenario;
2. a scenario in which the content is redistributed by peers;
. the content distributed is an atomic entity;
. all peers have the same upload bandwidth b.
If the size of the content is s, the time needed to download/upload the content is:

$$T = \frac{s}{b}$$

Plotting time on the x axis and the number of peers holding the content on the y axis, the number doubles at each step: 2 peers at T, 4 at 2T, 8 at 3T, ..., $2^{\kappa}$ at κT.
Case 1
In the client/server scenario, the service capacity C(t) is constant and equal to B, where B is the global capacity of the server and B > b. [Figure: constant C(t) = B.]
Case 2
In the peer-to-peer approach, C(t) starts from b and grows over time, because every peer that has received the content starts uploading it. [Figure: C(t) growing from b.] This shows that the method is very effective: in a very short time it reaches the capacity of the client/server approach.
Now consider the case of parallel download: each peer splits its upload bandwidth in two, so that two other peers can download the content simultaneously. The time to complete a download becomes:

$$T_x = \frac{s}{b/2} = \frac{2s}{b} = 2T$$

The graph becomes: 3 peers at 2T, 9 at 4T, ..., $3^j$ at 2jT.
If the content is a chunk, comparing the two graphs it is immediately clear that it is better not to divide the bandwidth: this speeds up the distribution, because more peers are reached in less time.
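The two graphs can be reproduced with a small Python calculation, in units of T = s/b. `split` is the number of peers each holder serves at once (1 for the first graph, 2 for the second); the function name is hypothetical.

```python
def completion_time(n_peers, split):
    """Time, in units of T, until n_peers hold the content (source included).

    Each holder serves `split` peers in parallel at rate b/split, so a round
    lasts split * T and the number of holders grows by a factor (1 + split).
    """
    holders, rounds = 1, 0
    while holders < n_peers:
        holders *= 1 + split
        rounds += 1
    return rounds * split

for n in (8, 1024, 2 ** 20):
    print(n, completion_time(n, split=1), completion_time(n, split=2))
```

For n = 1024 the undivided bandwidth needs 10T, while splitting it in two needs 14T, which matches the conclusion that it is better not to divide the bandwidth.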
Moreover, it now becomes clear why chunks are small: if s is small, T is also small, and if the download time is small, the redistribution takes place quickly, improving performance.
The source (colored in blue in both graphs) is the peer that works for the longest time, while the peers of step κ − 1 (the most effective step, because it reaches half of the interested peers) work only for a short while: this implies that the potential bandwidth ($2^{\kappa} \cdot b$) is not completely exploited. A way to improve this is to use independent distribution trees: they represent the paths followed by chunks to reach peers. The most effective step is, as mentioned before, the last one, because it reaches a large number of peers: this is one reason why rarest-first chunk selection is implemented. In the first steps the chunk is very rare, so it is better to distribute it, otherwise it may disappear from the network; at the end it is very popular and the risk of losing it is negligible.
1.13 Skype
Skype is a very popular system that adopts proprietary solutions: its design is closed and everything is encrypted. Knowledge about this system has been obtained through reverse engineering. In this system, user directories are distributed and they are managed only by super-peers. The reasons for its success are:
. a very good design and high quality (also in the presence of NATs/firewalls);
. users are encouraged to use it because many people already use it.
The overlay is hierarchical and distinguishes:
. normal peers;
. super-peers, which are very well connected.
[Figure: normal peers attached to an overlay of super-peers.]
Super-peers are elected among normal peers, and it is possible to configure the software so that it is never elected; super-peers must have:
. a public IP address;
. bandwidth to share.
Super-peers are in charge of managing their normal peers: they know when peers are online or offline, and they help peers find other contacts and communicate in the presence of NATs/firewalls.
However, each normal peer can contact more than one super-peer for reliability.
Users are identified not by their IP address but by an identifier: this lets people use the application regardless of where they are. For instance, they may use one PC at home and another at the office, but for the application the user is the same. This is achieved through authentication: each time, the user has to declare their identity before connecting. Consequently, two classes of signalling can be distinguished:
. one to log in and authenticate;
. one to look for other users.
In general, UDP is used as the transport protocol: human voice requires little bandwidth, and to avoid fluctuations it is better to use UDP, which has no congestion control, even though it is not reliable. Of course, when needed (in particular in the presence of NATs and firewalls), TCP can be used; signalling traffic, instead, is always sent over TCP.
A communication between two hosts that are not behind a NAT happens as follows:
. the initiator asks its super-peer for information (IP address and port number) about the peer it wants to talk to;
. the super-peer provides that information;
. a connectivity test takes place: the initiator tries to open a direct connection;
. if this succeeds, they can start communicating.
If the initiator is behind a NAT, the connectivity test fails, because the information retrieved from the super-peer differs from the information actually seen by the receiver: the answer is therefore negative, and the message specifies the current IP address and port number. By using these new parameters, the initiator can proceed as if it were not behind the NAT.
If the destination is behind a NAT, it cannot be reached directly: therefore the initiator contacts the super-peer, telling it that the destination has to start the call.
When both are behind a NAT, they also have to retrieve their public information from super-peers before starting the communication.
It is possible to conclude that reachability in Skype is very high: super-nodes can also work as relay nodes in the presence of NATs or firewalls; in this case the two links are completely independent and may use different transport protocols. The solutions discussed are called Simple Traversal of UDP through NATs (STUN) and Traversal Using Relay NAT (TURN).
Skype can also reach the fixed telephone network (SkypeIn/SkypeOut) through gateways: in this case the perceived quality is the same as that of the fixed telephone network, because a different codec (G.729) is used. Usually the voice codec is selected from a list; its main features are:
. bit rate: 10-32 kbit/s;
. fixed inter-packet gap (IPG): 30 ms.
Moreover, to deal with losses, Skype introduces redundancy.
1.14 P2P Streaming systems
P2P streaming systems provide multimedia services, and the fundamental assumption is that the user interested in the content consumes it in real time, i.e. while downloading it. Therefore several efforts are made in this direction, for example avoiding service interruptions and reducing the delay. The services provided are video, audio, or both.
These systems can be distinguished based on the kind of service provided:
. VoD: video on demand (example: a catalogue of video channels);
. real-time TV (examples: live sports events, interactive TV).
The fundamental distinction is the delay: in the second case the constraint is much tighter than in the first. For real-time TV, therefore, the latency, i.e. the gap between the moment the video is generated and the moment it is consumed, must be very short.
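The call-setup cases described above can be summarized as a decision function in Python. This is only a reading of the notes, not Skype's actual (proprietary) logic; the function name, parameters and step descriptions are hypothetical.

```python
def call_setup(caller_natted, callee_natted, direct_path_ok=True):
    """Return the steps the initiator follows to reach the callee."""
    steps = ["ask own super-peer for the callee's IP address and port"]
    if not caller_natted and not callee_natted:
        steps.append("connectivity test: open a direct connection")
    elif caller_natted and not callee_natted:
        steps.append("connectivity test fails; retry with the address/port reported back")
    elif callee_natted and not caller_natted:
        steps.append("ask the super-peer to make the callee open the connection")
    else:
        steps.append("both peers learn their public address/port from super-peers (STUN)")
    if direct_path_ok:
        steps.append("talk directly")
    else:
        steps.append("relay the call through a super-peer (TURN)")
    return steps

for step in call_setup(caller_natted=True, callee_natted=True, direct_path_ok=False):
    print("-", step)
```

The relay is the last resort: it always works, but it consumes the super-peer's bandwidth.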
Regardless of this classification, there is always a delay to take into account: the delay due to the distribution of the content. Therefore, the peers that form the neighborhood of a given node are just those interested in the same part of the content. Peers are not forced to be synchronized, but in general they are interested in consuming the same part of the content at roughly the same time.
The reasons these systems are now popular are:
. the possibility of distributing content everywhere at the same time (for example to communities abroad, or to places with little infrastructure where only the Internet arrives);
. closed markets, or the high cost of traditional distribution;
. small distributors: communities interested in an event at a given moment, where the number of users is large but sparse, for example scientific contests.
Another classification of these systems is based on the type of overlay used:
. tree-based;
. mesh-based (similar to BitTorrent).
The overlay is in charge of the distribution of contents, and the tasks it performs are:
. how to find content;
. how to find neighbors.
Users share their upload bandwidth to distribute contents.
1.14.1 Tree-based systems
These systems were proposed as an alternative to IP multicast, which uses routers to reach more than one user simultaneously. That method ran into problems because it assumed routers with much more capability; moreover, multicast suffers from the following issues:
. bottlenecked routers;
. addressing;
. group maintenance;
. security.
Hosts are divided into the source (which generates the content), destinations, and intermediate hosts. [Figure: example distribution tree rooted at the Source, with peers P1-P9.]
If each node is in charge of distributing the content only to its children, the bottleneck problem disappears, because the required bandwidth is not too high.
Tree construction
The parameters to define are:
. the number of levels of the tree (the number of hops to reach the last layer of the tree);
. the fan out: the maximum number of children each node can have.
Based on the number of levels, it is possible to impose an upper bound on the delay: it will be small if the number of levels is small. Based on the fan out, instead, it is possible to impose a limit on the upload bandwidth: too many children are difficult to manage. The maximum fan out is:

$$F_{out} = \frac{\text{total upload capacity}}{\text{bit rate of each video}}$$

The important thing to remember is that the upload bandwidth cannot be completely exploited, because some signalling is needed.
Tree maintenance
This is a very critical point, because trees suffer from an intrinsic vulnerability: when a node switches off, the topology is split, so some parts of the tree may suffer a service interruption.
Potential problem
Depending on their position in the tree, nodes contribute more or less to distributing contents, apart from nodes in the last layer, which do not contribute at all. This implies some unfairness.
End-system multicast (ESM)
This system was not designed with a P2P approach and it has two overlays:
. one in charge of tree maintenance: it is based on a mesh topology (maintenance information is distributed by flooding);
. the second in charge of distributing and finding contents: it is based on the tree topology.
The approach is distributed, but peers actually maintain a global view of the network.
Join operation
. After a bootstrap phase (in which the new node downloads a list of active peers from a web page), the node contacts one of them;
. a join message is sent through the mesh overlay to all peers (in this way everyone knows that a new peer wants to join);
. the same happens when leaving: a leave message is propagated through the mesh topology.
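A Python sketch of the two tree-construction parameters: the maximum fan out derived from the upload capacity (with a hypothetical share reserved for signalling) and the number of levels needed to host a given population. Rates are in kbit/s and all numbers and function names are illustrative.

```python
import math

def max_fan_out(upload_capacity, video_bitrate, signalling_share=0.0):
    """Children a node can feed: usable upload capacity / bit rate of the video."""
    usable = upload_capacity * (1 - signalling_share)
    return math.floor(usable / video_bitrate)

def levels_needed(n_peers, fan_out):
    """Levels below the source needed to host n_peers with the given fan out."""
    if fan_out < 1:
        raise ValueError("fan out must be at least 1")
    levels, hosted, width = 0, 0, 1
    while hosted < n_peers:
        width *= fan_out  # nodes on the next level
        hosted += width
        levels += 1
    return levels

f = max_fan_out(2000, 500, signalling_share=0.1)  # 1800 / 500 -> 3 children
print(f, levels_needed(1000, f))
```

Reserving 10% of the upload for signalling drops the fan out from 4 to 3, which in turn adds a level (6 instead of 5 for 1000 peers) and therefore increases the delay bound.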
Periodically, each peer sends a message by flooding: in this way nodes can build a neighbor table, because messages contain information such as the id of the peer the message came from, its IP address, the message id, and a timestamp. This helps detect when a node leaves the network after a failure: if after a timeout (measured from the last timestamp received from a given node) no message has arrived from that node, the peer sends a message to it; if no answer is received, it is in charge of notifying the departure by flooding, otherwise it just updates its neighbor table.
Once the mesh topology is built, a distance vector algorithm is used to select the subset of the graph that forms the tree.
Multi-tree systems
These are still tree-based systems, but they use more than one tree; they are also called second-generation systems. They were developed to address the issues of single-tree systems, such as:
. little robustness to churn (parts of the tree become isolated);
. inefficient use of bandwidth (the last layer of the tree does not contribute).
The content is organized into m sub-streams, and each one is served by a different tree. In this way, nodes in the last layer of one tree can be sources in another tree: this improves efficiency and robustness, because when a node leaves, problems arise not in all trees but only in those in which the node is not in the last layer. An important advantage is that managing several trees does not imply too much complexity. There is a balance between what a peer receives and how much it contributes: in one tree it acts as an internal peer, distributing contents to m children, and in the others it acts as a leaf, only receiving contents. [Figure: a peer receiving 1/m of the stream from each tree and forwarding its sub-stream to m children.]
A drawback is that the parameter m cannot be adapted over time (to the capacity, to the number of peers), but has to be decided a priori.
Peer join
A new peer that wants to join the network has to:
. find its position in the m trees;
. join as an internal node in one tree; its parent will be the node with the lowest depth that can accept a further child.
The higher its position, the more the peer contributes to the distribution.
Peer leaving
The departure of a peer does not cause problems if the peer is a leaf, but it does if it is an interior peer: its children have to perform the join operation again.
Descriptors
Multi-tree systems were designed for multi-descriptor codecs: the original information is coded into several descriptors, each of which can be decoded on its own. If the user receives all of them, it can consume the content at high quality; if it receives only some of them, it can still consume the content, but at lower quality. With multi-tree systems, therefore, a node leaving is not a critical issue: at worst some descriptors are lost, so there is no service interruption, and the content is received at lower quality. The drawback is compression: the combination of multi-trees and multi-descriptors is somewhat less efficient, because multi-descriptor coding needs more bandwidth to reach good quality.
1.14.2 Mesh-based systems
These systems take inspiration from BitTorrent, although the purpose is different. Pieces of a streaming content are distributed to neighbors, and a tracker lets peers join by sending them a list of active peers. As in BitTorrent, there is no structured overlay: nodes are not forced into a given position according to a given topology; the overlay is a randomly created mesh. Maintenance is provided by a gossip algorithm: peers send their list of neighbors to their neighbors, and their presence is periodically announced through Hello messages. Having a small neighborhood limits a peer in:
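A Python sketch of the multi-tree join: each new peer is interior in exactly one tree, with m child slots (so that forwarding its sub-stream, of rate 1/m of the stream, to m children costs one full stream of upload), and a leaf in the other m − 1 trees; in every tree it attaches to the shallowest node with a free slot. Choosing as "home" tree the one with the fewest interior nodes, and giving each source a fan out of 2, are assumptions for illustration.

```python
from collections import deque

class Tree:
    def __init__(self, source_fan_out):
        self.children = {"S": []}  # "S" is the source of this sub-stream
        self.capacity = {"S": source_fan_out}

    def shallowest_free(self):
        """Breadth-first search for the lowest-depth node that can take a child."""
        queue = deque(["S"])
        while queue:
            node = queue.popleft()
            if len(self.children[node]) < self.capacity[node]:
                return node
            queue.extend(self.children[node])
        raise RuntimeError("no free slot in this tree")

    def attach(self, peer, capacity):
        parent = self.shallowest_free()
        self.children[parent].append(peer)
        self.children[peer] = []
        self.capacity[peer] = capacity
        return parent

def join(trees, peer):
    m = len(trees)
    interior = [sum(1 for p, c in t.capacity.items() if p != "S" and c > 0) for t in trees]
    home = interior.index(min(interior))  # tree where the peer will be interior
    for i, tree in enumerate(trees):
        tree.attach(peer, m if i == home else 0)  # leaf (capacity 0) elsewhere
    return home

trees = [Tree(source_fan_out=2) for _ in range(2)]  # m = 2 sub-streams
homes = {f"P{i}": join(trees, f"P{i}") for i in range(1, 9)}
print(homes)
print(trees[0].children)
```

Every peer forwards content in one tree and only receives in the other, so no peer is a pure leaf overall, which addresses the unfairness of single trees.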
• distributing and receiving content;
• staying connected, since it is more easily cut off from the service.
Gossiping is not flooding: no rule is imposed to make a given update reach all nodes. As in BitTorrent, the set of neighbors with which a peer actually exchanges traffic is smaller. The peer selects them based on:
• their workload or capacity;
• path characteristics: RTT and loss probability (these vary over time and therefore have to be measured);
• content availability.
[Figure: mesh topology, showing the neighbors of a peer and the subset used to exchange traffic.]

Data delivery
Contents are divided into pieces called chunks. Chunks are treated independently, so they can follow different distribution trees. Chunk distribution policies are local, so there is no global coordination. The scheduling mechanisms are basically:
• push: decisions are taken by the transmitter;
• pull: decisions are taken by the receiver.
With push, the peer sends a chunk to a neighbor without negotiation. With pull, the receiver requests the desired chunk, which requires some knowledge of the content held by neighbors.

Push | Pull
--- | ---
short delays (no negotiation) | requires more signalling
multiple copies (waste of bandwidth) | no multiple copies
possible losses | larger delays

Push may suffer losses when, because of multiple copies, the bandwidth is not enough.

Strategies
Let:
• u and v be peers that are neighbors;
• c(u) be the set of chunks held by u;
• $C \in c(u)$ be the chunk sent by u.
Strategies are methods to decide which chunk C to transmit (or request) and to which neighbor v. The order of selection matters. If the peer is selected first, this constrains which chunks can be delivered to it; selecting the chunk first instead constrains the choice of the peer.
• peer-first selection:
  ◦ random selection;
  ◦ random selection of a useful peer, i.e. a v that needs something from u: $c(u) \setminus c(v) \neq \emptyset$;
  ◦ most deprived peer: the neighbor that can receive the largest number of chunks from u;
• chunk-first selection:
  ◦ random selection;
  ◦ random useful selection;
  ◦ latest blind chunk: the most recent chunk (with respect to its generation at the source) is sent; it is the one peers need most urgently, with the tightest delay constraint;
  ◦ latest useful chunk: the most recent chunk that the receiver does not hold yet.
Another concept borrowed from BitTorrent is that the latest chunks are held by few peers. It is therefore better to distribute them quickly, so that they are not easily lost.

Examples
• random peer / latest blind: this combination always pushes the latest chunk. If the source is greedy, the perceived service is good, because with latest blind it does not matter which chunks the receiver holds. Properties:
  ◦ little overhead;
  ◦ minimum delay;
  ◦ possible losses and duplicates.
• most deprived / latest useful: the peer that holds the fewest chunks is selected first, and then the latest chunk useful to it is sent. This requires peers to know which chunks their neighbors hold. Properties:
  ◦ large overhead;
  ◦ large delays.

Performances
There are two complementary indices:
• diffusion rate r(t): the probability that a generic peer receives a chunk within a time t. It gives an idea of the delays (given a time, how many peers can be reached?);
• diffusion delay: the time a chunk takes to reach a fraction $1 - \varepsilon$ of the peers. Fixing ε, e.g. 5%, the diffusion delay measures the time needed to reach 95% of the peers (given a population, how long does it take to reach a part of it?).
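The peer and chunk selection strategies listed above can be sketched as small functions over chunk sets. This is a hypothetical illustration, not a reference scheduler. Chunks are integer sequence numbers (a higher number means a more recent chunk), neighbors are given as a dict from peer name to the set of chunks it holds, and all function names are invented for the example.

```python
import random

# Hypothetical chunk-scheduling helpers for a push-based mesh overlay.
# Chunks are integer sequence numbers: a higher number is a more recent chunk.


def useful(c_u, c_v):
    """Chunks held by u that neighbor v is still missing: c(u) minus c(v)."""
    return c_u - c_v


def random_useful_peer(c_u, neighbors, rng):
    """Pick uniformly among neighbors v for which c(u) minus c(v) is non-empty."""
    candidates = [v for v, c_v in neighbors.items() if useful(c_u, c_v)]
    return rng.choice(candidates) if candidates else None


def most_deprived_peer(c_u, neighbors):
    """Pick the neighbor that could receive the most chunks from u."""
    best = max(neighbors, key=lambda v: len(useful(c_u, neighbors[v])), default=None)
    if best is None or not useful(c_u, neighbors[best]):
        return None
    return best


def latest_blind(c_u):
    """Most recent chunk held by u, whether or not the receiver already has it."""
    return max(c_u) if c_u else None


def latest_useful(c_u, c_v):
    """Most recent chunk held by u that v does not hold yet."""
    missing = useful(c_u, c_v)
    return max(missing) if missing else None


rng = random.Random(0)
c_u = {1, 2, 3, 4, 5}
neighbors = {"a": {1, 2, 3, 4, 5}, "b": {1, 2}, "c": {1, 2, 3, 4}}
v = most_deprived_peer(c_u, neighbors)          # "b" misses chunks 3, 4 and 5
print(v, latest_useful(c_u, neighbors[v]))      # b 5
print(random_useful_peer(c_u, neighbors, rng) in {"b", "c"})  # True
```

The sketch makes the overhead difference visible. Random peer with latest blind needs only c(u). Every "useful" or "most deprived" rule needs the neighbors' chunk sets, which in practice means exchanging buffer maps.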
Relation between delay and losses
Since users consume the content while they are downloading it, the delay should be as short as possible:
[Figure: chunks 1, 2, 3 generated by the source every ∆t and received by the peer through the layer-3 network.]
The delay of the layer-3 network combines scheduling and buffer policies, possible congestion and propagation delay. The delays of layers 4 and 7 must also be considered. Therefore each packet is received with a different delay; the variability of the delay is called jitter. Moreover, packets may be received out of order:
[Figure: the source sends chunks 1, 2, 3 every ∆t; the peer receives them in the order 1, 3, 2.]
Once the first packet has started to be played, the codec needs the second one to be ready exactly ∆t later, and so on. Out-of-order packets are therefore very dangerous, because they decrease the perceived quality. To deal with this, an initial playout delay can be introduced. It is artificial and is used only to increase the probability of receiving the right chunks before playing them.
[Figure: chunks received in the order 1, 3, 2 are played in the order 1, 2, 3, ∆t apart, after a playout delay.]
The delay-loss trade-off is:
• higher delay → fewer losses → high perceived quality;
• lower delay → possible losses → low perceived quality.
Losses can be due to:
• packets/chunks never received;
• packets/chunks received late.
The second category is much more critical: packets received late are useless and waste resources (bandwidth). The following picture highlights this fact:
[Figure: buffer around the chunk currently played, marking lost chunks, useless information, chunks not yet received and chunks already owned.]
Chunks that precede the current chunk in playback order and have not been received yet are useless even if they arrive later. Missing chunks that follow it can still be received in time, so a buffer is needed to store them. With a latest-blind chunk selection policy, the chunk needed is the one shown in orange in the previous picture.
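The delay-loss trade-off can be made concrete with a small sketch under a simplified, hypothetical timing model. Chunk i is generated at time i·∆t and played at i·∆t + D, where D is the playout delay. A chunk counts as lost if it never arrives (None) or arrives after its playback instant. The function names and the arrival times are illustrative only.

```python
def classify_chunks(arrivals, dt, playout_delay):
    """Split chunk indices into played on time, received late, and never received.

    Hypothetical model: chunk i is generated at i*dt and played at
    i*dt + playout_delay; arrivals[i] is its arrival time, or None if lost.
    """
    played, late, missing = [], [], []
    for i, t in enumerate(arrivals):
        deadline = i * dt + playout_delay
        if t is None:
            missing.append(i)
        elif t <= deadline:
            played.append(i)
        else:
            late.append(i)  # arrived after its playback instant: wasted bandwidth
    return played, late, missing


def loss_rate(arrivals, dt, playout_delay):
    """Fraction of chunks that cannot be played (late or never received)."""
    _, late, missing = classify_chunks(arrivals, dt, playout_delay)
    return (len(late) + len(missing)) / len(arrivals)


# Chunk 1 arrives after chunk 2 (out of order) and chunk 3 is never received.
arrivals = [0.5, 2.8, 2.4, None, 4.6]
for delay in (1.0, 2.0):
    print(delay, classify_chunks(arrivals, 1.0, delay), loss_rate(arrivals, 1.0, delay))
```

With D = 1 s, chunk 1 is late and counts as a loss together with the missing chunk 3. Raising D to 2 s lets chunk 1 be played, and the loss rate drops from 0.4 to 0.2. The missing chunk stays lost whatever the delay.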
The buffer also imposes a kind of synchronization: peers are interested in the same content at the same time. In a situation like the following:
[Figure: buffers of Peer 1 and Peer 2, whose currently played chunks are far apart.]
the two peers have no interest in communicating with each other. As a consequence, chunks do not all have the same relevance as in BitTorrent: some are more urgent, and others become useless if not received in time.

Peers communicate which chunks they can transmit through the buffer map (BM): a map describing which chunks a given peer owns and which it does not. For example, 1 0 0 1 1 (1 means the chunk is owned, 0 that it is missing).
Buffer maps can be exchanged:
• periodically (choosing the period is an issue: a long period implies delays, a short one implies overhead);
• at each received chunk.
Consider a peer A that has to redistribute a newly received chunk to its neighbors B and C. With a pull policy, the temporal diagram is:
[Figure: timeline for A, B and C with a pull policy: A sends its buffer map to B and C, each requests the chunk, then A sends it; the delays A-B and A-C are marked.]
The exchange of buffer maps thus introduces a further delay. With a push policy, the temporal diagram is:
[Figure: the same timeline for A, B and C with a push policy.]
The delay is reduced, but if B and C do not need that particular chunk, the bandwidth is wasted.
Several proposals aim to reduce the delay while using a pull policy:
• select the peer based on RTT: the buffer-map exchange allows the RTT to be measured, so selecting peers according to it can lead to better decisions. A possible drawback is that distance-based decisions introduce locality and may degenerate into a partition of the network in terms of connectivity;
• select the peer probabilistically: with probability p select randomly, with probability 1 − p use the RTT measurements;
• wait for an amount of time t before selecting the same peer again: this prevents the same peer from being repeatedly favoured;
• bandwidth-aware policy: reduce delays by making the best use of the available bandwidth, especially when peers have different upload bandwidths, by favouring delivery to the nodes with more upload bandwidth. In the accompanying picture, peers with more bandwidth are drawn larger. Statistically this reduces the number of hops, because the distribution trees become much shorter. The main issue is estimating the upload bandwidth of peers.

Issues
• Fairness: very evident in the bandwidth-aware policy, since some nodes may distribute more than they receive.
• Depending on the codec, the download bandwidth cannot be increased too much, so the quality is bounded.
• Content awareness: the codec can be changed to improve efficiency. However, chunks do not all have the same importance, so the most relevant ones have to be transmitted in a way that ensures they are received.
• Costs for ISPs.

Chapter 2 Random graphs
2.1 Introduction and definitions
Random graphs are built through rules that introduce randomness. They are used to model and describe systems with many components and high complexity. Application fields include:
• modelling the Internet (layer-3 network);
• modelling the web (links followed when browsing pages, a layer-7 network);
• network design;
• biology;
• social networks.
P2P systems are based on overlays, and random graphs are one way to model them. Such models are used to:
• understand the system;
• tune parameters;
• make design choices;
• evaluate performance (for example, scalability in simulation).

Definitions
• graph: composed of
  ◦ nodes/vertices;
  ◦ edges/links;
• neighbor: a node directly connected through a link;
• degree: the number of neighbors of a given node;
• component: a subset of nodes connected to each other through links. Several components mean the network is disconnected, because two nodes picked from two different components cannot reach each other;
• giant component: a component containing a finite fraction of all the nodes. If the number of nodes is high and there is a giant component, the network has very good connectivity. In a biological scenario, e.g. when trying to contain viruses, a giant component is bad because it lets infections spread easily, so low connectivity is preferable;
• clustering: the probability that two nodes are neighbors increases if they have at least one neighbor in common;
• clustering coefficient: the average probability that two neighbors of a given node are also neighbors of each other;
• radius (around a node): the distance (in number of hops) needed to reach any node from a given node.

2.2 Erdős-Renyi Model
Given:
• n: the number of nodes;
• p: the probability that a link between two nodes exists,
the resulting graph is called G(n, p). An equivalent definition is: G(n, p) is a set of graphs with n nodes, where each graph appears with a probability that depends only on its number of links. Indeed, with n nodes and m links, a specific graph appears with probability
$$P = p^m (1-p)^{M-m}$$
where M is the total number of possible links. Analysing each term:
• $p^m$ is the probability that the m given links are present;
• $(1-p)^{M-m}$ is the probability that all the other links are absent.
So P is the probability that a given graph G appears. Several graphs can be built over the same number of nodes, and any other graph G′ with the same number of links has the same probability: P(G) = P(G′). M is the number of links of a full-mesh topology:
$$M = \frac{n(n-1)}{2}$$
where the division by 2 is needed because the direction of links does not count.
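A direct way to see the definitions above at work is to sample G(n, p) link by link and look for its components. The following is a minimal sketch, assuming an adjacency-set representation and breadth-first search for components. The function names, n, p and the seed are hypothetical choices for illustration. The number of links should be close to M·p. With p = 0.004 (about 4 links per node) almost every node falls into one component, while a much smaller p leaves only small fragments.

```python
import random
from collections import deque


def erdos_renyi(n, p, seed=None):
    """Sample G(n, p): each of the M = n(n-1)/2 possible links exists with probability p."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def components(adj):
    """Connected components, found by breadth-first search from every unvisited node."""
    seen = [False] * len(adj)
    comps = []
    for start in range(len(adj)):
        if seen[start]:
            continue
        seen[start] = True
        queue, comp = deque([start]), []
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    queue.append(v)
        comps.append(comp)
    return comps


n, p = 1000, 0.004
g = erdos_renyi(n, p, seed=1)
M = n * (n - 1) // 2
links = sum(len(nbrs) for nbrs in g) // 2
largest = max(len(c) for c in components(g))
print(f"links: {links} (expected M*p = {M * p:.0f}), largest component: {largest} of {n} nodes")
```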
2.2.1 Average degree
The degree of a node is the number of links it has. It depends on the graph and is a random variable. The average degree is the average total number of link ends divided by the number of nodes:
• M·p is the average number of links generated by the process (the number of potential links times the success probability);
• n is the number of nodes.
Each link has two ends, each attached to a node, so the average number of link ends is 2Mp. Therefore:
$$\mathrm{avg}\{\text{degree}\} = \frac{2Mp}{n} = \frac{n(n-1)p}{n} = (n-1)p$$
The average degree is also written as z or $\langle\kappa\rangle$. For large n:
$$z = (n-1)p \simeq np$$
Values of z:
• z = 1 is a critical value;
• z > 1: with high probability there is a giant component;
• z < 1: there is no giant component.

Clustering coefficient
The clustering coefficient is
$$c = p = \frac{z}{n-1} \simeq \frac{z}{n}$$
for a large number of nodes.

2.2.2 Degree distribution
Let κ be the random variable describing the degree, and $P_\kappa$ the probability that the degree of a node equals κ. Then:
$$P_\kappa = \binom{n-1}{\kappa} p^\kappa (1-p)^{(n-1)-\kappa}$$
where:
• n − 1 is the total number of trials (all nodes except the one considered);
• κ is the number of successful trials.
If $n \gg \kappa z$:
$$P_\kappa \simeq \frac{z^\kappa e^{-z}}{\kappa!}$$
which is a Poisson distribution with parameter z, i.e. $E[\kappa] = z$. This approximation holds because the binomial distribution tends to a Poisson distribution for large n and small κ.

2.3 Bender-Canfield Model
This model deals with random graphs that have a given, non-Poisson degree distribution. Graphs are built in two steps:
• assign edge ends to nodes according to the prescribed degree distribution;
• randomly connect the edge ends.
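The Poisson approximation of the degree distribution can be checked numerically. The sketch below compares the exact binomial $P_\kappa$ with the Poisson pmf of parameter z = (n − 1)p for one illustrative choice of n and z (both are arbitrary example values). It uses only the Python standard library (`math.comb`, available from Python 3.8).

```python
from math import comb, exp, factorial


def binomial_degree_pmf(n, p, k):
    """Exact Erdős-Renyi degree distribution: C(n-1, k) p^k (1-p)^(n-1-k)."""
    return comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)


def poisson_pmf(z, k):
    """Poisson approximation with parameter z = (n-1)p."""
    return z**k * exp(-z) / factorial(k)


n, z = 10_000, 4.0
p = z / (n - 1)
for k in range(9):
    print(k, round(binomial_degree_pmf(n, p, k), 5), round(poisson_pmf(z, k), 5))
```

For n = 10 000 the two columns agree to about three decimal places. The mean of the binomial is exactly (n − 1)p = z, consistent with $E[\kappa] = z$.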
This way of building random graphs differs from the Erdős-Renyi model, although here too link positions are independent and no notion of locality is present. The following sections deal with properties derived from this model.

2.3.1 Node reachability
Node reachability concerns the possibility of having a giant component. If nodes are easily reached, the probability of having a giant component increases. Poor reachability, on the contrary, implies low connectivity, and then there is no giant component. Consider the following topology: starting from a given node (marked in orange), the reachability of its 1-hop neighbors (in light blue) and 2-hop neighbors (in violet) is studied.
[Figure: a node with its 1-hop and 2-hop neighbors.]
• 1-hop neighbors: their number is the degree;
• 2-hop neighbors: to compute their number, the degree distribution of the 1-hop neighbors is required. In principle each node has the same probability $P_\kappa$ of being picked, but 1-hop neighbors are not picked at random. A node with a higher degree is more likely to be reached, so the rule is $\kappa P_\kappa$.
To understand this concept, consider a star topology with n nodes arranged as follows:
• the center, with degree n − 1;
• n − 1 peripheral nodes with degree 1.
The degree distribution is therefore $P_1 = (n-1)/n$ and $P_{n-1} = 1/n$:
[Figure: the degree distribution $P_\kappa$ of the star.]
Weighting each value by its degree (and normalizing so that the result is again a distribution), the height of each bar becomes proportional to the degree. Starting from the center, the neighbor degree perceived is 1. Starting from any other node, the neighbor degree perceived is n − 1, because the center is easy to reach:
[Figure: the same distribution weighted by κ.]
If each node counts in proportion to its degree, the center counts $(n-1) \cdot P_\kappa$, because it is reached many times, while any other node counts $1 \cdot P_\kappa$. In conclusion, the distribution that governs the 2-hop neighbors is proportional to $\kappa P_\kappa$. This quantity is not itself a distribution, because it does not sum to 1.
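The star example can be checked exactly by enumerating link ends. The sketch below (hypothetical helper names, exact arithmetic with `fractions.Fraction`) computes the degree distribution $P_\kappa$. It then computes the distribution of the degree found at the far end of a uniformly chosen link end, and compares it with the normalized $\kappa P_\kappa / \langle\kappa\rangle$.

```python
from collections import Counter
from fractions import Fraction


def star(n):
    """Star topology: node 0 is the center, nodes 1..n-1 are leaves."""
    adj = {0: set(range(1, n))}
    for leaf in range(1, n):
        adj[leaf] = {0}
    return adj


def degree_distribution(adj):
    """P_k: fraction of nodes with degree k."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: Fraction(c, len(adj)) for k, c in counts.items()}


def neighbor_degree_distribution(adj):
    """Degree of the node found at the other end of a uniformly chosen link end."""
    counts = Counter(len(adj[v]) for u in adj for v in adj[u])
    total = sum(counts.values())
    return {k: Fraction(c, total) for k, c in counts.items()}


def degree_weighted(pk):
    """k * P_k normalized by <k>, i.e. the rule derived in the text."""
    mean = sum(k * p for k, p in pk.items())
    return {k: k * p / mean for k, p in pk.items()}


g = star(10)
pk = degree_distribution(g)
print(pk)
print(neighbor_degree_distribution(g))
print(degree_weighted(pk) == neighbor_degree_distribution(g))  # True
```

In the star, 9 nodes out of 10 have degree 1, yet a randomly followed link ends at the center half of the time. The $\kappa P_\kappa$ weighting reproduces exactly this bias.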
The initial node is also reachable from its 1-hop neighbors, so it must not be counted. The number of new nodes reachable through a neighbor of degree κ is therefore κ − 1. The probability distribution of the number of new nodes reachable in 2 hops is:
$$q_{\kappa-1} \propto \kappa P_\kappa \quad\Longrightarrow\quad q_\kappa \propto (\kappa+1) P_{\kappa+1}$$
and, normalizing:
$$q_\kappa = \frac{(\kappa+1) P_{\kappa+1}}{\sum_j j P_j}$$
Its average is:
$$\mathrm{Avg}\{q_\kappa\} = \sum_{\kappa=0}^{\infty} \kappa\, q_\kappa = \sum_{\kappa=0}^{\infty} \kappa \frac{(\kappa+1) P_{\kappa+1}}{\sum_j j P_j}$$
Substituting i = κ + 1 (so that κ = 0 ⇒ i = 1):
$$\mathrm{Avg}\{q_\kappa\} = \sum_{i=1}^{\infty} \frac{i(i-1) P_i}{\sum_j j P_j} = \sum_{i=1}^{\infty} \frac{(i^2 - i) P_i}{\sum_j j P_j}$$
Splitting the numerator into two sums:
$$\mathrm{Avg}\{q_\kappa\} = \frac{\sum_{i=1}^{\infty} i^2 P_i - \sum_{i=1}^{\infty} i P_i}{\sum_j j P_j}$$
Now:
• $\sum_i i^2 P_i$ is the second moment $\langle\kappa^2\rangle$;
• $\sum_i i P_i$ and $\sum_j j P_j$ are the first moment (average) $\langle\kappa\rangle$.
Therefore:
$$\mathrm{Avg}\{q_\kappa\} = \frac{\langle\kappa^2\rangle - \langle\kappa\rangle}{\langle\kappa\rangle}$$
This is the average number of new nodes discovered in two hops through a single 1-hop neighbor of the given node. The following picture highlights, in red, the paths considered so far:
[Figure: paths from the initial node through one 1-hop neighbor to its 2-hop neighbors.]
Of course, the initial node has more neighbors, and all of them must be considered to compute the total number $z_2$ of 2-hop neighbors. It suffices to multiply by the number of neighbors of the initial node, i.e. the degree $\langle\kappa\rangle$ (also called $z_1$ to emphasize that it counts the 1-hop neighbors):
$$z_2 = \frac{\langle\kappa^2\rangle - \langle\kappa\rangle}{\langle\kappa\rangle} \cdot z_1 = \frac{\langle\kappa^2\rangle - \langle\kappa\rangle}{\langle\kappa\rangle} \cdot \langle\kappa\rangle = \langle\kappa^2\rangle - \langle\kappa\rangle$$
The formula shows how the number of reachable nodes grows: the dominant term is $\langle\kappa^2\rangle$.

Example
If the distribution is Poisson (as in the Erdős-Renyi model), the variance equals the mean:
$$\langle\kappa\rangle = \langle\kappa^2\rangle - \langle\kappa\rangle^2 \;\Longrightarrow\; \langle\kappa^2\rangle = \langle\kappa\rangle^2 + \langle\kappa\rangle$$
Therefore:
$$z_2 = \langle\kappa^2\rangle - \langle\kappa\rangle = \langle\kappa\rangle^2 + \langle\kappa\rangle - \langle\kappa\rangle = \langle\kappa\rangle^2$$
Starting from $z_2$, iterating gives:
$$z_m = \frac{\langle\kappa^2\rangle - \langle\kappa\rangle}{\langle\kappa\rangle} \cdot z_{m-1}$$
Since:
• $z_2 = \langle\kappa^2\rangle - \langle\kappa\rangle$;
• $z_1 = \langle\kappa\rangle$,
the result is:
$$z_m = \frac{z_2}{z_1} \cdot z_{m-1} = \left(\frac{z_2}{z_1}\right)^{m-1} z_1$$
Analysing the ratio $z_2/z_1$:
• if $z_2/z_1 < 1$, $z_m$ decreases as m (the distance) grows, so the total number of reachable nodes stays bounded. Connectivity is poor and there is no giant component;
• if $z_2/z_1 > 1$, the number of discovered nodes grows geometrically, which leads to a giant component;
• if $z_2/z_1 = 1$, this is the so-called critical condition, whose behaviour is difficult to study.

Example
For the Erdős-Renyi model in critical conditions:
$$\frac{z_2}{z_1} = 1, \quad z_2 = \langle\kappa\rangle^2 \;\Longrightarrow\; \frac{\langle\kappa\rangle^2}{\langle\kappa\rangle} = 1$$
therefore $\langle\kappa\rangle = 1$. The condition that leads to a giant component is $\langle\kappa\rangle > 1$. Since $z_2/z_1 = \langle\kappa\rangle$:
$$z_m = \langle\kappa\rangle^{m-1} z_1 = \langle\kappa\rangle^{m-1}\langle\kappa\rangle = \langle\kappa\rangle^m$$
so the number of reachable nodes grows geometrically with the distance.

2.3.2 Small-world effect
This effect says that, in a network with a large number of users, the distance between them is relatively small, because some users are very well connected. Assume:
$$\frac{z_2}{z_1} \gg 1 \tag{2.1}$$
Then there is certainly a giant component, so the network is very well connected. Now let m be the distance between a given node and any other. Each iteration (1, 2, ..., m) discovers a very large number of new nodes, but the last one, which reaches nodes at distance m, discovers the most. As a consequence, the total is dominated by the last hop. If n is the number of nodes, the farthest nodes are reached when $z_l \simeq n$, and hypothesis (2.1) guarantees that they can be reached. In formulas:
$$z_l = \left(\frac{z_2}{z_1}\right)^{l-1} z_1 = n$$
Taking the logarithm:
$$(l-1)\log\frac{z_2}{z_1} = \log\frac{n}{z_1} \;\Longrightarrow\; l - 1 = \frac{\log(n/z_1)}{\log(z_2/z_1)}$$
In conclusion:
$$l = \frac{\log(n/z_1)}{\log(z_2/z_1)} + 1$$
where l is the average distance inside the network (also referred to in these notes as the diameter).
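The quantities $z_1$, $z_2$ and the average distance l depend only on the first two moments of the degree distribution, so they can be computed directly from any $P_\kappa$. The sketch below uses hypothetical function names and represents $P_\kappa$ as a dict {κ: probability}. As a check it uses a Poisson distribution with z = 10, truncated at κ = 100 where the remaining tail is negligible. It should give $z_2 = z_1^2$ and, for n = 10⁶, l = log(10⁵)/log(10) + 1 = 6.

```python
from math import exp, factorial, log


def moments(pk):
    """First and second moments <k>, <k^2> of a degree distribution {k: P_k}."""
    m1 = sum(k * p for k, p in pk.items())
    m2 = sum(k * k * p for k, p in pk.items())
    return m1, m2


def reach(pk):
    """z1 = <k> (1-hop neighbors) and z2 = <k^2> - <k> (2-hop neighbors)."""
    m1, m2 = moments(pk)
    return m1, m2 - m1


def average_distance(n, pk):
    """l = log(n/z1) / log(z2/z1) + 1, meaningful only when z2/z1 > 1."""
    z1, z2 = reach(pk)
    if z2 <= z1:
        raise ValueError("z2/z1 <= 1: no giant component, the estimate does not apply")
    return log(n / z1) / log(z2 / z1) + 1


# Poisson degree distribution (Erdős-Renyi) with z = 10, truncated at k = 100.
z = 10.0
poisson = {k: z**k * exp(-z) / factorial(k) for k in range(101)}
z1, z2 = reach(poisson)
print(round(z1, 6), round(z2, 6))                  # 10.0 100.0
print(round(average_distance(10**6, poisson), 3))  # 6.0
```

A distribution where every node has degree 2 gives $z_2 = z_1$, the critical case, and the function refuses to produce an estimate.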
The parameter l grows as the logarithm of n: even if the number of nodes is very large, l does not grow much, so the small-world effect is ensured. It also means that randomly built graphs have short distances. In the Erdős-Renyi model $z_1 = \langle\kappa\rangle = z$ and $z_2 = \langle\kappa\rangle^2 = z^2$, so:
$$l = \frac{\log(n/z_1)}{\log(z_2/z_1)} + 1 = \frac{\log(n/z)}{\log z} + 1 = \frac{\log n - \log z}{\log z} + 1 = \frac{\log n}{\log z}$$
The same logarithmic behaviour holds for tree topologies. For regular structures, instead:
• a ring has an average distance that grows linearly with n (the farthest node is n/2 hops away);
• a two-dimensional grid with n nodes has an average distance that grows as $\sqrt{n}$.
Regular structures therefore have intrinsically worse performance, because they:
• have larger distances;
• are less robust to churn (maintenance is hard).

Example
Consider an average per-hop delay D = 0.2 s. To keep the average delay needed to reach the farthest node, l·D, below R = 1 s, the distance l must satisfy:
$$l \cdot D < R$$
Using $l \simeq \frac{\log n}{\log z}$:
$$\frac{\log n}{\log z} \cdot D < R \;\Longrightarrow\; \log z > \frac{D}{R} \log n$$
With base-10 logarithms:
• $n = 10^4 \Rightarrow \log z > 4 \cdot 0.2 = 0.8 \Rightarrow z > 6.3$;
• $n = 10^6 \Rightarrow \log z > 6 \cdot 0.2 = 1.2 \Rightarrow z > 15.8$.
So the required degree grows by a factor of about 2.5 ($10^{0.4}$) each time the number of nodes grows by a factor of 100.
Focusing on the critical condition:
$$\frac{z_2}{z_1} = 1 \;\Longrightarrow\; z_2 = z_1 \;\Longrightarrow\; \langle\kappa^2\rangle - \langle\kappa\rangle = \langle\kappa\rangle \;\Longrightarrow\; \langle\kappa^2\rangle - 2\langle\kappa\rangle = 0$$
that is:
$$\sum_{\kappa=0}^{\infty} \kappa(\kappa-2) P_\kappa = 0$$
Terms with κ = 0 and κ = 2 have no effect on this condition (the occurrence of the giant component), since κ(κ − 2) = 0 for them. Nodes with κ = 0 are isolated. A node with κ = 2 merely relays a path, so in terms of reachability it is equivalent to a plain link. Nodes with κ = 1 (dead ends) push the sum towards negative values, while nodes with κ ≥ 3 push it towards positive values, i.e. towards a giant component.

2.3.3 Clustering
The following analysis holds for a generic, not necessarily Poisson, degree distribution. Clustering is the probability that two neighbors of a given node are themselves neighbors.
For this to happen, the link marked in orange in the following picture must exist:
[Figure: node A with neighbors B and C; the link B-C is highlighted.]
The clustering coefficient therefore describes how much locality is present in the network. Considering that:
• node B has degree $\kappa_i$;
• node C has degree $\kappa_j$,
the clustering coefficient is given by:
$$c = \frac{\langle\kappa_i\rangle \langle\kappa_j\rangle}{n z}$$
where:
• the numerator represents all the ways in which the two nodes can be connected;
• the denominator is the total number of link ends in the network: the number of nodes n multiplied by the average degree z of each node.
For 1-hop neighbors the relevant distribution is $q_\kappa$, since $\kappa_i$ and $\kappa_j$ count the links other than the one towards A. It is independent for different nodes, therefore:
$$c = \frac{\langle\kappa_i\rangle\langle\kappa_j\rangle}{n z} = \frac{1}{n z}\left[\sum_\kappa \kappa q_\kappa\right]^2 = \frac{1}{n z}\left[\frac{\langle\kappa^2\rangle - \langle\kappa\rangle}{\langle\kappa\rangle}\right]^2$$
Multiplying and dividing by $z^2$ (with $z = \langle\kappa\rangle$):
$$c = \frac{z}{n}\left[\frac{\langle\kappa^2\rangle - \langle\kappa\rangle}{\langle\kappa\rangle^2}\right]^2$$
Adding and subtracting $\langle\kappa\rangle^2$ in the numerator:
$$c = \frac{z}{n}\left[\frac{\langle\kappa^2\rangle - \langle\kappa\rangle^2 + \langle\kappa\rangle^2 - \langle\kappa\rangle}{\langle\kappa\rangle^2}\right]^2$$
In this way the variance appears in the numerator. The coefficient of variation is defined as:
$$c_v = \frac{\sqrt{\mathrm{Var}}}{\mathrm{avg}} = \frac{\sqrt{\langle\kappa^2\rangle - \langle\kappa\rangle^2}}{\langle\kappa\rangle}$$
so its square appears in the clustering coefficient:
$$c_v^2 = \frac{\langle\kappa^2\rangle - \langle\kappa\rangle^2}{\langle\kappa\rangle^2}$$
Therefore:
$$c = \frac{z}{n}\left[c_v^2 + \frac{\langle\kappa\rangle - 1}{\langle\kappa\rangle}\right]^2$$
Since the clustering coefficient depends on the square of the coefficient of variation, the dominant term is the variance. In conclusion the variance is extremely important: it ensures high connectivity and introduces locality.
[Figure: the variance drives both the giant component and the clustering coefficient.]

Example
Applying these formulas to the Erdős-Renyi model:
$$c_v^2 = \frac{\mathrm{Var}\{\kappa\}}{\langle\kappa\rangle^2} = \frac{\langle\kappa\rangle}{\langle\kappa\rangle^2} = \frac{1}{z}$$
Therefore:
$$c = \frac{z}{n}\left[\frac{1}{z} + \frac{z-1}{z}\right]^2 = \frac{z}{n} \cdot 1 = \frac{np}{n} = p$$
Indeed, p is the probability that a link connects two nodes, so it is also the clustering coefficient.
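The clustering formula can be evaluated for any degree distribution given as a dict {κ: $P_\kappa$}. The sketch below, with hypothetical function names, computes both the form before the rewriting and the form with $c_v^2$, and checks that they agree. For a Poisson distribution it should recover c = z/n. The two-valued distribution is an invented example with the same mean (5) but a much larger variance, included only to show how strongly the variance inflates clustering.

```python
from math import exp, factorial


def clustering_direct(n, pk):
    """c = (1/(n z)) * ((<k^2> - <k>) / <k>)^2."""
    m1 = sum(k * p for k, p in pk.items())
    m2 = sum(k * k * p for k, p in pk.items())
    return ((m2 - m1) / m1) ** 2 / (n * m1)


def clustering_from_cv(n, pk):
    """Same quantity rewritten as c = (z/n) * [cv^2 + (<k> - 1)/<k>]^2."""
    m1 = sum(k * p for k, p in pk.items())
    m2 = sum(k * k * p for k, p in pk.items())
    cv2 = (m2 - m1**2) / m1**2
    return (m1 / n) * (cv2 + (m1 - 1) / m1) ** 2


n, z = 10_000, 5.0
poisson = {k: z**k * exp(-z) / factorial(k) for k in range(80)}
bimodal = {1: 0.9, 41: 0.1}  # hypothetical: mean 5, variance 144
for name, pk in (("Poisson", poisson), ("bimodal", bimodal)):
    print(name, clustering_direct(n, pk), clustering_from_cv(n, pk))
```

Both distributions have the same average degree. The high-variance one yields a clustering coefficient more than forty times larger than the Poisson value z/n = 5·10⁻⁴.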
2.4 Heavy-Tailed Distribution
The heavy-tailed (also called power-law) distribution is used to represent phenomena such as P2P systems, the topology of the Internet, how long a client stays connected, and social networks. What they all have in common is that their distribution does not decay exponentially, so they cannot be represented by a Poisson distribution. The probability of large values is therefore not negligible:
$$P_\kappa \simeq \alpha \kappa^{-\gamma}$$
where γ typically takes values 2 < γ < 3. In mathematical terms, such a distribution has a finite average but an infinite variance, since the second moment diverges:
$$\int_{\kappa_0}^{\infty} \kappa^2 P_\kappa \, d\kappa \to \infty$$
for any finite lower bound κ₀. This behaviour is not desirable, because both the small-world and the clustering properties depend largely on the variance. However, the distribution comes from measurements, and the tail is typically difficult to estimate precisely.

Scale-free property
The scale-free property says that the change κ → λκ does not alter the shape of the distribution: indeed $P_{\lambda\kappa} = \lambda^{-\gamma} P_\kappa$. However, the mean value is not very representative of the systems described above. Think of connection times: a few users keep very long connections, while most users have short ones.

2.5 Watts-Strogatz model
This model describes a family of random graphs obtained as an intermediate solution between pure random graphs and regular structures. This interpolation provides the peculiar properties of both families:
• regular structures (lattices): notion of locality (clustering);
• random graphs: small-world effect.
Starting from a regular structure (a ring, for example), a Watts-Strogatz graph is built by introducing randomness:
[Figure: ring lattice in which each node is connected to its nearest nodes on both sides.]
The connectivity of a given node (marked in blue in the picture) is:
• m nodes in the clockwise direction;
• m nodes in the counterclockwise direction.
Therefore, each node has degree 2m.
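The ring lattice just described, together with the rewiring step of the Watts-Strogatz construction explained next, can be sketched in a few lines. This is an illustrative version, not a reference implementation. The function names, n, m and the seed are hypothetical. A rewired link gets a new endpoint chosen uniformly among nodes that are not already neighbors, which keeps the number of links constant. At p = 0 the measured clustering should match the ring-lattice value 3(m − 1)/(2(2m − 1)) given in Section 2.5.1, and it should drop as p grows.

```python
import random
from itertools import combinations


def ring_lattice(n, m):
    """Ring where each node is linked to its m nearest nodes on each side (degree 2m)."""
    adj = [set() for _ in range(n)]
    for i in range(n):
        for step in range(1, m + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def rewire(adj, m, p, seed=None):
    """Watts-Strogatz step: rewire each clockwise link (i, i+step) with probability p."""
    rng = random.Random(seed)
    n = len(adj)
    for i in range(n):
        for step in range(1, m + 1):
            j = (i + step) % n
            if j not in adj[i] or rng.random() >= p:
                continue  # link already gone, or kept with probability 1 - p
            # New endpoint: any node other than i that is not already a neighbor of i.
            candidates = [k for k in range(n) if k != i and k not in adj[i]]
            if candidates:
                k = rng.choice(candidates)
                adj[i].discard(j)
                adj[j].discard(i)
                adj[i].add(k)
                adj[k].add(i)
    return adj


def clustering(adj):
    """Average, over all nodes, of the fraction of neighbor pairs that are linked."""
    total = 0.0
    for nbrs in adj:
        k = len(nbrs)
        if k < 2:
            continue
        linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += linked / (k * (k - 1) / 2)
    return total / len(adj)


n, m = 100, 3
for p in (0.0, 0.01, 0.1, 1.0):
    g = rewire(ring_lattice(n, m), m, p, seed=7)
    print(f"p={p}: clustering {clustering(g):.3f}")
print("ring-lattice formula:", 3 * (m - 1) / (2 * (2 * m - 1)))
```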
The average distance between nodes grows linearly with the number of nodes n in the network; shortcuts (as in Chord) can reduce it. The process to obtain a Watts-Strogatz graph is:
• for each node:
  ◦ take each clockwise link;
  ◦ rewire it randomly with probability p (or keep it with probability 1 − p).
The following picture shows this procedure:
[Figure: a ring lattice before and after random rewiring.]
The properties mentioned before (small-world effect and clustering) depend on p:
• if p is large, the system tends to a pure random graph (for p → 1 it tends to an Erdős-Renyi graph);
• if p is small, the system tends to a regular structure with high clustering (long fixed routes to reach the farthest nodes).

2.5.1 Clustering analysis
When p = 0, the clustering coefficient is:
$$c = \frac{3(m-1)}{2(2m-1)}$$
It therefore depends essentially on m, and it is very high: at least 0.5 for m ≥ 2 and tending to 3/4 for large m, while for Erdős-Renyi graphs it is of the order of 10⁻⁴. This means that two nodes are very likely to be neighbors if they have a common neighbor. In the following picture, the green nodes are neighbors and have a common neighbor, the blue node.
[Figure: ring lattice with two green neighbor nodes sharing the blue node as a common neighbor.]
This holds not only for one node but for all of them, so the result is very high locality. When p > 0:
$$c = \frac{3(m-1)}{2(2m-1)} (1-p)^3$$
so as p increases, the connectivity based on locality decreases.

2.5.2 Small-world analysis
The small-world property describes the distance between nodes. In regular structures the average distance depends on the number of nodes. For a grid:
• with 2 dimensions, the average distance is $O(\sqrt{n})$;
• with 3 dimensions, it is $O(\sqrt[3]{n})$;
in general, for a d-dimensional lattice, $l \sim O(n^{1/d})$.
Look at the following graph:
[Figure: behaviour of the Watts-Strogatz model as a function of p.]
In the top-left region, small values of p lead to a regular structure, while the bottom-right region describes random graphs.
In the center there is a region in which both the small-world property and clustering hold. Consider the ring. A few shortcuts (few with respect to the number of links) are enough for the small-world property to start holding, because they connect very distant nodes. As the number of shortcuts increases, their benefit decreases. It is better to introduce only a few of them, use them to reach the farthest regions, and then use the local connections to reach the destination. With shortcuts, the size of the regions obtained by splitting the ring is:
$$\frac{1}{p} \sim \frac{n}{np}$$
where both quantities grow linearly with n:
• n is the size of the space (number of nodes);
• np is the number of shortcuts introduced.
To ensure the small-world property:
$$\frac{1}{p} \ll n \;\Longrightarrow\; p \gg \frac{1}{n}$$
If the network is large, the small-world property is therefore ensured even with a small p. To guarantee clustering, p ≪ 1 is required.‡ In conclusion, to have both the small-world effect and clustering, it is necessary that:
$$\frac{1}{n} \ll p \ll 1$$
This model has been widely used to model P2P systems. In a P2P streaming system, for example, too much locality leads to poor performance: a chunk takes a very long time to reach the entire network, so the delay increases. To deal with this, neighbors are sometimes picked at random, which can be seen as a shortcut. The same happens in BitTorrent: to diversify the downloadable content, neighbors are not always selected by the tit-for-tat procedure but are sometimes selected at random.

2.6 Theory of evolving networks
This model describes the evolution of the network, i.e. how the overlay evolves in time. The algorithm is:
• define a graph with n final nodes;
• start with m₀ nodes, where m₀ < n (m₀ is the initial condition);
• at each step add one node: it takes n − m₀ steps to build the final topology.
The time evolution is characterized by the fact that, at each step, nodes have different degrees: depending on the policy adopted, the system can evolve differently. The simplest policy is to attach new nodes to those with a higher degree, which helps to reach more nodes with shorter paths.
‡ This condition is due to the factor $(1-p)^3$ in the clustering coefficient.

Definitions
• s: the time at which a node is introduced. It represents the node's age (older nodes have more chances to be well connected);
• $\kappa_s$: the degree of the node introduced at time s. It is described by a differential equation; to simplify the math it is assumed to be continuous, $\kappa_s(t)$;
• m: the number of links of each new node.
The evolution of the system is described by:
$$\frac{\partial \kappa_s(t)}{\partial t} = m \, \pi(\kappa_s(t)) \tag{2.2}$$
The increase of the degree depends on the number of links and, through π, on the degree itself. The function π(·) describes how new nodes are connected to the existing network. It is the connection policy and can be regarded as the term that drives the evolution of the system. At the beginning the degree is $\kappa_s(s) = m$.

Barabasi-Albert criterion
This approach states that scale-free networks are built with a preferential attachment criterion. The algorithm is:
• start with an initial graph;
• at each step attach a new node with m links;
• attach links preferentially to nodes according to their degree:
$$\pi(\kappa_s(t)) = \frac{\kappa_s(t)}{\sum_j \kappa_j(t)} \tag{2.3}$$
The term $\sum_j \kappa_j(t)$ is a normalization coefficient: the sum of the degrees of all nodes. Substituting (2.3) into (2.2) gives:
$$\frac{\partial \kappa_s(t)}{\partial t} = \frac{m \, \kappa_s(t)}{2mt + 2m_0\langle\kappa_s\rangle} \tag{2.4}$$
where:
• 2mt accounts for the links introduced so far (each new node adds m links, i.e. 2m link ends);
• $2m_0\langle\kappa_s\rangle$ accounts for the degrees already present in the initial graph, $\langle\kappa_s\rangle$ being the average degree at the beginning.
Taken as a whole, the denominator is the normalization coefficient of (2.3).
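The Barabasi-Albert growth rule can be simulated directly. The sketch below rests on assumptions of its own, not taken from these notes: the initial graph is a clique of m + 1 nodes, and each new node attaches m distinct links. Preferential attachment (2.3) is implemented by drawing uniformly from a list in which every node appears once per link end. A node is then chosen with probability proportional to its degree. The function name, n, m and the seed are hypothetical. The age effect derived next, $\kappa_s(t) \simeq m (t/s)^{1/2}$, should show up as early nodes being far better connected than late ones.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Grow a graph by preferential attachment, as in (2.3).

    Assumptions (illustrative): the initial graph is a clique of m + 1 nodes,
    and each new node attaches m distinct links.
    """
    rng = random.Random(seed)
    adj = {i: {j for j in range(m + 1) if j != i} for i in range(m + 1)}
    # Each node appears in 'ends' once per link end, so a uniform pick from
    # 'ends' selects node j with probability k_j / sum(k): preferential attachment.
    ends = [i for i in adj for _ in adj[i]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            ends.extend((new, t))
    return adj


n, m = 20_000, 3
g = barabasi_albert(n, m, seed=1)
degrees = [len(g[s]) for s in range(n)]
early = sum(degrees[10:110]) / 100  # nodes introduced at the beginning
late = sum(degrees[-100:]) / 100    # nodes introduced at the end
print(f"early nodes: {early:.1f}, late nodes: {late:.1f}, max degree: {max(degrees)}")
```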
Equation (2.4) shows that 2m new link ends are introduced at each step t: each new link contributes to the degree of two different nodes. At the beginning:
$$\kappa_s(s) = m, \qquad \langle\kappa_s\rangle = 2m$$
At the end:
$$\kappa_s(t) \simeq m \left(\frac{t}{s}\right)^{1/2} \quad \text{for } t \to \infty$$
This suggests that the degree grows as the square root of time. The denominator s is the time at which the node was introduced, so the degree is higher for older nodes: it depends on the age of the node. Consider a node s′ older than s, where s′ < s < t. The ratio is:
$$\frac{\kappa_{s'}(t)}{\kappa_s(t)} \simeq \left(\frac{s}{s'}\right)^{1/2}$$
For large values of t:
$$P_\kappa = 2m^2 \kappa^{-3}$$
so the probability that a node has degree κ follows a heavy-tailed distribution: the scale-free property is ensured. As for the small-world property and clustering:
$$l \sim \frac{\log n}{\log\log n}, \qquad c \sim \frac{m (\log n)^2}{8n}$$
The small-world property is expected, because there are a few very well connected nodes: the oldest ones. Clustering behaves as in the Erdős-Renyi model, decreasing with the number of nodes.

2.7 Summary scheme

Model | Small-world | Clustering
--- | --- | ---
ER | $l \simeq \frac{\log n}{\log z}$ | $c = p \simeq \frac{z}{n}$
RG with empirical distribution | $l = \frac{\log(n/z_1)}{\log(z_2/z_1)} + 1$ | $c = \frac{z}{n}\left[c_v^2 + \frac{z-1}{z}\right]^2$
WS (p = 0) | $l \sim n$: not ensured | $c = \frac{3(m-1)}{2(2m-1)}$: high clustering
WS (p > 0) | ensured | high clustering
BA | ensured | low clustering

For random graphs with an empirical distribution, both the small-world property and clustering depend on the variance; with a power law the scale-free property is ensured. For Watts-Strogatz, p should be chosen such that:
$$\frac{1}{n} \ll p \ll 1$$