* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Napster & Gnutella
IEEE 802.1aq wikipedia , lookup
Computer network wikipedia , lookup
Network tap wikipedia , lookup
Distributed operating system wikipedia , lookup
Recursive InterNetwork Architecture (RINA) wikipedia , lookup
Zero-configuration networking wikipedia , lookup
Deep packet inspection wikipedia , lookup
Distributed firewall wikipedia , lookup
Cracking of wireless networks wikipedia , lookup
Wake-on-LAN wikipedia , lookup
Real-Time Messaging Protocol wikipedia , lookup
Airborne Networking wikipedia , lookup
Packet switching wikipedia , lookup
Napster & Gnutella An Overview 1 About Napster Distributed application allowing users to search and exchange MP3 files. Written by Shawn Fanning in mid 1999, and remained operational for two years. Most Innovative Software (Computer Shopper, Dec 2000) Best Cyber Tech 2000 (Time Magazine) Sparked off a host of lawsuits for copyright infringement. 2 Napster Principles The actual MP3 files are stored on client machines. A central server holds metainformation about online Index Server clients along with the files held by them. Clients contact Napster Server to locate files, and contact each other to download them 3 Remarks Simple but elegant Central index server is a bottleneck and a point of failure Replication / chaining of servers eased the bottleneck to some extent. 4 What is KaZaA? Extension of Napster - uses the FastTrack technology. Powerful nodes (supernodes) act as local index servers, and client queries are propagated to other supernodes. Step in the right direction. Second generation P2P Supernode Supernode Supernode 5 What is Gnutella? ? ? ? Gnutella is a protocol for distributed search • peer-to-peer comm • decentralized model • No third party lookup Two stages : 1. 2. Join Network … later Use Network 1. Discover other peers 2. Search other peers 6 The Jargon Each servent is both a client and a server Servent: A Gnutella node. 2 Hops Hops: a hop is a pass through an intermediate node 1 Hop Horizon/TTL: how many hops a packet can go before it dies (default setting is 7 in Gnutella) 7 Gnutella Messages •Ping: used to actively discover hosts on the network. A servent receiving a Ping descriptor is expected to respond with one or more Pong descriptors. •Pong: the response to a Ping. Each Pong packet contains a Globally Unique Identifier (GUID) plus address of servent and information regarding the amount of data it is making available to the network) •Query: the primary mechanism for searching the distributed network. A servent receiving a Query descriptor will respond with a QueryHit if a match is found against its local data set. •QueryHit: the response to a Query: contains IP address, GUID and search results •Push: allows downloading from firewalled servents 8 Gnutella Scenario Step 0: Join the network Step 1: Determining who is on the network • "Ping" packet is used to announce your presence on the network. • Other peers respond with a "Pong" packet. • Also forwards your Ping to other connected peers • A Pong packet also contains: • an IP address • port number • amount of data that peers is sharing • Pong packets come back via same route Step 2: Searching •Gnutella "Query" ask other peers if they have the file you desire (and have an acceptably fast network connection). • A Query packet might ask, "Do you have any content that matches the string ‘Homer"? • Peers check to see if they have matches & respond (if they have any matches) & send packet to connected peers • Continues for TTL Step 3: Downloading • Peers respond with a “QueryHit” (contains contact info) • File transfers use direct connection using HTTP protocol’s GET method • When there is a firewall a "Push" packet is used – reroutes via Push path 9 Remarks Simple idea - But lacks scalability, since bandwidth is wasted. Sometimes, objects that are present may not be located due to limited TTL. Various improved search strategies have been proposed. 10 Efficient search: Gia approach To improve the scalability of Gnutella on a heterogeneous Platform, Gia used the following approach: Flooding was replaced by (biased) random walk Topology adaptation: high-capacity nodes maintain more clues via one hop replication. They should be easily reachable Flow Control to prevent overloading 11 Search via random walk Gnutella is a protocol for distributed search • peer-to-peer • decentralized model Two stages : 1. 2. Join Network … later Use Network 1. Discover other peers 2. Search other peers via random walk 12 Topology control Goals are (1) high capacity nodes have high degrees and (2) low capacity nodes are within a short distance from the high capacity nodes. S = satisfaction level. S=0 means dissatisfied (# neighbors < max # according to its capacity), S=1 means satisfied. To increase the level of satisfaction, each host picks a node from its cache as neighbor if it is not already a neighbor, or not marked dead. 13 Topology Control {To add a neighborY} If # of neighbors + 1 < max then accept Y as a neighbor {To drop an existing neighbor for the sake of adding Y} pick the highest degree current neighbor Z; If Y has higher capacity or Y has fewer neighbors than Z then drop Z, accept Y else reject Y {This ensures we do not drop poorly connected neighbors} 14 One hop replication Each node keeps track of the indices of the files at its immediate neighbors. As a result, high capacity/ high degree nodes can provide useful response to a a large number of queries. 15 Biased random walk P=2/10 P=5/10 P=3/10 How does it help? 16 Analysis of random walk One can deploy k random walkers to expedite the search p = number of nodes that have the desired object O = overhead, i.e. distance traveled by the walkers to complete the search (that can be successful or unsuccessful) T= TTL (time to live for each walker) Expected overhead = 1-(1-p)T-1 + (1-p)T-1 .k p 17 Analysis of random walk Probability of success = 1-(1-p) Expected overhead = kT 1-(1-p)T-1 + (1-p)T-1 . k p As k increases, the probability of success increases, the overhead increases, but the delay decreses! There is a tradeoff here. 18