Introduction
- Widespread unstructured P2P network
  - Currently between 200,000 and 300,000 hosts
- Ideal as a research test bed
  - Large-scale network demonstrates the need for scalable P2P protocols
- A Gnutella client has 4-10 TCP connections to other peers
- UDP is used for signaling traffic
- An “ultra-peer” state was created to gain some of the benefits of server-based networks

Introduction (Cont.)
- “Ultra-peer” status is self-assigned by powerful peers and provides some extra functionality compared to ordinary nodes
- There are many freely available Gnutella clients. Some of the most popular are:
  - Limewire
  - Bearshare
  - Morpheus
  - Shareaza
    - It has the fastest-growing number of users
    - It has a very pleasant GUI and also connects to eDonkey and BitTorrent

Its Main Features
- This protocol underlies much of the current file-sharing activity on the Internet.
- It is based on TCP/IP and HTTP.
- A file sharing network (fsn) is a group of machines that exchange files using Gnutella.
- To connect to a Gnutella network, you need the IP address of a single machine that is already part of the network.

Gnutella
- Peer-to-peer indexing and searching service.
- Peer-to-peer point-to-point file downloading using HTTP.
- A Gnutella node needs a server (or a set of servers) to “start up”
  - gnutellahosts.com provides a service with reliable initial connection points
  - But this introduces a new single point of failure!

Gnutella vs. Napster
- Like Napster, distributed file storage and transmission
- Added the ability to distribute file discovery
  - Ask your direct peers who else they know
  - Query those machines directly

Concepts of Unstructured Services
- There are many interesting ideas being explored:
- Breaking shared files into many parts to both increase bandwidth (parallel I/O) and increase security of content, as no one site can access files without cooperation from its peers
- This type of technology makes censorship very hard.
- MojoNation has a load balancing and scheduling algorithm in the form of micropayments that reward those who contribute most to the community of peers.
- Gnutella, which is a family of related products, is usually described as a P2P search engine, because its interface is nearer to that of a search engine than to a Web file system.

Characteristics
- Gnutella is a distributed system for file sharing
  - provides means for network discovery
  - provides means for file searching and sharing
- Defines a network at the application level
- Employs the concept of peer-to-peer
  - all hosts are equal (symmetry)
  - there is no central point
- Search is anonymous, but IP addresses are revealed when downloading

Connection
- Once you establish a connection to the first servent, you announce your presence.
- The first servent passes that message on to all the servents it is connected to, and so on.
- These servents all reply with data about themselves
  - how many files each is sharing
  - how many kilobytes the files take up
- This already adds up to a lot of traffic!

Gnutella File Sharing Model
- Users register files with network neighbors
- Search across the network to find files to copy
- Does not require a centralized broker (as Napster does)
[Figure: peers Alice, Bob, Carol and Ted. The question “Where is Final Fantasy 4?” is passed between peers, the answer “Carol has it” is returned, and Final Fantasy 4 is copied from Carol.]

Resource Discovery: Decentralized File-sharing Model
- Peers have the same capability and responsibility
- The communication between peers is symmetric
- There is no central directory server
- Index on the metadata of shared files is stored locally among all peers
- Gnutella, FreeServe, MojoNation

Resource Discovery: Decentralized (Cont.)
- Every user acts as a client, a server or both (servent)
- A user connects to the framework and becomes a member of the community, allowing others to connect through him/her
- Users speak directly to other users with no intermediate or central authority
- No one entity controls the information that passes through the community

Resource Discovery: Advantages and Disadvantages
- Advantages:
  - Inherent scalability
  - Avoidance of the “single point of litigation” problem
  - Fault tolerance
- Disadvantages:
  - Slow information discovery
  - More query traffic on the network

Unstructured Decentralized Services
- There are some 200 available Napster clones to support this area: http://www.ultimateresourcesite.com/napster/main.htm
- Currently the most popular is Imesh [http://www.imesh.com], which has some 2 million users and can share any type of file.
- Some of the best-known file sharing systems are
  - MojoNation [http://www.mojonation.net]
  - Freenet [http://freenet.sourceforge.net/]
  - Gnutella [http://gnutella.wego.com/]
- These three are not server-based like Napster, but rather support waves of software agents expressing resource availability and interest, propagating among informal dynamic networks of peers

DFS Variations
DFS: Distributed File Sharing

                       | FTP                 | NFS                | Web                          | Napster (Shawn Fanning)         | Gnutella (Gene Kan @ AOL)            | Freenet (Ian Clark)
-----------------------|---------------------|--------------------|------------------------------|---------------------------------|--------------------------------------|-------------------------------------
Purpose                | Remote file sharing | Local file sharing | Remote file sharing (portal) | File-sharing community (portal) | Decentralized file sharing community | Decentralized anonymous file sharing
Moderated?             | Yes                 | Yes                | Yes                          | Yes                             | No                                   | No
Access control?        | Yes                 | Yes                | No                           | No                              | No                                   | No
Search                 | Server-based        | Server-based       | Server-based                 | Server-based                    | p2p                                  | p2p
File transfer          | Client/server       | Client/server      | Client/server                | p2p                             | p2p                                  | p2p
File transfer protocol | ftp                 | nfs                | http, caching                | proprietary                     | http                                 | Proprietary, encrypted, caching

P2P File Sharing
- Benefits
  - Cost sharing
  - Resource aggregation
  - Improved scalability/reliability
  - Anonymity/privacy
  - Dynamism
- Management/Placement Challenges
  - Per-node state
  - Bandwidth usage
  - Search time
  - Fault tolerance/resiliency

Gnutella in Details
- Share any type of files (not just music)
- Decentralized search, unlike Napster
- You ask your neighbors for files of interest
- Neighbors ask their neighbors, and so on
  - TTL field quenches messages after a number of hops
- Users with matching files reply to you
Figure from http://computer.howstuffworks.com/file-sharing.htm

The Gnutella protocol (v0.4)
- PING – Notify a peer of your existence
- PONG – Reply to a PING request
- QUERY – Find a file in the network
- RESPONSE – Give the location of a file
- PUSHREQUEST – Request a server behind a firewall to push a file out to a client.

Joining Gnutella Network
- The new node connects to a well-known ‘Anchor’ node.
- It then sends a PING message to discover other nodes.
- PONG messages are sent in reply from hosts offering new connections with the new node.
- Direct connections are then made to the newly discovered nodes.
[Figure: Gnutella Network. PING and PONG messages exchanged between the new node, anchor node A and other peers.]

Properties of the Flooding
- Searching by flooding:
  - If you don’t have the file you want, query 7 of your partners.
  - If they don’t have it, they contact 7 of their partners, for a maximum hop count of 10.
  - Requests are flooded, but there is no tree structure.
  - No looping, but packets may be received twice
- Note: Play the Gnutella animation at: http://www.limewire.com/index.jsp/p2p

Query flooding
- Gnutella
  - no hierarchy
  - use bootstrap node to learn about others
  - join message
- Send query to neighbors
- Neighbors forward query to all attached neighbors (floods)
- If queried peer has object, it sends message back to querying peer
[Figure labels: query, join]

More on query flooding
- Pros
  - peers have similar responsibilities: no group leaders
  - highly decentralized
  - no peer maintains directory info
- Cons
  - excessive query traffic
  - query radius: a query may not reach content even when it is present in the network
  - bootstrap node still required
  - maintenance of overlay network

About the Flooding
- There is nothing that stops a servent from flooding its network region with messages.
- Cost of maintaining network
- Cost of searching file

Breadth-First Search (BFS)
[Figure legend: source, forward query, processed query, found result, forward response]

Resource Discovery: Pros and Cons
- Benefits:
  - Peers speak directly with no central authority
  - Nobody owns the Gnutella network and nobody can shut it down
  - No central point of failure
  - Limited per-node state
  - Isolated node failure can quickly and automatically be worked around
  - Free loading
  - Scalability
- Drawbacks:
  - Searches are less effective and can be slow
  - Bandwidth intensive
- Gnutella network evolving to include “controlled decentralization” (limewire, bearshare, toadnode)

Searching for a File
- A node broadcasts its QUERY to all its peers, who in turn broadcast to their peers.
- Nodes route QUERYHITs, containing file location details, along the QUERY path back to the sender.
- To download files, a direct connection is made using details of the host in the QUERYHIT messages.
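
The search procedure described above (flood the QUERY hop by hop, stop at the TTL, drop duplicates, route each QUERYHIT back along the path the QUERY took) can be sketched as a small simulation. This is an illustrative model only, not the Gnutella wire protocol: the function name `flood_query`, the five-peer `graph` and the `files` table are hypothetical, and duplicate detection is modeled with a per-query "first heard from" table, whereas real servents recognize duplicates by message identifier.

```python
from collections import deque

def flood_query(graph, files, source, keyword, ttl):
    """Flood a QUERY breadth-first from `source` for at most `ttl` hops.

    Returns (hits, messages): `hits` maps each peer holding `keyword` to the
    reverse path a QUERYHIT would follow back to `source`; `messages` counts
    every QUERY transmission, including duplicates that are dropped.
    """
    came_from = {source: None}          # neighbor each peer first heard the query from
    frontier = deque([(source, ttl)])
    messages = 0
    hits = {}
    while frontier:
        peer, remaining = frontier.popleft()
        if peer != source and keyword in files.get(peer, ()):
            # The QUERYHIT retraces the QUERY path hop by hop.
            path = [peer]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            hits[peer] = path
        if remaining == 0:
            continue                    # TTL exhausted: do not forward further
        for neighbor in graph[peer]:
            if neighbor == came_from[peer]:
                continue                # never send the query back to its sender
            messages += 1
            if neighbor in came_from:
                continue                # duplicate: received a second time, dropped
            came_from[neighbor] = peer
            frontier.append((neighbor, remaining - 1))
    return hits, messages

# Hypothetical overlay: A is the querying peer, E holds the file.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
files = {"E": {"song"}}

hits, messages = flood_query(graph, files, "A", "song", ttl=3)
print(hits, messages)
```

With a TTL of 3 the query reaches E and the hit travels back via D and B; with a TTL of 2 the same call finds nothing, which illustrates the "query radius" limitation listed among the cons of flooding.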
[Figure: Gnutella Network. QUERY messages flood outward and QUERY HIT messages return along the same path.]

The Cooperation Spectrum

Free Riding
- File sharing networks rely on users sharing data
- Two types of free riding
  - Downloading but not sharing any data
  - Not sharing any interesting data
- On Gnutella
  - 15% of users contribute 94% of content
  - 63% of users never responded to a query
    - Didn’t have “interesting” data
- Data from E. Adar and B.A. Huberman (2000), “Free Riding on Gnutella”

Example: GNUTELLA

Summary of Gnutella’s Features
- Decentralized
  - No single point of failure
  - Not as susceptible to denial of service
  - Cannot ensure correct results
- Flooding queries
  - Search is now distributed but still not scalable

Initial Problems and Fixes
- Freeloading: WWW sites offering search/retrieval from the Gnutella network without providing file sharing or query routing
  - Block file-serving to browser-based non-file-sharing users
- Prematurely terminated downloads:
  - Software bugs
  - long download times over modems
  - modem users run a Gnutella peer only briefly (Napster problem also!), or any user becomes overloaded
  - fix: peer can reply “I have it, but I am busy. Try again later”

Initial Problems and Fixes 2
- 2000: avg size of reachable network only 400-800 hosts. Why so small?
  - modem users: not enough bandwidth to provide search routing capabilities: routing black holes
- Fix: create peer hierarchy based on capabilities
  - previously: all peers identical, most modem black holes
  - connection preferencing:
    - favors routing to well-connected peers
    - favors reply to clients that themselves serve a large number of files: prevents freeloading
  - Limewire gateway functions as a Napster-like central server on behalf of other peers (for searching purposes)

Gnutella Enhancements
- Pings/Pongs can consume up to 50% of bandwidth
- Solutions:
  - Pong Limiting
  - Pong Caching
  - Ping Multiplexing
- http://www.limewire.com/index.jsp/pingpong

Gnutella Enhancements 2
- Cache query responses
- Results
- Evolving Protocol
- Gnutella Developer Forum
- UltraPeers
- Alternative query routing algorithms

Can Heterogeneity Make Gnutella Scale?
- Ideas
  - Replace query flooding with multiple random walks
  - Proactive replication
    - #replicas proportional to sqrt(request rate)
- Result: Two orders of magnitude improvement in terms of query time, per-node load and message traffic

Can Heterogeneity Make Gnutella Scale? 2
- Gnutella assumption: All peers are equal
- Not true! Heterogeneity among P2P peers (dial-up users vs. college users)
- Evolve topology to match node capacities
- Use random walks over this topology

Can Heterogeneity Make Gnutella Scale? 3
- Solution outline
  - c_i: node capacity; in[j,i]: messages from j->i; out[i,j]: messages i->j
  - Init in[i,j] = out[i,j] = 0, OutMax[i,j] = c_i/d_i
  - Update according to the messages received/sent
  - Check if overloaded
  - If so, redirect a high-input neighbor to a neighbor with high OutMax (spare capacity)
    - Intuitively, take yourself out of the loop
  - If no such node can be found, ask the neighbor to throttle back
- Result: Average query length reduces from 70 to 2-9 hops, depending on topology

Measurement Results
Who is sharing what?
August 2000

The top             | Share     | As percent of whole
--------------------|-----------|--------------------
333 hosts (1%)      | 1,142,645 | 37%
1,667 hosts (5%)    | 2,182,087 | 70%
3,334 hosts (10%)   | 2,692,082 | 87%
5,000 hosts (15%)   | 2,928,905 | 94%
6,667 hosts (20%)   | 3,037,232 | 98%
8,333 hosts (25%)   | 3,082,572 | 99%

Problems With Gnutella
- Protocol scalability
  - Message broadcast technique imposes limitations on the network size
  - packets per message = \sum_{i=0}^{TTL} noPeers^{i}
  - In November 2000 dial-up bandwidth barrier reached
- Overlay network efficiency
  - Random selection of peers results in inefficient use of the underlying network
  - Redundant traffic generated on the Internet

Heterogeneous connection qualities of Gnutella
- 35% have upstream bottleneck bandwidth of at least 100 Kbps
- only 8% have at least 10 Mbps bandwidth
- 22% have bandwidth of 100 Kbps or less

Number of Shared Files

Why Look at Gnutella
- Widespread unstructured P2P network
  - Currently between 200,000 & 300,000 hosts
  - 2006: still heavily in use by about 2 million users
- Gnutella clients (among others):
  - LimeWire
  - Morpheus
  - BearShare
  - OpenCola
  - Shareaza
    - It has the fastest-growing number of users
    - It has a very pleasant GUI and also connects to eDonkey and BitTorrent
- Ideal as a research test bed
  - Large-scale network demonstrates the need for scalable P2P protocols

Limewire: Improvement on Gnutella
- Creation of a peer hierarchy based on capabilities
  - previously: all peers identical, most modem black holes
  - connection preferencing:
    - favors routing to well-connected peers
    - favors reply to clients that themselves serve a large number of files: prevents freeloading
  - Limewire gateway functions as a Napster-like central server on behalf of other peers (for searching purposes)

Limewire
- The Limewire P2P file sharing program connects to the Gnutella P2P network
- Limewire client software is widely recognized for its clean user interface that does not contain adware
- Sometimes billed as the “fastest file sharing program”, Limewire claims to offer relatively good search and download performance
- Free Limewire software downloads are available for
  Windows, Linux and Macintosh operating systems
- Limewire Pro pay clients also exist

BearShare
- The BearShare P2P file sharing program is a popular free software client for the Gnutella P2P network
- Both free and pay downloads of BearShare file sharing programs exist

Shareaza
- Shareaza is an up-and-coming P2P file sharing program
- This client offers an extremely powerful search engine capable of connecting to multiple popular P2P networks, including eDonkey, BitTorrent and Gnutella
- Shareaza file sharing software includes intelligence for detecting fake and/or corrupted files
- The free Shareaza download also contains no ads or spyware
- As the installed base of Shareaza client users grows, expect Shareaza to become an even better P2P file sharing program

Anonymous?
- The person you are getting the file from knows who you are
  - That’s not anonymous.
- Other protocols exist where the owner of the files doesn’t know the requester.
- Peer-to-peer anonymity exists

Summary
- peer-to-peer networking: applications connect to peer applications
- focus: decentralized method of searching for files
- each application instance serves to:
  - store selected files
  - route queries (file searches) from and to its neighboring peers
  - respond to queries (serve file) if file stored locally
- Gnutella history:
  - 3/14/00: release by AOL, almost immediately withdrawn
  - too late: 23K users on Gnutella at 8 am this AM
  - many iterations to fix poor initial design (poor design turned many people off)
- What we care about:
  - How much traffic does one query generate?
  - How many hosts can it support at once?
  - What is the latency associated with querying?
  - Is there a bottleneck?
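
As a rough handle on the first question, the traffic of one flooded query can be bounded with a geometric series: if every peer forwards the query to the same number of partners and no duplicates are suppressed, hop i contributes fan_out^i packets. The sketch below assumes exactly that idealized model; the function name `flood_packets` and the uniform fan-out are assumptions for illustration, while the figures of 7 partners and a maximum hop count of 10 are the ones quoted earlier in the slides.

```python
def flood_packets(fan_out, ttl):
    """Upper bound on packets for one flooded message.

    Sums fan_out**i for i = 0..ttl, i.e. assumes every peer forwards to
    `fan_out` partners at every hop and nothing is dropped as a duplicate.
    """
    return sum(fan_out ** i for i in range(ttl + 1))

# Figures quoted in the slides: 7 partners, maximum hop count of 10.
print(flood_packets(7, 10))
```

For 7 partners and 10 hops the bound is roughly 330 million packets, far more than the few hundred thousand hosts in the network, so in practice most transmissions would be duplicates or would never happen because the overlay is exhausted first. The bound is still useful for seeing how sharply traffic grows with each extra hop of TTL.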