Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
IEEE 802.1aq wikipedia , lookup
Distributed firewall wikipedia , lookup
Zero-configuration networking wikipedia , lookup
Deep packet inspection wikipedia , lookup
Recursive InterNetwork Architecture (RINA) wikipedia , lookup
Distributed operating system wikipedia , lookup
List of wireless community networks by region wikipedia , lookup
Overview of Peer-to-Peer Systems, Grids, and Clouds (Part 1) Unstructured P2P Systems COP 6390.all 02/21/2011 Anda Iamnitchi 1 Google trends: Grid computing (blue), peer-to-peer (red), cloud computing (orange) 2 What they have in common Real instances of distributed systems Passed the test of deployment and usability Large-scale distributed systems Interesting phenomena at large scale Scale: users, resources, geographical 3 Why Peer-to-Peer Systems? Wide-spread user experience Large-scale distributed application, unprecedented growth and popularity KaZaA – 389 millions downloads (1M/week) one of the most popular applications ever! Heavily researched in the last 10 years with results in: User behavior characterization Scalability Novel problems: reputation, trust, incentives for fairness Commercial impact eDonkey 3,108,909 FastTrack (Kazaa) 2,114,120 Gnutella 2,899,788 Cvernet 691,750 Filetopia 3,405 Number of users for file-sharing applications (estimate www.slyck.com, Sept ‘06) What Is a P2P System? Node Node Node Internet Node Node A distributed system architecture: No centralized control (debatable: Napster?) Nodes are symmetric in function (debatable: New Gnutella 5 protocol?) Large number of unreliable nodes Initially identified with music file sharing P2P Definition(s) A number of definitions coexist: Def 1:“A class of applications that takes advantage of resources — storage, cycles, content, human presence — available at the edges of the Internet.” Edges often turned off, without permanent IP addresses Def 2: “A class of decentralized, self-organizing distributed systems, in which all or most communication is symmetric.” 6 Lots of other definitions that fit in between The Promise of P2P Computing High capacity through parallelism: Many disks Many network connections Many CPUs Reliability: Many replicas: of data of network paths Geographic distribution Automatic configuration Useful in public and proprietary settings 7 Popularity since 2004 Britney Spears 1.00 p2p 0.40 Normalized and compared to the popularity of Britney Spears as shown by Google Trends 8 Napster: History Program for sharing files over the Internet History: 5/99: Shawn Fanning (freshman, Northeasten U.) founds Napster Online music service 12/99: first lawsuit 3/00: 25% UWisc traffic Napster 2000: est. 60M users 2/01: US Circuit Court of Appeals: Napster knew users violating copyright laws 7/01: # simultaneous online users: Napster 160K, Gnutella: 40K, Morpheus: 300K 9 Basic Primitives for File Sharing Join: How do I begin participating? Publish: How do I advertise my file(s)? Search: How do I find a file? Fetch: How do I retrieve a file? 10 Napster: How It Works napster.com • Client-Server: Use central server to locate files • Peer-to-Peer: Download files directly from peers 11 Napster 1. File list is uploaded (Join and Publish) napster.com users 12 Napster 2. User requests search at server (Search). napster.com Request and results user 13 Napster napster.com 3. User pings hosts that apparently have data. Looks for best transfer rate. ping ping user 14 Napster 4. User retrieves file (Fetch) napster.com Download file user 15 Lessons Learned from Napster Strengths: Decentralization of Storage Every node “pays” its participation by providing access to its resources physical resources (disk, network), knowledge (annotations), ownership (files) Every participating node acts as both a client and a server (“servent”): P2P Decentralization of cost and administration = avoiding resource bottlenecks Weaknesses: Centralization of Data Access Structures (Index) Server is single point of failure Unique entity required for controlling the system = design bottleneck Copying copyrighted material made Napster target of legal attack increasing degree of resource sharing and decentralization 16 Centralized System Decentralized System Gnutella: File-Sharing with No Central Server 17 Gnutella: History Developed in a 14 days “quick hack” by Nullsoft (winamp) Originally intended for “exchange of cooking recipes” Evolution of Gnutella Published under GNU General Public License on the Nullsoft web server Taken off after a couple of hours by AOL (owner of Nullsoft): too late Gnutella protocol was reverse engineered from the original Protocol published Third-party clients were published and Gnutella started to spread Many iterations to fix poor initial design High impact: Many versions implemented Many different designs 18 Lots of research papers/ideas Gnutella: Search in an Unstructured Overlay I have file A. I have file A. Reply Flooding Query Where is file A? 19 Gnutella: Overview Join: on startup, client contacts a few other nodes; these become its “neighbors” Initial list of contacts published as gnutellahosts.com:6346 Outside the Gnutella protocol specification Default value for number of open connections (neighbors): C = 4 Publish: no need Search: Flooding: ask neighbors, who ask their neighbors, and so on... Each forwarding of requests decreases a TTL. Default: TTL = 7 When/if found, reply to sender Drop forwarding requests when TTL expires One request leads to 2 * i 0 C * (C 1)i 26,240 TTL Back-propagation in case of success (Why?) Fetch: get the file directly from peer (HTTP) messages Gnutella: Protocol Message Types Type Ping Pong Query QueryHit Push Description Contained Information Announce availability and probe for None other servents Response to a ping IP address and port# of responding servent; number and total kb of files shared Search request Minimum network bandwidth of responding servent; search criteria Returned by servents that have IP address, port# and network bandwidth of the requested file responding servent; number of results and result set File download requests for Servent identifier; index of requested file; IP servents behind a firewall address and port to send file to What would you ask about a Gnutella network? 22 Gnutella: Tools for Network Exploration Eavesdrop traffic - insert modified node into the network and log traffic. Crawler - connect to active nodes and use the membership protocol to discover membership and topology. 23 Gnutella: Heterogeneity All Peers Equal? (1) 1.5Mbps DSL 1.5Mbps DSL 56kbps Modem 1.5Mbps DSL 10Mbps LAN 1.5Mbps DSL 56kbps Modem 56kbps Modem 24 Gnutella Network Structure: Improvement Gnutella Protocol 0.6 Two tier architectures of ultrapeers and leaves Ultrapeers Leaves Data transfer (file download) Control messages (search, join, etc) 25 Déjà vu? Gnutella Protocol 0.6 Two tier architectures of ultrapeers and leaves Ultrapeers Leaves Data transfer (file download) Control messages (search, join, etc) 26 Gnutella: Free Riding All Peers Equal? (2) More than 25% of Gnutella clients share no files; 75% share 100 files or less Conclusion: Gnutella has a high percentage of free riders If only a few individuals contribute to the public good, these few peers effectively act as centralized servers. Outcome: Significant efforts in building incentive-based systems 27 BitTorrent? Adar and Huberman (Aug ’00) Flooding in Gnutella: Loops? Seen request already 28 Improvements of Message Flooding Expanding Ring start search with small TTL (e.g. TTL = 1) if no success iteratively increase TTL (e.g. TTL = TTL +2) k-Random Walkers forward query to one randomly chosen neighbor only, with large TTL start k random walkers random walker periodically checks with requester whether to continue Experiences (from simulation) adaptive TTL is useful message duplication should be avoided flooding should be controlled at fine granularity 29 Gnutella Topology (Mis)match? 30 Gnutella: Network Size? Explosive growth in 2001, slowly shrinking thereafter • High user interest – Users tolerate high latency, low quality results • Better resources – DSL and cable modem nodes grew from 24% to 41% over first 6 months. 31 Is Gnutella a Power-Law Network? Power-law networks: the number of links per node follows a power-law distribution N = L-k Num. of nodes (log scale) 10000 Examples of power-law networks: November 2000 1000 – – – – – 100 10 The Internet at AS level In/out links to/from HTML pages Airports US power grid Social networks 1 1 10 100 Number of links (log scale) Implications: High tolerance to random node failure but low reliability 32 when facing of an ‘intelligent’ adversary Network Resilience Partial Topology Random 30% die Targeted 4% die from Saroiu et al., MMCN 2002 33 Is Gnutella a Power-Law Network? (Later Data) Later, larger networks display a bimodal distribution Implications: – High tolerance to random node failures preserved – Increased reliability when facing an attack. Number of nodes (log scale) 10000 May 2001 1000 From Ripeanu, Iamnitchi, Foster, 2002 34 100 10 1 1 10 100 Number of links (log scale) Discussion on Unstructured P2P Networks Performance Search latency: low (graph properties) Message Bandwidth: high improvements through random walkers, but essentially the 35 whole network needs to be explored Storage cost: low (only local neighborhood) Update cost: low (only local updates) Resilience to failures good: multiple paths are explored and data is replicated Qualitative Criteria search predicates: very flexible, any predicate is possible global knowledge: none required peer autonomy: high BitTorrent 36 BitTorrent Components Torrent File Metadata of file to be shared Address of tracker List of pieces and their checksums Tracker Lists peers interested in the distribution of the file Peers Clients interested in the distribution of the file Can be “seeds” or “leachers” 37 A BitTorrent Swarm • A “seed” node has the file • A “tracker” associated with the file • A “.torrent” meta-file is built for the file: identifies the address of the tracker node • The .torrent file is published on web • File is split into fixed-size segments (e.g., 256KB) 38 Choking Algorithm Each connected peer is in one of two states Choked: Download requests by a choked peer are ignored Unchoked: Download requests by an unchoked peer are honored Choking occurs at the peer level Each peer has a certain number of unchoke slots 4 regular unchoke slots (per BitTorrent standard) 1 optimistic unchoke slot (per BitTorrent standard) Choking Algorithm Peers unchoke connected peers with best service rate Service rate = rolling 20 second average of its upload bandwidth Optimistically unchoking peers prevents a static set of unchoked 39 peers The choking algorithm runs every 10 seconds Peers optimistically unchoked every 30 seconds New peers are 3 times more likely to be optimistically unchoked Piece Selection Random First Piece Piece download at random Algorithm used by new peers Rarest Piece First Ensures > 1 distributed copies of a piece Increases interest of connected peers Increases scalability Random Piece vs. Rarest Piece Rarest has probabilistically high download time New peers want to reduce download time but also increase their interest 40 BitTorrent: Overview Join: nothing Just find that there is a community ready to host your tracker Publish: Create tracker, upload .torrent metadata file Search: For file: nothing the community is supposed to provide search tools For segments: exchange segment IDs maps with other peers. Fetch: exchange segments with other peers (HTTP) 41 Gnutella vs. BitTorrent: Discussion Architecture Decentralization? System properties Reliability? Scalability? Fairness? Overheads? Quality of Service Search coverage for content? Ability to download content fast? 42