* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download ppt1 - People.cs.uchicago.edu
Survey
Document related concepts
Computer network wikipedia , lookup
Cracking of wireless networks wikipedia , lookup
Network tap wikipedia , lookup
IEEE 802.1aq wikipedia , lookup
Piggybacking (Internet access) wikipedia , lookup
SIP extensions for the IP Multimedia Subsystem wikipedia , lookup
Distributed firewall wikipedia , lookup
Recursive InterNetwork Architecture (RINA) wikipedia , lookup
Deep packet inspection wikipedia , lookup
Airborne Networking wikipedia , lookup
Everything2 wikipedia , lookup
Transcript
CSPP 54001: Large-Scale Networked Systems Week 5: P2P Technologies and Applications Matei Ripeanu P2P Definition(s) A number of definitions coexist: Def 1: “A class of applications that takes advantage of resources — storage, cycles, content, human presence — available at the edges of the Internet.” Edges often turned off, without permanent IP addresses Def 2: “A class of decentralized, self-organizing distributed systems, in which all or most communication is symmetric.” Lots of other definitions that fit in between CSPP 54001 – February 7, 2003 Week 5: P2P Definitions Impact Uses and Examples Principles More in-depth Case Studies Gnutella CDN: P2P, Web and Akamai traffic analysis DHTs Your Role CSPP 54001 – February 7, 2003 P2P Impact: Widespread adoption KaZaA – 170 millions downloads (3.5M/week) the most popular application ever! Number of users for file-sharing applications (www.slyck.com, FastTrack 4,114,120 2/4/2003) iMesh 1,375,199 eDonkey DirectConnect Blubster Gnutella CSPP 54001 –Cvernet February 7, 2003 569,097 136,552 97,128 92,678 91,750 P2P Impact (2): Huge traffic P2P generated traffic now dominates the Internet load Internet2 traffic statistics Cornell.edu (March ’02): 60% P2P UChicago estimate (March100% ‘01): Gnutella control 90% traffic about 1% of all Internet traffic. 80% 70% 60% Other 50% Data transfers 40% Unidentified 30% File sharing 20% 10% 0% CSPP 54001 – February 7, 2003 Feb.'02 Aug.'02 Feb.'03 P2P Impact (3) Shows a huge pool of underutilized resources that can be put to work Seti@Home statistics Users Last 24 Hours 4,236,090 2,365 764M 1.13M Total CPU time 1.3 M years 1.3 K years Floating Point Operations 2.564760e+21 4.441277e+18 (51.40 TeraFLOPs/sec) Results received Total Entropia statistics : CSPP 54001 – February 7, 2003 P2P Impact (4) Might force a few companies to change their business models Data copying and distribution carries zero almost cost now this might impact copyright laws New research domain grants and PhD theses CSPP 54001 – February 7, 2003 Week 5: P2P Definitions Impact Applications Mechanisms More in-depth Case Studies Your Role CSPP 54001 – February 7, 2003 Applications: Number crunching Examples: Seti@Home, Entropia, UnitedDevices, DistributedScience, many others Approach suitable for a particular class of problems. Some characteristics (for Seti@Home): Massive parallelism Low bandwidth/computation ratio Fixed-rate data processing task Error tolerance Users do donate *real* resources What are the problems? $1.5M / year extra consumed power Centralized. Does it scale? How to prevent cheating? How to extend the model to problems CSPP 54001 – February 7,that 2003 are not massively parallel Applications: File sharing The ‘killer application’ to date Too many to list them all: Napster, FastTrack (KaZaA, KazaaLite), Gnutella (LimeWire, Morpheus, BearShare), iMesh CSPP 54001 – February 7, 2003 Applications: Content distribution uServ project: your own Web Server at no (or little) cost Goal: Provide a system and software that users run to serve content on the web with their own machines Alternatives: Run your own webserver Hosting Services (free or paid) Other P2P applications Not just a webserver, but a webserving community: Users cooperatively improve availability (through site replication/mirroring) Users cooperatively liberate fire-walled content (through P2P proxying/relaying) CSPP 54001 – February 7, 2003 Applications: Content Distribution Streaming: the user plays the data as as it arrives Possible solution: If A is the first user, A gets the stream from the server Else, A can get the stream from the central site or a set of users who have already been receiving the stream source Oh, I am exhausted! P2P approach CSPP 54001 – February 7, 2003 Client/server approach Applications: Measurements Evaluate the performance of your Web site form end-user perspective (example: Porivo Networks) Multiple views on your site performance Generating Internet statistics Connectivity statistics Routing errors CSPP 54001 – February 7, 2003 Measurements: The Performance “Blind Spot” Back-end Infrastructure Network Landscape Last-mile “Blind Spot” Datacenter Testing “Beacon” Database Web server Backbone ISP Enterprise Provider T1 Firewall Corporate Network ISP App server Backbone 3rd party content Regional Network Corporate User Major Provider Local ISP Component Testing •BMC •Mercury Interactive •Tivoli •ProactiveNet •HP OpenView •Computer Associates Datacenter Monitoring •Keynote Systems •Mercury Interactive •BMC/SiteAngel •Service Metrics Consumer User Critical to estimate end-to-end performance CSPP 54001 – February 7, 2003 Slide source: www.porivo.com Measurements: End-to-end Performance Back-end Infrastructure Database Network Landscape Web server App server Backbone ISP Enterprise Provider T1 Firewall Corporate Network ISP Backbone 3rd party content Regional Network Corporate User Major Provider Local ISP Component Testing Datacenter Monitoring Consumer User End-to-end Web Performance Testing CSPP 54001 – February 7, 2003 Slide source: www.porivo.com More applications … Instant messaging (Yahoo, AOL) Collaborative environments (Groove) Backup storage (HiveNet, OceanStore) Spam filtering Anonymous email Censorship-resistant publishing systems (Ethernity, Freenet) CSPP 54001 – February 7, 2003 Week 5: P2P Definitions Impact Uses and Examples Mechanisms More in-depth Case Studies Your Role CSPP 54001 – February 7, 2003 Mechanisms (1) To obtain a resilient system integrate multiple components with uncorrelated failure curves. Use data and service replication. To improve quality of service delivered integrate multiple providers with uncorrelated demand curves (this way less over-provisioning is necessary for each of them) Move service delivery closer to the user CSPP 54001 – February 7, 2003 Mechanisms (2) To provide anonymity Detect anomalies, generate good statistics use large number of independent components, “hide in the crowd” and make search impossible (or costly) Use multiple views Other facts It is not clear what’s the QoS acceptable for users (sometime they are willing to accept a degradation in service quality if they get it for free) Social engineering – building communities CSPP 54001 – February 7, 2003 Week 5: P2P Definitions Impact Uses and Examples Mechanisms More in-depth Case Studies Gnutella Traffic analysis: P2P vs. Web vs. Akamai DHTs Your Role CSPP 54001 – February 7, 2003 Gnutella Network Why analyze Gnutella network? Large scale – up to 500k nodes, 100TB data, 10M files today Self-organizing network Fast growth in its early stages – more than 50 times during first half of 2001 Open architecture, simple and flexible protocol Interesting mix of social and technical issues CSPP 54001 – February 7, 2003 Gnutella protocol overview P2P file sharing app. on top of an overlay network Nodes maintain open TCP connections Messages are broadcasted (flooded) or back-propagated (Initial) protocol Membership Query Broadcast (Flooding) Backpropagated PING PONG QUERY QUERY HIT File download Node to node GET, PUSH Protocol refinements (2001 and later) Ping messages used more efficiently, Vendor specific extensions, GWebCaches, XML searches, super-nodes (2layer hierarchy). CSPP 54001 – February 7, 2003 Gnutella search mechanism Steps: • Node 2 initiates search for file A 7 1 A 4 2 6 3 5 CSPP 54001 – February 7, 2003 Gnutella search mechanism A Steps: • Node 2 initiates search for file A • Sends message to all neighbors 7 1 4 2 3 A 6 A 5 CSPP 54001 – February 7, 2003 Gnutella search mechanism A A Steps: • Node 2 initiates search for file A • Sends message to all neighbors • Neighbors forward message 7 1 4 2 6 3 A A 5 CSPP 54001 – February 7, 2003 Gnutella search mechanism A:7 A 7 1 4 2 6 3 A:5 A A Steps: • Node 2 initiates search for file A • Sends message to all neighbors • Neighbors forward message • Nodes that have file A initiate a reply message 5 CSPP 54001 – February 7, 2003 Gnutella search mechanism 7 1 4 2 3 A:7 A:5 A 6 A Steps: • Node 2 initiates search for file A • Sends message to all neighbors • Neighbors forward message • Nodes that have file A initiate a reply message • Query reply message is backpropagated 5 CSPP 54001 – February 7, 2003 Gnutella search mechanism 7 1 A:7 2 4 A:5 6 3 Steps: • Node 2 initiates search for file A • Sends message to all neighbors • Neighbors forward message • Nodes that have file A initiate a reply message • Query reply message is backpropagated 5 CSPP 54001 – February 7, 2003 Gnutella search mechanism download A 1 7 4 2 6 3 5 Steps: • Node 2 initiates search for file A • Sends message to all neighbors • Neighbors forward message • Nodes that have file A initiate a reply message • Query reply message is backpropagated • File download CSPP 54001 – February 7, 2003 Gnutella: tools for network exploration Eavesdropper - modified node inserted into the network to log traffic. Crawler - connects to all active nodes and uses the membership protocol to discover graph topology. Client-server approach. CSPP 54001 – February 7, 2003 Gnutella: Network size Explosive growth in 2001, but slowly shrinking after High user interest Users tolerate high latency, low quality results Better resources DSL and cable modem nodes grew from 24% to 41% over first 6 months. CSPP 54001 – February 7, 2003 Gnutella: Growth Invariants (1) Unchanged average node connectivity Number of links ('000) 3.4 links/node on average 200 150 100 50 0 0 10000 20000 30000 40000 50000 Number of nodes CSPP 54001 – February 7, 2003 Gnutella: Growth Invariants Average node-to-node distance varied only 25% while the network grew 50 times over 6 months Percent of node pairs (%) (1) Unchanged average node connectivity (2) Node-to-node distance maintains similar distribution 50% 40% 30% 20% 10% 0% 1 2 3 CSPP 54001 – February 7, 2003 4 5 6 7 8 9 10 11 12 Node-to-node shortest path (hops) Is Gnutella a power-law network? Power-law networks: the number of links per node follows a power-law distribution N = L-k Num. of nodes (log scale) 10000 November 2000 1000 100 10 1 1 10 Number of links (log scale) 100 Examples: The Internet, In/out links to/from HTML pages, Citations network, US power grid, Social networks. Implications: High tolerance to random node failure but low reliability when facing CSPP of an ‘intelligent’ adversary 54001 – February 7, 2003 Is Gnutella a power-law network? Later, larger networks display a bimodal distribution Implications: High tolerance to random node failures preserved Increased reliability when facing an attack. 10000 Number of nodes (log scale) May 2001 1000 100 10 1 1 CSPP 54001 – February 7, 2003 10 100 Number of links (log scale) Gnutella: Query distribution Similar to Web pages popularity: Zipf distribution for query popularity Significance: caching will work well CSPP 54001 – February 7, 2003 Gnutella: Traffic analysis messages per secod 6-8 kbps per link over all connections Traffic structure Message Frequency 25 changed over time . 20 Ping Push Query Other 15 10 5 CSPP 54001 – February 7, 2003 353 321 289 257 225 193 161 129 97 65 33 1 minute Gnutella: Total generated traffic 1Gbps (or 330TB/month)! Note that this estimate excludes actual file transfers Q: Does it matter? Compare to 15,000TB/month estimated in US Internet backbone (Dec. 2000) CSPP 54001 – February 7, 2003 Gnutella: Topology mismatch A B F D E C G H Physical links Logical (overlay) links Perfect mapping! CSPP 54001 – February 7, 2003 Gnutella: Topology mismatch A B F D E C G H Inefficient mapping Link D-E needs to support six times higher traffic. CSPP 54001 – February 7, 2003 Gnutella: Topology mismatch The overlay network topology doesn’t match the underlying Internet infrastructure topology! 40% of all nodes are in the 10 largest Autonomous Systems (AS) Only 2-4% of all TCP connections link nodes within the same AS Largely ‘random wiring’ CSPP 54001 – February 7, 2003 Gnutella: Free Riding More than 25% of Gnutella clients share no files; 75% share 100 files or less Conclusion: Gnutella has a high percentage of free riders If only a few individuals contribute to the public good, these few peers effectively act as centralized servers. Adar and Huberman (Aug ’00) CSPP 54001 – February 7, 2003 Gnutella: Summary Gnutella: self-organizing, large-scale, P2P application based on overlay network. It works! Discovered growth invariants specific to largescale systems that: Help predict resource usage. Give hints for better search and resource organization techniques. Growth hindered by the volume of generated traffic and inefficient resource use. CSPP 54001 – February 7, 2003 Week 5: P2P Definitions Impact Uses and Examples Mechanisms More in-depth Case Studies Gnutella Content Distribution Systems traffic analysis: P2P vs. Web vs. Akamai DHTs Your Role CSPP 54001 – February 7, 2003 Week 5: P2P Definitions Impact Uses and Examples Mechanisms More in-depth Case Studies Your Role CSPP 54001 – February 7, 2003 Your Role Network administrator Lawyer Application designer/developer Entrepreneur User CSPP 54001 – February 7, 2003 More information Links to reports, papers, slides used to prepare this presentation: http://www.cs.uchicago.edu/~matei/P2P/ CSPP 54001 – February 7, 2003 Course Outline (Subject to Change) 1. 2. 3. 4. 5. (January 9th) Internet design principles and protocols (January 16th) Internetworking, transport, routing (January 23rd) Mapping the Internet and other networks (January 30th) Security (February 6th) P2P technologies & applications (Matei Ripeanu) (plus midterm) 6. 7. 8. 9. 10. (February 13th) Optical networks (Charlie Catlett) *(February 20th) Web and Grid Services (Steve Tuecke) (February 27th) Network operations (Greg Jackson) *(March 6th) Advanced applications (with guest lecturers: Terry Disz, Mike Wilde) (March 13th) Final exam * Ian Foster is out of town. CSPP 54001 – February 7, 2003