* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Tracker
Network tap wikipedia , lookup
Video on demand wikipedia , lookup
Cracking of wireless networks wikipedia , lookup
Remote Desktop Services wikipedia , lookup
Airborne Networking wikipedia , lookup
Deep packet inspection wikipedia , lookup
Zero-configuration networking wikipedia , lookup
NETE4631 Network Information Systems (NISs): Peer-to-Peer (P2P) Suronapee, PhD [email protected] 1 P2P Peer-to-peer: “direct” connections between peers Peers are all equal - both a sender and a receiver of a content P2P core principle Self-organizing Large collection of resources Millions of simultaneous users, voluntary participation Scalability 2 no central management, peers are completely independent scalability with respect to number of nodes P2P principle P2P is an overlay network (of internet) a virtual network on top of the underlying IP network Overlay graph 3 Overlay network 4 Peer-to-Peer (P2P) Systems Old ideas 1979 - USENET news service (still in use) Popular around 1999 Napster, Kazaa and Gnutella for sharing files, music.. ‘01: Skype launched (Kazaa) ‘06’, ’10’: Acquired by eBay, Microsoft ‘01: BitTorrent launched – heavily used for file and music sharing Still very popular today for sharing multimedia content BitTorrent – 30% of internet traffic (mid 2000s) Skype – 663M users (2010), 700 M minutes a day Problem: Free Riders - only consume, not contribute 5 Current State of P2P P2P networks going strong, all over the world Currently P2P accounts for almost 70% of network traffic P2P networks currently mostly used for illegal sharing of copyrighted material Music, videos, software, … Content providers not so happy 6 Sue companies making P2P software (e.g., Napster), sue software developers (Winny), sue users sharing material P2P Application P2P principle applicable to many kinds of systems Content distribution Most current P2P targeted at one application: File sharing Users share files (e.g., music, video, software) and others download Also often illegally shared (except BitTorrent) Example From Acadamic Chord Communication 7 BitTorrent, Napster, Gnutella, KaZaA Skype Napster Napster launched in 1999 by Shawn Fanning The term “P2P was coined by Napster. In 2000:,25% of traffic out of Uni. of Wisconsin Madison, 60M users Centralized real-time directory, distributed files, mostly MP3 music; Based in USA; lawsuits put it out of business RIAA sues Napster, asking $100K per download Indirectly helping users to infringe copyright Currently, paid service 8 Pay % to songwriters and music companies as copyright required Napster protocol is open, people free to develop Napster Connect to Napster server Upload list of music files that you want to share Server stores no files Maintain a list of Structure <filename, ip_address, portnum> 9 Napster search Send server keywords to search with Server returns a list of hosts – <ip_address, portnum> tuples – to client Client pings each host in the list to find transfer rates Client fetches file from best host 10 Napster Problem Centralized server a source of congestion Centralized server single point of failure Napster.com declared to be responsible for users’ copyright violation “Indirect infringement” Next system: Gnutella 11 Gnutella Eliminate the servers Client search and retrieve amongst themselves Clients act as servers too, called servents In 2000, release by AOL, 88K users by ’03’ 12 How a peer join a network To join the network, peer needs the address of another peer that is currently a member New peer sends connect message to existing peer 13 GNUTELLA CONNECT Reply is simply “OK” Gnutella search Gnutella routes different messages within the overlay graph Gnutella protocol has 5 main message types Query (search) 14 QueryHit (response to query) Ping (to probe network for other peers) Pong (reply to ping, contains address of another peer) Push (used to initiate file transfer) Gnutella Message Header Format 15 Flooding query message Query message 16 How do search results come back? 17 Avoiding excessive traffic To avoid duplicate transmissions, each peer maintains a list of recently received messages Query forwarded to all neighbors except peer from which received Each Query (identified by DescriptorID) forwarded only once QueryHit routed back only to peer from which Query received with same DescriptorID Duplicates with same DescriptorID and Payload descriptor (msg type) are dropped QueryHit with DescriptorID for which Query not seen is dropped 18 After receiving QueryHit messages Requestor chooses “best” QueryHit responder Initiates HTTP request directly to responder’s ip+port Responder then replies with file packets after this message: 19 Dealing with Firewalls Requestor sends Push to responder asking for file transfer Responder establishes a TCP connection at ip_address, port specified. Sends Requestor then sends GET to responder (as before) and file is transferred as explained earlier 20 PING-PONG Peers initiate Ping’s periodically Ping’s flooded out like Query’s, Pong’s routed along reverse path like QueryHit’s Pong replies used to update set of neighboring peers 21 To keep neighbor lists fresh in spite of peers joining, leaving and failing Problem Flooding a query is extremely inefficient Gnutella’s network management not efficient Wastes lot of network and peer resources Repeated searches with same keywords Solution: Periodic PING/PONGs consume lot of resources Ping/Pong constituted 50% traffic Modem-connected hosts do not have enough bandwidth for passing Gnutella traffic Another solution: 22 FastTrack System FastTrack Hybrid between Gnutella and Napster Takes advantage of “healthier” participants in the system Underlying technology in Kazaa, KazaaLite, Grokster Like Gnutella, but with some peers designated as supernodes 23 FastTrack (2) A supernode stores a directory listing a subset of nearby (<filename,peer pointer>), similar to Napster servers Supernode membership changes over time Any peer can become (and stay) a supernode, provided it has earned enough reputation Kazaalite: participation level (=reputation) of a user between 0 and 1000, initially 10, then affected by length of periods of connectivity and total number of uploads More sophisticated Reputation schemes invented, especially based on economics (See P2PEcon workshop) A peer searches by contacting a nearby supernode 24 Strength Combines good points from Napster and Gnutella Efficient searching under each supernode Flooding restricted to supernodes only Result: Efficient searching with “low” resource usage Most popular network 25 Lot of content, lot of users Currently most file sharing networks adopted this architecture BitTorrent Developed by Bram Cohen in 2001 Written in Python, available on many platforms BitTorrent is a new approach for sharing large files distributed directories, distributed files Each file divided as chunks Each chunk contains 32 KB – 256 KB Each chunks can traverse different paths BitTorrent widely used also for legal content 26 For example, Linux distributions, software patches, Official movie Currently lots of illegal content on BitTorrent too… Topology of BitTorrent Overlay graph (1) physical (2) neighboring peer (3) peering relationship A tracker a server which tracks the currently active clients serves as a centralized directory Topology can be changed regularly 27 Tracker factors: content, distance, peer churn, randomization BitTorrent: Players Three entities needed to start distribution of a file Terminology: 28 A “torrent” file: the metadata about the file Seed: Client with a complete copy of the file Leecher: Client still downloading the file BitTorrent Start Up New client gets torrent-file and gets peer list from tracker 29 BitTorrent Operation 30 Summary of BitTorrent operation A new peer A receives a.torrent file from one of the BitTorrent web servers, including the name, size, and number of chunks of a particular file, together with the IP address and port number of the corresponding tracker. It then registers with the right tracker. It will also periodically send keep-alive messages to the tracker. The tracker sends to peer A a list of potential peers (peer set = 50 peers). Peer A selects a subset (following the tit-for-tat and randomization rules) and establishes connections with these five peers. Peer A downloads chunks from peers in peer set and provides them with its own chunks (possible to parallel) Chunks typically 256 KB Starting with the rarest chunks. Every now and then, each peer updates its peer list. 31 Peering construction methods Tracker suggests a set of 50 peers Let new peer picks 5 peers (at this time!) for exchanging chunks Exchanging contents evenly between them (Rarely chunk first) Peer serves 4 peers in peer set simultaneously (tit-for-tat) Seeks 4 best downloaders in last time slot if it’s a seed Seeks 4 best uploaders in last time slot if it’s a leecher The fifth peer selected at 50% randomly (randomization) Choking: Limit number of neighbors to which concurrent uploads or download <= a number 32 Strength Tit-for-Tat A peer serves peers that serve it Rarely chunk first Prefer early download of blocks that are least replicated among neighbors Encourages cooperation, discourage free-riding Avoid the problem that most of peers have most of the chuck but all must wait for the few rare chunks Randomization 33 avoids unfairness of little upload capacity nodes Weakness File needs to be quite large 256 KB chunks Rarest first needs large number of chunks Everyone must contribute 34 Low-bandwidth clients have a disadvantage How can BitTorrent be free? It leverages peer uplink capacities to send chunks of files to each other without deploying many media servers. P2P is used for sharing content in BitTorrent. Scalable? 35 Add many nodes as the network scale up without a bottleneck P2P versus client-server architecture Client-server architecture Each client requests data from the server Not help each other P2P 36 Peer is both a sender and a receiver of a content each peer helps each other in a distributed manner Data transmission is distributed Although control plane for signaling is centralized Summary Most existing P2P networks built on searching, however Searching does not scale in same way Either centralized system with all its problems Distributed system with all its problems Hybrid systems cannot guarantee discovery either Alternatively, use addressing instead of searching Distributed hash tables (DHTs) - efficient searching and object location in P2P network Example 37 Chord, CAN, Plaxton, Pastry, Tapestry Reference Kangasharju: Peer-to-Peer Networks Brinton, Christopher; Chiang, Mung (2013-06-10). Networks Life: 20 Questions and Answers