Introduction to Peer-to-Peer Systems
Ozalp Babaoglu
ALMA MATER STUDIORUM – UNIVERSITA’ DI BOLOGNA
© Montresor/Babaoglu 2010, Complex Systems

Outline
• Introduction
• Peer-to-Peer vs. Client / Server
• Common Topologies
• Data location
• Churn
• Security

Introduction
• Peer-to-peer (P2P) systems have become extremely popular and contribute to vast amounts of Internet traffic
• P2P basic definition:
  • A P2P system is a distributed collection of peer nodes
  • Each node is both a server and a client:
    ▴ May provide services to other peers
    ▴ May consume services from other peers
• Very different from the client-server model

P2P History: 1969 - 1990
• The origins:
  • In the beginning, all nodes in Arpanet/Internet were peers
  • Each node was capable of:
    ▴ Performing routing (locate machines)
    ▴ Accepting ftp connections (file sharing)
    ▴ Accepting telnet connections (distributed computation)

P2P History: 1999 - today
• The advent of Napster:
  • Jan 1999: the first version of Napster was released by Shawn Fanning, a student at Northeastern University
  • July 1999: Napster Inc. founded
  • Feb 2001: Napster closed down

Napster
[Figure: Napster Clients 1 to 4 and the Napster Central Index Server. A client sends "Query: song.mp3" to the index server, then performs "Copy: song.mp3" directly from another client.]
- Client/Server search
- P2P download
- Napster is not “pure P2P”

Client/Server vs. Peer-to-Peer
Client/Server:
• Servers well connected to the “core” of the Internet
• Servers carry out critical tasks
• Clients only talk to servers
Peer-to-Peer:
• Only nodes located at the “periphery of the Internet”
• Tasks distributed across all nodes
• Clients talk to other clients

Example: video sharing
Client-server: YouTube
[Figure: client-server, with one uploader and several downloaders]
Advantages
• Client can disconnect after upload
• Uploader needs little bandwidth
• Other users can find the file easily (just use search on server webpage)
Disadvantages
• Server may not accept the file or may remove it later (according to content policy)
• Whole system depends on the server (what if shut down like Napster?)
• Server storage and bandwidth are expensive!

Example: video sharing
Peer-to-peer: BitTorrent
[Figure: peer-to-peer, with one seeder and several downloaders]
Advantages
• Does not depend on a central server
• Bandwidth shared across nodes (downloaders also act as uploaders)
• High scalability, low cost
Disadvantages
• Seeder must remain on-line to guarantee file availability
• Content is more difficult to find (downloaders must find .torrent file)
• Freeloaders cheat in order to download without uploading

P2P vs. client-server

Client-server | Peer-to-peer
--------------|-------------
Asymmetric: client and servers carry out different tasks | Symmetric: no distinction between nodes; they are peers
Global knowledge: servers have a global view of the network | Local knowledge: nodes only know a small set of other nodes
Centralization: communications and management are centralized | Decentralization: no global knowledge, only local interactions
Single point of failure: a server failure brings down the system | Robustness: several nodes may fail with little or no impact
Limited scalability: servers easily overloaded | High scalability: high aggregate capacity, load distribution
Expensive: server storage and bandwidth capacity is not cheap | Low-cost: storage and bandwidth are contributed by users

P2P and Overlay Networks
• Peer-to-Peer networks are usually “overlays”
• Logical structures built on top of a physical routed communication infrastructure (IP) that creates the illusion of a completely-connected graph
• Links based on logical relationships (“knows”) rather than physical connectivity

Overlay networks
[Figures: the same set of nodes shown at four levels]
- Physical network: “who has a communication link to whom”
- Logical network: “who can communicate with whom”
- Overlay network (ring): “who knows whom”
- Overlay network (binary tree): “who knows whom”

P2P Environment
• High latency, low bandwidth communication
• Churn
  • Nodes may disconnect temporarily
  • New nodes are continuously joining the system, while others leave permanently
• Security
  • P2P clients run on machines under the total control of their owners
  • Malicious users may try to bring down the system
• Selfishness
  • Users may run hacked clients in order to avoid contributing resources

Why P2P?
• Decentralization enables deployment of applications that are:
  • Highly available
  • Fault-tolerant
  • Self organizing
  • Scalable
  • Difficult or impossible to shut down
• This results in a “grassroots” approach and “democratization” of the Internet

P2P Problems
• Overlay construction and maintenance
  • e.g., random, two-level, ring, etc.
• Data location
  • locate a given data object among a large number of nodes
• Data dissemination
  • propagate data in an efficient and robust manner
• Per-node state
  • keep the amount of state per node small
• Tolerance to churn
  • maintain system invariants (e.g., topology, data location, data availability) despite node arrivals and departures

P2P Applications
• Sharing of content:
  • File sharing, content delivery networks
  • Gnutella, eMule, Akamai
• Sharing of storage:
  • Distributed file systems
• Sharing of CPU time:
  • Parallel computing
  • Grid, Seti@home, Folding@home, FightAids@home

P2P Topologies
• Centralized
• Hierarchical
• Decentralized
• Hybrid

Evaluating topologies
• Manageability: How hard is it to keep working?
• Information coherence: How authoritative is info?
• Extensibility: How easy is it to grow?
• Fault tolerance: How well can it handle failures?
• Censorship: How hard is it to shut down?

Centralized
• Manageable: System is all in one place
• Coherent: Information is centralized
• Extensible: No
• Fault tolerance: Single point of failure
• Censorship: Easy to shut down

Hierarchical
• Manageable: Chain of authority
• Coherent: Cache consistency
• Extensible: Add more leaves, rebalance
• Fault tolerance: Root is vulnerable
• Censorship: Just shut down the root

Decentralized
• Manageable: Difficult, many owners
• Coherent: Difficult, unreliable peers
• Extensible: Anyone can join in
• Fault tolerance: Redundancy
• Censorship: Difficult to shut down

Centralized + Decentralized
• Manageable: Same as decentralized
• Coherent: Better than decentralized
• Extensible: Anyone can join in
• Fault tolerance: Redundancy
• Censorship: Difficult to shut down

Some Common Topologies
• Flat unstructured: a node can connect to any other node
  • only constraint: maximum degree dmax
  • fast join procedure
  • good for data dissemination, bad for location
• Two-level unstructured: nodes connect to a superpeer
  • superpeers form a small overlay used for indexing and forwarding
  • high load on superpeers
• Flat structured: constraints based on node ids
  • allows for efficient data location
  • constraints require long join and leave procedures

Data location: Flooding
• Problem: find the set of nodes S that store a copy of object O
• Flooding: send a search message to all nodes (first Gnutella protocol)
  • A search message contains either keywords or an object id
  • Advantages:
    ▴ simplicity
    ▴ no topology constraints
  • Disadvantages:
    ▴ high network overhead (huge traffic generated by each search request)
    ▴ flooding stopped by TTL (which produces a search horizon)
    ▴ only applicable to a small number of nodes

Data location: Flooding (cont.)
• Flooding in a flat unstructured network:
[Figure: search horizon for TTL = 2 around the requesting node, with a copy of the object (obj) in the overlay]
• Objects that lie outside of the horizon are not found
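The TTL-limited flooding described above can be sketched as a breadth-first traversal of the overlay. This is a minimal illustration, not the Gnutella protocol: the function name `flood_search`, the adjacency-dict representation, and the five-node overlay are assumptions made for the example.

```python
from collections import deque


def flood_search(neighbors, holders, source, ttl):
    """Flood a query from `source`, forwarding it at most `ttl` hops.

    Returns (found, messages): the holders reached inside the search
    horizon and the number of query messages sent.
    """
    found = set()
    seen = {source}                # nodes that already received the query
    frontier = deque([(source, 0)])
    messages = 0
    while frontier:
        node, hops = frontier.popleft()
        if node in holders:
            found.add(node)
        if hops == ttl:            # TTL exhausted: do not forward further
            continue
        for peer in neighbors[node]:
            messages += 1          # each forwarded copy is one message
            if peer not in seen:   # duplicates are dropped by the receiver
                seen.add(peer)
                frontier.append((peer, hops + 1))
    return found, messages


# Hypothetical line-shaped overlay A - B - C - D - E; copies of the object at C and E.
overlay = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
print(flood_search(overlay, {"C", "E"}, "A", ttl=2))
print(flood_search(overlay, {"C", "E"}, "A", ttl=4))
```

With TTL = 2 only the copy at C lies inside the horizon; raising the TTL to 4 also reaches E, at the cost of more messages.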
Data location: Superpeers
• Two-level overlay: use superpeers to track the locations of an object [eMule, Gnutella 2, BitTorrent]
  • Each node connects to a superpeer and advertises the list of objects it stores
  • Search requests are sent to the superpeer, which forwards them to other superpeers
  • Advantages:
    ▴ highly scalable
  • Disadvantages:
    ▴ superpeers must be reliable, powerful and well connected to the Internet (expensive)
    ▴ superpeers must maintain large state
    ▴ the system relies on a small number of superpeers

Data location: Superpeers, Two-Level Overlay (cont.)
[Figure: a request is sent to a superpeer and forwarded among superpeers; the response points to the node storing the object (obj)]
• A two-level overlay is a partially centralized system
• In some systems, superpeers may be disconnected (e.g., BitTorrent)

Data location: Key-Based Routing
• Structured networks: use a routing algorithm that implements Key-Based Routing (KBR) [Chord, Pastry, Overnet, Kad, eMule]
• KBR (also known as Distributed Hash Tables) works as follows:
  • nodes are (randomly) assigned unique node identifiers (nodeId)
  • given a key k, the node whose nodeId is numerically closest to k among all nodes in the network is known as the root of key k
  • given a key k, a KBR algorithm can route a message to the root of k in a small number of hops, usually O(log n)
  • the location of an object with id objectId is tracked by the root of k = objectId
  • thus, one can find the location of an object by routing a message to the root of k = objectId and querying the root for the location of the object

Pastry
• Pastry is a substrate for wide-area peer-to-peer location and routing middleware
• Any host that runs the Pastry middleware and has proper credentials can participate in the overlay network

Pastry Characteristics
• Each node in Pastry has a unique 128-bit nodeId
• Keys (of items or messages) and nodeIds are in the same address space
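The Key-Based Routing rules listed above (the root of a key is the node with the numerically closest nodeId, and the root tracks where the object is stored) can be sketched as follows. This is a centralized simulation for illustration only: it computes the root by scanning all nodeIds instead of routing in O(log n) hops, it ignores wrap-around of the identifier space, and the class name `KbrDirectory`, its methods, and the 16-bit ids are assumptions.

```python
class KbrDirectory:
    """Toy model of KBR data location: each root keeps objectId -> holders."""

    def __init__(self, node_ids):
        self.node_ids = sorted(node_ids)
        # One index per node; only the root of a key stores an entry for it.
        self.index = {n: {} for n in self.node_ids}

    def root_of(self, key):
        # Numerically closest nodeId; ties go to the smaller id (assumption).
        return min(self.node_ids, key=lambda n: (abs(n - key), n))

    def publish(self, object_id, holder):
        # The holder advertises the object to the root of k = objectId.
        root = self.root_of(object_id)
        self.index[root].setdefault(object_id, set()).add(holder)

    def locate(self, object_id):
        # Ask the root of k = objectId where the copies are stored.
        root = self.root_of(object_id)
        return self.index[root].get(object_id, set())


# Hypothetical 16-bit identifiers.
nodes = [0x04F2, 0x3A79, 0x620F, 0x85E0, 0x8909, 0x8954, 0x8957, 0xC52A]
dht = KbrDirectory(nodes)
dht.publish(0x8955, holder=0x620F)
dht.publish(0x8955, holder=0xC52A)
print(hex(dht.root_of(0x8955)))
print(sorted(hex(h) for h in dht.locate(0x8955)))
```

Note that the root only stores the location of the object, not the object itself; the copies stay on the nodes that published them.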
Pastry Characteristics (cont.)
• Given a message with a key, Pastry routes the message to a live node whose nodeId is numerically closest to the key (its root)
• The routing is done in O(log n) hops, where n is the number of nodes in the network
• Takes into account network locality
  • Seeks to minimize the distance the message travels

Design of Pastry
• Node identifiers
  • Assigned randomly
  • Uniformly distributed over some large ID space (e.g., 128 bits) through a hash function (e.g., SHA1)
  • IDs are thought of as a sequence of base-2^b digits
    ▴ b=1, IDs are binary numbers
    ▴ b=4, IDs are hexadecimal numbers
    ▴ Typically, b is 4

Design of Pastry (cont.)
• Routing characteristics
  • Route the message to the numerically closest node
  • Done in approximately log_{2^b} n steps, where n is the number of nodes
  • Eventual delivery of messages unless ⌊L/2⌋ or more adjacent nodes fail simultaneously
  • L is typically 2^4 = 16

Pastry Node State
• Routing table
  • log_{2^b} N rows
  • 2^b - 1 entries in each row
  • N is the item space size
  • The entry at row x, column y refers to a node that shares an x-digit prefix with the current node and whose (x+1)-th digit is y
  • Each entry in the routing table may refer to any node whose nodeId has the appropriate prefix

Pastry Node State (cont.)
• Each entry maps the nodeId to its IP address
• Only nodes that are likely to be close to the current node are selected
• If no node is suitable for the entry, it is left blank
• Selection of b involves a trade-off between route length and routing table occupancy
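The routing table dimensions given above (log_{2^b} N rows with 2^b - 1 entries each) can be checked with a few lines of integer arithmetic. The function name `routing_table_shape` and the 16-bit identifier space used in the calls are assumptions chosen for illustration.

```python
def routing_table_shape(b, id_space_size):
    """Return (rows, entries_per_row) for digits of b bits each.

    rows is the smallest r with (2^b)^r >= id_space_size, i.e. the
    number of base-2^b digits needed to write any identifier.
    """
    base = 2 ** b
    rows, reach = 0, 1
    while reach < id_space_size:
        reach *= base
        rows += 1
    return rows, base - 1


# A 16-bit identifier space (N = 65536), for different digit sizes b.
for b in (1, 2, 4):
    rows, cols = routing_table_shape(b, 65536)
    print(f"b={b}: {rows} rows x {cols} entries = {rows * cols} entries")
```

A larger b gives fewer rows (and therefore fewer prefix-routing steps) but more entries per row.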
Pastry Node State (cont.)
• Effect of the choice of b:
  • When b decreases, the number of hops to route a message increases
  • When b increases, the populated portion of the routing table becomes smaller

Pastry Node State: routing table example
[Figure: example node state. Routing table: used to find the next hop with a longer shared prefix. Leaf set: used to find the nodeId closest to a key that is close to the local nodeId.]
• In this example the routing table size is 4 x 15 = 60 entries, for a maximum network size of N = 65536 nodes
• The average route length in this case is 4 hops

Pastry Node State: Leaf set
• Contains the entries of the nodes numerically closest to the current node
  • The set of nodes with the ⌊L/2⌋ numerically closest larger nodeIds, and
  • The set of nodes with the ⌊L/2⌋ numerically closest smaller nodeIds
• L = 2^b

Pastry Routing
• A particular node receives the message
• If the message key falls within the leaf set
  • Directly forward the message to the target node
• Otherwise use the routing table
  • The current node forwards the message to a node whose nodeId shares with the message key a prefix that is one digit longer than the prefix shared with the current nodeId

Pastry Routing (cont.)
• If no suitable entry is found in either the leaf set or the routing table
  • Among all the entries that the current node has, identify the ones that share with the message key a prefix at least as long as the current nodeId does
  • Choose an entry whose nodeId is numerically closer to the message key than the current node’s
  • Such an entry must exist in the leaf set if the message hasn’t reached the destination
  • Forward the message to that node
• The routing will finally converge
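The three routing cases above (leaf set, routing table, fallback) can be sketched as a single next-hop function. This is a simplified illustration under stated assumptions, not the Pastry implementation: nodeIds are 4-digit hexadecimal strings (b = 4), the leaf-set range check ignores wrap-around of the identifier space, the routing table is a dict keyed by (row, digit), and all names and node states are hypothetical.

```python
def shared_prefix_len(a, b):
    """Number of leading digits that two ids have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def distance(a, b):
    """Numerical distance between two hexadecimal ids (no wrap-around)."""
    return abs(int(a, 16) - int(b, 16))


def next_hop(current, key, leaf_set, routing_table):
    """Return the node to forward to, or `current` if it is the root of `key`."""
    # Case 1: the key falls within the range covered by the leaf set.
    candidates = leaf_set | {current}
    values = [int(n, 16) for n in candidates]
    if min(values) <= int(key, 16) <= max(values):
        return min(candidates, key=lambda n: (distance(n, key), n))
    # Case 2: routing table entry whose prefix match is one digit longer.
    p = shared_prefix_len(current, key)
    entry = routing_table.get((p, key[p]))
    if entry is not None:
        return entry
    # Case 3: any known node with an equally long prefix that is closer.
    known = leaf_set | set(routing_table.values())
    better = [n for n in known
              if shared_prefix_len(n, key) >= p
              and distance(n, key) < distance(current, key)]
    return min(better, key=lambda n: distance(n, key)) if better else current


# Hypothetical per-node state: nodeId -> (leaf set, routing table).
states = {
    "04F2": ({"3A79", "5230"}, {(0, "8"): "85E0"}),
    "85E0": ({"8821", "8909"}, {(1, "9"): "8909"}),
    "8909": ({"8821", "8954"}, {(2, "5"): "8957"}),
    "8957": ({"8954", "AC78"}, {}),
    "8954": ({"8909", "8957"}, {}),
}


def route(source, key):
    """Follow next_hop from `source` until a node keeps the message."""
    path = [source]
    while True:
        leaf_set, table = states[path[-1]]
        hop = next_hop(path[-1], key, leaf_set, table)
        if hop == path[-1]:
            return path
        path.append(hop)


print(route("04F2", "8955"))  # ['04F2', '85E0', '8909', '8957', '8954']
```

Each hop either lengthens the prefix shared with the key or moves numerically closer to it, which is why the walk terminates at the root.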
Pastry Routing (cont.)
• Convergence: at each step, the message is sent to a node with an ID closer to the message key

Pastry Routing Example: Key-Based Routing [Pastry]
route(k=8955, msg), source nodeId 04F2, overlay address space [0000, FFFF]

Hop # | Hop id | Shared prefix length
------|--------|---------------------
0     | 04F2   | 0
1     | 85E0   | 1
2     | 8909   | 2
3     | 8957   | 3
4     | 8954   | 3 (root of k)

• Object 8955 is tracked by node 8954, which knows of two copies stored at nodes 620F and C52A
[Figure: overlay nodes 04F2, 3A79, 5230, 620F, 85E0, 8821, 8909, 8954, 8957, AC78, C52A, E25A; obj 8955 is stored at nodes 620F and C52A]

Data location: KBR (cont.)
• Structured networks (cont.)
  • Advantages:
    ▴ completely decentralized (no need for superpeers)
    ▴ routing algorithm achieves low hop count for large network sizes
  • Disadvantages:
    ▴ each object must be tracked by a different node
    ▴ objects are tracked by unreliable nodes (which may disconnect)
    ▴ keyword-based searches are more difficult to implement than with superpeers (because objects are located by their objectid)
    ▴ the overlay must be structured according to a given topology in order to achieve a low hop count
    ▴ routing tables must be updated every time a node joins or leaves the overlay

Data location - Loosely structured overlays
Loosely structured networks: use hints for the location of objects [Freenet]
• Nodes locate objects by sending search requests containing the objectid
Data location - Loosely structured overlays (cont.)
• Requests are propagated using a technique similar to flooding
• Objects with similar identifiers are grouped on the same nodes
• A search response leaves routing hints on the path back to the source
• Hints are used when propagating future requests for similar object ids
[Figure: nodes A to F with hint tables (AE5J: B; AE5J: D, 5B20: E; AE5J: F). A request for AE5J and its response are shown, followed by a later request for AF02.]

Data location - Loosely structured overlays (cont.)
• Advantages:
  • no topology constraints, flat architecture
  • searches are more efficient than with plain flooding
• Disadvantages:
  • does not support keyword-based searches
  • search requests have a TTL
  • contrary to structured overlays, loosely structured overlays do not guarantee a low number of hops, nor that the object will be found

Data location - Summary
The location schemes described previously can be classified according to two dimensions: structure and decentralization

Degree of decentralization | Structure: low             | Structure: loose | Structure: high
---------------------------|----------------------------|------------------|----------------
partial                    | eMule, Gnutella 2, BitTorrent | -             | -
total                      | Gnutella                   | Freenet          | Chord, Pastry

Effects of Churn
• Churn can have several effects on a P2P system:
  • data objects may become unavailable if all replicas disconnect
  • routing tables may become inconsistent (e.g., entries may point to disconnected nodes)
  • the overlay may become partitioned if many nodes suddenly disconnect

Churn - Preventing Partitions
• A naïve approach to preventing partitions is to increase the average node degree
• Ring partitions can be avoided by keeping a list of successor nodes

Churn Tolerance
• Node arrivals and departures must not disrupt the normal behavior of the system
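The successor-list idea mentioned under "Churn - Preventing Partitions" can be sketched as follows: each ring node remembers its next r successors, so the ring stays connected as long as fewer than r consecutive nodes fail. The function names, the integer node ids, and the requirement r < number of nodes are assumptions made for this illustration.

```python
def build_successor_lists(node_ids, r):
    """Map each node to its next r successors on the ring (assumes r < len(node_ids))."""
    ring = sorted(node_ids)
    n = len(ring)
    return {ring[i]: [ring[(i + j) % n] for j in range(1, r + 1)]
            for i in range(n)}


def first_live_successor(node, successors, alive):
    """Closest successor of `node` that is still alive, or None."""
    for s in successors[node]:
        if s in alive:
            return s
    return None


def ring_is_connected(successors, alive):
    """Walk live successors from one live node; True if all live nodes are visited."""
    if not alive:
        return True
    start = min(alive)
    seen = {start}
    node = first_live_successor(start, successors, alive)
    while node is not None and node not in seen:
        seen.add(node)
        node = first_live_successor(node, successors, alive)
    return seen == alive


nodes = [10, 20, 30, 40, 50, 60, 70, 80]
alive = set(nodes) - {30, 40}          # two adjacent nodes fail
for r in (1, 2, 3):
    lists = build_successor_lists(nodes, r)
    print(r, ring_is_connected(lists, alive))
```

With two adjacent failures, lists of length 1 or 2 leave node 20 without a live successor, while a list of length 3 lets it skip ahead to node 50.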
Churn Tolerance (cont.)
• System invariants must be maintained:
  ▴ connected overlay (i.e., no partitions), low average path length
  ▴ data objects accessible from anywhere in the network
• Two types of churn tolerance:
  ▴ dynamic recovery: ability to react to changes in the overlay to maintain system invariants (e.g., heal partitions)
  ▴ static resilience: ability to continue operating correctly before adaptation occurs (e.g., route messages through alternate paths)

Security
• Security in P2P systems is hard to enforce:
  • Users have full control of their computers
  • Modified clients may not follow the standard protocol
  • Data may be corrupted
  • Private data stored on remote computers may be disclosed

Security - Weak identities
• The user may leave the system and rejoin it with a new identity (different user id)
• If an attack is detected, the attacker can re-enter the system with a new id
• An attacker may create a large number of false identities (Sybil attack)
• Example of Sybil attack:
[Figure: node A surrounded by nodes S1 to S6]
  - Nodes S1 to S6 are actually 6 instances of the P2P client running on the same machine
  - The attacker can intercept all traffic coming from or going to node A

Security - Strong identities
• The user cannot change its identity
• Solution: use a centralized, trusted Certification Authority (CA)
  • Each new user must obtain an identity certificate
  • The certificate is digitally signed by the CA, whose public key is known by all users
  • A certificate cannot be forged (forging requires the CA’s private key)
  • To prove his identity, a user signs a message with his private key, and attaches the corresponding certificate signed by the CA
• Strong identities prevent Sybil attacks
• If an attacker is caught, it cannot easily rejoin the system

Security - Weak vs. Strong identities
• Strong identities require a centralized CA
  • New nodes must contact the CA before joining the network:
    ▴ The CA response may be slow
    ▴ If the CA is unavailable, new nodes cannot join
  • The security of the system depends on the CA:
    ▴ The CA must correctly verify the identity of the requester
    ▴ The CA’s private key must be secret
• Many P2P systems use weak identities
  • IP addresses already give some identity information
  • Some systems ensure anonymity