Distributed Hash-based Lookup
for Peer-to-Peer Systems
Mohammed Junaid Azad 09305050
Gopal Krishnan 09305915
MTech 1, CSE
Agenda
• Peer-to-Peer Systems
• Initial Approaches to Peer-to-Peer Systems
• Their Limitations
• Distributed Hash Tables
  – CAN: Content Addressable Network
  – Chord
Peer-to-Peer Systems
• Distributed and decentralized architecture
• No centralized server (unlike client-server architecture)
• Any peer can behave as a server
Napster
• P2P file-sharing system
• A central server stores the index of all the files available on the network
• To retrieve a file, the central server is contacted to obtain the location of the desired file
• Not a completely decentralized system
• The central directory is not scalable
• Single point of failure
Gnutella
• P2P file-sharing system
• No central server stores an index of the files available on the network
• The file-location process is decentralized as well
• Requests for files are flooded on the network
• No single point of failure
• Flooding on every request is not scalable
File Systems for P2P systems
• The file system would store files and their
metadata across nodes in the P2P network
• The nodes containing blocks of files could be
located using hash based lookup
• The blocks would then be fetched from those
nodes
Scalable File Indexing Mechanism
• In any P2P system, the file-transfer process is inherently scalable
• However, the indexing scheme that maps file names to locations is crucial for scalability
• Solution: Distributed Hash Table
Distributed Hash Tables
• Traditional name and location services provide a direct mapping between
keys and values
• What are examples of values? A value can be an address, a document, or
an arbitrary data item
• Distributed hash tables such as CAN/Chord implement a distributed
service for storing and retrieving key/value pairs
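To make the idea concrete, here is a minimal sketch (not from the slides) of the put/get service a DHT exposes. The node names and the naive modulo-based placement are assumptions for illustration only; CAN and Chord replace that placement rule with zone ownership and successor lookup respectively.

```python
import hashlib

def node_for_key(key: str, nodes: list) -> str:
    """Naive placement: hash the key and pick a node modulo the node count.
    CAN/Chord replace this rule with zone ownership / successor lookup."""
    h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return nodes[h % len(nodes)]

class TinyDHT:
    """Illustrative key/value service spread over several nodes."""
    def __init__(self, nodes):
        self.nodes = nodes
        self.store = {n: {} for n in nodes}   # each node holds part of the table

    def put(self, key, value):
        self.store[node_for_key(key, self.nodes)][key] = value

    def get(self, key):
        return self.store[node_for_key(key, self.nodes)].get(key)

dht = TinyDHT(["n1", "n2", "n3"])
dht.put("song.mp3", "10.0.0.7")          # hypothetical file -> location mapping
print(dht.get("song.mp3"))               # -> 10.0.0.7
```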
DNS vs. Chord/CAN
DNS
• provides a host name to IP address mapping
• relies on a set of special root servers
• names reflect administrative boundaries
• is specialized to finding named hosts or services

Chord
• can provide the same service: name = key, value = IP
• requires no special servers
• imposes no naming structure
• can also be used to find data objects that are not tied to certain machines
Example Application using Chord:
Cooperative Mirroring
[Figure: a client and two servers; the client runs File System over Block Store over Chord, and each server runs Block Store over Chord.]
• The highest layer provides a file-like interface to the user, including user-friendly naming and authentication
• This file-system layer maps operations to lower-level block operations
• Block storage uses Chord to identify the node responsible for storing a block, and then talks to the block storage server on that node
CAN
What is CAN?
• CAN is a distributed infrastructure that provides hash-table-like functionality
• CAN is composed of many individual nodes
• Each CAN node stores a chunk (zone) of the entire hash table
• A request for a particular key is routed through intermediate CAN nodes towards the node whose zone contains that key
• The design can be implemented at the application level (no kernel changes required)
Co-ordinate space in CAN
Design of CAN
• Involves a virtual d-dimensional Cartesian co-ordinate space
  – The co-ordinate space is completely logical
  – Lookup keys are hashed into this space
• The co-ordinate space is partitioned into zones among all nodes in the system
  – Every node in the system owns a distinct zone
• The distribution of zones across nodes forms an overlay network
Design of CAN (continued)
• To store a (key, value) pair, the key is mapped deterministically onto a point P in the co-ordinate space using a hash function
• The (key, value) pair is then stored at the node which owns the zone containing P
• To retrieve the entry corresponding to a key K, the same hash function is applied to map K to the point P
• The retrieval request is routed from the requestor node to the node owning the zone containing P
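As a rough sketch of this deterministic key-to-point mapping (the choice of SHA-1 and a 2-dimensional unit square are assumptions, not specified in the slides):

```python
import hashlib

def key_to_point(key: str, d: int = 2):
    """Hash `key` deterministically onto a point P in the d-dimensional unit
    cube [0, 1)^d by slicing the SHA-1 digest into d coordinates."""
    digest = hashlib.sha1(key.encode()).digest()        # 20 bytes
    chunk = len(digest) // d
    coords = []
    for i in range(d):
        part = digest[i * chunk:(i + 1) * chunk]
        coords.append(int.from_bytes(part, "big") / 2 ** (8 * chunk))
    return tuple(coords)

p = key_to_point("song.mp3")
print(p)   # the node owning the zone containing this point stores the (key, value) pair
```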
Routing in CAN
• Every CAN node holds the IP address and virtual co-ordinates of each of its neighbours
• Every message to be routed holds the destination co-ordinates
• Using its neighbours' co-ordinate sets, a node routes a message towards the neighbour with co-ordinates closest to the destination co-ordinates
• Progress: how much closer the message gets to the destination after being routed to one of the neighbours
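A minimal sketch of this greedy forwarding rule, assuming each node keeps a small table of neighbour coordinates (the table contents below are made up for illustration):

```python
import math

def closest_neighbour(neighbours, dest):
    """One greedy routing step: among the known neighbour coordinates,
    pick the neighbour closest (Euclidean distance) to the destination point."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    return min(neighbours, key=lambda name: dist(neighbours[name], dest))

# Illustrative neighbour table: name -> centre of that neighbour's zone
neighbours = {"A": (0.25, 0.75), "B": (0.75, 0.25), "C": (0.75, 0.75)}
print(closest_neighbour(neighbours, dest=(0.9, 0.8)))   # -> "C"
```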
Routing in CAN (continued)
• For a d-dimensional space partitioned into n equal zones, routing path length = O(d·n^(1/d)) hops
  – With an increase in the number of nodes, routing path length grows as O(n^(1/d))
• Every node has 2d neighbours
  – With an increase in the number of nodes, per-node state does not change
[Figures: CAN co-ordinate space before a node joins and after the node joins.]
Allocation of a new node to a zone
1. First, the new node must find a node already in CAN (using bootstrap nodes)
2. The new node randomly chooses a point P in the co-ordinate space
3. It sends a JOIN request for point P via any existing CAN node
4. The request is forwarded using the CAN routing mechanism to the node D owning the zone containing P
5. D then splits its zone in half and assigns one half to the new node
6. The new neighbour information is determined for both nodes
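A small sketch of the zone split in step 5, assuming zones are axis-aligned boxes; the transfer of (key, value) pairs whose points fall in the handed-over half, and the choice of split dimension, are left out:

```python
def split_zone(zone, dim=0):
    """Split a rectangular zone {'lo': [...], 'hi': [...]} in half along `dim`;
    the old owner keeps the lower half, the joining node takes the upper half."""
    mid = (zone["lo"][dim] + zone["hi"][dim]) / 2
    lower = {"lo": list(zone["lo"]), "hi": list(zone["hi"])}
    upper = {"lo": list(zone["lo"]), "hi": list(zone["hi"])}
    lower["hi"][dim] = mid
    upper["lo"][dim] = mid
    return lower, upper

old_half, new_half = split_zone({"lo": [0.0, 0.0], "hi": [0.5, 1.0]}, dim=1)
print(old_half)   # {'lo': [0.0, 0.0], 'hi': [0.5, 0.5]}  -- kept by D
print(new_half)   # {'lo': [0.0, 0.5], 'hi': [0.5, 1.0]}  -- assigned to the new node
```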
Failure of a node
• Even if one of the neighbours fails, messages can be routed through other neighbours in that direction
• If a node leaves CAN, the zone it occupies is taken over by the remaining nodes
  – If a node leaves voluntarily, it can hand over its database to some other node
  – When a node simply becomes unreachable, the database of the failed node is lost
    • CAN depends on sources to resubmit data in order to recover lost data
CHORD
Features
• Chord is a distributed hash table implementation
• Addresses a fundamental problem in P2P
  – Efficient location of the node that stores the desired data item
  – One operation: given a key, maps it onto a node
  – Data location by associating a key with each data item
• Adapts efficiently
  – Dynamic, with frequent node arrivals and departures
  – Automatically adjusts internal tables to ensure availability
• Uses consistent hashing
  – Load balancing in assigning keys to nodes
  – Little movement of keys when nodes join and leave
Features (continued)
• Efficient routing
  – Distributed routing table
  – Maintains information about only O(log N) nodes
  – Resolves lookups via O(log N) messages
• Scalable
  – Communication cost and state maintained at each node scale logarithmically with the number of nodes
• Flexible naming
  – Flat key-space gives applications flexibility to map their own names to Chord keys
• Decentralized
Some Terminology
• Key
  – A hash key or its image under the hash function, as per context
  – An m-bit identifier, using SHA-1 as a base hash function
• Node
  – An actual node or its identifier under the hash function
  – Length m chosen so that the probability of a hash collision is low
• Chord Ring
  – The identifier circle, ordering identifiers modulo 2^m
• Successor Node
  – The first node whose identifier is equal to or follows key k in the identifier space
• Virtual Node
  – Introduced to keep the bound on keys per node close to K/N
  – Each real node runs Ω(log N) virtual nodes, each with its own identifier
Chord Ring
Consistent Hashing
• A consistent hash function is one whose key-to-node mapping changes minimally as nodes join or leave, so a total remapping is not required
• Desirable properties
  – High probability that the hash function balances load
  – Minimal disruption: only an O(1/N) fraction of the keys move when a node joins or leaves
  – Every node need not know about every other node, only a small amount of “routing” information
• m-bit identifier for each node and key
• Key k is assigned to its successor node
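A hedged sketch of this assignment rule, with m = 6 and made-up node addresses purely for readability:

```python
import hashlib

m = 6                                   # small identifier space (2**m = 64) for readability

def ident(name: str) -> int:
    """m-bit identifier: SHA-1 image of the name, reduced mod 2**m."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** m)

def successor(key_id: int, node_ids) -> int:
    """First node identifier equal to or following key_id on the identifier circle."""
    for n in sorted(node_ids):
        if n >= key_id:
            return n
    return min(node_ids)                # wrap around the ring

node_ids = [ident(f"10.0.0.{i}:4000") for i in range(1, 6)]   # made-up node addresses
key_id = ident("song.mp3")
print(sorted(node_ids), key_id, "->", successor(key_id, node_ids))
```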
Simple Key Location
Example
Scalable Key Location
• A very small amount of routing information suffices to implement
consistent hashing in a distributed environment
• Each node need only be aware of its successor node on the circle
• Queries for a given identifier can be passed around the circle via these
successor pointers
• Resolution scheme correct, BUT inefficient: it may require traversing all N
nodes!
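A minimal sketch of this successor-pointer-only resolution, using the small 3-bit example ring (nodes 0, 1, 3) that the following slides use:

```python
def in_interval(x, a, b):
    """True if x lies in the circular interval (a, b] of the identifier ring."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def find_successor_linear(start, key_id, succ):
    """Simple key location: follow successor pointers (dict `succ`) until the
    key falls between a node and its successor -- O(N) hops in the worst case."""
    n = start
    while not in_interval(key_id, n, succ[n]):
        n = succ[n]
    return succ[n]

succ = {0: 1, 1: 3, 3: 0}                  # 3-bit example ring with nodes 0, 1, 3
print(find_successor_linear(0, 6, succ))   # key 6 -> node 0
```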
Acceleration of Lookups
• Lookups are accelerated by maintaining additional routing information
• Each node maintains a routing table with (at most) m entries, called the finger table (m is the number of bits in the identifiers, so the identifier space has 2^m values)
• The i-th entry in the table at node n contains the identity of the first node, s, that succeeds n by at least 2^(i-1) on the identifier circle (clarification on next slide)
• s = successor(n + 2^(i-1)) (all arithmetic mod 2^m)
• s is called the i-th finger of node n, denoted by n.finger(i).node
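A short sketch of the finger definition above, reproduced for the 3-bit example ring (nodes 0, 1, 3) shown next:

```python
def successor(x, node_ids):
    """First node identifier equal to or following x on the ring."""
    node_ids = sorted(node_ids)
    for nid in node_ids:
        if nid >= x:
            return nid
    return node_ids[0]                 # wrap around

def finger_table(n, m, node_ids):
    """i-th finger of node n = successor((n + 2**(i-1)) mod 2**m), i = 1..m."""
    return [successor((n + 2 ** (i - 1)) % 2 ** m, node_ids) for i in range(1, m + 1)]

print(finger_table(1, 3, [0, 1, 3]))   # -> [3, 3, 0], matching node 1's table below
```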
Finger Tables (1)
Example: identifier circle with m = 3, nodes 0, 1 and 3, and keys 1, 2 and 6.

Finger table of node 0 (keys: 6):
  start  interval  succ.
    1     [1,2)      1
    2     [2,4)      3
    4     [4,0)      0

Finger table of node 1 (keys: 1):
  start  interval  succ.
    2     [2,3)      3
    3     [3,5)      3
    5     [5,1)      0

Finger table of node 3 (keys: 2):
  start  interval  succ.
    4     [4,5)      0
    5     [5,7)      0
    7     [7,3)      0
Finger Tables (2) – characteristics
• Each node stores information about only a small number of other nodes, and knows more about nodes closely following it than about nodes farther away
• A node's finger table generally does not contain enough information to determine the successor of an arbitrary key k
• Repeated queries to nodes that immediately precede the given key eventually lead to the key's successor
Pseudo code
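The pseudocode on this slide is an image in the original deck. As a stand-in, here is a hedged Python sketch of the scalable lookup it presumably shows: hop to the closest preceding finger until the key falls between a node and its successor. The ring wiring below mirrors the 3-bit example finger tables above.

```python
class Node:
    def __init__(self, ident, m):
        self.id, self.m = ident, m
        self.successor = self               # fixed up once the ring is built
        self.finger = [self] * m            # finger[i] ~ successor(id + 2**i), i = 0..m-1

    def _in(self, x, a, b, incl_right=False):
        """x in the circular interval (a, b), optionally including b."""
        if a < b:
            return a < x < b or (incl_right and x == b)
        return x > a or x < b or (incl_right and x == b)

    def closest_preceding_finger(self, key_id):
        for f in reversed(self.finger):     # try the farthest finger first
            if self._in(f.id, self.id, key_id):
                return f
        return self

    def find_successor(self, key_id):
        n = self
        while not n._in(key_id, n.id, n.successor.id, incl_right=True):
            n = n.closest_preceding_finger(key_id)
        return n.successor

# Wire up the 3-bit example ring: nodes 0, 1, 3
m = 3
n0, n1, n3 = Node(0, m), Node(1, m), Node(3, m)
n0.successor, n1.successor, n3.successor = n1, n3, n0
n0.finger = [n1, n3, n0]                    # successors of 1, 2, 4
n1.finger = [n3, n3, n0]                    # successors of 2, 3, 5
n3.finger = [n0, n0, n0]                    # successors of 4, 5, 7
print(n3.find_successor(1).id)              # key 1 is stored at node 1
```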
Example
Node Joins – with Finger Tables
[Figure: finger tables and key locations after node 6 joins the example ring (m = 3; nodes 0, 1, 3; keys 1, 2, 6). Node 6's finger table has starts 7, 0, 2 with intervals [7,0), [0,2), [2,6) and successors 0, 0, 3. Changed entries: node 0's third finger and node 1's third finger become 6 (previously 0), node 3's first and second fingers become 6 (previously 0), and key 6 moves from node 0 to node 6.]
Node Departures – with Finger Tables
[Figure: finger tables and key locations after node 1 leaves the ring of nodes 0, 1, 3, 6. Node 0's first finger changes from 1 to 3, and key 1 moves from node 1 to node 3; the remaining finger-table entries are as after node 6's join.]
Source of Inconsistencies:
Concurrent Operations and Failures
• Basic “stabilization” protocol is used to keep nodes’ successor pointers up
to date, which is sufficient to guarantee correctness of lookups
• Those successor pointers can then be used to verify the finger table
entries
• Every node runs stabilize periodically to find newly joined nodes
Pseudo code
Pseudo code (continued)
Stabilization after Join
[Figure: ring segment np → n → ns, where n joins between np and ns; succ(np) changes from ns to n, and pred(ns) changes from np to n.]
• n joins
  – predecessor = nil
  – n acquires ns as its successor via some n'
  – n notifies ns of being its new predecessor
  – ns acquires n as its predecessor
• np runs stabilize
  – np asks ns for its predecessor (now n)
  – np acquires n as its successor
  – np notifies n
  – n will acquire np as its predecessor
• All predecessor and successor pointers are now correct
• Fingers still need to be fixed, but old fingers will still work
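The pseudocode slides above are images in the original deck; the following is a hedged Python sketch of the steps just listed (stabilize, notify, and a finger-refresh step commonly called fix_fingers), not the paper's exact code:

```python
import random

def between(x, a, b):
    """x strictly inside the circular interval (a, b) of the identifier ring."""
    return (a < x < b) if a < b else (x > a or x < b)

class Node:
    def __init__(self, ident, m):
        self.id, self.m = ident, m
        self.successor, self.predecessor = self, None
        self.finger = [self] * m

    def stabilize(self):
        """Ask the successor for its predecessor; adopt it if it sits in between."""
        x = self.successor.predecessor
        if x is not None and between(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, candidate):
        """candidate thinks it might be our predecessor."""
        if self.predecessor is None or between(candidate.id, self.predecessor.id, self.id):
            self.predecessor = candidate

    def fix_fingers(self, lookup):
        """Refresh one finger entry per round, using any working lookup routine."""
        i = random.randrange(self.m)
        self.finger[i] = lookup((self.id + 2 ** i) % 2 ** self.m)

# n (id 6) joins between np (id 2) and ns (id 10), m = 4
np_, n, ns = Node(2, 4), Node(6, 4), Node(10, 4)
np_.successor, ns.predecessor = ns, np_
n.successor = ns              # n acquires ns as its successor via some n'
ns.notify(n)                  # n notifies ns; ns acquires n as its predecessor
np_.stabilize()               # np learns about n and adopts it as its successor
print(np_.successor.id, ns.predecessor.id)   # -> 6 6
```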
Failure Recovery
• The key step in failure recovery is maintaining correct successor pointers
• To help achieve this, each node maintains a successor list of its r nearest successors on the ring
• If node n notices that its successor has failed, it replaces it with the first live entry in the list
• stabilize will correct finger table entries and successor-list entries pointing to the failed node
• Performance is sensitive to the frequency of node joins and leaves versus the frequency at which the stabilization protocol is invoked
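A minimal sketch of the failover rule in the bullets above; the `alive` set is a stand-in for whatever failure detection (e.g. RPC timeouts) a real node would use:

```python
def current_successor(successor_list, alive):
    """Return the first live entry of the r-entry successor list,
    skipping nodes that have failed (not in `alive`)."""
    for s in successor_list:
        if s in alive:
            return s
    raise RuntimeError("all r successors failed; ring repair needed")

succ_list = [12, 17, 23]          # r = 3 nearest successors of some node
alive = {17, 23, 40}              # node 12 has failed
print(current_successor(succ_list, alive))   # -> 17
```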
Impact of Node Joins on Lookups:
Correctness
• For a lookup before stabilization has finished,
1. Case 1: All finger table entries involved in the lookup
are reasonably current then lookup finds correct
successor in O(logN) steps
2. Case 2: Successor pointers are correct, but finger
pointers are inaccurate. This scenario yields correct
lookups but may be slower
3. Case 3: Incorrect successor pointers or keys not
migrated yet to newly joined nodes. Lookup may
fail. Option of retrying after a quick pause, during
which stabilization fixes successor pointers
Impact of Node Joins on Lookups:
Performance
• After stabilization, no effect other than increasing the value of N in O(log N)
• Before stabilization is complete
  – Possibly incorrect finger table entries
  – This does not significantly affect lookup speed, since the distance-halving property depends only on ID-space distance
  – If new nodes' IDs fall between the target's predecessor and the target, lookup speed is affected
  – Still takes O(log N) time for N new nodes
Handling Failures
• Problem: what if a node does not know who its new successor is after the failure of its old successor?
  – The new successor may lie in a gap in the finger table
  – Chord would be stuck!
• Solution: maintain a successor list of size r, containing the node's first r successors
  – If the immediate successor does not respond, substitute the next entry in the successor list
  – A modified version of the stabilize protocol maintains the successor list
  – A modified closest_preceding_node searches not only the finger table but also the successor list for the most immediate predecessor
  – If find_successor fails, retry after some time
• Voluntary node departures
  – Transfer keys to the successor before departure
  – Notify predecessor p and successor s before leaving
Theorems
• Theorem IV.3: inconsistencies in successor pointers are transient
  – If any sequence of join operations is executed interleaved with stabilizations, then at some time after the last join the successor pointers will form a cycle on all the nodes in the network
• Theorem IV.4: lookups take O(log N) time with high probability even if N nodes join a stable N-node network, once successor pointers are correct, even if finger pointers are not yet updated
• Theorem IV.6: if the network is initially stable, even if every node then fails with probability ½, the expected time to execute find_successor is O(log N)
Simulation
• The simulator implements the iterative style (the alternative is the recursive style)
• The node resolving a lookup initiates all communication, unlike the recursive style, where intermediate nodes forward the request
Optimizations
• During stabilization, a node updates its immediate successor and one other entry in its successor list or finger table
• Each of the k unique entries therefore gets refreshed once every k stabilization rounds
• The size of the successor list is 1
• Immediate notification of a predecessor change to the old predecessor, without waiting for the next stabilization round
Parameters
• Mean delay of each packet is 50 ms
• Round-trip time is 500 ms
• Number of nodes is 10^4
• Number of keys varies from 10^4 to 10^6
Load Balance
• Tests the ability of consistent hashing to allocate keys to nodes evenly
• The number of keys per node exhibits large variations that increase linearly with the number of keys
• Associating keys with virtual nodes makes the number of keys per node more uniform and significantly improves load balance
• The asymptotic value of the query path length is not affected much
• The total identifier space covered remains the same on average
• The worst-case number of queries does not change
• Not much increase in the routing state maintained
• The asymptotic number of control messages is not affected
[Figures: distribution of keys per node in the absence of virtual nodes and in the presence of virtual nodes.]
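A small simulation sketch of the effect described above, comparing one identifier per node with several virtual identifiers per node; the parameters (1,000 nodes, 100,000 keys, 16 virtual nodes) are arbitrary choices for the sketch, not the paper's settings:

```python
import hashlib
from bisect import bisect_left
from collections import Counter

def h(s):
    """Identifier in [0, 2**32) derived from SHA-1."""
    return int(hashlib.sha1(s.encode()).hexdigest(), 16) % (2 ** 32)

def keys_per_node(num_nodes, num_keys, vnodes):
    """Assign keys to successor nodes when each node places `vnodes` points on the ring."""
    ring = sorted((h(f"node{n}/v{v}"), n) for n in range(num_nodes)
                                          for v in range(vnodes))
    ids = [point for point, _ in ring]
    load = Counter()
    for k in range(num_keys):
        i = bisect_left(ids, h(f"key{k}"))      # successor position on the ring
        load[ring[i % len(ring)][1]] += 1       # wrap around past the last point
    return load

for v in (1, 16):                               # 1 identifier vs. 16 virtual identifiers
    load = keys_per_node(1000, 100_000, v)
    print(f"{v:2d} virtual node(s): max keys on one node = {max(load.values())} (ideal ~100)")
```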
Path Length
• The number of nodes that must be visited to resolve a query, measured as the query path length
• As per Theorem IV.2, the number of nodes that must be contacted to find a successor in an N-node network is O(log N)
• Observed results:
  – The mean query path length increases logarithmically with the number of nodes
  – The observed average matches the expected query path length
Path Length Simulator Parameters
• A network with N = 2^k nodes
• Number of keys: 100 × 2^k
• k is varied between 3 and 14 and the path length is measured

[Figure: graph of query path length versus number of nodes.]
Future Work
• Resilience against network partitions
  – Detect and heal partitions
  – For every node, have a set of initial nodes
  – Maintain a long-term memory of a random set of nodes
  – Likely to include nodes from the other partition
• Handle threats to the availability of data
  – Malicious participants could present an incorrect view of the data
  – Periodic global consistency checks by each node
• Better efficiency
  – O(log N) messages per lookup is too many for some applications
  – Increase the number of fingers
References
• Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications. I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, H. Balakrishnan. In Proc. ACM SIGCOMM 2001; expanded version in IEEE/ACM Transactions on Networking, 11(1), February 2003.
• A Scalable Content-Addressable Network. S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker. In Proc. ACM SIGCOMM 2001.
• Querying the Internet with PIER. R. Huebsch, J. M. Hellerstein, N. Lanham, B. T. Loo, S. Shenker, I. Stoica. In Proc. VLDB 2003.
Thank You!
Any Questions?