Download Document

Distributed Hash tables 1 Overview  Objective A distributed lookup service  Data items are distributed among n parties  Anyone in the network can find an item efficiently  No central node or infrastructure   Applications: P2P file sharing, without central node (items are files)  Large scale, distributed or grid computation (items are computational tasks)  2 Chord  Uses consistent hashing to map keys to nodes.  Enables Lookup  Node joining the network  Node leaving the network   Each node maintains routing information on the network. 3 Consistent hashing  Consistent hash function assigns each node and key an m-bit identifier.  A node is a member of the network: a processing and communication unit.  A key identifies a data item:  Example: movie name  A node’s identifier can be defined by e.g. hashing the node’s IP address.  An identifier for node or key can be produced by hashing   ID(node) = hash(IP address) ID(key) = hash(key) 4 Consistent hashing (cont.)  Properties of consistent hashing:  Load balancing – given N nodes, with high probability O(1/N) of the keys are stored in each node.  When the N-th node joins, with high probability O(1/N) of the keys are moved.  Practical problems with identifiers  IP is not always a good identifier: • NAT • DHCP  Key is not always well defined (e.g. file name). 5 Chord ring  Chord orders all m-bit identifiers (both nodes and      keys) on ring of 2m elements Order is defined modulo 2m (2m-1<0) Identifier space for keys – {0,…,2m-1}  Not all keys need be in use concurrently Identifier space for nodes – {0,…,2m-1}  Not all nodes are necessarily active concurrently Key k is assigned to node n if  id(n)=min {id (n’); id(n’)id(k)} ( is interpreted modulo 2m, i.e. first node clockwise). n is called successor(k) 6 Successor Nodes identifier node 6 1 0 successor(6) = 0 6 identifier circle 6 5 key successor(1) = 1 1 7 X 2 2 successor(2) = 3 3 4 2 m=3 7 More on hashing  Desirable properties for hash function:  Collision resistant on nodes (mandatory)  Collision resistant on keys (hopefully)  Load balancing  Good adversarial behavior.  Chord suggestion  SHA-1  ID(node) = SHA-1(IP address)  ID(key) = SHA-1(key)  Drawback  Full SHA-1 implies m=160  Truncated SHA-1 may not be collision-resistant 8 Key lookup  Suppose each node stores successor(node)  We could use the following pseudo-code n.find_successor(id) if (id  [node, successor]) return successor; else // forward the query around the circle return successor.find_successor(id);  Example of execution  What is the time/message complexity? 9 Key lookup in O(m)  Each node n holds additional routing information: finger table m entries (at most)  Each entry is called “finger”   Data in each entry Finger[i].node=successor (n+2i-1 mod 2m), i=1,…,m  Finger[i] includes TCP/IP access information such as IP and port.   Denote the i-th finger by node.finger(i) 10 Example – finger tables finger table start int. 1 2 4 [1,2) [2,4) [4,0) 1 6 1 3 0 finger table start Int. 0 7 succ. keys 6 2 3 5 [2,3) [3,5) [5,1) succ. keys 1 3 3 0 2 5 3 4 finger table start int. 4 5 7 [4,5) [5,7) [7,3) succ. keys 2 0 0 0 11 Example – fast lookup 12 Fast lookup - pseudo code // ask node n to find the successor of id n.find_successor(id) if (id  (n, successor]) return successor; else n’ = closest_preceding_node(id); return n’.find_successor(id); // search the local table for the highest predecessor of id n.closest_preceding_node(id) for i = m downto 1 if (finger[i]  (n, id)) return finger[i]; return n; 13 Fast lookup - analysis  Time complexity at each node  O(m)  Theorem - Number of nodes in a search path is O(m) in the worst case  Number of messages is equal to number of nodes in search path  Theorem – if nodes are distributed uniformly among identifiers then number of nodes in search path is O(log n) with high probability. 14 Fast lookup – analysis (cont.)  Xi – i-th node is in last interval of 2m/N identifiers in search path.  X=Σxi  =E[X]=1  Chernoff bound –  Pr[X>(1+)]<(e/(1+) 1+)  15 Node Joins and Stabilizations  Nodes join dynamically  Network maintains two invariants  Node successor is up to date.  For every key k, successor(k) is responsible for k.  When node n joins:  n.successor and n.finger[i] are updated.  Successor and finger tables of other nodes are updated.  Correct keys are transferred to n.  Each node runs a “stabilization” protocol periodically in the background to update successor pointer and finger table. 16 Node Joins and Stabilizations  “Stabilization” protocol contains 6 functions: create()  join()  stabilize()  notify()  fix_fingers()  check_predecessor()   Each node has both successor and predecessor pointers 17 Node Joins – join()  When node n first starts, it calls n.join(n’), where n’ is any known Chord node.  The join() function asks n’ to find the immediate successor of n.  join() does not make the rest of the network aware of n. 18 Node Joins – join() // create a new Chord ring. n.create() predecessor = nil; successor = n; // join a Chord ring containing node n’. n.join(n’) predecessor = nil; successor = n’.find_successor(n); 19 Node Joins – stabilize()  Each time node n runs stabilize(), it asks its successor for the successor’s predecessor p, and decides whether p should be n’s successor instead.  stabilize() notifies node n’s successor of n’s existence, giving the successor the chance to change its predecessor to n.  The successor does this only if it knows of no closer predecessor than n. 20 Node Joins – stabilize() // called periodically. verifies n’s immediate // successor, and tells the successor about n. n.stabilize() x = successor.predecessor; if (x  (n, successor)) successor = x; successor.notify(n); // n’ thinks it might be our predecessor. n.notify(n’) if (predecessor is nil or n’  (predecessor, n)) predecessor = n’; 21 Node Joins – Join and Stabilization  ns n joins   n runs stabilize   np succ(np) = ns n nil succ(np) = n pred(ns) = np pred(ns) = n   predecessor = nil n acquires ns as successor via some n’ n notifies ns being the new predecessor ns acquires n as its predecessor np runs stabilize     np asks ns for its predecessor (now n) np acquires n as its successor np notifies n n will acquire np as its predecessor  all predecessor and successor pointers are now correct  fingers still need to be fixed, but old fingers will still work 22 Node Joins – fix_fingers()  Each node periodically calls fix fingers to make sure its finger table entries are correct.  New nodes initialize their finger tables using fix_fingers().  Existing nodes incorporate new nodes into their finger tables using fix_fingers().  Each node maintains a pointer next into its finger table. 23 Node Joins – fix_fingers() // called periodically. refreshes finger table entries. n.fix_fingers() next = next + 1 ; if (next > m) next = 1 ; finger[next].node = find_successor(n + 2next-1); // checks whether predecessor has failed. n.check_predecessor() if (predecessor has failed) predecessor = nil;  What is the complexity? 24 Node Failures  Key step in failure recovery is maintaining correct successor pointers  To help achieve this, each node maintains a successor-list of its r nearest successors on the ring  If node n notices that its successor has failed, it replaces it with the first live entry in the list  Successor lists are stabilized as follows:   node n reconciles its list with its successor s by copying s’s successor list, removing its last entry, and prepending s to it. If node n notices that its successor has failed, it replaces it with the first live entry in its successor list and reconciles its successor list with its new successor. 25

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Document