* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download CS2104 Lecture1 - Royal Institute of Technology
Cracking of wireless networks wikipedia , lookup
Computer network wikipedia , lookup
Network tap wikipedia , lookup
Backpressure routing wikipedia , lookup
Distributed operating system wikipedia , lookup
Recursive InterNetwork Architecture (RINA) wikipedia , lookup
Airborne Networking wikipedia , lookup
List of wireless community networks by region wikipedia , lookup
IEEE 802.1aq wikipedia , lookup
Introduction to Structured Overlay Networks Seif Haridi KTH/SICS 5/25/2017 1 Presentation Overview Gentle introduction to Structured Overlay Networks and Distributed Hash Tables General use of SONs and DHTs Chord algorithms and others 5/25/2017 EP2400 Network Algorithms, Seif Haridi 2 What’s a Distributed Hash Table (DHT)? 5/25/2017 An ordinary hash table , which is distributed Key Value Alexander Berlin Ali Stockholm Marina Gothenburg Peter Louvain la neuve Seif Stockholm Stefan Stockholm Every node provides a lookup operation Provide the value associated with a key Nodes keep routing pointers If item not found, route to another node 3 So what? Characteristic Scalability properties Time to findofdata is Store number items Self-management routing info: logarithmic proportional toinformation number of • Ensure routing Size of routing tables is nodes is up-to-date logarithmic Typically: Example: of items: of nodes can be huge Self-management With D items and n nodes that data is always log2(1000000)≈20 Number of items can be huge • Ensure Number Store D/n items per node replicated and available EFFICIENT! in presence joins/leaves/failures Move D/n items when nodes join/leave/fail Routing information Data items EFFICIENT! Self-manage 5/25/2017 4 Traditional Motivation (1/2) Peer-to-Peer file sharing very popular central index Napster Completely centralized Central server knows who has what Judicial problems decentralized index Gnutella 5/25/2017 Completely decentralized Ask everyone you know to find data Very inefficient EP2400 Network Algorithms, Seif Haridi 5 Traditional Motivation (2/2) Grand vision of DHTs Provide efficient file sharing Quote from Chord: ”In particular, [Chord] can help avoid single points of failure or control that systems like Napster possess, and the lack of scalability that systems like Gnutella display because of their widespread use of broadcasts.” [Stoica et al. 2001] Hidden assumptions Millions of unreliable nodes User can switch off computer any time (leave=failure) Extreme dynamism (nodes joining/leaving/failing) Heterogeneity of computers and latencies Untrusted nodes 5/25/2017 EP2400 Network Algorithms, Seif Haridi 6 Motivation: DHT overlay as communication infra-structure Internet communication Not suited for 21st century computing 5/25/2017 IP/port, TCP and UDP Firewalls NATs Changing IP addresses EP2400 Network Algorithms, Seif Haridi 7 Name based communication DHT’s can overcome these node A Value seif 130.237.32.51 babak 193.10.64.99 mihhail 18.7.22.83 olle … 128.178.50.12 … node B node D node C How? 5/25/2017 Key Use the DHT Map names to locations Bypass firewalls and NATs by routing through neighbors EP2400 Network Algorithms, Seif Haridi 8 Name based communication What about group communication? IP Multicast is not enabled on the Internet Use the overlay to broadcast to all nodes Create multiple groups, broadcast within each node A node A 5/25/2017 Key Value seif 130.237.32.51 babak 193.10.64.99 mihhail 18.7.22.83 olle … 128.178.50.12 … node B node D node C Key Value seif 130.237.32.51 babak 193.10.64.99 mihhail 18.7.22.83 olle … 128.178.50.12 EP2400 Network Algorithms, Seif Haridi … node B node D node C 9 What’s it good for? Lets look at 10 applications built using such systems 5/25/2017 EP2400 Network Algorithms, Seif Haridi 10 Distributed Authorization Defense project at SICS, Swedish Institute of Computer Science node A Key Value seif 130.237.32.51 babak 193.10.64.99 mihhail 18.7.22.83 olle 128.178.50.12 node B Store certificates in the directory node C node D … … No central server Survives even if nodes are attacked 5/25/2017 EP2400 Network Algorithms, Seif Haridi 11 Distributed Backup Setup Clients installed the backup tool Decide on amount of space to share Choose files for backup node A Regular backup Key Value seif 130.237.32.51 babak 193.10.64.99 mihhail 18.7.22.83 olle 128.178.50.12 node B Data is encrypted Stored in the directory node C node D … 5/25/2017 EP2400 Network Algorithms, Seif Haridi … 12 Distributed File System Similar to AFS and NFS Files stored in directory What is new? Application logic self-managed node A Key Value seif 130.237.32.51 babak 193.10.64.99 mihhail 18.7.22.83 olle 128.178.50.12 node B node C node D … … Add/remove servers on the fly Automatically handles failures Automatically load-balances No manual configuration needed 5/25/2017 EP2400 Network Algorithms, Seif Haridi 13 P2P Cache A distributed cache Every node in an org. runs a client Want to browse a web page? If exists locally -> download it from a peer Otherwise, fetch and cache Key Value seif 130.237.32.51 babak 193.10.64.99 mihhail 18.7.22.83 olle 128.178.50.12 node A node B No central proxy needed … 5/25/2017 EP2400 Network Algorithms, Seif Haridi … node C node D 14 P2P Web Servers Distributed Web Server node A Pages stored in the directory What is new? Key Value seif 130.237.32.51 babak 193.10.64.99 mihhail 18.7.22.83 olle 128.178.50.12 node B … node C node D … Application logic self-managed Automatically load-balances Add/remove servers on the fly Automatically handles failures 5/25/2017 EP2400 Network Algorithms, Seif Haridi 15 P2P SIP node A Session Initiation Protocol Key Value node B Used to initiate calls on the Internet seif 130.237.32.51 babak 193.10.64.99 mihhail 18.7.22.83 olle 128.178.50.12 Is being standardized Use the directory to find end-hosts 5/25/2017 node C node D … … Improving Skype EP2400 Network Algorithms, Seif Haridi 16 Host Identity Payload (HIP) Uses the directory to provide seamless mobility Unlike Mobile IP 5/25/2017 No home agent needed Self-managing EP2400 Network Algorithms, Seif Haridi 17 PIER (databases) A relational view of the directory Use SQL to fetch data 5/25/2017 Standard operations (projection, selection, equijoin) EP2400 Network Algorithms, Seif Haridi 18 Summary DHT is a useful data structure Assumptions mentioned might not be true Moderate amount of dynamism Leave not same thing as failure Dedicated servers 5/25/2017 Nodes can be trusted Less heterogeneity EP2400 Network Algorithms, Seif Haridi 19 Chord as Example of DHT 5/25/2017 EP2400 Network Algorithms, Seif Haridi 20 How to construct a DHT (Chord)? Use a logical name space, called the identifier space, consisting of identifiers {0,1,2,…, N-1} Identifier space is a logical ring modulo N Every node picks a random identifier though Hash H Example: Space N=16 {0,…,15} 15 a picks 6 b picks 5 c picks 0 d picks 11 e picks 2 1 14 Five nodes a, b, c, d, e 0 2 13 3 12 4 5 11 5/25/2017 EP2400 Network Algorithms, Seif Haridi 6 10 9 8 7 21 Definition of Successor The successor of an identifier is the first node met going in clockwise direction starting at the identifier 15 Example succ(12)=14 succ(15)=2 succ(6)=6 0 1 14 2 13 3 12 4 5 11 5/25/2017 6 10 9 8 7 22 Where to store data (Chord) ? Use globally known hash function, H Store number of items Each item <key,value> gets proportional to number of identifier H(key) = k nodes Typically: Store is responsible for item k With D items and nn nodes Node EFFICIENT! H(“Seif”)=9 5/25/2017 H(“Stefan”)=14 Alexander Berlin Marina Seif Gothenburg Louvain la neuve Stockholm Stefan Stockholm 15 Example Move D/n items when H(“Marina”)=12 nodes join/leave/fail H(“Peter”)=2 Value Peter each item at its successor Store D/n items per node Key 0 1 14 2 13 3 12 4 5 11 6 10 9 8 7 23 Where to point (Chord) ? Each node points to its successor successor of a node n is succ(n+1) Known as a node’s succ pointer The Each node points to its predecessor node met in anti-clockwise direction starting at n-1 Known as a node’s pred pointer First Example 0’s successor is succ(1)=2 2’s successor is succ(3)=5 5’s successor is succ(6)=6 6’s successor is succ(7)=11 11’s successor is succ(12)=0 15 1 14 5/25/2017 0 2 13 3 12 4 5 11 6 10 9 8 7 24 DHT Lookup To lookup a key k Calculate H(k) Follow succ pointers until item k is found Key Value Alexander Berlin Marina Gothenburg Peter Louvain la neuve Seif Stockholm Stefan Stockholm Example Lookup ”Seif” at node 2 H(”Seif”)=9 Traverse nodes: 2, 5, 6, 11 (BINGO) Return “Stockholm” to initiator 5/25/2017 15 0 1 14 2 13 3 12 4 5 11 6 10 9 8 7 25 DHT Lookup (a, b] the segment of the ring moving clockwise from but not including a until and including b n.foo(.) denotes an RPC of foo(.) to node n n.bar denotes and RPC to fetch the value of the variable bar in node n We call the process of finding the successor of an id a LOOKUP // ask node n to find the successor of id procedure n.findSuccessor(id) if predecessor nil id (predecessor, n] then return n else if id (n, successor] then return successor else // forward the query around the circle return successor.findSuccessor(id) 5/25/2017 EP2400 Network Algorithms, Seif Haridi 26 DHT Lookup and Update // ask node n to find the successor of id procedure n.put(id,value) s = findSuccessor(id) s.store(id,value) procedure n.get(id) s = findSuccessor(id) return s.retrieve(id) PUT and GET are nothing but lookups!! 5/25/2017 EP2400 Network Algorithms, Seif Haridi 27 Speeding up lookups If only pointer to succ(n+1) is used Worst case lookup time is N, for N nodes Time to find data is Improving lookup time (finger/routing table) logarithmic Point to succ(n+1) 15 Size of routing tables is Point to succ(n+2) 14 logarithmic Point to succ(n+4) Example: Point to succ(n+8) log2(1000000)≈20 … Point to succ(n+2M-1) EFFICIENT! 0 1 2 5/25/2017 Distance always halved to the destination 13 3 12 4 5 11 6 10 9 8 7 28 Chord – Routing (1/7) 15 Routing table size: M, where N = 2M Every node n knows successor(n + 2 i-1) , for i = 1..M Routing entries = log2(N) log2(N) hops from any node to any other node 15 0 Get(15 ) 1 2 14 13 3 12 4 11 5 10 6 9 5/25/2017 EP2400 Network Algorithms, Seif Haridi 8 7 29 Chord – Routing (2/7) 15 Routing table size: M, where N = 2M Every node n knows successor(n + 2 i-1) , for i = 1..M Routing entries = log2(N) log2(N) hops from any node to any other node 15 0 1 2 14 13 3 12 4 11 5 10 Get(15 ) 5/25/2017 6 9 EP2400 Network Algorithms, Seif Haridi 8 7 30 Chord – Routing (3/7) Get(15 ) Routing table size: M, where N = 2M Every node n knows successor(n + 2 i-1) , for i = 1..M Routing entries = log2(N) log2(N) hops from any node to any other node 15 15 0 1 2 14 13 3 12 4 11 5 10 6 9 5/25/2017 EP2400 Network Algorithms, Seif Haridi 8 7 31 Chord – Routing (4/7) 15 From node 1, only 2 hops to 14 node 0 where item 15 is stored 13 For an id space of 16 is, the maximum is log2(16) = 4 12 hops between any two nodes In fact, if nodes are uniformly distributed, the maximum is 11 log2(# of nodes), i.e. log2(8) hops between any two nodes 10 The average complexity is: ½ log(#nodes) 5/25/2017 15 0 Get(15 ) 1 2 3 4 5 6 9 EP2400 Network Algorithms, Seif Haridi 8 7 32 Chord – Routing (5/7) Pseudo code findSuccessor(.) // ask node n to find the successor of id procedure n.findSuccessor(id) if predecessor nil id (predecessor, n] then return n if id (n, successor] then return successor else n’ := closestPrecedingNode(id) return n’.findSuccessor(id) // search locally for the highest predecessor of id procedure closestPrecedingNode(id) for i = m downto 1 do if finger[i] (n, id) then return finger[i] end return n 5/25/2017 EP2400 Network Algorithms, Seif Haridi 33 Chord: Discussion We are basically done But…. What about joins and failures/leaves? Nodes come and go as they wish What about data? Should I lose my doc because some kid decided to shut down his machine and he happened to store my file? What about storing addresses of files instead of files? What did we gain compared to Gnutella? Increased guarantees and determinism? So actually we just started.. 5/25/2017 EP2400 Network Algorithms, Seif Haridi 34 Agenda Handling successor pointers Scalability Joins, Leaves Routing table reducing the cost from O(N) to O(logN) Failures (for all the above) 5/25/2017 EP2400 Network Algorithms, Seif Haridi 35 Handling Successors Ring maintenance Every thing depends on successor pointers, so, we better have them right all the time!! In Chord, in addition to the successor pointer, every node has a predecessor pointer as well for ring maintenance 5/25/2017 EP2400 Network Algorithms, Seif Haridi 36 Handling Dynamism Periodic stabilization is used to make pointers eventually correct Try pointing succ to closest alive successor Try pointing pred to closest alive predecessor Periodically at n: When receiving notify(p) at n: 1. v:=succ.pred 2. if vnil and v is in (n,succ] 3. set succ:=v 4. send a notify(n) to succ 1. if pred=nil or p is in (pred,n] 2. set pred:=p 5/25/2017 EP2400 Network Algorithms, Seif Haridi 37 Handling joins When n joins Find n’s successor with lookup(n) Set succ to n’s successor Stabilization fixes the rest 15 13 11 Periodically at n: When receiving notify(p) at n: 1. 2. 3. 4. 1. 2. set v:=succ.pred if v≠nil and v is in (n,succ] set succ:=v send a notify(n) to succ 5/25/2017 if pred=nil or p is in (pred,n] set pred:=p EP2400 S. Haridi, Network ID2210, Algorithms, Lecture Seif 02Haridi 38 Handling Successors - Chord Algorithm nil 5/25/2017 EP2400 Network Algorithms, Seif Haridi 39 Handling Join/Leaves For Fingers Finger Stabilization (1/5) Periodically refresh finger table entries, and store the index of the next finger to fix This is also the initialization procedure for the finger table Local variable next initially 0 procedure n.fixFingers() next := next+1 if next > m then next := 1 finger[next] := findSuccessor(n 2next-1) 5/25/2017 EP2400 Network Algorithms, Seif Haridi 40 Example finger stabilization (2/5) Current situation: succ(N48) is N60 Succ(N21.Fingerj.start) = Succ(53) =N21.Fingerj.node = N60 N21.Fingerj.start N21 5/25/2017 N26 N32 N48 N53 EP2400 Network Algorithms, Seif Haridi N21.Fingerj.node N60 41 Example finger stabilization (3/5) New node N56 joins and stabilizes successor pointer Finger j of node N21 is wrong N53 eventually try to fix finger j by looking up 53 which stops at N48, however and nothing changes N21.Fingerj.start N21 N26 N32 N21.Fingerj.node N60 N48 N53 N56 5/25/2017 EP2400 Network Algorithms, Seif Haridi 42 Example finger stabilization (4/5) N48 will eventually stabilize its successor This means the ring is correct now. N21.Fingerj.start N21 5/25/2017 N26 N32 N48 N53 EP2400 Network Algorithms, Seif Haridi N21.Fingerj.node N56 N60 43 Example finger stabilization (5/5) When N21 tries to fix Finger j again, this time the response from N48 will be correct and N21 corrects the finger N21.Fingerj.node N21.Fingerj.start N21 5/25/2017 N26 N32 N48 N53 EP2400 Network Algorithms, Seif Haridi N56 N60 44 Agenda Handling successor pointers Scalability Joins, Leaves, Routing table reducing the cost from O(N) to O(log N) Failures (for all the above) Handling data 5/25/2017 Joins, Leaves EP2400 Network Algorithms, Seif Haridi 45 Handling Failures – Replication of Successors Evidently the failure of one successor pointer means total collapse Solution: A node has a “successors list” of size r containing the immediate r successors How big should r be? log(N) or a large constant should be ok Enhance periodic stabilization to handle failures 5/25/2017 46 Dealing with failures Each node keeps a successor-list Pointer to r closest successors succ(n+1) succ(succ(n+1)+1) succ(succ(succ(n+1)+1)+1) ... 15 0 1 14 2 13 3 12 4 5 11 If successor fails 6 10 9 Replace with closest alive successor 8 7 If predecessor fails Set pred to nil 5/25/2017 EP2400 Network Algorithms, Seif Haridi 47 Handling leaves When n leaves Just dissappear (like failure) 15 13 When pred detected failed Set pred to nil When succ detected failed Set succ to closest alive in successor 11 list Periodically at n: When receiving notify(p) at n: 1. 2. 3. 4. 1. 2. set v:=succ.pred if v≠nil and v is in (n,succ] set succ:=v send a notify(n) to succ 5/25/2017 if pred=nil or p is in (pred,n] set pred:=p EP2400 S. Haridi, Network ID2210, Algorithms, Lecture Seif 02Haridi 48 Handling Failures- Ring (1/5) Maintaining the ring Each node maintains a successor list of length r If a node’s immediate successor fails, it uses the second entry in its successor list updateSuccessorList copies a successor list from s: removing last entry, and prepending s Join a Chord containing node n’ procedure n.join(n’) predecessor := nil s := n’.findSuccessor(n) updateSuccessorList(s.successorList) 5/25/2017 EP2400 S. Haridi, Network ID2210, Algorithms, Lecture Seif 02Haridi 49 Handling Failures- Ring (2/5) Check whether predecessor has failed (Failure detector) procedure n.checkPredecessor() if predecessor has failed then predecessor := nil 5/25/2017 EP2400 Network Algorithms, Seif Haridi 50 Handling Failures- Ring (3/5) procedure n.stabilize() s := Find first alive node in successorList x := s.predecessor if x not nil and x (n, s) then s := x end updateSuccessorList(s.successorList) s.notify(n) procedure n.notify(n’) if predecessor = nil or n’ (predecessor, n) then predecessor := n’ 5/25/2017 EP2400 Network Algorithms, Seif Haridi 51 Failure – Ring (4/5) Example – Node failure (N26) Initially suc(N21,2) suc(N21,1) N26 N21 N32 pred(N32) pred(N32) suc(N26,1) After N21 performed stabilize(), before N21.notify(N32) suc(N21,1) N21 N32 N26 pred(N32) 5/25/2017 EP2400 Network Algorithms, Seif Haridi 52 Failure – Ring (5/5) Example - Node failure (N26) After N21 performed stabilize(), before N21.notify(N32) N21.notify(N32) has no effect suc(N21,1) N21 N32 N26 pred(N32) After N32.checkPredecessor() suc(N21,1) N21 N26 N32 Next N21.stabilize() fixes N32’s predecessor 5/25/2017 EP2400 Network Algorithms, Seif Haridi 53 Failure – Lookups (1/5) // ask node n to find the successor of id procedure n.findSuccessor(id) if id (n, successor] then return successor else n’ := closestPreceedingNode(id) return try n’.findSuccessor(id) catch failure of n’ then mark n’ in finger[.] as failed n.findSuccessor(id) // search locally for the highest predecessor of id procedure closestPreceedingNode(id) for i = m downto 1 do if finger[i].node is alive and finger[i] (n, id) then return finger[i] end return n 5/25/2017 EP2400 Network Algorithms, Seif Haridi 54 Variations of Chord DKS Chord# 5/25/2017 EP2400 Network Algorithms, Seif Haridi 55 DKS Routing Generalization of Chord to provide arbitrary arity Provide logk(n) hops per lookup k being a configurable parameter n being the number of nodes Instead of only log2(n) 5/25/2017 EP2400 Network Algorithms, Seif Haridi 56 Achieving logk(n) lookup Each node logk(N)=L levels, N=kL Each level contains k intervals, Example, k=4, N=64 (43), node 0 0 4 8 Interval 3 Interval 0 Node 0 I0 I1 I2 I3 12 Level 1 0…15 16…31 32…47 48…63 48 16 Interval 2 Interval 1 5/25/2017 EP2400 Network Algorithms, Seif Haridi 32 57 Achieving logk(n) lookup Each node logk(N) levels, N=kL Each level contains k intervals, Example, k=4, N=64 (43), node 0 0 4 Interval 0 8 Node 0 Interval 1 I0 I1 I2 I3 12 Level 1 0…15 16…31 32…47 48…63 Interval 2 48 Interval 3 5/25/2017 16 Level 2 EP2400 Network Algorithms, Seif Haridi 32 0…3 4…7 8…11 12…15 58 Achieving logk(n) lookup Each node logk(N) levels, N=kL Each level contains k intervals, Example, k=4, N=64 (43), node 0 0 4 8 Node 0 I0 I1 I2 I3 12 Level 1 0…15 16…31 32…47 48…63 48 16 5/25/2017 Level 2 0…3 4…7 Level 3 0 1 EP2400 Network Algorithms, Seif Haridi 32 8…11 12…15 2 3 59 Arity is Important Maximum number of hops can be configured k N 1 r 1 r log k (N ) log 1 (N ) log 1 N Nr N r r r Example, a 2-hop system k log 5/25/2017 N N (N ) 2 EP2400 Network Algorithms, Seif Haridi 60 Chord # The routing table has exponentially increasing pointers on the node space and NOT the identifier space (skip-list like structure) 5/25/2017 EP2400 Network Algorithms, Seif Haridi 61 Routing Table of Chord# Building the routing table log2N pointers exponentially spaced pointers Chord# 5/25/2017 EP2400 Network Algorithms, Seif Haridi 62 Chord vs. Chord# Good for load balancing 5/25/2017 EP2400 Network Algorithms, Seif Haridi 63