Peer-to-Peer Structured Overlay Networks
Antonino Virgillito

Background: Peer-to-Peer Systems
• distribution
• symmetry (communication, node roles)
• decentralized control
• self-organization
• dynamicity

Data Lookup in P2P Systems
• Data items are spread over a large number of nodes
• Which node stores which data item?
• A lookup mechanism is needed
  – Centralized directory -> bottleneck/single point of failure
  – Query flooding -> scalability concerns
  – Need more structure!

More Issues
• Organize and maintain the overlay network
  – node arrivals
  – node failures
• Resource allocation/load balancing
• Resource location
• Network proximity routing

What is a Distributed Hash Table?
• Exactly that
• A service, distributed over multiple machines, with hash table semantics
  – put(key, value), value = get(key)
• Designed to work in a peer-to-peer (P2P) environment
  – No central control
  – Nodes under different administrative control
• But of course it can also operate in an “infrastructure” sense

What is a DHT?
• Hash table semantics: put(key, value), value = get(key)
• Key is a single flat string
• Limited semantics compared to keyword search
• put() causes the value to be stored at one (or more) peer(s)
• get() retrieves the value from a peer
• put() and get() are accomplished with unicast routed messages
  – In other words, it scales
• Other API calls support the application, e.g. notification when neighbors come and go

Distributed Hash Tables (DHT)
[Figure: P2P overlay network storing pairs (k1,v1) to (k6,v6) across its nodes; operations put(k,v) and get(k)]
• The p2p overlay maps keys to nodes
• Completely decentralized and self-organizing
• Robust, scalable

Popular DHTs
• Tapestry (Berkeley)
  – Based on Plaxton trees, similar to hypercube routing
  – The first DHT
  – Complex and hard to maintain (hard to understand too!)
• CAN (ACIRI), Chord (MIT), and Pastry (Rice/MSR Cambridge)
  – Second wave of DHTs (contemporary with, and independent of, each other)

DHT Basics
• Node IDs can be mapped to the hash key space
• Given a hash key as a “destination address”, you can route through the network to a given node
• Routing always reaches the same node, no matter where you start from
• Requires no centralized control (completely distributed)
• Small per-node state, independent of the number of nodes in the system (scalable)
• Nodes can route around failures (fault-tolerant)

Things to Look At
• What is the structure?
• How does routing work in the structure?
• How does it deal with node joins and departures (structure maintenance)?
• How does it scale?
• How does it deal with locality?
• What are the security issues?

The Chord Approach
• Consistent hashing
• Logical ring
• Finger pointers

The Chord Protocol
• Provides:
  – A mapping successor: key -> node
  – To look up key K, go to node successor(K)
• successor is defined using consistent hashing:
  – Keys are hashed
  – Nodes are hashed
  – Both keys and nodes hash into the same (circular) identifier space
  – successor(K) = first node with hash ID equal to or greater than hash(K)

Example: The Logical Ring
[Figure: identifier ring with m = 3, nodes 0, 1, 3 and keys 1, 2, 6: successor(1) = 1, successor(2) = 3, successor(6) = 0]
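As a concrete illustration (not part of the original slides), the successor mapping can be sketched in a few lines of Python, using the ring from the example above (m = 3, nodes 0, 1, 3):

    from bisect import bisect_left

    M_BITS = 3                 # identifier space of size 2**m = 8
    NODES = sorted([0, 1, 3])  # node IDs from the Logical Ring example

    def successor(key_id, nodes=NODES):
        # First node whose ID is equal to or greater than the key's ID,
        # wrapping around the circular identifier space
        key_id %= 1 << M_BITS
        i = bisect_left(nodes, key_id)
        return nodes[i] if i < len(nodes) else nodes[0]

    # Reproduces the example: keys 1, 2, 6 are stored at nodes 1, 3, 0
    assert [successor(k) for k in (1, 2, 6)] == [1, 3, 0]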
Consistent Hashing [Karger et al. ’97]
• Some nice properties:
  – Smoothness: minimal key movement on node join/leave
  – Load balancing: keys equitably distributed over nodes

Mapping Details
• Range of the hash function
  – Circular ID space modulo 2^m
  – Compute a 160-bit SHA-1 hash and truncate it to m bits
  – Chance of collision is negligible if m is large enough
• Deterministic, but hard for an adversary to subvert

Chord State
• Successor/predecessor in the ring
• Finger pointers
  – n.finger[i] = successor(n + 2^(i-1))
  – Each node knows more about the portion of the circle close to it!

Example: Finger Tables
[Figure: finger tables of the example ring]

Chord: Routing Protocol
• Notation: n.foo() stands for a remote call to node n
• A set of nodes progressively closer to id is contacted remotely
• Each node is queried for the node it knows about that is closest to id
• The process stops when a node is found whose successor is equal to or greater than id

Example: Chord Routing
[Figure: finger pointers for node 1]

Lookup Complexity
• With high probability: O(log N)
• Proof intuition:
  – Let p be the successor of the targeted key; the distance to p shrinks by at least half in each step
  – In m steps p would be reached
  – Stronger claim: in O(log N) steps the distance drops to at most 2^m / N; thereafter even linear advance suffices, giving O(log N) lookup complexity

Chord Invariants
• Every key in the network can be located as long as the following invariants are preserved after joins and leaves:
  – Each node's successor is correctly maintained
  – For every key k, node successor(k) is responsible for k

Chord: Node Joins
• The new node B learns of at least one existing node A via external means
• B asks A to look up its finger-table information
  – Given that B's hash ID is b, A looks up B.finger[i] = successor(b + 2^(i-1)), unless the interval is already included in finger[i-1]
  – B stores all finger information and sets up its predecessor/successor pointers

Node Joins (contd.)
• Update the finger table of every existing node p such that:
  1. p precedes b by at least 2^(i-1)
  2. the i-th finger of node p succeeds b
  – Start from p = predecessor(b - 2^(i-1)) and proceed counter-clockwise while condition 2. holds
• Transferring keys:
  – Only from successor(b) to b
  – A notification must be sent to the application

Example: finger table update
[Figure: node 6 joins]

Example: transferring keys
[Figure: node 1 leaves]

Concurrent Joins/Leaves
• A stabilization protocol is needed to guard against inconsistency
• Note: incorrect finger pointers may only increase latency, but incorrect successor pointers may cause lookup failure!
• Nodes periodically run the stabilization protocol
  – Find the successor's predecessor
  – Repair if this isn't the node itself
• The same algorithm is also run at join

Example: node 25 joins
Example: node 28 joins before 20 stabilizes (1)
Example: node 28 joins before 20 stabilizes (2)
[Figures: stabilization examples]
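The lookup procedure above can be rendered as a short Python sketch. This is illustrative only: the “remote calls” of the n.foo() notation become plain local calls, and ChordNode and the interval helpers are assumed names, not part of the Chord protocol itself.

    def between_open(x, a, b):
        # x strictly inside the circular interval (a, b);
        # a == b denotes the whole ring except a itself
        if a == b:
            return x != a
        return (a < x < b) if a < b else (x > a or x < b)

    def between_right_closed(x, a, b):
        # x in the circular interval (a, b]; a == b spans the whole ring
        if a == b:
            return True
        return (a < x <= b) if a < b else (x > a or x <= b)

    class ChordNode:
        def __init__(self, node_id):
            self.id = node_id
            self.successor = None   # ChordNode
            self.finger = []        # finger[i] = successor(id + 2**i), 0-based

        def closest_preceding_finger(self, key_id):
            # Most distant known node that still precedes key_id on the ring
            for f in reversed(self.finger):
                if between_open(f.id, self.id, key_id):
                    return f
            return self

        def find_successor(self, key_id):
            node = self
            while not between_right_closed(key_id, node.id, node.successor.id):
                nxt = node.closest_preceding_finger(key_id)
                if nxt is node:     # no closer finger known; stop early
                    break
                node = nxt
            return node.successor

    # Demo on the example ring (m = 3, nodes 0, 1, 3)
    n0, n1, n3 = ChordNode(0), ChordNode(1), ChordNode(3)
    n0.successor, n1.successor, n3.successor = n1, n3, n0
    n0.finger = [n1, n3, n0]   # successor(1), successor(2), successor(4)
    n1.finger = [n3, n3, n0]   # successor(2), successor(3), successor(5)
    n3.finger = [n0, n0, n0]   # successor(4), successor(5), successor(7)
    assert n0.find_successor(6).id == 0
    assert n3.find_successor(2).id == 3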
CAN
• Virtual d-dimensional Cartesian coordinate system on a d-torus
  – Example: 2-d, [0,1] x [0,1]
• Dynamically partitioned among all nodes
• A pair (K,V) is stored by mapping key K to a point P in the space with a uniform hash function and storing (K,V) at the node whose zone contains P
• An entry (K,V) is retrieved by applying the same hash function to map K to P and fetching the entry from the node whose zone contains P
  – If P is not contained in the zone of the requesting node or its neighboring zones, the request is routed via the neighbor node whose zone is nearest P

Routing in a CAN
• Follow the straight-line path through the Cartesian space from the source to the destination coordinates
• Each node maintains a table with the IP address and virtual coordinate zone of each of its neighbors
• Greedy routing: forward to the neighbor closest to the destination
• For a d-dimensional space partitioned into n equal zones, nodes maintain 2d neighbors
  – Average routing path length: (d/4) · n^(1/d)

CAN Construction
• The joining node locates a bootstrap node using the CAN DNS entry
  – The bootstrap node provides the IP addresses of random member nodes
• The joining node sends a JOIN request to a random point P in the Cartesian space
• The node whose zone contains P splits the zone and allocates “half” of it to the joining node
• The (K,V) pairs falling in the allocated “half” are transferred to the joining node
• The joining node learns its neighbor set from the previous zone occupant
  – The previous occupant updates its own neighbor set

Departure, Recovery and Maintenance
• Graceful departure: the node hands over its zone and its (K,V) pairs to a neighbor
• Network failure: unreachable nodes trigger an immediate takeover algorithm that allocates the failed node's zone to a neighbor
  – Failures are detected via the lack of periodic refresh messages
  – Each neighbor starts a takeover timer initialized in proportion to its own zone volume
  – On expiry, it sends a TAKEOVER message containing its zone volume to all of the failed node's neighbors
  – A node that receives a TAKEOVER reporting a smaller volume kills its timer; otherwise it replies with its own TAKEOVER message
  – The neighbors thus agree on the live node with the smallest volume
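Before moving on, here is a minimal Python sketch of CAN's greedy forwarding. It is an illustration under simplifying assumptions: Zone, CanNode, and torus_distance are invented names, and the "neighbor closest to the destination" rule is approximated by comparing zone centers.

    import math

    class Zone:
        # Axis-aligned box [lo_i, hi_i) per dimension in the unit d-torus
        def __init__(self, lo, hi):
            self.lo, self.hi = lo, hi
        def contains(self, p):
            return all(l <= x < h for x, l, h in zip(p, self.lo, self.hi))
        def center(self):
            return [(l + h) / 2 for l, h in zip(self.lo, self.hi)]

    def torus_distance(p, q):
        # Euclidean distance on the unit torus (each coordinate wraps mod 1)
        return math.sqrt(sum(min(abs(a - b), 1 - abs(a - b)) ** 2
                             for a, b in zip(p, q)))

    class CanNode:
        def __init__(self, zone):
            self.zone = zone
            self.neighbors = []   # CanNodes with abutting zones

        def route(self, point):
            # Greedily forward towards the node whose zone contains point
            # (assumes a well-formed neighbor graph, so the loop terminates)
            node = self
            while not node.zone.contains(point):
                node = min(node.neighbors,
                           key=lambda n: torus_distance(n.zone.center(), point))
            return node

    # Two nodes splitting the unit square along x
    left  = CanNode(Zone([0.0, 0.0], [0.5, 1.0]))
    right = CanNode(Zone([0.5, 0.0], [1.0, 1.0]))
    left.neighbors, right.neighbors = [right], [left]
    assert left.route([0.9, 0.2]) is right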
Pastry
• Generic p2p location and routing substrate
• Self-organizing overlay network
• Lookup/insert of an object in < log_16 N routing steps (expected)
• O(log N) per-node state
• Network proximity routing

Pastry: Object Distribution
[Figure: 128-bit circular id space, 0 to 2^128 - 1, with consistent hashing; nodeIds and objIds are uniformly random]
• Invariant: the node with the numerically closest nodeId maintains the object

Pastry: Object Insertion/Lookup
[Figure: Route(X) on the circular id space]
• A message with key X is routed to the live node with the nodeId closest to X
• Problem: a complete routing table is not feasible

Pastry: Routing Table (node 65a1fc)
[Figure: routing table of node 65a1fc with log_16 N rows; row l holds one entry per hexadecimal value of the (l+1)-th digit, each pointing to a node that shares the first l digits with 65a1fc (row 0: 0x to fx, row 1: 60x to 6fx, row 2: 650x to 65fx, row 3: 65a0x to 65afx)]

Pastry: Leaf Sets
• Each node maintains the IP addresses of the nodes with the L/2 numerically closest larger and the L/2 numerically closest smaller nodeIds
• Used for:
  – routing efficiency/robustness
  – fault detection (keep-alive)
  – application-specific local coordination

Pastry: Routing Procedure
    if the destination D is within the range of our leaf set:
        forward to the numerically closest leaf set member
    else:
        let l = length of the prefix shared with D
        let d = value of the l-th digit of D's address
        if the routing table entry R[l][d] exists:
            forward to R[l][d]
        else:
            forward to a known node that
            (a) shares at least as long a prefix with D, and
            (b) is numerically closer to D than this node

Pastry: Routing Example
[Figure: Route(d46a1c) from node 65a1fc, resolving a longer prefix of the key at each hop; nodes d13da3, d4213f, d462ba, d467c4, d471f1 appear in the figure]

Properties
• log_16 N steps
• O(log N) state

Pastry: Performance
• Integrity of overlay message delivery:
  – guaranteed unless L/2 nodes with adjacent nodeIds fail simultaneously
• Number of routing hops:
  – No failures: < log_16 N expected, 128/b + 1 maximum (b = 4 bits per digit)
  – During failure recovery: O(N) worst case, average case much better

Pastry Join
• X = new node, A = bootstrap node, Z = node numerically nearest to X
• A finds Z for X
• In the process, A, Z, and all nodes on the path send their state tables to X
• X settles on its own table
  – possibly after contacting other nodes
• X tells everyone who needs to know about it

Pastry Leave
• Noticed by leaf set neighbors when the leaving node doesn't respond
  – The neighbors ask the highest and lowest nodes in their leaf sets for a new leaf set
• Noticed by routing neighbors when forwarding a message fails
  – The message can immediately be routed via another neighbor
  – The stale entry is fixed by asking another neighbor in the same row for its entry
  – If this fails, ask a node one row up
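As a closing illustration, the routing procedure above can be sketched in Python. This is not Pastry's actual implementation: PastryNode, leaf_set, and table are assumed stand-ins for a node's real state, and nodeIds are equal-length hex strings.

    def shared_prefix_len(a, b):
        # Number of leading hex digits the two ids share
        n = 0
        for x, y in zip(a, b):
            if x != y:
                break
            n += 1
        return n

    def numeric(node_id):
        return int(node_id, 16)

    class PastryNode:
        def __init__(self, node_id, leaf_set, routing_table):
            self.id = node_id
            self.leaf_set = leaf_set   # nearby nodeIds (hex strings)
            self.table = routing_table # table[row][digit] -> nodeId or None

        def next_hop(self, dest):
            # One routing step; returns the nodeId to forward to
            # 1. Destination within leaf set range: go straight to the
            #    numerically closest member (possibly ourselves)
            members = self.leaf_set + [self.id]
            lo, hi = min(map(numeric, members)), max(map(numeric, members))
            if lo <= numeric(dest) <= hi:
                return min(members, key=lambda n: abs(numeric(n) - numeric(dest)))
            # 2. Otherwise use the routing table entry for the first digit
            #    where dest differs from our own id
            l = shared_prefix_len(self.id, dest)
            d = int(dest[l], 16)
            if self.table[l][d] is not None:
                return self.table[l][d]
            # 3. Rare case: fall back to any known node with an equally long
            #    shared prefix that is numerically closer to dest than we are
            known = self.leaf_set + [e for row in self.table for e in row if e]
            candidates = [n for n in known
                          if shared_prefix_len(n, dest) >= l
                          and abs(numeric(n) - numeric(dest))
                              < abs(numeric(self.id) - numeric(dest))]
            return min(candidates, key=lambda n: abs(numeric(n) - numeric(dest)),
                       default=self.id)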