Peer-to-peer resource discovery for self-organizing virtual private clusters

Renato Figueiredo
Associate Professor
ACIS Lab, Center for Cloud and Autonomic Computing
University of Florida

Vrije Universiteit, March 14th, 2012
Outline

- Peer-to-peer overlays in virtual private clusters
  - Self-configuration, self-healing, self-optimization
  - Framework for communication among nodes
- Virtual machines + virtual networks
  - Isolation, security, encapsulation
  - Also a framework for resource discovery
- Brunet/IPOP peer-to-peer overlay design
- Virtual networking and virtual clusters
- Resource discovery
Peer-to-peer virtual networks

- Focus: software overlays that provide virtual network infrastructure over the existing Internet infrastructure
  - Support unmodified TCP/IP applications and the existing Internet physical infrastructure
- Why virtual?
  - Hide the heterogeneity of the physical network (firewalls, NATs); avoid IPv4 address-space constraints
- Why self-organizing?
  - Autonomous behavior: low management cost compared to typical VPNs
  - Decentralized architecture for scalability and fault tolerance
IP-over-P2P (IPOP) overlays

- Isolation
  - Virtual address space decoupled from the Internet address space
- Self-organization
  - Automatic configuration of routes and topologies
  - Decentralized, structured peer-to-peer (P2P): no global state, no central points of failure
- Mobility: keep the same virtual IP address
- Decentralized NAT traversal
  - No changes to NAT configuration
  - No need for STUN server infrastructure

[IPDPS 2006, Ganguly et al.]; http://ipop-project.org
Use case scenarios

- Collaborative environments based on virtual machines and virtual networks
  - VMs provide isolation
  - Virtual appliances provide software encapsulation
- WOWs: virtual appliance clusters
  - Homogeneous software environment on top of heterogeneous infrastructure
  - Homogeneous virtual network on top of wide-area, NATed environments
Brunet P2P overlay design

- Based on the Brunet library
- Bi-directional ring ordered by 160-bit node IDs
- Structured connections:
  - "Near": with ring neighbours
  - "Far": across the ring

[Figure: bi-directional ring of nodes n1 < n2 < ... < n14; "near" edges connect adjacent nodes, "far" edges span the ring, and a multi-hop path connects n1 and n7]
Overlay edges and routing

- Overlay edges
  - Multiple transports: UDP, TCP, tunnel
  - UDP: decentralized NAT hole-punching
- Greedy routing
  - Send over the edge leading closest to the destination
  - Constant number of edges: k connections per node
  - O((1/k) log^2(n)) overlay hops (Kleinberg small-world network)
- Adaptive shortcut edges
  - 1-hop between frequently communicating nodes
- Autonomous proxy selection if NAT traversal fails
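The greedy routing rule above can be sketched in a few lines. This is an illustrative model only, not the Brunet implementation: node IDs are integers on a 160-bit ring, and `edges` is a hypothetical map from each node to its near/far connections.

```python
# Illustrative sketch of greedy routing on a bi-directional ring of
# 160-bit node IDs: each node forwards to the connected neighbour whose
# ID is closest (in ring distance) to the destination.

ID_SPACE = 2 ** 160

def ring_distance(a, b):
    """Shortest distance between two IDs on the bi-directional ring."""
    d = (b - a) % ID_SPACE
    return min(d, ID_SPACE - d)

def greedy_route(source, dest, edges):
    """Follow greedy hops until no neighbour is closer than the current node.

    `edges` maps each node ID to the IDs of its near/far connections.
    Returns the list of node IDs visited."""
    path = [source]
    current = source
    while current != dest:
        nxt = min(edges[current], key=lambda n: ring_distance(n, dest))
        if ring_distance(nxt, dest) >= ring_distance(current, dest):
            break  # no progress possible: deliver to the closest node reached
        path.append(nxt)
        current = nxt
    return path
```

With only "near" edges this walk takes O(n) hops; the "far" and shortcut edges described above are what bring the hop count down to the O((1/k) log^2(n)) bound.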
Managing virtual IP spaces

- One P2P overlay, multiple IP namespaces
  - IP assignment and routing occur within a namespace
  - Each IPOP namespace is a unique string
- A Distributed Hash Table (DHT) stores the mapping
  - Key = namespace; Value = DHCP configuration (IP range, lease, ...)
  - Key = namespace+IP; Value = IPOPid (160-bit)
- An IPOP node configured with a namespace:
  - Queries the namespace for its DHCP configuration
  - Guesses an IP address at random within the range
  - Attempts to store it in the DHT
- IP-to-P2P address resolution: given namespace+IP, look up the IPOPid
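The guess-and-store allocation above can be sketched as follows. The names (`DhtMock`, `allocate_ip`) and the key format are illustrative, not from the IPOP codebase; an in-memory dict with put-if-absent semantics stands in for the real DHT.

```python
# Sketch of DHT-backed decentralized DHCP: a node claims a random IP in
# the namespace's range by storing namespace+IP in the DHT; a collision
# (key already taken) simply triggers another random guess.
import random

class DhtMock:
    """In-memory stand-in for the distributed hash table."""
    def __init__(self):
        self._store = {}

    def create(self, key, value):
        """Put-if-absent: succeeds only if `key` is unused."""
        if key in self._store:
            return False
        self._store[key] = value
        return True

    def get(self, key):
        return self._store.get(key)

def allocate_ip(dht, namespace, ipop_id, max_tries=50):
    """Claim an IP for `ipop_id`; returns the IP string on success."""
    prefix, lo, hi = dht.get(namespace)   # DHCP config stored under the namespace key
    for _ in range(max_tries):
        ip = f"{prefix}.{random.randint(lo, hi)}"
        if dht.create(f"{namespace}:{ip}", ipop_id):
            return ip                     # this key also serves IP->IPOPid resolution
    raise RuntimeError("address range exhausted")
```

The same `namespace:IP -> IPOPid` entry that enforces uniqueness is then what the resolution step queries to map a virtual IP to a P2P address.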
Autonomic features

- Self-configuration [IPDPS'06, HPDC'06, PCgrid'07]
  - Routing tables using structured P2P links
  - NAT traversal, DHCP over DHT
- Self-optimization [HPDC'06]
  - Direct shortcut connections created/trimmed based on IP traffic inspection, for fast end-to-end IP tunnels
  - Proximity neighbor selection based on network-coordinate estimates, for improved structured routing
- Self-healing [HPDC'08]
  - "Tunnel" edges created to maintain the overlay structure, dealing with routing outages and NATs/firewalls that are not traversable
- VLAN routers, overlay bypass [VTDC09, SC09]
WOW: Virtual Private Clusters

- Plug-and-play deployment of Condor grids
  - High-throughput computing; LAN and WAN
  - Data sharing
- Wide-area virtual clusters, where:
  - Nodes are virtual machines (VMs: e.g., VMware, Xen)
  - The network is virtual: IPOP
- VMs provide: sandboxing; software packaging; decoupling
- The virtual network provides: a virtual private LAN over the WAN
- Condor provides: match-making, reliable scheduling, ... unmodified
Example: The Archer System

[Figure: Archer architecture — self-configuring virtual appliances running on user desktops, voluntary resources, and local resource pools (servers, clusters, desktop labs); Archer seed resources; Archer software and management; community-contributed content (applications, datasets); a web portal with documentation and tutorials; deployment, support, configuration, and troubleshooting]

http://archer-project.org
Example: Archer

- PlanetLab overlay: ~450 nodes, 24/7 on a shared infrastructure
- Archer cluster ramp-up: UFL, NEU, UMN, UTA, FSU
Resource discovery

- Scale up resources in a virtual cluster; self-configuring middleware; fault tolerance
- How can the P2P virtual network be leveraged for flexible, scalable resource discovery?
- Traditional approaches
  - Centralized
    - Multiple central managers (e.g., Condor flocking)
    - Pull-based job discovery/scheduling (e.g., BOINC)
  - Decentralized
    - DHT-based: MAAN, SWORD, Mercury
    - Multi-dimension based: Squid, RD on CAN
Overall approach

- Decentralized resource discovery framework
  - No single point of failure; self-configuring
- Leverages a self-organizing multicast tree
- Query distribution/processing ("Map")
  - Resolves matches "in situ"
  - No periodic resource-attribute updates needed
- Result aggregation ("Reduce")
- Rich query processing capacity
  - Regular-expression matching
  - Appropriate for dynamic attributes
Query propagation

- Self-organizing multicast tree

[Figure: self-organizing multicast tree over the ring A..N; the root delegates ring ranges to its children, e.g. B covers [B,D), D covers [D,G), G covers [G,K), K covers [K,N), and N covers [N,A)]
Modules

- Matchmaking
  - Checks whether a requirement matches the resource status
  - Uses timely local information; appropriate for dynamic attributes
- Aggregation
  - Sorts the requirement-satisfying nodes by a rank value
  - Hierarchical summary of query results at internal nodes
- Implementation
  - Condor ClassAds and condor_startd provide rich query processing
Query processing

Example query: Requirements = (Memory > 1GB); Rank = Memory; return the top-2 nodes by rank value.

[Figure: the query propagates down the multicast tree; each node performs matchmaking against its local state (Node1: 5GB, Node2: 2GB, Node3: 4GB, Node4: 2GB, Node5: 1GB, Node6: 3GB, Node7: 512MB, Node8: 2GB), and partial results are aggregated on the way back up, e.g. {Node3, 4GB} and {Node6, 3GB} merging at an internal node; the root returns {Node1, 5GB} and {Node3, 4GB}]
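The matchmaking-plus-aggregation flow in the example can be sketched as a recursive walk over the tree: each node matches the query against its local attributes ("Map"), and each parent merges its children's results, keeping only the top-k by rank ("Reduce"). The node names and memory values below mirror the slide's example; the tree layout and API are illustrative.

```python
# Sketch of in-situ matchmaking with hierarchical top-k aggregation.
import heapq

def query_tree(node, children, attrs, min_mem_gb, k):
    """Return the top-k (memory, node) matches in `node`'s subtree.

    `children` maps a node to its multicast-tree children; `attrs` maps a
    node to its local memory (GB)."""
    results = []
    if attrs[node] > min_mem_gb:              # matchmaking: timely local state
        results.append((attrs[node], node))
    for child in children.get(node, []):      # query propagation down the tree
        results.extend(query_tree(child, children, attrs, min_mem_gb, k))
    return heapq.nlargest(k, results)         # aggregation: summarize to top-k
```

Because each subtree forwards only its own top-k, internal nodes aggregate bounded-size summaries rather than every match; the global top-k is still exact, since any globally top-ranked node is also top-ranked within its own subtree.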
Dealing with faults

- Nodes join/leave
- Slow nodes
- Remedies: adaptive timeouts, redundant query topologies

[Figure: multicast tree over the ring A..N with delegated ranges [B,D), [D,G), [G,K), [K,N), [N,A)]
Timeouts on multicast tree

- Dynamic timeout based on completion score
- A node determines the timeout from the latency and the fraction of results returned by a child

  WF : user input, WF ≥ 0
  l_i : latency until the i-th result is returned (0 ≤ i ≤ n_c)
  C_i : completion score at the i-th result

  Timeout_i = (l_i × WF) / C_i
Timeouts on multicast tree

- Dynamic timeout based on completion score

[Figure: node A (completion score 1.0, size estimate known) begins the query at time T0; each time a child returns a result, the timeout is recomputed from the elapsed latency and the accumulated completion score — node B returns at T1 giving TO1 = WF × (T1 − T0) / 0.1, node G at T2 giving TO2 = WF × (T2 − T0) / 0.3, node N at T3 giving TO3 = WF × (T3 − T0) / 0.4, and node D at T4 giving TO4 = WF × (T4 − T0) / 0.8]
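The recomputation step is a direct transcription of the rule Timeout_i = l_i × WF / C_i. The function name and the event-list representation below are illustrative, assuming the completion scores accumulate as children return.

```python
# Sketch of the adaptive timeout: on each result arrival, recompute the
# timeout from the elapsed latency and the cumulative completion score.
def adaptive_timeout(wf, t0, arrivals):
    """`arrivals` is a list of (arrival_time, cumulative_completion_score)
    pairs, one per child result. Returns the timeout after each arrival."""
    timeouts = []
    for t_i, c_i in arrivals:
        latency = t_i - t0                 # l_i: time since the query began
        timeouts.append(latency * wf / c_i)  # Timeout_i = l_i * WF / C_i
    return timeouts
```

Dividing by the completion score is what makes the timeout shrink as coverage grows: early on (small C_i) the node waits patiently, while near completion a straggler is cut off quickly.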
Timeouts on multicast tree

- Guarantees a bounded response time
- Reflects the user's input (WaitFactor) and the completion score
- The more results returned, the shorter the waiting time
- Dynamically adjusts to network status
Redundant topology

- Multicast tree generation
  - P_j : an arbitrary parent node
  - C_{i-1}, C_i, C_{i+1} : child nodes of P_j, with C_{i-1} < C_i < C_{i+1}
- Clockwise tree generation: C_i is responsible for [C_i, C_{i+1})
- Counter-clockwise tree generation: C_i is responsible for (C_{i-1}, C_i]
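The two delegation rules can be sketched as follows. This is an illustrative fragment with hypothetical function names; ranges are returned as plain endpoint tuples, with the half-open convention (clockwise [C_i, C_{i+1}), counter-clockwise (C_{i-1}, C_i]) implied by the mode, and ring wrap-around omitted for brevity.

```python
# Sketch of clockwise vs. counter-clockwise range delegation: given a
# parent's children in ring order, assign each child the ring segment it
# must forward the query into.

def clockwise_ranges(children, range_end):
    """Clockwise: child C_i covers [C_i, C_{i+1}); the last child's range
    is closed off by the parent's own range end."""
    bounds = children[1:] + [range_end]
    return {c: (c, b) for c, b in zip(children, bounds)}

def counterclockwise_ranges(children, range_start):
    """Counter-clockwise: child C_i covers (C_{i-1}, C_i]; the first
    child's range starts at the parent's own range start."""
    bounds = [range_start] + children[:-1]
    return {c: (b, c) for c, b in zip(children, bounds)}
```

Because the two modes carve the same ring into different parent-child assignments, propagating a query down both trees gives each ring segment two independent delivery paths.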
Redundant topology

[Figure: two redundant multicast trees built over the same ring A..N — a clockwise tree in which each child C_i covers [C_i, C_{i+1}), e.g. B covers [B,D) and G covers [G,K), and a counter-clockwise tree in which each child covers (C_{i-1}, C_i], e.g. B covers (A,B] and D covers (B,D]; the query is propagated down both trees, so a node failing in one tree is masked by the other]
Redundant topology + timeouts

- Experiments on PlanetLab

[Figure: two cumulative-distribution plots comparing the redundant topology with static timeout against plain MatchTree — (A) query response time, (B) query coverage, shown as a CDF of the missing-region ratio]
Ongoing/future work

- Self-organizing virtual clusters
  - Discover nodes that match a job's ClassAds
  - Self-organize a virtual private namespace and self-configure scheduling middleware (Condor, Hadoop) on demand
- Techniques to bound the scope of queries for improved latency and reduced bandwidth
  - First-fit
  - Sub-region queries
  - Machine learning
Thank you!

For more information and downloads:
- http://ipop-project.org
- http://grid-appliance.org
- http://archer-project.org
- http://futuregrid.org

Acknowledgments
- ACIS P2P group (resource discovery slides: Kyungyong Lee)
- National Science Foundation: CAC center; Archer, FutureGrid and DiRT