Self-configuring IP-over-P2P Overlays: Interconnecting Virtual Machines for Wide-area Computing
Renato Figueiredo, Arijit Ganguly, Abhishek Agrawal, P. Oscar Boykin
Advanced Computing and Information Systems (ACIS), University of Florida

Outline
• What's in a title
  • Virtual machines for Grid computing
  • Virtual networks
  • Use cases
• IPOP – IP-over-P2P overlays
  • Overlay routing
  • NAT traversal and direct connections
• Experiments & analysis
  • WOW – wide-area overlay network of virtual workstations

Wide-area, Grid computing
• "Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources" [1]
[1] "The Anatomy of the Grid: Enabling Scalable Virtual Organizations", I. Foster, C. Kesselman, S. Tuecke. International Journal of Supercomputer Applications, 15(3), 2001.

Resource sharing
• Traditional solutions evolved from centrally-administered domains
  • Multi-task operating systems
  • User accounts
  • File systems
• Functionality available for reuse
• However, Grids span administrative domains

Sharing – owner's perspective
Wishes to provide resource cycles to the Grid:
• ✓ User "A" is trusted and uses an environment common to my cluster
• ✗ User "B" is not to be trusted
  • May compromise the resource or other users
• ✗ User "C" has different O/S or application needs?
  • Administrative overhead
  • May not be possible to support "C" without dedicating a resource or interfering with other users
(Slide provided by R. Figueiredo)

Sharing – user's perspective
Wishes to use cycles from a Grid:
• ✓ Develops apps using standard Grid interfaces, and trusts users who share resource A
• ✗ Has a Grid-unaware application?
  • Provider B may not support the environment expected by the application: O/S, libraries, packages, …
• ✗ Does not trust others using resource C?
  • If another user compromises C's O/S, they also compromise the user's work
(Slide provided by R. Figueiredo)

An alternative approach: System Virtual Machines
Examples: VMware, Xen, MS Virtual Server, …
• "A virtual machine is taken to be an efficient, isolated, duplicate copy of the real machine" [2]
• "A statistically dominant subset of the virtual processor's instructions is executed directly by the real processor" [2]
• "…transforms the single machine interface into the illusion of many" [3]
• "Any program run under the VM has an effect identical with that demonstrated if the program had been run in the original machine directly" [2]
[2] "Formal Requirements for Virtualizable Third-Generation Architectures", G. Popek and R. Goldberg, Communications of the ACM, 17(7), July 1974.
[3] "Survey of Virtual Machine Research", R. Goldberg, IEEE Computer, June 1974.
(Slide provided by R. Figueiredo)

Networking VMs
• A system VM isolates the user's execution environment from the host
  • Great! Now how do I access it?
• Users want full bi-directional TCP/IP connectivity
  • Facilitates programming and deployment
  • But cross-domain communication is subject to NAT and firewall policies
• Providers want to isolate traffic
  • Users with admin privileges inside a VM still pose security problems: viruses, DoS

Virtual networking
• Isolation: dealt with similarly to VMs
  • Multiple, isolated virtual networks time-share the physical network
• Key technique: tunneling
• Related work
  • Generic: VPNs
  • Applied to Grid computing:
    • VNET (P. Dinda, Northwestern U.)
    • Violin (D. Xu, Purdue U.)
    • ViNe (J. Fortes, U. Florida)

VMs + virtual networking
• Resource aggregation
  • Consistent view of the resources
  • Same VM configuration -> a homogeneous cluster overlaying heterogeneous nodes
• Leverage LAN applications and middleware
• Enable VM migration across subnets
  • Virtual IP decoupled from physical IP
  • For load balancing or fault tolerance

Our approach – IP-over-P2P
• The virtual network should be self-configured
  • Avoid the administrative overhead of VPNs
  • Including cross-domain NAT traversal
• The virtual network should be isolated
  • Virtual private address space decoupled from the Internet address space

Use cases
• VM "appliances": define once, instantiate many
• Example: compute server appliance
  • Condor master, worker, submit
  • Role defined through configuration when it joins the virtual network
  • Growing shared base of compute resources
• Example: client appliance
  • Contains the software needed to submit jobs to a community's gateway, mount file systems, etc.

Motivations for P2P
• Scalability
  • The overhead of adding a new node is constant, independent of the size of the network
• Resiliency
  • Robust P2P routing
• Accessibility
  • Ability to traverse NATs/firewalls
• Traffic-pattern-based topology adaptation of the P2P network
  • Self-organized
  • Decentralized

IPOP (IP-over-P2P)
• IP tunneling over P2P overlay networks
• Virtual IP packet capture and injection through the tap interface
• Builds upon the Brunet library
  • Robust UDP and TCP connection support
  • Hole-punching a la STUN for NAT traversal
  • C# – rapid prototyping

Brunet P2P architecture
• Ring-structured overlay network topology
• Overlay links:
  • Near: neighbor connections along the ring
  • Far: connections along chords
• Greedy packet routing
  • O(log²(n)) overlay-hop routing
[Figure: multi-hop overlay path between Node A and Node B; a 900-node Brunet ring over PlanetLab]

IPOP algorithm
• At the sender:
  • Read an Ethernet packet from the tap interface
  • Retrieve the IP payload
  • Compute the Brunet ID x = SHA1(IP destination)
  • Encapsulate the IP packet inside a Brunet packet
  • Send the Brunet packet to x
• At IPOP node x, on receiving a packet:
  • Check that the Brunet destination ID == x
  • Extract the IP packet
  • Wrap it inside an Ethernet packet addressed to the Ethernet address of the tap
  • Write the packet to the tap device

Packet capture and routing
[Figure: virtual IP packets captured from the tap device and routed through the P2P overlay]
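The only computation in this per-packet path is the hash that maps a virtual IP address onto the Brunet ring. Below is a minimal, runnable sketch of that "x = SHA1(IP destination)" step in C# (the language IPOP is written in); the class name and the sample address are made up for illustration and are not the actual IPOP/Brunet API.

```csharp
using System;
using System.Net;
using System.Security.Cryptography;

// Hypothetical illustration of the IPOP sender-side address mapping:
// Brunet ID 'x' = SHA1(IP destination). Not the actual IPOP/Brunet code.
class IpopAddressMapping {
    static byte[] BrunetIdFor(IPAddress virtualIp) {
        using (var sha1 = SHA1.Create())
            return sha1.ComputeHash(virtualIp.GetAddressBytes());
    }

    static void Main() {
        var dest = IPAddress.Parse("10.0.5.7"); // made-up virtual IP
        byte[] x = BrunetIdFor(dest);
        // Every node computes the same 160-bit ring address for this IP,
        // so greedy routing delivers the encapsulated packet toward the
        // node closest to 'x' with no central directory needed.
        Console.WriteLine(BitConverter.ToString(x));
    }
}
```

Because the mapping is deterministic, any node holding a packet for a given virtual IP resolves it to the same point on the ring, which is part of what makes the scheme self-configuring.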
Distributed topology adaptation
• O(log²(n)) overlay-hop routing is used to selectively "bootstrap" 1-hop routing
  • Monitor outbound traffic at each IPOP node
  • Set up a direct overlay link between communicating nodes, based on inspection of the IP overlay traffic
  • Heuristic: track whether the number of packets during a time interval t exceeds a threshold T (a sketch follows the shortcut-establishment slides below)
• Once a 1-hop "shortcut" is established
  • No dependency on other P2P nodes for communication

Establishing shortcuts (1)
[Figure: Node A's CTM request routed across the overlay to Node B]
• A communicates with B
• Traffic inspection triggers a request to create a shortcut
• Connect-to-me (CTM) Brunet protocol message
• "A" tells "B" its address(es):
  • "A" knows its private address
  • "A" learns the public IP/port of its NAT translation when it joins the overlay; at least one node is on the public network

Establishing shortcuts (2)
[Figure: Node B's CTM reply routed back across the overlay]
• "B" sends a CTM reply, routed through the overlay
• "B" tells "A" its address(es)
• "B" initiates the linking protocol by attempting to connect to "A" directly

Establishing shortcuts (3)
[Figure: Node A receives the CTM reply and links directly to Node B]
• "A" gets the CTM reply and initiates the linking protocol with "B"
• B's linking protocol message to A pokes a hole in B's NAT
• A's linking protocol message to B pokes a hole in A's NAT (exponential-backoff retries deal with packets potentially dropped by the NATs)
• After the holes are poked, the CTM protocol establishes the shortcut
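The slides state the trigger heuristic in one line: request a shortcut once more than T packets have been sent to the same destination within an interval t. The C# sketch below shows how such a trigger could be implemented; the class, the parameter values in Main, and the requestShortcut callback are illustrative assumptions rather than IPOP's actual code (in IPOP, the callback's role is played by sending a CTM message through the overlay).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the traffic-inspection heuristic: count outbound
// packets per destination within a window of length t and fire a shortcut
// request when the count exceeds the threshold T.
class ShortcutHeuristic {
    readonly int threshold;                  // T: packets per window
    readonly TimeSpan window;                // t: measurement interval
    readonly Action<string> requestShortcut; // stands in for a CTM request

    readonly Dictionary<string, (DateTime start, int count)> counters =
        new Dictionary<string, (DateTime start, int count)>();

    public ShortcutHeuristic(int threshold, TimeSpan window,
                             Action<string> requestShortcut) {
        this.threshold = threshold;
        this.window = window;
        this.requestShortcut = requestShortcut;
    }

    // Called once per outbound virtual-IP packet.
    public void OnPacketSent(string destVirtualIp) {
        DateTime now = DateTime.UtcNow;
        if (!counters.TryGetValue(destVirtualIp, out var c) ||
            now - c.start > window)
            c = (now, 0); // missing or stale entry: start a fresh window
        c.count++;
        counters[destVirtualIp] = c;
        if (c.count > threshold) {
            requestShortcut(destVirtualIp); // would trigger the CTM exchange
            counters.Remove(destVirtualIp); // reset once a request is made
        }
    }

    static void Main() {
        // Made-up parameters: more than 10 packets in 5 s triggers a request.
        var h = new ShortcutHeuristic(10, TimeSpan.FromSeconds(5),
            ip => Console.WriteLine($"CTM request -> {ip}"));
        for (int i = 0; i < 12; i++) h.OnPacketSent("10.0.5.7");
    }
}
```

The heuristic only needs to be approximate: a spurious trigger costs one extra direct link, while a correct one removes all intermediate overlay nodes from the data path.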
Experiments
• Latency and bandwidth overheads
• Delays incurred by a new node
  • To become fully routable
  • To set up a direct overlay link
• High-throughput computing
  • PBS-scheduled sequential application: Meme
• Loosely-coupled parallel application
  • PVM parallel application: fastDNAml
• VM migration

Experimental setup
• Wide-area Overlay of virtual Workstations (WOW): 34 compute nodes; 118 PlanetLab nodes act as P2P routers
• Host machines:
  • 2.4 GHz Xeon, Linux 2.4.20, VMware GSX
  • 1.3 GHz P-III, Linux 2.4.21, VMPlayer
  • 1.7 GHz P4, Windows XP SP2, VMPlayer

Latency, bandwidth (single IPOP link)
• 6–11 ms latency overhead per packet for ICMP ping
• ttcp bandwidth: 1.9 MB/s on the LAN (20% of physical), 1.2 MB/s on the WAN (80% of physical)
• The high overhead on the LAN is due to:
  • The user-level overlay: double traversal of the kernel stack
  • The C# runtime
  • (Other user-level overlays — ViNe, VNET, Violin — report few-ms latency overheads)
• The wide-area overhead is amortized over long-latency links

Topology adaptation
• A VM joins IPOP and starts pinging another VM (UFL–NWU)
• Time to become P2P-routable: a few seconds; <2% of packets dropped (by ICMP sequence 17)
• Initial round-trip latency >180 ms under multi-hop overlay routing; by ICMP sequence 30, after the shortcut forms, <40 ms
[Figures: round-trip latencies and dropped packets during a WOW node join (UFL–NWU)]

High-throughput computing
• Meme: unmodified sequential application; average execution time 24 s
• 4000 jobs submitted at 1 job/second
• PBS head node, NFS server, WOW worker nodes

Meme execution time histograms
• Shortcuts enabled: average job wall-clock 24.6 s, stdev 6.6 s (53 jobs/min)
• Shortcuts disabled: average job wall-clock 32.2 s, stdev 9.7 s (22 jobs/min)
[Figures: wall-clock time histograms for PBS/Meme with shortcuts enabled and disabled]

Loosely-coupled parallel application
• fastDNAml: PVM-based; the master keeps a task list and sends tasks to available workers
• Sequential execution time: 22272 s on node 2, 45191 s on node 34
• Parallel execution time: 2439 s on 15 nodes (speedup 9.1), 1642 s on 34 nodes (speedup 13.6)
• Parallel speedup is with respect to node 2

Data-intensive parallel application
• LSS application
  • Least-squares minimization of a light-scattering spectrum collected from an experiment against Mie-theory analytical results
  • MPI parallel application (LAM/MPI using ssh); shared image databases mounted over virtualized NFS (Grid Virtual File System)

VM migration
• A PBS worker node was migrated over the WAN while processing Meme jobs
• Jobs resumed gracefully after the migrated VM autonomously re-joined the WOW
• The virtual IP of the tap device remains the same; the physical IP of eth0 changes
• IPOP restarts; PBS daemons, NFS, and the application need not restart
• Migration of an SSH server during an SCP transfer was also successful
[Figure: job wall-clock time vs. PBS job ID during migration of a PBS worker VM; background load is injected on the UFL host, the VM migrates to NWU, then runs on the unloaded host]

Related work
• Virtual networking
  • VIOLIN
  • VNET; topology adaptation
  • ViNe
• Internet Indirection Infrastructure (i3)
  • Support for mobility, multicast, anycast
  • Decouples packet sending from receiving
  • Based on the Chord P2P protocol
• IPv6 tunneling
  • IPv6 over UDP (Teredo protocol)
  • IPv6 over P2P (P6P)

Summary
• Target applications
  • High-throughput computing
  • Loosely-coupled parallel applications
  • Collaborative environments
  • In-VIGO; nanoHUB
• Ongoing work: an open environment for Grid computing
  • Downloadable VM "appliance" image
  • Automatic configuration

Acknowledgments
• In-VIGO team at UFL
• National Science Foundation
  • Middleware Initiative (http://www.nsf-middleware.org)
  • Research Resources Program
• nCn center
• IBM Shared University Research
• Resources
  • Peter Dinda (Northwestern University)
  • SURA/SCOOP

Questions?
• Send email to renato (at) acis.ufl.edu
• http://byron.acis.ufl.edu/~renato
• IPOP wiki: http://boykin.acis.ufl.edu/wiki/index.php/IPOP
  • Has pointers to an arch repository with the code (it's C#, runs on mono)
  • More documentation to be added