Self-configuring IP-over-P2P Overlays: Interconnecting Virtual Machines for Wide-area Computing
Renato Figueiredo, Arijit Ganguly, Abhishek Agrawal, P. Oscar Boykin
Advanced Computing and Information Systems (ACIS), University of Florida
Outline
• What's in a title
  • Virtual machines for Grid computing
  • Virtual networks
  • Use cases
• IPOP – IP-over-P2P overlays
  • Overlay routing
  • NAT traversal and direct connections
• Experiments & analysis
  • WOW – wide-area overlay network of virtual workstations
Wide-area, Grid computing
• "Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources" [1]

[1] "The Anatomy of the Grid: Enabling Scalable Virtual Organizations", I. Foster, C. Kesselman, S. Tuecke. International J. Supercomputer Applications, 15(3), 2001
Resource sharing
• Traditional solutions: evolved from centrally-administered domains
  • Multi-task operating systems
  • User accounts
  • File systems
  • Functionality available for reuse
• However, Grids span administrative domains
Sharing – owner's perspective
Wishes to provide resource cycles to the Grid:
✓ User "A" is trusted and uses an environment common to my cluster
✗ User "B" is not to be trusted
  • May compromise the resource and other users
✗ User "C" has different O/S or application needs?
  • Administrative overhead
  • May not be possible to support "C" without dedicating a resource or interfering with other users
(Slide provided by R. Figueiredo)
Sharing – user's perspective
Wishes to use cycles from a Grid:
✓ Develops apps using standard Grid interfaces, and trusts users who share resource A
✗ Has a grid-unaware application?
  • Provider B may not support the environment expected by the application: O/S, libraries, packages, …
✗ Does not trust others using resource C?
  • If another user compromises C's O/S, they also compromise the user's work
(Slide provided by R. Figueiredo)
An alternative approach: System Virtual Machines
Examples: VMware, Xen, MS Virtual Server, …
• "A virtual machine is taken to be an efficient, isolated, duplicate copy of the real machine" [2]
• "A statistically dominant subset of the virtual processor's instructions is executed directly by the real processor" [2]
• "…transforms the single machine interface into the illusion of many" [3]
• "Any program run under the VM has an effect identical with that demonstrated if the program had been run in the original machine directly" [2]

[2] "Formal Requirements for Virtualizable Third-Generation Architectures", G. Popek and R. Goldberg, Communications of the ACM, 17(7), July 1974
[3] "Survey of Virtual Machine Research", R. Goldberg, IEEE Computer, June 1974
(Slide provided by R. Figueiredo)
Networking VMs
• A system VM isolates the user execution environment from the host
  • Great! Now how do I access it?
• Users want full bi-directional TCP/IP connectivity
  • Facilitates programming and deployment
  • But cross-domain communication is subject to NAT and firewall policies
• Providers want to isolate traffic
  • Users with admin privileges inside a VM still pose security problems: viruses, DoS
Virtual networking
• Isolation: dealt with similarly to VMs
  • Multiple, isolated virtual networks time-share the physical network
• Key technique: tunneling
• Related work
  • Generic: VPNs
  • Applied to Grid computing:
    • VNET (P. Dinda, Northwestern U.)
    • Violin (D. Xu, Purdue U.)
    • ViNe (J. Fortes, U. Florida)
VMs + Virtual Networking
• Resource aggregation
  • Consistent view of the resources
  • Same VM configuration → a homogeneous cluster overlaying heterogeneous nodes
• Leverage LAN applications and middleware
• Enable VM migration across subnets
  • Virtual IP decoupled from physical IP
  • For load balancing or fault tolerance
Our approach – IP-over-P2P
• The virtual network should be self-configured
  • Avoid the administrative overhead of VPNs
  • Including cross-domain NAT traversal
• The virtual network should be isolated
  • Virtual private address space decoupled from the Internet address space
Use cases
• VM "appliances": define once, instantiate many
• Example: compute server appliance
  • Condor master, worker, submit
  • Role defined through configuration when it joins the virtual network
  • A growing shared base of compute resources
• Example: client appliance
  • Contains the software needed to submit jobs to a community's gateway, mount file systems, etc.
Motivations for P2P
• Scalability
  • The overhead of adding a new node is constant and independent of the size of the network
• Resiliency
  • Robust P2P routing
• Accessibility
  • Ability to traverse NATs/firewalls
• Traffic-pattern-based topology adaptation of the P2P network
  • Self-organized
  • Decentralized
IPOP (IP-over-P2P)
• IP tunneling over P2P overlay networks
• Virtual IP packet capture and injection through a tap interface (see the sketch below)
• Builds upon the Brunet library
  • UDP, TCP
  • Robust UDP/TCP connection support
  • Hole-punching à la STUN for NAT traversal
  • C# – rapid prototyping
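IPOP's capture path starts with reading Ethernet frames from the tap device. Below is a minimal, hypothetical C# sketch of that step, assuming a Linux host, root privileges, and an interface named "tap0"; the ioctl constants come from <linux/if_tun.h>, and handing the frame to the overlay is elided. This is a sketch, not IPOP's actual code.

    using System;
    using System.IO;
    using System.Runtime.InteropServices;

    class TapCapture
    {
        [DllImport("libc", SetLastError = true)]
        static extern int ioctl(int fd, uint request, byte[] arg);

        const uint TUNSETIFF = 0x400454ca;           // from <linux/if_tun.h>
        const short IFF_TAP = 0x0002, IFF_NO_PI = 0x1000;

        static void Main()
        {
            // Open the tun/tap clone device and attach it to "tap0".
            var fs = new FileStream("/dev/net/tun", FileMode.Open, FileAccess.ReadWrite);
            var ifr = new byte[40];                  // struct ifreq: 16-byte name, then flags
            System.Text.Encoding.ASCII.GetBytes("tap0").CopyTo(ifr, 0);
            BitConverter.GetBytes((short)(IFF_TAP | IFF_NO_PI)).CopyTo(ifr, 16);
            int fd = fs.SafeFileHandle.DangerousGetHandle().ToInt32();
            if (ioctl(fd, TUNSETIFF, ifr) < 0)
                throw new IOException("TUNSETIFF failed (are you root?)");

            var frame = new byte[2048];              // one Ethernet frame per read
            while (true)
            {
                int n = fs.Read(frame, 0, frame.Length);
                // The IP payload begins after the 14-byte Ethernet header;
                // an IPOP-like node would now encapsulate and route it.
                Console.WriteLine("captured {0}-byte frame", n);
            }
        }
    }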
Brunet P2P architecture
• Ring-structured overlay network topology
• Overlay links
  • Near: neighbor connections along the ring
  • Far: connections along chords
• Greedy packet routing
  • O(log2(n)) overlay-hop routing (a toy sketch follows)
(Figure: a multi-hop overlay path between Node A and Node B on a 900-node Brunet ring over PlanetLab)
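To make the greedy routing rule concrete, here is a toy C# sketch (not Brunet's actual API): each node forwards a packet to whichever of its near or far neighbors minimizes the remaining ring distance to the 160-bit destination address, and routing terminates when no neighbor is closer than the current node.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Numerics;

    static class GreedyRing
    {
        static readonly BigInteger RingSize = BigInteger.Pow(2, 160);

        // Shortest distance between two addresses, going either way around the ring.
        static BigInteger Distance(BigInteger a, BigInteger b)
        {
            BigInteger d = ((b - a) % RingSize + RingSize) % RingSize;
            return BigInteger.Min(d, RingSize - d);
        }

        // Greedy step: hand the packet to the neighbor (near or far) closest to
        // the destination; return null when this node is itself the closest.
        static BigInteger? NextHop(BigInteger self, IEnumerable<BigInteger> neighbors,
                                   BigInteger dest)
        {
            BigInteger best = neighbors.OrderBy(n => Distance(n, dest)).First();
            return Distance(best, dest) < Distance(self, dest) ? best : (BigInteger?)null;
        }
    }

With near connections plus chord-like far connections, repeated application of NextHop reaches the closest node in O(log2(n)) overlay hops, as the slide states.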
IPOP algorithm
• At the sender (sketched below):
  • Read an Ethernet packet from the tap interface
  • Retrieve the IP payload
  • Compute the Brunet Id 'x' = SHA1(IP destination)
  • Encapsulate the IP packet inside a Brunet packet
  • Send the Brunet packet to 'x'
• At IPOP node 'x', on receiving a packet:
  • Check if the Brunet destination Id == 'x'
  • Extract the IP packet
  • Wrap it inside an Ethernet packet addressed to the Ethernet address of the tap
  • Write the packet to the tap device
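The sender-side steps map directly to code. Below is a minimal sketch assuming IPv4 frames from the tap device; the Brunet packet construction and the overlay send are elided, and the type and method names are illustrative rather than IPOP's real ones.

    using System;
    using System.Net;
    using System.Security.Cryptography;

    static class IpopSender
    {
        // 'x' = SHA1(IP destination): a deterministic 160-bit overlay address
        // that any node can compute locally, with no lookup service needed.
        static byte[] DestToBrunetId(IPAddress virtualDest)
        {
            using (var sha1 = SHA1.Create())
                return sha1.ComputeHash(virtualDest.GetAddressBytes());
        }

        static void HandleFrame(byte[] ethernetFrame)
        {
            // Retrieve the IP payload: skip the 14-byte Ethernet header.
            var ip = new byte[ethernetFrame.Length - 14];
            Array.Copy(ethernetFrame, 14, ip, 0, ip.Length);

            // The IPv4 destination address sits at bytes 16..19 of the IP header.
            var dest = new IPAddress(new[] { ip[16], ip[17], ip[18], ip[19] });
            byte[] x = DestToBrunetId(dest);
            // Encapsulate 'ip' in a Brunet packet and route it to address 'x'
            // (see the greedy routing sketch above).
        }
    }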
Packet capture and routing
(Figure: virtual IP packets are captured at the sender's tap device, encapsulated, routed through the P2P overlay, and injected at the destination node's tap device)
Distributed topology adaptation
• O(log2(n)) overlay-hop routing is used to selectively "bootstrap" 1-hop routing
  • Monitor outbound traffic at each IPOP node
  • Set up a direct overlay link between communicating nodes based on inspection of IP overlay traffic
  • Heuristic: track whether the number of packets during a time interval t exceeds a threshold T (see the sketch below)
• Once a 1-hop "shortcut" is established
  • No dependency on other P2P nodes for communication
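A sketch of that heuristic follows, with assumed values for t and T (the talk does not give the actual constants); OnThreshold stands in for issuing the Connect-to-me request described on the next slides.

    using System;
    using System.Collections.Generic;
    using System.Net;

    class ShortcutTrigger
    {
        static readonly TimeSpan Interval = TimeSpan.FromSeconds(10); // t (assumed)
        const int Threshold = 20;                                     // T (assumed)

        readonly Dictionary<IPAddress, int> counts = new Dictionary<IPAddress, int>();
        DateTime windowStart = DateTime.UtcNow;

        public event Action<IPAddress> OnThreshold;  // hook: send a CTM request

        // Called for every outbound virtual IP packet this node sends.
        public void RecordPacket(IPAddress dest)
        {
            if (DateTime.UtcNow - windowStart > Interval)
            {
                counts.Clear();                      // start a new interval t
                windowStart = DateTime.UtcNow;
            }
            int c;
            counts.TryGetValue(dest, out c);
            counts[dest] = ++c;
            if (c == Threshold && OnThreshold != null)
                OnThreshold(dest);                   // fires once per interval
        }
    }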
Establishing shortcuts (1)
(Figure: the path taken by a CTM request from Node A through the overlay to Node B)
- A communicates with B
- Traffic inspection triggers a request to create a shortcut
- Connect-to-me (CTM) Brunet protocol message
- "A" tells "B" its address(es):
  - "A" knows its private address
  - "A" learns the public IP/port of its NAT translation when it joins the overlay; at least one node is on the public network
Establishing shortcuts (2)
(Figure: the path taken by the CTM reply from Node B back to Node A)
- "B" sends a CTM reply, routed through the overlay
- "B" tells "A" its address(es)
- "B" initiates the linking protocol by attempting to connect to "A" directly
Establishing shortcuts (3)
(Figure: Node A gets the CTM reply and initiates linking with Node B)
- "A" gets the CTM reply and initiates the linking protocol with "B"
- B's linking protocol message to A pokes a hole in B's NAT
- A's linking protocol message to B pokes a hole in A's NAT (exponential-backoff retries deal with packets potentially dropped by NATs; see the sketch below)
- After the holes are poked, the CTM protocol establishes the shortcut
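The linking step amounts to UDP hole punching with retries. Here is a minimal sketch under assumed details (a "LINK" probe payload, six retries starting at 100 ms); both sides would run this concurrently against the peer endpoint learned from the CTM exchange.

    using System;
    using System.Net;
    using System.Net.Sockets;

    static class HolePunch
    {
        // Returns true once a packet from the peer arrives, i.e. both NAT
        // mappings are open; false means fall back to multi-hop overlay routing.
        static bool TryPunch(UdpClient sock, IPEndPoint peer, int maxTries = 6)
        {
            byte[] probe = System.Text.Encoding.ASCII.GetBytes("LINK");
            int waitMs = 100;
            for (int i = 0; i < maxTries; i++)
            {
                sock.Send(probe, probe.Length, peer);   // pokes a hole in our NAT
                sock.Client.ReceiveTimeout = waitMs;
                try
                {
                    IPEndPoint from = null;
                    sock.Receive(ref from);
                    if (from.Equals(peer)) return true; // direct path established
                }
                catch (SocketException) { }             // timed out; retry
                waitMs *= 2;                            // exponential backoff
            }
            return false;
        }
    }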
Experiments
• Latency and bandwidth overheads
• Delays incurred by a new node
  • To become fully routable
  • To set up a direct overlay link
• High-throughput computing
  • PBS-scheduled sequential application: Meme
• Loosely-coupled parallel application
  • PVM parallel application: fastDNAml
• VM migration
Experimental Setup
Wide-area Overlay of virtual Workstations (WOW): 34 compute nodes, with 118 PlanetLab nodes as P2P routers
• Hosts: 2.4GHz Xeon, Linux 2.4.20, VMware GSX
• Host: 1.3GHz P-III, Linux 2.4.21, VMPlayer
• Host: 1.7GHz P4, Windows XP SP2, VMPlayer
Latency, bandwidth (single IPOP link)
• 6–11 ms latency overhead per packet for ICMP ping
• 1.9 MB/s ttcp LAN bandwidth (20% of physical); 1.2 MB/s WAN bandwidth (80% of physical)
• High overhead in the LAN due to:
  • User-level overlay; double traversal of the kernel stack
  • C# runtime
  • (Other user-level overlays – ViNe, VNET, Violin – report few-ms latency overheads)
• Wide-area overhead is amortized over long-latency links
Topology adaptation
A VM joins IPOP and starts pinging another VM (UFL–NWU).
• Time to become P2P routable: a few seconds
• <2% packets dropped @ ICMP sequence 17 (over 100 trials)
• Initial round-trip latency >180 ms under multi-hop overlay routing; @ ICMP sequence 30: <40 ms, once the shortcut is established
(Figures: average ICMP echo round-trip latency, and percentage of packets dropped, versus ICMP sequence number during a WOW node join)
High-throughput computing
Meme: an unmodified sequential application; average execution time 24 s
• 4000 jobs, submitted at 1 job/second
• Setup: PBS head node, NFS server, and WOW worker nodes
Meme execution time histograms
(Figures: histograms of job wall-clock time, 8–88 s, for PBS/Meme with shortcuts enabled and with shortcuts disabled)
• Shortcuts enabled: average job wall-clock = 24.6 s, stdev = 6.6 s (53 jobs/min)
• Shortcuts disabled: average job wall-clock = 32.2 s, stdev = 9.7 s (22 jobs/min)
Loosely-coupled parallel application
fastDNAml: PVM-based; the master keeps a task list and sends tasks to available workers

                                    Node 2      Node 34
Sequential execution time           22272 s     45191 s

                                    15 nodes    34 nodes
Parallel execution time             2439 s      1642 s
Parallel speedup (w.r.t. node 2)    9.1         13.6
Data-intensive parallel application
• LSS application
  • Least-square minimization of a light-scattering spectrum collected from an experiment against Mie theory analytical results
  • MPI parallel application; LAM-MPI using ssh; shared image databases mounted over virtualized NFS (Grid Virtual File System)
VM migration
• A PBS worker node was migrated over the WAN while processing Meme jobs
• Jobs resumed gracefully after the migrated VM autonomously re-joined the WOW
• The virtual IP of the tap device remains the same; the physical IP of eth0 changes
• IPOP restarts; PBS daemons, NFS, and the application need not restart
• Migration of an SSH server during an SCP transfer was also successful
(Figure: job wall-clock time versus PBS job ID during migration of a PBS worker VM: background load is injected on the UFL host, the VM migrates to NWU, and the VM then runs on the unloaded host)
Related Work
• Virtual networking
  • VIOLIN
  • VNET; topology adaptation
  • ViNe
• Internet Indirection Infrastructure (i3)
  • Support for mobility, multicast, anycast
  • Decouples packet sending from receiving
  • Based on the Chord P2P protocol
• IPv6 tunneling
  • IPv6 over UDP (Teredo protocol)
  • IPv6 over P2P (P6P)
Summary
• Target applications
  • High-throughput computing
  • Loosely-coupled parallel applications
  • Collaborative environments
  • In-VIGO; nanoHUB
• On-going work: an open environment for Grid computing
  • Downloadable VM "appliance" image
  • Automatic configuration
Acknowledgments
• In-VIGO team at UFL
• National Science Foundation
  • Middleware Initiative (http://www.nsf-middleware.org)
  • Research Resources Program
  • nCn center
• Resources
  • Peter Dinda (Northwestern University)
  • SURA/SCOOP
• IBM Shared University Research
Questions?
• Send email to renato (at) acis.ufl.edu
• http://byron.acis.ufl.edu/~renato
• IPOP wiki: http://boykin.acis.ufl.edu/wiki/index.php/IPOP
  • Has pointers to an arch repository with the code (it's C#; runs on Mono)
  • More documentation to be added