INM2007

Live Data Center Migration across WANs:
A Robust Cooperative Context Aware Approach
Kobus Van der Merwe
with
K.K. Ramakrishnan and Prashant Shenoy
Motivation
• Most network-based services/applications involve components hosted in data centers
• Internet:
  – Mail/Web servers, VoIP, IPTV, P2P directory services, etc.
• VPNs:
  – Mail servers, financial/business applications, etc.
• Many of these services require 24x7 availability
• Any downtime is unacceptable
  – At best it inconveniences users; at worst it has major business impact, typically with financial implications
• Recent well-publicized outages: Blackberry, Skype
• Objective of our work:
  – Business continuity in the face of data center outages, both planned (planned maintenance) and unplanned (disaster recovery)
Motivation cont.
Existing solutions to deal with outages are inadequate:
• Local redundancy solutions
  – Component redundancy (hot-swappable), multiple network connections
  – No protection against data center outages
• Existing cross-data-center solutions
  – Instance replication
    • Same content/service available in multiple locations
    • Works well for stateless services (e.g., Web servers)
    • Not for stateful applications
  – Remote replication (either synchronous or asynchronous)
    • Partial solutions
    • Typically deal only with storage
    • Not seamless; involve server downtime, IP address changes, etc.
Our approach
• Basic approach:
  – Seamless live service migration across WANs
    • Including all components: server, data, network
• Cooperative, migration-aware approach
  – Migration manager orchestrates migration across all three subsystems
• In summary:
  – Planned outages
    • Migration of both server and data
      – Live server migration
      – Performed once
    • Atomic switchover of network to complete migration
  – Unplanned outages
    • “Continuous live migration”
      – Server and data continuously replicated to remote site
      – On failure, atomic switchover of network
Challenges
Seamless live server migration across WAN
• LAN-based live server migration:
  – The image of the running virtual server is copied to the new physical server (while the server is still running on the old platform)
  – Server state is synchronized between the two images
  – Migration software switches over to the new server with minimal downtime (tens of milliseconds)
  – New server is exactly the same as the old server (same IP address, network state stays intact, etc.)
  – Storage handled through network attached storage (NAS), e.g., NFS
• LAN-based live server migration is enabled by virtual server platform technologies (Xen, VMware)
• WAN-based server migration
  – Use existing virtual server migration
  – “Management” connectivity to remote site to enable image migration
  – Network support to allow IP address to migrate with the (virtual) server
  – Migrate storage to remote site
    • Server and storage remain consistent
    • Continuous live migration
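The slide says only that the image is copied while the server keeps running and that state is then synchronized. One common way to do this, used for example by Xen's live migration, is iterative pre-copy: copy all memory, repeatedly re-send the pages dirtied during the previous round, and pause the VM only for a short final stop-and-copy. The toy model below is a sketch under stated assumptions, not how Xen or VMware implement it: memory is counted in pages, the running VM re-dirties a fixed `dirty_percent` of whatever was sent in each round, and the function name and parameters are hypothetical.

```python
def precopy_migrate(image_pages, dirty_percent, stop_threshold=50, max_rounds=30):
    """Toy model of iterative pre-copy live migration (hypothetical parameters).

    image_pages:    size of the virtual server's memory image, in pages
    dirty_percent:  pages re-dirtied by the still-running VM per 100 pages sent
                    (i.e. page-dirtying rate relative to transfer rate)
    Returns (rounds, pages_sent_while_running, pages_sent_during_pause).
    """
    pending = image_pages          # round 1: the whole image must be copied
    sent_while_running = 0
    rounds = 0
    while pending > stop_threshold and rounds < max_rounds:
        rounds += 1
        sent_while_running += pending                # VM keeps running on the old host
        pending = pending * dirty_percent // 100     # pages dirtied during this round
    # Stop-and-copy: pause the VM, send the remaining pages (plus CPU state), switch over.
    return rounds, sent_while_running, pending


print(precopy_migrate(100_000, 10))    # (4, 111100, 10)
print(precopy_migrate(100_000, 100))   # (30, 3000000, 100000): never converges
```

The last value returned is what must be sent while the VM is paused, so downtime stays small only when memory is dirtied well below the rate at which it can be transferred; if the VM dirties pages as fast as they are sent, the loop never converges and stops at `max_rounds`.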
Networking Support
• IP address migration:
  – Challenging to move IP addresses in the current Internet
    • Especially dynamically
    • Routing protocols don’t change instantly
  – Connectivity changes not under data center control
• Our approach:
  – Allow migration management system to initiate network connectivity change
    • Network provides API to migration manager
  – Time-critical changes are kept local
    • Isolate impact on the rest of the network
    • Network-wide (routing protocol) changes not time critical
  – Use temporary tunnels to deal with mobility
IP Migration Primitive
[Figure: Data Centers A and B connected through provider edge routers PEa, PEb, PEc and PEd; migration software moves virtual server (VS) VSa from physical server (PS) PSa to PSb]
Goal: Migrate Virtual Server “a” (VSa) with IP address IPa from Physical Server “a” (PSa) in data center “A”
(DCa) to Physical Server “b” (PSb) in data center “B” (DCb)
Network part of migration
1. Migration software signals to “network” that IPa will (soon) migrate from PEa to PEb
2. “Network” creates a tunnel between PEa and PEb
3. Server migration executed between PSa and PSb
4. Migration software signals to “network” that switchover should take place
5. PEa switches all traffic destined to IPa into the tunnel between PEa and PEb, which delivers the traffic to VSa in PSb. (Return traffic does not need to go through the tunnel.)
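The five network-side steps can be sketched as a tiny simulation. This is a minimal sketch under assumptions: the `Network` class and its `signal_migration`/`signal_switchover` methods are hypothetical stand-ins for the API the network offers the migration manager, forwarding tables are plain dictionaries, and we assume PEb already has a local route to VSa once it runs on PSb.

```python
from dataclasses import dataclass, field


@dataclass
class PE:
    """Provider edge router with a toy forwarding table: destination IP -> action."""
    name: str
    fib: dict = field(default_factory=dict)


class Network:
    """Hypothetical network-side API exposed to the migration manager."""

    def __init__(self, pes):
        self.pes = {pe.name: pe for pe in pes}
        self.tunnels = set()
        self.pending = {}  # ip -> (source PE, destination PE)

    def signal_migration(self, ip, src, dst):
        # Steps 1-2: migration software announces the move; network pre-builds a tunnel.
        self.pending[ip] = (src, dst)
        self.tunnels.add((src, dst))

    def signal_switchover(self, ip):
        # Steps 4-5: source PE redirects all traffic for ip into the tunnel.
        src, dst = self.pending.pop(ip)
        self.pes[src].fib[ip] = f"tunnel:{dst}"


def deliver(net, ingress, ip):
    """Follow traffic for ip from the PE where it enters; return (egress PE, action)."""
    pe = net.pes[ingress]
    action = pe.fib[ip]
    if action.startswith("tunnel:"):
        pe = net.pes[action.split(":", 1)[1]]
        action = pe.fib[ip]
    return pe.name, action


# Routing elsewhere still points at PEa, so inbound traffic enters there.
pea = PE("PEa", {"IPa": "local:PSa"})
peb = PE("PEb")
net = Network([pea, peb])
net.signal_migration("IPa", "PEa", "PEb")   # steps 1-2
peb.fib["IPa"] = "local:PSb"                 # step 3: VSa now runs on PSb (assumed local route)
net.signal_switchover("IPa")                 # steps 4-5
print(deliver(net, "PEa", "IPa"))            # ('PEb', 'local:PSb')
```

Only PEa's forwarding entry changes at switchover time, which mirrors the slide's point that time-critical changes are kept local; return traffic is not modeled.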
IP Migration Primitive
[Figure: Data Centers A and B with provider edge routers PEa, PEb, PEc and PEd; VSa now runs on physical server PSb]
After the first five steps, the server migration is done as far as the migration software is concerned. Traffic towards IPa is, however, still “dog-legged” through PEa, so a few more steps remain in the network:
1. PEb starts to advertise a route to IPa with a high local preference. At this point there are two valid paths towards IPa: one through PEa and the tunnel, and another directly through PEb. As routers learn about the newly advertised path, they will prefer the direct path towards IPa, and the tunnel will “dry out”.
2. When PEa detects that no more traffic is flowing through the tunnel, it withdraws the route for IPa (if it had a specific route for IPa) and tears down the tunnel.
IP Migration Primitive: takes care of planned maintenance without storage needs (e.g., VoIP network element)
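The clean-up phase can be illustrated with a toy model of routers choosing between the old and the new advertisement. Assumptions, labeled loudly: the decision process is reduced to "highest local preference wins" (real BGP has further tie-breaks), advertisements propagate one router at a time, the `Router` class is hypothetical, and PEa's traffic measurement on the tunnel is approximated by checking whether any router still selects PEa.

```python
class Router:
    """A router elsewhere in the network holding the advertisements it has learned for IPa."""

    def __init__(self, name):
        self.name = name
        self.rib = []

    def learn(self, adv):
        self.rib.append(adv)

    def withdraw(self, egress):
        self.rib = [adv for adv in self.rib if adv["egress"] != egress]

    def best_egress(self):
        # Simplified decision: highest local preference wins (other tie-breaks omitted).
        return max(self.rib, key=lambda adv: adv["local_pref"])["egress"] if self.rib else None


def tunnel_carries_traffic(routers):
    # Stand-in for PEa measuring the tunnel: any router still routing to PEa feeds it.
    return any(r.best_egress() == "PEa" for r in routers)


old_route = {"egress": "PEa", "local_pref": 100}
new_route = {"egress": "PEb", "local_pref": 200}   # step 1: PEb advertises with high local pref
routers = [Router("PEc"), Router("PEd")]
for r in routers:
    r.learn(old_route)

history = []
for r in routers:                  # the new advertisement reaches routers one at a time
    r.learn(new_route)
    history.append(tunnel_carries_traffic(routers))

tunnel_up = True
if not tunnel_carries_traffic(routers):   # step 2: the tunnel has "dried out"
    for r in routers:
        r.withdraw("PEa")                 # PEa withdraws its specific route for IPa
    tunnel_up = False

print(history, tunnel_up, [r.best_egress() for r in routers])
# [True, False] False ['PEb', 'PEb']
```

Because the old path stays valid until every router has moved over, no router is ever left without a route while the network-wide change converges at its own pace.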
Data Storage
[Figure: synchronous and asynchronous replication between local and remote storage]
• Existing WAN solutions: remote replication
  – Maintain a primary/local and a remote storage system
  – Replicate data between primary and remote systems
  – One of two modes:
    • Synchronous: each write is performed locally and remotely before returning to the “application”
      – Local and remote remain synchronized
      – Poor performance: both throughput and application latency suffer
    • Asynchronous: local and remote are allowed to diverge; a consistent “snapshot” is replicated
      – Good performance (high throughput, low (local) latency)
      – Potential data loss because of divergence
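The trade-off between the two modes can be seen in a toy key-value store. This is a sketch, not a real replication product: the `ReplicatedStore` class is hypothetical, the WAN round trip is implied rather than simulated, and asynchronous replication ships an in-order backlog where real systems would ship a consistent snapshot.

```python
from collections import deque


class ReplicatedStore:
    """Toy primary/remote key-value store illustrating the two replication modes."""

    def __init__(self, mode):
        assert mode in ("sync", "async")
        self.mode = mode
        self.local, self.remote = {}, {}
        self.backlog = deque()   # writes acknowledged locally but not yet at the remote

    def write(self, key, value):
        self.local[key] = value
        if self.mode == "sync":
            self.remote[key] = value           # must reach the remote before returning
        else:
            self.backlog.append((key, value))  # returns immediately; remote lags behind
        return "ack"

    def replicate(self):
        """Ship the asynchronous backlog to the remote (e.g. periodically)."""
        while self.backlog:
            key, value = self.backlog.popleft()
            self.remote[key] = value

    def fail_over(self):
        """Primary site lost: whatever reached the remote is all that survives."""
        return dict(self.remote)


sync_store, async_store = ReplicatedStore("sync"), ReplicatedStore("async")
for store in (sync_store, async_store):
    store.write("cart:1", "book")
async_store.replicate()
for store in (sync_store, async_store):
    store.write("cart:2", "lamp")
print(sync_store.fail_over())    # {'cart:1': 'book', 'cart:2': 'lamp'}
print(async_store.fail_over())   # {'cart:1': 'book'}: the last write is lost
```

In the synchronous store every write pays the remote round trip, which is the source of the poor throughput and latency; in the asynchronous store writes return at local speed, and anything still in the backlog at failure time is lost.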
Migration Aware Replication
• Our approach:
  – Remote replication that can seamlessly move between synchronous and asynchronous replication
  – Allow replication mode to be controlled by the migration management system:
    • Allow bulk of data to be replicated asynchronously
    • Switch to synchronous when needed
      – Final part of server migration process
[Figure: local and remote storage, with replication switching from asynchronous to synchronous]
IP Migration Primitive + Migration Aware Replication: takes care of planned maintenance with storage needs
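The mode switch can be sketched as a replica whose mode the migration manager flips at run time: bulk data flows asynchronously, and for the final part of the migration the remaining backlog is flushed and replication becomes synchronous. The class and method names (`MigrationAwareReplica`, `drain`, `switch_to_sync`) are hypothetical, and the model is single-threaded, so it ignores writes that arrive during the flush.

```python
from collections import deque


class MigrationAwareReplica:
    """Remote replication whose mode the migration manager can flip at run time."""

    def __init__(self):
        self.mode = "async"          # bulk of the data goes asynchronously
        self.local, self.remote = {}, {}
        self.backlog = deque()       # acknowledged locally, not yet at the remote

    def write(self, key, value):
        self.local[key] = value
        if self.mode == "sync":
            self.remote[key] = value          # mirrored before the write returns
        else:
            self.backlog.append((key, value))

    def drain(self, budget):
        """Background asynchronous transfer of up to `budget` queued writes."""
        for _ in range(min(budget, len(self.backlog))):
            key, value = self.backlog.popleft()
            self.remote[key] = value

    def switch_to_sync(self):
        """Invoked by the migration manager for the final part of a server migration.

        Single-threaded toy: no writes arrive while the remaining backlog is flushed.
        """
        self.drain(len(self.backlog))
        self.mode = "sync"

    def in_sync(self):
        return not self.backlog and self.local == self.remote


rep = MigrationAwareReplica()
for i in range(1000):
    rep.write(f"blk{i}", i)
rep.drain(990)                       # async replication has caught up on most of it
print(len(rep.backlog))              # 10
rep.switch_to_sync()
rep.write("blk-final", -1)           # from here on every write reaches both sites
print(rep.in_sync())                 # True
```

Switching only after most of the backlog has drained keeps the synchronous (slow) phase short, which is why the slide reserves it for the final part of the migration.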
Unplanned Outages
• Conflicting metrics of concern
  – Recovery point objective (RPO)
    • How much data loss is acceptable?
  – Recovery time objective (RTO)
    • How long can service be down?
  – Cost (overhead of protection)
• Range of meanings of “unplanned”
  – Catastrophic instantaneous failure
    • No notice whatsoever
  – But also imminent failure scenarios
    • Imminent equipment failure (e.g., increase in disk errors; imminent failure of fiber)
    • Developing natural/man-made disasters
      – E.g., flooding/steam pipe burst in NY, probably even with 911
      – Minutes to hours to react
• Existing remote replication solutions deal with storage
  – No support for server migration
• Our goal:
  – Replicate data and server to allow for seamless failover
Application state requirements
• Limited application state:
  – E.g., VoIP network element that maintains call state (for 3-way calling and mid-call events), or VoD servers (for fast-forward, random access events)
  – Lost session state => application impact
    • Inconvenience
  – RTO small, RPO medium
    • Some state loss is tolerable (drop a few calls), but service has to stay up
  – Instrument application to initiate partial migration when new state has been created
• Stateful applications:
  – E.g., e-commerce applications (shopping cart, auction sites)
  – Lost session state => application impact
    • (At best) inconvenience, (at worst) application correctness, monetary impact
  – RTO small, RPO small (minimize state loss, site has to stay up)
  – Continuous (incremental) server migration
• High-integrity applications
  – E.g., financial transactions, other database applications
  – RTO medium, RPO very small (absolutely no data loss, rather some downtime)
  – Reduce RTO with continuous (incremental) server migration
Continuous Server Migration
Enabling Technology: VS record/replay
[Figure: RECORD: snapshot of the VS running on a PS, then record execution state; REPLAY: restore snapshot, then replay execution state]
• Virtual server record/replay: available from VMware
  – Efficient recording: track “external” events + times
  – Synchronize events with VM state during replay
  – Developed as a debugging tool
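Why recording only external events plus their times is enough can be shown with a deterministic toy "virtual server": if execution between inputs is deterministic, restoring the snapshot and re-injecting each logged event at its recorded time reproduces the exact state. This is a sketch of the general record/replay idea, not VMware's implementation; `ToyVM`, `record` and `replay` are hypothetical names.

```python
class ToyVM:
    """Deterministic toy virtual server: apart from external inputs, execution is repeatable."""

    MOD = 1_000_003

    def __init__(self, state=0, clock=0):
        self.state, self.clock = state, clock

    def snapshot(self):
        return (self.state, self.clock)

    def step(self):
        # One unit of deterministic execution (stands in for running instructions).
        self.clock += 1
        self.state = (self.state * 31 + 7) % self.MOD

    def deliver(self, value):
        # An "external" event such as a packet or interrupt: not reproducible by re-execution.
        self.state = (self.state + value) % self.MOD


def record(vm, steps, external):
    """Run the VM; log only external events together with the time they arrived."""
    log = []
    for _ in range(steps):
        vm.step()
        if vm.clock in external:
            vm.deliver(external[vm.clock])
            log.append((vm.clock, external[vm.clock]))
    return log


def replay(snap, steps, log):
    """Restore the snapshot, then re-inject each logged event at its recorded time."""
    vm = ToyVM(*snap)
    events = dict(log)
    for _ in range(steps):
        vm.step()
        if vm.clock in events:
            vm.deliver(events[vm.clock])
    return vm


vm = ToyVM()
snap = vm.snapshot()                                   # snapshot of VS
log = record(vm, 1000, {10: 5, 250: 42, 777: 9})       # record execution state
replica = replay(snap, 1000, log)                      # restore snapshot, replay execution state
print(len(log), replica.snapshot() == vm.snapshot())   # 3 True
```

The log holds 3 entries for 1000 execution steps, which is what makes recording cheap; the recorded times are what let replay synchronize events with VM state.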
Continuous Server Migration
[Figure: Local: RECORD → REPLICATE (with migration aware replication) → Remote: REPLAY]
• Asynchronously replicate initial snapshot
• Replication of execution state
  – Asynchronous if the application can tolerate some state loss and the execution state represents a consistent checkpoint
  – Synchronous otherwise
IP Migration Primitive + Migration Aware Replication + Continuous Server Migration: takes care of unplanned outages
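Putting record, replicate and replay together, the sketch below models a remote site that continuously replays the execution-state log recorded at the local site, either per event (synchronous) or in batches at checkpoint boundaries (asynchronous). Assumptions, all hypothetical: the VM is reduced to a deterministic `execute` function, both sites start from the same already-replicated snapshot (state 0), and `checkpoint_every` is an illustrative batch size, not a parameter from the talk.

```python
from functools import reduce

MOD = 1_000_003


def execute(state, event):
    """Deterministic stand-in for running the VM up to and including one external event."""
    return (state * 31 + event) % MOD


class ContinuousMigration:
    """Toy continuous server migration: local site RECORDs, remote site REPLAYs."""

    def __init__(self, mode, checkpoint_every=4):
        assert mode in ("sync", "async")
        self.mode = mode
        self.checkpoint_every = checkpoint_every  # hypothetical async batch size
        self.log = []            # execution-state log recorded locally
        self.remote_state = 0    # both sites start from the same replicated snapshot
        self.shipped = 0         # log entries the remote has received and replayed

    def record(self, event):
        self.log.append(event)
        # sync: replicate every event; async: replicate only at consistent checkpoints
        if self.mode == "sync" or len(self.log) - self.shipped >= self.checkpoint_every:
            self._replicate()

    def _replicate(self):
        for event in self.log[self.shipped:]:
            self.remote_state = execute(self.remote_state, event)
        self.shipped = len(self.log)

    def local_state(self):
        return reduce(execute, self.log, 0)

    def fail_over(self):
        """Local site lost: remote takes over with its replayed state; also report events lost."""
        return self.remote_state, len(self.log) - self.shipped


for mode in ("sync", "async"):
    cm = ContinuousMigration(mode)
    for event in [3, 1, 4, 1, 5, 9]:
        cm.record(event)
    state, lost = cm.fail_over()
    print(mode, lost, state == cm.local_state())
# sync 0 True
# async 2 False
```

In asynchronous mode the remote always sits at a consistent checkpoint and loses at most the events recorded since it; synchronous mode loses nothing at the cost of a remote round trip per event.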
Status
• Migration aware replication
  – Key building blocks prototyped
    • “Semantic Aware Replication” project
    • Gal Niv (UMass)
• WAN live migration
  – Key building blocks prototyped (without storage)
    • “Live virtual router migration” project
    • Yi Wang (Princeton)
• Continuous Server Migration
  – Just getting off the ground
• Work in progress
  – Many open issues remain!