INM2007

Live Data Center Migration across WANs: A Robust Cooperative Context Aware Approach
Kobus Van der Merwe with K.K. Ramakrishnan and Prashant Shenoy

Motivation
• Most network based services/applications involve components hosted in data centers
  – Internet: Mail/Web servers, VoIP, IPTV, P2P directory services etc.
  – VPNs: Mail servers, financial/business applications etc.
• Many of these services require 24x7 availability
• Any downtime is unacceptable
  – At best an inconvenience to users; at worst a major business impact; typically has financial implications
• Recent well-publicized outages: Blackberry, Skype
• Objective of our work:
  – Business continuity in the face of data center outages, both planned (planned maintenance) and unplanned (disaster recovery)

Motivation (cont.)
Existing solutions to deal with outages are inadequate:
• Local redundancy solutions
  – Component redundancy (hot-swappable), multiple network connections
  – No protection against data center outages
• Existing cross data center solutions
  – Instance replication
    • Same content/service available in multiple locations
    • Works well for stateless services (e.g., Web servers)
      – Not for stateful applications
  – Remote replication (either synchronous or asynchronous)
    • Partial solutions
      – Typically only deal with storage
      – Not seamless; involve server downtime, IP address changes etc.

Our approach
• Basic approach:
  – Seamless live service migration across WANs
    • Including all components: server, data, network
  – Cooperative, migration aware approach
    • Migration manager orchestrates migration across all three subsystems
• In summary:
  – Planned outages
    • Migration of both server and data
      – Live server migration
      – Performed once
    • Atomic switchover of network to complete migration
  – Unplanned outages
    • “Continuous live migration”
      – Server and data continuously replicated to remote site
      – On failure, atomic switchover of network

Challenges: Seamless live server migration across WAN
• LAN based live server migration enabled by virtual server technologies (Xen, VMware)
• WAN based server migration
  – Use existing virtual server migration
  – “Management” connectivity to remote site to enable image migration
  – Network support to allow IP address to migrate with the (virtual) server
  – Migrate storage to remote site
    • Server and storage remain consistent
  – Continuous live migration
LAN based live migration:
  – The image of the running virtual server is copied to a new physical platform (while the server is still running on the old platform)
  – Server state is synchronized between the two images
  – Migration software switches over to the new server with minimal downtime (tens of milliseconds)
  – New server is exactly the same as the old server (same IP address, network state stays intact etc.)
  – Storage handled through network attached storage (NAS), e.g., NFS

Networking Support
• IP address migration:
  – Challenging to move IP addresses in the current Internet
    • Especially dynamically
    • Routing protocols don’t change instantly
  – Connectivity changes not under data center control
• Our approach:
  – Allow migration management system to initiate network connectivity change
    • Network provides API to migration manager
  – Isolate impact on the rest of the network
    • Time critical changes are kept local
    • Network-wide (routing protocol) changes not time critical
  – Use temporary tunnels to deal with mobility

IP Migration Primitive
[Figure: Data Center A (Physical Server PSa hosting Virtual Server VSa, attached via PEa) and Data Center B (Physical Server PSb, attached via PEb), other edge routers PEc and PEd, and the migration software. PS = Physical Server, VS = Virtual Server.]
Goal: Migrate Virtual Server “a” (VSa) with IP address IPa from Physical Server “a” (PSa) in data center “A” (DCa) to Physical Server “b” (PSb) in data center “B” (DCb).
Network part of migration:
1. Migration software signals to the “network” that IPa will (soon) migrate from PEa to PEb.
2. The “network” creates a tunnel between PEa and PEb.
3. Server migration is executed between PSa and PSb.
4. Migration software signals to the “network” that switchover should take place.
5. PEa switches all traffic towards IPa to the tunnel between PEa and PEb, which delivers the traffic to VSa in PSb. (Return traffic does not need to go through the tunnel.)

IP Migration Primitive (cont.)
[Figure: the same topology, with VSa now running on PSb in Data Center B.]
After the first five steps, server migration is done as far as the migration software is concerned. Traffic towards IPa is “dog-legged” through PEa, so a few more steps remain in the network:
1. PEb starts to advertise a route to IPa with high local preference. At this point there are two valid paths towards IPa: one through PEa and the tunnel, and another directly through PEb. As routers learn about the newly advertised path, they prefer the direct path towards IPa and the tunnel “dries out”.
2. When PEa detects no more traffic flowing through the tunnel, it withdraws the route for IPa (if it had a specific route for IPa) and tears down the tunnel.
IP Migration Primitive: takes care of planned maintenance without storage needs (e.g., VoIP network element)

Data Storage
[Figure: Synchronous vs. Asynchronous replication between Local and Remote storage]
• Existing WAN solutions: remote replication
  – Maintain a primary/local and a remote storage system
  – Replicate data between primary and remote systems
  – One of two modes:
    • Synchronous: each write performed locally and remotely before returning to the “application”
      – Local and remote remain synchronized
      – Poor performance: both throughput and application latency
    • Asynchronous: local and remote allowed to diverge; replicate a consistent “snapshot”
      – Good performance (high throughput, low (local) latency)
      – Potential data loss because of divergence

Data Storage (cont.)
• Our approach:
  – Remote replication that can seamlessly move between synchronous and asynchronous replication
  – Allow replication mode to be controlled by migration management system:
    • Allow bulk of data to be replicated asynchronously
    • Switch to synchronous when needed
      – Final part of server migration process
[Figure: Migration Aware Replication: Local and Remote storage, with a Switch between Asynchronous and Synchronous replication]
IP Migration Primitive + Migration Aware Replication: takes care of planned maintenance with storage needs

Unplanned Outages
• Conflicting metrics of concern
  – Recovery point objective (RPO)
    • How much data loss is acceptable?
  – Recovery time objective (RTO)
    • How long can service be down?
  – Cost (overhead of protection)
• Range of meaning to “unplanned”
  – Catastrophic instantaneous failure
    • No notice whatsoever
  – But also imminent failure scenarios
    • Imminent equipment failure (e.g., increase in disk errors; imminent failure of fiber)
    • Developing natural/man-made disasters
      – E.g., flooding/steam pipe burst in NY, probably even with 9/11
      – Minutes to hours to react
• Existing remote replication solutions deal with storage
  – No support for server migration
• Our goal:
  – Replicate data and server to allow for seamless failover

Application state requirements
• Limited application state:
  – E.g., VoIP network element that maintains call state (for 3-way calling and mid-call events), or VoD servers (for fast-forward, random access events)
  – Lost session state => application impact
    • Inconvenience
  – RTO small, RPO medium
    • Some state loss is tolerable (drop a few calls), but service has to stay up
  – Instrument application to initiate partial migration when new state has been created
• Stateful applications:
  – E.g., e-commerce applications (shopping cart, auction sites)
  – Lost session state => application impact
    • (At best) inconvenience, (at worst) application correctness, monetary impact
  – RTO small, RPO small (minimize state loss, site has to stay up)
  – Continuous (incremental) server migration
• High integrity applications
  – E.g., financial transactions, other database applications
  – RTO medium, RPO very small (absolutely no data loss, rather some downtime)
  – Reduce RTO with continuous (incremental) server migration

Continuous Server Migration
Enabling Technology: VS record/replay
[Figure: RECORD (Start Recording, Snapshot of VS, Record execution state) and REPLAY (Restore snapshot, Replay execution state) on a PS hosting a VS]
• Virtual server record/replay: available from VMware
  – Efficient recording: track “external” events + times
  – Synchronize events with VM state during replay
  – Developed as a debugging tool

Continuous Server Migration (cont.)
[Figure: Local: RECORD → REPLICATE (with migration aware replication) → Remote: REPLAY]
• Asynchronously replicate initial snapshot
• Replication of execution state
  – Asynchronous if application can tolerate some state loss and execution state represents a consistent checkpoint
  – Synchronous otherwise
IP Migration Primitive + Migration Aware Replication + Continuous Server Migration: takes care of unplanned outages

Status
• Migration aware replication
  – Key building blocks prototyped
    • “Semantic Aware Replication” project
    • Gal Niv (UMass)
• WAN live migration
  – Key building blocks prototyped (without storage)
    • “Live virtual router migration” project
    • Yi Wang (Princeton)
• Continuous Server Migration
  – Just getting off the ground
• Work in progress
  – Many open issues remain!
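Migration aware replication, as described on the Data Storage slides, replicates the bulk of the data asynchronously and lets the migration manager switch to synchronous mode for the final part of a migration. The sketch below shows that mode switch on an in-memory key/value store. `MigrationAwareReplicator` and its methods are hypothetical names, and a real system would replicate block or file writes over the WAN with acknowledgements, ordering and concurrency handling that this single-threaded sketch omits.

```python
from collections import deque


class MigrationAwareReplicator:
    """Replication that can move between asynchronous and synchronous modes.

    Hypothetical class: the talk describes the behaviour, not an API.
    """

    def __init__(self):
        self.local = {}
        self.remote = {}
        self.backlog = deque()        # writes not yet applied at the remote site
        self.synchronous = False

    def write(self, key, value):
        self.local[key] = value
        if self.synchronous:
            self.remote[key] = value  # returns only after the remote write
        else:
            self.backlog.append((key, value))  # returns at once; shipped later

    def replicate_some(self, n):
        """Background asynchronous replication of up to n writes, in order."""
        for _ in range(min(n, len(self.backlog))):
            key, value = self.backlog.popleft()
            self.remote[key] = value

    def switch_to_sync(self):
        """Invoked by the migration manager for the final part of a migration."""
        self.replicate_some(len(self.backlog))  # drain the backlog first
        self.synchronous = True

    def switch_to_async(self):
        self.synchronous = False

    def lag(self):
        """Writes that would be lost if the local site failed right now."""
        return len(self.backlog)


r = MigrationAwareReplicator()
for i in range(1000):                 # bulk of the data, replicated asynchronously
    r.write(f"block{i}", i)
r.replicate_some(900)
print(r.lag())                        # 100
r.switch_to_sync()
r.write("block0", "final")
assert r.local == r.remote            # local and remote are now synchronized
```

`lag()` is the quantity that maps onto RPO: in asynchronous mode it bounds how much data a sudden failure can lose, and switching to synchronous mode drives it to zero at the cost of remote write latency.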
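Continuous server migration rests on VS record/replay: record external events at the local site, replicate the log, and replay it against a restored snapshot at the remote site. The toy model below uses invented names (`VirtualServer`, `ContinuousMigration`), assumes a server whose state is a deterministic function of its external events (loosely modelled on the VoIP call-state example), and ignores event timing, which real record/replay must honour. It shows why synchronous replication of the execution state loses nothing on failover while asynchronous replication can lose the unshipped tail.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class VirtualServer:
    """Toy deterministic server whose state depends only on external events."""
    state: dict = field(default_factory=dict)

    def handle(self, event):
        kind, call_id = event
        if kind == "start":
            self.state[call_id] = "active"
        elif kind == "end":
            self.state.pop(call_id, None)


class ContinuousMigration:
    """Record locally, replicate the log, replay remotely (hypothetical names)."""

    def __init__(self, server, synchronous):
        self.server = server
        self.synchronous = synchronous
        # Initial snapshot; in the talk this is replicated asynchronously.
        self.snapshot = copy.deepcopy(server.state)
        self.log = []      # recorded (time, event) pairs at the local site
        self.shipped = 0   # number of log entries already held by the remote site

    def on_event(self, time, event):
        self.server.handle(event)         # normal execution at the local site
        self.log.append((time, event))    # record the external event and its time
        if self.synchronous:
            self.shipped = len(self.log)  # stands in for waiting on the remote ack

    def replicate(self):
        """Asynchronous mode: ship all recorded entries in the background."""
        self.shipped = len(self.log)

    def failover(self):
        """Remote site: restore the snapshot, then replay the shipped events."""
        replica = VirtualServer(copy.deepcopy(self.snapshot))
        for _time, event in self.log[: self.shipped]:
            replica.handle(event)
        return replica


events = [(1, ("start", "call1")), (2, ("start", "call2")), (3, ("end", "call1"))]
for sync in (True, False):
    cm = ContinuousMigration(VirtualServer(), synchronous=sync)
    cm.on_event(*events[0])
    cm.on_event(*events[1])
    if not sync:
        cm.replicate()             # a background replication pass
    cm.on_event(*events[2])        # the local site fails right after this event
    print(sync, cm.failover().state == cm.server.state)
# True True    (synchronous: no state lost)
# False False  (asynchronous: the last event had not been shipped yet)
```

This matches the application classes above: a VoIP element that can tolerate dropping a few calls could use the asynchronous mode, while high integrity applications would need the synchronous one.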
