Zephyr: Live Migration in Shared Nothing Databases for Elastic

Aaron J. Elmore, Sudipto Das, Divyakant Agrawal, Amr El Abbadi
Distributed Systems Lab, University of California Santa Barbara

• Serve thousands of applications (tenants)
  ◦ AppEngine, Azure, Force.com
• Tenants are (typically)
  ◦ Small
  ◦ SLA sensitive
  ◦ Erratic load patterns
  ◦ Subject to flash crowds
    • i.e. the fark, digg, slashdot, reddit effect (for now)
• Support for multitenancy is critical
• Our focus: DBMSs serving these platforms

Sudipto Das {[email protected]}

[Figure: what the tenant wants… / what the service provider wants…]

[Figure: resources vs. time, showing capacity and demand for traditional infrastructures and for deployment in the cloud. Static provisioning for peak is inelastic and leaves unused resources. Slide credits: Berkeley RAD Lab]

[Figure: load balancer, application/web/caching tier, database tier]

• Migrate a tenant's database in a live system
  ◦ A critical operation to support elasticity
• Different from
  ◦ Migration between software versions
  ◦ Migration in case of schema evolution

• VM migration [Clark et al., NSDI 2005]
• One tenant-per-VM
  ◦ Pros: allows fine-grained load balancing
  ◦ Cons
    • Performance overhead
    • Poor consolidation ratio [Curino et al., CIDR 2011]
• Multiple tenants in a VM
  ◦ Pros: good performance
  ◦ Cons: migrate all tenants
    • Coarse-grained load balancing

• Multiple tenants share the same database process
  ◦ Shared process multitenancy
  ◦ Example systems: SQL Azure, ElasTraS, RelationalCloud, and many more
• Migrate individual tenants
• VM migration cannot be used for fine-grained migration
• Target architecture: Shared Nothing
  ◦ Shared storage architectures: see our VLDB 2011 paper

• How to ensure no downtime?
  ◦ Need to migrate the persistent database image (tens of MBs to GBs)
• How to guarantee correctness during failures?
  ◦ Nodes can fail during migration
  ◦ How to ensure transaction atomicity and durability?
  ◦ How to recover migration state after failure?
    • Nodes recover after a failure
• How to guarantee serializability?
  ◦ Transaction correctness equivalent to normal operation
• How to minimize migration cost? …

• Downtime
  ◦ Time tenant is unavailable
• Service interruption
  ◦ Number of operations failing/transactions aborting
• Migration overhead/performance impact
  ◦ During normal operation, migration, and after migration
• Additional data transferred
  ◦ Data transferred in addition to DB's persistent image

• Migration executed in phases
  ◦ Starts with transfer of minimal information to destination ("wireframe")
• Source and destination concurrently execute transactions in one migration phase
• Database pages used as granule of migration
  ◦ Pages "pulled" by destination on-demand
• Minimal transaction synchronization
  ◦ A page is uniquely owned by either source or destination
  ◦ Leverage page level locking
• Logging and handshaking protocols to tolerate failures

• For this talk
  ◦ Small tenants
    • i.e. not sharded across nodes
  ◦ No replication
  ◦ No structural changes to indices
• Extensions in the paper
  ◦ Relaxes these assumptions

[Figure: the source holds owned pages P1, P2, P3, …, Pn and runs active transactions TS1, …, TSk. Legend: page owned by node / page not owned by node]

[Figure: freeze index wireframe and migrate. The source holds owned pages P1, …, Pn and active transactions TS1, …, TSk; the destination holds un-owned pages P1, …, Pn]

[Figure: requests for un-owned pages can block. Old, still active transactions TSk+1, …, TSl run at the source; new transactions TD1, …, TDm run at the destination. P3, accessed by TDi, is pulled from the source. Index wireframes remain frozen]

[Figure: pages can be pulled by the destination, if needed. Transactions at the source are completed; P1, P2, … are pushed from the source; TDm+1, …, TDn run at the destination]

[Figure: index wireframe un-frozen. TDn+1, …, TDp run at the destination]

• Once migrated, pages are never pulled back by source
  ◦ Transactions at source accessing migrated pages are aborted
• No structural changes to indices during migration
  ◦ Transactions (at both nodes) that make structural changes to indices abort
• Destination "pulls" pages on-demand
  ◦ Transactions at the destination experience higher latency compared to normal operation

• Only concern is "dual mode"
  ◦ Init and Finish: only one node is executing transactions
• Local predicate locking of internal index and exclusive page level locking between nodes → no phantoms
• Strict 2PL → transactions are locally serializable
• Pages transferred only once
  ◦ No Tdest → Tsource conflict dependency
• Guaranteed serializability
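
The ownership rules on the slides above (a page has exactly one owner, the destination pulls pages on demand, and the source aborts transactions that touch an already-migrated page) can be illustrated with a small in-memory sketch. This is not the Zephyr implementation: the names `Node`, `DualModeMigration` and `TransactionAborted` are hypothetical, pages are plain dictionary entries, and locking, logging and the network are left out.

```python
from dataclasses import dataclass, field


class TransactionAborted(Exception):
    """Raised when a source transaction touches a page that has migrated."""


@dataclass
class Node:
    # Hypothetical stand-in for a database node; `owned` maps page id -> contents.
    name: str
    owned: dict = field(default_factory=dict)

    def owns(self, page_id):
        return page_id in self.owned


class DualModeMigration:
    """Toy model of the phase in which both nodes execute transactions."""

    def __init__(self, source, destination):
        self.source = source
        self.destination = destination
        self.pulls = 0  # number of pages pulled so far

    def access_at_source(self, page_id):
        # Migrated pages are never pulled back, so the transaction aborts.
        if not self.source.owns(page_id):
            raise TransactionAborted(f"{page_id} already migrated")
        return self.source.owned[page_id]

    def access_at_destination(self, page_id):
        # Pull on demand: ownership moves from source to destination,
        # and each page is transferred at most once.
        if not self.destination.owns(page_id):
            self.destination.owned[page_id] = self.source.owned.pop(page_id)
            self.pulls += 1
        return self.destination.owned[page_id]

    def owner_of(self, page_id):
        # Exactly one node owns any given page at any time.
        if self.source.owns(page_id):
            return self.source.name
        if self.destination.owns(page_id):
            return self.destination.name
        raise KeyError(page_id)


if __name__ == "__main__":
    src = Node("source", {"P1": "row data 1", "P2": "row data 2", "P3": "row data 3"})
    dst = Node("destination")
    migration = DualModeMigration(src, dst)
    migration.access_at_destination("P3")   # P3 is pulled from the source
    print(migration.owner_of("P3"), migration.owner_of("P1"))
```

Moving the entry with `pop` keeps the single-owner property by construction in this toy model; a real system would also need the page-level locking and the logging the slides mention.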
• Transaction recovery
  ◦ For every database page, transactions at source ordered before transactions at destination
  ◦ After failure, conflicting transactions replayed in the same order
• Migration recovery
  ◦ Atomic transitions between migration modes
    • Logging and handshake protocols
  ◦ Every page has exactly one owner
    • Bookkeeping at the index level

• In the presence of arbitrary repeated failures, Zephyr ensures:
  ◦ Updates made to database pages are consistent
  ◦ A failure does not leave a page without an owner
  ◦ Both source and destination are in the same migration mode
• Guaranteed termination and starvation freedom

• Replicated tenants
• Sharded tenants
• Allow structural changes to the indices
  ◦ Using shared lock managers in the dual mode

• Prototyped using an open source OLTP database, H2
  ◦ Supports standard SQL/JDBC API
  ◦ Serializable isolation level
  ◦ Tree indices
  ◦ Relational data model
• Modified the database engine
  ◦ Added support for freezing indices
  ◦ Page migration status maintained using index
  ◦ Details in the paper…
• Tungsten SQL Router migrates JDBC connections during migration

• Two database nodes, each with a DB instance running
• Synthetic benchmark as load generator
  ◦ Modified YCSB to add transactions
    • Small read/write transactions
• Compared against Stop and Copy (S&C)

[Figure: experimental setup with system controller, metadata, and the migrate command]

• Default transaction parameters: 10 operations per transaction; 80% read, 15% update, 5% inserts
• Workload: 60 sessions, 100 transactions per session
• Hardware: 2.4 GHz Intel Core 2 Quads, 8 GB RAM, 7200 RPM SATA HDs with 32 MB cache, Gigabit Ethernet
• Default DB size: 100k rows (~250 MB)

• Downtime (tenant unavailability)
  ◦ S&C: 3 – 8 seconds (needed to migrate, unavailable for updates)
  ◦ Zephyr: No downtime. Either source or destination is available
• Service interruption (failed operations)
  ◦ S&C: ~100s – 1,000s of failed operations. All transactions with updates are aborted
  ◦ Zephyr: ~10s – 100s of failed operations. Orders of magnitude less interruption

• Average increase in transaction latency (compared to the 6,000 transaction workload without migration)
  ◦ S&C: 10 – 15%. Cold cache at destination
  ◦ Zephyr: 10 – 20%. Pages fetched on-demand
• Data transfer
  ◦ S&C: Persistent database image
  ◦ Zephyr: 2 – 3% additional data transfer (messaging overhead)
• Total time taken to migrate
  ◦ S&C: 3 – 8 seconds. Unavailable for any writes
  ◦ Zephyr: 10 – 18 seconds. No unavailability

[Figure: orders of magnitude fewer failed operations]

• Proposed Zephyr, a live database migration technique with no downtime for shared nothing architectures
  ◦ The first end-to-end solution with safety, correctness and liveness guarantees
• Prototype implementation on a relational OLTP database
• Low cost on a variety of workloads

[Figure sequence: Txns, Source, Destination]

• Either source or destination is serving the tenant
  ◦ No downtime
• Serializable transaction execution
  ◦ Unique page ownership
  ◦ Local multi-granularity locking
• Safety in the presence of failures
  ◦ Transactions are atomic and durable
  ◦ Migration state is recovered from log
    • Ensure consistency of the database state

• Wireframe copy
  ◦ Typically orders of magnitude smaller than data
• Operational overhead during migration
  ◦ Extra data (in addition to database pages) transferred
  ◦ Transactions aborted during migration

[Figure: failures due to attempted modification of index structure]

• Only committed transactions reported
• Loss of cache for both migration types
• Zephyr results in a remote fetch
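
The migration-recovery points in the talk (atomic transitions between migration modes, and migration state recovered from the log) can be sketched as a small logged state machine. This is a simplified illustration under stated assumptions, not the protocol from the paper: the `Mode` names other than Init, Dual and Finish, the cyclic order in `NEXT_MODE`, the in-memory `MigrationLog`, and the `handshake_transition` helper are all hypothetical, and no real network or durable storage is involved.

```python
import json
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"   # assumed name for operation outside a migration
    INIT = "init"
    DUAL = "dual"
    FINISH = "finish"


# Assumed order of modes for this sketch.
NEXT_MODE = {
    Mode.NORMAL: Mode.INIT,
    Mode.INIT: Mode.DUAL,
    Mode.DUAL: Mode.FINISH,
    Mode.FINISH: Mode.NORMAL,
}


class MigrationLog:
    """In-memory list standing in for a durable, append-only log."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(json.dumps(record))

    def last_mode(self):
        # Replay the log; the last logged transition wins.
        mode = Mode.NORMAL
        for line in self.records:
            mode = Mode(json.loads(line)["mode"])
        return mode


class MigrationNode:
    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.mode = log.last_mode()  # recovery: restore the mode from the log

    def transition(self, target):
        if NEXT_MODE[self.mode] is not target:
            raise ValueError(f"{self.mode.name} -> {target.name} is not allowed")
        # Log first, then apply, so a restart sees the transition or none of it.
        self.log.append({"node": self.name, "mode": target.value})
        self.mode = target


def handshake_transition(source, destination, target):
    """Bring both nodes to `target`; safe to retry after a failure."""
    for node in (source, destination):
        if node.mode is not target:
            node.transition(target)


if __name__ == "__main__":
    src_log, dst_log = MigrationLog(), MigrationLog()
    src = MigrationNode("source", src_log)
    dst = MigrationNode("destination", dst_log)
    handshake_transition(src, dst, Mode.INIT)
    handshake_transition(src, dst, Mode.DUAL)
    dst = MigrationNode("destination", dst_log)  # simulated restart
    print(src.mode, dst.mode)
```

Because `handshake_transition` skips a node that has already reached the target mode, repeating it after a crash brings the lagging node forward, which is one simple way to get both nodes back into the same mode.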
