Aaron J. Elmore, Sudipto Das, Divyakant Agrawal, Amr El Abbadi
Distributed Systems Lab
University of California, Santa Barbara

Serve thousands of applications (tenants)
◦ AppEngine, Azure, Force.com
Tenants are (typically)
◦ Small
◦ SLA sensitive
◦ Erratic load patterns
◦ Subject to flash crowds, i.e. the Fark, Digg, Slashdot, Reddit effect (for now)
Support for multitenancy is critical
Our focus: DBMSs serving these platforms
Sudipto Das {[email protected]}

What the tenant wants…
What the service provider wants…

Static provisioning for peak is inelastic
[Figure: resources vs. time, with capacity and demand curves, for traditional infrastructures and for deployment in the cloud; unused resources marked. Slide credits: Berkeley RAD Lab]

[Figure: load balancer; application/web/caching tier; database tier]

Migrate a tenant’s database in a live system
◦ A critical operation to support elasticity
Different from
◦ Migration between software versions
◦ Migration in case of schema evolution

VM migration [Clark et al., NSDI 2005]
One tenant per VM
◦ Pros: allows fine-grained load balancing
◦ Cons: performance overhead; poor consolidation ratio [Curino et al., CIDR 2011]
Multiple tenants in a VM
◦ Pros: good performance
◦ Cons: must migrate all tenants; coarse-grained load balancing

Multiple tenants share the same database process
◦ Shared process multitenancy
◦ Example systems: SQL Azure, ElasTraS, RelationalCloud, and many more
Migrate individual tenants
VM migration cannot be used for fine-grained migration
Target architecture: shared nothing
◦ Shared storage architectures: see our VLDB 2011 paper

How to ensure no downtime?
Need to migrate the persistent database image (tens of MBs to GBs)
How to guarantee correctness during failures?
Nodes can fail during migration
How to ensure transaction atomicity and durability?
How to recover migration state after failure?
Nodes recover after a failure
How to guarantee serializability?
Transaction correctness equivalent to normal operation
How to minimize migration cost?
…

Downtime
◦ Time tenant is unavailable
Service interruption
◦ Number of operations failing/transactions aborting
Migration overhead/performance impact
◦ During normal operation, migration, and after migration
Additional data transferred
◦ Data transferred in addition to DB’s persistent image

Migration executed in phases
Starts with transfer of minimal information to destination (“wireframe”)
Source and destination concurrently execute transactions in one migration phase
Database pages used as granule of migration
Pages “pulled” by destination on-demand
Minimal transaction synchronization
A page is uniquely owned by either source or destination
Leverage page-level locking
Logging and handshaking protocols to tolerate failures

For this talk
◦ Small tenants, i.e. not sharded across nodes.
◦ No replication
◦ No structural changes to indices
Extensions in the paper
◦ Relaxes these assumptions

[Figure: source and destination nodes; source owns pages P1, P2, P3, …, Pn and has active transactions TS1, …, TSk. Legend: page owned by node, page not owned by node]

Freeze index wireframe and migrate
[Figure: source owns pages P1, …, Pn with active transactions TS1, …, TSk; destination holds P1, …, Pn as un-owned pages]

Requests for un-owned pages can block
[Figure: P3 accessed by TDi at destination, P3 pulled from source; old, still active transactions TSk+1, …, TSl at source; new transactions TD1, …, TDm at destination]
Index wireframes remain frozen

Pages can be pulled by the destination, if needed
[Figure labels: "P1, P2, … pushed from source"; "Completed"; transactions TDm+1, …, TDn at destination]

Index wireframe un-frozen
[Figure: pages P1, …, Pn with transactions TDn+1, …, TDp at destination]

Once migrated, pages are never pulled back by source
◦ Transactions at source accessing migrated pages are aborted
No structural changes to indices during migration
◦ Transactions (at both nodes) that make structural changes to indices abort
Destination “pulls” pages on-demand
◦ Transactions at the destination experience higher latency compared to normal operation

Only concern is “dual mode”
◦ Init and Finish: only one node is executing transactions
Local predicate locking of internal index and exclusive page-level locking between nodes ⇒ no phantoms
Strict 2PL ⇒ transactions are locally serializable
Pages transferred only once
◦ No Tdest → Tsource conflict dependency
Guaranteed serializability

Transaction recovery
◦ For every database page, transactions at source are ordered before transactions at destination
◦ After failure, conflicting transactions are replayed in the same order
Migration recovery
◦ Atomic transitions between migration modes: logging and handshake protocols
◦ Every page has exactly one owner: bookkeeping at the index level

In the presence of arbitrary repeated failures, Zephyr ensures:
◦ Updates made to database pages are consistent
◦ A failure does not leave a page without an owner
◦ Both source and destination are in the same migration mode
Guaranteed termination and starvation freedom

Replicated tenants
Sharded tenants
Allow structural changes to the indices
◦ Using shared lock managers in the dual mode

Prototyped using an open source OLTP database, H2
◦ Supports standard SQL/JDBC API
◦ Serializable isolation level
◦ Tree indices
◦ Relational data model
Modified the database engine
◦ Added support for freezing indices
◦ Page migration status maintained using index
◦ Details in the paper…
Tungsten SQL Router migrates JDBC connections during migration

Two database nodes, each with a DB instance running
Synthetic benchmark as load generator
◦ Modified YCSB to add transactions
Small read/write transactions
Compared against Stop and Copy (S&C)

[Figure: system controller, metadata, migrate]
Default transaction parameters: 10 operations per transaction; 80% reads, 15% updates, 5% inserts
Workload: 60 sessions, 100 transactions per session
Hardware: 2.4 GHz Intel Core 2 Quads, 8 GB RAM, 7200 RPM SATA HDs with 32 MB cache, Gigabit Ethernet
Default DB size: 100k rows (~250 MB)

Downtime (tenant unavailability)
◦ S&C: 3 – 8 seconds (needed to migrate, unavailable for updates)
◦ Zephyr: No downtime. Either source or destination is available
Service interruption (failed operations)
◦ S&C: ~100s – 1,000s.
All transactions with updates are aborted
◦ Zephyr: ~10s – 100s. Orders of magnitude less interruption

Average increase in transaction latency (compared to the 6,000 transaction workload without migration)
◦ S&C: 10 – 15%. Cold cache at destination
◦ Zephyr: 10 – 20%. Pages fetched on-demand
Data transfer
◦ S&C: Persistent database image
◦ Zephyr: 2 – 3% additional data transfer (messaging overhead)
Total time taken to migrate
◦ S&C: 3 – 8 seconds. Unavailable for any writes
◦ Zephyr: 10 – 18 seconds. No unavailability

Orders of magnitude fewer failed operations

Proposed Zephyr, a live database migration technique with no downtime for shared nothing architectures
◦ The first end-to-end solution with safety, correctness and liveness guarantees
Prototype implementation on a relational OLTP database
Low cost on a variety of workloads

[Figures: transactions (Txns), source, destination]

Either source or destination is serving the tenant
◦ No downtime
Serializable transaction execution
◦ Unique page ownership
◦ Local multi-granularity locking
Safety in the presence of failures
◦ Transactions are atomic and durable
◦ Migration state is recovered from log
Ensure consistency of the database state

Wireframe copy
Typically orders of magnitude smaller than data
Operational overhead during migration
Extra data (in addition to database pages) transferred
Transactions aborted during migration

Failures due to attempted modification of index structure

Only committed transactions reported
Loss of cache for both migration types
Zephyr results in a remote fetch
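
The page hand-off rules stated in the slides (unique page ownership, pages pulled on demand by the destination, pages never pulled back by the source, remaining pages pushed at the end) can be illustrated with a small Python sketch. This is a toy in-memory model under stated assumptions, not the H2-based prototype: the names `Node`, `ToyMigration`, `TransactionAborted`, `access_at_source`, `access_at_destination`, `finish`, and `owner` are all hypothetical, the wireframe is reduced to a set of page ids, and locking, logging, and failure handling are omitted.

```python
class TransactionAborted(Exception):
    """A source transaction touched a page that has already migrated."""


class Node:
    """A database node holding only the pages it currently owns (hypothetical)."""

    def __init__(self, name, pages=None):
        self.name = name
        self.owned = dict(pages or {})  # page id -> page contents


class ToyMigration:
    """Toy model of the page hand-off; an illustration, not the real protocol."""

    def __init__(self, source, destination):
        self.source = source
        self.destination = destination
        # Init: only minimal information is copied; here the "wireframe"
        # is simplified to the set of page ids, with no page contents.
        self.wireframe = frozenset(source.owned)
        self.remote_fetches = 0

    def access_at_source(self, page_id):
        # Pages are never pulled back, so a source transaction that needs
        # a migrated page aborts.
        if page_id not in self.source.owned:
            raise TransactionAborted(f"{page_id} already migrated")
        return self.source.owned[page_id]

    def access_at_destination(self, page_id):
        if page_id not in self.wireframe:
            raise KeyError(page_id)
        if page_id not in self.destination.owned:
            # Pull on demand: ownership moves with the page, exactly once.
            self.destination.owned[page_id] = self.source.owned.pop(page_id)
            self.remote_fetches += 1
        return self.destination.owned[page_id]

    def finish(self):
        # Finish: the source pushes every page it still owns.
        for page_id in list(self.source.owned):
            self.destination.owned[page_id] = self.source.owned.pop(page_id)

    def owner(self, page_id):
        in_source = page_id in self.source.owned
        in_destination = page_id in self.destination.owned
        assert in_source != in_destination, "page must have exactly one owner"
        return self.source.name if in_source else self.destination.name


if __name__ == "__main__":
    src = Node("source", {"P1": "a", "P2": "b", "P3": "c"})
    dst = Node("destination")
    migration = ToyMigration(src, dst)
    migration.access_at_destination("P3")  # first access is a remote fetch
    print(migration.owner("P3"), migration.remote_fetches)
    migration.finish()
    print(sorted(dst.owned))
```

In this sketch the first destination access to a page is the only one that counts as a remote fetch, which mirrors the latency note in the slides: destination transactions pay for the pull once, and later accesses are local.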