Replex: A Multi-Index, Highly Available Data Store
Amy Tai*§, Michael Wei*, Michael J. Freedman§, Ittai Abraham*, Dahlia Malkhi*
*VMware Research, §Princeton University

SQL vs NoSQL
• SQL: rich querying model and transaction support, but poor performance
• NoSQL key-value stores (ex: HyperDex, Cassandra): scalable and high performance, but simple key-based querying and no or limited transaction support
• NewSQL (ex: CockroachDB, Yesquel, H-Store): rich querying model, transaction support, and scalable

NoSQL Scales with Database Partitioning
[Figure: a table with columns col0, col1, col2, ..., colc and rows r = (r[0], r[1], r[2], ..., r[c]); col0 is the default partitioning key, and the table is split into partitions with respect to that index.]
How do we enable richer queries without sacrificing shared-nothing scale-out?

Key Observations
1. Indexing enables richer queries

Local Indexing on the Partitioning Index
• select * where col0="foo"
• Index updates are local
• Index lookups are local

Local Indexing
• Index stored locally at each partition: on insert (r[0], r[1], r[2], ..., r[c]), each partition builds a local index
• Index updates are local
• But for select * where col2="foo", index lookups must be broadcast to all partitions
• Potential synchronization across partitions

Global Indexing
• A distributed data structure spans all partitions
• insert (r[0], r[1], r[2], ..., r[c])
• Index updates are multi-hop
• Index lookups are executed on a single data structure

Key Observations
1. Indexing enables richer queries
2. Indexes are more efficient if data is partitioned according to that index

Partitioning by Index: Storage Overheads
• Partitioning by an index must store the data again, once per index (ex: HyperDex)
[Figure: the table, with col0 as the default partitioning key, duplicated for each additional index.]

Key Observations
1. Indexing enables richer queries
2. Indexes are more efficient if data is partitioned according to that index
3. Rethink the replication paradigm

Replex
• New replication unit: the replex, a data replica partitioned and sorted with respect to an associated index
• Replace data replicas with replexes
• Each replex serves both data replication and indexing

System Architecture
[Figure: the table with columns col0, col1, col2, ..., colc; each index is backed by its own replex.]

Inserts in Replex
• insert (r[0], r[1], r[2], ..., r[c])
• In each replex, the destination partition is determined by that replex's index (see the sketch below)
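To make the routing concrete, here is a minimal Python sketch, not the authors' code: names like Replex, route_insert, and NUM_PARTITIONS are illustrative, and the hash choice is an assumption. It shows how one insert fans out to exactly one partition per replex, each chosen by hashing the column that replex indexes.

from hashlib import sha256

NUM_PARTITIONS = 12  # assumed; chosen to match the 12 server machines in the evaluation

class Replex:
    """One replex: a full copy of the data, partitioned by one index column."""
    def __init__(self, index_col: int):
        self.index_col = index_col

    def partition_for(self, row: tuple) -> int:
        # Hash the column this replex indexes to pick the destination partition.
        key = str(row[self.index_col]).encode()
        return int(sha256(key).hexdigest(), 16) % NUM_PARTITIONS

def route_insert(row: tuple, replexes: list) -> list:
    # One insert is chained through exactly one partition per replex;
    # partitions in different replexes that share rows form a data chain.
    return [(i, rx.partition_for(row)) for i, rx in enumerate(replexes)]

# Two replexes: one over col0 (the default partitioning key), one over col2.
replexes = [Replex(index_col=0), Replex(index_col=2)]
print(route_insert(("foo", 7, "bar", 3.14), replexes))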
Data Chains in Replex
[Figure: rows (x[0], x[1], x[2], ..., x[c]), (y[0], y[1], y[2], ..., y[c]), and (z[0], z[1], z[2], ..., z[c]) routed to partitions across the replexes.]
• Partitions in different replexes make up a data chain if they share rows
• All pairs of partitions are potential data chains

Main Challenge in Replex
• Increased failure complexity

Partition Failures
1. Partition recovery
2. Client requests
• The index is unavailable, but the data is available. Where is the data?

Finding Data After Failure
• Partition recovery vs. client requests: is there a tradeoff?
• Recovery is parallel, because every partition of another replex holds some of the lost rows, which shortens recovery time
• But requests must be multicast, and lookups reduce to a linear scan at each partition, which drives up request amplification

Hybrid Replex: Key Contributions
1. Recovery time vs. request amplification tradeoff
2. Improved failure availability of multiple replexes
3. Storage vs. recovery performance tradeoff
4. Graceful failure degradation

Hybrid Replex: What Is It?
• Shared across r replexes
• Not associated with an index
• Partitioning function depends on these r replexes

Data Chains with Hybrid Replex
• insert (r[0], r[1], r[2], ..., r[c])
[Figure: the insert's data chain now also passes through a partition of the hybrid replex.]

1. Recovery Time vs. Request Amplification
• Recovery is less parallel, but request amplification is lower

2. Improved Failure Availability of Multiple Replexes
• Improving failure availability without hybrid replexes costs 2x storage

3. Storage vs. Recovery Performance
• With a hybrid replex: 1.5x storage, but worse client request performance

Generalizing Hybrid Replexes
• k hybrid replexes may be shared across r replexes
• For the special case r = 2, request amplification is O(n^(1/(k+1)))
• For the special case k = 1, request amplification is O(n^((r-1)/r))
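To make the r = 2, k = 1 case concrete (where both formulas give O(n^(1/2))), here is a minimal sketch with assumptions of mine: the grid layout, hash choice, and names like hybrid_partition are illustrative, and the partition count N is taken to be a perfect square. The hybrid's partitioning function depends on both indexes, and confining each index's hash to one grid dimension is what bounds request amplification after a failure.

from hashlib import sha256
from math import isqrt

N = 16            # hybrid replex partition count; assumed to be a perfect square
SIDE = isqrt(N)   # the N partitions form a SIDE x SIDE grid

def h(value) -> int:
    return int(sha256(str(value).encode()).hexdigest(), 16)

def hybrid_partition(key_a, key_b) -> int:
    # The grid row depends only on index A's key; the column only on index B's.
    return (h(key_a) % SIDE) * SIDE + (h(key_b) % SIDE)

def partitions_after_A_failure(key_a) -> list:
    # If key_a's partition in replex A fails, its rows can only live in the
    # one grid row fixed by h(key_a): sqrt(N) partitions instead of all N.
    row = h(key_a) % SIDE
    return [row * SIDE + col for col in range(SIDE)]

print(hybrid_partition("foo", "bar"))      # one grid cell out of 16
print(partitions_after_A_failure("foo"))   # 4 candidate partitions, not 16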
Implementation
• Built on top of HyperDex, ~700 LOC
• Implemented partition recovery and request re-routing on failure
• Implemented a variety of hybrid replex configurations

Evaluation
1. What is the impact of replexes on steady-state performance?
2. How do hybrid replexes affect failure performance?

Evaluation Setup
• Table with two indexes
• 12 server machines, 4 client machines
• All machines colocated in the same rack, connected via a 1 Gb top-of-rack switch
• 8 CPUs, 16 GB memory per machine

Systems Under Evaluation (a cost-model sketch reproducing these numbers appears at the end of the deck)

            Storage overhead   Network hops per insert   Failures tolerated
Replex-2    2x                 2                         1
Replex-3    3x                 3                         2
HyperDex    6x                 6                         2

Steady-State Latency
[Figure: CDFs of read latency (0-100 us) and insert latency (0-300 us) for Replex-3 and HyperDex; Replex-3 inserts take 3 network hops vs. HyperDex's 6.]
• Reads against either index have similar latency; we report reads against the primary index
• Replex-2 is not included because it has a lower fault tolerance threshold

Single Failure Performance
• Experiment: load with 10M 100-byte rows; split reads 50:50 between the two indexes
[Figure: throughput (ops/s, up to 150K) over test time (0-150 s) for Replex-2, Replex-3, and HyperDex, with a single failure injected.]

Single Failure: Recovery Time
1. HyperDex recovers slowest because it must re-replicate 2-3x more data
2. Replex-2 recovers fastest because it has the least data and recovers in parallel

Single Failure: Failure Throughput
1. Replex-2 has low throughput because of high request amplification
2. Replex-3 has throughput comparable to HyperDex; its hybrid replex increases throughput during the failure

Two Failure Performance
• Experiment: load with 1M 1 KB rows; split reads 50:50 between the two indexes
• Replex-2 is not included because it only tolerates 1 failure
[Figure: throughput (ops/s, up to 300K) over test time (0-150 s) for Replex-3 and HyperDex, with two failures injected.]
1. Recovery time: Replex-3 recovers nearly 3x faster, due to less data
2. Failure throughput: HyperDex has half the throughput because nearly every partition participates in recovery

Summary
1. Replacing replicas with replexes decreases index storage AND steady-state overheads
2. Hybrid replexes enable a rich tradeoff space during failures

Questions?
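As a closing aside, here is a back-of-the-envelope sketch (my reconstruction, not from the talk) that reproduces the "Systems Under Evaluation" numbers, assuming HyperDex fully replicates each of the table's two indexes three ways, while Replex keeps exactly one copy of the data per replex and chains each insert through all of them.

def hyperdex_costs(num_indexes: int, replication: int) -> dict:
    copies = num_indexes * replication          # every index is fully replicated
    return {
        "storage_overhead": copies,             # 2 indexes x 3 replicas = 6x
        "hops_per_insert": copies,              # chaining visits each copy
        "failures_tolerated": replication - 1,  # each index survives r-1 failures
    }

def replex_costs(num_replexes: int) -> dict:
    return {
        "storage_overhead": num_replexes,       # one copy of the data per replex
        "hops_per_insert": num_replexes,        # chain has one partition per replex
        "failures_tolerated": num_replexes - 1, # every row lives in every replex
    }

print("HyperDex:", hyperdex_costs(num_indexes=2, replication=3))  # 6x, 6 hops, 2 failures
print("Replex-3:", replex_costs(3))                               # 3x, 3 hops, 2 failures
print("Replex-2:", replex_costs(2))                               # 2x, 2 hops, 1 failure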