Download Horticulture

CSCI5570 Large Scale Data Processing Systems
NewSQL
James Cheng, CSE, CUHK
Slide Ack.: modified based on the slides from Hefu Chai

Skew-Aware Automatic Database Partitioning in Shared-Nothing, Parallel OLTP Systems
Andrew Pavlo, Carlo Curino, Stanley Zdonik
SIGMOD 2012

Main Memory • Parallel • Shared-Nothing Transaction Processing
H-Store: A High-Performance, Distributed Main Memory Transaction Processing System. Proc. VLDB Endow., vol. 1, iss. 2, pp. 1496-1499, 2008.

[Figure: a client application sends a procedure name and input parameters to the database cluster, which executes the transaction.]

[Figure: the database cluster returns the transaction result to the client application.]

OLTP Transactions
• Fast: short-lived (i.e., no user stalls)
• Repetitive: typically executed as pre-defined transaction templates or stored procedures
• Small: touch a small subset of data using an index (i.e., no full table scans or large distributed joins)

We need an approach that supports…
• Stored procedures
• Load balancing in the presence of time-varying skew
• Complex schemas
• Deployments with a larger number of partitions

Optimal Database Design
• Scalability of NewSQL depends on the existence of an optimal database design, which defines
– how an application’s data and workload are partitioned or replicated across nodes
– how queries and transactions are routed to nodes
• The design determines two crucial factors:
– the number of transactions accessing multiple nodes
– the skewness of the load across the cluster
A growing fraction of distributed transactions and load skew can make performance 10x worse (see following slides)

Automatic Database Design Tool for Parallel Systems
Skew-Aware Automatic Database Partitioning in Shared-Nothing, Parallel OLTP Systems, SIGMOD 2012

What are the key issues?
• Two key issues when generating a good database design for enterprise OLTP applications
– Distributed transactions
  • network overhead from employing two-phase commit or a similar distributed consensus protocol to ensure atomicity and serializability
– Temporal workload skew
  • a node with skewed load becomes saturated, while other nodes are idle and clients are blocked waiting for results

[Figure: Impact of distributed transactions on throughput]

Temporal workload skew
• Think about the example of Wikipedia
– Even though the average load of the cluster over the entire day is uniform, the load across the cluster at any point in time is unbalanced (due to differences in the languages of the wiki content and time differences)
– Static skew vs. temporal skew

[Figure: Impact of temporal workload skew on throughput]

What are the key issues?
• A complex tradeoff: distributed transactions vs. temporal workload skew
– put the database on a single node and execute all transactions there
  • no distributed transactions
  • extreme load skew
– execute all transactions as distributed transactions that access data at every partition
  • every transaction is distributed
  • no load skew

Horticulture’s Goal
• Analyze
– a database schema
– the structure of the application’s stored procedures
– a sample transaction workload
• Generate a partitioning that
– minimizes distribution overhead
– balances access skew

Two Main Technical Contributions
• Large-Neighborhood Search: automatic database partitioning
• Skew-Aware Cost Model: coordination cost and load distribution estimation
Three Unique Features
• Maintain the tradeoff between distributed transactions and temporal skew
• Extend the design space to include replicated secondary indexes
• Organically handle stored procedure routing

What are the design options?
• For each table:
– Horizontal partition
– Replicate on all partitions
– Replicate a secondary index for a subset of its columns
– Effectively route incoming transaction requests

[Figure: Horizontal Partitioning]

[Figure: Table Replication. For read-only or read-mostly tables]

[Figure: Secondary Index. For read-only or read-mostly columns]

[Figure: Stored Procedure Routing]

What are the key technical contributions?
• Large-Neighborhood Search
• Skew-Aware Cost Model

Large-Neighborhood Search
Inputs:
1. Database schema
2. Stored procedures info
3. Sample workload
Steps:
1. Analyze a sample workload to pre-compute info used to guide the search process.
2. Generate an initial “best” design Dbest based on the most frequently accessed columns.
3. Create a new incomplete design Drelax by relaxing (i.e., resetting) a subset of Dbest.
4. Perform a local search for a new design using Drelax as the starting point. Replace Dbest with a new design that has a lower cost. Restart Step 3 if k searches do not improve Dbest or no design remains in Drelax’s neighborhood.
5. After running for a limited time, stop and return Dbest.
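The search loop outlined in these steps can be sketched roughly as follows. This is a minimal illustration, not Horticulture's implementation: it only picks one partitioning column per table, ignores replication, secondary indexes and procedure routing, replaces the time limit with a fixed iteration count, and starts from the first candidate column instead of the most frequently accessed one. The names `lns`, `local_search`, `toy_cost`, `CANDIDATES` and `PREFERRED` are hypothetical, and the toy cost function merely stands in for a cost model that is monotone on incomplete designs.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# A design maps each table to its partitioning column (None = not yet set).
Design = Dict[str, Optional[str]]


def local_search(design: Design, relaxed: List[str],
                 candidates: Dict[str, List[str]],
                 cost: Callable[[Design], float],
                 best_cost: float) -> Optional[Tuple[Design, float]]:
    """Branch-and-bound over the relaxed tables (Step 4).

    Relies on cost() never decreasing as more tables are assigned, so a
    partial design that already costs >= the bound can be pruned.
    """
    best: Optional[Design] = None

    def descend(i: int, bound: float) -> float:
        nonlocal best
        c = cost(design)
        if c >= bound:            # prune this subtree
            return bound
        if i == len(relaxed):     # complete design that beats the bound
            best = dict(design)
            return c
        table = relaxed[i]
        for col in candidates[table]:
            design[table] = col
            bound = descend(i + 1, bound)
        design[table] = None      # undo before returning to the parent
        return bound

    final = descend(0, best_cost)
    return (best, final) if best is not None else None


def lns(tables: List[str], candidates: Dict[str, List[str]],
        cost: Callable[[Design], float],
        iterations: int = 20, seed: int = 0) -> Tuple[Design, float]:
    rng = random.Random(seed)
    # Step 2 (simplified): start from the first candidate of every table.
    best: Design = {t: candidates[t][0] for t in tables}
    best_cost = cost(best)
    for _ in range(iterations):   # Step 5: fixed budget instead of a timer
        # Step 3: relax a random subset of tables in a copy of the best design.
        relaxed = rng.sample(tables, rng.randint(1, len(tables)))
        d = dict(best)
        for t in relaxed:
            d[t] = None
        # Step 4: keep the result only if it is strictly cheaper.
        found = local_search(d, relaxed, candidates, cost, best_cost)
        if found is not None:
            best, best_cost = found
    return best, best_cost


# Toy stand-in for the cost model: one unit per table whose chosen column
# differs from a made-up "preferred" column; unset tables cost nothing.
CANDIDATES = {"A": ["x", "y"], "B": ["x", "y"], "C": ["x", "y"]}
PREFERRED = {"A": "y", "B": "x", "C": "y"}


def toy_cost(d: Design) -> float:
    return sum(1 for t, c in d.items() if c is not None and c != PREFERRED[t])
```

Calling `lns(["A", "B", "C"], CANDIDATES, toy_cost)` returns the best design found and its cost. Because the result of a local search is accepted only when it is strictly cheaper, the cost of the returned design never exceeds that of the initial design.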
Large-Neighborhood Search: Initial Design
1. Select the most frequently accessed column in each table as the horizontal partitioning attribute
2. Greedily replicate read-only tables until no space is left
3. Select the next most frequently accessed, read-only column as the secondary index attribute for each table
4. Select the routing parameter for stored procedures based on how often the parameters are referenced in Q (Q: queries that access the columns selected in Step 1)

Large-Neighborhood Search: Relaxation
• The process of selecting random tables in the database and resetting their chosen partitioning attributes in Dbest
• Allows LNS to escape a local minimum and jump to a new neighborhood of potential solutions
• Horticulture:
– decides the number of tables to relax
– randomly chooses which tables to relax (routing parameters of stored procedures referencing a relaxed table are also reset)
– generates the candidate attributes for the relaxed tables and procedures

Large-Neighborhood Search: Local Search
• Explore the tree using branch-and-bound search, replacing the table’s design option in Drelax with that of the tree node.
• Estimate the cost; if it is lower than that of Dbest, go down the tree.
• For each procedure, choose the routing parameter with the lowest cost before moving down the tree.
[Figure: local search, Phase 1 and Phase 2]

What are the key technical contributions?
• Large-Neighborhood Search
• Skew-Aware Cost Model

Skew-Aware Cost Model
• LNS relies on a cost model to estimate the cost of executing the sample workload using a given design
• The cost model must
– accentuate the properties that are important in a DB
– be computed quickly
– estimate the cost of an incomplete design
– return a monotonically increasing cost as more variables are set when searching down the tree

Skew-Aware Cost Model
Distributed Transactions + Workload Skew Factor
• Measure
– how much of the workload executes as distributed transactions
– how uniformly load is distributed across the cluster

$$cost(D, W) = \frac{\alpha \times CoordinationCost(D, W) + \beta \times SkewFactor(D, W)}{\alpha + \beta}$$

Tradeoff! The weights α and β balance the two terms.

Skew-Aware Cost Model: Coordination Cost

$$CoordinationCost = \frac{partitionCount}{txnCount \times numPartitions} \times \left(1.0 + \frac{dtxnCount}{txnCount}\right)$$

The total number of partitions accessed divided by the total number of partitions that could have been accessed, scaled based on the ratio of distributed transactions to all transactions (dtxnCount / txnCount)

Skew-Aware Cost Model: Skew Factor

$$SkewFactor = \frac{\sum_{i=0}^{numIntervals} skew[i] \times txnCounts[i]}{\sum txnCounts}$$

• To account for time-varying skew, divide W into finite intervals
• Estimate the skew factor, skew[i], of each interval i
• The final skew factor is the mean of the per-interval skew factors, weighted by the number of transactions executed in each interval

Incomplete Designs
• A query that references a table with an unset attribute in a design is labeled as unknown
• For each unknown query
– Coordination cost: assume that any unknown query is single-partitioned
– Skew factor: assume that unknown queries execute on all partitions in the cluster
• ‘Unknown’ can change to ‘known’
• ‘Known’ cannot change to ‘unknown’
Both assumptions are the best case for an unknown query, so the estimated cost is monotonically increasing!
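To make the arithmetic of the cost model concrete, here is a small sketch of the three formulas above. It assumes each transaction is summarized by the set of partitions it touches; the function names are hypothetical. The slides do not say how the per-interval skew[i] is computed, so `interval_skew` below is only a stand-in (0 for perfectly uniform load, 1 when a single partition receives everything) and should not be read as the measure used by Horticulture.

```python
from typing import Sequence, Set


def coordination_cost(txn_partitions: Sequence[Set[int]],
                      num_partitions: int) -> float:
    """Partitions touched / partitions that could have been touched,
    scaled up by the share of distributed (multi-partition) transactions."""
    txn_count = len(txn_partitions)
    dtxn_count = sum(1 for p in txn_partitions if len(p) > 1)
    partition_count = sum(len(p) for p in txn_partitions)
    return (partition_count / (txn_count * num_partitions)) * \
           (1.0 + dtxn_count / txn_count)


def interval_skew(loads: Sequence[float]) -> float:
    """ASSUMED stand-in for skew[i]: how far the busiest partition's share
    of the load is from the uniform share, normalized to [0, 1]."""
    n, total = len(loads), sum(loads)
    if n < 2 or total == 0:
        return 0.0
    uniform = 1.0 / n
    return (max(loads) / total - uniform) / (1.0 - uniform)


def skew_factor(skews: Sequence[float], txn_counts: Sequence[int]) -> float:
    """Mean of the per-interval skew values, weighted by transaction count."""
    return sum(s * c for s, c in zip(skews, txn_counts)) / sum(txn_counts)


def cost(coord: float, skew: float,
         alpha: float = 1.0, beta: float = 1.0) -> float:
    """Weighted combination of the two terms, normalized by alpha + beta."""
    return (alpha * coord + beta * skew) / (alpha + beta)


# Four transactions on a 4-partition cluster; one of them is distributed.
txns = [{0}, {1}, {0, 1}, {2}]
coord = coordination_cost(txns, num_partitions=4)

# Two intervals: a uniform one with 30 transactions, a fully skewed one with 10.
skews = [interval_skew([10, 10, 10, 10]), interval_skew([40, 0, 0, 0])]
skew = skew_factor(skews, [30, 10])

total = cost(coord, skew)
```

In this example the four transactions touch 5 partitions out of a possible 16, and one in four is distributed, so the coordination cost is (5/16) × 1.25. Raising `alpha` relative to `beta` makes the search favor designs with fewer distributed transactions over designs with more even load.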
Optimizations
• Access Graphs
• Workload Compression

Access Graph
Model and store the input sample workload as an access graph:
• Vertex: table
• Edge: tables are co-accessed in a query
• Edge weight: the number of times the queries forming the relationship are executed
LNS uses the access graph to quickly identify important relationships between tables without repeatedly reprocessing the input sample workload

Workload Compression
• Given a larger input sample workload, LNS finds a better database design, but the search is less efficient
• Solution: workload compression
– Combine sets of similar queries in individual transactions into fewer weighted records
– Combine similar transactions into a smaller number of weighted records in the same manner
• The cost model scales its estimates using these weights without having to process each record of the original workload separately

[Figure: Algorithm Comparison]

[Figure: Throughput]

[Figure: Search Times. The best solution found by Horticulture over time (red line: known optimal design, if available)]
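Returning to the two optimizations described above, the following sketch shows how a compressed workload and an access graph might fit together. It is an illustration under simplifying assumptions: a query is reduced to the list of tables it touches, and two queries count as "similar" only when they touch exactly the same set of tables, which is a much cruder criterion than a real tool would use. The function names and the sample table names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import FrozenSet, Iterable, List, Tuple


def compress(queries: Iterable[List[str]]) -> "Counter[FrozenSet[str]]":
    """Workload compression: collapse queries that touch the same set of
    tables into one record whose weight is the number of occurrences."""
    return Counter(frozenset(q) for q in queries)


def access_graph(weighted: "Counter[FrozenSet[str]]") -> "Counter[Tuple[str, str]]":
    """Access graph: vertices are tables, an edge joins two tables that are
    co-accessed in a query, and its weight adds up the record weights."""
    graph: "Counter[Tuple[str, str]]" = Counter()
    for tables, weight in weighted.items():
        for a, b in combinations(sorted(tables), 2):
            graph[(a, b)] += weight
    return graph


# Made-up sample workload: each query is the list of tables it accesses.
workload = [
    ["ORDERS", "CUSTOMER"],
    ["CUSTOMER", "ORDERS"],
    ["ORDERS", "ITEM"],
    ["ITEM"],
]

records = compress(workload)   # 3 weighted records instead of 4 queries
graph = access_graph(records)  # edges keyed by sorted table-name pairs
```

Because the graph is built from the weighted records, each distinct record is processed once, yet the edge weights still reflect how often the underlying queries ran.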
