I/O Streaming Evaluation of Batch Queries for Data-Intensive Computational Turbulence
Kalin Kanov, Eric Perlman, Randal Burns, Yanif Ahmad, and Alexander Szalay
Johns Hopkins University

I/O Streaming for Batch Queries
* Based on partial sums
* Allows access to the underlying data in any order and in parts
* Data are streamed from disk in a single pass
* Eliminates redundant I/O
* Over an order of magnitude improvement in performance over direct evaluation of queries

Introduction
* Data-intensive computing breakthroughs have allowed for new ways of interacting with scientific numerical simulations
* Formerly, analysis was performed during the computation, and no data were stored for subsequent examination

Turbulence Database Cluster
* Stores the entire space-time evolution of the simulation
* Two datasets totaling 70 TB; part of the 1.1 PB GrayWulf cluster
* Provides public access to a world-class simulation
* Implements the "immersive turbulence"* approach

*E. Perlman, R. Burns, Y. Li, and C. Meneveau. Data exploration of turbulence simulations using a database cluster. In Supercomputing, 2007.
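The partial-sums idea above can be sketched in a few lines: each query's answer is a weighted sum over data points, so it can be accumulated incrementally while the data stream by in any order. A minimal illustration (the class, keys, and coefficient values are hypothetical, not the paper's implementation):

```python
class StreamingQuery:
    """Accumulates a query result as a partial sum over streamed data points."""

    def __init__(self, weights):
        # weights: dict mapping data-point key -> interpolation coefficient
        self.weights = weights
        self.partial_sum = 0.0
        self.remaining = set(weights)

    def consume(self, key, value):
        """Fold one data point into the partial sum if this query needs it."""
        if key in self.remaining:
            self.partial_sum += self.weights[key] * value
            self.remaining.remove(key)

    @property
    def done(self):
        return not self.remaining

# Data streamed from disk in a single pass, in arbitrary order:
data = {0: 1.0, 1: 2.0, 2: 4.0, 3: 8.0}
q = StreamingQuery({1: 0.5, 2: 0.25})  # hypothetical coefficients
for key, value in data.items():
    q.consume(key, value)

print(q.partial_sum)  # 0.5*2.0 + 0.25*4.0 = 2.0
```

Because each `consume` call is independent of arrival order, the same pass over the disk can serve many queries concurrently, which is what eliminates the redundant I/O of direct evaluation.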
Turbulence Database Cluster: Motivation
* Without I/O streaming:
  - Heavy DB usage slows down the service by a factor of 10 to 20
  - Query evaluation techniques adapted from simulation code do not access data coherently
  - Substantial storage overhead (~42%) is incurred to localize each computation
* Turbulence queries:
  - 95% of queries perform Lagrange polynomial interpolation
  - Can be evaluated in parts

Processing a Batch Query
[Figure: data atoms laid out along a z-order curve; direct evaluation of queries q1-q3 requires multiple disk seeks and incurs redundant I/O.]

Streaming Evaluation Method
* Linear data requirements of the computation allow for:
  - Incremental evaluation
  - Streaming over the data
  - Concurrent evaluation of batch queries

[Figure: with I/O streaming, the atoms needed by q1-q3 are read with sequential I/O in a single pass.]

Lagrange Polynomial Interpolation

    f(x', y') = \sum_{j=1}^{N} l_{y,j}^{N}(y') \sum_{i=1}^{N} l_{x,i}^{N}(x') \, f(x_{n-N/2+i}, y_{p-N/2+j})

where the l terms are the Lagrange coefficients and the f(x_{n-N/2+i}, y_{p-N/2+j}) terms are the data.

Processing a Batch Query
* Input queries are pre-processed into a key-value dictionary
  - Keys are the z-index values of data atoms stored in the DB
  - Entries are lists of queries
* A temp table is created out of the dictionary keys
* A join is executed between the temp table and the data table
* When a data atom is read in, all queries that need data from it are processed and their partial sums are updated

Experimental Evaluation
* Random workloads:
  - across the entire cube space
  - a 128³ subset of the entire space
* Workload from the usage log of the Turbulence cluster
* Compare with direct methods of evaluation:
  - Direct
  - Sorting
  - Join/Order By

3D Workload
* Used for generating global statistics

128³ Workload
* Used for:
  - Examining regions of interest (ROI)
  - Creating visualizations
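The Lagrange coefficients in the interpolation formula above are the standard Lagrange basis polynomials evaluated at the query point. A minimal 1-D sketch (function names and sample values are illustrative, not from the paper):

```python
def lagrange_coeffs(xs, x):
    """Lagrange basis coefficients l_i(x) over sample locations xs."""
    coeffs = []
    for i, xi in enumerate(xs):
        c = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                c *= (x - xj) / (xi - xj)
        coeffs.append(c)
    return coeffs

def interpolate(xs, fs, x):
    """f(x) ~ sum_i l_i(x) * f(x_i): the 1-D analogue of the slide's formula."""
    return sum(c * f for c, f in zip(lagrange_coeffs(xs, x), fs))

# Interpolating f(x) = x^2 from three samples reproduces it exactly,
# since a degree-2 Lagrange interpolant is exact for quadratics.
xs = [0.0, 1.0, 2.0]
fs = [x * x for x in xs]
print(interpolate(xs, fs, 1.5))  # 2.25
```

Because the coefficients depend only on the query location, they can be computed once per query and then applied to data values as they stream by, which is what makes the evaluation decomposable into partial sums.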
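The batch-processing steps above (z-index dictionary, temp-table join, single sequential pass with partial-sum updates) can be modeled in miniature as follows. This is a simplified sketch with hypothetical data structures, not the paper's SQL Server implementation; the sorted iteration over keys stands in for the join between the temp table and the data table:

```python
from collections import defaultdict

def preprocess(queries):
    """Build the key-value dictionary: z-index key -> list of query ids."""
    needed = defaultdict(list)
    for qid, (keys, _) in queries.items():
        for k in keys:
            needed[k].append(qid)
    return needed

def stream_evaluate(queries, needed, table):
    """One sequential pass over the joined data atoms; the partial sum of
    every query interested in an atom is updated as the atom streams by."""
    partial = defaultdict(float)
    for key in sorted(needed):      # join of temp table keys with data table
        value = table[key]          # each atom is read exactly once
        for qid in needed[key]:
            keys, coeffs = queries[qid]
            partial[qid] += coeffs[keys.index(key)] * value
    return dict(partial)

# Toy data: atoms 0..7 with values; two queries with hypothetical coefficients.
table = {k: float(k) for k in range(8)}
queries = {
    "q1": ([1, 2], [0.5, 0.5]),   # needs atoms 1 and 2
    "q2": ([2, 5], [1.0, 1.0]),   # shares atom 2 with q1
}
needed = preprocess(queries)
print(stream_evaluate(queries, needed, table))
```

Note that atom 2 is fetched once yet serves both q1 and q2: the data sharing across the batch is exactly what the dictionary of per-atom query lists exposes.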
Experimental Setup
* Experimental version of the MHD database
* ~300 timesteps of the velocity fields of the MHD simulation
* Two 2.33 GHz dual quad-core Windows 2003 servers with SQL Server 2008 and 8 GB of memory
* Part of the 1.1 PB GrayWulf cluster with an aggregate low-level throughput of 70 GB/sec
* Data tables striped across 7 disks per node

3D Workload
[Figure: runtime comparison of I/O Streaming, Join/Order By, and Sorting.]
* Evaluating the entire batch as a join leads to over an order of magnitude improvement
* I/O Streaming executes more sequential access
* Each atom is read only once
* Effective cache usage

128³ Workload
* Less I/O
* More data sharing
* I/O Streaming alleviates the I/O bottleneck
* Computation emerges as the more costly operation

Future Work
* Extend the I/O streaming technique to other decomposable kernel computations:
  - Differentiation
  - Temporal interpolation
  - Filtering
* Multi-job batch scheduling:
  - Integrate into a batch scheduling framework such as JAWS*

*X. Wang, E. Perlman, R. Burns, T. Malik, T. Budavari, C. Meneveau, and A. Szalay. JAWS: Job-aware workload scheduling for the exploration of turbulence simulations. In Supercomputing, 2010.

Summary
* I/O Streaming method for data-intensive batch queries
* Single pass by means of partial sums
* Effective exploitation of data sharing
* Improved cache locality
* Over an order of magnitude improvement in performance

Questions
Images courtesy of Kai Buerger ([email protected])