S-Store: Streaming Meets Transaction Processing
Manasa Vallamkondu

Motivation
• Reduce the latency of results for applications such as monitoring systems by running operations and queries in main memory, avoiding the high latency of disk access.
• S-Store can simultaneously accommodate OLTP and streaming applications.

S-Store
• S-Store is a combination of stream processing and transaction processing.
• S-Store is an extension of H-Store, an open-source, in-memory, distributed OLTP database system.

Glossary
• H-Store - an experimental main-memory, parallel database management system (DBMS) optimized for online transaction processing (OLTP) applications. It is a highly distributed, row-store-based relational database that runs on a cluster of shared-nothing, main-memory executor nodes.
• Stream processing - a computer programming paradigm, closely related to dataflow programming and reactive programming, that allows some applications to more easily exploit a limited form of parallel processing. Such applications can use multiple computational units, such as the FPUs on a GPU or field-programmable gate arrays (FPGAs), without explicitly managing allocation, synchronization, or communication among those units.
• Ad hoc analysis - a business intelligence process designed to answer a single, specific business question. The product of ad hoc analysis is typically a statistical model, analytic report, or other type of data summary.

Application Domain
• Real-Time Data Ingestion - An analytics warehouse must be updated periodically with recent data, and a transaction mechanism is needed for adding new data to the warehouse.
• S-Store is well positioned to satisfy the need of ETL tools to work on streaming data.
• Shared Mutable State - S-Store is useful beyond real-time ETL. Example: Financial Information Exchange data.

Computational Model
Hybrid workloads (a combination of independent OLTP transactions and streaming transactions) are supported with well-defined correctness guarantees:
1. ACID guarantees for individual transactions (both OLTP and streaming)
2. Ordered Execution guarantees for dataflow graphs of streaming transactions
3. Exactly-Once Processing guarantees for streams (i.e., no loss or duplication)

Transaction Execution
• A transaction execution (TE) essentially corresponds to an atomic batch and its subsequent processing by a stored procedure.

H-Store System
• H-Store is an open-source, main-memory OLTP engine in which transactions are predefined as stored procedures.
• Transaction executions (TEs) are instantiated by binding the input parameters of a stored procedure to real values and running it.
• H-Store initiates the transaction in a layer called the Partition Engine (PE), which is responsible for managing transaction distribution, scheduling, coordination, and recovery.
• The PE manages another layer, the Execution Engine (EE), which is responsible for the local execution of SQL queries.
• A client program connects to the PE via a stored procedure execution request. If the stored procedure requires SQL processing, the EE is invoked with these sub-requests.

S-Store Architecture
S-Store adds stream processing, which enables the management of:
• Inputs from streaming clients and dataflow graphs of stored procedures at the PE layer
• Triggers at both the PE and the EE layers
• Stream- and window-based queries at the EE layer
• In-memory stream and window state

Experiments
• A number of micro-experiments were performed to evaluate the optimizations achieved by S-Store over H-Store on transactional stream processing workloads.
• Execution Engine Triggers - In S-Store, the SQL statements of a stored procedure can be activated using EE triggers, so execution takes place entirely inside the EE layer. H-Store, by contrast, must submit the set of SQL statements (an insert and a delete) for each query as a separate execution batch from the PE to the EE.
• Partition Engine Triggers - This experiment compares S-Store's PE triggers to an equivalent implementation in H-Store, which has no such trigger support in its PE.
• Serializing transaction requests severely limits H-Store's performance, whereas S-Store uses a "streaming scheduler" that can activate the next transaction directly within the PE and can prioritize these triggered transactions ahead of the current scheduling queue.

Results
[Figures: Execution Engine triggers; Partition Engine triggers]
• S-Store processes 2,200 batches per second. S-Store is able to handle multiple asynchronous transaction requests from the client and still preserve the tuple processing order.
• In the PE, weak recovery not only achieves better throughput during normal operation but also provides lower recovery time.

Conclusion
• S-Store is a new model of transactions for stream processing that seamlessly combines OLTP transaction processing with a transactional stream processing model.
• S-Store shows how this symbiosis can be implemented in the context of a main-memory OLTP DBMS in a straightforward way. S-Store is shown to outperform H-Store, Esper, and Storm on a streaming workload that requires transactional state access, while at the same time providing stronger correctness guarantees.

Future Work
• Extending S-Store to operate on multiple nodes by addressing a number of research issues, including data and workload partitioning, distributed recovery, distributed transaction scheduling, and handling of dynamic and hybrid (OLTP + streaming) workloads.
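
Appendix: Streaming Scheduler Sketch
The streaming scheduler described under Experiments can be illustrated with a small toy model. This is only a sketch under stated assumptions, not S-Store's actual implementation: the class `StreamingScheduler`, the `dataflow` mapping, and the procedure names `SP1`, `SP2`, `SP3`, and `OLTP` are all hypothetical, and a single partition is assumed. It shows how running triggered transactions ahead of the client queue keeps each atomic batch moving through the dataflow graph in order.

```python
from collections import deque


class StreamingScheduler:
    """Toy single-partition scheduler (hypothetical, for illustration only).

    Transactions triggered by an upstream stored procedure are run
    before any request still waiting in the ordinary client queue.
    """

    def __init__(self, dataflow):
        # dataflow: procedure name -> list of downstream procedure names
        self.dataflow = dataflow
        self.client_queue = deque()  # ordinary client requests, FIFO
        self.triggered = deque()     # PE-triggered requests, served first
        self.log = []                # execution order, for inspection

    def submit(self, proc, batch_id):
        """Enqueue a client request for one atomic batch."""
        self.client_queue.append((proc, batch_id))

    def run(self):
        """Run until both queues are empty and return the execution order."""
        while self.triggered or self.client_queue:
            # Triggered transactions take priority over queued requests.
            queue = self.triggered if self.triggered else self.client_queue
            proc, batch_id = queue.popleft()
            self.log.append((proc, batch_id))  # stand-in for executing the procedure
            # Activate downstream procedures for the same batch.
            for downstream in self.dataflow.get(proc, []):
                self.triggered.append((downstream, batch_id))
        return self.log


scheduler = StreamingScheduler({"SP1": ["SP2"], "SP2": ["SP3"]})
scheduler.submit("SP1", 1)
scheduler.submit("SP1", 2)
scheduler.submit("OLTP", 0)
order = scheduler.run()
print(order)
```

In this toy run, batch 1 passes through SP1, SP2, and SP3 before batch 2 starts, and the independent OLTP request runs afterwards because it was submitted last. A scheduler that instead sent each triggered request to the back of the client queue would interleave the batches.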