S-Store: Streaming Meets
Transaction Processing
H-Store is an experimental database
management system (DBMS) designed for
online transaction processing applications
Manasa Vallamkondu
Motivation
• Reduce result latency for applications such as
monitoring systems by running operations and
queries in main memory, avoiding the high
latency caused by disk access.
• S-Store can simultaneously accommodate
OLTP and streaming applications.
S-Store
• S-Store combines stream processing and
transaction processing, so it can
simultaneously accommodate OLTP and
streaming applications.
• S-Store is an extension of H-Store, an open-source,
in-memory, distributed OLTP database system.
Glossary
• H-Store is an experimental main-memory, parallel database
management system that is optimized for on-line transaction processing
(OLTP) applications. It is a highly distributed, row-store-based relational
database that runs on a cluster of shared-nothing, main-memory executor
nodes.
• Stream processing is a computer programming paradigm, equivalent to
dataflow programming and reactive programming, that allows some applications
to more easily exploit a limited form of parallel processing. Such applications can
use multiple computational units, such as the FPUs on a GPU or field
programmable gate arrays (FPGAs), without explicitly managing allocation,
synchronization, or communication among those units.
• Ad hoc analysis is a business intelligence process designed to answer a
single, specific business question. The product of ad hoc analysis is
typically a statistical model, analytic report, or other type of data
summary.
Application Domain
• Real-Time Data Ingestion - An analytics
warehouse must be updated periodically with
recent data. A transaction mechanism is needed for
adding new data into the warehouse.
• S-Store is well-positioned to satisfy the need of ETL
tools for working on streaming data.
• Shared Mutable State - S-Store is useful
beyond real-time ETL.
Financial Information Exchange Data
Computational Model
Hybrid workloads (combinations of independent OLTP
transactions and streaming transactions) are supported
with well-defined correctness guarantees:
1. ACID guarantees for individual transactions
(both OLTP and streaming)
2. Ordered Execution guarantees for dataflow graphs of
streaming transactions
3. Exactly-Once Processing guarantees for streams
(i.e., no loss or duplication)
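The ordered-execution and exactly-once guarantees can be illustrated with a small sketch. This is not S-Store's implementation; the class name ExactlyOnceStream and the idea of numbering batches with consecutive integer ids are assumptions made only for illustration. Each batch is applied once, in batch-id order, even if the client delivers batches out of order or retries them.

```python
class ExactlyOnceStream:
    """Hypothetical sketch: apply each atomic batch once, in batch-id order."""

    def __init__(self):
        self.next_batch_id = 0   # next batch id that may be applied
        self.pending = {}        # early batches waiting for their predecessors
        self.applied = []        # tuples in the order they were processed

    def submit(self, batch_id, tuples):
        # Duplicate delivery (e.g. a client retry): already applied or queued.
        if batch_id < self.next_batch_id or batch_id in self.pending:
            return
        self.pending[batch_id] = list(tuples)
        # Drain every batch whose predecessors have all been applied.
        while self.next_batch_id in self.pending:
            self.applied.extend(self.pending.pop(self.next_batch_id))
            self.next_batch_id += 1


stream = ExactlyOnceStream()
stream.submit(1, ["c", "d"])   # arrives early, held back
stream.submit(0, ["a", "b"])   # applies batch 0, then the waiting batch 1
stream.submit(0, ["a", "b"])   # duplicate, dropped
# stream.applied is now ["a", "b", "c", "d"]
```

The sketch covers only ordering and duplicate suppression on a single node; it says nothing about how state survives a failure.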
Transaction Execution
• A Transaction Execution essentially
corresponds to an atomic batch and its
subsequent processing by a stored procedure.
H-Store system
• H-Store is an open-source, main-memory OLTP
engine in which transactions are predefined as
stored procedures.
• Transaction executions (TEs) are instantiated by
binding input parameters of a stored procedure
to real values and running it.
• H-Store initiates the transaction in a layer called the
Partition Engine (PE), which is responsible for
managing transaction distribution, scheduling,
coordination, and recovery.
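A minimal sketch of how a transaction execution is instantiated by binding parameters to a predefined stored procedure. The function names transfer and execute, the dictionary standing in for the database, and the copy-then-commit approach to atomicity are all assumptions for illustration; H-Store's actual stored procedures are written against its own API.

```python
import copy


def transfer(db, src, dst, amount):
    """Hypothetical stored procedure: predefined logic with unbound parameters."""
    if db[src] < amount:
        raise ValueError("insufficient funds")
    db[src] -= amount
    db[dst] += amount


def execute(db, procedure, *params):
    """Run one transaction execution (TE): bind params, run, commit or abort."""
    working = copy.deepcopy(db)      # work on a private copy of the state
    try:
        procedure(working, *params)  # bind real values and run the procedure
    except Exception:
        return False                 # abort: the original state is untouched
    db.clear()
    db.update(working)               # commit: install all changes together
    return True


accounts = {"alice": 100, "bob": 20}
ok = execute(accounts, transfer, "alice", "bob", 30)       # commits
failed = execute(accounts, transfer, "bob", "alice", 500)  # aborts
# accounts is now {"alice": 70, "bob": 50}
```

Copying the whole state per transaction is only practical for a toy example; it is used here to make the all-or-nothing behavior easy to see.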
H-Store
• The PE manages another layer, the Execution
Engine (EE), which is responsible for the local
execution of SQL queries.
• A client program connects to the PE via a
stored procedure execution request. If the
stored procedure requires SQL processing,
then the EE is invoked with these sub-requests.
S-Store Architecture
S-Store includes stream processing, which enables the
management of:
• Inputs from streaming clients and dataflow
graphs of stored procedures at the PE layer.
• Triggers at both the PE and the EE layers.
• Stream- and window-based queries at the EE
layer.
• In-memory stream and window state.
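In-memory window state with an insert trigger can be sketched as follows. The SlidingWindow class, its tuple-count window size, and the on_insert callback are hypothetical names chosen for illustration and are not S-Store's interface. The window keeps the most recent rows and fires the callback on every insert.

```python
from collections import deque


class SlidingWindow:
    """Hypothetical tuple-based window that fires a trigger on every insert."""

    def __init__(self, size, on_insert):
        self.rows = deque(maxlen=size)  # the oldest row is evicted automatically
        self.on_insert = on_insert      # trigger callback (an assumption)

    def insert(self, row):
        self.rows.append(row)
        self.on_insert(list(self.rows))  # run the trigger on current contents


averages = []
window = SlidingWindow(3, lambda rows: averages.append(sum(rows) / len(rows)))
for price in [10, 20, 30, 40]:
    window.insert(price)
# averages is now [10.0, 15.0, 20.0, 30.0]
```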
S-Store Architecture
Experiments
• A number of micro-experiments were performed
to evaluate the optimizations achieved by S-Store
over H-Store with transactional stream
processing workloads.
• Execution Engine Triggers - In S-Store, the SQL
statements of a stored procedure can be activated
using EE triggers, and the execution takes place
inside the EE layer. The equivalent H-Store
implementation requires the submission of the
set of SQL statements (an insert and a delete) for
each query as a separate execution batch from PE
to EE.
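The difference between the two approaches can be sketched by counting PE-to-EE round trips. Everything here is a toy model under stated assumptions: the ExecutionEngine class, the list-based streams, and the move helper are hypothetical and only mimic the insert-and-delete pattern described above.

```python
class ExecutionEngine:
    """Hypothetical EE: streams are lists; counts PE-to-EE round trips."""

    def __init__(self, num_stages):
        self.streams = [[] for _ in range(num_stages + 1)]
        self.round_trips = 0
        self.triggers = {}  # stream index -> statement run inside the EE

    def execute_batch(self, statements):
        self.round_trips += 1          # one submission from the PE
        for statement in statements:
            statement(self)

    def insert(self, index, row):
        self.streams[index].append(row)
        trigger = self.triggers.get(index)
        if trigger:
            trigger(self)              # fires without leaving the EE


def move(src, dst):
    """Statement pair: delete a row from src and insert it into dst."""
    def statement(ee):
        row = ee.streams[src].pop(0)
        ee.insert(dst, row)
    return statement


def run_without_triggers(num_stages, row):
    ee = ExecutionEngine(num_stages)
    ee.execute_batch([lambda e: e.insert(0, row)])
    for i in range(num_stages):
        ee.execute_batch([move(i, i + 1)])  # one batch per stage
    return ee


def run_with_triggers(num_stages, row):
    ee = ExecutionEngine(num_stages)
    for i in range(num_stages):
        ee.triggers[i] = move(i, i + 1)     # chain the stages inside the EE
    ee.execute_batch([lambda e: e.insert(0, row)])
    return ee


plain = run_without_triggers(3, "t")   # plain.round_trips == 4
chained = run_with_triggers(3, "t")    # chained.round_trips == 1
```

Both runs leave the row in the last stream; only the number of submissions from the PE differs.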
Experiments
• Partition Engine Triggers - This experiment
compares S-Store’s PE triggers to an equivalent
implementation in H-Store, which has no such
trigger support in its PE.
• Serializing transaction requests severely limits
H-Store’s performance, whereas S-Store uses
a “streaming scheduler” which can activate
the next transaction directly within the PE and
can prioritize these triggered transactions
ahead of the current scheduling queue.
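The prioritization idea can be sketched with a double-ended queue. The function run_scheduler, the procedure names, and the downstream mapping are hypothetical; the sketch only shows a triggered transaction being placed ahead of requests already waiting in the queue, not S-Store's actual scheduler.

```python
from collections import deque


def run_scheduler(initial_requests, downstream):
    """Execute requests in order, letting triggered transactions jump the queue.

    downstream maps a procedure name to the procedure it triggers
    (a hypothetical stand-in for a dataflow graph).
    """
    queue = deque(initial_requests)
    executed = []
    while queue:
        name = queue.popleft()
        executed.append(name)
        triggered = downstream.get(name)
        if triggered is not None:
            queue.appendleft(triggered)  # ahead of all waiting requests
    return executed


order = run_scheduler(["SP1", "OLTP_A", "OLTP_B"], {"SP1": "SP2", "SP2": "SP3"})
# order is ["SP1", "SP2", "SP3", "OLTP_A", "OLTP_B"]
```

With a plain FIFO queue (append instead of appendleft), SP2 and SP3 would run after OLTP_A and OLTP_B, so unrelated transactions could interleave with the dataflow.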
Results
Execution Engine triggers
Partition Engine Triggers
Results
• S-Store processes 2,200 batches per second.
S-Store is able to handle multiple
asynchronous transaction requests from the
client and still preserve the tuple processing
order.
• In the PE, weak recovery not only achieves better
throughput during normal operation but
also provides lower recovery time.
Conclusion
• S-Store presents a new model of transactions for
stream processing that seamlessly combines OLTP
transaction processing with a transactional
stream processing model.
• S-Store shows how this symbiosis can be
implemented in the context of a main-memory
OLTP DBMS in a straightforward way. S-Store is
shown to outperform H-Store, Esper, and Storm
on a streaming workload that requires
transactional state access, while at the same time
providing stronger correctness guarantees.
Future Work
• Extending S-Store to operate on multiple
nodes by addressing a number of research
issues, including data and workload
partitioning, distributed recovery,
distributed transaction scheduling, and
handling of dynamic and hybrid
(OLTP+streaming) workloads.