Continuous Stream Monitoring Technology
Elke A. Rundensteiner
Database Systems Research Laboratory
Department of Computer Science
Worcester Polytechnic Institute, USA
rundenst @ cs.wpi.edu
November 2006
A Database...
A vast amount of electronic information in organizations, companies, and scientific institutes needs to be organized, stored securely, and accessed efficiently and easily.
[Figure: the query "Select name from employee;" is submitted to a DBMS, which evaluates it over a stored database]
Three common steps:
- Make schema design
- Load database
- Query static database
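The following minimal sketch makes the three steps concrete. It uses Python's built-in `sqlite3` module with an in-memory database. The `employee` table's columns and its two rows are hypothetical. The query mirrors the slide's `Select name from employee;`, with an `ORDER BY` added only so the output order is deterministic.

```python
import sqlite3

# Step 1: schema design (a hypothetical employee table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Step 2: load the database.
conn.executemany("INSERT INTO employee (name) VALUES (?)", [("Alice",), ("Bob",)])
conn.commit()

# Step 3: query the static database; the query runs once over the stored data.
# ORDER BY id is added only to make the result order deterministic.
names = [row[0] for row in conn.execute("SELECT name FROM employee ORDER BY id")]
print(names)  # ['Alice', 'Bob']
```

Here the data sits still and the query comes and goes. The following slides reverse that relationship.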
So what next?
[Figure: the same query, DBMS, and stored database as on the previous slide]
A Look at Modern Data: Streams!
- Digital radio telescopes
- Network traffic flow
- Stock tickers/feeds
- Sensor networks
- Web usage transactions
- Outpatient care
- Environmental instruments
Example: a DSMS filters and transforms a stream with the continuous query
select fft(s)
from radiosignal s
where source(s) = "Antenna1";
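The filter-and-transform query above can be sketched as a Python generator that processes each tuple as it arrives. This is only an illustration, not a DSMS. The tuple fields `source` and `samples` are hypothetical, and a naive O(n^2) discrete Fourier transform stands in for `fft()`.

```python
import cmath

def dft(samples):
    """Naive O(n^2) discrete Fourier transform, standing in for the slide's fft()."""
    n = len(samples)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples))
            for k in range(n)]

def filter_and_transform(radiosignal, source="Antenna1"):
    """Continuously evaluate: select fft(s) from radiosignal s where source(s) = source."""
    for s in radiosignal:             # tuples are consumed one at a time, as they arrive
        if s["source"] == source:     # where source(s) = "Antenna1"
            yield dft(s["samples"])   # select fft(s)
```

Because the function is a generator, it never needs the whole stream in memory. It yields one transformed result per matching tuple.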
Databases: Everything is Upside Down!
- Traditional DBMS: one-time queries over static data
- DSMS: standing queries over streams of data
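The contrast between one-time and standing queries can be sketched in a few lines. In this toy model, a standing query is registered once, and every pushed tuple is routed through it. A one-time query, by contrast, runs once over stored rows. The class and method names are hypothetical and are not CAPE's API.

```python
from typing import Callable, Dict, List

class StandingQueryEngine:
    """Toy DSMS: queries are registered once and evaluated on every arriving tuple."""
    def __init__(self):
        self.queries: Dict[str, Callable[[dict], bool]] = {}
        self.results: Dict[str, List[dict]] = {}

    def register(self, name, predicate):
        self.queries[name] = predicate
        self.results[name] = []

    def push(self, tup):
        # Data arrives and is routed through all standing queries (push-based).
        for name, predicate in self.queries.items():
            if predicate(tup):
                self.results[name].append(tup)

def one_time_query(table, predicate):
    """Traditional DBMS: the query arrives and runs once over the stored data."""
    return [row for row in table if predicate(row)]
```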
Continuous Queries on Data Streams
Online Stream Monitoring
Motivating Applications Everywhere
- Traffic Management: Streams of Cars and Mobile Requests
- Market Analysis: Streams of Stock Exchange Data
- Critical Care: Streams of Vital Sign Measurements
- Physical Plant Monitoring: Streams of RFID/Environmental Readings
- Emergency Response: Streams of Sensor Readings and People Tracking
Mobile Traffic-Related Streams
- Moving objects
- Dynamic range queries
- Dynamic kNN queries
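The following minimal sketch shows dynamic range and kNN queries over moving objects. It assumes each object periodically reports its (x, y) position. Each query is simply re-evaluated from scratch over the latest positions, so the answer changes as the objects (or the query point) move. The class and method names are hypothetical, and efficient incremental evaluation is outside the scope of this sketch.

```python
import heapq
import math

class MovingObjects:
    """Latest known position of each moving object, updated as location reports arrive."""
    def __init__(self):
        self.pos = {}

    def update(self, obj, x, y):
        self.pos[obj] = (x, y)

    def _dist(self, obj, cx, cy):
        x, y = self.pos[obj]
        return math.hypot(x - cx, y - cy)

    def range_query(self, cx, cy, radius):
        """Objects currently within `radius` of the query point (cx, cy), sorted by id."""
        return sorted(o for o in self.pos if self._dist(o, cx, cy) <= radius)

    def knn_query(self, cx, cy, k):
        """The k objects currently nearest to (cx, cy), closest first."""
        return heapq.nsmallest(k, self.pos, key=lambda o: self._dist(o, cx, cy))
```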
FireEngine Project: Sensors in Rooms
Fire Monitoring Queries
- Track smoke and heat clouds (moving clusters) in terms of their sizes and speeds.
- Is there an outlier (e.g., a prank), or an actual fire?
- Match sensor readings of the fire against a fire stream simulation to determine their similarity.
- Are any sensors faulty, and thus should they be ignored?
Dynamicity in Stream Query Processing
[Figure: a high workload of continuous queries is registered with a scalable stream query engine, which consumes streaming data and produces a streaming result]
- Streaming data may have varying rates and high volumes (push-based time paradigm).
- Continuous evaluation operates under memory and CPU resource limitations.
- Real-time and accurate responses are required.
- The resources available for executing each operator may vary over time.
New query processing technology is required.
Execution of Queries
[Figure: a query network of operator boxes, including Slide and Tumble, whose outputs flow to applications (App), each with its own QoS specification]
Queries =
• Graph = Query Plan
• Boxes = Query Operators such as Filter or Join
• Arcs = Streams with time-stamped tuples
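The following sketch assumes that Slide and Tumble denote sliding-window and tumbling-window aggregates over time-stamped tuples, as in Aurora-style box-and-arrow query networks. The two window types are illustrated by summing values over a finite list of `(timestamp, value)` tuples with non-negative timestamps. A real operator would instead emit each window's result incrementally as the stream advances.

```python
from collections import defaultdict

def tumble_sum(stream, width):
    """Tumbling window: sum values per non-overlapping window [k*width, (k+1)*width)."""
    sums = defaultdict(float)
    for ts, value in stream:
        sums[ts // width] += value
    return [(k * width, sums[k]) for k in sorted(sums)]

def slide_sum(stream, width, step):
    """Sliding window: sum values per window [start, start+width), start advancing by `step`."""
    tuples = list(stream)
    if not tuples:
        return []
    last = max(ts for ts, _ in tuples)
    out, start = [], 0
    while start <= last:
        out.append((start, sum(v for ts, v in tuples if start <= ts < start + width)))
        start += step
    return out
```

With a tumbling window, each tuple contributes to exactly one result. With a sliding window whose `step` is smaller than `width`, each tuple contributes to several overlapping results.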
Execution of Queries (continued)
[Figure: the same query network of operator boxes, applications, and QoS specifications]
Execution via operator scheduling
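Operator scheduling can be sketched with a simple round-robin scheduler. Each box has an input queue, and the scheduler cycles through the boxes, letting each one process a small batch of queued tuples. Round-robin is only one possible policy, chosen here for brevity. The `Box` class and the `batch` parameter are hypothetical.

```python
from collections import deque

class Box:
    """An operator box with an input queue; `fn` maps one tuple to a list of output tuples."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn
        self.inbox = deque()
        self.downstream = None  # next Box, or None if results go to the application sink

def run_round_robin(boxes, sink, batch=2):
    """Visit boxes in a fixed cyclic order; each turn processes up to `batch` queued tuples."""
    while any(b.inbox for b in boxes):
        for box in boxes:
            for _ in range(min(batch, len(box.inbox))):
                for out in box.fn(box.inbox.popleft()):
                    target = box.downstream.inbox if box.downstream else sink
                    target.append(out)
```

Usage: a filter box feeding a map box, with `filt.downstream = mapper`, delivers only the surviving, transformed tuples to `sink`. The order of the boxes and the size of `batch` determine latency and queue memory, which is why the choice of scheduling policy matters.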
Adaptation Techniques in CAPE
- On-Line Query Plan Reshaping (with Yali Zhu and G. Heineman)
Published in ACM SIGMOD 2004, and in submission to the TODS journal, 2006
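Query Optimization
[Figure: two equivalent join plans over streams A, B, and C: (A ⋈ B) ⋈ C, which joins A and B first, and A ⋈ (B ⋈ C), which joins B and C first]
How can a query be optimized while it is continuously running?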
Run-time Plan Re-Optimization
- Step 1: Decide when to optimize (statistics monitoring)
- Step 2: Generate a new query plan (query optimization)
- Step 3: Replace the current plan by the new plan (plan migration)
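The three steps can be sketched for the two join orders (A ⋈ B) ⋈ C and A ⋈ (B ⋈ C). This is an illustration only, not CAPE's optimizer. The cost model is an assumption: it estimates the rate of the intermediate join result from monitored input rates and join selectivities. The `threshold` parameter is also hypothetical. Step 3 is left to a plan-migration routine.

```python
PLANS = ("(AB)C", "A(BC)")

def intermediate_rate(plan, rates, sel):
    """Hypothetical cost model: estimated output rate of the plan's intermediate join."""
    if plan == "(AB)C":
        return rates["A"] * rates["B"] * sel["AB"]
    return rates["B"] * rates["C"] * sel["BC"]  # plan "A(BC)"

def reoptimize(current, rates, sel, threshold=1.5):
    """One pass of the loop, given freshly monitored `rates` and selectivities `sel`."""
    costs = {p: intermediate_rate(p, rates, sel) for p in PLANS}
    best = min(costs, key=costs.get)
    # Step 1 (decide when to optimize): act only if the current plan is clearly worse.
    if costs[current] <= threshold * costs[best]:
        return current
    # Step 2 (generate new plan) yields `best`; Step 3 (plan migration) is up to the caller.
    return best
```

The threshold keeps small fluctuations in the statistics from triggering a migration, which is itself costly.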
Naïve Plan Migration Strategy
[Figure: the old plan (A ⋈ B) ⋈ C is replaced by the new plan A ⋈ (B ⋈ C)]
Migration steps:
- Pause execution of old plan
- Drain out all tuples inside old plan
- Replace old plan by new plan
- Resume execution of new plan
Problem: works for stateless operators only.
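For a pipeline of stateless operators, the naïve strategy can be sketched as follows. Every queued tuple is drained through the old plan, and fresh queues are then handed to the new plan. Operators are modeled as hypothetical functions that map one tuple to a list of output tuples, with `queues[i]` feeding operator `i`. The sketch is valid only because the operators keep no state.

```python
from collections import deque

def drain(plan, queues, sink):
    """Push every tuple still buffered inside `plan` (a list of operator functions) to the end."""
    for i, op in enumerate(plan):
        while queues[i]:
            for out in op(queues[i].popleft()):
                (queues[i + 1] if i + 1 < len(plan) else sink).append(out)

def naive_migrate(old_plan, new_plan, queues, sink):
    # Step 1: pause (the caller stops feeding input). Step 2: drain the old plan.
    drain(old_plan, queues, sink)
    # Step 3: replace. Stateless operators hold nothing once their queues are empty,
    # so fresh, empty queues are all the new plan needs.
    new_queues = [deque() for _ in new_plan]
    # Step 4: the caller resumes by feeding input into new_queues[0].
    return new_plan, new_queues
```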
Stateful Operators in Streaming
- Why stateful?
  - We need non-blocking operators.
  - An operator needs to output partial results.
- Example: the symmetric hash join AB keeps State A and State B. For each new tuple from A:
  1. purge state B,
  2. join with state B,
  3. insert into state A.
  (Tuples from B are handled symmetrically.)
Key observation: the purge of tuples in states relies on the processing of new tuples.
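The following minimal sketch implements a windowed symmetric hash join using the three steps above. It assumes a time-based window of `window` time units, an equi-join on a key, and timestamps that never decrease across both inputs. The class name and the tuple layout are hypothetical.

```python
from collections import defaultdict

class SymmetricHashJoin:
    """Windowed symmetric hash join of streams A and B (sketch).

    Each state maps join key -> list of (timestamp, payload). A tuple is purged from
    a state only when a newer tuple arrives on the opposite input.
    """
    def __init__(self, window):
        self.window = window
        self.state = {"A": defaultdict(list), "B": defaultdict(list)}

    def process(self, side, ts, key, payload):
        other = "B" if side == "A" else "A"
        # 1. Purge tuples of the opposite state that fell out of the window.
        for k in list(self.state[other]):
            kept = [(t, p) for t, p in self.state[other][k] if t >= ts - self.window]
            if kept:
                self.state[other][k] = kept
            else:
                del self.state[other][k]
        # 2. Join the new tuple with matching tuples in the opposite state.
        results = [(payload, p) if side == "A" else (p, payload)
                   for _, p in self.state[other].get(key, [])]
        # 3. Insert the new tuple into its own state.
        self.state[side][key].append((ts, payload))
        return results
```

Expired B tuples stay in state B until some A tuple arrives to trigger the purge. This illustrates the key observation above.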
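Naïve Migration Strategy Revisited
Steps:
(1) Pause execution of old plan
(2) Drain out all tuples inside old plan
(3) Replace old plan by new plan
(4) Resume execution of new plan
Deadlock waiting problem: with stateful operators, step (2) cannot complete. Tuples in the states of the old plan are purged only when new tuples are processed, but step (1) has paused the processing of new tuples, so the migration waits indefinitely.
[Figure: the old plan (A ⋈ B) ⋈ C annotated with (2) all tuples drained, (3) old plan replaced by new plan, (4) processing resumed]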
Proposed Dynamic Migration Strategies
- Moving State Strategy
- Parallel Track Strategy
Moving State Strategy
- Basic idea: share common states between the old and new boxes
- Key steps:
  - Identify common states (state matching)
  - Share common states (state moving)
  - Recompute unmatched states (state recomputing)
Moving State Strategy
[Figure: the old box and the new box both compute QABCD from the input queues QA, QB, QC, QD. The old box holds states SA, SB, SAB, SC, SABC, SD; the new box needs states SA, SB, SC, SBC, SBCD, SD.]
State matching:
- Each state in the old box has a unique ID.
- During rewriting, a new ID is given to each new state in the new box.
- When rewriting is done, states are matched based on their IDs.
State moving:
- Between matched states on the same machine, new pointers to the matched states are created in the new box.
What's left? The unmatched states in the new box.
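State matching and moving can be sketched as a lookup by state ID. For illustration only, we assume that a state's ID names the input streams it covers (e.g., `"AB"` for SAB). Under this assumption, a state in the new box matches an old state with the same ID. Matched states are shared by reference, like the new pointers on the slide, and the remaining states are reported as unmatched.

```python
def match_and_move(old_states, new_state_ids):
    """Moving State sketch: reuse, by reference, every old state whose ID the new box needs.

    old_states: dict mapping a (hypothetical) state ID to the list of tuples in that state.
    new_state_ids: IDs of the states the new box requires.
    Returns (new_states, unmatched_ids).
    """
    new_states, unmatched = {}, []
    for sid in new_state_ids:
        if sid in old_states:
            new_states[sid] = old_states[sid]  # same machine: share a pointer, copy nothing
        else:
            unmatched.append(sid)              # must be recomputed
    return new_states, unmatched
```

For the plans in the figure, SA, SB, SC, and SD are matched, while SBC and SBCD remain unmatched.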
Unmatched States
State recomputing: recursively recompute the unmatched states SBC and SBCD by joining matched states.
[Figure: the new box computing QABCD, where SBC is rebuilt from SB and SC, and SBCD from SBC and SD]
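Recursive state recomputing can be sketched as follows. A tuple is modeled as `(key, parts)`, where `parts` lists the IDs of the base tuples it combines. A hypothetical plan description, `children`, names the two inputs of each intermediate state. Timestamps and window checks are omitted, and a nested-loop equi-join keeps the example short.

```python
def join_states(left, right):
    """Nested-loop equi-join of two states; a tuple is (key, parts)."""
    return [(lk, lp + rp) for lk, lp in left for rk, rp in right if lk == rk]

def recompute(sid, states, children):
    """Rebuild an unmatched state by recursively joining the states of its two inputs."""
    if sid not in states:
        left, right = children[sid]
        states[sid] = join_states(recompute(left, states, children),
                                  recompute(right, states, children))
    return states[sid]
```

Usage: with matched states SB, SC, and SD, and `children = {"BC": ("B", "C"), "BCD": ("BC", "D")}`, calling `recompute("BCD", ...)` first rebuilds SBC and then SBCD. While this recomputation runs, the new box produces no output.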
MS Migration: Pros and Cons
- Pros:
  - Fast when the number of tuples in states is small, e.g., with low input rates or a small window size
- Cons:
  - Output silence during the entire migration stage
Can we output results even during migration? This is the motivation for the Parallel Track Strategy.
Parallel Track Strategy
- Basic idea:
  - Execute both old and new plans in parallel
  - Gradually "push" old tuples out of the old box by purging
- Key steps:
  - Connect the new box
  - Execute both boxes in parallel
  - Remove the old box once it has "expired", i.e., it contains only new tuples, with no old tuples or sub-tuples
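The expiry test for the old box can be sketched as follows. The sketch assumes that every (possibly composite) tuple carries the arrival timestamps of its base sub-tuples. A tuple is treated as "old" if any of those sub-tuples arrived before migration started. The function names and the state layout are hypothetical.

```python
def is_old(sub_tuple_timestamps, migration_start):
    """A (possibly composite) tuple is old if any base sub-tuple predates the migration."""
    return any(ts < migration_start for ts in sub_tuple_timestamps)

def old_box_expired(old_box_states, migration_start):
    """Parallel Track sketch: the old box may be removed once no state holds an old tuple.

    old_box_states: dict mapping a state name to a list of tuples, each tuple given as
    the timestamps of its base sub-tuples (e.g., (12, 15) for a composite AB tuple).
    """
    return not any(is_old(tup, migration_start)
                   for state in old_box_states.values()
                   for tup in state)
```

A composite tuple such as ABC counts as old if even one of its sub-tuples A, B, or C is old. This is why the old box is removed only after old sub-tuples have also been purged from intermediate states.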
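Parallel Track Strategy
[Figure: the old box (states SA, SB, SAB, SC, SABC, SD) and the new box (states SA, SB, SC, SBC, SBCD, SD) both compute QABCD from the input queues QA, QB, QC, QD. A composite tuple ABC in SABC consists of the sub-tuples A, B, and C. Steps shown: connect the boxes, execute them in parallel until all old tuples are purged, then disconnect the old box.]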
PT Migration: Pros and Cons
- Pros:
  - Keeps producing results even during migration (no results are produced during MS migration)
- Cons:
  - Migration duration is at least 2W (W = window size)
  - MS may be faster, depending on the number of tuples in states
Summary: Stream Plan Migration
- Our central theme: optimization via adaptation
- First run-time plan migration solution for stateful operators
- Two migration methods:
  - Moving State Strategy
  - Parallel Track Strategy
- Cost models for comparative analysis
- System implementation in CAPE
- Experimental evaluations
Overall Summary: So Much Left to Do!
- Large variety of challenging stream applications
- Generic core technology for stream processing engines
- Startups are starting to pop up, e.g., StreamBase for the stock market
- Major DBMS players such as IBM and Oracle are joining in
- Cool open research, with great potential for real impact!
The End
http://davis.wpi.edu/~dsrg
Questions?
Subset of CAPE Publications
[RDZ04] E. A. Rundensteiner, L. Ding, Y. Zhu, T. Sutherland, and B. Pielech, "CAPE: A Constraint-Aware Adaptive Stream Processing Engine". Invited book chapter, http://www.cs.uno.edu/~nauman/streamBook/, July 2004.
[ZRH04] Y. Zhu, E. A. Rundensteiner, and G. T. Heineman, "Dynamic Plan Migration for Continuous Queries Over Data Streams". SIGMOD 2004, pages 431-442.
[DMR+04] L. Ding, N. Mehta, E. A. Rundensteiner, and G. T. Heineman, "Joining Punctuated Streams". EDBT 2004, pages 587-604.
[DR04] L. Ding and E. A. Rundensteiner, "Evaluating Window Joins over Punctuated Streams". CIKM 2004, to appear.
[DRH03] L. Ding, E. A. Rundensteiner, and G. T. Heineman, "MJoin: A Metadata-Aware Stream Join Operator". DEBS 2003.
[RDSZBM04] E. A. Rundensteiner, L. Ding, T. Sutherland, Y. Zhu, B. Pielech, and N. Mehta, "CAPE: Continuous Query Engine with Heterogeneous-Grained Adaptivity". Demonstration paper, VLDB 2004.
[SR04] T. Sutherland and E. A. Rundensteiner, "D-CAPE: A Self-Tuning Continuous Query Plan Distribution Architecture". Tech. Report WPI-CS-TR-04-18, 2004.
[SPR04] T. Sutherland, B. Pielech, Y. Zhu, L. Ding, and E. A. Rundensteiner, "Adaptive Multi-Objective Scheduling Selection Framework for Continuous Query Processing". IDEAS 2005.
[SLJR05] T. Sutherland, B. Liu, M. Jbantova, and E. A. Rundensteiner, "D-CAPE: Distributed and Self-Tuned Continuous Query Processing". CIKM, Bremen, Germany, Nov. 2005.
[LR05] B. Liu and E. A. Rundensteiner, "Revisiting Pipelined Parallelism in Multi-Join Query Processing". VLDB 2005.
[B05] B. Liu, Y. Zhu, and E. A. Rundensteiner, "Spill Policies for Long-Running Queries". ACM SIGMOD 2006, to appear.
CAPE Project: http://davis.wpi.edu/dsrg/CAPE/index.html