Continuous Stream Monitoring Technology
Elke A. Rundensteiner
Database Systems Research Laboratory, Department of Computer Science
Worcester Polytechnic Institute, USA
rundenst@cs.wpi.edu
November 2006

A Database . . .
Organisations, companies, and scientific institutes hold vast amounts of electronic information that need to be organized, stored securely, and accessed efficiently and easily.
Example query, answered by the DBMS over the stored database: Select name from employee;
Three common steps: design the schema, load the database, query the static database.

So what next?

A Look at Modern Data: Streams!
Sources: digital radio telescopes, network traffic flow, stock tickers/feeds, sensor networks, web usage transactions, outpatient care, environmental instruments.
A DSMS filters and transforms such streams, e.g.:
select fft(s) from radiosignal s where source(s) = "Antenna1";

Databases: Everything is Upside Down!
Traditional DBMS: one-time queries over static data.
DSMS: standing queries over streams of data.

Continuous Queries on Data Streams: Online Stream Monitoring

Motivating Applications Everywhere
- Traffic Management: streams of cars and mobile requests
- Market Analysis: streams of stock exchange data
- Critical Care: streams of vital sign measurements
- Physical Plant Monitoring: streams of RFID/environmental readings
- Emergency Response: streams of sensor readings and people tracking

Mobile Traffic-Related Streams
- moving objects
- dynamic range query
- dynamic kNN query

FireEngine Project: Sensors in Rooms

Fire Monitoring Queries
- Track smoke and heat clouds (moving clusters) in terms of their sizes and speeds.
- Is there an outlier (prank), or an actual fire?
- Match sensor readings of a fire against a fire stream simulation to determine their similarity.
- Are any sensors faulty, and thus to be ignored?
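The contrast between one-time and standing queries can be sketched in a few lines of Python. This is a minimal illustration, not CAPE code: a standing query is registered once and then evaluated on every arriving tuple, here as a continuous sliding-window check in the spirit of the fire-monitoring queries above. The Reading record, the time-based window, the temperature threshold and the name standing_query are all hypothetical.

```python
from __future__ import annotations

from collections import deque
from typing import Iterable, Iterator, NamedTuple


class Reading(NamedTuple):
    ts: int        # timestamp of the reading (hypothetical schema)
    sensor: str    # sensor / room identifier
    temp: float    # temperature value


def standing_query(
    stream: Iterable[Reading], window: int, threshold: float
) -> Iterator[tuple[int, str, float]]:
    """Evaluate a continuous query tuple by tuple.

    For each arriving reading, keep that sensor's readings from the last
    `window` time units (window >= 1) and emit (ts, sensor, average)
    whenever the window average exceeds `threshold`.
    """
    recent: dict[str, deque[Reading]] = {}
    for r in stream:                      # the stream may be unbounded
        buf = recent.setdefault(r.sensor, deque())
        buf.append(r)
        # Purge readings that have slid out of the window (ts <= r.ts - window).
        while buf and buf[0].ts <= r.ts - window:
            buf.popleft()
        avg = sum(x.temp for x in buf) / len(buf)
        if avg > threshold:
            yield (r.ts, r.sensor, avg)   # results are pushed out immediately
```

Because the function is a generator, results stream out as readings arrive; nothing waits for the end of the input, which may never come.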
Dynamicity in Stream Query Processing
- Registered continuous queries: a high workload of queries.
- Streaming data: may have time-varying rates and high volumes (push-based paradigm).
- Scalable stream query engine: memory and CPU resource limitations (continuous evaluation).
- Streaming results: real-time and accurate responses required.
The resources available for executing each operator may vary over time, so new query processing technology is required.

Execution of Queries
[Figure: query network of operators such as Slide and Tumble feeding applications (App) with QoS requirements]
Queries = Graph = Query Plan
- Boxes = query operators such as Filter or Join
- Arcs = streams with time-stamped tuples
Execution proceeds via operator scheduling.

Adaptation Techniques in CAPE
On-Line Query Plan Reshaping (with Yali Zhu and G. Heineman). Published in ACM SIGMOD 2004; in submission to the TODS journal, 2006.

Query Optimization
[Figure: two equivalent join plans over streams A, B, C, e.g. (A ⋈ B) ⋈ C and A ⋈ (B ⋈ C)]
How can we optimize a query that is continuously running?

Run-time Plan Re-Optimization
- Step 1: Decide when to optimize (statistics monitoring).
- Step 2: Generate a new query plan (query optimization).
- Step 3: Replace the current plan by the new plan (plan migration).

Naïve Plan Migration Strategy
[Figure: old join plan over A, B, C replaced by an equivalent new plan]
Migration steps:
- Pause execution of the old plan.
- Drain out all tuples inside the old plan.
- Replace the old plan by the new plan.
- Resume execution of the new plan.
Problem: this works for stateless operators only.

Stateful Operators in Streaming
Why stateful? We need non-blocking operators: an operator needs to output partial results.
Example: symmetric hash join A ⋈ B, keeping State A and State B. For each new tuple from A: purge State B, join the tuple with State B, and insert it into State A (and symmetrically for tuples from B).
Key observation: the purge of tuples in states relies on the processing of new tuples.
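The purge/join/insert cycle of the symmetric hash join can be sketched as a sliding-window join. This is a minimal single-threaded illustration assuming a time-based window W and an equi-join on one key; StreamTuple, SymmetricHashJoin and their fields are hypothetical names, not CAPE's operator API. It also makes the key observation concrete: a tuple leaves State A only when a tuple from B arrives.

```python
from __future__ import annotations

from collections import deque
from collections.abc import Hashable
from typing import NamedTuple


class StreamTuple(NamedTuple):
    ts: int          # arrival timestamp
    key: Hashable    # join attribute value
    payload: str     # hypothetical payload field


class SymmetricHashJoin:
    """Sliding-window symmetric hash join of streams A and B (time window W)."""

    def __init__(self, window: int) -> None:
        self.window = window
        # One state per input: a hash table key -> tuples, plus a FIFO for purging by age.
        self.tables: dict[str, dict[Hashable, list[StreamTuple]]] = {"A": {}, "B": {}}
        self.fifos: dict[str, deque[StreamTuple]] = {"A": deque(), "B": deque()}

    def _purge(self, side: str, now: int) -> None:
        # Drop tuples of `side` too old to join with a tuple arriving at time `now`.
        fifo, table = self.fifos[side], self.tables[side]
        while fifo and fifo[0].ts <= now - self.window:
            old = fifo.popleft()
            bucket = table[old.key]
            bucket.remove(old)
            if not bucket:
                del table[old.key]

    def insert(self, side: str, t: StreamTuple) -> list[tuple[StreamTuple, StreamTuple]]:
        """Process one new tuple from `side` ("A" or "B"); return (a, b) result pairs."""
        other = "B" if side == "A" else "A"
        self._purge(other, t.ts)                          # 1. purge the opposite state
        matches = self.tables[other].get(t.key, [])       # 2. join with the opposite state
        out = [(t, m) if side == "A" else (m, t) for m in matches]
        self.tables[side].setdefault(t.key, []).append(t) # 3. insert into own state
        self.fifos[side].append(t)
        return out
```

If no new B tuples arrive (for example, because execution was paused), State A is never purged; this is why draining a stateful plan can stall.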
Naïve Migration Strategy Revisited
[Figure: old plan with stateful join operators; the drain step waits indefinitely]
Steps: (1) pause execution of the old plan; (2) drain out all tuples inside the old plan; (3) replace the old plan by the new plan; (4) resume execution of the new plan.
Problem: deadlock waiting. Step (2) never finishes for stateful operators, because tuples in states are purged only when new tuples are processed, and execution of the old plan is paused.

Proposed Dynamic Migration Strategies
- Moving State Strategy
- Parallel Track Strategy

Moving State Strategy
Basic idea: share common states between the two boxes (old plan and new plan).
Key steps:
- State matching: identify common states.
- State moving: share common states.
- State recomputing: recompute unmatched states.

Moving State Strategy: State Matching and State Moving
[Figure: old box ((A ⋈ B) ⋈ C) ⋈ D with states SA, SB, SAB, SC, SABC, SD; new box A ⋈ ((B ⋈ C) ⋈ D) with states SA, SB, SC, SBC, SD, SBCD; input queues QA, QB, QC, QD and output queue QABCD]
- State matching: each state in the old box has a unique ID. During rewriting, a new ID is given to each new state in the new box. When rewriting is done, states are matched based on their IDs.
- State moving between matched states: on the same machine, this creates new pointers to the matched states in the new box.
What's left? The unmatched states in the new box.

Unmatched States: State Recomputing
Recursively recompute the unmatched states SBC and SBCD by joining matched states: SBC from SB and SC, then SBCD from SBC and SD.

MS Migration Pros and Cons
- Pros: fast when the number of tuples in states is small (low input rates or a small window size).
- Cons: output silence during the entire migration stage.
Can we output results even during migration?
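The three Moving State steps can be sketched on the example from the slides, where the old box computes ((A ⋈ B) ⋈ C) ⋈ D and the new box computes A ⋈ ((B ⋈ C) ⋈ D). This is a minimal illustration under assumptions of ours, not CAPE's implementation: state IDs are plain names such as SBC, matching means equal IDs, all streams are equi-joined on a single integer key, and Node, join and migrate_moving_state are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical row format: (join key, ids of the base tuples combined in this row).
# All streams are assumed to be equi-joined on a single key.
Row = tuple[int, tuple[str, ...]]


@dataclass
class Node:
    state_id: str                              # e.g. "SBC"; base states are "SA", "SB", ...
    children: tuple[Node, Node] | None = None  # None for a base (leaf) state


def join(left: list[Row], right: list[Row]) -> list[Row]:
    """Equi-join two states on the key and concatenate tuple lineages."""
    return [(kl, il + ir) for kl, il in left for kr, ir in right if kl == kr]


def migrate_moving_state(
    old_states: dict[str, list[Row]], top_inputs: list[Node]
) -> dict[str, list[Row]]:
    """Populate every state of the new box from the states of the old box."""
    new_states: dict[str, list[Row]] = {}

    def build(node: Node) -> list[Row]:
        # Visit children first so that every state in the new box gets populated.
        child_states = [build(c) for c in node.children] if node.children else []
        if node.state_id in old_states:
            # State matching + state moving: same ID, so reuse the old state object (no copy).
            new_states[node.state_id] = old_states[node.state_id]
        elif child_states:
            # State recomputing: derive the unmatched state by joining its child states.
            new_states[node.state_id] = join(*child_states)
        else:
            raise ValueError(f"base state {node.state_id} missing from old box")
        return new_states[node.state_id]

    # The new box's top join keeps no state of its own; only its inputs are built.
    for node in top_inputs:
        build(node)
    return new_states
```

Matched states such as SA are shared by reference, and SAB and SABC are simply discarded with the old box. Only SBC and SBCD are recomputed, and that cost grows with the number of tuples in the matched states, which is why MS is fastest when states are small.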
Motivation for the Parallel Track Strategy

Parallel Track Strategy
Basic idea: execute both the old and the new plan in parallel, and gradually "push" old tuples out of the old box by purging.
Key steps:
- Connect the new box.
- Execute both boxes in parallel.
- Remove the old box once it has "expired", i.e., it contains only new tuples and no old tuples or sub-tuples.

Parallel Track Strategy: Execution
[Figure: old and new boxes connected to the same input queues QA, QB, QC, QD and output queue QABCD; a tuple ABC in SABC is composed of sub-tuples A, B and C]
Connect the boxes and execute them in parallel until all old tuples are purged, then disconnect the old box.

PT Migration Pros and Cons
- Pros: keeps producing results even during migration (MS produces no results during migration).
- Cons: the migration duration is at least 2W, where W is the window size; MS may be faster, depending on the number of tuples in states.

Summary: Stream Plan Migration
- Our central theme: optimization via adaptation.
- First run-time solution for stateful operators.
- Two migration methods: Moving State Strategy and Parallel Track Strategy.
- Cost models for comparative analysis.
- System implementation in CAPE.
- Experimental evaluations.

Overall Summary: So Much Left to Do!
- Large variety of challenging stream applications.
- Generic core technology for stream processing engines.
- Startups are starting to pop up: StreamBase for the stock market.
- Major DBMS players like IBM, Oracle, etc. are joining in.
- Cool open research, with great potential for real impact!

The End
http://davis.wpi.edu.edu/~dsrg
Questions?

Subset of CAPE Publications
[RDZ04] E. A. Rundensteiner, L. Ding, Y. Zhu, T. Sutherland and B. Pielech, "CAPE: A Constraint-Aware Adaptive Stream Processing Engine". Invited book chapter. http://www.cs.uno.edu/~nauman/streamBook/. July 2004.
[ZRH04] Y. Zhu, E. A. Rundensteiner and G. T. Heineman, "Dynamic Plan Migration for Continuous Queries Over Data Streams". SIGMOD 2004, pages 431-442.
[DMR+04] L. Ding, N. Mehta, E. A. Rundensteiner and G. T. Heineman, "Joining Punctuated Streams". EDBT 2004, pages 587-604.
[DR04] L. Ding and E. A. Rundensteiner, "Evaluating Window Joins over Punctuated Streams". CIKM 2004, to appear.
[DRH03] L. Ding, E. A. Rundensteiner and G. T. Heineman, "MJoin: A Metadata-Aware Stream Join Operator". DEBS 2003.
[RDSZBM04] E. A. Rundensteiner, L. Ding, T. Sutherland, Y. Zhu, B. Pielech and N. Mehta, "CAPE: Continuous Query Engine with Heterogeneous-Grained Adaptivity". Demonstration paper. VLDB 2004.
[SR04] T. Sutherland and E. A. Rundensteiner, "D-CAPE: A Self-Tuning Continuous Query Plan Distribution Architecture". Tech Report WPI-CS-TR-04-18, 2004.
[SPR04] T. Sutherland, B. Pielech, Y. Zhu, L. Ding and E. A. Rundensteiner, "Adaptive Multi-Objective Scheduling Selection Framework for Continuous Query Processing". IDEAS 2005.
[SLJR05] T. Sutherland, B. Liu, M. Jbantova and E. A. Rundensteiner, "D-CAPE: Distributed and Self-Tuned Continuous Query Processing". CIKM 2005, Bremen, Germany, Nov. 2005.
[LR05] B. Liu and E. A. Rundensteiner, "Revisiting Pipelined Parallelism in Multi-Join Query Processing". VLDB 2005.
[B05] B. Liu, Y. Zhu and E. A. Rundensteiner, "Spill Policies for Long-Running Queries". ACM SIGMOD 2006, to appear.
CAPE Project: http://davis.wpi.edu/dsrg/CAPE/index.html