Multi-Query Scheduling for Time-Critical Data Stream Applications
Yongluan Zhou, Ji Wu and Ahmed Khan Leghari
University of Southern Denmark
25th International Conference on Scientific and Statistical Database Management

Scheduling in Data Stream Processing
Many data stream applications involve time-critical tasks, where timeliness of output delivery is crucial:
• Algorithmic trading
• Disaster early-warning
• Intrusion detection
In these applications, an efficient resource allocation scheme is indispensable.

Problem Formulation
System Model
• Each input tuple is associated with
  – a timestamp
  – a validity period
• Tuples have to be processed in order
• No tuple should be dropped
• For an output tuple
  – [symbol missing] is the time when the output tuple is produced
  – Utility: [formula missing]

Problem Definition (continued)
• The utility function for a query: [formula missing]
• In a multi-query environment, each query is associated with a weight
• The utility function over all the queries: [formula missing]

Job Scheduling Model
• NonLeaf-Job (NL-Job)
• Leaf-Job (L-Job)

Profit-Based Scheduling
• Prioritize jobs according to their profit densities
• Difficulty: assigning profits of non-leaf jobs
  – E.g.: sum of its downstream leaf operators' profits (Aurora scheduler [1])
[1] Donald Carney, et al. Operator scheduling in a data stream manager. In VLDB, 2003.

OptProfit Algorithm
Given the static job graph, the OptProfit algorithm guarantees that the produced job execution sequence is optimal.
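The profit-based prioritization described above can be illustrated with a small sketch. It is not the OptProfit algorithm; it only shows the Aurora-style rule mentioned on the slide, where a non-leaf job's profit is the sum of its downstream leaf jobs' profits. The `Job` class, its `cost` field, and the definition of profit density as profit divided by cost are assumptions made for illustration, since the slides do not define them.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    cost: float                      # hypothetical processing cost of the job
    profit: float = 0.0              # only meaningful for leaf jobs
    children: list = field(default_factory=list)


def downstream_leaves(job, seen=None):
    """Collect the distinct leaf jobs reachable from `job`, keyed by name."""
    seen = {} if seen is None else seen
    if not job.children:
        seen[job.name] = job
    for child in job.children:
        downstream_leaves(child, seen)
    return seen


def profit_density(job):
    """Assumed definition: summed downstream leaf profit per unit of cost."""
    total = sum(leaf.profit for leaf in downstream_leaves(job).values())
    return total / job.cost


def priority_order(jobs):
    """Order jobs from highest to lowest profit density."""
    return sorted(jobs, key=profit_density, reverse=True)


a = Job("a", cost=2.0, profit=6.0)            # density 3.0
b = Job("b", cost=1.0, profit=4.0)            # density 4.0
n = Job("n", cost=5.0, children=[a, b])       # profit 10.0, density 2.0
order = [job.name for job in priority_order([a, b, n])]
```

With these made-up numbers, the leaf job `b` is scheduled first and the non-leaf job `n` last. Leaves are collected by name so that a leaf shared by two paths is counted once.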
Deadline-Aware Strategies
• Profit-based scheduling is a static approach that fails to consider the input's actual deadline
• Unfortunately, it has been proven that no on-line algorithm can guarantee optimal performance
• Two dual heuristic approaches
  – Deadline-Dominant Strategy (DD)
  – Profit-Dominant Strategy (PD)

Deadline-Dominant Strategy (DD strategy)
• More urgent tasks are given higher priority to execute
• A job with high profit density has a chance to preempt another, more urgent job (controlled by a threshold)

Profit-Dominant Strategy (PD strategy)
• Prioritize jobs according to profit density first
• If jobs with lower profit density are about to expire, they may also preempt jobs with higher profit density values

Experimental Study
• Compare six scheduling algorithms (Basic, OptProfit, Basic+DD, OptProfit+DD, Basic+PD, OptProfit+PD) on a prototype system
• A round-robin scheduler is used as a baseline
Test Queries
• Randomly generated
• Query numbers: 20–28
• Each query has a weighting factor between 1 and 10
Test Data
• Real dataset: a trace collected from the Internet Traffic Archive
• Synthetic data: generated according to the b-model to simulate data bursts

Experimental Study
Input Load
• OptProfit generally performs better than other strategies, mainly due to its low scheduling overhead

Experimental Study
Tuple Urgency
• Deadline-aware strategies are generally advantageous

Experimental Study
Data Burstiness
• OptProfit+PD appears to be the best choice

Summary
• Revisited data stream scheduling for time-critical applications
• Model the problem from a job scheduling perspective
• The OptProfit algorithm can generate a schedule with maximum benefit
• OptProfit works reasonably well when workload is high or inputs are not very “urgent”
• Deadline-aware strategies combined with the OptProfit algorithm perform the best in general
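The two deadline-aware heuristics (DD and PD) described earlier can be contrasted in a rough sketch. The slides only say that preemption is "controlled by a threshold" and that jobs may be "about to expire", so the `Pending` class, the `slack` parameter, the `ratio` parameter, and the exact preemption tests below are all assumptions chosen for illustration, not the authors' definitions.

```python
from dataclasses import dataclass


@dataclass
class Pending:
    name: str
    density: float    # profit density of the job (assumed to be precomputed)
    deadline: float   # absolute time by which the job's output expires


def pick_pd(jobs, now, slack):
    """Profit-dominant sketch: highest density wins, unless some job
    expires within `slack` time units (assumed test), in which case the
    most urgent expiring job runs first."""
    expiring = [j for j in jobs if j.deadline - now <= slack]
    if expiring:
        return min(expiring, key=lambda j: j.deadline)
    return max(jobs, key=lambda j: j.density)


def pick_dd(jobs, ratio):
    """Deadline-dominant sketch: earliest deadline wins, unless the
    densest job is at least `ratio` times denser than the most urgent
    one (assumed threshold test)."""
    urgent = min(jobs, key=lambda j: j.deadline)
    best = max(jobs, key=lambda j: j.density)
    if best.density >= ratio * urgent.density:
        return best
    return urgent


jobs = [Pending("a", density=5.0, deadline=100.0),
        Pending("b", density=1.0, deadline=12.0)]
pd_early = pick_pd(jobs, now=0.0, slack=5.0).name    # no job is expiring yet
pd_late = pick_pd(jobs, now=10.0, slack=5.0).name    # "b" expires within slack
dd_strict = pick_dd(jobs, ratio=10.0).name           # density gap too small
dd_loose = pick_dd(jobs, ratio=3.0).name             # "a" is dense enough
```

The sketch makes the duality visible: PD defaults to profit density and lets urgency override it, while DD defaults to urgency and lets profit density override it.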