Multi-Query Scheduling for Time-Critical Data Stream Applications
Yongluan Zhou, Ji Wu and Ahmed Khan Leghari
University of Southern Denmark
25th International Conference on Scientific and Statistical Database Management

Scheduling in Data Stream Processing
Many data stream applications involve time-critical tasks, where timeliness of output delivery is crucial
• Algorithmic trading
• Disaster early-warning
• Intrusion detection
In these applications, an efficient resource allocation scheme is indispensable

Problem Formulation
System Model
• Each input tuple is associated with
– a timestamp
– a validity period
• Tuples have to be processed in order
• No tuple should be dropped
• For an output tuple
– an output time: the time when the tuple is produced
– Utility: [formula not recoverable]
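
A minimal sketch of the tuple model, for illustration only. It assumes that a tuple's deadline is its timestamp plus its validity period, which the slides do not state explicitly. The names `StreamTuple`, `deadline` and `is_timely` are hypothetical, and the utility formula itself is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamTuple:
    """Hypothetical representation of an input tuple."""
    timestamp: float        # time attached to the input tuple
    validity_period: float  # how long results derived from it stay useful

    @property
    def deadline(self) -> float:
        # Assumption: deadline = timestamp + validity period.
        return self.timestamp + self.validity_period


def is_timely(source: StreamTuple, output_time: float) -> bool:
    """True if an output derived from `source` is produced by its deadline."""
    return output_time <= source.deadline
```

For example, a tuple stamped at time 10 with a validity period of 5 would have a deadline of 15 under this assumption.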

Job Scheduling Model
NonLeaf-Job (NL-Job)
Leaf-Job (L-Job)

Problem Definition (continued)
• The utility function for a query: [formula not recoverable]
• In a multi-query environment
– Each query is associated with a weight
– The utility function over all the queries: [formula not recoverable]
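
The multi-query formula did not survive in the transcript. One common way to combine per-query utilities with per-query weights is a weighted sum, and the sketch below shows only that assumption; it is not necessarily the authors' definition. The function name `overall_utility` is hypothetical.

```python
def overall_utility(query_utilities, weights):
    """Weighted sum of per-query utilities (an assumed aggregation rule)."""
    if len(query_utilities) != len(weights):
        raise ValueError("need exactly one weight per query")
    return sum(w * u for w, u in zip(weights, query_utilities))
```

With utilities 0.5 and 1.0 and weights 2 and 4, this rule would give 2 × 0.5 + 4 × 1.0 = 5.0.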

Profit-Based Scheduling
• Prioritize jobs according to their profit densities
• Difficulty: assigning profits of non-leaf jobs
– E.g.: Sum of its downstream leaf operators’ profits (Aurora scheduler [1])
[1] Donald Carney, et al. Operator scheduling in a data stream manager. In VLDB, 2003.

OptProfit Algorithm
Given the static job graph, the OptProfit algorithm guarantees that the produced job execution sequence is optimal.
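
To make the profit-density idea concrete, here is a small sketch of the Aurora-style assignment mentioned as an example, where a non-leaf job's profit is the sum of its downstream leaf jobs' profits. It is not the OptProfit algorithm, whose details are not given in the slides. It assumes profit density means profit divided by processing cost, and it ignores precedence constraints between jobs. All names (`downstream_leaves`, `rank_by_profit_density`, the dictionaries) are hypothetical.

```python
def downstream_leaves(job, children, cache):
    """Set of leaf jobs reachable from `job` in the job graph."""
    if job not in cache:
        kids = children.get(job, [])
        if not kids:
            cache[job] = frozenset([job])  # a job without children is a leaf
        else:
            cache[job] = frozenset().union(
                *(downstream_leaves(k, children, cache) for k in kids)
            )
    return cache[job]


def rank_by_profit_density(children, leaf_profit, cost):
    """Jobs ordered by decreasing profit density (assumed: profit / cost)."""
    cache = {}
    density = {}
    for job in cost:
        leaves = downstream_leaves(job, children, cache)
        profit = sum(leaf_profit[leaf] for leaf in leaves)
        density[job] = profit / cost[job]
    return sorted(density, key=lambda j: density[j], reverse=True)


# Tiny example graph: non-leaf job "a" feeds leaf jobs "b" and "c".
children = {"a": ["b", "c"], "b": [], "c": []}
leaf_profit = {"b": 4.0, "c": 2.0}
cost = {"a": 3.0, "b": 1.0, "c": 4.0}
order = rank_by_profit_density(children, leaf_profit, cost)
```

In this example the densities are 2.0 for `a`, 4.0 for `b` and 0.5 for `c`, so `b` ranks first even though it depends on `a`. A real scheduler would have to respect that dependency, which is part of the difficulty the slide points out.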

Deadline-Aware Strategies
• Profit-based scheduling is a static approach
– fails to consider the input’s actual deadline
• Unfortunately, it has been proven that no on-line algorithm can guarantee the optimal performance
• Two heuristics (dual approaches)
– Deadline-Dominant Strategy (DD)
– Profit-Dominant Strategy (PD)

Deadline-Dominant Strategy (DD strategy)
• More urgent tasks are given higher priority to execute
• A job with high profit-density has a chance to preempt another more urgent job (controlled by a threshold)

Profit-Dominant Strategy (PD strategy)
• Prioritize jobs according to profit density first.
• If jobs with less profit density are about to expire, they may also preempt jobs with higher profit density values.
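
The two heuristics can be illustrated with a rough sketch. The slides do not define how the threshold is applied or what "about to expire" means, so both rules below are assumptions: DD lets a job preempt the most urgent one when its profit density is more than `threshold` times higher, and PD lets a job preempt when its remaining time is at most `slack_threshold`. The `Job` class and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    profit_density: float
    deadline: float


def pick_deadline_dominant(jobs, threshold):
    """DD sketch: run the most urgent job unless another job's profit
    density exceeds `threshold` times that of the most urgent job."""
    urgent = min(jobs, key=lambda j: j.deadline)
    richest = max(jobs, key=lambda j: j.profit_density)
    if richest.profit_density > threshold * urgent.profit_density:
        return richest
    return urgent


def pick_profit_dominant(jobs, now, slack_threshold):
    """PD sketch: run the job with the highest profit density unless some
    job has at most `slack_threshold` time left; then run the most urgent
    of those expiring jobs."""
    expiring = [j for j in jobs if j.deadline - now <= slack_threshold]
    if expiring:
        return min(expiring, key=lambda j: j.deadline)
    return max(jobs, key=lambda j: j.profit_density)


jobs = [Job("a", profit_density=5.0, deadline=100.0),
        Job("b", profit_density=1.0, deadline=12.0)]
```

Under these assumed rules, DD with a threshold of 10 picks the urgent job `b`, while a threshold of 2 lets `a` preempt it. PD picks `a` at time 0 but switches to `b` at time 10, when `b` has only 2 time units left against a slack threshold of 5.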

Experimental Study
• Compare six scheduling algorithms (Basic, OptProfit, Basic+DD, OptProfit+DD, Basic+PD, OptProfit+PD) on a prototype system
• A round-robin scheduler is used as a baseline
• Test Queries
– Randomly generated
– Number of queries: 20 to 28
– Each query has a weighting factor between 1 and 10
• Test Data
– Real dataset: a trace collected from the Internet Traffic Archive
– Synthetic data: generated according to the b-model to simulate data bursts

Experimental Study
Input Load
OptProfit generally performs better than other strategies, mainly due to its low scheduling overhead
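
For readers unfamiliar with the b-model, the sketch below shows its usual construction: a total volume is split recursively over time, with a fraction `bias` going to one randomly chosen half and the rest to the other. A bias of 0.5 gives uniform traffic, and values closer to 1 give burstier traffic. The parameter values used in the experiments are not given in the slides; the function name and arguments here are hypothetical.

```python
import random


def b_model(total, levels, bias, rng):
    """Spread `total` volume over 2**levels time slots using the b-model."""
    series = [float(total)]
    for _ in range(levels):
        nxt = []
        for volume in series:
            left = volume * bias
            right = volume - left
            # Randomly decide which half receives the larger share.
            if rng.random() < 0.5:
                left, right = right, left
            nxt.extend((left, right))
        series = nxt
    return series


# Example: 1000 tuples over 16 time slots with a fairly bursty bias.
slots = b_model(total=1000, levels=4, bias=0.8, rng=random.Random(0))
```

The slot volumes always sum to the original total, and the busiest slot receives a `bias ** levels` share of it, regardless of the random choices.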

Experimental Study
Tuple Urgency
Data Burstiness
• Deadline-aware strategies are generally advantageous
• OptProfit+PD appears to be the best choice

Summary
• Revisited data stream scheduling for time-critical applications
• Modeled the problem from a job scheduling perspective
• The OptProfit algorithm can generate a schedule with maximum benefit
• OptProfit works reasonably well when workload is high or inputs are not very “urgent”
• Deadline-aware strategies combined with the OptProfit algorithm perform the best in general