NiagaraCQ
A Scalable Continuous Query System
for Internet Databases
Outline
1. Problem
2. NiagaraCQ
3. Selection Placement Strategies
4. Dynamic Regrouping Algorithm
Problem
There is no scalable, efficient system that supports persistent
queries, which let users receive new results as they become
available. Example:
Notify me whenever the price of Dell stock drops by more
than 5% and the price of Intel stock remains unchanged over
the next three months.
NiagaraCQ
Supports continuous queries
Change-based queries
Timer-based queries
Scalability
Performance
Suited to the Internet
User interface: a high-level query language
Command Language
Create continuous query:
CREATE CQ_name
XML-QL query
DO action
{START start_time} {EVERY time_interval}
{EXPIRE expiration_time}
Delete continuous query:
DELETE CQ_name
Expression Signature
An expression signature represents the same syntactic structure,
possibly with different constant values, across different queries.
Where <Quotes> <Quote>
<Symbol>INTC</>
</> </> element_as $g
in “http://www.cs.wisc.edu/db/quotes.xml”
construct $g
Where <Quotes> <Quote>
<Symbol>MSFT</>
</> </> element_as $g
in “http://www.cs.wisc.edu/db/quotes.xml”
construct $g
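The idea that the two queries above share one signature can be sketched in a few lines. This is only an illustration, not NiagaraCQ's implementation: the `Selection` tuple and the `signature` function are hypothetical names, and a selection is assumed to be fully described by a path, an operator, a constant, and a data source.

```python
from typing import NamedTuple

class Selection(NamedTuple):
    """Hypothetical model of a selection predicate in a query."""
    path: str      # e.g. "Quotes.Quote.Symbol"
    op: str        # comparison operator
    constant: str  # the query-specific constant
    source: str    # data source the predicate is applied to

def signature(sel: Selection) -> tuple:
    """Drop the constant: queries differing only in it share a signature."""
    return (sel.path, sel.op, sel.source)

SOURCE = "http://www.cs.wisc.edu/db/quotes.xml"
q_intc = Selection("Quotes.Quote.Symbol", "=", "INTC", SOURCE)
q_msft = Selection("Quotes.Quote.Symbol", "=", "MSFT", SOURCE)

same_signature = signature(q_intc) == signature(q_msft)
```

Under this model the INTC and MSFT queries compare equal by signature while remaining distinct queries.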
Expression Signature (2)
[Figure: signature tree with root "=", whose operands are Quotes.Quote.Symbol (in quotes.xml) and a constant placeholder]
Query Plan
[Figure: two separate query plans, read bottom-up]
File Scan quotes.xml -> Select Symbol=“INTC” -> Trigger Action I
File Scan quotes.xml -> Select Symbol=“MSFT” -> Trigger Action J
Group Signature
Common expression signature of all queries in the
group
[Figure: signature tree with root "=", whose operands are Quotes.Quote.Symbol (in quotes.xml) and a constant placeholder]
Group Constant Table
Constant_value    Destination_buffer
…                 …
INTC              Dest. I
MSFT              Dest. J
…                 …
Group Plan
[Figure: group plan, read bottom-up]
File Scan of quotes.xml and scan of the Constant Table
-> Join (Symbol = Constant_value)
-> Split
-> Trigger Action I, Trigger Action J, …
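A minimal sketch of how such a group plan could route tuples, assuming the constant table is an in-memory dictionary and destination buffers are plain lists. The function name `run_group_plan` and the sample tuples are hypothetical; the real system works on XML documents and files.

```python
from collections import defaultdict

# Constant table from the slide: constant value -> destination buffers.
constant_table = {"INTC": ["Dest.I"], "MSFT": ["Dest.J"]}

def run_group_plan(tuples, table):
    """Scan once, join on Symbol = Constant_value, split by destination."""
    buffers = defaultdict(list)
    for row in tuples:                              # File Scan
        for dest in table.get(row["Symbol"], []):   # Join with constant table
            buffers[dest].append(row)               # Split
    return dict(buffers)

# Hypothetical contents of quotes.xml, flattened to dictionaries.
quotes = [
    {"Symbol": "INTC", "Price": 95},
    {"Symbol": "AOL", "Price": 60},
    {"Symbol": "MSFT", "Price": 120},
]
routed = run_group_plan(quotes, constant_table)
```

The data source is scanned once for the whole group, and tuples whose symbol matches no registered constant (AOL here) reach no buffer.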
Incremental Grouping Algorithm
1. The group optimizer traverses the query plan bottom-up.
2. It matches the query’s expression signature against the signatures of existing groups.
[Figure: new query plan, read bottom-up: File Scan quotes.xml -> Select Symbol=“AOL” -> Trigger Action]
Incremental Grouping Algorithm (2)
3. The group optimizer breaks the query plan into two parts.
Lower part: removed
Upper part: added onto the group plan
4. It adds the constant to the constant table.
[Figure: the AOL query plan (File Scan quotes.xml -> Select Symbol=“AOL” -> Trigger Action) split into a lower and an upper part]
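The four steps can be sketched as follows, assuming each group is keyed by its signature and owns one constant table. The names `groups` and `add_query` and the buffer label `Dest.K` are hypothetical, and the plan rewriting itself (removing the lower part, attaching the upper part) is reduced to a table update.

```python
# Hypothetical registry: group signature -> constant table
# (constant value -> list of destination buffers).
groups = {}

def add_query(signature, constant, destination):
    """Attach a new query to a matching group, or start a new group.

    Returns True when an existing group was reused.
    """
    reused = signature in groups
    table = groups.setdefault(signature, {})
    table.setdefault(constant, []).append(destination)
    return reused

sig = ("Quotes.Quote.Symbol", "=", "quotes.xml")
first = add_query(sig, "INTC", "Dest.I")   # creates the group
second = add_query(sig, "MSFT", "Dest.J")  # reuses it
third = add_query(sig, "AOL", "Dest.K")    # reuses it, adds constant AOL
```

Only the first query with a given signature pays for a new group; later ones add a row to the constant table.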
Pipeline Approach
Tuples are pipelined from the output of one
operator into the input of the next operator.
Disadvantages
Doesn’t work for grouping timer-based queries.
Split operator may become a bottleneck.
Not every part of the group plan needs to be executed each time.
Intermediate Files
Intermediate Files (2)
Advantages
Intermediate files and data sources are monitored uniformly.
Each query is scheduled independently.
The potential bottleneck problem of the pipelined approach is avoided.
Disadvantages
Extra disk I/Os.
Split operator becomes a blocking operator.
Virtual Intermediate Files
Where <Quotes> <Quote>
<Change_ratio>$c</>
</> </> element_as $g
in “quotes.xml”, $c>0.05
construct $g
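Overlap: the results of the query above and the query below overlap (any $c > 0.15 also satisfies $c > 0.05).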
Where <Quotes> <Quote>
<Change_ratio>$c</>
</> </> element_as $g
in “quotes.xml”, $c>0.15
construct $g
[Figure: signature tree with root ">", whose operands are Quotes.Quote.Change_Ratio (in quotes.xml) and a constant placeholder]
Virtual Intermediate Files (2)
All outputs from the split operator are stored in one
real intermediate file.
This file has an index on the range attribute.
Each virtual intermediate file stores only a value range.
Modification of a virtual intermediate file can
trigger upper-level queries.
The value range is used to retrieve data from the
real intermediate file.
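A small sketch of the idea, assuming the real intermediate file is a list kept sorted on the range attribute (standing in for the index) and each virtual file is just a lower bound. The names `real_file`, `virtual_files`, `read_virtual` and the sample symbols are hypothetical.

```python
import bisect

# One real intermediate file, sorted on the range attribute (Change_ratio).
real_file = [(0.02, "AAA"), (0.07, "BBB"), (0.16, "CCC"), (0.30, "DDD")]
ratios = [ratio for ratio, _ in real_file]  # plays the role of the index

# Each virtual intermediate file stores only its value range,
# here an exclusive lower bound taken from the two example queries.
virtual_files = {"Q1": 0.05, "Q2": 0.15}

def read_virtual(name):
    """Use the stored range to fetch matching rows from the real file."""
    start = bisect.bisect_right(ratios, virtual_files[name])
    return [symbol for _, symbol in real_file[start:]]

q1_rows = read_virtual("Q1")
q2_rows = read_virtual("Q2")
```

The overlapping rows (CCC and DDD) are stored once and served to both queries.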
Event Detection
Types of Events
Data-source change
Timer
Types of data sources
Push-based
Pull-based
Timer-based
Timer events are stored in an event list, sorted in
time order.
Each entry stores the ids of the queries due at that time.
A query is fired only if its data source has been
modified since its last firing time.
After a timer event, the next firing times are
calculated and the queries are added to the
corresponding entries.
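The event list can be sketched with a binary heap. This is a simplification under stated assumptions: one heap item per (time, query id) pair instead of one entry holding several ids, integer timestamps, and hypothetical names throughout (`process_timer_events`, `interval`, `source_of`).

```python
import heapq

def process_timer_events(event_list, now, interval, source_of,
                         last_modified, last_fired):
    """Pop all due timer events, fire eligible queries, reschedule each one."""
    fired = []
    while event_list and event_list[0][0] <= now:
        fire_time, query_id = heapq.heappop(event_list)
        # Fire only if the data source changed since the last firing.
        if last_modified[source_of[query_id]] > last_fired[query_id]:
            fired.append(query_id)
            last_fired[query_id] = fire_time
        # Compute the next firing time and put the query back in the list.
        heapq.heappush(event_list, (fire_time + interval[query_id], query_id))
    return fired

event_list = [(10, "q1"), (10, "q2")]
heapq.heapify(event_list)
interval = {"q1": 10, "q2": 10}               # assumed positive
source_of = {"q1": "quotes.xml", "q2": "profiles.xml"}
last_modified = {"quotes.xml": 5, "profiles.xml": 0}
last_fired = {"q1": 0, "q2": 0}

fired = process_timer_events(event_list, 10, interval, source_of,
                             last_modified, last_fired)
```

Here q2 is due but skipped, because profiles.xml has not changed since q2 last fired; both queries are rescheduled for time 20.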
Incremental Evaluation
Queries are invoked only on changed data.
For each file, NiagaraCQ keeps a “delta file”.
Queries are run over delta files.
Incremental evaluation of join operators
requires complete data files.
A timestamp is added to each tuple in order to
support timer-based queries.
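A sketch of why the timestamp matters, assuming a delta file is a list of (timestamp, tuple) pairs. The function name `changes_between` and the sample rows are hypothetical.

```python
def changes_between(delta_file, last_fire_time, current_fire_time):
    """Return the changed tuples a timer-based query has not seen yet."""
    return [row for stamp, row in delta_file
            if last_fire_time < stamp <= current_fire_time]

# Hypothetical delta file for quotes.xml: (timestamp, changed tuple).
delta_file = [
    (5, {"Symbol": "INTC", "Price": 95}),
    (12, {"Symbol": "MSFT", "Price": 120}),
    (25, {"Symbol": "AOL", "Price": 60}),
]

# A query that last fired at time 10 and fires again at time 20.
new_rows = changes_between(delta_file, 10, 20)
```

Queries with different firing intervals can share one delta file, since each selects its own window of timestamps.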
Memory Caching
Query plans: cached using an LRU policy that favors
frequently fired queries.
Data files: caching favors the delta files.
Event list: only a “time window” of the list is kept in memory.
System Architecture
Continuous Query Processing
[Figure: interaction between the Continuous Query Manager (CQM), Event Detector (ED), Data Manager (DM), and Query Engine (QE); the numbered arrows are listed below]
1. CQM adds continuous queries with file and timer information to enable ED to monitor the events.
2. ED asks DM to monitor changes to files.
3. When a timer event happens, ED asks DM for the last modified time of files.
4. DM informs ED of changes to push-based data sources.
5. If file changes and timer events are satisfied, ED provides CQM with a list of firing CQs.
6. CQM invokes QE to execute firing CQs.
7. File scan operator calls DM to retrieve selected documents.
8. DM only returns changes between last fire time and current fire time.
Selection Placement Strategies
Where <Quotes><Quote><Symbol>$s</> <Price>$p</></>
element_as $g </> in “quotes.xml”, $p > 90
<Companies><Company><Symbol>$s</></>
element_as $t</> in “profiles.xml” construct $g, $t
Where <Quotes><Quote><Symbol>$s</> <Price>$p</></>
element_as $g </> in “quotes.xml”, $p > 100
<Companies><Company><Symbol>$s</></>
element_as $t</> in “profiles.xml” construct $g, $t
Expression Signatures
[Figure: two expression signatures]
Selection signature: Quotes.Quote.Price > constant, in quotes.xml
Join signature: Symbol = Symbol, between quotes.xml and profiles.xml
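Where to place the selection operator?
Below the join - PushDown
$(\sigma_1 R \bowtie S) \cup (\sigma_2 R \bowtie S) \cup \dots \cup (\sigma_n R \bowtie S)$
Above the join - PullUp
$\sigma_1(R \bowtie S) \cup \sigma_2(R \bowtie S) \cup \dots \cup \sigma_n(R \bowtie S)$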
PullUp achieves an average 10-fold performance
improvement over PushDown.
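The two placements can be compared on toy data. The sketch below only shows that both return the same answers while PullUp performs a single shared join. It says nothing about the measured speedup. The relation contents and the function names `push_down` and `pull_up` are hypothetical.

```python
quotes = [{"Symbol": "INTC", "Price": 95},
          {"Symbol": "MSFT", "Price": 120},
          {"Symbol": "AOL", "Price": 60}]
profiles = [{"Symbol": "INTC"}, {"Symbol": "MSFT"}, {"Symbol": "AOL"}]
thresholds = [90, 100]  # constants of the two example queries

def join(r, s):
    """Equi-join on Symbol."""
    return [(q, p) for q in r for p in s if q["Symbol"] == p["Symbol"]]

def push_down(limits):
    """Select below the join: one join per distinct selection."""
    return [join([q for q in quotes if q["Price"] > t], profiles)
            for t in limits]

def pull_up(limits):
    """Select above the join: one shared join, then one selection per query."""
    joined = join(quotes, profiles)
    return [[pair for pair in joined if pair[0]["Price"] > t] for t in limits]

pd_result = push_down(thresholds)
pu_result = pull_up(thresholds)
```

Both strategies return INTC and MSFT for Price > 90 and only MSFT for Price > 100.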
PushDown - Query Plan
[Figure: PushDown query plan. quotes.xml -> Select Price>90 -> Join with profiles.xml]
PushDown - Group Plans
PullUp - Group Plans
PullUp Vs. PushDown
Advantages of PullUp
Only one join group and one selection group
Maintains a single intermediate file
Disadvantages of PullUp
Irrelevant tuples are joined
Very large intermediate file
Changes in profiles.xml affect the intermediate
file (file_k), which adds maintenance overhead.
Filtered PullUp
[Figure: grouped join plan for Filtered PullUp. quotes.xml passes through a Selection (Price>90) before the Join with profiles.xml]
Filtered PullUp Vs. PullUp
Advantages
Only relevant tuples are joined
Reduces the size of the intermediate file
Reduces the cost of PullUp by 75%
Disadvantages
Complexity: the selection predicate may need
to be dynamically modified (for example, when a query with
price>70 is added)
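A sketch of Filtered PullUp on toy data, assuming every query in the group has the form Price > constant, so the shared filter is simply the smallest constant. The function name `filtered_pull_up` and the data are hypothetical.

```python
quotes = [{"Symbol": "INTC", "Price": 95},
          {"Symbol": "MSFT", "Price": 120},
          {"Symbol": "AOL", "Price": 80},
          {"Symbol": "DELL", "Price": 60}]
profiles = [{"Symbol": s} for s in ("INTC", "MSFT", "AOL", "DELL")]

def filtered_pull_up(limits):
    """Filter with the weakest predicate, join once, then select per query."""
    weakest = min(limits)                      # shared filter below the join
    kept = [q for q in quotes if q["Price"] > weakest]
    joined = [(q, p) for q in kept for p in profiles
              if q["Symbol"] == p["Symbol"]]   # the intermediate file
    per_query = {t: [q["Symbol"] for q, _ in joined if q["Price"] > t]
                 for t in limits}
    return weakest, joined, per_query

filter_a, file_a, results_a = filtered_pull_up([90, 100])
# Adding a query with Price > 70 forces the shared filter to be loosened.
filter_b, file_b, results_b = filtered_pull_up([90, 100, 70])
```

The second call illustrates the complexity noted above: once the Price > 70 query joins the group, the filter drops from 90 to 70 and the intermediate file grows.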
Dynamic Re-grouping
Let Q1 ($A \bowtie B \bowtie C$) and Q2 ($B \bowtie C$) be two
continuous queries submitted sequentially.
For Q1, the incremental grouping algorithm chooses the plan
$((A \bowtie B) \bowtie C)$.
Neither of the resulting groups can be used for Q2.
[Figure: query graphs over the nodes ABC, AB, and BC]
Dynamic Re-grouping (2)
Existing queries are not regrouped when subsequent queries
introduce new grouping opportunities.
Overall performance degrades because queries
are continuously being added and removed.
Naive regrouping algorithm: periodically
perform a global query optimization.
Expensive
Redundant work (already done by incremental optimization)
Data Structures
A query graph – directed acyclic graph, with each node
representing an existing join expression in the group plan.
Node {
    char* query;              //ASCII query plan
    SIG_TYPE sig;             //signature of the query string
    int final_node_count;     //number of users that require this query.
                              //0: non-final node; >0: final node
    list<Child*> children;    //children of this node, where Child={Node*, weight}
    list<Node*> parents;      //parents of this node
    float updateFreq;         //update frequency of this node
    float cost;               //the cost for computing this node
    //Following data structures used only for dynamic regrouping
    int reference_count;      //reference count
    bool visited;             //a flag that records whether
                              //purgeSiblings has been performed on this node
}
Data Structures (2)
A group table – array of hash tables.
i-th hash table - queries with query length
(number of joins) i.
Hash table entry - mapping from a query string
to the corresponding node in the graph.
[Figure: Array -> Hash table -> Node]
Data Structures (3)
A query log – array of vectors.
Stores new nodes that have been added since the
last regrouping.
Cleared after regrouping.
[Figure: Array -> Vector -> Node]
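The group table and query log can be sketched together, assuming a fixed upper bound on the number of joins and using a Python dictionary in place of each hash table. `MAX_JOINS`, `add_query`, and `clear_log` are hypothetical names, and the `Node` class carries only a subset of the fields listed earlier.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Node:
    """Reduced version of the query-graph node (subset of its fields)."""
    query: str
    final_node_count: int = 0
    children: list = field(default_factory=list)
    parents: list = field(default_factory=list)

MAX_JOINS = 10  # assumed upper bound on query length

# group_table[i]: query string -> node, for queries with i joins.
group_table = [dict() for _ in range(MAX_JOINS + 1)]
# query_log[i]: nodes with i joins added since the last regrouping.
query_log = [[] for _ in range(MAX_JOINS + 1)]

def add_query(query, num_joins):
    """Register a user query; create and log its node if it is new."""
    node = group_table[num_joins].get(query)
    if node is None:
        node = Node(query)
        group_table[num_joins][query] = node
        query_log[num_joins].append(node)
    node.final_node_count += 1
    return node

def clear_log():
    """Called after regrouping."""
    for entries in query_log:
        entries.clear()

first = add_query("A join B join C", 2)
second = add_query("A join B join C", 2)  # same query from another user
```

Submitting the same query twice creates one node with a final node count of 2, and the log holds that node until the next regrouping.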
Incremental Grouping Algorithm
Top-down local exhaustive search:
If the query already exists, increase its final node count by 1.
Else:
Enumerate all possible sub-queries in a top-down
manner and probe the group table to check whether each
sub-query node exists.
Compute the minimal cost of a plan that uses existing sub-query
nodes.
Compute the minimal cost of a plan that does not use existing sub-query nodes.
Choose the least costly plan.
Dynamic Regrouping Algorithm
Phase 1: construct links among existing
nodes and new nodes.
Phase 2: find a minimal-weighted solution from
the current solution by removing redundant
nodes.
[Figure: query graph with nodes ABC, AB, and BC]
Phase 1: constructing links among existing nodes and new nodes
Main idea: for any pair of nodes in the graph,
if one node is a sub-query of the other, create
a link between them unless one already exists.
Relationships are only evaluated between
existing nodes and nodes added since the last
regrouping.
The difference of levels between a parent and a
child is always 1.
Phase 1 - Algorithm
bottom-up:
for each node in level i query log
    if node has parents in level i+1 group table
        connect node to parent
    if node has children in level i-1 group table
        connect node to children
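A runnable sketch of this loop under simplifying assumptions: each node is modeled as a frozenset of relation names, "sub-query" is approximated by the proper-subset test (join predicates are ignored), and links are stored as (parent, child) pairs. The function name `connect_new_nodes` is hypothetical.

```python
def connect_new_nodes(group_table, query_log, links):
    """Link each newly logged node to its parents and children.

    group_table: level -> set of nodes (level = number of joins)
    query_log:   level -> list of nodes added since the last regrouping
    links:       set of (parent, child) pairs, updated in place
    """
    for level in sorted(query_log):                    # bottom-up
        for node in query_log[level]:
            for parent in group_table.get(level + 1, ()):
                if node < parent:                      # node is a sub-query
                    links.add((parent, node))
            for child in group_table.get(level - 1, ()):
                if child < node:
                    links.add((node, child))
    return links

ABC, AB, BC = frozenset("ABC"), frozenset("AB"), frozenset("BC")
group_table = {1: {AB, BC}, 2: {ABC}}   # BC was just added for Q2
query_log = {1: [BC]}
links = connect_new_nodes(group_table, query_log, {(ABC, AB)})
```

On the running example, the new BC node is linked under the existing ABC node, the grouping opportunity that incremental grouping missed.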
Phase 2: A greedy algorithm for level-wise graph minimization
Main idea: traverse the query graph level by level and attempt to remove redundant
nodes one level at a time.
Starts from the second level from the top.
A subset of the level-i nodes is retained such that:
Every node at level i+1 has at least one child in this subset.
The retained nodes have a minimum total cost.
Nodes that are not selected are removed
permanently.
Phase 2 - Algorithm
MinimizeGraph() {
    for each level L in group-table {
        // L ranging from the maximum number of joins - 1 to 1
        for each node N in the level-L group table
            InitializeSet(N)
        for each node N in final_set
            PurgeSiblings(N);
        while (remain set is not empty) {
            M = none; Current_minimum = +infinity   // reset on every pass
            scan each node R in the remain set {
                if (R’s reference count == 0) {
                    remove R from the remain set
                    deleteNode(R)
                }
                else if (R.cost/R.reference_count < Current_minimum) {
                    M = R
                    Current_minimum = R.cost/R.reference_count;
                }
            } //scan …
            if (M was found) {
                remove M from the remain set
                PurgeSiblings(M)
            }
        } //while…
    } //for each level …
} //MinimizeGraph

InitializeSet(Node N) {
    if N is a final node
        add N into final_set
    else {
        add N into the remain_set
        N.reference_count = number of parents of N
    }
    N.visited = false
}

PurgeSiblings(Node N) {
    for each parent P of N {
        if (!P.visited) {
            decrease the reference count of
            N’s siblings of same parent P by 1
            P.visited = true
        }
    }
}
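The greedy step for a single level can be sketched in Python as below. It follows the pseudocode's structure but is not the original implementation: the `Node` class is a reduced stand-in, the `final` flag replaces `final_node_count > 0`, and the minimum is reset on every pass, which the pseudocode leaves implicit.

```python
import math
from dataclasses import dataclass, field

@dataclass(eq=False)
class Node:
    name: str
    cost: float = 0.0
    final: bool = False
    parents: list = field(default_factory=list)
    children: list = field(default_factory=list)
    reference_count: int = 0
    visited: bool = False

def link(parent, child):
    parent.children.append(child)
    child.parents.append(parent)

def purge_siblings(node):
    """Mark node's parents as covered; siblings lose one reference each."""
    for parent in node.parents:
        if not parent.visited:
            for sibling in parent.children:
                if sibling is not node:
                    sibling.reference_count -= 1
            parent.visited = True

def minimize_level(level_nodes):
    """Greedily keep a cheap subset of one level that covers all parents."""
    final_set, remain_set, deleted = [], [], []
    for n in level_nodes:                       # InitializeSet
        if n.final:
            final_set.append(n)
        else:
            remain_set.append(n)
            n.reference_count = len(n.parents)
        n.visited = False
    for n in final_set:
        purge_siblings(n)
    kept = list(final_set)
    while remain_set:
        best, best_ratio = None, math.inf
        for r in list(remain_set):
            if r.reference_count == 0:          # no uncovered parent needs r
                remain_set.remove(r)
                for p in r.parents:
                    p.children.remove(r)
                deleted.append(r)
            elif r.cost / r.reference_count < best_ratio:
                best, best_ratio = r, r.cost / r.reference_count
        if best is None:
            break
        remain_set.remove(best)
        kept.append(best)
        purge_siblings(best)
    return kept, deleted

# Hypothetical level: parents P1 and P2 can each be computed from Y.
p1, p2 = Node("P1"), Node("P2")
x, y, z = Node("X", cost=5), Node("Y", cost=4), Node("Z", cost=5)
for parent, child in [(p1, x), (p1, y), (p2, y), (p2, z)]:
    link(parent, child)
kept, deleted = minimize_level([x, y, z])
```

In this example Y has the lowest cost per covered parent, so it is kept and X and Z become redundant.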
Cost Analysis
N = number of queries
The number of nodes is proportional to the number
of queries: C*N
Each query contains no more than 10 joins.
Each level contains about C*N/10 nodes.
Cost Analysis – Phase 1
R or K*R = regrouping frequencies (one regrouping per R, or K*R, new queries)
With frequency R:
N/R = number of regroupings
C*R = number of new nodes that will be joined with
existing nodes in each regrouping.
m*C*R = number of nodes after m-1
regroupings.
$m (CR)^2$ = number of comparisons for the m-th
regrouping (ignoring a constant reduction).
Cost Analysis – Phase 1 (2)
Total number of comparisons, frequency R:
$(CR)^2 + 2(CR)^2 + \dots + \frac{N}{R}(CR)^2 = \frac{N(N+R)C^2}{2} = O(N^2)$
Total number of comparisons, frequency K*R:
$(CKR)^2 + \dots + \frac{N}{KR}(CKR)^2 = \frac{N(N+KR)C^2}{2}$
The ratio:
$\frac{N(N+KR)C^2/2}{N(N+R)C^2/2} = \frac{N+KR}{N+R}$
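The closed forms can be checked numerically. The sketch below sums the per-regrouping comparison counts directly and compares them with the formula. The values of N, R, K, and C are arbitrary illustrative choices, picked so that R and K*R divide N.

```python
def total_comparisons(n, r, c):
    """Sum m*(c*r)^2 over the n/r regroupings."""
    return sum(m * (c * r) ** 2 for m in range(1, n // r + 1))

def closed_form(n, r, c):
    """N(N+R)C^2/2; exact in integers whenever r divides n."""
    return n * (n + r) * c * c // 2

N, R, K, C = 1000, 10, 10, 2  # illustrative values only

sum_r = total_comparisons(N, R, C)
sum_kr = total_comparisons(N, K * R, C)
ratio = sum_kr / sum_r
```

For these values the ratio is (N+KR)/(N+R) = 1100/1010, so regrouping ten times less often changes the Phase 1 comparison count by only about 9%.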
Cost Analysis – Phase 2
Worst case: each pass removes one node.
Cost for a level:
$\frac{CN}{10} + \left(\frac{CN}{10} - 1\right) + \dots + 1 = \frac{CN(CN+10)}{200} = O(N^2)$
Purge siblings:
$\frac{CN}{10} \cdot \frac{CN}{10} = \frac{(CN)^2}{100} = O(N^2)$
All 9 levels: $O(N^2)$
References
NiagaraCQ: A Scalable Continuous Query System for Internet Databases
http://www.cs.wisc.edu/niagara/papers/NiagaraCQ.pdf
Design and Evaluation of Alternative Selection Placement Strategies in Optimizing Continuous Queries
http://www.cs.wisc.edu/niagara/papers/Icde02.pdf
Dynamic Re-grouping of Continuous Queries
http://www.cs.wisc.edu/niagara/papers/507.pdf