Progress on Continuous Data Stream Processing

Continuous Data Stream Processing
MAKE Lab
Date: 2006/03/07
Post-Excellence Project
Subproject 6
Music Virtual Channel
[Figure: system architecture. Labels: peer search engine, clustering engine, profile database, Internet, cluster coordinator, interface, channel monitor, profile monitor, cluster monitor, V.C. players, favorite channels 1 … N, music channel simulator, XML filtering engine, MusicXML, music database, metadata, music collections.]
Research Directions
Streaming Data Management
• Filtering
  • Temporal Query Processing
    • Sequence Query Matching
    • Episode Query Matching
  • Spatial Query Processing
    • Range Search
    • KNN Search
  • Aggregate Query Processing
    • Top-K Search
• Mining
  • Frequent Tree Pattern Mining
  • Closed Tree Pattern Mining
  • Frequent Itemset Mining (sliding window)
  • Frequent Itemset Mining (landmark model)
Sequence Query Matching
Given a set of sequence queries (SQs), how can we continuously monitor the event stream and, as soon as a segment arrives, report it if it is an approximate answer to some query within that query's error bound?
Event Stream
• <a,b,c,d><c,e><a,b,c><b,d><a,d><e,f><a,e><a,b,c><e,f><a,b,c><e><b,c,e><d,f>…
Sequence Query
• <a,b,c><b,d><a,c,d><e,f><a,e>, ε=1
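The slide does not define how the error between a segment and a query is measured. The sketch below assumes one plausible measure: the number of items that differ between aligned itemsets, summed over the segment. The names `window_error` and `monitor` are hypothetical, and each itemset is written as a string of single-letter items. It slides a window of the query's length over the stream prefix shown above and reports a segment as soon as its last event arrives.

```python
from collections import deque

def window_error(window, query):
    # Assumed error measure (not given on the slide): total size of the
    # symmetric difference between aligned itemsets.
    return sum(len(set(w) ^ set(q)) for w, q in zip(window, query))

def monitor(stream, query, eps):
    """Yield (start_index, error) for each segment within the error bound."""
    window = deque(maxlen=len(query))
    for i, event in enumerate(stream):
        window.append(event)
        if len(window) == len(query):
            err = window_error(window, query)
            if err <= eps:
                yield i - len(query) + 1, err

# Stream prefix and query from the slide; start indexes are 0-based.
stream = ["abcd", "ce", "abc", "bd", "ad", "ef", "ae",
          "abc", "ef", "abc", "e", "bce", "df"]
query = ["abc", "bd", "acd", "ef", "ae"]
matches = list(monitor(stream, query, eps=1))
print(matches)
```

Under this assumed measure, the only reported segment is <a,b,c><b,d><a,d><e,f><a,e>, which differs from the query by the single missing item c.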
Episode Query Matching
Knowledge Discovery from Telecommunication Network Alarm Databases [ICDE96]
• If an alarm of type A occurs, then an alarm of type B occurs within 30 seconds with probability 0.8
• If alarms of types A and B occur within 5 seconds, then an alarm of type C occurs within 60 seconds with probability 0.7
• If an alarm of type A precedes an alarm of type B, and C precedes D, all within 15 seconds, then E will follow within 4 minutes with probability 0.6
[Figure: episodes from the rules above, including A and B within 5 seconds, and A before B with C before D within 15 seconds.]
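To make the first rule concrete, the sketch below estimates the probability in a rule of the form "if A occurs, then B occurs within w seconds" from a time-ordered alarm log. It is only an illustration: the function name `rule_confidence` and the sample log are hypothetical and do not come from the cited paper.

```python
from bisect import bisect_right

def rule_confidence(events, antecedent, consequent, window):
    """events: list of (timestamp_seconds, alarm_type), sorted by time.

    Returns the fraction of antecedent alarms that are followed by a
    consequent alarm within `window` seconds.
    """
    ante_times = [t for t, kind in events if kind == antecedent]
    cons_times = [t for t, kind in events if kind == consequent]
    if not ante_times:
        return 0.0
    hits = 0
    for t in ante_times:
        i = bisect_right(cons_times, t)  # first consequent strictly after t
        if i < len(cons_times) and cons_times[i] - t <= window:
            hits += 1
    return hits / len(ante_times)

# Hypothetical alarm log.
alarms = [(0, "A"), (10, "B"), (50, "A"), (100, "B"), (120, "A"),
          (140, "B"), (200, "A"), (205, "B"), (300, "A"), (320, "B")]
confidence = rule_confidence(alarms, "A", "B", window=30)
print(confidence)
```

In this log, four of the five A alarms are followed by a B alarm within 30 seconds.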
Top-K Query
Suppose there are two continuous queries, (1) and (2). Then another continuous query, (3), is registered.
(1) Which two web documents are the most popular across the first and second servers?
(2) Which two web documents are the most popular across the third and fourth servers?
(3) Which two web documents are the most popular across the second and third servers?
[Figure: the queries are registered at a coordinator connected to Server 1, Server 2, Server 3, and Server 4.]
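As a baseline for the three queries, the sketch below has the coordinator sum per-document hit counts over the servers named in a query and keep the two largest. The function `top_k`, the document names, and the counts are hypothetical. This naive version re-reads every server's full counts, so it does not address communication cost or sharing between queries.

```python
from collections import Counter

def top_k(server_counts, servers, k):
    """Return the k most popular documents across the given servers."""
    total = Counter()
    for s in servers:
        total.update(server_counts[s])
    # Sort by descending count, then by name so that ties are deterministic.
    return sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

# Hypothetical per-server hit counts.
server_counts = {
    1: {"d1": 5, "d2": 3},
    2: {"d2": 4, "d3": 6},
    3: {"d1": 2, "d3": 1, "d4": 7},
    4: {"d4": 1, "d5": 9},
}
q1 = top_k(server_counts, [1, 2], k=2)
q2 = top_k(server_counts, [3, 4], k=2)
q3 = top_k(server_counts, [2, 3], k=2)
print(q1, q2, q3)
```

Queries (1) and (3) both read Server 2, and queries (2) and (3) both read Server 3, which is where sharing information between queries could save work.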
Main Difficulties
• Heavy Communication Cost
  • A server updates its current data only when necessary
• Multiple Continuous Queries
  • Most papers focus on one-time top-k queries or a single continuous top-k query
  • Information sharing is necessary
Spatial Query Processing
Continuous queries for moving objects in high-dimensional space
• Range search
• KNN search
[Figure: search engine and V.C. players. Labels: user profile, recommended channel, selected channel, Vote Mechanism.]
Problem Definition
Given a set of objects with their positions in an N-dimensional (N > 20) region. The set of objects is highly dynamic: each object can move in an unrestricted fashion, i.e., we do not assume any pattern of motion.
Continuously monitor the results of each query point
• Range Query
• KNN Query
Main Difficulties
• Heavy Communication Cost
  • Object updates occur only when the results for some queries might change
    • Safe Region [SIGMOD05]
• Incremental Update
  • Efficiently maintain the effective results
• Multiple Continuous Queries
  • Decide the quarantine area for each query
• Mixed Types of Queries
  • Support both the range query and the KNN query
[Figure: regions of queries Q1 and Q2.]
Range Query
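Query Q: (x, y), r
For a cell C
• max: the maximum distance dis(query, cell)
• min: the minimum distance dis(query, cell)
Cell types
• A: max < r (the cell lies entirely inside the range)
• B: min ≤ r ≤ max (the cell is partially covered)
• C: min > r (the cell lies entirely outside the range)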
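The A/B/C classification can be sketched as follows, assuming each cell is an axis-aligned box given by its lower and upper corners and that dis is the Euclidean distance. The slides do not state the cell shape or the distance function, so both are assumptions, and the names `min_max_dist` and `classify_cell` are hypothetical. The code works for any number of dimensions.

```python
import math

def min_max_dist(q, lo, hi):
    """Minimum and maximum Euclidean distance from point q to the box [lo, hi]."""
    dmin2 = 0.0
    dmax2 = 0.0
    for x, a, b in zip(q, lo, hi):
        nearest = min(max(x, a), b)          # clamp x into [a, b]
        dmin2 += (x - nearest) ** 2
        dmax2 += max(abs(x - a), abs(x - b)) ** 2  # farthest corner per axis
    return math.sqrt(dmin2), math.sqrt(dmax2)

def classify_cell(q, r, lo, hi):
    dmin, dmax = min_max_dist(q, lo, hi)
    if dmax < r:
        return "A"   # cell entirely inside the range
    if dmin > r:
        return "C"   # cell entirely outside the range
    return "B"       # boundary cell: objects must be checked individually

query, radius = (0.0, 0.0), 5.0
inside = classify_cell(query, radius, (1.0, 1.0), (2.0, 2.0))
boundary = classify_cell(query, radius, (3.0, 0.0), (6.0, 1.0))
outside = classify_cell(query, radius, (6.0, 6.0), (7.0, 7.0))
print(inside, boundary, outside)
```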
Range Query (Cont.)
Moving Query MQ
How to maintain the result for an MQ?
Range Query (Cont.)
Server: Q1 Q2 Q3, flag = 0/1
Client
When to update?
Q1  Q2  Q3  Action
A   A   A   No update and no recalculation
A   A   B   Update and recalculate for some queries
A   A   C   No update and no recalculation
We only need to consider those objects marked with B.
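The update rule can be sketched as a small check, assuming each object knows the type (A, B, or C) of its current cell with respect to every query. The function names and the dictionary layout are hypothetical.

```python
def queries_to_recalculate(cell_types):
    """cell_types maps each query name to the type (A, B, or C) of the
    cell the object currently occupies with respect to that query."""
    return [q for q, t in cell_types.items() if t == "B"]

def needs_update(cell_types):
    # An update is sent only if at least one query sees the object in a B cell.
    return bool(queries_to_recalculate(cell_types))

row1 = {"Q1": "A", "Q2": "A", "Q3": "A"}
row2 = {"Q1": "A", "Q2": "A", "Q3": "B"}
row3 = {"Q1": "A", "Q2": "A", "Q3": "C"}
print(needs_update(row1), needs_update(row2), needs_update(row3))
```

Only the second case triggers an update, and only Q3 is recalculated.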
Range Query (Cont.)
For a range query Q
• Result list: O3, O5, O7
• Covered cells
  • A: C3, C4, C5
  • B: C2, C7, C9
For a cell C
• Affected queries
  • A: Q2, Q4, Q7
  • B: Q3, Q6, Q9
[Figure: query motion relative to cell C2.]
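The two structures mirror each other: the per-query list of covered cells can be inverted into the per-cell list of affected queries. The sketch below shows one possible way to build that inverse index. The function name `build_cell_index` and the sample data are hypothetical, and the slides do not say how the structures are actually stored.

```python
from collections import defaultdict

def build_cell_index(covered):
    """Invert {query: {"A": [cells], "B": [cells]}} into
    {cell: {"A": [queries], "B": [queries]}}."""
    index = defaultdict(lambda: {"A": [], "B": []})
    for query, by_type in covered.items():
        for cell_type, cells in by_type.items():
            for cell in cells:
                index[cell][cell_type].append(query)
    return dict(index)

# Hypothetical covered-cell lists for two queries.
covered = {
    "Q1": {"A": ["C3", "C4"], "B": ["C2"]},
    "Q2": {"A": ["C2"], "B": ["C4", "C7"]},
}
cell_index = build_cell_index(covered)
print(cell_index["C2"])
```

When an object in cell C2 moves, the index gives the affected queries directly: Q2 sees C2 as an A cell and Q1 sees it as a B cell.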
KNN Query
Query Q: (x, y), k = 3
Object Update
[Figure: three object-update cases. Two require only updating the order of the result; one requires re-computation.]
KNN Query (Cont.)
Query Q: (x, y), k = 3
Query Q’: (x’, y’), r
r = d’max
KNN Query (Cont.)
Query Q: (x, y), k = 3
Query Q’: (x’, y’), r
r = dmax + dquery
[Figure: dmax and dquery marked around Q and Q’.]
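One reading of r = dmax + dquery is that dmax is the distance from the old query position to its k-th nearest neighbor and dquery is the distance the query has moved. By the triangle inequality, the old k nearest neighbors then lie within r of the new position, so a range query with radius r around Q’ contains at least k objects and therefore the new k nearest neighbors. The sketch below follows that reading and assumes the objects do not move in the meantime. The function name `knn_after_move` and the sample points are hypothetical.

```python
import math

def knn_after_move(objects, old_q, new_q, old_knn, k):
    """Recompute the k nearest neighbors of a moved query via a range query."""
    d_max = max(math.dist(old_q, objects[o]) for o in old_knn)
    d_query = math.dist(old_q, new_q)
    r = d_max + d_query
    # Range query around the new position: only these candidates are ranked.
    candidates = [o for o, p in objects.items() if math.dist(new_q, p) <= r]
    candidates.sort(key=lambda o: (math.dist(new_q, objects[o]), o))
    return r, candidates[:k]

objects = {"o1": (1, 0), "o2": (0, 2), "o3": (3, 0), "o4": (10, 10), "o5": (6, 0)}
old_knn = ["o1", "o2", "o3"]          # 3 nearest neighbors of (0, 0)
r, new_knn = knn_after_move(objects, (0, 0), (4, 0), old_knn, k=3)
print(r, new_knn)
```

Here o4 falls outside the range and is never ranked, while o5 enters the result.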
KNN Query (Cont.)
Query Q: (x, y), k = 3
Query Q’: (x’, y’), r
r = dmax + dcell
[Figure: dmax and dcell marked around Q and Q’.]
Tree Pattern Mining
As the trees stream in, find the subtrees that occur more than θ·N times, where N is the number of trees received so far and 0 ≤ θ ≤ 1
[Figure: trees T1, T2, T3 stream into STMer, which outputs the frequent tree patterns.]
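The θ·N threshold can be illustrated with a deliberately simplified sketch that treats only single parent-child edges as patterns. Real frequent-subtree mining enumerates larger subtrees, and this is not the STMer algorithm. The class `EdgeCounter`, the function `edges`, and the tree encoding (label, list of children) are all hypothetical.

```python
from collections import Counter

def edges(tree):
    """Return the set of (parent_label, child_label) edges in a tree."""
    label, children = tree
    found = set()
    for child in children:
        found.add((label, child[0]))
        found |= edges(child)
    return found

class EdgeCounter:
    def __init__(self, theta):
        self.theta = theta
        self.n = 0                  # number of trees received so far
        self.counts = Counter()     # trees containing each edge

    def add(self, tree):
        """Consume one tree and return the currently frequent edges."""
        self.n += 1
        self.counts.update(edges(tree))  # each edge counted once per tree
        return {e for e, c in self.counts.items() if c > self.theta * self.n}

t1 = ("A", [("B", []), ("C", [])])
t2 = ("A", [("B", [("D", [])])])
t3 = ("A", [("C", [])])
counter = EdgeCounter(theta=0.5)
results = [counter.add(t) for t in (t1, t2, t3)]
print(results)
```

Because the threshold grows with N, a pattern can drop out of the frequent set and return later, as the edge (A, C) does here.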
Closed Tree Pattern Mining
Mining closed frequent subtrees over data streams
• A subtree is closed if none of its proper supertrees has the same support as it does
[Figure: example trees over nodes A, B, C, and D, the frequent subtrees with their supports (2 or 3), and the subtrees that are closed.]
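The closedness test itself is simple once the supports and the supertree relation are known. The sketch below takes both as given inputs and does not compute subtree containment. The function name `closed_patterns`, the pattern names, and the supports are hypothetical.

```python
def closed_patterns(support, supertrees):
    """Return the patterns with no proper supertree of equal support.

    support:    {pattern: support count}
    supertrees: {pattern: list of all its proper supertrees in `support`}
    """
    return [
        p for p in support
        if all(support[s] != support[p] for s in supertrees.get(p, []))
    ]

# Hypothetical frequent patterns: "A-B" denotes the tree with root A and child B.
support = {"A": 3, "A-B": 3, "A-B-C": 2}
supertrees = {"A": ["A-B", "A-B-C"], "A-B": ["A-B-C"]}
closed = closed_patterns(support, supertrees)
print(closed)
```

Pattern A is not closed because its supertree A-B has the same support of 3, so reporting A-B alone loses no support information.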