Pattern Directed Mining of Sequence Data
By Valery Guralnik, Duminda Wijesekera, Jaideep Srivastava
Presenter: Jyothsna R Nayak
Contents
Introduction
Sequential Patterns
Data Structure and Algorithm
Experimental Evaluation
SP Tree Optimization
Conclusions
References
Introduction
Sequence data: each event has an associated time of occurrence
An episode is a collection of events
Frequent episodes: episodes occurring with a frequency above a certain threshold
Steps involved in mining frequent episodes
Present a language for specifying episodes of interest
Describe a data structure: the Sequential Pattern (SP) Tree
Give a mining algorithm that generates frequent episodes
Optimize the SP Tree
Sequential Patterns
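Pattern language
A = {A1, A2, ..., Am} is a set of event attributes
D1, D2, ..., Dm are the domains of A1, ..., Am
An event e over A is an (m + 2)-tuple (a1, a2, ..., am, tbegin, tend), where each ai is in Di and tbegin, tend are the begin and end times of e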
Example of Events in the Stock Market Domain
Event ID | Date     | Comp Type | Comp Name | Movement | Volatility | Activeness
e1       | 01/02/91 | Computer  | Microsoft | Down     | Low        | High
e2       | 01/03/91 | Computer  | Microsoft | Up       | Medium     | High
e3       | 01/02/91 | Computer  | Microsoft | NoMovmt  | High       | Low
e4       | 01/03/91 | Computer  | Microsoft | Down     | High       | High
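The (m + 2)-tuple view of an event can be sketched in Python. This is only an illustration: the class name `Event`, the helper `day_event`, the reading of dates as month/day/year, and the choice to give a daily event the same begin and end date are all assumptions, not part of the original slides.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    # Attribute values a1..am (here m = 5), followed by the two timestamps.
    comp_type: str
    comp_name: str
    movement: str
    volatility: str
    activeness: str
    tbegin: date
    tend: date


def day_event(day, comp_type, comp_name, movement, volatility, activeness):
    # Assumption: a daily stock event begins and ends on the same date.
    return Event(comp_type, comp_name, movement, volatility, activeness, day, day)


# Rows as read from the example table (dates assumed to be month/day/year).
events = {
    "e1": day_event(date(1991, 1, 2), "Computer", "Microsoft", "Down", "Low", "High"),
    "e2": day_event(date(1991, 1, 3), "Computer", "Microsoft", "Up", "Medium", "High"),
    "e3": day_event(date(1991, 1, 2), "Computer", "Microsoft", "NoMovmt", "High", "Low"),
    "e4": day_event(date(1991, 1, 3), "Computer", "Microsoft", "Down", "High", "High"),
}
```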
Definitions
- Ordering constraints
  - Serial occurrence: e -> f, which requires e.tend < f.tbegin
  - Parallel occurrence: e || f
- Attribute constraints
  - Selection constraint: e.type = ‘computer’
  - Join constraint: e.name = f.name
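The four kinds of constraint can be written as small predicates. This is a minimal sketch: the `Event` record and the function names are hypothetical, times are plain integers, and treating a parallel occurrence as "no ordering required" is an assumption, since the slides do not spell out its definition.

```python
from collections import namedtuple

# Hypothetical minimal event record: two attributes plus begin/end times.
Event = namedtuple("Event", ["type", "name", "tbegin", "tend"])


def serial(e, f):
    """Ordering constraint e -> f: e must end strictly before f begins."""
    return e.tend < f.tbegin


def parallel(e, f):
    """Ordering constraint e || f. Assumption: no relative order is required."""
    return True


def selection(e):
    """Selection constraint: compares an attribute of one event with a constant."""
    return e.type == "computer"


def join(e, f):
    """Join constraint: compares attributes of two different events."""
    return e.name == f.name


e = Event("computer", "Microsoft", tbegin=1, tend=1)
f = Event("computer", "Microsoft", tbegin=2, tend=2)
```

A selection constraint looks at a single event, while a join constraint needs a pair, which is why the two are kept as separate functions here.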
Event specification
Partial specification of a single event:
e[(e.type = ‘computer’ v e.type = ‘electronic’) ^ e.movement_direction = ‘down’]
Comparing characteristics of two events:
e[e.movement_direction = ‘up’] -> [e.name = f.name] f[f.movement_direction = ‘down’]
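To make the two specifications concrete, the sketch below evaluates them over a few made-up events. The dictionary layout, the company name "Acme", and the integer day numbers are assumptions chosen for illustration only.

```python
def spec_partial(e):
    # e[(e.type = 'computer' v e.type = 'electronic') ^ e.movement_direction = 'down']
    return e["type"] in ("computer", "electronic") and e["movement_direction"] == "down"


def matches_up_then_down(e, f):
    # e[e.movement_direction = 'up'] -> [e.name = f.name] f[f.movement_direction = 'down']
    return (
        e["movement_direction"] == "up"
        and f["movement_direction"] == "down"
        and e["name"] == f["name"]       # join constraint
        and e["tend"] < f["tbegin"]      # serial ordering
    )


# Hypothetical events; times are integer day numbers.
events = [
    {"type": "computer", "name": "Microsoft", "movement_direction": "up", "tbegin": 1, "tend": 1},
    {"type": "computer", "name": "Microsoft", "movement_direction": "down", "tbegin": 2, "tend": 2},
    {"type": "electronic", "name": "Acme", "movement_direction": "down", "tbegin": 3, "tend": 3},
]

partial_hits = [e for e in events if spec_partial(e)]
pairs = [(e, f) for e in events for f in events if matches_up_then_down(e, f)]
```

Here `partial_hits` collects single events, while `pairs` collects two-event episodes that satisfy the ordering and the join constraint together.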
Data Structure
- A leaf node represents an event
- An interior node represents an ordering constraint
- If θ is an ordering constraint (-> or ||) labeling an interior node, and e and f are the left and right children of that node, then e θ f is a sequential pattern
- Associated with each node is a table of matching events
- Attached to each node is a Boolean expression tree representing its attribute constraints
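One possible in-memory shape for such a tree is sketched below. The class `SPNode`, its field names, and the use of a plain Python callable in place of the Boolean expression tree are assumptions made to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SPNode:
    kind: str                          # "event" for a leaf; "serial" or "parallel" otherwise
    constraint: Callable[..., bool]    # stands in for the Boolean expression tree
    left: Optional["SPNode"] = None
    right: Optional["SPNode"] = None
    matches: List[tuple] = field(default_factory=list)  # table of matching events/episodes


def is_leaf(node):
    return node.left is None and node.right is None


# Tree for e[e.mvmt = 'up'] -> [e.name = f.name] f[f.mvmt = 'down']
leaf_e = SPNode("event", lambda e: e["mvmt"] == "up")
leaf_f = SPNode("event", lambda f: f["mvmt"] == "down")
root = SPNode("serial", lambda e, f: e["name"] == f["name"], leaf_e, leaf_f)
```

Leaves carry selection constraints over one event; the interior node carries the join constraint over its two children.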
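SP Tree
[Figure: SP Tree for the user-specified pattern e[e.mvmt = ‘up’] -> [e.name = f.name] f[f.mvmt = ‘down’]. The root is the ordering node, with its table of matching episodes and the join constraint e.name = f.name. The leaves e and f each have a table of matching events and the constraints e.mvmt = up and f.mvmt = down.]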
Bottom-up algorithm
Initialize queue Q to empty
for (each leaf l in T) do begin
    generate events from S that match the constraints of l
    if (the parent p of l is not already in Q) then
        put p in Q
end
while (Q is not empty) do begin
    remove node n from Q
    Generate_Events(n)
    if (for n’s parent p the other child was processed) then
        put p in Q
end
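The queue-driven traversal can be sketched as follows. This is a variant rather than a literal transcription: it enqueues a parent only once all of its children have been processed (tracked with a `pending` counter), which also covers a parent with one leaf child and one interior child. The `Node` class and the callback parameters `match_leaf` and `generate_events` are hypothetical.

```python
from collections import deque


class Node:
    def __init__(self, name, left=None, right=None):
        self.name, self.left, self.right = name, left, right
        self.parent = None
        self.pending = 0  # number of children not yet processed
        for child in (left, right):
            if child is not None:
                child.parent = self
                self.pending += 1


def leaves(node):
    if node.left is None and node.right is None:
        return [node]
    out = []
    for child in (node.left, node.right):
        if child is not None:
            out.extend(leaves(child))
    return out


def bottom_up(root, match_leaf, generate_events):
    queue = deque()

    def done(node):
        # Tell the parent that one more child is finished; enqueue it when ready.
        parent = node.parent
        if parent is not None:
            parent.pending -= 1
            if parent.pending == 0:
                queue.append(parent)

    for leaf in leaves(root):
        match_leaf(leaf)        # fill the leaf's table from the sequence
        done(leaf)
    while queue:
        n = queue.popleft()
        generate_events(n)      # combine the tables of n's children
        done(n)


order = []
tree = Node("root", Node("ab", Node("a"), Node("b")), Node("c"))
bottom_up(tree, lambda n: order.append(n.name), lambda n: order.append(n.name))
```

Recording the visit order in `order` shows that every interior node is handled only after both of its children.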
Generate_Events algorithm
for (each episode e from left child l of n) do begin
    for (each episode f from right child r of n) do begin
        if (node n is serial) then
            if (e.tend >= f.tbegin) then
                continue
        if (events in e and f match the join constraint) then
            form new episode g from the events of e and f
    end
end
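The nested-loop combination step might look like this in Python. Several details are assumptions: episodes are tuples of event dictionaries, an episode's end time is taken as the latest `tend` of its events and its begin time as the earliest `tbegin`, and the join constraint is passed in as a callable.

```python
def generate_events(left_episodes, right_episodes, serial, join):
    """Nested-loop join of two episode tables.

    Each episode is a tuple of event dicts with 'tbegin' and 'tend' keys.
    """
    out = []
    for e in left_episodes:
        for f in right_episodes:
            if serial:
                # Assumption: an episode spans from its earliest begin to its latest end.
                e_end = max(ev["tend"] for ev in e)
                f_begin = min(ev["tbegin"] for ev in f)
                if e_end >= f_begin:
                    continue  # e does not finish before f starts
            if join(e, f):
                out.append(e + f)  # new episode g built from the events of e and f
    return out


ups = [({"name": "A", "tbegin": 1, "tend": 1},), ({"name": "B", "tbegin": 3, "tend": 3},)]
downs = [({"name": "A", "tbegin": 2, "tend": 2},), ({"name": "B", "tbegin": 2, "tend": 2},)]


def same_name(e, f):
    return e[0]["name"] == f[0]["name"]


serial_result = generate_events(ups, downs, serial=True, join=same_name)
parallel_result = generate_events(ups, downs, serial=False, join=same_name)
```

With the serial check on, only the "A" pair survives, because the "B" up event ends after the "B" down event begins; with it off, both same-name pairs are kept.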
Experimental evaluation
Results: running time as a function of
window size
data set size
number of event specifications
number of attribute constraints
[Figure: time in secs vs. window size in days; minimum frequency = 0.8]
[Figure: time in secs vs. number of event specifications; minimum frequency = 0.8, window size = 11]
[Figure: time in secs vs. number of constraints; minimum frequency = 0.8, window size = 5]
[Figure: time in secs vs. number of events in data sets; minimum frequency = 0.7, window size = 5]
SP Tree Optimization
If two event nodes represent the same event, then only one of the nodes needs to be kept.
If two ordering nodes have the same join constraints, and their left and right children represent the same events, then one such node is sufficient.
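One way to apply both rules is to intern nodes by a structural key, so that an identical node is built only once. The sketch below is an assumption-laden illustration: the class `SharedTree`, the string encoding of constraints, and the choice to include the ordering operator in the key are not taken from the slides.

```python
class SharedTree:
    """Interns SP Tree nodes so structurally identical nodes are built once."""

    def __init__(self):
        self._nodes = {}

    def event(self, spec):
        # spec: a hashable description of the event's constraints (hypothetical encoding)
        return self._intern(("event", spec))

    def ordering(self, op, join, left, right):
        # Assumption: the operator must match as well as the join constraint and children.
        return self._intern((op, join, left, right))

    def _intern(self, key):
        # Return the stored node if an equal one exists, else store this one.
        return self._nodes.setdefault(key, key)

    def __len__(self):
        return len(self._nodes)


t = SharedTree()
up = t.event("mvmt = up")
down = t.event("mvmt = down")
p1 = t.ordering("->", "e.name = f.name", up, down)
p2 = t.ordering("->", "e.name = f.name", t.event("mvmt = up"), down)
```

Because `p1` and `p2` resolve to the same stored node, their table of matching episodes would only need to be computed once.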
Conclusions
The approach is robust, flexible, and efficient
It handles complex patterns
It shows good performance
References
Mannila, H., Toivonen, H., and Verkamo, A. I. “Discovering frequent episodes in sequences”
Agrawal, R., and Srikant, R. “Mining sequential patterns”
Mannila, H., and Toivonen, H. “Discovering generalized episodes using minimal occurrences”
Agrawal, R., and Srikant, R. “Mining generalized association rules”