Pattern Directed Mining of Sequence Data
By Valery Guralnik,
Duminda Wijesekera,
Jaideep Srivastava
Presenter: Jyothsna R Nayak
Contents
Introduction
Sequential Patterns
Data Structure and Algorithm
Experimental Evaluation
SP Tree Optimization
Conclusions
References
Introduction
Sequence data: each event has an associated time of occurrence
Episode: a collection of events
Frequent episodes: episodes occurring with a frequency above a given threshold
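The slides do not say how an episode's frequency is measured. One common definition, from Mannila, Toivonen and Verkamo's window-based episode mining, is the fraction of sliding windows of a fixed width that contain an occurrence of the episode. The "minimum frequency" and "window size" parameters in the experiments suggest something similar, but that is an assumption. Below is a minimal sketch under that assumption. It uses integer timestamps and gives occurrences as (tbeg, tend) intervals. The function names `window_frequency` and `is_frequent` are hypothetical.

```python
from typing import Iterable, Tuple

def window_frequency(occurrences: Iterable[Tuple[int, int]],
                     t_start: int, t_end: int, width: int) -> float:
    """Fraction of sliding windows [t, t + width) that fully contain at least
    one occurrence (tbeg, tend) of an episode.

    Assumption (window-based definition of Mannila et al.): the sequence covers
    integer times t_start <= t < t_end, and a window starts at every
    t in [t_start - width + 1, t_end), so each time point lies in exactly
    `width` windows.
    """
    occ = list(occurrences)
    starts = range(t_start - width + 1, t_end)
    hits = sum(
        1 for t in starts
        if any(t <= b and e < t + width for b, e in occ)
    )
    return hits / len(starts)

def is_frequent(occurrences, t_start: int, t_end: int, width: int,
                min_freq: float) -> bool:
    """Hypothetical threshold test: frequent if the window frequency reaches min_freq."""
    return window_frequency(occurrences, t_start, t_end, width) >= min_freq
```

For example, a single occurrence spanning times 2 to 3 in a sequence over times 0 to 9 is contained in 2 of the 12 windows of width 3, giving a frequency of 1/6.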
Steps involved in mining frequent episodes
Specify the episodes of interest in a pattern language
Represent the pattern as a data structure: the Sequential Pattern Tree (SP Tree)
Run a mining algorithm over the SP Tree to generate frequent episodes
Optimize the SP Tree
Sequential Patterns
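Pattern language
Attributes: A = {A1, A2, …, Am}, with domains D1, D2, …, Dm
An event e over A is an (m + 2)-tuple (a1, a2, …, am, tbeg, tend), where each ai ∈ Di and tbeg and tend are the event's begin and end times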
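The (m + 2)-tuple definition can be sketched in Python as follows. The `Event` class and `make_event` helper are hypothetical names. The domains used in the example cover only the Comp Type, Comp Name and Movement columns of the stock-market table. The date 01/02/91 is read as day number 2, and tbeg = tend for that event, which is also an assumption. The tbeg ≤ tend check is an added assumption that the slide does not state.

```python
from typing import NamedTuple, Sequence, Set, Tuple

class Event(NamedTuple):
    """An event over attributes A1..Am: m attribute values plus tbeg and tend."""
    values: Tuple[str, ...]  # (a1, ..., am), with ai taken from domain Di
    tbeg: int                # begin time
    tend: int                # end time

    def as_tuple(self) -> tuple:
        """The flat (m + 2)-tuple (a1, ..., am, tbeg, tend)."""
        return self.values + (self.tbeg, self.tend)

def make_event(domains: Sequence[Set[str]], values: Sequence[str],
               tbeg: int, tend: int) -> Event:
    """Build an event, checking that each ai lies in its domain Di.

    The tbeg <= tend check is an assumption, not stated on the slide.
    """
    if len(values) != len(domains):
        raise ValueError("expected exactly one value per attribute")
    for i, (a, d) in enumerate(zip(values, domains), start=1):
        if a not in d:
            raise ValueError(f"a{i} = {a!r} is not in D{i}")
    if tbeg > tend:
        raise ValueError("tbeg must not be later than tend")
    return Event(tuple(values), tbeg, tend)

# Hypothetical domains for three of the table's attributes.
domains = [{"Computer"}, {"Microsoft"}, {"Up", "Down", "NoMovmt"}]
e1 = make_event(domains, ["Computer", "Microsoft", "Down"], 2, 2)
```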
Example of events in the stock market domain
Event ID  Date      Comp Type  Comp Name  Movement  Volatility  Activeness
e1        01/02/91  Computer   Microsoft  Down      Low         High
e2        01/03/91  Computer   Microsoft  Up        Medium      High
e3        01/02/91  Computer   Microsoft  NoMovmt   High        Low
e4        01/03/91  Computer   Microsoft  Down      High        High
Definitions
Ordering constraints
Serial occurrence: e -> f, which requires e.tend < f.tbeg
Parallel occurrence: e || f
Attribute constraints
Selection constraint on a single event, e.g. e.type = ‘computer’
Join constraint between two events, e.g. e.name = f.name
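The serial ordering constraint and the two kinds of attribute constraints can be written as simple predicates. Below is a minimal Python sketch. The field names are assumptions. The events follow the table, with each date read as a day number and tbeg = tend, and with values lower-cased to match the ‘computer’ literal. Parallel occurrence is left out because the slides give no time condition for it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    eid: str
    type: str
    name: str
    movement: str
    tbeg: int  # assumption: day number of the event's date
    tend: int

def serial(e: Event, f: Event) -> bool:
    """Serial occurrence e -> f: e must end before f begins."""
    return e.tend < f.tbeg

def selection(e: Event) -> bool:
    """Selection constraint on one event: e.type = 'computer'."""
    return e.type == "computer"

def join(e: Event, f: Event) -> bool:
    """Join constraint between two events: e.name = f.name."""
    return e.name == f.name

# Rows e1-e3 of the stock-market table (01/02/91 -> day 2, 01/03/91 -> day 3).
e1 = Event("e1", "computer", "Microsoft", "down", 2, 2)
e2 = Event("e2", "computer", "Microsoft", "up", 3, 3)
e3 = Event("e3", "computer", "Microsoft", "nomovmt", 2, 2)
```

Because the comparison is strict, e1 -> e3 fails: both events occur on the same day.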
Event specification
Partial specification of a single event:
e[(e.type = ‘computer’ ∨ e.type = ‘electronic’) ∧ e.movement_direction = ‘down’]
Comparing characteristics of two events (an upward move followed by a downward move of the same company):
e[e.movement_direction = ‘up’] -> [e.name = f.name] f[f.movement_direction = ‘down’]
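Before introducing the SP Tree, it can help to see what the two specifications select. The sketch below evaluates them by brute force over every event and every pair of events. It is a baseline for intuition, not the authors' algorithm. Events e1 to e4 follow the table, with day numbers for dates and lower-cased values. Events e5 and e6 and the company name "OtherCo" are hypothetical additions for illustration.

```python
from typing import List, NamedTuple, Tuple

class Event(NamedTuple):
    eid: str
    type: str
    name: str
    movement_direction: str
    tbeg: int
    tend: int

def partial_spec(e: Event) -> bool:
    """e[(e.type = 'computer' or e.type = 'electronic') and e.movement_direction = 'down']"""
    return e.type in ("computer", "electronic") and e.movement_direction == "down"

def up_then_down(events: List[Event]) -> List[Tuple[str, str]]:
    """All pairs matching
    e[e.movement_direction = 'up'] -> [e.name = f.name] f[f.movement_direction = 'down'],
    found by checking every pair of events."""
    return [
        (e.eid, f.eid)
        for e in events if e.movement_direction == "up"
        for f in events if f.movement_direction == "down"
        if e.tend < f.tbeg and e.name == f.name  # serial order, then join constraint
    ]

events = [
    Event("e1", "computer", "Microsoft", "down", 2, 2),
    Event("e2", "computer", "Microsoft", "up", 3, 3),
    Event("e3", "computer", "Microsoft", "nomovmt", 2, 2),
    Event("e4", "computer", "Microsoft", "down", 3, 3),
    Event("e5", "computer", "Microsoft", "down", 5, 5),   # hypothetical
    Event("e6", "electronic", "OtherCo", "down", 4, 4),   # hypothetical
]
```

The pair check costs O(n²) for n events. The SP Tree below organizes the same checks so that per-event constraints filter the candidates first.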
Data Structure
A leaf node represents an event.
An interior node represents an ordering constraint.
If an interior node is labeled with an ordering constraint ⊙ (-> or ||), and e and f are its left and right children, then e ⊙ f is a sequential pattern.
Each node has an associated table of matches: matching events at a leaf, matching episodes at an interior node.
Each node also carries a Boolean expression tree representing its attribute constraints.
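The node layout described above can be sketched in Python as follows. The class names `Leaf` and `Interior` and the helper `pattern` are hypothetical. For brevity, a plain Python callable stands in for the Boolean expression tree. Events are dicts and episodes are tuples of events.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Union

@dataclass(eq=False)
class Leaf:
    """Leaf node: one event variable and its selection constraint."""
    var: str                                            # event variable, e.g. "e"
    selection: Callable[[dict], bool]                   # stands in for the expression tree
    table: List[tuple] = field(default_factory=list)    # matching events
    parent: Optional["Interior"] = None

@dataclass(eq=False)
class Interior:
    """Interior node: an ordering constraint over its two children."""
    op: str                                             # "->" (serial) or "||" (parallel)
    left: Union[Leaf, "Interior"]
    right: Union[Leaf, "Interior"]
    join: Callable[[tuple, tuple], bool]                # join constraint between children
    table: List[tuple] = field(default_factory=list)    # matching episodes
    parent: Optional["Interior"] = None

    def __post_init__(self) -> None:
        # Link both children back to this node.
        self.left.parent = self
        self.right.parent = self

def pattern(node) -> str:
    """Read back the sequential pattern that a (sub)tree represents."""
    if isinstance(node, Leaf):
        return node.var
    return f"({pattern(node.left)} {node.op} {pattern(node.right)})"

# SP Tree for e[e.mvmt = 'up'] -> [e.name = f.name] f[f.mvmt = 'down']
e = Leaf("e", lambda ev: ev["mvmt"] == "up")
f = Leaf("f", lambda ev: ev["mvmt"] == "down")
root = Interior("->", e, f, join=lambda a, b: a[0]["name"] == b[0]["name"])
```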
SP Tree
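[Figure: SP Tree for the user-specified pattern e[e.mvmt = ‘up’] -> [e.name = f.name] f[f.mvmt = ‘down’]. The root -> node carries the join constraint e.name = f.name and a table of matching episodes. Its leaves e and f carry the selection constraints e.mvmt = ‘up’ and f.mvmt = ‘down’, each with a table of matching events.]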
Bottom-up algorithm
Input: SP Tree T and event sequence S
Initialize queue Q to empty
for (each leaf l in T) do begin
  generate events from S that match the constraints of l
  if (l has a parent p, both children of p have been processed, and p is not already in Q) then
    put p in Q
end
while (Q is not empty) do begin
  remove node n from Q
  Generate_Events(n)
  if (n has a parent p and p's other child has already been processed) then
    put p in Q
end
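The bottom-up schedule can be sketched in Python as shown below. In this sketch, a node enters the queue only once both of its children have tables, so a parent with one leaf child and one interior child waits for the interior child. The single `Node` class, with `op` set to None for a leaf, and the helper names are hypothetical. One further assumption is that an episode begins at its earliest tbeg and ends at its latest tend, which the slides do not specify. As in Generate_Events, only serial nodes test time order.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional

@dataclass(eq=False)
class Node:
    """SP Tree node; op is None for a leaf, "->" or "||" for an interior node."""
    op: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    selection: Callable[[dict], bool] = lambda ev: True       # leaf constraint
    join: Callable[[tuple, tuple], bool] = lambda a, b: True  # interior constraint
    table: List[tuple] = field(default_factory=list)          # matching events/episodes
    parent: Optional["Node"] = None
    done: bool = False

def leaves(n: Node) -> Iterator[Node]:
    if n.op is None:
        yield n
    else:
        yield from leaves(n.left)
        yield from leaves(n.right)

def generate_events(n: Node) -> None:
    """Combine the two children's tables into n's table."""
    for e in n.left.table:
        for f in n.right.table:
            if n.op == "->" and max(x["tend"] for x in e) >= min(x["tbeg"] for x in f):
                continue  # e does not end before f begins
            if n.join(e, f):
                n.table.append(e + f)

def enqueue_parent(n: Node, q: deque, queued: set) -> None:
    p = n.parent
    if p is not None and p.left.done and p.right.done and p not in queued:
        q.append(p)
        queued.add(p)

def bottom_up(root: Node, sequence: List[dict]) -> List[tuple]:
    stack = [root]
    while stack:  # link parent pointers and clear state from any earlier run
        n = stack.pop()
        n.table, n.done = [], False
        for c in (n.left, n.right):
            if c is not None:
                c.parent = n
                stack.append(c)
    q, queued = deque(), set()
    for leaf in leaves(root):
        leaf.table = [(ev,) for ev in sequence if leaf.selection(ev)]
        leaf.done = True
        enqueue_parent(leaf, q, queued)
    while q:
        n = q.popleft()
        generate_events(n)
        n.done = True
        enqueue_parent(n, q, queued)
    return root.table

# e1 and e2 follow the table; x and y are hypothetical events.
seq = [
    {"id": "e1", "name": "Microsoft", "mvmt": "down", "tbeg": 2, "tend": 2},
    {"id": "e2", "name": "Microsoft", "mvmt": "up", "tbeg": 3, "tend": 3},
    {"id": "x", "name": "Microsoft", "mvmt": "down", "tbeg": 5, "tend": 5},
    {"id": "y", "name": "OtherCo", "mvmt": "down", "tbeg": 6, "tend": 6},
]
root = Node(op="->", left=Node(selection=lambda ev: ev["mvmt"] == "up"),
            right=Node(selection=lambda ev: ev["mvmt"] == "down"),
            join=lambda a, b: a[0]["name"] == b[0]["name"])
result = bottom_up(root, seq)
```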
Generate_Events Algorithm
for (each episode e in the table of n's left child l) do begin
  for (each episode f in the table of n's right child r) do begin
    if (node n is serial and e.tend >= f.tbeg) then
      continue   // e does not end before f begins, so e -> f fails
    if (the events in e and f satisfy the join constraint of n) then
      form a new episode g from the events of e and f and add g to the table of n
  end
end
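Generate_Events can also be sketched on its own, as a nested-loop join over two children's tables. In this sketch, events are simplified to hypothetical (id, tbeg, tend) triples and episodes are tuples of events. An episode's begin and end are assumed to be its earliest tbeg and latest tend. The `serial` flag selects the time check, and `join` stands in for the node's join constraint.

```python
from typing import Callable, List, Tuple

Event = Tuple[str, int, int]   # (event id, tbeg, tend): a simplified event
Episode = Tuple[Event, ...]    # an episode is a tuple of events

def begin(ep: Episode) -> int:
    return min(tbeg for _, tbeg, _ in ep)

def end(ep: Episode) -> int:
    return max(tend for _, _, tend in ep)

def generate_events(left: List[Episode], right: List[Episode], serial: bool,
                    join: Callable[[Episode, Episode], bool]) -> List[Episode]:
    """Build node n's table from its children's tables."""
    table = []
    for e in left:
        for f in right:
            if serial and end(e) >= begin(f):
                continue               # violates e -> f
            if join(e, f):
                table.append(e + f)    # new episode g from the events of e and f
    return table

left = [(("a", 1, 1),), (("b", 4, 4),)]
right = [(("c", 2, 3),), (("d", 5, 5),)]
always = lambda e, f: True
serial_table = generate_events(left, right, True, always)
parallel_table = generate_events(left, right, False, always)
```

Here the serial node keeps only the pairs in which the left episode ends first, (a, c), (a, d) and (b, d). The parallel node keeps all four combinations.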
Experimental evaluation
Results: running time measured while varying
Window size
Data set size
Number of event specifications
Number of attribute constraints
[Figure: Time (s) vs. window size (days); minimum frequency = 0.8]
[Figure: Time (s) vs. number of event specifications; minimum frequency = 0.8, window size = 11]
[Figure: Time (s) vs. number of constraints; minimum frequency = 0.8, window size = 5]
[Figure: Time (s) vs. number of events in data sets; minimum frequency = 0.7, window size = 5]
SP Tree Optimization
If two event (leaf) nodes represent the same event, only one of them needs to be kept.
If two ordering nodes have the same join constraints, and their left and right children represent the same events, then one such node is sufficient.
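Both rules amount to sharing structurally identical subtrees, a form of common-subexpression elimination (hash-consing). Below is a minimal Python sketch that assumes a hypothetical tuple encoding. A leaf is ("leaf", selection constraint) and an ordering node is (op, join constraint, left, right). Each distinct subtree receives one integer id, so the size of the pool is the number of nodes left after merging.

```python
from typing import Dict

def count_nodes(tree: tuple) -> int:
    """Number of nodes in the unshared tree."""
    if tree[0] == "leaf":
        return 1
    return 1 + count_nodes(tree[2]) + count_nodes(tree[3])

def share_subtrees(tree: tuple, pool: Dict[tuple, int]) -> int:
    """Return an id for tree; structurally identical subtrees get the same id.

    Leaves are keyed by their selection constraint. Ordering nodes are keyed by
    op, join constraint and the ids of their children, so two ordering nodes
    merge only when both children were already merged.
    """
    if tree[0] == "leaf":
        key = tree
    else:
        op, join, left, right = tree
        key = (op, join, share_subtrees(left, pool), share_subtrees(right, pool))
    if key not in pool:
        pool[key] = len(pool)
    return pool[key]

# Hypothetical pattern in which the same sub-pattern appears twice.
sub = ("->", "e.name = f.name", ("leaf", "mvmt = 'up'"), ("leaf", "mvmt = 'down'"))
tree = ("||", "", sub, sub)
pool: Dict[tuple, int] = {}
root_id = share_subtrees(tree, pool)
```

In this example the tree has 7 nodes before sharing and 4 after: the two leaves, one shared -> node, and the root.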
Conclusions
The approach is:
Robust
Flexible
Efficient
Able to express complex patterns
Good performance in the experiments
References
Mannila, H., Toivonen, H., and Verkamo, A. I. “Discovering frequent episodes in sequences”
Agrawal, R., and Srikant, R. “Mining sequential patterns”
Mannila, H., and Toivonen, H. “Discovering generalized episodes using minimal occurrences”
Srikant, R., and Agrawal, R. “Mining generalized association rules”