Pattern Directed Mining Of Sequence Data
By Valery Guralnik, Duminda Wijesekera, Jaideep Srivastava
Presenter: Jyothsna R Nayak

Contents
- Introduction
- Sequential Patterns
- Data Structure and Algorithm
- Experimental Evaluation
- SP Tree Optimization
- Conclusions
- References

Introduction
- Sequence data: each event has an associated time of occurrence
- An episode is a collection of events
- Frequent episodes: episodes occurring with a frequency above a certain threshold

Steps involved in mining of frequent episodes
- Present a language for specifying episodes of interest
- Describe a data structure: the Sequential Pattern (SP) Tree
- Mining algorithm to generate frequent episodes
- Optimize the SP Tree

Sequential Patterns
Pattern language
- A = {A1, A2, ..., Am} is the set of event attributes
- D1, D2, ..., Dm are the domains of A1, A2, ..., Am
- An event e over A is an (m + 2)-tuple (a1, a2, ..., am, tbegin, tend)

Example of Events in the Stock Market Domain

Event ID  Date      Comp Type  Comp Name  Movement  Volatility  Activeness
e1        01/02/91  Computer   Microsoft  Down      Low         High
e2        01/03/91  Computer   Microsoft  Up        Medium      High
e3        01/02/91  Computer   Microsoft  NoMovmt   High        Low
e4        01/03/91  Computer   Microsoft  Down      High        High

Definitions
- Ordering constraint
  - Serial occurrence: e -> f, which requires e.tend < f.tbegin
  - Parallel occurrence: (e || f)
- Attribute constraint
  - Selection constraint: e.type = ‘computer’
  - Join constraint: e.name = f.name

Event specification
- Partial specifications:
  e[(e.type = ‘computer’ ∨ e.type = ‘electronic’) ∧ e.movement_direction = ‘down’]
- Comparing some characteristics:
  e[e.movement_direction = ‘up’] -> [e.name = f.name] f[f.movement_direction = ‘down’]

Data Structure
- A leaf node represents an event
- An interior node represents an ordering constraint
- If an ordering constraint (-> or ||) labels some interior node, and e and f are the left and right children of that node, then e -> f (respectively e || f) is a sequential pattern
- Associated with each node is a table of matching events
- Attached to each node is a Boolean expression tree representing attribute constraints

SP Tree
[Figure: SP tree for the user-specified pattern e[e.mvmt = ‘up’] -> [e.name = f.name] f[f.mvmt = ‘down’]. The leaves e and f carry the selection constraints e.mvmt = up and f.mvmt = down, each with a table of matching events; the root carries the join constraint e.name = f.name and a table of matching episodes.]

Bottom-up algorithm

Initialize queue Q to empty
for (each leaf l in T) do begin
    generate events from S that match constraints of l
    if (the parent p of l is not already in Q) then
        put p in Q
end
while (Q is not empty) do begin
    remove node n from Q
    Generate_Events(n)
    if (for n’s parent p another child was processed) then
        put p in Q
end

Generate_Events algorithm

for (each episode e from left child l of n) do begin
    for (each episode f from right child r of n) do begin
        if (node n is serial) then
            if (e.tend >= f.tbegin) then continue
        if (events in e and f match the join constraint) then
            form new episode g from events from e and f
    end
end

Experimental evaluation
Results are reported for varying:
- window size
- data set size
- number of event specifications
- attribute constraints

[Figure: time in seconds vs. window size in days; minimum frequency = 0.8]
[Figure: time in seconds vs. number of event specifications; minimum frequency = 0.8, window size = 11]
[Figure: time in seconds vs. number of constraints; minimum frequency = 0.8, window size = 5]
[Figure: time in seconds vs. number of events in data sets; minimum frequency = 0.7, window size = 5]

SP Tree Optimization
- If two event nodes represent the same event, then only one of the nodes needs to be used.
- If two ordering nodes have the same join constraints, and both have left and right children representing the same events, then one such node is sufficient.

Conclusions
The approach is
- Robust
- Flexible
- Efficient
- Complex patterns
- Good performance

References
- Mannila, H., Toivonen, H., and Verkamo, A. I., “Discovering frequent episodes in sequences”
- Agrawal, R., and Srikant, R., “Mining sequential patterns”
- Mannila, H., and Toivonen, H., “Discovering generalized episodes using minimal occurrences”
- Agrawal, R., and Srikant, R., “Mining generalized association rules”
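
Appendix: illustrative sketch of the bottom-up algorithm

The bottom-up and Generate_Events procedures from the slides can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names Event, Episode, Node, leaf, serial, generate_events and mine are hypothetical, time stamps are plain integers (days), the sample sequence is made up, and frequency and window-size checks are omitted. One deliberate deviation: a parent is queued only once both of its children have been processed, which is slightly stricter than the queueing rule in the slide pseudocode.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Event:
    attrs: Dict[str, str]
    tbegin: int
    tend: int


@dataclass
class Episode:
    events: Dict[str, Event]  # pattern variable -> matched event
    tbegin: int
    tend: int


@dataclass(eq=False)  # identity comparison; nodes reference each other
class Node:
    kind: str  # "leaf" or "serial" (hypothetical labels)
    constraint: Callable[..., bool]  # selection (leaf) or join (interior)
    var: str = ""
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    parent: Optional["Node"] = None
    table: List[Episode] = field(default_factory=list)
    done: bool = False


def leaf(var, select):
    return Node("leaf", select, var=var)


def serial(left, right, join):
    node = Node("serial", join, left=left, right=right)
    left.parent = right.parent = node
    return node


def leaves(node):
    if node.kind == "leaf":
        return [node]
    return leaves(node.left) + leaves(node.right)


def generate_events(n):
    """Fill n.table with episodes built from pairs of child episodes."""
    for e in n.left.table:
        for f in n.right.table:
            # Serial ordering requires e to end strictly before f begins.
            if n.kind == "serial" and e.tend >= f.tbegin:
                continue
            merged = {**e.events, **f.events}
            if n.constraint(merged):  # join constraint
                n.table.append(Episode(merged, min(e.tbegin, f.tbegin),
                                       max(e.tend, f.tend)))


def mine(root, sequence):
    queue = deque()

    def finish(node):
        node.done = True
        p = node.parent
        # Assumption: queue the parent only when both children are done.
        if p is not None and p.left.done and p.right.done:
            queue.append(p)

    for l in leaves(root):
        l.table = [Episode({l.var: ev}, ev.tbegin, ev.tend)
                   for ev in sequence if l.constraint(ev)]
        finish(l)
    while queue:
        n = queue.popleft()
        generate_events(n)
        finish(n)
    return root.table


# Made-up sequence; pattern: e[e.mvmt = 'up'] -> [e.name = f.name] f[f.mvmt = 'down']
sequence = [
    Event({"name": "Microsoft", "mvmt": "up"}, 1, 1),
    Event({"name": "Intel", "mvmt": "down"}, 1, 1),
    Event({"name": "Microsoft", "mvmt": "down"}, 2, 2),
    Event({"name": "IBM", "mvmt": "up"}, 2, 2),
    Event({"name": "IBM", "mvmt": "down"}, 3, 3),
]
root = serial(
    leaf("e", lambda ev: ev.attrs["mvmt"] == "up"),
    leaf("f", lambda ev: ev.attrs["mvmt"] == "down"),
    lambda m: m["e"].attrs["name"] == m["f"].attrs["name"],
)
result = mine(root, sequence)
for ep in result:
    print(ep.events["e"].attrs["name"], ep.tbegin, ep.tend)
```

Each node keeps its own table of matches, as in the SP tree slides, so an interior node only ever combines the tables of its two children. Adding a "parallel" node kind would only require skipping the serial time check in generate_events.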
