Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Efficient Mining of Partial
Periodic Patterns in Time Series
Database
J. Han, G.Dong, Y. Yin
695 KDD
Paper Presentation
University of Alberta
Department of Computing Science
Anita Petrinjak
Introduction
Outline
•
•
•
•
•
•
•
Introduction
Problem definition
Methods for mining partial periodic patterns
Implementation issues
Performance
Conclusion
Future work
2
Problem definition
• Previous work:
• Partial periodic pattern:
– mining full periodicity -> not applicable
– time series data mining -> not periodicity
– abc***d*abc***d*abc***d*
– s = abc***d* :
• length of period=8
• L-length=4
• Importance
– real life patterns - mostly partially periodical
• Goal:
– discovering all frequent patterns of the time
series for one period or a range of periods
3
– frequency count, m - maximum number of
periods of length |s|
– confidence(s)=frequency count/m
– confidence threshold
4
Methods
Single-period methods
• Apriori on periodicity
– each subpattern of a frequent pattern of period
p is itself a frequent pattern of period p.
– Apriori-like algorithm: p database scans
– PROBLEM:
M in in g P artial P e rio d ic ity
S in g le-p eriod
A p riori m eth od
M ax-s u b p a ttern
h it-s et m e th od
M u ltip le p eriod s
L o op in g ove r
s in g le-p e rio d
c om p u tation
S h are d m in in g
o f m u ltip le
p erio d s
• the number of frequent i-patterns shrinks slowly as i
increases (unlike in mining association rules)
5
6
Single-period methods
Single-period methods
• Single-period max-subpattern hit set
method
• Single-period max-subpattern hit set
algorithm:
– improved mining of partial periodicity:
– 1) max-pattern, Cmax
– 1) Scan db to find set of frequent 1-patterns,
generate max-pattern Cmax
– 2) Scan db to find max-subpattern for each
period segment in time-series data
– 3) Derive frequent patterns from the maxsubpattern tree
• combination of all 1-patterns
• Example: {a***, *b**, **c*} -> Cmax=abc*
– 2) subpattern is hit of a period segment if it is
maximal subpattern of Cmax and is true in that
period segment
7
• 2 database scans
8
Multiple periods methods
Multiple periods methods
• Looping over single period computation
• Shared mining of multiple periods
– find all partial periodic patterns for a set of
periods p1, …, pk
– given set of periods p1, …, pk
• Algorithm:
• Algorithm:
– 1) for each period pj in the range of interest (p1,
…, pk) apply max-subpattern hit set algorithm
• 2xk database scans
9
Implementation issues
– 1) find sets of 1-patterns and Cmax-s for each
period pj
– 2) scan db, find max-subpatterns for every
period segment for each period pj
• 2 database scans
• requires more space than multiple scan
10
Max-subpattern tree
• Max-subpattern hit set algorithm - efficient
implementation:
– good data structure:
• storing the set of max-subpattern hits
• deriving the set of all frequent patterns
– new data structure: max-subpattern tree
11
12
Max-subpattern tree
Performance
• Insertion of a node in the max-subpattern tree:
– starting from the root, find the corresponding node
by checking the missing non-* letter
– if node exists, increase count, otherwise create it
Time/
seconds
Apriori 500K
6000
HitSet 500K
3000
• Derivation of frequent patterns from maxsubpattern tree
Apriori 100K
1000
– Apriori-like technique
HitSet 100K
2
4
6
8
10
Max-Pat-Length
13
Conclusion/Summary
Future work
• Mining partial periodicity in time series
database
• Properties:
• Further exploration of shared mining for
periodicity with multiple periods
• Mining periodic association rules based on
partial periodicity
• Query- and constraint- based mining of
partial periodicity
– Apriori-like property
– max-subpattern hit set property
• Best performance: Max-subpattern hit set
algorithm
• Multiple periods mining: lots of scans or lot of
memory
14
15
16