Making Time-series Classification More Accurate Using Learned Constraints
© Chotirat “Ann” Ratanamahatana, Eamonn Keogh
2004 SIAM International Conference on Data Mining, April 22, 2004

Roadmap
• Time series and their similarity measures
• Euclidean distance and its limitation
• Dynamic time warping (DTW)
• Global constraints
• R-K band
• Experimental Evaluation
• Conclusions and future work

Important Note!
You are free to use any slides in this talk for teaching purposes, provided that the authorship of the slides is clearly attributed to Ratanamahatana and Keogh. You may not use any text or images contained here in a paper (including tech reports or unpublished works) or tutorial without the express permission of Dr. Keogh.

Chotirat Ann Ratanamahatana and Eamonn Keogh. Making Time-series Classification More Accurate Using Learned Constraints. In Proceedings of the SIAM International Conference on Data Mining (SDM '04), Lake Buena Vista, Florida, April 22-24, 2004, pp. 11-22.

Classification in Time Series
Classification, in general, maps data into predefined groups (supervised learning).

Will this person buy a computer?

Age   Income   Student   CreditRating   Class: buy comp.
28    High     No        Fair           No
25    High     No        Excellent      No
35    High     No        Fair           Yes
45    Medium   No        Excellent      No
18    Low      Yes       Fair           Yes
49    High     No        Fair           ??

Pattern recognition is a type of supervised classification in which an input pattern is assigned to one of the predefined classes based on its similarity to those classes.

Which class does [figure: unlabeled series] belong to? Class A or Class B?

Euclidean Distance Metric
Given two time series Q = q_1, …, q_n and C = c_1, …, c_n, their Euclidean distance is defined as

$$D(Q, C) = \sqrt{\sum_{i=1}^{n} (q_i - c_i)^2}$$

[Figure: time series Q and C plotted over time steps 0 to 150.]

Limitations of Euclidean Metric
Very sensitive to some distortions in the data.
Training data consists of 10 instances from each of the 3 classes.
Perform a 1-nearest-neighbor algorithm with leave-one-out evaluation, averaged over 100 runs.
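The Euclidean distance and the leave-one-out 1-nearest-neighbor evaluation described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names `euclidean` and `loo_1nn_error` and the toy data are assumptions introduced here for the example.

```python
import math


def euclidean(q, c):
    """Euclidean distance between two equal-length sequences."""
    if len(q) != len(c):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((qi - ci) ** 2 for qi, ci in zip(q, c)))


def loo_1nn_error(series, labels, dist=euclidean):
    """Leave-one-out error rate of a 1-nearest-neighbor classifier.

    Each series is held out in turn and labeled with the class of its
    nearest neighbor among the remaining series.
    """
    errors = 0
    for i, query in enumerate(series):
        others = [j for j in range(len(series)) if j != i]
        nearest = min(others, key=lambda j: dist(query, series[j]))
        if labels[nearest] != labels[i]:
            errors += 1
    return errors / len(series)


# Toy data, made up purely for illustration (not from the talk).
toy_series = [
    [0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1],
    [1.0, 1.1, 1.0],
    [1.1, 1.0, 1.1],
]
toy_labels = ["A", "A", "B", "B"]
toy_error = loo_1nn_error(toy_series, toy_labels)
```

Passing a different `dist` function (for example a DTW implementation) reuses the same evaluation loop, which is how the two error rates on this slide would be compared.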
Euclidean distance error rate: 29.77%
DTW error rate: 3.33%

Dynamic Time Warping (DTW)
Euclidean distance: one-to-one alignments.
Time warping distance: non-linear alignments are allowed.

How Is DTW Calculated? (I)

$$DTW(Q, C) = \min\left\{ \sqrt{\sum_{k=1}^{K} w_k} \right\}$$

[Figure: warping matrix for Q and C showing a warping path w.]

How Is DTW Calculated? (II)
The warping path w can be found using dynamic programming to evaluate the following recurrence:

$$\gamma(i, j) = d(q_i, c_j) + \min\{\gamma(i-1, j-1),\ \gamma(i-1, j),\ \gamma(i, j-1)\}$$

where γ(i, j), the cumulative distance, is the sum of the distance d(q_i, c_j) in the current cell and the minimum of the cumulative distances of the adjacent cells (i-1, j-1), (i-1, j), and (i, j-1).

Global Constraints (I)
Prevent any unreasonable warping.
[Figure: Sakoe-Chiba Band and Itakura Parallelogram drawn on the warping matrix of Q and C.]

Global Constraints (II)
A global constraint for a sequence of size m is defined by R, where $R_i = d$, $0 \le d \le m$, $1 \le i \le m$. R_i defines the freedom of warping above and to the right of the diagonal at any given point i in the sequence.
[Figure: R_i illustrated on the Sakoe-Chiba Band and the Itakura Parallelogram.]

The Wider the Band, the Better?
Euclidean distance = 2.4836
DTW dist = 1.6389 (R = 1)
DTW dist = 1.0204 (R = 10)
DTW dist = 1.0204 (R = 25)
The DTW distances for R = 10 and R = 25 are identical.

Wider Isn’t Always Better
Recall this example.
Most accuracies peak at smaller window sizes.
[Figure: accuracy (%) and CPU time (msec, ×10^4) versus warping window size (0 to 70, with Euclidean marked on the window-size axis) for the auslan, gun, digit, trace, and wordspotting datasets.]
A larger warping window is not always a good thing.

Ratanamahatana-Keogh Band (R-K Band)
Solution: we create a band of arbitrary shape and size that is appropriate for the data we want to classify.

How Many Bands Do We Need?
• Of course, we could use the same single band to classify all the classes, as almost all researchers do.
• But the width of the band depends on the characteristics of the data within each class. A single band for classification is unlikely to generalize.
• Our proposed solution: we create an arbitrary band (R-K band) for each class and use it accordingly for classification.

How Do We Create an R-K Band?
First attempt: we could look at the data and manually create the shape of the bands (we then need to adjust the width of each band as well until we get a good result).
[Figure: two example time series (time steps 0 to 250) with their manually created bands.]
100% accuracy!

Learning an R-K Band Automatically
Our heuristic search algorithm automatically learns the bands from the data (sometimes we even get an unintuitive shape that gives a good result).
[Figure: two example time series (time steps 0 to 250) with their automatically learned bands.]
100% accuracy as well!

R-K Band Learning With Heuristic Search
[Flowchart: Calculate h(1); Calculate h(2); test h(2) > h(1)?, with Yes and No branches.]

R-K Band Learning in Action!
[Animated figure.]

Experiment: Datasets
1. Gun Problem
2. Trace (transient classification benchmark)
3. Handwritten Word Spotting data
[Figure: sample time series from the datasets.]

Experimental Design
We measure the accuracy and CPU time on each dataset using the following methods:
1. Euclidean distance
2. Uniform warping window (size 1 to 100)
3. Learning different R-K bands for all classes and performing classification based on them.
Leave-one-out 1-nearest-neighbor classification is used to measure the accuracy. A lower bounding method is also used to prune off unnecessary calculations of DTW.
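The band-constrained DTW and the lower-bound pruning mentioned in the experimental design can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes equal-length series, a band given as one integer width per point with cell (i, j) allowed only when |i - j| <= r[i], and an envelope-style lower bound (in the spirit of LB_Keogh) standing in for "the lower bounding method", which the slides do not name. The names `band_dtw`, `envelope_lower_bound`, and `nearest_neighbor` are hypothetical.

```python
import math


def band_dtw(q, c, r):
    """DTW distance restricted to cells (i, j) with |i - j| <= r[i].

    r holds one non-negative integer width per point; a constant r is a
    uniform (Sakoe-Chiba style) window, and r = [0, ...] reduces to the
    Euclidean distance.
    """
    n = len(q)
    if len(c) != n or len(r) != n:
        raise ValueError("q, c and r must have the same length")
    inf = math.inf
    # gamma[i][j]: cumulative squared cost of the best path ending at (i, j).
    gamma = [[inf] * n for _ in range(n)]
    for i in range(n):
        lo, hi = max(0, i - r[i]), min(n - 1, i + r[i])
        for j in range(lo, hi + 1):
            d = (q[i] - c[j]) ** 2
            if i == 0 and j == 0:
                gamma[i][j] = d
                continue
            best = inf
            if i > 0 and j > 0:
                best = min(best, gamma[i - 1][j - 1])
            if i > 0:
                best = min(best, gamma[i - 1][j])
            if j > 0:
                best = min(best, gamma[i][j - 1])
            gamma[i][j] = d + best
    return math.sqrt(gamma[n - 1][n - 1])


def envelope_lower_bound(q, c, r):
    """Cheap lower bound on band_dtw(q, c, r).

    Each q[i] can only be matched to values of c inside its window, so
    its squared distance to the window's [min, max] range cannot exceed
    its contribution to the true DTW cost.
    """
    n = len(q)
    total = 0.0
    for i in range(n):
        window = c[max(0, i - r[i]):min(n - 1, i + r[i]) + 1]
        upper, lower = max(window), min(window)
        if q[i] > upper:
            total += (q[i] - upper) ** 2
        elif q[i] < lower:
            total += (lower - q[i]) ** 2
    return math.sqrt(total)


def nearest_neighbor(query, train, bands):
    """1-NN search over (series, label) pairs with lower-bound pruning.

    Assumption: each candidate is compared using the band of its own
    class, looked up in the dict `bands` (label -> list of widths).
    """
    best_idx, best_dist = -1, math.inf
    for idx, (series, label) in enumerate(train):
        r = bands[label]
        if envelope_lower_bound(query, series, r) >= best_dist:
            continue  # the full DTW cannot beat the best match so far
        dist = band_dtw(query, series, r)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx, best_dist
```

Because the lower bound costs only linear time per candidate while the banded DTW grows with the band width, skipping candidates whose bound already exceeds the best distance found is where the CPU-time savings would come from in a sketch like this.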
Experimental Results (I)
[Figure: Gun Draw and Point example series (time steps 0 to 150), each shown with its band.]

                         Euclidean   Best Unif. Warping   10% Unif. Warping   DTW with R-K Band
Error Rate (%)           5.5         1.0 (width = 4)      4.5 (width = 15)    0.5 (max width = 4)
CPU Time (msec)          N/A         2,440                5,430               1,440
CPU Time, no LB (msec)   60          11,820               17,290              9,440

Experimental Results (II)
[Figure: four example series (time steps 0 to 300), each shown with its band.]

                         Euclidean   Best Unif. Warping   10% Unif. Warping   DTW with R-K Band
Error Rate (%)           11          0 (width = 8)        0 (width = 27)      0 (max width = 7)
CPU Time (msec)          N/A         16,020               34,980              7,420
CPU Time, no LB (msec)   210         144,470              185,460             88,630

Conclusions
• Different shapes and widths of the band contribute to the classification accuracy.
• Each class can be better recognized using its own individual R-K Band.
• A heuristic search algorithm is a good approach to R-K Band learning.
• The R-K Band combined with the lower bounding technique yields higher accuracy and makes a classification task much faster.

Future Work
• Investigate other choices that may make envelope learning more accurate.
  – Heuristic functions
  – Search algorithm (refining the search)
• Is there a way to always guarantee an optimal solution?
• Examine the best way to deal with multivariate time series.
• Consider a more generalized form of our framework, i.e., a single R-K Band learned for a particular domain.
• Explore the utility of the R-K Band specifically on real-world problems: music, bioinformatics, biomedical data, etc.

Contact: [email protected] [email protected]
Homepage:
All datasets are publicly available at: UCR Time Series Data Mining Archive: