Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Exact Indexing of Dynamic Time Warping Dr Eamonn Keogh University of California – Riverside Computer Science & Engineering Department What are Time Series? • A time series is a collection of observations made sequentially in time. • Lots of useful information can be obtained by measuring time series data over times. pattern that are commonly being classified • Time series occur in virtually every medical, scientific and businesses domain • Finding out the similarity between two time series is the heart of many time series data mining applications What are the challenges of working with Time Series data? • subjective notion of similarity How do we define similarity • large amount of data • different type of data format How do we search quickly Any solutions available? •We need a method that allows an elastic shifting of the time axis, to accommodate sequences which are similar, but out of time phase •Euclidean Distance •most popular approach for defining similarity and indexing of time series data. •a very brittle distance approach which cannot index time series accurately among two different time phases. •Dynamic Time Warping •base on dynamic programming which proved to be a very reliable method. •does not obey the triangular inequality. This has resisted attempts at exact indexing. •“… performance on large database may be a limitation.” What are the challenges of working with Time Series data? cont. Classification experiment on Cylinder-Bell-Funnel dataset • Training data consists of 10 exemplars from each class. • (One) Nearest Neighbor Algorithm. • “Leaving-one-out” evaluation, averaged over 100 runs. Times Series A Times Series B Comparison of two approaches mean error rate speed Euclidean Distance Metric 0.2734 X Dynamic Time Warping (DTW) 0.0269 230X •The result proved the reliability of DTW and motivates the necessity of introducing technique to index DTW What is Dynamic Time Warping ? • DTW is being used in different area like chemical engineering, pattern matching, bioinformatics, . . . What is Time Warping? Given: two sequences x1,x2,...,xn and y1,y2,...,ym two series in different time phase Wanted: align two sequence base on a common time-axis shifting of time axis Aligning time series with Dynamic Programming Matrix optimal warping path two time series Q and C, length n and m respectively an (n*m) matrix is constructed to store the distance between items in Q and C. the result alignment What is Dynamic Time Warping? cont. • There're three basic constraints for time warping Boundary conditions -we want the path not to skip a part at the beginning or ending of utterance Continuity -no jumps Monotonicity - we can't go back in time •In the matrix, there are many warping paths that satisfy the three basic constraints. Goal : How can we find a path that gives the minimal overall distance? formula of dynamic programming (i,j) = d(qi,cj) + min{ (i-1,j-1) , (i-1,j ) , (i,j-1) } Demonstration of computing the Minimal Editing Distance: http://isl.ira.uka.de/speechCourse/slides/dtw/editdist/applet/applet.html How to speed up the calculation of DTW? Basic idea: Approximate the time series with some compressed or down sampled representation, and do DTW on the new representation. Q p w k C j 11 w1 n i Solution : Lower Bounding Measure with Global Path Constraint What is Global Path Constraints ? - path should be close to diagonal - in theory, it limits warping path by how far it may stay from the diagonal - in practice, it constrains the range of indices in the warping path Why using global constraints ? - speed up the DTW distance calculation (reduces the search effort from O(n2) to O(n)) - to avoid a relatively small section of one sequence maps onto a relatively large section of another sequence. warping window How to speed up the calculation of DTW? cont. What is Lower Bounding Measure ? - a dimensionality reduction technique Why using Lower Bounding Measure ? - both Euclidean metric and DTW is either heavily I/O bound or very demanding in terms of CPU time. - a fast lower bounding function can address this problem by erasing sequences that could not possibly be a best match. How to define a good Lower Bounding Measure ? A good lower bounds function should basically match two criteria - must be fast to compute - must produces a relatively tight lower bounds which means that it can more tightly approximates the true DTW distance Two existing type of Lower bounding measure [LB_Kim] [LB_Yi] squared different between two series' first (A), last (D), min (B) and max (C) sum of squared length of gray line represent the minimum the corresponding points contribution to the overall DTW How to speed up the calculation of DTW? cont. Proposed lower bounding measure : LB_Keogh with global constraint Notation A: bounding envelope - Sako-China Band (global constraint) B: bounding envelope - Itakura Parallelogram (global constraint) Q: original sequence U: Upper L: Lower [LB_Keogh] 1. Limiting the range of warping path by using global constraint 2. Approximating the tightest lower bound by using LB_Keogh Itakura Parallelogram and LB_Keogh together produces the tightest bounds squared sum of the distances from every part of the candidate sequence C not falling within the bounding envelope, to the nearest orthogonal edge of the bounding envelope is returned as the lower bound. LB_Keogh lower bound <= DTW bound How to index Dynamic Time Warping? Piecewise Constant Approximation (PAA) Basic idea: Represent the time series as a sequence of box basis functions. Each box is in same length • Reducing the time series from n dimensions to N dimensions, the data is divided into N equal sized “frames”. Why using PAA ? • time series data may include hundreds to thousands items, this will rapidly degrade the performance of indexing. • 16 dimension time series will be reasonably handled by multidimensional index structure. • a way is needed to further reduce the dimension of lower bound by LB_Keogh • PAA is the most efficient technique among other approaches (Wavelets, Fourier Transforms, Adaptive Piecewise Constant Approximation) A sequence of length 256 is reduced to 16 dimensions How to index Dynamic Time Warping? cont. Modified PAA to index time warped queries [LB_PAA] • there are two time series data sets (Q and C) in length n, both are being divided into N dimension. C is a candidate sequence. Q is a query sequence. • approximate the minimum bounding rectangle (R) in each dimension of candidate sequence C • approximate the max (U^) and min (L^) point in each dimension of query sequence Q by using LB_PAA How to index Dynamic Time Warping? cont. Modified PAA to index time warped queries •define a MINDIST(Q,R) function that returns a lower bounding measure of the distance between a query Q, and R, were R is a Minimum Bounding Rectangle (MBR) of C. l4 original dimension l5 L^2 h2 reduced dimension U^ 4 U^ 5 How to search time series with DTW ? K-Nearest Neighbor Search Algorithm What is K-NN Search - KNNSearch(Q,K)? • query sequence Q and desired number of K time series neighbors from a set C • priority queue is being used for storing the index in an increasing order of distance RangeSearch Algorithm What is RangeSearch Algorithm - RangeSearch(Q,E,T)? • answering a range queries • a classic R-tree-style recursive search algorithm Experimental Evaluation Evaluation among three lower bounding measures (LB_Kim, LB_Yi, LB_Keogh) •Comparing tightness of lower bound against query length •Comparing pruning power against database size Evaluation between linear scan and LB_Keogh •Comparing normalized CPU cost against data size Conclusion • This paper override the traditional believe of "dynamic time warping ...cannot be speeded up by indexing • However, it based on two assumption - both time series data are in the same length (out of time phase is allowed) - index sequence when warping path is constrained (Boundary conditions, Continuity, Monotonicity, Global constraint) • The proposed approach is state of the art in terms of efficiency and flexibility. It may benefit the matching of 2 and 3 dimensional shapes. Acknowledge Dr Eamonnn Keogh (Computer Science & Engineering Department, University of California – Riverside, Riverside,CA 92521) Exact Indexing of Dynamic time Warping A Tutorial on Indexing and Mining Time Series Data http://www.cs.ucr.edu/~eamonn/tutorial_on_time_series.ppt Carnegie Mellon University Automatic Speech Recognition http://werner.ira.uka.de/speechCourse