
The Landmark Model: An Instance Selection Method for Time Series Data
C.-S. Perng, S. R. Zhang, and D. S. Parker
Instance Selection and Construction for Data Mining, Chapter 7, pp. 113-130
Cho, Dong-Yeon

Introduction
- Complexity
  - Patterns: continuous time series segments with particular features
  - Events reflected in a time series are better represented by patterns.
  - The complexity of processing patterns
    - The number of all possible segments for a time series of length $N$ is $N(N+1)/2$.
    - Even a simple inspection of each of these segments takes $O(N^3)$ time, since there are $O(N^2)$ segments of length up to $N$.
  - Good instance selection algorithms are especially helpful here, since they can greatly reduce complexity by reducing the volume of data.
- Similarity Model
  - Euclidean distance does not match human intuition.
    - Example: 1,2,3,4,3 and 3,4,5,6,5 have the same shape (the second is the first shifted up by 2), yet their Euclidean distance is not zero.
  - Previous works
    - None of the previously proposed techniques supports a similarity model that can both capture this similarity and support efficient pattern querying of time series.
- Pattern Representation
  - Two formats for temporal association rules to verify the cause-effect relation:
    - Forward association: $C_1, \ldots, C_n \Rightarrow E_1, \ldots, E_m$
    - Backward association: $C_1, \ldots, C_n \Leftarrow E_1, \ldots, E_m$
  - Association rules can either be formulated as hypotheses and verified with data, or be discovered by a data mining process.
  - It is still not clear what kind of segments can represent events.
  - What is the basic vocabulary for spelling association rules?
- Noise Removal and Data Smoothing
  - Commonly used smoothing techniques, such as moving averages, often lag or miss the most significant peaks and bottoms.
  - These peaks and bottoms can be very meaningful, and smoothing or removing them can lose a great deal of information.
  - Little previous work treats smoothing as an integral part of the process of pattern definition, index construction, and query processing.
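The segment count and the Euclidean distance example from the Introduction can be checked with a few lines of Python. This is a minimal sketch: the helper names and the third sequence `c` are hypothetical, added only to show that a differently shaped series can be closer in Euclidean distance than a shifted copy.

```python
import math

def count_segments(n):
    """Number of contiguous, non-empty segments of a series of length n."""
    return n * (n + 1) // 2

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def first_differences(s):
    """Point-to-point changes; unaffected by a constant vertical shift."""
    return [s[i + 1] - s[i] for i in range(len(s) - 1)]

a = [1, 2, 3, 4, 3]
b = [3, 4, 5, 6, 5]  # the slide's example: a shifted up by 2
c = [3, 2, 1, 2, 3]  # hypothetical series with a different shape

print(count_segments(5))                           # 15
print(euclidean(a, b) > euclidean(a, c))           # the shifted copy is "farther"
print(first_differences(a) == first_differences(b))  # identical shape
```

Under Euclidean distance the shifted copy `b` is farther from `a` than the differently shaped `c`, which is the mismatch with intuition that motivates a shift-invariant similarity model.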
The Landmark Data Model and Similarity Model
- The Landmark Concept
  - Episodic memory: humans and animals depend on landmarks in organizing their spatial memory.
  - Landmarks: (times, events)
  - Use landmarks instead of the raw data for processing.
  - A point is an N-th order landmark of a curve if the N-th order derivative is 0 at that point.
    - Examples: local maxima and local minima (first order), inflection points (second order)
  - Tradeoff
    - The more different types of landmarks in use, the more accurately a time series will be represented.
    - Using fewer landmarks results in storage savings and smaller index trees.
  - Stock market data
    - Almost half of the record
    - The normalized error is reasonably small when the curve is reconstructed from the landmarks.
    - The more volatile the time series, the less significant the higher-order landmarks.
- Smoothing
  - Minimal Distance/Percentage Principle (MDPP)
    - Parameters: a minimal distance $D$ and a minimal percentage $P$
    - Remove landmarks $(x_i, y_i)$ and $(x_{i+1}, y_{i+1})$ if
      $$x_{i+1} - x_i < D \quad \text{and} \quad \frac{|y_{i+1} - y_i|}{(|y_i| + |y_{i+1}|)/2} < P$$
  - The effect of the MDPP
  - Normalized error generated by the MDPP and DFT
- Transformations
  - Six kinds of transformations
    - Shifting: $SH_k(f)$ such that $SH_k(f(t)) = f(t) + k$, where $k$ is a constant.
    - Uniform Amplitude Scaling: $UAS_k(f)$ such that $UAS_k(f(t)) = k f(t)$, where $k$ is a constant.
    - Uniform Time Scaling: $UTS_k(f)$ such that $UTS_k(f(t)) = f(kt)$, where $k$ is a positive constant.
    - Uniform Bi-scaling: $UBS_k(f)$ such that $UBS_k(f(t)) = k f(t/k)$, where $k$ is a positive constant.
    - Time Warping: $TW_g(f)$ such that $TW_g(f(t)) = f(g(t))$, where $g$ is positive and monotonically increasing.
    - Non-uniform Amplitude Scaling: $NAS_g(f)$ such that $NAS_g(f(t)) = g(t)$, where for every $t$, $g'(t) = 0$ if and only if $f'(t) = 0$.
  - The more transformations included in a similarity model, the more powerful the similarity model.
  - These transformations can be composed to form new transformations.
  - The composition order is flexible: $F_u \circ G_v = G_{u'} \circ F_{v'}$ for suitable parameters $u'$, $v'$.
  - The composition is idempotent: $F_w = F_u \circ F_v$.
  - Two time series are defined to be similar if they differ only by a transformation.
- Landmark Similarity
  - Dissimilarity measure
    - Given two sequences of landmarks $L = \langle L_1, \ldots, L_n \rangle$ and $L' = \langle L'_1, \ldots, L'_n \rangle$, where $L_i = (x_i, y_i)$ and $L'_i = (x'_i, y'_i)$, the distance between the k-th landmarks is defined by
      $$\Delta_k(L, L') = \left(\delta_k^{time}(L, L'),\ \delta_k^{amp}(L, L')\right)$$
      where
      $$\delta_k^{time}(L, L') = \begin{cases} \dfrac{|(x_k - x_{k-1}) - (x'_k - x'_{k-1})|}{(|x_k - x_{k-1}| + |x'_k - x'_{k-1}|)/2} & \text{if } 1 < k \le n \\ 0 & \text{otherwise} \end{cases}$$
      $$\delta_k^{amp}(L, L') = \begin{cases} 0 & \text{if } y_k = y'_k \\ \dfrac{|y_k - y'_k|}{(|y_k| + |y'_k|)/2} & \text{otherwise} \end{cases}$$
    - The distance between the two sequences is
      $$\Delta(L, L') = \left(\|\delta^{time}(L, L')\|,\ \|\delta^{amp}(L, L')\|\right)$$
      where $\|\cdot\|$ is a vector norm over the per-landmark distances.
    - We define $(\delta^{time}, \delta^{amp}) \le (\delta'^{time}, \delta'^{amp})$ if $\delta^{time} \le \delta'^{time}$ and $\delta^{amp} \le \delta'^{amp}$.
  - A landmark similarity measure is a binary relation on time series segments defined by a 5-tuple $LSM = \langle D, P, T, \epsilon_{time}, \epsilon_{amp} \rangle$.
    - Given two time series sequences $s_1$ and $s_2$, let $L_1$ and $L_2$ be the landmark sequences after MDPP($D$, $P$) smoothing.
    - $(s_1, s_2) \in LSM$ if and only if $|L_1| = |L_2|$ and there exist two parameterized transformations $T_1$ and $T_2$ of $T$ whose dissimilarity satisfies $\|\delta^{time}(T_1(L_1), T_2(L_2))\| < \epsilon_{time}$ and $\|\delta^{amp}(T_1(L_1), T_2(L_2))\| < \epsilon_{amp}$.

Data Representation
- Family of Time Series Segments
  - Segments that are equivalent under the six transformations
  - Replace naïve landmark coordinates with features of landmarks that are invariant under these transformations.
  - $F = \{y, h, v, hr, vr, vhr, pv\}$
    - $h_i = x_i - x_{i-1}$
    - $v_i = y_i - y_{i-1}$
    - $hr_i = h_{i+1} / h_i$
    - $vr_i = v_{i+1} / v_i$
    - $vhr_i = v_i / h_i$
    - $pv_i = v_i / y_i$
  - Invariant features under transformations

Conclusion
- Landmark Model
  - An instance selection system for time series
  - It integrates similarity measures, data representation, and smoothing techniques in a single framework.
- Minimal Distance/Percentage Principle (MDPP): the smoothing method for the Landmark Model
- The Landmark Model also supports a generalized similarity model that can ignore differences corresponding to the six transformations.
  - Intuitive to humans
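The landmark extraction, MDPP smoothing, and landmark dissimilarity described in the slides can be sketched as follows. This is an illustrative sketch, not the authors' implementation, and it rests on several assumptions that the slides do not state: the series is uniformly sampled with the index used as time, endpoints are kept as landmarks, only strict sign changes of the slope count as first-order landmarks, MDPP scans left to right and re-checks the newly adjacent pair after a removal, and the vector norm is taken to be the maximum. All function names are hypothetical.

```python
def first_order_landmarks(ys):
    """Local maxima/minima of a sampled series as (time, value) pairs.
    Assumption: index is time; endpoints are kept; plateaus are ignored."""
    n = len(ys)
    if n < 3:
        return [(float(i), float(y)) for i, y in enumerate(ys)]
    out = [(0.0, float(ys[0]))]
    for i in range(1, n - 1):
        left, right = ys[i] - ys[i - 1], ys[i + 1] - ys[i]
        if left * right < 0:  # slope changes sign: strict local extremum
            out.append((float(i), float(ys[i])))
    out.append((float(n - 1), float(ys[-1])))
    return out

def mdpp(landmarks, D, P):
    """Drop adjacent pairs that are closer than D in time and differ by
    less than fraction P of their mean absolute amplitude.
    Assumption: left-to-right scan with a re-check after each removal."""
    out = list(landmarks)
    i = 0
    while i + 1 < len(out):
        (x0, y0), (x1, y1) = out[i], out[i + 1]
        mean_abs = (abs(y0) + abs(y1)) / 2
        small = mean_abs == 0 or abs(y1 - y0) / mean_abs < P
        if x1 - x0 < D and small:
            del out[i:i + 2]
            i = max(i - 1, 0)
        else:
            i += 1
    return out

def landmark_distance(L, M):
    """(time, amplitude) dissimilarity of two equal-length landmark
    sequences. Assumption: the max norm aggregates per-landmark terms."""
    if len(L) != len(M):
        raise ValueError("landmark sequences must have equal length")
    d_time, d_amp = [0.0], []
    for k in range(1, len(L)):
        h, h2 = L[k][0] - L[k - 1][0], M[k][0] - M[k - 1][0]
        d_time.append(abs(h - h2) / ((abs(h) + abs(h2)) / 2))
    for (_, y), (_, y2) in zip(L, M):
        d_amp.append(0.0 if y == y2 else abs(y - y2) / ((abs(y) + abs(y2)) / 2))
    return max(d_time), max(d_amp)

series = [10, 12, 9, 9.2, 9.1, 14, 8]      # hypothetical sample data
smoothed = mdpp(first_order_landmarks(series), D=2, P=0.05)
print(smoothed)  # the small wiggle at t=2..3 is removed

doubled = [(x, 2 * y) for x, y in smoothed]  # uniform amplitude scaling by 2
print(landmark_distance(smoothed, doubled))
```

On the sample data every point is a first-order landmark, and MDPP removes only the pair at times 2 and 3, whose amplitudes differ by about 2%. The raw distance to the amplitude-scaled copy is zero in time but not in amplitude, which is why the similarity measure compares the sequences only after applying transformations from $T$.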
