A Time Series Representation Framework Based on Learned Patterns

Mustafa Gokce Baydogan (Boğaziçi University)
George Runger (Arizona State University)
Didem Yamak (DeVry University)

10/5/2013
8th INFORMS Workshop on Data Mining and Health Informatics (DM-HI 2013)

Outline

• Time series data mining
  • Motivation
  • Representing time series
  • Measuring similarity
• Learning a pattern-based representation
  • Pattern (relationship) discovery
  • Learned pattern similarity (LPS)
• Computational experiments and results
• Conclusions and future work

Time Series Data Mining
Motivations

• People measure things, and things (with rare exceptions) change over time.
• Time series are everywhere: ECG heartbeats, stock prices, ...
• Consider a patient's medical record:
  • test values
  • observations
  • actions and related responses

Time Series Data Mining
Motivations

• Other types of data can be converted to time series.
  • Everything is about the representation.
• Example: recognizing words.
  • An example word, "Alexandria", from the dataset of word profiles for George Washington's manuscripts.
  • A word can be represented by two time series created by moving over and under the word.

Images from E. Keogh. A quick tour of the datasets for VLDB 2008. In VLDB, 2008.

Challenges

• Local patterns are important.
• Translations and dilations (warping):
  • The four observed peaks are related to a certain event in the manufacturing process.
  • The times of the peaks may change (two peaks are observed earlier for the blue series): an indication of a problem, which occurred over a shorter time interval.

Challenges

• Time series are usually noisy.
• Multivariate time series (MTS):
  • Relations of patterns within the series and interactions between series may be important.
• High dimensionality.

Motivations

• Time series representation:
  • To reduce
    • high dimensionality
    • noise
  • To capture
    • trends, shapes and patterns, as they provide more information than the exact values of individual time series data points
• Time series similarity:
  • Accurate
    • Handles warping
  • Fast

Time series representation

[Overview figure omitted in transcript]
* Allows lower bounding for similarity computations.

Time series similarity

• Euclidean distance:
  • Popular (no parameters)
  • Intuitive
  • Fast computation
  • Performs badly
• Dynamic time warping (DTW):
  • Very popular (no parameters)
  • Handles warping (accurate)
  • Hard to beat
  • May perform badly (long series with noise)
• Elastic measures (e.g., SpADe):
  • Handle warping (accurate)
  • Too many parameters to tune
  • Computationally inefficient

Learning a pattern-based representation

• A regression tree-based approach is used to learn a representation.
  • Earlier work: Geurts (2001).
• Your data matrix: each time series is a sequence of observed values, e.g.

    t    | observed value
    1    | 0.440
    2    | 0.363
    3    | 0.081
    4    | 0.083
    ...  | ...
    127  | 0.962
    128  | 0.553

A new learning approach

• Predicting (forecasting) a segment: from your data matrix, forecast Δ (gap) time units forward.

Representation
Learned patterns

• The time series is 128 units long.
• Predictor segment: time units 1-60.
• Response segment: time units 51-111.

Multiple segments

1. Randomly select a response segment (column) of length L and concatenate it over all time series to create the response.
2. Build a regression tree:
   • At each split decision, select a random predictor column (one segment at a time)*.
   • Use multiple random Δ (gap) levels.
3. Build J trees with depth D. (A sketch of this training step follows below.)

* Known to work well for regression: P. Geurts, D. Ernst, and L. Wehenkel. Extremely randomized trees. Machine Learning, 63(1):3-42, 2006.

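A minimal sketch of this training step, using scikit-learn's DecisionTreeRegressor as a stand-in for the randomized regression trees of the talk; the function name fit_lps_trees and the n_pred parameter (number of candidate predictor segments per tree) are illustrative, not from the slides.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_lps_trees(X, L, J, D, n_pred=5, seed=0):
    """X: (N, T) array of N training series of length T.
    Returns J fitted trees with the predictor segment starts they use."""
    rng = np.random.default_rng(seed)
    N, T = X.shape
    n_starts = T - L + 1                       # possible segment start positions
    trees = []
    for _ in range(J):
        # Randomly select a response segment and concatenate it over all
        # series: each series contributes L rows (one per segment offset).
        r = rng.integers(n_starts)
        resp = X[:, r:r + L].reshape(-1)
        # Candidate predictor segments become columns; their random starts
        # relative to r play the role of the multiple random gap levels.
        cols = rng.choice(n_starts, size=min(n_pred, n_starts), replace=False)
        pred = np.column_stack([X[:, c:c + L].reshape(-1) for c in cols])
        # max_features=1 with splitter="random" considers one random
        # predictor column at each split, as in the slides.
        tree = DecisionTreeRegressor(max_depth=D, max_features=1,
                                     splitter="random",
                                     random_state=int(rng.integers(2**31)))
        trees.append((tree.fit(pred, resp), cols))
    return trees
```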
Multiple segments (cont.)

[Figure: example fits of Tree #1, Tree #2, Tree #3, ..., Tree #J over a 20-unit segment, annotated with the terminal node reached at each time point]

• Each terminal node defines a basis.
1. Aggregate the information over all trees for prediction (i.e. denoising).
2. Count the observations falling into each terminal node to obtain the pattern-based representation (a vector of size R×J); see the sketch below.

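A minimal sketch of step 2, consistent with the fit_lps_trees sketch above. For simplicity it counts over all node indices of each tree (internal nodes simply stay zero), so the vector length is node_count × J rather than exactly R × J; comparisons between series are unaffected.

```python
import numpy as np

def lps_representation(x, trees, L):
    """x: (T,) series; trees: output of fit_lps_trees.
    Returns the concatenated terminal-node count vector."""
    parts = []
    for tree, cols in trees:
        # One row per within-segment offset, one column per predictor segment.
        rows = np.column_stack([x[c:c + L] for c in cols])
        leaves = tree.apply(rows)              # terminal node id for each row
        counts = np.bincount(leaves, minlength=tree.tree_.node_count)
        parts.append(counts)
    return np.concatenate(parts)
```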
Similarity measure
Learned Pattern Similarity (LPS)

• Time series x is represented by its terminal-node count vector H(x).
• Let H_x^k be the kth entry of H(x); then*

  D(x, y) = \sum_{k=1}^{RJ} \left| H_x^k - H_y^k \right|

• Penalizes the number of mismatches:
  • Robust to noise.
  • Series with mismatching observations in the patterns are different.
  • Implicitly works on the discrete values.
• Robust to warping:
  • Representation learning handles the problem of warping.

* Assuming each tree has R terminal nodes.

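A direct transcription of the distance above, assuming both series were mapped to equal-length count vectors by the lps_representation sketch:

```python
import numpy as np

def lps_distance(hx, hy):
    """D(x, y) = sum over k of |H_x^k - H_y^k|."""
    return int(np.abs(hx - hy).sum())

# Hypothetical usage with the earlier sketches:
# trees = fit_lps_trees(X_train, L=64, J=150, D=8)
# d = lps_distance(lps_representation(x, trees, 64),
#                  lps_representation(y, trees, 64))
```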
Similarity measure (cont.)

• The computations are similar to Euclidean distance:
  • Fast.
  • Allows for bounding schemes.
• Early abandon (see the sketch below):
  • Similarity search: find the reference time series that is most similar to the query series.
  • Keep a record of the best distance found so far.
  • Stop computing the distance for a reference series once the running distance exceeds the best-so-far.
  • Known to improve the testing (query) time significantly.

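A minimal sketch of early abandoning for 1-NN similarity search over LPS count vectors (names illustrative):

```python
import numpy as np

def nn_search_early_abandon(h_query, h_refs):
    """h_query: (M,) count vector; h_refs: (n_refs, M) array.
    Returns the index of the nearest reference series."""
    best_i, best_d = -1, np.inf
    for i, h in enumerate(h_refs):
        d = 0.0
        for k, q in enumerate(h_query):
            d += abs(q - h[k])
            if d >= best_d:        # cannot beat the best-so-far: abandon
                break
        if d < best_d:
            best_i, best_d = i, d
    return best_i
```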
Experiments

• 45 univariate time series datasets from the UCR database*
• Compared to popular NN classifiers with different distance measures:
  • Euclidean
  • DTW (constrained and unconstrained versions)
  • SpADe
  • Sparse Spatial Sample Kernels (SSSK)
• Addition of difference series:
  • Takes trend information into consideration.
• A multivariate time series extension (if time permits).

Parameters
• Cross-validation is used to set the parameters for each dataset (see the sketch below):
  • Segment length (L): 0.25, 0.5 or 0.75 times the time series length.
  • Depth of trees (D): 4, 6 or 8.
• Number of trees = 150 (not important if set large enough).

* http://www.cs.ucr.edu/~eamonn/time_series_data/

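The grid above can be searched with a simple loop; this sketch assumes a hypothetical cv_error helper that returns the cross-validated 1-NN error of LPS for a given setting.

```python
from itertools import product

def select_params(T, cv_error, J=150):
    """Pick (L, D) minimizing the cross-validated error for a dataset
    whose series have length T. cv_error(L, D, J) is assumed given."""
    grid = [(int(f * T), d) for f, d in product([0.25, 0.5, 0.75], [4, 6, 8])]
    return min(grid, key=lambda ld: cv_error(L=ld[0], D=ld[1], J=J))
```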
Univariate datasets

[Figure: example datasets from domains including health, energy, robotics, astronomy, chemistry and gesture recognition]

Parameters

• Illustration over 6 datasets (L = 0.5×T).
[Figure omitted in transcript]

Average error rates over 10 replications
[Results table omitted in transcript]

Multivariate time series

• While training, randomly select one univariate time series and a target segment (see the sketch below):
  • Find splits over randomly selected predictor segments of randomly selected univariate time series.
• Complexity does not change:
  • More trees with larger depth may be required.
• uWaveGestureLibrary:
  • Accelerometer readings in three dimensions (x, y and z).
  • The same parameters result in an error rate of 0.022.

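A hedged adaptation of the earlier fit_lps_trees sketch to the multivariate case: X now has shape (N, n_dims, T), and the response segment and each candidate predictor segment are drawn from randomly chosen dimensions. All names are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_lps_trees_mts(X, L, J, D, n_pred=5, seed=0):
    """X: (N, n_dims, T) array of multivariate series."""
    rng = np.random.default_rng(seed)
    N, n_dims, T = X.shape
    n_starts = T - L + 1
    trees = []
    for _ in range(J):
        # Response segment from one randomly selected univariate series.
        dim, r = rng.integers(n_dims), rng.integers(n_starts)
        resp = X[:, dim, r:r + L].reshape(-1)
        # Each predictor column pairs a random dimension with a random start,
        # so splits can relate patterns across the component series.
        picks = [(rng.integers(n_dims), rng.integers(n_starts))
                 for _ in range(n_pred)]
        pred = np.column_stack([X[:, d, c:c + L].reshape(-1) for d, c in picks])
        tree = DecisionTreeRegressor(max_depth=D, max_features=1,
                                     splitter="random")
        trees.append((tree.fit(pred, resp), picks))
    return trees
```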
LPS
Conclusions and future work

• A new approach for time series representation:
  • Captures relations between and within the series.
  • Features are learned within the algorithm (not pre-specified).
  • Handles nominal and missing values.
  • Handles warping through representation learning.
  • Scalable (also allows for a parallel implementation).
• Training complexity: O(JNTD)
  • Linear in the time series length and the number of training series.
  • Training took at most 6 minutes over the 45 datasets (single thread, J=150, D=8, N=1800, T=750).
  • SpADe did not return a result after a week of running.
• Similarity search takes less than a millisecond.
• Fast and accurate results with few parameters.

LPS
Conclusions and future work

• This approach can be extended to many data mining tasks (for both univariate and multivariate time series, and for images), such as:
  • Denoising (in progress)
  • Forecasting (in progress)
  • Anomaly detection (in progress)
  • Clustering (in progress)
  • Indexing
  • ...

The LPS package is available at
http://www.mustafabaydogan.com/learned-pattern-similarity-lps.html

Thanks!
Questions and Comments?
The LPS package is available at
http://www.mustafabaydogan.com/learned-pattern-similarity-lps.html