Mining Time-Series Databases
I – Introduction
Data Mining
• The extraction of nontrivial, implicit and useful knowledge from the data
[Figure: Data → Data Mining → Knowledge, with data mining drawing on artificial intelligence, computer science, statistics and information retrieval]
Data Mining goals

• To find “structure” in the large amount of information available from different sources
• To organize the data
• To identify patterns that translate into new understandings and viable predictions
• To discover relationships between data and phenomena that ordinary operations and routine analysis would otherwise overlook

Time Series
• People measure things:
  • Oil price
  • Sócrates' popularity
  • Blood pressure, etc.
and things change over time, creating a time series

A Time-Series Database is a database
that contains data for each point in time.

Examples:
• Weather Data
• Stock Prices
What to Mine?

• Full Periodic Patterns
  • Every point in time contributes to the cyclic behavior of the time-series for each period.
  • e.g., describing the weekly stock price pattern considering all the days of the week.
• Partial Periodic Patterns
  • Describing the behavior of the time-series at some but not all points in time.
  • e.g., discovering that stock prices are high every Saturday and low every Tuesday.
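The difference between the two pattern types can be made concrete with a small sketch. The prices, the `high` and `low` thresholds and both function names below are hypothetical, chosen only to mirror the Saturday/Tuesday example; this is one simple way to look for such patterns, not the method of any particular mining algorithm.

```python
from statistics import mean

# Hypothetical daily prices for three 7-day weeks (Mon..Sun); made-up data.
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
prices = [
    10, 6, 9, 11, 10, 15, 9,
    12, 5, 11, 9, 12, 16, 11,
    9, 6, 12, 10, 8, 15, 12,
]

def weekly_profile(series, period=7):
    """Full periodic view: mean value at every position of the period."""
    return [mean(series[pos::period]) for pos in range(period)]

def partial_pattern(series, period=7, high=14, low=7):
    """Partial periodic view: keep only the positions that are
    consistently high or consistently low in every period.
    The thresholds are assumed values for this toy data."""
    pattern = {}
    for pos in range(period):
        values = series[pos::period]
        if all(v >= high for v in values):
            pattern[pos] = "high"
        elif all(v <= low for v in values):
            pattern[pos] = "low"
    return pattern

profile = weekly_profile(prices)
partial = {DAYS[pos]: label for pos, label in partial_pattern(prices).items()}
print(profile)   # one mean per weekday: every day contributes
print(partial)   # only the days with a stable behavior are reported
```

The full profile describes all seven weekdays, while the partial pattern says something only about the days whose behavior repeats in every week and stays silent about the rest.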
Time Series definition

• A (numeric) time series is a sequence of observations of a numeric property over time
[Figure: column of sample observations: -1,25; -1,00; 0,01; 0,05; …; 5,45; 0,00; …]
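As a minimal sketch of this definition, a time series can be held as a list of (time, value) pairs kept in time order. The start date, the one-observation-per-day spacing and the helper name `is_time_ordered` are assumptions made for illustration, and the values are only illustrative.

```python
from datetime import date, timedelta

# Illustrative observations of one numeric property.
values = [-1.25, -1.00, 0.01, 0.05]

# Assumed for this sketch: one observation per day from an arbitrary start date.
start = date(2024, 1, 1)
series = [(start + timedelta(days=i), v) for i, v in enumerate(values)]

def is_time_ordered(ts):
    """True if every observation is strictly later than the previous one."""
    return all(earlier[0] < later[0] for earlier, later in zip(ts, ts[1:]))

print(series[0])
print(is_time_ordered(series))
```

The ordering check captures what separates a time series from a plain collection of numbers: the position of each value in time is part of the data.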
Motivation to Work in Time Series
• Time series are ubiquitous
• Most of the information (data) produced in a variety of areas are time series
  • e.g. about 50% of all newspaper graphics are time series
• Other types of data can be converted to time series
Image from E. J. Keogh. A decade of progress in indexing and mining large time series
databases. In VLDB, page 1268, 2006.
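The point that other types of data can be converted to time series can be sketched with a symbolic sequence such as DNA. The step sizes in `STEP` and the function name `dna_to_series` are hypothetical; any fixed numeric mapping of the four bases would serve the same purpose, and this is an illustration rather than a prescribed encoding.

```python
from itertools import accumulate

# Hypothetical step per nucleotide; chosen only for illustration.
STEP = {"A": 2, "G": 1, "C": -1, "T": -2}

def dna_to_series(sequence):
    """Running sum of per-base steps: one numeric value per base,
    in the same order as the original sequence."""
    return list(accumulate(STEP[base] for base in sequence))

walk = dna_to_series("GATTACA")
print(walk)
```

Once the sequence is numeric and ordered, it can be handled with the same tools as any other time series.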
Time Series Examples
Images from a variety of papers by E. J. Keogh. Available at: www.cs.ucr.edu/~eamonn
[Figure panels: electroencephalogram, historical archives, physiology (muscle activation), sensors, ECG, motion data]
Time Series Examples (cont.)
Image from E. J. Keogh. A decade of progress in indexing and mining large time series databases. In VLDB, page 1268, 2006.
[Figure panels: stocks data, images, DNA sequences, sales, goods consumption, animal ECG, motion capture, handwritten character recognition]
Time Series data characteristics
• Analysis is hard, as we are typically dealing with massive data sets:
  • One hour of EEG: 1 GB of data
  • Typical weblog: 5 GB / week
  • MACHO database: 5 TB (growing 3 GB a day)
  • Stanford Linear Accelerator database: 500 TB
• Algorithms with quadratic complexity do not scale to data of this size
• The data also present some distortions (noise, scaling effects, etc.) that make the analysis more difficult
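To illustrate why scaling effects complicate analysis, the sketch below compares two series with the same shape but a different scale and offset. Z-normalization is used here as one common way to remove such effects; the slides do not name a specific technique, so this choice, the toy data and the function names are assumptions.

```python
from math import sqrt

def z_normalize(series):
    """Rescale to zero mean and unit (population) standard deviation."""
    n = len(series)
    mu = sum(series) / n
    sigma = sqrt(sum((x - mu) ** 2 for x in series) / n)
    if sigma == 0:
        return [0.0] * n  # constant series: nothing to rescale
    return [(x - mu) / sigma for x in series]

def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy data: `scaled` has the same shape as `base`, stretched and shifted.
base = [1.0, 2.0, 3.0, 2.0, 1.0]
scaled = [10 * x + 100 for x in base]

raw_distance = euclidean(base, scaled)
normalized_distance = euclidean(z_normalize(base), z_normalize(scaled))
print(raw_distance)         # large, although the shapes are identical
print(normalized_distance)  # close to zero once scale and offset are removed
```

Without normalization the two series look very different to a distance measure, even though they differ only by scale and offset.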
Time Series Data Mining Tasks
Image from E. J. Keogh. A decade of progress in indexing and mining large time series
databases. In VLDB, page 1268, 2006.