Burst Detection: Application

Online Pattern Discovery Applications in Data Streams
• Sensor-less: Pairs-trading in stock trading (find highly correlated
pairs in n log n time)
• Sensor-full: Gamma Ray Detection in astrophysics (burst detection
over a large number of window sizes in almost linear time)
• Dennis Shasha (joint work with Yunyue Zhu)
• yunyue,[email protected]
Application 1: Pairs Trading
• Stock prices streams
– The New York Stock Exchange (NYSE)
– 50,000 securities (streams); 100,000 ticks (trade and quote)
• Pairs Trading, a.k.a. Correlation Trading
• Query: “Which pairs of stocks were correlated with a value of
over 0.9 for the last three hours?”
XYZ and ABC have been correlated with a correlation of 0.95 for the last three hours.
Now XYZ and ABC become less correlated as XYZ goes up and ABC goes down.
They should converge back later.
I will sell XYZ and buy ABC …
Online Detection of High Correlation
• Given tens of thousands of high-speed time series data streams,
detect high-value correlations, both synchronized and
time-lagged, over sliding windows in real time.
• Real time
– high update frequency of the data stream
– fixed response time, online
[Figure: stream pairs marked "Correlated!"]
StatStream: Algorithm
• Naive algorithm
– N : number of streams
– w : size of sliding window
– space O(N) and time O(N^2 w) vs. space O(N^2) and time O(N^2).
• Suppose that the streams are updated every second.
– With a Pentium 4 PC, the exact method can monitor only 700 streams
with a delay of 2 minutes.
• Our Approach
– Discrete Fourier Transform to approximate correlation
– grid structure to filter out unlikely pairs
– Our approach can monitor 10,000 streams with a delay of 2 minutes.
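To illustrate the DFT idea, here is a minimal sketch, not the StatStream implementation. It relies on a standard identity: for series normalized to zero mean and unit norm, correlation = 1 - d^2/2, where d is their Euclidean distance. An orthonormal DFT preserves d, so a distance computed from only the first k coefficients can only underestimate d, which gives an upper bound on the correlation that can be used to discard unlikely pairs. The function names and the choice of k are assumptions made for this example.

```python
import cmath
import math

def normalize(x):
    """Shift to zero mean and scale to unit Euclidean norm (x must not be constant)."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered]

def dft_prefix(x, k):
    """Coefficients for frequencies 1..k of the orthonormal DFT of x."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)) / math.sqrt(n)
        for f in range(1, k + 1)
    ]

def correlation(x, y):
    """Exact Pearson correlation, computed on the normalized series."""
    return sum(a * b for a, b in zip(normalize(x), normalize(y)))

def correlation_upper_bound(x, y, k):
    """Upper bound on the correlation from k DFT coefficients (requires k < len(x) / 2)."""
    X = dft_prefix(normalize(x), k)
    Y = dft_prefix(normalize(y), k)
    # For real input, frequency f and n - f are conjugates, so each term counts twice.
    partial_d2 = 2 * sum(abs(a - b) ** 2 for a, b in zip(X, Y))
    return 1 - partial_d2 / 2
```

A pair whose upper bound is already below the query threshold (for example 0.9) cannot be correlated above it, so the exact computation can be skipped for that pair.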
StatStream: Stream synoptic data structure
• Three level time interval hierarchy
– Time point, Basic window, Sliding window
• Basic window (the key to our technique)
– The computation for basic window i must finish by the end of the basic
window i+1
– The basic window time is the system response time.
• Digests
– Each basic window keeps digests: sum, DFT coefs
– The sliding window keeps digests: sum, DFT coefs
[Figure: time points grouped into basic windows, and basic windows grouped into a sliding window]
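The time interval hierarchy can be sketched as follows for the sum digest only. This is an illustrative sketch under assumptions: the class name `SlidingSum` and its parameters are hypothetical, and the DFT digests are left out. Each completed basic window contributes one sum, and the sliding-window sum is updated by adding the newest basic-window digest and dropping the oldest, rather than by rescanning every time point.

```python
from collections import deque

class SlidingSum:
    """Sliding-window sum maintained from per-basic-window sum digests."""

    def __init__(self, basic_size, num_basic):
        self.basic_size = basic_size             # time points per basic window
        self.digests = deque(maxlen=num_basic)   # one sum per completed basic window
        self.current = []                        # time points of the basic window in progress
        self.window_sum = 0.0                    # sum over the completed basic windows kept

    def append(self, value):
        """Add one time point; return True when a basic window has just completed."""
        self.current.append(value)
        if len(self.current) < self.basic_size:
            return False
        digest = sum(self.current)
        if len(self.digests) == self.digests.maxlen:
            self.window_sum -= self.digests[0]   # oldest basic window leaves the sliding window
        self.digests.append(digest)
        self.window_sum += digest
        self.current = []
        return True
```

In this sketch the sliding-window digest changes only when a basic window completes, which matches the statement that the basic window time is the system response time.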
Application 2: elastic burst detection
• Discover time intervals with an unusually large number of
events.
– In astrophysics, the sky is constantly observed for high-energy particles.
When a particular astrophysical event happens, a shower of high-energy
particles arrives in addition to the background noise.
– In finance, stocks with unusually high trading volumes should attract the
notice of traders (or perhaps regulators).
• Challenge: discover the time and duration of a burst, both of which may
vary
– In astrophysics, a burst of high-energy particles associated with a special
event might last for a few milliseconds or a few hours or even a few days
NB: A similar idea may apply to spatial burst detection.
Application 2: burst detection
• example
Burst Detection: Problem Statement
• Problem: Given a time series of positive numbers x_1, x_2, ..., x_n and
a threshold function f(w), w = 1, 2, ..., n, find the subsequences of
any size whose sums are above the thresholds:
– all 0<w<n, 0<m<n-w, such that x_m + x_{m+1} + … + x_{m+w-1} > f(w)
• Brute force search: O(n^2) time
• Our shifted wavelet tree (SWT): O(n+k) time.
– k is the size of the output, i.e. the number of windows with bursts
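For reference, the brute-force search from the problem statement can be sketched as below. This is a minimal sketch: the function name is hypothetical, starts are 0-based, and it checks every window that fits in the series, which is an assumption about the boundary cases. Prefix sums make each window sum an O(1) lookup, so the total cost is O(n^2).

```python
def brute_force_bursts(x, f):
    """Return every (start, size) whose window sum exceeds f(size)."""
    n = len(x)
    prefix = [0]
    for v in x:
        prefix.append(prefix[-1] + v)   # prefix[i] = x[0] + ... + x[i-1]
    bursts = []
    for w in range(1, n + 1):           # every window size
        for m in range(n - w + 1):      # every 0-based start position
            if prefix[m + w] - prefix[m] > f(w):
                bursts.append((m, w))
    return bursts
```

For example, with `x = [1, 1, 5, 6, 1]` and `f = lambda w: 5 * w`, the call returns `[(3, 1), (2, 2)]`: the single value 6 and the pair 5, 6.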
Burst Detection: Data Structure and Algorithm
– Lemma 1: any subsequence s is included in some window w of the SWT.
– Lemma 2: if Sum(s) > threshold, then Sum(w) > threshold, because all values
are positive (no false negatives).
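The two lemmas suggest a filter-then-verify scheme, sketched below under stated assumptions. Level i uses windows of size 2^i that start every 2^(i-1) points, so any subsequence of size at most 2^(i-1) + 1 fits inside one of them. The sketch assumes non-negative values and a threshold f that does not decrease with w. The function name is hypothetical, and the detailed search inside an alarming window is a plain scan, so this sketch does not claim the O(n+k) bound.

```python
def swt_bursts(x, f):
    """Find all (start, size) with window sum > f(size), using overlapping windows as a filter."""
    n = len(x)
    prefix = [0]
    for v in x:
        prefix.append(prefix[-1] + v)
    found = set()
    lo = 1       # smallest window size not yet handled by a lower level
    level = 1
    while lo <= n:
        size, step = 2 ** level, 2 ** (level - 1)
        hi = min(step + 1, n)             # largest size guaranteed to fit in one window here
        for s in range(0, n, step):       # half-overlapping windows of this level
            e = min(s + size, n)          # windows at the end of the series are truncated
            if prefix[e] - prefix[s] > f(lo):          # alarm: a burst may lie inside
                for w in range(lo, hi + 1):            # detailed search, only on alarm
                    for m in range(s, e - w + 1):
                        if prefix[m + w] - prefix[m] > f(w):
                            found.add((m, w))
        lo = hi + 1
        level += 1
    return sorted(found)
```

A window whose sum stays at or below f(lo) cannot contain a burst of any size handled at that level, so it is skipped without further work; the set removes duplicates reported by overlapping windows.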