Elastic Burst Detection: Applications
• Discovering intervals with an unusually large number of events.
  – In astrophysics, the sky is constantly observed for high-energy particles. When a particular astrophysical event happens, a shower of high-energy particles arrives in addition to the background noise.
  – In finance, stocks with unusually high trading volumes should attract the notice of traders (or perhaps regulators).
• Challenge: to discover not only the time of the burst but also its duration, which may vary widely.
  – In astrophysics, a burst of high-energy particles associated with a special event might last for a few milliseconds, a few hours, or even a few days.

Burst Detection: Problem Statement
• Problem: given a time series of positive numbers $x_1, x_2, \ldots, x_n$ and a threshold function $f(w)$, $w = 1, 2, \ldots, n$, find the subsequences of any size whose sums are above the thresholds:
  – all $1 \le w \le n$ and $1 \le m \le n - w + 1$ such that $x_m + x_{m+1} + \cdots + x_{m+w-1} > f(w)$
• Brute-force search: O(n^2) time
• Our shifted wavelet tree (SWT): O(n + k) time
  – k is the size of the output, i.e., the number of windows with bursts

Burst Detection: Data Structure and Algorithm
• Lemma 1: any subsequence s is included in some window w of the SWT.
• Lemma 2: if Sum(s) > threshold, then Sum(w) > threshold (no false negatives): since all values are positive and s lies inside w, Sum(w) ≥ Sum(s).

StatStream: Motivation
• Stock price streams
  – The New York Stock Exchange (NYSE)
  – 50,000 securities (streams); 100,000 ticks (trades and quotes)
• Pairs trading, a.k.a. correlation trading
• Query: “Which pairs of stocks were correlated with a value of over 0.9 for the last three hours?”
“XYZ and ABC have been correlated with a correlation of 0.95 for the last three hours.”
“Now XYZ and ABC are becoming less correlated as XYZ goes up and ABC goes down.”
“They should converge back later.”
“I will sell XYZ and buy ABC …”

StatStream: Goal
• Given tens of thousands of high-speed time series data streams, detect high-value correlations, both synchronized and time-lagged, over sliding windows in real time.
• Real time
  – high update frequency of the data streams
  – fixed response time, online

StatStream: Algorithm
• Naive algorithm
  – N: number of streams
  – w: size of the sliding window
  – space O(N) and time O(N^2 w) vs. space O(N^2) and time O(N^2)
• Suppose that the streams are updated every second.
  – With a Pentium 4 PC, the exact computing method can monitor only 700 streams with a delay of 2 minutes.
• Our approach
  – Using the Discrete Fourier Transform (DFT) to approximate correlation
  – Using a grid structure to filter out unlikely pairs
  – Our approach can monitor 10,000 streams with a delay of 2 minutes.

StatStream: Stream synoptic data structure
• Three-level time interval hierarchy
  – Time point, basic window, sliding window
• Basic window (the key to our technique)
  – The computation for basic window i must finish by the end of basic window i+1.
  – The basic window time is the system response time.
• Digests
  (Figure: time points are grouped into basic windows, which make up a sliding window; each basic window and the sliding window keep digests: sum and DFT coefficients.)
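For the elastic burst detection problem stated earlier, the burst condition and the two lemmas can be checked with a small Python sketch. It is not the SWT implementation from the slides: the function names are hypothetical, start indices are 0-based, and the window layout is an assumption modeled on the lemmas (at the level with shift s = 1, 2, 4, and so on, windows of size 2s start every s points, so each level contains every subsequence of length up to s + 1). Each subsequence length is assigned to the lowest level that contains it, a level's windows are compared against the smallest threshold among its assigned lengths, and only flagged windows are searched. Because that search is brute force, this sketch does not achieve the O(n + k) bound; it only shows that the filter misses no burst when all values are positive.

```python
from itertools import accumulate


def brute_force_bursts(x, f):
    """Return every (m, w) with x[m] + ... + x[m + w - 1] > f(w).

    m is a 0-based start index. Prefix sums make each window sum O(1),
    so the double loop over (w, m) costs O(n^2).
    """
    n = len(x)
    prefix = [0, *accumulate(x)]
    return {
        (m, w)
        for w in range(1, n + 1)
        for m in range(n - w + 1)
        if prefix[m + w] - prefix[m] > f(w)
    }


def swt_bursts(x, f):
    """Same output as brute_force_bursts, using a shifted-window filter.

    Assumes every x[i] > 0. At the level with shift s, windows of size 2*s
    start at 0, s, 2s, ...; any subsequence of length <= s + 1 lies inside
    one of them (Lemma 1). Since values are positive, a window's sum is at
    least the sum of any subsequence inside it (Lemma 2), so comparing the
    window with the smallest threshold of its assigned lengths misses no burst.
    """
    n = len(x)
    prefix = [0, *accumulate(x)]
    found = set()
    lo, s = 1, 1  # smallest unassigned length; shift of the current level
    while lo <= n:
        hi = min(s + 1, n)  # longest length this level is guaranteed to contain
        sizes = range(lo, hi + 1)
        level_threshold = min(f(w) for w in sizes)
        for start in range(0, n, s):
            end = min(start + 2 * s, n)  # truncate the last windows at the series end
            if prefix[end] - prefix[start] > level_threshold:
                # Flagged window: check its subsequences of the assigned lengths.
                for w in sizes:
                    for m in range(start, end - w + 1):
                        if prefix[m + w] - prefix[m] > f(w):
                            found.add((m, w))
        lo, s = hi + 1, 2 * s
    return found
```

For example, with x = [1, 5, 1] and a constant threshold f(w) = 4, both functions return {(1, 1), (0, 2), (1, 2), (0, 3)}. Taking the minimum threshold per level means f need not be monotonic for the filter to stay complete.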
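The DFT-plus-grid filter from the StatStream algorithm slide can be sketched in Python with NumPy. This sketch covers only synchronized (not time-lagged) correlation and assumes that each stream's current sliding window is an array of length n, that no window is constant, that the threshold is below 1, and that k < n/2; the function names and the default k = 2 are hypothetical. It relies on two standard facts: after normalizing a window to zero mean and unit norm, the Pearson correlation of two windows equals 1 - d^2/2, where d is their Euclidean distance; and with the orthonormal DFT (Parseval), the distance computed from the first k non-DC coefficients, counted twice for their conjugate twins, is a lower bound on d. A pair can exceed the threshold only if this approximate distance is below eps = sqrt(2(1 - threshold)), and such pairs fall in the same or adjacent grid cells of side eps. Candidates are then verified with the exact correlation, so the filter may pass false positives but produces no false negatives.

```python
import itertools

import numpy as np


def normalize(window):
    """Zero-mean, unit-norm copy of a window (assumes it is not constant)."""
    centered = np.asarray(window, dtype=float) - np.mean(window)
    return centered / np.linalg.norm(centered)


def dft_features(window, k):
    """First k non-DC coefficients of the orthonormal DFT as a real vector.

    Coefficient f of a real signal has a conjugate twin at n - f, so the
    factor sqrt(2) counts both; the distance between two feature vectors is
    then a lower bound on the distance between the normalized windows.
    """
    coeffs = np.fft.fft(normalize(window), norm="ortho")[1:k + 1]
    return np.sqrt(2) * np.concatenate([coeffs.real, coeffs.imag])


def correlated_pairs(windows, threshold, k=2):
    """Pairs (a, b), a < b, whose Pearson correlation exceeds threshold.

    windows maps a stream name to its current sliding window (equal length
    n, with k < n / 2); threshold must be < 1. For normalized windows,
    corr = 1 - d^2 / 2, so corr > threshold exactly when d < eps.
    """
    eps = np.sqrt(2 * (1 - threshold))
    feats = {name: dft_features(w, k) for name, w in windows.items()}

    # Grid of cell side eps: close feature vectors land in the same or adjacent cells.
    grid = {}
    for name, vec in feats.items():
        cell = tuple(int(c) for c in np.floor(vec / eps))
        grid.setdefault(cell, []).append(name)

    result = set()
    offsets = list(itertools.product((-1, 0, 1), repeat=2 * k))
    for cell, names in grid.items():
        for off in offsets:
            neighbor = tuple(c + o for c, o in zip(cell, off))
            for a in names:
                for b in grid.get(neighbor, []):
                    # Cheap lower-bound test first, then the exact correlation.
                    if a < b and np.linalg.norm(feats[a] - feats[b]) < eps:
                        corr = float(np.dot(normalize(windows[a]), normalize(windows[b])))
                        if corr > threshold:
                            result.add((a, b))
    return result
```

The grid visits 3^(2k) neighboring cells for each occupied cell, which is one reason to keep only a few DFT coefficients.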
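The basic-window digests can be illustrated with a Python sketch that keeps the sum and the first k DFT coefficients of a sliding window made of b basic windows, touching only the newest and the oldest basic window on each update. The class name and the exact form of the per-basic-window digest are assumptions, not StatStream's published formulas: here a basic window's digest is its sum plus the partial DFT sum D_f = (1/sqrt(n)) Σ_i x_i exp(-2πj f i/n) over its bw points, where n = b·bw is the sliding-window length. Sliding by one basic window then gives X_f' = exp(2πj f bw/n) (X_f - D_f(oldest) + D_f(newest)), so each update costs O(k·bw) time, the kind of bounded per-basic-window work the response-time requirement calls for. The coefficients describe the raw window; the normalization needed for correlation is not shown.

```python
from collections import deque

import numpy as np


class SlidingDigest:
    """Sum and first k DFT coefficients of the last n = b * bw points.

    Updated once per basic window of bw points; the per-basic-window digest
    (sum, partial DFT sum) is an illustrative choice, not StatStream's own.
    """

    def __init__(self, bw, b, k):
        self.bw, self.n = bw, b * bw
        freqs = np.arange(k)
        # kernel[f, i] = exp(-2*pi*j*f*i/n): DFT terms for a block's local indices.
        self.kernel = np.exp(-2j * np.pi * np.outer(freqs, np.arange(bw)) / self.n)
        # Phase that re-anchors the coefficients after sliding by bw points.
        self.shift = np.exp(2j * np.pi * freqs * bw / self.n)
        # Start from an all-zero window made of b zero digests.
        self.blocks = deque([(0.0, np.zeros(k, dtype=complex))] * b)
        self.total = 0.0
        self.coefs = np.zeros(k, dtype=complex)

    def push(self, block):
        """Digest one basic window and slide the sliding window by bw points."""
        block = np.asarray(block, dtype=float)
        if block.shape != (self.bw,):
            raise ValueError("a basic window must contain exactly bw points")
        new_sum = block.sum()
        new_dft = self.kernel @ block / np.sqrt(self.n)
        old_sum, old_dft = self.blocks.popleft()
        self.blocks.append((new_sum, new_dft))
        self.total += new_sum - old_sum
        # X_f' = exp(2*pi*j*f*bw/n) * (X_f - D_f(oldest) + D_f(newest))
        self.coefs = self.shift * (self.coefs - old_dft + new_dft)
```

Until b basic windows have arrived, the digest describes a sliding window padded with leading zeros; after that, `total` and `coefs` match the sum and the first k orthonormal DFT coefficients of the latest n points.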