Change Detection in Data Streams
by Testing Exchangeability
Shen-Shyang Ho
JPL/Caltech
The research is part of the author’s PhD dissertation (in computer science) at George Mason University
Conference travel is partially sponsored by NASA Postdoctoral Program (NPP) Travel Grant.
5/22/2017
1
Outline
1. Introduction
2. Previous Work (Statistics and Machine Learning/Data Mining/Computer Vision)
3. Intuition
4. Background (Exchangeability/Martingale)
5. Methodology
6. Comparison and Experimental Results
7. Application I: Adaptive Support Vector Machine (Classification Model)
8. Application II: Video Shot Change Detection (Cluster Model)
Introduction
Let $X_1, X_2, \ldots, X_n$ be a sequence of independent $p$-dimensional
random vectors with parameters $\theta_1, \theta_2, \ldots, \theta_n$. Test the following
hypotheses:
$H_0: \theta_1 = \theta_2 = \cdots = \theta_n = \theta_0$
$H_1: \exists\, m$ with $\theta_1 = \cdots = \theta_m \neq \theta_{m+1} = \cdots = \theta_n$
Assumption: Data vectors are observed sequentially.
Previous Work
Statistics: Sequential analysis is statistical inference under the
assumption that the number of observations/samples required is not
predetermined.
• Sequential Probability Ratio Test – A. Wald (1945)
• Application: Quality Control (Military/Manufacturing)
• CUSUM (Cumulative Sum) – E. S. Page (1954)
• See the journal “Sequential Analysis: Design Methods and Applications”
for recent research.
• In the most recent issue (vol. 27, no. 2, 2008), 3 of the 6 papers address
change detection: structural change, a minimax method for change-point
detection problems, and multidecision quickest change-point detection.
Machine Learning/Data Mining:
• Applications: Concept Drift Problem, Adaptive classifier, Anomaly in
Internet Traffic, Video-shot change detection
• Proposed methodology is usually problem-specific
• Monitoring error, sliding window, weighted data, ensemble classifier
…
• Statistical method: Likelihood ratio method, Bayesian methods,
Hypothesis Testing …
Related Data Mining/Machine
Learning/Computer Vision Research
1. Xiuyao Song, Mingxi Wu, Christopher M. Jermaine, Sanjay Ranka: Statistical
change detection for multi-dimensional data. KDD 2007: 667-676
2. Kolter, J.Z. and Maloof, M.A. Dynamic Weighted Majority: An ensemble
method for drifting concepts. Journal of Machine Learning Research
8:2755--2790, 2007.
3. Klinkenberg, Ralf and Joachims, Thorsten: Detecting Concept Drift with
Support Vector Machines. Proceedings of the Seventeenth International
Conference on Machine Learning (ICML): 487--494, 2000.
4. Bi Song, Namrata Vaswani, Amit K. Roy Chowdhury: Closed-Loop Tracking
and Change Detection in Multi-Activity Sequences. CVPR 2007
5. Paul L. Rosin: Thresholding for Change Detection. ICCV 1998: 274-279
6. Balachander Krishnamurthy, Subhabrata Sen, Yin Zhang, Yan Chen: Sketch-based
change detection: methods, evaluation, and applications. Internet
Measurement Conference 2003: 234-247
7. Tsuyoshi Idé, Keisuke Inoue: Knowledge Discovery from Heterogeneous
Dynamic Systems using Change-Point Correlations. SDM 2005
8. Tsuyoshi Idé, Koji Tsuda: Change-Point Detection using Krylov Subspace
Learning. SDM 2007
9. Daniel Kifer, Shai Ben-David, Johannes Gehrke, Detecting Changes in Data
Streams, Proc. 30th VLDB Conference, 2004.
10. ... …
Motivation
“Lack of Exchangeability” implies
“Change in Data Distribution/Model”
Intuition
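[Figure: two panels, each showing a sequence of ten points labeled 1 to 10 in its original order and in a reordered form (1, 9, 3, 5, 2, 6, 7, 4, 8, 10 in the first panel; 1, 9, 3, 5, 2, 6, 7, 2, 8, 10 in the second). Label: identically distributed but may be dependent.]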
Background
Vovk et al.'s work on “Testing Exchangeability Online” (ICML 2003)
and “Algorithmic Learning in a Random World” (Springer):
1. Testing the exchangeability assumption in an online mode.
2. An explicit martingale for testing the hypothesis of exchangeability.
(Refer to http://www.vovk.net (conformal prediction))
Background
Let $\{Z_i : 1 \le i < \infty\}$ be a sequence of random variables. A finite sequence
of random variables $Z_1, \ldots, Z_n$ is exchangeable if the joint distribution
$p(Z_1, \ldots, Z_n)$ is invariant under any permutation of the indices of the
random variables.
A martingale is a sequence of random variables $\{M_i : 0 \le i < \infty\}$ such
that $M_n$ is a measurable function of $Z_1, \ldots, Z_n$ for all $n = 0, 1, \ldots$ (in
particular, $M_0$ is a constant value) and the conditional expectation of $M_{n+1}$
given $Z_1, \ldots, Z_n$ is equal to $M_n$, i.e., $E(M_{n+1} \mid Z_1, \ldots, Z_n) = M_n$.
Background
(Doob's Maximal Inequality)
Suppose that $\{M_i : 0 \le i < \infty\}$ is a nonnegative martingale.
Then for any $\lambda > 0$ and $n < \infty$,
$\lambda \, P\left(\max_{0 \le k \le n} M_k \ge \lambda\right) \le E(M_n).$
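A worked consequence of Doob's maximal inequality, as a sketch. It assumes the martingale starts at $M_0 = 1$, so that $E(M_n) = M_0 = 1$ when the data are exchangeable; that starting value is an assumption made here for illustration, as is the example threshold.

```latex
\[
P\Bigl(\max_{0 \le k \le n} M_k \ge \lambda\Bigr)
  \;\le\; \frac{E(M_n)}{\lambda} \;=\; \frac{1}{\lambda},
\qquad \text{e.g. } \lambda = 20 \;\Rightarrow\;
P\Bigl(\max_{0 \le k \le n} M_k \ge 20\Bigr) \le 0.05 .
\]
```

Read this way, the threshold $\lambda$ bounds the probability that the martingale ever reaches $\lambda$ while no change has occurred.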
Methodology - Strangeness
Strangeness measures how well a data point is represented by a data
model compared with the other points; it is computed for each data point
seen so far.
• Applicable to classification, regression, or cluster
models
• Measures diversity/disagreement: the higher the
strangeness of a point, the less likely it is to come from
the model
Condition for a valid strangeness measure: the
strangeness value of a data point at a
particular time instance should be
independent of the order in which it is observed with
respect to the other data points.
Classification Model
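Strangeness (K-NN):
$\alpha_i = \dfrac{\sum_{j=1}^{k} d_{ij}^{y}}{\sum_{j=1}^{k} d_{ij}^{-y}}$
where $d_{ij}^{y}$ is the $j$th shortest distance from point $i$ to points with the same label $y$, and $d_{ij}^{-y}$ is the $j$th shortest distance from point $i$ to points with a different label.
[Figure: data stream over time t with three segments: A (t = 1 to 1000), B (t = 1001 to 2000), C (t = 2001 to 3000), i.e. aaaaa…aaaaabbbbbb…bbbbbccccc…cccccc]
Strangeness (SVM): Lagrange Multiplier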
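A minimal sketch of the K-NN strangeness, assuming Euclidean distance and labeled points stored as tuples. The function name `knn_strangeness` and its arguments are illustrative, not taken from the slides. It divides the sum of the k smallest distances to same-label points by the sum of the k smallest distances to other-label points.

```python
import math


def knn_strangeness(points, labels, i, k=3):
    """K-NN strangeness of point i (hypothetical helper, Euclidean distance assumed)."""
    d_same, d_other = [], []
    for j, (p, lab) in enumerate(zip(points, labels)):
        if j == i:
            continue  # a point is not its own neighbor
        d = math.dist(points[i], p)
        if lab == labels[i]:
            d_same.append(d)
        else:
            d_other.append(d)
    # Sum of the k smallest same-label distances over the k smallest other-label distances.
    return sum(sorted(d_same)[:k]) / sum(sorted(d_other)[:k])


# Two well-separated classes on a line.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]
print(knn_strangeness(points, labels, 0, k=1))
```

A point that sits close to its own class and far from the other class gets a small value; a point surrounded by the other class gets a large one.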
Cluster Model
Strangeness of a data vector $z_i$ in a cluster:
$\alpha_i = \lVert z_i - C \rVert$
where $C$ is the center of the cluster.
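A minimal sketch of the cluster strangeness. It assumes that the cluster center is the coordinate-wise mean of the points seen so far and that the norm is Euclidean; the slide only says "center of the cluster", so both choices are assumptions, and `cluster_strangeness` is an illustrative name.

```python
import math


def cluster_strangeness(points):
    """Distance of every point to the cluster center (mean assumed as the center)."""
    n = len(points)
    dim = len(points[0])
    center = [sum(p[d] for p in points) / n for d in range(dim)]
    return [math.dist(p, center) for p in points]


# Three points whose mean is (1, 1).
values = cluster_strangeness([(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)])
print(values)
```

Because the center is recomputed from all points, the resulting values do not depend on the order in which the points arrived, which matches the validity condition for a strangeness measure.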
Regression Model
$\alpha_i(x_i, y_i) = \dfrac{|y_i - f(x_i)|}{\exp(g(x_i))}$
where $f$ is the regression function and $g$ is the error estimation function for
$f$ at $x_i$.
(Papadopoulos et al., Inductive Confidence Machines for Regression, ECML, LNAI 2430, pp 345-356, 2002)
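A minimal sketch of the regression strangeness, the absolute residual divided by exp(g(x)). The callables `f` and `g` below are hypothetical placeholders standing in for a fitted regression function and a fitted error estimator; they are not the models used in the experiments.

```python
import math


def regression_strangeness(x, y, f, g):
    """|y - f(x)| / exp(g(x)) for a regression function f and error estimator g."""
    return abs(y - f(x)) / math.exp(g(x))


# Hypothetical placeholder models, for illustration only.
f = lambda x: 2.0 * x   # stand-in regression function
g = lambda x: 0.0       # stand-in error estimator (exp(0) = 1, so no rescaling)

print(regression_strangeness(1.0, 5.0, f, g))
```

A larger predicted error g(x) shrinks the strangeness, so a big residual counts for less where the model is expected to be inaccurate.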
Methodology
p-value of a new point $x_{n+1}$ given the previously seen data points:
$PV(\{x_1, x_2, \ldots, x_{n+1}\}, \theta_{n+1}) = \dfrac{\#\{i : \alpha_i > \alpha_{n+1}\} + \theta_{n+1}\, \#\{i : \alpha_i = \alpha_{n+1}\}}{n+1}$ (B)
where $\alpha_i$ is the strangeness measure for $x_i$, $i = 1, 2, \ldots, n+1$,
and $\theta_{n+1}$ is a number randomly chosen from $[0,1]$ for each new point $x_{n+1}$.
$\theta_{n+1}$ is necessary so that the sequence of p-values is uniformly distributed in $[0,1]$
for any strangeness measure (Vovk, 2003).
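A minimal sketch of the randomized p-value. It takes the strangeness values of all points seen so far, with the new point last. The name `randomized_p_value` and the optional `theta` argument (useful for reproducible checks) are illustrative assumptions.

```python
import random


def randomized_p_value(strangeness, theta=None):
    """Randomized p-value of the last entry of `strangeness` (the newest point)."""
    a_new = strangeness[-1]
    if theta is None:
        theta = random.random()  # random tie-breaking number in [0, 1)
    greater = sum(1 for a in strangeness if a > a_new)
    equal = sum(1 for a in strangeness if a == a_new)  # counts the new point itself
    return (greater + theta * equal) / len(strangeness)


# The newest point is the strangest of four: small p-value.
print(randomized_p_value([1.0, 2.0, 3.0, 4.0], theta=0.5))
# The newest point is the least strange of four: large p-value.
print(randomized_p_value([4.0, 3.0, 2.0, 1.0], theta=0.5))
```

A small p-value means the new point is stranger than most of the points seen before it.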
Methodology
Consider the null hypothesis
$H_0$: no change in the data stream
against the alternative hypothesis
$H_1$: a change occurs in the data stream
The test for change continues as long as
$0 < M_i < \lambda$
One rejects the null hypothesis
when $M_i \ge \lambda$
Methodology
$M_n^{(\varepsilon)} = \prod_{i=1}^{n} \left(\varepsilon\, p_i^{\varepsilon - 1}\right)$
where $\varepsilon$ is a fixed positive number in $(0,1)$
and $p_i$, $i = 1, \ldots, n$, are the p-values at times $1, \ldots, n$
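A minimal sketch of the martingale test on a sequence of p-values. It multiplies the factors epsilon * p_i^(epsilon - 1) and raises an alarm the first time the product reaches a threshold. The function name, the default values of `epsilon` and `lam`, and the log-space accumulation are illustrative choices, not settings taken from the slides.

```python
import math


def power_martingale_detect(p_values, epsilon=0.5, lam=20.0):
    """Return (time of first alarm or None, martingale values up to that time)."""
    log_m = 0.0                       # log of M_0 = 1 (empty product)
    log_lam = math.log(lam)
    history = []
    for n, p in enumerate(p_values, start=1):
        p = max(p, 1e-12)             # guard against log(0)
        # Multiply by epsilon * p^(epsilon - 1), accumulated in log space.
        log_m += math.log(epsilon) + (epsilon - 1.0) * math.log(p)
        history.append(math.exp(log_m))
        if log_m >= log_lam:          # reject the null hypothesis: change detected
            return n, history
    return None, history


# Consistently small p-values make the martingale grow quickly.
alarm, values = power_martingale_detect([0.01] * 5, epsilon=0.5, lam=20.0)
print(alarm, values)
```

Working in log space avoids overflow and underflow on long streams. Small p-values inflate the product, while p-values near 1 shrink it.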
Experimental Result – Performance Measure
Experimental Result – Varying
Experimental Result – Varying Strangeness
Experimental Result – Varying
[Figure panels: Linearly Separable Classification Model; Linearly Non-separable Classification Model]
Experimental Result
[Figure panels: Ringnorm/Twonorm (change in dataset every 1000 points); Nursery Categorical Dataset (change in class compositions every 1000 points)]
Experimental Result – Different Methods
Changes at time  200 from Y1  3x1  x22  n1 to
Y2  10 x1  222  n2 where n1 and n2 are
Gaussian noise.
Application: Adaptive SVM
Simulated USPS 3-Digit Image Data Stream
01120120…0340033404…156556115…77789987…
Application: Adaptive SVM
A (blue): True change point known to the SVM
B (red): Adaptive SVM using the martingale method
C (magenta): SVM using a sliding window of size 250
D (black): SVM using a sliding window of size 500
E (green): SVM using a sliding window of size 1000
Application: Video-Shot Change Detection
Martingale change detection using multiple features (MVMT: multiple-view martingale test)
Application: Video-Shot Change Detection
• Histogram Intersection (HI)
• Chi-Square Measure
• Euclidean Distance (ED)
Reference
1. S.-S. Ho and H. Wechsler, Detecting Change-Points in Unlabeled
Data Streams using Martingale, Proc. 20th Int. Joint Conf. on Artificial
Intelligence (IJCAI 2007), Hyderabad, India, Jan. 6 - 12, 2007.
2. S.-S. Ho, A Martingale Framework for Concept Change Detection in
Time-Varying Data Streams, Proc. Int. Conf. on Machine Learning
(ICML 2005), Bonn, Germany, Aug. 7 - 11, 2005.
3. S.-S. Ho and H. Wechsler, Adaptive Support Vector Machine for
Time-Varying Data Streams Using the Martingale, Proc. Int. Joint
Conf. on Artificial Intelligence (IJCAI 2005), Edinburgh, Scotland,
July 30 - Aug. 5, 2005.
4. S.-S. Ho and H. Wechsler, On the Detection of Concept Change in
Time-Varying Data Streams by Testing Exchangeability, Proc.
Conference on Uncertainty in Artificial Intelligence (UAI 2005),
Edinburgh, Scotland, July 26 - 29, 2005.
5. http://shenshyang.googlepages.com/codes (MATLAB code +
datasets)
Acknowledgement
• Harry Wechsler, PhD Advisor (George Mason University)
• Volodya Vovk (Royal Holloway, University of London)
• Alexander Gammerman (Royal Holloway, University of London)
• Oak Ridge Associated University (ORAU)