Data Stream Mining with
Extensible Markov Model
Yu Meng, Margaret H. Dunham, F. Marco Marchetti,
Jie Huang, Charlie Isaksson
October 18, 2006
Outline
- Data Stream Mining
- EMM Framework
- EMM Applications
- Future Work
- Conclusions
Data Mining
Data mining is the process of automatically searching large volumes of data for nontrivial, hidden, previously unknown, and potentially useful information (interrelations of data).
- Also called Knowledge Discovery in Databases (KDD) or Knowledge Discovery and Data Mining.
- Classification (e.g., Yahoo news, finance)
- Clustering (e.g., types of customers in online purchasing)
- Association (e.g., Market Basket Analysis)
Classification
Given a collection of records (the training set):
- Each record contains a set of attributes; one of the attributes is the class.
- Find a model for the class attribute as a function of the values of the other attributes.
Goal: previously unseen records should be assigned a class as accurately as possible.
- A test set is used to determine the accuracy of the model. Usually the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
Common models: decision tree, neural network, naïve Bayes, etc.
Classification is a supervised learning process.
Illustrating Classification Task

Training Set:
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set:
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

Workflow: Training Set -> Induction (learning algorithm) -> Learn Model -> Model -> Apply Model -> Deduction -> Test Set.
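The induction/deduction loop above can be sketched with any classifier. The following is a hypothetical illustration (not from the slides) using a hand-rolled 1-nearest-neighbor model on the training table; the attribute encodings and distance function are arbitrary choices made for this sketch.

```python
# Hypothetical sketch: 1-NN classification of the test records above.
SIZE = {"Small": 0, "Medium": 1, "Large": 2}

train = [  # (Attrib1, Attrib2, Attrib3 in K, class) from the training set
    ("Yes", "Large", 125, "No"), ("No", "Medium", 100, "No"),
    ("No", "Small", 70, "No"),   ("Yes", "Medium", 120, "No"),
    ("No", "Large", 95, "Yes"),  ("No", "Medium", 60, "No"),
    ("Yes", "Large", 220, "No"), ("No", "Small", 85, "Yes"),
    ("No", "Medium", 75, "No"),  ("No", "Small", 90, "Yes"),
]

def dist(a, b):
    """Mixed-attribute distance: mismatch penalty plus scaled numeric gap."""
    return ((a[0] != b[0]) + abs(SIZE[a[1]] - SIZE[b[1]]) / 2
            + abs(a[2] - b[2]) / 100)

def classify(record):
    """Deduction step: assign the class of the nearest training record."""
    return min(train, key=lambda t: dist(record, t))[3]

print(classify(("No", "Small", 55)))   # Tid 11 -> No (nearest: Tid 3)
```

With this distance, test record 11 lands nearest training record 3 and inherits its class; a real deployment would of course pick the model family and distance by validation on the test set.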
Clustering
Finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups.
Clustering is an unsupervised learning process.
[Figure: example clusters — intra-cluster distances are minimized, inter-cluster distances are maximized]
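As a minimal sketch of the idea, here is Lloyd's k-means algorithm in plain Python (one common clustering method; the slides do not prescribe it). The 1-D points and starting centers are made-up illustrative inputs.

```python
# Minimal k-means sketch: alternate assignment and center update.
def kmeans(points, centers, iters=10):
    """Assign each point to its nearest center, then move each center
    to the mean of its assigned points; repeat for `iters` rounds."""
    for _ in range(iters):
        groups = {i: [] for i in range(len(centers))}
        for p in points:
            i = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[i].append(p)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in groups.items()]
    return centers, groups

# Two obvious 1-D groups: values near 1 and values near 10.
centers, groups = kmeans([1.0, 1.2, 0.8, 9.8, 10.0, 10.2], [0.0, 5.0])
print(centers)  # converges close to [1.0, 10.0]
```

The converged centers sit at the group means, i.e. intra-cluster distances are minimized, which is exactly the objective described above.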
Association Rule Mining
Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction.

Market-Basket transactions:
TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Example association rules:
{Diaper} -> {Beer}
{Milk, Bread} -> {Eggs, Coke}
{Beer, Bread} -> {Milk}

Implication means co-occurrence, not causality!
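Rules like {Diaper} -> {Beer} are scored by support and confidence. A small sketch over the five transactions above, using the standard definitions:

```python
# Support and confidence for a candidate rule over the basket table above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """support(lhs ∪ rhs) / support(lhs): how often rhs co-occurs with lhs."""
    return support(lhs | rhs) / support(lhs)

print(support({"Diaper", "Beer"}))       # 3 of 5 baskets -> 0.6
print(confidence({"Diaper"}, {"Beer"}))  # 3 of the 4 Diaper baskets (≈ 0.75)
```

So {Diaper} -> {Beer} has support 0.6 and confidence 0.75 in this table — co-occurrence statistics only, with no causal claim.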
Why Data Stream Mining?
A growing number of applications generate streams of data:
- Computer network monitoring data (IEPM-BW 2004, Abilene 2005)
- Call detail records in telecommunications (Cisco VoIP data 2003)
- Highway transportation traffic data (MnDot 2005)
- Online web purchase log records (JcPenny data 2003)
- Sensor network data (Ouse, Serwent 2002)
- Stock exchange, transactions in retail chains, ATM operations in banks, credit card transactions
What Do We See in Data Streams?
Characteristics of data streams:
- Records may arrive at a rapid rate
- High volume (possibly infinite) of continuous data
- Concept drift: the data distribution changes on the fly
- Data are raw
- Multidimensional
- Spatiality and temporality
What Do We See in Data Streams?
Requirements:
- Highly efficient computation and processing of the input streams in terms of both time and space: soft real-time and scalability.
- "Seek needles in a haystack": rare event detection.
(Haixun Wang, Jian Pei, Philip S. Yu, ICDE 2005; Keogh, ICDM'04)
What Do We See in Data Streams?
Stream processing restrictions:
- Single pass: each record is examined at most once
- Bounded storage: limited memory may be used
- Real time: per-record processing time must be low
- Incremental responses to queries

Our solution:
- Data modeling (global synopsis)
- Mining of local patterns based on the synopsis
- Incremental, scalable algorithms
Extensible Markov Model
Goal: develop a new data mining framework to model spatiotemporal data streams and mine interesting local patterns.
Assumptions about the data:
- Data are collected in discrete time intervals
- Data are in a structured format, <a1, a2, ...>
- Data are multidimensional
- Data hold an approximation of the Markov property
Extensible Markov Model
Capabilities of the technique:
- Soft real-time processing (incremental)
- Global modeling capability (scalable synopsis)
- Local pattern finding capability (mining performed on the synopsis)
- Adaptive to concept changes
- Rare event detection
Outline
- Introduction
- EMM Framework
- EMM Applications
- Future Work
- Conclusions
EMM: An Overview
Motivation for EMM:
- A Markov process is a random process satisfying the Markov property; a Markov chain is a Markov process with discrete states.
- Clustering determines representative granules in the data space.
- From a static Markov chain to a dynamic Markov chain.
- Each cluster is mapped to a state in the Markov chain.
What is EMM: a data mining framework that models spatiotemporal data streams and is employed for local pattern detection.
EMM models a data stream by interleaving a clustering algorithm with a dynamic Markov chain.
EMM applies a series of efficient algorithms to mine interesting patterns from the modeled data (the synopsis).
EMM Overview
EMM clustering algorithms (EMM can use any clustering algorithm):
- Nearest neighbor: O(m)
- Hierarchical clustering: O(log m)
EMM building algorithms, O(1):
- EMMIncrement, EMMDecrement, EMMMerge, EMMSplit
EMM application algorithms, O(1):
- Prediction, anomaly detection, risk assessment, emerging event finding
Selection among these algorithms depends solely on hypotheses about the data profiles.
EMM performs learning incrementally and is able to perform application computations simultaneously.
EMM Components and Workflow
Data stream -> Online Preprocessing (guided by hypotheses) -> EMM Modeling -> EMM Pattern Finding (driven by queries) -> Output
- Flexibility
- Modularization
- It models while executing applications
EMM – A Walk Through

Input records (7 attributes each) and the states they are clustered to:
Input -> State   1      2      3      4      5      6      7
1 -> N1          18.63  10.97  3.179  3.803  1.239  0.718  0.137
2 -> N1          17.6   10.81  2.989  3.741  1.497  0.661  0.135
3 -> N2          16     9.503  2.685  3.432  1.169  0.594  0.125
4 -> N3          14.62  8.966  2.561  3.296  1.01   0.56   0.116
5 -> N3          14.62  8.32   2.409  3.107  0.915  0.512  0.114
6 -> N1          18.73  10.37  3.19   3.83   1.39   1.18   0.13

EMM building, step by step (CNi is the cardinality of state Ni; CLij is the count of transition Ni -> Nj):
- Input 1: create state N1 (CN1=1).
- Input 2: maps to N1 again (CN1=2), adding the self-transition CL11=1.
- Input 3: create state N2 (CN2=1) and the transition CL12=1.
- Input 4: create state N3 (CN3=1) and the transition CL23=1.
- Input 5: maps to N3 again (CN3=2), adding the self-transition CL33=1.
- Input 6: maps back to N1, extending the chain.

At this point the synopsis holds states N1 (CN1=2), N2 (CN2=1), N3 (CN3=2) and links CL11=1, CL12=1, CL23=1, CL33=1, built entirely incrementally as the inputs arrived.
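The counter updates in the walk-through can be sketched in a few lines. This is a minimal sketch, not the published EMM algorithms: the state assignments (which EMM obtains from its clustering step) are taken as given from the table, and only the incremental CN/CL bookkeeping is shown.

```python
# Sketch of EMMIncrement-style bookkeeping for the walk-through inputs.
from collections import Counter

assignments = ["N1", "N1", "N2", "N3", "N3", "N1"]  # inputs 1..6 from the table

CN = Counter()   # CN[s]: how many inputs were absorbed by state s
CL = Counter()   # CL[(s, t)]: how many transitions s -> t were observed
prev = None
for state in assignments:
    CN[state] += 1                  # bump the current state's cardinality
    if prev is not None:
        CL[(prev, state)] += 1      # bump the link from the previous state
    prev = state

print(CN)                           # N1 seen 3 times, N2 once, N3 twice
print(CL[("N1", "N1")], CL[("N2", "N3")])
```

Each arriving record costs O(1) dictionary updates, which is what makes the building algorithms suitable for streams.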
More Issues of EMM
- Label of nodes — cluster feature <CNi, LSi>, where LS is the medoid or centroid.
- Label of links — <CLij>.
- Calibration of the granularity of clusters:
  - Determine the threshold using the Markov property
  - Parameter-free modeling [Keogh, KDD04]
[Figure 65.5: RMS error for prediction in the Serwent dataset (roughly 6.6 to 8.4) at thresholds Th = 0.5, 0.6, 0.7, 0.8, 0.9, 0.99, 0.995, 0.999]
Modeling Performance
- Growth rate of EMM states (Matlab as a testbed):
  - Sublinear growth of the number of states
  - The growth rate decreases over time
  - Memory usage: 0.02-0.04% of data size for the Ouse, Serwent, and MnDot datasets
- Time efficiency:
  - Clustering: O(m) vs. O(log m)
  - Markov chain update: O(1)
- Continued learning
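One of the listed application algorithms is prediction from the synopsis. The slides report prediction error but not the rule; predicting the highest-count outgoing transition of the current state is one natural, assumed choice, sketched here (a scan is used for clarity, though per-state adjacency lists would make it O(1)).

```python
# Assumed prediction rule: follow the most frequent outgoing transition.
CL = {("N1", "N1"): 1, ("N1", "N2"): 1, ("N2", "N3"): 4, ("N3", "N3"): 2}

def predict_next(state):
    """Return the most frequent successor of `state`, or None if unseen."""
    out = {t: c for (s, t), c in CL.items() if s == state}
    return max(out, key=out.get) if out else None

print(predict_next("N2"))  # -> N3
```

The RMS prediction errors in Figure 65.5 would then measure how far the predicted state's representative (medoid/centroid) lies from the actual next record.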
Outline
- Introduction
- EMM Framework
- EMM Applications
  - Anomaly detection
  - Risk assessment
  - Emerging event finding
- Future Work
- Conclusions
EMM Application: Anomaly Detection
Problem: compare a synopsis representing "normal" behavior to actual behavior; any deviation is flagged as a potentially interesting pattern.
- Also known as the Positive Security Model [http://www.imperva.com]
- Assumes that everything that deviates from normal is bad.
Methodology: concepts and rules
- Cardinality of nodes and links
- Normalized occurrence frequency and normalized transition probability
- Performance metric: detection rate = TP/(TP+FN)
Plus: has the potential to detect interesting patterns of all kinds, including "unknown" patterns.
Minus: can lead to a high false alarm rate.
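A hedged sketch of the rule described above: an incoming record's state (or transition) is flagged when its normalized occurrence frequency in the synopsis falls below a threshold. The counts and the threshold value here are made up for illustration.

```python
# Sketch: flag states that are unseen or rare relative to modeled "normal".
CN = {"N1": 120, "N2": 75, "N3": 4}   # state cardinalities in the synopsis
total = sum(CN.values())

def is_anomalous(state, threshold=0.05):
    """True when the state's normalized occurrence frequency is below threshold."""
    freq = CN.get(state, 0) / total
    return freq < threshold

print(is_anomalous("N1"))  # frequent state -> False
print(is_anomalous("N3"))  # rare state    -> True
print(is_anomalous("N9"))  # unseen state  -> True
```

This captures both strengths and weaknesses from the slide: unseen patterns ("N9") are caught automatically, but any legitimate-but-rare behavior ("N3") raises a false alarm.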
EMM Application: Anomaly Detection
[Two slides of anomaly detection result figures, not recoverable from the transcript]
EMM Application: Risk Assessment
Problem: mitigate the false alarm rate while maintaining a high detection rate.
"98% of the alarm incidents in most communities are false alarms which distracts law enforcement from real public safety responses." - Gary Purvis, http://www.falsealarmreduction.com/
Methodology:
- Historic feedbacks can be used as a free resource to take some possibly safe anomalies out.
- Combine the anomaly detection model with the user's feedbacks.
- Risk level index
Evaluation metrics:
- Detection rate = TP/(TP+FN)
- False alarm rate = FP/(TP+FP)
Results and discussions follow.
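The slides do not give the risk-level formula, so the following is a purely hypothetical sketch of the idea: blend the model's anomaly evidence with an accumulated user-feedback "safeness" score, weighted by a factor alpha (the weight factor that the plots below vary).

```python
# Hypothetical risk-level index: anomaly evidence tempered by user feedback.
def risk_level(anomaly_score, feedback_safe_score, alpha=0.5):
    """Blend anomaly evidence with feedback; both scores assumed in [0, 1].
    High feedback_safe_score means users have marked this pattern safe."""
    return alpha * anomaly_score + (1 - alpha) * (1 - feedback_safe_score)

# An anomaly that users repeatedly marked safe gets a reduced risk level.
print(risk_level(0.9, 0.8, alpha=0.5))  # ≈ 0.55
print(risk_level(0.9, 0.0, alpha=0.5))  # ≈ 0.95
```

Thresholding on this blended index instead of the raw anomaly score is what lets the method trade a small amount of detection rate for a much lower false alarm rate.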
EMM Application: Risk Assessment
[Figure: detection rate of the anomaly detection and risk assessment models vs. (a) Euclidean threshold for clustering (th), (b) risk assessment weight factor (alpha), (c) EMM state cardinality threshold (thNode), and (d) EMM transition cardinality threshold (thLink)]
EMM Application: Risk Assessment
[Figure: false alarm rate of the anomaly detection and risk assessment models vs. (a) Euclidean threshold for clustering (th), (b) risk assessment weight factor (alpha), (c) EMM state cardinality threshold (thNode), and (d) EMM transition cardinality threshold (thLink)]
EMM Application: Risk Assessment
[Figure: relative operating characteristic (ROC) curve of the anomaly detection model — detection rate vs. false alarm rate, under varied Euclidean thresholds, EMM state thresholds, and EMM transition thresholds]
EMM Application: Emerging Events
Problem: model a dynamically changing spatiotemporal data series; find emerging events that represent new and significant trends.
- How to delete obsolete nodes?
- How to identify a new trend at an early time?
Methodology:
- Sliding window: EMMDelete
- Decay of importance: aging score
- Extended cluster feature
- Extended transition labeling
- Emerging events
Results and discussions; all updates are O(1).
EMM Application: Emerging Events
[Figure: aging-score examples — with unit link weights, S(t)/CN = 3/5 = 0.6 for a state with CN = 5, or 4.5/5 = 0.9; with decayed link weights, S(t) = 0.3+0.4+0.5+0.6+1.0 = 2.8, giving S(t)/CN = 2.8/3 ≈ 0.93]
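The decayed-score arithmetic from the example can be reproduced directly; this small sketch assumes only what the figure shows: each link contributes a weight that decays with age, and the score is the decayed sum S(t) divided by the state cardinality CN.

```python
# Reproducing the aging-score arithmetic from the example above.
def aging_score(weights, cn):
    """S(t)/CN for a state whose incident links carry decayed weights."""
    return sum(weights) / cn

weights = [0.3, 0.4, 0.5, 0.6, 1.0]       # older links decayed more
print(round(sum(weights), 2))             # S(t) -> 2.8
print(round(aging_score(weights, 3), 2))  # S(t)/CN -> 0.93
```

A state whose recent activity dominates keeps a score near 1 even as old links fade, which is how emerging trends stand out against obsolete structure.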
EMM Application: Emerging Events
[Figure: EMM state increment over time for the CiscoInternal2 dataset (Euclidean th = 30, centroid, window size = 1000, alpha = 0.01, r = 0.9); the number of EMM states stays below about 30 over 5000 time units]
Outline
- Introduction
- EMM Framework
- EMM Applications
- Future Work
- Conclusions
Future Work: Adaptive EMM
Motivation: modeling a dynamically changing data profile requires changing the cluster granularity.
Proposed methodology: a local ensemble of EMMs
- One main EMM and two ancillary EMMs (with fewer descriptors)
- Compare the performance of the three EMMs
- Switch the main EMM when an ancillary one performs better
- Create a new ancillary EMM based on the new main EMM (faster time-to-mature)
- EMMSplit and EMMMerge; new algorithms are needed
[Figure: EMM performance at time t as a function of granularity]
Future Work: Hierarchical EMM
Hierarchical EMM: the logical geographic area under consideration is divided into virtual regions; a high-level EMM is an agglomeration of lower-level EMMs.
- Parallel EMM: a high-level EMM is a summary of lower-level EMMs with the same features/attributes.
- Heterogeneous EMM: a lower-level EMM is a feature of the higher-level EMM.
- Recursive EMM: a lower-level EMM represents one or several sub-states of the higher-level EMM.
[Figure: a tree of EMMs, with one high-level EMM aggregating several lower-level EMMs]
Conclusions
- EMM is an efficient, modularized, flexible data mining framework suitable for spatiotemporal data stream processing.
- It supports a series of applications.
- EMM aligns with current research trends and demands.
- EMM is innovative.
- A list of publications follows.
Related Publications
- Yu Meng and Margaret H. Dunham, "Mining Developing Trends of Dynamic Spatiotemporal Data Streams", Journal of Computers, Vol. 1, No. 3, Academy Publisher, 2006.
- Charlie Isaksson, Yu Meng and Margaret H. Dunham, "Risk Leveling of Network Traffic Anomalies", Int'l Journal of Computer Science and Network Security (IJCSNS), Vol. 6, No. 6, 2006.
- Yu Meng and Margaret H. Dunham, "Online Mining of Risk Level of Traffic Anomalies with User's Feedbacks", in Proceedings of the Second IEEE International Conference on Granular Computing (GrC'06), Atlanta, GA, May 10-12, 2006.
- Y. Meng, M.H. Dunham, F.M. Marchetti, and J. Huang, "Rare Event Detection in a Spatiotemporal Environment", in Proceedings of the Second IEEE International Conference on Granular Computing (GrC'06), Atlanta, GA, May 10-12, 2006.
- Yu Meng and Margaret H. Dunham, "Efficient Mining of Emerging Events in a Dynamic Spatiotemporal Environment", in Proceedings of the Tenth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006), Singapore, April 9-12, 2006, Springer LNCS Vol. 3918.
- M.H. Dunham, Y. Meng, and J. Huang, "Extensible Markov Model", in Proceedings of the 4th IEEE International Conference on Data Mining (ICDM'04), Brighton, UK, November 1-4, 2004.
Thank you
Questions?