Section 6: Relational Outlier Detection and Exception Mining

School of Computing Science
Simon Fraser University
Vancouver, Canada
Anomaly Detection
Outlier Detection
Exception Mining
Profile-Based Outlier Detection for Relational Data
• Population Database, e.g. IMDB
• Individual Database (profile, interpretation, egonet), e.g. Brad Pitt's movies
• Goal: identify exceptional individual databases
Maervoet, J.; Vens, C.; Vanden Berghe, G.; Blockeel, H. & De Causmaecker, P. (2012), 'Outlier Detection in Relational Data: A Case Study in Geographical Information Systems', Expert Systems With Applications 39(5), 4718-4728.
Example: population data
[Figure: a small IMDB-style population database. Actor nodes (gender = Man/Woman, country = U.S.) are linked to movie nodes (e.g. runtime = 98 min, drama = true, action = true; runtime = 111 min, drama = false, action = true) by ActsIn relationships; ActsIn edges carry salary attributes ($500K, $2M, $5M) where True and are marked False/n/a where the actor does not appear in the movie.]
Example: individual data
[Figure: the individual database for one actor: a single actor node (gender = Man, country = U.S.) with his ActsIn relationships and one movie (runtime = 98 min, drama = true); non-existing relationships are marked False/n/a.]
Model-Based Relational Outlier Detection
• Model-based: leverage the result of Bayesian network learning
  1. Feature generation based on the BN model
  2. Define an outlierness metric using the BN model
[Diagram: a class-level Bayesian network is learned from the population database and then applied to each individual database.]
Maervoet, J.; Vens, C.; Vanden Berghe, G.; Blockeel, H. & De Causmaecker, P. (2012), 'Outlier Detection in Relational Data: A Case Study in Geographical Information Systems', Expert Systems With Applications 39(5), 4718-4728.
Model-Based Feature Generation
Model-Based Outlier Detection for Relational Data
[Diagram: the population database yields a class-level Bayesian network; combining it with an individual database yields an individual feature vector.]
• Propositionalization / Relation Elimination / ETL:
  • feature vectors summarize the individual data
  • leverage outlier detection for i.i.d. feature-matrix data (sketched below)
Riahi, F. & Schulte, O. (2016), 'Propositionalization for Unsupervised Outlier Detection in Multi-Relational Data', in Proceedings FLAIRS 2016, pp. 448-453.
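To make the pipeline concrete, here is a minimal sketch (ours, not from the tutorial) that feeds a propositionalized feature matrix to a standard i.i.d. outlier detector; the rows and the LocalOutlierFactor settings are illustrative.

```python
# Sketch: apply an i.i.d. outlier detector to propositionalized feature vectors.
# Each row is assumed to be one individual's vector of family-configuration
# proportions (values such as 0, 1/2, 1), as in the feature matrix example below.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

X = np.array([
    [0.0, 0.5, 0.5, 1.0],   # individual 1
    [0.0, 0.5, 1.0, 1.0],   # individual 2
    [0.0, 0.0, 0.5, 1.0],   # individual 3
    [0.5, 0.5, 0.5, 1.0],   # individual 4
    [1.0, 1.0, 0.0, 0.0],   # individual 5: unusual profile
])

# LOF labels outliers -1 and inliers 1; contamination is the assumed
# fraction of outliers in the population.
lof = LocalOutlierFactor(n_neighbors=2, contamination=0.2)
labels = lof.fit_predict(X)
print(labels)                          # e.g. [ 1  1  1  1 -1]
print(-lof.negative_outlier_factor_)   # LOF outlierness scores
```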
Example: Class Bayesian Network
[Figure: class-level Bayesian network with nodes gender(A), Drama(M), and ActsIn(A,M), where gender(A) and Drama(M) are the parents of ActsIn(A,M).]
Example: Feature Matrix
[Table: feature matrix with one row per individual and one column per family configuration of the Bayesian network; entries are proportions such as 0, 1/2, and 1.]
• Each feature corresponds to a family configuration in the Bayesian network
• Similar to the feature matrix for classification
• For step-by-step construction, see the supplementary slides on the website
Feature Generation/Propositionalization for Outlier Detection
• Similar to feature generation for classification
• Main difference: include all first-order random variables, not just the Markov blanket of the class variable
• Bayesian network learning discovers the relevant conjunctive features
• Related work: the OddBall system also extracts a feature matrix from relational information, based on network analysis (Akoglu et al. 2010)
+ Leverages existing i.i.d. outlier detection methods
− Does not define a "native" relational outlierness metric
Akoglu, L.; McGlohon, M. & Faloutsos, C. (2010), 'OddBall: Spotting Anomalies in Weighted Graphs', in PAKDD, pp. 410-421.
Akoglu, L.; Tong, H. & Koutra, D. (2015), 'Graph-based anomaly detection and description: a survey', Data Mining and Knowledge Discovery 29(3), 626-688.
Relational Outlierness Metrics
Exceptional Model Mining for Relational Data
The EMM approach (Knobbe et al. 2011) for subgroup discovery in i.i.d. data:
1. Fix a model class with parameter vector θ.
2. Learn parameters θc for the entire class.
3. Learn parameters θg for a subgroup g.
4. Measure the difference between θc and θg ⇒ quality measure for subgroup g.
5. For relational data, an individual o = a subgroup g of size 1.
⇒ Compare a random individual against the target individual (toy sketch below).
Knobbe, A.; Feelders, A. & Leman, D. (2011), 'Exceptional Model Mining', in Data Mining: Foundations and Intelligent Paradigms, Springer Verlag, Heidelberg, Germany.
Riahi, F. & Schulte, O. (2015), 'Model-based Outlier Detection for Object-Relational Data', in IEEE SSCI.
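A toy sketch of steps 1-4, assuming the simplest possible model class (a single Bernoulli parameter) and absolute difference as a stand-in quality measure; the data values are invented, and Knobbe et al. consider much richer model classes and measures.

```python
# EMM recipe with a Bernoulli model class: theta = P(attribute = True).
def bernoulli_mle(values):
    """Step 1: model class = Bernoulli; MLE of its single parameter."""
    return sum(values) / len(values)

population = [1, 0, 1, 1, 0, 0, 1, 0]   # attribute values for the whole class
subgroup   = [1, 1]                     # attribute values for one subgroup

theta_c = bernoulli_mle(population)     # step 2: class parameters
theta_g = bernoulli_mle(subgroup)       # step 3: subgroup parameters
quality = abs(theta_g - theta_c)        # step 4: quality measure for the subgroup
print(theta_c, theta_g, quality)        # 0.5 1.0 0.5
```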
EMM-Based Outlier Detection for Relational Data
• Population Database → class Bayesian network (for a random individual)
• Individual Database → individual Bayesian network
• Outlierness metric (quality measure) = measure of dissimilarity between the class and individual BNs, e.g. KLD, ELD (new)
Riahi, F. & Schulte, O. (2015), 'Model-based Outlier Detection for Object-Relational Data', in IEEE SSCI.
Example: class and individual Bayesian network parameters

Class Bayesian network (gender(A) and Drama(M) are the parents of ActsIn(A,M)):
  P(gender(A) = M) = 0.5
  P(Drama(M) = T) = 0.5
  Conditional probability of ActsIn(A,M) = T:
    gender(A) = M, Drama(M) = T: 1/2
    gender(A) = M, Drama(M) = F: 0
    gender(A) = W, Drama(M) = T: 0
    gender(A) = W, Drama(M) = F: 1

Individual Bayesian network for Brad Pitt:
  P(gender(bradPitt) = M) = 1
  P(Drama(M) = T) = 0.5
  Conditional probability of ActsIn(bradPitt, M) = T:
    gender(bradPitt) = M, Drama(M) = T: 0
    gender(bradPitt) = M, Drama(M) = F: 0
Outlierness Metric = Kullback-Leibler Divergence

$$\mathrm{KLD}(B_o \,\|\, B_c) = \sum_{\text{nodes } i} \; \sum_{\text{values } k} \; \sum_{\text{parent-states } j} P_{B_o}(X_i = x_{ik},\, \mathrm{Pa}(X_i) = pa_j) \, \ln\frac{P_{B_o}(X_i = x_{ik} \mid \mathrm{Pa}(X_i) = pa_j)}{P_{B_c}(X_i = x_{ik} \mid \mathrm{Pa}(X_i) = pa_j)}$$

where
• B_c models the class database distribution
• B_o models the individual database distribution D_o
• Assuming that P_{B_o} = P_{D_o} (MLE estimation), the KLD is the individual-data log-likelihood ratio:

$$\mathrm{KLD}(B_o \,\|\, B_c) = L(B_o; D_o) - L(B_c; D_o)$$
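A sketch of this computation, assuming the distributions are given as dictionaries keyed by (node, value, parent-state); the encoding is ours, not the paper's.

```python
import math

def bn_kld(p_o_joint, p_o_cond, p_c_cond):
    """KLD(B_o || B_c) as in the formula above.

    All arguments map (node, value, parent_state) -> probability:
      p_o_joint: individual joint P_Bo(X_i = x_ik, Pa(X_i) = pa_j)
      p_o_cond:  individual conditional P_Bo(x_ik | pa_j)
      p_c_cond:  class conditional P_Bc(x_ik | pa_j)
    Terms with individual joint probability 0 contribute nothing.
    """
    total = 0.0
    for key, joint in p_o_joint.items():
        if joint > 0:
            total += joint * math.log(p_o_cond[key] / p_c_cond[key])
    return total
```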
Brad Pitt Example

gender(A):
| value | individual joint | individual cond. | class cond. | ln(ind. cond.) | ln(class cond.) | KLD |
| M | 1 | 1 | 0.5 | 0 | -0.69 | 0.69 |

ActsIn(A,M), with parents gender(A) and Drama(M):
| ActsIn(A,M) | gender(A) | Drama(M) | individual joint | individual cond. | class cond. | ln(ind. cond.) | ln(class cond.) | KLD |
| F | M | T | 1/2 | 1 | 0.5 | 0 | -0.69 | 0.35 |
| F | M | F | 1/2 | 1 | 1 | 0 | 0.00 | 0.00 |
(total for ActsIn: 0.35)

• total KLD = 0.69 + 0.35 = 1.04
• KLD for Drama(M) = 0
• rows with individual probability = 0 are omitted
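As a sanity check, the tables above can be reproduced in a few lines; the dictionary encoding is ours, and the Drama(M) terms are omitted because its individual and class parameters coincide (KLD contribution 0).

```python
import math

# Keys: (node, value, parent-state); only rows with nonzero individual
# probability are included, matching the tables above.
p_o_joint = {("gender", "M", ()): 1.0,
             ("ActsIn", "F", ("M", "T")): 0.5,
             ("ActsIn", "F", ("M", "F")): 0.5}
p_o_cond  = {("gender", "M", ()): 1.0,
             ("ActsIn", "F", ("M", "T")): 1.0,
             ("ActsIn", "F", ("M", "F")): 1.0}
p_c_cond  = {("gender", "M", ()): 0.5,
             ("ActsIn", "F", ("M", "T")): 0.5,
             ("ActsIn", "F", ("M", "F")): 1.0}

kld = sum(joint * math.log(p_o_cond[key] / p_c_cond[key])
          for key, joint in p_o_joint.items() if joint > 0)
print(round(kld, 2))   # 1.04 (= 0.69 + 0.35 + 0.00)
```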
Mutual Information Decomposition
The interpretability of the metric can be increased by a mutual information decomposition of the KLD:

$$\mathrm{KLD}(B_o \,\|\, B_c) = \sum_{\text{nodes } i} \sum_{\text{values } k} P_D(x_{ik}) \ln\frac{P_{B_o}(x_{ik})}{P_{B_c}(x_{ik})} \;+\; \sum_{\text{nodes } i} \sum_{\text{values } k} \sum_{\text{parent-states } j} P_D(x_{ik}, pa_j) \left[ \ln\frac{P_{B_o}(X_i = x_{ik} \mid \mathrm{Pa}(X_i) = pa_j)}{P_{B_o}(x_{ik})} - \ln\frac{P_{B_c}(X_i = x_{ik} \mid \mathrm{Pa}(X_i) = pa_j)}{P_{B_c}(x_{ik})} \right]$$

The first sum is the KLD with respect to the marginal single-variable distributions. In the second sum, the first log ratio is the lift of the parent condition in the individual distribution, and the second is the lift of the parent condition in the class distribution.
• The first sum measures single-variable distribution differences
• The second sum measures differences in the strength of associations
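A sketch of the decomposition, using the same assumed encoding as the KLD sketch above plus marginals keyed by (node, value); P_D is taken to be the individual data distribution.

```python
import math

def kld_decomposition(p_d_marg, p_d_joint, p_o, p_c):
    """Split KLD(B_o || B_c) into the two sums above.

    p_d_marg:  {(node, value): P_D(x_ik)}
    p_d_joint: {(node, value, parent_state): P_D(x_ik, pa_j)}
    p_o, p_c:  pairs (marginals, conditionals) for B_o and B_c,
               keyed like p_d_marg and p_d_joint respectively.
    """
    o_marg, o_cond = p_o
    c_marg, c_cond = p_c
    # First sum: KLD of the marginal single-variable distributions.
    marginal_term = sum(p * math.log(o_marg[k] / c_marg[k])
                        for k, p in p_d_marg.items() if p > 0)
    # Second sum: difference between individual and class lifts.
    assoc_term = 0.0
    for (node, val, pa), p in p_d_joint.items():
        if p > 0:
            lift_o = math.log(o_cond[(node, val, pa)] / o_marg[(node, val)])
            lift_c = math.log(c_cond[(node, val, pa)] / c_marg[(node, val)])
            assoc_term += p * (lift_o - lift_c)
    return marginal_term, assoc_term
```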
ELD = Expected Log-Distance
• A problem with KLD: some log ratios are positive, some negative ⇒ differences can cancel, which reduces power
• The fix: take absolute log-distances

$$\mathrm{ELD}(B_o \,\|\, B_c) = \sum_{\text{nodes } i} \sum_{\text{values } k} P_D(x_{ik}) \left| \ln\frac{P_{B_o}(x_{ik})}{P_{B_c}(x_{ik})} \right| \;+\; \sum_{\text{nodes } i} \sum_{\text{values } k} \sum_{\text{parent-states } j} P_D(x_{ik}, pa_j) \left| \ln\frac{P_{B_o}(X_i = x_{ik} \mid \mathrm{Pa}(X_i) = pa_j)}{P_{B_o}(x_{ik})} - \ln\frac{P_{B_c}(X_i = x_{ik} \mid \mathrm{Pa}(X_i) = pa_j)}{P_{B_c}(x_{ik})} \right|$$
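ELD differs from the decomposed KLD only in taking the absolute value of each log term, which this sketch (same assumed encoding as above) makes explicit.

```python
import math

def bn_eld(p_d_marg, p_d_joint, p_o, p_c):
    """ELD(B_o || B_c): the two sums of the KLD decomposition, but with
    absolute log-distances so positive and negative terms cannot cancel."""
    o_marg, o_cond = p_o
    c_marg, c_cond = p_c
    marginal_term = sum(p * abs(math.log(o_marg[k] / c_marg[k]))
                        for k, p in p_d_marg.items() if p > 0)
    assoc_term = 0.0
    for (node, val, pa), p in p_d_joint.items():
        if p > 0:
            lift_o = math.log(o_cond[(node, val, pa)] / o_marg[(node, val)])
            lift_c = math.log(c_cond[(node, val, pa)] / c_marg[(node, val)])
            assoc_term += p * abs(lift_o - lift_c)
    return marginal_term + assoc_term
```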
Two Types of Outliers
• Feature outlier: unusual distribution over a single attribute in isolation
  e.g. DribbleEfficiency
• Correlation outlier: unusual relevance of a parent for its children (mutual information, lift)
  e.g. DribbleEfficiency → Win
Example: Edin Dzeko, Marginals
• Data are from the Premier League season 2011-2012.
• Edin Dzeko: low DribbleEfficiency in 16% of his matches.
• Random striker: low DribbleEfficiency in 50% of matches.
• ELD contribution to the marginal sum: 16% × |ln(16%/50%)| ≈ 0.18
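The marginal contribution is a one-line check:

```python
import math

p_individual = 0.16   # Dzeko: low DribbleEfficiency in 16% of his matches
p_class = 0.50        # random striker: low DribbleEfficiency in 50% of matches
contribution = p_individual * abs(math.log(p_individual / p_class))
print(round(contribution, 2))   # 0.18
```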
Example: Edin Dzeko, Associations
• Association: ShotEfficiency = high, TackleEfficiency = medium → DribbleEfficiency = low
• For Edin Dzeko:
  • confidence = 50%
  • lift = ln(50%/16%) ≈ 1.13
  • support (joint probability) = 6%
• For a random striker:
  • confidence = 38%
  • lift = ln(38%/50%) ≈ -0.27
• ELD contribution for the association: 6% × |1.13 − (−0.27)| = 6% × 1.40 ≈ 0.08
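The association contribution follows the lift terms of the ELD formula:

```python
import math

support = 0.06                        # Dzeko's joint probability for the rule
lift_dzeko  = math.log(0.50 / 0.16)   # confidence 50% vs. marginal 16%; ~1.14
lift_random = math.log(0.38 / 0.50)   # confidence 38% vs. marginal 50%; ~-0.27
contribution = support * abs(lift_dzeko - lift_random)
print(round(contribution, 2))         # 0.08
```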
Evaluation Metrics
• Use precision as the evaluation metric: how many flagged outliers were correctly recognized (see the sketch after this slide)
• Set the percentage of outliers to 1% and 5%
• Similar results with AUC and recall
Gao, J.; Liang, F.; Fan, W.; Wang, C.; Sun, Y. & Han, J. (2010), 'On Community Outliers and Their Efficient Detection in Information Networks', in SIGKDD, pp. 813-822.
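A minimal sketch of precision at a fixed outlier percentage; the scores and labels are illustrative.

```python
def precision_at_fraction(scores, is_outlier, fraction):
    """Flag the top `fraction` of cases by outlierness score and return
    the share of flagged cases that are true outliers."""
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    flagged = ranked[:k]
    return sum(is_outlier[i] for i in flagged) / k

scores     = [3.1, 0.2, 0.5, 2.7, 0.4, 0.3, 0.6, 0.1, 0.2, 0.3]
is_outlier = [1,   0,   0,   1,   0,   0,   0,   0,   0,   0]
print(precision_at_fraction(scores, is_outlier, 0.05))  # top case:    1.0
print(precision_at_fraction(scores, is_outlier, 0.20))  # top 2 cases: 1.0
```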
Methods Compared
• Outlierness metrics
  • KLD
  • |KLD|: replace log-differences by log-distances
  • ELD
  • LOG = −log-likelihood of the generic class model on the individual database
  • FD: |KLD| with respect to marginals only
• Aggregation methods
  1. Use counts of single feature values to form a data matrix
  2. Apply standard single-table methods (LOF, KNN, OutRank)
Synthetic Datasets
• Synthetic datasets: these should be easy!
• Two features per player per match: ShotEff and MatchResult (binary 0/1)
• Sample rows are summarized below; a generation sketch follows.
[Table: sample rows for the high-correlation and low-correlation synthetic datasets, with columns ShotEff and MatchResult per player per match. In the high-correlation dataset, normal players' ShotEff and MatchResult values agree while the outlier's values break the pattern; in the low-correlation dataset the association is weak for normal players.]
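A sketch of how such data could be generated; this is our assumption about the setup, not the authors' generator.

```python
import random

def player_matches(n, corr, outlier=False):
    """One player's matches: ShotEff and MatchResult agree with probability
    `corr` for a normal player; an outlier inverts that pattern."""
    rows = []
    for _ in range(n):
        shot_eff = random.randint(0, 1)
        agree = random.random() < corr
        if outlier:
            agree = not agree
        result = shot_eff if agree else 1 - shot_eff
        rows.append((shot_eff, result))
    return rows

random.seed(0)
print(player_matches(8, corr=0.9))                # high-correlation normal player
print(player_matches(8, corr=0.9, outlier=True))  # outlier breaks the correlation
```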
Synthetic Data Results

1D Scatter-Plots
[Figure: 1D scatter-plots of outlier scores on the high-correlation and low-correlation synthetic datasets; red points are outliers and blue points are normal class points. One strip per metric (ELD, |LR|, LR, FD, LOG, where LR is the log-likelihood-ratio form of KLD); the x-axis shows Log(Metric + 1), ranging from about 2.0 to 5.0.]
Case Study: Strikers and Movies

Strikers (normal class = Striker):
| Player Name | Position | ELD Rank | ELD Max Node | FD Max Value | Object Probability | Class Probability |
| Edin Dzeko | Striker | 1 | Dribble Efficiency | DE = Low | 0.16 | 0.50 |
| Paul Robinson | Goalie | 2 | SavesMade | SM = Medium | 0.30 | 0.04 |
| Michel Vorm | Goalie | 3 | SavesMade | SM = Medium | 0.37 | 0.04 |

Movies (normal class = Drama):
| Movie Title | Genre | ELD Rank | ELD Max Node | FD Max Feature Value | Object Probability | Class Probability |
| Brave Heart | Drama | 1 | Actor_Quality | a_quality = 4 | 0.93 | 0.42 |
| Austin Powers | Comedy | 2 | Cast_position | cast_num = 3 | 0.78 | 0.49 |
| Blue Brothers | Comedy | 3 | Cast_position | cast_num = 3 | 0.88 | 0.49 |
Conclusion
• Relational outlier detection: two approaches for leveraging BN structure learning
  • Propositionalization: the BN structure defines features for single-table outlier detection
  • Relational outlierness metric: use the divergence between the database distribution for the target individual and that for a random individual
• A novel variant of the Kullback-Leibler divergence works well: it is interpretable and accurate
Tutorial Conclusion: First-Order Bayesian Networks
• Many organizations maintain structured data in relational databases.
• First-order Bayesian networks model probabilistic associations across the entire database.
• Halpern/Bacchus probabilistic logic unifies logic and probability.
• Random selection semantics for Bayesian networks: frequencies can be queried across the entire database.
Conclusion: Learning First-Order Bayesian Networks
• Extend the Halpern/Bacchus random selection semantics to statistical concepts:
  • a new random selection likelihood function
  • tractable parameter and structure learning, which can also be used to learn Markov Logic Networks
  • a relational Bayesian network classification formula: a log-linear model whose predictors are the proportions of Bayesian network features
• A new approach to relational anomaly detection: compare the probability distribution of a potential outlier with the distribution for a reference class