Anomaly Detection (Outlier Detection, Exception Mining)
School of Computing Science, Simon Fraser University, Vancouver, Canada

Profile-Based Outlier Detection for Relational Data
• Population database (e.g., IMDb) → individual database (profile, interpretation, egonet), e.g., Brad Pitt's movies.
• Goal: identify exceptional individual databases.

Maervoet, J.; Vens, C.; Vanden Berghe, G.; Blockeel, H. & De Causmaecker, P. (2012), 'Outlier Detection in Relational Data: A Case Study in Geographical Information Systems', Expert Systems With Applications 39(5), 4718-4728.

Example: Population Data
[Figure: a toy IMDb-style population database. Movies carry the attributes runtime (98 min, 111 min), drama (true/false), and action (true/false); actors carry gender (Man/Woman) and country (U.S.); the ActsIn relationship records, for each actor-movie pair, whether the actor appears in the movie and, if so, a salary (e.g., $500K, $2M, $5M).]

Example: Individual Data
[Figure: the individual database for one actor (gender = Man, country = U.S.): only his ActsIn links and the movies he appears in, e.g., runtime = 98 min, drama = true.]

Model-Based Relational Outlier Detection
Model-based: leverage the result of Bayesian network learning, which turns the population database into a class-level Bayesian network.
1. Generate features based on the BN model.
2. Define an outlierness metric using the BN model.

Model-Based Feature Generation

Model-Based Outlier Detection for Relational Data
Population database → class-level Bayesian network; individual database → individual feature vector.
• Propositionalization / relation elimination / ETL: feature vectors summarize the individual data.
• This leverages existing outlier detection methods for i.i.d. feature-matrix data.

Riahi, F. & Schulte, O. (2016), 'Propositionalization for Unsupervised Outlier Detection in Multi-Relational Data', in Proceedings FLAIRS 2016, pp. 448-453.

Example: Class Bayesian Network
ActsIn(A,M) has the parents gender(A) and Drama(M): gender(A) → ActsIn(A,M) ← Drama(M).

Example: Feature Matrix
[Figure: a feature matrix whose rows are individuals (actors) and whose columns are family configurations of the Bayesian network; entries are proportions such as 0, 1/2, 1.]
• Each feature corresponds to a family configuration in the Bayesian network.
• Similar to a feature matrix for classification.
• For a step-by-step construction, see the supplementary slides on the website.

Feature Generation/Propositionalization for Outlier Detection
• Similar to feature generation for classification. Main difference: include all first-order random variables, not just the Markov blanket of the class variable.
• Bayesian network learning discovers relevant conjunctive features.
• Related work: the OddBall system also extracts a feature matrix from relational information, based on network analysis (Akoglu et al. 2010).
+ Leverages existing i.i.d. outlier detection methods.
- Does not define a "native" relational outlierness metric.

Akoglu, L.; McGlohon, M. & Faloutsos, C. (2010), 'OddBall: Spotting Anomalies in Weighted Graphs', in Proceedings PAKDD, pp. 410-421.
Akoglu, L.; Tong, H. & Koutra, D. (2015), 'Graph based anomaly detection and description: a survey', Data Mining and Knowledge Discovery 29(3), 626-688.
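To make the propositionalization pipeline concrete, here is a minimal sketch, assuming the per-individual feature vectors (proportions of family configurations) have already been extracted; the matrix entries and the choice of scikit-learn's LocalOutlierFactor are illustrative, not the specific setup evaluated in the papers above.

```python
# Minimal sketch of the second pipeline step: apply an i.i.d. outlier
# detector to the propositionalized feature matrix. Rows = individuals
# (e.g., actors), columns = proportions of BN family configurations.
# All numbers are illustrative placeholders.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

feature_matrix = np.array([
    [0.0, 1.0, 0.5, 0.5],   # individual 1
    [0.0, 0.0, 0.5, 0.5],   # individual 2
    [0.5, 0.5, 0.0, 0.0],   # individual 3
    [0.5, 0.5, 0.5, 0.5],   # individual 4
])

lof = LocalOutlierFactor(n_neighbors=2)
labels = lof.fit_predict(feature_matrix)   # -1 = outlier, 1 = inlier
scores = -lof.negative_outlier_factor_     # higher = more outlying
for i, (label, score) in enumerate(zip(labels, scores), start=1):
    print(f"individual {i}: LOF score {score:.2f}, label {label}")
```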
Relational Outlierness Metrics

Exceptional Model Mining for Relational Data
The EMM approach (Knobbe et al. 2011) for subgroup discovery in i.i.d. data:
1. Fix a model class with parameter vector θ.
2. Learn parameters θ_c for the entire class.
3. Learn parameters θ_g for a subgroup g.
4. Measure the difference between θ_c and θ_g: this is the quality measure for subgroup g.
For relational data, an individual o is treated as a subgroup g of size 1: compare the random individual against the target individual.

Knobbe, A.; Feelders, A. & Leman, D. (2011), 'Exceptional Model Mining', in Data Mining: Foundations and Intelligent Paradigms, Springer Verlag, Heidelberg, Germany.
Riahi, F. & Schulte, O. (2015), 'Model-based Outlier Detection for Object-Relational Data', in IEEE SSCI.

EMM-Based Outlier Detection for Relational Data
• Population database → class Bayesian network (for a random individual).
• Individual database → individual Bayesian network.
• Outlierness metric (quality measure) = a measure of dissimilarity between the class and individual BNs, e.g., KLD or ELD (new).

Example: Class and Individual Bayesian Network Parameters

Class Bayesian network (random actor A, random movie M):
P(gender(A) = M) = 0.5, P(Drama(M) = T) = 0.5.

Conditional probability of ActsIn(A,M) = T:
  gender(A)  Drama(M)  P(ActsIn = T)
  M          T         1/2
  M          F         0
  W          T         0
  W          F         1

Individual Bayesian network (Brad Pitt):
P(gender(bradPitt) = M) = 1, P(Drama(M) = T) = 0.5.

Conditional probability of ActsIn(bradPitt, M) = T:
  Drama(M)  P(ActsIn = T)
  T         0
  F         0

Outlierness Metric = Kullback-Leibler Divergence

KLD(B_o \| B_c) = \sum_{\text{nodes } i} \; \sum_{\text{values } k} \; \sum_{\text{parent states } j} P_{B_o}(X_i = x_{ik}, \mathrm{Pa}(X_i) = pa_j) \, \ln \frac{P_{B_o}(X_i = x_{ik} \mid \mathrm{Pa}(X_i) = pa_j)}{P_{B_c}(X_i = x_{ik} \mid \mathrm{Pa}(X_i) = pa_j)}

where B_c models the class database distribution and B_o models the individual database distribution D_o. Assuming P_{B_o} = P_{D_o} (maximum-likelihood estimation), the KLD is the individual-data log-likelihood ratio:

KLD(B_o \| B_c) = L(B_o; D_o) - L(B_c; D_o)

Brad Pitt Example

gender(A) node:
  value  ind. joint  ind. cond.  class cond.  ln(ind. cond.)  ln(class cond.)  KLD
  M      1           1           0.5          0               -0.69            0.69

ActsIn(A,M) node:
  ActsIn  gender(A)  Drama(M)  ind. joint  ind. cond.  class cond.  ln(ind. cond.)  ln(class cond.)  KLD
  F       M          T         1/2         1           0.5          0               -0.69            0.35
  F       M          F         1/2         1           1            0               0.00             0.00

• Total KLD = 0.69 + 0.35 = 1.04.
• The KLD for the Drama(M) node is 0.
• Rows with individual probability 0 are omitted.
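As a check on the example, here is a minimal sketch that evaluates the KLD formula over the family configurations listed above (rows with individual probability 0 omitted, as on the slide); the list-of-tuples encoding is mine, not the tutorial's data structure.

```python
import math

# (configuration, individual joint P_Bo(x, pa),
#  individual conditional, class conditional)
# values taken from the Brad Pitt example above
configs = [
    ("gender(A)=M",                     1.0, 1.0, 0.5),
    ("Drama(M)=T",                      0.5, 0.5, 0.5),
    ("Drama(M)=F",                      0.5, 0.5, 0.5),
    ("ActsIn=F | gender=M, Drama(M)=T", 0.5, 1.0, 0.5),
    ("ActsIn=F | gender=M, Drama(M)=F", 0.5, 1.0, 1.0),
]

kld = 0.0
for name, joint, ind_cond, class_cond in configs:
    term = joint * math.log(ind_cond / class_cond)  # one KLD summand
    kld += term
    print(f"{name:34s} contributes {term:.2f}")
print(f"total KLD = {kld:.2f}")   # 0.69 + 0.35 = 1.04
```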
Mutual Information Decomposition
The interpretability of the metric can be increased by a mutual information decomposition of the KLD:

KLD(B_o \| B_c) = \sum_i \sum_k P_D(x_{ik}) \ln \frac{P_{B_o}(x_{ik})}{P_{B_c}(x_{ik})} + \sum_i \sum_k \sum_j P_D(x_{ik}, pa_j) \left[ \ln \frac{P_{B_o}(x_{ik} \mid pa_j)}{P_{B_o}(x_{ik})} - \ln \frac{P_{B_c}(x_{ik} \mid pa_j)}{P_{B_c}(x_{ik})} \right]

• The first sum is the KLD with respect to the marginal single-variable distributions: it measures single-variable distribution differences.
• The second sum compares the lift of the parent condition in the individual distribution against its lift in the class distribution: it measures differences in the strength of associations.

ELD = Expected Log-Distance
A problem with KLD: some log ratios are positive and some negative, so differences cancel, which reduces power. This can be fixed by taking log-distances:

ELD(B_o \| B_c) = \sum_i \sum_k P_D(x_{ik}) \left| \ln \frac{P_{B_o}(x_{ik})}{P_{B_c}(x_{ik})} \right| + \sum_i \sum_k \sum_j P_D(x_{ik}, pa_j) \left| \ln \frac{P_{B_o}(x_{ik} \mid pa_j)}{P_{B_o}(x_{ik})} - \ln \frac{P_{B_c}(x_{ik} \mid pa_j)}{P_{B_c}(x_{ik})} \right|

Two Types of Outliers
• Feature outlier: an unusual distribution over a single attribute in isolation (e.g., DribbleEfficiency).
• Correlation outlier: unusual relevance of a parent for its children, in terms of mutual information or lift (e.g., DribbleEfficiency → Win).

Example: Edin Dzeko, Marginals
• Data are from the Premier League, season 2011-2012.
• Edin Dzeko: low DribbleEfficiency in 16% of his matches. Random striker: low DE in 50% of matches.
• ELD contribution to the marginal sum: 16% × |ln(16%/50%)| ≈ 0.18 (see the sketch after the Methods Compared list).

Example: Edin Dzeko, Associations
• Association: ShotEfficiency = high, TackleEfficiency = medium → DribbleEfficiency = low.
• For Edin Dzeko: confidence = 50%, lift = ln(50%/16%) ≈ 1.13, support (joint probability) = 6%.
• For a random striker: confidence = 38%, lift = ln(38%/50%) ≈ -0.27.
• ELD contribution for the association: support × |lift difference| = 6% × 1.14 ≈ 0.068.

Evaluation Metrics
• Use precision as the evaluation metric: how many of the flagged outliers were correctly recognized.
• Set the percentage of outliers to 1% and 5%.
• Results are similar with AUC and recall.

Gao, J.; Liang, F.; Fan, W.; Wang, C.; Sun, Y. & Han, J. (2010), 'On Community Outliers and Their Efficient Detection in Information Networks', in Proceedings SIGKDD, pp. 813-822.

Methods Compared
Outlierness metrics:
• KLD
• |KLD|: replaces log-differences by log-distances
• ELD
• LOG: the negative log-likelihood of the generic class model on the individual database
• FD: |KLD| with respect to marginals only
Aggregation methods:
1. Use counts of single feature values to form a data matrix.
2. Apply standard single-table methods (LOF, KNN, OutRank).
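For the ELD metric itself, here is a minimal sketch of one marginal summand, P_D(x) · |ln(P_Bo(x) / P_Bc(x))|, checked against the Edin Dzeko marginals example above; the helper name eld_marginal_term is mine.

```python
import math

def eld_marginal_term(p_data, p_individual, p_class):
    """One summand of the ELD marginal sum:
    P_D(x) * |ln(P_Bo(x) / P_Bc(x))|."""
    return p_data * abs(math.log(p_individual / p_class))

# Edin Dzeko: low DribbleEfficiency in 16% of his matches, versus 50%
# for a random striker. Under MLE, P_D = P_Bo = 0.16 and P_Bc = 0.50.
print(f"{eld_marginal_term(0.16, 0.16, 0.50):.2f}")   # ~0.18
```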
Synthetic Datasets
Synthetic datasets: these should be easy! Two features per player per match; samples below.

  High-correlation dataset         Low-correlation dataset
           ShotEff  MatchResult             ShotEff  MatchResult
  Normal   1        1              Normal   1        1
           1        1                       1        0
           0        0                       0        0
           0        0                       0        1
  Outlier  1        1              Outlier  1        1
           1        0                       1        1
           0        0                       0        0
           0        1                       0        0

In the high-correlation dataset, ShotEff and MatchResult agree in every match of a normal player, and the outlier breaks the correlation; in the low-correlation dataset, the features are uncorrelated for normal players, and the outlier shows a perfect correlation.

Synthetic Data Results
[Figure: 1D scatter plots of log(metric + 1) scores for the LOG, FD, LR, |LR|, and ELD metrics on the high-correlation and low-correlation synthetic datasets; red points are outliers and blue points are normal class points.]

Case Study: Strikers and Movies

Strikers (normal class = strikers):
  Player Name    Position  ELD Rank  ELD Max Node       FD Max Value  Object Prob.  Class Prob.
  Edin Dzeko     Striker   1         DribbleEfficiency  DE = Low      0.16          0.50
  Paul Robinson  Goalie    2         SavesMade          SM = Medium   0.30          0.04
  Michel Vorm    Goalie    3         SavesMade          SM = Medium   0.37          0.04

Movies (normal class = dramas):
  Movie Title    Genre   ELD Rank  ELD Max Node   FD Max Feature Value  Object Prob.  Class Prob.
  Brave Heart    Drama   1         Actor_Quality  a_quality = 4         0.93          0.42
  Austin Powers  Comedy  2         Cast_position  cast_num = 3          0.78          0.49
  Blue Brothers  Comedy  3         Cast_position  cast_num = 3          0.88          0.49

Conclusion
Relational outlier detection: two approaches for leveraging BN structure learning.
• Propositionalization: the BN structure defines features for single-table outlier detection.
• Relational outlierness metric: use the divergence between the database distribution of the target individual and that of a random individual.
A novel variant of the Kullback-Leibler divergence works well: it is interpretable and accurate.

Tutorial Conclusion: First-Order Bayesian Networks
• Many organizations maintain structured data in relational databases.
• First-order Bayesian networks model probabilistic associations across the entire database.
• Halpern/Bacchus probabilistic logic unifies logic and probability.
• Random selection semantics for Bayesian networks: frequencies can be queried across the entire database.
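To illustrate the random selection semantics on the IMDb-style schema used throughout, here is a hedged sketch: the probability of a first-order formula is its frequency over uniformly selected (actor, movie) pairs. The toy tables and pandas code are mine, not from the tutorial.

```python
import pandas as pd

# Toy population database (illustrative values)
actors = pd.DataFrame({"actor": ["a1", "a2"], "gender": ["M", "W"]})
movies = pd.DataFrame({"movie": ["m1", "m2"], "drama": [True, False]})
acts_in = pd.DataFrame({"actor": ["a1", "a2", "a2"],
                        "movie": ["m1", "m1", "m2"]})
acts_in["acts_in"] = True

# Random selection semantics: select an (actor, movie) pair uniformly
# at random; a formula's probability is its frequency over all pairs.
pairs = actors.merge(movies, how="cross")
pairs = pairs.merge(acts_in, on=["actor", "movie"], how="left")
pairs["acts_in"] = pairs["acts_in"].fillna(False).astype(bool)

# database frequency of gender(A) = W, Drama(M) = T, ActsIn(A,M) = T
p = ((pairs["gender"] == "W") & pairs["drama"] & pairs["acts_in"]).mean()
print(f"P(gender=W, Drama=T, ActsIn=T) = {p:.2f}")   # 1 of 4 pairs
```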
Conclusion: Learning First-Order Bayesian Networks
Extend the Halpern/Bacchus random selection semantics to statistical concepts:
• a new random selection likelihood function
• tractable parameter and structure learning
  • can also be used to learn Markov Logic Networks
• a relational Bayesian network classification formula: a log-linear model whose predictors are the proportions of Bayesian network features
• a new approach to relational anomaly detection: compare the probability distribution of a potential outlier with the distribution for a reference class
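The classification formula in the bullets above can be sketched as a weighted sum in which each predictor is the proportion of a BN family configuration among the individual's groundings and each weight is the log of that configuration's class-level conditional probability. This is my schematic reading of the slide, with illustrative numbers, not the tutorial's learned model.

```python
import math

def log_linear_score(proportions, cond_probs):
    """Log-linear classification score: sum over BN features of
    proportion(feature) * ln(conditional probability of feature)."""
    return sum(p * math.log(cp) for p, cp in zip(proportions, cond_probs))

# Two illustrative family configurations: their proportions in the
# individual's database, and their class-level conditional probabilities
# under each candidate class label.
proportions  = [0.7, 0.3]
cp_class_pos = [0.8, 0.1]   # conditional probabilities if class = positive
cp_class_neg = [0.4, 0.5]   # conditional probabilities if class = negative

score_pos = log_linear_score(proportions, cp_class_pos)
score_neg = log_linear_score(proportions, cp_class_neg)
print("predicted class:", "positive" if score_pos > score_neg else "negative")
```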