Association Mining via
Co-clustering of Sparse Matrices
Brian Thompson*, Linda Ness†,
David Shallcross†, Devasis Bassu†
*
†
Definitions
• Let $M$ be an $m \times n$ matrix. A bicluster of $M$ is a subset
of matrix entries formed by the intersection of a set of
rows $I \subseteq [m]$ and a set of columns $J \subseteq [n]$, and is
denoted by $M_{I,J}$.
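As a small illustration of the definition, the sketch below extracts the bicluster selected by a row set and a column set from a NumPy array. The helper names `bicluster` and `density` are hypothetical, indices are 0-based rather than the 1-based [m] and [n] of the definition, and "density" is assumed here to mean the fraction of nonzero entries.

```python
import numpy as np

def bicluster(M, rows, cols):
    """Return the sub-matrix M[I, J] for row index set I and column index set J."""
    I = np.asarray(sorted(rows))
    J = np.asarray(sorted(cols))
    # np.ix_ builds the cross product I x J, i.e. the intersection of the
    # chosen rows with the chosen columns.
    return M[np.ix_(I, J)]

def density(B):
    """Fraction of nonzero entries in a bicluster (assumed notion of density)."""
    return np.count_nonzero(B) / B.size

M = np.array([[1, 0, 1, 0],
              [0, 0, 0, 0],
              [1, 0, 1, 1]])
B = bicluster(M, rows={0, 2}, cols={0, 2})
```

Here `B` is the 2 x 2 all-ones block, so its density under the assumed definition is 1.0.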
[Figure: matrix $M$ with bicluster $M_{I,J}$]
Motivation
• Matrices can represent: binary relations, objects and
attributes, terms and documents, gene expression,
recommender systems, ...
• Dense biclusters indicate strong associations
Co-Clustering
• Co-clustering: Given a matrix, cluster the rows and
columns to form large, dense biclusters
[Figure: matrix partitioned into row clusters R1, R2, R3 and column clusters C1, C2, C3]
• Challenges:
  - Don't know the number or sizes of clusters a priori
  - Want solution to be efficient and scalable
  - Matrix may be sparse
Our Approach
• We propose a two-step approach:
1. Define a quality metric $\mu$ for bicluster partitions
We consider metrics of the form $\mu = \sum_{B \in \Pi} f(B)$
(Motivation for this choice is in the 15-minute version of the talk...)
2. Find a co-clustering that maximizes the value of 𝝁
We propose the CC-MACS algorithm
(Co-Clustering via Maximal Anti-Chain Search)
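A minimal sketch of evaluating a metric of this additive form, assuming the partition consists of every block in the product of the row clusters with the column clusters. The block score `f_w2_over_s` is only an assumed reading of a w^2/s metric, with w taken as the sum of the block's entries and s as its number of cells; the slides do not define these symbols, and the function names are hypothetical.

```python
import numpy as np

def f_w2_over_s(B):
    # ASSUMPTION: w = sum of the entries in the block, s = number of cells.
    w = B.sum()
    s = B.size
    return (w * w) / s if s else 0.0

def partition_quality(M, row_clusters, col_clusters, f=f_w2_over_s):
    """mu = sum of f(B) over all blocks B = M[I, J] in the partition."""
    total = 0.0
    for I in row_clusters:
        for J in col_clusters:
            total += f(M[np.ix_(I, J)])
    return total

M = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1]])
# Partition that isolates the two dense blocks.
good = partition_quality(M, [[0, 1], [2]], [[0, 1], [2, 3]])
# Trivial partition: the whole matrix as a single block.
trivial = partition_quality(M, [[0, 1, 2]], [[0, 1, 2, 3]])
```

Under the assumed score, the partition that isolates the dense blocks evaluates to 6.0, while the single-block partition evaluates to 3.0.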
The CC-MACS Algorithm
1. Build randomized k-d trees on rows ($T^{row}$) and columns ($T^{col}$)
2. Populate $F_{x,y} = f(M_{I_x,J_y})$ for $x \in T^{row}$, $y \in T^{col}$ via dynamic programming (DP)
3. Initialize maximal anti-chains (MACs) $S^{row}$, $S^{col}$ and heaps $H^{row}$, $H^{col}$, with
   $h(x) = \sum_{y \in S^{col}} \left[ f(M_{I_x,J_y}) - f(M_{I_{x.left},J_y}) - f(M_{I_{x.right},J_y}) \right]$
4. While at least one of $H^{row}$ and $H^{col}$ is non-empty:
   • WLOG let $H^{row}.getMax > H^{col}.getMax$
   • Update data structures and variables:
     $H^{row}$, $S^{row}$, $\mu_{curr}$ += $h^{row}(x)$, $h^{col}(y)$ for $y \in H^{col}$
   • If $x.sibling \in S^{row}$, add $x.parent$ to $H^{row}$
5. Return co-clustering formed by $S^{row} \times S^{col}$
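The heap key h(x) in step 3 can be read as the change in the metric when the two children of a tree node x are replaced by x itself. The sketch below computes only that quantity for a row node; it is not the CC-MACS implementation. The `Node` class, the function `merge_gain`, and the block score `f` (sum of entries squared, divided by the number of cells) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional
import numpy as np

@dataclass
class Node:
    """Hypothetical row-tree node: a set of row indices plus optional children."""
    rows: List[int]
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def f(B):
    # ASSUMED block score: (sum of entries)^2 / (number of cells).
    s = B.size
    return float(B.sum()) ** 2 / s if s else 0.0

def merge_gain(M, x, col_clusters):
    """h(x): sum over column clusters J of
    f(M[I_x, J]) - f(M[I_x.left, J]) - f(M[I_x.right, J])."""
    gain = 0.0
    for J in col_clusters:
        gain += (f(M[np.ix_(x.rows, J)])
                 - f(M[np.ix_(x.left.rows, J)])
                 - f(M[np.ix_(x.right.rows, J)]))
    return gain

M = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]])
a, b, c = Node([0]), Node([1]), Node([2])
cols = [[0, 1], [2, 3]]
gain_ab = merge_gain(M, Node([0, 1], a, b), cols)  # merge two identical rows
gain_bc = merge_gain(M, Node([1, 2], b, c), cols)  # merge two dissimilar rows
```

With this assumed score, merging the two identical rows leaves the metric unchanged (gain 0.0), while merging the two dissimilar rows lowers it (gain -2.0).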
Experiments: Synthetic Data
• Generate $m \times n$ matrix $M$ with $k$ biclusters of size $r \times s$
selected randomly from $M$; non-bicluster entries are 0,
each bicluster entry is a 1 with probability $1 - p$
• Want co-clustering output to match ground truth
• Compare via $F_1$-score: $F_1 = 2 \cdot \frac{\mathit{accuracy} \times \mathit{precision}}{\mathit{accuracy} + \mathit{precision}}$
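One way to set up such an experiment is sketched below, under stated assumptions: each planted bicluster uses r distinct rows and s distinct columns drawn uniformly at random, planted biclusters may overlap (overlapping entries are combined with a logical OR), and the `seed` parameter and function names are hypothetical. The score function simply evaluates the harmonic-mean formula above on two given rates.

```python
import numpy as np

def synthetic_matrix(m, n, k, r, s, p, seed=0):
    """Plant k random r x s biclusters in an m x n zero matrix.
    Each planted entry is 1 with probability 1 - p."""
    rng = np.random.default_rng(seed)
    M = np.zeros((m, n), dtype=np.int8)
    truth = []
    for _ in range(k):
        I = rng.choice(m, size=r, replace=False)   # r distinct rows
        J = rng.choice(n, size=s, replace=False)   # s distinct columns
        block = (rng.random((r, s)) < 1 - p).astype(np.int8)
        M[np.ix_(I, J)] |= block                   # ASSUMPTION: overlaps are OR-ed
        truth.append((np.sort(I), np.sort(J)))
    return M, truth

def f1_score(a, b):
    """Harmonic mean 2ab / (a + b) of the two rates being compared."""
    return 2 * a * b / (a + b) if a + b else 0.0

M, truth = synthetic_matrix(m=50, n=40, k=1, r=5, s=4, p=0.0)
```

With p = 0 and a single planted bicluster, every planted entry is 1, so the matrix contains exactly r * s = 20 ones.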
Experiments: Real-World Data
• Matrices from domains of finite element modeling and
quantum chemistry [src: NIST Matrix Market repository]
Columns: Dataset | Original Matrix | CrossAssociation | CC-MACS ($w^2/s$) | CC-MACS ($w^3/(as)$) | CC-MACS ($w^4/(a^2 s)$)
Concluding Thoughts
• The CC-MACS algorithm runs in 𝑂(𝑚𝑛 log 𝑚𝑛) time.
• Our approach compared favorably to state-of-the-art
and baseline methods for a classification task on
synthetic data.
• Choice of metric can affect quality and granularity of
results; different metrics may be appropriate for
different applications.
• The CC-MACS algorithm effectively identified large,
dense biclusters in the datasets evaluated.
Acknowledgements/Disclaimer
• This research was supported by the Intelligence Advanced
Research Projects Activity (IARPA) via Air Force Research
Laboratory (AFRL) contract number FA8650-10-C-706. The
U.S. Government is authorized to reproduce and distribute
reprints for Governmental purposes notwithstanding any
copyright annotation thereon. The views and conclusions
contained herein are those of the authors and should not be
interpreted as necessarily representing the official policies or
endorsements, either expressed or implied, of IARPA, AFRL,
or the U.S. Government.
• Any misinformation, mistakes, or misunderstanding resulting
from this talk are solely the fault of the speaker.
Example Matrices
• Spectral methods, which try to rearrange rows and
columns to form a diagonal block matrix, would not
perform well on this matrix.
• The dashed lines suggest a good co-clustering.