From Context to Distance: Learning Dissimilarity for Categorical Data Clustering
Presenter : JIAN-REN CHEN
Authors: DINO IENCO, RUGGERO G. PENSA, and ROSA MEO
2012, ACM TKDD
Intelligent Database Systems Lab
Outline
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments
Motivation
• Clustering data described by categorical attributes is a challenging task in data mining applications.
• It is difficult to define a distance between pairs of values of a categorical attribute, since the values are not ordered.
Objectives
• We present a new methodology to compute a context-based distance between values of a categorical variable.
• We apply this technique to hierarchical clustering of categorical data.
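As an illustration of the second objective, here is a minimal sketch of agglomerative (hierarchical) clustering driven by a precomputed distance matrix. The slides do not say which linkage is used, so average linkage is an assumption, and the function name `agglomerative` and the toy matrix `dist` are hypothetical.

```python
def agglomerative(dist, n_clusters):
    """Average-linkage agglomerative clustering on a full distance matrix.

    dist[i][j] is the distance between records i and j. Returns a list of
    clusters, each a sorted list of record indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge it.
        _, p, q = min(
            (linkage(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        clusters[p] = sorted(clusters[p] + clusters[q])
        del clusters[q]
    return clusters


# Toy data (hypothetical): four records placed at 0, 1, 10 and 11 on a line.
points = [0, 1, 10, 11]
dist = [[abs(a - b) for b in points] for a in points]
clusters = agglomerative(dist, 2)
print(clusters)  # [[0, 1], [2, 3]]
```

Any record-level distance can fill `dist`; with a learned distance between categorical values, the matrix would be built from those value distances rather than from numeric differences.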
Methodology - Framework
DILCA (DIstance Learning for Categorical Attributes)
1. Select a suitable context, using either:
(i) a parametric method
(ii) a fully automatic method
2. Compute the distance between any pair of values of a specific categorical attribute.
Methodology - Context Selection
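The content of this slide is not preserved in the transcript. The sketch below illustrates one way the parametric method could work, under two assumptions that are not stated here: that the relevance of an attribute X to the target attribute Y is scored by symmetric uncertainty, SU = 2 (H(Y) - H(Y|X)) / (H(Y) + H(X)), and that σ scales a mean-relevance threshold. All function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(target, given):
    """H(target | given) for two equally long lists of categorical values."""
    n = len(target)
    groups = {}
    for t, g in zip(target, given):
        groups.setdefault(g, []).append(t)
    return sum(len(ts) / n * entropy(ts) for ts in groups.values())


def symmetric_uncertainty(target, other):
    """SU = 2 * (H(Y) - H(Y|X)) / (H(Y) + H(X)), a value in [0, 1]."""
    h_y, h_x = entropy(target), entropy(other)
    if h_y + h_x == 0:
        return 0.0
    return 2.0 * (h_y - conditional_entropy(target, other)) / (h_y + h_x)


def select_context(data, target, sigma=1.0):
    """Keep attributes whose SU with `target` is at least sigma * mean SU.

    `data` maps attribute name -> list of values (one entry per record).
    """
    scores = {
        name: symmetric_uncertainty(data[target], column)
        for name, column in data.items()
        if name != target
    }
    threshold = sigma * sum(scores.values()) / len(scores)
    return [name for name, su in scores.items() if su >= threshold]


# Toy data (hypothetical): "country" is fully determined by "city",
# while "sex" is nearly independent of it.
data = {
    "city":    ["A", "A", "B", "B", "A", "B"],
    "country": ["X", "X", "Y", "Y", "X", "Y"],
    "sex":     ["m", "f", "m", "f", "f", "m"],
}
context = select_context(data, "city", sigma=1.0)
print(context)  # ['country']
```

Under this assumed thresholding rule, a larger σ yields a smaller, more selective context, and σ = 0 keeps every other attribute.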
Methodology - Distance Computation
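The formula on this slide is not preserved in the transcript. The sketch below assumes a distance of the following form: for two values yi and yj of the target attribute, take the squared differences of the conditional probabilities P(yi | x) and P(yj | x) over every value x of every context attribute, divide the sum by the total number of context values, and take the square root. The function name `value_distance` and the toy data are hypothetical.

```python
from collections import Counter
from math import sqrt


def value_distance(data, target, context, yi, yj):
    """Context-based distance between two values yi, yj of `target`.

    `data` maps attribute name -> list of values (one entry per record);
    `context` is a non-empty list of attribute names other than `target`.
    """
    if not context:
        raise ValueError("context must contain at least one attribute")
    squared_sum = 0.0
    n_context_values = 0
    for name in context:
        pair_counts = Counter(zip(data[name], data[target]))
        x_counts = Counter(data[name])
        for x, n_x in x_counts.items():
            p_i = pair_counts[(x, yi)] / n_x  # P(yi | x)
            p_j = pair_counts[(x, yj)] / n_x  # P(yj | x)
            squared_sum += (p_i - p_j) ** 2
        n_context_values += len(x_counts)
    return sqrt(squared_sum / n_context_values)


# Toy data (hypothetical): cities A and B share a country, C does not.
data = {
    "city":    ["A", "A", "B", "B", "C", "C"],
    "country": ["X", "X", "X", "X", "Y", "Y"],
}
d_ab = value_distance(data, "city", ["country"], "A", "B")
d_ac = value_distance(data, "city", ["country"], "A", "C")
print(d_ab, d_ac)
```

In this toy data A and B are distributed identically across the context attribute, so their distance is zero, while A and C never co-occur with the same country and end up far apart.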
Experiments - Datasets
Experiments - Purity, NMI, ARI
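The result tables for these slides are not preserved in the transcript. For reference, the three evaluation measures named in the title can be sketched as follows. NMI has several normalizations; the geometric-mean variant used here is an assumption, as are the function names and the toy labels.

```python
from collections import Counter
from math import comb, log, sqrt


def purity(labels_true, labels_pred):
    """Fraction of records that belong to the majority class of their cluster."""
    clusters = {}
    for t, p in zip(labels_true, labels_pred):
        clusters.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(labels_true)


def _entropy(labels):
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(labels_true, labels_pred):
    """Mutual information normalized by sqrt(H(true) * H(pred))."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    ct, cp = Counter(labels_true), Counter(labels_pred)
    mi = sum(
        (nij / n) * log(nij * n / (ct[t] * cp[p]))
        for (t, p), nij in joint.items()
    )
    h_t, h_p = _entropy(labels_true), _entropy(labels_pred)
    if h_t == 0 or h_p == 0:
        # Degenerate case: at least one labeling has a single group.
        return 1.0 if h_t == h_p else 0.0
    return mi / sqrt(h_t * h_p)


def adjusted_rand_index(labels_true, labels_pred):
    """Rand index corrected for chance agreement (Hubert-Arabie form)."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0
    return (sum_joint - expected) / (max_index - expected)


# Toy labels (hypothetical): one record is assigned to the wrong cluster.
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1]
print(purity(y_true, y_pred), nmi(y_true, y_pred),
      adjusted_rand_index(y_true, y_pred))
```

All three measures equal 1 for a clustering that matches the reference classes up to a relabeling of cluster ids.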
Experiments - Impact of σ on DILCA_M
Experiments - Scalability
Conclusions
• DILCA is competitive with state-of-the-art categorical data clustering approaches.
• DILCA is scalable and has a low impact on the overall computational time of a clustering task.
Comments
• Advantages
– scalable, with a low impact on overall computational time
• Applications
– computing a context-based distance between values of a categorical variable
– hierarchical clustering of categorical data