From Context to Distance: Learning Dissimilarity for Categorical Data Clustering
Presenter: JIAN-REN CHEN
Authors: DINO IENCO, RUGGERO G. PENSA, and ROSA MEO
2012, ACMKDD
Intelligent Database Systems Lab

Outlines
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments

Motivation
• Clustering data described by categorical attributes is a challenging task in data mining applications.
• It is difficult to define a distance between pairs of values of a categorical attribute, since the values are not ordered.

Objectives
• We present a new methodology to compute a context-based distance between values of a categorical variable.
  - We apply this technique to hierarchical clustering of categorical data.

Methodology - Framework
DILCA (DIstance Learning for Categorical Attributes)
1. Selection of a suitable context:
   (i) a parametric method
   (ii) a fully automatic one
2. Computation of the distance between any pair of values of a specific categorical attribute

Methodology - Context Selection

Methodology - Distance Computation

Experiments - Datasets

Experiments - Purity, NMI, ARI

Experiments - Impact of σ on DILCA_M

Experiments - Scalability

Conclusions
• DILCA is competitive with state-of-the-art approaches to categorical data clustering.
• DILCA is scalable and has a low impact on the overall computational time of a clustering task.

Comments
• Advantages
  – scalability, low computational time
• Applications
  – a context-based distance between values of a categorical variable
  – hierarchical clustering of categorical data
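
The distance computation step of the framework (step 2) can be illustrated with a minimal Python sketch. The slides do not reproduce the formula, so the sketch assumes the distance between two values y_i and y_j of a target attribute Y is the normalized Euclidean distance between their conditional probabilities P(y | x), taken over every value x of every context attribute. The function name `context_distance`, the row-as-dict representation, and the toy data are hypothetical, and the context is passed in by hand rather than selected automatically.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def context_distance(rows, target, context):
    """Distance between every pair of values of `target`.

    Assumed formula (not taken from the slides):
        d(a, b) = sqrt( sum_X sum_{x in X} (P(a|x) - P(b|x))^2 / sum_X |X| )
    where X ranges over the context attributes.
    """
    target_values = sorted({row[target] for row in rows})
    # Accumulated squared differences, one entry per unordered pair of values.
    squared = {pair: 0.0 for pair in combinations(target_values, 2)}
    n_context_values = 0  # sum of |X| over the context attributes

    for col in context:
        joint = Counter((row[col], row[target]) for row in rows)
        marginal = Counter(row[col] for row in rows)
        n_context_values += len(marginal)
        for x, count_x in marginal.items():
            for a, b in squared:
                p_a = joint[(x, a)] / count_x  # P(a | x)
                p_b = joint[(x, b)] / count_x  # P(b | x)
                squared[(a, b)] += (p_a - p_b) ** 2

    return {pair: sqrt(total / n_context_values)
            for pair, total in squared.items()}


# Hypothetical toy data: "country" serves as the context of "city".
rows = [
    {"city": "Paris", "country": "FR"},
    {"city": "Lyon", "country": "FR"},
    {"city": "Rome", "country": "IT"},
    {"city": "Milan", "country": "IT"},
]
distances = context_distance(rows, target="city", context=["country"])
print(distances[("Lyon", "Paris")])  # 0.0: both cities occur with the same country
print(distances[("Paris", "Rome")])  # 0.5: the cities occur with different countries
```

Values that co-occur with the same context values end up close together even though the values themselves are unordered, which is the property the Motivation slide asks for. The resulting pairwise distances could then be fed to a hierarchical clustering routine.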