Data Clustering - An Overview and Issues in Clustering Multiple
Heterogeneous Datasets – by Mr. Mahmood Hossain
Clustering is a well-studied data mining problem that has found
applications in many areas. Cluster analysis is the process of
categorizing data into subsets that have meaning in the context of a
particular problem. For example, clustering can be applied to a
document collection to reveal which documents are about the same topic.
The objective in any clustering application is to minimize inter-cluster
similarity and maximize intra-cluster similarity, so that objects in the
same cluster resemble one another more than they resemble objects in
other clusters. Many clustering algorithms exist, and each may or may
not be suited to a particular application.
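
The objective stated above can be made concrete with a small sketch. It assumes documents are represented as term-count vectors and that similarity is measured by cosine similarity; both are illustrative choices and are not specified in the abstract. The function names (`cosine`, `cluster_similarities`) and the toy data are hypothetical.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean(values):
    """Arithmetic mean, defined as 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def cluster_similarities(vectors, labels):
    """Return (mean intra-cluster similarity, mean inter-cluster similarity).

    Every pair of objects is compared once; a pair counts as intra-cluster
    when both objects carry the same label, and inter-cluster otherwise.
    """
    intra, inter = [], []
    for i, j in combinations(range(len(vectors)), 2):
        sim = cosine(vectors[i], vectors[j])
        (intra if labels[i] == labels[j] else inter).append(sim)
    return mean(intra), mean(inter)


# Toy term-count vectors (assumed data): two documents per topic.
docs = [[2, 1, 0, 0], [3, 1, 0, 0], [0, 0, 1, 2], [0, 0, 2, 3]]

# A clustering that groups documents by topic, and one that mixes topics.
intra_good, inter_good = cluster_similarities(docs, [0, 0, 1, 1])
intra_bad, inter_bad = cluster_similarities(docs, [0, 1, 0, 1])

print("by topic:", intra_good, inter_good)
print("mixed:   ", intra_bad, inter_bad)
```

Under these assumptions, the clustering that groups documents by topic has high intra-cluster similarity and low inter-cluster similarity, while the mixed clustering shows the reverse. A clustering algorithm can be viewed as a search for label assignments of the first kind.
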
The traditional clustering paradigm pertains to a single dataset.
Recently, attention has been drawn to the problem of clustering
multiple heterogeneous datasets: datasets that are related but may
describe different types of objects, with attributes that differ
significantly from one dataset to another. Clustering related but
different object sets together may reveal significant information that
cannot be obtained by clustering a single dataset.
This talk will focus on an introductory overview of clustering
algorithms and issues in clustering heterogeneous datasets. Some
preliminary results of clustering multiple contextually related
heterogeneous datasets from the document clustering domain will also be
presented.