Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Content and material: Clustering Methods Spring 2007 1. Introduction - Applications: cluster analysis, quantization, data reduction, pattern matching. - Cluster software. - K-means algorithm - Clustering problem definitions. 2. Algorithms - Hierarchical algorithms: Divisive, Agglomerative, MST-based. - Iterative algorithms - Self-organizing map (SOM) - Optimization techniques (local search, GA, Tabu search) - Optimal clustering (Branch-and-Bound, DP for 1-d case) - Projection-based (Cleju’s thesis, Mantao’s paper) - Comparison (Tomi+Marko manuscript, PR paper) 3. Number of clusters / cluster validity - DBI, F-ratio… - Stochastic complexity (Mantao’s PRL paper) 4. Preprocessing and metrics - Distance metrics and distortion functions - 0/1 normalization, RCMS (??) - Outlier detection / removal (ICPR paper, SCIA paper) - Missing values: recognition and fixing. - Binary data, categorical data, non-attributive data 5. Search techqniques / nearest neighbor search - Partial distortion search - Projection-based search (MPS, PCA, some other projection axis?) - Neighborhood graph 6. Visualization - Graph-based - Circle (?) - Projection and sorting - SOM 7. Data dimensionality - Estimation of dimensionality - Dimensional reduction - Sphere packing / kissing number problem 8. Other models - Fuzzy clustering - Gaussian Mixture model - Image segmentation - Spatial clustering Applications: Data compression Text mining Speaker recognition Customer relationship management Shopping basquet analysis Business intelligence Oracle http://www.oracle.com/technology/products/bi/odm/index.html Case example: Amazon (find similar books!) Material: Cluster-software, GhostMiner, Oracle(?) Pseuco codes Data: S sets, Dim, Birch, Image Data: UCI, Delve dataset SA hakemus, Singapore-esitelmä, Articles / Pointers SIPU papers + web material Theodoridis and Koutroumbas, Pattern Recognition, Academic Press, 3rd ed., 2006. http://cgi.di.uoa.gr/~stpatrec/welcome3d.html Christopher Bishop, Pattern Recognition and Machine Learning, Springer, 2006. Timo Kaukoranta, PhD thesis, TUCS, 1999. Juha Kivijärvi, PhD thesis, TUCS, 2004. Mantao Xu, PhD thesis, Joensuu, 2005. Ismo Kärkkäinen, PhD thesis, Joensuu, 2006. Jarkko Venna, PhD thesis, Tech.Univ. of Helsinki, 2007 (to appear) Sami Äyrämö, PhD thesis, Knowledge Mining Using Robust Clustering, J-kylä, 2006. http://dissertations.jyu.fi/studcomp/9513926559.pdf Data: http://www.ics.uci.edu/~mlearn/MLRepository.html Data: http://www.cs.toronto.edu/~delve/data/datasets.html Data: http://kdd.ics.uci.edu/summary.data.type.html Data: http://lib.stat.cmu.edu/ Data: http://cs.joensuu.fi/pages/franti/ts/