Lecture 20 Clustering (1): K-means Algorithm

Transcript
Intro. ANN & Fuzzy Systems
Lecture 20: Clustering (1)
(C) 2001 by Yu Hen Hu
Outline
• Unsupervised learning (competitive learning) and clustering
• K-means clustering algorithm
Unsupervised Learning
• Data mining
  – Understand the internal/hidden structure of the data distribution
• Labeling (target value, teaching input) cost is high
  – A large number of feature vectors must be labeled
  – Sampling may involve costly experiments
  – Data labels may not be available at all
• Pre-processing for classification
  – Features within the same cluster are similar, and
  – often belong to the same class
Competitive Learning
• A form of unsupervised learning.
• Neurons compete against each other with their activation values. The winner(s) reserve the privilege to update their weights; the losers may even be punished by having their weights updated in the opposite direction.
• Competitive vs. cooperative learning:
  – Competitive: only one neuron's activation can be reinforced.
  – Cooperative: several neurons' activations can be reinforced.
Competitive Learning Rule
• A neuron WINS the competition if its output is the largest among all neurons for the same input x(n).
• The weights of the winning (k-th) neuron are adjusted by

      \Delta w_k(n) = \eta \, [x(n) - w_k(n)]

  where \eta is the learning rate. The positions of the losing neurons remain unchanged.
• If the weights of a neuron represent its POSITION, and the output of a neuron is inversely proportional to the distance between x(n) and w_k(n), then

      Competitive Learning = CLUSTERING!
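Read this way, the winner-take-all rule takes only a few lines of code. A minimal Python sketch, assuming the winner is the neuron whose weight vector is closest to x(n) (equivalently, the one with the largest output under the inverse-distance reading above) and an assumed learning rate eta; the function and variable names are illustrative, not from the slides:

```python
import numpy as np

def competitive_update(W, x, eta=0.1):
    """W: (c, d) neuron weight vectors (their 'positions'); x: (d,) input sample."""
    # The winner is the neuron closest to x (largest output under the
    # inverse-distance interpretation).
    k = np.argmin(((W - x) ** 2).sum(axis=1))
    # Only the winner moves toward x; the losers stay where they are.
    W[k] += eta * (x - W[k])
    return k
```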
Competitive Learning Example
[Figure: scatter plots of the data and the weight vectors at initialization, after 25 iterations, after 75 iterations, and at the end of 100 iterations; axes span roughly -2 to 2. Generated by learncl1.m.]
What is "Clustering"?
What can we learn from these "unlabeled" data samples?
[Figure: two plots of unlabeled data samples.]
– Structures: some samples are closer to each other than to other samples.
– The closeness between samples is determined using a "similarity measure".
– The number of samples per unit volume is related to the concept of "density" or "distribution".
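The slides leave the similarity measure unspecified; Euclidean distance is one common choice. A tiny Python illustration (the sample values are made up):

```python
import math

def euclidean_distance(a, b):
    """d(a, b) = sqrt(sum_j (a_j - b_j)^2); a smaller value means 'more similar'."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

print(euclidean_distance([0.1, 0.2], [0.4, 0.6]))   # 0.5
```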
Clustering Problem Statement
• Given a set of vectors {x_k; 1 ≤ k ≤ K}, find a set of c cluster centers {w(i); 1 ≤ i ≤ c} such that each x_k is assigned to a cluster, say w(i*), according to a distance (distortion, similarity) measure d(x_k, w(i)), such that the average distortion

      D = \frac{1}{K} \sum_{i=1}^{c} \sum_{k=1}^{K} I(x_k, i) \, d(x_k, w(i))

  is minimized.
• I(x_k, i) = 1 if x_k is assigned to cluster i with cluster center w(i), and = 0 otherwise (the indicator function).
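As a concrete reading of the formula, here is a short Python sketch that computes D, assuming squared Euclidean distance for d(.,.) (the same squared-error form used in the numerical example later); the names are mine, not the lecture's:

```python
import numpy as np

def average_distortion(X, W, labels):
    """X: (K, d) samples; W: (c, d) cluster centers; labels[k] is the i with I(x_k, i) = 1."""
    d = ((X - W[labels]) ** 2).sum(axis=1)   # d(x_k, w(i)) for the assigned cluster only
    return d.mean()                          # the 1/K factor makes it an average
```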
k-means Clustering Algorithm
Initialization: initial cluster centers w(i), 1 ≤ i ≤ c; D(-1) = 0; I(x_k, i) = 0 for 1 ≤ i ≤ c, 1 ≤ k ≤ K.
Repeat
(A) Assign cluster membership (Expectation step):
    Evaluate d(x_k, w(i)) for 1 ≤ i ≤ c, 1 ≤ k ≤ K;
    I(x_k, i) = 1 if d(x_k, w(i)) < d(x_k, w(j)) for all j ≠ i, and = 0 otherwise, for 1 ≤ k ≤ K.
(B) Evaluate the distortion:

      D(iter) = \sum_{i=1}^{c} \sum_{k=1}^{K} I(x_k, i) \, d(x_k, w(i))

(C) Update the code words (cluster centers) according to the new assignment (Maximization step):

      w(i) = \frac{1}{N_i} \sum_{k=1}^{K} I(x_k, i) \, x_k, \qquad N_i = \sum_{k=1}^{K} I(x_k, i), \qquad 1 ≤ i ≤ c

(D) Check for convergence:
    if |1 - D(iter-1)/D(iter)| < ε, then convergent = TRUE.
Until convergent.
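The loop above maps directly onto code. Below is a minimal Python sketch, assuming squared Euclidean distance for d(.,.); the function name kmeans and its signature are my own, not from the lecture (the slides point to MATLAB demos instead):

```python
import numpy as np

def kmeans(X, W, eps=1e-6, max_iter=100):
    """X: (K, d) data matrix; W: (c, d) initial cluster centers (float array, updated in place)."""
    D_prev = None
    for _ in range(max_iter):
        # (A) Expectation step: assign each x_k to its nearest center.
        dist = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)   # (K, c) squared distances
        labels = dist.argmin(axis=1)
        # (B) Distortion of the current assignment.
        D = dist[np.arange(len(X)), labels].sum()
        # (C) Maximization step: move each center to the mean of its members.
        for i in range(W.shape[0]):
            members = X[labels == i]
            if len(members) > 0:            # leave empty clusters where they are
                W[i] = members.mean(axis=0)
        # (D) Convergence test on the relative change in distortion.
        if D_prev is not None and D > 0 and abs(1 - D_prev / D) < eps:
            break
        D_prev = D
    return W, labels, D
```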
A Numerical Example
x = {-1, -2, 0, 2, 3, 4}, W = {2.1, 2.3}
1. Assign membership:
   2.1: {-1, -2, 0, 2}
   2.3: {3, 4}
2. Distortion:
   D = (-1-2.1)² + (-2-2.1)² + (0-2.1)² + (2-2.1)² + (3-2.3)² + (4-2.3)² = 34.22
3. Update W to minimize the distortion:
   w1 = (-1-2+0+2)/4 = -0.25
   w2 = (3+4)/2 = 3.5
4. Reassign membership:
   -0.25: {-1, -2, 0}
   3.5: {2, 3, 4}
5. Update W:
   w1 = (-1-2+0)/3 = -1
   w2 = (2+3+4)/3 = 3
Converged.
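The same passes can be checked in a few lines of plain Python (1-D data, squared-error distortion; variable names are illustrative, and every cluster stays non-empty for this data):

```python
x = [-1.0, -2.0, 0.0, 2.0, 3.0, 4.0]
w = [2.1, 2.3]

for step in range(3):                    # the third pass just confirms convergence
    # (A) assign each sample to its nearest center
    labels = [min(range(len(w)), key=lambda i: (s - w[i]) ** 2) for s in x]
    # (B) squared-error distortion of this assignment
    D = sum((s - w[labels[k]]) ** 2 for k, s in enumerate(x))
    # (C) move each center to the mean of its members
    for i in range(len(w)):
        members = [s for k, s in enumerate(x) if labels[k] == i]
        w[i] = sum(members) / len(members)
    print(step, labels, round(D, 2), w)

# pass 0: memberships {-1,-2,0,2} / {3,4}, D = 34.22, centers -> [-0.25, 3.5]
# pass 1: memberships {-1,-2,0} / {2,3,4},            centers -> [-1.0, 3.0]
```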
Kmeans Algorithm Demonstration
[Figure: two scatter plots showing the data samples, the true cluster centers, the converged centers, and the cluster boundary. Generated by Clusterdemo.m.]