Non Parametric Methods Pattern Recognition and Machine Learning Debrup Chakraborty Nearest Neighbor classification Given: Given a labeled sample of n feature vectors ( call X) A distance measure (say the Euclidian Distance) To find: The class label of a given feature vector x which is not in X Nearest Neighbor classification (contd.) The NN rule: Find the point y in X which is nearest to x Assign the label of y to x Nearest Neighbor classification (contd.) This rule allows us to partition the feature space into cells consisting of all points closer to a given training point x All points in such cells are labeled by the class of the training point. This partitioning is called a Voronoi Tesselation Nearest Neighbor classification (contd.) Voronoi Cells in 2d Nearest Neighbor classification Complexity of the NN rule Distance calculation Finding the minimum distance Nearest Neighbor classification Nearest Neighbor Editing X= Data set, n= no of training points, j=0 Construct the full Voronoi diagram for X Do j=j+1, for each point xj in X find Voronoi neighbors of xj If any neighbor is not from the same class as xj then mark xj Until j==n Discard all points that are not marked. k nearest neighbor classification Given: Given a labeled sample of N feature vectors ( call X) A distance measure (say the Euclidian Distance) An integer k (generally odd) To find: The class label of a given feature vector x which is not in X k-NN classification (contd.) Algorithm: Find out the k nearest neighbors of x in X Call them x1 , x2 ,..., xk Out of the k samples, let ki of them belong to class ci . Choose that ci to be the class of x for which ki is maximum K-nn Classification Class 1 Class 2 z Class 3 k-NN classification (contd.) Distance weighted nearest neighbor xi , f ( xi ) n i 1 Training set In case x=xi, return f(xi) Given an instance x to be classified Let x1 , x2 ,..., xk be the nearest neighbors of x Return k ^ f ( x) arg max wi (c, f ( xi )) cC 1 if a b ( a, b) 0 otherwise i 1 wi 1 x xi 2 Remarks on k-NN classification •The distance weighted kNN is robust to noisy training data and is quite effective when it is provided a sufficiently large set of training examples. •One drawbak of kNN method is that, it defers all computation till a new querry point is presented. Various methods have been developed to index the training examples so that the nearest neighbor can be found with less search time. One such indexing method is kd-tree developed by Bently 1975 •kNN is a lazy learner Locally Weighted Regression In the linear regression problem, to find h(x) at a point x we would do the following: 1. Minimize 2. Output Tx 1 n T 2 ( x y ) i i 2 i 1 Locally Weighted Regression In the llocally weighted regression problem we would do the following 1. Minimize 2. Output 1 n T 2 w ( x y ) i i i 2 i 1 Tx A standard choice of weights is xi x 2 wi exp 2 2 is called the bandwidth parameter Clustering Is different from Classification Classification is partitioning the feature space whereas Clustering is partitioning the data into “homogeneous groups” Clustering is Unsupervised!! K-means Clustering Given: A data set X x1 , x2 ,..., xn R p Fix the number of clusters K Let zk represent the i-th cluster center i (prototype) at the k-th iteration k S Let j represent the j-th cluster at the k-th iteration K-means Clustering Steps 1. Choose the initial cluster centers Z 1 z11 , z21 ,..., z1K 2. At the k-th iterative step distribute the points in X in K cluster using: k xi S j if k xi z j k xi zl l 1,2,..., K 3. Compute zkj 1 1 x S kj xS jk 1 k k 1 z z th then the procedure has 4. If j j K j converged else repeat from 2.