Classification Problem

• Given training data $\{(\mathbf{x}_n, y_n)\}_{n=1}^{N}$, with $\mathbf{x}_n \in \mathbb{R}^q$ and $y_n \in \{-, +\}$.
[Figure: positive and negative training points in the $(x_1, x_2)$ plane, with a query point $\mathbf{x}_0$.]
• Predict the class label of a given query $\mathbf{x}_0$.
• The pairs are drawn from an unknown probability distribution $P(\mathbf{x}, y)$.
• We need to estimate $P(+ \mid \mathbf{x}_0) = f_+(\mathbf{x}_0)$ and $P(- \mid \mathbf{x}_0) = f_-(\mathbf{x}_0)$.

The Bayesian Classifier

• Loss function $\lambda(j \mid k)$: the loss incurred by predicting class $j$ when the true class is $k$.
• Expected loss (conditional risk) associated with class $j$: $R(j \mid \mathbf{x}) = \sum_{k=1}^{J} \lambda(j \mid k)\, P(k \mid \mathbf{x})$.
• Bayes rule: $j^* = \arg\min_{1 \le j \le J} R(j \mid \mathbf{x})$.
• Zero-one loss function: $\lambda(j \mid k) = 0$ if $j = k$, and $\lambda(j \mid k) = 1$ if $j \ne k$. Under this loss $R(j \mid \mathbf{x}) = 1 - P(j \mid \mathbf{x})$, so the Bayes rule becomes $j^* = \arg\max_{1 \le j \le J} P(j \mid \mathbf{x})$.
• The Bayes rule achieves the minimum error rate.
• How do we estimate the posterior probabilities $\{P(j \mid \mathbf{x})\}_{j=1}^{J}$? The plug-in classifier is $\hat{j}(\mathbf{x}) = \arg\max_{1 \le j \le J} \hat{P}(j \mid \mathbf{x})$.

Density Estimation

• Use Bayes' theorem to estimate the posterior probability values: $P(j \mid \mathbf{x}) = \dfrac{p(\mathbf{x} \mid j)\, P(j)}{\sum_{k=1}^{J} p(\mathbf{x} \mid k)\, P(k)}$.
• $p(\mathbf{x} \mid j)$ is the probability density function of $\mathbf{x}$ given class $j$; $P(j)$ is the prior probability of class $j$.

Naïve Bayes Classifier

• Makes the assumption that the features are independent given the class: $p(\mathbf{x} \mid j) = p(x_1, x_2, \ldots, x_q \mid j) = \prod_{i=1}^{q} p(x_i \mid j)$.
• The task of estimating one $q$-dimensional density function is reduced to estimating $q$ one-dimensional density functions, so the complexity of the task is drastically reduced.
• The application of Bayes' theorem becomes much simpler.
• Proven to be effective in practice (see the first code sketch at the end of this section).

Nearest-Neighbor Methods

• Predict the class label of $\mathbf{x}_0$ as the most frequent one occurring among its $K$ nearest neighbors (see the second code sketch at the end of this section).
[Figure: positive and negative training points in the $(x_1, x_2)$ plane, with nearest neighborhoods of increasing size drawn around the query point $\mathbf{x}_0$.]
• Basic assumption: $f_+(\mathbf{x} + \Delta\mathbf{x}) \approx f_+(\mathbf{x})$ and $f_-(\mathbf{x} + \Delta\mathbf{x}) \approx f_-(\mathbf{x})$ for small $\Delta\mathbf{x}$.

Example: Letter Recognition

• Features of each letter image include the first statistical moment, ..., the edge count.

Asymptotic Properties of K-NN Methods

• $\lim_{N \to \infty} \hat{f}_j(\mathbf{x}) = f_j(\mathbf{x})$ if $\lim_{N \to \infty} K = \infty$ and $\lim_{N \to \infty} K/N = 0$.
• The first condition reduces the variance by making the estimate independent of the accidental characteristics of the $K$ nearest neighbors.
• The second condition reduces the bias by ensuring that the $K$ nearest neighbors are arbitrarily close to the query point.
• $\lim_{N \to \infty} E_1 \le 2 E^*$, where $E_1$ is the classification error rate of the 1-NN rule and $E^*$ is the classification error rate of the Bayes rule.
• In the asymptotic limit, no decision rule is more than twice as accurate as the 1-NN rule.

Finite-Sample Settings

• How well does the 1-NN rule work in finite-sample settings?
• If the number of training points $N$ is large and the number of input features $q$ is small, the asymptotic results may still be valid.
• However, for a moderate to large number of input variables, the sample size required for their validity is beyond feasibility.
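To make the Naïve Bayes slide concrete, here is a minimal sketch (not from the slides) of a Gaussian Naïve Bayes classifier: each class-conditional density $p(x_i \mid j)$ is modeled as a univariate Gaussian, an assumption I make for illustration, and prediction applies the arg-max-posterior Bayes rule. Class and attribute names are illustrative.

```python
import numpy as np

class GaussianNaiveBayes:
    """Per class j, model p(x | j) = prod_i p(x_i | j) with one univariate Gaussian per feature."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.priors_ = np.array([np.mean(y == c) for c in self.classes_])            # P(j)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])      # per-feature means
        self.vars_ = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes_]) # per-feature variances
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        log_post = []
        for prior, mu, var in zip(self.priors_, self.means_, self.vars_):
            # log p(x | j) under the independence assumption, plus log P(j)
            log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mu) ** 2 / var, axis=1)
            log_post.append(np.log(prior) + log_lik)
        # arg max_j P(j | x): the denominator of Bayes' theorem is the same for every class
        return self.classes_[np.argmax(np.column_stack(log_post), axis=1)]
```

Fitting on $\{(\mathbf{x}_n, y_n)\}$ and calling predict on the queries implements $\hat{j}(\mathbf{x}) = \arg\max_j \hat{P}(j \mid \mathbf{x})$ with $q$ one-dimensional density estimates per class instead of one $q$-dimensional estimate.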
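A $K$-nearest-neighbor prediction for a single query can likewise be sketched in a few lines. The function name and the choice of Euclidean distance are assumptions for illustration; Euclidean distance is the one used in the curse-of-dimensionality discussion that follows.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x0, k=5):
    """Predict the label of query x0 as the most frequent label among its k nearest neighbors."""
    X_train, y_train = np.asarray(X_train), np.asarray(y_train)
    dists = np.linalg.norm(X_train - x0, axis=1)           # Euclidean distances to all training points
    nearest = np.argsort(dists)[:k]                        # indices of the k closest training points
    return Counter(y_train[nearest]).most_common(1)[0][0]  # majority vote over their labels
```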
Curse of Dimensionality

• This phenomenon is known as the curse of dimensionality.
• It refers to the fact that in high-dimensional spaces data become extremely sparse and far apart from each other.
• It affects any estimation problem with high dimensionality (see the simulation sketch at the end of this section).
[Figure: distribution of the ratio DMAX/DMIN as a function of the dimensionality, for a sample of size N = 500 uniformly distributed in $[0, 1]^q$.]
• The distribution of the ratio DMAX/DMIN converges to 1 as the dimensionality increases.
[Figure: variance of the distances from a given point as a function of the dimensionality.]
• The variance of the distances from a given point converges to 0 as the dimensionality increases.
[Figure: distance values from a given point; the values flatten out as the dimensionality increases.]

Computing Radii of Nearest Neighborhoods

[Figure: median radius of a nearest neighborhood as a function of the dimensionality, for a uniform distribution in the unit cube $[-0.5, 0.5]^q$.]

Curse of Dimensionality

• As the dimensionality increases, the distance to the closest point increases faster.
• Random sample of size $N$ drawn from the uniform distribution in the $q$-dimensional unit hypercube.
• Diameter of a $K = 1$ neighborhood using Euclidean distance: $d(q, N) = O(N^{-1/q})$ (the table below is reproduced by the sketch at the end of this section).

q    N        d(q,N)
4    100      0.42
4    1,000    0.23
6    100      0.71
6    1,000    0.48
10   1,000    0.91
10   10,000   0.72
20   10,000   1.51
20   10^6     1.20
20   10^10    0.76

• A large $d(q, N)$ leads to highly biased estimations.

Curse of Dimensionality

• It is a serious problem in many real-world applications:
• Microarray data: 3,000-4,000 genes;
• Documents: 10,000-20,000 words in the dictionary;
• Images, face recognition, etc.
• How can we deal with the curse of dimensionality?
• Example sample covariance matrix: $\begin{pmatrix} 7.68 & 92.2 \\ 92.2 & 1912.5 \end{pmatrix}$.
• Mean vector: $\boldsymbol{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{x}^i$, with $\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$.
• Covariance matrix: $E\left[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T\right] = E\begin{pmatrix} (x_1-\mu_1)^2 & (x_1-\mu_1)(x_2-\mu_2) \\ (x_1-\mu_1)(x_2-\mu_2) & (x_2-\mu_2)^2 \end{pmatrix} = \begin{pmatrix} \frac{1}{N}\sum_{i=1}^{N}(x_1^i-\mu_1)^2 & \frac{1}{N}\sum_{i=1}^{N}(x_1^i-\mu_1)(x_2^i-\mu_2) \\ \frac{1}{N}\sum_{i=1}^{N}(x_1^i-\mu_1)(x_2^i-\mu_2) & \frac{1}{N}\sum_{i=1}^{N}(x_2^i-\mu_2)^2 \end{pmatrix}$.
• The diagonal entries are variances; the off-diagonal entries are covariances.
• Further example sample covariance matrices, with different degrees of correlation between $x_1$ and $x_2$: $\begin{pmatrix} 0.94 & 0.93 \\ 0.93 & 1.03 \end{pmatrix}$, $\begin{pmatrix} 0.97 & 0.49 \\ 0.49 & 1.04 \end{pmatrix}$, $\begin{pmatrix} 0.93 & 0.01 \\ 0.01 & 1.05 \end{pmatrix}$, $\begin{pmatrix} 0.99 & 0.5 \\ 0.5 & 1.06 \end{pmatrix}$, $\begin{pmatrix} 1.04 & 1.05 \\ 1.05 & 1.15 \end{pmatrix}$.

Dimensionality Reduction

• Many dimensions are often interdependent (correlated). We can:
• reduce the dimensionality of the problem;
• transform interdependent coordinates into significant and independent ones (see the PCA sketch at the end of this section).
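The qualitative behavior shown in the distance plots above can be checked with a short simulation (a sketch, not from the slides): for points uniform in $[0, 1]^q$, the max-to-min distance ratio from a query point approaches 1 and the relative spread of the distances (variance normalized by the squared mean) shrinks as $q$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500  # sample size used in the slides' plots

for q in (2, 10, 50, 200, 1000):
    X = rng.uniform(0.0, 1.0, size=(N, q))    # N points uniform in [0, 1]^q
    x0 = rng.uniform(0.0, 1.0, size=q)        # a given query point
    d = np.linalg.norm(X - x0, axis=1)        # distances from the query to every point
    print(f"q={q:5d}  Dmax/Dmin={d.max() / d.min():6.2f}  "
          f"relative variance={d.var() / d.mean() ** 2:.5f}")
```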
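The slide itself only states the order $d(q, N) = O(N^{-1/q})$; the tabulated values coincide with the diameter of a $q$-dimensional ball of volume $1/N$ (so that it contains, on average, one of the $N$ uniform points). Treating that as the working definition (an assumption on my part), the table above can be reproduced:

```python
import math

def ball_diameter(q, N):
    """Diameter of the q-dimensional ball of volume 1/N (assumed definition of d(q, N))."""
    # volume of a q-ball of radius r: pi^(q/2) * r^q / Gamma(q/2 + 1)
    r = (math.gamma(q / 2 + 1) / (N * math.pi ** (q / 2))) ** (1.0 / q)
    return 2.0 * r

for q, N in [(4, 100), (4, 1000), (6, 100), (6, 1000), (10, 1000),
             (10, 10000), (20, 10**4), (20, 10**6), (20, 10**10)]:
    print(f"q={q:2d}  N={N:>15,}  d(q,N)={ball_diameter(q, N):.2f}")
```

Running this prints the nine d(q, N) values of the table, in the same row order.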
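The slides stop at the motivation for dimensionality reduction. One standard technique that matches the description "transform interdependent coordinates into significant and independent ones" is principal component analysis (PCA); the following is a minimal sketch under that assumption, built directly on the covariance matrix defined above, with an illustrative function name.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the data onto the leading eigenvectors of its covariance matrix."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)                        # mean vector
    Xc = X - mu
    cov = (Xc.T @ Xc) / X.shape[0]             # covariance matrix E[(x - mu)(x - mu)^T]
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: the covariance matrix is symmetric
    order = np.argsort(eigvals)[::-1]          # directions sorted by decreasing variance
    W = eigvecs[:, order[:n_components]]       # keep the top n_components directions
    return Xc @ W                              # new coordinates: uncorrelated, largest variance first
```

For example, pca_reduce(X, 2) projects $q$-dimensional data onto the two directions of largest variance, and the projected coordinates are uncorrelated with each other.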