ECSE 6610 Pattern Recognition
Professor Qiang Ji
Spring, 2011

Pattern Recognition Overview

[Diagram. Training: raw data → feature extraction → training features + known output values → classifier/regressor. Testing: raw data → feature extraction → features → learned classifier/regressor → output values.]

• Feature extraction: extract the most discriminative features to concisely represent the original data, typically involving dimensionality reduction.
• Training/learning: learn a mapping function that maps input to output.
• Classification/regression: map the input to a discrete output value for classification, and to a continuous output value for regression.

Pattern Recognition Overview (cont'd)

• Supervised learning: both input (features) and output (class labels) are provided.
• Unsupervised learning: only input is given.
  – Clustering
  – Dimensionality reduction
  – Density estimation
• Semi-supervised learning: some inputs have output labels and others do not.

Examples of Pattern Recognition Applications

• Computer/Machine Vision: object recognition, activity recognition, image segmentation, inspection
• Medical Imaging: cell classification
• Optical Character Recognition: machine- or hand-written character/digit recognition
• Brain-Computer Interface: classify human brain states from EEG signals
• Speech Recognition: speaker recognition, speech understanding, language translation
• Robotics: obstacle detection, scene understanding, navigation

[Example slides: Computer Vision Example: Facial Expression Recognition; Machine Vision Example; Example: Handwritten Digit Recognition]

Probability Calculus

• U is the sample space; X is a subset of the outcomes, i.e., an event.
• P(X ∨ Y) = P(X) + P(Y) − P(X ∧ Y); when X and Y are mutually exclusive, P(X ∧ Y) = 0 and this reduces to P(X ∨ Y) = P(X) + P(Y).

Probability Calculus (cont'd)

• Conditional independence: X ⊥ Y | Z means P(X | Y, Z) = P(X | Z), or equivalently P(X, Y | Z) = P(X | Z) P(Y | Z).
• The chain rule: given three events A, B, C,
    P(A, B, C) = P(A | B, C) P(B | C) P(C)

The Rules of Probability

• Sum rule: P(X) = Σ_Y P(X, Y)
• Product rule: P(X, Y) = P(X | Y) P(Y)

Combining the sum and product rules yields
    P(X) = Σ_Y P(X | Y) P(Y)
and the conditional sum rule
    P(C | A) = Σ_B P(C | A, B) P(B | A)

Bayes' Theorem

    posterior ∝ likelihood × prior

Bayes Rule

Since p(A, B) = p(B | A) p(A) = p(A | B) p(B),
    p(A | B) = p(B | A) p(A) / p(B)
For events A_1, ..., A_n that partition the sample space and evidence E:
    p(A_i | E) = p(E | A_i) p(A_i) / p(E) = p(E | A_i) p(A_i) / Σ_i p(E | A_i) p(A_i)

[Diagram: events A1 through A6 partitioning the sample space, with the evidence E overlapping them]

• Based on the definition of conditional probability
• p(A_i | E) is the posterior probability of A_i given evidence E
• p(A_i) is the prior probability
• p(E | A_i) is the likelihood of the evidence given A_i
• p(E) is the probability of the evidence

Bayesian Rule (cont'd)

    P(H | E1, E2) = P(H | E1) P(E2 | E1, H) / P(E2 | E1)
                  = P(H | E1) P(E2 | E1, H) / Σ_H P(H | E1) P(E2 | E1, H)

Assuming E1 and E2 are independent given H, the above equation may be written as
    P(H | E1, E2) = P(H | E1) P(E2 | H) / Σ_H P(H | E1) P(E2 | H)
where P(H | E1) is the prior and P(E2 | H) is the likelihood of H given E2.

A Simple Example

Consider two related variables:
1. Drug (D) with values y or n
2. Test (T) with values +ve or -ve

And suppose we have the following probabilities:
    P(D = y) = 0.001
    P(T = +ve | D = y) = 0.8
    P(T = +ve | D = n) = 0.01

These probabilities are sufficient to define a joint probability distribution.

Suppose an athlete tests positive. What is the probability that he has taken the drug?
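As a numerical check on this example, here is a minimal Python sketch of the same Bayes computation; the script and its variable names are illustrative additions, not part of the original slides:

```python
# Posterior P(D = y | T = +ve) for the drug-test example, via Bayes' rule.
p_d = 0.001               # prior P(D = y)
p_pos_given_d = 0.8       # likelihood P(T = +ve | D = y)
p_pos_given_not_d = 0.01  # likelihood P(T = +ve | D = n)

# Evidence P(T = +ve), by the conditional sum rule over D.
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' rule: posterior = likelihood * prior / evidence.
posterior = p_pos_given_d * p_d / p_pos
print(f"P(D = y | T = +ve) = {posterior:.3f}")  # prints 0.074
```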
    P(D = y | T = +ve) = P(T = +ve | D = y) P(D = y) / [P(T = +ve | D = y) P(D = y) + P(T = +ve | D = n) P(D = n)]
                       = (0.8 × 0.001) / (0.8 × 0.001 + 0.01 × 0.999)
                       ≈ 0.074

Expectation (or Mean)

• For a discrete RV X: E(X) = Σ_x x p(x)
• For a continuous RV X: E(X) = ∫ x p(x) dx
• Conditional expectation: E(X | y) = ∫ x p(x | y) dx

Expectations

• Conditional expectation (discrete): E[f | y] = Σ_x p(x | y) f(x)
• Approximate expectation (discrete and continuous): E[f] ≈ (1/N) Σ_{n=1}^N f(x_n), where the x_n are drawn from p(x)

Variance

• The variance of a RV X: Var(X) = E[(X − E(X))²] = E(X²) − E²(X)
• Standard deviation: σ_X = √Var(X)
• Covariance of RVs X and Y: σ_XY = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X) E(Y)
• Chebyshev inequality: P(|X − E(X)| ≥ k σ_X) ≤ 1/k²

Independence

• If X and Y are independent, then
    E(XY) = E(X) E(Y)
    Var(X + Y) = Var(X) + Var(Y)

Probability Densities

p(x) is the density function, while P(x) is the cumulative distribution. P(x) is a non-decreasing function.

Transformed Densities

Under a change of variables x = g(y), the density transforms as p_y(y) = p_x(g(y)) |g′(y)|.

The Gaussian Distribution

    N(x | μ, σ²) = (1 / √(2πσ²)) exp(−(x − μ)² / (2σ²))

Gaussian Mean and Variance

    E[x] = μ,  Var[x] = σ²

The Multivariate Gaussian

    N(x | μ, Σ) = (2π)^(−D/2) |Σ|^(−1/2) exp(−(1/2) (x − μ)ᵀ Σ⁻¹ (x − μ))

where μ is the mean vector and Σ is the covariance matrix.

Minimum Misclassification Rate

Two types of mistakes:
• False positive (type 1)
• False negative (type 2)

    p(mistake) = p(x ∈ R1, C2) + p(x ∈ R2, C1) = ∫_{R1} p(x, C2) dx + ∫_{R2} p(x, C1) dx

The above is called the Bayes error. The minimum Bayes error is achieved when the decision boundary is placed at x0, the point where p(x, C1) = p(x, C2).

Generative vs Discriminative

• Generative approach: model the class-conditional density p(x | C_k) and the prior p(C_k), then use Bayes' theorem to obtain the posterior p(C_k | x).
• Discriminative approach: model the posterior p(C_k | x) directly.
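To make the approximate-expectation formula above concrete, here is a minimal Monte Carlo sketch in Python; the sample size N, the standard-normal choice of p(x), and the test function f(x) = x² are illustrative assumptions, not from the slides:

```python
import random

# Monte Carlo approximation of E[f(X)]: draw x_1..x_N from p(x),
# then average f(x_n). Here p(x) is N(0, 1) and f(x) = x**2,
# so the exact answer is E[X^2] = Var(X) + E(X)^2 = 1.
random.seed(0)
N = 100_000
samples = [random.gauss(0.0, 1.0) for _ in range(N)]

estimate = sum(x * x for x in samples) / N  # (1/N) * sum of f(x_n)
print(f"Monte Carlo estimate of E[X^2]: {estimate:.3f} (exact: 1.0)")
```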
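Similarly, a small sketch of evaluating the multivariate Gaussian density defined above, using NumPy; the 2-D mean vector and covariance matrix are made-up example values:

```python
import numpy as np

def mvn_density(x, mu, Sigma):
    """Evaluate N(x | mu, Sigma) for a D-dimensional Gaussian."""
    D = len(mu)
    diff = x - mu
    # Normalization constant (2*pi)^(-D/2) |Sigma|^(-1/2).
    norm = (2 * np.pi) ** (-D / 2) / np.sqrt(np.linalg.det(Sigma))
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), via a linear solve.
    quad = diff @ np.linalg.solve(Sigma, diff)
    return norm * np.exp(-0.5 * quad)

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
print(mvn_density(np.array([0.5, -0.5]), mu, Sigma))
```

Using a linear solve rather than explicitly inverting Sigma is the standard, numerically safer way to compute the quadratic form.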