ECSE 6610 Pattern Recognition
Professor Qiang Ji
Spring, 2011

Pattern Recognition Overview

[Diagram. Training: raw data → feature extraction → training features + known output values → classifier/regressor. Testing: raw data → feature extraction → features → learned classifier/regressor → output values.]

• Feature extraction: extract the most discriminative features to concisely represent the original data, typically involving dimensionality reduction.
• Training/learning: learn a mapping function that maps input to output.
• Classification/regression: map the input to a discrete output value for classification, and to a continuous output value for regression.

Pattern Recognition Overview (cont'd)

• Supervised learning: both input (features) and output (class labels) are provided.
• Unsupervised learning: only input is given.
  – Clustering
  – Dimensionality reduction
  – Density estimation
• Semi-supervised learning: some inputs have output labels and others do not.

Examples of Pattern Recognition Applications

• Computer/Machine Vision: object recognition, activity recognition, image segmentation, inspection
• Medical Imaging: cell classification
• Optical Character Recognition: machine- or hand-written character/digit recognition
• Brain-Computer Interface: classify human brain states from EEG signals
• Speech Recognition: speaker recognition, speech understanding, language translation
• Robotics: obstacle detection, scene understanding, navigation

[Example slides: Computer Vision Example: Facial Expression Recognition; Machine Vision Example; Example: Handwritten Digit Recognition]

Probability Calculus

• U is the sample space; X is a subset of the outcomes, i.e., an event.
• P(X ∨ Y) = P(X) + P(Y) − P(X ∧ Y); when X and Y are mutually exclusive, P(X ∧ Y) = 0 and this reduces to P(X ∨ Y) = P(X) + P(Y).

Probability Calculus (cont'd)

• Conditional independence: X ⊥ Y | Z means P(X | Y, Z) = P(X | Z), or equivalently P(X, Y | Z) = P(X | Z) P(Y | Z).
• The chain rule: given three events A, B, C,
    P(A, B, C) = P(A | B, C) P(B | C) P(C)

The Rules of Probability

• Sum rule: P(X) = Σ_Y P(X, Y)
• Product rule: P(X, Y) = P(X | Y) P(Y)

Combining the sum and product rules yields
    P(X) = Σ_Y P(X | Y) P(Y)
and the conditional sum rule
    P(C | A) = Σ_B P(C | A, B) P(B | A)

Bayes' Theorem

    posterior ∝ likelihood × prior

Bayes Rule

Since p(A, B) = p(B | A) p(A) = p(A | B) p(B),
    p(A | B) = p(B | A) p(A) / p(B)
For events A_1, ..., A_n that partition the sample space and evidence E:
    p(A_i | E) = p(E | A_i) p(A_i) / p(E) = p(E | A_i) p(A_i) / Σ_i p(E | A_i) p(A_i)

[Diagram: events A1 through A6 partitioning the sample space, with the evidence E overlapping them]

• Based on the definition of conditional probability
• p(A_i | E) is the posterior probability of A_i given evidence E
• p(A_i) is the prior probability
• p(E | A_i) is the likelihood of the evidence given A_i
• p(E) is the probability of the evidence

Bayesian Rule (cont'd)

    P(H | E1, E2) = P(H | E1) P(E2 | E1, H) / P(E2 | E1)
                  = P(H | E1) P(E2 | E1, H) / Σ_H P(H | E1) P(E2 | E1, H)

Assuming E1 and E2 are independent given H, the above equation may be written as
    P(H | E1, E2) = P(H | E1) P(E2 | H) / Σ_H P(H | E1) P(E2 | H)
where P(H | E1) is the prior and P(E2 | H) is the likelihood of H given E2.

A Simple Example

Consider two related variables:
1. Drug (D) with values y or n
2. Test (T) with values +ve or -ve

And suppose we have the following probabilities:
    P(D = y) = 0.001
    P(T = +ve | D = y) = 0.8
    P(T = +ve | D = n) = 0.01

These probabilities are sufficient to define a joint probability distribution.

Suppose an athlete tests positive. What is the probability that he has taken the drug?
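As a numerical check on this example, here is a minimal Python sketch of the same Bayes computation; the script and its variable names are illustrative additions, not part of the original slides:

```python
# Posterior P(D = y | T = +ve) for the drug-test example, via Bayes' rule.
p_d = 0.001               # prior P(D = y)
p_pos_given_d = 0.8       # likelihood P(T = +ve | D = y)
p_pos_given_not_d = 0.01  # likelihood P(T = +ve | D = n)

# Evidence P(T = +ve), by the conditional sum rule over D.
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' rule: posterior = likelihood * prior / evidence.
posterior = p_pos_given_d * p_d / p_pos
print(f"P(D = y | T = +ve) = {posterior:.3f}")  # prints 0.074
```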
    P(D = y | T = +ve) = P(T = +ve | D = y) P(D = y) / [P(T = +ve | D = y) P(D = y) + P(T = +ve | D = n) P(D = n)]
                       = (0.8 × 0.001) / (0.8 × 0.001 + 0.01 × 0.999)
                       ≈ 0.074

Expectation (or Mean)

• For a discrete RV X: E(X) = Σ_x x p(x)
• For a continuous RV X: E(X) = ∫ x p(x) dx
• Conditional expectation: E(X | y) = ∫ x p(x | y) dx

Expectations

• Conditional expectation (discrete): E[f | y] = Σ_x p(x | y) f(x)
• Approximate expectation (discrete and continuous): E[f] ≈ (1/N) Σ_{n=1}^N f(x_n), where the x_n are drawn from p(x)

Variance

• The variance of a RV X: Var(X) = E[(X − E(X))²] = E(X²) − E²(X)
• Standard deviation: σ_X = √Var(X)
• Covariance of RVs X and Y: σ_XY = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X) E(Y)
• Chebyshev inequality: P(|X − E(X)| ≥ k σ_X) ≤ 1/k²

Independence

• If X and Y are independent, then
    E(XY) = E(X) E(Y)
    Var(X + Y) = Var(X) + Var(Y)

Probability Densities

p(x) is the density function, while P(x) is the cumulative distribution. P(x) is a non-decreasing function.

Transformed Densities

Under a change of variables x = g(y), the density transforms as p_y(y) = p_x(g(y)) |g′(y)|.

The Gaussian Distribution

    N(x | μ, σ²) = (1 / √(2πσ²)) exp(−(x − μ)² / (2σ²))

Gaussian Mean and Variance

    E[x] = μ,  Var[x] = σ²

The Multivariate Gaussian

    N(x | μ, Σ) = (2π)^(−D/2) |Σ|^(−1/2) exp(−(1/2) (x − μ)ᵀ Σ⁻¹ (x − μ))

where μ is the mean vector and Σ is the covariance matrix.

Minimum Misclassification Rate

Two types of mistakes:
• False positive (type 1)
• False negative (type 2)

    p(mistake) = p(x ∈ R1, C2) + p(x ∈ R2, C1) = ∫_{R1} p(x, C2) dx + ∫_{R2} p(x, C1) dx

The above is called the Bayes error. The minimum Bayes error is achieved when the decision boundary is placed at x0, the point where p(x, C1) = p(x, C2).

Generative vs Discriminative

• Generative approach: model the class-conditional density p(x | C_k) and the prior p(C_k), then use Bayes' theorem to obtain the posterior p(C_k | x).
• Discriminative approach: model the posterior p(C_k | x) directly.
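To make the approximate-expectation formula above concrete, here is a minimal Monte Carlo sketch in Python; the sample size N, the standard-normal choice of p(x), and the test function f(x) = x² are illustrative assumptions, not from the slides:

```python
import random

# Monte Carlo approximation of E[f(X)]: draw x_1..x_N from p(x),
# then average f(x_n). Here p(x) is N(0, 1) and f(x) = x**2,
# so the exact answer is E[X^2] = Var(X) + E(X)^2 = 1.
random.seed(0)
N = 100_000
samples = [random.gauss(0.0, 1.0) for _ in range(N)]

estimate = sum(x * x for x in samples) / N  # (1/N) * sum of f(x_n)
print(f"Monte Carlo estimate of E[X^2]: {estimate:.3f} (exact: 1.0)")
```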
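Similarly, a small sketch of evaluating the multivariate Gaussian density defined above, using NumPy; the 2-D mean vector and covariance matrix are made-up example values:

```python
import numpy as np

def mvn_density(x, mu, Sigma):
    """Evaluate N(x | mu, Sigma) for a D-dimensional Gaussian."""
    D = len(mu)
    diff = x - mu
    # Normalization constant (2*pi)^(-D/2) |Sigma|^(-1/2).
    norm = (2 * np.pi) ** (-D / 2) / np.sqrt(np.linalg.det(Sigma))
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), via a linear solve.
    quad = diff @ np.linalg.solve(Sigma, diff)
    return norm * np.exp(-0.5 * quad)

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
print(mvn_density(np.array([0.5, -0.5]), mu, Sigma))
```

Using a linear solve rather than explicitly inverting Sigma is the standard, numerically safer way to compute the quadratic form.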