Pattern Recognition Topic 2: Bayes Rule
Dr. Ng Teck Khim, CS4243 Computer Vision and Pattern Recognition

Expectant mother: boy or girl?
P(boy) = 0.5, P(girl) = 0.5, P(boy or girl) = 1.

Conditional Probability Density

[Figure: class-conditional densities p(wt | girl) and p(wt | boy) plotted against weight increase (axis roughly 6 to 10).]

Suppose the weight increase of an expectant mother depends on whether the baby is a boy or a girl, and that the conditional probability densities of weight increase, obtained from a large sample of expectant mothers, are as shown in the figure above. If I now tell you the mother's weight increase, you are "better informed" when guessing boy or girl. If the weight increase is 9, you have a better chance of winning the bet by guessing "boy"; if it is 7, you have a better chance by guessing "girl".

So the conditional probability P(A | B) tells you the likelihood of event A happening given that event B has happened. Sometimes you want to compute P(A | B) but you are not given the conditional probability function P(A | B); instead you are given P(B | A). Can you make use of P(B | A) to compute P(A | B)? The answer is yes, with some additional information. Bayes rule allows you to do this.

Bayes Decision Theory

Suppose we have two classes, c_1 (boy) and c_2 (girl), with a priori probabilities P(c_1) and P(c_2) respectively. Suppose we have a measurement (observation) x that gives some clue about whether it comes from class c_1 or class c_2. Depending on whether the class is c_1 or c_2, the probability of observing x may be different. We describe this with the class-conditional probability densities P(x | c_1) and P(x | c_2).

Our goal is to find out which class is more likely, given the measurement x. In other words, we want to compare the a posteriori probabilities P(c_1 | x) and P(c_2 | x). Bayes rule allows us to express the a posteriori probability in terms of the a priori probability and the class-conditional density:

    P(c_i \mid x) = \frac{P(x \mid c_i)\, P(c_i)}{P(x)}

For the expectant mother problem,

    p(b \mid wt) = \frac{p(wt \mid b)\, p(b)}{p(wt)}        --- Eqn-1

where p(b) = 0.5 and

    p(wt) = p(wt \mid girl)\, p(girl) + p(wt \mid boy)\, p(boy),

i.e.

    P(x) = \sum_{i=1}^{2} P(x \mid c_i)\, P(c_i).

In this expectant mother problem we are actually not required to compute the value of p(b | wt); all we need is a good way to decide whether it is a boy or a girl. Since p(wt) is the same constant whether the class is boy or girl, all that matters is the numerator on the right-hand side of Eqn-1.

A natural approach:
Decide class c_1 if P(c_1 | x) > P(c_2 | x).
Decide class c_2 if P(c_2 | x) > P(c_1 | x).
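To make the rule concrete, here is a minimal Python sketch (not part of the original slides). It assumes, purely for illustration, that the class-conditional densities p(wt | boy) and p(wt | girl) are one-dimensional Gaussians with made-up means and standard deviations; the priors are the 0.5 / 0.5 values from the slides.

```python
import math

def gaussian_pdf(x, mean, std):
    """1-D Gaussian density, used here as an assumed form for p(wt | class)."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

# Priors from the slides.
prior = {"boy": 0.5, "girl": 0.5}

# Hypothetical class-conditional parameters (illustrative only; the slides
# show the curves but not their actual parameters).
likelihood_params = {"boy": (9.0, 1.0), "girl": (7.0, 1.0)}

def posterior(wt):
    """Return P(class | wt) for both classes via Bayes rule (Eqn-1)."""
    # Numerators of Eqn-1: p(wt | c) * p(c)
    joint = {c: gaussian_pdf(wt, *likelihood_params[c]) * prior[c] for c in prior}
    # Evidence p(wt) = sum_i p(wt | c_i) p(c_i) (total probability).
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}

def decide(wt):
    """The 'natural approach': pick the class with the larger posterior."""
    post = posterior(wt)
    return max(post, key=post.get), post

for wt in (7.0, 9.0):
    label, post = decide(wt)
    print(f"weight increase = {wt}: decide '{label}', posteriors = {post}")
```

Note that the decision only depends on the two numerators p(wt | c) p(c); the evidence p(wt) is computed here just so the posteriors sum to one, mirroring the point made on the slide.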
If we have a more complicated situation in which different wrong decisions incur different penalties, the decision process becomes slightly more involved. Example: if the baby is a girl and you say "boy", you lose $10; if the baby is a boy and you say "girl", you lose $20. Then you should attach a "cost" to your decision making.

Let \lambda_{11} be the penalty for deciding class c_1 when x comes from class c_1,
\lambda_{12} the penalty for deciding class c_1 when x comes from class c_2,
\lambda_{21} the penalty for deciding class c_2 when x comes from class c_1, and
\lambda_{22} the penalty for deciding class c_2 when x comes from class c_2.
(Usually \lambda_{11} and \lambda_{22} are both set to zero.)

Then, given a measurement x, the risk of deciding class c_1 is

    R(c_1 \mid x) = \lambda_{11} P(c_1 \mid x) + \lambda_{12} P(c_2 \mid x)

and similarly the risk of deciding class c_2 is

    R(c_2 \mid x) = \lambda_{21} P(c_1 \mid x) + \lambda_{22} P(c_2 \mid x).

So we will decide class c_1 if R(c_1 | x) < R(c_2 | x), i.e. if

    P(c_1 \mid x)\,[\lambda_{11} - \lambda_{21}] < P(c_2 \mid x)\,[\lambda_{22} - \lambda_{12}].

Using Bayes rule,

    \frac{P(x \mid c_1) P(c_1)}{P(x)}\,[\lambda_{11} - \lambda_{21}] < \frac{P(x \mid c_2) P(c_2)}{P(x)}\,[\lambda_{22} - \lambda_{12}]

    P(x \mid c_1) P(c_1)\,[\lambda_{11} - \lambda_{21}] < P(x \mid c_2) P(c_2)\,[\lambda_{22} - \lambda_{12}].

How do we obtain the class-conditional densities and priors in practice?

Gaussian (Normal) Density

Probability density functions (pdfs) are often expressed in analytical form. One very commonly encountered pdf is the Gaussian density, also known as the normal density. Let x be a d-dimensional feature vector,

    x = [x_1, x_2, x_3, \ldots, x_d]^T.

If each element of the vector x is a Gaussian random variable, the multivariate normal density is

    P(x) = \frac{1}{\sqrt{(2\pi)^d |C|}} \exp\!\left( -\tfrac{1}{2} (x - \bar{x})^T C^{-1} (x - \bar{x}) \right)

where \bar{x} is the mean vector and C is the covariance matrix:

    \bar{x} = E\{x\}, \qquad C = E\{ (x - \bar{x})(x - \bar{x})^T \}.

|C| is the determinant of C. For two-dimensional data points (x(i), y(i)), i = 1, \ldots, N, the sample covariance matrix is

    C = \frac{1}{N} \sum_{i=1}^{N} \begin{bmatrix} (x(i) - \bar{x})^2 & (x(i) - \bar{x})(y(i) - \bar{y}) \\ (y(i) - \bar{y})(x(i) - \bar{x}) & (y(i) - \bar{y})^2 \end{bmatrix}.

A linear combination of Gaussian random vectors is also a Gaussian random vector.

Note: there is a mistake in the Lab4 instruction sheet. It says

    C = E\begin{bmatrix} x^2 & xy \\ yx & y^2 \end{bmatrix}

but it should have been

    C = E\begin{bmatrix} (x - \bar{x})^2 & (x - \bar{x})(y - \bar{y}) \\ (y - \bar{y})(x - \bar{x}) & (y - \bar{y})^2 \end{bmatrix}.

Properties of the covariance matrix
• C is always symmetric and positive semi-definite.
• C is usually positive definite. It fails to be positive definite in degenerate cases, for example when at least one of the variables has zero variance or when at least two of the variables are identical; we will exclude such cases.
• C is a symmetric matrix (Hermitian symmetric if x is complex).
• Excluding the degenerate cases, |C| > 0 and so C^{-1} exists.
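The following Python sketch (not part of the original slides) ties these pieces together: it estimates the mean vector and covariance matrix from samples using the corrected formula above, evaluates the multivariate normal density, and applies the minimum-risk rule with \lambda_{11} = \lambda_{22} = 0, \lambda_{12} = 10 and \lambda_{21} = 20 from the $10/$20 example. The training samples and their distribution parameters are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_mean_cov(samples):
    """Sample mean and covariance, C = (1/N) * sum (x - x_bar)(x - x_bar)^T."""
    mean = samples.mean(axis=0)
    centered = samples - mean
    cov = centered.T @ centered / samples.shape[0]
    return mean, cov

def gaussian_density(x, mean, cov):
    """Multivariate normal: exp(-0.5 (x-m)^T C^-1 (x-m)) / sqrt((2 pi)^d |C|)."""
    d = mean.size
    diff = x - mean
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm

# Hypothetical 2-D training samples for the two classes (invented for illustration).
samples = {
    "c1": rng.multivariate_normal([9.0, 2.0], [[1.0, 0.3], [0.3, 0.5]], size=200),  # "boy"
    "c2": rng.multivariate_normal([7.0, 1.0], [[1.0, 0.2], [0.2, 0.5]], size=200),  # "girl"
}
params = {c: estimate_mean_cov(s) for c, s in samples.items()}
prior = {"c1": 0.5, "c2": 0.5}

# Loss matrix lam[decision][truth]: lam11 = lam22 = 0, lam12 = 10, lam21 = 20.
lam = {"c1": {"c1": 0.0, "c2": 10.0}, "c2": {"c1": 20.0, "c2": 0.0}}

def minimum_risk_decision(x):
    # Posteriors via Bayes rule (the common factor P(x) cancels in the comparison).
    joint = {c: gaussian_density(x, *params[c]) * prior[c] for c in prior}
    evidence = sum(joint.values())
    post = {c: joint[c] / evidence for c in joint}
    # Conditional risk R(decision | x) = sum_truth lam[decision][truth] * P(truth | x).
    risk = {dec: sum(lam[dec][t] * post[t] for t in post) for dec in lam}
    return min(risk, key=risk.get), risk

x_test = np.array([8.0, 1.5])
decision, risk = minimum_risk_decision(x_test)
print("decision:", decision, "risks:", risk)
```

Because \lambda_{21} > \lambda_{12}, the rule is biased towards deciding c_1 near the boundary, exactly as the inequality above predicts.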
Maximum Likelihood Estimates

For parameter estimation: given a set of observations (measurements) x = \{x_1, x_2, x_3, \ldots, x_N\} corresponding to times t = \{t_1, t_2, t_3, \ldots, t_N\}, we want to estimate the parameter vector \theta that satisfies

    x_i = f(\theta, t_i).

The maximum likelihood estimate of \theta is the \hat{\theta} that maximizes P(x \mid \theta). In computing ML estimates, it is frequently more convenient to work with the logarithm of P(x \mid \theta), i.e. \log P(x \mid \theta). This is valid because the logarithm is a monotonically increasing function, so the same \hat{\theta} maximizes both.

If \{x_1, x_2, x_3, \ldots, x_N\} are independent measurements of a random variable x, then

    P(x \mid \theta) = \prod_{i=1}^{N} P(x_i \mid \theta)

    \log P(x \mid \theta) = \sum_{i=1}^{N} \log P(x_i \mid \theta).

Finding the \hat{\theta} that maximizes \log P(x \mid \theta) can usually be done by setting

    \frac{\partial}{\partial \theta} \log P(x \mid \theta) = 0.
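As a concrete illustration of this recipe (not from the original slides), here is a small Python sketch for the simple case of i.i.d. Gaussian measurements, where the parameter vector is \theta = (\mu, \sigma^2). Setting the derivative of the log-likelihood to zero gives the familiar closed-form estimates; the code checks them against a brute-force grid search over the log-likelihood. The synthetic data and the grid ranges are chosen arbitrarily for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
samples = rng.normal(loc=3.0, scale=2.0, size=500)  # synthetic i.i.d. measurements

def log_likelihood(samples, mu, var):
    """log P(x | theta) = sum_i log P(x_i | theta) for a 1-D Gaussian model."""
    return np.sum(-0.5 * np.log(2.0 * np.pi * var) - (samples - mu) ** 2 / (2.0 * var))

# Closed-form ML estimates, obtained by setting d/d(theta) log P(x | theta) = 0:
#   mu_hat  = (1/N) sum_i x_i
#   var_hat = (1/N) sum_i (x_i - mu_hat)^2
mu_hat = samples.mean()
var_hat = np.mean((samples - mu_hat) ** 2)

# Sanity check: a coarse grid search over (mu, var) should peak near the same values.
mus = np.linspace(2.0, 4.0, 201)
vars_ = np.linspace(2.0, 6.0, 201)
grid = [(log_likelihood(samples, m, v), m, v) for m in mus for v in vars_]
best_ll, best_mu, best_var = max(grid)

print(f"closed-form ML: mu = {mu_hat:.3f}, var = {var_hat:.3f}")
print(f"grid-search ML: mu = {best_mu:.3f}, var = {best_var:.3f}")
```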