Probability Review
CS479/679 Pattern Recognition
Dr. George Bebis

Why Bother About Probabilities?
• Accounting for uncertainty is a crucial component in decision making (e.g., classification) because of ambiguity in our measurements.
• Probability theory is the proper mechanism for accounting for uncertainty.
• It also lets us take a-priori knowledge into account, for example: "If the fish was caught in the Atlantic ocean, then it is more likely to be salmon than sea bass."

Definitions
• Random experiment – an experiment whose result is not certain in advance (e.g., throwing a die)
• Outcome – the result of a random experiment
• Sample space – the set of all possible outcomes (e.g., {1,2,3,4,5,6})
• Event – a subset of the sample space (e.g., obtaining an odd number when throwing a die = {1,3,5})

Intuitive Formulation of Probability
• Intuitively, the probability of an event a could be defined as:

  P(a) = lim (n→∞) N(a)/n

  where N(a) is the number of times that event a happens in n trials.
• This assumes that all outcomes are equally likely (Laplacian definition).

Axioms of Probability
• (1) P(A) ≥ 0 for any event A
• (2) P(S) = 1, where S is the sample space
• (3) If A1, A2, ..., An are mutually exclusive events, then P(A1 ∪ A2 ∪ ... ∪ An) = P(A1) + P(A2) + ... + P(An)
(Figure: mutually exclusive events A1, A2, A3, A4 inside the sample space S.)

Unconditional (Prior) Probability
• This is the probability of an event in the absence of any evidence.
• P(Cavity) = 0.1 means that "in the absence of any other information, there is a 10% chance that the patient has a cavity."

Conditional (Posterior) Probability
• This is the probability of an event given some evidence.
• P(Cavity/Toothache) = 0.8 means that "there is an 80% chance that the patient has a cavity given that he has a toothache."

Conditional (Posterior) Probability (cont'd)
• Conditional probabilities are defined as follows:

  P(A/B) = P(A,B) / P(B)        P(B/A) = P(A,B) / P(A)

• The above definitions lead to the chain rule:

  P(A,B) = P(A/B) P(B) = P(B/A) P(A)

Law of Total Probability
• If A1, A2, ..., An is a partition of mutually exclusive events and B is any event, then:

  P(B) = P(B,A1) + P(B,A2) + ... + P(B,An)
       = P(B/A1) P(A1) + P(B/A2) P(A2) + ... + P(B/An) P(An)
       = Σ (j=1..n) P(B/Aj) P(Aj)

• Special case, using the partition A, ¬A:

  P(B) = P(B,A) + P(B,¬A) = P(B/A) P(A) + P(B/¬A) P(¬A)

Bayes' Rule
• Using the definition of conditional probabilities leads to Bayes' rule:

  P(A/B) = P(B/A) P(A) / P(B)

  where P(B) = P(B,A) + P(B,¬A) = P(B/A) P(A) + P(B/¬A) P(¬A)
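Before the worked examples on the next slides, here is a minimal Python sketch of Bayes' rule with the total-probability denominator. The Disease/Symptom numbers in the first call are hypothetical (the slides do not give them); the meningitis numbers in the second call are the ones used on the slides.

```python
def total_probability(likelihood, prior, likelihood_not):
    """Law of total probability: P(B) = P(B/A) P(A) + P(B/~A) P(~A)."""
    return likelihood * prior + likelihood_not * (1 - prior)

def posterior(likelihood, prior, evidence):
    """Bayes' rule: P(A/B) = P(B/A) P(A) / P(B)."""
    return likelihood * prior / evidence

# Hypothetical Disease/Symptom numbers (not from the slides):
# P(Symptom/Disease) = 0.9, P(Disease) = 0.01, P(Symptom/~Disease) = 0.05
p_symptom = total_probability(0.9, 0.01, 0.05)
print(posterior(0.9, 0.01, p_symptom))      # ~0.1538

# The meningitis example below, where P(S) is given directly:
print(posterior(0.5, 1 / 50_000, 1 / 20))   # 0.0002
```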
Example
• Consider the probability of Disease given Symptom:

  P(Disease/Symptom) = P(Symptom/Disease) P(Disease) / P(Symptom)

  where P(Symptom) = P(Symptom/Disease) P(Disease) + P(Symptom/¬Disease) P(¬Disease)

Example (cont'd)
• Meningitis causes a stiff neck 50% of the time.
• A patient comes in with a stiff neck – what is the probability that he has meningitis?

  P(M/S) = P(S/M) P(M) / P(S),  where P(S/M) = 0.5

• We also need to know the following:
  – The prior probability of a patient having meningitis: P(M) = 1/50,000
  – The prior probability of a patient having a stiff neck: P(S) = 1/20
• P(M/S) = 0.5 × (1/50,000) / (1/20) = 0.0002

General Form of Bayes' Rule
• If A1, A2, ..., An is a partition of mutually exclusive events and B is any event, then Bayes' rule is given by:

  P(Ai/B) = P(B/Ai) P(Ai) / P(B)

  where P(B) = Σ (j=1..n) P(B/Aj) P(Aj)

Independence
• Two events A and B are independent iff:

  P(A,B) = P(A) P(B)

• Using independence, we can show that:

  P(A/B) = P(A)  and  P(B/A) = P(B)

• A and B are conditionally independent given C iff:

  P(A/B,C) = P(A/C)

  e.g., P(WetGrass/Season,Rain) = P(WetGrass/Rain)

Random Variables
• In many experiments, it is easier to deal with a summary variable than with the original probability structure.
• Example: in an opinion poll, we ask 50 people whether they agree or disagree with a certain issue.
  – Suppose we record a "1" for agree and a "0" for disagree.
  – The sample space for this experiment has 2^50 elements.
• Suppose we are only interested in the number of people who agree.
  – Define the variable X = number of "1"s recorded out of 50.
  – The new sample space has only 51 elements.

Random Variables (cont'd)
• A random variable (r.v.) is a function X that assigns a value X(sj) to each outcome sj of a random experiment.

Random Variables (cont'd)
• How do we compute probabilities using random variables?
  – Suppose the sample space is S = {s1, s2, ..., sn}.
  – Suppose the range of the random variable X is {x1, x2, ..., xm}.
  – We observe X = xj iff the outcome of the random experiment is an sj ∈ S such that X(sj) = xj, so:

    P(X = xj) = Σ (sj : X(sj) = xj) P(sj)

Example
• Consider the experiment of throwing a pair of dice (see the sketch after this slide).

Discrete/Continuous Random Variables
• A discrete r.v. can assume only discrete values.
• A continuous r.v. can assume a continuous range of values (e.g., sensor readings).
• Each kind has an associated probability mass/density function (pmf/pdf) and a probability distribution function (PDF).

Probability mass function (pmf)
• The pmf of a discrete r.v. X assigns a probability to each possible value x of X.
• Notation: given two r.v.'s X and Y, their pmfs are denoted pX(x) and pY(y); for convenience, we will drop the subscripts and denote them p(x) and p(y). However, keep in mind that these are different functions!

Probability density function (pdf)
• The pdf of a continuous r.v. X shows the probability of X being "close" to some number x.
• Notation: given two r.v.'s X and Y, their pdfs are denoted pX(x) and pY(y); for convenience, we will drop the subscripts and denote them p(x) and p(y). However, keep in mind that these are different functions!
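The dice example lends itself to a short sketch of the formula P(X = xj) = Σ (sj : X(sj) = xj) P(sj). A minimal Python version, assuming X is the sum of the two dice (the figure from that slide is not reproduced here):

```python
from itertools import product
from collections import Counter

# Sample space: 36 equally likely outcomes; the r.v. X maps each outcome to the sum.
outcomes = list(product(range(1, 7), repeat=2))
counts = Counter(a + b for a, b in outcomes)

# P(X = x) = (number of outcomes mapping to x) / (total outcomes)
pmf = {x: n / len(outcomes) for x, n in sorted(counts.items())}
for x, p in pmf.items():
    print(f"P(X={x}) = {p:.4f}")

assert abs(sum(pmf.values()) - 1.0) < 1e-12   # a pmf sums to 1
```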
Probability Distribution Function (PDF)
• Definition:

  F(x) = P(X ≤ x)

• Some properties of the PDF:
  – (1) 0 ≤ F(x) ≤ 1
  – (2) F(x) is a non-decreasing function of x
• If X is discrete (taking values k = 0, 1, 2, ...), its PDF can be computed as follows:

  F(x) = P(X ≤ x) = Σ (k=0..x) P(X = k) = Σ (k=0..x) p(k)

Probability Distribution Function (PDF) (cont'd)
(Figure: a pmf and the staircase-shaped PDF obtained by accumulating it.)

Probability Distribution Function (PDF) (cont'd)
• If X is continuous, its PDF can be computed as follows:

  F(x) = ∫ (-∞..x) p(t) dt,  for all x

• The pdf can be obtained from the PDF using:

  p(x) = dF(x)/dx

Example
• Gaussian pdf and PDF.
(Figure: the bell-shaped Gaussian pdf and the corresponding S-shaped PDF.)

Joint pmf (discrete case)
• For n random variables, the joint pmf assigns a probability to each possible combination of values:

  p(x1, x2, ..., xn) = P(X1 = x1, X2 = x2, ..., Xn = xn)

  Σ (x1, x2, ..., xn) p(x1, x2, ..., xn) = 1

• Notation: the joint pmf/pdf of the r.v.'s X1, X2, ..., Xn and Y1, Y2, ..., Yn are denoted pX1X2...Xn(x1, x2, ..., xn) and pY1Y2...Yn(y1, y2, ..., yn); for convenience, we will drop the subscripts and denote them p(x1, x2, ..., xn) and p(y1, y2, ..., yn). Keep in mind, however, that these are two different functions.

Joint pmf (discrete case) (cont'd)
• Specifying the joint pmf requires a large number of values:
  – k^n values, assuming n random variables where each one can assume one of k discrete values.
  – This can be simplified if we assume independence or conditional independence.
• Example: P(Cavity, Toothache) is a 2 x 2 matrix:

  Joint Probability   Toothache   Not Toothache
  Cavity              0.04        0.06
  Not Cavity          0.01        0.89

  (The sum of all probabilities is 1.0.)

Joint pdf (continuous r.v.)
• For n random variables, the joint pdf gives the probability of being "close" to a given combination of values:

  p(x1, x2, ..., xn) ≥ 0

  ∫ (x1) ... ∫ (xn) p(x1, x2, ..., xn) dx1 ... dxn = 1

Conditional pdf
• The conditional pdf is defined as follows:

  p(y/x) = p(x,y) / p(x)

• Using the definition of conditional pdfs leads to the chain rule:

  p(x,y) = p(y/x) p(x)

• General case:

  p(x1, x2, ..., xn) = p(x1/x2, ..., xn) p(x2/x3, ..., xn) ... p(x(n-1)/xn) p(xn)

Independence
• Knowledge about independence between r.v.'s is very useful because it simplifies things greatly! For example, if X and Y are independent, then:

  p(x,y) = p(x) p(y)

  (a 2D function factors into a product of two 1D functions)

Law of Total Probability
• The law of total probability:

  p(y) = Σ (x) p(y/x) p(x)

Marginalization
• Given the joint pmf/pdf, we can compute the pmf/pdf of any subset of variables using marginalization. Examples:

  p(x) = ∫ p(x,y) dy

  p(x1, x2, ..., x(i-1), x(i+1), ..., xn) = ∫ p(x1, x2, ..., xn) dxi

  p(x1, x2) = ∫ (x3) ... ∫ (xn) p(x1, x2, ..., xn) dx3 ... dxn

Normal (Gaussian) Distribution
• 1-dimensional:

  p(x) = (1 / (√(2π) σ)) exp(-(x - μ)^2 / (2σ^2))

• d-dimensional:

  p(x) = (1 / ((2π)^(d/2) |Σ|^(1/2))) exp(-(1/2) (x - μ)^T Σ^(-1) (x - μ))

  where μ is a d x 1 mean vector and Σ is a d x d symmetric covariance matrix.

Normal (Gaussian) Distribution (cont'd)
• Parameters and shape of the Gaussian distribution:
  – The number of parameters is d + d(d+1)/2.
  – The location is determined by μ; the shape is determined by Σ.

Normal (Gaussian) Distribution (cont'd)
• Mahalanobis distance:

  r^2 = (x - μ)^T Σ^(-1) (x - μ)

• If the variables are independent (Σ is diagonal), the multivariate normal distribution becomes a product of 1-dimensional normals:

  p(x) = Π (i=1..d) (1 / (√(2π) σi)) exp(-(xi - μi)^2 / (2σi^2))

Expected Value
• The expected value of a discrete r.v. X is defined as:

  E[X] = μ = Σ (x) x p(x)

Expected Value (cont'd)
• The sample mean and expected value are related by:

  x̄ = (1/n) Σ (k=1..n) xk → E[X]  as n → ∞

• The expected value for a continuous r.v. is given by:

  E[X] = ∫ x p(x) dx
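The Gaussian formulas above translate directly into code. A minimal Python/NumPy sketch that evaluates the d-dimensional pdf through the Mahalanobis distance; the mean, covariance, and query point are made-up values for illustration:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """d-dimensional Gaussian pdf, evaluated via the squared
    Mahalanobis distance r^2 = (x - mu)^T Sigma^{-1} (x - mu)."""
    d = len(mu)
    diff = x - mu
    r2 = diff @ np.linalg.inv(sigma) @ diff
    norm = 1.0 / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma)))
    return norm * np.exp(-0.5 * r2)

mu = np.array([0.0, 0.0])
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])          # symmetric, positive definite
x = np.array([1.0, -1.0])
print(gaussian_pdf(x, mu, sigma))
# Cross-check against scipy, if available:
# from scipy.stats import multivariate_normal
# print(multivariate_normal(mu, sigma).pdf(x))
```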
Variance and Standard Deviation
• The variance of a r.v. X measures its spread around the mean:

  Var(X) = σ^2 = E[(X - μ)^2],  where μ = E[X]

• The standard deviation σ is the (positive) square root of the variance.

Covariance
• The covariance of two r.v.'s X and Y measures how they vary together:

  Cov(X,Y) = E[(X - μX)(Y - μY)]

Covariance Matrix – 2 variables
• For two variables:

  Σ = | Var(X)     Cov(X,Y) |
      | Cov(Y,X)   Var(Y)   |

  and Cov(X,Y) = Cov(Y,X) (i.e., Σ is a symmetric matrix).
• Σ can be computed using:

  Σ = E[(x - μ)(x - μ)^T]

Covariance Matrix – n variables
• For n variables, Σ is the n x n matrix with entries Σij = Cov(Xi, Xj); since Cov(Xi, Xj) = Cov(Xj, Xi), Σ is again a symmetric matrix.

Uncorrelated r.v.'s
• If the r.v.'s are uncorrelated, i.e., Cov(Xi, Xj) = 0 for all i ≠ j, then Σ is a diagonal matrix.

Properties of covariance matrix
• Σ is symmetric and positive semi-definite, so its eigenvalues are real and non-negative and its eigenvectors can be chosen orthonormal.
• (Note: we will review these concepts later in case you do not remember them.)

Covariance Matrix Decomposition
• Σ can be decomposed into its eigenvectors and eigenvalues:

  Σ = Φ Λ Φ^T,  where Φ^(-1) = Φ^T

  Here Φ is the orthonormal matrix whose columns are the eigenvectors of Σ, and Λ is the diagonal matrix of the corresponding eigenvalues.

Linear Transformations
• A linear transformation of a normally distributed vector x (mean μ, covariance Σ) is also normal: if y = A^T x, then y has mean A^T μ and covariance A^T Σ A.

Linear Transformations (cont'd)
• Whitening transform: choosing

  A_w = Φ Λ^(-1/2)

  transforms the data so that its covariance matrix becomes the identity, since A_w^T Σ A_w = Λ^(-1/2) Φ^T (Φ Λ Φ^T) Φ Λ^(-1/2) = I.
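To close the loop on the decomposition and whitening slides, here is a small Python/NumPy sketch; the data is synthetic and the specific numbers are arbitrary:

```python
import numpy as np

# Whitening transform sketch: A_w = Phi Lambda^{-1/2}, so that
# y = A_w^T (x - mu) has identity covariance.
rng = np.random.default_rng(0)
x = rng.multivariate_normal([1.0, -2.0], [[3.0, 1.2],
                                          [1.2, 2.0]], size=5000)

mu = x.mean(axis=0)
sigma = np.cov(x, rowvar=False)
eigvals, phi = np.linalg.eigh(sigma)      # Sigma = Phi Lambda Phi^T, Phi^{-1} = Phi^T
a_w = phi @ np.diag(eigvals ** -0.5)      # whitening matrix
y = (x - mu) @ a_w                        # each row is A_w^T (x_i - mu)
print(np.cov(y, rowvar=False).round(2))   # ~ identity matrix
```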