Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Introduction to Probability Theory Probability Theory Probability spaces Probability spaces Counting Counting Probabilities Probabilities Conditional probability Conditional probability Bayes’ Theorem Introduction to Probability Theory Random variables Expectation & variance Distributions To start out the course, we need to know something about statistics and probability I L645 Dept. of Linguistics, Indiana University Fall 2015 Probability spaces Introduction to Probability Theory Bayes’ Theorem Random variables Expectation & variance Distributions This is only an introduction; for a fuller understanding, you would need to take a statistics course Probability theory = theory to determine how likely it is that some outcome will occur 1 / 34 2 / 34 Introduction to Probability Theory Introduction to Probability Theory Sample space Die rolling We state things in terms of an experiment (or trial)—e.g., flipping three coins I outcome: one particular possible result I I e.g., coin 1 = heads, coin 2 = tails, coin 3 = tails (HTT) event: one particular possible set of results, i.e., a more abstract idea I e.g., two tails and one head ({HTT, THT, TTH}) Probability spaces Probability spaces Counting Counting Probabilities Probabilities Conditional probability Conditional probability Bayes’ Theorem Bayes’ Theorem Random variables Expectation & variance Distributions If we have a 6-sided die I I The set of basic outcomes makes up the sample space (Ω) I I Discrete sample space: countably infinite outcomes (1, 2, 3, ...), e.g., heads or tails Continuous sample space: uncountably infinite outcomes (1.1293..., 8.765..., ...), e.g., height Random variables Expectation & variance Distributions Sample space Ω = {One, Two, Three, Four, Five, Six} Event space F = {{One}, {One, Two}, {One, Three, Five} ...} I With 6 options, there are 26 = 64 distinct events We will use F to refer to the set of events, or event space Principles of counting 3 / 34 4 / 34 Introduction to Probability Theory Introduction to Probability Theory Principles of counting Examples Probability spaces Probability spaces Counting Counting Probabilities I I Multiplication Principle: if there are two independent events, P and Q, and P can happen in p different ways and Q in q different ways, then P and Q can happen in p · q ways Probabilities Conditional probability I Bayes’ Theorem Random variables Expectation & variance Distributions example 1: If there are 3 roads leading from Bloomington to Indianapolis and 5 roads from Indianapolis to Chicago, how many ways are there to get from Bloomington to Chicago? Conditional probability Bayes’ Theorem Random variables Expectation & variance Distributions answer: 3 · 5 = 15 Addition Principle: if there are two independent events, P and Q, and P can happen in p different ways and Q in q different ways, then P or Q can happen in p + q ways I example 2: If there are 2 roads going south from Bloomington and 6 roads going north. How many roads are there going south or north? answer: 2 + 6 = 8 5 / 34 6 / 34 Exercises Introduction to Probability Theory Probability spaces Counting Probabilities I I How many different 7-place license plates are possible if the first two places are for letters and the other 5 for numbers? John, Jim, Jack and Jay have formed a band consisting of 4 instruments. I I Probability functions Probabilities: if A is an event, P (A ) is its probability I Conditional probability Bayes’ Theorem Random variables Expectation & variance Distributions 0 ≤ P (A ) ≤ 1 Probability spaces Counting Probabilities Conditional probability A probability function (distribution) distributes a probability mass of 1 over the sample space Ω I If each of the boys can play all 4 instruments, how many different arrangements are possible? What if John and Jim can play all 4 instruments, but Jay and Jack can each play only piano and drums? Introduction to Probability Theory Bayes’ Theorem Random variables Expectation & variance Distributions Probability function: any function P : F → [0, 1], where: I P (Ω) = 1 I Aj ∈ F: P ( ∞ S j =1 Aj ) = ∞ P j =1 P (Aj ) (countable additivity) ... for disjoint sets: Aj ∩ Ak = ∅ for j , k i.e., the probability of any event Aj happening = sum of the probabilities of any individual event happening I e.g., P (roll = 1 ∪ roll = 2) = P (roll = 1) + P (roll = 2) 7 / 34 Example Introduction to Probability Theory 8 / 34 Additivity for non-disjoint sets Probability spaces Probability spaces Counting Counting Probabilities Toss a fair coin three times. What is the chance of exactly 2 heads coming up? I I Sample space Ω = {HHH,HHT,HTH,HTT,THH,THT,TTH,TTT} Event of interest A = {HHT,HTH,THH} Probabilities Conditional probability Conditional probability Bayes’ Theorem Bayes’ Theorem Random variables Expectation & variance Distributions P (A ∪ B ) = P (A ) + P (B ) − P (A ∩ B ) I The probability of unioning A and B requires adding up their individual probabilities I ... then subtracting out their intersection, so as not to double count that portion Since the coin is fair, we have a uniform distribution, i.e., each outcome is equally likely (1/8) I P (A ) = |A | |Ω| = Introduction to Probability Theory Random variables Expectation & variance Distributions 3 8 Conditional probability 9 / 34 10 / 34 Introduction to Probability Theory Introduction to Probability Theory Conditional probability Example Probability spaces Probability spaces Counting Counting Probabilities The conditional probability of an event A occurring given that event B has already occurred is notated as P (A |B ) I I Prior probability of A : P (A ) Posterior probability of A (after additional knowledge B): P (A |B ) (1) P (A |B ) = I I P (A ∩B ) P (B ) = Probabilities Conditional probability I Bayes’ Theorem Random variables Expectation & variance Distributions I P ( A ,B ) P (B ) I P (A , B ) (or P (AB )) is the joint probability In some sense, B has become the sample space I 11 / 34 A coin is flipped twice. If we assume that all four points in the sample space Ω = {(H , H ), (H , T ), (T , H ), (T , T )} are equally likely, what is the conditional probability that both flips result in heads given that the first flip does? Conditional probability Bayes’ Theorem Random variables Expectation & variance Distributions A = {(H , H )}, B = {(H , H ), (H , T )} P (A ) = 14 P (B ) = 24 P (A , B ) = P (A |B ) = 1 4 P (A ,B ) P (B ) = 1 4 2 4 = 1 2 12 / 34 The chain rule Introduction to Probability Theory The multiplication rule restates P (A |B ) = P (A ∩B ) : P (B ) (2) P (A ∩ B ) = P (A |B )P (B ) = P (B |A )P (A ) The chain rule (used in Markov models): Counting Probabilities Probabilities Conditional probability Bayes’ Theorem Random variables nT −1 i =1 I I I Two events are independent if knowing one does not affect the probability of the other Events A and B are independent if I Ai ) I i.e., to obtain the probability of events occurring: I Probability spaces Counting Distributions P (A1 )P (A2 |A1 )P (A3 |A1 ∩ A2 )...P (An | Introduction to Probability Theory Probability spaces Expectation & variance (3) P (A1 ∩ ... ∩ An ) = Independence Conditional probability Bayes’ Theorem Random variables Expectation & variance Distributions P (A ) = P (A |B ) i.e., P (A ∩ B ) = P (A )P (B ) i.e., probability of seeing A and B together = product of seeing each one individually, as one does not affect other select the first event select the second event, given the first ... select the nth event, given all the previous ones 13 / 34 Introduction to Probability Theory Independence 14 / 34 Bayes’ Theorem Introduction to Probability Theory Example Probability spaces Probability spaces Counting Counting Probabilities I I I fair die: event A = divisible by two; event B = divisible by three P(AB) = P({six}) = P(A) × P(B) = 1 2 × 1 6 1 3 = event C = divisible by four P(C) = P({four}) = I 1 6 P(AC) = P({four}) = P(A) × P(C) = × Bayes’ Theorem Distributions Bayes’ Theorem allows one to calculate P (B |A ) in terms of P (A |B ) (4) P (B |A ) = I 1 2 Bayes’ Theorem Random variables Expectation & variance 1 6 I I Probabilities Conditional probability Conditional probability 1 6 1 6 = P (A ∩B ) P (A ) = P (A |B )P (B ) P (A ) 15 / 34 16 / 34 Introduction to Probability Theory Partitioning Complement sets Probability spaces Probability spaces Counting Counting Probabilities Bayes’ Theorem takes into account the normalizing constant P (A ) Probabilities Conditional probability I Bayes’ Theorem Let E and F be events. Random variables Expectation & variance = P (A |B )P (B ) P (A ) Distributions c P (A |B )P (B ) P (A ) E = EF ∪ EF c Conditional probability Bayes’ Theorem Random variables Expectation & variance Distributions where EF and EF are mutually exclusive. If P (A ) is the same for every event of interest, and we want to find the value of B which maximizes the function: (6) arg maxB Distributions Introduction to Probability Theory Getting the most likely event P (A ∩B ) P (A ) Expectation & variance 1 12 Bayes (5) P (B |A ) = Random variables I P (E ) = P (EF ) + P (EF c ) = arg maxB P (A |B )P (B ) = P (E |F )P (F ) + P (E |F c )P (F c ) = P (E |F )P (F ) + P (E |F c )P (1 − P (F )) ... so, in these cases, we can ignore the denominator 17 / 34 18 / 34 Example Introduction to Probability Theory Example (2) Introduction to Probability Theory Probability spaces Probability spaces Counting Counting Probabilities I I Probabilities An insurance company groups people into two classes: Those who are accident-prone and those who are not. Conditional probability Their statistics show that an accident-prone person will have an accident within a fixed 1-year period with prob. 0.4. Whereas the prob. decreases to 0.2 for a non-accident-prone person. Random variables I Let us assume that 30 percent of the population are accident prone. I What is the prob. that a new policyholder will have an accident within a year of purchasing a policy? I Bayes’ Theorem Expectation & variance Distributions I Y = policyholder will have an accident within one year A = policyholder is accident prone Conditional probability Bayes’ Theorem Random variables Expectation & variance look for P(Y) Distributions I P (Y ) = P (Y |A )P (A ) + P (Y |A c )P (A c ) = 0.4 · 0.3 + 0.2 · 0.7 = 0.26 20 / 34 19 / 34 Partitioning Introduction to Probability Theory Example of Bayes’ Theorem Introduction to Probability Theory Probability spaces Counting Probabilities Let’s say we have i different, disjoint sets Bi , and these sets S partition A (i.e., A ⊆ i Bi ) Then, the following is true: (7) P (A ) = P i P (A ∩ Bi ) = P i Conditional probability Bayes’ Theorem Random variables Expectation & variance Distributions P (A |Bi )P (Bi ) P (A |Bj )P (Bj ) P (A ) = Given that we have pulled a red chip, what is the probability that it came from bowl B1 ? In other words, what is P (B1 |R )? I This gives us a more general form of Bayes’ Theorem: (8) P (Bj |A ) = Probability spaces Assume the following: I Bowl B1 (P (B1 ) = 1 ) has 2 red and 4 white chips 3 I Bowl B2 (P (B2 ) = 1 ) has 1 red and 2 white chips 6 I Bowl B3 (P (B3 ) = 1 ) has 5 red and 4 white chips 2 I I P (A |Bj )P (Bj ) P i P (A |Bi )P (Bi ) P (B1 ) = 13 P (R |B1 ) = 2 2+ 4 = Expectation & variance Distributions P (R ) = P (B1 ∩ R ) + P (B2 ∩ R ) + P (B3 ∩ R ) = P (R |B1 )P (B1 ) + P (R |B2 )P (B2 ) + P (R |B3 )P (B3 ) = P (R |B1 )P (B1 ) P (R ) = (1/3)(1/3) 4/9 = 4 9 1 4 21 / 34 22 / 34 Introduction to Probability Theory Random Variables Motivation Probability spaces Probability spaces Counting Counting Probabilities Probabilities Conditional probability If we roll 2 dice: Bayes’ Theorem ... {(4,6),(5,5),(6,4)} {(5,6),(6,5)} {(6,6)} Bayes’ Theorem Random variables Introduction to Probability Theory Mapping Outcomes to Numerical Values Outcomes {(1,1)} {(1,2),(2,1)} {(1,3),(2,2),(3,1)} {(1,4),(2,3),(3,2),(4,1)} Probabilities Conditional probability 1 3 So, we have: P (B1 |R ) = Random Variables Counting Numerical Value 2 3 4 5 Probability 10 11 12 1 12 1 18 1 36 1 36 1 18 1 12 1 9 Random variables Expectation & variance Distributions A random variable maps the set of outcomes Ω into the set of real numbers. I Formally, a random variable is a function X → Rn Conditional probability Bayes’ Theorem Random variables Expectation & variance Distributions Motivation for random variables: 23 / 34 I They abstract away from outcomes by putting outcomes into equivalence classes I Mapping to numerical values makes calculations easier. I They facilitate numerical manipulations, such as the definition of mean and standard deviation. 24 / 34 Example Introduction to Probability Theory Suppose that our experiment consists of tossing 3 fair coins. If we let Y denote the number of heads appearing, then Y is a random variable taking on one of the values 0, 1, 2, 3 with respective probabilities. Introduction to Probability Theory Probability spaces Probability spaces Counting Counting 1 P { Y = 0} = P ({(T , T , T )}) = 8 Probabilities Conditional probability Bayes’ Theorem Random variables Expectation & variance Distributions P { Y = 0} = P ({(T , T , T )}) = Example (2) 1 8 3 8 3 P { Y = 2} = P ({(H , H , T ), (H , T , H ), (T , H , H )}) = 8 1 P { Y = 3} = P ({(H , H , H )}) = 8 P { Y = 1} = P ({(T , T , H ), (T , H , T ), (H , T , T )}) = Probabilities Conditional probability 3 8 3 P { Y = 2} = P ({(H , H , T ), (H , T , H ), (T , H , H )}) = 8 1 P { Y = 3} = P ({(H , H , H )}) = 8 P { Y = 1} = P ({(T , T , H ), (T , H , T ), (H , T , T )}) = Bayes’ Theorem Random variables Expectation & variance Distributions Since Y must take on one of the values 0 through 3, 3 [ 1 = P( i =0 {Y = i }) = 3 X i =0 P {Y = i } 25 / 34 Expectation of a Random Variable Introduction to Probability Theory 26 / 34 Example Introduction to Probability Theory Probability spaces Probability spaces Counting Counting Probabilities If X is a discrete random variable with probability mass function p (a ), the expectation or expected value of X , denoted by E [X ], is defined by E [X ] = X x :p (x )>0 Probabilities Conditional probability Find E [X ] where X is the outcome when we roll a fair die. Bayes’ Theorem Random variables Expectation & variance Distributions x · p (x ) Bayes’ Theorem Random variables Solution Since p (1) = · · · = p (6) = Conditional probability Expectation & variance Distributions 1 6, 1 1 1 1 1 7 1 E [X ] = 1 · ( ) + 2 · ( ) + 3 · ( ) + 4 · ( ) + 5 · ( ) + 6 · ( ) = 6 6 6 6 6 6 2 In other words: E [X ] is the weighted average of the possible values of X , each value being weighted by the probability that X has that particular value. 28 / 34 27 / 34 Example (2) Introduction to Probability Theory Example (3) Probability spaces Probability spaces Counting Counting Probabilities Probabilities Conditional probability Bayes’ Theorem Assume we have the following weighted die: I p (X = 1) = p (X = 2) = p (X = 5) = I p (X = 3) = p (X = 4) = I p (X = 6) = 1 3 1 12 1 6 Introduction to Probability Theory Random variables Expectation & variance Distributions A school class of 120 students are driven in three buses to a symphonic performance. There are 36 students in one of the buses, 40 in another, and 44 in the third bus. When the buses arrive, one of the 120 students is randomly chosen. Let X be a random variable denoting the students on the bus of that randomly chosen student. Conditional probability Bayes’ Theorem Random variables Expectation & variance Distributions What is the expectation here? Task: Find E [X ] 29 / 34 30 / 34 Variance Introduction to Probability Theory Expectation alone ignores the question: I Do the values of a random variable tend to be consistent over many trials or do they tend to vary a lot? If X is a discrete random variable with mean µ, then the variance of X , denoted by Var (X ), is defined by Example Introduction to Probability Theory Probability spaces Probability spaces Calculate Var (X ) if X represents the outcome when a fair die is rolled. Counting Probabilities Conditional probability Bayes’ Theorem Random variables Expectation & variance Distributions 6 1 1 +5 ( ) + 62 ( ) 6 6 1 = ( )(91) 6 6 Probabilities Conditional probability Bayes’ Theorem Solution 1 1 1 1 E [X 2 ] = 12 ( ) + 22 ( ) + 32 ( ) + 42 ( ) 6 Counting Random variables Expectation & variance 6 Distributions 2 Var (X ) = E [(X − µ)2 ] = E (X 2 ) − (E (X ))2 Hence The standard deviation of X , denoted by σ(X ), is defined as σ(X ) = q Var (X ) = Var (X ) 91 7 35 − ( )2 = 6 2 12 31 / 34 Example for expectation and variance Introduction to Probability Theory Probability spaces Counting When we roll two dice, what is the expectation and the variance for the sum of the numbers on the two dice? 32 / 34 Joint distributions (9) (10) = = x P x Counting Probabilities I Conditional probability Joint pmf: The probability of both x and y happening Random variables Expectation & variance Distributions Expectation & variance I Marginal pmfs: The probability of x happening is the sum of the occurrences of x with all the different y’s (12) pX (x ) = (13) pY (y ) = p (x )(x − E (X ))2 Conditional probability Bayes’ Theorem (11) p (x , y ) = P (X = x , Y = y ) Random variables Var (X ) = E ((X − E (X ))2 ) P Probability spaces With (discrete) random variables, we can define: Probabilities Bayes’ Theorem E (X ) = E (Y + Y ) = E (Y ) + E (Y ) = 3.5 + 3.5 = 7 Introduction to Probability Theory P y p (x , y ) x p (x , y ) P Distributions If X and Y are independent, then p (x , y ) = pX (x )pY (y ), so, e.g., the probability of rolling two sixes is: p (x )(x − 7)2 = 5 65 I 33 / 34 p (X = 6, Y = 6) = p (X = 6)p (Y = 6) = ( 16 )( 16 ) = 1 36 34 / 34