Advanced Algorithms – COMS31900, 2013/2014
Lecture 1: Mathematical preliminaries
Markus Jalsenius

Probability

A sample space S is the set of outcomes of an experiment.
For x ∈ S, the probability of x, written Pr(x), is a real number between 0 and 1, such that Σ_{x ∈ S} Pr(x) = 1.

EXAMPLE  The outcome of a die roll: S = {1, 2, 3, 4, 5, 6}, with
Pr(1) = Pr(2) = Pr(3) = Pr(4) = Pr(5) = Pr(6) = 1/6.

EXAMPLE  Flip a coin: S = {H, T}, with Pr(H) = Pr(T) = 1/2.

EXAMPLE  Amount of money you can win when playing some lottery:
S = {£0, £10, £100, £1000, £10,000, £100,000}, with
Pr(£0) = 0.9, Pr(£10) = 0.08, ..., Pr(£100,000) = 0.0001.

Event

An event is a subset V of the sample space S. The probability of event V happening, denoted Pr(V), is
    Pr(V) = Σ_{x ∈ V} Pr(x).

OBSERVE  The sample space is not necessarily finite.

EXAMPLE  Flip a coin until the first tail shows up:
S = {T, HT, HHT, HHHT, HHHHT, HHHHHT, ...}.
Pr(it takes n coin flips) = (1/2)^n, and Σ_{n=1}^{∞} (1/2)^n = 1.

EXAMPLE  Flip a coin 3 times. There are 8 outcomes, each with probability 1/8.
Event V: the first and last coin flips are the same.
Pr(V) = Pr(HHH) + Pr(HTH) + Pr(THT) + Pr(TTT) = 4 · 1/8 = 1/2.

Random variable

A random variable (r.v.) Y over sample space S is a function S → R (a mapping from every outcome to the real numbers). The probability of Y taking value y is
    Pr(Y = y) = Σ_{x ∈ S : Y(x) = y} Pr(x).

EXAMPLE  Two coin flips, with Y(HH) = 2, Y(HT) = 1, Y(TH) = 5, Y(TT) = 2.
Then Pr(Y = 2) = 1/2.

Expectation

The expected value (the mean) of a r.v. Y, denoted E(Y), is
    E(Y) = Σ_{x ∈ S} Y(x) · Pr(x).

EXAMPLE  What is the expected sum of two dice rolls? Summing over all possible outcomes:
    E(sum) = 2 · 1/36 + 3 · 2/36 + 4 · 3/36 + ... + 12 · 1/36 = 7.

Linearity of expectation

THEOREM  Let Y1, Y2, ..., Yk be k random variables. Then
    E(Σ_{i=1}^{k} Yi) = Σ_{i=1}^{k} E(Yi).

OBSERVE  Linearity of expectation always holds, regardless of whether the random variables Yi are dependent or not.
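The dice example is small enough to check exhaustively. The Python snippet below (an added illustration, not part of the original slides) computes the expected sum directly over all 36 outcomes, via linearity of expectation, and for a pair of "coupled" dice that always show the same value, illustrating that linearity does not require independence:

```python
from fractions import Fraction
from itertools import product

# Two independent fair dice: 36 equally likely outcomes.
p = Fraction(1, 36)
e_sum = sum((a + b) * p for a, b in product(range(1, 7), repeat=2))

# Linearity of expectation: E(die1 + die2) = E(die1) + E(die2).
e_die = sum(Fraction(v, 6) for v in range(1, 7))  # mean of one die = 7/2

# Coupled dice: both always show the same value v, so the sum is 2v.
e_coupled = sum(2 * v * Fraction(1, 6) for v in range(1, 7))

print(e_sum, e_die + e_die, e_coupled)  # 7 7 7
```

All three computations give 7, even though the coupled dice are fully dependent.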
(Back to the two-coin-flip example: E(Y) = 2 · 1/2 + 1 · 1/4 + 5 · 1/4 = 7/2.)

EXAMPLE  Couple the dice so that their outcomes are always the same. Individually, they are still normal dice, and
    E(sum) = 2 · 1/6 + 4 · 1/6 + 6 · 1/6 + 8 · 1/6 + 10 · 1/6 + 12 · 1/6 = 7.

Indicator random variables

An indicator random variable is a r.v. that can only take the values 0 and 1. We usually use the letter I to emphasise that the r.v. is an indicator r.v.
Property: E(I) = 0 · Pr(I = 0) + 1 · Pr(I = 1) = Pr(I = 1).
Often an indicator r.v. I is associated with an event, such that I = 1 if the event happens and I = 0 otherwise.
Indicator random variables and linearity of expectation work great together!

EXAMPLE  Roll a die n times. How many die rolls do we expect to show a value that is at least the value of the previous roll?
For j ∈ {2, ..., n}, let indicator r.v. Ij = 1 if the value of the j-th roll is at least the value of the previous roll, and Ij = 0 otherwise. Then Pr(Ij = 1) = 7/12 (verify this!). Using linearity of expectation,
    E(Σ_{j=2}^{n} Ij) = Σ_{j=2}^{n} E(Ij) = (n − 1) · 7/12.

Mean

EXAMPLE  Flip a coin n times and let r.v. T be the number of tails, so E(T) = n/2. How likely is T to be within 5% of n/2?

    n         Chance
    10        24%
    100       38%
    200       52%
    1000      89%
    100,000   (100 − 2.6 · 10^−54)%

OBSERVE  The sample average of independent, identically distributed variables converges in probability towards their mean (Central Limit Theorem).

Markov's inequality

EXAMPLE  Suppose the average (mean) speed on the motorway is 60 mph. Then at most 1/2 of all cars run at 120 mph or more, and at most 2/3 of all cars run at 90 mph or more; otherwise the mean would have to be higher than 60 mph.

THEOREM  If X is a non-negative r.v., then for all a > 0,
    Pr(X ≥ a) ≤ E(X)/a.
Go through the proof as an exercise.
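The "verify this!" claim Pr(Ij = 1) = 7/12 in the die-roll example above amounts to counting ordered pairs (previous roll, current roll) in which the current value is at least the previous one. A short Python check (an added sketch, not from the slides; the choice n = 100 is just for illustration):

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (previous roll, current roll) pairs.
pairs = list(product(range(1, 7), repeat=2))
favourable = sum(1 for prev, cur in pairs if cur >= prev)

p = Fraction(favourable, len(pairs))
print(p)  # 7/12

# Expected number of such rolls among n = 100 rolls, by linearity:
n = 100
print((n - 1) * p)  # 231/4, i.e. 57.75
```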
EXAMPLE  n people go to a party, leaving their hats at the door. Each person leaves with a random hat. How many people do we expect to leave with their own hat?
For j ∈ {1, ..., n}, let indicator r.v. Ij = 1 if the j-th person gets their own hat, and Ij = 0 otherwise. Indicator random variables and linearity of expectation come in handy again:
    E(Σ_{j=1}^{n} Ij) = Σ_{j=1}^{n} E(Ij) = Σ_{j=1}^{n} Pr(Ij = 1) = n · 1/n = 1.
By Markov's inequality (recall: Pr(X ≥ a) ≤ E(X)/a),
    Pr(5 or more people leave with their own hats) ≤ 1/5,
    Pr(at least 1 person leaves with their own hat) ≤ 1/1 = 1.
Sometimes Markov's inequality is not particularly informative. (In fact, here it can be shown that as n → ∞, the probability that at least one person leaves with their own hat tends to 1 − 1/e.)

EXAMPLE  From the motorway example above:
    Pr(speed of a random car ≥ 120 mph) ≤ 60/120 = 1/2,
    Pr(speed of a random car ≥ 90 mph) ≤ 60/90 = 2/3.

COROLLARY  If X is a non-negative r.v. that only takes integer values, then
    Pr(X > 0) = Pr(X ≥ 1) ≤ E(X)
(apply Markov's inequality with a = 1).

EXAMPLE  The probability of winning money on the lottery is at most the expected gain.

OBSERVE  For an indicator r.v. I, the bound is tight, since Pr(I > 0) = E(I).

Union bound

EXAMPLE  S = {1, ..., 6} is the set of outcomes of a die roll. Consider three events:
V1 = {3, 4, 5} (between 3 and 5), V2 = {1, 2, 3} (3 or less), V3 = {2, 3, 5} (a prime).
    Pr(V1 or V2 or V3) = 5/6 ≤ Pr(V1) + Pr(V2) + Pr(V3) = 3/2.
The right-hand side double counts Pr(2) and Pr(5), and triple counts Pr(3).

THEOREM  Let V1, ..., Vk be k events. Then
    Pr(∪_{i=1}^{k} Vi) ≤ Σ_{i=1}^{k} Pr(Vi).

OBSERVE  The bound is tight when the events are all disjoint.

PROOF  Let indicator r.v. Ij = 1 if event Vj happens, and Ij = 0 otherwise, and let X = Σ_{j=1}^{k} Ij be the number of events that happen. Then
    Pr(∪_{j=1}^{k} Vj) = Pr(X > 0) ≤ E(X) = Σ_{j=1}^{k} E(Ij) = Σ_{j=1}^{k} Pr(Vj),
using the previous corollary for the inequality and linearity of expectation for the step after it.
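The union-bound example with the three die events can be replayed directly; the snippet below (an added illustration, not part of the slides) uses exact fractions for the uniform die:

```python
from fractions import Fraction

# The three events from the die-roll example.
V1, V2, V3 = {3, 4, 5}, {1, 2, 3}, {2, 3, 5}

def pr(event):
    # Uniform die: each of the 6 outcomes has probability 1/6.
    return Fraction(len(event), 6)

lhs = pr(V1 | V2 | V3)           # probability of the union
rhs = pr(V1) + pr(V2) + pr(V3)   # the union-bound upper estimate

print(lhs, rhs)  # 5/6 3/2
assert lhs <= rhs
```

The gap between 5/6 and 3/2 is exactly the double-counted Pr(2) and Pr(5) and the triple-counted Pr(3).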
Binomial coefficients

The number of ways to choose k elements from a set of n elements is
    (n choose k) = n! / (k! (n − k)!).

LEMMA  (n/k)^k ≤ (n choose k) ≤ (n·e/k)^k.

PROOF (lower bound)  The number of ways of picking k balls from n balls, with repetition allowed, is n^k. One can generate all of these ways by first deciding which k of the n balls may be drawn, and then drawing from the k selected balls instead (this will indeed overcount the number of ways). There are (n choose k) ways to choose the k balls and k^k ways to pick from the selected k balls with repetition allowed. This gives us
    n^k ≤ (n choose k) · k^k   ⟺   (n/k)^k ≤ (n choose k).

PROOF (upper bound)
    (n choose k) ≤ n^k / k! = (n^k / k^k) · (k^k / k!) ≤ (n^k / k^k) · Σ_{i=0}^{∞} k^i / i! = (n^k / k^k) · e^k = (n·e/k)^k,
where the second inequality holds because k^k / k! is one term of the series Σ_{i=0}^{∞} k^i / i! = e^k.
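The lemma's bounds can be spot-checked numerically with math.comb; the snippet below (an added check, not from the slides) verifies (n/k)^k ≤ (n choose k) ≤ (n·e/k)^k for all k ≤ n up to n = 40:

```python
from math import comb, e

# Verify (n/k)**k <= C(n, k) <= (n*e/k)**k over a small range.
for n in range(1, 41):
    for k in range(1, n + 1):
        c = comb(n, k)
        assert (n / k) ** k <= c <= (n * e / k) ** k, (n, k)
print("bounds hold for all n <= 40")
```

Note the lower bound is tight at k = 1 and k = n, matching the overcounting argument above.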