Math 425 Introduction to Probability, Lecture 36
Kenneth Harris, Department of Mathematics, University of Michigan
April 18, 2009

Bounding Probabilities

Chebyshev's Inequality

A measure of the concentration of a random variable X near its mean µ is its variance σ². Chebyshev's Inequality says that the probability that X lies outside an arbitrary interval (µ − ε, µ + ε) is negligible, provided the ratio σ²/ε² is sufficiently small.

Proposition (Chebyshev's Inequality). If X is a random variable with finite mean µ and variance σ², then for any value ε > 0,

    P{|X − µ| ≥ ε} ≤ σ²/ε².

Proof of Chebyshev's Inequality

Fix ε > 0. For clarity we take X to be continuous with density f(x). Compute the variance Var(X) = σ²:

    σ² = ∫_{−∞}^{∞} (x − µ)² f(x) dx
       ≥ ∫_{|x−µ|≥ε} (x − µ)² f(x) dx          (the integrand is nonnegative)
       ≥ ε² ∫_{|x−µ|≥ε} f(x) dx                ((x − µ)² ≥ ε² on this region)
       = ε² · P{|X − µ| ≥ ε}.

So indeed,

    P{|X − µ| ≥ ε} ≤ σ²/ε².

[Figure: the density f(x), with the center region {|X − µ| < ε} between µ − ε and µ + ε and the two tail regions {|X − µ| ≥ ε}.]

Note. The proof for discrete X is exactly the same, but replacing the integral "∫" with the sum "Σ" and the density "f(x)" with the mass function "p(x)".

Example

Chebyshev's inequality is quite crude, and we will see examples where it provides a poor bound. However:

1. Chebyshev's inequality provides a bound for any distribution whatsoever, and can therefore be used even when nothing about the distribution is known except its statistics (mean and variance).
2. Chebyshev's inequality is the best we can do when only the statistics of the distribution are known (by the next example).
3. Chebyshev's inequality is still useful enough to prove very general and powerful theorems, such as the Weak Law of Large Numbers. These results do not depend on any finer determination of the probability of the "tail" of the distribution.

Example. Chebyshev's inequality is the best possible inequality, in that there are random variables for which the inequality is in fact an equality. Fix ε > 0 and choose X with distribution

    p_X(−ε) = 1/2,    p_X(ε) = 1/2.

So µ = E[X] = 0 and σ² = Var(X) = ε², and thus

    1 = P{|X − µ| ≥ ε} ≤ σ²/ε² = 1.

Note: this distribution cuts out the center (µ − ε, µ + ε) and places the entire mass of the distribution on the tail, at ±σ.
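A minimal sketch, not part of the original notes, showing the inequality numerically: a Monte Carlo check of P{|X − µ| ≥ ε} ≤ σ²/ε². The Exponential(1) test distribution, the sample size, and the helper name chebyshev_check are illustrative choices, not anything from the lecture.

```python
# Monte Carlo check of Chebyshev's inequality P{|X - mu| >= eps} <= sigma^2/eps^2.
import random

def chebyshev_check(sample, eps):
    """Return the empirical tail probability and the Chebyshev bound."""
    n = len(sample)
    mu = sum(sample) / n
    var = sum((x - mu) ** 2 for x in sample) / n
    tail = sum(1 for x in sample if abs(x - mu) >= eps) / n
    return tail, var / eps ** 2

random.seed(0)
data = [random.expovariate(1.0) for _ in range(100_000)]   # mean 1, variance 1
for eps in (1.0, 2.0, 3.0):
    tail, bound = chebyshev_check(data, eps)
    print(f"eps={eps}: tail probability {tail:.4f} <= bound {bound:.4f}")

# For the two-point distribution p(-eps) = p(eps) = 1/2 above, sigma^2 = eps^2,
# so the bound is 1 and the tail probability is also 1: equality holds.
```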
Example. Let X be a random variable with mean µ and variance σ² = 0. By Chebyshev's inequality, the probability that X lies outside the interval (µ − ε, µ + ε) equals zero for any ε > 0:

    P{|X − µ| ≥ ε} ≤ 0/ε² = 0.

So the event {X = µ} has probability 1. Put another way: if X is a random variable for which E[X²] = E[X]², then Var(X) = E[X²] − E[X]² = 0, and thus P{X = E[X]} = 1.

Example. Chebyshev's inequality provides a bound on the probability that a random variable lies more than kσ (k standard deviations) from its mean. Let X be any random variable with mean µ and variance σ². Use ε = kσ in Chebyshev's inequality:

    P{|X − µ| ≥ kσ} ≤ σ²/(kσ)² = 1/k².

Here is a comparison of Chebyshev's bound with the known probability for a normally distributed random variable:

    k    Chebyshev (= 1/k²)    Normal
    1    1                     0.3173
    2    0.25                  0.0455
    3    0.111                 0.0027
    4    0.0625                0.0000633

Example. Let X be the number of heads obtained in tossing a fair coin 100 times. Find a bound on P{41 ≤ X ≤ 59}. The statistics for X are

    E[X] = 100 · (1/2) = 50,    Var(X) = 100 · (1/2) · (1/2) = 25.

Since X is integer-valued, use Chebyshev's inequality to obtain the bound

    P{41 ≤ X ≤ 59} = 1 − P{|X − 50| ≥ 10} ≥ 1 − 25/10² = 0.75.

Using the standard normal approximation to the binomial distribution (with the continuity correction):

    P{41 ≤ X ≤ 59} = P{40.5 ≤ X ≤ 59.5} ≈ Φ((59.5 − 50)/5) − Φ((40.5 − 50)/5) ≈ 0.9426.

The actual probability is about 0.9431.
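A minimal sketch, not part of the original notes, that reproduces the three numbers in the coin example: the Chebyshev bound, the normal approximation with continuity correction, and the exact binomial probability. The helper name Phi is an illustrative choice.

```python
from math import comb, erf, sqrt

n, p = 100, 0.5
mu, var = n * p, n * p * (1 - p)                  # 50 and 25
sd = sqrt(var)                                    # 5

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(x / sqrt(2)))

# Chebyshev: P{41 <= X <= 59} = 1 - P{|X - 50| >= 10} >= 1 - 25/100.
chebyshev = 1 - var / 10 ** 2

# Normal approximation with the continuity correction.
normal_approx = Phi((59.5 - mu) / sd) - Phi((40.5 - mu) / sd)

# Exact binomial probability P{41 <= X <= 59}.
exact = sum(comb(n, k) for k in range(41, 60)) * 0.5 ** n

print(chebyshev, normal_approx, exact)            # 0.75  ~0.9426  ~0.9431
```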
Sample Mean

Problem

Problem. We wish to determine the value of an unknown quantity X by making a measurement. However, we expect some variation in the measurement due to measuring inaccuracies. Here are some examples:

- X is the unknown bias of a coin.
- X is the angle of reflection in the collision of billiard balls.
- X is the average score on the final exam of a typical class in probability theory.

Let X have mean µ and variance σ². Chebyshev's inequality puts a bound on the accuracy ε of a single measurement of X:

    P{|X − µ| < ε} ≥ 1 − σ²/ε².

If the variation σ is very small compared to the desired accuracy ε, then this probability will be very close to 1. That is, we can be "almost certain" that X − ε ≤ µ ≤ X + ε. If σ is not small compared to ε, then a single measurement cannot provide the desired accuracy with any certainty. We must either give up accuracy on a single measurement, or else make a sufficiently large number of independent measurements.

A powerful consequence of Chebyshev's inequality is that if we average sufficiently many independent measurements of X, then the sample mean X̄ will, with high probability, provide the desired accuracy in the measurement: X̄ − ε ≤ µ ≤ X̄ + ε. The reason is that X̄ has the same mean µ but the smaller variance σ²/n. Sampling reduces variation! This result is the Weak Law of Large Numbers.

Definition (IID random variables). A finite (or infinite) sequence of random variables X₁, X₂, . . . is said to be independent and identically distributed (IID) if the variables are mutually independent and each has the same distribution function:

    P{Xᵢ ≤ x} = F(x)    for all i and x.

Note that IID random variables have the same statistics: they share a common mean µ and variance σ², with E[Xᵢ] = µ and Var(Xᵢ) = σ² for all i.

Mean and variance of the sample mean

Proposition. Let X₁, X₂, . . . be IID random variables with finite common mean µ and variance σ². Define the sum Sₙ and the average Aₙ for each n by

    Sₙ = X₁ + X₂ + . . . + Xₙ,    Aₙ = Sₙ/n = (X₁ + X₂ + . . . + Xₙ)/n.

Then

    E[Sₙ] = nµ,    Var(Sₙ) = nσ²,    E[Aₙ] = µ,    Var(Aₙ) = σ²/n.

Note: Aₙ is called the sample mean of X₁, . . . , Xₙ by statisticians and is often denoted X̄.

Proof. By the linearity property of expectation (no independence required):

    E[Sₙ] = E[X₁] + . . . + E[Xₙ] = nµ,    E[Aₙ] = E[Sₙ/n] = (1/n) E[Sₙ] = µ.

Since X₁, . . . , Xₙ are independent, their variances add:

    Var(Sₙ) = Var(X₁) + . . . + Var(Xₙ) = nσ²,    Var(Aₙ) = Var(Sₙ/n) = (1/n²) Var(Sₙ) = σ²/n.

Example. Let X₁, X₂, . . . , Xₙ be the indicator variables for the individual trials in a Bernoulli trials process with probability 0.3 of success. So

    E[Xᵢ] = 0.3,    Var(Xᵢ) = (0.3)(0.7) = 0.21.

The sample average of the Xᵢ is Aₙ = (X₁ + X₂ + . . . + Xₙ)/n, so

    E[Aₙ] = 0.3,    Var(Aₙ) = 0.21/n.

Chebyshev's inequality with ε = 0.1 provides

    P{0.2 ≤ Aₙ ≤ 0.4} ≥ 1 − 0.21/(n(0.1)²) = (n − 21)/n.

Bounds for n = 50, n = 100 and n = 1000:

    P{0.2 ≤ A₅₀ ≤ 0.4}   ≥ 0.58
    P{0.2 ≤ A₁₀₀ ≤ 0.4}  ≥ 0.79
    P{0.2 ≤ A₁₀₀₀ ≤ 0.4} ≥ 0.979

The actual values are

    P{0.2 ≤ A₅₀ ≤ 0.4}   ≈ 0.836347
    P{0.2 ≤ A₁₀₀ ≤ 0.4}  ≈ 0.962549
    P{0.2 ≤ A₁₀₀₀ ≤ 0.4} ≈ 1.

Example. Suppose we choose at random n numbers in the interval [0, 1] with the uniform distribution. Let Xᵢ be the ith choice. Then

    µ = E[Xᵢ] = ∫₀¹ x dx = 1/2,    σ² = Var(Xᵢ) = ∫₀¹ x² dx − µ² = 1/3 − 1/4 = 1/12.

Let Aₙ = (X₁ + . . . + Xₙ)/n be the sample average. Then

    E[Aₙ] = 1/2,    Var(Aₙ) = 1/(12n).

By Chebyshev's Inequality, for any ε > 0,

    P{|Aₙ − 1/2| < ε} ≥ 1 − 1/(12nε²).

This says that if we choose n numbers at random from [0, 1], then the chances are better than 1 − 1/(12nε²) that the average of the chosen values differs from 1/2 by less than ε. Suppose ε = 0.1. Then

    P{|Aₙ − 1/2| < 0.1} ≥ 1 − 100/(12n).

For n = 100 the probability is about 0.92, for n = 1000 it is about 0.99, and for n = 10,000 it is about 0.999.

Example. Suppose we choose at random n numbers using the standard normal distribution. Let Xᵢ be the ith choice. Then

    µ = E[Xᵢ] = 0,    σ² = Var(Xᵢ) = 1.

Let Aₙ = (X₁ + . . . + Xₙ)/n be the sample average. This is a normal distribution with mean 0 and variance 1/n, so

    E[Aₙ] = 0,    Var(Aₙ) = 1/n.

By Chebyshev's Inequality, for any ε > 0,

    P{|Aₙ − 0| ≥ ε} ≤ 1/(nε²).

For ε = 0.1 Chebyshev gives

    P{|Aₙ| ≥ 0.1} ≤ 100/n.

Here is a comparison of the actual probabilities (Aₙ is normally distributed with mean 0 and variance n⁻¹) with the Chebyshev estimates:

    n       P{|Aₙ| ≥ 0.1}    Chebyshev
    100     0.3173           1.0000
    200     0.1573           0.5000
    300     0.0833           0.3333
    400     0.0455           0.2500
    500     0.0254           0.2000
    600     0.0143           0.1667
    700     0.0082           0.1429
    800     0.0047           0.1250
    900     0.0027           0.1111
    1000    0.0016           0.1000
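A minimal sketch, not part of the original notes, reproducing the table above: for Aₙ normal with mean 0 and variance 1/n, the exact tail probability is P{|Aₙ| ≥ 0.1} = 2(1 − Φ(0.1√n)), versus the Chebyshev bound 100/n. The helper name Phi is an illustrative choice.

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(x / sqrt(2)))

eps = 0.1
for n in range(100, 1001, 100):
    exact = 2 * (1 - Phi(eps * sqrt(n)))          # exact normal tail probability
    bound = min(1.0, 1 / (n * eps ** 2))          # Chebyshev bound, capped at 1
    print(f"{n:5d}   exact {exact:.4f}   Chebyshev {bound:.4f}")
```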
Weak Law of Large Numbers

Theorem (The Weak Law of Large Numbers). Let X₁, X₂, . . . be a sequence of IID random variables with finite mean E[Xᵢ] = µ and finite variance Var(Xᵢ) = σ². Let Aₙ = (X₁ + . . . + Xₙ)/n be the sample average. For any ε > 0,

    P{|Aₙ − µ| ≥ ε} → 0    as n → ∞.

Equivalently,

    P{|Aₙ − µ| < ε} → 1    as n → ∞.

Proof. Let X₁, X₂, . . . be IID. Then for the sample average Aₙ = (X₁ + . . . + Xₙ)/n,

    E[Aₙ] = µ,    Var(Aₙ) = σ²/n.

By Chebyshev's Inequality, for any fixed ε > 0,

    P{|Aₙ − µ| ≥ ε} ≤ σ²/(nε²).

Thus, for fixed ε, P{|Aₙ − µ| ≥ ε} → 0 as n → ∞, or equivalently, P{|Aₙ − µ| < ε} → 1 as n → ∞.

Measurements

We want to make a measurement X with expected value µ and with some accuracy ε, where each measurement randomly varies with variance σ². We also want a high degree of certainty p of accuracy:

    P{X̄ − ε ≤ µ ≤ X̄ + ε} ≥ p.

By Chebyshev's inequality, choose n sufficiently large so that

    1 − p ≥ σ²/(nε²),    equivalently    n ≥ σ²/((1 − p)ε²).

To increase the accuracy from ε to ε/k, the number of trials must increase by a factor of k², since

    σ²/((1 − p)(ε/k)²) = k² · σ²/((1 − p)ε²).

To increase the accuracy by one decimal place, multiply the number of trials by 100.

Example: Coin Tossing

[Figure: plots of the mass function of the sample average Aₙ of a fair coin tossed n = 10, 50, 100 and 200 times.] The mean of Aₙ is µ = 0.5. Note that the larger n is, the greater the percentage of the mass contained in the interval (0.45, 0.55), as predicted by the Law of Large Numbers.

Example: Random Numbers

[Figure: plots of the density of the sample average Aₙ of n = 5, 10, 20 and 30 reals uniformly chosen in [0, 1].] The mean of Aₙ is µ = 0.5. Again, the larger n is, the greater the percentage of the area contained in the interval (0.45, 0.55), as predicted by the Law of Large Numbers.
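A minimal sketch, not part of the original notes, that imitates the coin-tossing plots described above by simulation: the fraction of sample averages of n fair coin tosses landing between 0.45 and 0.55 grows toward 1 as n increases. The number of repetitions (20,000) and the helper name mass_near_half are illustrative choices.

```python
import random

def mass_near_half(n, reps=20_000):
    """Monte Carlo estimate of P{0.45 <= A_n <= 0.55} for n fair coin tosses."""
    hits = 0
    for _ in range(reps):
        heads = sum(random.random() < 0.5 for _ in range(n))
        if 0.45 <= heads / n <= 0.55:
            hits += 1
    return hits / reps

random.seed(0)
for n in (10, 50, 100, 200):
    print(n, mass_near_half(n))
```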
Example: Coin tossing

Consider a Bernoulli trials process with IID indicator variables X₁, X₂, . . . denoting whether each trial was a success or a failure. Suppose the probability of success is p. So

    E[Xᵢ] = p,    Var(Xᵢ) = p(1 − p).

Let Aₙ = (X₁ + . . . + Xₙ)/n be the sample average over n trials. So

    E[Aₙ] = p,    Var(Aₙ) = p(1 − p)/n.

By Chebyshev's inequality, for any ε > 0,

    P{|Aₙ − p| ≥ ε} ≤ p(1 − p)/(nε²).

The variance σ² = p(1 − p) has a maximum value of 1/4, achieved at p = 1/2. [Figure: plot of p(1 − p) on [0, 1], peaking at 1/4 when p = 1/2.] Plugging this back into Chebyshev's inequality gives a bound on the deviation of the sample average from the mean that does not depend on p:

    P{|Aₙ − p| ≥ ε} ≤ p(1 − p)/(nε²) ≤ 1/(4nε²).

Example. We have two coins: one is fair and the other produces heads with probability 3/4. One coin is picked at random. How many tosses suffice for us to be 95 percent sure of which coin we have? To make this problem more concrete: if the proportion of heads is less than 0.625, we will guess the fair coin; if the proportion of heads is greater than 0.625, we will guess the biased coin. How many tosses will suffice for 95 percent certainty that the sample proportion for the chosen coin does not deviate by more than ε = 0.125 from its mean?

We have the following bound on the deviation of the sample average Aₙ from the mean p using Chebyshev's inequality:

    P{|Aₙ − p| ≥ ε} ≤ 1/(4nε²).

We want n large enough so that we have at most a 5% chance of error:

    1/(4nε²) ≤ 0.05,    equivalently    n ≥ 1/(4(0.05)ε²).

We now have a bound on the number of trials needed without needing to know the mean or the variance. For ε = 0.125 choose n so that

    n ≥ 1/(4(0.05)(0.125)²) = 320.

By tossing the coin n ≥ 320 times we can be 95% certain the sample average is within 0.125 of the true bias p of the coin toward heads:

    P{|Aₙ − p| ≥ 0.125} ≤ 0.05.

Toss the coin 320 times and count the heads. If fewer than 200 heads appear, guess the fair coin. If more than 200 heads appear, guess the biased coin. If exactly 200 heads appear, then laugh at your (bad?) luck. You can be 95 percent certain you chose the right coin.

Degree of Certainty vs. Number of Trials

To achieve certainty p that we are within ε = 0.125 of the mean requires n trials, where

    n ≥ 1/(4(1 − p)(0.125)²).

    Degree of Certainty    Number of Trials
    50%                    32
    75%                    64
    90%                    160
    95%                    320
    99%                    1600
    99.9%                  16,000
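A minimal sketch, not part of the original notes, reproducing the table above: the number of tosses n ≥ 1/(4(1 − p)ε²) required by the distribution-free Chebyshev bound for accuracy ε = 0.125 and certainty p. The helper name trials_needed is an illustrative choice; certainties are passed as strings so that Fraction keeps the arithmetic exact.

```python
from fractions import Fraction
from math import ceil

def trials_needed(certainty, eps=Fraction(1, 8)):
    """Smallest n with 1/(4 n eps^2) <= 1 - certainty, computed exactly."""
    return ceil(1 / (4 * (1 - Fraction(str(certainty))) * eps ** 2))

for certainty in ("0.50", "0.75", "0.90", "0.95", "0.99", "0.999"):
    print(f"{float(certainty):7.1%}   {trials_needed(certainty):6d} tosses")
```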