Lecture 17: The Law of Large Numbers and the Monte-Carlo method

The Law of Large Numbers

Suppose we perform an experiment and a measurement encoded in the random variable $X$, and that we repeat this experiment $n$ times, each time under the same conditions and each time independently of the others. We thus obtain $n$ independent copies of the random variable $X$, which we denote
\[
X_1, X_2, \cdots, X_n.
\]
Such a collection of random variables is called an IID sequence of random variables, where IID stands for independent and identically distributed. This means that the random variables $X_i$ all have the same probability distribution; in particular they all have the same mean and variance,
\[
E[X_i] = \mu, \quad \mathrm{var}(X_i) = \sigma^2, \quad i = 1, 2, \cdots, n.
\]
Each $X_i$ provides a (random) measurement, and the average value
\[
\frac{X_1 + \cdots + X_n}{n}
\]
is called the empirical average. The Law of Large Numbers states that for large $n$ the empirical average is very close to the expected value $\mu$ with very high probability.

Theorem 1. Let $X_1, \cdots, X_n$ be IID random variables with $E[X_i] = \mu$ and $\mathrm{var}(X_i) = \sigma^2$ for all $i$. Then for any $\varepsilon > 0$ we have
\[
P\left( \left| \frac{X_1 + \cdots + X_n}{n} - \mu \right| \ge \varepsilon \right) \le \frac{\sigma^2}{n \varepsilon^2}.
\]
In particular the right-hand side goes to $0$ as $n \to \infty$.

Proof. The proof of the law of large numbers is a simple application of Chebyshev's inequality to the random variable $\frac{X_1 + \cdots + X_n}{n}$. Indeed, by the properties of expectation we have
\[
E\left[ \frac{X_1 + \cdots + X_n}{n} \right] = \frac{1}{n} E[X_1 + \cdots + X_n] = \frac{1}{n} \left( E[X_1] + \cdots + E[X_n] \right) = \frac{1}{n} n\mu = \mu.
\]
For the variance we use that the $X_i$ are independent, and so we have
\[
\mathrm{var}\left( \frac{X_1 + \cdots + X_n}{n} \right) = \frac{1}{n^2} \mathrm{var}(X_1 + \cdots + X_n) = \frac{1}{n^2} \left( \mathrm{var}(X_1) + \cdots + \mathrm{var}(X_n) \right) = \frac{\sigma^2}{n}.
\]
By Chebyshev's inequality we then obtain
\[
P\left( \left| \frac{X_1 + \cdots + X_n}{n} - \mu \right| \ge \varepsilon \right) \le \frac{\sigma^2}{n \varepsilon^2}.
\]

Coin flip I: Suppose we flip a fair coin 100 times. How likely is it that we obtain between 40 and 60 heads? We consider the random variable $X$ which is $1$ if the coin lands on heads and $0$ otherwise. We have $\mu = E[X] = 1/2$ and $\sigma^2 = \mathrm{var}(X) = 1/4$, and by Chebyshev
\[
P(\text{between 40 and 60 heads}) = P(40 \le X_1 + \cdots + X_{100} \le 60) = 1 - P\left( \left| \frac{X_1 + \cdots + X_{100}}{100} - \frac{1}{2} \right| > \frac{1}{10} \right)
\]
\[
\ge 1 - P\left( \left| \frac{X_1 + \cdots + X_{100}}{100} - \frac{1}{2} \right| \ge \frac{1}{10} \right) \ge 1 - \frac{1/4}{100\,(1/10)^2} = \frac{3}{4}.
\]
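The Chebyshev bound is quite conservative. As a quick check, here is a minimal simulation sketch in Python (the language and the function names are my choice, not part of the lecture); it suggests the true probability is around $0.96$, comfortably above the bound $3/4$.

```python
import random

def count_heads(n_flips=100):
    """Flip a fair coin n_flips times and count the heads."""
    return sum(random.random() < 0.5 for _ in range(n_flips))

# Repeat the 100-flip experiment many times and estimate
# P(40 <= number of heads <= 60) by the empirical frequency (the LLN again).
n_trials = 100_000
hits = sum(40 <= count_heads() <= 60 for _ in range(n_trials))
print(hits / n_trials)  # typically about 0.96, above the Chebyshev bound 0.75
```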
Confidence interval: One can use Chebyshev's inequality to build a confidence interval. A typical statement is: a certain quantity is equal to $x$ with a precision error of $\varepsilon$, with a $\beta\%$ confidence. Typical values for $\beta$ are $95\%$ or $99\%$, and the statement means
\[
P\left( \text{value is in } (x - \varepsilon, x + \varepsilon) \right) \ge \frac{\beta}{100}.
\]
In the context of the LLN, $\varepsilon$ is the precision error and we have
\[
\frac{\beta}{100} = 1 - \frac{\sigma^2}{n \varepsilon^2}.
\]
From this we obtain the following important consequences:

• Precision error: With a fixed confidence, if we want to decrease the precision error by a factor $10$ we need $10^2 = 100$ times as many measurements.

• Confidence error: With a fixed precision, if we want to decrease the confidence error by a factor $10$ (say, go from a $90\%$ to a $99\%$ confidence interval) we need $10$ times as many measurements.

Coin flip II: I hand you a coin and make the claim that it is biased: heads comes up only $48\%$ of the times you flip it. Not believing me, you decide to test the coin, and since you intend to use that coin to cheat in a game, you want to be sure with $95\%$ confidence that the coin is biased. How many times should you flip that coin? To do this you assume that the coin is biased, and pick a random variable $X$ which is $1$ if the coin lands on heads and $0$ otherwise. We then have $\mu = .48$ and $\sigma^2 = .48 \times .52 = .2496$. To test if the coin is biased we flip the coin $n$ times and choose a precision error of $\varepsilon = .01$, so that we are testing whether the probability that the coin lands on heads lies in $(.47, .49)$; in particular the coin is then biased. Using the law of large numbers we have
\[
P\left( \left| \frac{X_1 + \cdots + X_n}{n} - .48 \right| \ge .01 \right) \le \frac{.2496}{n\,(.01)^2},
\]
and to obtain a $95\%$ confidence interval we need
\[
\frac{.2496}{n\,(.01)^2} = \frac{1}{20},
\]
that is, $n$ should be at least $.2496 \times 100^2 \times 20 = 49920$.

Polling: You read in the newspaper that after polling 1500 people it was found that $38\%$ of the voting population prefers candidate A while $62\%$ prefers candidate B. Estimate the margin of error of such a poll with a $90\%$ confidence. We pick a random variable with $\mu = .38$, and so $\sigma^2 = .38 \times .62 = .2356$. By the LLN with $n = 1500$ we have
\[
P\left( \left| \frac{X_1 + \cdots + X_{1500}}{1500} - .38 \right| \ge \varepsilon \right) \le \frac{.2356}{1500\, \varepsilon^2}.
\]
So if we set
\[
\frac{1}{10} = \frac{.2356}{1500\, \varepsilon^2}
\]
we find
\[
\varepsilon = \sqrt{\frac{10 \times .2356}{1500}} = 0.039.
\]
So the margin of error is about $4\%$ with a $90\%$ confidence.

The Monte-Carlo method: Under the name Monte-Carlo method one understands an algorithm which uses randomness and the LLN to compute a certain quantity which might have nothing to do with randomness. Such algorithms are becoming ubiquitous in many applications in statistics, computer science, physics and engineering. We will illustrate the ideas here with some very simple test examples (toy models).

Random number: A computer comes equipped with a random number generator (usually the command rand), which produces a number uniformly distributed in $[0, 1]$. We call such a number $U$, and it is characterized by the fact that
\[
P(U \in [a, b]) = b - a \quad \text{for any interval } [a, b] \subset [0, 1].
\]
Every Monte-Carlo method should in principle be constructed from random numbers so as to be easily implementable. For example, we can generate a $0$-$1$ random variable $X$ with $P(X = 1) = p$ and $P(X = 0) = 1 - p$ by using a random number. We simply set
\[
X = \begin{cases} 1 & \text{if } U \le p \\ 0 & \text{if } U > p \end{cases}
\]
Then we have $P(X = 1) = P(U \in [0, p]) = p$.

An algorithm to compute the number $\pi$: To compute the number $\pi$ we draw a square with side length $1$ and inscribe in it a circle of radius $1/2$. The area of the square is $1$ while the area of the circle is $\pi/4$. To compute $\pi$ we generate a random point in the square. If the point generated is inside the circle we accept it, while if it is outside we reject it. Then we repeat the same experiment many times and expect, by the LLN, the proportion of accepted points to be close to $\pi/4$. More precisely, the algorithm goes as follows (a code sketch is given after this list):

• Generate two random numbers $U$ and $V$; this is the same as generating a random point in the square $[0,1] \times [0,1]$.

• If $U^2 + V^2 \le 1$ set $X_1 = 1$, while if $U^2 + V^2 > 1$ set $X_1 = 0$. (The condition $U^2 + V^2 \le 1$ says that the point lies in the quarter disc of radius $1$ inside the square, whose area is also $\pi/4$.)

• Repeat the two previous steps to generate $X_2, X_3, \cdots, X_n$.

We have
\[
P(X = 1) = P(U^2 + V^2 \le 1) = \frac{\text{area of the quarter disc}}{\text{area of the square}} = \frac{\pi}{4}
\]
and $P(X = 0) = 1 - \pi/4$. We then have
\[
E[X] = \mu = \frac{\pi}{4}, \quad \mathrm{var}(X) = \sigma^2 = \frac{\pi}{4}\left(1 - \frac{\pi}{4}\right).
\]

[Figure 1]
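Here is a minimal Python sketch of this algorithm (the lecture itself prescribes only the steps above; the names and the sample size are my choice):

```python
import random

def estimate_pi(n):
    """Hit-or-miss Monte-Carlo estimate of pi from n random points."""
    hits = 0
    for _ in range(n):
        u, v = random.random(), random.random()  # random point in [0,1] x [0,1]
        if u * u + v * v <= 1:                   # accepted: inside the quarter disc
            hits += 1
    return 4 * hits / n  # empirical average of the X_i, multiplied by 4

print(estimate_pi(1_000_000))  # typically within a few thousandths of pi
```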
Using the LLN we have
\[
P\left( \left| \frac{X_1 + \cdots + X_n}{n} - \frac{\pi}{4} \right| \ge \varepsilon \right) \le \frac{\frac{\pi}{4}\left(1 - \frac{\pi}{4}\right)}{n \varepsilon^2}.
\]
Suppose we want to compute $\pi$ with a 4-digit accuracy and a $99\%$ confidence. That is, we want $\pi \pm .001$, i.e. $\pi \pm \frac{1}{1000}$, or $\frac{\pi}{4} \pm \frac{1}{4000}$. So we take $\varepsilon = \frac{1}{4000}$. To find an appropriate $n$ we then need
\[
\frac{\frac{\pi}{4}\left(1 - \frac{\pi}{4}\right)}{n \varepsilon^2} \le \frac{1}{100}.
\]
Note that we do not know $\pi$, so $\sigma^2$ is not known. But the function $p(1-p)$ on $[0,1]$ has its maximum at $p = 1/2$, and the maximum is $1/4$. So we can take $\sigma^2 \le 1/4$, and we then pick $n$ such that
\[
\frac{1/4}{n\,(1/4000)^2} \le \frac{1}{100}, \quad \text{that is,} \quad n \ge 400 \times 10^6.
\]

The Monte-Carlo method to compute integrals: We consider a function $f$ on the interval $[a, b]$ and we wish to compute
\[
I = \int_a^b f(x)\, dx
\]
for some bounded function $f$. Without loss of generality we can assume that $f \ge 0$, otherwise we replace $f$ by $f + c$ for some constant $c$. Next we can also assume that $f \le 1$, otherwise we replace $f$ by $cf$ for a sufficiently small $c$. Finally we may assume that $a = 0$ and $b = 1$, otherwise we make the change of variable $y = (x - a)/(b - a)$. For example, suppose we want to compute the integral
\[
\int_0^1 \frac{e^{\sin(x^3)}}{3(1 + 5x^8)}\, dx,
\]
see Figure 2. This cannot be done by hand, and so we need a numerical method. A standard method would be to use a Riemann sum: we divide the interval $[0,1]$ into $n$ subintervals, set $x_i = \frac{i}{n}$, and approximate the integral by
\[
\int_0^1 f(x)\, dx \approx \frac{1}{n} \sum_{i=1}^n f(x_i);
\]
that is, we approximate the area under the graph of $f$ by the sum of the areas of rectangles of base length $1/n$ and height $f(i/n)$.

[Figure 2: the graph of the function $y = \frac{e^{\sin(x^3)}}{3(1 + 5x^8)}$]

We use instead a Monte-Carlo method. We note that $I$ is the area under the graph of $f$, and we construct a $0$-$1$ random variable $X$ so that $E[X] = I$. We proceed as for computing $\pi$. More precisely, the algorithm goes as follows (a code sketch is given after the exercises):

• Generate two random numbers $U$ and $V$; this is the same as generating a random point in the square $[0,1] \times [0,1]$.

• If $V \le f(U)$ set $X_1 = 1$, while if $V > f(U)$ set $X_1 = 0$.

• Repeat the two previous steps to generate $X_2, X_3, \cdots, X_n$.

We have
\[
P(X = 1) = P(V \le f(U)) = \frac{\text{area under the graph of } f}{\text{area of } [0,1] \times [0,1]} = \int_0^1 f(x)\, dx = I,
\]
and so $E[X] = I$ and $\mathrm{var}(X) = I(1 - I)$.

Exercise 1: How many people should you include in a poll about preferences between two candidates if you want a precision of $1\%$? What about $0.5\%$?

Exercise 2: Suppose that you run the Monte-Carlo algorithm to compute $\pi$ 10,000 times and observe 7932 points inside the circle. Find your estimate for the number $\pi$. What is the accuracy of your observation with a $95\%$ confidence? The answer should be in the form: based on my computation, the number $\pi$ belongs to the interval $[a, b]$ with probability $.95$.

Exercise 3: Invent a Monte-Carlo algorithm to compute the number $e$. You may consider for example a suitable integral.
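As referenced above, here is a minimal Python sketch of the hit-or-miss integration algorithm, applied to the example integrand from the text (which already satisfies $0 \le f \le e/3 < 1$ on $[0,1]$, so no rescaling is needed); the function and parameter names are my choice:

```python
import math
import random

def f(x):
    """The integrand from the example: e^{sin(x^3)} / (3(1 + 5x^8))."""
    return math.exp(math.sin(x ** 3)) / (3 * (1 + 5 * x ** 8))

def monte_carlo_integral(f, n):
    """Hit-or-miss estimate of the integral of f over [0,1], assuming 0 <= f <= 1."""
    hits = 0
    for _ in range(n):
        u, v = random.random(), random.random()  # random point in the unit square
        if v <= f(u):                            # accepted: under the graph of f
            hits += 1
    return hits / n  # empirical average, close to I by the LLN

print(monte_carlo_integral(f, 1_000_000))  # estimate of the integral I
```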