1. Introduction

The quick summary, going forwards:

(1) Start with a random variable X.
(2) Compute the mean E(X) and variance σ² = var(X).
(3) Approximate X by the normal distribution N with mean µ = E(X) and standard deviation σ. That is, we make the approximation
  P(a < X < b) ≈ P(a < N < b).
(It's not always good, but we can always make it.)
(4) Convert the normal distribution N to the standard normal distribution Z. Specifically,
  Z = (N − µ)/σ,  and so  P(a < N < b) = P((a − µ)/σ < Z < (b − µ)/σ).
(5) Look up the appropriate values in a table and you're done.

Sometimes you just follow these steps going backwards a bit.

2. Getting an intuition for the Central Limit Theorem

The following pictures are meant to illustrate the idea of the central limit theorem. Let's say, for the sake of discussion, that we are flipping an unfair coin which has probability p = 1/10 of coming up heads. If we perform just one trial, then we expect to get no heads nine-tenths of the time and one head one-tenth of the time. Figure 2.1 gives a histogram representation of the distribution. Notice that the height of the rectangle centered at 0 is nine-tenths and the height of the rectangle centered at 1 is one-tenth.

Figure 2.1. Histogram of the binomial distribution with 1 trial and probability p of success (blue). Normal distribution with mean 1·p and variance 1·p(1 − p) (red).

If we were to use a normal distribution to approximate the first box, we'd find the area under the normal distribution from −0.5 to 0.5. This is illustrated as computing the green shaded area in Figure 2.2.

Figure 2.2. The green shaded area represents the approximation of the probability of getting no heads.

Now what happens when we increase the number of trials? As we flip n coins, we have the following information:
  P(k heads show up) = (n choose k) p^k (1 − p)^(n−k),
for k ranging between 0 and n. The mean number of heads to show up is E(Sₙ) = np and the variance is var(Sₙ) = np(1 − p).
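The single-trial picture can be checked numerically. The exact probability of no heads is 1 − p = 0.9, while the approximation is the area under N(µ = p, σ² = p(1 − p)) from −0.5 to 0.5. A minimal sketch in Python, where the standard library's `statistics.NormalDist` stands in for the table (this class is an assumption of the sketch, not something used in the notes):

```python
from statistics import NormalDist

p = 0.1                              # probability of heads on a single flip
mu = 1 * p                           # mean of the n = 1 binomial
sigma = (1 * p * (1 - p)) ** 0.5     # standard deviation, sqrt(p(1-p))

exact = 1 - p                        # exact probability of no heads
N = NormalDist(mu, sigma)
approx = N.cdf(0.5) - N.cdf(-0.5)    # area under the normal from -0.5 to 0.5

print(exact, approx)                 # the approximation is noticeably off for n = 1
```

The gap between the two numbers here illustrates the point made below: for n = 1 the normal approximation is relatively bad.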
For example, when we flip 50 coins, we'd expect heads to come up, on average, 50 · (1/10) = 5 times. Let's look at the corresponding mass functions of the binomial distribution for n = 5 (Figure 2.3), n = 50 (Figure 2.4), and n = 100 (Figure 2.5). As before, we also plot the (probability) density function for the normal distribution with corresponding mean and variance.

For n = 5, notice that there is a probability of about 0.6 of getting no heads. We could approximate this by finding the area under the corresponding normal distribution from x = −0.5 to x = 0.5. Then the probability of getting more heads decreases. But for n = 50, there is less than a probability of 0.01 of getting no heads, and the probability of getting more heads increases at first, peaking at the probability of getting 5 heads, and then decreases. In general, as n increases, the normal distribution becomes a better and better approximation for Sₙ. Note also that the approximation of the normal distribution to the binomial distribution for n = 1 (above) is relatively bad compared to the case for n = 100. Let's continue to the next section with an example.

Figure 2.3. Mass function of the binomial distribution with 5 trials and probability p of success (blue) (truncated at n = 3). Probability density function of the normal distribution with mean 5p and variance 5p(1 − p) (red).

Figure 2.4. Mass function of the binomial distribution with 50 trials and probability p of success (blue) (truncated at n = 13). Probability density function of the normal distribution with mean 50p and variance 50p(1 − p) (red).

Figure 2.5. Mass function of the binomial distribution with 100 trials and probability p of success (blue) (truncated at n = 22). Probability density function of the normal distribution with mean 100p and variance 100p(1 − p) (red).
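The two concrete claims above (about 0.6 for no heads when n = 5, below 0.01 when n = 50, and the peak at 5 heads for n = 50) can be verified with a short sketch, again using `statistics.NormalDist` as a stand-in for the table:

```python
from math import comb
from statistics import NormalDist

p = 0.1

# Exact P(no heads) = (1 - p)^n, as in the text
print((1 - p) ** 5)     # about 0.59, i.e. the "about 0.6" for n = 5
print((1 - p) ** 50)    # below 0.01 for n = 50

# For n = 50, the peak of the mass function is at k = 5 heads; compare the
# exact bar with the matching normal area from 4.5 to 5.5.
n = 50
exact_none = (1 - p) ** n
peak = comb(n, 5) * p**5 * (1 - p) ** 45          # P(exactly 5 heads)
N = NormalDist(n * p, (n * p * (1 - p)) ** 0.5)   # mean 5, variance 4.5
approx_peak = N.cdf(5.5) - N.cdf(4.5)             # normal area over that bar
print(peak, approx_peak)
```

The bar at the peak and the corresponding normal area agree to a few thousandths, which is the "better and better approximation" the figures are illustrating.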
Figure 3.1. Histogram of the binomial distribution with 200 trials and probability p of success (blue). Normal distribution with mean 200p and variance 200p(1 − p) (red). Shaded in blue is the probability of getting at least 23 heads in 200 flips of a coin with probability of success p.

Figure 3.2. Histogram of the binomial distribution with 200 trials and probability p of success (blue). Normal distribution with mean 200p and variance 200p(1 − p) (red). Shaded in red is the approximation of getting at least 23 heads in 200 flips of a coin with probability of success p.

3. Doing an Example

Let's find the probability of getting at least 23 heads after flipping a coin 200 times, provided the coin has probability of success (coming up heads) p = 1/10. Thus we're looking for the shaded blue area in Figure 3.1. But instead of using the formula for the binomial distribution and adding up many terms, we'll use the normal curve to approximate. Thus, we'll be looking for the shaded red area in Figure 3.2. Note that the red region starts at 22.5 to improve the approximation.

But what are we to do? All we have is a table for the standard normal distribution! The key is in computing z-values:
  z = (x − µ)/σ,
where µ is the mean and σ is the standard deviation. You can choose to read some theory in Section 3.1 or head straight to Section 3.2 to continue with the example.

3.1. Theory Behind the z-value. We start off with a normal distribution X with mean µ and standard deviation σ, and want to find the probability that X lies between a and b. For example, we have Figure 3.3a. By definition, we have

  P(a ≤ X ≤ b) = ∫_a^b f(x) dx = ∫_a^b 1/(σ√(2π)) · e^(−(x−µ)²/(2σ²)) dx.

First let's make a u-substitution, u = x − µ. Then du = dx and we have

  ∫_a^b 1/(σ√(2π)) · e^(−(x−µ)²/(2σ²)) dx = ∫_{a−µ}^{b−µ} 1/(σ√(2π)) · e^(−u²/(2σ²)) du.

Next, let's make another substitution, v = u/σ. Thus dv = (1/σ) du and we have

  ∫_{a−µ}^{b−µ} 1/(σ√(2π)) · e^(−u²/(2σ²)) du = ∫_{(a−µ)/σ}^{(b−µ)/σ} 1/√(2π) · e^(−v²/2) dv
   = P((a − µ)/σ ≤ Z ≤ (b − µ)/σ),

where Z is the standard normal distribution (mean 0 and standard deviation 1).
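The change-of-variables argument above can be checked numerically: computing P(a ≤ X ≤ b) directly and via the standard normal gives the same number. A sketch using the µ = 10, σ = 3, a = 7, b = 15 values of Figure 3.3 (with `statistics.NormalDist` assumed as the stand-in for the table):

```python
from statistics import NormalDist

mu, sigma = 10, 3
a, b = 7, 15

X = NormalDist(mu, sigma)   # normal distribution with mean mu, sd sigma
Z = NormalDist(0, 1)        # standard normal distribution

# P(a <= X <= b), computed directly
direct = X.cdf(b) - X.cdf(a)

# P((a - mu)/sigma <= Z <= (b - mu)/sigma), after the substitution
via_z = Z.cdf((b - mu) / sigma) - Z.cdf((a - mu) / sigma)

print(direct, via_z)        # the two agree, as the derivation says
```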
Summarizing, given a normal distribution X with mean µ and standard deviation σ,

  P(a ≤ X ≤ b) = P((a − µ)/σ ≤ (X − µ)/σ ≤ (b − µ)/σ) = P((a − µ)/σ ≤ Z ≤ (b − µ)/σ),

where Z = (X − µ)/σ is the standard normal distribution. We graph the new labeling in Figure 3.3b. Notice that the shape of the curve and the area we're looking to compute remain the same as in Figure 3.3a.

Figure 3.3. (a) Normal distribution X with mean µ = 10 and standard deviation σ = 3. Shaded in red is the probability that X lies between a = 7 and b = 15. (b) Standard normal distribution Z (mean 0 and standard deviation 1). Shaded in red is the probability that Z lies between (7 − 10)/3 = −1 and (15 − 10)/3 = 5/3. Note that to fully imitate the look of Figure 3.3a, the y-axis has been shifted from intersecting the x-axis at 0 to intersecting it at (0 − 10)/3 = −10/3.

3.2. Resuming the Example. At the beginning of the example, we established that the probability of flipping at least 23 heads after 200 coin flips with probability 1/10 is approximately the area under the normal distribution X with mean 200 · (1/10) = 20 and variance 200 · (1/10) · (9/10) = 18 (so standard deviation 3√2). In short, we have

  P(S₂₀₀ ≥ 23) ≈ P(X ≥ 22.5).

By using z-values (see the theory above), we have

  P(X ≥ 22.5) = P(Z ≥ (22.5 − 20)/(3√2)).

Approximating the z-value (22.5 − 20)/(3√2) as 0.589..., we look in the chart to find P(Z ≤ 0.59) = 0.7224, and we conclude

  P(Z ≥ (22.5 − 20)/(3√2)) ≈ 1 − 0.7224 = 0.2776.

If we were to replace the numbers in this example with letters, we'd have the following:

  P(Sₙ ≥ k) ≈ P(X ≥ k − 0.5),

where X has mean np and variance np(1 − p) (and so standard deviation √(np(1 − p))). Using z-values, we have

  P(X ≥ k − 0.5) = P(Z ≥ ((k − 0.5) − np)/√(np(1 − p))),

where Z = (X − np)/√(np(1 − p)) is the standard normal distribution.

3.3. Some more theory. The end of the last example leads up to the next discussion, where we're interested in X̄ = (1/n)X, where X is the total number of heads that appear after n tosses.
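The worked example above can be redone in a few lines, comparing the "adding up many terms" route with the normal approximation. A sketch (again assuming `statistics.NormalDist` in place of the table):

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 200, 0.1, 23

# Exact binomial tail: P(S_200 >= 23), summing the many terms directly
exact = sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

# Normal approximation with the continuity correction from the text:
# mean 20, variance 18, and the region starting at 22.5
mu, sigma = n * p, sqrt(n * p * (1 - p))
z = (k - 0.5 - mu) / sigma           # (22.5 - 20) / (3*sqrt(2)), about 0.589
approx = 1 - NormalDist().cdf(z)     # about 0.2776, matching the table lookup

print(exact, approx)
```

The two values agree to a couple of percentage points, which is what the shaded areas in Figures 3.1 and 3.2 suggest.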
We have

  µ = E(X̄) = (1/n) E(X) = (1/n)(np) = p

and

  var(X̄) = (1/n²) var(X) = (1/n²)(np(1 − p)) = p(1 − p)/n,

so that

  σ = √(p(1 − p)/n).

Then we approximate, for large n,

  P(a ≤ X̄ ≤ b) ≈ P((a − µ)/σ ≤ Z ≤ (b − µ)/σ) = P((a − p)√n / √(p(1 − p)) ≤ Z ≤ (b − p)√n / √(p(1 − p))),

where

  Z = (X̄ − p)√n / √(p(1 − p)).

In any case, all we're ever doing is approximating first by a normal distribution with the mean and standard deviation of the original distribution. Then we convert the normal distribution to a standard normal distribution.

4. Example, Going Backwards

Let's apply what we've learned, going backwards. Instead of being given the probability, we want to determine how often we have to flip a coin to know the probability of heads coming up within 0.1 of its true value with probability at least 0.8. That is, for what n do we have

  P(|X̄ − p| ≤ 0.1) ≥ 0.8?

Let's rewrite the left-hand side to match what we've been discussing:

  P(|X̄ − p| ≤ 0.1) = P(−0.1 ≤ X̄ − p ≤ 0.1) = P(−0.1 + p ≤ X̄ ≤ 0.1 + p).

Great, now we have our random variable between two values and we're ready to approximate it by a normal distribution. We subtract its mean p and divide by its standard deviation √(p(1 − p)/n) and obtain

  P(−0.1√n / √(p(1 − p)) ≤ Z ≤ 0.1√n / √(p(1 − p))) ≥ 0.8.

Now part of going backwards is to find the value c such that P(−c ≤ Z ≤ c) = 0.8. We can use our table to do this. Using a table that gives values from −∞ to z for z ≥ 0: we have 0.2 = 1 − 0.8, and dividing by two gives us 0.1, so we look for the value 0.9 in the table and find 1.29 (rounding up, instead of down). We have

  P(−1.29 ≤ Z ≤ 1.29) = 0.9015 − 0.0985 = 0.8030.

Then we need

  0.1√n / √(p(1 − p)) ≥ 1.29,

which is the same as

  √n ≥ 12.9 √(p(1 − p)),

or

  n ≥ (12.9)² p(1 − p).

Since we don't know p, the worst-case scenario is when p = 1/2, so we need

  n ≥ (12.9)² · (1/2)(1 − 1/2) ≈ 41.6.

So flipping the coin 42 times suffices.
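The backwards computation can also be sketched in code. Here `NormalDist().inv_cdf` plays the role of reading the table backwards (finding c from P(Z ≤ c) = 0.9); the exact inverse is about 1.2816, which the text rounds up to 1.29:

```python
from math import ceil
from statistics import NormalDist

# Find c with P(-c <= Z <= c) = 0.8, i.e. P(Z <= c) = 0.9
c = NormalDist().inv_cdf(0.9)        # about 1.2816; the table lookup gives 1.29

# From 0.1*sqrt(n)/sqrt(p(1-p)) >= c we get n >= (c/0.1)^2 * p * (1-p).
# Worst case is p = 1/2, where p(1-p) is largest.
p = 0.5
n_min = (c / 0.1) ** 2 * p * (1 - p)
n = ceil(n_min)                      # smallest whole number of flips

print(c, n_min, n)
```

Using the exact c the bound is about 41.1 rather than 41.6, but the conclusion is the same: 42 flips suffice.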