1. Introduction
The quick summary, going forwards:
(1) Start with a random variable X.
(2) Compute the mean E(X) and variance σ² = var(X).
(3) Approximate X by the normal distribution N with mean µ = E(X) and standard deviation σ. That is, we make the approximation

P(a < X < b) ≈ P(a < N < b).

(The approximation isn't always good, but we can always make it.)
(4) Convert the normal distribution N to the standard normal distribution Z. Specifically,

Z = (N − µ)/σ,

and so

P(a < N < b) = P((a − µ)/σ < Z < (b − µ)/σ).
(5) Look up the appropriate values in a table and you're done.
Sometimes (as in Section 4) you just follow these steps backwards.
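To make the recipe concrete, here is a minimal Python sketch of steps (1)-(5); the mean, variance, and interval are placeholder values chosen for illustration, and the standard-library NormalDist stands in for the printed table:

    from statistics import NormalDist

    # Placeholder values: pretend we computed E(X) = 20 and var(X) = 18.
    mu = 20.0
    sigma = 18.0 ** 0.5
    a, b = 18.0, 23.0

    # Steps (3)-(5): approximate X by N(mu, sigma), convert to the standard
    # normal Z, and evaluate its CDF (this replaces the table lookup).
    Z = NormalDist()  # mean 0, standard deviation 1
    approx = Z.cdf((b - mu) / sigma) - Z.cdf((a - mu) / sigma)
    print(approx)  # P(a < X < b) ≈ P((a - mu)/sigma < Z < (b - mu)/sigma)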
2. Getting an intuition for the Central Limit Theorem
The following pictures are meant to illustrate the idea of the central limit theorem. Let's say, for the sake of discussion, that we are flipping an unfair coin which has probability p = 1/10 of coming up heads. If we perform just one trial, then we expect to get no heads nine-tenths of the time and one head one-tenth of the time. Figure 2.1 gives a histogram representation of the distribution. Notice that the height of the rectangle centered at 0 is nine-tenths and the height of the rectangle centered at 1 is one-tenth.

Figure 2.1. Histogram of the binomial distribution with 1 trial and probability p of success (blue). Normal distribution with mean 1·p and variance 1·p(1−p) (red).

If we were to use a normal distribution to approximate the first box, we'd find the area under the normal distribution from −0.5 to 0.5. This is illustrated as computing the green shaded area in Figure 2.2.
Figure 2.2. The green shaded area represents the approximation of the probability of getting no heads.
Now what happens when we increase the number of trials? As we flip n coins, we have the following information:

P(k heads show up) = (n choose k) p^k (1 − p)^(n−k),

for k ranging between 0 and n. The mean number of heads to show up is

E(S_n) = np

and the variance is

var(S_n) = np(1 − p).
For example, when we flip 50 coins, we'd expect heads to come up, on average, 50 · (1/10) = 5 times. Let's look at the corresponding probability mass functions of the binomial distribution for n = 5 (Figure 2.3), n = 50 (Figure 2.4), and n = 100 (Figure 2.5). As before, we also plot the (probability) density function for the normal distribution of corresponding mean and variance. For n = 5, notice that there is a probability of about 0.6 of getting no heads. We could approximate this by finding the area under the corresponding normal distribution from x = −0.5 to 0.5. Then the probability of getting more heads decreases. But for n = 50, there is less than a probability of 0.01 of getting no heads, and the probability of getting more heads increases at first, peaking at the probability of getting 5 heads, and then decreases. In general, as n increases, the normal distribution becomes a better and better approximation for S_n. Note also that the approximation of the normal distribution to the binomial distribution for n = 1 (above) is relatively bad compared to the case for n = 100. Let's continue to the next section with an example.
Figure 2.3. Probability mass function of the binomial distribution with 5 trials and probability p of success (blue), truncated at n = 3. Probability density function of the normal distribution with mean 5p and variance 5p(1 − p) (red).

Figure 2.4. Probability mass function of the binomial distribution with 50 trials and probability p of success (blue), truncated at n = 13. Probability density function of the normal distribution with mean 50p and variance 50p(1 − p) (red).

Figure 2.5. Probability mass function of the binomial distribution with 100 trials and probability p of success (blue), truncated at n = 22. Probability density function of the normal distribution with mean 100p and variance 100p(1 − p) (red).
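The probabilities quoted above (about 0.6 of no heads for n = 5, under 0.01 for n = 50) are easy to check. Here is a short sketch, not part of the original notes, that computes the binomial probability of no heads alongside the height of the matching normal density at 0:

    from math import comb, exp, pi, sqrt

    p = 0.1  # probability of heads

    def binom_pmf(n, k):
        # P(k heads in n flips) = (n choose k) p^k (1 - p)^(n - k)
        return comb(n, k) * p ** k * (1 - p) ** (n - k)

    def normal_pdf(x, mu, var):
        return exp(-(x - mu) ** 2 / (2 * var)) / sqrt(2 * pi * var)

    for n in (1, 5, 50, 100):
        mu, var = n * p, n * p * (1 - p)
        # binom_pmf(n, 0) reproduces the values quoted in the text
        # (about 0.59 for n = 5, about 0.005 for n = 50); the normal
        # density at 0 is only a rough stand-in for the bar height
        # out in the tail.
        print(n, binom_pmf(n, 0), normal_pdf(0, mu, var))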
Figure 3.1. Histogram of the binomial distribution with 200 trials and probability p of success (blue). Normal distribution with mean 200p and variance 200p(1 − p) (red). Shaded in blue is the probability of getting at least 23 heads in 200 flips of a coin with probability of success p.

Figure 3.2. Histogram of the binomial distribution with 200 trials and probability p of success (blue). Normal distribution with mean 200p and variance 200p(1 − p) (red). Shaded in red is the approximation of the probability of getting at least 23 heads in 200 flips of a coin with probability of success p.
3. Doing an Example
Let's find the probability of getting at least 23 heads after flipping a coin 200 times, provided the coin has a probability of success (coming up heads) p = 1/10. Thus we're looking for the shaded blue area in Figure 3.1. But instead of using the formula for the binomial distribution and adding up many terms, we'll use the normal curve to approximate. Thus, we'll be looking for the shaded red area in Figure 3.2. Note that the red region starts at 22.5 to improve the approximation.
But what are we to do, when all we have is a table for the standard normal distribution? The key is in computing z-values:

z = (x − µ)/σ,

where µ is the mean and σ is the standard deviation. You can choose to read some theory in Section 3.1 or head straight to Section 3.2 to continue with the example.
3.1. Theory Behind the z-value. We start off with a normal distribution X with mean µ and standard deviation σ, and want to find the probability that X lies between a and b. For example, we have Figure 3.3a.
By definition we have

P(a ≤ X ≤ b) = ∫_a^b f(x) dx = ∫_a^b 1/(σ√(2π)) · e^(−(x−µ)²/(2σ²)) dx.

First let's make a u-substitution, u = x − µ. Then du = dx and we have

∫_a^b 1/(σ√(2π)) · e^(−(x−µ)²/(2σ²)) dx = ∫_(a−µ)^(b−µ) 1/(σ√(2π)) · e^(−u²/(2σ²)) du.

Next, let's make another substitution, v = u/σ. Thus dv = (1/σ) du and we have

∫_(a−µ)^(b−µ) 1/(σ√(2π)) · e^(−u²/(2σ²)) du = ∫_((a−µ)/σ)^((b−µ)/σ) 1/√(2π) · e^(−v²/2) dv
= P((a − µ)/σ ≤ Z ≤ (b − µ)/σ),
where Z is the standard normal distribution (mean 0 and standard deviation 1).
Summarizing, given a normal distribution X with mean µ and standard deviation σ,

P(a ≤ X ≤ b) = P((a − µ)/σ ≤ (X − µ)/σ ≤ (b − µ)/σ)
= P((a − µ)/σ ≤ Z ≤ (b − µ)/σ),

where Z = (X − µ)/σ is the standard normal distribution.
We graph the new labeling in Figure 3.3b. Notice that the shape of the curve and the area we're looking to compute remain the same as in Figure 3.3a.
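As a quick numerical sanity check of this identity (using the numbers from Figure 3.3, with Python's NormalDist in place of a table):

    from statistics import NormalDist

    # Numbers from Figure 3.3: X is normal with mu = 10, sigma = 3.
    mu, sigma = 10.0, 3.0
    a, b = 7.0, 15.0

    X = NormalDist(mu, sigma)
    Z = NormalDist()  # standard normal: mean 0, standard deviation 1

    lhs = X.cdf(b) - X.cdf(a)                                # P(a <= X <= b)
    rhs = Z.cdf((b - mu) / sigma) - Z.cdf((a - mu) / sigma)  # P(-1 <= Z <= 5/3)
    print(lhs, rhs)  # both print the same number (about 0.79)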
3.2. Resuming the Example. At the beginning of the example, we established that the probability of flipping at least 23 heads after 200 coin flips with probability 1/10 is approximately the area under the normal distribution X with mean 200 · (1/10) = 20 and variance 200 · (1/10) · (9/10) = 18 (so standard deviation 3√2). In short, we have

P(S_200 ≥ 23) ≈ P(X ≥ 22.5).

By using z-values (see the theory above), we have

P(X ≥ 22.5) = P(Z ≥ (22.5 − 20)/(3√2)).
Figure 3.3. (a) Normal distribution X with mean µ = 10 and standard deviation σ = 3. Shaded in red is the probability that X lies between a = 7 and b = 15. (b) Standard normal distribution Z (mean 0 and standard deviation 1). Shaded in red is the probability that Z lies between (7 − 10)/3 = −1 and (15 − 10)/3 = 5/3. Note that to fully imitate the look of Figure 3.3a, the y-axis has been shifted from intersecting the x-axis at 0 to intersecting it at (0 − 10)/3 = −10/3.
Approximating the z-value (22.5 − 20)/(3√2) as 0.589..., we look in the chart to find P(Z ≤ 0.59) = 0.7224, and we conclude

P(Z ≥ (22.5 − 20)/(3√2)) ≈ 1 − 0.7224 = 0.2776.
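As a check on the arithmetic (this is not in the original notes), we can compare the exact binomial tail with the normal approximation in Python:

    from math import comb, sqrt
    from statistics import NormalDist

    n, p, k = 200, 0.1, 23

    # Exact tail: P(S_200 >= 23) as a sum of binomial probabilities.
    exact = sum(comb(n, j) * p ** j * (1 - p) ** (n - j)
                for j in range(k, n + 1))

    # Normal approximation with the continuity correction at 22.5.
    mu, sigma = n * p, sqrt(n * p * (1 - p))  # mu = 20, sigma = 3*sqrt(2)
    approx = 1 - NormalDist().cdf((k - 0.5 - mu) / sigma)

    print(exact, approx)  # the two values agree to within about 0.01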
If we were to replace the numbers in this example with letters, we'd have the following:

P(S_n ≥ k) ≈ P(X ≥ k − 0.5),

where X has mean np and variance np(1 − p) (and so standard deviation √(np(1 − p))). Using z-values, we have

P(X ≥ k − 0.5) = P(Z ≥ ((k − 0.5) − np)/√(np(1 − p))),

where

Z = (X − np)/√(np(1 − p))

is the standard normal distribution.
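In code, the lettered version is just a small helper function (a sketch; the function name is hypothetical):

    from math import sqrt
    from statistics import NormalDist

    def tail_approx(n, p, k):
        """Normal approximation to P(S_n >= k), with continuity correction,
        for S_n binomial with n trials and success probability p."""
        mu = n * p
        sigma = sqrt(n * p * (1 - p))
        return 1 - NormalDist().cdf((k - 0.5 - mu) / sigma)

    print(tail_approx(200, 0.1, 23))  # about 0.278, matching the example above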
3.3. Some more theory. The end of the last example leads up to the next discussion, where we're interested in X̄ = (1/n)X, where X is the total number of heads that appear after n tosses. We have

µ = E(X̄) = (1/n) E(X) = (1/n)(np) = p

and

var(X̄) = (1/n²) var(X) = (1/n²)(np(1 − p)) = p(1 − p)/n,

so that

σ = √(p(1 − p)/n).
Then we approximate, for large n,

P(a ≤ X̄ ≤ b) ≈ P((a − µ)/σ ≤ Z ≤ (b − µ)/σ)
= P((a − p)√(n/(p(1 − p))) ≤ Z ≤ (b − p)√(n/(p(1 − p)))),

where

Z = (X̄ − p)√(n/(p(1 − p))).
In any case, all we're ever doing is approximating first by a normal distribution with the mean and standard deviation of the original distribution. Then we convert the normal distribution to a standard normal distribution.
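A sketch of the same recipe for X̄ (the function name and test values are illustrative, not from the original notes):

    from math import sqrt
    from statistics import NormalDist

    def mean_interval_approx(n, p, a, b):
        """Normal approximation to P(a <= Xbar <= b), where Xbar is the
        fraction of heads in n tosses of a coin with success probability p."""
        sigma = sqrt(p * (1 - p) / n)  # standard deviation of Xbar
        Z = NormalDist()
        return Z.cdf((b - p) / sigma) - Z.cdf((a - p) / sigma)

    # e.g. the chance the observed fraction of heads lies within 0.05
    # of p = 0.1 after 50 tosses:
    print(mean_interval_approx(50, 0.1, 0.05, 0.15))  # about 0.76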
4. Example, Going Backwards
Let's apply what we've learned, going backwards. Instead of being given the probability, we want to determine how often we have to flip a coin to know the probability of heads coming up within 0.1 of its true value with probability at least 0.8. That is, for what n do we have

P(|X̄_n − p| ≤ 0.1) ≥ 0.8?
Well, let's rewrite the left-hand side to match what we've been discussing:

P(|X̄_n − p| ≤ 0.1) = P(−0.1 ≤ X̄_n − p ≤ 0.1)
= P(−0.1 + p ≤ X̄_n ≤ 0.1 + p).

Great, now we have our random variable between two values and we're ready to approximate it by a normal distribution. We subtract its mean p and divide by its standard deviation √(p(1 − p)/n) to obtain

P(−0.1 √(n/(p(1 − p))) ≤ Z ≤ 0.1 √(n/(p(1 − p)))) ≥ 0.8.
Now part of going backwards is to find the value c such that

P(−c ≤ Z ≤ c) = 0.8.

We can use our table to do this. Using a table that gives values from −∞ to z for z ≥ 0: we have 1 − 0.8 = 0.2, and dividing by two gives 0.1, so we look for the value 0.9 in the table and find z = 1.29 (rounding up, instead of down). We have

P(−1.29 ≤ Z ≤ 1.29) = 0.9015 − 0.0985 = 0.8030.
Then we need

0.1 √(n/(p(1 − p))) ≥ 1.29,

which is the same as

√n ≥ 12.9 √(p(1 − p)),

or

n ≥ (12.9)² p(1 − p).
Since we don't know p, the worst-case scenario is when p = 1/2, so we need

n ≥ (12.9)² · (1/2) · (1 − 1/2) ≈ 41.6.

Since n must be a whole number, flipping the coin 42 times suffices.
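The same backwards computation in Python (a sketch; inv_cdf plays the role of reading the table in reverse):

    from math import ceil
    from statistics import NormalDist

    target = 0.8  # desired probability
    eps = 0.1     # desired accuracy for the observed fraction of heads

    # Find c with P(-c <= Z <= c) = target, i.e. the 0.9 quantile of Z.
    c = NormalDist().inv_cdf(1 - (1 - target) / 2)  # about 1.2816

    # Worst case p = 1/2: we need n >= (c / eps)^2 * p * (1 - p).
    p = 0.5
    n = ceil((c / eps) ** 2 * p * (1 - p))
    print(c, n)  # n = 42; the notes round c up to 1.29 and get 41.6 -> 42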