The Central Limit Theorem (CLT) says that if ξ is a random variable with
mean ⟨ξ⟩ and variance σ², then for large N the sample mean

m = (ξ₁ + ξ₂ + ··· + ξ_N)/N    (1)

is approximately Gaussian with mean ⟨ξ⟩ and variance σ²/N.
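Before working the Poisson example, the statement itself is easy to test numerically. The sketch below is a stdlib-only Python check; the Uniform(0, 1) variable, N = 20, and the 20,000 trials are arbitrary choices for illustration, not from the notes.

```python
import random
import statistics

# A Uniform(0, 1) variable has mean 1/2 and variance 1/12, so by the CLT
# the sample mean of N = 20 draws should have mean 1/2 and variance (1/12)/20.
random.seed(0)

N = 20          # terms per sample mean
trials = 20000  # how many sample means we compute

means = [statistics.fmean(random.random() for _ in range(N)) for _ in range(trials)]

print(statistics.fmean(means))     # close to 0.5
print(statistics.variance(means))  # close to (1/12)/20 ≈ 0.00417
```

The variance of the sample means shrinks by the factor N, exactly as the theorem predicts.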
Let us take an example: let ξ be a Poisson random variable with mean 10.
Then its variance is also 10 (a peculiar property of the Poisson distribution).
If we let N be 20 in the sample mean above, then we expect the sample mean to
be approximately Gaussian with mean 10 and variance σ²/N = 10/20 = 0.5, so
that 2σ²/N = 1; thus the probability distribution of the sample means should
be approximately the Gaussian distribution
P(x) = (1/√π) e^{−(x−10)²}    (2)
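It is worth a quick numerical check that this P(x) really is a normalized density with variance 0.5. The midpoint-rule sketch below uses an arbitrary grid and an integration range of [4, 16] (many standard deviations around the mean, so the truncated tails are negligible):

```python
import math

# P(x) from equation (2): a Gaussian with mean 10 and variance 0.5.
def P(x):
    return math.exp(-(x - 10) ** 2) / math.sqrt(math.pi)

dx = 0.001
xs = [4 + (i + 0.5) * dx for i in range(int(12 / dx))]  # midpoints over [4, 16]

total = sum(P(x) * dx for x in xs)             # should be ≈ 1
var = sum((x - 10) ** 2 * P(x) * dx for x in xs)  # should be ≈ 0.5

print(round(total, 4), round(var, 4))
```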
In fact, though, m is a discrete random variable. What do we mean when we
say that its distribution is described by the distribution function (or probability
density) P(x)? This can only mean that when we look in some interval, the two
agree pretty well. Now if we compute sample means many times and make a
histogram of the results, we fill up “bins” that correspond to certain ranges of
values. Since ξ takes integer values, and in finding the mean we divide by N=20,
the actual values that can occur for m are things like ... 9.35, 9.40, 9.45, ....
These could all go into a bin for values between 9 and 9.5. Suppose we sample m
many times, say Nm times. Then what does the Gaussian distribution predict
for the number of values in that bin? We would have to integrate over the
interval, to get the probability for landing in the interval, and then multiply by
Nm, to find the actual number that landed there. That is, we predict
Nm ∫₉^{9.5} P(x) dx ≈ Nm P(9.25)(0.5)    (3)
values in the interval. Notice that the bin size, 0.5, comes in here. Clearly if
we chose the bins larger than 0.5, there would be more values in each bin (and
fewer bins). This number should agree with the histogram. You should see the
agreement if you sample m many times (Nm times), do a histogram, and plot
on top of it Nm P (x)∆x, where ∆x is the bin size in the histogram. We’ll see
an example.
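The experiment just described can be sketched in Python with only the standard library. Nothing below is prescribed by the notes: the Poisson sampler is Knuth's product-of-uniforms method, and Nm = 10000 is an arbitrary choice.

```python
import math
import random

random.seed(1)

def poisson(lam):
    # Knuth's method: multiply uniforms until the product falls below e^(-lam).
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p < L:
            return k
        k += 1

N, Nm = 20, 10000   # terms per sample mean, number of sample means
means = [sum(poisson(10) for _ in range(N)) / N for _ in range(Nm)]

# Histogram with bin width 0.5: bin [b, b + 0.5) collects every mean landing in it.
dx = 0.5
counts = {}
for m in means:
    b = math.floor(m / dx) * dx
    counts[b] = counts.get(b, 0) + 1

# Prediction from equation (2): Nm * P(bin midpoint) * (bin width).
P = lambda x: math.exp(-(x - 10) ** 2) / math.sqrt(math.pi)
for b in (9.0, 9.5, 10.0, 10.5):
    print(b, counts.get(b, 0), round(Nm * P(b + dx / 2) * dx))
```

Each printed line shows a bin's left edge, the histogram count, and the prediction Nm P(midpoint) Δx; the two numbers should agree to within statistical fluctuations.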
Here is a possibly confusing point. The Poisson distribution itself looks Gaussian for large mean λ. Notice that this is a completely different phenomenon! It
means that if λ is large and you sample the distribution many times (Np times),
and make a histogram, it will look like the Gaussian
Np (1/√(2πλ)) e^{−(x−λ)²/(2λ)}    (4)
(I’m assuming here that you make the bin size 1, since the Poisson distribution
takes values 0, 1, 2, ..., i.e., each integer value gets its own bin.) Think
about why this is NOT the Central Limit Theorem.
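This second phenomenon can be sketched the same way (again using Knuth's sampler; λ = 100 and Np = 20000 are arbitrary choices for illustration): draw Poisson values directly, with no averaging, bin them by integer, and compare with formula (4).

```python
import math
import random

random.seed(2)

def poisson(lam):
    # Knuth's method; fine here since e^(-100) is still representable as a float.
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p < L:
            return k
        k += 1

lam, Np = 100.0, 20000
counts = {}
for _ in range(Np):
    k = poisson(lam)
    counts[k] = counts.get(k, 0) + 1

# Formula (4) with bin size 1: Np * Gaussian(mean lam, variance lam) at each integer.
gauss = lambda x: Np * math.exp(-(x - lam) ** 2 / (2 * lam)) / math.sqrt(2 * math.pi * lam)
for k in (90, 100, 110):
    print(k, counts.get(k, 0), round(gauss(k)))
```

Note that no sample mean is formed anywhere in this code; each draw is a single Poisson value, binned directly.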