The Central Limit Theorem
Hyon-Jung Kim-Ollila
Department of Mathematical Sciences, University of Oulu
Department of Signal Processing and Acoustics, Aalto University
March 19, 2013
Agenda
1. Motivation (about Sampling)
2. Sampling Distributions
3. The Central Limit Theorem
Statistical Inference
The purpose of statistical inference is to make statements about a
population based on information contained in a sample.
Why samples? Often, it is not possible to examine all the elements in a population due to limitations of time, resources, and effort.
Sampling variability
Each time we take a random sample from a population, we are likely
to get a different set of individuals and calculate a different summary.
Sampling distributions
If we take a lot of random samples of the same size, the nature of
variation from sample to sample will follow a (predictable) pattern and
can be determined or approximated in many situations.
This allows us to evaluate the reliability of our inference.
The Central Limit Theorem describes the distribution of the sample mean when the sample size is large.
Definitions and Notations
Population: the set of all the elements of interest.
Random variable: a variable whose value is subject to variation due to chance, denoted by X, Y, or Z...
Sample: a subset of the population from which data are collected.
Data are observations of a random variable, denoted by x1, ..., xn.
Parameter: a numerical characteristic of a population,
e.g. population mean (μ), population variance (σ²)
Statistic: a numerical summary of the sample, i.e. a function of the data,
e.g. sample mean (X̄), sample variance (s²), sample proportion (p̂)
Population distribution: the probability distribution of a random variable
Sampling distribution: the probability distribution of a sample statistic
Def. X̄ = (1/n) Σ_{i=1}^n Xi,   s² = Σ_{i=1}^n (Xi − X̄)² / (n − 1),   p̂ = (total number of successes X) / (sample size n)
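As a quick illustration of these definitions, here is a minimal Python sketch (not part of the original slides; the data values are purely hypothetical):

```python
# Minimal sketch: the three sample statistics defined above.
import numpy as np

x = np.array([59.0, 62.0, 57.5, 60.0, 58.5])   # hypothetical measurements

x_bar = x.mean()          # sample mean  X̄ = (1/n) Σ x_i
s2 = x.var(ddof=1)        # sample variance s², with n − 1 in the denominator

successes, n = 64, 100    # hypothetical: 64 successes out of n = 100
p_hat = successes / n     # sample proportion p̂ = X / n

print(f"X̄ = {x_bar:.2f}, s² = {s2:.2f}, p̂ = {p_hat:.2f}")
```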
Example I: We want to know the average weight (μ) of the brown eggs produced by a company (satamuna).
We buy a carton of 12 brown eggs and the box weighs 708 g. The average egg weight from that sample is X̄ = 59 g.
If we take another carton of 12 brown eggs, we might get X̄ = 62 g...
(sampling variability)
If we were to sample “many times” with n = 12, the resulting distribution of the values of X̄ is called the sampling distribution of X̄.
Example II: We want to know what proportion (p) of Helsinki residents who will vote in an upcoming election favor candidate A.
64 out of 100 residents who answered a phone survey claim that they favor A: p̂ = 0.64.
If we randomly selected another 100 residents (all of whom answered), we might get p̂ = 0.55...
If we were to sample “many times” with n = 100, the resulting distribution of the values of p̂ is called the sampling distribution of p̂.
Sampling Distribution of the Mean
A fair die is thrown (conceptually, infinitely many times) and the number of spots is observed. The random variable X = the number of spots observed.
The probability distribution of X:

x      1    2    3    4    5    6
p(x)  1/6  1/6  1/6  1/6  1/6  1/6
Mean of X: μ = Σ x p(x) = 1(1/6) + 2(1/6) + ... + 6(1/6) = 3.5
Variance of X: σ² = Σ (x − μ)² p(x) = (1 − 3.5)²(1/6) + ... + (6 − 3.5)²(1/6) = 35/12
Standard deviation of X: σ = √σ² = √(35/12) ≈ 1.71
Two fair six-sided dice are rolled and the mean of the two numbers is observed.
All samples of size n = 2 and their means X̄ = (X1 + X2)/2:

sample  x̄    sample  x̄    sample  x̄    sample  x̄    sample  x̄    sample  x̄
(1,1)  1.0   (2,1)  1.5   (3,1)  2.0   (4,1)  2.5   (5,1)  3.0   (6,1)  3.5
(1,2)  1.5   (2,2)  2.0   (3,2)  2.5   (4,2)  3.0   (5,2)  3.5   (6,2)  4.0
(1,3)  2.0   (2,3)  2.5   (3,3)  3.0   (4,3)  3.5   (5,3)  4.0   (6,3)  4.5
(1,4)  2.5   (2,4)  3.0   (3,4)  3.5   (4,4)  4.0   (5,4)  4.5   (6,4)  5.0
(1,5)  3.0   (2,5)  3.5   (3,5)  4.0   (4,5)  4.5   (5,5)  5.0   (6,5)  5.5
(1,6)  3.5   (2,6)  4.0   (3,6)  4.5   (4,6)  5.0   (5,6)  5.5   (6,6)  6.0

The sampling distribution of the mean X̄:

x̄      1.0   1.5   2.0   2.5   3.0   3.5   4.0   4.5   5.0   5.5   6.0
p(x̄)  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
Mean of X̄: μ_x̄ = Σ x̄ p(x̄) = 1.0(1/36) + 1.5(2/36) + ... + 6.0(1/36) = 3.5
Variance of X̄: σ²_x̄ = Σ (x̄ − μ_x̄)² p(x̄) = (1.0 − 3.5)²(1/36) + ... + (6.0 − 3.5)²(1/36) = 35/24 ≈ 1.46
Standard error of X̄: σ_x̄ = √σ²_x̄ ≈ 1.21
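These exact values can also be verified by enumerating the 36 equally likely samples, as in the short sketch below (not part of the original slides):

```python
# Enumerate all 36 equally likely (die1, die2) samples and verify that
# the sampling distribution of the mean has μ_x̄ = 7/2 and σ²_x̄ = 35/24.
from fractions import Fraction
from itertools import product

means = [Fraction(a + b, 2) for a, b in product(range(1, 7), repeat=2)]

mu_xbar = sum(means) / 36
var_xbar = sum((m - mu_xbar) ** 2 for m in means) / 36

print(mu_xbar, var_xbar)          # 7/2 and 35/24
print(float(var_xbar) ** 0.5)     # standard error ≈ 1.21
```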
Compare...
the distribution of X and the sampling distribution of X̄
[Figure: side-by-side histograms, “Histogram of x” (the single-die distribution, values 1 to 6) and “Histogram of x_bar” (the sampling distribution of the mean of two dice); y-axes: Frequency.]
Note that μ_x̄ = μ and σ²_x̄ = σ²/2.
When we generalize the mean and the variance of the sampling distribution of X̄ from two dice to the sampling distribution of X̄ from n dice,
μ_x̄ = μ,   σ²_x̄ = σ²/n,   and   σ_x̄ = σ/√n
(This generalization holds as long as X̄ is taken over independent variables Xi.)
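A short Monte Carlo sketch of this σ/√n scaling (my own illustration, not from the slides), using means of n fair die rolls:

```python
# Check numerically that the mean of x̄ stays at μ = 3.5 while its
# standard deviation shrinks like σ/√n, with σ = √(35/12).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 3.5, (35 / 12) ** 0.5

for n in (2, 5, 10, 30):
    xbar = rng.integers(1, 7, size=(100_000, n)).mean(axis=1)
    print(f"n={n:2d}: mean(x̄)≈{xbar.mean():.3f}, sd(x̄)≈{xbar.std():.3f}, "
          f"σ/√n={sigma / n ** 0.5:.3f}")
```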
More Illustrations
X ∼ Exponential: μ = E[X] = 2, σ² = Var[X] = 4, σ = 2.
⇒ X̄: μ_x̄ = 2, σ_x̄ = 2/√n, and X̄ ∼ approx. Normal (as n → ∞, by the CLT).
[Figure: histograms of the population distribution (Exponential, mean 2) and of sample means for n = 4, 16, and 30; panel titles: “Population distribution”, “Sample Means (n=4)”, “Sample Means (n=16)”, “Sample Means (n=30)”; x-axes: Means of Size 1, 4, 16, 30; y-axes: Density.]
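A simulation sketch in the spirit of the figure above (assuming, as stated, an Exponential population with mean 2):

```python
# Sample means of Exponential data (mean 2) for several n: the spread
# should match the CLT prediction σ_x̄ = 2/√n, and the shape should
# become increasingly bell-shaped.
import numpy as np

rng = np.random.default_rng(1)

for n in (1, 4, 16, 30):
    xbar = rng.exponential(scale=2.0, size=(200_000, n)).mean(axis=1)
    print(f"n={n:2d}: mean≈{xbar.mean():.3f}, sd≈{xbar.std():.3f}, "
          f"2/√n={2 / n ** 0.5:.3f}")
```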
Highly non-Normal Population
Consider X with p.d.f. f_X(x) = (3/2) x² for −1 < x < 1:
μ = E[X] = 0,   σ² = Var[X] = 3/5
⇒ X̄: μ_x̄ = 0,   σ_x̄ = √(3/(5n)),   and X̄ ∼ approx. Normal (as n → ∞).
[Figure: the U-shaped population density p̂(x) on (−1, 1) (n = 1), and histograms of x̄ for n = 4 and n = 16.]
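A sketch of how this example can be simulated. The inverse-CDF sampler is my own choice (the slides do not specify one): here F(x) = (x³ + 1)/2 on (−1, 1), so X = (2U − 1)^{1/3} with U ∼ Uniform(0, 1).

```python
# Draw from the U-shaped density f(x) = (3/2) x² on (-1, 1) via the inverse
# CDF, then check that the sample mean behaves as μ_x̄ = 0, σ_x̄ = √(3/(5n)).
import numpy as np

rng = np.random.default_rng(2)

def sample_x(shape):
    # inverse CDF: F(x) = (x³ + 1)/2  =>  x = cbrt(2u − 1)
    return np.cbrt(2 * rng.uniform(size=shape) - 1)

for n in (1, 4, 16):
    xbar = sample_x((200_000, n)).mean(axis=1)
    print(f"n={n:2d}: mean≈{xbar.mean():+.3f}, sd≈{xbar.std():.3f}, "
          f"√(3/(5n))={(3 / (5 * n)) ** 0.5:.3f}")
```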
The Central Limit Theorem
[Most important theorem in Statistics]
The sampling distribution of the mean of a random variable drawn from
any population is approximately normal for a sufficiently large sample size
(under certain general conditions).
It describes characteristics of the ‘population of sample means (sums)’.
The sample means are produced from the means of many independent
observations from a given ‘parent population’.
The parent population can be any distribution or of any form.
Ubiquity of the famous bell-shaped ‘Normal’ (‘Gaussian’) distribution:
We can use probabilities associated with the normal curve to answer
questions about the means of sufficiently large samples.
When a random variable is Normal, its sample mean is exactly Normal, by the properties of the Normal distribution.
(The Central Limit Theorem is not needed in this case.)
Formally...
The Central Limit Theorem (CLT)
Let X1 , X2 , . . . , Xn be i.i.d. random variables from ANY distribution with
E[Xi] = μ and Var[Xi] = σ² (both existing). Then, for a large n,
(X̄ − μ)/(σ/√n) = (Σ_{i=1}^n Xi − nμ)/(σ√n) ∼ Approx. Normal(0, 1)
X̄ is approximately normally distributed with mean μ and variance σ²/n for a large n (a linear transformation of a Normal r.v.).
Σ_{i=1}^n Xi = X1 + · · · + Xn is approximately normally distributed with mean nμ and variance nσ² for a large n.
Def. X1 , ..., Xn is a random sample of X if it is the result of n mutually independent
trials of the random process or experiment which generates X
⇔ X1, ..., Xn i.i.d. (independent and identically distributed)
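A minimal numerical sketch of the standardized statement above, assuming Exponential(1) summands (so μ = σ = 1); this choice of distribution is mine, not from the slides:

```python
# Standardize sample means of Exponential(1) data and compare one
# cumulative probability with the standard normal value predicted by the CLT.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n = 50
xbar = rng.exponential(size=(200_000, n)).mean(axis=1)
z = (xbar - 1.0) / (1.0 / n ** 0.5)          # (X̄ − μ) / (σ/√n)

print("P(Z ≤ 1.96): simulated", float(np.mean(z <= 1.96).round(4)),
      " vs Normal(0,1):", round(norm.cdf(1.96), 4))
```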
The sampling distribution of sample proportions
The sample proportion: p̂ = X/n,
where X is the number of successes and n is the sample size.
Then, X is a Binomial random variable with mean E[X] = np and variance Var[X] = np(1 − p).
Recall that a Binomial random variable can be represented as a sum of independent Bernoulli random variables: X = Σ_{i=1}^n Yi where Yi ∼ Bern(p).
Thus, the central limit theorem applies to X for a large n:
X ∼ Approx. Normal(np, np(1 − p))
or
p̂ ∼ Approx. Normal(p, p(1 − p)/n)
(Normal Approximation to the Binomial)
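A quick numerical check of this approximation (a sketch; the values n = 100, p = 0.6 are illustrative and not from the slides):

```python
# Compare an exact Binomial probability with its normal approximation.
from scipy.stats import binom, norm

n, p = 100, 0.6
mu, sd = n * p, (n * p * (1 - p)) ** 0.5

exact = binom.cdf(64, n, p)                 # P(X ≤ 64), exact
approx = norm.cdf((64.5 - mu) / sd)         # normal approx. with continuity correction

print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```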
The Speed of Convergence in CLT
Q. How large does n have to be?
It is not only n but “n combined with the skewness of the distribution of the random variable X” that matters (plus kurtosis, to a much lesser degree), as shown by Edgeworth expansions.
- skewness of X: E[(X − μ)³]/σ³
- kurtosis of X: E[(X − μ)⁴]/σ⁴
For the Central Limit Theorem (CLT) to be applicable:
If the population distribution is reasonably symmetric, n ≥ 30 should
suffice.
If the population distribution is highly skewed or unusual, even larger
sample sizes will be required.
For the sampling distribution of sample proportion, the sample size n
has to be big enough so that both np and n(1 − p) are at least 10.
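A small simulation sketch of the role of skewness described above (the choice of a Uniform versus an Exponential parent population is mine, not from the slides):

```python
# At n = 30 the sample mean of a symmetric (Uniform) population is already
# nearly normal, while a skewed (Exponential) population still leaves
# visible skewness in the distribution of the sample mean.
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(4)
n = 30

xbar_sym = rng.uniform(size=(200_000, n)).mean(axis=1)
xbar_skw = rng.exponential(size=(200_000, n)).mean(axis=1)

print("skewness of x̄, Uniform parent:    ", round(float(skew(xbar_sym)), 3))
print("skewness of x̄, Exponential parent:", round(float(skew(xbar_skw)), 3))
```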
Applications of CLT
A die is rolled 420 times. What is the probability that the sum of the rolls
lies between 1400 and 1550?
The sum is a random variable
Y = Σ_{i=1}^{420} Xi = X1 + · · · + X420
where each Xi has the distribution of a single fair die roll (derived earlier) with μ = 7/2 and σ² = 35/12.
Thus, E[Y ] = nμ = 420 · 7/2 = 1470, Var[Y ] = 420 · 35/12 = 1225.
P(1400 ≤ Y ≤ 1550) = P( (1400 − 1470)/35 ≤ (Y − nμ)/(σ√n) ≤ (1550 − 1470)/35 )
≈ P(−2.0 ≤ Z ≤ 2.2857) ≈ 0.9661
where Z ∼ Normal(0, 1).
When the sample size is large, the effect of the continuity correction is negligible.
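The calculation above can be reproduced numerically (a sketch using scipy's standard normal CDF for Φ, plus a Monte Carlo check):

```python
# P(1400 ≤ Y ≤ 1550) for the sum Y of 420 fair die rolls.
import numpy as np
from scipy.stats import norm

n, mu, var = 420, 7 / 2, 35 / 12
ey, sd = n * mu, (n * var) ** 0.5           # E[Y] = 1470, sd(Y) = 35

clt = norm.cdf((1550 - ey) / sd) - norm.cdf((1400 - ey) / sd)

rng = np.random.default_rng(5)
sums = rng.integers(1, 7, size=(20_000, n)).sum(axis=1)
mc = np.mean((sums >= 1400) & (sums <= 1550))

print(f"CLT approximation ≈ {clt:.4f}, Monte Carlo ≈ {mc:.4f}")
```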
Proof of the CLT
Let Zi = (Xi − μ)/σ.
Then, E[Zi] = 0 and Var[Zi] = 1, since E[Xi] = μ and Var[Xi] = σ².
Let Un = (Σ_{i=1}^n Xi − nμ)/(σ√n) = (X̄ − μ)/(σ/√n) = (1/√n) Σ_{i=1}^n (Xi − μ)/σ = (1/√n) Σ_{i=1}^n Zi
Then, the moment generating function of Un is given by
M_Un(t) = E[e^{t Un}] = E[exp((t/√n) Σ_{i=1}^n Zi)]
= Π_{i=1}^n E[exp((t/√n) Zi)]   (since the Zi are independent)
= Π_{i=1}^n M_Zi(t/√n) = [M_Zi(t/√n)]^n   (since the Zi are identically distributed)
Note: the Zi are i.i.d. because the Xi are i.i.d.
By Taylor's theorem, for t near 0,
M_Zi(t) = M_Zi(0) + t M′_Zi(0) + (1/2) t² M′′_Zi(0) + o(t²)
Note that M_Zi(0) = 1, M′_Zi(0) = E[Zi] = 0 and M′′_Zi(0) = E[Zi²] = 1.
Then,
M_Un(t) = [M_Zi(t/√n)]^n = [1 + t²/(2n) + o(t²/n)]^n → e^{t²/2}   as n → ∞ for each fixed t.
Thus, Un = (X̄ − μ)/(σ/√n) converges to a standard normal r.v. as n → ∞.
Moment generating function continuity theorem:
If the moment generating functions M_Xn(t) are defined for all t and n, and lim_{n→∞} M_Xn(t) = M_X(t) for all t, then Xn converges in distribution to X
(i.e. lim_{n→∞} F_Xn(x) = F_X(x) at all x ∈ R at which F_X is continuous, where X ∼ F_X and Xn ∼ F_Xn).
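A numerical sketch of this limit, assuming standardized Exponential(1) summands (my choice), for which M_Zi(s) = e^{−s}/(1 − s) is available in closed form for s < 1:

```python
# Check that M_Un(t) = [M_Z(t/√n)]^n approaches exp(t²/2) as n grows,
# for Z = X − 1 with X ~ Exponential(1), i.e. M_Z(s) = e^(−s)/(1 − s), s < 1.
import math

def mgf_un(t, n):
    s = t / math.sqrt(n)
    return (math.exp(-s) / (1 - s)) ** n

t = 1.0
for n in (10, 100, 1_000, 10_000):
    print(f"n = {n:6d}: M_Un({t}) = {mgf_un(t, n):.5f}   "
          f"(limit e^(t²/2) = {math.exp(t ** 2 / 2):.5f})")
```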
More General Versions of CLT
The central limit theorem is actually fairly robust.
Variants of the theorem still apply
if you allow the Xi’s not to be identically distributed,
or the Xi’s not to be completely independent.
Roughly speaking, if you have a lot of little random terms that are
“mostly independent” (and no single term contributes more than a
“small fraction” of the total sum), then the total sum should be
“approximately” normal.
A general form of CLT:
Let X1, X2, . . . , Xn be independent random variables from ANY distribution with
E[Xi] = μi and Var[Xi] = σi² (both existing). Then, for a large n,
(Σ_{i=1}^n (Xi − μi)) / √(Σ_{i=1}^n σi²) ∼ Approx. Normal(0, 1)
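A simulation sketch of this more general statement; the particular mix of independent but non-identically distributed summands below (Uniform, Exponential, and Bernoulli blocks) is my own choice:

```python
# Sum independent, non-identically distributed terms, standardize by the
# exact mean and standard deviation of the sum, and compare a cumulative
# probability with the standard normal.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
reps = 100_000

u = rng.uniform(0, 4, size=(reps, 40))            # mean 2,   var 16/12
e = rng.exponential(scale=3.0, size=(reps, 40))   # mean 3,   var 9
b = rng.binomial(1, 0.1, size=(reps, 40))         # mean 0.1, var 0.09

total = u.sum(axis=1) + e.sum(axis=1) + b.sum(axis=1)
mean_sum = 40 * (2 + 3 + 0.1)
sd_sum = (40 * (16 / 12 + 9 + 0.09)) ** 0.5

z = (total - mean_sum) / sd_sum
print("P(Z ≤ 1.645): simulated", float(np.mean(z <= 1.645).round(4)),
      " vs Normal(0,1):", round(norm.cdf(1.645), 4))
```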