PHP2510: Principles of Biostatistics & Data Analysis
Lecture VIII: Law of Large numbers and central limit theorem
PHP 2510 – Lec 8: law of large numbers, CLT
Properties of the Sample Mean
Example I: Consider the experiment of rolling a die.
Ω = {1, 2, 3, 4, 5, 6}
E[X] = Σ_{i=1}^{6} P(X = i) × i = Σ_{i=1}^{6} (1/6) × i = 3.5
If we roll a die repeatedly and calculate the average of all the rolls, what
happens to that average (the sample mean)?
Let X̄n represent the sample mean when we have rolled n times.
• X̄n is a random variable
• E[X̄n] = E[X] = 3.5
• Can we say anything more about X̄n?
Simulation: Below is the result of one such attempt of rolling a die
many times, simulated by a computer:
5 6 5 6 3 1 2 4 5 6 1 1 5 1 3 3 3 3 2 6 ...
This is one realization of that experiment. Let x̄n denote the
sample mean for the first n rolls. We have

n    x̄n
1    5
2    (5+6)/2 = 5.5
3    (5+6+5)/3 = 5.333
4    (5+6+5+6)/4 = 5.5
5    (5+6+5+6+3)/5 = 5
6    (5+6+5+6+3+1)/6 = 4.333
...  ...
Let's observe how the sample mean x̄n changes as we follow
n = 1, n = 2, . . . up to a large number
[Figure: running sample mean x̄k plotted against k for three simulated sequences of die rolls; panels show sequence 1 for the first 100 rolls and the first 10000 rolls, and sequences 2 and 3 for the first 10000 rolls.]
The LAW OF LARGE NUMBERS states that: If we take larger
and larger independent samples of a random variable X, then the
sample mean converges to the expected value.
This applies to all random variables with an expectation.
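The die-rolling simulation above can be reproduced with a short sketch. The course uses Stata; the following is a rough Python equivalent, with an arbitrary seed and roll count (not the sequence shown on the slide):

```python
import random

random.seed(111)  # arbitrary seed, for reproducibility only

n_rolls = 10_000
total = 0
running_means = []
for k in range(1, n_rolls + 1):
    total += random.randint(1, 6)    # one fair die roll
    running_means.append(total / k)  # sample mean of the first k rolls

# early means are noisy; the last one should sit near E[X] = 3.5
print(running_means[0], running_means[9], running_means[-1])
```

Plotting `running_means` against k would reproduce the trace in the figure above.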
Another Example: Suppose X follows a Binomial distribution,
Binom(n=15, p=.3). You have done a similar simulation in lab.
set obs 100
set seed 111
gen x1=rbinomial(15,.3)
Now let's compute the sample means:
gen k=_n
gen xbar1=sum(x1)/k
twoway scatter xbar1 k
E[X] = np = 15 × .3 = 4.5, if the LAW OF LARGE NUMBERS
applies, we should see the sample mean converge to 4.5 as n
increases.
The first 13 numbers and the sample means...

list k x1 xbar1 if k<=13

     +--------------------+
     |  k   x1      xbar1 |
     |--------------------|
  1. |  1    6          6 |
  2. |  2    5        5.5 |
  3. |  3    5   5.333333 |
  4. |  4    5       5.25 |
  5. |  5    1        4.4 |
     |--------------------|
  6. |  6    4   4.333333 |
  7. |  7    4   4.285714 |
  8. |  8    8       4.75 |
  9. |  9    4   4.666667 |
 10. | 10    1        4.3 |
     |--------------------|
 11. | 11    5   4.363636 |
 12. | 12    5   4.416667 |
 13. | 13    8   4.692307 |
     +--------------------+
[Figure: sample means x̄k from iid Binomial(15, .3) draws plotted against k; panels show the first 100 and the first 1000 observations. The traces settle near E[X] = 4.5.]
Sample means from iid Poisson(7.5)

clear
set obs 1000
set seed 111
gen x1=rpoisson(7.5)
gen k=_n
gen xbar1=sum(x1)/k
twoway scatter xbar1 k,c(l) msize(.4)
[Figure: sample means x̄k from iid Poisson(7.5) draws plotted against k for several simulated sequences; each trace settles near E[X] = 7.5.]
Sample means from iid normal with mean 15 and standard deviation
2.5, 1, or 10.

clear
set obs 5000
set seed 111
gen x1=rnormal(15,2.5)
gen k=_n
gen xbar1=sum(x1)/k
twoway scatter xbar1 k if xbar1<15.5&xbar1>14.75,c(l) msize(.4)

clear
set obs 5000
set seed 222
gen x1=rnormal(15,1)
gen k=_n
gen xbar1=sum(x1)/k
twoway scatter xbar1 k if xbar1<15.5&xbar1>14.75,c(l) msize(.4)
[Figure: sample means x̄k plotted against k for normal draws with σ = 2.5, σ = 1, and σ = 10 (the σ = 10 case shown both at full scale and zoomed in); all traces settle near µ = 15.]
Notice that in each example, the sample mean converges to the
expectation µ = 15.
Also notice that for most n, the sample mean X̄n does not equal
15. And for the last simulation from N(µ = 15, σ² = 10²), the
sample mean varies more than in the first example
(N(µ = 15, σ² = 2.5²)) and the second example (N(µ = 15, σ² = 1)).
Recall if Xi ∼ N(µ, σ²) we have X̄n ∼ N(µ, σ²/n).
Thus the variance of X̄ depends on both the original σ² as well as
the sample size n.
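The σ²/n relationship can be checked directly by simulating many sample means and comparing their spread to σ/√n. Below is a Python sketch (seed, sample size, and replication count are arbitrary choices for illustration):

```python
import random, math, statistics

random.seed(42)  # arbitrary seed
mu, sigma, n, reps = 15.0, 2.5, 50, 2000

# draw many independent sample means, each based on n normal observations
means = []
for _ in range(reps):
    means.append(sum(random.gauss(mu, sigma) for _ in range(n)) / n)

# empirical SD of the sample means vs the theoretical sigma / sqrt(n)
print(statistics.stdev(means), sigma / math.sqrt(n))
```

The two printed values should agree closely, illustrating SD(X̄) = σ/√n.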
We have seen examples of Binomial, Poisson and Normal
distributions.
In each case, the sample mean converges to the expected value as
the sample size increases.
In fact, the law of large numbers applies to all probability
distributions that have expectations.
If X1, X2, . . . are independent identically distributed random variables from a
distribution with expectation E[X],

X̄n = (X1 + X2 + . . . + Xn)/n → E[X] as n → ∞
FYI: Cauchy distribution
Exceptions exist, but you may never run into these in your study.
For those interested, the most famous example is the Cauchy
distribution with probability density

f(x) = 1 / (π(1 + x²))

[Figure: density of the standard Cauchy distribution, plotted for x from −4 to 4.]

This distribution has no expected value, and when you take a
sequence of independent observations from a Cauchy distribution,
the sample mean does not converge to the "center".
FYI: law of large numbers on variance estimate
The law of large numbers applies to all distributions with
expectations. Recall that we have defined variance as the
"expectation of squared distance to the mean": for a random
variable X with mean µ, var(X) = E[(X − µ)²].
If we take an infinite sequence of independent random variables
X1, X2, . . . , Xk, . . . where each Xk has the same probability
distribution as X, then

Y1 = (X1 − µ)², Y2 = (X2 − µ)², . . . , Yk = (Xk − µ)², . . .

are still independent of each other and each has the expectation
E[Yk] = E[(Xk − µ)²] = var(X).
Thus

Ȳn = ((X1 − µ)² + . . . + (Xn − µ)²)/n → var(X) as n → ∞
FYI: law of large numbers on variance estimate
Let's see the Bernoulli example. Consider X ∼ Bernoulli(p = .35).

k    X    (X − .35)²    running average of (X − .35)²
1    0    0.1225        0.1225
2    1    0.4225        (.1225+.4225)/2 = 0.2725
3    0    0.1225        (.1225+.4225+.1225)/3 = 0.2225
4    1    0.4225        0.2725
5    1    0.4225        0.3025
6    0    0.1225        0.2725
7    0    0.1225        0.2511
8    1    0.4225        0.2725
9    0    0.1225        0.2558
10   0    0.1225        0.2425
...  ...  ...           ...

Recall var(X) = p(1 − p) = .35 × .65 = .2275
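The convergence of the running average of (X − .35)² to var(X) = .2275 can be checked with a quick simulation, sketched in Python (arbitrary seed and sample size):

```python
import random

random.seed(2510)  # arbitrary seed
p, mu, n = 0.35, 0.35, 100_000

# running sum of Y_k = (X_k - mu)^2 for Bernoulli(p) draws;
# by the law of large numbers the average approaches var(X) = p(1-p) = 0.2275
total = 0.0
for _ in range(n):
    x = 1 if random.random() < p else 0
    total += (x - mu) ** 2

print(total / n)
```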
FYI: law of large numbers on variance estimate
[Figure: running average of (X − .35)² plotted against k for k up to 1000; the trace settles near var(X) = .2275.]
FYI: law of large numbers on a function of X
In general, if we take a long sequence of independent
observations of a random variable X, and if a function of X, g(X),
has expectation E[g(X)], then the law of large numbers applies to
g(X) as well:

(g(X1) + g(X2) + . . . + g(Xn))/n → E[g(X)] as n → ∞
FYI: law of large numbers on a function of X
For example, if X1, X2, . . . , Xn all have the same probability
distribution Normal(µ = 5, σ² = 2²):
• X̄ = (X1 + X2 + . . . + Xn)/n → 5 as n → ∞
• Consider the function g(X) = X². Then
(X1² + X2² + . . . + Xn²)/n → E[X²] = var(X) + (E[X])² = 2² + 5² = 29 as n → ∞
(since var(X) = E[X²] − (E[X])², we have
E[X²] = var(X) + (E[X])²)
Exercise: if X1, X2, . . . , Xn all have the same probability
distribution Normal(µ = 0, σ² = 3²), what does the sample mean
of X² converge to?
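The first bullet's limit of 29 is easy to check numerically; a Python sketch (arbitrary seed and sample size, not from the slides):

```python
import random

random.seed(7)  # arbitrary seed
mu, sigma, n = 5.0, 2.0, 200_000

# sample mean of g(X) = X^2 for X ~ N(5, 2^2);
# by the LLN it approaches E[X^2] = var(X) + (E[X])^2 = 4 + 25 = 29
mean_sq = sum(random.gauss(mu, sigma) ** 2 for _ in range(n)) / n
print(mean_sq)
```

Changing the parameters to µ = 0, σ = 3 answers the exercise in the same way.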
The law of large numbers is the reason statistics works: by taking a
large sample of observations, we can use an average of the
observations to estimate unknown parameters.
• For Bernoulli trials, we can use Xn to estimate p
• For Poisson distributions, we can use Xn to estimate λ since
Xn → λ
• For normal distributions, we can use Xn to estimate µ
We know these estimates should be close to the (unknown) truth
when n gets larger and larger because of the law of large numbers.
But how close do we get? By "close" do we mean equally probable
to be greater than or less than the truth? The sample mean X̄n
from n independent observations of a random variable X is itself a
random variable. What is the probability distribution of X̄n like?
Central Limit Theorem
The Central Limit Theorem answers what the probability
distribution of X̄n looks like as n increases toward infinity.
Central Limit Theorem
Recall that for random variables
X1, X2, . . . , Xn that are independent identically distributed Normal
N(µ, σ²),
we know X̄n ∼ N(µ, σ²/n) exactly, and n does not have to be a large
number.
For other random variables, the distribution of the sample mean
converges to a normal distribution as n → ∞:

√n (X̄n − µ) → N(0, σ²) as n → ∞.

For a large n, X̄n is approximately distributed as
N(µ, σ²/n).
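The approximation can be seen in a simulation: sample means of Binomial(15, .3) draws (a non-normal distribution) center at 4.5, spread like σ/√n, and roughly 68% land within one standard deviation of the mean, as a normal distribution would predict. A Python sketch with arbitrary seed and sizes:

```python
import random, math, statistics

random.seed(111)  # arbitrary seed

def rbinom(n_trials, p):
    """One Binomial(n_trials, p) draw as a sum of Bernoulli trials."""
    return sum(1 for _ in range(n_trials) if random.random() < p)

n, reps = 30, 3000
mu = 15 * 0.3                             # E[X] = 4.5
sd_xbar = math.sqrt(15 * 0.3 * 0.7 / n)   # sigma / sqrt(n)

# many independent sample means, each from n Binomial(15, .3) draws
means = [sum(rbinom(15, 0.3) for _ in range(n)) / n for _ in range(reps)]

# if Xbar is roughly N(mu, sigma^2/n), about 68% of the sample means
# should fall within one standard deviation of mu
frac_within_1sd = sum(abs(m - mu) < sd_xbar for m in means) / reps
print(statistics.mean(means), statistics.stdev(means), frac_within_1sd)
```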
Central Limit Theorem
Example:
Suppose that white blood cell count (per microliter) in adults is
normally distributed with mean 8000 and standard deviation 1200.
If we have a random sample of 15 adults, what is the probability
that the average from the 15 adults
• is less than 7500?
• is greater than 8600?
• is between 8000 and 8600?
Central Limit Theorem
Since X1, X2, . . . , X15 are i.i.d. N(8000, 1200²),

X̄ ∼ N(µ, σ²/n) = N(8000, 1200²/15)

Standardization:

(X̄ − µ) / (σ/√n) ∼ N(0, 1) = Z

P(X̄ < 7500) = P( (X̄ − 8000)/(1200/√15) < (7500 − 8000)/(1200/√15) )
            = P( Z < (7500 − 8000)/(1200/√15) )
            = P(Z < −1.61) = .053
Central Limit Theorem
The probability that any individual has a wbc count less than
7500 is, in contrast,

P(X1 < 7500) = P( (X1 − 8000)/1200 < (7500 − 8000)/1200 )
             = P( Z < (7500 − 8000)/1200 )
             = P(Z < −0.41) = .34

So we would not be surprised to see one individual's wbc count
below 7500. But we may be somewhat surprised if the
average from 15 individuals is below 7500.
Central Limit Theorem
P(X̄ > 8600) = P( (X̄ − 8000)/(1200/√15) > (8600 − 8000)/(1200/√15) )
            = P( Z > (8600 − 8000)/(1200/√15) )
            = P(Z > 1.936) ≈ 0.026

Since P(X̄ > 8000) = 0.5,

P(8000 < X̄ < 8600) = 0.50 − .026 = 0.474
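All four wbc probabilities can be verified without a Z-table, using the standard normal CDF written in terms of the error function. A Python sketch:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma, n = 8000, 1200, 15
se = sigma / sqrt(n)   # SD of the sample mean, 1200/sqrt(15)

p_mean_below = phi((7500 - mu) / se)       # P(Xbar < 7500), about .053
p_mean_above = 1 - phi((8600 - mu) / se)   # P(Xbar > 8600), about .026
p_between = phi((8600 - mu) / se) - 0.5    # P(8000 < Xbar < 8600), about .474
p_indiv_below = phi((7500 - mu) / sigma)   # P(X1 < 7500), about .34

print(p_mean_below, p_mean_above, p_between, p_indiv_below)
```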
Central Limit Theorem
Example:
X1, X2, . . . , XN are i.i.d. Binomial(n = 8, p = .95).
(NOTE: here N is the number of X's you observe, and n is the parameter
of each binomial distribution; n = 8 is fixed.) Suppose N is a
large number.
What do we know about X̄N?
• E[X̄N] = E[X] = np = 8 × .95 = 7.6
• var(X̄N) = var(X1)/N = 8(.95)(1 − .95)/N = .38/N
What about the distribution of X̄N?
The Central Limit Theorem says the distribution of X̄N is approximately
N(7.6, .38/N)
Central Limit Theorem
Distribution of X̄, when N = 5
[Figure: histogram of simulated sample means from Binomial(8, .95) with N = 5.]
Central Limit Theorem
[Figure: histograms of simulated sample means from Binomial(8, .95) for N = 5, 10, 20, and 100; the distributions tighten around 7.6 as N grows.]
Central Limit Theorem
[Figure: histogram of sample means from Binom(8, .95) with N = 100, with a smooth normal density curve overlaid.]
The red smooth curve is a normal distribution with mean 7.6 and
variance 0.38/100 = .0038, standard deviation √.0038 = 0.0616
Recall we showed that
• E[X̄N] = 8 × .95 = 7.6
• var(X̄N) = var(X1)/N = 8(.95)(1 − .95)/N = .38/N
• thus SD(X̄N) = √.38/√N = √.38/√100 = .0616
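These numbers can be checked by simulating many sample means directly; a Python sketch with arbitrary seed and replication count:

```python
import random, math, statistics

random.seed(8)  # arbitrary seed

def rbinom(n_trials, p):
    """One Binomial(n_trials, p) draw as a sum of Bernoulli trials."""
    return sum(1 for _ in range(n_trials) if random.random() < p)

N, reps = 100, 2000
# many independent sample means, each averaging N Binomial(8, .95) draws
means = [sum(rbinom(8, 0.95) for _ in range(N)) / N for _ in range(reps)]

print(statistics.mean(means))    # near E[X] = 7.6
print(statistics.stdev(means))   # near sqrt(.38/100) = .0616
print(math.sqrt(0.38 / N))
```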
Central Limit Theorem
Example: if we have i.i.d. Poisson random variables, what is
the distribution of X̄n?
Suppose X1, X2, . . . , Xn are i.i.d. Poisson(λ = 1.5)
Central Limit Theorem
[Figure: histograms of simulated sample means from Poisson(1.5) for N = 5, 10, 20, and 100; the distributions tighten around 1.5 as N grows.]
Central Limit Theorem
[Figure: histogram of sample means from Poisson(1.5) with N = 100, with a smooth normal density curve overlaid.]
The red smooth curve is a normal distribution with mean 1.5 and
variance 0.015.
Recall each X ∼ Pois(1.5), so
• E[X̄n] = E[X] = 1.5
• var(X̄n) = var(X1)/N = 1.5/N = 1.5/100 = .015
• thus SD(X̄n) = √.015 = .122
Central Limit Theorem
Exercise: A department of a Rhode Island hospital has daily visits
following a Poisson distribution with mean 5. Assume the number of
visits is independent from day to day.
• What is the expected value of the "average daily visits in October"?
• What is the variance of the average daily visits in October?
• What is the probability that there are more than 6 visits in a
day?
• What is the probability that the average daily visit in October
is greater than 5.5?
• What is the probability that the average daily visit from a year
is greater than 5.5?
Central Limit Theorem
Since we have X ∼ Poisson(5), n = 31 for October and n = 365 for a year:
• E[X̄n] = E[X] = 5
• var(X̄31) = var(X)/31 = 5/31
• P(X > 6) = 1 − P(X ≤ 6) = 1 − Σ_{k=0}^{6} e⁻⁵ 5ᵏ/k! = 0.238
• X̄31 is approximately Normal(5, 5/31):
P(X̄31 > 5.5) = P( (X̄31 − 5)/√(5/31) > (5.5 − 5)/√(5/31) ) = P(Z > 1.245) = 0.1067
• X̄365 is approximately Normal(5, 5/365):
P(X̄365 > 5.5) = P( (X̄365 − 5)/√(5/365) > (5.5 − 5)/√(5/365) ) = P(Z > 4.27) ≈ 0
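The exact Poisson tail and the two CLT approximations above can be computed in a few lines; a Python sketch:

```python
from math import erf, exp, factorial, sqrt

def phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

lam = 5

# exact Poisson tail for one day: P(X > 6) = 1 - P(X <= 6)
p_day = 1 - sum(exp(-lam) * lam**k / factorial(k) for k in range(7))

# CLT approximations for the monthly (n = 31) and yearly (n = 365) averages
p_oct = 1 - phi((5.5 - lam) / sqrt(lam / 31))
p_year = 1 - phi((5.5 - lam) / sqrt(lam / 365))

print(p_day, p_oct, p_year)
```

Note how the yearly probability is essentially zero: with 365 days of averaging, the sample mean is very unlikely to be half a visit above its expectation.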
Central Limit Theorem
Example: Rice Chapter 5 problem 15
Suppose you bet $5 on a fair game. Use the central limit theorem
to approximate your probability of losing more than $75 in a
sequence of 50 independent games.
Central Limit Theorem
Let Xi be your winnings in the i-th game. A fair game means

P(Xi = 5) = P(Xi = −5) = 0.5

Event of interest:

Σ_{i=1}^{50} Xi < −75, equivalent to X̄ < −75/50 = −1.5

In order to apply the CLT we need the mean and variance of X.

E[X] = 0
E[X²] = Σ_{x ∈ {−5, 5}} x² P(X = x) = 5² × 0.5 + (−5)² × 0.5 = 25
σ² = var(X) = E[X²] − (E[X])² = 25 − 0 = 25

Central Limit Theorem
By the CLT, X̄ is approximately N(0, σ²/n); here X̄ ∼ N(0, 25/50), thus

P(X̄ < −1.5) = P( Z < (−1.5 − 0)/√(25/50) )
            = P(Z < −2.12)
            = 0.017
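The arithmetic of the standardization can be double-checked with the error-function form of the normal CDF; a Python sketch:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Xbar is approximately N(0, 25/50); losing more than $75 over 50 games
# means Xbar < -75/50 = -1.5
z = (-1.5 - 0) / sqrt(25 / 50)
p_lose = phi(z)
print(z, p_lose)   # z is about -2.12, p_lose about 0.017
```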
Central Limit Theorem
EXERCISE: Rice, Chapter 5, Problem 17
Suppose that a measurement has mean µ and variance σ² = 25. We
would like to estimate µ by taking a number of measurements and
calculating the average X̄. We want to be 95% sure that the estimate
is close enough to µ that the difference between X̄ and µ is
less than 1. How many measurements do you need to take?
Central Limit Theorem
X̄ ∼ N(µ, σ²/n) ⟹ (X̄ − µ)/(σ/√n) = Z

P(|X̄ − µ| < 1) = 0.95
⟹ P( |X̄ − µ|/(σ/√n) < 1/(σ/√n) ) = 0.95
⟹ P( |Z| < 1/(σ/√n) ) = 0.95

Looking up the Z-table,

1/(σ/√n) = 1.96

Thus

n = 1.96² σ² = 1.96² × 25 = 96.04 ≈ 96
(since n must be an integer, rounding up to n = 97 guarantees the bound).
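The sample-size calculation, and a check that the resulting n actually delivers at least 95% coverage, can be sketched in Python (note 1.96² × 25 = 96.04, so the integer sample size is rounded up to 97):

```python
from math import erf, sqrt, ceil

def phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

sigma = 5.0   # sigma^2 = 25
z95 = 1.96    # two-sided 95% normal critical value

n_exact = (z95 * sigma) ** 2   # 96.04
n = ceil(n_exact)              # round up: 97 measurements

# coverage check: with n measurements, P(|Xbar - mu| < 1) should exceed 0.95
coverage = 2 * phi(1 / (sigma / sqrt(n))) - 1
print(n_exact, n, coverage)
```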
Central Limit Theorem
Summary:
For i.i.d. random variables X1, X2, . . . , Xn that each follow a
distribution with mean µ and variance σ²:
• The sample mean X̄n is a random variable
• The observed data points x1, x2, . . . , xn are one set of
realizations of the random variables X1, X2, . . . , Xn, and the
sample mean calculated on that particular dataset, x̄n, is one
realization of the r.v. X̄n
• The sample mean X̄n has mean µ and variance σ²/n
• When n is large, the distribution of X̄n is
approximately normal with mean µ and variance σ²/n