7 Sampling Distributions and the Central Limit Theorem
7.1 Introduction
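Example 7.1. Suppose that $Y_1, \ldots, Y_n$ is an iid sample from $f_Y(y)$. For example, each of the
following is a statistic:
• $T(Y_1, \ldots, Y_n) = \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$ (the sample mean)
• $T(Y_1, \ldots, Y_n) = \frac{1}{2}\left[Y_{(n/2)} + Y_{(n/2+1)}\right]$ if $n$ is even (the sample median)
• $T(Y_1, \ldots, Y_n) = Y_{(1)}$ (the sample minimum)
• $T(Y_1, \ldots, Y_n) = Y_{(n)} - Y_{(1)}$ (the sample range)
• $T(Y_1, \ldots, Y_n) = S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2$ (the sample variance)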
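To make the list in Example 7.1 concrete, the following minimal Python sketch evaluates these statistics on a small sample. The function name `summary_statistics` and the data values are hypothetical and chosen only for illustration; the sketch assumes an even sample size so that the median formula above applies.

```python
def summary_statistics(sample):
    """Evaluate the statistics of Example 7.1 on an even-sized sample."""
    n = len(sample)
    if n < 2 or n % 2 != 0:
        raise ValueError("this sketch assumes an even sample size n >= 2")
    ordered = sorted(sample)  # order statistics Y_(1) <= ... <= Y_(n)
    mean = sum(sample) / n
    return {
        "mean": mean,
        # Y_(n/2) and Y_(n/2+1) are at 0-based positions n/2 - 1 and n/2
        "median": 0.5 * (ordered[n // 2 - 1] + ordered[n // 2]),
        "minimum": ordered[0],
        "maximum": ordered[-1],
        "range": ordered[-1] - ordered[0],
        # S^2 uses the divisor n - 1
        "variance": sum((y - mean) ** 2 for y in sample) / (n - 1),
    }

data = [4.0, 1.0, 3.0, 2.0]  # hypothetical observations
stats = summary_statistics(data)
print(stats)
```

Each entry is a function of the observed sample only, which is what makes it a statistic.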
7.2 Sampling distributions related to the normal distribution
Example 7.2. Suppose that $Y_1, \ldots, Y_n$ is an iid sample from a $N(\mu, \sigma^2)$ distribution. What is the distribution
of the sample mean?
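One way to reach the answer to Example 7.2 is sketched below. It relies on two standard facts that are assumed here rather than taken from the transcript: expectation is linear, and a linear combination of independent normal random variables is again normal. This is not necessarily the argument given in lecture.

```latex
\begin{align*}
E(\bar{Y}) &= \frac{1}{n}\sum_{i=1}^{n} E(Y_i) = \mu, \\
V(\bar{Y}) &= \frac{1}{n^2}\sum_{i=1}^{n} V(Y_i) = \frac{\sigma^2}{n}
  \quad \text{(by independence)}, \\
\bar{Y} &\sim N\!\left(\mu, \frac{\sigma^2}{n}\right).
\end{align*}
```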
Example 7.3. In the interest of pollution control, an experimenter records $Y$, the amount of bacteria
per unit volume of water (measured in mg/cm$^3$). The population distribution for $Y$ is assumed to
be normal with mean $\mu = 48$ and variance $\sigma^2 = 100$. That is, $Y \sim N(\mu, \sigma^2)$.
(a) What is the probability that a single water specimen’s bacteria amount will exceed 50 mg/cm$^3$?
(b) Suppose that the experimenter takes a random sample of $n = 100$ water specimens, and denote
the observations by $Y_1, \ldots, Y_{100}$. What is the probability that the sample mean $\bar{Y}$ will
exceed 50 mg/cm$^3$?
(c) How large should the sample size $n$ be so that $P(\bar{Y} > 50) < 0.01$?
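The three parts of Example 7.3 can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist` and the fact that $\bar{Y} \sim N(\mu, \sigma^2/n)$ for a normal sample; the variable names are illustrative. For part (c), the condition $P(\bar{Y} > 50) < 0.01$ is rewritten as $\sqrt{n} > z_{0.99}\,\sigma/(50 - \mu)$, where $z_{0.99}$ is the 0.99 quantile of the standard normal.

```python
from math import ceil, sqrt
from statistics import NormalDist

mu, sigma = 48.0, 10.0  # population mean and standard deviation (sigma^2 = 100)
threshold = 50.0

# (a) A single specimen: Y ~ N(mu, sigma^2)
p_single = 1 - NormalDist(mu, sigma).cdf(threshold)

# (b) The mean of n = 100 specimens: Ybar ~ N(mu, sigma^2 / n)
n = 100
p_mean = 1 - NormalDist(mu, sigma / sqrt(n)).cdf(threshold)

# (c) Smallest n with P(Ybar > 50) < 0.01
z = NormalDist().inv_cdf(0.99)
n_required = ceil((z * sigma / (threshold - mu)) ** 2)

print(round(p_single, 4), round(p_mean, 4), n_required)
```

Under these assumptions, part (a) comes out near 0.42, part (b) near 0.023, and part (c) gives $n = 136$.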
Now we prove that
$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1).$$
Example 7.4. In an ecological study examining the effects of Hurricane Katrina, researchers choose
$n = 9$ plots and, for each plot, record $Y$, the amount of dead weight material (recorded in grams).
Denote the nine dead weights by $Y_1, \ldots, Y_9$, where $Y_i$ represents the dead weight for plot $i$. The
researchers model the data $Y_1, \ldots, Y_9$ as an iid $N(100, 32)$ sample (that is, $\mu = 100$ and $\sigma^2 = 32$). What is the probability that the
sample variance $S^2$ of the nine dead weights is less than 20? That is, what is $P(S^2 < 20)$? Further,
how large should the sample size $n$ be so that $P(S^2 < 20) < 0.01$?
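Example 7.4 can be worked numerically from the result $(n-1)S^2/\sigma^2 \sim \chi^2(n-1)$, reading $N(100, 32)$ as mean 100 and variance 32. The Python standard library has no chi-square CDF, so the sketch below supplies one through the usual series for the regularized incomplete gamma function. The helper names `chi2_cdf` and `prob_s2_below` are hypothetical, and the series is an implementation choice for this sketch, not something taken from the notes.

```python
from math import exp, lgamma, log

def chi2_cdf(x, df):
    """P(W <= x) for W ~ chi-square(df), via the incomplete-gamma series."""
    if x <= 0:
        return 0.0
    a, half_x = df / 2.0, x / 2.0
    term = total = 1.0
    k = 0
    while term > 1e-15 * total:  # add terms until they stop mattering
        k += 1
        term *= half_x / (a + k)
        total += term
    return total * exp(a * log(half_x) - half_x - lgamma(a + 1.0))

sigma2, cutoff = 32.0, 20.0

def prob_s2_below(n):
    """P(S^2 < cutoff) = P(chi-square(n-1) < (n-1) * cutoff / sigma2)."""
    df = n - 1
    return chi2_cdf(df * cutoff / sigma2, df)

p_nine = prob_s2_below(9)

# Step n upward until the probability first drops below 0.01.
n_required = 9
while prob_s2_below(n_required) >= 0.01:
    n_required += 1

print(round(p_nine, 4), n_required)
```

For $n = 9$ the probability is about 0.24. The loop reports the first $n$ at which the probability falls below 0.01.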
7.3 The t distribution
Recall that if $Y_1, \ldots, Y_n$ is an iid $N(\mu, \sigma^2)$ sample, the sample mean $\bar{Y} \sim N(\mu, \sigma^2/n)$; i.e.,
$$Z = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$
Suppose we replace $\sigma$ by its estimator $S$. We now want to find the distribution of
$$t = \frac{\bar{Y} - \mu}{S/\sqrt{n}}.$$
The t(3) density function (dotted) and the standard normal density (solid)
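The heavier tails of the t distribution relative to the standard normal, which the figure caption above refers to, can be checked by simulation. The sketch below is a minimal standard-library example. It assumes the standard result that $(\bar{Y} - \mu)/(S/\sqrt{n})$ has $n - 1$ degrees of freedom, so $n = 4$ corresponds to $t(3)$; the function name `t_statistic`, the seed, and the number of replications are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

random.seed(0)  # fixed seed so the run is reproducible

def t_statistic(n, mu=0.0, sigma=1.0):
    """Draw an iid N(mu, sigma^2) sample of size n; return (Ybar - mu)/(S/sqrt(n))."""
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    return (mean(sample) - mu) / (stdev(sample) / sqrt(n))  # stdev divides by n - 1

reps = 20000
n = 4  # n - 1 = 3 degrees of freedom
draws = [t_statistic(n) for _ in range(reps)]

tail_t = sum(abs(t) > 2 for t in draws) / reps  # simulated P(|t| > 2)
tail_normal = 2 * (1 - NormalDist().cdf(2))     # exact P(|Z| > 2)

print(round(tail_t, 3), round(tail_normal, 3))
```

The simulated tail probability for the t statistic should come out noticeably larger than the normal value of about 0.046.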
7.4 The F distribution
1. If $W \sim F(\nu_1, \nu_2)$, then $1/W \sim F(\nu_2, \nu_1)$.
2. If $T \sim t(\nu)$, then $T^2 \sim F(1, \nu)$.
3. If $W \sim F(\nu_1, \nu_2)$, then $(\nu_1/\nu_2)W/[1 + (\nu_1/\nu_2)W] \sim \mathrm{Beta}(\nu_1/2, \nu_2/2)$.
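Property 2 follows in one line from the usual constructions of the t and F distributions. The sketch below assumes the standard representation $T = Z/\sqrt{W/\nu}$, with $Z \sim N(0,1)$ independent of $W \sim \chi^2(\nu)$, and the definition of $F(\nu_1, \nu_2)$ as a ratio of independent chi-square variables, each divided by its degrees of freedom. Neither representation is written out in the transcript.

```latex
\[
T = \frac{Z}{\sqrt{W/\nu}}
\quad\Longrightarrow\quad
T^2 = \frac{Z^2/1}{W/\nu} \sim F(1, \nu),
\qquad \text{since } Z^2 \sim \chi^2(1) \text{ independently of } W \sim \chi^2(\nu).
\]
```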
Example 7.5. Suppose that $Y_1, \ldots, Y_n$ is an iid sample from a $N(\mu, \sigma^2)$ distribution. Let $T = (\bar{Y} - \mu)/(S/\sqrt{n})$. What is the distribution of $T^2$?
Then we have
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F(n_1 - 1, n_2 - 1).$$
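The variance-ratio result can be sanity-checked by simulation. The sketch below assumes the setting that the formula suggests but the transcript does not spell out: two independent normal samples of sizes $n_1$ and $n_2$ with variances $\sigma_1^2$ and $\sigma_2^2$. It compares the simulated mean of the ratio with the known mean $\nu_2/(\nu_2 - 2)$ of an $F(\nu_1, \nu_2)$ distribution. The sample sizes, standard deviations, seed, and the name `variance_ratio` are arbitrary illustrative choices.

```python
import random
from statistics import mean, variance

random.seed(1)  # fixed seed so the run is reproducible

def variance_ratio(n1, n2, sigma1, sigma2):
    """Return (S1^2 / sigma1^2) / (S2^2 / sigma2^2) for two independent normal samples."""
    s1_sq = variance([random.gauss(0.0, sigma1) for _ in range(n1)])  # divisor n1 - 1
    s2_sq = variance([random.gauss(0.0, sigma2) for _ in range(n2)])  # divisor n2 - 1
    return (s1_sq / sigma1 ** 2) / (s2_sq / sigma2 ** 2)

n1, n2 = 6, 11
reps = 20000
ratios = [variance_ratio(n1, n2, sigma1=2.0, sigma2=3.0) for _ in range(reps)]

simulated_mean = mean(ratios)
theoretical_mean = (n2 - 1) / (n2 - 3)  # nu2 / (nu2 - 2) with nu2 = n2 - 1

print(round(simulated_mean, 3), theoretical_mean)
```

Agreement of the two means is only a partial check of the distributional claim. Comparing simulated quantiles with F quantiles would be a stronger one.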
7.5 The Central Limit Theorem
Proof of the central limit theorem:
Example 7.6. A chemist is studying the degradation behavior of vitamin B6 in a multivitamin.
The chemist selects a random sample of $n = 36$ multivitamin tablets, and for each tablet, counts
the number of days until the B6 content falls below the FDA requirement. Let $Y_1, \ldots, Y_{36}$ denote
the measurements for the 36 tablets, and assume that $Y_1, \ldots, Y_{36}$ is an iid sample from a Poisson
distribution with mean 50. What is the approximate probability that the average number of days $\bar{Y}$
will exceed 52? How many tablets does the researcher need to observe so that $P(\bar{Y} < 49.5) \approx 0.01$?
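Example 7.6 can be worked with the central limit theorem approximation $\bar{Y} \approx N(\mu, \sigma^2/n)$, using the fact that a Poisson distribution with mean 50 also has variance 50. The sketch below is a minimal standard-library calculation. It applies no continuity correction and rounds the required sample size up, both of which are choices made here rather than taken from the notes.

```python
from math import ceil, sqrt
from statistics import NormalDist

lam = 50.0  # Poisson mean; the variance is also lam
n = 36

# Approximate P(Ybar > 52) with Ybar ~ N(lam, lam / n)
p_exceed = 1 - NormalDist(lam, sqrt(lam / n)).cdf(52.0)

# Solve (49.5 - lam) / sqrt(lam / n) = z_{0.01} for n
z = NormalDist().inv_cdf(0.01)  # about -2.326
n_required = ceil((z * sqrt(lam) / (49.5 - lam)) ** 2)

print(round(p_exceed, 4), n_required)
```

Under this approximation the first probability is roughly 0.045, and about 1083 tablets are needed for the second condition.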
7.6 The normal approximation to the binomial
Let
$$X = \sum_{i=1}^{n} Y_i,$$
the number of “successes.” What is the distribution of $X$?
Define the sample proportion $\hat{p}$ as
$$\hat{p} = \frac{X}{n} = \frac{1}{n}\sum_{i=1}^{n} Y_i = \bar{Y}.$$
Example 7.7. Use Monte Carlo simulation to approximate the sampling distribution of $\hat{p}$ for the
following cases:
Case 1: n = 10, p = 0.1
Case 2: n = 40, p = 0.1
Case 3: n = 100, p = 0.1
Case 4: n = 10, p = 0.5
Case 5: n = 40, p = 0.5
Case 6: n = 100, p = 0.5
One can clearly see that the normal approximation is poor when p = 0.1, except when n is very
large. On the other hand, when p = 0.5, the normal approximation is already quite good when
n = 40.
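The Monte Carlo experiment in Example 7.7 can be sketched with the standard library alone. The code below assumes the $Y_i$ are iid Bernoulli($p$) indicators, so that $\hat{p} = X/n$. A histogram would be the natural display; to keep the sketch dependency-free, it reports the sample skewness of the simulated $\hat{p}$ values as a rough numerical stand-in for how far the shape is from a symmetric normal curve. The function names, seed, and replication count are illustrative choices.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(2)  # fixed seed so repeated runs agree

def simulate_p_hat(n, p, reps=5000):
    """Return reps simulated values of p-hat = X/n for n Bernoulli(p) trials."""
    return [sum(random.random() < p for _ in range(n)) / n for _ in range(reps)]

def skewness(values):
    """Sample skewness; close to 0 for a symmetric, normal-looking shape."""
    m, s = mean(values), pstdev(values)
    return mean([(v - m) ** 3 for v in values]) / s ** 3

cases = [(10, 0.1), (40, 0.1), (100, 0.1), (10, 0.5), (40, 0.5), (100, 0.5)]
summary = {}
for n, p in cases:
    draws = simulate_p_hat(n, p)
    summary[(n, p)] = (mean(draws), pstdev(draws), skewness(draws))
    m, s, g = summary[(n, p)]
    print(f"n={n:3d} p={p}: mean={m:.3f} sd={s:.3f} "
          f"(theory {sqrt(p * (1 - p) / n):.3f}) skew={g:.2f}")
```

For $p = 0.1$ the skewness should be clearly positive at small $n$ and shrink as $n$ grows. For $p = 0.5$ it should stay near zero, in line with the conclusion stated above.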