22S:101 Biostatistics: J. Huang
Chapter 8: Sampling Distribution of the Mean
Sample mean X̄: it is often used to estimate the
population mean µ.
Sampling distribution of X̄: If we could repeatedly, and
indefinitely, take samples of size n from the population we are
interested in, we could compute the sample mean of each
sample. We would then have a collection of sample
means. The distribution (histogram) of these sample means is called the
sampling distribution of X̄.
For example, if we were to select repeated samples of
size 25 from the population of males living in the US and
calculate the mean serum cholesterol level for each sample,
we would end up with the sampling distribution of the mean
serum cholesterol level for samples of size 25.
We can use simulation to gain some intuition about the
Central Limit Theorem (CLT), introduced below (see the simulations in R).
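The R simulations mentioned above are not reproduced here. As a hypothetical stand-in, the Python sketch below builds an approximate sampling distribution exactly as described: draw many samples of size n, record each sample mean, and look at the histogram of those means. The population (exponential with mean 1, deliberately skewed), the sample size n = 25, the 5000 repetitions, and the helper names `sampling_distribution` and `exponential` are all illustrative assumptions, not taken from the course materials.

```python
import math
import random
import statistics


def sampling_distribution(draw, n, reps, seed=0):
    """Return `reps` sample means, each from a sample of size n.

    `draw` takes a random.Random instance and returns one observation
    from the (hypothetical) population.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


def exponential(rng):
    # Hypothetical skewed population: exponential with mean 1 and SD 1.
    return rng.expovariate(1.0)


means = sampling_distribution(exponential, n=25, reps=5000)

# Crude text histogram: one '#' per 50 sample means in each bin of width 0.1.
# Means outside [0.4, 1.6) are simply not drawn.
lo, width, nbins = 0.4, 0.1, 12
counts = [0] * nbins
for m in means:
    k = math.floor((m - lo) / width)
    if 0 <= k < nbins:
        counts[k] += 1
for k, c in enumerate(counts):
    print(f"{lo + k * width:4.1f}-{lo + (k + 1) * width:4.1f} {'#' * (c // 50)}")
```

Even though individual observations are strongly right-skewed, the histogram of the means comes out roughly bell-shaped and centred near 1.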
However, in practice, repeated sampling is often very
expensive or even impossible. Fortunately, the
(sampling) distribution of some of the most important statistics,
such as the sample mean, can be derived from the
Central Limit Theorem.
Before turning to the Central Limit Theorem, we note the following basic
properties of the distribution of the sample mean X̄:
1. The mean (expectation) of the sample mean X̄ is
identical to the population mean µ.
2. The variance of the sample mean is σ²/n,
where σ² is the population variance and n is the sample
size. Thus the variance of the sample mean is the population
variance divided by n, and its standard deviation is σ/√n.
3. We can standardize X̄ so that it has mean 0 and
variance 1 as follows:
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \sqrt{n}\,\frac{\bar{X} - \mu}{\sigma}.$$
Note that this is the same transformation we used to convert
an arbitrary normal distribution into a standard normal
distribution, with σ/√n playing the role of the standard deviation.
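Properties 1 to 3 can be checked numerically. The sketch below is a hypothetical illustration, not part of the original notes: it assumes an exponential population with µ = σ = 2 and samples of size n = 16, simulates many sample means, and compares their mean and variance with µ and σ²/n. It then standardizes each mean with a helper named `standardize` (a name chosen here for illustration) implementing Z = (X̄ − µ)/(σ/√n), and confirms that both algebraic forms of Z agree on the SCL numbers used later in the chapter (µ = 211, σ = 46, n = 25, X̄ = 229).

```python
import math
import random
import statistics


def standardize(xbar, mu, sigma, n):
    """Z = (xbar - mu) / (sigma / sqrt(n)), i.e. sqrt(n) * (xbar - mu) / sigma."""
    return (xbar - mu) / (sigma / math.sqrt(n))


# Hypothetical population: exponential with mean mu = 2 (its SD is also 2).
mu, sigma, n, reps = 2.0, 2.0, 16, 20000
rng = random.Random(1)
means = [statistics.fmean([rng.expovariate(1 / mu) for _ in range(n)])
         for _ in range(reps)]

# Property 1: E[Xbar] = mu.   Property 2: Var[Xbar] = sigma^2 / n.
print("mean of sample means:", round(statistics.fmean(means), 3), "vs mu =", mu)
print("variance of sample means:", round(statistics.pvariance(means), 3),
      "vs sigma^2/n =", sigma**2 / n)

# Property 3: the standardized means have mean about 0 and variance about 1.
z = [standardize(m, mu, sigma, n) for m in means]
print("Z mean:", round(statistics.fmean(z), 3))
print("Z variance:", round(statistics.pvariance(z), 3))

# The two algebraic forms of Z give the same value.
z_scl = standardize(229, 211, 46, 25)
print(z_scl, math.sqrt(25) * (229 - 211) / 46)
```

The simulated values only approximate µ, σ²/n, 0 and 1; they get closer as the number of repetitions grows.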
The Central Limit Theorem
For a large enough sample size n, the distribution of
$$Z = \sqrt{n}\,\frac{\bar{X} - \mu}{\sigma}$$
is approximately normal with mean 0 and variance 1.
Another way to state the above result is: for a large enough
sample size n, the distribution of X̄ is approximately
normal with mean µ and variance σ²/n. We can write
this as
$$\bar{X} \overset{\text{approx}}{\sim} N(\mu, \sigma^2/n).$$
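To see the approximation at work, the sketch below compares a simulated probability for X̄ with the value given by N(µ, σ²/n). The skewed population (exponential with µ = σ = 1), the sample size n = 30, and the cutoff 1.2 are hypothetical choices for illustration. It uses Python's standard-library `statistics.NormalDist`, which is parameterized by the mean and the standard deviation, so the CLT variance σ²/n enters as σ/√n.

```python
import math
import random
import statistics
from statistics import NormalDist

# Hypothetical skewed population: exponential with mu = sigma = 1.
mu, sigma, n, reps = 1.0, 1.0, 30, 20000
cutoff = 1.2

# Empirical P(Xbar <= cutoff) from repeated samples of size n.
rng = random.Random(2)
means = [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
         for _ in range(reps)]
simulated = sum(m <= cutoff for m in means) / reps

# CLT approximation: Xbar ~approx N(mu, sigma^2 / n).
# NormalDist takes the standard deviation, hence sigma / sqrt(n).
approx = NormalDist(mu, sigma / math.sqrt(n)).cdf(cutoff)

print(f"simulated P(Xbar <= {cutoff}) = {simulated:.3f}")
print(f"CLT approximation          = {approx:.3f}")
```

The two numbers should be close even though the population itself is far from normal; for small n or heavily skewed populations the gap can be larger.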
Applications of the Central Limit Theorem [Section 8.3]
As described above, one application of the CLT is to
approximate the distribution of the sample mean X̄ for a
reasonably large sample size n.
Example (pages 198-202): Consider the distribution of
serum cholesterol level (SCL) for all 20- to 74-year-old males in the US:
its population mean is µ = 211 mg/100 ml, and its standard deviation
is σ = 46 mg/100 ml. Suppose we select repeated samples of size 25 from
the population.
(1) What mean values of SCL cut off the lower 10% and the upper 10% of the
sampling distribution?
(2) If we select repeated random samples of size 25 from this
population, what proportion of the samples will have a mean SCL
between 193 and 229?
(3) What are the lower and upper limits that enclose 90% of the
means of samples of size 25 drawn from the population?
(4) What is the upper bound for 95% of the mean serum cholesterol
levels of samples of size 25?
(5) How large would the sample need to be for 95% of their (sample)
means to lie within 5 mg/100ml of the population mean µ?
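The five questions can all be answered from the CLT approximation X̄ ≈ N(211, 46²/25), so the standard deviation of X̄ is 46/√25 = 9.2 mg/100 ml. The sketch below does this with Python's standard-library `statistics.NormalDist` (`cdf` for probabilities, `inv_cdf` for cutoffs). Variable names such as `se`, `q1` and `n5` are illustrative only. Because the code uses exact normal quantiles rather than rounded table values (1.28, 1.645, 1.96), its answers may differ slightly from hand calculations.

```python
import math
from statistics import NormalDist

mu, sigma, n = 211, 46, 25            # SCL in mg/100 ml, from the example
se = sigma / math.sqrt(n)             # SD of the sample mean: 46 / 5 = 9.2
xbar = NormalDist(mu, se)             # CLT approximation to the distribution of Xbar

# (1) Means cutting off the lower 10% and the upper 10%
q1 = (xbar.inv_cdf(0.10), xbar.inv_cdf(0.90))

# (2) Proportion of sample means between 193 and 229
p2 = xbar.cdf(229) - xbar.cdf(193)

# (3) Limits enclosing the central 90% of sample means
q3 = (xbar.inv_cdf(0.05), xbar.inv_cdf(0.95))

# (4) One-sided upper bound for 95% of sample means
q4 = xbar.inv_cdf(0.95)

# (5) Smallest n with 95% of means within 5 mg/100 ml of mu:
#     z_0.975 * sigma / sqrt(n) <= 5  =>  n >= (z_0.975 * sigma / 5) ** 2
z975 = NormalDist().inv_cdf(0.975)
n5 = math.ceil((z975 * sigma / 5) ** 2)

print("(1)", [round(v, 1) for v in q1])
print("(2)", round(p2, 3))
print("(3)", [round(v, 1) for v in q3])
print("(4)", round(q4, 1))
print("(5)", n5)
```

Rounding up in (5) matters: (1.96 × 46 / 5)² is about 325.1, and a sample size must be a whole number at least that large.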