22S:101 Biostatistics: J. Huang

Chapter 8: Sampling Distribution of the Mean

Sample mean X̄: it is often used to estimate the population mean µ.

Sampling distribution of X̄: If we could repeatedly draw samples of size n from a population of interest, indefinitely, then for each sample we could compute its sample mean. We would thus have a collection of sample means. The distribution of these sample means, as displayed by their histogram, is called the sampling distribution of X̄.

For example, if we were to select repeated samples of size 25 from the population of males living in the US and calculate the mean serum cholesterol level for each sample, we would end up with the sampling distribution of mean serum cholesterol levels for samples of size 25.

We can use simulation to gain some intuition about the Central Limit Theorem (CLT) (see the simulations in R).

In practice, however, repeated sampling is often very expensive and sometimes impossible. Fortunately, the sampling distribution of some of the most important statistics, such as the sample mean, can be derived from the Central Limit Theorem.

Before going into the Central Limit Theorem, we note two basic properties of the distribution of the sample mean X̄:

1. The mean (expectation) of X̄ is identical to the population mean µ.
2. The variance of X̄ is equal to σ²/n, where σ² is the population variance and n is the sample size. Thus the variance of the sample mean is n times smaller than the population variance.

It follows that we can standardize X̄ so that it has mean 0 and variance 1:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \sqrt{n}\,\frac{\bar{X} - \mu}{\sigma}.$$

Note that this has the same form as the equation we used to transform an arbitrary normal distribution into a standard normal distribution, with the standard deviation σ/√n of X̄ in place of σ.

The Central Limit Theorem

For large enough sample size n, the distribution of

$$Z = \sqrt{n}\,\frac{\bar{X} - \mu}{\sigma}$$

is approximately normal with mean 0 and variance 1.
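The notes refer to simulations in R that are not reproduced here. As a stand-in, the following is a minimal Python sketch of the same idea, under assumptions of our own: the population is taken to be Exponential(1) (so µ = 1 and σ = 1, and the population is clearly not normal), the sample size is 25, and the names `simulate_sample_means`, `N`, and `REPS` are hypothetical. It checks the two properties of the sample mean and how closely the standardized Z behaves like a standard normal variable.

```python
import random
import statistics


def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` samples of size `n` from an Exponential(1) population
    (an assumed, skewed population with mean 1 and SD 1) and return the
    list of sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]


MU, SIGMA = 1.0, 1.0   # population mean and SD of Exponential(1)
N, REPS = 25, 20000    # sample size and number of repeated samples (assumed)

means = simulate_sample_means(N, REPS)

# Property 1: the mean of the sample means should be close to mu.
mean_of_means = statistics.fmean(means)

# Property 2: the variance of the sample means should be close to sigma^2 / n.
var_of_means = statistics.pvariance(means)

# Standardize each sample mean: Z = (xbar - mu) / (sigma / sqrt(n)).
z = [(m - MU) / (SIGMA / N ** 0.5) for m in means]

# For a standard normal variable, about 95% of values lie within +/- 1.96.
share_within_196 = sum(abs(v) <= 1.96 for v in z) / REPS

print(f"mean of sample means:     {mean_of_means:.3f} (theory: {MU})")
print(f"variance of sample means: {var_of_means:.4f} (theory: {SIGMA**2 / N})")
print(f"share with |Z| <= 1.96:   {share_within_196:.3f} (normal: about 0.95)")
```

Changing `N` to a smaller value such as 2 or 5 should make the normal approximation visibly worse for this skewed population, which is one way to see what "large enough" means in practice.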
Another way to state the Central Limit Theorem is: for large enough sample size n, the distribution of X̄ is approximately normal with mean µ and variance σ²/n. We can write this as

$$\bar{X} \sim_{\text{approx}} N(\mu, \sigma^2/n).$$

Applications of the Central Limit Theorem [Section 8.3]

As described above, one application of the CLT is to approximate the distribution of the sample mean X̄ for reasonably large sample size n.

Example (pages 198-202): Consider the distribution of serum cholesterol level (SCL) for all 20- to 74-year-old males in the US: its population mean is µ = 211 mg/100 ml, and its standard deviation is σ = 46 mg/100 ml. Suppose we select repeated random samples of size 25 from this population:

(1) What mean values of SCL cut off the lower 10% and the upper 10% of the sampling distribution?
(2) What proportion of the samples will have a mean SCL between 193 and 229 mg/100 ml?
(3) What are the lower and upper limits that enclose 90% of the means of samples of size 25 drawn from the population?
(4) What is the upper bound for 95% of the mean serum cholesterol levels of samples of size 25?
(5) How large would the samples need to be for 95% of their means to lie within 5 mg/100 ml of the population mean µ?
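The five questions can be worked through numerically with the normal approximation. The following Python sketch is one possible reading, not the textbook's worked solution: it treats the sample mean as exactly normal with mean 211 and standard deviation 46/√25, reads question (4) as a one-sided bound (the value below which 95% of sample means fall), and reads question (5) as a symmetric two-sided requirement. All variable names are our own.

```python
import math
from statistics import NormalDist

MU, SIGMA, N = 211.0, 46.0, 25      # values given in the example

# Standard error of the sample mean: sigma / sqrt(n).
se = SIGMA / math.sqrt(N)

# Approximate sampling distribution of the sample mean under the CLT.
xbar = NormalDist(MU, se)

# (1) Values cutting off the lower 10% and the upper 10%.
lower_10 = xbar.inv_cdf(0.10)
upper_10 = xbar.inv_cdf(0.90)

# (2) Proportion of sample means between 193 and 229.
prop_193_229 = xbar.cdf(229) - xbar.cdf(193)

# (3) Symmetric limits enclosing the central 90% of sample means.
lower_90 = xbar.inv_cdf(0.05)
upper_90 = xbar.inv_cdf(0.95)

# (4) One-sided upper bound: 95% of sample means fall below this value.
upper_95 = xbar.inv_cdf(0.95)

# (5) Smallest n with z * sigma / sqrt(n) <= 5, where z is the 97.5th
#     percentile of the standard normal distribution.
z975 = NormalDist().inv_cdf(0.975)
n_needed = math.ceil((z975 * SIGMA / 5.0) ** 2)

print(f"standard error: {se:.1f}")
print(f"(1) lower/upper 10% cutoffs: {lower_10:.1f}, {upper_10:.1f}")
print(f"(2) P(193 <= mean <= 229): {prop_193_229:.3f}")
print(f"(3) central 90% limits: {lower_90:.1f}, {upper_90:.1f}")
print(f"(4) upper bound for 95%: {upper_95:.1f}")
print(f"(5) required sample size: {n_needed}")
```

Under these assumptions the standard error is 46/√25 = 9.2 mg/100 ml, and question (5) comes down to solving 1.96 × 46/√n ≤ 5 for n and rounding up.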