Sampling Distributions and the Central Limit Theorem

The BIG PICTURE of statistics is to make inferences about an UNKNOWN population using SAMPLE information. For example, we use the sample mean as an estimate of the population mean. This chapter tells us how well a sample statistic such as the sample mean performs when it is used to estimate the unknown population mean.

Recall the difference between a ‘statistic’ and a ‘parameter’:
Population parameter: mean \(\mu\), variance \(\sigma^2\), standard deviation \(\sigma\)
Sample statistic: mean \(\bar{X}\), variance \(s^2\), standard deviation \(s\)

Population parameters do not change, since they describe the entire population. Sample statistics vary from sample to sample; therefore, a sample statistic such as the sample mean is a random variable. For each sample we can compute a sample mean, which will differ from sample to sample, and we can study the distribution of these sample means to see how they behave. To characterize the behavior of sample means, we need to study the distribution of all possible sample means.

Sampling distribution of the sample mean

In a real-world situation, the population is often not available. All we can do is use sample information to estimate or predict the population characteristics. How do we know whether our estimate or prediction is a ‘good’ one?

Example: To estimate the average weekly grocery spending per family in a city, a random sample of 25 families is surveyed. The sample average is $80 and the s.d. is $30. Is $80 a ‘good’ estimate of the average grocery spending per family in the city? What if we take another random sample of 25 families and obtain an average of $90? Which one is better?

Q: How do we decide on a good way to estimate the average family grocery spending?

The Idea:
• Study the behavior of all possible sample means, each computed from the spending of 25 families.
• Then use the pattern in the general behavior of sample means to figure out how much confidence we have when we make our estimate or prediction.

In statistics, the behavior of all possible sample means is described by the distribution of the sample mean. Based on this distribution, when we take one sample and obtain only one sample mean, we can tell how close the observed sample mean is likely to be to the unknown population mean. So the first thing is to learn the distributional behavior of all possible sample means. This is the sampling distribution of the sample mean.

The distributional behavior of sample means is characterized by four questions:
1. How do we determine the distribution of the sample mean?
2. What is the center of the distribution?
3. What is the variation of the distribution?
4. What is the shape of the distribution?

The sampling distribution of the sample mean is the probability distribution of all possible sample means, where each sample mean is obtained from a random sample of n observations drawn from a population with mean \(\mu\) and standard deviation \(\sigma\).

NOTE: The distribution of the sample mean depends on (a) the population from which we draw the sample and (b) the sample size, n.

How do we determine the sampling distribution of the sample mean?

[Figure: a population of individual SAT scores; six random samples of 8 SAT scores each are drawn from the entire population, giving sample means \(\bar{x}_1, \dots, \bar{x}_6\), where each \(\bar{x} = \sum x_i / 8\).]

In this example you see only six samples and six sample means, which is not enough to show the distribution of sample means. If we continue the same process and obtain, say, 1000 sample means, we can construct a histogram of these sample means. This histogram shows the distribution of the sample mean.
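A computer simulation can stand in for the "1000 sample means" thought experiment. The Python sketch below is illustrative only, under stated assumptions: the population is an assumed right-skewed gamma distribution with shape 2.25 and scale 3000/2.25, chosen so that \(\mu\) = 3000 and \(\sigma\) = 2000 (the values used in the bank-savings examples later in these notes; the gamma shape itself is our choice, not the notes'), and the function names are hypothetical. For each sample size it draws 5000 random samples, computes each sample mean, and reports the center, the variation, and the shape (measured by skewness) of those sample means.

```python
import math
import random
import statistics

# ASSUMED population (illustration only): right-skewed gamma distribution
# with shape 2.25 and scale 3000 / 2.25, so mean = 3000 and s.d. = 2000.
SHAPE, SCALE = 2.25, 3000 / 2.25
MU = SHAPE * SCALE                # gamma mean = shape * scale
SIGMA = math.sqrt(SHAPE) * SCALE  # gamma s.d. = sqrt(shape) * scale


def sample_means(n, num_samples, rng):
    """Return num_samples sample means, each from a random sample of size n."""
    return [
        statistics.fmean(rng.gammavariate(SHAPE, SCALE) for _ in range(n))
        for _ in range(num_samples)
    ]


def skewness(values):
    """Moment skewness: the mean cubed z-score (near 0 for a symmetric shape)."""
    center = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return statistics.fmean(((v - center) / spread) ** 3 for v in values)


rng = random.Random(12345)  # fixed seed so the run is reproducible
results = {}
for n in (1, 10, 50):
    means = sample_means(n, num_samples=5000, rng=rng)
    # Store (center, spread, skewness) of the simulated sample means.
    results[n] = (statistics.fmean(means), statistics.pstdev(means), skewness(means))
    center, spread, skew = results[n]
    print(
        f"n={n:>2}: center={center:7.1f} (mu={MU:.0f}), "
        f"spread={spread:6.1f} (sigma/sqrt(n)={SIGMA / math.sqrt(n):6.1f}), "
        f"skewness={skew:5.2f}"
    )
```

When run, the center should sit near 3000 for every n, the spread should track \(\sigma/\sqrt{n}\) (about 632 for n = 10 and about 283 for n = 50), and the skewness should shrink as n grows; the exact digits depend on the random seed.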
OUR GOAL is to describe the distribution of all possible sample means.

A graphical illustration of the population distribution and the distribution of the sample mean

Figure A represents the weights of a sample of 26 pebbles, each weighed to the nearest gram. Figure B represents the mean weights of random samples of 3 pebbles each, with the mean weights rounded to the nearest gram. One value is circled in each distribution. Is there a difference between what is represented by the X circled in A and the X circled in B? Please select the best answer from the list below.
a) No, in both Figure A and Figure B, the X represents one pebble that weighs 6 grams.
b) Yes, Figure A has a larger range of values than Figure B.
c) Yes, the X in Figure A is the weight of a single pebble, while the X in Figure B represents the average weight of 3 pebbles.

Dot plot (A): each dot represents the weight of an individual pebble. This is the distribution of the population.
Dot plot (B): each dot represents the AVERAGE weight of THREE pebbles in a sample. This is the sampling distribution of the sample mean, \(\bar{X}\).

Must-know facts about the sampling distribution of \(\bar{X}\)

Suppose a random sample of size n is drawn from a population with mean \(\mu\) and s.d. \(\sigma\). Then we can describe the distribution of the sample mean in the following two situations:
(A) If the population from which we draw our sample is normal: \(\bar{X}\) is normal with mean \(\mu\) and s.d. \(\sigma/\sqrt{n}\).
(B) If the population from which we draw our sample is not normal:
(B-1) When the sample size n is small (n < 30): \(\bar{X}\) has a distribution shape similar to that of the population, with mean \(\mu\) and s.d. \(\sigma/\sqrt{n}\).
(B-2) When n is large (n ≥ 30): regardless of the distribution of the original population, \(\bar{X}\) is approximately normal with mean \(\mu\) and s.d. \(\sigma/\sqrt{n}\).
[Fact (B-2) is called the Central Limit Theorem.]

The Distribution of the Sample Mean When the Population Is NOT Normal [FACT B-2: Central Limit Theorem] [Similar exam questions]

Take a random sample of n observations from a population that is NOT normal. Then:
(1) The center (mean) of the sample means equals the center (mean) of the population: \(\mu_{\bar{X}} = \mu\).
(2) The spread (s.d.) of the sample means equals the spread (s.d.) of the population divided by \(\sqrt{n}\): \(\text{s.d.}(\bar{X}) = \sigma_{\bar{X}} = \sigma/\sqrt{n} = \text{s.d.(population)}/\sqrt{n}\).
(3) Since the population is not normal (it could be skewed to the right, skewed to the left, or have some other shape), the shape of the distribution of the sample mean depends on the sample size n: the larger n is, the closer the shape is to normal. This is the so-called Central Limit Theorem. A general guideline is that when n ≥ 30, we say the sampling distribution is approximately normal.

[Figure: a population skewed to the right, with mean \(\mu\) and s.d. \(\sigma\), next to the distribution of sample means, which is still skewed but less so than the population, with mean \(\mu\) and s.d. \(\sigma/\sqrt{n}\).]

FACT B-2: The Central Limit Theorem
If the population from which the samples are drawn is NOT normal, the shape of the sampling distribution of the sample mean behaves as follows:
(a) If the sample size n is small, the shape of the distribution of the sample mean is similar to the shape of the population distribution.
(b) If the sample size n is large, the shape of the distribution of the sample mean is closer to normal. In general, when n is 30 or more, the distribution of the sample mean is approximately NORMAL, regardless of the shape of the population distribution.
\(\bar{X}\) is approximately normal with mean \(\mu_{\bar{X}} = \mu\) (the population mean) and s.d. \(\sigma_{\bar{X}} = \sigma/\sqrt{n}\) (the population s.d. divided by \(\sqrt{n}\)).

Example: Sampling Distribution of the Sample Mean [Similar exam problems]
1. Suppose we draw a random sample of size n = 10 from bank accounts in a large city.
We are interested in the average amount of savings per 10 accounts. Individual savings do not follow a normal curve; in fact, the distribution of individual savings is very skewed to the right. Suppose we know the population average saving is \(\mu\) = $3000 and \(\sigma\) = $2000.
Q: What would be the distribution of sample means, each the average of 10 accounts drawn from this population?
Answer: For sample means, each the average of 10 account savings drawn from this very skewed population:
• The shape of the distribution of sample means is still skewed, but less skewed than the distribution of individual account savings. (This is FACT B-1.)
• The mean of the distribution of sample means is \(\mu_{\bar{X}} = \$3000\), and the standard deviation is \(\sigma_{\bar{X}} = \sigma/\sqrt{n} = 2000/\sqrt{10} \approx \$632.46\).

Example: Sampling Distribution of the Sample Mean [Similar exam problems]
2. Suppose we draw a random sample of size n = 50 from bank accounts in a large city. We are interested in the average amount of savings per 50 accounts. Individual savings do not follow a normal curve; in fact, the distribution of individual savings is very skewed to the right. Suppose we know the population average saving is \(\mu\) = $3000 and \(\sigma\) = $2000.
Question: What would be the distribution of sample means, each the average of 50 accounts drawn from this population?
Answer: For sample means, each the average of 50 account savings drawn from this very skewed population:
• The shape of the distribution of sample means is approximately normal. (This is the Central Limit Theorem, FACT B-2.)
• The mean of the distribution of sample means is \(\mu_{\bar{X}} = \$3000\), and the standard deviation is \(\sigma_{\bar{X}} = \sigma/\sqrt{n} = 2000/\sqrt{50} \approx \$282.84\).

Some Important Points Related to the Sampling Distribution of the Sample Mean
• The difference between the distribution of the sample mean and the original population distribution is that the variation of the sample mean gets smaller as the sample size gets larger: \(\text{s.d.}(\bar{X}) = \sigma_{\bar{X}} = \sigma/\sqrt{n} = \text{s.d.(population)}/\sqrt{n}\).
• The formula \(\sigma_{\bar{X}} = \sigma/\sqrt{n}\) tells us that sample means will be closer to the population mean when the sample size is larger.
• Applying the empirical rule to the distribution of the sample mean (when that distribution is approximately normal) tells us that about 68% of sample means will be within one \(\sigma/\sqrt{n}\) of the population mean \(\mu\), and about 95% of sample means will be within two \(\sigma/\sqrt{n}\) of \(\mu\). This works like magic: it tells us that one unit of error when using the sample mean to estimate the population mean is \(\sigma/\sqrt{n}\).
• As you can see, when the sample size is large, this error becomes smaller.

Examples: Calculate probabilities based on the sampling distribution of the sample mean [Similar exam questions]
A random sample of size n = 25 is chosen from a normal population with known mean \(\mu = 8\) and s.d. \(\sigma = 4\).
(a) Determine the sampling distribution of the sample mean.
(b) Determine the probability that the sample mean is less than 7.
(c) Determine the probability that the sample mean is between 7 and 9.
(d) What is the 75th percentile of the sample mean?

Answer to Q(a): Since the population is normal, \(\bar{X}\) is normal with mean 8 and s.d. \(\sigma/\sqrt{n} = 4/\sqrt{25} = 0.8\); that is, \(\bar{X} \sim N(8, 0.8)\), where the second number is the s.d.
Answer to Q(b): From Q(a), \(\bar{X} \sim N(8, 0.8)\). Q(b) asks for \(P(\bar{X} < 7)\). Using mean 8 and s.d. 0.8 with a TI calculator, or the table with \(z = (7 - 8)/0.8 = -1.25\), the answer is 0.10565.
Answer to Q(c): From Q(a), \(\bar{X} \sim N(8, 0.8)\). Q(c) asks for \(P(7 < \bar{X} < 9)\).
Using mean 8 and s.d. 0.8 with a TI calculator or the table, the answer is 0.7887.
Answer to Q(d): From Q(a), \(\bar{X} \sim N(8, 0.8)\). Q(d) asks for a value c of the sample mean such that \(P(\bar{X} < c) = 0.75\). Using mean 8 and s.d. 0.8 with a TI calculator or the table, the 75th percentile is 8.5396.

Exercises for Sampling Distributions [Similar exam problems]
1. In a marketing study of gas prices for a state, a random sample of 16 prices will be observed. Suppose the individual prices follow a normal distribution with mean $1.45 and standard deviation $0.20.
(a) What will be the distribution of the sample mean for samples of size n = 16?
(b) Suppose you observe 16 prices from a mid-size city and the average of these 16 prices is $1.38. What is the chance that the average of 16 prices is lower than $1.38?
(c) The city manager claims that the average price of 16 stations, $1.38, is extremely low compared with all other averages, each from 16 prices. Is this claim correct?
(d) Can you find the 40th percentile of the average price of 16 prices?
2. In a household income survey for a state, a random sample of 64 households will be observed. We do not know the distribution of individual household incomes, but we do know the overall average household income, \(\mu\) = $45,000, and the s.d., \(\sigma\) = $16,000.
(a) Based on this information, what can you say about the distribution of the sample means, each from 64 household incomes?
(b) Is an average household income of $52,000 from 64 households an indication of an unusually high average?
(c) Find the 95th percentile of average household incomes from 64 households.
3. Suppose that the mean time for an oil change at a “10-minute oil change joint” is 11.4 minutes with a standard deviation of 3.2 minutes.
(a) If a random sample of n = 35 oil changes is selected, describe the sampling distribution of the sample mean.
(b) If a random sample of n = 35 oil changes is selected, what is the probability that the mean oil change time is less than 11 minutes?
(c) If a random sample of n = 50 oil changes is selected, what is the probability that the mean oil change time is less than 11 minutes?
(d) What effect did increasing the sample size have on the probability?
4. In a marketing study of gas prices for a state, a random sample of 16 prices will be observed. Suppose the individual prices follow a normal distribution with mean $1.45 and standard deviation $0.20.
(a) What will be the distribution of the sample mean, where each sample is a random sample of n = 16 prices?
(b) Suppose you observe 16 prices from a mid-size city and the average of these 16 prices is $1.38. What is the chance that the average of 16 prices is lower than $1.38?
(c) The city manager claims that the average price of 16 stations, $1.38, is extremely low compared with all other averages, each from 16 prices. Is this claim correct?
(d) Can you find the 40th percentile of the average price of 16 prices?
5. A random sample of n = 64 observations is to be selected. Determine whether each of the following statements is correct:
• The sampling distribution of the sample mean in this case is the histogram of the 64 observations that are to be collected.
• The average of all possible sample means must equal the true population mean, that is, \(E(\bar{X}) = \mu\). (The center of the distribution of \(\bar{X}\) is the population mean \(\mu\); this property means the sample mean is UNBIASED.)
• Since each sample mean is an average of 64 observations, different samples will result in different sample averages. Therefore, there will be variation among sample means.
• The standard deviation of the sample mean, \(\sigma_{\bar{X}}\), is less than \(\sigma\), the population standard deviation.
• The shape of the sampling distribution of the sample mean cannot be close to normal because the shape of the original population distribution is not known.
• The shape of the sampling distribution of the sample mean will be close to normal because the sample size is large.
• The Central Limit Theorem says: when the population is normal, the shape of the sampling distribution of the sample mean is close to normal, regardless of the size of the sample.
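The TI-calculator steps in the worked example (n = 25 from a normal population with \(\mu\) = 8 and \(\sigma\) = 4) and the standard errors in the bank-account examples can be checked with Python's standard-library statistics.NormalDist, which here plays the role of the calculator's normal functions. This is a minimal sketch, not the method prescribed in these notes; the variable names are our own.

```python
import math
from statistics import NormalDist

# Worked example: n = 25 from a normal population with mu = 8, sigma = 4.
mu, sigma, n = 8, 4, 25
se = sigma / math.sqrt(n)              # s.d. of the sample mean: 4 / 5 = 0.8
xbar = NormalDist(mu, se)              # sampling distribution of Xbar: N(8, 0.8)

p_less_7 = xbar.cdf(7)                 # (b) P(Xbar < 7)
p_7_to_9 = xbar.cdf(9) - xbar.cdf(7)   # (c) P(7 < Xbar < 9)
p75 = xbar.inv_cdf(0.75)               # (d) 75th percentile of Xbar

print(f"s.d. of sample mean = {se:.1f}")
print(f"P(Xbar < 7)     = {p_less_7:.5f}")
print(f"P(7 < Xbar < 9) = {p_7_to_9:.4f}")
print(f"75th percentile = {p75:.4f}")

# Bank-account examples: population s.d. 2000, so s.d. of Xbar = 2000 / sqrt(n).
for n_accounts in (10, 50):
    print(f"n={n_accounts}: s.d. of sample mean = ${2000 / math.sqrt(n_accounts):.2f}")
```

Running it prints 0.10565, 0.7887 and 8.5396, matching the answers to Q(b) through Q(d), and $632.46 and $282.84 for the two bank-account examples. The same two calls, cdf for probabilities and inv_cdf for percentiles, apply to the exercises once \(\mu\) and \(\sigma/\sqrt{n}\) have been worked out.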