+ Chapter 7: Sampling Distributions
Section 7.3 Sample Means
The Practice of Statistics, 4th edition – For AP*
STARNES, YATES, MOORE

+ Section 7.3 Sample Means
Learning Objectives
After this section, you should be able to…
FIND the mean and standard deviation of the sampling distribution of a sample mean
CALCULATE probabilities involving a sample mean when the population distribution is Normal
EXPLAIN how the shape of the sampling distribution of sample means is related to the shape of the population distribution
APPLY the central limit theorem to help find probabilities involving a sample mean

+ Sample Means
We have seen how sample proportions arise most often when we are interested in categorical variables. We might be interested in finding the proportion of males or females, etc.
When we record quantitative variables, we are interested in other statistics, such as the median, mean, or standard deviation of the variable. Sample means are among the most common statistics. Like any statistic computed from a random sample, a sample mean also has a sampling distribution.

+ Sampling Distribution of $\bar{x}$
As we saw in Section 7.1, when we choose many SRSs from a population, the sampling distribution of the sample mean is centered at the population mean $\mu$ and is less spread out than the population distribution. Here are the facts.

Mean and Standard Deviation of the Sampling Distribution of Sample Means
Suppose that $\bar{x}$ is the mean of an SRS of size $n$ drawn from a large population with mean $\mu$ and standard deviation $\sigma$. Then:
The mean of the sampling distribution of $\bar{x}$ is $\mu_{\bar{x}} = \mu$.
The standard deviation of the sampling distribution of $\bar{x}$ is $\sigma_{\bar{x}} = \sigma / \sqrt{n}$, as long as the 10% condition is satisfied: $n \le \frac{1}{10} N$.
Note: These facts about the mean and standard deviation of $\bar{x}$ are true no matter what shape the population distribution has.

+ Sampling from a Normal Population
We have described the mean and standard deviation of the sampling distribution of the sample mean $\bar{x}$, but not its shape. That's because the shape of the distribution of $\bar{x}$ depends on the shape of the population distribution.
In one important case, there is a simple relationship between the two distributions. If the population distribution is Normal, then so is the sampling distribution of $\bar{x}$. This is true no matter what the sample size is.

Sampling Distribution of a Sample Mean from a Normal Population
Suppose that a population is Normally distributed with mean $\mu$ and standard deviation $\sigma$. Then the sampling distribution of $\bar{x}$ has the Normal distribution with mean $\mu$ and standard deviation $\sigma / \sqrt{n}$, provided that the 10% condition is met.

Example: Young Women's Heights
The height of young women follows a Normal distribution with mean $\mu = 64.5$ inches and standard deviation $\sigma = 2.5$ inches.

Find the probability that a randomly selected young woman is taller than 66.5 inches.
Let $X$ = the height of a randomly selected young woman. $X$ is $N(64.5, 2.5)$.
$z = \frac{66.5 - 64.5}{2.5} = 0.80$
$P(X > 66.5) = P(Z > 0.80) = 1 - 0.7881 = 0.2119$
The probability of choosing a young woman at random whose height exceeds 66.5 inches is about 0.21.

Find the probability that the mean height of an SRS of 10 young women exceeds 66.5 inches.
For an SRS of 10 young women, the sampling distribution of their sample mean height will have mean and standard deviation
$\mu_{\bar{x}} = \mu = 64.5$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2.5}{\sqrt{10}} \approx 0.79$
Since the population distribution is Normal, the sampling distribution will follow an $N(64.5, 0.79)$ distribution.
$z = \frac{66.5 - 64.5}{0.79} = 2.53$
$P(\bar{x} > 66.5) = P(Z > 2.53) = 1 - 0.9943 = 0.0057$
It is very unlikely (less than a 1% chance) that we would choose an SRS of 10 young women whose average height exceeds 66.5 inches.

+ Simulating the Sampling Distribution of a Mean
Most population distributions are not Normal. What is the shape of the sampling distribution of sample means when the population distribution isn't Normal?
We can use simulation to get a sense of what the sampling distribution of the sample mean might look like…

Means – The “Average” of One Die
Let's start with a simulation of 10,000 tosses of a die.
[Figure: histogram of the results of 10,000 tosses of one die]
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley

Means – Averaging More Dice
Looking at the average of two dice after a simulation of 10,000 tosses:
[Figure: histogram of the average of two dice]
The average of three dice after a simulation of 10,000 tosses looks like:
[Figure: histogram of the average of three dice]

Means – Averaging Still More Dice
The average of 5 dice after a simulation of 10,000 tosses looks like:
[Figure: histogram of the average of 5 dice]
The average of 20 dice after a simulation of 10,000 tosses looks like:
[Figure: histogram of the average of 20 dice]

Means – What the Simulations Show
As the sample size (number of dice) gets larger, each sample average is more likely to be close to the population mean, so we see the distribution continuing to tighten around 3.5.
And it probably does not shock you that the sampling distribution of a mean becomes approximately Normal.

The Fundamental Theorem of Statistics
The sampling distribution of any mean becomes more nearly Normal as the sample size grows.
All we need is for the observations to be independent and collected with randomization.
We don't even care about the shape of the population distribution!
This fact is the Fundamental Theorem of Statistics, known as the Central Limit Theorem (CLT).

The Central Limit Theorem
Not only does the distribution of the sample means get closer and closer to the Normal model as the sample size grows, but this is true regardless of the shape of the population distribution.
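The dice simulations described above can be reproduced with a short program. The following is a minimal sketch, not the code used to make the slides: the function name `simulate_dice_means`, the fixed seed, and the choice of Python's standard library are all assumptions made for illustration. It draws 10,000 samples for each number of dice and compares the simulated center and spread with the values $\mu = 3.5$ and $\sigma / \sqrt{n}$.

```python
import math
import random
import statistics

# Population facts for one fair die: mean 3.5, standard deviation sqrt(35/12).
DIE_MEAN = 3.5
DIE_SD = math.sqrt(35 / 12)


def simulate_dice_means(num_dice, num_trials=10_000, seed=0):
    """Return num_trials sample means, each the average of num_dice fair dice.

    The seed is fixed only so that repeated runs give the same numbers.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(num_trials):
        total = sum(rng.randint(1, 6) for _ in range(num_dice))
        means.append(total / num_dice)
    return means


if __name__ == "__main__":
    # Same sample sizes as the slides: 1, 2, 3, 5, and 20 dice.
    for num_dice in (1, 2, 3, 5, 20):
        means = simulate_dice_means(num_dice)
        simulated_mean = statistics.fmean(means)
        simulated_sd = statistics.stdev(means)
        predicted_sd = DIE_SD / math.sqrt(num_dice)
        print(
            f"n={num_dice:2d}  mean={simulated_mean:.3f}  "
            f"sd={simulated_sd:.3f}  sigma/sqrt(n)={predicted_sd:.3f}"
        )
```

In each row the simulated mean should sit near 3.5 and the simulated standard deviation near $\sigma / \sqrt{n}$, which is the tightening seen in the histograms. Plotting a histogram of each list of means would show the shape moving toward a Normal curve.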
The CLT works better (and faster) the closer the population model is to a Normal itself. It also works better for larger samples.

Draw an SRS of size $n$ from any population with mean $\mu$ and finite standard deviation $\sigma$. The central limit theorem (CLT) says that when $n$ is large, the sampling distribution of the sample mean $\bar{x}$ is approximately Normal.
Note: How large a sample size $n$ is needed for the sampling distribution to be close to Normal depends on the shape of the population distribution. More observations are required if the population distribution is far from Normal.

The CLT is surprising and a bit weird. Something so powerful bears repeating, as it is what we know as the Fundamental Theorem of Statistics:
The Central Limit Theorem (CLT)
The mean of a random sample has a sampling distribution whose shape can be approximated by a Normal model. The larger the sample, the better the approximation will be.

Assumptions and Conditions
The CLT requires remarkably few assumptions, so there are only a few conditions to check:
1. Random Sampling Condition: The data values must be sampled randomly, or the concept of a sampling distribution makes no sense.
2. Independence Assumption: The sample values must be mutually independent. (When the sample is drawn without replacement, check the 10% condition…)
3. Large Enough Sample Condition: There is no one-size-fits-all rule, although you can be pretty sure about using a Normal model if the sample size is at least 30.

Example: Servicing Air Conditioners
Your company will service an SRS of 70 air conditioners. You have budgeted 1.1 hours per unit. Will this be enough?
Based on service records from the past year, the time (in hours) that a technician requires to complete preventative maintenance on an air conditioner follows a distribution that is strongly right-skewed and whose most likely outcomes are close to 0. The mean time is $\mu = 1$ hour and the standard deviation is $\sigma = 1$ hour.
Since the 10% condition is met (there are more than 10(70) = 700 air conditioners in the population), the sampling distribution of the mean time spent working on the 70 units has
$\mu_{\bar{x}} = \mu = 1$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1}{\sqrt{70}} \approx 0.12$
The sampling distribution of the mean time spent working is approximately $N(1, 0.12)$, since $n = 70 \ge 30$.
We need to find P(mean time > 1.1 hours):
$z = \frac{1.1 - 1}{0.12} = 0.83$
$P(\bar{x} > 1.1) = P(Z > 0.83) = 1 - 0.7967 = 0.2033$
If you budget 1.1 hours per unit, there is about a 20% chance the technicians will not complete the work within the budgeted time.

Sampling Distribution Models
Always remember that the statistic itself is a random quantity. We can't know what our statistic will be because it comes from a random sample.
Fortunately, for the mean and the proportion, the CLT tells us that we can model their sampling distributions directly with a Normal model.

Sampling Distribution Models (cont.)
There are two basic truths about sampling distributions:
1. Sampling distributions arise because samples vary. Each random sample will have different cases and, so, a different value of the statistic.
2. Although we can always simulate a sampling distribution, the Central Limit Theorem saves us the trouble for means and proportions.

What Can Go Wrong?
Don't confuse the sampling distribution with the distribution of the sample.
When you take a sample, you look at the distribution of the values, usually with a histogram, and you may calculate summary statistics.
The sampling distribution is an imaginary collection of the values that a statistic might have taken for all random samples: the one you got and the ones you didn't get.

What Can Go Wrong? (cont.)
Beware of observations that are not independent. The CLT depends crucially on the assumption of independence. You can't check this with your data; you have to think about how the data were gathered.
Watch out for small samples from skewed populations. The more skewed the distribution, the larger the sample size we need for the CLT to work.

What have we learned?
Sample proportions and means will vary from sample to sample. That's sampling error (sampling variability).
Sampling variability may be unavoidable, but it is also predictable!

What have we learned? (cont.)
We've learned to describe the behavior of sample proportions when our sample is random and large enough to expect at least 10 successes and 10 failures.
We've also learned to describe the behavior of sample means (thanks to the CLT!) when our sample is random (and larger if our data come from a population that's not roughly unimodal and symmetric).

+ Section 7.3 Sample Means
Summary
In this section, we learned that…
When we want information about the population mean $\mu$ for some variable, we often take an SRS and use the sample mean $\bar{x}$ to estimate the unknown parameter $\mu$. The sampling distribution of $\bar{x}$ describes how the statistic varies in all possible samples of the same size from the population.
The mean of the sampling distribution is $\mu$, so that $\bar{x}$ is an unbiased estimator of $\mu$.
The standard deviation of the sampling distribution of $\bar{x}$ is $\sigma / \sqrt{n}$ for an SRS of size $n$ if the population has standard deviation $\sigma$. This formula can be used if the population is at least 10 times as large as the sample (10% condition).
Choose an SRS of size $n$ from a population with mean $\mu$ and standard deviation $\sigma$. If the population is Normal, then so is the sampling distribution of the sample mean $\bar{x}$. If the population distribution is not Normal, the central limit theorem (CLT) states that when $n$ is large, the sampling distribution of $\bar{x}$ is approximately Normal.
We can use a Normal distribution to calculate approximate probabilities for events involving $\bar{x}$ whenever the Normal condition is met:
If the population distribution is Normal, so is the sampling distribution of $\bar{x}$.
If $n \ge 30$, the CLT tells us that the sampling distribution of $\bar{x}$ will be approximately Normal in most cases.
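
The two worked examples in this section (young women's heights and air-conditioner service times) follow the same recipe: compute $\sigma_{\bar{x}} = \sigma / \sqrt{n}$, then find a Normal tail probability. Here is a minimal sketch of that recipe, assuming Python's standard-library `statistics.NormalDist` in place of a printed z-table. The function name `prob_sample_mean_exceeds` is hypothetical, and the function does not check the random, 10%, or Normal conditions; those still have to be verified by hand.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_exceeds(threshold, mu, sigma, n):
    """Approximate P(x-bar > threshold) for an SRS of size n.

    Uses the Normal model N(mu, sigma / sqrt(n)) for the sample mean.
    This is only appropriate when the population is Normal or n is
    large enough for the CLT to apply (assumed here, not checked).
    """
    sd_xbar = sigma / sqrt(n)  # standard deviation of the sample mean
    return 1 - NormalDist(mu, sd_xbar).cdf(threshold)


if __name__ == "__main__":
    # One randomly selected young woman (n = 1): about 0.21.
    print(prob_sample_mean_exceeds(66.5, mu=64.5, sigma=2.5, n=1))
    # Mean height of an SRS of 10 young women: less than 0.01.
    print(prob_sample_mean_exceeds(66.5, mu=64.5, sigma=2.5, n=10))
    # Mean service time for an SRS of 70 air conditioners: about 0.20.
    print(prob_sample_mean_exceeds(1.1, mu=1.0, sigma=1.0, n=70))
```

Because the sketch keeps $\sigma_{\bar{x}}$ unrounded, its answers can differ slightly in the third decimal place from the table-based values in the examples, which round $\sigma_{\bar{x}}$ and $z$ to two decimals first.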