Chapter 7: Sampling Distributions
Name:

M&M Activity: We want to figure out what proportion of M&M’s are green. We can’t possibly count every green M&M in the world, so we are going to take samples instead and see what happens.

Open a pack of M&M’s. Do not eat them yet. Take a sample of size 20, find the proportion of green M&M’s, and write it here. Put the M&M’s back and repeat. Do three trials total.

Graph and describe the sampling distribution of proportions:

Notation:
p: the true population proportion of green M&M’s
p̂: the sample proportion of green M&M’s

Other Definitions:
Sampling Error/Sampling Variability: the natural variability we can expect from one sample to another.

What’s the point of all this? Different random samples give different values for a statistic. The model of a sampling distribution shows the behavior of the statistic over all the possible samples of the same size n. If certain assumptions and conditions are met, they let us describe the shape, center, and spread of certain sampling distributions.

Provided that the sampled values are independent and the sample size is large enough, the sampling distribution of p̂ follows a Normal model with $\mu(\hat{p}) = p$ and $SD(\hat{p}) = \sqrt{\frac{pq}{n}}$, where $q = 1 - p$.

In other words, “Sampling models are what makes statistics work. They inform us about the amount of variation we should expect when we sample.” (Stats: Modeling the World, page 414)

Assumptions/Conditions to check before using a Sampling Distribution Model for a Proportion (2 assumptions, 3 conditions):

Assumption 1: The sampled values must be independent of each other.
Condition 1 (Randomization Condition): The sample should be a simple random sample of the population. Sometimes it is difficult or impossible to get an SRS. At the very least, we need to be very confident that the sampling method was not biased and that the sample is representative of the population.
Condition 2 (10% Condition): If sampling is done without replacement, then the sample size, n, must be no larger than 10% of the population.
There are other ways in which samples can fail to be independent, but the only good protection against such failures is to think carefully about possible reasons the data might not be independent. There are no simple conditions that guarantee independence.

Assumption 2: The sample size, n, must be large enough.
Condition 3 (Success/Failure Condition): The sample size must be big enough for both np and nq to be at least 10. In other words, we must expect at least 10 successes and at least 10 failures to have enough data to draw conclusions.

Interesting To Think About: How large does a sample need to be as the proportion changes?

Example 1: Assume that 30% of students at a large university wear contact lenses. We randomly pick 100 students. What is the probability that more than 1/3 of this sample wear contacts?

Check assumptions and conditions!
Independence:
1. Randomization Condition: The problem states that the sample was chosen randomly.
2. 10% Condition: 100 students is most likely less than 10% of the population of a large university, since a large university will most likely have more than 1000 students. Also, it is reasonable to believe that whether or not a student wears contacts is independent of whether other students wear contacts.
Sample Size:
3. Success/Failure Condition:
np ≥ 10: 100(.3) = 30 ≥ 10
nq ≥ 10: 100(1 - .3) = 70 ≥ 10
The sample is large enough.

Since all of the conditions are met, we may assume that this sampling distribution follows a Normal model with a mean of .3 and a standard deviation of $\sqrt{\frac{(.3)(.7)}{100}} = .0458$.

$P(\hat{p} > \tfrac{1}{3})$ = normalcdf(1/3, 99999, .3, .0458) = 23.35%

Example 2: A restaurant anticipates serving about 180 people on a Friday evening and believes that about 20% of the patrons will order the special. How many of those meals should the restaurant plan on serving in order to be pretty sure of having enough ingredients on hand to meet customer demand?

Check assumptions and conditions!
Independence:
1. Randomization Condition: It is reasonable to believe that the 180 customers who come to the restaurant that night are a random sample of all patrons of the restaurant.
2. 10% Condition: 180 customers is most likely less than 10% of the population of all the people who have ever gone to this restaurant, since a restaurant will most likely serve more than 1800 patrons. Also, it is reasonable to believe that whether or not a customer orders the special is independent of whether other customers order the special.
Sample Size:
3. Success/Failure Condition:
np ≥ 10: 180(.2) = 36 ≥ 10
nq ≥ 10: 180(1 - .2) = 144 ≥ 10
The sample is large enough.

Conclusion: Since all of the conditions are met, we may assume that this sampling distribution follows a Normal model with a mean of .2 and a standard deviation of $\sqrt{\frac{(.2)(.8)}{180}} = .0298$.

$P(\hat{p} > ?) \le .05$
z = invnorm(.95) = 1.6449
$1.6449 \le \frac{\hat{p} - .2}{.0298} \Rightarrow \hat{p} \ge .2490$

So there is only about a 5% chance that more than 24.90% of the 180 patrons order the special. In other words, the restaurant should have enough ingredients to serve at least 45 orders of the special (.2490 × 180 ≈ 44.8, rounded up to 45).

What’s the point? As the sample size grows, the sampling distribution of means becomes more and more symmetric and unimodal. More specifically:

The Central Limit Theorem: The mean of a random sample has a sampling distribution whose shape can be approximated by a Normal model. The larger the sample, the better the approximation will be.

Why we need it: To approximate the sampling distribution of means when the population is not Normally distributed.

Assumptions/Conditions to check before using a Sampling Distribution Model for a Mean:

Assumption 1: The sampled values must be independent of each other.
Condition 1 (Randomization Condition): The data values must be sampled randomly, or the concept of a sampling distribution makes no sense.
Condition 2 (10% Condition): If sampling is done without replacement, then the sample size, n, must be no larger than 10% of the population.
There are other ways in which samples can fail to be independent, but the only good protection against such failures is to think carefully about possible reasons the data might not be independent. There are no simple conditions that guarantee independence.

Assumption 2: The sample size, n, must be large enough.
Condition 3: There is no one-size-fits-all rule for the “large enough” condition. If the population is unimodal and symmetric, even a relatively small sample is okay, but a strongly skewed population needs a larger sample. We will not worry about exact numbers at this time.

The Sampling Distribution Model for the Mean: If the above conditions are met, then the sampling distribution model for a mean follows a Normal model with mean $\mu(\bar{y}) = \mu$, equal to the population mean, and standard deviation $\sigma(\bar{y}) = SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$, where σ is the standard deviation of the population.

Example 3: The weight of potato chips in a medium-size bag is stated to be 10 ounces. The amount that the packaging machine puts in these bags is believed to have a Normal distribution with mean 10.2 ounces and standard deviation 0.12 ounces. What is the probability that the mean weight in a 12-bag case is below 10 ounces?

Example 4: Grocery store receipts show that customer purchases have a skewed distribution with a mean of $32 and a standard deviation of $20.
1. Explain why you cannot determine the probability that the next customer will spend at least $40.
2. Can you estimate the probability that the next 10 customers will spend an average of at least $40?
3. Is it likely that the next 50 customers will spend an average of at least $40?
4. Suppose the store had 312 customers today. Estimate the probability that the store’s revenues were at least $10,000.
5. If the store serves 312 customers on a typical day, how much does the store take in on the worst 10% of such days?

Example 5: Although most of us buy milk by the quart or gallon, farmers measure daily production in pounds.
Ayrshire cows average 47 pounds of milk a day, with a standard deviation of 6 pounds. For Jersey cows, the mean daily production is 43 pounds, with a standard deviation of 5 pounds. Assume that Normal models describe milk production for these breeds.
1. We select an Ayrshire at random. What’s the probability that she averages more than 50 pounds of milk a day?
2. What’s the probability that a randomly selected Ayrshire gives more milk than a randomly selected Jersey?
3. A farmer has 20 Jerseys. What’s the probability that the average production for this small herd exceeds 45 pounds of milk a day?
4. A neighboring farmer has 10 Ayrshires. What’s the probability that his herd average is at least 5 pounds higher than the average for the Jersey herd?

Example 6: A champion archer can generally hit the bull’s-eye 80% of the time. Suppose she shoots 200 arrows during competition. What’s the probability she gets at least 85% bull’s-eyes?

Example 7: The ISA Babcock Company supplies poultry farmers with hens, advertising that a mature B300 Layer produces eggs with a mean weight of 60.7 grams. Suppose that egg weights follow a Normal model with standard deviation 3.1 grams. What’s the probability that a dozen randomly selected eggs average more than 62 grams?
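The M&M activity at the start of the chapter (samples of size 20, repeated) can be imitated with a short simulation. This is a sketch: the true green proportion p = 0.2 is a hypothetical value chosen only for illustration, not the real share of green M&M’s. Each draw is treated as independent, and the seed is arbitrary.

The simulation checks that the sample proportions center at p with spread close to $\sqrt{pq/n}$. Those two formulas hold for any n. The Normal shape is a separate matter: here np = 4 < 10, so the Success/Failure Condition fails and a histogram of `p_hats` would look right-skewed.

```python
import random
import statistics
from math import sqrt

def simulate_p_hats(p, n, trials, seed=1):
    """Return `trials` sample proportions, each computed from n independent
    draws in which every draw is 'green' with probability p."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    p_hats = []
    for _ in range(trials):
        greens = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(greens / n)
    return p_hats

# Hypothetical true proportion; NOT the actual share of green M&M's.
p, n = 0.2, 20
p_hats = simulate_p_hats(p, n, trials=10_000)

# Compare the simulated center and spread with mu(p-hat) = p and SD(p-hat) = sqrt(pq/n)
print(f"mean of p-hat: {statistics.mean(p_hats):.4f}   model: {p}")
print(f"SD of p-hat:   {statistics.pstdev(p_hats):.4f}   model: {sqrt(p * (1 - p) / n):.4f}")
```

Raising `n` (for example to 100, where np = 20) is one way to explore the “Interesting To Think About” question: the simulated distribution becomes more symmetric as np and nq grow.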
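The proportion examples (Examples 1, 2, and 6) can be checked with Python’s standard-library `statistics.NormalDist`. Its `cdf` method plays the role of the calculator’s normalcdf, and `inv_cdf` plays the role of invnorm.

This is a sketch, and the helper name `p_hat_model` is hypothetical. It uses the unrounded standard deviation, so answers can differ from hand calculations in the last digit. For Example 6, it assumes the archer’s 200 shots are independent, which the problem does not state.

```python
from math import ceil, sqrt
from statistics import NormalDist

def p_hat_model(p, n):
    """Normal model for the sample proportion: mean p, SD sqrt(pq/n).
    Refuses to build the model if the Success/Failure condition fails."""
    q = 1 - p
    if n * p < 10 or n * q < 10:
        raise ValueError("Success/Failure condition fails: np or nq < 10")
    return NormalDist(mu=p, sigma=sqrt(p * q / n))

# Example 1: P(p-hat > 1/3) with p = .30 and n = 100
ex1 = 1 - p_hat_model(0.30, 100).cdf(1 / 3)

# Example 2: cutoff with P(p-hat > cutoff) = .05, p = .20, n = 180
cutoff = p_hat_model(0.20, 180).inv_cdf(0.95)
orders = ceil(cutoff * 180)  # round up so every order can be filled

# Example 6: P(p-hat >= .85) with p = .80 and n = 200 (shots assumed independent)
ex6 = 1 - p_hat_model(0.80, 200).cdf(0.85)

print(f"Example 1: {ex1:.4f}")
print(f"Example 2: cutoff {cutoff:.4f}, plan on {orders} specials")
print(f"Example 6: {ex6:.4f}")
```

The guard in `p_hat_model` is a design choice: it makes the Success/Failure check impossible to skip. Independence and the 10% Condition still have to be argued in words, as in the worked examples.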
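The sample-mean model $\mu(\bar{y}) = \mu$, $SD(\bar{y}) = \sigma/\sqrt{n}$ can be applied the same way to Examples 3, 4, and 7. This is a sketch, and the helper name `y_bar_model` is hypothetical. Examples 3 and 7 state that the population is Normal. Example 4’s population is skewed, so the sketch assumes n = 50 and n = 312 are large enough for the Central Limit Theorem.

Parts 1 and 2 of Example 4 are conceptual questions about that “large enough” condition, so they are not computed. Part 4 turns total revenue into a mean: at least $10,000 from 312 customers means a mean purchase of at least 10,000/312 dollars.

```python
from math import sqrt
from statistics import NormalDist

def y_bar_model(mu, sigma, n):
    """Normal model for the sample mean: mean mu, SD sigma / sqrt(n).
    Appropriate when the population is Normal, or when n is large
    enough (for the population's skewness) for the CLT to apply."""
    return NormalDist(mu=mu, sigma=sigma / sqrt(n))

# Example 3: mean weight of a 12-bag case is below 10 oz
ex3 = y_bar_model(10.2, 0.12, 12).cdf(10)

# Example 4, part 3: next 50 customers average at least $40
ex4_3 = 1 - y_bar_model(32, 20, 50).cdf(40)

# Example 4, part 4: 312 customers spend at least $10,000 in total,
# i.e. their mean purchase is at least 10000 / 312 dollars
ex4_4 = 1 - y_bar_model(32, 20, 312).cdf(10_000 / 312)

# Example 4, part 5: revenue on the worst 10% of 312-customer days
# = 312 customers times the 10th percentile of the mean purchase
ex4_5 = 312 * y_bar_model(32, 20, 312).inv_cdf(0.10)

# Example 7: a dozen eggs average more than 62 g
ex7 = 1 - y_bar_model(60.7, 3.1, 12).cdf(62)

results = [("Ex 3", ex3), ("Ex 4.3", ex4_3), ("Ex 4.4", ex4_4),
           ("Ex 4.5 ($)", ex4_5), ("Ex 7", ex7)]
for label, value in results:
    print(f"{label}: {value:.4f}")
```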
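Example 5 combines the sample-mean model with differences between breeds. This is a sketch that relies on an assumption the problem leaves implicit: the two cows in part 2, and the two herds in part 4, are independent of each other. Under that assumption the variances add, so $SD(X - Y) = \sqrt{SD(X)^2 + SD(Y)^2}$. In part 4 the “Jersey herd” is taken to be the 20 Jerseys from part 3. The constant names below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Breed models from Example 5 (pounds of milk per day)
AYR_MU, AYR_SD = 47, 6
JER_MU, JER_SD = 43, 5

# 1. One randomly selected Ayrshire averages more than 50 lb
q1 = 1 - NormalDist(AYR_MU, AYR_SD).cdf(50)

# 2. Ayrshire minus Jersey is positive; assumes the two cows are
#    independent, so the variances add
diff = NormalDist(AYR_MU - JER_MU, sqrt(AYR_SD**2 + JER_SD**2))
q2 = 1 - diff.cdf(0)

# 3. Mean of 20 Jerseys exceeds 45 lb: SD of the mean is sigma / sqrt(n)
q3 = 1 - NormalDist(JER_MU, JER_SD / sqrt(20)).cdf(45)

# 4. Mean of 10 Ayrshires minus mean of the 20 Jerseys is at least 5 lb;
#    assumes the two herds are independent, so variances of the means add
herd_diff = NormalDist(AYR_MU - JER_MU,
                       sqrt(AYR_SD**2 / 10 + JER_SD**2 / 20))
q4 = 1 - herd_diff.cdf(5)

print(f"1: {q1:.4f}  2: {q2:.4f}  3: {q3:.4f}  4: {q4:.4f}")
```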