Chapter 15 Sampling Distribution Models
Copyright © 2014, 2012, 2009 Pearson Education, Inc.

Objectives
54. State and apply the conditions and uses of the Central Limit Theorem.
55. Determine the mean and standard deviation (standard error) for a sampling distribution of proportions or means.
56. Apply the sampling distribution of a proportion or a mean to application problems.

15.1 Sampling Distribution of a Proportion

Sample Proportions and Sampling Distributions
The Harris poll found that of 889 U.S. adults, 40% said they believe in ghosts. CBS News found that of 808 U.S. adults, 48% said they believe in ghosts.
• Why are these two sample proportions different?
• What is the true population proportion (of ALL U.S. adults)?
We'll denote the population proportion by p and the sample proportion by p̂.
Consider all possible samples of size 808. If we made a histogram of the number of samples having each value of p̂, what might it look like?

The Central Limit Theorem for Sample Proportions
Rather than showing real repeated samples, imagine what would happen if we were to actually draw many samples and look at their proportions. The histogram we'd get if we could see all the proportions from all possible samples is called the sampling distribution of the proportions.
What would the histogram of all the sample proportions look like?

Sampling About Evolution
According to a Gallup poll, 43% believe in evolution. Assume this is true of all Americans.
• If many surveys were done of 1007 Americans, we could calculate the sample proportion for each.
• The histogram shows the distribution of a simulation of 2000 sample proportions.
• The distribution of all possible sample proportions from samples of the same size is called the sampling distribution.

Sampling Distributions
Sampling Distribution for Proportions
• Symmetric
• Unimodal
• Centered at p
• The sampling distribution follows the Normal model $N\left(p, \sqrt{pq/n}\right)$.
What does the sampling distribution tell us?
• The sampling distribution allows us to make statements about where we think the corresponding population parameter is and how precise these statements are likely to be.

Another way of saying this…
Sample statistics are random variables themselves:
• Sample proportion (for categorical data)
• Sample mean (for quantitative data)
They have a probability distribution, a mean, a standard deviation, etc.

Mean and Standard Deviation
Sampling Distribution for Proportions
• Mean = p
• $\sigma(\hat{p}) = \sqrt{npq}/n = \sqrt{pq/n}$, where $q = 1 - p$
• Model: $N\left(p, \sqrt{pq/n}\right)$

The Normal Model for Evolution
Population: p = 0.43, n = 1007.
Sampling Distribution:
• Mean = 0.43
• Standard deviation = $\sigma(\hat{p}) = \sqrt{(0.43)(0.57)/1007} \approx 0.0156$

Assumptions and Conditions
Most models are useful only when specific assumptions are true. There are two assumptions in the case of the model for the distribution of sample proportions:
1. The Independence Assumption: The sampled values must be independent of each other.
2. The Sample Size Assumption: The sample size, n, must be large enough.

Assumptions and Conditions (cont.)
Assumptions are hard (often impossible) to check. That's why we assume them. Still, we need to check whether the assumptions are reasonable by checking conditions that provide information about the assumptions.
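Returning to the evolution example: the Normal model's standard deviation, and the claim that simulated sample proportions follow it, can both be checked in a few lines of Python. This is a minimal sketch using only the standard library; the function names and the seed are illustrative choices, and each simulated sample is drawn as independent yes/no responses, which is reasonable because 1007 people are far fewer than 10% of all Americans.

```python
import math
import random
import statistics

def sd_phat(p, n):
    """Standard deviation of the sampling distribution of p-hat: sqrt(p*q/n)."""
    return math.sqrt(p * (1 - p) / n)

def simulate_phats(p, n, reps, seed=2014):
    """Draw `reps` samples of size n, each response independently "yes" with
    probability p, and return the sample proportion from each sample."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.43, 1007
print(f"Model: N({p}, {sd_phat(p, n):.4f})")  # Model: N(0.43, 0.0156)

# 2000 simulated polls, like the histogram described on the slide
phats = simulate_phats(p, n, reps=2000)
sim_mean = statistics.fmean(phats)
sim_sd = statistics.stdev(phats)
print(f"Simulated: mean {sim_mean:.4f}, SD {sim_sd:.4f}")
```

The simulated mean should land very close to 0.43 and the simulated standard deviation close to 0.0156, which is what the Normal model predicts without any simulation.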
The corresponding conditions to check before using the Normal model for the distribution of sample proportions are the Randomization Condition, the 10% Condition, and the Success/Failure Condition.

Assumptions and Conditions (cont.)
1. Randomization Condition: The sample should be a simple random sample of the population.
2. 10% Condition: If sampling is done without replacement, the sample size, n, must be no larger than 10% of the population.
3. Success/Failure Condition: The sample size has to be big enough that both np and nq are at least 10.

The Central Limit Theorem for Sample Proportions (cont.)
Because we have a Normal model, we know, for example, that about 95% of Normally distributed values fall within two standard deviations of the mean. So we should not be surprised if 95% of various polls gave results near the mean that varied above and below it by no more than two standard deviations.
• This is what we mean by sampling error. It's not really an error at all, just the variability you'd expect to see from one sample to another.

Solving Sampling Distribution Problems (Proportions)
o First identify which sampling distribution is involved.
o Hint: you must know the underlying population p, and there must be a sample proportion involved.
o The sampling distribution is given by $N\left(p, \sqrt{pq/n}\right)$.
o Check the conditions to be sure the sampling distribution applies.
o Draw a picture of the sampling distribution (Normal curve).
o Find where p̂ falls on this distribution and use normalcdf to find the probability of seeing p̂ or something more extreme (shade from p̂ to the nearest tail).

Practice
12) Public health statistics indicate that 26.4% of American adults smoke cigarettes.
Describe the sampling distribution model for the proportion of smokers among a randomly selected group of 50 adults. What are your assumptions and conditions?
15) Based on past experience, a bank believes that 7% of the people who receive loans will not make payments on time. The bank has recently approved 200 loans.
• What are the mean and standard deviation of the proportion of clients in this group who may not make timely payments?
• What assumptions underlie your model? Are the conditions met?
• What is the probability that over 10% of these clients will not make timely payments?
16) Assume that 30% of students at a university wear contact lenses.
• We randomly pick 100 students. Let p̂ represent the proportion of students who wear contact lenses. What's the appropriate model for the distribution of p̂?
– Specify the name of the distribution, the mean, and the standard deviation.
– Be sure to verify that the conditions are met.
• What's the approximate probability that more than one third of this sample wear contacts?

Enough Lefty Seats?
13% of all people are left-handed.
• A 200-seat auditorium has 15 lefty seats.
• What is the probability that there will not be enough lefty seats for a class of 90 students?
Think →
• Plan: p̂ = 15/90 ≈ 0.167. We want $P(\hat{p} > 0.167)$.
• Model:
Independence Assumption: With respect to handedness, the students are independent.
10% Condition: 90 students are far fewer than 10% of all people.
Success/Failure Condition: np = 90(0.13) = 11.7 ≥ 10 and nq = 90(0.87) = 78.3 ≥ 10.
p = 0.13, n = 90
$SD(\hat{p}) = \sqrt{(0.13)(0.87)/90} \approx 0.035$
The model is N(0.13, 0.035).
Show →
• Plot: [figure: Normal model N(0.13, 0.035) with p̂ = 0.167 (z = 1.06) marked]
• Mechanics: $z = (0.167 - 0.13)/0.035 \approx 1.06$
$P(\hat{p} > 0.167) = P(z > 1.06) \approx 0.1446$
Or: normalcdf(0.167, 1E99, 0.13, 0.035)
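The Show step can be reproduced in Python with the standard library's `statistics.NormalDist`, which plays the role of the calculator's normalcdf in this sketch. It computes the probability twice: once with the rounded inputs used on the slide (0.167 and 0.035), and once carrying full precision. The unrounded version gives z ≈ 1.03 and a probability of about 0.150 rather than about 0.145; either way the answer is roughly a 15% chance.

```python
import math
from statistics import NormalDist

p, n, lefty_seats = 0.13, 90, 15

# Success/Failure Condition uses the model's p: expected counts np and nq.
successes, failures = n * p, n * (1 - p)
print(f"np = {successes:.1f}, nq = {failures:.1f}")  # np = 11.7, nq = 78.3

sd = math.sqrt(p * (1 - p) / n)  # SD(p-hat) for the Normal model
cutoff = lefty_seats / n         # more lefties than seats means p-hat > 15/90

# Slide version: rounded cutoff and SD, like normalcdf(0.167, 1E99, 0.13, 0.035)
prob_rounded = 1 - NormalDist(mu=0.13, sigma=0.035).cdf(0.167)

# Same calculation without rounding the intermediate values
z = (cutoff - p) / sd
prob_exact = 1 - NormalDist(mu=p, sigma=sd).cdf(cutoff)

print(f"rounded inputs:   P = {prob_rounded:.4f}")
print(f"unrounded inputs: z = {z:.2f}, P = {prob_exact:.4f}")
```

Rounding p̂ up to 0.167 and the standard deviation down to 0.035 both push z upward, which is why the rounded answer comes out slightly smaller.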
Tell →
• Conclusion: There is about a 14.5% chance that there will not be enough seats for the left-handed students in the class.

15.3 The Sampling Distribution of Other Statistics

The Sampling Distribution for Others
• There is a sampling distribution for any statistic, but the Normal model may not fit.
• Below are histograms showing the results of simulated sampling distributions.
• The medians seem to be approximately Normal.
• The variances seem somewhat skewed right.
• The minimums are all over the place.
• In this course, we will focus on the proportions and the means.

Sampling Distribution of the Means
• Imagine we roll a number of dice and take the average of the rolls, over and over again.
• For 1 die, the distribution is Uniform.
• For 3 dice, the sampling distribution of the means is closer to Normal.
• For 20 dice, the sampling distribution of the means is very close to Normal, and its standard deviation is much smaller.

15.4 The Central Limit Theorem: The Fundamental Theorem of Statistics

The Central Limit Theorem
• The sampling distribution of any mean becomes nearly Normal as the sample size grows.
Requirements
• Independent observations
• Randomly collected sample
The sampling distribution of the means is close to Normal if either:
• The sample size is large, or
• The population is close to Normal.

Video on the Central Limit Theorem
http://www.nytimes.com/video/science/100000002452709/bunnies-dragons-and-the-normalworld.html?playlistId=100000002438160

How Normal?
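The dice experiment described above is easy to simulate. This Python sketch assumes fair six-sided dice and 10,000 repetitions per case (both illustrative choices). It averages 1, 3, and 20 dice and compares the standard deviation of those averages with σ/√n, where σ = √(35/12) ≈ 1.71 is the standard deviation of a single die. A histogram of each `means` list would show the shape moving from flat toward Normal.

```python
import random
import statistics

def dice_means(k, reps, rng):
    """Roll k fair six-sided dice `reps` times; return the average of each roll."""
    return [statistics.fmean(rng.randint(1, 6) for _ in range(k)) for _ in range(reps)]

rng = random.Random(15)
sigma = statistics.pstdev(range(1, 7))  # SD of a single die, sqrt(35/12)

results = {}
for k in (1, 3, 20):
    means = dice_means(k, 10_000, rng)
    results[k] = statistics.stdev(means)
    print(f"{k:>2} dice: mean of averages {statistics.fmean(means):.3f}, "
          f"SD {results[k]:.3f} vs sigma/sqrt(k) = {sigma / k ** 0.5:.3f}")
```

Every case should center near 3.5, while the standard deviation drops from about 1.71 for one die to about 0.38 for twenty.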
Population Distribution and Sampling Distribution of the Means
Population distribution → Sampling distribution of the means
• Normal → Normal (any sample size)
• Uniform → Normal (large sample size)
• Bimodal → Normal (larger sample size)
• Skewed → Normal (larger sample size)

Standard Deviation of the Means
• Which would be more unusual: a student in the class who is 6'9" tall, or a class with a mean height of 6'9"?
• The sample means have a smaller standard deviation than the individuals.
• The standard deviation of the sample means goes down by the square root of the sample size: $SD(\bar{y}) = \sigma/\sqrt{n}$

The Sampling Distribution Model for a Mean
When a random sample is drawn from a population with mean μ and standard deviation σ, the sampling distribution of the mean has:
• Mean: μ
• Standard deviation: $\sigma/\sqrt{n}$
• For a large sample size, the distribution is approximately Normal regardless of the population the random sample comes from.
• The larger the sample size, the closer to Normal.

Solving Sampling Distribution Problems (Means)

Caution!
Pay attention to how the sampling distribution of means differs depending on the size of the sample. Be careful to distinguish between the underlying distribution of the population (which may or may not be Normal) and the sampling distribution of means (which depends on the sample size n).

38) Statistics indicate that Ithaca, NY gets an average of 35.4" of rain each year, with a standard deviation of 4.2". Assume that a Normal model applies.
– During what percentage of years does Ithaca get more than 40" of rain?
– Less than how much rain falls in the driest 20% of all years?
– A Cornell student is in Ithaca for 4 years.
Let ȳ represent the mean amount of rain for those 4 years. Describe the sampling distribution model of this sample mean ȳ.
– What's the probability that those 4 years average less than 30" of rain?

Too Heavy for the Elevator?
The mean weight of U.S. men is 190 lb, and the standard deviation is 59 lb. An elevator has a weight limit of 10 persons or 2500 lb. Find the probability that 10 men in the elevator will exceed the weight limit.
Think →
• Plan: A total weight for 10 men over 2500 lb is the same as their mean weight being over 250 lb.
• Model:
Independence Assumption: The men are not a random sample, but their weights are probably independent.
Sample Size Condition: Weights are approximately Normal, so a sample of 10 is large enough.
μ = 190, σ = 59
By the CLT, the sampling distribution of ȳ is approximately Normal:
$\mu(\bar{y}) = 190, \quad SD(\bar{y}) = \sigma/\sqrt{n} = 59/\sqrt{10} \approx 18.66$
Show →
• Plot: [figure not recovered]
• Mechanics: $z = (\bar{y} - \mu)/SD(\bar{y}) = (250 - 190)/18.66 \approx 3.21$
$P(\bar{y} > 250) = P(z > 3.21) \approx 0.0007$
Tell →
• Conclusion: There is only about a 0.0007 chance that the 10 men will exceed the elevator's weight limit.

43) The College Board reported the score distribution shown in the table for all students who took the 2006 AP Statistics Exam:
• Find the mean and standard deviation of the scores.
• If we select a random sample of 40 AP students, would we expect their scores to follow a Normal model?
• Consider the mean scores of random samples of 40 AP Statistics students. Describe the sampling model for these means.
• An AP Statistics teacher had 63 students preparing to take the AP exam. He considers his students to be "typical" of all the national students. What's the probability that his students will achieve an average score of at least 3?
Score   Percent of Students
5       12.6
4       22.2
3       25.3
2       18.3
1       21.6

48) The weight of potato chips in a bag is stated to be 10 ounces. The amount that the machine puts in these bags is believed to have a Normal model with mean 10.2 oz and standard deviation 0.12 oz.
• What fraction of all bags are underweight?
• Some of the chips are sold in "bargain packs" of 3 bags. What is the probability that none of the 3 is underweight?
• What's the probability that the mean weight of the 3 bags is below 10 oz?
• What's the probability that the mean weight of a 24-bag case is below 10 oz?

15.5 Sampling Distributions: A Summary

Sample Size and Standard Deviation
• $SD(\bar{y}) = \sigma/\sqrt{n}, \quad SD(\hat{p}) = \sqrt{pq/n}$
• Larger sample size → smaller standard deviation
• Multiply n by 4 → divide the standard deviation by 2.
• Multiply n by 100 → divide the standard deviation by 10.

Billion Dollar Misunderstanding
The Bill and Melinda Gates Foundation found that 12% of the top 50 performing schools were among the smallest 3% of schools. They funded a transformation to small schools.
• Small schools have a smaller n and thus a larger standard deviation of ȳ.
• So we are likely to see both higher and lower means among small schools.
• In fact, 18% of the bottom 50 schools were also among the smallest 3%.

Distribution of the Sample vs. the Sampling Distribution
Don't confuse the distribution of the sample with the sampling distribution.
• If the population's distribution is not Normal, then the sample's distribution will not be Normal, even if the sample size is very large.
• For large sample sizes, the sampling distribution, which is the distribution of all possible sample means from samples of that size, will be approximately Normal.
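The difference between the distribution of a sample and the sampling distribution shows up clearly in a short simulation. This Python sketch assumes an exponential population with mean 1 and standard deviation 1 (a strongly right-skewed stand-in; the slides do not name a population) and uses a simple moment-based skewness, which is 0 for symmetric data. One large sample should keep the population's right skew (near 2), while 2000 means of samples of size 100 should be nearly symmetric (skewness near 2/√100 = 0.2) with a standard deviation near σ/√n = 1/√100 = 0.1.

```python
import random
import statistics

def skewness(xs):
    """Moment-based skewness: 0 for symmetric data, positive when right-skewed."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean((x - m) ** 3 for x in xs) / s ** 3

rng = random.Random(42)

# Distribution of ONE sample: 10,000 draws from the assumed skewed population
one_big_sample = [rng.expovariate(1.0) for _ in range(10_000)]

# Sampling distribution: means of 2,000 separate samples of size 100
sample_means = [statistics.fmean(rng.expovariate(1.0) for _ in range(100))
                for _ in range(2_000)]

print(f"skewness of one sample of 10,000 values: {skewness(one_big_sample):.2f}")
print(f"skewness of 2,000 sample means (n=100):  {skewness(sample_means):.2f}")
print(f"SD of the sample means:                  {statistics.stdev(sample_means):.3f}")
```

Making the single sample bigger does not remove its skew; it only describes the skewed population more precisely.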
Two Truths About Sampling Distributions
• Sampling distributions arise because samples vary. Each random sample will contain different cases and, so, a different value of the statistic.
• Although we can always simulate a sampling distribution, the Central Limit Theorem saves us the trouble for proportions and means. This is especially important when we do not know the population's distribution.

What Can Go Wrong?
• Don't confuse the sampling distribution with the distribution of the sample. A histogram of the data shows the sample's distribution; the sampling distribution is more theoretical.
• Beware of observations that are not independent. The CLT depends on independence, so it can fail for dependent samples. A good survey design can ensure independence.
• Watch out for small samples from skewed or bimodal populations. The CLT requires large samples, a Normal population, or both.
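To see why independence matters, here is a Python sketch of a hypothetical dependent design: each sample of 100 responses is really 50 independent values, each recorded twice (think of two members of a household who always answer alike). The population is assumed Normal with σ = 1, and the helper names are illustrative. Means of independent samples should have a standard deviation close to σ/√n = 0.1, but the paired samples spread out more (close to 1/√50 ≈ 0.141), so the usual σ/√n model understates their variability.

```python
import math
import random
import statistics

def mean_of_independent_sample(n, rng):
    """Mean of n independent draws from an assumed N(0, 1) population."""
    return statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n))

def mean_of_paired_sample(n, rng):
    """Hypothetical dependent design: n/2 independent draws, each recorded twice."""
    half = [rng.gauss(0.0, 1.0) for _ in range(n // 2)]
    return statistics.fmean(half + half)

rng = random.Random(7)
n, reps = 100, 5_000
indep = [mean_of_independent_sample(n, rng) for _ in range(reps)]
paired = [mean_of_paired_sample(n, rng) for _ in range(reps)]

print(f"sigma/sqrt(n)            = {1 / math.sqrt(n):.3f}")
print(f"SD of means, independent = {statistics.stdev(indep):.3f}")
print(f"SD of means, paired      = {statistics.stdev(paired):.3f}")
```

The paired design still has 100 recorded values, but it carries only 50 values' worth of independent information, which is exactly what the σ/√n formula cannot see.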