Survey
Chapter 7 Survey Sampling and Inference
Copyright © 2017, 2014 Pearson Education, Inc.

Slide 1

Chapter 7 Topics
• Discuss survey quality and identify possible sources of bias in surveys
• Use the Central Limit Theorem for Sample Proportions to construct confidence intervals

Slide 2

Section 7.1
anaken2012. Shutterstock
LEARNING ABOUT THE WORLD THROUGH SURVEYS
• Distinguish between Populations and Samples, Parameters and Statistics
• Identify Possible Sources of Bias in Surveys
• Identify the features of a Simple Random Sample

Slide 3

Populations and Parameters
A population is a group of objects or people we wish to study.
A parameter is a numerical value that characterizes some aspect of the population.

Slide 4

Populations and Parameters: Example
Data is gathered on the heights of all NBA basketball players. The mean of the data is calculated.
The population is all NBA basketball players.
The parameter is the mean height of all NBA basketball players.

Slide 5

Measuring Populations: Census
A census is a survey that measures every member of a population.
Most populations are too large for a census, so we study populations by measuring samples instead.

Slide 6

Samples and Statistics
A sample is a collection of objects or people taken from the population of interest.
A statistic is a numerical characteristic of a sample of data.
Because a statistic is used to estimate the value of a population characteristic, it is sometimes called an estimator.
Slide 7

Samples and Statistics: Example
A researcher is interested in studying the heights of American women. She takes a random sample of 1500 American women and finds that the mean height of the sample is 63.6 inches.
The sample is the 1500 American women.
The statistic is the sample mean of 63.6 inches.
The researcher may estimate that the average American woman is 63.6 inches tall.

Slide 8

Statistical Inference
• Statistical inference is the art and science of drawing conclusions about a population based on observing characteristics of samples.
• It involves uncertainty because the entire population is not being measured.
• An important component of statistical inference is measuring our uncertainty.

Slide 9

Example: Genetically Modified Foods
In January 2015 the Pew Research Center published a report stating that 37% of Americans believed that genetically modified foods (GMOs) were safe to eat. This was based on a survey of 2002 American adults.
Identify the population and sample. What was the parameter of interest? What is the statistic?

Slide 10

Example: GMOs (continued)
The population is all American adults.
The sample was the 2002 American adults who were surveyed.
The parameter of interest is the percentage of all American adults who believe that GMOs are safe to eat.
The statistic is 37% (the percentage of the sample who felt this way).

Slide 11

Statistics vs. Parameters
Statistics are knowable – any time we collect data we can find the value of a statistic.
Parameters are typically unknown – they can be estimated with statistics, but these estimates will involve some degree of uncertainty.
Slide 12

Notation
In general, Greek characters are used to represent population parameters. English letters are used to represent statistics.

Slide 13

Bias
A survey method is biased if it has a tendency to produce an untrue value.
Three types:
1. Measurement bias
2. Sampling bias
3. Use of an estimator that is biased

Slide 14

Measurement Bias
• Results from asking questions that do not produce a true answer; occurs when measurements tend to record values larger (or smaller) than the true value
Example: Asking people, “How much do you weigh?” It is likely that people will report a number less than their actual weight, resulting in an estimate that tends to be too small.

Slide 15

Sources of Measurement Bias
Measurement bias can occur in a variety of situations including:
• Self-reporting of personal data
• The use of confusing wording in survey questions
• The use of non-neutral language in questions

Slide 16

Sampling Bias
• Occurs when a sample is used that is not representative of the population
Example: Internet polls – people who answer these polls tend to be those who have strong feelings about the results and are not necessarily representative of the population

Slide 17

Important Questions to Ask about Sampling
1. What percentage of people who were asked to participate actually did so?
2. Did researchers choose people to participate in the survey or did the people themselves choose to participate?
If a large percentage of those chosen to participate refused to do so, or if people themselves chose to participate, the conclusions of the survey are suspect.

Slide 18

Example: Identifying Sources of Possible Bias
A school district is interested in finding out what percentage of voters in the district would favor the passage of a school bond measure. The district sends out a survey to all school parents asking, “Do you favor the passing of a school bond measure to provide additional resources to improve educational opportunities for your student?”
Identify any sources of possible bias in this survey.

Slide 19

Example: Identifying Sources of Possible Bias
The use of non-neutral language in the wording of the question, “Do you favor the passing of a school bond measure to provide additional resources to improve educational opportunities for your student?” introduces possible measurement bias into the survey.
Surveying only school parents, rather than all voters in the district, also introduces possible sampling bias, because parents are not necessarily representative of the voters.

Slide 20

Simple Random Sampling (SRS)
• Draw subjects at random from the population without replacement
• A random sample is one in which every member of the population is equally likely to be chosen for the sample.
• A true random sample is difficult to achieve.
• Statisticians have developed methods for producing random samples that can be used to estimate characteristics of populations.

Slide 21

Section 7.2
Lord and Leverett. Pearson Education Ltd
MEASURING THE QUALITY OF A SURVEY
• Use Accuracy and Precision to Measure the Quality of a Survey
• Describe the Important Features of a Sampling Distribution, Including the Standard Error
Slide 22

Evaluating Surveys
Statisticians evaluate the method used for a survey, not the outcome of a single survey.
Example: If a group of researchers were each to survey 1000 randomly selected people, we would expect the results to vary from sample to sample. Because we would want to know how the group did as a whole, we evaluate the estimation methods, not the individual estimates.

Slide 23

The Goal: Accuracy and Precision
An estimation method should be both accurate and precise.
• Accurate – The method measures what it is intended to measure; it correctly estimates the population parameter.
• Precise – If the method is repeated, the estimates are very consistent.

Slide 24

Accuracy and Precision
If the goal is to get the golf ball in the hole, which of these pictures shows a method that is accurate? Precise? Both?

Slide 25

Accuracy and Precision
This picture shows both accuracy and precision.
This picture shows precision, but not accuracy.
This picture shows accuracy, but not precision.

Slide 26

Understanding the Behavior of Estimators: Simulation 1
We have a small population of eight people, two of whom are Cat People (C) and six of whom are Dog People (D). Random samples of four are taken from this population and the percentage of Cat People is noted for each sample.
Note: 25% of the population are Cat People.
• Will each sample contain 25% Cat People?
• Is it possible to get a sample with 0 Cat People?
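One way to explore the two questions above is to list every possible sample of four from the eight-person population and tabulate the proportion of Cat People in each. The following is a minimal Python sketch; the names `population` and `sampling_distribution` are illustrative, and it assumes every group of four is equally likely to be drawn.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Population from Simulation 1: two Cat People (C) and six Dog People (D).
population = ["C", "C", "D", "D", "D", "D", "D", "D"]
SAMPLE_SIZE = 4


def sampling_distribution(pop, n):
    """Exact distribution of the sample proportion of 'C',
    assuming every possible sample of size n is equally likely."""
    counts = Counter()
    for members in combinations(range(len(pop)), n):
        cats = sum(1 for i in members if pop[i] == "C")
        counts[Fraction(cats, n)] += 1
    total = sum(counts.values())
    return {p_hat: Fraction(c, total) for p_hat, c in sorted(counts.items())}


dist = sampling_distribution(population, SAMPLE_SIZE)
mean_p_hat = sum(p_hat * prob for p_hat, prob in dist.items())

for p_hat, prob in dist.items():
    print(f"sample proportion {float(p_hat):.2f} occurs with probability {prob}")
print("mean of the sample proportions:", mean_p_hat)
```

Running it shows which sample proportions can occur and how likely each one is, without relying on a random number table.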
Slide 27

Simulation 1: Looking at Data
Using a random number table, three random samples were taken and results are shown below:
D D D D (0% Cat People)
D C D D (25% Cat People)
C D D D (25% Cat People)

Slide 28

Simulation 1: Looking at Data
Note: The sample proportion p̂ changes from sample to sample, but the population proportion (p) remains the same.

Slide 29

Sampling Distribution of p̂
The probability distribution of p̂ is called the sampling distribution. We can represent it using a table or a graph.

Slide 30

Notes about the Simulation
• Our estimator, p̂, is not always the same as our parameter, p.
• The mean of the sampling distribution is 25%, the same as the value of p – this indicates that the estimator p̂ is unbiased.
• Even though p̂ is not always equal to p, the estimate is never more than 25 percentage points away from p.

Slide 31

The Standard Error
The standard error (SE) is the standard deviation of the sampling distribution. It measures how much an estimator typically varies from sample to sample.
When the standard error is small, we say the estimator is precise.

Slide 32

Simulation 2: Using a Larger Population
This time we consider a population of 1000 people, 250 of whom are Cat People (25% of the population).
We take random samples of size 10 and find the sample proportion, p̂, of Cat People in each sample.
Using technology, 10,000 samples were taken and the results are shown in the graph on the next slide.
Slide 33

Simulation 2: Looking at Data

Slide 34

Simulation 2: Looking at Data
Since the center of the distribution is about 25%, the bias of p̂ is still 0, even though a larger population and sample size were used.
The variation of p̂ is less; this estimator is more precise even though the population is larger. In general, the precision has nothing to do with the size of the population, but only with the size of the sample.

Slide 35

Simulation 3: Large Samples Produce More Precise Estimators
Again we consider a population of 1000 people, 250 of whom are Cat People (25% of the population). This time we increase our sample size from 10 to 100, take 10,000 such samples, and compute the proportion of Cat People in each of the samples.
The following slide shows the results of this simulation.

Slide 36

Simulation 3: Looking at Data

Slide 37

Simulation 3: Looking at Data
• The estimation method remains unbiased (center is still at 25%, the population proportion).
• The shape of the histogram looks more symmetric than the one for samples of size 10.
• The estimator is more precise because it uses a larger sample size (standard error is smaller than for samples of size 10).

Slide 38

Summary of Three Simulations
1. The estimator p̂ is unbiased for all sample sizes (as long as we take random samples).
2. The precision improves as the sample size gets larger.
3. The shape of the sampling distribution is more symmetric for larger sample sizes.
Increasing sample size improves precision.
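Simulations 2 and 3 can be re-created in a few lines. This is a rough Python sketch, not the software used to produce the slides' graphs: the function name `simulate_p_hat` and the seed are assumptions, and the printed values depend on the seed.

```python
import random
import statistics


def simulate_p_hat(pop_size, n_cats, sample_size, n_samples, seed=0):
    """Draw n_samples simple random samples (without replacement) and
    return the proportion of Cat People in each one."""
    rng = random.Random(seed)
    population = [1] * n_cats + [0] * (pop_size - n_cats)  # 1 = Cat Person
    return [
        sum(rng.sample(population, sample_size)) / sample_size
        for _ in range(n_samples)
    ]


results = {}
for n in (10, 100):
    p_hats = simulate_p_hat(pop_size=1000, n_cats=250, sample_size=n, n_samples=10_000)
    # Center of the simulated sampling distribution and its spread (the SE).
    results[n] = (statistics.mean(p_hats), statistics.pstdev(p_hats))
    print(f"n = {n:3d}  center = {results[n][0]:.3f}  spread = {results[n][1]:.3f}")
```

Both centers should land near 0.25, and the spread for samples of 100 should be clearly smaller than for samples of 10, which is the pattern the summary describes.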
Slide 39

Finding Bias and Standard Error
The bias of p̂ is 0.
The standard error of p̂ is
$SE = \sqrt{\frac{p(1-p)}{n}}$
if the following conditions are met:
1. The sample is randomly selected from the population of interest.
2. If the sampling is without replacement, the population must be at least 10 times larger than the sample size.

Slide 40

Section 7.3
racorn. Shutterstock
THE CENTRAL LIMIT THEOREM FOR SAMPLE PROPORTIONS
• Identify the Conditions Needed to Apply the Central Limit Theorem for Proportions
• Use the Central Limit Theorem for Proportions to Describe the Sampling Distribution of a Sample Proportion and to Find Probabilities

Slide 41

Central Limit Theorem for Sample Proportions
• Used to estimate proportions in a population
• Tells us that, if some basic conditions are met, the sampling distribution of the sample proportion is close to the Normal distribution

Slide 42

Central Limit Theorem for Sample Proportions
If we take a random sample from a population, and if the sample size is large and the population size is much larger, then the sampling distribution of p̂ is approximately Normal with a mean of p and a standard deviation of
$SE = \sqrt{\frac{p(1-p)}{n}}$
If you don’t know the value of p, you can substitute the value of p̂.

Slide 43

Conditions for the Central Limit Theorem for Sample Proportions
1. Sampling is random and independent.
2. Large sample: The sample size, n, is large enough that the sample expects at least 10 successes (yes) and 10 failures (no):
$n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$
3. Big population: If sampling is done without replacement, the population must be at least 10 times larger than the sample size.

Slide 44

CLT for Sample Proportions: A Picture
This figure shows the sampling distribution of p̂ for 10,000 samples of size 10 drawn from a population with p = 0.25.
Note that this does not satisfy condition 2 of the CLT – the sample size is too small and the distribution is not Normal.

Slide 45

CLT for Sample Proportions: A Picture
This figure shows the sampling distribution of p̂ for 10,000 samples of size 100 drawn from a population with p = 0.25.
Note that this does satisfy the sample size condition of the CLT and the shape of the distribution is approximately Normal.
The standard deviation (or standard error) is:
$\sqrt{\frac{0.25 \times 0.75}{100}} \approx 0.0433$

Slide 46

Example: Vegetarians
According to www.statisticbrain.com, 59% of vegetarians are women. Suppose a random sample of 500 vegetarians is taken. What is the approximate probability that the proportion of females in our sample will be more than 63%?

Slide 47

Example: Vegetarians
First, check conditions:
1. The sample was random and independent.
2. Large sample size:
np = 500(0.59) = 295, and 295 ≥ 10
n(1 – p) = 500(1 – 0.59) = 205, and 205 ≥ 10
3. We can assume that there are at least 10(500) = 5000 vegetarians in the population.
Since these conditions are met, the CLT tells us the sampling distribution will be approximately Normal, with
$SE = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.59 \times 0.41}{500}} \approx 0.022$
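The sample-size and population checks above are plain arithmetic, so they can be sketched as a small Python helper. The function names `check_clt_conditions` and `standard_error` are illustrative assumptions; the first condition (random, independent sampling) depends on how the data were collected and cannot be verified by code.

```python
import math


def check_clt_conditions(n, p, population_size=None):
    """Return (large_sample_ok, big_population_ok) for a sample proportion.

    population_size=None means the population size was not supplied, so the
    big-population condition is taken on trust, as in the slides' examples.
    """
    expected_successes = n * p
    expected_failures = n * (1 - p)
    large_sample_ok = expected_successes >= 10 and expected_failures >= 10
    big_population_ok = population_size is None or population_size >= 10 * n
    return large_sample_ok, big_population_ok


def standard_error(p, n):
    """Standard error of the sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


# Vegetarians example: n = 500, p = 0.59.
print(check_clt_conditions(500, 0.59), round(standard_error(0.59, 500), 3))

# Samples of size 10 with p = 0.25: only 2.5 expected successes, so condition 2 fails.
print(check_clt_conditions(10, 0.25))

# Samples of size 100 with p = 0.25.
print(check_clt_conditions(100, 0.25), round(standard_error(0.25, 100), 4))
```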
Slide 48

Example: Vegetarians
To find the probability that the proportion in the sample is greater than 63%, we can use technology to find the probability of getting a value larger than 0.63 in a N(0.59, 0.022) distribution.

Slide 49

Example: Vegetarians
The probability that the proportion of females in the sample will be more than 63% is about 0.0345.

Slide 50

Example: Voting
In a closely contested school bond election, polling indicates that 50% of voters are in favor of the bond and 50% are against the bond. Suppose a random sample of 80 voters is selected.
1. What percentage of the sample would we expect to favor the bond?
2. Does the Central Limit Theorem apply?
3. What is the standard error for this sample proportion?
4. If the CLT applies, what is the approximate probability that the sample proportion will fall within two standard errors of the population value p = 0.50?

Slide 51

Example: Voting
1. Because we have taken a random sample, we expect the sample proportion to be about the same as the population proportion, so we expect 50% to favor the bond.
2. We have a random sample, np = 80 × 0.50 = 40, and n(1 – p) = 80 × 0.50 = 40. Since 40 ≥ 10, our sample is large enough. We will assume there are more than 800 voters in the district, so the population size is at least 10 times larger than the sample size. The conditions for the CLT are met.
3. The standard error is
$SE = \sqrt{\frac{0.50 \times 0.50}{80}} \approx 0.0559$

Slide 52

Example: Voting
4. Since the CLT applies, we can use the Normal model N(0.50, 0.0559) for the distribution of sample proportions. By the Empirical Rule, we know that the probability that the sample proportion will fall within two standard errors of 0.50 is about 95%. We could also verify this using technology.
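As one example of "using technology", both Normal-model probabilities above can be computed with `NormalDist` from Python's standard `statistics` module. This is a sketch, not the tool used in the slides; the helper name `standard_error` is an assumption.

```python
import math
from statistics import NormalDist


def standard_error(p, n):
    """Standard error of the sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


# Vegetarians: P(p-hat > 0.63) when p = 0.59 and n = 500.
veg_model = NormalDist(mu=0.59, sigma=standard_error(0.59, 500))
p_more_than_63 = 1 - veg_model.cdf(0.63)
print(f"P(p-hat > 0.63) = {p_more_than_63:.4f}")

# Voting: P(p-hat within 2 SE of 0.50) when p = 0.50 and n = 80.
vote_se = standard_error(0.50, 80)
vote_model = NormalDist(mu=0.50, sigma=vote_se)
p_within_2se = vote_model.cdf(0.50 + 2 * vote_se) - vote_model.cdf(0.50 - 2 * vote_se)
print(f"P(within 2 SE of 0.50) = {p_within_2se:.4f}")
```

The unrounded standard error is used here, so the results may differ from hand calculations with rounded values in the last digit or so.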
Slide 53

Example: Voting
0.50 + 2SE = 0.50 + 2(0.0559) = 0.6118
0.50 – 2SE = 0.50 – 2(0.0559) = 0.3882
The probability of being within 2 SE of the mean is 0.9545 ≈ 95%.

Slide 54

Key Idea
If the conditions for the CLT are met, the probability that a sample proportion will fall within two standard errors of the population value is about 95%.

Slide 55

Example: Dog Owners
According to the American Veterinary Medical Association, 36.5% of all Americans own a dog. Suppose a random sample of 200 Americans is taken. Would it be surprising to find that 45% of the sample owned a dog? Why or why not?

Slide 56

Example: Dog Owners
Check that the conditions needed to apply the CLT are satisfied:
1. Random independent sample
2. np = 200(0.365) = 73, n(1 – p) = 200(1 – 0.365) = 127. The sample is large enough since both 73 and 127 are greater than or equal to 10.
3. The population is at least 10 times larger than the sample.
So the distribution of sample proportions will be approximately Normal with mean 0.365 and standard error
$SE = \sqrt{\frac{0.365(1-0.365)}{200}} \approx 0.034$

Slide 57

Example: Dog Owners
mean + 2 SE = 0.365 + 2(0.034) = 0.433
Since 45% is more than two standard errors above the mean (0.45 > 0.433), it would be surprising.

Slide 58

Section 7.4
ocphoto. Shutterstock
ESTIMATING THE POPULATION PROPORTION WITH CONFIDENCE INTERVALS
• Use the Central Limit Theorem to Construct a Confidence Interval for a Population Proportion
Slide 59

Affordable Care Act: Birth Control
The Kaiser Health Tracking Poll surveyed 1504 American adults and asked them if they supported or opposed the requirement that private health insurance plans cover the full cost of birth control. The poll found that 61% of respondents supported this requirement.
What percentage of ALL American adults support this requirement?

Slide 60

Affordable Care Act: Birth Control
From previous sections we know:
1. Our estimator p̂ is unbiased, so the population parameter is likely to be close to 61%.
2. The standard error is estimated as
$SE_{est} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.61 \times 0.39}{1504}} \approx 0.0126$
3. Because the sample size is large, the probability distribution of p̂ is approximately Normal and centered around the true population proportion.

Slide 61

Affordable Care Act: Birth Control
There’s about a 95% chance that p̂ is within 2 standard errors of the population proportion.
2SE = 2 × 0.0126 = 0.0252
So we are 95% confident that the value of the population proportion is within 2.5 percentage points of 61%.
61% – 2.5% = 58.5%
61% + 2.5% = 63.5%
We are 95% confident that the population proportion is between 58.5% and 63.5%. This is called a confidence interval.

Slide 62

Confidence Intervals
Confidence intervals provide us with:
1. A range of plausible values for a population parameter.
2. A confidence level, which expresses our level of confidence that the interval contains the population parameter.
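What a confidence level expresses can be sketched with a simulation. The Python block below first repeats the Kaiser poll arithmetic, then pretends, purely as an assumption for illustration, that the true population proportion is 0.61; it draws many samples of 1504 and counts how often the interval p̂ ± 2 SE contains that value. The names `two_se_interval`, `TRUE_P`, and `TRIALS` are hypothetical.

```python
import math
import random


def two_se_interval(p_hat, n):
    """Interval p_hat ± 2 estimated standard errors."""
    se_est = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - 2 * se_est, p_hat + 2 * se_est


# Kaiser poll numbers from the slides: n = 1504, p-hat = 0.61.
poll_low, poll_high = two_se_interval(0.61, 1504)
print(f"poll interval: {poll_low:.3f} to {poll_high:.3f}")

# ASSUMPTION for illustration only: treat 0.61 as the true population proportion.
TRUE_P = 0.61
N = 1504
TRIALS = 2000
rng = random.Random(1)

hits = 0
for _ in range(TRIALS):
    # One simulated poll: each respondent supports with probability TRUE_P.
    successes = sum(1 for _ in range(N) if rng.random() < TRUE_P)
    low, high = two_se_interval(successes / N, N)
    if low <= TRUE_P <= high:
        hits += 1

coverage = hits / TRIALS
print(f"share of intervals containing the true proportion: {coverage:.3f}")
```

The share should come out close to 95%: the method succeeds about 95% of the time, even though any single interval either contains the true proportion or does not.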
Slide 63

Confidence Level
• Tells us how often the estimation is successful
• Measures the success rate of the method, not of any one particular interval

Slide 64

Margin of Error
• Tells us how far from the population value our estimate can be
Margin of error = z* × SE
where z* is a number that tells how many standard errors to include in the margin of error.
From the Empirical Rule we know that z* = 1 corresponds with a confidence level of 68% and z* = 2 corresponds with a confidence level of 95%.

Slide 65

Confidence Level and Margin of Error
This table shows the more precise z* values for some frequently used confidence levels.

Slide 66

Confidence Interval for a Population Proportion
Confidence intervals have the form p̂ ± m, where m is the margin of error.
The margin of error is z* × SE, so the confidence interval is
$\hat{p} \pm z^{*} \, SE$
To find SE, we substitute p̂ for p since p is unknown, and estimate the standard error as
$SE_{est} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

Slide 67

Example: Intelligent Life on Other Planets
A Huffpost/YouGov poll of 1000 Americans found that 38% believed that there is intelligent life on other planets.
a. Construct a 95% confidence interval for the percentage of all Americans who believe there is intelligent life on other planets.
b. Would it be plausible to conclude that 40% of Americans believe in intelligent life on other planets?

Slide 68

Example: Intelligent Life on Other Planets
Check that the conditions for the Central Limit Theorem apply.
1. Random independent sample
2. Large sample: p̂ = 0.38
np̂ = 1000(0.38) = 380
n(1 – p̂) = 1000(1 – 0.38) = 620
Both 380 and 620 are greater than or equal to 10.
3. The population is at least 10 times larger than the sample.

Slide 69

Example: Intelligent Life on Other Planets
The confidence interval can be constructed by first calculating the standard error or by using technology.
1. $SE_{est} = \sqrt{\frac{0.38(1-0.38)}{1000}} \approx 0.0153$
2. m = 1.96(0.0153) = 0.03
3. 0.38 – 0.03 = 0.35
0.38 + 0.03 = 0.41
The 95% confidence interval is 35% to 41%.
It is plausible that 40% of Americans believe in intelligent life on other planets, since 40% is contained within the confidence interval.

Slide 70

Using the TI-84 Calculator
To construct a confidence interval for a population proportion on the TI-84 calculator:
1. Push STAT then select TESTS.
2. Select option 1-PropZInt.
3. Enter values for x and n, enter the confidence level (C-Level), and press Calculate.

Slide 71

Using StatCrunch
Stat > Proportion Stats > One sample > with data
Enter # successes (x), # observations (n), select Confidence Interval and enter confidence level. Click Compute.
L Limit = lower limit of confidence interval
U Limit = upper limit of confidence interval

Slide 72

Confidence Interval: (0.350, 0.410)

Slide 73

Example: Election Prediction
A political consultant randomly surveys 300 voters in a district to see how many intend to vote for a certain candidate. Of the 300 voters surveyed, 165 indicate they will vote for the candidate. Using a 99% confidence interval, should the consultant predict that the candidate will win the election? Why or why not?
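The Intelligent Life interval above can also be reproduced outside a calculator or StatCrunch. Here is a minimal Python sketch; the function names `z_star` and `proportion_ci` are assumptions, and the 380 successes are simply 38% of the 1000 respondents. It uses the unrounded z* value, so results can differ from hand calculations in the last digit.

```python
import math
from statistics import NormalDist


def z_star(confidence_level):
    """Number of standard errors that gives the requested confidence level."""
    return NormalDist().inv_cdf(0.5 + confidence_level / 2)


def proportion_ci(successes, n, confidence_level=0.95):
    """Confidence interval p-hat ± z* × SE_est for a population proportion."""
    p_hat = successes / n
    se_est = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star(confidence_level) * se_est
    return p_hat - margin, p_hat + margin


# More precise z* values for two commonly used confidence levels.
for level in (0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {z_star(level):.3f}")

# Intelligent Life example: 380 of 1000 respondents, 95% confidence.
low, high = proportion_ci(380, 1000, 0.95)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

Passing the count of successes rather than a rounded percentage mirrors the x and n inputs that the TI-84 and StatCrunch ask for.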
Slide 74

Example: Election Prediction
To construct the confidence interval by applying the CLT, first check the conditions:
1. Random independent sample
2. Large sample size: p̂ = 165/300 = 0.55
np̂ = 300(0.55) = 165
n(1 – p̂) = 300(1 – 0.55) = 135
Both 165 and 135 are greater than or equal to 10.
3. We can assume that the number of voters is greater than or equal to 10(300) = 3000.
The conditions needed to apply the CLT are met.

Slide 75

Example: Election Prediction
Constructing the confidence interval:
1. The standard error is $SE_{est} = \sqrt{\frac{0.55(1-0.55)}{300}} \approx 0.0287$
2. m = 2.58(0.0287) = 0.074
3. 0.55 – 0.074 = 0.476
0.55 + 0.074 = 0.624
The 99% confidence interval is (0.476, 0.624). Since the interval contains plausible values that are less than 50%, the consultant should not predict that the candidate will win the election.

Slide 76

Example: Election Prediction
Confidence intervals are frequently constructed using technology. This is the StatCrunch output for the 99% confidence interval based on the polling data.

Slide 77

Relationship between Confidence Level and Interval Width
As the level of confidence increases, the width of the confidence intervals also increases.

Slide 78

Section 7.5
Sozaijiten
COMPARING TWO POPULATION PROPORTIONS WITH CONFIDENCE
• Construct a Confidence Interval for the Difference of Two Population Proportions

Slide 79

Comparing Two Population Proportions: Introduction
In 2002, a Pew Poll based on a random sample of 1500 people suggested that 43% of Americans approved of stem cell research.
In 2009 a new poll of a different sample of 1500 people found that 58% approved.
Did American opinion really change? Or do the sample proportions differ just by chance? Could the population proportions be the same even though the sample proportions are different?

Slide 80

Sample Proportions
Even if two population proportions are equal, the sample proportions drawn from these populations are usually different.
Confidence intervals are one method for determining whether different sample proportions indicate there are “real” differences in the population proportions.

Slide 81

Basic Approach
Our comparison of two population proportions will be based on the statistic $\hat{p}_1 - \hat{p}_2$.
This statistic is used to estimate the difference between two population proportions, $p_1 - p_2$.

Slide 82

Basic Approach
1. Find a confidence interval for the difference in proportions p1 – p2.
2. Check to see if 0 is included in the interval.
• If 0 is in the interval, this suggests the two population proportions might be the same because if p1 – p2 = 0, then p1 = p2 and the proportions are the same.
• If 0 is not in the interval, the confidence interval tells us how much greater one of the proportions might be than the other.

Slide 83

General Method
The confidence interval for two proportions has the same structure as the confidence interval for one proportion:
statistic ± z* × SEest
where
$SE_{est} = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
and z* is chosen to get the desired confidence level, as we did for one proportion.
Putting these two together, we see the confidence interval for the difference of two proportions is
$(\hat{p}_1 - \hat{p}_2) \pm z^{*} \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$

Slide 84

Using the TI-84 Calculator
To construct a confidence interval for the difference of two population proportions on the TI-84 calculator:
1. Push STAT > TESTS and select the 2-PropZInt option.
2. Enter x1, n1, x2, n2 and the C-level you want. Make sure to round values for x1 and x2 to whole numbers.
3. Press Calculate.

Slide 85

Using StatCrunch
Select STAT > Proportion Stats > Two samples > with summary.
Enter the number of successes and sample size for each sample.
Click the option Confidence Interval for p1 – p2 and adjust your confidence level.

Slide 86

Checking Four Conditions
Before constructing a confidence interval for the difference in the population proportions, whether you use the formula or technology, check these conditions:
1. Random and Independent. Both samples are randomly drawn from their populations, and the observations within each sample are independent of each other.
2. Large Samples. Both sample sizes are large enough that at least 10 successes and 10 failures can be expected in both samples. In symbols, this means we need to check:
$n_1\hat{p}_1 \ge 10$, $n_1(1-\hat{p}_1) \ge 10$, $n_2\hat{p}_2 \ge 10$, $n_2(1-\hat{p}_2) \ge 10$

Slide 87

Checking Conditions (Continued)
3. Big Population. If the samples are collected without replacement, both population sizes must be at least 10 times bigger than their samples.
4. Independent Samples. The samples must be independent of each other. This means there can be no relationship between the objects in one sample and the objects in the other.
Example: Space Station
In August 2014 the Pew Poll asked random samples of 2002 Americans (non-scientists) and 3748 scientists if the Space Station has been a good investment for the US. Of the non-scientists polled, 64% said that the Space Station has been a good investment; of the scientists polled, 68% said that the Space Station has been a good investment. Construct a 95% confidence interval for the difference between the proportions of non-scientists and scientists who believe the Space Station has been a good investment for the US. Based on your confidence interval, is there a difference in the population proportions? Explain.

Example: Space Station
Check conditions:
1. Random and Independent
2. Large Samples: all values are greater than or equal to 10
2002(0.64) = 1281.3
2002(1 - 0.64) = 720.7
3748(0.68) = 2548.6
3748(1 - 0.68) = 1199.4
3. Big Populations. The populations are at least 10 times greater than the sample sizes.
4. Independent Samples
The conditions are met. Use technology to construct the confidence interval.

Example: Space Station
Non-scientists (sample 1): 64% of 2002 = 0.64(2002) = 1281.28 ≈ 1281
Scientists (sample 2): 68% of 3748 = 0.68(3748) = 2548.64 ≈ 2549
Use technology to construct the confidence interval.

Confidence Interval
(-0.0660, -0.0144)
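As a cross-check on the technology output, the Space Station interval can be reproduced directly from the formula. The sketch below is an assumption-laden illustration, not the calculator's or StatCrunch's actual procedure: the function name `two_proportion_interval` is hypothetical, sample 1 is taken to be the non-scientists and sample 2 the scientists, and the success counts are the rounded values 1281 and 2549 from the example.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_interval(x1, n1, x2, n2, confidence_level=0.95):
    """Confidence interval for p1 - p2 from success counts and sample sizes."""
    p_hat1 = x1 / n1
    p_hat2 = x2 / n2
    # Estimated standard error of the difference in sample proportions.
    se_est = sqrt(p_hat1 * (1 - p_hat1) / n1 + p_hat2 * (1 - p_hat2) / n2)
    # Critical value z* for the requested confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    difference = p_hat1 - p_hat2
    return difference - z * se_est, difference + z * se_est


# Sample 1: non-scientists (1281 of 2002); sample 2: scientists (2549 of 3748).
lower, upper = two_proportion_interval(1281, 2002, 2549, 3748)
print(f"({lower:.4f}, {upper:.4f})")  # (-0.0660, -0.0144)
```

Swapping the order of the two samples would flip the signs of both endpoints, so it matters which group is labeled sample 1.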
Interpreting Confidence Interval for Two Proportions
When constructing a confidence interval for $p_1 - p_2$:

Interval | Interpretation
Contains 0 | The population proportions may be equal
Both values are positive (+, +) | The population proportions are different and $p_1 > p_2$
Both values are negative (-, -) | The population proportions are different and $p_1 < p_2$

Interpreting Space Station Confidence Interval
Our confidence interval for $p_1 - p_2$ is (-0.0660, -0.0144). The interval does not contain 0, so there is a significant difference in the population proportions. Since both values in the confidence interval are negative, this tells us that $p_1 < p_2$. In other words, the proportion of non-scientists who believe the Space Station has been a good investment is less than the proportion of scientists who believe so. The plausible values for the size of the difference in the population proportions lie between 1.4 and 6.6 percentage points.

Example: Interpreting Confidence Intervals
Extracting natural gas through hydraulic fracturing, or "fracking," is a controversial practice. Residents in two states with large natural gas deposits, Michigan and Pennsylvania, were polled by the University of Michigan to determine whether they favored or opposed fracking. Let $p_1$ represent the proportion of Michigan residents who favor fracking and $p_2$ represent the proportion of Pennsylvania residents who favor fracking. A 95% confidence interval for the difference in the population proportions is (-0.0184, 0.117). Does this interval indicate there is a difference in support of fracking between these two states?
Example: Interpreting Confidence Intervals
Since the confidence interval contains 0, we can conclude there is no significant difference in the proportion of residents who support fracking in these two states.
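The interpretation rule for an interval for p1 – p2 can be written as a small function. This is a minimal sketch; the name `interpret_interval` and the wording of its return values are assumptions for illustration, and the two intervals passed in are the Space Station and fracking intervals quoted in the examples.

```python
def interpret_interval(lower, upper):
    """Interpret a confidence interval (lower, upper) for p1 - p2."""
    if lower > 0:
        # Both endpoints positive: every plausible difference favors p1.
        return "different, p1 > p2"
    if upper < 0:
        # Both endpoints negative: every plausible difference favors p2.
        return "different, p1 < p2"
    # Otherwise 0 lies inside the interval.
    return "may be equal"


print(interpret_interval(-0.0660, -0.0144))  # Space Station: different, p1 < p2
print(interpret_interval(-0.0184, 0.117))    # Fracking: may be equal
```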