Chapter 10
Asking and Answering Questions about a Population Proportion
Created by Kathy Fritz

The article “Boy or Girl: Which Gender Baby Would You Pick?” (LiveScience, March 23, 2005, www.livescience.com) summarized a study that was published in Fertility and Sterility. The LiveScience article makes the following statement:

“When given the opportunity to choose the sex of their baby, women are just as likely to choose pink socks as blue socks, a new study shows.”

In the original study, 1385 women were sent a 19-question survey. Of the 561 surveys returned, 229 women said they would like to choose the sex of a future child. Of these 229 women, 140 chose to have a girl.

• What is the population of interest?
• Are the 561 women who responded to the survey representative of the population?
• Does the high nonresponse rate pose a problem?

Is the article’s statement reasonable? Or is observing a sample proportion as large as 140/229 ≈ 0.611 very unlikely if the population proportion is 0.50? In this chapter, you will learn techniques for testing this claim.

Hypotheses and Possible Conclusions
Null Hypothesis
Alternative Hypothesis

Hypotheses

In its simplest form, a hypothesis is a claim or statement about the value of a single population characteristic. Hypotheses are ALWAYS statements about the population characteristic, NEVER the sample statistic.

The following are examples of hypotheses about population proportions:

Hypothesis: p < 0.25
  Population proportion of interest: p is the proportion of e-mail messages that included an attachment
  The hypothesis says: less than 25% of the e-mail messages sent included an attachment

Hypothesis: p > 0.8
  Population proportion of interest: p is the proportion of e-mail messages that were longer than 500 characters
  The hypothesis says: more than 80% of the e-mail messages sent were longer than 500 characters

Hypothesis: p = 0.3
  Population proportion of interest: p is the proportion of e-mail messages that were sent to multiple recipients
  The hypothesis says: 30% of the e-mail messages sent were sent to multiple recipients

What is a hypothesis test?
A hypothesis test uses sample data to choose between two competing hypotheses about a population characteristic.

Suppose that a particular community college claims that the majority of students completing an associate’s degree transfer to a 4-year college. You would then want to determine if the sample data provide convincing evidence in support of the hypothesis p > 0.5.

To test a claim: set up competing hypotheses p ≤ 0.5 and p > 0.5. Notice that the hypothesis of interest is one of the two competing hypotheses.

Hypothesis statements:

The null hypothesis, denoted by H0, is a claim about a population characteristic that is initially assumed to be true.

The alternative hypothesis, denoted by Ha, is the competing claim.

In carrying out a test of H0 versus Ha, the null hypothesis H0 is rejected in favor of the alternative hypothesis Ha ONLY if the sample data provide convincing evidence that H0 is false. If the sample data do not provide such evidence, H0 is not rejected.

Two possible conclusions in a hypothesis test are:
• Reject H0
• Fail to reject H0

Notice that the conclusions are made about the null hypothesis, NOT about the alternative!

The Form of Hypotheses:

Null hypothesis
  H0: population characteristic = hypothesized value

Alternative hypothesis
  Ha: population characteristic > hypothesized value
  Ha: population characteristic < hypothesized value
  Ha: population characteristic ≠ hypothesized value

• The null hypothesis always includes the equal case.
• The hypothesized value is a specific number determined by the context of the problem.
• The sign in the alternative hypothesis is determined by the context of the problem.
• Notice that the alternative hypothesis uses the same population characteristic and the same hypothesized value as the null hypothesis.
• The alternatives with > and < are considered one-tailed tests because you are only interested in one direction.
• The alternative with ≠ is considered a two-tailed test because you are interested in both directions.

Let’s practice writing hypothesis statements.

Let’s consider a murder trial . . .
What is the null hypothesis? This is what you assume is true before you begin.
  H0: the defendant is innocent

What is the alternative hypothesis?
  Ha: the defendant is guilty

To determine which hypothesis is correct, the jury will listen to the evidence. Only if there is “evidence beyond a reasonable doubt” would the null hypothesis be rejected in favor of the alternative hypothesis.

If there is not convincing evidence, then we would “fail to reject” the null hypothesis. We never end up determining that the null hypothesis is true, only that there is not enough evidence to say it’s not true.

In a study, researchers were interested in determining if sample data support the claim that more than one in four young adults live with their parents.

Define the population characteristic:
  p = the proportion of young adults who live with their parents

State the hypotheses:
  H0: p = 0.25
  Ha: p > 0.25

What is the hypothesized value? What words indicate the direction of the alternative hypothesis?

It is acceptable to write the null hypothesis as H0: p ≤ 0.25.

A study included data from a survey of 1752 people ages 13 to 39. One of the survey questions asked participants how satisfied they were with their current financial situation. Suppose you want to determine if the survey data provide convincing evidence that fewer than 10% of adults 19 to 39 are very satisfied with their current financial situation.

Define the population characteristic:
  p = the proportion of adults ages 19 to 39 who are very satisfied with their current financial situation

State the hypotheses:
  H0: p = 0.10
  Ha: p < 0.10

What words indicate the direction of the alternative hypothesis?

The manufacturer of M&Ms claims that 40% of plain M&Ms are brown. A sample of M&Ms will be used to determine if the proportion of brown M&Ms is different from what the manufacturer claims.
Define the population characteristic:
  p = the proportion of plain M&Ms that are brown

State the hypotheses:
  H0: p = 0.40
  Ha: p ≠ 0.40

What words indicate the direction of the alternative hypothesis?

For each pair of hypotheses, indicate which are not legitimate and explain why.

a) H0: p 0.15; Ha: p 0.15
b) H0: p̂ 0.4; Ha: p̂ 0.4
   Must use a population characteristic!
c) H0: p 0.1; Ha: p 0.1
   Ha does NOT include the equality statement!
d) H0: p 0.23; Ha: p 0.32
   Must use the same number as in H0!
e) H0: p .5; Ha: p .5
   H0 MUST include the equality statement!

Potential Errors in Hypothesis Testing
Type I Errors
Type II Errors
Significance Level

When you perform a hypothesis test you make a decision: reject H0 or fail to reject H0. Each could possibly be a wrong decision; therefore, there are two types of errors.

Type I error
A Type I error is the error of rejecting H0 when H0 is true. The probability of a Type I error is denoted by α, the lower-case Greek letter “alpha”. In a hypothesis test, the probability of a Type I error, α, is also called the significance level.

Type II error
A Type II error is the error of failing to reject H0 when H0 is false. The probability of a Type II error is denoted by β, the lower-case Greek letter “beta”.

The U.S. Bureau of Transportation Statistics reports that for 2008, 65.3% of all domestic passenger flights arrived on time (meaning within 15 minutes of the scheduled arrival time). Suppose that an airline with a poor on-time record decides to offer its employees a bonus if the airline’s proportion of on-time flights exceeds the overall industry rate of 0.653 in an upcoming month.

Let p = the actual proportion of the airline’s flights that are on time during the month of interest. State a Type I error and a Type II error in context. The hypotheses are:
  H0: p = 0.653
  Ha: p > 0.653

A Type I error is concluding that the airline’s on-time rate exceeds the overall industry rate when in fact the airline does not have a better on-time record.

A Type II error is not concluding that the airline’s on-time proportion is better than the industry proportion when the airline really did have a better on-time record.

Boston Scientific developed a new heart stent used to treat arteries blocked by heart disease. The new stent, called the Liberte, is made of thinner metal than heart stents currently in use, making it easier for doctors to direct the stent to a blockage.

In order to obtain approval to sell the new Liberte stent, the Food and Drug Administration (FDA) required Boston Scientific to provide evidence that the proportion of patients receiving the Liberte stent who experienced a re-blocked artery was less than 0.1.

Let p = the proportion of patients receiving the Liberte stent who experience a re-blocked artery
  H0: p = 0.1
  Ha: p < 0.1

State a Type I error in the context of this problem.
A Type I error would be to conclude that the re-block proportion for the new stent is less than 0.1 when it really is 0.1 (or greater). A consequence of making a Type I error would be that the new stent is approved for sale. More patients will experience re-blocked arteries.

State a Type II error in the context of this problem.
A Type II error would be that you are not convinced that the re-block proportion for the new stent is less than 0.1 when it really is less than 0.1. A consequence of making a Type II error would be that the new stent is not approved for sale. Patients and doctors will not benefit from the new design.

The relationship between α and β

The ideal test procedure would result in both α = 0 (probability of a Type I error) and β = 0 (probability of a Type II error). This is impossible to achieve since we must base our decision on sample data.

Standard test procedures allow us to select α, the significance level of the test, but we have no direct control over β. Selecting a significance level α = 0.05 results in a test procedure that, used over and over with different samples, rejects a true H0 about 5 times in 100.

So why not always choose a small α (like α = 0.05 or α = 0.01)?

Let’s consider the following hypotheses:
  H0: p = 0.5
  Ha: p > 0.5
Let α = 0.05.

[Figure: sampling distribution curve centered at 0.5 with the upper tail shaded, and a second curve shifted to the right.]

The shaded upper tail is the part of the curve that represents α, the probability of a Type I error. If the null hypothesis is false and the alternative hypothesis is true, then the true proportion is greater than 0.5, so the curve should really be shifted to the right. The tail of the shifted curve that falls in the fail-to-reject region represents β, the probability of failing to reject a false H0.

Now let α = 0.01 for the same hypotheses. Notice that as α gets smaller, β gets larger!

How does one decide what α level to use?

After assessing the consequences of Type I and Type II errors, identify the largest α that is tolerable for the problem.
Then employ a test procedure that uses this maximum acceptable value (rather than anything smaller) as the level of significance. Remember, using a smaller α increases β.

Heart Stents Revisited . . .

Let p = the proportion of patients receiving the Liberte stent who experience a re-blocked artery
  H0: p = 0.1 versus Ha: p < 0.1

A consequence of making a Type I error would be that the new stent is approved for sale. More patients will experience re-blocked arteries.

A consequence of making a Type II error would be that the new stent is not approved for sale. Patients and doctors will not benefit from the new design.

Which type of error has the more serious consequence?

A Type I error has a more serious consequence. Because this represents an unnecessary risk to patients (given that other stents with lower re-block proportions are available), a small value for α, such as 0.01, would be selected.

The Logic of Hypothesis Testing
An Informal Example

In June 2006, an Associated Press survey was conducted to investigate how people use the nutritional information provided on food packages. Interviews were conducted with 1003 randomly selected adult Americans, and each participant was asked a series of questions, including the following two:

Question 1: When purchasing packaged food, how often do you check the nutritional labeling on the package?
Question 2: How often do you purchase food that is bad for you, even after you’ve checked the nutrition labels?

It was reported that 582 responded “frequently” to the question about checking labels and 441 responded “very often” or “somewhat often” to the question about purchasing bad foods even after checking the labels.

Based on these data, is it reasonable to conclude that a majority of adult Americans frequently check nutritional labels when purchasing packaged foods?

Nutritional Labels Continued . . .
p = true proportion of adult Americans who frequently check nutritional labels
  H0: p = 0.5
  Ha: p > 0.5

We use p > 0.5 to test for a majority of adult Americans who frequently check nutritional labels.

For this sample: p̂ = 582/1003 ≈ 0.58

If H0 were true, p̂ would vary from sample to sample with a standard deviation of about √(0.5(0.5)/1003) ≈ 0.016, so a sample proportion of 0.58 is about 5 standard deviations above 0.5 and would be very unlikely to occur by chance. Thus, we have convincing evidence to suggest that the null hypothesis is not true. We would reject H0.

A Procedure for Carrying Out a Hypothesis Test
Test Statistic
P-value

Test Statistic
A test statistic is computed using sample data. The value of the test statistic is used to determine the P-value associated with the test.

P-values
The P-value (also sometimes called the observed significance level) is a measure of inconsistency between the null hypothesis and the observed sample. It is the probability, assuming that H0 is true, of obtaining a test statistic value at least as inconsistent with H0 as what actually resulted. You reject the null hypothesis when the P-value is small.

Using P-values to make a decision:
A decision in a hypothesis test is based on comparing the P-value to the chosen significance level α.
• H0 is rejected if the P-value ≤ α.
• H0 is not rejected if the P-value > α.

For example, suppose that P-value = 0.0352 and α = 0.05. Then, because 0.0352 ≤ 0.05, H0 would be rejected. What decision would be made if α = 0.01?

Recall the 5 Steps for Performing a Hypothesis Test

H  Hypotheses
   1. Describe the population characteristic of interest.
   2. Translate the research question or claim into null and alternative hypotheses.
M  Method
   1. Identify the appropriate test and test statistic.
   2. Select a significance level for the test.
C  Check
   1. Verify that any conditions for the selected test are met.
C  Calculate
   1. Find the values of any sample statistics needed to calculate the value of the test statistic.
   2. Calculate the value of the test statistic.
   3. Determine the P-value for the test.
C  Communicate Results
   1. Compare the P-value to the selected significance level and make a decision to either reject H0 or fail to reject H0.
   2. Provide a conclusion in words that is in context and addresses the question of interest.

Large Sample Hypothesis Test for a Population Proportion

Computing P-values
The calculation of the P-value depends on the form of the inequality in the alternative hypothesis.
• Ha: p > hypothesized value: P-value = area under the z curve in the upper tail, to the right of the calculated z
• Ha: p < hypothesized value: P-value = area under the z curve in the lower tail, to the left of the calculated z
• Ha: p ≠ hypothesized value: P-value = sum of the areas in the two tails, beyond the calculated −z and z

A Large-Sample Test for a Population Proportion

Appropriate when the following conditions are met:
1. The sample is a random sample from the population of interest or the sample is selected in a way that would result in a representative sample.
2. The sample size n is large. This condition is met when both np0 ≥ 10 and n(1 − p0) ≥ 10.

When these conditions are met, the following test statistic can be used:

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 (1 - p_0)}{n}}}$$

where p0 is the hypothesized value from the null hypothesis.

A Large-Sample Test for a Population Proportion Continued . . .

Null hypothesis: H0: p = p0

When the alternative hypothesis is . . . the P-value is . . .
• Ha: p > p0: area under the z curve to the right of the calculated value of the test statistic
• Ha: p < p0: area under the z curve to the left of the calculated value of the test statistic
• Ha: p ≠ p0: 2·(area to the right of z) if z is positive, or 2·(area to the left of z) if z is negative

In a study, 2205 adolescents ages 12 to 19 took a cardiovascular treadmill test. The researchers conducting the study believed that the sample was representative of adolescents nationwide.
Of the 2205 adolescents tested, 750 had a poor level of cardiovascular fitness. Does this sample provide convincing evidence that more than thirty percent of adolescents have a poor level of cardiovascular fitness?

Hypotheses:
Let p = proportion of American adolescents who have a poor level of cardiovascular fitness
  H0: p = 0.30
  Ha: p > 0.30

Cardiovascular Fitness Continued . . .

Method:
Because the answers to the four key questions are 1) hypothesis testing, 2) sample data, 3) one categorical variable, and 4) one sample, consider a large sample hypothesis test for a population proportion.

To select a significance level, you must consider the consequences of potential Type I and Type II errors. What are the Type I and Type II errors? Which has the most serious consequence?

Significance level: α = 0.05
Because neither type of error is much worse than the other, you might choose a value of 0.05.

Check:
1. The researchers believed the sample to be representative of adolescents nationwide.
2. The sample size is large enough because np0 = 2205(0.3) = 661.5 ≥ 10 and n(1 − p0) = 2205(0.7) = 1543.5 ≥ 10.

Calculate:
p̂ = 750/2205 ≈ 0.34
z = (0.34 − 0.30) / √(0.30(0.70)/2205) ≈ 0.04/0.010 = 4.00
P-value ≈ 0

Communicate Results:
Decision: 0 < 0.05, so reject H0.
Conclusion: The sample provides convincing evidence that more than 30% of adolescents have a poor fitness level.

Notice that the conclusion answers the question that was posed in the problem.

A Few Final Things to Consider

1. What about Small Samples?
When np ≥ 10 and n(1 − p) ≥ 10, the standard normal distribution is a reasonable approximation to the distribution of the z test statistic when the null hypothesis is true. If the sample size is not large enough to satisfy the large sample conditions, the distribution of the test statistic may be quite different from the standard normal distribution.
Thus, you can’t use the standard normal distribution to calculate P-values.

2. Choosing a Potential Method
Take a look back at Table 7.1 (on page 420 and also on the inside back cover of the text). As you get into the habit of answering the four key questions for each new situation that you encounter, it will become easier to use this table to select an appropriate method in a given situation.

Power and Probability of Type II Error

Suppose that the manager of a grocery store is thinking about expanding the store’s selection of organically grown produce. Because organically grown produce costs more than produce that is not organically grown, the manager thinks this expansion will be profitable only if the proportion of store customers who would pay more for organic produce is greater than 0.30.

The manager plans to ask each person in a random sample of customers if he or she would be willing to pay more for organic produce. He will then use the resulting data to test
  H0: p = 0.30 versus Ha: p > 0.30
using a significance level of α = 0.05.

What is a Type I error for this context? What is a Type II error for this context?

Grocery Store Problem Continued . . .
  H0: p = 0.30 versus Ha: p > 0.30

How likely is it that the null hypothesis is rejected? Let’s investigate this further.

A package delivery service advertises that at least 90% of all packages brought to its office by 9 a.m. for delivery in the same city are delivered by noon that day. Let p denote the proportion of all such packages actually delivered by noon.
  H0: p = 0.90 versus Ha: p < 0.90

The alternative hypothesis states that the company’s claim is untrue.

Suppose the value for the proportion of all packages delivered by noon is actually p = 0.80. Using a significance level of 0.01 and a sample of n = 225
packages, what is the probability that the departure from H0 represented by this alternative value (0.80) will be detected (H0 rejected)?

The calculations for the probabilities of Type II errors and of power are NOT in the AP Statistics course description.

Package Delivery Continued . . .

Let p denote the proportion of all such packages actually delivered by noon.
  H0: p = 0.90 versus Ha: p < 0.90
  α = 0.01
  p0 = 0.90 versus pa = 0.80

The sampling distribution of p̂ when p = 0.90 has mean μ_p̂ = 0.90 and standard deviation σ_p̂ = 0.02. The sampling distribution of p̂ when p = 0.80 has these properties:
  μ_p̂ = 0.80
  σ_p̂ = 0.027
Since n is large, the distribution is approximately normal.

[Figure: sampling distributions of p̂ centered at 0.8 and 0.9, with the reject and fail-to-reject regions and the areas α and β marked.]

• The area of the alternative distribution in the reject region is where H0 is rejected. It represents the probability of correctly rejecting a false H0, called the power of the test.
• The area of the alternative distribution in the fail-to-reject region is where H0 has not been rejected. It represents the probability of a Type II error, β.
• Notice: Power + β = 1.

The probability that the departure from H0 represented by this alternative value (0.80) will be detected (H0 rejected) is reasonably large.

Package Delivery Continued . . .

Let’s see what happens if the actual population proportion is p = 0.85.
  H0: p = 0.90 versus Ha: p < 0.90
  α = 0.01
  p0 = 0.90 versus pa = 0.85

The sampling distribution of p̂ when p = 0.85 has these properties:
  μ_p̂ = 0.85
  σ_p̂ = 0.024
Since n is large, the distribution is approximately normal.

[Figure: sampling distributions of p̂ centered at 0.85 and 0.9, with the reject and fail-to-reject regions and the areas α and β marked.]

Notice that the discrepancy between the hypothesized value and the actual value of the population characteristic, |p0 − pa|, is smaller, and the probability of making a Type II error has increased. Thus, power is smaller.

Effects of Various Factors on the Power of a Test
• The larger the size of the difference between the hypothesized value and the actual value of the population characteristic, the higher the power.

Package Delivery Continued . . .
Suppose we use a significance level of 0.05.

Let p denote the proportion of all such packages actually delivered by noon.
  H0: p = 0.90 versus Ha: p < 0.90
  α = 0.05
  p0 = 0.90 versus pa = 0.85

The sampling distribution of p̂ when p = 0.85 has these properties:
  μ_p̂ = 0.85
  σ_p̂ = 0.024
Since n is large, the distribution is approximately normal.

[Figure: sampling distributions of p̂ centered at 0.85 and 0.9, with the reject and fail-to-reject regions and the areas α and β marked.]

Notice that the larger the significance level α, the smaller the probability of making a Type II error. Thus, power is larger.

Package Delivery Continued . . .

Let’s see what happens if the sample size was n = 500 instead of n = 225.
  H0: p = 0.90 versus Ha: p < 0.90
  α = 0.05
  p0 = 0.90 versus pa = 0.85

The larger the sample size, the smaller σ_p̂.
• The sampling distribution of p̂ when p = 0.9 has a standard deviation of σ_p̂ = 0.0134.
• The sampling distribution of p̂ when p = 0.85 has a standard deviation of σ_p̂ ≈ 0.0160.

[Figure: sampling distributions of p̂ centered at 0.85 and 0.9 for both sample sizes. The shorter, dotted curves are the sampling distributions when n = 225. The taller curves are the sampling distributions when n = 500.]

Notice that the probability of a Type II error (β) is much smaller, making the power of the test larger.

Effects of Various Factors on the Power of a Test
• The larger the size of the difference between the hypothesized value and the actual value of the population characteristic, the higher the power.
• The larger the significance level α, the higher the power of a test.
• The larger the sample size, the higher the power of a test.

Avoid These Common Mistakes

Be sure to include all the relevant information:

1. Hypothesis.
Whether specified in symbols or described in words, it is important that both the null and the alternative hypothesis be clearly stated. If using symbols, be sure to define them in the context of the problem.

2. Test procedure. You should be clear about what test procedure was used and why you think it was reasonable to use this procedure.

3. Test statistic. Be sure to include the value of the test statistic and the associated P-value.

4. Conclusion in context. Always provide a conclusion that is in the context of the problem and that answers the question posed.

Avoid These Common Mistakes

1. A hypothesis test can never show strong support for the null hypothesis. Make sure that you don’t confuse “There is no reason to believe the null is not true” with the statement “There is convincing evidence that the null hypothesis is true”. These are very different statements! This is like saying the defendant is “innocent” instead of “not guilty”.

2. If you have complete information for the population (census), don’t carry out a hypothesis test!

3. Don’t confuse statistical significance with practical significance. When a null hypothesis has been rejected, be sure to step back and evaluate the result in the light of its practical importance. For example, you may be convinced that the proportion who respond favorably to a proposed new medical treatment is greater than p0 = 0.4. But if your estimate for the proposed new treatment is 0.405, it may not be of any practical use.
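
The last point can be illustrated with a short calculation. The following is a minimal sketch, not part of the original slides: it implements the large-sample z test for a proportion described earlier and applies it to two samples that both have p̂ = 0.405 when testing H0: p = 0.4 versus Ha: p > 0.4. The function names and the two sample sizes (1,000 and 100,000) are assumptions made up for illustration.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Area under the standard normal (z) curve to the left of z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def one_proportion_z_test(successes: int, n: int, p0: float,
                          alternative: str = "greater") -> tuple:
    """Large-sample z test of H0: p = p0; returns (z, P-value).

    `alternative` is "greater" (Ha: p > p0), "less" (Ha: p < p0),
    or "two-sided" (Ha: p != p0).
    """
    # Large-sample condition: n*p0 >= 10 and n*(1 - p0) >= 10.
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("sample too small for the large-sample z test")

    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

    if alternative == "greater":
        p_value = 1.0 - normal_cdf(z)            # upper-tail area
    elif alternative == "less":
        p_value = normal_cdf(z)                  # lower-tail area
    elif alternative == "two-sided":
        p_value = 2.0 * (1.0 - normal_cdf(abs(z)))  # both tails
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, p_value


# Hypothetical data: both samples have p_hat = 0.405 and test H0: p = 0.4
# against Ha: p > 0.4. Only the (made-up) sample size differs.
for successes, n in [(405, 1_000), (40_500, 100_000)]:
    z, p_value = one_proportion_z_test(successes, n, p0=0.4)
    print(f"n = {n:>7}: p_hat = {successes / n:.3f}, "
          f"z = {z:.2f}, P-value = {p_value:.4f}")
```

With these made-up numbers, the smaller sample gives a P-value of about 0.37, while the larger sample gives a z of about 3.2 and a P-value below 0.001, even though the estimated proportion is 0.405 in both cases. A very large sample can make a practically negligible difference statistically significant.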