AP Statistics
AP Review Inference – Hypotheses Tests

Vocabulary
Significance test, alternative hypotheses, statistically significant, Type II error, sampling distribution, chi-square goodness of fit, observed count, null hypothesis, p-value, Type I error, power, standard error, chi-square distribution, expected count

Notes

A significance test assesses the evidence provided by data against a null hypothesis $H_0$ and in favor of an alternative hypothesis $H_a$.

The hypotheses are usually stated in terms of population parameters. Often, $H_0$ is a statement of no change or no difference. The alternative hypothesis states what we hope or suspect is true.

A one-sided alternative says that a parameter differs from the null hypothesis value in a specific direction. A two-sided alternative says that a parameter differs from the null value in either direction.

The reasoning of a significance test is as follows. Suppose that the null hypothesis is true. If we repeated our data production many times, would we often get data as inconsistent with $H_0$, in the direction specified by $H_a$, as the data we actually have? If the data are unlikely when $H_0$ is true, they provide evidence against $H_0$ and in favor of $H_a$.

The p-value of a test is the probability, computed supposing $H_0$ to be true, that the statistic will take a value at least as extreme as the observed result in the direction specified by $H_a$. Small p-values indicate strong evidence against $H_0$. To calculate a p-value, we must know the sampling distribution of the test statistic when $H_0$ is true.

If the p-value is smaller than a specified value α (called the significance level), the data are statistically significant at level α. In that case, we can reject $H_0$ and say that we have convincing evidence for $H_a$. If the p-value is greater than or equal to α, we fail to reject $H_0$ and say that we do not have convincing evidence for $H_a$.

A Type I error occurs if we reject $H_0$ when it is in fact true.
In other words, the data give convincing evidence for $H_a$ when the null hypothesis is correct.

A Type II error occurs if we fail to reject $H_0$ when $H_a$ is true. In other words, the data don't give convincing evidence for $H_a$ even though the alternative hypothesis is correct.

In a fixed level α significance test, the probability of a Type I error is the significance level α.

Follow a four-step process when asked to carry out a hypothesis test:
Hypotheses: Write a null and alternative hypothesis in terms of a (population) parameter and in context.
Conditions: Usually Random, Normal, and Independent; be sure to show how you checked each.
Calculations: Write either the name or formula of the test, the test statistic, and the p-value (and df when appropriate).
Conclusion: Compare your p-value and α, write whether or not you reject $H_0$, and state your conclusion in the context of the problem.

Confidence intervals provide additional information that significance tests do not: namely, a set of plausible values for the true population parameter.

The power of a significance test against a specific alternative is the probability that the test will reject $H_0$ when the alternative is true. Power measures the ability of the test to detect an alternative value of the parameter. For a specific alternative, $P(\text{Type II error}) = 1 - \text{power}$.

We can increase the power of a significance test by increasing the sample size, increasing the significance level, or increasing the difference that is important to detect between the null and alternative parameter values.

Analyze paired data by first taking the difference within each pair to produce a single sample. Then use one-sample t procedures. Don't use two-sample t procedures to compare means for paired data.

Very small differences can be highly significant (small p-value) when a test is based on a large sample. A statistically significant difference need not be practically important.
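The notes above on Type I error and power can be illustrated numerically. The following is a minimal sketch, assuming a one-sided one-proportion z test and the usual normal approximation to the sampling distribution of the sample proportion. The helper name `power_one_prop_z` and the example values (null value 0.5, alternatives 0.6 and 0.7) are hypothetical and do not come from the worksheet.

```python
from math import sqrt
from statistics import NormalDist


def power_one_prop_z(p0, p_alt, n, alpha=0.05):
    """Approximate power of the one-sided test H0: p = p0 vs Ha: p > p0.

    Uses the normal approximation to the sampling distribution of p-hat.
    (Hypothetical helper for illustration; not part of the worksheet.)
    """
    std_normal = NormalDist()
    # Smallest sample proportion that leads to rejecting H0 at level alpha.
    z_crit = std_normal.inv_cdf(1 - alpha)
    p_hat_crit = p0 + z_crit * sqrt(p0 * (1 - p0) / n)
    # Probability of landing in the rejection region when the true p is p_alt.
    se_alt = sqrt(p_alt * (1 - p_alt) / n)
    return 1 - std_normal.cdf((p_hat_crit - p_alt) / se_alt)


if __name__ == "__main__":
    # When the true p equals p0, the rejection probability is alpha itself.
    print("P(Type I error):", round(power_one_prop_z(0.5, 0.5, 100), 4))
    # Power against p = 0.6 for increasing sample sizes.
    for n in (50, 100, 200):
        power = power_one_prop_z(0.5, 0.6, n)
        print(f"n = {n}: power = {power:.3f}, P(Type II error) = {1 - power:.3f}")
```

Under these assumptions, calling the function with a larger `n`, a larger `alpha`, or an alternative farther from `p0` returns a larger power, which matches the three ways of increasing power listed in the notes.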
The chi-square goodness of fit test tests the null hypothesis that a categorical variable has a specified distribution in the population of interest.

If a chi-square test finds a statistically significant result, consider doing a follow-up analysis that compares the observed and expected counts and that looks for the largest components of the chi-square statistic.

Some studies aim to compare the distribution of a single categorical variable for each of several populations or treatments. In such cases, researchers should take independent random samples from the populations of interest or use the groups in a randomized experiment. The null hypothesis is that there is no difference in the distribution of the categorical variable for each of the populations or treatments. We use the chi-square test for homogeneity to test this hypothesis.

Other studies are designed to investigate the relationship between two categorical variables. In such cases, researchers take a random sample from the population of interest and classify each individual based on the two categorical variables. The chi-square test for independence tests the null hypothesis that there is no association between the two categorical variables in the population of interest.

Least-squares regression fits a straight line of the form $\hat{y} = a + bx$ to data to predict a response variable y from an explanatory variable x. Inference in this setting uses the sample regression line to estimate or test a claim about the population regression line.

Confidence intervals and significance tests for the slope of the population regression line are based on a t distribution with n – 2 degrees of freedom.

Curved relationships between two quantitative variables can sometimes be changed into linear relationships by transforming one or both of the variables. Once we transform the data to achieve linearity, we can fit a least-squares regression line to the transformed data and use this linear model to make predictions.

Problems

1.
Suppose you suspect a “chute” of playing cards is not fair. The chute supposedly contains 10 standard decks shuffled together. You are interested in knowing whether there are more hearts than usual. To test this, you deal 12 cards at random and calculate the proportion of hearts in your hand.

A. Describe the parameter of interest in this setting.
The proportion of hearts in the chute.

B. Write the appropriate null and alternative hypotheses for this situation.
$H_0: p = 0.25$; the proportion of hearts in the chute is 0.25.
$H_a: p > 0.25$; the proportion of hearts in the chute is more than 0.25.

C. Suppose your deal contains 7 hearts. What is the sample proportion of hearts? Is it possible to deal 7 hearts out of 12 cards if the chute really does contain standard decks? Is it likely? Why or why not?
It is possible to be dealt 7 hearts out of 12 cards, which makes the sample proportion $\hat{p} = 7/12 \approx 0.5833$. If in fact one fourth of the cards are hearts, the probability of getting 7 or more hearts in a sample of 12 cards is
$P(\hat{p} \ge 7/12) = P\left(z \ge \frac{7/12 - 0.25}{\sqrt{0.25(1 - 0.25)/12}}\right) = 0.0038$.
Since we would get 7 or more hearts out of a sample of 12 cards less than 1% of the time by chance, it is very unlikely.

D. Describe a Type I error in this setting.
We conclude that more than 25% of the cards in the chute are hearts when in fact 25% are hearts.

E. Describe a Type II error in this setting.
We fail to find convincing evidence that more than 25% of the cards in the chute are hearts when in fact more than 25% are hearts.

2. Humerus bones from the same species of animal tend to have approximately the same length-to-width ratios. When fossils of humerus bones are discovered, archeologists can often determine the species of animal by examining these ratios. It is known that the species Molekius Primatium exhibits a mean ratio of µ = 8.9. Suppose 41 fossils of humerus bones are unearthed at a site on Minnesota’s Iron Range, where this species was known to have lived. Researchers are willing to view these as a random sample of all such humerus bones.
The length-to-width ratios were calculated and are listed below. Test whether the population mean for the species that left these bones differs from 8.9 at α = 0.05.

9.73 9.17 6.66 8.87 9.2 9.93 9.84 10.48 8.71 10.89 12
9.35 6.23 9.33 8.91 9.59 10.39 9.57 9.07 9.38 8.86
9.41 9.98 11.77 9.48 9.39 9.29 9.17 9.94 8.39 6.85
8.38 9.89 8.07 8.8 8.52 8.17 8.37 10.02 8.3 11.67

Hypotheses:
$H_0: \mu = 8.9$; the population mean for the species that left these bones is 8.9.
$H_a: \mu \ne 8.9$; the population mean for the species that left these bones differs from 8.9.
Conditions:
Random: stated.
Independent: it is reasonable to believe there are more than 41(10) = 410 humerus bones.
Normal: Since we have a large sample size (41 > 30), the sampling distribution is approximately normal.
Calculations: One-sample t test
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{9.27 - 8.9}{1.199/\sqrt{41}} = 1.97$, df = 41 – 1 = 40, p = 0.0558
Conclusion: Since p = 0.0558 > 0.05, I fail to reject $H_0$. There is insufficient evidence to conclude the population mean for the species that left these bones differs from 8.9.

3. A recent study tracked the television viewing habits of 100 randomly selected first-grade boys and 200 randomly selected first-grade girls. Each child was asked to identify their favorite TV show. The following table summarizes the results (observed counts, with expected counts in parentheses):

         Zooboomafoo    iCarly         Phineas and Ferb
Boys     20 (30)        30 (36.67)     50 (33.33)
Girls    70 (60)        80 (73.33)     50 (66.67)

Do these data provide convincing evidence that television preferences differ significantly for boys and girls?

Hypotheses:
$H_0$: Television preferences are the same for boys and girls.
$H_a$: Television preferences are different for boys and girls.
Conditions:
Random: stated.
Independent: it is reasonable to believe there are more than 100(10) = 1000 first-grade boys and more than 200(10) = 2000 first-grade girls.
All expected counts > 5 (see chart).
Calculations: $\chi^2$ test
$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(20 - 30)^2}{30} + \frac{(30 - 36.67)^2}{36.67} + \dots = 19.318$, df = (3 – 1)(2 – 1) = 2, p = 0.000064
Conclusion: Since p = 0.000064 < 0.05, I reject $H_0$.
There is sufficient evidence to conclude television preferences are different for boys and girls. A follow-up analysis suggests the biggest difference occurs in the preferences for Phineas and Ferb, with more boys and fewer girls preferring that show than expected.

4. A study of classic authors uncovered a distinguishable speech pattern that differed from author to author. Plato utilized this pattern in 21.4% of the passages in his works. The owner of a rare bookstore claims to have an original Plato work, but you suspect the speech pattern occurs too frequently to be an original Plato work. A random sample of passages from the work in question was taken and it was found that 136 of the 439 selected passages followed the speech pattern. Do these data provide convincing evidence that the work was not written by Plato?

Hypotheses:
$H_0: p = 0.214$; the speech pattern occurs in 21.4% of the passages in this work; the work is written by Plato.
$H_a: p > 0.214$; the speech pattern occurs in more than 21.4% of the passages in this work; the work is not written by Plato.
Conditions:
Random: stated.
Independent: it is reasonable to believe there are more than 439(10) = 4390 passages in this work.
Normal: Since $439\left(\frac{136}{439}\right) = 136 \ge 10$ and $439\left(1 - \frac{136}{439}\right) = 303 \ge 10$, the sample is large enough to conclude the sampling distribution is approximately normal.
Calculations: One-proportion z test
$z = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}} = \frac{136/439 - 0.214}{\sqrt{0.214(1 - 0.214)/439}} = 4.894$, $p = 4.95 \times 10^{-7}$
Conclusion: Since p < any reasonable α, I reject $H_0$. There is sufficient evidence to conclude the speech pattern occurs in more than 21.4% of the passages in this work; the work is not written by Plato.

5. A recent study suggested that 77% of teenagers have texted while driving. A random sample of 27 teenage drivers in Atlanta was taken and 15 admitted to texting while driving. Use a 99% confidence interval to determine whether there is convincing evidence that the population proportion of teens who text while driving is different than 77%.
Conditions:
Random: stated.
Independent: it is reasonable to believe there are more than 27(10) = 270 teenagers who drive.
Normal: Since $27\left(\frac{15}{27}\right) = 15 \ge 10$ and $27\left(1 - \frac{15}{27}\right) = 12 \ge 10$, the sample is large enough to conclude the sampling distribution is approximately normal.
Calculations: One-proportion z interval
$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \frac{15}{27} \pm 2.576 \sqrt{\frac{\frac{15}{27}\left(1 - \frac{15}{27}\right)}{27}} = (0.309, 0.802)$
Conclusion: I am 99% confident the true proportion of teenagers who text while driving is between 0.309 and 0.802. Since 77% is within the interval, there is insufficient evidence to conclude the true proportion of teenagers who text while driving is different than 77%.

6. Researchers are interested in studying the effect of sleep on exam performance. Suppose the population of individuals who get at least 8 hours of sleep prior to an exam score an average of 96 points on the exam with a standard deviation of 18 points. The population of individuals who get less than 8 hours of sleep score an average of 72 points with a standard deviation of 9.4 points. Suppose 40 individuals are randomly sampled from each population.

A. Describe the shape, center, and spread of the sampling distribution of $\bar{x}_1 - \bar{x}_2$.
Center: $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2 = 96 - 72 = 24$
Spread: $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{18^2}{40} + \frac{9.4^2}{40}} = 3.21$
Shape: Since the sample size is large (40 > 30) for each group, the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is approximately normal.

B. Find the probability of observing a difference in sample means of 2 points or more from the mean difference of the two samples. Show your work.
$P(\bar{x}_1 - \bar{x}_2 \le 22 \text{ or } \bar{x}_1 - \bar{x}_2 \ge 26) = P(\bar{x}_1 - \bar{x}_2 \le 22) + P(\bar{x}_1 - \bar{x}_2 \ge 26) = P\left(t \le \frac{22 - 24}{3.21}\right) + P\left(t \ge \frac{26 - 24}{3.21}\right) = 0.268 + 0.268 = 0.537$

7. A study measured how fast subjects could repeatedly push a button when under the effects of caffeine. Subjects were asked to push a button as many times as possible in two minutes after consuming a typical amount of caffeine. During another test session, they were asked to push the button after taking a placebo.
The subjects did not know which treatment they were administered each day and the order of the treatments was randomly assigned. The data, given in presses per two minutes for each treatment, follow. Determine whether or not caffeine results in a higher rate of beats, on average, per two-minute period.

Subject            1    2    3    4    5    6    7    8    9   10   11
Beats caffeine   281  284  300  421  240  294  377  345  303  340  408
Beats placebo    201  262  283  290  259  291  354  346  283  391  411
Differences       80   22   17  131  -19    3   23   -1   20  -51   -3

Hypotheses:
$H_0: \mu_d = 0$; the population mean difference between the number of beats with or without caffeine is 0. There is no difference between the number of beats with or without caffeine.
$H_a: \mu_d > 0$; the population mean difference between the number of beats with or without caffeine is greater than 0. Caffeine results in a higher rate of beats, on average.
Conditions:
Random: “randomly assigned” stated.
Independent: it is reasonable to believe there are more than 11(10) = 110 possible subjects.
Normal: Since we don’t have a large sample size (11 < 30) and a graph of the differences shows outliers and skewness, it is not safe to assume the sampling distribution is approximately normal. This condition is not met.
Calculations: One-sample t test
$t = \frac{\bar{x}_d - 0}{s_d/\sqrt{n}} = \frac{20.18 - 0}{48.75/\sqrt{11}} = 1.373$, df = 11 – 1 = 10, p = 0.0999
Conclusion: Since p = 0.0999 > 0.05, I fail to reject $H_0$. There is insufficient evidence to conclude the population mean difference between the number of beats with or without caffeine is greater than 0. We cannot conclude that caffeine results in a higher rate of beats, on average.

8. School officials are interested in implementing a policy that would allow students to bring their own technology to school for academic use. There are two large high schools in a town, Lakeville North and Lakeville South, each with 1700 students. At Lakeville North, 60% of students own technological devices that could be used for academic purposes.
75% of students at Lakeville South own those types of devices. The district takes an SRS of 125 students from Lakeville North and a separate SRS of 160 students at Lakeville South. The sample proportions of students who own devices that could be used at the school are recorded and the difference $\hat{p}_S - \hat{p}_N$ is determined to be 0.07.

A. Describe the shape, center, and spread of the sampling distribution of $\hat{p}_S - \hat{p}_N$.
Center: $\mu_{\hat{p}_S - \hat{p}_N} = p_S - p_N = 0.75 - 0.60 = 0.15$
Spread: $\sigma_{\hat{p}_S - \hat{p}_N} = \sqrt{\frac{p_S(1 - p_S)}{n_S} + \frac{p_N(1 - p_N)}{n_N}} = \sqrt{\frac{0.75(1 - 0.75)}{160} + \frac{0.6(1 - 0.6)}{125}} = 0.0556$
Shape: Since the sample sizes are large (160(0.75) = 120 > 10, 160(1 – 0.75) = 40 > 10, 125(0.60) = 75 > 10, and 125(1 – 0.60) = 50 > 10), the sampling distribution of $\hat{p}_S - \hat{p}_N$ is approximately normal.

B. Find the probability of getting a difference in sample proportions of 0.07 or less from the two surveys. Show your work.
$P(\hat{p}_S - \hat{p}_N \le 0.07) = P\left(z \le \frac{0.07 - 0.15}{0.0556}\right) = 0.075$

C. Does the result in part B give you reason to doubt the study’s reported value? Explain.
Although the probability that we would get a difference of 0.07 or less by chance is not extremely high, it is not small enough to give me reason to doubt the study’s reported value.

9. A school official suspects the difference in the proportion of students who own technological devices between Lakeville North and Lakeville South high schools may be a result of a difference in the socioeconomic status of the students in the two schools. The results of a random sampling of student registration records indicated 28 out of 120 students at Lakeville North came from low income families while 30 out of 150 students at Lakeville South came from low income families. Do these data provide convincing evidence that the proportion of low income students at Lakeville North is higher than the proportion at Lakeville South? Use a 5% significance level.
Hypotheses:
$H_0: p_N = p_S$; the proportion of students from low income families is the same for both high schools.
$H_a: p_N > p_S$; the proportion of low income students at Lakeville North is higher than the proportion at Lakeville South.
Conditions:
Random: 2 independent random samples; stated.
Independent: there are 1700 students at each high school (see problem 8), which is more than 120(10) = 1200 or 150(10) = 1500 students.
Normal: Since $120\left(\frac{28}{120}\right) = 28 \ge 10$, $120\left(1 - \frac{28}{120}\right) = 92 \ge 10$, $150\left(\frac{30}{150}\right) = 30 \ge 10$, and $150\left(1 - \frac{30}{150}\right) = 120 \ge 10$, the samples are large enough to conclude the sampling distribution is approximately normal.
Calculations: Two-proportion z test
$z = \frac{\hat{p}_N - \hat{p}_S}{\sqrt{\frac{\hat{p}_N(1 - \hat{p}_N)}{n_N} + \frac{\hat{p}_S(1 - \hat{p}_S)}{n_S}}} = \frac{\frac{28}{120} - \frac{30}{150}}{\sqrt{\frac{\frac{28}{120}\left(1 - \frac{28}{120}\right)}{120} + \frac{\frac{30}{150}\left(1 - \frac{30}{150}\right)}{150}}} = 0.66$, p = 0.2538
Conclusion: Since p > α (0.25 > 0.05), I fail to reject $H_0$. There is insufficient evidence to conclude the proportion of low income students at Lakeville North is higher than the proportion at Lakeville South.

10. Do boys have better short term memory than girls? A random sample of 200 boys and 150 girls was administered a short term memory test. The average score for boys was 48.9 with a standard deviation of 12.96. The girls had an average score of 48.1 with a standard deviation of 11.85. Is there significant evidence at the 5% level to suggest boys have better short term memory than girls? Note: higher test scores indicate better short term memory.

Hypotheses:
$H_0: \mu_B = \mu_G$; boys and girls have the same mean short term memory score.
$H_a: \mu_B > \mu_G$; boys have a higher mean short term memory score than girls.
Conditions:
Random: 2 independent random samples; stated.
Independent: there are more than 200(10) = 2000 boys and more than 150(10) = 1500 girls.
Normal: Since we have large sample sizes (200 > 30 and 150 > 30), the sampling distribution is approximately normal.
Calculations: Two-sample t test
$t = \frac{\bar{x}_B - \bar{x}_G}{\sqrt{\frac{s_B^2}{n_B} + \frac{s_G^2}{n_G}}} = \frac{48.9 - 48.1}{\sqrt{\frac{12.96^2}{200} + \frac{11.85^2}{150}}} = 0.6003$
Using the t chart and df = 150 – 1 = 149, p > 0.25.
Using the calculator, df = 334.6 and p = 0.274.
Conclusion: Since p = 0.27 > 0.05, I fail to reject $H_0$. There is insufficient evidence to conclude boys have a higher mean short term memory score than girls.

11. After playing a dice game with a friend, you suspect the die may not be fair. That is, you suspect some numbers may be rolled more often than you would expect. To test your suspicion, you roll the die 300 times and record the results.

Value:        1    2    3    4    5    6
Frequency:   42   55   38   57   64   44

Is there convincing evidence that the die is not fair?

Hypotheses:
$H_0$: the distribution of values is uniform. The die is fair.
$H_a$: some values are rolled more than others. The die is not fair.
Conditions:
Random: each roll of a die produces a random value.
Independent: there are more than 300(10) = 3000 die rolls.
All expected counts are 300/6 = 50 > 5.
Calculations: $\chi^2$ goodness of fit test
$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(42 - 50)^2}{50} + \frac{(55 - 50)^2}{50} + \dots = 10.28$
Using the chart and df = 5, 0.05 < p < 0.10.
Using the calculator and df = 5, p = 0.0677.
Conclusion: Since p = 0.0677 > 0.05, I fail to reject $H_0$. There is insufficient evidence to conclude some values are rolled more than others. We cannot conclude the die is unfair.

12. A study by Consumer Reports rated 77 randomly selected cereals on a 100 point scale (higher numbers are better) and recorded the number of grams of sugar in each serving. The scatterplot, residual plot, and computer output of the regression analysis are given.

A. Use this output to determine the LSRL for the sample data.
$\hat{y} = 59.284 - 2.4088x$, where $\hat{y}$ is the predicted rating and x is the amount of sugar in grams per serving.

B. Interpret the slope in the context of the situation.
If the amount of sugar increases by 1 gram per serving, we predict the rating will decrease by 2.4088 points, on average.

C.
Is there convincing evidence that the slope of the true regression line is less than zero? Assume the conditions for inference are met.

Hypotheses:
$H_0: \beta = 0$; there is no linear relationship between ratings and sugar content.
$H_a: \beta < 0$; the slope of the true regression line is less than zero. There is a negative linear relationship between ratings and sugar content.
Conditions: met; stated.
Calculations: t test for slope
Using the computer output, $t = -10.12$ and $p \approx 0$.
Conclusion: Since p < any reasonable α, I reject $H_0$. There is sufficient evidence to conclude the slope of the true regression line is less than zero. There is a negative linear relationship between ratings and sugar content.
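Some of the test statistics in the problems above can be rechecked with a few lines of code. The following is a minimal sketch using only Python's standard library; it recomputes the z statistic and p-value for Problem 4 and the chi-square statistic for Problem 11. The helper names `one_prop_z` and `chi_square_gof` are illustrative and are not part of the answer key.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z(successes, n, p0):
    """z statistic and upper-tail p-value for H0: p = p0 vs Ha: p > p0."""
    p_hat = successes / n
    # Standard error uses the null value p0, as in the Problem 4 formula.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


def chi_square_gof(observed, expected):
    """Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    # Problem 4: 136 of 439 sampled passages, null proportion 0.214.
    z, p_value = one_prop_z(136, 439, 0.214)
    print(f"Problem 4: z = {z:.3f}, p-value = {p_value:.2e}")

    # Problem 11: 300 rolls of a die, 50 expected per face under H0.
    rolls = [42, 55, 38, 57, 64, 44]
    chi2 = chi_square_gof(rolls, [50] * len(rolls))
    print(f"Problem 11: chi-square = {chi2:.2f}")
```

The chi-square p-value is left out because the standard library has no chi-square distribution; a statistics package or the chart would be needed for that step.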