Chapter Summaries

Chapter 4: Hypothesis Tests

Hypothesis tests are used to investigate claims about population parameters. We use the question of interest to determine the two competing hypotheses: The null hypothesis (H0) is generally that there is no effect or no difference, while the alternative hypothesis (Ha) is the claim for which we seek evidence. We conclude in favor of the alternative hypothesis if the sample supports the alternative hypothesis and provides strong evidence against the null hypothesis.

We measure the strength of evidence a sample shows against the null hypothesis with a p-value. The p-value is the probability of obtaining a sample statistic as extreme as (or more extreme than) the observed sample statistic, when the null hypothesis is true. A small p-value means that the observed sample results would be unlikely to happen, when the null hypothesis is true, just by random chance.

When making formal decisions based on the p-value, we use a pre-specified significance level, 𝛼.

• If p-value < 𝛼, we reject H0 and have statistically significant evidence for Ha.
• If p-value ≥ 𝛼, we do not reject H0, the test is inconclusive, and the results are not statistically significant.

The key idea is: The smaller the p-value, the stronger the evidence against the null hypothesis and in support of the alternative hypothesis. Rather than making a formal reject/do not reject decision, we sometimes interpret the p-value as a measure of strength of evidence.

One way to estimate a p-value is to construct a randomization distribution of sample statistics that we might see by random chance, if the null hypothesis were true. The p-value is the proportion of randomization statistics that are as extreme as the observed sample statistic. If the original sample falls out in the tails, then a result that extreme is unlikely to occur if the null hypothesis is true, providing evidence against the null.
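The randomization procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the textbook's own software: the function name `randomization_p_value`, the choice of 1000 simulations, the fixed seed, and the recall scores are all hypothetical, and the test is two-tailed for a difference in means.

```python
import random
from statistics import mean


def randomization_p_value(group_a, group_b, n_sims=1000, seed=0):
    """Estimate a two-tailed p-value for H0: mu_a = mu_b.

    Under H0 the group labels are interchangeable, so we repeatedly
    reshuffle the pooled values into two groups of the original sizes
    and record how often the simulated difference in means is at least
    as extreme as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_sims):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_sims


# Hypothetical recall scores for illustration only (not a real dataset).
group_a = [15, 18, 12, 17, 20, 14]
group_b = [11, 13, 9, 16, 12, 10]

alpha = 0.05  # pre-specified significance level
observed, p_value = randomization_p_value(group_a, group_b)
decision = "reject H0" if p_value < alpha else "do not reject H0"
print(f"observed difference = {observed:.3f}, p-value = {p_value:.3f}: {decision}")
```

Because the p-value is estimated by simulation, it varies slightly with the seed and the number of simulations; more simulations give a more stable estimate.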
A randomization distribution for the difference in mean memory recall between the sleep and caffeine groups, for the data in SleepCaffeine, is shown. Each dot is a difference in means that might be observed just by random assignment to treatment groups, if there were no difference in mean memory response. We see that 0.042 of the simulated statistics are as extreme as the observed statistic (x̄s − x̄c = 3), counting both tails, so the p-value is 0.042. This p-value is less than 0.05, so the results are statistically significant at 𝛼 = 0.05, giving moderately strong evidence that sleeping is better than drinking caffeine for memory.

[Figure: Randomization dotplot of x̄1 − x̄2 under the null hypothesis μ1 = μ2 (1000 samples, mean = 0.092, st. dev. = 1.48). Two-tail display: 0.021 in the left tail (cutoff −2.833), 0.958 in the middle, 0.021 in the right tail (cutoff 3). Caption: Randomization distribution of differences in means.]
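The arithmetic behind this conclusion can be checked directly. The short sketch below uses only the tail proportions reported for the randomization dotplot (0.021 in each tail) and the significance level 𝛼 = 0.05; the variable names are illustrative.

```python
# Tail proportions reported for the randomization distribution.
left_tail = 0.021
right_tail = 0.021
alpha = 0.05  # significance level used in the text

# Two-tailed p-value: proportion of simulated statistics in either tail.
p_value = left_tail + right_tail
significant = p_value < alpha

print(f"p-value = {p_value:.3f}, significant at alpha = {alpha}: {significant}")
```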