IQL Chapter 9 – Hypothesis Testing
Statistical Reasoning for Everyday Life, Bennett, Briggs, Triola, 3rd Edition

9.1 Fundamentals of Hypothesis Testing

LEARNING GOAL
Understand the goal of hypothesis testing and the basic structure of a hypothesis test, including how to set up the null and alternative hypotheses, how to determine the possible outcomes of a hypothesis test, and how to decide between these possible outcomes.

In statistics, many questions are answered through hypothesis testing.

FORMULATING THE HYPOTHESIS

Null and Alternative Hypotheses
The null hypothesis, or H0, is the starting assumption for a hypothesis test. For the types of hypothesis tests in this chapter, the null hypothesis always claims a specific value for a population parameter and therefore takes the form of an equality:
H0 (null hypothesis): population parameter = claimed value

The alternative hypothesis, or Ha, is a claim that the population parameter has a value that differs from the value claimed in the null hypothesis. It may take one of the following forms:
(left-tailed) Ha: population parameter < claimed value
(right-tailed) Ha: population parameter > claimed value
(two-tailed) Ha: population parameter ≠ claimed value

IQL – CHAPTER 9: HYPOTHESIS TESTING Page 1

POSSIBLE OUTCOMES OF A HYPOTHESIS TEST
A hypothesis test always begins with the assumption that the null hypothesis is true. Tests are then conducted to determine whether the data give any reason to think otherwise.

Two Possible Outcomes of a Hypothesis Test
There are two possible outcomes of a hypothesis test:
1. Reject the null hypothesis, H0, in which case we have evidence in support of the alternative hypothesis.
2. Do not reject the null hypothesis, H0, in which case we do not have enough evidence to support the alternative hypothesis.
DRAWING A CONCLUSION FROM A HYPOTHESIS TEST

STATISTICAL SIGNIFICANCE
The idea of statistical significance was introduced in Section 6.1, where it was defined as a measure of the likelihood that a result is meaningful. A statistically significant result is a result in a study that is unlikely to have occurred by chance. The most commonly quoted levels of statistical significance are the 0.05 level (the probability that the result occurred by chance is 5% or less, that is, 1 in 20 or less) and the 0.01 level (the probability that the result occurred by chance is 1% or less, that is, 1 in 100 or less).

Hypothesis Test Decisions Based on Levels of Statistical Significance
We decide the outcome of a hypothesis test by comparing the actual sample result (mean or proportion) to the result expected if the null hypothesis is true. We must choose a significance level for the decision.
• If the chance of a sample result at least as extreme as the observed result is less than 1 in 100 (or 0.01), then the test is statistically significant at the 0.01 level and offers strong evidence for rejecting the null hypothesis.
• If the chance of a sample result at least as extreme as the observed result is less than 1 in 20 (or 0.05), then the test is statistically significant at the 0.05 level and offers moderate evidence for rejecting the null hypothesis.
• If the chance of a sample result at least as extreme as the observed result is greater than the chosen level of significance (0.05 or 0.01), then we do not reject the null hypothesis.

P-Values
A P-value in a hypothesis test is the probability of selecting a sample at least as extreme as the observed sample, assuming that the null hypothesis is true.
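As a rough illustration, the significance-level decisions above can be written as a small Python function that takes a P-value. This is a minimal sketch, not part of the textbook: the name `describe_evidence` is hypothetical, and it treats a P-value exactly equal to 0.01 or 0.05 as significant, following the "0.05 or less" wording used elsewhere in this chapter.

```python
def describe_evidence(p_value: float) -> str:
    """Classify a P-value against the 0.01 and 0.05 significance levels.

    Hypothetical helper for illustration; boundary values (exactly 0.01 or
    0.05) are treated as significant.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a P-value must lie between 0 and 1")
    if p_value <= 0.01:
        return "significant at the 0.01 level: strong evidence to reject H0"
    if p_value <= 0.05:
        return "significant at the 0.05 level: moderate evidence to reject H0"
    return "not significant at the 0.05 level: do not reject H0"


# Three made-up P-values, one for each bullet above.
for p in (0.004, 0.03, 0.2):
    print(p, "->", describe_evidence(p))
```

Note that the strength of evidence is read off the P-value, but the decision itself still depends on the significance level chosen before the data were collected.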
Hypothesis Test Decisions Based on P-Values
The P-value (probability value) for a hypothesis test of a claim about a population parameter is the probability of selecting a sample at least as extreme as the observed sample, assuming that the null hypothesis is true:
• A small P-value (such as less than or equal to 0.05) indicates that the sample result is unlikely, and therefore provides reason to reject the null hypothesis.
• A large P-value (such as greater than 0.05) indicates that the sample result could easily occur by chance, so we cannot reject the null hypothesis.

PUTTING IT ALL TOGETHER
You now have the basic premise of hypothesis testing, with the exception of the calculations, which are covered in Sections 9.2 and 9.3.

The Hypothesis Test Process
Step 1. Formulate the null and alternative hypotheses, each of which must make a claim about a population parameter, such as a population mean (μ) or a population proportion (p); be sure this is done before drawing a sample or collecting data. Based on the form of the alternative hypothesis, decide whether you will need a left-, right-, or two-tailed hypothesis test.
Step 2. Draw a sample from the population and measure the sample statistics, including the sample size (n) and the relevant sample statistic, such as the sample mean (x̄) or sample proportion (p̂).
Step 3. Determine the likelihood of observing a sample statistic (mean or proportion) at least as extreme as the one you found, under the assumption that the null hypothesis is true. The precise probability of such an observation is the P-value (probability value) for your sample result.
Step 4. Decide whether to reject or not reject the null hypothesis, based on your chosen level of significance (usually 0.05 or 0.01, but other significance levels are sometimes used).
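Steps 3 and 4 can be sketched in Python once the sample statistic has been converted to a standard score z (the formulas for z appear in Sections 9.2 and 9.3). This sketch assumes a normal sampling distribution and uses the standard library's `statistics.NormalDist` in place of the Appendix A table; the names `p_value_from_z` and `decide` are hypothetical. The `tail` argument corresponds to the three forms of Ha listed in Section 9.1.

```python
from statistics import NormalDist


def p_value_from_z(z: float, tail: str) -> float:
    """Step 3: probability of a result at least as extreme as z, assuming H0 is true."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":         # Ha: parameter < claimed value
        return std_normal.cdf(z)
    if tail == "right":        # Ha: parameter > claimed value
        return 1 - std_normal.cdf(z)
    if tail == "two":          # Ha: parameter ≠ claimed value; double the one-tail area
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Step 4: compare the P-value with the chosen significance level."""
    return "reject H0" if p_value <= alpha else "do not reject H0"


# Hypothetical right-tailed test whose sample gave a standard score of 1.8.
p = p_value_from_z(1.8, "right")
print(round(p, 4), decide(p))   # 0.0359 reject H0
print(decide(p, alpha=0.01))    # do not reject H0
```

The same sample result can lead to different decisions at different significance levels, which is why Step 4 requires choosing the level in advance.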
9.2 Hypothesis Tests for Population Means

LEARNING GOAL
Understand and interpret one- and two-tailed hypothesis tests for claims made about population means, and learn to recognize and avoid common errors (type I and type II errors) in hypothesis tests.

ONE-TAILED HYPOTHESIS TESTS

Computing the Standard Score for the Sample Mean in a Hypothesis Test
When we draw a random sample for a hypothesis test, we can consider it to be one of many possible samples in the sampling distribution. Given the sample size (n), the sample mean (x̄), the population standard deviation (σ), and the claimed population mean (μ), we make the following computations:
standard deviation for the distribution of sample means = $\sigma / \sqrt{n}$
standard score for the sample mean: $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
Note: In reality, it is rare that we know the population standard deviation σ; see Section 10.1 for how to deal with such cases.

Critical Values of Statistical Significance
A hypothesis test is significant at the 0.05 level if the probability of finding a result as extreme as the one actually observed is 0.05 or less (assuming the null hypothesis is true).

Decisions Based on Statistical Significance for One-Tailed Hypothesis Tests
We decide whether to reject or not reject the null hypothesis by comparing the standard score (z) for a sample mean to critical values for significance at a given level. Table 9.1 summarizes the decisions for one-tailed hypothesis tests at the 0.05 and 0.01 levels of significance.

Critical Values: The critical value for a hypothesis test is a threshold to which the value of the test statistic in a sample is compared to determine whether or not the null hypothesis is rejected. The critical value for any hypothesis test depends on the significance level at which the test is carried out and on whether the test is one-sided or two-sided.
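A sketch of the one-tailed computations in Python, under the same assumption as the text that σ is known. The critical values here come from `statistics.NormalDist().inv_cdf` rather than from Table 9.1, so they carry more decimal places (for example, about 2.326 where a printed table may round to 2.33). The function names and the sample numbers (σ = 15, n = 36, x̄ = 95, claimed μ = 100) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def standard_score_mean(sample_mean: float, claimed_mean: float,
                        sigma: float, n: int) -> float:
    """z = (x̄ - μ) / (σ / √n), assuming the population σ is known."""
    return (sample_mean - claimed_mean) / (sigma / sqrt(n))


def one_tailed_critical_value(alpha: float, tail: str) -> float:
    """Critical z-value for a left- or right-tailed test at significance level alpha."""
    z_crit = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    if tail == "right":
        return z_crit
    if tail == "left":
        return -z_crit
    raise ValueError("tail must be 'left' or 'right'")


def reject_one_tailed(z: float, alpha: float, tail: str) -> bool:
    """Reject H0 when z is at or beyond the critical value in the direction of Ha."""
    crit = one_tailed_critical_value(alpha, tail)
    return z >= crit if tail == "right" else z <= crit


# Hypothetical example: Ha: μ < 100, σ = 15, n = 36, observed x̄ = 95.
z = standard_score_mean(95, 100, 15, 36)   # (95 - 100) / (15 / 6)
print(z)                                   # -2.0
print(reject_one_tailed(z, 0.05, "left"))  # True: -2.0 is below about -1.645
print(reject_one_tailed(z, 0.01, "left"))  # False: -2.0 is above about -2.326
```

In this made-up case the result is significant at the 0.05 level but not at the 0.01 level, so the evidence for rejecting H0 is moderate rather than strong.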
Source for the definition of critical values: http://www.stats.gla.ac.uk/steps/glossary/hypothesis_testing.html#critval

TWO-TAILED TESTS
The same basic ideas apply to two-tailed hypothesis tests, in which the alternative hypothesis has the "not equal to" form Ha: μ ≠ claimed value. For two-tailed tests, a value "as extreme as the one actually found" can lie either on the left or on the right of the sampling distribution (Figure 9.5). A probability of 0.05, or 5%, therefore corresponds to standard scores either in the lowest 2.5% of the sampling distribution on the left or in the highest 2.5% on the right. From Appendix A, the 2.5th percentile corresponds to a standard score of -1.96 and the 97.5th percentile corresponds to a standard score of 1.96. These standard scores become the critical values for two-tailed tests to be significant at the 0.05 level.

Two-Tailed Test (Ha: μ ≠ claimed value)
Statistical significance: A two-tailed test is significant at the 0.05 level if the standard score of the sample mean is at or below a critical value of -1.96 or at or above a critical value of 1.96. For significance at the 0.01 level, the critical values are -2.575 and 2.575.
P-values: To find the P-value for a sample mean in a two-tailed test, first use the standard score of the sample mean to find the one-tailed P-value (the area beyond the standard score on its own side of the distribution); then double this value to find the P-value for the two-tailed test.

COMMON ERRORS IN HYPOTHESIS TESTING
Type I and Type II Errors
• An error in which H0 is wrongly rejected is called a type I error.
• An error in which we wrongly fail to reject H0 is called a type II error.

9.3 Hypothesis Tests for Population Proportions

LEARNING GOAL
Understand and interpret hypothesis tests for claims made about population proportions.

CALCULATIONS FOR HYPOTHESIS TESTS WITH PROPORTIONS
How do we determine whether there is enough evidence in the sample to reject the null hypothesis?
We will use the four steps of the hypothesis test process (page 376).

Step 1 is to formulate the hypotheses:
H0: p = 0.5 (50% of voters favor the candidate)
Ha: p > 0.5 (more than 50% of voters favor the candidate)

Step 2 is to collect the sample data.

Step 3 is to determine the likelihood that the sample result could have arisen by chance if the null hypothesis is true. Under the starting assumption that the null hypothesis is true (that the proportion of people in the population who favor the candidate is 0.5), the peak of the distribution of sample proportions will be the population proportion claimed by the null hypothesis, p = 0.5. For a sample of n = 400, the standard deviation of the distribution of sample proportions is
$\sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{0.5(1-0.5)}{400}} = 0.025$

Standard Score for the Sample Proportion in a Hypothesis Test
Given the sample size (n), the sample proportion (p̂), and the claimed population proportion (p), the standard score for the sample proportion is
$z = \dfrac{\hat{p} - p}{\sqrt{p(1-p)/n}}$

SIGNIFICANCE LEVELS AND P-VALUES
Statistical significance is a statistical assessment of whether observations reflect a pattern rather than just chance, the fundamental challenge being that any partial picture is subject to observational error. In statistical testing, a result is deemed statistically significant if it is unlikely to have occurred by chance, and hence provides enough evidence to reject the hypothesis of "no effect". As used in statistics, significant does not mean important or meaningful, as it does in everyday speech. The amount of evidence required to accept that an event is unlikely to have arisen by chance is known as the significance level or critical p-value: in traditional Fisherian statistical hypothesis testing, the p-value is the probability of observing data at least as extreme as that observed, given that the null hypothesis is true.
If the obtained p-value is small, then it can be said that either the null hypothesis is false or an unusual event has occurred. p-values do not have any repeat sampling interpretation. Research analysts who focus solely on significant results may miss important response patterns which individually fall under the threshold set for tests of significance. Many researchers urge that tests of significance should always be accompanied by effect-size statistics, which approximate the size and thus the practical importance of the difference.

An alternative (but nevertheless related) statistical hypothesis testing framework is the Neyman–Pearson frequentist school, which requires both a null and an alternative hypothesis to be defined and investigates the repeat sampling properties of the procedure, i.e. the probability that a decision to reject the null hypothesis will be made when it is in fact true and should not have been rejected (this is called a "false positive" or Type I error) and the probability that a decision will be made to accept the null hypothesis when it is in fact false (Type II error). Fisherian p-values are philosophically different from Neyman–Pearson Type I errors. Confusion between the two is unfortunately propagated by many statistics textbooks.[1]
http://en.wikipedia.org/wiki/Statistical_significance

SUMMARY OF HYPOTHESIS TESTS WITH PROPORTIONS
• Because we are dealing with population proportions, the null hypothesis has the form p = claimed value. To decide whether to reject or not reject the null hypothesis, we must determine whether a sample as extreme as the one found in the hypothesis test is likely or unlikely to occur if the null hypothesis is true.
• We determine this likelihood from the standard score (z) of the sample proportion, which we compute from the formula
$z = \dfrac{\hat{p} - p}{\sqrt{p(1-p)/n}}$
where n is the sample size, p̂ is the sample proportion, and p is the population proportion claimed by the null hypothesis.
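The summary can be turned into a minimal Python sketch of a one-sample test for a proportion, assuming the normal approximation to the distribution of sample proportions that this section relies on. The function name `proportion_z_test` is hypothetical, and both polls in the usage example are made-up numbers for illustration; only n = 400 and the claimed p = 0.5 echo the voter example in this section. The second call also illustrates the earlier point that statistically significant does not mean important: with a very large sample, a difference of half a percentage point produces a very small P-value.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(sample_prop: float, claimed_prop: float, n: int,
                      tail: str = "right"):
    """Return (z, P-value) for H0: p = claimed_prop, using the normal approximation."""
    # Standard deviation of the distribution of sample proportions if H0 is true.
    std_dev = sqrt(claimed_prop * (1 - claimed_prop) / n)
    z = (sample_prop - claimed_prop) / std_dev
    std_normal = NormalDist()
    if tail == "left":
        p_value = std_normal.cdf(z)
    elif tail == "right":
        p_value = 1 - std_normal.cdf(z)
    elif tail == "two":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return z, p_value


# Hypothetical poll: n = 400, sample proportion 0.55, H0: p = 0.5, Ha: p > 0.5.
z, p = proportion_z_test(0.55, 0.5, 400)
print(round(z, 2), round(p, 4))    # 2.0 0.0228

# Hypothetical huge poll: a 0.5 percentage-point difference with n = 100,000.
z, p = proportion_z_test(0.505, 0.5, 100_000)
print(round(z, 2), round(p, 4))    # 3.16 0.0008
```

The first result is significant at the 0.05 level but not at the 0.01 level. The second is significant at the 0.01 level even though the difference from 50% is tiny, which is why an effect size should be reported alongside the P-value.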