Hypothesis Testing Summary

Hypothesis testing begins with drawing a sample and calculating its characteristics (aka "statistics"). A statistical test (a specific form of a hypothesis test) is an inferential process, based on probability, that is used to draw conclusions about population parameters.

One way to clarify the process of hypothesis testing is to imagine that you have, first of all, a population to which no treatment has been applied (aka the "comparison group"). You know the parameters of this population (for example, the mean and standard deviation). Another population exists that is the same as the first, except that some treatment has been applied (aka the treatment or experimental group). You do not know the parameters of this population. Samples are drawn from this latter population, and the statistics derived from the sample serve as estimates of the unknown population parameters. This is the situation in which hypothesis testing applies, and it provides an introduction for understanding more complicated versions of hypothesis testing that you will encounter later.

The logic of hypothesis testing can be stated in three steps:

1. A hypothesis concerning a population is stated.
2. A sample is selected from the population.
3. The sample data are used to determine whether the hypothesis can reasonably be supported or not.

Ultimately, the conclusion drawn is about the population, not just the sample. None of this is necessary if the entire population is small and accessible, but this is almost never the case.

Step 1: H0

Step 1 breaks down into a series of formal stages. The first stage is to state the null hypothesis, which is usually the hypothesis of no difference. The null hypothesis states that the treatment has no effect, or, stated differently, that there is no difference between treated and untreated populations (e.g., µ1 − µ2 = 0). That is, the independent variable or treatment will have no effect on the dependent variable.
The null hypothesis is represented by the symbol H0. Examples of several forms that the null hypothesis can take are contained in the following table:

     H0                 Explanation
A.   µ = 100            Population mean is 100.
B.   µ = 0              Population mean is 0.
C.   σ = 15             Standard deviation in the population is 15.
D.   µ1 − µ2 = 0        Means of populations 1 and 2 are equal; no difference in the parameters µ1 and µ2.
E.   σ1² − σ2² = 0      Variance in population 1 is equal to variance in population 2.
F.   ρXY = 0            Correlation coefficient between X and Y in the population is 0.
G.   ρ1 − ρ2 = 0        The difference between ρXY in population 1 and ρXY in population 2 is 0.
H.   µ1 = µ2 = µ3       The means in populations 1, 2, and 3 are equal.
I.   π = .5             The proportion in the population is .5.

C:\rsm\y520\sec5982_fall02\week_12\hypothesis_test_summary011109.fm

Step 1: HA

The second stage is to state the alternative hypothesis. It proposes the opposite of the null hypothesis in that it says there will be an effect of treatment, there will be differences between populations, or that the independent variable or treatment does indeed affect the dependent variable. The symbol for the alternative hypothesis is either HA or H1. Most often the HA is non-directional: it just says there will be a difference without saying in which direction. Sometimes a directional hypothesis is used. This will be discussed a little later.

Step 2: Sampling

Step 2 requires that a suitable sample is selected from the population. In order to adequately represent the population, the sample must be random. See sections 9.4 through 9.9 of Hopkins, Hopkins, & Glass (1996) if you have problems with this concept.

Step 3: Statistical Test

In Step 3 the data from the sample are compared with the statement of the null hypothesis. For example, the sample mean (representing the mean of the unknown population) is compared with the known population mean. The decision is made whether or not to reject the null hypothesis.
See the discussion in section 10.3 in HH&G for the reasoning behind dealing only with rejection of the null hypothesis. If we reject the null hypothesis, we accept the alternative hypothesis and conclude that there is a treatment effect or a difference between populations, that is, that the independent variable or treatment has affected the dependent variable.

These steps are restated in the following table:

Step   Action
1.     State the statistical hypothesis H0 to be tested (e.g., H0: µ = 100).
2.     Set the level of statistical significance (alpha level). That is, specify the degree of risk of a Type I error: the risk of incorrectly concluding that H0 is false when indeed it is true. This risk, stated as a probability, is denoted by α (alpha) (e.g., α = .05).
3.     Assuming H0 to be correct, determine the probability (p) of obtaining a sample mean (X̄) that differs from the population mean (µ) by an amount as large as or larger than that which was observed (e.g., if µ = 100 and X̄ = 108, calculate the probability of observing a difference between X̄ and µ of 8 or more points).
4.     Make a decision regarding H0, that is, whether or not to reject it (e.g., if the probability (p) from Step 3 is less than alpha (Step 2), we reject the null hypothesis and conclude that µ does not equal 100).

Whenever we make a decision about rejecting or failing to reject the null hypothesis, two types of error may occur:

• We reject the null hypothesis when we should not, because in reality the null hypothesis is true. This is known as a Type I, or alpha, error.
• We fail to reject the null hypothesis when we should reject it, because in reality the null hypothesis is false. This is known as a Type II, or beta, error.
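As a rough illustration of the four steps restated above, the following Python sketch works through the example of µ = 100 and a sample mean of 108. The population standard deviation (σ = 15), the sample size (n = 25), and the function name are assumptions made only for this illustration; none of them comes from the handout. The sketch uses a two-tailed z-test, with the standard normal distribution taken from Python's standard library.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Return (z, p, reject) for a two-tailed one-sample z-test.

    mu0 is the population mean stated in H0 (Step 1) and alpha is the
    significance level (Step 2). sigma and n are illustrative inputs.
    """
    # Standard error of the sampling distribution of the mean.
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Step 3: probability, assuming H0 is true, of a difference at least
    # this large in either direction.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 4: reject H0 only when p is smaller than alpha.
    return z, p, p < alpha


# Assumed values: sigma = 15 and n = 25 are not given in the text.
z, p, reject = two_tailed_z_test(sample_mean=108, mu0=100, sigma=15, n=25)
print(f"z = {z:.3f}, p = {p:.4f}, reject H0: {reject}")
```

With these assumed values the probability from Step 3 falls below .05, so the sketch rejects H0; a different σ or n could lead to a different decision.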
The four outcomes of decision making are illustrated in the following box:

                      Actual State of Nature: The Null Hypothesis is, in reality:
Our decision          True                        False
Fail to reject H0     ☺ Correct decision          Error, Type II or beta
Reject H0             Error, Type I or alpha      ☺ Correct decision

Most experimenters hope to reject the null hypothesis and therefore to claim that their experimental treatment has had an effect. However, as false claims of treatment effects (Type I errors) are scientifically serious, it is necessary to set stringent criteria. We can never be absolutely certain that we have correctly rejected, or failed to reject, the null hypothesis, but we can determine the probability associated with making an error in this process.

You may recall from a previous lecture and textbook readings that the probability of obtaining a particular sample mean from a population can be determined using z-scores. Sample means very close to the population mean are highly likely. Sample means distant from the population mean (in the tails of the distribution) are very unlikely, but they do occur. If the null hypothesis is true and our treatment has no effect, we would expect that the sample we draw will have a mean close to that of the population. Sample means in the tails are not very likely if the null hypothesis is true. Such means indicate that we should reject the null hypothesis. (See Figure 10.2, p. 175, HH&G.)

A boundary or decision line has to be drawn, therefore, between those sample means that are expected, given the null hypothesis, and those that are so unlikely that they lead to rejection of the null hypothesis. The boundary that separates these sample means is called the level of significance or alpha level. It is a probability value beyond which obtained sample means are very unlikely to occur if the null hypothesis is true. The value .05 is commonly used as the alpha (α) level.
It represents the proportion of the area in the tails of the distribution where sample means are sufficiently unlikely, if the null hypothesis is true. The alpha level also tells us the probability of producing a Type I error. An example makes this whole process clearer; a good one is provided on pp. 174-175 of HH&G.

Although .05 is the most commonly accepted alpha level in psychological and educational research, more stringent levels, such as .01 and .001, may be used when the consequence of making a Type I error is serious.

A general statement of the z-score statistic is provided on p. 175 of HH&G:

$$z = \frac{\text{sample mean} - \text{hypothesized population mean}}{\text{standard error of the sampling distribution}}$$

A more general form of this, which you will find applicable to a large range of statistics that you will learn about in the future, is:

$$\text{test statistic} = \frac{\text{sample statistic} - \text{hypothesized population parameter}}{\text{standard error of the distribution of the test statistic}}$$

This may be restated in less statistical terms as:

$$\text{test statistic} = \frac{\text{obtained difference}}{\text{difference expected by chance}}$$

Do remember that, when testing the null hypothesis, you can reject it when the difference between your sample data and that which would be expected according to the null hypothesis is large enough. However, if a small difference is obtained, you should not accept the null hypothesis. Instead, according to the logic involved in this process, you are only entitled to say that you fail to reject the null hypothesis.
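The statement above that the alpha level gives the probability of a Type I error can be checked informally by simulation. The sketch below repeatedly draws samples from a population in which the null hypothesis really is true, computes the z statistic from the formula above, and counts how often a two-tailed test at α = .05 rejects. The population values (µ = 100, σ = 15), the sample size, the number of trials, the seed, and the function name are all assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type_i_error_rate(mu0=100.0, sigma=15.0, n=25, alpha=0.05,
                      trials=20000, seed=0):
    """Estimate how often a true H0 is rejected by a two-tailed z-test."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Critical z-score that leaves alpha/2 of the area in each tail.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        # H0 is true here: every sample comes from a population with mean mu0.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / standard_error
        if abs(z) > z_crit:
            rejections += 1  # a Type I error
    return rejections / trials


rate = type_i_error_rate()
print(f"Proportion of true null hypotheses rejected: {rate:.3f}")
```

Under these assumptions the printed proportion should land close to the chosen alpha level, although it will vary slightly with the seed and the number of trials.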
When we reject the null hypothesis, we are saying that the difference we obtained (between the sample statistic and the hypothesized population parameter) is sufficiently unlikely to occur by chance alone. We are entitled to say that our treatment has had an effect. But there is always a small chance that we are wrong. In this case we have made a Type I error. If the null hypothesis is actually true, the probability of making this error is equal to the significance level.

When findings are stated in a research report, the null and alternative hypotheses are normally not mentioned. Instead, the term statistical significance is used. If the null hypothesis has been rejected, the findings are said to be "statistically significant." If the null hypothesis was not rejected, the findings are not statistically significant. It is necessary to state whether or not you obtained statistical significance, to include the value of your sample statistic, and to say whether the probability of obtaining that statistic is greater or smaller than the alpha level you have chosen. These values are often included in brackets for linguistic simplicity. There is an endless variety of ways in which you can say the same thing, but you must take care with the wording of statements about significance and non-significance.

One-Tailed Hypothesis Tests

The null hypothesis always says that there is no treatment effect. The alternative hypothesis says that there is a treatment effect (or, in other words, a difference between the sample data and that expected according to the null hypothesis). Such a statement does not predict the direction of the difference created by the treatment. It is said to be a two-tailed hypothesis because highly unlikely events in either tail of the distribution will lead to rejection of the null hypothesis. Although they are used less often, alternative hypotheses may be one-tailed or directional.
In this case, the researcher is predicting either an increase or a decrease as a result of treatment, but not both. The null hypothesis is then rejected only if the sample data fall in the predicted tail of the distribution. The critical region still represents the same area of the curve (e.g., .05), but the whole area is located in only one tail (see section 10.10, p. 181, HH&G). Two-tailed tests are said to be more conservative because the difference between the sample data and that expected according to the null hypothesis (that is, the treatment effect) must be larger to achieve the same level of statistical significance, and thus reject the null hypothesis, than in a one-tailed test. Even when a particular direction of treatment effect can be predicted, a two-tailed test is still often used.

Statistical Power

The goal of hypothesis testing is usually to correctly reject the null hypothesis, or in other words, to show that the treatment applied has had an effect. The probability of correctly rejecting the null hypothesis is called the power of a statistical test. Power is calculated as 1 − β, where β is the probability of making a Type II error (failing to reject the null hypothesis when it is false). Statistical power is large when the treatment effect is large. Put another way, you are more likely to correctly reject the null hypothesis when the treatment has created a large difference between your sample data and the original population.

Other factors that influence power, and that are more directly controllable than the size of the treatment effect, are:

• The alpha level chosen. Smaller α levels produce smaller values for power.
• Whether a one-tailed or two-tailed test is used. Statistical power is greater for one-tailed tests.
• Sample size. Larger samples provide greater power.
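The three factors listed above can be explored with a small power calculation. The following Python sketch computes the power of a one-sample z-test when the true population mean is known; the numbers (µ = 100 under H0, a true mean of 105, σ = 15, n = 25) and the function name are hypothetical values chosen for illustration, and the one-tailed branch assumes the researcher predicted the direction in which the true effect actually lies.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(mu0, mu_true, sigma, n, alpha=0.05, two_tailed=True):
    """Probability of rejecting H0: mu = mu0 when the true mean is mu_true."""
    std_normal = NormalDist()
    # True effect expressed in standard-error units.
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    if two_tailed:
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        # Reject when the z statistic falls in either tail.
        upper = 1 - std_normal.cdf(z_crit - shift)
        lower = std_normal.cdf(-z_crit - shift)
        return upper + lower
    # One-tailed: the whole critical region sits in the predicted tail,
    # assumed here to be the tail in which the true effect lies.
    z_crit = std_normal.inv_cdf(1 - alpha)
    return 1 - std_normal.cdf(z_crit - abs(shift))


baseline = z_test_power(100, 105, 15, 25)
smaller_alpha = z_test_power(100, 105, 15, 25, alpha=0.01)
one_tailed = z_test_power(100, 105, 15, 25, two_tailed=False)
larger_sample = z_test_power(100, 105, 15, 100)

print(f"baseline (two-tailed, alpha = .05, n = 25): {baseline:.3f}")
print(f"alpha = .01:                                {smaller_alpha:.3f}")
print(f"one-tailed:                                 {one_tailed:.3f}")
print(f"n = 100:                                    {larger_sample:.3f}")
```

Under these assumptions, the stricter alpha level lowers power relative to the baseline, while the one-tailed test and the larger sample both raise it, in line with the three points above.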