Hypothesis Testing (Statistical Significance Testing)

Two Points to Emphasize:

1. Hypothesis testing ALWAYS involves a null hypothesis ($H_0$), whether one is explicitly stated or not.
2. The significance level (i.e., the $\alpha$-level) is chosen BEFORE the sample statistic is calculated.

We have data from one of the General Social Surveys (GSS), a random sample of 2,013 individuals. With a random sample this large, we know that the Central Limit Theorem can be applied.

We are social psychologists interested in people’s image of themselves (i.e., self-concept). Each participant in the survey was given a card with the following scale on it:

1—2—3—4—5—6—7

Respondents were asked to rate their own personal appearance. On this scale, 1 meant “way below average,” 7 meant “way above average,” and 4 meant “average.” Based on theories of self-concept, we hypothesize that in general people consider themselves “above average” in personal appearance.

In this example, the expectation that “People in general rate their personal appearance as being above average” is the alternate hypothesis ($H_1$). Note that this is a statement about the universe (“People in general…”), not the sample. Thus, the symbolic representation of the alternate hypothesis is:

$H_1: \mu_Y > 4.00$

Why “greater than 4.00”? “Greater than” because our alternate hypothesis states that the rating is above average; “4.00” because, on this scale, 4.00 is average.

Our null hypothesis ($H_0$), whether we state it or not, is “People in general rate their personal appearance as average.” This is the NEGATION of the alternate hypothesis. (To state that “People in general rate their appearance as below average” is to specify another alternate hypothesis, not a null hypothesis.) Symbolically, the null hypothesis is:

$H_0: \mu_Y = 4.00$

1—2—3—4—5—6—7
Below Average        Average        Above Average

No special statistical test is needed to test this null hypothesis since, with 2,013 cases in a random sample, we can assume that the Central Limit Theorem applies.
Thus we can use our knowledge of the normal curve in testing this hypothesis.

Let’s set the significance level, or $\alpha$-level, for our hypothesis test at $\alpha = 0.05$. Since we can assume that the Central Limit Theorem applies in this case, we know that the sampling distribution of all possible (sample) mean self-appearance ratings is normally distributed. In other words, we can use Appendix 1, pp. 540-542.

With Appendix 1, we can identify the critical value. We are making a test with an $\alpha$-level of 0.05, meaning that we want only a 5 percent chance of wrongly rejecting the null hypothesis ($H_0$). $\alpha = 0.05$ means 5 percent of the total area under the normal curve. Since our alternate hypothesis ($H_1$) is a directional one, pointing to scale values ABOVE the average score of 4.00, we are only dealing with the RIGHT HALF of the sampling distribution.

We are looking for sample mean self-appearance ratings that are so extreme that they cannot be explained by chance alone, the (un)luck of the draw. That is, we are looking for those theoretically possible sample means that could occur by chance 5 percent of the time or less. This means that we identify a region of rejection in the right tail of the sampling distribution that contains 5 percent of the area under the normal curve. (A memory aid: alpha stands for “area”; $\alpha = 0.05$ means 5 percent of the area lies in the tail.)

Searching Appendix 1, we find in Column C tail areas of 0.0505 in Row 1.64 and 0.0495 in Row 1.65. Interpolating (splitting the difference in this case), we determine the critical value of z to be +1.645.

Next, we need to calculate the mean self-assessed appearance score in our sample data, locate it on the sampling distribution of mean self-appearance scores, convert it to a z-value, and compare this z-value to the critical value of +1.645 that we have just found.
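The table lookup above can be cross-checked numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` stands in for Appendix 1; the variable names are illustrative and not part of the original handout.

```python
from statistics import NormalDist

# Standard normal curve (mean 0, standard deviation 1), standing in for Appendix 1.
standard_normal = NormalDist()

alpha = 0.05  # significance level chosen before looking at the sample statistic

# One-tailed (right-tail) test: find the z-value that leaves alpha in the right tail.
z_critical = standard_normal.inv_cdf(1 - alpha)

# Tail areas for the two table rows that bracket the critical value.
tail_area_at_1_64 = 1 - standard_normal.cdf(1.64)
tail_area_at_1_65 = 1 - standard_normal.cdf(1.65)

print(f"critical z: {z_critical:.3f}")
print(f"tail area beyond 1.64: {tail_area_at_1_64:.4f}")
print(f"tail area beyond 1.65: {tail_area_at_1_65:.4f}")
```

The computed critical value should agree with the interpolated table value of +1.645 to three decimal places.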
Summing the 2,013 self-appearance scores from our sample respondents and dividing that sum by 2,013 produced a sample mean of 4.90. This is 0.90 scale units above the average self-appearance rating of 4.00, but is this sufficiently greater than the average rating to conclude that a general trend exists (in the universe)? If not, we must settle on chance (the luck of the draw) as the explanation for getting 4.90 in a random sample when the actual (unknown) rating in the population is probably close to 4.00.

Under the null hypothesis, the mean of the sampling distribution ($\mu_{\bar{Y}}$) is PRESUMED (initially) to be 4.00, because this would be the population mean if people in general rate their personal appearance as “average.” Our sample mean, 4.90, is 0.90 steps to the right of the presumed mean of the sampling distribution. If we travel along the curve toward the right to location +0.90, where are we in z-values on the x-axis below?

Our conversion factor is the estimated value of the standard error (here, technically called the standard error of the mean):

$$\hat{\sigma}_{\bar{Y}} = \frac{s_Y}{\sqrt{N}}$$

In the present example, N = 2,013. We need to calculate the standard deviation for the sample. Its value turns out to be 1.153. Therefore, we estimate the standard error to be:

$$\hat{\sigma}_{\bar{Y}} = \frac{1.153}{\sqrt{2{,}013}} = \frac{1.153}{44.866} = 0.026$$

To decide whether to reject or not reject the null hypothesis, we need to determine whether a difference of +0.90 (4.90 – 4.00) sends us to a z-location at or beyond $z_{CV} = +1.645$. To find out, we divide this difference by the value of the standard error. The equation is:

$$z = \frac{\bar{Y} - \mu_{\bar{Y}}}{\hat{\sigma}_{\bar{Y}}}$$

Here, 0.90 divided by 0.026 produces z = +34.62. This z-location is WAY BEYOND the critical value of z (i.e., +1.645). In other words, we are WELL INSIDE the region of rejection. This means that, if the null hypothesis were true, a sample mean personal appearance rating of 4.90 would occur by chance LESS THAN 5 percent of the time.
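As a check on the arithmetic, the calculation above can be sketched in a few lines of Python. This is only an illustration using the summary figures quoted in the text; the variable names are assumptions made for the example.

```python
import math

# Summary figures quoted in the text.
n = 2013             # sample size
sample_mean = 4.90   # mean self-appearance rating in the sample
null_mean = 4.00     # mean presumed under the null hypothesis
sample_sd = 1.153    # sample standard deviation
z_critical = 1.645   # one-tailed critical value for alpha = 0.05

# Estimated standard error of the mean: s / sqrt(N).
standard_error = sample_sd / math.sqrt(n)

# z computed the way the text does it, with the standard error rounded to 0.026.
z_rounded_se = (sample_mean - null_mean) / round(standard_error, 3)

# z computed without rounding the standard error first.
z_unrounded = (sample_mean - null_mean) / standard_error

# Right-tail test: reject when z lands at or beyond the critical value.
reject_null = z_unrounded >= z_critical

print(f"standard error: {standard_error:.3f}")
print(f"z using rounded standard error: {z_rounded_se:.2f}")
print(f"z using unrounded standard error: {z_unrounded:.2f}")
print(f"reject H0: {reject_null}")
```

Carrying the unrounded standard error through gives a z of about 35.02 rather than 34.62. The difference comes only from rounding the standard error to 0.026, and the decision is the same either way.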
Thus, we REJECT the null hypothesis and conclude that we CAN infer from our sample data a general tendency for people to rate their personal appearance as better than average. Because we chose an $\alpha$-level of 0.05, we accepted a 5 percent risk of WRONGLY REJECTING a true null hypothesis (i.e., of ruling out chance as the explanation for our sample statistic in favor of a “true” general tendency).

There are two types of errors that can arise in hypothesis (significance) testing. We have dealt only with Type I ($\alpha$) errors. There is also something called a Type II error.

Furthermore, there are two different types of tests of null hypotheses. We have made what is called a one-tailed test, meaning that we located the region of rejection entirely within one tail of the sampling distribution (the right). This was dictated by our alternate hypothesis, which directed us to the portion of the sampling distribution where all theoretically possible sample means were greater than the (presumed universe) mean (hence the “>” sign in the symbolic expression of our alternate hypothesis). Another way of saying this is that directional alternate hypotheses ALWAYS dictate making one-tailed significance tests of null hypotheses.

What is a directional alternate hypothesis? It is a hypothesis about the parameter in some universe (population) expressed in the language of “greater than” or “less than” (symbolically, with “>” and “<” signs). To state a hypothesis this way, we need to know a lot about the subject we are studying. If we don’t, then we are better off formulating a non-directional (alternate) hypothesis. In our self-assessed appearance example, this would be: “In general, people DO NOT rate themselves as average in appearance.” Notice that this does not state precisely how people rate themselves, only that they resist defining themselves as average.
The null hypothesis would be the same as before: “People in general rate themselves as average in their personal appearance.” Symbolically, the non-directional hypothesis would be represented as:

$H_1: \mu_Y \neq 4.00$

and the null hypothesis as:

$H_0: \mu_Y = 4.00$

It is the way the ALTERNATE HYPOTHESIS is worded (NOT the null hypothesis) that determines whether a directional or a non-directional test is called for. A NON-DIRECTIONAL alternate hypothesis dictates that we perform a two-tailed significance test.

Whereas in the one-tailed test the region of rejection is located entirely in one tail (either the left or the right) of the sampling distribution, in the two-tailed test the region defined by the $\alpha$-level must be SPLIT EQUALLY between the two tails. For instance, if $\alpha = 0.05$ and the region of rejection were thus 5 percent of the area under the sampling distribution curve, then we would need to locate 2.5 percent of the region in the left tail AND 2.5 percent in the right tail. This is precisely what we did in the case of estimation in determining lower and upper confidence limits.

To identify TWO critical values (one establishing the left region of rejection and the other establishing the right region of rejection), we need to find the value of z beyond which 2.5 percent of the area remains in each tail. Whenever the Central Limit Theorem holds, we can use Appendix 1, pp. 540-542. In Appendix 1, we find area = .0250 (2.5 percent) in Column C. This identifies the value of z as 1.96 (Column A). Therefore, the z-value where the left tail begins is -1.96, and the z-value where the right tail begins is +1.96.

A value for a test statistic that is LESS THAN -1.96 or one that is GREATER THAN +1.96 would be so unlikely to occur by chance at the 0.05 level that we would decide to REJECT the null hypothesis. Conversely, any test statistic whose z-value is BETWEEN -1.96 and +1.96 would result in our deciding NOT to reject the null hypothesis.
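The two-tailed decision rule just described can be sketched as follows. This is a hedged illustration, again assuming `statistics.NormalDist` in place of Appendix 1; the function names `two_tailed_critical_z` and `reject_two_tailed` are hypothetical helpers introduced only for this example.

```python
from statistics import NormalDist


def two_tailed_critical_z(alpha):
    """Return the positive critical z that leaves alpha / 2 in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def reject_two_tailed(z, alpha=0.05):
    """Reject H0 when z falls beyond either critical value (-z_cv or +z_cv)."""
    return abs(z) > two_tailed_critical_z(alpha)


critical = two_tailed_critical_z(0.05)
print(f"critical values: -{critical:.2f} and +{critical:.2f}")

# The self-appearance z-value from the text lies far beyond +1.96.
print(reject_two_tailed(34.62))

# A z-value between the two critical values does not lead to rejection.
print(reject_two_tailed(-1.50))
```

Splitting $\alpha$ in half is what moves the critical value outward from 1.645 (one tail) to 1.96 (two tails).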
With small random samples, we cannot assume that the Central Limit Theorem applies. Fortunately, someone has worked out a series of sampling distributions for such situations. These are the so-called “Student’s t” distributions (Appendix 2, p. 543).

The specific sampling distribution of t is a function of the number of degrees of freedom, here simply sample size less one (i.e., N - 1). In other words, with N values that must add up to a given sum, N - 1 are free to take any value, but the final value (the Nth value) is FIXED by the requirement that all N values add up to that sum. If we have three numbers constrained to sum to 10, any two can take any value, but the third number must have the value that makes the sum of the three equal to 10. For example, if $X_1 = 6$ and $X_2 = 11$, then $X_3$ must equal -7:

6 + 11 + (-7) = 10

Two of the values are “free,” but the third is not. Just remember: N - 1 degrees of freedom in a sample of size N. As you can see in Appendix 2, once sample size (N) exceeds 121 (i.e., df > 120), the Student t distribution becomes approximately normal in shape.

Using the same data set that you used in your first SAS exercise, consisting of 63 randomly selected cities in the U.S., the question is: Was there a decrease in the population of central cities in the U.S. between 1960 and 1970? The alternate hypothesis is that there was:

$H_1$: average population change < 0.0
$H_1: \mu_Y < 0.0$

Therefore, the null hypothesis is:

$H_0: \mu_Y = 0.0$

Since the percentage change in population is expected to be a negative number (a percentage less than 0.0), this one-tailed test demands a critical value and region of rejection in the left tail.

The sample mean is -1.26 percent, meaning that there was a 1.26 percent decline in population in the sample. The hypothesis, however, is about the universe, that is, that the trend probably holds for ALL U.S. cities. The sample standard deviation was 6.32 percent. Let's set alpha at 0.05. Sample size is 63. Therefore, there are 62 degrees of freedom. What is the critical value of t for this problem?
We find the .05 column for the one-tailed significance level (the second column) and look for row df = 62. Since there is no such row in the table, we find this critical value by (1) interpolating and then (2) adding a negative sign. Sixty-two degrees of freedom is 2/60 of the way between 1.671 and 1.658 (the table values for df = 60 and df = 120). Multiplying (2/60) times 0.013 (the difference between 1.671 and 1.658) produces 0.0004. Decrementing 1.671 by 0.0004 yields 1.6706, or 1.671 rounded off. Since our region of rejection lies entirely in the left tail, we supply the negative sign. Thus the critical value is:

$t_{0.05} = -1.671$

We begin by estimating the standard error in the same way:

$$\hat{\sigma}_{\bar{Y}} = \frac{s_Y}{\sqrt{N}} = \frac{6.32}{\sqrt{63}} = \frac{6.32}{7.937} = 0.796$$

Using this “exchange rate,” we can now convert percentage differences in population change into t-units. The formula is:

$$t = \frac{\bar{Y} - \mu_{\bar{Y}}}{\hat{\sigma}_{\bar{Y}}} = \frac{(-1.26) - (0.0)}{0.796} = \frac{-1.26}{0.796} = -1.582$$

Thus, location -1.26 percent on the curve translates to a location of -1.58 in t-units on the underlying x-axis. The question is: How likely are we to land there if the true difference in population of central cities between 1960 and 1970 was 0.0? The answer is: It is likely that we could land there, i.e., find a sample difference of -1.26 percent when there actually was no overall difference among all cities. That is, t = -1.58 DOES NOT lie in the region of rejection because -1.58 does not fall at or below $t_{0.05} = -1.671$. Thus we CANNOT reject the null hypothesis. Our sample data do not provide sufficient evidence to infer a loss of population in general among central cities in the U.S. between 1960 and 1970.

Testing Single-Mean Hypotheses

A random sample of 29 college students studied an average of 2.5 hours per day with a standard deviation of 0.75 hours. Using Student's t distribution (Appendix 2, p. 543), test the null hypothesis ($H_0$) that students in general do not study at all. Assume that $\alpha = 0.05$ and perform a two-tailed test.

1. Symbolically, what is the null hypothesis ($H_0$)? __________
2. Symbolically, what is the alternate hypothesis ($H_1$)? __________
3. What is the value of the standard error? __________
4. What is the value of t? __________
5. How many degrees of freedom in this problem? __________
6. What is the critical value of $t_{0.05}$? __________
7. Do you reject or not reject the null hypothesis? __________

Testing Single-Mean Hypotheses Answers

A random sample of 29 college students studied an average of 2.5 hours per day with a standard deviation of 0.75 hours. Using Student's t distribution (Appendix 2, p. 543), test the null hypothesis ($H_0$) that students in general do not study at all. Assume that $\alpha = 0.05$ and perform a two-tailed test.

1. Symbolically, what is the null hypothesis ($H_0$)? $\mu_Y = 0$
2. Symbolically, what is the alternate hypothesis ($H_1$)? $\mu_Y \neq 0$
3. What is the value of the standard error? 0.139
4. What is the value of t? 17.951
5. How many degrees of freedom in this problem? 28 (N - 1 = 29 - 1)
6. What is the critical value of $t_{0.05}$? ±2.048
7. Do you reject or not reject the null hypothesis? reject
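Both single-mean t-tests above (the 63-city population example and the study-hours exercise) follow the same steps, which can be sketched in Python. This is a minimal illustration under stated assumptions: the critical values are taken from the t-table figures quoted in the text (1.671 at df = 60, 1.658 at df = 120, and 2.048 for the two-tailed test at df = 28) rather than computed, and the helper names `interpolate_critical_t` and `one_sample_t` are hypothetical.

```python
import math


def interpolate_critical_t(df, df_low, t_low, df_high, t_high):
    """Linearly interpolate a critical t between two rows of a t-table."""
    fraction = (df - df_low) / (df_high - df_low)
    return t_low + fraction * (t_high - t_low)


def one_sample_t(sample_mean, null_mean, sample_sd, n):
    """Return (t, standard_error) for a single-mean test."""
    standard_error = sample_sd / math.sqrt(n)
    t = (sample_mean - null_mean) / standard_error
    return t, standard_error


# City example: one-tailed test in the LEFT tail, N = 63, so df = 62.
city_critical = -interpolate_critical_t(62, 60, 1.671, 120, 1.658)
city_t, city_se = one_sample_t(-1.26, 0.0, 6.32, 63)
city_reject = city_t <= city_critical

# Study-hours exercise: two-tailed test, N = 29, so df = 28.
study_df = 29 - 1
study_critical = 2.048  # table value quoted in the answers
study_t, study_se = one_sample_t(2.5, 0.0, 0.75, 29)
study_reject = abs(study_t) >= study_critical

print(f"cities: critical t = {city_critical:.3f}, se = {city_se:.3f}, "
      f"t = {city_t:.3f}, reject H0: {city_reject}")
print(f"study hours: df = {study_df}, se = {study_se:.3f}, "
      f"t = {study_t:.3f}, reject H0: {study_reject}")
```

The only difference between the two tests is the rejection rule: the left-tailed test compares t with a single negative critical value, while the two-tailed test compares the absolute value of t with the positive critical value.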