Witte & Witte, 10e Chapter 14 Page 1 of 12 Pages

Chapter 14: t Test for Two Independent Samples

Exercise 1
Specify both the null and alternative hypotheses for each of the following studies. Assume that the data from all these studies will be analyzed with directional tests.

a. College sophomores were randomly assigned to a study skills training group that would receive training designed to enhance their study skills or to a comparison control group that would receive no training. The researchers wanted to find out if the study skills training program would improve the students’ grades as measured by their grade point averages.

b. College men who volunteered to participate in a communication skills research project were randomly assigned to an experimental treatment condition or to a control treatment condition. The men in the experimental condition received special training designed to improve their skills in communicating with college women. The men in the control condition met for an equal amount of time, but instead of communication training, they discussed ways to manage their money. After the training was completed, each volunteer was observed in a special lab as he talked with a female confederate. The volunteer’s communication skills were scored by a panel of judges. Higher scores indicated more effective communication skills.

c. Depressed teenagers were randomly assigned to an experimental treatment condition or to a control treatment condition. Teenagers in the experimental treatment were given a drug that was hoped would reduce symptoms of depression. The teenagers in the control condition were given a placebo. At the end of the experimental treatment time, each participant took a standardized depression test on which high scores indicated a relatively high level of depression.

Answers:
a. $H_0: \mu_1 - \mu_2 \le 0$; $H_1: \mu_1 - \mu_2 > 0$
b. $H_0: \mu_1 - \mu_2 \le 0$; $H_1: \mu_1 - \mu_2 > 0$
c. $H_0: \mu_1 - \mu_2 \ge 0$; $H_1: \mu_1 - \mu_2 < 0$

Exercise 2
Using Table B in Appendix C of your textbook, find the critical t values for each of the following hypothesis tests:

a. one-tailed test; n1 = 15; n2 = 15
b. one-tailed test; n1 = 18; n2 = 21
c. one-tailed test; n1 = 20; n2 = 16
d. two-tailed test; n1 = 15; n2 = 17
e. two-tailed test; n1 = 45; n2 = 45

Answers:
a. -2.467
b. -1.697
c. 2.457
d. ±2.750
e. ±2.000

Exercise 3
Willoughby, Porter, Belsito, and Yearsley (1999) investigated the effectiveness of special study strategies among elementary school children for the task of learning factual information about unfamiliar animals. Their study included three grade levels and three treatment conditions. Summary information for only the fourth and sixth graders in the keyword strategy condition is presented in this exercise. The mean memory test performance of the 15 fourth graders was 8.20, with SS equal to 102.06. The mean memory test performance of the 15 sixth graders was 10.73, with SS equal to 168.57. Was there a significant difference between the memory performance of the fourth and sixth graders? Follow the steps below to answer this question. Designate the fourth graders as group 1 and the sixth graders as group 2.

a. Calculate the degrees of freedom.
b. Use Table B in Appendix C of your textbook to identify the critical t value for a two-tailed test with alpha equal to .05.
c. Calculate the pooled variance estimate.
d. Calculate the estimated standard error.
e. Calculate observed t.
f. Present the statistical decision.
g. Present a verbal summary of the results.

Answers:
a. df = 28
b. ±2.048
c. Pooled variance estimate = 9.6654
d. Estimated standard error = 1.1352
e. Observed t = -2.2287
f. Reject the null hypothesis.
g. The sixth graders outperformed the fourth graders on the memory test.

Exercise 4
As your textbook indicates, computer software frequently provides the p-value associated with a specified t test value.
Microsoft Excel was used to obtain the one-tailed and two-tailed p-values shown in the items below. These p-values are for independent t tests that were carried out to investigate a difference between an experimental group and a control group. For each, indicate whether the result would be statistically significant with alpha equal to .05 for a one-tailed test and for a two-tailed test.

a. One-tailed p-value = 0.267006; two-tailed p-value = 0.534011
b. One-tailed p-value = 0.176499; two-tailed p-value = 0.352999
c. One-tailed p-value = 0.000112; two-tailed p-value = 0.000223
d. One-tailed p-value = 0.002699; two-tailed p-value = 0.005399
e. One-tailed p-value = 0.040598; two-tailed p-value = 0.081196

Answers:
a. One-tailed is not significant; two-tailed is not significant
b. One-tailed is not significant; two-tailed is not significant
c. One-tailed is significant; two-tailed is significant
d. One-tailed is significant; two-tailed is significant
e. One-tailed is significant; two-tailed is not significant

Exercise 5
Five exact p-values are shown below.

1. p = 0.534011
2. p = 0.352999
3. p = 0.000223
4. p = 0.005399
5. p = 0.081196

Select the approximate p-value that would most accurately describe each of these exact p-values.

a. p > .05
b. p < .05
c. p < .01
d. p < .001

Answers:
1. a
2. a
3. d
4. c
5. a

Exercise 6
Select the approximate p-value for each of the following test results:

a. p > .05
b. p < .05
c. p < .01
d. p < .001

1. one-tailed test, lower tail critical; df = 18; t = -1.857
2. two-tailed test; df = 14; t = -2.335
3. one-tailed test, upper tail critical; df = 30; t = 3.249
4. two-tailed test; df = 24; t = 0.724
5. one-tailed test, lower tail critical; df = 8; t = -4.684
6. two-tailed test; df = 62; t = 3.279

Answers:
1. b
2. b
3. c
4. a
5. d
6. c

Exercise 7
In Exercise 3, you worked with summary data from a study conducted by Willoughby, Porter, Belsito, and Yearsley (1999). Here’s more summary information from that study for students in the imagery condition. The mean memory test performance of the 15 fourth graders was 5.80, with SS equal to 146.06. The mean memory test performance of the 15 sixth graders was 10.47, with SS equal to 305.32. Follow the steps given below to construct a 95% confidence interval around the obtained difference. Designate the fourth graders as group 1 and the sixth graders as group 2.

1. Calculate the degrees of freedom.
2. Use Table B in Appendix C of your textbook to identify the critical t value for a two-tailed test with alpha equal to .05.
3. Calculate the pooled variance estimate.
4. Calculate the estimated standard error.
5. Calculate the 95% confidence interval.
6. Provide a verbal interpretation of the confidence interval.

Answers:
1. df = 28
2. critical t = 2.048
3. Pooled variance estimate = 16.1207
4. Estimated standard error = 1.4661
5. 95% CI: -7.67 to -1.67
6. We are 95% confident that the difference in the population means is between -7.67 and -1.67 test score points. The negative signs indicate that, on average, the fourth graders performed less well than the sixth graders. Also, because both endpoints of the confidence interval are negative and zero is not included in the interval, we can conclude that the poorer performance of the fourth graders is probably real.

Exercise 8
Calculate a standardized effect size, Cohen’s d, for the data given in Exercise 3. Interpret this effect size using Cohen’s guidelines, which are provided in Table 14.2 of your textbook.

Answers:
d = -0.81; Cohen’s guidelines indicate that this is a large effect. Note that it is common practice to present the absolute value of the effect size. In this case, then, the value of d would be reported as 0.81.
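The arithmetic in Exercises 3, 7, and 8 can be checked with a short script. What follows is a minimal Python sketch and is not part of the original worksheet. The function name `pooled_t_summary` and its parameter names are hypothetical, and the critical t value is taken from Table B and passed in rather than computed. It assumes Cohen’s d is the mean difference divided by the square root of the pooled variance estimate.

```python
import math


def pooled_t_summary(mean1, ss1, n1, mean2, ss2, n2, critical_t):
    """Summarize an independent-samples t test from means, sums of squares, and ns.

    Hypothetical helper for illustration; critical_t is looked up in a t table.
    """
    df = n1 + n2 - 2
    # Pooled variance estimate: combined sum of squares over combined df.
    pooled_var = (ss1 + ss2) / df
    # Estimated standard error of the difference between the two means.
    std_error = math.sqrt(pooled_var / n1 + pooled_var / n2)
    diff = mean1 - mean2
    return {
        "df": df,
        "pooled_variance": pooled_var,
        "standard_error": std_error,
        "t": diff / std_error,
        # Standardized effect size (assumed definition: diff / pooled SD).
        "cohens_d": diff / math.sqrt(pooled_var),
        # Confidence interval around the observed difference.
        "ci": (diff - critical_t * std_error, diff + critical_t * std_error),
    }


# Exercise 3 (keyword condition) and Exercise 7 (imagery condition);
# group 1 = fourth graders, group 2 = sixth graders.
ex3 = pooled_t_summary(8.20, 102.06, 15, 10.73, 168.57, 15, critical_t=2.048)
ex7 = pooled_t_summary(5.80, 146.06, 15, 10.47, 305.32, 15, critical_t=2.048)

for name, result in (("Exercise 3", ex3), ("Exercise 7", ex7)):
    print(name)
    for key, value in result.items():
        print(f"  {key}: {value}")
```

Rounded to the precision used in the answer key, the Exercise 3 output should agree with the pooled variance, standard error, observed t, and d given above, and the Exercise 7 output should agree with the reported confidence interval.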

Exercise 9
Using the results of Exercise 3 and Exercise 8, write a statement reporting the outcome as it might appear in a published article.

Answer:
Memory test scores for the fourth graders ($\bar{X}$ = 8.20, s = 2.70) and sixth graders ($\bar{X}$ = 10.73, s = 3.47) differed significantly [t(28) = -2.23, p < .05, d = 0.81].

Exercise 10
Mounsey, Vandehey, and Diekhoff (2013) investigated anxiety of working and non-working university students. The researchers found that the working students reported more anxiety symptoms on the Beck Anxiety Inventory (M = 8.17, SD = 7.94) than the non-working students (M = 4.40, SD = 4.97) and that the difference was statistically significant [t(106) = -2.42, p < .05].

1. Although the authors do not explicitly report their statistical decision, based on the information that they did provide, would they have rejected or retained the null hypothesis?
2. Because the difference was statistically significant, can we conclude that the difference between the sample means was large?
3. Because the difference was statistically significant, can we conclude that the working students had an anxiety level that would be considered abnormally high?
4. If the investigators repeated their study with another sample of working and non-working university students drawn from the same population, should we expect that they will obtain a similar result?
5. Your textbook tells you that large sample sizes can produce statistically significant results that lack importance. The analysis of Beck Anxiety Inventory scores was based on 78 working students and 30 non-working students, or a total of 108 students. Would a sample size of 108 be considered excessively large?

Answers:
1. The researchers would have rejected the null hypothesis.
2. A statistically significant difference is not necessarily a large difference.
To determine whether or not a difference is large, it is a good idea to calculate a standardized effect size. Exercise 10 provides the information needed to calculate Cohen’s d, which comes out to be approximately 0.52. Using Cohen’s guidelines presented in section 14.9 of your textbook, we would conclude that the effect size is medium.
3. Although the working students’ anxiety test mean is significantly higher than that of the non-working students, we cannot automatically conclude that the mean is in the abnormal range. In the article, the authors indicate that the working students’ mean was in the mild anxiety range and the non-working students’ mean was in the minimal anxiety range.
4. A statistically significant result implies that the observed outcome is reliable and would likely reappear if the study were repeated.
5. A total sample size of 108 would not generally be considered excessively large for testing for a standardized effect size in the vicinity of one-half of a standard deviation. Samples of 500 or more in each group (i.e., a total of 1,000 students), however, would be considered excessively large for testing for an effect size approximately equal to 0.5.

Exercise 11
Juvonen, Wang, and Espinoza (2013) investigated social prominence, physical aggression, and spreading rumors among adolescents at three time points. The outcome measures were based on peer nominations made individually by each participating student. Social prominence was assessed by asking students to name grade mates whom they considered the “coolest”. Physical aggression was measured by asking students to name grade mates who “start fights or push other kids around”. Spreading of rumors was assessed by asking students to name grade mates who “spread nasty rumors about other kids”. This exercise is only concerned with the results of analyses carried out on the fall of 8th grade data that were designed to test for gender differences.
The table below summarizes the analysis results.

Summary of Fall of 8th Grade Results

Variable               Boys (n = 872)   Girls (n = 1,023)   t value   p value   Cohen’s d
                       Mean (SD)        Mean (SD)
Social prominence      7.30 (9.05)      7.00 (8.48)          .71      .48       .03
Physical aggression    6.65 (8.98)      3.06 (5.60)         9.93      < .001    .49
Spreading rumors       4.56 (6.01)      3.28 (4.87)         4.80      < .001    .24

1. For which of the variables was the difference between boys and girls statistically significant? For the significant difference(s), whose mean was higher, the boys’ or the girls’?
2. Would the significant difference(s) be considered large?
3. Would the significant differences be considered important differences?
4. Would the sample size be considered excessively large?

Answers:
1. Boys and girls were significantly different on the measures of physical aggression and spreading rumors. The boys’ mean was higher on both of these measures.
2. No, they weren’t large. Cohen’s d for physical aggression was .49, a moderate effect size, and Cohen’s d for spreading rumors was .24, a small effect size.
3. In a study on gender differences in these variables, both moderate- and small-sized differences would probably be considered important. Even a nonsignificant difference, such as the gender difference regarding social prominence, might be considered important if it fits with the authors’ theory and expected outcomes.
4. The total sample size was 1,895 and this is indeed a very large sample. We know that excessively large samples can produce statistically significant results that are not important. In this study, however, the authors presented Cohen’s d along with means, standard deviations, and p values. Therefore, we have a lot of information to use when judging the importance of the results.

Exercise 12
Sherman, Haidt, and Coan (2009) were interested in the behavioral carefulness that is needed when caring for a small, delicate child.
In a research study, they addressed the question of whether or not perceiving cuteness could enhance behavioral carefulness. In Experiment 1, a total of 40 undergraduate women were randomly assigned to one of two conditions: viewing slides of puppies and kittens (high cuteness) or of dogs and cats (low cuteness). The research participants performed a task using tweezers that was designed to measure carefulness. This task was performed both before and after viewing the slides. Grip strength and heart rate were also measured before and after viewing the slides. The before-after differences in the measures were calculated for each participant and mean differences were analyzed via independent samples t-tests. The results are summarized in the table.

Summary of Task Performance by Condition

Variable                      High cute (n = 20)   Low cute (n = 20)   t value   p value   Cohen’s d
                              Mean (SD)            Mean (SD)
Change in task performance    1.80 (1.83)          .60 (1.97)          1.99      .05       .63
Change in grip strength       -4.35 (10.42)        -3.35 (6.98)        0.36      .72       .12
Change in heart rate          1.64 (3.22)          .02 (2.06)          1.89      .07       .61

1. For which of the variables was the difference between the high cute and low cute conditions statistically significant? For the significant differences, which condition had the larger mean change, high cute or low cute?
2. Would the significant differences be considered large?
3. Would the significant differences be considered important differences?
4. Would the sample size be considered excessively large?

Answers:
1. Assuming alpha was set equal to .05, change in task performance was the only variable for which the difference was statistically significant. If alpha had been set at .10, the change in heart rate would also have been statistically significant, and in the article, the authors do indicate that heart rate change was statistically significant. For both of these variables, the high cute mean change was significantly larger than the low cute mean change.
2. Cohen’s d is useful for determining whether or not a difference is large, and for both change in task performance and change in heart rate, d falls in the category of a medium effect size.
3. The results would be considered important because they support the authors’ hypotheses. Namely, the authors concluded that the results provided evidence that cuteness of viewed stimuli has an effect on behavioral carefulness. Viewing high cute as compared to low cute stimuli resulted in greater improvement on a task that required carefulness. The significant difference regarding change in heart rate provided evidence that the improved behavior in the high cute condition could be attributed to general physiological arousal.
4. The sample sizes would be considered quite small. In fact, with only 20 participants in each of the two conditions, it appears that the authors barely had sufficient statistical power to find a moderate-sized effect to be statistically significant.

Exercise 13
McConnell, Brown, Shoda, Stayton, and Martin (2011) carried out an investigation of the well-being benefits of pets for everyday people. In Study 1, they addressed the question: Do pet owners enjoy better well-being than nonowners? A sample of 217 people participated in the study. The participants completed a battery of instruments to provide data on well-being, personality, and attachment style. The only Study 1 results presented here concern the six well-being measures.

Summary of Well-Being Measures for Pet Owners and Nonowners

Variable                           Owners (n = 167)   Nonowners (n = 50)   t(215)    p value   Cohen’s d
                                   Mean               Mean
Depression                         30.00              31.72                1.29      .198      .21
Loneliness                         38.64              41.64                1.79†     .075      .29
Self-esteem                        34.27              32.21                2.59*     .010      .42
Physical illnesses and symptoms    3.98               4.21                 0.45      .653      .07
Subjective happiness               5.20               5.06                 0.66      .510      .11
Exercise and fitness               4.40               3.94                 2.64**    .009      .43

† p < .08. * p < .05. ** p < .01.

1. For which of the variables was the difference between owners and nonowners statistically significant? For the significant differences, which group had the more positive outcome?
2. Would the significant differences be considered large?
3. Would the significant differences be considered important differences?
4. Would the sample size be considered excessively large?

Answers:
1. The variables for which the owner-nonowner difference was statistically significant were:
   a. Loneliness: Owners had the more positive outcome because their mean score indicates less loneliness.
   b. Self-esteem: Owners had the more positive outcome because their mean score indicates a higher level of self-esteem.
   c. Exercise and fitness: Owners had the more positive outcome because their mean score indicates more exercise and better fitness.
2. The effect sizes associated with the significant differences range from .29 to .43. According to Cohen’s guidelines, these effect sizes would be considered small. However, researchers are encouraged to apply their own guidelines to the interpretation of effect sizes based on results typically obtained in their fields. For this field of study, effect sizes of .42 and .43 might well be considered moderate.
3. Effect sizes less than .20 are frequently considered to be extremely small or trivial. The effect sizes for the differences identified by the authors as statistically significant are all greater than .20. So based on effect sizes, the differences would very likely be considered to be important. Note that the authors reported three levels of statistical significance: p < .08, p < .05, and p < .01. From this, we might assume that the authors considered the effect size of .29 worthy of special attention and decided to utilize the nonconventional alpha of .08 when judging statistical significance.
4. The total sample size of 217 is not excessively large. Trivial differences were not statistically significant.
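The Exercise 13 table reports t values and group sizes but no standard deviations, so Cohen’s d cannot be computed from means alone. Under the assumption that d is defined with the pooled standard deviation, as in Exercise 8, it can be recovered from t because t = d / sqrt(1/n1 + 1/n2). The Python sketch below is illustrative only and is not part of the original worksheet: the function name `cohens_d_from_t` is hypothetical, and the original authors may have computed their effect sizes differently.

```python
import math


def cohens_d_from_t(t_value, n1, n2):
    """Recover Cohen's d from an independent-samples t value.

    Assumes d = (mean1 - mean2) / pooled SD, so that
    t = d / sqrt(1/n1 + 1/n2). Hypothetical helper for illustration.
    """
    return t_value * math.sqrt(1 / n1 + 1 / n2)


# (variable, t(215), reported Cohen's d) copied from the Exercise 13 table.
table = [
    ("Depression", 1.29, 0.21),
    ("Loneliness", 1.79, 0.29),
    ("Self-esteem", 2.59, 0.42),
    ("Physical illnesses and symptoms", 0.45, 0.07),
    ("Subjective happiness", 0.66, 0.11),
    ("Exercise and fitness", 2.64, 0.43),
]

for variable, t_value, reported_d in table:
    d = cohens_d_from_t(t_value, n1=167, n2=50)
    print(f"{variable}: computed d = {d:.2f}, reported d = {reported_d:.2f}")
```

Because the two groups are unequal in size (167 owners, 50 nonowners), the conversion uses both sample sizes rather than the equal-n shortcut.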

Exercise 14
In their meta-analysis of the effects of intelligent tutoring systems (ITS) on students’ mathematical learning, Steenbergen-Hu and Cooper (2013) examined results in many different categories such as math subject, ITS duration, schooling level, and research design. To calculate effect sizes, the comparison group mean was subtracted from the ITS mean and the difference was divided by the average of the two groups’ standard deviations. With respect to research design, they found that the average effect size from 15 quasi-experimental studies was .09, 95% CI (.05, .14), and the average effect size from 11 true experiments was -.01, 95% CI (-.07, .05).

1. Using Cohen’s guidelines, how would you interpret the mean effect size from quasi-experimental studies?
2. Based on the 95% CI, what would we conclude regarding the statistical significance of the quasi-experimental mean effect size?
3. Using Cohen’s guidelines, how would you interpret the mean effect size from true experiments?
4. Based on the 95% CI, what would we conclude regarding the statistical significance of the true experiment mean effect size?

Answers:
1. An effect size of .09 would be considered very small or trivial.
2. The 95% CI does not contain 0; therefore, we would conclude that the average effect of ITS tutoring on mathematical learning is significantly greater than 0.
3. An effect size of -.01 would be considered very small or trivial.
4. The 95% CI does contain 0; therefore, we would conclude that the average effect is not different from 0. In other words, we would conclude that the ITS tutoring had no effect on mathematical learning.

Exercise 15
Steenbergen-Hu and Cooper (2013) also examined the effects of ITS for students in elementary school, middle school, and high school (see Exercise 14). The results are shown below.

School level         Mean ES   95% CI
Elementary school    .41       (-.01, .84)
Middle school        .09       (.01, .17)
High school          -.09      (-.17, -.02)

1. Using Cohen’s guidelines, how would you interpret the mean effect size for the three school levels?
2. Based on the 95% CI, what would we conclude regarding the statistical significance of the three mean effect sizes?

Answers:
1. The elementary school mean ES would be considered small or even perhaps moderate. The middle school and high school ES’s would be considered very small or trivial.
2. Even though the elementary school mean ES is the largest of the three, its 95% CI contains 0. Therefore, we would conclude that the average effect of ITS on elementary school students’ mathematical learning is not different from 0, or, in other words, we would conclude that, on average, ITS tutoring has no effect on elementary school students’ mathematical learning. On the other hand, 0 is not in the 95% CI’s of either middle school or high school. We would conclude that, on average, ITS tutoring has a very small positive effect on the mathematical learning of middle school students and a very small negative effect on the mathematical learning of high school students.

Exercise 16
Researchers have noted that pathological gambling and obsessive-compulsive disorder (OCD) share some similarities such as being unable to delay or withhold repetitive behaviors. Accordingly, Durdle, Gorey, and Stewart (2008) were interested in the relations among pathological gambling and OCD. In their meta-analysis, they calculated ES’s using the formula for Cohen’s d, subtracting the mean of the comparison group of nonpathological gamblers from the mean of the pathological gambling group. They then weighted the ES’s based on the number of study participants. The weighted ES’s are presented here.

Area                                         Mean ES   95% CI
Obsessive-compulsive disorder comorbidity    .07       (-.05, .19)
OCD in first-degree relatives                .08       (-.03, .19)
Obsessive-compulsive personality disorder    .23       (-.07, .35)
Obsessive-compulsive traits                  1.01      (.88, 1.14)

1. Using Cohen’s guidelines, how would you interpret the mean effect size for the four areas?
2. Based on the 95% CI, what would we conclude regarding the statistical significance of the four mean effect sizes?

Answers:
1. The mean ES’s of .07 and .08 would be considered very small or trivial. The mean ES of .23 would be considered small. The mean ES of 1.01 would be considered large. Note that an ES of 1.01 is approximately equivalent to one standard deviation unit.
2. Only the mean ES for obsessive-compulsive traits is statistically significant, because 0 is not contained in the 95% CI. We would conclude that pathological gamblers show more obsessive-compulsive traits than nonpathological gamblers.

References

Durdle, H., Gorey, K. M., & Stewart, S. H. (2008). A meta-analysis examining the relations among pathological gambling, obsessive-compulsive disorder, and obsessive-compulsive traits. Psychological Reports, 103, 485-498.

Juvonen, J., Wang, Y., & Espinoza, G. (2013). Physical aggression, spreading of rumors, and social prominence in early adolescence: Reciprocal effects supporting gender similarities? Journal of Youth and Adolescence, 42, 1801-1810.

McConnell, A. R., Brown, C. M., Shoda, T. M., Stayton, L. E., & Martin, C. E. (2011). Friends with benefits: On the positive consequences of pet ownership. Journal of Personality and Social Psychology, 101, 1239-1252.

Mounsey, R., Vandehey, M. A., & Diekhoff, G. M. (2013). Working and non-working university students: Anxiety, depression, and grade point average. College Student Journal, 47, 379-389.

Sherman, G. D., Haidt, J., & Coan, J. A. (2009). Viewing cute images increases behavioral carefulness. Emotion, 9, 282-286.

Steenbergen-Hu, S., & Cooper, H. (2013). A meta-analysis of the effectiveness of intelligent tutoring systems on K-12 students’ mathematical learning. Journal of Educational Psychology, 105, 970-987.

Willoughby, T., Porter, L., Belsito, L., & Yearsley, T. (1999). Use of elaboration strategies by students in grades two, four, and six. Elementary School Journal, 99, 221-231.