Witte & Witte, 10e
Chapter 14
Page 1 of 12 Pages
Chapter 14: t Test for Two Independent Samples
Exercise 1
Specify both the null and alternative hypotheses
for each of the following studies. Assume that the data from all these studies will be
analyzed with directional tests.
a. College sophomores were randomly assigned either to a study skills training group
that would receive training designed to enhance their study skills or to a
comparison control group that would receive no training. The researchers wanted
to find out if the study skills training program would improve the students’ grades
as measured by their grade point averages.
b. College men who volunteered to participate in a communication skills research
project were randomly assigned to an experimental treatment condition or to a
control treatment condition. The men in the experimental condition received
special training designed to improve their skills in communicating with college
women. The men in the control condition met for an equal amount of time, but
instead of communication training, they discussed ways to manage their money.
After the training was completed, each volunteer was observed in a special lab as
he talked with a female confederate. The volunteer’s communication skills were
scored by a panel of judges. Higher scores indicated more effective
communication skills.
c. Depressed teenagers were randomly assigned to an experimental treatment
condition or to a control treatment condition. Teenagers in the experimental
treatment were given a drug that was hoped would reduce symptoms of
depression. The teenagers in the control condition were given a placebo. At the
end of the experimental treatment time, each participant took a standardized
depression test on which high scores indicated a relatively high level of
depression.
Answers:
a. H0: μ1 – μ2 ≤ 0; H1: μ1 – μ2 > 0
b. H0: μ1 – μ2 ≤ 0; H1: μ1 – μ2 > 0
c. H0: μ1 – μ2 ≥ 0; H1: μ1 – μ2 < 0
Exercise 2
Using Table B in Appendix C of your textbook, find the critical t values for each of the
following hypothesis tests:
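a. one-tailed test, lower tail critical; n1 = 15; n2 = 15
b. one-tailed test, lower tail critical; n1 = 18; n2 = 21
c. one-tailed test, upper tail critical; n1 = 20; n2 = 16
d. two-tailed test; n1 = 15; n2 = 17
e. two-tailed test; n1 = 45; n2 = 45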
Answers:
a. -2.467
b. -1.697
c. 2.457
d. ±2.750
e. ±2.000
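The first step in each item is finding the degrees of freedom, df = n1 + n2 - 2, and then the row of the t table to read. The short Python sketch below is illustrative only. It assumes that the printed table lists every df from 1 to 30 and then 40, 60, and 120, and that the nearest smaller tabled df is used when the exact df is not listed; both are assumptions about the table, not statements taken from the textbook. The names TABLED_DF and table_row are hypothetical.

```python
# Assumed table layout (not confirmed by the source): df 1-30, then 40, 60, 120.
TABLED_DF = list(range(1, 31)) + [40, 60, 120]


def degrees_of_freedom(n1, n2):
    """df for a t test with two independent samples."""
    return n1 + n2 - 2


def table_row(df):
    """Largest tabled df that does not exceed the computed df (assumed convention)."""
    return max(row for row in TABLED_DF if row <= df)


# Sample sizes from items a through e.
sample_sizes = {"a": (15, 15), "b": (18, 21), "c": (20, 16), "d": (15, 17), "e": (45, 45)}

rows = {}
for item, (n1, n2) in sample_sizes.items():
    df = degrees_of_freedom(n1, n2)
    rows[item] = (df, table_row(df))
    print(f"{item}. df = {df}; table row read: df = {rows[item][1]}")
```

Under these assumptions, items b, c, and d are all read from the df = 30 row and item e from the df = 60 row.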
Exercise 3
Willoughby, Porter, Belsito, and Yearsley (1999) investigated the effectiveness of special
study strategies among elementary school children for the task of learning factual
information about unfamiliar animals. Their study included three grade levels and three
treatment conditions. Summary information for only the fourth and sixth graders in the
keyword strategy condition is presented in this exercise. The mean memory test
performance of the 15 fourth graders was 8.20, with SS equal to 102.06. The mean
memory test performance of the 15 sixth graders was 10.73, with SS equal to 168.57.
Was there a significant difference between the memory performance of the fourth and
sixth graders? Follow the steps below to answer this question. Designate the fourth
graders as group 1 and the sixth graders as group 2.
a. Calculate the degrees of freedom.
b. Use Table B in Appendix C of your textbook to identify the critical t value for a
two-tailed test with alpha equal to .05.
c. Calculate the pooled variance estimate.
d. Calculate the estimated standard error.
e. Calculate observed t.
f. Present the statistical decision.
g. Present a verbal summary of the results.
Answers:
a. df = 28
b. ±2.048
c. Pooled variance estimate = 9.6654
d. Estimated standard error = 1.1352
e. Observed t = -2.2287
f. Reject the null hypothesis.
g. The sixth graders outperformed the fourth graders on the memory test.
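The arithmetic behind answers a through f can be checked with a few lines of Python. This is a minimal sketch that uses only the summary values given in the exercise and the critical t from answer b; the variable names are illustrative.

```python
import math

# Summary data from the exercise (group 1 = fourth graders, group 2 = sixth graders).
n1, mean1, ss1 = 15, 8.20, 102.06
n2, mean2, ss2 = 15, 10.73, 168.57

df = n1 + n2 - 2                                  # step a
critical_t = 2.048                                # step b, taken from the answer key
pooled_variance = (ss1 + ss2) / df                # step c
standard_error = math.sqrt(pooled_variance / n1 + pooled_variance / n2)  # step d
t_observed = (mean1 - mean2) / standard_error     # step e
reject_null = abs(t_observed) >= critical_t       # step f, two-tailed decision

print(f"df = {df}")
print(f"pooled variance = {pooled_variance:.4f}")
print(f"standard error = {standard_error:.4f}")
print(f"observed t = {t_observed:.4f}")
print("reject H0" if reject_null else "retain H0")
```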
Exercise 4
As your textbook indicates, computer software frequently provides the p-value associated
with a specified t test value. Microsoft Excel was used to obtain the one-tailed and two-tailed p-values shown in the items below. These p-values are for independent t tests that
were carried out to investigate a difference between an experimental group and a control
group. For each, indicate whether the result would be statistically significant with alpha
equal to .05 for a one-tailed test and for a two-tailed test.
a. One-tailed p-value = 0.267006; two-tailed p-value = 0.534011
b. One-tailed p-value = 0.176499; two-tailed p-value = 0.352999
c. One-tailed p-value = 0.000112; two-tailed p-value = 0.000223
d. One-tailed p-value = 0.002699; two-tailed p-value = 0.005399
e. One-tailed p-value = 0.040598; two-tailed p-value = 0.081196
Answers:
a. One-tailed is not significant; two-tailed is not significant
b. One-tailed is not significant; two-tailed is not significant
c. One-tailed is significant; two-tailed is significant
d. One-tailed is significant; two-tailed is significant
e. One-tailed is significant; two-tailed is not significant
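The decision rule applied in these answers is simply to compare each p-value with alpha. A small Python sketch of that comparison follows; it assumes the usual convention that a result is significant when p is less than alpha, and the helper name is_significant is hypothetical.

```python
ALPHA = 0.05

# (one-tailed p, two-tailed p) for items a through e, copied from the exercise.
p_values = {
    "a": (0.267006, 0.534011),
    "b": (0.176499, 0.352999),
    "c": (0.000112, 0.000223),
    "d": (0.002699, 0.005399),
    "e": (0.040598, 0.081196),
}


def is_significant(p, alpha=ALPHA):
    """Assumed convention: significant when p is smaller than alpha."""
    return p < alpha


decisions = {
    item: (is_significant(one_tailed), is_significant(two_tailed))
    for item, (one_tailed, two_tailed) in p_values.items()
}

for item, (one_sig, two_sig) in decisions.items():
    print(f"{item}. one-tailed significant: {one_sig}; two-tailed significant: {two_sig}")
```

Item e shows why the choice of test matters: the same data are significant with a one-tailed test but not with a two-tailed test.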
Exercise 5
Five exact p-values are shown below.
1. p = 0.534011
2. p = 0.352999
3. p = 0.000223
4. p = 0.005399
5. p = 0.081196
Select the approximate p-value that would most accurately describe each of these exact
p-values.
a. p > .05
b. p < .05
c. p < .01
d. p < .001
Answers:
1. a
2. a
3. d
4. c
5. a
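The mapping from an exact p-value to the most accurate approximate p-value can be written as a short chain of comparisons, checking the strictest cutoff first. The Python sketch below is one way to express it; the function name approximate_p is hypothetical.

```python
def approximate_p(p):
    """Return the strictest conventional label that the exact p-value satisfies."""
    if p < 0.001:
        return "p < .001"
    if p < 0.01:
        return "p < .01"
    if p < 0.05:
        return "p < .05"
    return "p > .05"


exact_p_values = [0.534011, 0.352999, 0.000223, 0.005399, 0.081196]
labels = [approximate_p(p) for p in exact_p_values]

for number, (p, label) in enumerate(zip(exact_p_values, labels), start=1):
    print(f"{number}. p = {p}: {label}")
```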
Exercise 6
Select the approximate p-value for each of the following test results:
a. p > .05
b. p < .05
c. p < .01
d. p < .001
1. one-tailed test, lower tail critical; df = 18; t = -1.857
2. two-tailed test; df = 14; t = -2.335
3. one-tailed test, upper tail critical; df = 30; t = 3.249
4. two-tailed test; df = 24; t = 0.724
5. one-tailed test, lower tail critical; df = 8; t = -4.684
6. two-tailed test; df = 62; t = 3.279
Answers:
1. b
2. b
3. c
4. a
5. d
6. c
Exercise 7
In Exercise 3, you worked with summary data from a study conducted by Willoughby,
Porter, Belsito, and Yearsley (1999). Here’s more summary information from that study
for students in the imagery condition. The mean memory test performance of the 15
fourth graders was 5.80, with SS equal to 146.06. The mean memory test performance of
the 15 sixth graders was 10.47, with SS equal to 305.32. Follow the steps given below to
construct a 95% confidence interval around the obtained difference. Designate the fourth
graders as group 1 and the sixth graders as group 2.
1. Calculate the degrees of freedom.
2. Use Table B in Appendix C of your textbook to identify the critical t value for a
two-tailed test with alpha equal to .05.
3. Calculate the pooled variance estimate.
4. Calculate the estimated standard error.
5. Calculate the 95% confidence interval.
6. Provide a verbal interpretation of the confidence interval.
Answers:
1. df = 28
2. critical t = 2.048
3. Pooled variance estimate = 16.1207
4. Estimated standard error = 1.4661
5. 95% CI: -7.67 to -1.67
6. We are 95% confident that the difference in the population means is between
-7.67 and -1.67 test score points. The negative signs indicate that, on average, the
fourth graders performed less well than the sixth graders. Also, because both
endpoints of the confidence interval are negative and zero is not included in the
interval, we can conclude that the poorer performance of the fourth graders is
probably real.
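The interval in answer 5 is the observed mean difference plus and minus the critical t times the estimated standard error. The Python sketch below reproduces the steps from the summary data, taking the critical t of 2.048 from answer 2; the variable names are illustrative.

```python
import math

# Summary data from the exercise (group 1 = fourth graders, group 2 = sixth graders).
n1, mean1, ss1 = 15, 5.80, 146.06
n2, mean2, ss2 = 15, 10.47, 305.32
critical_t = 2.048  # from answer 2 (df = 28, two-tailed, alpha = .05)

df = n1 + n2 - 2
pooled_variance = (ss1 + ss2) / df
standard_error = math.sqrt(pooled_variance / n1 + pooled_variance / n2)

mean_difference = mean1 - mean2
margin = critical_t * standard_error
ci_lower = mean_difference - margin
ci_upper = mean_difference + margin

print(f"pooled variance = {pooled_variance:.4f}")
print(f"standard error = {standard_error:.4f}")
print(f"95% CI: {ci_lower:.2f} to {ci_upper:.2f}")
```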
Exercise 8
Calculate a standardized effect size, Cohen’s d, for the data given in Exercise 3.
Interpret this effect size using Cohen’s guidelines which are provided in Table 14.2 of
your textbook.
Answers:
d = -0.81; Cohen’s guidelines indicate that this is a large effect. Note that it is common
practice to present the absolute value of the effect size. In this case, then, the value of d
would be reported as 0.81.
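For reference, d here is the mean difference divided by the square root of the pooled variance estimate from Exercise 3. A minimal Python sketch of that calculation, with a hypothetical helper named cohens_d:

```python
import math


def cohens_d(mean1, mean2, ss1, ss2, n1, n2):
    """Mean difference divided by the pooled standard deviation."""
    pooled_variance = (ss1 + ss2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_variance)


# Exercise 3 summary data: fourth graders (group 1) and sixth graders (group 2).
d = cohens_d(8.20, 10.73, 102.06, 168.57, 15, 15)
print(f"d = {d:.2f}; reported as {abs(d):.2f}")
```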
Exercise 9
Using the results of Exercise 3 and Exercise 8, write a statement reporting the outcome as
it might appear in a published article.
Answer:
Memory test scores for the fourth graders (X̄ = 8.20, s = 2.70) and sixth graders (X̄ =
10.73, s = 3.47) differed significantly [t(28) = -2.23, p < .05, d = 0.81].
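The standard deviations in this statement are not given in Exercise 3, but they follow from the sums of squares: s is the square root of SS divided by n - 1. A short Python check, with a hypothetical helper named sample_sd:

```python
import math


def sample_sd(ss, n):
    """Sample standard deviation from a sum of squares: sqrt(SS / (n - 1))."""
    return math.sqrt(ss / (n - 1))


s1 = sample_sd(102.06, 15)  # fourth graders
s2 = sample_sd(168.57, 15)  # sixth graders
print(f"s1 = {s1:.2f}, s2 = {s2:.2f}")
```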
Exercise 10
Mounsey, Vandehey, and Diekhoff (2013) investigated anxiety of working and non-working university students. The researchers found that the working students reported
more anxiety symptoms on the Beck Anxiety Inventory (M = 8.17, SD = 7.94) than the
non-working students (M = 4.40, SD = 4.97) and that the difference was statistically
significant [t(106) = –2.42, p < .05].
1. Although the authors do not explicitly report their statistical decision, based
on the information that they did provide, would they have rejected or retained
the null hypothesis?
2. Because the difference was statistically significant, can we conclude that the
difference between the sample means was large?
3. Because the difference was statistically significant, can we conclude that the
working students had an anxiety level that would be considered abnormally
high?
4. If the investigators repeated their study with another sample of working and
non-working university students drawn from the same population, should we
expect that they will obtain a similar result?
5. Your textbook tells you that large sample sizes can produce statistically
significant results that lack importance. The analysis of Beck Anxiety
Inventory scores was based on 78 working students and 30 non-working
students, or a total of 108 students. Would a sample size of 108 be considered
excessively large?
Answers:
1. The researchers would have rejected the null hypothesis.
2. A statistically significant difference is not necessarily a large difference. To
determine whether or not a difference is large, it is a good idea to calculate a
standardized effect size. Exercise 10 provides the information needed to
calculate Cohen’s d, which comes out to be approximately 0.52. Using
Cohen’s guidelines presented in section 14.9 of your textbook, we would
conclude that the effect size is medium.
3. Although the working students’ anxiety test mean is significantly higher than
that of the non-working students, we cannot automatically conclude that the
mean is in the abnormal range. In the article, the authors indicate that the
working students’ mean was in the mild anxiety range and the non-working
students’ mean was in the minimal anxiety range.
4. A statistically significant result implies that the observed outcome is reliable
and would likely reappear if the study were repeated.
5. A total sample size of 108 would not generally be considered excessively
large for testing for a standardized effect size in the vicinity of one-half of a
standard deviation. Samples of 500 or more students in each group (i.e., a total of
1,000 students), however, would be considered excessively large for testing
for an effect size approximately equal to 0.5.
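The value of d cited in answer 2 can be recovered from the reported means, standard deviations, and group sizes. The Python sketch below assumes the pooled-variance formula used in the earlier exercises and takes the difference as working minus non-working, so the t it prints is positive, whereas the article reports –2.42 with the opposite sign.

```python
import math

# Reported summary statistics: working students (group 1), non-working students (group 2).
n1, mean1, sd1 = 78, 8.17, 7.94
n2, mean2, sd2 = 30, 4.40, 4.97

df = n1 + n2 - 2
# Pooled variance rebuilt from the standard deviations: SS = (n - 1) * SD^2.
pooled_variance = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
d = (mean1 - mean2) / math.sqrt(pooled_variance)

standard_error = math.sqrt(pooled_variance / n1 + pooled_variance / n2)
t_observed = (mean1 - mean2) / standard_error

print(f"df = {df}, d = {d:.2f}, t = {t_observed:.2f}")
```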
Exercise 11
Juvonen, Wang, and Espinoza (2013) investigated social prominence, physical
aggression, and spreading rumors among adolescents at three time points. The outcome
measures were based on peer nominations made individually by each participating
student. Social prominence was assessed by asking students to name grade mates whom
they considered the “coolest”. Physical aggression was measured by asking students to
name grade mates who “start fights or push other kids around”. Spreading of rumors
was assessed by asking students to name grade mates who “spread nasty rumors about
other kids”. This exercise is only concerned with the results of analyses carried out on
the fall of 8th grade data that were designed to test for gender differences. The table
below presents a summary of the analysis results.
Summary of Fall of 8th Grade Results

                       Boys (n = 872)   Girls (n = 1,023)
Variable               Mean (SD)        Mean (SD)           t value   p value   Cohen’s d
Social prominence      7.30 (9.05)      7.00 (8.48)         .71       .48       .03
Physical aggression    6.65 (8.98)      3.06 (5.60)         9.93      < .001    .49
Spreading rumors       4.56 (6.01)      3.28 (4.87)         4.80      < .001    .24

1. For which of the variables was the difference between boys and girls statistically
significant? For the significant difference(s), whose mean was higher, the boys’
or the girls’?
2. Would the significant difference(s) be considered large?
3. Would the significant differences be considered important differences?
4. Would the sample size be considered excessively large?
Answers:
1. Boys and girls were significantly different on the measures of physical
aggression and spreading rumors. The boys’ mean was higher on both of
these measures.
2. No, they weren’t large. Cohen’s d for physical aggression was .49, a
moderate effect size, and Cohen’s d for spreading rumors was .24, a small
effect size.
3. In a study on gender differences in these variables, both moderate- and small-sized differences would probably be considered important. Even a nonsignificant difference, such as the gender difference regarding social
prominence, might be considered important if it fits with the authors’ theory
and expected outcomes.
4. The total sample size was 1,895 and this is indeed a very large sample. We
know that excessively large samples can produce statistically significant
results that are not important. In this study, however, the authors presented
Cohen’s d along with means, standard deviations, and p values. Therefore, we
have a lot of information to use when judging the importance of the results.
Exercise 12
Sherman, Haidt, and Coan (2009) were interested in the behavioral carefulness that is
needed when caring for a small, delicate child. In a research study, they addressed the
question of whether or not perceiving cuteness could enhance behavioral carefulness. In
Experiment 1, a total of 40 undergraduate women were randomly assigned to one of two
conditions to view slides of puppies and kittens (high cuteness) or dogs and cats (low
cuteness). The research participants performed a task using tweezers that was designed
to measure carefulness. This task was performed both before and after viewing the
slides. Grip strength and heart rate were also measured before and after viewing the
slides. The before-after differences in the measures were calculated for each participant
and mean differences were analyzed via independent samples t-tests. The results are
summarized in the table.
Summary of Task Performance by Condition

                              High cute (n = 20)   Low cute (n = 20)
Variable                      Mean (SD)            Mean (SD)           t value   p value   Cohen’s d
Change in task performance    1.80 (1.83)          .60 (1.97)          1.99      .05       .63
Change in grip strength       -4.35 (10.42)        -3.35 (6.98)        0.36      .72       .12
Change in heart rate          1.64 (3.22)          .02 (2.06)          1.89      .07       .61

1. For which of the variables was the difference between the high cute and low cute
conditions statistically significant? For the significant differences, which
condition had the larger mean change, high cute or low cute?
2. Would the significant differences be considered large?
3. Would the significant differences be considered important differences?
4. Would the sample size be considered excessively large?
Answers:
1. Assuming alpha was set equal to .05, change in task performance was the only
variable for which the difference was statistically significant. If alpha had been
set at .10, the change in heart rate would also have been statistically significant,
and in the article, the authors do indicate that heart rate change was statistically
significant. For both of these variables, the high cute mean change was
significantly larger than the low cute mean change.
2. Cohen’s d is useful for determining whether or not a difference is large, and for
both change in task performance and change in heart rate, d falls in the category
of a medium effect size.
3. The results would be considered important because they support the authors’
hypotheses. Namely, the authors concluded that the results provided evidence
that cuteness of viewed stimuli has an effect on behavioral carefulness. Viewing
high cute as compared to low cute stimuli resulted in greater improvement on a
task that required carefulness. The significant difference regarding change in
heart rate provided evidence that the improved behavior in the high cute condition
could be attributed to general physiological arousal.
4. The sample sizes would be considered quite small. In fact, with only 20
participants in each of the two conditions, it appears that the authors barely had
sufficient statistical power to find a moderate-sized effect size to be statistically
significant.
Exercise 13
McConnell, Brown, Shoda, Stayton, and Martin (2011) carried out an investigation of the
well-being benefits of pets for everyday people. In Study 1, they addressed the question:
Do pet owners enjoy better well-being than nonowners? A sample of 217 people
participated in the study. The participants completed a battery of instruments to provide
data on well-being, personality, and attachment style. The only Study 1 results presented
here concern the six well-being measures.
Summary of Well-Being Measures for Pet Owners and Nonowners

                                   Mean
                                   Owners      Nonowners
Variable                           (n = 167)   (n = 50)    t(215)   p value   Cohen’s d
Depression                         30.00       31.72       1.29     .198      .21
Loneliness                         38.64       41.64       1.79†    .075      .29
Self-esteem                        34.27       32.21       2.59*    .010      .42
Physical illnesses and symptoms    3.98        4.21        0.45     .653      .07
Subjective happiness               5.20        5.06        0.66     .510      .11
Exercise and fitness               4.40        3.94        2.64**   .009      .43

† p < .08.
* p < .05.
** p < .01.

1. For which of the variables was the difference between owners and nonowners
statistically significant? For the significant differences, which group had the more
positive outcome?
2. Would the significant differences be considered large?
3. Would the significant differences be considered important differences?
4. Would the sample size be considered excessively large?
Answers:
1. The variables for which the owner-nonowner difference was statistically
significant were:
a. Loneliness: Owners had the more positive outcome because their mean
score indicates less loneliness
b. Self-esteem: Owners had the more positive outcome because their mean
score indicates a higher level of self-esteem
c. Exercise and fitness: Owners had the more positive outcome because their
mean score indicates more exercise and better fitness.
2. The effect sizes associated with the significant differences range from .29 to .43.
According to Cohen’s guidelines, these effect sizes would be considered small.
However, researchers are encouraged to apply their own guidelines to the
interpretation of effect sizes based on results typically obtained in their fields. For
this field of study, effect sizes of .42 and .43 might well be considered moderate.
3. Effect sizes less than .20 are frequently considered to be extremely small or
trivial. The differences identified by the authors as statistically significant are all
greater than .20. So based on effect sizes, the differences would very likely be
considered to be important. Note that the authors reported three levels of
statistical significance: p < .08, p < .05, and p < .01. From this, we might assume
that the authors considered the effect size of .29 worthy of special attention and
decided to utilize the nonconventional alpha of .08 when judging statistical
significance.
4. The total sample size of 217 is not excessively large. Trivial differences were not
statistically significant.
Exercise 14
In their meta-analysis of the effects of intelligent tutoring systems (ITS) on students’
mathematical learning, Steenbergen-Hu and Cooper (2013) examined results in many
different categories such as math subject, ITS duration, schooling level, and research
design. To calculate effect sizes, the comparison group mean was subtracted from the
ITS mean and the difference was divided by the average of the two groups’ standard
deviations. With respect to research design, they found that the average effect size from
15 quasi-experimental studies was .09, 95% CI (.05, .14), and the average effect size
from 11 true experiments was -.01, 95% CI (-.07, .05).
1. Using Cohen’s guidelines, how would you interpret the mean effect size from
quasi-experimental studies?
2. Based on the 95% CI, what would we conclude regarding the statistical
significance of the quasi-experimental mean effect size?
3. Using Cohen’s guidelines, how would you interpret the mean effect size from true
experiments?
4. Based on the 95% CI, what would we conclude regarding the statistical
significance of the true experiment mean effect size?
Answers:
1. An effect size of .09 would be considered very small or trivial.
2. The 95% CI does not contain 0, therefore we would conclude that the average
effect of ITS tutoring on mathematical learning is significantly greater than 0.
3. An effect size of -.01 would be considered very small or trivial.
4. The 95% CI does contain 0, therefore we would conclude that the average effect
is not different from 0. In other words, we would conclude that the ITS tutoring
had no effect on mathematical learning.
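The reasoning in answers 2 and 4 reduces to checking whether 0 falls inside each 95% CI. A small Python sketch of that check follows; the helper name excludes_zero is hypothetical.

```python
def excludes_zero(lower, upper):
    """True when the whole interval lies on one side of 0."""
    return lower > 0 or upper < 0


# 95% confidence intervals reported for the two research designs.
intervals = {
    "quasi-experimental studies": (0.05, 0.14),
    "true experiments": (-0.07, 0.05),
}

results = {design: excludes_zero(lower, upper) for design, (lower, upper) in intervals.items()}

for design, significant in results.items():
    verdict = "significantly different from 0" if significant else "not significantly different from 0"
    print(f"{design}: {verdict}")
```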
Exercise 15
Steenbergen-Hu and Cooper (2013) also examined the effects of ITS for students in
elementary school, middle school and high school (see Exercise 14). The results are
shown below.
Elementary school    Mean ES = .41     95% CI: (-.01, .84)
Middle school        Mean ES = .09     95% CI: (.01, .17)
High school          Mean ES = -.09    95% CI: (-.17, -.02)

1. Using Cohen’s guidelines, how would you interpret the mean effect size for the
three school levels?
2. Based on the 95% CI, what would we conclude regarding the statistical
significance of the three mean effect sizes?
Answers:
1. The elementary school mean ES would be considered small or even perhaps
moderate. The middle school and high school ES’s would be considered very
small or trivial.
2. Even though the elementary school mean ES is the largest of the three, its 95% CI
contains 0. Therefore, we would conclude that the average effect of ITS on
elementary school students’ mathematical learning is not different from 0, or, in
other words, we would conclude that, on average, ITS tutoring has no effect on
elementary school students’ mathematical learning. On the other hand, 0 is not in
the 95% CI’s of either middle school or high school. We would conclude that, on
average, ITS tutoring has a very small positive effect on the mathematical
learning of middle school students and a very small negative effect on the
mathematical learning of high school students.
Exercise 16
Researchers have noted that pathological gambling and obsessive-compulsive disorder
(OCD) share some similarities such as being unable to delay or withhold repetitive
behaviors. Accordingly, Durdle, Gorey, and Stewart (2008) were interested in the
relations among pathological gambling and OCD. In their meta-analysis, they calculated
ES’s using the formula for Cohen’s d, subtracting the mean of the comparison group of
nonpathological gamblers from the mean of the pathological gambling group. They then
weighted the ES’s based on the number of study participants. The weighted ES’s are
presented here.
Obsessive-compulsive disorder comorbidity    Mean ES = .07     95% CI: (-.05, .19)
OCD in first-degree relatives                Mean ES = .08     95% CI: (-.03, .19)
Obsessive-compulsive personality disorder    Mean ES = .23     95% CI: (-.07, .35)
Obsessive-compulsive traits                  Mean ES = 1.01    95% CI: (.88, 1.14)

1. Using Cohen’s guidelines, how would you interpret the mean effect size for the
four areas?
2. Based on the 95% CI, what would we conclude regarding the statistical
significance of the four mean effect sizes?
Answers:
1. The mean ES’s of .07 and .08 would be considered very small or trivial. The
mean ES of .23 would be considered small. The mean ES of 1.01 would be
considered large. Note that an ES of 1.01 is equivalent to approximately one standard
deviation unit.
2. Only the mean ES for obsessive-compulsive traits is statistically significant,
because 0 is not contained in the 95% CI. We would conclude that pathological
gamblers show more obsessive-compulsive traits than nonpathological gamblers.
References
Durdle, H., Gorey, K. M., & Stewart, S. H. (2008). A meta-analysis examining the
relations among pathological gambling, obsessive-compulsive disorder, and
obsessive-compulsive traits. Psychological Reports, 103, 485-498.
Juvonen, J., Wang, Y., & Espinoza, G. (2013). Physical aggression, spreading of
rumors, and social prominence in early adolescence: Reciprocal effects supporting
gender similarities? Journal of Youth and Adolescence, 42, 1801-1810.
McConnell, A. R., Brown, C. M., Shoda, T. M., Stayton, L. E., & Martin, C. E. (2011).
Friends with benefits: On the positive consequences of pet ownership. Journal of
Personality and Social Psychology, 101, 1239-1252.
Mounsey, R., Vandehey, M. A., & Diekhoff, G. M. (2013). Working and non-working
university students: Anxiety, depression, and grade point average. College Student
Journal, 47, 379-389.
Sherman, G. D., Haidt, J., & Coan, J. A. (2009). Viewing cute images increases
behavioral carefulness. Emotion, 9, 282-286.
Steenbergen-Hu, S., & Cooper, H. (2013). A meta-analysis of the effectiveness of
intelligent tutoring systems on K-12 students’ mathematical learning. Journal of
Educational Psychology, 105, 970-987.
Willoughby, T., Porter, L., Belsito, L., & Yearsley, T. (1999). Use of elaboration
strategies by students in grades two, four, and six. Elementary School Journal, 99,
221-231.