Answer Key
CEP 933 Assignment One
Due January 31 and February 1, 2000
1) For this question you will use data from the article by Hannaway and Talbert, found at
the front of your class packet as well as SPSS. In Table I Hannaway and Talbert present
descriptive data on the variable "Teacher community" (TC). Use the data in Table I to
examine the mean difference on TC at the α = .10 level of significance. Compare the means
for Urban versus Suburban teachers.
            n      Mean    Standard Deviation
Urban       72     26.4    3.2
Suburban    126    27.2    2.9
a. For this comparison,
1. Pose the null and alternative hypotheses about the comparison of the two means. Use
both symbols and words.
$H_0: \mu_u = \mu_s$
The mean ratings of teacher community for the population of urban schools are equal to
the mean ratings of teacher community for the population of suburban schools.
$H_1: \mu_u \ne \mu_s$
The mean ratings of teacher community for the population of urban schools are different
from the mean ratings of teacher community for the population of suburban schools.
PS. The underlying assumption in statistical testing is that the null hypothesis is assumed
to be true until it is shown to be false. So, always state the null hypothesis as if it were
true.
2. Estimate the pooled variance of the scores.
$$S_p^2 = \frac{(n_u - 1) s_u^2 + (n_s - 1) s_s^2}{n_u + n_s - 2} = \frac{(72 - 1)(3.2)^2 + (126 - 1)(2.9)^2}{72 + 126 - 2} = 9.073$$
P.S. The pooled standard deviation is calculated by:
$$S_p = \sqrt{S_p^2} = \sqrt{9.073} = 3.012$$
3. What test statistic will you use? What are the appropriate degrees of freedom? What
would the critical value be for α = .10?
Test: t statistic
Degrees of Freedom: $n_u + n_s - 2 = 72 + 126 - 2 = 196$
t-critical for a two-sided test at α = 0.1 (Table C, p. 619) = ±1.658
PS. For a more conservative analysis we use df = 120 (t = 1.658) instead of df = ∞
(t = 1.645). When the df is much larger than 120, perhaps df = 300 or larger, we use the
bottom line of the table, with df = ∞.
4. Compute the test statistic and determine its probability (p value) as best you can.
$$t = \frac{\bar{x}_u - \bar{x}_s}{S_p \sqrt{\frac{1}{n_u} + \frac{1}{n_s}}} = \frac{26.4 - 27.2}{3.0 \sqrt{\frac{1}{72} + \frac{1}{126}}} = -1.802$$
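The same statistic can be reproduced from the summary values in Table I. The sketch below uses a hypothetical helper, `pooled_t`, and keeps the unrounded pooled standard deviation, so the result may differ from the hand calculation in the third decimal place.

```python
import math

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Two-sample t statistic with a pooled variance estimate; returns (t, df)."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    se = sp * math.sqrt(1 / n1 + 1 / n2)  # standard error of the mean difference
    return (mean1 - mean2) / se, df

# Urban versus suburban teachers, values from Table I
t, df = pooled_t(26.4, 3.2, 72, 27.2, 2.9, 126)
print(f"t = {t:.2f}, df = {df}")
```

The sign is negative because the urban mean is the smaller of the two; for the two-tailed test only the absolute value matters.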
For the two-tailed hypothesis we can use the absolute value of the t-statistic, |t| = 1.802.
Assuming the null hypothesis is true, the probability of obtaining a t-statistic as large as or larger than 1.802 in absolute value is less than 0.1 and more than 0.05. This is
because |t| = 1.802 lies between 1.645 (p = 0.1) and 1.960 (p = 0.05), as shown in the table
below:
Two-tailed significance level (n > 120)
p = 0.1     1.645
p = ?       1.802
p = 0.05    1.960
We can write 0.10 > p > 0.05, or we can interpolate. Since 1.802 is about halfway
between 1.645 and 1.960, as the computation below shows, we can guess that p ≈ .075.
$$\frac{\text{Distance from 1.802 to 1.645}}{\text{Distance from 1.96 to 1.645}} = \frac{1.802 - 1.645}{1.96 - 1.645} = \frac{0.157}{0.315} = .498$$
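The interpolation can be written out as a short Python sketch. The helper name `interpolate_p` is illustrative, and linear interpolation is only a rough approximation because the tail of the t distribution is not a straight line between tabled values.

```python
def interpolate_p(t_obs, t_lo, p_lo, t_hi, p_hi):
    """Linearly interpolate a p value between two tabled critical values."""
    fraction = (t_obs - t_lo) / (t_hi - t_lo)  # how far t_obs sits between the two
    return p_lo + fraction * (p_hi - p_lo)

# |t| = 1.802 lies between 1.645 (p = .10) and 1.960 (p = .05)
p_approx = interpolate_p(1.802, 1.645, 0.10, 1.960, 0.05)
print(f"p is roughly {p_approx:.3f}")
```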
b. Next you will use the SPSS school level NELS data set to make a similar comparison of
urban and suburban schools on teacher community (called tchcomm in our data set). The
urbanicity variable is called g10urban, and the groups to be compared are coded urban =1
and suburban=2.
1. For this comparison test the same null and alternative hypotheses you tested in part a, and include
relevant SPSS output.
Group Statistics

Teacher Community (High values = lots o' community)

URBANICITY OF THE STUDENT'S SCHOOL    N      Mean     Std. Deviation    Std. Error Mean
URBAN                                 327    .2567    6.9513            .3844
SUBURBAN                              504    .1241    5.9923            .2669
Independent Samples Test

Teacher Community (High values = lots o' community)

Levene's Test for Equality of Variances: F = 9.250, Sig. = .002

t-test for Equality of Means
                               t       df         Sig. (2-tailed)    Mean Difference    Std. Error Difference    95% CI Lower    95% CI Upper
Equal variances assumed        .292    829        .770               .1326              .4535                    -.7576          1.0227
Equal variances not assumed    .283    622.369    .777               .1326              .4680                    -.7865          1.0516
2. Explain which test statistic on the output gives you the p value for an appropriate test.
If we use α = .05 for Levene’s test, we observe that the p value (Sig.) is 0.002, so p < .05. We
conclude that equal variances cannot be assumed, so the appropriate test is the one in the
"Equal variances not assumed" row, which gives a p-value of 0.777 for the t-test.
If we use a very small α (α = .001) for Levene’s test, we can assume equal variances, and the
p-value for the t-test is then 0.770.
3. What is your decision about the hypotheses for these data?
We may use the same significance level (α = 0.1) as in the previous study to compare the
results. Here t = 0.283 with p = .777, so p > .05. The p-value of 0.777 is also larger than
0.1, so at this significance level (α = 0.1) we do not reject the null hypothesis that
“the mean ratings of teacher community for the population of urban schools are equal to
the mean ratings of teacher community for the population of suburban schools”. We
proceed as if the null hypothesis $H_0$ were true.
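The "equal variances not assumed" row of the SPSS output can be reproduced from the Group Statistics alone. The sketch below assumes that row is the usual Welch test with Satterthwaite degrees of freedom; the function name `welch_t` is illustrative.

```python
import math

def welch_t(mean1, s1, n1, mean2, s2, n2):
    """Unequal-variance t statistic with Welch-Satterthwaite df; returns (t, df, se)."""
    v1, v2 = s1**2 / n1, s2**2 / n2      # squared standard error of each mean
    se = math.sqrt(v1 + v2)              # standard error of the difference
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (mean1 - mean2) / se, df, se

# Urban versus suburban schools, values from the Group Statistics table
t, df, se = welch_t(0.2567, 6.9513, 327, 0.1241, 5.9923, 504)
print(f"t = {t:.3f}, df = {df:.1f}, SE = {se:.4f}")
```

Unlike the pooled test, the degrees of freedom here are not a whole number, which is why SPSS reports a fractional df for this row.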
c. Interpret the two test results you found in parts a and b.
Study    Group       n      x̄       s       |x̄_u − x̄_s|    s_p    t
H&T      Urban       72     26.4    3.2     0.8            3.0    1.80
H&T      Suburban    126    27.2    2.9
NELS     Urban       327    .26     6.95    0.13           6.4    .283
NELS     Suburban    504    .12     5.99
1. How do the components of the t tests (differences in means, variance estimates, sample
sizes) contribute to the difference in p values? (Note: the scales of the teacher community
variables in H&T and SPSS are different!!)
The differences between the two tests are:
- the sample size,
In the NELS study, the urban and suburban samples are larger. This means that even for a similar-sized effect, we will get a larger t value.
- the differences between the means,
The mean differences are on different scales. However, we can use the effect size d to compare the
two mean differences.
$$d_{\text{H\&T study}} = \frac{|\bar{x}_u - \bar{x}_s|}{S_p} = \frac{0.8}{3.0} = 0.27$$
$$d_{\text{NELS study}} = \frac{|\bar{x}_u - \bar{x}_s|}{S_p} = \frac{0.13}{6.4} = 0.02$$
Even though the NELS mean difference appears larger, when compared to the scale of the scores and the
spread seen, we conclude that the difference is quite small (0.02). The H&T effect shows over one-fourth of
a standard deviation unit difference (0.27) versus a difference of 0.02 for the NELS study.
- the pooled variance,
The variation (scale) differences are reflected in the effect sizes.
- the differences in t values and the p-values,
The critical t values differ slightly because of the differences in sample sizes.
However, our t table does not show t values for df = 196 and df = 829. We know that
$t_{df=196} \approx t_{df=829}$. For both studies (H&T and NELS) let us use df = ∞. Then the
critical t values differ only if we use different α values. At α = 0.1 for the H&T study our
t-critical = ±1.645, whereas at α = 0.05 for the NELS study t-critical = ±1.96.
A better comparison would use the same significance level, α = 0.1, for both studies.
In our assignment, this does not change the results of our analysis, since the conclusion for
the NELS study is the same using α = 0.1 or α = 0.05.
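The two effect sizes discussed above can be recomputed from the summary statistics of each study. This is a minimal sketch with an illustrative helper, `effect_size_d`, which takes the absolute mean difference and divides it by the pooled standard deviation.

```python
import math

def effect_size_d(mean1, s1, n1, mean2, s2, n2):
    """Absolute standardized mean difference using the pooled SD."""
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return abs(mean1 - mean2) / sp

# H&T values come from Table I, NELS values from the SPSS Group Statistics
d_ht = effect_size_d(26.4, 3.2, 72, 27.2, 2.9, 126)
d_nels = effect_size_d(0.2567, 6.9513, 327, 0.1241, 5.9923, 504)
print(f"H&T d = {d_ht:.2f}, NELS d = {d_nels:.2f}")
```

Because d is expressed in standard deviation units, it allows the two studies to be compared even though their teacher community scales differ.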
2. What do the results tell you about differences in teacher community between the two different locations in
these two studies?
There are moderately large and significant differences between teacher community
levels for suburban versus urban schools in the H&T study, but there are no clear
differences in the NELS study.
3. Can you think of any explanations of these findings?
The studies may address slightly different populations. We know NELS is a
national sample collected in 1988 and H&T used High School and Beyond data from
1982-84. It is also a national sample but from a few years earlier.
The urbanity categories may not be well defined in the two studies. So, for
instance, the label “suburban” may be applied to cases in NELS that may have been
considered urban in the H&T study.
Furthermore, we do not know exactly how teacher community was measured in
both studies. In NELS, teacher community is based on 10 items (see pp.166-7 in your
course packet). H&T’s measure had 24 items, and we are not told what response scale
was used. Differences in definitions of teacher community and measurement may be
related to the findings of the analysis.
2) A study has been done to examine the effects of different types of reinforcement on canine learning. The
outcome is a score on a dog obedience test (higher scores are better).
Two types of reinforcement were used: For the first group, success is rewarded with only verbal praise, and
in the second group, both verbal praise and "doggie treats" were given to the dog when it obeyed commands.
The results are summarized here:
Group             n     Mean     Standard deviation
Verbal praise     12    19.44    5.06
Praise + treat    10    23.14    5.84
Pooled variance ($S_p^2$): 29.43
a. (15 points) Test to see if the population mean of the praise + treat group is higher. (Do dogs receiving
praise + treats learn better?) Use a significance level of .05. Be sure to specify both the null and alternative
hypotheses, and to show all your work.
Test: t-statistic
Hypothesis:
$H_0: \mu_{vp+t} \le \mu_{vp}$ or $H_0: \mu_{vp+t} - \mu_{vp} \le 0$
The mean rating of the population of dogs receiving verbal praise and treats on the
obedience test is less than or equal to the mean rating of the population of dogs receiving
verbal praise on the same obedience test.
$H_1: \mu_{vp+t} > \mu_{vp}$ or $H_1: \mu_{vp} < \mu_{vp+t}$ or $H_1: \mu_{vp+t} - \mu_{vp} > 0$
The mean rating of the population of dogs receiving verbal praise and treats on the
obedience test is higher than the mean rating of the population of dogs receiving verbal
praise on the same obedience test.
Degrees of Freedom: $n_{vp} + n_{vp+t} - 2 = 12 + 10 - 2 = 20$
t-critical for 0.05 one sided test (directional test) = 1.725
t-statistic computation (t-observed):
$$t = \frac{\bar{x}_{vp+t} - \bar{x}_{vp}}{S_p \sqrt{\frac{1}{n_{vp+t}} + \frac{1}{n_{vp}}}} = \frac{23.14 - 19.44}{5.425 \sqrt{\frac{1}{10} + \frac{1}{12}}} = \frac{3.7}{5.425 \times 0.428} = 1.594$$
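Because the observed t (1.594) is smaller than the critical value (1.725), we do not reject
the null hypothesis. The data do not provide sufficient evidence that the mean rating of the
population of dogs receiving verbal praise and treats is higher than the mean rating of the
population of dogs receiving verbal praise on the same obedience test, so we proceed as if
the null hypothesis were true.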
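The directional test can be checked with a short Python sketch. The critical value 1.725 is taken from the t table as quoted above rather than computed, and the variable names are illustrative.

```python
import math

n_vp, mean_vp = 12, 19.44      # verbal praise only
n_vpt, mean_vpt = 10, 23.14    # praise + treat
sp = math.sqrt(29.43)          # pooled SD from the pooled variance in the table

se = sp * math.sqrt(1 / n_vpt + 1 / n_vp)   # standard error of the difference
t_obs = (mean_vpt - mean_vp) / se
t_crit = 1.725                 # one-sided, alpha = .05, df = 20 (from the t table)

print(f"t = {t_obs:.2f}, reject H0: {t_obs > t_crit}")
```

For a one-sided test the null hypothesis is rejected only when t falls above the critical value in the direction stated by the alternative hypothesis.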
b. Make a 95% confidence interval for the difference between the two means.
$$CI_{.95} = (\bar{x}_{vp+t} - \bar{x}_{vp}) \pm t_{critical} \cdot S_{\bar{x}_{vp+t} - \bar{x}_{vp}}$$
thus,
$$(\bar{x}_{vp+t} - \bar{x}_{vp}) - t_{critical} \cdot S_{\bar{x}_{vp+t} - \bar{x}_{vp}} \le \mu_{vp+t} - \mu_{vp} \le (\bar{x}_{vp+t} - \bar{x}_{vp}) + t_{critical} \cdot S_{\bar{x}_{vp+t} - \bar{x}_{vp}}$$
We need to calculate: $\bar{x}_{vp+t} - \bar{x}_{vp} = 23.14 - 19.44 = 3.7$
Use the two-tailed t value (p = 0.05, df = 20): $t_{critical} = 2.086$
We use the pooled standard error
$$S_{\bar{x}_{vp+t} - \bar{x}_{vp}} = S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 2.32$$
Substituting in the CI formula:
$$3.7 - 2.086 \times 2.32 \le \mu_{vp+t} - \mu_{vp} \le 3.7 + 2.086 \times 2.32$$
$$-1.14 \le \mu_{vp+t} - \mu_{vp} \le 8.54$$
Because the interval includes zero, we do not reject the null hypothesis. The data do not
show that the mean rating of the population of dogs receiving verbal praise and treats
differs from the mean rating of the population of dogs receiving verbal praise on the same
obedience test.
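A short Python sketch of the interval follows. It assumes the two-tailed critical value 2.086 from the t table and keeps the unrounded standard error, so the limits can differ from the hand calculation by about 0.01.

```python
import math

diff = 23.14 - 19.44                                 # praise + treat minus verbal praise
se = math.sqrt(29.43) * math.sqrt(1 / 10 + 1 / 12)   # pooled standard error
t_crit = 2.086                                       # two-tailed, alpha = .05, df = 20

lower = diff - t_crit * se
upper = diff + t_crit * se
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
print(f"interval contains zero: {lower < 0 < upper}")
```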
c. (10 points) Compute the effect size to show how different the groups are in
standardized units. Comment on the effect-size value. Does it seem consistent with
(i.e., does it agree with) the results of the t test and confidence interval? Explain any
differences you see.
$$d = \frac{\bar{x}_{vp+t} - \bar{x}_{vp}}{S_p} = \frac{23.14 - 19.44}{5.425} = 0.68$$
This is considered to be a moderate effect size according to Shavelson (1995, p.317-318),
who defines an effect size around 0.2 as small, 0.5 as moderate and 0.8 as a large effect
size. d = 0.68 means nearly 2/3 of a standard deviation. This result is consistent with the
t-test and with the CI. If the sample were bigger, the consistency with the t-test would be
clearer. The CI shows us what the estimate is in the scale points [-1.14,8.54] while the
effect size is scale free.
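The remark about sample size can be made concrete, because the pooled t statistic equals d divided by the square root of (1/n1 + 1/n2). The sketch below compares the study's sample sizes with a purely hypothetical study four times as large; the larger sample sizes are an assumption made for illustration only.

```python
import math

def t_from_d(d, n1, n2):
    """t statistic implied by a standardized difference d and two group sizes."""
    return d / math.sqrt(1 / n1 + 1 / n2)

d = 3.7 / 5.425                   # effect size from the text, about 0.68
t_actual = t_from_d(d, 10, 12)    # the study's sample sizes
t_larger = t_from_d(d, 40, 48)    # hypothetical: four times as many dogs per group
print(f"t = {t_actual:.2f} (n = 10, 12) versus t = {t_larger:.2f} (n = 40, 48)")
```

With the same effect size, quadrupling both groups doubles t, which would move the result past the one-sided critical value of 1.725 used in part a.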
3) Anne, who is a science teacher, is interested in children's understanding of the life cycles of butterflies and
moths. She gave her class a lesson on this topic, followed by a 25 item test. A summary of information about
their performance follows.
Test results for 30 students in Anne’s class
Minimum score                    2
Maximum score                    25
Sum of scores                    483
Sum of squared scores (ΣX²)      8037.3
Median                           19
a. Use the data to obtain two measures of location for the students' performance. Compare the two
measures. Does the typical student have a strong understanding of life cycles of moths and
butterflies? Justify your answer.
Median = 19
Mean = $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{483}{30} = 16.1$
The median is bigger than the mean. This implies that there are some extreme low
scores that are decreasing the mean.
The typical student has a moderate understanding of the content because 50% of
the students presented scores above the median (19 points), which is only moderately
close to the maximum score possible. The median score of 19 out of 25 points is 76% of
the total score, while the mean is only 64%. If we consider a score of 80% to represent
mastery of the content, the typical student has not achieved mastery.
b. Now obtain two measures of spread for Anne’s students.
Range = Highest score – Lowest score = 25 – 2 = 23
Variance
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{\sum (X^2 - 2X\bar{X} + \bar{X}^2)}{n - 1} = \frac{\sum X^2 - 2\bar{X}\sum X + n\bar{X}^2}{n - 1} = \frac{8037.3 - 2 \times 16.1 \times 483 + 30 \times (16.1)^2}{30 - 1} = 9$$
You can also compute the variance using:
$$s^2 = \frac{\sum x_i^2 - n\bar{x}^2}{n - 1} = \frac{8037.3 - 30 \times (16.1)^2}{30 - 1} = 9$$
Standard Deviation = $s = \sqrt{s^2} = \sqrt{9} = 3$
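The mean, variance, and standard deviation can all be recovered from the three summary values in the table. This is a minimal sketch using the computational formula for the variance; the variable names are illustrative.

```python
import math

n, sum_x, sum_x2 = 30, 483, 8037.3   # class size, sum of scores, sum of squared scores

mean = sum_x / n
variance = (sum_x2 - n * mean**2) / (n - 1)   # computational formula
sd = math.sqrt(variance)
print(f"mean = {mean:.1f}, variance = {variance:.2f}, SD = {sd:.2f}")
```

Note that n here is the number of students (30), not the number of test items (25).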
4) Another science teacher (Betty) gave the same test and obtained these summary data with SPSS:
Statistics: SCIENCE
N              Valid      20
               Missing    0
Mean                      15.50
Median                    17.50
Mode                      15
Std. Deviation            2.03
Variance                  4.121
Minimum                   6
Maximum                   20
Percentiles    25         12.5
               50         17.5
               75         19
a. Discuss the "typical" performance of the two classes. Which location statistic seems
most appropriate for describing the performance of these two sets of students? Does there
appear to be a difference between the "average" levels of understanding for students of
Teachers Anne and Betty?
                      Anne’s class    Betty’s class
n                     30              20
Median                19.0            17.5
Mean                  16.1            15.5
Standard Deviation    3.00            2.03
The median is a better measure of central tendency since both the distributions are
negatively skewed. The medians are quite close: less than 2 points apart (only 6%
different on the percentage scale). The average level of understanding in Anne’s class is
higher than in Betty’s class, because the mean and the median in Anne’s class are higher.
However, the differences are small, whether we examine means or medians. The
difference between means is small relative to the standard deviations and to the pooled
standard deviation of 2.65, computed as follows:
$$S_p = \sqrt{\frac{(n_A - 1) s_A^2 + (n_B - 1) s_B^2}{n_A + n_B - 2}} = \sqrt{\frac{(30 - 1)(3.0)^2 + (20 - 1)(2.0)^2}{30 + 20 - 2}} = \sqrt{7.02} = 2.65$$
where A denotes Anne’s class and B denotes Betty’s class.
This gives us an effect size of 0.6 / 2.65 = 0.23.
b. For Anne’s students, Q1 = 14, and Q3 = 22. Make a graph that compares the
distributions of scores for students of Anne and Betty. Discuss the two
distributions. Does your assessment of the performance of the two classes change?
[Figure: distributions of scores for Anne's Class and Betty's Class, plotted on a score axis from 0 to 30]
No, we maintain our conclusion. Even though the spread of the two distributions is fairly
similar, scores in Anne’s class seem to be more skewed. Anne’s class still does better
than Betty’s class because 50% of Anne’s students are above 19 points,
whereas only around 25% of Betty’s students perform above the median of Anne’s students.
Betty’s students have a top score that seems to show a ceiling effect at about 20 points.
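The comparison rests on the five-number summary of each class, which can be laid out in a short Python sketch. The values come from the tables and quartiles given above; the helper name `iqr` is illustrative.

```python
# (minimum, Q1, median, Q3, maximum) for each class
anne = (2, 14, 19, 22, 25)
betty = (6, 12.5, 17.5, 19, 20)

def iqr(summary):
    """Interquartile range (Q3 - Q1) from a five-number summary."""
    return summary[3] - summary[1]

for name, summary in (("Anne", anne), ("Betty", betty)):
    print(f"{name}: range = {summary[4] - summary[0]}, IQR = {iqr(summary)}")

# Betty's third quartile equals Anne's median, so roughly a quarter of
# Betty's students score at or above the middle of Anne's class.
print(f"Betty's Q3 equals Anne's median: {betty[3] == anne[2]}")
```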