Answer Key
CEP 933 Assignment One
Due January 31 and February 1, 2000

1) For this question you will use data from the article by Hannaway and Talbert, found at the front of your class packet, as well as SPSS. In Table I, Hannaway and Talbert present descriptive data on the variable "Teacher community" (TC). Use the data in Table I to examine the mean difference on TC at the α = .10 level of significance. Compare the means for urban versus suburban teachers.

| | Urban | Suburban |
|---|---|---|
| n | 72 | 126 |
| Mean | 26.4 | 27.2 |
| Standard deviation | 3.2 | 2.9 |

a. For this comparison,

1. Pose the null and alternative hypotheses about the comparison of the two means. Use both symbols and words.

$$H_0: \mu_u = \mu_s$$

The mean rating of teacher community for the population of urban schools is equal to the mean rating of teacher community for the population of suburban schools.

$$H_1: \mu_u \neq \mu_s$$

The mean rating of teacher community for the population of urban schools is different from the mean rating of teacher community for the population of suburban schools.

PS. The underlying assumption in statistical testing is that the null hypothesis is true until it is shown to be false. So, always state the null hypothesis as if it were true.

2. Estimate the pooled variance of the scores.

$$S_p^2 = \frac{(n_u - 1)s_u^2 + (n_s - 1)s_s^2}{n_u + n_s - 2} = \frac{(72 - 1)(3.2)^2 + (126 - 1)(2.9)^2}{72 + 126 - 2} = \frac{1778.29}{196} = 9.073$$

PS. The pooled standard deviation is calculated by:

$$S_p = \sqrt{S_p^2} = \sqrt{9.073} = 3.012$$

3. What test statistic will you use? What are the appropriate degrees of freedom? What would the critical value be for α = .10?

Test: t statistic
Degrees of freedom: $n_u + n_s - 2 = 72 + 126 - 2 = 196$
Critical t for a two-sided test at α = .10 (Table C, p. 619): 1.658

PS. For a more conservative analysis we use the row for df = 120 (t = 1.658) instead of df = ∞ (t = 1.645). When df is much larger than 120, perhaps df = 300 or larger, we use the bottom line with df = ∞.

4. Compute the test statistic and determine its probability (p value) as best you can.
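The arithmetic in parts 2 to 4 can be checked with a short script. The following is a minimal Python sketch (standard library only) that assumes the Table I summary values shown above; the helper names `pooled_variance` and `pooled_t` are illustrative, not from any package. Because the standard library has no t distribution, the sketch approximates the two-tailed p value with the standard normal, which corresponds to the df = ∞ row of the t table.

```python
import math
from statistics import NormalDist


def pooled_variance(n1, s1, n2, s2):
    """Pooled variance: df-weighted average of two sample variances (df = n1 + n2 - 2)."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)


def pooled_t(mean1, mean2, n1, n2, sp):
    """Two-sample t statistic using the pooled standard deviation sp."""
    return (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))


# Hannaway and Talbert, Table I: urban (u) vs suburban (s) teacher community
n_u, mean_u, sd_u = 72, 26.4, 3.2
n_s, mean_s, sd_s = 126, 27.2, 2.9

var_p = pooled_variance(n_u, sd_u, n_s, sd_s)
sp = math.sqrt(var_p)
t = pooled_t(mean_u, mean_s, n_u, n_s, sp)
df = n_u + n_s - 2

# Large-df approximation: two-tailed p from the standard normal (df = infinity row)
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))

print(f"pooled variance = {var_p:.3f}, Sp = {sp:.3f}, df = {df}")
print(f"t = {t:.3f}, approximate two-tailed p = {p_approx:.3f}")
```

With unrounded intermediate values the script gives a pooled variance of about 9.073, Sp of about 3.012, t of about -1.80 and an approximate p of about .072, in line with the hand calculation that follows and its interpolated p of about .075.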
$$t = \frac{\bar{x}_u - \bar{x}_s}{S_p\sqrt{\frac{1}{n_u} + \frac{1}{n_s}}} = \frac{26.4 - 27.2}{3.0\sqrt{\frac{1}{72} + \frac{1}{126}}} \approx -1.802$$

For the two-tailed hypothesis we can use the absolute value of the t statistic, |t| = 1.802. Assuming the null hypothesis is true, the probability of obtaining a t statistic as large as or larger than 1.802 in absolute value is less than .10 and more than .05. This is because |t| = 1.802 lies between 1.645 (p = .10) and 1.960 (p = .05), as shown in the table below (df = ∞ row, since df = 196 is well above 120):

| Two-tailed significance level | Critical t (df > 120) |
|---|---|
| p = .10 | 1.645 |
| p = .05 | 1.960 |
| p = ? | 1.802 |

We can write .10 > p > .05, or we can interpolate. Since 1.802 is about halfway between 1.645 and 1.960, as the computation below shows, p is about halfway between .10 and .05, so we can guess that p ≈ .075.

$$\frac{\text{distance from 1.802 to 1.645}}{\text{distance from 1.960 to 1.645}} = \frac{1.802 - 1.645}{1.960 - 1.645} = \frac{0.157}{0.315} = 0.498$$

b. Next you will use the SPSS school-level NELS data set to make a similar comparison of urban and suburban schools on teacher community (called tchcomm in our data set). The urbanicity variable is called g10urban, and the groups to be compared are coded urban = 1 and suburban = 2.

1. For this comparison test the same null and alternative hypotheses you tested in part a, and include relevant SPSS output.

Group Statistics: Teacher Community (High values=lots o' community)

| Urbanicity of the student's school | N | Mean | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|
| URBAN | 327 | .2567 | 6.9513 | .3844 |
| SUBURBAN | 504 | .1241 | 5.9923 | .2669 |

Independent Samples Test: Teacher Community (High values=lots o' community)

| | Levene's F | Levene's Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|---|---|---|
| Equal variances assumed | 9.250 | .002 | .292 | 829 | .770 | .1326 | .4535 | -.7576 | 1.0227 |
| Equal variances not assumed | | | .283 | 622.369 | .777 | .1326 | .4680 | -.7865 | 1.0516 |

2. Explain which test statistic on the output gives you the p value for an appropriate test.

If we use α = .05 for Levene's test, we observe that its p value (Sig.) is .002, so p < .05.
We therefore reject the assumption of equal variances and read the "Equal variances not assumed" row, which gives a p value of .777 for the t test. If we used a very small α (say .001) for Levene's test, we could assume equal variances and would read the first row instead (p = .770).

3. What is your decision about the hypotheses for these data?

We may use the same significance level (α = .10) as in the previous study to compare the results. Here t = .283 with p = .777, so p > .05, and the p value of .777 is also larger than .10. Using α = .10, therefore, we do not reject the null hypothesis that "the mean rating of teacher community for the population of urban schools is equal to the mean rating of teacher community for the population of suburban schools". We proceed as if the null hypothesis H0 were true.

c. Interpret the two test results you found in parts a and b.

| | H&T study | NELS study |
|---|---|---|
| n (urban, suburban) | 72, 126 | 327, 504 |
| Mean (urban, suburban) | 26.4, 27.2 | .26, .12 |
| s (urban, suburban) | 3.2, 2.9 | 6.95, 5.99 |
| Absolute mean difference | 0.8 | 0.13 |
| Sp | 3.0 | 6.4 |
| t (absolute value) | 1.80 | .283 |

1. How do the components of the t tests (differences in means, variance estimates, sample sizes) contribute to the difference in p values? (Note: the scales of the teacher community variables in H&T and SPSS are different!)

The two tests differ in the following components.

Sample size. In the NELS study, the urban and suburban samples are larger. This means that even for a similar-sized effect, we would get a larger t value.

Difference between the means. The mean differences are on different scales. However, we can use the effect size d to compare the two mean differences:

$$d_{\text{H\&T}} = \frac{|\bar{x}_u - \bar{x}_s|}{S_p} = \frac{0.8}{3.0} = 0.27 \qquad d_{\text{NELS}} = \frac{|\bar{x}_u - \bar{x}_s|}{S_p} = \frac{0.13}{6.4} = 0.02$$

The NELS mean difference (0.13 points) is small in raw terms, and when compared to the scale of the scores and their spread it is tiny (d = 0.02). The H&T difference is over one-fourth of a standard deviation (d = 0.27), versus 0.02 for the NELS study.

Pooled variance. The differences in variation (scale) are reflected in the effect sizes.
Critical t values and p values. The critical t values differ slightly because the sample sizes, and therefore the degrees of freedom, differ. However, our t table does not show t values for df = 196 or df = 829, and we know that $t_{df=196} \approx t_{df=829}$. For both studies (H&T and NELS), let us use the df = ∞ row. Then the critical values differ only if we use different α levels: at α = .10 for the H&T study the critical t is 1.645, whereas at α = .05 for the NELS study it is 1.96. A better comparison uses the same significance level, α = .10, for both studies. In our assignment this does not change the results of the analysis, since the NELS conclusion (p = .777, do not reject H0) is the same whether we use α = .10 or α = .05.

2. What do the results tell you about differences in teacher community between the two different locations in these two studies?

In the H&T study there is a modest difference in teacher community between suburban and urban schools (d = 0.27) that is significant at α = .10, but there is no clear difference in the NELS study.

3. Can you think of any explanations of these findings?

The studies may address slightly different populations. NELS is a national sample collected in 1988, while H&T used High School and Beyond data from 1982-84, also a national sample but from a few years earlier. The urbanicity categories may also not be defined consistently across the two studies; for instance, the label "suburban" may be applied to cases in NELS that would have been considered urban in the H&T study. Furthermore, we do not know exactly how teacher community was measured in both studies. In NELS, teacher community is based on 10 items (see pp. 166-7 in your course packet). H&T's measure had 24 items, and we are not told what response scale was used. Differences in the definition and measurement of teacher community may be related to the findings.

2) A study has been done to examine the effects of different types of reinforcement on canine learning.
The outcome is a score on a dog obedience test (higher scores are better). Two types of reinforcement were used: in the first group, success is rewarded with verbal praise only, and in the second group, the dog received both verbal praise and "doggie treats" when it obeyed commands. The results are summarized here:

| Group | n | Mean | Standard deviation |
|---|---|---|---|
| Verbal praise | 12 | 19.44 | 5.06 |
| Praise + treat | 10 | 23.14 | 5.84 |

Pooled variance ($S_p^2$) = 29.43

a. (15 points) Test to see if the population mean of the praise + treat group is higher. (Do dogs receiving praise + treats learn better?) Use a significance level of .05. Be sure to specify both the null and alternative hypotheses, and to show all your work.

Test: t statistic

Hypotheses:

$$H_0: \mu_{vp+t} \le \mu_{vp} \quad\text{or}\quad H_0: \mu_{vp+t} - \mu_{vp} \le 0$$

The mean score of the population of dogs receiving verbal praise and treats on the obedience test is less than or equal to the mean score of the population of dogs receiving verbal praise only on the same test.

$$H_1: \mu_{vp+t} > \mu_{vp} \quad\text{or}\quad H_1: \mu_{vp} < \mu_{vp+t} \quad\text{or}\quad H_1: \mu_{vp+t} - \mu_{vp} > 0$$

The mean score of the population of dogs receiving verbal praise and treats on the obedience test is higher than the mean score of the population of dogs receiving verbal praise only on the same test.

Degrees of freedom: $n_{vp} + n_{vp+t} - 2 = 12 + 10 - 2 = 20$

Critical t for a one-sided (directional) test at α = .05: 1.725

Observed t:

$$t = \frac{\bar{x}_{vp+t} - \bar{x}_{vp}}{S_p\sqrt{\frac{1}{n_{vp+t}} + \frac{1}{n_{vp}}}} = \frac{23.14 - 19.44}{5.425\sqrt{\frac{1}{10} + \frac{1}{12}}} = \frac{3.7}{5.425 \times 0.428} = 1.594$$

Since t = 1.594 is smaller than the critical value of 1.725, we do not reject the null hypothesis. We proceed as if the mean score of the population of dogs receiving verbal praise and treats were less than or equal to the mean score of the population of dogs receiving verbal praise only; these data do not show that treats improve learning.

b. Make a 95% confidence interval for the difference between the two means.
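The one-sided test in part a, and the interval and effect size asked for in parts b and c, can be checked numerically. A minimal Python sketch, assuming the table's summary statistics and the printed t-table critical values for df = 20 (1.725 one-sided, 2.086 two-sided), which are hard-coded here rather than computed:

```python
import math

# Dog obedience study: summary statistics from the table
n_vp, mean_vp = 12, 19.44      # verbal praise only
n_vpt, mean_vpt = 10, 23.14    # verbal praise + treat
var_p = 29.43                  # pooled variance given in the table
sp = math.sqrt(var_p)

se = sp * math.sqrt(1 / n_vpt + 1 / n_vp)  # pooled standard error of the difference
diff = mean_vpt - mean_vp
t = diff / se
df = n_vp + n_vpt - 2

# Critical values for df = 20, copied from a printed t table (assumed, not computed)
T_ONE_SIDED_05 = 1.725
T_TWO_SIDED_05 = 2.086

reject_h0 = t > T_ONE_SIDED_05                            # one-sided test, part a
ci = (diff - T_TWO_SIDED_05 * se, diff + T_TWO_SIDED_05 * se)  # 95% CI, part b
d = diff / sp                                             # effect size, part c

# The same t recovered from d: for a fixed d, t grows as the samples get larger
t_from_d = d / math.sqrt(1 / n_vpt + 1 / n_vp)

print(f"t = {t:.3f} (df = {df}), reject H0: {reject_h0}")
print(f"95% CI: [{ci[0]:.2f}, {ci[1]:.2f}], d = {d:.2f}, t from d = {t_from_d:.3f}")
```

Unrounded intermediate values give t of about 1.593 and an interval of roughly [-1.15, 8.55], matching the hand results up to rounding of the standard error. The last line shows the relation t = d / sqrt(1/n1 + 1/n2), which is why a moderate-to-large d can still produce a non-significant t with only 12 and 10 dogs.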
$$CI_{.95} = (\bar{x}_{vp+t} - \bar{x}_{vp}) \pm t_{critical} \cdot S_{\bar{x}_{vp+t} - \bar{x}_{vp}}$$

thus

$$(\bar{x}_{vp+t} - \bar{x}_{vp}) - t_{critical} \cdot S_{\bar{x}_{vp+t} - \bar{x}_{vp}} \le \mu_{vp+t} - \mu_{vp} \le (\bar{x}_{vp+t} - \bar{x}_{vp}) + t_{critical} \cdot S_{\bar{x}_{vp+t} - \bar{x}_{vp}}$$

We need to calculate:

$$\bar{x}_{vp+t} - \bar{x}_{vp} = 23.14 - 19.44 = 3.7$$

Use the two-tailed t value (p = .05, df = 20): t critical = 2.086

We use the pooled standard error:

$$S_{\bar{x}_{vp+t} - \bar{x}_{vp}} = S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 2.32$$

Substituting in the CI formula:

$$3.7 - 2.086 \times 2.32 \le \mu_{vp+t} - \mu_{vp} \le 3.7 + 2.086 \times 2.32$$

$$-1.14 \le \mu_{vp+t} - \mu_{vp} \le 8.54$$

Because the interval includes 0, a zero difference between the population means is plausible, which agrees with our decision in part a not to reject the null hypothesis.

c. (10 points) Compute the effect size to show how different the groups are in standardized units. Comment on the effect-size value. Does it seem consistent with (i.e., does it agree with) the results of the t test and confidence interval? Explain any differences you see.

$$d = \frac{\bar{x}_{vp+t} - \bar{x}_{vp}}{S_p} = \frac{23.14 - 19.44}{5.425} = 0.68$$

This is a moderate-to-large effect size according to Shavelson (1995, pp. 317-318), who defines an effect size around 0.2 as small, 0.5 as moderate and 0.8 as large. d = 0.68 means the groups differ by about two-thirds of a standard deviation. This result is consistent with the t test and the CI once sample size is taken into account: the effect is fairly large, but with only 12 and 10 dogs the t test is not significant. For a fixed d, t grows with sample size ($t = d / \sqrt{1/n_{vp+t} + 1/n_{vp}} = 0.68 / 0.428 \approx 1.59$ here), so with bigger samples the same effect would give a larger t and the agreement with the t test would be clearer. The CI expresses the estimate in test-score points, [-1.14, 8.54], and its width also reflects the small samples, while the effect size is scale free.

3) Anne, who is a science teacher, is interested in children's understanding of the life cycles of butterflies and moths. She gave her class a lesson on this topic, followed by a 25-item test. A summary of information about their performance follows.

Test results for 30 students in Anne's class:

| Statistic | Value |
|---|---|
| Minimum score | 2 |
| Maximum score | 25 |
| Sum of scores (ΣX) | 483 |
| Sum of squared scores (ΣX²) | 8037.3 |
| Median | 19 |

a. Use the data to obtain two measures of location for the students' performance.
Compare the two measures. Does the typical student have a strong understanding of life cycles of moths and butterflies? Justify your answer.

Median = 19

$$\text{Mean} = \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{483}{30} = 16.1$$

The median is larger than the mean, which suggests that some extremely low scores are pulling the mean down. The typical student has a moderate understanding of the content: 50% of the students scored at or above the median of 19 points, which is only moderately close to the maximum possible score. The median of 19 out of 25 points is 76% of the total score, while the mean is only 64%. If we consider a score of 80% to represent mastery of the content, the typical student has not achieved mastery.

b. Now obtain two measures of spread for Anne's students.

Range = highest score - lowest score = 25 - 2 = 23

Variance:

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{\sum (X^2 - 2X\bar{X} + \bar{X}^2)}{n - 1} = \frac{\sum X^2 - 2\bar{X}\sum X + n\bar{X}^2}{n - 1} = \frac{8037.3 - 2(16.1)(483) + 30(16.1)^2}{30 - 1} = \frac{261}{29} = 9$$

You can also compute the variance using:

$$s^2 = \frac{\sum x_i^2 - n\bar{x}^2}{n - 1} = \frac{8037.3 - 30(16.1)^2}{30 - 1} = 9$$

Standard deviation = $s = \sqrt{s^2} = \sqrt{9} = 3$

4) Another science teacher (Betty) gave the same test and obtained this summary data with SPSS:

Statistics: SCIENCE

| Statistic | Value |
|---|---|
| N (valid) | 20 |
| N (missing) | 0 |
| Mean | 15.50 |
| Median | 17.50 |
| Mode | 15 |
| Std. Deviation | 2.03 |
| Variance | 4.121 |
| Minimum | 6 |
| Maximum | 20 |
| Percentile 25 | 12.5 |
| Percentile 50 | 17.5 |
| Percentile 75 | 19 |

a. Discuss the "typical" performance of the two classes. Which location statistic seems most appropriate for describing the performance of these two sets of students? Does there appear to be a difference between the "average" levels of understanding for students of Teachers Anne and Betty?

| | Anne's class | Betty's class |
|---|---|---|
| n | 30 | 20 |
| Median | 19.0 | 17.5 |
| Mean | 16.1 | 15.5 |
| Standard deviation | 3.00 | 2.03 |

The median is a better measure of central tendency here, since both distributions are negatively skewed (in each class the mean is below the median). The medians are quite close: 1.5 points apart, only 6% on the percentage scale.
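Anne's column in the table above comes from the question 3 summary sums. A minimal Python sketch that recomputes it, assuming only n = 30, ΣX = 483, ΣX² = 8037.3 and the two medians; it evaluates both variance formulas from question 3 to show they agree, then expresses the median gap on the 25-point scale:

```python
import math

# Anne's class: summary values from question 3
n = 30
sum_x = 483
sum_x2 = 8037.3
max_score = 25

mean = sum_x / n
# Expanded definitional formula: sum (X - mean)^2 = sum X^2 - 2*mean*sum X + n*mean^2
var_expanded = (sum_x2 - 2 * mean * sum_x + n * mean**2) / (n - 1)
# Computational shortcut: (sum X^2 - n*mean^2) / (n - 1)
var_shortcut = (sum_x2 - n * mean**2) / (n - 1)
sd = math.sqrt(var_shortcut)

# Medians of the two classes and their gap as a share of the maximum score
median_anne, median_betty = 19, 17.5
gap_pct = (median_anne - median_betty) / max_score

print(f"mean = {mean:.1f}, variance = {var_shortcut:.2f}, sd = {sd:.2f}")
print(f"median gap = {median_anne - median_betty} points = {gap_pct:.0%} of the scale")
```

Both formulas give a variance of 9 and a standard deviation of 3, and the 1.5-point median gap is 6% of the 25-point scale.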
The average level of understanding in Anne's class is higher than in Betty's class, because both the mean and the median are higher in Anne's class. However, the differences are small, whether we examine means or medians. The difference between the means (16.1 - 15.5 = 0.6) is small relative to the standard deviations and to the pooled standard deviation of 2.65, computed as follows (rounding Betty's SD to 2.0):

$$S_p = \sqrt{\frac{(n_A - 1)s_A^2 + (n_B - 1)s_B^2}{n_A + n_B - 2}} = \sqrt{\frac{(30 - 1)(3.0)^2 + (20 - 1)(2.0)^2}{30 + 20 - 2}} = \sqrt{7.02} = 2.65$$

This gives us an effect size of 0.6 / 2.65 = 0.23.

b. For Anne's students, Q1 = 14 and Q3 = 22. Make a graph that compares the distributions of scores for students of Anne and Betty. Discuss the two distributions. Does your assessment of the performance of the two classes change?

[Figure: box plots comparing the score distributions of Anne's class and Betty's class on a 0 to 30 score axis]

No, we maintain our conclusion. Even though the spread of the two distributions is fairly similar, scores in Anne's class seem to be more skewed. Anne's class still does better than Betty's class: 50% of Anne's students score at or above 19 points, whereas only around 25% of Betty's students (those above Betty's 75th percentile of 19) score above Anne's median. Betty's top score of 20 seems to show a ceiling effect at about 20 points.
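The pooled standard deviation and effect size from part a, and the interquartile ranges behind the box-plot comparison in part b, can be reproduced with a minimal Python sketch. It assumes Betty's SD rounded to 2.0, as in the hand calculation (her unrounded 2.03 gives Sp of about 2.66 and still d of about 0.23), and takes Betty's quartiles from the SPSS percentiles; the helper name `pooled_sd` is illustrative:

```python
import math


def pooled_sd(n1, s1, n2, s2):
    """Pooled standard deviation of two groups (df = n1 + n2 - 2)."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


# Summary values for the two classes (Betty's SD rounded to 2.0)
n_anne, mean_anne, sd_anne = 30, 16.1, 3.0
n_betty, mean_betty, sd_betty = 20, 15.5, 2.0

sp = pooled_sd(n_anne, sd_anne, n_betty, sd_betty)
d = (mean_anne - mean_betty) / sp  # standardized mean difference

# Interquartile ranges: Anne's Q1/Q3 from part b, Betty's from the SPSS percentiles
iqr_anne = 22 - 14
iqr_betty = 19 - 12.5

print(f"Sp = {sp:.2f}, d = {d:.2f}")
print(f"IQR Anne = {iqr_anne}, IQR Betty = {iqr_betty}")
```

Anne's IQR (8) is somewhat wider than Betty's (6.5), which fits the description of the two spreads as broadly similar.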