SPSS/Excel Project
Running Head: SPSS/EXCEL PROJECT

SPSS/Excel Project: Equity in Public School Expenditures
EDU 6976 Interpreting and Applying Educational Research II
Seattle Pacific University
Gulseren Arikan
December 10, 2009

INTRODUCTION

This paper analyzes data that were compiled in response to debates about equity in public school expenditures. The data include the following records for the 50 states:

Variable Descriptions:

Expenditure   Current expenditure per pupil in average daily attendance in public elementary and secondary schools, 2005-06
Ratio         Average pupil/teacher ratio in public elementary and secondary schools, Fall 2005 and Fall 2006
Salary        Estimated average annual salary of teachers in public elementary and secondary schools, 2005-06
Eligible      Percentage of graduates taking the SAT, 2006-07
Reading       Average verbal SAT score, 2005-06
Math          Average math SAT score, 2005-06
Writing       Average writing SAT score, 2005-06
Region        1 = West; 2 = Midwest; 3 = South; 4 = Northeast
SES           Percent of students in elementary and secondary schools who are eligible for free or reduced-price lunch, by state or jurisdiction: School year 2006–07
Students      Number of students in elementary and secondary schools who are eligible for free or reduced-price lunch, by state or jurisdiction: School year 2006–07
IDEA          Percent of students with disabilities, 2006-07
Money         Total revenues for the year 2005-06 (in thousands)

PART 1

A) Current expenditure per pupil in average daily attendance in public elementary and secondary schools (2005-06)

Current expenditure per pupil in average daily attendance in public elem. and sec. schools (2005-06)
Mean                     10327.78
Median                   9805.00
Std. Deviation           2502.202
Skewness                 1.16
Std. Error of Skewness   .33
Minimum                  5960
Maximum                  18339

The histogram of current expenditure per pupil does not show a normal curve. It shows a strong positive skewness (1.16). The most frequent values range between 8,000 and 11,000.
The mean of the data set is 10,328 and the median is 9,805. The mode in the histogram is 8,000. According to the histogram, 6,000, 16,000 and 18,000 may be outliers that increased the standard deviation. To understand the source of the extreme values, it helps to look at the distribution of expenditure per pupil across regions. The box plot of the variable shows that the Northeast ranges between 12,000 and 18,000. It has the highest values, except for one value in the South.

A Southern state with 18,000 showed up as an outlier. When I checked the outlier value, I found that the corresponding state is D.C., which makes sense. According to the box plots, the West and then the Northeast have the most dispersed values, and the Midwest has the most clustered and even distribution compared to the other regions. The difference between the Midwest's highest and lowest expenditures is 2,000, while the difference in the West is 7,000. Furthermore, the medians of the West, the Midwest and the South are close to each other, around 9,000, but the Northeast, with a median of 13,000, deviates from the overall mean and median. The median lines within the boxes of the Midwest and the Northeast are not equidistant from the hinges: the medians are close to the upper hinge, which indicates negative skew there, while the South is skewed in the opposite direction. The West seems symmetrically distributed.

B) Average pupil/teacher ratio, Fall 2005

Average pupil/teacher ratio Fall 2005
Mean                     15.23
Median                   14.80
Std. Deviation           2.54
Skewness                 .75
Std. Error of Skewness   .33
Minimum                  15.23
Maximum                  14.80

The histogram of average pupil/teacher ratio does not show a normal curve. It shows a positive skewness (.75). The most frequent values range between 13 and 17. The mean of the data set is 15.23 and the median is 14.80. The mode in the histogram is 15. According to the histogram, 21, 11 and 23 may be outliers that increased the standard deviation.
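The skewness statistics and their standard errors in the tables above can be checked by hand. The following is a minimal sketch, assuming that SPSS reports the adjusted Fisher-Pearson skewness coefficient and that the data set covers 51 jurisdictions (the 50 states plus D.C.); the sample values are hypothetical and are not taken from the project data.

```python
import math


def skewness(values):
    """Adjusted Fisher-Pearson sample skewness (assumed to match the SPSS output)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5                            # unadjusted skewness
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)   # small-sample adjustment


def skewness_std_error(n):
    """Standard error of skewness; it depends only on the sample size n."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


# Hypothetical right-skewed sample: a long tail of high values pulls the
# mean above the median, which gives a positive skewness statistic.
sample = [6, 8, 8, 9, 9, 10, 10, 11, 13, 18]
sample_skewness = skewness(sample)

se_51 = skewness_std_error(51)  # assumed n: 50 states plus D.C.
se_50 = skewness_std_error(50)  # one jurisdiction missing
```

A positive statistic means the longer tail lies on the high side and the mean exceeds the median, as in the tables above. Under the stated assumption about the sample size, the formula gives a standard error of about .33 for 51 cases and about .34 for 50 cases.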
To get a deeper understanding, we need to interpret the box plot of pupil/teacher ratio, which shows the distribution by region. The box plot shows that the West ranges between 13 and 23. It is the most diverse region in terms of pupil/teacher ratio and has the highest ratios. While the values are widely dispersed in the West, they are very clustered in the South and the Northeast. Surprisingly again, there is an outlier value in the South, and this time the state is Virginia. The Northeast has the lowest values. The West and the Northeast have extreme low and high values compared to the other regions. The median is around 13 for the Northeast, around 18 for the West, around 15 for the South, and around 14 for the Midwest. It is clear that the West contributes to the increase in the standard deviation of the variable. The median lines within the boxes of the Midwest and the Northeast are not equidistant from the hinges: in the Midwest the median is close to the lower hinge, which indicates positive skew, while in the Northeast it is close to the upper hinge, which indicates negative skew. Those two regions also stood out for their skewness in the expenditure analysis.

C) Estimated average salary 2005-2006

Estimated average salary 2005-2006
Mean                     47679.08
Median                   45575.00
Std. Deviation           6942.01
Skewness                 .60
Std. Error of Skewness   .33
Minimum                  35607.00
Maximum                  61372.00

The histogram of estimated average salary does not show a normal curve. It shows a positive skewness (.60). The most frequent values range between 40,000 and 45,000. However, the mean of the data set is 47,679 and the median is 45,575. The mode in the histogram is 42,500. According to the histogram, 35,000, 57,500 and 62,500 may be outliers that increased the standard deviation. I will use box plots to explain the regional differences in detail.
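The outliers discussed alongside the box plots are usually flagged with a simple rule: any value more than 1.5 interquartile ranges beyond the quartiles is drawn as a separate point. The sketch below illustrates that rule under stated assumptions. It uses ordinary quartiles, which can differ slightly from the hinges SPSS computes, and the expenditure figures are hypothetical rather than the project data.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return the (lower, upper) fences used to flag box plot outliers."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range, the height of the box
    return q1 - k * iqr, q3 + k * iqr


def outliers(values, k=1.5):
    """Values that fall outside the fences."""
    low, high = tukey_fences(values, k)
    return [v for v in values if v < low or v > high]


# Hypothetical per-pupil expenditures for one region; the last value plays
# the role of a single very high spender such as the one noted in the South.
region = [8000, 8500, 9000, 9200, 9500, 10000, 10400, 18000]
flagged = outliers(region)
```

With these hypothetical values only 18,000 falls outside the fences, which mirrors how one jurisdiction can appear as an isolated point above an otherwise compact box.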
The box plots show that the Midwest ranges between 35,000 and 60,000, an extremely wide dispersion. The South, as in the previous analyses, has the most clustered data and it has three outliers; D.C. is again one of them. The Midwest has the lowest value and the West has the highest value. In the West and the Midwest the higher salaries prevail, which corresponds to a negative skew, while in the South and the Northeast more teachers earn lower salaries relative to the top quartile, which corresponds to a positive skew. The medians of the West, the Midwest and the South are close to each other (around 45,000), as they were for the pupil/teacher ratio, but again the Northeast, with a median of 57,000, deviates from the overall mean and median.

D) Percentage of all eligible students taking the SAT 2006-07

Percentage of all eligible students taking the SAT 2006-07
Mean                     39.33
Median                   32.00
Std. Deviation           31.12
Skewness                 .25
Std. Error of Skewness   .33
Minimum                  3.0
Maximum                  100.0

It looks like a histogram with lots of outliers. The standard deviation of the variable is almost equal to the mean and the median of the variable. It shows a positive skewness (.25). The most frequent values range between 0 and 10. The range is extremely high. The mode is far from the mean and the median. There are gaps (missing values) in the histogram. According to the histogram, 100, 80 and 10 may be outliers that increased the standard deviation. For further understanding, let's consider the following box plots:

It looks totally different from the previous variables. Unlike the other variables, the South has an extremely wide dispersion compared to the other regions. The range is between 3 and 80. It seems that the South strongly affected the analysis of the data. Surprisingly, the Midwest has an extremely clustered dispersion with the lowest rates.
The highest percentage of eligible students taking the SAT is around 10% in the South. On the other hand, the rate rises up to 100% in the Northeast. Two outliers show up in the Midwest: Ohio with 27% and Indiana with 62%. The South, the Northeast and the Midwest show a negative skew, while the West shows a positive skew. Regarding the rate of eligible students taking the SAT, the Northeast looks like the best region, with clustered values and high percentages.

E) Average verbal SAT score (2005-06)

Average verbal SAT score 2005-06
Mean                     534.94
Median                   523.00
Std. Deviation           37.80
Skewness                 .31
Std. Error of Skewness   .33
Minimum                  482.00
Maximum                  610.00

The histogram of average verbal SAT score does not show a normal curve. It shows a slight positive skewness (.31). The most frequent values range between 490 and 500. The mode is close to the mean and the median. According to the histogram, 610, 620 and 480 may be outliers that affected the standard deviation.

I think we need to compare this set of box plots with the eligibility one. There are interesting facts here. In contrast to the eligibility data, the Midwest has the highest values and the Northeast has the lowest values. The score and the share of students taking the test are inversely related. Just as in the eligibility data, the same two states showed up as outliers in the Midwest. They were the states with the highest rate of eligibility, and they show lower scores than the rest of the Midwestern states. This supports my claim regarding the inverse relationship between the share of students taking the test and their scores. Some other facts about the box plots are: the West, the South and the Northeast show a positive skewness; the West has the lowest scores; and again, as in the previous data set, the South has the most diverse scores, with a range of around 80.

F) Average writing SAT score (2005-06)

Average writing SAT score (2005-06)
Mean                     525.37
Median                   511.00
Std. Deviation           37.63
Skewness                 .30
Std. Error of Skewness   .33
Minimum                  472.00
Maximum                  591.00

The histogram of average writing SAT score looks like the previous histogram of verbal scores; it does not show a normal curve, but shows a slight positive skewness (.30). Some facts that can be derived from the illustration are: the most frequent values range between 485 and 490; the mode is around 480 and far from the mean; 475, 580 and 600 may be outliers that affected the standard deviation.

The box plots of writing scores are almost identical to those of verbal scores, so it seems that there is a strong positive correlation between verbal scores and writing scores.

G) Average math SAT score (2005-06)

Average math SAT score 2005-06
Mean                     540.59
Median                   529.00
Std. Deviation           37.46
Skewness                 .47
Std. Error of Skewness   .33
Minimum                  472.00
Maximum                  617.00

The histogram of average math SAT score also shows a positive skewness (.47). Some facts that can be derived from the illustration are: the most frequent values range between 510 and 530; the mode is around 510; there are some gaps in the data set; 470, 480 and 610 may be outliers that affected the standard deviation.

Some facts about the box plots are: the Midwest has the highest values and the South has the lowest values; as with the previous variable, the same two states showed up as outliers in the Midwest; the West, the South and the Midwest show a positive skewness; the South has the most diverse scores; and the Northeast shows the most clustered dispersion.

H) % of students eligible for free/reduced lunch 2006-07

% of students eligible for free/reduced lunch 2006-07
Mean                     39.82
Median                   37.35
Std. Deviation           10.64
Skewness                 .62
Std. Error of Skewness   .34
Minimum                  17.70
Maximum                  67.50

The histogram shows a slightly positive skewness.
Some facts that can be derived from the illustration are: the most frequent values range between 30% and 40%; the mode is around 35%; there are some gaps in the data set; 20%, 70% and 60% may be outliers that affected the standard deviation.

According to the box plots: the dispersions in all four regions are clustered; there are a few outliers in the West, the South and the Northeast; the Northeast and the Midwest show a positive skewness; the Midwest has the most diverse scores. Unlike the histogram, the box plots do not present the data as percentages.

I) % of students with disabilities 2006-07

% of students with disabilities 2006-07
Mean                     14.22
Median                   14.30
Std. Deviation           2.13
Skewness                 .23
Std. Error of Skewness   .33
Minimum                  10.50
Maximum                  19.90

The histogram shows a positive skewness. The facts that I can interpret from the histogram are: the most frequent values range between 14% and 16%; the mode is around 15%; and although 10% may be an outlier that affected the standard deviation, the standard deviation is not large.

According to the box plots, the West, the Northeast and the Midwest show a positive skewness. The Northeast and the South have the most diverse values. The Northeast includes the highest percentages, while the West includes the lowest percentages.

J) Total revenues for the year 2005-06 (in thousands)

Total revenues for the year 2005-06 (in thousands)
Mean                     10208705.02
Median                   6346033.00
Std. Deviation           12216880.36
Skewness                 2.61
Std. Error of Skewness   .33
Minimum                  958109.00
Maximum                  63785872.00

The histogram shows a strong positive skewness (2.61). There are many gaps and outliers in the data. The most frequent values range between 958,109,000.00 and 100,000,000.00; the main outliers are 400,000,000.00 and 600,000,000.00, which inflated the standard deviation. The box plots show the differences among the regions.
The Northeast has the most diverse values within a large range, and also includes the lowest and the highest revenues. There are some extreme values for the West and the South. All the regions show a positive skewness.

K) Enrollment, Fall 2005

Enrollment Fall 2005
Mean                     963005.84
Median                   654526.00
Std. Deviation           1145966.304
Skewness                 2.98
Std. Error of Skewness   .33
Minimum                  76876.00
Maximum                  6437202.00

The histogram shows a strong positive skewness (2.98). There are many gaps and outliers in the data. The most frequent values range between 77,000 and 100,000. The main outliers are 4,000,000 and 6,437,202. According to the box plots, the outliers are in the West and the South. The Northeast has the widest range compared to the other regions and it has a positive skewness.

PART 2

Above, I explained the differences among regions through histograms, descriptive statistics and box plots. Here I add the results of ANOVA and Tukey tests regarding the regional differences. The ANOVA results told me whether there is a significant difference in means, and I used the Tukey values to find where the differences are, to double check, and to avoid rejecting the null hypothesis incorrectly. In addition, to get a measure of practical significance, I evaluated the effect size through eta squared values. For eta squared, values of .01, .06, and .14 represent small, medium, and large effect sizes respectively.

1. Expenditure per pupil
Statistical significance: F ratio 9.75 > F(3,46) = 2.81. Reject the null hypothesis: at the .05 level the calculated F is greater than the critical value, so there is a significant difference in the means of the regions. There is greater variability between regions than within them. Levene's test is not significant (p = .24), so the regions can be treated as having equivalent variances.
According to the Tukey values, there is a significant difference between the Northeast and the rest of the regions: the South, the Midwest and the West.
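The F ratio and eta squared used throughout this part follow directly from the between-group and within-group sums of squares. The sketch below is a minimal illustration under stated assumptions: it is a hand-rolled one-way ANOVA rather than the SPSS procedure, the four groups are hypothetical values standing in for the four regions, and the function names are my own.

```python
def one_way_anova(groups):
    """Return (F ratio, df between, df within, eta squared) for a one-way design."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variability of the observations around their own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_ratio, df_between, df_within, eta_squared


def eta_squared_from_f(f_ratio, df_between, df_within):
    """Recover eta squared from a reported F ratio and its degrees of freedom."""
    return f_ratio * df_between / (f_ratio * df_between + df_within)


# Hypothetical values for four regions (not the project data).
regions = [[9, 10, 11], [8, 9, 10], [9, 9, 12], [13, 14, 15]]
f_ratio, df_b, df_w, eta_sq = one_way_anova(regions)

# Check against figures reported in this part: F = 13.08 with df (3, 46).
reported_eta = eta_squared_from_f(13.08, 3, 46)
```

With a single factor, partial eta squared and eta squared coincide, so the second function gives a quick consistency check on reported values: an F ratio of 13.08 with 3 and 46 degrees of freedom corresponds to an eta squared of about .46.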
From the earlier graph analysis, we can see that the Northeast has the highest expenditure values compared to the other regions.
Practical significance: Partial eta squared is .36, so the effect size is large.

2. Pupil/teacher ratio
Statistical significance: F ratio 13.08 > F(3,46) = 2.81. Reject the null hypothesis: at the .05 level the calculated F is greater than the critical value, so there is a significant difference in the pupil/teacher ratio means of the regions. There is greater variability between regions than within them. Levene's test is significant (p = .01), so the regions cannot be assumed to have equivalent variances.
According to the Tukey values, there is a significant difference between the West and the rest of the regions: the West's mean pupil/teacher ratio is higher than that of the Northeast, the Midwest and the South. Furthermore, there is a significant difference between the South and the Northeast, but the Northeast has a higher mean.
Practical significance: Partial eta squared is .46, so the effect size is large.

3. Teacher average salary
Statistical significance: F ratio 3.45 > F(3,46) = 2.81. Reject the null hypothesis: at the .05 level the calculated F is greater than the critical value, so there is a significant difference in the teacher average salary means of the regions. There is greater variability between regions than within them. Levene's test is not significant (p = .66), so the regions can be treated as having equivalent variances.
According to the Tukey values, there is a significant difference between the South and the Northeast; the Northeast has a higher average salary mean.
Practical significance: Partial eta squared is .18, so the effect size is large.

4. Percentage of all eligible students taking the SAT, 2006-07
Statistical significance: F ratio 16.66 > F(3,46) = 2.81. Reject the null hypothesis: at the .05 level the calculated F is greater than the critical value, so there is a significant difference among the means of the regions. There is greater variability between regions than within them.
Levene's test is significant (p close to zero), so the regions cannot be assumed to have equivalent variances.
According to the Tukey values, there is a significant difference between the Northeast and the rest of the regions; once more the Northeast has the highest rates. Another difference showed up between the Midwest and the South: the South has a higher mean percentage of eligible students taking the SAT than the Midwest.
Practical significance: Partial eta squared is .52, so the effect size is large.

5. Performance on the SAT verbal
Statistical significance: F ratio 12.00 > F(3,46) = 2.81. Reject the null hypothesis: at the .05 level the calculated F is greater than the critical value, so there is a significant difference in the means of the regions. There is greater variability between regions than within them. Levene's test is significant (p close to zero), so the regions cannot be assumed to have equivalent variances.
According to the Tukey values, there is a significant difference between the Midwest and the rest of the regions, so the Midwest has the highest SAT verbal scores compared to the other regions. Another difference showed up between the Midwest and the South; the South has a higher mean regarding the percentage of all eligible students taking the SAT.
Practical significance: Partial eta squared is .43, so the effect size is large.

6. Performance on the SAT writing
Statistical significance: F ratio 9.62 > F(3,46) = 2.81. Reject the null hypothesis: at the .05 level the calculated F is greater than the critical value, so there is a significant difference among the means of the regions. There is greater variability between regions than within them. Levene's test is significant (p close to zero), so the regions cannot be assumed to have equivalent variances.
According to the Tukey values, there is a significant difference between the Midwest and the rest of the regions, so the Midwest has the highest SAT writing scores as well.
Practical significance: Partial eta squared is .38, so the effect size is large.

7. Performance on the SAT math
Statistical significance: F ratio 16.94 > F(3,46) = 2.81. Reject the null hypothesis: at the .05 level the calculated F is greater than the critical value, so there is a significant difference among the means of the regions. There is greater variability between regions than within them. Levene's test is significant (p close to zero), so the regions cannot be assumed to have equivalent variances.
According to the Tukey values, there is a significant difference between the Midwest and the rest of the regions, so the Midwest has the highest SAT math scores compared to the other regions. Another difference showed up between the West and the Northeast; the West has higher math scores than the Northeast.
Practical significance: Partial eta squared is .52, so the effect size is large.

8. SES as measured by percent of students on free/reduced lunch
Statistical significance: F ratio 14.87 > F(3,46) = 2.81. Reject the null hypothesis: at the .05 level the calculated F is greater than the critical value, so there is a significant difference among the means of the regions. There is greater variability between regions than within them. Levene's test is not significant (p = .29), so the regions can be treated as having equivalent variances.
According to the Tukey values, there is a significant difference between the South and the rest of the regions. The South has the highest mean. Another significant difference is between the Northeast and the West; the West has a higher mean than the Northeast.
Practical significance: Partial eta squared is .49, so the effect size is large.

9. Percent of students with disabilities
Statistical significance: F ratio 10.25 > F(3,46) = 2.81. Reject the null hypothesis: at the .05 level the calculated F is greater than the critical value, so there is a significant difference among the means of the regions. There is greater variability between regions than within them.
Levene's test is not significant (p = .07), so the regions can be treated as having equivalent variances.
According to the Tukey values, the Northeast differs significantly from the West and the South; the Northeast has the higher mean. Another significant difference showed up between the Midwest and the West; the Midwest has the higher mean.
Practical significance: Partial eta squared is .40, so the effect size is large.

10. Revenues
Statistical significance: F ratio 30 > F(3,46) = 2.81. Reject the null hypothesis: at the .05 level the calculated F is greater than the critical value (a probability of 83% is reported), so there is a significant difference among the means of the regions. There is greater variability between regions than within them. Levene's test is not significant (p = .52), so the regions can be treated as having equivalent variances.
According to the Tukey values, the Northeast differs significantly from the West and the South; the Northeast has the higher mean. Another significant difference showed up between the Midwest and the West; the Midwest has the higher mean. The results are similar to those for the disability variable.
Practical significance: Partial eta squared is .40, so the effect size is large.

Conclusion

I was expecting the Northeast to have higher SAT scores, since it has the highest values in terms of revenues, expenditure and teacher/pupil ratio. However, according to my analyses above, the Midwest has higher SAT scores. When the share of eligible students who take the SAT increases, the SAT scores decrease. It seems that there is an obvious negative correlation between the two variables. The main reason behind the Midwest's success looks like its low participation rate: the Midwest has the lowest share of eligible students taking the SAT, which affected the analysis.

PART 3

In the analyses above, I noticed significant correlations between some variables. To verify my observations, I carried out the following analyses:

1a. Expenditure and SAT scores (verbal)
Correlation coefficient: -0.42 (moderate negative relationship)
Statistical significance: Average verbal SAT scores are expected to decline slightly as the current expenditure per pupil in average daily attendance increases.
Practical significance: The actual scatter plot reveals no significant correlation between the two parameters. The discrepancy with the statistical data may be due to some outliers.
Corresponding regression equation: y = -0.006x + 600.47 (almost constant)

1b. Expenditure and SAT scores (math)
Correlation coefficient: -0.39 (moderate negative relationship)
Statistical significance: Average math SAT scores are expected to decline slightly as the current expenditure per pupil in average daily attendance increases.
Practical significance: The actual scatter plot reveals no significant correlation between the two parameters. The discrepancy with the statistical data may be due to some outliers.
Corresponding regression equation: y = -0.006x + 600.89 (almost constant)

2a. Pupil/teacher ratio and SAT scores (verbal)
Correlation coefficient: -0.03 (very low negative relationship, almost none)
Statistical significance: Average verbal SAT scores are expected to be unrelated to the average number of pupils per teacher.
Practical significance: The actual scatter plot reveals a moderate negative correlation between the two parameters. The average verbal SAT scores decline slightly as the pupil/teacher ratio increases. The discrepancy with the statistical data may be due to some outliers.
Corresponding regression equation: y = -0.45x + 541.76

2b. Pupil/teacher ratio and SAT scores (math)
Correlation coefficient: -0.03 (very low negative relationship, almost none)
Statistical significance: Average math SAT scores are expected to be unrelated to the average number of pupils per teacher.
Practical significance: The actual scatter plot reveals a moderate negative correlation between the two parameters. The average math SAT scores decline slightly as the pupil/teacher ratio increases. The discrepancy with the statistical data may be due to some outliers.
Corresponding regression equation: y = -0.446x + 547.35

3a. Salary and SAT scores (verbal)
Correlation coefficient: -0.48 (moderate negative relationship)
Statistical significance: Average verbal SAT scores are expected to decline slightly as the teachers' estimated average annual salary increases.
Practical significance: The actual scatter plot reveals no significant correlation between the two parameters. The discrepancy with the statistical data may be due to some outliers.
Corresponding regression equation: y = -0.003x + 659.56 (almost constant)

3b. Salary and SAT scores (math)
Correlation coefficient: -0.41 (moderate negative relationship)
Statistical significance: Average math SAT scores are expected to decline slightly as the teachers' estimated average annual salary increases.
Practical significance: The actual scatter plot reveals no significant correlation between the two parameters. The discrepancy with the statistical data may be due to some outliers.
Corresponding regression equation: y = -0.002x + 646.08 (almost constant)

4a. Revenues and SAT scores (verbal)
Correlation coefficient: -0.29 (low negative relationship)
Statistical significance: Average verbal SAT scores are expected to decline slightly as the total revenues increase.
Practical significance: The actual scatter plot reveals no significant correlation between the two parameters. The discrepancy with the statistical data may be due to some outliers.
Corresponding regression equation: y = 0.000x + 548.06, that is, y = 548.06 (constant)

4b. Revenues and SAT scores (math)
Correlation coefficient: -0.2 (low negative relationship)
Statistical significance: Average math SAT scores are expected to decline slightly as the total revenues increase.
Practical significance: The actual scatter plot reveals no significant correlation between the two parameters. The discrepancy with the statistical data may be due to some outliers.
Corresponding regression equation: y = 0.000x + 549.56, that is, y = 549.56 (constant)

5a. SES and SAT scores (verbal)
Correlation coefficient: 0.02 (very low positive relationship, almost none)
Statistical significance: The percentage of students eligible for free or reduced-price lunch is expected to have no effect on average verbal SAT scores.
Practical significance: The actual scatter plot reveals no correlation between the two parameters. The discrepancy with the statistical data may be due to some outliers.
Corresponding regression equation: y = 0.071x + 532.11

5b. SES and SAT scores (math)
Correlation coefficient: -0.08 (very low negative relationship, almost none)
Statistical significance: The percentage of students eligible for free or reduced-price lunch is expected to have no effect on average math SAT scores.
Practical significance: The actual scatter plot reveals no significant correlation between the two parameters. The discrepancy with the statistical data may be due to some outliers.
Corresponding regression equation: y = -0.282x + 551.81

6a. Students with disabilities and SAT scores (verbal)
Correlation coefficient: -0.07 (very low negative relationship, almost none)
Statistical significance: The percentage of students with disabilities is expected to have no effect on average verbal SAT scores.
Practical significance: The actual scatter plot reveals no significant correlation between the two parameters. The discrepancy with the statistical data may be due to some outliers.
Corresponding regression equation: y = -1.242x + 552.6

6b. Students with disabilities and SAT scores (math)
Correlation coefficient: -0.07 (very low negative relationship, almost none)
Statistical significance: The percentage of students with disabilities is expected to have no effect on average math SAT scores.
Practical significance: The actual scatter plot reveals no significant correlation between the two parameters. The discrepancy with the statistical data may be due to some outliers.
Corresponding regression equation: y = -1.231x + 558.1

Conclusions

I was expecting a positive correlation between some variables, especially between expenditure and SAT scores and between pupil/teacher ratio and SAT scores, but surprisingly I found a negative correlation between those variables. Although statistical data provide important insights about the information collected, it is important to support numerical information with graphical evidence (scatter plots, etc.) before interpreting it. In most cases the scatter plots supported the correlation coefficients as well. I think the population differences among regions had a strong impact on the analysis of SAT scores in terms of correlations. Therefore, researchers should interpret the results with the size of the tested population in mind.
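The correlation coefficients and regression equations reported in Part 3 can be reproduced along the following lines. This is a minimal sketch, assuming an ordinary least-squares fit and Pearson's r; the five data points are hypothetical and the function names are my own, not output from SPSS or Excel.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the regression line y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def std_dev(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


# Hypothetical predictor and outcome values (not the project data).
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 9, 5, 4]

r = pearson_r(xs, ys)
slope, intercept = least_squares_line(xs, ys)
# The slope rescaled by the two standard deviations equals r.
standardized_slope = slope * std_dev(xs) / std_dev(ys)
```

The last line shows why a slope that looks "almost constant" can still go with a moderate correlation: the slope depends on the units of x, while r does not. For example, a slope of -0.006 per dollar combined with the standard deviations reported in Part 1 (about 2,502 for expenditure and 37.80 for verbal scores) gives roughly -0.40, close to the reported coefficient of -0.42 given that the slope is rounded.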