SPSS/Excel Project
Running Head: SPSS/EXCEL PROJECT
SPSS/Excel Project: Equity in Public School Expenditures
EDU 6976 Interpreting and Applying Educational Research II
Seattle Pacific University
Gulseren Arikan
December 10, 2009
INTRODUCTION
This paper analyzes data compiled in response to debates about equity in public
school expenditures. The data set contains the following records for the 50 states:
Variable Descriptions:
Expenditure: Current expenditure per pupil in average daily attendance in public elementary and secondary schools, 2005-06
Ratio: Average pupil/teacher ratio in public elementary and secondary schools, Fall 2005 and Fall 2006
Salary: Estimated average annual salary of teachers in public elementary and secondary schools, 2005-06
Eligible: Percentage of graduates taking the SAT, 2006-07
Reading: Average verbal SAT score, 2005-06
Math: Average math SAT score, 2005-06
Writing: Average writing SAT score, 2005-06
Region: 1 = West; 2 = Midwest; 3 = South; 4 = Northeast
SES: Percent of students in elementary and secondary schools who are eligible for free or reduced-price lunch, by state or jurisdiction: School year 2006–07
Students: Number of students in elementary and secondary schools who are eligible for free or reduced-price lunch, by state or jurisdiction: School year 2006–07
IDEA: Percent of students with disabilities, 2006-07
Money: Total revenues for the year 2005-06 (in thousands)
PART 1
A) Current expenditure per pupil in average daily attendance in public elementary
and secondary schools (2005-06)
Current expenditure per pupil in average daily attendance in public elem. and sec. schools (2005-06)
Mean                     10327.78
Median                   9805.00
Std. Deviation           2502.202
Skewness                 1.16
Std. Error of Skewness   .33
Minimum                  5960
Maximum                  18339
The histogram of current expenditure per pupil does not follow a normal curve. It
shows a strong positive skewness (1.16): the mean (10,327.78) lies above the median
(9,805). The most frequent values range between 8,000 and 11,000, and the mode in the
histogram is 8,000. According to the histogram, the values near 6,000, 16,000 and
18,000 may be the outliers that increased the standard deviation.
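The direction of the skew can be checked against the reported statistics. The following is a minimal Python sketch, assuming the standard error of skewness follows the common small-sample formula sqrt(6n(n-1) / ((n-2)(n+1)(n+3))). The function names, the rule-of-thumb cutoff of 2, and the sample size n = 51 (50 states plus D.C.) are assumptions made for illustration, not values stated in the SPSS output.

```python
import math

def se_skewness(n):
    """Standard error of skewness for a sample of size n (assumed formula)."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def skew_direction(skewness, se):
    """Label the skew; |skewness / se| > 2 is a common rule of thumb (assumption)."""
    ratio = skewness / se
    if abs(ratio) <= 2:
        return "approximately symmetric"
    return "positive (right) skew" if ratio > 0 else "negative (left) skew"

se = se_skewness(51)              # n = 51 is an assumption
print(round(se, 2))               # 0.33
print(skew_direction(1.16, se))   # skewness of expenditure per pupil from the table
```

A positive skewness statistic together with a mean above the median points to a longer right tail, which is consistent with a few high-spending states pulling the mean upward.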
To understand the source of the extreme values, it helps to look at the distribution of
expenditure per pupil across regions. The box plot of the variable shows that the
Northeast ranges between 12,000 and 18,000. It has the highest values, except for one
value in the South. A Southern state with about 18,000 shows up as an outlier. When I
checked the outlier, I found that the corresponding state is D.C., which makes sense.
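How a box plot flags a value such as the D.C. figure can be sketched in Python. This is a minimal illustration, assuming the usual 1.5 × IQR rule and the "inclusive" quartile method of the standard library; SPSS computes hinges slightly differently, and the data below are hypothetical, not the project data.

```python
import statistics

def boxplot_outliers(values):
    """Return (q1, median, q3, outliers) using the 1.5 * IQR rule."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]
    return q1, med, q3, outliers

# Hypothetical per-pupil expenditures (in dollars) for one region; not the project data.
south = [8000, 9000, 9000, 10000, 10000, 11000, 18000]
q1, med, q3, outliers = boxplot_outliers(south)
print(q1, med, q3, outliers)
```

With these made-up numbers the upper fence is 12,750, so only the 18,000 value is drawn as an outlier, while the box itself stays narrow.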
According to the box plots, the West and then the Northeast have the most dispersed
values, and the Midwest has the most clustered and even distribution compared to the
other regions. The difference between the Midwest's highest and lowest expenditures is
about 2,000, while the difference in the West is about 7,000. Furthermore, the medians
of the West, Midwest and South are close to each other, around 9,000, but the
Northeast, with a median of 13,000, deviates from the overall mean and median. The
median lines within the boxes of the Midwest and Northeast are not equidistant from the
hinges; the medians are close to the upper hinge, which indicates negative skew there,
while the South shows the opposite pattern. The West seems evenly distributed.
B) Average pupil/teacher ratio fall 2005
Average pupil/teacher ratio Fall 2005
Mean                     15.23
Median                   14.80
Std. Deviation           2.54
Skewness                 .75
Std. Error of Skewness   .33
Minimum                  15.23
Maximum                  14.80
The histogram of the average pupil/teacher ratio does not follow a normal curve. It
shows a positive skewness (.75). The most frequent values range between 13 and 17.
The mean of the data set is 15.23 and the median is 14.80. The mode in the histogram is
15. According to the histogram, the values near 11, 21 and 23 may be the outliers that
increased the standard deviation.
For a deeper understanding, we need to interpret the box plot of the pupil/teacher
ratio, which shows the distribution by region. The box plot shows that the West ranges
between 13 and 23. It is the most diverse region in terms of pupil/teacher ratio and
has the highest ratios. While the values are widely dispersed in the West, they are
tightly clustered in the South and the Northeast. Surprisingly, there is again an
outlier in the South, and this time the state is Virginia. The Northeast has the lowest
values. The West and the Northeast hold the extreme values (high and low,
respectively) compared to the other regions. The median is around 13 for the
Northeast, around 18 for the West, around 15 for the South, and around 14 for the
Midwest. It is clear that the West contributes most to the standard deviation of
the variable.
The median lines within the boxes of the Midwest and the Northeast are not
equidistant from the hinges. In the Midwest the median is close to the lower hinge,
which indicates positive skew; in the Northeast the median is close to the upper hinge,
which indicates negative skew. These two regions also differed in skewness in the
expenditure analysis.
C) Estimated average salary (2005-06)
Estimated average salary 2005-2006
Mean                     47679.08
Median                   45575.00
Std. Deviation           6942.01
Skewness                 .60
Std. Error of Skewness   .33
Minimum                  35607.00
Maximum                  61372.00
The histogram of the estimated average salary does not follow a normal curve. It
shows a positive skewness (.60). The most frequent values range between 40,000 and
45,000, whereas the mean of the data set is 47,679 and the median is 45,575. The mode
in the histogram is 42,500. According to the histogram, the values near 35,000, 57,500
and 62,500 may be the outliers that increased the standard deviation.
I will use box plots to explain the regional differences in detail. The box plots show
that the Midwest ranges between 35,000 and 60,000, an extremely wide dispersion. The
South, as in the previous analyses, has the most clustered data, and it has three
outliers; D.C. is again one of them. The Midwest has the lowest value and the West has
the highest value. The West and the Midwest show a positive skew, which means the
higher salary rates are prevailing, while the South and the Northeast show a negative
skew, which means more values fall in the lower salary range when compared to the
fourth quartile group. The medians of the West, the Midwest and the South are close to
each other, around 45,000, as was the case for the pupil/teacher ratio, but again the
Northeast, with a median of 57,000, deviates from the overall mean and median.
D) Percentage of all eligible students taking the SAT 2006-07
Percentage of all eligible students taking the SAT 2006-07
Mean                     39.33
Median                   32.00
Std. Deviation           31.12
Skewness                 .25
Std. Error of Skewness   .33
Minimum                  3.0
Maximum                  100.0
The histogram appears to contain many outliers. The standard deviation of the
variable (31.12) is nearly as large as the mean (39.33) and the median (32.00). The
distribution shows a slight positive skewness (.25). The most frequent values range
between 0 and 10. The range is extremely wide, and the mode is far from the mean and
the median. There are gaps (missing values) in the histogram. According to the
histogram, the values near 10, 80 and 100 may be the outliers that increased the
standard deviation.
For further understanding, let's consider the following box plots:
This plot looks totally different from those of the previous variables. Unlike the
other variables, the South has an extremely wide dispersion compared to the other
regions, ranging between 3 and 80. It seems that the South strongly affected the
analysis of the data. Surprisingly, the Midwest has an extremely clustered dispersion
with the lowest rates. The highest percentage of eligible students taking the SAT is
around 10% in the South. On the other hand, the rate rises up to 100% in the
Northeast. Two outliers show up in the Midwest: Ohio with 27% and Indiana with 62%.
The South, the Northeast and the Midwest show a negative skew, while the West shows a
positive skew. Regarding the percentage of eligible students taking the SAT, the
Northeast looks like the best region, with clustered values and high percentages.
E) Average verbal SAT score (2005-06)
Average verbal SAT score 2005-06
Mean                     534.94
Median                   523.00
Std. Deviation           37.80
Skewness                 .31
Std. Error of Skewness   .33
Minimum                  482.00
Maximum                  610.00
The histogram of the average verbal SAT score does not follow a normal curve. It
shows a slight positive skewness (.31). The most frequent values range between 490 and
500. The mode is close to the mean and the median. According to the histogram, the
values near 480, 610 and 620 may be the outliers that affected the standard deviation.
I think we need to compare this set of box plots with the eligibility one. There
are interesting facts here. In contrast to the eligibility data, the Midwest
has the highest values and the Northeast has the lowest values. The scores and the
percentage of students taking the SAT appear to be inversely related.
Just as in the eligibility data, the same two states show up as outliers in the
Midwest. They are the states with the highest eligibility rates, and they show lower
scores than the rest of the Midwestern states. This supports my claim regarding the
inverse relationship between the percentage of students taking the SAT and their scores.
Some other facts about the box plots are: the West, the South and the Northeast
show a positive skewness; the West has the lowest scores; and again, as in the case of
the previous data set, the South has the most diverse scores, with a range of around 80.
F) Average Writing SAT score (2005-06)
Average writing SAT score (2005-06)
Mean                     525.37
Median                   511.00
Std. Deviation           37.63
Skewness                 .30
Std. Error of Skewness   .33
Minimum                  472.00
Maximum                  591.00
The histogram of the average writing SAT score looks like the previous histogram of
the verbal score; it does not follow a normal curve and shows a slight positive
skewness (.30). Some facts that can be derived from the illustration are: the most
frequent values range between 485 and 490; the mode is around 480 and far from the
mean; and the values near 475, 580 and 600 may be the outliers that affected the
standard deviation.
The box plots of the writing scores are almost identical to those of the verbal
scores, so there seems to be a strong positive correlation between verbal scores and
writing scores.
G) Average Math SAT Score (2005-06)
Average math SAT score 2005-06
Mean                     540.59
Median                   529.00
Std. Deviation           37.46
Skewness                 .47
Std. Error of Skewness   .33
Minimum                  472.00
Maximum                  617.00
The histogram of the average math SAT score also shows a positive skewness (.47). Some
facts that can be derived from the illustration are: the most frequent values range
between 510 and 530; the mode is around 510; there are some gaps in the data set; and
the values near 470, 480 and 610 may be the outliers that affected the standard
deviation.
Some facts about the box plots are: the Midwest has the highest values and the South
has the lowest values; as with the previous variable, the same two states show up as
outliers in the Midwest; the West, the South and the Midwest show a positive skewness;
the South has the most diverse scores; and the Northeast shows the most clustered
dispersion.
H) % of students eligible for free/reduced lunch 2006-07
% of students eligible for free/reduced lunch 2006-07
Mean                     39.82
Median                   37.35
Std. Deviation           10.64
Skewness                 .62
Std. Error of Skewness   .34
Minimum                  17.70
Maximum                  67.50
The histogram shows a slight positive skewness. Some facts that can be derived
from the illustration are: the most frequent values range between 30% and 40%;
the mode is around 35%; there are some gaps in the data set; and the values near 20%,
60% and 70% may be the outliers that affected the standard deviation.
According to the box plots: the dispersions in all four regions are clustered; there
are a few outliers in the West, the South and the Northeast; the Northeast and the
Midwest show a positive skewness; and the Midwest has the most diverse values. Unlike
the histogram, the box plots do not present the data in terms of percentages.
I) % of students with disabilities 2006-07
% of students with disabilities 2006-07
Mean                     14.22
Median                   14.30
Std. Deviation           2.13
Skewness                 .23
Std. Error of Skewness   .33
Minimum                  10.50
Maximum                  19.90
The histogram shows a positive skewness. The facts that I can interpret from the
histogram are: the most frequent values range between 14% and 16%; the mode
is around 15%; and although 10% may be an outlier that affected the standard deviation,
the standard deviation is small.
According to the box plots, the West, the Northeast and the Midwest show
a positive skewness. The Northeast and the South have the most diverse values. The
Northeast includes the highest percentages, while the West includes the lowest
percentages.
J) Total Revenues for the Year 2005-06 (in thousands)
Total revenues for the year 2005-06 (in thousands)
Mean                     10208705.02
Median                   6346033.00
Std. Deviation           12216880.36
Skewness                 2.61
Std. Error of Skewness   .33
Minimum                  958109.00
Maximum                  63785872.00
The histogram shows a strong positive skewness (2.61). There are many gaps and
outliers in the data. The most frequent values range between 958,109,000 and
100,000,000; the main outliers are 400,000,000 and 600,000,000, which inflated
the standard deviation.
The box plots show the differences among the regions. The Northeast has the most
diverse values within a large range, and also includes the lowest and the highest
revenues. There are some extreme values for the West and the South. All the regions
show a positive skewness.
K) Enrollment Fall (2005)
Enrollment Fall 2005
Mean                     963005.84
Median                   654526.00
Std. Deviation           1145966.304
Skewness                 2.98
Std. Error of Skewness   .33
Minimum                  76876.00
Maximum                  6437202.00
The histogram shows a strong positive skewness (2.98). There are many gaps and
outliers in the data. The most frequent values range between 77,000 and 100,000. The
main outliers are 4,000,000 and 6,437,202.
According to the box plots, the outliers are in the West and the South. The
Northeast has the widest range compared to the other regions, and it has a positive
skewness.
PART 2
I explained above the differences among regions through histograms, descriptive
statistics and box plots. In addition, I want to add the ANOVA and Tukey test results
regarding the regional differences. The ANOVA results told me whether there is a
significant difference in means, and I used the Tukey values to find where the
differences are and to double-check, so that I would not reject the null hypothesis
incorrectly. To get a measure of practical significance, I evaluated the effect size
through eta squared values. For eta squared, values of .01, .06, and .14 represent
small, medium, and large effect sizes, respectively.
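The effect-size benchmarks can be expressed as a small Python sketch. For a one-way ANOVA, eta squared can be recovered from the F ratio and its degrees of freedom as F·df1 / (F·df1 + df2). The function names and the "negligible" label for values below .01 are assumptions added for illustration; the F ratio of 13.08 with (3, 46) degrees of freedom is the one reported in this part for the pupil/teacher ratio.

```python
def eta_squared_from_f(f_ratio, df_between, df_within):
    """Eta squared for a one-way ANOVA, recovered from F and its degrees of freedom."""
    return (f_ratio * df_between) / (f_ratio * df_between + df_within)

def effect_size_label(eta_sq):
    """Classify eta squared with the .01 / .06 / .14 benchmarks."""
    if eta_sq >= 0.14:
        return "large"
    if eta_sq >= 0.06:
        return "medium"
    if eta_sq >= 0.01:
        return "small"
    return "negligible"  # label below .01 is an assumption, not from the source

eta = eta_squared_from_f(13.08, 3, 46)
print(round(eta, 2), effect_size_label(eta))   # 0.46 large
```

Recomputing eta squared this way is a quick consistency check between a reported F ratio and a reported effect size.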
1. Expenditure per pupil
Statistical significance: F ratio = 9.75 > F(3, 46) = 2.81
Reject the null hypothesis: at the .05 level the calculated F is greater than the
critical value, so there is a significant difference among the regional means. There is
greater variability between regions than within them. According to Levene's test
(p = .24), the regions can be treated as having equal variances.
According to the Tukey values, there is a significant difference between the
Northeast and each of the other regions (the South, the Midwest and the West). The
earlier graphical analysis shows that the Northeast has the highest expenditure
values compared to the other regions.
Practical significance: Partial eta squared = .36, so the effect size is large.
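The F ratio compared with the critical value above is the ratio of between-region to within-region variability. The following is a minimal Python sketch of that calculation, assuming a plain one-way ANOVA; the function name and the regional values are hypothetical and are not the project data.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Variability of the group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variability of the observations around their own group mean
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical expenditures (in thousands of dollars) for four regions.
regions = [
    [8, 9, 10],      # West
    [9, 10, 11],     # Midwest
    [8, 9, 10],      # South
    [13, 14, 15],    # Northeast
]
f_ratio, df_b, df_w = one_way_anova(regions)
print(round(f_ratio, 2), df_b, df_w)   # 17.0 3 8
```

In this made-up example one region sits well above the others, so the between-region mean square is much larger than the within-region mean square and F is large.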
2. Pupil/teacher ratio
Statistical significance: F ratio = 13.08 > F(3, 46) = 2.81
Reject the null hypothesis: at the .05 level the calculated F is greater than the
critical value, so there is a significant difference among the regional means of the
pupil/teacher ratio. There is greater variability between regions than within them.
According to Levene's test (p = .01), the regions do not have equal variances.
According to the Tukey values, there is a significant difference between the West
and the rest of the regions: the West's mean pupil/teacher ratio is higher than those
of the Northeast, the Midwest and the South. Furthermore, there is a significant
difference between the South and the Northeast, but the Northeast has a higher mean.
Practical significance: Partial eta squared = .46, so the effect size is large.
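Levene's test, used in this part to check whether the regions have equal variances, can be sketched as an ANOVA on absolute deviations from each group mean. The Python below is a minimal illustration that computes only the W statistic; turning W into a probability requires the F distribution, which is omitted here. The function name and the data are hypothetical.

```python
def levene_statistic(groups):
    """Levene's W: a one-way ANOVA F computed on absolute deviations from group means."""
    deviations = []
    for g in groups:
        mean = sum(g) / len(g)
        deviations.append([abs(v - mean) for v in g])
    all_dev = [d for g in deviations for d in g]
    grand = sum(all_dev) / len(all_dev)
    k, n = len(deviations), len(all_dev)
    between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in deviations) / (k - 1)
    within = sum((d - sum(g) / len(g)) ** 2 for g in deviations for d in g) / (n - k)
    return between / within

# Hypothetical pupil/teacher ratios: a tightly clustered region and a dispersed one.
clustered = [14, 15, 16]
dispersed = [12, 18, 24]
print(round(levene_statistic([clustered, dispersed]), 2))
```

A larger W means the spreads differ more between groups; two groups with identical spread give W = 0 regardless of how far apart their means are.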
3. Teacher average salary
Statistical significance: F ratio = 3.45 > F(3, 46) = 2.81
Reject the null hypothesis: at the .05 level the calculated F is greater than the
critical value, so there is a significant difference among the regional means of
average teacher salary. There is greater variability between regions than within them.
According to Levene's test (p = .66), the regions can be treated as having equal
variances.
According to the Tukey values, there is a significant difference between the South
and the Northeast; the Northeast has the higher mean salary.
Practical significance: Partial eta squared = .18, so the effect size is fairly large.
4. Percentage of all eligible students taking the SAT, 2006-07
Statistical significance: F ratio = 16.66 > F(3, 46) = 2.81
Reject the null hypothesis: at the .05 level the calculated F is greater than the
critical value, so there is a significant difference among the regional means. There is
greater variability between regions than within them. According to Levene's test
(probability close to zero), the regions do not have equal variances.
According to the Tukey values, there is a significant difference between the
Northeast and the rest of the regions; once more the Northeast has the highest rates.
Another difference shows up between the Midwest and the South: the South has a
higher mean percentage of eligible students taking the SAT than the Midwest.
Practical significance: Partial eta squared = .52, so the effect size is large.
5. Performance on the SAT verbal
Statistical significance: F ratio = 12.00 > F(3, 46) = 2.81
Reject the null hypothesis: at the .05 level the calculated F is greater than the
critical value, so there is a significant difference among the regional means. There is
greater variability between regions than within them. According to Levene's test
(probability close to zero), the regions do not have equal variances.
According to the Tukey values, there is a significant difference between the
Midwest and the rest of the regions, so the Midwest has the highest SAT verbal scores
compared to the other regions. Another difference shows up between the Midwest and the
South: the South has a higher mean percentage of eligible students taking the SAT.
Practical significance: Partial eta squared = .43, so the effect size is large.
6. Performance on the SAT writing
Statistical significance: F ratio = 9.62 > F(3, 46) = 2.81
Reject the null hypothesis: at the .05 level the calculated F is greater than the
critical value, so there is a significant difference among the regional means. There is
greater variability between regions than within them. According to Levene's test
(probability close to zero), the regions do not have equal variances.
According to the Tukey values, there is a significant difference between the
Midwest and the rest of the regions, so the Midwest has the highest SAT writing scores
as well, compared to the other regions.
Practical significance: Partial eta squared = .38, so the effect size is large.
7. Performance on the SAT math
Statistical significance: F ratio = 16.94 > F(3, 46) = 2.81
Reject the null hypothesis: at the .05 level the calculated F is greater than the
critical value, so there is a significant difference among the regional means. There is
greater variability between regions than within them. According to Levene's test
(probability close to zero), the regions do not have equal variances.
According to the Tukey values, there is a significant difference between the
Midwest and the rest of the regions, so the Midwest has the highest SAT math scores
compared to the other regions. Another difference shows up between the West and the
Northeast: the West has higher math scores than the Northeast.
Practical significance: Partial eta squared = .52, so the effect size is large.
8. SES as measured by percent of students on free/reduced lunch
Statistical significance: F ratio = 14.87 > F(3, 46) = 2.81
Reject the null hypothesis: at the .05 level the calculated F is greater than the
critical value, so there is a significant difference among the regional means. There is
greater variability between regions than within them. According to Levene's test
(p = .29), the regions can be treated as having equal variances.
According to the Tukey values, there is a significant difference between the South
and the rest of the regions. The South has the highest mean. Another significant
difference is between the Northeast and the West: the West has a higher mean than the
Northeast.
Practical significance: Partial eta squared = .49, so the effect size is large.
9. Percent of students with disabilities
Statistical significance: F ratio = 10.25 > F(3, 46) = 2.81
Reject the null hypothesis: at the .05 level the calculated F is greater than the
critical value, so there is a significant difference among the regional means. There is
greater variability between regions than within them. According to Levene's test
(p = .07), the regions can be treated as having equal variances.
According to the Tukey values, the Northeast differs significantly from the West and
the South; the Northeast has the higher mean. Another significant difference shows up
between the Midwest and the West; the Midwest has the higher mean.
Practical significance: Partial eta squared = .40, so the effect size is large.
10. Revenues
Statistical significance: F ratio = 30 > F(3, 46) = 2.81
Reject the null hypothesis: at the .05 level the calculated F is greater than the
critical value, so, with a probability of 83%, there is a significant difference among
the regional means. There is greater variability between regions than within them.
According to Levene's test (p = .52), the regions can be treated as having equal
variances.
According to the Tukey values, the Northeast differs significantly from the West and
the South; the Northeast has the higher mean. Another significant difference shows up
between the Midwest and the West; the Midwest has the higher mean. The results are
similar to those for the disability variable.
Practical significance: Partial eta squared = .40, so the effect size is large.
Conclusion
I expected the Northeast to have higher SAT scores, since it has the highest values
in terms of revenues, expenditure and teacher/pupil rate. However, according to my
analyses above, the Midwest has the highest SAT scores. When the percentage of eligible
students taking the SAT increases, the SAT scores decrease. There seems to be a clear
negative correlation between the two variables. The main reason behind the Midwest's
success appears to be its low participation rate: the Midwest has the lowest percentage
of eligible students taking the SAT, which affected the analysis.
PART 3
In my analyses above, I noticed significant correlations between some variables. To
verify my observations, I carried out the following analyses:
1a. Expenditure and SAT scores (verbal)
Correlation coefficient: -0.42 → Moderate negative relationship
Statistical significance: Average verbal SAT scores are expected to decline slightly as
the current expenditure per pupil in average daily attendance increases.
Practical significance: The scatter plot reveals no significant correlation between
the two variables. The discrepancy with the correlation coefficient may be due to
some outliers.
Corresponding regression equation: y = -0.006x + 600.47 (almost constant)
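The correlation coefficient and the regression equation reported for each pair come from the same sums of squares. The following is a minimal Python sketch of both calculations, assuming Pearson's r and an ordinary least-squares line; the function name and the data pairs are hypothetical and are not the project data.

```python
import math

def pearson_and_line(xs, ys):
    """Return (r, slope, intercept) for paired observations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)       # strength and direction of the relationship
    slope = sxy / sxx                    # change in y per one unit of x
    intercept = mean_y - slope * mean_x
    return r, slope, intercept

# Hypothetical (expenditure in dollars, verbal score) pairs.
expenditure = [8000, 9000, 10000, 11000, 12000]
verbal = [560, 540, 545, 520, 510]
r, slope, intercept = pearson_and_line(expenditure, verbal)
print(round(r, 2), round(slope, 4), round(intercept, 1))
```

Note that the slope depends on the units of x while r does not: a slope that looks tiny per dollar can still accompany a strong correlation.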
1b. Expenditure and SAT scores (math)
Correlation coefficient: -0.39 → Moderate negative relationship
Statistical significance: Average math SAT scores are expected to decline slightly as
the current expenditure per pupil in average daily attendance increases.
Practical significance: The scatter plot reveals no significant correlation between
the two variables. The discrepancy with the correlation coefficient may be due to
some outliers.
Corresponding regression equation: y = -0.006x + 600.89 (almost constant)
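The regression line can also be recovered from summary statistics alone, since the least-squares slope equals r·(SD of y)/(SD of x) and the line passes through the two means. The Python sketch below applies this to the expenditure and math SAT figures reported in Part 1; the function name is hypothetical, and because r is rounded to two decimals the result only approximates the reported equation.

```python
def line_from_summary(r, sd_x, sd_y, mean_x, mean_y):
    """Least-squares slope and intercept recovered from r, the SDs and the means."""
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# r = -0.39; expenditure: SD 2502.202, mean 10327.78; math SAT: SD 37.46, mean 540.59
slope, intercept = line_from_summary(-0.39, 2502.202, 37.46, 10327.78, 540.59)
print(round(slope, 3), round(intercept, 1))   # -0.006 600.9
print(round(slope * 1000, 1))                 # change in score per additional 1,000 dollars
```

Expressed per 1,000 dollars rather than per dollar, the slope is close to six points, so a coefficient of -0.006 is small mainly because of the unit of measurement.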
2a. Pupil/teacher ratio and SAT scores (verbal)
Correlation coefficient: -0.03 → Very low negative relationship (almost none)
Statistical significance: Average verbal SAT scores are expected to be unrelated to the
average number of pupils per teacher.
Practical significance: The scatter plot reveals a moderate negative correlation
between the two variables. The average verbal SAT scores decline slightly as the
pupil/teacher ratio increases. The discrepancy with the correlation coefficient may be
due to some outliers.
Corresponding regression equation: y = -0.45x + 541.76
2b. Pupil/teacher ratio and SAT scores (math)
Correlation coefficient: -0.03 → Very low negative relationship (almost none)
Statistical significance: Average math SAT scores are expected to be unrelated to the
average number of pupils per teacher.
Practical significance: The scatter plot reveals a moderate negative correlation
between the two variables. The average math SAT scores decline slightly as the
pupil/teacher ratio increases. The discrepancy with the correlation coefficient may be
due to some outliers.
Corresponding regression equation: y = -0.446x + 547.35
3a. Salary and SAT scores (verbal)
Correlation coefficient: -0.48 → Moderate negative relationship
Statistical significance: Average verbal SAT scores are expected to decline slightly as
the teachers' estimated average annual salary increases.
Practical significance: The scatter plot reveals no significant correlation between
the two variables. The discrepancy with the correlation coefficient may be due to
some outliers.
Corresponding regression equation: y = -0.003x + 659.56 (almost constant)
3b. Salary and SAT scores (math)
Correlation coefficient: -0.41 → Moderate negative relationship
Statistical significance: Average math SAT scores are expected to decline slightly as
the teachers' estimated average annual salary increases.
Practical significance: The scatter plot reveals no significant correlation between
the two variables. The discrepancy with the correlation coefficient may be due to
some outliers.
Corresponding regression equation: y = -0.002x + 646.08 (almost constant)
4a. Revenues and SAT scores (verbal)
Correlation coefficient: -0.29 → Low negative relationship
Statistical significance: Average verbal SAT scores are expected to decline slightly as
the total revenues increase.
Practical significance: The scatter plot reveals no significant correlation between
the two variables. The discrepancy with the correlation coefficient may be due to
some outliers.
Corresponding regression equation: y = 0.000x + 548.06 → y = 548.06 (constant)
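A slope printed as 0.000 does not by itself mean the line is flat. Total revenues are recorded in thousands of dollars and reach tens of millions, so the per-unit slope can be far smaller than three decimals can show. The Python sketch below illustrates this with hypothetical data (not the project data); the function name is also an assumption.

```python
def least_squares_slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical revenues (in thousands of dollars) and verbal scores.
revenue_thousands = [1_000_000, 5_000_000, 10_000_000, 20_000_000, 60_000_000]
verbal = [560, 550, 540, 530, 500]

slope = least_squares_slope(revenue_thousands, verbal)
print(f"{slope:.3f}")   # rounds to -0.000 at three decimals
print(f"{slope:.2e}")   # scientific notation keeps the information

# Re-express revenue in billions of dollars (1,000,000 thousand = 1 billion).
revenue_billions = [x / 1_000_000 for x in revenue_thousands]
print(round(least_squares_slope(revenue_billions, verbal), 2))  # points per billion dollars
```

Rescaling x changes the printed slope by the same factor but leaves the correlation coefficient unchanged, which is why the coefficient is the safer guide to the strength of the relationship.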
4b. Revenues and SAT scores (math)
Correlation coefficient: -0.2 → Low negative relationship
Statistical significance: Average math SAT scores are expected to decline slightly as
the total revenues increase.
Practical significance: The scatter plot reveals no significant correlation between
the two variables. The discrepancy with the correlation coefficient may be due to
some outliers.
Corresponding regression equation: y = 0.000x + 549.56 → y = 549.56 (constant)
5a. SES and SAT scores (verbal)
Correlation coefficient: 0.02 → Very low positive relationship (almost none)
Statistical significance: The percentage of students eligible for free or reduced-price
lunch is expected to have no effect on average verbal SAT scores.
Practical significance: The scatter plot reveals no correlation between the two
variables. Any discrepancy with the correlation coefficient may be due to some outliers.
Corresponding regression equation: y = 0.071x + 532.11
5b. SES and SAT scores (math)
Correlation coefficient: -0.08 → Very low negative relationship (almost none)
Statistical significance: The percentage of students eligible for free or reduced-price
lunch is expected to have no effect on average math SAT scores.
Practical significance: The scatter plot reveals no significant correlation between
the two variables. Any discrepancy with the correlation coefficient may be due to
some outliers.
Corresponding regression equation: y = -0.282x + 551.81
6a. Students with disabilities and SAT scores (verbal)
Correlation coefficient: -0.07 → Very low negative relationship (almost none)
Statistical significance: The percentage of students with disabilities is expected to
have no effect on average verbal SAT scores.
Practical significance: The scatter plot reveals no significant correlation between
the two variables. Any discrepancy with the correlation coefficient may be due to
some outliers.
Corresponding regression equation: y = -1.242x + 552.6
6b. Students with disabilities and SAT scores (math)
Correlation coefficient: -0.07 → Very low negative relationship (almost none)
Statistical significance: The percentage of students with disabilities is expected to
have no effect on average math SAT scores.
Practical significance: The scatter plot reveals no significant correlation between
the two variables. Any discrepancy with the correlation coefficient may be due to
some outliers.
Corresponding regression equation: y = -1.231x + 558.1
Conclusions
I expected a positive correlation between some variables, especially between
expenditure and SAT scores and between pupil/teacher ratio and SAT scores, but
surprisingly I found a negative correlation between those variables. Although
statistical data provide important insights into the information collected, it is
important to support numerical information with graphical evidence (scatter plots
etc.) before interpreting it. In most cases the scatter plots supported the
correlation coefficients as well. I think the population differences among regions had
a strong impact on the analysis of SAT scores in terms of correlations. Therefore,
researchers should interpret the results carefully, keeping population size in view.