Relationships between Alpha Coefficient and Scale Stability Index
James B. Olsen, Alpine Media Corporation
July 8, 2004
Purpose
This paper describes an empirical investigation of the relationship between two reliability
coefficients, each used as an index of relationship among a set of item or task scores. These item and task scores
may be from standard multiple-choice exams, performance-like tests (scenario or
simulation based) and performance tests (intentionally designed to evaluate candidate
proficiency levels on authentic, integrated knowledge and skill clusters). The reliability
coefficients selected for this comparison are the Cronbach alpha coefficient and the Judd
Scale Stability Index (SSI).
Introduction
The Cronbach alpha coefficient is an internal-consistency coefficient that measures the
average degree of relationship among item or task scores, taken over all possible
split-halves of the test. A split half divides the items (not the examinees) into two
comparable sets. The items could be divided by their order in the test (first half versus
second half). Alternatively, the items could be divided into odd-numbered and
even-numbered items. The items could also be divided by difficulty quartile, where the
first set includes items in the first and fourth quartiles and the second set includes
items in the second and third quartiles. There are many other possible ways of dividing
the items into two comparable sets. Each examinee's score on one set (one split half) is
compared with that examinee's score on the other set (second split half), giving a
split-half reliability estimate. The alpha coefficient of reliability is the average of
these estimates over all possible split halves. Coefficient alpha is also closely related
to the KR-20 and KR-21 coefficients; KR-20 is the special case of alpha for dichotomously
scored items.
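In practice alpha is not computed by enumerating split halves. It is computed from the item variances and the variance of the total score: alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), where k is the number of items. The following is a minimal sketch of that standard formula, assuming a small examinee-by-item score matrix; the function name `cronbach_alpha` and the example data are illustrative and are not taken from the paper or from SPSS.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of examinee rows, each a list of k item scores.

    Uses sample variances (n - 1 denominator) for both items and totals.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item (column) across examinees.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Variance of each examinee's total score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 5 examinees x 4 dichotomously scored items.
example = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(cronbach_alpha(example))
```

With dichotomous (0/1) items, as in this example, the same computation is the KR-20 coefficient up to the choice of variance denominator.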
The Judd Scale Stability Index (SSI) is a coefficient that measures the degree to which a
set of item or task scores forms a consistent ordered scale. “The individual Scale Stability
Index (SSI) measures how well a set of responses conforms to a Guttman scale.
Individual scores range from 1 (perfect conformity to a Guttman scale) to 0 (responses
exactly counter to a Guttman scale).” (Judd, 2004) Items or tasks are ordered by item
difficulty (proportion correct across an appropriate sample). Each examinee's actual
response pattern is compared to the Guttman-scale pattern that has the same number of
correct answers. (Note the similarity of this idea to the Rasch model ability measure,
where there is a direct one-to-one relationship between the number-correct score and the
proficiency or ability metric.) SSI creates a hypothetical edge: the point at which the
expected response pattern, in scale order, shifts from a string of correct answers to a
string of incorrect answers. For every item or task whose response does not conform to
this Guttman ordering, SSI computes the distance of that item from the hypothetical edge,
expressed as a discrepancy per item.
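The comparison against a Guttman pattern can be illustrated as follows. This is only a sketch of the idea described above, not Judd's published SSI computation: the function names `order_by_difficulty` and `guttman_deviation` are hypothetical, the distance rule (positions counted from the edge) is an assumption, and no normalization to the 0 to 1 SSI range is attempted.

```python
def order_by_difficulty(responses):
    """Return item indices sorted from easiest to hardest (by proportion correct)."""
    n_items = len(responses[0])
    n_people = len(responses)
    p_correct = [sum(row[j] for row in responses) / n_people for j in range(n_items)]
    return sorted(range(n_items), key=lambda j: -p_correct[j])


def guttman_deviation(row, item_order):
    """Compare one 0/1 response row with the ideal Guttman pattern.

    The ideal pattern for a number-correct score r is r correct answers on the
    r easiest items followed by incorrect answers; the "edge" sits after
    position r. Returns (number of nonconforming items, summed distance of
    those items from the edge). The distance rule is an illustrative assumption.
    """
    ordered = [row[j] for j in item_order]
    edge = sum(ordered)  # number-correct score fixes where the edge falls
    errors = 0
    distance = 0
    for pos, resp in enumerate(ordered):
        expected = 1 if pos < edge else 0
        if resp != expected:
            errors += 1
            # Items left of the edge are edge - pos steps away; items right of
            # it are pos - edge + 1 steps away.
            distance += (edge - pos) if pos < edge else (pos - edge + 1)
    return errors, distance


# Hypothetical data: 4 examinees x 4 items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 1, 0],
]
order = order_by_difficulty(responses)
for row in responses:
    print(row, guttman_deviation(row, order))
```

A row that conforms perfectly to the Guttman ordering yields (0, 0); larger values indicate responses that fall farther from the expected edge.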
Research Question
Is there a relationship between the Cronbach alpha coefficient and the Judd Scale
Stability Index?
Research Procedure
Random samples of item responses to a ten-item test with thirteen examinees were
generated by the scale stability index program, and the scale stability index for each
sample was recorded. Each random sample of examinee responses was also analyzed using the
alpha reliability procedure in SPSS. The resulting alpha reliabilities and scale
stability indices, one pair per sample, were then correlated using the Pearson bivariate
correlation and two non-parametric correlations, Kendall tau_b and Spearman rho.
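The three correlations named above can be sketched in plain Python as follows. This is an illustrative implementation of the textbook definitions (Pearson product-moment, Spearman as Pearson on average ranks, and Kendall tau_b with its tie correction), not the SPSS procedure used in the study. The two short lists are the first ten SSI and alpha values listed in Appendix A, used only as sample input; the printed values are not expected to match the full-sample results.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall tau_b: (C - D) / sqrt((C + D + Tx) * (C + D + Ty))."""
    conc = disc = tie_x = tie_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: excluded from both factors
            if dx == 0:
                tie_x += 1
            elif dy == 0:
                tie_y += 1
            elif dx * dy > 0:
                conc += 1
            else:
                disc += 1
    return (conc - disc) / sqrt((conc + disc + tie_x) * (conc + disc + tie_y))


# First ten SSI / alpha pairs from Appendix A (sample input only).
ssi = [.73, .57, .65, .68, .63, .65, .63, .65, .82, .64]
alpha = [.743, .487, .705, .682, .555, .676, .674, .661, .814, .616]
print(pearson(ssi, alpha), kendall_tau_b(ssi, alpha), spearman(ssi, alpha))
```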
Sample
One hundred and seven data sets (ten items x 13 examinees) were analyzed with the scale
stability index and alpha reliability indices. Appendix A includes the data used for this
analysis.
Results
The SSI scores and the alpha reliability values were correlated 0.900 (significance level
< .01, two-tailed test) using the Pearson bivariate parametric correlation, 0.770
(significance level < .01, two-tailed test) using the Kendall tau_b non-parametric
correlation, and 0.924 (significance level < .01, two-tailed test) using the Spearman rho
non-parametric correlation.
Descriptive statistics for the analysis are provided in Table 1. These statistics are very
comparable for the range, minimum, maximum, mean score, standard error of the mean,
standard deviation, and variance. Differences were found in the skewness and kurtosis
statistics. The alpha reliability showed a larger negative skewness (-.645 versus -.098
for SSI), and the SSI index showed a larger negative kurtosis (-1.492 versus -.190 for
alpha).
Frequency distributions were computed for the data and are presented in Table 2.
These frequency distribution statistics also include multiple measures of central tendency
(mean, median, and mode). The multiple measures of central tendency and the
distributional proportions at quartiles and deciles are also very similar.
Table 1
Descriptive Statistics
Statistic                  SSISCORE    ALPHA
N                          107         107
Range                      .37         .487
Minimum                    .55         .405
Maximum                    .92         .892
Mean                       .7619       .74205
Std. Error of Mean         .01083      .010651
Std. Deviation             .11203      .110173
Variance                   .013        .012
Skewness                   -.098       -.645
Std. Error of Skewness     .234        .234
Kurtosis                   -1.492      -.190
Std. Error of Kurtosis     .463        .463
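The standard errors reported in Table 1 can be checked from N and the standard deviation alone. The sketch below assumes the conventional sample formulas for the standard error of the mean, skewness, and kurtosis; the function names are illustrative. Under that assumption, the results for N = 107 agree with the tabled values (.01083 for the SSI mean, .234 for skewness, .463 for kurtosis) after rounding.

```python
from math import sqrt


def se_mean(sd, n):
    """Standard error of the mean: SD / sqrt(N)."""
    return sd / sqrt(n)


def se_skewness(n):
    """Standard error of sample skewness; depends only on N."""
    return sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of sample excess kurtosis; depends only on N."""
    return 2.0 * se_skewness(n) * sqrt((n * n - 1.0) / ((n - 3) * (n + 5)))


n = 107
print(round(se_mean(0.11203, n), 5))    # SSISCORE row
print(round(se_mean(0.110173, n), 6))   # ALPHA row
print(round(se_skewness(n), 3))
print(round(se_kurtosis(n), 3))
```

Because the skewness and kurtosis standard errors depend only on N, they are identical for the SSISCORE and ALPHA rows.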
Table 2
Frequency Distribution Analysis
Statistics
Statistic          SSISCORE    ALPHA
N Valid            107         107
N Missing          0           0
Mean               .7619       .74205
Median             .7500       .74700
Mode               .90         .841a
Percentile 10      .6180       .57300
Percentile 20      .6500       .66100
Percentile 25      .6600       .67500
Percentile 30      .6640       .68400
Percentile 40      .7000       .71380
Percentile 50      .7500       .74700
Percentile 60      .8200       .79580
Percentile 70      .8600       .83620
Percentile 75      .8700       .84500
Percentile 80      .8800       .84940
Percentile 90      .9000       .86900
a. Multiple modes exist. The smallest value is shown.
Figure 1 presents a scatter plot of the SSI and alpha reliability values. The scatter
plot shows strong comparability between the SSI and alpha reliability when either
estimate is above 0.80: in that region the points lie close to the expected regression
diagonal. When the estimates are below 0.80 there is considerably greater scatter, or
deviation, from the expected regression diagonal.
Figure 1
Scatterplot of SSI Scores (Y axis) and Alpha Reliabilities (X Axis)
[Figure not reproduced. Y axis: SSISCORE, tick marks from 0.60 to 0.90. X axis: ALPHA, tick marks from 0.400 to 0.900.]
Discussion
This investigation has shown a consistently high correlation between the reliability
indices produced by the scale stability index and the alpha reliability procedures. The
two indices rest on substantively different assumptions and computational procedures.
Nevertheless, when applied to common data sets of items and examinees, they show striking
similarity under parametric correlation procedures, non-parametric correlation
procedures, descriptive statistics, and frequency distribution comparisons.
The analysis shows both the comparability of and the differences between the SSI and
alpha reliability values, as illustrated in the scatter plot. SSI and alpha reliability
appear either to measure something in common or to share a relationship with some other
construct or variable related to each measure. At the same time, SSI and alpha
reliability are different measures of the consistency, stability, or generalizability of
scores: they use different assumptions and computational procedures, and they diverge
considerably when the estimate from either index is well below 0.80. The similarity of
the Pearson bivariate correlation (0.900) and the Spearman rho non-parametric correlation
(0.924) suggests considerable consistency between the SSI and the alpha reliability
estimates. The considerably lower Kendall tau_b coefficient (0.770) and the scatterplot
show that SSI and alpha reliability are measuring somewhat different constructs or
variables. More insight into these similarities and differences is needed, and this
investigation is offered as a starting point for that further work.
Appendix A
SSI     Alpha
.73     .743
.57     .487
.65     .705
.68     .682
.63     .555
.65     .676
.63     .674
.65     .661
.82     .814
.64     .616
.79     .756
.74     .684
.81     .772
.70     .678
.72     .667
.70     .661
.64     .708
.61     .549
.64     .664
.90     .841
.66     .725
.91     .863
.63     .574
.91     .872
.56     .590
.60     .682
.90     .886
.55     .576
.91     .865
.65     .620
.61     .569
.62     .656
.66     .628
.92     .887
.90     .869
.59     .567
.75     .686
.63     .549
.63     .713
.66     .717
.67     .553
.92     .849
.90     .849
.92     .862
.89     .884
.89     .868
.69     .710
.88     .849
.88     .847
.92     .878
.68     .684
.61     .477
.88     .841
.87     .829
.65     .611
.69     .675
.72     .738
.87     .845
.89     .872
.80     .792
.88     .841
.65     .524
.64     .405
.69     .610
.88     .879
.89     .864
.69     .687
.75     .727
.88     .874
.87     .869
.87     .865
.87     .843
.86     .808
.85     .798
.87     .853
.84     .850
.79     .741
.83     .810
.81     .772
.90     .865
.66     .733
.66     .688
.90     .845
.91     .892
.90     .854
.60     .737
.68     .708
.66     .742
.60     .671
.67     .647
.73     .747
.86     .843
.85     .812
.74     .736
.74     .710
.82     .749
.82     .767
.67     .690
.73     .754
.84     .825
.85     .798
.80     .796
.86     .828
.80     .749
.83     .795
.84     .829
.82     .769