NIU Testing Services Report:
Standard Test Analyses
Test Item Analysis Printout
Testing Services’ Test Item Analysis provides a variety of information about each item on your exam. This
information can be used to improve your exams and assess teaching effectiveness. Following is an example
of a Test Item Analysis followed by descriptions of its contents.
• Item No. – Refers to the exam item numbers in sequential order.
• Disc. Index – The Discrimination Index is a measure of how well the item discriminates among the various ability levels within the group of examinees. In many classroom exams the discrimination indices will generally range from +.50 to -.30, although sometimes the index may be +.60 or higher or -.50 or lower. High values indicate that the item is discriminating appropriately between highly competent and less competent examinees. A high discrimination index indicates that the competent examinees answered the item correctly AND that less competent examinees answered the item incorrectly – a desirable characteristic of an exam item. A low discrimination index generally indicates that some competent examinees answered incorrectly and less competent examinees answered correctly – an undesirable characteristic of an exam item. Literature in the field of educational measurement presents more than one algorithm for computing an item’s discrimination index. The index used by NIU’s Testing Services is the point-biserial correlation coefficient between the right/wrong scores on the item across examinees and the total exam scores of the examinees. Like other correlations, the range of possible values for the point-biserial is +1.0 to -1.0. In practice, however, discrimination indices close to +1.0 or -1.0 are extremely rare. Generally, items exhibiting a discrimination index less than +.15 may be good candidates for modification and improvement. Occasionally the discrimination index for some items equals 0.00. This is usually because all examinees answered the item correctly; the ITEM AVG will be 1.00. Such an item is not discriminating among ability levels within the examinee group. (A computational sketch of the Item Diff. and Disc. Index values follows this list.)
• Item Diff. – The Item Difficulty is the number of examinees answering the item correctly divided by the number of examinees (i.e., the proportion of examinees who chose the correct answer). For example, if 7 examinees out of 10 answer an item correctly, the item difficulty is 7/10, or .70.
• Item Weight – Refers to the number of points or weight assigned to each exam item.
• No. Wrong – The number of examinees who answered each exam item incorrectly.
• N % – Number and Percent of examinees, respectively, choosing each response option for an item. The correct response option is indicated by an asterisk "*".
• Omit – This column shows the number and percent of examinees who did not respond to each exam item. Omits near the end of an exam may indicate that insufficient time was allowed.
• N – Number of examinees taking the exam.
• Mean – The arithmetic average of the exam scores. Normally, the mean should fall approximately halfway between the highest possible exam score and the expected chance score.
• Std Dev – The standard deviation is a measure of the dispersion of exam scores around the mean exam score. Normally the SD value would be expected to be around 1/6 of the range between the highest possible exam score and the expected chance score.
• KR20 and KR21 – KR20 and KR21 are estimates of the reliability of an exam. Exam reliability is traditionally measured by computing the correlation between the exam scores from two equivalent exam administrations with the same group of examinees; ideally, the examinees would achieve very similar exam scores on the two administrations. Because reliability is expressed as a correlation, the range of possible reliability estimates is +1.0 to -1.0, with high, positive values indicating reliable exams.
There are numerous ways of assessing the reliability of an exam, including test-retest estimates and parallel/alternate-form estimates. KR20 and KR21 compute a measure of exam reliability from only one exam administration by estimating the strength of the correlation between theoretical halves of the same exam.
Carefully developed standardized achievement exams usually have reliability estimates around .90. The typical classroom exam of 50-75 items might have reliability values averaging in the low .70s. Exam reliability can be improved by improving the discrimination of the exam items or by adding more highly discriminating items to the exam. (A sketch of the KR20, KR21, and SE Measure computations also follows this list.)
• SE Measure – The Standard Error of Measurement is an estimate of the possible amount of error in a set of exam scores. Theoretically, each actual (obtained) exam score is only an estimate of the student's true score or ability. The true score cannot be determined directly as there are a variety of sources of error involved in estimating an examinee’s ability. For this reason, the Standard Error of Measurement is computed to provide a range in which an examinee’s true score is likely to fall. For example, if an examinee's obtained score is 60 and the SE MEAS is 3, the true score is likely to be near 60 but could be as low as 51 or as high as 69 (60 ± 3 Standard Errors of Measurement).
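The Item Diff. and Disc. Index values can be reproduced from a simple right/wrong scoring matrix. The Python sketch below is an illustration only, not Testing Services’ actual scoring program: it computes each item’s proportion correct and the point-biserial correlation between the item’s 0/1 scores and the total exam scores. The array name scores and the function name item_statistics are invented for this example.

import numpy as np

def item_statistics(scores):
    """scores: 2-D array of 0/1 item scores, one row per examinee, one column per item."""
    totals = scores.sum(axis=1)           # total exam score for each examinee
    difficulty = scores.mean(axis=0)      # Item Diff.: proportion answering each item correctly
    disc = np.zeros(scores.shape[1])
    for j in range(scores.shape[1]):
        item = scores[:, j]
        # Disc. Index: point-biserial (Pearson) correlation between the 0/1 item
        # scores and the total exam scores. If every examinee answered the item
        # the same way, the correlation is undefined and the index stays 0.00,
        # as described above.
        if item.std() > 0 and totals.std() > 0:
            disc[j] = np.corrcoef(item, totals)[0, 1]
    return difficulty, disc

# Example: 5 examinees, 3 items
scores = np.array([[1, 1, 0],
                   [1, 0, 0],
                   [1, 1, 1],
                   [0, 0, 0],
                   [1, 1, 1]])
diff, disc = item_statistics(scores)
print("Item Diff.: ", np.round(diff, 2))   # first item: 4 of 5 correct = .80
print("Disc. Index:", np.round(disc, 2))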
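The KR20, KR21, and SE Measure values follow from the same scoring matrix. The sketch below applies the standard Kuder-Richardson formulas and the usual relationship SEM = SD x sqrt(1 - reliability); it is a reconstruction under those assumptions, not NIU’s implementation, and the function name kr20_kr21_sem is invented.

import numpy as np

def kr20_kr21_sem(scores):
    """scores: 2-D array of 0/1 item scores, one row per examinee, one column per item."""
    k = scores.shape[1]                    # number of items
    totals = scores.sum(axis=1)            # total exam scores
    var_total = totals.var()               # variance of the total scores
    p = scores.mean(axis=0)                # item difficulties
    q = 1 - p
    # Kuder-Richardson formula 20 and its simplification, formula 21
    kr20 = (k / (k - 1)) * (1 - (p * q).sum() / var_total)
    m = totals.mean()
    kr21 = (k / (k - 1)) * (1 - m * (k - m) / (k * var_total))
    # Standard Error of Measurement: SD of the exam scores times sqrt(1 - reliability)
    sem = totals.std() * np.sqrt(1 - kr20)
    return kr20, kr21, sem

scores = np.array([[1, 1, 0],
                   [1, 0, 0],
                   [1, 1, 1],
                   [0, 0, 0],
                   [1, 1, 1]])
kr20, kr21, sem = kr20_kr21_sem(scores)
print(round(kr20, 2), round(kr21, 2), round(sem, 2))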
SAMPLE SCORE DISTRIBUTION REPORT
• SCORE: Distribution of weighted test scores; each score is the sum of the weights assigned to the chosen answers
• PERCENT: Conversion of the weighted score to a percentage score (Note: If the maximum score is 100 and equal weights are used, this score is the percent correct)
• FREQUENCY: Indicates the number of individuals obtaining each score (Note: The sum of the FREQUENCY column equals the number of individuals taking the test)
• PERCENTILE: Indicates the percentile level of each score (i.e., the % of individuals scoring equal to or less than each particular score); a computational sketch follows this list
• FREQUENCY GRAPH: The lines of asterisks (" * ") to the right of the percentile distribution are a graphic representation of the frequency distribution (Note: If space permits, the number of asterisks equals the number of students achieving each score)
• MEAN: Mean of the weighted scores
• STANDARD DEVIATION: Standard deviation of the weighted scores
• NO. of EXAMS: Number of examinees
• The cumulative score distribution report uses cumulative scores, rather than current test scores
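As an illustration of how the FREQUENCY and PERCENTILE columns relate to the raw weighted scores, the sketch below tallies each distinct score and reports the percent of examinees scoring equal to or less than it. This is an assumed reconstruction for clarity, not the report generator itself; the function name score_distribution is hypothetical.

from collections import Counter

def score_distribution(scores):
    """scores: list of weighted test scores, one per examinee."""
    n = len(scores)
    freq = Counter(scores)                        # FREQUENCY for each distinct score
    rows = []
    for score in sorted(freq, reverse=True):
        at_or_below = sum(1 for s in scores if s <= score)
        percentile = 100 * at_or_below / n        # PERCENTILE: % scoring equal to or less
        rows.append((score, freq[score], round(percentile, 1)))
    return rows

# Example: seven examinees
print(score_distribution([45, 45, 45, 42, 42, 37, 30]))
# [(45, 3, 100.0), (42, 2, 57.1), (37, 1, 28.6), (30, 1, 14.3)]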
SAMPLE STUDENT ROSTER REPORT
• STUDENT NAME: Last name of each individual followed by his/her first and middle initials
• STUDENT ID: ID number (usually the Z-ID, but it could be any number unique to each individual)
• SEC. ID: Course section number
• GRADE (optional): This column indicates the letter grade assigned to each individual for the current test
• PERCENTILE: Percentile level of the individual’s total score (i.e., the percentage of the class scoring equal to or less than that individual’s score) (Note: This is for the cumulative score, not for the current test)
• TOTAL: Sum of the current test plus previous tests (i.e., the total score)
• 1…2…3…: Individual test scores