CHAPTER 4:
Using and Reporting
Standardized Test Results
Assessment In Early Childhood Education
Fifth Edition
Sue C. Wortham
Wortham. Assessment in Early Childhood Education, 5e.
© 2008 by Pearson Education, Inc. All Rights Reserved.
Chapter Objectives
1. Explain the difference between norm-referenced and
criterion-referenced tests.
2. List common characteristics of norm-referenced and
criterion-referenced tests.
3. Explain the advantages and disadvantages of using standardized
tests.
4. Understand how test scores are interpreted and reported.
5. Describe how individual and group test results are used to report
student progress and program effectiveness.
6. Discuss the advantages and disadvantages of using
norm-referenced and criterion-referenced tests with
young children.
7. Understand the difficulties in using standardized tests with young
children.
Distinctions Between Norm-Referenced and
Criterion-Referenced Tests
• Norm-referenced tests provide information on
how the performance of an individual compares
with that of a known group.
• Criterion-referenced tests provide information
on how the individual performed on some
standard or objective (without considering the
performance of others).
Common Characteristics of Norm- and
Criterion-Referenced Tests
• Require a relevant and representative sample of test
items
• Require specification of the achievement domain to be
measured
• Use the same type of test items
• Use the same rules for item writing (except for item
difficulty)
• Judged by the same qualities of goodness (validity and
reliability)
• Useful in educational measurement
(Linn and Gronlund, 2000)
Aptitude vs. Achievement Tests
Aptitude Tests
• Predict a student’s ability to learn a skill or accomplish a task.
  (Stanford-Binet, Wechsler, SAT when used to predict success)
Achievement Tests
• Measure what the student has learned or mastered.
  (California Achievement, Iowa Basic Skills, SAT when used to
  determine what has been learned)
Uses of Norm-Referenced Tests
with School-Age Children
Achievement tests are:
• given to measure and analyze individual and
group performance resulting from the
educational program
• analyzed for trends in achievement
• used to describe program effectiveness: areas of weakness and
strength are identified, and plans can be made to improve curriculum
How Criterion-Referenced Tests Are Used
with Preschoolers
• Developmental screening
• Diagnostic evaluation
• Instructional planning
Developmental screenings determine
whether further evaluation is needed to identify
disabilities and strategies for remediation.
Reasons Criterion-Referenced Tests
Are Used with School-Age Children
• Achievement test scores describe individual
performance and are used to plan instruction for
groups and individual students.
• Diagnostic evaluation: intelligence batteries and tests in
academic content areas are used with students
who demonstrate learning difficulties.
Savant Syndrome
• A condition in which a person otherwise
limited in mental ability has an exceptional
specific skill
– Calculation abilities
– Drawing
– Musical
The Psychometric Approach
Intelligence
• A single attribute?
– Spearman (1863-1945): two-factor theory of intelligence
“g” = general ability
“s” = special abilities
Figure 9.3 According to Spearman (1904), all intelligent abilities have an area of overlap,
which he called g (for “general”). Each ability also depends partly on an s (for “specific”)
factor.
Figure 9.4a Measurements of sprinting, high jumping, and long jumping correlate with
one another because they all depend on the same leg muscles. Similarly, the g factor
that emerges in IQ testing could reflect a single ability that all tests tap.
• Many attributes?
– Thurstone: 7 primary mental abilities
• Spatial ability, perceptual speed, numeric
reasoning, verbal meaning, word fluency,
memory, inductive reasoning
What is Intelligence?
• Fluid intelligence and crystallized intelligence
– Cattell & Horn believed that the “g” factor has
two components:
- Fluid intelligence is the power of reasoning,
solving unfamiliar problems, seeing relationships
and gaining new knowledge
- Crystallized intelligence is acquired knowledge
and the application of that knowledge to
experience.
Concept Check:
A 16-year-old is learning to play chess and is
becoming proficient enough to be accepted into
the school’s chess club. Is this fluid or crystallized
intelligence?
Concept Check:
• Ten years later, the chess player achieves
grandmaster status. Is this a result of fluid or
crystallized intelligence?
Gardner’s Theory of Multiple Intelligences
• Logical-Mathematical
• Linguistic
• Musical
• Spatial
• Bodily-Kinesthetic
• Interpersonal
• Intrapersonal
• Naturalistic
• Existential
Copyright © Allyn & Bacon 2006
Sternberg’s Triarchic Theory
• Contextual Component (“street smarts” or practical)
– Adapting to the environment
• Experiential Component (creative)
– Response to novelty
– Automatization
• Componential Component (“academic or analytical”)
– Information processing
– Efficiency of strategies
Theories and Tests of
Intelligence
• IQ tests
– Intelligence quotient (IQ) tests attempt to
measure an individual’s probable
performance in school and similar settings.
Binet (1857-1911) and Simon created
the first IQ test in 1905.
Binet Intelligence Tests
• Mental age: an individual’s level of mental
development relative to others
• Intelligence quotient (IQ):
\[ \mathrm{IQ} = \frac{\text{Mental Age}}{\text{Chronological Age}} \times 100 \]
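The ratio formula for IQ (mental age divided by chronological age, times 100) can be sketched in a few lines. The function name and the example ages are hypothetical and chosen only for illustration.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    return mental_age / chronological_age * 100


# Hypothetical children (ages in years), invented for illustration.
print(ratio_iq(10, 8))  # mental age above chronological age -> above 100
print(ratio_iq(8, 8))   # mental age equal to chronological age -> exactly 100
print(ratio_iq(6, 8))   # mental age below chronological age -> below 100
```

A child whose mental age equals his or her chronological age always scores 100 under this formula.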
Theories and Tests of
Intelligence
• The Stanford-Binet test
– Stanford-Binet, Fifth Edition (ages 2 to 85)
– The mean IQ score for all age groups is set at 100, with a
standard deviation of 15 (so scores from 85 to 115 fall within
one standard deviation of the mean).
– Given individually
Interpreting Test Scores
• A child’s performance on a standardized test is
meaningless until it can be compared with other
scores.
• A raw score is translated into a standard score
that reports how well the child’s performance
compares with that of other children who took
the same test.
• The bell-shaped normal curve is the graph on
which the distribution of standard scores is
arranged.
The Normal Curve
• Represents the ideal normal distribution of test
scores
• The scores are distributed in a bell-shaped
frequency polygon, with most scores clustered
toward the center of the curve
– (see Figure 4-5 on p. 87 of the text)
• Standard deviations are used to calculate how
an individual scored, compared with the scores
of the norming group
Normal Distribution
© 2006 The McGraw-Hill Companies, Inc. All rights reserved. Santrock, Educational Psychology, Second Edition, Classroom Update
Bell Curve
Individual Intelligence Tests
The Wechsler Scales
Yield an overall IQ and also verbal and performance IQs.
• (WPPSI-III) Wechsler Preschool and Primary
Scale of Intelligence, Third Edition. Ages 2 years, 6 months to 7
years, 3 months
• (WISC-IV) Wechsler Intelligence Scale for
Children, Fourth Edition. Ages 6 to 16 years, 11 months
• (WAIS-IV) Wechsler Adult Intelligence Scale, Fourth Edition.
Ages 16 to 90 years, 11 months
WISC-IV
• Word Reasoning—measures reasoning with verbal material;
child identifies underlying concept given successive clues.
• Matrix Reasoning—measures fluid reasoning (a highly reliable
subtest on WAIS®–III and WPPSI™–III); child is presented
with a partially filled grid and asked to select the item that
properly completes the matrix.
• Picture Concepts—measures fluid reasoning, perceptual
organization, and categorization (requires categorical reasoning
without a verbal response); from each of two or three rows of
objects, child selects objects that go together based on an
underlying concept.
• Letter-Number Sequencing—measures working memory
(adapted from WAIS–III); child is presented a mixed series of
numbers and letters and repeats them, numbers first (in
numerical order), then letters (in alphabetical order).
• Cancellation—measures processing speed using random and
structured animal target forms (foils are common non-animal
objects).
Theories and Tests of
Intelligence
• Raven’s Progressive Matrices
– Psychologists created “culture-reduced”
tests that do not rely on language. Raven’s test measures abstract
reasoning ability (non-verbal
intelligence or performance IQ).
Descriptive statistics are the mathematical procedures
that are used to describe and summarize data.
Counting the Data: Frequency
Look at the set of data that follows on the next slide.
A tally mark was made each time a score
occurred.
Which number most likely represents the
average score?
Which number is the most frequently
occurring score?
Frequency Distribution
Score   Tally        Frequency
100     1            1
99      1            1
98      11           2
94      11           2
90      1111         5
89      1111 11      7
88      1111 1111    10
82      1111 1       6
75      11           2
74      1            1
68      1            1
60      1            1
Average score? 88
Most frequent score? 88
This frequency count represents data that
closely represent a normal distribution.
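As a check on the frequency distribution, the table can be expanded back into individual scores and summarized. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative. Worked through by hand, the arithmetic mean comes to about 86.67, slightly below the 88 that the slide gives as the "average"; 88 is the median and the mode.

```python
from statistics import median, multimode

# Score -> frequency, copied from the frequency distribution table.
frequency = {100: 1, 99: 1, 98: 2, 94: 2, 90: 5, 89: 7,
             88: 10, 82: 6, 75: 2, 74: 1, 68: 1, 60: 1}

# Expand the table into one entry per test taker.
scores = [score for score, count in frequency.items() for _ in range(count)]

n = len(scores)
mean = sum(scores) / n

print(n)                  # number of scores in the table
print(round(mean, 2))     # arithmetic mean
print(median(scores))     # middle score
print(multimode(scores))  # most frequently occurring score(s)
```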
Descriptive Statistics
Frequency Polygons
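Data: 100, 99, 98, 98, 94, 94, 90, 90, 90, 90, 90, 89, 89, 89, 89, 88, 88, 75, 75, 74, 68, 60
[Figure: frequency polygon of the data. Horizontal axis: Scores (60, 68, 74, 75, 88, 89, 90, 94, 98, 99, 100); vertical axis ticks: 1 to 5]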
Measures of Central
Tendency
Measures of central tendency provide information
about the average or typical score in a data set
Mean: The numerical average of a group of scores
Median: The score that falls exactly in the middle of
a data set
Mode: The score that occurs most often
Central tendency = representative or typical value in
a distribution
Mean
• Same thing as an average
• Computed by
– Summing all the scores (sigma, Σ)
– Dividing by the number of scores (N)
\[ M = \frac{\sum X}{N} \]
Mean: To find the mean, simply add the
scores and divide by the number of scores
in the set of data.
Add the scores: 98 + 94 + 88 + 75 = 355
Divide by the number of scores: 355/4 = 88.75
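The same calculation of the mean, written as a short sketch with the four scores from the worked example (variable names are illustrative):

```python
scores = [98, 94, 88, 75]

total = sum(scores)         # sum of all scores
mean = total / len(scores)  # divide by the number of scores (N)

print(total)
print(mean)
```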
Measures of Central Tendency
• Steps to computing the median
1. Line up scores from highest to lowest
2. Count up to middle score
• If there is 1 middle score, that’s the
median
• If there are 2 middle scores, median
is their average
Median: The middlemost point in a set of data
Data Set 1: 100, 99, 99, 98, 97, 96, 90, 88, 85, 80, 79
Median: 96
Data Set 2: 100, 99, 98, 97, 86, 82, 78, 72, 70, 68
The median is 84 for this set (the average of the two middle
scores, 86 and 82). 84 represents the middlemost point in
this set of data.
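The steps for computing the median can be sketched as a small function and applied to Data Set 1 and Data Set 2. The function name is hypothetical; the logic follows the steps listed above (sort, find the middle, average the two middle scores when the count is even).

```python
def median_of(scores):
    """Median following the slide's steps: sort, then take the middle score(s)."""
    if not scores:
        raise ValueError("scores must not be empty")
    ordered = sorted(scores, reverse=True)  # highest to lowest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # one middle score
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of two middle scores


data_set_1 = [100, 99, 99, 98, 97, 96, 90, 88, 85, 80, 79]
data_set_2 = [100, 99, 98, 97, 86, 82, 78, 72, 70, 68]

print(median_of(data_set_1))
print(median_of(data_set_2))
```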
Mode: The most frequently occurring score in
a set of data.
Find the modes for the following sets of data:
Data Set 3: 99, 89, 89, 89, 89, 75
Mode: 89
Data Set 4: 99, 88, 88, 87, 87, 72, 70
88 and 87 are both modes for this set of data. This is
called a bimodal distribution.
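A minimal sketch of finding the mode, or modes, by counting how often each score occurs. The helper name is hypothetical; it returns every score tied for the highest count, so a bimodal set yields two values.

```python
from collections import Counter


def modes(scores):
    """Return all scores that share the highest frequency, highest score first."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted((s for s, c in counts.items() if c == top), reverse=True)


data_set_3 = [99, 89, 89, 89, 89, 75]
data_set_4 = [99, 88, 88, 87, 87, 72, 70]

print(modes(data_set_3))  # a single mode
print(modes(data_set_4))  # two modes: a bimodal distribution
```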
Measures of Variability (Dispersion)
Range: Distance between the highest
and lowest scores in a set of data.
100 - 65 = 35
35 is the range in this set of scores.
Variance: Describes how much a set of
scores varies from the mean (the average
squared difference from the mean).
1. Subtract the mean from
each score.
When the mean for a set of data is
87, subtract 87 from each score.
100 - 87 = 13
98 - 87 = 11
95 - 87 = 8
91 - 87 = 4
85 - 87 = -2
80 - 87 = -7
60 - 87 = -27
2. Next, square each difference (multiply each difference by itself).
13 x 13 = 169
11 x 11 = 121
8 x 8 = 64
4 x 4 = 16
-2 x -2 = 4
-7 x -7 = 49
-27 x -27 = 729
3. Sum these squared differences. The total, 1,152, is the
sum of squares.
4. Divide the sum of squares by the
number of scores.
1,152 divided by 7 = 164.5714
This number
represents the variance for this set of data.
Standard Deviation: Represents the typical
amount that a score is expected to vary
from the mean in a set of data.
5. To find the standard deviation, find the
square root of the variance. For this
set of data, find the square root of
164.5714.
The standard deviation for this set of
data is approximately 12.83, or about 13.
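The five steps for the variance and standard deviation can be retraced with the seven scores from the worked example. This sketch divides by the number of scores, as the slides do (the population formula); a sample formula dividing by N - 1 would give a different result. Variable names are illustrative.

```python
import math

scores = [100, 98, 95, 91, 85, 80, 60]

mean = sum(scores) / len(scores)

# Step 1: subtract the mean from each score.
deviations = [x - mean for x in scores]

# Steps 2 and 3: square each difference and sum them (sum of squares).
sum_of_squares = sum(d * d for d in deviations)

# Step 4: divide by the number of scores to get the variance.
variance = sum_of_squares / len(scores)

# Step 5: the standard deviation is the square root of the variance.
standard_deviation = math.sqrt(variance)

print(mean)
print(sum_of_squares)
print(round(variance, 4))
print(round(standard_deviation, 2))
```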
Ceiling and Floor Effects
• Ceiling effects
– Occur when scores can go
no higher than an upper
limit and “pile up” at the
top
– e.g., scores on an easy
exam
– Causes negative skew
• Floor effects
– Occur when scores can go
no lower than a lower limit
and pile up at the bottom
– e.g., household income
– Causes positive skew
Skewed Frequency Distributions
• Normal distribution (a)
• Skewed right (b)
– Fewer scores right of the peak
– Positively skewed
– Can be caused by a floor effect
• Skewed left (c)
– Fewer scores left of the peak
– Negatively skewed
– Can be caused by a ceiling effect
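To see the direction of skew numerically, one option is the moment coefficient of skewness (third central moment divided by the 1.5 power of the second). That statistic is not defined in the slides and is brought in here only as an illustration; both data sets below are invented.

```python
def skewness(scores):
    """Moment coefficient of skewness: negative = skewed left, positive = skewed right."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in scores) / n  # third central moment
    if m2 == 0:
        raise ValueError("all scores are identical; skewness is undefined")
    return m3 / m2 ** 1.5


# Hypothetical easy exam: scores pile up at the 100-point ceiling.
easy_exam = [100, 100, 100, 100, 98, 98, 95, 90, 80, 60]

# Hypothetical incomes (in thousands): values pile up at the floor of 0.
incomes = [0, 0, 0, 0, 2, 2, 5, 10, 20, 40]

print(skewness(easy_exam) < 0)  # ceiling effect: negative skew
print(skewness(incomes) > 0)    # floor effect: positive skew
```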
Understanding Descriptive Statistics
The Normal Distribution: A “bell-shaped” curve in which
most of the scores are clustered around the mean; the farther
from the mean, the less frequently the score occurs.
Commonly Reported Test Scores
Based on the Normal Curve
Z Scores
• When values in a distribution are
converted to Z scores, the distribution will
have
– Mean of 0
– Standard deviation of 1
• Useful
– Allows variables to be compared to one
another even when they are measured on
different scales, have very different
distributions, etc.
– Provides a generalized standard of comparison
Z Scores
• To compute a Z score, subtract the mean from a raw score
and divide by the SD
\[ Z = \frac{X - M}{SD} \]
• To convert a Z score back to a raw score, multiply the
Z score by the SD and then add the mean
\[ X = (Z)(SD) + M \]
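Both Z-score conversions can be sketched as a pair of functions. The function names are hypothetical. The first example uses the IQ scale described earlier (mean 100, SD 15); the second scale (mean 50, SD 10) is an invented test used to show how scores from different scales become comparable.

```python
def to_z(raw: float, mean: float, sd: float) -> float:
    """Z = (X - M) / SD"""
    return (raw - mean) / sd


def to_raw(z: float, mean: float, sd: float) -> float:
    """X = (Z)(SD) + M"""
    return z * sd + mean


# IQ scale: mean 100, SD 15.
print(to_z(115, 100, 15))  # one SD above the mean
print(to_raw(-2, 100, 15)) # raw score two SDs below the mean

# Hypothetical second test: mean 50, SD 10.
# A raw 65 there sits farther above its mean than an IQ of 115 does.
print(to_z(65, 50, 10) > to_z(115, 100, 15))
```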
The Normal Curve
• Derived scores are used to specify where the individual
score falls on the curve and how far above or below the
mean the score falls
• Raw scores are transformed into percentiles, stanines, or
other standard scores
• All scoring scales are drawn parallel to the baseline of
the normal curve and use the deviation from the mean
as the reference to compare an individual score with the
mean score of a group
Percentile Ranks
• Percentiles represent the point on the normal curve
below which a percentage of test scores is distributed.
• A student’s percentile rank on a test indicates the
percentage of students who scored lower in the
comparison group.
For example, if a student is ranked in the 55th percentile,
the student scored higher than 55% of the comparison
group who took the test.
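Following the definition of percentile rank as the percentage of the comparison group that scored lower, a minimal sketch looks like this. The function name and the ten-score comparison group are hypothetical; published tests look up percentile ranks in norm tables rather than computing them from a small group.

```python
def percentile_rank(score, comparison_scores):
    """Percentage of the comparison group that scored lower than `score`."""
    if not comparison_scores:
        raise ValueError("comparison_scores must not be empty")
    below = sum(1 for s in comparison_scores if s < score)
    return 100 * below / len(comparison_scores)


# Hypothetical comparison group of ten students.
group = [100, 99, 98, 97, 86, 82, 78, 72, 70, 68]

print(percentile_rank(86, group))   # five of the ten scores are lower
print(percentile_rank(100, group))  # nine of the ten scores are lower
```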
Stanines
Parents find stanine results easiest to understand
because their child’s standardized test scores
are reported as:
9 Very superior
8 Superior
7 Considerably above average
6 Slightly above average
5 Average
4 Slightly below average
3 Considerably below average
2 Poor
1 Very poor
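The slides list the nine stanine labels but do not say how a stanine is computed. One common convention, assumed here rather than taken from the slides, treats stanines as a scale with mean 5 and standard deviation 2, so a Z score maps to the stanine band containing 2Z + 5, limited to the range 1 to 9. The function name is hypothetical.

```python
import math

# Labels as listed on the slide.
STANINE_LABELS = {
    9: "Very superior",
    8: "Superior",
    7: "Considerably above average",
    6: "Slightly above average",
    5: "Average",
    4: "Slightly below average",
    3: "Considerably below average",
    2: "Poor",
    1: "Very poor",
}


def stanine_from_z(z: float) -> int:
    """Assumed convention: stanine scale with mean 5 and SD 2, clipped to 1..9."""
    band = math.floor(2 * z + 5.5)  # each band is half a standard deviation wide
    return max(1, min(9, band))


for z in (-2.5, 0.0, 1.0, 3.0):
    s = stanine_from_z(z)
    print(z, s, STANINE_LABELS[s])
```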
Z Scores and T Scores
• Called standard scores
• Report how many standard deviations a
transformed raw score is located above or
below the mean
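The slides do not give the T-score scale. By the usual convention, assumed here, T scores have a mean of 50 and a standard deviation of 10, so a T score is 50 plus 10 times the Z score. The function name is hypothetical.

```python
def z_to_t(z: float) -> float:
    """T score under the assumed convention: mean 50, SD 10."""
    return 50 + 10 * z


print(z_to_t(0))    # at the mean
print(z_to_t(1.5))  # one and a half SDs above the mean
print(z_to_t(-1))   # one SD below the mean
```

Unlike Z scores, T scores avoid negative values for most test takers, which can make them easier to report.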
Grade Equivalent Scores
Test publishers recommend that grade equivalents not
be used to report to parents because they may not
understand that the score does not mean the child
should be placed in a higher or lower grade.
• Grade level results are compared with test results from
grades above and below the grade, indicating whether
the child performed above or below average.
• The grade equivalent score does not indicate grade
level placement in school.
Reporting Standardized Test
Results
Both norm- and criterion-referenced information can be
organized in a useful form.
• Scores can be reported for an individual, a class, a
grade, a school, and a district.
• Strengths and weaknesses can be analyzed by content
areas, by school, and by grade level.
• Achievement can be compared over several years to
determine long-term improvement or decline.
Reporting Test Results
to Parents
A parent–teacher conference may be used to report
test results.
The teacher should explain:
• both the value and the limitations of the test scores
• why the test was chosen
• how the results will be used - for example, to plan
appropriate learning experiences for their child
Advantages of Standardized Tests
Norm-referenced and criterion-referenced achievement
tests provide valuable information regarding the
effectiveness of curriculum and instruction.
• Teachers can determine curriculum strengths and
weaknesses.
• Individual students’ reports determine who would benefit
from additional instruction and those who are ready to
move to more advanced learning experiences.
Advantages of Standardized Tests
Standardized tests have unique qualities
that are advantageous:
• Uniformity in test administration
• Quantifiable scores
• Norm referencing
• Validity and reliability
Disadvantages of Standardized
Tests
• Standardized tests are not necessarily
the best method of evaluation of young
children.
• A variety of strategies should be used in
assessing children.
Concerns About the Use of
Standardized Tests
• Use of tests with children from a different
culture or whose first language is not
English
• Use of standardized tests to deny children
entrance to school or to retain them in grade
No Child Left Behind Act
Assessment of Students with Disabilities and/or
Limited English Proficiency (LEP)
• NCLB requires that all students be assessed regardless
of their special needs
• Accommodations have been made for students with
disabilities and for those who speak a language other
than English or have limited English
• Limitations of the tests designed for NCLB when used
with these populations have become an issue
Standardized Tests Have Effects
On Curriculum And Instruction
• Standardized tests only sample a few of the
curriculum objectives.
• Pressures for higher test scores result in
limitations on the curriculum that is taught.
• Instruction becomes focused on what will be
tested and limits the balance of the curriculum.
Misapplication of Test Results
Using standardized tests to decide school entry or
placement in early childhood programs is inappropriate
because:
• tests do not differentiate between limited intelligence and
limited opportunities to learn
• decisions on enrollment, retention, and placement in
special classes should never be based on a single test
score
• other sources of information, including systematic
observation and samples of children’s work, should be a
part of the evaluation process