Download ses1part2 - Dr. McLaughlin`s Classes

SES1 part two
Basic Quantitative Concepts
• Measurement and assessment are built on
basic quantitative concepts
• Scale of Measurement
– Nominal
– Ordinal
– Ratio
– equal intervals
Measurement Scales
• There are four levels of measuring variables
and data. Each level provides more
information than the one before
– Nominal
– Ordinal
– Interval
– Ratio
Ratio Scale
• Not used a great deal in education. A ratio
scale has a true zero, so data can be expressed
as ratios or percentages.
• Example: His weight was half as much as Tom’s.
• Not the same as percentile scores.
Nominal
• A nominal scale defines several mutually exclusive
categories (remember, this is quantitative; there are
no gray areas)
– Gender, race, type of school.
– (The term nominal also refers to the nature of the data
that are collected. You can simply count the number of
participants in each category, such as fourth, fifth, sixth)
Measurement Scales
• In education, ordinal and equal-interval scales
are by far the most used scales
• Nominal and ratio scales are rarely used in
education
Ordinal Scale
• Categories are rank ordered, with no indication
of how far apart they are: top, bottom,
middle; first, second, third.
• It ranks things from better to worse or worse to
better
– Examples- good, better; or novice, intermediate,
expert
– High risk, low risk, emerging reader
Ordinal Scale
• Used in education a great deal- serious, more
serious, most serious
• In an ordinal scale, the difference between the
levels is unknown- we cannot determine how
much better an intermediate is than a novice
• Because the distance between the levels is
unknown, and presumed unequal, the levels cannot be
averaged
• For example, an emerging and an expert cannot
be averaged to be a proficient
Equal–interval scales
• Like an ordinal scale, levels are ordered from
best to worst or vice versa
• The difference is that you know the
magnitude of the difference between the
levels
• The intervals between the levels are also equal;
therefore you can add, subtract, and average
them
– Ex. Length or weight
Frequency Distribution
• If you have a set of scores for a class, a
frequency distribution organizes these scores
into a format that makes it easier to see how
subjects performed. The two most commonly
used formats are the table and the graph
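As a rough illustration of the table format, the sketch below tallies a set of class scores with Python's `collections.Counter`. The scores and the function name `frequency_table` are made up for this example and do not come from the slides.

```python
from collections import Counter

def frequency_table(scores):
    """Return (score, count) pairs sorted from highest score to lowest."""
    counts = Counter(scores)
    return sorted(counts.items(), reverse=True)

# Hypothetical class set of quiz scores (not from the slides)
scores = [8, 9, 7, 9, 10, 8, 9, 6, 8, 9]

for score, count in frequency_table(scores):
    # Each "#" is one student, giving a crude text version of the graph format
    print(f"{score:>3} | {count} | {'#' * count}")
```

The printed rows serve as the table, and the run of `#` marks beside each row is a simple stand-in for the graph.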
Distribution
• Sets of scores (a class set of test scores, or a
school set) can be described in terms of four
characteristics
– Mean
– Variance
– Skew
– Kurtosis
Mean
• Mean is just a term for
the average
95 + 88 + 99 + 91 + 69 = 442
442 ÷ 5 = 88.4
Variance
• Variance describes how far the scores in a set
are spread out from the mean
• It is calculated with a rather complicated
formula, which I will show you next
• It is a key concept in understanding most
statistics
Variance
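The formula shown on this slide did not survive in the transcript. As a stand-in, here is a minimal sketch of the usual calculation: square each score's distance from the mean, then average the squares. It reuses the five scores from the Mean slide. Whether the lecture divided by n (population variance) or by n - 1 (sample variance) is not known, so both are shown.

```python
from statistics import mean, pvariance

scores = [95, 88, 99, 91, 69]      # the five scores from the Mean slide
m = mean(scores)

# Step by step: square each score's distance from the mean
squared_deviations = [(x - m) ** 2 for x in scores]

# Dividing the total by n gives the population variance
population_variance = sum(squared_deviations) / len(scores)

# Dividing by n - 1 instead gives the sample variance
sample_variance = sum(squared_deviations) / (len(scores) - 1)

print(round(population_variance, 2))
print(round(sample_variance, 2))
print(round(pvariance(scores), 2))   # library cross-check of the population version
```

The single low score (69) contributes by far the largest squared deviation, which is why variance is sensitive to extreme scores.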
Skew
• Refers to the symmetry of a distribution of
scores.
– In a symmetrical set, the scores above the mean
mirror the scores below the mean; they are not
skewed
– If a set of scores is mostly good grades, because
the students learned the material well or the test was
easy, the chart for the distribution of scores would be
skewed
– The opposite can also happen when the test scores
are mostly poor
Bell Shaped Curve
• Symmetrical
• Used to determine many standards
– A, B,C
– IQ
– SAT
Negative Skewed Curve
• Shows most students scored well or high on
the measure
Positive Skewed Curve
• Shows most students scored poorly or low on
the measure
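To connect the two skewed curves to numbers, the sketch below computes a simple moment-based skewness. Both score sets are hypothetical, and this formula is one common definition of skewness, not necessarily the one used in the course.

```python
from statistics import mean

def skewness(scores):
    """Moment-based skewness: negative when the tail is toward low scores,
    positive when the tail is toward high scores, zero when symmetrical."""
    m = mean(scores)
    n = len(scores)
    m2 = sum((x - m) ** 2 for x in scores) / n   # second central moment
    m3 = sum((x - m) ** 3 for x in scores) / n   # third central moment
    return m3 / m2 ** 1.5

easy_test = [35, 88, 89, 94, 95, 96, 97, 98, 99]   # hypothetical: most scores high
hard_test = [5, 8, 10, 12, 14, 15, 18, 20, 85]     # hypothetical: most scores low

print(skewness(easy_test) < 0)   # most students scored high: negative skew
print(skewness(hard_test) > 0)   # most students scored low: positive skew
```

The sign follows the direction of the long tail, not the side where most scores pile up, which is why a class of mostly high scores is called negatively skewed.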
Kurtosis
• Describes the peakedness of a curve
• Platykurtic- flat curve
• Leptokurtic- sharply peaked (fast rising) curve
Measures of Central Tendency
• Mean- Average
• Median- Middle
• Mode- Most frequent
Average
• Is a general description of a group as a whole
• Already demonstrated
Mode
• THE MODE- MOST FREQUENT SCORE
• Some distributions have two modes, called
bimodal; some have more than two modes
Median
• Important concept in understanding certain
scores
• The median is the point where 50% of test
takers are above the point and 50% of test
takers are below the point
Mean
• The mean is the same thing as the average
• It is symbolized by
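A short sketch of the three measures of central tendency using Python's standard `statistics` module. The first list reuses the five scores from the earlier Mean slide; the `quiz` list is hypothetical and was chosen to have two modes.

```python
from statistics import mean, median, multimode

scores = [95, 88, 99, 91, 69]    # the five scores from the earlier Mean slide

print(mean(scores))              # sum of the scores divided by how many there are
print(median(scores))            # middle score once the scores are sorted

# Hypothetical set with two most-frequent scores (a bimodal distribution)
quiz = [7, 8, 8, 8, 9, 10, 10, 10]
print(multimode(quiz))           # every score tied for most frequent
```

Note that the low score of 69 pulls the mean below the median, a small example of how the mean reacts to extreme scores while the median does not.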
Measures of dispersion
• There are three measures of dispersion
– Range
– Variance
– Standard deviation
Range
• Is the distance between extremes, between
the highest and lowest score
• A crude measure of dispersion because it uses
only two points of data
• What if the scores were
35, 88, 89, 94, 95, 98, 99, 97, 96?
• The range is 64 (99 - 35), but it is deceiving; most of
the scores are really clustered in the 90s
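A minimal sketch of why the range is a crude measure, using the scores from this slide. The helper name `score_range` is just for illustration.

```python
scores = [35, 88, 89, 94, 95, 98, 99, 97, 96]   # scores from the slide

def score_range(values):
    """Distance between the highest and lowest score."""
    return max(values) - min(values)

full_range = score_range(scores)

# Drop the single lowest score to see how much it stretches the range
without_lowest = sorted(scores)[1:]
trimmed_range = score_range(without_lowest)

print(full_range, trimmed_range)
```

One unusual score accounts for most of the range, which is the sense in which the range "uses only two points of data".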
Variance
• Shows how far numbers are spread from the
mean
• Some data have high variance, some have low
• Already saw video on this
Standard Deviation
• It is the positive square root of the variance
• It is often used as a unit of measurement
similar to an inch or a foot
• When scores are equal interval, standard
deviation
Standard Deviation
• On a classic (normal) curve, about 34% of scores fall in
the first SD on each side of the mean
– About 14% in the second SD
– About 2% in the third SD
– About .13% in the fourth SD
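These percentages can be checked against the normal curve with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is only a sketch; the function name `band_percent` is made up for the example.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal curve: mean 0, standard deviation 1

def band_percent(lower_sd, upper_sd):
    """Percent of scores between two distances (in SDs) above the mean."""
    return 100 * (z.cdf(upper_sd) - z.cdf(lower_sd))

for lower in range(4):
    print(f"{lower} to {lower + 1} SD: {band_percent(lower, lower + 1):.2f}%")
```

Because the curve is symmetrical, the same percentages apply to the bands below the mean.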
Correlation
• Measures how strongly two variables are related. When
two variables are correlated, the value of one variable
can be used to predict the other
• Positive correlation- when one variable increases, the
other variable increases
• Negative correlation- when one variable increases,
the other decreases
Correlation
• Correlation coefficient- a number from -1 to +1
that measures the direction and strength of the
relationship. Coefficients of .10 to .30 are small
correlations, .40 to .60 are moderate correlations,
and .80 to 1.0 are high correlations
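A minimal sketch of how a correlation coefficient is computed, assuming the Pearson coefficient is the one intended (the slides do not name it). All three data lists are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: hours studied, test score, and number of absences
hours = [1, 2, 3, 4, 5]
score = [62, 70, 74, 85, 91]
absences = [9, 7, 6, 3, 1]

print(round(pearson_r(hours, score), 2))      # positive: both rise together
print(round(pearson_r(hours, absences), 2))   # negative: one rises, the other falls
```

The sign of the result gives the direction of the relationship and its size gives the strength.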
Objective vs. Subjective scoring
• Objective- scoring based on observable qualities,
not on emotion or personal impression
• Subjective- scoring based on qualities that rely
on the scorer's personal impression
Activity
• List the strengths and weaknesses of
subjective and objective scoring
Summarizing Student Performances
• As you assess students you have a number of
ways to determine if they understand the
knowledge you are interested in assessing.
• Some are very simple and provide limited
amounts of information while others are more
detailed and relay a variety of information
Activity
• Meet in your groups
• As I describe all of the different methods for
summarizing student data, think of examples
of when each would be used or a test that uses it.
• Save these and we will share them in the end
Dichotomous Scoring
• Used when you have one right or wrong
answer
• It either is right, or it is wrong
Partial Answer
• Partial answer scoring is used when there are multiple
steps and you are looking for more than a
right or wrong answer
• Some of the criteria could include steps or a
procedure in the process in addition to the
right answer
Summary Index
• Used when you have multiple items
and you are concerned with performance on
all of the items as a whole
• The sum of the correct items is the most basic
summary index
Other summary indexes
• Often a raw score alone does not provide enough
information
• Raw scores are then converted into more
meaningful scores
• There are five basic types
– Percent correct
– Percent accuracy
– Rate of correct responses
– Fluency
– Retention
Percent correct
• Divide the number correct by the number
possible and multiply by 100
– 40 correct, 50 possible: 40 ÷ 50 = .80; .80 × 100 = 80%
– Usually used when students have ample time to
complete the test (a power test)
– Very frequently used
Accuracy
• This is the number of correct responses divided
by the number of attempted responses, multiplied
by 100
– 150 ÷ 175 = .86; .86 × 100 = 86% accuracy
• Yields a higher score than percent correct; reveals whether
a child has a skill but lacks automaticity or processing
speed
• Provides important information for intervention.
Accuracy
• Often these percentages are given labels
– Above 90%- mastery
– Below 90%- non-mastery
– The labels are somewhat arbitrary
• The other form is labeling them as an
instructional level
– 95% or above- independent
– Between 85% and 94%- instructional
– Below 85%- frustration
Accuracy
• Independent- the point where a
student can perform without assistance
• Instructional- the point that has enough
challenge that a student is likely to be
successful, but is not guaranteed success
• Frustration- the level that is too difficult for
a student
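The percent correct, accuracy, and instructional-level rules above can be sketched as three small functions. The function names are hypothetical. The cut points follow the slide (95 and 85), with the assumption that anything from 85 up to, but not including, 95 counts as instructional.

```python
def percent_correct(correct, possible):
    """Correct responses as a percentage of all items on the test."""
    return 100 * correct / possible

def percent_accuracy(correct, attempted):
    """Correct responses as a percentage of the items actually attempted."""
    return 100 * correct / attempted

def instructional_level(accuracy):
    """Label an accuracy percentage using the cut points from the slide."""
    if accuracy >= 95:
        return "independent"
    if accuracy >= 85:      # assumption: 85 up to (not including) 95
        return "instructional"
    return "frustration"

print(percent_correct(40, 50))          # slide example: 40 correct of 50 possible
acc = percent_accuracy(150, 175)        # slide example: 150 correct of 175 attempted
print(round(acc), instructional_level(acc))
```

The only difference between the two percentages is the denominator: all items on the test versus only the items the student tried.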
Fluency
• Is the number of correct responses per
minute
• Measures what a child can do automatically,
what is at their fingertips
Retention
• Refers to the percentage of learned
information that is recalled
• Sometimes called maintenance or recall of
what has been learned
• Calculated the same way as percent correct
– The number recalled (15) divided by the number
originally learned (20)
– 15 ÷ 20 = .75; .75 × 100 = 75% retained
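Fluency and retention, as described above, are both simple ratios. In this sketch the timing numbers for fluency are hypothetical, and the retention numbers come from the slide.

```python
def fluency(correct, seconds):
    """Correct responses per minute."""
    return correct / (seconds / 60)

def retention(recalled, learned):
    """Percentage of originally learned items that are recalled later."""
    return 100 * recalled / learned

# Hypothetical timing: 45 correct responses in a 90-second probe
print(fluency(45, 90))

# Slide example: 15 items recalled out of 20 originally learned
print(retention(15, 20))
```

Fluency adds time to the picture, so two students with the same percent correct can still differ widely in correct responses per minute.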
Feedback
• Dichotomous Scoring
• Partial Answer
• Summary Index
– Percent correct
– Percent accuracy
– Rate of correct responses
– Fluency
– Retention
Interpretation of Test Performance
• There are three common ways to interpret
student performance
– Criterion-Referenced Interpretations
– Achievement Standards-Referenced Interpretations
– Norm-Referenced Interpretations
Criterion-Referenced Interpretations
• When you are interested in a student’s
knowledge of a single fact, you compare it
against an objective and absolute standard
(criterion) of performance
• To be considered criterion-referenced, there
must be a clear, objective response to each
portion of the question if partial credit is to be
given
Achievement Standards-Referenced
Interpretations
• In large scale assessments, school districts
measure the degree to which they are meeting
state and national achievement levels.
• To do so, the indices consist of four components
– Level of performance
– Objective criteria
– Examples
– Cut scores
Achievement Standards-Referenced
Interpretations
• Level of performance
– There is a range of levels that are
assigned to bands or ranges of performance: below
basic, basic, proficient, advanced
• Objective criteria
– Each level of performance is described by a precise,
objective description that can be quantified
• Examples
– Examples of student work at each level
• Cut scores
– These scores delineate student
performance at each level
Norm-Referenced Interpretations
• Sometimes testers are interested in how their
students compare to the performance of
other students, usually with similar
demographic characteristics
– Grade, age, gender, and so forth
• In order to make these comparisons, the
students' scores are transformed into derived
scores
Derived Scores
• Two types
– Developmental scores
– Relative standing scores
Derived Scores
• Developmental scores- There are two types of
developmental scores
– Developmental equivalents
– Developmental quotients
Derived Scores
• Developmental scores
– Developmental equivalents may be based on age
or grade
• Developmental equivalents are based on the average
performance of individuals in a given grade or age
• If the average fifth grader got 25 correct on the test,
and Jim scored 25, he would have a grade equivalent of 5th
grade
• Grade equivalents are expressed in decimals- 5.3 means 5th
grade, third month
• Age equivalents are expressed in years and months with a
hyphen- 7-11 means seven years, 11 months
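The two notations above can be unpacked with a couple of small helpers. This is only a sketch; the function names are made up, and the equivalents are passed as text so that a value like 5.3 is not treated as a decimal fraction of a grade.

```python
def age_equivalent_to_months(age_equivalent):
    """Convert a 'years-months' age equivalent such as '7-11' to total months."""
    years, months = age_equivalent.split("-")
    return int(years) * 12 + int(months)

def describe_grade_equivalent(grade_equivalent):
    """Split a grade equivalent such as '5.3' into grade and month of the school year."""
    grade, month = grade_equivalent.split(".")
    return f"grade {int(grade)}, month {int(month)}"

print(age_equivalent_to_months("7-11"))
print(describe_grade_equivalent("5.3"))
```

Treating the digit after the decimal point as a month, not as three tenths of a year, is the point of the notation.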
Interpreting scores
• Often scores are misinterpreted
• They should be interpreted as the average
performance of that age level
• There are five problems with these scores
– Systematic misrepresentation
– Need for interpolation and extrapolation
– Promotion of typological thinking
– Implication of a false standard of performance
– Tendency for scales to be ordinal, not interval
Interpreting scores
• Systematic misrepresentation
– Because a child scores a 12-0 age equivalent, it
does not mean the child performed as a 12-0 child. They
got the same score, but they may have attacked the
problems differently
– A younger child and an older child could get the
same score, but could have gone about getting
the scores very differently
Interpreting Scores
• Need for interpolation and extrapolation
– The scores that students are compared to are
estimates, there may not be a sample for each
grade, age and month
– Levels were determined by estimates based on
the students who did take the test
Interpreting Scores
• Promotion of typological thinking- the logic behind
these equivalents is that average children
perform the same
• What is an average child? One who lives in a family with 1.2
siblings, whose family has 2.3 cars and .8 dogs
• In other words, there is no average family
Interpreting Scores
Implications of a false standard of performance
• Often the performance level stated is not
accurate
• Think of how you get an average
• Look at the set of numbers below
95 + 90 + 85 + 80 + 75 + 70 + 65 = 560
560 ÷ 7 = 80
Interpreting Scores
• The average in this case is 80
• 80 would be the score the
equivalent would be based
on (except with more
numbers)
• Nearly half of the scores are higher;
therefore, is it really
representative as an age or
grade equivalent?
95 + 90 + 85 + 80 + 75 + 70 + 65 = 560
560 ÷ 7 = 80
Interpreting Scores
• Tendency for scales to be ordinal, not equal interval
• Because the scales are ordinal, and not equal
distances apart, you cannot add, subtract, or average
them
Interpreting Scores
• When interpreting a score, you must always relate it
to chronological age
• You need to compare mental age (MA) and chronological
age (CA) simultaneously
• IQ = (MA ÷ CA) × 100
• A developmental age of 120 months is great for a
fourth grader, but poor for an eighth grader
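A small sketch of the comparison described above. It uses the conventional ratio quotient, which multiplies MA ÷ CA by 100 so that a child whose developmental age matches their chronological age scores 100. The chronological ages for the two grades are assumptions for the example.

```python
def ratio_quotient(developmental_age_months, chronological_age_months):
    """Developmental (ratio) quotient: developmental age over chronological age, times 100."""
    return 100 * developmental_age_months / chronological_age_months

developmental_age = 120   # months, from the slide

# Assumed chronological ages: about 9 years for a fourth grader,
# about 13 years for an eighth grader
print(round(ratio_quotient(developmental_age, 9 * 12)))    # above 100
print(round(ratio_quotient(developmental_age, 13 * 12)))   # below 100
```

The same developmental age gives a quotient above 100 for the younger child and below 100 for the older one, which is why it must always be read against chronological age.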
Stopped 2-14-14
Interpreting Scores
• Percentile scores- a percentile rank tells you the
percentage of test takers who scored at or below a given score
• For example, a raw score of 58 might convert to a percentile rank
of 68. This means that this person scored the
same as or better than 68% of the test takers
• Scores can range from .1 to 99.9
• The 50th percentile is the median
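A minimal sketch of a percentile rank, using the "same or better than" reading from the slide (percent of test takers at or below the score). Published tests may use slightly different formulas, and the norm group here is hypothetical.

```python
def percentile_rank(score, all_scores):
    """Percent of test takers who scored at or below the given score."""
    at_or_below = sum(1 for s in all_scores if s <= score)
    return 100 * at_or_below / len(all_scores)

# Hypothetical norm group of ten raw scores
norm_group = [40, 45, 50, 52, 55, 58, 60, 65, 70, 80]

print(percentile_rank(58, norm_group))
```

The same raw score would earn a different percentile rank in a different norm group, since the rank depends entirely on who else took the test.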
Interpreting Scores
• Percentile scores- are sometimes presented in
bands- the two most common are deciles and
quartiles (PSSAs use quintiles)
– Deciles are bands 10 percentile ranks wide
– The first decile is .1 to 9.9
– The second is 10 to 19.9
– The tenth is from 90 to 99.9
Interpreting Scores
• Percentile scores- are sometimes presented in
bands- the two most common are deciles and
quartiles (PSSAs use quintiles)
• Quartiles are percentile bands 25 percentiles wide
– First quartile is .1 to 24.9
– Second is 25 to 49.9
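The band boundaries listed above translate directly into integer division. A sketch, with function names chosen only for illustration:

```python
def decile(percentile):
    """Decile band (1-10) for a percentile rank between 0.1 and 99.9."""
    return int(percentile // 10) + 1

def quartile(percentile):
    """Quartile band (1-4) for a percentile rank between 0.1 and 99.9."""
    return int(percentile // 25) + 1

for p in (9.9, 10, 24.9, 25, 68, 99.9):
    print(p, decile(p), quartile(p))
```

A percentile of exactly 10 or 25 starts the next band, matching the ".1 to 9.9" and ".1 to 24.9" boundaries on the slides.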
Interpreting Scores
• Percentile scores- are great for comparing a
child to others. They allow you to see how a child is doing
compared to other students in math
• Done all the time with heights and weights
• You cannot compare between tests; you
cannot say that Tom is 10 percentile points
better in reading than in math
• You also need to interpret them in context
• See next slide
Activity
• If a child had a 40th percentile locally and an 85th
percentile nationally, is this the sign of a high or
low achieving school district?
• You are in a meeting. Describe to a parent
what this means
Activity
• It means that 60% of the students in the
district scored at or above the 85th percentile
nationally.
Interpreting Scores
• Standard Scores
– There are many different types of standard scores
– T-scores and z-scores are the most popular
Interpreting Scores
• Standard Scores- are scores with a predetermined
mean and standard deviation
• The most common is a z-score
• The mean is 0
• The standard deviation is 1
• Positive scores are above the mean and negative
scores are below the mean
• The larger the number, the farther the score is
from the mean
Interpreting Scores
z-score
Interpreting Scores
t-score
• T scores have a mean of 50 with a standard
deviation of 10
• 60 is one standard deviation above the mean
Interpreting Scores
Standard scores
• IQ scores are standard scores with a mean of
100 and a standard deviation of 15
Interpreting Scores
Standard scores
• As I mentioned, standard scores have a
predetermined mean and standard deviation
• Figure out: what are the mean and standard
deviation for SAT math or reading?
Interpreting Scores
Standard scores
• Standard scores are used frequently, and the
mean can be set arbitrarily, as can
the standard deviation
• SAT: mean = 500, +1 SD = 600, +2 SD = 700,
+3 SD = 800, -1 SD = 400, -2 SD = 300, -3 SD = 200
• This is why the lowest you can get on any SAT test
is 200
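All of the standard score scales mentioned so far are the same z-score expressed with a different mean and standard deviation. The sketch below shows the conversion. The raw-score mean, standard deviation, and student score are hypothetical.

```python
def z_score(raw, mean, sd):
    """Distance of a raw score from the group mean, in standard deviation units."""
    return (raw - mean) / sd

def rescale(z, new_mean, new_sd):
    """Express a z-score on a scale with a chosen mean and standard deviation."""
    return new_mean + new_sd * z

# Hypothetical test: group mean 70, standard deviation 8, student raw score 78
z = z_score(78, 70, 8)            # one standard deviation above the mean

print(z)
print(rescale(z, 50, 10))         # T-score scale
print(rescale(z, 100, 15))        # IQ-style scale
print(rescale(z, 500, 100))       # SAT-style scale used on the slide
```

Because only the mean and standard deviation change, a student who is one standard deviation above the mean stays in the same relative position on every scale.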
Interpreting Scores
Standard scores
• NCE- Normal curve equivalents are standard
scores with a mean of 50 and a standard
deviation of 21.06
Interpreting Scores
Standard scores
• NCE- A normal curve equivalent (NCE) allows
meaningful comparison between different test
sections. For example, if a student
receives NCE scores of 53 on the Reading test and
45 on the Mathematics test, you can correctly say
that the Reading score is eight points higher than
the Mathematics score.
• NCEs are represented on a scale of 1 - 99. This
scale coincides with the national percentile scale
at 1, 50, and 99.
Interpreting Scores
Standard scores
• NCEs have the advantage of being based on an
equal-interval scale. That is, the difference
between two successive scores on the scale is the
same over all parts of the scale. This means that,
unlike percentiles, you can average NCE scores to
compare groups of students.
• You can also convert average NCEs to national
percentiles for a more meaningful understanding
of the scores. This is because NCEs and NPs have
a consistent relationship
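The relationship between NCEs and national percentiles can be sketched with the normal curve, using the mean of 50 and standard deviation of 21.06 given earlier. The function names and the group of student scores are hypothetical, and the conversion assumes the norm group's scores are normally distributed.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()    # mean 0, standard deviation 1

def nce_from_percentile(percentile):
    """Convert a national percentile (0 < p < 100) to a normal curve equivalent."""
    z = STANDARD_NORMAL.inv_cdf(percentile / 100)
    return 50 + 21.06 * z

def percentile_from_nce(nce):
    """Convert a normal curve equivalent back to a national percentile."""
    z = (nce - 50) / 21.06
    return 100 * STANDARD_NORMAL.cdf(z)

for p in (1, 50, 99):
    print(p, round(nce_from_percentile(p)))   # the two scales agree at these points

# Average in NCE units, then convert the average back to a percentile
group_nces = [53, 45, 61, 38]                 # hypothetical student NCE scores
average_nce = sum(group_nces) / len(group_nces)
print(average_nce, round(percentile_from_nce(average_nce)))
```

Averaging is done on the NCE scale because it is equal interval; the percentile is only computed at the end, for reporting.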
Norms
• Normative groups allow us to compare one
person's performance to the performance of
others
• It is important to know what norm group you
are being compared to
• The assumption is always that you are being
compared to the general population; make
sure that is the case
Norms
• Take our example
• It means that 60% of the students in the
district scored at or above the 85th percentile
nationally.
• What help is this standardized test in placing
students in a high math class?
• Be sure you know the characteristics of the
norm group and how it compares to what you
are using it for
Characteristics of a Norm Group
• Gender
– Girls develop physically earlier and faster than
boys
– After puberty, it shifts
– Some say there are different role expectations
– Even though people see differences, on most
tests gender differences are small
– If differences are pronounced, then you should
have different norm groups
Characteristics of a Norm Group
• Age
– Age is important; different abilities develop at different
times
– Younger children (preschool) show differences every
couple of months; once children are school age, a few
months should not make a difference. Look for
differences every six months to a year
Characteristics of a Norm Group
• Grade
– All achievement tests should use this
– Some subjects might be off, but math and reading
should be accurate
Characteristics of a Norm Group
• Acculturation
– Very imprecise concept; should not be relied on
Characteristics of a Norm Group
• Race and culture
– An important concept to understand and watch
for, but in some cases there is not a lot you can do about it
• There is an overrepresentation of certain groups; be
aware of this
• Yet if there are needs, you need to provide support
– Steps have been taken to correct this problem
Characteristics of a Norm Group
• Geography
– I have never considered this
– There are differences between different areas of
the country
Characteristics of a Norm Group
• Proportional representation- the norm group should have
representation for each subgroup. K-12 is 13
groups; if split by male and female, that is 26
• Norms can't be out of date- you do not want data from the
1950s
Characteristics of a Norm Group