Reliability and Validity
Q560: Experimental Methods in Cognitive Science
Lecture 16
Psychological Measurement
• Psychometric theory, sources of measurement noise
• Cognitive Construct:

The label given to our hypothetical characteristic (e.g., attention,
STM-capacity, executive function, intelligence, forgetting, etc.)

Dimensions along which Ss can be located based on behavior

Cannot directly measure the construct, so we link it to behavior
believed to reflect the construct
• Operational Definition:

A statement that specifies how a construct is measured

Link between the overt and latent variables
Reliability and Validity
Reliability and Validity are important concepts in both
measurement and the full experiment to ensure that the link
between our statistics and conclusions is sound.
Reliability = “consistency”
Validity = “on targetness”
Reliability is a necessary but insufficient condition for validity
Reliability
Reliability is the extent to which the measurements of a test
remain consistent over repeated tests of the same subject
under identical conditions.
An experiment is reliable if it yields consistent results of the same
measure. It is unreliable if repeated measurements give
different results.
Reliable car, repeatability, etc.
Reliability of experiment, replication, and error rate
Reliability does not imply validity
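One common way to put a number on "consistency over repeated tests" is a test-retest correlation. The sketch below is only an illustration: the scores are hypothetical, and using Pearson's r as the reliability estimate is an assumption, not something the lecture specifies.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical scores: the same five subjects tested twice under the same conditions.
session_1 = [12, 15, 9, 20, 17]
session_2 = [13, 14, 10, 19, 18]

# A value near 1 means subjects keep their relative standing across sessions.
test_retest_r = pearson_r(session_1, session_2)
print(round(test_retest_r, 3))
```

A high r here says only that the measure is consistent; it says nothing about whether the measure is on target.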
Validity
Validity of a measure is the degree to which the variable
measures what it is intended to measure
e.g., IQ tests, GREs, Eye tests
A valid measure is reliable
A reliable measure is not necessarily valid
Unreliable and Invalid Shooting:
Sam “Scattershot” Wilson
Reliable but Invalid Shooting:
Ralph “Rightpull” Roberts
Reliable and Valid Shooting:
Kit “Bullseye” Carson
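The shooting analogy can be sketched numerically by treating reliability as small spread among the shots and validity as a small distance between the average shot and the target. The shot coordinates below are made up for illustration, and the function name `bias_and_spread` is hypothetical.

```python
from statistics import mean
from math import hypot

def bias_and_spread(shots, target=(0.0, 0.0)):
    """Return (distance of the mean shot from the target,
    root-mean-square distance of the shots from their own mean)."""
    cx = mean(x for x, _ in shots)
    cy = mean(y for _, y in shots)
    bias = hypot(cx - target[0], cy - target[1])
    spread = mean(hypot(x - cx, y - cy) ** 2 for x, y in shots) ** 0.5
    return bias, spread

# Hypothetical shot positions relative to a bullseye at (0, 0).
scattershot = [(-4, 3), (5, -2), (2, 6), (-3, 5), (6, 4)]                # wide spread, off target
rightpull = [(4.0, 0.1), (4.2, -0.1), (3.9, 0.0), (4.1, 0.2), (3.8, -0.2)]  # tight, but pulled right
bullseye = [(0.1, 0.0), (-0.1, 0.1), (0.0, -0.1), (0.1, 0.1), (-0.1, -0.1)]  # tight and centered

for name, shots in [("Scattershot", scattershot), ("Rightpull", rightpull), ("Bullseye", bullseye)]:
    bias, spread = bias_and_spread(shots)
    print(f"{name}: bias={bias:.2f}, spread={spread:.2f}")
```

Small spread with large bias corresponds to "reliable but invalid"; only small spread together with small bias corresponds to "reliable and valid".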
Experimental Validity
• Valid design is necessary for valid scientific conclusions
• Crib sheet: www.indiana.edu/~clcl/Q560/Validity.pdf
1. Statistical Conclusion Validity:

Validity with which statements about the association of two
variables can be made based on statistical tests

Threats: low measurement reliability (Rxx), violated statistical assumptions
2. Construct Validity:

Validity with which we can make generalizations about higher-order
constructs from the experimental results

Threats: vague operational definition (Watson); experimenter/participant
expectancy effects
3. Internal Validity:

Validity with which statements can be made about the causal
relationship between variables as manipulated

Threats: Confounds (history, maturation, testing), instrumentation,
statistical regression, mortality, etc.
4. External Validity:

Validity with which we can make generalizations from sample/expt

Ecological validity

Threats: Interactions of setting/selection method and treatment
Dr. N. Lewis is interested in whether memory encoding is stronger for pictures of
objects or for words that refer to the same objects. He has participants learn a list of 30
written words that refer to objects (then a distracting task) and then recall as many
words as they can. Next, he gives the same participants a list of 30 pictures of the
same objects and (after the same distracting task) has them again recall as many
words as they can. At the end of the experiment, participants recalled a mean of 16
words in the written condition vs. a mean of 24 words in the pictures condition.
1. What type of study is this?
2. What is the independent variable?
3. What is the dependent variable?
4. Is the dependent variable discrete or continuous?
5. What is the scale of measurement for the dependent variable?
6. Name one confounding variable.
A researcher is interested in whether or not snakes can detect insults. He buys 20
exotic snakes, and separates them into two groups of 10. For one group of snakes, he
insults them for 10 minutes each. For the other group, he simply stares at them for 10
minutes each. For each group, he records the number of times the snakes bite him
(assuming that a bite indicates that the snake took offense to him). At the end of the
experiment, the group of snakes he insulted had bitten him 23 times vs. only 8 bites from
the group he did not insult.
1. What type of study is this?
2. What is the independent variable?
3. What is the dependent variable?
4. Is the dependent variable discrete or continuous?
5. What is the scale of measurement for the dependent variable?
6. Name one confounding variable.
Article Critique

Assignment #5: Article critique
Small-N Designs
Why would we want to do an experiment with a
small number of subjects?
• B/c it’s easier/faster than doing an experiment
with a large number of Ss?
This is untrue: small-N experiments are frequently
more difficult and time-consuming, even though
there are only 1 to 5 participants
Small-N Designs
• In a small N design, we make repeated
measurements on a small number of participants
• Although there are fewer participants, there are
many more observations per participant
• These experiments can be many hours of testing
consisting of several thousand trials
• The goal is to provide a complete and accurate
description of a single subject’s behavioral
changes as a function of a repeated measure
• Other subjects are replications (separate expts)
• The experimenter is often a subject
Why would we want to do this?
1. Practical Reasons: A small N design may be
necessary b/c it is difficult to get Ss from a rare
population (e.g., OCD, Alzheimer’s), or b/c the
treatments may be expensive or time consuming
(e.g., teaching sign language to a gorilla:
Patterson & Linden, 1981, spent 10 years doing
this)
2. Theoretical Reasons: Skinner believed that “the
best way to understand behavior is to study
single individuals intensely” …one should “study a
single subject for a thousand hours rather than a
thousand subjects for an hour” (1966, p.21)
“If conditions are precisely controlled, then orderly
and predictable behavior will follow”
Why would we want to do this?
3. Methodological Reasons: Pooling or averaging
data from many subjects can produce misleading
results as an artifact of grouping (Sidman, 1960)
• Averaging can produce results that do not
characterize any subject who participated. More
importantly, it can produce a result supporting
theory X when perhaps it shouldn’t (Estes)
• E.g.: Manis (1971) tested children in a
discrimination learning task…stimuli were simple
objects, and the child has to learn which feature
is the diagnostic one (shape, color, position, etc.)
[Figure: example stimuli for Trials 1 to 6, each marked with "+"]
Continuity Theory: Concept learning is a gradual
process of accumulating “habit strength”
Noncontinuity Theory: Subjects actively try out
different “hypotheses” over trials. While they search
for the correct hypothesis, performance is at chance,
but once they hit the correct hypothesis, the
performance shoots up to 100% and stays there
[Figure: predicted learning curves. y-axis: performance, from chance to perfect; x-axis: trials]
Here are the averaged data. Continuity is right!!
But here are the individual data before averaging.
Noncontinuity is right?!
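The averaging artifact can be reproduced in a few lines. This is a minimal sketch under stated assumptions: every subject is a pure noncontinuity learner (chance, then perfect), and the trials on which subjects find the correct hypothesis (`switch_trials`) are invented for illustration, not taken from Manis (1971).

```python
N_TRIALS = 12
CHANCE, PERFECT = 0.5, 1.0

def step_curve(switch_trial, n_trials=N_TRIALS):
    """Expected accuracy per trial for one subject: chance until the correct
    hypothesis is found on `switch_trial`, perfect from then on."""
    return [PERFECT if t >= switch_trial else CHANCE for t in range(1, n_trials + 1)]

# Hypothetical trials on which five different subjects hit the correct hypothesis.
switch_trials = [2, 4, 6, 8, 10]
individual = [step_curve(s) for s in switch_trials]

# Group curve: mean accuracy across subjects on each trial.
averaged = [sum(column) / len(column) for column in zip(*individual)]

for t, acc in enumerate(averaged, start=1):
    print(f"trial {t:2d}: mean accuracy {acc:.2f}")
```

Every individual curve is a single jump from chance to perfect, yet the averaged curve climbs in small increments, which is the shape continuity theory predicts. The gradual group curve describes no subject who was actually in the sample.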
CogSci Began with Small-N Research
• Ebbinghaus, Wundt, Dressler, Thorndike
• It wasn’t until the 1930s that experiments with
large numbers of participants and aggregate
statistics became commonplace (largely due to
Fisher)
• Psychophysics is still dominated by Small-N
designs. Idea is that we have very high similarity
between our perceptual systems; with sufficient
control, a stable effect should be observable
without needing many subjects
• “An effect that isn’t stable enough to be studied
with a small N isn’t worth studying”
Elements of Small-N Designs
1. A within-Ss manipulation
2. Target behavior must be operationally defined
3. Establish a baseline of responding/behavior
4. Begin treatment manipulation, and monitor
change from baseline
How do we analyze this?
• Visual inspection
• Curve/Trend fitting based on theory
• Change from baseline
Withdraw Designs
• Get measurement of baseline behavior on the DV
• Introduce the manipulation…but a change in
responding may be due to history or maturation
(AB design)
• Return to Baseline: If the change is due to history
or maturation, it is unlikely that the behavior will regress
when treatment is removed (called an ABA
design). The ABAB design is more popular.
1. Multiple baselines design
2. Alternating treatments design
3. Changing criterion design
4. Staircase designs
Withdraw Designs:
• E.g., Does talking to a plant make it grow?
[Figure: Growth of ficus divinicus. y-axis: growth in inches; x-axis: (A) Baseline, first three months; (B) Treatment, second three months; (A) Baseline, final three months]
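A change-from-baseline analysis for an ABA design might look like the sketch below. The growth values are hypothetical (they are not read off the figure), and the helper `shows_reversal` with its `min_effect` threshold is an illustrative assumption rather than a standard test.

```python
from statistics import mean

# Hypothetical monthly growth (inches) for one plant; not data from the lecture.
phases = [
    ("A1 baseline", [1.0, 1.1, 0.9]),
    ("B treatment", [1.8, 2.0, 1.9]),
    ("A2 baseline", [1.1, 1.0, 1.0]),
]

phase_means = {label: mean(values) for label, values in phases}

# Change when treatment is introduced, and change when it is withdrawn.
treatment_change = phase_means["B treatment"] - phase_means["A1 baseline"]
reversal_change = phase_means["A2 baseline"] - phase_means["B treatment"]

def shows_reversal(change_in, change_out, min_effect=0.5):
    """True if behavior shifts by at least `min_effect` when treatment starts
    and shifts back by at least `min_effect` when it is withdrawn."""
    opposite_direction = change_in * change_out < 0
    return opposite_direction and abs(change_in) >= min_effect and abs(change_out) >= min_effect

print(phase_means)
print(shows_reversal(treatment_change, reversal_change))
```

If the change in phase B were due to history or maturation, the return to baseline would be expected to show little or no reversal, and `shows_reversal` would come out false.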
Small-N Designs and Psychophysics
• Psychophysics: the relationship between the
physical stimulus and the perceptual reaction to it
• Small-N designs are popular in Psychophysics b/c:
  • Few Ss are needed b/c of the similarity
between our sensory systems (generalizes)
  • On each trial, data are much less affected by
error variance than in questionnaire research
(also b/c of laboratory control)
  • Trials are very quick and easy: so why do a 30
min experiment where the data collection only
takes 30 seconds?
  • Study one S, others are replications
Criticisms Against Small-N Designs
1. External validity: To what extent do these results
generalize?
2. Criticized for relying on visual inspection of data
instead of statistical analysis (but they are more
theory-driven and useful for model fitting)
3. Small-N designs cannot adequately test for
interaction effects (interactive designs exist, but
are very cumbersome: ABBCBABBCB design, etc.)
4. Due to their operant learning tradition
(Skinnerian), they tend to focus on response
frequency as a DV, rather than RT, accuracy,
habituation, etc.