Normative models of human inductive inference
Tom Griffiths
Department of Psychology
Cognitive Science Program
University of California, Berkeley
Perception is optimal
Körding & Wolpert (2004)
Cognition is not
Optimality and cognition
• Can optimal solutions to computational
problems shed light on human cognition?
• Can we explain aspects of cognition as the
result of sensitivity to natural statistics?
• What kind of representations are extracted
from those statistics?
Joint work with Josh Tenenbaum
Natural statistics
Images of natural scenes → sparse coding → neural representation
(Olshausen & Field, 1996)
Predicting the future
How often is Google News updated?
t = time since last update
t_total = time between updates
What should we guess for t_total given t?
Reverend Thomas Bayes
Bayes’ theorem

p(h|d) = p(d|h) p(h) / Σ_{h′∈H} p(d|h′) p(h′)

posterior probability ∝ likelihood × prior probability
(the denominator sums over the space of hypotheses)

h: hypothesis
d: data
Bayes’ theorem

p(h|d) ∝ p(d|h) p(h)

h: hypothesis
d: data
Bayesian inference

p(t_total|t) ∝ p(t|t_total) p(t_total)
posterior probability ∝ likelihood × prior

Assuming t is a random sample from the interval (0, t_total), the likelihood is p(t|t_total) = 1/t_total, giving

p(t_total|t) ∝ (1/t_total) p(t_total)
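A minimal sketch of this prediction rule, assuming a power-law prior on t_total; the exponent and grid below are illustrative choices, not values from the talk. The posterior is computed on a grid using the 1/t_total likelihood, and the posterior median is reported as the prediction.

import numpy as np

def predict_ttotal(t, prior, grid):
    """Posterior median prediction of t_total given an observed t.

    prior: unnormalized prior p(t_total) evaluated on `grid`
    likelihood: p(t | t_total) = 1/t_total for 0 < t < t_total, else 0
    """
    likelihood = np.where(grid >= t, 1.0 / grid, 0.0)
    posterior = likelihood * prior
    posterior /= posterior.sum()
    cdf = np.cumsum(posterior)
    return grid[np.searchsorted(cdf, 0.5)]  # posterior median

# Illustrative power-law prior p(t_total) ∝ t_total^(-gamma)
grid = np.linspace(1.0, 5000.0, 50000)
gamma = 1.3  # made-up exponent, for illustration only
prior = grid ** (-gamma)

print(predict_ttotal(60.0, prior, grid))  # prediction after observing t = 60

Under a power-law prior the posterior median works out to t · 2^(1/gamma), so predictions scale multiplicatively with the observed t; a Gaussian prior would instead pull predictions toward the mean of the prior.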
The effects of priors
Evaluating human predictions
• Different domains with different priors:
  – a movie has made $60 million [power-law]
  – your friend quotes from line 17 of a poem [power-law]
  – you meet a 78 year old man [Gaussian]
  – a movie has been running for 55 minutes [Gaussian]
  – a U.S. congressman has served for 11 years [Erlang]
• Prior distributions derived from actual data
• Use 5 values of t for each
• People predict t_total

[Results figure: people's predictions compared with Gott's rule, the empirical prior, and a parametric prior]
Predicting the future
• People produce accurate predictions for the
duration and extent of everyday events
• People are sensitive to the statistics of their
environment in making these predictions
– form of the prior (power-law or exponential)
– distribution given that form (parameters)
Optimality and cognition
• Can optimal solutions to computational
problems shed light on human cognition?
• Can we explain aspects of cognition as the
result of sensitivity to natural statistics?
• What kind of representations are extracted
from those statistics?
Joint work with Adam Sanborn
Categories are central to cognition
Sampling from categories
A category such as “frog” corresponds to a distribution p(x|c) over stimuli x
Markov chain Monte Carlo
• Sample from a target distribution P(x) by constructing a Markov chain for which P(x) is the stationary distribution
• The Markov chain converges to its stationary distribution, so its states can be used like samples from P(x)
Metropolis-Hastings algorithm
(Metropolis et al., 1953; Hastings, 1970)
Step 1: propose a state (we assume a symmetric proposal)
Q(x(t+1)|x(t)) = Q(x(t)|x(t+1))
Step 2: decide whether to accept, with probability given by the
Metropolis acceptance function: A(x(t), x(t+1)) = min(1, p(x(t+1)) / p(x(t)))
or the Barker acceptance function: A(x(t), x(t+1)) = p(x(t+1)) / (p(x(t+1)) + p(x(t)))
Metropolis-Hastings algorithm
[Animation: a chain of proposals under a target density p(x), with example acceptance probabilities A(x(t), x(t+1)) = 0.5 and A(x(t), x(t+1)) = 1]
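A minimal sketch of the algorithm with both acceptance functions; the standard-Gaussian target, proposal width, and chain length are illustrative choices, not part of the talk.

import numpy as np

rng = np.random.default_rng(0)

def target(x):
    """Unnormalized target density p(x); a standard Gaussian stands in here."""
    return np.exp(-0.5 * x ** 2)

def metropolis_accept(p_new, p_old):
    return min(1.0, p_new / p_old)

def barker_accept(p_new, p_old):
    return p_new / (p_new + p_old)

def mh_chain(n_steps, accept_fn, proposal_sd=1.0, x0=0.0):
    x = x0
    chain = []
    for _ in range(n_steps):
        proposal = x + rng.normal(0.0, proposal_sd)  # symmetric proposal Q
        if rng.random() < accept_fn(target(proposal), target(x)):
            x = proposal  # accept; otherwise keep the current state
        chain.append(x)
    return np.array(chain)

samples = mh_chain(20000, accept_fn=barker_accept)
print(samples.mean(), samples.std())  # should be close to 0 and 1

Both acceptance functions leave p(x) invariant when the proposal is symmetric; the Barker form matters here because it is the one that maps onto people's choice probabilities in the task below.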
A task
Ask subjects which of two alternatives
comes from a target category
Which animal is a frog?
A Bayesian analysis of the task
Assume: the two alternatives are a priori equally likely to come from the target category c, so the posterior probability that the proposed stimulus x*, rather than the current stimulus x(t), is the category member is p(x*|c) / (p(x*|c) + p(x(t)|c))
Response probabilities
If people probability match to the posterior, the response probability is equivalent to the Barker acceptance function for the target distribution p(x|c)
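Spelling out that equivalence (a sketch, under the equal-prior assumption stated above):

\[
  P(\text{choose } x^{*} \mid x^{*}, x^{(t)}, c)
    \;=\; \frac{p(x^{*} \mid c)}{p(x^{*} \mid c) + p(x^{(t)} \mid c)}
    \;=\; A_{\text{Barker}}\!\left(x^{(t)}, x^{*}\right)
\]

so a sequence of probability-matched choices, with the chosen stimulus carried forward as the current state, forms a Markov chain whose stationary distribution is p(x|c).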
Collecting the samples
On each trial (Trial 1, Trial 2, Trial 3, …) subjects are asked “Which is the frog?”; the chosen stimulus is carried forward as the current state and is paired with a new proposal on the next trial
Verifying the method
Training
Subjects were shown schematic fish of different sizes
and trained on whether they came from the ocean
(uniform) or a fish farm (Gaussian)
Between-subject conditions
Choice task
Subjects judged which of the two fish came from
the fish farm (Gaussian) distribution
Examples of subject MCMC chains
Estimates from all subjects
• Estimated means and standard deviations are
significantly different across groups
• Estimated means are accurate, but standard
deviation estimates are high
– result could be due to perceptual noise or response gain
Sampling from natural categories
Examined distributions for four natural categories:
giraffes, horses, cats, and dogs
Stimuli were presented as nine-parameter stick figures
(Olman & Kersten, 2004)
Choice task
Samples from Subject 3
(projected onto plane from LDA)
Mean animals by subject
[Figure: mean giraffe, horse, cat, and dog stick figures for each of subjects S1–S8]
Marginal densities
(aggregated across subjects)
• Giraffes are distinguished by neck length, body height, and body tilt
• Horses are like giraffes, but with shorter bodies and nearly uniform necks
• Cats have longer tails than dogs
Markov chain Monte Carlo with people
• Normative models can guide the design of
experiments to measure psychological variables
• Markov chain Monte Carlo (and other methods)
can be used to sample from subjective
probability distributions
– category distributions
– prior distributions
Conclusion
• Optimal solutions to computational problems
can shed light on human cognition
• We can explain aspects of cognition as the result
of sensitivity to natural statistics
• We can use optimality to explore representations
extracted from those statistics
Relative volume of categories
Convex hull content divided by minimum enclosing hypercube content:
Giraffe: 0.00004
Horse: 0.00006
Cat: 0.00003
Dog: 0.00002
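A minimal sketch of that ratio, assuming the "minimum enclosing hypercube" can be approximated by the axis-aligned bounding box of the samples (the talk's exact construction may differ), and using a low-dimensional toy point set rather than the nine-parameter stick-figure space.

import numpy as np
from scipy.spatial import ConvexHull

def relative_volume(samples):
    """Convex hull content of the samples divided by the content of their
    axis-aligned bounding box (standing in for the enclosing hypercube)."""
    hull = ConvexHull(samples)
    box_content = np.prod(samples.max(axis=0) - samples.min(axis=0))
    return hull.volume / box_content

# Toy illustration with random 3-D points
rng = np.random.default_rng(1)
print(relative_volume(rng.normal(size=(500, 3))))

The tiny ratios in the table indicate that each category occupies only a small fraction of the stick-figure parameter space.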
Discrimination method
(Olman & Kersten, 2004)
Parameter space for discrimination
Restricted so that most random draws were animal-like
MCMC and discrimination means
Iterated learning
(Kirby, 2001)
Each learner sees data, forms a hypothesis, and produces the data given to the next learner.
With Bayesian learners, the distribution over hypotheses converges to the prior
(Griffiths & Kalish, 2005)
Explaining convergence to the prior
[Diagram: a chain alternating hypotheses and data, with learning steps PL(h|d) and production steps PP(d|h)]
• Intuitively: data acts once, prior many times
• Formally: iterated learning with Bayesian
agents is a Gibbs sampler on P(d,h)
(Griffiths & Kalish, in press)
Iterated function learning
(Kalish, Griffiths, & Lewandowsky, in press)
• Each learner sees a set of (x, y) pairs (the data)
• Makes predictions of y for new x values (from the learner's hypothesis about the function)
• Predictions are the data for the next learner
Function learning experiments
[Experiment display: stimulus magnitude, response slider, and feedback]
Examine iterated learning with different initial data
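A minimal sketch of such a chain, assuming each learner is a Bayesian linear-regression learner over the slope of y = w·x; the prior, noise level, and initial data below are made-up values for illustration. Whatever function the first learner is trained on, the distribution of hypotheses across generations drifts toward the prior.

import numpy as np

rng = np.random.default_rng(3)

# Made-up prior over the slope of y = w * x, and production noise
w0, tau = 0.5, 1.0      # prior mean and sd of w
sigma = 0.3             # sd of noise added to each produced y

def posterior_sample(x, y):
    """Bayesian linear regression (no intercept): sample w from p(w | x, y)."""
    precision = 1.0 / tau ** 2 + np.dot(x, x) / sigma ** 2
    mean = (w0 / tau ** 2 + np.dot(x, y) / sigma ** 2) / precision
    return rng.normal(mean, np.sqrt(1.0 / precision))

x_train = np.linspace(0.0, 1.0, 10)
y = 2.0 * x_train                        # initial data: a steep function, far from the prior
slopes = []
for _ in range(20000):
    w = posterior_sample(x_train, y)     # learner forms a hypothesis
    y = w * x_train + rng.normal(0.0, sigma, size=x_train.size)  # data for the next learner
    slopes.append(w)

print(np.mean(slopes[1000:]), np.std(slopes[1000:]))  # drifts from 2.0 toward the prior: ~0.5, ~1.0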