Adapting Proper Scoring Rules for
Measuring Subjective Beliefs to the
Current State of the Art in Decision
Theory
Theo Offerman, Joep Sonnemans, Gijs van de Kuilen, Peter P. Wakker
October 12, 2007
Dept. Econ., Univ. Florida, Orlando, FL
Topic: Our chance estimates of uncertain events;
e.g.: Hillary Clinton next president of the US.
Application considered first: grading students.
[Slide 2]
Say, you grade a multiple choice exam in
geography to test students' knowledge about
Statement H: Capital of North Holland =
Amsterdam.

Reward:    if H true    if not-H true
H          1            0
not-H      0            1
Assume: Two kinds of students.
1. Those who know. They answer correctly.
2. Those who don't know. Answer 50-50.
Problem: Correct answer does not completely
identify student's knowledge.
Some correct answers, and high grades, are
due to luck. There is noise in the data.
[Slide 3]
One way out: oral exam, or ask details.
Too time consuming.
Now comes a perfect way out:
Exactly identify state of knowledge of each
student.
Still multiple choice, taking no more time!
Best of all worlds!
[Slide 4]
Reward:       if H true    if not-H true
H             1            0
don't know    0.75         0.75
not-H         0            1
For those of you who do not know the answer to H: what would you do?
Best to say "don't know" (guessing pays 0.5 in expectation, below 0.75).
The system discriminates perfectly between students!
New assumption: students have all kinds of degrees of knowledge. Some are almost sure H is true but are not completely sure; others have a hunch; etc.
The above system does not discriminate well between such varied states of knowledge.
One solution (too time-consuming): oral exams etc.
[Slide 5]
Second solution:
Find r such that the student is indifferent between:

Reward:                       if H true    if not-H true
H                             1            0
partly know H, to degree r    r            r

Then r = P(H).
(Assuming expected value maximization …)
How to measure r? We can elicit r from risky choices (we will skip details):
[Slide 6]
              bet on H                      "don't know" option
              if H true    if not-H true    if H true    if not-H true
choice 1      1            0                0.10         0.10
choice 2      1            0                0.20         0.20
choice 3      1            0                0.30         0.30
.
.
.
choice 9      1            0                0.90         0.90
[Slides 7-8]
Etc. This yields an approximation of the true subjective probability p (a small sketch of the switching-point logic follows below).
Binary-choice ("preference") solutions are popular in decision theory, but too time-consuming for us.
Rewards for students are still somewhat noisy this way (we only approximate p).
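To make the switching-point logic concrete, here is a minimal Python sketch (my illustration, assuming an expected-value maximizer; the function names are hypothetical): the first sure amount the student prefers to the bet brackets p within 0.10.

```python
# Minimal sketch of the switching-point logic above, assuming an
# expected-value maximizer (function names are illustrative).

def prefers_bet(p, sure_amount):
    """True if betting on H (1 if H, else 0) beats a sure payment."""
    return p > sure_amount

def switching_point(p):
    """First sure amount j/10 that the student prefers to the bet."""
    for j in range(1, 10):
        if not prefers_bet(p, j / 10):
            return j / 10
    return None  # the student prefers the bet even to 0.90

# A student with p = 0.73 switches at 0.80, so p lies in [0.70, 0.80]:
print(switching_point(0.73))  # 0.8
```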
Third solution [introspection]
[Slide 9]
Simply ask the student which r ("reported probability") brings indifference in:

Reward:                       if H true    if not-H true
H                             1            0
partly know H, to degree r    r            r
Problems:
Why would the student give the true answer r = p?
What, exactly, is the reward (grade?) for the student?
Does the reward, whatever it is, give an incentive to answer truthfully?
[Slide 10]
I now promise a perfect way out:
de Finetti's dream-incentives.
Exactly identifies state of knowledge of each
student, no matter what it is.
Takes little time; no more than multiple choice.
Rewards students fairly, with little noise.
Best of all worlds.
For the full variety of degrees of knowledge.
The student can choose a reported probability r for H from the [0,1] continuum, as follows:
[Slide 11]
Reward:                        if H true      if not-H true
r = 1   (H is sure!?)          1              0
r       (degree of belief      1 – (1–r)²     1 – r²
        in H?)
r = 0.5 (don't know!?)         0.75           0.75
r = 0   (not-H is sure!?)      0              1
Claim: Under "subjective expected value," optimal reported probability r = true subjective probability p.
Proof of claim.
To help memory:

Reward:                      if H true      if not-H true
r (degree of belief in H)    1 – (1–r)²     1 – r²

p = true probability; r = reported probability.
Optimize EV = p(1 – (1–r)²) + (1–p)(1–r²).
First-order optimality: 2p(1–r) – 2r(1–p) = 0, i.e., p – pr = r – pr, so r = p!
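As a quick numerical check of this claim, the following Python sketch (mine, not from the slides) grid-searches the EV over r for a few values of p; the maximizer coincides with p each time.

```python
import numpy as np

# Numerical check of the claim: under the quadratic scoring rule,
# the EV-maximizing report r equals the true probability p.

def expected_value(p, r):
    """EV of reporting r under the quadratic rule, true probability p."""
    return p * (1 - (1 - r) ** 2) + (1 - p) * (1 - r ** 2)

r_grid = np.linspace(0, 1, 100001)  # step 1e-5
for p in (0.2, 0.5, 0.75):
    best_r = r_grid[np.argmax(expected_value(p, r_grid))]
    print(p, round(float(best_r), 3))  # best_r coincides with p
```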
[Slides 12-13]
Easy in the algebraic sense.
Conceptually: !!! Wow !!!
Incentive compatible ... Many implications ...
de Finetti (1962) and Brier (1950) were the first neuro-scientists!
Useful in many domains.
"Bayesian truth serum" (Prelec, Science, 2005).
Superior to elicitations through preferences.
Superior to elicitations through indifferences ~ (BDM).
Widely used: Hanson (Nature, 2002), Prelec (Science 2005). In
accounting (Wright 1988), Bayesian statistics (Savage 1971),
business (Stael von Holstein 1972), education (Echternacht 1972),
finance (Shiller, Kon-Ya, & Tsutsui 1996), medicine (Spiegelhalter
1986), psychology (Liberman & Tversky 1993; McClelland & Bolger
1994), experimental economics (Nyarko & Schotter 2002).
Remember: based on expected value; in 2007 …!?
We bring
- realism (of prospect theory) to proper scoring rules;
- the beauty of proper scoring rules to prospect theory
and studies of ambiguity.
[Slide 14]
Survey
Part I. Deriving reported prob. r from theories:
- expected value;
- expected utility;
- nonexpected utility for probabilities;
- nonexpected utility for ambiguity.
Part II. Deriving theories from observed r.
In particular: derive beliefs/ambiguity attitudes. Will be surprisingly easy.
Proper scoring rules <==> risk & ambiguity: mutual benefits.
Part III. Implementation in an experiment.
[Slides 15-16]
Part I. Deriving r from Theories (EV, and
then 3 deviations).
Event H: Hillary Clinton next president of the
US.
not-H: Someone else next president.
We quantitatively measure your subjective
belief about this event (subjective
probability?), i.e. how much you believe in
Hillary.
[Slide 17]
Your subj. prob.(H: Hillary next president) = 0.75 (charming husband Bill).
EV: then your optimal rH = 0.75.
[Slide 18]
[Figure: Reported probability R(p) = rH as a function of the true probability p, under:
(a) expected value (EV);
(b) expected utility with U(x) = x^0.5 (EU);
(c) nonexpected utility for known probabilities, with U(x) = x^0.5 and with w(p) as common (nonEU);
(d) nonexpected utility for unknown probabilities, "Ambiguity" (nonEUA).
At p = 0.75 the curves give rEV = 0.75, rEU = 0.69, rnonEU = 0.61, rnonEUA = 0.52.]

                  reward if H true    reward if not-H true    EV (at p = 0.75)
rEV     = 0.75    0.94                0.44                    0.8125
rEU     = 0.69    0.91                0.52                    0.8094
rnonEU  = 0.61    0.85                0.63                    0.7920
rnonEUA = 0.52    0.77                0.73                    0.7596

(Examples: slide 21, EU; slide 25, nonEU; slide 29, nonEUA.)
[Slide 19]
So far we assumed EV (as does everyone using proper scoring rules, but as does no one in modern risk-ambiguity theories ...).
Deviation 1 from EV: EU with U nonlinear.
Now optimize
  pU(1 – (1–r)²) + (1–p)U(1 – r²).
r = p need no longer be optimal.
Theorem. Under expected utility with true probability p,
  r = pU′(1 – (1–r)²) / [pU′(1 – (1–r)²) + (1–p)U′(1 – r²)].
Reversed (and explicit) expression:
  p = rU′(1 – r²) / [rU′(1 – r²) + (1–r)U′(1 – (1–r)²)].
A corollary to distinguish from EV: r is nonadditive as soon as U is nonlinear.
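The theorem's expression is implicit in r (r appears inside U′ on both sides). A minimal fixed-point sketch in Python, using the deck's example U(x) = x^0.5, reproduces rEU ≈ 0.69 for p = 0.75; the iteration scheme is an illustrative choice of mine, not the authors' method.

```python
# Fixed-point sketch of the implicit equation above, with the deck's
# example U(x) = x^0.5 (so U'(x) = 0.5 * x^-0.5).

def optimal_report_eu(p, u_prime, n_iter=200):
    """Iterate r <- p*A / (p*A + (1-p)*B), A = U'(1-(1-r)^2), B = U'(1-r^2)."""
    r = p
    for _ in range(n_iter):
        a = u_prime(1 - (1 - r) ** 2)
        b = u_prime(1 - r ** 2)
        r = p * a / (p * a + (1 - p) * b)
    return r

u_prime = lambda x: 0.5 * x ** -0.5
print(round(optimal_report_eu(0.75, u_prime), 2))  # 0.69, as on slide 21
```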
[Slides 20-21]
How bet on Hillary? [Expected Utility].
EV: rEV = 0.75.
Expected utility, U(x) = x^0.5: rEU = 0.69.
You now bet less on Hillary. Closer to safety (Winkler & Murphy 1970).
(See slide 18 for the figure of R(p).)
Deviation 2 from EV: nonexpected utility for
probabilities (Allais 1953, Machina 1982, Kahneman &
Tversky 1979, Quiggin 1982, Gul 1991, Luce & Fishburn 1991,
Tversky & Kahneman 1992; Birnbaum 2005; survey: Starmer
2000)
For two-gain prospects, virtually all those theories are as follows:
For r ≥ 0.5, nonEU(r) = w(p)U(1 – (1–r)²) + (1–w(p))U(1 – r²).
For r < 0.5: symmetry, etc.
Different treatment of highest and lowest
outcome: "rank-dependence."
[Slides 22-23]
[Figure: The common weighting function w, plotted on p from 0 to 1:
w(p) = exp(–(–ln p)^α) for α = 0.65; w(1/3) ≈ 1/3, w(2/3) ≈ 0.51.]
[Slide 24]
Theorem. Under nonexpected utility with true probability p,
  r = w(p)U′(1 – (1–r)²) / [w(p)U′(1 – (1–r)²) + (1–w(p))U′(1 – r²)].
Reversed (explicit) expression:
  p = w⁻¹( rU′(1 – r²) / [rU′(1 – r²) + (1–r)U′(1 – (1–r)²)] ).
[Slide 25]
How bet on Hillary now? [nonEU with probabilities].
EV: rEV = 0.75.
EU: rEU = 0.69.
Nonexpected utility, U(x) = x^0.5, w(p) = exp(–(–ln p)^0.65):
rnonEU = 0.61.
You bet even less on Hillary. Again closer to fifty-fifty safety.
(See slide 18 for the figure of R(p).)
Deviations from EV were at the level of behavior so far, not of beliefs. Now for something different, more fundamental for our purposes.
[Slide 26]
Deviation 3 from EV: Ambiguity
(unknown probabilities; concerns belief or decision-attitude? Yet to be settled.)
Of a different nature than the previous two deviations: not something to correct for as a distortion, but the very thing to measure.
How to deal with unknown probabilities?
We have to give up Bayesian beliefs descriptively; according to some, even normatively.
Instead of additive beliefs p = P(H): nonadditive beliefs B(H)
(Dempster & Shafer, Tversky & Koehler, etc.)
[Slide 27]
All currently existing decision models:
For r ≥ 0.5, nonEU(r) = W(H)U(1 – (1–r)²) + (1–W(H))U(1 – r²).
This is 1992 prospect theory, = Schmeidler (1989).
Includes multiple priors (Wald 1950; Gilboa & Schmeidler 1989).
For binary gambles: Pfanzagl 1959; Luce ('00, Chapter 3);
Ghirardato & Marinacci ('01, "biseparable").
We can always write B(H) = w⁻¹(W(H)), so W(H) = w(B(H)). Then
nonEU(r) = w(B(H))U(1 – (1–r)²) + (1–w(B(H)))U(1 – r²).
[Slide 28]
Theorem. Under nonexpected utility with ambiguity,
  rH = w(B(H))U′(1 – (1–r)²) / [w(B(H))U′(1 – (1–r)²) + (1–w(B(H)))U′(1 – r²)].
Reversed (explicit) expression:
  B(H) = w⁻¹( rU′(1 – r²) / [rU′(1 – r²) + (1–r)U′(1 – (1–r)²)] ).
[Slide 29]
How bet on Hillary now? [Ambiguity, nonEUA].
rEV = 0.75.
rEU = 0.69.
rnonEU = 0.61.
Similarly,
rnonEUA = 0.52 (under plausible assumptions).
The r's are close to the insensitive fifty-fifty.
"Belief" component B(H) = w⁻¹(W(H)) = 0.62.
(See slide 18 for the figure of R(p).)
[Slide 30]
B(H): ambiguity attitude ≠ beliefs??
Before entering that debate, first:
How to measure B(H)?
Our contribution: through proper scoring rules with "risk correction."
This ends Part I. Before going to Part II (deriving theory from r), in preparation, we consider other proposals for measuring B and W in the literature (that we will improve upon):
[Slide 31]
Proposal 1 (common in decision theory):
Measure U, W, and w from behavior, and derive B(H) = w⁻¹(W(H)) from them.
Problem: much and difficult work!!!
Proposal 2 (common in decision analysis of the 1960s, and in modern experimental economics): measure canonical probabilities, that is,
for H, find an event Hp with objective probability p such that
(H:100) ~ (Hp:100) = (p:100). Then B(H) = p.
Problem: measuring indifferences is difficult.
Proposal 3 (common in proper scoring rules):
Calibration …
Problem: needs many repeated observations.
Part II. Deriving Theoretical Things from Empirical Observations of r

We reconsider the reversed explicit expressions:

  p    = w⁻¹( rU′(1 – r²) / [rU′(1 – r²) + (1–r)U′(1 – (1–r)²)] )

  B(H) = w⁻¹( rU′(1 – r²) / [rU′(1 – r²) + (1–r)U′(1 – (1–r)²)] )

Corollary. p = B(H) if related to the same r!!
[Slide 32]
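To illustrate the corollary, a round-trip sketch in Python (same illustrative parameters as before; the fixed-point solver is my device): simulate the biased report r under nonEU, then apply the reversed expression; the true p, and likewise B(H), comes back out. This is exactly the role the risk correction plays for an observed r.

```python
import math

# Round-trip sketch of the corollary (illustrative parameters:
# Prelec weighting with alpha = 0.65 and U(x) = x^0.5).

ALPHA = 0.65
w = lambda p: math.exp(-((-math.log(p)) ** ALPHA))
w_inv = lambda y: math.exp(-((-math.log(y)) ** (1 / ALPHA)))
u_prime = lambda x: 0.5 * x ** -0.5

def report(p, n_iter=200):
    """Biased report r under nonEU (fixed-point iteration, as above)."""
    wp, r = w(p), p
    for _ in range(n_iter):
        a, b = u_prime(1 - (1 - r) ** 2), u_prime(1 - r ** 2)
        r = wp * a / (wp * a + (1 - wp) * b)
    return r

def corrected(r):
    """Reversed expression: recover p (or B(H)) from an observed r."""
    a, b = u_prime(1 - (1 - r) ** 2), u_prime(1 - r ** 2)
    return w_inv(r * b / (r * b + (1 - r) * a))

r = report(0.75)               # about 0.61: the biased report
print(round(corrected(r), 2))  # 0.75: the true p is recovered
```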
Our proposal takes the best of several worlds!
No need to measure U, W, and w.
We get the "canonical probability" without measuring indifferences (BDM …; Holt 2006).
Calibration without needing many repeated observations.
We do all that with no more than simple proper-scoring-rule questions.
[Slide 33]
Example (participant 25)
[Slide 34]
Stock 20: CSM, certificates dealing in sugar and bakery ingredients.
Reported probability: r = 0.75.
For objective probability p = 0.70, the reported probability was also r = 0.75.
Conclusion: B(elief) of ending in the bar is 0.70!
We simply measure the R(p) curves and use their inverses: this is the risk correction.
Directly implementable empirically. We did so in an experiment, and found plausible results.
[Slides 35-36]
Part III. Experimental Test of Our Correction Method
[Slide 37]
Method
Participants. N = 93 students.
Procedure. Computerized, in the lab.
Groups of 15-16 each.
4 practice questions.
Stimuli 1. First we did the proper scoring rule for unknown probabilities; 72 in total.
For each stock: two small intervals and, third, their union. Thus, we test for additivity (a small sketch follows below).
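For concreteness, a tiny Python sketch of the additivity check; the bias measure below is my reading of "test for additivity", not a formula taken from the slides.

```python
# Tiny sketch of the additivity test: for two disjoint intervals and
# their union, additive reports satisfy r1 + r2 = r_union.

def additivity_bias(r1, r2, r_union):
    """Deviation from additivity for two disjoint intervals; 0 = additive."""
    return r1 + r2 - r_union

# Example: intervals reported 0.30 and 0.25, their union 0.45:
print(round(additivity_bias(0.30, 0.25, 0.45), 2))  # 0.1 -> nonadditive
```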
[Slide 38]
Stimuli 2. Known probabilities:
Two 10-sided dice thrown,
yielding a random number between 01 and 100.
Event H: number ≤ 75 (p = 3/4 = 15/20), etc.
Done for all probabilities j/20.
[Slide 39]
Motivating subjects. Real incentives.
Two treatments.
1. All-pay. Points paid for all questions.
6 points = €1.
Average earnings: €15.05.
2. One-pay (random-lottery system).
One question, randomly selected afterwards, played for real. 1 point = €20.
Average earnings: €15.30.
[Slide 40]
Results
(group averages; at the individual level, more corrections)
[Slide 41]
[Figure: Average correction curves. Corrected probability (vertical axis, 0 to 1) against reported probability (horizontal axis, 0 to 1), for treatment ONE (ρ = 0.70) and treatment ALL (ρ = 1.14), with the 45° line for reference.]
[Slide 42]
[Figure: Individual corrections. Empirical distribution F(ρ) of the individual correction parameters ρ (horizontal axis, –2.0 to 1.5; vertical axis, 0 to 1), for treatment ONE and treatment ALL.]
[Slide 43]
Figure 9.1. Empirical density of additivity biases for the two treatments.
[Fig. a: Treatment t=ONE; Fig. b: Treatment t=ALL. Histograms of uncorrected and corrected additivity biases, from –0.6 to 0.6; counts up to 160.]
For each interval [(j–2.5)/100, (j+2.5)/100] of length 0.05 around j/100, we counted the number of additivity biases in the interval, aggregated over 32 stocks and 89 individuals, for both treatments. With risk correction there were 65 additivity biases between 0.375 and 0.425 in treatment t=ONE, and without risk correction there were 95 such; etc.
Corrections reduce nonadditivity, but more than half of it remains: ambiguity generates more deviation from additivity than risk does.
Fewer corrections are needed for treatment t=ALL; better to use that treatment if no correction is possible.
[Slide 44]
Summary and Conclusion
- Modern risk & ambiguity theories: traditional proper scoring rules are heavily biased.
- We correct for those biases. Benefits for the proper-scoring-rule community and for risk and ambiguity theories.
- Experiment: the correction improves quality; it reduces deviations from ("rational"?) Bayesian beliefs.
- Corrections do not remove all deviations from Bayesian beliefs. Beliefs are genuinely nonadditive/non-Bayesian/sensitive to ambiguity.
- Proper scoring rules: the post-neuroeconomics approach to mind-reading.
[Slide 45]
The end.