Honesty via Choice-Matching
Jakša Cvitanić∗ and Dražen Prelec†
Abstract
We propose scoring rule mechanisms for eliciting honest responses to a multiple choice question (MCQ). The respondent's score consists of two terms: an auxiliary score based on his response to a related auxiliary question, and the corresponding auxiliary scores of the other respondents who give the same response to the MCQ as he does. For this to work, the auxiliary score has to be truth-incentive for the auxiliary question, which can be accomplished by using strictly proper scoring rules, and the auxiliary question has to be sufficiently correlated with the MCQ.
Key words: Proper scoring rules, Bayesian Truth Serum, truth-telling equilibria
JEL codes: C11, D82, D83, M00
∗ Division of the Humanities and Social Sciences, Caltech. E-mail: [email protected]. Research supported in part by NSF grant DMS 10-08219.
† MIT, Sloan School of Management, Department of Economics, Department of Brain and Cognitive Sciences. E-mail: [email protected]. Supported by Intelligence Advanced Research Projects Activity (IARPA) via the Department of Interior National Business Center contract number D11PC20058. Disclaimer: The views and conclusions expressed herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/NBC, or the U.S. Government.

1 Introduction
Scoring rules considered in the literature for eliciting honest responses from many agents to a multiple choice question (MCQ) either require strong assumptions on the planner, such as knowledge of the respondents' prior beliefs, or may be difficult to explain to the average respondent (Prelec (2004); Waggoner and Chen (2013); Cvitanić, Prelec, Radas and Šikić (2015)), or are tailored to binary (dichotomous) questions (Witkowski and Parkes (2012); Baillon (2016)). In this paper we propose honesty-inducing scoring rules that do not require the planner to know the prior, are simple to explain, and allow any number of choices.
The mechanism asks each respondent two questions: the MCQ of main interest, whose responses we call x responses, and a related auxiliary question, whose responses we call y responses. Given an honesty-inducing score $\rho$ for y responses, each respondent receives his $\rho$ score, adjusted by a term that depends on the $\rho$ scores of the other respondents who make the same choice in the MCQ. For example, one can add the average of those scores to the respondent's $\rho$ score. We show that this results in a truth-inducing mechanism, assuming that the respondent's expected value of the adjusting term is highest when he aligns with the group that picks his honest MCQ choice. We discuss sufficient conditions for the mechanism to work, and how to implement it. In particular, truth-telling is an equilibrium when the second question is "What are the percentages of the respondents choosing each particular choice in the MCQ?", assuming that the respondents are risk-neutral Bayesian agents who maximize their expected score and share a common prior, and that different honest x responses imply different honest y responses.
In the latter setting, and with infinitely many respondents, a well-known scoring mechanism that is implementable in practice is the Bayesian Truth Serum (BTS), introduced by Prelec (2004). The advantages of the scoring mechanisms in this paper relative to BTS are: (i) they are simpler to explain to respondents; (ii) they do not necessarily require infinitely many respondents; (iii) they do not necessarily require a common prior. A potential disadvantage is that they do not necessarily rank respondents according to any measure of expertise, whereas BTS ranks them according to their posterior probabilities of the true state of nature.
In the next section we present an example to illustrate the main idea and possible applications. In Section 3 we present the model and all the results. We conclude in Section 4. In the Appendix, we compare the proposed mechanisms with the Bayesian Truth Serum.

2 Example
We present here an example of the kind of problems we want to address, to motivate the theory below. Suppose a planner wants to poll economists using the following multiple choice question:¹
- Consider the following statement: “Some variation of the Keynesian approach to dealing with economic crisis is better than the existing alternatives.” Choose “Yes” if you agree, and “No” if you don't.
Suppose the planner wants to know the true percentage of those who agree, and that she is worried that some of the respondents might not provide honest responses. She will design a mechanism that assigns to each respondent a score based on his responses. Ideally, the planner would like a scoring rule such that, if the respondent wants to maximize his expected score and everyone else tells the truth, it is strictly optimal for the respondent to provide an honest response. However, that is not possible to achieve with a rule that is based only on responses to a single question.² What we provide in this paper are truth-inducing mechanisms of the following type: each respondent is asked an additional auxiliary question, and each is assigned a score based on both of his responses. The scoring rules are such that truth-telling is an equilibrium if the objective of each respondent is to maximize his expected score.³

1 In this case, the question offers only two possible responses, but in general it could offer any number.
Denote by $k$ the true opinion of respondent $r$ among the multiple choices, called his honest x choice. One term in the scoring function will be a score $\rho$ that depends on the responses to the additional auxiliary question. The idea is to pay each respondent a weighted average of his $\rho$ score and the average $\bar\rho^{-r}(i)$ of the $\rho$ scores of the respondents who choose the same answer $i$ to the original question as respondent $r$ does. In other words, by responding to the original question, the respondent is matched (aligned) to a subgroup of other respondents, which determines the value of his score via their responses to the additional question. For this to be truth-incentive, the additional question has to be correlated with the original question, in the sense that for each respondent $r$, the average score $\bar\rho^{-r}(k)$ of the respondents agreeing with his true choice $k$ has a higher expected value than the average score $\bar\rho^{-r}(i)$ of the respondents who choose a different response $i \neq k$ to the original question.
Here are two different examples of possible auxiliary questions correlated with the original question above:
- (a) What is the probability that a randomly selected respondent from the poll will respond “Yes” to the original question?
- (b) What is the probability that the rate of inflation in the US will be above a given value I this year?
We see that even though the true state of nature behind the original question is not verifiable, the auxiliary question may be based on something that is verifiable. The form of the auxiliary questions in (a) and (b) is the same (requesting a probability), but in the first case what is verifiable are the answers of the other respondents, while in the second it is a macroeconomic statistic.⁴
For concreteness, we first provide our benchmark example of a truth-incentive scoring rule for multiple choice questions, in the context of approach (a), in which respondents estimate the probability that another respondent will answer in a particular way.
Assume $M_A$ possible types and a multiple choice question with $M_A$ possible answers. The number of respondents $n$ is assumed to satisfy $n > M_A + 1$. The random variable $X^r_i$ takes the value zero for every MCQ choice $i \in \{1, \dots, M_A\}$ not declared by respondent $r$, and $X^r_k = 1$ for the choice $k$ declared by respondent $r$. The response to the auxiliary question, denoted by $Y^r_j$, is respondent $r$'s prediction of the empirical distribution of endorsements among the $n - 1$ respondents other than $r$. The values taken by $X^r_i$ and $Y^r_j$ are denoted $x^r_i$ and $y^r_j$.
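Denote by $z^r_j$ the actual percentage of all the respondents other than $r$ who declare x choice $j$:
$$z^r_j = \frac{1}{n-1} \sum_{t \neq r} x^t_j.$$
We first define the score for the auxiliary response (the prediction) as the logarithmic score of the respondent's prediction of the sample percentages, evaluated at the actual percentages. Up to a term that does not depend on the prediction (the entropy $H(z^r)$ of the actual percentages), it equals minus the Kullback-Leibler divergence between the actual percentages and the prediction:
$$\rho^r = \sum_j z^r_j \log y^r_j = -D_{KL}(z^r \,\|\, y^r) - H(z^r).$$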
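A minimal sketch of the benchmark score, assuming answers are labeled 0, 1, 2 (the text uses $1, \dots, M_A$) and using hypothetical declarations and predictions. The helper names are illustrative only; the last line checks the identity $\rho^r = -D_{KL}(z^r \| y^r) - H(z^r)$.

```python
import math

def empirical_fractions(choices, r, num_choices):
    """z^r_j: fraction of the respondents other than r who declared choice j."""
    others = [c for t, c in enumerate(choices) if t != r]
    return [others.count(j) / len(others) for j in range(num_choices)]

def log_score(z, y):
    """rho^r = sum_j z_j * log(y_j); predictions y_j must lie in (0, 1)."""
    return sum(zj * math.log(yj) for zj, yj in zip(z, y))

def kl_divergence(z, y):
    """D_KL(z || y), with the convention 0 * log 0 = 0."""
    return sum(zj * math.log(zj / yj) for zj, yj in zip(z, y) if zj > 0)

def entropy(z):
    """H(z) = -sum_j z_j log z_j, with 0 * log 0 = 0."""
    return -sum(zj * math.log(zj) for zj in z if zj > 0)

# Hypothetical data: six respondents, three answers labeled 0, 1, 2.
choices = [0, 1, 1, 0, 2, 1]
y_r = [0.3, 0.5, 0.2]            # prediction reported by respondent r = 0
z = empirical_fractions(choices, r=0, num_choices=3)
rho = log_score(z, y_r)
# Only the -D_KL term depends on the prediction; -H(z) is fixed by the others.
identity_gap = rho - (-kl_divergence(z, y_r) - entropy(z))
```

Reporting the realized percentages themselves would give a higher log score than `y_r`, which is the Gibbs inequality at work.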
2 For example, this is impossible in the setup in which all the respondents have a common prior that is not used by the planner in assigning the scores; see Cvitanić, Prelec, Radas and Šikić (2014).
3 For this to work in practice, the respondents either have to be sufficiently compensated in proportion to their score, or they have to care about their score for other reasons, e.g., because of reputational concerns.
4 The disadvantage of the second form is that one would have to wait up to one year to compute the scores.
It is well known that honest predictions $y^r_j$ (i.e., predictions equal to the expected percentages) maximize the above expected score. In order to create incentives for the answer to the primary question, we also credit respondent $r$ with the average auxiliary score of all the respondents other than $r$ who made the same type declaration:
$$\bar\rho^{-r} = \sum_j z^r_j \, \overline{\log y}^{-r}_j = \sum_j z^r_j \sum_k x^r_k \Big(\sum_{t \neq r} x^t_k\Big)^{-1} \sum_{t \neq r} x^t_k \log y^t_j,$$
where $\overline{\log y}^{-r}_j$ is the average log-prediction of the sample percentages made by the respondents other than $r$ who have declared the same MCQ choice as $r$ (called his “type”).
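The type-matched term $\bar\rho^{-r}$ can be computed directly from the double-sum formula. This is a sketch under our own conventions: answers are labeled from 0, `predictions[t]` is respondent $t$'s reported distribution, and the function name and data are hypothetical. It returns `None` when no other respondent shares $r$'s declaration, since the average is then undefined.

```python
import math

def type_matched_score(choices, predictions, r):
    """bar-rho^{-r} = sum_j z^r_j * (mean of log y^t_j over t != r with x^t = x^r).

    choices[t]     -- declared MCQ answer of respondent t (answers labeled 0..m-1)
    predictions[t] -- respondent t's predicted fractions, each entry in (0, 1)
    Returns None when no other respondent declared r's answer (score undefined).
    """
    m = len(predictions[r])
    others = [t for t in range(len(choices)) if t != r]
    z = [sum(choices[t] == j for t in others) / len(others) for j in range(m)]
    matched = [t for t in others if choices[t] == choices[r]]
    if not matched:
        return None
    mean_log = [sum(math.log(predictions[t][j]) for t in matched) / len(matched)
                for j in range(m)]
    return sum(zj * lj for zj, lj in zip(z, mean_log))

choices = [0, 1, 0, 1]
predictions = [[0.6, 0.4], [0.3, 0.7], [0.4, 0.6], [0.2, 0.8]]  # hypothetical
score = type_matched_score(choices, predictions, r=0)
```

Here respondent 0 is matched only to respondent 2, so his credit is respondent 2's log-prediction weighted by $z^0 = (1/3, 2/3)$.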
To cover the possibility that no one else has declared the same MCQ choice as respondent $r$, we adapt a clever device that Baillon (2016) used in his Bayesian market mechanism. We define a trigger variable
$$K^r = \prod_i \Big(1 - \prod_{t \neq r} (1 - x^t_i)\Big),$$
which has value zero if there is at least one “vacant” answer with zero declarations once respondent $r$ is excluded, and value one otherwise. In particular, $K^r = 0$ when $r$ is a “singleton,” i.e., the only respondent declaring his choice. We say that type matching is in effect for respondent $r$ if $K^r = 1$. Importantly, $K^r$ does not involve the responses $(x^r, y^r)$ of respondent $r$, but only the MCQ choices of the other respondents. Therefore, a respondent cannot influence whether type matching is in effect for him or her.
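A direct transcription of the product formula for $K^r$, as a sketch with answers labeled from 0 and a hypothetical declaration profile. The usage shows that a singleton gets $K^r = 0$ and that changing respondent $r$'s own declaration leaves $K^r$ unchanged.

```python
def trigger(choices, r, num_choices):
    """K^r = prod_i (1 - prod_{t != r} (1 - x^t_i)), where x^t_i = 1 iff t declared i."""
    k = 1
    for i in range(num_choices):
        nobody_else_chose_i = 1
        for t, c in enumerate(choices):
            if t != r:
                x_ti = 1 if c == i else 0
                nobody_else_chose_i *= 1 - x_ti
        k *= 1 - nobody_else_chose_i
    return k

choices = [0, 1, 2, 0, 1]   # hypothetical declarations, answers labeled 0, 1, 2
k_first = trigger(choices, r=0, num_choices=3)       # all answers covered by others
k_singleton = trigger(choices, r=2, num_choices=3)   # answer 2 vacant without r = 2
k_changed = trigger([2, 1, 2, 0, 1], r=0, num_choices=3)  # r = 0 changes his choice
```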
The complete type-matching entropy scoring formula is then a weighted average of the logarithmic proper score that a respondent receives for their own prediction and the average proper score of the predictions of all the respondents other than $r$ who have declared the same type:
$$K^r\big(\lambda \rho^r + (1-\lambda)\bar\rho^{-r}\big) = K^r\Big(\lambda \sum_i z^r_i \log y^r_i + (1-\lambda) \sum_i z^r_i \, \overline{\log y}^{-r}_i\Big).$$
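Putting the pieces together, the following is a minimal, self-contained sketch of the complete type-matching entropy score for one respondent. The function name, the 0-based answer labels, and the data are our assumptions; the trigger is evaluated as "some answer is vacant once $r$ is excluded", which is equivalent to $K^r = 0$.

```python
import math

def type_matching_entropy_score(choices, predictions, r, lam):
    """K^r * (lam * rho^r + (1 - lam) * bar-rho^{-r}) for respondent r.

    choices[t] is t's declared answer (0..m-1); predictions[t] is t's predicted
    distribution of answers among the other respondents, entries in (0, 1).
    """
    m = len(predictions[r])
    others = [t for t in range(len(choices)) if t != r]
    # Trigger K^r: the score is zero if some answer is vacant once r is excluded.
    if any(all(choices[t] != i for t in others) for i in range(m)):
        return 0.0
    z = [sum(choices[t] == j for t in others) / len(others) for j in range(m)]
    own = sum(z[j] * math.log(predictions[r][j]) for j in range(m))   # rho^r
    # K^r = 1 guarantees that at least one other respondent declared r's answer.
    matched = [t for t in others if choices[t] == choices[r]]
    mean_log = [sum(math.log(predictions[t][j]) for t in matched) / len(matched)
                for j in range(m)]
    peer = sum(z[j] * mean_log[j] for j in range(m))                  # bar-rho^{-r}
    return lam * own + (1 - lam) * peer

choices = [0, 0, 1, 1, 1]
predictions = [[0.3, 0.7], [0.35, 0.65], [0.2, 0.8], [0.25, 0.75], [0.2, 0.8]]  # hypothetical
s0 = type_matching_entropy_score(choices, predictions, r=0, lam=0.5)
```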
Observe that the only impact of the type declaration on the score is through defining the type-matching subset of respondents. If a respondent believes that, conditional on type matching being in effect, the predictions of the respondents who are type-matched to him will be more accurate (on average) than the predictions of respondents who are not type-matched, then his best strategy is to declare his type honestly. In particular, this will hold if respondents believe that types determine predictions exactly, and that different types support different predictions (stochastic relevance). However, it does not require that posterior beliefs about other types be generated from a common prior.
The minimum sample size for this approach is at least two more respondents than the number of possible answers to the MCQ: $n > M_A + 1$. It is useful to consider why the approach does not work if $n = M_A + 1$. In that case, the only way to avoid a vacant answer after excluding a respondent is for the other $M_A$ respondents each to choose a different answer. Since this is common knowledge among types, all types will predict the same (uniform) distribution over answers. Therefore, the stochastic relevance requirement, that different types generate different predictions conditional on type matching being in effect, does not hold. Exactly this point was made by Baillon (2016).
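The $n = M_A + 1$ argument can be checked by brute force. This sketch (function name hypothetical) enumerates every profile of the other respondents' declarations, keeps those with no vacant answer, and collects the resulting vectors $z^r$. With $M_A$ others only the uniform vector survives, so there is nothing left for a type to predict; with one more respondent, several vectors remain possible.

```python
from fractions import Fraction
from itertools import product

def triggered_fraction_vectors(num_choices, num_others):
    """All z^r vectors that can occur when type matching is in effect (K^r = 1)."""
    vectors = set()
    for profile in product(range(num_choices), repeat=num_others):
        if set(profile) == set(range(num_choices)):   # no vacant answer
            vectors.add(tuple(Fraction(profile.count(j), num_others)
                              for j in range(num_choices)))
    return vectors

m = 3
at_boundary = triggered_fraction_vectors(m, num_others=m)       # n = M_A + 1
one_more = triggered_fraction_vectors(m, num_others=m + 1)      # n = M_A + 2
```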
In the rest of the paper, we present a general model of which the above is an example, and elaborate on our assumptions and results.

3 The Model and the Results
3.1 Choice-matching scoring rules and strict separation
3.1.1 The setup
A survey planner is interested in receiving honest responses to a multiple choice question whose choices comprise a set $A$. She will do so by assigning to each respondent a score based on his responses to two questions.
First, the planner asks each respondent $r$ to pick a response in the set $A = \{1, \dots, M_A\}$, an action we call choice declaration. Given a respondent $r$, the random variable $X^r_i$ takes the value zero for every choice $i \in \{1, \dots, M_A\}$ not declared by respondent $r$, and $X^r_k = 1$ for the choice $k$ declared by respondent $r$.
Second, each respondent $r$ is also asked to provide (numerically represented) responses to a list of $M_B \geq M_A$ auxiliary questions; the responses to those questions, as random variables, are denoted $Y^r_j$, $j = 1, \dots, M_B$. The values taken by $X^r_i$ and $Y^r_j$ are denoted $x^r_i$ and $y^r_j$, and we denote by $x^r, y^r, X^r, Y^r$ the vectors consisting of the respective values indexed by $i$ and $j$.
The vector of honest $(X^r, Y^r)$ responses of respondent $r$ is called his type, and is denoted $T^r = (T^r_x, T^r_y)$. The values taken by $T^r, T^r_x, T^r_y$ are denoted $t^r, t^r_x, t^r_y$.
A pure strategy for player $r$ is a map $\sigma(t^r) = (x_\sigma(t^r), y_\sigma(t^r))$ from the player's type $t^r$ to his response $(x^r, y^r)$. The profile of all respondents' pure strategies is denoted $\sigma(t)$, with entries $\sigma^r(t^r)$, and the profile excluding player $r$ is denoted $\sigma^{-r}(t^{-r})$. We consider only pure strategies. A scoring rule is a function $\rho(\sigma^r(t^r), \sigma^{-r}(t^{-r}))$ that maps the responses of respondent $r$, given all other respondents' responses, to the real numbers.
We suppose that each respondent maximizes the expected value of his score, conditional on his type.
Definition 1. (i) Given a scoring rule $\rho(\sigma^r(t^r), \sigma^{-r}(t^{-r}))$, we call a set of response strategies a (Bayesian) Nash Equilibrium (NE) if, for any potential response action $(x, y) \neq (x_\sigma(t^r), y_\sigma(t^r))$, we have
$$E\big[\rho\big(x, y; \sigma^{-r}(t^{-r})\big) \mid T^r = t^r\big] \leq E\big[\rho\big(x_\sigma(t^r), y_\sigma(t^r); \sigma^{-r}(t^{-r})\big) \mid T^r = t^r\big].$$
That is, by deviating to responses $(x, y)$, player $r$ would not be better off than by not deviating: his expected score, conditional on his type, would be no higher. If the inequality is always strict, we call the NE a strict NE (SNE). If the responses in a NE are honest in $x$, we say that the NE is honest in $x$. If the inequality is strict whenever $x \neq x_\sigma(t^r)$, we say that the NE is strict in $x$. If there is a NE that is honest and strict in $x$, we say that $\rho$ is strictly incentive compatible for $x$. If there are NEs that are honest in $x$, but no such NE is strict in $x$, we say that the rule is weakly incentive compatible in $x$. The corresponding definitions for $y$ are analogous.
(ii) If there is a NE strict in $x$ ($y$) in which the respondents with different honest $x$ ($y$) choices provide distinct $x$ ($y$) responses, we say that the rule $\rho$ allows a Strictly Separating Nash Equilibrium (SSNE) in $x$ ($y$) responses.
3.1.2 Strict separation via choice-matching scoring rules
In the case where $y^r$ is a probability vector with $y^r_j \in (0, 1)$, our benchmark example of a scoring rule assigns to respondent $r$ the score
$$\rho(x^r, y^r, x^{-r}, y^{-r}) = \rho_{\log}(y^r, y^{-r}) = \sum_{j=1}^{M_B} z^r_j \log(y^r_j),$$
where $z^r_j$ is the percentage of all the respondents other than $r$ who choose $j$ as their x choice. We argue below that, under some assumptions, this rule allows an SSNE in $y$. Note, however, that the benchmark rule does not depend on the choice declaration $x^r$, and thus cannot lead to a NE that is strictly separating in $x$. In what follows, we look for ways of modifying a rule that allows an SSNE in $y$ into a rule in which that NE is also strictly separating in $x$. For this, we match the score of respondent $r$ to the scores of the other respondents declaring the same x choice. For example, we could consider the average $\bar\rho^{-r}(i)$ of the scores of the other respondents declaring the same choice $i$.⁵
More generally, we introduce
Definition 2. A functional $\bar\rho^{-r}(i)$ of the scores $\rho^{-r}$ of the players (if any) other than $r$ who declare choice $i$ is called a choice-$i$ induced $\rho$ score if, whenever all such players receive the same score $\rho(i)$, we have $\bar\rho^{-r}(i) = \rho(i)$. If no respondent other than $r$ declares choice $i$ (for instance, if $r$ is the only respondent who declares it), $\bar\rho^{-r}(i)$ is not defined.
Henceforth, we suppress in the notation the dependence on the other players' responses $x^{-r}, y^{-r}$.
Definition 3. Denote by $E^r$ the event that, for each possible value $j \in \{1, \dots, M_A\}$ of the x choice, at least one player other than $r$ selects $j$.
We now introduce the main definition of this paper, that of our scoring rules.
Definition 4. Given $\lambda \in (0, 1)$, the function $R_{\rho,\lambda}(x^r, y^r)$ is called the choice-matching scoring rule corresponding to $\rho$ and $\bar\rho^{-r}$ if
(a) in the event $E^r$, player $r$ receives
$$R_{\rho,\lambda}(x^r, y^r) = \lambda \rho(x^r, y^r) + (1 - \lambda) \sum_{i=1}^{M_A} x^r_i \, \bar\rho^{-r}(i);$$
(b) on the complement of $E^r$, player $r$ receives zero.
In other words, if every x choice is represented among the other respondents (so that, in particular, respondent $r$ is not the only one choosing his x response), the choice-matching scoring rule assigns him a weighted average of the score $\rho$ evaluated at his responses and the score $\bar\rho^{-r}$ corresponding to the respondents who declare the same x choice as $r$. Otherwise, he receives zero.
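Definition 4 can be sketched for an arbitrary $\rho$ and an arbitrary choice-induced functional. In this sketch the $\rho$ scores are assumed to be precomputed for the given response profile, answers are labeled from 0, and `induced` defaults to the average; the median is shown only as a hypothetical alternative that also satisfies Definition 2 (equal inputs give that same value). Because $x^r$ is one-hot, $\sum_i x^r_i \bar\rho^{-r}(i)$ is simply the induced score of $r$'s own declaration.

```python
from statistics import mean, median

def choice_matching_scores(choices, rho_scores, num_choices, lam, induced=mean):
    """R_{rho,lambda} for every respondent (Definition 4).

    choices[t]    -- declared x choice of respondent t (0-indexed)
    rho_scores[t] -- that respondent's rho score, already computed
    induced       -- choice-i induced score: any functional returning c when all
                     inputs equal c (e.g. mean or median)
    """
    n = len(choices)
    result = []
    for r in range(n):
        others = [t for t in range(n) if t != r]
        # Event E^r: every x choice is declared by at least one player other than r.
        if any(all(choices[t] != i for t in others) for i in range(num_choices)):
            result.append(0.0)
            continue
        # sum_i x^r_i * bar-rho^{-r}(i) reduces to the induced score of r's choice.
        peers = [rho_scores[t] for t in others if choices[t] == choices[r]]
        result.append(lam * rho_scores[r] + (1 - lam) * induced(peers))
    return result

choices = [0, 0, 1, 1, 1]
rho_scores = [-1.0, -2.0, -0.5, -1.5, -1.0]     # hypothetical rho scores
scores = choice_matching_scores(choices, rho_scores, num_choices=2, lam=0.5)
scores_median = choice_matching_scores(choices, rho_scores, 2, 0.5, induced=median)
```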
Zero scores are assigned when a respondent is a singleton relative to his x choice, or when some x choice is not chosen by anyone else, for the following reason. Suppose respondent $r$ happens to be the only one with honest response $x^r_k = 1$, and suppose all other respondents are honest. Then either he is honest in his x choice and cannot be matched to respondents with the same x choice (there are none), or he makes a dishonest x choice, in which case no one chooses $k$. We want the respondent not to be able to influence his score by influencing whether or not he is matched, and one way to accomplish that is to assign the same (zero) score in either case.
5 There are many possibilities other than the average, such as the geometric average, or the average over a subset.
For this mechanism to work in theory, we need to impose the following assumption. First, introduce $E^r[\cdot] = E[\cdot \mid E^r, T^r]$, which is player $r$'s expectation operator conditional on his type and on every x choice being declared by at least one other respondent.
Assumption 1.
- (i) The game $\rho(x, y)$ has an NE, defined relative to the expectations of each player $r$ computed using the operator $E^r[\cdot]$, in which the choice declarations $x$ are honest, and which is strictly separating in $y$ (but not in $x$); the NE strategies are denoted $(x^{r,*}, y^{r,*})$.
- (ii) The probability that respondent $r$ assigns to the event $E^{r,i}$, conditional on all the respondents other than $r$ playing the strategies from the NE in (i), does not depend on $i$. We denote that (conditional) probability by $\Pr(E^{r,i})$.
- (iii) If all respondents other than $r$ play the NE from (i), then
$$E^r\big[\bar\rho^{-r}(k) \mid T^r = t^r, t^r_x = k\big] > E^r\big[\bar\rho^{-r}(i) \mid T^r = t^r, t^r_x = k\big]. \qquad (3.1)$$
That is, respondent $r$ with honest choice $k$ would strictly prefer receiving $\bar\rho^{-r}(k)$ to receiving $\bar\rho^{-r}(i)$, $i \neq k$.
Remark 1.
- (a) Assumption 1 (ii) is a symmetry assumption on the distribution of honest x choices in the population of respondents (from the point of view of respondent $r$).
- (b) In (iii) we assume that the highest expected value of $\bar\rho^{-r}(i)$ is attained when the choice $i$ declared by respondent $r$ is his honest one. In reality, Assumption 1 (iii) would not be satisfied exactly, but if it is approximately satisfied, we would expect the mechanisms we propose below to still work well.⁶ We discuss below sufficient conditions for Assumption 1 (iii) to be satisfied.
The main result of the paper is the following.
Proposition 1. Suppose Assumption 1 is satisfied. Then the game in which the scores are defined as in Definition 4 has an NE in which the choice declarations $x$ are honest and the equilibrium strategies (and hence the $\rho$ scores) are equal to those in the $\rho(x, y)$ equilibrium from Assumption 1 (i). Moreover, the equilibrium is strictly separating in $x$ and in $y$.
Remark 2. The assumption that $\rho$ allows a NE that is strictly separating in $y$, but not in $x$, is satisfied, in particular, if the payoff $\rho(x^r, y^r)$ paid to respondent $r$ does not depend on his choice $x^r$, as in our benchmark example. The result then says that adding the second, choice-matching term in $R_{\rho,\lambda}(x, y)$ makes the equilibrium strictly separating in $x$.
Proof: Consider the NE from Assumption 1, and fix a respondent $r$ with honest choice $k$. Denote by $\rho^{r,k}(x^*, y^*)$ the $\rho$ score of respondent $r$ with honest choice $k$, where $x^*, y^*$ denote his NE responses (thus, $x^*$ is honest). Suppose that all the players other than $r$ play the NE strategies. If player $r$ also plays the NE strategy, his expected score is
$$\Pr(E^{r,k}) \times \Big[\lambda E^r\big[\rho^{r,k}(x^*, y^*)\big] + (1-\lambda) E^r\big[\bar\rho^{-r}(k)\big]\Big].$$
If he deviates, declaring choice $i$ via $x(i)$ and providing a response $y^r$, his expected score is
$$\Pr(E^{r,i}) \times \Big[\lambda E^r\big[\rho^r(x(i), y^r)\big] + (1-\lambda) E^r\big[\bar\rho^{-r}(i)\big]\Big].$$
By Assumption 1 (iii), we have $E^r[\bar\rho^{-r}(i)] < E^r[\bar\rho^{-r}(k)]$. Moreover, $E^r[\rho^r(x(i), y^r)] \leq E^r[\rho^{r,k}(x^*, y^*)]$, because the right-hand side is the respondent's expected value in the NE for the game $\rho(x, y)$. In fact, if $y^r \neq y^*$, the latter inequality is strict, because the NE is assumed to be strict in $y$. Finally, by Assumption 1 (ii), $\Pr(E^{r,i})$ does not depend on $i$. Combining all of the above, we see that the NE is strictly separating in $x$ and $y$.

6 However, this has to be tested.
Remark 3. I. Note that one could create other incentive compatible scores for $x$ and $y$ by applying to the two terms in the definition of $R_{\rho,\lambda}(x, y)$ an appropriately monotone function other than their weighted sum.
II. Instead of using weights $\lambda$ and $1 - \lambda$ in the definition of $R_{\rho,\lambda}$, one could assign the score $\rho(x^r, y^r)$ with probability $\lambda$ and the score $\bar\rho^{-r}(i)$ with probability $1 - \lambda$. Under appropriately modified assumptions, this mechanism would also work if each respondent $r$ were risk averse, receiving utility $U^r(\rho)$ when the score is $\rho$, for some utility function $U^r$, provided that respondents with the same honest x choice have the same utility function.
3.1.3 Sufficient conditions for Assumption 1 (iii)
The inequality in Assumption 1 (iii) is crucial. Assuming that there exists an NE as in part (i), we now discuss sufficient conditions for the inequality to be satisfied.
Sufficient Condition I. In the NE from Assumption 1 (i), two respondents have different y choices if and only if they have different x choices.
Under this condition the inequality is satisfied: in that case $\bar\rho^{-r}(j) = \rho(j)$, and respondent $r$'s expected value of $\rho(j)$ is strictly highest for $j = i$, where $i$ is his honest x choice, because for $j \neq i$ the (unique) choice $y = y(j)$ corresponding to $\bar\rho^{-r}(j)$ differs from $y(i)$, and the NE of the $\rho(x, y)$ game is strict in $y$ and honest in $x$.
In our example, the sufficient condition would be, in approach (b), that, in the NE, exactly those respondents who do not think a Keynesian approach is appropriate also respond that inflation will be higher than $I$. In approach (a), it would be that, in the NE, those who make the same x choice also provide the same estimates of the percentages of the respondents making the various x choices, and those who make different x choices provide different estimates.
A sufficient condition for Condition I, the one-to-one correspondence between y responses and honest x responses, is the following.
Sufficient Condition II. The y responses are responses $y_j$ to the question “What are the probabilities of the events $C_j$, conditional on your effective matching event $E^r$?”, where the events $C_j$ form a finite partition of the whole probability space. Moreover, the respondents with the same honest choice $T_x$ have a common prior belief about the distribution of the pairs $(T_x, 1_{C_j})$, they compute the probabilities in Bayesian fashion, and the respondents with different honest choices $T_x$ provide different vectors of y responses (where everything is conditional on effective matching).
Indeed, under this condition, respondent $r$'s honest estimate of $y_j$, when his honest choice is $i$ and with $\Pr(C)$ denoting his probability of an event $C$ conditional on $E^r$, is
$$\Pr(C_j \mid T^r_x = i) = \frac{\Pr(T^r_x = i, C_j)}{\sum_k \Pr(T^r_x = i, C_k)},$$
and this is the same for all respondents with $T^r_x = i$. Thus, Condition I is satisfied.
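The Bayesian computation behind Condition II is just a row normalization of the joint prior. The sketch below assumes a hypothetical joint table $\Pr(T_x = i, C_j)$ (already conditioned on effective matching), with rows indexed by honest choice from 0 and columns by event; exact fractions keep the arithmetic transparent.

```python
from fractions import Fraction

def honest_y_responses(joint, i):
    """Pr(C_j | T_x = i) = Pr(T_x = i, C_j) / sum_k Pr(T_x = i, C_k)."""
    row = joint[i]
    total = sum(row)
    return [p / total for p in row]

# Hypothetical joint prior Pr(T_x = i, C_j): rows are honest choices i,
# columns are the events C_j of the partition.
joint = [
    [Fraction(3, 10), Fraction(1, 10)],
    [Fraction(2, 10), Fraction(4, 10)],
]
y_type0 = honest_y_responses(joint, 0)
y_type1 = honest_y_responses(joint, 1)
```

Every respondent with the same honest choice computes the same vector, and here the two types' vectors differ, which is exactly what Condition I needs.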
While it may not be satisfied exactly in reality, the common prior assumption is standard in the theory of mechanism design, and it may be a good approximation. As for the condition that respondents with different honest choices $T_x$ provide different vectors of y responses, it can be shown that it holds for almost all values of the model parameters, and that it always holds in the case of only two types ($M_A = 2$). More precisely, in the latter case it can be shown that, for example, type 1's estimate of the percentage of type 1 respondents in the sample is strictly higher than type 2's estimate of that percentage.
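A minimal numerical illustration of the two-type claim, under assumptions that are ours rather than the paper's: a hypothetical two-point common prior on the population share $\omega$ of type 1, each respondent being type 1 with probability $\omega$ independently given $\omega$, and no conditioning on effective matching. Under these assumptions a respondent's expected fraction of type 1 among the others equals $E[\omega \mid \text{own type}]$.

```python
from fractions import Fraction

def expected_share_of_type1(prior, own_type):
    """E[omega | own type] under a prior {omega: probability} on the type-1 share.

    A respondent is type 1 with probability omega and type 2 with probability
    1 - omega, so observing one's own type is a Bayesian signal about omega.
    """
    likelihood = (lambda w: w) if own_type == 1 else (lambda w: 1 - w)
    numerator = sum(p * likelihood(w) * w for w, p in prior.items())
    denominator = sum(p * likelihood(w) for w, p in prior.items())
    return numerator / denominator

prior = {Fraction(1, 4): Fraction(1, 2), Fraction(3, 4): Fraction(1, 2)}  # hypothetical
estimate_by_type1 = expected_share_of_type1(prior, own_type=1)
estimate_by_type2 = expected_share_of_type1(prior, own_type=2)
```

Type 1 estimates a share of 5/8 and type 2 estimates 3/8. In this model the gap is positive whenever the prior on $\omega$ is non-degenerate.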
3.1.4 Budget-balanced scoring rules
Suppose now that we want the scores to add up to zero with probability one; if that is the case, we say that the scoring rule is budget-balanced.⁷
With infinitely many respondents, we can modify any scoring rule $R$ to get a budget-balanced scoring rule $R'$, as follows:
$$R'(x^r, y^r) = R(x^r, y^r) - \lim_{N \to \infty} \frac{1}{N} \sum_{s=1}^{N} R(x^s, y^s),$$
if the limit, denoted $\bar R$, exists. Then the score will be budget-balanced if $\sum_{r=1}^{N} R(x^r, y^r) - N \bar R$ converges to zero as $N \to \infty$. In this case, $R'$ is strictly incentive compatible in $x$ and strictly separating in $y$ whenever $R$ is, because the subtracted term $\bar R$, being the average of the first term over infinitely many respondents, does not depend on the responses of any individual respondent.
In practice, if the number of respondents is large, we might still want to use the above modification. Alternatively, we could do the following. Given a choice-matching scoring rule $R$, if there are $n > M_B$ respondents, we take a subset of $n - 1$ of them and apply the $R$ game in which the remaining $n$th respondent does not participate. We charge the $n$th player the negative of the sum of the scores in the game with $n - 1$ respondents, so that the aggregate score is zero. We do this simultaneously for all subsets of size $n - 1$, i.e., there are $n$ games in total, and each respondent gets the sum of his scores across the $n$ games. This preserves the incentive compatibility of an NE, while also making the rule budget-balanced.
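A sketch of the leave-one-out construction. `base_rule` is a hypothetical stand-in for any rule $R$ that maps the responses of the participants in one game to their scores; the toy rule below (fraction of the other participants giving the same answer) is ours and only serves to make the example runnable.

```python
def budget_balanced_scores(responses, base_rule):
    """Run n games of n - 1 players; each excluded player pays that game's total."""
    n = len(responses)
    totals = [0.0] * n
    for excluded in range(n):
        players = [t for t in range(n) if t != excluded]
        scores = base_rule([responses[t] for t in players])
        for t, s in zip(players, scores):
            totals[t] += s
        totals[excluded] -= sum(scores)   # does not depend on the excluded player's response
    return totals

def toy_rule(resps):
    # Hypothetical stand-in for R: fraction of the other players giving the same answer.
    n = len(resps)
    return [(resps.count(x) - 1) / (n - 1) for x in resps]

totals = budget_balanced_scores(["yes", "no", "yes", "yes"], toy_rule)
```

The charge a respondent pays comes from the one game he does not play in, so it cannot depend on his own responses; that is why incentive compatibility of the underlying rule carries over.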
3.2 Incentive compatible scoring rules
In this section we introduce scoring rules $\rho(x^r, y^r, x^{-r}, y^{-r})$ that allow NEs that are honest in $x$ and strictly separating in $y$.
Consider an integer $M_B > 1$ and, for each respondent $r$, a partition $\{C^r_1, \dots, C^r_{M_B}\}$ of the probability space such that, before the scores are assigned, it will be revealed which $C^r_j$ has occurred. Let $Y^r = (Y^r_1, \dots, Y^r_{M_B})$ be a random vector consisting of the responses of respondent $r$ to the question: “What is the probability that $C^r_j$ occurs, $j = 1, \dots, M_B$, conditional on the effective matching event $E^r$?”. In our example above, this is the second question, on the probability of a respondent responding “Yes”, or on the probability of the inflation rate exceeding the value $I$, modified so as to take into account whether matching is in effect.
Thus, when responding truthfully, respondent $r$ provides the responses
$$Y^r_j = \Pr{}^r(C^r_j \mid T^r) := E^r[1_{C^r_j}].$$
We now recall the definition of a strictly proper scoring rule.
7 We may want budget balance in order to reduce the likelihood of collusion among the respondents, for example, to keep all of them from declaring the same choices.
Definition 5. Given an integer $M_B > 1$, we say that the functions $f(p; j)$ form a strictly proper scoring rule (SPSR) if, for any probability vectors $p = (p_1, \dots, p_{M_B})$ and $q = (q_1, \dots, q_{M_B})$ taking values in $(0, 1)^{M_B}$ with $q \neq p$, we have
$$\sum_{j=1}^{M_B} p_j f(p; j) > \sum_{j=1}^{M_B} p_j f(q; j).$$
The benchmark example of an SPSR is $f(p; j) = \log(p_j)$, in which case the above inequality is known as the Gibbs inequality.⁸
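A quick numerical check of Definition 5, as a sketch rather than a proof: for random probability vectors $p \neq q$ with entries in $(0, 1)$, reporting $p$ should beat reporting $q$ under beliefs $p$. Besides the logarithmic rule we include the quadratic (Brier-type) rule $f(p; j) = 2p_j - \sum_k p_k^2$, another standard strictly proper rule, chosen here by us for comparison; for it the gap equals $\sum_j (p_j - q_j)^2$.

```python
import math
import random

def log_rule(p, j):
    return math.log(p[j])

def quadratic_rule(p, j):
    # Brier-type quadratic score, another strictly proper scoring rule.
    return 2 * p[j] - sum(pk * pk for pk in p)

def expected_score(rule, p, q):
    """sum_j p_j f(q; j): expected score of report q when beliefs are p."""
    return sum(pj * rule(q, j) for j, pj in enumerate(p))

def random_prob_vector(m, rng):
    weights = [rng.random() + 1e-3 for _ in range(m)]   # keep entries inside (0, 1)
    s = sum(weights)
    return [w / s for w in weights]

rng = random.Random(0)
for _ in range(1000):
    p, q = random_prob_vector(4, rng), random_prob_vector(4, rng)
    for rule in (log_rule, quadratic_rule):
        assert expected_score(rule, p, p) > expected_score(rule, p, q)
```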
Proposition 2. Let $f(y; j)$ form an SPSR, and let $Z^r_j$ be any random variables such that, for the honest response $y^r_j$ of respondent $r$ and when all other respondents are honest, we have⁹
$$E^r[Z^r_j] = Y^r_j. \qquad (3.2)$$
Then the scoring rule
$$\rho_f(y^r) := \sum_{j=1}^{M_B} z^r_j f(y^r; j)$$
is strictly incentive compatible for $y$, where $z^r_j$ is the value $Z^r_j$ takes.
Proof: Assume everyone else is honest. Suppose first that $r$ is honest. Then, by (3.2), the expected score of $r$ when providing honest responses $y^r_j$ may be written as
$$E^r\Big[\sum_{j=1}^{M_B} Z^r_j f(y^r; j) \,\Big|\, T^r = t^r\Big] = \sum_{j=1}^{M_B} y^r_j f(y^r; j).$$
On the other hand, if respondent $r$ declares a vector $\tilde y^r$ different from the vector $y^r$ of his honest y responses, his expected score is
$$\sum_{j=1}^{M_B} y^r_j f(\tilde y^r; j).$$
Since the $f(y; j)$ form an SPSR, this is strictly less than the expected score he attains when honest.
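The proof can be mirrored numerically with the choice $Z^r_j = 1_{C^r_j}$ from footnote 9. In this sketch the honest beliefs and the candidate deviations are hypothetical, $f$ is the logarithmic rule, and the expectation is computed exactly by enumerating which event $C_j$ occurs, so that $E[Z_j]$ equals the believed probability of $C_j$ as (3.2) requires.

```python
import math

def rho_f(report, z, f):
    """rho_f(y) = sum_j z_j f(y; j) for a realized vector z."""
    return sum(zj * f(report, j) for j, zj in enumerate(z))

def log_f(p, j):
    return math.log(p[j])

def expected_rho(beliefs, report, f):
    """E[rho_f] when Z_j = 1_{C_j} and C_j occurs with probability beliefs[j]."""
    m = len(beliefs)
    total = 0.0
    for outcome, prob in enumerate(beliefs):
        z = [1.0 if j == outcome else 0.0 for j in range(m)]   # indicator vector
        total += prob * rho_f(report, z, f)
    return total

honest = [0.2, 0.5, 0.3]                                   # hypothetical honest y
deviations = [[0.3, 0.4, 0.3], [0.1, 0.6, 0.3], [1 / 3, 1 / 3, 1 / 3]]
honest_value = expected_rho(honest, honest, log_f)
deviation_values = [expected_rho(honest, d, log_f) for d in deviations]
```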
We now modify $\rho_f$ so that it also becomes strictly incentive compatible for $x$, which is what the planner is really interested in. The following is a corollary of Propositions 1 and 2.
Corollary 1. Let $f(y; j)$ form an SPSR, and let $Z^r_j$ be as in Proposition 2 and such that the outcome of $Z^r_j$ is not affected by the $x$ and $y$ responses of respondent $r$. Under Assumption 1, for every $\lambda \in (0, 1)$, the choice-matching scoring rule that pays, when not zero, the amount
$$R_f(x^r, y^r) = \lambda \rho_f(y^r) + (1 - \lambda) \sum_{i=1}^{M_A} x^r_i \, \bar\rho^{-r}_f(i)$$
is strictly incentive compatible for $x$ and $y$.

8 There are many other proper scoring rules. A general characterization with many examples is provided in Gneiting and Raftery (2007).
9 For example, $Z^r_j = 1_{C^r_j}$.
Remark 4. I. In our example, we can set $Z^r_1 = 1_{C^r_1}$ (and $Z^r_2 = 1 - Z^r_1$), where $C^r_1$ is the event that a randomly chosen respondent other than $r$ thinks positively of the Keynesian approach. Then, in approach (a) we need to assume that respondent $r$ estimates the percentage $Y^r_1$ of the respondents other than $r$ who will state that they think positively of the Keynesian approach, conditional on $E^r$, by the probability $E^r[1_{C^r_1}]$ of the event $C^r_1$. In approach (b) we need to assume that respondent $r$ estimates the percentage $Y^r_1$ of the respondents other than $r$ who will state that the inflation rate is higher than $I$, conditional on $E^r$, by the same probability $E^r[1_{C^r_1}]$.
II. Generalizing approach (a) from the example, we can define $Z^r_j = 1_{C^r_j}$, where $C^r_j$ is the event that a randomly chosen respondent picks the choice labeled $j \in \{1, \dots, M_B\}$ in a set $B$ of $M_B$ elements (that exhaust all possible outcomes), where some natural choices for $B$ might be
- (i) $B = A$.
- (ii) $B$ is the set of all possible $K$-tuples of $A$, $K < M_A$.
- (iii) $B$ is a (non-trivial) partition of $A$.
For example, if $A$ corresponds to the choice of one-to-five star ratings, then case (i) corresponds to asking what percentage of respondents picked one star, two stars, ..., five stars. Case (ii), with $K = 2$, corresponds to asking about the percentage of respondent pairs both choosing one star, one choosing one star and the other two stars, and so on. An example of case (iii) would be asking for three percentages, for instance: of those giving one or two stars, those giving three stars, and those giving four or five stars.
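The three choices of $B$ in the star-rating example can be made concrete by computing the realized percentages that the auxiliary predictions would be scored against. This sketch uses hypothetical ratings; the partition cells follow the text's case (iii) example, and for case (ii) we assume, as one reading of the text, that a pair means an unordered pair of distinct respondents classified by their sorted pair of ratings.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

ratings = [5, 4, 4, 3, 1, 5, 2, 4]   # hypothetical star ratings, A = {1, ..., 5}
n = len(ratings)

# Case (i): B = A, the fraction of respondents giving each number of stars.
case_i = {a: Fraction(c, n) for a, c in Counter(ratings).items()}

# Case (iii): B is a partition of A (cells as in the example in the text).
partition = {"1-2 stars": {1, 2}, "3 stars": {3}, "4-5 stars": {4, 5}}
case_iii = {name: sum(Fraction(1, n) for x in ratings if x in cell)
            for name, cell in partition.items()}

# Case (ii), K = 2: fraction of unordered respondent pairs by sorted pair of ratings.
pair_counts = Counter(tuple(sorted(p)) for p in combinations(ratings, 2))
num_pairs = n * (n - 1) // 2
case_ii = {pair: Fraction(c, num_pairs) for pair, c in pair_counts.items()}
```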
4 Conclusions
Given a truth-inducing scoring rule for responses to an auxiliary multiple choice question, we show how to construct a truth-inducing scoring rule for a correlated main question of interest. This is accomplished by rewarding the respondents with the
auxiliary-question scores of the group they align themselves with on the main question. Having a “high correlation” between the two
questions is a sufficient condition for the rule to work in theory. An interesting possibility for future research would be to test the
mechanisms we propose here, to explore their performance and robustness in experiments, or in practice.
5 Appendix: Comparison with Bayesian Truth Serum
A well-known scoring mechanism that allows an SSNE in x and y, with A = B, is Bayesian Truth Serum (BTS), introduced by Prelec
(2004). Under the assumption of a common prior for infinitely many respondents, the BTS score that a respondent receives in the
honest SSNE is an increasing linear function of the logarithm of his posterior probability of the true state of nature, where the latter
can be taken to be equal to the actual distribution of types.¹⁰
The advantages of the scoring mechanisms in this paper relative to BTS are that (i) they may be simpler to explain to respondents in
implementation; (ii) they do not necessarily require infinitely many respondents; and (iii) they do not necessarily require a common prior.
10 As mentioned above, the common prior assumption implies that the respondents with the same honest x choice will choose, when honest, the same y responses,
so that the types are completely determined by x choices.
A potential disadvantage is that they do not necessarily rank respondents according to any measure of expertise, while BTS
ranks them according to their posterior probabilities of the true state of nature. We now present an example in which the $R_f(x^r, y^r)$ score of a
respondent with a higher posterior probability of the true state of nature is lower than the score of a respondent with a lower such
probability. In the example, we take f(x) = log(x).
Suppose there are infinitely many respondents, and there are three choices in set A that are also the only possible types,
and suppose we are in the setting of Remark 4 II, with B = A. The types of the respondents are assumed to be exchangeable
random variables. Suppose also that there are three possible states of nature. We assume that the types have a common prior
belief represented by the joint distribution P(T = i, Ω = j) of type T and state of nature Ω, and that they compute probabilities in
Bayesian fashion.
The idea is to have an example with a very low probability of type 3 in states 1 and 2, in which type 2 predicts a low probability
for type 3, while type 1 does not. If the actual state is state 1, for example, then type 2 may be more accurate by measures of
distributional prediction error, while still putting less probability on state 1 than type 1.
We now confirm this in a numerical example. Assume that the prior is such that each of the three types is equally likely, and
that the posterior probabilities $P(\Omega = j \mid T^r = i)$ of the state of nature Ω are given by the following matrix, with rows corresponding
to type i and columns to state j, for some small value ε > 0:
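$$\begin{pmatrix} 0.6 & 0.2 & 0.2 \\ 0.4 & 0.6 - \varepsilon & \varepsilon \\ \varepsilon & \varepsilon & 1 - 2\varepsilon \end{pmatrix}$$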
Let us assume that the actual state of nature is j = 1. Thus, type i = 1 has a posterior 0.6 for the true state of nature that is higher
than the corresponding posterior 0.4 of type i = 2. However, for a sufficiently small ε, type 2 has better knowledge of the distribution of beliefs
in the realized state, in the following sense. Let us set ε = 0.001, for example. Using Bayes' rule, one can compute
that type 1's belief about the distribution of types is (0.443, 0.3897, 0.1673), while type 2's belief is
(0.3897, 0.6083, 0.002). Moreover, the actual distribution of types in state 1 is (0.5994, 0.3996, 0.001). In particular, type 2 gives a
low probability to type 3, which is indeed very unlikely in the realized state. This makes the $R_f(x^r, y^r)$ score higher for type 2:
the score of type 1 is −0.866, while that of type 2 is −0.770. Thus a respondent can have a higher posterior
probability for the realized state of nature than another respondent, and hence a higher BTS score, while obtaining a lower $R_f$ score.
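The figures in this example can be reproduced with a short script. It is a sketch under assumptions we infer from the quoted numbers: type i's belief about another respondent's type is $\sum_j P(\Omega = j \mid T = i)\, P(T = k \mid \Omega = j)$; the actual type distribution in state j is $P(T = k \mid \Omega = j)$; and, because types are determined by x choices (footnote 10) and there are infinitely many respondents, the score of type i is taken to be the expected log score $\sum_k P(T = k \mid \Omega = 1) \log \hat{y}^i_k$, which both terms of $R_f$ amount to here, so λ drops out. Names such as POSTERIOR and predicted_type_dist are hypothetical.

```python
import math

EPS = 0.001  # the value of epsilon used in the text

# P(Omega = j | T = i): row i is the type, column j is the state (0-based indices).
POSTERIOR = [
    [0.6, 0.2, 0.2],
    [0.4, 0.6 - EPS, EPS],
    [EPS, EPS, 1 - 2 * EPS],
]
PRIOR_TYPE = [1 / 3, 1 / 3, 1 / 3]  # each type equally likely


def type_dist_given_state(j):
    """P(T = k | Omega = j) for k = 0, 1, 2, by Bayes' rule on the joint P(T, Omega)."""
    joint = [PRIOR_TYPE[k] * POSTERIOR[k][j] for k in range(3)]
    total = sum(joint)
    return [p / total for p in joint]


def predicted_type_dist(i):
    """Type i's belief about the type of another respondent.

    Assumes, consistent with the figures in the text, that given state j
    another respondent's type is distributed as P(T = k | Omega = j).
    """
    by_state = [type_dist_given_state(j) for j in range(3)]
    return [sum(POSTERIOR[i][j] * by_state[j][k] for j in range(3)) for k in range(3)]


def expected_log_score(prediction, actual):
    """Expected log score (f = log) when other respondents' types follow `actual`."""
    return sum(a * math.log(p) for a, p in zip(actual, prediction))


actual = type_dist_given_state(0)  # realized state of nature is j = 1 (index 0)
for i in range(2):  # compare types 1 and 2
    pred = predicted_type_dist(i)
    print(
        f"type {i + 1}: posterior on true state {POSTERIOR[i][0]:.1f}, "
        f"predicted types {[round(p, 4) for p in pred]}, "
        f"score {expected_log_score(pred, actual):.3f}"
    )
```

Running it prints posteriors 0.6 and 0.4 on the true state for types 1 and 2, predicted type distributions that round to the vectors quoted above, and scores that round to −0.866 and −0.770, so the type with the higher posterior on the realized state receives the lower score.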
References
[1] Baillon, A. (2016) A market to read minds. Working paper.
[2] Cvitanić, J., Prelec, D., Radas, S. and Šikić, H. (2015) Mechanism design for an agnostic planner: universal mechanisms,
logarithmic equilibrium payoffs and implementation. Submitted.
[3] Gneiting, T. and Raftery, A.E. (2007) Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American
Statistical Association, 102, 359–378.
[4] Miller, N., Resnick, P. and Zeckhauser, R. (2005) Eliciting Informative Feedback: The Peer-Prediction Method. Management
Science, 51, 1359–1373.
[5] Prelec, D. (2004) A Bayesian Truth Serum for Subjective Data. Science, 306, 462–466.
[6] Prelec, D., Seung, H.S. and McCoy, J. (2013) Finding truth even if the crowd is wrong. Working paper, MIT.
[7] Waggoner, B. and Chen, Y. (2013) Information Elicitation Sans Verification. In Proceedings of the 3rd Workshop on Social
Computing and User Generated Content (SC13).
[8] Witkowski, J. and Parkes, D. (2012) Peer Prediction Without a Common Prior. In Proceedings of the 13th ACM Conference
on Electronic Commerce (EC12).