Honesty via Choice-Matching

Jakša Cvitanić* and Dražen Prelec†

Abstract

We propose scoring rule mechanisms for eliciting honest responses to a multiple choice question (MCQ). The respondent's score consists of two terms: an auxiliary score based on his response to a related auxiliary question, and the corresponding scores of the other respondents giving the same response as him to the MCQ. For this to work, the auxiliary score has to be truth-incentive for the auxiliary question, which can be accomplished by using strictly proper scoring rules, and the auxiliary question has to be sufficiently correlated with the MCQ.

Key words: Proper scoring rules, Bayesian Truth Serum, truth-telling equilibria
JEL codes: C11, D82, D83, M00

* Division of the Humanities and Social Sciences, Caltech. E-mail: [email protected]. Research supported in part by NSF grant DMS 10-08219.
† MIT, Sloan School of Management, Department of Economics, Department of Brain and Cognitive Sciences. E-mail: [email protected]. Supported by the Intelligence Advanced Research Projects Activity (IARPA) via the Department of Interior National Business Center, contract number D11PC20058. Disclaimer: The views and conclusions expressed herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/NBC, or the U.S. Government.

1 Introduction

Scoring rules for eliciting honest responses from many agents to a multiple choice question (MCQ) that have been considered in the literature either require strong assumptions on the planner, such as knowing the prior beliefs of the respondents, or may be difficult to explain to the average respondent (Prelec 2004; Waggoner and Chen 2013; Cvitanić, Prelec, Radas and Šikić 2015), or are tailored for binary (dichotomous) questions (Witkowski and Parkes 2012; Baillon 2016).
In this paper we propose honesty-inducing scoring rules that do not require the planner to know the prior, are simple to explain, and allow any number of multiple choices. The mechanism asks each respondent to answer two questions: the MCQ of main interest, with responses called x responses, and a related auxiliary question, with responses called y responses. Given an honesty-inducing score ρ for y responses, each respondent receives the corresponding ρ score, adjusted by a term that depends on the ρ scores of the other respondents who make the same choice in the MCQ. For example, one can add the average of such scores to the respondent's ρ score. We show that this results in a truth-inducing mechanism, assuming that the respondent's expected value of the adjusting term is highest when he aligns with the group that picks his honest MCQ choice. We discuss sufficient conditions for the mechanism to work, and how to implement it. In particular, truth-telling is an equilibrium in the setting in which the second question is "What are the percentages of the respondents choosing each particular choice in the MCQ?", assuming that the respondents are risk-neutral Bayesian maximizers who have a common prior, and that different honest x responses imply different honest y responses. In the latter setting, and with infinitely many respondents, a well-known scoring mechanism that is implementable in practice is the Bayesian Truth Serum (BTS), introduced by Prelec (2004). The advantages of the scoring mechanisms in this paper relative to BTS are: (i) they are simpler to explain to respondents in implementation; (ii) they do not necessarily require infinitely many respondents; (iii) they do not necessarily require a common prior. A potential disadvantage is that they do not necessarily rank respondents according to any measure of expertise, while BTS ranks them according to their posterior probabilities of the true state of nature.
In the next section we present an example to illustrate the main idea and possible applications. In Section 3 we present the model and the results. We conclude in Section 4. In the Appendix, we compare the proposed mechanisms with the Bayesian Truth Serum.

2 Example

We present here an example that illustrates the kind of problems we want to address, to motivate the theory below. Suppose a planner wants to poll economists, using the following multiple choice question:

- Consider the following statement: "Some variation of the Keynesian approach to dealing with economic crisis is better than the existing alternatives". Choose "Yes" if you agree, and "No" if you don't.

(In this case the question offers only two possible responses but, in general, it could be any number.) Suppose the planner wants to know the true percentage of those who agree, and that she is worried some among the respondents might not provide honest responses. She will design a mechanism that assigns to each respondent a score based on his responses. Ideally, the planner wants a scoring rule such that, if the respondent wants to maximize his expected score and if everyone else tells the truth, then it is strictly optimal for the respondent to provide an honest response. However, that is not possible to achieve with a rule that is based only on responses to a single question (for example, this is impossible in the setup in which all the respondents have a common prior that is not used by the planner in assigning the scores; see Cvitanić, Prelec, Radas and Šikić 2015). What we provide in this paper are truth-inducing mechanisms of the following type: each respondent will be asked an additional auxiliary question, and, using the two responses, each will be assigned a score. The scoring rules will be such that truth-telling is an equilibrium if the objective of each respondent is to maximize the expected score. (For this to work in practice, the respondents either have to be compensated in proportion to their scores, or they have to care about their scores for other reasons, e.g., because of reputational concerns.) Denote by k the true opinion of respondent r among the multiple choices, called his honest x choice. One term in the scoring function will be a score ρ that depends on the responses to the additional auxiliary question.
Then, the idea is to pay each respondent a weighted average of his ρ score and the average $\bar\rho^{-r}(i)$ of the ρ scores of the respondents who choose the same answer i to the original question as respondent r does. In other words, by responding to the original question, the respondent is matched (aligned) to a subgroup of other respondents, which determines the value of his score via the responses to the additional question. For this to be truth-incentive, the additional question has to be correlated to the original question, in the sense that, for each respondent r, the average score $\bar\rho^{-r}(k)$ of the respondents agreeing with his true choice k has to have a higher expected value than the average score $\bar\rho^{-r}(i)$ of the respondents who choose a different response $i \neq k$ to the original question. Here are two examples of possible auxiliary questions correlated to the original question above:

- (a) What is the probability that a randomly selected respondent from the poll will respond "Yes" to the original question?
- (b) What is the probability that the rate of inflation in the US will be above value I this year?

We see that even though the true state of nature behind the original question is not verifiable, the auxiliary question may be based on something that is verifiable. The form of the secondary questions in (a) and (b) is the same (requesting a probability), but in the first case what is verifiable are the answers of other respondents, while in the second it is a macroeconomic statistic. (A disadvantage of the second form is that one would have to wait up to one year to compute the scores.)

For concreteness, we first provide our benchmark example of a truth-incentive scoring rule for multiple choice questions, in the context of approach (a), in which respondents estimate the probability that another respondent will answer in a particular way. Assume $M_A$ possible types and a multiple choice question with $M_A$ possible answers, and assume the number of respondents satisfies $n > M_A + 1$. Random variables $X^r_i$ take value zero for all MCQ choices $i \in \{1, \dots, M_A\}$ not declared by respondent r, and $X^r_k$ is equal to one for the choice k declared by respondent r. The response to the auxiliary question, denoted $Y^r_j$, is the prediction by respondent r of the empirical distribution of endorsements among the $n-1$ respondents other than r. The values that the random variables $X^r_i$ and $Y^r_j$ take are denoted $x^r_i$, $y^r_j$. Denote by $z^r_j$ the actual percentage of all the respondents other than r who declare x choice j,
$$z^r_j = (n-1)^{-1} \sum_{t \neq r} x^t_j.$$
We first define the score for the auxiliary response (the prediction) as the negative cross-entropy between the actual sample percentages and the respondent's prediction (so that maximizing the score is equivalent to minimizing the Kullback–Leibler divergence between the two):
$$\rho^r = \sum_j z^r_j \log y^r_j.$$
It is well known that honest predictions $y^r_j$ (i.e., predictions equal to the expected percentages) maximize the expected value of this score. In order to create incentives for the answer to the primary question, we also credit respondent r with the average auxiliary score of all the respondents who made the same type declaration:
$$\bar\rho^{-r} = \sum_j z^r_j\, \overline{\log y^{-r}_j} = \sum_j z^r_j \sum_k x^r_k \Big(\sum_{t \neq r} x^t_k\Big)^{-1} \sum_{t \neq r} x^t_k \log y^t_j,$$
where $\overline{\log y^{-r}_j}$ is the average log-prediction of the sample percentages made by the respondents other than r who have declared the same MCQ choice as r, called his "type". To cover the possibility that no one has declared the same MCQ choice as respondent r, we adapt a clever device that Baillon (2016) used in his Bayesian market mechanism.
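As a concrete numerical sketch, the auxiliary score $\rho^r$ and the matched-group average $\bar\rho^{-r}$ above can be computed as follows (the poll data here are purely hypothetical):

```python
import math

def empirical_shares(choices, exclude, num_choices):
    """z^r_j: share of respondents other than `exclude` declaring choice j."""
    others = [c for t, c in enumerate(choices) if t != exclude]
    return [others.count(j) / len(others) for j in range(num_choices)]

def log_score(z, y):
    """Auxiliary score rho^r = sum_j z_j * log(y_j)."""
    return sum(zj * math.log(yj) for zj, yj in zip(z, y))

# Hypothetical poll: 5 respondents, binary MCQ (0 = "No", 1 = "Yes");
# y[r] is respondent r's predicted distribution of the others' answers.
x = [1, 1, 0, 1, 0]
y = [[0.3, 0.7], [0.4, 0.6], [0.6, 0.4], [0.25, 0.75], [0.55, 0.45]]

r = 0
z = empirical_shares(x, r, 2)        # realized shares among the other four
rho_r = log_score(z, y[r])           # respondent r's own auxiliary score

# Average auxiliary score of the respondents matched to r's MCQ choice.
matched = [t for t in range(len(x)) if t != r and x[t] == x[r]]
rho_bar = sum(log_score(empirical_shares(x, t, 2), y[t])
              for t in matched) / len(matched)
```

Respondent r's type-matching score is then the weighted combination of `rho_r` and `rho_bar`, multiplied by the trigger variable defined next.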
We define a trigger variable
$$K^r = \prod_i \Big(1 - \prod_{t \neq r}(1 - x^t_i)\Big),$$
which has value zero if there is at least one "vacant" answer with zero declarations once respondent r is excluded, and value one otherwise. This includes the case in which r is a "singleton." We say that type matching is in effect for respondent r if $K^r = 1$. Importantly, $K^r$ does not involve the responses $(x^r, y^r)$ of respondent r, but only the MCQ choices of the other respondents. Therefore, a respondent cannot influence whether type matching is in effect for him or her. The complete type-matching entropy scoring formula is then a weighted average of the logarithmic proper score that a respondent receives for his own prediction and the average proper score of the predictions of all the respondents other than r who have declared the same type:
$$K^r\big(\lambda \rho^r + (1-\lambda)\bar\rho^{-r}\big) = K^r\Big(\lambda \sum_i z^r_i \log y^r_i + (1-\lambda)\sum_i z^r_i\, \overline{\log y^{-r}_i}\Big).$$
Observe that the only impact of the type declaration on the score is through the definition of the type-matching subset of respondents. If a respondent believes that, conditional on type matching being in effect, the predictions of the respondents who are type matched to him will be more accurate (on average) than the predictions of those who are not, then his best strategy is to honestly declare his type. In particular, this will hold if respondents believe that types determine predictions exactly, and that different types support different predictions (stochastic relevance). However, it does not require that posterior beliefs about other types are generated from a common prior. The minimum sample size for this approach is two more respondents than the number of possible answers to the MCQ: $n > M_A + 1$. It is useful to consider why the approach will not work if $n = M_A + 1$. In that case, the only way one can avoid a vacant answer after excluding a respondent is if the other respondents each chose a different answer.
Since this is common knowledge among types, all types will predict the same (uniform) distribution over answers. Therefore, the stochastic relevance requirement, that different types generate different predictions conditional on type matching being in effect, does not hold. Exactly this point was made by Baillon (2016). In the rest of the paper, we present a general model of which the above is an example, and elaborate on the assumptions we make and the results.

3 The Model and the Results

3.1 Choice-matching scoring rules and strict separation

3.1.1 The setup

A survey planner is interested in receiving honest responses to a multiple choice question, the choices of which comprise a set A. She will do that by assigning to each respondent a score based on his responses to two questions. First, the planner asks each respondent r to pick a response in the set $A = \{1, \dots, M_A\}$, an action we will call choice declaration. Given a respondent r, random variables $X^r_i$ take value zero for all choices $i \in \{1, \dots, M_A\}$ not declared by respondent r, and $X^r_k$ is equal to one for the choice k declared by respondent r. Second, each respondent r is also asked to provide (numerically represented) responses to a list of $M_B \geq M_A$ auxiliary questions, and the responses to those questions, as random variables, are denoted $Y^r_j$, $j = 1, \dots, M_B$. The values that the random variables $X^r_i$ and $Y^r_j$ take are denoted $x^r_i$, $y^r_j$, and we denote by $x^r, y^r, X^r, Y^r$ the vectors consisting of the respective values subscripted with i and j. The vector of honest $(X^r, Y^r)$ responses of respondent r is called his type, and is denoted $T^r = (T^r_x, T^r_y)$. The values that the random vectors/variables $T^r, T^r_x, T^r_y$ take are denoted $t^r, t^r_x, t^r_y$. A pure strategy for player r is a map $\sigma(t^r) = (x^\sigma(t^r), y^\sigma(t^r))$ that maps a player's type $t^r$ to his response choice $(x^r, y^r)$.
The profile of all respondents' pure strategies is denoted $\sigma(t)$, with entries $\sigma^r(t^r)$, and the profile excluding player r is denoted $\sigma^{-r}(t^{-r})$. We consider only pure strategies. A scoring rule is a function $\rho(\sigma^r(t^r), \sigma^{-r}(t^{-r}))$ that takes the responses of respondent r, given all other respondents' responses, to the set of real numbers. We suppose that each respondent maximizes the expected value of his score, conditional on his type.

Definition 1. (i) Given a scoring rule $\rho(\sigma^r(t^r), \sigma^{-r}(t^{-r}))$, we call a set of response strategies a (Bayesian) Nash Equilibrium, NE, if, for any potential response action $(x, y) \neq (x^\sigma(t^r), y^\sigma(t^r))$, we have
$$E\big[\rho(x, y; \sigma^{-r}(t^{-r})) \mid T^r = t^r\big] \leq E\big[\rho(x^\sigma(t^r), y^\sigma(t^r); \sigma^{-r}(t^{-r})) \mid T^r = t^r\big].$$
That is, by deviating to responses $(x, y)$, player r would be worse off than by not deviating, in the sense of obtaining a lower expected score, conditional on his type. If the inequality is always strict, we call the NE a strict NE, SNE. If the responses in a NE are honest in x, we say that the NE is honest in x. If the inequality is strict as soon as $x \neq x^\sigma(t^r)$, we say that the NE is strict in x. If there is a NE that is honest and strict in x, we say that ρ is strictly incentive compatible for x. If there are NE's that are honest in x, but no such NE is strict in x, we say that the rule is weakly incentive compatible in x. The corresponding definitions for y are analogous. (ii) If there is a NE strict in x (y) in which the respondents with different honest x (y) choices provide distinct x (y) responses, we say that rule ρ allows a Strictly Separating Nash Equilibrium, SSNE, in x (y) responses.
3.1.2 Strict separation via choice-matching scoring rules

In the case in which $y^r$ is a probability vector with $y^r_j \in (0,1)$, our benchmark example of a scoring rule assigns to respondent r the score
$$\rho(x^r, y^r, x^{-r}, y^{-r}) = \rho_{\log}(y^r, y^{-r}) = \sum_{j=1}^{M_B} z^r_j \log(y^r_j),$$
where $z^r_j$ is the percentage of all the respondents other than r who choose j as the x choice. We will argue below that, under some assumptions, this rule allows an SSNE in y. Note, however, that the benchmark rule does not depend on the choice declaration $x^r$, and thus cannot lead to a NE that is strictly separating in x. In what follows, we want to find ways of modifying a rule that allows an SSNE in y into a rule in which that NE is also strictly separating in x. For this, we want to match the score of respondent r to the scores of the other respondents declaring the same x choice. For example, we could consider the average $\bar\rho^{-r}(i)$ of the scores of the other respondents declaring the same choice i. More generally, we introduce

Definition 2. A functional $\bar\rho^{-r}(i)$ of the scores $\rho^{-r}$ of the players (if any) other than r who declare choice i is called a choice i induced ρ score if, whenever all such players receive the same score $\rho(i)$, then $\bar\rho^{-r}(i) = \rho(i)$. If respondent r happens to be the only respondent who declares choice i, $\bar\rho^{-r}(i)$ is not defined.

Henceforth in the notation, we suppress the dependence on the other players' responses $x^{-r}, y^{-r}$.

Definition 3. Denote by $E^r$ the event that at least one player other than r selects each possible value $j \in \{1, \dots, M_A\}$ for the x choices.

We now introduce the main definition of this paper, of our scoring rules.

Definition 4. Given $\lambda \in (0,1)$, the function $R_{\rho,\lambda}(x^r, y^r)$ is called the choice-matching scoring rule corresponding to ρ and $\bar\rho^{-r}$ if
(a) On the event $E^r$, player r receives
$$R_{\rho,\lambda}(x^r, y^r) = \lambda \rho(x^r, y^r) + (1-\lambda) \sum_{i=1}^{M_A} x^r_i\, \bar\rho^{-r}(i).$$
(b) On the complement of $E^r$, player r receives zero.
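A minimal computational sketch of Definition 4, taking the induced score $\bar\rho^{-r}(i)$ to be the plain average, and assuming the individual ρ scores have already been computed (all names and data hypothetical):

```python
def choice_matching_score(r, x, rho, lam, num_choices):
    """R_{rho,lambda} for respondent r: a weighted average of r's own score
    rho[r] and the mean score of the other respondents declaring the same
    choice; zero unless every choice is declared by at least one respondent
    other than r (the event E^r of Definition 3)."""
    others = [t for t in range(len(x)) if t != r]
    for j in range(num_choices):
        if all(x[t] != j for t in others):
            return 0.0               # some choice is vacant: outside E^r
    matched = [t for t in others if x[t] == x[r]]
    rho_bar = sum(rho[t] for t in matched) / len(matched)
    return lam * rho[r] + (1 - lam) * rho_bar

# Four respondents, binary MCQ; precomputed auxiliary scores.
x, rho = [0, 1, 1, 0], [-1.0, -0.5, -0.7, -0.8]
s = choice_matching_score(0, x, rho, 0.5, 2)   # 0.5*(-1.0) + 0.5*(-0.8)
```

Note that on $E^r$ the matched set is never empty, so the average is well defined, and whether respondent 0 is paid at all depends only on the other respondents' declarations.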
In other words, if all x choices are represented, and if a respondent is not the only one choosing a specific x response, the choice-matching scoring rule assigns him a score that is a weighted average of the score ρ evaluated at respondent r's responses and the score $\bar\rho^{-r}$ corresponding to the respondents who declare the same x choice as r. Otherwise, he receives zero. (There are many possibilities other than the average, such as the geometric average, or the average over a subset.) The reason why zero scores are assigned if a respondent is a singleton relative to his x choice, or if an x choice is not chosen by anyone, is the following: suppose a respondent r happens to be the only one with honest response $x^r_k = 1$, and suppose all other respondents are honest. Then, either he is honest in his x choice and he cannot be matched to the respondents with the same x choice (there are none), or he can choose a dishonest x choice, in which case no one chooses k. We want the respondent not to be able to influence his score by influencing whether or not he is matched, and one way to accomplish that is to assign the same (zero) score in either case.

For this mechanism to work in theory, we need to impose the following assumption. First, introduce $E^r[\cdot] = E[\cdot \mid E^r, T^r]$, player r's expectation operator conditional on his type and on all the x choices being declared by at least one other respondent.

Assumption 1.
- (i) The game $\rho(x, y)$ has an NE, defined relative to the expectations of each player r computed using the operator $E^r[\cdot]$, in which the choice declarations x are honest, and which is strictly separating in y (but not in x), with the NE strategies denoted $(x^{r,*}, y^{r,*})$.
- (ii) The probability that respondent r assigns to the event $E^{r,i}$, conditional on all the respondents other than r playing the strategies from the NE in (i), does not depend on i. We denote that (conditional) probability $\Pr(E^{r,i})$.
- (iii) If all respondents other than r play the NE from (i), then, for $i \neq k$,
$$E^r[\bar\rho^{-r}(k) \mid T^r = t^r, t^r_x = k] > E^r[\bar\rho^{-r}(i) \mid T^r = t^r, t^r_x = k]. \qquad (3.1)$$
That is, respondent r with honest choice k would strictly prefer receiving $\bar\rho^{-r}(k)$ to $\bar\rho^{-r}(i)$, $i \neq k$.

Remark 1.
- (a) Assumption 1 (ii) is a symmetry assumption on the distribution of honest x choices in the population of respondents (from the point of view of respondent r).
- (b) In (iii) we assume that the highest expected value of $\bar\rho^{-r}(i)$ is attained when respondent r declares the i that is honest. In reality, Assumption 1 (iii) would not be satisfied exactly, but if it is approximately satisfied, we would expect that the mechanisms we propose below would still work well (this, however, has to be tested). We discuss below sufficient conditions for Assumption 1 (iii) to be satisfied.

The main result of the paper is

Proposition 1. Suppose Assumption 1 is satisfied. Then, the game in which the scores are defined as in Definition 4 has an NE in which the choice declarations x are honest, and the equilibrium scores are equal to those in the $\rho(x, y)$ equilibrium from Assumption 1 (i). Moreover, the equilibrium is strictly separating in x and in y.

Remark 2. The assumption that ρ allows a NE that is strictly separating in y, but not strictly separating in x, is satisfied, in particular, if the payoff $\rho(x^r, y^r)$ paid to respondent r does not depend on his choice $x^r$, as in our benchmark example. Then, the result says that adding the second, choice-matching term in $R_{\rho,\lambda}(x, y)$ makes it strictly separating in x.

Proof: Consider the NE from Assumption 1, and fix a respondent r with honest choice k. Denote by $\rho^{r,k}(x^*, y^*)$ the ρ-score of respondent r with honest choice k, where $x^*, y^*$ denote his NE responses (thus, $x^*$ is honest). Suppose that all the players other than r play the NE strategies.
If player r also plays the NE strategy, his expected score is equal to
$$\Pr(E^{r,k}) \times \Big(\lambda\, E^r\big[\rho^{r,k}(x^*, y^*)\big] + (1-\lambda)\, E^r\big[\bar\rho^{-r}(k)\big]\Big).$$
If he deviates and declares choice i via $x(i)$, and provides a response $y^r$, his expected score is
$$\Pr(E^{r,i}) \times \Big(\lambda\, E^r\big[\rho^r(x(i), y^r)\big] + (1-\lambda)\, E^r\big[\bar\rho^{-r}(i)\big]\Big).$$
By Assumption 1 (iii), we have $E^r[\bar\rho^{-r}(i)] < E^r[\bar\rho^{-r}(k)]$. Moreover, $E^r[\rho^r(x(i), y^r)] \leq E^r[\rho^{r,k}(x^*, y^*)]$, because the right-hand side is the respondent's expected value in the NE of the game $\rho(x, y)$. In fact, if $y^r \neq y^*$, the latter inequality is strict, because the NE is assumed to be strict in y. Finally, by Assumption 1 (ii), $\Pr(E^{r,i})$ does not depend on i. Combining all of the above, we see that the NE is strictly separating in x and y.

Remark 3. I. Note that one could create other incentive compatible scores for x and y by, instead of summing the two terms in the definition of $R_{\rho,\lambda}(x, y)$, applying to them a different, appropriately monotone function. II. Instead of using the weights λ and $1-\lambda$ in the definition of $R_{\rho,\lambda}$, one could assign the score $\rho(x^r, y^r)$ with probability λ and the score $\bar\rho^{-r}(i)$ with probability $1-\lambda$. Under appropriately modified assumptions, this mechanism would also work if each respondent r were risk averse, receiving utility $U^r(\rho)$ when the score is ρ, for some utility function $U^r$, provided the respondents with the same honest x choice have the same utility function.

3.1.3 Sufficient conditions for Assumption 1 (iii)

The inequality in Assumption 1 (iii) is crucial. Assuming that there exists an NE as in part (i), we now discuss sufficient conditions for the inequality to be satisfied.

Sufficient Condition I. In the NE from Assumption 1 (i), two respondents have different y choices if and only if they have different x choices.
Under this condition the inequality is satisfied: in that case $\bar\rho^{-r}(j) = \rho(j)$, and respondent r's expected value of $\rho(j)$ is strictly the highest for $j = i$, where i is his honest x choice, because for $j \neq i$ the (unique) $y = y(j)$ choice corresponding to $\bar\rho^{-r}(j)$ would be different from $y(i)$, and the NE of the $\rho(x, y)$ game is strict in y and honest in x. In our example, the sufficient condition would be, in approach (b), that, in the NE, exactly those respondents who do not think a Keynesian approach is appropriate also respond that inflation will be higher than I. In approach (a), it would be that, in the NE, those who make the same x choice also provide the same estimates of the percentages of the respondents making various x choices.

A sufficient condition for Condition I, on the one-to-one correspondence between y responses and honest x responses, is the following.

Sufficient Condition II. The y responses are responses $y_j$ to the question "What are the probabilities of the events $C_j$, conditional on your effective matching event $E^r$?", where the $C_j$ form a finite partition of the whole probability space. Moreover, the respondents with the same honest choice $T_x$ have a common prior belief on the distributions of the pairs $(T_x, 1_{C_j})$, they compute the probabilities in Bayesian fashion, and the respondents with different honest choices $T_x$ provide different vectors of y responses (where everything is conditional on effective matching).

Indeed, under this condition, respondent r's honest estimate of $y_j$ when his honest choice is i is, with $\Pr(C)$ denoting his probability of an event C conditional on $E^r$,
$$\Pr(C_j \mid T^r_x = i) = \frac{\Pr(T^r_x = i, C_j)}{\sum_k \Pr(T^r_x = i, C_k)},$$
and this is the same for all respondents with $T^r_x = i$. Thus, Condition I is satisfied. While it may not be satisfied exactly in reality, the common prior assumption is standard in the theory of mechanism design, and it may be a good approximation.
As for the condition that the respondents with different honest choices $T_x$ provide different vectors of y responses, it can be shown that this is true for almost all values of the model parameters, and that it is always true in the case of only two types ($M_A = 2$). More precisely, in the latter case it can be shown, for example, that type 1's estimate of the percentage of type 1 respondents in the sample is strictly higher than type 2's estimate of that percentage.

3.1.4 Budget-balanced scoring rules

Suppose now we want the scores to add up to zero with probability one; if that is the case, we say that the scoring rule is budget-balanced. (We may want budget balance, for example, to reduce the likelihood of collusion between the respondents, such as all of them declaring the same choices.) With infinitely many respondents, we can modify any scoring rule R to get a budget-balanced scoring rule $R'$, as follows:
$$R'(x^r, y^r) = R(x^r, y^r) - \lim_{N \to \infty} \frac{1}{N} \sum_{s=1}^{N} R(x^s, y^s),$$
if the limit, denoted $\bar R$, exists. Then, the score will be budget balanced if $\big(\sum_{r=1}^N R(x^r, y^r) - N\bar R\big)$ converges to zero as $N \to \infty$. In this case, $R'$ is strictly incentive compatible in x and strictly separating in y whenever R is, because the subtracted term $\bar R$, being the average of the first term over infinitely many respondents, does not depend on the responses of an individual respondent. In practice, if the number of respondents is large, we might still want to use the above modification. Alternatively, we could do the following. Given a choice-matching scoring rule R, if there are $n > M_B$ respondents, we take a subset of $n-1$ of them and apply the R game in which the remaining n-th respondent does not participate. We charge the n-th player the negative of the sum of the scores in the game with $n-1$ respondents, so the aggregate score is zero. We do this simultaneously for all subsets of size $n-1$, i.e., there are n games in total, and each respondent gets the score that is the sum of his scores across the n games. This preserves the incentive compatibility of an NE, while also making the rule budget-balanced.
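The leave-one-out construction above can be sketched as follows, with `score_fn` a hypothetical stand-in for the R game played among a given subset of respondents:

```python
def balanced_totals(score_fn, n):
    """Play the R game on every subset of n-1 respondents; the left-out
    respondent is charged minus the sum of that subgame's scores, so the
    grand total is zero by construction. `score_fn(players)` returns a
    dict {player: score} for the given participants."""
    totals = {r: 0.0 for r in range(n)}
    for out in range(n):
        sub = [r for r in range(n) if r != out]
        scores = score_fn(sub)
        for r in sub:
            totals[r] += scores[r]
        totals[out] -= sum(scores.values())
    return totals

# Toy score function: each participant's score is just their own index.
totals = balanced_totals(lambda players: {p: float(p) for p in players}, 4)
```

Since every score paid out in a subgame is exactly offset by the charge to the excluded respondent, the totals sum to zero regardless of what `score_fn` returns.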
3.2 Incentive compatible scoring rules

In this section we introduce scoring rules $\rho(x^r, y^r, x^{-r}, y^{-r})$ that allow NE's which are honest in x and strictly separating in y. Consider an integer $M_B > 1$ and, for each respondent r, a partition $\{C^r_1, \dots, C^r_{M_B}\}$ of the probability space such that, before the scores are assigned, it will be revealed which $C^r_j$ has occurred. Let $Y^r = (Y^r_1, \dots, Y^r_{M_B})$ be a random vector consisting of the responses of respondent r to the question: "What is the probability that $C^r_j$ occurs, $j = 1, \dots, M_B$, conditional on the effective matching event $E^r$?". In our example above, this is the second question, on the probability of a respondent responding "Yes", or on the probability of the inflation rate exceeding value I, modified so as to take into account when matching is in effect. Thus, when responding truthfully, respondent r provides the responses
$$Y^r_j = \Pr{}^r(C^r_j \mid T^r) := E^r[1_{C^r_j}].$$
We now recall the definition of a strictly proper scoring rule.

Definition 5. Given an integer $M_B > 1$, we say that functions $f(p; j)$ form a strictly proper scoring rule (SPSR) if, for any probability vectors $p = (p_1, \dots, p_{M_B})$ and $q = (q_1, \dots, q_{M_B})$ taking values in $(0,1)^{M_B}$, $q \neq p$, we have
$$\sum_{j=1}^{M_B} p_j f(p; j) > \sum_{j=1}^{M_B} p_j f(q; j).$$
The benchmark example of an SPSR is $f(p; j) = \log(p_j)$, in which case the above inequality is known as the Gibbs inequality. (There are many other proper scoring rules; a general characterization with many examples is provided in Gneiting and Raftery 2007.)

Proposition 2. Let $f(y; j)$ form an SPSR, and let $Z^r_j$ be any random variables such that, for the honest $y^r_j$ response of respondent r, when all other respondents are honest, we have
$$E^r[Z^r_j] = Y^r_j. \qquad (3.2)$$
(For example, $Z^r_j = 1_{C^r_j}$.) Then, the scoring rule
$$\rho_f(y^r) := \sum_{j=1}^{M_B} z^r_j f(y^r; j)$$
is strictly incentive compatible for y, where $z^r_j$ is the value $Z^r_j$ takes.

Proof: Assume everyone else is honest. Suppose first that r is honest.
Then, by (3.2), the expected score of r, when providing the honest responses $y^r_j$, may be written as
$$E^r\Big[\sum_{j=1}^{M_B} Z^r_j f(y^r; j) \,\Big|\, T^r = t^r\Big] = \sum_{j=1}^{M_B} y^r_j f(y^r; j).$$
On the other hand, if respondent r declares a vector $\tilde y^r$ different from the vector $y^r$ of his honest y responses, his expected score is
$$\sum_{j=1}^{M_B} y^r_j f(\tilde y^r; j).$$
Since the $f(y; j)$ form an SPSR, this is strictly less than the expected score he attains when honest.

We now modify $\rho_f$ so that it becomes strictly incentive compatible for x, too, which is really what the planner is interested in. The following is a corollary of Propositions 1 and 2.

Corollary 1. Let $f(y; j)$ form an SPSR, and let $Z^r_j$ be as in Proposition 2 and such that the outcome of $Z^r_j$ is not affected by the x and y responses of respondent r. Under Assumption 1, for every $\lambda \in (0,1)$, the choice-matching scoring rule that pays, when not zero, the amount
$$R_f(x^r, y^r) = \lambda \rho_f(y^r) + (1-\lambda) \sum_{i=1}^{M_A} x^r_i\, \bar\rho_f^{-r}(i)$$
is strictly incentive compatible for x and y.

Remark 4. I. In our example, we can set $Z^r_1 = 1_{C^r_1}$ (and $Z^r_2 = 1 - Z^r_1$), where $C^r_1$ is the event that a randomly chosen respondent other than r thinks positively of the Keynesian approach. Then, in approach (a) we need to assume that respondent r estimates the percentage $Y^r_1$ of the respondents other than r who will state that they think positively of the Keynesian approach, conditional on $E^r$, by the probability $E^r[1_{C^r_1}]$ of the event $C^r_1$. In approach (b) we need to assume that respondent r estimates the percentage $Y^r_1$ of the respondents other than r who will state that the inflation rate is higher than I, conditional on $E^r$, by the same probability $E^r[1_{C^r_1}]$. II.
Generalizing approach (a) from the example, we can define $Z^r_j = 1_{C^r_j}$, where $C^r_j$ is the event that a randomly chosen respondent picks a choice labeled $j \in \{1, \dots, M_B\}$ in a set B consisting of $M_B$ elements (that exhaust all possible outcomes), where some natural choices for B are:
- (i) B = A.
- (ii) B is the set of all possible K-tuples of A, $K < M_A$.
- (iii) B is a (non-trivial) partition of A.

For example, if A corresponds to the choice of one-to-five star ratings, then case (i) corresponds to asking what percentage of respondents picked one star, two stars, ..., five stars. Case (ii), with K = 2, corresponds to asking about the percentage of respondent pairs both choosing one star, one choosing one star and the other two stars, and so on. An example of case (iii) would be asking for three percentages, for instance: of those giving one or two stars, of those giving three stars, and of those giving four or five stars.

4 Conclusions

Given a truth-inducing scoring rule for responses to an auxiliary multiple choice question, we show how to construct a truth-inducing scoring rule for a correlated main question of interest. This is accomplished by rewarding the respondents with the score corresponding to the auxiliary choice of the group they align themselves with. Having a "high correlation" between the two questions is a sufficient condition for the rule to work in theory. An interesting possibility for future research would be to test the mechanisms we propose here, to explore their performance and robustness in experiments, or in practice.

5 Appendix: Comparison with Bayesian Truth Serum

A well known scoring mechanism that allows an SSNE in x and y, with A = B, is the Bayesian Truth Serum, BTS, introduced by Prelec (2004).
Under the assumption of a common prior and infinitely many respondents, the BTS score that a respondent receives in the honest SSNE is an increasing linear function of the logarithm of his posterior probability of the true state of nature, where the latter can be taken to be equal to the actual distribution of types.^10

^10 As mentioned above, the common prior assumption implies that respondents with the same honest $x$ choice will, when honest, choose the same $y$ responses, so that the types are completely determined by the $x$ choices.

The advantages of the scoring mechanisms in this paper relative to BTS are: (i) they may be simpler to explain to respondents in implementation; (ii) they do not necessarily require infinitely many respondents; (iii) they do not necessarily require a common prior. A potential disadvantage is that they do not necessarily rank respondents according to any measure of expertise, while BTS ranks them according to their posteriors of the true state of nature.

We now present an example in which the $R_f(x^r, y^r)$ score of a respondent with a higher posterior probability of the true state of nature is lower than the score of a respondent with a lower such probability. In the example, we take $f(x) = \log(x)$. Suppose there are infinitely many respondents, that there are three choices in set $A$ that are also the only possible types, and that we are in the setting of Remark 4 II, with $B = A$. The types of the respondents are assumed to be exchangeable random variables. Suppose also that there are three possible states of nature. We assume that the types have a common prior belief represented by the joint distribution $P(T = i, \Omega = j)$ of type $T$ and state of nature $\Omega$, and that they compute probabilities in Bayesian fashion. The idea is to have an example with a very low probability of type 3 in states 1 and 2, in which type 2 predicts a low probability for type 3, while type 1 does not.
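The numbers in the example that follows (uniform prior over the three types, the $3 \times 3$ posterior matrix with $\varepsilon = 0.001$, and the logarithmic rule) can be reproduced with a short computation. The sketch below carries out the Bayesian updating; the variable names are ours, not the paper's.

```python
import math

# Posterior matrix P(state j | type i) from the example below:
# rows = types, columns = states, with eps = 0.001.
eps = 0.001
P = [[0.6, 0.2, 0.2],
     [0.4, 0.6 - eps, eps],
     [eps, eps, 1 - 2 * eps]]

# With a uniform prior over types, P(type k | state j) is column j of P, normalized.
col_sum = [sum(P[i][j] for i in range(3)) for j in range(3)]
types_given_state = [[P[i][j] / col_sum[j] for j in range(3)] for i in range(3)]

# Type i's belief about the type distribution: average the per-state type
# distributions using type i's posterior over the states.
belief = [[sum(P[i][j] * types_given_state[k][j] for j in range(3))
           for k in range(3)] for i in range(3)]

# The realized state is state 1 (index 0).
actual = [types_given_state[k][0] for k in range(3)]

# Expected log score of each type's stated type distribution
# against the actual type distribution in state 1.
score = [sum(a * math.log(b) for a, b in zip(actual, belief[i])) for i in range(3)]

print([round(b, 4) for b in belief[0]])   # type 1: [0.443, 0.3897, 0.1673]
print([round(b, 4) for b in belief[1]])   # type 2: [0.3897, 0.6083, 0.002]
print([round(s, 3) for s in score[:2]])   # [-0.866, -0.77]
```

The computation confirms that type 2's stated distribution, though attached to a lower posterior for the realized state, earns the higher log score.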
If the actual state is state 1, for example, then type 2 may be more accurate by measures of distributional prediction error, while still putting less probability on state 1 than type 1. We now confirm this in a numerical example.

Assume that the prior is such that each of the three types is equally likely, and that the posterior probabilities $P(\Omega = j \mid T^r = i)$ of the state of nature $\Omega$ are given by the following matrix, with rows corresponding to type $i$ and columns to state $j$, for some small value $\varepsilon > 0$:

$$\begin{pmatrix} 0.6 & 0.2 & 0.2 \\ 0.4 & 0.6 - \varepsilon & \varepsilon \\ \varepsilon & \varepsilon & 1 - 2\varepsilon \end{pmatrix}$$

Let us assume that the actual state of nature is $j = 1$. Thus, type $i = 1$ has a posterior 0.6 for the true state of nature that is higher than the corresponding posterior 0.4 of type $i = 2$. However, for sufficiently small $\varepsilon$, type 2 has better knowledge of the distribution of the beliefs in the realized state, in the following sense. Let us set $\varepsilon = 0.001$, for example. It can be computed, by the standard Bayes rule, that type 1's belief about the distribution of types is $(0.443, 0.3897, 0.1673)$, while type 2's belief is $(0.3897, 0.6083, 0.002)$. Moreover, the actual distribution of types in state 1 is $(0.5994, 0.3996, 0.001)$. In particular, type 2 gives a low probability to type 3, which, indeed, is very unlikely in the realized state. This makes the $R_f(x^r, y^r)$ score higher for type 2: it can be computed that the score of type 1 is $-0.866$, while that of type 2 is $-0.770$. Thus, indeed, a respondent can have a higher posterior probability for the realized state of nature than another player, hence a higher BTS score, while obtaining a lower $R_f$ score.

References

[1] Baillon, A. (2016) A market to read minds. Working paper.

[2] Cvitanić, J., Prelec, D., Radas, S. and Šikić, H. (2015) Mechanism design for an agnostic planner: universal mechanisms, logarithmic equilibrium payoffs and implementation. Submitted.

[3] Gneiting, T. and Raftery, A.E. (2007) Strictly Proper Scoring Rules, Prediction, and Estimation.
Journal of the American Statistical Association 102, 359–378.

[4] Miller, N., Resnick, P. and Zeckhauser, R. (2005) Eliciting Informative Feedback: The Peer-Prediction Method. Management Science 51, 1359–1373.

[5] Prelec, D. (2004) A Bayesian Truth Serum for Subjective Data. Science 306, 462–466.

[6] Prelec, D., Seung, H.S. and McCoy, J. (2013) Finding truth even if the crowd is wrong. Working paper, MIT.

[7] Waggoner, B. and Chen, Y. (2013) Information Elicitation Sans Verification. In Proceedings of the 3rd Workshop on Social Computing and User Generated Content (SC13).

[8] Witkowski, J. and Parkes, D. (2012) Peer Prediction Without a Common Prior. In Proceedings of the 13th ACM Conference on Electronic Commerce (EC12).