Interpretation of Inconsistent Choice Data:
How Many Context-Dependent Preferences are There?
Annie Liang∗
October 22, 2015
Abstract
Inconsistencies in choice data may emerge either from context dependencies
in preference or from stochastic choice error. These inconsistencies are quite
different, and have distinct implications for welfare assessment and prediction.
How can the analyst separate the two in data? This paper provides a tool
for identifying the number of context-dependent preferences in noisy choice
data. Using the technique of statistical regularization, I define a best multipleordering rationalization as one that maximizes fit to the data subject to a
penalty on the number of orderings used. I show that although recovery of the
orderings themselves is an ill-posed problem, exact recovery of the number of
context-dependent orderings is feasible with probability exponentially close to
1 (in quantity of data) using the proposed approach.
1 Introduction
In the simplest model of choice, an individual’s preference is described as a linear
ordering r over alternatives in a set X, and his choice from any subset A ⊆ X
is the r-maximal element in A. It is common to interpret choice data under this
model, and to infer a single ordering best fit to the data. In practice, however,
choice datasets may result from the maximization of several different preferences.
For example:
1. Heterogeneity in preference across choice domains. Choice data frequently
pools observations drawn from a variety of choice domains, but agents may
∗ Department of Economics, Harvard University. Email: [email protected]. I am especially grateful to Jerry Green for invaluable advice and support. I am also grateful to Emily Berger, Gabriel Carroll, Drew Fudenberg, David Laibson, Eric Maskin, Jose Montiel Olea, Gleb Romanyuk, Andrei Shleifer, Ran Shorrer, and Tomasz Strzalecki for useful comments and discussions.
have different preferences in different domains. For example, Einav, Finkelstein, Pascu & Cullen (2012) study the commonality of financial risk preferences across six choice domains — including 401(k) asset allocations, short-term disability insurance, and insurance choices regarding health, drug, and dental expenditures — and find that just over 30% of their sample makes
dental expenditures — and find that just over 30% of their sample makes
decisions that can be simultaneously rationalized over all six domains.
2. Multiple behavioral selves. There is extensive empirical evidence that preference varies with external details of the choice environment, for example the
framing of the problem (Kahneman & Tversky 2000), the presence of default
options (Beshears, Choi, Laibson & Madrian 2008), and the addition of decoy
options (Huber, Payne & Puto 1982). Outside of the laboratory, it is unusual
for external details to remain constant across every observed choice; choices
may therefore reflect different behavioral biases in different observations.
3. Multiple representative agents. Choice data is often pooled from a population, across which there may be subpopulations or clusters of individuals with
different preferences. Ideally, agents in different clusters are identifiably different, but in practice the observable characteristics of these agents may be
indistinguishable (see, for example, Crawford & Pendakur (2012)).
In each of these settings, the analyst may not know beforehand the number of
context-dependent preferences maximized in the data. But context-dependence is
a persistent feature of preference; thus, it is important to know the number of contexts for two reasons: 1) Welfare implications—it tells us whether standard welfare
analysis is appropriate for interpretation of the observed choices, or whether there
is genuine preference variation that should be elicited to inform normative statements; 2) Prediction—it tells us whether to interpret the observed inconsistency as
noise, or as systematic variation that can inform prediction of future choices.
The purpose of this paper is to provide a tool for determining the number of
context-dependent preferences maximized in the data. The challenge is that inconsistencies may reveal genuine context-dependencies, but may also simply reveal
choice error (for example, due to inattention by the subject or measurement error
by the analyst). At extremes, we can rationalize the data using only multiplicity in rationales (in which case every observation in conflict is described with a
new ordering) or using only choice error (in which case every observation in conflict is described as a mistake). How can the analyst recover the true number of
context-dependent preferences from choice data?
This paper proposes a simple approach using regularization,¹ a statistical technique in which a penalty is imposed on the complexity of the learned model to

¹ For example, two common regularization techniques for learning the regression function f(x) = xβ extend the usual OLS approach, min_β Σ_i (y_i − x_i β)², to:
prevent overfitting. Regularization techniques have received tremendous interest
in recent decades in the applied mathematics and computer science communities due to their ability to recover various sparse structures from noisy or otherwise corrupted data. Applications range from recovery of signals (Donoho &
Huo 2001, Elad & Bruckstein 2002) to repair of damaged images and videos (Ren,
Zhang & Ma 2012, Yang, Zhou, Ganesh, Sastry & Ma 2013) to lyrics and music
separation (Huang, Chen, Smaragdis & Hasegawa-Johnson 2013).2 The sparse
structure in my problem is preference: while the agent may possess as many as
|X|! context-dependent orderings over X, I assume that the number of orderings
he maximizes is much smaller.
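The footnoted regression case can be made concrete with a minimal ridge sketch (the data, coefficients, and penalty values below are invented for illustration): as the penalty weight grows, the learned coefficients are shrunk toward zero, which is the sense in which regularization penalizes model complexity.

```python
import numpy as np

# Synthetic regression instance (all numbers invented for illustration):
# the ridge penalty lam * ||beta||^2 shrinks the learned coefficients.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=50)

def ridge(X, y, lam):
    # closed-form ridge solution: (X'X + lam * I)^(-1) X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

for lam in (0.0, 10.0, 1000.0):
    print(lam, np.round(ridge(X, y, lam), 3))  # coefficients shrink toward 0
```

Lasso replaces the squared ℓ2 penalty with an ℓ1 penalty, which additionally drives some coefficients exactly to zero.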
I adapt methods from this literature to suggest an “optimal” number of orderings to use in describing the data. A best multiple-ordering rationalization (BMOR)
is defined as a solution to the following program:

argmin_{R∈R} |R| + λE(D, R),    (1)
where R consists of all sets of orderings over X, E(D, R) is the number of observations in dataset D that are inconsistent with maximization of any ordering in R,
and λ ∈ R+ is a constant. The program in (1) thus maximizes fit (by minimizing
the number of unexplained observations E(D, R)) subject to a penalty on the number of orderings used (by minimizing |R|), and the constant λ trades off between
these two goals.
Notice that the problem in (1) nests as special cases two well-known approaches in the literature. A sufficiently small choice of λ > 0 returns the Houtman & Maks (1985) solution: the ordering that explains the largest number of observations in the data. A choice of λ > 1 returns the Kalai, Rubinstein & Spiegler (2002) solution: the smallest set of orderings that explains all of the data. The problem in (1) generalizes these two approaches by considering intermediate values of λ ∈ (0, 1).
For what choices of λ, and with what guarantee, does the solution to (1) recover
features of the agent’s preference? The main part of the paper shows that if choice
data is generated by any of a large class of models, there is an interval of choices
of λ for which the approach in (1) exactly recovers the correct number of orderings
with probability exponentially close to 1. The class of data-generating processes
I consider is the following. Let F be a (finite) set of K contexts and let R =
• min_β Σ_i (y_i − x_i β)² + λ‖β‖₂² (ridge regularization), and

• min_β Σ_i (y_i − x_i β)² + λ‖β‖₁ (lasso regularization).

Ridge regression penalizes the complexity of the learned model through the ℓ2 norm on the vector of coefficients, and lasso regression penalizes the ℓ1 norm.
² These ideas and techniques have recently begun to be applied to economic problems. See for example Belloni, Chernozhukov & Wang (2011), Belloni & Chernozhukov (2011), Belloni, Chernozhukov, Chen & Hansen (2012), and Gabaix (2014).
{rf}f∈F be a set of context-dependent preferences.³ A choice problem is a pair (A, f) consisting of a choice set A and a context f. In each choice problem, the agent selects the rf-optimal alternative in A with probability at least 1 − p;
otherwise, he trembles and selects a different alternative. I do not impose any
parametric assumptions on either the pattern of error or the nature of the orderings.
Taking K = 1 and p = 0 returns the canonical single-ordering model, and taking
K ≥ 1 and p = 0 returns the generalized choice functions proposed independently
in Salant & Rubinstein (2008) and Bernheim & Rangel (2009).
My main result (Theorem 1) shows that if the number of orderings K is sufficiently small (relative to the number of possible orderings, |X|!), the probability p
of erring in each choice is sufficiently low, and the choice implications of the preference orderings in R are sufficiently different (see Section 5.2), then the problem in (1) recovers the exact number of orderings K with probability exponentially close to
1 (in quantity of data). Section 5.3 qualifies this result: while we can recover the
number of orderings, it is not in general possible to recover the orderings. I make
preliminary comments towards extension of the approach to recover further details
of preference.
Section 6 is the literature review, and Section 7 concludes.
2 Notation
Let X denote the set of choice alternatives, A denote a typical subset of X, and
PX denote the set of all subsets of X. I refer to A as a choice set, and PX as the
set of all choice sets. A choice observation (x, A) is a pair denoting selection of
alternative x from choice set A. A dataset is a collection of choice observations
D = {(x, A) | A ∈ A},
where A is a (multi)set of elements from PX .
A strict linear ordering is a complete, antisymmetric, and transitive binary
relation on X. Let R be the set of all permutations of (1, 2, . . . , N ), with typical
element r. Identify every linear ordering ≻ with the permutation or preference ordering r = (r1, r2, . . . , rN) ∈ R satisfying ri < rj if and only if xi ≻ xj. (For example, x1 ≻ x3 ≻ x2 is identified with r = (1, 3, 2).) Coordinate ri can be interpreted as the ordinal rank of alternative xi according to the ordering ≻. The
choice function cr : PX → X induced by r takes every choice set A ∈ PX to the
r-maximal element in A. Say that dataset D is consistent if there exists an ordering
r such that (x, A) ∈ D only if x = cr (A), and inconsistent otherwise. Two choice
observations (x, A) and (x′, A′) are said to be in violation of the Independence of Irrelevant Alternatives axiom (IIA) if x ≠ x′ and x, x′ ∈ A ∩ A′, so that they cannot be rationalized by the same strict ordering.

³ In the special case in which the set of contexts is a partition of the power set on X, R describes a set of menu-dependent preferences.
For a given set of orderings R ∈ R, define
C(R) := {(x, A) | A ∈ PX and x ∈ {cr(A) : r ∈ R}}
to be the set of all choice observations consistent with maximization of some ordering in R. I refer to these as the choice implications of the set R. For example, let
X = {x1 , x2 , x3 } and define orderings r1 = (1, 2, 3) and r2 = (1, 3, 2). Then,
C(R) ={(x1 , {x1 , x2 }), (x1 , {x1 , x3 }), (x2 , {x2 , x3 }),
(x3 , {x2 , x3 }), (x1 , {x1 , x2 , x3 })}.
3 Example
I begin by illustrating ideas on a toy choice dataset, and subsequently describe
the general approach in Section 4. Consider a set of choice alternatives X =
{x1 , x2 , x3 , x4 } and the dataset D consisting of the following 21 observations:
• Choice of x1 from every proper subset containing x1 .
• Choice of x2 from {x2 , x3 }, {x2 , x4 }, and {x2 , x3 , x4 }.
• Choice of x3 from {x3 , x4 }.
• Choice of x4 from every proper subset containing x4 .
• Choice of x3 from {x1 , x3 }, {x2 , x3 }, and {x1 , x2 , x3 }.
• Choice of x2 from {x1 , x2 }.
• Choice of x3 from {x1 , x3 , x4 }.
There are many possible rationalizations of this data. If the analyst uses a single
ordering to explain the data, he can explain up to 10 observations, for example
with
R1 = {x1 ≻ x2 ≻ x3 ≻ x4 }.
The minimal obtainable choice error using a single ordering is ∆1 = 21 − 10 = 11.
If the analyst allows for two orderings, he can explain 20 observations, for example
with
R2 = {x1 ≻ x2 ≻ x3 ≻ x4 , x4 ≻ x3 ≻ x2 ≻ x1 }.
The minimal obtainable choice error using two orderings is ∆2 = 21 − 20 = 1. Finally, if the analyst allows for three orderings, he can explain all of the observations,
for example with
R3 = {x1 ≻ x2 ≻ x3 ≻ x4 , x4 ≻ x3 ≻ x2 ≻ x1 , x3 ≻ x1 ≻ x2 ≻ x4 }.
The minimal obtainable choice error using three orderings is ∆3 = 0. These observations are collected in the figure below, which graphs the minimal obtainable
choice error ∆k for values k = 1, . . . , 5.
[Figure: the minimal obtainable choice error ∆k plotted against the number of orderings k, with ∆1 = 11, ∆2 = 1, and ∆3 = 0.]
How should the analyst choose between these solutions? Notice that the ordering in R1 is consistent with the proposal of Houtman & Maks (1985), which finds
the largest subset of choice observations that can be explained by a single ordering.⁴ The set of orderings R3 is consistent with the proposal of Kalai, Rubinstein
& Spiegler (2002), which finds the smallest number of orderings that can perfectly
explain all observations.
This paper proposes an intermediate solution. Define the set of best multiple-ordering rationalizations to be the solution to

argmin_{R∈R} |R| + λE(D, R),

choosing λ = 1/(0.1|D|) = 1/2.1, as proposed in Corollary 1. It is easy to verify that all
best multiple-ordering rationalizations consist of two orderings.⁵ This is because the
gain from introducing a second ordering is nearly half of the dataset (∆1 − ∆2 = 10),
whereas the gain from permitting a third is only a single observation (∆2 − ∆3 = 1).
The proposed approach thus interprets the data as reflecting maximization of two
orderings, with a single choice observation in error.
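These computations can be verified by brute force over all 24 orderings of X. The sketch below assumes, consistent with the stated 21-observation count, that "every subset" means every proper subset of X with at least two alternatives:

```python
from itertools import permutations, combinations

X = [1, 2, 3, 4]
# proper subsets of X with at least two alternatives (10 choice sets)
menus = [frozenset(c) for k in (2, 3) for c in combinations(X, k)]

# the 21-observation dataset of Section 3, as (chosen alternative, choice set)
D = []
D += [(1, A) for A in menus if 1 in A]                          # x1: 6 observations
D += [(2, frozenset(s)) for s in ({2, 3}, {2, 4}, {2, 3, 4})]   # x2: 3
D.append((3, frozenset({3, 4})))                                # x3: 1
D += [(4, A) for A in menus if 4 in A]                          # x4: 6
D += [(3, frozenset(s)) for s in ({1, 3}, {2, 3}, {1, 2, 3})]   # x3: 3
D.append((2, frozenset({1, 2})))                                # x2: 1
D.append((3, frozenset({1, 3, 4})))                             # x3: 1

orderings = list(permutations(X))  # preference lists: first element is best

def choice(r, A):
    return next(x for x in r if x in A)

def E(D, R):
    # observations explained by no ordering in R
    return sum(1 for (x, A) in D if all(choice(r, A) != x for r in R))

delta = {k: min(E(D, R) for R in combinations(orderings, k)) for k in (1, 2, 3)}
lam = 1 / (0.1 * len(D))  # = 1/2.1
best_k = min(delta, key=lambda k: k + lam * delta[k])
print(len(D), delta, best_k)  # 21 {1: 11, 2: 1, 3: 0} 2
```

The brute-force search confirms ∆1 = 11, ∆2 = 1, ∆3 = 0, and that the objective |R| + λE(D, R) is minimized at two orderings.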
4 Approach
Fix a dataset D. The implied choice error when using a set of orderings R to rationalize D is defined

E(D, R) := |{(x, A) ∈ D : x ≠ cr(A) for all r ∈ R}|,

i.e. the number of choice observations in D that cannot be explained as maximization of any preference ordering r ∈ R. If we restrict to sets of r orderings, the minimal obtainable choice error is given by

∆r := min_{R⊆R, |R|=r} E(D, R).

Say that D is r-rationalizable if ∆r = 0.

⁴ This solution is not unique. For example, x4 ≻ x3 ≻ x2 ≻ x1 also explains 10 observations.
⁵ Moreover, this outcome holds for any choice of λ ∈ (1/10, 1).
Remark 4.1. Dataset D is 1-rationalizable if and only if it is consistent.
Remark 4.2. For every dataset D, there exists a constant L ≤ min{|D|, |X|} such
that D is r-rationalizable for every r ≥ L.
To allow for a tradeoff between the “simplicity” of the model (as defined through
the number of orderings) and its fit to the data, I define the set of λ-best multiple-ordering rationalizations of D as follows.
Definition 4.1. For any λ ∈ R+ , R∗ is a λ-BMOR of D if
R∗ ∈ argmin_{R⊆R} |R| + λE(D, R).    (2)
We can interpret λ as arbitrating between the two goals of minimizing the number
of orderings and minimizing the number of implied choice errors. Loosely speaking,
an ordering is included in R if and only if it explains at least 1/λ observations that
would otherwise be interpreted as choice error. Thus, as λ → 0, the analyst prefers
to adopt a unique ordering for the agent and interpret the remaining observations
as error, while for large choices of λ, the analyst prefers to use as many orderings
as necessary to eliminate choice error. Below, I show that the Houtman & Maks
(1985) solution is selected if λ < 1/∆1 (Claim 1), and the Kalai, Rubinstein & Spiegler
(2002) solution is selected if λ > 1 (Claim 2).
Claim 1. If λ < 1/∆1 , every solution R∗ to Eq. (2) satisfies |R∗| ≤ 1.
Proof. Suppose there exists some λ-BMOR R∗ satisfying |R∗ | = K > 1. Then by
the definition of a λ-BMOR, necessarily K + λ∆K ≤ 1 + λ∆1 . Since moreover
1 + λ∆1 < 2, it follows that K < 2 − λ∆K ≤ 2. This contradicts the assumption
that K > 1.
Claim 2. If λ > 1, every solution R∗ to Eq. (2) satisfies |R∗ | = L, where L is the
smallest constant such that the data is L-rationalizable.
Proof. Since D is L-rationalizable, clearly no solution R∗ to Eq. (2) will have |R∗ | >
L. Suppose there exists a λ-BMOR R∗ with |R∗ | = K < L. Assign a unique
ordering to each of the ∆K implied choice errors from rationalizing D with R∗ .
This perfectly explains the data with K + ∆K < K + λ∆K orderings, contradicting
the assumption that R∗ is a λ-BMOR.
Geometrically, solutions to (2) are described as follows. Let f be the linear
interpolation of points {(k, ∆k ), k ∈ N}, and let F = {(x, y) | y ≥ f (x)} be the
epigraph of f. Then, for any choice λ ∈ R+, the problem in (2) returns a k-ordering solution if and only if the line with normal vector (−1, −λ) supports F at (k, ∆k). For example, given the dataset provided in Section 3, the line with normal vector (−1, −λ) supports F at (2, ∆2) for any choice of λ ∈ (1/10, 1).
Figure 1: The problem in (2) returns a solution with 2 orderings if and only if the
line with normal vector (−1, −λ) supports F at (2, ∆2 ).
How should the analyst choose λ, and under what conditions on the data-generating process does the proposed approach recover the "true" number of orderings maximized by the agent?
5 Recovery Results
Consider the following class of choice rules. Let F be a set of K contexts, R =
{rf }f ∈F be a set of context-dependent preferences, and p ∈ [0, 1] be a probability of
error. Given choice problem (A, f ), suppose that the agent chooses the rf -optimal
alternative in A with probability at least 1 − p; otherwise, he trembles and chooses
a different alternative in A. Assume that the analyst does not know
1. the number of contexts.
2. the locations or number of realized errors.
3. the distribution of error.
Can he recover the true number of orderings K using the proposed approach in
(2)?
In general, recovery of the number of context-dependent preferences will not be
possible. My main result demonstrates, however, that the proposed method will
exactly recover K with probability exponentially close to 1 if: (1) the number of
preferences is small relative to the quantity of data, (2) the probability of error is
small, and (3) the context-dependent preferences are “sufficiently different,” in a
sense made precise in Section 5.2.
A natural next question is whether we can do better: in particular, can we recover either the locations of mistakes, or the set of orderings themselves? In Section
5.3, I show that even in the absence of choice error (p = 0), recovery of multiple
preferences from choice data alone is in general an ill-posed task. Proposition 1
states that no set of three or more context-dependent orderings can be recovered, and that only special pairs of context-dependent orderings can be.
5.1 Class of choice models
In this section, I show that the class of choice models I consider is fairly general,
including as special cases several familiar models of choice. To describe these relationships, recall that a random choice rule is a map P from choice problems
(A, f ) to distributions over the elements of A, with the property that the support
of P (A, f ) is a subset of A for every (A, f ). The class of choice models I consider
is equivalent to the class of random choice rules satisfying
P(x(A,f) | A, f) ≥ 1 − p    for all (A, f),

where x(A,f) denotes the rf-optimal choice in A. The following special cases are of
interest.
Case 1: p = 0 and K = 1. This returns the classic theory of choice in which a
single preference ordering r is defined over X and choice satisfies
P(x | A, f) = 1 if x = cr(A), and 0 otherwise, for all (A, f) ∈ P,

where the dependence on f is trivial.
Case 2: p = 0 and K ≥ 1. This returns the model proposed in Salant &
Rubinstein (2008) and Bernheim & Rangel (2009), in which an agent is described
by a set of context-dependent orderings {rf }f ∈F , and choice satisfies
P(x | A, f) = 1 if x = x(A,f), and 0 otherwise, for all (A, f) ∈ P.
In the special case in which F is a partition of PX , this is equivalent to the model
studied in Kalai, Rubinstein & Spiegler (2002): a map h : PX → F cues contexts
from choice sets, and choice from A is the rh(A) -optimal alternative in A.
9
Case 3. p > 0 and K = 1. If there exist distributions {qA}A∈PX such that

P(x | A, f) = qA(Rx,A)    for all (A, f) ∈ P,
then this returns the class of random utility models with choice-set dependent
distributions (as considered, for example, in Fudenberg, Iijima & Strzalecki (2015)).
In the special case in which qA = q for every choice set A, we have the classic random
utility model (Block & Marschak 1960).
5.2 Can we recover the number of orderings?
I provide intuition for the subsequent results by discussing two negative examples in which recovery of K using the approach in (2) is either impossible or difficult. In both examples, I fix p = 0 to simplify ideas.
Example 5.1. Define the sets of orderings
R = {(1, 2, 3), (1, 3, 2), (3, 2, 1)}, and
R′ = {(1, 2, 3), (3, 2, 1)}.

Suppose the agent's true set of context-dependent preferences is described by R. It is easy to verify⁶ that C(R) = C(R′), implying that every choice observation consistent with maximization of some ordering in R is also consistent with maximization of some ordering in R′. Then, for every choice of λ and any choice dataset D generated by (perfectly) maximizing orderings in R, we have that

|R′| + λE(D, R′) = 2 + 0 < |R″| + λE(D, R″)

for every R″ consisting of three or more orderings. Thus, no solution of (2) will return the true number of orderings (3) in the agent's preference.⁷
Example 5.2. Define the sets of orderings
R = {(1, . . . , 9, 10), (1, . . . , 10, 9)}, and
R′ = {(1, . . . , 9, 10)}.
Suppose the agent’s true set of context-dependent preferences is described by R.
Observe that the single ordering in R′ explains almost every choice observation
that can result from maximization of orderings in R. The single exception is the
⁶ C(R) = C(R′) = {(x1, {x1, x2, x3}), (x1, {x1, x2}), (x1, {x1, x3}), (x2, {x2, x3}), (x2, {x1, x2}), (x3, {x2, x3}), (x3, {x1, x3}), (x3, {x1, x2, x3})}.
⁷ An alternative perspective is to consider R and R′ not merely observationally equivalent, but in fact equivalent models. In this case, we can re-interpret the observation as follows: the "more complex" description R will never be selected over the "less complex" description R′.
observation (x10 , {x9 , x10 }), which is consistent with maximization of the ordering
(1, . . . , 10, 9), but not with maximization of the ordering (1, . . . , 9, 10). Consider
any choice dataset D that is generated by (perfectly) maximizing orderings in R,
and does not include (x10, {x9, x10}). Then, for any choice of λ,

|R′| + λE(D, R′) = 1 + 0 < |R″| + λE(D, R″)

for every R″ consisting of two or more orderings. So observation of (x10, {x9, x10}) is necessary to return the true number of orderings (2) using the program in (2).
These examples are suggestive of the following: in order to recover the number
of context-dependent orderings, it is necessary that there exist "sufficient differentiation" in the choice implications of these orderings. Next, I define such a notion of differentiation.
Definition 5.1. Say that choice problems
A = {(A1 , f1 ), . . . , (Ak , fk )}
are in k-violation of IIA if

1. crf(A) ≠ crf′(A′) for every distinct (A, f), (A′, f′) ∈ A, and

2. crf(A) ∈ ∩_{i=1}^{k} Ai for every (A, f) ∈ A.
Condition (1) requires that every choice problem in A has a distinct optimal choice,
and condition (2) requires that each of these (distinct) k alternatives is available
in every set Ai for i = 1, . . . , k. Notice that every pair of choice problems from A
constitutes a (standard) violation of IIA.
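Definition 5.1 can be checked mechanically. The sketch below is illustrative only (the two-context preference set R and the choice problems are hypothetical, not taken from the paper):

```python
from functools import reduce

def choice(r, A):
    # r maps alternative -> rank (1 = best); pick the best available alternative
    return min(A, key=lambda x: r[x])

def k_violation(problems, R):
    # problems: list of (choice set, context) pairs; R: context -> rank dict.
    # Definition 5.1: all optimal choices distinct, each available in every set.
    chosen = [choice(R[f], A) for A, f in problems]
    common = reduce(lambda s, t: s & t, (A for A, _ in problems))
    return len(set(chosen)) == len(problems) and all(c in common for c in chosen)

# hypothetical two-context preferences on X = {1, 2, 3} (ranks: smaller = better)
R = {"f1": {1: 1, 2: 2, 3: 3}, "f2": {1: 3, 2: 2, 3: 1}}
print(k_violation([(frozenset({1, 3}), "f1"), (frozenset({1, 3}), "f2")], R))  # True
print(k_violation([(frozenset({1, 2}), "f1"), (frozenset({2, 3}), "f2")], R))  # False
```

In the first case the two contexts pick distinct alternatives (1 and 3) that are both available in both sets; in the second, the chosen alternatives are not both in the intersection, so no 2-violation arises.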
Definition 5.2. The differentiation parameter dR(A) of a (multi)set of choice problems A is the largest d such that there exists a partition of A into subsets {A1, . . . , Ad+1} satisfying, for every i ∈ {1, . . . , d},

1. |Ai| = K, and

2. Ai is in K-violation of IIA.
I illustrate this definition on an example.
Example 5.3. Consider a set of choice alternatives X = {x1 , x2 , x3 , x4 , x5 }, and
a set of frames F = {fp , fq }. The agent has context-dependent preferences
R = {rp , rq } = {(5, 4, 3, 2, 1), (1, 2, 3, 4, 5)}.
Suppose the agent maximizes rq when there are three or fewer alternatives in the
choice set, and maximizes rp otherwise. Let A consist of every choice problem
(A, f ) with A ∈ PX and f ∈ F . Then dR (A) = 6, with every pair in
({x1 , x2 }, fq ), ({x1 , x2 , x3 , x4 }, fp )
({x1 , x3 }, fq ), ({x1 , x2 , x3 , x5 }, fp )
({x1 , x4 }, fq ), ({x1 , x2 , x4 , x5 }, fp )
({x1 , x5 }, fq ), ({x1 , x3 , x4 , x5 }, fp )
({x2 , x3 }, fq ), ({x2 , x3 , x4 , x5 }, fp )
({x2 , x4 }, fq ), ({x1 , x2 , x3 , x4 , x5 }, fp )
constituting a 2-violation of IIA.
The subsequent recovery results (informally) say the following: if there is sufficient differentiation between context-dependent orderings and sufficient (not necessarily complete) sampling of choice problems, then we can recover the number
of context-dependent orderings using Equation (2) with probability very close to
1. Theorem 1 applies to general sets of choice problems. Corollary 2 considers a
particular data-generating process in which M choice problems are sampled uniformly at random from P. Throughout, I take M to be the number of observations
in the data, N to be the number of alternatives, p to be the probability of error,
and dR (A) to be the differentiation parameter of A given the agent’s preferences
R. When there is no chance of confusion, I express dR (A) simply as d(A).
Theorem 1. Let A be any (multi)set of M choice problems. Suppose, for some constant β > 0,

d := d(A) > [(2p + β)/(1 − p)^K] · M.    (3)

Then for any δ ∈ (0, d(1 − p)^K/M − 2p − β), there exists a constant c > 1 such that the optimization problem in Eq. (2) with λ = 1/((p + δ)M) exactly recovers |R| = K with probability at least 1 − O(c^{−M}).
I provide a brief proof sketch here and defer the details to the appendix. Identify every dataset with an undirected (hyper)graph⁸ in the following way: nodes represent choice observations, and there is an edge between a set of observations
if and only if these observations cannot be rationalized using the same preference
ordering. The key observation in the proof is that a dataset is k-rationalizable if
and only if the corresponding graph is k-colorable.⁹ This equivalence is shown by

⁸ A hypergraph is a generalization of a graph in which edges may connect more than two vertices.
⁹ A k-coloring of a graph is a partition of its vertex set V into k color classes such that no edge in E is monochromatic. A graph is k-colorable if it admits a k-coloring.
taking each color class to represent consistency with a distinct ordering. Thus, the
problem in (2) can be seen as finding the smallest number of colors k such that the
greatest number of nodes are k-colorable.
Figure 2: Studying rationalizability of a dataset is equivalent to studying colorability of a graph in which nodes represent observations and edges represent inconsistencies.
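The hypergraph (rather than ordinary graph) structure matters: observations can be pairwise free of IIA violations yet jointly non-rationalizable. A minimal check on an invented three-observation dataset, which exhibits a revealed-preference cycle:

```python
from itertools import permutations

# Invented dataset: three observations with no pairwise IIA violation that
# nevertheless admit no single rationalizing ordering (a choice cycle),
# which is why the proof works with hyperedges, not pairwise edges alone.
D = [(1, frozenset({1, 2})), (2, frozenset({2, 3})), (3, frozenset({1, 3}))]

def iia_violation(o1, o2):
    (x1, A1), (x2, A2) = o1, o2
    return x1 != x2 and {x1, x2} <= (A1 & A2)

def rationalizable(obs):
    # brute force over all strict orderings of {1, 2, 3}
    for p in permutations([1, 2, 3]):
        rank = {x: i for i, x in enumerate(p)}
        if all(min(A, key=lambda z: rank[z]) == x for x, A in obs):
            return True
    return False

pairwise_ok = not any(iia_violation(D[i], D[j])
                      for i in range(3) for j in range(i + 1, 3))
print(pairwise_ok, rationalizable(D))  # True False
```

No pair of these observations violates IIA, yet the revealed relations x1 ≻ x2, x2 ≻ x3, x3 ≻ x1 form a cycle, so the three observations together must be joined by a single hyperedge.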
Fix any (multi-)set of choice problems A and suppose for the moment that
there is no choice error. Since the data is generated by perfect maximization of
K orderings, the corresponding hypergraph admits a K-coloring. Moreover, notice
that every set of observations in a K-violation of IIA constitutes a complete K-partite subgraph. Since, by assumption, the data includes at least d such sets, the
corresponding hypergraph includes at least d complete K-partite subgraphs. So it
cannot be colored by fewer than K colors.
Now introduce choice error. Each node is “corrupted” with probability p, following which its edges are re-arranged (the node is removed from some edges to
which it belongs, and new edges between this node and others are introduced). I
show that if choice error is introduced at a sufficiently low probability, then enough
complete K-partite graphs remain in the perturbed graph such that “most” nodes
can be partitioned into K (but not fewer) colors.
The following corollary presents recovery properties for a particular, convenient,
choice of λ.
Corollary 1. Let A be any (multi)set of M choice problems. Suppose p ≤ 0.05 and d(A) > [0.25/(0.95)^K] · M. The optimization problem in Eq. (2) with λ = 1/(0.1M) exactly recovers |R| = K with probability at least 1 − O(e^{−0.005M}).
Since the number of non-overlapping sets of size K from a set of M elements cannot exceed M/K, Condition (3) in Theorem 1 implies a tradeoff between the number of orderings that can be recovered and the probability of error that can be tolerated. For example, if p = 0.05 (5% probability of error) the theorem does not apply to sets with more than 6 orderings, and if p = 0.01 (1% chance of error), the theorem does not apply to sets with more than 35 orderings. In Corollary 1, the (stronger) requirement d(A) > [0.25/(0.95)^K] · M is satisfied only by sets A including three or fewer orderings. These strict thresholds are not necessary conditions for recovery, and can be relaxed in future work.¹⁰
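The recovery claim can be illustrated on a small synthetic instance (a sketch: the two orderings, the deterministic tremble rule, and M = 200 are invented; trembles to the second-best alternative stand in for the paper's arbitrary error distribution):

```python
from itertools import permutations, combinations

X = [1, 2, 3, 4]
# hypothetical K = 2 context-dependent orderings (preference lists: first is best)
R_true = [(1, 2, 3, 4), (4, 3, 2, 1)]

def choice(r, A):
    return next(x for x in r if x in A)

def second_best(r, A):
    return [x for x in r if x in A][1]

menus = [frozenset(c) for k in (2, 3, 4) for c in combinations(X, k)]  # 11 menus
M = 200
D = []
for i in range(M):
    A, f = menus[i % len(menus)], i % 2
    # every 20th observation is a "tremble" to the second-best alternative
    x = second_best(R_true[f], A) if i % 20 == 0 else choice(R_true[f], A)
    D.append((x, A))

orderings = list(permutations(X))
# for each candidate ordering, the set of observation indices it explains
explained = [frozenset(i for i, (x, A) in enumerate(D) if choice(r, A) == x)
             for r in orderings]

def min_err(k):
    return min(M - len(frozenset.union(*S)) for S in combinations(explained, k))

delta = {k: min_err(k) for k in (1, 2, 3)}
lam = 1 / (0.1 * M)  # Corollary 1's lambda = 1/(0.1 M)
best_k = min(delta, key=lambda k: k + lam * delta[k])
print(delta, best_k)
```

Despite the planted choice errors, the penalized objective selects two orderings: the second ordering explains roughly half the data, while a third would only mop up a handful of trembles, fewer than the 1/λ = 20 observations it would need to justify its inclusion.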
Below, I illustrate the implications of this approach and choice of λ using a
problem of preference elicitation considered in Crawford & Pendakur (2012).
Example 5.4. Crawford & Pendakur (2012) study preferences over six different
types of milk, using a dataset including 500 Danish households and their purchases.
Unsurprisingly, no single utility function can explain all 500 observations. To accommodate heterogeneity in preference, Crawford & Pendakur (2012) suggest an
application of Kalai, Rubinstein & Spiegler (2002), in which the minimal number
of utility functions that explain all of the data is found. They find that in fact 4-5
utility functions are sufficient to explain all of the data. This is a perfect multiple-preference fit to the data.
But they further comment that the fifth utility function explains only 8 out of 500
observations, and drop this utility function in many of their later analyses. This
highlights a limitation of the approach proposed in Kalai, Rubinstein & Spiegler
(2002): the approach ignores variation in the strength of evidence for recovered
preferences.
We can extend the approach in Kalai, Rubinstein & Spiegler (2002) by finding a "best" imperfect multiple-preference fit. Under the choice of λ = 1/(0.1 · 500) = 1/50 proposed in Corollary 1, preferences exist in a best multiple-ordering rationalization
proposed in Corollary 1, preferences exist in a best multiple-ordering rationalization
only if they uniquely explain at least 50 observations. The fifth utility function does
not satisfy this criterion, so the remaining 8 observations are interpreted as choice
error.¹¹
Corollary 2 considers a related context in which the set of choice problems A is
not fixed by the analyst, but generated by uniform sampling over the set of possible
choice problems P = {(A, f ) : A ∈ PX , f ∈ F }. A similar result obtains provided
the differentiation parameter d(P) is sufficiently high.
¹⁰ One way to relax this restriction is to count the number of possibly overlapping sets of choice problems that constitute a K-violation of IIA.
¹¹ It is important to note, however, that Crawford & Pendakur (2012) consider utility functions defined on the continuous space of price-quantity pairs, whereas my recovery results in Section 5 pertain to orderings defined on finite sets of alternatives.
Corollary 2. Suppose A consists of M choice problems sampled uniformly at random from P, and

d := d(P) > [(2p + β)/(1 − p)^K] · 2^N K    (4)

for some constant β > 0. Then for any δ ∈ (0, d(1 − p)^K/M − 2p − β), there is a constant c > 1 such that the optimization problem in Eq. (2) with λ = 1/((p + δ)M) exactly recovers |R| = K with probability at least 1 − O(c^{−M}).
The details of the proof are deferred to the appendix.
5.3 Can we recover more?
Section 4.1 provides conditions under which the problem in (2) recovers the correct
number of context-dependent orderings with high probability. Is it possible to
recover the context-dependent orderings themselves? First, I show that even in
the absence of choice error (p = 0), recovery of multiple preferences from choice
data is in general an ill-posed task. From Proposition 1, no set of three or more
context-dependent orderings can be uniquely recovered, and only special pairs of
context-dependent orderings can be recovered.
Next, I discuss the possibility of identifying an equivalence class (in choice
implications) containing the true set of context-dependent orderings. I provide an
example to illustrate that even this weaker notion of identifiability is not met. I
suggest that a more nuanced notion of complexity than cardinality is needed for
recovery of sets of context-dependent orderings, and conclude with preliminary
comments toward extension.
In the following, I say that R is identified if there exists data D such that
{R} = argmin |R0 | | E(D, R0 ) = 0 .
(5)
R0 ∈R
That is, there exists some set of choice observations (possibly including observation
of multiple choices from the same choice set) such that R is the unique set of
k ≤ |R| orderings that perfectly explains the data. We can think of the data as
“revealing” R to the data analyst. Otherwise, say that R is not identified. The
following observation provides an equivalent characterization.
Observation 1. R is identified if and only if there does not exist R′ ≠ R satisfying
|R′| ≤ |R| and E(C(R), R′) = 0.
Thus, if there exists any data which identifies R, then the dataset C(R) will
identify R.
Example 5.5. Fix X = {x1, x2, x3} and R = {(3, 2, 1), (3, 1, 2)}. The choice
implications of R,

C(R) = {(x1, {x1, x2}), (x1, {x1, x3}), (x2, {x2, x3}), (x3, {x2, x3}), (x1, {x1, x2, x3})},

form a subset of the choice implications of each

R′ ∈ {{(3, 2, 1), (2, 1, 3)}, {(3, 2, 1), (1, 2, 3)}, {(2, 3, 1), (3, 1, 2)}, {(1, 3, 2), (3, 1, 2)}}.

So every dataset that can be perfectly rationalized by the orderings in R can be
perfectly rationalized by the orderings in any such R′. Therefore, R is not identified.
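The inclusion claimed in this example can be checked mechanically. A minimal sketch, under the assumption that each ordering is encoded as a rank vector (entry i is the rank of 0-indexed alternative i, higher is better):

```python
from itertools import combinations

def implications(R):
    """Union of choice implications of the rank vectors in R (0-indexed alternatives)."""
    n = len(R[0])
    out = set()
    for r in R:
        for size in range(2, n + 1):
            for menu in combinations(range(n), size):
                out.add((max(menu, key=lambda i: r[i]), menu))
    return out

R = [(3, 2, 1), (3, 1, 2)]
rivals = [[(3, 2, 1), (2, 1, 3)], [(3, 2, 1), (1, 2, 3)],
          [(2, 3, 1), (3, 1, 2)], [(1, 3, 2), (3, 1, 2)]]
print(all(implications(R) <= implications(Rp) for Rp in rivals))  # True
```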
The following proposition establishes non-identifiability of most non-singleton
sets. Specifically, every set with at least three orderings is not identified, and a pair
of orderings is identified if and only if the orderings differ at extremes (the maximal
element in the first ordering is the minimal element in the second, and vice versa).
Proposition 1. If |R| ≥ 3, then R is not identified. If R = {r^1, r^2}, then R is
identified if and only if argmax_i r_i^j = argmin_i r_i^{3−j} for j = 1, 2. Every singleton
set R = {r} is identified.
The proof is deferred to the appendix. Proposition 1 leaves open the possibility
of recovering an equivalence class (in choice implications) containing the true set
of orderings. Say that R ∼ R′ if C(R) = C(R′), and let

[R] = {R′ ∈ R | R′ ∼ R}

be the equivalence class of R induced by ∼. Every set of orderings in [R] has the
same choice implications as R, so that an observation can be rationalized using an
ordering in R if and only if it can be rationalized using an ordering in any R′ ∈ [R]. Say that
R is choice-identified if there exists data D such that

[R] = argmin_{R′ ∈ R} { |R′| : E(D, R′) = 0 }.

That is, there exists some set of choice observations D such that every observation in D
is consistent with maximization of some ordering in R′ if and only if C(R′) = C(R).
The following negative example shows that even this weaker kind of identifiability
need not be satisfied by typical sets of orderings.
Example 5.6. Define

R = {(1, 2, 3, 4, 5), (2, 1, 3, 4, 5)}, and
R′ = {(1, 2, 3, 4, 5), (5, 4, 3, 2, 1)}.

Every choice observation consistent with maximization of some ordering in R is
also consistent with maximization of some ordering in R′, but the converse does
not hold. So C(R) ⊂ C(R′), implying both that R′ ∉ [R], and also that every
dataset that can be perfectly rationalized using the orderings in R can be perfectly
rationalized using the orderings in R′. It follows that [R] is not choice-identified.
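This strict inclusion can also be checked mechanically; a self-contained sketch, again encoding each ordering as a rank vector (entry i is the rank of 0-indexed alternative i, higher is better):

```python
from itertools import combinations

def implications(R):
    """Union of choice implications of the rank vectors in R (0-indexed alternatives)."""
    n = len(R[0])
    out = set()
    for r in R:
        for size in range(2, n + 1):
            for menu in combinations(range(n), size):
                out.add((max(menu, key=lambda i: r[i]), menu))
    return out

R  = [(1, 2, 3, 4, 5), (2, 1, 3, 4, 5)]
Rp = [(1, 2, 3, 4, 5), (5, 4, 3, 2, 1)]
print(implications(R) < implications(Rp))  # True: strict inclusion
```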
The example above highlights the (general) phenomenon that across sets with a
fixed number of orderings, there is variation in the “richness” of choice implication.
Since the approach in (2) penalizes all sets consisting of the same number of orderings equally, it is biased towards elicitation of sets with richer choice implications.
That is, if the decision maker’s context-dependent preferences are many but similar, the proposed approach will incorrectly interpret the data using orderings that
are fewer but “more different”. It may be possible to recover equivalence classes by
extending the approach in (2) to loss functions of the form
E(D, R) + λf (R),
where f penalizes the “richness” or “expressiveness” of the orderings in R, instead
of the cardinality. I leave this extension for future work.
6 Relationship to Literature
This paper extends ideas in Kalai, Rubinstein & Spiegler (2002), which defines a set
of orderings {≽_i}_{i=1}^L as a rationalization by multiple rationales of a choice function c :
P_X → X if for every choice set A ∈ P_X, the selected alternative c(A) is ≽_i-maximal
in A for some i = 1, 2, . . . , L. Using the notation of Section 2, any set of orderings
R with choice error E(D, R) = 0 is a rationalization by multiple rationales of the
dataset D. This set of orderings may not, however, correspond to a best multiple-ordering rationalization of the data as defined in (2). In particular, I suggest
that the analyst may prefer an imperfect rationalization of the data using some
K < L orderings to a perfect rationalization of the data using L orderings. The key
conceptual difference is that Kalai, Rubinstein & Spiegler (2002) is agnostic towards
the "degree of evidence" for any particular ordering ≽_k, whereas the approach in
this paper insists on sufficient evidence for each ordering in order to separate error
from preference variation.
The model of choice that I consider throughout is an extension of frame-dependent
preferences proposed independently in Salant & Rubinstein (2008) and Bernheim
& Rangel (2009), with the addition of choice error. In each of these papers, the
standard model is enriched by a set F of contexts.12 A choice problem13 is defined
12
Frames in Salant & Rubinstein (2008), and ancillary conditions in Bernheim & Rangel (2009).
13
Extended choice problem in Salant & Rubinstein (2008), and generalized choice situation in
Bernheim & Rangel (2009).
as a pair (A, f ) where A ⊆ X is a choice set and f ∈ F is a context. An extended
choice function c assigns to every extended choice problem (A, f ) an element of
A. I consider an extension of this model to allow for probability of error, so that
c(A, f ) is chosen with probability at least 1 − p, but with probability p the agent
trembles.
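As a sketch of this error-augmented model, the following simulates data from an extended choice function; the uniform tremble and the two hypothetical frames are illustrative assumptions, since the model only requires that c(A, f) be chosen with probability at least 1 − p:

```python
import random

def sample_choice(A, f, c, p, rng=random):
    """Draw one observation from choice problem (A, f): the agent picks c(A, f)
    with probability 1 - p and otherwise trembles. The uniform tremble over A is
    an illustrative assumption; the model only requires that c(A, f) be chosen
    with probability at least 1 - p."""
    if rng.random() < p:
        return rng.choice(sorted(A))
    return c(A, f)

# Hypothetical example: two frames, each maximizing a different rank vector.
ranks = {"frame1": (3, 2, 1), "frame2": (1, 2, 3)}   # ranks[f][i] = rank of alternative i
c = lambda A, f: max(A, key=lambda i: ranks[f][i])

random.seed(0)
data = [(sample_choice({0, 1, 2}, f, c, p=0.05), f) for f in ["frame1", "frame2"] * 50]
```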
Although my model of choice is very similar, the goals of this paper are very
different. Salant & Rubinstein (2008) characterizes the choice correspondence
Cc (A) = {x | c(A, f ) = x for some f ∈ F }, and Bernheim & Rangel (2009) proposes
a framework for welfare assessment. This paper studies the question of whether it
is possible to recover the number of contexts in F , using choice data alone. My
results in Section 5 show that recovery of context-dependent preferences given the
model proposed in Salant & Rubinstein (2008) and Bernheim & Rangel (2009) is
an ill-posed problem, but recovery of the number of contexts is possible even under
choice error.
This paper is related also to Ambrus & Rozen (2013), which shows that without
prior restriction on the number of selves involved in a decision, many multi-self
models have no testable implications. Although the set of choice models considered
in Ambrus & Rozen (2013) is distinct from the set of choice models considered
in my paper,14 their lesson that restricting the number of selves is important for
constraining the available degrees of freedom holds in my domain as well, and
motivates in part the suggested criterion in (2).
Finally, the applications that I suggest are related to exercises undertaken in
Crawford & Pendakur (2012) and Dean & Martin (2010), which respectively apply
the approaches of Kalai, Rubinstein & Spiegler (2002) and Houtman & Maks (1985)
to interpret inconsistent choice data.
7 Conclusion

Inconsistencies in choice data may emerge either from choice error or from maximization of multiple orderings. It is important to separate the two in analysis of
the data, since their implications for welfare assessment and prediction are very
different. But how does the analyst know how many distinct orderings are being maximized? This paper suggests use of statistical regularization to recover a
small number of context-dependent preferences from noisy choice data. I show that
with probability exponentially close to 1, the proposed approach is able to recover
the true number of context-dependent preferences. This provides an alternative
14
Ambrus & Rozen (2013) study multi-self models in which every self is active in every decision,
and choice is determined through maximization of a choice-set independent aggregation rule over
selves. In contrast, I study multi-self models in which every self acts as a “dictator” in a subset of
choices, thus varying the aggregation rule across choice problems.
to existing approaches, which deliver either a single “best-fit” ordering or multiple
“perfect-fit” orderings.
Appendix: Proofs from Main Text
A Theorem 1

A.1 Preliminary Notation and Results
I use the following objects and definitions. A hypergraph is a pair H = (V, E) where
V is a finite nonempty set, called the set of vertices, and E is a family of distinct
subsets of V , called the set of edges.15 A k-coloring of a hypergraph is a partition
of its vertex set V into k color classes such that no edge in E is monochromatic.
A hypergraph is k-colorable if it admits a k-coloring. Finally, G = (V, E) is a
complete k-partite graph if there is a partition {V_i}_{i=1}^k of the vertex set V such
that {u, v} ∈ E if and only if u and v lie in different classes of the partition. The set of all
hypergraphs on M vertices is denoted H. In the remainder of this proof, I refer to
hypergraphs simply as graphs.
These concepts are related to our problem as follows. Enumerate the observations in any dataset D = {(x, A) : A ∈ A} as {(x_i, A_i)}_{i=1}^M. These choice observations can be identified with a graph H = (V, E) where V = {1, 2, . . . , M} indexes
observations, and E consists of every set T ⊆ V such that: (1) the observations in
{(x_i, A_i) | i ∈ T} are inconsistent16, and (2) no proper subset of {(x_i, A_i) | i ∈ T}
is inconsistent. I refer to the vertices of H and the observations they represent
interchangeably.
Claim 3. The following statements are equivalent:
1. H is k-colorable.
2. D is k-rationalizable.
Proof. Take each color class to represent consistency with a distinct ordering, and
the equivalence directly follows.
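For small instances, Claim 3 can be exercised directly by brute force: search over assignments of observations to k color classes and test whether each class is consistent with a single ordering. A minimal sketch (the dataset and rank-vector encoding are illustrative):

```python
from itertools import permutations, product

def consistent(obs):
    """True if a single ordering rationalizes every (choice, menu) pair in obs."""
    if not obs:
        return True
    alts = sorted({i for _, A in obs for i in A})
    for perm in permutations(alts):
        rank = {a: j for j, a in enumerate(perm)}
        if all(x == max(A, key=rank.__getitem__) for x, A in obs):
            return True
    return False

def k_rationalizable(D, k):
    """Brute-force version of Claim 3: D is k-rationalizable iff the observations
    can be partitioned into k classes, each consistent with a single ordering --
    equivalently, iff the violation hypergraph is k-colorable."""
    for labels in product(range(k), repeat=len(D)):
        groups = [[o for o, g in zip(D, labels) if g == c] for c in range(k)]
        if all(consistent(g) for g in groups):
            return True
    return False

# Illustrative data over X = {0, 1, 2}, mixing two orderings:
D = [(0, (0, 1)), (1, (0, 1)), (2, (0, 1, 2))]
print(k_rationalizable(D, 1), k_rationalizable(D, 2))  # False True
```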
For any graph H, let f_H be the linear interpolation of the points {(k, ∆_{k,H}) : k ∈
Z+}, where ∆_{k,H} is the minimal number of nodes that must be removed from H for
H to become k-colorable.17 Let F_H be the convex hull of the epigraph of f_H (see
Figure A.1), and define c := 1/λ. Then, if there does not exist k ∈ N satisfying

∆_{k,H} < ∆_{K,H} + c(K − k),18    (6)

the line h = {x | (−1, −λ) · x = (−1, −λ) · (K, ∆_{K,H})} properly supports F_H at
(K, ∆_{K,H}), and any solution R∗ to the minimization problem argmin_{R⊆R} |R| +
λE(D, R) must satisfy |R∗| = K, as desired.

15
This is a generalization of a graph in which edges may connect more than two vertices.
16
There does not exist an ordering r such that x_i is r-maximal in A_i for every i ∈ T.
17
This is equivalent to the definition of ∆_k used in the main text, through Claim 3.
18
In vector notation, (−1, −λ) · (K − k, ∆_{k,H} − ∆_{K,H}) ≥ 0.
[Figure: plot of f_H and the convexified epigraph F_H against k, marking the point (K, ∆_{K,H}).]
Figure 3: Any choice of λ for which (−1, −λ) is a subgradient of f_H at (K, ∆_{K,H}) will recover K. With high probability, the set of vectors {(−1, −1/((p + δ)M)) | δ ∈ (0, d(1 − p)^K/M − 2p − β)} is a subset of the subdifferential of f_H at (K, ∆_{K,H}).
Finally, suppose that p = 0, so that the agent perfectly maximizes his context-dependent ordering. This identifies a deterministic graph G.
Claim 4. G includes at least d non-overlapping complete K-partite subgraphs.19
Proof. Each subgraph induced by the vertices in a K-violation of IIA is a complete
K-partite graph.
A.2 Main Proof
Imperfect maximization using the random choice rule P generates a probability
distribution over H. Denote the random graph with this distribution by H. Fix
δ ∈ (0, d(1 − p)^K/M − 2p − β) and λ = 1/((p + δ)M). I will now show that the probability
that no k ∈ N satisfies (6) is at least 1 − O(c^{−M}), from which it will follow that
the probability that Eq. (2) recovers K is at least 1 − O(c^{−M}).

In the subsequent claims, take S ∼ Bin(M, p) to be the number of observations
which are imperfectly maximized, and take V_E ⊆ V to be the random variable
whose outcome is the set of imperfectly maximized observations.
Lemma 1. The probability that no k > K satisfies (6) is at least 1 − e^{−2δ^2 M}.

19
Two subgraphs are said to be non-overlapping if they do not share vertices.

Proof. Since ∆_{K,H} ≤ S, if there exists k > K such that ∆_{k,H} < ∆_{K,H} − c(k − K),
then necessarily S > c = (p + δ)M; otherwise, ∆_{k,H} < c(K − k + 1) ≤ 0. Since
E(S) = pM, it follows from Hoeffding's Inequality that

Pr(S ≥ c) = Pr(S − pM ≥ c − pM) ≤ exp(−2((p + δ)M − pM)^2 / M) = e^{−2δ^2 M},

and therefore the probability that no k > K satisfies (6) is at least 1 − e^{−2δ^2 M}, as
desired.
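The Hoeffding step can be sanity-checked numerically; a minimal simulation, assuming for illustration the Corollary 1 values δ = 0.05 and an error rate p = 0.05:

```python
import math, random

# Numerical sanity check (not part of the proof): for S ~ Bin(M, p),
# Pr(S >= (p + delta) M) <= exp(-2 delta^2 M).
M, p, delta = 500, 0.05, 0.05
c = (p + delta) * M                     # the threshold c = (p + delta) M = 50
bound = math.exp(-2 * delta**2 * M)     # Hoeffding upper bound (about 0.082)

random.seed(1)
trials = 2000
hits = sum(sum(random.random() < p for _ in range(M)) >= c for _ in range(trials))
print(hits / trials <= bound)  # True
```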
Lemma 2. The probability that no k < K satisfies (6) is at least

(1 − e^{−β^2 M/2}) (1 − exp(−(1 − p)^K β^2 M / (2(2p + δ + β)))).

Proof. If there exists k < K satisfying (6), then H must include strictly fewer than
c + S non-overlapping complete K-partite graphs. Otherwise, ∆_{k,H} ≥ (c + S)(K −
k) ≥ ∆_{K,H} + c(K − k) for every k < K, since a complete K-partite graph cannot be
made k-colorable without the removal of at least K − k vertices.
Define H_P to be the random subgraph of H induced by vertices in V \ V_E (perfectly maximized observations). Then, if H_P contains at least c + S non-overlapping
complete K-partite graphs, H must also. I determine the probability that H_P includes at least c + S non-overlapping
complete K-partite graphs as a lower bound.

I first show that S < (p + β/2)M with probability at least 1 − e^{−β^2 M/2}, and
subsequently that conditional on the event {S < (p + β/2)M}, subgraph H_P includes at
least c + S non-overlapping complete K-partite graphs with probability at least
1 − exp(−(1 − p)^K β^2 M / (2(2p + δ + β))). The first statement follows immediately from Hoeffding's inequality, since E(S) = pM and

Pr(S < (p + β/2)M) ≥ 1 − e^{−β^2 M/2}.    (7)
Suppose S < (p + β/2)M. Index the d complete K-partite subgraphs in G (existence
from Claim 4) by i = 1, 2, . . . , d. Let X_i be the indicator variable which takes value
1 if every vertex in complete K-partite subgraph i is perfectly maximized, and let
X = Σ_{i=1}^d X_i. Notice X_i ∼ Ber((1 − p)^K) for every i and EX = d(1 − p)^K. Using
Hoeffding's inequality,

Pr(X < c + S) = Pr(X − d(1 − p)^K < c + S − d(1 − p)^K)
≤ exp(−2(c + S − d(1 − p)^K)^2 / d).
Since by assumption δ ∈ (0, d(1 − p)^K/M − 2p − β), it follows that d ≥ (2p + β + δ)M / (1 − p)^K.
Therefore, d(1 − p)^K > (2p + δ + β)M > c + S, so

∂f(S)/∂d = −2(1 − p)^{2K} + 2(c + S)^2/d^2 < 0, and
∂f(S)/∂S = 4(1 − p)^K − 4(c + S)/d > 0,

where f(S) = −2(c + S − d(1 − p)^K)^2 / d. Then, using the upper bound on S and the lower
bound on d,

exp(−2(c + S − d(1 − p)^K)^2 / d) ≤ exp(−(1 − p)^K β^2 M / (2(2p + δ + β))).    (8)

From (7) and (8), the probability that no k < K satisfies (6) is at least

(1 − e^{−β^2 M/2}) (1 − exp(−(1 − p)^K β^2 M / (2(2p + δ + β)))),

as desired.
Using Lemmas 1 and 2, the probability that no k ∈ Z+ satisfies (6) is at least

(1 − e^{−β^2 M/2}) (1 − exp(−(1 − p)^K β^2 M / (2(2p + δ + β)))) − exp(−2δ^2 M) = 1 − O(c^{−M}),

where c = min{ exp(β^2/2), exp((1 − p)^K β^2 / (2(2p + δ + β))), exp(2δ^2) }.
B Corollary 1

Take β = 0.1 and δ = 0.05. Then for any d > (2p + β + δ)M / (1 − p)^K = 0.25M / (1 − p)^K,

d(1 − p)^K/M − 2p − β > 0.25 − 2p − β = 0.05,

so that δ = 0.05 ∈ (0, d(1 − p)^K/M − 2p − β). Directly apply Theorem 1.
C Corollary 2

By assumption, P includes at least d non-overlapping sets of choice problems in
K-violation of IIA. Enumerate the Kd choice problems included in a K-violation
using i = 1, . . . , Kd. Let Z_i be the random variable whose outcome is the number
of times choice problem i is sampled, and let Q_i be the event {Z_i < αM/(2^N K)},
where

α = (2p + β/2 + δ) / (2p + β + δ) < 1.

Then,

Pr(∪_{i=1}^{Kd} Q_i) ≤ Σ_{i=1}^{Kd} Pr(Q_i) = Kd · Pr(Z_1 < αM/(2^N K)).
Since Z_1 ∼ Bin(M, 1/(2^N K)) and E(Z_1) = M/(2^N K), it follows from Hoeffding's Inequality
that

Kd · Pr(Z_1 − M/(2^N K) < αM/(2^N K) − M/(2^N K)) ≤ Kd · exp(−2((1 − α)M/(2^N K))^2 / M)
= Kd · exp(−(1 − α)^2 M / 2^{2N−1}) =: g(K, d, p, β, δ, M).    (9)

Therefore Pr[Z_i ≥ αM/(2^N K) for every i] = 1 − Pr(∪_{i=1}^{Kd} Q_i) ≥ 1 − Kd · exp(−(1 − α)^2 M / 2^{2N−1}).

Conditional on the event {Z_i ≥ αM/(2^N K) for every i}, there are at least dαM/(2^N K) >
(2p + β/2 + δ)M / (1 − p)^K non-overlapping sets of K choice problems in K-violation of IIA in the
sampled data, and we can apply Theorem 1 to conclude that the probability of recovery
has lower bound

f(K, d, p, β, δ, M) := (1 − e^{−β^2 M/8}) (1 − exp(−(1 − p)^K β^2 M / (8(2p + δ + β/2)))) − exp(−2δ^2 M).    (10)
From (9) and (10), the probability of recovery is at least

(1 − g(K, d, p, β, δ, M)) f(K, d, p, β, δ, M) = 1 − O(c^{−M})

with

c := c(K, d, p, β, δ) = min{ exp(β^2/8), exp((1 − p)^K β^2 / (8(2p + δ + β/2))), exp(2δ^2),
exp((1/2^{2N−1}) (1 − (2p + β/2 + δ)/(2p + β + δ))^2) } > 1,

as desired.
D Proposition 1

For any set of orderings R and ordering r ∈ R, define g(r, R) to be the set of
all choice observations consistent with maximization of r and inconsistent with
maximization of any other r′ ∈ R.20 The set of revealed preferences in g(r, R) is
given by the binary relation

B_r := {(x, y) : (x, A) ∈ g(r, R) for some A including y}.

20
For example, if R = {(1, 2, 3), (2, 3, 1)}, then g((1, 2, 3), R) = {(x3, {x1, x2, x3}), (x3, {x1, x3}),
(x3, {x2, x3})}, since these observations are consistent with maximization of (1, 2, 3) and inconsistent with maximization of (2, 3, 1).

Let B̄_r be its transitive closure. The following is a necessary condition for identifiability of R.
Claim 5. Suppose there exist orderings r, r′ ∈ R such that

argmax_i r_i ≠ argmin_i r′_i.    (11)

Then, R is not identified.
Proof. First, I show that R = {r^1, . . . , r^K} is identified only if B̄_r is complete for
every r ∈ R. Suppose to the contrary that R is identified, but (without loss of
generality) B̄_{r^1} is not complete. Then there exists some ordering r̂^1 ≠ r^1 such that
every choice observation in g(r^1, R) is consistent with maximization of r̂^1, so we
can replace r^1 with r̂^1 in R without loss of any choice implications. Formally, define
R′ = {r̂^1, . . . , r^K}. Since C(R) ⊆ C(R′), it follows from Observation 1 that R is
not identified.

Next, I show that if there exist orderings r, r′ ∈ R satisfying (11), then B̄_r is
not complete for some r ∈ R. Index the alternatives such that x1 := argmax_i r_i
and x2 := argmin_i r′_i. I show that neither (x1, x2) nor (x2, x1) is in B̄_{r′}, and hence
B̄_{r′} is not complete. Suppose towards contradiction that (x1, x2) ∈ B̄_{r′}. Then
(x1, A) ∈ g(r′, R) for some A ∈ P_X. But since x1 is r-maximal, every observation in
which x1 is selected is consistent with maximization of ordering r. Thus, for every
A ∈ P_X, choice observation (x1, A) ∉ g(r′, R). This yields the desired contradiction.
Suppose alternatively that (x2, x1) ∈ B̄_{r′}. Then, (x2, A) ∈ g(r′, R) for some A ∈
P_X. But x2 is ranked last according to r′, so (x2, A) ∉ g(r′, R) for every A ∈ P_X.
This yields the desired contradiction. Therefore, if there exist orderings r, r′ ∈ R
satisfying (11), then R is not identified.
It follows immediately from Claim 5 that every set R = {r^1, r^2} with argmax_i r_i^j ≠
argmin_i r_i^{3−j} for some j ∈ {1, 2} is not identified. Moreover, since every set R with
|R| ≥ 3 must include orderings satisfying (11), it follows from Claim 5 that every
set R with three or more orderings is not identified.

Next, I show that sets R = {r^1, r^2} with argmax_i r_i^j = argmin_i r_i^{3−j} for j =
1, 2 are identified. Index the alternatives such that x1 = argmax_i r_i^1 and xN =
argmax_i r_i^2, and define D = C(r^1) ∪ C(r^2). Suppose to the contrary that there
exists a set of orderings R′ = {r̂^1, r̂^2} ≠ R such that E(D, R′) = 0. I show a contradiction by identifying an observation in D which is inconsistent with maximization
of both r̂^1 and r̂^2.
First observe that necessarily either x1 is highest ranked in r̂^1 and xN is highest
ranked in r̂^2, or vice versa, since (x1, X), (xN, X) ∈ D. Without loss of generality,
suppose the former. Since r̂^1 ≠ r^1, there exist alternatives xk, xl with k, l ∉ {1, N}
such that r̂_k^1 < r̂_l^1 and r_k^1 > r_l^1; that is, xk is higher ranked than xl under r^1
but not under r̂^1. Let A consist of xk together with all alternatives ranked lower than xk in
ordering r^1, noting that xN ∈ A since xN = argmin_i r_i^1 by assumption. Then,
choice observation (xk, A) ∈ C(r^1), but is inconsistent with maximization of r̂^2
since xN ∈ A. Moreover, (xk, A) is inconsistent with maximization of r̂^1 since
xl ∈ A. This yields the desired contradiction.
Finally, every singleton set R = {r} is trivially identified using the set of all of
its choice implications C(r).
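For |X| = 3, the characterization of identified pairs can be verified exhaustively from the definition in Observation 1; a brute-force sketch (the rank-vector encoding is an illustrative assumption):

```python
from itertools import permutations, combinations

def implications(R):
    """Union of choice implications of the rank vectors in R (0-indexed alternatives)."""
    n = len(R[0])
    out = set()
    for r in R:
        for size in range(2, n + 1):
            for menu in combinations(range(n), size):
                out.add((max(menu, key=lambda i: r[i]), menu))
    return out

def explains(Rp, data):
    """E(data, Rp) = 0: every observation is consistent with some ordering in Rp."""
    return data <= implications(Rp)

orders = list(permutations((1, 2, 3)))     # all rank vectors over three alternatives
for R in combinations(orders, 2):
    data = implications(R)
    # Observation 1: R is identified iff no other R' with |R'| <= |R| explains C(R).
    identified = not any(
        set(Rp) != set(R) and explains(Rp, data)
        for k in (1, 2) for Rp in combinations(orders, k)
    )
    r1, r2 = R
    extremes = (r1.index(max(r1)) == r2.index(min(r2))
                and r2.index(max(r2)) == r1.index(min(r1)))
    assert identified == extremes
print("Proposition 1 verified for all pairs over |X| = 3")
```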
References

Ambrus, Attila & Kareen Rozen. 2013. "Rationalizing Choice with Multi-Self Models." Economic Journal.

Belloni, Alexandre, Victor Chernozhukov, Dan Chen & Christian Hansen. 2012. "Sparse Models and Methods for Instrumental Regression, with an Application to Eminent Domain." Econometrica.

Belloni, Alexandre, Victor Chernozhukov & Lie Wang. 2011. "Square-Root Lasso: Pivotal Recovery of Sparse Signals via Conic Programming." Biometrika.

Belloni, Alexandre & Victor Chernozhukov. 2011. "L1-Penalized Quantile Regression in High-Dimensional Sparse Models." Annals of Statistics.

Bernheim, B. Douglas & Antonio Rangel. 2009. "Beyond Revealed Preference: Choice Theoretic Foundations for Behavioral Welfare Economics." Quarterly Journal of Economics.

Beshears, John, James Choi, David Laibson & Brigitte Madrian. 2008. The Importance of Default Options for Retirement Saving Outcomes: Evidence from the United States. Oxford University Press pp. 59–87.

Block, H.D. & Jacob Marschak. 1960. "Random Orderings and Stochastic Theories of Response." Contributions to Probability and Statistics.

Crawford, Ian & Krishna Pendakur. 2012. "How Many Types Are There?" Economic Journal.

Dean, Mark & Daniel Martin. 2010. "How Consistent are your Choice Data?" Working Paper.

Donoho, David L. & Xiaoming Huo. 2001. "Uncertainty Principles and Ideal Atomic Decomposition." IEEE Transactions on Information Theory.

Einav, Liran, Amy Finkelstein, Iuliana Pascu & Mark Cullen. 2012. "How General are Risk Preferences? Choices under Uncertainty in Different Domains." American Economic Review.

Elad, Michael & Alfred Bruckstein. 2002. "A Generalized Uncertainty Principle and Sparse Representation." IEEE Transactions on Information Theory.

Fudenberg, Drew, Ryota Iijima & Tomasz Strzalecki. 2015. "Stochastic Choice and Revealed Perturbed Utility." Econometrica.

Gabaix, Xavier. 2014. "A Sparsity-Based Model of Bounded Rationality, with Application to Basic Consumer and Equilibrium Theory." Quarterly Journal of Economics.

Houtman, Martijn & J.A.H. Maks. 1985. "Determining all Maximal Data Subsets Consistent with Revealed Preference." Kwantitatieve Methoden 19:89–104.

Huang, Po-Sen, Scott Deeann Chen, Paris Smaragdis & Mark Hasegawa-Johnson. 2013. "Singing-Voice Separation from Monaural Recordings Using Robust Principal Component Analysis." International Conference on Acoustics, Speech, and Signal Processing.

Huber, Joel, John Payne & Christopher Puto. 1982. "Adding Asymmetrically Dominated Alternatives: Violations of Regularity and the Similarity Hypothesis." Journal of Consumer Research.

Kahneman, Daniel & Amos Tversky. 2000. Choices, Values, and Frames. The Press Syndicate of the University of Cambridge.

Kalai, Gil, Ariel Rubinstein & Ran Spiegler. 2002. "Rationalizing Choice Functions by Multiple Rationales." Econometrica 70(6):2481–2488.

Ren, Xiang, Zhendong Zhang & Yi Ma. 2012. "Repairing Sparse Low-Rank Texture." Journal of LaTeX Class Files.

Salant, Yuval & Ariel Rubinstein. 2008. "(A, f): Choice with Frames." The Review of Economic Studies.

Yang, Allen, Zihan Zhou, Arvind Ganesh, S. Shankar Sastry & Yi Ma. 2013. "Fast L1-Minimization Algorithms for Robust Face Recognition." Computer Vision and Pattern Recognition.