Interpretation of Inconsistent Choice Data:
How Many Context-Dependent Preferences are There?
Annie Liang∗
October 22, 2015
Abstract
Inconsistencies in choice data may emerge either from context dependencies
in preference or from stochastic choice error. These inconsistencies are quite
different, and have distinct implications for welfare assessment and prediction.
How can the analyst separate the two in data? This paper provides a tool
for identifying the number of context-dependent preferences in noisy choice
data. Using the technique of statistical regularization, I define a best multipleordering rationalization as one that maximizes fit to the data subject to a
penalty on the number of orderings used. I show that although recovery of the
orderings themselves is an ill-posed problem, exact recovery of the number of
context-dependent orderings is feasible with probability exponentially close to
1 (in quantity of data) using the proposed approach.
1 Introduction
In the simplest model of choice, an individual’s preference is described as a linear
ordering r over alternatives in a set X, and his choice from any subset A ⊆ X
is the r-maximal element in A. It is common to interpret choice data under this
model, and to infer a single ordering best fit to the data. In practice, however,
choice datasets may result from the maximization of several different preferences.
For example:
1. Heterogeneity in preference across choice domains. Choice data frequently
pools observations drawn from a variety of choice domains, but agents may
∗ Department of Economics, Harvard University. Email: [email protected]. I am especially grateful to Jerry Green for invaluable advice and support. I am also grateful to Emily Berger, Gabriel Carroll, Drew Fudenberg, David Laibson, Eric Maskin, Jose Montiel Olea, Gleb Romanyuk, Andrei Shleifer, Ran Shorrer, and Tomasz Strzalecki for useful comments and discussions.
have different preferences in different domains. For example, Einav, Finkelstein, Pascu & Cullen (2012) study the commonality of financial risk preferences across six choice domains — including 401(k) asset allocations, short-term disability insurance, and insurance choices regarding health, drug, and dental expenditures — and find that just over 30% of their sample makes
dental expenditures — and find that just over 30% of their sample makes
decisions that can be simultaneously rationalized over all six domains.
2. Multiple behavioral selves. There is extensive empirical evidence that preference varies with external details of the choice environment, for example the
framing of the problem (Kahneman & Tversky 2000), the presence of default
options (Beshears, Choi, Laibson & Madrian 2008), and the addition of decoy
options (Huber, Payne & Puto 1982). Outside of the laboratory, it is unusual
for external details to remain constant across every observed choice; choices
may therefore reflect different behavioral biases in different observations.
3. Multiple representative agents. Choice data is often pooled from a population, across which there may be subpopulations or clusters of individuals with
different preferences. Ideally, agents in different clusters are identifiably different, but in practice the observable characteristics of these agents may be
indistinguishable (see, for example, Crawford & Pendakur (2012)).
In each of these settings, the analyst may not know beforehand the number of
context-dependent preferences maximized in the data. But context-dependence is
a persistent feature of preference; thus, it is important to know the number of contexts for two reasons: 1) Welfare implications—it tells us whether standard welfare
analysis is appropriate for interpretation of the observed choices, or whether there
is genuine preference variation that should be elicited to inform normative statements; 2) Prediction—it tells us whether to interpret the observed inconsistency as
noise, or as systematic variation that can inform prediction of future choices.
The purpose of this paper is to provide a tool for determining the number of
context-dependent preferences maximized in the data. The challenge is that inconsistencies may reveal genuine context-dependencies, but may also simply reveal
choice error (for example, due to inattention by the subject or measurement error
by the analyst). At extremes, we can rationalize the data using only multiplicity in rationales (in which case every observation in conflict is described with a
new ordering) or using only choice error (in which case every observation in conflict is described as a mistake). How can the analyst recover the true number of
context-dependent preferences from choice data?
This paper proposes a simple approach using regularization,¹ a statistical technique in which a penalty is imposed on the complexity of the learned model to

¹ For example, two common regularization techniques for learning the regression function f(x) = xβ extend the usual OLS approach, min_β Σ_i (y_i − x_i β)², to:
prevent overfitting. Regularization techniques have received tremendous interest
in recent decades in the applied mathematics and computer science communities due to their ability to recover various sparse structures from noisy or otherwise corrupted data. Applications range from recovery of signals (Donoho &
Huo 2001, Elad & Bruckstein 2002) to repair of damaged images and videos (Ren,
Zhang & Ma 2012, Yang, Zhou, Ganesh, Sastry & Ma 2013) to lyrics and music
separation (Huang, Chen, Smaragdis & Hasegawa-Johnson 2013).2 The sparse
structure in my problem is preference: while the agent may possess as many as
|X|! context-dependent orderings over X, I assume that the number of orderings
he maximizes is much smaller.
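The footnoted regression case can be made concrete with a minimal ridge sketch (the data, coefficients, and penalty values below are invented for illustration): as the penalty weight grows, the learned coefficients are shrunk toward zero, which is the sense in which regularization penalizes model complexity.

```python
import numpy as np

# Synthetic regression instance (all numbers invented for illustration):
# the ridge penalty lam * ||beta||^2 shrinks the learned coefficients.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=50)

def ridge(X, y, lam):
    # closed-form ridge solution: (X'X + lam * I)^(-1) X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

for lam in (0.0, 10.0, 1000.0):
    print(lam, np.round(ridge(X, y, lam), 3))  # coefficients shrink toward 0
```

Lasso replaces the squared ℓ2 penalty with an ℓ1 penalty, which additionally drives some coefficients exactly to zero.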
I adapt methods from this literature to suggest an “optimal” number of orderings to use in describing the data. A best multiple-ordering rationalization (BMOR)
is defined as a solution to the following program:

argmin_{R∈R} |R| + λE(D, R),    (1)
where R consists of all sets of orderings over X, E(D, R) is the number of observations in dataset D that are inconsistent with maximization of any ordering in R,
and λ ∈ R+ is a constant. The program in (1) thus maximizes fit (by minimizing
the number of unexplained observations E(D, R)) subject to a penalty on the number of orderings used (by minimizing |R|), and the constant λ trades off between
these two goals.
Notice that the problem in (1) nests as special cases two well-known approaches in the literature. A sufficiently small choice of λ > 0 returns the Houtman & Maks (1985) solution: the ordering that explains the largest number of observations in the data. A choice of λ > 1 returns the Kalai, Rubinstein & Spiegler (2002) solution: the smallest set of orderings that explains all of the data. The problem in (1) generalizes these two approaches by considering intermediate values of λ ∈ (0, 1).
For what choices of λ, and with what guarantee, does the solution to (1) recover
features of the agent’s preference? The main part of the paper shows that if choice
data is generated by any of a large class of models, there is an interval of choices
of λ for which the approach in (1) exactly recovers the correct number of orderings
with probability exponentially close to 1. The class of data-generating processes
I consider is the following. Let F be a (finite) set of K contexts and let R =
• min_β Σ_i (y_i − x_i β)² + λ‖β‖₂² (ridge regularization), and

• min_β Σ_i (y_i − x_i β)² + λ‖β‖₁ (lasso regularization).

Ridge regression penalizes the complexity of the learned model through the ℓ2 norm on the vector of coefficients, and lasso regression penalizes the ℓ1 norm.
² These ideas and techniques have recently begun to be applied to economic problems. See for example Belloni, Chernozhukov & Wang (2011), Belloni & Chernozhukov (2011), Belloni, Chernozhukov, Chen & Hansen (2012), and Gabaix (2014).
{rf}f∈F be a set of context-dependent preferences.³ A choice problem is a pair (A, f) consisting of a choice set A and a context f. In each choice problem, the agent selects the rf-optimal alternative in A with probability at least 1 − p;
otherwise, he trembles and selects a different alternative. I do not impose any
parametric assumptions on either the pattern of error or the nature of the orderings.
Taking K = 1 and p = 0 returns the canonical single-ordering model, and taking
K ≥ 1 and p = 0 returns the generalized choice functions proposed independently
in Salant & Rubinstein (2008) and Bernheim & Rangel (2009).
My main result (Theorem 1) shows that if the number of orderings K is sufficiently small (relative to the number of possible orderings, |X|!), the probability p
of erring in each choice is sufficiently low, and the choice implications of the preference orderings in R are sufficiently different (see Section 5.2), then the problem in (1) recovers the exact number of orderings K with probability exponentially close to
1 (in quantity of data). Section 5.3 qualifies this result: while we can recover the
number of orderings, it is not in general possible to recover the orderings. I make
preliminary comments towards extension of the approach to recover further details
of preference.
Section 6 is the literature review, and Section 7 concludes.
2 Notation
Let X denote the set of choice alternatives, A denote a typical subset of X, and
PX denote the set of all subsets of X. I refer to A as a choice set, and PX as the
set of all choice sets. A choice observation (x, A) is a pair denoting selection of
alternative x from choice set A. A dataset is a collection of choice observations
D = {(x, A) | A ∈ A},
where A is a (multi)set of elements from PX .
A strict linear ordering is a complete, antisymmetric, and transitive binary
relation on X. Let R be the set of all permutations of (1, 2, . . . , N ), with typical
element r. Identify every linear ordering ≻ with the permutation or preference ordering r = (r1, r2, . . . , rN) ∈ R satisfying ri < rj if and only if xi ≻ xj. (For example, x1 ≻ x3 ≻ x2 is identified with r = (1, 3, 2).) Coordinate ri can be interpreted as the ordinal rank of alternative xi according to the ordering ≻. The
choice function cr : PX → X induced by r takes every choice set A ∈ PX to the
r-maximal element in A. Say that dataset D is consistent if there exists an ordering
r such that (x, A) ∈ D only if x = cr (A), and inconsistent otherwise. Two choice
observations (x, A) and (x′, A′) are said to be in violation of the Independence of Irrelevant Alternatives axiom (IIA) if x ≠ x′ and x, x′ ∈ A ∩ A′, so that they cannot be rationalized by the same strict ordering.

³ In the special case in which the set of contexts is a partition of the power set on X, R describes a set of menu-dependent preferences.
For a given set of orderings R ∈ R, define
C(R) := {(x, A) | A ∈ PX and x ∈ {cr(A) : r ∈ R}}
to be the set of all choice observations consistent with maximization of some ordering in R. I refer to these as the choice implications of the set R. For example, let
X = {x1 , x2 , x3 } and define orderings r1 = (1, 2, 3) and r2 = (1, 3, 2). Then,
C(R) ={(x1 , {x1 , x2 }), (x1 , {x1 , x3 }), (x2 , {x2 , x3 }),
(x3 , {x2 , x3 }), (x1 , {x1 , x2 , x3 })}.
3 Example
I begin by illustrating ideas on a toy choice dataset, and subsequently describe
the general approach in Section 4. Consider a set of choice alternatives X =
{x1 , x2 , x3 , x4 } and the dataset D consisting of the following 21 observations:
• Choice of x1 from every proper subset containing x1 .
• Choice of x2 from {x2 , x3 }, {x2 , x4 }, and {x2 , x3 , x4 }.
• Choice of x3 from {x3 , x4 }.
• Choice of x4 from every proper subset containing x4 .
• Choice of x3 from {x1 , x3 }, {x2 , x3 }, and {x1 , x2 , x3 }.
• Choice of x2 from {x1 , x2 }.
• Choice of x3 from {x1 , x3 , x4 }.
There are many possible rationalizations of this data. If the analyst uses a single
ordering to explain the data, he can explain up to 10 observations, for example
with
R1 = {x1 ≻ x2 ≻ x3 ≻ x4 }.
The minimal obtainable choice error using a single ordering is ∆1 = 21 − 10 = 11.
If the analyst allows for two orderings, he can explain 20 observations, for example
with
R2 = {x1 ≻ x2 ≻ x3 ≻ x4 , x4 ≻ x3 ≻ x2 ≻ x1 }.
The minimal obtainable choice error using two orderings is ∆2 = 21 − 20 = 1. Finally, if the analyst allows for three orderings, he can explain all of the observations,
for example with
R3 = {x1 ≻ x2 ≻ x3 ≻ x4 , x4 ≻ x3 ≻ x2 ≻ x1 , x3 ≻ x1 ≻ x2 ≻ x4 }.
The minimal obtainable choice error using three orderings is ∆3 = 0. These observations are collected in the figure below, which graphs the minimal obtainable
choice error ∆k for values k = 1, . . . , 5.
[Figure: the minimal obtainable choice error ∆k plotted against the number of orderings k, with ∆1 = 11, ∆2 = 1, and ∆3 = 0.]
How should the analyst choose between these solutions? Notice that the ordering in R1 is consistent with the proposal of Houtman & Maks (1985), which finds
the largest subset of choice observations that can be explained by a single ordering.⁴ The set of orderings R3 is consistent with the proposal of Kalai, Rubinstein
& Spiegler (2002), which finds the smallest number of orderings that can perfectly
explain all observations.
This paper proposes an intermediate solution. Define the set of best multiple-ordering rationalizations to be the solution to

argmin_{R∈R} |R| + λE(D, R),

choosing λ = 1/(0.1|D|) = 1/2.1, as proposed in Corollary 1. It is easy to verify that all
best multiple-ordering rationalizations consist of two orderings.⁵ This is because the
gain from introducing a second ordering is nearly half of the dataset (∆1 − ∆2 = 10),
whereas the gain from permitting a third is only a single observation (∆2 − ∆3 = 1).
The proposed approach thus interprets the data as reflecting maximization of two
orderings, with a single choice observation in error.
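These computations can be verified by brute force over all 24 orderings of X. The sketch below assumes, consistent with the stated 21-observation count, that "every subset" means every proper subset of X with at least two alternatives:

```python
from itertools import permutations, combinations

X = [1, 2, 3, 4]
# proper subsets of X with at least two alternatives (10 choice sets)
menus = [frozenset(c) for k in (2, 3) for c in combinations(X, k)]

# the 21-observation dataset of Section 3, as (chosen alternative, choice set)
D = []
D += [(1, A) for A in menus if 1 in A]                          # x1: 6 observations
D += [(2, frozenset(s)) for s in ({2, 3}, {2, 4}, {2, 3, 4})]   # x2: 3
D.append((3, frozenset({3, 4})))                                # x3: 1
D += [(4, A) for A in menus if 4 in A]                          # x4: 6
D += [(3, frozenset(s)) for s in ({1, 3}, {2, 3}, {1, 2, 3})]   # x3: 3
D.append((2, frozenset({1, 2})))                                # x2: 1
D.append((3, frozenset({1, 3, 4})))                             # x3: 1

orderings = list(permutations(X))  # preference lists: first element is best

def choice(r, A):
    return next(x for x in r if x in A)

def E(D, R):
    # observations explained by no ordering in R
    return sum(1 for (x, A) in D if all(choice(r, A) != x for r in R))

delta = {k: min(E(D, R) for R in combinations(orderings, k)) for k in (1, 2, 3)}
lam = 1 / (0.1 * len(D))  # = 1/2.1
best_k = min(delta, key=lambda k: k + lam * delta[k])
print(len(D), delta, best_k)  # 21 {1: 11, 2: 1, 3: 0} 2
```

The brute-force search confirms ∆1 = 11, ∆2 = 1, ∆3 = 0, and that the objective |R| + λE(D, R) is minimized at two orderings.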
4 Approach
Fix a dataset D. The implied choice error when using a set of orderings R to rationalize D is defined

E(D, R) := |{(x, A) ∈ D : x ≠ cr(A) for all r ∈ R}|,

i.e. the number of choice observations in D that cannot be explained as maximization of any preference ordering r ∈ R. If we restrict to sets of r orderings, the minimal obtainable choice error is given by

∆r := min_{R⊆R, |R|=r} E(D, R).

Say that D is r-rationalizable if ∆r = 0.

⁴ This solution is not unique. For example, x4 ≻ x3 ≻ x2 ≻ x1 also explains 10 observations.
⁵ Moreover, this outcome holds for any choice of λ ∈ (1/10, 1).
Remark 4.1. Dataset D is 1-rationalizable if and only if it is consistent.
Remark 4.2. For every dataset D, there exists a constant L ≤ min{|D|, |X|} such
that D is r-rationalizable for every r ≥ L.
To allow for a tradeoff between the “simplicity” of the model (as defined through
the number of orderings) and its fit to the data, I define the set of λ-best multiple-ordering rationalizations of D as follows.
Definition 4.1. For any λ ∈ R+ , R∗ is a λ-BMOR of D if
R∗ ∈ argmin_{R⊆R} |R| + λE(D, R).    (2)
We can interpret λ as arbitrating between the two goals of minimizing the number
of orderings and minimizing the number of implied choice errors. Loosely speaking,
an ordering is included in R if and only if it explains at least 1/λ observations that
would otherwise be interpreted as choice error. Thus, as λ → 0, the analyst prefers
to adopt a unique ordering for the agent and interpret the remaining observations
as error, while for large choices of λ, the analyst prefers to use as many orderings
as necessary to eliminate choice error. Below, I show that the Houtman & Maks
(1985) solution is selected if λ < 1/∆1 (Claim 1), and the Kalai, Rubinstein & Spiegler
(2002) solution is selected if λ > 1 (Claim 2).
Claim 1. If λ < 1/∆1 , every solution R∗ to Eq. (2) satisfies |R∗| ≤ 1.
Proof. Suppose there exists some λ-BMOR R∗ satisfying |R∗ | = K > 1. Then by
the definition of a λ-BMOR, necessarily K + λ∆K ≤ 1 + λ∆1 . Since moreover
1 + λ∆1 < 2, it follows that K < 2 − λ∆K ≤ 2. This contradicts the assumption
that K > 1.
Claim 2. If λ > 1, every solution R∗ to Eq. (2) satisfies |R∗ | = L, where L is the
smallest constant such that the data is L-rationalizable.
Proof. Since D is L-rationalizable, clearly no solution R∗ to Eq. (2) will have |R∗ | >
L. Suppose there exists a λ-BMOR R∗ with |R∗ | = K < L. Assign a unique
ordering to each of the ∆K implied choice errors from rationalizing D with R∗ .
This perfectly explains the data with K + ∆K < K + λ∆K orderings, contradicting
the assumption that R∗ is a λ-BMOR.
Geometrically, solutions to (2) are described as follows. Let f be the linear
interpolation of points {(k, ∆k ), k ∈ N}, and let F = {(x, y) | y ≥ f (x)} be the
epigraph of f. Then, for any choice λ ∈ R+, the problem in (2) returns a k-ordering solution if and only if the line with normal vector (−1, −λ) supports F at (k, ∆k). For example, given the dataset provided in Section 3, the line with normal vector (−1, −λ) supports F at (2, ∆2) for any choice of λ ∈ (1/10, 1).
Figure 1: The problem in (2) returns a solution with 2 orderings if and only if the
line with normal vector (−1, −λ) supports F at (2, ∆2 ).
How should the analyst choose λ, and under what conditions on the data-generating process does the proposed approach recover the "true" number of orderings maximized by the agent?
5 Recovery Results
Consider the following class of choice rules. Let F be a set of K contexts, R =
{rf }f ∈F be a set of context-dependent preferences, and p ∈ [0, 1] be a probability of
error. Given choice problem (A, f ), suppose that the agent chooses the rf -optimal
alternative in A with probability at least 1 − p; otherwise, he trembles and chooses
a different alternative in A. Assume that the analyst does not know
1. the number of contexts.
2. the locations or number of realized errors.
3. the distribution of error.
Can he recover the true number of orderings K using the proposed approach in
(2)?
In general, recovery of the number of context-dependent preferences will not be
possible. My main result demonstrates, however, that the proposed method will
exactly recover K with probability exponentially close to 1 if: (1) the number of
preferences is small relative to the quantity of data, (2) the probability of error is
small, and (3) the context-dependent preferences are “sufficiently different,” in a
sense made precise in Section 5.2.
A natural next question is whether we can do better: in particular, can we recover either the locations of mistakes, or the set of orderings themselves? In Section
5.3, I show that even in the absence of choice error (p = 0), recovery of multiple
preferences from choice data alone is in general an ill-posed task. Proposition 1
states that no set of three or more context-dependent orderings can be recovered, and that only special pairs of context-dependent orderings can be.
5.1 Class of choice models
In this section, I show that the class of choice models I consider is fairly general,
including as special cases several familiar models of choice. To describe these relationships, recall that a random choice rule is a map P from choice problems
(A, f ) to distributions over the elements of A, with the property that the support
of P (A, f ) is a subset of A for every (A, f ). The class of choice models I consider
is equivalent to the class of random choice rules satisfying
P(x(A,f) | A, f) ≥ 1 − p    for all (A, f),

where x(A,f) denotes the rf-optimal choice in A. The following special cases are of
interest.
Case 1: p = 0 and K = 1. This returns the classic theory of choice in which a
single preference ordering r is defined over X and choice satisfies
P(x | A, f) = 1 if x = cr(A), and 0 otherwise, for all (A, f) ∈ P,

where the dependence on f is trivial.
Case 2: p = 0 and K ≥ 1. This returns the model proposed in Salant &
Rubinstein (2008) and Bernheim & Rangel (2009), in which an agent is described
by a set of context-dependent orderings {rf }f ∈F , and choice satisfies
P(x | A, f) = 1 if x = x(A,f), and 0 otherwise, for all (A, f) ∈ P.
In the special case in which F is a partition of PX , this is equivalent to the model
studied in Kalai, Rubinstein & Spiegler (2002): a map h : PX → F cues contexts
from choice sets, and choice from A is the rh(A) -optimal alternative in A.
9
Case 3. p > 0 and K = 1. If there exist distributions {qA}A∈PX such that

P(x | A, f) = qA(Rx,A)    for all (A, f) ∈ P,
then this returns the class of random utility models with choice-set dependent
distributions (as considered, for example, in Fudenberg, Iijima & Strzalecki (2015)).
In the special case in which qA = q for every choice set A, we have the classic random
utility model (Block & Marschak 1960).
5.2 Can we recover the number of orderings?
I provide intuition for the subsequent results by discussing two negative examples in which recovery of K using the approach in (2) is either impossible or difficult. In both examples, I fix p = 0 to simplify ideas.
Example 5.1. Define the sets of orderings
R = {(1, 2, 3), (1, 3, 2), (3, 2, 1)}, and
R′ = {(1, 2, 3), (3, 2, 1)}.

Suppose the agent's true set of context-dependent preferences is described by R. It is easy to verify⁶ that C(R) = C(R′), implying that every choice observation consistent with maximization of some ordering in R is also consistent with maximization of some ordering in R′. Then, for every choice of λ and any choice dataset D generated by (perfectly) maximizing orderings in R, we have that

|R′| + λE(D, R′) = 2 + 0 < |R″| + λE(D, R″)

for every R″ consisting of three or more orderings. Thus, no solution of (2) will return the true number of orderings (3) in the agent's preference.⁷
Example 5.2. Define the sets of orderings
R = {(1, . . . , 9, 10), (1, . . . , 10, 9)}, and
R′ = {(1, . . . , 9, 10)}.
Suppose the agent’s true set of context-dependent preferences is described by R.
Observe that the single ordering in R′ explains almost every choice observation
that can result from maximization of orderings in R. The single exception is the
⁶ C(R) = C(R′) = {(x1, {x1, x2, x3}), (x1, {x1, x2}), (x1, {x1, x3}), (x2, {x2, x3}), (x2, {x1, x2}), (x3, {x2, x3}), (x3, {x1, x3}), (x3, {x1, x2, x3})}.
⁷ An alternative perspective is to consider R and R′ not merely observationally equivalent, but in fact equivalent models. In this case, we can re-interpret the observation as follows: the "more complex" description R will never be selected over the "less complex" description R′.
observation (x10 , {x9 , x10 }), which is consistent with maximization of the ordering
(1, . . . , 10, 9), but not with maximization of the ordering (1, . . . , 9, 10). Consider
any choice dataset D that is generated by (perfectly) maximizing orderings in R,
and does not include (x10, {x9, x10}). Then, for any choice of λ,

|R′| + λE(D, R′) = 1 + 0 < |R″| + λE(D, R″)

for every R″ consisting of two or more orderings. So observation of (x10, {x9, x10}) is necessary to return the true number of orderings (2) using the program in (2).
These examples are suggestive of the following: in order to recover the number
of context-dependent orderings, it is necessary that there exist "sufficient differentiation" in the choice implications of these orderings. Next, I define such a notion of differentiation.
Definition 5.1. Say that choice problems
A = {(A1 , f1 ), . . . , (Ak , fk )}
are in k-violation of IIA if

1. crf(A) ≠ crf′(A′) for every distinct (A, f), (A′, f′) ∈ A, and

2. crf(A) ∈ ∩_{i=1}^{k} Ai for every (A, f) ∈ A.
Condition (1) requires that every choice problem in A has a distinct optimal choice,
and condition (2) requires that each of these (distinct) k alternatives is available
in every set Ai for i = 1, . . . , k. Notice that every pair of choice problems from A
constitutes a (standard) violation of IIA.
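Definition 5.1 can be checked mechanically. The sketch below is illustrative only (the two-context preference set R and the choice problems are hypothetical, not taken from the paper):

```python
from functools import reduce

def choice(r, A):
    # r maps alternative -> rank (1 = best); pick the best available alternative
    return min(A, key=lambda x: r[x])

def k_violation(problems, R):
    # problems: list of (choice set, context) pairs; R: context -> rank dict.
    # Definition 5.1: all optimal choices distinct, each available in every set.
    chosen = [choice(R[f], A) for A, f in problems]
    common = reduce(lambda s, t: s & t, (A for A, _ in problems))
    return len(set(chosen)) == len(problems) and all(c in common for c in chosen)

# hypothetical two-context preferences on X = {1, 2, 3} (ranks: smaller = better)
R = {"f1": {1: 1, 2: 2, 3: 3}, "f2": {1: 3, 2: 2, 3: 1}}
print(k_violation([(frozenset({1, 3}), "f1"), (frozenset({1, 3}), "f2")], R))  # True
print(k_violation([(frozenset({1, 2}), "f1"), (frozenset({2, 3}), "f2")], R))  # False
```

In the first case the two contexts pick distinct alternatives (1 and 3) that are both available in both sets; in the second, the chosen alternatives are not both in the intersection, so no 2-violation arises.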
Definition 5.2. The differentiation parameter dR(A) of a (multi)set of choice problems A is the largest d such that there exists a partition of A into subsets {A1, . . . , Ad+1} satisfying, for every i ∈ {1, . . . , d},

1. |Ai| = K, and

2. Ai is in K-violation of IIA.
I illustrate this definition on an example.
Example 5.3. Consider a set of choice alternatives X = {x1 , x2 , x3 , x4 , x5 }, and
a set of frames F = {fp , fq }. The agent has context-dependent preferences
R = {rp , rq } = {(5, 4, 3, 2, 1), (1, 2, 3, 4, 5)}.
Suppose the agent maximizes rq when there are three or fewer alternatives in the
choice set, and maximizes rp otherwise. Let A consist of every choice problem
(A, f ) with A ∈ PX and f ∈ F . Then dR (A) = 6, with every pair in
({x1 , x2 }, fq ), ({x1 , x2 , x3 , x4 }, fp )
({x1 , x3 }, fq ), ({x1 , x2 , x3 , x5 }, fp )
({x1 , x4 }, fq ), ({x1 , x2 , x4 , x5 }, fp )
({x1 , x5 }, fq ), ({x1 , x3 , x4 , x5 }, fp )
({x2 , x3 }, fq ), ({x2 , x3 , x4 , x5 }, fp )
({x2 , x4 }, fq ), ({x1 , x2 , x3 , x4 , x5 }, fp )
constituting a 2-violation of IIA.
The subsequent recovery results (informally) say the following: if there is sufficient differentiation between context-dependent orderings and sufficient (not necessarily complete) sampling of choice problems, then we can recover the number
of context-dependent orderings using Equation (2) with probability very close to
1. Theorem 1 applies to general sets of choice problems. Corollary 2 considers a
particular data-generating process in which M choice problems are sampled uniformly at random from P. Throughout, I take M to be the number of observations
in the data, N to be the number of alternatives, p to be the probability of error,
and dR (A) to be the differentiation parameter of A given the agent’s preferences
R. When there is no chance of confusion, I express dR (A) simply as d(A).
Theorem 1. Let A be any (multi)set of M choice problems. Suppose, for some constant β > 0,

d := d(A) > [(2p + β)/(1 − p)^K] · M.    (3)

Then for any δ ∈ (0, d(1 − p)^K/M − 2p − β), there exists a constant c > 1 such that the optimization problem in Eq. (2) with λ = 1/((p + δ)M) exactly recovers |R| = K with probability at least 1 − O(c^{−M}).
I provide a brief proof sketch here and defer the details to the appendix. Identify every dataset with an undirected (hyper)graph⁸ in the following way: nodes represent choice observations, and there is an edge between a set of observations
if and only if these observations cannot be rationalized using the same preference
ordering. The key observation in the proof is that a dataset is k-rationalizable if
and only if the corresponding graph is k-colorable.⁹ This equivalence is shown by

⁸ A hypergraph is a generalization of a graph in which edges may connect more than two vertices.
⁹ A k-coloring of a graph is a partition of its vertex set V into k color classes such that no edge in E is monochromatic. A graph is k-colorable if it admits a k-coloring.
taking each color class to represent consistency with a distinct ordering. Thus, the
problem in (2) can be seen as finding the smallest number of colors k such that the
greatest number of nodes are k-colorable.
Figure 2: Studying rationalizability of a dataset is equivalent to studying colorability of a graph in which nodes represent observations and edges represent inconsistencies.
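The hypergraph (rather than ordinary graph) structure matters: observations can be pairwise free of IIA violations yet jointly non-rationalizable. A minimal check on an invented three-observation dataset, which exhibits a revealed-preference cycle:

```python
from itertools import permutations

# Invented dataset: three observations with no pairwise IIA violation that
# nevertheless admit no single rationalizing ordering (a choice cycle),
# which is why the proof works with hyperedges, not pairwise edges alone.
D = [(1, frozenset({1, 2})), (2, frozenset({2, 3})), (3, frozenset({1, 3}))]

def iia_violation(o1, o2):
    (x1, A1), (x2, A2) = o1, o2
    return x1 != x2 and {x1, x2} <= (A1 & A2)

def rationalizable(obs):
    # brute force over all strict orderings of {1, 2, 3}
    for p in permutations([1, 2, 3]):
        rank = {x: i for i, x in enumerate(p)}
        if all(min(A, key=lambda z: rank[z]) == x for x, A in obs):
            return True
    return False

pairwise_ok = not any(iia_violation(D[i], D[j])
                      for i in range(3) for j in range(i + 1, 3))
print(pairwise_ok, rationalizable(D))  # True False
```

No pair of these observations violates IIA, yet the revealed relations x1 ≻ x2, x2 ≻ x3, x3 ≻ x1 form a cycle, so the three observations together must be joined by a single hyperedge.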
Fix any (multi-)set of choice problems A and suppose for the moment that
there is no choice error. Since the data is generated by perfect maximization of
K orderings, the corresponding hypergraph admits a K-coloring. Moreover, notice
that every set of observations in a K-violation of IIA constitutes a complete K-partite subgraph. Since, by assumption, the data includes at least d such sets, the
corresponding hypergraph includes at least d complete K-partite subgraphs. So it
cannot be colored by fewer than K colors.
Now introduce choice error. Each node is “corrupted” with probability p, following which its edges are re-arranged (the node is removed from some edges to
which it belongs, and new edges between this node and others are introduced). I
show that if choice error is introduced at a sufficiently low probability, then enough
complete K-partite graphs remain in the perturbed graph such that “most” nodes
can be partitioned into K (but not fewer) colors.
The following corollary presents recovery properties for a particular, convenient,
choice of λ.
Corollary 1. Let A be any (multi)set of M choice problems. Suppose p ≤ 0.05 and d(A) > [0.25/(0.95)^K] · M. The optimization problem in Eq. (2) with λ = 1/(0.1M) exactly recovers |R| = K with probability at least 1 − O(e^{−0.005M}).
Since the number of non-overlapping sets of size K from a set of M elements cannot exceed M/K, Condition (3) in Theorem 1 implies a tradeoff between the number of orderings that can be recovered and the probability of error that can be tolerated. For example, if p = 0.05 (5% probability of error) the theorem does not apply to sets with more than 6 orderings, and if p = 0.01 (1% chance of error), the theorem does not apply to sets with more than 35 orderings. In Corollary 1, the (stronger) requirement d(A) > [0.25/(0.95)^K] · M is satisfied only by sets A including three or fewer orderings. These strict thresholds are not necessary conditions for recovery, and can be relaxed in future work.¹⁰
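The recovery claim can be illustrated on a small synthetic instance (a sketch: the two orderings, the deterministic tremble rule, and M = 200 are invented; trembles to the second-best alternative stand in for the paper's arbitrary error distribution):

```python
from itertools import permutations, combinations

X = [1, 2, 3, 4]
# hypothetical K = 2 context-dependent orderings (preference lists: first is best)
R_true = [(1, 2, 3, 4), (4, 3, 2, 1)]

def choice(r, A):
    return next(x for x in r if x in A)

def second_best(r, A):
    return [x for x in r if x in A][1]

menus = [frozenset(c) for k in (2, 3, 4) for c in combinations(X, k)]  # 11 menus
M = 200
D = []
for i in range(M):
    A, f = menus[i % len(menus)], i % 2
    # every 20th observation is a "tremble" to the second-best alternative
    x = second_best(R_true[f], A) if i % 20 == 0 else choice(R_true[f], A)
    D.append((x, A))

orderings = list(permutations(X))
# for each candidate ordering, the set of observation indices it explains
explained = [frozenset(i for i, (x, A) in enumerate(D) if choice(r, A) == x)
             for r in orderings]

def min_err(k):
    return min(M - len(frozenset.union(*S)) for S in combinations(explained, k))

delta = {k: min_err(k) for k in (1, 2, 3)}
lam = 1 / (0.1 * M)  # Corollary 1's lambda = 1/(0.1 M)
best_k = min(delta, key=lambda k: k + lam * delta[k])
print(delta, best_k)
```

Despite the planted choice errors, the penalized objective selects two orderings: the second ordering explains roughly half the data, while a third would only mop up a handful of trembles, fewer than the 1/λ = 20 observations it would need to justify its inclusion.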
Below, I illustrate the implications of this approach and choice of λ using a
problem of preference elicitation considered in Crawford & Pendakur (2012).
Example 5.4. Crawford & Pendakur (2012) study preferences over six different
types of milk, using a dataset including 500 Danish households and their purchases.
Unsurprisingly, no single utility function can explain all 500 observations. To accommodate heterogeneity in preference, Crawford & Pendakur (2012) suggest an
application of Kalai, Rubinstein & Spiegler (2002), in which the minimal number
of utility functions that explain all of the data is found. They find that in fact 4-5
utility functions are sufficient to explain all of the data. This is a perfect multiple-preference fit to the data.
But they further comment that the fifth utility function explains only 8 out of 500
observations, and drop this utility function in many of their later analyses. This
highlights a limitation of the approach proposed in Kalai, Rubinstein & Spiegler
(2002): the approach ignores variation in the strength of evidence for recovered
preferences.
We can extend the approach in Kalai, Rubinstein & Spiegler (2002) by finding a "best" imperfect multiple-preference fit. Under the choice of λ = 1/(0.1 · 500) = 1/50 proposed in Corollary 1, preferences exist in a best multiple-ordering rationalization
proposed in Corollary 1, preferences exist in a best multiple-ordering rationalization
only if they uniquely explain at least 50 observations. The fifth utility function does
not satisfy this criterion, so the remaining 8 observations are interpreted as choice
error.¹¹
Corollary 2 considers a related context in which the set of choice problems A is
not fixed by the analyst, but generated by uniform sampling over the set of possible
choice problems P = {(A, f ) : A ∈ PX , f ∈ F }. A similar result obtains provided
the differentiation parameter d(P) is sufficiently high.
¹⁰ One way to relax this restriction is to count the number of possibly overlapping sets of choice problems that constitute a K-violation of IIA.
¹¹ It is important to note, however, that Crawford & Pendakur (2012) consider utility functions defined on the continuous space of price-quantity pairs, whereas my recovery results in Section 5 pertain to orderings defined on finite sets of alternatives.
Corollary 2. Suppose A consists of M choice problems sampled uniformly at random from P, and

d := d(P) > [(2p + β)/(1 − p)^K] · 2^N K    (4)

for some constant β > 0. Then for any δ ∈ (0, d(1 − p)^K/M − 2p − β), there is a constant c > 1 such that the optimization problem in Eq. (2) with λ = 1/((p + δ)M) exactly recovers |R| = K with probability at least 1 − O(c^{−M}).
The details of the proof are deferred to the appendix.
5.3 Can we recover more?
Section 4.1 provides conditions under which the problem in (2) recovers the correct
number of context-dependent orderings with high probability. Is it possible to
recover the context-dependent orderings themselves? First, I show that even in
the absence of choice error (p = 0), recovery of multiple preferences from choice
data is in general an ill-posed task. From Proposition 1, no set of three or more
context-dependent orderings can be uniquely recovered, and only special pairs of
context-dependent orderings can be recovered.
Next, I discuss the possibility of identifying an equivalence class (in choice
implications) containing the true set of context-dependent orderings. I provide an
example to illustrate that even this weaker notion of identifiability is not met. I
suggest that a more nuanced notion of complexity than cardinality is needed for
recovery of sets of context-dependent orderings, and conclude with preliminary
comments toward extension.
In the following, I say that R is identified if there exists data D such that
{R} = argmin |R0 | | E(D, R0 ) = 0 .
(5)
R0 ∈R
That is, there exists some set of choice observations (possibly including observation
of multiple choices from the same choice set) such that R is the unique set of
k ≤ |R| orderings that perfectly explains the data. We can think of the data as
“revealing” R to the data analyst. Otherwise, say that R is not identified. The
following observation provides an equivalent characterization.
Observation 1. R is identified if and only if there does not exist R′ ≠ R satisfying
|R′| ≤ |R| and E(C(R), R′) = 0.
Thus, if there exists any data which identifies R, then the dataset C(R) will
identify R.
Example 5.5. Fix X = {x1, x2, x3} and R = {(3, 2, 1), (3, 1, 2)}. The choice
implications of R,

C(R) = {(x1, {x1, x2}), (x1, {x1, x3}), (x2, {x2, x3}), (x3, {x2, x3}), (x1, {x1, x2, x3})},

form a subset of the choice implications of each

R′ ∈ {{(3, 2, 1), (2, 1, 3)}, {(3, 2, 1), (1, 2, 3)}, {(2, 3, 1), (3, 1, 2)}, {(1, 3, 2), (3, 1, 2)}}.

So every dataset that can be perfectly rationalized by the orderings in R can be
perfectly rationalized by the orderings in any such R′. Therefore, R is not identified.
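The inclusion claimed in this example can be checked mechanically. A minimal sketch, under the assumption that each ordering is encoded as a rank vector (entry i is the rank of 0-indexed alternative i, higher is better):

```python
from itertools import combinations

def implications(R):
    """Union of choice implications of the rank vectors in R (0-indexed alternatives)."""
    n = len(R[0])
    out = set()
    for r in R:
        for size in range(2, n + 1):
            for menu in combinations(range(n), size):
                out.add((max(menu, key=lambda i: r[i]), menu))
    return out

R = [(3, 2, 1), (3, 1, 2)]
rivals = [[(3, 2, 1), (2, 1, 3)], [(3, 2, 1), (1, 2, 3)],
          [(2, 3, 1), (3, 1, 2)], [(1, 3, 2), (3, 1, 2)]]
print(all(implications(R) <= implications(Rp) for Rp in rivals))  # True
```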
The following proposition establishes non-identifiability of most non-singleton
sets. Specifically, every set with at least three orderings is not identified, and a pair
of orderings is identified if and only if the orderings differ at extremes (the maximal
element in the first ordering is the minimal element in the second, and vice versa).
Proposition 1. If |R| ≥ 3, then R is not identified. If R = {r^1, r^2}, then R is
identified if and only if argmax_i r_i^j = argmin_i r_i^{3−j} for j = 1, 2. Every singleton
set R = {r} is identified.
The proof is deferred to the appendix. Proposition 1 leaves open the possibility
of recovering an equivalence class (in choice implications) containing the true set
of orderings. Say that R ∼ R′ if C(R) = C(R′), and let

[R] = {R′ ∈ R | R′ ∼ R}

be the equivalence class of R induced by ∼. Every set of orderings in [R] has the
same choice implications as R, so that an observation can be rationalized using an
ordering in R if and only if it can be rationalized using an ordering in any R′ ∈ [R]. Say that
R is choice-identified if there exists data D such that

[R] = argmin_{R′ ∈ R} { |R′| : E(D, R′) = 0 }.

That is, there exists some set of choice observations D such that every observation in D
is consistent with maximization of some ordering in R′ if and only if C(R′) = C(R).
The following negative example shows that even this weaker kind of identifiability
need not be satisfied by typical sets of orderings.
Example 5.6. Define

R = {(1, 2, 3, 4, 5), (2, 1, 3, 4, 5)}, and
R′ = {(1, 2, 3, 4, 5), (5, 4, 3, 2, 1)}.

Every choice observation consistent with maximization of some ordering in R is
also consistent with maximization of some ordering in R′, but the converse does
not hold. So C(R) ⊂ C(R′), implying both that R′ ∉ [R], and also that every
dataset that can be perfectly rationalized using the orderings in R can be perfectly
rationalized using the orderings in R′. It follows that [R] is not choice-identified.
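This strict inclusion can also be checked mechanically; a self-contained sketch, again encoding each ordering as a rank vector (entry i is the rank of 0-indexed alternative i, higher is better):

```python
from itertools import combinations

def implications(R):
    """Union of choice implications of the rank vectors in R (0-indexed alternatives)."""
    n = len(R[0])
    out = set()
    for r in R:
        for size in range(2, n + 1):
            for menu in combinations(range(n), size):
                out.add((max(menu, key=lambda i: r[i]), menu))
    return out

R  = [(1, 2, 3, 4, 5), (2, 1, 3, 4, 5)]
Rp = [(1, 2, 3, 4, 5), (5, 4, 3, 2, 1)]
print(implications(R) < implications(Rp))  # True: strict inclusion
```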
The example above highlights the (general) phenomenon that across sets with a
fixed number of orderings, there is variation in the “richness” of choice implication.
Since the approach in (2) penalizes all sets consisting of the same number of orderings equally, it is biased towards elicitation of sets with richer choice implications.
That is, if the decision maker’s context-dependent preferences are many but similar, the proposed approach will incorrectly interpret the data using orderings that
are fewer but “more different”. It may be possible to recover equivalence classes by
extending the approach in (2) to loss functions of the form
E(D, R) + λf (R),
where f penalizes the “richness” or “expressiveness” of the orderings in R, instead
of the cardinality. I leave this extension for future work.
6 Relationship to Literature
This paper extends ideas in Kalai, Rubinstein & Spiegler (2002), which defines a set
of orderings {≽_i}_{i=1}^L as a rationalization by multiple rationales of a choice function c :
P_X → X if for every choice set A ∈ P_X, the selected alternative c(A) is ≽_i-maximal
in A for some i = 1, 2, . . . , L. Using the notation of Section 2, any set of orderings
R with choice error E(D, R) = 0 is a rationalization by multiple rationales of the
dataset D. This set of orderings may not, however, correspond to a best multiple-ordering rationalization of the data as defined in (2). In particular, I suggest
that the analyst may prefer an imperfect rationalization of the data using some
K < L orderings to a perfect rationalization of the data using L orderings. The key
conceptual difference is that Kalai, Rubinstein & Spiegler (2002) is agnostic towards
the "degree of evidence" for any particular ordering ≽_k, whereas the approach in
this paper insists on sufficient evidence for each ordering in order to separate error
from preference variation.
The model of choice that I consider throughout is an extension of frame-dependent
preferences proposed independently in Salant & Rubinstein (2008) and Bernheim
& Rangel (2009), with the addition of choice error. In each of these papers, the
standard model is enriched by a set F of contexts.12 A choice problem13 is defined
12
Frames in Salant & Rubinstein (2008), and ancillary conditions in Bernheim & Rangel (2009).
13
Extended choice problem in Salant & Rubinstein (2008), and generalized choice situation in
Bernheim & Rangel (2009).
as a pair (A, f ) where A ⊆ X is a choice set and f ∈ F is a context. An extended
choice function c assigns to every extended choice problem (A, f ) an element of
A. I consider an extension of this model to allow for probability of error, so that
c(A, f ) is chosen with probability at least 1 − p, but with probability p the agent
trembles.
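As a sketch of this error-augmented model, the following simulates data from an extended choice function; the uniform tremble and the two hypothetical frames are illustrative assumptions, since the model only requires that c(A, f) be chosen with probability at least 1 − p:

```python
import random

def sample_choice(A, f, c, p, rng=random):
    """Draw one observation from choice problem (A, f): the agent picks c(A, f)
    with probability 1 - p and otherwise trembles. The uniform tremble over A is
    an illustrative assumption; the model only requires that c(A, f) be chosen
    with probability at least 1 - p."""
    if rng.random() < p:
        return rng.choice(sorted(A))
    return c(A, f)

# Hypothetical example: two frames, each maximizing a different rank vector.
ranks = {"frame1": (3, 2, 1), "frame2": (1, 2, 3)}   # ranks[f][i] = rank of alternative i
c = lambda A, f: max(A, key=lambda i: ranks[f][i])

random.seed(0)
data = [(sample_choice({0, 1, 2}, f, c, p=0.05), f) for f in ["frame1", "frame2"] * 50]
```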
Although my model of choice is very similar, the goals of this paper are very
different. Salant & Rubinstein (2008) characterizes the choice correspondence
Cc (A) = {x | c(A, f ) = x for some f ∈ F }, and Bernheim & Rangel (2009) proposes
a framework for welfare assessment. This paper studies the question of whether it
is possible to recover the number of contexts in F , using choice data alone. My
results in Section 5 show that recovery of context-dependent preferences given the
model proposed in Salant & Rubinstein (2008) and Bernheim & Rangel (2009) is
an ill-posed problem, but recovery of the number of contexts is possible even under
choice error.
This paper is related also to Ambrus & Rozen (2013), which shows that without
prior restriction on the number of selves involved in a decision, many multi-self
models have no testable implications. Although the set of choice models considered
in Ambrus & Rozen (2013) is distinct from the set of choice models considered
in my paper,14 their lesson that restricting the number of selves is important for
constraining the available degrees of freedom holds in my domain as well, and
motivates in part the suggested criterion in (2).
Finally, the applications that I suggest are related to exercises undertaken in
Crawford & Pendakur (2012) and Dean & Martin (2010), which respectively apply
the approaches of Kalai, Rubinstein & Spiegler (2002) and Houtman & Maks (1985)
to interpret inconsistent choice data.
7 Conclusion

Inconsistencies in choice data may emerge either from choice error or from maximization of multiple orderings. It is important to separate the two in analysis of
the data, since their implications for welfare assessment and prediction are very
different. But how does the analyst know how many distinct orderings are being maximized? This paper suggests use of statistical regularization to recover a
small number of context-dependent preferences from noisy choice data. I show that
with probability exponentially close to 1, the proposed approach is able to recover
the true number of context-dependent preferences. This provides an alternative
14
Ambrus & Rozen (2013) study multi-self models in which every self is active in every decision,
and choice is determined through maximization of a choice-set independent aggregation rule over
selves. In contrast, I study multi-self models in which every self acts as a “dictator” in a subset of
choices, thus varying the aggregation rule across choice problems.
to existing approaches, which deliver either a single “best-fit” ordering or multiple
“perfect-fit” orderings.
Appendix: Proofs from Main Text
A Theorem 1

A.1 Preliminary Notation and Results
I use the following objects and definitions. A hypergraph is a pair H = (V, E) where
V is a finite nonempty set, called the set of vertices, and E is a family of distinct
subsets of V , called the set of edges.15 A k-coloring of a hypergraph is a partition
of its vertex set V into k color classes such that no edge in E is monochromatic.
A hypergraph is k-colorable if it admits a k-coloring. Finally, G = (V, E) is a
complete k-partite graph if there is a partition {V_i}_{i=1}^k of the vertex set V such
that {u, v} ∈ E if and only if u and v lie in different classes of the partition. The set of all
hypergraphs on M vertices is denoted H. In the remainder of this proof, I refer to
hypergraphs simply as graphs.
These concepts are related to our problem as follows. Enumerate the observations in any dataset D = {(x, A) : A ∈ A} as {(x_i, A_i)}_{i=1}^M. These choice observations can be identified with a graph H = (V, E) where V = {1, 2, . . . , M} indexes
observations, and E consists of every set T ⊆ V such that: (1) the observations in
{(x_i, A_i) | i ∈ T} are inconsistent16, and (2) no proper subset of {(x_i, A_i) | i ∈ T}
is inconsistent. I refer to the vertices of H and the observations they represent
interchangeably.
Claim 3. The following statements are equivalent:
1. H is k-colorable.
2. D is k-rationalizable.
Proof. Take each color class to represent consistency with a distinct ordering, and
the equivalence directly follows.
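For small instances, Claim 3 can be exercised directly by brute force: search over assignments of observations to k color classes and test whether each class is consistent with a single ordering. A minimal sketch (the dataset and rank-vector encoding are illustrative):

```python
from itertools import permutations, product

def consistent(obs):
    """True if a single ordering rationalizes every (choice, menu) pair in obs."""
    if not obs:
        return True
    alts = sorted({i for _, A in obs for i in A})
    for perm in permutations(alts):
        rank = {a: j for j, a in enumerate(perm)}
        if all(x == max(A, key=rank.__getitem__) for x, A in obs):
            return True
    return False

def k_rationalizable(D, k):
    """Brute-force version of Claim 3: D is k-rationalizable iff the observations
    can be partitioned into k classes, each consistent with a single ordering --
    equivalently, iff the violation hypergraph is k-colorable."""
    for labels in product(range(k), repeat=len(D)):
        groups = [[o for o, g in zip(D, labels) if g == c] for c in range(k)]
        if all(consistent(g) for g in groups):
            return True
    return False

# Illustrative data over X = {0, 1, 2}, mixing two orderings:
D = [(0, (0, 1)), (1, (0, 1)), (2, (0, 1, 2))]
print(k_rationalizable(D, 1), k_rationalizable(D, 2))  # False True
```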
For any graph H, let f_H be the linear interpolation of the points {(k, ∆_{k,H}) : k ∈
Z+}, where ∆_{k,H} is the minimal number of nodes that must be removed from H for
H to become k-colorable.17 Let F_H be the convex hull of the epigraph of f_H (see
Figure A.1), and define c := 1/λ. Then, if there does not exist k ∈ N satisfying

∆_{k,H} < ∆_{K,H} + c(K − k),18    (6)

the line h = {x | (−1, −λ) · x = (−1, −λ) · (K, ∆_{K,H})} properly supports F_H at
(K, ∆_{K,H}), and any solution R∗ to the minimization problem argmin_{R⊆R} |R| +
λE(D, R) must satisfy |R∗| = K, as desired.

15
This is a generalization of a graph in which edges may connect more than two vertices.
16
There does not exist an ordering r such that x_i is r-maximal in A_i for every i ∈ T.
17
This is equivalent to the definition of ∆_k used in the main text, through Claim 3.
18
In vector notation, (−1, −λ) · (K − k, ∆_{k,H} − ∆_{K,H}) ≥ 0.
[Figure: plot of f_H and the convexified epigraph F_H against k, marking the point (K, ∆_{K,H}).]
Figure 3: Any choice of λ for which (−1, −λ) is a subgradient of f_H at (K, ∆_{K,H}) will recover K. With high probability, the set of vectors {(−1, −1/((p + δ)M)) | δ ∈ (0, d(1 − p)^K/M − 2p − β)} is a subset of the subdifferential of f_H at (K, ∆_{K,H}).
Finally, suppose that p = 0, so that the agent perfectly maximizes his context-dependent ordering. This identifies a deterministic graph G.
Claim 4. G includes at least d non-overlapping complete K-partite subgraphs.19
Proof. Each subgraph induced by the vertices in a K-violation of IIA is a complete
K-partite graph.
A.2 Main Proof
Imperfect maximization using the random choice rule P generates a probability
distribution over H. Denote the random graph with this distribution by H. Fix
δ ∈ (0, d(1 − p)^K/M − 2p − β) and λ = 1/((p + δ)M). I will now show that the probability
that no k ∈ N satisfies (6) is at least 1 − O(c^{−M}), from which it will follow that
the probability that Eq. (2) recovers K is at least 1 − O(c^{−M}).

In the subsequent claims, take S ∼ Bin(M, p) to be the number of observations
which are imperfectly maximized, and take V_E ⊆ V to be the random variable
whose outcome is the set of imperfectly maximized observations.
Lemma 1. The probability that no k > K satisfies (6) is at least 1 − e^{−2δ^2 M}.

19
Two subgraphs are said to be non-overlapping if they do not share vertices.

Proof. Since ∆_{K,H} ≤ S, if there exists k > K such that ∆_{k,H} < ∆_{K,H} − c(k − K),
then necessarily S > c = (p + δ)M; otherwise, ∆_{k,H} < c(K − k + 1) ≤ 0. Since
E(S) = pM, it follows from Hoeffding's Inequality that

Pr(S ≥ c) = Pr(S − pM ≥ c − pM) ≤ exp(−2((p + δ)M − pM)^2 / M) = e^{−2δ^2 M},

and therefore the probability that no k > K satisfies (6) is at least 1 − e^{−2δ^2 M}, as
desired.
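The Hoeffding step can be sanity-checked numerically; a minimal simulation, assuming for illustration the Corollary 1 values δ = 0.05 and an error rate p = 0.05:

```python
import math, random

# Numerical sanity check (not part of the proof): for S ~ Bin(M, p),
# Pr(S >= (p + delta) M) <= exp(-2 delta^2 M).
M, p, delta = 500, 0.05, 0.05
c = (p + delta) * M                     # the threshold c = (p + delta) M = 50
bound = math.exp(-2 * delta**2 * M)     # Hoeffding upper bound (about 0.082)

random.seed(1)
trials = 2000
hits = sum(sum(random.random() < p for _ in range(M)) >= c for _ in range(trials))
print(hits / trials <= bound)  # True
```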
Lemma 2. The probability that no k < K satisfies (6) is at least

(1 − e^{−β^2 M/2}) (1 − exp(−(1 − p)^K β^2 M / (2(2p + δ + β)))).

Proof. If there exists k < K satisfying (6), then H must include strictly fewer than
c + S non-overlapping complete K-partite graphs. Otherwise, ∆_{k,H} ≥ (c + S)(K −
k) ≥ ∆_{K,H} + c(K − k) for every k < K, since a complete K-partite graph cannot be
made k-colorable without the removal of at least K − k vertices.
Define H_P to be the random subgraph of H induced by vertices in V \ V_E (perfectly maximized observations). Then, if H_P contains at least c + S non-overlapping
complete K-partite graphs, H must also. I determine the probability that H_P includes at least c + S non-overlapping
complete K-partite graphs as a lower bound.

I first show that S < (p + β/2)M with probability at least 1 − e^{−β^2 M/2}, and
subsequently that conditional on the event {S < (p + β/2)M}, subgraph H_P includes at
least c + S non-overlapping complete K-partite graphs with probability at least
1 − exp(−(1 − p)^K β^2 M / (2(2p + δ + β))). The first statement follows immediately from Hoeffding's inequality, since E(S) = pM and

Pr(S < (p + β/2)M) ≥ 1 − e^{−β^2 M/2}.    (7)
Suppose S < (p + β/2)M. Index the d complete K-partite subgraphs in G (existence
from Claim 4) by i = 1, 2, . . . , d. Let X_i be the indicator variable which takes value
1 if every vertex in complete K-partite subgraph i is perfectly maximized, and let
X = Σ_{i=1}^d X_i. Notice X_i ∼ Ber((1 − p)^K) for every i and EX = d(1 − p)^K. Using
Hoeffding's inequality,

Pr(X < c + S) = Pr(X − d(1 − p)^K < c + S − d(1 − p)^K)
≤ exp(−2(c + S − d(1 − p)^K)^2 / d).
Since by assumption δ ∈ (0, d(1 − p)^K/M − 2p − β), it follows that d ≥ (2p + β + δ)M / (1 − p)^K.
Therefore, d(1 − p)^K > (2p + δ + β)M > c + S, so

∂f(S)/∂d = −2(1 − p)^{2K} + 2(c + S)^2/d^2 < 0, and
∂f(S)/∂S = 4(1 − p)^K − 4(c + S)/d > 0,

where f(S) = −2(c + S − d(1 − p)^K)^2 / d. Then, using the upper bound on S and the lower
bound on d,

exp(−2(c + S − d(1 − p)^K)^2 / d) ≤ exp(−(1 − p)^K β^2 M / (2(2p + δ + β))).    (8)

From (7) and (8), the probability that no k < K satisfies (6) is at least

(1 − e^{−β^2 M/2}) (1 − exp(−(1 − p)^K β^2 M / (2(2p + δ + β)))),

as desired.
Using Lemmas 1 and 2, the probability that no k ∈ Z+ satisfies (6) is at least

(1 − e^{−β^2 M/2}) (1 − exp(−(1 − p)^K β^2 M / (2(2p + δ + β)))) − exp(−2δ^2 M) = 1 − O(c^{−M}),

where c = min{ exp(β^2/2), exp((1 − p)^K β^2 / (2(2p + δ + β))), exp(2δ^2) }.
B Corollary 1

Take β = 0.1 and δ = 0.05. Then for any d > (2p + β + δ)M / (1 − p)^K = 0.25M / (1 − p)^K,

d(1 − p)^K/M − 2p − β > 0.25 − 2p − β = 0.05,

so that δ = 0.05 ∈ (0, d(1 − p)^K/M − 2p − β). Directly apply Theorem 1.
C Corollary 2

By assumption, P includes at least d non-overlapping sets of choice problems in
K-violation of IIA. Enumerate the Kd choice problems included in a K-violation
using i = 1, . . . , Kd. Let Z_i be the random variable whose outcome is the number
of times choice problem i is sampled, and let Q_i be the event {Z_i < αM/(2^N K)},
where

α = (2p + β/2 + δ) / (2p + β + δ) < 1.

Then,

Pr(∪_{i=1}^{Kd} Q_i) ≤ Σ_{i=1}^{Kd} Pr(Q_i) = Kd · Pr(Z_1 < αM/(2^N K)).
Since Z_1 ∼ Bin(M, 1/(2^N K)) and E(Z_1) = M/(2^N K), it follows from Hoeffding's Inequality
that

Kd · Pr(Z_1 − M/(2^N K) < αM/(2^N K) − M/(2^N K)) ≤ Kd · exp(−2((1 − α)M/(2^N K))^2 / M)
= Kd · exp(−(1 − α)^2 M / 2^{2N−1}) =: g(K, d, p, β, δ, M).    (9)

Therefore Pr[Z_i ≥ αM/(2^N K) for every i] = 1 − Pr(∪_{i=1}^{Kd} Q_i) ≥ 1 − Kd · exp(−(1 − α)^2 M / 2^{2N−1}).

Conditional on the event {Z_i ≥ αM/(2^N K) for every i}, there are at least dαM/(2^N K) >
(2p + β/2 + δ)M / (1 − p)^K non-overlapping sets of K choice problems in K-violation of IIA in the
sampled data, and we can apply Theorem 1 to conclude that the probability of recovery
has lower bound

f(K, d, p, β, δ, M) := (1 − e^{−β^2 M/8}) (1 − exp(−(1 − p)^K β^2 M / (8(2p + δ + β/2)))) − exp(−2δ^2 M).    (10)
From (9) and (10), the probability of recovery is at least

(1 − g(K, d, p, β, δ, M)) f(K, d, p, β, δ, M) = 1 − O(c^{−M})

with

c := c(K, d, p, β, δ) = min{ exp(β^2/8), exp((1 − p)^K β^2 / (8(2p + δ + β/2))), exp(2δ^2),
exp((1/2^{2N−1}) (1 − (2p + β/2 + δ)/(2p + β + δ))^2) } > 1,

as desired.
D Proposition 1

For any set of orderings R and ordering r ∈ R, define g(r, R) to be the set of
all choice observations consistent with maximization of r and inconsistent with
maximization of any other r′ ∈ R.20 The set of revealed preferences in g(r, R) is
given by the binary relation

B_r := {(x, y) : (x, A) ∈ g(r, R) for some A including y}.

20
For example, if R = {(1, 2, 3), (2, 3, 1)}, then g((1, 2, 3), R) = {(x3, {x1, x2, x3}), (x3, {x1, x3}),
(x3, {x2, x3})}, since these observations are consistent with maximization of (1, 2, 3) and inconsistent with maximization of (2, 3, 1).

Let B̄_r be its transitive closure. The following is a necessary condition for identifiability of R.
Claim 5. Suppose there exist orderings r, r′ ∈ R such that

argmax_i r_i ≠ argmin_i r′_i.    (11)

Then, R is not identified.
Proof. First, I show that R = {r^1, . . . , r^K} is identified only if B̄_r is complete for
every r ∈ R. Suppose to the contrary that R is identified, but (without loss of
generality) B̄_{r^1} is not complete. Then there exists some ordering r̂^1 ≠ r^1 such that
every choice observation in g(r^1, R) is consistent with maximization of r̂^1, so we
can replace r^1 with r̂^1 in R without loss of any choice implications. Formally, define
R′ = {r̂^1, . . . , r^K}. Since C(R) ⊆ C(R′), it follows from Observation 1 that R is
not identified.

Next, I show that if there exist orderings r, r′ ∈ R satisfying (11), then B̄_r is
not complete for some r ∈ R. Index the alternatives such that x1 := argmax_i r_i
and x2 := argmin_i r′_i. I show that neither (x1, x2) nor (x2, x1) is in B̄_{r′}, and hence
B̄_{r′} is not complete. Suppose towards contradiction that (x1, x2) ∈ B̄_{r′}. Then
(x1, A) ∈ g(r′, R) for some A ∈ P_X. But since x1 is r-maximal, every observation in
which x1 is selected is consistent with maximization of ordering r. Thus, for every
A ∈ P_X, choice observation (x1, A) ∉ g(r′, R). This yields the desired contradiction.
Suppose alternatively that (x2, x1) ∈ B̄_{r′}. Then, (x2, A) ∈ g(r′, R) for some A ∈
P_X. But x2 is ranked last according to r′, so (x2, A) ∉ g(r′, R) for every A ∈ P_X.
This yields the desired contradiction. Therefore, if there exist orderings r, r′ ∈ R
satisfying (11), then R is not identified.
It follows immediately from Claim 5 that every set R = {r^1, r^2} with argmax_i r_i^j ≠
argmin_i r_i^{3−j} for some j ∈ {1, 2} is not identified. Moreover, since every set R with
|R| ≥ 3 must include orderings satisfying (11), it follows from Claim 5 that every
set R with three or more orderings is not identified.

Next, I show that sets R = {r^1, r^2} with argmax_i r_i^j = argmin_i r_i^{3−j} for j =
1, 2 are identified. Index the alternatives such that x1 = argmax_i r_i^1 and xN =
argmax_i r_i^2, and define D = C(r^1) ∪ C(r^2). Suppose to the contrary that there
exists a set of orderings R′ = {r̂^1, r̂^2} ≠ R such that E(D, R′) = 0. I show a contradiction by identifying an observation in D which is inconsistent with maximization
of both r̂^1 and r̂^2.
First observe that necessarily either x1 is highest ranked in r̂^1 and xN is highest
ranked in r̂^2, or vice versa, since (x1, X), (xN, X) ∈ D. Without loss of generality,
suppose the former. Since r̂^1 ≠ r^1, there exist alternatives xk, xl with k, l ∉ {1, N}
such that r̂_k^1 < r̂_l^1 and r_k^1 > r_l^1; that is, xk is higher ranked than xl under r^1
but not under r̂^1. Let A consist of xk together with all alternatives ranked lower than xk in
ordering r^1, noting that xN ∈ A since xN = argmin_i r_i^1 by assumption. Then,
choice observation (xk, A) ∈ C(r^1), but is inconsistent with maximization of r̂^2
since xN ∈ A. Moreover, (xk, A) is inconsistent with maximization of r̂^1 since
xl ∈ A. This yields the desired contradiction.
Finally, every singleton set R = {r} is trivially identified using the set of all of
its choice implications C(r).
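For |X| = 3, the characterization of identified pairs can be verified exhaustively from the definition in Observation 1; a brute-force sketch (the rank-vector encoding is an illustrative assumption):

```python
from itertools import permutations, combinations

def implications(R):
    """Union of choice implications of the rank vectors in R (0-indexed alternatives)."""
    n = len(R[0])
    out = set()
    for r in R:
        for size in range(2, n + 1):
            for menu in combinations(range(n), size):
                out.add((max(menu, key=lambda i: r[i]), menu))
    return out

def explains(Rp, data):
    """E(data, Rp) = 0: every observation is consistent with some ordering in Rp."""
    return data <= implications(Rp)

orders = list(permutations((1, 2, 3)))     # all rank vectors over three alternatives
for R in combinations(orders, 2):
    data = implications(R)
    # Observation 1: R is identified iff no other R' with |R'| <= |R| explains C(R).
    identified = not any(
        set(Rp) != set(R) and explains(Rp, data)
        for k in (1, 2) for Rp in combinations(orders, k)
    )
    r1, r2 = R
    extremes = (r1.index(max(r1)) == r2.index(min(r2))
                and r2.index(max(r2)) == r1.index(min(r1)))
    assert identified == extremes
print("Proposition 1 verified for all pairs over |X| = 3")
```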
References

Ambrus, Attila & Kareen Rozen. 2013. "Rationalizing Choice with Multi-Self Models." Economic Journal.

Belloni, Alexandre, Victor Chernozhukov, Dan Chen & Christian Hansen. 2012. "Sparse Models and Methods for Instrumental Regression, with an Application to Eminent Domain." Econometrica.

Belloni, Alexandre, Victor Chernozhukov & Lie Wang. 2011. "Square-Root Lasso: Pivotal Recovery of Sparse Signals via Conic Programming." Biometrika.

Belloni, Alexandre & Victor Chernozhukov. 2011. "L1-Penalized Quantile Regression in High-Dimensional Sparse Models." Annals of Statistics.

Bernheim, B. Douglas & Antonio Rangel. 2009. "Beyond Revealed Preference: Choice Theoretic Foundations for Behavioral Welfare Economics." Quarterly Journal of Economics.

Beshears, John, James Choi, David Laibson & Brigitte Madrian. 2008. The Importance of Default Options for Retirement Saving Outcomes: Evidence from the United States. Oxford University Press pp. 59–87.

Block, H.D. & Jacob Marschak. 1960. "Random Orderings and Stochastic Theories of Response." Contributions to Probability and Statistics.

Crawford, Ian & Krishna Pendakur. 2012. "How Many Types Are There?" Economic Journal.

Dean, Mark & Daniel Martin. 2010. "How Consistent are your Choice Data?" Working Paper.

Donoho, David L. & Xiaoming Huo. 2001. "Uncertainty Principles and Ideal Atomic Decomposition." IEEE Transactions on Information Theory.

Einav, Liran, Amy Finkelstein, Iuliana Pascu & Mark Cullen. 2012. "How General are Risk Preferences? Choices under Uncertainty in Different Domains." American Economic Review.

Elad, Michael & Alfred Bruckstein. 2002. "A Generalized Uncertainty Principle and Sparse Representation." IEEE Transactions on Information Theory.

Fudenberg, Drew, Ryota Iijima & Tomasz Strzalecki. 2015. "Stochastic Choice and Revealed Perturbed Utility." Econometrica.

Gabaix, Xavier. 2014. "A Sparsity-Based Model of Bounded Rationality, with Application to Basic Consumer and Equilibrium Theory." Quarterly Journal of Economics.

Houtman, Martijn & J.A.H. Maks. 1985. "Determining all Maximal Data Subsets Consistent with Revealed Preference." Kwantitatieve Methoden 19:89–104.

Huang, Po-Sen, Scott Deeann Chen, Paris Smaragdis & Mark Hasegawa-Johnson. 2013. "Singing-Voice Separation from Monaural Recordings Using Robust Principal Component Analysis." International Conference on Acoustics, Speech, and Signal Processing.

Huber, Joel, John Payne & Christopher Puto. 1982. "Adding Asymmetrically Dominated Alternatives: Violations of Regularity and the Similarity Hypothesis." Journal of Consumer Research.

Kahneman, Daniel & Amos Tversky. 2000. Choices, Values, and Frames. The Press Syndicate of the University of Cambridge.

Kalai, Gil, Ariel Rubinstein & Ran Spiegler. 2002. "Rationalizing Choice Functions by Multiple Rationales." Econometrica 70(6):2481–2488.

Ren, Xiang, Zhendong Zhang & Yi Ma. 2012. "Repairing Sparse Low-Rank Texture." Journal of LaTeX Class Files.

Salant, Yuval & Ariel Rubinstein. 2008. "(A, f): Choice with Frames." The Review of Economic Studies.

Yang, Allen, Zihan Zhou, Arvind Ganesh, S. Shankar Sastry & Yi Ma. 2013. "Fast L1-Minimization Algorithms for Robust Face Recognition." Computer Vision and Pattern Recognition.