Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
The Political Entropy of Vote Choice: An Empirical Test of Uncertainty Reduction Je Gill Department of Political Science Department of Statistics Cal Poly University E-mail: [email protected] Prepared for the 1997 Annual Meeting of the American Political Science Association, August 27-31, Washington, DC. The American National Election Study data used in this research were produced by the Center for Political Studies at the University of Michigan and distributed by the Inter-University Consortium for Political and Social Research. Replications data sets are available upon request. Data analysis and graphical representations were produced using S-Plus Version 3.4. and GAUSS Version 3.2.25. Thanks to Stuart Wugalter, SSDBA Administrator, for database support, Dianne Long, David George and Randal Cruikshanks for ideas and input, and Je Moore for the original inspiration. Abstract Recent literature in voting theory has developed the idea that individual voting preferences are probabilistic rather than strictly deterministic. This work builds upon spatial voting models (Enelow and Hinich 1981, Ferejohn and Fiorina 1974, Davis, DeGroot and Hinich 1972, Farquharson 1969) by introducing probabilistic uncertainty into the calculus of voting decision on an individual level. Some suggest that the voting decision can be modeled with traditional probabilistic tools of uncertainty (Coughlin 1990, Coughlin and Nitzen 1981). Entropy is a measure of uncertainty that originated in statistical thermodynamics. Essentially, entropy indicates the amount of uncertainty in probability distributions (Soo 1992), or it can be thought of as signifying a lack of human knowledge about some random event (Denbigh and Denbigh, 1985). Entropy in statistics developed with Kolmogorov (1959), Kinchin (1957), and Shannon (1948), but has rarely been applied to social science problems. Exceptions include Darcy and Aigner's (1980) use of entropy to analyze categorical survey responses in political science, and economic applications by Theil (1967) and Theil and Fiebig (1984). I examine voters' uncertainty as they assess candidates, and measure policy positions. I then test whether or not these voters minimize the cost of voting (specically the cost of information) by determining a maximum entropy selection. Except for the inclusion of entropy terms, this approach is similar to others in the recent literature. In this paper I develop a measure to aggregate evaluation of issue uncertainty and corresponding vote choice where the uncertainty parameterization is derived from an entropy calculation on a set of salient election issues. The primary advantage of this approach is that it requires very few assumptions about the nature of the data. Using 1994 American National Election Study survey data from the Center for Political Studies, I test the hypothesis that the \Contract with America" reduced voter uncertainty about the issue positions of Republican House candidates. The entropic model suggests that voters used the written and explicit Republican agenda as a means of reducing issue uncertainty without substantially increasing time spent evaluating candidate positions. An overview of this paper is as follows. Section 1 reviews the relevant literature on voting under uncertainty. Section 2 briey introduces the history and theory of entropy. This section can be skipped by those familiar with entropy. Section 3 develops and tests a maximum entropy model of voter uncertainty from the 1994 congressional elections. Conclusions are provided in Section 4. 1 1 Introduction: Information, Uncertainty, and Voting Complexity, obfuscation, vagueness, and uncertainty are permanent features of American electoral politics (Aldrich, Niemi, Rabinowitz, and Rohde 1982). As modern social and economic issues gain in complexity and legislatures increasingly dodge controversial and divisive votes, a greater number of misunderstood, and poorly explained issue positions are thrust onto the ballot. Candidates often have motivations to make their issue positions deliberately vague (Downs 1957, Shepsle 1972, Franklin 1991). This can be out of a desire not to alienate certain constituency groups or simply that the candidate lacks a strongly dened position. Complex or lengthy ballot measures can also confuse voters (Gerber 1996). These are commonly worded in stern legal language and many voters lack the time or analytical skills necessary to determine policy consequences. There is evidence of this phenomenon even in very high salience elections. Bob Dole made deliberately vague statements on his abortion position in the 1996 presidential election after conservatives in the Republican party successfully placed strong anti-abortion rights language into the party platform at the San Diego convention (San Francisco Chronicle August 5, 1996, p.A1). The 1996 ballot initiative in California to eliminate state supported armative action programs (Proposition 209) confused many voters on its actual intent (Los Angeles Times, October 31, 1996, p.A22). The words "armative action" do not appear anywhere in the bill's language, and the bill is counter-intuitively entitled the \California Civil Rights Act." In campaigns it is a well-worn strategy to recast the language of an opponents issue position into something less popular with voters (Nimmo 1970), and with frequent occurrence this has a generally confusing eect on the electorate. So why do many voters endure high levels of uncertainty and cast their ballot? The answer is elusive, though well studied in the literature (Downs 1957, Ferejohn & Fiorina 1974, Olson 1965, Wolnger & Rosenstone 1980, Abramson & Aldrich 1982, Burnham 1987, Teixera 1992). Rather than address the eect of uncertainty on turnout, I address those individuals that make a decision to vote despite uncertainty. The motivation stems from the observation that models developed without incorporating a voter uncertainty component miss an important criterion of vote choice. Ordeshook (1986, p.187) points out: \Models that rely on the assumption of complete information may oer a misleading view of the democratic process". Hinich (1977) found that even a small amount of uncertainty is enough to destroy 2 Black's elegant medium voter equilibrium result. Coughlin (1990) points out that while considerable work has been done on voting under uncertainty in two-candidate elections (where voters abstain only due to indierence), we still don't have a unied theory leading to equilibrium conditions. Alvarez and Franklin (1994) note the additional problem that there is not a consensus in the literature on how uncertainty is measured, although they were able to exploit the features of one particular survey that provided very specic information on issue-level uncertainty to reveal subtle patterns in the form of the responses. Bartels (1988) looks at the 1980 presidential election and nds that voters in general dislike uncertainty and this variance around candidate positions can exceed perceived spatial distances between the candidates. Bartels is the rst to point out that uncertainty is not just a function of the candidate or measure, but that voters have diering levels uncertainty and position (respondent heterogeneity). Like the Bartels uncertainty model, the model developed here rests on the work of Enelow & Hinich (1981) and Shepsle (1972) in that issue positions are not singular and xed. Instead they are represented by probability distribution functions with manifest observations (samples). It is precisely the quality of these samples, which provide information about the unknown distribution, that aect levels of certainty. This work leverages a unique aspect of the 1994 midterm election to evaluate how voters interpret diering levels of information about House candidate positions. In this election the Republican leadership in the House developed a well-dened and explicit issue agenda. Each Republican incumbent and challenger signed the \Contract with America" promising that if elected to a majority they would bring ten particular issues to House oor for a vote within the rst 100 days of the 104th Congress. The model developed here focuses on the uncertainty dierence between Republican and Democratic candidates in the 1994 House election. Unlike previous measures of uncertainty which come from survey based indicators or respondent self-assessments, I propose the use of a measure of uncertainty that is well known and tested in the physical sciences. The entropy approach focuses on measures of respondent uncertainty that is produced by the format and content of the survey question. 3 2 What Is This Thing Called Entropy? 2.1 Entropy in Thermodynamics In 1865 Clausius asserted that an isolated physical system not in equilibrium always tends to equilibrium. Equilibrium in thermodynamics means absolutely nothing is happening to the particles, as opposed to the conventional mathematical denition of a system having converged to some maxima or minima. While equilibrium states are not very interesting to physicists, the process of reaching equilibrium is well worth consideration. In fact, Clausius's assertion is now known as the second law of thermodynamics. At the heart of this process is the idea of reversibility: \A process that can go in in two directions is reversible if two (or more) directions are equally probable for all starting points, given a random nudge" (Ayres 1994, p.1). Or more precisely, a process is reversible with regard to some serial explanatory factor, if inserting , for in all model equations leaves these equations unchanged. Clausius operationalized entropy in the following simple way: dS = k dQ T (1) Where dS is the system's entropy change, dQ is the absolute value of the energy reduction in a system, T is the initial temperature of the system, and k is the number of molecules in the system times 1.38 joule per degree Kelvin (Boltzmann's constant). So large incremental changes in a system's energy relative to its starting absolute temperature produce large gains in entropy, and entropy is maximized when all of the energy is taken out of the system. What made Clausius' assertion so threatening (and therefore controversial) to physicists at the time was that it contradicted the then current paradigm that Newtonian mechanics implied reversibility in equations of particle motion. This can be seen in equation (1) because positive or negative changes in energy increase the entropy of a system. So entropy is always either constant or increasing. Boltzmann (1877) developed the idea of the most probable distribution given a set of observations. From this he dened entropy as a physical description of uncertainty and disorder in particle systems. To understand Boltzmann's construct imagine a nite number of atomic particles in a three dimensional box, each with some exactly known position, velocity and direction. We therefore know the positions of the particles in the next time period given this information about the particles in the current time period, although the number of calculations tends to innity very quickly in actual settings. Now suppose that we have some uncertainty about the position, 4 velocity, and direction of our particles. So instead of knowing exactly the position of the particles in the next time period we have a set of possible positions. Therefore there are a number of dierent \microstates" the system can have in the next time period. Add to this situation the complexity that as the particles are moving around they have diering energy states. Given these uncertainties there are clearly a large number of possible microstates the system can take on. Boltzmann dened entropy as the number of digits in the the number of possible microstates (i.e. the log of the number microstates). Suppose that the the number of possible microstates in a given system is: 95; 000; 000; 000; 000 = 9:5X 1013 , then entropy = klog10 (1X 1013 ) = 13k (where k is a proportionality factor). So Boltzmann's idea of entropy is a statistically derived method that indicates the amount of uncertainty (i.e. randomness) in a particle system where mathematically the number is transformed to a more manageable form (Plank 1906, Von Neumann 1927). 2.2 Entropy in Information Theory There are many denitions of information in various literatures but all of them have the same property of distinction from a message. If a message is the physical manifestation of information transmission (language, character set, salutations, headers/trailers, body, etc.), then the information is the substantive content independent of transmission processes. For example, in baseball the third base coach transmits a message to the runner on second base through the use of hand-gestures. The information contained in the hand gestures is independent of the actual gesture to the base-runner. Suppose you could condense a set of messages down to a minimal nite length. Then the information content would be the number of digits required by these alternate sets of messages. In this case information and message would be identical. Unfortunately information and message are never exactly the same. For example, the DNA strand required to uniquely determine the characteristics of a person contains far more codes than is minimally sufcient to transmit the genetic information. In information theory entropy is a measure of the average amount of information required to describe the distribution of some random variable of interest (Cover and Thomas 1991). More generally information can be thought of as a measure of reduction in uncertainty given a specic message (Shannon 1948, Ayres 1994, p.44). Suppose we wanted to identify a particular voter by serial information on this person's characteristics. We are allowed to ask a consecutive set of yes/no questions (i.e. like the common guessing game). As we get answers 5 to our series of questions we gradually converge (hopefully) on the desired voter. Our rst question is: does the voter reside in California. Now since about 13% of the voters in the United States reside in California, a yes answer gives us dierent information than a no answer. Restated, a yes answer reduces our uncertainty more than a no answer. If Pi is the probability of the ith event (residing in California), then the improvement in information is: log 1 = ,log P : (2) 2 2 i Pi The log is base 2 since there are only two possible answers to our question (yes and no) making the units of information bits. In our example Hi = ,log2 (0:13) = 2:943416 bits, whereas if we had asked: does the voter live in Arkansas then an armative reply would have increased our information by: Hi = ,log2 (0:02) = 5:643856 bits, or about twice as much. However, there is a much smaller probability that we would have gotten an armative reply had the question been asked about Arkansas. What Slater (1938) found, and Shannon (1948) later rened was the idea that the \value" of the question was the information returned by a positive response times the probability of a positive response. So the the value of the ith binary-response question is just: H = f log 1 = ,f log f : (3) i i 2 i fi 2 i And the value of a series of n of these questions is: n X i=1 Hi = k n X i=1 n X filog2 f1 = ,k filog2 fi: i i=1 ith yes (4) Where fi is the frequency distribution of the answer and k is an arbitrary scaling factor that determines choice of units. The arbitrary scaling factor makes the choice of base in the logarithm unimportant since we can change this base by manipulating the constant1 . The function (4) was actually rst introduced by Slater (1938), although similar forms were referenced by Boltzmann where fi is the probability that a particle system is in microstate i. For instance if the entropy form were expressed in terms of the natural log, but log2 was more appropriate for the application (such as above), then setting k = ln12 converts the entropy form to base 2. 1 6 Its easy to see that the total improvement in information is the additive value of the series of individual information improvements. So in our simple example we might ask a series of questions narrowing down on the individual of interest. Is the voter in California? Is the voter registered as a Democrat? Does the voter reside in an urban area? Is the voter female?. The total information supplied by this vector of yes/no responses is the total information improvement in units of bits since the response-space is binary. Its important to remember that entropy is dened only with regard to a well-dened question having nite, enumerated responses much the way entropy in statistical thermodynamics is dened only with regard to a carefully dened system (Tribus and McIrvine 1971). The link between information and entropy is often confusing due to differing terminology and usage in various literatures. In the thermodynamic sense, information and entropy are complementary: \gain in entropy means loss in information - nothing more" (Lewis 1930). So as uncertainty about the system increases, information about the conguration of the microstate of the system decreases. In terms of information theory, the distinction is less clear. Shannon (1948) presented the rst unambiguous picture. To Shannon the function (4) represented the expected uncertainty. In a given communication possible messages are indexed m1 ; m2 ; : : : ; mk , and each have an associated probability p1 ; p2 ; : : : ; pk . The probabilities necessarily sum to one, so the scaling factor is 1 here. The Shannon entropy function is the negative expected value of the natural log of the probability mass function: H = ,k X piln(pi) (5) Shannon dened the information in a message as the dierence between the entropy before the message and the entropy after the message. If there is no information before the message, then a uniform prior distribution is assigned to the pi and entropy is at its maximum2 In this case any result increases our information. Conversely if there is certainty about the result, then a degenerate distribution describes the mi , and the message does not change our information level3 . So according to Shannon, the message produces a new assessment about the state of the world and this in turn leads to the assignment of new probabilities (pi ) and a subsequent updating of the entropy value. It is now easy to see why entropy and bayesian methods are compatible. P ln( ) = ln(n), so entropy increases logarithmically with the number of equally likely Palternatives. H = , , (0) , log(1) = 0 2 3 H=, 1 n 1 n n 1 i=1 7 The simplicity of the Shannon entropy function belies its theoretical and practical importance. Shannon (1948, Appendix 2) showed that (5) is the only function that satises the following three desirable properties: 1. H is continuous in (p1 ; p2 ; : : : ; pn ). 2. If the pi are uniformly distributed, then H is at its maximum and is monotonically increasing with n. 3. If a set of alternatives can be reformulated as multiple, consecutive sets of alternatives, then the rst H should equal the weighted sum of the consecutive H values: (H (p1 ; p2 ; p3 ) = H (p1 ; 1 , p1 )+(1 , p1 )H (p2 ; p3 ). Up until this point only the discrete form of the entropy formulation has been discussed. Since entropy is also a measure of uncertainty in probability distributions, its important to have a formulation that applies to continuous probability distribution functions. The following entropy formula also satises all of the properties enumerated above: H= +1 Z ,1 f (x)ln(f (x))dx (6) This form is often referred to as the the dierential entropy of a random variable X with a known probability density function f(x). There is one important distinction between (6) and (5): the discrete case measures the the absolute uncertainty of a random variable given a set of probabilities, whereas the continuous case this uncertainty is measured relative to the chosen coordinate system (Ryu 1993). So transformations of the coordinate system require transformation in the probability distribution function with the associated jacobian. In addition, the continuous case can have innite entropy unless additional, \side" conditions are imposed (discussed below). There has been considerable debate about the nature of information with regard to entropy (Ayres 1994, Ruelle 1991, Tribus 1979, Jaynes 1967). A major question arises as to whose entropy is it? The sender? The receiver? The transmission equipment? These are often philosophical debates rather than practical considerations. Shannon's focus was clearly on the receiver as the unit of entropy measure. Jaynes (1979) considers the communication channel and its limitations as the determinants of entropy. Some, like Aczel and Daroczy (1975), view entropy as a descriptor of a stochastic event. The traditional thermodynamic denition described above sees entropy as a measure of uncertainty in microstates. For the purposes of this work, the Shannonian denition (5) is used. The perspective developed in this work 8 is that the voter has some ignorance of candidate preferences and policies that changes with the receipt of new information. 2.3 The Maximum Entropy Concept Maximum entropy can be conceptualized in a very simple manner. Suppose you were to bet on the outcome of the role of two dice that are not necessarily \fair" (a non-uniform distribution of outcomes on each die). There are obviously eleven possible outcomes each having unknown probabilities of occurrence. Even though you did not know the distribution of outcomes, you would still be advised to bet on the number seven as opposed to two or twelve since there are more combinations that produce seven (6) than two (1) or twelve (1). The bet on seven is the maximum entropy bet because, lacking prior information about the probabilities, seven produces the maximum expected value. The theoretical basis of this example is best explained by Jaynes (1968): \...the probability distribution which maximizes the entropy is numerically identical with the frequency distribution which can be realized in the greatest number of ways." The rst derivation of maximum entropy was accomplished by Planck (1906). Shannon (1948, p. 629) proved that with the side condition of: EX 2 = 2 ( bounded) for some random variable X, the maximum entropy distribution is gaussian(0; 2 ). In fact all of the commonly used probability distribution functions are maximum entropy distributions provided the appropriate moment constraints are given. Van Campehout and Cover (1981) point out that even the Cauchy distribution function is a maximum entropy distribution over all distributions that satisfy the side condition: R ln(1 + X 2 ) = k, for some constant k. Jaynes (1957) and Tribus (1961) developed the idea that the maximum of an entropy function has particular value with regard to the transmission of information. The maximum entropy philosophy can be condensed down to the following statement: when we make parametric inferences based on incomplete information, we should select the probability distribution function with the maximum entropy value given observed data (Jaynes 1982). Suppose we are interested in making inferences about some unknown distribution function. We have observations from the unknown distribution and a set of side conditions that must be met. For instance: the value of the nth moment, or that the probabilities of the set of possible events sum to one. The maximum entropy density function is the least informative (and therefore most conservative) estimation of the the unknown underlying distribution function. Jaynes (1982) points out that distributions with entropy 9 much lower than the maximum entropy are atypical given the data. This statement nicely links the similar orientation of maximum entropy methods and traditional maximum likelihood methods. Figure 1: Simple Maximum Entropy Examples 1.0 1.4 Maximum Entropy for Bernoulli PMF ME at 0.5 0.6 0.2 0.5 0.4 Prob 0.2 0.4 0.3 0.8 0.6 abili ob Pr 0.1 0.2 0.4 y1 t 0.8 0.6 ility 2 ab Entropy 0.7 0.2 0.8 0.9 y trop 1 En 0.6 ••••• ••• ••• •• ••• ••• •••• ••• • ••• •• •• ••• ••• ••• •• ••• • ••• ••• ••• •• •• ••• •• ••• •• ••• • ••• ••• ••• •• •• ••• •• ••• •• ••• • ••• ••• ••• •• •• ••• •• •• •• •• • •• ••• ••• •• ••• •• ••• •• •• ••• •• ••• •• ••• •• •••• •• •• •• •• •• •• •• •• •• ••• ••• •• • 0.0 0.5 1.0 Maximum Entropy for Trichotomous Outcome ME at 1.350541 Values of p As a simple example, suppose we were interested in making simple Bernoulli inferences derived from ipping a coin. If we have no prior knowledge of the probability of getting a head (p), then what is the maximum entropy formulation of the Bernoulli probability mass function? We place the single side restriction that p + pc = 1. So the entropy function is: H = ,k X pilog2 pi = , plog2 p + (1 , p)log2 (1 , p) In this case the maximum entropy is reached for a uniform distribution of the two alternatives. Therefore maximizing H subject to the simple side restriction produces Hmax = 1 where p = 12 . This is seen as the rst example 10 in Figure 1. If there are three alternatives each with a distinct probability, then the maximum entropy occurs at 1.35 for the uniform probability case (pi = 0:33). The maximum entropy for this case can be visualized in threespace as shown in the second example of Figure 1. Note that the third probability does not need to be assigned to an axis since once the rst two probabilities are selected, the third is determined by the side condition. 2.4 Entropy Applied to Vote Choice The entropy concept has been applied to a wide variety problems concerning uncertainty as described above. Yet there is no known application of entropy to measuring the lack of knowledge political actors have about each other's positions and intentions. Uncertainty about candidates and ballot issues is a fundamental problem for voters. Information is necessarily limited by transmission means (Shannon 1948), candidates and issues can be confusing at the source (Franklin 1991, Gerber 1996), and voters have limited time and attention and can search for shortcuts for even the most salient election issues (Carmines and Stimson 1980). The search for a measure of uncertainty is disadvantaged in that uncertainty in individuals is a subjective and changing phenomenon (Alvarez and Franklin 1994). Entropy, however, is a measure of uncertainty independent of a survey respondent's subjective assessment of issue-level uncertainty. It is independent because it is a function of how the question is constructed (number of response alternatives) and the distribution of of responses from aggregate data, nothing else. This approach to measuring voter uncertainty not only has very few assumptions, but it also applies to survey data where there are no specic uncertainty questions (Alvarez and Franklin 1994), there is no need to estimate a uniform variance threshold (Bartels 1988), and no need to exploit some unique attribute in the data (Franklin 1991). In the next section I build a vote choice model for the 1994 House election where a respondent's utility, and therefore vote choice, are a negative function of issue distance and entropy. 3 An Entropy Model of Vote Choice In this section a model of congressional vote choice is developed using the entropy concept as a measure of voter uncertainty. This model is based on the well-known proximity spatial model developed by Downs (1957), Davis, Hinich, and Ordeshook (1970), Hinich and Pollard (1981), and Enelow and Hinich (1981, 1984, 1990), and others. Like these works, the model assumes that voters seek a candidate choice that minimizes the squared Euclidean 11 distance between their own issue preference and the perceived position of the candidate. The contribution developed here is the addition of an entropic measure of uncertainty that attempts to capture the observable eects of voter ignorance on an individual issue basis. For each measured policy issue, the squared distance term is added to an entropy value for that issue. For example, the healthcare variable is constructed by squaring the distance between the respondent's preferred level of federal involvement in managing various healthcare programs and the respondent's placement of the Republican and Democratic candidates on the same scale added to an entropy value for the healthcare scale. This is done for ve issues topical to the 1994 congressional election in which the American National Election Study queries respondents about their position and their perceived position of the Democratic and Republican candidates for the House. In addition, the model, like that of Bartels (1988) and Palfrey and Poole (1987), allows for heterogeneity of candidate placement on these issues. Because respondents obviously evaluate candidates on issues beyond those of current media and congressional attention, an ideology entropy term is included which attempts to capture issue preference for which the data set does not provide a direct comparison between respondents' preferences and perceived candidate positions. The assumption is, that lacking direct and current information about candidates' policy preferences, voters (or potential voters) use ideology as an evaluation measure of last resort. It has long been argued that voters are not typically ideological in determining candidate and issue preference (Converse 1964), that voters are only more ideological as a response to ideologically dening choices (Nie, Verba and Petrocik 1976), and that voters can sometimes select candidates that dier dramatically from their preferred issue positions (Bishop, Tuchfarber and Oldendick 1978). Unlike these perspectives, the inclusion of the ideology term focuses not on the ideology of the respondent as a determinant of vote choice, but instead on respondents' uncertainty about candidate ideology. In other words, the more uncertainty about a candidate's ideology the lower the expected utility for that candidate. The utility function incorporates K = 5 policy preference comparisons in which respondent and perceived candidate positions can be explicitly compared with the survey data. The utility of candidate j to voter i is: Uij = 5 X k=1 ,((Cijk , Rik )2 + Hjk) , Hbij (7) where Cijk is respondent i's placement of candidate j on policy issue k, 12 Rik is respondent i's self-placement on issue k, Hij is candidate j 's entropy for issue k, and Hbij is respondent i's ideology entropy for candidate j . The entropy function for a given candidate (Democrat or Republican) and a given issue is constructed according to (5) where the probabilities for each of the seven-point responses are the empirically observed relative frequencies. For example, the Democratic candidates' respondent placement distribution on support for Clinton is the normalized vector: f0:076880; 0:014485; 0:000000; 0:899164; 0:000000; 0:006685; 0:002786g producing an entropy value of HDemocrat;Support:Clinton = 0:4040129. The heterogeneity of uncertainty and squared issue distance for each candidate/voter pairing is captured by indexing through the K = 5 issues. The ideology entropy term is added to the model as a means of capturing uncertainty about issues that are not specically addressed in the data. If there is not an issue preference question and a corresponding set of candidate issue placement questions in the survey, then it is impossible to obtain the squared distance term for this issue. Since a limited number of issues is topical to any given campaign and voters necessarily trade-o the value of information with the cost of obtaining the information, ideology (or perceived) ideology represents a very convenient method for approximating candidate position (Downs 1957, Enelow and Hinich 1981). Ideology placement is used as a surrogate for issues that are not highly salient to the 1994 election or where respondent and candidate placement are not included in the data set. Because there is uncertainty, the utility of the j th candidate to voter i is probabilistic in the model as developed in by Enelow and Hinich (1981). Utility to the ith voter is therefore a combination of proximity to the candidate on currently salient issues and the uncertainty in the voter's mind about that proximity. If the entropy terms in (7) are relatively large, then they reduce utility to voter i since their signs are negative. This implies that voters prefer candidates that are less vague about their issue positions and ideology. The entropy term replaces a variance term in the Bartels (1988) model and removes a major diculty: the variance that a respondent has for a given issue and candidate is not directly observable. To overcome this obstacle, Bartels is forced to dene a variance threshold (constant across candidates and issues, and respondents) which denes when respondents will make candidate placement on an issue. Unfortunately this approach requires the assumption of homogeneity of the uncertainty (variance) threshold, and 13 leads to a more complex, two-stage estimation procedure. Conversely, the maximum entropy approach makes absolutely no assumptions about the distribution of the variance of uncertainty (Shannon 1948, Jaynes 1968, 1982). Furthermore, empirical distributions that satisfy given linear constraints P (side conditions), pi = 1 in our case, concentrate around the maximum entropy distribution which satises the same constraints (Van Campenhout and Cover 1981, Robert 1990). This nding, formalized as Jaynes' Entropy Concentration Theorem (1968), provides the motivation for treating the unknown distribution of candidate uncertainty as an entropy term. 0 0 10 20 30 20 40 60 Figure 2: Ideology Placement By Certainty, Respondent Counts by 7-Point Scale 80 0 40 0 EL L SL M SC C EC Democratic Candidate for ‘Pretty Certain’ 40 80 120 EL L SL M SC C EC Republican Candidate for ‘Pretty Certain’ EL L SL M SC C EC Democratic Candidate for ‘Not Certain’ 0 50 100 0 EL L SL M SC C EC Republican Candidate for ‘Very Certain’ 40 80 120 EL L SL M SC C EC Democratic Candidate for ‘Very Certain’ EL L SL M SC C EC Republican Candidate for ‘Not Certain’ Due to the emphasis on the ideology entropy term in the model, it is constructed dierently than the issue entropy contributions. The ideology entropy, Hb ij , is modied by treating aggregate respondent ideology placement of candidates by party as a prior distribution and updating the probabilities 14 with \true" data from interest group rankings. The prior probabilities, Dem and Rep are created by weighting the aggregate response probabilities for each party's House candidate ideology. For instance the unweighted probability that the Republican candidate will be placed as extremely conservative on the seven-point scale, P (Rep = 7), is the number of respondents placing their Republican candidate at 7 divided by the total number placing the Republican candidate. The probabilities are then weighted for each respondent according to their level of certainty in placing the candidate: \very certain", \certain", and \not certain"4 . The weights are the mean dierence between the group's mode and the distance of interest. So respondents in a `certainty by candidate' group with a very large mode value and a sharp mean drop to nearby categories have their own mode upweighted proportionally (as well as near mode values correspondingly downweighted). The eect is to alter the probabilities whereby more certain respondents have lower prior entropy. Figure 2 summarizes these the six diering ideology placement distributions by party and certainty. Not surprisingly, respondents indicating that they were not certain about ideology placement tended to put the candidates in the middle, clustered around the moderate category. Conversely respondents that were very sure about ideology placement were not reluctant to label candidates as extreme. Figure 2 also indicates that there is less variance in the distribution of ideology placement for Republicans than Democrats. This leads back to the research question: Did the Contract with America produce less voter uncertainty about Republican House candidates? The posterior ideology entropy is constructed by introducing American Conservative Union (ACU) and Americans for Democratic Action (ADA) scores as new observations. The initial scale is created by equally weighting ACU and (100-ADA) values for Democratic and Republican House members in the 103rd House. Six equally spaced cut points are identied, creating frequencies on a 7 point-scale for each party's House member ideology. An empirical pdf for the new observations is produced by normalizing the ordinal scale quantities by the total. The basic entropy formula is modied by adding an unknown function of the new observations: Hbij = , X l ln(l ) + g(; ) (8) where l is the probability of observing the lth ideology category out of L = 7, and l are the corresponding prior probabilities. The value, Hbij , is 4 VAR 844 for the Democratic candidate and VAR 846 for the Republican candidate 15 required to be a maximum with regard to marginal changes in the l , so for each of the L values: b l Hij = ,(1 , ln(l )) + l g(; ) (9) Bevensee (1993) provides a simple function that meets this requirement: g(; ) = L X L X l=1 l=1 (l + l ln( l )) = 1 + l ln( l ) = 0 (10) This makes (8): Hbij = , L X l ln(l ) + 1 + l=1 L X =1, l=1 L X l=1 L X l ln( l ) , l l ln( l ) l=1 l ln( l ) l (11) I drop the \1" term above since is it is uniform across all entropy comparisons in the model and also constitutes a trivial entropy shift (Bevensee 1993). It is easy to show that the second derivative with respect any l is negative so that Hb ij is uniformly concave down and therefore has only one maxima. The continuous function form of (11) is well-studied in the physics literature and is generally referred to as cross entropy. The modied prior entropy formulation for ideology (11) now captures both the respondents view of candidate ideology uncertainty and a corresponding exogenous empirical estimate5 . The candidates are dierentiated by party, so j = 1 indicates the Democratic House candidate and j = 2 indicates the Republican House candidate in the 1994 congressional election. If for voter i, Ui1 > Ui2 because candidate 2 has greater summed squared issue distances and more uncertainty, then i should vote for the Democratic candidate. The form of the utility function and the dichotomous response suggest a relatively simple probit regression test. A small imperfection exists with regard to pairing these two terms. The exogenous estimate is derived from interest group rankings for the members of the 103rd House and I have aggregated them by party. Conversely the respondents are asked to place the 1994 House candidates by party. So to the extent that these are diering groups of individuals, the two measures are weakly independent. If one accepts the argument that the two parties do not radically change in ideological composition in a two year period, then this concern is considerably lessened. 5 16 3.1 Heteroscedasticity Attitudes and assessments about political gures are likely to vary across respondents such that the residuals from a simple generalized linear model are heteroscedastic. A simple diagnostic graphical test of the model described above indicates that this is the case here. It is not surprising to nd that respondents perceive choices and candidate issue dierences with nonuniform variation. Heteroscedasticity is particularly troublesome in qualitative choice models as the resulting maximum likelihood estimates of the coecients are often inconsistent (Amemiya 1981, Davidson and MacKinnon 1984) and the asymptotic covariance matrix (computed from the negative inverse of the Hessian at the maximum likelihood coecient values) is biased (Yatchew and Griliches 1985). Yatchew and Griliches also point out that the biases are particularly problematic when an explanatory variable is correlated with the heteroscedasticity (1985, p.138). An eective way of compensating for heteroscedasticity in probit models is to identify parameters that introduce widely varying response patterns and specically associate them in the model with the variance term. This method, based on Harvey (1976), is developed in Greene (1993, p.649-51), and eectively employed by Alvarez and Brehm (1995, 1997) to address response heteroscedasticity from issue ambivalence. The functional form is changed from: Y = (X ) + to: (12) ! Y = X + eZi0 (13) for a set of model coecients, , with corresponding matrix of explanatory observations X, and a set of variance determination coecients, , with a corresponding matrix of explanatory observations Z. Therefore, the variance term is now reparameterized to be a function of some additional set of coecients and observed values that are suspected of causing the dierences in error variances: i2 = eZi0 (14) The vector is an additional set of coecients that are estimated but, instead of tting variance in the response vector, they t variance in the residuals of the t from the rst set of coecients: . This changes the log 17 likelihood of the probit model from: logL(X ) = n X i=1 to: n X " # xi log((X )) + (1 , xi)log(1 , (X )) " ! (15) ! # X logL(X; Z) = xi log( eX Z ) + (1 , xi )log(1 , eZ ) (16) i=1 The inclusion of a separate parameterized variance component in (16) addresses the heteroscedasticity problem. A formal test for heteroscedasticity is a likelihood ratio test using the value of the log likelihood in the restricted (15) (the restriction is that = ~0) and the unrestricted (16) evaluated at the maximum likelihood estimates of the parameters: and (Greene 1993, p.650). The test is simply: LR = 2 Lunrestricted , Lrestricted (17) where LR is distributed 2 with degrees of freedom equal to the length of the Z vector (number of variance estimation parameters). 3.2 Testing the Entropy Model of Uncertainty The 1994 congressional election represents an interesting and perhaps unique opportunity to test the eect of uncertainty on vote choice. The Republican House candidates signed the \Contract with America" on September 9, 1994 with the goal of capturing the majority in Congress by specically identifying their policy positions through a set of ten promised House oor votes. It is reasonable to predict that a clearly dened, written agenda, widely reported in the media, increases voter certainty about the Republican candidates' positions. I test this claim by applying the entropy model described above using survey data from Center for Political Study's 1994 American National Election Study conducted between November 9, 1994 and January 9, 1995. The data include placements for 1; 795 respondents on current issues, evaluations of House candidate positions, and demographic information. Respondents were asked both their own position and the position of the House candidates on ve issues: support for Clinton, concern about crime, the government's role in providing economic opportunities to citizens, appropriate level of government spending, and the appropriate level of government involvement in the healthcare system. These variables are particularly useful 18 since they are policy areas addressed by the 103rd Congress and the respondent places themself and the two candidates for the House. Each of these fteen variables (ve times each for the respondent, the Republican, and the Democrat) are rescaled to the seven-point format6 . Ideology and party identication summary are left in their seven-point format. In each case, those responding \Don't Know" are placed at the midpoint of the scale in order to minimize their squared error distance and subsequent eect on the model. Responses coded as inappropriate or refused are left as missing values. Cases were the House race is for an open seat were dropped since the survey does not provide data on support for Clinton and (obviously) vote on the Crime bill for non-incumbent candidates. This reduces the sample size to 942. Three general demographic variables are included in the model: education (in years), gender, and race (white or non-white). The model also includes two variables which are intended to capture the respondents' levels of interest and information. The study includes a question which asks the respondents to place themselves on a three-point scale in terms of their attention to the recent campaigns (VAR 124). Respondent information is measured by mean-averaging respondents' number of days reading the newspaper per week (VAR 125), number of days watching television per week (VAR 126) and number of days listening to news on the radio per week (VAR 127). All seven-point scales are constructed whereby higher values are those traditionally associated with Republican positions. For instance, the question that asks about increasing or decreasing government spending for the provision of domestic services (VAR 940) was recoded to make \7" the response indicating government should provide far fewer services. The dependent variable is candidate choice in the 1994 House election by party of the candidate7 . So there are exactly two utility functions from The respondents' Support for Clinton measure was transformed from the ve-point scale to a seven-point scale by combining the approve/disapprove question (VAR 201) with the strength of approval/disapproval question (VAR 202). The seven-point Democratic Candidate Support for Clinton variable and Republican Candidate Support for Clinton variable were created by combining House incumbent supports Clinton Half the time (VAR 647) with House incumbent supports Clinton almost always (VAR 645) and House incumbent supports Clinton almost never (VAR 646) then cross-referencing by incumbent's party (VAR 17). Concern About Crime for the respondent and the two candidates is created by rescaling voted for or supported the Crime bill (VAR 1040 for the respondent, VAR 648 and VAR 649 for the incumbent, cross referenced again by incumbent's party), where the respondents' interior points are derived from the question asking about appropriate level of spending on programs dealing with crime (VAR 825). 7 Democrats coded 0, and Republicans coded 1. 6 19 (7) for each respondent: UiD for the ith respondent's utility from the Democratic candidate, and UiR for the ith respondent's utility from the Republican candidate. The inferential value of this qualitative choice model is assessed by the frequency with which the respondent selects the candidate with the higher utility function. Table 1 summarizes the results from the heteroscedastic probit regression model, (16), on these data. Table 1: Parameter Estimates for Entropy Model of Vote Choice 1994 American National Election Study Choice Parameters Intercept Democratic Support for Clinton Republican Support for Clinton Democratic Crime Concern Republican Crime Concern Democratic Gvt. Help Disadv. Republican Gvt. Help Disadv. Democratic Gvt. Spending Republican Gvt. Spending Democratic Federal Healthcare Republican Federal Healthcare Democratic Ideology Entropy Republican Ideology Entropy Party Identication Scale Variance Parameters Political Interest Education Gender Race Media Exposure Age Parameter Standard Estimate Error z-statistic p-value -1.116 -0.015 0.030 0.044 0.007 0.029 -0.006 0.114 -0.100 0.031 -0.017 0.104 0.303 0.368 0.387 0.008 0.011 0.009 0.009 0.011 0.013 0.025 0.025 0.008 0.010 0.131 0.068 0.028 -2.882 -1.943 2.701 4.960 0.699 2.698 -0.438 4.633 -4.030 3.670 -1.691 0.794 4.437 13.158 0.004 0.052 0.007 0.000 0.485 0.007 0.661 0.000 0.000 0.000 0.091 0.427 0.000 0.000 -0.013 0.055 -0.184 -0.239 -0.041 -0.016 0.076 0.034 0.105 0.160 0.024 0.003 -0.176 1.633 -1.750 -1.501 -1.700 -4.738 0.860 0.102 0.080 0.133 0.089 0.000 LR Heteroscedasticity Test: LR = 68:2317; p < 0:0001 for 2df =6 Goodness of Fit Test: LRT = 359:3869; p < 0:0001 for 2df =19 Percent Correctly Classied: 78.66% (using the \naive criteria") 20 The heteroscedasticity likelihood ratio test clearly indicates that we are justied in developing the Harvey heteroscedastic probit model for these data as the 2 test shows signicance at p < 0:0001. Although several of the variance parameters show poor signicance, exclusion from the model actually worsened some of the choice parameter signicance levels. It is interesting to note that the political interest coecient is not signicant as one would normally expect political interest to increase a respondent's attention to the election and therefore reduce uncertainty. By a large margin the age variable is the most signicant of the variance parameters (the only signicant one at = 0:05). Since the sign on the coecient is negative and the exponential term (ez ) is in the denominator of (16), then there is evidence of greater variability in younger voters: these cases needed greater downweighting. The goodness of t test (LRT) shows strong evidence that the model selection is appropriate8 . I include the percent correctly classied value (79%) with some trepidation as it is based here on a threshold value of 0.5. If the sample of response variables is unbalanced in one direction or another, then the 0.5 threshold can incorrectly classify cases (Greene 1993, p.652). Since there were 498 \1"s in the sample (Republican votes) out of n = 942, there is little justication to explore alternative threshold values. The results in Table 1 suggest that the model does reect vote choice under uncertainty in a reasonable manner. Of the ten policy issue distance by candidate variables, all but two are signed as expected, six are signicance at the p < 0:001 level, and no non-signicant coecient values are paired for the same policy issue. This suggests that there is an eect on vote choice for squared respondent/candidate issue distance for policy on crime, government help for underprivileged social classes, government spending on social programs, and federal involvement in the healthcare system. The single policy issue that is not signed as expected is the candidates perceived support for Clinton's policies. In this case as squared issue distance between the respondent and the Democratic candidate increases there is a greater probability of voting for the Democratic candidate (although there is a p-value of 0.052 on the coecient). We would normally expect that as the distance increases, a respondent is increasingly dissatised with the Democratic candidate and more likely to vote for the Republican candidate. The sign on the I recognize that the question of which goodness of t measure best describes a qualitative choice model is unanswered. Rather than take a position on this debate I caveat the LRT result above with the caution that there are other measures which may yield diering results. However, given the magnitude of the LRT value here, I suspect these dierences would not be substantial. Amemiya (1981) discusses some alternatives in detail. 8 21 coecient for support for Clinton's policies for the Republican candidate is positive, showing the same kind of counter-intuitive relationship described for the Democratic candidate. This suggests that there is something special happening with this variable. One possible explanation is that Clinton often embraced traditional Republican positions as a \New Democrat" (reinventing government, a focus on the economy, protection of programs for the elderly, reevaluation of welfare policies). Another possible explanation is that this variable is capturing some component of respondents' desire for divided government. As expected, the party identication coecient is large and signicant. There is a vast literature to support this nding even though the eects appear to be waning (Nie, Verba and Petrocik 1976, Schulman and Pomper 1976, Page and Jones 1979, Wattenberg 1990). Note that the Republican ideology entropy term is highly signicant (p < 0:001), whereas there is no evidence for the signicance of the Democratic ideology entropy term. This observation and the positive sign suggest that the reduction in uncertainty is an important factor for the Republicans only in the 1994 elections. Figure 3 shows the the eect of the ideology entropy term on the probability of voting for the Republican candidate at the mean response of the other explanatory variables. Since the ideology entropy coecient for the Democrats is not signicant, it is useless to construct a similar graph for the probability of voting for the Democratic candidate. The eect on vote for the Republican candidate shows a slightly curvilinear increase in probability as the (negative) entropy term increases. Several observations can be made from Figure 3. The mean values of the explanatory variables provide a range of vote probability that certainly favors Republican candidates (55% to 99%). We can also observe that as the Republican ideology entropy term approaches maximum entropy (most certainty), the probability of voting for the Republican candidate approaches one. However, the range of the Democratic entropy term is -1.9652 to -0.0514 compared with -3.1212 to -0.0537 for the Republican candidates, implying that respondents are more uncertain of Republican candidates. This suggests that uncertainty is a larger electoral issue for the Republicans in 1994 both because their positions are less clear, and because they have substantial opportunity to gain votes by reducing uncertainty. Bartels (1996) has a similar nding in which Democratic presidential candidates are advantaged by \uninformed voting." A more dicult question to address is why is the ideology entropy term for the Democratic candidates not signicant when it is for the Republican candidates. Possibly there is more perceived certainty about the Democratic 22 0.8 0.6 0.4 ••••••••••••••••••••••••••••••••••••••••••• •••••••••••••••••••••••••••••••••••••••••••• •••••••••••••••••••••••••••• • • • • • • • • • • • • • • • • • • • • ••••••••• •••••••••••••••• p < 0.001 •••••••••••••• •••••••••••• • • • • • • • • • • •• • • • • • • • • • •••• •••••••••• ••••••••• •••••••• • • • • • • • •• •••••••• •••••••• ••••••• • • • • • • •• ••••••• ••••••• •••••• • • • • • • • ••••••• ••••••• ••••••• 0.2 Probability of Voting for Republican Candidate 1.0 Figure 3: Eect on Republican Vote Probability by Ideology Entropy Entropy Min = -3.1213 0.0 Entropy Max = -0.0547 -3.0 -2.5 -2.0 -1.5 -1.0 -0.5 0.0 Ideology Entropy Values for the Republican Candidate candidate's ideology since the Democrats had been the majority party in the House for forty years. This seems appropriate since the majority party's positions are generally better known and there were more Democratic incumbents running in 1994 than Republican incumbents (although they obviously fared worse). If one is willing to give the Republican House leadership a great deal of analytical credit, it could be claimed that they were aware that as the long-term minority party in the House, their policy positions were not well known and there was a need for the \Contract with America" to reduce voter uncertainty. The entropy term in (11) is at its maximum when the p vector is uniform (all equal probabilities: 17 in this case), and the vector is as close to degenerate as possible (nearly all of the density assigned to one value)9 We cannot make the vector perfectly degenerate as this would violate the requirement that prevents division by zero in (11). 9 23 i > 0 8i This is equivalent to a respondent being very sure they can place the Republican candidate on the ideology scale, whereas the Republican candidate has equal probability of falling into any of the seven ideology categories. In the case of a seven-choice set where r1 is assigned 0.9994, and the six other ri are assigned 0.0001 each, we get an entropy value of -5.9487 from (11). If we insert this value for the Republican ideology entropy and hold other explanatory variables at their mean, the probability of a vote for the Republican candidate is 4.34%. Since this is much lower than the smallest observed ideology entropy vote in Figure 3, we know that no respondent was close to being this uninformed. 4 Conclusion Clearly uncertainty matters. This model demonstrates that voters have an aversion to vagueness in campaigns. The implication is that politicians gain by clarifying policy positions provided that those positions are not far out of line with those preferred by the electorate. Or, at a minimum, politicians need to trade-o the cost of uncertainty with the cost of alienating specic groups of voters. Entropy as a measure of uncertainty in political communication, as developed here, indicates a lack of substance in transmissions. Entropy captures the cost of uncertainty in this calculus by measuring this uncertainty through analysis of discrete issue placements. The entropy measure is superior to other tools because it calculates uncertainty without any strict assumptions. Conversely, measures of uncertainty based on variances require indirect estimation of unobservable parameters (Bartels 1988). There is evidence that voter uncertainty diered by the party of the candidate. Although the Republican candidates fared better in policy preferences in the study (and of course in the election as well), Republican candidates could considerably raise their vote probability by clarifying their positions in the 1994 election. The entropy measure of uncertainty showed that the Republican candidates faced greater levels of uncertainty in the 1994 congressional election, but also had the opportunity to substantially improve their vote probability by decreasing uncertainty about their positions in the minds of voters. The \Contract with America" was an eort to stake out and explicate a set of specic policy positions. The shape of the curve in Figure 3 suggests that Republican candidates helped themselves considerably whenever they increased the clarity of their positions at the mean of the other explanatory 24 variables. Conversely, there was no evidence that Democratic candidates for the House were aected by diering levels of uncertainty. It is likely that forty years of controlling the House created a high level of perceived certainty about the policy positions of Democratic candidates. The political sophistication of the Republican leaders in the House during this time is still open to discussion. If the \Contract with American" was developed to increase the clarity of Republican candidates' positions nationwide, then it would imply a high level of sophistication with regard to analyzing voter preferences and uncertainty. However, there are other likely reasons that the leadership developed an explicit agenda and required members to publicly sign the document. These include control of the agenda by the leadership, control of the freshmen members, and a strategy for opposing policies advocated by the executive branch. It is entirely possible that the Republican leadership was not aware of the possible advantage available by reducing uncertainty, but they simply beneted by luck. 25 References [1] Abramson, Paul R. and John H. Aldrich. 1982. \The Decline of Electoral Participation in America." American Political Science Review 76: 502-21. [2] Aczel J. and Z. Daroczy. 1975. On Measures of Information and Their Characterizations. New York: Academic Press. [3] Amemiya, Takeshi. 1981. \Qualitative Response Models: A Survey." Journal of Economic Literature 19: 1483-536. [4] Aldrich, John H., Richard G. Niemi, George Rabinowitz, and David W. Rohde. 1982. American Journal of Political Science 26: 391-414. [5] Alvarez, R. Michael and John Brehm. 1995. \American Ambivalence Towards Abortion Policy: Development of a Heteroscedastic Probit Model of Competing Values." American Journal of Political Science 39: 1055-82. [6] Alvarez, R. Michael and John Brehm. 1997. \Are Americans Ambivalent Towards Racial Policies?" American Journal of Political Science 41: 345-72. [7] Alvarez, R. Michael and Charles H. Franklin. 1994 \Uncertainty and Political Perceptions." Journal of Politics 56 (3): 671-88. [8] Ayres, David. 1994. Information, Entropy, and Progress. New York: American Institute of Physics Press. [9] Bartels, Larry. 1996. \Uninformed Votes: Information Eects in Presidential Elections." American Journal of Political Science 40: 194-230. [10] Bartels, Larry. 1988. \Issue Voting Under Uncertainty: An Empirical Test". American Journal of Political Science 30: 709-28. [11] Bevensee, Robert M. 1993. Maximum Entropy Solutions to Scientic Problems Englewood Clis, N.J.: Prentice Hall. [12] Boltzmann, L. 1877. \Uber die Beziehung zwischen dem zweiten Hauptsatze der mechanischen Warmetheorie und der Wahrscheinlichkeitsrechnung, respective den Satzen uber das Wermegleichgewicht." Wein Ber 76: 373-95. [13] Burnham, Walter Dean. 1987. \The Turnout Problem." In Elections American Style. ed. A. James Reichley Washington, D.C.: Brookings Institution. [14] Carmines, Edward G. and James A. Stimson. 1980. \The Two Faces of Issue Voting." American Political Science Review 74: 78-91. [15] Coughlin, Peter J. 1990. \Candidate Uncertainty and Electoral Equilibria." In Advances in the Spatial Theory of Voting, eds. James M. Enelow & Melvin J. Hinich. Cambridge: Cambridge University Press. 26 [16] Coughlin, Peter J. and S. Nitzen. 1981. \Electoral Outcomes with Probabilistic Voting and Nash Social Welfare Maxima." Journal of Public Economics 15: 113-21. [17] Converse, Phillip E. 1964. \The Nature of Mass Belief Systems." In Ideology and Discontent, ed. David Apter. New York: Free Press. [18] Cover, Thomas M., Joy A. Thomas. 1991. Elements of Information Theory. New York: John Wiley and Sons. [19] Darcy, R. and Hans Aigner. 1980. \The Uses of Entropy in the Multivariate Analysis of Categorical Data." American Journal of Political Science 24: 15574. [20] Davidson, Russell and James G. MacKinnon. 1984. \Convenient Specication Tests for Logit and Probit Models." Journal of Econometrics. 25: 241-62. [21] Denbigh, Kenneth George and J.S. Denbigh. 1985. Entropy in Relation to Incomplete Knowledge. Cambridge: Cambridge University Press. [22] Downs, Anthony. 1957. An Economic Theory of Democracy. New York: Harper and Row. [23] Enelow, James M. and Melvin J. Hinich. 1981. \A New Approach to Voter Uncertainty in the Downsian Spatial Model." American Journal of Political Science 43: 1062-89. [24] Enelow, James M. and Melvin J. Hinich. 1984. \A New Approach to Voter Uncertainty in the Downsian Spatial Model." American Journal of Political Science 25: 483-93. [25] Enelow, James M. and Melvin J. Hinich. 1990. Advances in the Spatial Theory of Voting. Cambridge: Cambridge University Press. [26] Farquharson, Robin. 1969. Theory of Voting. New Haven: Yale University Press. [27] Ferejohn John A. and Morris Fiorina. 1974. \The Paradox of Not Voting: A Decision Theoretic Analysis." American Political Science Review 68: 525-36. [28] Franklin, Charles. 1991. \Eschewing Obfuscation? Campaigns and the Perceptions of U.S. Senate Incumbents." American Political Science Review 85: 1193-214. [29] Gerber, Elisabeth. 1996. \Legislatures, Initiatives, and Representation: The Eects of State Legislative Institutions on Policy." Political Research Quarterly 49: 263-86. 27 [30] Greene, William H. 1993. Econometric Analysis. Second Edition. New York: Macmillan Publishing. [31] Harvey, A.C. 1976. \Estimating Regression Models with Multiplicative Heteroscedasticity." Econometrica 44: 461-5. [32] Hinich, Melvin J. 1977. \Equilibrium in Spatial Voting: The Median Voter Is An Artifact." Journal of Economic Theory 16: 280-19. [33] Jaynes, Edwin T. 1957. \Information Theory and Statistical Mechanics." Physics Review 106: 620-30. [34] Jaynes, Edwin T. 1968 \Prior Probabilities." IEEE Transmissions on Systems Science and Cybernetics SSC(4): 227-41. [35] Jaynes, Edwin T. 1982. \On the Rationale of Maximum-Entropy Methods." Proceedings of the IEEE 70(9): 939-52. [36] Kinchin, A. 1957. Mathematical Foundations of Information Theory. Translation by R. Silverman and M. Friedman, New York: Dover. [37] Kolmogorov, A.N. 1959. \Entropy per Unit Time as a Metric Invariant of Automorphisms." Dokl. Akad. Nauk SSSR 124: 754-55. [38] Lewis, Gilbert N. 1930. \The Symmetry of Time In Physics ." Science LXXI: 569-77. [39] Nie, Norman H., Sidney Verba and John R. Petrocik. 1976. The Changing American Voter. Cambridge, MA: Harvard University Press. [40] Nimmo, Dan. 1970. The Political Persuaders. Englewood Clis, NJ: PrenticeHall. [41] Olson, Mancur. 1965. The Logic of Collective Action. Cambridge, MA: Harvard University Press. [42] Ordeshook, Peter C. 1986 Game Theory and Political Theory. Cambridge: Cambridge University Press. [43] Page, Benjamin I. and Calvin C. Jones. 1979. \Reciprocal Eects of Policy Preferences, Party Loyalties and the Vote." American Political Science Review 73: 1071-89. [44] Palfrey, Thomas R. and Keith T. Poole. 1987. \The Relationship Between Information, Ideology, and Voting Behavior." American Journal of Political Science 31: 512-30. [45] Planck, Max. 1906. Theorie der Warmstrahlung. Leipzig: J.A. Barth. 28 [46] Robert, Claudine. 1990. \An Entropy Concentration Theorem: Applications in Articial Intelligence and Descriptive Statistics." Journal of Applied Probability 27: 303-13. [47] Ruelle, David. 1991. Chance and Chaos. Princeton: Princeton University Press. [48] Ryu, Hang K. 1993. \Maximum Entropy Estimation of Density and Regression Functions." Journal of Econometrics 56: 397-440. [49] Schulman, Mark A. and Gerald M. Pomper. 1975. \Variability in Election Behavior: Longitudinal Perspectives from Causal Modeling." American Journal of Political Science 19: 1-18. [50] Shannon, Claude. 1948. \A Mathematical Theory of Communication." Bell System Technology Journal 27: 379-423, 623-56. [51] Shepsle, Kenneth A. 1972. \The Strategy of Ambiguity: Uncertainty and Electoral Competition." American Political Science Review 66: 555-68. [52] Soo, E.S. 1992. \A Generalizable Formulation of Conditional Logit with Diagnostics." Journal of the American Statistics Association 87: 812-6. [53] Teixera, R. 1992. The Disappearing American Voter. Washington: Congressional Quarterly Books. [54] Theil, Henri. 1967. Economics and Information Theory. Amsterdam: North Holland Publishing Company. [55] Theil, Henri and Denzil G. Fiebig. 1984. Exploiting continuity: maximum entropy estimation of continuous distributions Cambridge, MA: Ballinger Publishing Company. [56] Tribus, Myron. 1961. \Information Theory as the Basis for Thermostatics and Thermodynamics." Journal of Applied Mechanics 28: 1-8. [57] Tribus, Myron. 1979. \Thirty Years of Information Theory." In The Maximum Entropy Formalism, eds. Raphael D. Levine & Myron Tribus. Cambridge: MIT Press. [58] Tribus, Myron and Edward C. McIrvine. 1971. \Energy and Information" Scientic American 225(3): 179-88. [59] Van Campenhout, Jan M. and Thomas M Cover. 1981. \Maximum Entropy and Conditional Probability" IEEE Transactions on Information Theory 27(4): 483-9. [60] Von Neumann, John. 1927. \Thermodynamik Quantenmechanischer Gesamtheiten." Gott. Nach. 273-91. 29 [61] Yatchew, Adonis and Zvi Griliches. 1984. \Specication Error in Probit Models." Review of Economics and Statistics 67: 134-9. [62] Wattenberg, Martin P. 1990. The Decline of American Political Parties. Cambridge, MA: Harvard University Press. [63] Wolnger, Raymond E. and Steven J. Rosenstone. 1980 Who Votes. New Haven: Yale University Press. 30