The Political Entropy of Vote Choice: An
Empirical Test of Uncertainty Reduction
Jeff Gill
Department of Political Science
Department of Statistics
Cal Poly University
E-mail: [email protected]
Prepared for the 1997 Annual Meeting of the American Political Science Association, August 27-31, Washington, DC. The American National Election Study data used in this research were produced by the Center for Political Studies at the University of Michigan and distributed by the Inter-University Consortium for Political and Social Research. Replication data sets are available upon request. Data analysis and graphical representations were produced using S-Plus Version 3.4 and GAUSS Version 3.2.25. Thanks to Stuart Wugalter, SSDBA Administrator, for database support; Dianne Long, David George, and Randal Cruikshanks for ideas and input; and Jeff Moore for the original inspiration.
Abstract
Recent literature in voting theory has developed the idea that individual voting preferences are probabilistic rather than strictly deterministic. This work builds upon spatial voting models (Enelow and Hinich 1981, Ferejohn and Fiorina 1974, Davis, DeGroot and Hinich 1972, Farquharson 1969) by introducing probabilistic uncertainty into the calculus of voting decision on an individual level. Some suggest that the voting decision can be modeled with traditional probabilistic tools of uncertainty (Coughlin 1990, Coughlin and Nitzan 1981). Entropy is a measure of uncertainty that originated in statistical thermodynamics. Essentially, entropy indicates the amount of uncertainty in probability distributions (Soofi 1992), or it can be thought of as signifying a lack of human knowledge about some random event (Denbigh and Denbigh 1985). Entropy in statistics developed with Kolmogorov (1959), Khinchin (1957), and Shannon (1948), but has rarely been applied to social science problems. Exceptions include Darcy and Aigner's (1980) use of entropy to analyze categorical survey responses in political science, and economic applications by Theil (1967) and Theil and Fiebig (1984).
I examine voters' uncertainty as they assess candidates and measured policy positions. I then test whether or not these voters minimize the cost of voting (specifically the cost of information) by determining a maximum entropy selection. Except for the inclusion of entropy terms, this approach is similar to others in the recent literature. In this paper I develop a measure to aggregate evaluation of issue uncertainty and corresponding vote choice, where the uncertainty parameterization is derived from an entropy calculation on a set of salient election issues. The primary advantage of this approach is that it requires very few assumptions about the nature of the data. Using 1994 American National Election Study survey data from the Center for Political Studies, I test the hypothesis that the "Contract with America" reduced voter uncertainty about the issue positions of Republican House candidates. The entropic model suggests that voters used the written and explicit Republican agenda as a means of reducing issue uncertainty without substantially increasing time spent evaluating candidate positions.
An overview of this paper is as follows. Section 1 reviews the relevant literature on voting under uncertainty. Section 2 briefly introduces the history and theory of entropy; this section can be skipped by those familiar with entropy. Section 3 develops and tests a maximum entropy model of voter uncertainty from the 1994 congressional elections. Conclusions are provided in Section 4.
1 Introduction: Information, Uncertainty, and Voting
Complexity, obfuscation, vagueness, and uncertainty are permanent features of American electoral politics (Aldrich, Niemi, Rabinowitz, and Rohde 1982). As modern social and economic issues gain in complexity and legislatures increasingly dodge controversial and divisive votes, a greater number of misunderstood and poorly explained issue positions are thrust onto the ballot. Candidates often have motivations to make their issue positions deliberately vague (Downs 1957, Shepsle 1972, Franklin 1991). This can be out of a desire not to alienate certain constituency groups or simply because the candidate lacks a strongly defined position. Complex or lengthy ballot measures can also confuse voters (Gerber 1996). These are commonly worded in stern legal language, and many voters lack the time or analytical skills necessary to determine policy consequences. There is evidence of this phenomenon even in very high salience elections. Bob Dole made deliberately vague statements on his abortion position in the 1996 presidential election after conservatives in the Republican party successfully placed strong anti-abortion rights language into the party platform at the San Diego convention (San Francisco Chronicle, August 5, 1996, p. A1). The 1996 ballot initiative in California to eliminate state supported affirmative action programs (Proposition 209) confused many voters on its actual intent (Los Angeles Times, October 31, 1996, p. A22). The words "affirmative action" do not appear anywhere in the bill's language, and the bill is counter-intuitively entitled the "California Civil Rights Act." In campaigns it is a well-worn strategy to recast the language of an opponent's issue position into something less popular with voters (Nimmo 1970), and with frequent occurrence this has a generally confusing effect on the electorate.
So why do many voters endure high levels of uncertainty and cast their ballot? The answer is elusive, though well studied in the literature (Downs 1957, Ferejohn & Fiorina 1974, Olson 1965, Wolfinger & Rosenstone 1980, Abramson & Aldrich 1982, Burnham 1987, Teixeira 1992). Rather than address the effect of uncertainty on turnout, I address those individuals that make a decision to vote despite uncertainty. The motivation stems from the observation that models developed without incorporating a voter uncertainty component miss an important criterion of vote choice. Ordeshook (1986, p. 187) points out: "Models that rely on the assumption of complete information may offer a misleading view of the democratic process". Hinich (1977) found that even a small amount of uncertainty is enough to destroy Black's elegant median voter equilibrium result. Coughlin (1990) points out that while considerable work has been done on voting under uncertainty in two-candidate elections (where voters abstain only due to indifference), we still don't have a unified theory leading to equilibrium conditions. Alvarez and Franklin (1994) note the additional problem that there is not a consensus in the literature on how uncertainty is measured, although they were able to exploit the features of one particular survey that provided very specific information on issue-level uncertainty to reveal subtle patterns in the form of the responses.
Bartels (1988) looks at the 1980 presidential election and finds that voters in general dislike uncertainty, and this variance around candidate positions can exceed perceived spatial distances between the candidates. Bartels is the first to point out that uncertainty is not just a function of the candidate or measure, but that voters have differing levels of uncertainty and position (respondent heterogeneity). Like the Bartels uncertainty model, the model developed here rests on the work of Enelow & Hinich (1981) and Shepsle (1972) in that issue positions are not singular and fixed. Instead they are represented by probability distribution functions with manifest observations (samples). It is precisely the quality of these samples, which provide information about the unknown distribution, that affects levels of certainty.
This work leverages a unique aspect of the 1994 midterm election to evaluate how voters interpret differing levels of information about House candidate positions. In this election the Republican leadership in the House developed a well-defined and explicit issue agenda. Each Republican incumbent and challenger signed the "Contract with America," promising that if elected to a majority they would bring ten particular issues to the House floor for a vote within the first 100 days of the 104th Congress. The model developed here focuses on the uncertainty difference between Republican and Democratic candidates in the 1994 House election. Unlike previous measures of uncertainty, which come from survey based indicators or respondent self-assessments, I propose the use of a measure of uncertainty that is well known and tested in the physical sciences. The entropy approach focuses on measures of respondent uncertainty that are produced by the format and content of the survey question.
2 What Is This Thing Called Entropy?
2.1 Entropy in Thermodynamics
In 1865 Clausius asserted that an isolated physical system not in equilibrium always tends to equilibrium. Equilibrium in thermodynamics means absolutely nothing is happening to the particles, as opposed to the conventional mathematical definition of a system having converged to some maxima or minima. While equilibrium states are not very interesting to physicists, the process of reaching equilibrium is well worth consideration. In fact, Clausius's assertion is now known as the second law of thermodynamics. At the heart of this process is the idea of reversibility: "A process that can go in two directions is reversible if two (or more) directions are equally probable for all starting points, given a random nudge" (Ayres 1994, p. 1). Or more precisely, a process is reversible with regard to some serial explanatory factor t if substituting -t for t in all model equations leaves these equations unchanged. Clausius operationalized entropy in the following simple way:

dS = k \frac{dQ}{T}    (1)

where dS is the system's entropy change, dQ is the absolute value of the energy reduction in the system, T is the initial temperature of the system, and k is the number of molecules in the system times 1.38 joules per degree Kelvin (Boltzmann's constant). So large incremental changes in a system's energy relative to its starting absolute temperature produce large gains in entropy, and entropy is maximized when all of the energy is taken out of the system. What made Clausius's assertion so threatening (and therefore controversial) to physicists at the time was that it contradicted the then current paradigm that Newtonian mechanics implied reversibility in equations of particle motion. This can be seen in equation (1) because positive or negative changes in energy both increase the entropy of a system. So entropy is always either constant or increasing.
Boltzmann (1877) developed the idea of the most probable distribution given a set of observations. From this he defined entropy as a physical description of uncertainty and disorder in particle systems. To understand Boltzmann's construct, imagine a finite number of atomic particles in a three dimensional box, each with some exactly known position, velocity, and direction. We therefore know the positions of the particles in the next time period given this information about the particles in the current time period, although the number of calculations tends to infinity very quickly in actual settings. Now suppose that we have some uncertainty about the position,
velocity, and direction of our particles. So instead of knowing exactly the position of the particles in the next time period, we have a set of possible positions. Therefore there are a number of different "microstates" the system can have in the next time period. Add to this situation the complexity that as the particles are moving around they have differing energy states. Given these uncertainties there are clearly a large number of possible microstates the system can take on.
Boltzmann defined entropy as the number of digits in the number of possible microstates (i.e. the log of the number of microstates). Suppose that the number of possible microstates in a given system is 95,000,000,000,000 = 9.5 x 10^13; then entropy ≈ k log10(10^13) = 13k (where k is a proportionality factor). So Boltzmann's idea of entropy is a statistically derived method that indicates the amount of uncertainty (i.e. randomness) in a particle system, where mathematically the number is transformed to a more manageable form (Planck 1906, Von Neumann 1927).
2.2 Entropy in Information Theory
There are many definitions of information in various literatures, but all of
them have the same property of distinction from a message. If a message is
the physical manifestation of information transmission (language, character
set, salutations, headers/trailers, body, etc.), then the information is the
substantive content independent of transmission processes. For example, in
baseball the third base coach transmits a message to the runner on second
base through the use of hand-gestures. The information contained in the
hand gestures is independent of the actual gesture to the base-runner. Suppose you could condense a set of messages down to a minimal finite length.
Then the information content would be the number of digits required by
these alternate sets of messages. In this case information and message would
be identical. Unfortunately information and message are never exactly the
same. For example, the DNA strand required to uniquely determine the characteristics of a person contains far more codes than is minimally sufficient to transmit the genetic information. In information theory entropy
is a measure of the average amount of information required to describe the
distribution of some random variable of interest (Cover and Thomas 1991).
More generally information can be thought of as a measure of reduction in
uncertainty given a specic message (Shannon 1948, Ayres 1994, p.44).
Suppose we wanted to identify a particular voter by serial information
on this person's characteristics. We are allowed to ask a consecutive set of
yes/no questions (i.e. like the common guessing game). As we get answers
to our series of questions we gradually converge (hopefully) on the desired
voter. Our first question is: does the voter reside in California? Now since about 13% of the voters in the United States reside in California, a yes answer gives us different information than a no answer. Restated, a yes answer reduces our uncertainty more than a no answer. If P_i is the probability of
the ith event (residing in California), then the improvement in information
is:
\log_2 \frac{1}{P_i} = -\log_2 P_i    (2)
The log is base 2 since there are only two possible answers to our question (yes and no), making the units of information bits. In our example H_i = -log2(0.13) = 2.943416 bits, whereas if we had asked whether the voter lives in Arkansas, then an affirmative reply would have increased our information by H_i = -log2(0.02) = 5.643856 bits, or about twice as much. However, there is a much smaller probability that we would have gotten an affirmative reply had the question been asked about Arkansas. What Slater (1938) found, and Shannon (1948) later refined, was the idea that the "value" of the question is the information returned by a positive response times the probability of a positive response. So the value of the i-th binary-response question is just:
H_i = f_i \log_2 \frac{1}{f_i} = -f_i \log_2 f_i    (3)
And the value of a series of n of these questions is:

\sum_{i=1}^{n} H_i = k \sum_{i=1}^{n} f_i \log_2 \frac{1}{f_i} = -k \sum_{i=1}^{n} f_i \log_2 f_i    (4)

where f_i is the frequency distribution of the i-th yes answer and k is an arbitrary scaling factor that determines the choice of units. The arbitrary scaling factor makes the choice of base in the logarithm unimportant, since we can change this base by manipulating the constant.^1 The function (4) was actually first introduced by Slater (1938), although similar forms were referenced by Boltzmann, where f_i is the probability that a particle system is in microstate i.

^1 For instance, if the entropy form were expressed in terms of the natural log, but log2 was more appropriate for the application (such as above), then setting k = 1/ln(2) converts the entropy form to base 2.
It's easy to see that the total improvement in information is the additive value of the series of individual information improvements. So in our simple example we might ask a series of questions narrowing down on the individual of interest. Is the voter in California? Is the voter registered as a Democrat? Does the voter reside in an urban area? Is the voter female? The total information supplied by this vector of yes/no responses is the total information improvement in units of bits, since the response-space is binary. It's important to remember that entropy is defined only with regard to a well-defined question having finite, enumerated responses, much the way entropy in statistical thermodynamics is defined only with regard to a carefully defined system (Tribus and McIrvine 1971).
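The California and Arkansas arithmetic above follows directly from equations (2) and (3). A minimal sketch, where 0.13 and 0.02 are the paper's illustrative probabilities:

```python
import math

# Equation (2): information (in bits) from an affirmative answer of probability p.
def surprisal_bits(p: float) -> float:
    return -math.log2(p)

# Equation (3): the "value" of a binary-response question with answer frequency f.
def question_value(f: float) -> float:
    return -f * math.log2(f)

print(round(surprisal_bits(0.13), 6))  # 2.943416 bits (California)
print(round(surprisal_bits(0.02), 6))  # 5.643856 bits (Arkansas)
```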
The link between information and entropy is often confusing due to differing terminology and usage in various literatures. In the thermodynamic sense, information and entropy are complementary: "gain in entropy means loss in information - nothing more" (Lewis 1930). So as uncertainty about the system increases, information about the configuration of the microstate of the system decreases. In terms of information theory, the distinction is less clear. Shannon (1948) presented the first unambiguous picture. To Shannon the function (4) represented the expected uncertainty. In a given communication the possible messages are indexed m_1, m_2, ..., m_k, and each has an associated probability p_1, p_2, ..., p_k. The probabilities necessarily sum to one, so the scaling factor is 1 here. The Shannon entropy function is the negative expected value of the natural log of the probability mass function:
H = -k \sum_i p_i \ln(p_i)    (5)
Shannon defined the information in a message as the difference between the entropy before the message and the entropy after the message. If there is no information before the message, then a uniform prior distribution is assigned to the p_i and entropy is at its maximum.^2 In this case any result increases our information. Conversely, if there is certainty about the result, then a degenerate distribution describes the m_i, and the message does not change our information level.^3 So according to Shannon, the message produces a new assessment about the state of the world, and this in turn leads to the assignment of new probabilities (p_i) and a subsequent updating of the entropy value. It is now easy to see why entropy and Bayesian methods are compatible.
^2 H = -\sum_{i=1}^{n} \frac{1}{n} \ln(\frac{1}{n}) = \ln(n), so entropy increases logarithmically with the number of equally likely alternatives.
^3 H = -\sum 0 \log(0) - 1 \log(1) = 0 (by convention, 0 \log(0) = 0).
The simplicity of the Shannon entropy function belies its theoretical and practical importance. Shannon (1948, Appendix 2) showed that (5) is the only function that satisfies the following three desirable properties:
1. H is continuous in (p_1, p_2, ..., p_n).
2. If the p_i are uniformly distributed, then H is at its maximum and is monotonically increasing with n.
3. If a set of alternatives can be reformulated as multiple, consecutive sets of alternatives, then the first H should equal the weighted sum of the consecutive H values: H(p_1, p_2, p_3) = H(p_1, 1 - p_1) + (1 - p_1) H(p_2 / (1 - p_1), p_3 / (1 - p_1)).
Up until this point only the discrete form of the entropy formulation has been discussed. Since entropy is also a measure of uncertainty in probability distributions, it's important to have a formulation that applies to continuous probability distribution functions. The following entropy formula also satisfies all of the properties enumerated above:
H = -\int_{-\infty}^{+\infty} f(x) \ln(f(x)) dx    (6)
This form is often referred to as the differential entropy of a random variable X with a known probability density function f(x). There is one important distinction between (6) and (5): the discrete case measures the absolute uncertainty of a random variable given a set of probabilities, whereas in the continuous case this uncertainty is measured relative to the chosen coordinate system (Ryu 1993). So transformations of the coordinate system require transformation of the probability distribution function with the associated Jacobian. In addition, the continuous case can have infinite entropy unless additional "side" conditions are imposed (discussed below).
There has been considerable debate about the nature of information with
regard to entropy (Ayres 1994, Ruelle 1991, Tribus 1979, Jaynes 1967). A
major question arises as to whose entropy is it? The sender? The receiver?
The transmission equipment? These are often philosophical debates rather
than practical considerations. Shannon's focus was clearly on the receiver
as the unit of entropy measure. Jaynes (1979) considers the communication
channel and its limitations as the determinants of entropy. Some, like Aczel
and Daroczy (1975), view entropy as a descriptor of a stochastic event.
The traditional thermodynamic definition described above sees entropy as a measure of uncertainty in microstates. For the purposes of this work, the Shannonian definition (5) is used. The perspective developed in this work
is that the voter has some ignorance of candidate preferences and policies
that changes with the receipt of new information.
2.3 The Maximum Entropy Concept
Maximum entropy can be conceptualized in a very simple manner. Suppose you were to bet on the outcome of the roll of two dice that are not necessarily "fair" (a non-uniform distribution of outcomes on each die). There are obviously eleven possible outcomes, each having unknown probabilities of occurrence. Even though you did not know the distribution of outcomes, you would still be advised to bet on the number seven as opposed to two or twelve, since there are more combinations that produce seven (6) than two (1) or twelve (1). The bet on seven is the maximum entropy bet because, lacking prior information about the probabilities, seven produces the maximum expected value. The theoretical basis of this example is best explained by Jaynes (1968): "...the probability distribution which maximizes the entropy is numerically identical with the frequency distribution which can be realized in the greatest number of ways."
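The combinatorics behind the two-dice example can be checked by brute force; this sketch simply counts which (die1, die2) microstates produce each total:

```python
from collections import Counter

# Count how many (die1, die2) microstates produce each two-dice total.
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))

print(counts[7], counts[2], counts[12])  # 6 1 1
print(max(counts, key=counts.get))       # 7: the maximum entropy bet
```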
The first derivation of maximum entropy was accomplished by Planck (1906). Shannon (1948, p. 629) proved that with the side condition E[X^2] = \sigma^2 (\sigma bounded) for some random variable X, the maximum entropy distribution is Gaussian(0, \sigma^2). In fact, all of the commonly used probability distribution functions are maximum entropy distributions provided the appropriate moment constraints are given. Van Campenhout and Cover (1981) point out that even the Cauchy distribution function is a maximum entropy distribution over all distributions that satisfy the side condition E[\ln(1 + X^2)] = k, for some constant k. Jaynes (1957) and Tribus (1961) developed the idea that the maximum of an entropy function has particular value with regard to the transmission of information. The maximum entropy philosophy can be condensed down to the following statement: when we make parametric inferences based on incomplete information, we should select the probability distribution function with the maximum entropy value given observed data (Jaynes 1982).
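Shannon's result can be illustrated with closed-form differential entropies, comparing a Gaussian against a Laplace distribution of equal variance. The closed forms used are standard results, not from the paper:

```python
import math

# Differential entropy (equation (6)) in closed form for two distributions
# sharing the side condition E[X^2] = sigma^2:
#   Gaussian:                        0.5 * ln(2 * pi * e * sigma^2)
#   Laplace (scale b = sigma/sqrt(2)): 1 + ln(2 * b)
sigma = 1.0
h_gauss = 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)
h_laplace = 1 + math.log(2 * sigma / math.sqrt(2))

print(round(h_gauss, 4), round(h_laplace, 4))  # 1.4189 1.3466
assert h_gauss > h_laplace  # the Gaussian is the maximum entropy distribution
```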
Suppose we are interested in making inferences about some unknown distribution function. We have observations from the unknown distribution and a set of side conditions that must be met. For instance: the value of the n-th moment, or that the probabilities of the set of possible events sum to one. The maximum entropy density function is the least informative (and therefore most conservative) estimate of the unknown underlying distribution function. Jaynes (1982) points out that distributions with entropy
much lower than the maximum entropy are atypical given the data. This
statement nicely links the similar orientation of maximum entropy methods
and traditional maximum likelihood methods.
[Figure 1: Simple Maximum Entropy Examples. The left panel plots entropy against p for the Bernoulli PMF, with the maximum entropy at p = 0.5; the right panel plots entropy over probability 1 and probability 2 for a trichotomous outcome, with the maximum entropy at 1.350541.]
As a simple example, suppose we were interested in making simple Bernoulli inferences derived from flipping a coin. If we have no prior knowledge of the probability of getting a head (p), then what is the maximum entropy formulation of the Bernoulli probability mass function? We place the single side restriction that p + p^c = 1. So the entropy function is:

H = -k \sum_i p_i \log_2 p_i = -(p \log_2 p + (1 - p) \log_2(1 - p))

In this case the maximum entropy is reached for a uniform distribution over the two alternatives. Therefore maximizing H subject to the simple side restriction produces H_max = 1 where p = 1/2. This is seen as the first example
in Figure 1. If there are three alternatives, each with a distinct probability, then the maximum entropy occurs at 1.35 for the uniform probability case (p_i = 0.33). The maximum entropy for this case can be visualized in three-space, as shown in the second example of Figure 1. Note that the third probability does not need to be assigned to an axis since once the first two probabilities are selected, the third is determined by the side condition.
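The Bernoulli example can be verified with a simple grid search (a sketch; the grid resolution is arbitrary):

```python
import math

# Bernoulli entropy in bits: H(p) = -(p*log2(p) + (1-p)*log2(1-p)).
def bernoulli_entropy(p: float) -> float:
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Maximizing H over a grid, with the side restriction p + (1 - p) = 1 built in,
# recovers the uniform solution: H_max = 1 bit at p = 1/2.
best = max((i / 100 for i in range(101)), key=bernoulli_entropy)
print(best, bernoulli_entropy(best))  # 0.5 1.0
```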
2.4 Entropy Applied to Vote Choice
The entropy concept has been applied to a wide variety of problems concerning uncertainty, as described above. Yet there is no known application of entropy to measuring the lack of knowledge political actors have about each other's positions and intentions. Uncertainty about candidates and ballot issues is a fundamental problem for voters. Information is necessarily limited by transmission means (Shannon 1948), candidates and issues can be confusing at the source (Franklin 1991, Gerber 1996), and voters have limited time and attention and can search for shortcuts for even the most salient election issues (Carmines and Stimson 1980). The search for a measure of uncertainty is disadvantaged in that uncertainty in individuals is a subjective and changing phenomenon (Alvarez and Franklin 1994). Entropy, however, is a measure of uncertainty independent of a survey respondent's subjective assessment of issue-level uncertainty. It is independent because it is a function of how the question is constructed (the number of response alternatives) and the distribution of responses from aggregate data, nothing else. This approach to measuring voter uncertainty not only has very few assumptions, but it also applies to survey data where there are no specific uncertainty questions (Alvarez and Franklin 1994); there is no need to estimate a uniform variance threshold (Bartels 1988), and no need to exploit some unique attribute in the data (Franklin 1991). In the next section I build a vote choice model for the 1994 House election where a respondent's utility, and therefore vote choice, are a negative function of issue distance and entropy.
3 An Entropy Model of Vote Choice
In this section a model of congressional vote choice is developed using the
entropy concept as a measure of voter uncertainty. This model is based on
the well-known proximity spatial model developed by Downs (1957), Davis,
Hinich, and Ordeshook (1970), Hinich and Pollard (1981), and Enelow and
Hinich (1981, 1984, 1990), and others. Like these works, the model assumes
that voters seek a candidate choice that minimizes the squared Euclidean
distance between their own issue preference and the perceived position of the
candidate. The contribution developed here is the addition of an entropic measure of uncertainty that attempts to capture the observable effects of voter ignorance on an individual issue basis. For each measured policy issue, the squared distance term is added to an entropy value for that issue. For example, the healthcare variable is constructed by squaring the distance between the respondent's preferred level of federal involvement in managing various healthcare programs and the respondent's placement of the Republican and Democratic candidates on the same scale, added to an entropy value for the healthcare scale. This is done for five issues topical to the 1994 congressional election in which the American National Election Study queries respondents about their position and their perceived position of the Democratic and Republican candidates for the House. In addition, the model, like
that of Bartels (1988) and Palfrey and Poole (1987), allows for heterogeneity
of candidate placement on these issues.
Because respondents obviously evaluate candidates on issues beyond
those of current media and congressional attention, an ideology entropy
term is included which attempts to capture issue preference for which the
data set does not provide a direct comparison between respondents' preferences and perceived candidate positions. The assumption is that, lacking direct and current information about candidates' policy preferences, voters (or potential voters) use ideology as an evaluation measure of last resort. It has long been argued that voters are not typically ideological in determining candidate and issue preference (Converse 1964), that voters are only more ideological as a response to ideologically defining choices (Nie, Verba and Petrocik 1976), and that voters can sometimes select candidates that differ dramatically from their preferred issue positions (Bishop, Tuchfarber and Oldendick 1978). Unlike these perspectives, the inclusion of the ideology term focuses not on the ideology of the respondent as a determinant of vote choice, but instead on respondents' uncertainty about candidate ideology. In other words, the more uncertainty about a candidate's ideology, the lower the expected utility for that candidate.
The utility function incorporates K = 5 policy preference comparisons in which respondent and perceived candidate positions can be explicitly compared with the survey data. The utility of candidate j to voter i is:

U_{ij} = \sum_{k=1}^{5} -((C_{ijk} - R_{ik})^2 + H_{jk}) - \hat{H}_{ij}    (7)

where C_{ijk} is respondent i's placement of candidate j on policy issue k, R_{ik} is respondent i's self-placement on issue k, H_{jk} is candidate j's entropy for issue k, and \hat{H}_{ij} is respondent i's ideology entropy for candidate j. The
entropy function for a given candidate (Democrat or Republican) and a given
issue is constructed according to (5) where the probabilities for each of the
seven-point responses are the empirically observed relative frequencies. For
example, the Democratic candidates' respondent placement distribution on support for Clinton is the normalized vector

{0.076880, 0.014485, 0.000000, 0.899164, 0.000000, 0.006685, 0.002786}

producing an entropy value of H_{Democrat, Support Clinton} = 0.4040129. The
heterogeneity of uncertainty and squared issue distance for each candidate/voter pairing is captured by indexing through the K = 5 issues.
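The reported entropy value can be reproduced from the placement vector with equation (5), assuming natural logs, k = 1, and zero-probability cells contributing nothing:

```python
import math

# The Democratic candidates' placement distribution on support for Clinton,
# as reported in the text, and its Shannon entropy per equation (5).
placement = [0.076880, 0.014485, 0.000000, 0.899164,
             0.000000, 0.006685, 0.002786]
H = sum(-p * math.log(p) for p in placement if p > 0)

print(round(H, 3))  # 0.404
```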
The ideology entropy term is added to the model as a means of capturing uncertainty about issues that are not specifically addressed in the data. If there is not an issue preference question and a corresponding set of candidate issue placement questions in the survey, then it is impossible to obtain the squared distance term for this issue. Since a limited number of issues is topical to any given campaign and voters necessarily trade off the value of information with the cost of obtaining the information, ideology (or perceived ideology) represents a very convenient method for approximating candidate position (Downs 1957, Enelow and Hinich 1981). Ideology placement is used as a surrogate for issues that are not highly salient to the 1994 election or where respondent and candidate placement are not included in the data set.
Because there is uncertainty, the utility of the jth candidate to voter i is probabilistic in the model as developed by Enelow and Hinich (1981). Utility to the ith voter is therefore a combination of proximity to the candidate on currently salient issues and the uncertainty in the voter's mind about that proximity. If the entropy terms in (7) are relatively large, then they reduce utility to voter i since their signs are negative. This implies that voters prefer candidates that are less vague about their issue positions and ideology.
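The expected-utility calculation in (7) is straightforward to express in code. The following Python sketch uses hypothetical placement and entropy values purely for illustration (two issues rather than the paper's five):

```python
def utility(cand_place, self_place, issue_entropies, ideology_entropy):
    """Eq. (7): U_ij = -sum_k[(C_ijk - R_ik)^2 + H_jk] - Hhat_ij.
    Larger squared distances and larger entropy terms lower utility."""
    penalty = sum((c - r) ** 2 + h
                  for c, r, h in zip(cand_place, self_place, issue_entropies))
    return -penalty - ideology_entropy

# Hypothetical two-issue example: perfect agreement on the first issue,
# a two-point gap on the second, with entropy terms attached.
u = utility([4, 4], [4, 2], [0.5, 0.5], 0.4)  # → -(0.5 + 4.5) - 0.4 = -5.4
```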
The entropy term replaces a variance term in the Bartels (1988) model and removes a major difficulty: the variance that a respondent has for a given issue and candidate is not directly observable. To overcome this obstacle, Bartels is forced to define a variance threshold (constant across candidates, issues, and respondents) which defines when respondents will make candidate placement on an issue. Unfortunately this approach requires the assumption of homogeneity of the uncertainty (variance) threshold, and leads to a more complex, two-stage estimation procedure. Conversely, the maximum entropy approach makes absolutely no assumptions about the distribution of the variance of uncertainty (Shannon 1948, Jaynes 1968, 1982).
Furthermore, empirical distributions that satisfy given linear constraints (side conditions), Σ_i p_i = 1 in our case, concentrate around the maximum entropy distribution which satisfies the same constraints (Van Campenhout and Cover 1981, Robert 1990). This finding, formalized as Jaynes' Entropy Concentration Theorem (1968), provides the motivation for treating the unknown distribution of candidate uncertainty as an entropy term.
Figure 2: Ideology Placement By Certainty, Respondent Counts by 7-Point Scale
[Six histogram panels: Democratic and Republican candidate placements for 'Very Certain', 'Pretty Certain', and 'Not Certain' respondents, each over the seven-point scale EL, L, SL, M, SC, C, EC.]
Due to the emphasis on the ideology entropy term in the model, it is constructed differently than the issue entropy contributions. The ideology entropy, Ĥ_ij, is modified by treating aggregate respondent ideology placement of candidates by party as a prior distribution and updating the probabilities with "true" data from interest group rankings. The prior probabilities, π_Dem and π_Rep, are created by weighting the aggregate response probabilities for each party's House candidate ideology. For instance, the unweighted probability that the Republican candidate will be placed as extremely conservative on the seven-point scale, P(Rep = 7), is the number of respondents placing their Republican candidate at 7 divided by the total number placing the Republican candidate. The probabilities are then weighted for each respondent according to their level of certainty in placing the candidate: "very certain", "certain", and "not certain".[4] The weights are the mean difference between the group's mode and the distance of interest. So respondents in a 'certainty by candidate' group with a very large mode value and a sharp mean drop to nearby categories have their own mode upweighted proportionally (as well as near-mode values correspondingly downweighted). The effect is to alter the probabilities whereby more certain respondents have lower prior entropy. Figure 2 summarizes the six differing ideology placement distributions by party and certainty.
Not surprisingly, respondents indicating that they were not certain about ideology placement tended to put the candidates in the middle, clustered around the moderate category. Conversely, respondents who were very sure about ideology placement were not reluctant to label candidates as extreme. Figure 2 also indicates that there is less variance in the distribution of ideology placement for Republicans than for Democrats. This leads back to the research question: Did the Contract with America produce less voter uncertainty about Republican House candidates?
The posterior ideology entropy is constructed by introducing American Conservative Union (ACU) and Americans for Democratic Action (ADA) scores as new observations. The initial scale is created by equally weighting ACU and (100 - ADA) values for Democratic and Republican House members in the 103rd House. Six equally spaced cut points are identified, creating frequencies on a seven-point scale for each party's House member ideology. An empirical pdf for the new observations is produced by normalizing the ordinal scale quantities by the total. The basic entropy formula is modified by adding an unknown function of the new observations:
Ĥ_ij = -Σ_l ρ_l ln(ρ_l) + g(ρ, π)        (8)

where ρ_l is the probability of observing the lth ideology category out of L = 7, and π_l are the corresponding prior probabilities.

[4] VAR 844 for the Democratic candidate and VAR 846 for the Republican candidate.

The value Ĥ_ij is required to be a maximum with regard to marginal changes in the ρ_l, so for each of the L values:

∂Ĥ_ij/∂ρ_l = -(1 + ln(ρ_l)) + ∂g(ρ, π)/∂ρ_l = 0        (9)
Bevensee (1993) provides a simple function that meets this requirement:

g(ρ, π) = Σ_{l=1}^{L} ( ρ_l + ρ_l ln(π_l) ) = 1 + Σ_{l=1}^{L} ρ_l ln(π_l)        (10)

using the constraint Σ_l ρ_l = 1.
This makes (8):

Ĥ_ij = -Σ_{l=1}^{L} ρ_l ln(ρ_l) + 1 + Σ_{l=1}^{L} ρ_l ln(π_l)
     = 1 - Σ_{l=1}^{L} ρ_l ln(ρ_l / π_l)        (11)
I drop the "1" term above since it is uniform across all entropy comparisons in the model and also constitutes a trivial entropy shift (Bevensee 1993). It is easy to show that the second derivative with respect to any ρ_l is negative, so that Ĥ_ij is uniformly concave down and therefore has only one maximum. The continuous-function form of (11) is well studied in the physics literature and is generally referred to as cross entropy. The modified prior entropy formulation for ideology (11) now captures both the respondent's view of candidate ideology uncertainty and a corresponding exogenous empirical estimate.[5]
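A minimal Python sketch of the cross-entropy term (not from the original paper) illustrates its key property: with the constant "1" dropped, the term is never positive and attains its unique maximum of zero when ρ matches the prior π.

```python
import math

def cross_entropy(rho, pi):
    """Eq. (11) with the constant "1" dropped: -sum_l rho_l*ln(rho_l/pi_l).
    This is the negative Kullback-Leibler divergence of rho from pi."""
    return -sum(r * math.log(r / p) for r, p in zip(rho, pi))

uniform = [1.0 / 7] * 7
sharper = [0.70, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05]

ce_match = cross_entropy(uniform, uniform)   # 0.0: rho equals the prior
ce_away = cross_entropy(sharper, uniform)    # strictly negative
```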
The candidates are differentiated by party, so j = 1 indicates the Democratic House candidate and j = 2 indicates the Republican House candidate in the 1994 congressional election. If for voter i, U_i1 > U_i2 because candidate 2 has greater summed squared issue distances and more uncertainty, then i should vote for the Democratic candidate. The form of the utility function and the dichotomous response suggest a relatively simple probit regression test.
[5] A small imperfection exists with regard to pairing these two terms. The exogenous estimate is derived from interest group rankings for the members of the 103rd House, and I have aggregated them by party. Conversely, the respondents are asked to place the 1994 House candidates by party. So to the extent that these are differing groups of individuals, the two measures are weakly independent. If one accepts the argument that the two parties do not radically change in ideological composition in a two-year period, then this concern is considerably lessened.
3.1 Heteroscedasticity
Attitudes and assessments about political figures are likely to vary across respondents such that the residuals from a simple generalized linear model are heteroscedastic. A simple diagnostic graphical test of the model described above indicates that this is the case here. It is not surprising to find that respondents perceive choices and candidate issue differences with nonuniform variation. Heteroscedasticity is particularly troublesome in qualitative choice models as the resulting maximum likelihood estimates of the coefficients are often inconsistent (Amemiya 1981, Davidson and MacKinnon 1984) and the asymptotic covariance matrix (computed from the negative inverse of the Hessian at the maximum likelihood coefficient values) is biased (Yatchew and Griliches 1985). Yatchew and Griliches also point out that the biases are particularly problematic when an explanatory variable is correlated with the heteroscedasticity (1985, p. 138). An effective way of compensating for heteroscedasticity in probit models is to identify parameters that introduce widely varying response patterns and specifically associate them in the model with the variance term. This method, based on Harvey (1976), is developed in Greene (1993, p. 649-51), and effectively employed by Alvarez and Brehm (1995, 1997) to address response heteroscedasticity from issue ambivalence. The functional form is changed from:
Y = Φ(Xβ) + ε        (12)

to:

Y = Φ( Xβ / exp(Z_i'γ) ) + ε        (13)
for a set of model coefficients, β, with corresponding matrix of explanatory observations X, and a set of variance determination coefficients, γ, with a corresponding matrix of explanatory observations Z. Therefore, the variance term is now reparameterized to be a function of some additional set of coefficients and observed values that are suspected of causing the differences in error variances:

σ_i² = exp(Z_i'γ)        (14)
The vector γ is an additional set of coefficients that are estimated but, instead of fitting variance in the response vector, they fit variance in the residuals of the fit from the first set of coefficients, β. This changes the log likelihood of the probit model from:

log L(β|X) = Σ_{i=1}^{n} [ x_i log Φ(X_iβ) + (1 - x_i) log(1 - Φ(X_iβ)) ]        (15)

to:

log L(β, γ|X, Z) = Σ_{i=1}^{n} [ x_i log Φ( X_iβ / exp(Z_iγ) ) + (1 - x_i) log(1 - Φ( X_iβ / exp(Z_iγ) )) ]        (16)
The inclusion of a separate parameterized variance component in (16) addresses the heteroscedasticity problem. A formal test for heteroscedasticity is a likelihood ratio test using the value of the log likelihood in the restricted model (15) (the restriction is that γ = 0) and the unrestricted model (16), each evaluated at the maximum likelihood estimates of the parameters β and γ (Greene 1993, p. 650). The test is simply:

LR = 2( L_unrestricted - L_restricted )        (17)

where LR is distributed χ² with degrees of freedom equal to the length of the Z vector (number of variance estimation parameters).
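As a sketch of the mechanics (not the paper's estimation code), the two log likelihoods and the LR statistic of (15)-(17) can be written in Python as follows; the data arrays are invented for illustration, and the coefficients are arbitrary rather than maximum likelihood values:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def loglik_probit(beta, X, y):
    """Ordinary probit log likelihood, eq. (15)."""
    ll = 0.0
    for xi, yi in zip(X, y):
        p = norm_cdf(sum(b * v for b, v in zip(beta, xi)))
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return ll

def loglik_het_probit(beta, gamma, X, Z, y):
    """Harvey-style heteroscedastic probit log likelihood, eq. (16):
    the linear index is divided by exp(Z_i * gamma)."""
    ll = 0.0
    for xi, zi, yi in zip(X, Z, y):
        scale = math.exp(sum(g * v for g, v in zip(gamma, zi)))
        p = norm_cdf(sum(b * v for b, v in zip(beta, xi)) / scale)
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return ll

# LR statistic of eq. (17); df equals the length of gamma.
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 0.8], [1.0, -0.3]]
Z = [[0.2], [1.0], [-0.4], [0.6]]
y = [1, 0, 1, 0]
LR = 2 * (loglik_het_probit([0.1, 0.9], [0.5], X, Z, y)
          - loglik_probit([0.1, 0.9], X, y))
```

Setting γ = 0 collapses (16) to (15), which is exactly the restriction being tested.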
3.2 Testing the Entropy Model of Uncertainty
The 1994 congressional election represents an interesting and perhaps unique opportunity to test the effect of uncertainty on vote choice. The Republican House candidates signed the "Contract with America" on September 9, 1994 with the goal of capturing the majority in Congress by specifically identifying their policy positions through a set of ten promised House floor votes. It is reasonable to predict that a clearly defined, written agenda, widely reported in the media, increases voter certainty about the Republican candidates' positions. I test this claim by applying the entropy model described above using survey data from the Center for Political Studies' 1994 American National Election Study, conducted between November 9, 1994 and January 9, 1995.
The data include placements for 1,795 respondents on current issues, evaluations of House candidate positions, and demographic information. Respondents were asked both their own position and the position of the House candidates on five issues: support for Clinton, concern about crime, the government's role in providing economic opportunities to citizens, the appropriate level of government spending, and the appropriate level of government involvement in the healthcare system. These variables are particularly useful since they are policy areas addressed by the 103rd Congress and the respondents place themselves and the two candidates for the House. Each of these fifteen variables (five each for the respondent, the Republican, and the Democrat) is rescaled to the seven-point format.[6] Ideology and party identification summary are left in their seven-point format. In each case, those responding "Don't Know" are placed at the midpoint of the scale in order to minimize their squared error distance and subsequent effect on the model. Responses coded as inappropriate or refused are left as missing values. Cases where the House race is for an open seat were dropped since the survey does not provide data on support for Clinton and (obviously) vote on the Crime bill for non-incumbent candidates. This reduces the sample size to 942.
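A minimal sketch of the "Don't Know" recoding rule; the numeric code used for the missing category here is an assumption for illustration, not the ANES codebook value:

```python
def recode_seven_point(response, dont_know_codes=(8,)):
    """Map "Don't Know" responses (hypothetical code 8) to the scale
    midpoint (4) so their squared-distance contribution in eq. (7) is
    minimized; valid 1-7 responses pass through unchanged."""
    return 4 if response in dont_know_codes else response

codes = [1, 7, 8, 4]
recoded = [recode_seven_point(c) for c in codes]  # → [1, 7, 4, 4]
```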
Three general demographic variables are included in the model: education (in years), gender, and race (white or non-white). The model also includes two variables which are intended to capture the respondents' levels of interest and information. The study includes a question which asks the respondents to place themselves on a three-point scale in terms of their attention to the recent campaigns (VAR 124). Respondent information is measured by averaging respondents' number of days reading the newspaper per week (VAR 125), number of days watching television per week (VAR 126), and number of days listening to news on the radio per week (VAR 127). All seven-point scales are constructed whereby higher values are those traditionally associated with Republican positions. For instance, the question that asks about increasing or decreasing government spending for the provision of domestic services (VAR 940) was recoded to make "7" the response indicating government should provide far fewer services.
The dependent variable is candidate choice in the 1994 House election by party of the candidate.[7] So there are exactly two utility functions from (7) for each respondent: U_iD for the ith respondent's utility from the Democratic candidate, and U_iR for the ith respondent's utility from the Republican candidate. The inferential value of this qualitative choice model is assessed by the frequency with which the respondent selects the candidate with the higher utility function. Table 1 summarizes the results from the heteroscedastic probit regression model, (16), on these data.

[6] The respondents' Support for Clinton measure was transformed from the five-point scale to a seven-point scale by combining the approve/disapprove question (VAR 201) with the strength of approval/disapproval question (VAR 202). The seven-point Democratic Candidate Support for Clinton variable and Republican Candidate Support for Clinton variable were created by combining House incumbent supports Clinton half the time (VAR 647) with House incumbent supports Clinton almost always (VAR 645) and House incumbent supports Clinton almost never (VAR 646), then cross-referencing by incumbent's party (VAR 17). Concern About Crime for the respondent and the two candidates is created by rescaling voted for or supported the Crime bill (VAR 1040 for the respondent, VAR 648 and VAR 649 for the incumbent, cross-referenced again by incumbent's party), where the respondents' interior points are derived from the question asking about the appropriate level of spending on programs dealing with crime (VAR 825).

[7] Democrats coded 0, and Republicans coded 1.
Table 1: Parameter Estimates for Entropy Model of Vote Choice, 1994 American National Election Study

                                   Parameter   Standard
                                   Estimate    Error      z-statistic  p-value
Choice Parameters
  Intercept                         -1.116      0.387      -2.882       0.004
  Democratic Support for Clinton    -0.015      0.008      -1.943       0.052
  Republican Support for Clinton     0.030      0.011       2.701       0.007
  Democratic Crime Concern           0.044      0.009       4.960       0.000
  Republican Crime Concern           0.007      0.009       0.699       0.485
  Democratic Gvt. Help Disadv.       0.029      0.011       2.698       0.007
  Republican Gvt. Help Disadv.      -0.006      0.013      -0.438       0.661
  Democratic Gvt. Spending           0.114      0.025       4.633       0.000
  Republican Gvt. Spending          -0.100      0.025      -4.030       0.000
  Democratic Federal Healthcare      0.031      0.008       3.670       0.000
  Republican Federal Healthcare     -0.017      0.010      -1.691       0.091
  Democratic Ideology Entropy        0.104      0.131       0.794       0.427
  Republican Ideology Entropy        0.303      0.068       4.437       0.000
  Party Identification Scale         0.368      0.028      13.158       0.000
Variance Parameters
  Political Interest                -0.013      0.076      -0.176       0.860
  Education                          0.055      0.034       1.633       0.102
  Gender                            -0.184      0.105      -1.750       0.080
  Race                              -0.239      0.160      -1.501       0.133
  Media Exposure                    -0.041      0.024      -1.700       0.089
  Age                               -0.016      0.003      -4.738       0.000

LR Heteroscedasticity Test: LR = 68.2317; p < 0.0001 (chi-square, df = 6)
Goodness of Fit Test: LRT = 359.3869; p < 0.0001 (chi-square, df = 19)
Percent Correctly Classified: 78.66% (using the "naive criteria")
The heteroscedasticity likelihood ratio test clearly indicates that we are justified in developing the Harvey heteroscedastic probit model for these data as the χ² test shows significance at p < 0.0001. Although several of the variance parameters show poor significance, exclusion from the model actually worsened some of the choice parameter significance levels. It is interesting to note that the political interest coefficient is not significant, as one would normally expect political interest to increase a respondent's attention to the election and therefore reduce uncertainty. By a large margin the age variable is the most significant of the variance parameters (the only significant one at α = 0.05). Since the sign on the coefficient is negative and the exponential term (exp(Z_i'γ)) is in the denominator of (16), there is evidence of greater variability in younger voters: these cases needed greater downweighting.

The goodness of fit test (LRT) shows strong evidence that the model selection is appropriate.[8] I include the percent correctly classified value (79%) with some trepidation as it is based here on a threshold value of 0.5. If the sample of response variables is unbalanced in one direction or another, then the 0.5 threshold can incorrectly classify cases (Greene 1993, p. 652). Since there were 498 "1"s in the sample (Republican votes) out of n = 942, there is little justification to explore alternative threshold values.
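The "naive criteria" classification rate can be sketched as follows, with toy probabilities standing in for the fitted values:

```python
def percent_correctly_classified(fitted_probs, outcomes, threshold=0.5):
    """Share of cases where the fitted probability, cut at the threshold,
    matches the observed 0/1 outcome."""
    hits = sum((p > threshold) == bool(y)
               for p, y in zip(fitted_probs, outcomes))
    return 100.0 * hits / len(outcomes)

rate = percent_correctly_classified([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 1])
# → 50.0 on this toy input: two of the four cases are classified correctly
```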
The results in Table 1 suggest that the model does reflect vote choice under uncertainty in a reasonable manner. Of the ten policy issue distance by candidate variables, all but two are signed as expected, six are significant at the p < 0.001 level, and no non-significant coefficient values are paired for the same policy issue. This suggests that there is an effect on vote choice for squared respondent/candidate issue distance for policy on crime, government help for underprivileged social classes, government spending on social programs, and federal involvement in the healthcare system. The single policy issue that is not signed as expected is the candidates' perceived support for Clinton's policies. In this case, as squared issue distance between the respondent and the Democratic candidate increases, there is a greater probability of voting for the Democratic candidate (although there is a p-value of 0.052 on the coefficient). We would normally expect that as the distance increases, a respondent is increasingly dissatisfied with the Democratic candidate and more likely to vote for the Republican candidate. The sign on the coefficient for support for Clinton's policies for the Republican candidate is positive, showing the same kind of counter-intuitive relationship described for the Democratic candidate. This suggests that there is something special happening with this variable. One possible explanation is that Clinton often embraced traditional Republican positions as a "New Democrat" (reinventing government, a focus on the economy, protection of programs for the elderly, reevaluation of welfare policies). Another possible explanation is that this variable is capturing some component of respondents' desire for divided government.

[8] I recognize that the question of which goodness of fit measure best describes a qualitative choice model is unanswered. Rather than take a position on this debate, I caveat the LRT result above with the caution that there are other measures which may yield differing results. However, given the magnitude of the LRT value here, I suspect these differences would not be substantial. Amemiya (1981) discusses some alternatives in detail.
As expected, the party identification coefficient is large and significant. There is a vast literature to support this finding even though the effects appear to be waning (Nie, Verba and Petrocik 1976, Schulman and Pomper 1976, Page and Jones 1979, Wattenberg 1990). Note that the Republican ideology entropy term is highly significant (p < 0.001), whereas there is no evidence for the significance of the Democratic ideology entropy term. This observation and the positive sign suggest that the reduction in uncertainty is an important factor for the Republicans only in the 1994 elections.

Figure 3 shows the effect of the ideology entropy term on the probability of voting for the Republican candidate at the mean response of the other explanatory variables. Since the ideology entropy coefficient for the Democrats is not significant, it is useless to construct a similar graph for the probability of voting for the Democratic candidate. The effect on vote for the Republican candidate shows a slightly curvilinear increase in probability as the (negative) entropy term increases. Several observations can be made from Figure 3. The mean values of the explanatory variables provide a range of vote probability that certainly favors Republican candidates (55% to 99%). We can also observe that as the Republican ideology entropy term approaches maximum entropy (most certainty), the probability of voting for the Republican candidate approaches one. However, the range of the Democratic entropy term is -1.9652 to -0.0514, compared with -3.1212 to -0.0537 for the Republican candidates, implying that respondents are more uncertain of Republican candidates. This suggests that uncertainty is a larger electoral issue for the Republicans in 1994, both because their positions are less clear and because they have substantial opportunity to gain votes by reducing uncertainty. Bartels (1996) has a similar finding in which Democratic presidential candidates are advantaged by "uninformed voting."

A more difficult question to address is why the ideology entropy term for the Democratic candidates is not significant when it is for the Republican candidates. Possibly there is more perceived certainty about the Democratic
Figure 3: Effect on Republican Vote Probability by Ideology Entropy
[Scatter plot: probability of voting for the Republican candidate (vertical axis, 0.0 to 1.0) against ideology entropy values for the Republican candidate (horizontal axis, -3.0 to 0.0; Entropy Min = -3.1213, Entropy Max = -0.0547); the curve rises toward one as entropy approaches its maximum; p < 0.001.]
candidate's ideology since the Democrats had been the majority party in the House for forty years. This seems appropriate since the majority party's positions are generally better known and there were more Democratic incumbents running in 1994 than Republican incumbents (although they obviously fared worse). If one is willing to give the Republican House leadership a great deal of analytical credit, it could be claimed that they were aware that, as the long-term minority party in the House, their policy positions were not well known and there was a need for the "Contract with America" to reduce voter uncertainty.
The entropy term in (11) reflects maximum uncertainty when the ρ vector is uniform (all equal probabilities: 1/7 in this case) and the π vector is as close to degenerate as possible (nearly all of the density assigned to one value).[9] This is equivalent to a respondent being very sure they can place the Republican candidate on the ideology scale, whereas the Republican candidate has equal probability of falling into any of the seven ideology categories. In the case of a seven-choice set where π_1 is assigned 0.9994 and the six other π_i are assigned 0.0001 each, we get an entropy value of -5.9487 from (11). If we insert this value for the Republican ideology entropy and hold other explanatory variables at their mean, the probability of a vote for the Republican candidate is 4.34%. Since this is much lower than the smallest observed ideology entropy value in Figure 3, we know that no respondent was close to being this uninformed.

[9] We cannot make the π vector perfectly degenerate as this would violate the requirement π_i > 0 ∀ i that prevents division by zero in (11).
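A quick numerical check of this extreme case (a sketch, not the paper's code), using the cross-entropy form of (11) with the constant dropped:

```python
import math

def cross_entropy(rho, pi):
    """Eq. (11) without the constant "1": -sum_l rho_l * ln(rho_l / pi_l)."""
    return -sum(r * math.log(r / p) for r, p in zip(rho, pi))

rho = [1.0 / 7] * 7              # uniform respondent distribution
pi = [0.9994] + [0.0001] * 6     # nearly degenerate prior
H = cross_entropy(rho, pi)       # ≈ -5.9488 (the text reports -5.9487)
```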
4 Conclusion
Clearly uncertainty matters. This model demonstrates that voters have an
aversion to vagueness in campaigns. The implication is that politicians gain
by clarifying policy positions provided that those positions are not far out
of line with those preferred by the electorate. Or, at a minimum, politicians
need to trade off the cost of uncertainty with the cost of alienating specific
groups of voters.
Entropy as a measure of uncertainty in political communication, as developed here, indicates a lack of substance in transmissions. Entropy captures the cost of uncertainty in this calculus by measuring this uncertainty through analysis of discrete issue placements. The entropy measure is superior to other tools because it calculates uncertainty without any strict assumptions. Conversely, measures of uncertainty based on variances require indirect estimation of unobservable parameters (Bartels 1988).

There is evidence that voter uncertainty differed by the party of the candidate. Although the Republican candidates fared better in policy preferences in the study (and of course in the election as well), Republican candidates could considerably raise their vote probability by clarifying their positions in the 1994 election. The entropy measure of uncertainty showed that the Republican candidates faced greater levels of uncertainty in the 1994 congressional election, but also had the opportunity to substantially improve their vote probability by decreasing uncertainty about their positions in the minds of voters.
The "Contract with America" was an effort to stake out and explicate a set of specific policy positions. The shape of the curve in Figure 3 suggests that Republican candidates helped themselves considerably whenever they increased the clarity of their positions at the mean of the other explanatory variables. Conversely, there was no evidence that Democratic candidates for the House were affected by differing levels of uncertainty. It is likely that forty years of controlling the House created a high level of perceived certainty about the policy positions of Democratic candidates.
The political sophistication of the Republican leaders in the House during this time is still open to discussion. If the "Contract with America" was developed to increase the clarity of Republican candidates' positions nationwide, then it would imply a high level of sophistication with regard to analyzing voter preferences and uncertainty. However, there are other likely reasons that the leadership developed an explicit agenda and required members to publicly sign the document. These include control of the agenda by the leadership, control of the freshman members, and a strategy for opposing policies advocated by the executive branch. It is entirely possible that the Republican leadership was not aware of the possible advantage available by reducing uncertainty, but simply benefited by luck.
References
[1] Abramson, Paul R. and John H. Aldrich. 1982. "The Decline of Electoral Participation in America." American Political Science Review 76: 502-21.
[2] Aczel, J. and Z. Daroczy. 1975. On Measures of Information and Their Characterizations. New York: Academic Press.
[3] Amemiya, Takeshi. 1981. "Qualitative Response Models: A Survey." Journal of Economic Literature 19: 1483-536.
[4] Aldrich, John H., Richard G. Niemi, George Rabinowitz, and David W. Rohde. 1982. American Journal of Political Science 26: 391-414.
[5] Alvarez, R. Michael and John Brehm. 1995. "American Ambivalence Towards Abortion Policy: Development of a Heteroscedastic Probit Model of Competing Values." American Journal of Political Science 39: 1055-82.
[6] Alvarez, R. Michael and John Brehm. 1997. "Are Americans Ambivalent Towards Racial Policies?" American Journal of Political Science 41: 345-72.
[7] Alvarez, R. Michael and Charles H. Franklin. 1994. "Uncertainty and Political Perceptions." Journal of Politics 56 (3): 671-88.
[8] Ayres, David. 1994. Information, Entropy, and Progress. New York: American Institute of Physics Press.
[9] Bartels, Larry. 1996. "Uninformed Votes: Information Effects in Presidential Elections." American Journal of Political Science 40: 194-230.
[10] Bartels, Larry. 1988. "Issue Voting Under Uncertainty: An Empirical Test." American Journal of Political Science 30: 709-28.
[11] Bevensee, Robert M. 1993. Maximum Entropy Solutions to Scientific Problems. Englewood Cliffs, N.J.: Prentice Hall.
[12] Boltzmann, L. 1877. "Über die Beziehung zwischen dem zweiten Hauptsatze der mechanischen Wärmetheorie und der Wahrscheinlichkeitsrechnung, respective den Sätzen über das Wärmegleichgewicht." Wien. Ber. 76: 373-95.
[13] Burnham, Walter Dean. 1987. "The Turnout Problem." In Elections American Style, ed. A. James Reichley. Washington, D.C.: Brookings Institution.
[14] Carmines, Edward G. and James A. Stimson. 1980. "The Two Faces of Issue Voting." American Political Science Review 74: 78-91.
[15] Coughlin, Peter J. 1990. "Candidate Uncertainty and Electoral Equilibria." In Advances in the Spatial Theory of Voting, eds. James M. Enelow and Melvin J. Hinich. Cambridge: Cambridge University Press.
[16] Coughlin, Peter J. and S. Nitzen. 1981. "Electoral Outcomes with Probabilistic Voting and Nash Social Welfare Maxima." Journal of Public Economics 15: 113-21.
[17] Converse, Phillip E. 1964. "The Nature of Mass Belief Systems." In Ideology and Discontent, ed. David Apter. New York: Free Press.
[18] Cover, Thomas M. and Joy A. Thomas. 1991. Elements of Information Theory. New York: John Wiley and Sons.
[19] Darcy, R. and Hans Aigner. 1980. "The Uses of Entropy in the Multivariate Analysis of Categorical Data." American Journal of Political Science 24: 155-74.
[20] Davidson, Russell and James G. MacKinnon. 1984. "Convenient Specification Tests for Logit and Probit Models." Journal of Econometrics 25: 241-62.
[21] Denbigh, Kenneth George and J.S. Denbigh. 1985. Entropy in Relation to Incomplete Knowledge. Cambridge: Cambridge University Press.
[22] Downs, Anthony. 1957. An Economic Theory of Democracy. New York: Harper and Row.
[23] Enelow, James M. and Melvin J. Hinich. 1981. "A New Approach to Voter Uncertainty in the Downsian Spatial Model." American Journal of Political Science 43: 1062-89.
[24] Enelow, James M. and Melvin J. Hinich. 1984. "A New Approach to Voter Uncertainty in the Downsian Spatial Model." American Journal of Political Science 25: 483-93.
[25] Enelow, James M. and Melvin J. Hinich. 1990. Advances in the Spatial Theory of Voting. Cambridge: Cambridge University Press.
[26] Farquharson, Robin. 1969. Theory of Voting. New Haven: Yale University Press.
[27] Ferejohn, John A. and Morris Fiorina. 1974. "The Paradox of Not Voting: A Decision Theoretic Analysis." American Political Science Review 68: 525-36.
[28] Franklin, Charles. 1991. "Eschewing Obfuscation? Campaigns and the Perceptions of U.S. Senate Incumbents." American Political Science Review 85: 1193-214.
[29] Gerber, Elisabeth. 1996. "Legislatures, Initiatives, and Representation: The Effects of State Legislative Institutions on Policy." Political Research Quarterly 49: 263-86.
[30] Greene, William H. 1993. Econometric Analysis. Second Edition. New York:
Macmillan Publishing.
[31] Harvey, A.C. 1976. \Estimating Regression Models with Multiplicative Heteroscedasticity." Econometrica 44: 461-5.
[32] Hinich, Melvin J. 1977. \Equilibrium in Spatial Voting: The Median Voter Is
An Artifact." Journal of Economic Theory 16: 280-19.
[33] Jaynes, Edwin T. 1957. "Information Theory and Statistical Mechanics."
Physical Review 106: 620-30.
[34] Jaynes, Edwin T. 1968. "Prior Probabilities." IEEE Transactions on Systems
Science and Cybernetics SSC-4: 227-41.
[35] Jaynes, Edwin T. 1982. "On the Rationale of Maximum-Entropy Methods."
Proceedings of the IEEE 70(9): 939-52.
[36] Khinchin, A. 1957. Mathematical Foundations of Information Theory. Translated by R. Silverman and M. Friedman. New York: Dover.
[37] Kolmogorov, A.N. 1959. "Entropy per Unit Time as a Metric Invariant of
Automorphisms." Dokl. Akad. Nauk SSSR 124: 754-55.
[38] Lewis, Gilbert N. 1930. "The Symmetry of Time in Physics." Science LXXI:
569-77.
[39] Nie, Norman H., Sidney Verba and John R. Petrocik. 1976. The Changing
American Voter. Cambridge, MA: Harvard University Press.
[40] Nimmo, Dan. 1970. The Political Persuaders. Englewood Cliffs, NJ: Prentice-Hall.
[41] Olson, Mancur. 1965. The Logic of Collective Action. Cambridge, MA: Harvard University Press.
[42] Ordeshook, Peter C. 1986. Game Theory and Political Theory. Cambridge:
Cambridge University Press.
[43] Page, Benjamin I. and Calvin C. Jones. 1979. "Reciprocal Effects of Policy
Preferences, Party Loyalties and the Vote." American Political Science Review
73: 1071-89.
[44] Palfrey, Thomas R. and Keith T. Poole. 1987. "The Relationship Between
Information, Ideology, and Voting Behavior." American Journal of Political
Science 31: 512-30.
[45] Planck, Max. 1906. Theorie der Wärmestrahlung. Leipzig: J.A. Barth.
[46] Robert, Claudine. 1990. "An Entropy Concentration Theorem: Applications
in Artificial Intelligence and Descriptive Statistics." Journal of Applied Probability 27: 303-13.
[47] Ruelle, David. 1991. Chance and Chaos. Princeton: Princeton University
Press.
[48] Ryu, Hang K. 1993. "Maximum Entropy Estimation of Density and Regression
Functions." Journal of Econometrics 56: 397-440.
[49] Schulman, Mark A. and Gerald M. Pomper. 1975. "Variability in Election Behavior: Longitudinal Perspectives from Causal Modeling." American Journal
of Political Science 19: 1-18.
[50] Shannon, Claude. 1948. "A Mathematical Theory of Communication." Bell
System Technical Journal 27: 379-423, 623-56.
[51] Shepsle, Kenneth A. 1972. "The Strategy of Ambiguity: Uncertainty and
Electoral Competition." American Political Science Review 66: 555-68.
[52] Soofi, E.S. 1992. "A Generalizable Formulation of Conditional Logit with Diagnostics." Journal of the American Statistical Association 87: 812-6.
[53] Teixeira, R. 1992. The Disappearing American Voter. Washington: Congressional Quarterly Books.
[54] Theil, Henri. 1967. Economics and Information Theory. Amsterdam: North
Holland Publishing Company.
[55] Theil, Henri and Denzil G. Fiebig. 1984. Exploiting Continuity: Maximum Entropy Estimation of Continuous Distributions. Cambridge, MA: Ballinger Publishing Company.
[56] Tribus, Myron. 1961. "Information Theory as the Basis for Thermostatics and
Thermodynamics." Journal of Applied Mechanics 28: 1-8.
[57] Tribus, Myron. 1979. "Thirty Years of Information Theory." In The Maximum
Entropy Formalism, eds. Raphael D. Levine and Myron Tribus. Cambridge:
MIT Press.
[58] Tribus, Myron and Edward C. McIrvine. 1971. "Energy and Information."
Scientific American 225(3): 179-88.
[59] Van Campenhout, Jan M. and Thomas M. Cover. 1981. "Maximum Entropy and Conditional Probability." IEEE Transactions on Information Theory
27(4): 483-9.
[60] Von Neumann, John. 1927. "Thermodynamik quantenmechanischer
Gesamtheiten." Gött. Nach. 273-91.
[61] Wattenberg, Martin P. 1990. The Decline of American Political Parties. Cambridge, MA: Harvard University Press.
[62] Wolfinger, Raymond E. and Steven J. Rosenstone. 1980. Who Votes? New
Haven: Yale University Press.
[63] Yatchew, Adonis and Zvi Griliches. 1984. "Specification Error in Probit Models." Review of Economics and Statistics 67: 134-9.