Identification Page – Cover Sheet
Title:
Non-Trivial Conditional Probability
Authors:
Kenneth A. Presting
University of North Carolina at Chapel Hill
[email protected]
Christopher A. Pynes
University of Tennessee at Knoxville
[email protected]
Colloquium Submission
Word count: Paper: 3,121; Abstract: 155
Phone (Presting): (919) 465-7227
Title: Non-Trivial Conditional Probability
Abstract:
A prominent feature of Frank Ramsey’s seminal work on causality (1990) was the
conjecture that an indicative conditional proposition, ‘If F, then G’, would have a probability
equal to the conditional probability of ‘G’, on the assumption that ‘F’. In 1976, David Lewis’s
‘triviality theorem’ showed that if a probability measure should give any truth-functional schema
the conditional probability of its components, then that probability could take only four distinct
values instead of the usual continuous interval.
Lewis’s triviality results apply in propositional logic, but Ramsey explicitly identified
quantified propositions as his topic. We follow Ramsey by changing the context from a purely
Boolean propositional logic to a first-order logic. We introduce our proposal by calculating
conditional probabilities in a simple finite example, then in a technical appendix extend our
treatment to infinite domains. We conclude by sketching an extension to multivariate cases, and
compare our proposal to a classic treatment due to Haim Gaifman.
Non-Trivial Conditional Probability
A prominent feature of Frank Ramsey’s seminal work on causality (1990) was the
conjecture that an indicative conditional proposition, ‘If F, then G’, would have a probability
equal to the conditional probability of ‘G’, on the assumption that ‘F’. This thesis is sometimes
credited to Ernest Adams, who gave indirect support for it in his 1975 work on non-truth-functional conditionals. Around the same time, David Lewis’s ‘triviality theorem’ showed that if
a probability measure should give any truth-functional schema the conditional probability of its
components, then that probability could take only four distinct values instead of the usual
continuous interval. Recent work has extended, strengthened, and simplified the triviality results
(Eells and Skyrms 1994; Milne 2003).
We avoid Lewis’s triviality results by changing the context from a purely Boolean
propositional logic to a first-order logic, where the Ramsey claim is borne out in almost its
original form. Ramsey did not take quantified assertions to express propositions, but we shall not
hesitate to do so. We shall calculate conditional probabilities for general conditional propositions
under a non-trivial measure, thereby demonstrating how a conventional application of
probability theory to first-order models can assign a probability to a quantified conditional
formula as a function of its predicates’ extensions.
Before we present our general proof, it will be useful to provide a particular instance of
our method. Suppose an urn contains nine blocks of two shapes and three colors, thus:
Spheres: 2 Yellow, 1 Red, 3 Blue
Cubes: 1 Yellow, 1 Red, 1 Blue
If one were to perform the experiment of drawing a single block at random from this urn, one
would, under common assumptions, assign a probability of 2/3 to the event of drawing a sphere,
1/3 to the event of drawing something yellow, and 1/9 to the event of drawing the yellow cube.
Instead of just one, let us draw a number of blocks from the urn, without replacement. Assume
we do so in a way that assigns equal probability to each subset of the objects in the urn.¹ Since
we are considering a collection of objects rather than an individual, we may address ourselves to
general assertions regarding the set of objects we select: What is the probability that the objects
selected are all yellow? What is the probability that within a selection there exists a sphere?
Though we will later part company with his analysis, we follow Gaifman (1964) in taking
the existential quantifier as primitive. By a common intuition one may truly say ‘There is an F’,
just when the collection under discussion contains at least one member satisfying the open
formula ‘Fx’. Collections satisfying the universal are then defined by the usual double
complementation: ‘∀x(Fx)’ iff ‘~∃x~(Fx)’. For example, just as every particular object is either
yellow or not yellow, so in every collection either there is a non-yellow object or there is not. If
no object is non-yellow, then we say all are yellow.²
The following calculations give answers to some probability questions we might ask
about the subset we draw from the urn. Note that for any set with N members, the set of its
subsets is the power set, whose size is 2^N. For example, since three objects in the urn are yellow,
the number of subsets containing no non-yellow objects is 2³, or 8. Schematizing two predicates
as ‘Sx’: ‘x is spherical’ and ‘Yx’: ‘x is yellow’, we can calculate the following probabilities:
¹ For example, we might number our blocks ‘1’ through ‘9’, print out the nine-place binary numerals from
‘000000000’ to ‘111111111’ on 512 slips of paper, draw a slip at random, and use the pattern of digits to specify,
for each block, whether it has been selected. Alternatively, we might flip a fair coin, once for each block, then
include in the random sample just those blocks whose coin flip showed heads.

² These definitions entail the intriguing peculiarity that an empty collection is taken to satisfy any universal
generalization. This feature of our definitions is essential to the calculations and proof we present below, so we
accept it without further discussion.
P x(Yx) 
23
1
 26 
9
2
64
26
1
P x(Sx)  9  23 
2
8
P x(Sx & Yx) 
22
1
 27 
9
2
128
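These figures are easy to check mechanically. The following Python sketch is our illustration only (the tuple encoding of blocks and the helper names are ours, not part of the formal apparatus): it enumerates all 512 subsets of the urn and counts those satisfying each universal generalization.

```python
from itertools import chain, combinations
from fractions import Fraction

# The urn: nine blocks, each a (shape, color) pair.
# Spheres: 2 yellow, 1 red, 3 blue; cubes: 1 yellow, 1 red, 1 blue.
urn = ([("sphere", "yellow")] * 2 + [("sphere", "red")] + [("sphere", "blue")] * 3
       + [("cube", "yellow")] + [("cube", "red")] + [("cube", "blue")])

# All 2^9 = 512 subsets of the urn, each assigned equal probability.
samples = list(chain.from_iterable(combinations(urn, k) for k in range(len(urn) + 1)))

def p_universal(predicate):
    """Probability that a uniformly drawn subset satisfies the universal
    generalization of the predicate."""
    return Fraction(sum(1 for s in samples if all(predicate(x) for x in s)),
                    len(samples))

print(p_universal(lambda x: x[1] == "yellow"))                       # 1/64
print(p_universal(lambda x: x[0] == "sphere"))                       # 1/8
print(p_universal(lambda x: x[0] == "sphere" and x[1] == "yellow"))  # 1/128
```

Note that Python’s all(...) returns True on the empty subset, which matches the convention recorded in footnote 2 exactly.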
Thus we have values for the probability that a randomly selected subset is entirely
yellow, entirely spherical, or entirely both. We can use the values in the usual formula for
conditional probability. The following calculations express, respectively, the conditional
probability that a sample is all yellow given that it is all spherical, and that a sample is all
spherical given that it is all yellow:
P x(Yx)|x(Sx)  
P x(Sx)|x(Yx)  
P x(Sx & Yx) 
P x(Sx) 
P x(Sx & Yx) 
P x(Yx) 

1 128
8
1


18
128 16

1 128 64 1


1 64 128 2
Next we calculate the probability of the corresponding quantified material conditionals.
Note that five objects in the urn satisfy the open formula version of the material conditional, ‘Sx
⊃ Yx’, while eight satisfy its converse. Therefore the quantified expression will be satisfied by a
number of collections equal to 2 raised to the respective exponent. We obtain the same
probabilities again, but now by the direct calculations:
P x(Sx  Yx) 
25
1
 2-4 
9
2
16
28
1
P x(Yx  Sx)  9  2-1 
2
2.
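The coincidence of the two calculations can likewise be confirmed by enumeration. This continuation of the earlier sketch (again our illustration, with the same hypothetical encoding) checks that the conditional probability and the probability of the quantified material conditional agree:

```python
from itertools import chain, combinations
from fractions import Fraction

urn = ([("sphere", "yellow")] * 2 + [("sphere", "red")] + [("sphere", "blue")] * 3
       + [("cube", "yellow")] + [("cube", "red")] + [("cube", "blue")])
samples = list(chain.from_iterable(combinations(urn, k) for k in range(len(urn) + 1)))

def p(event):
    # Probability of an event under the uniform measure on subsets of the urn.
    return Fraction(sum(1 for s in samples if event(s)), len(samples))

is_sphere = lambda x: x[0] == "sphere"
is_yellow = lambda x: x[1] == "yellow"

all_spherical = lambda s: all(is_sphere(x) for x in s)                      # ∀x(Sx)
all_yellow    = lambda s: all(is_yellow(x) for x in s)                      # ∀x(Yx)
all_s_impl_y  = lambda s: all(not is_sphere(x) or is_yellow(x) for x in s)  # ∀x(Sx ⊃ Yx)

# The conditional probability equals the probability of the material conditional:
lhs = p(lambda s: all_spherical(s) and all_yellow(s)) / p(all_spherical)
assert lhs == p(all_s_impl_y) == Fraction(1, 16)
print(lhs)  # 1/16
```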
Thus we see that a familiar and elementary conditional expression has a probability that
is equal to the conditional probability of its consequent given its antecedent. It is easily verified
that all the familiar laws of probability are satisfied as well, including Lewis’s postulates.
Seeing that all the probability theory here is standard, what distinguishes our analysis
from the triviality results that have gone before? It is the use of two different but linked
probability spaces for calculating the differing probabilities. One space is formed by the
extensions of predicates as applied to simple objects, e.g., ‘Yellow’ or ‘Spherical’. The second
space is the power set of the first. Events in the first space are extensions of predicates, or open
formulas. Events in the second space are the extensions of quantified sentences, or Boolean
combinations of sentences. Spaces of each type are used often in statistics, and the term ‘sample
space’ is applied to both. Philosophical discussion has focused on the population space alone,
because that space corresponds most closely to the familiar propositional calculus. The Lewis-style results apply to truth-functional compounds of propositions, but the quantified material
conditional is not a truth-functional compound of its component predicates. Previous discussions
of triviality have exclusively considered propositional logic, and thereby overlooked that the
quantified conditional proposition can have a conditional probability without trivializing the
entire measure.
Is this unfair? Aren’t we changing the rules, since Lewis’s result assumes a propositional
logic on a single probability space? We are not arguing that Lewis’s results are mistaken. On the
contrary, we are taking Lewis’s work (and its subsequent extensions) to show that the recurring
intuition which suggests a connection between indicative conditionals and conditional
probability should lead us beyond the classical propositional calculus. Many recent authors have
had a similar response to Lewis’s result. Many efforts toward non-trivial conditional probability
have extended the field of discussion past classical logic. Jeffrey and Stalnaker propose a
propositional connective which is not only non-truth-functional but defined by an infinitary
process of conditionalization. Richard Bradley has proposed a conditional connective which is
extensional and axiomatically defined, but interpreted into a ‘conditional algebra’ distinct from
the Boolean algebra of the normal connectives.
A more important consideration is whether an operation of generalization can legitimately
figure in the formation of a proposition that is widely considered anything but general.
Consider an obviously singular proposition such as ‘If Tweety is a raven, then Tweety is black’.
There appears to be no room in such a sentence for anything like a domain of individuals, let
alone a space of samples or a generalization across random samples.
This is an important issue, but in this paper we will deliberately do no more than cravenly
evade it. As a distractive tactic, we will refer to a theory of indicative conditionals such as Bill
Lycan’s Real Conditionals. Lycan postulates that the deep structure of the indicative conditional
is a logical generalization over a domain of events. The existence of such a position makes our
work here relevant to philosophical discussion of conditionals in natural language. It is not our
purpose here to advance any particular hypothesis in philosophical logic. Rather, it is our
purpose to identify a formal structure which may be useful for other workers who are trying to
understand the conditional as it is used in natural language. We take some comfort in
recognizing how poorly the venerable material conditional serves when taken by itself as a
model for conditionals in human speech. Poor as it is, the material conditional is at the
foundation of many sophisticated attempts to understand indicative conditionals. The present
work is intended to serve as a building block for future theories, in a parallel role to the material
conditional.
As a final objection, let us reply to those who have noticed how implausible it is to even
consider such a proposition as ‘∀x(Fx)’, when it might mean something as simple as
‘Everything is a raven.’ It’s just too obvious that not everything is a raven, as one cannot help
but notice with the most superficial observation. Therefore the probability that everything is a
raven is zero, and we cannot conditionalize on that assumption any more than we could divide by
zero. The answer to this objection is, again, that we are proposing in this paper a mathematical
device which has certain interesting mathematical properties, but will surely cause problems if
taken in itself to be a hypothesis about language. If we were advancing a theory about natural
language conditionals (which we are not), then we might suggest that the antecedent of a
conditional has a pragmatic role as well as a semantic one. Perhaps those who take upon
themselves the project of understanding the semantics of natural language (as we do not) will
find it useful to investigate such a hypothesis as the following:
‘The antecedent of a conditional identifies the domain of interpretation for that
conditional.’
On the other hand, maybe they won’t. But if they do, we hope our calculations will prove
useful to them.
We now present an algebraic proof. Consider a first-order language L, interpreted into a
non-empty finite domain U with N elements. We will assume that L has at least one open
formula R(x) for every element u in U such that the extension of R(x) is just the singleton set
{u}. We use square brackets, as in ‘[Rx]’ or ‘[∀x(Rx)]’, to denote the extension of any
expression, whether open or quantified.
Let <U, Σ, P0> be a Kolmogorov probability space (cf. Billingsley 1995), where U is, as
above, the domain of interpretation for our language L, and Σ is a field of sets consisting of the
extensions of the open formulas of L, with their intersections, unions, and complements. The
measure P0(S) is defined for every S in Σ as:
$$P_0(S) = \frac{\lVert S \rVert}{N}$$
where the notation || S || means “the number of elements in the set S”. Note that because the field
of sets Σ contains all the singletons of U and all the unions thereof, Σ is exactly all the subsets, or
the power set of U.
Our second probability space is <Σ, Ψ, P>, where the domain Σ is, as above, the power
set of the model domain U. In this space, the field of sets Ψ is generated not by the open
formulas of L, but instead by the quantified sentences. The extension of a sentence ‘∀x(Fx)’ is
given by:

$$[\forall x(Fx)] = \{\, S \in \Sigma \mid s \in S \supset s \in [Fx] \,\}.$$
Since the new domain Σ has 2^N elements, for any C in Ψ we define the probability measure P(C)
as:
$$P(C) = \frac{\lVert C \rVert}{2^N}$$
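To make the pair of linked spaces concrete, here is a minimal sketch, assuming an arbitrary four-element domain and an illustrative predicate of our own choosing; nothing in it is specific to the paper beyond the two definitions just given:

```python
from itertools import chain, combinations
from fractions import Fraction

# First space <U, Σ, P0>: the domain and the field of extensions of open formulas.
U = ["a", "b", "c", "d"]                     # an arbitrary four-element domain
N = len(U)
Sigma = [frozenset(s) for s in chain.from_iterable(
    combinations(U, k) for k in range(N + 1))]   # the power set of U

def P0(S):
    """P0(S) = ||S|| / N, for S the extension of an open formula."""
    return Fraction(len(S), N)

# Second space <Σ, Ψ, P>: events are collections of subsets of U.
def P(C):
    """P(C) = ||C|| / 2^N, for C a collection of members of Σ."""
    return Fraction(len(C), 2 ** N)

def extension_forall(F):
    """[∀x(Fx)] = { S ∈ Σ | s ∈ S ⊃ s ∈ [Fx] }, i.e. the subsets of [Fx]."""
    ext_F = frozenset(u for u in U if F(u))
    return [S for S in Sigma if S <= ext_F]

F = lambda u: u in ("a", "b", "c")           # an illustrative predicate, N_f = 3
print(P0(frozenset(u for u in U if F(u))))   # 3/4 in the first space
print(P(extension_forall(F)))                # 2^3 / 2^4 = 1/2 in the second space
```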
Theorem 1.
Let L be a first-order language interpreted into a finite non-empty domain U such that
every singleton is the extension of some open formula, and let two probability spaces <U, Σ, P0>
and <Σ, Ψ, P> be defined as above. If ‘Fx’ and ‘Gx’ are any open formulas of L, then
P[x(Fx  Gx)] = P[x(Gx) | x(Fx)].
Proof.
Let the number of elements in [Fx] be the non-negative³ integer N_f, and let the number of
elements in [Gx] and [Fx & Gx] be, respectively, the non-negative integers N_g and N_fg. Since
[Fx] and [Gx] are subsets of the domain of interpretation U, and Σ is the power set of U, Σ will
also contain all subsets of [Fx] and of [Fx & Gx]. By definition, the extension of the
generalization [∀x(Fx)] is just the set of S in Σ which are subsets of [Fx]. Similarly for [∀x(Fx &
Gx)]. The cardinality of these extensions is then

$$\lVert [\forall x(Fx)] \rVert = 2^{N_f}$$

$$\lVert [\forall x(Fx \,\&\, Gx)] \rVert = 2^{N_{fg}}.$$
With these values, recalling that the cardinality of Σ is 2^N, we can calculate the
conditional probability by:

$$P[\forall x(Gx) \mid \forall x(Fx)] = \frac{P[\forall x(Fx \,\&\, Gx)]}{P[\forall x(Fx)]} = \frac{2^{N_{fg}}/2^N}{2^{N_f}/2^N} = \frac{2^{N_{fg}}}{2^{N_f}}.$$
We can also calculate the size of the extension [Fx ⊃ Gx]. Since [~Fx] is disjoint from
[Fx], this is given by:

$$\lVert [Fx \supset Gx] \rVert = \lVert [\sim Fx] \cup [Fx \,\&\, Gx] \rVert = (N - N_f) + N_{fg}.$$
Then the corresponding generalization has the cardinality:

$$\lVert [\forall x(Fx \supset Gx)] \rVert = 2^{(N - N_f) + N_{fg}} = (2^{N - N_f})(2^{N_{fg}}) = (2^N)(2^{-N_f})(2^{N_{fg}}),$$
and therefore the probability

$$P[\forall x(Fx \supset Gx)] = \frac{(2^N)(2^{-N_f})(2^{N_{fg}})}{2^N} = (2^{-N_f})(2^{N_{fg}}) = \frac{2^{N_{fg}}}{2^{N_f}},$$

which is, as desired, the conditional probability we calculated above. The proof of this
result for infinite domains is outlined in the Technical Appendix.

³ It is not necessary to suppose that the extension of F is not empty, because we are not conditioning on [Fx] itself.
Even when [Fx] is empty, the extension of [∀x(Fx)] is not empty – it will contain one element, the empty set.
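Theorem 1 can also be spot-checked by brute force. The sketch below is ours; it takes an arbitrary five-element domain and compares the two sides of the identity for every pair of extensions [Fx] and [Gx], including the empty [Fx] case covered by footnote 3:

```python
from itertools import chain, combinations
from fractions import Fraction

U = list(range(5))                           # an arbitrary five-element domain
Sigma = [frozenset(s) for s in chain.from_iterable(
    combinations(U, k) for k in range(len(U) + 1))]

def P(event):
    # P(C) = ||C|| / 2^N under the uniform measure on the power set Σ.
    return Fraction(sum(1 for S in Sigma if event(S)), len(Sigma))

for ext_F in Sigma:                          # every possible extension [Fx] ...
    for ext_G in Sigma:                      # ... against every possible [Gx]
        forall_F    = lambda S: S <= ext_F
        forall_FG   = lambda S: S <= (ext_F & ext_G)
        forall_impl = lambda S: all(x in ext_G for x in (S & ext_F))  # ∀x(Fx ⊃ Gx)
        # P[∀x(Fx)] is never zero: even when [Fx] is empty, the empty set
        # satisfies ∀x(Fx) (footnote 3), so the conditional probability is defined.
        assert P(forall_impl) == P(forall_FG) / P(forall_F)

print("Theorem 1 holds for all", len(Sigma) ** 2, "pairs of extensions")
```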
Before closing, we shall touch briefly on two issues. First is the relation between the
theory of Haim Gaifman and the theory presented here. We will discuss an important limitation
of Gaifman’s theory, and explain how the mathematical background of his theory compares to
ours. In our concluding topic, we will indicate future directions for the treatment of probability
of relational expressions, and other expressions involving more than one free variable.
The central concept of Haim Gaifman’s 1964 work, “Concerning Measures on First-Order Calculi”, is an extension of a common observation in logic. It is familiar to identify the
existential quantifier with an infinite disjunction. It is also common to identify a disjunction as
the least upper bound, or supremum, of two propositions in a Boolean lattice. Gaifman used the
fact that an infinite disjunction corresponds to the least upper bound of all its finite sub-disjunctions. He defined the probability of an existential generalization as the supremum of the
probabilities of every finite disjunction of its substitution-instances. It is not obvious that this
value has any direct connection to the truth of the existential generalization, but there is a more
serious problem.
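Rendered symbolically (our formulation of the definition just described, with $a_1, a_2, \ldots$ enumerating the closed terms of the language):

$$P(\exists x\, Fx) = \sup_{n}\, P\big(Fa_1 \vee Fa_2 \vee \cdots \vee Fa_n\big).$$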
In a wide variety of interesting cases, especially the case of finite domains, this
supremum is non-zero and the existential generalization is given a positive probability. In cases
of continuous domains such as the real numbers, the probability of a substitution-instance of a
formula corresponds to the measure of a one-point set. This measure is generally zero, and
Gaifman’s procedure cannot assign positive probability to any existential generalization over
such domains.
The central result of Gaifman’s 1964 paper is the uniqueness theorem. Gaifman began
with an assignment of probability values to the substitution-instances of every quantifier-free
formula. These formulas include all the unquantified atomic formulas, plus all the Boolean
combinations of those formulas. Once probabilities are defined over the molecular quantifier-free formulas, there is then a unique extension of that probability measure to all sentences in the
language. Gaifman notes that uniqueness does not obtain when probabilities are specified only
for the atomic formulas.
Gaifman’s theorem can be understood as an instance of a more general result in measure
theory, if we recall how a first-order language is represented by a cylindric algebra. In a
Tarski-style semantics, assignments of values to variables are infinite sequences of domain
elements. The set of all possible assignments is an infinite-dimensional Cartesian product, one
dimension for every variable in the language. Each of Gaifman’s molecular quantifier-free
formulas is satisfied by a set of sequences. This set of sequences is a cylinder. It is a basic result
of measure theory (cf. Halmos 1950) that once a probability is assigned to every cylinder in an
infinite-dimensional Cartesian product, then there is a unique assignment of probability to every
measurable subset which extends the cylinder measures.
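Stated schematically (our paraphrase of the extension result cited, not Halmos’s exact wording):

$$\mu_0 \text{ a probability pre-measure on the field of cylinder sets } \mathcal{C} \;\Longrightarrow\; \text{there is exactly one probability measure } \mu \text{ on } \sigma(\mathcal{C}) \text{ extending } \mu_0.$$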
Now let us conclude by considering relational expressions and other formulas with more
than one free variable. Quantified relational sentences are handled more naturally in our theory
than in Gaifman’s. Our concept of probability for every sentence is always the probability that
the sentence is true. Therefore our concept applies to complex sentences, such as relational
sentences with multiple quantified variables, without referring to the probabilities of their
substitution instances. Since the truth of closed relational expressions is defined by the usual
Tarski semantics, every such sentence has a truth value at every point in the space of samples.
The extension of the sentence is the set of points where it is true, and the probability of the
sentence is the measure of its extension. This much is the same as the treatment of single-variable sentences.
A complex issue which arises in our construction is the relation between the extension of
an open formula and the extensions of related sentences. Halpern (1990) discusses related issues
which arise in Gaifman’s system. We agree with Halpern that these complexities do not
represent difficulties which make the theory objectionable, but rather show that the theory of
probability for first-order languages is an appropriate context for formalizing the complex
inference structures of statistical reasoning.
Here is how the complexity arises. Recall that a formula with a single free variable has
an extension which is simply a set. The universal generalization of that formula is true in every
subset of the formula’s extension. The relation between the extension of the formula and the
extension of the sentence is a simple power set operation. But the extension of a formula with
multiple free variables is a set of ordered tuples, so the simple power set relation does not hold.
All of our current proofs depend on this relation, so at this point there is little to say about the
numerical probabilities. It is possible to describe some qualitative relationships.
Closed sentences define binary-valued random variables in the space of samples, since
they have a truth value at every point. Expressions with single free variables can define a set-valued random variable, since the expression has a set as its extension at every sample point.
This set-valued random variable also defines a real-valued random variable. This works by
taking the P0-measure of the formula’s extension at every sample point. We may now take the
expectation of this variable over the sample space. If the extension of the formula in question is
close to the whole population, but not perfectly universal, then the expected value of the
extension averaged over the sample space will be smaller than the measure of the original
extension. Therefore taking random samples of a population is an efficient means of testing the
hypothetical universality of a property.
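The claim about averaging can be illustrated numerically, though only under an interpretive assumption that we flag explicitly: we read ‘the formula’s extension at a sample point S’ as [Fx] ∩ S, and take its P0-measure. Under that reading, this sketch (ours throughout) exhibits the predicted shrinkage:

```python
from itertools import chain, combinations
from fractions import Fraction

# Assumed reading (ours): at sample point S, the formula's extension is [Fx] ∩ S,
# and the real-valued random variable is its P0-measure, ||[Fx] ∩ S|| / N.
U = list(range(6))
N = len(U)
samples = [frozenset(s) for s in chain.from_iterable(
    combinations(U, k) for k in range(N + 1))]

ext_F = frozenset(range(5))                  # a nearly universal property: 5 of 6

p0_original = Fraction(len(ext_F), N)        # P0-measure of the full extension: 5/6
expectation = sum(Fraction(len(ext_F & S), N) for S in samples) / len(samples)

print(p0_original)   # 5/6
print(expectation)   # 5/12 -- smaller than the original measure, as claimed above
```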
A similar range of operations is available for the formulas with additional free variables.
The variety of operations multiplies as free variables increase in number, but a pattern is clear.
First, by substituting references to particular individuals for all but one variable, we can obtain
indexed families of set-valued or real-valued random variables. If we take the expectation of all
those real-valued random variables, then the indexed family of variables becomes a real-valued
function of its indices. These indexed families can represent stochastic processes, and other
multivariate statistical models.
References
Adams, E. 1975. The Logic of Conditionals. Dordrecht: Reidel.
Billingsley, P. 1995. Probability and Measure, Third Edition. New York: J. Wiley and Sons.
Eells, E. and Skyrms, B. eds. 1994. Probability and Conditionals. Cambridge: Cambridge
University Press.
Gaifman, H. 1964. Concerning Measures on First-Order Calculi. Israel Journal of Mathematics
2: 1-17.
Halmos, P. R. 1950. Measure Theory. New York: Springer-Verlag.
Halpern, J. 1990. An Analysis of First-Order Logics of Probability. Artificial Intelligence 46:
311-350.
Lewis, D. 1976. Probabilities of conditionals and conditional probabilities. Philosophical
Review 85: 297-315.
Milne, P. 2003. The simplest Lewis-style triviality proof yet? Analysis 63: 300-304.
Ramsey, F. P. 1990. General Propositions and Causality. Reprinted in F. P. Ramsey’s
Philosophical Papers, 145-63. Cambridge: Cambridge University Press.