Identification Page – Cover Sheet

Title: Non-Trivial Conditional Probability

Authors:
Kenneth A. Presting, University of North Carolina at Chapel Hill, [email protected]
Christopher A. Pynes, University of Tennessee at Knoxville, [email protected]

Colloquium Submission
Word count: Paper: 3,121; Abstract: 155
Phone: (Presting) (919) 465-7227

Page 1 of 16

Abstract: A prominent feature of Frank Ramsey's seminal work on causality (1990) was the conjecture that an indicative conditional proposition, 'If F, then G', would have a probability equal to the conditional probability of 'G', on the assumption that 'F'. In 1976, David Lewis's 'triviality theorem' showed that if a probability measure should give any truth-functional schema the conditional probability of its components, then that probability could take only four distinct values instead of the usual continuous interval. Lewis's triviality results apply in propositional logic, but Ramsey explicitly identified quantified propositions as his topic. We follow Ramsey by changing the context from a purely Boolean propositional logic to a first-order logic. We introduce our proposal by calculating conditional probabilities in a simple finite example, then in a technical appendix extend our treatment to infinite domains. We conclude by sketching an extension to multivariate cases, and compare our proposal to a classic treatment due to Haim Gaifman.

Non-Trivial Conditional Probability

A prominent feature of Frank Ramsey's seminal work on causality (1990) was the conjecture that an indicative conditional proposition, 'If F, then G', would have a probability equal to the conditional probability of 'G', on the assumption that 'F'. This thesis is sometimes credited to Ernest Adams, who gave indirect support for it in his 1975 work on non-truth-functional conditionals.
Around the same time, David Lewis's 'triviality theorem' showed that if a probability measure should give any truth-functional schema the conditional probability of its components, then that probability could take only four distinct values instead of the usual continuous interval. Recent work has extended, strengthened, and simplified the triviality results (Eells 1994 and Milne 2003). We avoid Lewis's triviality results by changing the context from a purely Boolean propositional logic to a first-order logic, where the Ramsey claim is borne out in almost its original form. Ramsey did not take quantified assertions to express propositions, but we shall not hesitate to do so. We shall calculate conditional probabilities for general conditional propositions under a non-trivial measure, thereby demonstrating how a conventional application of probability theory to first-order models can assign a probability to a quantified conditional formula as a function of its predicates' extensions.

Before we present our general proof, it will be useful to provide a particular instance of our method. Suppose an urn contains nine blocks of two shapes and three colors, thus:

Spheres: 2 Yellow, 1 Red, 3 Blue
Cubes: 1 Yellow, 1 Red, 1 Blue

If one were to perform the experiment of drawing a single block at random from this urn, one would, under common assumptions, assign a probability of 2/3 to the event of drawing a sphere, 1/3 to the event of drawing something yellow, and 1/9 to the event of drawing the yellow cube. Instead of just one, let us draw a number of blocks from the urn, without replacement. Assume we do so in a way that assigns equal probability to each subset of the objects in the urn [1]. Since we are considering a collection of objects rather than an individual, we may address ourselves to general assertions regarding the set of objects we select: What is the probability that the objects selected are all yellow?
What is the probability that within a selection there exists a sphere? Though we will later part company with his analysis, we follow Gaifman (1964) in taking the existential quantifier as primitive. By a common intuition one may truly say 'There is an F' just when the collection under discussion contains at least one member satisfying the open formula 'Fx'. Collections satisfying the universal are then defined by the usual double complementation: '∀x(Fx)' iff '~∃x~(Fx)'. For example, just as every particular object is either yellow or not yellow, so in every collection either there is a non-yellow object or there is not. If no object is non-yellow, then we say all are yellow [2].

The following calculations give answers to some probability questions we might ask about the subset we draw from the urn. Note that for any set with N members, the set of its subsets is the power set, whose size is 2^N. For example, since three objects in the urn are yellow, the number of subsets containing no non-yellow objects is 2^3, or 8. Schematizing two predicates as 'Sx': 'x is spherical' and 'Yx': 'x is yellow', we can calculate the following probabilities:

[1] For example, we might number our blocks '1' through '9', print out the nine-place binary numerals from '000000000' to '111111111' on 512 slips of paper, draw a slip at random, and use the pattern of digits to specify, for each block, whether it has been selected. Alternatively, we might flip a fair coin, once for each block, then include in the random sample just those blocks whose coin flip showed heads.

[2] These definitions entail the intriguing peculiarity that an empty collection is taken to satisfy any universal generalization. This feature of our definitions is essential to the calculations and proof we present below, so we accept it without further discussion.
P[∀x(Yx)] = 2^3 / 2^9 = 2^(-6) = 1/64

P[∀x(Sx)] = 2^6 / 2^9 = 2^(-3) = 1/8

P[∀x(Sx & Yx)] = 2^2 / 2^9 = 2^(-7) = 1/128

Thus we have values for the probability that a randomly selected subset is entirely yellow, entirely spherical, or entirely both. We can use the values in the usual formula for conditional probability. The following calculations express, respectively, the conditional probability that a sample is all yellow given that it is all spherical, and that a sample is all spherical given that it is all yellow:

P[∀x(Yx) | ∀x(Sx)] = P[∀x(Sx & Yx)] / P[∀x(Sx)] = (1/128) / (1/8) = 8/128 = 1/16

P[∀x(Sx) | ∀x(Yx)] = P[∀x(Sx & Yx)] / P[∀x(Yx)] = (1/128) / (1/64) = 64/128 = 1/2

Next we calculate the probability of the corresponding quantified material conditionals. Note that five objects in the urn satisfy the open formula version of the material conditional, 'Sx → Yx', while eight satisfy its converse. Therefore the quantified expression will be satisfied by a number of collections equal to 2 raised to the respective exponent. We obtain the same probabilities again, but now by the direct calculations:

P[∀x(Sx → Yx)] = 2^5 / 2^9 = 2^(-4) = 1/16

P[∀x(Yx → Sx)] = 2^8 / 2^9 = 2^(-1) = 1/2.

Thus we see that a familiar and elementary conditional expression has a probability that is equal to the conditional probability of its consequent given its antecedent. It is easily verified that all the familiar laws of probability are satisfied as well, including Lewis's postulates. Seeing that all the probability theory here is standard, what distinguishes our analysis from the triviality results that have gone before? It is the use of two different but linked probability spaces for calculating the differing probabilities. One space is formed by the extensions of predicates as applied to simple objects, e.g., 'Yellow' or 'Spherical'. The second space is the power set of the first. Events in the first space are extensions of predicates, or open formulas. Events in the second space are the extensions of quantified sentences, or Boolean combinations of sentences.
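The urn calculations above can be checked by brute force. The following Python sketch (the block encoding and helper functions are our own illustrative choices, not part of the paper) enumerates all 2^9 = 512 subsets of the urn and recovers the same values:

```python
# Brute-force check of the urn example: a uniformly random subset of 9 blocks,
# each of the 2^9 = 512 subsets being equally likely.
from itertools import chain, combinations
from fractions import Fraction

# 6 spheres (2 yellow, 1 red, 3 blue) and 3 cubes (1 yellow, 1 red, 1 blue).
blocks = [("sphere", "yellow"), ("sphere", "yellow"), ("sphere", "red"),
          ("sphere", "blue"), ("sphere", "blue"), ("sphere", "blue"),
          ("cube", "yellow"), ("cube", "red"), ("cube", "blue")]

def subsets(xs):
    """All subsets of xs, including the empty set."""
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

def P(pred):
    """Probability that every block in a random subset satisfies pred.
    Note all() over an empty sample is True: the empty collection
    satisfies every universal generalization, as the paper stipulates."""
    samples = list(subsets(range(len(blocks))))
    good = [S for S in samples if all(pred(blocks[i]) for i in S)]
    return Fraction(len(good), len(samples))

is_yellow = lambda b: b[1] == "yellow"
is_sphere = lambda b: b[0] == "sphere"

p_all_yellow  = P(is_yellow)                                    # 2^3 / 2^9 = 1/64
p_all_sphere  = P(is_sphere)                                    # 2^6 / 2^9 = 1/8
p_all_both    = P(lambda b: is_sphere(b) and is_yellow(b))      # 2^2 / 2^9 = 1/128
p_conditional = p_all_both / p_all_sphere                       # 1/16
p_material    = P(lambda b: (not is_sphere(b)) or is_yellow(b)) # 2^5 / 2^9 = 1/16

assert p_all_yellow == Fraction(1, 64)
assert p_all_sphere == Fraction(1, 8)
assert p_all_both == Fraction(1, 128)
assert p_conditional == p_material == Fraction(1, 16)
```

The final assertion is the point of the exercise: the probability of the quantified material conditional coincides with the conditional probability of its consequent's generalization given its antecedent's.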
Spaces of each type are used often in statistics, and the term 'sample space' is applied to both. Philosophical discussion has focused on the population space alone, because that space corresponds most closely to the familiar propositional calculus. The Lewis-style results apply to truth-functional compounds of propositions, but the quantified material conditional is not a truth-functional compound of its component predicates. Previous discussions of triviality have exclusively considered propositional logic, and thereby overlooked the fact that the quantified conditional proposition can have a conditional probability without trivializing the entire measure.

Is this unfair? Aren't we changing the rules, since Lewis's result assumes a propositional logic on a single probability space? We are not arguing that Lewis's results are mistaken. On the contrary, we are taking Lewis's work (and its subsequent extensions) to show that the recurring intuition which suggests a connection between indicative conditionals and conditional probability should lead us beyond the classical propositional calculus. Many recent authors have had a similar response to Lewis's result, and many efforts toward non-trivial conditional probability have extended the field of discussion past classical logic. Jeffrey and Stalnaker propose a propositional connective which is not only non-truth-functional but is defined by an infinitary process of conditionalization. Richard Bradley has proposed a conditional connective which is extensional and axiomatically defined, but interpreted into a 'conditional algebra' which is distinct from the Boolean algebra of the normal connectives.

A more important consideration is whether an operation of generalization has a legitimate role in the formation of a proposition which is widely considered as anything but general. Consider an obviously singular proposition such as 'If Tweety is a raven, then Tweety is black'.
There appears to be no room in such a sentence for anything like a domain of individuals, let alone a space of samples or a generalization across random samples. This is an important issue, but in this paper we will deliberately do no more than cravenly evade it. As a diversionary tactic, we will refer to a theory of indicative conditionals such as Bill Lycan's Real Conditionals. Lycan postulates that the deep structure of the indicative conditional is a logical generalization over a domain of events. The existence of such a position makes our work here relevant to philosophical discussion of conditionals in natural language.

It is not our purpose here to advance any particular hypothesis in philosophical logic. Rather, it is our purpose to identify a formal structure which may be useful for other workers who are trying to understand the conditional as it is used in natural language. We take some comfort in recognizing how poorly the venerable material conditional serves when taken by itself as a model for conditionals in human speech. Poor as it is, the material conditional is at the foundation of many sophisticated attempts to understand indicative conditionals. The present work is intended to serve as a building block for future theories, in a parallel role to the material conditional.

As a final objection, let us reply to those who have noticed how implausible it is to even consider such a proposition as '∀x(Fx)' when it might mean something as simple as 'Everything is a raven.' It's just too obvious that not everything is a raven, as one cannot help but notice with the most superficial observation. Therefore the probability that everything is a raven is zero, and we cannot conditionalize on that assumption any more than we could divide by zero.
The answer to this objection is, again, that we are proposing in this paper a mathematical device which has certain interesting mathematical properties, but will surely cause problems if taken in itself to be a hypothesis about language. If we were advancing a theory about natural language conditionals (which we are not), then we might suggest that the antecedent of a conditional has a pragmatic role as well as a semantic one. Perhaps those who take upon themselves the project of understanding the semantics of natural language (as we do not) will find it useful to investigate such a hypothesis as the following: 'The antecedent of a conditional identifies the domain of interpretation for that conditional.' On the other hand, maybe they won't. But if they do, we hope our calculations will prove useful to them.

We now present an algebraic proof. Consider a first-order language L, interpreted into a non-empty finite domain U with N elements. We will assume that for every element u in U, L has at least one open formula R(x) such that the extension of R(x) is just the singleton set {u}. We use square brackets, as in '[Rx]' or '[∃x(Rx)]', to denote the extension of any expression, whether open or quantified.

Let <U, Σ, P0> be a Kolmogorov probability space (cf. Billingsley 1995), where U is, as above, the domain of interpretation for our language L, and Σ is a field of sets consisting of the extensions of the open formulas of L, with their intersections, unions, and complements. The measure P0(S) is defined for every S in Σ as:

P0(S) = ||S|| / N

where the notation ||S|| means "the number of elements in the set S". Note that because the field of sets Σ contains all the singletons of U and all the unions thereof, Σ is exactly all the subsets, or the power set, of U. Our second probability space is <Σ, Ψ, P>, where the domain Σ is, as above, the power set of the model domain U.
In this space, the field of sets Ψ is generated not by the open formulas of L, but instead by the quantified sentences. The extension of a sentence '∀x(Fx)' is given by:

[∀x(Fx)] = { S ∈ Σ | ∀s ∈ S, s ∈ [Fx] }.

Since the new domain Σ has 2^N elements, for any C in Ψ we define the probability measure P(C) as:

P(C) = ||C|| / 2^N

Theorem 1. Let L be a first-order language interpreted into a finite non-empty domain U such that every singleton is the extension of some open formula, and let two probability spaces <U, Σ, P0> and <Σ, Ψ, P> be defined as above. If 'Fx' and 'Gx' are any open formulas of L, then P[∀x(Fx → Gx)] = P[∀x(Gx) | ∀x(Fx)].

Proof. Let the number of elements in [Fx] be the non-negative [3] integer Nf, and let the number of elements in [Gx] and [Fx & Gx] be, respectively, the non-negative integers Ng and Nfg. Since [Fx] and [Gx] are subsets of the domain of interpretation U, and Σ is the power set of U, Σ will also contain all subsets of [Fx] and of [Fx & Gx]. By definition, the extension of the generalization [∀x(Fx)] is just the set of S in Σ which are subsets of [Fx]. Similarly for [∀x(Fx & Gx)]. The cardinality of these extensions is then

||[∀x(Fx)]|| = 2^Nf
||[∀x(Fx & Gx)]|| = 2^Nfg.

With these values, recalling that the cardinality of Σ is 2^N, we can calculate the conditional probability by:

P[∀x(Gx) | ∀x(Fx)] = P[∀x(Fx & Gx)] / P[∀x(Fx)] = (2^Nfg / 2^N) / (2^Nf / 2^N) = 2^Nfg / 2^Nf.

We can also calculate the size of the extension [Fx → Gx]. Since [~Fx] is disjoint from [Fx & Gx], this is given by:

||[Fx → Gx]|| = ||[~Fx] ∪ [Fx & Gx]|| = (N − Nf) + Nfg.

Then the corresponding generalization has the cardinality:

[3] It is not necessary to suppose that the extension of F is not empty, because we are not conditioning on [Fx] itself. Even when [Fx] is empty, the extension of [∀x(Fx)] is not empty – it will contain one element, the empty set.
||[∀x(Fx → Gx)]|| = 2^((N − Nf) + Nfg) = (2^(N − Nf))(2^Nfg) = (2^N)(2^(−Nf))(2^Nfg),

and therefore the probability

P[∀x(Fx → Gx)] = (2^N)(2^(−Nf))(2^Nfg) / 2^N = (2^(−Nf))(2^Nfg) = 2^Nfg / 2^Nf,

which is, as desired, the conditional probability we calculated above. The proof of this result for infinite domains is outlined in the Technical Appendix.

Before closing, we shall touch briefly on two issues. First is the relation between the theory of Haim Gaifman and the theory presented here. We will discuss an important limitation of Gaifman's theory, and explain how the mathematical background of his theory compares to ours. In our concluding topic, we will indicate future directions for the treatment of probability of relational expressions, and other expressions involving more than one free variable.

The central concept of Haim Gaifman's 1964 work, "Concerning Measures on First-Order Calculi", is an extension of a common observation in logic. It is familiar to identify the existential quantifier with an infinite disjunction. It is also common to identify a disjunction as the least upper bound, or supremum, of two propositions in a Boolean lattice. Gaifman used the fact that an infinite disjunction corresponds to the least upper bound of all its finite sub-disjunctions. He defines the probability of an existential generalization as the supremum of the probabilities of every finite disjunction of its substitution-instances. It is not obvious that this value has any direct connection to the truth of the existential generalization, but there is a more serious problem. In a wide variety of interesting cases, especially the case of finite domains, this supremum is non-zero and the existential generalization is given a positive probability. In cases of continuous domains such as the real numbers, however, the probability of a substitution-instance of a formula corresponds to the measure of a one-point set.
This measure is generally zero, and Gaifman's procedure cannot assign positive probability to any existential generalization over such domains.

The central result of Gaifman's 1964 paper is the uniqueness theorem. Gaifman began with an assignment of probability values to the substitution-instances of every quantifier-free formula. These formulas include all the unquantified atomic formulas, plus all the Boolean combinations of those formulas. Once probabilities are defined over the molecular quantifier-free formulas, there is then a unique extension of that probability measure to all sentences in the language. Gaifman notes that uniqueness does not obtain when probabilities are specified only for the atomic formulas.

Gaifman's theorem can be understood as an instance of a more general result in measure theory, if we recall how a first-order language is represented by a cylindrical algebra. In a Tarski-style semantics, assignments of values to variables are infinite sequences of domain elements. The set of all possible assignments is an infinite-dimensional Cartesian product, one dimension for every variable in the language. Each of Gaifman's molecular quantifier-free formulas is satisfied by a set of sequences. This set of sequences is a cylinder. It is a basic result of measure theory (cf. Halmos 1950) that once a probability is assigned to every cylinder in an infinite-dimensional Cartesian product, then there is a unique assignment of probability to every measurable subset which extends the cylinder measures.

Now let us conclude by considering relational expressions and other formulas with more than one free variable. Quantified relational sentences are handled more naturally in our theory than in Gaifman's. Our concept of probability for every sentence is always the probability that the sentence is true.
Therefore our concept applies to complex sentences, such as relational sentences with multiple quantified variables, without referring to the probabilities of their substitution-instances. Since the truth of closed relational expressions is defined by the usual Tarski semantics, every such sentence has a truth value at every point in the space of samples. The extension of the sentence is the set of points where it is true, and the probability of the sentence is the measure of its extension. This much is the same as the treatment of single-variable sentences.

A complex issue which arises in our construction is the relation between the extension of an open formula and the extensions of related sentences. Halpern (1990) discusses related issues which arise in Gaifman's system. We agree with Halpern that these complexities do not represent difficulties which make the theory objectionable, but rather show that the theory of probability for first-order languages is an appropriate context for formalizing the complex inference structures of statistical reasoning.

Here is how the complexity arises. Recall that a formula with a single free variable has an extension which is simply a set. The universal generalization of that formula is true in every subset of the formula's extension. The relation between the extension of the formula and the extension of the sentence is a simple power set operation. But the extension of a formula with multiple free variables is a set of ordered tuples, so the simple power set relation does not hold. All of our current proofs depend on this relation, so at this point there is little to say about the numerical probabilities.

It is possible to describe some qualitative relationships. Closed sentences define binary-valued random variables in the space of samples, since they have a truth value at every point.
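The observation about closed sentences can be made concrete with a minimal Python sketch (the three-element domain and the predicate extension are ad hoc choices of ours, not drawn from the paper): evaluating a closed sentence at every sample point yields a 0/1 random variable, and its expectation is exactly the probability the measure P assigns to the sentence.

```python
# Sketch: a closed sentence as a binary-valued random variable on the
# sample space (the power set of a toy domain).
from itertools import chain, combinations
from fractions import Fraction

U = [0, 1, 2]      # an illustrative three-element domain
F_ext = {0, 1}     # assumed extension of an open formula 'Fx'

# The sample space: all 2^3 = 8 subsets of U.
samples = list(chain.from_iterable(combinations(U, r) for r in range(len(U) + 1)))

# Evaluate the closed sentence 'for all x, Fx' at each sample point:
# 1 where it is true, 0 where it is false.
X = [1 if all(u in F_ext for u in S) else 0 for S in samples]

# The expectation of this random variable equals the measure of the
# sentence's extension: 2^||F_ext|| / 2^||U||.
expectation = Fraction(sum(X), len(samples))
assert expectation == Fraction(2 ** len(F_ext), 2 ** len(U))  # 4/8 = 1/2
```

Replacing the 0/1 value at each sample point with the P0-measure of the sample's intersection with a formula's extension gives the set-valued and real-valued variants the text goes on to describe.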
Expressions with a single free variable can define a set-valued random variable, since the expression has a set as its extension at every sample point. This set-valued random variable also defines a real-valued random variable. This works by taking the P0-measure of the formula's extension at every sample point. We may now take the expectation of this variable over the sample space. If the extension of the formula in question is close to the whole population, but not perfectly universal, then the expected value of the extension averaged over the sample space will be smaller than the measure of the original extension. Therefore taking random samples of a population is an efficient means of testing the hypothetical universality of a property.

A similar range of operations is available for formulas with additional free variables. The variety of operations multiplies as free variables increase in number, but a pattern is clear. First, by substituting references to particular individuals for all but one variable, we can obtain indexed families of set-valued or real-valued random variables. If we take the expectation of all those real-valued random variables, then the indexed family of variables becomes a real-valued function of its indices. These indexed families can represent stochastic processes, and other multivariate statistical models.

References

Adams, E. 1975. The Logic of Conditionals. Dordrecht: Reidel.

Billingsley, P. 1995. Probability and Measure, Third Edition. New York: J. Wiley and Sons.

Eells, E. and Skyrms, B., eds. 1994. Probability and Conditionals. Cambridge: Cambridge University Press.

Gaifman, H. 1964. Concerning Measures on First-Order Calculi. Israel Journal of Mathematics 2: 1-17.

Halmos, P. R. 1950. Measure Theory. New York: Springer-Verlag.

Halpern, J. 1990. An Analysis of First-Order Logics of Probability. Artificial Intelligence 46: 311-350.

Lewis, D. 1976. Probabilities of conditionals and conditional probabilities.
Philosophical Review 85: 297-315.

Milne, P. 2003. The simplest Lewis-style triviality proof yet? Analysis 63: 300-304.

Ramsey, F. P. 1990. General Propositions and Causality. Reprinted in F. P. Ramsey's Philosophical Papers, 145-63. Cambridge: Cambridge University Press.