Probability Logic
Sebastiaan Terwijn
Radboud University Nijmegen
Logic Colloquium 2015, Helsinki

Human probabilistic reasoning
Humans are notoriously bad at probabilistic reasoning.
• Examples from Kahneman and Tversky, where 85% of the respondents rated the probability of an event A ∧ B higher than the probability of just A.
• Substituting plausible for probable leads to violations of the laws of probability.
• Even experts perform badly when reasoning with conditionals (cf. Stenning–van Lambalgen 2008).
• Common mistake: ignoring base rates (cf. Kahneman).
• Random data generated by humans is too regular. (Example: Stapel.)

Probability logic
Two kinds of probability logic:
• “Probabilities over models”. Examples: many! Large literature, e.g.
  - logics assigning probabilities to sentences, including those of Carnap, Gaifman, Scott and Krauss, Nilsson, Väänänen.
  - PCTL in model checking.
  - Valiant’s robust logics.
• “Models with probabilities”
  - H. Friedman’s measure quantifier Q.
  - Keisler and Hoover.
  - ε-logic.
  - Approximate Measure Logic (Goldbring and Towsner).
See also Leitgeb 2014 for a survey of many of these.

How to interpret a universal statement
• Aristotle: “. . . every belief comes either through syllogism or from induction.”
• Sextus Empiricus, Against the Logicians: universal statements are problematic outside the realm of mathematics. Impossibility of stating “All ravens are black”.
• Hume: problem of induction. Popper: falsification instead of verification.
• Long history of alternative interpretations of universal statements.

Probability quantifiers
• Carnap: probability as degree of confirmation. However, in Carnap’s inductive logic (1945), ∀xR(x) always has degree of confirmation zero in infinite models.
• H. Friedman’s measure quantifier Qx. Borel structures (cf. Steinhorn 1985).
• Keisler introduced probability quantifiers of the form (Px > r)ϕ(x). These are hard to combine with the classical ∃ because projections of measurable sets need not be measurable.

Pac-learning
pac (for “probably approximately correct”) is a quality mark for learning algorithms in computational learning theory. The pac-model of learning was introduced by Valiant in 1984.
• X: sample space with an unknown probability distribution D.
• C ⊆ P(X): concept class.
In a typical learning situation, given an unknown concept c ∈ C, we have to formulate a hypothesis h that closely approximates c on the basis of finite samples from X, drawn according to D and labeled according to c.

Pac-learning (continued)
More precisely, we are given an error parameter ε and a confidence parameter δ. The algorithm has to produce a hypothesis h ∈ C that is close to the unknown concept c. Since the algorithm is probabilistic (as it relies on samples drawn according to D), we only require this to happen with high probability.
An algorithm pac-learns an unknown concept c ∈ C if for any D, given ε and δ, it produces, after sampling from X using D, with probability at least 1 − δ a hypothesis h such that Pr_D[c △ h] < ε. That is, the output of the algorithm is probably approximately correct.

Vapnik-Chervonenkis dimension
The VC-dimension of a concept class C ⊆ P(X) (Vapnik and Chervonenkis 1972) is a purely combinatorial measure of the “richness” of the class.
Given S ⊆ X, we can consider all possible “behaviours” c ∩ S, for c ∈ C, of C on the set S. If |S| = n, we view a behaviour as an element of {0, 1}^n. We say that S is shattered by C if the number of behaviours is maximal, i.e. 2^n.
The VC-dimension of C is the largest n such that there is a set S of cardinality n that is shattered by C, and ∞ if there is no largest such n.
Invented in statistics, the notion of VC-dimension has become a central notion in computational learning theory. It was discovered independently in model theory.

VC-dimension and learning theory
Let Π_C(n) be the maximal number of behaviours of C on sets S of cardinality n.
Sauer-Shelah Lemma (1972) Π_C(n) is either bounded by a polynomial of degree d = VC-dim(C) (when d is finite) or equal to 2^n for every n.
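
The notions of behaviour, shattering and VC-dimension can be checked by brute force on a small finite example. The sketch below is only an illustration: the function names and the toy class of intervals on {0, ..., 4} are assumptions made here, not part of the talk. It computes the VC-dimension of that class and compares the growth function with the standard Sauer-Shelah bound, the sum of the binomial coefficients C(n, i) for i ≤ d.

```python
from itertools import combinations
from math import comb


def behaviours(concepts, S):
    """Distinct traces c ∩ S of the concept class on the set S."""
    S = frozenset(S)
    return {frozenset(c) & S for c in concepts}


def is_shattered(concepts, S):
    """S is shattered when all 2^|S| subsets of S occur as traces."""
    return len(behaviours(concepts, S)) == 2 ** len(S)


def vc_dimension(concepts, X):
    """Largest n such that some n-element subset of the finite set X is shattered."""
    d = 0
    for n in range(1, len(X) + 1):
        if any(is_shattered(concepts, S) for S in combinations(X, n)):
            d = n
        else:
            break  # subsets of shattered sets are shattered, so we can stop here
    return d


def growth(concepts, X, n):
    """Pi_C(n): maximal number of behaviours on n-element subsets of X."""
    return max(len(behaviours(concepts, S)) for S in combinations(X, n))


def sauer_bound(n, d):
    """Sauer-Shelah upper bound on Pi_C(n) for a class of VC-dimension d."""
    return sum(comb(n, i) for i in range(d + 1))


# Hypothetical toy class: all intervals {a, ..., b-1} inside X, including the empty set.
X = list(range(5))
intervals = [frozenset(range(a, b)) for a in range(6) for b in range(a, 6)]

d = vc_dimension(intervals, X)
for n in range(1, len(X) + 1):
    print(n, growth(intervals, X, n), sauer_bound(n, d))
```

Intervals can pick out any subset of two points but never the two outer points of three, so the search stops at n = 3; exhaustive search like this is exponential and only meant for tiny examples.
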
The link with learning theory is provided by the “4 Germans” paper:
Theorem (Blumer, Ehrenfeucht, Haussler, and Warmuth 1989) Suppose there exists a learning algorithm for C that produces hypotheses that are consistent with the given samples. Then C is pac-learnable iff the VC-dimension of C is finite.

Sampling from models
M: first-order model. D: arbitrary probability distribution over M.
We want to decide, with high probability, the approximate truth of sentences ϕ on the basis of samples of atomic data taken from M.
We have the following asymmetry between ∃ and ∀:
• On seeing an atomic truth R(a), we know with certainty that ∃xR(x).
• On the other hand, inducing a universal statement ∀xR(x) can only be done probabilistically.
As in the pac-model, it is important that the distribution D is unknown, which is counterbalanced by the fact that success of the learning task is measured using the same distribution D.

ε-Logic
Fix an error parameter ε. Inductively define the relation (M, D) ⊨_ε ϕ as follows.
• For every atomic formula ϕ, (M, D) ⊨_ε ϕ if M ⊨ ϕ.
• The logical connectives ∧ and ∨ are treated classically.
• The existential quantifier is also classical: (M, D) ⊨_ε ∃xϕ(x) if there exists a ∈ M such that (M, D) ⊨_ε ϕ(a).

ε-Logic
• The case of negation is split into subcases as follows:
  - For ϕ atomic, (M, D) ⊨_ε ¬ϕ if (M, D) ⊭_ε ϕ.
  - ¬ distributes in the classical way over ∧ and ∨, e.g. (M, D) ⊨_ε ¬(ϕ ∧ ψ) if (M, D) ⊨_ε ¬ϕ ∨ ¬ψ.
  - (M, D) ⊨_ε ¬¬ϕ if (M, D) ⊨_ε ϕ.
  - (M, D) ⊨_ε ¬(ϕ → ψ) if (M, D) ⊨_ε ϕ ∧ ¬ψ.
  - (M, D) ⊨_ε ¬∃xϕ(x) if (M, D) ⊨_ε ∀x¬ϕ(x).
  - (M, D) ⊨_ε ¬∀xϕ(x) if (M, D) ⊨_ε ∃x¬ϕ(x).
• (M, D) ⊨_ε ϕ → ψ if (M, D) ⊨_ε ¬ϕ ∨ ψ.
• Finally, we define (M, D) ⊨_ε ∀xϕ(x) if Pr_D[a ∈ M | (M, D) ⊨_ε ϕ(a)] ≥ 1 − ε.

ε-Logic
Everything is “as classical as possible”, except the interpretation of ∀. The rules for negation allow us to put every formula in prenex normal form.
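
On a finite model every set is measurable, so the inductive clauses can be run directly. The following is a minimal sketch, not the author's implementation: the tuple encoding of formulas, the function name `holds` and the three-element example are assumptions introduced here, and the universal clause is read as "measure at least 1 − ε", so that ε = 0 means measure 1.

```python
from fractions import Fraction


def holds(model, dist, eps, phi, env=None):
    """Decide (M, D) |=_eps phi on a finite model.

    model: dict mapping relation names to sets of tuples (hypothetical encoding)
    dist:  dict mapping every element of M to its probability
    phi:   nested tuples such as ("forall", "x", ("not", ("atom", "R", "x")))
    """
    env = env or {}
    op = phi[0]
    if op == "atom":
        _, rel, *variables = phi
        return tuple(env[v] for v in variables) in model[rel]
    if op == "and":
        return holds(model, dist, eps, phi[1], env) and holds(model, dist, eps, phi[2], env)
    if op == "or":
        return holds(model, dist, eps, phi[1], env) or holds(model, dist, eps, phi[2], env)
    if op == "imp":  # phi -> psi is read as (not phi) or psi
        return holds(model, dist, eps, ("or", ("not", phi[1]), phi[2]), env)
    if op == "exists":  # classical: some element of M, even one of measure zero
        _, x, body = phi
        return any(holds(model, dist, eps, body, {**env, x: a}) for a in dist)
    if op == "forall":  # probabilistic: the satisfying set has measure >= 1 - eps
        _, x, body = phi
        mass = sum(p for a, p in dist.items()
                   if holds(model, dist, eps, body, {**env, x: a}))
        return mass >= 1 - eps
    if op == "not":  # negation is pushed inward, clause by clause
        psi = phi[1]
        kind = psi[0]
        if kind == "atom":
            return not holds(model, dist, eps, psi, env)
        if kind == "not":
            return holds(model, dist, eps, psi[1], env)
        if kind == "and":
            return holds(model, dist, eps, ("or", ("not", psi[1]), ("not", psi[2])), env)
        if kind == "or":
            return holds(model, dist, eps, ("and", ("not", psi[1]), ("not", psi[2])), env)
        if kind == "imp":
            return holds(model, dist, eps, ("and", psi[1], ("not", psi[2])), env)
        if kind == "exists":
            return holds(model, dist, eps, ("forall", psi[1], ("not", psi[2])), env)
        if kind == "forall":
            return holds(model, dist, eps, ("exists", psi[1], ("not", psi[2])), env)
    raise ValueError(f"unknown formula: {phi!r}")


# Hypothetical example: three equally likely elements, R holds of element 0 only.
M = {"R": {(0,)}}
D = {a: Fraction(1, 3) for a in range(3)}
eps = Fraction(1, 3)

exists_R = ("exists", "x", ("atom", "R", "x"))
forall_not_R = ("forall", "x", ("not", ("atom", "R", "x")))
print(holds(M, D, eps, exists_R), holds(M, D, eps, forall_not_R))
```

In this example both ∃xR(x) and ∀x¬R(x) hold at ε = 1/3, because two thirds of the measure lies on elements outside R. Exact `Fraction` arithmetic is used so that the comparison with 1 − ε is not affected by rounding.
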
Note that both (M, D) ⊨_ε ∃xϕ(x) and (M, D) ⊨_ε ∀x¬ϕ(x) may hold, since the interpretation of the first is the classical one, but the interpretation of the second is that most x’s satisfy ¬ϕ(x). Since by definition ¬∃xϕ(x) ≡ ∀x¬ϕ(x), we see that ε-logic is paraconsistent.

Probabilistic truth
ϕ is ε-valid if (M, D) ⊨_ε ϕ for every M and D.
Note that, unlike in classical logic, validity and satisfiability are not complementary.
Theorem For all ε < ε′, the ε-validities are strictly included in the ε′-validities.
Theorem Although for ε = 0, ε-truth is not the same as classical truth, the 0-valid formulas coincide with the classical validities.

Measurability assumptions on models
In the truth definition we require in particular that the set {x ∈ M : (M, D) ⊨_ε ϕ(x)} is D-measurable, where ϕ may contain parameters from M.
To avoid pathologies, in the following we will require a bit more, namely: for every k-ary predicate R occurring in ϕ, the set of k-tuples satisfying R is D^k-measurable.
We give an example of an excluded model, based on Sierpinski’s argument that under CH there are unmeasurable subsets of the real plane.
Define D on ω_1 by D(A) = 1 if A = ω_1 with the exception of at most countably many elements, and D(A) = 0 otherwise. Then we have:
(ω_1, D) ⊨_0 ∀x∀y(x < y)
(ω_1, D) ⊭_0 ∀y∀x(x < y)
Note that the relation {(x, y) ∈ ω_1^2 : x < y} is not D^2-measurable: since all its vertical sections {y : x < y} have D-measure 1, and all its horizontal sections {x : x < y} have D-measure 0, measurability would contradict Fubini’s theorem.

Sampling from models
EX(D): sampling oracle.
When called upon, the oracle randomly draws an element x from the model M according to the distribution D.
Given a sample of elements, the oracle supplies us with all the atomic truths these elements satisfy, for every relation of every arity in the language L.
We assume that the language L is of finite signature, so that every sample satisfies only finitely many atomic truths.
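
A sampling oracle of this kind is easy to simulate for a finite model. The sketch below is an illustration under stated assumptions only: `make_oracle`, `atomic_truths` and `estimate_unary` are hypothetical helper names, the model and distribution are made up, and the frequency estimate is a naive stand-in rather than any learning algorithm from the literature. The distribution is deliberately concentrated on a single element so that the demonstration is reproducible.

```python
import random


def make_oracle(universe, weights, rng=None):
    """Return EX(D): each call draws one element of the model according to D."""
    rng = rng or random.Random()

    def ex():
        return rng.choices(universe, weights=weights, k=1)[0]

    return ex


def atomic_truths(sample, relations):
    """All atomic facts R(a1, ..., ak) that hold among the sampled elements."""
    elems = set(sample)
    return {(name, tup)
            for name, tuples in relations.items()
            for tup in tuples
            if set(tup) <= elems}


def estimate_unary(ex, relations, name, m):
    """Fraction of m sampled elements a for which the unary fact name(a) is reported."""
    sample = [ex() for _ in range(m)]
    facts = atomic_truths(sample, relations)
    return sum((name, (a,)) in facts for a in sample) / m


# Hypothetical finite model with a unary relation R and a binary relation S.
universe = [0, 1, 2]
relations = {"R": {(0,)}, "S": {(0, 1), (1, 2)}}

# Degenerate distribution: all mass on element 0, so every draw returns 0.
ex = make_oracle(universe, weights=[1.0, 0.0, 0.0])
print(estimate_unary(ex, relations, "R", m=20))
print(atomic_truths([0, 1], relations))
```

Seeing a single fact R(a) in the oracle's answer already settles ∃xR(x), whereas the reported frequency only gives probabilistic evidence about how much of the measure satisfies R.
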

Learning first-order formulas
Definition A (probabilistic) algorithm L pac-learns a sentence ϕ if, for any unknown M and D, given error parameter ε > 0 and confidence parameter δ > 0, and with access to the sampling oracle EX(D), L outputs one of the possibilities M ⊨_{D,ε} ϕ, M ⊨_{D,ε} ¬ϕ, such that with probability at least 1 − δ the output is correct.
Note that both possible outputs can be correct.
Theorem (T 2005) There exists an algorithm L that pac-learns any sentence ϕ. For a fixed number of quantifiers n, L is polynomial in 1/ε and 1/δ.
A sufficient sample size for this algorithm is (Kuyper 2011) (n+1) 1 1 (2n)! ε2 δ

Valiant’s Robust Logic
Introduced by Valiant 2000.
Related to ε-logic in its motivation of interpreting ∀ probabilistically and ∃ classically. However, it restricts to finite models.
More importantly, this logic falls under the “probabilities over models” approach. Models are coded by vectors, which in this context are called scenes, and arbitrary probability distributions over the set of all scenes are considered.
Valiant showed that a certain class of formulas called rules is pac-learnable from scenes in this context.

Approximate Measure Logic
Introduced by Goldbring and Towsner 2012.
AML is similar to ε-logic, but has a completely different motivation. More specifically, it aims to provide a logical framework for the connection between results in finitary combinatorics and results in measure theory or ergodic theory.
Typical example: Szemerédi’s Theorem on arithmetic progressions in sufficiently dense sets of natural numbers.
The idea for this logic was based in part on work by Hrushovski.

Classical truth is not a limit
Theorem There exist sentences ϕ that are not classically valid but that are ε-valid for all ε > 0.
Proof. Consider
lin = ∀x∀y(x ≤ y ∨ y ≤ x)
ϕ = ¬lin ∨ ∃x∀y(y ≤ x)
When (M, D) ⊭_ε ¬lin then we have classically M ⊨ lin, so ≤ really is a linear order in M.
But if this is the case, by the measurability assumption that ≤ is D^2-measurable, there is a point x such that most of the measure is to the left of x.
[Figure: the model M with the point x marked]

Example
The sentence ∀x∀y R(x, y) ∧ ∀y∀x ¬R(x, y) is finitely 1/3-satisfiable:
[Figure: 3×3 grid of truth values for R, in extraction order: v f f v v f f v f]

Example (continued)
The stronger sentence ∀x∀y (R(x, y) ∧ ¬R(y, x)) is also 1/3-satisfiable, but not by any finite model. By approximating this one can obtain finite ε-models for any ε > 1/3.
One can also show that the sentence does not have any countable 1/3-models.

Finite and countable models
Downward Löwenheim-Skolem fails for ε-logic: not every ε-satisfiable sentence has a countable model.
Theorem T.f.a.e.:
(i) ∀M finite M ⊨ ϕ,
(ii) ∀M countable ∀D ∀ε > 0 (M, D) ⊨_ε ϕ.
Notice that it is essential that we exclude the case ε = 0, since otherwise we would obtain all classical validities instead of only the finitely valid sentences.

Failure of classical Löwenheim-Skolem
Example of a satisfiable sentence without any countable model.
Let ϕ = ∀x∀y(R(x, y) ∧ ¬R(x, x)). Then ϕ is 0-satisfiable; take the unit interval [0, 1] with the Lebesgue measure and take R(x, y) to be x ≠ y.
However, ϕ does not have any countable 0-models: if (M, D) ⊨_0 ϕ then for almost every x ∈ M the set
B_x = {y ∈ M | (M, D) ⊨_0 ¬R(x, y) ∨ R(x, x)}
has measure zero. Since x ∈ B_x, the set ⋃_{x∈M} B_x equals M, and therefore has measure 1. But if M is countable it is also the union of countably many sets of measure 0 and hence has measure 0, a contradiction.
Note that ϕ is finitely ε-satisfiable for every ε > 0.

Downward Löwenheim-Skolem
Theorem (Kuyper–T 2013) Let (M, D) be an ε-model and let X ⊆ M be of cardinality at most 2^ω. Then there exists an elementary ε-submodel (N, E) ≺_ε (M, D) such that X ⊆ N and N is of cardinality at most 2^ω.
The result is for countable languages, with equality, but no function symbols.
For languages without equality we have:
Theorem (Kuyper–T 2013) Let Γ be an ε-satisfiable set of sentences.
Then there exists an ε-model on [0, 1] with the Lebesgue measure which ε-satisfies Γ. Furthermore, all relations in the new ε-model can be chosen to be Borel.

The Löwenheim number
Let λ_ε be the Löwenheim number of ε-logic, that is, the smallest cardinal λ such that every ε-satisfiable sentence has a model of cardinality λ.
Theorem (Kuyper–T 2013) For every rational ε ∈ [0, 1) we have
(i) ℵ_1 ≤ λ_ε ≤ 2^ℵ0,
(ii) If Martin’s axiom MA holds then λ_ε = 2^ℵ0.
Hence the statement λ_ε = ℵ_1 is independent of ZFC.
Question Is there a model of ZFC in which λ_ε < 2^ℵ0? For example λ_ε = ℵ_1 < 2^ℵ0?

Compactness
Theorem (Kuyper–T 2013) For every rational ε ∈ (0, 1), ε-logic is not compact, i.e. there exists a countable set Γ of sentences such that each finite subset of Γ is ε-satisfiable, but Γ itself is not ε-satisfiable.
The idea is similar to that for Keisler’s logic.
Theorem (Kuyper 2014) For languages without function symbols, 0-logic is compact.

Weak ε-models
In the definition of ε-model we required that all relations and functions are D-measurable. If we drop this requirement, we obtain the notion of weak ε-model.

Ultraproducts
Given an ultrafilter U on ω, and (finitely additive) measure spaces (X_i, D_i), we can define the ultraproduct
∏_{i∈ω} (X_i, D_i)/U
equipped with a uniquely defined (σ-additive) ultraproduct measure. (Construction due to Hoover.)
Theorem (Kuyper–T 2013)
• A variant of Łoś’ theorem holds.
• The ultraproduct is a weak ε-model.

A probabilistic version of Łoś’ theorem
Theorem (Kuyper–T 2013) T.f.a.e.:
(i) ∏_{i∈ω} (M_i, D_i)/U ⊨_ε ϕ([a^1], . . . , [a^n]),
(ii) for all ε′ > ε, {i ∈ ω | (M_i, D_i) ⊨_ε′ ϕ(a^1_i, . . . , a^n_i)} ∈ U,
(iii) there exists a sequence ε_0, ε_1, . . . with U-limit ε such that {i ∈ ω | (M_i, D_i) ⊨_{ε_i} ϕ(a^1_i, . . . , a^n_i)} ∈ U.
In particular, if {i ∈ ω | (M_i, D_i) ⊨_ε ϕ(a^1_i, . . . , a^n_i)} ∈ U, then we have ∏_{i∈ω} (M_i, D_i)/U ⊨_ε ϕ([a^1], . . . , [a^n]).

Weak models and compactness
Theorem (Kuyper–T 2013) ε-logic becomes compact when considering weak ε-models.
This is very similar to the compactness of Goldbring and Towsner’s AML.

Satisfiability as a limit
Theorem (Kuyper 2014) ⋂_{ε>0} ε-satisfiable = 0-satisfiable.
Theorem (Kuyper 2014) For ε > 0 we have ⋂_{ε′>ε} ε′-satisfiable ≠ ε-satisfiable.
The latter result follows from considerations about the computational complexity of satisfiability.

The complexity of ε-satisfiability
Theorem Given ϕ we can decide whether there exists ε < 1 such that ϕ is ε-satisfiable.
Theorem (Kuyper 2014) For ε = 0, ε-satisfiability is decidable.
The proof uses Skolemization + indiscernibles + Ramsey theory.
Theorem (Kuyper 2014) For ε ∈ (0, 1) ∩ Q, ε-satisfiability is Σ^1_1-complete.
The proof of the latter result uses ideas from Hoover.
Caveat: these results are only for relational languages.

The complexity of ε-validity
Terwijn (2009) proved that for every rational ε ∈ [0, 1), the set of ε-valid formulas is Σ^0_1-hard. (Recall that ε = 0 coincides with the classical case.)
Theorem (Kuyper 2014) For every rational ε ∈ (0, 1), the set of ε-valid formulas is Π^1_1-hard.
Question: Is it Π^1_1-complete?

Finite models of ε-logic
G. Yang considered the complexity of ε-satisfiability and ε-validity over finite models.
Theorem (Yang 2014) Over finite models we have:
(i) for rational ε ∈ (0, 1), ε-satisfiability is Σ^0_1-complete, and ε-validity is Π^0_1-complete.
(ii) for ε = 0, ε-satisfiability is decidable, and ε-validity is Π^0_1-complete.

Some open problems about ε-logic
• For the Löwenheim number λ_ε of ε-logic, does ZFC prove that λ_ε = 2^ℵ0?
• Does the Σ^1_1-completeness of ε-satisfiability hold for languages with = and function symbols?
• Is ε-validity Π^1_1-complete for ε > 0?
• Craig interpolation / Beth definability / Robinson consistency? Note that classically the equivalence of these properties relies on compactness. Potentially, some of these could hold for ε-logic, while others do not.
• Proof theory.
• Study ε-logic as part of the larger area of model-theoretic logics (cf. Barwise and Feferman 1985) and generalized quantifiers (cf. Väänänen 2008).

References
I. Goldbring and H. Towsner, An approximate logic for measures, Israel Journal of Mathematics 199(2) (2014) 867–913.
H. J. Keisler, Probability quantifiers, in: J. Barwise and S. Feferman (eds.), Model-Theoretic Logics, Springer-Verlag 1985, 509–556.
R. Kuyper, Computability, probability and logic, PhD thesis, Radboud University Nijmegen, 2015.
R. Kuyper and S. A. Terwijn, Model theory of measure spaces and probability logic, Review of Symbolic Logic 6(3) (2013) 367–393.
H. Leitgeb, Probability in Logic, Oxford Handbook of Probability and Philosophy, forthcoming, 2014.
S. A. Terwijn, Probabilistic logic and induction, Journal of Logic and Computation 15(4) (2005) 507–515.
L. G. Valiant, Robust logics, Artificial Intelligence 117 (2000) 231–253.
G. Yang, Computabilities of Validity and Satisfiability in Probability Logics over Finite and Countable Models, arXiv, 2014.