Automated Deduction

Stefan Hetzl
[email protected]
Vienna University of Technology
Summer Term 2016

Contents

1 Introduction
2 Resolution in propositional logic
  2.1 Reminders on propositional logic
  2.2 Normal forms
  2.3 Resolution
  2.4 The Tseitin transformation
3 SAT- and SMT-solving
  3.1 DPLL
  3.2 Reminders on first-order logic
  3.3 DPLL(T)
  3.4 Congruence closure
4 Normal forms in first-order logic
  4.1 Variable normal form
  4.2 Negation normal form
  4.3 Skolemisation
  4.4 Clause normal form
5 Resolution in first-order logic
  5.1 Unification
  5.2 Resolution
6 Redundancy
  6.1 Subsumption
  6.2 Tautology deletion
7 Completeness
  7.1 Completeness without equality
  7.2 Completeness with equality
8 Further Topics
  8.1 Induction

Chapter 1
Introduction

It is worthwhile to start these course notes by giving a brief outline of the history of concepts which are fundamental to the area of automated deduction. Even though this outline is quite superficial, it nevertheless serves to illustrate how old many of the ideas and concepts at the base of automated deduction are.

The notion of mathematical proof goes back to ancient Greece, more specifically to the Elements of Euclid (c. 300 BC). This work, having been the first deductive treatment of mathematics, is considered the inception of the axiomatic method. It has had tremendous influence on all of mathematics and, by being the root of the notion of mathematical proof, also on logic and automated deduction.

An important milestone in the conceptual development of automated deduction was the work of Leibniz (1646–1716), who among other subjects made seminal contributions to mathematics and philosophy. He entertained the idea that a dispute between two persons could be solved by writing up the statement under discussion in a universal language (characteristica universalis) and then using a calculus of reasoning (calculus ratiocinator) for deducing the truth of assertions expressed in the universal language. Instead of arguing about a statement, two persons in disagreement could then calculate who is right using this framework. This idea is embodied in the slogan calculemus!, let us calculate! Another aspect of automated deduction, the automation, was also present in Leibniz' work: he designed a computation machine for addition, multiplication, subtraction and division.
While Leibniz never fully worked out proposals of such a universal language and a calculus of reasoning for it, he foresaw, at least on a conceptual level, the three components which are essential to automating deduction: a language for expressing statements, rules of computation applicable to expressions of the language, and the automatisation of these rules.

The first mathematisation of logical reasoning can be attributed to Boole (1815–1864). He proposed to express logical propositions by algebraic equations and thus to treat logic by computation rules much like those valid for the numbers. This provides an infallible method for logical reasoning. In honour of this contribution, propositional logic is often also called Boolean logic.

At the end of the 19th century and well into the 1920s and 1930s, logic underwent an enormous development, characterised primarily by its mathematisation and the solution of many of its fundamental problems. This development cannot possibly be recounted here except for the following aspects, which are of central importance to automated deduction: In 1928 Hilbert posed his famous "Entscheidungsproblem" (which often, even in the English-language literature, is still called by its German name). In modern terminology: is there an algorithm which, when given a formula in first-order logic, determines whether it is valid? Turing and Church independently solved this problem negatively in 1936. Without going into the details of the Church–Turing thesis here, the ramification of these results for automated deduction is that a full automation of validity-checking is impossible. Leibniz' idea, when we consider first-order logic to be the universal language he envisaged, is hence not realisable in a fully automated way. However, on the positive side, Gödel's 1929 completeness theorem entails that the set of valid first-order formulas is semi-decidable, i.e. there is an algorithm which takes a first-order formula ϕ as input.
If ϕ is valid, the algorithm will eventually terminate with the information that ϕ is valid. If, on the other hand, ϕ is not valid, the algorithm may either terminate with that information or not terminate at all. The situation for validity in first-order logic is much better suited for automatisation than that of truth in arithmetic. As Gödel's first incompleteness theorem (1931) shows, truth in arithmetic is not even semi-decidable (in fact: truth in arithmetic is much more complicated than semi-decidable problems). Therefore validity in first-order logic forms a good basis for automated deduction: while the formalism is very expressive, it is still semi-decidable. And indeed, most of this course will be devoted to algorithms that prove the validity of first-order formulas.

While such mathematical results on decidability rightly form cornerstones of computational logic, one should also not over-emphasise their relevance for practical applications. After all, the guarantee that a computer program will terminate eventually is of little practical use if the computation time is beyond what a user is willing or able to wait for. So, for practical applications, we are interested not only in the existence of a semi-decision procedure but also in that procedure being efficient. Seminal work in that respect was done by Robinson in 1965 with his invention of the resolution principle. Until then, most provers generated ground instances of quantified formulas and then applied propositional reasoning steps to these ground instances. Robinson's resolution principle was the first to combine instantiation and propositional reasoning in a single inference rule using unification: the first-order resolution rule. Since then a vast amount of techniques has been developed on that basis. This course will be primarily about these techniques.
Today automated deduction has a wealth of applications throughout computer science in fields such as hardware and software verification, artificial intelligence, logic programming, deductive information systems, formal mathematics, ...

Chapter 2
Resolution in propositional logic

2.1 Reminders on propositional logic

This course supposes familiarity with basic notions in propositional logic. This section only serves to remind the reader of these notions and to fix notation. A thorough introduction to propositional logic, which is well suited as a basis for this course, can be found in Dirk van Dalen: Logic and Structure, 4th edition, Springer, sections 1.1–1.3.

In propositional logic, formulas are built up inductively from a countably infinite set of atoms p1, p2, p3, ..., the logical connectives ∧, ∨, ¬, →, and the logical constants ⊥, ⊤. Often we will also use letters like p, q, r, ... for atoms. We will also occasionally take some liberty as to whether ∧ and ∨ are considered as binary or as n-ary connectives. We can (and often will) think of formulas as trees; for example, the formula ¬(p ∨ q) is written as the tree:

    ¬
    |
    ∨
   / \
  p   q

Then words such as "above", "below", "immediately above", "immediately below" become meaningful on formulas. The size of a formula ϕ is defined by induction on the structure of ϕ as: |p| = |⊥| = |⊤| = 1, |ψ ∘ χ| = |ψ| + |χ| + 1 for ∘ ∈ {∨, ∧, →}, and |¬ψ| = |ψ| + 1.

An interpretation is a mapping I : {p1, p2, ...} → {0, 1}, where 1 represents "true" and 0 represents "false". The interpretation of a formula is defined by fixing I(⊥) = 0, I(⊤) = 1 and then proceeding by induction on the structure of the formula via the following truth tables:

  p  q | p ∧ q  p ∨ q  p → q        p | ¬p
  0  0 |   0      0      1          0 |  1
  0  1 |   0      1      1          1 |  0
  1  0 |   0      1      0
  1  1 |   1      1      1

A formula ϕ is called satisfiable if there is an I s.t. I(ϕ) = 1. It is called valid (or tautological) if I(ϕ) = 1 for all I.
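As a concrete illustration (not part of the original notes), the interpretation of formulas can be sketched in a few lines of Python. The tuple-based formula representation used here is an assumption of this sketch:

```python
from itertools import product

# Formulas as nested tuples: ("atom", name), ("not", f), ("and", f, g),
# ("or", f, g), ("imp", f, g). An interpretation maps atom names to 0/1.

def evaluate(phi, interp):
    """Compute I(phi) by induction on the structure of phi."""
    tag = phi[0]
    if tag == "atom":
        return interp[phi[1]]
    if tag == "not":
        return 1 - evaluate(phi[1], interp)
    a, b = evaluate(phi[1], interp), evaluate(phi[2], interp)
    if tag == "and":
        return a & b
    if tag == "or":
        return a | b
    if tag == "imp":
        return (1 - a) | b
    raise ValueError(f"unknown connective {tag}")

def atoms(phi):
    """Collect the atom names occurring in phi."""
    if phi[0] == "atom":
        return {phi[1]}
    return set().union(*map(atoms, phi[1:]))

def satisfiable(phi):
    """Truth-table check: exponential in the number of atoms."""
    names = sorted(atoms(phi))
    return any(evaluate(phi, dict(zip(names, vals)))
               for vals in product((0, 1), repeat=len(names)))

def valid(phi):
    """phi is valid iff its negation is unsatisfiable."""
    return not satisfiable(("not", phi))
```

For example, `valid(("or", p, ("not", p)))` holds for `p = ("atom", "p")`, while `("and", p, ("not", p))` is unsatisfiable, matching the definitions above.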
Two formulas ϕ, ψ are called logically equivalent, written as ϕ ⇔ ψ (note the difference between the connective ↔ and the relation ⇔), if I(ϕ) = I(ψ) for all I. If a formula ϕ only contains the atoms p1, ..., pn, then I1(ϕ) = I2(ϕ) for all I1, I2 which agree on p1, ..., pn. Therefore the validity as well as the satisfiability of a formula can be decided in exponential time by using a truth table.

2.2 Normal forms

Normal forms of logical formulas play a very important role in automated deduction. This is due to the following reason: the central computational problem of automated deduction is that the search space (of an algorithm that searches for a proof) is very large. Avoiding different syntactic representations of one and the same semantic meaning helps to keep down the size of the search space. We will see this principle at work when we consider redundancy-elimination techniques like subsumption later.

Definition 2.1. A formula is said to be in negation normal form (NNF) if it does not contain → and ¬ only appears immediately above atoms.

Proposition 2.1. For every formula ϕ in {p1, ..., pn} there is a formula ϕN in {p1, ..., pn} which is in NNF, is logically equivalent to ϕ, and satisfies |ϕN| = O(|ϕ|).

Proof. The following formula rewriting rules preserve logical equivalence:

  (I)   ψ → χ     ↦  ¬ψ ∨ χ
  (DN)  ¬¬ψ       ↦  ψ
  (M1)  ¬(ψ ∧ χ)  ↦  ¬ψ ∨ ¬χ
  (M2)  ¬(ψ ∨ χ)  ↦  ¬ψ ∧ ¬χ

Logical equivalence is a congruence relation, hence application of the above rules anywhere in a formula transforms it into a logically equivalent formula. Let ψ be a formula. Write n→(ψ) for the number of implications in ψ and n∘(ψ) for the number of pairs (c, d) where c is a negation in ψ, d is a binary connective in ψ and c is above d. Note that each of the above rewriting rules decreases the lexicographic order on (n→(·), n∘(·), |·|). Therefore, every rewriting sequence eventually terminates with a normal form. A normal form of these rewriting rules is in NNF.
Moreover, note that none of these rules changes the number of binary connectives. The size |ϕN| of a formula in NNF is at most its number of binary connectives plus 2 times its number of atoms. Hence |ϕN| = O(|ϕ|) for any normal form ϕN obtained from a formula ϕ.

Definition 2.2. Atoms and negated atoms are called literals. A formula is in conjunctive normal form (CNF) if it is a conjunction of disjunctions of literals. CNFs are often conveniently notated as

  ⋀_{i=1}^{n} ⋁_{j=1}^{k_i} L_{i,j}

where the L_{i,j} are literals.

Proposition 2.2. For every formula ϕ in {p1, ..., pn} there is a formula ϕC in {p1, ..., pn} which is in CNF and is logically equivalent to ϕ.

Proof. By Proposition 2.1 the formula ϕ has an NNF ϕN. Let ϕC be a formula obtained from ϕN by exhaustive application of the following formula rewriting rules:

  (D1)  ψ ∨ (χ1 ∧ χ2)  ↦  (ψ ∨ χ1) ∧ (ψ ∨ χ2)
  (D2)  (χ1 ∧ χ2) ∨ ψ  ↦  (χ1 ∨ ψ) ∧ (χ2 ∨ ψ)

We show that every rewriting sequence of (D1)- and (D2)-steps eventually terminates. To that aim, proceed by induction on the size of ϕ. These rules terminate on atoms since they are not applicable to atoms. For the induction step, the statement follows trivially if neither (D1) nor (D2) is applicable to the root of ϕ. If (D1) is applicable at the root of ϕ, then ϕ is of the form ψ ∨ (χ1 ∧ χ2). In that case, by induction hypothesis every sequence of reductions in ψ, χ1, χ2 as well as in ψ ∨ χ1 and ψ ∨ χ2 terminates, hence a reduction sequence of ϕ also terminates. Finally, ϕC contains negation only immediately above atoms and does not contain a conjunction below a disjunction, hence it is in CNF.

Example 2.1. Consider the formula ϕn = (p1 ∧ q1) ∨ (p2 ∧ q2) ∨ ⋯ ∨ (pn ∧ qn). The above transformation into CNF will yield the formula

  ϕnC = ⋀_{(v1,...,vn) ∈ {p1,q1} × ⋯ × {pn,qn}} ⋁_{i=1}^{n} vi

which is of size exponential in that of ϕn.

Both of the above transformations, to NNF and to CNF respectively, are simple to define and quite elegant theoretically.
But while the above transformation to NNF is harmless complexity-wise, the above transformation to CNF is not. We will see later that there is also a linear transformation to CNF which introduces additional atoms and preserves only satisfiability, not logical equivalence. In practice, mostly such CNF-transformations are used (since preserving satisfiability is enough).

2.3 Resolution

The resolution calculus works on formulas in conjunctive normal form. Formulas in CNF will be notated by clause sets, defined below. Resolution is a refutational calculus, i.e. we start from a given clause set and try to show that it is unsatisfiable by showing that it implies a contradiction. This can be used for proving a formula ϕ valid by proving ¬ϕ unsatisfiable and observing that ϕ is valid iff ¬ϕ is unsatisfiable.

Definition 2.3. A clause is a finite set of literals. A clause set is a set of clauses, i.e. a set of sets of literals.

The semantic meaning of a clause is that of a disjunction, i.e. the clause C = {L1, ..., Lk} is interpreted as the disjunction L1 ∨ ⋯ ∨ Lk, and consequently we define I(C) = I(L1 ∨ ⋯ ∨ Lk), i.e. I(C) = 1 iff there is an Li ∈ C s.t. I(Li) = 1. The semantic meaning of a clause set is that of a conjunction of its clauses. While clauses will always be finite sets of literals, clause sets will sometimes be infinite. Consequently the interpretation of a clause set S cannot be defined via a (finite) formula but is instead defined directly as I(S) = 1 iff I(C) = 1 for all C ∈ S. Note that the interpretation of the empty clause is 0. We will never consider the empty clause set. These definitions allow us to speak about satisfiability, validity, logical equivalence, etc. of clause sets just as for formulas. Note that in a clause, multiple occurrences of literals are identified, the order of literals does not matter and the parentheses around disjunctions do not matter. The same is true for clauses in a clause set.
Therefore, there is no 1-1 relation between clause sets and formulas in CNF, but all formulas in CNF which correspond to a given clause set are logically equivalent, since both conjunction and disjunction are idempotent, commutative and associative.

Example 2.2. The clause set corresponding to ϕ = ((p ∨ q) ∨ q) ∧ ¬p is S = {{p, q}; {¬p}}. As in the above example, we usually write a semicolon instead of a comma for separating the clauses of a clause set.

Definition 2.4. Let C and D be clauses s.t. p ∈ C and ¬p ∈ D. Then the clause res_p(C, D) := (C ∖ {p}) ∪ (D ∖ {¬p}) is called the p-resolvent of C and D.

Definition 2.5. Let S be a clause set. A list C1, ..., Cn of clauses is called a resolution deduction from S if for all i ∈ {1, ..., n}:

  (I) Ci ∈ S, or
  (R) there are j, k < i and an atom p s.t. Ci = res_p(Cj, Ck).

A resolution deduction C1, ..., Cn from S is called a resolution refutation of S if Cn = ∅.

Example 2.3. Let S = {{p1}; {¬p1, p2}; {¬p1, ¬p2, p3}; {¬p3}}. The following list of clauses is a resolution refutation of S:

  C1 = {¬p1, p2}         (I)
  C2 = {¬p1, ¬p2, p3}    (I)
  C3 = {¬p1, p3}         (R(C1, C2))
  C4 = {p1}              (I)
  C5 = {p3}              (R(C4, C3))
  C6 = {¬p3}             (I)
  C7 = ∅                 (R(C5, C6))

Sometimes we will also write resolution deductions and refutations in tree form:

  {¬p1, p2}   {¬p1, ¬p2, p3}
        \       /
       {¬p1, p3}    {p1}
             \       /
              {p3}      {¬p3}
                  \      /
                     ∅

Theorem 2.1 (Soundness). If S has a resolution refutation, then S is unsatisfiable.

Proof. We will show the following, slightly more general, statement: if C1, ..., Cn is a resolution deduction from S and I an interpretation with I(S) = 1, then I(Cn) = 1. We proceed by induction on the deduction, making a case distinction on the inference rule used for deriving Cn. If Cn ∈ S the statement follows trivially. If Cn = res_p(Ci, Cj) for some i, j < n, then by induction hypothesis we have I(Ci) = I(Cj) = 1. If I(p) = 1 and Cj = {¬p, L1, ..., Lk}, then I({L1, ..., Lk}) = 1 and hence I(Cn) = 1.
If I(p) = 0, then I(Cn) = 1 follows symmetrically from I(Ci) = 1.

In order to prove the completeness of the resolution calculus we will use semantic trees.

Definition 2.6. Let (pi)_{i≥1} be a sequence of atoms. The semantic tree of (pi)_{i≥1} is the complete binary tree in which every vertex at depth i − 1 has one outgoing edge labelled pi and one labelled ¬pi, so that every branch is infinite. Every vertex v of this tree induces a partial interpretation Iv of {p1, p2, ...} where Iv(pi) = 1 if pi occurs on the path from v to the root and Iv(pi) = 0 if ¬pi occurs on this path. Let S be a clause set in the atoms {pi | i ≥ 1}. Then the semantic tree of S, written as T(S), is defined from the above tree by closing a branch after finitely many steps at a vertex v iff there is a C ∈ S s.t. Iv(C) = 0. A vertex closed by a clause C is marked with × and labelled with C.

Note that we do not require the clause set S to be finite for the definition of T(S). Also note that the requirement of closing at v iff there is a C ∈ S s.t. Iv(C) = 0 entails that branches are closed as early as possible.

Example 2.4. Let S = {{p1}; {¬p1, p2}; {¬p1, ¬p2, p3}; {¬p3}} be the clause set of Example 2.3. In the semantic tree T(S), the branch taking ¬p1 is immediately closed by {p1}; below p1, the branch taking ¬p2 is closed by {¬p1, p2}; below p1 and p2, the branch taking ¬p3 is closed by {¬p1, ¬p2, p3} and the branch taking p3 is closed by {¬p3}.

Fact 2.1. ∅ ∈ S iff T(S) consists of a single node.

Fact 2.2. S is satisfiable iff T(S) has an infinite branch.

Proof. Let I be an interpretation with I(S) = 1; then I induces an infinite branch, and vice versa: an infinite branch is never closed, hence the interpretation it induces satisfies all C ∈ S and hence S itself.

Fact 2.3. S is unsatisfiable iff T(S) is finite.

Proof. By König's Lemma a finitely branching tree is infinite iff it has an infinite branch, hence the claim directly follows from Fact 2.2.

Fact 2.4. If S ⊆ S′ then T(S) ⊇ T(S′) (where ⊇ is to be understood as applied to the set of vertices), because every vertex which can be closed in T(S) can also be closed in T(S′).
We are now ready to prove the completeness of propositional resolution.

Theorem 2.2 (Completeness). Let S be a clause set. If S is unsatisfiable, then S has a resolution refutation.

Proof. Let S be unsatisfiable; then by Fact 2.3 the tree T(S) has only finitely many vertices. We proceed by induction on |T(S)|, the number of vertices in T(S). If |T(S)| = 1 then by Fact 2.1 we have ∅ ∈ S, and the list consisting only of ∅ is already a resolution refutation of S. If |T(S)| > 1 then T(S) contains a vertex v whose two children, reached by the edges p and ¬p, are both closed, by clauses C1 and C2 respectively. Such a configuration exists because T(S) is finite: suppose each vertex had a child which is not closed; then every vertex would be the start of an infinite branch, which is a contradiction to T(S) being finite. Now let C = res_p(C1, C2) and S′ = S ∪ {C}. Then T(S′) ⊆ T(S) already due to Fact 2.4. We claim that we even have T(S′) ⊊ T(S). To show this, let C1 = C1′ ⊎ {p}, C2 = C2′ ⊎ {¬p}, and consider the path π from v to the root. Since C1 closes a successor of v, π must contain duals of all literals in C1′ and, analogously, duals of all literals in C2′. Hence π contains duals of all literals in C, thus v can be closed by C, and we have |T(S′)| < |T(S)|. By induction hypothesis there is a resolution refutation of S′ which, w.l.o.g., is of the form C, D1, ..., Dn with C ≠ Di for all i ∈ {1, ..., n}. Hence C1, C2, C, D1, ..., Dn is a resolution refutation of S.

Note that this proof is constructive in the sense that it computes a resolution refutation from a finite semantic tree.

Definition 2.7. Let S be a clause set. The smallest superset of S which is closed under resolution is called the closure of S and is denoted by Ŝ.

Corollary 2.1. A clause set S is unsatisfiable iff ∅ ∈ Ŝ.

Proof. By soundness and completeness, S is unsatisfiable iff S has a resolution refutation, which in turn is equivalent to ∅ ∈ Ŝ.
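To make Corollary 2.1 concrete, here is a small Python sketch (an illustration assumed for these notes, not part of them) that computes the closure Ŝ of a finite clause set by saturation and reports unsatisfiability iff the empty clause is derived. Clauses are frozensets of literals; a literal is encoded as a pair (atom name, polarity):

```python
def resolvent(c, d, p):
    """res_p(C, D) = (C \\ {p}) ∪ (D \\ {¬p}), defined when p ∈ C and ¬p ∈ D."""
    pos, neg = (p, True), (p, False)
    if pos in c and neg in d:
        return (c - {pos}) | (d - {neg})
    return None

def closure(clauses):
    """Smallest superset of `clauses` closed under resolution (Definition 2.7).

    Terminates because only finitely many clauses exist over finitely many atoms.
    """
    closed = set(clauses)
    while True:
        new = set()
        for c in closed:
            for d in closed:
                for (atom, positive) in c:
                    if positive:
                        r = resolvent(c, d, atom)
                        if r is not None and r not in closed:
                            new.add(r)
        if not new:
            return closed
        closed |= new

def unsatisfiable(clauses):
    """Corollary 2.1: S is unsatisfiable iff the empty clause is in its closure."""
    return frozenset() in closure(clauses)
```

On the clause set of Example 2.3, `unsatisfiable` returns True, mirroring the refutation given there. This naive saturation is of course only a reference implementation; the redundancy-elimination techniques of Chapter 6 are what make saturation practical.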
This corollary suggests the following algorithm for deciding the satisfiability of a propositional formula ϕ: first compute a clause set S which is satisfiability-equivalent to ϕ, then successively compute Ŝ. If the empty clause is found in Ŝ, ϕ is unsatisfiable. If the computation of Ŝ finishes without finding the empty clause, ϕ is satisfiable. Note that S is finite and hence Ŝ is finite, so the computation of Ŝ terminates. This algorithm already has an asymptotic complexity which is better than computing a truth table: while the truth table computation is exponential even in the best case, this is not so for the above algorithm. It allows us to exploit the existence of short resolution refutations for determining the unsatisfiability of a formula faster than a truth table would allow.

2.4 The Tseitin transformation

In this section we will see how the exponential blowup of the distributivity-based computation of a CNF can be avoided. The crucial idea is to introduce new propositional atoms which serve as abbreviations of complex formulas.

Definition 2.8. Let ϕ be a formula; the set of subformulas of ϕ is defined inductively as:

  subf(p) = {p}, and analogously for ⊤ and ⊥
  subf(¬ψ) = {¬ψ} ∪ subf(ψ)
  subf(ψ1 ∘ ψ2) = {ψ1 ∘ ψ2} ∪ subf(ψ1) ∪ subf(ψ2) for ∘ ∈ {∧, ∨, →}

In what follows we will write CNF(ϕ) for a conjunctive normal form of a formula ϕ obtained by the distributivity-based transformation of Proposition 2.2. For writing formulas which express logical equivalences we introduce the abbreviation ϕ ↔ ψ for (ϕ → ψ) ∧ (ψ → ϕ).

Definition 2.9. Let ϕ be a formula in {p1, ..., pn}. For every subformula ψ of ϕ define the following formula D(ψ) in the language {p1, ..., pn} ∪ {q_ψ | ψ ∈ subf(ϕ)}:

  D(pi) = CNF(q_{pi} ↔ pi)
  D(¬ψ0) = CNF(q_{¬ψ0} ↔ ¬q_{ψ0})
  D(ψ1 ∘ ψ2) = CNF(q_{ψ1∘ψ2} ↔ q_{ψ1} ∘ q_{ψ2}) for ∘ ∈ {∧, ∨, →}

Furthermore, define T(ϕ) = q_ϕ ∧ ⋀_{ψ ∈ subf(ϕ)} D(ψ).

This transformation to conjunctive normal form is known as the Tseitin transformation, named after G.
Tseitin; hence also the notation T(ϕ).

Example 2.5. Let us compute T(p1 ∧ ¬p2):

  D(p1) = CNF(q_{p1} ↔ p1) = CNF((q_{p1} → p1) ∧ (p1 → q_{p1})) = (¬q_{p1} ∨ p1) ∧ (¬p1 ∨ q_{p1})

and analogously D(p2) = (¬q_{p2} ∨ p2) ∧ (¬p2 ∨ q_{p2}). Furthermore

  D(¬p2) = CNF(q_{¬p2} ↔ ¬q_{p2}) = (¬q_{¬p2} ∨ ¬q_{p2}) ∧ (q_{p2} ∨ q_{¬p2})

  D(p1 ∧ ¬p2) = CNF(q_{p1∧¬p2} ↔ q_{p1} ∧ q_{¬p2})
              = (¬q_{p1∧¬p2} ∨ q_{p1}) ∧ (¬q_{p1∧¬p2} ∨ q_{¬p2}) ∧ (¬q_{p1} ∨ ¬q_{¬p2} ∨ q_{p1∧¬p2})

and finally T(p1 ∧ ¬p2) = q_{p1∧¬p2} ∧ D(p1 ∧ ¬p2) ∧ D(¬p2) ∧ D(p2) ∧ D(p1).

Proposition 2.3. Let ϕ be a propositional formula; then T(ϕ) is in CNF, satisfiability-equivalent to ϕ, and satisfies |T(ϕ)| = O(|ϕ|).

Notation: in the proof below we will also use ∧, ... for the Boolean functions and not just for the connectives.

Proof. Since T(ϕ) is a conjunction of CNFs, it is clearly in CNF. For satisfiability-equivalence, let I be an interpretation of the atoms {p1, ..., pn} of ϕ with I(ϕ) = 1. Define the interpretation I* of {p1, ..., pn} ∪ {q_ψ | ψ ∈ subf(ϕ)} by I*(pi) = I(pi) and I*(q_ψ) = I(ψ). Then clearly I*(q_ϕ) = I(ϕ) = 1 and it remains to show that I*(D(ψ)) = 1 for all subformulas ψ of ϕ. We do this by a case distinction on the top connective of ψ: if ψ = ψ1 ∧ ψ2, then I*(q_{ψ1∧ψ2}) = I(ψ1 ∧ ψ2) = I(ψ1) ∧ I(ψ2) = I*(q_{ψ1}) ∧ I*(q_{ψ2}) = I*(q_{ψ1} ∧ q_{ψ2}), and therefore I*(q_{ψ1∧ψ2} ↔ q_{ψ1} ∧ q_{ψ2}) = 1; but as CNF preserves logical equivalence we also have I*(CNF(q_{ψ1∧ψ2} ↔ q_{ψ1} ∧ q_{ψ2})) = 1. The other connectives are analogous. For the other direction of satisfiability-equivalence, let I* be an interpretation of {p1, ..., pn} ∪ {q_ψ | ψ ∈ subf(ϕ)} s.t. I*(T(ϕ)) = 1. Define I = I*↾{p1,...,pn}. Since I*(T(ϕ)) = 1 we have I*(D(ψ)) = 1 for all subformulas ψ of ϕ. We show that I*(q_ψ) = I(ψ) for all subformulas ψ of ϕ by induction on ψ: for an atom pi we have I*(q_{pi}) = I(pi) as I*(D(pi)) = 1.
For a conjunction we have I*(q_{ψ1∧ψ2}) = I*(q_{ψ1} ∧ q_{ψ2}) since I*(D(ψ1 ∧ ψ2)) = 1, and furthermore I*(q_{ψ1} ∧ q_{ψ2}) = I*(q_{ψ1}) ∧ I*(q_{ψ2}) = I(ψ1) ∧ I(ψ2) = I(ψ1 ∧ ψ2) by the induction hypothesis. The other connectives are analogous. Since I*(T(ϕ)) = 1 we have I*(q_ϕ) = 1 and hence I(ϕ) = 1. For the complexity result, note that |T(ϕ)| ≤ c + d·|subf(ϕ)| ≤ c + d·|ϕ| for some constants c, d ∈ ℕ.

Remark 2.1. Note that the mere existence of a small satisfiability-equivalent formula is trivial: if ϕ is satisfiable, then ⊤ is satisfiability-equivalent to ϕ; if it is not, then ⊥ is satisfiability-equivalent to ϕ. The point of the above result is that ϕ ↦ T(ϕ) is much easier to compute than the mapping which sends ϕ to ⊤ if ϕ is satisfiable and to ⊥ if ϕ is unsatisfiable.

Chapter 3
SAT- and SMT-solving

3.1 DPLL

The basis for state-of-the-art SAT-solvers is the DPLL algorithm, which we are going to present based on a transition system in this chapter. This procedure is named after M. Davis, H. Putnam, G. Logemann, and D. Loveland. The idea of the DPLL algorithm is to traverse the set of all possible interpretations of a clause set S to find out whether S is satisfiable or not. In principle, this works like a depth-first search in the semantic tree of S (see Section 2.3). In practice it makes a large difference how this set is traversed, and we will see a number of improvements over this naive traversal which are crucial for efficiency.

In the following, S will denote a clause set, C a clause and L a literal. It will often be convenient to write a clause as a disjunction, so, e.g., C ∨ L is an abbreviation for the clause C ∪ {L} where L ∉ C. For a literal L, we write L̄ for its dual literal, i.e., if L = p, then L̄ = ¬p and if L = ¬p, then L̄ = p. A transition system is a directed graph which is presented by defining a set of states as the vertices of the graph and by defining the edges by transition rules.

Definition 3.1. An annotated literal is either a literal L or a literal marked as decide literal, written as L^d.
A state of DPLL is either "FAIL" or a pair I | S where I = L1, ..., Ln is a list of annotated literals and S is a clause set. I will be considered a (partial) interpretation by setting I(p) = 1 if p ∈ I and I(p) = 0 if ¬p ∈ I. If I and J are disjoint partial interpretations, then we will write (I, J)(p) for the value of p under their union.

Definition 3.2. A state is called final if it is either FAIL or of the form I | S where I contains all atoms of S and I(S) = 1.

In the search through the tree of all interpretations, decide literals represent choices on which decisions have been taken and hence serve as backtracking points. Non-decide literals represent decisions which are enforced by other decisions taken previously. Therefore no backtracking is needed for them (avoiding backtracking to non-decide literals is already a first improvement over the naive depth-first search).

Definition 3.3. The transitions of DPLL are:

  UnitPropagate:  I | S ⟶ I, L | S   if L is undefined in I and there is C ∨ L ∈ S s.t. I(C) = 0

  Decide:  I | S ⟶ I, L^d | S   if L is undefined in I and L or L̄ occurs in S

  Backtrack:  I, L^d, J | S ⟶ I, L̄ | S   if there is C ∈ S s.t. (I, L^d, J)(C) = 0 and J does not contain decide literals

  Fail:  I | S ⟶ FAIL   if there is C ∈ S s.t. I(C) = 0 and I contains no decide literals

Note that in the above transition rules of DPLL the right side does not change. We include it in the above definition nevertheless because we will later consider extensions of this transition system where the right side does change. The standard strategy for applying these rules is as follows:

Algorithm 1: The DPLL algorithm
Input: a non-empty clause set S
Output: a final DPLL-state

  I ← ∅
  UnitPropagate*
  while I | S not final do
    if ∃C ∈ S s.t. I(C) = 0 then
      ( Backtrack; UnitPropagate* ) or Fail
    else
      Decide; UnitPropagate*
    end if
  end while

Example 3.1. Let S = {{¬p1, p2}; {¬p2, ¬p3, p4}; {¬p2, ¬p4, ¬p5}; {¬p3, p5}}.
A run of the DPLL algorithm is the following:

  ∅ | S ⟶D  p1^d | S
        ⟶UP p1^d, p2 | S
        ⟶D  p1^d, p2, p3^d | S
        ⟶UP p1^d, p2, p3^d, p4 | S
        ⟶UP p1^d, p2, p3^d, p4, p5 | S
        ⟶BT p1^d, p2, ¬p3 | S
        ⟶D  p1^d, p2, ¬p3, p4^d | S
        ⟶UP p1^d, p2, ¬p3, p4^d, ¬p5 | S

Now we are in a final state because the interpretation defined by I = p1, p2, ¬p3, p4, ¬p5 satisfies all clauses of S.

Theorem 3.1 (Termination). For every clause set S there is a final state F s.t. ∅ | S ⟶* F.

Proof Sketch. The standard strategy eventually reaches a final state. Note that there is a small caveat here: the "standard strategy" as defined above is not deterministic, since it specifies neither which literals to decide nor how to decide them. However, for the termination result this is irrelevant, since any sequence of choices in the Decide rule will eventually reach a final state.

Theorem 3.2 (Correctness). If ∅ | S ⟶* F where F is final, then:

1. If F is FAIL, then S is unsatisfiable.
2. If F = I | S, then I(S) = 1.

Additional DPLL rules which are useful in practice (when properly applied) are Learn, Forget, Restart and Backjump:

  Learn:  I | S ⟶ I | S, C   if each atom of C occurs in S and S ⊨ C

  Forget:  I | S, C ⟶ I | S   if S ⊨ C

  Restart:  I | S ⟶ ∅ | S

Backjump is a more general form of backtracking which we will study more closely in the exercises. A number of features not treated in detail here are important for obtaining efficient implementations in practice: the Decide rule does not specify on which literal to decide; heuristics for literal selection play a crucial role in practice. The Backjump rule must be applied using cleverly constructed backjump clauses. There are efficient techniques for doing this. Often it pays off to add the backjump clauses to the clause set (using the Learn rule); this is also called CDCL (conflict-driven clause learning). While the learned clauses can help to restructure the search space in a favourable way, the downside is that the size of the clause set grows.
Therefore, typically such learned clauses are again forgotten if their activity level falls below a certain threshold. The Restart rule is helpful for dropping a search which has run astray (i.e. into regions of the search space where no interpretation is found) and restarting from an empty interpretation while keeping the learned clauses.

Current state-of-the-art SAT-solvers are capable of solving CNFs consisting of millions of atoms and clauses (if the structure of these CNFs is sufficiently simple). This has led to the common practice of using SAT-solvers for solving other NP-complete problems via their reduction to SAT (which is not without irony, since such reductions were originally conceived to argue that a problem at hand cannot be solved feasibly). On the other hand, the worst-case complexity of DPLL is still exponential; current SAT-solvers fail to solve quite small clause sets if their structure is sufficiently complicated.

A popular SAT-solver is minisat (http://minisat.se/). SAT-solvers typically use the DIMACS input format, which is illustrated in the following example:

Example 3.2. The clause set {{¬p2, ¬p4, p1}; {¬p2, p4}; {¬p1, ¬p3}; {p2}; {¬p4, p3}} is formulated in a language of 4 atoms and consists of 5 clauses. It is written in DIMACS input format as:

  p cnf 4 5
  -2 -4 1 0
  -2 4 0
  -1 -3 0
  2 0
  -4 3 0

You can find more information on this format on the web.

3.2 Reminders on first-order logic

Before we move on to SMT-solving and the DPLL(T) algorithm commonly employed for it, some reminders on first-order logic: A first-order language contains constant symbols, function symbols and predicate symbols. Each symbol has an arity, the number of arguments it takes (written f/n for the symbol f with arity n ∈ ℕ). In addition, we assume a countably infinite supply of variable names at our disposal. The terms over a language L (the L-terms) are defined inductively from constant symbols, variables and function symbols.
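As an aside, such inductively defined terms map directly onto a recursive datatype. The following Python sketch (an assumption of this presentation, not part of the notes) represents terms and evaluates them in a structure by structural recursion, treating constant symbols as 0-ary function symbols:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    """A variable occurring in a term."""
    name: str

@dataclass(frozen=True)
class Fun:
    """An application f(t1, ..., tn); constants are 0-ary function symbols."""
    symbol: str
    args: tuple = ()

def interpret(term, I):
    """Evaluate a term: I(f(t1, ..., tn)) = I(f)(I(t1), ..., I(tn)).

    `I` maps function symbols to Python functions and variable names to
    domain elements (a hypothetical encoding of a structure chosen here).
    """
    if isinstance(term, Var):
        return I[term.name]
    return I[term.symbol](*(interpret(t, I) for t in term.args))
```

For instance, over the language {c/0, f/1} interpreted in ℕ with c as 0 and f as the successor function, the term f(f(c)) evaluates to 2.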
L-formulas are defined inductively from atoms, the propositional connectives ^, _, , Ñ and the quantifiers @x, Dx. An L-structure is a pair M “ pD, Iq where D is a set, the domain of M and I maps all constant symbols, function symbols and predicate symbols of L to elements, functions and relations respectively of D and some variables to elements of D. The interpretation I is extended to cover all terms by defining Ipf pt1 , . . . , tn qq “ Ipf qpIpt1 q, . . . , Iptn qq. A formula may have free and bound variables, a formula without free variables is called sentence. A formula without any variables (and hence without quantifiers) is called ground formula. The truth of a formula F in a structure M “ pD, Iq is written as M ( F , pronounced as “F is true in M” or “M satisfies F ” or “M is a model of F ”, and defined, as usual, inductively on the structure of F under the assumption that all free variables of F are interpreted by I. This definition is extended to cover M ( F where F contains free variables which are not interpreted in M by considering these free variables as universally quantified. A sentence which is true in all structures is called valid. A sentence is called satisfiable if there is a structure in which it is true. A set of sentences Γ is called satisfiable if there is a structure in which all F P Γ are true. There is a number of different proof calculi for first-order logic, the notation $ ϕ means that the formula ϕ is provable. For a set of sentences Γ and a formula ϕ, the notation Γ $ ϕ means that ϕ can be proved using assumptions from Γ. The notation Γ ( ϕ means that every model which satisfies Γ also satisfies ϕ. By soundness and completeness of these calculi, all of them prove the same formulas: the valid formulas, i.e., we have Γ $ ϕ iff Γ ( ϕ. By the compactness theorem we have Γ ( ϕ iff there is a finite Γ0 Ď Γ s.t. Γ0 ( ϕ. Definition 3.4. A set of sentences Γ is called deductively closed if Γ $ ϕ implies ϕ P Γ. 
A theory is a deductively closed set of sentences. Definition 3.5. Let T be a theory. A formula ϕ is called T -satisfiable if there is a structure M s.t. M ( T and M ( ϕ. It is called T -unsatisfiable if there is no such structure. It is called T -valid if T ( ϕ. For a theory T , a set of sentences Γ and a formula ϕ, we also write (T ϕ for T ( ϕ and Γ (T ϕ for Γ, T ( ϕ. Example 3.3. Let L be a first-order language. Define the theory of equality of L, written EQL , by the following axioms: @x x “ x @x@y px “ y Ñ y “ xq @x@y@z ppx “ y ^ y “ zq Ñ x “ zq @x1 ¨ ¨ ¨ @xn @y1 ¨ ¨ ¨ @yn ppx1 “ y1 ^ . . . ^ xn “ yn q Ñ f px1 , . . . , xn q “ f py1 , . . . , yn qq for every n-ary function symbol f @x1 ¨ ¨ ¨ @xn @y1 ¨ ¨ ¨ @yn ppx1 “ y1 ^ . . . ^ xn “ yn q Ñ pP px1 , . . . , xn q Ñ P py1 , . . . , yn qqq for every n-ary predicate symbol P Example 3.4. Let L “ tc{0, d{0, f {1, g{2u, then pc “ f pdq ^ d “ f pcqq Ñ f pf pcqq “ c is EQL -valid (which can be shown by a simple derivation in your favourite proof calculus). On the other hand, the formula gpc, dq “ gpd, cq is not EQL -valid. Let ptc, du˚ , ¨, εq be the free monoid generated by tc, du, then M “ ptc, du˚ , Iq with Ipgq “ ¨, Ipcq “ c, Ipdq “ d, and Ip“q being equality is a model of EQL but not of gpc, dq “ gpd, cq since cd ‰ dc in tc, du˚ . The formula gpc, dq “ gpd, cq is EQL -satisfiable. To see that, let M “ pN, Iq with Ipgq “ `, Ipcq “ 1, Ipdq “ 2, and Ip“q being equality and observe that 1 ` 2 “ 2 ` 1. The theory EQL will serve as an illustrative example for DPLL(T). There are also more expressive theories, e.g., Presburger arithmetic or the theory of arrays, which are routinely treated in SMT-solving. 3.3 DPLL(T) In applications it is often convenient to have stronger expressivity than that of propositional logic. On the other hand, problems in a more expressive formalism are more difficult to solve.
A good compromise in the sense that much more expressivity can be obtained with only comparatively little more difficulty can be found in the area of “satisfiability modulo theories (SMT)”. An SMT-solver considers a quantifier-free first-order formula in a certain background theory T and determines its T -satisfiability using an extension of the DPLL procedure, called DPLL(T). For this algorithm to work, it is crucial to restrict the theories T we consider to only such theories which satisfy the following decidability condition: we require that T -satisfiability of conjunctions of ground literals is decidable. We will then call a decision procedure for conjunctions of ground literals a T -solver. The key to the DPLL(T) procedure is the subtle interplay between propositional interpretations and first-order models of ground formulas in T . Consider for example the following ground formula in EQL : a “ b ^ f paq “ f pbq. When this formula is considered a propositional formula, e.g., by a SAT-solver, it has the shape p1 ^ p2 . Now, p1 ^ p2 is a satisfiable propositional formula. An interpretation that makes it true is I with Ipp1 q “ 1 and Ipp2 q “ 0. When we consider I as a first-order model M we would require that M ( a “ b and M * f paq “ f pbq. However, this is inconsistent with EQL . Identifying an interpretation I of ground atoms with the set of literals it sets to true we are led to the following definition. Ź Definition 3.6. Let I be a set of ground literals in T . We say that I is T -consistent iff LPI I is T -satisfiable. So, to continue the example, in other words a “ b ^ f paq “ f pbq is T -unsatisfiable and consequently I is T -inconsistent. Therefore the propositional interpretation I does not represent a first-order model of T . 
In such a situation the DPLL(T) procedure will add this information to the clause set under consideration and thus either find an interpretation which is T -consistent or terminate the search with the result that the original clause set is T -unsatisfiable. Definition 3.7. The DPLL(T) procedure has states of the form I | S as DPLL with the only difference that S is no longer a set of propositional clauses but a set of ground clauses in the language of T . The transitions of DPLL(T) consist of UnitPropagate, Decide, Fail, and Backtrack defined as above plus, in addition, T -Learn and Restart. T -Learn: I | S ÝÑ I | S, C if each atom of C occurs in S and S (T C. Definition 3.8. A DPLL(T)-state is called DPLL-final if it is either FAIL or of the form I | S where I contains all atoms of S and IpSq “ 1. A DPLL(T)-state is called DPLL(T)-final if it is either FAIL or of the form I | S where I is a T -consistent interpretation, contains all atoms of S and IpSq “ 1. The standard strategy for applying these rules is as follows: Algorithm 2 The DPLL(T) algorithm Input: a non-empty clause set S in the language of T Output: a DPLL(T)-final state while I | S Ð DPLLpSq is not DPLL(T)-final do Ź Restart S Ð S Y ttL | L P Iuu Ź T -Learn a conflict clause end while If we have a state I | S which, as in the condition of the while-loop, is DPLL-final but not DPLL(T)-final, then I is an interpretation with IpSq “ 1 which is T -inconsistent. Therefore ŹLPI L is T -unsatisfiable, i.e., (T ŽLPI L (the disjunction of the negations of the literals in I), and we can apply a T -Learn transition to add the clause tL | L P Iu to our clause set. In practice one will usually not add the whole clause tL | L P Iu but a subclause of it which is already T -valid but as small as possible. So, the software architecture of an SMT-solver has the following overall shape: the input clause set is given to a SAT-solver, which hands candidate propositional interpretations to the T -solver; the T -solver answers with a conflict clause or with “T -consistent”, and the overall output is a T -consistent interpretation or “T -unsat”. Example 3.5.
Let S “ tt a “ b, c “ du; t f pa, cq “ f pb, dquu, which, as a propositional clause set, is written as tt p1 , p2 u; t p3 uu. The DPLL(T) starts just like the DPLL-algorithm: H|S ÝÑUP p3 | S ÝÑD p3 , pd1 | S 16 ÝÑUP p3 , pd1 , p2 | S This state is DPLL-final. The propositional interpretations it induces is t p3 , p1 , p2 u which satisfies tt p1 , p2 u; t p3 uu. This interpretation is given to the EQL -solver which returns with the information that p3 ^ p1 ^ p2 , i.e., f pa, cq “ f pb, dq ^ a “ b ^ c “ d is EQL -unsatisfiable, in other words: (EQL a “ b _ c “ d _ f pa, cq “ f pb, dq. This clause (let us abbreviate it as C) is now added to the clause set by means of the EQL -Learn rule: ÝÑEQL -Learn p3 , pd1 , p2 | S, C ÝÑ D p3 , pd1 ÝÑ D p3 , p1 , p2 | S, C | S, C ÝÑ UP ÝÑR H | S, C p3 , pd1 , p2 | S, C ÝÑUP ÝÑ BT p3 | S, C p3 , p1 | S, C Now we have again reached a DPLL-final state. This time, its interpretation t p3 , p1 , p2 u is EQL -consistent, consider, e.g., the model M “ pN, Iq with Ipf q “ `, Ipcq “ Ipdq “ 0, Ipaq “ 1, Ipbq “ 2. Then 1 ` 0 ‰ 2 ` 0, 1 ‰ 2, 0 “ 0. We can conclude that S, C and hence S is EQL -satisfiable. As one can see already in the above simple example, the use of the Restart rule leads to the duplications of steps. In practice, one uses more efficient transitions like Backjump with the learned conflict clause instead. Theorem 3.3 (Termination). Let T be a theory. For set S of ground clauses in the language of T there is a DPLL(T )-final state F s.t. H | S ÝÑ˚ F . If T -satifiability of conjunctions of ground literals is decidable, a final state can be computed from S. Theorem 3.4 (Correctness). Let T be a theory and S be a set of ground clauses in the language of T . If H | S ÝÑ˚ F where F is DPLL(T )-final, then 1. If F is FAIL, then S is T -unsatisfiable. 2. If F “ I | S 1 then I is T -consistent and IpSq “ 1. Current SMT-solvers like z32 or veriT3 typically accept input in the SMT-LIB2 format. 
The clause set S of the above Example 3.5 is written in SMT-LIB2 format as: (set-logic QF_UF) (declare-sort U 0) (declare-fun a () U) (declare-fun b () U) (declare-fun c () U) (declare-fun d () U) (declare-fun f (U U) U) (assert (and (or (not (= a b)) (= c d)) (not (= (f a c) (f b d))))) (check-sat) (exit) The assert command accepts any formula, not just CNFs. A binary predicate can be defined by (declare-fun P (U U) Bool). For more information on this format, see the SMT-LIB2 tutorial4 . 2 http://github.com/Z3Prover/z3 http://www.verit-solver.org/ 4 http://www.grammatech.com/resources/smt/SMTLIBTutorial.pdf 3 17 3.4 Congruence closure So far we have considered the T -solver as a black box. In this section we will see a decision procedure for conjunctions of ground literals in EQL . This decision procedure, called congruence closure, computes the minimal congruence relation satisfying the positive equality literals in the input formula and then checks whether some negative information Źn contradicts this minimal relation. For ease of notation we will identify a conjunction F “ i“1 Ai of ground literals with a set of ground literals and consequently write A P F for Di P t1, . . . , nu s.t. A “ Ai . Algorithm 3 Congruence closure Input: a conjunction F of ground literals in EQL Output: EQL -satisfiable or EQL -unsatisfiable C Ð ttt, su | t “ s P F u Y tttu | t P subtermspF q, Es s.t. t “ s P F u while DC1 , C2 P C s.t. C1 ‰ C2 and C1 X C2 ‰ H do Ź close under transitivity C Ð pCztC1 , C2 uq Y tC1 Y C2 u end while while DC1 , C2 P C s.t. C1 ‰ C2 and f pt1 , . . . , tn q P C1 , f ps1 , . . . , sn q P C2 s.t. @i P t1, . . . , nuDC P C with ti , si P C do Ź close under congruence C Ð pCztC1 , C2 uq Y tC1 Y C2 u end while if Ds ‰ t P F, C P C s.t. s, t P C or DP pt1 , . . . , tn q, P ps1 , . . . , sn q P F s.t. @i P t1, . . . , nuDC P C s.t. ti , si P C then return EQL -unsat else return EQL -sat end if Example 3.6. 
Consider the formula F : a “ b ^ a ‰ c ^ b “ f pcq ^ gpa, f pcqq ‰ gpb, aq. We start out with the sets ta, bu, tb, f pcqu, tcu, tgpa, f pcqqu, tgpb, aqu. Closure under transitivity yields the equivalence relation ta, b, f pcqu, tcu, tgpa, f pcqqu, tgpb, aqu. Closure under congruence yields the congruence relation ta, b, f pcqu, tcu, tgpa, f pcqq, gpb, aqu. But since gpa, f pcqq ‰ gpb, aq is in F , the algorithm returns the information that F is EQL unsatisfiable. 18 Chapter 4 Normal forms in first-order logic As we saw, in propositional logic it is possible to transform every formula into a logically equivalent CNF. However, for reasons of practical efficiency it is usually more sensible to compute only a satisfiablity-equivalent CNF by applying the Tseitin-Transformation. In this section we will see how to compute normal forms for first-order formulas. The additional complication lies, of course, in the presence of quantifiers. They will be dealt with by a technique called Skolemisation1 which replaces existential quantifiers by new function symbols. As in propositional logic this will allow to obtain a clause set which is satisfiability-equivalent to the original formula. An important difference between propositional and first-order clause sets is that the atoms of the latter contain variables. These variables are considered to be universally quantified. Therefore, a first-order formula is considered to be in CNF if it is of the form ki n ł ľ @x Li,j . i“1 j“1 This formula can then be written as the clause set ttLi,j | 1 ď j ď ki u | 1 ď i ď nu. 4.1 Variable normal form We will start with the simple operation of renaming bound variables. Definition 4.1. A formula is said to be in variable normal form (VNF) if 1. it does not contain a variable that is free and bound, and 2. it does not contain a variable that is bound by two different quantifiers. Lemma 4.1. Let ϕ be a formula, then there is a formula ϕ1 in VNF which is logically equivalent to ϕ. Proof. 
Let @x ψ be a subformula of ϕ and let u be a variable which does not appear in ϕ. Then @x ψ ô @u ψrxzus and consequently ϕr@x ψs ô ϕr@u ψrxzuss. Thus all variables that violate the conditions of VNF can be renamed. 1 named after Thoralf Albert Skolem (1887–1963) 19 Example 4.1. Let ψ “ Du@x ppRpu, xq Ñ Qpxqq ^ Dv Rpx, vqq ^ Dx P pxq ^ @y pQpyq Ñ P pyqq. This formula is not in VNF since x is bound twice. By renaming x we obtain a formula ψ 1 “ Du@x ppRpu, xq Ñ Qpxqq ^ Dv Rpx, vqq ^ Dz P pzq ^ @y pQpyq Ñ P pyqq. in VNF. 4.2 Negation normal form In a second step we extend the notion of negation normal form (NNF) to first-order formulas: as in the case of propositional logic, a first-order formula is said to be in NNF if it does not contain Ñ and appears only immediately above atoms. Definition 4.2. We extend the formula transformations Φ` and Φ´ from the first exercise sheet to first-order logic by defining: Φ` pQx ψq “ Qx Φ` pψq Φ´ pQx ψq “ Qx Φ´ pψq where Q P t@, Du and Q “ @ if Q “ D and Q “ D if Q “ @. Lemma 4.2. Φ` is an NNF-transformation for first-order logic, i.e., for every formula ψ: Φ` pψq contains the same atoms as ψ, Φ` pψq is logically equivalent to ψ, Φ` pψq is in NNF, and |Φ` pψq| “ Op|ψ|q. Without proof. Example 4.2. The formula ψ 1 “ Du@x ppRpu, xq Ñ Qpxqq ^ Dv Rpx, vqq ^ Dz P pzq ^ @y pQpyq Ñ P pyqq obtained in Example 4.1 can be transformed to the NNF ψ 2 “ Φ` pψ 1 q “ Du@x pp Rpu, xq _ Qpxqq ^ Dv Rpx, vqq ^ @z P pzq ^ @y p Qpyq _ P pyqq. 4.3 Skolemisation Definition 4.3. Let ϕ be a formula in VNF. We define the ordering ďϕ on the bound variables of ϕ as x ďϕ y if Qx is above Qy in the formula tree of ϕ. Definition 4.4. Let ϕrDy ψs be a formula in VNF. Let @x1 , . . . , @xn be the quantifiers Qx with x ďϕ y. Define sky pϕrDy ψsq “ ϕrψryzf px1 , . . . , xn qss where f is a function symbol which does not appear in ϕ. Note that the above substitution ryzf px1 , . . . 
, xn qs carries variables into a context where they are bound by the quantifiers @x1 , . . . , @xn in ϕ. This is intended. This will occur at a few more occasions during our discussion of Skolemisation. Definition 4.5. Let ϕ be a formula in VNF. Let y1 , . . . , ym be all existentially bound variables in ϕ s.t. yi ďϕ yj implies i ď j. Define skpϕq “ skym p¨ ¨ ¨ psky1 pϕqq¨q. 20 Lemma 4.3. Let @x1 ¨ ¨ ¨ @xn Dy ϕ be a formula and f a function symbol which does not occur in ϕ, then @xDy ϕ „sat @x ϕryzf pxqs Proof. @x ϕryzf px1 , . . . , xn qs Ñ @xDy ϕ is a valid formula. For the other direction, let M “ pD, Iq be a model of @xDy ϕ in the language L of @xDy ϕ. We define a structure N in the language L Y tf u as follows: first N |L “ M und secondly f N pa1 , . . . , an q “ b for a b P M s.t. M ( ϕrxza, yzbs. Note that such a b exists since M ( @xDy ϕ. Therefore N is well-defined and we have N ( @x ϕryzf px1 , . . . , xn qs. Definition 4.6. Let Qx be a quantifier in ϕ. We define ϕQx as ϕ without the quantifier Qx. Lemma 4.4. Let ϕ be a formula in VNF and NNF. Let Qx be a quantifier in ϕ which is minimal w.r.t. ďϕ . Then ϕ ô Qx ϕQx . Proof. The transformation rules Qx ψ ˝ χ ÞÑ Qx pψ ˝ χq χ ˝ Qx ψ ÞÑ Qx pχ ˝ ψq preserve logical equivalence (since x cannot occur in χ due to the formula being in VNF). Proposition 4.1. Let ϕ be a formula in VNF and NNF. Then skpϕq is in VNF and NNF, satisfiability-equivalent to ϕ and does not contain D. Proof. Proceeding by induction on the number of existential quantifiers in ϕ, it suffices to show that sky pϕq is satisfiability-equivalent to ϕ for y being an outermost existential quantifier. Let Dy be so that x ďϕ y implies that x is universally quantified and let @x1 , . . . , @xn be all quantifiers which dominate Dy. By Lemma 4.4 we obtain ϕ ô @xDy ϕQxQy . By applying Lemma 4.3 we obtain @xDy ϕQxQy „sat @xϕQxQy ryzf pxqs for a fresh function symbol f . 
By applying Lemma 4.4 for shifting back the universal quantifiers we obtain @xϕQxQy ryzf pxqs ô ϕQy ryzf pxqs “ sky pϕq. So in total we have ϕ „sat sky pϕq. Example 4.3. We continue Example 4.2 where we have obtained the formula ψ 2 “ Du@x pp Rpu, xq _ Qpxqq ^ Dv Rpx, vqq ^ @z P pzq ^ @y p Qpyq _ P pyqq. in VNF and NNF. The Skolemisation of ψ 2 is ψ 3 “ @x pp Rpc, xq _ Qpxqq ^ Rpx, f pxqqq ^ @z P pzq ^ @y p Qpyq _ P pyqq. 21 4.4 Clause normal form Proposition 4.2. For every formula ϕ there is a clause set S which is satisfiability-equivalent to ϕ. Moreover, the mapping from ϕ to S is computable. Proof. Proceed as follows: • Apply Lemma 4.1 to obtain a formula ϕ1 in VNF. • Apply Lemma 4.2 to obtain a formula ϕ2 in VNF and NNF. • Apply Proposition 4.1 to obtain a formula ϕ3 in VNF and NNF which does not contain D. • Let ϕ4 be ϕ3 without universal quantifiers, then @xϕ4 ô ϕ3 by Lemma 4.4. Ź Ž i • Transform the quantifier-free formula ϕ4 to a satisfiability-equivalent CNF ni“1 kj“1 Li,j in order to obtain ki n ł ľ Li,j . ϕ „sat @x i“1 j“1 Then S “ ttLi,j | 1 ď j ď ki u | 1 ď i ď nu. Example 4.4. In Example 4.3 we have obtained the formula ψ 3 “ @x pp Rpc, xq _ Qpxqq ^ Rpx, f pxqqq ^ @z P pzq ^ @y p Qpyq _ P pyqq. which is logically equivalent to the clause set S “ tt Rpc, xq, Qpxqu; tRpx, f pxqqu; t P pzqu; t Qpyq, P pyquu. 22 Chapter 5 Resolution in first-order logic 5.1 Unification Before we start with unification, we need some additional notions about substitutions. First note that the set of all substitutions forms a monoid with the operation ˝ of concatenation and the identity id. Definition 5.1. Let σ be a substitution, then dompσq “ tx P V | xσ ‰ xu and rngpσq “ Ť xPdompσq Varpxσq. For substitutions σ, τ with dompσq X dompτ q “ H we define the substitution σ Y τ as $ ’ &xσ if x P dompσq xpσ Y τ q “ xτ if x P dompτ q ’ % x otherwise. Definition 5.2. Let T be a non-empty set of terms. A substitution σ is called unifier of T if |T σ| “ 1. Example 5.1. 
Let T “ thpx, gpxqq, hpf pyq, zqu. The substitution rxzf pyq, zzgpf pyqqs is a unifier of T . The substitution rxzf pf pyqq, yzf pyq, zzgpf pf pyqqqs is another unifier of T . Definition 5.3. Let σ and τ be substitutions. We say that σ is more general than τ , in symbols σ ď τ , if there is a substitution θ s.t. σθ “ τ . Definition 5.4. Let T be a set of terms. Then σ is called most general unifier of T if σ ď τ for every unifier τ of T . We will see soon that every set of literals, if it is unifiable at all, has a most general unifier. For showing this we will need some auxiliary notions and results. Definition 5.5. Let s and t be terms. Their difference set Diffps, tq is a finite set of pairs of terms which is defined inductively as follows: 1. If s “ t then Diffps, tq “ H. 2. If s ‰ t but s “ f ps1 , . . . , sn q and t “ f pt1 , . . . , tn q for a function symbol f , then Diffps, tq “ n ď i“1 23 Diffpsi , ti q. 3. Otherwise, Diffps, tq “ tps, tqu. Example 5.2. Diffphpf pyq, zq, hpx, gpxqqq “ tpf pyq, xq, pz, gpxqqu Lemma 5.1. A substitution σ is unifier of tt1 , t2 u iff σ is unifier of all pairs in Diffpt1 , t2 q. Proof. Let t1 σ “ t2 σ and ps1 , s2 q P Diffpt1 , t2 q. Then s1 is a subterm of t1 at a certain position p and s2 is subterm of t2 at the same position p. Therefore t1 σ “ t2 σ implies s1 σ “ s2 σ. For the other direction: if s1 σ “ s2 σ for all ps1 , s2 q P Diffpt1 , t2 q, then the definition of Diff directly implies that t1 σ “ t2 σ (using induction). The next lemma is a central property for the existence of most general unifiers: an arbitrary unifier can be factored into a difference pair and a rest. Lemma 5.2. Let t1 and t2 be terms, τ a unifier of tt1 , t2 u and px, sq P Diffpt1 , t2 q. Then τ “ rxzssτ 1 where τ 1 “ τ |dompτ qztxu . Proof. As px, sq P Diffpt1 , t2 q and τ is unifier of tt1 , t2 u we have xτ “ sτ. (5.1) by Lemma 5.1. Furthermore x R Varpsq: since px, sq is a difference pair s ‰ x. 
So if x P Varpsq then xτ would be proper subterm of sτ which would contradict (5.1). Therefore sτ “ sτ 1 . (5.2) and we obtain τ “ rxzxτ s Y τ 1 “p5.1q rxzsτ s Y τ 1 “p5.2q rxzsτ 1 s Y τ 1 “ rxzssτ 1 . Example 5.3. Let t1 “ hpf pyq, zq and t2 “ hpx, gpxqqq, then τ “ rxzf pcq, yzc, zzgpf pcqqs is a unifier of tt1 , t2 u, pz, gpxqq is a difference pair and hence τ “ rzzgpxqsrxzf pcq, yzcs. Theorem 5.1. Let tt1 , t2 u be unifiable. Then tt1 , t2 u has a most general unifier. Proof. We will proceed by induction on the number of variables which occur in tt1 , t2 u, written as |Varptt1 , t2 uq|. If |Varptt1 , t2 uq| “ 0 then unifiability already implies t1 “ t2 . If t1 “ t2 (independently of |Varptt1 , t2 uq|), then every substitution is a unifier and hence id is a most general unifier. So let t1 ‰ t2 . Then Diffpt1 , t2 q ‰ H. Let ps1 , s2 q P Diffpt1 , t2 q. Since tt1 , t2 u is unifiable, so is ts1 , s2 u. Furthermore, s1 and s2 have different head symbols since they are a difference pair. One of these symbols must be a variable (otherwise ts1 , s2 u would not be unifiable). So let w.l.o.g. s1 “ x. Then we also have x R Varps2 q, because otherwise xσ would be a proper subterm of s2 σ for any substitution σ which would contradict unifiability of ts1 , s2 u. We define t1i “ ti rxzs2 s for i “ 1, 2 and claim that tt11 , t12 u is unifiable. To see that, let τ be a unifier of tt1 , t2 u. Then by Lemma 5.2 we have τ “ rxzs2 sτ 1 and hence t11 τ 1 “ t1 rxzs2 sτ 1 “ t1 τ “ t2 τ “ t2 rxzs2 sτ 1 “ t12 τ 1 . 24 So tt11 , t12 u are unifiable and contain stricly less variables than tt1 , t2 u because x does no longer appear. By induction hypothesis there is a most general unifier σ 1 of tt11 , t12 u. We define σ “ rxzs2 sσ 1 and claim that σ is a most general unifier of tt1 , t2 u. First, σ is a unifier because t1 σ “ t1 rxzs2 sσ 1 “ t11 σ 1 “ t12 σ 1 “ t2 rxzs2 sσ 1 “ t2 σ. 
Secondly, let τ be an arbitrary unifier of tt1 , t2 u, then by Lemma 5.2 we can write τ as τ “ rxzs2 sτ 1 and – as above – we can show that τ 1 is a unifier of tt11 , t12 u. So there is a θ s.t. σ 1 θ “ τ 1 . But we also have σθ “ rxzs2 sσ 1 θ “ rxzs2 sτ 1 “ τ and therefore σ is a most general unifier. The above proof induces the following algorithm for the computation of a most general unifier of two terms t1 and t2 : • If Diffpt1 , t2 q “ H then mgupt1 , t2 q “ id. • If ps1 , s2 q P Diffpt1 , t2 q where both s1 and s2 have head symbols which are constants or function symbols, then t1 , t2 is not unifiable. • Otherwise, let px, sq P Diffpt1 , t2 q. Then: – If x P Varpsq, then t1 , t2 is not unifiable. – If x R Varpsq, let t11 “ t1 rxzss and t12 “ t2 rxzss. Then mgupt1 , t2 q “ rxzss mgupt11 , t12 q. Example 5.4. Let t1 “ gpx, cq and t2 “ gpf pyq, yq. The application of the above algorithm yields the following table:

gpx, cq, gpf pyq, yq : rxzf pyqs
gpf pyq, cq, gpf pyq, yq : ryzcs
gpf pcq, cq, gpf pcq, cq : id

and hence mgupt1 , t2 q “ rxzf pyqs ryzcs id “ rxzf pcq, yzcs. Definition 5.6. A set of literals E is called unifiable if there is a substitution σ s.t. |Eσ| “ 1. Corollary 5.1. If a finite set of literals is unifiable, then it has a most general unifier. Proof. We will show that for every finite set E of literals there are terms s1 , s2 s.t. the unifiers of E are exactly the unifiers of ts1 , s2 u. By replacing every r-ary predicate symbol that appears in E by a new r-ary function symbol fP and negation by a new unary function symbol n we obtain a set of terms TE “ tt1 , . . . , tm u from E. Now let f be a new m-ary function symbol and define s1 “ f pt1 , . . . , t1 q and s2 “ f pt1 , . . . , tm q. Then a substitution σ is a unifier of ts1 , s2 u iff t1 σ “ ti σ for all i P t1, . . . , mu iff ti σ “ tj σ for all i, j P t1, . . . , mu iff σ is a unifier of E. 5.2 Resolution A variable permutation is a substitution σ : V Ñ V which is bijective.
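The recursive mgu algorithm of Section 5.1 maps almost one-to-one onto code. The following Python sketch uses a hypothetical nested-tuple term representation of our own (variables as strings, a compound term f(y) as ("f", "y")), not any notation from the notes:

```python
def substitute(t, sub):
    """Apply a substitution {variable: term} to a term."""
    if isinstance(t, str):                      # a variable
        return sub.get(t, t)
    return (t[0],) + tuple(substitute(a, sub) for a in t[1:])

def diff(s, t):
    """The difference set Diff(s, t) as a list of pairs."""
    if s == t:
        return []
    if not isinstance(s, str) and not isinstance(t, str) and s[0] == t[0] and len(s) == len(t):
        return [p for a, b in zip(s[1:], t[1:]) for p in diff(a, b)]
    return [(s, t)]

def occurs(x, t):
    return t == x if isinstance(t, str) else any(occurs(x, a) for a in t[1:])

def mgu(s, t):
    """Most general unifier of two terms as a dict, or None if not unifiable."""
    d = diff(s, t)
    if not d:
        return {}                               # Diff empty: the terms are equal, mgu is id
    a, b = d[0]
    if isinstance(b, str) and not isinstance(a, str):
        a, b = b, a                             # make sure a is the variable, if there is one
    if not isinstance(a, str) or occurs(a, b):
        return None                             # clash of head symbols, or occurs check fails
    rest = mgu(substitute(s, {a: b}), substitute(t, {a: b}))
    if rest is None:
        return None
    return {a: substitute(b, rest), **rest}     # compose [a\b] with the remaining mgu
```

On the terms of Example 5.4 this returns {"x": f(c), "y": c}, matching rxzf pcq, yzcs, and the occurs check makes it reject the pair x, f(x).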
25 Definition 5.7. For two clauses C and C 1 we say that C 1 is a variant of C, if there is a variable permutation π s.t. Cπ “ C 1 . Definition 5.8. Let C and D be variable-disjoint clauses. Let K P C and L P D be literals s.t. tK, Lu are unifiable and let µ be a most general unifier of tK, Lu. Then resK,L pC, Dq “ ppCztKuq Y pDztLuqqµ is called resolvent of C and D. Example 5.5. Let C “ ty ď y ¨ yu and D “ t x ď y, x ă spyqu. We rename y in C to y 1 and thus obtain C 1 “ ty 1 ď y 1 ¨ y 1 u. Now C 1 and D are variable-disjoint. The atoms y 1 ď y 1 ¨ y 1 and x ď y have a most general unifier µ “ rxzy 1 , yzy 1 ¨ y 1 s and hence C and D form the resolvent ty 1 ă spy 1 ¨ y 1 qu. Note that renaming y to y 1 is necessary for carrying out this resolution step as y ď y ¨ y and x ď y are not unifiable. Definition 5.9. Let C be a clause and D Ď C be unifiable with most general unifier µ. Then Cµ is called factor of C. Example 5.6. The clause set S “ ttP pxq, P pyqu; t P puq, P pvquu is unsatisfiable. Up to variable-renaming the only resolvent obtainable from S is tP pxq, P pvqu. In particular, the empty clause is not derivable from S by resolution alone. On the other hand, tP pxq, P pyqu has the factor tP pxqu and t P puq, P pvqu has the factor t P puqu. From tP pxqu and t P puqu one obtains the empty clause by a single resolution step. The above example shows that for completeness, the factor rule is necessary. There are also variants of the resolution rule which incorporate factoring. Then an explicit factor rule is not necessary. We will see more details on this later. Definition 5.10. Let C and D be variable-disjoint clauses. Let s “ t P C and Lrus P D be literals s.t. s and u are unifiable. Let µ be a most general unifier of ts, uu. Then ` ˘ pars“t,Lrus pC, Dq “ pCzts “ tuq Y pDztLrusuq Y tLrtsu µ is called paramodulant of C and D. Example 5.7. 
A short example for a deduction consisting of two paramodulations is: x`0“x x ` spyq “ spx ` yq s2 p0q ` sp0q ď s2 p0q sps2 p0q ` 0q ď s2 p0q s3 p0q ď s2 p0q Definition 5.11. Let C, D be clauses and let C 1 , D1 be variable-disjoint variants of C and D respectively. Let C01 Ď C 1 and D01 Ď D1 s.t. C01 Y D01 is unifiable with mgu σ. Then the clause ppC 1 zC01 q Y pD1 zD01 qqσ is called big-step resolvent of C and D. Definition 5.12. Let C and D be clauses and let C 1 and D1 be variable-disjoint variants of C and D respectively. Let C01 Ď C 1 s.t. C01 is unifiable with mgu µ to s “ t and let D01 Ď D1 s.t. D01 is unifiable with mgu ν to Lrus for some term u which is unifiable with s with mgu σ. Then ` 1 ˘ pC µzts “ tuq Y pD1 νztLrusuq Y tLrtsu σ is called big-step paramodulant of C and D. 26 Definition 5.13. Let S be a clause set. A finite list C1 , . . . , Cn of clauses is called resolution deduction from S if for all i P t1, . . . , nu: 1. Ci P S, or 2. Ci “ tt “ tu for some term t, or 3. there are j, k ă i s.t. Ci is a big-step resolvent of Cj and Ck , or 4. there are j, k ă i s.t. Ci is a big-step paramodulant of Cj and Ck . If Cn “ H, then C1 , . . . , Cn is called resolution refutation. Theorem 5.2 (Soundness). If S has a resolution refutation, then S is unsatisfiable. Proof. We will show the following stronger statement: if C1 , . . . , Cn is a deduction consisting of clauses from S, reflexivity, factor, variant, resolution and paramodulation, then S ( Cn . Then, if Cn “ H, S is unsatisfiable. We proceed by induction on n, making a case distinction on the rule used for deriving the last clause Cn : 1. If Cn P S, we are done. 2. If Cn “ tt “ tu, we are done, 3. If Cn is a variant of some Cj with j ă n then we are done since renaming of bound variables preserves logical equivalence. 4. If Cn “ Cj σ for some j ă n then we are done since C ( Cτ for all clauses C and all substitutions τ . 5. 
Let Cn “ resLj ,Lk pCj , Ck q “ ppCj ztLj uq Y pCk ztLk uqqµ for some j, k ă n. Then Lj µ “ Lk µ. Let M ( S, then by induction hypothesis M ( Cj and M ( Ck . Therefore also M ( Cj µ and M ( Ck µ. Now, writing Cj “ Cj1 _ Lj and Ck “ Ck1 _ Lk , we make a case distinction. If M ( Lj µ, then M ( Ck1 µ and hence M ( Cn . The case M ( Lk µ is symmetric. ` ˘ 6. Let Cn “ pars“t,Lrus pCj , Ck q “ pCj zts “ tuq Y pCk ztLrusuq Y tLrtsu µ for some j, k ă n. Then sµ “ uµ. Let M ( S, then by induction hypothesis M ( Cj and M ( Ck . Therefore also M ( Cj µ and M ( Ck µ. Now, writing Cj “ Cj1 _s “ t and Ck “ Ck1 _Lrus we make a case distinction. If M ( sµ “ tµ then, since uµ “ sµ and M ( Ck1 µ _ Lrusµ, we have M ( Ck1 µ _ Lrtsµ which is a subclause of Cn and so M ( Cn . If, on the other hand, M * sµ “ tµ, then M ( Cj1 µ which is a subclause of Cn and so M ( Cn . There is a number of theorem provers for first-order logic which are based on resolution, paramodulation and variants therefore, for example Vampire1 , E2 , SPASS3 , prover94 . The following is an example input file for prover9. We speak about a context of groups, use f for the binary group operation, g for the unary inverse operation and e for the unit element. The following input file asks prover9 to show that every left-unit is also a right-unit. 1 http://www.vprover.org/ www.eprover.org/ 3 http://www.spass-prover.org/ 4 https://www.cs.unm.edu/~mccune/mace4/ 2 27 formulas(assumptions). f(x,f(y,z)) = f(f(x,y),z). f(e,x) = x. f(g(x),x) = e. f(x,g(x)) = e. end_of_list. formulas(goals). f(x,e) = x. end_of_list. 28 Chapter 6 Redundancy 6.1 Subsumption Definition 6.1. Let C and D be clauses. We say that C subsumes D if there is a substitution σ s.t. Cσ Ď D. In this case we write C ďss D. Let S, T be clause sets. Then S ďss T if @D P T DC P S s.t. C ďss D. Occasionally we want to make the substitution explicit; then we write C ďσss D as abbreviation for Cσ Ď D. Lemma 6.1. If C ďss D then C ( D. If S ďss T then S ( T . Proof. 
We have C ( Cσ for any clause C and any substitution σ. Moreover, if D1 Ď D, then D1 ( D because a clause is a disjunction. Letting D1 “ Cσ we see that C ďss D implies that C ( D. If S ďss T , then @D P T DC P S s.t. C ďss D, hence C ( D and so S ( D since S is a conjunction. This means that every conjunct of T is implied by S, so S ( T . Example 6.1. The converse of Lemma 6.1 is not true. Consider C “ t P pxq, P pf pxqqu and D “ t P pyq, P pf pf pyqqqu. Then C ( D but there is no substitution σ s.t. Cσ Ď D. So subsumption is a restricted form of implication. While clause implication is undecidable, subsumption is decidable. Proposition 6.1. Let S be a clause set, T Ď S s.t. SzT ďss S. Then S and SzT are logically equivalent. Proof. S ( SzT is immediate. The other direction has just been shown in Lemma 6.1. This result shows that S is unsatisfiable iff SzT is unsatisfiable. Therefore, up to a certain point, it gives a justification for removing subsumed clauses from a clause set before we start the search for a refutation. By telling us that unsatisfiability is preserved it shows the correctness of this preprocessing step. However, it does not tell us anything about the proof length. This result does not rule out the existence of short refutations of S in a situation where all refutations of SzT are long. However, we will see that this is not the case. Every refutation of S can be pruned to one of SzT . In order to prove this result we need to carry out a thorough study of the relationship between subsumption and the inference rules considered so far.

Figure 6.1: Lemma 6.2
Figure 6.2: Lemma 6.3

Lemma 6.2. Let C ďss D and E be a variant of D, then C ďss E. Proof. If C ďss D, then there is a σ s.t. Cσ Ď D. If E is a variant of D, then there is a variable permutation π s.t. Dπ “ E. Therefore Cσπ Ď E, i.e., C ďss E.
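Since subsumption is decidable, the claim above can be made concrete with a naive decision procedure: try to map each literal of C onto some literal of D under one common matching substitution. A minimal Python sketch, using a representation of our own (literals as nested tuples with the predicate symbol in head position, e.g. ("~P", "x") as a hypothetical encoding of a negative literal):

```python
def match(pattern, term, sub):
    """Extend the substitution sub so that pattern·sub == term, or return None."""
    if isinstance(pattern, str):                     # a variable
        if pattern in sub:
            return sub if sub[pattern] == term else None
        return {**sub, pattern: term}
    if isinstance(term, str) or pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        sub = match(p, t, sub)
        if sub is None:
            return None
    return sub

def subsumes(c, d):
    """Decide C <=ss D: is there a substitution sigma with C·sigma a subset of D?"""
    def search(remaining, sub):
        if not remaining:
            return True
        first, rest = remaining[0], remaining[1:]
        for lit in d:                                # try to map the first literal onto some literal of D
            extended = match(first, lit, dict(sub))
            if extended is not None and search(rest, extended):
                return True
        return False
    return search(list(c), {})
```

On the clauses of Example 6.1 this procedure correctly answers "no": C implies D, but no single substitution maps both literals of C into D.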
Note that being a variant is a symmetric relation, hence we obtain both of the statements depicted in Figure 6.1.

Lemma 6.3. Let D be a factor of C and D ďss E. Then C ďss E.

Proof. D “ Cµ for some substitution µ and Dσ Ď E for some substitution σ. Therefore Cµσ Ď E, i.e., C ďss E.

Note that, in contrast to being a variant, being a factor is not a symmetric relation. For the other direction a different statement holds and more work is needed; see the next lemma.

Lemma 6.4. Let C 1 ďss C. Let C0 Ď C be unifiable with mgu µ and let C0 µ “ tLu. Then there is a factor C 1 µ1 of C 1 with C 1 µ1 ďτss Cµ s.t. there is at most one L1 P C 1 µ1 with L1 τ “ L.

Proof. As C 1 ďss C there is a σ s.t. C 1 σ Ď C. Let C0 Ď Cˆ0 Ď C be maximal with Cˆ0 µ “ tLu. Let C01 “ tM P C 1 | M σ P Cˆ0 u. Since µ is a unifier of Cˆ0 and C01 σ Ď Cˆ0 , µ is also a unifier of C01 σ. Then σµ is a unifier of C01 . Let µ1 be a mgu of C01 , then C 1 µ1 is a factor of C 1 and we have µ1 ď σµ. Let τ be s.t. µ1 τ “ σµ. Then we have C 1 µ1 τ “ C 1 σµ Ď Cµ, i.e., C 1 µ1 ďτss Cµ.

[Figure 6.3: Lemma 6.4 (diagram)]

[Figure 6.4: Lemma 6.6 (diagram)]

Now suppose that there are L21 , L22 P C 1 µ1 with L21 τ “ L22 τ “ L. Then there are L11 , L12 P C 1 s.t. L11 µ1 “ L21 and L12 µ1 “ L22 . Furthermore, L11 σµ “ L11 µ1 τ “ L21 τ “ L and analogously L12 σµ “ L. Hence, by maximality of Cˆ0 , L11 σ, L12 σ P Cˆ0 . Therefore L11 , L12 P C01 and as µ1 is mgu of C01 we have L11 µ1 “ L12 µ1 , i.e., L21 “ L22 .

Before we move on to analyse the relationship between the resolution rule and subsumption we need to make a preparatory observation on set operations on clauses and substitutions. We have already repeatedly used the fact that pC Y Dqσ “ Cσ Y Dσ for all clauses C and D, and all substitutions σ.
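This distribution over union, and its failure for set difference discussed next, can be checked concretely in a few lines of Python; the flat literal representation below is an illustrative choice of ours.

```python
def sub(lit, s):
    """Apply a substitution (a dict on variable names) to a flat literal
    of the form (pred, arg), where arg is a variable or constant name."""
    p, a = lit
    return (p, s.get(a, a))

sigma = {'x': 'a'}
C = {('P', 'a')}                         # the clause {P(a)}
D = {('P', 'x')}                         # the clause {P(x)}
Cs = {sub(l, sigma) for l in C}          # C·sigma
Ds = {sub(l, sigma) for l in D}          # D·sigma
U  = {sub(l, sigma) for l in C | D}      # (C ∪ D)·sigma

print(U == Cs | Ds)                                 # True: union commutes
print({sub(l, sigma) for l in C - D} == Cs - Ds)    # False: difference does not
```

The second line reproduces exactly the phenomenon of the example that follows.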
For set difference the situation is more complicated, as the following example demonstrates.

Example 6.2. Let C “ tP paqu, D “ tP pxqu and σ “ rxzas. Then Cσ “ tP paqu, Dσ “ tP paqu, CσzDσ “ H, pCzDqσ “ tP paqu, so pCzDqσ ‰ CσzDσ.

However, as the following lemma shows, we have equality under an additional injectivity condition.

Lemma 6.5. Let C, D be clauses and σ a substitution. 1. Then CσzDσ Ď pCzDqσ. 2. If for every L P Dσ there is at most one L1 P C Y D with L1 σ “ L, then pCzDqσ Ď CσzDσ.

Proof. If L P CσzDσ then there is a L0 P C s.t. L “ L0 σ. But L0 R D: for suppose L0 P D, then L0 σ P Dσ, which is not the case. So L0 P CzD and hence L “ L0 σ P pCzDqσ. For 2, let L P pCzDqσ, then there is L0 P C s.t. L0 R D and L0 σ “ L, so L P Cσ. Suppose L P Dσ, then there would be L1 P D s.t. L1 σ “ L. But then, by the assumption, L0 “ L1 , which is a contradiction. Therefore L R Dσ and so L P CσzDσ.

Lemma 6.6. Let E be a big-step resolvent of C and D and let C 1 ďss C and D1 ďss D. Then C 1 ďss E or D1 ďss E or there is a big-step resolvent E 1 of C 1 and D1 s.t. E 1 ďss E.

Proof. By Lemma 6.2 we can assume that C and D are variable-disjoint while preserving the assumptions C 1 ďss C and D1 ďss D. Let C0 Ď C and D0 Ď D s.t. C0 Y D0 is unifiable with mgu µ s.t. E “ ppCzC0 q Y pDzD0 qqµ. Let ν be the mgu of C0 and λ be the mgu of D0 and let C0 ν “ tLu and D0 λ “ tKu. Since C0 Y D0 is unifiable and ν, λ are most general, also tL, Ku is unifiable with a mgu µ˚ . Then µ “ pν Y λqµ˚ .

By Lemma 6.4 applied to C 1 ďss C and the factor Cν, there is a factor C 1 ν 1 of C 1 s.t. C 1 ν 1 ďθss Cν and there is at most one L2 P C 1 ν 1 s.t. L2 θ “ L. Assume L R C 1 ν 1 θ. We know that C 1 ν 1 θ Ď Cν and tLu “ C0 ν. So C 1 ν 1 θ Ď CνzC0 ν Ď pCzC0 qν and therefore C 1 ν 1 θµ˚ Ď pCzC0 qνµ˚ Ď ppCzC0 qν Y pDzD0 qλqµ˚ “ E. So C 1 ν 1 ďss E and, since C 1 ν 1 is a factor of C 1 , also C 1 ďss E by Lemma 6.3. So from now on we assume that L P C 1 ν 1 θ.
By Lemma 6.4 applied to D1 ďss D and the factor Dλ, there is a factor D1 λ1 of D1 s.t. D1 λ1 ďτss Dλ and there is at most one K 2 P D1 λ1 s.t. K 2 τ “ K. As above, K R D1 λ1 τ implies D1 ďss E and therefore, from now on, we assume K P D1 λ1 τ . Since L P C 1 ν 1 θ there is a L2 P C 1 ν 1 s.t. L2 θ “ L and similarly there is a K 2 P D1 λ1 s.t. K 2 τ “ K. As tL, Ku is unifiable with mgu µ˚ we have Lµ˚ “ Kµ˚ and hence L2 θµ˚ “ K 2 τ µ˚ . Let µ2 be a mgu of tL2 , K 2 u, then µ2 ď pθ Y τ qµ˚ . We define E 1 “ resL2 ,K 2 pC 1 ν 1 , D1 λ1 q. Then E 1 is a big-step resolvent of C 1 and D1 by definition. It remains to show that E 1 ďss E. To that aim, let ε be a substitution s.t. µ2 ε “ pθ Y τ qµ˚ . Then

E 1 ε “ ppC 1 ν 1 ztL2 uq Y pD1 λ1 ztK 2 uqqµ2 ε “ ppC 1 ν 1 ztL2 uqθ Y pD1 λ1 ztK 2 uqτ qµ˚ .

There is at most one L0 P C 1 ν 1 with L0 θ “ L2 θ and – analogously – there is at most one K0 P D1 λ1 with K0 τ “ K 2 τ . Therefore we can apply Lemma 6.5 to obtain

“ ppC 1 ν 1 θztL2 θuq Y pD1 λ1 τ ztK 2 τ uqqµ˚

and since L2 θ “ L, K 2 τ “ K and C 1 ν 1 θ Ď Cν and D1 λ1 τ Ď Dλ we have

Ď ppCνztLuq Y pDλztKuqqµ˚

and as C0 ν “ tLu and D0 λ “ tKu we have

Ď ppCzC0 qν Y pDzD0 qλqµ˚ “ ppCzC0 q Y pDzD0 qqµ “ E,

i.e., E 1 ďεss E.

Lemma 6.7. Let E be a big-step paramodulant of C and D and let C 1 ďss C and D1 ďss D. Then C 1 ďss E or D1 ďss E or there is a big-step paramodulant E 1 of C 1 and D1 s.t. E 1 ďss E.

Proof Sketch. Follow the same strategy as for Lemma 6.6 above.

We can now prove our main lemma on subsumption in resolution deductions, which will be useful on several occasions.

Lemma 6.8. Let C1 , . . . , Cn be a resolution deduction and let Ck1 ďss Ck . Then there is a resolution deduction C11 , . . . , Cn1 s.t. for all i P t1, . . . , nuztku: if i ă k or i is an initial position, then Ci1 “ Ci ; otherwise Ci1 ďss Ci .

Note that C1 , . . . , Cn being a deduction from a clause set S does not entail that C11 , . . . , Cn1 is also a deduction from S.
If Ck P S and Ck1 R S, then C11 , . . . , Cn1 is a deduction from S Y tCk1 u or, if Ck occurs only once in C1 , . . . , Cn , even a deduction from pSztCk uq Y tCk1 u.

Proof. We consider the deduction C1 , . . . , Ck´1 , Ck1 and show by induction on n that there are Ck`1 1 , . . . , Cn1 s.t. C1 , . . . , Ck´1 , Ck1 , Ck`1 1 , . . . , Cn1 is a deduction. If n “ k we are done. For the induction step, assume the clauses exist for n. If Cn`1 is an initial clause, let Cn`1 1 “ Cn`1 and we are done. If Cn`1 is a big-step resolvent of Cj and Cl then by induction hypothesis Cj1 ďss Cj and Cl1 ďss Cl . So by Lemma 6.6 we have i) Cj1 ďss Cn`1 or ii) Cl1 ďss Cn`1 or iii) there is a big-step resolvent D of Cj1 and Cl1 s.t. D ďss Cn`1 . We let Cn`1 1 “ Cj1 in case i), Cn`1 1 “ Cl1 in case ii), and Cn`1 1 “ D in case iii), and hence have Cn`1 1 ďss Cn`1 . If Cn`1 is a big-step paramodulant, proceed analogously using Lemma 6.7.

We are now in a position to prove a stronger version of Proposition 6.1.

Theorem 6.1. Let S, T be clause sets s.t. S ďss T . If T has a resolution refutation of length n, then S has a resolution refutation of length at most n.

Proof. Let ρ be a resolution refutation of a finite T0 Ď T . Let S0 be a finite subset of S s.t. S0 ďss T0 . W.l.o.g. assume that every clause occurs at most once in ρ. Then apply Lemma 6.8 successively, for each D P T0 , to replace D in ρ by a C P S0 with C ďss D; each application turns a refutation of the current clause set into one of the same length in which D has been traded for C. Since the only clause subsuming the empty clause is the empty clause itself, after treating all of T0 the result is again a refutation, now of S0 Ď S, of length at most n.

The above theorem shows that a clause set should be reduced by subsumption before the search for a resolution refutation is started. The following theorem shows that we can also restrict the search for a refutation in such a way that we never derive a clause which is subsumed by a clause already derived. This is called forward subsumption and is one of the most important techniques for avoiding redundancy in first-order resolution theorem proving.

Theorem 6.2 (forward-subsumption).
If S has a resolution refutation of length n, then S has a resolution refutation C1 , . . . , Cm for some m ď n s.t. there are no i ă j with Ci ďss Cj .

Proof. Let C1 , . . . , Cm be a resolution refutation and let i ă j s.t. Ci ďss Cj . Apply Lemma 6.8 in order to replace Cj by Ci and thus obtain a refutation C1 , . . . , Cj´1 , Ci , Cj`1 1 , . . . , Cm 1 . After dropping the copy of Ci at position j we obtain a refutation C1 , . . . , Cj´1 , Cj`1 1 , . . . , Cm 1 . Repeating this step terminates, since each step decreases the length of the derivation, and it terminates with a refutation satisfying the condition of the theorem.

6.2 Tautology deletion

Definition 6.2. A clause C is called tautological if there is a literal L s.t. both L and its dual L̄ belong to C.

Tautological clauses may be derived from non-tautological clauses: e.g., resolving the clauses P pf pxqq, ¬Qpf pxqq, Rpxq and Qpyq, ¬P pyq with the mgu ryzf pxqs yields the tautology P pf pxqq, ¬P pf pxqq, Rpxq.

Lemma 6.9. If S has a resolution refutation ρ, then there is a resolution refutation ρ1 of S s.t. |ρ1 | ď |ρ| and every clause in ρ1 is an ancestor of the empty clause at the end of ρ1 .

Without Proof.

Theorem 6.3. If ρ is a resolution refutation of S, then S has a resolution refutation ρ1 which does not contain tautologies and satisfies |ρ1 | ď |ρ|.

Proof. By Lemma 6.9 we can assume that ρ only contains ancestors of the empty clause. If a tautological clause is an ancestor of the empty clause, both of its dual literals must eventually disappear. The negative literal cannot be removed by paramodulation, so it must be removed by resolution. Let L be the positive and L̄ the negative literal. Then the resolution step which removes L̄ resolves the tautology C Y tL, L̄u with a clause D Y tKu, yielding pC Y D Y tLuqµ, where µ is an mgu of K and L. Note that Kµ “ Lµ and hence that D Y tKu ďµss pC Y D Y tLuqµ, i.e., the resolvent is subsumed by the earlier clause D Y tKu. But by the forward-subsumption theorem there is a ρ1 with |ρ1 | ď |ρ| in which no clause is subsumed by an earlier derived clause, and hence without tautological ancestors of the empty clause.
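For ground clauses, where C subsumes D iff C Ď D, both redundancy criteria of this chapter are easy to implement. The following Python sketch (our own illustration, not part of the original notes) combines forward subsumption and tautology deletion in a naive ground resolution loop; literals are represented DIMACS-style as signed integers.

```python
def resolvents(c, d):
    """All ground resolvents of the clauses c and d."""
    for lit in c:
        if -lit in d:
            yield frozenset((c - {lit}) | (d - {-lit}))

def saturate(clauses):
    """Saturate under ground resolution; True iff the empty clause appears.
    Newly considered clauses are discarded if tautological or subsumed by a
    previously kept clause (forward subsumption)."""
    kept = []
    todo = [frozenset(c) for c in clauses]
    while todo:
        c = todo.pop()
        if any(-l in c for l in c):        # tautology deletion
            continue
        if any(k <= c for k in kept):      # forward subsumption: C ⊆ D
            continue
        if not c:
            return True                    # empty clause: refutation found
        for k in kept:
            todo.extend(resolvents(c, k))
        kept.append(c)
    return False

# {p}, {¬p, q}, {¬q} is unsatisfiable; {p, q}, {¬p, q} is satisfiable.
print(saturate([{1}, {-1, 2}, {-2}]))   # True
print(saturate([{1, 2}, {-1, 2}]))      # False
```

Since over a fixed finite set of literals there are only finitely many clauses and every kept clause is new, the loop always terminates.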
This theorem shows that deriving tautological clauses is useless; avoiding them reduces the size of the search space.

Chapter 7

Completeness

In this chapter it is important to distinguish between first-order logic with equality and first-order logic without equality. The former interprets the binary predicate symbol “ as actual equality in a structure; the latter treats “ just like any other predicate symbol. Formally, these are two different notions of structure and hence two different notions of truth, satisfiability, validity, etc. To illustrate this difference, consider the following example.

Example 7.1. Let L “ t0, 1, `, ´, ¨u be the language of rings. Let M “ pZ, Iq be the structure in first-order logic with equality defined by I being the standard interpretation of L. Then Ip“q is the actual equality relation on Z. If we work in first-order logic without equality, then “ is just another binary predicate symbol, whose interpretation we have to fix when defining a structure. For example, let M1 “ pZ, I 1 q where I 1 |L “ I and I 1 p“q is defined by px, yq P I 1 p“q ô x ” y pmod mq for some m ě 2. Then, letting ϕm “ @x px “ 0 _ x “ 1 _ . . . _ x “ m ´ 1q, we have M1 ( ϕm but M * ϕm .

We will first prove the completeness theorem for first-order logic without equality and then base the completeness of first-order logic with equality on that. A central tool in the proof of completeness is the notion of a ground refutation.

Definition 7.1. A clause D is called a ground instance of a clause C if D contains no variables and there is a substitution σ s.t. Cσ “ D. A resolution deduction is called a ground deduction if it consists of ground clauses only.

Note that, in particular, in a ground deduction every unifier is the identity substitution.

Definition 7.2. For a clause set S we define GpSq “ tD | D is a ground instance of a C P Su.

Extending the terminology, a ground refutation of GpSq is also called a ground refutation of S.

7.1 Completeness without equality

Lemma 7.1.
Let S be a clause set. Then S is satisfiable in first-order logic without equality iff GpSq is satisfiable in first-order logic without equality.

Proof. The left-to-right implication follows directly from the observation that every model of S is a model of GpSq. For the other direction, let I be a propositional interpretation of GpSq s.t. IpGpSqq “ 1. We define a first-order structure MI as follows: the domain of MI is the set of all ground terms of the language of S. The interpretation of a term t is defined as tMI “ t and the interpretation of the predicate symbols as P MI pt1 , . . . , tn q “ IpP pt1 , . . . , tn qq. Let C P S, then MI ( D for every ground instance D of C. But since the domain of MI only contains ground terms, satisfying every ground instance of C is equivalent to satisfying the universal closure of C. Therefore MI ( C for all C P S.

The key observation for basing the completeness proof on the notion of subsumption is that S ďss GpSq.

Theorem 7.1 (Completeness). Let S be a clause set. If S is unsatisfiable in first-order logic without equality, then S has a resolution refutation. In fact, a stronger statement is true: S even has a resolution refutation without reflexivity instances and paramodulation inferences.

Proof. Let S be unsatisfiable, then by Lemma 7.1 also GpSq is unsatisfiable. By the completeness of propositional resolution there is a propositional resolution refutation of GpSq, i.e., a first-order resolution refutation which consists only of initial clauses from GpSq and ground resolution steps. Since S ďss GpSq we can apply Theorem 6.1 in order to obtain a resolution refutation of S.

This proof provides, at least on the theoretical level, an alternative method for showing that a first-order clause set is unsatisfiable: generate ground instances from GpSq and refute them using propositional resolution. While this unification-free method is complete, it is much less efficient than first-order resolution with unification and does not play a role in practice.
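On small examples the unification-free method can nevertheless be carried out directly. The following Python sketch (our own illustration; the clause set and all names are assumptions for the example) instantiates a clause set over a fixed finite set of ground terms and then saturates the resulting ground clauses under ground resolution.

```python
from itertools import product

def term_vars(t):
    """Variables (plain strings) occurring in a term ('f', t1, ..., tn)."""
    if isinstance(t, str):
        return {t}
    out = set()
    for a in t[1:]:
        out |= term_vars(a)
    return out

def apply(t, s):
    """Apply a ground substitution (a dict on variable names) to a term."""
    if isinstance(t, str):
        return s[t]
    return (t[0],) + tuple(apply(a, s) for a in t[1:])

def ground_instances(clauses, terms):
    """All instances over the given ground terms; literals are (sign, pred, arg)."""
    insts = set()
    for c in clauses:
        vs = sorted({v for (_, _, arg) in c for v in term_vars(arg)})
        for vals in product(list(terms), repeat=len(vs)):
            s = dict(zip(vs, vals))
            insts.add(frozenset((sg, p, apply(arg, s)) for (sg, p, arg) in c))
    return insts

def refutable(ground):
    """Exhaustive ground resolution: True iff the empty clause is derivable."""
    derived = set(ground)
    while True:
        new = set()
        for c in derived:
            for d in derived:
                for (sg, p, t) in c:
                    if (-sg, p, t) in d:
                        new.add((c - {(sg, p, t)}) | (d - {(-sg, p, t)}))
        if frozenset() in new:
            return True
        if new <= derived:
            return False
        derived |= new

# S = { ¬P(x) ∨ P(f(x)),  P(a),  ¬P(f(f(a))) } is unsatisfiable; instances
# over the terms a, f(a), f(f(a)) already suffice for a ground refutation.
S = [[(-1, 'P', 'x'), (+1, 'P', ('f', 'x'))],
     [(+1, 'P', ('a',))],
     [(-1, 'P', ('f', ('f', ('a',))))]]
terms = {('a',), ('f', ('a',)), ('f', ('f', ('a',)))}
print(refutable(ground_instances(S, terms)))   # True
```

Which finite set of ground terms suffices is not known in advance; this is precisely the inefficiency that first-order resolution with unification avoids.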
7.2 Completeness with equality

In order to prove the completeness theorem for first-order logic with equality we will make use of an explicit axiomatisation of the theory of equality EQL (see Chapter 3) as a clause set. We write s ‰ t for the negation of the atom s “ t.

Definition 7.3. Let L be a first-order language. We define the clause set

EQL “ ttx “ xu, tx ‰ y, y “ xu, tx ‰ y, y ‰ z, x “ zuu
Y ttx1 ‰ y1 , . . . , xn ‰ yn , f px1 , . . . , xn q “ f py1 , . . . , yn qu | f {n P Lu
Y ttx1 ‰ y1 , . . . , xn ‰ yn , ¬P px1 , . . . , xn q, P py1 , . . . , yn qu | P {n P Lu.

Lemma 7.2. Every clause in EQL has a deduction from tautologies and reflexivity using paramodulation.

Proof. For reflexivity there is nothing to do. For symmetry, paramodulating with the equation x “ y of the tautology x ‰ y, x “ y into the reflexivity clause x “ x yields x ‰ y, y “ x. For transitivity, paramodulating with the equation y “ z of the tautology y ‰ z, y “ z into the literal x “ y of the tautology x ‰ y, x “ y yields x ‰ y, y ‰ z, x “ z. For f -congruence, paramodulating with x1 “ y1 of the tautology x1 ‰ y1 , x1 “ y1 into the reflexivity instance f px1 , . . . , xn q “ f px1 , . . . , xn q yields x1 ‰ y1 , f px1 , . . . , xn q “ f py1 , x2 , . . . , xn q; continuing in the same way with x2 “ y2 , . . . , xn “ yn yields x1 ‰ y1 , . . . , xn ‰ yn , f px1 , . . . , xn q “ f py1 , . . . , yn q. For P -congruence, paramodulating with x1 “ y1 of the tautology x1 ‰ y1 , x1 “ y1 into the tautology ¬P px1 , . . . , xn q, P px1 , . . . , xn q yields x1 ‰ y1 , ¬P px1 , . . . , xn q, P py1 , x2 , . . . , xn q, and continuing in the same way yields x1 ‰ y1 , . . . , xn ‰ yn , ¬P px1 , . . . , xn q, P py1 , . . . , yn q.

Lemma 7.3. Let S be a clause set. S is satisfiable in first-order logic with equality iff S Y EQL is satisfiable in first-order logic without equality.

Proof. Let M “ pD, Iq be an L-structure with equality and define the L Y t“u-structure without equality M1 “ pD, I 1 q where I 1 |L “ I and I 1 p“q is equality in D. Since M interprets “ as equality in D, we have M ( ϕ iff M1 ( ϕ. In addition, equality in D is a congruence relation w.r.t. L and therefore M1 ( EQL . For the other direction, let M “ pD, Iq be an L-structure without equality which satisfies EQL , then Ip“q is a congruence relation on D w.r.t. L. Define M1 “ pD{Ip“q, I 1 q where I 1 is the interpretation induced by I on D{Ip“q.
Note that I 1 is well-defined because Ip“q is a congruence relation. Then we have M ( ϕ iff M1 ( ϕ.

Theorem 7.2 (Completeness). Let S be a clause set. If S is unsatisfiable in first-order logic with equality, then S has a resolution refutation.

Proof. Let S be unsatisfiable in first-order logic with equality. Then S Y EQL is unsatisfiable in first-order logic without equality by Lemma 7.3. By the completeness theorem for first-order logic without equality we obtain a resolution refutation of S Y EQL . By Lemma 7.2 we obtain a resolution refutation of S Y T where T is a set of tautological clauses. By Theorem 6.3 we obtain a resolution refutation of S.

Chapter 8

Further Topics

8.1 Induction

Resolution theorem provers, being sound and complete for first-order logic, do not handle induction. However, it is possible to prove theorems by induction by supplying the necessary induction axioms, thus reducing the problem to checking the validity of a first-order formula. For instance, when we want to prove the associativity of + from the definition of + we need to do an induction on the rightmost variable of the associativity axiom. This information can be passed to prover9 as follows:

formulas(sos).
all x x + 0 = x.
all x all y x + s(y) = s(x + y).
all z ( P(z) <-> all x all y ( x + y ) + z = x + ( y + z )).
( P(0) & all z ( P(z) -> P(s(z))) ) -> all z P(z).
end_of_list.

formulas(goals).
all x all y all z ( x + y ) + z = x + ( y + z ).
end_of_list.

The automated generation of induction invariants is a difficult problem which is of considerable importance for applications of automated deduction in areas such as software verification.