Automated Deduction
Stefan Hetzl
[email protected]
Vienna University of Technology
Summer Term 2016
Contents

1 Introduction
2 Resolution in propositional logic
  2.1 Reminders on propositional logic
  2.2 Normal forms
  2.3 Resolution
  2.4 The Tseitin transformation
3 SAT- and SMT-solving
  3.1 DPLL
  3.2 Reminders on first-order logic
  3.3 DPLL(T)
  3.4 Congruence closure
4 Normal forms in first-order logic
  4.1 Variable normal form
  4.2 Negation normal form
  4.3 Skolemisation
  4.4 Clause normal form
5 Resolution in first-order logic
  5.1 Unification
  5.2 Resolution
6 Redundancy
  6.1 Subsumption
  6.2 Tautology deletion
7 Completeness
  7.1 Completeness without equality
  7.2 Completeness with equality
8 Further Topics
  8.1 Induction
Chapter 1
Introduction
It is worthwhile to start these course notes by giving a brief outline of the history of concepts
which are fundamental to the area of automated deduction. Even though this outline is quite
superficial it nevertheless serves to illustrate how old many of the ideas and concepts at the
base of automated deduction are.
The notion of mathematical proof goes back to ancient Greece, more specifically to the Elements
of Euclid (ca. 300 BC). This work, the first deductive treatment of mathematics, is considered
the inception of the axiomatic method. It has had tremendous influence on all of mathematics
and, as the root of the notion of mathematical proof, also on logic and automated deduction.
An important milestone in the conceptual development of automated deduction was the work of
Leibniz (1646–1716), who among other subjects made seminal contributions to mathematics
and philosophy. He entertained the idea that a dispute between two persons could be
settled by writing up the statement under discussion in a universal language (characteristica
universalis) and then using a calculus of reasoning (calculus ratiocinator) for deciding the truth
of assertions expressed in the universal language. Instead of arguing about a statement, two
persons in disagreement could then calculate who is right using this framework. This idea is
embodied in the slogan calculemus!, let us calculate!
Another aspect of automated deduction, the automation, was also present in Leibniz' work:
he designed a computation machine for addition, multiplication, subtraction and division.
While Leibniz never fully worked out proposals of such a universal language and a calculus
of reasoning for it, he foresaw, at least on a conceptual level, the three components which are
essential to automating deduction: a language for expressing statements, rules of computation
applicable to expressions of the language, and the automatisation of these rules.
The first mathematisation of logical reasoning can be attributed to Boole (1815–1864). He
proposed to express logical propositions by algebraic equations and thus to treat logic by
computation rules much like those valid for the numbers. This provides an infallible method for
logical reasoning. In honour of this contribution, propositional logic is often also called Boolean
logic.
At the end of the 19th century and well into the 1920s and 1930s, logic underwent an
enormous development, characterised primarily by its mathematisation and the solution of
many of its fundamental problems. This development cannot possibly be recounted here except
for the following aspects, which are of central importance to automated deduction:
In 1928 Hilbert posed his famous "Entscheidungsproblem" (which is often, even in the
English-language literature, still called by its German name). In modern terminology: is there an
algorithm which, when given a formula in first-order logic, determines whether it is valid? Turing
and Church independently solved this problem negatively in 1936. Without going into the details
of the Church-Turing thesis here, the ramification of these results for automated deduction is
that a full automation of validity-checking is impossible. Leibniz' idea, when we consider
first-order logic to be the universal language he envisaged, is hence not realisable in a fully
automated way. However, on the positive side, Gödel's 1929 completeness theorem entails that
the set of valid first-order formulas is semi-decidable, i.e. there is an algorithm which takes a
first-order formula ϕ as input. If ϕ is valid the algorithm will eventually terminate with the
information that ϕ is valid. If, on the other hand, ϕ is not valid, the algorithm may either
terminate with that information or not terminate at all.
The situation for validity in first-order logic is much better suited to automatisation than
that of truth in arithmetic. As Gödel's first incompleteness theorem (1931) shows, truth in
arithmetic is not even semi-decidable (in fact, truth in arithmetic is much more complicated
than semi-decidable problems).
Therefore validity in first-order logic forms a good basis for automated deduction: while the
formalism is very expressive it is still semi-decidable. And indeed, most of this course will be
devoted to algorithms that prove the validity of first-order formulas.
While such mathematical results on decidability rightly form cornerstones of computational
logic one should also not over-emphasise their relevance for practical applications. After all,
the guarantee that a computer program will terminate eventually is of little practical use if the
computation time is beyond what a user is willing or able to wait for.
So, for practical applications, we are interested not only in the existence of a semi-decision
procedure but also in that procedure being efficient. Seminal work in that respect was done
by Robinson in 1965 with his invention of the resolution principle. Until then, most provers
generated ground instances of quantified formulas and then applied propositional reasoning steps
to these ground instances. Robinson's resolution principle was the first to combine instantiation
and propositional reasoning in a single inference rule using unification: the first-order resolution
rule. Since then a vast number of techniques has been developed on that basis. This course will
be primarily about these techniques.
Today automated deduction has a wealth of applications throughout computer science in fields
such as hardware and software verification, artificial intelligence, logic programming, deductive
information systems, formal mathematics, . . .
Chapter 2
Resolution in propositional logic
2.1 Reminders on propositional logic
This course supposes familiarity with basic notions in propositional logic. This section only
serves to remind the reader about these notions and to fix notation. A thorough introduction
to propositional logic, which is well-suited as a basis for this course, can be found in Dirk van
Dalen: Logic and Structure, 4th edition, Springer, sections 1.1–1.3.
In propositional logic, formulas are built up inductively from a countably infinite set of atoms
p1, p2, p3, ..., the logical connectives ∧, ∨, ¬, →, and the logical constants ⊥, ⊤. Often we will
also use letters like p, q, r, ... for atoms. We will also occasionally take some liberty as to whether
∧ and ∨ are considered as binary or as n-ary connectives. We can (and often will) think of
formulas as trees; for example the formula (p ∨ q) is written as the tree:

        ∨
       / \
      p   q

Then words such as "above", "below", "immediately above", "immediately below" become
meaningful on formulas. The size of a formula ϕ is defined by induction on the structure of ϕ as:
|p| = |⊥| = |⊤| = 1, |ψ ∘ χ| = |ψ| + |χ| + 1 for ∘ ∈ {∨, ∧, →}, and |¬ψ| = |ψ| + 1.
An interpretation is a mapping I : {p1, p2, ...} → {0, 1} where 1 represents "true" and 0
represents "false". The interpretation of a formula is defined by fixing I(⊥) = 0, I(⊤) = 1 and
then proceeding by induction on the structure of the formula via the following truth tables.

  p  q | p ∧ q  p ∨ q  p → q        p | ¬p
  0  0 |   0      0      1          0 |  1
  0  1 |   0      1      1          1 |  0
  1  0 |   0      1      0
  1  1 |   1      1      1

A formula ϕ is called satisfiable if there is an I s.t. I(ϕ) = 1. It is called valid (or tautological)
if I(ϕ) = 1 for all I. Two formulas ϕ, ψ are called logically equivalent, written as ϕ ⇔ ψ (note
the difference between the connective ↔ and the relation ⇔), if I(ϕ) = I(ψ) for all I. If a
formula ϕ only contains the atoms p1, ..., pn, then I′(ϕ) = I″(ϕ) for all I′, I″ which agree on
p1, ..., pn. Therefore the validity as well as the satisfiability of a formula can be decided in
exponential time by using a truth table.
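The exponential truth-table method can be written out directly. The following Python sketch (the encoding of formulas as nested tuples and all function names are ours, not from the notes) enumerates all interpretations of the atoms occurring in a formula:

```python
from itertools import product

# Formulas as nested tuples: ("atom", "p"), ("not", f), ("and", f, g),
# ("or", f, g), ("imp", f, g), plus the constants "top" and "bot".

def atoms(f):
    """The set of atom names occurring in f."""
    if f == "top" or f == "bot":
        return set()
    if f[0] == "atom":
        return {f[1]}
    return set().union(*(atoms(g) for g in f[1:]))

def value(f, I):
    """I(f) for an interpretation I given as a dict from atom names to {0, 1}."""
    if f == "top":
        return 1
    if f == "bot":
        return 0
    op = f[0]
    if op == "atom":
        return I[f[1]]
    if op == "not":
        return 1 - value(f[1], I)
    a, b = value(f[1], I), value(f[2], I)
    if op == "and":
        return a & b
    if op == "or":
        return a | b
    if op == "imp":
        return max(1 - a, b)

def satisfiable(f):
    ps = sorted(atoms(f))
    return any(value(f, dict(zip(ps, bits))) == 1
               for bits in product([0, 1], repeat=len(ps)))

def valid(f):
    ps = sorted(atoms(f))
    return all(value(f, dict(zip(ps, bits))) == 1
               for bits in product([0, 1], repeat=len(ps)))
```

Both functions loop over all 2^n interpretations, which is the exponential behaviour discussed above.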
2.2 Normal forms
Normal forms of logical formulas play a very important role in automated deduction. This is
due to the following reason: the central computational problem of automated deduction is that
the search space (of an algorithm that searches for a proof) is very large. Avoiding different
syntactic representations of one and the same semantical meaning helps to keep down the size
of the search space. We will see this principle at work when we consider redundancy-elimination
techniques like subsumption later.
Definition 2.1. A formula is said to be in negation normal form (NNF) if it does not contain
→ and ¬ appears only immediately above atoms.

Proposition 2.1. For every formula ϕ in {p1, ..., pn} there is a formula ϕN in {p1, ..., pn}
which is in NNF, is logically equivalent to ϕ, and satisfies |ϕN| = O(|ϕ|).
Proof. The following formula rewriting rules preserve logical equivalence.

  (I)  ψ → χ ↦ ¬ψ ∨ χ          (DN) ¬¬ψ ↦ ψ
  (M1) ¬(ψ ∧ χ) ↦ ¬ψ ∨ ¬χ      (M2) ¬(ψ ∨ χ) ↦ ¬ψ ∧ ¬χ

Logical equivalence is a congruence relation, hence application of the above rules anywhere in
a formula transforms it into a logically equivalent formula.
Let ψ be a formula. Write n→(ψ) for the number of implications in ψ and n∘(ψ) for the number
of pairs (c, d) where c is a negation in ψ, d is a binary connective in ψ and c is above d. Note
that each of the above rewriting rules decreases the lexicographic order on (n→(·), n∘(·), |·|).
Therefore, every rewriting sequence eventually terminates with a normal form. A normal form
of these rewriting rules is in NNF. Moreover, note that none of these rules changes the number
of binary connectives. The size |ϕN| of a formula in NNF is at most its number of
binary connectives plus 2 times its number of atoms. Hence |ϕN| = O(|ϕ|) for any normal form
ϕN obtained from a formula ϕ.
Definition 2.2. Atoms and negated atoms are called literals. A formula is in conjunctive
normal form (CNF) if it is a conjunction of disjunctions of literals.

CNFs are often conveniently notated as ⋀_{i=1..n} ⋁_{j=1..k_i} L_{i,j} where the L_{i,j} are literals.
Proposition 2.2. For every formula ϕ in {p1, ..., pn} there is a formula ϕC in {p1, ..., pn}
which is in CNF and is logically equivalent to ϕ.

Proof. By Proposition 2.1 the formula ϕ has an NNF ϕN. Let ϕC be a formula obtained from
ϕN by exhaustive application of the following formula rewriting rules:

  (D1) ψ ∨ (χ1 ∧ χ2) ↦ (ψ ∨ χ1) ∧ (ψ ∨ χ2)
  (D2) (χ1 ∧ χ2) ∨ ψ ↦ (χ1 ∨ ψ) ∧ (χ2 ∨ ψ)

We show that every rewriting sequence of (D1)- and (D2)-steps eventually terminates. To that
aim, proceed by induction on the size of ϕ. These rules terminate on atoms since they are not
applicable to atoms. For the induction step, the statement follows trivially if neither (D1) nor
(D2) is applicable to the root of ϕ. If (D1) is applicable at the root of ϕ, then ϕ is of the form
ψ ∨ (χ1 ∧ χ2). In that case, by induction hypothesis every sequence of reductions in ψ, χ1, χ2
as well as in ψ ∨ χ1 and ψ ∨ χ2 terminates, hence a reduction sequence of ϕ also terminates.
Finally, ϕC contains negation only immediately above atoms and does not contain a conjunction
below a disjunction, hence it is in CNF.
Example 2.1. Consider the formula

  ϕn = (p1 ∧ q1) ∨ (p2 ∧ q2) ∨ ··· ∨ (pn ∧ qn).

The above transformation into CNF will yield the formula

  ϕnC = ⋀_{(v1,...,vn) ∈ {p1,q1}×···×{pn,qn}} (v1 ∨ ··· ∨ vn)

which is of size exponential in that of ϕn.
Both of the above transformations, to NNF and to CNF respectively, are simple to define and
quite elegant theoretically. But while the above transformation to NNF is harmless
complexity-wise, the above transformation to CNF is not. We will see later that there is also a linear
transformation to CNF which introduces additional atoms and preserves only satisfiability,
not logical equivalence. In practice, mostly such CNF-transformations are used (since
preserving satisfiability is enough).
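The two rule-based transformations can be sketched on the tuple encoding of formulas (our own encoding, not from the notes); the exponential blowup of Example 2.1 shows up as the number of conjuncts produced by `cnf`:

```python
# Formulas as nested tuples: ("atom", name), ("not", f), ("and", f, g),
# ("or", f, g), ("imp", f, g).

def nnf(f):
    """Eliminate -> and push negations down to atoms (rules I, DN, M1, M2)."""
    op = f[0]
    if op == "atom":
        return f
    if op == "imp":                    # (I)  f -> g  ~>  ~f v g
        return ("or", nnf(("not", f[1])), nnf(f[2]))
    if op in ("and", "or"):
        return (op, nnf(f[1]), nnf(f[2]))
    g = f[1]                           # op == "not"
    if g[0] == "atom":
        return f
    if g[0] == "not":                  # (DN) ~~g ~> g
        return nnf(g[1])
    if g[0] == "and":                  # (M1) ~(a ^ b) ~> ~a v ~b
        return ("or", nnf(("not", g[1])), nnf(("not", g[2])))
    if g[0] == "or":                   # (M2) ~(a v b) ~> ~a ^ ~b
        return ("and", nnf(("not", g[1])), nnf(("not", g[2])))
    if g[0] == "imp":                  # ~(a -> b) ~> a ^ ~b  (I, then M2, DN)
        return ("and", nnf(g[1]), nnf(("not", g[2])))

def cnf(f):
    """Distribute v over ^ (rules D1, D2); expects an NNF input."""
    op = f[0]
    if op == "and":
        return ("and", cnf(f[1]), cnf(f[2]))
    if op == "or":
        a, b = cnf(f[1]), cnf(f[2])
        if a[0] == "and":              # (D2)
            return ("and", cnf(("or", a[1], b)), cnf(("or", a[2], b)))
        if b[0] == "and":              # (D1)
            return ("and", cnf(("or", a, b[1])), cnf(("or", a, b[2])))
        return ("or", a, b)
    return f
```

Applying `cnf(nnf(·))` to ϕn of Example 2.1 yields 2^n conjuncts, matching the exponential lower bound observed there.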
2.3 Resolution

The resolution calculus works on formulas in conjunctive normal form. Formulas in CNF will
be notated by clause sets, defined below. Resolution is a refutational calculus, i.e. we start
from a given clause set and try to show that it is unsatisfiable by showing that it implies a
contradiction. This can be used for proving a formula ϕ valid by proving ¬ϕ unsatisfiable and
observing that ϕ is valid iff ¬ϕ is unsatisfiable.
Definition 2.3. A clause is a finite set of literals. A clause set is a set of clauses, i.e. a set of
sets of literals.

The semantical meaning of a clause is that of a disjunction, i.e. the clause C = {L1, ..., Lk} is
interpreted as the disjunction L1 ∨ ··· ∨ Lk and consequently we define I(C) = I(L1 ∨ ··· ∨ Lk),
i.e. I(C) = 1 iff there is an Li ∈ C s.t. I(Li) = 1. The semantical meaning of a clause set is that
of the conjunction of its clauses. While clauses will always be finite sets of literals, clause sets will
sometimes be infinite. Consequently the interpretation of a clause set S cannot be defined via
a (finite) formula but is instead defined directly as I(S) = 1 iff I(C) = 1 for all C ∈ S. Note
that the interpretation of the empty clause is 0. We will never consider the empty clause set.
These definitions allow us to speak about satisfiability, validity, logical equivalence, etc. of clause
sets just as for formulas.
Note that in a clause, multiple occurrences of literals are identified, the order of literals does
not matter and the parentheses around disjunctions do not matter. The same is true for clauses
in a clause set. Therefore, there is not a 1-1 relation between clause sets and formulas in CNF,
but all formulas in CNF which correspond to a given clause set are logically equivalent, since
both conjunction and disjunction are idempotent, commutative and associative.
Example 2.2. The clause set corresponding to

  ϕ = ((p ∨ q) ∨ p) ∧ (q ∨ p) ∧ ¬p

is

  S = {{p, q}; {¬p}}.

As in the above example we usually write a semicolon instead of a comma for separating the
clauses of a clause set.
Definition 2.4. Let C and D be clauses s.t. p ∈ C and ¬p ∈ D. Then the clause

  res_p(C, D) := (C ∖ {p}) ∪ (D ∖ {¬p})

is called the p-resolvent of C and D.

Definition 2.5. Let S be a clause set. A list C1, ..., Cn of clauses is called a resolution deduction
from S if for all i ∈ {1, ..., n}:

(I) Ci ∈ S, or
(R) there are j, k < i and an atom p s.t. Ci = res_p(Cj, Ck).

A resolution deduction C1, ..., Cn from S is called a resolution refutation of S if Cn = ∅.
Example 2.3. Let S = {{p1}; {¬p1, p2}; {¬p1, ¬p2, p3}; {¬p3}}. The following list of clauses is a
resolution refutation of S.

  C1 = {¬p1, p2}         (I)
  C2 = {¬p1, ¬p2, p3}    (I)
  C3 = {¬p1, p3}         (R(C1, C2))
  C4 = {p1}              (I)
  C5 = {p3}              (R(C4, C3))
  C6 = {¬p3}             (I)
  C7 = ∅                 (R(C5, C6))

Sometimes we will also write resolution deductions and refutations in tree form:

  ¬p1, p2   ¬p1, ¬p2, p3
  ----------------------
       p1   ¬p1, p3
       ------------
            p3   ¬p3
            --------
               ∅
Theorem 2.1 (Soundness). If S has a resolution refutation, then S is unsatisfiable.

Proof. We will show the following, slightly more general, statement: if C1, ..., Cn is a resolution
deduction from S and I an interpretation with I(S) = 1, then I(Cn) = 1. We proceed by
induction on the deduction, making a case distinction on the inference rule used for deriving
Cn. If Cn ∈ S the statement follows trivially. If Cn = res_p(Ci, Cj) for some i, j < n, then by
induction hypothesis we have I(Ci) = I(Cj) = 1. If I(p) = 1 and Cj = {¬p, L1, ..., Lk}, then
I({L1, ..., Lk}) = 1 and hence I(Cn) = 1. If I(p) = 0 then I(Cn) = 1 follows symmetrically from
I(Ci) = 1. Since the interpretation of the empty clause is 0, no interpretation satisfying S can
exist if S has a resolution refutation.
In order to prove the completeness of the resolution calculus we will use semantic trees.

Definition 2.6. Let (pi)_{i≥1} be a sequence of atoms. The semantic tree of (pi)_{i≥1} is the
complete binary tree whose edges at depth i are labelled pi and ¬pi:

                 •
           p1 /     \ ¬p1
             •       •
       p2 /  \ ¬p2  p2 /  \ ¬p2
          •    •      •    •
          ⋮    ⋮      ⋮    ⋮

in which every branch is infinite. Every vertex v of this tree induces a partial interpretation Iv
of {p1, p2, ...} where Iv(pi) = 1 if pi occurs on the path from v to the root and Iv(pi) = 0 if ¬pi
occurs on this path.

Let S be a clause set in the atoms {pi | i ≥ 1}. Then the semantic tree of S, written as T(S), is
defined from the above tree by closing a branch after finitely many steps at a vertex v iff there
is a C ∈ S s.t. Iv(C) = 0. A vertex closed by a clause C is written as × with the clause C
beneath it:

  ×
  C

Note that we do not require the clause set S to be finite for the definition of T(S). Also note
that the requirement of closing at v iff there is a C ∈ S s.t. Iv(C) = 0 entails that branches are
closed as early as possible.
Example 2.4. Let S = {{p1}; {¬p1, p2}; {¬p1, ¬p2, p3}; {¬p3}} be the clause set of Example 2.3.
The semantic tree T(S) is:

                    •
              p1 /     \ ¬p1
                •       ×  {p1}
          p2 /     \ ¬p2
            •       ×  {¬p1, p2}
      p3 /     \ ¬p3
        ×       ×
     {¬p3}   {¬p1, ¬p2, p3}

Fact 2.1. ∅ ∈ S iff T(S) consists of a single node.

Fact 2.2. S is satisfiable iff T(S) has an infinite branch.

Proof. Let I be an interpretation with I(S) = 1, then I induces an infinite branch; and vice
versa: an infinite branch is never closed, hence the interpretation it induces satisfies all C ∈ S
and hence S itself.
Fact 2.3. S is unsatisfiable iff T(S) is finite.

Proof. By König's Lemma a finitely branching tree is infinite iff it has an infinite branch, hence
the claim follows directly from Fact 2.2.

Fact 2.4. If S ⊆ S′ then T(S) ⊇ T(S′) (where ⊇ is to be understood as applied to the set of
vertices), because every vertex which can be closed in T(S) can also be closed in T(S′).
We are now ready to prove the completeness of propositional resolution.
Theorem 2.2 (Completeness). Let S be a clause set. If S is unsatisfiable, then S has a
resolution refutation.

Proof. Let S be unsatisfiable; then by Fact 2.3 the tree T(S) has only finitely many
vertices. We proceed by induction on |T(S)|, the number of vertices in T(S).
If |T(S)| = 1, then by Fact 2.1 we have ∅ ∈ S and the list consisting only of ∅ is
already a resolution refutation of S.
If |T(S)| > 1, then T(S) contains a configuration of the form

        • v
    p /     \ ¬p
     ×       ×
     C1      C2

because T(S) is finite: suppose each vertex had a child which is not closed, then every vertex
would be the start of an infinite branch, which contradicts T(S) being finite.
Now let C = res_p(C1, C2) and S′ = S ∪ {C}. Then T(S′) ⊆ T(S) already by Fact 2.4. We
claim that we even have T(S′) ⊂ T(S). To show this, let C1 = C1′ ⊎ {p}, C2 = C2′ ⊎ {¬p}, and
consider the path π from v to the root. Since C1 closes a successor of v, π must contain duals
of all literals in C1′ and, analogously, duals of all literals in C2′. Hence π contains duals of
all literals in C and thus v can be closed by C, and we have |T(S′)| < |T(S)|. By induction
hypothesis there is a resolution refutation of S′ which, w.l.o.g., is of the form C, D1, ..., Dn with
C ≠ Di for all i ∈ {1, ..., n}. Hence C1, C2, C, D1, ..., Dn is a resolution refutation of S.

Note that this proof is constructive in the sense that it computes a resolution refutation from
a finite semantic tree.
Definition 2.7. Let S be a clause set. The smallest superset of S which is closed under
resolution is called the closure of S and is denoted by Ŝ.

Corollary 2.1. A clause set S is unsatisfiable iff ∅ ∈ Ŝ.

Proof. By soundness and completeness, S is unsatisfiable iff S has a resolution refutation,
which in turn is equivalent to ∅ ∈ Ŝ.

This corollary suggests the following algorithm for deciding the satisfiability of a propositional
formula ϕ: first compute a clause set S which is satisfiability-equivalent to ϕ, then successively
compute Ŝ. If the empty clause is found in Ŝ, ϕ is unsatisfiable. If the computation of Ŝ finishes
without finding the empty clause, ϕ is satisfiable. Note that S is finite and hence Ŝ is finite, so
the computation of Ŝ terminates. This algorithm already has an asymptotic complexity which
is better than computing a truth table: while the truth table computation is exponential even
in the best case, this is not so for the above algorithm. It allows us to exploit the existence of
short resolution refutations for determining the unsatisfiability of a formula faster than a truth
table would allow.
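The saturation algorithm suggested by Corollary 2.1 can be sketched as follows. Clauses are represented as frozensets of non-zero integers, with a negative integer standing for a negated atom (this encoding and the function names are ours, not from the notes):

```python
def resolvents(C, D):
    """All p-resolvents of the clauses C and D (literals as non-zero ints,
    -p standing for the negation of the atom p)."""
    return {(C - {l}) | (D - {-l}) for l in C if -l in D}

def saturate(S):
    """Compute the resolution closure of the finite clause set S."""
    closure = {frozenset(C) for C in S}
    while True:
        new = set()
        for C in closure:
            for D in closure:
                for R in resolvents(C, D):
                    if R not in closure:
                        new.add(R)
        if not new:            # no new resolvents: closure reached
            return closure
        closure |= new

def unsatisfiable(S):
    """By Corollary 2.1: S is unsatisfiable iff the empty clause is in its closure."""
    return frozenset() in saturate(S)
```

Since only finitely many clauses can be built from the atoms of a finite S, the loop terminates, mirroring the termination argument given above.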
2.4 The Tseitin transformation
In this section, we will see how the exponential blowup of the distributivity-based computation
of a CNF can be avoided. The crucial idea is to introduce new propositional atoms which will
serve as abbreviations of complex formulas.
Definition 2.8. Let ϕ be a formula; the set of subformulas of ϕ is defined inductively as:

  subf(p) = {p}   (and analogously for ⊤ and ⊥)
  subf(¬ψ) = {¬ψ} ∪ subf(ψ)
  subf(ψ1 ∘ ψ2) = {ψ1 ∘ ψ2} ∪ subf(ψ1) ∪ subf(ψ2)   for ∘ ∈ {∧, ∨, →}
In what follows we will write CNF(ϕ) for a conjunctive normal form of a formula ϕ obtained by
the distributivity-based transformation of Proposition 2.2. For writing formulas which express
logical equivalences we introduce the abbreviation ϕ ↔ ψ for (ϕ → ψ) ∧ (ψ → ϕ).

Definition 2.9. Let ϕ be a formula in {p1, ..., pn}. For every subformula ψ of ϕ define the
following formula D(ψ) in the language {p1, ..., pn} ∪ {q_ψ | ψ ∈ subf(ϕ)}:

  D(pi) = CNF(q_{pi} ↔ pi)
  D(¬ψ0) = CNF(q_{¬ψ0} ↔ ¬q_{ψ0})
  D(ψ1 ∘ ψ2) = CNF(q_{ψ1∘ψ2} ↔ q_{ψ1} ∘ q_{ψ2})   for ∘ ∈ {∧, ∨, →}

Furthermore, define T(ϕ) = q_ϕ ∧ ⋀_{ψ ∈ subf(ϕ)} D(ψ).

This transformation to conjunctive normal form is known as the Tseitin transformation, named
after G. Tseitin; hence also the notation T(ϕ).
Example 2.5. Let us compute T(p1 ∧ ¬p2):

  D(p1) = CNF(q_{p1} ↔ p1) = CNF((q_{p1} → p1) ∧ (p1 → q_{p1})) = (¬q_{p1} ∨ p1) ∧ (¬p1 ∨ q_{p1})

and analogously

  D(p2) = (¬q_{p2} ∨ p2) ∧ (¬p2 ∨ q_{p2}).

Furthermore

  D(¬p2) = CNF(q_{¬p2} ↔ ¬q_{p2}) = (¬q_{¬p2} ∨ ¬q_{p2}) ∧ (q_{p2} ∨ q_{¬p2})

and

  D(p1 ∧ ¬p2) = CNF(q_{p1∧¬p2} ↔ q_{p1} ∧ q_{¬p2})
              = (¬q_{p1∧¬p2} ∨ q_{p1}) ∧ (¬q_{p1∧¬p2} ∨ q_{¬p2}) ∧ (¬q_{p1} ∨ ¬q_{¬p2} ∨ q_{p1∧¬p2})

and finally

  T(p1 ∧ ¬p2) = q_{p1∧¬p2} ∧ D(p1 ∧ ¬p2) ∧ D(¬p2) ∧ D(p2) ∧ D(p1).
Proposition 2.3. Let ϕ be a propositional formula; then T(ϕ) is in CNF, satisfiability-equivalent
to ϕ, and satisfies |T(ϕ)| = O(|ϕ|).

Notation: in the proof below we will also use ∧, ... for the Boolean functions and not just for
the connectives.

Proof. Since T(ϕ) is a conjunction of CNFs it is clearly in CNF.
For satisfiability-equivalence, let I be an interpretation of the atoms {p1, ..., pn} of ϕ with
I(ϕ) = 1. Define the interpretation I* of {p1, ..., pn} ∪ {q_ψ | ψ ∈ subf(ϕ)} by I*(pi) = I(pi)
and I*(q_ψ) = I(ψ). Then clearly I*(q_ϕ) = I(ϕ) = 1 and it remains to show that I*(D(ψ)) = 1
for all subformulas ψ of ϕ. We do this by a case distinction on the top connective of ψ: if
ψ = ψ1 ∧ ψ2, then

  I*(q_{ψ1∧ψ2}) = I(ψ1 ∧ ψ2) = I(ψ1) ∧ I(ψ2) = I*(q_{ψ1}) ∧ I*(q_{ψ2}) = I*(q_{ψ1} ∧ q_{ψ2})

and therefore I*(q_{ψ1∧ψ2} ↔ q_{ψ1} ∧ q_{ψ2}) = 1; but as CNF preserves logical equivalence we also have
I*(CNF(q_{ψ1∧ψ2} ↔ q_{ψ1} ∧ q_{ψ2})) = 1. The other connectives are analogous.
For the other direction of satisfiability-equivalence, let I* be an interpretation of {p1, ..., pn} ∪
{q_ψ | ψ ∈ subf(ϕ)} s.t. I*(T(ϕ)) = 1. Define I = I*↾{p1,...,pn}, the restriction of I* to
{p1, ..., pn}. Since I*(T(ϕ)) = 1 we have I*(D(ψ)) = 1 for all subformulas ψ of ϕ. We show
that I*(q_ψ) = I(ψ) for all subformulas ψ of ϕ by induction on ψ: for an atom pi we have
I*(q_{pi}) = I(pi) as I*(D(pi)) = 1. For a conjunction we have

  I*(q_{ψ1∧ψ2}) = I*(q_{ψ1} ∧ q_{ψ2})

since I*(D(ψ1 ∧ ψ2)) = 1, and furthermore

  I*(q_{ψ1} ∧ q_{ψ2}) = I*(q_{ψ1}) ∧ I*(q_{ψ2}) =(IH) I(ψ1) ∧ I(ψ2) = I(ψ1 ∧ ψ2).

The other connectives are analogous. Since I*(T(ϕ)) = 1 we have I*(q_ϕ) = 1 and hence
I(ϕ) = 1.
For the complexity result, note that |T(ϕ)| ≤ c + d·|subf(ϕ)| ≤ c + d·|ϕ| for some constants
c, d ∈ N.
Remark 2.1. Note that the mere existence of a small satisfiability-equivalent formula is trivial:
if ϕ is satisfiable, then ⊤ is satisfiability-equivalent to ϕ; if it is not, then ⊥ is
satisfiability-equivalent to ϕ. The point of the above result is that ϕ ↦ T(ϕ) is much easier to compute than
the mapping which sends ϕ to ⊤ if ϕ is satisfiable and to ⊥ if ϕ is unsatisfiable.
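Under the same tuple encoding of formulas used earlier (our own, not from the notes), the Tseitin transformation can be sketched as follows. For brevity, the definitional equivalences D(ψ) are emitted directly as clauses rather than via CNF(·), and q_p is identified with p itself:

```python
import itertools

def tseitin(f):
    """Clause set (frozensets of non-zero ints) satisfiability-equivalent to f.
    Formulas are tuples ("atom", name), ("not", g), ("and"/"or"/"imp", g, h)."""
    fresh = itertools.count(1)
    names = {}      # one abbreviating atom q_g per subformula g
    clauses = []

    def q(g):
        if g not in names:
            a = next(fresh)
            names[g] = a
            if g[0] == "not":
                b = q(g[1])                          # a <-> ~b
                clauses.extend([frozenset({-a, -b}), frozenset({a, b})])
            elif g[0] == "and":
                b, c = q(g[1]), q(g[2])              # a <-> b ^ c
                clauses.extend([frozenset({-a, b}), frozenset({-a, c}),
                                frozenset({a, -b, -c})])
            elif g[0] == "or":
                b, c = q(g[1]), q(g[2])              # a <-> b v c
                clauses.extend([frozenset({-a, b, c}), frozenset({a, -b}),
                                frozenset({a, -c})])
            elif g[0] == "imp":
                b, c = q(g[1]), q(g[2])              # a <-> (b -> c)
                clauses.extend([frozenset({-a, -b, c}), frozenset({a, b}),
                                frozenset({a, -c})])
            # atoms get no defining clauses: q_p is identified with p
        return names[g]

    top = q(f)
    return clauses + [frozenset({top})]              # the conjunct q_phi of T(phi)
```

The output has at most three clauses per subformula plus one unit clause, which is the linear bound |T(ϕ)| = O(|ϕ|) of Proposition 2.3.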
Chapter 3
SAT- and SMT-solving
3.1 DPLL

The basis for state-of-the-art SAT-solvers is the DPLL algorithm, which we are going to present
based on a transition system in this chapter. This procedure is named after M. Davis, H. Putnam,
G. Logemann, and D. Loveland. The idea of the DPLL algorithm is to traverse the set of all
possible interpretations of a clause set S to find out whether S is satisfiable or not. In principle,
this works like a depth-first search in the semantic tree of S (see Section 2.3). In practice it
makes a large difference how this set is traversed, and we will see a number of improvements
over this naive traversal which are crucial for efficiency.
In the following, S will denote a clause set, C a clause and L a literal. It will often be convenient
to write a clause as a disjunction, so, e.g., C ∨ L is an abbreviation for the clause C ∪ {L} where
L ∉ C. For a literal L, we write L̄ for its dual literal, i.e., if L = p, then L̄ = ¬p and if L = ¬p,
then L̄ = p. A transition system is a directed graph which is presented by defining a set of
states as the vertices of the graph and by defining the edges by transition rules.

Definition 3.1. An annotated literal is either a literal L or a literal marked as decide literal,
written as L^d. A state of DPLL is either "FAIL" or a pair I | S where I = L1, ..., Ln is a list
of annotated literals and S is a clause set.

I will be considered a (partial) interpretation by setting I(p) = 1 if p ∈ I and I(p) = 0 if ¬p ∈ I.
If I and J are disjoint partial interpretations, then we will write (I, J)(p) for the value of p
under their union.
Definition 3.2. A state is called final if it is either FAIL or of the form I | S where I contains
all atoms of S and I(S) = 1.

In the search through the tree of all interpretations, decide literals represent choices on which
decisions have been taken and hence serve as backtracking points. Non-decide literals represent
decisions which are enforced by other decisions taken previously. Therefore no backtracking is
needed for them (avoiding backtracking to non-decide literals is already a first improvement
over the naive depth-first search).
Definition 3.3. The transitions of DPLL are:

  UnitPropagate:
    I | S  ⟶  I, L | S          if L is undefined in I and
                                 there is C ∨ L ∈ S s.t. I(C) = 0

  Decide:
    I | S  ⟶  I, L^d | S        if L is undefined in I and
                                 L or L̄ occurs in S

  Backtrack:
    I, L^d, J | S  ⟶  I, L̄ | S  if there is C ∈ S s.t. (I, L^d, J)(C) = 0, and
                                 J does not contain decide literals

  Fail:
    I | S  ⟶  FAIL              if there is C ∈ S s.t. I(C) = 0 and
                                 I contains no decide literals

Note that in the above transition rules of DPLL the right side does not change. We include it
in the above definition nevertheless because we will later consider extensions of this transition
system where the right side does change.
The standard strategy for applying these rules is as follows (where UnitPropagate* denotes
exhaustive application of UnitPropagate):

Algorithm 1: The DPLL algorithm
  Input: a non-empty clause set S
  Output: a final DPLL-state

  I ← ∅
  UnitPropagate*
  while I | S not final do
    if there is C ∈ S s.t. I(C) = 0 then
      ( Backtrack; UnitPropagate* ) or Fail
    else
      Decide; UnitPropagate*
    end if
  end while
Example 3.1. Let S = {{¬p1, p2}; {¬p2, ¬p3, p4}; {¬p2, ¬p4, p5}; {¬p3, ¬p5}}. A run of the DPLL
algorithm is the following:

  ∅ | S
  ⟶D   p1^d | S
  ⟶UP  p1^d, p2 | S
  ⟶D   p1^d, p2, p3^d | S
  ⟶UP  p1^d, p2, p3^d, p4 | S
  ⟶UP  p1^d, p2, p3^d, p4, p5 | S
  ⟶BT  p1^d, p2, ¬p3 | S
  ⟶D   p1^d, p2, ¬p3, p4^d | S
  ⟶UP  p1^d, p2, ¬p3, p4^d, p5 | S

Now we are in a final state because the interpretation defined by I = p1, p2, ¬p3, p4, p5 satisfies
all clauses of S.
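The standard strategy can be sketched as a naive recursive procedure (a functional reformulation of ours, without the explicit backtracking states of the transition system; clauses are sets of non-zero ints as before):

```python
def unit_propagate(I, S):
    """Extend the partial interpretation I (a set of true literals) by all
    literals forced by the UnitPropagate rule."""
    I = set(I)
    changed = True
    while changed:
        changed = False
        for C in S:
            undef = [l for l in C if l not in I and -l not in I]
            # C = C' v L with I(C') = 0 forces L:
            if len(undef) == 1 and all(-l in I for l in C if l not in undef):
                I.add(undef[0])
                changed = True
    return I

def dpll(S, I=frozenset()):
    """Return a satisfying set of literals, or None if S is unsatisfiable."""
    I = unit_propagate(I, S)
    if any(all(-l in I for l in C) for C in S):      # some clause is false
        return None                                  # Backtrack/Fail
    undef = {abs(l) for C in S for l in C} - {abs(l) for l in I}
    if not undef:
        return I                                     # final state I | S
    p = min(undef)
    return dpll(S, I | {p}) or dpll(S, I | {-p})     # Decide p, then its dual
```

On the clause set of Example 3.1, encoded as `[{-1, 2}, {-2, -3, 4}, {-2, -4, 5}, {-3, -5}]`, this sketch performs essentially the run shown above (decide p1, propagate p2, and so on).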
Theorem 3.1 (Termination). For every clause set S there is a final state F s.t. ∅ | S ⟶* F.

Proof Sketch. The standard strategy eventually reaches a final state. Note that there is a small
caveat here: the "standard strategy" as defined above is not deterministic, since it specifies
neither which literals to decide nor how to decide them. However, for the termination
result this is irrelevant, since any sequence of choices in the Decide-rule will eventually reach
a final state.

Theorem 3.2 (Correctness). If ∅ | S ⟶* F where F is final, then

1. if F is FAIL, then S is unsatisfiable;
2. if F = I | S, then I(S) = 1.
Additional DPLL-rules which are useful in practice (when properly applied) are Learn, Forget,
Restart and Backjump.

  Learn:
    I | S  ⟶  I | S, C    if each atom of C occurs in S and S ⊨ C

  Forget:
    I | S, C  ⟶  I | S    if S ⊨ C

  Restart:
    I | S  ⟶  ∅ | S

Backjump is a more general form of backtracking which we will study more closely in the
exercises.
A number of features not treated in detail here are important for obtaining efficient implementations in practice. The decision rule does not specify on which literal to decide; heuristics
for literal selection play a crucial role in practice. The backjump rule must be applied using
cleverly constructed backjump clauses; there are efficient techniques for doing this. Often it
pays off to add the backjump clauses to the clause set (using the Learn-rule); this is also called
CDCL (conflict-driven clause learning). While the learned clauses can help to restructure the
search space in a favourable way, the downside is that the size of the clause set grows. Therefore such learned clauses are typically forgotten again if their activity-level falls below a certain
threshold. The restart rule is helpful for dropping a search which has run astray (i.e. into regions
of the search space where no interpretation is found) and restarting from an empty interpretation
while keeping the learned clauses.
Current state-of-the-art SAT-solvers are capable of solving CNFs consisting of millions of atoms
and clauses (if the structure of these CNFs is sufficiently simple). This has led to the common
practice of using SAT-solvers for solving other NP-complete problems via their reduction to SAT
(which is not without irony since such reductions were originally conceived to argue that
a problem at hand cannot be solved feasibly). On the other hand, the worst-case complexity
of DPLL is still exponential; current SAT-solvers fail to solve quite small clause sets if their
structure is sufficiently complicated. A popular SAT-solver is minisat1 . SAT-solvers typically
use the DIMACS input format which is illustrated in the following example:
Example 3.2. The clause set tt p2 , p4 , p1 u; t p2 , p4 u; t p1 , p3 u; tp2 u; t p4 , p3 uu is formulated in a language of 4 atoms and consists of 5 clauses. It is written in DIMACS input format
as:
p cnf 4 5
-2 -4 1 0
-2 4 0
-1 -3 0
2 0
-4 3 0
You can find more information on this format on the web.
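Producing DIMACS output from a clause set is mechanical; a small sketch (the function name and the representation are our own):

```python
def to_dimacs(clauses, num_atoms):
    """Serialize a clause set (iterable of iterables of nonzero ints) to DIMACS CNF."""
    lines = [f"p cnf {num_atoms} {len(clauses)}"]
    for clause in clauses:
        lines.append(" ".join(str(l) for l in clause) + " 0")   # every clause ends with 0
    return "\n".join(lines)

# the clause set of Example 3.2
S = [[-2, -4, 1], [-2, 4], [-1, -3], [2], [-4, 3]]
print(to_dimacs(S, 4))
```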
1 http://minisat.se/
3.2 Reminders on first-order logic
Before we move on to SMT-solving and the DPLL(T) algorithm commonly employed for it,
some reminders on first-order logic:
A first-order language contains constant symbols, function symbols and predicate symbols.
Each symbol has an arity, the number of arguments it takes (written f {n for the symbol f
with arity n P N). In addition, we assume a countably infinite supply of variable names at
our disposal. The terms over a language L (the L-terms) are defined inductively from constant
symbols, variables and function symbols. L-formulas are defined inductively from atoms, the
propositional connectives ^, _, , Ñ and the quantifiers @x, Dx.
An L-structure is a pair M “ pD, Iq where D is a set, the domain of M and I maps all constant
symbols, function symbols and predicate symbols of L to elements, functions and relations
respectively of D and some variables to elements of D. The interpretation I is extended to
cover all terms by defining Ipf pt1 , . . . , tn qq “ Ipf qpIpt1 q, . . . , Iptn qq.
A formula may have free and bound variables, a formula without free variables is called sentence.
A formula without any variables (and hence without quantifiers) is called ground formula. The
truth of a formula F in a structure M “ pD, Iq is written as M ( F , pronounced as “F is
true in M” or “M satisfies F ” or “M is a model of F ”, and defined, as usual, inductively on
the structure of F under the assumption that all free variables of F are interpreted by I. This
definition is extended to cover M ( F where F contains free variables which are not interpreted
in M by considering these free variables as universally quantified. A sentence which is true in
all structures is called valid. A sentence is called satisfiable if there is a structure in which it is
true. A set of sentences Γ is called satisfiable if there is a structure in which all F P Γ are true.
There is a number of different proof calculi for first-order logic, the notation $ ϕ means that
the formula ϕ is provable. For a set of sentences Γ and a formula ϕ, the notation Γ $ ϕ means
that ϕ can be proved using assumptions from Γ. The notation Γ ( ϕ means that every model
which satisfies Γ also satisfies ϕ. By soundness and completeness of these calculi, all of them
prove the same formulas: the valid formulas, i.e., we have Γ $ ϕ iff Γ ( ϕ. By the compactness
theorem we have Γ ( ϕ iff there is a finite Γ0 Ď Γ s.t. Γ0 ( ϕ.
Definition 3.4. A set of sentences Γ is called deductively closed if Γ $ ϕ implies ϕ P Γ. A
theory is a deductively closed set of sentences.
Definition 3.5. Let T be a theory. A formula ϕ is called T -satisfiable if there is a structure
M s.t. M ( T and M ( ϕ. It is called T -unsatisfiable if there is no such structure. It is called
T -valid if T ( ϕ.
For a theory T , a set of sentences Γ and a formula ϕ, we also write (T ϕ for T ( ϕ and Γ (T ϕ
for Γ, T ( ϕ.
Example 3.3. Let L be a first-order language. Define the theory of equality of L, written EQL , by
the following axioms:
@x x “ x
@x@y px “ y Ñ y “ xq
@x@y@z ppx “ y ^ y “ zq Ñ x “ zq
@x1 ¨ ¨ ¨ @xn @y1 ¨ ¨ ¨ @yn ppx1 “ y1 ^ . . . ^ xn “ yn q Ñ f px1 , . . . , xn q “ f py1 , . . . , yn qq
for every n-ary function symbol f
@x1 ¨ ¨ ¨ @xn @y1 ¨ ¨ ¨ @yn ppx1 “ y1 ^ . . . ^ xn “ yn q Ñ pP px1 , . . . , xn q Ñ P py1 , . . . , yn qqq
for every n-ary predicate symbol P
Example 3.4. Let L “ tc{0, d{0, f {1, g{2u, then pc “ f pdq^d “ f pcqq Ñ f pf pcqq “ c is EQL -valid
(which can be shown by a simple derivation in your favourite proof calculus). On the other hand,
the formula gpc, dq “ gpd, cq is not EQL -valid. Let ptc, du˚ , ¨, εq be the free monoid generated
by tc, du, then M “ ptc, du˚ , Iq with Ipgq “ ¨, Ipcq “ c, Ipdq “ d, and Ip“q being equality is a
model of EQL but not of gpc, dq “ gpd, cq since cd ‰ dc in tc, du˚ . The formula gpc, dq “ gpd, cq
is EQL -satisfiable. To see that, let M “ pN, Iq with Ipgq “ `, Ipcq “ 1, Ipdq “ 2, and Ip“q
being equality and observe that 1 ` 2 “ 2 ` 1.
The theory EQL will serve as an illustrative example for DPLL(T). There are also more expressive
theories, e.g., Presburger arithmetic or the theory of arrays which are routinely treated in
SMT-solving.
3.3 DPLL(T)
In applications it is often convenient to have stronger expressivity than that of propositional
logic. On the other hand, problems in a more expressive formalism are more difficult to solve. A
good compromise, in the sense that much more expressivity can be obtained at comparatively little extra difficulty, can be found in the area of “satisfiability modulo theories (SMT)”.
An SMT-solver considers a quantifier-free first-order formula in a certain background theory T
and determines its T -satisfiability using an extension of the DPLL procedure, called DPLL(T).
For this algorithm to work, it is crucial to restrict the theories T we consider to only such
theories which satisfy the following decidability condition: we require that
T -satisfiability of conjunctions of ground literals is decidable.
We will then call a decision procedure for conjunctions of ground literals a T -solver.
The key to the DPLL(T) procedure is the subtle interplay between propositional interpretations
and first-order models of ground formulas in T . Consider for example the following ground
formula in EQL :
a “ b ^ f paq “ f pbq.
When this formula is considered a propositional formula, e.g., by a SAT-solver, it has the shape
p1 ^ p2 .
Now, p1 ^ p2 is a satisfiable propositional formula. An interpretation that makes it true is I
with Ipp1 q “ 1 and Ipp2 q “ 0. When we consider I as a first-order model M we would require
that M ( a “ b and M * f paq “ f pbq. However, this is inconsistent with EQL . Identifying
an interpretation I of ground atoms with the set of literals it sets to true we are led to the
following definition.
Definition 3.6. Let I be a set of ground literals in T . We say that I is T -consistent iff the
conjunction ŹLPI L is T -satisfiable.
So, to continue the example: a “ b ^ f paq “ f pbq is T -unsatisfiable and
consequently I is T -inconsistent. Therefore the propositional interpretation I does not represent
a first-order model of T . In such a situation the DPLL(T) procedure will add this information
to the clause set under consideration and thus find either an interpretation which is T -consistent
or terminate the search with the result that the original clause set is T -unsatisfiable.
Definition 3.7. The DPLL(T) procedure has the same states I | S as DPLL, with the
only difference that S is no longer a set of propositional clauses but a set of ground clauses in
the language of T .
The transitions of DPLL(T) consist of UnitPropagate, Decide, Fail, and Backtrack defined as
above plus, in addition, T -Learn and Restart.
T -Learn: I | S ÝÑ I | S, C if each atom of C occurs in S and S (T C
Definition 3.8. A DPLL(T)-state is called DPLL-final if it is either FAIL or of the form I | S
where I contains all atoms of S and IpSq “ 1.
A DPLL(T)-state is called DPLL(T)-final if it is either FAIL or of the form I | S where I is a
T -consistent interpretation, contains all atoms of S and IpSq “ 1.
The standard strategy for applying these rules is as follows:
Algorithm 2 The DPLL(T) algorithm
Input: a non-empty clause set S in the language of T
Output: a DPLL(T)-final state
while I | S Ð DPLLpSq is not DPLL(T)-final do     Ź Restart
    S Ð S Y ttL | L P Iuu     Ź T -Learn a conflict clause
end while
If we have a state I | S which, as in the condition of the while-loop, is DPLL-final but not
DPLL(T)-final, then I is an interpretation with IpSq “ 1 which is T -inconsistent. Therefore
the conjunction ŹLPI L is T -unsatisfiable, i.e., the clause tL | L P Iu consisting of the
complements of the literals in I is T -valid, and we can apply a T -Learn transition to add it
to our clause set. In practice one will usually not add the whole clause tL | L P Iu but
a subclause of it which is already T -valid but as small as possible.
So, the software architecture of an SMT-solver has the following overall shape: the input clause
set is handed to a SAT solver; the SAT solver passes a propositional interpretation to the
T -solver, which answers either with a conflict clause or with “T -consistent”; the overall output
is a T -consistent interpretation or “T -unsat”.
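The learn-and-restart loop of Algorithm 2 can be sketched as follows. The SAT solver here is a deliberately naive brute-force stand-in, and `t_consistent` is a callback playing the role of the T-solver black box; both names and the whole setup are our own illustration.

```python
from itertools import product

def naive_sat(clauses, atoms):
    """Brute-force stand-in for a SAT solver: return a total assignment
    (a set of literals) satisfying all clauses, or None."""
    for signs in product((1, -1), repeat=len(atoms)):
        model = {s * a for s, a in zip(signs, atoms)}
        if all(c & model for c in clauses):
            return model
    return None

def dpll_t(clauses, atoms, t_consistent):
    """DPLL(T) as a learn-and-restart loop; t_consistent is the T-solver
    black box deciding T-consistency of a set of literals."""
    clauses = [frozenset(c) for c in clauses]
    while True:
        model = naive_sat(clauses, atoms)
        if model is None:
            return None                                 # T-unsatisfiable
        if t_consistent(model):
            return model                                # T-consistent model
        clauses.append(frozenset(-l for l in model))    # T-Learn a conflict clause

# Example 3.5 with p1: a=b, p2: c=d, p3: f(a,c)=f(b,d); in the theory of
# equality, p1 and p2 together force p3
def eq_consistent(model):
    return not (1 in model and 2 in model and -3 in model)

S = [{-1, 2}, {-3}]
model = dpll_t(S, [1, 2, 3], eq_consistent)
assert model == {-1, 2, -3}
```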
Example 3.5. Let S “ tt a “ b, c “ du; t f pa, cq “ f pb, dquu, which, as a propositional clause
set, is written as tt p1 , p2 u; t p3 uu. The DPLL(T) procedure starts just like the DPLL algorithm:
H|S
ÝÑUP
p3 | S
ÝÑD
p3 , pd1 | S
ÝÑUP
p3 , pd1 , p2 | S
This state is DPLL-final. The propositional interpretation it induces is t p3 , p1 , p2 u which
satisfies tt p1 , p2 u; t p3 uu. This interpretation is given to the EQL -solver which returns with
the information that p3 ^ p1 ^ p2 , i.e., f pa, cq “ f pb, dq ^ a “ b ^ c “ d is EQL -unsatisfiable,
in other words: (EQL a “ b _ c “ d _ f pa, cq “ f pb, dq. This clause (let us abbreviate it as
C) is now added to the clause set by means of the EQL -Learn rule:
ÝÑEQL -Learn p3 , pd1 , p2 | S, C
ÝÑR H | S, C
ÝÑUP p3 | S, C
ÝÑD p3 , pd1 | S, C
ÝÑUP p3 , pd1 , p2 | S, C
ÝÑBT p3 , p1 | S, C
ÝÑD p3 , p1 , p2 | S, C
Now we have again reached a DPLL-final state. This time, its interpretation t p3 , p1 , p2 u
is EQL -consistent: consider, e.g., the model M “ pN, Iq with Ipf q “ `, Ipcq “ Ipdq “ 0,
Ipaq “ 1, Ipbq “ 2. Then 1 ` 0 ‰ 2 ` 0, 1 ‰ 2, 0 “ 0. We can conclude that S, C and hence S is
EQL -satisfiable.
As one can see already in the above simple example, the use of the Restart rule leads to the
duplication of steps. In practice, one uses more efficient transitions like Backjump with the
learned conflict clause instead.
Theorem 3.3 (Termination). Let T be a theory. For every set S of ground clauses in the language
of T there is a DPLL(T )-final state F s.t. H | S ÝÑ˚ F . If T -satisfiability of conjunctions of
ground literals is decidable, a final state can be computed from S.
Theorem 3.4 (Correctness). Let T be a theory and S be a set of ground clauses in the language
of T . If H | S ÝÑ˚ F where F is DPLL(T )-final, then
1. If F is FAIL, then S is T -unsatisfiable.
2. If F “ I | S 1 then I is T -consistent and IpSq “ 1.
Current SMT-solvers like z32 or veriT3 typically accept input in the SMT-LIB2 format. The
clause set S of the above Example 3.5 is written in SMT-LIB2 format as:
(set-logic QF_UF)
(declare-sort U 0)
(declare-fun a () U)
(declare-fun b () U)
(declare-fun c () U)
(declare-fun d () U)
(declare-fun f (U U) U)
(assert (and (or (not (= a b)) (= c d)) (not (= (f a c) (f b d)))))
(check-sat)
(exit)
The assert command accepts any formula, not just CNFs. A binary predicate can be defined
by (declare-fun P (U U) Bool). For more information on this format, see the SMT-LIB2
tutorial4 .
2 http://github.com/Z3Prover/z3
3 http://www.verit-solver.org/
4 http://www.grammatech.com/resources/smt/SMTLIBTutorial.pdf
3.4 Congruence closure
So far we have considered the T -solver as a black box. In this section we will see a decision
procedure for conjunctions of ground literals in EQL . This decision procedure, called congruence
closure, computes the minimal congruence relation satisfying the positive equality literals in
the input formula and then checks whether some negative information contradicts this minimal
relation. For ease of notation we will identify a conjunction F “ A1 ^ ¨ ¨ ¨ ^ An of ground literals with
a set of ground literals and consequently write A P F for Di P t1, . . . , nu s.t. A “ Ai .
Algorithm 3 Congruence closure
Input: a conjunction F of ground literals in EQL
Output: EQL -satisfiable or EQL -unsatisfiable
C Ð ttt, su | t “ s P F u Y tttu | t P subtermspF q, Es s.t. t “ s P F u
while DC1 , C2 P C s.t. C1 ‰ C2 and C1 X C2 ‰ H do
Ź close under transitivity
C Ð pCztC1 , C2 uq Y tC1 Y C2 u
end while
while DC1 , C2 P C s.t. C1 ‰ C2 and f pt1 , . . . , tn q P C1 , f ps1 , . . . , sn q P C2 s.t.
@i P t1, . . . , nuDC P C with ti , si P C do
Ź close under congruence
C Ð pCztC1 , C2 uq Y tC1 Y C2 u
end while
if Ds ‰ t P F, C P C s.t. s, t P C or
DP pt1 , . . . , tn q, P ps1 , . . . , sn q P F s.t. @i P t1, . . . , nuDC P C s.t. ti , si P C then
return EQL -unsat
else
return EQL -sat
end if
Example 3.6. Consider the formula
F : a “ b ^ a ‰ c ^ b “ f pcq ^ gpa, f pcqq ‰ gpb, aq.
We start out with the sets
ta, bu, tb, f pcqu, tcu, tgpa, f pcqqu, tgpb, aqu.
Closure under transitivity yields the equivalence relation
ta, b, f pcqu, tcu, tgpa, f pcqqu, tgpb, aqu.
Closure under congruence yields the congruence relation
ta, b, f pcqu, tcu, tgpa, f pcqq, gpb, aqu.
But since gpa, f pcqq ‰ gpb, aq is in F , the algorithm returns the information that F is EQL -unsatisfiable.
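Algorithm 3 can be implemented with a union-find data structure over the ground subterms. The following sketch uses our own representation (a ground term is a tuple of its head symbol followed by its argument terms) and handles equality literals only, ignoring other predicate symbols.

```python
def subterms(t, acc=None):
    """All subterms of a ground term; a term is (head, arg1, ..., argn)."""
    acc = acc if acc is not None else set()
    acc.add(t)
    for s in t[1:]:
        subterms(s, acc)
    return acc

def congruence_closure(eqs, diseqs):
    """Decide EQ-satisfiability of a conjunction of ground equality literals.
    eqs and diseqs are lists of pairs of terms; returns True iff satisfiable."""
    terms = set()
    for s, t in eqs + diseqs:
        subterms(s, terms)
        subterms(t, terms)
    parent = {t: t for t in terms}          # union-find forest
    def find(t):
        while parent[t] != t:
            t = parent[t]
        return t
    def union(s, t):
        parent[find(s)] = find(t)
    for s, t in eqs:                        # merge the positive equations
        union(s, t)                         # (transitivity comes for free)
    changed = True
    while changed:                          # close under congruence
        changed = False
        for s in terms:
            for t in terms:
                if find(s) != find(t) and s[0] == t[0] and len(s) == len(t) \
                        and all(find(a) == find(b) for a, b in zip(s[1:], t[1:])):
                    union(s, t)
                    changed = True
    return all(find(s) != find(t) for s, t in diseqs)

# Example 3.6: a=b, b=f(c), a and c distinct, g(a,f(c)) and g(b,a) distinct
a, b, c = ('a',), ('b',), ('c',)
fc = ('f', c)
assert congruence_closure([(a, b), (b, fc)],
                          [(a, c), (('g', a, fc), ('g', b, a))]) is False
```

On Example 3.6 the procedure reports unsatisfiability, as in the text.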
Chapter 4
Normal forms in first-order logic
As we saw, in propositional logic it is possible to transform every formula into a logically equivalent CNF. However, for reasons of practical efficiency it is usually more sensible to compute
only a satisfiability-equivalent CNF by applying the Tseitin transformation.
In this section we will see how to compute normal forms for first-order formulas. The additional
complication lies, of course, in the presence of quantifiers. They will be dealt with by a technique
called Skolemisation1 which replaces existential quantifiers by new function symbols.
As in propositional logic this will allow us to obtain a clause set which is satisfiability-equivalent to
the original formula. An important difference between propositional and first-order clause sets
is that the atoms of the latter contain variables. These variables are considered to be universally
quantified. Therefore, a first-order formula is considered to be in CNF if it is of the form
@x Ź1ďiďn Ž1ďjďki Li,j .
This formula can then be written as the clause set
ttLi,j | 1 ď j ď ki u | 1 ď i ď nu.
4.1 Variable normal form
We will start with the simple operation of renaming bound variables.
Definition 4.1. A formula is said to be in variable normal form (VNF) if
1. it does not contain a variable that occurs both free and bound, and
2. it does not contain a variable that is bound by two different quantifiers.
Lemma 4.1. Let ϕ be a formula, then there is a formula ϕ1 in VNF which is logically equivalent
to ϕ.
Proof. Let @x ψ be a subformula of ϕ and let u be a variable which does not appear in ϕ. Then
@x ψ ô @u ψrxzus and consequently ϕr@x ψs ô ϕr@u ψrxzuss. Thus all variables that violate
the conditions of VNF can be renamed.
1 named after Thoralf Albert Skolem (1887–1963)
Example 4.1. Let
ψ “ Du@x ppRpu, xq Ñ Qpxqq ^ Dv Rpx, vqq ^ Dx P pxq ^ @y pQpyq Ñ P pyqq.
This formula is not in VNF since x is bound twice. By renaming x we obtain a formula
ψ 1 “ Du@x ppRpu, xq Ñ Qpxqq ^ Dv Rpx, vqq ^ Dz P pzq ^ @y pQpyq Ñ P pyqq.
in VNF.
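Renaming bound variables apart can be sketched as follows; formulas are nested tuples (our own representation), and every quantifier simply receives a fresh name, which yields VNF provided the fresh names do not clash with free variables of the input.

```python
from itertools import count

def ren_term(t, env):
    """Apply the renaming env (a dict) to a term ('var', x) or ('fn', f, ...)."""
    if t[0] == 'var':
        return ('var', env.get(t[1], t[1]))     # free variables stay untouched
    return (t[0], t[1]) + tuple(ren_term(a, env) for a in t[2:])

def vnf(f, env=None, fresh=None):
    """Give every quantifier a fresh bound variable v0, v1, ..."""
    env = env if env is not None else {}
    fresh = fresh if fresh is not None else count()
    op = f[0]
    if op == 'atom':
        return (op, f[1]) + tuple(ren_term(t, env) for t in f[2:])
    if op == 'not':
        return (op, vnf(f[1], env, fresh))
    if op in ('and', 'or', 'imp'):
        return (op, vnf(f[1], env, fresh), vnf(f[2], env, fresh))
    if op in ('forall', 'exists'):
        new = f'v{next(fresh)}'
        return (op, new, vnf(f[2], {**env, f[1]: new}, fresh))

# forall x P(x) and exists x Q(x): the two occurrences of x are renamed apart
f = ('and', ('forall', 'x', ('atom', 'P', ('var', 'x'))),
            ('exists', 'x', ('atom', 'Q', ('var', 'x'))))
assert vnf(f) == ('and', ('forall', 'v0', ('atom', 'P', ('var', 'v0'))),
                         ('exists', 'v1', ('atom', 'Q', ('var', 'v1'))))
```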
4.2 Negation normal form
In a second step we extend the notion of negation normal form (NNF) to first-order formulas:
as in the case of propositional logic, a first-order formula is said to be in NNF if it does not
contain Ñ and negation appears only immediately above atoms.
Definition 4.2. We extend the formula transformations Φ` and Φ´ from the first exercise
sheet to first-order logic by defining:
Φ` pQx ψq “ Qx Φ` pψq
Φ´ pQx ψq “ Q1 x Φ´ pψq
where Q P t@, Du and Q1 denotes the dual quantifier, i.e., Q1 “ @ if Q “ D and Q1 “ D if Q “ @.
Lemma 4.2. Φ` is an NNF-transformation for first-order logic, i.e., for every formula ψ:
Φ` pψq contains the same atoms as ψ, Φ` pψq is logically equivalent to ψ, Φ` pψq is in NNF, and
|Φ` pψq| “ Op|ψ|q.
Without proof.
Example 4.2. The formula
ψ 1 “ Du@x ppRpu, xq Ñ Qpxqq ^ Dv Rpx, vqq ^ Dz P pzq ^ @y pQpyq Ñ P pyqq
obtained in Example 4.1 can be transformed to the NNF
ψ 2 “ Φ` pψ 1 q “ Du@x pp Rpu, xq _ Qpxqq ^ Dv Rpx, vqq ^ @z P pzq ^ @y p Qpyq _ P pyqq.
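The transformations Φ+ and Φ− can be sketched as a pair of mutually recursive functions; the tuple representation of formulas is our own.

```python
def nnf_pos(f):
    """Phi+: an NNF of f; formulas are nested tuples, atoms are opaque."""
    op = f[0]
    if op == 'atom':
        return f
    if op == 'not':
        return nnf_neg(f[1])
    if op in ('and', 'or'):
        return (op, nnf_pos(f[1]), nnf_pos(f[2]))
    if op == 'imp':                        # A -> B becomes Phi-(A) or Phi+(B)
        return ('or', nnf_neg(f[1]), nnf_pos(f[2]))
    if op in ('forall', 'exists'):
        return (op, f[1], nnf_pos(f[2]))

def nnf_neg(f):
    """Phi-: an NNF of the negation of f."""
    op = f[0]
    if op == 'atom':
        return ('not', f)
    if op == 'not':
        return nnf_pos(f[1])
    if op == 'and':
        return ('or', nnf_neg(f[1]), nnf_neg(f[2]))
    if op == 'or':
        return ('and', nnf_neg(f[1]), nnf_neg(f[2]))
    if op == 'imp':
        return ('and', nnf_pos(f[1]), nnf_neg(f[2]))
    if op == 'forall':                     # quantifiers are dualised
        return ('exists', f[1], nnf_neg(f[2]))
    if op == 'exists':
        return ('forall', f[1], nnf_neg(f[2]))

# not(Q -> P) becomes Q and not P;  not exists x P becomes forall x not P
assert nnf_pos(('not', ('imp', ('atom', 'Q'), ('atom', 'P')))) == \
    ('and', ('atom', 'Q'), ('not', ('atom', 'P')))
assert nnf_pos(('not', ('exists', 'x', ('atom', 'P')))) == \
    ('forall', 'x', ('not', ('atom', 'P')))
```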
4.3 Skolemisation
Definition 4.3. Let ϕ be a formula in VNF. We define the ordering ďϕ on the bound variables
of ϕ as x ďϕ y if Qx is above Qy in the formula tree of ϕ.
Definition 4.4. Let ϕrDy ψs be a formula in VNF. Let @x1 , . . . , @xn be the quantifiers Qx with
x ďϕ y. Define sky pϕrDy ψsq “ ϕrψryzf px1 , . . . , xn qss where f is a function symbol which does
not appear in ϕ.
Note that the above substitution ryzf px1 , . . . , xn qs carries variables into a context where they
are bound by the quantifiers @x1 , . . . , @xn in ϕ. This is intended; it will occur on a few more
occasions during our discussion of Skolemisation.
Definition 4.5. Let ϕ be a formula in VNF. Let y1 , . . . , ym be all existentially bound variables
in ϕ s.t. yi ďϕ yj implies i ď j. Define skpϕq “ skym p¨ ¨ ¨ sky1 pϕq ¨ ¨ ¨ q.
Lemma 4.3. Let @x1 ¨ ¨ ¨ @xn Dy ϕ be a formula and f a function symbol which does not occur
in ϕ, then
@xDy ϕ „sat @x ϕryzf pxqs
Proof. @x ϕryzf px1 , . . . , xn qs Ñ @xDy ϕ is a valid formula.
For the other direction, let M “ pD, Iq be a model of @xDy ϕ in the language L of @xDy ϕ.
We define a structure N in the language L Y tf u as follows: first N |L “ M and secondly
f N pa1 , . . . , an q “ b for some b P D s.t. M ( ϕrxza, yzbs. Note that such a b exists since M (
@xDy ϕ. Therefore N is well-defined and we have N ( @x ϕryzf px1 , . . . , xn qs.
Definition 4.6. Let Qx be a quantifier in ϕ. We define ϕQx as ϕ without the quantifier Qx.
Lemma 4.4. Let ϕ be a formula in VNF and NNF. Let Qx be a quantifier in ϕ which is
minimal w.r.t. ďϕ . Then ϕ ô Qx ϕQx .
Proof. The transformation rules
Qx ψ ˝ χ ÞÑ Qx pψ ˝ χq
χ ˝ Qx ψ ÞÑ Qx pχ ˝ ψq
preserve logical equivalence (since x cannot occur in χ due to the formula being in VNF).
Proposition 4.1. Let ϕ be a formula in VNF and NNF. Then skpϕq is in VNF and NNF,
satisfiability-equivalent to ϕ and does not contain D.
Proof. Proceeding by induction on the number of existential quantifiers in ϕ, it suffices to show
that sky pϕq is satisfiability-equivalent to ϕ for y being an outermost existential quantifier. Let Dy
be so that x ďϕ y implies that x is universally quantified and let @x1 , . . . , @xn be all quantifiers
which dominate Dy. By Lemma 4.4 we obtain
ϕ ô @xDy ϕQxQy .
By applying Lemma 4.3 we obtain
@xDy ϕQxQy „sat @xϕQxQy ryzf pxqs
for a fresh function symbol f . By applying Lemma 4.4 for shifting back the universal quantifiers
we obtain
@xϕQxQy ryzf pxqs ô ϕQy ryzf pxqs “ sky pϕq.
So in total we have ϕ „sat sky pϕq.
Example 4.3. We continue Example 4.2 where we have obtained the formula
ψ 2 “ Du@x pp Rpu, xq _ Qpxqq ^ Dv Rpx, vqq ^ @z P pzq ^ @y p Qpyq _ P pyqq.
in VNF and NNF. The Skolemisation of ψ 2 is
ψ 3 “ @x pp Rpc, xq _ Qpxqq ^ Rpx, f pxqqq ^ @z P pzq ^ @y p Qpyq _ P pyqq.
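Skolemisation of a formula in VNF and NNF can be sketched as a recursive walk that collects the dominating universal variables; the representation and the Skolem symbol names sk0, sk1, ... are our own.

```python
from itertools import count

def subst_term(t, y, s):
    """Replace variable y by term s in term t; terms are ('var', x) or ('fn', f, ...)."""
    if t[0] == 'var':
        return s if t[1] == y else t
    return (t[0], t[1]) + tuple(subst_term(a, y, s) for a in t[2:])

def subst(f, y, s):
    """Replace y by s in formula f; no capture can occur since f is in VNF."""
    op = f[0]
    if op == 'atom':
        return (op, f[1]) + tuple(subst_term(a, y, s) for a in f[2:])
    if op == 'not':
        return (op, subst(f[1], y, s))
    if op in ('and', 'or'):
        return (op, subst(f[1], y, s), subst(f[2], y, s))
    return (op, f[1], subst(f[2], y, s))        # quantifier case

def skolemize(f, universals=(), fresh=None):
    """Remove existential quantifiers from a formula in VNF and NNF,
    outermost first, introducing Skolem symbols sk0, sk1, ..."""
    fresh = fresh if fresh is not None else count()
    op = f[0]
    if op in ('atom', 'not'):
        return f
    if op in ('and', 'or'):
        return (op, skolemize(f[1], universals, fresh),
                    skolemize(f[2], universals, fresh))
    if op == 'forall':
        return (op, f[1], skolemize(f[2], universals + (f[1],), fresh))
    if op == 'exists':                          # y := sk_i(x1, ..., xn)
        term = ('fn', f'sk{next(fresh)}') + tuple(('var', x) for x in universals)
        return skolemize(subst(f[2], f[1], term), universals, fresh)

# forall x exists v R(x, v) becomes forall x R(x, sk0(x))
r = skolemize(('forall', 'x', ('exists', 'v',
               ('atom', 'R', ('var', 'x'), ('var', 'v')))))
assert r == ('forall', 'x', ('atom', 'R', ('var', 'x'), ('fn', 'sk0', ('var', 'x'))))
```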
4.4 Clause normal form
Proposition 4.2. For every formula ϕ there is a clause set S which is satisfiability-equivalent
to ϕ. Moreover, the mapping from ϕ to S is computable.
Proof. Proceed as follows:
• Apply Lemma 4.1 to obtain a formula ϕ1 in VNF.
• Apply Lemma 4.2 to obtain a formula ϕ2 in VNF and NNF.
• Apply Proposition 4.1 to obtain a formula ϕ3 in VNF and NNF which does not contain
D.
• Let ϕ4 be ϕ3 without universal quantifiers, then @xϕ4 ô ϕ3 by Lemma 4.4.
• Transform the quantifier-free formula ϕ4 to a satisfiability-equivalent CNF Źni“1 Žkij“1 Li,j
in order to obtain
ϕ „sat @x Ź1ďiďn Ž1ďjďki Li,j .
Then S “ ttLi,j | 1 ď j ď ki u | 1 ď i ď nu.
Example 4.4. In Example 4.3 we have obtained the formula
ψ 3 “ @x pp Rpc, xq _ Qpxqq ^ Rpx, f pxqqq ^ @z P pzq ^ @y p Qpyq _ P pyqq.
which is logically equivalent to the clause set
S “ tt Rpc, xq, Qpxqu; tRpx, f pxqqu; t P pzqu; t Qpyq, P pyquu.
Chapter 5
Resolution in first-order logic
5.1 Unification
Before we start with unification, we need some additional notions about substitutions. First
note that the set of all substitutions forms a monoid with the operation ˝ of composition and
the identity id.
Definition 5.1. Let σ be a substitution, then dompσq “ tx P V | xσ ‰ xu and rngpσq “
ŤxPdompσq Varpxσq.
For substitutions σ, τ with dompσq X dompτ q “ H we define the substitution σ Y τ by
xpσ Y τ q “ xσ if x P dompσq, xpσ Y τ q “ xτ if x P dompτ q, and xpσ Y τ q “ x otherwise.
Definition 5.2. Let T be a non-empty set of terms. A substitution σ is called unifier of T if
|T σ| “ 1.
Example 5.1. Let T “ thpx, gpxqq, hpf pyq, zqu. The substitution rxzf pyq, zzgpf pyqqs is a unifier
of T . The substitution rxzf pf pyqq, yzf pyq, zzgpf pf pyqqqs is another unifier of T .
Definition 5.3. Let σ and τ be substitutions. We say that σ is more general than τ , in symbols
σ ď τ , if there is a substitution θ s.t. σθ “ τ .
Definition 5.4. Let T be a set of terms. Then σ is called most general unifier of T if σ ď τ
for every unifier τ of T .
We will see soon that every set of literals, if it is unifiable at all, has a most general unifier. For
showing this we will need some auxiliary notions and results.
Definition 5.5. Let s and t be terms. Their difference set Diffps, tq is a finite set of pairs of
terms which is defined inductively as follows:
1. If s “ t then Diffps, tq “ H.
2. If s ‰ t but s “ f ps1 , . . . , sn q and t “ f pt1 , . . . , tn q for a function symbol f , then
Diffps, tq “ Ť1ďiďn Diffpsi , ti q.
3. Otherwise, Diffps, tq “ tps, tqu.
Example 5.2. Diffphpf pyq, zq, hpx, gpxqqq “ tpf pyq, xq, pz, gpxqqu
Lemma 5.1. A substitution σ is a unifier of tt1 , t2 u iff σ is a unifier of all pairs in Diffpt1 , t2 q.
Proof. Let t1 σ “ t2 σ and ps1 , s2 q P Diffpt1 , t2 q. Then s1 is a subterm of t1 at a certain position
p and s2 is a subterm of t2 at the same position p. Therefore t1 σ “ t2 σ implies s1 σ “ s2 σ.
For the other direction: if s1 σ “ s2 σ for all ps1 , s2 q P Diffpt1 , t2 q, then the definition of Diff
directly implies that t1 σ “ t2 σ (using induction).
The next lemma is a central property for the existence of most general unifiers: an arbitrary
unifier can be factored into a difference pair and a rest.
Lemma 5.2. Let t1 and t2 be terms, τ a unifier of tt1 , t2 u and px, sq P Diffpt1 , t2 q. Then
τ “ rxzssτ 1 where τ 1 “ τ |dompτ qztxu .
Proof. As px, sq P Diffpt1 , t2 q and τ is a unifier of tt1 , t2 u we have
xτ “ sτ    (5.1)
by Lemma 5.1. Furthermore x R Varpsq: since px, sq is a difference pair, s ‰ x. So if x P Varpsq
then xτ would be a proper subterm of sτ , which would contradict (5.1). Therefore
sτ “ sτ 1    (5.2)
and we obtain
τ “ rxzxτ s Y τ 1 “p5.1q rxzsτ s Y τ 1 “p5.2q rxzsτ 1 s Y τ 1 “ rxzssτ 1 .
Example 5.3. Let t1 “ hpf pyq, zq and t2 “ hpx, gpxqq, then τ “ rxzf pcq, yzc, zzgpf pcqqs is a
unifier of tt1 , t2 u, pz, gpxqq is a difference pair and hence τ “ rzzgpxqsrxzf pcq, yzcs.
Theorem 5.1. Let tt1 , t2 u be unifiable. Then tt1 , t2 u has a most general unifier.
Proof. We will proceed by induction on the number of variables which occur in tt1 , t2 u, written
as |Varptt1 , t2 uq|. If |Varptt1 , t2 uq| “ 0 then unifiability already implies t1 “ t2 . If t1 “ t2
(independently of |Varptt1 , t2 uq|), then every substitution is a unifier and hence id is a most
general unifier.
So let t1 ‰ t2 . Then Diffpt1 , t2 q ‰ H. Let ps1 , s2 q P Diffpt1 , t2 q. Since tt1 , t2 u is unifiable,
so is ts1 , s2 u. Furthermore, s1 and s2 have different head symbols since they are a difference
pair. One of these symbols must be a variable (otherwise ts1 , s2 u would not be unifiable). So
let w.l.o.g. s1 “ x. Then we also have x R Varps2 q, because otherwise xσ would be a proper
subterm of s2 σ for any substitution σ which would contradict unifiability of ts1 , s2 u.
We define t1i “ ti rxzs2 s for i “ 1, 2 and claim that tt11 , t12 u is unifiable. To see that, let τ be a
unifier of tt1 , t2 u. Then by Lemma 5.2 we have τ “ rxzs2 sτ 1 and hence
t11 τ 1 “ t1 rxzs2 sτ 1 “ t1 τ “ t2 τ “ t2 rxzs2 sτ 1 “ t12 τ 1 .
So tt11 , t12 u is unifiable and contains strictly fewer variables than tt1 , t2 u because x no longer
appears. By induction hypothesis there is a most general unifier σ 1 of tt11 , t12 u. We define σ “
rxzs2 sσ 1 and claim that σ is a most general unifier of tt1 , t2 u. First, σ is a unifier because
t1 σ “ t1 rxzs2 sσ 1 “ t11 σ 1 “ t12 σ 1 “ t2 rxzs2 sσ 1 “ t2 σ.
Secondly, let τ be an arbitrary unifier of tt1 , t2 u, then by Lemma 5.2 we can write τ as τ “
rxzs2 sτ 1 and – as above – we can show that τ 1 is a unifier of tt11 , t12 u. So there is a θ s.t. σ 1 θ “ τ 1 .
But we also have
σθ “ rxzs2 sσ 1 θ “ rxzs2 sτ 1 “ τ
and therefore σ is a most general unifier.
The above proof induces the following algorithm for the computation of a most general unifier
of two terms t1 and t2 :
• If Diffpt1 , t2 q “ H then mgupt1 , t2 q “ id.
• If ps1 , s2 q P Diffpt1 , t2 q where both s1 and s2 start with a constant or function symbol,
then t1 and t2 are not unifiable.
• Otherwise, let px, sq P Diffpt1 , t2 q. Then:
– If x P Varpsq, then t1 and t2 are not unifiable.
– If x R Varpsq, let t11 “ t1 rxzss and t12 “ t2 rxzss. Then mgupt1 , t2 q “ rxzssmgupt11 , t12 q.
Example 5.4. Let t1 “ gpx, cq and t2 “ gpf pyq, yq. Then the application of the above algorithm
yields the following table (term pair on the left, substitution applied on the right):
gpx, cq, gpf pyq, yq : rxzf pyqs
gpf pyq, cq, gpf pyq, yq : ryzcs
gpf pcq, cq, gpf pcq, cq : id
and hence mgupt1 , t2 q “ rxzf pyqsryzcsid “ rxzf pcq, yzcs.
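The algorithm above can be sketched directly; variables are Python strings and compound terms are tuples headed by their function symbol (our own representation, with constants as zero-argument tuples).

```python
def diff(s, t):
    """The difference set Diff(s, t) of Definition 5.5; variables are strings,
    compound terms are tuples (f, arg1, ..., argn)."""
    if s == t:
        return set()
    if isinstance(s, tuple) and isinstance(t, tuple) \
            and s[0] == t[0] and len(s) == len(t):
        out = set()
        for a, b in zip(s[1:], t[1:]):
            out |= diff(a, b)
        return out
    return {(s, t)}

def apply(sub, t):
    """Apply a substitution (a dict) to a term."""
    if isinstance(t, str):
        return sub.get(t, t)
    return (t[0],) + tuple(apply(sub, a) for a in t[1:])

def occurs(x, t):
    return t == x or (isinstance(t, tuple) and any(occurs(x, a) for a in t[1:]))

def mgu(t1, t2):
    """A most general unifier of t1 and t2 as a dict, or None."""
    d = diff(t1, t2)
    if not d:
        return {}
    s1, s2 = next(iter(d))
    if not isinstance(s1, str):            # orient the pair towards a variable
        s1, s2 = s2, s1
    if not isinstance(s1, str) or occurs(s1, s2):
        return None                        # symbol clash or occurs-check failure
    rest = mgu(apply({s1: s2}, t1), apply({s1: s2}, t2))
    if rest is None:
        return None
    return {s1: apply(rest, s2), **rest}   # compose [s1\s2] with the rest

# Example 5.4: mgu of g(x, c) and g(f(y), y) is [x\f(c), y\c]
u = mgu(('g', 'x', ('c',)), ('g', ('f', 'y'), 'y'))
assert u == {'x': ('f', ('c',)), 'y': ('c',)}
assert mgu(('f', 'x'), 'x') is None        # fails the occurs check
```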
Definition 5.6. A set of literals E is called unifiable, if there is a substitution σ s.t. |Eσ| “ 1.
Corollary 5.1. If a finite set of literals is unifiable, then it has a most general unifier.
Proof. We will show that for every finite set E of literals there are terms s1 , s2 s.t. the unifiers
of E are exactly the unifiers of ts1 , s2 u.
By replacing every r-ary predicate symbol that appears in E by a new r-ary function symbol
fP and negation by a new unary function symbol n we obtain a set of terms TE “ tt1 , . . . , tm u
from E. Now let f be a new m-ary function symbol and define s1 “ f pt1 , . . . , t1 q and s2 “
f pt1 , . . . , tm q. Then a substitution σ is unifier of ts1 , s2 u iff t1 σ “ ti σ for all i P t1, . . . , mu iff
ti σ “ tj σ for all i, j P t1, . . . , mu iff σ is unifier of E.
5.2 Resolution
A variable permutation is a substitution σ : V Ñ V which is bijective.
Definition 5.7. For two clauses C and C 1 we say that C 1 is a variant of C, if there is a variable
permutation π s.t. Cπ “ C 1 .
Definition 5.8. Let C and D be variable-disjoint clauses. Let K P C and L P D be literals s.t.
tK, Lu are unifiable and let µ be a most general unifier of tK, Lu. Then
resK,L pC, Dq “ ppCztKuq Y pDztLuqqµ
is called resolvent of C and D.
Example 5.5. Let C “ ty ď y ¨ yu and D “ t x ď y, x ă spyqu. We rename y in C to y 1 and
thus obtain C 1 “ ty 1 ď y 1 ¨ y 1 u. Now C 1 and D are variable-disjoint. The atoms y 1 ď y 1 ¨ y 1 and
x ď y have a most general unifier µ “ rxzy 1 , yzy 1 ¨ y 1 s and hence C and D form the resolvent
ty 1 ă spy 1 ¨ y 1 qu.
Note that renaming y to y 1 is necessary for carrying out this resolution step as y ď y ¨ y and
x ď y are not unifiable.
Definition 5.9. Let C be a clause and D Ď C be unifiable with most general unifier µ. Then
Cµ is called factor of C.
Example 5.6. The clause set S “ ttP pxq, P pyqu; t P puq, P pvquu is unsatisfiable. Up to
variable-renaming the only resolvent obtainable from S is tP pxq, P pvqu. In particular, the
empty clause is not derivable from S by resolution alone.
On the other hand, tP pxq, P pyqu has the factor tP pxqu and t P puq, P pvqu has the factor
t P puqu. From tP pxqu and t P puqu one obtains the empty clause by a single resolution step.
The above example shows that for completeness, the factor rule is necessary. There are also
variants of the resolution rule which incorporate factoring. Then an explicit factor rule is not
necessary. We will see more details on this later.
Definition 5.10. Let C and D be variable-disjoint clauses. Let s “ t P C and Lrus P D be
literals s.t. s and u are unifiable. Let µ be a most general unifier of ts, uu. Then
pars“t,Lrus pC, Dq “ ppCzts “ tuq Y pDztLrusuq Y tLrtsuqµ
is called paramodulant of C and D.
Example 5.7. A short example of a deduction consisting of two paramodulations: from
x ` spyq “ spx ` yq and s2 p0q ` sp0q ď s2 p0q we obtain the paramodulant sps2 p0q ` 0q ď s2 p0q,
and from x ` 0 “ x and this clause we obtain s3 p0q ď s2 p0q.
Definition 5.11. Let C, D be clauses and let C 1 , D1 be variable-disjoint variants of C and D
respectively. Let C01 Ď C 1 and D01 Ď D1 s.t. C01 Y D01 is unifiable with mgu σ. Then the clause
ppC 1 zC01 q Y pD1 zD01 qqσ
is called big-step resolvent of C and D.
Definition 5.12. Let C and D be clauses and let C 1 and D1 be variable-disjoint variants of C
and D respectively. Let C01 Ď C 1 s.t. C01 is unifiable with mgu µ to s “ t and let D01 Ď D1 s.t.
D01 is unifiable with mgu ν to Lrus for some term u which is unifiable with s with mgu σ. Then
ppC 1 µzts “ tuq Y pD1 νztLrusuq Y tLrtsuqσ
is called big-step paramodulant of C and D.
Definition 5.13. Let S be a clause set. A finite list C1 , . . . , Cn of clauses is called resolution
deduction from S if for all i P t1, . . . , nu:
1. Ci P S, or
2. Ci “ tt “ tu for some term t, or
3. there are j, k ă i s.t. Ci is a big-step resolvent of Cj and Ck , or
4. there are j, k ă i s.t. Ci is a big-step paramodulant of Cj and Ck .
If Cn “ H, then C1 , . . . , Cn is called resolution refutation.
Theorem 5.2 (Soundness). If S has a resolution refutation, then S is unsatisfiable.
Proof. We will show the following stronger statement: if C1 , . . . , Cn is a deduction consisting of
clauses from S, reflexivity, factor, variant, resolution and paramodulation, then S ( Cn . Then,
if Cn “ H, S is unsatisfiable. We proceed by induction on n, making a case distinction on the
rule used for deriving the last clause Cn :
1. If Cn P S, we are done.
2. If Cn “ tt “ tu, we are done.
3. If Cn is a variant of some Cj with j ă n then we are done since renaming of bound
variables preserves logical equivalence.
4. If Cn “ Cj σ for some j ă n then we are done since C ( Cτ for all clauses C and all
substitutions τ .
5. Let Cn “ resLj ,Lk pCj , Ck q “ ppCj ztLj uq Y pCk ztLk uqqµ for some j, k ă n. Then Lj µ “
Lk µ. Let M ( S, then by induction hypothesis M ( Cj and M ( Ck . Therefore also
M ( Cj µ and M ( Ck µ. Now, writing Cj “ Cj1 _ Lj and Ck “ Ck1 _ Lk , we make a
case distinction. If M ( Lj µ, then M ( Ck1 µ and hence M ( Cn . The case M ( Lk µ is
symmetric.
6. Let Cn “ pars“t,Lrus pCj , Ck q “ ppCj zts “ tuq Y pCk ztLrusuq Y tLrtsuqµ for some j, k ă n.
Then sµ “ uµ. Let M ( S, then by induction hypothesis M ( Cj and M ( Ck .
Therefore also M ( Cj µ and M ( Ck µ. Now, writing Cj “ Cj1 _s “ t and Ck “ Ck1 _Lrus
we make a case distinction. If M ( sµ “ tµ then, since uµ “ sµ and M ( Ck1 µ _ Lrusµ,
we have M ( Ck1 µ _ Lrtsµ which is a subclause of Cn and so M ( Cn . If, on the other
hand, M * sµ “ tµ, then M ( Cj1 µ which is a subclause of Cn and so M ( Cn .
There is a number of theorem provers for first-order logic which are based on resolution,
paramodulation and variants thereof, for example Vampire1 , E2 , SPASS3 , prover94 . The
following is an example input file for prover9. We work in the context of groups, using f for
the binary group operation, g for the unary inverse operation and e for the unit element. The
following input file asks prover9 to show that every left-unit is also a right-unit.
1
http://www.vprover.org/
www.eprover.org/
3
http://www.spass-prover.org/
4
https://www.cs.unm.edu/~mccune/mace4/
2
27
formulas(assumptions).
f(x,f(y,z)) = f(f(x,y),z).
f(e,x) = x.
f(g(x),x) = e.
f(x,g(x)) = e.
end_of_list.
formulas(goals).
f(x,e) = x.
end_of_list.
Chapter 6
Redundancy
6.1 Subsumption
Definition 6.1. Let C and D be clauses. We say that C subsumes D if there is a substitution
σ s.t. Cσ Ď D. In this case we write C ďss D. Let S, T be clause sets. Then S ďss T if @D P T
DC P S s.t. C ďss D.
Occasionally we want to make the substitution explicit; then we write C ďσss D as abbreviation
for Cσ Ď D.
Lemma 6.1. If C ďss D then C ( D. If S ďss T then S ( T .
Proof. We have C ( Cσ for any clause C and any substitution σ. Moreover, if D1 Ď D, then
D1 ( D because a clause is a disjunction. Letting D1 “ Cσ we see that C ďss D implies that
C ( D.
If S ďss T , then @D P T DC P S s.t. C ďss D, hence C ( D and so S ( D since S is a
conjunction. This means that every conjunct of T is implied by S, so S ( T .
Example 6.1. The converse of Lemma 6.1 is not true. Consider C “ t¬P pxq, P pf pxqqu and D “ t¬P pyq, P pf pf pyqqqu. Then C ( D but there is no substitution σ s.t. Cσ Ď D.
So subsumption is a restricted form of implication. While clause implication is undecidable,
subsumption is decidable.
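To make the decidability claim concrete, the following is a minimal sketch in Python of a naive subsumption test by backtracking matching. The representation is chosen for illustration only: variables are strings prefixed with `?`, compound terms and literals are tuples headed by their symbol, and negation is folded into the predicate name.

```python
def is_var(t):
    """Variables are strings starting with '?' (an encoding chosen here)."""
    return isinstance(t, str) and t.startswith("?")

def match(pattern, target, sub):
    """One-way matching: extend sub so that pattern instantiated by sub
    equals target; return the extended substitution, or None on failure."""
    if is_var(pattern):
        if pattern in sub:
            return sub if sub[pattern] == target else None
        return {**sub, pattern: target}
    if isinstance(pattern, tuple) and isinstance(target, tuple) \
            and len(pattern) == len(target):
        for p, t in zip(pattern, target):
            sub = match(p, t, sub)
            if sub is None:
                return None
        return sub
    return sub if pattern == target else None

def subsumes(C, D, sub=None):
    """True iff some substitution s satisfies Cs subset of D, found by
    backtracking over which literal of D each literal of C is mapped to."""
    sub = {} if sub is None else sub
    if not C:
        return True
    for lit in D:
        ext = match(C[0], lit, sub)
        if ext is not None and subsumes(C[1:], D, ext):
            return True
    return False
```

On the clauses of Example 6.1 this test correctly fails even though the implication C ( D holds, which is exactly the gap between ďss and (. Note that this naive search can take exponential time; real provers refine it with literal orderings and indexing.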
Proposition 6.1. Let S be a clause set, T Ď S s.t. SzT ďss S. Then S and SzT are logically
equivalent.
Proof. S ( SzT is immediate. The other direction has just been shown in Lemma 6.1.
This result shows that S is unsatisfiable iff SzT is unsatisfiable. Therefore, up to a certain point, it justifies removing subsumed clauses from a clause set before we start the search for a refutation: by telling us that unsatisfiability is preserved, it shows the correctness of this preprocessing step. However, it does not tell us anything about proof length; it does not rule out the existence of short refutations of S in a situation where all refutations of SzT are long. We will see, however, that this is not the case: every refutation of S can be pruned to one of SzT . In order to prove this result we need to carry out a thorough study of the relationship between subsumption and the inference rules considered so far.
Figure 6.1: Lemma 6.2 (diagram relating ďss and the variant relation between C, D and E; not reproduced here)
Figure 6.2: Lemma 6.3 (diagram relating ďss and factoring; not reproduced here)
Lemma 6.2. Let C ďss D and E be a variant of D, then C ďss E.
Proof. If C ďss D, then there is a σ s.t. Cσ Ď D. If E is a variant of D, then there is a variable
permutation π s.t. Dπ “ E. Therefore Cσπ Ď E, i.e., C ďss E.
Note that being a variant is a symmetric relation, hence we obtain both of the statements
depicted in Figure 6.1.
Lemma 6.3. Let D be a factor of C and D ďss E. Then C ďss E.
Proof. D “ Cµ for some substitution µ and Dσ Ď E for some substitution σ. Therefore Cµσ Ď E, i.e., C ďss E.
Note that factor (in contrast to variant) is not symmetric. For the other direction a different relation holds and more work is needed; see the next lemma:
Lemma 6.4. Let C 1 ďss C. Let C0 Ď C be unifiable with mgu µ and let C0 µ “ tLu. Then
there is a factor C 1 µ1 of C 1 with C 1 µ1 ďτss Cµ s.t. there is at most one L1 P C 1 µ1 with L1 τ “ L.
Proof. As C 1 ďss C there is a σ s.t. C 1 σ Ď C. Let C0 Ď Cˆ0 Ď C be maximal with Cˆ0 µ “ tLu.
Let C01 “ tL P C 1 | Lσ P Cˆ0 u. Since µ is a unifier of Cˆ0 and C01 σ Ď Cˆ0 , µ is also a unifier of
C01 σ. Then σµ is a unifier of C01 . Let µ1 be a mgu of C01 , then C 1 µ1 is a factor of C 1 and we have
µ1 ď σµ. Let τ be s.t. µ1 τ “ σµ. Then we have C 1 µ1 τ “ C 1 σµ Ď Cµ, i.e., C 1 µ1 ďτss Cµ.
Figure 6.3: Lemma 6.4 (diagram of the factors and substitutions constructed in the proof; not reproduced here)
30
Figure 6.4: Lemma 6.6 (diagram of the factors, resolvents and substitutions constructed in the proof; not reproduced here)
Now suppose that there are L21 , L22 P C 1 µ1 with L21 τ “ L22 τ “ L. Then there are L11 , L12 P C 1 s.t.
L11 µ1 “ L21 and L12 µ1 “ L22 . Furthermore, L11 σµ “ L11 µ1 τ “ L21 τ “ L and analogously L12 σµ “ L.
Hence, by maximality of Cˆ0 , L11 σ, L12 σ P Cˆ0 . Therefore L11 , L12 P C01 and as µ1 is mgu of C01 we
have L11 µ1 “ L12 µ1 , i.e., L21 “ L22 .
Before we move on to analyse the relationship between the resolution rule and subsumption we need to make a preparatory observation on set operations on clauses and substitutions. We have already repeatedly used the fact that pC Y Dqσ “ Cσ Y Dσ for all clauses C and D, and all substitutions σ. For set difference the situation is more complicated, as the following example demonstrates.
Example 6.2. Let C “ tP paqu, D “ tP pxqu and σ “ rxzas. Then
Cσ “ tP paqu,
Dσ “ tP paqu,
CσzDσ “ H,
pCzDqσ “ tP paqu.
so pCzDqσ ‰ CσzDσ.
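The example can be replayed mechanically. The following sketch (Python; clauses as sets of literal tuples, variables as `?`-prefixed strings, an encoding chosen purely for illustration) computes the two sides of the failed equality:

```python
def apply_sub(t, sub):
    """Apply a substitution (dict from '?'-variables to terms) to a term or literal."""
    if isinstance(t, str):
        return sub.get(t, t)
    return tuple(apply_sub(s, sub) for s in t)

C = {("P", "a")}          # C = {P(a)}
D = {("P", "?x")}         # D = {P(x)}
sigma = {"?x": "a"}       # sigma = [x\a]

C_sigma = {apply_sub(l, sigma) for l in C}    # C sigma = {P(a)}
D_sigma = {apply_sub(l, sigma) for l in D}    # D sigma = {P(a)}

left = C_sigma - D_sigma                      # (C sigma) \ (D sigma) is empty
right = {apply_sub(l, sigma) for l in C - D}  # (C \ D) sigma = {P(a)}
assert left != right
```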
However, as the following lemma shows, we have equality under an additional injectivity condition.
Lemma 6.5. Let C, D be clauses and σ a substitution.
1. Then CσzDσ Ď pCzDqσ.
2. If for every L P Dσ there is at most one L1 P C YD with L1 σ “ L, then pCzDqσ Ď CσzDσ.
Proof. If L P CσzDσ then there is an L0 P C s.t. L “ L0 σ. But L0 R D, for if L0 P D then L0 σ P Dσ, which is not the case. So L0 P CzD and hence L “ L0 σ P pCzDqσ.
For 2, let L P pCzDqσ, then there is L0 P C s.t. L0 R D and L0 σ “ L, so L P Cσ. Suppose
L P Dσ, then there would be L1 P D s.t. L1 σ “ L. But then, by the assumption, L0 “ L1
which is a contradiction. Therefore L R Dσ and so L P CσzDσ.
Lemma 6.6. Let E be a big-step resolvent of C and D and let C 1 ďss C and D1 ďss D. Then
C 1 ďss E or D1 ďss E or there is a big-step resolvent E 1 of C 1 and D1 s.t. E 1 ďss E.
Proof. By Lemma 6.2 we can assume that C and D are variable-disjoint while preserving the
assumptions C 1 ďss C and D1 ďss D. Let C0 Ď C and D0 Ď D s.t. C0 Y D0 is unifiable with
mgu µ s.t. E “ ppCzC0 q Y pDzD0 qqµ. Let ν be the mgu of C0 and λ be the mgu of D0 and let
C0 ν “ tLu and D0 λ “ tKu. Since C0 Y D0 is unifiable and ν, λ are most general, also tL, Ku
is unifiable with a mgu µ˚ . Then µ “ pν Y λqµ˚ .
By Lemma 6.4 applied to C 1 ďss C and the factor Cν, there is a factor C 1 ν 1 of C 1 s.t. C 1 ν 1 ďθss Cν
and there is at most one L2 P C 1 ν 1 s.t. L2 θ “ L. Assume L R C 1 ν 1 θ. We know that C 1 ν 1 θ Ď Cν
and tLu “ C0 ν. So C 1 ν 1 θ Ď CνzC0 ν Ď pCzC0 qν and therefore C 1 ν 1 θµ˚ Ď pCzC0 qνµ˚ Ď
ppCzC0 qν Y pDzD0 qλqµ˚ “ E. So C 1 ν 1 ďss E and, since C 1 ν 1 is a factor of C 1 , also C 1 ďss E by Lemma 6.3. So from now on we assume that L P C 1 ν 1 θ.
By Lemma 6.4 applied to D1 ďss D and the factor Dλ, there is a factor D1 λ1 of D1 s.t. D1 λ1 ďτss
Dλ and there is at most one K 2 P D1 λ1 s.t. K 2 τ “ K. As above K R D1 λ1 τ implies D1 ďss E
and therefore, from now on, we assume K P D1 λ1 τ .
Since L P C 1 ν 1 θ there is a L2 P C 1 ν 1 s.t. L2 θ “ L and similarly there is a K 2 P D1 λ1 s.t.
K 2 τ “ K. As tL, Ku is unifiable with mgu µ˚ we have Lµ˚ “ Kµ˚ and hence L2 θµ˚ “ K 2 τ µ˚ .
Let µ2 be a mgu of tL2 , K 2 u, then µ2 ď pθ Y τ qµ˚ . We define
E 1 “ resL2 ,K 2 pC 1 ν 1 , D1 λ1 q
Then E 1 is a big-step resolvent of C 1 and D1 by definition. It remains to show that E 1 ďss E.
To that aim, let ε be a substitution s.t. µ2 ε “ pθ Y τ qµ˚ . Then
E 1 ε “ ppC 1 ν 1 ztL2 uq Y pD1 λ1 ztK 2 uqqµ2 ε “ ppC 1 ν 1 ztL2 uqθ Y pD1 λ1 ztK 2 uqτ qµ˚ .
There is at most one L0 P C 1 ν 1 with L0 θ “ L2 θ and – analogously – there is at most one
K0 P D1 λ1 with K0 τ “ K 2 τ . Therefore we can apply Lemma 6.5 to obtain
“ ppC 1 ν 1 θztL2 uθq Y pD1 λ1 τ ztK 2 uτ qqµ˚
and since L2 θ “ L, K 2 τ “ K and C 1 ν 1 θ Ď Cν and D1 λ1 τ Ď Dλ we have
Ď ppCνztLuq Y pDλztKuqqµ˚
and as C0 ν “ tLu and D0 λ “ tKu we have
Ď ppCzC0 qν Y pDzD0 qλqµ˚ “ ppCzC0 q Y pDzD0 qqµ “ E,
i.e., E 1 ďεss E.
Lemma 6.7. Let E be a big-step paramodulant of C and D and let C 1 ďss C and D1 ďss D.
Then C 1 ďss E or D1 ďss E or there is a big-step paramodulant E 1 of C 1 and D1 s.t. E 1 ďss E.
Proof Sketch. Follow the same strategy as for Lemma 6.6 above.
We can now prove our main lemma on subsumption in resolution deductions which will be
useful on several occasions.
Lemma 6.8. Let C1 , . . . , Cn be a resolution deduction and let Ck1 ďss Ck . Then there is a
resolution deduction C11 , . . . , Cn1 s.t. for all i P t1, . . . , nuztku: if i ă k or i is an initial position,
then Ci1 “ Ci and otherwise: Ci1 ďss Ci .
Note that C1 , . . . , Cn being a deduction from a clause set S does not entail that C11 , . . . , Cn1 is
also a deduction from S. If Ck P S and Ck1 R S, then C11 , . . . , Cn1 is a deduction from S Y tCk1 u
or, if Ck occurs only once in C1 , . . . , Cn , even a deduction from pSztCk uq Y tCk1 u.
Proof. We consider the deduction C1 , . . . , Ck´1 , Ck1 and show by induction on n that there are clauses Ck`1 1 , . . . , Cn1 s.t. C1 , . . . , Ck´1 , Ck1 , Ck`1 1 , . . . , Cn1 is a deduction. If n “ k we are done. For the induction step, assume the clauses exist for n. If Cn`1 is an initial clause, let Cn`1 1 “ Cn`1 and we are done. If Cn`1 is a big-step resolvent of Cj and Cl then by induction hypothesis Cj1 ďss Cj and Cl1 ďss Cl . So by Lemma 6.6 we have i) Cj1 ďss Cn`1 or ii) Cl1 ďss Cn`1 or iii) there is a big-step resolvent D of Cj1 and Cl1 s.t. D ďss Cn`1 . We let Cn`1 1 “ Cj1 in case i), Cn`1 1 “ Cl1 in case ii), and Cn`1 1 “ D in case iii), and hence have Cn`1 1 ďss Cn`1 . If Cn`1 is a big-step paramodulant, proceed analogously using Lemma 6.7.
We are now in a position to prove a stronger version of Proposition 6.1.
Theorem 6.1. Let S, T be clause sets s.t. S ďss T . If T has a resolution refutation of length
n, then S has a resolution refutation of length at most n.
Proof. Let ρ be a resolution refutation of a finite T0 Ď T of length at most n. Let S0 be a finite subset of S s.t. S0 ďss T0 . W.l.o.g. assume that every clause occurs at most once in ρ. Then, using Lemma 6.8, replace each D P T0 occurring in ρ by a C P S0 with C ďss D; each application turns the current refutation into one of the same length in which D has been replaced by C. Iterating over all D P T0 yields a refutation of S0 , and hence of S, of length at most n.
The above theorem shows that a clause set should be reduced by subsumption before the search
for a resolution refutation is started. The following proposition shows that we can also restrict
the search for a refutation in such a way that we never derive a clause which is subsumed by
a clause already derived. This is called forward subsumption and is one of the most important
techniques for avoiding redundancy in first-order resolution theorem proving.
Theorem 6.2 (forward-subsumption). If S has a resolution refutation of length n, then S has
a resolution refutation C1 , . . . , Cm for some m ď n s.t. there are no i ă j with Ci ďss Cj .
Proof. Let C1 , . . . , Cm be a resolution refutation and let i ă j s.t. Ci ďss Cj . Apply Lemma 6.8 in order to replace Cj by Ci and thus obtain a refutation C1 , . . . , Cj´1 , Ci , Cj`1 1 , . . . , Cm 1 . After dropping the copy of Ci at position j we obtain a refutation C1 , . . . , Cj´1 , Cj`1 1 , . . . , Cm 1 . Repeating this step terminates since it decreases the length of the derivation, and it terminates with a refutation satisfying the condition of the theorem.
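The following is a hedged sketch of how forward subsumption is typically wired into a saturation loop. For simplicity this toy version works on ground clauses only, where (taking σ to be the identity in Definition 6.1) C ďss D reduces to C Ď D, and uses propositional literals as strings with `~` marking negation; a real prover would use full first-order subsumption and term indexing.

```python
def resolvents(C, D):
    """All ground binary resolvents of clauses C and D
    (literals are strings, '~' marks negation)."""
    out = []
    for L in C:
        comp = L[1:] if L.startswith("~") else "~" + L
        if comp in D:
            out.append((C - {L}) | (D - {comp}))
    return out

def refute(clauses):
    """Saturation with forward subsumption: a newly picked clause is
    discarded if it is subsumed by (here: a superset of) a kept clause."""
    kept = []
    todo = [frozenset(c) for c in clauses]
    while todo:
        C = todo.pop(0)
        if any(K <= C for K in kept):   # forward subsumption: C is redundant
            continue
        if not C:
            return True                 # empty clause derived: refutation found
        for K in kept:
            todo.extend(frozenset(R) for R in resolvents(C, K))
        kept.append(C)
    return False                        # saturated without the empty clause
```

For instance, refute on the unsatisfiable set tpu, t¬p, qu, t¬qu succeeds, while on a satisfiable set the loop saturates and returns False.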
6.2 Tautology deletion
Definition 6.2. A clause C is called tautological if there is a literal L s.t. both L and its dual ¬L are elements of C.
Tautological clauses may be derived from non-tautological clauses, e.g. by resolving tP pf pxqq, ¬Qpf pxqq, Rpxqu with tQpyq, ¬P pyqu under the unifier ryzf pxqs, which yields the tautological clause tP pf pxqq, ¬P pf pxqq, Rpxqu.
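Checking Definition 6.2 is a one-liner once a literal representation is fixed. A sketch in Python, with literals as strings and `~` marking negation (a convention of this sketch, not of the script); note it only detects syntactically identical duals, which is all the definition requires:

```python
def dual(L):
    """The dual of a literal: strip or add the leading '~'."""
    return L[1:] if L.startswith("~") else "~" + L

def is_tautological(clause):
    """Definition 6.2: a clause is tautological iff it contains
    some literal together with its dual."""
    return any(dual(L) in clause for L in clause)
```

On the clause derived in the example above, is_tautological({"P(f(x))", "~P(f(x))", "R(x)"}) holds.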
Lemma 6.9. If S has a resolution refutation ρ, then there is a resolution refutation ρ1 of S s.t. |ρ1 | ď |ρ| and every clause in ρ1 is an ancestor of the empty clause at the end of ρ1 .
Without Proof.
Theorem 6.3. If ρ is a resolution refutation of S, then S has a resolution refutation ρ1 which
does not contain tautologies and satisfies |ρ1 | ď |ρ|.
Proof. By Lemma 6.9 we can assume that ρ only contains ancestors of the empty clause. If a tautological clause is an ancestor of the empty clause, both of its dual literals must eventually disappear. The negative literal cannot be removed by paramodulation, so it must be removed by resolution. Let L be the positive and ¬L the negative literal. Then the resolution step which removes ¬L is of the form: from C Y tL, ¬Lu and D Y tKu infer pC Y D Y tLuqµ, where µ is an mgu of K and L. Note that Kµ “ Lµ and hence that D Y tKu ďµss pC Y D Y tLuqµ. But by the forward-subsumption theorem there is a ρ1 with |ρ1 | ď |ρ| in which no clause is subsumed by an earlier derived clause.
This theorem shows that deriving tautological clauses is useless; avoiding them reduces the size of the search space.
Chapter 7
Completeness
In this chapter it is important to distinguish between first-order logic with equality and first-order logic without equality. The former interprets the binary predicate symbol “ as actual equality in a structure; the latter treats “ just like any other predicate symbol. Formally, these are two different notions of structure and hence two different notions of truth, satisfiability, validity, etc. To illustrate this difference, consider the following example.
Example 7.1. Let L “ t0, 1, `, ´, ¨u be the language of rings. Let M “ pZ, Iq be the structure
in first-order logic with equality defined by I being the standard interpretation of L. Then Ip“q
is the actual equality relation on Z.
If we work in first-order logic without equality, then “ is just another binary predicate symbol, whose interpretation we have to fix when defining a structure. For example, let M1 “ pZ, I 1 q where I 1 |L “ I and I 1 p“q is defined by px, yq P I 1 p“q ô x ” y pmod mq for some fixed m ě 2. Then, letting ϕm “ @x px “ 0 _ x “ 1 _ ¨ ¨ ¨ _ x “ m ´ 1q, we have M1 ( ϕm but M * ϕm .
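Example 7.1 can be spot-checked by computation. The following Python fragment, with m “ 3 and a finite sample of integers (so a sanity check rather than a proof), confirms that congruence modulo m behaves like a congruence for the ring operations and satisfies ϕm , while true equality does not:

```python
m = 3
sample = range(-10, 11)

def eq(x, y):
    """The non-standard interpretation I'(=): congruence modulo m."""
    return x % m == y % m

# spot check: eq is a congruence w.r.t. + and * on the sample
for x in sample:
    for y in sample:
        if eq(x, y):
            assert eq(x + 5, y + 5) and eq(x * 7, y * 7)

# phi_m: every element "equals" one of 0, ..., m-1
assert all(any(eq(x, i) for i in range(m)) for x in sample)      # M' satisfies phi_m
assert not all(any(x == i for i in range(m)) for x in sample)    # M does not
```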
We will first prove the completeness theorem for first-order logic without equality and then base the completeness of first-order logic with equality on that. A central tool in the proof of completeness is the notion of a ground refutation.
Definition 7.1. A clause D is called a ground instance of a clause C if D contains no variables and there is a substitution σ s.t. Cσ “ D. A resolution deduction is called a ground deduction if it consists of ground clauses only.
Note that, in particular, in a ground deduction every unifier is the identity substitution.
Definition 7.2. For a clause set S we define GpSq “ tD | D is a ground instance of a C P Su. Extending the terminology, a ground refutation of GpSq is also called a ground refutation of S.
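Since GpSq is infinite in general, any implementation can only enumerate it up to a bound. The following sketch (Python; with a term encoding of my own choosing: constants and function applications as tuples, variables as plain strings declared explicitly) generates the ground instances of a single clause up to a given term depth:

```python
from itertools import product

def ground_terms(consts, funcs, depth):
    """All ground terms over the given constants and function symbols
    (pairs of name and arity), nested up to the given depth."""
    terms = [(c,) for c in consts]
    for _ in range(depth):
        new = [(f, *args) for f, n in funcs for args in product(terms, repeat=n)]
        terms = terms + [t for t in new if t not in terms]
    return terms

def ground_instances(clause, variables, consts, funcs, depth):
    """A finite part of G(S) for one clause: substitute ground terms
    (up to the given depth) for its variables in all possible ways."""
    terms = ground_terms(consts, funcs, depth)

    def subst(t, sub):
        if isinstance(t, str) and t in variables:
            return sub[t]
        return tuple(subst(s, sub) for s in t) if isinstance(t, tuple) else t

    for choice in product(terms, repeat=len(variables)):
        sub = dict(zip(variables, choice))
        yield frozenset(subst(lit, sub) for lit in clause)
```

For the clause tP pxqu over one constant a and one unary function f, depth 1 yields the two instances tP paqu and tP pf paqqu.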
7.1 Completeness without equality
Lemma 7.1. Let S be a clause set. Then S is satisfiable in first-order logic without equality iff
GpSq is satisfiable in first-order logic without equality.
Proof. The left-to-right implication follows directly from the observation that every model of S
is a model of GpSq.
For the other direction, let I be a propositional interpretation of GpSq s.t. IpGpSqq “ 1. We
define a first-order structure MI as follows: the domain of MI are all ground terms of the
language of S. The interpretation of a term t is defined as tMI “ t and the interpretation of
the predicate symbols as P MI pt1 , . . . , tn q “ IpP pt1 , . . . , tn qq.
Let C P S and let D be a ground instance of C. Then D P GpSq, so IpDq “ 1 and hence MI ( D. But since the domain of MI only contains ground terms, satisfying all ground instances of C is equivalent to satisfying the universal closure of C. Therefore MI ( C for all C P S.
The key observation for basing the completeness proof on the notion of subsumption is that S ďss GpSq.
Theorem 7.1 (Completeness). Let S be a clause set. If S is unsatisfiable in first-order logic
without equality, then S has a resolution refutation.
In fact, a stronger statement is true: S even has a resolution refutation without reflexivity
instances and paramodulation inferences.
Proof. Let S be unsatisfiable, then by Lemma 7.1 also GpSq is unsatisfiable. By the completeness
of propositional resolution there is a propositional resolution refutation of GpSq, i.e., a first-order
resolution refutation of GpSq which consists only of initial clauses from S and ground resolution.
Since S ďss GpSq we can apply Theorem 6.1 in order to obtain a resolution refutation of S.
This proof provides, at least on the theoretical level, an alternative method for showing that a first-order clause set is unsatisfiable: generate ground instances from GpSq and refute them using propositional resolution. While this unification-free method is complete, it is much less efficient than first-order resolution with unification and plays no role in practice.
7.2 Completeness with equality
In order to prove the completeness theorem for first-order logic with equality we will make use
of an explicit axiomatisation of the theory of equality EQL (see Chapter 3) as a clause set.
Definition 7.3. Let L be a first-order language. We define the clause set
EQL “ ttx “ xu; t¬ x “ y, y “ xu; t¬ x “ y, ¬ y “ z, x “ zuu
Y tt¬ x1 “ y1 , . . . , ¬ xn “ yn , f px1 , . . . , xn q “ f py1 , . . . , yn qu | f {n P Lu
Y tt¬ x1 “ y1 , . . . , ¬ xn “ yn , ¬ P px1 , . . . , xn q, P py1 , . . . , yn qu | P {n P Lu.
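EQL is finite once the language is fixed, and easy to generate mechanically. A sketch in Python, with clauses as lists of literal strings and `~` marking negation (a purely illustrative encoding, not prover9 syntax):

```python
def eq_axioms(funcs, preds):
    """Generate EQ_L for a language given as dicts mapping function and
    predicate symbols to their arities."""
    clauses = [["x = x"],                          # reflexivity
               ["~ x = y", "y = x"],               # symmetry
               ["~ x = y", "~ y = z", "x = z"]]    # transitivity
    for f, n in funcs.items():                     # function congruence
        xs = [f"x{i}" for i in range(1, n + 1)]
        ys = [f"y{i}" for i in range(1, n + 1)]
        prem = [f"~ {x} = {y}" for x, y in zip(xs, ys)]
        clauses.append(prem + [f"{f}({','.join(xs)}) = {f}({','.join(ys)})"])
    for p, n in preds.items():                     # predicate congruence
        xs = [f"x{i}" for i in range(1, n + 1)]
        ys = [f"y{i}" for i in range(1, n + 1)]
        prem = [f"~ {x} = {y}" for x, y in zip(xs, ys)]
        clauses.append(prem + [f"~{p}({','.join(xs)})", f"{p}({','.join(ys)})"])
    return clauses
```

For the group language of Chapter 5 one would call eq_axioms({"f": 2, "g": 1}, {}) and add the resulting clauses to the input clause set.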
Lemma 7.2. Every clause in EQL has a deduction from tautologies and reflexivity using
paramodulation.
Proof. For reflexivity, there is nothing to do. For symmetry, paramodulate with the equation x “ y from the tautology t¬ x “ y, x “ yu into the reflexivity clause tx “ xu, replacing the first occurrence of x by y; this yields t¬ x “ y, y “ xu. For transitivity, paramodulate with y “ z from the tautology t¬ y “ z, y “ zu into the tautology t¬ x “ y, x “ yu, replacing y in the literal x “ y by z; this yields t¬ x “ y, ¬ y “ z, x “ zu. For f -congruence, paramodulate with x1 “ y1 from the tautology t¬ x1 “ y1 , x1 “ y1 u into the reflexivity instance tf px1 , . . . , xn q “ f px1 , . . . , xn qu, replacing the first occurrence of x1 by y1 ; this yields t¬ x1 “ y1 , f px1 , . . . , xn q “ f py1 , x2 , . . . , xn qu. Continuing in the same way with x2 , . . . , xn yields t¬ x1 “ y1 , . . . , ¬ xn “ yn , f px1 , . . . , xn q “ f py1 , . . . , yn qu. For P -congruence, proceed analogously, starting from the tautology t¬ P px1 , . . . , xn q, P px1 , . . . , xn qu and ending with t¬ x1 “ y1 , . . . , ¬ xn “ yn , ¬ P px1 , . . . , xn q, P py1 , . . . , yn qu.
Lemma 7.3. Let S be a clause set. S is satisfiable in first-order logic with equality iff S Y EQL
is satisfiable in first-order logic without equality.
Proof. Let M be an L-structure with equality, define the L Y t“u-structure without equality
M1 “ pD, I 1 q where I 1 |L “ I and I 1 p“q is equality in D. Since M interprets “ as equality in
D, we have M ( ϕ iff M1 ( ϕ. In addition, equality in D is a congruence relation w.r.t. L and
therefore M1 ( EQL .
For the other direction, let M “ pD, Iq be an L-structure without equality which satisfies EQL ,
then Ip“q is a congruence relation on D w.r.t. L. Define M1 “ pD{Ip“q, I 1 q where I 1 is the
interpretation induced by I on D{Ip“q. Note that I 1 is well-defined because Ip“q is a congruence
relation. Then we have M ( ϕ iff M1 ( ϕ.
Theorem 7.2 (Completeness). Let S be a clause set. If S is unsatisfiable in first-order logic
with equality, then S has a resolution refutation.
Proof. Let S be unsatisfiable in first-order logic with equality. Then S Y EQL is unsatisfiable
in first-order logic without equality by Lemma 7.3. By the completeness theorem for first-order
logic without equality we obtain a resolution refutation of S Y EQL . By Lemma 7.2 we obtain
a resolution refutation of S Y T where T is a set of tautological clauses. By Theorem 6.3 we
obtain a resolution refutation of S.
Chapter 8
Further Topics
8.1 Induction
Resolution theorem provers, being sound and complete for first-order logic, do not handle induction. However, it is possible to prove theorems by induction by supplying the necessary induction axioms, thus reducing the problem to checking the validity of a first-order formula. For instance, to prove the associativity of ` from its defining equations, we need an induction on the rightmost variable of the associativity axiom. This information can be passed to prover9 as follows:
formulas(sos).
all x x + 0 = x.
all x all y x + s(y) = s(x + y).
all z ( P(z) <-> all x all y ( x + y ) + z = x + ( y + z )).
( P(0) & all z ( P(z) -> P(s(z))) ) -> all z P(z).
end_of_list.
formulas(goals).
all x all y all z ( x + y ) + z = x + ( y + z ).
end_of_list.
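The goal itself can of course be sanity-checked by computation before invoking the prover. The following Python fragment (with numerals encoded as nested tuples sp. . . sp0q . . .q, an encoding of my own) implements the two defining equations of ` and tests associativity on small instances — a finite check which, unlike the induction axiom above, proves nothing in general:

```python
def num(n):
    """The numeral s^n(0), encoded as 0 or ("s", predecessor)."""
    return 0 if n == 0 else ("s", num(n - 1))

def plus(x, y):
    """The two defining equations: x + 0 = x and x + s(y) = s(x + y)."""
    return x if y == 0 else ("s", plus(x, y[1]))

# finite check of (x + y) + z = x + (y + z) on small numerals
assert all(plus(plus(num(a), num(b)), num(c)) == plus(num(a), plus(num(b), num(c)))
           for a in range(5) for b in range(5) for c in range(5))
```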
The automated generation of induction invariants is a difficult problem which is of considerable
importance for applications of automated deduction in areas such as software verification.