PREDICATE LOGIC

Jorma K. Mattila
LUT, Department of Mathematics and Physics

1 Basic Concepts

In predicate logic the formalism of propositional logic is extended and made more fine-grained. Thus it is possible to express more complicated statements of natural language and use them in formal inference.

Example 1.1. Consider the inference

  All ravens fly.
  Peter is a raven.
  So, Peter flies.

From the viewpoint of propositional logic, the sentences are entirely different atoms, and thus they have none of the common parts needed to analyse the inference. However, the sentences clearly do share parts: the first sentence speaks about ravens and flying, the second about ravens and Peter, and the last about Peter and flying. So points of contact can be found.

Predicate language, like propositional language, consists of the following parts:

Syntax: the alphabet and the rules for forming wffs.

Semantics: questions concerning models: interpretation and the determination of truth values, i.e. how to translate wffs, for example, into natural language, and what the exact conditions for their truth are.

Proof theory: the principles of the calculus, the axioms, and the inference rules. Theorems are deduced from the axioms by means of the inference rules. (Axioms are premises that are taken as the basic truths of the system. Thus they must be logically true.)

Let us denote the pure predicate language by the symbol P.

1.1 Syntactic Components of Predicate Logic

Predicate logic contains all the components of propositional logic, including propositional variables and constants. In addition, predicate logic contains terms, predicates, and quantifiers. Terms are typically used in place of nouns and pronouns. They are combined into sentences by means of predicates. For example, in the sentence "John loves Mary", the nouns are "John" and "Mary", and the predicate is "loves".
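The split of "John loves Mary" into a predicate applied to an argument list of terms can be sketched in code. This is a hypothetical illustration only; the function name and the single fact it encodes are mine, not part of the text. A predicate becomes a Boolean-valued function over terms.

```python
# Hypothetical sketch: the predicate "loves" as a Boolean-valued
# function over terms.  Only the pair ("John", "Mary") is assumed
# to make the predicate true here.
def loves(x, y):
    return (x, y) in {("John", "Mary")}

print(loves("John", "Mary"))  # True
print(loves("Mary", "John"))  # False
```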
The same is true when the sentence "John loves Mary" is translated into predicate logic, except that "John" and "Mary" are now called terms. Predicate logic uses quantifiers to indicate whether a statement is always true, sometimes true, or never true. In this sense, the quantifiers correspond to words such as "all", "some", "never", and related expressions.

Definition 1.1. The alphabet of P consists of the connectives (familiar from propositional logic) ¬, ∧, ∨, → and ↔, the parentheses ( and ), and the comma. The other symbols belonging to the alphabet are

  individual constants: a, b, c, . . .
  variables: x, y, z, . . .
  predicate symbols: P, Q, R, . . .
  function symbols: f, g, . . .
  the identity symbol =
  the quantifiers ∀ and ∃.

Constants are thought of as names of individuals of the (real or ideal) world (more precisely: an interpretation maps the constants to elements of a set A of individuals). Variables also refer to individuals, but not to any particular ones. They correspond most closely to pronouns, like he, she, it, . . . Expressions containing so-called free variables cannot have a truth value until the variables have been given constant values.

1.1.1 The Universe of Discourse

To explain the main concepts of this section, we use the following logical argument:

1. Jane is Paul's mother.
2. Jane is Mary's mother.
3. Any two persons having the same mother are siblings.
4. Paul and Mary are siblings.

The truth of the statement "Jane is Paul's mother" can only be assessed within a certain context. There are many people named Jane and Paul, and without further information the statement in question can refer to many different people, which makes it ambiguous. To remove such ambiguities, we introduce the concept of a universe, or domain.

Definition 1.2. The universe of discourse, or domain, is the collection of all persons, ideas, symbols, data structures, and so on, that affect the logical argument under consideration.
The elements of the universe of discourse are called individuals. In the argument concerning Mary and Paul, the universe of discourse may, for example, consist of the people living in a particular house or on a particular block. Many arguments involve numbers, and in this case one must stipulate whether the domain is the set of natural numbers, the set of integers, the set of real numbers, or even the set of complex numbers. In fact, the truth of a statement may depend on the domain selected. The statement "There is a smallest number" is true in the domain of natural numbers, but false in the domain of integers.

To avoid trivial cases, one stipulates that every universe of discourse must contain at least one individual. Hence, the set of all natural numbers less than 0 does not constitute a universe, because there are no negative natural numbers. Instead of the word individual, one sometimes uses the word object, as in "the domain must contain at least one object".

To refer to a particular individual or object, identifiers are used. These identifiers are called individual constants. Each individual constant must uniquely identify a particular individual and no other one. For example, if the universe of discourse consists of persons, there must not be two persons with the same name.

1.1.2 Predicates

Generally, predicates make statements about individuals. To illustrate this notion, consider the following statements:

(a) Mary and Paul are siblings.
(b) Jane is the mother of Mary.
(c) Tom is a cat.
(d) The sum of 2 and 3 is 5.

In each of these statements there is a list of individuals, given by the argument list, together with phrases that describe certain relations among, or properties of, the individuals mentioned in the argument list. These properties are referred to as predicates. For example, in statement (a) the argument list is given by "Mary" and "Paul", in that order, whereas the predicate is described by the phrase "are siblings".
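As a hedged sketch of the four statements above (the identifiers and the tiny fact base are my own illustration, not from the text), each predicate can be rendered as a Boolean function whose argument list has one, two, or three entries:

```python
# Hypothetical mini fact base for statements (a)-(d).
def siblings(x, y):          # two-place: "x and y are siblings"
    return {x, y} == {"Mary", "Paul"}

def mother(x, y):            # two-place: "x is the mother of y"
    return (x, y) in {("Jane", "Mary"), ("Jane", "Paul")}

def cat(x):                  # one-place: "x is a cat"
    return x == "Tom"

def is_sum(x, y, z):         # three-place: "z is the sum of x and y"
    return x + y == z

print(siblings("Mary", "Paul"), mother("Jane", "Mary"),
      cat("Tom"), is_sum(2, 3, 5))   # True True True True
```

The number of parameters of each function corresponds to the arity of the predicate, a notion made precise in the next paragraphs.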
The entries of the argument list are called arguments. In this sense, arguments are terms, i.e. variables or individual constants. In predicate logic, each predicate is given a name, which is followed by the list of arguments. For example, to express "Jane is the mother of Mary", one would choose an identifier, say "mother", to express the predicate "is the mother of", and one would write mother(Jane, Mary). Very often only single letters are used for predicate names and terms. We usually write, for example, M(j, m) instead of mother(Jane, Mary). The order of the arguments is important. Clearly, the statements M(j, m) and M(m, j) have completely different meanings.

Each predicate is associated with an arity, which indicates the number of elements in the argument list of the predicate. Unary predicates describe properties of objects; for example, P(x) indicates that x has the property P. The interpretation of a predicate P in a set of objects A is the set of those elements of A that have the property P, i.e., {α ∈ A | P(α)}.

A predicate with arity n is often called an n-place predicate. These predicates indicate relations between objects. For example, if Q is a two-place predicate, then we interpret Q as a binary relation on a universe of discourse A, i.e. as the set of pairs (α, β) ∈ A × A such that α is in the relation Q to β. Hence we can write {(α, β) ∈ A × A | Q(α, β)}.

Example 1.2. The predicate "is a cat" is a one-place predicate. The predicate "is the mother of" is a two-place predicate, i.e., its arity is 2. The statement "The sum of 2 and 3 is 6" (which is false) contains the three-place predicate "is the sum of".

The identity '=' is a constant two-place predicate that always has the same interpretation: the identity on a set of terms A is the diagonal relation {(α, α) | α ∈ A}. The identity symbol can be inserted only between two terms.

Definition 1.3.
In the predicate language P, an atomic formula (or an atom) is

(a) a predicate name followed by an argument list, or
(b) an identity t1 = t2, where t1 and t2 are terms (i.e. individual constants or variables).

Atomic formulas are statements, and they can be combined by logical connectives in the same way as atoms in propositional logic. For example, using the natural language counterparts mentioned above, we can write M(j, m) → ¬M(m, j) to mean "If Jane is Mary's mother, then Mary is not Jane's mother."

If all arguments of a predicate are individual constants, then the resulting atomic formula must be either true or false. For example, if the universe of discourse consists of Jane, Doug, Mary, and Paul, we have to know for each ordered pair of individuals whether or not the predicate "is the mother of" (or "mother" for short) is true. This can be done in the form of a table. Any method that assigns truth values to all possible combinations of individuals of a predicate is called an assignment of the predicate. For example, the following table is an assignment of the predicate "mother":

          Doug  Jane  Mary  Paul
   Doug    F     F     F     F
   Jane    F     F     T     T
   Mary    F     F     F     F
   Paul    F     F     F     F

In general, if a predicate has two arguments, its assignment can be given by a table in which the rows correspond to the first argument and the columns to the second. In a finite universe of discourse, one can represent the assignments of predicates with arity n by n-dimensional arrays. For example, properties are assigned by one-dimensional arrays, predicates of arity 2 by two-dimensional arrays, and so on.

Note that mathematical symbols such as >, ≥ and ≤ are predicates. However, these predicates are normally used in infix notation. By this we mean that they are placed between the arguments. For example, we usually write 2 > 1 instead of >(2, 1).

1.1.3 Variables and Instantiations

Often, one does not want to associate the arguments of an atomic formula with a particular individual. To avoid this, variables are used.
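Before turning to variables, the assignment table for the predicate "mother" given above can be stored as a two-dimensional structure, just as the text suggests for arity-2 predicates. This is a minimal sketch; the dictionary layout is my own choice.

```python
# Rows = first argument, columns = second argument, as in the table.
people = ["Doug", "Jane", "Mary", "Paul"]
mother = {r: {c: False for c in people} for r in people}
mother["Jane"]["Mary"] = True   # Jane is the mother of Mary
mother["Jane"]["Paul"] = True   # Jane is the mother of Paul

print(mother["Jane"]["Paul"])   # True:  corresponds to M(j, p)
print(mother["Paul"]["Jane"])   # False: the order of arguments matters
```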
Variable names are frequently chosen from the end of the alphabet: x, y, z, etc., with or without subscripts. Examples of expressions containing variables include

  cat(x) → hastail(x),
  dog(y) ∧ brown(y),
  grade(x) → (x ≥ 0) ∧ (x ≤ 100).

As in propositional logic, expressions can be given names, i.e. meta-variables are used. For example, one can give the name A to an expression by writing A := B(x) → C(x), which means that when we write A we really mean "B(x) → C(x)".

Example 1.3. Consider the statement "If x is a cat then x has a tail." We formalize it as follows: let C(x) := "x is a cat" and T(x) := "x has a tail"; then the formalization of the whole sentence is A := C(x) → T(x). This is an open formula, because the variable x occurs in it. We cannot give it a truth value before we know the value of x. If S is a domain of objects over which x ranges, we can choose some object and substitute it for all occurrences of x in the formula. After doing this, we can determine the truth value of the resulting closed formula. Suppose that a ∈ S, and that a refers to Tom. When we substitute a for the instances of x in the formula, we obtain C(a) → T(a), whose interpretation in natural language is "If Tom is a cat then Tom has a tail".

Generally, if A is an expression, the expression obtained by replacing all occurrences of a variable x in A by the term t is denoted by S_t^x A. According to Example 1.3, S_a^x A stands for C(a) → T(a).

Definition 1.4. Let A represent an expression, x a variable, and t a term. Then S_t^x A represents the expression obtained by replacing all occurrences of x in A by t. S_t^x A is called an instantiation of A, and t is said to be an instance of x.

Example 1.4. Let a, b, and c be individual constants, P and Q predicate symbols, and x and y variables. Find S_a^x (P(a) → Q(x)), S_b^y (P(y) ∨ Q(y)), S_a^y Q(a), and S_a^y (P(x) → Q(x)).

Solution: S_a^x (P(a) → Q(x)) is P(a) → Q(a), and S_b^y (P(y) ∨ Q(y)) is P(b) ∨ Q(b).
Since Q(a) does not contain any y, replacing all occurrences of y by a leaves Q(a) unchanged, which means that S_a^y Q(a) = Q(a). Similarly, S_a^y (P(x) → Q(x)) = P(x) → Q(x).

1.1.4 Quantifiers

Consider the following statements:

1. All cats have tails.
2. Some people like their meat raw.
3. Everyone gets a break once in a while.

All these statements indicate how frequently certain things are true. In predicate logic, one uses quantifiers in this context. Specifically, we will discuss two quantifiers: the universal quantifier, which indicates that something is true for all individuals, and the existential quantifier, which indicates that a statement is true for some individuals.

Definition 1.5. Let A represent an expression, and let x represent a variable. If we want to indicate that A is true for all possible values of x, we write ∀x A. Here ∀x is called the universal quantifier, and A is called the scope of the quantifier. The variable x is said to be bound by the quantifier. The symbol ∀ is pronounced "for all".

The quantifier and the bound variable that follows it have to be treated as a unit, and this unit acts somewhat like a unary connective. Statements containing words like "every", "each", and "everyone" usually indicate universal quantification. Such statements must typically be reworded so that they start with "for every x", which is then translated to ∀x.

Example 1.5. Express "Everyone gets a break once in a while" in predicate logic.

Solution: We define B to mean "gets a break once in a while". Hence, B(x) means that x gets a break once in a while. The word "everyone" indicates that this is true for all x. This leads to the translation ∀x B(x).

Example 1.6. Express "All cats have tails" in predicate logic.

Solution: We first have to find the scope of the universal quantifier, which is "If x is a cat, then x has a tail". After choosing descriptive predicate symbols, we express this by the compound formula cat(x) → hastail(x).
This expression must be universally quantified to yield the required solution: ∀x (cat(x) → hastail(x)). Finally, letting C stand for "cat" and T for "hastail", we obtain the result ∀x (C(x) → T(x)). Under the quantifier we may use whatever variable we like; for example, ∀y (C(y) → T(y)) says the same thing.

Definition 1.6. Let A represent an expression, and let x represent a variable. If we want to indicate that A is true for at least one value of x, we write ∃x A. This statement is pronounced "There exists an x such that A". Here ∃x is called the existential quantifier, and A is called the scope of the existential quantifier. The variable x is said to be bound by the quantifier.

Statements containing such phrases as "some" and "at least one" suggest existential quantification. They should be rephrased as "there is an x such that", which is translated by ∃x.

Example 1.7. Let P be the property "likes their meat raw". Then ∃x P(x) can be translated as "There exist people who like their meat raw" or "Some people like their meat raw".

Example 1.8. If the universe of discourse is a collection of things, ∃x blue(x) should be understood as "There exist objects that are blue" or "Some objects are blue".

The quantifiers ∀x and ∃x have to be treated like unary connectives, and they are given a higher precedence than all binary connectives. For example, if P(x) means that x is living and Q(x) that x is dead, then one has to write ∀x (P(x) ∨ Q(x)) to indicate that everything is either living or dead. ∀x P(x) ∨ Q(x), in contrast, means that everything is living or x is dead.

The variable x in a quantifier is just a placeholder, and it can be replaced by any other variable name not appearing elsewhere in the expression. For example, ∀x P(x) and ∀y P(y) mean the same thing: they are logically equivalent. The expression ∀y P(y) is called a variant of ∀x P(x).

Definition 1.7.
An expression is called a variant of ∀x A if it is of the form ∀y S_y^x A, where y is any variable name and S_y^x A is the expression obtained from A by replacing all instances of x by y. Similarly, ∃x A and ∃y S_y^x A are variants of one another.

Quantifiers may be nested, as demonstrated by the following examples.

Example 1.9. Translate the sentence "There is somebody who knows everyone" into the language of predicate logic. To do this, use K(x, y) to express the fact that x knows y.

Solution: The best way to solve this problem is to proceed in steps. We write informally ∃x (x knows everybody). Here "x knows everybody" is still in English and means that for all y it is true that x knows y. Hence, "x knows everybody" = ∀y K(x, y). We now add the existential quantifier and obtain ∃x ∀y K(x, y).

Example 1.10. Translate "Everybody has somebody who is his or her mother" into predicate logic.

Solution: We define M to be the predicate "mother"; that is, M(x, y) stands for "x is the mother of y". The statement "Someone is the mother of y" becomes ∃x M(x, y). To express that this must be true for all y, we add the universal quantifier, which yields ∀y ∃x M(x, y).

The statement "Nobody is perfect" also includes a quantifier, "nobody", which expresses the absence of an individual with a certain property. In predicate logic, the fact that nobody has property P cannot be expressed directly. To express the fact that there is no x for which an expression A is true, one can use either ¬∃x A or ∀x ¬A. For example, if P represents the property of perfection, both ¬∃x P(x) and ∀x ¬P(x) indicate that nobody is perfect. In the first case the verbal translation is "It is not the case that there is somebody who is perfect", whereas in the second case it is "For everyone, it is not the case that he or she is perfect". The two methods of expressing "nobody is A" must of course be logically equivalent, i.e.,

  ¬∃x A ≡ ∀x ¬A.
(1.1)

According to Definitions 1.5 and 1.6, the variable appearing in the quantifier is said to be bound. For example, in the expression ∀x (P(x) → Q(x)), x appears three times, and each time x is a bound variable. Any variable that is not bound is said to be free. Later we will see that the same variable can occur both bound and free in an expression. For this reason, it is important to also indicate the position of the variable in question.

Example 1.11. Find the bound and free variables in ∀z (P(z) ∧ Q(x)) ∨ ∃y Q(y).

Solution: Only the variable x is free. All occurrences of z are bound, and so are all occurrences of the variable y.

Note that the status of a variable changes as expressions are divided into subexpressions. For example, in ∀x P(x), x occurs twice, and it is bound both times. This statement contains P(x) as a subexpression; nevertheless, in P(x) the variable x is free.

Instantiations only affect free variables. Specifically, if A is an expression, S_t^x A only affects the free occurrences of the variable x. For example, S_y^x ∀x P(x) is still ∀x P(x), since the variable x is not free there. However, S_y^x (Q(x) ∧ ∀x P(x)) yields Q(y) ∧ ∀x P(x). Hence, instantiation treats the variable x differently depending on whether it is free or bound, even if this variable appears twice in the same expression. Obviously, two things are only identical if they are treated identically. This implies that if a variable appears both free and bound within the same expression, we have in fact two different variables that happen to have the same name.

From this it follows that if several quantifiers use the same bound variable for quantification, then all these variables are local to their scope, and they are therefore distinct. To illustrate this, consider the statement "y has a mother". If M is the predicate name for "is the mother of", then this statement translates into ∃x M(x, y).
One obviously must not form the variant ∃y M(y, y), which says that y is her own mother. For similar reasons, there are restrictions on instantiations. For example, the instantiation S_x^y (∃x M(x, y)) is illegal, because its result would be ∃x M(x, x). In such cases, one tampers with the way in which a variable is defined, and this causes undesired side effects.

If all occurrences of x in an expression A are bound, we say that "A does not contain x free". If A does not contain x free, then the truth value of A does not change if x is instantiated to an individual constant; A is independent of x in this sense.

1.1.5 Restrictions of Quantifiers to Certain Groups

Sometimes quantification ranges over a subset of the universe of discourse. Suppose, for instance, that animals form the universe of discourse. How can one express sentences such as "All dogs are mammals" and "Some dogs are brown"?

Consider first the statement "All dogs are mammals". Since the quantifier should be restricted to dogs, one rephrases the statement as "If x is a dog, then x is a mammal". This immediately leads to ∀x (dog(x) → mammal(x)). Generally, the sentence ∀x (P(x) → Q(x)) can be translated as "All individuals with property P also have property Q".

Consider now the statement "Some dogs are brown". This statement means that there are some animals that are dogs and that are brown. The statement "x is a dog and x is brown" can be translated as dog(x) ∧ brown(x). "There are some brown dogs" can now be translated as ∃x (dog(x) ∧ brown(x)). The statement ∃x (P(x) ∧ Q(x)) can in general be interpreted as "Some individuals with property P also have property Q".

Note that if the universal quantifier is to apply to individuals with a given property, we use the conditional to restrict the universe of discourse. On the other hand, if we similarly restrict the application of the existential quantifier, we use conjunction.

Finally, consider statements containing the word "only", such as "Only dogs bark".
To convert this into predicate logic, it must be reworded as "It barks only if it is a dog" or, equivalently, "If it barks, then it is a dog". One therefore has ∀x (barks(x) → dog(x)).

2 Interpretations and Validity

2.1 Introduction

This section gives the semantic approach to predicate logic: it deals with interpretations of logical statements and with the soundness of logical arguments. Interpretations are fundamental to predicate logic, and they are therefore important in their own right. Moreover, interpretations allow one to distinguish between arguments that are sound and arguments that are not. Soundness is closely related to validity. Generally, an expression A is valid if A is true under all interpretations. Valid expressions in predicate logic play the same role as tautologies in propositional logic. In particular, logical implications and logical equivalences are defined as valid implications and valid equivalences, respectively.

For considering the truth of sentences of a predicate language, we divide the alphabet of L into two parts: the predicate symbols, function symbols, and individual constants form the non-logical alphabet of L, and the remaining symbols form its logical alphabet. The non-logical alphabet refers to the things described by the language, i.e., to objects and their mutual relationships outside the language. Hence, in each case, we can restrict L by giving a list of the non-logical alphabet under consideration. For example, L = {P, Q, R; a, b, c, d} is a predicate language having the predicate symbols P, Q, and R and the individual constants a, b, c, and d as its non-logical alphabet.

2.2 Interpretations

An interpretation of a logical expression contains the following components:

1. There must be a universe of discourse.
2. For each individual, there must be an individual constant that refers exclusively to this particular individual, and to no other.
3. Every free variable must be assigned a unique individual constant.
4.
There must be an assignment for each predicate used in the expression, including predicates of arity 0, which represent propositions.

The truth status of a sentence can be determined once an interpretation is given to the sentence or to the whole language L, i.e. we can speak about truth in some model. Let L = {P, Q, . . . ; a, b, . . .} be a predicate language and A a non-empty set.

Definition 2.1. An interpretation of L in a set A is a valuation V that associates an element of A with every constant symbol and an n-ary relation with every n-ary predicate symbol of L.

Example 2.1. Let L = {P, Q; a, b} be a predicate language where both predicate symbols are binary. Let A = Z+ (the set of positive integers). We can choose an interpretation of L, for example, such that V(a) := 3, V(b) := 2, V(P(x, y)) := x < y, and V(Q(x, y)) := x ≥ y.

A model of L consists of a non-empty set and an interpretation, according to the following definition.

Definition 2.2. A model of L is an ordered pair M = (A, V), where A is a non-empty set and V is an interpretation of L in the set A. A is the universe of discourse of M.

A term of predicate logic has an interpretation according to the following definition.

Definition 2.3. (a) If a term t is a constant symbol, then its interpretation in the set A is a certain element V(t) ∈ A. (b) If a term t is a variable symbol, then it can have any element of A as its value in a given interpretation V, i.e., V(t) ∈ A arbitrarily.

Example 2.2. Consider the predicate language of Example 2.1 in the model M = (Z+, V), where V is the valuation of Example 2.1. Determine the truth value of the formula P(a, b) → ¬Q(b, a) in the model M. We have the interpretation 3 < 2 → ¬(2 ≥ 3) for the formula; since the antecedent is false, the formula is true in M.

We now consider the truth of a formula in a model more precisely.

Definition 2.4. Let M = (A, V) be a model and α a formula of L.

(a) If P is a k-ary predicate symbol and a1, . . . , ak ∈ A are constants, then P(a1, . . .
, ak) is true in M, denoted M |= P(a1, . . . , ak), iff the k-tuple (a1, . . . , ak) belongs to the relation V(P).

(b) A formula ¬α is true in a model M iff α is false in M, i.e., M |= ¬α iff M ⊭ α.

(c) M |= α → β iff M ⊭ α or M |= β.

(d) M |= ∀x α iff M |= S_a^x α(x) for every a ∈ A.

Metatheorem 2.1. Let M = (A, V) be a model, α and β formulas of L, and γ an open formula of L in which x is free. Then we have

(a) M |= α → β iff M |= α ⇒ M |= β.
(b) M |= α ∨ β iff M |= α or M |= β.
(c) M |= α ∧ β iff M |= α and M |= β.
(d) M |= α ↔ β iff either M |= α and M |= β, or M ⊭ α and M ⊭ β.
(e) M |= ∀x γ(x) iff M |= S_a^x γ(x) for every a ∈ A.
(f) M |= ∃x γ(x) iff M |= S_a^x γ(x) for some a ∈ A.

Proof. The assertions of the metatheorem follow from Definition 2.4 by the mutual interdefinability of the connectives and that of the quantifiers.

Note that only closed formulas can have a truth value; it is meaningless to speak about the truth status of open formulas. On the other hand, we can speak about the satisfaction of open formulas, in the sense that in an interpretation the variables of an open formula can take values in the universe of discourse of the corresponding model, and according to the interpretation, it is possible to obtain a true expression from the formula. It may also happen that a given expression is neither true nor false in a model. For this reason, we define for every statement α of L a set of relevant models as follows:

  Mα = {M | M = (A, V), such that either M |= α or M ⊭ α}.

Definition 2.5. A statement α is valid iff M |= α for all M ∈ Mα. A statement α is satisfiable iff M |= α for some M ∈ Mα. A statement α is refutable iff M ⊭ α for some M ∈ Mα. A statement α is a contradiction iff M ⊭ α for all M ∈ Mα.

The closure of an open formula α is the closed formula obtained from α by universally binding all occurrences of the free variables of α.
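For a finite model, the truth conditions of Definition 2.4 and Metatheorem 2.1 can be checked exhaustively. The following is a minimal, hypothetical sketch (the universe, the interpretations of P and Q, and all helper names are my own illustration, not the text's): the clauses for ∀ and ∃ become all() and any() over the universe.

```python
# A finite model M = (A, V): universe A, with V interpreting the
# unary predicate symbols P and Q as subsets of A.
A = {1, 2, 3, 4}
P = {a for a in A if a > 2}    # V(P) = {x | x > 2}
Q = {a for a in A if a > 1}    # V(Q) = {x | x > 1}

def forall(phi):               # M |= ∀x φ(x): φ(a) holds for every a in A
    return all(phi(a) for a in A)

def exists(phi):               # M |= ∃x φ(x): φ(a) holds for some a in A
    return any(phi(a) for a in A)

def implies(p, q):             # M |= α → β iff M ⊭ α or M |= β
    return (not p) or q

print(forall(lambda a: implies(a in P, a in Q)))    # ∀x(P(x) → Q(x)): True
print(exists(lambda a: (a not in P) and (a in Q)))  # ∃x(¬P(x) ∧ Q(x)): True
```

Both formulas come out true in this model, so in the terminology of Definition 2.5 each of them is satisfiable: there is at least one model in which it is true.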
Hence, according to Definition 2.5, the validity of an open formula means considering the truth of the closure of the formula in all relevant models.

Definition 2.6. Let ∆ be a set of statements of L and M = (A, V) a model. M is a model of the set of statements ∆, denoted M |= ∆, iff M |= α for all statements α of the set ∆.

Now the concept of semantic entailment (or logical consequence) in L can be defined.

Definition 2.7. A statement α is a semantic entailment of a set ∆ of statements iff α is true in every model of ∆.

Example 2.3. Show that the statement ∃y ∀x Q(x, y) → ∀x ∃y Q(x, y) is valid.

Let M = (A, V) be any relevant model of the statement. If M ⊭ ∃y ∀x Q(x, y), then M |= ∃y ∀x Q(x, y) → ∀x ∃y Q(x, y) by Definition 2.4 (c). Thus we consider the case where M |= ∃y ∀x Q(x, y). In this case M |= ∀x Q(x, b) for some element b ∈ A, by Metatheorem 2.1 (f). Hence, for some element b ∈ A it holds that for all elements a ∈ A, M |= Q(a, b), by Definition 2.4 (d). Hence, for all elements a ∈ A there exists an element b ∈ A such that M |= Q(a, b). Hence M |= ∃y Q(a, y) holds for all a ∈ A by Metatheorem 2.1 (f), and thus M |= ∀x ∃y Q(x, y) holds by Definition 2.4 (d). Hence, in this case too, M |= ∃y ∀x Q(x, y) → ∀x ∃y Q(x, y) holds by Definition 2.4 (c). Because M was an arbitrarily chosen relevant model of the statement, the formula ∃y ∀x Q(x, y) → ∀x ∃y Q(x, y) is valid.

Example 2.4. Show that the statement ∀x ∃y Q(x, y) → ∃y ∀x Q(x, y) is not valid.

Consider the formula in the model M = (N, V) (N = {1, 2, . . .} is the set of natural numbers), where V(Q) = {(a, b) ∈ N × N | a < b}. Now M |= ∀x ∃y Q(x, y), because for every natural number a there exists a natural number b such that M |= Q(a, b), i.e. a < b. On the other hand, M ⊭ ∃y ∀x Q(x, y), because there does not exist a natural number b such that M |= Q(a, b) for all natural numbers a: the set N has no greatest element.
Hence M is a model in which our statement is not true. From this it follows that the statement is not valid.

We now consider the satisfaction of an arbitrary formula more closely. Let M = (A, V) be a model of L and t1, . . . , tn terms of L.

Definition 2.8. Let P be a unary predicate symbol. Then a formula P(t) is satisfied in a model M = (A, V) iff in the interpretation V, t has a value in A that has the property V(P), i.e. V(t) ∈ V(P).

Example 2.5. Consider a formula P(x), and let V(P(x)) := "x is a father", i.e. V(P(x)) = {x | x is a father}. Let A be the set of students. Then the formula P(x) is satisfied in the model M = (A, V) iff V gives x a value a ∈ A such that a is a father, i.e. x is a student who is also a father.

Definition 2.9. Let P be an n-ary predicate symbol, n ≥ 2. A formula P(t1, . . . , tn) is satisfied in a model M = (A, V) iff in the interpretation V, the terms t1, . . . , tn have values V(ti) ∈ A (i = 1, . . . , n) which are in the relation V(P), i.e. (V(t1), . . . , V(tn)) ∈ V(P).

Example 2.6. Consider a formula P(x, c), and let M = (A, V) be a model such that A is the set of soldiers of the Finnish army, V(P(x, y)) means that x is the superior of y, and V(c) is a soldier named Karhunen, whom we denote by k, i.e. k ∈ A. Then the formula P(x, c) is satisfied in M iff there exists a ∈ A such that (a, k) ∈ V(P(x, y)), i.e. iff a is a superior of Karhunen; for example, a is a captain and Karhunen is a sergeant.

If we want, we can use the satisfaction of a formula in a model when determining the truth status of a statement in a given model. However, we have to distinguish the notion "a statement is satisfiable" of Definition 2.5 from the notion "a formula is satisfied in a model" of Definitions 2.8 and 2.9.

Example 2.7. Show that the formula ∀x (P(x) ∨ ¬P(x)) is valid.
Let M = (A, V) be an arbitrary relevant model. Then M |= ∀x (P(x) ∨ ¬P(x)) iff M |= P(a) ∨ ¬P(a) for all a ∈ A. This is the case iff M |= P(a) or M |= ¬P(a) for all a ∈ A, i.e. M |= P(a) or M ⊭ P(a) for all a ∈ A. It is clear that P(a) is either true or false in any model, where a is any element of the domain of the model. From this it follows that ∀x (P(x) ∨ ¬P(x)) is true in every relevant model and hence valid. Example 2.8. Show that the formula ∃x (P(x) ∧ ¬P(x)) is a contradiction. Let M = (A, V) be an arbitrary relevant model. Then M ⊭ ∃x (P(x) ∧ ¬P(x)) iff M |= ¬∃x (P(x) ∧ ¬P(x)) iff M |= ∀x ¬(P(x) ∧ ¬P(x)) iff M |= ∀x (¬P(x) ∨ P(x)), which holds by Example 2.7. Hence the formula ∃x (P(x) ∧ ¬P(x)) is a contradiction. Example 2.9. Show that the formula ∃x (P(x) ∧ ¬Q(x)) is satisfiable. Consider the model M = (Z, V) such that V(P(x)) = {x | x > 2} and V(Q(x)) = {x | x < 10}. Now M |= ∃x (P(x) ∧ ¬Q(x)) iff M |= P(a) ∧ ¬Q(a) for some a ∈ Z. This is the case iff M |= P(a) and M |= ¬Q(a), i.e. M ⊭ Q(a). For example, the value a = 15 satisfies the last condition, because 15 ∈ V(P) (i.e. 15 > 2) and 15 ∉ V(Q) (i.e. 15 ≮ 10). Hence, the formula is satisfiable. Example 2.10. Examine whether the formula ∃x (P(x) ∧ ¬Q(x)) is refutable. Consider the model M = (Z, V), where V(P(x)) = {x | x = 2} and V(Q(x)) = {x | x < 10}. The interpretation in the model M is as follows: ∃x (x = 2 ∧ x ≮ 10), which is false for every integer x ∈ Z. Hence, M ⊭ ∃x (P(x) ∧ ¬Q(x)), and thus the formula is refutable. Example 2.11. Examine whether the set of formulas ∆ = {∀x (P(x) → Q(x)), ∃x (¬P(x) ∧ Q(x))} is satisfiable. We refer to the first formula by (1) and to the second formula by (2). Let M = (A, V) be a model such that A = {1, 2, 3, 4}, V(P(x)) = {x | x > 2}, and V(Q(x)) = {x | x > 1}. Now M |= ∀x (P(x) → Q(x)) iff M |= P(a) → Q(a) for all a ∈ A. Clearly, x > 2 ⇒ x > 1 holds for all elements of A.
Hence, the formula (1) is true in M. Further, M |= ∃x (¬P(x) ∧ Q(x)) iff M |= ¬P(a) ∧ Q(a) for some a ∈ A. Clearly, x ≯ 2 ∧ x > 1 holds, for example, for 2 ∈ A. Hence, (2) is true in M. Because there exists at least one model in which the formulas (1) and (2) are true, the set ∆ is satisfiable. Example 2.12. Examine whether, in the following set of formulas, the last formula follows logically from the first two: ∆ = {∀x (P(x) → Q(x)), ∀x (P(x) → R(x)), ∀x (Q(x) → R(x))} We refer to these formulas from the first to the third by the respective numbers (1), (2), and (3). We consider the formulas in the model M = (A, V), where A = {1, −2, 3, −4}, V(P(x)) = {x | x is odd}, V(Q(x)) = {x | x < 5}, and V(R(x)) = {x | x is positive}. (1) M |= ∀x (P(x) → Q(x)) iff M |= P(a) → Q(a) for all a ∈ A. All the odd numbers in A are less than 5. Hence the formula (1) is true in M. (2) M |= ∀x (P(x) → R(x)) iff M |= P(a) → R(a) for all a ∈ A. All the odd numbers in A are positive. Hence the formula (2) is true in M. (3) M |= ∀x (Q(x) → R(x)) iff M |= Q(a) → R(a) for all a ∈ A. Because not every element of A which is less than 5 is positive (for example, −2 < 5 but −2 is not positive), the formula (3) is not true in M. Because there exists at least one model of the formulas (1) and (2) in which the formula (3) is not true, the formula (3) is not a logical consequence of the formulas (1) and (2). 3 Axiomatization and Proof Theory In predicate logic, the concepts of deduction, proof, and theorem are defined in a similar way as in propositional logic. 3.1 Axiom Schemes The axiom schemes of predicate logic are created by extending those of propositional logic. Hence, we add to the axioms of propositional logic additional axioms concerning quantification. Also, we add to the set of inference rules a rule concerning the use of the universal quantifier. The meta-variables appearing in the axioms coming from propositional logic refer to formulas of predicate logic. Definition 3.1 (Axioms).
Let α, β, and γ be formulas of predicate logic. Then the following formulas are the axiom schemes of predicate logic:

A1 α → (β → α),
A2 (α → (β → γ)) → ((α → β) → (α → γ)),
A3 (¬β → ¬α) → (α → β),
A4 ∀x α → Stx α, where t is a term substitutable for x in α,
A5 ∀x (α → β) → (α → ∀x β), if x is not free in α.

Definition 3.2 (Inference Rules). The inference rules of predicate logic are as follows:

R1 Modus ponens (MP): from α and α → β, infer β.
R2 Universal Generalization (UG): from α, infer ∀x α, if x does not appear as a free variable in any premise.

According to the rule Universal Generalization, since x becomes bound in the process, we say that the universal generalization is over x, or that one generalizes over x. We will justify universal generalization later. At the moment, it must be pointed out that universal generalization is subject to restrictions. If one generalizes over x, then x must not appear in any premise as a free variable. If x does appear free in a premise, then x always refers to the same individual, and it is fixed in this sense. For example, if P(x) appears in a premise, then P(x) is only true for x and not necessarily true for any other individual. If x is fixed, one cannot generalize over x. Generalizations from one particular individual toward the entire population are unsound. If, on the other hand, x does not appear in any premise, or if x is bound in all premises, then x is assumed to stand for everyone, and universal generalization may be applied without restriction. Example 3.1. To demonstrate universal generalization, consider the following problem, whose domain consists of a group of computer science students. Of course, all computer science students like programming. The derivation must prove that everyone in the domain likes programming. If P(x) and Q(x) stand for "x is a computer science student" and "x likes programming", respectively, the premises become ∀x P(x) and ∀x (P(x) → Q(x)). The desired conclusion is ∀x Q(x). Hence, we have to prove that ∀x P(x), ∀x (P(x) → Q(x)) ⊢ ∀x Q(x).
The deduction is as follows:

1. ∀x P(x)            premise     Everyone is a CS major.
2. ∀x (P(x) → Q(x))   premise     CS majors like programming.
3. P(x)               Sxx, 1      x is a CS major.
4. P(x) → Q(x)        Sxx, 2      If x is a CS major, x likes programming.
5. Q(x)               MP, 3,4     x likes programming.
6. ∀x Q(x)            UG, 5       Everyone likes programming.

In the proof, Q(x) is derived on line 5, which means that x likes programming. This statement is then generalized to ∀x Q(x). This generalization is only possible because all instances of x in the premises are bound. If the premise ∀x P(x) is replaced by P(x), then universal generalization over x is no longer sound. This is the case because x is then fixed, and universal generalization over fixed variables is unsound. Example 3.2. As a second example, we derive ∀y ∀x P(x, y) from ∀x ∀y P(x, y). We have the deduction:

1. ∀x ∀y P(x, y)   premise
2. ∀y P(x, y)      Sxx, 1    Drop the first quantifier.
3. P(x, y)         Syy, 2    Drop the second quantifier.
4. ∀x P(x, y)      UG, 3     This is a sound generalization.
5. ∀y ∀x P(x, y)   UG, 4     Generalize again to obtain the desired conclusion.

On line 3, we drop the second quantifier to obtain an expression without quantifiers. Then we use UG to add the quantifiers back in reverse order. On line 4 it is sound to generalize, because the premise does not contain x as a free variable: all occurrences of x in the premise are bound. The situation is similar on line 5. Example 3.3. In the third example of UG, we show that the variable x in a universal quantifier may be changed to the variable y, i.e. we prove ∀x P(x) ⊢ ∀y P(y):

1. ∀x P(x)   premise
2. P(y)      Syx, 1    Instantiate the premise for y.
3. ∀y P(y)   UG, 2     Generalize P(y) to obtain the conclusion.

In fact, when we use the instantiation Stx in deductions, we use the axiom A4 in the following way:

1. ∀x P(x)          premise    Premise (or deduced from earlier steps).
2. ∀x P(x) → P(t)   A4         A4 is used in its original meaning.
3. P(t)             MP, 1,2    This is the result of Stx applied to 1.

Hence, we have proved that ∀x P(x) ⊢ P(t). This justifies the use of instantiation in the form used in the previous inferences. Thus, we have derived the rule of universal specification (US): from ∀x α(x), infer α(t), where α(t) = Stx α(x). Example 3.4. Consider an additional example of the rule US. We prove the following inference to be correct: Healthy people live long. Socrates was healthy. So, Socrates lived long. To do the derivation, let H(x) indicate that x is a healthy person, L(x) that x lives (or lived) long, and let s stand for Socrates. We have the deduction:

1. ∀x (H(x) → L(x))   premise
2. H(s)               premise
3. H(s) → L(s)        US, 1
4. L(s)               MP, 2,3

3.2 Deduction Theorem and Universal Generalization In the deduction theorem, one assumes B, proves C using the assumption B like a premise, and concludes that B → C. Once this is done, B is discharged. The question now is how to treat free variables occurring in B. First, while B is used as an assumption, that is, as long as B is not discharged, B has to be treated like any other premise. In particular, should B contain x as a free variable, then one must not generalize over x. However, as soon as B is discharged, this is no longer true. Once B is discharged, it has no effect whatsoever on the status of any variable. Hence, if x is not free in any other premise, one can universally generalize over x even if x appears free in B. The deduction theorem is now demonstrated by an example. Example 3.5. Let S(x) stand for "x studied" and P(x) for "x passed". The premise is that everyone who studied passed. Prove that everyone who did not pass did not study. Solution: The premise "everyone who studied passed" can be translated as ∀x (S(x) → P(x)), and the statement "everyone who did not pass did not study" becomes ∀x (¬P(x) → ¬S(x)).
The deduction is as follows:

1. ∀x (S(x) → P(x))     premise       Everyone who studied passed.
2. S(x) → P(x)          US, 1         If x studied, x passed.
3.   ¬P(x)              assumption    Assume that x did not pass.
4.   ¬S(x)              MT, 2,3       x cannot have studied.
5. ¬P(x) → ¬S(x)        DT, 3,4       Apply DT and discharge ¬P(x).
6. ∀x (¬P(x) → ¬S(x))   UG, 5         Anyone who did not pass cannot have studied.

To obtain the result, the assumption ¬P(x) is introduced on line 3. As long as this assumption is not discharged, no generalization over x is allowed. To indicate that an assumption is in effect, lines 3 and 4 are indented. However, once the deduction theorem is applied, the indentation is removed, the assumption ¬P(x) is discharged, and one can generalize over x. This is done in line 6. In all other respects, the proof is self-documenting: ∀x (S(x) → P(x)) is a premise, and ∀x (¬P(x) → ¬S(x)) follows. To arrive at the desired conclusion, one uses universal generalization. This can be done because x is not free in any premise. 3.3 Dropping the Universal Quantifiers In mathematics, universal quantifiers are frequently omitted. For example, in the statement x + y = y + x, both x and y are implicitly universally quantified. This causes problems when such statements are used as premises because, according to our rules, any variable appearing free in a premise is fixed in the sense that throughout the proof it is bound to one and the same individual. To get around this difficulty, we single out certain variables in the premises and explicitly state that these variables are not fixed. All variables that are not fixed will be called true variables. A variable may be universally generalized if and only if it is a true variable. If a variable appears in a premise, it is assumed to be fixed unless it is explicitly stated that the variable is a true variable. By using true variables, one can omit many universal quantifiers, and this, in turn, simplifies proofs.
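Over a finite domain, a statement whose free variables are implicitly universally quantified, such as x + y = y + x, can be checked mechanically by brute force. The following Python sketch is ours, not from the text; the helper name and the small integer domain are illustrative choices.

```python
from itertools import product

def holds_universally(domain, formula, arity):
    """True iff formula(*args) holds for every argument tuple over the domain."""
    return all(formula(*args) for args in product(domain, repeat=arity))

domain = range(-3, 4)  # a small finite fragment of the integers

# x + y = y + x, with x and y treated as (implicitly) universally quantified
print(holds_universally(domain, lambda x, y: x + y == y + x, 2))  # → True

# x - y = y - x fails, e.g. for x = 1, y = 0
print(holds_universally(domain, lambda x, y: x - y == y - x, 2))  # → False
```

Of course, such a check is only a sanity test over one finite domain; it does not replace a derivation.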
Moreover, we allow from now on that any true variable can be instantiated to any term. The same effect can, of course, be achieved by using universal generalization first, followed by universal instantiation. However, direct instantiation is shorter and often clearer. Until now, instantiations were always represented by symbols such as Syx. From now on, we will frequently make use of the notation x := y to indicate that x is replaced by y. In some programming languages, the notation := means "assign to", which is the same as "instantiate to". Example 3.6. Let P(x, y, z) : x + y = z. Given the premises P(x, 0, x) and P(x, y, z) → P(y, x, z), where x, y, and z are true variables, prove that 0 + x = x, i.e., prove P(0, x, x). Solution: The following derivation is used to prove P(0, x, x). Note that the first two lines are premises and that x, y, and z are explicitly declared as true variables. We have the deduction:

1. P(x, y, z) → P(y, x, z)   Premise: x + y = z ⇒ y + x = z; x, y, and z are true variables.
2. P(x, 0, x)                Premise: x + 0 = x; x is a true variable.
3. P(x, 0, x) → P(0, x, x)   Line 1 with x := x, y := 0, z := x.
4. P(0, x, x)                MP, 2,3; 0 + x = x.

All true variables are strictly local to the line on which they appear. Hence, if the true variable x appears on two different lines, then these two instances of x are really two different variables. For example, in the proof of Example 3.6, x on line 1 and x on line 2 are two different variables. When doing the proof, one obviously has to establish some type of connection between the variables, and this connection is made through instantiation. Of course, instantiations must not be made blindly. Instead, one has to do the instantiations in such a way that progress is made toward the desired conclusion. How this is done in detail depends on the general strategy, and in each proof some type of strategy should be followed.
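The instantiation step of Example 3.6 (line 3, with x := x, y := 0, z := x) is just simultaneous substitution into the argument places of a predicate. A minimal sketch, ours and purely illustrative, representing an atomic formula's arguments as a tuple of variable and constant names:

```python
def instantiate(args, binding):
    """Simultaneously replace variables in an argument tuple by their bindings;
    names without a binding are left unchanged."""
    return tuple(binding.get(a, a) for a in args)

# The antecedent P(x, y, z) of line 1 in Example 3.6, instantiated with
# y := '0' and z := 'x' so that it matches the premise P(x, 0, x) of line 2.
antecedent = ('x', 'y', 'z')
print(instantiate(antecedent, {'y': '0', 'z': 'x'}))  # → ('x', '0', 'x')
```

Note that the substitution is simultaneous: each argument is looked up once in the original binding, so instantiating z to x does not cause x itself to be rewritten.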
However, there are some general principles that are helpful, and one of them is unification. Definition 3.3. Two expressions are said to unify if there are legal instantiations that make the expressions in question identical. The act of unifying is called unification. The instantiation that unifies the expressions in question is called a unifier. Example 3.7. Suppose that Q(a, y, z) and Q(y, b, c) are expressions appearing on different lines. Show that the two expressions unify, and give a unifier. Here a, b, and c are fixed, and y and z are true variables. Solution: Since y in Q(a, y, z) is a different variable from y in Q(y, b, c), rename y in the second expression to y1. This means that one must unify Q(a, y, z) with Q(y1, b, c). An instance of Q(a, y, z) is Q(a, b, c), and an instance of Q(y1, b, c) is Q(a, b, c). Since these two instances are identical, Q(a, y, z) and Q(y, b, c) unify. The unifier is y1 := a, y := b, z := c. There may be several unifiers. For example, if a and b are constants, then R(a, x) and R(y, z) have the unifier y := a, z := x, which yields the common instance R(a, x). However, there is also the unifier y := a, x := b, z := b, which yields the common instance R(a, b). But R(a, b) is an instance of R(a, x), and the unifier y := a, x := b, z := b is in this sense less general than the unifier y := a, z := x. Of course, we always want to find the most general unifier, if one exists. The solution of Example 3.6 involved unification. Specifically, to make use of modus ponens, line 2 was unified with the antecedent of line 1. Generally, unification is performed in such a way that some rule of inference can be applied after unification. Example 3.8. Clearly, if x is the mother of y and if z is the sister of x, then z is the aunt of y. Suppose now that Brent’s mother is Jane and Liza is Jane’s sister. Prove that Liza is Brent’s aunt.
Solution: If "mother(x, y)" is the predicate that is true if x is the mother of y, and if "sister(x, y)" and "aunt(x, y)" are defined in a similar fashion, one can state the premises as follows:

1. mother(x, y) ∧ sister(z, x) → aunt(z, y)
2. mother(Jane, Brent)
3. sister(Liza, Jane)

The problem is now to create an expression that unifies with the antecedent of line 1. To do this, one combines lines 2 and 3 to obtain

4. mother(Jane, Brent) ∧ sister(Liza, Jane)

This expression can be unified with mother(x, y) ∧ sister(z, x) by setting x := Jane, y := Brent, and z := Liza. This yields

5. mother(Jane, Brent) ∧ sister(Liza, Jane) → aunt(Liza, Brent)

The conclusion that Liza is Brent’s aunt now follows from 4 and 5 by modus ponens. 3.4 Rules for Existential Quantifiers Existential Generalization If there is any term t for which P(t) holds, then one can conclude that some x satisfies P(x). Hence, P(t) logically implies ∃x P(x). More generally, ∃x α can be derived from Stx α, where t is any term. This leads to the following rule of inference, existential generalization (EG): from Stx α, infer ∃x α. Example 3.9. Let c be Aunt Cornelia, and let P(x) stand for "x is over 100 years old". Then one has: from P(c), infer ∃x P(x). The reason is that if one replaces x by c in P(x), then one obtains P(c). The following example demonstrates how to use existential generalization within a formal proof. The premises of our derivation are 1. Everybody who has won a million is rich. 2. Mary has won a million. We want to show that these two statements logically imply that 3. There is somebody who is rich. If somebody were asked to demonstrate that the conclusion follows from the premises, he or she would probably argue as follows. If everybody who wins a million euros is rich, then Mary is rich if she wins a million. Since we know that Mary has won a million, we apply modus ponens and conclude that Mary is rich. There is thus somebody, Mary, who is rich. This argument is now formalized.
W(x) means that x has won a million, R(x) means that x is rich, and m stands for Mary. Hence, we prove that ∀x (W(x) → R(x)), W(m) ⊢ ∃x R(x). The deduction is as follows:

1. ∀x (W(x) → R(x))   premise
2. W(m) → R(m)        US, 1, x := m
3. W(m)               premise
4. R(m)               MP, 2,3
5. ∃x R(x)            EG, 4

It was stated earlier that ¬∃x P(x) is logically equivalent to ∀x ¬P(x). We now prove the first half of this statement by showing that ¬∃x P(x) ⊢ ∀x ¬P(x). The deduction is as follows:

1. ¬∃x P(x)         premise
2.   P(x)           assumption
3.   ∃x P(x)        EG, 2
4. P(x) → ∃x P(x)   DT, 2,3
5. ¬P(x)            MT, 1,4
6. ∀x ¬P(x)         UG, 5

We will later prove ∀x ¬P(x) ⊢ ¬∃x P(x). The two proofs together establish the logical equivalence of ¬∃x P(x) and ∀x ¬P(x). Existential Specification If ∃x α is true, then there must be some term t that satisfies α; that is, Stx α must be true for some t. For example, if P(x) stands for "x does somersaults", then ∃x P(x) means that Stx P(x) = P(t) must be true for some t. The problem is that we do not know whether it is Aunt Eulalia, Uncle Petronius, or even somebody else who does somersaults. In a proof, the question as to who this individual is must therefore be kept open. To do this, a new variable, say b, is selected to denote this unknown individual. This leads to the following rule of inference, existential specification (ES): from ∃x α, infer Sbx α, where b is a new variable. The variable introduced by existential specification must not have appeared earlier as a free variable. For example, when applying ES to the two statements "There exists someone who is over 100 years old" and "There exists someone who does somersaults", one must not use the same variable b for existential specification in both cases. Otherwise, one could conclude that b is both over 100 and does somersaults, which certainly does not follow logically. Similarly, one cannot use any variable that appears free in any of the premises.
Hence, ES must not introduce any variable that has already appeared as a free variable in the derivation. Moreover, the variable introduced is fixed in the sense that one cannot use universal generalization over this variable. For example, if b does somersaults, then one cannot use UG to conclude that everyone does somersaults. Moreover, a variable with an unknown value must not appear in the conclusion, and since any variable introduced by ES is unknown, it must not appear in the conclusion either. For the purpose of demonstration, suppose that there is someone who has won a million euros, and we want to prove that there is someone who is rich. Hence, the premises are 1. Someone has won a million euros. 2. Everybody who has won a million is rich. We want to show that these two statements logically imply 3. There is somebody who is rich. Hence, we have to prove ∀x (W(x) → R(x)), ∃x W(x) ⊢ ∃x R(x). The deduction is as follows:

1. ∃x W(x)            premise
2. W(b)               ES, 1, x := b
3. ∀x (W(x) → R(x))   premise
4. W(b) → R(b)        US, 3, x := b
5. R(b)               MP, 2,4
6. ∃x R(x)            EG, 5

In this proof, existential specification is used on line 2, where the winner is called b. Once this is obtained, the second premise is given on line 3, and this premise is instantiated with x := b on line 4. Note that one must not derive lines 3 and 4 before lines 1 and 2; that is, one must not apply universal specification before existential specification. The reason for this is that, once W(b) → R(b) is obtained, b is no longer a new variable, and it therefore must not be used for existential specification. For this reason, it is generally a good idea to apply existential specification first. As a second example of ES, we prove ∀x ¬P(x) ⊢ ¬∃x P(x). The deduction is as follows:

1.   ∃x P(x)      assumption
2.   P(b)         ES, 1, x := b
3. ∀x ¬P(x)       premise
4. ¬P(b)          US, 3, x := b
5. P(b) ∧ ¬P(b)   CI, 2,4
6. ¬∃x P(x)       RAA, 1,5

We used indirect proof in this derivation.
Since we want to derive ¬∃x P(x), the assumption to be rejected is ∃x P(x). Existential specification allows us to derive P(b) on line 2, which contradicts ¬P(b), derived on line 4 by universal specification. The resulting contradiction is given on line 5. This line allows one to reject the assumption ∃x P(x); that is, ¬∃x P(x) must be true. Hence, ∀x ¬P(x) logically implies ¬∃x P(x), which provides the promised second half of the proof of the logical equivalence of these formulas. 3.5 Natural Inference Rules in Predicate Logic The important basis for natural inference, which supports human intuitive deduction, is the set of natural inference rules for propositional logic, i.e. Suppes–Gentzen type inference rules. To this set, we add rules concerning quantifiers, logical equivalence, and identity. We collect here together all the needed inference rules concerning these matters. Rule of Universal Specification (US): Let ϕ(x) be a formula where x is free, and ϕ(t) = Stx ϕ(x). Then ∀x ϕ(x) ⊢ ϕ(t). Rule of Universal Generalization (UG): Let ϕ(t) be a formula where t is an arbitrary term, and ϕ(x) = Sxt ϕ(t). Then ϕ(t) ⊢ ∀x ϕ(x) iff x does not occur in ϕ(t), and t does not occur in the premises on which the conclusion ∀x ϕ(x) depends. Rule of Existential Specification (ES): Let t be a term, x free in ϕ(x), and ϕ(t) = Stx ϕ(x). Then ∃x ϕ(x) ⊢ ϕ(t). Remark: • In a deduction, the rule ES must be applied before the rule US, if possible. • When applying ES, t must not occur in earlier steps of the same deduction. • t must not occur in the conclusion of the deduction. Rule of Existential Generalization (EG): Let ϕ(t) be a formula where t is a term, and ϕ(x) = Sxt ϕ(t). Then ϕ(t) ⊢ ∃x ϕ(x). Quantifier Exchange Rule (Q1): ∀x can be replaced with ¬∃x¬. Quantifier Exchange Rule (Q2): ∃x can be replaced with ¬∀x¬. Rule of Logical Equivalence (RE): If a formula ψ1 occurs as a part of a formula ϕ1, and if ψ1 ≡ ψ2 (i.e.
ψ1 and ψ2 are logically equivalent), then from ϕ1 and ψ1 ≡ ψ2 one may infer ϕ2, where ϕ2 is created by substituting ψ2 in the place of ψ1 in ϕ1. Rule of Identity (I): If ϕ(t1) is a formula and St2t1 ϕ(t1) = ϕ(t2), where t1 and t2 are terms, then from ϕ(t1) and t1 = t2 one may infer ϕ(t2). Example 3.10. Find a deduction from the premises to the conclusion in the following inference: No ducks are willing to waltz. No officers are unwilling to waltz. All my poultry are ducks. Therefore, none of my poultry are officers. For formalizing the sentences, D(x) stands for "x is a duck", W(x) stands for "x is willing to waltz", O(x) stands for "x is an officer", and P(x) stands for "x is one of my poultry". We have the following deduction:

1. ∀x (D(x) → ¬W(x))    premise
2. ∀y (O(y) → W(y))     premise
3. ∀z (P(z) → D(z))     premise
4. D(x) → ¬W(x)         US, 1
5. O(x) → W(x)          US, 2
6. P(x) → D(x)          US, 3
7.   P(x)               assumption
8.   D(x)               MP, 6,7
9.   ¬W(x)              MP, 4,8
10.  ¬O(x)              MT, 5,9
11. P(x) → ¬O(x)        DT, 7,10
12. ∀x (P(x) → ¬O(x))   UG, 11

3.6 Logical Equivalences 3.6.1 Main Cases As in the case of propositional logic, one can use logical equivalences to manipulate logical expressions. Specifically, if A is a logical expression, one can replace any subexpression B of A with a subexpression C, as long as B and C are logically equivalent. We give a collection of useful logical equivalences in the following list.

1.  ∀x A ≡ A                      if x is not free in A
1d. ∃x A ≡ A                      if x is not free in A
2.  ∀x A ≡ ∀y Syx A               if y is not free in A
2d. ∃x A ≡ ∃y Syx A               if y is not free in A
3.  ∀x A ≡ Stx A ∧ ∀x A           for any term t
3d. ∃x A ≡ Stx A ∨ ∃x A           for any term t
4.  ∀x (A ∨ B) ≡ A ∨ ∀x B         if x is not free in A
4d. ∃x (A ∧ B) ≡ A ∧ ∃x B         if x is not free in A
5.  ∀x (A ∧ B) ≡ ∀x A ∧ ∀x B
5d. ∃x (A ∨ B) ≡ ∃x A ∨ ∃x B
6.  ∀x ∀y A ≡ ∀y ∀x A
6d. ∃x ∃y A ≡ ∃y ∃x A
7.  ¬∃x A ≡ ∀x ¬A
7d. ¬∀x A ≡ ∃x ¬A

Example 3.11. In the expression ∀x P(x) ∨ ∀x Q(x), the bound variable x appears under two different scopes. By using law 2, one can change the x of the second universal quantifier to y, which yields ∀x P(x) ∨ ∀x Q(x) ≡ ∀x P(x) ∨ ∀y Q(y).
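Laws such as 7 and 7d can be spot-checked over a finite domain, since there ∀ and ∃ reduce to Python's all() and any(). The sketch below is ours; the five-element domain and the predicate P are arbitrary illustrative choices.

```python
domain = range(5)
P = lambda x: x % 2 == 0  # an arbitrary unary predicate on the domain

# Law 7: ¬∃x P(x) ≡ ∀x ¬P(x)
lhs7 = not any(P(x) for x in domain)
rhs7 = all(not P(x) for x in domain)
print(lhs7 == rhs7)  # → True

# Law 7d: ¬∀x P(x) ≡ ∃x ¬P(x)
lhs7d = not all(P(x) for x in domain)
rhs7d = any(not P(x) for x in domain)
print(lhs7d == rhs7d)  # → True
```

Agreement on one finite domain is, of course, only evidence, not a proof; the laws themselves hold in all models.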
Example 3.12. In the expression P(x) ∧ ∃x Q(x), the variable x appears first free and then bound. By using law 2d, one can write this as P(x) ∧ ∃y Q(y). Examples 3.11 and 3.12 show how to standardize the variables apart. Definition 3.4. Renaming the variables in an expression so that distinct variables have distinct names is called standardizing the variables apart. Example 3.13. Standardize all variables apart in the following expression: ∀x (P(x) → Q(x)) ∧ ∃x O(x) ∧ ∃z P(z) ∧ ∃z (Q(z) → R(x)). Use y for x in ∀x, u for x in ∃x O(x), and w for z in ∃z P(z) to obtain ∀y (P(y) → Q(y)) ∧ ∃u O(u) ∧ ∃w P(w) ∧ ∃z (Q(z) → R(x)). Some operations cannot be applied to expressions containing negated quantifiers. To remove negated quantifiers, laws 7 and 7d are useful, as the next example demonstrates. Example 3.14. Apply laws 7 and 7d to remove all negations in front of the quantifiers of the following expression: ¬∀z (∃x P(x, z) ∧ ¬∀x Q(x, z)). We have

¬∀z (∃x P(x, z) ∧ ¬∀x Q(x, z)) ≡ ∃z ¬(∃x P(x, z) ∧ ¬∀x Q(x, z))   Law 7d
                               ≡ ∃z (¬∃x P(x, z) ∨ ∀x Q(x, z))    De Morgan
                               ≡ ∃z (∀x ¬P(x, z) ∨ ∀x Q(x, z))    Law 7

3.6.2 Other Important Equivalences Universal quantifiers cannot be dropped unless they are at the beginning of the expression. In this respect, the following equivalence is of interest:

∀x P(x) ∨ ∀y Q(y) ≡ ∀x ∀y (P(x) ∨ Q(y)).

To prove this law, rewrite law 4 as

(∀x B) ∨ A ≡ ∀x (B ∨ A).    (3.1)

Now we have

∀x P(x) ∨ ∀y Q(y) ≡ ∀x (P(x) ∨ ∀y Q(y))   (3.1) with A := ∀y Q(y)
                  ≡ ∀x ∀y (P(x) ∨ Q(y))   Law 4 with A := P(x)

Note that the condition that A must not contain the bound variable of the quantifier in question applies: ∀y Q(y) does not contain the bound variable x, and when A := P(x), the bound variable is y, and P(x) does not contain y. Consider the statement "If somebody talks, it will be in the news tomorrow". If C(x) stands for "x talks" and Q stands for "it will be in the news tomorrow", one can translate this sentence as ∃x C(x) → Q.
(3.2)

Another translation, which is not obvious, is ∀x (C(x) → Q). (3.3) Both expressions are logically equivalent, as we will prove next. The first version is the version normally used in natural language. However, for logical derivations, the second version is preferable. Unfortunately, a verbal translation of the second version is difficult. One would have to say something like this: for each x, it is true that if x talks, then it will be in the news tomorrow. The following law shows that the expressions given by (3.2) and (3.3) are indeed logically equivalent:

∀x (B → A) ≡ ∃x B → A.    (3.4)

The proof of this equivalence is as follows:

∀x (B → A) ≡ ∀x (¬B ∨ A)
           ≡ (∀x ¬B) ∨ A
           ≡ ¬∃x B ∨ A
           ≡ ∃x B → A

Note again that A must not contain x. Example 3.15. Clearly, if x < y and y < z, then x < z. If G(x, y) means that x < y, then this translates into G(x, y) ∧ G(y, z) → G(x, z). This obviously holds for all x, y, and z; that is,

∀x ∀y ∀z (G(x, y) ∧ G(y, z) → G(x, z)).    (3.5)

On the other hand, one can say that x < z if there is a y such that x < y and y < z. This can be expressed as follows:

∀x ∀z (∃y (G(x, y) ∧ G(y, z)) → G(x, z)).    (3.6)

Are these two expressions logically equivalent? We solve this question as follows. Interchange the second and third quantifiers in (3.5) to obtain ∀x ∀z (∀y (G(x, y) ∧ G(y, z) → G(x, z))). The innermost quantifier is an instance of the left side of (3.4), which implies that ∀x ∀z (∃y (G(x, y) ∧ G(y, z)) → G(x, z)). This shows that (3.5) and (3.6) are logically equivalent:

∀x ∀y ∀z (G(x, y) ∧ G(y, z) → G(x, z)) ≡ ∀x ∀z (∃y (G(x, y) ∧ G(y, z)) → G(x, z)).    (3.7)

3.7 Equational Logic 3.7.1 Introduction Equality is essential for doing algebraic operations. In fact, algebraic operations are used to convert a given expression into some other expression that is either simpler or otherwise more suitable for the purpose at hand. Equality is also important in logic, in particular for indicating that there is only one element satisfying a certain property.
For doing algebraic operations, one needs an algebra. Generally, an algebra is given by a domain, such as the real numbers, together with some operations, such as + and ×. As it turns out, operations are really functions, and to understand operations, one must know about functions. Generally, functions have values or images, and these images depend on their arguments. All arguments must be individuals, that is, they must be part of the universe, and so are their images. A function establishes in this sense a relation between individuals. What makes this relation special is that the image of a function is always unique. The fact that the image of a function is always a uniquely identifiable individual makes the image a term. In other words, one can use a function wherever one can use a constant or a variable. Moreover, without the uniqueness requirement, algebraic manipulations would be rather restricted, as will be shown. Although the standard algebra, that is, the algebra in the domain of the reals using the operations + and × (or ·), is by far the most important algebra, there are other algebras. 3.7.2 Equality Two terms t and r are equal if they refer to the same individual, and to express this, one writes t = r. Equality is really a predicate, and Eq(t, r) could be used instead to express that t and r are equal. However, the normal equality sign is more convenient, and it will therefore be used here. Hence, t = r is an atomic formula, which can be combined with other atomic formulas in the usual fashion, as in (x = y) ∧ (y = z) or ¬(x = y). Instead of the form ¬(x = y), the abbreviation x ≠ y is generally used. Consider the ways to define equality between two objects. We have the following possibilities: 1. Explicit assignment: the objects that are equal are enumerated. This method is obviously restricted to finite domains. 2. Providing rules to determine whether or not two objects are equal. 3.
To postulate a number of axioms that the equality predicate must satisfy.

We use the last alternative. Obviously, every term t is equal to itself, which leads to the following axiom:

∀x(x = x). (3.8)

This axiom is called the reflexivity axiom. A predicate G(x, y) is said to be reflexive if G(x, x) is true for all x. Another important property of equality is the substitution property. If t and r, t = r, are two terms, the substitution property allows one to substitute t in any expression by r. For example, if t = r and P(t) is true, then P(r) must also be true. This gives rise to a rule of inference. To formulate this rule, we first introduce a symbol for doing replacements. Specifically, if A is any expression, then R(n)[t→r] A is the expression one obtains from A by replacing the nth instance of the term t by r. If t occurs fewer than n times in A, then R(n)[t→r] A = A.

Example 3.16. Determine R(1)[x→y] (x = x), R(2)[x→y] (x = x), and R(3)[x→y] (x = x). Since the first instance of x is at the left of the equal sign, R(1)[x→y] (x = x) is y = x. Similarly, R(2)[x→y] (x = x) is x = y. Finally, R(3)[x→y] (x = x) is x = x.

We now have the following rule:

Substitution Rule: If A and t = r are two expressions that have been derived, one is allowed to conclude R(n)[t→r] A for any n > 0. In this case, we say that we substitute t from t = r into A.

Example 3.17. Substitute x + 1 from x + 1 = 2y into x < x + 1. According to the substitution rule, x + 1 is replaced by 2y, which yields x < 2y. This is the same as R(1)[x+1→2y] (x < x + 1).

Example 3.18. Show that x = y substituted into x = x yields y = x. Replace the first x in x = x by y, which yields R(1)[x→y] (x = x) = (y = x).

The equality predicate is symmetric; that is, x = y implies that y = x. This is easily proved. We assume x = y, and we prove that y = x. This is done as follows:

1. ∀x(x = x)          reflexivity axiom
2. x = y              assumption
3. x = x              instantiate line 1 with x := x
4. y = x              substitute line 2 into line 3
5. x = y → y = x      deduction theorem, 2, 4

The proof starts with line 3, which instantiates the reflexivity axiom given on line 1. Specifically, the bound variable x of line 1 is replaced by the free variable x, and this is indicated by writing x := x. Line 4 is now obtained by substituting the first x by y. According to the deduction theorem, one concludes that x = y → y = x. This completes the proof. The resulting expression can be universally quantified, which yields

∀x∀y((x = y) → (y = x)).

Consequently, given t = u, not only can one replace t by u, but one can also replace u by t. (To be completely rigorous, one would have to add the corresponding steps 2, 3, and 4 of the preceding derivation. For simplicity, we omit these steps.)

Next we show that x = y and y = z imply that x = z. This property is called the transitivity property of equality. To show that this is the case, x = y and y = z are used as premises. This allows one to derive x = z as follows:

1. x = y      premise
2. y = z      premise
3. x = z      substitute line 2 into line 1

Hence, (x = y) ∧ (y = z) → (x = z). Since this is valid, one can generalize to obtain

∀x∀y∀z((x = y) ∧ (y = z) → (x = z)).

Substitutions of the type used in the proofs for symmetry and transitivity are frequent. We therefore add the following example.

Example 3.19. After each iteration in a certain loop, the condition s > 3i is true. Moreover, once the loop terminates, i = 10. Prove that when the loop terminates, s > 30. When the loop terminates, both s > 3i and i = 10 hold. By using these conditions as premises, one easily finds s > 30 by replacing i in s > 3i by 10, which completes the proof. For the sake of completeness, we give the formal proof:

1. s > 3i        premise
2. i = 10        premise
3. s > 3 · 10    substitute line 2 into line 1

The substitution rule can be applied to subexpressions embedded in other expressions.
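The replacement operator R (replace the nth instance of a term t by r) can be sketched in code. Here expressions are modelled simply as strings, which is a simplification of the real situation: substitution properly operates on parse trees, so a string-based sketch can accidentally match inside longer identifiers.

```python
def replace_nth(expr, t, r, n):
    """Replace the nth occurrence of subterm t in expr by r.

    If t occurs fewer than n times in expr, the expression is
    returned unchanged, exactly as the text specifies.
    """
    pos = -1
    for _ in range(n):
        pos = expr.find(t, pos + 1)
        if pos == -1:
            return expr  # fewer than n occurrences: leave expr as is
    return expr[:pos] + r + expr[pos + len(t):]
```

With this sketch, Example 3.16 and Example 3.17 can be replayed directly: replacing the first, second, and third occurrence of x in "x = x" by y gives "y = x", "x = y", and "x = x" again.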
No other rule of inference discussed so far can do this. This means that the substitution rule is very efficient. Whereas the application of all other rules requires that the higher layers of a formula first be peeled away to allow access to the target subexpressions, the substitution rule can be applied directly. This saves many steps in the derivation. The substitution rule is therefore used extensively in mathematical derivations.

3.7.3 Equality and Uniqueness

Consider the statement "The lion is a mammal". Examine whether this statement can be expressed as lion = mammal. The answer to this question is no. To see why, add the statement bear = mammal. By substituting the first statement into the second, one obtains bear = lion, and this is obviously false. This shows that the word "is" cannot always be translated to '='. Generally, the equal sign cannot be used if the left side of the expression can refer to different objects. If x1 = y and x2 = y, then one can always conclude that x1 = x2; that is, x1 and x2 must be the same. There can be only one x for each y such that x = y. Equality necessitates uniqueness.

Conversely, to express uniqueness, one uses equality. The statement that the individual a is the only element with the property P can be reworded as "if x is not a, then P(x) cannot be true". This translates into

∀x(¬(x = a) → ¬P(x)).

Example 3.20. Translate "only a could have forgotten the meeting" into logic. Let P(x) be "x forgot". Then one has ∀x(P(x) → x = a). In words, "If somebody forgot, then it must have been a".

The fact that there is one, but only one, individual with the property P can now be expressed as

∃x(P(x) ∧ ∀y(P(y) → (y = x))).

This expresses the fact that there is an x that makes P(x) true, and that P(y) is true only if y = x.
To avoid having to write such a lengthy expression, one defines ∃1 xP(x) to indicate that only one element satisfies P. In other words, one has

∃1 xP(x) ≡ ∃x(P(x) ∧ ∀y(P(y) → y = x)). (3.9)

Example 3.21. Translate the statement "The company has exactly one CEO" into logic, both with and without using the quantifier ∃1. Let CEO(x) express the fact that x is a CEO. Then one has

∃1 xCEO(x) ≡ ∃x(CEO(x) ∧ ∀y(CEO(y) → y = x)).

Example 3.22. Let C(x), S(x), and Q(x) indicate that x is a city, x is a capital, and x is a country, respectively. Assume that the universe of discourse consists of all cities and all countries. Express the statement "All countries have exactly one capital" in terms of logic. By using the quantifier ∃1, this statement translates into

∀x(Q(x) → ∃1 y(S(y) ∧ C(y))).

To write this expression without the symbol ∃1, one uses (3.9), except that all variables must be renamed properly. Moreover, care must be taken that the variables do not clash. In the case considered here, one finds

∀x(Q(x) → ∃y(S(y) ∧ C(y) ∧ ∀z((S(z) ∧ C(z)) → z = y))).

There is a second way to express uniqueness. Clearly, if P(x) and P(y) always imply that x = y, then there can be at most one x such that P(x) is true. If, in addition, there is an element with the property P, then this element is unique. Consequently,

∃1 xP(x) ≡ ∃xP(x) ∧ ∀x∀y(P(x) ∧ P(y) → x = y). (3.10)

This method of expressing uniqueness is logically equivalent to the one given by (3.9). In fact, one can derive (3.9) from (3.10), and one can derive (3.10) from (3.9).

Example 3.23. Use (3.10) to express "There is exactly one carpenter in the village". If C(x) stands for "x is a carpenter", and if there is at most one carpenter, then C(x) ∧ C(y) implies x = y for all possible x and y. In logic, this can be expressed as ∀x∀y(C(x) ∧ C(y) → x = y). If there is exactly one carpenter, one therefore has

∃xC(x) ∧ ∀x∀y(C(x) ∧ C(y) → x = y).
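On a finite universe of discourse, both characterizations of uniqueness can be checked mechanically by quantifying over the domain. The following sketch (the function names are illustrative, not standard) implements (3.9) and (3.10) directly; on any finite domain the two must agree, since they are logically equivalent.

```python
def exists_unique(domain, pred):
    """Formula (3.9): some x satisfies pred, and every y satisfying pred equals x."""
    xs = list(domain)
    return any(pred(x) and all((not pred(y)) or y == x for y in xs)
               for x in xs)

def exists_unique_alt(domain, pred):
    """Formula (3.10): at least one witness, and any two witnesses are equal."""
    xs = list(domain)
    at_least_one = any(pred(x) for x in xs)
    at_most_one = all(not (pred(x) and pred(y)) or x == y
                      for x in xs for y in xs)
    return at_least_one and at_most_one
```

For instance, on the domain {0, ..., 4}, the predicate "x equals 3" has exactly one witness, while "x is even" has three; both functions classify these cases identically.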
3.7.4 Functions and Equational Logic

Functions are extremely important for doing equational logic. To define a function, it must first be given a name, such as f. One now has the following:

Definition 3.5. A function f with one argument associates with each individual x a unique individual y, which is referred to as f(x). The value y = f(x) is called the image of x.

The fact that each function has a unique image is essential. The reason is that functions are used as terms in logic, and fallacies arise if a term can refer to two or more different individuals.

Definition 3.6. A function with n arguments is said to have an arity of n. The arity of a function is fixed. A function f with an arity of n associates with each list of n individuals x1, x2, ..., xn a unique individual y = f(x1, x2, ..., xn). The individual y is said to be the image of x1, x2, ..., xn.

The fact that in a function f each x has a unique image y is crucial. Without this condition, one cannot even write y = f(x) without risking fallacies. To demonstrate this, consider the following example, which pretends to show that 1 = −1.

Example 3.24. Let y = f(x) if x = y², or, alternatively, f(x) = ±√x. By using f(x) as a function, one can consider a "proof" that 1 = −1. Find the fallacy in the argument:

1. 1 = f(1)      1 = ±√1
2. −1 = f(1)     −1 = ±√1
3. 1 = −1        substitute f(1) in line 1 by line 2

Clearly, f(x) is not a function, because for each positive x there are two values for f(x), that is, +√x and −√x. Consequently, one is not allowed to write 1 = f(1) or −1 = f(1). When this fact is ignored, fallacies like the preceding one can be obtained.

The definition of a function implies that there is an image for every argument or, if the function has n arguments, for every possible n-tuple of arguments. This is essential in logic because, otherwise, predicates that use functions as terms may be undefined.
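The uniqueness condition of Definition 3.5 can be tested mechanically if one models a would-be function as a set of (argument, image) pairs. A minimal sketch, using the square-root relation of Example 3.24 as the failing case:

```python
def is_function(pairs):
    """A set of (argument, image) pairs is a function iff
    each argument is associated with exactly one image."""
    images = {}
    for x, y in pairs:
        if x in images and images[x] != y:
            return False  # x has two distinct images: not a function
        images[x] = y
    return True

# The relation of Example 3.24: pairs (x, y) with x = y*y.
# For x = 1 it contains both (1, 1) and (1, -1), so it fails the test.
sqrt_relation = {(y * y, y) for y in range(-3, 4)}
```

Using such a relation as a term is exactly what licenses the fallacious "proof" that 1 = −1 above.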
Outside logic, it is sometimes convenient to remove this restriction. For example, technically, 1/x is not a function of x in the domain of the real numbers, yet 1/x shares many properties with functions. One therefore defines the following:

Definition 3.7. A partial function f of arity 1 associates with each individual x at most one individual y, which, if it exists, is referred to as f(x). Similarly, a partial function g of arity n associates with each n-tuple of individuals x1, x2, ..., xn at most one individual g(x1, x2, ..., xn).

Note that every function is a partial function, but not every partial function is a function. For example, on the domain of integers, f(x) = x/2 is a partial function. However, since one cannot associate any individual with f(x) if x is odd, f(x) = x/2 is not a function according to Definition 3.5.

Definition 3.8. A partial function that is not a function is called a strict partial function.

The definitions of the terms function and partial function are not uniform. Some authors include partial functions among the functions. In this case, they use total function for the term function as defined in Definition 3.5. For clarity, we will frequently use the phrase "total function", even though, according to our definition, the word "function" alone would suffice.

All floating-point operations on computers that may lead to exponent underflow and overflow are strict partial functions. They are partial functions because for each set of operands the result, if it exists, will be unique. However, in case of overflow or underflow, there is no result, and the operation is therefore not a total function. Division on the integers is not a total function either. To be total, all divisions would have to be defined. However, division by zero is undefined.

3.7.5 Function Compositions

If f is a function with one variable, then z = f(y) is the image of y under the function f.
Since each individual y has an image, one can set y = g(x), where g is some function with one variable. This construct associates with each x a unique value z, where z = f(g(x)).

Definition 3.9. Let f and g be two functions with one argument. The function that associates with each x the value f(g(x)) is called the composition of f and g, and it is written f ◦ g. It follows that (f ◦ g)(x) = f(g(x)).

We will occasionally use the word composition for the composition of partial functions. Hence, if f and g are two partial functions, f ◦ g is the partial function that associates the individual f(g(x)) with x, provided such an individual exists.

Example 3.25. Let m be the function that associates with each x his or her mother, and let f be the function that associates with each x his or her father. Then f(m(x)) is the father of the mother, f(f(x)) is the father of the father, m(f(x)) is the mother of the father, and m(m(x)) is the mother of the mother of x. Consequently, f ◦ m is the function that associates with each individual his or her maternal grandfather, f ◦ f is the function that associates with each individual his or her paternal grandfather, and so on. Note that f ◦ m and m ◦ f are distinct: f ◦ m is the maternal grandfather function, whereas m ◦ f is the paternal grandmother function.

One can also form compositions involving functions with several variables, as shown by the next example.

Example 3.26. Let s(x, y) be the function that associates with each pair (x, y) the sum x + y, and let p(x, y) similarly be the function that associates with each pair (x, y) the product x · y. Then p(s(x, y), z), s(z, p(x, y)), and s(s(z, x), y) are all function compositions: they all associate with each triple (x, y, z) a unique result, as indicated by the following equations:

p(s(x, y), z) = (x + y) · z
s(z, p(x, y)) = z + (x · y)
s(s(z, x), y) = (z + x) + y

Operations are functions.
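The compositions above can be sketched in code. The version of compose below also handles partial functions by modelling "no image" as None, in which case the composition is defined only when g(x) exists; the names s, p, and f1 simply mirror Example 3.26.

```python
def compose(f, g):
    """(f ∘ g)(x) = f(g(x)); if g is partial (returns None when
    there is no image), the composition is partial as well."""
    def h(x):
        y = g(x)
        return None if y is None else f(y)
    return h

# The functions of Example 3.26, written out:
def s(x, y): return x + y   # sum
def p(x, y): return x * y   # product

def f1(x, y, z):
    return p(s(x, y), z)    # the composition (x + y) * z
```

Note that composition is not commutative: composing "add one" after "square" differs from "square" after "add one", just as f ◦ m and m ◦ f differ in Example 3.25.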
However, expressions tend to be much clearer when they are written by using operations than when they are written as functions. For example, (x + y) · z is much clearer than p(s(x, y), z). Hence, we will now use operation symbols rather than functions. Strictly speaking, only total functions are operations. However, we will occasionally use operations to express partial functions.

A domain, together with one or more operations, constitutes an algebra. Since all operations are functions, or at least partial functions, algebraic expressions are really compositions, or even compositions of compositions. In this sense, one can say that algebras deal with function compositions.

Algebraic expressions must be unambiguous, because fallacies arise otherwise. For example, 3 · 4 + 5 may be interpreted as either (3 · 4) + 5 or 3 · (4 + 5), and if the wrong interpretation is used, the resulting derivation is erroneous. This is demonstrated by the following derivation, which pretends to show that 12 = 6.

1. 3 = 2 + 1         premise
2. 12 = 3 · 4        premise
3. 12 = 2 + 1 · 4    substitute 2 + 1 for 3 on line 2

Hence, 12 = 2 + 1 · 4 = 2 + 4 = 6, which is obviously false. To avoid such errors, one can use fully parenthesized expressions. If line 1 had been written as 3 = (2 + 1), and line 2 as 12 = (3 · 4), the result would have been 12 = ((2 + 1) · 4), which does not admit the erroneous conclusion. We note that grouping terms in the wrong way is one of the most frequent errors in algebraic manipulations.

3.7.6 Properties of Operations

This section deals with operations and their properties, a topic of great importance in algebra. The properties of the operations in standard algebra are of course well known. For example, everyone knows that x + y = y + x and that x · y = y · x. These are the commutative properties of addition and multiplication, respectively.

Definition 3.10. A function f of arity 2 is commutative if f(x, y) = f(y, x).
Similarly, if ◦ is an operation, then ◦ is said to be commutative if x ◦ y = y ◦ x. When speaking about an operation, one always implies a certain domain. Note, however, that the same operation symbol may be used in different domains. For example, the operation + may be used for natural numbers, integers, and real numbers. In this case, it is assumed that there are really three different operations, one corresponding to each domain. Often, the domain is given by the context.

Example 3.27. Let max be the binary operation that selects the maximum value of two integers. For example, 3 max 4 = 4. Since x max y = y max x for all x and y, one concludes that the operation max is commutative. The operation '−' is not commutative, because x − y = y − x does not generally hold.

All logical connectives can be thought of as operations. Obviously, ∨ and ∧ are commutative operations, while → is not commutative. If one deals with files, one can define several operations. A merge operation can be used to indicate that two files are merged. The merge operation is typically commutative.

If ◦ is an operation that is defined for a finite universe, one can express the results of the operation by means of a table. Such a table is often called an operation table. For example, the truth tables of logical connectives are operation tables.

Example 3.28. Examine whether the operation ◦ defined in the following table is commutative. The entry in row x and column y gives x ◦ y.

◦ | a b c d
--+--------
a | a b d c
b | a b c d
c | d c b a
d | a b a b

The operation ◦ is not commutative. To be commutative, the operation must have the property that x ◦ y = y ◦ x for all x and y. This is not the case here. For example, d ◦ a = a, yet a ◦ d = c. Generally, an operation is commutative only if its operation table is symmetric, and this is not the case in this example.

Definition 3.11. If ◦ is an operation, then ◦ is said to be associative if, for all x, y, and z, x ◦ (y ◦ z) = (x ◦ y) ◦ z.
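The symmetry test of Example 3.28 is easy to mechanize. The sketch below assumes the operation table reads as reconstructed here, with the entry in row x and column y giving x ◦ y, which is consistent with the text's claims d ◦ a = a and a ◦ d = c.

```python
# Operation table of Example 3.28 (assumed reading): table[x][y] = x ∘ y.
table = {
    "a": {"a": "a", "b": "b", "c": "d", "d": "c"},
    "b": {"a": "a", "b": "b", "c": "c", "d": "d"},
    "c": {"a": "d", "b": "c", "c": "b", "d": "a"},
    "d": {"a": "a", "b": "b", "c": "a", "d": "b"},
}

def is_commutative(table):
    """A finite operation is commutative iff its table is symmetric."""
    return all(table[x][y] == table[y][x] for x in table for y in table)
```

Applied to the table above, the test fails precisely on pairs such as (a, d), confirming the counterexample given in the example.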
Clearly, the operation + is associative, and the operation − is not. If an operation ◦ is associative and if an expression contains ◦ as its only operation, one can drop all parentheses. This makes expressions more readable. To find out whether an operation ◦ given by its operation table is associative, one must verify that x ◦ (y ◦ z) = (x ◦ y) ◦ z holds for each possible combination of x, y, and z. Often the only way to do this is to enumerate all these combinations, and this is a lengthy process. In the case of the table in the previous example, there are 4³ = 64 such combinations, and to show that ◦ is associative, one must enumerate all these 64 combinations.

Example 3.29. Consider a universe of discourse that contains only the two individuals a and b. In this universe, the operation ◦ is defined by the following table:

◦ | a b
--+----
a | a b
b | a a

The values x ◦ (y ◦ z) and (x ◦ y) ◦ z are calculated in the following table:

x y z | y ◦ z | x ◦ (y ◦ z) | x ◦ y | (x ◦ y) ◦ z
a a a |   a   |      a      |   a   |      a
a a b |   b   |      b      |   a   |      b
a b a |   a   |      a      |   b   |      a
a b b |   a   |      a      |   b   |      a
b a a |   a   |      a      |   a   |      a
b a b |   b   |      a      |   a   |      b
b b a |   a   |      a      |   a   |      a
b b b |   a   |      a      |   a   |      b

Since the values for these two expressions do not always agree, the operation ◦ is not associative.

Because of rounding errors, floating-point addition is not associative. If (x + y) + z is evaluated, then the operation x + y is done first, and the result is rounded. To this intermediate result, z is added, and rounding takes place once more. In the expression x + (y + z), on the other hand, y + z is calculated first and rounded, and then x is added. This final result is then rounded. Typically, the rounding errors in these two evaluations are different, which means that (x + y) + z need not be equal to x + (y + z). To minimize rounding errors in floating-point arithmetic, one should make sure that all intermediate results are as small as possible. The reason is that rounding errors tend to be proportional to the numbers that are rounded, and smaller intermediate results therefore typically give rise to smaller absolute errors.
Specifically, if x, y, and z are all positive reals and if x and y are both smaller than z, it is best to add x and y first, because the intermediate result x + y is smaller than y + z, and the same tends to be true for the rounding error.

If several operations are defined within the same domain, then the relations between the operations become important. In arithmetic, one has addition and multiplication, and in statement algebra, one has ∧ and ∨. It is therefore important to consider systems that have two operations, say ◦ and ⋄.

Definition 3.12. Let ◦ and ⋄ be two operations. The operation ◦ is said to be left distributive over ⋄ if, for all x, y, and z, one has

x ◦ (y ⋄ z) = (x ◦ y) ⋄ (x ◦ z).

The operation ◦ is said to be right distributive over ⋄ if, for all x, y, and z,

(y ⋄ z) ◦ x = (y ◦ x) ⋄ (z ◦ x).

An operation that is both right and left distributive is said to be distributive.

If ◦ is left distributive over ⋄ and if ◦ is commutative, then ◦ is also right distributive over ⋄. To see this, interchange the order of all terms containing x in the applicable definition.

Example 3.30. In arithmetic, multiplication is distributive over addition, but addition is not distributive over multiplication. This means that x · (y + z) = (x · y) + (x · z), but x + (y · z) ≠ (x + y) · (x + z).

Example 3.31. In logic, ∧ is distributive over ∨, and ∨ is distributive over ∧.

Example 3.32. Let x max y = max{x, y} be the maximum of x and y, and let min{x, y} similarly be the minimum of x and y. We show that max is distributive over min. To show this, one has to show that

max{x, min{y, z}} = min{max{x, y}, max{x, z}}. (3.11)

Assume first that y ≥ z. In this case, min{y, z} = z, and equation (3.11) becomes max{x, z} = min{max{x, y}, max{x, z}}. Since y ≥ z, max{x, y} ≥ max{x, z}, which proves that both sides of the equation are equal. This settles the case y ≥ z. The case y < z can be dealt with in the same way. Hence, by the law of cases, (3.11) is always true.
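The brute-force checks discussed in Examples 3.29 and 3.32 can be sketched as follows; the function names are illustrative. The test also exhibits the floating-point counterexample to associativity mentioned above.

```python
from itertools import product

# Operation table of Example 3.29: table[x][y] = x ∘ y.
table = {"a": {"a": "a", "b": "b"},
         "b": {"a": "a", "b": "a"}}

def is_associative(table):
    """Enumerate all triples and check x ∘ (y ∘ z) = (x ∘ y) ∘ z."""
    return all(table[x][table[y][z]] == table[table[x][y]][z]
               for x, y, z in product(table, repeat=3))

def max_distributes_over_min(values):
    """Check equation (3.11) on a finite sample of values."""
    return all(max(x, min(y, z)) == min(max(x, y), max(x, z))
               for x, y, z in product(values, repeat=3))
```

For the two-element table, the triple (b, a, b) is one of the combinations on which the two sides disagree, so the enumeration reports non-associativity, matching the table in Example 3.29.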
3.7.7 Identity and Zero Elements

In this section we discuss two elements, the identity element and the zero element, which, if they exist, turn out to be very useful for doing algebraic manipulations. In particular, they allow one to simplify algebraic expressions.

Definition 3.13. Let ◦ be an operation defined for some universe of discourse. If there is an individual er with the property that, for all x, x ◦ er = x, then er is called a right identity. Similarly, if there is an individual el such that, for all x, el ◦ x = x, then el is called a left identity. An individual e that is both a right and a left identity is called an identity.

If an operation possesses both a left and a right identity, then the two identity elements must be equal. In fact, any identity that is both a right and a left identity is unique. The important point is that, if we talk about the identity, then we implicitly assume that this identity is both a right and a left identity and that it is therefore unique.

Example 3.33. Consider a domain with the three elements a, b, and c, together with the operation ◦, as defined by the following operation table (the entry in row x and column y gives x ◦ y):

◦ | a b c
--+------
a | a b c
b | c b a
c | a b c

We find all left and right identities. Since a ◦ x = x, no matter whether x is a, b, or c, a is a left identity. Similarly, c is a left identity. There is no right identity, however. In fact, the existence of two distinct left identities precludes the existence of a right identity.

If an operation is commutative, any right identity must be equal to any left identity. This is shown as follows: if el is a left identity of ◦, and if ◦ is commutative, then el ◦ x = x ◦ el, which makes el a right identity. Hence, for commutative operations there is only one identity, which is both a right and a left identity.

Example 3.34. Find the identities for addition and multiplication. Since addition and multiplication are both commutative, each has only one identity. Since ∀x(x + 0 = x) is true, 0 is the identity for +.
Similarly, 1 is the identity for multiplication, because ∀x(x · 1 = x).

Suppose that the operation ◦ has both a right and a left identity e. Any proper subexpression of the form x ◦ y satisfying x ◦ y = e can obviously be deleted. To facilitate such deletions, one defines inverses as follows:

Definition 3.14. If ◦ is an operation with identity e and if x is an individual, then y is called a left inverse of x if y ◦ x = e. Similarly, if x ◦ y = e, then y is called a right inverse of x. An individual that is both a left and a right inverse of x is called an inverse of x.

Example 3.35. Consider the operation ◦ given in the following table (the entry in row x and column y gives x ◦ y). Find the right inverses and the left inverses of all elements.

◦ | a b c d
--+--------
a | a b c d
b | b a a c
c | c b d c
d | d a b c

According to the table, the identity element is a. To find the inverses, one identifies all combinations x ◦ y that yield a. One has a ◦ a = a, b ◦ b = a, b ◦ c = a, and d ◦ b = a. Hence, a is its own inverse, and so is b. In addition, b is a left inverse of c, and c is a right inverse of b. Similarly, d is a left inverse of b, and b is a right inverse of d. There is no right inverse of c; that is, there is no x such that c ◦ x = a.

Example 3.36. Let x be a real number. Find the additive inverse, that is, the inverse under +, for all numbers x. Also, find the multiplicative inverse of x if it exists. The identity for + is 0. The additive left inverse must therefore satisfy y + x = 0. Clearly, y = −x, which makes −x the left inverse of x. One easily verifies that −x is also the right inverse of x. Hence, there is only one inverse, which is simultaneously the right and the left inverse. The inverse of x under multiplication is the reciprocal of x, which we denote by x⁻¹. The right and left inverses are again equal, and the inverse, if it exists, is unique. However, the number 0 has no inverse.

The following metatheorem deals with the uniqueness of an inverse.

Metatheorem 3.1.
If the operation ◦ has the (right and left) identity e, if ◦ is associative, and if the right and left inverses of x are equal, then x has a unique inverse.

Example 3.37. Use the table in Example 3.35 to demonstrate that it is not sufficient to show that the right and left inverses are equal in order to prove that the inverse is unique. In the table of Example 3.35, b is both a right and a left inverse of itself, yet b has other inverses besides b.

In expressions containing inverses, one can rearrange the terms in such a way that the inverses can be combined to yield the identity element, which can then be dropped. For example, the expression (x + y) + (−x) can be written as (y + x) + (−x) by applying the commutative law, and because of the associative law, this yields in turn y + (x + (−x)). Since x + (−x) = e, this yields y + e, or y. Of course, in an expression like x + (−x) = e, nothing is left after e has been dropped, and in this case e must be retained. Except for this case, one can cancel all inverses, provided that the expression contains only one operation and this operation is commutative and associative.

Example 3.38. Simplify a b a⁻¹ b c b⁻¹ in the domain of real numbers, where a⁻¹ and b⁻¹ denote the inverses of a and b, respectively. Since multiplication is associative and commutative, one can cancel a against a⁻¹ and b against b⁻¹, and the result is bc.

Another important element is the zero element. To avoid the need to distinguish between a left and a right zero, we assume that the operation in question is commutative.

Definition 3.15. An element d is called a zero element of a commutative operation ◦ if, for all x, x ◦ d = d.

Example 3.39. Find the zero element of multiplication in the domain of integers. Since ∀x(x · 0 = 0), one concludes that the zero for multiplication is d = 0.

Not all operations have zeros. In particular, the operation + on the integers has no zero: there is no number d such that x + d = d for all possible values of x.
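The identity and inverse searches of Examples 3.33 and 3.35 can be carried out mechanically on any finite operation table. The sketch below uses the table of Example 3.35 as reconstructed here (entry in row x, column y is x ◦ y, consistent with the inverses listed in the example); the function names are illustrative.

```python
# Operation table of Example 3.35 (assumed reading): table[x][y] = x ∘ y.
table = {
    "a": {"a": "a", "b": "b", "c": "c", "d": "d"},
    "b": {"a": "b", "b": "a", "c": "a", "d": "c"},
    "c": {"a": "c", "b": "b", "c": "d", "d": "c"},
    "d": {"a": "d", "b": "a", "c": "b", "d": "c"},
}

def identity(table):
    """Return the two-sided identity element, if one exists."""
    for e in table:
        if all(table[e][x] == x and table[x][e] == x for x in table):
            return e
    return None

def left_inverses(table, x, e):
    """All y with y ∘ x = e."""
    return [y for y in table if table[y][x] == e]

def right_inverses(table, x, e):
    """All y with x ∘ y = e."""
    return [y for y in table if table[x][y] == e]
```

Running the search reproduces the findings of Examples 3.35 and 3.37: b is its own inverse, yet b also has a second left inverse (d) and a second right inverse (c), and c has no right inverse at all.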
Suppose that ◦ is a commutative and associative operation with the zero element d. Then x1 ◦ x2 ◦ ... ◦ xn = d if any term xi, i = 1, 2, ..., n, is equal to the zero element d.

Example 3.40. In the case of multiplication of integers, a · b · c = 0 if either a, b, or c is zero.

In expressions with two operations, further possibilities for creating identity elements and zero elements arise. In the domain of the real numbers, the identity of addition is the zero of multiplication, which allows one to simplify expressions such as (x + (−x)) · y, which simply becomes 0.

3.7.8 Derivations in Equational Logic

Equational logic will now be used to give formal derivations for results from the previous section. Moreover, some new results will be derived in a formal way. In equational logic, as in other formal derivations, unification plays a major role.

It was stated that if an operation has a right and a left identity, then these two identities must be equal, and there is no other identity. Hence, if u and u′ are two left identities and if v and v′ are two right identities, then u = u′ = v = v′. This result is now formally derived. We need the following premises, which are direct consequences of the definitions (steps 1-4). In steps 5-7 we unify the left sides of lines 1 and 2 and equate the results. In steps 8-10 we unify lines 3 and 4 and similarly find u′ = v′. Finally, in steps 11-13 we unify lines 1 and 4 to show that u = v′.

1.  u ◦ x = x       u is a left identity, x is a true variable
2.  x ◦ v = x       v is a right identity, x is a true variable
3.  u′ ◦ x = x      u′ is a left identity, x is a true variable
4.  x ◦ v′ = x      v′ is a right identity, x is a true variable
5.  u ◦ v = v       instantiate line 1 with x := v
6.  u ◦ v = u       instantiate line 2 with x := u
7.  u = v           substitute u ◦ v on line 5 by line 6
8.  u′ ◦ v′ = v′    instantiate line 3 with x := v′
9.  u′ ◦ v′ = u′    instantiate line 4 with x := u′
10. u′ = v′         substitute u′ ◦ v′ on line 8 by line 9
11. u ◦ v′ = v′     instantiate line 1 with x := v′
12. u ◦ v′ = u      instantiate line 4 with x := u
13. u = v′          substitute u ◦ v′ on line 11 by line 12

The equalities of lines 7, 10, and 13 imply that u, v, v′, and u′ are all equal. To be rigorous, one would have to prove equality for each combination of these four variables; that is, one would have to prove that u = v, u = v′, u = u′, v = v′, and so on, but since these proofs are easy, they are omitted.

Let d be the inverse of c. Then one has

∀x((x ◦ c) ◦ d = x). (3.12)

For a formal proof, all premises must be stated. They are as follows:

1. x ◦ e = x                     e is an identity, x is a true variable
2. c ◦ d = e                     d is the inverse of c, where c and d are fixed
3. (x ◦ y) ◦ z = x ◦ (y ◦ z)     the operation is associative; x, y, and z are true variables

Line 1 indicates that to have an inverse one must have an identity. This identity is denoted by e. Line 2 defines d to be the inverse of c, and line 3 indicates that ◦ is associative. To derive (x ◦ c) ◦ d = x, first unify (x ◦ c) ◦ d with the left side of line 3. This yields line 4. The remaining lines of the derivation are easy to trace.

4. (x ◦ c) ◦ d = x ◦ (c ◦ d)     instantiate line 3 with y := c and z := d
5. (x ◦ c) ◦ d = x ◦ e           replace c ◦ d on line 4 by line 2
6. (x ◦ c) ◦ d = x               substitute line 1 into line 5

Since x is a true variable, one can generalize the last line to obtain (3.12) as required.

In some cases, there is no premise that can be applied directly. In such cases, one uses the reflexivity axiom to generate an equality that can then be modified by substitutions.

Example 3.41. Show that a ◦ (b ◦ e) = a ◦ b. Here, e is the identity. The premises are the reflexivity axiom and the definition of the identity. The reflexivity axiom is used first, with x instantiated to the left side of the desired conclusion. The following derivation results.

1. ∀x(x = x)                     reflexivity axiom
2. ∀x(x ◦ e = x)                 definition of the identity e
3. a ◦ (b ◦ e) = a ◦ (b ◦ e)     instantiate line 1 with x := a ◦ (b ◦ e)
4. b ◦ e = b                     instantiate line 2 with x := b
5. a ◦ (b ◦ e) = a ◦ b           replace the second b ◦ e in line 3 by line 4

It is well known that one can add a constant to both sides of an equation and that one can multiply both sides of an equation by a constant. More generally, from a = b, one can conclude that a ◦ c = b ◦ c for any operation ◦. One has

1. ∀x(x = x)        reflexivity axiom
2. a = b            premise
3. a ◦ c = a ◦ c    instantiate line 1 with x := a ◦ c
4. a ◦ c = b ◦ c    substitute from line 2 into line 3

The step from a = b to a ◦ c = b ◦ c is called postmultiplication. In spite of its name, postmultiplication is not restricted to multiplication; it can involve any operation, including addition, Boolean operations, and others. One can also premultiply equations. For example, if a = b, then premultiplication by c leads to c ◦ a = c ◦ b. Note that a, b, and c are true variables, which means that one can generalize as follows:

∀x∀y∀z((x = y) → (z ◦ x = z ◦ y)) (3.13)
∀x∀y∀z((x = y) → (x ◦ z = y ◦ z)) (3.14)

3.7.9 Equational Logic in Practice

In practice, algebraic manipulations are often abbreviated. The commutative and associative laws are often used like rules of inference, and to get from one expression to the next, one often applies several rules simultaneously. Moreover, to express the two equations x = y and y = z, one frequently writes x = y = z. Because of the transitive property of equality, x = y and y = z imply that x = z, and for this reason, x = y = z also means that x = z. This observation obviously generalizes. For example, to prove that x1 = x4, one can show that x1 = x2, x2 = x3, and x3 = x4. One abbreviates this as x1 = x2 = x3 = x4. The following example shows how this idea is applied.

Example 3.42. Simplify a ◦ (b ◦ a⁻¹), where ◦ is commutative and associative, and a⁻¹ is the inverse of a.
One has

a ◦ (b ◦ a⁻¹) = a ◦ (a⁻¹ ◦ b)    commutativity
              = (a ◦ a⁻¹) ◦ b    associativity
              = e ◦ b            inverse
              = b                identity

Note the strategy used: in each step, the distance between a and a⁻¹ is reduced in some sense, until a ◦ a⁻¹ is obtained. Generally, one should always try to keep the objective of the derivation in mind and try to narrow the gap between what one has and what one wants to accomplish. It also helps to identify intermediate goals, especially when dealing with long derivations.

The process of writing a sequence of expressions, each equal to the previous one, with the objective of reaching some type of goal is used extensively in functional programming languages, where it is known as rewriting. Although rewriting is extremely important, it sometimes leads to inefficiencies, because it may require that some subexpression be evaluated repeatedly. To show this, consider the calculation of the Fibonacci numbers Fn, which are defined as follows:

F0 = 1,  F1 = 1,  Fn = Fn−1 + Fn−2,  n > 1. (3.15)

For example, F2 = F1 + F0, F3 = F2 + F1, and so on. Suppose now that F4 is to be calculated. This can be done in two ways. First, one can write

F2 = F1 + F0 = 1 + 1 = 2    since F1 = 1, F0 = 1
F3 = F2 + F1 = 2 + 1 = 3    since F2 = 2, F1 = 1
F4 = F3 + F2 = 3 + 2 = 5    since F3 = 3, F2 = 2

Alternatively, one can rewrite F4 as follows:

F4 = F3 + F2
   = (F2 + F1) + (F1 + F0)    expand F3 and F2
   = (F2 + 1) + (1 + 1)       F1 = F0 = 1
   = ((F1 + F0) + 1) + 2      expand F2; 1 + 1 = 2
   = ((1 + 1) + 1) + 2        F1 = F0 = 1
   = (2 + 1) + 2
   = 5.

Note that, in this derivation, F2 = F1 + F0 = 1 + 1 = 2 has been calculated twice. This can happen unless special precautions are taken, and it will be shown that this can slow down execution times dramatically. Generally, it pays to watch for common subexpressions appearing more than once in order to avoid evaluating them repeatedly.

An important result is the cancellation rule, which allows one to infer, under certain conditions, that a = b from a ◦ c = b ◦ c.
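The two evaluation strategies for (3.15) can be compared in code. The rewriting-style version re-derives every subexpression; a cached version, like the first tabular method, computes each Fk once. The call counter is an illustrative device for making the repeated work visible.

```python
calls = {"naive": 0}

def fib_naive(n):
    """Rewriting-style evaluation: subexpressions are re-evaluated repeatedly."""
    calls["naive"] += 1
    if n < 2:
        return 1  # F0 = F1 = 1, as in (3.15)
    return fib_naive(n - 1) + fib_naive(n - 2)

def fib_memo(n, cache={}):
    """Each Fk is computed once and then reused, like the tabular method."""
    if n < 2:
        return 1
    if n not in cache:
        cache[n] = fib_memo(n - 1) + fib_memo(n - 2)
    return cache[n]
```

For n = 4 the naive version already makes 9 recursive calls instead of the 3 additions of the tabular method, and the call count grows exponentially with n, which is the dramatic slowdown alluded to above.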
To see why restrictions apply, consider the following applications of the cancellation rule. Clearly, if a + c = b + c, then a = b. In the case a · c = b · c, however, one can conclude that a = b only if c ≠ 0. If c = 0, the cancellation law does not apply. This suggests that for the cancellation law to hold, the term to be cancelled must have an inverse, and this is indeed the case. Hence, the derivation must reflect that c has an inverse, say c⁻¹. The premise is a ◦ c = b ◦ c. Postmultiplying by c⁻¹ yields

(a ◦ c) ◦ c⁻¹ = (b ◦ c) ◦ c⁻¹.

We can now simplify both sides separately. The left side yields

(a ◦ c) ◦ c⁻¹ = a ◦ (c ◦ c⁻¹)    Associativity
              = a ◦ e            Inverse          (3.16)
              = a                Identity

A similar argument shows that the right side yields b, and we conclude that a = b. Hence, if c has an inverse and if ◦ is associative, then a ◦ c = b ◦ c implies that a = b.

3.7.10 Boolean Algebra

For statement algebra, the notions of commutativity and associativity have already been introduced. Indeed, ∧ and ∨ are both commutative and associative. Moreover, P ∧ T = P (where T is the truth value 'true') for all P, which makes T the identity of ∧. The identity of ∨ is similarly F (where F is the truth value 'false'). Also, by the law of domination, P ∧ F = F, which makes F the zero of ∧. The zero of ∨ is similarly T. Finally, the distributivity laws indicate that ∧ is distributive over ∨ and that ∨ is distributive over ∧.

However, there is a difference between the philosophy of statement algebra and that of other algebras. As the name implies, statement algebra deals with statements, and it distinguishes between atomic and compound statements. Two statements of different form are not considered equal, even if they are equivalent. For example, P ∨ Q is not considered equal to Q ∨ P. For this reason, we formally introduce a number of new operations and new symbols. In fact, we introduce a Boolean algebra. A Boolean algebra is an algebra with two operations, which are denoted by + and ·.
Both operations are total in the sense that, for all arguments x and y, x + y and x · y are defined. Moreover, the algebra has the following properties:

1. There is an identity element for +, called 0.
2. There is an identity element for ·, called 1.
3. The operation + is commutative.
4. The operation · is commutative.
5. The operation + is distributive over ·.
6. The operation · is distributive over +.
7. For every individual x, there is an element x′, called the complement of x, with the property that x + x′ = 1 and x · x′ = 0.
8. The domain contains at least two elements.

The operation symbol · is often omitted. In the simplest case, the domain of a Boolean algebra contains two values, 0 and 1. Such a Boolean algebra is called a two-valued Boolean algebra. We will show that the following assignment constitutes a two-valued Boolean algebra. In fact, it is the only such algebra.

1. The complements x′ are defined as follows: 0′ = 1, 1′ = 0.
2. The operations + and · are defined by the following operation tables:

    · | 0 1        + | 0 1
    --+----        --+----
    0 | 0 0        0 | 0 1
    1 | 0 1        1 | 1 1

We now verify that all the conditions of a Boolean algebra are met. First, since the operation tables for + and · are both symmetric, the algebra is commutative. Second, 0 is the identity element of + and 1 is the identity element of ·, as is easily verified from the operation tables. The fact that + is distributive over · is proved by verifying that, for all x, y, z ∈ {0, 1}, one has

x + (y · z) = (x + y) · (x + z).    (3.17)

This is done in the following table:

                1        2        3       4         5
    x y z  |  y · z  x + y·z   x + y   x + z  (x+y)·(x+z)
    0 0 0  |    0       0        0       0         0
    0 0 1  |    0       0        0       1         0
    0 1 0  |    0       0        1       0         0
    0 1 1  |    1       1        1       1         1
    1 0 0  |    0       1        1       1         1
    1 0 1  |    0       1        1       1         1
    1 1 0  |    0       1        1       1         1
    1 1 1  |    1       1        1       1         1

In this table, x + (y · z) is given in the column labelled 2, and (x + y) · (x + z) in the column labelled 5. The two columns are equal in all cases, which means that (3.17) holds in all cases. The proof that · is distributive over + is done in a similar way.
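The table-based verification above can also be carried out by brute force. The following Python sketch (names such as b_add and comp are made up for illustration) encodes the two operation tables and checks both distributive laws and the complement conditions over the whole domain {0, 1}:

```python
from itertools import product

def b_add(x, y):
    """The + operation table: 0 + 0 = 0, all other entries are 1."""
    return x | y

def b_mul(x, y):
    """The · operation table: 1 · 1 = 1, all other entries are 0."""
    return x & y

def comp(x):
    """Complementation: 0' = 1 and 1' = 0."""
    return 1 - x

# Check both distributive laws on all eight value combinations.
for x, y, z in product((0, 1), repeat=3):
    assert b_add(x, b_mul(y, z)) == b_mul(b_add(x, y), b_add(x, z))  # (3.17)
    assert b_mul(x, b_add(y, z)) == b_add(b_mul(x, y), b_mul(x, z))

# Check the complement conditions x + x' = 1 and x · x' = 0.
for x in (0, 1):
    assert b_add(x, comp(x)) == 1
    assert b_mul(x, comp(x)) == 0

print("two-valued Boolean algebra verified")
```

Because the domain is finite, exhaustively enumerating the eight rows of the truth table is a complete proof of (3.17), not merely a spot check.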
Finally, 0 + 0′ = 1 and 1 + 1′ = 1, which means that, for all x, x + x′ = 1. The fact that x · x′ = 0 for all x is shown in a similar way.

We now show that this is the only two-valued Boolean algebra. Since 0 is the identity for +, one must have 0 + 0 = 0 and 0 + 1 = 1 + 0 = 1, and since 1 is the identity of ·, one must have 0 · 1 = 1 · 0 = 0 and 1 · 1 = 1. Moreover, since a Boolean algebra must satisfy x + x′ = 1 for all x, 0 + 0′ must be 1. This rules out 0′ = 0 because, under this assignment, 0 + 0′ = 0 + 0 = 0. The only alternative assignment for 0′ is of course 1, and this corresponds to the Boolean algebra given earlier. A similar argument shows that 1′ = 0. The only values still open at this point are 1 + 1 and 0 · 0; if they are assigned any values other than 1 + 1 = 1 and 0 · 0 = 0, the system no longer has the required distributive properties. The reader may want to verify this. This leaves the Boolean algebra given previously as the only possibility.

If in statement algebra all equivalences are interpreted as equalities, then statement algebra becomes a Boolean algebra. To see this, redefine 0, 1, +, ·, and complementation as shown in the following table:

    Boolean     Statement    Natural
    algebra     algebra      language
       0            F          false
       1            T          true
       +            ∨          or
       ·            ∧          and
       x′           ¬x         not

There is also a dual interpretation of the two-valued Boolean algebra. In this dual interpretation, one uses 0 for T, 1 for F, + for ∧, and · for ∨. This dual interpretation is the explanation for the dual relations discussed earlier.

Example 3.43. Express p ∨ ¬p ↔ T and p ∧ ¬p ↔ F in Boolean algebra.

p ∨ ¬p ↔ T becomes p + p′ = 1, and p ∧ ¬p ↔ F becomes p · p′ = 0.

In conclusion, equational logic provides a basis for Boolean algebra and, indirectly, a basis for propositional logic. In this sense, it ties together predicate logic and propositional logic.

3.8 An Axiomatization for Fields

3.8.1 Tarski's Axioms for Fields

We begin from the well-known field theory of the real numbers. Consider a first order language P that is an extension of classical predicate calculus.
We give an axiomatization for field theory using axioms essentially given by Tarski in his Introduction to Logic, New York, 1941.

Definition 3.16. Let P = {F, +, ×, −, ⁻¹, 0, 1} be a first order calculus, where F is the set of terms of the system, + and × are binary operation symbols, − and ⁻¹ are unary operation symbols, and 0 and 1 are individual constants. A theory having the following non-logical axioms is called (a first order) field theory:

A1. ∀x ∀y (x + y = y + x),
A2. ∀x (x + 0 = x) (0 is the identity element of +),
A3. ∀x (x + (−x) = 0),
A4. ∀x ∀y ∀z ((x + y) + z = x + (y + z)),
A5. ∀x ∀y ∀z ((x × y) × z = x × (y × z)),
A6. ∀x (x × 1 = x) (1 is the identity element of ×),
A7. ∀x (¬(x = 0) → x × x⁻¹ = 1),
A8. ∀x ∀y (x × y = y × x),
A9. ∀x ∀y ∀z (x × (y + z) = x × y + x × z),
A10. ¬(0 = 1).

A model of a field theory is called a field.

3.8.2 The Field of Real Numbers

The set of real numbers, i.e. the first order calculus PR = {R, +, ×, −, ⁻¹, 0, 1}, is a field satisfying these axioms for the arithmetical operations of real numbers. The variable symbols refer to crisp real numbers.
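As a quick sanity check of axioms A1 through A10, one can test them on sample elements of a concrete field. The sketch below uses Python's exact rational arithmetic to stand in for the field Q; testing a finite sample does not, of course, prove the universally quantified axioms, but any failure would refute them:

```python
from fractions import Fraction
from itertools import product

# A small sample of rational numbers; Q is a field, so A1-A10 should hold.
sample = [Fraction(0), Fraction(1), Fraction(-2, 3), Fraction(5, 2)]

for x, y, z in product(sample, repeat=3):
    assert x + y == y + x                    # A1  commutativity of +
    assert x + 0 == x                        # A2  0 is the identity of +
    assert x + (-x) == 0                     # A3  additive inverse
    assert (x + y) + z == x + (y + z)        # A4  associativity of +
    assert (x * y) * z == x * (y * z)        # A5  associativity of ×
    assert x * 1 == x                        # A6  1 is the identity of ×
    if x != 0:
        assert x * x ** -1 == 1              # A7  multiplicative inverse
    assert x * y == y * x                    # A8  commutativity of ×
    assert x * (y + z) == x * y + x * z      # A9  distributivity
assert Fraction(0) != Fraction(1)            # A10
print("all ten axioms hold on the sample")
```

Exact rationals are used instead of floating-point numbers deliberately: with floats, rounding would make associativity and distributivity fail on some inputs, even though the real numbers themselves form a field.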