Short Introduction to Elementary Set Theory and Logic
This is a short introduction to elementary set theory and logic. The goal of these notes is to
familiarize students of Introduction to Linear Algebra (Math 235) with the basic concepts and
notation of set theory. Even though this is not formally part of the course, students who study
these notes will find it easier to understand various concepts throughout the class, especially
in the later chapters of the book.
1 Elementary Set Theory
§ 1.1 Sets
A set is a collection of objects. We denote sets by letters, usually capital ones. The objects of
a set are called the elements of the set. For instance, the collection of all integers more than 0
and less than 5 is a set, let’s call it A, consisting of the elements 1, 2, 3, and 4. Each object is
considered distinct and no repetitions occur. For instance, if we say that S is a set consisting
of the numbers 1, 1, and 2 (1 appearing twice), we mean that S consists of 1 and 2; we ignore
the repetition of the element 1.
When x is an element of S, we also say that x lies in S. The expression x ∈ S means that
x is an element of the set S, while the expression x ∉ S means that x is not an element of S.
If A is the set described above, then 3 ∈ A, while 7 ∉ A. For another example, let P be the
set of planets in our solar system. It is a set consisting of 8 elements (Pluto is not a planet).
Then “Jupiter” ∈ P, while 4 ∉ P.
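For readers who like to experiment, the conventions above can be tried out in Python, whose built-in set type happens to follow the same rules. This is only an illustrative sketch and not part of the course; writing the planets as strings is our own choice.

```python
# The set A of all integers more than 0 and less than 5.
A = {1, 2, 3, 4}

# Repetitions are ignored: listing 1 twice gives the same set as {1, 2}.
S = {1, 1, 2}
print(S == {1, 2})      # True

# Membership: `in` plays the role of ∈, and `not in` the role of ∉.
print(3 in A)           # True
print(7 not in A)       # True

# The set P of planets, with each planet written as a string.
P = {"Mercury", "Venus", "Earth", "Mars",
     "Jupiter", "Saturn", "Uranus", "Neptune"}
print(len(P))           # 8
print("Jupiter" in P)   # True
print(4 not in P)       # True
```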
We can denote a set by listing its elements inside curly brackets “{. . .}”. The set A given
above is {1, 2, 3, 4}. If the set has a large number of elements, or even infinitely many of them,
and these elements follow some pattern, then we may use dots “. . .” to shorten our expressions.
For instance, if B is the set of all even integers greater than 7 and less than 2015, and C is the
set of all integers greater than −5, we may write
B = {8, 10, 12, . . . , 2014}
and C = {−4, −3, −2, . . .} .
Sometimes the elements of a set do not follow a clear pattern, but still they are of the same
“type” or “kind”. As an example, the collection of all the functions of real numbers is a set,
let’s call it F, but we cannot list its elements with a clear pattern and write F = {f₁, f₂, f₃, . . .}.
In this case we just describe the set in words, which we may put between curly brackets. In
particular,
F = {functions of real numbers} .
There are some commonly used sets for which notation has been standardized. Some of
them are:
N : Set of all natural numbers (positive integers)
Z : Set of all integers
Q : Set of all rational numbers
R : Set of all real numbers
Note that 0 is not an element of N. You may also be familiar with complex numbers, the set
of which is denoted by C. Later in class you will learn more “standard” sets, like the set of all
(real) n-vectors, denoted by Rⁿ, and the set of (real) m × n matrices, denoted by M_{m×n}(R)
or R^{m×n}. One set that you should also have in mind is the empty set, the set which has no
elements. It is denoted by ∅.
§ 1.2 Subsets
Given a set S, a subset of S is a set U that consists of some of the elements of S. Equivalently,
U is a subset of S if whenever x is an element of U , x is also an element of S, i.e. if x ∈ U ,
then x ∈ S. We write U ⊂ S or S ⊃ U to denote that U is a subset of S. For instance the set
B = {1, 2} is a subset of the set A = {1, 2, 3, 4}, as both 1 and 2 are elements of A. On the
contrary, C = {−1, 3, 4} is not a subset of A, because −1 ∈ C but −1 ∉ A. So we may write
B ⊂ A, but not C ⊂ A. Note that S ⊂ S is always true, no matter what S is.
We can describe subsets by listing their elements between curly brackets as in §1.1, or in
any other way we use to describe sets. In addition to that, there is an alternative way for
describing subsets. We illustrate this with an example. Consider the collection G of all the
functions of real numbers whose value at 0 is 1. This is clearly a subset of the set F of all
functions of real numbers. We cannot list the elements of G with a clear pattern and write
G = {f₁, f₂, f₃, . . .}. In this case we use the following expression to describe G:
G = {f ∈ F : f (0) = 1}
This says that G is a set whose elements are the elements of F that satisfy the property
f (0) = 1. To determine whether an element f of F (a function of real numbers) lies in G we
have to ask the question: is the value of f at 0 equal to 1? If yes, then f ∈ G, otherwise f ∉ G.
In general, given a set S, the expression
U = {x ∈ S : P (x)}
means U is the subset of S consisting of all elements of S that satisfy property P (x). The
property P (x) is an expression that involves x and can be true or false, depending on the
particular value of the variable x. The variable x takes values from the set S. The elements of
U are the elements x of S that make P (x) true.
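The expression U = {x ∈ S : P(x)} has a close analogue in Python's set comprehensions. The sketch below is only an illustration: the finite set S and the property "x is even" are our own choices, not taken from the notes.

```python
# A finite stand-in for S: the integers from -5 to 5 (our choice).
S = set(range(-5, 6))

def P(x):
    """The property P(x): 'x is even'. It is true or false depending on x."""
    return x % 2 == 0

# U = {x ∈ S : P(x)}: keep the elements x of S that make P(x) true.
U = {x for x in S if P(x)}

print(sorted(U))   # [-4, -2, 0, 2, 4]
print(U <= S)      # True: U is a subset of S
```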
There is one more alternative way of expressing subsets. Once again, we begin with an
example. Recall that N denotes the set of all positive integers. Consider the set Q of squares
of all positive integers, namely 1 = 1², 4 = 2², 9 = 3², 16 = 4², and so on. We could write
Q = {1, 4, 9, 16, . . .} and hope that the reader will understand what the pattern is. We can also
write this set as
Q = {n² : n ∈ N} .
This expression says that Q consists of all numbers n², as n ranges in the set N. So, to determine
which elements lie in Q, we give n all the values from the set N and we calculate n²; the results
are the elements of Q.
In general, given a set S and an expression R(x) that takes elements x ∈ S and returns
elements from some set T , we write
U = {R(x) : x ∈ S}
for the subset U of T (not of S) consisting of all elements R(x) as x ranges in the set S. To
determine which elements lie in U , we give x all the values from the set S, and we calculate
R(x).
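The expression U = {R(x) : x ∈ S} can be imitated in the same way. A computer cannot run through all of N, so the sketch below truncates N to the integers 1, ..., 10; this truncation is our assumption and only serves the illustration.

```python
# A finite piece of N: the positive integers 1, ..., 10 (our truncation).
S = set(range(1, 11))

def R(n):
    """The expression R(n) = n^2."""
    return n ** 2

# U = {R(x) : x ∈ S}: give x all the values from S and collect the results.
U = {R(n) for n in S}

print(sorted(U))   # [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```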
Before we move ahead, we should make one important remark. Sets are characterized by
their elements, and not by the way we choose to represent them. For instance, we can write
the set of positive even integers in any of the following ways:
{2, 4, 6, . . .} ,
{2n : n ∈ N} ,
{x ∈ N : x is even} .
These expressions refer to the same set, not different ones. It is not always clear if the sets
defined by different expressions are equal or distinct.
To determine if two sets are equal we need to check if they have precisely the same elements.
We do this as follows. Say that X and Y are two sets. To show that X and Y are equal, we
need to show that any element x ∈ X lies in Y , and that any element y ∈ Y lies in X. This
amounts to showing X ⊂ Y and X ⊃ Y . To show that two sets are not equal, we just need to
find an element which lies in one set but not the other.
For an example, let A = {−2, 2} and B = {x ∈ R : x² − 4 = 0}. We claim that A and B
are equal. Let a be an element of A. We need to show that a is in B. The elements of B are
precisely the numbers that satisfy the equation x² − 4 = 0. Since a ∈ A, from the description
of A we know that a is either −2 or 2. No matter what the case is, a satisfies a² − 4 = 0: if
a = −2, then (−2)² − 4 = 0 is true, and similarly if a = 2, then 2² − 4 = 0. Thus a ∈ B and so
A is a subset of B. Now we show that any element of B is in A. Let b ∈ B. Then b is a real
number that satisfies the equation x² − 4 = 0. This equation has two solutions, −2 and 2. So,
b is either −2 or 2. In any case, b is in A. This shows A ⊃ B, and so A = B follows.
The latter example was easy because the two sets (which turned out to be equal) had only
2 elements. But what if the two sets had infinitely many elements? Consider the infinite sets
A = {x ∈ R : −2 < x < 2} and B = {x ∈ R : x² + 1 < 5} .
We show that they are equal. First let a be an arbitrary element in A. We do not know which
element a is, we only know that it satisfies −2 < a < 2. There are infinitely many possibilities
for a. But since −2 < a < 2, the square of a is a non-negative number less than 4, i.e. a² < 4.
Adding one in this inequality we get a² + 1 < 5. We see that a satisfies the inequality x² + 1 < 5,
so that it is an element of B. Conversely, let b ∈ B be an arbitrary element. The only thing we
can tell about b is that it satisfies b² + 1 < 5. Then we can subtract 1 in the latter inequality
and get b² < 4. This is equivalent to −2 < b < 2. Therefore, b is a real number that satisfies
the inequality −2 < x < 2 and so it is in A. We conclude that A = B.
Showing that two sets are not equal is much easier. For instance, the set of integers Z is not
equal to the set of natural numbers N since −1 ∈ Z, while −1 ∉ N. This is enough to establish
that Z and N are distinct.
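For finite sets, both kinds of argument above can be carried out mechanically. The Python sketch below is an illustration only: the pool of candidate numbers standing in for R, and the finite pieces of Z and N, are our assumptions, since a computer cannot list an infinite set.

```python
# A finite pool of candidates standing in for R (our assumption).
candidates = range(-10, 11)

A = {-2, 2}
B = {x for x in candidates if x ** 2 - 4 == 0}

# X = Y exactly when X ⊂ Y and X ⊃ Y.
print(A <= B)   # True: every element of A lies in B
print(A >= B)   # True: every element of B lies in A
print(A == B)   # True

# To show that two sets are different, a single witness is enough.
Z_part = set(range(-3, 4))   # a finite piece of Z
N_part = set(range(1, 4))    # a finite piece of N
print(-1 in Z_part and -1 not in N_part)   # True
```

Note that such a check only confirms equality within the chosen pool of candidates; for infinite sets the argument with an arbitrary element is still needed.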
We should point out the difference between showing X = Y and X ≠ Y for two sets X and
Y . If we want to show that X and Y are equal, it is not sufficient to pick a specific element
of X and show that it lies in Y . We need to show that every element of X is in Y , and vice
versa. In practice, since many sets have a huge number or even infinitely many elements, we
do this by picking an arbitrary element of X, for which we know that it satisfies the properties
that all elements in X satisfy, and then show that it lies in Y based on these properties. For
showing that X ≠ Y, we need only one element that lies in one of the sets but not in the other. There may be other
elements that lie in both sets, but the existence of a single one that does not is enough.
In class we will encounter subsets numerous times. We will talk about the kernel and the
image of a linear transformation, which are certain kinds of subsets, and we will discuss
subspaces of Rⁿ or other vector spaces, which are also certain types of subsets (don’t worry
about what these words mean yet).
2 Functions
§ 2.1 Functions
A function f between sets X and Y is a rule that associates to each element x of X a unique
element of Y. We denote by f(x) the element associated to x. We call the set X the domain of
f, and Y the codomain or range of f. The element f(x) is called the image or the value of x.
For convenience, we write f : X → Y to say that f is a function with domain X and range Y.
We also write x ↦ f(x) to say that the image of x is f(x). A function is also called a map,
mapping, or transformation. In these notes I use the term function, but in the class I will use
all these names. Almost all of the functions we will encounter in this class will be between
vector spaces, and particularly between Rⁿ and Rᵐ (don’t worry about what these words mean
right now).
We give some examples of functions. Recall that the set of all real numbers is denoted by
R. The familiar functions of real numbers associate a real number to a real number. In the
new language, we can write f : R → R to mean that f is a function of real numbers. Note that
here R is both the domain and the range. For instance f(x) = x² is a function f : R → R. The
image of any x is x², so we may write x ↦ x².
Let X = {1, 2, 3} and Y = {2, 4, 6, 8, 10}. We can define a function f : X → Y by
f (1) = 4,
f (2) = 10,
f (3) = 4.
We can define a function g : Y → X by sending 10 to 3, and all the rest of the elements to 1.
Then we would write
g(2) = 1,
g(4) = 1,
g(6) = 1,
g(8) = 1,
g(10) = 3.
Now let P be the set of all planets in our solar system. Recall that Z denotes the set of all
integers. We can define a function ϕ : P → Z by sending each planet to the number of (natural)
satellites it has. Some of the values of ϕ are
ϕ (“Earth”) = 1,
ϕ(“Venus”) = 0,
ϕ (“Mars”) = 2.
In some of the previous examples we specified a function by giving the value for every single
element in the domain. This was possible because the domain consisted of a small number of
elements. If the domain is large, or even infinite, then we cannot describe a function this way,
and we use other means. Let f : N → N be the function that sends any n ∈ N to 2n. Then
we can describe this function by writing f(n) = 2n or n ↦ 2n. This function has a “formula”,
namely f (n) = 2n. Most interesting functions have a formula or expression of some sort. But,
in general, a function does not necessarily have a formula or nice description. If we associate
to each natural number another random natural number, starting with, say
f (1) = 6432,
f (2) = 2,
f (3) = 8791872398,
f (4) = 777,
...
we will still get a function N → N, one that we have no nice way to describe. Fortunately
such functions are not usually of any particular interest.
Two important remarks before we end this subsection. In the definition of a function we
stated that a function f : X → Y associates to an element x ∈ X a unique element f (x) ∈ Y .
If we associate an element of X with no element or with more than one element of Y, then the result is
not a function. If X = {1, 2, 3} and Y = {2, 4, 6, 8, 10} are as above, then we cannot have a
function f : X → Y such that f (1) = 2 and f (1) = 4 at the same time.
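When the domain is finite, the requirement that every element of X gets exactly one value can be checked mechanically. The Python sketch below is only an illustration; the helper is_function and the choice to store an association as a list of (x, y) pairs are ours.

```python
X = {1, 2, 3}
Y = {2, 4, 6, 8, 10}

# The function f : X → Y from the text, stored as a dictionary.
f = {1: 4, 2: 10, 3: 4}

def is_function(pairs, domain, codomain):
    """Check that a list of (x, y) pairs gives each x in the domain
    exactly one value, and that every pair lies in domain x codomain."""
    for x in domain:
        values = {y for (a, y) in pairs if a == x}
        if len(values) != 1:
            return False   # no value, or more than one value, for x
    return all(a in domain and y in codomain for (a, y) in pairs)

print(is_function(list(f.items()), X, Y))                    # True
print(is_function([(1, 2), (1, 4), (2, 6), (3, 8)], X, Y))   # False: 1 has two values
print(is_function([(1, 2), (2, 6)], X, Y))                   # False: 3 has no value
```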
As sets are determined by their elements and not by the way we choose to represent them,
functions are determined by the underlying association between the elements of the domain
and the range, and not the formula we use to describe them. Consider the following functions
from R to R:
f(x) = |x|,
g(x) = −x if x < 0, and g(x) = x if x ≥ 0,
h(x) = √(x²).
One can see that these expressions return the same value for every x ∈ R. So they describe the
same function.
It is not always obvious if two functions are the same. For two functions to be the same,
they need first of all to have the same domain and range. Say f : X → Y and g : X → Y are
two such functions. Then f and g are equal if f (x) = g(x) for all x ∈ X. For instance, consider
the following functions R → R:
f(x) = x + 1,
g(x) = (x² − 1)/(x − 1) if x ≠ 1, and g(x) = 2 if x = 1.
Then if x ≠ 1,
g(x) = (x² − 1)/(x − 1) = (x + 1)(x − 1)/(x − 1) = x + 1 = f(x).
We see that f (x) = g(x) for all x ∈ R, except possibly for x = 1. In this case,
f (1) = 1 + 1 = 2 = g(1)
Therefore, f (x) = g(x) for all x ∈ R. So, the two functions are equal, i.e. the two expressions
represent the same function.
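As a sanity check, the two formulas can be compared numerically in Python. This is a sketch under our own assumptions: the sample points are our choice, and a finite sample can support, but never replace, the algebraic argument above.

```python
def f(x):
    return x + 1

def g(x):
    # The piecewise formula from the text.
    if x != 1:
        return (x ** 2 - 1) / (x - 1)
    return 2

# A finite sample of the domain R (our choice): multiples of 1/2
# from -10 to 10, for which the floating-point arithmetic is exact.
sample = [k / 2 for k in range(-20, 21)]

print(all(f(x) == g(x) for x in sample))   # True
print(f(1) == g(1))                        # True
```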
§ 2.2 Image of a Function
If f : X → Y is a function, we call the collection of the elements of Y that are values of
elements of X the image of f , and we denote it by im f or image f . The image of a function
is always a subset of the range. For instance, if f : R → R is given by f(x) = x², the image
of f is the set of all positive numbers and zero. This is because the square of any number is a
positive number or zero, and conversely any non-negative number can be written as the square
of its square root. Do not confuse the image f (x) of the element x, which is a single element
in the range of the function, with the image of the function, which is a subset of the range.
Let X = {1, 2, 3} and Y = {2, 4, 6, 8, 10}. As in §2.1, we can define a function f : X → Y
by
f (1) = 4, f (2) = 10, f (3) = 4.
The image of f is {4, 10}, which is a subset of Y.
We can write the image of a function f : X → Y using the notation introduced in
§1.2 as
im f = {y ∈ Y : y = f(x) for some x ∈ X} ,
or, even better, as
im f = {f (x) : x ∈ X} .
So, saying that y ∈ Y is in the image of f means that we can find some x ∈ X such that
y = f (x).
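Both descriptions of the image can be computed directly for the small example above. The Python sketch below is only an illustration; storing f as a dictionary is our choice.

```python
X = {1, 2, 3}
Y = {2, 4, 6, 8, 10}
f = {1: 4, 2: 10, 3: 4}   # the function f : X → Y from the text

# im f = {f(x) : x ∈ X}
image = {f[x] for x in X}
print(image == {4, 10})     # True
print(image <= Y)           # True: the image is a subset of the range

# im f = {y ∈ Y : y = f(x) for some x ∈ X}
image_alt = {y for y in Y if any(f[x] == y for x in X)}
print(image == image_alt)   # True
```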
In class we will talk about the image of a linear transformation. It is the same as the image
of a function introduced here (a linear transformation is a certain kind of function). Using the
particular properties of linear transformations we will be able to give a nice description of the
image.
§ 2.3 Composition of Functions and the Identity Function
Let X, Y, Z be sets, and f : X → Y and g : Y → Z functions (the range of f is the domain of
g). Given an element x ∈ X, its image f (x) is an element of Y , so that we can take the image
g(f (x)) of it in Z. This rule associates to each element x ∈ X the element g(f (x)) of Z. In
other words, this is a function X → Z given by x ↦ g(f(x)). We denote it by g ◦ f and call it
the composition of g with f . It is the function
g ◦ f : X → Z,
(g ◦ f )(x) = g (f (x)) .
Note that in g ◦ f the function on the right is the one that “acts” first.
For an example, let f : R → R and g : R → R be the functions f(x) = x² and g(x) = x + 1.
Then f(−2) = 4 and g(4) = 5, so that (g ◦ f)(−2) = g(f(−2)) = 5. In general,
(g ◦ f)(x) = g(f(x)) = f(x) + 1 = x² + 1.
We see that the composition g ◦ f is the function R → R given by x ↦ x² + 1.
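Composition translates directly into code. In the Python sketch below the helper compose is our own name; it returns the function x ↦ g(f(x)), so that, as in the text, the function on the right acts first.

```python
def compose(g, f):
    """Return g ∘ f, the function x ↦ g(f(x)); f acts first."""
    return lambda x: g(f(x))

def f(x):
    return x ** 2

def g(x):
    return x + 1

g_after_f = compose(g, f)
print(g_after_f(-2))   # 5, since f(-2) = 4 and g(4) = 5

# The other order is in general a different function:
# (f ∘ g)(x) = (x + 1)^2.
f_after_g = compose(f, g)
print(f_after_g(-2))   # 1, since g(-2) = -1 and f(-1) = 1
```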
For any set X, we can construct the function I : X → X that sends an element to itself, i.e.
x ↦ x. This is called the identity function. For real numbers this is the polynomial I : R → R,
I(x) = x. If f : X → Y is a function, then the composition f ◦ I, which is a function X → Y,
is equal to f. Similarly, the composition I′ ◦ f, where I′ : Y → Y is the identity function on
Y, is equal to f. Indeed, for any x ∈ X
(f ◦ I)(x) = f(I(x)) = f(x) and (I′ ◦ f)(x) = I′(f(x)) = f(x).
§ 2.4 Inverse Functions
Let f : X → Y be a function. Any element of X is associated with precisely a single element
of Y , but an element of Y can be associated with any number of elements of X. So, if x is in
X, then f (x) is a unique element in Y , while if y ∈ Y , it does not mean that there is a unique
element x ∈ X such that f (x) = y. For example, consider the function f : R → R given by
f(x) = x². The square of a number is unique, e.g. (−5)² = 25 and 3² = 9, so that for any
x, there is a unique number f(x). On the contrary, given a positive number y, both numbers
x₁ = √y and x₂ = −√y are such that f(x₁) = y and f(x₂) = y, e.g. for the number 9, we have
9 = 3² = (−3)², so that f(−3) = 9 and f(3) = 9. If y is a negative number, then there is no
number whose square is y. Hence for −3, there is no number x such that f(x) = −3.
If f : X → Y is a function with the property that given any y ∈ Y , there is some x ∈ X
such that y = f (x), then we say that f is onto or surjective. This is equivalent to saying that
the image of f is all of Y. We saw that the function f : R → R given by f(x) = x² is not
surjective, as there is no number x ∈ R such that f(x) = −3. The function g : R → R given
by g(x) = x³ is surjective. Indeed, if y is a number, then x = ∛y is a number such that
g(x) = x³ = (∛y)³ = y (even if y is negative).
If given any two distinct x₁, x₂ ∈ X the values f(x₁) and f(x₂) are distinct, we say that f
is one-to-one or injective. The function f : R → R given by f(x) = x² is not injective. Indeed,
1 and −1 are distinct, but f(1) and f(−1) are equal, both have the value 1. On the contrary,
g : R → R given by g(x) = x³ is injective. For any two distinct values x₁, x₂, the cubes x₁³ and
x₂³ are distinct.
If a function is both injective and surjective we say that it is bijective. The function g given
above is bijective, while f is not.
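For functions with a finite domain these properties can be tested directly. The Python sketch below restricts x ↦ x² and x ↦ x³ to the integers −2, ..., 2; this restriction, the chosen codomain and the helper names are our assumptions, made only so that the check is finite.

```python
def is_injective(f):
    """f is a dict; injective means distinct inputs give distinct values."""
    return len(set(f.values())) == len(f)

def is_surjective(f, codomain):
    """Surjective means the image of f is all of the codomain."""
    return set(f.values()) == set(codomain)

# Finite stand-ins (our choice) for x ↦ x^2 and x ↦ x^3 on {-2, ..., 2}.
X = range(-2, 3)
square = {x: x ** 2 for x in X}
cube = {x: x ** 3 for x in X}
Y = {-8, -1, 0, 1, 8}

print(is_injective(square))        # False: (-1)^2 = 1^2
print(is_injective(cube))          # True
print(is_surjective(square, Y))    # False: e.g. -1 is not a square
print(is_surjective(cube, Y))      # True
```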
Let X, Y be sets, and let f : X → Y be a function. For each element x ∈ X, f (x) is an
element in Y . We want to construct a function g : Y → X such that f and g cancel each other:
for any x ∈ X, g takes the value f (x) back to x, and for any y ∈ Y , f takes the value g(y)
back to y:
[Diagram: f sends x ∈ X to f(x) ∈ Y, and g sends f(x) back to x ∈ X; g sends y ∈ Y to g(y) ∈ X, and f sends g(y) back to y ∈ Y.]
This is equivalent to saying that g ◦ f is the identity function on X, and f ◦ g is the identity
function on Y. Indeed, if g is such a function, then for any x ∈ X and any y ∈ Y,
(g ◦ f)(x) = g(f(x)) = x
and
(f ◦ g)(y) = f(g(y)) = y.
It is not always possible to find a function g with this property. We may encounter two
problems: (i) there may be values y ∈ Y for which there is no x ∈ X such that y = f(x), and
(ii) there may be elements y ∈ Y for which there is more than one element x ∈ X such that
y = f(x). In the case of (i), if y ∈ Y is such that there is no x ∈ X for which y = f(x), then f
cannot send g(y) back to y, since f(x) is never y, for any x ∈ X. In the case of (ii), if y ∈ Y
is such that there are distinct x₁, x₂ ∈ X for which f(x₁) = y and f(x₂) = y, then g cannot
send y = f(x₁) back to x₁ and y = f(x₂) back to x₂, since g is a function and the value g(y)
is unique. Both of these problems occur for the function f : R → R given by f(x) = x², so that
there is no g : R → R such that g(y) is the number x ∈ R for which f(x) = y. Indeed, whatever
value we give to g(−3), (f ◦ g)(−3) = (g(−3))² cannot be −3, and also (g ◦ f)(3) and (g ◦ f)(−3)
are both equal to g(9), but g(9) cannot be 3 and −3 at the same time.
The first obstacle does not occur if f is surjective, and the second if f is injective. So, if f
has both of these properties, i.e. if it is bijective, then we can construct a function g : Y → X
that “cancels” f . We define g by sending y ∈ Y to the unique x ∈ X such that y = f (x).
Such an x exists because f is surjective, and it is unique because f is injective. We call this
function the inverse of f and denote it by f⁻¹. If the inverse exists, then we say that f is
invertible. The function g : R → R given by g(x) = x³ is bijective, as noted above, and so it
has an inverse. Its inverse is the cube root function, namely g⁻¹ : R → R, x ↦ ∛x.
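For a bijection between finite sets, the construction of the inverse amounts to swapping each x with its value. The Python sketch below illustrates this under our own assumptions: the function is stored as a dictionary, the cube function is restricted to the integers −2, ..., 2 with its image as codomain, and the helper name inverse is ours.

```python
def inverse(f):
    """Invert a bijection stored as a dict by sending each value y
    back to the unique x with f(x) = y.
    Raises ValueError if two inputs share a value (f is not injective)."""
    g = {}
    for x, y in f.items():
        if y in g:
            raise ValueError("not injective: no inverse exists")
        g[y] = x
    return g

# x ↦ x^3 from {-2, ..., 2} onto {-8, -1, 0, 1, 8} (a finite stand-in).
cube = {x: x ** 3 for x in range(-2, 3)}
cube_root = inverse(cube)

# g ∘ f is the identity on X, and f ∘ g is the identity on Y.
print(all(cube_root[cube[x]] == x for x in cube))        # True
print(all(cube[cube_root[y]] == y for y in cube_root))   # True
```

Applying inverse to the squaring function on the same domain raises the error, which mirrors obstacle (ii) above.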
§ 2.5 Classes of Functions
Even though a function can be arbitrary without following some formula or rule, usually we
are interested in functions of a certain kind. For instance, when dealing with functions of
real numbers, i.e. functions R → R, we talk about polynomial functions, rational functions,
trigonometric functions etc.
A function f : R → R is a polynomial function if it can be expressed by a polynomial p(x).
The functions
f(x) = x³ − 1, g(x) = x⁴ − x³ + x² − x + 1, h(x) = 4
are all polynomial functions. If a function is given by a formula which is a polynomial, then
it is certainly a polynomial function. But if the function is not given by a formula which is a
polynomial, it does not mean that the function is not a polynomial function. Recall that the
function is the underlying association between elements of the domain and the range (here both
the domain and range are R), not the representation (the formula) we choose for the function.
So, saying that f (x) is a polynomial function means that we can express f (x) as a polynomial,
even if it is not originally expressed as such.
We give two examples. In §2.1 we introduced the functions R → R:
f(x) = x + 1,
g(x) = (x² − 1)/(x − 1) if x ≠ 1, and g(x) = 2 if x = 1,
and we saw that they are equal. So, even though g is not given by a formula which is a
polynomial, we see that it is equal to f , which is a polynomial. This means that we can write
g as g(x) = f (x) = x + 1. We see that g is indeed a polynomial function.
Now we give an example of a polynomial function which is described geometrically. Let
f : R → R be the function that assigns to each number a ∈ R the area of the triangle on
the cartesian plane formed by the line y = x, the x-axis, and the vertical line x = a (the
area is taken to be non-negative, also when a is negative). This is a function, and at first sight
there is no reason to think that it is a polynomial. Either by integrating or using elementary
geometry, you can see that the area of the triangle between the line y = x, the x-axis, and the
vertical line x = a is ½a², so that f(a) = ½a². We see that f is a polynomial.
Given a function f : R → R, showing that it is not a polynomial is a less trivial task,
because we have to show that it is not possible to find a polynomial function that is equal to
f . There are ways to do that, depending on the particular function in hand, but they are out
of the scope of these notes.
In class we will introduce linear functions (or linear transformations). These will be functions
Rⁿ → Rᵐ of a certain kind (don’t worry about what the sets Rⁿ and Rᵐ are right now).
We will give a definition, and we will also give some classes of functions Rⁿ → Rᵐ described
geometrically. It will not be immediate that these functions defined geometrically are linear
functions, but we will show that they are equal to certain functions which are linear.
Given a class of functions from a set to itself, one can ask whether compositions of two
such functions, the identity function, or inverses of such functions, if they exist, are again in
the same class. To continue the example with polynomials, one can ask whether given two
polynomial functions f, g from R to R, the composition g ◦ f is again a polynomial. The answer
is yes, but we will not prove it here. The identity function is given by the formula I(x) = x,
so it is certainly a polynomial. On the other hand, if f : R → R is an invertible polynomial
function, then its inverse does not have to be a polynomial function. For instance, we saw
that the function g : R → R given by g(x) = x³ is invertible, and its inverse is the function
g⁻¹ : R → R given by x ↦ ∛x, which is not a polynomial function.
We will see that the composition of linear functions is again a linear function, the identity
function is a linear function, and given an invertible linear function, its inverse will be linear.
3 Elementary Logic
§ 3.1 “If P, then Q” Statements
In this class we will use statements of the form “if P , then Q”, where P and Q are two
statements that can be true or false. The statement “if P , then Q” is itself a statement that
can be true or false. It is a true statement if Q is true whenever P is true, and it is false if Q
can be false while P is true. For instance, let x be a real number. The statement:
if 0 < x < 1, then x² < 9
is of this form and is true: if x satisfies the first condition, i.e. it is a number between 0 and
1, then its square is going to be certainly less than 9, hence it satisfies the condition x² < 9.
The condition Q by itself, namely x² < 9, is not always true: if you pick a number it does not
mean that its square is less than 9. But if in addition you assume that x satisfies condition P ,
that is 0 < x < 1, then Q is true. On the other hand, the statement:
if x is an integer, then x > 0
is false; the number −2 is an integer but is not more than 0. Here the statement Q by itself,
that is x > 0, can be true or false, as some numbers are above 0 and some not. Assuming
that x is an integer, the statement Q can still be false. So Q is not true whenever P is true;
the statement P can be true while Q is false (e.g. for x = −2). This means that the “big”
statement “if x is an integer, then x > 0” is false.
Do not confuse the statement “if P , then Q” with “if Q, then P ”. These are distinct
statements that can be either both true, both false, or one of them true and the other false.
For instance, if 0 < x < 1, then x² < 9 is a true statement as we saw above, but the statement:
if x² < 9, then 0 < x < 1
is false: if x is 2, then x² < 9 is true, but x is not between 0 and 1, so that x² < 9 does not
imply 0 < x < 1.
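A statement “if P, then Q” is false as soon as there is one value for which P is true and Q is false, so searching for such a counterexample is a natural experiment. The Python sketch below is only an illustration: the helper find_counterexample and the finite sample of numbers are our own, and finding no counterexample in a sample does not prove the statement.

```python
def find_counterexample(sample, P, Q):
    """Return the first x in the sample with P(x) true and Q(x) false,
    or None if the sample contains no such x."""
    for x in sample:
        if P(x) and not Q(x):
            return x
    return None

# A finite sample of real numbers (our choice): multiples of 1/4 in [-5, 5].
sample = [k / 4 for k in range(-20, 21)]

# "if 0 < x < 1, then x^2 < 9": no counterexample in the sample.
print(find_counterexample(sample, lambda x: 0 < x < 1, lambda x: x ** 2 < 9))   # None

# "if x^2 < 9, then 0 < x < 1": the first counterexample found is -2.75.
print(find_counterexample(sample, lambda x: x ** 2 < 9, lambda x: 0 < x < 1))   # -2.75
```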
A shorter notation for this form of statements is P ⇒ Q and we will use it frequently. The
previous statement can be written as
0 < x < 1 ⇒ x² < 9.
When we say that we will prove a statement we mean that we will show that it is a
true statement, and not a false one. Many times we will prove a statement of the form “if P ,
then Q” using successive smaller true such statements:
P = P1 ⇒ P2 ⇒ P3 ⇒ · · · ⇒ Pn = Q.
This means that whenever P = P1 is true, then P2 must be true, and so P3 must be true, and
so on; in summary, whenever P is true, Q must be true. As an example, we can show that if
(x² − 2x + 7)/(x² + 1) > 1, then x must be less than 3:
(x² − 2x + 7)/(x² + 1) > 1 ⇒ x² − 2x + 7 > x² + 1 ⇒ −2x + 6 > 0 ⇒ x < 3.
One can check that each step is true, so that the whole argument is true, and we write:
(x² − 2x + 7)/(x² + 1) > 1 ⇒ x < 3.
§ 3.2 “P if and only if Q” Statements
Another type of statement, which is a bit less self-explanatory, is a statement of the form
“P if and only if Q”, where P, Q are again statements that can be true or false. This statement
is true if P and Q are always either both true or both false. If one of them is true and the
other false, then the statement is false. For instance,
0 < x < 1 if and only if x² < 9
is false, as the second statement can be true with the first being false, e.g. for x = 2 (see §3.1).
On the other hand, the statement:
−3 < x < 3 if and only if x² < 9
is true; a number is between −3 and 3 precisely when its square is less than 9. So either the
number is between −3 and 3 and its square is less than 9, or the number is not between −3
and 3 and its square is not less than 9.
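The two statements above can be compared on a finite sample of numbers: “P if and only if Q” requires P and Q to have the same truth value at every point. The Python sketch below is only an illustration; the helper equivalent_on and the sample are our own choices, and agreement on a sample is evidence, not a proof.

```python
def equivalent_on(sample, P, Q):
    """True if P(x) and Q(x) have the same truth value for every x in the sample."""
    return all(P(x) == Q(x) for x in sample)

# A finite sample of real numbers (our choice): multiples of 1/4 in [-5, 5].
sample = [k / 4 for k in range(-20, 21)]

# "0 < x < 1 if and only if x^2 < 9" fails, e.g. at x = 2.
print(equivalent_on(sample, lambda x: 0 < x < 1, lambda x: x ** 2 < 9))    # False

# "-3 < x < 3 if and only if x^2 < 9" holds at every sample point.
print(equivalent_on(sample, lambda x: -3 < x < 3, lambda x: x ** 2 < 9))   # True
```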
We abbreviate “P if and only if Q” by P ⇔ Q. There is a reason for this notation: P ⇔ Q
is equivalent to the condition “P ⇒ Q and Q ⇒ P ”. Given a statement of the form P ⇔ Q
we can prove it (show that it is true) by checking that both P ⇒ Q and Q ⇒ P hold. In §3.1
we saw that (x² − 2x + 7)/(x² + 1) > 1 ⇒ x < 3. Conversely, we can show that
x < 3 ⇒ −x + 3 > 0 ⇒ −2x + 6 > 0 ⇒ x² − 2x + 7 > x² + 1 ⇒ (x² − 2x + 7)/(x² + 1) > 1.
Since both
(x² − 2x + 7)/(x² + 1) > 1 ⇒ x < 3 and x < 3 ⇒ (x² − 2x + 7)/(x² + 1) > 1
hold, we can say that (x² − 2x + 7)/(x² + 1) > 1 ⇔ x < 3.