MTH 6109: Combinatorics
Dr David Ellis
Autumn Semester 2012
Contents

1 Counting                                                               5
  1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
  1.2 Counting sequences . . . . . . . . . . . . . . . . . . . . . . . . 6
  1.3 Counting subsets . . . . . . . . . . . . . . . . . . . . . . . . . 9
  1.4 The inclusion-exclusion principle . . . . . . . . . . . . . . . . 16
  1.5 Counting surjections . . . . . . . . . . . . . . . . . . . . . . 20
  1.6 Permutations and derangements . . . . . . . . . . . . . . . . . . 22

2 Recurrence relations & generating series                              35
  2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 35
  2.2 Solving recurrence relations . . . . . . . . . . . . . . . . . . 36
  2.3 Generating series . . . . . . . . . . . . . . . . . . . . . . . . 51

3 Graphs                                                                75
  3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 75
  3.2 Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
  3.3 Bipartite graphs and matchings . . . . . . . . . . . . . . . . . 85

4 Latin squares                                                         95
  4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 95
  4.2 Orthogonal latin squares . . . . . . . . . . . . . . . . . . . . 98
  4.3 Upper bounds on the number of latin squares . . . . . . . . . .  105
  4.4 Transversals in Latin Squares . . . . . . . . . . . . . . . . .  107
Chapter 1
Counting sequences, subsets,
integer partitions, and
permutations
1.1 Introduction
Combinatorics is a very broad, rich part of mathematics. It is mostly about
the size and properties of finite structures. Often in combinatorics, we want to
know whether it is possible to arrange a set of objects into a pattern satisfying
certain rules. If it is possible, we want to know how many such patterns there
are. And can we come up with an explicit recipe, or algorithm, for producing
such a pattern?
Here is an example. The great mathematician Leonhard Euler asked the
following question in 1782.
‘There are 6 different regiments. Each regiment has 6 soldiers, one of each
of 6 different ranks. Can these 36 soldiers be arranged in a square formation so
that each row and each column contains one soldier of each rank and one from
each regiment?’
Euler conjectured that the answer is ‘no’, but it was not until 1900 that this
was proved correct. He also conjectured that the answer is ‘no’ if six is replaced
by 10, 14, or any number congruent to 2 mod 4. He was completely wrong about
this, but this was not discovered until the 1960s.
Euler’s formations are known as mutually orthogonal latin squares; we will
study them later in the course.
Note that if we replace ‘6’ with ‘3’, then such an arrangement is possible: if
the regiments are labelled 1, 2 and 3, and the ranks are labelled a, b and c, then
the following works:
a1 b2 c3
b3 c1 a2
c2 a3 b1
Challenge: work out how many such arrangements there are! (You may want
to wait until after the first 6 lectures, before tackling this.)
While it includes many interesting and entertaining puzzles, combinatorics is
also of great importance in the modern digital world. Much of computer science
can be seen as combinatorics, and indeed, computer scientists and combinatorialists are interested in many of the same problems. Even Euler’s ‘puzzle’ turns
out to be relevant to the construction of error-correcting codes.
1.2 Counting sequences
In this chapter, we’ll be concerned with working out how many there are of some
very common mathematical patterns, or structures.
Let’s start with some simple, but important, examples.
Example 1. How many sequences of length 3 can we make using the letters
a,b,c,d,e? (Order is important, repetition is allowed.)
Answer: There are 5 choices for the first letter. For each choice of the first
letter, there are 5 choices for the second letter. And for each choice of the first two
letters, there are 5 choices for the third letter. So the answer is 5 × 5 × 5 = 125.
Example 2. How many sequences in Example 1 have no repetitions?
Answer: There are 5 choices for the first letter. For each choice of the first
letter, there are 4 choices for the second letter. For each choice of the first two
letters, there are 3 choices for the third letter. So the answer is 5 × 4 × 3 = 60.
Example 3. How many sequences in Example 2 contain the letter a?
Answer: there are 3 choices of where to put the letter a. There are then 4
choices of which letter to put in the first remaining space, and then 3 choices of
which letter to put in the second remaining space. So the answer is 3×4×3 = 36.
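If you know a little programming, you can check Examples 1–3 by brute force. Here is a quick sketch in Python (the variable names are our own, not part of the notes), which enumerates the sequences with itertools:

```python
# Brute-force check of Examples 1-3 over the letters a, b, c, d, e.
from itertools import product, permutations

letters = "abcde"

# Example 1: sequences of length 3, repetition allowed.
all_seqs = list(product(letters, repeat=3))
assert len(all_seqs) == 5 ** 3 == 125

# Example 2: sequences of length 3 with no repeated letter.
distinct_seqs = list(permutations(letters, 3))
assert len(distinct_seqs) == 5 * 4 * 3 == 60

# Example 3: repetition-free sequences that contain the letter 'a'.
with_a = [s for s in distinct_seqs if "a" in s]
assert len(with_a) == 3 * 4 * 3 == 36
```

Enumeration like this is only feasible for small cases, which is exactly why we want the counting formulas.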
We can generalize Example 1 as follows.
Example 4. Suppose X is a set of n elements. How many sequences of length k
can we make, using elements of X?
Answer: There are n choices for the first letter in the sequence. For each
choice of the first letter, there are n choices for the second. And so on, until, for
each choice of the first k − 1 letters, there are n choices for the kth letter. So the
answer is
n × n × · · · × n (k times) = n^k.
Aside: making proofs formal
It is intuitively obvious that this is the right answer, but the argument above is
not a totally formal proof, because of the ‘and so on’ in the middle. To make it
formal, we need to use two principles: the principle of induction, and the bijection
principle.
You are all familiar with the principle of mathematical induction.
Principle 1 (The principle of mathematical induction). For each natural number
n, let P(n) be a statement, which can either be true or false. For example, P(n)
might be ‘1 + 2 + · · · + n = n(n + 1)/2.’ Suppose that:
• P (1) is true;
• For each n, P (n) implies P (n + 1).
Then P (n) is true for all natural numbers n.
To state the bijection principle, we need some definitions. Let X and Y be
sets, and let f : X → Y be a function.
Definition. We say that f is an injection if f(x) = f(x′) implies that x = x′.
In other words, any element of Y has at most one pre-image.
Definition. We say that f is a surjection if for every y ∈ Y , there exists an
x ∈ X such that f (x) = y. In other words, every element of Y has at least one
pre-image.
Definition. We say that f is a bijection if it is both an injection and a surjection.
In other words, every element of Y has exactly one pre-image.
We can now state the bijection principle.
Principle 2 (The bijection principle). If X and Y are finite sets, and there exists
a bijection from X to Y , then |X| = |Y |.
A bijection from X to Y is simply a way of pairing up the elements of X with
the elements of Y , so that each element of Y is paired with exactly one element of
X. It is also known as a ‘one-to-one correspondence’ between X and Y . If there
is a one-to-one correspondence between X and Y , we sometimes denote this fact
using a double arrow, X ↔ Y .
Remark 1. Recall that f : X → Y is a bijection if and only if there exists a
function g : Y → X such that
• g ◦ f = IdX , the identity function on X, and
• f ◦ g = IdY , the identity function on Y .
The function g is called the inverse of f .
Armed with these two principles, we can now give a formal proof of the answer
to Example 4.
Theorem 1. Let n and k be positive integers. Let X be a set of size n. Then the
number of sequences of length k which can be made using elements of X is n^k.
Proof. Let us write X^k for the set of sequences of length k which can be made
using elements of X. Our aim is to show that

|X^k| = n^k    (1.1)

for all k ∈ N.
First, observe that |X^k| = n|X^{k−1}|. This is because, for every sequence of
length k − 1, we can construct n sequences of length k by choosing any element
of X and joining it to the end of the sequence. Every sequence of length k is
obtained exactly once in this way. (Formally, the function

f : X^{k−1} × X → X^k;  (S, x) ↦ (S followed by x)

is a bijection.) This shows that |X^k| = n|X^{k−1}|.
We can now prove (1.1) using induction on k. There are n sequences of length
1, so |X^1| = n, so (1.1) holds for k = 1. Now suppose (1.1) holds for k − 1, i.e.
|X^{k−1}| = n^{k−1}. Then by the above fact, we have |X^k| = n|X^{k−1}| = n × n^{k−1} = n^k,
so (1.1) holds for k also. Therefore, by induction, (1.1) holds for all k ∈ N.
In general, the words ‘and so on’ (or . . .), in a proof, indicate an induction
argument, which is sufficiently obvious that it need not be spelt out. When
answering coursework or exam questions, you do not need to write down the
formal argument in the ‘aside’, but it is good to know what lies underneath ‘. . .’
in a proof!
Remark 2. Observe that the number of sequences of length k, using elements
of X, is just the same as the number of functions from {1, 2, . . . , k} to X. Indeed, there is a one-to-one correspondence, or bijection, between X^k and the set
of functions from {1, 2, . . . , k} to X: just pair up the sequence (x_1, . . . , x_k) with
the function

i ↦ x_i  (i ∈ [k]).
Sequences without repetition
Example 5. Suppose X is a set of n elements. How many sequences of length k
can we make, using elements of X, without repetition?
Answer: there are n choices for the first letter in the sequence. For each choice
of the first letter, there are n − 1 choices for the second. For each choice of the
first two letters, there are n − 2 choices for the third. And so on, until for each
choice of the first k − 1 letters, there are n − (k − 1) = n − k + 1 choices for the
kth letter. So the answer is
n(n − 1) . . . (n − k + 1).
Remark 3. This can be made into a formal proof, just like Theorem 1, using
induction. Exercise: write down this proof!
Remark 4. Note that the number of sequences of length k which we can make using elements of X without repetition is just the same as the number of injections
from {1, 2, . . . , k} to X. This should be ‘obvious’ by now. But it is worth bearing in mind that ‘a mathematical statement is obvious if the proof writes itself.’
Exercise: write down this proof! (One sentence.)
Example 6. How many sequences of length n can we make using the elements
of {1, 2, . . . , n} without repetition?
This is just a special case of Example 5, where n = k, so the answer is
n(n − 1) . . . (2)(1).
Of course, a sequence of length n made out of the numbers in {1, 2, . . . , n}
without repetition, must contain each number exactly once. So it is just a reordering of the numbers 1, 2, . . . , n. There is an obvious one-to-one correspondence between reorderings of 1, 2, . . . , n, and bijections from {1, 2, . . . , n} to itself.
As you know, a bijection from a set X to itself is known as a permutation of X.
So we see that the number of permutations of {1, 2, . . . , n} is n(n − 1) . . . (2)(1).
This number is so important that we give it a name; it is written as n!,
pronounced ‘n factorial’:
n! := n(n − 1) . . . (2)(1).
In terms of factorials, we can rewrite the answer to Example 5 as

n(n − 1) . . . (n − k + 1) = n!/(n − k)!.

This number is known as the kth falling factorial moment of n. It is sometimes
written as ^nP_k.
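As a quick check, the product n(n − 1) · · · (n − k + 1) and the quotient n!/(n − k)! really do agree. A sketch in Python (the function name falling_factorial is ours):

```python
# The falling factorial n(n-1)...(n-k+1), compared with n!/(n-k)!.
from math import factorial

def falling_factorial(n: int, k: int) -> int:
    """Number of length-k sequences without repetition from an n-element set."""
    result = 1
    for i in range(k):
        result *= n - i
    return result

# Agreement with the n!/(n-k)! formula for a range of values.
for n in range(1, 8):
    for k in range(0, n + 1):
        assert falling_factorial(n, k) == factorial(n) // factorial(n - k)

assert falling_factorial(5, 3) == 60                    # Example 2: 5 * 4 * 3
assert falling_factorial(5, 5) == factorial(5) == 120   # Example 6: n! permutations
```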
1.3 Counting subsets
Example 7. If X is an n-element set, how many subsets of size k does it have?
A k-element subset can be viewed as an unordered sequence of k distinct elements of X. We know already that the number of (ordered) k-element sequences
of distinct elements of X is n(n − 1) . . . (n − k + 1). Let’s try to relate this to the
number of k-element subsets of X.
We can generate (ordered) k-element sequences of distinct elements of X as
follows. First choose a k-element subset of X, and then choose a way of ordering
its elements to produce a length-k sequence of distinct elements of X. Each
length-k sequence of distinct elements of X is produced exactly once by this
process. There are k! orderings of each k-element set, so we obtain:
n(n − 1) . . . (n − k + 1) = (number of k-element subsets of X) × k!.
Therefore, the number of k-element subsets of X is

n(n − 1) . . . (n − k + 1)/k! = n!/(k!(n − k)!).
This number is written \binom{n}{k}, pronounced ‘n choose k’:

\binom{n}{k} := n!/(k!(n − k)!)    (1.2)

(It is also sometimes written as ^nC_k, but we will not use this notation.)
This argument is an example of the proof-technique of ‘double-counting’,
which occurs very often in combinatorics. We want to count a certain quantity, so we do it by counting another related quantity in two different ways, and
then rearranging to get an expression for the first quantity.
Exercise 1. Show that \binom{n}{k} = \binom{n}{n−k}, where n and k are non-negative integers
with 0 ≤ k ≤ n:
(i) Using the formula (1.2);
(ii) By means of a suitable bijection between k-element subsets of {1, 2, . . . , n}
and (n − k)-element subsets of {1, 2, . . . , n}.
Example 8. If X is an n-element set, how many subsets of X are there (no
restriction on size)?
In fact, this turns out to be easier than counting k-element subsets.
Theorem 2. If X is an n-element set, then the number of subsets of X is 2n .
Proof 1. Let X = {x_1, . . . , x_n}. Observe that we can choose a subset S of X using
the following n-stage process.
Stage 1: either x_1 ∈ S or x_1 ∉ S (2 choices);
Stage 2: either x_2 ∈ S or x_2 ∉ S (2 choices);
...
Stage n: either x_n ∈ S or x_n ∉ S (2 choices).
Hence there are 2^n subsets altogether.
Proof 2. We observe that there is a one-to-one correspondence (a bijection) between subsets of X and length-n sequences of 0’s and 1’s. As above, let X =
{x_1, . . . , x_n}. For each S ⊂ X, pair up S with the sequence which has a 1
in the ith position if x_i ∈ S, and a 0 in the ith position if x_i ∉ S, for each
i ∈ {1, 2, . . . , n}.
For example, when n = 5, the set {x_1, x_3, x_4} is paired up with the sequence
(1, 0, 1, 1, 0).
It is easy to see that this is a one-to-one correspondence. We already know
that the number of length-n sequences we can make using elements of {0, 1} is
2^n. So the number of subsets of X is also 2^n.
As in section 1.2, there is a one-to-one correspondence between length-n sequences of 0’s and 1’s, and functions from X to {0, 1}: simply pair up the sequence
(ε_1, . . . , ε_n) ∈ {0, 1}^n with the function

x_i ↦ ε_i.

This suggests a way of rewriting proof 2, using an explicit one-to-one correspondence between subsets of X and functions from X to {0, 1}.
Proof 3. We observe that there is a one-to-one correspondence between subsets of
X and functions from X to the set {0, 1}. Indeed, if A ⊂ X, let χ_A : X → {0, 1}
denote the function with χ_A(x) = 1 if x ∈ A, and χ_A(x) = 0 if x ∉ A. It is easy
to see that A ↔ χ_A is a one-to-one correspondence. We already know that the
number of functions from X to {0, 1} is 2^n, so the number of subsets of X is also
2^n.
Remark 5. The function χA defined in proof 3 is called the characteristic function of the subset A. The characteristic function is a very useful object indeed,
as we will see later.
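The bijection of Proof 2 is easy to make completely concrete in code. Here is a sketch in Python (the helper names to_sequence and to_subset are ours) which builds both directions of the correspondence and checks that they are mutually inverse:

```python
# Proof 2 made concrete: subsets of X correspond to 0/1-sequences of length n.
from itertools import product

X = ["x1", "x2", "x3", "x4", "x5"]
n = len(X)

def to_sequence(S):
    """Characteristic sequence of the subset S: a 1 in position i iff x_i is in S."""
    return tuple(1 if x in S else 0 for x in X)

def to_subset(seq):
    """Inverse map: recover the subset from its 0/1-sequence."""
    return {x for x, bit in zip(X, seq) if bit == 1}

assert to_sequence({"x1", "x3", "x4"}) == (1, 0, 1, 1, 0)  # the n = 5 example

# The two maps are mutually inverse, so we really have a bijection,
# and there are 2^n subsets, one per 0/1-sequence.
all_seqs = list(product([0, 1], repeat=n))
assert all(to_sequence(to_subset(s)) == s for s in all_seqs)
assert len(all_seqs) == 2 ** n == 32
```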
We now come to a very useful tool: the binomial theorem.
Theorem 3 (The Binomial Theorem). If n is any positive integer, then

(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n−k}.
Proof. Consider (x + y)^n as a product of n factors B_1 B_2 · · · B_n, where

B_1 = B_2 = · · · = B_n = (x + y).

To get a term x^k y^{n−k} in this product, we need to choose an x from exactly k of the
factors B_1, B_2, . . . , B_n, and a y from the remaining n − k factors. The number of
ways of doing this is just the number of k-element subsets of {1, 2, . . . , n}, which
is \binom{n}{k}. Hence in the expansion of the product there are exactly \binom{n}{k}
terms x^k y^{n−k}. In other words, the coefficient of x^k y^{n−k} is \binom{n}{k}.
Corollary 4. If n is any positive integer, then

\sum_{k=0}^{n} \binom{n}{k} = 2^n.
Proof 1. Put x = y = 1 in the Binomial Theorem.
Proof 2. Let X be an n-element set. The number of subsets of X is 2^n, and the
number of k-element subsets of X is \binom{n}{k}, for each k ∈ {0, 1, 2, . . . , n}. So

\sum_{k=0}^{n} \binom{n}{k} = 2^n,

as required.
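Both the proof of Theorem 3 and Corollary 4 can be checked mechanically for a small n. A sketch in Python, using math.comb for \binom{n}{k} (the enumeration mirrors the choose-x-or-y-from-each-factor argument in the proof):

```python
# Check the binomial coefficients and Corollary 4 for n = 6.
from math import comb
from itertools import product

n = 6

# Corollary 4: the binomial coefficients in row n sum to 2^n.
assert sum(comb(n, k) for k in range(n + 1)) == 2 ** n

# The proof of Theorem 3, literally: expand (x+y)^n by choosing an x or a y
# from each of the n factors, and count the choices giving the term x^k y^(n-k).
counts = [0] * (n + 1)
for choice in product("xy", repeat=n):
    counts[choice.count("x")] += 1
assert counts == [comb(n, k) for k in range(n + 1)]
```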
Exercise 2. (i) Use the binomial theorem to show that if X is an n-element
set, then the number of even-sized subsets of X is equal to the number of
odd-sized subsets of X.
(ii) When n is odd, this can also be proved using a bijection: pair up the subset
A with its complement A^c — check that this works.
(iii) When n is even, the bijection above does not work. Find another bijection
which works for both n even and n odd. (This is Exercise 6 in Assignment
1.)
Exercise 3. Write down the first 7 rows of Pascal’s triangle. (Recall that we
construct Pascal’s triangle by starting with

        1
       1 1
      1 2 1
       ...
and then writing in each space the sum of the two numbers above that space,
working down the triangle row by row.) What is the number in row n and space
k (from the left)? Write down the identity (in terms of binomial coefficients) you
used to construct Pascal’s triangle. Now prove it:
(i) Directly, from the formula (1.2);
(ii) By counting subsets in two different ways.
Example 9. Let n and k be positive integers with 1 ≤ k ≤ n. How many length-k
sequences of non-negative integers are there which sum to n? In other words,
how many sequences (x_1, x_2, . . . , x_k) of non-negative integers have \sum_{i=1}^{k} x_i = n?
There will be small prizes for correct answers (with proofs) of this at the
beginning of Lecture 3, provided I am reasonably convinced that it is your own
work!
Answer: there is a one-to-one correspondence (a bijection) between solutions
to this equation, and diagrams containing n stars and k − 1 bars in a row. Given
a sequence (x_1, . . . , x_k) of non-negative integers with x_1 + . . . + x_k = n, we pair
it up with a diagram as follows. We first place k − 1 vertical bars in a row, and
then we place x_i stars between the (i − 1)th bar and the ith bar, for each i with
1 < i < k. (So we place x_1 stars before the first bar, and x_k stars after the
(k − 1)th bar.)
For example, when n = 5 and k = 4, the solution (1, 2, 2, 0) corresponds to the
diagram

∗ | ∗ ∗ | ∗ ∗ |

(Check that this defines a bijection.) How many such diagrams are there? The
answer is \binom{n+k−1}{k−1}. Why? The diagram is a row of n + k − 1 symbols, and we must
choose k − 1 of the symbols to be bars. (The rest are stars.) The number of ways
of choosing which symbols are to be bars is equal to the number of (k − 1)-element
subsets of an (n + k − 1)-element set, which is \binom{n+k−1}{k−1}. Hence

number of sequences = number of diagrams = \binom{n+k−1}{k−1}.
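If you want to convince yourself before Lecture 3, the stars-and-bars answer can be checked by brute force for small n and k. A sketch in Python (the helper name count_solutions is ours):

```python
# Stars and bars checked by enumeration: count length-k sequences of
# non-negative integers summing to n, and compare with C(n+k-1, k-1).
from math import comb
from itertools import product

def count_solutions(n: int, k: int) -> int:
    """Enumerate all (x1, ..., xk) with 0 <= xi <= n and x1 + ... + xk = n."""
    return sum(1 for xs in product(range(n + 1), repeat=k) if sum(xs) == n)

for n in range(1, 7):
    for k in range(1, 5):
        assert count_solutions(n, k) == comb(n + k - 1, k - 1)

assert count_solutions(5, 4) == comb(8, 3) == 56   # the n = 5, k = 4 case
```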
More complicated counting
Examples 1, 5, 6 and 8 can be seen as applications of the following simple principle.
Principle 3 (The Multiplication Principle). Suppose that a finite set X is described to us, and we want to find |X|, the number of elements in X. Suppose we
can generate the elements of X using a process consisting of k steps such that:
(i) The number of possible choices at the ith step is t_i, and this number is
independent of which choices we made at the previous steps;
(ii) Each element of X is produced by exactly one sequence of choices. (So if
we change the choice at any stage of the process, we get a different element
of X.)
Then |X| = t_1 t_2 · · · t_k.
In Example 1, we have t_i = n for all i with 1 ≤ i ≤ k, and in Example 5, we
have t_i = n − i + 1 for all i with 1 ≤ i ≤ k.
Here is an example where we cannot use the multiplication principle straight
away.
Example 10. If n ≥ 5, how many k-element subsets of {1, 2, . . . , n} contain at
least 3 elements of the set {1, 2, 3, 4, 5}?
Answer: Let F be the family of k-element subsets of {1, 2, . . . , n} containing
at least 3 elements of the set {1, 2, 3, 4, 5}. We want to find |F| by generating the
sets S ∈ F using a sequence of choices, generating each set in F exactly once.
The natural thing to do is to generate S in the following sequence of steps:
Step 1: Choose the number of elements of {1, 2, 3, 4, 5} that S contains.
Step 2: Choose exactly which elements of {1, 2, 3, 4, 5} S contains.
Step 3: Choose exactly which elements of {6, 7, . . . , n} S contains.
Obviously, in Step 1, there are 3 choices for the number of elements of
{1, 2, 3, 4, 5} which S contains: 3,4 or 5. However:
• If we choose ‘3’ in Step 1, then at Step 2, there are \binom{5}{3} = 10 possible choices
for which 3 elements of {1, 2, 3, 4, 5} S contains (we just have to choose a
3-element subset of {1, 2, 3, 4, 5}). For each such choice, there are \binom{n−5}{k−3}
possible choices at Step 3 for which elements of {6, 7, . . . , n} S contains.
(We just have to choose a (k − 3)-element subset of {6, 7, . . . , n}.) So the
total number of possibilities in this case is

\binom{5}{3}\binom{n−5}{k−3} = 10\binom{n−5}{k−3}.
• If we choose ‘4’ in Step 1, then at Step 2, there are \binom{5}{4} = 5 possible choices
for which 4 elements of {1, 2, 3, 4, 5} S contains (we just have to choose a
4-element subset of {1, 2, 3, 4, 5}). For each such choice, there are \binom{n−5}{k−4}
possible choices at Step 3 for which elements of {6, 7, . . . , n} S contains.
(We just have to choose a (k − 4)-element subset of {6, 7, . . . , n}.) So the
total number of possibilities in this case is

\binom{5}{4}\binom{n−5}{k−4} = 5\binom{n−5}{k−4}.
• If we choose ‘5’ in Step 1, then at Step 2, there is just \binom{5}{5} = 1 choice
for which elements of {1, 2, 3, 4, 5} S contains (S must contain all 5 of
them). There are then \binom{n−5}{k−5} possible choices at Step 3 for which elements
of {6, 7, . . . , n} S contains. (We just have to choose a (k − 5)-element subset
of {6, 7, . . . , n}.) So the total number of possibilities in this case is

\binom{5}{5}\binom{n−5}{k−5} = \binom{n−5}{k−5}.
Here, the numbers of possible choices at Steps 2 and 3 depend upon the choice
at Step 1, so we cannot use the multiplication principle straight away.
However, observe that once we have made the choice at Step 1, the number
of possible choices at Step 3 does not depend upon the choice at Step 2 — only
upon the choice at Step 1. So, to calculate the total number of possible choices
after Step 1, we can use the multiplication principle. So the right thing to do is
to sum over all the choices in Step 1: the total number of possible choices is

\binom{5}{3}\binom{n−5}{k−3} + \binom{5}{4}\binom{n−5}{k−4} + \binom{5}{5}\binom{n−5}{k−5}
= 10\binom{n−5}{k−3} + 5\binom{n−5}{k−4} + \binom{n−5}{k−5}.
We have generated each set in F exactly once, so

|F| = 10\binom{n−5}{k−3} + 5\binom{n−5}{k−4} + \binom{n−5}{k−5}.
Another way of explaining what we are doing is that we are partitioning F
according to the number of elements of {1, 2, 3, 4, 5} a set contains: if we let F_i
be the family of k-element subsets of {1, 2, . . . , n} containing exactly i elements
of {1, 2, 3, 4, 5}, then we are saying that

|F| = \sum_{i=3}^{5} |F_i| = |F_3| + |F_4| + |F_5|
= 10\binom{n−5}{k−3} + 5\binom{n−5}{k−4} + \binom{n−5}{k−5}.
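The answer to Example 10 can be checked by brute force for small n and k. A sketch in Python (the helper names count_direct, count_formula and c are ours; c treats out-of-range binomial coefficients as 0, matching the convention that \binom{m}{j} = 0 when j < 0 or j > m):

```python
# Example 10 checked by enumeration: k-element subsets of {1,...,n}
# containing at least 3 elements of {1,2,3,4,5}.
from math import comb
from itertools import combinations

def c(m: int, j: int) -> int:
    """Binomial coefficient, taken to be 0 when j is out of range."""
    return comb(m, j) if 0 <= j <= m else 0

def count_direct(n: int, k: int) -> int:
    """Enumerate all k-element subsets and keep those meeting {1,...,5} in >= 3 elements."""
    return sum(1 for S in combinations(range(1, n + 1), k)
               if len(set(S) & {1, 2, 3, 4, 5}) >= 3)

def count_formula(n: int, k: int) -> int:
    """The answer obtained from the partition into F3, F4, F5."""
    return 10 * c(n - 5, k - 3) + 5 * c(n - 5, k - 4) + c(n - 5, k - 5)

for n in range(5, 11):
    for k in range(3, n + 1):
        assert count_direct(n, k) == count_formula(n, k)
```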
Counting sets by partitioning
Recall that if X is a set, and X_1, . . . , X_k are subsets of X, we say that {X_1, . . . , X_k}
is a partition of X if:
• the X_i’s are pairwise disjoint (meaning that X_i ∩ X_j = ∅ for all i ≠ j), and
• X = ∪_{i=1}^{k} X_i.
Partitioning is often useful for counting. Often, when we want to work out
the size of a set X, we cannot apply the multiplication principle to X straight
away. But even then, we can sometimes still find a partition of X into disjoint
sets X_1, X_2, . . . , X_k such that we can apply the multiplication principle to each
X_i, separately. We then have

|X| = \sum_{i=1}^{k} |X_i|,
enabling us to count X. This is what happens in Example 10.
1.4 The inclusion-exclusion principle
If A1 , . . . , An are finite sets which are all disjoint from one another, it is easy to
calculate the size of their union: we simply have
|A1 ∪ A2 ∪ . . . ∪ An | = |A1 | + |A2 | + . . . + |An |.
Often, we want to calculate the size of the union of n sets which are not all disjoint
from one another. The inclusion-exclusion formula gives us a way of doing this
in terms of intersections.
It is easy to see that if A1 , A2 are subsets of a finite set X, then
|A1 ∪ A2 | = |A1 | + |A2 | − |A1 ∩ A2 |.
Equivalently, taking complements,
|X \ (A1 ∪ A2 )| = |X| − |A1 | − |A2 | + |A1 ∩ A2 |
The inclusion-exclusion formula generalises this to n arbitrary subsets.
Theorem 5 (The inclusion-exclusion formula). Let X be a finite set.
Let A_1, A_2, . . . , A_n be subsets of X. Then

|A_1 ∪ A_2 ∪ · · · ∪ A_n| = |A_1| + |A_2| + · · · + |A_n|
    − (|A_1 ∩ A_2| + |A_1 ∩ A_3| + · · · + |A_{n−1} ∩ A_n|)
    + (|A_1 ∩ A_2 ∩ A_3| + · · · + |A_{n−2} ∩ A_{n−1} ∩ A_n|)
    − · · · + (−1)^{n−1} |A_1 ∩ A_2 ∩ · · · ∩ A_n|
  = \sum_{∅ ≠ I ⊆ {1,2,...,n}} (−1)^{|I|−1} |∩_{i∈I} A_i|.
Equivalently, taking complements, we have

|X \ (∪_{i=1}^{n} A_i)| = |X| − (|A_1| + |A_2| + · · · + |A_n|)
    + (|A_1 ∩ A_2| + |A_1 ∩ A_3| + · · · + |A_{n−1} ∩ A_n|)
    − (|A_1 ∩ A_2 ∩ A_3| + · · · + |A_{n−2} ∩ A_{n−1} ∩ A_n|)
    · · · + (−1)^n |A_1 ∩ A_2 ∩ · · · ∩ A_n|
  = \sum_{I ⊆ {1,2,...,n}} (−1)^{|I|} |∩_{i∈I} A_i|.    (1.3)

(Note that the term in the above sum where I = ∅ is |X|; by convention, the
intersection of no subsets of X is simply X.)
For example, if A_1, A_2, A_3 are arbitrary subsets of a finite set X, then the
above becomes

|A_1 ∪ A_2 ∪ A_3| = |A_1| + |A_2| + |A_3| − |A_1 ∩ A_2| − |A_1 ∩ A_3| − |A_2 ∩ A_3| + |A_1 ∩ A_2 ∩ A_3|

and

|X \ (A_1 ∪ A_2 ∪ A_3)| = |X| − |A_1| − |A_2| − |A_3|
    + |A_1 ∩ A_2| + |A_1 ∩ A_3| + |A_2 ∩ A_3| − |A_1 ∩ A_2 ∩ A_3|.
Proof. (Non-examinable.) Notice the similarity between the above formula for
|X \ (∪_{i=1}^{n} A_i)|, and the equation of polynomials

(1 − X_1)(1 − X_2) . . . (1 − X_n) = \sum_{I ⊂ {1,...,n}} (−1)^{|I|} \prod_{i∈I} X_i.    (1.4)
For any set S ⊂ X, write χ_S for its characteristic function, defined by

χ_S : X → {0, 1};  χ_S(x) = 1 if x ∈ S, and χ_S(x) = 0 if x ∉ S.
Observe that for any set S ⊂ X, we have

|S| = \sum_{x∈X} χ_S(x).

This equation enables us to express the size of a set as a sum of values of a
function, which can then be analysed using (1.4).
Let B = X \ (∪_{i=1}^{n} A_i). First, observe that

χ_B = \prod_{i=1}^{n} (1 − χ_{A_i}).

Second, observe that

\prod_{i=1}^{n} (1 − χ_{A_i}(x)) = \sum_{I ⊂ {1,...,n}} (−1)^{|I|} \prod_{i∈I} χ_{A_i}(x)  for all x ∈ X,

by substituting X_i = χ_{A_i}(x) (which is just some real number) into the equation
(1.4). In other words,

\prod_{i=1}^{n} (1 − χ_{A_i}) = \sum_{I ⊂ {1,...,n}} (−1)^{|I|} \prod_{i∈I} χ_{A_i},

as real-valued functions on X. Therefore,

χ_B = \prod_{i=1}^{n} (1 − χ_{A_i}) = \sum_{I ⊂ {1,...,n}} (−1)^{|I|} \prod_{i∈I} χ_{A_i}.
(Here, all equalities are between real-valued functions on X.) But note that for
any subset I ⊂ {1, 2, . . . , n}, we have

\prod_{i∈I} χ_{A_i} = χ_{∩_{i∈I} A_i}.

(This is another useful property of characteristic functions: the characteristic
function of an intersection of sets is equal to the product of all their characteristic
functions. Check this!) Therefore,

χ_B = \sum_{I ⊂ {1,...,n}} (−1)^{|I|} \prod_{i∈I} χ_{A_i} = \sum_{I ⊂ {1,...,n}} (−1)^{|I|} χ_{∩_{i∈I} A_i}.    (1.5)
Hence, summing over all x ∈ X gives

|B| = \sum_{I ⊆ {1,2,...,n}} (−1)^{|I|} |∩_{i∈I} A_i|.

This proves the second version of the inclusion-exclusion formula. The first version follows from taking complements: we have

|∪_{i=1}^{n} A_i| = |X| − |B| = \sum_{∅ ≠ I ⊆ {1,2,...,n}} (−1)^{|I|−1} |∩_{i∈I} A_i|.
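The second version of the formula translates directly into code: sum (−1)^{|I|} |∩_{i∈I} A_i| over all subsets I of the index set. A sketch in Python (the function name size_outside_union is ours), checked against an explicit union:

```python
# |X \ (A1 ∪ ... ∪ An)| via inclusion-exclusion, checked directly.
from itertools import combinations

def size_outside_union(X, sets):
    """Sum of (-1)^|I| * |intersection of A_i for i in I| over all index sets I."""
    n = len(sets)
    total = 0
    for j in range(n + 1):
        for I in combinations(range(n), j):
            inter = set(X)          # intersection over the empty I is X itself
            for i in I:
                inter &= sets[i]
            total += (-1) ** j * len(inter)
    return total

X = set(range(1, 31))
A = [{x for x in X if x % 2 == 0},
     {x for x in X if x % 3 == 0},
     {x for x in X if x % 5 == 0}]

# Elements of {1,...,30} divisible by none of 2, 3, 5.
direct = len(X - (A[0] | A[1] | A[2]))
assert size_outside_union(X, A) == direct == 8
```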
Example 11. How many primes are there between 1 and 100?
Answer: let X = {1, 2, . . . , 100}. Suppose x ∈ X is not prime, and x ≠ 1. Then
we may write x = yz, where y is prime and y ≤ z. Hence, y ≤ √100 = 10, so
y = 2, 3, 5 or 7.
Now let A_i = {x : 1 ≤ x ≤ 100 and i divides x}, for i = 2, 3, 5, 7. Then the
set of all primes in X is

((X \ {1}) \ (∪_{i∈{2,3,5,7}} A_i)) ∪ {2, 3, 5, 7}.
Now we compute the sizes of all the intersections of the A_i:

|A_2| = ⌊100/2⌋ = 50                |A_3| = ⌊100/3⌋ = 33
|A_5| = ⌊100/5⌋ = 20                |A_7| = ⌊100/7⌋ = 14
|A_2 ∩ A_3| = ⌊100/6⌋ = 16          |A_2 ∩ A_5| = ⌊100/10⌋ = 10
|A_2 ∩ A_7| = ⌊100/14⌋ = 7          |A_3 ∩ A_5| = ⌊100/15⌋ = 6
|A_3 ∩ A_7| = ⌊100/21⌋ = 4          |A_5 ∩ A_7| = ⌊100/35⌋ = 2
|A_2 ∩ A_3 ∩ A_5| = ⌊100/30⌋ = 3    |A_2 ∩ A_3 ∩ A_7| = ⌊100/42⌋ = 2
|A_2 ∩ A_5 ∩ A_7| = ⌊100/70⌋ = 1    |A_3 ∩ A_5 ∩ A_7| = ⌊100/105⌋ = 0
|A_2 ∩ A_3 ∩ A_5 ∩ A_7| = ⌊100/210⌋ = 0
Here, we are using the fact that if p_1, . . . , p_l are distinct primes, then
A_{p_1} ∩ A_{p_2} ∩ . . . ∩ A_{p_l} = A_{p_1 p_2 ··· p_l},
which follows from the Fundamental Theorem of Arithmetic.
So, by the inclusion-exclusion formula, we have

|A_2 ∪ A_3 ∪ A_5 ∪ A_7| = (50 + 33 + 20 + 14) − (16 + 10 + 7 + 6 + 4 + 2) + (3 + 2 + 1) − 0
= 117 − 45 + 6 = 78.

Hence the number of primes in X is 100 − 78 − 1 + 4 = 25.
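The count of 25 can be verified directly with a primality test, and the intermediate count of 78 multiples can be checked too. A sketch in Python (the helper is_prime is ours):

```python
# Example 11 checked by trial division: there are 25 primes up to 100.
def is_prime(m: int) -> bool:
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

primes = [m for m in range(1, 101) if is_prime(m)]
assert len(primes) == 25

# The inclusion-exclusion bookkeeping: |A2 ∪ A3 ∪ A5 ∪ A7| = 78.
union = {x for x in range(1, 101) if any(x % p == 0 for p in (2, 3, 5, 7))}
assert len(union) == 117 - 45 + 6 == 78

# Remove 1, add back the primes 2, 3, 5, 7 themselves.
assert 100 - len(union) - 1 + 4 == 25
```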
The following corollary to the inclusion-exclusion formula is useful when all
the Ai ’s ‘look the same’.
Corollary 6. Let X be a finite set. Suppose that A_1, A_2, . . . , A_n are subsets of
X, and assume that for every j with 0 ≤ j ≤ n and for every I ⊆ {1, 2, . . . , n}
with |I| = j, we have

|∩_{i∈I} A_i| = a_j.

(Note that this means a_0 = |X|, as the intersection of no sets is understood to be
X.)
Then

|∪_{i=1}^{n} A_i| = \sum_{j=1}^{n} (−1)^{j−1} \binom{n}{j} a_j,

or equivalently,

|X \ (∪_{i=1}^{n} A_i)| = \sum_{j=0}^{n} (−1)^{j} \binom{n}{j} a_j.
Proof. We will prove the second version. If |I| = j, then the contribution from
I to the sum in the inclusion-exclusion formula (1.3) is (−1)^j a_j. Adding this up
over all the \binom{n}{j} sets of size j gives \binom{n}{j} (−1)^j a_j. Finally, adding
up over all j gives the result. Again, the first version follows by taking complements.
1.5 Counting surjections
Suppose S and T are sets with |S| = k and |T| = n. Recall that the number of
functions from S to T is n^k. (This follows from Example 1.) We also saw that
the number of injections from S to T is n(n − 1) · · · (n − k + 1) (which is 0 if
k > n) — see Remark 4. But we did not count the number of surjections. Let’s
do this now.
We can count the number of surjections using the inclusion-exclusion formula.
Let S be a k-element set and let T be an n-element set. Without loss of
generality, we may assume that T = {1, 2, . . . , n}. Let F denote the set of all
functions from S to {1, 2, . . . , n}. Let

A_i = {f ∈ F : i ∉ f(S)}
denote the set of all functions in F whose image does not contain i. A surjection
is precisely a function which does not lie in any of the sets A_i, so the set of
surjections is

F \ (∪_{i=1}^{n} A_i).
We can calculate the size of this set using the inclusion-exclusion formula.
Let I ⊂ {1, 2, . . . , n} with |I| = j. The intersection C = ∩_{i∈I} A_i is simply the
set of all functions in F whose image does not contain any i ∈ I. There is an
obvious one-to-one correspondence between C and the set of all functions from S
to {1, 2, . . . , n} \ I. Therefore, the number of functions in C is (n − |I|)^k = (n − j)^k,
the same as the number of functions from S to {1, 2, . . . , n} \ I, and therefore

|∩_{i∈I} A_i| = (n − j)^k.
This depends only on |I| = j, so we can use the version of the inclusion-exclusion
formula in Corollary 6, with a_j = (n − j)^k, giving:

number of surjections from S to T = |F \ (∪_{i=1}^{n} A_i)| = \sum_{j=0}^{n} (−1)^j \binom{n}{j} (n − j)^k.
Note that the term with j = n is zero (there are no functions from S to
{1, 2, . . . , n} whose image contains none of {1, 2, . . . , n}), so we can rewrite this
as

number of surjections from S to T = \sum_{j=0}^{n−1} (−1)^j \binom{n}{j} (n − j)^k.
Example 12. How many ways are there to choose 3 teams (an A-team, a B-team
and a C-team) from a class of 7 children? (Each team must have at least one child
in it, and the names of the teams are important, so swapping the children in the
A-team with the children in the B-team produces a different choice.)
Answer: this is simply the number of surjections from the set of children to
T = {A, B, C}, the set of team-names. Hence, we simply apply the above formula
with k = 7 and n = 3, giving
number of ways = \sum_{j=0}^{2} (−1)^j \binom{3}{j} (3 − j)^7
= \binom{3}{0} · 3^7 − \binom{3}{1} · 2^7 + \binom{3}{2} · 1^7
= 3^7 − 3 · 2^7 + 3
= 1806.
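The surjection formula can be checked against direct enumeration for small k and n. A sketch in Python (the function names are ours):

```python
# Surjections counted two ways: the inclusion-exclusion formula vs brute force.
from math import comb
from itertools import product

def surjections_formula(k: int, n: int) -> int:
    """Number of surjections from a k-set onto an n-set, via inclusion-exclusion."""
    return sum((-1) ** j * comb(n, j) * (n - j) ** k for j in range(n + 1))

def surjections_direct(k: int, n: int) -> int:
    """Enumerate all functions {1,...,k} -> {1,...,n}; keep those with full image."""
    return sum(1 for f in product(range(n), repeat=k) if len(set(f)) == n)

for k in range(1, 8):
    for n in range(1, 5):
        assert surjections_formula(k, n) == surjections_direct(k, n)

assert surjections_formula(7, 3) == 1806   # Example 12: three teams from 7 children
```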
1.6 Permutations and derangements
Recall that a permutation of a set X is defined to be a bijection from X to itself.
We saw before that if X is an n-element set, then the number of permutations of
X is
n! := n(n − 1) . . . (2)(1).
When studying permutations, the names of the elements of the set X do not
matter, so from now on, we will work with permutations of the set {1, 2, . . . , n}.
We write Sn for the set of all permutations of {1, 2, . . . , n}. If f ∈ Sn , we can
write f as a 2 × n matrix, as follows:
( 1    2    . . .  n
  f(1) f(2) . . . f(n) ).
This is known as the two-line notation for permutations. For example, when
n = 3,
( 1 2 3
  3 1 2 )
represents the permutation which sends 1 to 3, 2 to 1, and 3 to 2.
Remark 6. The set Sn is a group under the binary operation of composition of
permutations. (Check that it satisfies the group axioms: closure, associativity,
identity and inverses.) We call it the symmetric group on {1, 2, . . . , n}.
We can also write a permutation in disjoint cycle notation. We do this by
example. Consider the permutation
f = ( 1 2 3 4 5 6
      4 1 6 2 5 3 ) ∈ S6 .
We can represent f diagramatically as follows. Place 6 points in the plane,
label them with the numbers 1 to 6, and draw an arrow from i to f (i) for each
i ∈ {1, 2, 3, 4, 5, 6}:
[Diagram: the arrows form the cycles 1 → 4 → 2 → 1, 3 → 6 → 3, and 5 → 5.]
This produces a set of disjoint cycles, in which each of the numbers occurs
exactly once. We now list these cycles in any order. Choose one of the cycles (say
the top one), and write it out as a sequence, starting at any point (1 say):
(1 4 2).
Now choose any of the other cycles (say the second one), and write it out as
a sequence, starting at any point (3 say):
(1 4 2)(3 6).
Do the same with the last cycle:
(1 4 2)(3 6)(5).
This is a disjoint cycle representation for this permutation. Each cycle is a
list of iterates of the permutation: f sends 1 to 4, 4 to 2, and 2 to 1; it sends 3
to 6 and 6 to 3; and it sends 5 to itself.
Fact. It is easy to see that for any permutation, this process always produces a
list of disjoint cycles, in which each number in {1, 2, . . . , n} occurs exactly once.
Exercise 4. Check this fact!
The disjoint cycle representation of a permutation is not unique, as we can
choose what order we list the cycles in, and we can choose where to start each
cycle. (These are the only choices we have, however.)
As an example, we could have represented the permutation above as
(6 3)(5)(4 2 1),
if we had first chosen to start with 6, and then with 5, and then with 4.
The cycles of length 1 in a disjoint cycle representation are the fixed points
of the permutation. When n is given beforehand, some authors abbreviate the
disjoint cycle notation by leaving out the cycles of length 1 from the disjoint cycle
representation. So the disjoint cycle representation above would be abbreviated
to:
(1 4 2)(3 6).
However, in this course, you should write out the cycles of length 1 as well, for
clarity.
To compute a disjoint cycle representation without drawing the picture above,
we start by writing down any number, say 1, and then we write down the iterates
of the permutation until we get back to 1 again. We get the cycle:
(1 4 2).
If we have written down all the numbers, we stop. Otherwise (as in this case),
we pick another number, say 3, and repeat the process. We now have two cycles:
(1 4 2)(3 6).
We repeat this process until we have written down all the numbers. We end up
with a list of cycles, in this case
(1 4 2)(3 6)(5).
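The procedure just described translates directly into code; here is a Python sketch (an aside, not part of the notes), representing a permutation as a dict i ↦ f(i):

```python
def cycles(f):
    """Disjoint cycle representation of a permutation f of {1, ..., n}, given as
    a dict i -> f(i). Following the text: start a cycle at the smallest number
    not yet written down, and iterate the permutation until the cycle closes."""
    seen, result = set(), []
    for start in sorted(f):
        if start in seen:
            continue
        cycle, i = [], start
        while i not in seen:
            seen.add(i)
            cycle.append(i)
            i = f[i]
        result.append(tuple(cycle))
    return result

f = {1: 4, 2: 1, 3: 6, 4: 2, 5: 5, 6: 3}
print(cycles(f))  # [(1, 4, 2), (3, 6), (5,)]
```

Note that the output starts each cycle at its smallest element; as explained above, any other starting points (and any order of the cycles) would represent the same permutation.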
How do we find the number of permutations with cycles of given lengths?
Let’s start with a simple example.
Example 13. How many permutations of {1, 2, 3, 4, 5, 6} are cycles of length 6
(‘6-cycles’)?
Answer: we can produce a 6-cycle (in disjoint cycle notation) by first choosing
an ordering of the set {1, 2, 3, 4, 5, 6} (this produces an ordered sequence containing each element of {1, 2, 3, 4, 5, 6} exactly once), e.g.
(3, 2, 1, 6, 4, 5)
and then turning it into a cycle:
(3 2 1 6 4 5).
There are 6! choices for the sequence (the same as the number of permutations of {1, 2, 3, 4, 5, 6}), but notice that the permutations
(3 2 1 6 4 5),
(2 1 6 4 5 3),
(1 6 4 5 3 2),
(6 4 5 3 2 1),
(4 5 3 2 1 6),
(5 3 2 1 6 4)
are all the same: 6 different sequences produce the same permutation. (The cycle
is the same, whichever number you choose to write at the start.) In general,
there are exactly 6 different ways of writing a cycle of length 6 (you just have to
choose which number to write at the start), so the process above produces each
permutation exactly 6 times. Therefore,
6 × number of 6-cycles in S6 = 6!
so
number of 6-cycles in S6 = 6!/6 = 5! = 120.
Exercise 5. In exactly the same way, show that the number of permutations in
Sn which are n-cycles, is (n − 1)!.
Now let’s do a slightly harder example.
Example 14. How many permutations of {1, 2, 3, 4, 5, 6} are there with two cycles of length 3?
We can produce these permutations by first choosing an ordering of the set
{1, 2, 3, 4, 5, 6}, e.g
(3, 2, 5, 4, 1, 6)
and then turning it into a permutation by bracketing the first three numbers
together, and then bracketing the last three numbers together, so the above
example produces
(3 2 5)(4 1 6).
As always, there are 6! choices for the ordering, but how many times do we
produce each permutation? The answer is, each permutation is produced exactly
3 × 3 × 2 = 18
times. Why? For any of the above permutations, we can represent it in
3 × 3 × 2!
ways: there are 3 choices for where to start the first 3-cycle, 3 choices for where
to start the second 3-cycle, and 2! choices for the order of the two 3-cycles. So
number of the above permutations × 3 × 3 × 2 = 6!,
and therefore
number of the above permutations = 6!/(3 × 3 × 2) = 40.
Permutations with no fixed points are called derangements, and are quite
useful in various parts of mathematics and computer science. It is easy to read off
the number of fixed points of a permutation f from a disjoint cycle representation
of f : it is just the number of cycles of length 1.
Consider the following puzzle.
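Both counts above (120 six-cycles, and 40 permutations with two 3-cycles) can be verified by brute force over all 6! = 720 permutations; a Python sketch, not part of the notes:

```python
from itertools import permutations
from math import factorial

def count_by_cycle_type(n, lengths):
    """Brute-force count of permutations of an n-set whose disjoint cycles have
    exactly the given multiset of lengths (cycles of length 1 included)."""
    target = sorted(lengths)
    count = 0
    for p in permutations(range(n)):
        seen, cycle_lengths = set(), []
        for s in range(n):
            if s not in seen:
                length, i = 0, s
                while i not in seen:
                    seen.add(i)
                    i = p[i]
                    length += 1
                cycle_lengths.append(length)
        if sorted(cycle_lengths) == target:
            count += 1
    return count

print(count_by_cycle_type(6, [6]))     # 120 six-cycles, as in Example 13
print(count_by_cycle_type(6, [3, 3]))  # 40, as in Example 14
```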
Example 15. There are 100 guests at a party. When they arrive, they all turn
out to have identical coats. They put them in a big pile altogether, and the coats
get thoroughly mixed up. When the guests leave, one by one, each is equally likely
to pick up any of the remaining coats. What is the probability that no-one leaves
with their own coat?
If we label the guests with the numbers from 1 to 100, and we (invisibly) label
each person’s coat with the same number, then the matching between guests and
coats after the party defines a permutation f of {1, 2, . . . , n}: if person i takes
home coat j, we define f (i) = j. The condition that each guest is equally likely
to pick up any of the remaining coats, means that f is equally likely to be any
of the n! permutations of {1, 2, . . . , n}. (Think about this for a moment, to see
why.)
Notice that person i leaves with their own coat if and only if the permutation
f fixes i. So the probability that no-one leaves with their own coat is simply

(number of derangements of {1, 2, . . . , 100}) / (total number of permutations of {1, 2, . . . , 100}).
Writing down all the permutations in S100 in disjoint cycle notation, and
counting how many have no fixed points, would take far too much paper: there
are more than 9 × 10^157 permutations in S100 , and there are only around 10^87
particles in the universe!
In order to answer this question, we want to find a way of counting (or estimating) the number of permutations of {1, 2, . . . , n} which are derangements, for
general n.
In fact, we can use the inclusion-exclusion formula. If we let X be Sn , the set
of all permutations of {1, 2, . . . , n}, and we let Ai be the set of all permutations
of {1, 2, . . . , n} which have i as a fixed point, the set of all derangements of
{1, 2, . . . , n} is
X \ ⋃_{i=1}^{n} Ai .
We want a formula for the size of this set. To apply inclusion-exclusion, we must
calculate
| ⋂_{i∈I} Ai |,
for each subset I ⊂ {1, 2, . . . , n}. The set ⋂_{i∈I} Ai is simply the set of permutations
of {1, 2, . . . , n} which fix every number in the set I. Therefore, it’s in one-to-one
correspondence with the set of all permutations of {1, 2, . . . , n} \ I, so it has size
(n − |I|)!. Since this just depends on |I|, we can again apply Corollary 6, this
time with aj = (n − j)!, to get:

number of derangements of {1, 2, . . . , n} = | X \ ⋃_{i=1}^{n} Ai |
= Σ_{j=0}^{n} (−1)^j C(n, j) (n − j)!
= Σ_{j=0}^{n} (−1)^j · n!/((n − j)! j!) · (n − j)!
= Σ_{j=0}^{n} (−1)^j · n!/j!
= n! · Σ_{j=0}^{n} (−1)^j /j! .
Therefore,

(number of derangements of {1, 2, . . . , n}) / (total number of permutations of {1, 2, . . . , n})
= ( n! Σ_{j=0}^{n} (−1)^j /j! ) / n!
= Σ_{j=0}^{n} (−1)^j /j! .
Now does this series remind you of anything? What would we get if we took
the sum to infinity? We would get
Σ_{j=0}^{∞} (−1)^j /j! ,
which is one of the expansions of 1/e. This suggests that the number of permutations of {1, 2, . . . , n} which are derangements is approximately n!/e. But how
good is this approximation?
The error in this approximation is

n! Σ_{j=0}^{n} (−1)^j /j! − n!/e = n! Σ_{j=0}^{n} (−1)^j /j! − n! Σ_{j=0}^{∞} (−1)^j /j!
= −n! Σ_{j=n+1}^{∞} (−1)^j /j!
= − Σ_{j=n+1}^{∞} (−1)^j /((j)(j − 1) · · · (n + 1)).
Since
Σ_{j=n+1}^{∞} (−1)^j /((j)(j − 1) · · · (n + 1))
is a sum of terms of decreasing absolute value and alternating signs, it must
converge, and its absolute value is at most the absolute value of the first term.
(This rule is known as the alternating series test.) Hence,
| Σ_{j=n+1}^{∞} (−1)^j /((j)(j − 1) · · · (n + 1)) | ≤ 1/(n + 1) < 1/2 for n ≥ 2.
It follows that the number of permutations of {1, . . . , n} which are derangements
is n!/e to the nearest integer — the best possible approximation we could possibly
hope for! (The calculation above only works for n ≥ 2, but it’s easy to see that
the statement holds for n = 1 as well.) So we see that, with an astonishingly
high degree of accuracy, the proportion of permutations of {1, 2, . . . , n} which are
derangements is 1/e, and
Probability[None of the 100 guests leaves with their own coat] = [100!/e] / 100! .
(Here, if x is a real number, [x] denotes the integer nearest to x, rounded down
if x is of the form m + 1/2 for some integer m.)
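The derangement formula, and the n!/e approximation discussed above, can both be checked numerically; a Python sketch (an aside, not part of the notes):

```python
from math import e, factorial
from itertools import permutations

def derangements(n):
    """Inclusion-exclusion count: n! * sum_{j=0}^{n} (-1)^j / j!.
    factorial(j) divides factorial(n), so integer division below is exact."""
    return sum((-1)**j * (factorial(n) // factorial(j)) for j in range(n + 1))

def derangements_brute(n):
    """Direct count: enumerate S_n and keep the permutations with no fixed point."""
    return sum(1 for p in permutations(range(n))
               if all(p[i] != i for i in range(n)))

assert derangements(6) == derangements_brute(6)

# The count agrees with n!/e rounded to the nearest integer, for every n >= 1:
for n in range(1, 15):
    assert derangements(n) == round(factorial(n) / e)

print([derangements(n) for n in range(7)])  # [1, 0, 1, 2, 9, 44, 265]
```

(The loop stops at n = 14 only to stay within floating-point accuracy; the mathematical statement holds for all n ≥ 1, as proved above.)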
Some group-theoretic aspects of permutations
For studying more group-theoretic aspects of permutations, disjoint cycle notation is very useful. In this section, when n is understood, we will sometimes
abbreviate the disjoint cycle notation for permutations in Sn , by missing out the
fixed points (the cycles of length 1). For example, if
f = ( 1 2 3 4 5
      3 2 5 4 1 ),
then we abbreviate the disjoint cycle notation
f = (1 3 5)(2)(4) ∈ S5
to
f = (1 3 5).
The identity permutation (which consists only of 1-cycles) is often abbreviated
to Id.
Recall that the set Sn of all permutations of {1, 2, . . . , n} is a group under
the multiplication operation of composition of permutations. If f, g ∈ Sn , the
composition gf is defined as follows.
(gf )(i) = g(f (i)) (i ∈ {1, 2, . . . , n}).
(We compose starting from the right — first do f , then do g.) It is easy to check
that gf is a permutation.
Observe that if f ∈ Sn has disjoint cycle notation
f = Cl Cl−1 . . . C2 C1 ,
where C1 , . . . , Cl are cycles, then f is simply the product (composition) of cyclic
permutations
f = cl cl−1 . . . c2 c1 ,
where ci is the cyclic permutation which has Ci as a cycle, and which fixes all the
numbers not in Ci . So we can view the disjoint cycle notation as an expression
of a permutation as a composition of cyclic permutations. For example, if
f = (1 2 3)(4 5)(6) ∈ S6
in disjoint cycle notation, then
f = c3 c2 c1 ,
where
c1 = (6) = Id = (1)(2)(3)(4)(5)(6),
c2 = (4 5) = (4 5)(1)(2)(3)(6),
c3 = (1 2 3) = (1 2 3)(4)(5)(6).
It is easy to multiply permutations which are written in disjoint cycle notation.
We will do this by example. Suppose we wish to find gf , where
f = (1 3 5)(2 4),
g = (1 3 2 5)(4).
Then
gf = [(1 3 2 5)(4)][(1 3 5)(2 4)].
Since multiplication is associative, and since we can think of f and g as compositions of the cycles in their disjoint cycle notation, this can be viewed as a product
of 4 cyclic permutations:
gf = (1 3 2 5)(4)(1 3 5)(2 4);
the four permutations are
(2 4),
(1 3 5),
(4) = Id,
(1 3 2 5),
written here in abbreviated disjoint cycle notation (i.e., missing out the fixed
points). Since multiplying by the identity permutation has no effect, we can
shorten the product above by missing out the identity permutation:
gf = (1 3 2 5)(1 3 5)(2 4).
To obtain a disjoint cycle representation of gf , we begin by choosing any number
(say 1), to start the first cycle of gf :
gf = (1 . . .
What is (gf )(1)? Remember that we compose permutations from the right. The
cycle (2 4) takes 1 to 1 (it leaves 1 fixed), then the cycle (1 3 5) takes 1 to 3, and
finally the cycle (1 3 2 5) takes 3 to 2. So (gf )(1) = 2, and so we write 2 next:
gf = (1 2 . . .
What is (gf )(2)? The cycle (2 4) takes 2 to 4, the cycle (1 3 5) takes 4 to 4,
and finally the cycle (1 3 2 5) takes 4 to 4. So (gf )(2) = 4. Continuing in this
way, we find that (gf )(4) = 5, (gf )(5) = 3 and (gf )(3) = 1, so we obtain:
gf = (1 2 4 5 3).
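The composition rule (apply f first, then g) is easy to mechanise; a Python sketch, not part of the notes, with cycles_to_map and compose as names chosen here:

```python
def cycles_to_map(cycle_list, n):
    """Build the map i -> f(i) on {1, ..., n} from a list of cycles
    (fixed points may be omitted, as in abbreviated notation)."""
    f = {i: i for i in range(1, n + 1)}
    for c in cycle_list:
        # each element of the cycle maps to the next, and the last wraps to the first
        for a, b in zip(c, c[1:] + c[:1]):
            f[a] = b
    return f

def compose(g, f):
    """(gf)(i) = g(f(i)): compose from the right, first f, then g."""
    return {i: g[f[i]] for i in f}

f = cycles_to_map([(1, 3, 5), (2, 4)], 5)
g = cycles_to_map([(1, 3, 2, 5)], 5)
print(compose(g, f))  # {1: 2, 2: 4, 3: 1, 4: 5, 5: 3}, i.e. the cycle (1 2 4 5 3)
```

This reproduces the worked computation above: gf = (1 2 4 5 3).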
Transpositions
Transpositions are the simplest type of permutation: a transposition just swaps
two elements, and leaves all the rest fixed. So, in disjoint cycle notation, a
transposition in Sn consists of one 2-cycle, and (n − 2) 1-cycles. In abbreviated
disjoint cycle notation, the transpositions in Sn are
{(i j) : 1 ≤ i < j ≤ n};
there are C(n, 2) of them.
We have the following
Fact. The transpositions generate Sn as a group, meaning that any permutation
in Sn is a product (composition) of a finite number of transpositions.
Since any permutation is a composition of the cycles in its disjoint cycle notation, to prove the above fact, it is enough to prove that any cycle is a composition
(product) of a finite number of transpositions. In fact, a k-cycle is a product of
k − 1 transpositions: observe that
(1 2 3 . . . k) = (1 2)(2 3)(3 4) . . . (k − 1 k),
using abbreviated disjoint cycle notation.
Using this fact, it is easy to see that any permutation f ∈ Sn is a product of
at most n − 1 transpositions.
Exercise 6. Show that if f ∈ Sn is an n-cycle, then at least n − 1 transpositions
are needed to express f as a product of transpositions. Show that if f ∈ Sn is not
an n-cycle, f can be expressed as a product of at most n − 2 transpositions.
The sign of a permutation
We can express the same permutation as products of different numbers of transpositions — for example, if f = (1 2 3) ∈ S3 , then we have
f = (1 2)(2 3) = (2 3)(1 2)(1 3)(2 3).
However, we have the following theorem.
Theorem 7. Let f ∈ Sn . Whenever we write f as a product of transpositions,
f = t1 t2 . . . tl ,
the number of transpositions l always has the same parity: it is either always even
or always odd.
Proof 1. If f ∈ Sn , we define c(f ) to be the number of disjoint cycles in a full
disjoint cycle representation of f . (Full means that we include cycles of length
1.) For example, if f = (1 2)(3)(4) ∈ S4 , then c(f ) = 3. We now define
ε(f ) = (−1)^{n−c(f )} .
Our aim is to show that ε(f ) = 1 if f is a product of an even number of transpositions, and ε(f ) = −1 if f is a product of an odd number of transpositions.
Observe that if f is a transposition, then ε(f ) = −1, since f has n − 1 cycles
in its full disjoint cycle representation. I now make the following
Claim: For any permutation f ∈ Sn and any transposition (p q),
ε(f (p q)) = −ε(f ).
Proof of claim: Let f ∈ Sn , and let
f = C1 C2 . . . Cl
be a disjoint cycle representation of f . We have two cases to deal with:
Case (i): p and q are in different cycles of f .
Case (ii): p and q are in the same cycle of f .
First, suppose we are in case (i): p and q are in different cycles of f . Notice that
we can reorder disjoint cycles in a disjoint cycle representation, without changing
the permutation. Therefore, we may assume that p is in Cl−1 and q is in Cl .
Suppose Cl−1 = (p x1 x2 . . . xM ) and Cl = (q y1 y2 . . . yN ), so that
f = C1 C2 . . . Cl−2 (p x1 x2 . . . xM )(q y1 y2 . . . yN ).
We can now write down a disjoint cycle representation for f (p q):
f (p q) = C1 C2 . . . Cl−2 (p x1 x2 . . . xM )(q y1 y2 . . . yN )(p q)
= C1 C2 . . . Cl−2 (p y1 y2 . . . yN q x1 x2 . . . xM ).
Therefore, f (p q) has one fewer cycle than f in its disjoint cycle representation, so
ε(f (p q)) = −ε(f ).
Now suppose we are in case (ii): p and q are in the same cycle of f . Since
we can reorder disjoint cycles in a disjoint cycle representation, without changing
the permutation, we may assume that p and q are both in the cycle Cl . Suppose
Cl = (p x1 x2 . . . xM q y1 y2 . . . yN ), so that
f = C1 C2 . . . Cl−1 (p x1 x2 . . . xM q y1 y2 . . . yN ).
We can now write down a disjoint cycle representation for f (p q):
f (p q) = C1 C2 . . . Cl−1 (p x1 x2 . . . xM q y1 y2 . . . yN )(p q)
= C1 C2 . . . Cl−1 (p y1 y2 . . . yN )(q x1 x2 . . . xM ).
Therefore, f (p q) has one more cycle than f in its (full) disjoint cycle representation, so again, ε(f (p q)) = −ε(f ). This proves the claim.
It now follows (by induction on r) that if f ∈ Sn , and t1 , . . . , tr are transpositions, then
ε(f t1 t2 . . . tr ) = (−1)^r ε(f ).
Therefore, taking f = Id,
ε(t1 t2 . . . tr ) = (−1)^r ε(Id) = (−1)^r .
Hence, if g is a product of an even number of transpositions, then ε(g) = 1; if g
is a product of an odd number of transpositions, then ε(g) = −1. It follows that
no permutation can be written as a product of an even number of transpositions
and also as a product of an odd number of transpositions. This proves the
theorem.
Proof 2 (non-examinable). We now give a more algebraic proof. If P (X1 , . . . , Xn )
is any polynomial in X1 , . . . , Xn with real coefficients, and f ∈ Sn , we define
f (P ) = P (Xf (1) , Xf (2) , . . . , Xf (n) ).
In other words, to produce f (P ), we just take the polynomial P and replace
Xi with Xf (i) , for each i. This defines an action of Sn on the set of all real
polynomials in X1 , . . . , Xn . In particular, for any two permutations f, g ∈ Sn , we
have
(gf )(P ) = g(f (P )).
(1.6)
(Obviously, replacing i with f (i) and then replacing f (i) with g(f (i)), for each i,
is the same as replacing i with (gf )(i), for each i.)
Now let ∆ be the polynomial
∆ = ∏_{1≤i<j≤n} (Xi − Xj ).
So, if f ∈ Sn , the polynomial f (∆) is defined by
f (∆) = ∏_{1≤i<j≤n} (Xf (i) − Xf (j) ).
Observe that f (∆) = ±∆. This is because, if we write out the pairs
{(f (i), f (j)) : 1 ≤ i < j ≤ n},
then for each (a, b) with 1 ≤ a < b ≤ n, exactly one of (a, b) and (b, a) appears.
When (a, b) appears, we get a factor of (Xa − Xb ) in f (∆), just as we do in ∆.
When (b, a) appears, we get a factor of (Xb − Xa ) = −(Xa − Xb ) in f (∆), instead
of a factor of Xa − Xb (a sign-change). So
f (∆) = (−1)^{number of sign changes} ∆ = ±∆.
If f (∆) = ∆, we say that sign(f ) = 1; if f (∆) = −∆, we say that sign(f ) = −1.
Now observe that if f is a transposition, f (∆) = −∆. To see this, suppose
that f = (p q), with p < q. If neither i nor j is equal to p or q, then Xf (i) −Xf (j) =
Xi − Xj (no sign change). If i = p and j ≠ q, then Xf (p) − Xf (j) = Xq − Xj (sign
change if and only if p < j < q). If i ≠ p and j = q, then Xf (i) − Xf (q) = Xi − Xp
(sign change if and only if p < i < q). Finally, if i = p and j = q, then
Xf (p) − Xf (q) = Xq − Xp (a sign change). The total number of sign changes is
2(q − p − 1) + 1, which is odd, so f (∆) = −∆, as claimed.
The same argument shows that, if (p q) is any transposition, and g ∈ Sn , then
(p q)(g(∆)) = −g(∆).
Therefore, using (1.6),
((p q)g)(∆) = (p q)(g(∆)) = −g(∆).
Induction on l now shows that for any f ∈ Sn , if f is a product of l transpositions,
then
f (∆) = (−1)^l ∆,
so
sign(f ) = (−1)^l .
So if sign(f ) = 1, then l is always even; if sign(f ) = −1, then l is always odd.
This proves the theorem.
Definition. If f ∈ Sn has sign(f ) = 1 (meaning that f is a product of an even
number of transpositions), we call it an even permutation; if it has sign(f ) = −1
(meaning that f is a product of an odd number of transpositions), we call it an
odd permutation.
Remark 7. The two proofs of Theorem 7 show that ε(f ) = sign(f ).
It follows immediately from Theorem 7 that sign is a group homomorphism
from Sn to the cyclic group ({±1}, ×), meaning that
sign(f g) = sign(f ) sign(g) ∀f, g ∈ Sn .
Therefore, Kernel(sign) (which is the set of all even permutations in Sn ) is a
normal subgroup of Sn . It is called the alternating group of order n, written An .
By the Isomorphism Theorem, we have
Sn / Kernel(sign) ≅ Image(sign) = {±1},
and therefore
|Sn | / |Kernel(sign)| = |{±1}| = 2.
So |An | = n!/2. In Exercise Sheet 3, question 9, you are asked to give a bijection
between the set of even permutations and the set of odd permutations in Sn ; this
gives another proof that |An | = n!/2.
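The quantity ε(f) = (−1)^{n−c(f)} from Proof 1 gives a fast way to compute signs, and the fact that exactly half of Sn is even can be checked by brute force; a Python sketch, not part of the notes:

```python
from itertools import permutations
from math import factorial

def sign(p):
    """sign(p) = (-1)^(n - c(p)), where c(p) is the number of cycles
    (fixed points included) in the full disjoint cycle representation.
    The permutation p is a tuple with p[i] the image of i, on {0, ..., n-1}."""
    n = len(p)
    seen, c = set(), 0
    for s in range(n):
        if s not in seen:
            c += 1
            i = s
            while i not in seen:
                seen.add(i)
                i = p[i]
    return (-1) ** (n - c)

# |A_n| = n!/2: exactly half of S_n is even.
n = 5
evens = sum(1 for p in permutations(range(n)) if sign(p) == 1)
print(evens, factorial(n) // 2)  # 60 60
```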
Chapter 2
Recurrence relations &
generating series
2.1 Introduction
In combinatorics, we often want to find the solution to a sequence of counting
problems. For example, we calculated the number of permutations of an n-element set, for each positive integer n. This leads us to study sequences,
s1 , s2 , . . .
where sn , the nth element of the sequence, is the number of objects of a certain
kind, which are at ‘level’ n. We call this a combinatorial sequence.
Often, we can express the nth element of a combinatorial sequence in terms
of earlier elements, for each n. For example, if sn is the number of orderings of
{1, 2, . . . , n} (i.e., the number of permutations of {1, 2, . . . , n}), then sn = n·sn−1 .
This is called a recurrence relation. Recurrence relations are very useful, both for
calculating early values in a sequence, and for proving general formulae.
Let’s see one of the most famous early examples of a recurrence relation.
Example 16. Leonardo Fibonacci was an Italian mathematician of the 13th century. His most important work was the introduction of the Arabic numerals 0, 1,
2, 3, 4, 5, 6, 7, 8, 9 to Europe. In order to show how much easier it is to calculate
with these than with the Roman numerals previously used, he posed the following
problem as an exercise in his book Liber Abaci (The Book of Calculation):
‘A pair of rabbits do not breed in their first month of life, but at the end of
the second and every subsequent month they produce one pair of offspring (one
male, and one female). If I acquire a new-born pair of rabbits at the beginning of
the year, how many pairs of rabbits will I have at the end of the year?’
Answer. Under these conditions, the number of pairs of rabbits after n months
is called the nth Fibonacci number, Fn . How do we calculate these numbers?
We have F0 = 1, since there is one pair of rabbits after 0 months, and F1 = 1,
since no breeding takes place in the first month. For each n ≥ 2, we have
Fn = Fn−1 + Fn−2 , since there are Fn−1 pairs of rabbits at the beginning of the
nth month, and only the Fn−2 pairs born before the (n − 1)th month are old
enough to breed at the end of the nth month; each such pair produces exactly
one new pair.
Hence, it takes only 11 additions to calculate the number of rabbits after 12
months:
F0 = 1
F1 = 1
F2 = F0 + F1 = 2
F3 = F1 + F2 = 3
F4 = 5
F5 = 8
F6 = 13
F7 = 21
F8 = 34
F9 = 55
F10 = 89
F11 = 144
F12 = 233.
So there are 2 × 233 = 466 rabbits after 12 months. This was easy using the
new Arabic numerals, but not so easy using Roman numerals (try it!).
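The 11 additions above are, of course, even easier for a computer; a Python sketch (an aside, not part of the notes), using the notes' convention F0 = F1 = 1:

```python
def fib(n):
    """Fibonacci numbers with the convention F0 = F1 = 1 used in Example 16,
    computed by iterating the recurrence F_n = F_{n-1} + F_{n-2}."""
    a, b = 1, 1  # F0, F1
    for _ in range(n):
        a, b = b, a + b
    return a

print([fib(n) for n in range(13)])
# [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233]
print(2 * fib(12))  # 466 rabbits after 12 months
```

Note that many references instead start the sequence at F0 = 0; the shift here matches Fibonacci's rabbit problem as stated in the notes.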
As is well known, the Fibonacci numbers occur as the number of spirals on
the seed heads of many different plants.
The recurrence relation Fn = Fn−1 +Fn−2 is an example of a 2-term recurrence
relation: Fn is given in terms of Fn−1 and Fn−2 . Before investigating it in more
detail, let’s look at 1-term recurrence relations, the simplest type of recurrence
relation.
2.2 Solving recurrence relations
Example 17. A bacterium reproduces by dividing into 2 identical bacteria after
it has lived for 1 minute. Suppose we start with one of these bacteria. Let xn
be the number of bacteria after n minutes. Find a recurrence relation for xn , and
find xn as a function of n.
Answer: Clearly, we have x0 = 1 and xn = 2xn−1 for all n ≥ 1. It follows
(formally, by induction) that xn = 2^n for all n ≥ 0. So after an hour, there will
be 2^60 > 10^18 bacteria. Yikes!
The recurrence relation xn = 2xn−1 is an example of a 1-term recurrence
relation.
Example 18. A new kind of bacterium is discovered which reproduces by dividing
into k identical bacteria after it has lived for 1 minute. Suppose we start with
s of these bacteria. Let yn be the number of bacteria after n minutes. Find a
recurrence relation for yn , and find yn as a function of n.
Answer: clearly, we have y0 = s and yn = kyn−1 for all n ≥ 1. It follows that
yn = s·k^n for all n ≥ 0.
Now let’s look at 2-term recurrence relations.
Example 19. If fn = fn−1 + 2fn−2 , f0 = 0, and f1 = 1, find a formula for fn as
a function of n.
Inspired by our success for 1-term recurrence relations, let's try a solution of
the form fn = t^n . Substituting this into the recurrence relation gives:
t^n = t^{n−1} + 2t^{n−2} .
Rearranging,
t^n − t^{n−1} − 2t^{n−2} = 0.
Factorizing,
t^{n−2} (t − 2)(t + 1) = 0.
This has solutions t = 0, t = −1, t = 2, so we know that the function
A·t^n
satisfies the recurrence relation for t = 0, −1 or 2. (You can check this directly.)
Since the recurrence relation is linear, any linear combination of these functions
also satisfies the recurrence relation, so
A(−1)^n + B·2^n
satisfies the recurrence relation, for any real numbers A and B. This gives a
family of solutions; it is called the ‘general solution’. We must now find the
correct values of A and B to satisfy the initial conditions f0 = 0, f1 = 1. We do
this by substituting these initial conditions into the 'general solution'
fn = A(−1)^n + B·2^n .
Substituting n = 0, we get
0 = A + B;
substituting n = 1 we get
1 = −A + 2B.
To solve this pair of simultaneous equations, we do the usual thing: eliminate A
by adding the two equations together, getting 3B = 1, so B = 1/3. Substituting
this back into the first equation gives A = −1/3. So
fn = (1/3)(2^n − (−1)^n )
satisfies the recurrence relation and the two initial conditions. Is it the only
solution to the problem? Yes, because the problem has exactly one solution: the
value of f0 , the value of f1 , and the recurrence relation, tell us exactly what the
other fn 's must be. So the solution to the problem is
fn = (1/3)(2^n − (−1)^n ).
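The closed form can be checked against the recurrence by machine; a Python sketch (an aside, not part of the notes), using the initial values f0 = 0, f1 = 1 from the worked solution:

```python
def f_closed(n):
    """Closed form f_n = (2^n - (-1)^n) / 3 from Example 19.
    2 = -1 (mod 3), so the numerator is always divisible by 3."""
    return (2**n - (-1)**n) // 3

# Check against the recurrence f_n = f_{n-1} + 2 f_{n-2} with f0 = 0, f1 = 1:
f = [0, 1]
for n in range(2, 20):
    f.append(f[n - 1] + 2 * f[n - 2])
assert all(f_closed(n) == f[n] for n in range(20))
print(f[:8])  # [0, 1, 1, 3, 5, 11, 21, 43]
```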
Exercise 7. Give an example of a real-world population (like Fibonacci’s rabbits)
which satisfies the conditions of Example 19.
Let’s now apply this method to find a formula for the Fibonacci numbers,
as a function of n. We substitute Fn = t^n into the recurrence relation Fn =
Fn−1 + Fn−2 . This gives
t^n = t^{n−1} + t^{n−2} .
Rearranging,
t^n − t^{n−1} − t^{n−2} = 0.
Taking out a factor of t^{n−2} ,
t^{n−2} (t^2 − t − 1) = 0.
Solving the equation t^2 − t − 1 = 0 for t, we get t = (1 ± √5)/2. So we try a
'general solution' of the form
Fn = A((1 + √5)/2)^n + B((1 − √5)/2)^n .
Substituting in n = 0 gives
1 = A + B.
Substituting in n = 1 gives
1 = A(1 + √5)/2 + B(1 − √5)/2.
This is just a pair of simultaneous equations we have to solve. To simplify the
calculation, let's write α = (1 + √5)/2 and β = (1 − √5)/2. The equations now
become
1 = A + B
1 = Aα + Bβ
⇒ α = Aα + Bα
⇒ α − 1 = B(α − β)
⇒ (√5 − 1)/2 = B√5
⇒ B = −β/√5
⇒ A = α/√5.
Hence,
Fn = Aα^n + Bβ^n = (1/√5)(α^{n+1} − β^{n+1}) = (1/√5)[ ((1 + √5)/2)^{n+1} − ((1 − √5)/2)^{n+1} ]
satisfies both the recurrence relation and the two initial conditions. As before,
there can only be one solution to the problem, and we have found it. So we have
solved Fibonacci’s problem!
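A quick numerical check of this Binet-style formula (an aside, not part of the notes), again with the notes' convention F0 = F1 = 1:

```python
from math import sqrt

def fib_binet(n):
    """F_n = (alpha^(n+1) - beta^(n+1)) / sqrt(5), with alpha, beta the roots of
    t^2 - t - 1 = 0 and the convention F0 = F1 = 1; rounding absorbs
    floating-point error, which stays tiny for small n."""
    alpha = (1 + sqrt(5)) / 2
    beta = (1 - sqrt(5)) / 2
    return round((alpha**(n + 1) - beta**(n + 1)) / sqrt(5))

fibs = [1, 1]
for _ in range(2, 13):
    fibs.append(fibs[-1] + fibs[-2])
assert [fib_binet(n) for n in range(13)] == fibs
print(fib_binet(12))  # 233
```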
Observe that α > 1 and |β| < 1, so the ratio between two consecutive Fibonacci numbers satisfies

Fn / Fn−1 = (α^{n+1} − β^{n+1}) / (α^n − β^n) → α = (1 + √5)/2 as n → ∞.
As you probably know, this limit is known as the Golden Ratio. The 16th-century
Franciscan friar Luca Pacioli thought that the golden ratio had a special
spiritual significance, and called it De Divina Proportione. Some have claimed
that it is the architectural ratio most pleasing to the human eye, but (sadly)
there is no statistical evidence for this. The claim that the Parthenon was built
using the Golden Ratio is also, sadly, wrong! (Look at some photographs!) But
it does occur in some beautiful natural spirals.
Now let’s look at a trickier example.
Example 20. Solve the recurrence relation fn = 4fn−1 − 4fn−2 with the initial
conditions f0 = 1, f1 = 4. (I.e., find fn as a function of n.)
In this case putting fn = t^n yields the equation t^{n−2} (t − 2)^2 = 0, with repeated
root t = 2, so certainly fn = A·2^n is a solution to the recurrence relation. But no
choice of A satisfies both initial conditions, so it looks like we're stuck. However,
the key thing to notice is that there is actually another solution: fn = n·2^n is
also a solution, for in this case
fn − 4fn−1 + 4fn−2 = n·2^n − 4(n − 1)2^{n−1} + 4(n − 2)2^{n−2}
= 2^n (n − 2(n − 1) + (n − 2)) = 0.
So we try the general solution
fn = (A + Bn)2^n .
Now substitute in the initial conditions, to get
f0 = 1 = A
f1 = 4 = 2A + 2B
⇒ A = 1, B = 1,
so the solution is fn = (n + 1)2^n .
In this course, we will mostly be concerned with linear recurrence relations,
which are of the form
fn = Σ_{j=1}^{n} A_{j,n} fn−j ,
where the coefficients A_{j,n} are real numbers. A k-term recurrence relation is one
that expresses fn in terms of fn−1 , fn−2 , . . . , and fn−k alone. A k-term linear
recurrence relation with constant coefficients is of the form
fn = Σ_{j=1}^{k} cj fn−j = c1 fn−1 + c2 fn−2 + · · · + ck fn−k .
(The coefficients are said to be constant because cj , the coefficient of fn−j , is
only allowed to depend on j.) These are the most important ones for this course,
and are the easiest to solve; indeed, we will now see a general method for solving
them.
A general method for solving k-term linear recurrence relations with constant coefficients
We now describe a general method for solving k-term linear recurrence relations
with constant coefficients.
Suppose we want to solve the recurrence relation
fn = c1 fn−1 + c2 fn−2 + · · · + ck fn−k
(for n ≥ k), where c1 , . . . , ck are constants, subject to initial values f0 = a0 ,
f1 = a1 , . . . , fk−1 = ak−1 . (Notice that in order for there to be a unique solution,
there will be k initial conditions.)
Step 1: Write down the characteristic equation: this is given by substituting
fn = t^n into the recurrence relation and cancelling the factor of t^{n−k}. So in our
case, it is:
t^k − c1 t^{k−1} − c2 t^{k−2} − · · · − ck−1 t − ck = 0.
Now find the roots of the characteristic equation, with their multiplicities. Suppose the roots are α1 with multiplicity m1 , and α2 with multiplicity m2 , and so
on, up to αr with multiplicity mr . Then m1 + m2 + · · · + mr = k.
Step 2: the solutions corresponding to each αi are
(Ai + Bi n + Ci n^2 + · · · + Zi n^{mi −1}) αi^n .
The number of arbitrary constants in this expression is mi . Putting fn equal to
the sum of all of these, for 1 ≤ i ≤ r, gives an expression with m1 +m2 +· · ·+mr =
k arbitrary constants.
Step 3: substitute in the values f0 = a0 , . . . , fk−1 = ak−1 to get k simultaneous linear equations in k unknowns A1 , . . . , Zr . Solve these to get the unique
solution for fn .
Example 21. Find a formula for fn defined by the recurrence relation fn =
3fn−2 + 2fn−3 (for n ≥ 3) and the initial conditions f0 = 2, f1 = 0, f2 = 7.
Answer: if fn = tn , then tn − 3tn−2 − 2tn−3 = 0. Cancelling the factor of tn−3 ,
we get the characteristic equation:
0 = t3 − 3t − 2 = 0.
Factorizing this gives
(t + 1)2 (t − 2) = 0
This has roots 2 (with multiplicity 1) and -1 (with multiplicity 2), so the general
solution is
$$f_n = A \, 2^n + B(-1)^n + Cn(-1)^n.$$
Substituting n = 0, 1, 2 gives the three simultaneous equations
2 = A+B
0 = 2A − B − C
7 = 4A + B + 2C
which you solve in the usual way to get A = 1, B = 1, C = 1. Hence, the solution
is
$$f_n = 2^n + (n + 1)(-1)^n.$$
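As a quick sanity check (a Python sketch added in editing, not part of the original notes; the function names are my own), we can compare the closed form with values generated directly from the recurrence:

```python
# Check the closed form f_n = 2^n + (n+1)(-1)^n against the
# recurrence f_n = 3*f_{n-2} + 2*f_{n-3}, with f_0=2, f_1=0, f_2=7.
def f_recurrence(n_max):
    f = [2, 0, 7]
    for n in range(3, n_max + 1):
        f.append(3 * f[n - 2] + 2 * f[n - 3])
    return f[: n_max + 1]

def f_closed(n):
    return 2 ** n + (n + 1) * (-1) ** n

values = f_recurrence(10)
assert all(values[n] == f_closed(n) for n in range(11))
print(values[:6])  # [2, 0, 7, 4, 21, 26]
```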
More complicated recurrence relations
We will now see a more complicated example of a recurrence relation.
The nth Bell number Bn is defined as the number of partitions of a set with n elements. The names of the elements do not matter, so we might as well suppose our set is X = {1, 2, . . . , n}.
If n = 0, then X = ∅, and there is a unique partition of X, namely ∅. Hence,
B0 = 1.
If n = 1, then X = {1}, and there is a unique partition of X, namely {{1}},
so B1 = 1.
If n = 2, then X = {1, 2}, and there are exactly two partitions of X, one
partition into two pieces and one partition into just one part, that is {{1}, {2}}
and {{1, 2}}. So B2 = 2.
When n = 3, we have one partition into a single part, {{1, 2, 3}}, and one
partition into three parts, {{1}, {2}, {3}}, and three partitions into two parts,
{{1}, {2, 3}}, {{2}, {1, 3}}, and {{3}, {1, 2}}. Therefore B3 = 5.
Theorem 8. The Bell numbers satisfy the following recurrence relation:
$$B_n = \sum_{k=1}^{n} \binom{n-1}{k-1} B_{n-k}.$$
Proof. Let $\mathcal{B}_n$ be the set of all partitions of {1, 2, . . . , n}, so that $B_n = |\mathcal{B}_n|$.
Now we divide up $\mathcal{B}_n$ according to the size of the part of the partition containing n. Let Tk be the set of those partitions π of {1, 2, . . . , n} such that the
part of π which contains n has size k. In symbols,
$$T_k = \{\pi \in \mathcal{B}_n : |S| = k, \text{ where } S \text{ is the part of } \pi \text{ which contains } n\}.$$
Now every partition π of {1, 2, . . . , n} has a unique part S ∈ π such that
n ∈ S, and this part must have some size, between 1 and n inclusive. So
$$B_n = |\mathcal{B}_n| = \sum_{k=1}^{n} |T_k|.$$
Next we need to work out the size of Tk , for each k. We can pick the partitions
in Tk by a two-stage process: first pick the part of the partition which contains
n; then pick the rest of the partition.
Stage 1: We need to pick the set S, of size k, such that n ∈ S. In other words, we need to pick k − 1 more elements of S, from the set {1, 2, . . . , n − 1}. There are $\binom{n-1}{k-1}$ ways of doing this.
Stage 2: We have already put k elements into one part of the partition, so now we
have to partition the remaining n − k elements. This can be done in Bn−k
ways.
Therefore, we have
$$|T_k| = \binom{n-1}{k-1} B_{n-k}.$$
Hence,
$$B_n = \sum_{k=1}^{n} |T_k| = \sum_{k=1}^{n} \binom{n-1}{k-1} B_{n-k},$$
proving the theorem.
This recurrence relation does not have constant coefficients, and is not k-term
for any fixed k, so we cannot use our earlier methods to solve it. However, it can
be used to compute small Bell numbers relatively quickly.
Example 22. Use the recurrence relation to compute B4 and B5 .
Answer:
$$B_4 = \binom{3}{0} B_3 + \binom{3}{1} B_2 + \binom{3}{2} B_1 + \binom{3}{3} B_0 = 5 + 3 \times 2 + 3 + 1 = 15,$$
$$B_5 = B_4 + 4 B_3 + 6 B_2 + 4 B_1 + B_0 = 15 + 20 + 12 + 4 + 1 = 52.$$
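The recurrence is easy to turn into a short program. This Python sketch (an editorial addition, not part of the notes) computes the Bell numbers exactly as in Theorem 8:

```python
from math import comb

# Bell numbers via the recurrence of Theorem 8:
# B_n = sum_{k=1}^{n} C(n-1, k-1) * B_{n-k}, with B_0 = 1.
def bell(n_max):
    B = [1]
    for n in range(1, n_max + 1):
        B.append(sum(comb(n - 1, k - 1) * B[n - k] for k in range(1, n + 1)))
    return B

print(bell(5))  # [1, 1, 2, 5, 15, 52]
```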
In fact, we can find an explicit formula for Bn , using a completely different
argument, from Probability!
Theorem 9. The Bell numbers are given by the formula
$$B_n = \frac{1}{e} \sum_{r=0}^{\infty} \frac{r^n}{r!}.$$
Proof. (Non-examinable.) Let $\mathcal{B}_n$ denote the set of all partitions of the set {1, 2, . . . , n}. First, I make the following claim.
Claim. For any real number t, we have
$$t^n = \sum_{\pi \in \mathcal{B}_n} (t)_{|\pi|}. \qquad (2.1)$$
Here, |π| denotes the number of parts of the partition π, and
$$(t)_r = t(t-1) \cdots (t-r+1)$$
denotes the rth falling factorial of t; this is defined for all real numbers t.
First, I prove (2.1) whenever t is a positive integer. In this case, the left-hand side is the total number of functions from {1, 2, . . . , n} to X, where X is
a t-element set. But I can also choose a function f from {1, 2, . . . , n} to X as
follows. First, I choose a partition π = {S1 , S2 , . . . , Sk } of {1, 2, . . . , n}. Then I
choose a sequence of k distinct elements of X, (t1 , . . . , tk ) say, and I put f (i) = tj
for all i ∈ Sj , for each j. In other words, I am choosing k distinct values for the
function f to take, and then I force it to take the jth value on every number
in the jth part, for each j. The number of ways of choosing the sequence of k
distinct elements of X is simply t(t − 1)(t − 2) . . . (t − k + 1) = (t)k . Hence, the
number of functions from {1, 2, . . . , n} to X is also equal to
$$\sum_{\pi \in \mathcal{B}_n} (t)_{|\pi|}.$$
This proves (2.1) whenever t ∈ N. Notice that
$$t^n - \sum_{\pi \in \mathcal{B}_n} (t)_{|\pi|}$$
is a polynomial of degree at most n in the variable t. We have shown that
every positive integer is a root of this polynomial. A polynomial which is not identically zero has only finitely many roots, so the above polynomial must be
the zero polynomial. It follows that
$$t^n = \sum_{\pi \in \mathcal{B}_n} (t)_{|\pi|}$$
for all real numbers t, proving the claim.
It follows that if T is a Poisson random variable with mean 1, then
$$T^n = \sum_{\pi \in \mathcal{B}_n} (T)_{|\pi|}.$$
Taking the expectation of both sides, it follows that
$$E[T^n] = \sum_{\pi \in \mathcal{B}_n} E[(T)_{|\pi|}].$$
By definition, we have
$$E[T^n] = \sum_{r=0}^{\infty} e^{-1} \frac{r^n}{r!}.$$
Moreover, for any non-negative integer k, we have
$$E[(T)_k] = \sum_{r=0}^{\infty} e^{-1} \frac{r(r-1) \cdots (r-k+1)}{r!} = \frac{1}{e} \sum_{r=k}^{\infty} \frac{1}{(r-k)!} = \frac{1}{e} \sum_{l=0}^{\infty} \frac{1}{l!} = \frac{e}{e} = 1.$$
Hence, we have
$$\sum_{r=0}^{\infty} e^{-1} \frac{r^n}{r!} = E[T^n] = \sum_{\pi \in \mathcal{B}_n} E[(T)_{|\pi|}] = \sum_{\pi \in \mathcal{B}_n} 1 = |\mathcal{B}_n| = B_n,$$
proving the theorem.
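The infinite series in Theorem 9 converges very rapidly, so it can be checked numerically. A Python sketch (an editorial addition; the truncation point of 50 terms is my own choice, more than enough for small n, and Python's convention 0**0 == 1 handles n = 0):

```python
from math import e, factorial

# Numerically check Theorem 9: B_n = (1/e) * sum_{r>=0} r^n / r!,
# truncating the rapidly converging series at r = 50 and rounding.
def bell_dobinski(n, terms=50):
    return round(sum(r ** n / factorial(r) for r in range(terms)) / e)

assert [bell_dobinski(n) for n in range(6)] == [1, 1, 2, 5, 15, 52]
```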
Example 23. For each positive integer n, let Cn denote the number of possible
triangulations of a convex (n + 2)-sided polygon, by non-intersecting diagonals.
(A diagonal is a straight line between two non-adjacent vertices.) For example,
we have C1 = 1 and C2 = 2, since there are 2 possible triangulations of a convex
quadrilateral. For convenience, we define C0 = 1. The Cn ’s are called the Catalan
numbers. Find a recurrence relation for Cn in terms of Cn−1 , Cn−2 , . . . , C0 .
Answer: Suppose n ≥ 2. Let the vertices of our (n + 2)-sided polygon be
v1 , v2 , . . . , vn+2 .
We can generate all the possible triangulations (generating each triangulation
exactly once) as follows. Choose a triangle for the side v1 vn+2 to be in; say it is
in the triangle v1 vi vn+2 , where i ∈ {2, 3, . . . , n + 1}. After including this triangle,
we must finish off the triangulation. How many ways of doing this are there? If
i = 2, then we just have to triangulate the (n + 1)-sided polygon v2 v3 . . . vn+2 v2 ;
there are Cn−1 ways of doing this. Similarly, if i = n + 1, then we just have to
triangulate the (n+1)-sided polygon v1 v2 . . . vn+1 v1 ; there are Cn−1 ways of doing
this. If 3 ≤ i ≤ n, then we must triangulate the i-sided polygon v1 v2 . . . vi v1 , and
then we must triangulate the (n + 3 − i)-sided polygon vi vi+1 . . . vn+2 vi ; in total,
there are $C_{i-2} C_{n+1-i}$ ways of doing this. So altogether, there are
Cn−1 + C1 Cn−2 + C2 Cn−3 + . . . + Cn−3 C2 + Cn−2 C1 + Cn−1
ways of triangulating the (n + 2)-sided polygon, so
Cn = Cn−1 + C1 Cn−2 + C2 Cn−3 + . . . + Cn−3 C2 + Cn−2 C1 + Cn−1
= C0 Cn−1 + C1 Cn−2 + C2 Cn−3 + . . . + Cn−3 C2 + Cn−2 C1 + Cn−1 C0
$$= \sum_{k=0}^{n-1} C_k C_{n-1-k},$$
for all n ≥ 2.
Note that this is not a linear recurrence relation; there is no easy way to solve
it! To find a formula for the Cn ’s, we consider another problem which produces
the same recurrence relation.
Example 24. Let Ln denote the number of (2n)-step paths in the xy-plane which
go from (0, 0) to (n, n) by moving either right by 1 (from (x, y) to (x+1, y)) or up
by 1 (from (x, y) to (x, y + 1)) at each step, and never rise above the line y = x.
For example, L1 = 1, and L2 = 2. We define L0 = 1 for convenience. Find a
recurrence relation for Ln , and then, using a different argument, find a general
formula for Ln in terms of n.
Answer: Suppose n ≥ 2. Such a path must start off by moving right, from
(0, 0) to (1, 0). Suppose it moves above the line y = x − 1 for the first time when
it goes from (i + 1, i) → (i + 1, i + 1), where i ∈ {0, 1, . . . , n − 1}. The total number
of paths from (1, 0) to (i + 1, i) which never move above the line y = x − 1 is
Li , and the total number of paths from (i + 1, i + 1) to (n, n) which never move
above the line y = x is Ln−1−i , so the total number of paths which move above
the line y = x − 1 for the first time when going from (i + 1, i) → (i + 1, i + 1), is
Li Ln−1−i . It follows that
$$L_n = L_0 L_{n-1} + L_1 L_{n-2} + L_2 L_{n-3} + \cdots + L_{n-3} L_2 + L_{n-2} L_1 + L_{n-1} L_0 = \sum_{k=0}^{n-1} L_k L_{n-1-k},$$
for all n ≥ 2. So the Ln ’s satisfy the same recurrence relation as the Cn ’s! Since
L0 = C0 = 1 and L1 = C1 = 1, it follows that Ln = Cn for all n.
Let’s now find a general formula for the Ln ’s, without using the recurrence
relation above. To do this, we let Pn denote the set of (2n)-step paths in the
xy-plane which go from (0, 0) to (n, n) by moving either right by 1 (from (x, y)
to (x + 1, y)) or up by 1 (from (x, y) to (x, y + 1)) at each step, and we let Qn
denote the subset of these paths which do rise above the line y = x. Clearly, we
have
$$L_n = |P_n| - |Q_n| \quad \forall n.$$
Notice that
$$|P_n| = \binom{2n}{n}.$$
This is because any path in Pn has 2n steps in total, and to choose a path in Pn , we just have to choose which n of those 2n steps are steps to the right. The number of ways of doing this is simply the number of n-element subsets of {1, 2, . . . , 2n}, which is $\binom{2n}{n}$.
Now let’s find a formula for |Qn |. We do this by finding a bijection from Qn
to a ‘simpler’ set Rn which we know how to count.
For any path q in Qn , let (i, i) → (i, i + 1) be the first step at which it moves
above the line y = x. Reflect the portion of the path after (i, i + 1) in the line
y = x + 1. The resulting path r goes from (0, 0) to (n − 1, n + 1) by taking n − 1
steps to the right and n + 1 steps upwards (in some order), and any path of this
form is obtained from exactly one path in Qn . Hence, the map q 7→ r defines a
bijection from Qn to the set Rn , where Rn is the set of (2n)-step paths that go
from (0, 0) to (n − 1, n + 1) by taking n − 1 steps to the right and n + 1 steps
upwards (in some order). Notice that
$$|R_n| = \binom{2n}{n-1},$$
since to choose a path in Rn , we must simply choose n − 1 steps out of 2n in which to move right. Therefore,
$$|Q_n| = |R_n| = \binom{2n}{n-1}.$$
Hence,
$$L_n = |P_n| - |Q_n| = \binom{2n}{n} - \binom{2n}{n-1} = \binom{2n}{n} - \frac{n}{n+1} \binom{2n}{n} = \frac{1}{n+1} \binom{2n}{n}.$$
So
$$C_n = \frac{1}{n+1} \binom{2n}{n}.$$
We have found a formula for the Catalan numbers!
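Both descriptions of the Catalan numbers are easy to check against each other by machine (an illustrative Python sketch added in editing, not part of the notes):

```python
from math import comb

# Check C_n = (1/(n+1)) * C(2n, n) against the recurrence
# C_n = sum_{k=0}^{n-1} C_k C_{n-1-k}, with C_0 = 1.
def catalan_recurrence(n_max):
    C = [1]
    for n in range(1, n_max + 1):
        C.append(sum(C[k] * C[n - 1 - k] for k in range(n)))
    return C

def catalan_closed(n):
    return comb(2 * n, n) // (n + 1)

C = catalan_recurrence(10)
assert all(C[n] == catalan_closed(n) for n in range(11))
print(C[:6])  # [1, 1, 2, 5, 14, 42]
```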
Bijective proofs
We proved that Ln = Cn by showing that the sequences satisfied the same recurrence relation and initial conditions. If we know (or suspect) that two sets have
the same size, an elegant way of proving this (without doing any calculations) is
to construct a bijection between the two sets. Such proofs are called bijective
proofs, and are greatly prized by combinatorialists!
In this section, we will show that the Catalan numbers count two other types
of objects, by constructing bijections.
Example 25. In a group, multiplication is always associative:
(a · b) · c = a · (b · c)
for all a, b and c. However, some common multiplication operations are non-associative. For example, the cross-product of vectors in R3 is non-associative:
we have
(i × i) × j = 0 × j = 0, i × (i × j) = i × k = −j.
Suppose ∗ is a non-associative multiplication operation. In order to make the
product
a∗b∗c
well-defined, we must place brackets to indicate the order in which we multiply.
There are two ways of bracketing a product of 3 elements:
(a ∗ b) ∗ c and a ∗ (b ∗ c).
Let Mn denote the number of ways of bracketing a product of n + 1 elements.
Show that Mn = Cn , the nth Catalan number, by constructing an appropriate
bijection.
Answer: let Tn denote the set of all triangulations of a fixed, convex (n + 2)-gon. Let Mn denote the set of all different ways of bracketing the product
a1 ∗ a2 ∗ a3 ∗ . . . ∗ an+1 .
Our aim is to define a function f : Tn → Mn , and to show that it is a bijection.
To do this, take a convex (n + 2)-gon, P say, and label its sides with the symbols
a1 , a2 , . . . , an+1 in an anticlockwise order starting from any side. Leave the last
side blank. Let T be a triangulation of P . Look at the triangles of T . If any
triangle has two of its sides already labelled and its third side unlabelled, label
the third side with the product
(label of the first side) ∗ (label of the second side),
where the order of (first side, second side, third side) is anticlockwise.
Repeat this process, until you have labelled every line which is a side of a
triangle in the triangulation. The last line to be labelled will be the ‘blank’ side
of P ; the label on this side is a bracketing of the product a1 ∗ a2 ∗ . . . ∗ an+1 .
We define f (T ) to be this bracketing. For example, replacing a1 , . . . , a5 with
a, b, c, d, e, we would obtain
There are two things we must check, to be completely rigorous. Firstly, we
must check that this process always actually produces a bracketing (i.e., that the
function f is well-defined). And secondly, we must check that the function f is
actually a bijection.
First of all, we have to be able to start the process, so we must make sure
that for any triangulation T , there is one triangle of T which has two sides that
are both labelled sides of P . There are n + 2 sides and n triangles, so there must
be at least two triangles (∆1 and ∆2 , say), each of which shares two sides with
P . There is only one unlabelled side of P , so one of these two triangles (∆1 say)
must share two labelled sides of P , ai and ai+1 say. So we can label the third side
of ∆1 as ai ∗ ai+1 . We can then replace the two sides ai and ai+1 with the new
side ai ∗ ai+1 , producing a triangulated (n + 1)-gon, and repeat the above process
on the triangulation of the (n + 1)-gon.
It is easy to see that if, at any stage, we have a choice of two or more triangles
whose remaining side we can label, it does not matter which we choose (we will
[Figure: a triangulated hexagon with five sides labelled a, b, c, d, e and one side left blank; the diagonals acquire the labels a∗b, (a∗b)∗c and d∗e, and the blank side receives the final bracketing ((a∗b)∗c)∗(d∗e).]
always get the same bracketing at the end). Moreover, the unlabelled side of P
is the last side to be labelled by this process.
To show that f is a bijection, we will show that it has an inverse, g. Again, take
a convex (n+2)-gon, P say, and label its sides with the symbols a1 , a2 , . . . , an+1 in
an anticlockwise order starting from any side. Leave the last side blank. Given a
bracketing of a1 ∗ a2 ∗ . . . ∗ an+1 , we produce a triangulation of P as follows. Choose
any pair (ai ∗ ai+1 ) which is bracketed together, and include the triangle with
sides ai and ai+1 in the triangulation. Let b be the other side of this triangle. It
remains for us to triangulate the convex (n + 1)-gon P 0 produced by replacing the
two sides ai and ai+1 with the side b. Replace (ai ∗ ai+1 ) with b in the bracketing,
and use this new bracketing to triangulate P 0 , by repeating the above process.
Example 26. Let Vn denote the set of all sequences of X’s and Y ’s such that
there are n X’s, n Y ’s, and for any k, the number of Y ’s in the first k terms of
the sequence never exceeds the number of X’s. Let Vn = |Vn | denote the number
of these sequences. For example, V2 = 2: we have
XXYY and XYXY.
Show that Vn = Mn , the number of bracketings of a product of length n + 1, by
constructing a bijection between the sets Vn and Mn .
Answer: our bijection f˜ is defined as follows. Given a bracketing of a1 ∗ a2 ∗
. . . ∗ an+1 , add one left bracket on the far left and one right bracket on the far
right. Directly below this new bracketing, write an X underneath every ∗ and a
Y underneath every right bracket. The sequence of X’s and Y ’s that you get is in
Vn , since there are n ∗’s, n right brackets, and there cannot be more right brackets
than ∗’s up to any point in the bracketing. (Each right bracket corresponds to a
unique ∗, the last operation which it encloses, and this ∗ must lie before it.) For
example, the bracketing
(a1 ∗ a2 ) ∗ ((a3 ∗ a4 ) ∗ a5 )
produces new bracketing
((a1 ∗ a2 ) ∗ ((a3 ∗ a4 ) ∗ a5 )),
from which we get the sequence
XYXXYXYY.
To show that f˜ is a bijection, we will show that it has an inverse, g̃. Define g̃
as follows. Given a sequence in Vn , write a ∗ below each X and a right bracket
below each Y . Now place an a1 before the first ∗, and an ai directly after the
ith ∗, for each i ∈ {1, . . . , n}. How do we choose where the left brackets go? At
some point in our sequence of ai ’s, ∗’s and right brackets, we must have a string
of the form
ai ∗ ai+1 )
(There is one more ai than right bracket, and the sequence ends with a right
bracket, so at some point there must be two ai ’s separated only by a ∗.) Place a
left bracket just before ai , to produce
. . . (ai ∗ ai+1 ) . . .
Now draw a box around (ai ∗ ai+1 ) , and regard it as a single ‘letter’, b say. We
now have a sequence of n letters,
a1 , . . . , ai−1 , b, ai+2 , ai+3 , . . . , an+1 ,
and n−1 right brackets. Repeat the above process on the new sequence, to choose
where the next left bracket goes. Continuing, we eventually obtain a bracketing
of a1 ∗ a2 ∗ . . . ∗ an+1 , with an extra left bracket on the far left and an extra
right bracket on the far right. Deleting the two extra brackets produces our final
bracketing. For example, from the sequence
XXYYXYXY
we get
∗∗))∗)∗),
then
a1 ∗ a2 ∗ a3 )) ∗ a4 ) ∗ a5 ),
a1 ∗ (a2 ∗ a3 )) ∗ a4 ) ∗ a5 ),
(a1 ∗ (a2 ∗ a3 )) ∗ a4 ) ∗ a5 ),
((a1 ∗ (a2 ∗ a3 )) ∗ a4 ) ∗ a5 ),
(((a1 ∗ (a2 ∗ a3 )) ∗ a4 ) ∗ a5 ),
((((a1 ∗ (a2 ∗ a3 )) ∗ a4 ) ∗ a5 ),
so the final bracketing is
(((a1 ∗ (a2 ∗ a3 )) ∗ a4 ) ∗ a5 .
Example 27. Let Ln denote the set of all paths in Example 24, so that Ln = |Ln |.
Show that Ln = Vn , by finding a bijection between Ln and Vn .
Answer: there is an obvious bijection, f¯ say: given a sequence of X’s and Y ’s,
X means ‘go right by 1’ and Y means ‘go up by 1’.
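As a brute-force sanity check (an editorial Python sketch, not part of the notes; feasible only for small n since it enumerates all 2^(2n) sequences), counting the sequences in Vn directly recovers the Catalan numbers:

```python
from itertools import product

# Count sequences of n X's and n Y's in which no prefix has more
# Y's than X's; this should equal the Catalan number C_n.
def count_V(n):
    count = 0
    for seq in product("XY", repeat=2 * n):
        if seq.count("X") != n:
            continue
        balance = 0
        for ch in seq:
            balance += 1 if ch == "X" else -1
            if balance < 0:  # a prefix with more Y's than X's
                break
        else:
            count += 1
    return count

assert [count_V(n) for n in range(5)] == [1, 1, 2, 5, 14]
```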
In the three examples above, we constructed bijections
$$T_n \xrightarrow{\,f\,} M_n \xrightarrow{\,\tilde{f}\,} V_n \xrightarrow{\,\bar{f}\,} L_n.$$
The composition $\bar{f} \circ \tilde{f} \circ f$ of these bijections is a bijection from Tn to Ln . This gives an alternative proof that $C_n = L_n = \frac{1}{n+1} \binom{2n}{n}$, without using recurrence relations!
This is a very beautiful argument, but it requires a certain amount of ingenuity
(or luck!) to get the formula
$$L_n = \frac{1}{n+1} \binom{2n}{n}.$$
In the next section, we will see how to use generating series to get a formula for the
nth term of a sequence like the Catalan numbers, relying only upon a recurrence
relation for the sequence, and a collection of simple tools not requiring any special
ingenuity.
2.3 Generating series
We will now see a new way of investigating a combinatorial sequence. As we said
before, a lot of combinatorics is about sequences of numbers,
(a0 , a1 , a2 , . . .).
We’ve seen such sequences as
1, 1, 2, 3, 5, 8, 13, 21, 34, . . .
(the Fibonacci numbers), and
1, 1, 2, 6, 24, 120, 720, . . .
(the factorials). A very useful device for investigating a combinatorial sequence is to take its terms to be the coefficients in a power series,
$$\sum_{n=0}^{\infty} a_n x^n = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots.$$
You’ve encountered things called ‘power series’ in calculus, and maybe in
analysis also. In combinatorics, they have a slightly different meaning. We are
not doing calculus, so we don’t necessarily have to worry about whether a power
series ‘converges’ or not. For us, a power series is just a way of combining infinitely
many numbers into a single mathematical object — in the words of Herbert S.
Wilf, ‘it is a clothesline, on which we hang up the numbers for display’.
For example, if our sequence is the factorials above, then the power series is
$$\sum_{n=0}^{\infty} n! \, x^n = 1 + x + 2x^2 + 6x^3 + 24x^4 + 120x^5 + 720x^6 + \cdots.$$
If you remember the ratio test from calculus, you should be able to show that, if
we view x as a real number, this series only converges when x = 0. The ratio of
successive terms is
$$\frac{(n+1)! \, x^{n+1}}{n! \, x^n} = (n+1) x,$$
which tends to infinity as n → ∞. But this power series is still useful!
In formal mathematical language, a power series
$$\sum_{n=0}^{\infty} a_n x^n$$
is just an element of a ring, where we define addition by
$$\sum_{n=0}^{\infty} a_n x^n + \sum_{n=0}^{\infty} b_n x^n = \sum_{n=0}^{\infty} (a_n + b_n) x^n,$$
and multiplication by
$$\left( \sum_{n=0}^{\infty} a_n x^n \right) \left( \sum_{n=0}^{\infty} b_n x^n \right) = \sum_{n=0}^{\infty} \left( \sum_{k=0}^{n} a_k b_{n-k} \right) x^n;$$
in other words, we add and multiply power series as if they were polynomials.
This ring is denoted by R[[x]], if the power series have coefficients in R. Two
power series are defined to be equal if and only if they have the same coefficient
of xn for every n. We are not regarding x as a variable which can take different
values (yet!), so what happens when you substitute particular values for x is
totally irrelevant, for now!
If a0 , a1 , a2 , . . . is a combinatorial sequence, meaning that an is the number of
objects of a certain kind, for each n, then the power series
$$\sum_{n=0}^{\infty} a_n x^n$$
is known as the generating series for (an ). (It is often called the generating
function for (an ), but this is misleading, as it is defined to be a power series,
rather than a function of x, so we will not use the term ‘generating function’ very
much at first.)
Multiplying the generating series for two combinatorial sequences has a useful
combinatorial interpretation. Suppose A and B are families of sets of different
sizes, where every set in A is disjoint from every set in B. For each n, let An be
the family of all sets in A with size n, and let Bn be the family of all sets in B
with size n. Define the combinatorial sequences
an = |An | = number of n-element sets in A,
bn = |Bn | = number of n-element sets in B.
Now let’s build a new family, C, consisting of all sets that are a union of a
set in A and a set in B. Let Cn be the family of all n-element sets in C, and let
cn = |Cn |, for each n. What is cn in terms of the ai ’s and the bi ’s? The answer is
$$c_n = \sum_{k=0}^{n} a_k b_{n-k},$$
since to choose an n-element set in C, we must first choose an integer k between 0 and n, then a k-element set from A (ak choices), and then an (n − k)-element set from B (bn−k choices). So altogether,
$$c_n = |C_n| = \sum_{k=0}^{n} a_k b_{n-k}.$$
This means that
$$\sum_{n=0}^{\infty} c_n x^n = \left( \sum_{n=0}^{\infty} a_n x^n \right) \left( \sum_{n=0}^{\infty} b_n x^n \right)$$
— the generating series for the sequence (cn ) is just the product of the generating
series for (an ) and the generating series for (bn ).
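The multiplication rule is just polynomial convolution, and is easy to carry out on truncated coefficient lists. A minimal Python sketch (an editorial addition, not part of the notes; it assumes non-empty coefficient lists):

```python
# Multiply two power series, given as (finite) coefficient lists,
# using the convolution rule c_n = sum_{k=0}^{n} a_k * b_{n-k}.
def multiply(a, b):
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

# (1 + 2x + x^2)(1 + x) = 1 + 3x + 3x^2 + x^3
assert multiply([1, 2, 1], [1, 1]) == [1, 3, 3, 1]
```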
Now we define some other useful operations on power series.
Definition (Reciprocal). Let $\sum_{n=0}^{\infty} a_n x^n$ be a power series with $a_0 \neq 0$. Its reciprocal is the power series $\sum_{n=0}^{\infty} b_n x^n$ satisfying
$$\left( \sum_{n=0}^{\infty} a_n x^n \right) \left( \sum_{n=0}^{\infty} b_n x^n \right) = 1.$$
Equating the coefficients on both sides, this is equivalent to
$$a_0 b_0 = 1 \quad \text{(equating coefficients of } x^0\text{)},$$
$$\sum_{i=1}^{n} a_i b_{n-i} + a_0 b_n = 0 \quad \forall n \geq 1 \quad \text{(equating coefficients of } x^n\text{)}.$$
Rearranging, this is equivalent to
$$b_0 = \frac{1}{a_0}, \qquad b_n = -\frac{1}{a_0} \sum_{i=1}^{n} a_i b_{n-i} \quad \forall n \geq 1,$$
which gives a recursive definition for the sequence (bn ).
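This recursive definition translates directly into code. An illustrative Python sketch (an editorial addition, not part of the notes), using exact rational arithmetic so no precision is lost:

```python
from fractions import Fraction

# First `terms` coefficients of the reciprocal of a power series
# with a_0 != 0, using b_0 = 1/a_0 and
# b_n = -(1/a_0) * sum_{i=1}^{n} a_i * b_{n-i}.
def reciprocal(a, terms):
    a = [Fraction(x) for x in a] + [Fraction(0)] * terms  # pad with zeros
    b = [1 / a[0]]
    for n in range(1, terms):
        b.append(-sum(a[i] * b[n - i] for i in range(1, n + 1)) / a[0])
    return b

# The reciprocal of 1 - x is 1 + x + x^2 + ... (the geometric series).
assert reciprocal([1, -1], 5) == [1, 1, 1, 1, 1]
```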
Definition (Substitution). If $A(x) = \sum_{n=0}^{\infty} a_n x^n$ is a power series with $a_0 = 0$, and $B(x) = \sum_{n=0}^{\infty} b_n x^n$ is any power series, then we define
$$B(A(x)) = \sum_{n=0}^{\infty} b_n (A(x))^n,$$
where $(A(x))^n$ is calculated using the multiplication rule.
Note that if a0 6= 0, then the above formula would not give a finite expression
for the coefficient of x0 in B(A(x)); instead, we would get
$$b_0 + b_1 a_0 + b_2 a_0^2 + \cdots.$$
If we have a0 = 0, however, then $(A(x))^n$ only contributes to the coefficient of $x^i$ in B(A(x)) if n ≤ i, so the above formula gives a finite expression for all the coefficients in B(A(x)).
Definition (Derivative). If $A(x) = \sum_{n=0}^{\infty} a_n x^n$ is a power series, its (formal) derivative is defined by
$$A'(x) = \sum_{n=1}^{\infty} n a_n x^{n-1}.$$
Definition (Integral). If $A(x) = \sum_{n=0}^{\infty} a_n x^n$ is a power series, its (formal) integral is defined by
$$\int A(x) = \sum_{n=0}^{\infty} \frac{a_n}{n+1} x^{n+1}.$$
We now come to what is perhaps the most important tool for manipulating
power series.
Theorem 10 (General binomial theorem). For any rational number a,
$$(1+x)^a = \sum_{n=0}^{\infty} \binom{a}{n} x^n.$$
Here, if a is a rational number, we define the binomial coefficient
$$\binom{a}{n} = \frac{a(a-1) \cdots (a-n+1)}{n!};$$
this agrees with our earlier definition when a is a positive integer.
The general binomial theorem can be viewed in two ways. Firstly, it can be
interpreted as a statement about power series: if we view 1 + x as a power series,
and we define (1 + x)a to be the power series above, then all the usual rules of
exponents hold, namely
$$(1+x)^a (1+x)^b = (1+x)^{a+b}, \qquad ((1+x)^a)^b = (1+x)^{ab}.$$
It can also be interpreted as a statement about functions of x. Namely, for any
real number x with −1 < x < 1, the right-hand side converges (by the ratio test),
and is equal to the left-hand side, provided we take the left-hand side to be the
ath power of (1 + x) which is real and positive.
An important special case of Theorem 10 is when a = −1; then we have
$$\binom{-1}{n} = \frac{(-1)(-2)(-3) \cdots (-n)}{n!} = (-1)^n,$$
so
$$(1+x)^{-1} = \sum_{n=0}^{\infty} (-1)^n x^n = 1 - x + x^2 - x^3 + x^4 - \cdots.$$
Substituting −x for x in the above, we get
$$(1-x)^{-1} = \sum_{n=0}^{\infty} (-1)^n (-x)^n = \sum_{n=0}^{\infty} (-1)^{2n} x^n = \sum_{n=0}^{\infty} x^n.$$
Interpreting the two sides as functions of x, this is the familiar formula for the
sum of a geometric progression.
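The generalized binomial coefficient can be computed term by term from the falling-factorial formula. An illustrative Python sketch (an editorial addition; the function name gen_binom is my own):

```python
from fractions import Fraction

# Generalized binomial coefficient (a choose n) = a(a-1)...(a-n+1)/n!
# for rational a, as in Theorem 10.
def gen_binom(a, n):
    result = Fraction(1)
    for i in range(n):
        result *= Fraction(a) - i  # multiply in the next falling factor
        result /= i + 1            # divide by the next factor of n!
    return result

# Coefficients of (1+x)^(-1) are (-1)^n, as derived above.
assert [gen_binom(-1, n) for n in range(5)] == [1, -1, 1, -1, 1]
# Coefficients of (1+x)^(1/2) begin 1, 1/2, -1/8, ...
assert gen_binom(Fraction(1, 2), 2) == Fraction(-1, 8)
```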
Generating series and recurrence relations
If
a0 , a1 , a2 . . .
is a combinatorial sequence, and we know a recurrence relation for an with initial
conditions, we can often use generating series to come up with a formula for an
as a function of n. Let’s see an example of how we do this with the Fibonacci
numbers.
Example 28. Recall that the Fibonacci numbers are defined by F0 = F1 = 1,
Fn = Fn−1 + Fn−2 for all n ≥ 2. Use the generating series for the sequence (Fn )
to derive a formula for Fn as a function of n.
Answer: Let
$$F(x) = \sum_{n=0}^{\infty} F_n x^n$$
be the generating series of (Fn ). Using the recurrence relation Fn = Fn−1 + Fn−2 ,
we obtain:
$$\begin{aligned}
F(x) &= \sum_{n=0}^{\infty} F_n x^n = F_0 + F_1 x + \sum_{n=2}^{\infty} F_n x^n \\
&= 1 + x + \sum_{n=2}^{\infty} (F_{n-1} + F_{n-2}) x^n \\
&= 1 + x + x \sum_{n=1}^{\infty} F_n x^n + x^2 \sum_{n=0}^{\infty} F_n x^n \\
&= 1 + x + x \left( \sum_{n=0}^{\infty} F_n x^n - 1 \right) + x^2 \sum_{n=0}^{\infty} F_n x^n \\
&= 1 + x F(x) + x^2 F(x).
\end{aligned}$$
Rearranging, we obtain:
$$(1 - x - x^2) F(x) = 1.$$
Taking reciprocals,
$$F(x) = \frac{1}{1 - x - x^2}.$$
Notice that we have found a very simple formula for F (x), without any sums; this is known as a ‘closed form’ expression for F (x). The formula on the right-hand side is really a power series (although it can also be viewed as a function of x). If we can find the coefficients of this power series, we will be done! To make this task easier, let’s now express the right-hand side using partial fractions; this will enable us to use the binomial theorem to find the coefficients of the power series. First, let us factorise
$$1 - x - x^2 = (1 - \alpha x)(1 - \beta x);$$
then $1/\alpha$ and $1/\beta$ are the two roots of the equation $1 - x - x^2 = 0$, so
$$\alpha = \frac{1 + \sqrt{5}}{2}, \qquad \beta = \frac{1 - \sqrt{5}}{2}.$$
Write
$$\frac{1}{1 - x - x^2} = \frac{1}{(1 - \alpha x)(1 - \beta x)} = \frac{A}{1 - \alpha x} + \frac{B}{1 - \beta x};$$
then we must have
$$A(1 - \beta x) + B(1 - \alpha x) = 1.$$
Equating coefficients of 1 and x on both sides, we get
$$A + B = 1, \qquad -\beta A - \alpha B = 0.$$
Solving this pair of simultaneous equations gives
$$A = \frac{\alpha}{\alpha - \beta} = \frac{\alpha}{\sqrt{5}}, \qquad B = -\frac{\beta}{\alpha - \beta} = -\frac{\beta}{\sqrt{5}}.$$
Hence,
$$F(x) = \frac{1}{1 - x - x^2} = \frac{1}{(1 - \alpha x)(1 - \beta x)} = \frac{\alpha/\sqrt{5}}{1 - \alpha x} - \frac{\beta/\sqrt{5}}{1 - \beta x}.$$
We can now expand the right-hand side as a power series, using the binomial
theorem: we get
$$F(x) = \frac{\alpha}{\sqrt{5}} \sum_{n=0}^{\infty} (\alpha x)^n - \frac{\beta}{\sqrt{5}} \sum_{n=0}^{\infty} (\beta x)^n = \sum_{n=0}^{\infty} \frac{\alpha^{n+1} - \beta^{n+1}}{\sqrt{5}} x^n.$$
Equating coefficients in F (x), we obtain
$$F_n = \frac{\alpha^{n+1} - \beta^{n+1}}{\sqrt{5}} \quad \forall n \geq 0,$$
the same formula as we obtained before.
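The formula can be checked numerically in a few lines (an editorial Python sketch, not part of the notes; for moderate n the floating-point value of the right-hand side rounds to the exact integer):

```python
# Check F_n = (alpha^{n+1} - beta^{n+1}) / sqrt(5) against the
# recurrence F_0 = F_1 = 1, F_n = F_{n-1} + F_{n-2}.
def fib_recurrence(n_max):
    F = [1, 1]
    for n in range(2, n_max + 1):
        F.append(F[n - 1] + F[n - 2])
    return F

def fib_binet(n):
    sqrt5 = 5 ** 0.5
    alpha, beta = (1 + sqrt5) / 2, (1 - sqrt5) / 2
    return round((alpha ** (n + 1) - beta ** (n + 1)) / sqrt5)

F = fib_recurrence(20)
assert all(F[n] == fib_binet(n) for n in range(21))
print(F[:8])  # [1, 1, 2, 3, 5, 8, 13, 21]
```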
Exponential generating series
There is another type of generating series which can be more useful for some combinatorial problems. This is called the exponential generating series of a sequence.
Let a0 , a1 , a2 , . . . be a sequence of real numbers. Its exponential generating series
is the power series
$$A_e(x) = \sum_{n=0}^{\infty} \frac{a_n}{n!} x^n.$$
It is called the exponential generating series because of the close relation with the exponential function
$$\exp(t) = \sum_{n=0}^{\infty} \frac{t^n}{n!}.$$
Note that the generating series
$$A(x) = \sum_{n=0}^{\infty} a_n x^n$$
we worked with before is often called the ordinary generating series for (an ), to
distinguish it from the exponential generating series. Like the ordinary generating
series, the exponential generating series is just defined to be a formal power series,
rather than a function of x (although in many applications later on, it will be
useful to regard it as a function of x).
Taking the derivative of an exponential generating series has a particularly
simple interpretation. Let a0 , a1 , a2 , . . . be a sequence of real numbers, and let
$$A_e(x) = \sum_{n=0}^{\infty} \frac{a_n}{n!} x^n$$
be its exponential generating series. Then
$$A_e'(x) = \sum_{n=1}^{\infty} \frac{a_n}{n!} \, n x^{n-1} = \sum_{n=1}^{\infty} \frac{a_n}{(n-1)!} x^{n-1} = \sum_{m=0}^{\infty} \frac{a_{m+1}}{m!} x^m,$$
which is just the exponential generating series for the sequence (bn ) defined by
bn = an+1 for all n ≥ 0, i.e., the sequence you get by shifting the original sequence
left by 1. Similarly, the rth derivative of Ae (x) is the exponential generating series
for the sequence you get by shifting (an ) to the left by r places.
Multiplying two exponential generating series also has a nice combinatorial
interpretation. Let a0 , a1 , a2 , . . . and b0 , b1 , b2 , . . . be sequences of real numbers,
and let
$$A_e(x) = \sum_{n=0}^{\infty} \frac{a_n}{n!} x^n, \qquad B_e(x) = \sum_{n=0}^{\infty} \frac{b_n}{n!} x^n$$
be their exponential generating series. Then their product is the exponential
generating series for the sequence c0 , c1 , c2 , . . ., where
$$c_n = \sum_{k=0}^{n} \binom{n}{k} a_k b_{n-k}.$$
Indeed, we have
$$\begin{aligned}
A_e(x) B_e(x) &= \left( \sum_{m=0}^{\infty} \frac{a_m}{m!} x^m \right) \left( \sum_{m=0}^{\infty} \frac{b_m}{m!} x^m \right) = \sum_{n=0}^{\infty} \sum_{k=0}^{n} \frac{a_k b_{n-k}}{k!(n-k)!} x^n \\
&= \sum_{n=0}^{\infty} \frac{1}{n!} \left( \sum_{k=0}^{n} \frac{n!}{k!(n-k)!} a_k b_{n-k} \right) x^n = \sum_{n=0}^{\infty} \frac{1}{n!} \left( \sum_{k=0}^{n} \binom{n}{k} a_k b_{n-k} \right) x^n \\
&= \sum_{n=0}^{\infty} \frac{c_n}{n!} x^n = C_e(x).
\end{aligned}$$
If (an ) and (bn ) are combinatorial sequences, what is the interpretation of the
sequence (cn )? Suppose A and B are families of structures. For each n ≥ 0, let
an be the number of structures in A which contain n points, and let bn be the
number of structures in B which contain n points. Suppose we now create a new
family of structures, C, as follows. For each pair of structures (A, B) with A ∈ A
and B ∈ B, we place A and B side-by-side (without overlap) to create a new
structure, and then we relabel the points. Let C be the set of all structures we
can produce in this way, and for each n ≥ 0, let cn be the number of structures
in C which contain n points. Often, cn will be given by
$$c_n = \sum_{k=0}^{n} \binom{n}{k} a_k b_{n-k}.$$
Let’s see an example where this happens. For each n ≥ 3, let an be the number
of permutations in Sn which have a single cycle of length n, and define a0 = a1 =
a2 = 0. For each n ≥ 1, define bn to be the number of permutations in Sn which
only have cycles of lengths 1 and 2. Then
$$\sum_{k=0}^{n} \binom{n}{k} a_k b_{n-k}$$
is the number of permutations in Sn which have exactly one cycle of length
greater than 2. (Choose a length k for the cycle of length greater than 2. Then choose which k numbers go in this cycle — $\binom{n}{k}$ choices. Then choose how these
k numbers are arranged in the cycle — ak choices. Finally, choose the cycles
formed by the other n − k numbers — bn−k choices.)
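This binomial-convolution count can be verified by brute force over Sn for a small n. An illustrative Python sketch (an editorial addition, not part of the notes; here a(k) = (k − 1)! counts the k-cycles for k ≥ 3, and b(m) counts involutions via the standard recurrence bm = bm−1 + (m − 1)bm−2, which is assumed rather than derived in the notes):

```python
from itertools import permutations
from math import comb, factorial

def a(k):
    # number of single k-cycles on k labelled points (k >= 3)
    return factorial(k - 1) if k >= 3 else 0

def b(m):
    # permutations of m points whose cycles all have length 1 or 2
    # (involutions): b_0 = b_1 = 1, b_m = b_{m-1} + (m-1) * b_{m-2}
    vals = [1, 1]
    for i in range(2, m + 1):
        vals.append(vals[i - 1] + (i - 1) * vals[i - 2])
    return vals[m]

def cycle_lengths(perm):
    # lengths of the cycles of a permutation given as a tuple of images
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start not in seen:
            length, j = 0, start
            while j not in seen:
                seen.add(j)
                j = perm[j]
                length += 1
            lengths.append(length)
    return lengths

n = 6
formula = sum(comb(n, k) * a(k) * b(n - k) for k in range(n + 1))
direct = sum(1 for p in permutations(range(n))
             if sum(L > 2 for L in cycle_lengths(p)) == 1)
assert formula == direct
```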
Like ordinary generating series, exponential generating series are useful for
solving recurrence relations. Whether they are better or worse than ordinary
generating series depends upon the form of the recurrence relation.
Recall that the Fibonacci numbers satisfy F0 = F1 = 1, Fn = Fn−1 + Fn−2
for all n ≥ 2. Since exponential generating series behave nicely when we shift a
sequence to the left, they give us another method of obtaining the general formula
for the Fibonacci numbers, which is easier in some ways (and harder in others)
than the method using ordinary generating series.
Example 29. Use exponential generating series to obtain again the general formula for the Fibonacci numbers.
Answer: recall that the Fibonacci numbers satisfy F0 = F1 = 1, Fn = Fn−1 +
Fn−2 for all n ≥ 2. Let
f(x) = \sum_{n=0}^{∞} \frac{F_n}{n!} x^n
denote the exponential generating series for (Fn ). We have
F_{n+2} = F_{n+1} + F_n  ∀n ≥ 0,
and taking the exponential generating series of both sides of this equation gives:
\sum_{n=0}^{∞} \frac{F_{n+2}}{n!} x^n = \sum_{n=0}^{∞} \frac{F_{n+1}}{n!} x^n + \sum_{n=0}^{∞} \frac{F_n}{n!} x^n.
Therefore,
f''(x) = f'(x) + f(x).    (2.2)
Let us now consider f (x) as a function of x, so f is a function from R to R.
Equation (2.2) is a second-order linear differential equation for f , and we know
how to solve these. Let us try to solve it, to obtain an explicit expression for f
as a function of x. As always with second-order linear differential equations, we
try a solution of the form
f(x) = e^{tx}.
Substituting this into (2.2) yields:
t^2 e^{tx} = t e^{tx} + e^{tx}.
Rearranging,
(t^2 − t − 1) e^{tx} = 0.
Cancelling the factor of e^{tx} gives
t^2 − t − 1 = 0.
2.3. GENERATING SERIES
61
This has roots t = α, β where α = (1 + √5)/2 and β = (1 − √5)/2. Therefore, the
general solution of (2.2) is
f(x) = A e^{αx} + B e^{βx}.    (2.3)
We now want to use our initial conditions to find A and B. We can do this easily
by finding the values of f (0) and f 0 (0). Note that if we evaluate the function
f (x) at zero, we get f (0) = F0 = 1. Note also that
f'(x) = \sum_{n=0}^{∞} \frac{F_{n+1}}{n!} x^n,
so f'(0) = F_1 = 1. Substituting these values into (2.3) yields the pair of simultaneous equations
A+B =1
αA + βB = 1.
Multiplying the first equation by β gives
βA + βB = β
αA + βB = 1.
Subtracting gives
(α − β)A = 1 − β = α  ⇒  A = α/√5.
Similarly, B = −β/√5.
Therefore, we have the following closed-form expression for the exponential
generating series:
f(x) = (α/√5) e^{αx} − (β/√5) e^{βx}.
Now let us expand the right-hand side as a power series; we obtain
f(x) = \frac{α}{√5} \sum_{n=0}^{∞} \frac{(αx)^n}{n!} − \frac{β}{√5} \sum_{n=0}^{∞} \frac{(βx)^n}{n!} = \sum_{n=0}^{∞} \frac{α^{n+1} − β^{n+1}}{√5} \frac{x^n}{n!},
so
\sum_{n=0}^{∞} \frac{F_n}{n!} x^n = f(x) = \sum_{n=0}^{∞} \frac{α^{n+1} − β^{n+1}}{√5} \frac{x^n}{n!}.
Equating coefficients of x^n on both sides gives
F_n = \frac{α^{n+1} − β^{n+1}}{√5}  ∀n ≥ 0,
giving yet another derivation of our general formula for the Fibonacci numbers.
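As a quick numerical check of the closed form (a sketch, not part of the notes), we can compare it against the recurrence:

```python
from math import sqrt

alpha = (1 + sqrt(5)) / 2
beta = (1 - sqrt(5)) / 2

def fib_closed(n):
    """F_n = (alpha^(n+1) - beta^(n+1)) / sqrt(5); round() absorbs floating-point error."""
    return round((alpha ** (n + 1) - beta ** (n + 1)) / sqrt(5))

fib = [1, 1]  # F_0 = F_1 = 1
for n in range(2, 25):
    fib.append(fib[n - 1] + fib[n - 2])

print(all(fib_closed(n) == fib[n] for n in range(25)))  # → True
```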
Remark 8. Note that in this derivation, we actually considered the exponential
generating series f (x) as a function of x. When we consider (ordinary or exponential) generating series as functions from R to R, in order for our proofs to
work, we need the series to converge in some neighbourhood of zero. With most
of the generating series we work with, this will be true, although you may recall
that the ordinary generating series for n!,
\sum_{n=0}^{∞} n! x^n,
converges only at x = 0, so it cannot be considered as a function of x. When you
are answering exam questions using generating series, you will not be required to
check convergence. (In the above example, the power series all converge for all
x ∈ R!)
Remark 9. Notice that this derivation using the exponential generating function
of the Fibonacci sequence is in some ways easier, and in some ways harder than the
one using the ordinary generating function. Harder, because we have to solve a
differential equation (although only one of the easiest kinds). Easier, because once
we had the closed-form expression for f (x), expanding it as a power-series was
very easy, using the power-series expansion of et . (With the ordinary generating
series, we had to use partial fractions and the general Binomial Theorem.)
Now let’s see an example where the ordinary generating series is much better
than the exponential one.
Example 30. Recall that the Catalan numbers (C_n) may be defined by C_0 = C_1 = 1,
C_n = \sum_{k=0}^{n−1} C_k C_{n−1−k} for all n ≥ 2. Use the generating series for the
sequence (C_n) to derive a formula for C_n as a function of n.
Observe that the recurrence relation for Cn looks very like a term in the square
of the (ordinary) generating series of Cn . Let
C(x) = \sum_{n=0}^{∞} C_n x^n
be the generating series for (C_n).
Inspired by this observation, notice that
C(x)^2 = \left( \sum_{n=0}^{∞} C_n x^n \right)^2
       = \sum_{n=0}^{∞} \left( \sum_{k=0}^{n} C_k C_{n−k} \right) x^n
       = \sum_{n=0}^{∞} C_{n+1} x^n
       = \frac{1}{x} \left( \sum_{n=0}^{∞} C_n x^n − 1 \right)
       = \frac{1}{x} (C(x) − 1).
Therefore,
x C(x)^2 − C(x) + 1 = 0.
Let us now think of C(x) as a function of x, so for any real number x, C(x) is
just a real number. The quadratic equation above has two solutions,
C(x) = \frac{1 ± \sqrt{1 − 4x}}{2x}.
Notice that the solution (1 + \sqrt{1 − 4x})/(2x) → ∞ as x → 0, whereas we have
C(0) = 1. Therefore, we take the solution
C(x) = \frac{1 − \sqrt{1 − 4x}}{2x}.
This is our ‘closed form’ expression for C(x). Now let us use the binomial theorem
to expand this as a power series. We have
(1 − 4x)^{1/2} = \sum_{n=0}^{∞} \binom{1/2}{n} (−4x)^n
 = 1 + \sum_{n=1}^{∞} \frac{(1/2)(−1/2) \cdots (3/2 − n)}{n!} (−4x)^n
 = 1 + \sum_{n=1}^{∞} \frac{(2n − 3)(2n − 5) \cdots (3)(1)}{n! \, 2^n} (−1)^{n−1} 4^n (−1)^n x^n
 = 1 − \sum_{n=1}^{∞} \frac{(2n − 2)!}{(n − 1)! \, 2^{n−1} \, n! \, 2^n} 4^n x^n
 = 1 − \sum_{n=1}^{∞} \frac{2}{n} \frac{(2n − 2)!}{((n − 1)!)^2} x^n
 = 1 − 2 \sum_{n=1}^{∞} \frac{1}{n} \binom{2n − 2}{n − 1} x^n.
It follows that
C(x) = \frac{1}{2x} \left( 1 − \left( 1 − 2 \sum_{n=1}^{∞} \frac{1}{n} \binom{2n − 2}{n − 1} x^n \right) \right)
     = \sum_{n=1}^{∞} \frac{1}{n} \binom{2n − 2}{n − 1} x^{n−1}
     = \sum_{n=0}^{∞} \frac{1}{n + 1} \binom{2n}{n} x^n.
Equating coefficients of x^n in C(x), we obtain
C_n = \frac{1}{n + 1} \binom{2n}{n}  ∀n ≥ 0.
We have obtained our previous formula for the Catalan numbers. Note that
the generating series approach requires none of the ingenuity of our previous
method. In fact, it is useful for tackling a wide range of problems.
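As a sanity check (not part of the notes), the closed form can be compared against the defining recurrence:

```python
from math import comb

def catalan_closed(n):
    """C_n = binom(2n, n) / (n + 1); the division is always exact."""
    return comb(2 * n, n) // (n + 1)

cat = [1, 1]  # C_0 = C_1 = 1
for n in range(2, 15):
    cat.append(sum(cat[k] * cat[n - 1 - k] for k in range(n)))

print(all(catalan_closed(n) == cat[n] for n in range(15)))  # → True
```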
Remark 10. Note that in the above derivation, we are considering C(x) as a
function of x, so in order to be completely rigorous, we would need to check that
C(x) converges in some open neighbourhood of 0 (in fact, it converges for all
|x| < 1/4.) When you are answering a question and are using a generating series
to derive a formula for a given recurrence relation, you do not need to check
convergence. If we want to be completely rigorous, we can always use generating
series to find the general formula, and then prove that it satisfies the recurrence
relation. This is a common strategy in mathematics: we often try to ‘guess’
the answer to a question, by non-rigorous means; only once we have guessed the
answer, do we try to prove it rigorously.
Now, let’s see an example where it is best to use the exponential generating
series.
Studying derangements via generating series
For each n ∈ N, let dn denote the number of derangements of {1, 2, . . . , n}. (Recall
that a derangement of {1, 2, . . . , n} is a permutation of {1, 2, . . . , n} which has
no fixed point.) We proved that
d_n = n! \sum_{j=0}^{n} \frac{(−1)^j}{j!},
using the inclusion-exclusion formula. We’ll now see two more proofs of this,
both using the exponential generating function for the sequence (dn ). For the
first proof, we need the following recurrence relation for dn .
Theorem 11. The derangement numbers (dn ) satisfy
dn = (n − 1)(dn−2 + dn−1 ) ∀n ≥ 3.
Proof. If f ∈ Sn is a derangement, then f (n) ∈ {1, 2, . . . , n − 1}. Our aim is to
count the number of derangements with f (n) = i, for each i ∈ {1, 2, . . . , n − 1},
in terms of dn−1 and dn−2 . If f ∈ Sn is a derangement with f (n) = i, then either
(a) f (i) = n, i.e. f swaps n and i, or else
(b) f(i) ≠ n.
Let A denote the set of all derangements in Sn with f (n) = i and f (i) = n, and
let B denote the set of all derangements in Sn with f(n) = i but f(i) ≠ n. Our
task is to find |A| and |B|.
Observe that if f ∈ A, then f has a disjoint cycle representation of the form
f = (i n)g,
where g consists of a collection of disjoint cycles (of length > 1) involving all the
numbers except for i and n. In other words, g is a derangement of {1, 2, . . . , i −
1, i + 1, i + 2, . . . , n − 1}. The number of choices for g is simply dn−2 , the number
of derangements of a set of size n − 2, and therefore |A| = dn−2 .
We now turn our attention to B. Let C be the set of all permutations in Sn
which fix n, but do not fix any other number; I will now construct a bijection
from B to C. Consider the map
Φ : Sn → Sn ;  f ↦ (i n)f.
Note that Φ is a bijection from Sn to itself. It is its own inverse, since
(i n)(i n)f = f
∀f ∈ Sn .
I claim that Φ(B) = C. To see this, first note that if f ∈ B, and g = Φ(f ) =
(i n)f , then
g(n) = (i n)f (n) = (i n)(i) = n.
Second, note that if f ∈ B, then Φ(f ) = (i n)f cannot fix any j ∈ {1, 2, . . . , n−1}.
Indeed, if f ∈ B and g = (i n)f fixes some j ∈ {1, 2, . . . , n − 1}, then either
j = i, in which case f(i) = (i n)g(i) = (i n)(i) = n, contradicting f ∈ B; or j ≠ i,
in which case f(j) = (i n)g(j) = (i n)(j) = j, so f fixes j, contradicting the fact
that f is a derangement. Therefore, we have
Φ(B) ⊂ C.
(2.4)
Finally, note that if g ∈ C, then Φ(g) ∈ B (by a similar argument), so
Φ(C) ⊂ B.
Applying Φ to both sides, we have
Φ(Φ(C)) ⊂ Φ(B).
Since Φ is its own inverse, we have Φ(Φ(C)) = C. It follows that
C ⊂ Φ(B).
(2.5)
Combining (2.4) and (2.5) proves that Φ(B) = C, as claimed.
Let φ : B → C be the restriction of Φ to B; then φ is a bijection from B to C.
It follows that |B| = |C|. Note that |C| = dn−1 , the number of derangements of
an (n − 1)-element set. We conclude that |B| = dn−1 .
We are finally ready to prove the recurrence relation. The number of derangements satisfies
dn = (n − 1)(dn−2 + dn−1 ),
since there are n − 1 choices for the image i of n, and for each choice, we have
|A| = dn−2 and |B| = dn−1 .
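The recurrence just proved can be checked against a brute-force count of derangements for small n (an illustrative sketch; the function names are ours):

```python
from itertools import permutations

def derangements_brute(n):
    """Count the permutations of {0, ..., n-1} with no fixed point."""
    return sum(1 for p in permutations(range(n))
               if all(p[i] != i for i in range(n)))

d = [1, 0]  # d_0 = 1 (by convention), d_1 = 0
for n in range(2, 9):
    d.append((n - 1) * (d[n - 2] + d[n - 1]))

print(all(d[n] == derangements_brute(n) for n in range(9)))  # → True
```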
Our aim is now to use the exponential generating series to turn the recurrence
relation above into a formula for dn in terms of n. First, it is helpful to define
d0 = 1; then the recurrence relation dn = (n − 1)(dn−2 + dn−1 ) holds for all n ≥ 2.
Let
D(x) = \sum_{n=0}^{∞} \frac{d_n}{n!} x^n
denote the exponential generating function of the sequence (d_n).
We first rewrite our recurrence relation in a form which is easier to use with
the exponential generating series:
dn+1 = n(dn−1 + dn ) ∀n ≥ 1.
We now multiply each side by xn /n! and sum over all n ≥ 1:
\sum_{n=1}^{∞} \frac{d_{n+1}}{n!} x^n = \sum_{n=1}^{∞} \frac{n d_{n−1}}{n!} x^n + \sum_{n=1}^{∞} \frac{n d_n}{n!} x^n.    (2.6)
Since d1 = 0, we may rewrite the left-hand side of (2.6) as
\sum_{n=0}^{∞} \frac{d_{n+1}}{n!} x^n = D'(x).
We rewrite the right-hand side of (2.6) as
x \sum_{n=1}^{∞} \frac{d_{n−1}}{(n − 1)!} x^{n−1} + x \sum_{n=1}^{∞} \frac{d_n}{(n − 1)!} x^{n−1} = x \sum_{n=0}^{∞} \frac{d_n}{n!} x^n + x \sum_{n=0}^{∞} \frac{d_{n+1}}{n!} x^n
= x D(x) + x D'(x).
Therefore, we have
D'(x) = x D'(x) + x D(x).    (2.7)
Let us now regard D(x) as a function of x. Equation (2.7) is now a differential
equation, which is separable, so we can solve it! Rearranging, we have
\frac{D'(x)}{D(x)} = \frac{x}{1 − x} = \frac{1}{1 − x} − 1.
Integrating both sides, we get
ln(D(x)) = − ln(1 − x) − x + C,
where C is the constant of integration. Exponentiating both sides, we get
D(x) = \frac{A e^{−x}}{1 − x},
where A = e^C. To find A, note that we need D(0) = d_0 = 1, giving A = 1.
Therefore,
D(x) = \frac{e^{−x}}{1 − x}
— we have found a closed form expression for D. Expanding the right-hand side
as a power series in powers of x gives:
D(x) = \left( \sum_{m=0}^{∞} \frac{(−1)^m}{m!} x^m \right) \left( \sum_{m=0}^{∞} x^m \right) = \sum_{n=0}^{∞} \left( \sum_{k=0}^{n} \frac{(−1)^k}{k!} \right) x^n.
Therefore, we have
\sum_{n=0}^{∞} \frac{d_n}{n!} x^n = D(x) = \sum_{n=0}^{∞} \left( \sum_{k=0}^{n} \frac{(−1)^k}{k!} \right) x^n.
Equating coefficients of x^n on both sides, we have
d_n = n! \sum_{k=0}^{n} \frac{(−1)^k}{k!}  ∀n ≥ 0.
We have re-proved our formula for the derangement numbers!
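The formula can also be checked directly in exact rational arithmetic (a sketch, not part of the notes), comparing it with the recurrence of Theorem 11:

```python
from fractions import Fraction
from math import factorial

def d_formula(n):
    """d_n = n! * sum_{k=0}^{n} (-1)^k / k!, computed exactly with rationals."""
    s = sum(Fraction((-1) ** k, factorial(k)) for k in range(n + 1))
    return int(factorial(n) * s)

d = [1, 0]  # derangement numbers via the recurrence of Theorem 11
for n in range(2, 12):
    d.append((n - 1) * (d[n - 2] + d[n - 1]))

print(all(d_formula(n) == d[n] for n in range(12)))  # → True
```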
We now give another proof, which is much shorter. Observe that the derangement
numbers also satisfy the equation
n! = \sum_{k=0}^{n} \binom{n}{k} d_{n−k}.
This is because both sides count the number of permutations in Sn : to choose a
permutation in Sn , we can first choose exactly how many fixed points it has —
say k ∈ {0, 1, . . . , n}. We can then choose which k points are fixed — \binom{n}{k} choices.
The other cycles of the permutation must form a derangement of the other n − k
numbers — there are d_{n−k} choices for this derangement. The total number of
possibilities is therefore
\sum_{k=0}^{n} \binom{n}{k} d_{n−k}.
Recall the formula for multiplying two exponential generating series:
\left( \sum_{n=0}^{∞} \frac{a_n}{n!} x^n \right) \left( \sum_{n=0}^{∞} \frac{b_n}{n!} x^n \right) = \sum_{n=0}^{∞} \frac{c_n}{n!} x^n,
where
c_n = \sum_{k=0}^{n} \binom{n}{k} a_k b_{n−k}.
If we choose a_n = 1 for all n and b_n = d_n for all n, then we get c_n = n! for all n.
Therefore,
\left( \sum_{n=0}^{∞} \frac{1}{n!} x^n \right) \left( \sum_{n=0}^{∞} \frac{d_n}{n!} x^n \right) = \sum_{n=0}^{∞} \frac{n!}{n!} x^n.
Thinking of both sides of the above equation as functions, we obtain
e^x D(x) = \frac{1}{1 − x},
so we recover the same closed form expression for D(x) as we had above.
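The counting identity behind this shorter proof is easy to test numerically (an illustrative sketch, not part of the notes):

```python
from math import comb, factorial

d = [1, 0]  # derangement numbers, d_0 = 1, d_1 = 0
for n in range(2, 12):
    d.append((n - 1) * (d[n - 2] + d[n - 1]))

# n! = sum_{k=0}^{n} binom(n, k) * d_{n-k}: both sides count S_n
ok = all(factorial(n) == sum(comb(n, k) * d[n - k] for k in range(n + 1))
         for n in range(12))
print(ok)  # → True
```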
Composing exponential generating functions
It turns out that composing two exponential generating series has a useful combinatorial interpretation. (Composing two ordinary generating series is not so
useful, in general.) This is given by the following theorem.
Theorem 12. Let
A(x) = \sum_{n=0}^{∞} \frac{a_n}{n!} x^n,
where a_0 = 0, and let
B(x) = \sum_{n=0}^{∞} \frac{b_n}{n!} x^n.
Then
B(A(x)) = \sum_{n=0}^{∞} \frac{c_n}{n!} x^n,
where
c_n = \sum_{π = {S_1, ..., S_k} ∈ B_n} b_k a_{|S_1|} a_{|S_2|} · ... · a_{|S_k|}.
(Recall that Bn denotes the set of all partitions of {1, 2, . . . , n}.)
This has many uses. One is counting permutations with a given number of
cycles of given lengths. Here is an example.
Example 31. Use exponential generating series to find an explicit formula (as
a function of n) for the number of permutations in Sn whose cycles all have odd
lengths.
Answer: Define
a_n = (n − 1)! if n is odd, and a_n = 0 if n is even,
and define b_n = 1 for all n ≥ 0. If c_n is given by the formula in Theorem 12, then
we have
c_n = \sum_{π = {S_1, ..., S_k} ∈ B_n} b_k a_{|S_1|} a_{|S_2|} · ... · a_{|S_k|}
    = \sum_{π = {S_1, ..., S_k} ∈ B_n : |S_i| is odd ∀i} (|S_1| − 1)! (|S_2| − 1)! · ... · (|S_k| − 1)!,
which is precisely the number of permutations whose cycles all have odd lengths!
(The partition π of {1, 2, . . . , n} tells us which numbers go in the same cycles
as one another; there are (|Si | − 1)! cycles we can form from the numbers in
the part Si .) Therefore, by Theorem 12, if A(x), B(x), C(x) are the exponential
generating series for (an ), (bn ) and (cn ) respectively, then we have
C(x) = B(A(x)).
Note that B(x) = exp(x), when viewed as a function of x. We have
A(x) = \sum_{n ≥ 1, n odd} \frac{(n − 1)!}{n!} x^n = \sum_{n ≥ 1, n odd} \frac{x^n}{n}.
Can we write A as a familiar function of x? Yes, we can! Notice the similarity
to the power-series expansion of log(1 + x):
ln(1 + x) = x − \frac{x^2}{2} + \frac{x^3}{3} − \frac{x^4}{4} + ...
Similarly,
ln(1 − x) = −x − \frac{x^2}{2} − \frac{x^3}{3} − \frac{x^4}{4} − ...
Therefore,
A(x) = \sum_{n ≥ 1, n odd} \frac{x^n}{n} = \frac{1}{2} (ln(1 + x) − ln(1 − x)) = ln \sqrt{\frac{1 + x}{1 − x}}.
Hence,
C(x) = exp \left( ln \sqrt{\frac{1 + x}{1 − x}} \right) = \sqrt{\frac{1 + x}{1 − x}}.
To find the number of permutations with all cycles of odd lengths, we now expand
C(x) as a power series in powers of x. To do this, it is easier to rewrite
C(x) = \frac{\sqrt{1 − x^2}}{1 − x},
and then to expand the numerator as a power series.
By the general binomial theorem, we have
\sqrt{1 − x^2} = \sum_{n=0}^{∞} \binom{1/2}{n} (−x^2)^n
 = 1 + \sum_{n=1}^{∞} \frac{(1/2)(−1/2) \cdots (3/2 − n)}{n!} (−1)^n x^{2n}
 = 1 + \sum_{n=1}^{∞} \frac{(2n − 3)(2n − 5) \cdots (3)(1)}{n! \, 2^n} (−1)^{2n−1} x^{2n}
 = 1 − \sum_{n=1}^{∞} \frac{(2n − 2)!}{(n − 1)! \, 2^{n−1} \, n! \, 2^n} x^{2n}
 = 1 − \sum_{n=1}^{∞} \frac{(2n − 2)!}{(n − 1)! \, n! \, 2^{2n−1}} x^{2n}
 = 1 − \sum_{n=1}^{∞} \frac{1}{2^{2n−1} n} \binom{2n − 2}{n − 1} x^{2n}.
Now observe that it is easy to multiply a power series by 1/(1 − x): for any
sequence (fn ) of real numbers, we have
\left( \sum_{n=0}^{∞} f_n x^n \right) \frac{1}{1 − x} = \left( \sum_{n=0}^{∞} f_n x^n \right) \left( \sum_{n=0}^{∞} x^n \right) = \sum_{n=0}^{∞} g_n x^n,
where
g_n = \sum_{i=0}^{n} f_i.
In other words, if we multiply a power series by 1/(1 − x), the coefficients of the
new power series are just the partial sums of the coefficients of the old power
series.
So in our case, we have
C(x) = \frac{\sqrt{1 − x^2}}{1 − x} = \left( 1 − \sum_{n=1}^{∞} \frac{1}{2^{2n−1} n} \binom{2n − 2}{n − 1} x^{2n} \right) \frac{1}{1 − x} = \sum_{n=0}^{∞} \frac{c_n}{n!} x^n,
where
\frac{c_n}{n!} = 1 − \sum_{1 ≤ i ≤ n/2} \frac{1}{2^{2i−1} i} \binom{2i − 2}{i − 1}.
Hence, c_n , the number of permutations in Sn with all cycles of odd lengths,
satisfies
c_n = n! \left( 1 − \sum_{1 ≤ i ≤ n/2} \frac{1}{2^{2i−1} i} \binom{2i − 2}{i − 1} \right) = n! \left( 1 − \sum_{1 ≤ i ≤ n/2} \frac{C_{i−1}}{2^{2i−1}} \right),
where C_m = \frac{1}{m + 1} \binom{2m}{m} denotes the mth Catalan number.
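The final formula can be checked against a brute-force count for small n (an illustrative sketch; the helper names are ours):

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial

def cycle_lengths(p):
    """Return the cycle lengths of a permutation p of {0, ..., n-1}."""
    seen, lengths = set(), []
    for i in range(len(p)):
        if i not in seen:
            j, length = i, 0
            while j not in seen:
                seen.add(j)
                j = p[j]
                length += 1
            lengths.append(length)
    return lengths

def odd_formula(n):
    """c_n = n! * (1 - sum_{1 <= i <= n/2} C_{i-1} / 2^(2i-1))."""
    catalan = lambda m: comb(2 * m, m) // (m + 1)
    s = sum(Fraction(catalan(i - 1), 2 ** (2 * i - 1))
            for i in range(1, n // 2 + 1))
    return int(factorial(n) * (1 - s))

def odd_brute(n):
    return sum(1 for p in permutations(range(n))
               if all(l % 2 == 1 for l in cycle_lengths(p)))

print(all(odd_formula(n) == odd_brute(n) for n in range(1, 8)))  # → True
```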
We’ll now use exponential generating series to prove a rather surprising theorem.
Theorem 13. If n ≥ 2 is an even integer, then the number of permutations in
Sn with all cycles of even lengths is equal to the number of permutations in Sn
with all cycles of odd lengths.
Proof. Define
g_n = n! if n is even, and g_n = 0 if n is odd,
and let
G(x) = \sum_{n=0}^{∞} \frac{g_n}{n!} x^n = \sum_{n even, ≥ 0} x^n = 1 + x^2 + x^4 + ...
be the exponential generating series for (gn ).
Let e_n be the number of permutations in Sn with all cycles even, for each
n ≥ 0, and let
o_n = (the number of permutations in Sn with all cycles odd) if n is even, and o_n = 0 if n is odd.
(For convenience, we define e0 = o0 = 1.) Let
E(x) = \sum_{n=0}^{∞} \frac{e_n}{n!} x^n ,   O(x) = \sum_{n=0}^{∞} \frac{o_n}{n!} x^n
be their exponential generating series. Our aim is to show that in fact, E(x) =
O(x). To do this, we’ll first show that
E(x)O(x) = G(x).
Indeed, we have
\left( \sum_{n=0}^{∞} \frac{e_n}{n!} x^n \right) \left( \sum_{n=0}^{∞} \frac{o_n}{n!} x^n \right) = \sum_{n=0}^{∞} \frac{f_n}{n!} x^n ,
where
f_n = \sum_{k=0}^{n} \binom{n}{k} e_k o_{n−k}.
But if n is even, then f_n is just the number of permutations in Sn : to choose
a permutation in Sn , we can first choose how many numbers are in even cycles
(say k), then choose exactly which numbers are in even cycles (\binom{n}{k} choices), then
choose the even cycles (e_k choices), and then choose the odd cycles (o_{n−k} choices).
If n is odd, then fn = 0, since ek = 0 if k is odd, and on−k = 0 if n − k is odd,
and either k or n − k must be odd. So fn = gn , for all n ≥ 0. This proves that
E(x)O(x) = G(x).
Viewing G(x) as a function of x, we have
G(x) = 1 + x^2 + x^4 + x^6 + ... = \frac{1}{1 − x^2}.
Now we'll express E(x) as a function of x. To do this, we do a very similar
thing to what we did in Example 31. Define the sequence
a_n = (n − 1)! if n is even and n ≥ 2, and a_n = 0 otherwise,
so a_n just counts even cycles; define b_n = 1 for all n ≥ 0. Let
A(x) = \sum_{n=0}^{∞} \frac{a_n}{n!} x^n ,   B(x) = \sum_{n=0}^{∞} \frac{b_n}{n!} x^n
be their exponential generating series. Using Theorem 12, we have
B(A(x)) = \sum_{n=0}^{∞} \frac{c_n}{n!} x^n ,
where
c_n = \sum_{π = {S_1, ..., S_k} ∈ B_n} b_k a_{|S_1|} a_{|S_2|} · ... · a_{|S_k|}
    = \sum_{π = {S_1, ..., S_k} ∈ B_n : |S_i| is even ∀i} (|S_1| − 1)! (|S_2| − 1)! · ... · (|S_k| − 1)!
    = e_n ,
by the same argument as in Example 31. So
E(x) = B(A(x)).
As before, B(x) = exp(x), when viewed as a function of x. Now observe that
A(x) = \sum_{n even, ≥ 2} \frac{x^n}{n} = \frac{1}{2} \left( \sum_{n=1}^{∞} (−1)^n \frac{x^n}{n} + \sum_{n=1}^{∞} \frac{x^n}{n} \right)
     = \frac{1}{2} (− ln(1 + x) − ln(1 − x))
     = ln \frac{1}{\sqrt{1 − x^2}}.
Therefore,
E(x) = exp \left( ln \frac{1}{\sqrt{1 − x^2}} \right) = \frac{1}{\sqrt{1 − x^2}}.
Since E(x)O(x) = \frac{1}{1 − x^2}, we must have
O(x) = \frac{1}{\sqrt{1 − x^2}} = E(x).
If two power series are the same as functions of x, then they must have the same
coefficients — if
C(x) = \sum_{n=0}^{∞} \frac{c_n}{n!} x^n ,
then we can calculate the nth coefficient by differentiating the function n times
and setting x = 0:
c_n = \left. \frac{d^n}{dx^n} C(x) \right|_{x=0}.
It follows that en = on for all n, proving the theorem.
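The theorem can be confirmed by brute force for small even n (a sketch, not part of the notes):

```python
from itertools import permutations

def cycle_lengths(p):
    """Return the cycle lengths of a permutation p of {0, ..., n-1}."""
    seen, lengths = set(), []
    for i in range(len(p)):
        if i not in seen:
            j, length = i, 0
            while j not in seen:
                seen.add(j)
                j = p[j]
                length += 1
            lengths.append(length)
    return lengths

def count_with_parity(n, parity):
    """Permutations of S_n all of whose cycle lengths have the given parity (0 even, 1 odd)."""
    return sum(1 for p in permutations(range(n))
               if all(l % 2 == parity for l in cycle_lengths(p)))

print([(count_with_parity(n, 0), count_with_parity(n, 1)) for n in (2, 4, 6)])
# → [(1, 1), (9, 9), (225, 225)]
```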
Remark 11. To be completely rigorous, we must note that all of the above power
series converge for −1 < x < 1, so the above proof is indeed a valid one!
Remark 12. There is, in fact, a bijective proof of Theorem 13, but it is not a
particularly nice one!
Chapter 3
Graph Theory
3.1 Introduction
A graph is one of the most basic and important objects in Combinatorics. Informally, a graph is a set of points (called vertices), together with a set of lines
(called edges) where each edge joins a pair of vertices together, and each pair of
vertices has at most one edge between them. Hence, the London Underground
can be thought of as a graph: the vertices are the stations, and two stations
have an edge between them if they are adjacent stops on one of the Underground
Lines. Here is part of it:
[Figure: part of the Underground map, showing Baker Street, Regent's Park, Bond Street, Oxford Circus, Piccadilly Circus and Green Park.]
The formal mathematical definition of a graph is as follows.
Definition. A graph G is a pair of sets (V, E) where E is a set of pairs of
elements of V . The set V is called the set of vertices of G (or the vertex-set of
G, for short), and the set E is called the set of edges of G (or the edge-set of G,
for short).
For example, the picture above corresponds to the graph with vertex-set
{Baker Street, Bond Street, Green Park, Oxford Circus, Regent’s Park, Piccadilly Circus},
and edge-set
{{Baker Street, Bond Street}, {Bond Street, Green Park},
{Green Park, Piccadilly Circus}, {Piccadilly Circus, Oxford Circus},
{Oxford Circus, Regent’s Park}, {Regent’s Park, Baker Street},
{Green Park, Oxford Circus}, {Oxford Circus, Bond Street}}.
Often, we will work with graphs whose vertex-sets are {1, 2, . . . , n} for some
natural number n. Here is a picture of the graph with vertex-set {1, 2, 3, 4} and
edge-set {{1, 2}, {2, 3}, {1, 3}, {2, 4}}:
[Figure: a drawing of this graph.]
If G is a graph, we will often denote its vertex-set by V (G) and its edge-set
by E(G).
In this course, we will only be concerned with finite graphs: graphs where the
vertex-set (and therefore the edge-set) is finite. (Infinite graphs are interesting
mathematical objects too, but we will not study them in this course.) From now
on, ‘graph’ will always mean ‘finite graph.’
If G is a finite graph, we will often write v(G) for the number of vertices of
G, and e(G) for the number of edges of G.
Here are some of the simplest examples of graphs.
The path Pn :
[Figure: the path with vertices 1, 2, ..., n drawn in a row, consecutive vertices joined by edges.]
The cycle Cn :
[Figure: the cycle with vertices 1, 2, ..., n arranged in a ring, consecutive vertices joined by edges, and an edge between n and 1.]
The empty graph, En , consisting of n vertices and no edges:
[Figure: n isolated vertices labelled 1, 2, ..., n.]
The complete graph, Kn , consisting of n vertices which are all joined to one
another by edges:
[Figure: K5, with every pair of the vertices 1, ..., 5 joined by an edge.]
(Note that e(Kn ) = \binom{n}{2}.)
Notice that in the above definition of a graph, the vertices (points) are labelled,
and the labels matter: the graph
[Figure: a graph G1 on the vertex-set {1, 2, 3}]
is different to the graph
[Figure: a graph G2 on the same vertex-set, with the labels 2 and 3 exchanged]
Some authors use the term ‘labelled graph’ for what we defined as a ‘graph’.
An unlabelled graph is different: informally, it is a ‘graph’ where we do not label
the vertices. Both of the graphs G1 and G2 above define the same unlabelled
graph:
Informally, an unlabelled graph is produced by ‘forgetting about the labels
on the vertices of a labelled graph’. Formally, we can define unlabelled graphs as
follows.
If G and H are labelled graphs, we say that they are isomorphic if there
exists a bijection f : V (G) → V (H) such that {u, v} ∈ E(G) if and only if
{f (u), f (v)} ∈ E(H). The bijection f is then called an isomorphism from G to
H. For example, the bijection
1 ↦ 1, 2 ↦ 3, 3 ↦ 2
is an isomorphism from G1 to G2 . (Informally, an isomorphism from G to H can
be thought of as a way of relabelling the vertices of G with the names of vertices
of H, in such a way as to turn G into H.)
We define a relation ∼ on the set of labelled graphs as follows. We say that
G ∼ H if G is isomorphic to H. Note that ∼ is an equivalence relation — it
satisfies the three axioms for an equivalence relation:
Reflexive G ∼ G for all graphs G. (The identity map on V (G) is an isomorphism from G to itself.)
Symmetric If G ∼ H, then H ∼ G. (If f is an isomorphism from G to H, then
f −1 is an isomorphism from H to G.)
Transitive If G ∼ H and H ∼ K, then G ∼ K. (If f is an isomorphism from G
to H, and f 0 is an isomorphism from H to K, then f 0 ◦ f is an isomorphism
from G to K.)
Recall that if X is a set, and ∼ is an equivalence relation on X, then we can
partition X into ∼-equivalence classes. (The equivalence classes are defined by
x ∼ x0 if and only if x and x0 are in the same equivalence class.) An unlabelled
graph is defined to be a ∼-equivalence class of labelled graphs.
This constructs unlabelled graphs (as formal mathematical objects), starting
with labelled graphs, which in turn were defined in terms of sets. It is nice to
know that we can do this, but when you are solving problems, you should just
think of an unlabelled graph as a graph whose vertices do not have labels.
It is useful to have a few more definitions. If G and H are labelled graphs, we
say that H is a subgraph of G if V (H) ⊂ V (G) and E(H) ⊂ E(G). For example,
if
[Figure: a graph G on the vertex-set {1, 2, 3, 4}, and a graph H with V(H) ⊂ V(G) and E(H) ⊂ E(G),]
then H is a subgraph of G. We will sometimes write H ⊂ G to mean that H
is a subgraph of G.
If H is a subgraph of G, we say that H is an induced subgraph of G if whenever
u, v ∈ V (H) with {u, v} ∈ E(G), we have {u, v} ∈ E(H) as well. The graph H
(above) is not an induced subgraph of G (above) because it does not contain the
edge {1, 4}, but the graph
[Figure: the subgraph of G with vertex-set {2, 3, 4}]
is an induced subgraph of G.
If H is a subgraph of G, we say that H is a spanning subgraph of G if V (H) =
V (G).
Now we come to an important property which graphs can have. We say that
a graph G is connected if for any two distinct vertices u, v ∈ V (G), there is a walk
along edges of the graph which begins at u and ends at v. In other words, if there
is a sequence of vertices w1 , w2 , . . . , wl where w1 = u, wl = v, and {wi , wi+1 } is
an edge of G for all i ∈ {1, 2, . . . , l − 1}.
It is easy to see that if G is a graph, and u, v ∈ V (G), then there is a walk
(in G) from u to v if and only if there is a path (in G) from u to v. (A walk can
repeat vertices, but a path cannot.) Exercise: write down a formal proof of this!
So equivalently, a graph is connected if and only if for any two distinct vertices
u, v ∈ V (G), there is a path in G from u to v.
If a graph is not connected, it is said to be disconnected.
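The definition of connectedness suggests an algorithm: grow the set of vertices reachable by a walk from one fixed vertex, using breadth-first search; the graph is connected if and only if every vertex is reached. A minimal sketch (not part of the notes; the function name is ours):

```python
from collections import deque

def is_connected(vertices, edges):
    """A graph is connected iff breadth-first search from one vertex reaches all of them."""
    vertices = set(vertices)
    if len(vertices) <= 1:
        return True
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(vertices))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u] - seen:
            seen.add(w)
            queue.append(w)
    return seen == vertices

# The four-vertex graph from earlier in this section:
print(is_connected({1, 2, 3, 4}, [{1, 2}, {2, 3}, {1, 3}, {2, 4}]))  # → True
```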
If G is a graph, then we can partition it into maximal connected subgraphs.
The maximal connected subgraphs of a graph are called the components of a
graph.
For example, the graph G below has components G1 , G2 and G3 :
[Figure: a graph G on the vertex-set {1, 2, ..., 10}, drawn as three pieces labelled G1, G2 and G3.]
Here, saying that H is a ‘maximal’ connected subgraph of G means that if
you try to extend H to a larger subgraph of G, you get a disconnected subgraph.
In general, the term ‘maximal’ is different to the term ‘maximum’. A ‘maximum’
object means ‘one of the largest objects of its kind’. A ‘maximal’ object means
‘an object which cannot be made any larger without destroying the key property.’
In the above example, the maximum connected subgraphs of G are G1 and G2 ,
but the maximal connected subgraphs of G are G1 , G2 and G3 .
Notice that a component of a graph G is always an induced subgraph of G.
By definition, two distinct components of a graph cannot share any vertices.
3.2 Trees
A tree is a simple but important type of graph.
Definition. A tree is a connected graph which contains no cycles.
If a graph contains no cycles, we say that it is acyclic. So a tree is precisely
a connected, acyclic graph. Notice that if G is an acyclic graph, then each of its
components is a connected, acyclic graph (i.e., a tree). So an acyclic graph is
often called a forest.
The following theorem tells us two useful conditions which are equivalent to
the condition of being a tree.
Lemma 14. Let G be a graph. The following three conditions are equivalent.
(a) G is a tree.
(b) G is a minimal connected graph.
(c) G is a maximal acyclic graph.
Remark 13. Here, saying that a graph G is ‘minimal connected’ means that
it is connected, but removing any edge from G produces a disconnected graph.
Similarly, saying that a graph G is ‘maximal acyclic’ means that it is acyclic, but
adding any edge to G (between two vertices of G with no edge of G between them)
produces a graph with a cycle.
Proof of Lemma 14. (a) ⇒ (b): Suppose G is a tree. Then it is connected; we
must show that it is a minimal connected graph. Let e = {u, v} be an edge of
G. Remove e to produce a new graph G′. I claim that G′ is disconnected. To see
this, suppose for a contradiction that G′ is connected; then there is a path in G′
from u to v. Together with the edge e, this forms a cycle in G, contradicting the
fact that G is acyclic. Hence, G is a minimal connected graph.
(a) ⇒ (c): Suppose G is a tree. Then it is acyclic; we must show that it is
a maximal acyclic graph. Let u, v ∈ V (G) such that {u, v} ∉ E(G). Produce
a new graph G′ by adding in the edge {u, v}. I claim that G′ contains a cycle.
To see this, observe that since G is connected, there is a path from u to v in
G. Together with the new edge {u, v}, this forms a cycle in G′. Hence, G is a
maximal acyclic graph.
(b) ⇒ (a): Suppose G is a minimal connected graph. We must show that it is
acyclic. Suppose for a contradiction that G has a cycle, C say. Let e be any edge
of the cycle C. Produce a new graph G′ by removing e. I claim that the graph G′
is connected (this will contradict the fact that G is a minimal connected graph).
To see this, let u, v ∈ V (G′). Since G is connected, there must be a path in G
from u to v. If this path does not use the edge e, then it is also a path from u to
v in G′. If the path does use the edge e, then we can produce a walk from u to
v in G′ by replacing the edge e by the other edges of the cycle. Therefore, G′ is
connected, proving the claim.
(c) ⇒ (a): Suppose G is a maximal acyclic graph. We must show that G is
connected. Let u, v ∈ V (G); we will show that there exists a path in G from u
to v. If {u, v} ∈ E(G), then we are done, so we may assume that {u, v} ∉ E(G).
Let G′ be the graph produced from G by adding in the edge {u, v}. Then G′
contains a cycle. This cycle must contain {u, v}, otherwise G itself would have
had a cycle. The other edges of the cycle form a path from u to v. Therefore, G
is connected.
We can use Lemma 14 to prove the following useful fact.
Theorem 15. If G is connected, then G has a spanning tree. (A spanning tree
of G is a subgraph of G which contains all the vertices of G, and is a tree.)
Proof. Let G′ be a minimal connected spanning subgraph of G. (This means
that G′ is a spanning subgraph of G, which is connected, but removing any edge
from G′ produces a disconnected graph. Observe that G is a connected spanning
subgraph of itself, so there does exist at least one connected spanning subgraph
of G, and so there must be a minimal one.) By Lemma 14, G′ is a tree, so it is a
spanning tree, proving the theorem.
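The proof above is non-constructive (it picks a minimal connected spanning subgraph), but a spanning tree can also be built directly: run a breadth-first search from any vertex and keep only the edge that first reaches each new vertex. A sketch of that alternative approach (not the proof's method; names are ours):

```python
from collections import deque

def bfs_spanning_tree(vertices, edges):
    """Return the edges of a breadth-first-search spanning tree of a connected graph."""
    vertices = set(vertices)
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(vertices))
    seen, queue, tree = {start}, deque([start]), []
    while queue:
        u = queue.popleft()
        for w in adj[u] - seen:
            seen.add(w)
            queue.append(w)
            tree.append({u, w})  # the edge that first reaches w cannot close a cycle
    if seen != vertices:
        raise ValueError("graph is not connected")
    return tree

tree = bfs_spanning_tree({1, 2, 3, 4}, [{1, 2}, {2, 3}, {1, 3}, {2, 4}])
print(len(tree))  # → 3
```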
Choosing a minimal (or maximal) object with a certain property is a very
important method of proof in combinatorics. Our next theorem is also proved
using this kind of technique. First, we need some more definitions.
Definition. Let G be a graph, and let v be a vertex of G. The degree of v is the
number of edges of G which meet v; it is denoted by d(v). The neighbours of v
are the vertices which are joined to v by edges of G; the neighbourhood of v is
the set of neighbours of v.
Definition. Let G be a graph. If e ∈ E(G), then G − e is the graph produced
from G by removing the edge e. If v ∈ V (G), then G − v is the graph produced
from G by removing the vertex v, and all the edges which meet v.
Theorem 16. Let T be a tree with at least two vertices. Then T contains at least
two leaves. (A leaf is a vertex of degree 1.)
Proof. Choose any vertex v ∈ T . Let P be a maximal path in T which goes
through v. (Here, ‘maximal’ means that there is no edge of T which we
could add to P to produce a longer path. Note that, since T is a connected graph
with at least two vertices, P has at least one edge.)
We shall prove that an endpoint of P must be a leaf of T . Let a be an
endpoint of P ; I claim that the only neighbour of a in T is the vertex adjacent
to a on P . Suppose for a contradiction that there is another vertex c which is
also a neighbour of a in T . If c does not lie on P , then we could extend P to
c, contradicting the maximality of P . If c does lie on P , then together with the
part of P between a and c, the edge {a, c} forms a cycle in T , contradicting the
fact that T is acyclic. This proves that a has just one neighbour in T , so is a
leaf. Since P has two endpoints, T has at least two leaves.
We can use this to prove the following theorem.
Theorem 17. Let T be a tree with n vertices. Then e(T ) = n − 1.
Proof. By induction on n. When n = 1, this is clear. Let n ≥ 2, and suppose
that the statement of the theorem holds for all trees with n − 1 vertices. Let T
be a tree with n vertices. By the previous theorem, T has a leaf, v say. Let u be
the neighbour of v. Remove v and the edge {u, v} to produce a new graph, T′.
Note that T′ is a tree. By the induction hypothesis, we have e(T′) = n − 2, and
therefore e(T ) = e(T′) + 1 = n − 1. This completes the proof of the inductive
step, proving the theorem.
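Both Theorem 16 and Theorem 17 are easy to test on generated trees. The sketch below (not part of the notes) builds a tree by attaching each new vertex to a random earlier one, then checks the edge count and the number of leaves:

```python
import random

def random_labelled_tree(n, seed=0):
    """Join each new vertex i to a random earlier vertex; the result is always a tree."""
    rng = random.Random(seed)
    return [{i, rng.randrange(i)} for i in range(1, n)]

def leaf_count(n, edges):
    """Count the vertices of degree 1."""
    degree = {v: 0 for v in range(n)}
    for e in edges:
        for v in e:
            degree[v] += 1
    return sum(1 for v in range(n) if degree[v] == 1)

n = 8
edges = random_labelled_tree(n, seed=1)
print(len(edges) == n - 1, leaf_count(n, edges) >= 2)  # → True True
```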
Induction on the number of vertices (or edges) of a graph is another very
frequently used method of proof in graph theory.
We can now give two other conditions which are equivalent to a graph being
a tree.
Theorem 18. Let G be a graph with n vertices. The following three conditions
are equivalent.
(a) G is a tree.
(b) G is a connected graph with e(G) = n − 1.
(c) G is an acyclic graph with e(G) = n − 1.
Proof. (a) ⇒ (b) and (c): this is Theorem 17.
(b) ⇒ (a): Let G be a connected graph with n vertices and n − 1 edges. By
Theorem 15, G has a spanning tree, T say. By Theorem 17, T also has n − 1
edges, so G = T , so G is a tree.
(c) ⇒ (a): Let G be an acyclic graph with n vertices and n − 1 edges. Now
we build a new graph G′ (with the same set of vertices as G) as follows. Starting
with G, let us produce G′ by adding as many edges as we can to G
without producing any cycles. The graph G′ will be a maximal acyclic graph,
and therefore a tree, by Lemma 14. Hence, by Theorem 17, it has n − 1 edges.
But G also has n − 1 edges, so in fact G = G′, so G is a tree.
So far, we have mainly been interested in the properties of trees. As combinatorialists, we are also interested in how many trees there are!
Question: How many (labelled) trees are there with vertex-set {1, 2, . . . , n}?
The answer is surprisingly simple.
Theorem 19. If n ≥ 2, then there are n^{n−2} labelled trees with vertex-set {1, 2, . . . , n}.
Proof. Let Tn denote the set of labelled trees with vertex-set {1, 2, . . . , n}. Let
Sn denote the set of all sequences of n − 2 numbers, where each number is an
integer between 1 and n. We know from Chapter 1 that |Sn | = n^{n−2}. We shall
give a bijection f : Tn → Sn . (This bijection was discovered by Prüfer.)
Given a tree T ∈ Tn , we define the sequence f (T ) as follows. Let i1 be the leaf
of T with the lowest label, and let j1 be its neighbour in T . Write down j1 as the
first element of the sequence, and then produce a new tree (T2 say) by removing
the leaf i1 and the edge {i1 , j1 } from T . Now repeat this process on the new tree
T2 : let i2 be the leaf of T2 with the lowest label, let j2 be its neighbour in T2 , and
write down j2 as the second element of the sequence. Produce a new tree T3 by
deleting the leaf i2 and the edge {i2 , j2 }. Repeat this process until you are left
with a tree which has just one edge, and then stop. Since T had n − 1 edges, and
we removed an edge at each stage, the sequence we produce has length n − 2.
Here is an example:
[Figure: the tree T on vertex-set {1, . . . , 8}, together with the successive smaller
trees obtained by repeatedly deleting the lowest-labelled leaf and recording its
neighbour.]
The tree T above has f (T ) = (1, 2, 1, 3, 5, 3).
Challenge: prove that f is a bijection! (There will be small prizes for correct
answers to this.)
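The encoding procedure above is easy to automate. Below is a short Python sketch of the Prüfer encoding (the function name is ours; the edge list is the unique tree on {1, . . . , 8} with sequence (1, 2, 1, 3, 5, 3), reconstructed from the worked example, since the original figure is not reproduced here).

```python
def prufer_sequence(n, edges):
    """Encode a labelled tree on vertices 1..n as its length-(n-2) Prufer sequence."""
    nbrs = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    seq = []
    for _ in range(n - 2):
        # the leaf with the lowest label
        leaf = min(v for v in nbrs if len(nbrs[v]) == 1)
        j = nbrs[leaf].pop()          # its unique neighbour
        seq.append(j)                 # record the neighbour...
        nbrs[j].discard(leaf)
        del nbrs[leaf]                # ...and delete the leaf and its edge
    return tuple(seq)

# a tree consistent with the worked example (an assumption about the figure)
T_edges = [(1, 4), (1, 2), (2, 6), (1, 3), (3, 5), (5, 7), (3, 8)]
print(prufer_sequence(8, T_edges))    # (1, 2, 1, 3, 5, 3)
```

Since the map f is a bijection, the edge list above is forced by the stated sequence, and running the function recovers it.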
3.3 Bipartite graphs and matchings
We now come to another important type of graph.
Definition. Let r ∈ N with r ≥ 2. We say that a graph G is r-partite if we can
partition V (G) into r sets (or ‘classes’) such that none of these classes contains
an edge of G.
For example, the following graph is 4-partite; a suitable partition into 4 classes
is indicated by circles.
[Figure: a 4-partite graph on vertices 1–9, with a suitable partition into 4 classes
indicated by circles.]
(It is also 3-partite; can you see why?)
A very important special case of this definition is when r = 2. A 2-partite
graph is usually called a bipartite graph. A graph is bipartite if its vertex-set can
be partitioned into two sets (X and Y say) such that every edge of the graph has
one end in X and the other in Y .
A tree is an example of a bipartite graph. (Can you prove this before reading
on?) The complete bipartite graph Km,n is another example: this is a graph with
bipartition X ∪ Y , where |X| = m and |Y | = n, and every vertex of X is joined
to every vertex of Y :
Remarkably, there is a very simple characterization of bipartite graphs.
Theorem 20. Let G be a graph. Then G is bipartite if and only if it contains
no odd cycles.
To prove this, we need to define the distance between two vertices in a graph.
Definition. Let G be a graph, and let u, v ∈ V (G). The distance from u to v is
the length of the shortest path (in G) from u to v; it is written d(u, v).
[Figure: the complete bipartite graph K3,4 , with parts X = {a, b, c} and
Y = {p, q, r, s}.]
Proof of Theorem 20. First suppose that G is bipartite; we must prove that G
contains no odd cycle. Suppose for a contradiction that G does have an odd
cycle; let v1 v2 . . . v2l+1 v1 be an odd cycle in G. Since G is bipartite, there is a
partition V (G) = X ∪ Y such that all the edges of G go between X and Y . By
relabelling if necessary, we may assume that v1 ∈ X. Then v2 ∈ Y , so v3 ∈ X,
and so on — in general, if i is odd, then vi ∈ X, and if i is even, then vi ∈ Y .
This implies that v2l+1 ∈ X; but then the edge {v2l+1 , v1 } has both endpoints in X, a contradiction. Therefore, a
bipartite graph cannot contain an odd cycle.
For the other direction, suppose that G has no odd cycles; we must prove that
G is bipartite. Let G1 , . . . , GN be the components of G; we shall deal with each
component separately. Let us prove that G1 is bipartite.
Choose any vertex v0 ∈ V (G1 ). For each i, let
Vi = {w ∈ V (G1 ) : d(v0 , w) = i}
be the set of all vertices of G1 which are at a distance of exactly i from v0 . If l
is the maximum possible distance of a vertex of G1 from v0 , then
V0 ∪ V1 ∪ . . . ∪ Vl
is a partition of V (G1 ).
Firstly, observe that there are no edges of G1 between Vi and Vj if j > i + 1.
Indeed, if there was an edge of G1 between Vi and Vj , say {a, b} ∈ E(G1 ) with
a ∈ Vi and b ∈ Vj , then we have
d(v0 , b) ≤ d(v0 , a) + 1 = i + 1 < j,
contradicting b ∈ Vj . (This is true whether or not the graph G1 has odd cycles.)
[Figure: the layers V0 , V1 , V2 , . . . , Vl ; no edge joins Vi to Vj when j > i + 1.]
Secondly, observe that there are no edges of G1 within any of the Vi . Indeed,
if there was an edge of G1 within Vi , then there would be a path of length i back
to v0 from each of its two endpoints. At the point where these two paths meet,
you would get an odd cycle, a contradiction.
[Figure: an edge inside a single layer Vi would close an odd cycle.]
We may conclude that each edge of G1 goes between Vi and Vi+1 , for some i.
Therefore, the bipartition
X = V0 ∪ V2 ∪ V4 ∪ . . . ,    Y = V1 ∪ V3 ∪ V5 ∪ . . .
shows that G1 is a bipartite graph. Similarly, each of the other components
G2 , . . . , GN is bipartite, so G is bipartite.
Theorem 20 makes it easy for us to decide whether a given graph is bipartite:
we just need to check whether it has any odd cycles. Unfortunately, there is no
such theorem for 3-partite graphs (or for r-partite graphs, for any r > 2).
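The layering V0 , V1 , . . . , Vl used in the proof of Theorem 20 doubles as an efficient algorithm: breadth-first search assigns distances from a root in each component, and G is bipartite unless some edge joins two vertices in the same layer. A minimal Python sketch (the function name is ours):

```python
from collections import deque

def is_bipartite(vertices, edges):
    """Layer each component by distance from a root; bipartite iff no edge
    lies inside a layer (such an edge would close an odd cycle)."""
    nbrs = {v: [] for v in vertices}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    dist = {}
    for root in nbrs:
        if root in dist:
            continue                     # already reached: same component
        dist[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for w in nbrs[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                elif dist[w] == dist[u]:  # an edge inside some V_i
                    return False
    return True

print(is_bipartite(range(1, 5), [(1, 2), (2, 3), (3, 4), (4, 1)]))  # True  (C4)
print(is_bipartite(range(1, 4), [(1, 2), (2, 3), (3, 1)]))          # False (C3)
```

This relies on the fact, proved above, that BFS edges only go within a layer or between consecutive layers.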
Matchings and Hall’s Marriage Theorem
Suppose there are a certain number of women, each of whom has some male
friends. Is it possible for each woman to marry a man who she knows? (Each
man is only allowed to marry one woman!) If there are four women (A, B, C and
D), and four men (P, Q, R and S), and
• A knows P and Q only,
• B knows P and Q only,
• C knows P, Q, R and S,
• D knows P and Q only,
then this is not possible: between them, A, B and D know only two men (P and
Q). Hall’s Marriage Theorem gives us a condition which tells us exactly when it
is possible. (In fact, the kind of situation above is the only obstacle.)
The question can be phrased in terms of a bipartite graph. Let X be the set
of women, and let Y be the set of men. Define a bipartite graph with bipartition
X ∪ Y by joining each woman to all the men she knows; what we want to find is
a set of edges (M say) in this graph, such that each vertex in X meets an edge
of M , and no two edges of M share any vertex. This is called a matching.
Definition. Let G be a bipartite graph with bipartition X ∪ Y . A set of edges M
in G is called a matching (from X to Y ) if each vertex in X meets exactly one edge of M , and
no two edges of M share any vertex.
For example, the bold edges in the graph below form a matching from X to
Y:
[Figure: a bipartite graph with X = {a, b, c, d, e} and Y = {p, q, r, s, t, u}; a
matching from X to Y is shown in bold.]
Before stating Hall’s Marriage Theorem, we need some more notation. If G
is a graph, and S is a set of vertices of G, we define
Γ(S) = {v ∈ V (G) : {v, s} ∈ E(G) for some s ∈ S}
to be the set of vertices joined to a vertex in S; this is sometimes called the
neighbourhood of S in G.
If A is a set of vertices of G, we define G[A] to be the induced subgraph of
G with vertex-set A — in other words, the subgraph of G whose vertex-set is A,
and where two vertices are joined if and only if they are joined in G.
Here, then, is Hall’s Marriage Theorem.
Theorem 21 (Hall’s Marriage Theorem). Let G be a bipartite graph with bipartition X ∪ Y . Then G has a matching from X to Y if and only if
|Γ(A)| ≥ |A|  ∀A ⊂ X   (‘Hall’s condition’).
Proof. The forward direction is easy. Suppose G has a matching, M say. Then
for any subset A ⊂ X, the edges of M join each vertex in A to a different vertex
of Y . These |A| distinct vertices of Y are all in Γ(A), so |Γ(A)| ≥ |A|. Therefore,
Hall’s condition holds.
Now let’s prove the other direction. We claim that any bipartite graph satisfying Hall’s condition must have a matching. We prove this claim by induction
on |X|. If |X| = 0 then G trivially has a matching (the empty matching), so the
claim holds when |X| = 0.¹ Now for the induction step. Let k ≥ 1, and suppose
that the claim holds whenever |X| ≤ k − 1. Let G be a bipartite graph with
bipartition X ∪ Y , where |X| = k, and suppose that G satisfies Hall’s condition.
We must show that G has a matching from X to Y . We consider two cases:
case 1: For any subset S ⊂ X with S ≠ ∅, S ≠ X, we have |Γ(S)| > |S|;
case 2: There exists a subset T ⊂ X with T ≠ ∅, T ≠ X, with |Γ(T )| = |T |.
(Such a subset T is called a critical subset.)
Suppose first that we are in case 1. Since |Γ(X)| ≥ |X| ≥ 1, G must have at least
one edge. Choose u ∈ X and v ∈ Y such that {u, v} ∈ E(G). Now let G′ be the
graph produced from G by removing both u and v (and all edges meeting them):
in symbols, G′ = (G − u) − v. Let X′ = X \ {u} and let Y′ = Y \ {v}; then G′
is bipartite with bipartition X′ ∪ Y′. Observe that Hall’s condition holds for G′.
Indeed, if B ⊂ X′ with B ≠ ∅, let Γ′(B) denote the neighbourhood of B in G′,
Γ′(B) = {w ∈ Y′ : {w, b} ∈ E(G′) for some b ∈ B}.
Then we have
|Γ′(B)| ≥ |Γ(B)| − 1 ≥ |B|  ∀B ⊂ X′,
since |Γ(B)| ≥ |B| + 1 for every non-empty B ⊂ X′, by the case 1 hypothesis.
Therefore, Hall’s condition holds for G′. So by the induction hypothesis, G′
contains a matching, M′ say. We can now combine M′ with the deleted edge
{u, v} to produce a matching M in G. So G has a matching. This deals with
case 1.
Suppose now that we are in case 2. Then there exists a subset T ⊂ X with
T ≠ ∅, T ≠ X, with |Γ(T )| = |T |. Let G1 = G[T ∪ Γ(T )] be the induced subgraph
of G on vertex-set T ∪Γ(T ), and let G2 = G[(X \T )∪(Y \Γ(T ))] be the subgraph
of G induced on the rest of the vertices of G. Our aim is to show that there is
a matching in G1 and a matching in G2 . Observe that Hall’s condition must
[¹ If you don’t like this, you can start the induction with |X| = 1, as in the lectures.]
hold in G1 , since all the edges of G which start in T must end in Γ(T ), by the
definition of Γ(T ). So by the induction hypothesis, G1 has a matching.
Now we must check that Hall’s condition holds in G2 . If B ⊂ X \ T , let Γ2 (B)
denote the neighbourhood of B in G2 . We must show that |Γ2 (B)| ≥ |B|. To do
this, we use a trick: we consider B ∪ T . Notice that
|Γ(B ∪ T )| = |Γ2 (B)| + |Γ(T )| = |Γ2 (B)| + |T |,
since T is a critical subset; so we can express |Γ2 (B)| in terms of the size of a
neighbourhood in the original graph G, which we know all about. We have
|Γ2 (B)| = |Γ(B ∪ T )| − |T | ≥ |B ∪ T | − |T | = |B| + |T | − |T | = |B|,
using the fact that Hall’s condition holds in G. Therefore, Hall’s condition holds
in G2 . So by the induction hypothesis, G2 has a matching. Put the matching
in G1 and the matching in G2 together to produce a matching in G. So G has
a matching. This deals with case 2, completing the proof of the induction step,
and proving the theorem.
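For small graphs, both sides of Hall’s theorem can be verified by brute force. The sketch below (helper names ours; exponential-time, for illustration only) encodes a bipartite graph as a dict mapping each x ∈ X to its neighbour set, and tests it on the women-and-men example from the start of this section:

```python
from itertools import chain, combinations, permutations

def hall_condition(G):
    """Check |Gamma(A)| >= |A| for every subset A of X."""
    X = list(G)
    subsets = chain.from_iterable(combinations(X, r) for r in range(len(X) + 1))
    return all(len(set().union(*(G[x] for x in A))) >= len(A) for A in subsets)

def has_matching(G):
    """Try every injection X -> Y; succeed if one uses only edges of G."""
    X = list(G)
    Y = sorted(set().union(*G.values()))
    return any(all(y in G[x] for x, y in zip(X, p))
               for p in permutations(Y, len(X)))

# A, B and D know only P and Q, so Hall's condition fails for {A, B, D}
G = {'A': {'P', 'Q'}, 'B': {'P', 'Q'}, 'C': {'P', 'Q', 'R', 'S'}, 'D': {'P', 'Q'}}
print(hall_condition(G), has_matching(G))   # False False
```

By Theorem 21 the two functions always agree; the brute force is only practical for a handful of vertices.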
We shall now use Hall’s theorem to show that a special kind of bipartite graph
always contains a matching.
We need one more piece of notation. If G is a graph, and S, T ⊂ V (G) with
S ∩ T = ∅ (S and T are disjoint sets of vertices of G), then E(S, T ) denotes the
set of edges of G which go between S and T , and e(S, T ) = |E(S, T )| denotes the
number of edges of G which go between S and T .
Theorem 22. Let k be a positive integer. Suppose G is a bipartite graph with
bipartition X ∪ Y , such that each vertex in X has degree at least k, and each
vertex in Y has degree at most k. Then G has a matching from X to Y .
Proof. We check that Hall’s condition holds. Let A ⊂ X. We have
k|A| ≤ e(A, Y ) ≤ e(X, Γ(A)) ≤ k|Γ(A)|,
since there are at least k|A| edges coming out of A, the set of edges coming out of
A is a subset of the edges going in to Γ(A), and there are at most k|Γ(A)| edges
going in to Γ(A). Therefore |Γ(A)| ≥ |A|, so Hall’s condition holds, so G has a
matching from X to Y .
Let us introduce some more definitions.
Definition. If G is a graph, the minimum degree of G is the minimum of the
degrees of all the vertices of G; it is sometimes written as δ(G). In symbols,
δ(G) = min{d(v) : v ∈ V (G)}.
The maximum degree of G is the maximum of the degrees of all the vertices
of G; it is sometimes written as ∆(G). In symbols,
∆(G) = max{d(v) : v ∈ V (G)}.
Definition. We say that a graph is k-regular if all its vertices have degree exactly
k.
Theorem 22 implies that a k-regular bipartite graph always has a matching,
for any positive integer k.
In fact, if G is a bipartite graph with bipartition X ∪ Y , and we know that G
has a matching from X to Y , and we also know that each vertex in X has ‘high’
degree, then we can deduce that in fact, G has ‘many’ matchings from X to Y .
This is the content of the following theorem, which will be useful in our study of
latin squares.
Theorem 23. Let G be a bipartite graph with bipartition X ∪ Y . Suppose that
G has a matching from X to Y , and that d(x) ≥ r for all x ∈ X, where r ∈ N.
Then in fact, the number of matchings in G from X to Y is at least
r!  if r ≤ |X|;
r(r − 1)(r − 2) . . . (r − |X| + 1) if r > |X|.
Proof. We prove this by induction on |X|. When |X| = 1, the statement of the
theorem holds: let X = {x}; then d(x) ≥ r, so there are at least r matchings
from X to Y . Now for the induction step. Let k ≥ 2, and assume that the
statement of the theorem holds whenever |X| ≤ k − 1. Now let G be a bipartite
graph with bipartition X ∪ Y , where |X| = k. Suppose that G has a matching
from X to Y , and that d(x) ≥ r for all x ∈ X. We split into two cases, as in the
proof of Hall’s theorem:
case 1: For any subset S ⊂ X with S ≠ ∅, S ≠ X, we have |Γ(S)| > |S|;
case 2: There exists a subset T ⊂ X with T ≠ ∅, T ≠ X, with |Γ(T )| = |T |.
(Such a subset T is called a critical subset.)
First suppose that we are in case 1. Choose any vertex u ∈ X. Now choose any
vertex v ∈ Y such that {u, v} ∈ E(G), and remove u and v from G, producing the
new bipartite graph G′ = (G − u) − v with bipartition X′ ∪ Y′, where X′ = X \ {u}
and Y′ = Y \ {v}. (G′ is produced from G by removing u, v and all the edges
meeting u or v.) If x ∈ X′, let d′(x) be the degree of x in G′. Note that d′(x) ≥
r − 1 for all x ∈ X′, since we have only deleted one vertex from Y . Moreover, as
in the proof of Hall’s theorem, G′ satisfies Hall’s condition. Therefore, by Hall’s
theorem, G′ must have a matching. So by the induction hypothesis (applied to
G′, with r − 1 in place of r), the number of matchings in G′ is at least
(r − 1)!  if r − 1 ≤ |X′| = |X| − 1;
(r − 1)(r − 2) . . . (r − 1 − |X′| + 1) = (r − 1)(r − 2) . . . (r − |X| + 1)  if r − 1 > |X′| = |X| − 1.
(3.1)
Each one of these matchings in G0 can be extended to a matching in G by adding
in the edge {u, v}. Since d(u) ≥ r, there must be at least r choices for v, so the
total number of matchings in G we can produce in this way is at least r times
(3.1), which is at least
r!  if r ≤ |X|;
r(r − 1)(r − 2) . . . (r − |X| + 1) if r > |X|.
This completes the proof of the induction step in case 1.
Now suppose that we are in case 2. Let T be a critical set. Then we have
r ≤ |Γ(T )| = |T | ≤ |X|, so we must prove that G has at least r! matchings. Define
G1 = G[T ∪ Γ(T )] and G2 = G[(X \ T ) ∪ (Y \ Γ(T ))], as in the proof of Hall’s
theorem. Again, as in the proof of Hall’s theorem, G1 satisfies Hall’s condition,
so G1 has a matching. For x ∈ V (G1 ), let d1 (x) denote the degree of x in G1 .
Then we have d1 (x) ≥ r for all x ∈ T . Therefore, by the induction hypothesis, G1
has at least r! matchings (as r ≤ |T |). Also, as in the proof of Hall’s theorem, G2
has a matching. Combining this matching in G2 with each matching in G1 produces
a matching in G, and distinct matchings in G1 yield distinct matchings in G, so G
must have at least r! matchings as well. This completes the proof of the induction
step in case 2, proving the theorem.
Sometimes, we are interested in whether a bipartite graph has a partial matching of a certain size.
Definition. Let G be a graph. A partial matching of size k in G is a set of k
edges of G, such that no two of these edges meet.
If G is a bipartite graph with bipartition X ∪ Y , then a partial matching of
size k in G consists of a set of k edges which match k distinct vertices of X to
k distinct vertices of Y . The following bipartite graph has a partial matching of
size 3 (one is indicated in bold), but no matching:
[Figure: a bipartite graph with parts X and Y which has a partial matching of
size 3 (shown in bold) but no matching from X to Y .]
We can use Hall’s theorem to deduce the following characterization of when
a bipartite graph contains a partial matching of size k.
Theorem 24. Let k be a positive integer. Let G be a bipartite graph with bipartition X ∪ Y . Then G contains a partial matching of size k if and only if
|Γ(A)| ≥ |A| − (|X| − k)  ∀A ⊂ X.   (∗)
Proof. As with Hall’s theorem, the forward direction is easy. Suppose that G
contains a partial matching (M , say) of size k. Let A ⊂ X. Since M has at most
|X \ A| = |X| − |A| endpoints in X \ A, it must have at least k − (|X| − |A|)
endpoints in A; the other ends of these edges are all distinct neighbours of A in
Y . So |Γ(A)| ≥ k − (|X| − |A|) = |A| − (|X| − k), so (∗) holds.
Now suppose that (∗) holds. Produce a new bipartite graph G′ by taking G
and adding |X| − k new vertices to Y , each joined to every vertex of X. Then
Hall’s condition holds in G′: every non-empty A ⊂ X satisfies |Γ′(A)| ≥ |Γ(A)| +
(|X| − k) ≥ |A|, by (∗). So by Hall’s theorem, G′ has a matching, M′ say.
Delete all the edges of M′ which meet one of the |X| − k new vertices; there are
at most |X| − k of these edges, so we are left with at least k edges. These form
a partial matching of size k in G. So G has a partial matching of size k. This
proves the theorem.
In some societies, a man can have more than one wife. And in some societies,
a woman can have more than one husband! Suppose there is a society where a
woman is allowed to have more than one husband, but each man can have at
most one wife. Can each woman find two husbands, both of whom she knows?
In terms of bipartite graphs, what we are asking for is a set of edges from X
to Y , such that each vertex of X meets exactly 2 of these edges, but each vertex
of Y meets at most one of these edges. Such a set of edges is known as a 2-fold
matching from X to Y . Similarly, we make the following general definition.
Definition. Let r be a positive integer. If G is a bipartite graph with bipartition
X ∪ Y , an r-fold matching in G from X to Y is a set of edges of G such that
each vertex of X meets exactly r of these edges, but each vertex of Y meets at
most one of these edges.
We can use Hall’s theorem to deal with this situation, too.
Theorem 25. Let G be a bipartite graph with bipartition X ∪ Y . Then G has an
r-fold matching from X to Y if and only if
|Γ(A)| ≥ r|A|  ∀A ⊂ X.   (∗∗)
Proof. If G has an r-fold matching from X to Y , then we clearly have |Γ(A)| ≥
r|A| for any subset A ⊂ X, so (∗∗) holds.
Now suppose that (∗∗) holds. We produce a new bipartite graph G′ by replacing each vertex u ∈ X with r ‘copies’ of u, joined to all the same vertices as u
was joined to. The graph G′ has bipartition X′ ∪ Y , where X′ consists of all the
copies of vertices in X (so |X′| = r|X|). I claim that G′ satisfies Hall’s condition.
Indeed, if B ⊂ X′, then let A be the set of vertices of X which B contains a copy
of. Let Γ′(B) denote the neighbourhood of B in G′; then Γ′(B) = Γ(A).
Each vertex in X was copied exactly r times, so |B| ≤ r|A|. Therefore,
|Γ′(B)| = |Γ(A)| ≥ r|A| ≥ |B|,
so Hall’s condition holds in G′, as claimed. Therefore, by Hall’s theorem, G′ has a
matching, M′ say. This corresponds to an r-fold matching in the original graph,
G. This proves the theorem.
Kőnig’s Theorem is another useful consequence of Hall’s Theorem; it relates
the maximum possible size of a partial matching to the minimum possible size of
a vertex-cover in a bipartite graph.
Definition. Let G be a bipartite graph with bipartition X ∪ Y . A maximum
partial matching in G is a partial matching from X to Y in G with the maximum
possible size.
Definition. Let G be a bipartite graph with bipartition X ∪ Y . A vertex-cover
in G is a set of vertices of G such that each edge of G meets at least one of
these vertices. A minimum vertex-cover in G is a vertex-cover with the minimum
possible number of vertices.
Here, then, is Kőnig’s theorem.
Theorem 26. Let G be a bipartite graph with bipartition X ∪ Y . Then the size of
a maximum partial matching in G is equal to the size of a minimum vertex-cover
in G.
Proof. Let M be a partial matching in G with the maximum possible number of
edges. Let S be a minimum vertex-cover in G. Then each edge of M meets a
different vertex of S, so |S| ≥ e(M ).
Now let |S| = k. Our aim is to prove that G has a partial matching of size k.
I claim that
|Γ(A)| ≥ |A| − (|X| − k) ∀A ⊂ X.
(Theorem 24 will then imply that G has a partial matching of size k.) To prove
the claim, observe that for any A ⊂ X, (X \ A) ∪ Γ(A) is a vertex-cover of G.
Since any vertex-cover is at least as large as S (S was chosen to be a minimum
vertex-cover), we must have
k ≤ |(X \ A) ∪ Γ(A)| = |X \ A| + |Γ(A)| = |X| − |A| + |Γ(A)|.
Rearranging, we get |Γ(A)| ≥ |A| − (|X| − k), as claimed. Therefore, by Theorem
24, G has a partial matching of size k. It follows that e(M ) ≥ k = |S|. Hence,
e(M ) = |S|, proving the theorem.
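Kőnig’s theorem can be sanity-checked by exhaustive search on a small bipartite graph. In the sketch below (function names and the example graph are ours), the maximum partial matching and the minimum vertex-cover both turn out to have size 3:

```python
from itertools import combinations

def max_matching_size(edges):
    """Largest set of pairwise disjoint edges, by exhaustive search."""
    def disjoint(M):
        verts = [v for e in M for v in e]
        return len(verts) == len(set(verts))   # no two edges share a vertex
    return max(len(M) for r in range(len(edges) + 1)
               for M in combinations(edges, r) if disjoint(M))

def min_cover_size(edges, vertices):
    """Smallest set of vertices meeting every edge, by exhaustive search."""
    for r in range(len(vertices) + 1):
        for S in combinations(vertices, r):
            if all(u in S or v in S for u, v in edges):
                return r

# a small bipartite graph with X = {a, b, c} and Y = {p, q, r}
edges = [('a', 'p'), ('a', 'q'), ('b', 'p'), ('c', 'q'), ('c', 'r')]
vertices = ['a', 'b', 'c', 'p', 'q', 'r']
print(max_matching_size(edges), min_cover_size(edges, vertices))   # 3 3
```

Both searches are exponential, so this is a check of the statement on tiny examples, not a practical algorithm.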
Chapter 4
Latin Squares
4.1 Introduction
In this chapter, we will study Latin Squares. These objects are quite simple
to define, but display interesting and complicated behaviour, and the various
constructions involved use techniques from both combinatorics and algebra.
Definition. A latin square of order n is an n by n array (i.e., matrix), where
each entry is a symbol drawn from an alphabet of size n, and each symbol occurs
exactly once in each row and exactly once in each column.
For example, here is a latin square of order 3, with symbols drawn from the
alphabet {1, 2, 3}:
1 2 3
2 3 1
3 1 2
The alphabet does not have to be the numbers {1, 2, . . . , n} — here is a latin
square of order 3 with alphabet {a, b, c}:
c b a
b a c
a c b
Here is a latin square of order 4:
1 2 3 4
2 1 4 3
3 4 1 2
4 3 2 1
A natural question to ask is the following: does there exist a latin square of
order n, for any positive integer n? The answer is ‘yes’: we can construct one
using modular arithmetic. Let n ∈ N. We define an order-n latin square, L, as
follows. The alphabet is
Zn = {0, 1, 2, . . . , n − 1},
‘the integers modulo n’. We index the rows of L with the numbers 0, 1, 2, . . . , n−1,
and we index the columns of L with the numbers 0, 1, 2, . . . , n − 1. We define Li,j
(the (i, j)th entry of L) by
Li,j ≡ i + j (mod. n).
(In other words, Li,j is produced by adding i and j in the Abelian group Zn .)
For example, when n = 5, we get
    0 1 2 3 4
    1 2 3 4 0
L = 2 3 4 0 1
    3 4 0 1 2
    4 0 1 2 3
I make the following claim.
Claim. Each element of Zn occurs at most once in each row of L and at most
once in each column of L.
Proof of claim: Suppose Li,j = Li,k . Then i + j ≡ i + k (mod. n), so j ≡
k (mod. n), so j = k. So each element of Zn occurs at most once in each
row of L. Similarly, each element of Zn occurs at most once in each column of
L.
The above claim is equivalent to saying that L is a latin square — since the
alphabet has exactly n symbols, if each symbol occurs at most once in a row of
L, then each symbol must occur exactly once in that row, and similarly for a
column.
Therefore, L is a latin square of order n — we have constructed a latin square
of order n, for each n.
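The modular construction is easy to carry out and check mechanically. A brief Python sketch (the function names are ours):

```python
def cyclic_latin_square(n):
    """L[i][j] = (i + j) mod n, rows and columns indexed by 0..n-1."""
    return [[(i + j) % n for j in range(n)] for i in range(n)]

def is_latin_square(L):
    """Each symbol 0..n-1 occurs exactly once in each row and column."""
    n = len(L)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in L)
    cols_ok = all({L[i][j] for i in range(n)} == symbols for j in range(n))
    return rows_ok and cols_ok

L = cyclic_latin_square(5)
print(is_latin_square(L))   # True
print(L[2])                 # [2, 3, 4, 0, 1], the third row of the example above
```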
In fact, the construction above is a special case of the following.
Exercise 8. Let G = {g1 , . . . , gn } be a finite group of order n, with multiplication
∗. We define the Cayley multiplication table of G to be the array LG with (i, j)th
entry gi ∗ gj . Prove that LG is a latin square.
(Taking the group G to be Zn , with the ‘multiplication’ operation ∗ being
addition modulo n, we get the latin square L above.)
There is also a more combinatorial way of constructing latin squares. We can
construct them row by row, without worrying about whether we will run into
trouble in the future! In order to do this, we need to define k × n latin rectangles,
for k < n.
Definition. Let n and k be positive integers with k ≤ n. A k × n latin rectangle
is a k by n array (i.e., matrix), where each entry is a symbol drawn from an alphabet
of size n, and each symbol occurs exactly once in each row, and at most once in
each column.
The following lemma will enable us to construct a latin square ‘row by row’.
Lemma 27. Let n and k be positive integers with k < n. Let R be a k × n latin
rectangle. Then R can be extended to a (k + 1) × n latin rectangle by adding one
more row. (In fact, R can be extended to at least (n − k)! different (k + 1) × n
latin rectangles.)
Proof. Add a row of n empty cells to the bottom of the latin rectangle R. Let X
be the set of these cells, and let Y be the set of symbols in R’s alphabet. Define
a bipartite graph G with bipartition X ∪ Y as follows. Let us join the empty
cell in column j to all of the symbols which have not yet occurred in column j.
Notice that a matching in G is exactly what we need to extend R to a (k + 1) × n
latin rectangle. I claim that G is an (n − k)-regular bipartite graph, meaning
that every vertex in G has degree n − k. (We can then conclude, from the section
on matchings, that G has a matching — in fact, that it has at least (n − k)!
matchings.) To prove this claim, first observe that each column of R contains
exactly k symbols, so there are n − k symbols it does not contain, so each x ∈ X
has degree n − k. Secondly, observe that each symbol y ∈ Y occurs exactly once
in each of the k rows of R, and each time it occurs in a different column, so it
occurs in exactly k columns of R, so there are exactly n−k columns where it does
not appear. So each y ∈ Y has degree n − k. This proves the claim. Therefore,
G has at least (n − k)! different matchings. So R can be extended to a (k + 1) × n
latin rectangle in at least (n − k)! different ways.
Corollary 28. We can produce an order-n latin square with alphabet {1, 2, . . . , n}
by first choosing any ordering of 1, 2, . . . , n for the first row, and then using the
following step-by-step process. At each step, starting with a k × n latin rectangle,
we can extend it to a (k + 1) × n latin rectangle, by the lemma above. If we
continue for n − 1 steps, we have a latin square of order n.
Corollary 29. There are at least n!(n − 1)! . . . 2!1! = ∏_{k=0}^{n−1} (n − k)! different latin
squares of order n with alphabet {1, 2, . . . , n}.
Proof. If we construct latin squares using the previous corollary, there are n!
choices for the first step (we must just choose any ordering of 1, 2, . . . , n, i.e. any
permutation in Sn ), and for the kth step in the above process, there are (n − k)!
choices.
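For n = 3 the bound of Corollary 29 is in fact exact: an exhaustive count (feasible only for very small n) finds 12 latin squares of order 3, which equals 3!2!1! = 12. A Python sketch of the count (the function name is ours):

```python
from itertools import permutations, product

def count_latin_squares(n):
    """Count order-n latin squares with alphabet {1, ..., n} by brute force:
    each row must be a permutation, and each column must repeat no symbol."""
    rows = list(permutations(range(1, n + 1)))
    count = 0
    for square in product(rows, repeat=n):
        if all(len({square[i][j] for i in range(n)}) == n for j in range(n)):
            count += 1
    return count

print(count_latin_squares(3))   # 12, matching the lower bound 3! * 2! * 1!
```

The search space has (n!)^n candidate squares, so this is hopeless beyond n = 4 or so; the point is only to check the bound.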
4.2 Orthogonal latin squares
We shall now look at orthogonal latin squares.
Definition. Let L = (Li,j ) and M = (Mi,j ) be two latin squares of order n.
Suppose L uses alphabet A, and M uses alphabet B. The two latin squares L
and M are said to be orthogonal to one another if every ordered pair of symbols
(a, b) ∈ A × B occurs exactly once when we list the ordered pairs (Li,j , Mi,j ).
For example, the two latin squares
1 2 3
L= 2 3 1 ,
3 1 2
a b c
M= c a b
b c a
are orthogonal to one another, since when we list the ordered pairs (Li,j , Mi,j ),
we get the array
(1, a) (2, b) (3, c)
(2, c) (3, a) (1, b)
(3, b) (1, c) (2, a)
in which each one of the 9 ordered pairs of symbols in {1, 2, 3} × {a, b, c} occurs
exactly once.
Of course, if we have a pair of orthogonal latin squares, and we take one
of them and replace it by a new latin square by relabelling the symbols in its
alphabet, we get another pair of orthogonal latin squares. In the example above,
if we relabel a as 1, b as 2, and c as 3, we get another pair of orthogonal latin
squares:
    1 2 3         1 2 3
L = 2 3 1 ,  M′ = 3 1 2 .
    3 1 2         2 3 1
We can do this with both squares in a pair of orthogonal latin squares. So,
‘relabelling the symbols makes no difference’ to orthogonality.
When n = 2, there is no pair of orthogonal latin squares. The reason is as
follows. Suppose that there was a pair of orthogonal latin squares of order 2.
Then, by relabelling the alphabets as above, there would be a pair of orthogonal
latin squares with alphabet {1, 2}. But there are only two latin squares with
alphabet {1, 2}, namely
S = 1 2 ,    T = 2 1
    2 1          1 2
and these are not orthogonal to one another, as when we list all the pairs
(Si,j , Ti,j ), we only get
(1, 2) (2, 1)
(2, 1) (1, 2)
— the pairs (1, 1) and (2, 2) do not appear. Hence, there is no pair of orthogonal
latin squares of order 2.
You may remember from the first lecture that Euler asked the following question.
‘There are 6 different regiments. Each regiment has 6 soldiers, one of each
of 6 different ranks. Can these 36 soldiers be arranged in a square formation so
that each row and each column contains one soldier of each rank and one from
each regiment?’
This is really asking whether there exists a pair of orthogonal latin squares
of order 6. (Why?) Euler conjectured that the answer is ‘no’, but he could
not prove this. (There are an awful lot of latin squares of order 6 — at least
6!5!4!3!2! = 24,883,200, as we saw above, and in fact exactly 812,851,200, so it
would take far too long to check all the possible pairs by hand.) In 1900, Tarry
proved that there is no pair of orthogonal latin squares of order 6, using a clever
argument with lots of different cases, but obviously (and impressively!) without
the use of a computer.
Euler also made the daring conjecture that there exists no pair of orthogonal
latin squares of order n, for any n which is congruent to 2 (mod. 4). In fact, this
was completely false — Bose, Shrikhande and Parker proved in 1960 that there
is a pair of orthogonal latin squares of order n for any n, except for n = 2 and
n = 6. Often in combinatorics, ‘small-number’ behaviour can be deceptive!
Our aim is to prove an easier result: that whenever n ≡ 0, 1 or 3 (mod. 4),
there exists a pair of orthogonal latin squares of order n. The easiest cases are
when n ≡ 1 or 3 (mod. 4), i.e. when n is odd:
Theorem 30. If n is an odd positive integer, there is a pair of orthogonal latin
squares of order n.
Proof. We use Zn = {0, 1, . . . , n − 1}, the integers modulo n, as our alphabet.
Define the latin square L by
Li,j ≡ i + j (mod. n),
as before, and define the matrix M by
Mi,j ≡ 2i + j (mod. n).
(As before, the rows and columns are both indexed by the elements of Zn .) I
claim that M is also a latin square. Indeed, if Mi,j = Mi,k , then 2i + j ≡ 2i + k
(mod. n), so j ≡ k (mod. n), so j = k. Also, if Mi,j = Mk,j , then 2i + j ≡ 2k + j
(mod. n), so 2i ≡ 2k (mod. n), so i ≡ k (mod. n), by multiplying both sides
of the previous equation by the multiplicative inverse of 2 in Zn . (Recall that if
r ∈ Zn , a multiplicative inverse for r is an element s ∈ Zn such that rs = 1; an
element r ∈ Zn has a multiplicative inverse if and only if the highest common
factor of r and n is 1. So 2 has a multiplicative inverse in Zn , whenever n is odd.)
So M is a latin square, as claimed.
I now claim that L and M are orthogonal. Indeed, take any (a, b) ∈ Zn ×
Zn ; we must show that there exist i, j ∈ Zn with (Li,j , Mi,j ) = (a, b). This is
equivalent to
i + j ≡ a (mod. n)
2i + j ≡ b (mod. n)
which has solution i ≡ b − a (mod. n), j ≡ 2a − b (mod. n). So each pair (a, b)
occurs at least once. Since there are n2 possible pairs (a, b) and only n2 possible
entries (i, j), if each occurs at least once, then each must occur exactly once. So
each pair (a, b) ∈ Zn × Zn occurs exactly once, so L and M are orthogonal to one
another.
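To make the construction in this proof concrete, here is a small computational check; the function names (`latin_pair_odd`, `is_latin`, `are_orthogonal`) are my own, not part of the notes.

```python
def latin_pair_odd(n):
    # Theorem 30's construction: L[i][j] = i + j and M[i][j] = 2i + j, mod n.
    L = [[(i + j) % n for j in range(n)] for i in range(n)]
    M = [[(2 * i + j) % n for j in range(n)] for i in range(n)]
    return L, M

def is_latin(S):
    # Every row and every column must be a permutation of {0, ..., n-1}.
    n = len(S)
    syms = set(range(n))
    return (all(set(row) == syms for row in S)
            and all({S[i][j] for i in range(n)} == syms for j in range(n)))

def are_orthogonal(L, M):
    # Orthogonal iff the n^2 ordered pairs (L[i][j], M[i][j]) are all distinct.
    n = len(L)
    return len({(L[i][j], M[i][j]) for i in range(n) for j in range(n)}) == n * n

L, M = latin_pair_odd(7)
print(is_latin(L), is_latin(M), are_orthogonal(L, M))  # True True True
```

Trying `latin_pair_odd(4)` instead shows where oddness is needed: the columns of M then repeat symbols (2i mod 4 takes only the values 0 and 2), so M is not a latin square.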
Our aim is now to construct a pair of orthogonal latin squares of order n, for
every n ≡ 0 (mod. 4). To do this, we will use finite fields.
Recall that a field is a set F equipped with two binary operations, denoted
by + and ·, such that (F, +) is an Abelian group (with identity element denoted
by 0), (F \ {0}, ·) is an Abelian group (with identity element denoted by 1), and
· is distributive over +. Writing out the axioms in full, this means that:
• x + (y + z) = (x + y) + z for all x, y, z ∈ F ;
• x + 0 = 0 + x = x for all x ∈ F ;
• For any x ∈ F , there exists (−x) ∈ F such that x + (−x) = (−x) + x = 0.
• x + y = y + x for all x, y ∈ F ;
• x · (y · z) = (x · y) · z for all x, y, z ∈ F \ {0};
• x · 1 = 1 · x = x for all x ∈ F \ {0};
• For any x ∈ F \{0}, there exists (x−1 ) ∈ F such that x·(x−1 ) = (x−1 )·x = 1;
• (x + y) · z = x · z + y · z and z · (x + y) = z · x + z · y for all x, y, z ∈ F ;
• x · y = y · x for all x, y ∈ F .
The order of a finite field F is just the number of elements F has; it is denoted
by |F |.
The simplest examples of finite fields are Zp , the integers modulo p, for any
prime p, under the usual operations of + and ×. In fact, there is a field Fpd of
order pd for any prime p and any positive integer d. In particular, there is a field
F4 of order 22 = 4. (The integers modulo 4 do not form a field under + and
×; why?) The field F4 = {0, 1, α, β} has addition and multiplication tables as
follows:
+ | 0 1 α β         · | 0 1 α β
--+--------         --+--------
0 | 0 1 α β         0 | 0 0 0 0
1 | 1 0 β α         1 | 0 1 α β
α | α β 0 1         α | 0 α β 1
β | β α 1 0         β | 0 β 1 α
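As an aside on the parenthetical question above (why Z4 is not a field), a one-line check suffices: the element 2 has no multiplicative inverse modulo 4.

```python
# 2*s mod 4 only ever takes the values 0 and 2, never 1, so 2 has no
# multiplicative inverse in Z_4; indeed 2*2 % 4 == 0, so 2 is a zero divisor.
print(sorted({2 * s % 4 for s in range(4)}))  # [0, 2]
```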
If F is a finite field, and f ∈ F \ {0}, then we can define a latin square L(f )
by
L(f )i,j = f · i + j
(i, j ∈ F ).
(Here, the alphabet is F , and the rows and columns are each indexed by the
elements of F .) Let’s check that L(f ) is indeed a latin square. If L(f )i,j = L(f )i,k ,
then f · i + j = f · i + k, so j = k. Similarly, if L(f )i,j = L(f )k,j , then f ·i+j = f ·k+j,
so f · i = f · k, so i = k (multiplying both sides of the previous equation by f −1 ,
the multiplicative inverse of f ). So each element of F occurs at most once in
each row and at most once in each column, so it must occur exactly once in each
row and exactly once in each column, so L(f ) is indeed a latin square.
Moreover, I claim that for any two distinct elements f, g ∈ F \ {0}, L(f ) and
L(g) are a pair of orthogonal latin squares. To prove this, we must show that for
any (a, b) ∈ F × F , there exist i, j ∈ F satisfying
L(f )i,j = a,
L(g)i,j = b
i.e.
f · i + j = a,
g · i + j = b.
This is just a pair of simultaneous equations (with variables i and j). We solve
it as follows: subtract the first from the second to give
(g − f ) · i = b − a ⇒ i = (g − f )−1 · (b − a),
then substitute the value of i into the first equation to give j = a − (g − f )−1 · (b − a).
So the pair (a, b) appears in the entry (i, j) where
i = (g − f )−1 · (b − a),
j = a − (g − f )−1 · (b − a).
Therefore, the latin squares L(f ) and L(g) are orthogonal to one another, as
claimed.
If F is any finite field with |F | ≥ 3, then it must have at least two non-zero
elements; choosing f and g to be any two distinct non-zero elements, we get a
pair of orthogonal latin squares L(f ), L(g) of order |F |. Since there is a field of
order pd for any prime p and any d ∈ N, we see that there is a pair of orthogonal
latin squares of order pd , for any prime p and any d ∈ N with pd ≥ 3:
Theorem 31. If p is prime and d ∈ N with pd ≥ 3, then there exists a pair of
orthogonal latin squares of order pd .
In particular, taking p = d = 2, there is a pair of orthogonal latin squares of
order 4. Taking F4 = {0, 1, α, β}, we can take f = 1 and g = α, giving
       0 1 α β            0 1 α β
L(1) = 1 0 β α ,   L(α) = α β 0 1 .
       α β 0 1            β α 1 0
       β α 1 0            1 0 β α
(Note that L(1)i,j = 1 · i + j = i + j, so L(1) is just the addition table of F4 ,
which we saw above; L(α)i,j = α · i + j, so we can work out the entries of L(α)
using the addition and multiplication tables of F4 , which we saw above.) You can see directly that
L(1) and L(α) are orthogonal to each other.
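Rather than checking by eye, we can also verify this by machine. Below is a minimal sketch, with F4 encoded as the integers 0, 1, 2, 3 (2 playing the role of α and 3 of β); under this encoding the addition table above is exactly bitwise XOR, and the multiplication table is transcribed directly. The names `MUL`, `L_of` and `are_orthogonal` are mine.

```python
# Multiplication table of F_4, with 0, 1, alpha, beta written as 0, 1, 2, 3.
MUL = [[0, 0, 0, 0],
       [0, 1, 2, 3],
       [0, 2, 3, 1],
       [0, 3, 1, 2]]

def L_of(f):
    # L(f)[i][j] = f*i + j, computed in F_4 (addition is XOR in this encoding).
    return [[MUL[f][i] ^ j for j in range(4)] for i in range(4)]

def are_orthogonal(A, B):
    n = len(A)
    return len({(A[i][j], B[i][j]) for i in range(n) for j in range(n)}) == n * n

print(are_orthogonal(L_of(1), L_of(2)))  # True: L(1) and L(alpha) are orthogonal
```

In fact, any two of L(1), L(α), L(β) (i.e. `L_of(1)`, `L_of(2)`, `L_of(3)`) pass the same check, in line with the claim that the L(f) for distinct non-zero f are pairwise orthogonal.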
We need one more tool to allow us to construct a pair of orthogonal latin
squares of order n, for every n ≡ 0 (mod. 4). This is the product construction of
latin squares.
Definition. Let A be a latin square of order m, using alphabet {1, 2, . . . , m}, and
let B be a latin square of order n. The product latin square A ◦ B is defined as
follows. Produce m copies of B, say B1 , . . . , Bm , by relabelling the alphabet of
B using m different alphabets which are pairwise disjoint (meaning that no two
of these m alphabets share any symbols). Now produce A ◦ B by taking A, and
replacing each symbol i in A with the latin square Bi .
For example, if

    1 2         1 2 3
A =     ,   B = 2 3 1 ,
    2 1         3 1 2

then let us take

     a b c          d e f
B1 = b c a ,   B2 = e f d
     c a b          f d e
(note that we must use disjoint alphabets for B1 and B2 , so we relabel 1, 2, 3 as
a, b, c to produce B1 , and we relabel 1, 2, 3 as d, e, f to produce B2 ). We get
                      a b c d e f
        B1 B2         b c a e f d
A ◦ B =         =     c a b f d e .
        B2 B1         d e f a b c
                      e f d b c a
                      f d e c a b
The squares Bi are referred to as the ‘blocks’ of A ◦ B. Exercise: check that
A ◦ B is always a latin square!
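The product construction is easy to implement, which also gives a way to spot-check the exercise. In the sketch below (function names mine), each symbol of A ◦ B is represented as a pair (i, s), meaning "symbol s of the copy Bi"; using pairs makes the m relabelled alphabets pairwise disjoint automatically.

```python
def product_square(A, B):
    # The (p, q) block of A ∘ B is the copy B_i of B with i = A[p][q];
    # the entry in position (r, s) of that block is the pair (i, B[r][s]).
    m, n = len(A), len(B)
    return [[(A[p][q], B[r][s]) for q in range(m) for s in range(n)]
            for p in range(m) for r in range(n)]

def is_latin(S):
    # Each row and each column contains every one of the n symbols exactly once.
    n = len(S)
    syms = {x for row in S for x in row}
    return (len(syms) == n
            and all(set(row) == syms for row in S)
            and all({S[i][j] for i in range(n)} == syms for j in range(n)))

A = [[1, 2], [2, 1]]
B = [[1, 2, 3], [2, 3, 1], [3, 1, 2]]
P = product_square(A, B)
print(len(P), is_latin(P))  # 6 True
```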
We can use this product construction to produce new pairs of orthogonal latin
squares:
Lemma 32. Suppose (A, C) is a pair of orthogonal latin squares of order m, and
(B, D) is a pair of orthogonal latin squares of order n. Then (A ◦ B, C ◦ D) is
a pair of orthogonal latin squares of order mn.
Proof. Since relabelling alphabets does not affect orthogonality, we may assume
that A and C both use alphabet {1, 2, . . . , m}. Let us produce A ◦ B by taking
m copies B1 , . . . , Bm of B (with disjoint alphabets), and let us produce C ◦ D
by taking m copies D1 , . . . , Dm of D (with disjoint alphabets). Let X denote
the alphabet of A ◦ B (which is the union of all the alphabets of B1 , . . . , Bm ),
and let Y denote the alphabet of C ◦ D (which is the union of all the alphabets
of D1 , . . . , Dm ). We must prove that each ordered pair (x, y) ∈ X × Y appears
exactly once when we list all the ordered pairs ((A ◦ B)p,q , (C ◦ D)p,q ).
Take any x ∈ X and any y ∈ Y . Suppose that x is in the alphabet of Bi
and that y is in the alphabet of Dj . Since A and C are orthogonal, the pair
of symbols (i, j) appears exactly once when we list the ordered pairs (Ar,s , Cr,s ).
Suppose (i, j) appears in entry (r0 , s0 ), so that (Ar0 ,s0 , Cr0 ,s0 ) = (i, j). Then the
pair (x, y) can only possibly appear in the (r0 , s0 ) ‘block’ when we list the ordered
pairs ((A ◦ B)p,q , (C ◦ D)p,q ). Since the (r0 , s0 ) ‘block’ of A ◦ B is exactly Bi , and
the (r0 , s0 ) ‘block’ of C ◦ D is exactly Dj , and Bi and Dj are orthogonal to one
another, the pair of symbols (x, y) appears exactly once in the (r0 , s0 ) block.
Therefore, each pair of symbols (x, y) ∈ X × Y appears exactly once when we list
the ordered pairs ((A ◦ B)p,q , (C ◦ D)p,q ), so (A ◦ B, C ◦ D) is a pair of orthogonal
latin squares.
It is helpful to illustrate this proof with an example. Let us take m = n = 3,
and

    1 2 3         3 1 2         a b c         a b c
A = 2 3 1 ,   C = 2 3 1 ,   B = b c a ,   D = c a b .
    3 1 2         1 2 3         c a b         b c a
Then let us take copies

     a b c          d e f          g h i
B1 = b c a ,   B2 = e f d ,   B3 = h i g
     c a b          f d e          i g h

and

     a b c          d e f          g h i
D1 = c a b ,   D2 = f d e ,   D3 = i g h .
     b c a          e f d          h i g
The ‘blocks’ of A ◦ B and C ◦ D are as follows:
B1 B2 B3
A ◦ B = B2 B3 B1 ,
B3 B1 B2
D3 D1 D2
C ◦ D = D2 D3 D1 .
D1 D2 D3
Suppose x is in the alphabet of B2 and y is in the alphabet of D3 . The only ‘block’
in which B2 appears in A ◦ B and D3 appears in C ◦ D is the (3, 3) ‘block’. Since
B2 and D3 are orthogonal to one another, the pair (x, y) must appear exactly
once in this ‘block’.
We can use this to construct a pair of orthogonal latin squares of order n, for
any n ≡ 0 (mod. 4).
Corollary 33. If n ≡ 0 (mod. 4), then there exists a pair of orthogonal latin
squares of order n.
Proof. Write n = 2d q, where q is odd and d ≥ 2. There exists a pair of orthogonal
latin squares of order 2d (by Theorem 31), and there exists a pair of orthogonal
latin squares of order q (by Theorem 30). Therefore, by Lemma 32, there exists
a pair of orthogonal latin squares of order 2d q.
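Putting the pieces together, the whole chain of constructions can be run end to end for a concrete order, say n = 12 = 4 · 3. The sketch below (names mine) takes the order-3 pair from Theorem 30, the order-4 pair L(1), L(α) from Theorem 31 (hard-coded as integer matrices, with α, β written as 2, 3), and composes them as in Lemma 32.

```python
def compose(A, B):
    # Product construction A ∘ B; pair-symbols give disjoint alphabets.
    m, n = len(A), len(B)
    return [[(A[p][q], B[r][s]) for q in range(m) for s in range(n)]
            for p in range(m) for r in range(n)]

def are_orthogonal(A, B):
    n = len(A)
    return len({(A[i][j], B[i][j]) for i in range(n) for j in range(n)}) == n * n

# Orthogonal pair of order 3, from Theorem 30...
A3 = [[(i + j) % 3 for j in range(3)] for i in range(3)]
C3 = [[(2 * i + j) % 3 for j in range(3)] for i in range(3)]
# ...and of order 4, from Theorem 31 over F_4: L(1) and L(alpha).
B4 = [[0, 1, 2, 3], [1, 0, 3, 2], [2, 3, 0, 1], [3, 2, 1, 0]]
D4 = [[0, 1, 2, 3], [2, 3, 0, 1], [3, 2, 1, 0], [1, 0, 3, 2]]

P, Q = compose(A3, B4), compose(C3, D4)
print(len(P), are_orthogonal(P, Q))  # 12 True
```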
Sometimes, we are interested in finding a family of latin squares of order n,
which are all orthogonal to one another. This motivates the following definition.
Definition. A family of mutually orthogonal latin squares is a family of latin
squares such that any two distinct latin squares in the family are orthogonal to
one another.
What is the maximum possible number of squares we can have in a family of
mutually orthogonal latin squares? Recall that if n = pd , where p is prime and
d ∈ N, and Fpd denotes a finite field of order pd , then the latin squares
{L(f ) : f ∈ Fpd \ {0}}
are all orthogonal to one another, so they form a family of n − 1 mutually orthogonal latin squares of order n. It turns out that we cannot have a larger family of
mutually orthogonal latin squares:
Theorem 34. Let n ∈ N, and let A be a family of mutually orthogonal latin
squares of order n. Then |A| ≤ n − 1.
Proof. Suppose for a contradiction that there exists a family A of at least n
mutually orthogonal latin squares of order n. By removing some of the squares
if necessary, we may produce a family B of n mutually orthogonal latin squares
of order n. Let B = {B(1), . . . , B(n)}.
Since relabelling alphabets does not affect orthogonality, we may assume
that each B(i) has alphabet {1, 2, . . . , n}, and that the first row of each B(i)
is 1, 2, . . . , n. Now, where can the symbol 1 go in the second row of the square
B(i)? It can never go in the (2, 1)-space, as the symbol 1 appears in the (1, 1)-space of B(i), and each B(i) is a latin square. So for each B(i), the symbol 1
must appear in one of the spaces (2, 2), (2, 3), . . . or (2, n). But there are n of
these squares B(i), and only n − 1 of these spaces, so there must be two latin
squares (B(r) and B(s), say) where the symbol 1 appears in the same space,
(2, j) say. But then the ordered pair of symbols (1, 1) appears twice when we list
the ordered pairs (B(r)p,q , B(s)p,q ) — once in the space (1, 1) (when p = q = 1),
and once in the space (2, j) (when p = 2 and q = j). This contradicts the fact
that B(r) and B(s) are orthogonal to one another, proving the theorem.
4.3 Upper bounds on the number of latin squares
How many latin squares of order n are there with alphabet {1, 2, . . . , n}? Let
Ln denote the set of all latin squares of order n, with alphabet {1, 2, . . . , n}.
Estimating |Ln | accurately is an important unsolved problem in Combinatorics!
The best known upper and lower bounds for |Ln | are quite far apart. We saw
previously (Corollary 29) that
|Ln | ≥ n!(n − 1)!(n − 2)! . . . 2!1!.
In this section, we shall prove some simple upper bounds on |Ln |. First, we have
the following (very crude) upper bound.
Lemma 35. For any n ∈ N, |Ln | ≤ (n!)n .
Proof. Let L be a latin square with alphabet {1, 2, . . . , n}. Each row of L is a
permutation of {1, 2, . . . , n} (thinking of a permutation as an ordering). If we
choose a latin square row-by-row, there are n! possibilities for the first row, and
then there are at most n! possibilities for each subsequent row. Therefore, there
are at most (n!)n possibilities altogether.
We can improve on this by noting that in fact, each row of a latin square
below the first row must be a derangement of the first row — meaning that
there is no column where the two rows have the same number.
First of all, notice the following fact.
Lemma 36. The number of latin squares in Ln whose first row is
f (1), f (2), . . . , f (n)
is the same for all permutations f ∈ Sn .
Proof. The idea of the proof is just to relabel the alphabet of the latin squares.
We define a bijection Φf from the set of all latin squares in Ln with first row
1, 2, . . . , n to the set of all latin squares in Ln with first row f (1), f (2), . . . , f (n),
as follows. If L is a latin square with first row 1, 2, . . . , n, define Φf (L) to be the
latin square produced from L by replacing the symbol i with the symbol f (i),
wherever i occurs in L, for each i. It is clear that Φf is a bijection. Therefore, the
number of latin squares with first row f (1), f (2), . . . , f (n) is equal to the number
of latin squares with first row 1, 2, . . . , n, for all f , proving the lemma.
Definition. A latin square of order n is said to be row-standard if its first row
is 1, 2, . . . , n. We let L̃n denote the set of all row-standard latin squares of order
n.
It follows from Lemma 36 that
|Ln | = n!|L̃n |.
Notice that if L ∈ L̃n , then any row of L below row 1 must be a derangement of
{1, 2, . . . , n}, so there are at most dn choices for each row below row 1, where dn
is the number of derangements of {1, 2, . . . , n}. It follows that
|L̃n | ≤ (dn )n−1 .
(4.1)
We would like to use the estimate dn = [n!/e] to deduce that
|L̃n | ≤ (n!/e)n−1 .
(4.2)
(Recall that if y is a real number, [y] denotes the closest integer to y, rounded
down if y is of the form m + 1/2 for some integer m.)
If n is odd, then we have dn < n!/e, so (4.2) follows directly from (4.1). If
n is even, however, we have dn > n!/e, so we cannot just apply (4.1). Instead,
note that the number of choices for row 2 is dn , but row 3 cannot equal row 2,
so there are at most dn − 1 choices for row 3 (and for all subsequent rows). This
gives us the slightly better upper bound
|L̃n | ≤ dn (dn − 1)n−2 .
If n is even, then we have dn − 1/2 < n!/e < dn , so
dn (dn − 1) < d2n − dn + 1/4 = (dn − 1/2)2 < (n!/e)2 ,
and therefore
|L̃n | ≤ dn (dn − 1)n−2 = dn (dn − 1)(dn − 1)n−3 ≤ (n!/e)2 (n!/e)n−3 = (n!/e)n−1
(provided n > 2), which is what we wanted. It follows that
|Ln | = n!|L̃n | ≤ n!(n!/e)n−1 = (n!)n /en−1 ,
provided n > 2. (Note that this does not hold for n = 2, since (2!)2 /e = 4/e < 2,
but there are two latin squares of order 2 with alphabet {1, 2}.) We have proved
the following.
Theorem 37. If n > 2, then
|Ln | < (n!)n /en−1 .
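For the very smallest orders, Theorem 37 can be checked directly by brute force. The sketch below (function name mine) counts latin squares row by row; the known counts are |L2| = 2, |L3| = 12 and |L4| = 576, and the printout also illustrates why the hypothesis n > 2 is needed (for n = 2 the bound 4/e is below the true count).

```python
from itertools import permutations
from math import e, factorial

def count_latin(n):
    # Brute-force count of latin squares with alphabet {0, ..., n-1},
    # built row by row; only feasible for very small n.
    rows = list(permutations(range(n)))
    def extend(square):
        if len(square) == n:
            return 1
        return sum(extend(square + [r]) for r in rows
                   if all(r[j] != prev[j] for prev in square for j in range(n)))
    return extend([])

for n in (2, 3, 4):
    print(n, count_latin(n), factorial(n) ** n / e ** (n - 1))
```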
In fact, it is known that
|Ln | ≤ (n!)^n / e^{n²(1−εn)},
where εn is a function of n which tends to zero as n tends to infinity, but the
proof of this is slightly beyond the scope of this course.
It is a major open problem in combinatorics to find the ‘asymptotic behaviour’
of |Ln | — i.e., to find a function f (n) such that
|Ln |/f (n) → 1 as n → ∞.
4.4 Transversals in Latin Squares
We conclude the chapter on latin squares with a discussion of the most well-known
unsolved problem in the area.
Definition. Let L be a latin square of order n. A transversal of L is a set of n
entries of L such that there is exactly one entry from each row and exactly one
entry from each column, and each symbol occurs in exactly one of the entries.
For example, one can check that each of the two latin squares below contains a
transversal:

1 2 3        1 2 3 4
2 3 1        2 1 4 3
3 1 2        3 4 1 2
             4 3 2 1
Conjecture (Ryser’s Conjecture, 1967). Every latin square of odd order has a
transversal.
This is perhaps the best-known unsolved problem on latin squares. It is a
good example of a problem in mathematics which is very simple to state (indeed,
it can be explained to someone without a formal mathematical training), but
which has baffled mathematicians for a very long time!
The hypothesis that n is odd is necessary — indeed, for every even n there
exists a latin square of order n which has no transversal:
Exercise 9. Prove that if n is even, then the addition table of Zn has no transversal.
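Exercise 9 can be spot-checked by computer for small n before attempting a proof. A transversal amounts to choosing one cell per row with the columns forming a permutation and the chosen symbols all distinct, so a brute-force search over permutations suffices; the function name below is mine.

```python
from itertools import permutations

def has_transversal(L):
    # A transversal picks one cell per row, in distinct columns (so the column
    # choices form a permutation p), with the n chosen symbols all distinct.
    n = len(L)
    return any(len({L[i][p[i]] for i in range(n)}) == n
               for p in permutations(range(n)))

for n in range(2, 7):
    Z = [[(i + j) % n for j in range(n)] for i in range(n)]
    print(n, has_transversal(Z))  # False exactly when n is even
```

(For odd n the main diagonal already works, since the entries 2i mod n are then all distinct.)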
Notice that if L is a latin square which has a latin square orthogonal to it,
then L has a transversal. To see this, suppose that L is a latin square of order
n, and that M is a latin square orthogonal to L. Then for any symbol i in the
alphabet of M , the set of entries where M has symbol i is a transversal of L. So
in fact, the n2 entries of L can be partitioned into n disjoint transversals!
We proved that whenever n is not congruent to 2 (modulo 4), there exists a
pair of orthogonal latin squares of order n. It follows that for any odd n, there
exists a latin square of order n that has a transversal. However, this does not
prove that every latin square of that order has a transversal!