Isometries of the plane
1. Isometries of the plane
We will denote the Euclidean plane by R2 . If a and b are points in the plane, we
write |ab| for the (Euclidean) distance between a and b.
Definition 1.1. A map Φ : R2 → R2 is called an isometry if |Φ(a)Φ(b)| = |ab| for
any pair of points a, b ∈ R2 .
Problem 1.2. Prove that an isometry takes different points to different points, i.e.
if a ≠ b, then Φ(a) ≠ Φ(b).
Example 1.3. Consider the map taking a point with coordinates (x, y) to the
point with coordinates (x + 1, y). This is an isometry. The following maps are also
isometries:
(x, y) ↦ (x, −y),
(x, y) ↦ (x + 1, −y),
(x, y) ↦ (−y, x).
This can be easily verified using the Pythagorean theorem: the distance between a
point a with coordinates (x, y) and a point a′ with coordinates (x′, y′) is
√((x − x′)² + (y − y′)²).
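This check is easy to automate. Below is a quick numerical sketch in Python (not part of the original notes) testing the maps of Example 1.3 on random pairs of points:

```python
import math
import random

def dist(p, q):
    # Euclidean distance, as in the formula above.
    return math.hypot(p[0] - q[0], p[1] - q[1])

# The candidate isometries from Example 1.3.
maps = [
    lambda x, y: (x + 1, y),   # translation
    lambda x, y: (x, -y),      # reflection in the x-axis
    lambda x, y: (x + 1, -y),  # glide reflection
    lambda x, y: (-y, x),      # rotation by pi/2 about the origin
]

random.seed(0)
for phi in maps:
    for _ in range(100):
        a = (random.uniform(-5, 5), random.uniform(-5, 5))
        b = (random.uniform(-5, 5), random.uniform(-5, 5))
        assert math.isclose(dist(phi(*a), phi(*b)), dist(a, b))
```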
The notion of isometry is closely related to symmetry. A mathematical definition of symmetry is the following:
Definition 1.4. A set A ⊂ R2 is said to be symmetric with respect to an isometry
Φ if Φ(A) = A (which means that a ∈ A iff Φ(a) ∈ A). The isometry Φ is called a
symmetry of A.
Proposition 1.5. Any isometry maps straight segments to straight segments, lines
to lines, and circles to circles.
Proof. Recall that a point z ∈ R² lies on the straight segment [x, y] connecting x with y if
and only if |xz| + |zy| = |xy|. It follows that |Φ(x)Φ(z)| + |Φ(z)Φ(y)| = |Φ(x)Φ(y)|,
i.e. Φ(z) belongs to the segment [Φ(x), Φ(y)]. A proof of the other two statements
is left to the reader. □
Proposition 1.6. Any isometry preserves angles.
Proof. Indeed, let abc be any triangle. Then Φ(a)Φ(b)Φ(c) is a congruent triangle
because the sides of the two triangles are equal. It follows that the corresponding
angles are also equal.
□
We will now define some particular classes of isometries.
Definition 1.7. A map Φ : R2 → R2 is called a translation if, for every pair
x, y ∈ R2 , the four points x, y, Φ(y) and Φ(x) form a parallelogram (so that [x, y]
and [Φ(x), Φ(y)] are two sides of this parallelogram).
Problem 1.8. Prove that any translation is an isometry.
Problem 1.9. Show that the map (x, y) ↦ (x + α, y + β) is a translation for any
pair of real numbers α, β ∈ R. Moreover, any translation has this form.
Problem 1.10. Show that the graph of the function y = sin x has translational
symmetry. Find all such symmetries.
Definition 1.11. A map Φ : R2 → R2 is called a rotation around a point a by an
angle θ if Φ(a) = a and, for every b ≠ a, we have |ab| = |aΦ(b)|, and the angle
baΦ(b) is θ. Here θ can be any real number in [0, 2π). We measure the angle baΦ(b)
by going counterclockwise from b to Φ(b).
Many flowers have rotational symmetry. A regular n-gon is symmetric w.r.t. a
rotation by angle 2π/n.
Problem 1.12. Prove that any rotation is an isometry.
Problem 1.13. Prove that the rotation by an angle θ around (0, 0) is given by the
formula
(x, y) ↦ (x cos θ − y sin θ, x sin θ + y cos θ).
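A quick numerical sanity check of the rotation formula (in Python, not part of the original notes), using the counterclockwise convention of Definition 1.11:

```python
import math

def rotate(theta, x, y):
    # Counterclockwise rotation by theta about the origin
    # (the convention of Definition 1.11).
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

# Rotation by pi/2 takes (1, 0) to (0, 1), i.e. it really is counterclockwise.
px, py = rotate(math.pi / 2, 1.0, 0.0)
assert math.isclose(px, 0.0, abs_tol=1e-12) and math.isclose(py, 1.0)

# The origin is fixed, and distances from it are preserved.
for theta in (0.3, 1.0, 2.5):
    qx, qy = rotate(theta, 3.0, 4.0)
    assert math.isclose(math.hypot(qx, qy), 5.0)
```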
Definition 1.14. A map Φ : R2 → R2 is called a reflection in a line L if, for every
a ∈ L we have Φ(a) = a and, for every a ∉ L, the segment [a, Φ(a)] is bisected by
its intersection point with L.
Problem 1.15. Prove that any reflection is an isometry.
2. Classification of isometries
Note that the identity transformation that maps every point to itself is an isometry. It is a translation (by the zero vector) and a rotation (by angle 0) at the same time.
Problem 2.1. Suppose that an isometry fixes two different points on a line L. Prove
that this isometry is a reflection or the identity.
Theorem 2.2. Suppose that an isometry fixes three non-collinear points. Then this
isometry is the identity transformation.
Problem 2.3. Prove this theorem.
Consider two maps Φ, Ψ : R2 → R2 . Recall that the composition of Φ and Ψ is
defined as the map taking a point a ∈ R2 to the point Ψ(Φ(a)). We denote the
composition by Ψ ◦ Φ (note that the map written on the right is applied first).
Theorem 2.4. Every isometry is a composition of at most three reflections.
Proof. Let Φ be an isometry, and abc any triangle. Set a′ = Φ(a), b′ = Φ(b) and
c′ = Φ(c). It suffices to define a composition of at most three reflections that maps
a, b and c to a′, b′ and c′, respectively. The map Φ will necessarily coincide with this
composition of reflections by Theorem 2.2. By the first reflection, we map the point
a to the point a′ (we need to reflect in the perpendicular bisector of the segment
[a, a′]). We can use another reflection to map b to b′ while keeping a fixed. Finally, we
may need a third reflection to map c to c′. (This argument will not make much
sense until you draw pictures, either on paper or in your mind.) □
We now need to describe compositions of at most three reflections explicitly. Compositions of two reflections are described below:
Problem 2.5. Let L and L′ be a pair of intersecting lines. Consider reflections R
and R′ in L and L′, respectively. Denote the angle between L and L′ by θ. Prove
that the composition R ◦ R′ is a rotation by angle 2θ around the intersection point of L and L′.
Problem 2.6. Let L and L′ be a pair of parallel lines. Denote the corresponding
reflections by R and R′. Then R ◦ R′ is a translation.
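Both statements can be checked numerically. A sketch in Python (not from the notes), using the standard formula for reflection in a line through the origin at angle φ to the x-axis:

```python
import math

def reflect(phi, p):
    # Reflection in the line through the origin making angle phi with the x-axis.
    x, y = p
    c, s = math.cos(2 * phi), math.sin(2 * phi)
    return (c * x + s * y, s * x - c * y)

# Intersecting mirrors: reflect in the x-axis, then in the line at angle
# theta/2. The mirrors meet at angle theta/2, so the composition should be
# the rotation by 2 * (theta/2) = theta (Problem 2.5).
theta, p = 0.7, (2.0, 1.0)
q = reflect(theta / 2, reflect(0.0, p))
r, a = math.hypot(*p), math.atan2(p[1], p[0])
assert math.isclose(q[0], r * math.cos(a + theta))
assert math.isclose(q[1], r * math.sin(a + theta))

# Parallel mirrors: two horizontal lines at heights 0 and d give the
# translation by the vector (0, 2d) (Problem 2.6).
def reflect_h(h, p):
    return (p[0], 2 * h - p[1])

d = 1.3
q = reflect_h(d, reflect_h(0.0, p))
assert math.isclose(q[0], p[0]) and math.isclose(q[1], p[1] + 2 * d)
```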
We say that a ∈ R2 is a fixed point of a map Φ : R2 → R2 if Φ(a) = a. We also
say that Φ fixes the point a.
Theorem 2.7. Any isometry with a fixed point is either a rotation or a reflection.
Proof. Let a be a fixed point of an isometry Φ. Consider a triangle abc. Let ab′c′ be
the image of this triangle under Φ. There is a reflection and there is a rotation that
take the segment [a, b] to the segment [a, b′]. Either the reflection or the rotation
takes c to c′ (why?). □
Problem 2.8. Show that the composition of two rotations is either a rotation or a
translation. Hint: represent each rotation as a composition of two reflections. Note
that there is a certain freedom in the choice of the mirrors.
Definition 2.9. Define a glide reflection as the composition of a reflection in a line
L and a translation that maps L to itself.
A half-turn around a point a is defined as the rotation around a by angle π.
Proposition 2.10. The composition of a half-turn and a reflection is a glide reflection.
Proof. Consider the composition of the half-turn around a point a ∈ R² and the
reflection in a line L. One can choose a coordinate system such that the line L is
the x-axis, and the point a is on the y-axis, i.e. a = (0, α). Then the half-turn is
(x, y) ↦ (−x, 2α − y),
and the reflection is
(x, y) ↦ (x, −y).
The composition of the two isometries (first the half-turn, then the reflection) is
(x, y) ↦ (−x, y − 2α).
This is clearly a glide reflection in the line x = 0. □
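The computation in this proof is easy to verify directly; a small Python check (α = 3/2 is an arbitrary choice):

```python
# Verify Proposition 2.10 numerically: half-turn about (0, alpha) followed by
# reflection in the x-axis equals the glide reflection in the line x = 0.
alpha = 1.5

def half_turn(p):
    # Half-turn (rotation by pi) around the point (0, alpha).
    return (-p[0], 2 * alpha - p[1])

def reflect_x(p):
    # Reflection in the x-axis.
    return (p[0], -p[1])

def glide(p):
    # Reflection in the line x = 0 followed by translation by (0, -2*alpha).
    return (-p[0], p[1] - 2 * alpha)

for p in [(1.0, 2.0), (-3.0, 0.5), (0.0, 0.0)]:
    assert reflect_x(half_turn(p)) == glide(p)
```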
Theorem 2.11. Any isometry of the plane is a reflection, rotation, translation or
a glide reflection.
Proof. Let Φ be an isometry. Choose any point a ∈ R², and let H be the half-turn
mapping Φ(a) to a. Then the composition H ◦ Φ fixes a. We already know that such
an isometry is a rotation or a reflection. Since H ◦ H is the identity, we have
Φ = H ◦ (H ◦ Φ). Now use Proposition 2.10 and Problem 2.8. □
3. Transformation groups
Let Φ : R2 → R2 be a transformation of the plane (i.e. a map such that for
every b ∈ R2 there is a unique a ∈ R2 satisfying b = Φ(a)). Define Φ−1 (the inverse
transformation) as the map mapping any point b ∈ R2 to the point a such that
b = Φ(a).
A set G of transformations is called a transformation group if
• the identity transformation is in G,
• the composition of two elements of G also belongs to G,
• the inverse element to any element of G also belongs to G.
We can define a transformation group of any set, not necessarily a plane, in the
same way. Clearly, all isometries of the plane form a transformation group.
Problem 3.1. Prove that all translations form a group. All rotations around a
fixed point a also form a group. The set of all rotations is not a group. Neither is
the set of all reflections.
Problem 3.2. For any set A ⊂ R2 , all symmetries of A form a transformation
group. This group is called the symmetry group of A.
Problem 3.3. Draw a figure whose symmetry group consists of exactly 5 elements.
Problem 3.4. Find the symmetry group of a regular n-gon (i.e. count and describe
all symmetries).
4. Wallpaper groups
A group Γ of isometries of the plane is called a wallpaper group (or a plane crystallographic group) if there is a polygon P with the following properties:
• the images of P under Γ cover the plane: R² = ⋃_{g∈Γ} g(P ),
• different images are almost disjoint: they can only intersect in a part of the
boundary.
The images g(P ) are called tiles. Note that the same wallpaper group can have
tiles of many different shapes. Thus we are mainly interested in the structure of the
group rather than in the shape of a tile. All wallpaper groups were classified by Fedorov
in 1891. His work was motivated by crystallography. There are 17 wallpaper groups,
but only 5 consist entirely of orientation preserving isometries. We will sketch an
argument describing these 5 orientation preserving wallpaper groups.
First, let us discuss some general properties of a wallpaper group Γ. We fix some
particular tile P from the definition of a wallpaper group.
Proposition 4.1. Every bounded domain of the plane can intersect only finitely
many tiles.
Indeed, each tile has a definite area (the area of P ), thus infinitely many tiles would require infinite area.
Fix a point a ∈ R². Define the orbit of a under the action of Γ as the set
Γa = {g(a) | g ∈ Γ}.
Proposition 4.2. Every bounded domain of the plane can contain only finitely many
points of Γa.
Suppose that a is not on the boundary of a tile. Then different images g(a) are
contained in different tiles. Thus Proposition 4.2 follows from Proposition 4.1 in
this case. One should be a little bit more careful if a belongs to the boundary of a
tile.
Every translation Φ is characterized by its vector v⃗ : for any point a ∈ R², we
have Φ(a) = a + v⃗. The existence of such a vector v⃗ follows from the definition of a
translation.
We will now assume that Γ consists of rotations and translations only.
Proposition 4.3. The group Γ contains a translation by a nonzero vector.
Proof. Indeed, if r and s are rotations with different centers, then r ◦ s ◦ r⁻¹ ◦ s⁻¹ is
a translation by a nonzero vector (check this!). It remains to consider the case when
Γ consists of rotations with the same center. But in this case, the images of P will
be at bounded distance from the center, thus they cannot cover the whole plane, a
contradiction. □
Proposition 4.4. The group Γ contains two translations by non-collinear vectors.
Proof. We already know that Γ contains a translation by a nonzero vector v⃗. Suppose
that g is a translation by vector v⃗, and r a rotation. Then r ◦ g ◦ r⁻¹ is a translation
by vector r(v⃗ ). Thus, if Γ does not contain two translations by non-collinear vectors,
then all non-trivial rotations in Γ are half-turns. Now, let g and h be the half-turns
around a and b, respectively. Then g ◦ h is a translation along the line ab. Therefore,
the centers of all half-turns should lie on the same line parallel to v⃗. But in this
case all tiles will be at bounded distance from this line, thus they cannot cover the
whole plane. □
Let a ∈ R² be any point. Choose a shortest vector v⃗ such that Γ contains the
translation by v⃗. Then choose a shortest vector w⃗ such that w⃗ is not collinear with
v⃗, and the translation by w⃗ is contained in Γ. Form the following set
Λa = {a + n v⃗ + m w⃗ | n, m ∈ Z}.
Such sets are called lattices.
Proposition 4.5. The lattice Λa is stable under all translations in Γ, i.e. for every
b ∈ Λa and every translation t ∈ Γ we have t(b) ∈ Λa.
Proof. It suffices to take b = a. We have t(a) = a + x v⃗ + y w⃗, where x and y are not
both integers if t(a) ∉ Λa. We can assume without loss of generality that |x| < 1, |y| < 1
and |x| + |y| < 1 (otherwise apply a suitable composition of translations by vectors
v⃗ and w⃗ ). Then the distance between a and t(a) is less than the length of w⃗, a
contradiction with the choice of w⃗. □
Proposition 4.6. Let r ∈ Γ be a rotation around a. Then r preserves the lattice
Λa, i.e. for every b ∈ Λa, we have r(b) ∈ Λa.
Proof. Set b = a + ξ⃗, and let t be the translation by vector ξ⃗. Clearly, t ∈ Γ. We
know that t′ = r ◦ t ◦ r⁻¹ is a translation by vector r(ξ⃗ ). Therefore, t′(a) = a + r(ξ⃗ ) =
r(b). On the other hand, t′ ∈ Γ since Γ is a transformation group. It follows that
r(b) = t′(a) ∈ Λa by Proposition 4.5. □
It remains to describe all rotations that preserve a lattice.
Proposition 4.7. Suppose that a rotation r around a preserves a lattice Λa . Then
r is the rotation by an integer multiple of π/2 or π/3.
Proof. Consider the vectors v⃗ and w⃗ from the definition of the lattice Λa. Suppose
that the angle of the rotation r is not a multiple of π. Replacing r with rⁿ if
necessary, we can arrange that the angle of r is an integer fraction of 2π.
Since r(v⃗ ) and v⃗ have the same length, these are two shortest vectors in Λa; in
particular, the length of w⃗ coincides with the length of v⃗. The vector r(v⃗ ) − v⃗ must
be no shorter than v⃗, therefore the angle between v⃗ and r(v⃗ ) must be no less than
π/3. On the other hand, this angle is an integer fraction of 2π. The result now
follows. □
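An equivalent classical way to see this restriction (not spelled out in the notes): written in the lattice basis v⃗, w⃗, the rotation r is given by an integer matrix, and its trace 2 cos θ is independent of the basis, so 2 cos θ must be an integer. A short Python search over rotation orders:

```python
import math

# A rotation by theta preserving a lattice is an integer matrix in a lattice
# basis, so its trace 2*cos(theta) must be an integer.
orders = []
for n in range(1, 50):
    t = 2 * math.cos(2 * math.pi / n)
    if math.isclose(t, round(t), abs_tol=1e-9):
        orders.append(n)

# Only rotations of order 1, 2, 3, 4 and 6 survive, i.e. the rotation
# angle is an integer multiple of pi/2 or pi/3 (Proposition 4.7).
assert orders == [1, 2, 3, 4, 6]
```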
This proposition shows at least that a complete classification of wallpaper groups
is possible. The actual classification requires a little bit of extra work.
5. Space symmetry groups
We will now consider symmetry groups of 3-dimensional objects. Let us start
with the description of isometries of R³. The following theorem shows that these
are largely similar to isometries of the plane, at least if we speak of orientation-preserving isometries:
Theorem 5.1. Any orientation preserving isometry of R3 that fixes a point is a
rotation around some axis.
6. Regular polyhedra
There are 5 regular polyhedra in 3-space. Their symmetry groups are very important. A regular tetrahedron has 4 triangular faces, 4 vertices and 6 edges. The
faces are equilateral triangles. A regular cube has 6 square faces, 8 vertices and 12
edges. The faces of a regular octahedron are equilateral triangles, there are 8 faces, 6
vertices and 12 edges. Exactly 4 faces meet at each vertex. A regular dodecahedron
has 12 pentagonal faces, meeting 3 at a vertex, 20 vertices and 30 edges. A regular
icosahedron has 20 triangular faces, meeting 5 at a vertex, 12 vertices and 30 edges.
7. The Euler theorem
If you make a table showing the numbers of vertices (V ), edges (E) and faces
(F ) of regular polyhedra, then you see two remarkable things: one is that certain
pairs of regular polyhedra have the same numbers V , E, F , but in different order;
another observation is that V − E + F = 2. We start with the second statement.
It is more general, because it concerns not only regular polyhedra but all convex
polyhedra (and many non-convex polyhedra as well).
Theorem 7.1. Let ∆ be a convex polyhedron in R3 (convex means that ∆ stays on
one side of every plane containing a face of ∆). Then V − E + F = 2, where V , E
and F refer to the number of vertices, edges and faces, respectively, of ∆.
This theorem, due to Euler, is probably the oldest and the most important result
in geometric combinatorics.
Before proving the theorem, we need one definition. A convex polyhedron ∆ is said
to be simple if exactly 3 edges meet at each vertex. There are many simple polyhedra.
In fact, by varying all faces a little bit, one can make any convex polyhedron simple.
Proof of Theorem 7.1 for simple polyhedra. Let ∆ be a simple polyhedron. We can
assume that no edge of ∆ is horizontal, otherwise just rotate ∆ by a small angle.
For every vertex v of ∆, define the index of v as the number of edges going down
from v. The index of a vertex can be equal to 0,1,2 or 3. Moreover, there is only
one vertex of index 3 (the top one), and only one vertex of index 0 (the bottom
one). All other vertices have indices 1 or 2. Let h1 , respectively h2 , be the number
of vertices of index 1, respectively, 2. We can count the number of vertices, faces
and edges of ∆ as follows:
V = 2 + h1 + h2,
E = 3 + h1 + 2h2,
F = 3 + h2.
The first equality just says that to count all vertices, we need to count the top
vertex, the bottom vertex, and the vertices of index 1 or 2. In the second equality,
we count edges as follows: there are 3 edges going down from the top vertex, 1 edge
going down from every vertex of index 1, and 2 edges going down from every vertex
of index 2; moreover, every edge goes down from a vertex of index 1, 2 or 3. In
the third equality, we count faces as follows: every face has the top vertex — this
is a vertex of index 2 or 3; for a vertex of index 2, there is only one face having
this vertex as the top; for the top vertex of ∆, there are 3 such faces. Using the
expressions for V , E and F obtained above, we conclude that V − E + F = 2. □
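The index count above can be replayed on a concrete simple polyhedron. A Python sketch for the cube (not from the notes; a generic linear height function plays the role of the small rotation):

```python
import itertools
import random

# Vertices and edges of the unit cube (a simple polyhedron).
verts = list(itertools.product([0.0, 1.0], repeat=3))
edges = [(u, v) for u, v in itertools.combinations(verts, 2)
         if sum(a != b for a, b in zip(u, v)) == 1]

# A generic height function: no two vertices at the same height and no
# horizontal edges (this replaces the small rotation in the proof).
random.seed(1)
w = (random.random(), random.random(), random.random() + 1.0)
height = {v: sum(a * b for a, b in zip(v, w)) for v in verts}

# index[v] = number of edges going down from v.
index = {v: 0 for v in verts}
for u, v in edges:
    upper = u if height[u] > height[v] else v
    index[upper] += 1

counts = [list(index.values()).count(i) for i in range(4)]
h1, h2 = counts[1], counts[2]
V, E, F = 2 + h1 + h2, 3 + h1 + 2 * h2, 3 + h2
assert (V, E, F) == (8, 12, 6) and V - E + F == 2
```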
We can now prove Theorem 7.1 in general, by reducing it to the case of simple
polyhedra:
Proof of Theorem 7.1. Let ∆ be any convex polyhedron. Consider the polyhedron
∆̂ obtained from ∆ by cutting off all vertices. Clearly, ∆̂ is a simple polyhedron.
Let V̂, Ê and F̂ denote the number of vertices, edges and faces of ∆̂, respectively.
For every vertex v of ∆, let nv denote the number of edges meeting at v. Then there
are also exactly nv faces meeting at v. We have
V̂ = Σ_v nv,   Ê = E + Σ_v nv,   F̂ = F + V,
where the sums are over all vertices of ∆. Since ∆̂ is simple, we know that
V̂ − Ê + F̂ = 2. Using the relations obtained above, we conclude that V − E + F = 2
as well. □
8. The duality of regular polyhedra
There is the following natural question: why are there only 5 regular polyhedra?
To fully answer this question, we would need a definition of a regular polyhedron,
which is straightforward but not exactly simple. Instead, we will show that, under
some natural assumptions on what regular polyhedra are, we can find strong restrictions on the combinatorics of regular polyhedra. Let ∆ be a regular polyhedron.
Denote by n the number of edges incident to a vertex (which should be the same
for all vertices). Also, let k denote the number of edges incident to a face (which
should be the same for all faces).
Problem 8.1. Prove that nV = kF = 2E.
We can now rewrite Euler’s theorem as follows:
1/n + 1/k = 1/2 + 1/E > 1/2.
Also, note that n, k ≥ 3. Indeed, there are at least 3 edges meeting at each vertex,
and at least 3 edges on every face.
Problem 8.2. Prove that there are only 5 pairs (n, k) satisfying the inequalities
k, n ≥ 3,   1/n + 1/k > 1/2.
These are (3, 3), (3, 4), (4, 3), (3, 5) and (5, 3). These pairs correspond to a tetrahedron, a cube, an octahedron, a dodecahedron and an icosahedron, respectively.
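The enumeration in Problem 8.2 is finite and easy to automate; a sketch with exact rational arithmetic (the small search bound is justified since n ≥ 6 forces 1/n + 1/k ≤ 1/6 + 1/3 = 1/2):

```python
from fractions import Fraction

# All pairs (n, k) with n, k >= 3 and 1/n + 1/k > 1/2; if n >= 6 then
# 1/n + 1/k <= 1/6 + 1/3 = 1/2, so a small search range suffices.
pairs = sorted((n, k) for n in range(3, 20) for k in range(3, 20)
               if Fraction(1, n) + Fraction(1, k) > Fraction(1, 2))
assert pairs == [(3, 3), (3, 4), (3, 5), (4, 3), (5, 3)]
```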
Let us now discuss the second remarkable property of regular polyhedra in 3-space: certain pairs of polyhedra have the same numbers V , E, F , but appearing
in a different order. This is explained by the following construction. Let ∆ be a
regular polyhedron. Then the centers of faces of ∆ are vertices of a new regular
polyhedron ∆∗ .
Problem 8.3. Prove this statement.
We call ∆∗ the dual polyhedron to ∆.
For example, the octahedron is dual to the cube, the cube is dual to the octahedron. The tetrahedron is dual to itself. The dodecahedron is dual to the icosahedron,
and vice versa. Thus dual polyhedra come in pairs.
Problem 8.4. Let V ∗ , E ∗ and F ∗ be the number of vertices, edges and faces,
respectively, of ∆∗ . Show that V ∗ = F , E ∗ = E and F ∗ = V .
9. Symmetry groups of regular polyhedra
We first give a general definition. Let G be a group of transformations of R3 ,
and a ∈ R3 . Consider the set of all g ∈ G such that g(a) = a. This set is also
a transformation group (check this!), which is called the stabilizer of a in G and
denoted by Ga .
We first describe the symmetry group of a regular tetrahedron. First, find the
stabilizer of a vertex. It consists of 3 rotations and 3 reflections. Actually, it is
no coincidence that it has the same number of elements as the symmetry group of a
triangle: the stabilizer of a vertex is essentially the symmetry group of the opposite
face.
Proposition 9.1. There are exactly 24 symmetries of a regular tetrahedron.
Proof. Let ∆ be a regular tetrahedron, and Γ its symmetry group. Fix a vertex v0
of ∆. For every vertex v, fix a symmetry gv of ∆ mapping v0 to v (such a symmetry
always exists). Now let g be any symmetry of ∆. Then g can be uniquely represented
as g = h ◦ gv , where v = g(v0 ), and h ∈ Γv . Thus the number of elements in Γ is the
number of vertices in ∆ times the number of elements in Γv (which is independent
of v). In our case, this is 4 times 6 = 24.
□
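Proposition 9.1 can also be confirmed by brute force: realize the tetrahedron as alternate vertices of a cube and check that every permutation of its 4 vertices preserves all pairwise distances, hence extends to an isometry of R³. A Python sketch (not from the notes):

```python
import itertools
import math

# A regular tetrahedron: alternate vertices of the cube [-1, 1]^3.
T = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]

# All pairwise distances are equal (every pair of vertices is an edge),
# so every permutation of the vertices preserves distances.
count = 0
for perm in itertools.permutations(range(4)):
    if all(math.isclose(math.dist(T[i], T[j]),
                        math.dist(T[perm[i]], T[perm[j]]))
           for i, j in itertools.combinations(range(4), 2)):
        count += 1
assert count == 24
```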
Problem 9.2. Describe all symmetries of a regular tetrahedron.
The symmetry group of a regular tetrahedron consists of the following conjugacy
classes:
(1) the identity map (1)
(2) rotations by 2π/3 or 4π/3 around axes passing through a vertex and the
center of the opposite face (8)
(3) half-turns around axes passing through the centers of opposite edges (3)
(4) reflections in planes passing through an edge and the center of the opposite
edge (6)
(5) this class is the most complicated: consider a pair of opposite edges; there is
a plane parallel to both edges and at the same distance from them; consider
the reflection in this plane composed with a rotation by π/2 or 3π/2 around
the axis through the midpoints of the two edges; these
symmetries form the remaining conjugacy class (6)
Next, let us discuss symmetries of a regular cube. There are 8 vertices of the
cube, and the stabilizer of each vertex has order 6 (as for the tetrahedron); here the
order of a group means its number of elements. Thus the symmetry group of the
cube has order 48.
There are the following conjugacy classes in the symmetry group of a regular cube:
(1) the identity map (1)
(2) rotations by 2π/3 or 4π/3 around axes passing through a pair of opposite
vertices (8)
(3) rotations by π/2 and 3π/2 around axes passing through the centers of opposite faces (6)
(4) rotations by π around axes passing through the centers of opposite faces (3)
(5) rotations by π around axes passing through the centers of opposite edges (6)
(6) reflections interchanging the pairs of opposite faces (3)
(7) reflections in a mirror passing through a pair of opposite edges (6)
(8) reflections interchanging the pairs of opposite faces composed with rotations
by π/2 or 3π/2 preserving these pairs of faces (6)
(9) the antipodal map (x, y, z) 7→ (−x, −y, −z) (we are assuming that the center
of the cube has coordinates (0, 0, 0)). (1)
(10) there are plane sections of the cube that are regular hexagons — reflections
in such sections composed with rotations by π/3 or 5π/3 (8)
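As a sanity check, the conjugacy class sizes listed above add up to the order of the group computed earlier:

```python
# Sizes of the ten conjugacy classes of the cube's symmetry group, in the
# order listed above.
class_sizes = [1, 8, 6, 3, 6, 3, 6, 6, 1, 8]
assert sum(class_sizes) == 48  # 8 vertices times a stabilizer of order 6
```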
Let G and H be transformation groups, and ϕ : G → H a map. We say that ϕ is
a homomorphism if ϕ(g1 ◦ g2 ) = ϕ(g1 ) ◦ ϕ(g2 ) for every pair of elements g1 , g2 ∈ G.
We say that ϕ is an isomorphism if ϕ is a homomorphism and a bijection (i.e. a 1-1
correspondence).
Proposition 9.3. The symmetry group of a regular cube is isomorphic to the symmetry group of a regular octahedron.
Proof. A regular cube has the dual octahedron inscribed into the cube in the standard way. Every symmetry of the cube is a symmetry of the octahedron, and vice
versa.
□
Problem 9.4. Prove that the symmetry group of a dodecahedron is isomorphic to
the symmetry group of an icosahedron.
Problem 9.5. Compute the number of symmetries of a regular icosahedron. Answer: 120.
There is the following remarkable relation between symmetries of a cube and
symmetries of a tetrahedron.
Proposition 9.6. The symmetry group of a regular tetrahedron is isomorphic to a
subgroup in the symmetry group of a regular cube.
Proof. One can inscribe a regular tetrahedron into a cube in such a way that the
edges of the tetrahedron are diagonals of faces in the cube. Any symmetry of the
inscribed tetrahedron is also a symmetry of the cube. However, some symmetries of
the cube map the inscribed tetrahedron to a different tetrahedron.
□
10. Linear optimization problems
In many real-life questions, one needs to minimize or maximize something. E.g.
in all activities, people want to minimize the cost, minimize the effort, and maximize
the profit. These lead to optimization problems. Mathematically, the value being
optimized is a function of certain parameters. This function is called the objective
function. The parameters of the objective function can be real numbers or more
complicated objects, e.g. other functions. The simplest and most common situation is when all parameters are real numbers, and the objective function is linear.
Also, the parameters are subject to linear restrictions. Optimization problems of
this sort arise in economics, logistics, biology and social sciences. They are called
linear optimization problems. There is a whole branch of mathematics that studies linear optimization problems. For historical reasons, this branch is called linear
programming, although this name is slightly confusing.
Linear programming as a science was founded in 1939 by L. Kantorovich (Soviet
mathematician and economist, Nobel prize winner). Some years later, it was reinvented and advanced by G. Dantzig (American mathematician), who introduced a
powerful algorithm for solving linear programming problems: the simplex method.
We will discuss a variant of the simplex method below.
Some examples of real-life problems leading to linear optimization include:
• Financial planning. A bank offers a (rather large) number of financial services, each giving a certain profit. The problem is to maximize the profit
subject to various restrictions imposed by law, tax policies, risk management
policies etc.
• Animal food production. Animal food must contain sufficient amounts of
certain nutrients (like vitamins, proteins, etc.). There are several ingredients
to be blended, each containing known amounts of nutrients. The problem is
to blend the ingredients in such a way as to minimize the cost but produce
sufficient amounts of all nutrients.
• Transportation problem. The problem is to minimize the total cost of transportation, choosing a combination of different transportation channels, each
having a certain fixed cost.
Consider the following problem: maximize the function f (x, y) = 3x + 2y subject
to the restrictions x ≥ 0, y ≥ 0, x + y ≤ 3, 2x + y ≤ 4. It is not hard to draw the
region defined by these inequalities. This is the polygon ∆ with vertices at points
(0, 0), (0, 3), (1, 2) and (2, 0). The following is a rather obvious statement, which
helps to solve the problem:
Theorem 10.1. The maximum of a linear function on a convex polygon is always
attained at a vertex.
Thus it is enough to evaluate f on all vertices of ∆. In this way, we find that the
maximal value is attained at the vertex (1, 2), and is equal to 7.
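Following Theorem 10.1, the example reduces to evaluating f at the four vertices; a sketch in exact arithmetic:

```python
from fractions import Fraction as F

# Vertices of the polygon given by x >= 0, y >= 0, x + y <= 3, 2x + y <= 4,
# and the objective f(x, y) = 3x + 2y evaluated at each of them.
vertices = [(F(0), F(0)), (F(0), F(3)), (F(1), F(2)), (F(2), F(0))]
f = lambda x, y: 3 * x + 2 * y
best = max(vertices, key=lambda v: f(*v))
assert best == (F(1), F(2)) and f(*best) == 7
```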
Theorem 10.1 generalizes to polyhedra in 3-space and even to higher-dimensional
polyhedra. Thus we can use the same principle for any number of variables. However, if the dimension n of the polyhedron and the number m of facets are rather
big, then it is hard to list all vertices. Even to find a single vertex, one needs to
solve an n × n linear system. But there are (m choose n) such systems; this number is
usually too big for computational purposes. Thus we need a better algorithm.
The algorithm described below is a variant of the simplex method. Consider
the following problem: minimize a linear function f subject to linear restrictions
L1 , . . . , Lm ≥ 0. Recall that a linear function of n variables is a function of the
form:
f (x1 , . . . , xn ) = a1 x1 + · · · + an xn + b.
The set of points (x1 , . . . , xn ) ∈ Rn given by linear inequalities L1 , . . . , Lm ≥ 0
is called a convex polyhedron. Note that, in the sense of this definition, a convex
polyhedron does not need to be bounded. For the subsequent discussion, it suffices
to think of 2-dimensional polygons or 3-dimensional polyhedra, but the algorithm
works the same in any dimension.
Let ∆ be the polyhedron given by the inequalities L1 , . . . , Lm ≥ 0. Note that
every vertex of ∆ is the intersection point of n hyperplanes of the form Li = 0.
Suppose e.g. that the hyperplanes L1 = 0, . . . , Ln = 0 intersect at a vertex v of ∆
(so that the intersection of these n hyperplanes is exactly the vertex — a priori, it
can be larger).
Proposition 10.2. Every linear function g can be represented in the form
g = a1 L1 + · · · + an Ln + b.
Proof. Indeed, the functions L1, . . . , Ln form a linear system of coordinates in Rⁿ,
and a linear function is always given as a linear combination of coordinates plus a
constant, no matter which coordinate system we use. □
In particular, we have
Lk = Σ_{i=1}^n aki Li + bk,    f = Σ_{i=1}^n ai Li + b.
Since v is a vertex, in particular, a point in the polyhedron, we have Lk(v) ≥ 0 for
all k. On the other hand, L1(v) = · · · = Ln(v) = 0. It follows that bk ≥ 0 for all k.
Proposition 10.3. If all ai ≥ 0, then the function f attains its minimum at v.
Proof. Indeed, we have f(v) = b. On the other hand, if x ∈ ∆, then
f(x) = Σ_{i=1}^n ai Li(x) + b ≥ b,
because Li(x) ≥ 0 by definition of ∆ and ai ≥ 0 by our assumption. □
Thus if all coefficients ai are nonnegative, then we do not need to do anything.
Suppose now that some coefficients are negative. We should choose a negative
coefficient and try to get rid of the corresponding term. Namely, we would like to
go from the vertex v given by L1 = · · · = Ln = 0, to an adjacent vertex w given by
L1 = · · · = Ls−1 = Ls+1 = · · · = Ln = Lr = 0.
Thus we replace the hyperplane Ls = 0 with a different hyperplane Lr = 0, where
r > n. We need to choose Ls in such a way that as < 0. One obvious rule is the
following:
First Rule. Choose the hyperplane Ls such that as is the most negative coefficient
among a1 , . . . , an .
This rule explains how to choose Ls . We also need a particular rule to choose
Lr , but suppose for a moment that Lr is chosen. Then we can express Ls from the
relation
Lr = Σ_{i=1}^n ari Li + br,
which gives
Ls = −(1/ars) ( Σ_{i≠s, i≤n} ari Li − Lr + br ).
Plug this expression into the expression for Lk through L1, . . . , Ln, which gives
the expression for Lk through the new set of hyperplanes:
Lk = Σ_{i≠s, i≤n} ( aki − (aks/ars) ari ) Li + (aks/ars) Lr + ( bk − (aks/ars) br ).
In particular, we have
Lk(w) = bk − (aks/ars) br,
because Li(w) = 0 for all i = 1, . . . , n different from s, and Lr(w) = 0. Since w
should belong to ∆, we need to impose the following condition:
bk − (aks/ars) br ≥ 0    (1)
for all k.
for all k. This condition is trivially satisfied if br = 0. On the other hand, in this
case, v = w, which is not very good (actually, we can get going even with that, but
this requires a separate consideration). Now we will make an additional assumption
for simplicity.
Suppose that there is a point x ∈ ∆ such that Li(x) = 0 and Lj(x) > 0 for all
j ≠ i. Then the set {Li = 0} ∩ ∆ is called a facet of ∆. Note that, in
general, not all functions L1, . . . , Lm give rise to facets. Some of the functions are
unnecessary (i.e. the corresponding restrictions are superfluous), but this is hard to
check. We say that ∆ is simple if exactly n facets meet at each vertex. Note that,
in order to make ∆ non-simple, one needs to choose the linear functions L1 , . . . , Lm
in a very special way. E.g. in general, four planes do not pass through the same
point — one needs to choose them in a very special way to arrange this. We now
assume that ∆ is simple. Then no facet different from L1 , . . . , Ln passes through v.
In particular, bk > 0 for ALL k > n, and the case br = 0 is excluded.
Note that
0 = Lr (w) = ars Ls (w) + br ,
since Li (w) = 0 for all i ≠ s in the range from 1 to n. We know that Ls (w) > 0,
thus the coefficient ars must be negative. We can now rewrite inequality (1) in the
following form:
    ars /br ≤ aks /bk .
The inequality sign changed because we multiplied by the negative number ars . Now we
can state the rule for choosing r:
Second Rule. Choose r such that ars /br is minimal.
The two rules we introduced are enough to arrange the computation. We just
need to do some bookkeeping to keep track of all variables.
Let A be the matrix (aki ), where k = n + 1, . . . , m and i = 1, . . . , n. Thus A
has m − n rows and n columns. Let B be the column consisting of the numbers bk ,
k = n + 1, . . . , m. Finally, we denote by E the identity matrix of size m − n, and by
a the row consisting of the coefficients a1 , . . . , an of f .
Consider the following matrix, called a simplex tableau:
    ( −A  E  B )
    (  a  0  b )
Note that all rows of this tableau but the last correspond to the relations between
L1 , . . . , Lm . For example, the first row is
    ( −a(n+1)1  · · ·  −a(n+1)n  1  0  · · ·  0  bn+1 ),
which corresponds to the relation
    −a(n+1)1 L1 − · · · − a(n+1)n Ln + Ln+1 = bn+1 .
We can obtain other relations that are consequences of these relations by doing row
operations with the tableau. The last row of the tableau represents the function f .
There are m − n columns, called the basic columns, in which the simplex tableau
looks like the identity matrix (except for the last row, which is zero). Note that
basic columns correspond to the hyperplanes Lk = 0 NOT containing the vertex v.
The first and second rules can now be restated as follows in terms of the simplex
tableau:
(1) Choose the column of the simplex tableau with the most negative entry in
the last row. (This column corresponds to Ls ). The chosen column is a
non-basic column, which should be made basic. It is called the entering
column.
(2) We should also choose one basic column, which should become non-basic —
the leaving column. (This column corresponds to Lr .) To this end, in every
row of the simplex tableau, compute the ratio of the entry in the entering
column to the entry in the last column. Choose the row with the biggest
ratio (the biggest, not the smallest, because our ratio has the opposite sign
to the one in the Second Rule). The leaving column is the one whose entry
in the chosen row is 1.
Next, we do the Gauss elimination step to make the entering column basic. This
corresponds to expressing Ls through Lr and substituting this expression into all
our relations.
Example 10.4. Consider the same optimization problem as above: maximize the
function f = 3x + 2y subject to the conditions x ≥ 0, y ≥ 0, x + y ≤ 3, and
2x + y ≤ 4. Consider the vertex v = (0, 0). Let L1 = x, L2 = y, so that v is given
by L1 = L2 = 0. Set L3 = 3 − x − y and L4 = 4 − 2x − y, then our restrictions have
the standard form L1 , . . . , L4 ≥ 0. We have
L1 + L2 + L3 = 3,
2L1 + L2 + L4 = 4,
f − 3L1 − 2L2 = 0.
Note that −f is the function we want to MINIMIZE. The corresponding tableau is
the following:
    (  1   1  1  0  3 )
    (  2   1  0  1  4 )
    ( −3  −2  0  0  0 )
The vertex v is not optimal, because there are some negative terms in the last row.
According to Rule 1, we need to choose the entering column, which contains the
most negative entry. This is the first column corresponding to L1 (so s = 1). Next,
we choose the row, in which the ratio of the first and the last entries is maximal.
This will be the second row. Therefore, the leaving column will be the fourth. Now,
we do the Gauss elimination step to make the first column basic (change the first and
the third rows by multiples of the second row, to kill the entries in the first column,
and normalize the second row):
    ( 0   1/2  1  −1/2  1 )
    ( 1   1/2  0   1/2  2 )
    ( 0  −1/2  0   3/2  6 )
This simplex tableau corresponds to the vertex (2, 0). The value of the function f
at this vertex can be seen in the bottom-right corner. The obtained simplex tableau
corresponds to the following relations:
    (1/2)L2 + L3 − (1/2)L4 = 1,
    L1 + (1/2)L2 + (1/2)L4 = 2,
    f − (1/2)L2 + (3/2)L4 = 6.
The next entering column is the second one. The next leaving column is the third
one. The next simplex tableau is


0 1 2 −1 2
1 0 −1 1 1 .
0 0 1
1 7
There are no negative entries in the last row, thus we have reached the minimum of
−f , i.e. the maximum of f . The maximal value is seen in the bottom-right corner.
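The two rules and the pivoting step are easy to try out in code. The following is a minimal sketch (the function names and the use of exact fractions are my own choices, not from the text), run on the example above; Rule 2 is implemented in the equivalent form "smallest ratio of last entry to entering-column entry, over rows with a positive entry".

```python
# A minimal sketch of the simplex iteration described above, applied to the
# example: maximize f = 3x + 2y subject to x, y >= 0, x + y <= 3, 2x + y <= 4.
# Function names are my own; exact fractions avoid rounding issues.
from fractions import Fraction

def pivot(T, r, c):
    """Gauss elimination step: make column c basic, with its 1 in row r."""
    T[r] = [v / T[r][c] for v in T[r]]
    for i in range(len(T)):
        if i != r and T[i][c] != 0:
            factor = T[i][c]
            T[i] = [v - factor * w for v, w in zip(T[i], T[r])]

def simplex(T):
    """Iterate Rules 1 and 2 until no negative entry remains in the last row."""
    while True:
        last = T[-1][:-1]
        c = min(range(len(last)), key=lambda j: last[j])  # Rule 1: entering column
        if last[c] >= 0:
            return T[-1][-1]          # optimal value, in the bottom-right corner
        # Rule 2, in the equivalent form: among rows with a positive entry in
        # the entering column, pick the smallest ratio (last entry / entry).
        rows = [i for i in range(len(T) - 1) if T[i][c] > 0]
        r = min(rows, key=lambda i: T[i][-1] / T[i][c])
        pivot(T, r, c)

F = Fraction
T = [[F(1), F(1), F(1), F(0), F(3)],
     [F(2), F(1), F(0), F(1), F(4)],
     [F(-3), F(-2), F(0), F(0), F(0)]]
print(simplex(T))  # 7, the maximum of f
```

Degenerate and unbounded cases (an empty list of candidate rows) are not handled; the sketch assumes the simple, bounded situation discussed in the text.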
Further computational aspects of linear programming are described in Marcel
Oliver’s lecture notes
http://math.jacobs-university.de/oliver/teaching/iub/spring2007/
cps102/handouts/linear-programming.pdf
11. Shortest distance
We will next discuss some famous optimization problems, in which the objects
being optimized are not numbers (they may be functions, curves, figures, etc). One
of the oldest and simplest optimization problems is the problem of finding the shortest
curve between two points. The answer is well-known:
Theorem 11.1. The shortest curve between two points in the Euclidean plane is a
straight line segment.
Proof. The proof is based on the triangle inequality:
|ab| ≤ |ac| + |cb|.
This inequality generalizes to the case of many points as follows:
|ab| ≤ |ac1 | + |c1 c2 | + · · · + |cn−1 cn | + |cn b|.
Now consider a curve connecting a with b. The length of the curve can be approximated by the length of a broken line with vertices on the curve, provided that the
vertices are dense enough. Let the vertices be c1 , . . . , cn . Using the triangle inequality and passing to the limit, we obtain that the length of the curve is at least |ab|,
i.e. the length of the straight segment. This proves the theorem.
□
The next problem is also very classical:
Problem 11.2. Find the shortest curve between two points in the plane that intersects a given line.
This problem makes sense only if the two points are on the same side of the line.
There are many possible motivations/applications for this problem. One motivation
coming from physics: it is known that light always chooses the path that takes
minimal time. If light propagates in a uniform medium, then the speed of light is
constant, hence to minimize the time is the same as to minimize the distance. Thus
the problem is to find the trajectory of a light ray reflecting from a mirror.
The problem can be solved as follows: reflect one point in the line, and compute
the minimal distance between the other point and the reflection.
Another solution is by using calculus:
Problem 11.3. Let a and b be two points in the upper half-plane. Compute the
x-derivative of the function |ac| + |cb|, where c = (x, 0).
At the point of the minimum, the derivative must be zero. A generalization of
this computation is the following:
Proposition 11.4. Let a and x be two points in the plane, and ~v a vector. Let xt
be the point x + t~v . The derivative of |axt | with respect to t is equal to ~u · ~v , where
~u is the unit vector parallel to ~ax.
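Proposition 11.4 is easy to check numerically. In the sketch below (the points and the vector are my own arbitrary choices), a finite-difference quotient is compared with ~u · ~v :

```python
# Numerical check of Proposition 11.4: the t-derivative of |a x_t|, where
# x_t = x + t*v, equals u . v with u the unit vector pointing from a to x.
import math

a = (1.0, 2.0)
x = (4.0, 6.0)            # |ax| = 5
v = (0.3, -0.7)

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

# unit vector u from a to x
u = ((x[0] - a[0]) / dist(a, x), (x[1] - a[1]) / dist(a, x))

h = 1e-6
xt = (x[0] + h * v[0], x[1] + h * v[1])
numeric = (dist(a, xt) - dist(a, x)) / h   # finite-difference derivative at t = 0
exact = u[0] * v[0] + u[1] * v[1]
print(numeric, exact)  # approximately equal
```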
A similar method applies to the following more general problem:
Problem 11.5. Let γ be a smooth closed curve in the plane, and a and b are
two points lying outside of the region bounded by γ. Consider the shortest curve
connecting a with b and passing through a point x on γ. Suppose that this curve
has no parts in common with γ except for x. Then the segments ax and xb make
the same angles with γ: the angle of reflection equals the angle of incidence.
The following problem is due to Fermat:
Problem 11.6. Let abc be an acute triangle. Find the point x in this triangle such
that the sum |ax| + |bx| + |cx| is the smallest. Answer: this is the point from which
all sides are seen at an angle of 120 degrees.
A similar problem for a convex quadrilateral is actually simpler:
Problem 11.7. For a convex quadrilateral abcd, find the point x in it such that
|ax| + |bx| + |cx| + |dx|
is the smallest. Answer: this is the intersection point of the diagonals.
12. Snell’s law
See also page 23 of
http://www.math.psu.edu/tabachni/Books/billiardsgeometry.pdf
Consider a ray of light that goes from the air into the water. Let c0 be the speed
of light in the air, and c1 the speed of light in the water. We have c0 > c1 . The
physical intuition says that, if the light hits the water at an acute angle α0 , then it
breaks and goes further at a bigger angle α1 . This can be deduced from the Fermat
principle: the light always chooses the path taking the least possible time. Let a be
a point in the air, and b a point in the water. The function we need to minimize is
    f (x) = |ax|/c0 + |xb|/c1 ,
where x is a variable point on the surface of the water. We know that the derivative
of |ax| is cos α0 , and the derivative of |xb| is − cos α1 , thus the equilibrium condition
f ′(x) = 0 reads
    cos α0 /c0 = cos α1 /c1 .
This can be generalized further to the case of a light ray passing through several
media:
    cos αi /ci = const,
where ci is the speed of light in the i-th medium, and αi is the angle, at which the
light ray goes through this medium. Passing to the limit, we can even make the
statement about a general non-homogeneous (meaning that the speed of light may
depend on a point) but isotropic (meaning that the speed of light does not depend
on the direction) medium:
    cos α(t)/v(x(t), y(t)) = const,
where x(t) and y(t) are coordinates of a photon at time t, and α(t) is the angle,
at which the photon goes (if v depends only on y, this is the angle between the
trajectory and a horizontal line; in general, this is the angle between the trajectory
and the curve v(x, y) = const). Although we used physical language, the fact
we obtained is mathematical, and can be restated as follows: suppose we want to
minimize the integral
    ∫γ ds/v(x, y)
over all paths γ with the fixed endpoints a and b. Here s means the length of the
segment of γ from a to (x, y). The coordinates x and y can be written as functions
of s, thus the integral above is just the usual integral
    ∫_0^L ds/v(x(s), y(s)),
where L is the total length of γ. The fact we actually obtained is that the optimal
path satisfies the condition
    cos α(s)/v(x(s), y(s)) = const
for all s. We will refer to this principle as the (generalized) Snell’s law.
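Snell's law for two media can be verified numerically. The sketch below (the points, speeds, and the use of a crude ternary search are my own choices) minimizes the travel time over the crossing point and checks that cos α0 /c0 = cos α1 /c1 at the optimum:

```python
# Minimize travel time f(x) = |ax|/c0 + |xb|/c1 over the crossing point (x, 0),
# then verify Snell's law cos(alpha0)/c0 = cos(alpha1)/c1.
import math

a, b = (0.0, 1.0), (3.0, -2.0)   # a in the air (y > 0), b in the water (y < 0)
c0, c1 = 1.0, 0.7                # speeds of light in the two media

def time(x):
    return math.hypot(x - a[0], a[1]) / c0 + math.hypot(b[0] - x, b[1]) / c1

# crude minimization by ternary search on [0, 3] (time is unimodal here)
lo, hi = 0.0, 3.0
for _ in range(200):
    m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if time(m1) < time(m2):
        hi = m2
    else:
        lo = m1
x = (lo + hi) / 2

# angles are measured from the water surface, as in the text
cos0 = (x - a[0]) / math.hypot(x - a[0], a[1])
cos1 = (b[0] - x) / math.hypot(b[0] - x, b[1])
print(cos0 / c0, cos1 / c1)  # approximately equal at the optimum
```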
13. Brachistochrone
See also
http://en.wikipedia.org/wiki/Brachistochrone
see e.g. the (very interesting!) history of the problem. A definition (copied from
Wikipedia) reads as follows:
A Brachistochrone curve, or curve of fastest descent, is the curve between two
points that is covered in the least time by a body that starts at the first point with
zero speed and passes down along the curve to the second point, under the action
of constant gravity and ignoring friction.
The time that the body slides along a curve γ is given by the following integral:
    ∫γ ds/v,
where v is the speed of the body at a given point. This speed can be found from
the conservation of total energy:
    mv²/2 − mgy = 0
(for convenience, we chose the y-axis to point down, hence the negative sign). As
theoretical physicists do, we set all constants to 1, thus v = √y. Snell's law then
tells us that the optimal curve must satisfy the following:
    cos α = √y.
Here cos α can be expressed through the derivative y′ = dy/dx (which is the tangent
of α):
    cos α = 1/√(1 + (y′)²).
This leads to the following differential equation:
    y′ = √((C − y)/y).
We will solve this differential equation by the method of separation of variables.
This is a commonly used method but we will not discuss its rigorous justification.
Instead, we will do computations at the physical level of rigor. Rewrite the right
hand side as the ratio dy/dx. Then move everything containing y into the left hand
side, and everything containing x into the right hand side. We will get
    √y dy/√(C − y) = dx.
Now integrate both parts:
    ∫ √y dy/√(C − y) = ∫ dx.
The left integral can be found by the following substitution: y = C sin²(t/2). It is
equal to
    (C/2)(t − sin t).
As a result, we have the following parametric representation of the brachistochrone:
    x = (C/2)(t − sin t),    y = (C/2)(1 − cos t).
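It is a quick numerical exercise (my own check, not part of the text) to confirm that this parametrization satisfies the differential equation y′ = √((C − y)/y), using dy/dx = (dy/dt)/(dx/dt):

```python
# Check that the cycloid x = (C/2)(t - sin t), y = (C/2)(1 - cos t)
# satisfies the brachistochrone equation y' = sqrt((C - y)/y).
import math

C = 2.0
for t in [0.5, 1.0, 2.0, 3.0]:
    y = (C / 2) * (1 - math.cos(t))
    dxdt = (C / 2) * (1 - math.cos(t))
    dydt = (C / 2) * math.sin(t)
    lhs = dydt / dxdt                 # y' along the curve; equals cot(t/2)
    rhs = math.sqrt((C - y) / y)      # right-hand side of the ODE
    print(t, lhs, rhs)                # lhs and rhs agree for each t
```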
Let us think of this parameter representation as describing the trajectory of a particle, so that x(t) and y(t) are the coordinates of the particle at time t. This motion
decomposes into two parts: the first part is
    x = (C/2) t,    y = C/2,
which means that a point moves parallel to the x-axis at height C/2 with constant
speed C/2. The second part is
    x = −(C/2) sin t,    y = −(C/2) cos t,
which is the motion along the circle of radius C/2 with uniform speed.
It is not very hard to see that the sum of the two parts represents the motion
of a particle attached to the border of a wheel as this wheel rolls freely along a
straight horizontal line. The trajectory of such a particle is called a cycloid. Thus the
brachistochrone is a cycloid (we only need to invert the cycloid, because our y-axis
pointed down).
14. Catenary
The definition of catenary (or chain curve) taken from wikipedia:
In physics, the catenary is the shape of a hanging flexible chain or cable when
supported at its ends and acted upon by a uniform gravitational force (its own
weight).
The catenary is also solving some optimization problem, namely, it minimizes the
potential energy, which is given by the integral
    ∫γ (−y) ds.
However, the minimum is taken not over all curves but over curves with fixed length:
    ∫γ ds = L.
This is a slightly different type of optimization problem. The extremum of this type
is called conditional extremum.
Lagrange suggested the following approach to finding conditional minima or maxima. Let f be a function of certain parameters, which can be of any nature (numbers, curves, functions etc.). Suppose we want to minimize f subject to the condition
g = 0, where g is another function of these parameters. Introduce a variable λ (called
the Lagrange multiplier) that does not depend on the parameters of the problem (so
that, with respect to these parameters, λ is a constant). Then try to minimize the
function f − λg.
Suppose that we found a minimum of f − λg for an unspecified value of λ (the
minimum we found depends on λ, of course). Then we can find λ from the equation
g = 0 that should be satisfied at the point of minimum. The minimum we found
will be the conditional minimum, because to minimize the function f − λg subject
to g = 0 is the same as to minimize the function f subject to g = 0.
In our case,
    f = ∫γ (−y) ds,    g = ∫γ ds − L.
We will use Lagrange’s principle. To minimize f − λg is the same as to minimize the
integral of −y −λ over ds, which differs just by a constant from f −λg. Furthermore,
we need to assume −y − λ > 0, otherwise there will be no minimum (explain why).
In particular, λ must be negative, and it is convenient to set Y = −λ, so
that we need to minimize
    ∫γ (Y − y) ds.
This can be done using generalized Snell’s Law.
Problem 14.1. Perform the computations.
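One possible route through Problem 14.1 (this sketch is mine, not the text's; c denotes the Snell constant and x0 an integration constant): the integrand Y − y plays the role of 1/v, so the generalized Snell's law applies with v = 1/(Y − y).

```latex
% Snell's law with v = 1/(Y - y):
(Y - y)\cos\alpha = c, \qquad \cos\alpha = \frac{1}{\sqrt{1 + (y')^2}}
\;\Longrightarrow\; (y')^2 = \frac{(Y - y)^2 - c^2}{c^2}.
% Separating variables and substituting Y - y = c\cosh u (dy = -c\sinh u\,du):
\int \frac{c\,dy}{\sqrt{(Y - y)^2 - c^2}} = \pm \int dx
\;\Longrightarrow\; Y - y = c\cosh\frac{x - x_0}{c},
% which is the classical catenary.
```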
15. Huygens principle
Consider propagation of light in a non-homogeneous (and perhaps non-isotropic)
medium. For a point x, let Ft (x) be the region where the light propagates at time
t. The Huygens principle states that
    Ft+s (x) = ∪_{y∈Ft (x)} Fs (y).
This can be interpreted as putting “secondary light sources” at all points of Ft (x)
and letting the light from all the secondary sources propagate for time s.
The Huygens principle has the following very useful discretization. Suppose we
know F1 (x) for all points x. Then Fn (x) can be determined inductively by the
formula
    Fn+1 (x) = ∪_{y∈Fn (x)} F1 (y).
As an example, consider a homogeneous but non-isotropic medium. Suppose that
F1 (x) = x + A (these are translates of the same set A since the medium is homogeneous). By definition, we have
F2 (x) = {a + b | a, b ∈ A}.
This set is denoted by A + A. Similarly,
F3 (x) = A + A + A = {a + b + c | a, b, c ∈ A}
and so on. It is important not to confuse, say, A + A with
2A = {2a | a ∈ A}.
In general, these two sets are different.
However, 2A and A + A are the same for convex sets A. Recall that a set A is
called convex if, with any two points a, b ∈ A, the set A contains the whole segment
[a, b] connecting these points.
Problem 15.1. Show that, if A is convex, then 2A = A + A.
Problem 15.2. Let A be the union of the segments {0} × [−1, 1] and [−1, 1] × {0}.
Draw the picture of the sets A + A and A + A + A. Does the sequence of sets
    (1/n)(A + A + · · · + A)    (n summands)
converge?
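A small experiment for this set A (the grid representation is my own choice, not from the text) illustrates both the difference between A + A and 2A and the convexification effect:

```python
# A is the "plus sign" made of two segments; we compare A + A with 2A.
R = 10  # the segment [-1, 1] is represented by the integers -R..R

A = {(i, 0) for i in range(-R, R + 1)} | {(0, j) for j in range(-R, R + 1)}

def minkowski(P, Q):
    return {(p[0] + q[0], p[1] + q[1]) for p in P for q in Q}

A2 = minkowski(A, A)                       # A + A
twoA = {(2 * p[0], 2 * p[1]) for p in A}   # 2A

# (1, 1) = (1, 0) + (0, 1) lies in A + A but not in 2A,
# so the two sets are different (A is not convex):
print((1, 1) in A2, (1, 1) in twoA)  # True False

# Every point of A + A satisfies |x| + |y| <= 2R, i.e. (1/2)(A + A)
# stays inside the convex hull of A (the diamond |x| + |y| <= 1):
print(all(abs(x) + abs(y) <= 2 * R for (x, y) in A2))  # True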
In general, suppose that F1 (x) = x + A for all points x of a medium. Then the
sequence of sets (1/n)Fn (x) converges to a convex set.
Definition 15.3. Consider a non-homogeneous and non-isotropic medium. Then
the speed of light depends not only on a point, but also on a particular direction
from this point. For a point x in this medium, define the indicatrix S(x) of x as
the set of vectors of unit speed sitting at x. In other terms, a vector v belongs to
the indicatrix if and only if there is a ray trajectory with constant unit speed and
velocity v at x.
If we are in 3-dimensional space, then the indicatrices are surfaces. In the plane,
indicatrices are closed curves. Define the indicatrix region as the region bounded by
an indicatrix.
The following theorem, although not very precise in the form we state it, can be
made very precise:
Theorem 15.4. Consider a smooth non-homogeneous, non-isotropic medium satisfying the Huygens principle. Then the indicatrix at every point must be convex.
The thing that is not very precise here is what exactly a “smooth medium” means.
This can be defined mathematically.
Proof. This is only a sketch of a proof, since we do not work with exact definitions.
Let x be a point in our medium, and consider the indicatrix region A at x. Also,
choose the time interval t to be small enough so that Ft (x) would almost coincide
with tA. On the other hand, in a small neighborhood of x, all indicatrix regions are
almost the same. Therefore, by the Huygens principle, we must have
    tA = Ft (x) = Ft/n (x) + · · · + Ft/n (x) = (t/n)(A + · · · + A)
(n summands on each side). The right-hand side converges to a convex region, therefore
A must be convex.                □
16. Optimization on graphs
Other uses of the Huygens principle include some graph algorithms. Recall that a
graph is a picture consisting of vertices and edges. What is important is only which
vertices are connected by which edges. A particular geometric shape of edges is not
relevant.
Consider a graph Γ and two vertices a and b of Γ. A path from a to b is defined
as a sequence of vertices
a0 = a, a1 , . . . , an = b
such that ak is connected to ak+1 by an edge. The length of this path is defined as
n (thus all edges are considered to be of equal length — there are variations of this
definition allowing for different lengths of different edges). Consider the following
problem: given two vertices a and b of a graph Γ, find the shortest path from a to
b. Such problems appear e.g. for train track systems, where you want to minimize
the number of connections you need to make in order to get from a to b.
The idea of the following algorithm is to construct an appropriate “wave front”.
Fix a and define Fn (a) to be the set of all vertices of distance ≤ n from a. We will
construct the sets Fn (a) one by one, together with natural sets of edges En (a) that
can be used to build shortest paths from a to points in Fn (a).
First, let F1 (a) be the set of all vertices connected to a by an edge. For every
b ∈ F1 (a), choose just one edge going from a to b, and include it into E1 (a). Next,
proceed by induction. Suppose the sets Fn (a) and En (a) are already defined. Let
F*n+1 (a) be the set of all vertices that are not in Fn (a) but that can be connected to
Fn (a) with a single edge. For every b ∈ F*n+1 (a), choose a single edge connecting it
to Fn (a) and include it into En+1 (a). Finally, set Fn+1 (a) to be the union of Fn (a)
and F*n+1 (a).
The union
T (a) = E1 (a) ∪ E2 (a) ∪ . . .
is a tree (a graph without closed paths). Every vertex b of Γ is connected to a by a
single path in the tree T . This path realizes the shortest distance between a and b.
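The wave-front construction above is exactly breadth-first search. The following sketch (the toy graph is my own example) records, for every vertex, the single chosen edge of T (a) as a parent pointer:

```python
# A sketch of the "wave front" algorithm for shortest paths in a graph.
from collections import deque

graph = {
    'a': ['b', 'c'],
    'b': ['a', 'd'],
    'c': ['a', 'd'],
    'd': ['b', 'c', 'e'],
    'e': ['d'],
}

def shortest_path_tree(graph, a):
    """Breadth-first search: parent[b] is the edge of T(a) leading toward a."""
    parent = {a: None}
    front = deque([a])
    while front:                      # front is the current "wave front"
        u = front.popleft()
        for w in graph[u]:
            if w not in parent:       # first time the wave reaches w
                parent[w] = u         # one chosen edge, as in the sets E_n(a)
                front.append(w)
    return parent

def path(parent, b):
    p = []
    while b is not None:
        p.append(b)
        b = parent[b]
    return p[::-1]

parent = shortest_path_tree(graph, 'a')
print(path(parent, 'e'))  # one shortest path from 'a' to 'e' (three edges)
```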
17. Isoperimetric inequality
Among all figures with given length of the boundary (perimeter), the figure of
maximal area is a round disk. This is a very classical statement known as the
isoperimetric theorem. It is sometimes written in the form of an inequality:
    S(A) ≤ l(A)²/4π,
where S(A) is the area of a figure A, and l(A) is the length of its boundary. Of
course, we need to make certain assumptions on A that guarantee that both S(A)
and l(A) are well-defined. The inequality above is called the isoperimetric inequality.
The first proof of the isoperimetric inequality was given by Jakob Steiner. However,
his proof contained a gap, namely, it was based on the assumption that the figure
with the given perimeter and the maximal area exists. Then it is not hard to prove
that this optimal figure should be a round disk:
• Convexity. The optimal figure must be convex because taking the convex
hull does not make the area smaller and does not make the perimeter bigger.
Recall that a figure A is convex if the segment [a, b] lies in A for every
a, b ∈ A. The convex hull of any figure A is defined as the intersection of all
convex figures containing A.
• Symmetry. Take any line that divides the perimeter of the optimal figure
into halves. Then choose the part of the figure (with respect to the given
line) with the maximal area, and reflect this part in the line. The figure thus
obtained would have the same perimeter and at least the same area.
• Suppose that the optimal figure A is symmetric with respect to a line l.
Let a and c be the boundary points of A lying in l, and b any point on the
boundary of A. Suppose we can change the triangle abc so that a and c glide
along l, and the parts of the optimal figure outside abc move isometrically.
Then the perimeter does not change, but the area of abc is maximal when
the angle abc is π/2. We must conclude that the angle abc is π/2 for ANY
point b on the boundary of the optimal figure. But then the figure is a round
disk.
However, a crucial step is to prove the existence of the optimal figure. This step
is not at all obvious. Actually, it is much harder than the rest of the proof.
Problem 17.1. Dido, a daughter of a king of Tyre, was a founder and the first
queen of Carthage. She was forced to flee to Cyprus with the treasures of her
husband Acerbas, killed by her brother. Eventually she landed in Africa. She asked
the local king for a small bit of land, only as much land as could be encompassed
by an oxhide. He agreed. Dido cut the oxhide into fine strips and sewed the strips
together so that she had a rope long enough to encircle an entire nearby hill. Carthage
was founded on this hill. Thus Dido solved a variant of the isoperimetric problem
known as Dido's problem:
Find the figure of maximal area bounded by a line and a curve of fixed length
with both endpoints on the line. The answer is the semicircle.
18. Fast Multiplication and Fast Fourier Transform
We will now discuss some problems from computational science. Most of
them can be stated as follows: how to compute certain things fast? One of the first
and most natural operations to consider is multiplication of large numbers. How to
multiply two large numbers fast? The high-school algorithm turns out not to be the
fastest (at least when the numbers are large enough that one would not want to
multiply them by hand anyway).
Consider first the multiplication of polynomials, which is simpler from the computational viewpoint. A polynomial f (t) can be written as
    f (t) = a(0) + a(1)t + · · · + a(d)t^d ,
where a(n) is the coefficient of t^n . To know a polynomial f is the same as to know
its coefficient sequence a (the n-th term of the sequence a is a(n)). Let f and g
be polynomials with coefficient sequences a and b, respectively. Then the coefficient
sequence of the polynomial f · g is given by the formula
    c(n) = Σ_{k=0}^{n} a(k) b(n − k).        (1)
This sequence is called the convolution of a and b, and we write c = a ∗ b. To
multiply polynomials is thus the same as to do the convolution on their coefficient
sequences.
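Formula (1) translates directly into code; a minimal sketch:

```python
# Multiplying polynomials by formula (1): a direct convolution.
def convolve(a, b):
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj       # the term a(k) b(n-k) with k = i, n = i + j
    return c

# (1 + 2t)(3 + 4t) = 3 + 10t + 8t^2
print(convolve([1, 2], [3, 4]))  # [3, 10, 8]
```

The two nested loops are exactly what makes this quadratic in the degree, which motivates the FFT approach below in the text.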
Suppose we want to compute the product of two polynomials of degree N − 1
(which is supposed to be a large number). If we use formula (1), then we need to
perform N multiplications for each of the coefficients. Thus, in total, we need about
2N² multiplications.
Note that it is much easier to multiply polynomials if we represent them by their
values rather than coefficients. Any polynomial f of degree N − 1 is determined by
its values at N points x0 , . . . , xN −1 . We can take more than N points if we wish.
Now, if f is given by its values at points x0 , . . . , x2N −1 , and g is given by its values
at the same points, then
f · g(xi ) = f (xi )g(xi ),
and we can reconstruct the coefficients of f · g from these values.
Now the question is how to find the coefficients of f if we just know the values f (x0 ), . . . , f (xN −1 ). This can be done with the help of the following Lagrange
interpolation formula:
    f (x) = Σ_{i=0}^{N−1} [ Π_{j≠i} (x − xj ) / Π_{j≠i} (xi − xj ) ] f (xi ).
We can now choose the points xi in a special way to make this formula simpler.
Namely, let ζ be a primitive N-th root of unity, and set xi = ζ^i . We have then
    Π_{j≠i} (x − xj ) = (x^N − 1)/(x − xi ) = (x^N − xi^N )/(x − xi ) = Σ_{k=0}^{N−1} xi^{N−1−k} x^k .
In particular,
    Π_{j≠i} (xi − xj ) = N xi^{N−1} = N/xi .
Now from the Lagrange interpolation formula it follows that the coefficients of the
polynomial f can be expressed as follows:
    a(n) = (1/N) Σ_{i=0}^{N−1} xi^{−n} f (xi ).        (2)
With our choice of xi = ζ i , the sequence of values f (xi ) of f is called the discrete
Fourier transform (DFT) of a. Formula (2) gives a way to recover the original
sequence a by its DFT. The operation transforming the DFT of a sequence into the
original sequence is called the inverse discrete Fourier transform. We see from the
formula that the inverse DFT is not much different from the DFT.
We can now compute the product of two polynomials as follows. Let the polynomials be given by their coefficient sequences a and b. First, compute the DFTs
of a and b. Then multiply them element-wise (which is very fast). Finally, apply
the inverse DFT to the result. Unfortunately, the whole algorithm is still slow (at
least as slow as the high-school algorithm). The reason is that computing DFT
still takes on the order of N² multiplications. Indeed, to compute f (xi ) one needs N
multiplications:
    f (xi ) = a(0) + xi (a(1) + xi (a(2) + . . . ))
(this is a better way than computing the powers of xi ; the latter takes even longer).
Fortunately, there are better ways to compute the DFT. The following algorithm
is called the Fast Fourier Transform (FFT) — the idea behind it goes back to Gauss.
It takes about N log N operations rather than N². Suppose first that N = pq. Write
the polynomial f as
    f (x) = Σ_{k=0}^{p−1} x^k ( Σ_{j=0}^{q−1} a(pj + k) x^{pj} ) = Σ_{k=0}^{p−1} x^k fk (x),    fk (x) = Σ_{j=0}^{q−1} a(pj + k) x^{pj} .
Note that the numbers fk (xi ) are exactly the discrete Fourier transform of the
sequence
    a(k), a(p + k), a(2p + k), . . . , a((q − 1)p + k).
It takes about q² multiplications to compute each of these Fourier transforms using
the standard algorithm (which makes pq² = N q multiplications total to compute
all fk (xi )). Now computing all f (xi ) takes about N p multiplications. So, in
total, we have about N (p + q) multiplications instead of N pq. By doing further
factorization of N , we can make the algorithm even better. Repetition of this trick
leads to an algorithm with about N log N multiplications. Thus it makes sense to
do the FFT to perform multiplication of polynomials.
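The special case p = 2, applied recursively, is the usual radix-2 FFT. The sketch below (function names are mine; N is padded to a power of 2) uses it to multiply two polynomials as described above: transform, multiply pointwise, transform back:

```python
# A sketch of the FFT idea for N a power of 2 (the case p = 2 of the
# factorization above), used to multiply two polynomials.
import cmath

def fft(a, invert=False):
    n = len(a)                    # n must be a power of 2
    if n == 1:
        return a[:]
    sign = -1 if invert else 1
    even = fft(a[0::2], invert)   # DFT of a(0), a(2), ...
    odd = fft(a[1::2], invert)    # DFT of a(1), a(3), ...
    out = [0] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def multiply(a, b):
    n = 1
    while n < len(a) + len(b):
        n *= 2
    fa = fft(a + [0] * (n - len(a)))
    fb = fft(b + [0] * (n - len(b)))
    prod = fft([x * y for x, y in zip(fa, fb)], invert=True)
    return [round(v.real / n) for v in prod]   # divide by n, as in formula (2)

print(multiply([1, 2], [3, 4])[:3])  # coefficients of (1+2t)(3+4t): [3, 10, 8]
```

The rounding step assumes integer coefficients; for general coefficients one would keep the real parts as floats.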
Now, numbers can be thought of as polynomials since any number is represented
in the form
    a(0) + a(1) · 10 + a(2) · 10² + · · ·
with coefficient sequence a satisfying certain restrictions, namely, 0 ≤ a(n) < 10.
Multiplication of numbers is not quite the same as multiplication of polynomials,
since we need to make sure the resulting sequence satisfies our restrictions. However,
multiplying the corresponding polynomials helps to solve the problem. It is better
to do it not in base 10, but rather choose a large base (in practice, something like
2^16 or 2^32 ).
19. Approximate Fourier coefficients
Consider a (smooth) function f periodic of period 1, i.e. f (t + 1) = f (t) for all t.
Then f can be written as the sum of a uniformly convergent trigonometric series
    f (t) = Σ_{n=−∞}^{∞} a(n) en (t),    en (t) = e^{2πint} ,
called the Fourier series for f . The numbers a(n) are called the Fourier coefficients
of f . Knowing Fourier coefficients is important e.g. for analyzing sounds (Fourier
coefficients of the sound are basically what we hear).
Suppose we want to approximate the Fourier coefficients. However, we usually do not
have an exact expression for f ; we can just measure the values of f at several points.
Let t0 , . . . , tN −1 be the points, at which we measure f . Assume that tj = j/N . Then
en (tj ) are N -th roots of unity.
Let us first find a function of the form
    fˆ(t) = Σ_{n=−N/2}^{N/2} â(n) en (t)
such that fˆ(tj ) = f (tj ) for all tj . Here we assume for simplicity that N is even, so
that N/2 is an integer. Set x = e1 (t). Then we can rewrite fˆ(t) in the form
    fˆ(t) = x^{−N/2} F (x),    F (x) = Σ_{n=−N/2}^{N/2} â(n) x^{N/2+n} .
Note that F is a polynomial in x! We have the following condition on the values of
this polynomial at points xj = e1 (tj ), which are the N -th roots of unity:
F (xj ) = f (tj ) eN/2 (tj ).
By the way, eN/2 (tj ) is equal to (−1)^j . We can recover the coefficients of the polynomial F by formula (2) for the inverse discrete Fourier transform:
    â(n) = (1/N) Σ_{j=0}^{N−1} xj^{−n} f (tj ).
These numbers are approximations of the Fourier coefficients of f . Let us estimate how good these approximations are. To this end, substitute the Fourier series
expansion of f into the last formula:
    â(n) = (1/N) Σ_{j=0}^{N−1} xj^{−n} Σ_{m∈Z} a(m) xj^m = Σ_{m∈Z} a(m) (1/N) Σ_{j=0}^{N−1} xj^{m−n} .
Note that
    (1/N) Σ_{j=0}^{N−1} xj^{m−n} = 1 if m = n + N k for some k ∈ Z, and 0 otherwise.
It follows that
    â(n) − a(n) = Σ_{k≠0} a(n + N k).
For smooth functions, the Fourier coefficients a(m) are small for large |m|. Thus
we have a rather good approximation. Of course, one needs some estimates on the
function to say precisely how good the approximation is.
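A quick sanity check of the approximation formula (the test function is my own choice): for f (t) = cos(2πt) the true Fourier coefficients are a(1) = a(−1) = 1/2, and all others vanish.

```python
# Compute the approximate Fourier coefficients a_hat(n) from N samples
# t_j = j/N, x_j = e^(2*pi*i*j/N), and compare with the exact values.
import cmath
import math

N = 16

def f(t):
    return math.cos(2 * math.pi * t)

def a_hat(n):
    # a_hat(n) = (1/N) * sum_j x_j^(-n) f(t_j)
    s = sum(cmath.exp(-2j * cmath.pi * n * j / N) * f(j / N) for j in range(N))
    return s / N

print(abs(a_hat(1) - 0.5), abs(a_hat(0)))  # both differences are tiny
```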
20. How to evaluate elementary functions
We will now address the question of how to compute numerical values of elementary functions. Of course, a value can only be computed approximately, but we
would like to have algorithms that achieve a required accuracy fast.
Polynomials. Polynomials can be computed using a fast multiplication algorithm.
Addition is done much faster.
Exponential. The exponential function can be computed rather fast using the
Taylor series expansion. First, we can make the exponent small, using the formula
    e^x = (e^{x/N} )^N .
Then, for small t, compute e^t as the sum
    e^t = Σ_{n=0}^{∞} t^n /n! .
Note that this series converges very fast! Actually, the slowest thing here is to
compute the powers of t. We can use the same trick as for polynomials:
    e^t = 1 + t(1 + t/2(1 + t/3(1 + · · · ))).
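The argument reduction plus the nested series can be sketched as follows (the function name and the particular choices of N and the number of terms are mine):

```python
# Sketch: reduce the argument by e^x = (e^(x/N))^N, then sum the Taylor
# series for e^t in nested (Horner-like) form.
import math

def exp_taylor(x, terms=20, N=16):
    t = x / N
    # e^t = 1 + t(1 + t/2(1 + t/3(1 + ...))), evaluated from the inside out
    s = 1.0
    for n in range(terms, 0, -1):
        s = 1.0 + (t / n) * s
    return s ** N

print(abs(exp_taylor(3.0) - math.exp(3.0)))  # a tiny number (double precision)
```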
Trigonometric functions. In principle, trigonometric functions are not much different from the exponential, and the computation of trigonometric functions can
be reduced to the computation of the exponential. However, it is better to do the
following, e.g. to compute the sine function:
• Use the identity sin(t + π) = − sin(t) to reduce to the case t ∈ (0, π).
• Use the identity sin(π − t) = sin(t) to reduce to the case t ∈ (0, π/2).
• Use the identity sin(2t) = 2 sin(t) cos(t) to reduce to the case t ∈ (0, π/4).
Then we have |t| < 1.
• Use the (rapidly convergent) power series for the sine:
    sin(t) = Σ_{n=0}^{∞} (−1)^n t^{2n+1} /(2n + 1)! .
Arrange the computation as follows:
    sin(t) = t(1 − t²/(2 · 3)(1 − t²/(4 · 5)(1 − · · · ))).
A similar computation scheme is used for the cosine.
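The three reduction steps and the nested series can be sketched in code (the arrangement below is mine; a production routine would be more careful about accuracy near the reduction boundaries):

```python
# Sketch: range reduction to (0, pi/4], then the nested power series.
import math

def sin_series(t):
    # sin(t) = t(1 - t^2/(2*3)(1 - t^2/(4*5)(1 - ...))), from the inside out
    s = 1.0
    for n in range(10, 0, -1):
        s = 1.0 - (t * t / ((2 * n) * (2 * n + 1))) * s
    return t * s

def sine(t):
    t = t % (2 * math.pi)
    if t > math.pi:                    # sin(t + pi) = -sin(t)
        return -sine(t - math.pi)
    if t > math.pi / 2:                # sin(pi - t) = sin(t)
        return sine(math.pi - t)
    if t > math.pi / 4:                # sin(2t) = 2 sin(t) cos(t)
        half = sine(t / 2)
        return 2 * half * math.sqrt(1 - half * half)  # cos > 0 here
    return sin_series(t)

print(sine(10.0), math.sin(10.0))  # the two values agree
```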
21. Newton’s method
Consider the computation of the square root. The power series for √(1 + t) converges fast for small values of t. Thus we can use power series to compute square
roots of numbers that are close to 1. However, the problem is to reduce the general
computation to this case: for this, one needs to have some decent approximation of
the square root to begin with. This is why a different method is preferable, and it
also works faster.
The method we will talk about is due to Newton; it is called Newton's method.
Suppose we want to solve an equation f (x) = y0 for x. Also, suppose we have some
approximation x0 to the solution. Then we can replace f (x) in the left hand side
by its linear approximation at x0 :
    fˆ(x) = f (x0 ) + f ′(x0 )(x − x0 ).
The equation fˆ(x) = y0 is linear, and the solution is
    x = x0 + (y0 − f (x0 ))/f ′(x0 ).
Then x is regarded as the next approximation, and the whole procedure is repeated.
Let us discuss how Newton's method works for the square root. Let a be a complex
number. To find √a, we need to solve the following equation: x² = a. If we have
some approximation xn to the solution, then the next approximation is given by the
formula
    xn+1 = xn /2 + a/(2xn ).
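The iteration can be sketched as follows (the starting point and step count are my own choices; Newton's method converges quadratically here, so a handful of steps suffices for real a > 0):

```python
# Newton's method for the square root, as in the formula above.
def newton_sqrt(a, x0=1.0, steps=8):
    x = x0
    for _ in range(steps):
        x = x / 2 + a / (2 * x)   # x_{n+1} = x_n/2 + a/(2 x_n)
    return x

print(newton_sqrt(2.0))  # ≈ 1.4142135623730951
```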