Orthogonal transformations and isometries
Steve Mitchell
January 2002
1 Introduction
Our study of curves and surfaces will focus primarily on their geometric properties—that is,
on properties that are invariant under distance-preserving transformations such as translations, rotations and reflections. Such a transformation is called an isometry. For example,
the arc-length of a curve in R3 is invariant under isometries of R3 . Since isometries play a
central role, it is natural to ask: What does a typical isometry of R3 look like? The main
goal of these notes is to give a complete answer to this question. In particular, we will show
that every oriented isometry or “rigid motion” of R3 is the composition of a translation and
a rotation, while a non-oriented isometry also involves a reflection (see below for the precise
definitions and results). In fact we will analyze more generally the isometries of Rn . This is
no harder than the 3-dimensional case, but no harm will be done if you assume n ≤ 3 for a
first reading.
A secondary goal is to review some basic linear algebra. This is important for two reasons:
(i) The most interesting isometries are the linear isometries, also known as orthogonal transformations or orthogonal matrices; and (ii) linear algebra is an important tool in differential
geometry generally. Concerning item (ii), it could even be said that the essential strategy of
differential calculus is to approximate non-linear transformations by linear transformations.
So we certainly want to have a firm grasp on the linear case. Please note, however, that this
is not a complete or self-contained account of linear algebra; many definitions and proofs
are omitted or only sketched. I am assuming that you have seen most of the linear algebra
before, and that you have a linear algebra textbook at hand for further reference.
2 Vector spaces and linear transformations
2.1 Vector spaces
Recall that a vector space (over the real numbers) is a set V equipped with a zero element,
an addition map + : V × V −→ V , and a scalar multiplication map · : R × V −→ V . These are
subject to a familiar list of axioms, such as the associative, commutative and distributive
laws; see any linear algebra text for the precise list. The zero element is sometimes denoted
simply as 0, and sometimes as ~0 if there is danger of confusion with other zero objects that
are lying around (such as the scalar zero that lives in R).
The fundamental examples are the Euclidean spaces Rn , defined for each n ≥ 1 as the
set of n-tuples (x1 , ..., xn ) of real numbers, with addition and scalar multiplication defined
componentwise. It is useful to also make the convention that R0 is the zero vector space;
that is, the vector space consisting of the zero element and nothing else. There are many
other examples of vector spaces, some of a quite different character. For instance, consider
the set C(R2 , R) of all continuous real-valued functions on the plane. Then with the usual
pointwise addition and scalar multiplication, C(R2 , R) is a vector space. Although examples
of this type are not important for present purposes, they serve to illustrate the fact that the
“vectors” in a vector space are not always vectors in the classical sense of the term.
A basis for a vector space V is a subset S of V such that every v ∈ V can be written
uniquely as a linear combination of elements of S. A fundamental theorem asserts that every
vector space has a basis. We will be interested only in the case when S is finite. In that
case it is a theorem that any two bases have the same number of elements, and this common
number is called the dimension of V . (If there is no finite basis then V is infinite-dimensional;
the vector space C(R2 , R) is an example.) In the finite-dimensional case the defining property
of a basis can be expressed precisely as follows. Let v1 , ..., vn be a basis for V , where n is the
dimension of V . Then for every v ∈ V there are unique coefficients ci ∈ R such that
v = Σ_{i=1}^{n} ci vi .
Note this assertion breaks down into two separate claims: (1) Such an equation exists;
in other words, the vi ’s span V ; and (2) the coefficients ci are unique; in other words, the
vi ’s are linearly independent.
For example, we will frequently make use of the standard basis for Rn , defined as the set
e1 , ..., en where ei = (0, ..., 1, ..., 0) and the “1” is in the i-th place. At some point in your
life you have to prove that this is indeed a basis, but I will assume this fact is known. In
particular, the dimension of Rn is n. It is essential to note, however, that there are infinitely
many different bases for Rn ; the standard basis is just one possibility. In fact if v1 , ..., vn are
vectors in Rn , and we form an n by n matrix A with the vi ’s as columns, then v1 , ..., vn will
constitute a basis if and only if the matrix A is invertible. This is another standard fact
from linear algebra; recall also that A is invertible if and only if A has nonzero determinant.
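Computational aside. As a quick numerical illustration (a sketch of mine, with the vectors chosen arbitrarily), the following Python/NumPy snippet builds the matrix A with a set of vectors as columns, checks the invertibility criterion, and recovers the unique coefficients ci of an arbitrary vector.

```python
# Illustration: testing the basis criterion in R^3 via invertibility.
import numpy as np

v1, v2, v3 = np.array([1.0, 0, 0]), np.array([1.0, 1, 0]), np.array([1.0, 1, 1])
A = np.column_stack([v1, v2, v3])        # the vi's as columns

print(np.linalg.det(A))                  # nonzero, so v1, v2, v3 form a basis of R^3

v = np.array([2.0, 3.0, 5.0])            # an arbitrary vector
c = np.linalg.solve(A, v)                # its unique coefficients in this basis
print(np.allclose(c[0]*v1 + c[1]*v2 + c[2]*v3, v))   # True
```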
2.2 Vector subspaces
A vector subspace, or linear subspace, of a vector space V is a subset W that contains 0
and is closed under addition and scalar multiplication. Here the use of the word “closed”
has nothing to do with topology; it is standard mathematical usage to say that a set is
“closed under the operations blah-blah-blah” if, whenever the operations blah-blah-blah are
performed on members of the set, the resulting element is still in the set. In particular, any
vector subspace of a vector space is then a vector space in its own right; this yields a rich
supply of further examples. For example, note that any line or plane through the origin in
R3 is a vector subspace of R3 . Note also the two extreme cases: V is always a vector subspace
of itself, and the zero vector space {0} is always a vector subspace of V .
Warning. In topology or metric space theory, the term subspace is used in a much more
general sense: A subspace of R3 can be a completely arbitrary subset, with the metric (or
if you know topology, the subspace topology) that it inherits from the usual metric on R3 .
Thus a vector subspace of R3 , or more generally of Rn , is also a subspace in this topological
sense, but not conversely. This can lead to confusion because in the heat of battle one tends
to drop the qualifier “vector” and say simply “subspace” when one really means “vector
subspace”. In fact, proofreading notwithstanding, I would not be surprised if I’ve done it
in these notes. Usually the context will make the meaning clear, but to be safe let’s try to
leave the qualifier in.
Red Alert. A line or plane that does not go through the origin is not a vector subspace of
R3 . The definition is quite explicit about this, after all! In order to make the distinction it
is psychologically critical to have a technical term for these more general lines and planes.
Let us call a subset W of Rn an affine subspace if it has the form W = V + ~q for some
vector subspace V and vector ~q. In other words, after translating W by −~q, we get a vector
subspace V . Notice that any line or plane in R3 is an affine subspace. It is worthwhile to get
this straight, because later, when we discuss tangent planes and lines, we will need to think
of them both ways: as affine subspaces (which is the usual way that you’re accustomed to,
even if you never used that term!), or as the vector subspaces obtained after translation to
the origin.
Here is another type of example: Suppose given a homogeneous system of m linear
equations in n unknowns, written in matrix form as Ax = 0 for some m × n matrix A.
Then the solution set, also known as the nullspace of A (or kernel if one is thinking in terms
of linear transformations), is a vector subspace of Rn . On the other hand, if we have an
inhomogeneous system Ax = b, then the solution set is an affine subspace. Of course it
could be the empty set, but if there is one solution x0 and W is the nullspace of A, then the
complete solution set is precisely the affine subspace W + x0 .
One last type of example: Given vectors v1 , ..., vr ∈ V , we can form the set W consisting
of all linear combinations of the vi ’s. It follows trivially that W is a vector subspace; we call
W the span of the vi ’s, or “the vector subspace spanned by the set S = {v1 , ..., vr }”. Thus any
nonzero vector spans a line, any two linearly independent vectors span a plane, etc.; more
generally, the span of S has dimension at most r, and has dimension exactly r if and only if
the vi ’s are linearly independent.
2.3 Linear transformations and matrices
Let V, W be vector spaces. A linear transformation from V to W is a function F : V −→W
such that
(i) for all v1 , v2 ∈ V , F (v1 + v2 ) = F (v1 ) + F (v2 );
and
(ii) for all v ∈ V , c ∈ R, F (cv) = cF (v).
It is clear that a linear transformation is uniquely determined by its values on a basis
for V . For if v1 , ..., vm is a basis and we know F (vi ) for each i, then to compute F (v) for
arbitrary v we write v = Σ_i ci vi and get F (v) = Σ_i ci F (vi ).
If we fix a basis w1 , ..., wn for W , we can make this even more explicit as follows: There
are unique coefficients aij ∈ R such that
F (vj ) = Σ_{i=1}^{n} aij wi .
If we form the n × m matrix A with entries aij , then the linear transformation F is uniquely
determined by the matrix A, and vice-versa. Notice that the matrix depends on the choice
of basis for V, W.
An important special case is when V = W , so we are considering a linear transformation
from V to itself. In this case we could still consider two different bases v1 , ..., vn and w1 , ..., wn
for V and compute the matrix A in terms of these bases—the vi ’s being used for the input,
and the wi ’s for the output. Occasionally it will be useful to do this, but more often we
will insist on using a single basis for V . Thus when we say that F : V −→V is a linear
transformation and has matrix A = (aij ) with respect to the basis {vi }, we mean that
F (vj ) = Σ_i aij vi .
The basic example is to take V = W = Rn , and to represent linear transformations by
their matrices with respect to the standard basis. Recall the general fact: The i-th column
of the matrix represents the image under F of the i-th standard basis vector (if we are
computing with respect to the standard basis in Rn ). Here are some specific examples, with
n = 3; in this case we often use the traditional x, y, z in place of x1 , x2 , x3 :
Example 1. Let F (x, y, z) = (−y, x, z). Geometrically, F is a 90-degree rotation around
the z-axis, counterclockwise as viewed from the positive z-axis. The matrix A with respect
to the standard basis is

[ 0   −1   0 ]
[ 1    0   0 ]
[ 0    0   1 ]
Example 2. Let F (x, y, z) = (x, y, −z). Then F is reflection across the xy-plane, and has
matrix

[ 1   0    0 ]
[ 0   1    0 ]
[ 0   0   −1 ]
Example 3. Generalizing example 2, let F (x, y, z) = (ax, by, cz) for scalars a, b, c. Then
the matrix of F is

[ a   0   0 ]
[ 0   b   0 ]
[ 0   0   c ]
A matrix of this form is called diagonal, for the obvious reason. The geometric interpretation is also easy to see: Suppose for instance that a, b > 1 and 0 < c < 1. Then F is an
expansion (or “dilation”) by the appropriate factor in the x, y directions, and is a contraction
by the factor c in the z direction. For another example, suppose a = 0 and b = c = 1. Then
F is projection on the yz-plane. Note that all of the above examples except this last one are
invertible.
There is no law that says we must use the standard basis. Indeed, it will be essential
to remain flexible and allow ourselves to choose whatever basis is most convenient for the
problem at hand. Consider, for instance, the linear transformation F of the plane that
reflects across the line through the origin at an angle of π/6 with the x-axis. The matrix of
F with respect to the standard basis is
[ 1/2    √3/2 ]
[ √3/2  −1/2  ]
Now suppose we use instead the basis v1 = (√3, 1), v2 = (−1, √3). Then the matrix of F
with respect to this basis is

[ 1    0 ]
[ 0   −1 ]
The second matrix is obviously simpler, and displays the essential features of the transformation: F fixes the line spanned by v1 pointwise, and acts as multiplication by −1 on the
perpendicular line spanned by v2 .
2.4 Composition of linear transformations
Suppose we are given linear transformations G : Rk −→Rm and F : Rm −→Rn . Then the
composition F ◦ G is a linear transformation Rk −→Rn , as you can easily check. This raises
the question: How is the matrix representing F ◦ G related to the matrices representing
G, F ? (Here all matrices are being computed with respect to the standard bases.) Let A,
B be matrices of size n × m, m × k, respectively. Then the product AB is defined to be
the n × k matrix whose ij-th entry is the dot product of the i-th row of A with the j-th
column of B. Note this only makes sense when the number of columns of A is the same as
the number of rows of B. Explicitly, we have
(AB)ij = Σ_{r=1}^{m} air brj .
Proposition 2.1 Let G : Rk −→Rm and F : Rm −→Rn be linear transformations. Let B, A
denote the matrices of G, F , respectively, with respect to the standard bases. Then the matrix
of F ◦ G is AB.
The proof is a straightforward check, if you keep your wits about you and don’t let those
unruly indices get the upper hand. In beginning linear algebra courses, one usually defines
matrix multiplication first and relates it to composition of linear transformations later. But
from a conceptual point of view, Proposition 2.1 is the reason why matrix multiplication is
defined the way it is in the first place. Here is a simple illustration of the enormous advantage
of the conceptual viewpoint: One of the important properties of matrix multiplication is
that it is associative: A(BC) = (AB)C, provided the matrices in question have the right
size so that their products are defined. A direct proof of this fact from the definition of
matrix multiplication is tedious and unilluminating. With Proposition 2.1 at hand, however,
the proof becomes trivial and at the same time enlightening—the associativity of matrix
multiplication is equivalent to associativity of composition of functions: F ◦ (G ◦ H) =
(F ◦ G) ◦ H. This last equation is immediate on inspection for any kind of functions, not
just linear ones. Q.E.D.!
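Computational aside. The following Python/NumPy sketch (the matrices are my choices for illustration) checks Proposition 2.1 on a pair of linear transformations of the plane, and checks associativity on one particular triple of matrices.

```python
# Illustration: the matrix of F∘G is AB, and associativity of matrix multiplication.
import numpy as np

A = np.array([[0.0, -1.0], [1.0, 0.0]])   # F: 90-degree counterclockwise rotation of the plane
B = np.array([[1.0, 0.0], [0.0, -1.0]])   # G: reflection across the x-axis

v = np.array([2.0, 3.0])
print(np.allclose(A @ (B @ v), (A @ B) @ v))   # True: applying G then F equals applying AB

C = np.array([[2.0, 1.0], [1.0, 1.0]])
print(np.allclose(A @ (B @ C), (A @ B) @ C))   # True: associativity for this triple
```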
2.5 Eigenvalues and eigenvectors
In the long run, there will be much to say on this subject. For now, I just review the most
basic facts and definitions. Recall that a nonzero vector v is an eigenvector for the linear
transformation A if there is a scalar λ such that Av = λv. The scalar λ is an eigenvalue. Some
linear transformations have no eigenvectors at all; for example, any rotation of the plane
through an angle that is not a multiple of π. At the opposite extreme, it can happen that there is actually a basis of Rn
consisting of eigenvectors of A. This is equivalent to saying that there is a basis such that
the matrix of A with respect to that basis is diagonal (exercise). More examples can be
found below.
The most important result about eigenvalues is the following. Recall that the characteristic polynomial of A is det (xI − A), where x is an indeterminate; if A is n × n, the
characteristic polynomial is a monic polynomial of degree n.
Proposition 2.2 Let A be a linear transformation of Rn (thought of as an n × n matrix
with respect to the standard basis). Then the eigenvalues of A are precisely the real roots of
the characteristic polynomial of A.
For example, the fact that the 90-degree rotation matrix has no eigenvalues is reflected
in the fact that its characteristic polynomial is x2 + 1, which has no real roots.
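Computational aside. NumPy can compute characteristic polynomials and eigenvalues directly, which makes Proposition 2.2 easy to experiment with; the matrices below are my illustrative choices.

```python
# Illustration: characteristic polynomials and (real) eigenvalues.
import numpy as np

R = np.array([[0.0, -1.0], [1.0, 0.0]])   # 90-degree rotation
print(np.poly(R))                          # [1. 0. 1.]: the characteristic polynomial x^2 + 1
print(np.linalg.eigvals(R))                # ±i: no real eigenvalues, hence no eigenvectors in R^2

D = np.array([[2.0, 0.0], [0.0, 3.0]])     # diagonal matrix
print(np.linalg.eigvals(D))                # [2. 3.]: the standard basis vectors are eigenvectors
```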
Remark: There is a small ambiguity in the terminology here, at least as used in practice.
Any n × n matrix of real numbers can also be regarded as a matrix of complex numbers, and
when so regarded it may have various non-real eigenvalues. In the example just given, the
rotation matrix has complex eigenvalues ±i. In order to avoid this ambiguity we sometimes
use the terminology “A has a real eigenvalue” in order to emphasize that we are excluding
the non-real eigenvalues, if any (even though logically this is redundant since our eigenvalues
are real by definition).
2.6 Invariant subspaces
Suppose F : V −→V is a linear transformation from V to itself, where we assume as usual
that V is finite-dimensional. A vector subspace W ⊂ V is F-invariant, or an invariant subspace
for F, if F (W ) ⊂ W .
Exercise 1. Prove the following: Suppose F is invertible and W is F -invariant. Then
F (W ) = W . Conclude that W is invariant under the inverse transformation F −1 .
As an example, think of a rotation around the z-axis. Then both the xy-plane and the
z-axis are invariant subspaces. Of course the two extreme cases—the zero vector subspace
of V , and V itself—are automatically F -invariant for any linear transformation F . But in
general a linear transformation may not have any other invariant subspaces; for example, a
rotation of the plane through an angle that is not a multiple of π has none. Note also that F has an invariant subspace of dimension one
(i.e., a line) if and only if F has a real eigenvalue. (Prove!)
2.7 Matrix algebra
Let Mn R denote the set of all n × n matrices over R. Then Mn R is a vector space in its
own right, with the evident component-wise definition of addition and scalar multiplication.
Its dimension is n^2 , and indeed as a vector space it is really no different from R^(n^2) ; we just
decided to arrange our n^2 numbers in an n × n array instead of in a single row. But Mn R has
a lot more structure to it, because we also have matrix multiplication. The multiplication
satisfies the associative law already discussed, as well as the usual distributive laws. Those
who know some abstract algebra will recognize that Mn R is precisely what is called a ring
(a term for which neither Wagner nor Tolkien can take any credit), but we don’t need to
get that abstract. The crucial point is that not all of the “usual rules of algebra” are still
operative; in particular, the commutative law AB = BA does not hold.
Let’s be clear about one thing right away: There is nothing in the least bit mysterious
or surprising about this failure of the commutative law. Life is full of non-commutative
operations: Put on your socks. Put on your shoes. I claim these two operations do not
commute; try it if you don’t believe me. The order of operations matters—this is well-known to anyone who has ever played with Rubik’s cube, or moved a large piece of furniture
through a corner doorway. Thus the fact that the matrices
[ 0   −1 ]
[ 1    0 ]

and

[ 1    0 ]
[ 0   −1 ]
fail to commute is as commonplace as a rainy December day in Seattle. If you rotate the
plane 90 degrees counterclockwise, then reflect across the x-axis, the point (1, 0) ends up at
(0, −1). If you do it in the other order, it ends up at (0, 1). Try it with a piece of paper,
or lie down on the floor and act it out. There is no reason such operations should commute
with one another, and they don’t.
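Computational aside. For readers who prefer the computer to the floor, here is the same check in NumPy (a sketch of mine, not part of the argument).

```python
# Illustration: the rotation and reflection above do not commute.
import numpy as np

R = np.array([[0, -1], [1, 0]])    # rotate 90 degrees counterclockwise
S = np.array([[1, 0], [0, -1]])    # reflect across the x-axis
p = np.array([1, 0])

print(S @ (R @ p))                   # rotate, then reflect: [0, -1]
print(R @ (S @ p))                   # reflect, then rotate: [0, 1]
print(np.array_equal(S @ R, R @ S))  # False: the matrices do not commute
```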
3 Inner products and orthonormal frames
3.1 Inner products
Suppose given vectors v = (a1 , ..., an ) and w = (b1 , ..., bn ) in Rn . The dot product (or inner
product, or scalar product) is given by
v · w = Σ_{i=1}^{n} ai bi .
Another common notation is ⟨v, w⟩. The key properties of the inner product are as follows:
• It is bilinear: ⟨v1 + v2 , w⟩ = ⟨v1 , w⟩ + ⟨v2 , w⟩ and for scalars c we have ⟨cv, w⟩ = c⟨v, w⟩,
and similarly with the roles of v, w reversed.
• It is commutative: ⟨v, w⟩ = ⟨w, v⟩.
• It is positive definite: ⟨v, v⟩ ≥ 0, with equality if and only if v = 0.
The length of a vector v is by definition |v| = √(v · v). Two vectors v, w are orthogonal
if ⟨v, w⟩ = 0. Two vector subspaces V, W of Rn are orthogonal if v, w are orthogonal for all
v ∈ V and w ∈ W . The orthogonal complement of a vector subspace V is the set of all
vectors in Rn orthogonal to V . We denote it V ⊥ (read as “V-perp”). It is easy to check that
V ⊥ is itself a vector subspace. For example, the orthogonal complement of the z-axis is the
xy-plane, and vice-versa.
A collection of vectors v1 , ..., vk in Rn is orthonormal if each vi has length one and any
two distinct vi ’s are orthogonal. In other words,
vi · vj = 1 if i = j,   and   vi · vj = 0 if i ≠ j.
Exercise. Show that any set of orthonormal vectors v1 , ..., vk ∈ Rn is linearly independent.
Hence if k = n, these vectors form an orthonormal basis of Rn .
The dot product determines the length function, by definition. Conversely, the length
function determines the dot product by the so-called “polarization identity” (a fancy name
for a simple fact):
Proposition 3.1 v · w = (1/2)(|v + w|^2 − |v|^2 − |w|^2 ).
Proof: Easy check.
Finally, we note a handy alternative way of thinking about the dot product. We regard
vectors v in Rn as “column vectors”, i.e. as n × 1 matrices. Then the transpose v T is just
the 1 × n matrix or “row vector” obtained by writing out the components of v horizontally
instead of vertically. With this convention, v · w = v T w, where the right-hand side is matrix
multiplication. Thinking of the dot product this way greatly simplifies certain calculations,
as we’ll see below.
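Computational aside. A short numerical check (a sketch of mine, with arbitrary test vectors) of the two facts just stated: the polarization identity and the formula v · w = v^T w.

```python
# Illustration: dot product, polarization identity, and v·w as a matrix product.
import numpy as np

v = np.array([1.0, 2.0, 2.0])
w = np.array([3.0, 0.0, -4.0])

print(np.dot(v, w))                        # -5
print(v @ w)                               # the same number (NumPy computes the inner product)
polar = 0.5 * (np.linalg.norm(v + w)**2 - np.linalg.norm(v)**2 - np.linalg.norm(w)**2)
print(np.isclose(polar, np.dot(v, w)))     # True: polarization identity
```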
3.2 Orthonormal bases
Proposition 3.2 Let V be a linear subspace of Rn . Then V admits an orthonormal basis.
In fact a stronger statement is true: If v1 , ..., vm is any basis for V , then there exists an
orthonormal basis w1 , ..., wm with the property that for all 1 ≤ k ≤ m, the span of w1 , ..., wk
is the same as the span of v1 , ..., vk .
Proof: We proceed by induction on the dimension m of V . If m = 1 and v1 is a basis for
V , then we just take the normalization w1 = v1 /|v1 |. At the inductive step, suppose V
has dimension m + 1, and let v1 , ..., vm+1 be a basis. Applying the inductive hypothesis to
the subspace U with basis v1 , ..., vm , we get an orthonormal basis w1 , ..., wm for U with the
property stated in the proposition. Now define
xm+1 = vm+1 − Σ_{i=1}^{m} (vm+1 · wi ) wi .
A simple calculation shows that xm+1 · wi = 0 for all i; notice also that the span of
w1 , ..., wm , xm+1 is still V . Then set wm+1 = xm+1 /|xm+1 |. This is the desired basis, QED.
Remark: Note that the proof gives an algorithm for computing the wi ’s, known as the
Gram-Schmidt process. For many purposes, however, all we need is the existence statement.
Remark. Instead of assuming V is given as a linear subspace of Rn , we could have assumed
V is any finite-dimensional real vector space equipped with an inner product V × V −→R.
The rest of the statement and proof go through unchanged.
Given V as above, choose an orthonormal basis v1 , ..., vm for V and an orthonormal basis
w1 , ..., wk for V ⊥ . Then I claim that v1 , ..., vm , w1 , ..., wk is an orthonormal basis for Rn , so
m + k = n. Certainly these m + k vectors are orthonormal, hence linearly independent. It
remains to show that they span Rn . If not, then as in the proof of the proposition we can
find a unit vector u that is orthogonal to the vi ’s and the wj ’s. Since u is orthogonal to the
vi ’s, u ∈ V ⊥ by definition. Since u is orthogonal to the wj ’s, it is orthogonal to itself and
hence u = 0, contradiction.
A nice basis-free way of expressing the results of the previous paragraph:
Proposition 3.3 Let V ⊂ Rn be a linear subspace. Then every x ∈ Rn can be written
uniquely in the form v + w, where v ∈ V and w ∈ V ⊥ .
Proof: Expanding x in the basis vi , wj defined above shows that x = v + w as required. For
the uniqueness, suppose x = v1 + w1 is another such expression. Then v + w = v1 + w1 , so
v − v1 = w1 − w. But clearly V ∩ V ⊥ = 0 (any element of the intersection is orthogonal to
itself), forcing v − v1 = 0 = w1 − w. So v = v1 and w = w1 as desired.
3.3 Orthogonal projections
This section is of general interest, although tangential to our main goal of classifying isometries.
3.3.1 Definition and the main examples
Proposition 3.4 Suppose V ⊂ Rn is a linear subspace (picture a line or plane in R3 ). Then
there is a unique linear transformation πV : Rn −→Rn with the property
πV (x) = x if x ∈ V ,   and   πV (x) = 0 if x ∈ V ⊥ .
Proof: Write x = v + w as in Proposition 3.3, and define πV (x) = v. The uniqueness in
Proposition 3.3 ensures πV is well-defined, and makes it easy to show πV is linear (check
this!). The desired property of πV is then immediate from the definition. Finally, the fact
that πV is the unique linear transformation with the stated property is also immediate: If π
also has the property, then π(x) = π(v) + π(w) = v + 0 = v = πV (x).
The transformation πV is called orthogonal projection on V; πV (x) is thought of as the
“component of the vector x in the V direction”.
Exercise. Show that: a) πV πV = πV ;
b) πV ⊥ = I − πV , where I is the identity;
c) πV πV ⊥ = 0 = πV ⊥ πV .
In our differential geometry applications, we’ll be interested mainly in the case n = 3,
with V a line or a plane. In these cases there is a simple explicit formula for πV . Suppose
first that V is a line, and choose a unit vector v that spans it (there are exactly two such
vectors). Then
πV (x) = ⟨x, v⟩ v.
To prove this, one has only to check that πV is linear, πV (v) = v and πV (x) = 0 if ⟨x, v⟩ = 0.
All three are immediate on inspection. As a reality check, it behooves us to see what happens
when we replace v by −v; we had better get the same transformation! But as you can see,
the minus sign will appear twice in the above formula, and cancel out.
Now suppose V is a plane. Choose v a unit vector orthogonal to V (again there are two
choices). Then by part (b) of the exercise we have immediately
πV (x) = x − ⟨x, v⟩ v.
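Computational aside. In code (a sketch of mine, with the particular vectors chosen for illustration) the two formulas look like this; the final line checks the decomposition x = πV (x) + πV ⊥ (x) from Proposition 3.3.

```python
# Illustration: projection onto a line and onto a plane in R^3.
import numpy as np

def proj_line(x, v):
    """Orthogonal projection onto the line spanned by the unit vector v."""
    return np.dot(x, v) * v

def proj_plane(x, n):
    """Orthogonal projection onto the plane whose unit normal is n."""
    return x - np.dot(x, n) * n

v = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)
x = np.array([1.0, 2.0, 3.0])

print(proj_line(x, v))                                      # [2. 2. 2.]
print(np.allclose(proj_line(x, v), proj_line(x, -v)))       # True: the sign of v cancels out
print(np.allclose(proj_line(x, v) + proj_plane(x, v), x))   # True: x = line part + plane part
```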
3.3.2 A general matrix formula for projections
This section is optional reading. We consider how to express orthogonal projections as
explicit matrices.
Choose an orthonormal basis v1 , ..., vk for V , and form the matrix A = (v1 |v2 |...|vk ) (i.e.
the vi ’s are the columns). Then the orthonormality is equivalent to the condition AT A = Ik ,
where Ik is the k × k identity matrix. We can also take the product in the other order: AAT ,
obtaining an n × n matrix.
Proposition 3.5 The matrix of πV in the standard basis is AAT .
Proof: Since AAT defines a linear transformation, we only need to check the properties
(AAT )(x) = x if x ∈ V ,   and   (AAT )(x) = 0 if x ∈ V ⊥ ,
as in Proposition 3.4. First note that
(AAT )A = A(AT A) = AIk = A.
Since the columns of A are the vi ’s, it follows that (AAT )(vi ) = vi for all i, and then by
linearity that (AAT )(v) = v for all v ∈ V . On the other hand if w ∈ V ⊥ then (AAT )(w) =
A(AT w) = 0, since by assumption w is orthogonal to the columns of A. QED.
Note the surprising consequence: The matrix AAT is independent of the choice of orthonormal basis for V . We saw in the previous section how this works when V is a line, and
it is possible to give an a priori proof here as well, if desired.
Example. If V is a line spanned by a unit vector v = (c1 , ..., cn ), then AAT is the matrix
(ci cj ). For example if n = 3 and v = (1/√3)(1, 1, 1), then AAT is the matrix with all entries
equal to 1/3.
Remark. Call an n × n matrix B symmetric if bij = bji for all i, j; in other words,
B = B T . Then (as already seen in the previous example) the projection matrix AAT above
is symmetric: (AAT )T = (AT )T AT = AAT . Symmetric matrices arise in another way in the
context of “self-adjoint linear operators”, which in turn arise in the definition of curvature
for surfaces.
Finally, note that if we choose any orthonormal bases v1 , ..., vm and w1 , ..., wk for V , V ⊥
as we did earlier, then the matrix of πV with respect to the basis v1 , ..., vm , w1 , ..., wk is
[ Im   0 ]
[ 0    0 ]
Here Im is the m × m identity matrix, and the 0’s denote zero matrices of the appropriate
sizes.
4 Orthogonal transformations
Let F : Rn −→Rn be a linear transformation. We call F an orthogonal transformation if F
preserves inner products:
⟨F (v), F (w)⟩ = ⟨v, w⟩.
It follows that F preserves lengths (take v = w) and orthogonality (i.e., if v, w are orthogonal,
that is to say have inner product zero, then F (v), F (w) are again orthogonal); hence the term
orthogonal transformation.
(Footnote: Unfortunately, the terminology here clashes with the term “orthogonal projection”. Orthogonal
projections are never orthogonal transformations, except for the identity.)
Naturally, we would like to characterize orthogonal transformations in terms of the associated matrix with respect to the standard basis. Call an n × n matrix A orthogonal if
AAT = I. Here T denotes the transpose and I is the identity matrix. In particular, A is
invertible with inverse AT . If we unravel the equation AAT = I into components, we find
that it says precisely that the rows of A form an orthonormal basis for Rn . Since AAT = I
implies AT A = I, it follows that the columns also form an orthonormal basis. Summarizing:
A matrix is orthogonal if and only if its rows form an orthonormal basis if and only if its
columns form an orthonormal basis.
Here is the promised characterization:
Proposition 4.1 Let F : Rn −→Rn be a linear map. Then the following are equivalent:
a) F is a linear isometry.
b) F is an orthogonal transformation.
c) The matrix of F with respect to any orthonormal basis is orthogonal.
d) The matrix of F with respect to the standard basis is orthogonal.
Proof: (a) ⇒ (b): By assumption F preserves distances, so in particular preserves distance
from the origin: |F (v)| = |v| for all v. Now use the polarization identity and the linearity of
F to show F preserves inner products, i.e. is an orthogonal transformation. (The linearity
of F is needed to get |F (v) + F (w)|2 = |F (v + w)|2 .)
(b) ⇒ (c): Let v1 , ..., vn be an orthonormal basis, and let A denote the matrix of F with
respect to this basis. Hence the j-th column of A is F (vj ), expressed in the given basis.
Since F preserves inner product, it preserves orthonormality. It follows that the columns are
orthonormal, i.e. A is an orthogonal matrix. (Here we’re using the fact that if v = Σ ai vi
and w = Σ bi vi , then since the vi ’s are orthonormal, v · w = Σ ai bi .)
(c) ⇒ (d): Immediate.
(d) ⇒ (a): It’s convenient and instructive to break this into (d) ⇒ (b) ⇒ (a). So let A
denote the matrix of F in the standard basis, and suppose A ∈ O(n). We want to show first
that (Av) · (Aw) = v · w for all v, w. Recall that the dot product of any two column vectors
x, y is the same as the matrix product xT y. So we compute
F (v) · F (w) = (Av) · (Aw) = (Av)T Aw = v T AT Aw = v T Iw = v T w = v · w.
So F is an orthogonal transformation. It follows that |F (v)| = |v| for all v. But the
distance d(v, w) = |v − w|, so using the linearity of F we get
d(F (v), F (w)) = |F (v) − F (w)| = |F (v − w)| = |v − w| = d(v, w)
as desired.
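Computational aside. A numerical illustration of the proposition (a sketch of mine, using a rotation of the plane as the test matrix): orthonormal rows and columns, preserved inner products, preserved distances.

```python
# Illustration: checking the equivalent conditions of Proposition 4.1 numerically.
import numpy as np

theta = 0.7
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])          # a rotation, hence orthogonal

print(np.allclose(A @ A.T, np.eye(2)))                   # rows orthonormal
print(np.allclose(A.T @ A, np.eye(2)))                   # columns orthonormal

v, w = np.array([1.0, 2.0]), np.array([-3.0, 0.5])
print(np.isclose((A @ v) @ (A @ w), v @ w))              # inner products preserved
print(np.isclose(np.linalg.norm(A @ v - A @ w), np.linalg.norm(v - w)))   # distances preserved
```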
Exercise 2: Prove the following basic properties:
a) If A is orthogonal then det A = ±1.
b) Any product of orthogonal matrices is orthogonal, the inverse of an orthogonal matrix
is orthogonal, and the identity matrix is orthogonal.
c) If λ is a real eigenvalue of an orthogonal matrix A, then λ = ±1. In particular, a
diagonal matrix is orthogonal if and only if its diagonal entries are all ±1.
d) Suppose A is an orthogonal matrix, and v, w are eigenvectors of A with eigenvalues
+1, −1 respectively. Then v is orthogonal to w.
Let’s look at some low-dimensional cases. If n = 1 there are exactly two orthogonal
matrices; namely, ±1. Yawn. If n = 2, the situation is already getting interesting. For
example, consider the orthogonal transformation given by counterclockwise rotation through
an angle θ. The corresponding matrix is
[ cos θ   −sin θ ]
[ sin θ    cos θ ]
Note that a direct computation shows this matrix is indeed orthogonal (its columns have
length one, and are orthogonal). Note that such a matrix has no real eigenvalues, except
when θ is a multiple of π. On the other hand the matrix
[ −cos θ   sin θ ]
[  sin θ   cos θ ]
is a reflection for every θ, since its determinant is −1 (in particular it is never the identity).
Verify this by finding the axis of reflection explicitly. See Exercise 4 for the general definition
of reflection.
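Computational aside. If you would rather let the computer find the axis (a sketch of mine): the reflection matrix above is symmetric, so np.linalg.eigh returns its eigenvalues −1 and +1, and the +1 eigenvector spans the axis of reflection.

```python
# Illustration: determinant and axis of the reflection matrix above.
import numpy as np

theta = 0.7
B = np.array([[-np.cos(theta), np.sin(theta)],
              [ np.sin(theta), np.cos(theta)]])

print(np.isclose(np.linalg.det(B), -1.0))   # True: determinant -1, so a reflection
vals, vecs = np.linalg.eigh(B)              # B is symmetric; eigenvalues sorted ascending
print(vals)                                  # [-1.  1.]
axis = vecs[:, 1]                            # eigenvector for eigenvalue +1
print(np.allclose(B @ axis, axis))           # True: the axis of reflection is fixed pointwise
```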
Exercise 3. a) Show that a 2 × 2 orthogonal matrix is a rotation if it has determinant one,
and is a reflection if it has determinant -1.
b) Show that every rotation is a product of two reflections.
c) Show that if A is a rotation and B is a reflection, then AB = BA if and only if A = ±I.
Now consider the case n = 3, which is the most important case for us. One easy way to
cook up examples is to promote 2 × 2 examples. Thus if
[ a   b ]
[ c   d ]
is orthogonal, then so is

[ a   b    0 ]
[ c   d    0 ]
[ 0   0   ±1 ]
For example, if the original matrix was rotation through angle θ and we use +1 in the
lower right entry, the new matrix is rotation of 3-space around the z-axis, through the same
angle. But there is nothing special about the z-axis; we could take some other line L through
the origin, and then rotate the plane W orthogonal to L to get an orthogonal transformation.
Eventually, we will probably want to find a formula for this matrix (in terms of a unit vector
spanning L, and an angle of rotation), but there is no need to do so now.
Among the many interesting properties enjoyed by orthogonal matrices (I always liked this use of
the word “enjoy” in mathematical writing; it conjures up an image of orthogonal matrices frolicking
on the beach in Tahiti, rotating and reflecting under the tropical sun), I would like to emphasize
the following:
Theorem 4.2 Suppose A is an orthogonal matrix (and hence represents an orthogonal transformation of Rn ), and W ⊂ Rn is an A-invariant subspace. Then the orthogonal complement
W ⊥ is also an A-invariant subspace.
Proof: Suppose v ∈ W ⊥ ; that is, ⟨v, w⟩ = 0 for all w ∈ W . Then
⟨Av, w⟩ = ⟨Av, AA−1 w⟩ = ⟨v, A−1 w⟩ = 0.
Here the second equality uses the fact that A is orthogonal, while the third uses the fact
that W is also invariant under A−1 . Since w ∈ W was arbitrary, this completes the proof.
Note that the theorem can fail for non-orthogonal matrices. For example, if A is the
matrix
[ 1   1 ]
[ 0   1 ]
then the x-axis is A-invariant, but its orthogonal complement, namely the y-axis, is not.
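Computational aside. A quick check of both halves of this discussion (a sketch of mine): the shear leaves the x-axis invariant but not the y-axis, while for a rotation about the z-axis in R3 both the z-axis and its orthogonal complement, the xy-plane, are invariant.

```python
# Illustration: invariant subspaces and their orthogonal complements.
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])                   # the shear above (not orthogonal)
print(A @ np.array([1.0, 0.0]))              # [1. 0.]: the x-axis is A-invariant
print(A @ np.array([0.0, 1.0]))              # [1. 1.]: the y-axis is not

Q = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])             # rotation about the z-axis (orthogonal)
print(Q @ np.array([0.0, 0.0, 1.0]))         # stays on the z-axis
print(Q @ np.array([1.0, 2.0, 0.0]))         # stays in the xy-plane, as Theorem 4.2 predicts
```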
As an application of the above theorem, we prove:
Theorem 4.3 Let F be an orthogonal transformation of R3 . Then there is an orthonormal
basis v1 , v2 , v3 such that the matrix of F with respect to this basis has the form

[ a   b    0 ]
[ c   d    0 ]
[ 0   0   ±1 ]
To prove this we need some preliminary results that are of interest in their own right.
Lemma 4.4 Let f (x) be a polynomial of odd degree n, with real coefficients. Then f has a
real root.
Proof: The assumption is that f (x) = a0 + a1 x + ... + an x^n , where n is odd, the ai ’s are real
numbers, and an ≠ 0. Without loss of generality, we can assume an > 0. Then as x → ∞,
f (x) → ∞; and as x → −∞, f (x) → −∞ (note this uses the assumption n odd!). In
particular, f (a) < 0 for some a < 0, and f (b) > 0 for some b > 0. It then follows from the
Intermediate Value Theorem that f (c) = 0 for some a < c < b.
Now recall that for any n×n matrix A, the eigenvalues of A are the roots of its characteristic polynomial det (xI − A). Since this is a polynomial of degree n, Lemma 4.4 immediately
implies:
Proposition 4.5 Let A be an n × n matrix, and suppose n is odd. Then A has a real
eigenvalue λ. Hence A has an eigenvector v ∈ Rn , with Av = λv.
(Note that both the lemma and the proposition are false for n even: For example, the
90-degree rotation matrix has no real eigenvalues; its characteristic polynomial is x2 + 1.)
We now return to the proof of the theorem. Since the number three is well-known to
be odd (for a proof of this fact, see Parity questions in Ordovician trilobite morphology by I. M. Jauqine,
Journal des Études Théologiques, vol. 73, pp. 48-912), by Proposition 4.5 F has a real eigenvalue λ. Furthermore, since F is orthogonal,
λ = ±1. Let v be a unit eigenvector, let L denote the line spanned by v, and let W denote
the plane orthogonal to v. Then L and W are both F -invariant subspaces. Hence if v1 , v2
is an orthonormal basis for W , and we set v3 = v, then v1 , v2 , v3 is an orthonormal basis for
R3 with the desired property.
The orthogonal group O(n) is the set of all orthogonal n × n matrices. The special
orthogonal group SO(n) is the subset consisting of all A ∈ O(n) with determinant 1. The
term “group” refers to the fact that O(n) is a group under matrix multiplication, and SO(n)
is a subgroup. You don’t need to know any group theory here, but those who do might
amuse themselves by showing that SO(n) is in fact a normal subgroup of index 2. We also
set O(n)− = {A ∈ O(n) : det A = −1}.
Corollary 4.6 If A ∈ SO(3), then A is a rotation. If A ∈ O(3)− , then A is the product of
a reflection and a rotation (where reflections themselves are included in this description by
taking the rotation to be the identity).
Proof: Suppose A ∈ SO(3). Choose a basis as in the above theorem. If the entry in the
lower right corner is +1, then the 2-by-2 block in the upper left must have determinant one,
and hence is a rotation by Exercise 3a. Thus the axis of rotation is the line spanned by v3 .
If the entry in the lower right corner is −1, then the 2-by-2 block is a reflection. A further
change of basis then yields a new matrix that is diagonal with diagonal entries −1, −1, 1;
hence A is a rotation through angle π. The second assertion of the corollary is similar, and
left as an exercise.
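Computational aside. The corollary can be made computational (a sketch of mine, with the random test matrix and all names being my own choices): for A ∈ SO(3), an eigenvector for the eigenvalue 1 spans the axis of rotation, and the trace determines the angle via trace(A) = 1 + 2 cos θ.

```python
# Illustration: recovering the axis and angle of a rotation A in SO(3).
import numpy as np

theta = 0.7
C = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
B, _ = np.linalg.qr(np.random.default_rng(0).normal(size=(3, 3)))   # some orthogonal matrix
A = B @ C @ B.T                                                     # an element of SO(3) with a "hidden" axis

vals, vecs = np.linalg.eig(A)
axis = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])   # eigenvector for the eigenvalue 1
angle = np.arccos((np.trace(A) - 1.0) / 2.0)             # since trace(A) = 1 + 2 cos(theta)

print(np.allclose(A @ axis, axis))        # True: the axis is fixed
print(np.isclose(angle, theta))           # True (up to the sign convention for the angle)
```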
Corollary 4.7 SO(3) is path-connected.
Proof: Intuitively this is now obvious: Since any A ∈ SO(3) is a rotation about some axis, we
can follow a path along rotations about the same axis to reach the identity matrix. Somewhat
more precisely, we know that there is an orthonormal basis for R3 such that the matrix of A
with respect to this basis is

[ cos θ   −sin θ   0 ]
[ sin θ    cos θ   0 ]
[ 0        0       1 ]
Here θ ∈ [0, 2π); inserting tθ in place of θ yields a path to the identity matrix. But
there’s still something slightly fishy about this, because we changed our basis in R3 ; how do
we know that this preserves continuity? To make things absolutely, pedantically precise, we
recall that the above assertion beginning “we know that there is an orthonormal basis...”
is equivalent to saying that there exists B ∈ O(3) such that if C is the matrix displayed
above (now thought of in terms of the standard basis), A = BCB −1 . Now certainly there is
a continuous path λ from C to the identity in SO(3), as described above; then the path
µ(t) = Bλ(t)B −1 is a continuous path from A to the identity (continuous because matrix
multiplication is continuous).
It follows that O(3) has two path-components, namely SO(3) and O(3)− . See the more
general statement in the optional exercises below.
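Computational aside. To make the argument concrete (a sketch of mine, with the change of basis B generated at random): the path below conjugates a shrinking z-axis rotation by a fixed B, and every sampled point of the path is checked to lie in SO(3).

```python
# Illustration: an explicit path in SO(3) from a given rotation A to the identity.
import numpy as np

def rot_z(t):
    return np.array([[np.cos(t), -np.sin(t), 0.0],
                     [np.sin(t),  np.cos(t), 0.0],
                     [0.0,        0.0,       1.0]])

theta = 2.0
B, _ = np.linalg.qr(np.random.default_rng(1).normal(size=(3, 3)))   # change of basis
A = B @ rot_z(theta) @ B.T

def mu(t):
    """mu(1) = A and mu(0) = identity; compare the path B lambda(t) B^{-1} in the text."""
    return B @ rot_z(t * theta) @ B.T

for t in (1.0, 0.5, 0.0):
    M = mu(t)
    print(np.allclose(M @ M.T, np.eye(3)), np.isclose(np.linalg.det(M), 1.0))  # stays in SO(3)
print(np.allclose(mu(1.0), A), np.allclose(mu(0.0), np.eye(3)))                # endpoints
```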
Optional Exercises 4.
1. Show that any A ∈ O(n) can be written as a product of m reflections, for some m ≤ n.
Notes: 1.1. (As a matter of convention, we regard the identity matrix as a product of
zero reflections, and a reflection itself as a product of one reflections, casting grammatical
correctness to the winds.) Here the precise definition of reflection is as follows: Call a vector
subspace W of Rn a hyperplane if it has dimension n − 1. Thus a hyperplane in R2 is a
line, a hyperplane in R3 is a plane in the usual sense, and so on. A reflection is a linear
transformation σ of Rn such that there is a hyperplane W with σ(w) = w for all w ∈ W ,
and σ(v) = −v for all v in the line orthogonal to W . Note that every reflection is necessarily
an orthogonal transformation.
1.2. Proceed by induction on n. At the inductive step, assume the result proven for
O(n − 1).
Case 1: Suppose that Aen = en . Then A ∈ O(n − 1), where by abuse of notation we are
identifying O(n − 1) with the subgroup of O(n) that fixes en . Hence by inductive hypothesis,
A is a product of m reflections with m ≤ n − 1.
Case 2: Suppose Aen ≠ en . Digress to prove the following lemma: If v, w ∈ S^{n−1} with
v ≠ w, then there is a reflection r with rv = w. Hence we can find such an r with rAen = en .
By case 1, rA = r1 r2 ...rm with m < n, so A = rr1 r2 ...rm is a product of m + 1 reflections,
with m + 1 ≤ n.
2. Call an orthogonal transformation R of Rn a rotation if there is a vector subspace
W of dimension n − 2 such that R(w) = w for all w ∈ W , and the restriction of R to the
2-dimensional subspace orthogonal to W is a rotation in the sense defined above. Note this
is equivalent to the simpler statement: There is a vector subspace W of the indicated type,
and R has determinant one. Show that every A ∈ SO(n) is a product of rotations. How
many rotations are required, in general?
Note: Use induction on n, as in problem 1. The lemma you’ll need in this case is that
given any v, w ∈ S^{n−1} , there is a rotation R with Rv = w. You’ll also need to pay attention
to whether n is odd or even, but other than that the proof is very similar in spirit to problem
1.
3. Deduce from problem 2 that SO(n) is path-connected for all n, and that O(n) has
two path-components: SO(n) and O(n)− .
5 Isometries of Euclidean space
We now arrive at the main result of these notes—a complete characterization of the isometries
of Rn . Recall that G : Rn −→Rn is an isometry if d(G(x), G(y)) = d(x, y) for all x, y, where d
is the usual distance function. Fix v ∈ Rn and define the corresponding translation Rn −→Rn
by Tv (x) = x + v. It is clear that a translation is an isometry.
Theorem 5.1 Let G : Rn −→Rn be an isometry. Then G can be written uniquely as the
composition of an orthogonal transformation followed by a translation. In other words, G =
Tv ◦ F for unique v ∈ Rn and F ∈ O(n).
The main step in the proof is to establish the following lemma (note the lemma also
follows from the theorem, but of course logic forbids that path).
Lemma 5.2 Suppose the isometry G fixes the origin. Then G is an orthogonal transformation.
Proof: The proof is elementary, but requires some clever algebra. We will break it down into
several steps.
Step 1: G(−v) = −G(v) for all v ∈ Rn . To see this, we can assume v 6= 0, so that v lies on
a sphere centered at the origin of radius r =| v |. Hence G(v) also lies on this sphere. On
the other hand, d(G(v), G(−v)) = d(v, −v) = 2r, and this forces G(−v) = −G(v).
Step 2: | G(v) + G(w) |=| v + w |, for all v, w. This follows because
| G(v) + G(w) |=| G(v) − G(−w) |=| v − (−w) |,
where the first equality holds by Step 1 and the second by the assumption that G is an
isometry.
Step 3: ⟨G(v), G(w)⟩ = ⟨v, w⟩ for all v, w. Here we use the easily verified “polarization
identity”

⟨v, w⟩ = (1/2)(|v + w|^2 − |v|^2 − |w|^2 ).
We apply this identity first with G(v), G(w) in place of v, w. Using Step 2 and the fact that
G is an isometry, and then using the identity again on v, w, yields the assertion of Step 3.
Step 4: G is linear. To prove this, we need to show that G(v + w) = G(v) + G(w) and
G(cv) = cG(v) for all v, w and all scalars c. To show this, it is enough to show that for
some orthonormal basis f1 , ..., fn of Rn , the two sides of the equation agree after taking inner
products with each of the fi ’s. But if e1 , ..., en denotes the standard orthonormal basis as
usual, then Step 3 shows that setting fi = G(ei ) yields an orthonormal basis. It is then easy
to check by Step 3 that
⟨G(v + w), G(ei )⟩ = ⟨v + w, ei ⟩ = ⟨G(v) + G(w), G(ei )⟩,
and similarly for the other equation. This completes the proof of Step 4. Combining Steps
3 and 4 completes the proof of the lemma.
Now suppose G is an arbitrary isometry, and let v = G(0). Then T−v ◦ G fixes the
origin, and hence by the lemma T−v ◦ G = F for some orthogonal transformation F . Then
G = Tv ◦ F , as desired.
Finally, suppose Tv ◦ F = Tw ◦ H with F, H orthogonal transformations. Then
Tv−w = T−w ◦ Tv = H ◦ F −1 .
Hence Tv−w is a translation fixing the origin and so must be the identity; that is, v = w.
It then follows that F = H. This proves the uniqueness, and completes the proof of the
theorem.
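Computational aside. In code (a sketch of mine, with the test isometry and all names chosen for illustration) the decomposition in the proof is easy to carry out: v = G(0), and the linear part F is recovered column by column from F (ei ) = G(ei ) − v.

```python
# Illustration: decomposing an isometry G as T_v ∘ F, following the proof.
import numpy as np

theta = 1.2
F_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
v_true = np.array([2.0, -1.0, 3.0])

def G(x):
    """An isometry of R^3, treated as a black box."""
    return F_true @ x + v_true

v = G(np.zeros(3))                                          # v = G(0)
F = np.column_stack([G(e) - v for e in np.eye(3)])          # columns F e_i = G(e_i) - v

print(np.allclose(v, v_true), np.allclose(F, F_true))       # True True: the decomposition is unique
print(np.allclose(F @ F.T, np.eye(3)))                      # True: the linear part is orthogonal
print(np.isclose(np.linalg.det(F), 1.0))                    # True: det F = 1, so G is a rigid motion (see below)
```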
Exercise 5. Show that F ◦ Tv = TF (v) ◦ F . Hence the translation Tv commutes with the
orthogonal transformation F if and only if v is fixed by F . (Try out some examples with
n = 2, 3.)
Optional exercise for fans of group theory: Show that the set of all translations is an
abelian normal subgroup of the group of all isometries. Show also that O(n) is a non-normal
subgroup.
Given an isometry G of Rn , let G = Tv ◦ F be its unique decomposition as in the
theorem. If F has determinant one (i.e., belongs to SO(n)), we call G an oriented isometry.
In particular, any translation or rotation is an oriented isometry. Another common term
for oriented isometry is “rigid motion”. A reflection is a typical example of a non-oriented
isometry.
6 Concluding philosophical remarks
As already noted, isometries are important because they leave invariant various geometric
properties. For example, we will show that the arc-length and curvature of a space curve are
unchanged by isometry of R3 . Then if α : [a, b]−→R3 is a curve with non-vanishing velocity
vectors, and we want to study abstractly its arc-length and curvature, we can assume without
loss of generality that α starts at the origin and that its initial velocity vector points along
the positive x-axis. We arrange this by (i) translating its initial point back to the origin;
and then (ii) rotating the curve to get its initial velocity vector in the desired direction.
(Here (ii) uses problem 1 of assignment 1, although it is in any case intuitively obvious that
such a rotation exists). Since the arc-length and curvature are unchanged—translations and
rotations being isometries—no relevant information has been lost or distorted. We will see
many examples of this principle.
But there is another way to view the process just described. Instead of moving the curve,
we can think of leaving the curve in place and moving the coordinate axes instead. The
point here is that there is no God-given choice of coordinates. If the curve does not start at
the origin, we can simply redefine the origin so that it does. Similarly, we can redefine the
coordinate axes so that the initial velocity vector points along the positive x-axis, if this is
desirable. This is not to say that we can move the coordinates around any way we like; for
geometrical purposes the essential point is that the coordinates be moved by an isometry.
The importance of oriented isometries stems from the fact that many interesting “geometric” properties are invariant under oriented isometry but not under reflection—e.g.,
signed curvature for plane curves, or torsion for space curves. The adjective “oriented” is
really shorthand for “orientation-preserving”; the significance of this will be explained later.
Lastly, I want to alert you to a major paradigm shift that will occur later in the course.
You can ignore this remark for now, if you want, but it is so important that I want to put it
on the record right away. Our curves and surfaces live in Euclidean space R3 . It is convenient
to call this larger space in which they live the ambient space. Then our entire discussion of
geometric properties—so far— has been phrased in terms of invariance under isometries of
the ambient space. In other words, when we look at a curve we are looking not just at the
curve itself but at the particular way it sits inside the ambient space. Later we will change
this point of view and look for intrinsic geometric properties of curves and surfaces. With
this new paradigm, we will allow isometries (= distance-preserving transformations) that are
defined only on the curve or surface itself; then an intrinsic geometric property will mean a
property that depends only on the metric of the curve or surface.
To give a dramatic illustration, the whole concept of curvature for curves disappears! For
example, the map (cos t, sin t) from [0, π] to the upper semi-circle is an isometry in this new
sense, where distance on the semi-circle is measured using arc-length. Thus the curvature of
a curve is not an intrinsic geometric concept; if arc-length measurements alone are allowed,
a being whose existence is confined to the semi-circle cannot tell the difference between her
world and a straight line. Arc-length is intrinsic; curvature is not.
The study of surfaces, on the other hand, opens up a vast new panorama. Now it is
true that we can and will study geometric properties in the old sense—for example, notions
of curvature that depend on the ambient space and are invariant under isometries of the
ambient space. But here it turns out that there is an intrinsic notion of curvature, that can
be detected even by beings such as ourselves whose whole lives are confined (usually!) to
the surface. This fabulous discovery is due to Gauss, in his “Theorema Egregium”—aptly
translated by my colleague Jack Lee as “Totally Awesome Theorem”.
But all this comes later. For now, we will live in the ambient world, looking up to the
stars.