Orthogonal transformations and isometries
Steve Mitchell
January 2002
1 Introduction
Our study of curves and surfaces will focus primarily on their geometric properties—that is,
on properties that are invariant under distance-preserving transformations such as translations, rotations and reflections. Such a transformation is called an isometry. For example,
the arc-length of a curve in R3 is invariant under isometries of R3 . Since isometries play a
central role, it is natural to ask: What does a typical isometry of R3 look like? The main
goal of these notes is to give a complete answer to this question. In particular, we will show
that every oriented isometry or “rigid motion” of R3 is the composition of a translation and
a rotation, while a non-oriented isometry also involves a reflection (see below for the precise
definitions and results). In fact we will analyze more generally the isometries of Rn . This is
no harder than the 3-dimensional case, but no harm will be done if you assume n ≤ 3 for a
first reading.
A secondary goal is to review some basic linear algebra. This is important for two reasons:
(i) The most interesting isometries are the linear isometries, also known as orthogonal transformations or orthogonal matrices; and (ii) linear algebra is an important tool in differential
geometry generally. Concerning item (ii), it could even be said that the essential strategy of
differential calculus is to approximate non-linear transformations by linear transformations.
So we certainly want to have a firm grasp on the linear case. Please note, however, that this
is not a complete or self-contained account of linear algebra; many definitions and proofs
are omitted or only sketched. I am assuming that you have seen most of the linear algebra
before, and that you have a linear algebra textbook at hand for further reference.
2 Vector spaces and linear transformations
2.1 Vector spaces
Recall that a vector space (over the real numbers) is a set V equipped with a zero element,
an addition map + : V × V −→ V , and a scalar multiplication map · : R × V −→ V . These are
subject to a familiar list of axioms, such as the associative, commutative and distributive
laws; see any linear algebra text for the precise list. The zero element is sometimes denoted
simply as 0, and sometimes as ~0 if there is danger of confusion with other zero objects that
are lying around (such as the scalar zero that lives in R).
The fundamental examples are the Euclidean spaces Rn , defined for each n ≥ 1 as the
set of n-tuples (x1 , ..., xn ) of real numbers, with addition and scalar multiplication defined
componentwise. It is useful to also make the convention that R0 is the zero vector space;
that is, the vector space consisting of the zero element and nothing else. There are many
other examples of vector spaces, some of a quite different character. For instance, consider
the set C(R2 , R) of all continuous real-valued functions on the plane. Then with the usual
pointwise addition and scalar multiplication, C(R2 , R) is a vector space. Although examples
of this type are not important for present purposes, they serve to illustrate the fact that the
“vectors” in a vector space are not always vectors in the classical sense of the term.
A basis for a vector space V is a subset S of V such that every v ∈ V can be written
uniquely as a linear combination of elements of S. A fundamental theorem asserts that every
vector space has a basis. We will be interested only in the case when S is finite. In that
case it is a theorem that any two bases have the same number of elements, and this common
number is called the dimension of V . (If there is no finite basis then V is infinite-dimensional;
the vector space C(R2 , R) is an example.) In the finite-dimensional case the defining property
of a basis can be expressed precisely as follows. Let v1 , ..., vn be a basis for V , where n is the
dimension of V . Then for every v ∈ V there are unique coefficients ci ∈ R such that
v = Σ_{i=1}^{n} ci vi .
Note this assertion breaks down into two separate claims: (1) Such an equation exists;
in other words, the vi ’s span V ; and (2) the coefficients ci are unique; in other words, the
vi ’s are linearly independent.
For example, we will frequently make use of the standard basis for Rn , defined as the set
e1 , ..., en where ei = (0, ..., 1, ..., 0) and the “1” is in the i-th place. At some point in your
life you have to prove that this is indeed a basis, but I will assume this fact is known. In
particular, the dimension of Rn is n. It is essential to note, however, that there are infinitely
many different bases for Rn ; the standard basis is just one possibility. In fact if v1 , ..., vn are
vectors in Rn , and we form an n by n matrix A with the vi ’s as columns, then v1 , ..., vn will
constitute a basis if and only if the matrix A is invertible. This is another standard fact
from linear algebra; recall also that A is invertible if and only if A has nonzero determinant.
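Computational aside. As a quick numerical illustration (a sketch of mine, with the vectors chosen arbitrarily), the following Python/NumPy snippet builds the matrix A with a set of vectors as columns, checks the invertibility criterion, and recovers the unique coefficients ci of an arbitrary vector.

```python
# Illustration: testing the basis criterion in R^3 via invertibility.
import numpy as np

v1, v2, v3 = np.array([1.0, 0, 0]), np.array([1.0, 1, 0]), np.array([1.0, 1, 1])
A = np.column_stack([v1, v2, v3])        # the vi's as columns

print(np.linalg.det(A))                  # nonzero, so v1, v2, v3 form a basis of R^3

v = np.array([2.0, 3.0, 5.0])            # an arbitrary vector
c = np.linalg.solve(A, v)                # its unique coefficients in this basis
print(np.allclose(c[0]*v1 + c[1]*v2 + c[2]*v3, v))   # True
```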
2.2 Vector subspaces
A vector subspace, or linear subspace, of a vector space V is a subset W that contains 0
and is closed under addition and scalar multiplication. Here the use of the word “closed”
has nothing to do with topology; it is standard mathematical usage to say that a set is
“closed under the operations blah-blah-blah” if, whenever the operations blah-blah-blah are
performed on members of the set, the resulting element is still in the set. In particular, any
vector subspace of a vector space is then a vector space in its own right; this yields a rich
supply of further examples. For example, note that any line or plane through the origin in
R3 is a vector subspace of R3 . Note also the two extreme cases: V is always a vector subspace
of itself, and the zero vector space {0} is always a vector subspace of V .
Warning. In topology or metric space theory, the term subspace is used in a much more
general sense: A subspace of R3 can be a completely arbitrary subset, with the metric (or
if you know topology, the subspace topology) that it inherits from the usual metric on R3 .
Thus a vector subspace of R3 , or more generally of Rn , is also a subspace in this topological
sense, but not conversely. This can lead to confusion because in the heat of battle one tends
to drop the qualifier “vector” and say simply “subspace” when one really means “vector
subspace”. In fact, proofreading notwithstanding, I would not be surprised if I’ve done it
in these notes. Usually the context will make the meaning clear, but to be safe let’s try to
leave the qualifier in.
Red Alert. A line or plane that does not go through the origin is not a vector subspace of
R3 . The definition is quite explicit about this, after all! In order to make the distinction it
is psychologically critical to have a technical term for these more general lines and planes.
Let us call a subset W of Rn an affine subspace if it has the form W = V + ~q for some
vector subspace V and vector ~q. In other words, after translating W by −~q, we get a vector
subspace V . Notice that any line or plane in R3 is an affine subspace. It is worthwhile to get
this straight, because later, when we discuss tangent planes and lines, we will need to think
of them both ways: as affine subspaces (which is the usual way that you’re accustomed to,
even if you never used that term!), or as the vector subspaces obtained after translation to
the origin.
Here is another type of example: Suppose given a homogeneous system of m linear
equations in n unknowns, written in matrix form as Ax = 0 for some m × n matrix A.
Then the solution set, also known as the nullspace of A (or kernel if one is thinking in terms
of linear transformations), is a vector subspace of Rn . On the other hand, if we have an
inhomogeneous system Ax = b, then the solution set is an affine subspace. Of course it
could be the empty set, but if there is one solution x0 and W is the nullspace of A, then the
complete solution set is precisely the affine subspace W + x0 .
One last type of example: Given vectors v1 , ..., vr ∈ V , we can form the set W consisting
of all linear combinations of the vi ’s. It follows trivially that W is a vector subspace; we call
W the span of the vi ’s, or “the vector subspace spanned by the set S = {v1 , ..., vr }”. Thus any
nonzero vector spans a line, any two linearly independent vectors span a plane, etc.; more
generally, the span of S has dimension at most r, and has dimension exactly r if and only if
the vi ’s are linearly independent.
2.3 Linear transformations and matrices
Let V, W be vector spaces. A linear transformation from V to W is a function F : V −→W
such that
(i) for all v1 , v2 ∈ V , F (v1 + v2 ) = F (v1 ) + F (v2 );
and
(ii) for all v ∈ V , c ∈ R, F (cv) = cF (v).
It is clear that a linear transformation is uniquely determined by its values on a basis
for V . For if v1 , ..., vm is a basis and we know F (vi ) for each i, then to compute F (v) for
arbitrary v we write v = Σ_i ci vi and get F (v) = Σ_i ci F (vi ).
If we fix a basis w1 , ..., wn for W , we can make this even more explicit as follows: There
are unique coefficients aij ∈ R such that
F (vj ) = Σ_{i=1}^{n} aij wi .
If we form the n × m matrix A with entries aij , then the linear transformation F is uniquely
determined by the matrix A, and vice-versa. Notice that the matrix depends on the choice
of basis for V, W.
An important special case is when V = W , so we are considering a linear transformation
from V to itself. In this case we could still consider two different bases v1 , ..., vn and w1 , ..., wn
for V and compute the matrix A in terms of these bases—the vi ’s being used for the input,
and the wi ’s for the output. Occasionally it will be useful to do this, but more often we
will insist on using a single basis for V . Thus when we say that F : V −→V is a linear
transformation and has matrix A = (aij ) with respect to the basis {vi }, we mean that
F (vj ) = Σ_i aij vi .
The basic example is to take V = W = Rn , and to represent linear transformations by
their matrices with respect to the standard basis. Recall the general fact: The i-th column
of the matrix represents the image under F of the i-th standard basis vector (if we are
computing with respect to the standard basis in Rn ). Here are some specific examples, with
n = 3; in this case we often use the traditional x, y, z in place of x1 , x2 , x3 :
Example 1. Let F (x, y, z) = (−y, x, z). Geometrically, F is a 90-degree rotation around
the z-axis, counterclockwise as viewed from the positive z-axis. The matrix A with respect
to the standard basis is

[ 0   −1   0 ]
[ 1    0   0 ]
[ 0    0   1 ]
Example 2. Let F (x, y, z) = (x, y, −z). Then F is reflection across the xy-plane, and has
matrix

[ 1   0    0 ]
[ 0   1    0 ]
[ 0   0   −1 ]
Example 3. Generalizing example 2, let F (x, y, z) = (ax, by, cz) for scalars a, b, c. Then
the matrix of F is

[ a   0   0 ]
[ 0   b   0 ]
[ 0   0   c ]
A matrix of this form is called diagonal, for the obvious reason. The geometric interpretation is also easy to see: Suppose for instance that a, b > 1 and 0 < c < 1. Then F is an
expansion (or “dilation”) by the appropriate factor in the x, y directions, and is a contraction
by the factor c in the z direction. For another example, suppose a = 0 and b = c = 1. Then
F is projection on the yz-plane. Note that all of the above examples except this last one are
invertible.
There is no law that says we must use the standard basis. Indeed, it will be essential
to remain flexible and allow ourselves to choose whatever basis is most convenient for the
problem at hand. Consider, for instance, the linear transformation F of the plane that
reflects across the line through the origin at an angle of π/6 with the x-axis. The matrix of
F with respect to the standard basis is
[ 1/2    √3/2 ]
[ √3/2  −1/2  ]
Now suppose we use instead the basis v1 = (√3, 1), v2 = (−1, √3). Then the matrix of F
with respect to this basis is

[ 1    0 ]
[ 0   −1 ]
The second matrix is obviously simpler, and displays the essential features of the transformation: F fixes the line spanned by v1 pointwise, and acts as multiplication by −1 on the
perpendicular line spanned by v2 .
2.4 Composition of linear transformations
Suppose we are given linear transformations G : Rk −→Rm and F : Rm −→Rn . Then the
composition F ◦ G is a linear transformation Rk −→Rn , as you can easily check. This raises
the question: How is the matrix representing F ◦ G related to the matrices representing
G, F ? (Here all matrices are being computed with respect to the standard bases.) Let A,
B be matrices of size n × m, m × k, respectively. Then the product AB is defined to be
the n × k matrix whose ij-th entry is the dot product of the i-th row of A with the j-th
column of B. Note this only makes sense when the number of columns of A is the same as
the number of rows of B. Explicitly, we have
(AB)ij = Σ_{r=1}^{m} air brj .
Proposition 2.1 Let G : Rk −→Rm and F : Rm −→Rn be linear transformations. Let B, A
denote the matrices of G, F , respectively, with respect to the standard bases. Then the matrix
of F ◦ G is AB.
The proof is a straightforward check, if you keep your wits about you and don’t let those
unruly indices get the upper hand. In beginning linear algebra courses, one usually defines
matrix multiplication first and relates it to composition of linear transformations later. But
from a conceptual point of view, Proposition 2.1 is the reason why matrix multiplication is
defined the way it is in the first place. Here is a simple illustration of the enormous advantage
of the conceptual viewpoint: One of the important properties of matrix multiplication is
that it is associative: A(BC) = (AB)C, provided the matrices in question have the right
size so that their products are defined. A direct proof of this fact from the definition of
matrix multiplication is tedious and unilluminating. With Proposition 2.1 at hand, however,
the proof becomes trivial and at the same time enlightening—the associativity of matrix
multiplication is equivalent to associativity of composition of functions: F ◦ (G ◦ H) =
(F ◦ G) ◦ H. This last equation is immediate on inspection for any kind of functions, not
just linear ones. Q.E.D.!
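Computational aside. The following Python/NumPy sketch (the matrices are my choices for illustration) checks Proposition 2.1 on a pair of linear transformations of the plane, and checks associativity on one particular triple of matrices.

```python
# Illustration: the matrix of F∘G is AB, and associativity of matrix multiplication.
import numpy as np

A = np.array([[0.0, -1.0], [1.0, 0.0]])   # F: 90-degree counterclockwise rotation of the plane
B = np.array([[1.0, 0.0], [0.0, -1.0]])   # G: reflection across the x-axis

v = np.array([2.0, 3.0])
print(np.allclose(A @ (B @ v), (A @ B) @ v))   # True: applying G then F equals applying AB

C = np.array([[2.0, 1.0], [1.0, 1.0]])
print(np.allclose(A @ (B @ C), (A @ B) @ C))   # True: associativity for this triple
```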
2.5 Eigenvalues and eigenvectors
In the long run, there will be much to say on this subject. For now, I just review the most
basic facts and definitions. Recall that a nonzero vector v is an eigenvector for the linear
transformation A if there is a scalar λ such that Av = λv. The scalar λ is an eigenvalue. Some
linear transformations have no eigenvectors at all; for example, any rotation of the plane
through an angle that is not a multiple of π. At the opposite extreme, it can happen that there is actually a basis of Rn
consisting of eigenvectors of A. This is equivalent to saying that there is a basis such that
the matrix of A with respect to that basis is diagonal (exercise). More examples can be
found below.
The most important result about eigenvalues is the following. Recall that the characteristic polynomial of A is det (xI − A), where x is an indeterminate; if A is n × n, the
characteristic polynomial is a monic polynomial of degree n.
Proposition 2.2 Let A be a linear transformation of Rn (thought of as an n × n matrix
with respect to the standard basis). Then the eigenvalues of A are precisely the real roots of
the characteristic polynomial of A.
For example, the fact that the 90-degree rotation matrix has no eigenvalues is reflected
in the fact that its characteristic polynomial is x2 + 1, which has no real roots.
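Computational aside. NumPy can compute characteristic polynomials and eigenvalues directly, which makes Proposition 2.2 easy to experiment with; the matrices below are my illustrative choices.

```python
# Illustration: characteristic polynomials and (real) eigenvalues.
import numpy as np

R = np.array([[0.0, -1.0], [1.0, 0.0]])   # 90-degree rotation
print(np.poly(R))                          # [1. 0. 1.]: the characteristic polynomial x^2 + 1
print(np.linalg.eigvals(R))                # ±i: no real eigenvalues, hence no eigenvectors in R^2

D = np.array([[2.0, 0.0], [0.0, 3.0]])     # diagonal matrix
print(np.linalg.eigvals(D))                # [2. 3.]: the standard basis vectors are eigenvectors
```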
Remark: There is a small ambiguity in the terminology here, at least as used in practice.
Any n × n matrix of real numbers can also be regarded as a matrix of complex numbers, and
when so regarded it may have various non-real eigenvalues. In the example just given, the
rotation matrix has complex eigenvalues ±i. In order to avoid this ambiguity we sometimes
use the terminology “A has a real eigenvalue” in order to emphasize that we are excluding
the non-real eigenvalues, if any (even though logically this is redundant since our eigenvalues
are real by definition).
2.6 Invariant subspaces
Suppose F : V −→V is a linear transformation from V to itself, where we assume as usual
that V is finite-dimensional. A vector subspace W ⊂ V is F-invariant, or an invariant subspace
for F, if F (W ) ⊂ W .
Exercise 1. Prove the following: Suppose F is invertible and W is F -invariant. Then
F (W ) = W . Conclude that W is invariant under the inverse transformation F −1 .
As an example, think of a rotation around the z-axis. Then both the xy-plane and the
z-axis are invariant subspaces. Of course the two extreme cases—the zero vector subspace
of V , and V itself—are automatically F -invariant for any linear transformation F . But in
general a linear transformation may not have any other invariant subspaces; for example, a
rotation of the plane through an angle that is not a multiple of π has none. Note also that F has an invariant subspace of dimension one
(i.e., a line) if and only if F has a real eigenvalue. (Prove!)
2.7 Matrix algebra
Let Mn R denote the set of all n × n matrices over R. Then Mn R is a vector space in its
own right, with the evident component-wise definition of addition and scalar multiplication.
Its dimension is n^2 , and indeed as a vector space it is really no different from R^(n^2) ; we just
decided to arrange our n^2 numbers in an n × n array instead of in a single row. But Mn R has
a lot more structure to it, because we also have matrix multiplication. The multiplication
satisfies the associative law already discussed, as well as the usual distributive laws. Those
who know some abstract algebra will recognize that Mn R is precisely what is called a ring
(a term for which neither Wagner nor Tolkien can take any credit), but we don’t need to
get that abstract. The crucial point is that not all of the “usual rules of algebra” are still
operative; in particular, the commutative law AB = BA does not hold.
Let’s be clear about one thing right away: There is nothing in the least bit mysterious
or surprising about this failure of the commutative law. Life is full of non-commutative
operations: Put on your socks. Put on your shoes. I claim these two operations do not
commute; try it if you don’t believe me. The order of operations matters—this is well-known to anyone who has ever played with Rubik’s cube, or moved a large piece of furniture
through a corner doorway. Thus the fact that the matrices
[ 0   −1 ]
[ 1    0 ]

and

[ 1    0 ]
[ 0   −1 ]
fail to commute is as commonplace as a rainy December day in Seattle. If you rotate the
plane 90 degrees counterclockwise, then reflect across the x-axis, the point (1, 0) ends up at
(0, −1). If you do it in the other order, it ends up at (0, 1). Try it with a piece of paper,
or lie down on the floor and act it out. There is no reason such operations should commute
with one another, and they don’t.
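Computational aside. For readers who prefer the computer to the floor, here is the same check in NumPy (a sketch of mine, not part of the argument).

```python
# Illustration: the rotation and reflection above do not commute.
import numpy as np

R = np.array([[0, -1], [1, 0]])    # rotate 90 degrees counterclockwise
S = np.array([[1, 0], [0, -1]])    # reflect across the x-axis
p = np.array([1, 0])

print(S @ (R @ p))                   # rotate, then reflect: [0, -1]
print(R @ (S @ p))                   # reflect, then rotate: [0, 1]
print(np.array_equal(S @ R, R @ S))  # False: the matrices do not commute
```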
3 Inner products and orthonormal frames
3.1 Inner products
Suppose given vectors v = (a1 , ..., an ) and w = (b1 , ..., bn ) in Rn . The dot product (or inner
product, or scalar product) is given by
v · w = Σ_{i=1}^{n} ai bi .
Another common notation is ⟨v, w⟩. The key properties of the inner product are as follows:
• It is bilinear: ⟨v1 + v2 , w⟩ = ⟨v1 , w⟩ + ⟨v2 , w⟩ and for scalars c we have ⟨cv, w⟩ = c⟨v, w⟩,
and similarly with the roles of v, w reversed.
• It is commutative: ⟨v, w⟩ = ⟨w, v⟩.
• It is positive definite: ⟨v, v⟩ ≥ 0, with equality if and only if v = 0.
The length of a vector v is by definition |v| = √(v · v). Two vectors v, w are orthogonal
if ⟨v, w⟩ = 0. Two vector subspaces V, W of Rn are orthogonal if v, w are orthogonal for all
v ∈ V and w ∈ W . The orthogonal complement of a vector subspace V is the set of all
vectors in Rn orthogonal to V . We denote it V ⊥ (read as “V-perp”). It is easy to check that
V ⊥ is itself a vector subspace. For example, the orthogonal complement of the z-axis is the
xy-plane, and vice-versa.
A collection of vectors v1 , ..., vk in Rn is orthonormal if each vi has length one and any
two distinct vi ’s are orthogonal. In other words,
vi · vj = 1 if i = j,   and   vi · vj = 0 if i ≠ j.
Exercise. Show that any set of orthonormal vectors v1 , ..., vk ∈ Rn is linearly independent.
Hence if k = n, these vectors form an orthonormal basis of Rn .
The dot product determines the length function, by definition. Conversely, the length
function determines the dot product by the so-called “polarization identity” (a fancy name
for a simple fact):
Proposition 3.1 v · w = (1/2)(|v + w|^2 − |v|^2 − |w|^2 ).
Proof: Easy check.
Finally, we note a handy alternative way of thinking about the dot product. We regard
vectors v in Rn as “column vectors”, i.e. as n × 1 matrices. Then the transpose v T is just
the 1 × n matrix or “row vector” obtained by writing out the components of v horizontally
instead of vertically. With this convention, v · w = v T w, where the right-hand side is matrix
multiplication. Thinking of the dot product this way greatly simplifies certain calculations,
as we’ll see below.
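Computational aside. A short numerical check (a sketch of mine, with arbitrary test vectors) of the two facts just stated: the polarization identity and the formula v · w = v^T w.

```python
# Illustration: dot product, polarization identity, and v·w as a matrix product.
import numpy as np

v = np.array([1.0, 2.0, 2.0])
w = np.array([3.0, 0.0, -4.0])

print(np.dot(v, w))                        # -5
print(v @ w)                               # the same number (NumPy computes the inner product)
polar = 0.5 * (np.linalg.norm(v + w)**2 - np.linalg.norm(v)**2 - np.linalg.norm(w)**2)
print(np.isclose(polar, np.dot(v, w)))     # True: polarization identity
```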
3.2 Orthonormal bases
Proposition 3.2 Let V be a linear subspace of Rn . Then V admits an orthonormal basis.
In fact a stronger statement is true: If v1 , ..., vm is any basis for V , then there exists an
orthonormal basis w1 , ..., wm with the property that for all 1 ≤ k ≤ m, the span of w1 , ..., wk
is the same as the span of v1 , ..., vk .
Proof: We proceed by induction on the dimension m of V . If m = 1 and v1 is a basis for
V , then we just take the normalization w1 = v1 /|v1 |. At the inductive step, suppose V
has dimension m + 1, and let v1 , ..., vm+1 be a basis. Applying the inductive hypothesis to
the subspace U with basis v1 , ..., vm , we get an orthonormal basis w1 , ..., wm for U with the
property stated in the proposition. Now define
xm+1 = vm+1 − Σ_{i=1}^{m} (vm+1 · wi ) wi .
A simple calculation shows that xm+1 · wi = 0 for all i; notice also that the span of
w1 , ..., wm , xm+1 is still V . Then set wm+1 = xm+1 /|xm+1 |. This is the desired basis, QED.
Remark: Note that the proof gives an algorithm for computing the wi ’s, known as the
Gram-Schmidt process. For many purposes, however, all we need is the existence statement.
Remark. Instead of assuming V is given as a linear subspace of Rn , we could have assumed
V is any finite-dimensional real vector space equipped with an inner product V × V −→R.
The rest of the statement and proof go through unchanged.
Given V as above, choose an orthonormal basis v1 , ..., vm for V and an orthonormal basis
w1 , ..., wk for V ⊥ . Then I claim that v1 , ..., vm , w1 , ..., wk is an orthonormal basis for Rn , so
m + k = n. Certainly these m + k vectors are orthonormal, hence linearly independent. It
remains to show that they span Rn . If not, then as in the proof of the proposition we can
find a unit vector u that is orthogonal to the vi ’s and the wj ’s. Since u is orthogonal to the
vi ’s, u ∈ V ⊥ by definition. Since u is orthogonal to the wj ’s, it is orthogonal to itself and
hence u = 0, contradiction.
A nice basis-free way of expressing the results of the previous paragraph:
Proposition 3.3 Let V ⊂ Rn be a linear subspace. Then every x ∈ Rn can be written
uniquely in the form v + w, where v ∈ V and w ∈ V ⊥ .
Proof: Expanding x in the basis vi , wj defined above shows that x = v + w as required. For
the uniqueness, suppose x = v1 + w1 is another such expression. Then v + w = v1 + w1 , so
v − v1 = w1 − w. But clearly V ∩ V ⊥ = 0 (any element of the intersection is orthogonal to
itself), forcing v − v1 = 0 = w1 − w. So v = v1 and w = w1 as desired.
3.3 Orthogonal projections
This section is of general interest, although tangential to our main goal of classifying isometries.
3.3.1 Definition and the main examples
Proposition 3.4 Suppose V ⊂ Rn is a linear subspace (picture a line or plane in R3 ). Then
there is a unique linear transformation πV : Rn −→Rn with the property
πV (x) = x if x ∈ V ,   and   πV (x) = 0 if x ∈ V ⊥ .
Proof: Write x = v + w as in Proposition 3.3, and define πV (x) = v. The uniqueness in
Proposition 3.3 ensures πV is well-defined, and makes it easy to show πV is linear (check
this!). The desired property of πV is then immediate from the definition. Finally, the fact
that πV is the unique linear transformation with the stated property is also immediate: If π
also has the property, then π(x) = π(v) + π(w) = v + 0 = v = πV (x).
The transformation πV is called orthogonal projection on V; πV (x) is thought of as the
“component of the vector x in the V direction”.
Exercise. Show that: a) πV πV = πV ;
b) πV ⊥ = I − πV , where I is the identity;
c) πV πV ⊥ = 0 = πV ⊥ πV .
In our differential geometry applications, we’ll be interested mainly in the case n = 3,
with V a line or a plane. In these cases there is a simple explicit formula for πV . Suppose
first that V is a line, and choose a unit vector v that spans it (there are exactly two such
vectors). Then
πV (x) = ⟨x, v⟩ v.
To prove this, one has only to check that πV is linear, πV (v) = v and πV (x) = 0 if ⟨x, v⟩ = 0.
All three are immediate on inspection. As a reality check, it behooves us to see what happens
when we replace v by −v; we had better get the same transformation! But as you can see,
the minus sign will appear twice in the above formula, and cancel out.
Now suppose V is a plane. Choose v a unit vector orthogonal to V (again there are two
choices). Then by part (b) of the exercise we have immediately
πV (x) = x − ⟨x, v⟩ v.
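Computational aside. In code (a sketch of mine, with the particular vectors chosen for illustration) the two formulas look like this; the final line checks the decomposition x = πV (x) + πV ⊥ (x) from Proposition 3.3.

```python
# Illustration: projection onto a line and onto a plane in R^3.
import numpy as np

def proj_line(x, v):
    """Orthogonal projection onto the line spanned by the unit vector v."""
    return np.dot(x, v) * v

def proj_plane(x, n):
    """Orthogonal projection onto the plane whose unit normal is n."""
    return x - np.dot(x, n) * n

v = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)
x = np.array([1.0, 2.0, 3.0])

print(proj_line(x, v))                                      # [2. 2. 2.]
print(np.allclose(proj_line(x, v), proj_line(x, -v)))       # True: the sign of v cancels out
print(np.allclose(proj_line(x, v) + proj_plane(x, v), x))   # True: x = line part + plane part
```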
3.3.2 A general matrix formula for projections
This section is optional reading. We consider how to express orthogonal projections as
explicit matrices.
Choose an orthonormal basis v1 , ..., vk for V , and form the matrix A = (v1 |v2 |...|vk ) (i.e.
the vi ’s are the columns). Then the orthonormality is equivalent to the condition AT A = Ik ,
where Ik is the k × k identity matrix. We can also take the product in the other order: AAT ,
obtaining an n × n matrix.
Proposition 3.5 The matrix of πV in the standard basis is AAT .
Proof: Since AAT defines a linear transformation, we only need to check the properties
(AAT )(x) = x if x ∈ V ,   and   (AAT )(x) = 0 if x ∈ V ⊥ ,
as in Proposition 3.4. First note that
(AAT )A = A(AT A) = AIk = A.
Since the columns of A are the vi ’s, it follows that (AAT )(vi ) = vi for all i, and then by
linearity that (AAT )(v) = v for all v ∈ V . On the other hand if w ∈ V ⊥ then (AAT )(w) =
A(AT w) = 0, since by assumption w is orthogonal to the columns of A. QED.
Note the surprising consequence: The matrix AAT is independent of the choice of orthonormal basis for V . We saw in the previous section how this works when V is a line, and
it is possible to give an a priori proof here as well, if desired.
Example. If V is a line spanned by a unit vector v = (c1 , ..., cn ), then AAT is the matrix
(ci cj ). For example if n = 3 and v = (1/√3)(1, 1, 1), then AAT is the matrix with all entries
equal to 1/3.
Remark. Call an n × n matrix B symmetric if bij = bji for all i, j; in other words,
B = B T . Then (as already seen in the previous example) the projection matrix AAT above
is symmetric: (AAT )T = (AT )T AT = AAT . Symmetric matrices arise in another way in the
context of “self-adjoint linear operators”, which in turn arise in the definition of curvature
for surfaces.
Finally, note that if we choose any orthonormal bases v1 , ..., vm and w1 , ..., wk for V , V ⊥
as we did earlier, then the matrix of πV with respect to the basis v1 , ..., vm , w1 , ..., wk is
[ Im   0 ]
[ 0    0 ]
Here Im is the m × m identity matrix, and the 0’s denote zero matrices of the appropriate
sizes.
4 Orthogonal transformations
Let F : Rn −→Rn be a linear transformation. We call F an orthogonal transformation if F
preserves inner products:
⟨F (v), F (w)⟩ = ⟨v, w⟩.
It follows that F preserves lengths (take v = w) and orthogonality (i.e., if v, w are orthogonal,
that is to say have inner product zero, then F (v), F (w) are again orthogonal); hence the term
orthogonal transformation.
(Footnote: Unfortunately, the terminology here clashes with the term “orthogonal projection”. Orthogonal
projections are never orthogonal transformations, except for the identity.)
Naturally, we would like to characterize orthogonal transformations in terms of the associated matrix with respect to the standard basis. Call an n × n matrix A orthogonal if
AAT = I. Here T denotes the transpose and I is the identity matrix. In particular, A is
invertible with inverse AT . If we unravel the equation AAT = I into components, we find
that it says precisely that the rows of A form an orthonormal basis for Rn . Since AAT = I
implies AT A = I, it follows that the columns also form an orthonormal basis. Summarizing:
A matrix is orthogonal if and only if its rows form an orthonormal basis if and only if its
columns form an orthonormal basis.
Here is the promised characterization:
Proposition 4.1 Let F : Rn −→Rn be a linear map. Then the following are equivalent:
a) F is a linear isometry.
b) F is an orthogonal transformation.
c) The matrix of F with respect to any orthonormal basis is orthogonal.
d) The matrix of F with respect to the standard basis is orthogonal.
Proof: (a) ⇒ (b): By assumption F preserves distances, so in particular preserves distance
from the origin: |F (v)| = |v| for all v. Now use the polarization identity and the linearity of
F to show F preserves inner products, i.e. is an orthogonal transformation. (The linearity
of F is needed to get |F (v) + F (w)|2 = |F (v + w)|2 .)
(b) ⇒ (c): Let v1 , ..., vn be an orthonormal basis, and let A denote the matrix of F with
respect to this basis. Hence the j-th column of A is F (vj ), expressed in the given basis.
Since F preserves inner product, it preserves orthonormality. It follows that the columns are
orthonormal, i.e. A is an orthogonal matrix. (Here we’re using the fact that if v = Σ ai vi
and w = Σ bi vi , then since the vi ’s are orthonormal, v · w = Σ ai bi .)
(c) ⇒ (d): Immediate.
(d) ⇒ (a): It’s convenient and instructive to break this into (d) ⇒ (b) ⇒ (a). So let A
denote the matrix of F in the standard basis, and suppose A ∈ O(n). We want to show first
that (Av) · (Aw) = v · w for all v, w. Recall that the dot product of any two column vectors
x, y is the same as the matrix product xT y. So we compute
F (v) · F (w) = (Av) · (Aw) = (Av)T Aw = v T AT Aw = v T Iw = v T w = v · w.
So F is an orthogonal transformation. It follows that |F (v)| = |v| for all v. But the
distance d(v, w) = |v − w|, so using the linearity of F we get
d(F (v), F (w)) = |F (v) − F (w)| = |F (v − w)| = |v − w| = d(v, w)
as desired.
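Computational aside. A numerical illustration of the proposition (a sketch of mine, using a rotation of the plane as the test matrix): orthonormal rows and columns, preserved inner products, preserved distances.

```python
# Illustration: checking the equivalent conditions of Proposition 4.1 numerically.
import numpy as np

theta = 0.7
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])          # a rotation, hence orthogonal

print(np.allclose(A @ A.T, np.eye(2)))                   # rows orthonormal
print(np.allclose(A.T @ A, np.eye(2)))                   # columns orthonormal

v, w = np.array([1.0, 2.0]), np.array([-3.0, 0.5])
print(np.isclose((A @ v) @ (A @ w), v @ w))              # inner products preserved
print(np.isclose(np.linalg.norm(A @ v - A @ w), np.linalg.norm(v - w)))   # distances preserved
```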
Exercise 2: Prove the following basic properties:
a) If A is orthogonal then det A = ±1.
b) Any product of orthogonal matrices is orthogonal, the inverse of an orthogonal matrix
is orthogonal, and the identity matrix is orthogonal.
c) If λ is a real eigenvalue of an orthogonal matrix A, then λ = ±1. In particular, a
diagonal matrix is orthogonal if and only if its diagonal entries are all ±1.
d) Suppose A is an orthogonal matrix, and v, w are eigenvectors of A with eigenvalues
+1, −1 respectively. Then v is orthogonal to w.
Let’s look at some low-dimensional cases. If n = 1 there are exactly two orthogonal
matrices; namely, ±1. Yawn. If n = 2, the situation is already getting interesting. For
example, consider the orthogonal transformation given by counterclockwise rotation through
an angle θ. The corresponding matrix is
[ cos θ   −sin θ ]
[ sin θ    cos θ ]
Note that a direct computation shows this matrix is indeed orthogonal (its columns have
length one, and are orthogonal). Note that such a matrix has no real eigenvalues, except
when θ is a multiple of π. On the other hand the matrix
[ −cos θ   sin θ ]
[  sin θ   cos θ ]
is a reflection for every θ, since its determinant is −1 (in particular it is never the identity).
Verify this by finding the axis of reflection explicitly. See Exercise 4 for the general definition
of reflection.
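Computational aside. If you would rather let the computer find the axis (a sketch of mine): the reflection matrix above is symmetric, so np.linalg.eigh returns its eigenvalues −1 and +1, and the +1 eigenvector spans the axis of reflection.

```python
# Illustration: determinant and axis of the reflection matrix above.
import numpy as np

theta = 0.7
B = np.array([[-np.cos(theta), np.sin(theta)],
              [ np.sin(theta), np.cos(theta)]])

print(np.isclose(np.linalg.det(B), -1.0))   # True: determinant -1, so a reflection
vals, vecs = np.linalg.eigh(B)              # B is symmetric; eigenvalues sorted ascending
print(vals)                                  # [-1.  1.]
axis = vecs[:, 1]                            # eigenvector for eigenvalue +1
print(np.allclose(B @ axis, axis))           # True: the axis of reflection is fixed pointwise
```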
Exercise 3. a) Show that a 2 × 2 orthogonal matrix is a rotation if it has determinant one,
and is a reflection if it has determinant -1.
b) Show that every rotation is a product of two reflections.
c) Show that if A is a rotation and B is a reflection, then AB = BA if and only if A = ±I.
Now consider the case n = 3, which is the most important case for us. One easy way to
cook up examples is to promote 2 × 2 examples. Thus if
[ a   b ]
[ c   d ]
is orthogonal, then so is

[ a   b    0 ]
[ c   d    0 ]
[ 0   0   ±1 ]
For example, if the original matrix was rotation through angle θ and we use +1 in the
lower right entry, the new matrix is rotation of 3-space around the z-axis, through the same
angle. But there is nothing special about the z-axis; we could take some other line L through
the origin, and then rotate the plane W orthogonal to L to get an orthogonal transformation.
Eventually, we will probably want to find a formula for this matrix (in terms of a unit vector
spanning L, and an angle of rotation), but there is no need to do so now.
Among the many interesting properties enjoyed by orthogonal matrices (I always liked this use of
the word “enjoy” in mathematical writing; it conjures up an image of orthogonal matrices frolicking
on the beach in Tahiti, rotating and reflecting under the tropical sun), I would like to emphasize
the following:
Theorem 4.2 Suppose A is an orthogonal matrix (and hence represents an orthogonal transformation of Rn ), and W ⊂ Rn is an A-invariant subspace. Then the orthogonal complement
W ⊥ is also an A-invariant subspace.
Proof: Suppose v ∈ W ⊥ ; that is, ⟨v, w⟩ = 0 for all w ∈ W . Then
⟨Av, w⟩ = ⟨Av, AA−1 w⟩ = ⟨v, A−1 w⟩ = 0.
Here the second equality uses the fact that A is orthogonal, while the third uses the fact
that W is also invariant under A−1 . Since w ∈ W was arbitrary, this completes the proof.
Note that the theorem can fail for non-orthogonal matrices. For example, if A is the
matrix
[ 1   1 ]
[ 0   1 ]
then the x-axis is A-invariant, but its orthogonal complement, namely the y-axis, is not.
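Computational aside. A quick check of both halves of this discussion (a sketch of mine): the shear leaves the x-axis invariant but not the y-axis, while for a rotation about the z-axis in R3 both the z-axis and its orthogonal complement, the xy-plane, are invariant.

```python
# Illustration: invariant subspaces and their orthogonal complements.
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])                   # the shear above (not orthogonal)
print(A @ np.array([1.0, 0.0]))              # [1. 0.]: the x-axis is A-invariant
print(A @ np.array([0.0, 1.0]))              # [1. 1.]: the y-axis is not

Q = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])             # rotation about the z-axis (orthogonal)
print(Q @ np.array([0.0, 0.0, 1.0]))         # stays on the z-axis
print(Q @ np.array([1.0, 2.0, 0.0]))         # stays in the xy-plane, as Theorem 4.2 predicts
```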
As an application of the above theorem, we prove:
Theorem 4.3 Let F be an orthogonal transformation of R3 . Then there is an orthonormal
basis v1 , v2 , v3 such that the matrix of F with respect to this basis has the form

[ a   b    0 ]
[ c   d    0 ]
[ 0   0   ±1 ]
To prove this we need some preliminary results that are of interest in their own right.
Lemma 4.4 Let f (x) be a polynomial of odd degree n, with real coefficients. Then f has a
real root.
Proof: The assumption is that f (x) = a0 + a1 x + ... + an x^n , where n is odd, the ai ’s are real
numbers, and an ≠ 0. Without loss of generality, we can assume an > 0. Then as x → ∞,
f (x) → ∞; and as x → −∞, f (x) → −∞ (note this uses the assumption n odd!). In
particular, f (a) < 0 for some a < 0, and f (b) > 0 for some b > 0. It then follows from the
Intermediate Value Theorem that f (c) = 0 for some a < c < b.
Now recall that for any n×n matrix A, the eigenvalues of A are the roots of its characteristic polynomial det (xI − A). Since this is a polynomial of degree n, Lemma 4.4 immediately
implies:
Proposition 4.5 Let A be an n × n matrix, and suppose n is odd. Then A has a real
eigenvalue λ. Hence A has an eigenvector v ∈ Rn , with Av = λv.
(Note that both the lemma and the proposition are false for n even: For example, the
90-degree rotation matrix has no real eigenvalues; its characteristic polynomial is x2 + 1.)
We now return to the proof of the theorem. Since the number three is well-known to
be odd (for a proof of this fact, see Parity questions in Ordovician trilobite morphology by I. M. Jauqine,
Journal des Études Théologiques, vol. 73, pp. 48-912), by Proposition 4.5 F has a real eigenvalue λ. Furthermore, since F is orthogonal,
λ = ±1. Let v be a unit eigenvector, let L denote the line spanned by v, and let W denote
the plane orthogonal to v. Then L and W are both F -invariant subspaces. Hence if v1 , v2
is an orthonormal basis for W , and we set v3 = v, then v1 , v2 , v3 is an orthonormal basis for
R3 with the desired property.
The orthogonal group O(n) is the set of all orthogonal n × n matrices. The special
orthogonal group SO(n) is the subset consisting of all A ∈ O(n) with determinant 1. The
term “group” refers to the fact that O(n) is a group under matrix multiplication, and SO(n)
is a subgroup. You don’t need to know any group theory here, but those who do might
amuse themselves by showing that SO(n) is in fact a normal subgroup of index 2. We also
set O(n)− = {A ∈ O(n) : det A = −1}.
Corollary 4.6 If A ∈ SO(3), then A is a rotation. If A ∈ O(3)− , then A is the product of
a reflection and a rotation (where reflections themselves are included in this description by
taking the rotation to be the identity).
Proof: Suppose A ∈ SO(3). Choose a basis as in the above theorem. If the entry in the
lower right corner is +1, then the 2-by-2 block in the upper left must have determinant one,
and hence is a rotation by Exercise 3a. Thus the axis of rotation is the line spanned by v3 .
If the entry in the lower right corner is −1, then the 2-by-2 block is a reflection. A further
change of basis then yields a new matrix that is diagonal with diagonal entries −1, −1, 1;
hence A is a rotation through angle π. The second assertion of the corollary is similar, and
left as an exercise.
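Computational aside. The corollary can be made computational (a sketch of mine, with the random test matrix and all names being my own choices): for A ∈ SO(3), an eigenvector for the eigenvalue 1 spans the axis of rotation, and the trace determines the angle via trace(A) = 1 + 2 cos θ.

```python
# Illustration: recovering the axis and angle of a rotation A in SO(3).
import numpy as np

theta = 0.7
C = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
B, _ = np.linalg.qr(np.random.default_rng(0).normal(size=(3, 3)))   # some orthogonal matrix
A = B @ C @ B.T                                                     # an element of SO(3) with a "hidden" axis

vals, vecs = np.linalg.eig(A)
axis = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])   # eigenvector for the eigenvalue 1
angle = np.arccos((np.trace(A) - 1.0) / 2.0)             # since trace(A) = 1 + 2 cos(theta)

print(np.allclose(A @ axis, axis))        # True: the axis is fixed
print(np.isclose(angle, theta))           # True (up to the sign convention for the angle)
```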
Corollary 4.7 SO(3) is path-connected.
Proof: Intuitively this is now obvious: Since any A ∈ SO(3) is a rotation about some axis, we
can follow a path along rotations about the same axis to reach the identity matrix. Somewhat
more precisely, we know that there is an orthonormal basis for R3 such that the matrix of A
with respect to this basis is

[ cos θ   −sin θ   0 ]
[ sin θ    cos θ   0 ]
[ 0        0       1 ]
Here θ ∈ [0, 2π); inserting tθ in place of θ yields a path to the identity matrix. But
there’s still something slightly fishy about this, because we changed our basis in R3 ; how do
we know that this preserves continuity? To make things absolutely, pedantically precise, we
recall that the above assertion beginning “we know that there is an orthonormal basis...”
is equivalent to saying that there exists B ∈ O(3) such that if C is the matrix displayed
above (now thought of in terms of the standard basis), A = BCB −1 . Now certainly there is
a continuous path λ from C to the identity in SO(3), as described above; then the path
µ(t) = Bλ(t)B −1 is a continuous path from A to the identity (continuous because matrix
multiplication is continuous).
It follows that O(3) has two path-components, namely SO(3) and O(3)− . See the more
general statement in the optional exercises below.
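Computational aside. To make the argument concrete (a sketch of mine, with the change of basis B generated at random): the path below conjugates a shrinking z-axis rotation by a fixed B, and every sampled point of the path is checked to lie in SO(3).

```python
# Illustration: an explicit path in SO(3) from a given rotation A to the identity.
import numpy as np

def rot_z(t):
    return np.array([[np.cos(t), -np.sin(t), 0.0],
                     [np.sin(t),  np.cos(t), 0.0],
                     [0.0,        0.0,       1.0]])

theta = 2.0
B, _ = np.linalg.qr(np.random.default_rng(1).normal(size=(3, 3)))   # change of basis
A = B @ rot_z(theta) @ B.T

def mu(t):
    """mu(1) = A and mu(0) = identity; compare the path B lambda(t) B^{-1} in the text."""
    return B @ rot_z(t * theta) @ B.T

for t in (1.0, 0.5, 0.0):
    M = mu(t)
    print(np.allclose(M @ M.T, np.eye(3)), np.isclose(np.linalg.det(M), 1.0))  # stays in SO(3)
print(np.allclose(mu(1.0), A), np.allclose(mu(0.0), np.eye(3)))                # endpoints
```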
Optional Exercises 4.
1. Show that any A ∈ O(n) can be written as a product of m reflections, for some m ≤ n.
Notes: 1.1. (As a matter of convention, we regard the identity matrix as a product of
zero reflections, and a reflection itself as a product of one reflections, casting grammatical
correctness to the winds.) Here the precise definition of reflection is as follows: Call a vector
subspace W of Rn a hyperplane if it has dimension n − 1. Thus a hyperplane in R2 is a
line, a hyperplane in R3 is a plane in the usual sense, and so on. A reflection is a linear
transformation σ of Rn such that there is a hyperplane W with σ(w) = w for all w ∈ W ,
and σ(v) = −v for all v in the line orthogonal to W . Note that every reflection is necessarily
an orthogonal transformation.
1.2. Proceed by induction on n. At the inductive step, assume the result proven for
O(n − 1).
Case 1: Suppose that Aen = en . Then A ∈ O(n − 1), where by abuse of notation we are
identifying O(n − 1) with the subgroup of O(n) that fixes en . Hence by inductive hypothesis,
A is a product of m reflections with m ≤ n − 1.
Case 2: Suppose Aen ≠ en . Digress to prove the following lemma: If v, w ∈ S^{n−1} with
v ≠ w, then there is a reflection r with rv = w. Hence we can find such an r with rAen = en .
By case 1, rA = r1 r2 ...rm with m < n, so A = rr1 r2 ...rm is a product of m + 1 reflections,
with m + 1 ≤ n.
2. Call an orthogonal transformation R of Rn a rotation if there is a vector subspace
W of dimension n − 2 such that R(w) = w for all w ∈ W , and the restriction of R to the
2-dimensional subspace orthogonal to W is a rotation in the sense defined above. Note this
is equivalent to the simpler statement: There is a vector subspace W of the indicated type,
and R has determinant one. Show that every A ∈ SO(n) is a product of rotations. How
many rotations are required, in general?
Note: Use induction on n, as in problem 1. The lemma you’ll need in this case is that
given any v, w ∈ S^{n−1} , there is a rotation R with Rv = w. You’ll also need to pay attention
to whether n is odd or even, but other than that the proof is very similar in spirit to problem
1.
3. Deduce from problem 2 that SO(n) is path-connected for all n, and that O(n) has
two path-components: SO(n) and O(n)− .
5 Isometries of Euclidean space
We now arrive at the main result of these notes—a complete characterization of the isometries
of Rn . Recall that G : Rn −→Rn is an isometry if d(G(x), G(y)) = d(x, y) for all x, y, where d
is the usual distance function. Fix v ∈ Rn and define the corresponding translation Rn −→Rn
by Tv (x) = x + v. It is clear that a translation is an isometry.
Theorem 5.1 Let G : Rn −→Rn be an isometry. Then G can be written uniquely as the
composition of an orthogonal transformation followed by a translation. In other words, G =
Tv ◦ F for unique v ∈ Rn and F ∈ O(n).
The main step in the proof is to establish the following lemma (note the lemma also
follows from the theorem, but of course logic forbids that path).
Lemma 5.2 Suppose the isometry G fixes the origin. Then G is an orthogonal transformation.
Proof: The proof is elementary, but requires some clever algebra. We will break it down into
several steps.
Step 1: G(−v) = −G(v) for all v ∈ Rn . To see this, we can assume v 6= 0, so that v lies on
a sphere centered at the origin of radius r =| v |. Hence G(v) also lies on this sphere. On
the other hand, d(G(v), G(−v)) = d(v, −v) = 2r, and this forces G(−v) = −G(v).
Step 2: | G(v) + G(w) |=| v + w |, for all v, w. This follows because
| G(v) + G(w) |=| G(v) − G(−w) |=| v − (−w) |,
where the first equality holds by Step 1 and the second by the assumption that G is an
isometry.
Step 3: ⟨G(v), G(w)⟩ = ⟨v, w⟩ for all v, w. Here we use the easily verified “polarization
identity”

⟨v, w⟩ = (1/2)(|v + w|^2 − |v|^2 − |w|^2 ).
We apply this identity first with G(v), G(w) in place of v, w. Using Step 2 and the fact that
G is an isometry, and then using the identity again on v, w, yields the assertion of Step 3.
Step 4: G is linear. To prove this, we need to show that G(v + w) = G(v) + G(w) and
G(cv) = cG(v) for all v, w and all scalars c. To show this, it is enough to show that for
some orthonormal basis f1 , ..., fn of Rn , the two sides of the equation agree after taking inner
products with each of the fi ’s. But if e1 , ..., en denotes the standard orthonormal basis as
usual, then Step 3 shows that setting fi = G(ei ) yields an orthonormal basis. It is then easy
to check by Step 3 that
⟨G(v + w), G(ei )⟩ = ⟨v + w, ei ⟩ = ⟨G(v) + G(w), G(ei )⟩,
and similarly for the other equation. This completes the proof of Step 4. Combining Steps
3 and 4 completes the proof of the lemma.
Now suppose G is an arbitrary isometry, and let v = G(0). Then T−v ◦ G fixes the
origin, and hence by the lemma T−v ◦ G = F for some orthogonal transformation F . Then
G = Tv ◦ F , as desired.
Finally, suppose Tv ◦ F = Tw ◦ H with F, H orthogonal transformations. Then
Tv−w = T−w ◦ Tv = H ◦ F −1 .
Hence Tv−w is a translation fixing the origin and so must be the identity; that is, v = w.
It then follows that F = H. This proves the uniqueness, and completes the proof of the
theorem.
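Computational aside. In code (a sketch of mine, with the test isometry and all names chosen for illustration) the decomposition in the proof is easy to carry out: v = G(0), and the linear part F is recovered column by column from F (ei ) = G(ei ) − v.

```python
# Illustration: decomposing an isometry G as T_v ∘ F, following the proof.
import numpy as np

theta = 1.2
F_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
v_true = np.array([2.0, -1.0, 3.0])

def G(x):
    """An isometry of R^3, treated as a black box."""
    return F_true @ x + v_true

v = G(np.zeros(3))                                          # v = G(0)
F = np.column_stack([G(e) - v for e in np.eye(3)])          # columns F e_i = G(e_i) - v

print(np.allclose(v, v_true), np.allclose(F, F_true))       # True True: the decomposition is unique
print(np.allclose(F @ F.T, np.eye(3)))                      # True: the linear part is orthogonal
print(np.isclose(np.linalg.det(F), 1.0))                    # True: det F = 1, so G is a rigid motion (see below)
```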
Exercise 5. Show that F ◦ Tv = TF (v) ◦ F . Hence the translation Tv commutes with the
orthogonal transformation F if and only if v is fixed by F . (Try out some examples with
n = 2, 3.)
Optional exercise for fans of group theory: Show that the set of all translations is an
abelian normal subgroup of the group of all isometries. Show also that O(n) is a non-normal
subgroup.
Given an isometry G of Rn , let G = Tv ◦ F be its unique decomposition as in the
theorem. If F has determinant one (i.e., belongs to SO(n)), we call G an oriented isometry.
In particular, any translation or rotation is an oriented isometry. Another common term
for oriented isometry is “rigid motion”. A reflection is a typical example of a non-oriented
isometry.
6 Concluding philosophical remarks
As already noted, isometries are important because they leave invariant various geometric
properties. For example, we will show that the arc-length and curvature of a space curve are
unchanged by isometry of R3 . Then if α : [a, b]−→R3 is a curve with non-vanishing velocity
vectors, and we want to study abstractly its arc-length and curvature, we can assume without
loss of generality that α starts at the origin and that its initial velocity vector points along
the positive x-axis. We arrange this by (i) translating its initial point back to the origin;
and then (ii) rotating the curve to get its initial velocity vector in the desired direction.
(Here (ii) uses problem 1 of assignment 1, although it is in any case intuitively obvious that
such a rotation exists). Since the arc-length and curvature are unchanged—translations and
rotations being isometries—no relevant information has been lost or distorted. We will see
many examples of this principle.
But there is another way to view the process just described. Instead of moving the curve,
we can think of leaving the curve in place and moving the coordinate axes instead. The
point here is that there is no God-given choice of coordinates. If the curve does not start at
the origin, we can simply redefine the origin so that it does. Similarly, we can redefine the
coordinate axes so that the initial velocity vector points along the positive x-axis, if this is
desirable. This is not to say that we can move the coordinates around any way we like; for
geometrical purposes the essential point is that the coordinates be moved by an isometry.
The importance of oriented isometries stems from the fact that many interesting “geometric” properties are invariant under oriented isometry but not under reflection—e.g.,
signed curvature for plane curves, or torsion for space curves. The adjective “oriented” is
really shorthand for “orientation-preserving”; the significance of this will be explained later.
Lastly, I want to alert you to a major paradigm shift that will occur later in the course.
You can ignore this remark for now, if you want, but it is so important that I want to put it
on the record right away. Our curves and surfaces live in Euclidean space R3 . It is convenient
to call this larger space in which they live the ambient space. Then our entire discussion of
geometric properties—so far— has been phrased in terms of invariance under isometries of
the ambient space. In other words, when we look at a curve we are looking not just at the
curve itself but at the particular way it sits inside the ambient space. Later we will change
this point of view and look for intrinsic geometric properties of curves and surfaces. With
this new paradigm, we will allow isometries (= distance-preserving transformations) that are
defined only on the curve or surface itself; then an intrinsic geometric property will mean a
property that depends only on the metric of the curve or surface.
To give a dramatic illustration, the whole concept of curvature for curves disappears! For
example, the map (cos t, sin t) from [0, π] to the upper semi-circle is an isometry in this new
sense, where distance on the semi-circle is measured using arc-length. Thus the curvature of
a curve is not an intrinsic geometric concept; if arc-length measurements alone are allowed,
a being whose existence is confined to the semi-circle cannot tell the difference between her
world and a straight line. Arc-length is intrinsic; curvature is not.
The study of surfaces, on the other hand, opens up a vast new panorama. Now it is
true that we can and will study geometric properties in the old sense—for example, notions
of curvature that depend on the ambient space and are invariant under isometries of the
ambient space. But here it turns out that there is an intrinsic notion of curvature, that can
be detected even by beings such as ourselves whose whole lives are confined (usually!) to
the surface. This fabulous discovery is due to Gauss, in his “Theorema Egregium”—aptly
translated by my colleague Jack Lee as “Totally Awesome Theorem”.
But all this comes later. For now, we will live in the ambient world, looking up to the
stars.