A blitzkrieg through decompositions of linear transformations
Anirbit
Department of Theoretical Physics,
Tata Institute of Fundamental Research,
Mumbai 400005, India
(Dated: August 18, 2009)
The objective of this article is to indicate the arguments that go into understanding the basic
kinds of decompositions of linear transformations. The core ideas have been pointed out with the
expectation that the reader can fill in the detailed proofs, but the two decomposition theorems
mentioned next are proven in detail.
The starting point is to understand the notion of diagonalizability of matrices, of which 3 versions will be explained: through matching of algebraic and geometric dimensions of eigenvalues,
through existence of distinct roots of minimal polynomials, and through families of projection
operators.
The final goal is to understand the Primary Decomposition Theorem (Rational Form) and
the Cyclic Decomposition Theorem, how the first alone gives the powerful Jordan-Chevalley
Decomposition, and how the two together give the miraculous Jordan Decomposition.
En route we will need to understand the concepts of conductors and annihilators of vectors,
cyclic vectors and cyclic vector spaces, minimal polynomials and characteristic polynomials,
the Cayley-Hamilton Theorem, Lagrange Polynomials and the idea of companion matrices.
Towards the end, ideas have been indicated about how these decomposition concepts generalize, for example to certain kinds of groups. The first appendix gives some elementary concepts about Lie
Algebras and Representation Theory and the second appendix gives some elementary concepts
about Groups, Rings, Fields and Vector Spaces.
I.
MOTIVATION
Given a matrix representation of a linear transformation on a finite dimensional vector space one can ask the
following natural questions:
• Can it be decided whether the matrix is diagonalizable without finding the eigen vectors?
• Can any matrix be block-diagonalized?
• Can any matrix be brought to a form where there
are entries only along the diagonal and super/sub
diagonal, with those entries known from just the
characteristic and minimal polynomials?
The answer to the first two questions is affirmative and
over algebraically closed fields like C the answer to the
last question is also affirmative.
The point of this article is to indicate the key ideas
that explain the above.
A.
Decomposition of Linear Transformations
For the cause of Representation Theory it is important to understand the elementary ideas that go into
the idea of decomposing a linear transformation into
transformations on smaller dimensional vector-spaces. It
is desirable that a given linear transformation on an n-dimensional vector-space be writable as a "direct sum" of
n linear transformations on one-dimensional subspaces
of the original space. This is the idea behind
"diagonalization". But we know that not all linear transformations are diagonalizable, and then it becomes
necessary to understand which transformations are diagonalizable and when. Further, if the matrix of the linear
transformation is not diagonalizable it might still be reducible to a form that is "block-diagonal", and we need
to see what is the simplest block-diagonal form to which
a matrix can be reduced. Here I shall list out in a logical
sequence the theorems which establish the above things
and I shall indicate the basic idea behind them, omitting
the detailed proofs.
All vector-spaces in the following section are finite dimensional. Many of the concepts might not
naturally extend to infinite dimensional vector-spaces.
3 elementary operations on vector spaces:
• Given two subspaces W1 and W2 of a vector space V
we denote by W1 + W2 the sum of the two subspaces,
defined as the set {w1 + w2 | w1 ∈ W1 and w2 ∈ W2}.
The notion can be naturally extended to an arbitrary
number of subspaces.
One notes that dim(W1) + dim(W2) = dim(W1 ∩ W2) + dim(W1 + W2).
• Given k subspaces of V, say W1, W2, ..., Wk, this set
of subspaces is called independent if for any set
of k vectors, one from each subspace (say wi from
Wi, i ∈ {1, 2, ..., k}), the relation w1 + w2 + w3 + ... +
wk = 0 implies each wi = 0.
• Given k subspaces of V, say W1, W2, ..., Wk, the
subspace W defined as W = W1 + W2 + ... + Wk
is said to be a direct sum of these k subspaces
(denoted as W = W1 ⊕ W2 ⊕ W3 ⊕ ... ⊕ Wk) if these
k subspaces are independent.
One notes that W could be equal to V and then one
would say that the k subspaces Wi, i ∈ {1, 2, ..., k},
of V form a direct sum decomposition of V.
One would then denote it as V = W1 ⊕ W2 ⊕ W3 ⊕
... ⊕ Wk. Then one can see that an ordered basis of
V is obtained by concatenating ordered bases, one from each Wi.
I shall soon show a natural example of a set of subspaces of V connected to a given endomorphism on it
which will always be independent but in general will NOT
form a direct sum decomposition of the full space V . Such
an example comes from the following crucial concept.
The following are equivalent ways of defining a Characteristic Value (also known as an "Eigen Value") c for
T an endomorphism of a finite dimensional vector-space V:
• c is a characteristic value of T, i.e. there exists a non-zero v ∈ V such that T v = cv
• The operator (T − cI) is singular (not invertible)
• det(T − cI) = 0
Just by virtue of the above 3 equivalent definitions
of an eigenvalue the following observations and concepts
follow:
• If the field of the vector space is not algebraically closed the
existence of characteristic values isn't guaranteed.
One can easily see that the matrix [[0, −1], [1, 0]]
doesn't have any characteristic values over R.
• Linear transformations related by a similarity
transformation have the same characteristic values. Being similar is an equivalence relation on
the set of all linear transformations and this splits
the space into equivalence classes. And the characteristic value gives a multivalued function on each
equivalence class. Such structures shall be ubiquitous in Representation Theory and are called Class
Functions.
• The polynomial defined as f(x) = det(T − xI)
is called the Characteristic Polynomial and its
roots are precisely the characteristic values.
• If c is a characteristic value of an endomorphism A acting on the vector space V then an element v ∈ V is called a Characteristic Vector (or
an "Eigen Vector") for c if Av = cv.
• An endomorphism of a vector space V is called Diagonalizable if there is a basis of V consisting of
eigen vectors. It is easy to see that in that basis the
matrix representation of the map will be diagonal.
• One can see that for a characteristic value c of an
endomorphism of V, the subset of V consisting of
characteristic vectors for c forms a vector subspace
of V. The space spanned by all the characteristic
vectors of a given eigen value is called the Characteristic Space or Eigen Space for that characteristic value. This subspace can also be thought
of as the Null Space of the operator T − cI (where
T is the linear map).
Dimension of the eigen subspace of an eigen value
is called the Geometric Dimension of that eigen
value and its multiplicity in the characteristic polynomial is called its Algebraic Dimension.
1.
Diagonalizability when algebraic dimension equals
geometric dimension
We will realize the criteria of diagonalizability in
multiple ways, some of which are conceptually cleaner
and some of which are computationally efficient.
Diagonalizability (Version 1)
Let T be an endomorphism of a finite dimensional
vector space V and let {c1, c2, c3, ..., ck} be the set of
characteristic values of T and Wi be the null space of
(T − ci I). Then the following are equivalent:
• T is diagonalizable
• The characteristic polynomial of T is of the form
f(x) = (x − c1)^dim(W1) (x − c2)^dim(W2) ... (x − ck)^dim(Wk).
Or in other words, for each eigen value its Algebraic Dimension = Geometric Dimension.
• dim(V) = dim(W1) + dim(W2) + dim(W3) + ... + dim(Wk)
The crucial ingredients that go into the above recipe
are:
• Nullity of a diagonal matrix is equal to the number
of 0s along its diagonal.
• One sees that the set of null spaces, one for each
characteristic value of a given endomorphism of a
finite dimensional vector space V, is always independent, and hence if W = W1 + W2 + ... + Wk then
dim(W) = dim(W1) + dim(W2) + ... + dim(Wk).
But in general dim(W) < dim(V) and hence in
general the null-spaces don't give a direct sum decomposition. But when the third criterion above is
true, dim(W) = dim(V), the set of null-spaces
of the distinct eigen values gives a direct sum decomposition of the full space and hence diagonalizability.
An example to see the geometric dimension and
algebraic dimension conflict
Consider 2 matrices (rows listed from top to bottom)

A = [[3, 1, −1], [2, 2, −1], [2, 2, 0]]   and   B = [[5, −6, −6], [−1, 4, 2], [3, −6, −4]]

A simple calculation shows that both of them have the
same characteristic polynomial
(x − 1)(x − 2)^2.
So both of them have eigenvalues 1 and 2.
Now let us look at these operators:

A − I = [[2, 1, −1], [2, 1, −1], [2, 2, −1]],   A − 2I = [[1, 1, −1], [2, 0, −1], [2, 2, −2]]

B − I = [[4, −6, −6], [−1, 3, 2], [3, −6, −5]],   B − 2I = [[3, −6, −6], [−1, 2, 2], [3, −6, −6]]
In what follows we shall use the following technique:
the geometric dimension of an eigen value a of an endomorphism T is the nullity of the operator T − aI, which
can be deduced from the Rank-Nullity Theorem which
states that Rank + Nullity of an endomorphism =
dimension of the vector-space on which it acts.
Further, rank is usually easier to see, either by just staring at the matrix hard enough or by solving a set of
simultaneous equations.
One can observe the following from the above:
• Rank of A − I = 2. So for A the geometric dimension of the eigen value 1 is 1 (same as its algebraic
dimension).
• Rank of A − 2I = 2. So for A the geometric dimension
of the eigen value 2 is 1 (whereas its algebraic dimension was 2).
• Rank of B − I = 2. So for B the geometric dimension of the eigen value 1 is 1 (same as its algebraic
dimension).
• Rank of B − 2I = 1. So for B the geometric dimension of the eigen value 2 is 2 (same as its algebraic
dimension).
Hence A is NOT diagonalizable whereas B is.
The similarity transformation by P = [[3, 2, 2], [−1, 1, 0], [3, 0, 1]] (whose columns are eigenvectors of B) diagonalizes B.
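These rank computations are easy to confirm with a computer algebra system; the following is a small illustrative sympy check (not part of the original text):

```python
from sympy import Matrix, symbols, factor, eye

x = symbols('x')
A = Matrix([[3, 1, -1], [2, 2, -1], [2, 2, 0]])
B = Matrix([[5, -6, -6], [-1, 4, 2], [3, -6, -4]])

# Both matrices share the characteristic polynomial (x - 1)(x - 2)^2.
print(factor(A.charpoly(x).as_expr()), factor(B.charpoly(x).as_expr()))

# Geometric dimension of an eigenvalue c is the nullity of (M - c*I) = 3 - rank.
for name, M in [('A', A), ('B', B)]:
    for c in (1, 2):
        print(name, c, 3 - (M - c * eye(3)).rank())

# Only B admits a basis of eigenvectors.
print(A.is_diagonalizable(), B.is_diagonalizable())   # False True
```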
From this example we conclude that whether or not a
matrix is diagonalizable is a deeper story than just the
characteristic polynomial. To understand what really
matters we have to look at some more subtle polynomials
that capture this phenomenon.
An Annihilating Polynomial for a linear transformation is a polynomial p (in the polynomial ring over the underlying field) such that p(T) = 0,
where T is a matrix representation of the linear transformation.
The following concepts follow from the above construction:
• Since the vector-space is finite dimensional, say of
dimension n, the space L(V, V) is of dimension
n^2 and hence there is an annihilating polynomial
of degree ≤ n^2. But something stronger is true, as follows.
• For a fixed linear transformation, the set of its annihilating polynomials forms an ideal in the ring of
polynomials. If a ring has Euclid's algorithm
then one knows that there is a unique generator
of the ideal, namely the monic polynomial of lowest degree in the ideal. This unique monic generator of the
ideal of annihilating polynomials is called the minimal polynomial.
• The characteristic polynomial and the minimal
polynomial have exactly the same roots, though possibly with different multiplicities. This immediately shows that if the linear transformation has all distinct eigen values then
its minimal and characteristic polynomials are the
same.
• The Cayley-Hamilton Theorem (weak form)
states that the minimal polynomial divides the
characteristic polynomial and hence a linear transformation satisfies its own characteristic polynomial.
• If W is an invariant subspace for a linear transformation T and T_W is the restriction of T to W,
then the characteristic polynomial and the minimal polynomial of T_W divide the characteristic
and the minimal polynomial of T respectively.
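As an illustrative check of the weak Cayley-Hamilton statement on the matrices of the running example (a sketch, assuming sympy), both A and B are annihilated by their common characteristic polynomial (x − 1)(x − 2)^2:

```python
from sympy import Matrix, eye, zeros

A = Matrix([[3, 1, -1], [2, 2, -1], [2, 2, 0]])
B = Matrix([[5, -6, -6], [-1, 4, 2], [3, -6, -4]])
I = eye(3)

# Substituting each matrix into its characteristic polynomial gives the zero operator.
print((A - I) * (A - 2*I)**2 == zeros(3, 3))   # True
print((B - I) * (B - 2*I)**2 == zeros(3, 3))   # True
```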
2.
Conductors and Triangulability
If W is an invariant subspace for the linear transformation T on the vector space V and v ∈ V, then the
T-conductor of v into W is the set of all polynomials
g over the corresponding field such that g(T)v ∈ W.
As for annihilators, the set of conductors also forms an
ideal and it has a unique monic generator. (By abuse of
terminology, henceforth we shall call this unique monic
generator the "T-conductor of v into W".) Since
the minimal polynomial takes everything to 0, one
notes that every T-conductor for a linear transformation T divides its minimal polynomial. Hence,
as above, if the factorization of the minimal polynomial
into irreducibles is known then that highly constrains
the form of all the conductors.
The following concepts follow from the above construction:
• Let T be a linear operator acting on a finite dimensional vector space V with an invariant subspace
W and let the minimal polynomial of T factorize
completely into linear polynomials. Then there exists a v ∈ V, v ∉ W, and a characteristic value c of
T such that (T − cI)v ∈ W.
The above existence theorem is what shall crucially
make the Cyclic Decomposition Theorem produce
the Jordan Form.
• Call a linear transformation Triangulable if there
is a basis in which the matrix is upper/lower triangular. A linear transformation on a finite dimensional vector space is triangulable iff its minimal
polynomial is a product of linear polynomials with
coefficients in the corresponding field. (Hence any
linear transformation over an algebraically closed
field like C is triangulable.)
The first concept is almost a tautology! The central
idea in the first concept is that one can pick an arbitrary vector, say β ∈ V \ W; c comes from one of the
common factors between the minimal polynomial and the
characteristic polynomial (such a linear factor of the conductor of β is guaranteed since the minimal polynomial splits into linear factors), and then v = g(T)β, where g is
the T-conductor of β into W divided by (x − c), does
the job. And by the minimality of the degree of the
conductor, we have v ∉ W. To get the second concept one just needs to iterate the first, starting with a W spanned by an eigen-vector of T. Let the
eigen vector be v1; then one is guaranteed that there is
a v2 ∈ V \ W s.t. (T − cI)v2 = mv1 for some c and m, and
hence T v2 = mv1 + cv2. In the next iteration choose W
as the subspace spanned by v1 and v2. Continuing, one
generates the required ordered basis. The converse is
trivial.
3.
Diagonalizability when minimal polynomial has distinct
roots
This leads us immediately to the computationally
most efficient test of diagonalizability.
Diagonalizability(Version 2)
An endomorphism of a finite dimensional vector
space is diagonalizable iff its minimal polynomial factorizes into a product of distinct linear factors with
coefficients in the corresponding field.
Stated otherwise it means that if {c1 , c2 , ..., ck }
are the distinct eigenvalues of the linear operator T then T is diagonalizable iff as operators
(T − c1 I)(T − c2 I)...(T − ck I) = 0.
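For the two matrices of the running example this test is a one-line computation; a minimal illustrative sympy sketch:

```python
from sympy import Matrix, eye, zeros

A = Matrix([[3, 1, -1], [2, 2, -1], [2, 2, 0]])
B = Matrix([[5, -6, -6], [-1, 4, 2], [3, -6, -4]])
I = eye(3)

# Distinct eigenvalues of both A and B are 1 and 2, so test (T - I)(T - 2I) = 0.
print((A - I) * (A - 2*I) == zeros(3, 3))   # False: A is not diagonalizable
print((B - I) * (B - 2*I) == zeros(3, 3))   # True:  B is diagonalizable
```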
The central idea in the above proof is to show that if the
characteristic vectors can't span the whole space then
the minimal polynomial must have repeated roots. Let W be the span of all the
characteristic vectors and suppose W ≠ V. By the existence theorem of the previous
subsection there is a vector α ∉ W and an eigen value c such that
(T − cI)α ∈ W. Write the minimal polynomial as (x − c)q, q being some polynomial; then q(T)α ∈ W,
since (T − cI)q(T)α = 0 and hence q(T)α is either zero or an eigen vector of T with eigen value c.
Then look at the polynomial q(x) − q(c): since c is a root of it, it is divisible by (x − c),
and hence (q(T) − q(c)I)α ∈ W (using (T − cI)α ∈ W and the fact that
W is an invariant subspace of T). Hence q(c)α = q(T)α − (q(T) − q(c)I)α ∈ W.
But since α ∉ W by choice, we must have q(c) = 0, i.e. (x − c)^2 divides
the minimal polynomial, so the minimal
polynomial has a repeated root. Hence proved. The
converse is trivial.
4.
Diagonalizability through projection operators
Projection operators and direct sum decompositions are intimately related to each other in the sense
that giving one gives the other. This notion is made precise in the following way.
If V = W1 ⊕ W2 ⊕ ... ⊕ Wk, then there exist k linear
operators E1, E2, ..., Ek on V such that:
• Each Ei is a projection, i.e. Ei^2 = Ei
• Ei Ej = 0 if i ≠ j
• I = E1 + E2 + ... + Ek
• The range of Ei is Wi
Conversely if E1, E2, ..., Ek are linear operators
on V which satisfy these above mentioned conditions
and if Wi is the range of Ei then V = W1 ⊕ W2 ⊕ ... ⊕ Wk.
The first part is trivially satisfied by choosing the
Ei's as the projection operators onto the Wi's. To prove the
converse note that the condition I = E1 + E2 + ... + Ek
ensures that V = W1 + W2 + ... + Wk. This splits any
vector α ∈ V as α = α1 + α2 + ... + αk with αi = Ei α ∈ Wi, and one can
then see that any such decomposition must have Ej α = αj: writing αi = Ei βi for some βi ∈ Wi
(using the surjectivity of Ei onto Wi), we get Ej αi = Ej Ei βi = 0 for i ≠ j and Ej αj = αj.
This proves the uniqueness of
the decomposition of α and hence the direct sum.
The above idea, coupled with a concept we had
earlier seen, that the subspace of eigen-vectors of a given
eigenvalue forms an invariant subspace of the linear transformation and these subspaces for different eigen values
are independent, gives us another way to characterize
diagonalizability.
Diagonalizability (Version 3)
Let T be a linear operator on a finite dimensional
vector space V. T is diagonalizable if there exist
k distinct scalars c1, c2, ..., ck and k non-zero linear
operators E1, E2, ..., Ek such that
• T = c1 E1 + c2 E2 + ... + ck Ek
• I = E1 + E2 + ... + Ek
• Ei Ej = 0, i ≠ j
It would also imply that:
• c1, c2, ..., ck are the eigenvalues of T
• Ei^2 = Ei (Ei is a projection)
• The range of Ei is the characteristic space for T
associated with the eigenvalue ci
Conversely if T is diagonalizable then k non-zero
linear operators E1, E2, ..., Ek are guaranteed to exist
which satisfy all the six criteria above.
The converse is easy to prove by choosing Ei to
be the projection operator onto the characteristic space
of the eigenvalue ci and by showing that for any v ∈ V,
T v = c1 E1 v + c2 E2 v + ... + ck Ek v, using the idea that
since Ei is the projection onto an invariant space of T, it
commutes with T, i.e. [T, Ei] = 0 ∀i.
To prove the diagonalizability condition one notes that
T Ei = ci Ei and hence the range of Ei lies in the nullspace
of (T − ci I). Thus the ci's are characteristic values of T
and there are no others (which can be shown by using the
fact that if there were another one, say c, then T − cI = Σi (ci − c)Ei,
and the consequence follows by the linear independence
of the known eigenspaces).
So we have seen that the ranges of all the Ei's together span the
space spanned by all the characteristic vectors of T, and
that is the whole space since I = Σi Ei. Hence diagonalizable.
Further we can see that for any v ∈ V such that T v =
ci v, it follows that Ej v = 0 for all j ≠ i and hence v = Ei v.
So the range of Ei = the nullspace of T − ci I.
One can make the following important observations
about the projection operators:
• If g is any polynomial over the field F and if
T is a diagonalizable linear operator as above with
the Ei's being the projection operators onto its eigenspaces,
then
g(T) = g(c1)E1 + g(c2)E2 + ... + g(ck)Ek
• Thus if T is a diagonalizable operator then g(T) =
0 iff all g(ci) = 0, and hence the minimal polynomial
is ∏i (x − ci).
• Conversely if ∏i (x − ci) is the minimal polynomial
for T then consider the set of k Lagrange Polynomials defined by these ci's as:
pj = ∏_{i ≠ j} (x − ci)/(cj − ci)
One then observes that pj(ci) = δij and hence if
T is diagonalizable then pj(T) = Ej, and hence the
projection operators onto the characteristic spaces of
a diagonalizable operator are polynomials in the
operator (given by the Lagrange Polynomials).
Conversely, by defining Ej = pj(T) one can show
that these Ej's satisfy all the criteria of the third
diagonalizability test and hence T is diagonalizable.
This gives an independent proof of the fact that a
linear transformation is diagonalizable iff its minimal polynomial factorizes into distinct linear polynomials.
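An illustrative sympy sketch of this construction for the diagonalizable matrix B of the earlier example (eigenvalues c1 = 1, c2 = 2, so p1(x) = (x − 2)/(1 − 2) and p2(x) = (x − 1)/(2 − 1)):

```python
from sympy import Matrix, eye, zeros

B = Matrix([[5, -6, -6], [-1, 4, 2], [3, -6, -4]])
I = eye(3)

# Lagrange polynomials of the eigenvalues 1 and 2, evaluated at B.
E1 = (B - 2*I) / (1 - 2)
E2 = (B - 1*I) / (2 - 1)

print(E1**2 == E1, E2**2 == E2)    # each is a projection
print(E1 * E2 == zeros(3, 3))      # E1 E2 = 0
print(E1 + E2 == I)                # resolution of the identity
print(1*E1 + 2*E2 == B)            # B = c1 E1 + c2 E2
```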
5.
Primary Decomposition Theorem
and
Jordan-Chevalley Decomposition
We had earlier seen a matrix A whose minimal polynomial was (x − 1)(x − 2)^2; there the
algebraic and the geometric dimensions of the eigenvalue 2
did not match, and the
characteristic spaces for 1 and 2 together spanned less than the
full space. But in that example one can see that the
nullspaces of the operators (T − 2I)^2 and (T − I) do
give a direct sum decomposition of the full space. This
idea is captured in the following theorem:
Primary Decomposition Theorem Let T be a
linear operator on a finite dimensional vector space V
over the field F and let p be the minimal polynomial for
T, where
p = p1^{r1} p2^{r2} ... pk^{rk},
where the pi are distinct irreducible monic polynomials over F and the ri are positive integers. Let Wi be the
nullspace of the operator pi(T)^{ri}. Then the following hold:
• V = W1 ⊕ W2 ⊕ ... ⊕ Wk
• Each Wi is invariant under T
• If Ti is the operator induced on Wi by T, then the
minimal polynomial for Ti is pi^{ri}
The block diagonal form for any linear transformation
that this theorem guarantees is called the Rational
Form.
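For the non-diagonalizable matrix A of the running example, p = (x − 1)(x − 2)^2, and a short illustrative sympy computation confirms that the two null spaces give a direct sum decomposition of the full space:

```python
from sympy import Matrix, eye

A = Matrix([[3, 1, -1], [2, 2, -1], [2, 2, 0]])
I = eye(3)

W1 = (A - I).nullspace()           # null space of (A - I),     for p1 = x - 1
W2 = ((A - 2*I)**2).nullspace()    # null space of (A - 2I)^2,  for p2 = x - 2

print(len(W1), len(W2))            # 1 2  -> the dimensions add up to 3
basis = Matrix.hstack(*(W1 + W2))  # put the basis vectors side by side
print(basis.rank())                # 3    -> the sum is direct and equals V
```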
This theorem gives a very powerful consequence
which is crucial in representation theory. Let the Ei's denote
the projection operators onto the Wi's defined above and
suppose (say over an algebraically closed field) that each pi = x − ci. One can show by construction that
I = E1 + E2 + ... + Ek and we can observe the following:
• The operator D defined as D = c1 E1 + c2 E2 + ... +
ck Ek along with the Ei s and the ci s satisfies all the
criteria of the projection version of diagonalizability. Hence D is diagonalizable. In this context D
is said to be semisimple.
• One can show that operator N defined as N =
T − D is nilpotent.
• N and D commute and this decomposition of T is
unique (as can be shown by direct evaluation)
• Since we know that the projections are given by
polynomials in the original linear transformation
(by the explicit construction indicated below), we conclude that the diagonalizable and the nilpotent
parts of the operator are also given as polynomials in the operator.
This unique decomposition of any linear operator over
an algebraically closed field as a sum of a pair of commuting nilpotent and semisimple operators (which are polynomials in the original operator) is called the Jordan-Chevalley Decomposition.
The way these operators are constructed is as follows.
Consider the polynomials
fi = p / pi^{ri}.
One can see that the fi's are (collectively) relatively prime and
hence their g.c.d. is one, and hence there exist polynomials gi such that Σi fi gi = 1. One can then see that if Ei
is defined as
Ei = fi(T) gi(T)
then the Ei form a set of projection operators (for i ≠ j, p | fi gi fj gj and
hence Ei Ej v = 0, ∀v ∈ V). Further, by definition,
if v ∈ {range of Ei} then Ei v = v and pi(T)^{ri} v = 0,
by substituting the former into the LHS of the latter
equation. Further, for i ≠ j, pi^{ri} | fj and hence Ej v = 0
if pi(T)^{ri} v = 0. Since Σi Ei = I (by construction), we then have
Ei v = v. So the range of Ei is equal to the null space of
pi(T)^{ri}.
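A small illustrative computation (using sympy's Jordan form rather than the polynomial construction above) extracts the commuting semisimple and nilpotent parts of the matrix A from the running example:

```python
from sympy import Matrix, diag, zeros

A = Matrix([[3, 1, -1], [2, 2, -1], [2, 2, 0]])

P, J = A.jordan_form()          # A = P * J * P**(-1)
D = P * diag(*[J[i, i] for i in range(3)]) * P.inv()   # keep only the diagonal of J
N = A - D

print(D.is_diagonalizable())    # True:  D is the semisimple part
print(N**3 == zeros(3, 3))      # True:  N is nilpotent
print(D*N == N*D)               # True:  the two parts commute
```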
Over an algebraically closed field, as a consequence
of the Jordan Form it shall become obvious that the
dimension of Wi (the nullspace of the operator pi(T)^{ri}) is
the algebraic dimension of the eigenvalue it corresponds
to. Stated otherwise, the dimension of the nullspace of
(T − cI) raised to the multiplicity of c in the minimal polynomial is the multiplicity of c in the characteristic polynomial.
Since pi(T)^{ri} = 0 on Wi we have pi(Ti)^{ri} = 0. So
the minimal polynomial for Ti divides pi^{ri}. Further,
if g is any other annihilating polynomial of Ti then
g(T)fi(T) = 0, since after the direct sum decomposition
is proven one sees that the projection of a vector
along each Wj (j ≠ i) is annihilated by the factor pj^{rj} of fi,
and the projection along Wi is killed by g. So
g fi is an annihilating polynomial of T, hence p | g fi and so pi^{ri} | g. So the minimal polynomial of Ti divides pi^{ri} and pi^{ri} divides any annihilating
polynomial of Ti. So pi^{ri} is the minimal polynomial of
Ti.
6.
Cyclic vectors, Cyclic subspaces
and
Companion Matrices
Let T be a linear operator on the vector space V over
the field F . In our aim to understand the decomposition
of linear transformation further, the following constructions shall be of interest to us now:
• Given a v ∈ V , one shall then be interested in polynomials g ∈ F [x] such that g(T )v = 0. This is an
ideal in the ring of polynomials and it is non-zero
since it contains the minimal polynomial.
• Given a v ∈ V , one shall then be interested in
the smallest T -invariant subspace of V which contains v. This subspace is also the intersection of all
T -invariant subspaces containing v. This space is
also the space spanned by all vectors of the form
g(T )v, ∀g ∈ F [x].
• Given a T-invariant subspace W of V one asks the
question as to whether there is another T-invariant
subspace W′ such that V = W ⊕ W′.
This leads us to define the following two concepts:
• If v ∈ V then the T-cyclic subspace generated
by v is the subspace of all vectors of the form
g(T )v, ∀g ∈ F [x] denoted as Z(v, T ).
• If Z(v, T ) = V then v is called a cyclic vector for
T.
• If v ∈ V then the T-annihilator of v is the ideal
in the ring of polynomials F[x] consisting of all g ∈
F[x] such that g(T)v = 0. This ideal is denoted as
M(v, T). The unique monic generator of this ideal
will also be called the T-annihilator of v (by abuse
of notation!)
• A T-invariant subspace W of V is T-admissible if,
whenever f(T)v ∈ W for some v ∈ V and f ∈ F[x], there exists a w ∈ W
such that f(T)v = f(T)w.
One notes the following:
• The T-cyclic subspace generated by 0 is 0.
• Z(v, T) is 1-dimensional iff v is an eigen vector of T.
• For the identity operator every non-zero vector generates a 1-dimensional cyclic subspace. So for dim(V) > 1, the identity operator has no cyclic vector.
• If a T-invariant subspace has a complementary T-invariant subspace then it is also T-admissible.
Let pv denote the unique monic generator of M(v, T).
Then one notes the following:
• deg(pv) = dim(Z(v, T))
• If deg(pv) = k then the vectors
v, Tv, T^2 v, ..., T^{k−1} v form a basis for Z(v, T).
• The minimal polynomial for T|Z(v,T) is pv.
The above is seen by noting that the remainder obtained by dividing any polynomial by pv has to be of degree less than deg(pv), and hence g(T)v has to be a linear combination of
v, Tv, T^2 v, ..., T^{k−1} v, which are linearly independent since
any non-trivial relation between them would contradict the
definition of pv as the T-annihilator of v.
The last assertion is established by noting that for any
g ∈ F[x], pv(T|Z(v,T)) g(T)v = 0 and hence pv(T|Z(v,T))
has Z(v, T) as its kernel. And by definition there cannot
be any polynomial of lower degree annihilating Z(v, T),
since that would contradict the definition of pv.
From the above assertions one can see that if there
is a cyclic vector then the minimal polynomial and the
characteristic polynomial are the same.
Let us for now look at a linear operator L on a vector
space W of dimension k which has w ∈ W as its cyclic
vector. Then the set {w, Lw, L^2 w, ..., L^{k−1} w} forms a
basis for W. Consider the vectors defined as wi = L^{i−1} w;
then in the basis {w1, w2, ..., wk} L looks like (rows listed from top to bottom)

[[0, 0, 0, ..., 0, −c0],
 [1, 0, 0, ..., 0, −c1],
 [0, 1, 0, ..., 0, −c2],
 ...,
 [0, 0, 0, ..., 1, −c_{k−1}]]

where the monic polynomial pw is c0 + c1 x + ... +
c_{k−1} x^{k−1} + x^k. And this matrix is called the Companion
Matrix for this monic polynomial.
One notes the following about the Companion Matrix:
• If T is a linear operator on a vector space V then T
has a cyclic vector iff there is a basis of V in which
T looks like the companion matrix of its minimal
polynomial.
• The minimal and the characteristic polynomial of a
companion matrix are both the same monic polynomial
from which it came.
These companion matrices in the cyclic subspaces
shall be the building blocks from which we shall try to
build the full linear transformations.
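As an illustrative sympy check (the polynomial chosen is just the characteristic polynomial of the running example), the companion matrix of p(x) = x^3 − 5x^2 + 8x − 4 = (x − 1)(x − 2)^2 in the convention above recovers p as its characteristic polynomial:

```python
from sympy import Matrix, symbols, factor, eye, zeros

x = symbols('x')

# p(x) = x**3 - 5*x**2 + 8*x - 4, i.e. c0 = -4, c1 = 8, c2 = -5.
# Companion matrix in the convention above: 1's on the subdiagonal, last column (-c0, -c1, -c2).
C = Matrix([[0, 0, 4],
            [1, 0, -8],
            [0, 1, 5]])
I = eye(3)

print(factor(C.charpoly(x).as_expr()))            # (x - 1)*(x - 2)**2
# The only smaller monic candidate with the same roots, (x - 1)(x - 2), does not
# annihilate C, so the minimal polynomial of the companion matrix is p itself.
print(not ((C - I) * (C - 2*I) == zeros(3, 3)))   # True
```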
7.
The idea behind cyclic decomposition (Part 1)
The argument for Cyclic Decomposition Theorem
needs synthesis of many concepts from linear algebra and
hence we shall give the constituent ideas in instalments
in an effort to make it more palatable.
Basically, if T is a linear operator on a finite dimensional vector space V then we can show that there are
vectors {v1 , v2 , ..., vr } such that
V = Z(v1, T) ⊕ Z(v2, T) ⊕ ... ⊕ Z(vr, T).
Say one has found vectors {v1, v2, ..., vj} inductively
and the subspace
Wj = Z(v1, T) ⊕ Z(v2, T) ⊕ ... ⊕ Z(vj, T)
is proper. One now needs to see that all this ensures
that there is another vector vj+1 ∈ V such that Wj ∩
Z(vj+1, T) = {0}.
Take a vector v ∈ V, v ∉ Wj. Let f ∈ F[x] be
the T-conductor of v into Wj; then f(T)v ∈ Wj and, IF Wj is T-admissible,
there is a w ∈ Wj s.t. f(T)v = f(T)w. Let u =
v − w. Then since v − u = w ∈ Wj, and by definition Wj is T-invariant, g(T)v ∈ Wj iff g(T)u ∈ Wj for any g ∈ F[x],
i.e. the T-conductors into Wj of both u and v match,
i.e. f is also the T-conductor of u into Wj.
But f(T)u = f(T)v − f(T)w = 0. So if g(T)u ∈ Wj for some
g ∈ F[x], then by definition f | g and hence g(T)u = 0.
Since by definition Z(u, T) is the space of all g(T)u for
arbitrary g ∈ F[x], we see that, taking vj+1 = u,
Wj ∩ Z(vj+1, T) = {0}
So the thing to ensure for the induction to continue is
that Wj is T -admissible.
8.
Cyclic Decomposition Theorem
and
Jordan Form
The idea in the above section, of decomposition of a
vector space into a direct sum of cyclic subspaces, is made
precise in the following sense:
Cyclic Decomposition Theorem
Let T be a linear operator on a finite-dimensional vector space V and let W0 be a proper T-admissible subspace
of V. Then there exists a set of non-zero vectors {v1, v2, ..., vr} in
V with respective T-annihilators p1, p2, ..., pr such that
• V = W0 ⊕ Z(v1, T) ⊕ Z(v2, T) ⊕ Z(v3, T) ⊕ ... ⊕ Z(vr, T)
• pk | pk−1, k ∈ {2, 3, ..., r}
The integer r and the annihilators p1, p2, ..., pr are
uniquely determined by the above 2 conditions and the
fact that vk ≠ 0, ∀k.
(Note that one can always choose W0 as the zero subspace
since it is trivially always an admissible space; further
recall that earlier it had been shown that deg(pi) =
dim(Z(vi, T)).)
Some of the consequences of existence of the above cyclic
decomposition are:
• Given a linear operator T on a vector space V,
every T-admissible subspace of it has a complementary T-invariant subspace. The direct sum of the
cyclic spaces given by the above decomposition gives
the required complementary space.
• A linear transformation has a cyclic vector iff
its minimal polynomial and the characteristic
polynomial match.
One way was shown earlier by the companion
matrix. The other way is obvious, since if the minimal
and characteristic polynomials match then the size
of the first cyclic space will exhaust the dimension.
• A priori, it may be the case that the minimal polynomial for the whole vector space is some polynomial but each specific vector has a smaller minimal polynomial (and the lcm of these smaller minimal polynomials is the minimal polynomial over
the whole space).
But one sees that there is a vector whose minimal polynomial is the minimal polynomial over the
whole space. From the next part of the explanation
of this theorem it shall be clear that if one chooses
W0 as the zero subspace then this vector is v1.
• The Cayley-Hamilton Theorem (strong form) is a
consequence of this theorem. It states that if a
linear operator T on a vector space V has minimal
polynomial p and characteristic polynomial f then,
– p | f
– p and f have the same prime factors except
for multiplicities.
– if the prime factorization of p is p = ∏_{i=1}^{k} fi^{ki}
then f = ∏_{i=1}^{k} fi^{di}, where di is the nullity of
fi(T)^{ki} divided by deg(fi).
The central idea is that pi is the minimal polynomial for T|Z(vi,T) and, since Z(vi, T) is a cyclic space,
pi is also the characteristic polynomial for this restricted operator; hence by the block diagonal
form given by the cyclic decomposition f = ∏_{i=1}^{r} pi,
and if W0 is chosen as the zero subspace then p1 = p,
and hence the first part is shown, and by the fact
that pi | pi−1 we get the next part. From the Primary Decomposition we know that if Ti is the operator induced on the null
space of fi(T)^{ki} then fi^{ki} is the minimal polynomial
for Ti. So by the fact just now proved, that the prime
factors arising in the factorizations of the minimal and
the characteristic polynomials are the same, we see
that the characteristic polynomial for Ti would be
fi^{di} with some di ≥ ki. Since the degree of a characteristic polynomial is the dimensionality of the
vector space, we automatically have di = dim(Vi)/deg(fi),
where Vi denotes the null space of fi(T)^{ki}.
And further, by the direct sum structure, it follows
that f = ∏_{i=1}^{k} fi^{di}.
• The Jordan Form is obtainable over an algebraically closed field by doing the Cyclic Decomposition of the induced operator in every subspace given
by the Primary Decomposition.
Before understanding how the proof of the above big result works let us try to understand how the above leads
to the Jordan Form.
Suppose the characteristic polynomial f of a linear operator T on a vector space V over an algebraically closed
field F factors as:

f = ∏_{i=1}^{k} (x − ci)^{di}

and hence the ci's are its distinct eigenvalues, with di ≥ 1 ∀i. Hence its minimal polynomial p has to be of the
form

p = ∏_{i=1}^{k} (x − ci)^{ri}

with 1 ≤ ri ≤ di. If Wi is the nullspace of (T − ci I)^{ri}
then the Primary Decomposition theorem tells us that
V = ⊕_{i=1}^{k} Wi and the operator Ti induced on Wi has as
its minimal polynomial (x − ci)^{ri}.
Hence the operator Ni on Wi defined as Ni = Ti − ci I is
nilpotent. So now we want to do a Cyclic Decomposition
of Wi with respect to the nilpotent operator Ti − ci I on it.
Hence we need to see how Cyclic Decomposition works for any nilpotent operator, say N, on
some finite dimensional vector space V.
The Cyclic Decomposition Theorem guarantees
the existence of r non-zero vectors {v1, v2, ..., vr} with
N-annihilators {p1, p2, ..., pr} such that

V = ⊕_{i=1}^{r} Z(vi, N)

and pi+1 | pi. Since N is nilpotent, its minimal polynomial
is x^k for some k ≤ n, and hence each pi = x^{ki} where
k1 = k, kr ≥ 1 and ki+1 ≤ ki.
Now the idea of the Companion Matrix guarantees that
there exists a basis of the subspace Z(vi, N) in which the
induced operator is represented by a ki × ki matrix Ai of
the form (the companion matrix of x^{ki}, with 1's on the subdiagonal and 0's elsewhere):

Ai = [[0, 0, ..., 0, 0],
      [1, 0, ..., 0, 0],
      [0, 1, ..., 0, 0],
      ...,
      [0, 0, ..., 1, 0]]

So the Cyclic Decomposition theorem says that a nilpotent operator on the space V has an ordered basis in
which

N = [[A1, 0, ..., 0],
     [0, A2, ..., 0],
     ...,
     [0, 0, ..., Ar]]

where Ai is a ki × ki companion matrix of the type
explained above.
Hence, going back to Ti, since Ni can be written as
above, we see that Ti can be written as a direct sum
of matrices of the type:

[[ci, 0, ..., 0, 0],
 [1, ci, ..., 0, 0],
 ...,
 [0, 0, ..., 1, ci]]

The above kind of matrices are called Elementary
Jordan Blocks of the eigen value ci.
By abuse of notation calling T the matrix representation of the operator T, and similarly for Ti, in the basis
just generated, we see that the final form looks like:

T = [[T1, 0, ..., 0],
     [0, T2, ..., 0],
     ...,
     [0, 0, ..., Tk]]

where

Ti = [[Ji1, 0, ..., 0],
      [0, Ji2, ..., 0],
      ...,
      [0, 0, ..., Jini]]

where Jim is the mth elementary Jordan block for
the eigen value ci (corresponding to the block for Ti)
and ni is the number of cyclic spaces into which the
nullspace of (T − ci I)^{ri} splits.
The above is called the Jordan Form for the linear
operator.
From the Jordan form three crucial observations immediately follow:
• One can see that the dimension of the null space of
(T − ci I)^{ri} is di, i.e. dim(Wi) = di (where di is the
algebraic multiplicity of the eigenvalue ci and ri is
its multiplicity in the minimal polynomial).
• ni is the geometric multiplicity for the eigenvalue
ci, since in the Cyclic Decomposition of Ni each
cyclic subspace Z(vm, Ni) contributes exactly one vector, namely Ni^{km−1} vm,
which is non-zero and is in the null space of the operator Ni = Ti − ci I (where km = dim(Z(vm, Ni))).
This again shows the diagonalizability criterion
that a linear transformation is diagonalizable
iff ni = di, ∀i, where ni and di have just
now been argued to be the geometric and the algebraic
multiplicities respectively of the eigenvalue ci.
• Dimension of the Elementary Jordan Block Ji1 for
all i is the multiplicity of the eigenvalue ci in the
minimal polynomial.
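An illustrative sympy computation of the Jordan form of the matrix A from the running example (sympy prints elementary Jordan blocks with 1's on the superdiagonal, i.e. the transpose of the convention used above):

```python
from sympy import Matrix

A = Matrix([[3, 1, -1], [2, 2, -1], [2, 2, 0]])

P, J = A.jordan_form()          # A = P * J * P**(-1)
print(J)                        # up to ordering of blocks: a 1x1 block for the
                                # eigenvalue 1 and a 2x2 elementary block for 2
print(A == P * J * P.inv())     # True
```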
9.
The idea behind cyclic decomposition (Part 2)
Starting with the T-admissible space W0 one looks for
a vector w1 such that if p1 = s(w1, W0) (the conductor
of w1 into W0) then deg(p1) = max_{w∈V} deg s(w, W0), and the
corresponding w is taken as w1.
One continues this induction so that after k steps
one has Wk = W0 + Σ_{i=1}^{k} Z(wi, T) and polynomials
p1, p2, ..., pk such that wk ∈ V, wk ∉ Wk−1 and, among
all T-conductors into Wk−1, the conductor pk of wk has maximal degree.
So we see that if w ∈ V and f = s(w, Wk−1) then
f(T)w = w0 + Σ_{1≤i≤k−1} gi(T) wi, where the gi are some polynomials
and w0 ∈ W0. One can then show that f | gi and w0 = f(T) z0
for some z0 ∈ W0. (Call this the "Divisibility Claim".)
After k steps of the induction call the w of above wk
and f pk, and we have, for some set of polynomials
hi and with z0 as above, the relation

pk(T) wk = pk(T) z0 + Σ_{1≤i≤k−1} pk(T) hi(T) wi

and define

vk = wk − z0 − Σ_{1≤i≤k−1} hi(T) wi.

Since wk − vk ∈ Wk−1 we have s(vk, Wk−1) =
s(wk, Wk−1) = pk, and since pk(T) vk = 0 we have

Wk−1 ∩ Z(vk, T) = {0}.

And we have the construction that

Wk = W0 ⊕ Z(v1, T) ⊕ Z(v2, T) ⊕ ... ⊕ Z(vk, T).

Further we have the trivial relation

pk(T) vk = 0 = p1(T) v1 + p2(T) v2 + ... + pk−1(T) vk−1,

on which applying the Divisibility Claim we have that
pi | pi−1.
After getting Wk−1 one searches for the vector wk in
the rest of the space which has a conductor into Wk−1
of maximal degree, to construct Wk = Wk−1 + Z(vk, T).
And dim(Wk) > dim(Wk−1), and hence this induction
will end after at most dim(V) steps.
10.
The idea behind cyclic decomposition (Part 3)
Let us now try to understand the basic idea behind how
the Divisibility Claim works, now that we have a hang of
how the cyclic decomposition works, i.e. how to inductively search for the cyclic vectors which will exhaust the
full space by their cyclic subspaces.
The argument is initially almost the same as above.
Let gi = f hi + ri with ri = 0 or deg(ri) < deg(f), and
define z = w − Σ_{i=1}^{k−1} hi(T) wi. Since z − w ∈ Wk−1 we have

f(T) z = w0 + Σ_{i=1}^{k−1} ri(T) wi;

let j be the largest i for which
ri ≠ 0. Since Wj−1 ⊂ Wk−1, there exists a
polynomial g such that p = gf, where p = s(z, Wj−1) and
f = s(z, Wk−1). Then we have the relation

p(T) z = g(T) f(T) z = g(T) rj(T) wj + g(T) w0 + Σ_{1≤i≤j−1} g(T) ri(T) wi.

Since p(T) z ∈ Wj−1 it implies that g(T) rj(T) wj ∈ Wj−1.
Now we remember that the monic generator of an ideal is the polynomial of least degree
in that ideal, and also that the pi's were chosen to be the
maximal degree polynomials among all the conductors
into the respective Wi's; combining these two
we have the inequality

deg(g rj) ≥ deg(s(wj, Wj−1)) = deg(pj) ≥ deg(s(z, Wj−1)) = deg(p)

and

deg(p) = deg(f g).

Hence we have deg(rj) ≥ deg(f), which is absurd by
the definition of j, and hence all the ri = 0.
This also shows that if

gi = f hi and z = w − Σ_{i=1}^{k−1} hi(T) wi

then

w0 = f(T) z;

but since W0 is by definition an admissible space we
have some z0 such that w0 = f(T) z0, and hence we have, for
any w ∈ V and f = s(w, Wk−1),

f(T) w = f(T) (z0 + Σ_{i=1}^{k−1} hi(T) wi),

and hence it shows that at every step of the induction each Wi is a T-admissible space!
This completes the proof of the Cyclic Decomposition
Theorem.
11.
The larger picture
One notes that for the Cyclic Decomposition to work
we crucially needed two things
• That the ideals of conductors are principal coupled
with the non-existence of 0-divisors.
• The uniqueness coming from the existence of a notion of unique prime factorization.
The first property came from the fact that we
were working in the polynomial ring, where Euclid's division algorithm gives the monic generators of the ideals.
But in general we can carry all that over to any Principal Ideal Domain (PID) (not necessarily a Euclidean
Domain), which is also a Unique Factorization Domain,
and since we never needed any specific property of vector
spaces for this we can do the same Cyclic Decomposition
on any module over a PID.
This gives us the "Structure theorem for modules over a
PID", and since any abelian group is a Z-module, where Z
is a PID, we get the "Structure Theorem for Abelian
Groups", which is conventionally stated as:
Every finitely generated abelian group is a direct sum
of cyclic groups of prime power order and of a free
abelian group.
or
If G is a finitely generated abelian group then G = L ⊕ Cd1 ⊕
Cd2 ⊕ ... ⊕ Cdk, where L is a free abelian group, Cdi is a cyclic group of order di,
di > 1 and d1 | d2 | ... | dk. One can identify any Cn with Zn.
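As a tiny illustrative computation in the spirit of the invariant-factor form (purely an example, not from the text), the following sketch converts a direct sum of cyclic groups of arbitrary orders into the chain d1 | d2 | ... | dk, e.g. Z12 ⊕ Z18 ≅ Z6 ⊕ Z36:

```python
from collections import Counter

def prime_factorization(n):
    """Return {prime: exponent} for n >= 2 by trial division."""
    factors = Counter()
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] += 1
            n //= p
        p += 1
    if n > 1:
        factors[n] += 1
    return factors

def invariant_factors(orders):
    """Invariant factors d1 | d2 | ... | dk of Z_{orders[0]} + Z_{orders[1]} + ...
    For each prime, sort its exponents in decreasing order; the j-th largest
    exponents of all primes multiply together to give the j-th largest factor."""
    exps = {}   # prime -> list of exponents, one per cyclic summand containing it
    for d in orders:
        for p, e in prime_factorization(d).items():
            exps.setdefault(p, []).append(e)
    k = max((len(v) for v in exps.values()), default=0)
    ds = [1] * k
    for p, es in exps.items():
        for j, e in enumerate(sorted(es, reverse=True)):
            ds[j] *= p ** e          # ds[0] collects the largest invariant factor
    return sorted(ds)

print(invariant_factors([12, 18]))   # [6, 36]: Z12 + Z18 is isomorphic to Z6 + Z36
```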
II.
ACKNOWLEDGEMENT
Interspersed in this document are influences arising
from the few hundred emails exchanged over the last
4 years with Vipul (formerly my college-mate at CMI
(Chennai Mathematical Institute) and currently a mathematics graduate student at UChicago). A large part of
the core content is from the algebra book by Hoffman and
Kunze and there is the undeniable influence of the algebra
books by Herstein and Artin on me over the last 4 years,
Herstein being my initiation into the subject. Some of the
contents in the second appendix of this article are from
this breathtaking repository of group theory created by
Vipul: http://groupprops.subwiki.org/wiki/Main_Page
III.
APPENDIX-1 (SOME ELEMENTARY
IDEAS OF LIE ALGEBRA AND
REPRESENTATION THEORY)
A.
Lie Algebra
Definition 1 A Lie Algebra is a vector-space V over
the field F with a map V × V → V, denoted by (x, y) ↦
[x, y] and called the Lie Bracket of x and y, such that the
following axioms are satisfied:
• The Lie Bracket is a bilinear map.
• [x, x] = 0 for all x ∈ V
• [x, [y, z]] + [z, [x, y]] + [y, [z, x]] = 0 for all x, y, z ∈ V
The third axiom can also be stated as: the Lie Bracket
satisfies the Jacobi Identity.
We note that the second axiom implies anticommutativity, i.e. [x, y] = −[y, x], but anticommutativity implies
the second axiom only when Char(F) ≠ 2.
We shall in general keep the bilinear map implicit and
call V the Lie Algebra, by somewhat of an abuse of
notation. Further, by a subspace of the Lie Algebra V we
shall mean a vector subspace of V inheriting the same
bilinear form from the original space, but the subspace
need not be closed under this.
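The prototypical example is gl(n, F), the n × n matrices with the commutator bracket [x, y] = xy − yx. A quick illustrative numerical check of the axioms on random 3 × 3 real matrices (a sketch assuming numpy):

```python
import numpy as np

def bracket(x, y):
    """Commutator Lie bracket on gl(n)."""
    return x @ y - y @ x

rng = np.random.default_rng(0)
x, y, z = (rng.standard_normal((3, 3)) for _ in range(3))

# [x, x] = 0
print(np.allclose(bracket(x, x), 0))
# Jacobi identity: [x,[y,z]] + [z,[x,y]] + [y,[z,x]] = 0
jacobi = bracket(x, bracket(y, z)) + bracket(z, bracket(x, y)) + bracket(y, bracket(z, x))
print(np.allclose(jacobi, 0))
```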
1.
Associated concepts of a Lie Algebra V
• Lie Algebras V1 and V2 over F are Isomorphic
Lie Algebras if there exists a vector space isomorphism φ : V1 → V2 such that φ([x, y]) = [φ(x), φ(y)]
• A subspace W of a Lie Algebra V is said to be a
Subalgebra of V if [x, y] ∈ W for all x, y ∈ W.
• A Linear Lie Algebra is a subalgebra of gl(V),
which is the Lie Algebra of GL(V).
• An Ideal I of the Lie Algebra V is a vector-subspace
of V such that [x, y] ∈ I whenever x ∈ V and y ∈ I.
Because of anticommutativity one does not need to
distinguish between Left Ideals and Right Ideals
B.
Representation Theory
Definition 2 A Representation of a group G on a
vector space V is either of the 2 equivalent data:
• A homomorphism ρ : G → GL(V )
• A group action π of G on V , π : G × V → V such
that π(g, v1 + v2 ) = π(g, v1 ) + π(g, v2 ) for all g ∈ G
and v1 ∈ V and v2 ∈ V .
The equivalence of the above 2 definitions is easy to see:
• In the first picture, if g ∈ G then ρg : V → V and
hence we have an action π : G × V → V such that
π(g, v) = ρg(v) for all g ∈ G and v ∈ V. The
linearity of the action follows from ρg ∈ GL(V).
• In the second picture, if one fixes a g ∈ G then π(g) :
V → V satisfies π(g) ∈ GL(V), since π(g⁻¹) =
π(g)⁻¹ (the guarantee of the existence of an inverse comes from the
very definition of a group action). So it gives a map
ρ such that ρg(v) = π(g)(v).
Generally the map ρ is kept implicit and we shall say
that the vector-space V is a representation of G.
1.
Associated concepts of a Representation
• Given two representations ρV on V and ρW on W,
a map φ : V → W is called a G-linear map if
the following is satisfied: ρW(g)(φ(v)) = φ(ρV(g)(v)),
where ρW(g) is the element of GL(W) to which g
is mapped by ρW, and similarly for ρV(g).
• Given a representation of the group G on V, a Subrepresentation of G is a proper subspace W of V
which is invariant under the action of G on V.
• A representation V of G is said to be Irreducible
if V has no proper non-zero invariant subspace under that
action.
• If ρ is a representation of G on V with an inner
product H0, then one can define on V a G-invariant inner-product, i.e. an inner product
H such that H(v1, v2) = H(ρg(v1), ρg(v2)) for
all v1, v2 ∈ V. Such an H can be defined as
H(v1, v2) = Σ_{g∈G} H0(ρg(v1), ρg(v2)).
For compact Lie Groups we can extend this notion using the idea of Invariant Measures or Haar
Measures.
• Under the inner-product H let W ⊥ be the orthogonal subspace of W where W is an invariant subspace of the representation V , then W ⊥ is also a
representation.
• A G-linear map is a map between vector-spaces V
and W , φ : V → W , each carrying a representation
of the group G such that φ(g(v)) = g(φ(v)) for all
g ∈ G and v ∈ V . Where it is understood that the
action of g on the LHS is the representation on V
and the action of g on the RHS is the representation
on W .
– Given a G-linear map one can see that Im φ,
Ker φ and Coker φ are also representations of
G
– Given two vector-spaces the set of all G-linear
maps between them form a vector space and
this space can be thought of as the subspace of
Hom(V, W) which is "fixed under the action
of G".
• Let G act from the left on a finite set X where the
elements of X are used to label the elements in a
chosen basis of a vector space V . So a basis of V
is given as {ex | x ∈ X}, and hence a natural action
of G on V is given as g(Σ ax ex) = Σ ax egx. This is
called the Permutation Representation.
The following properties work equally well
for finite groups and compact topological
groups, but for the latter in general the summations over the group have to be replaced by
integrations with respect to the Haar Measure on the group.
• Let R be the space of all complex-valued functions
on G and then there is a natural representation of
G on R in the following way: Let f : G → C be
an element of R and g, h be any two elements in G.
Then the action is given by
(g·f)(h) = f(g⁻¹h).
This is called the Regular Representation.
• Given a representation V of G which has W as a
subrepresentation, one can construct another subrepresentation W′ of V such that V = W ⊕ W′.
There are 2 natural ways of constructing this W′:
– On V one can define the G-invariant inner
product and then look at the space W ⊥ which
is normal to W under this inner-product.
Then one can easily see that W ⊥ is a subrepresentation of G. Further
V = W ⊕ W⊥
– Let π0 be some projection map from V onto
W which is identity on W . There is nothing canonical about this map since choosing
such a projection map is equivalent to choosing some arbitrary subspace U of V such that
V = W ⊕ U and there are infinite ways in
which U can be chosen. But having made a
choice of π0 one can define another map π from
V onto W as follows (where we call the representation of G on V as ρ):
π : V → W,  π(v) = Σ_{g∈G} ρg(π0(ρ_{g⁻¹}(v)))
– The above map π has the following properties:
∗ π is a G-linear map from V to W and
hence Ker(π) is a subrepresentation of G.
∗ If G is a finite group then on W , π is a
map that multiplies all vectors by |G|.
∗ One can easily show that
V = W ⊕ Ker(π)
. Hence the decomposition is done! But
one must note that the proof of independence of W and Ker(π) works ONLY IF
the field is NOT of finite characteristic.
• Continuing the above process inductively one can
see that for finite and compact groups, any representation can be written as a direct sum of irreducible representations. This property is called
Complete Reducibility. One can state this as:
For any representation V of a finite group G, there
is a direct sum decomposition
V = V1^{⊕a1} ⊕ V2^{⊕a2} ⊕ ... ⊕ Vk^{⊕ak}
where the Vi are distinct irreducible representations.
The decomposition of V into a direct sum of the k factors is unique, in that the Vi and their multiplicities ai
are unique.
• To prove the above one needs the following fact,
which is called Schur's Lemma.
If V and W are two irreducible representations of a group G and φ : V → W is a G-linear
map then:
– Either φ is an isomorphism or φ = 0.
– If V = W and the underlying field is algebraically closed then φ is a scalar multiple of
identity.
The key idea to be used in understanding the above
is that for a G-linear map, the kernel and image
both are representations of the group and hence
irreducibility of the domain and image force them
to be either 0 or the full space. For the second
part, the algebraic closure assures the existence of
eigen values and eigen vectors for them. Let c be
some eigen value and then apply the first part to
the G-linear map φ − cI.
One notes that there are representations for which the
above fails, and even given an invariant subspace one may
not find a complementary invariant one. For example the representation of R sending
a ↦ [[1, a], [0, 1]]
has the x-axis as its invariant subspace but has no
complementary invariant subspace. Further complications in reducibility proofs arise if the underlying field is of finite
characteristic, as pointed out earlier.
Later we shall see that if a group satisfies the property
of being Semisimple then it is completely reducible.
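A toy illustrative check of the averaging construction above (assumptions: the group is C2 = {e, s} acting on R^2 by swapping the two coordinates, W = span{(1, 1)}, and π0 is the non-canonical projection (x, y) ↦ (x, x) onto W):

```python
import numpy as np

# C2 = {identity, swap} acting on R^2 by permuting coordinates.
rho = {'e': np.eye(2), 's': np.array([[0.0, 1.0], [1.0, 0.0]])}

# A non-canonical projection pi0 onto W = span{(1, 1)} which is the identity on W.
pi0 = np.array([[1.0, 0.0], [1.0, 0.0]])     # (x, y) -> (x, x)

# Averaged map pi = sum_g rho_g . pi0 . rho_{g^{-1}}.
pi = sum(rho[g] @ pi0 @ np.linalg.inv(rho[g]) for g in rho)

w = np.array([1.0, 1.0])
print(pi @ w)                       # [2. 2.]  -> on W, pi multiplies vectors by |G| = 2
for g in rho:                       # pi is G-linear: it commutes with the action
    print(np.allclose(rho[g] @ pi, pi @ rho[g]))
print(pi @ np.array([1.0, -1.0]))   # [0. 0.]  -> (1, -1) spans Ker(pi), the complement
```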
IV.
APPENDIX-2 (SOME ELEMENTARY
DEFINITIONS IN ALGEBRA)
A.
Group
The concept of a Group evolved from ideas of symmetry
transformations, but the abstractions were started by
Abel and Galois. Finally the formal definition was given
by von Dyck in the 1880s. We take the following definition of a group for our purposes.
Definition 3 A Group is a set G equipped with the following 3 operations:
• A binary operation of multiplication, ∗ : G × G → G.
• A unary operation of inversion, −1 : G → G.
• A 0-ary operation which evaluates to a special element
of the set denoted by e, called the neutral or the
identity element.
and the operations satisfy the following 3 compatibility
conditions:
• Associativity, ∀a, b, c ∈ G, we have a ∗ (b ∗ c) = (a ∗ b) ∗ c.
• Neutrality of e, ∀a ∈ G, we have a ∗ e = e ∗ a = a.
• Annihilation by the inverse element, ∀a ∈ G, we have a ∗ a⁻¹ = a⁻¹ ∗ a = e.
1.
Associated Concepts of a Group
• It is good to be aware that there exist variants
of a group, the simplest of which are:
– Magma A set equipped with a binary operation.
– Monoid A Magma with an identity element
and whose binary operation is associative.
– Semigroup A Magma whose binary operation is associative.
– Inverse Semigroup A Semigroup whose every element has an inverse.
• A Subgroup H of a group G is a non-empty subset
of G which satisfies either of the following conditions:
– It is closed under left quotient of elements,
that is, for any a, b ∈ H, a⁻¹b ∈ H.
– It is closed under right quotient of elements,
that is, for any a, b ∈ H, ab⁻¹ ∈ H.
• To put it very bluntly, the raison d'etre for Groups is
that they act on other structures in various natural
ways. Let us make this idea of acting more precise.
Given a set S, an action of a group G on the set S
is a map:
G × S → S
(g, s) ↦ gs
such that the following 2 axioms are satisfied:
– 1s = s for all s ∈ S, where 1 is the identity of G.
– (gg′)s = g(g′s) for all g, g′ ∈ G and s ∈ S.
Given such an action, S is called a G-set, and sometimes the action is precisely called the Left Action,
the Right Action being defined in the obvious
way.
When a group G acts on a set X one can think of
this as giving a map from G → Aut(X).
Very often the set S is actually also a group and the
role of G is played by some subgroup of it. There
are many important and natural actions of a subgroup on the full group. But even in the general
setting of just a G-set many of the important notions can be seen. Given this action and an
element s ∈ S one has the following two natural
sets:
– Orb(s) (Orbit of s) = {s′ ∈ S | s′ = gs for some g ∈ G}
– Stab(s) (Stabilizer of s) = {g ∈ G | gs = s}
Now we note the following important properties of
the above:
– One can put the following relation on S: that
s ∼ s′ if s′ = gs for some g ∈ G. It is easy
to see that this is an Equivalence Relation
by checking the properties of Transitivity,
Reflexivity and Symmetry, and this check crucially needs the properties of closure, existence
of identity and inverse of each element of a
Group. This does NOT need associativity. So
it shows that {Orb(s) | s ∈ S} forms a Partition of the set S, since "being in an orbit" is
an equivalence relation and hence 2 orbits can
either be identical or disjoint but can't partially overlap.
– Hence S is a disjoint union of orbits.
– Stab(s) is a subgroup of G.
– The partitioning effect on S and the subgroup
property of Stab(s) depend only on the group
properties of G and don't demand anything
special from S.
• A very important class of group actions is the Left
and Right action of a subgroup H on the full group G
by group multiplication, given by the maps G × H →
G and H × G → G respectively. The same arguments
that worked for the general action on a set
work here too, since the group G is also a set, and show that the group multiplication action of H on
G partitions the full group into equivalence classes.
But here we change terminology slightly and call
Orbits Cosets.
A Left Coset of g ∈ G = gH = {gh | h ∈ H}
A Right Coset of g ∈ G = Hg = {hg | h ∈ H}
So Left Cosets and Right Cosets both partition the
group G. One defines the symbol G/H (read as "G
mod H") to be the set of all cosets of H. One doesn't
need to distinguish between the Left and the Right
Cosets while defining G/H since there is a natural
bijection from the set of all Left Cosets to
the set of all Right Cosets.
• We note the following important thing: when
a group was acting on just a set S, the different
Orb(s) as s varies over S are of different sizes (for
finite groups one can think of "size" as the cardinality
of the set Orb(s), and for others one can say that
there is no natural bijection from Orb(s) to Orb(r)
if s ≠ r). But when a subgroup acts on the full
group, all the orbits, i.e. the cosets of all elements
of the group, are bijective to each other. The group
structure of the set being acted on brings in this
new feature. And it is easy to see that |gH| = |H|
for all g ∈ G (stating it for finite groups). Since orbits partition the acted-on set, one trivially gets
Lagrange's Theorem that |H| divides |G|. Further, since the cosets partition the set, the number
of partitions is precisely |G/H| and hence we have
|G| = |H| |G/H|.
• This brings us to the extremely important idea in
Group Theory that
If the group G acts on the set S then there is a
natural bijection from G/Stab(s) to Orb(s) for any
s ∈ S, which maps the coset gStab(s) to the element
gs.
This powerfully couples with the fact that the cardinality of the set S is the sum of the cardinalities
of the orbits of the action, all of which are not guaranteed to be of the same size unless S is a group.
• Another standard group action which is important for Representation Theory is the Conjugation Action, which is an action of a group G on
itself given as:
G × G → G
(g, x) ↦ gxg⁻¹
For this action the Stab(g) is given a new name,
the Centralizer of g, denoted as Z(g). And the Orb(g)
under this action is also given a new name, called
the Conjugacy Class of g, denoted as C(g). Stated
as equations we have:
Z(x) = {g ∈ G | gxg⁻¹ = x}
C(x) = {g ∈ G | g = g′xg′⁻¹ for some g′ ∈ G}
Again, from the idea of the partitioning of the acted-on
set by Orbits, we have here what is called the Class
Equation, stated as:
|G| = Σ_{conjugacy classes C} |C|
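A small illustrative computation (assuming permutations of {0, 1, 2} represented as tuples) that enumerates the conjugacy classes of S3, checks the class equation |G| = Σ |C|, and verifies the orbit-stabilizer count for the conjugation action:

```python
from itertools import permutations

# S3 as permutations of {0, 1, 2}; composition and inverse of permutation tuples.
G = list(permutations(range(3)))
def compose(p, q):                   # (p o q)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(3))
def inverse(p):
    return tuple(sorted(range(3), key=lambda i: p[i]))

def conj_class(x):
    return {compose(compose(g, x), inverse(g)) for g in G}

classes = []
for x in G:
    c = conj_class(x)
    if c not in classes:
        classes.append(c)

print([len(c) for c in classes])                 # class sizes, e.g. [1, 3, 2]
print(sum(len(c) for c in classes) == len(G))    # class equation: 6 = 1 + 3 + 2

x = (1, 0, 2)                                    # a transposition
stab = [g for g in G if compose(compose(g, x), inverse(g)) == x]
print(len(G) == len(conj_class(x)) * len(stab))  # orbit-stabilizer: 6 = 3 * 2
```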
B.
Ring
The concept of a Ring evolved from attempts to generalize the properties of integers, in which one can add as
well as multiply staying within the set but cannot divide.
Interestingly this generalization gives a powerful extension
of the idea of a "prime number", and this lifting excitingly connects to geometry through what is called
"Hilbert's Nullstellensatz". But just now we shall not
get into these exciting ideas!
Definition 4 A Ring is a set R endowed with 2 binary
operations + (addition) and × (multiplication) such that
the following axioms are satisfied:
• R is an abelian group with respect to +.
• × operation is commutative, associative and has an
identity.
• The × operation is left-distributive as well as right-distributive over +, that is ∀a, b, c ∈ R we have a ×
(b + c) = a × b + a × c and (b + c) × a = b × a + c × a.
Some standard notation
• The identity for + is denoted by 0 and the identity
of × is denoted by 1.
• Almost always a × (b + c) is written as a(b + c). For
example the first equation of the third condition
above will read as a(b + c) = ab + ac.
• (R, +) will mean R thought of as an abelian group
under addition.
1.
Associated Concepts of a Ring
• A Subring is a subset of a Ring which satisfies all
the axioms of being a Ring.
• If 1 = 0 then the Ring is a zero ring.
• An Ideal I of a Ring R is a subset of R such that:
– I is a subgroup of (R, +).
– If i ∈ I and r ∈ R then ri ∈ I.
If R is non-commutative then one needs to distinguish between Left Ideals and Right Ideals, depending on whether the r in the second axiom of
being an ideal multiplies from the left or the
right.
An Ideal can also be defined as a non-empty subset
of R such that given a finite subset {ai} ⊂ I and
another finite subset {ri} ⊂ R, the linear combination
r1 a1 + r2 a2 + ... + rk ak ∈ I.
• A Principal Ideal generated by an element a ∈ R
is the set of all multiples of a. It is denoted as (a).
(a) = aR = Ra = {ra | r ∈ R}
• An Ideal generated by {a1, a2, ..., an} ⊂ R is
the smallest Ideal containing these elements. It is
denoted by (a1, a2, ..., an).
(a1, a2, ..., an) = {r1 a1 + r2 a2 + ... + rn an | ri ∈ R}
• A Maximal Ideal of a Ring is an Ideal which is
not contained in any other ideal except itself and
the Ring. One notes that there can be many maximal
ideals in a Ring.
• All ideals in Z are principal ideals. All maximal
ideals of Z are the principal ideals generated by
the prime numbers, and the converse is also true.
• A Ring is an Integral Domain if 1 ≠ 0 and if,
for a, b ∈ R such that ab = 0, either a = 0 or
b = 0.
C.
Field
The concept of a Field is modelled on the idea of Real
numbers, where you can do addition, multiplication, subtraction and division staying within the set.
Definition 5 A Field is a Ring where all elements except the additive identity have a multiplicative inverse.
So the Field is an abelian group under addition and
the non-zero elements of the Field form an abelian group
under multiplication. The left and the right distribution laws of the multiplication over addition are inherited
from it being a Ring to start off.
D.
Vector Space
Definition 6 A Vector Space is a set of 3 data:
• An abelian group G.
• A field F.
• An action of F on G, i.e. a map F × G → G
where
• The group operation in G is denoted as addition by +
• The action of an element a ∈ F on v ∈ G is denoted by av
and the following consistency conditions are satisfied
by the action (if a, b ∈ F and v, w ∈ G and 1 is the multiplicative
identity of the Field):
• 1v = v
• (ab)v = a(bv)
• (a + b)v = av + bv
• a(v + w) = av + aw
We shall denote a vector space V as the triplet
(G, F, F × G → G). And when it is said that v ∈ V,
v will be meant to be an element of the implicit abelian
group that is part of the data called "Vector Space".
1.
Associated concepts of a Vector Space
• W is called a Subspace of V = (G, F, F × G → G)
if:
– W ⊂ G
– If w1, w2 ∈ W then w1 + w2 ∈ W
– If w ∈ W and a ∈ F then aw ∈ W
– 0 (the identity of G) is in W.
It is to be noted that the base-field of the subspace
W is F, the same as that of the total-space V.
• A Homomorphism φ from a vector space V to W,
both over the field F, is a map φ : V → W which
satisfies the following axioms:
if v1 and v2 are in V and c ∈ F then,
– φ(v1 + v2) = φ(v1) + φ(v2)
– φ(cv) = cφ(v)
If φ is also bijective then it is called an Isomorphism. If V = W then the isomorphism is called
an Automorphism of V (or W).
• If F is a Field then GL(n, F) can be defined in
either of the following ways:
– The group of all automorphisms of the n-dimensional vector-space over F.
– The group of all invertible n × n matrices with
entries in F.
– The group of all n × n matrices with entries in
F whose determinant is ≠ 0.