LECTURE II: ALGEBRA OF MATRICES AND SYSTEMS OF LINEAR EQUATIONS
MAT 204 - FALL 2006
PRINCETON UNIVERSITY
ALFONSO SORRENTINO

[Read also § 1.1-1.7, 2.2, 2.4, 4.1-4.4]
1. Algebra of Matrices
Definition. Let m, n be two positive integers. An m by n real matrix is an ordered set of mn elements, placed on m rows and n columns:

\[
\begin{pmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} & \dots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \dots & a_{mn}
\end{pmatrix}.
\]

We will denote by M_{m,n}(R) the set of m by n real matrices. Obviously, the set M_{1,1}(R) can be identified with R (the set of real numbers).
NOTATION: Let A ∈ M_{m,n}(R). To simplify the notation, we will sometimes abbreviate A = (a_{ij}). The element a_{ij} is placed on the i-th row and j-th column; it will also be denoted by (A)_{ij}. Moreover, we will denote by A^{(i)} the i-th row (a_{i1}, ..., a_{in}) of A and by A_{(j)} the j-th column (a_{1j}, ..., a_{mj})^T of A. Therefore:

\[
A = \begin{pmatrix} A_{(1)} & \dots & A_{(n)} \end{pmatrix} = \begin{pmatrix} A^{(1)} \\ \vdots \\ A^{(m)} \end{pmatrix}.
\]
Remark. For any integer n ≥ 1, we can consider the cartesian product R^n. Obviously, there exists a bijection between R^n and M_{1,n}(R) or M_{n,1}(R).

Definition. Let A ∈ M_{m,n}(R). The transpose of A is the matrix B ∈ M_{n,m}(R) such that:

    b_{ij} = a_{ji}    for any i = 1, ..., n and j = 1, ..., m.

The matrix B will be denoted by A^T. In a few words, the i-th row of A is simply the i-th column of A^T:

    (A^T)_{ij} = (A)_{ji},    (A^T)_{(j)} = A^{(j)}    and    (A^T)^{(i)} = A_{(i)}.

Moreover, (A^T)^T = A.
We will see that the space of matrices can be endowed with the structure of a vector space. Let us start by defining two operations in the set of matrices M_{m,n}(R): the addition and the scalar multiplication.

• Addition:

    + : M_{m,n}(R) × M_{m,n}(R) → M_{m,n}(R),    (A, B) ↦ A + B,

  where A + B is such that: (A + B)_{ij} = (A)_{ij} + (B)_{ij}.

• Scalar multiplication:

    · : R × M_{m,n}(R) → M_{m,n}(R),    (c, A) ↦ cA,

  where cA is such that: (cA)_{ij} = c(A)_{ij}.
Proposition 1. (M_{m,n}(R), +, ·) is a vector space. Moreover, its dimension is mn.
Proof. First of all, let us observe that the zero vector is the zero matrix 0 [i.e., the matrix with all entries equal to zero: 0_{ij} = 0]; the opposite element (with respect to the addition) of a matrix A will be the matrix −A, such that (−A)_{ij} = −(A)_{ij}. We leave to the reader the exercise of verifying that all the axioms that define a vector space hold in this setting.

Let us compute its dimension. It suffices to determine a basis. Let us consider the following mn matrices E^{hk} (for h = 1, ..., m and k = 1, ..., n), such that:

    (E^{hk})_{ij} = 0 if (i, j) ≠ (h, k),    (E^{hk})_{ij} = 1 if (i, j) = (h, k)

[in other words, the only non-zero element of E^{hk} is the one on the h-th row and k-th column]. It is easy to verify that, for any A = (a_{ij}) ∈ M_{m,n}(R), we have:

    A = Σ_{i=1}^{m} Σ_{j=1}^{n} a_{ij} E^{ij};

therefore this set of mn matrices is a spanning set of M_{m,n}(R). Let us verify that they are also linearly independent. In fact,

    Σ_{i=1}^{m} Σ_{j=1}^{n} a_{ij} E^{ij} = 0    ⟺    a_{ij} = 0 for all i = 1, ..., m and j = 1, ..., n.

This shows that {E^{11}, ..., E^{mn}} is a basis. ∎
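To make this basis concrete, here is a minimal sketch in Python (using numpy; the 2 by 3 matrix A is an arbitrary example chosen for illustration) that builds the matrices E^{hk} and checks the decomposition A = Σ_{i,j} a_{ij} E^{ij}:

    import numpy as np

    def E(h, k, m, n):
        # Basis matrix E^{hk}: all zeros except a 1 in row h, column k (0-indexed).
        M = np.zeros((m, n))
        M[h, k] = 1.0
        return M

    A = np.array([[1., 2., 3.],
                  [4., 5., 6.]])      # an arbitrary example in M_{2,3}(R)
    m, n = A.shape

    # Reconstruct A as the linear combination sum_{i,j} a_ij E^{ij}
    S = sum(A[i, j] * E(i, j, m, n) for i in range(m) for j in range(n))
    assert np.allclose(S, A)          # the mn matrices E^{hk} span M_{m,n}(R)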
Definition. Let A ∈ M_{m,n}(R).

• A is a square matrix (of order n) if m = n; the n-ple (a_{11}, ..., a_{nn}) is called the diagonal of A. The set M_{n,n}(R) will be simply denoted by M_n(R).
• A square matrix A = (a_{ij}) ∈ M_n(R) is said to be upper triangular [resp. lower triangular] if a_{ij} = 0 for all i > j [resp. if a_{ij} = 0 for all i < j].
• A square matrix A ∈ M_n(R) is called diagonal if it is both upper and lower triangular [i.e., the only non-zero elements are on the diagonal: a_{ij} = 0 for all i ≠ j].
• A diagonal matrix is called scalar if a_{11} = a_{22} = ... = a_{nn}.
• The scalar matrix with a_{11} = a_{22} = ... = a_{nn} = 1 is called the unit matrix, and will be denoted I_n.
• A square matrix A ∈ M_n(R) is symmetric if A = A^T [therefore, a_{ij} = a_{ji}] and skew-symmetric if A^T = −A [therefore, a_{ij} = −a_{ji}].
Now we come to an important question: how do we multiply two matrices? The first step is defining the product between a row and a column vector.

Definition. Let A = (a_1 ... a_n) ∈ M_{1,n}(R) and B = (b_1, ..., b_n)^T ∈ M_{n,1}(R). We define the multiplication between A and B by:

\[
AB = (a_1 \;\dots\; a_n) \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} = a_1 b_1 + \dots + a_n b_n \in R.
\]

More generally, if A ∈ M_{m,n}(R) and B ∈ M_{n,p}(R), we define the matrix product:

\[
AB = \begin{pmatrix}
A^{(1)}B_{(1)} & A^{(1)}B_{(2)} & \dots & A^{(1)}B_{(p)} \\
A^{(2)}B_{(1)} & A^{(2)}B_{(2)} & \dots & A^{(2)}B_{(p)} \\
\vdots & \vdots & \ddots & \vdots \\
A^{(m)}B_{(1)} & A^{(m)}B_{(2)} & \dots & A^{(m)}B_{(p)}
\end{pmatrix} \in M_{m,p}(R),
\]

where A^{(i)}B_{(j)} = Σ_{k=1}^{n} a_{ik} b_{kj}, for all i = 1, ..., m and j = 1, ..., p.

Remark. Observe that this product makes sense only if the number of columns of A is the same as the number of rows of B. Obviously, the product is always defined when the two matrices A and B are square and have the same order.
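As a sanity check on this definition, the following sketch (Python with numpy; A and B are arbitrary example matrices) computes AB entry by entry via A^{(i)}B_{(j)} = Σ_k a_{ik} b_{kj} and compares the result with numpy's built-in product:

    import numpy as np

    def matmul(A, B):
        # Product via the definition: (AB)_ij = sum_k a_ik * b_kj.
        m, n = A.shape
        n2, p = B.shape
        assert n == n2, "columns of A must equal rows of B"
        C = np.zeros((m, p))
        for i in range(m):
            for j in range(p):
                C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))
        return C

    A = np.array([[1., 0., 2.],
                  [-1., 3., 1.]])     # A in M_{2,3}(R)
    B = np.array([[3., 1.],
                  [2., 1.],
                  [1., 0.]])          # B in M_{3,2}(R)
    assert np.allclose(matmul(A, B), A @ B)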
Let us see some properties of these operations (the proofs of which are left as an exercise).

Proposition 2. [Exercise]
i) The matrix multiplication is associative; namely:

    (AB)C = A(BC)    for any A ∈ M_{m,n}(R), B ∈ M_{n,p}(R) and C ∈ M_{p,q}(R).

ii) The following properties hold:
    • (A + B)C = AC + BC, for any A, B ∈ M_{m,n}(R) and C ∈ M_{n,p}(R);
    • A(B + C) = AB + AC, for any A ∈ M_{m,n}(R) and B, C ∈ M_{n,p}(R);
    • AI_n = A = I_m A, for any A ∈ M_{m,n}(R);
    • (cA)B = c(AB), for any c ∈ R and A ∈ M_{m,n}(R), B ∈ M_{n,p}(R);
    • (A + B)^T = A^T + B^T, for any A, B ∈ M_{m,n}(R);
    • (AB)^T = B^T A^T, for any A ∈ M_{m,n}(R) and B ∈ M_{n,p}(R).
Remark. Given a square matrix A ∈ M_n(R), it is not true in general that there exists a matrix B ∈ M_n(R) such that AB = BA = I_n. In case such a B does exist, we say that A is invertible and we denote its inverse by A^{-1}.

Let us consider in M_n(R) the subset of invertible matrices:

    GL_n(R) := {A ∈ M_n(R) : there exists B ∈ M_n(R) such that AB = BA = I_n}.

This set is called the general linear group of order n. We leave as an exercise to the reader to verify that the following properties hold.

Proposition 3. [Exercise]
i) For any A_1, A_2 ∈ GL_n(R), we have: (A_1 A_2)^{-1} = A_2^{-1} A_1^{-1}.
ii) For any A ∈ GL_n(R), we have: A^T ∈ GL_n(R) and (A^T)^{-1} = (A^{-1})^T.
iii) For any A ∈ GL_n(R) and c ∈ R, with c ≠ 0, we have: (cA)^{-1} = (1/c) A^{-1}. In particular, (−A)^{-1} = −(A^{-1}).
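These identities are easy to test numerically; a small sketch with numpy (the two invertible matrices are arbitrary examples):

    import numpy as np

    A1 = np.array([[2., 1.], [1., 1.]])
    A2 = np.array([[1., 3.], [0., 1.]])   # both invertible (det != 0)
    inv = np.linalg.inv

    assert np.allclose(inv(A1 @ A2), inv(A2) @ inv(A1))    # (A1 A2)^{-1} = A2^{-1} A1^{-1}
    assert np.allclose(inv(A1.T), inv(A1).T)               # (A^T)^{-1} = (A^{-1})^T
    c = 5.0
    assert np.allclose(inv(c * A1), (1.0 / c) * inv(A1))   # (cA)^{-1} = (1/c) A^{-1}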
An important subset of GL_n(R), which we will use later on, is the set of orthogonal matrices.

Definition. A matrix A ∈ M_n(R) is called orthogonal if:

    A^T A = I_n = A A^T    [i.e., A^{-1} = A^T].

The set of the orthogonal matrices of order n is denoted by O_n(R). Moreover, from what was observed above (i.e., A^{-1} = A^T), it follows that O_n(R) ⊂ GL_n(R) (i.e., orthogonal matrices are invertible).
To conclude this section, let us work out a simple exercise that will provide us with a characterization of O_2(R).

Example. The set O_2(R) consists of all matrices of the form:

\[
R_\alpha = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}
\quad \text{and} \quad
S_\alpha = \begin{pmatrix} \cos\alpha & \sin\alpha \\ \sin\alpha & -\cos\alpha \end{pmatrix},
\]

for all α ∈ R.

Proof. One can easily verify (with a direct computation) that:

    R_α^T R_α = I_2 = R_α R_α^T    and    S_α S_α^T = I_2 = S_α^T S_α;

therefore, these matrices are orthogonal. We need to show that all orthogonal matrices are of this form. Consider a matrix

\[
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in O_2(R).
\]

Let us show that there exists α ∈ R such that A = R_α or A = S_α. By definition:

\[
A \in O_2(R) \iff A^T A = I_2 = A A^T \iff
\begin{cases}
a^2 + c^2 = 1 = a^2 + b^2 \\
ab + cd = 0 = ac + bd \\
b^2 + d^2 = 1 = a^2 + c^2.
\end{cases}
\]

From the first two equations, it follows that b² = c². There are two cases:

    i) b = c    or    ii) b = −c.

i) In this case, plugging into the other equations, one gets that (a + d)c = 0, and therefore:

    i′) c = 0    or    i″) a + d = 0.

In case i′):

\[
A = \begin{pmatrix} a & 0 \\ 0 & d \end{pmatrix} \quad \text{with } a^2 = d^2 = 1;
\]

namely, A is one of the following matrices:

\[
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = R_0, \quad
\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = S_0, \quad
\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} = S_\pi, \quad
\begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} = R_\pi.
\]

In case i″):

\[
A = \begin{pmatrix} a & b \\ b & -a \end{pmatrix} \quad \text{with } a^2 + b^2 = 1;
\]

therefore, A = S_α with α ∈ [0, 2π) such that (a, b) = (cos α, sin α) (this is possible since a² + b² = 1).

ii) Proceeding as above, we get that (a − d)c = 0, and therefore there are two possibilities:

    ii′) c = 0    or    ii″) a − d = 0.

In case ii′), we obtain again:

\[
A = \begin{pmatrix} a & 0 \\ 0 & d \end{pmatrix} \quad \text{with } a^2 = d^2 = 1.
\]

In case ii″):

\[
A = \begin{pmatrix} a & -c \\ c & a \end{pmatrix} \quad \text{with } a^2 + c^2 = 1;
\]

therefore, A = R_α with α ∈ [0, 2π) such that (a, c) = (cos α, sin α) (this is possible since a² + c² = 1). ∎
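The characterization is easy to test numerically. This sketch (numpy; the angle is an arbitrary example) verifies the orthogonality relations and also illustrates that the determinant separates the two families, det(R_α) = 1 and det(S_α) = −1, which follows directly from the formulas above:

    import numpy as np

    def R(a):
        return np.array([[np.cos(a), -np.sin(a)],
                         [np.sin(a),  np.cos(a)]])

    def S(a):
        return np.array([[np.cos(a),  np.sin(a)],
                         [np.sin(a), -np.cos(a)]])

    a = 0.73                          # an arbitrary angle
    I2 = np.eye(2)
    for M in (R(a), S(a)):
        assert np.allclose(M.T @ M, I2) and np.allclose(M @ M.T, I2)

    assert np.isclose(np.linalg.det(R(a)), 1.0)    # rotations
    assert np.isclose(np.linalg.det(S(a)), -1.0)   # reflections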
2. Systems of linear equations I: Gauss-Jordan method
Definition. A system of m linear equations in n unknowns x_1, ..., x_n (abbreviated LS or LS(m, n, R)) is a set of m equations of the form:

\[
\begin{cases}
a_{11} x_1 + a_{12} x_2 + \dots + a_{1n} x_n = b_1 \\
a_{21} x_1 + a_{22} x_2 + \dots + a_{2n} x_n = b_2 \\
\quad \vdots \\
a_{m1} x_1 + a_{m2} x_2 + \dots + a_{mn} x_n = b_m,
\end{cases}
\]

or, in a shorter way:

    Σ_{j=1}^{n} a_{ij} x_j = b_i    for i = 1, ..., m.

The elements a_{ij} ∈ R are called coefficients and the b_i ∈ R are the right-hand sides (or known terms) of the equations of the LS.

A solution of such an LS is an n-ple (z_1, ..., z_n) ∈ R^n such that

    Σ_{j=1}^{n} a_{ij} z_j = b_i    for i = 1, ..., m.

A LS without any solution is said to be incompatible; otherwise it is called compatible or solvable.

A LS(m, n, R) is homogeneous (abbreviated HLS(m, n, R)) if b_i = 0 for all i = 1, ..., m. Obviously, a HLS is always solvable; in fact, the zero n-ple (0, ..., 0) is always a solution (the trivial solution); the other, non-trivial solutions (if they exist) are called eigensolutions.

Observe that, given any LS(m, n, R), it is always possible to obtain a HLS(m, n, R) by substituting all the b_i's with zeros; this new HLS is called the associated homogeneous system.
Introducing the matrix notation, it is possible to rewrite linear systems in a more compact way. In fact, consider a LS(m, n, R):

    Σ_{j=1}^{n} a_{ij} x_j = b_i    for i = 1, ..., m,

and let us denote by:

• X the column of unknowns (x_1, ..., x_n)^T ∈ M_{n,1}(R),
• A = (a_{ij}) ∈ M_{m,n}(R) the matrix of coefficients,
• b = (b_1, ..., b_m)^T ∈ M_{m,1}(R) the column of right-hand sides.

From the definition of matrix multiplication, it follows that:

\[
AX = \begin{pmatrix}
a_{11} x_1 + a_{12} x_2 + \dots + a_{1n} x_n \\
a_{21} x_1 + a_{22} x_2 + \dots + a_{2n} x_n \\
\vdots \\
a_{m1} x_1 + a_{m2} x_2 + \dots + a_{mn} x_n
\end{pmatrix}
= \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix} = b.
\]

Therefore, the LS can be rewritten in the matricial form:

    AX = b.

The matrix (A b) ∈ M_{m,n+1}(R) is called the complete matrix or bordered (or edged) matrix of the system AX = b, and it identifies the system univocally; there is, indeed, a bijection between M_{m,n+1}(R) and LS(m, n, R).
Remark. It is evident that the solutions (z_1, ..., z_n) ∈ R^n of the LS(m, n, R) AX = b are in a 1−1 correspondence with the column matrices z = (z_1, ..., z_n)^T ∈ M_{n,1}(R) such that Az = b. Moreover, observe that any solution z = (z_1, ..., z_n) of the linear system AX = b can be expressed as a linear combination of the columns of A; in fact:

\[
b = Az = (A_{(1)} \;\dots\; A_{(n)}) \begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix} = \sum_{i=1}^{n} z_i A_{(i)}.
\]
Proposition 4. Let AX = 0 be a HLS(m, n, R). The set Σ_0 of its solutions is a vector subspace of R^n.
Proof. It suffices to verify that, for any y, z ∈ Σ_0 and a, b ∈ R, we have ay + bz ∈ Σ_0. In fact (identifying y and z with the corresponding column matrices):

    A(ay + bz) = A(ay) + A(bz) = a(Ay) + b(Az) = a·0 + b·0 = 0;

hence, ay + bz ∈ Σ_0. ∎
Proposition 5. Let AX = b be a LS(m, n, R) and Σ the set of its solutions. Let us denote by Σ_0 the set of solutions of the associated HLS AX = 0. If Σ is non-empty (i.e., the LS is solvable), then, for any z_0 ∈ Σ, we have:

    Σ = z_0 + Σ_0 = {z_0 + y, for all y ∈ Σ_0}.

Hence, there is a bijection between Σ and Σ_0.
Proof. (⊆) Let z ∈ Σ (i.e., Az = b). Since Az_0 = b, then:

    A(z − z_0) = Az − Az_0 = b − b = 0;

consequently, z − z_0 ∈ Σ_0. Hence, z = z_0 + (z − z_0) ∈ z_0 + Σ_0.

(⊇) Let us verify that, for any y ∈ Σ_0, we have z_0 + y ∈ Σ. In fact,

    A(z_0 + y) = Az_0 + Ay = b + 0 = b. ∎
Remark. Let AX = b be a solvable non-homogeneous linear system. In this case, Σ is never a vector subspace of R^n: in fact, (0, ..., 0) ∉ Σ. Nevertheless, it is in a 1−1 correspondence with a vector subspace of R^n (namely, Σ_0, associated to AX = 0).

Since Σ is NOT a vector space, it does not make sense to talk about its dimension, but we can associate to it a sort of dimension, defined in terms of the dimension of Σ_0. More explicitly, if dim(Σ_0) = t, we will say that the LS AX = b (if solvable) has ∞^t solutions. In particular, if dim(Σ_0) = 0 (i.e., the associated HLS has only the trivial solution), then AX = b, if solvable, has only one solution (∞^0 = 1).

From Prop. 5, we deduce that, in order to identify all possible solutions of a LS AX = b, it is sufficient to find one particular solution and all the solutions of the associated HLS (and sum them up).

Definition. Two linear systems AX = b and A′X = b′ with the same number n of unknowns are equivalent if and only if they have the same solutions (i.e., Σ = Σ′).

We now want to describe an algorithm to solve a linear system: the Gauss-Jordan method. Let us start with a definition.
Definition. A LS(m, n, R) AX = b is called a step system if m ≤ n and

    a_{ij} = 0 if i > j,    a_{ii} ≠ 0 for i = 1, ..., m.

In particular, the matrix A is upper triangular, i.e., of the form:

\[
\begin{pmatrix}
a_{11} & \dots & \dots & \dots & \dots \\
0 & a_{22} & \dots & \dots & \dots \\
\vdots & \ddots & \ddots & & \vdots \\
0 & \dots & 0 & a_{m,m} & \dots
\end{pmatrix}.
\]
Proposition 6. Every step LS(m, n, R) AX = b is solvable and has ∞^{n−m} solutions.
Proof. If m = n, the LS has only one solution x = (x_1, ..., x_n), which we get in the following way:

• let us start by solving the last equation of the system, a_{nn} x_n = b_n, and we get the component x_n;
• plugging this value of x_n into the second-to-last equation and solving it, we get the component x_{n−1};
• proceeding in the same way, one can find all the other components x_{n−2}, ..., x_2, x_1.

Assume now that m < n. If we assign to the last n − m components x_{m+1}, ..., x_n arbitrary values t_1, ..., t_{n−m} ∈ R, we obtain a step LS(m, m, R) that, by what was observed above, has a unique solution (x_1, ..., x_m). Therefore,

    x = (x_1, ..., x_m, t_1, ..., t_{n−m})

is a solution for the original system (so it is solvable). In other words, we have defined a map

    Φ : R^{n−m} → Σ,    (t_1, ..., t_{n−m}) ↦ x = (x_1, ..., x_m, t_1, ..., t_{n−m}).

Such a map is a bijection. In fact, it is injective, since each step LS(m, m, R) has a unique solution; moreover, it is surjective, since for any z = (z_1, ..., z_n) ∈ Σ we have:

    Φ(z_{m+1}, ..., z_n) = z.

From the definition of dimension introduced above, to show that the LS has ∞^{n−m} solutions we need to verify that dim(Σ_0) = n − m (where Σ_0 denotes the vector subspace of the solutions of the associated HLS). Analogously to what was done above, define

    Φ_0 : R^{n−m} → Σ_0,

where Σ_0 is the solution space associated to AX = 0, and denote by e_1, ..., e_{n−m} the standard basis of R^{n−m}. One can easily verify that Φ_0(e_1), ..., Φ_0(e_{n−m}) form a basis for Σ_0 [EXERCISE]; this completes the proof. ∎
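The proof is constructive: it is just back-substitution. Below is a minimal Python/numpy sketch, assuming the system is already in step form with m = n and a_{ii} ≠ 0 (for m < n one would first assign the arbitrary values t_1, ..., t_{n−m} to x_{m+1}, ..., x_n, exactly as in the proof):

    import numpy as np

    def back_substitution(A, b):
        # Solve a step (upper-triangular) n x n system AX = b with a_ii != 0.
        n = len(b)
        x = np.zeros(n)
        for i in range(n - 1, -1, -1):            # last equation first
            x[i] = (b[i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
        return x

    A = np.array([[2., 1., -1.],
                  [0., 3.,  2.],
                  [0., 0.,  4.]])                 # an arbitrary step system
    b = np.array([3., 7., 8.])
    x = back_substitution(A, b)
    assert np.allclose(A @ x, b)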
The Gauss-Jordan method consists of transforming (if possible) a given LS(m, n, R) AX = b into an equivalent step linear system (which can then be solved using the method of Prop. 6). In order to perform this transformation, several elementary operations (on the equations) are allowed:

• I elementary operation: interchange the position of two equations of the LS;
• II elementary operation: multiply both sides of an equation by a non-zero real number;
• III elementary operation: substitute one equation with the sum of the same equation and a multiple of another.

It is evident that these transformations do not change the set of solutions, i.e., they transform the given system into an equivalent one.
In particular, we can see these operations as operations on the rows of the complete matrix M = (A b) of the system; more precisely (remember that M^{(i)} denotes the i-th row of M):

    I:   [M^{(i)} ⟷ M^{(j)}];
    II:  [M^{(i)} ⟷ cM^{(i)}], where c ∈ R and c ≠ 0;
    III: [M^{(i)} ⟷ M^{(i)} + cM^{(j)}], with i ≠ j and c ∈ R.
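On the complete matrix M, stored as a numpy array, the three operations become small in-place functions (a sketch; rows are 0-indexed here):

    import numpy as np

    def op_I(M, i, j):            # M^(i) <-> M^(j)
        M[[i, j]] = M[[j, i]]

    def op_II(M, i, c):           # M^(i) <-> c M^(i), with c != 0
        assert c != 0
        M[i] *= c

    def op_III(M, i, j, c):       # M^(i) <-> M^(i) + c M^(j), with i != j
        assert i != j
        M[i] += c * M[j]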
Now, let us describe the Gauss-Jordan algorithm (in four steps):

i) we eliminate all zero rows of M (they correspond to the trivial equation 0 = 0);
ii) we want to make sure that M_{(1)} ≠ 0 (i.e., A_{(1)} ≠ 0); in case it is not, we can always achieve this by interchanging two columns (this corresponds to an interchange of two variables);
iii) we want to make a_{11} = 1. To obtain this, we proceed as follows. If a_{11} = 0, we perform operation I [M^{(1)} ⟷ M^{(i)}] and get a_{11} ≠ 0 (it is always possible to do this since, from step ii), it follows that there exists a non-zero element in the column M_{(1)}). Now we can assume that a_{11} ≠ 0. If a_{11} ≠ 1, we perform operation II [M^{(1)} ⟷ (1/a_{11}) M^{(1)}] and get a_{11} = 1;
iv) now we proceed in order to get a_{21} = a_{31} = ... = a_{m1} = 0. It suffices to perform operation III [M^{(i)} ⟷ M^{(i)} − a_{i1} M^{(1)}] for i = 2, ..., m.

After these transformations, the matrix A looks like:

\[
\begin{pmatrix}
1 & a_{12} & \dots & \dots \\
0 & a_{22} & \dots & \dots \\
\vdots & \vdots & & \vdots \\
0 & a_{m',2} & \dots & \dots
\end{pmatrix}
\]

with m′ ≤ m.

We can now repeat the algorithm i)-iv), starting from the second row and the second column (i.e., we apply it to the sub-matrix obtained by eliminating the first row and the first column), and get:

\[
\begin{pmatrix}
1 & a_{12} & \dots & \dots & \dots \\
0 & 1 & \dots & \dots & \dots \\
0 & 0 & a_{33} & \dots & \dots \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & a_{m'',3} & \dots & \dots
\end{pmatrix}
\]

with m″ ≤ m′.

We keep iterating the algorithm (starting from the third row and third column, and so on).

• If, during the performance of the algorithm, we obtain a row (0, ..., 0, b) with b ≠ 0, then the LS is incompatible (and the Gauss-Jordan method stops);
• otherwise, we end up with a step system that we can solve, as shown in Prop. 6. Observe that, if the method involved any interchange of variables, we need to restore the original order in the final solution (using the opposite interchange).
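Putting steps i)-iv) together gives the following sketch of the algorithm (Python/numpy; for simplicity it pivots by row interchanges only, skipping the column interchanges of step ii), and it reports incompatibility when a row (0, ..., 0, b) with b ≠ 0 appears):

    import numpy as np

    def gauss_jordan_forward(A, b, tol=1e-12):
        # Reduce the complete matrix (A | b) towards step form.
        M = np.hstack([A.astype(float), b.reshape(-1, 1).astype(float)])
        m, n = A.shape
        row = 0
        for col in range(n):
            # find a row with a non-zero entry in this column (step iii)
            pivots = [r for r in range(row, m) if abs(M[r, col]) > tol]
            if not pivots:
                continue                              # no pivot in this column
            M[[row, pivots[0]]] = M[[pivots[0], row]] # operation I
            M[row] /= M[row, col]                     # operation II: pivot = 1
            for r in range(row + 1, m):               # step iv): clear below
                M[r] -= M[r, col] * M[row]            # operation III
            row += 1
        # a row (0, ..., 0 | b) with b != 0 means the system is incompatible
        for r in range(row, m):
            if abs(M[r, -1]) > tol:
                raise ValueError("incompatible system")
        return M[:row]                                # zero rows removed

    A = np.array([[1., 2., 1.],
                  [2., 4., 3.]])
    b = np.array([3., 7.])
    print(gauss_jordan_forward(A, b))                 # [[1. 2. 1. 3.]
                                                      #  [0. 0. 1. 1.]]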
[READ § 1.5 - Triangular factors and row exchange AND § 1.6 - Inverses and transposes (in particular, how to compute the inverse using the Gauss-Jordan method).]
3. Determinant of a square matrix
[READ Ch. 4 of the book for an introduction to the determinant.]

We just summarize some properties of the determinant (see § 4.2).

Proposition 7. Let A = (a_{ij}) ∈ M_n(R). We have:
i) if A is a diagonal matrix, then det(A) = Π_{i=1}^{n} a_{ii}; in particular, det(I_n) = 1;
ii) if A has a zero row or column, then det(A) = 0;
iii) det(A) = det(A^T);
iv) let A^{(i)} = bU + cV, where b, c ∈ R and U, V ∈ M_{1,n}(R), and let B, C ∈ M_n(R) be the matrices obtained from A by substituting A^{(i)} with (respectively) U and V; we have:

    det(A) = b det(B) + c det(C)

[an analogous result holds for the columns of A];
v) let B ∈ M_n(R) be the matrix obtained from A by interchanging two rows or columns; then det(B) = −det(A);
vi) if A has two proportional rows or columns, then det(A) = 0;
vii) (Binet's Theorem) det(AB) = det(A) det(B);
viii) for all A ∈ GL_n(R):

    det(A^{-1}) = 1/det(A).

In particular, it follows that if A ∈ GL_n(R), then det(A) ≠ 0.
Finally, we want to illustrate another important property of the determinant (Laplace's theorem). Some preliminary definitions are necessary.

Definition. Let M ∈ M_{m,n}(R) and let p, q be positive integers such that 1 ≤ p ≤ m and 1 ≤ q ≤ n. Let us choose p integers {i_1, ..., i_p} such that

    1 ≤ i_1 < ... < i_p ≤ m,

and q integers {j_1, ..., j_q} such that

    1 ≤ j_1 < ... < j_q ≤ n.

We define the submatrix of M relative to the rows i_1, ..., i_p and the columns j_1, ..., j_q as the matrix obtained by intersecting the rows M^{(i_1)}, ..., M^{(i_p)} and the columns M_{(j_1)}, ..., M_{(j_q)}. This submatrix will be denoted by M(i_1, ..., i_p | j_1, ..., j_q).

One can verify that the submatrices of M of type (p, q) are exactly:

\[
\binom{m}{p} \binom{n}{q} = \frac{m(m-1)\cdots(m-p+1)}{p!} \cdot \frac{n(n-1)\cdots(n-q+1)}{q!}.
\]
Definition. Let A = (a_{ij}) ∈ M_n(R), with n ≥ 2. We call the cofactor of a_{ij} the element α_{ij} = α_{ij}^A ∈ R, defined as:

    α_{ij} = (−1)^{i+j} det(A(1, ..., î, ..., n | 1, ..., ĵ, ..., n))

[where the hats denote that the index i is omitted from the rows and the index j from the columns]. The matrix formed by all the cofactors is called the cofactor matrix (or adjoint matrix) of A, and is denoted by C_A:

\[
C_A = \begin{pmatrix}
\alpha_{11} & \alpha_{12} & \dots & \alpha_{1n} \\
\alpha_{21} & \alpha_{22} & \dots & \alpha_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\alpha_{n1} & \alpha_{n2} & \dots & \alpha_{nn}
\end{pmatrix}.
\]
Theorem 1 (Laplace's Theorem). Let A ∈ M_n(R), with n ≥ 2. For any i, j = 1, ..., n:

    det(A) = Σ_{t=1}^{n} a_{it} α_{it}    and    det(A) = Σ_{t=1}^{n} a_{tj} α_{tj}

(these expressions are called, respectively, the expansion of the determinant with respect to the row A^{(i)} and the expansion of the determinant with respect to the column A_{(j)}).

Remark. This theorem provides a very useful tool for computing the determinant of a matrix. In fact, given a square matrix A of order n and having fixed one of its rows or columns, we can rewrite the determinant as the sum of n determinants of submatrices of order n − 1; inductively, each of these determinants can be written as the sum of n − 1 determinants of submatrices of order n − 2, and so on. Obviously, the more zeros there are in the chosen row or column, the easier the computation becomes.
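Laplace's theorem translates directly into a recursive procedure. A sketch in pure Python, expanding along the first row (the cost grows roughly like n!, so this is only practical for small matrices):

    def det(A):
        # Determinant by Laplace expansion along the first row (A: list of rows).
        n = len(A)
        if n == 1:
            return A[0][0]
        total = 0.0
        for j in range(n):
            # submatrix obtained by deleting row 1 and column j+1 (0-indexed: row 0, column j)
            minor = [row[:j] + row[j+1:] for row in A[1:]]
            total += (-1) ** j * A[0][j] * det(minor)
        return total

    assert det([[1., 2.], [3., 4.]]) == -2.0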
Corollary 1. Let A ∈ M_n(R). We have:

    C_A^T A = det(A) I_n = A C_A^T.

It follows that, if A ∈ GL_n(R), then

    A^{-1} = (1/det(A)) C_A^T.
Proof. Let us start by verifying that:

    (C_A^T A)_{ij} = det(A) δ_{ij} = { det(A) if i = j;  0 if i ≠ j }.

In fact,

    (C_A^T A)_{ij} = (C_A^T)^{(i)} A_{(j)} = (α_{1i}, ..., α_{ni}) (a_{1j}, ..., a_{nj})^T = Σ_{t=1}^{n} α_{ti} a_{tj}.

If i = j, the above sum is the expansion of det(A) with respect to the column A_{(i)}; therefore, (C_A^T A)_{ii} = det(A).

If i ≠ j, let us denote by B the matrix obtained from A by substituting A_{(i)} with A_{(j)}; then det(B) = 0, since B has two equal columns. Let us compute the Laplace expansion of B with respect to B_{(i)}. Observe that α_{ti}^B = α_{ti}^A (in fact, these cofactors do not depend on the i-th column, and A and B coincide apart from this column). Hence:

    0 = det(B) = Σ_{t=1}^{n} b_{ti} α_{ti}^B = Σ_{t=1}^{n} a_{tj} α_{ti}^A = (C_A^T A)_{ij}.

It remains to verify that:

    (A C_A^T)_{ij} = det(A) δ_{ij}.

Proceeding as above, one can verify that:

    (A C_A^T)_{ij} = Σ_{t=1}^{n} a_{it} α_{jt}.

If i = j, the above sum is the expansion of det(A) with respect to the row A^{(i)}; if i ≠ j, let us denote by B the matrix obtained from A by substituting A^{(j)} with A^{(i)} (so that det(B) = 0, since B has two equal rows). One expands det(B) with respect to B^{(j)} and proceeds as above. ∎
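Corollary 1 gives an explicit formula for the inverse. A sketch in pure Python, reusing the recursive det from the previous sketch (assumed to be in scope), that builds C_A and returns A^{-1} = (1/det(A)) C_A^T:

    def cofactor_matrix(A):
        # C_A: the matrix of cofactors alpha_ij (0-indexed).
        n = len(A)
        def minor(i, j):
            return [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]
        return [[(-1) ** (i + j) * det(minor(i, j)) for j in range(n)]
                for i in range(n)]

    def inverse(A):
        d = det(A)
        assert d != 0, "A must be invertible"
        C = cofactor_matrix(A)
        n = len(A)
        return [[C[j][i] / d for j in range(n)] for i in range(n)]  # (1/det A) C_A^T

    A = [[2., 1.], [5., 3.]]          # det(A) = 1
    assert inverse(A) == [[3., -1.], [-5., 2.]]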
4. Rank of a matrix
[See also the definition in § 2.2 and § 2.4.]

In Lecture I, we defined the rank of t vectors v_1, ..., v_t ∈ V as the dimension of the vector subspace they span; i.e., rank(v_1, ..., v_t) = dim(⟨v_1, ..., v_t⟩). Recall that it coincides with the maximum number of linearly independent vectors among {v_1, ..., v_t}; moreover, rank(v_1, ..., v_t) ≤ min{t, dim(V)}.

Let A ∈ M_{m,n}(R). Observe that the matrix A is made of m rows A^{(1)}, ..., A^{(m)} ∈ M_{1,n}(R) and n columns A_{(1)}, ..., A_{(n)} ∈ M_{m,1}(R); moreover, recall that M_{1,n}(R) and M_{m,1}(R) are two vector spaces, of dimension respectively n and m.

Definition. We define the row rank of A as the non-negative integer:

    r_A = rank(A^{(1)}, ..., A^{(m)}) = dim(⟨A^{(1)}, ..., A^{(m)}⟩).

Obviously, r_A ≤ min{m, n}. Analogously, we define the column rank of A as the non-negative integer:

    c_A = rank(A_{(1)}, ..., A_{(n)}) = dim(⟨A_{(1)}, ..., A_{(n)}⟩).

Also in this case, c_A ≤ min{m, n}.
Theorem 2. For each matrix A ∈ M_{m,n}(R), we have: r_A = c_A.

For a proof of this theorem, see § 2.4. This common value will be called the rank of A.

Definition. Let A ∈ M_{m,n}(R). We will call the rank of A the row rank (or, equivalently, the column rank) of A. This is the maximum number of linearly independent rows (or columns) of A.

Remark.
i) For any matrix A ∈ M_{m,n}(R), we have:

    rank(A^T) = rank(A)

[in fact, rank(A) = r_A = c_{A^T} = rank(A^T)].
ii) Performing an elementary operation on the rows or columns of A, the rank does not change.
Proposition 8. Let A ∈ M_n(R). The following conditions are equivalent:
i) A ∈ GL_n(R);
ii) det(A) ≠ 0;
iii) rank(A) = n.

Proof.
i) ⟺ ii) See Prop. 7 (viii).
ii) ⟹ iii) If, by contradiction, rank(A) < n, then one row is a linear combination of the others. Let A^{(i)} = Σ_{j≠i} c_j A^{(j)} and denote by B_j the matrix obtained from A by substituting A^{(i)} with A^{(j)} (for j = 1, ..., n and j ≠ i); then det(B_j) = 0, since B_j has two equal rows. Moreover, because of Prop. 7:

    det(A) = Σ_{j≠i} c_j det(B_j) = Σ_{j≠i} c_j · 0 = 0;

contradiction.
iii) ⟹ i) Let us denote by {E^1, ..., E^n} the standard basis of M_{1,n}(R) (therefore, (E^i)_{1j} = δ_{ij}). Since, by hypothesis, {A^{(1)}, ..., A^{(n)}} is a basis of M_{1,n}(R):

    E^i = Σ_{t=1}^{n} b_{it} A^{(t)}    [i = 1, ..., n],

for suitable b_{it} ∈ R [note that (E^i)_{1j} = Σ_{t=1}^{n} b_{it} a_{tj}]. If we consider the matrix B = (b_{it}) ∈ M_n(R), we have:

    (BA)_{ij} = B^{(i)} A_{(j)} = Σ_{t=1}^{n} b_{it} a_{tj} = (E^i)_{1j} = δ_{ij};

hence, BA = I_n. In order to conclude the proof, it suffices to determine a matrix C ∈ M_n(R) such that AC = I_n (in fact, from BA = I_n = AC, it follows easily that B = C and A ∈ GL_n(R)). To obtain C, one can proceed as above, expressing the matrices of the standard basis of M_{n,1}(R) in terms of {A_{(1)}, ..., A_{(n)}}. ∎
We want to conclude this section by characterizing the rank of a matrix as the greatest order of its non-zero minors. Let us start with a definition.
Definition. Let A ∈ M_{m,n}(R). We call a minor (of order t) of A the determinant of a square submatrix of A of order t. We denote by ρ = ρ(A) the greatest order of the non-zero minors of A. In other words, ρ is defined by the following conditions:

• there exists at least one invertible square submatrix of A of order ρ;
• all square submatrices of A of order > ρ have determinant = 0.
Theorem 3. Let A ∈ M_{m,n}(R). Then, rank(A) = ρ(A).
We need a lemma.
Lemma 1. Let B be a submatrix of A; then rank(B) ≤ rank(A).

Proof [Lemma 1]. Let B = A(i_1, ..., i_p | j_1, ..., j_s). Consider the matrix M = A(i_1, ..., i_p | 1, ..., n), of which B is a submatrix: we have B = M(1, ..., p | j_1, ..., j_s). Obviously,

    rank(M) = dim(⟨A^{(i_1)}, ..., A^{(i_p)}⟩) ≤ r_A = rank(A).

Moreover,

    rank(B) = dim(⟨M_{(j_1)}, ..., M_{(j_s)}⟩) ≤ c_M = rank(M).

It follows that rank(B) ≤ rank(M) ≤ rank(A). ∎

Proof [Theorem 3]. Let us start by verifying that ρ = ρ(A) ≤ rank(A). Let us choose in A an invertible submatrix M of order ρ: M ∈ GL_ρ(R). Because of Prop. 8, rank(M) = ρ; from Lemma 1, rank(M) ≤ rank(A). Therefore, ρ ≤ rank(A).

Conversely, let us prove that r = rank(A) ≤ ρ(A). Let us choose in A r linearly independent rows A^{(i_1)}, ..., A^{(i_r)} [with i_1 < ... < i_r]. Let B be the submatrix of A formed by these rows. Obviously, rank(B) = r, and therefore B has r linearly independent columns: B_{(j_1)}, ..., B_{(j_r)}. If we define M = A(i_1, ..., i_r | j_1, ..., j_r), we have that det(M) ≠ 0 and consequently ρ(A) ≥ r. ∎
5. Systems of linear equations II: Rouché-Capelli's theorem and Cramer's theorem
Rouché-Capelli's theorem provides a useful tool to determine whether a system is
solvable or not.
Theorem 4 (Rouché-Capelli). Let AX = b be a LS(m, n, R). Such a LS is solvable if and only if rank(A) = rank(A b). If the system is solvable, then it admits ∞^{n−rank(A)} solutions.
Proof. AX = b is solvable
⟺ there exists a ∈ M_{n,1}(R) such that Aa = b
⟺ there exists a = (a_1, ..., a_n)^T ∈ M_{n,1}(R) such that Σ_{i=1}^{n} a_i A_{(i)} = b
⟺ b ∈ ⟨A_{(1)}, ..., A_{(n)}⟩
⟺ ⟨A_{(1)}, ..., A_{(n)}⟩ = ⟨A_{(1)}, ..., A_{(n)}, b⟩
⟺ rank(A) = rank(A b).

Now, suppose that the LS(m, n, R) AX = b is solvable and denote r = rank(A) = rank(A b). Without any loss of generality (up to interchanging the rows of A), we can assume that the first r rows of A are linearly independent. Therefore, the first r rows of (A b) are also linearly independent (and the remaining m − r are linear combinations of the former). Let us denote:

\[
A^* = \begin{pmatrix} A^{(1)} \\ \vdots \\ A^{(r)} \end{pmatrix}
\quad \text{and} \quad
b^* = \begin{pmatrix} b_1 \\ \vdots \\ b_r \end{pmatrix}
\]

and consider the LS(r, n, R) A*X = b*. It is easy to see that this LS is equivalent to the original one; hence, we will solve A*X = b* instead of AX = b. If we apply the Gauss-Jordan algorithm to A*X = b*, since this system is compatible, we will end up with a step LS; moreover, since rank(A) = r and the rank is preserved by elementary operations, we won't obtain any zero row! In the end, our step system will have exactly r equations and, as observed in Prop. 6, it will have ∞^{n−r} solutions. ∎
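In practice, the solvability test reads as follows (a numpy sketch; numpy.linalg.matrix_rank works up to a numerical tolerance, so this is reliable only away from borderline cases):

    import numpy as np

    def analyze(A, b):
        # Rouche-Capelli: compare rank(A) with rank(A | b).
        r = np.linalg.matrix_rank(A)
        r_complete = np.linalg.matrix_rank(np.hstack([A, b.reshape(-1, 1)]))
        if r != r_complete:
            return "incompatible"
        n = A.shape[1]
        return f"solvable, with infinity^{n - r} solutions"

    A = np.array([[1., 1., 1.],
                  [1., 2., 3.]])
    b = np.array([6., 14.])
    print(analyze(A, b))              # solvable, with infinity^1 solutions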
Proposition 9. Let A ∈ M_{m,n}(R) and B ∈ M_{n,p}(R). One has:

    rank(AB) ≤ min{rank(A), rank(B)}.

Proof. It is sufficient to verify that rank(AB) ≤ rank(B). In fact, if this inequality is true, it follows that:

    rank(AB) = rank((AB)^T) = rank(B^T A^T) ≤ rank(A^T) = rank(A).

Let us consider the i-th row of AB:

    (AB)^{(i)} = (A^{(i)}B_{(1)} ... A^{(i)}B_{(p)}) = (Σ_{t=1}^{n} a_{it} b_{t1} ... Σ_{t=1}^{n} a_{it} b_{tp}) = Σ_{t=1}^{n} a_{it} (b_{t1} ... b_{tp}) = Σ_{t=1}^{n} a_{it} B^{(t)}.

Therefore, (AB)^{(i)} ∈ ⟨B^{(1)}, ..., B^{(n)}⟩ [for i = 1, ..., m]. It follows that

    ⟨(AB)^{(1)}, ..., (AB)^{(m)}⟩ ⊆ ⟨B^{(1)}, ..., B^{(n)}⟩

and consequently rank(AB) ≤ rank(B). ∎

Corollary 2. Let A ∈ M_{m,n}(R). For any B ∈ GL_n(R) and C ∈ GL_m(R), one has:

    rank(A) = rank(AB) = rank(CA).

Proof. We have:

    rank(AB) ≤ rank(A) = rank(ABB^{-1}) ≤ rank(AB);

therefore, rank(AB) = rank(A). Analogously:

    rank(CA) ≤ rank(A) = rank(C^{-1}CA) ≤ rank(CA);

therefore, rank(CA) = rank(A). ∎
Let us state and prove another method that can be used to solve a LS(n, n, R): Cramer's method.

Theorem 5. Let AX = b be a LS(n, n, R), with A ∈ GL_n(R). We have:
i) this LS has a unique solution;
ii) for i = 1, ..., n, let us denote by B_i the matrix obtained from A by substituting the i-th column A_{(i)} with b. The unique solution of this LS is given by:

    x = (det(B_1)/det(A), ..., det(B_n)/det(A))

(Cramer's formula).
Proof.
i) The column x = A^{-1}b is a solution of AX = b; in fact:

    A(A^{-1}b) = (AA^{-1})b = I_n b = b.

If y is another solution, then:

    y = I_n y = (A^{-1}A)y = A^{-1}(Ay) = A^{-1}b;

therefore, the solution is unique.

ii) Let us compute the determinant of B_i, using Laplace's expansion (cofactor expansion) with respect to the i-th column:

    det(B_i) = Σ_{t=1}^{n} b_t α_{ti}^{B_i} = Σ_{t=1}^{n} b_t α_{ti}^{A}

[in fact, the cofactors w.r.t. the i-th column of A and B_i coincide]. Since A^{-1} = (1/det(A)) C_A^T, then x = A^{-1}b = (1/det(A)) C_A^T b, and consequently (for i = 1, ..., n):

    x_i = (1/det(A)) (C_A^T b)_i = (1/det(A)) (C_A^T)^{(i)} b = (1/det(A)) (α_{1i}, ..., α_{ni}) (b_1, ..., b_n)^T = (1/det(A)) Σ_{t=1}^{n} b_t α_{ti} = det(B_i)/det(A). ∎
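Cramer's formula in code (a numpy sketch; note that for large n this is far more expensive than Gauss-Jordan, since it computes n + 1 determinants):

    import numpy as np

    def cramer(A, b):
        # Unique solution of AX = b for A in GL_n(R), via Cramer's formula.
        d = np.linalg.det(A)
        assert abs(d) > 1e-12, "A must be invertible"
        n = len(b)
        x = np.zeros(n)
        for i in range(n):
            Bi = A.copy()
            Bi[:, i] = b              # B_i: replace the i-th column with b
            x[i] = np.linalg.det(Bi) / d
        return x

    A = np.array([[2., 1.], [1., 3.]])
    b = np.array([3., 5.])
    assert np.allclose(A @ cramer(A, b), b)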
From Rouché-Capelli's theorem, it follows that, in order to decide whether a LS AX = b is solvable or not, one has to compute (and then compare) rank(A) and rank(A b). To compute the rank of a matrix, the following result (Kronecker's theorem, or theorem of the bordered minors) will come in handy, saving a lot of computations! Let us start with a definition.
Definition. Let B be a square submatrix of order r of a matrix A ∈ M_{m,n}(R). We will call a bordered minor of B the determinant of any square submatrix C of A of order r + 1 that contains B as a submatrix. We will say that C is obtained by bordering B with a row and a column of A. Obviously, if r = min{m, n}, then B has no bordered minors.
Theorem 6 (Kronecker). Let A ∈ M_{m,n}(R). We have that rank(A) = r if and only if the two following conditions are satisfied:
a) there exists in A an invertible square submatrix B of order r;
b) all bordered minors of B (if any exist) are zero.

Proof.
(⟹) If rank(A) = r, then ρ(A) = r, and therefore there exists an invertible submatrix of A of order r: B ∈ GL_r(R). Moreover, all minors of order r + 1 of A (and in particular the bordered minors of B) are zero. Hence, conditions a) and b) are satisfied.
(⟸) To simplify the notation, let us suppose that B = A(1, ..., r | 1, ..., r). By hypothesis, det(B) ≠ 0 (i.e., rank(B) = r).

Let C = (A_{(1)} ... A_{(r)}) be the submatrix of A formed by the first r columns of A. Obviously, rank(C) ≤ r; since B is a submatrix of C, then r = rank(B) ≤ rank(C) ≤ r, and consequently rank(C) = r. Hence, the columns A_{(1)}, ..., A_{(r)} are linearly independent.

To show the claim (i.e., rank(A) = r), we need to prove that:

    A_{(r+1)}, ..., A_{(n)} ∈ ⟨A_{(1)}, ..., A_{(r)}⟩,

namely, that for t = r + 1, ..., n the matrix (A_{(1)} ... A_{(r)} A_{(t)}) has rank r. Let us denote such a matrix by D ∈ M_{m,r+1}(R) and consider the submatrix formed by its first r rows:

\[
\begin{pmatrix} D^{(1)} \\ \vdots \\ D^{(r)} \end{pmatrix} \in M_{r,r+1}(R).
\]

This submatrix has rank r (it has B as a submatrix, therefore it must have maximal rank). To prove the claim (i.e., rank(D) = r), it suffices to verify that:

    D^{(r+1)}, ..., D^{(m)} ∈ ⟨D^{(1)}, ..., D^{(r)}⟩.

In fact, for s = r + 1, ..., m, consider

\[
\det \begin{pmatrix} D^{(1)} \\ \vdots \\ D^{(r)} \\ D^{(s)} \end{pmatrix}.
\]

This is a bordered minor of B, therefore it is zero. This means that its r + 1 rows are linearly dependent and, since the first r rows D^{(1)}, ..., D^{(r)} are linearly independent, we necessarily have:

    D^{(s)} ∈ ⟨D^{(1)}, ..., D^{(r)}⟩,

as we wanted to show. ∎
Remark. Let A ∈ M_{m,n}(R). To compute the rank, one can proceed as follows (see the sketch after this list):

• find in A an invertible square submatrix B of order t;
• if t = min{m, n}, then rank(A) = t;
• if t < min{m, n}, consider all possible bordered minors of B;
• if all bordered minors of B are zero, then rank(A) = t;
• otherwise, we have obtained a new invertible square submatrix C of order t + 1;
• therefore, rank(A) ≥ t + 1 and we repeat the above procedure.

Without Kronecker's theorem, once we have found an invertible square submatrix B of order t, we should check that all possible minors of order t + 1 are zero; these are

\[
\binom{m}{t+1} \binom{n}{t+1},
\]

while the bordered minors of B are only (m − t)(n − t). For instance, if A ∈ M_{4,6}(R) and B ∈ GL_2(R), the minors of A of order 3 are

\[
\binom{4}{3} \binom{6}{3} = 80,
\]

while the bordered minors are (4 − 2)(6 − 2) = 8.
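The procedure above can be sketched in code (Python/numpy; a naive implementation that starts from a non-zero entry and keeps enlarging the invertible submatrix via bordered minors, up to a numerical tolerance; in the example, the third row is the sum of the first two and the fourth is twice the first plus the second, so the rank is 2):

    import numpy as np

    def rank_kronecker(A, tol=1e-10):
        # Rank via Kronecker's theorem: grow an invertible submatrix by bordering.
        m, n = A.shape
        nz = np.argwhere(np.abs(A) > tol)
        if len(nz) == 0:
            return 0
        rows, cols = [nz[0][0]], [nz[0][1]]       # invertible 1 x 1 submatrix
        while len(rows) < min(m, n):
            for i in set(range(m)) - set(rows):   # try all bordered minors
                for j in set(range(n)) - set(cols):
                    minor = A[np.ix_(rows + [i], cols + [j])]
                    if abs(np.linalg.det(minor)) > tol:
                        rows.append(i); cols.append(j)
                        break
                else:
                    continue
                break
            else:
                return len(rows)                  # all bordered minors vanish
        return len(rows)

    A = np.array([[1., 2., 0., 1., 3., 0.],
                  [0., 1., 1., 0., 1., 1.],
                  [1., 3., 1., 1., 4., 1.],
                  [2., 5., 1., 2., 7., 1.]])
    assert rank_kronecker(A) == 2 == np.linalg.matrix_rank(A)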
Remark. Let AX = b be a given LS(m, n, R). Let M = (A b) be its complete matrix and let r = rank(A). From Rouché-Capelli's theorem, such a LS is solvable if and only if rank(A b) = r. In this case, it has ∞^{n−r} solutions.
We now want to describe a procedure to find such solutions (without using the Gauss-Jordan method). Choose in A an invertible square submatrix B of order r (which we will call a fundamental submatrix of the LS). For instance, let B = A(i_1, ..., i_r | j_1, ..., j_r). Define:

\[
A' = \begin{pmatrix} A^{(i_1)} \\ \vdots \\ A^{(i_r)} \end{pmatrix}
\quad \text{and} \quad
b' = \begin{pmatrix} b_{i_1} \\ \vdots \\ b_{i_r} \end{pmatrix}
\]

and consider the new LS(r, n, R): A′X = b′. This system is equivalent to the original one, since it has been obtained by eliminating m − r equations, corresponding to the rows of A that could be expressed as linear combinations of the remaining r.

Let us solve this new system. Bring the n − r unknowns different from x_{j_1}, ..., x_{j_r} to the right-hand side and attribute to them the values t_1, ..., t_{n−r} ∈ R (arbitrarily chosen). We get, in this way, a system LS(r, r, R) that admits a unique solution (since the coefficient matrix B is invertible), which can be expressed by Cramer's formula. Varying the n − r parameters t_1, ..., t_{n−r} ∈ R, we get the set Σ of the ∞^{n−r} solutions of the LS.

A particularly simple solution of the system is obtained by choosing t_1 = ... = t_{n−r} = 0; let us denote it by z_0. Using Prop. 5, Σ = z_0 + Σ_0 (where Σ_0 denotes the vector space of the solutions of A′X = 0); therefore, instead of computing the generic solution of Σ, it might be more convenient to compute the generic solution of this HLS and then sum it up with z_0.

Attributing to (t_1, ..., t_{n−r}) the values

    (1, 0, ..., 0), (0, 1, 0, ..., 0), ..., (0, ..., 0, 1),

we get n − r systems LS(r, r, R), each of which admits a unique solution. These n − r solutions y_1, ..., y_{n−r} are linearly independent and hence form a basis of Σ_0. Therefore:

\[
\Sigma = \Big\{ z_0 + \sum_{i=1}^{n-r} t_i y_i, \ \text{for all } t_1, \dots, t_{n-r} \in R \Big\}.
\]
Remark. We have already observed that a HLS AX = 0 is always solvable (in fact, it has at least the trivial solution). This fact is confirmed by Rouché-Capelli's theorem; indeed, rank(A) is clearly equal to rank(A 0). From the same theorem, it follows that every HLS(m, n, R) AX = 0 has ∞^{n−rank(A)} solutions. One can verify that a HLS has no eigensolutions (i.e., the only solution is the trivial one) if and only if n = rank(A) [in fact, ∞^0 = 1].
Department of Mathematics, Princeton University
E-mail address: [email protected]