Chapter 2 Introduction to Matrices
C. Devon Lin
Queen’s University, Sept. 28, 2015
Outline
2.1 A Brief Introduction to Matrices
2.2 Linear Equations and Solutions
2.3 Expected Value and Covariance Matrix of a Random Vector
2.4 Multivariate Normal Distribution Theory
2.1 A Brief Introduction to Matrices
(1) Definition
An n × m matrix A = (a_{ij}) is a two-dimensional array which has n rows
and m columns, with the (i, j)th element being a_{ij}:

$$
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1m} \\
a_{21} & a_{22} & \cdots & a_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nm}
\end{pmatrix}
$$
(2) Special types of matrices: square matrix, symmetric matrix, diagonal
matrix, identity matrix, idempotent matrix
(3) Matrix operations: addition, subtraction, multiplication, transpose, partition,
inverse
Inverse of a diagonal matrix A:

$$
A^{-1} = \begin{pmatrix}
a_{11}^{-1} & 0 & \cdots & 0 \\
0 & a_{22}^{-1} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_{nn}^{-1}
\end{pmatrix}
$$
Orthogonal matrix: An n × n matrix Q is orthogonal if Q^T Q = QQ^T = I_n,
and thus Q^{-1} = Q^T. For example,

$$
Q = \begin{pmatrix}
\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} \\
\frac{1}{\sqrt{3}} & 0 & -\frac{2}{\sqrt{6}} \\
\frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}}
\end{pmatrix}, \qquad
Q^{-1} = Q^T = \begin{pmatrix}
\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\
\frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} \\
\frac{1}{\sqrt{6}} & -\frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}}
\end{pmatrix}
$$
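As a quick numerical check of this example (a minimal sketch, assuming NumPy is available; the matrix below is the Q above):

```python
import numpy as np

# The example orthogonal matrix Q from above.
Q = np.array([
    [1/np.sqrt(3),  1/np.sqrt(2),  1/np.sqrt(6)],
    [1/np.sqrt(3),  0.0,          -2/np.sqrt(6)],
    [1/np.sqrt(3), -1/np.sqrt(2),  1/np.sqrt(6)],
])

# Orthogonality: Q^T Q = Q Q^T = I_3, hence Q^{-1} = Q^T.
assert np.allclose(Q.T @ Q, np.eye(3))
assert np.allclose(Q @ Q.T, np.eye(3))
assert np.allclose(np.linalg.inv(Q), Q.T)
```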
(4) Properties of matrix operations
Associativity: (AB)C = A(BC)
Right distributivity: (A + B)C = AC + BC
Left distributivity: C(A + B) = CA + CB
In general: AB ≠ BA
(A + B)^T = A^T + B^T, (AB)^T = B^T A^T
(5) Characteristics of a matrix: determinant, rank, trace
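To make properties (4) and (5) concrete, here is a small NumPy sketch (an illustration, not part of the slides) that checks the transpose rules and computes the determinant, rank, and trace of a toy matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

# Transpose of a product reverses the order: (AB)^T = B^T A^T.
assert np.allclose((A @ B).T, B.T @ A.T)
assert np.allclose((A + B).T, A.T + B.T)

# In general AB != BA.
print(np.allclose(A @ B, B @ A))   # False

# Characteristics of a matrix: determinant, rank, trace.
print(np.linalg.det(A))            # -2.0
print(np.linalg.matrix_rank(A))    # 2
print(np.trace(A))                 # 5.0
```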
2.2 Linear Equations and Solutions
(1) Linear equations: a set of r linear equations can be represented by
Ax = y, where x is a vector of s unknowns, A is an r × s matrix of known
coefficients on the s unknowns, and y is an r × 1 vector of known constants.
For example,

$$
\begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 3 & 3 & 3 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} =
\begin{pmatrix} 6 \\ 10 \\ 9 \end{pmatrix}
$$
(2) The solutions of x: no solution, a unique solution, or infinitely many solutions.
The system has a unique solution if it is consistent and rank(A) = s.
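In fact, the example system above has no solution: the second row of A is twice the first, yet 10 ≠ 2 · 6. A minimal NumPy sketch (an illustration; the rank comparison is the standard consistency test) that diagnoses this:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [3.0, 3.0, 3.0]])
y = np.array([6.0, 10.0, 9.0])

# rank(A) == rank([A | y])  -> consistent (unique solution iff rank == s);
# rank(A) <  rank([A | y])  -> no solution.
r_A  = np.linalg.matrix_rank(A)
r_Ay = np.linalg.matrix_rank(np.column_stack([A, y]))
print(r_A, r_Ay)   # 2 3  -> inconsistent: no solution
```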
2.3 Expected Value and Covariance Matrix of a Random Vector
(1) Random vector: a vector of random variables X = (X_1, X_2, ..., X_n)^T,
also known as a multivariate random variable. The distribution of each
random variable X_i is called a marginal distribution, while the distribution of
the random vector X is called a joint probability distribution or a
multivariate distribution.
Example 2.1 X_1 ∼ N(µ_{X_1}, σ²_{X_1}) and X_2 ∼ N(µ_{X_2}, σ²_{X_2}). The joint
probability distribution of (X_1, X_2)^T is

$$
f(X_1, X_2) = \frac{1}{2\pi\sigma_{X_1}\sigma_{X_2}\sqrt{1-\rho^2}}
\exp\left\{ -\frac{1}{2(1-\rho^2)} \left[
\frac{(X_1-\mu_{X_1})^2}{\sigma_{X_1}^2}
+ \frac{(X_2-\mu_{X_2})^2}{\sigma_{X_2}^2}
- \frac{2\rho(X_1-\mu_{X_1})(X_2-\mu_{X_2})}{\sigma_{X_1}\sigma_{X_2}}
\right] \right\}. \quad (1)
$$

We say

$$
\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N\left(
\begin{pmatrix} \mu_{X_1} \\ \mu_{X_2} \end{pmatrix},
\begin{pmatrix} \sigma_{X_1}^2 & \rho\sigma_{X_1}\sigma_{X_2} \\
\rho\sigma_{X_1}\sigma_{X_2} & \sigma_{X_2}^2 \end{pmatrix}
\right)
$$
The joint distribution of X_1 and X_2 is given by

$$
f_X(X_1, X_2) = \frac{1}{(2\pi)^{2/2}|\Sigma|^{1/2}}
\exp\left(-\frac{1}{2}(\mathbf{X} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{X} - \boldsymbol{\mu})\right),
$$

where

$$
\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}, \quad
\boldsymbol{\mu} = \begin{pmatrix} \mu_{X_1} \\ \mu_{X_2} \end{pmatrix}, \quad
\Sigma = \begin{pmatrix} \sigma_{X_1}^2 & \rho\sigma_{X_1}\sigma_{X_2} \\
\rho\sigma_{X_1}\sigma_{X_2} & \sigma_{X_2}^2 \end{pmatrix}.
$$
The marginal distribution of X_1 is

$$
f(X_1) = \frac{1}{\sigma_{X_1}\sqrt{2\pi}}
\exp\left\{-\frac{(X_1 - \mu_{X_1})^2}{2\sigma_{X_1}^2}\right\}
$$

The marginal distribution of X_2 is

$$
f(X_2) = \frac{1}{\sigma_{X_2}\sqrt{2\pi}}
\exp\left\{-\frac{(X_2 - \mu_{X_2})^2}{2\sigma_{X_2}^2}\right\}
$$
(2) Expected value of a random vector X: E(X) = (E(X_1), E(X_2), ..., E(X_n))^T.
Linearity: if Y = AX + b, where A (m × n) and b (m × 1) are constants, then
E(Y) = AE(X) + b (since E(b) = b for a constant b).
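A Monte Carlo illustration of this linearity (a sketch assuming NumPy; the particular A, b, and distribution of X are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[1.0, 2.0,  0.0],
              [0.0, 1.0, -1.0]])     # 2 x 3
b = np.array([0.5, -1.0])

# X: 3-dimensional random vector with known mean.
mean_X = np.array([1.0, 2.0, 3.0])
X = rng.normal(loc=mean_X, scale=1.0, size=(100_000, 3))

Y = X @ A.T + b                      # Y = AX + b, sample by sample
print(Y.mean(axis=0))                # approx A @ mean_X + b
print(A @ mean_X + b)                # exact: [5.5, -2.0]
```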
(3) The covariance matrix of a random vector X = (X_1, X_2, ..., X_n)^T is an n × n
matrix with the (i, j)th element defined as
Cov(X_i, X_j) = E[(X_i − E(X_i))(X_j − E(X_j))], that is,

$$
\mathrm{Cov}(\mathbf{X}) = E[(\mathbf{X} - E(\mathbf{X}))(\mathbf{X} - E(\mathbf{X}))^T]
= \begin{pmatrix}
\mathrm{Var}(X_1) & \mathrm{Cov}(X_1, X_2) & \cdots & \mathrm{Cov}(X_1, X_n) \\
\mathrm{Cov}(X_2, X_1) & \mathrm{Var}(X_2) & \cdots & \mathrm{Cov}(X_2, X_n) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}(X_n, X_1) & \mathrm{Cov}(X_n, X_2) & \cdots & \mathrm{Var}(X_n)
\end{pmatrix} \quad (2)
$$
If A is an m × n matrix, the covariance matrix of Y = AX is
Cov(Y) = ACov(X)A^T.
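A quick empirical check of this identity (a sketch assuming NumPy; A and Cov(X) below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

Sigma_X = np.array([[2.0, 0.5],
                    [0.5, 1.0]])
A = np.array([[1.0, -1.0],
              [2.0,  3.0]])

# Draw many samples of X with covariance Sigma_X, then transform.
X = rng.multivariate_normal(mean=[0, 0], cov=Sigma_X, size=200_000)
Y = X @ A.T

print(np.cov(Y, rowvar=False))   # empirical Cov(AX)
print(A @ Sigma_X @ A.T)         # theoretical A Cov(X) A^T
```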
(4) The covariance matrix of two random vectors X = (X_1, ..., X_n)^T and
Y = (Y_1, ..., Y_m)^T is defined as

$$
\mathrm{Cov}(\mathbf{X}, \mathbf{Y}) = E[(\mathbf{X} - E(\mathbf{X}))(\mathbf{Y} - E(\mathbf{Y}))^T]
= \begin{pmatrix}
\mathrm{Cov}(X_1, Y_1) & \mathrm{Cov}(X_1, Y_2) & \cdots & \mathrm{Cov}(X_1, Y_m) \\
\mathrm{Cov}(X_2, Y_1) & \mathrm{Cov}(X_2, Y_2) & \cdots & \mathrm{Cov}(X_2, Y_m) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}(X_n, Y_1) & \mathrm{Cov}(X_n, Y_2) & \cdots & \mathrm{Cov}(X_n, Y_m)
\end{pmatrix} \quad (3)
$$
If A is a p × n matrix and B is a q × m matrix, then
Cov(AX, BY) = ACov(X, Y)B^T.
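The same kind of empirical check works here (a sketch assuming NumPy; the joint covariance of the stacked vector (X, Y) is an arbitrary positive definite choice):

```python
import numpy as np

rng = np.random.default_rng(3)

# Joint covariance of the stacked vector (X, Y), with n = m = 2.
S = np.array([[2.0, 0.3, 0.7, 0.1],
              [0.3, 1.0, 0.2, 0.4],
              [0.7, 0.2, 1.5, 0.0],
              [0.1, 0.4, 0.0, 1.0]])
W = rng.multivariate_normal(np.zeros(4), S, size=300_000)
X, Y = W[:, :2], W[:, 2:]

A = np.array([[1.0, 2.0]])   # p x n = 1 x 2
B = np.array([[0.0, 1.0]])   # q x m = 1 x 2

# Empirical Cov(AX, BY) vs. theoretical A Cov(X, Y) B^T.
AX, BY = X @ A.T, Y @ B.T
print(np.cov(AX.ravel(), BY.ravel())[0, 1])   # approx 0.9
print((A @ S[:2, 2:] @ B.T)[0, 0])            # exactly 0.9
```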
2.4 Multivariate Normal Distribution Theory
(1) Standard multivariate normal distribution: If Z_1, ..., Z_n are independent
N(0, 1), then
Z = (Z_1, ..., Z_n)^T ∼ MVN_n(0, I).
(2) Multivariate normal distribution
Suppose that

$$
\mathbf{X} = A\mathbf{Z} + \boldsymbol{\mu}, \quad (4)
$$

where A is an m × n matrix of constants, µ is a vector of length m, and
Z ∼ MVN_n(0, I). Then we say that X has an MVN_m(µ, AA^T)
distribution. Let Σ = AA^T; when Σ is nonsingular, the density of X is

$$
f_X(X_1, \ldots, X_m) = \frac{1}{(2\pi)^{m/2}|\Sigma|^{1/2}}
\exp\left(-\frac{1}{2}(\mathbf{X} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{X} - \boldsymbol{\mu})\right).
$$
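A simulation sketch of this construction (assuming NumPy; the specific A and µ are arbitrary): drawing Z ∼ MVN_n(0, I) and forming X = AZ + µ should reproduce mean µ and covariance AA^T.

```python
import numpy as np

rng = np.random.default_rng(2)

A  = np.array([[1.0, 0.0],
               [1.0, 2.0]])
mu = np.array([3.0, -1.0])

# X = A Z + mu with Z ~ MVN(0, I).
Z = rng.standard_normal(size=(200_000, 2))
X = Z @ A.T + mu

print(X.mean(axis=0))             # approx mu = [3, -1]
print(np.cov(X, rowvar=False))    # approx A @ A.T = [[1, 1], [1, 5]]
```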
E(X) = E(AZ + µ) has the ith element

$$
E((A\mathbf{Z})_i + \mu_i) = E\Big(\sum_j A_{ij} Z_j\Big) + \mu_i
= \sum_j A_{ij} E(Z_j) + \mu_i = \mu_i
$$

and

$$
\mathrm{Cov}(\mathbf{X}) = E[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T]
= E[(A\mathbf{Z})(A\mathbf{Z})^T]
= E[A\mathbf{Z}\mathbf{Z}^T A^T]
= A\,E[\mathbf{Z}\mathbf{Z}^T]\,A^T
= AIA^T = AA^T.
$$
Example 2.2 Suppose Z_1, Z_2, Z_3 are independent standard normal random variables.
Let X_i = µ + σZ_i. Thus X_1, X_2, X_3 are independent N(µ, σ²) random variables.
Define

$$
\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ X_3 \end{pmatrix}, \quad
\mathbf{Z} = \begin{pmatrix} Z_1 \\ Z_2 \\ Z_3 \end{pmatrix}.
$$
Express X in the form AZ + b for a matrix A and a vector b.
Show that X ∼ MVN_3(µ_X, Σ_X) and identify µ_X and Σ_X.
Let U_1 = (Z_1 − Z_2)/√2, U_2 = (Z_1 + Z_2 − Z_3)/√6, and
U_3 = (Z_1 + Z_2 + Z_3)/3. Show that U = (U_1, U_2, U_3)^T has a multivariate
normal distribution and identify the mean and the variance of U.
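For the U part of this exercise, a numerical aid (a sketch assuming NumPy; B is the matrix implied by the definitions of U_1, U_2, U_3 above, so that U = BZ ∼ MVN_3(0, BB^T)):

```python
import numpy as np

# U = B Z for the linear map implied by U1, U2, U3 above.
B = np.array([[1/np.sqrt(2), -1/np.sqrt(2),  0.0          ],
              [1/np.sqrt(6),  1/np.sqrt(6), -1/np.sqrt(6) ],
              [1/3,           1/3,           1/3          ]])

# Since Z ~ MVN(0, I), U = BZ is multivariate normal with
# mean B @ 0 = 0 and covariance B @ B.T.
print(B @ B.T)
```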
(3) The following hold for a random vector X having a multivariate normal
distribution.
Linear combinations of the components of X are normally distributed.
All subsets of the components of X have a (multivariate) normal distribution.
Zero covariance implies that the corresponding components are
independently distributed.
The conditional distributions of the components are (multivariate) normal.
Partitioning X into X_1 and X_2 with E(X_1) = µ_1, Var(X_1) = Σ_{11}, E(X_2) = µ_2,
Var(X_2) = Σ_{22}, we have

$$
\mathbf{X}_1 \mid \mathbf{X}_2 \sim \mathrm{MVN}(\boldsymbol{\mu}_{1|2}, \Sigma_{1|2}),
$$

where µ_{1|2} = µ_1 + Σ_{12}Σ_{22}^{-1}(X_2 − µ_2) and Σ_{1|2} = Σ_{11} − Σ_{12}Σ_{22}^{-1}Σ_{21}, and

$$
\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}.
$$
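A small sketch of these conditioning formulas (assuming NumPy; the partitioned µ and Σ below are arbitrary illustrations with 1 × 1 blocks):

```python
import numpy as np

# Partitioned mean and covariance of (X1, X2); each block is 1 x 1 here.
S11 = np.array([[2.0]]); S12 = np.array([[0.8]])
S21 = S12.T;             S22 = np.array([[1.0]])
mu1 = np.array([0.0]);   mu2 = np.array([1.0])

x2 = np.array([2.0])     # observed value of X2

# Conditional mean and covariance of X1 | X2 = x2.
S22_inv = np.linalg.inv(S22)
mu_cond = mu1 + S12 @ S22_inv @ (x2 - mu2)
S_cond  = S11 - S12 @ S22_inv @ S21
print(mu_cond, S_cond)   # [0.8] [[1.36]]
```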
Example 2.3 Consider a linear combination a^T X of a multivariate normal random
vector X ∼ MVN_n(µ, Σ) determined by the choice a^T = (1, 0, ..., 0). Because

$$
a^T \mathbf{X} = (1, 0, \ldots, 0) \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} = X_1,
\qquad
a^T \boldsymbol{\mu} = (1, 0, \ldots, 0) \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix} = \mu_1,
$$

and

$$
a^T \Sigma a = (1, 0, \ldots, 0)
\begin{pmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\
\sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn}
\end{pmatrix}
\begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \sigma_{11},
$$

X_1 is distributed as N(µ_1, σ_{11}).
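A numerical illustration of this result (a sketch assuming NumPy; µ and Σ are arbitrary):

```python
import numpy as np

mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 2.0]])

a = np.array([1.0, 0.0, 0.0])

# a^T X picks out X1: its mean is mu_1 and its variance is sigma_11.
print(a @ mu)          # 1.0 = mu_1
print(a @ Sigma @ a)   # 4.0 = sigma_11
```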