Chapter 2 Introduction to Matrices
C. Devon Lin, Queen's University, Sept. 28, 2015

Outline
2.1 A Brief Introduction to Matrices
2.2 Linear Equations and Solutions
2.3 Expected Value and Covariance Matrix of a Random Vector
2.4 Multivariate Normal Distribution Theory

2.1 A Brief Introduction to Matrices

(1) Definition: an $n \times m$ matrix $A = (a_{ij})$ is a two-dimensional array with $n$ rows and $m$ columns whose $(i, j)$th element is $a_{ij}$:
$$
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1m} \\
a_{21} & a_{22} & \cdots & a_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nm}
\end{pmatrix}.
$$

(2) Special types of matrices: square matrix, symmetric matrix, diagonal matrix, identity matrix, idempotent matrix.

(3) Matrix operations: addition, subtraction, multiplication, transpose, partition, inverse.

The inverse of a diagonal matrix $A$ is
$$
A^{-1} = \begin{pmatrix}
a_{11}^{-1} & 0 & \cdots & 0 \\
0 & a_{22}^{-1} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_{nn}^{-1}
\end{pmatrix}.
$$

Orthogonal matrix: an $n \times n$ matrix $Q$ is orthogonal if $Q^T Q = Q Q^T = I_n$, and thus $Q^{-1} = Q^T$. For example,
$$
Q = \begin{pmatrix}
\tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}} \\
\tfrac{1}{\sqrt{3}} & 0 & -\tfrac{2}{\sqrt{6}} \\
\tfrac{1}{\sqrt{3}} & -\tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{6}}
\end{pmatrix},
\qquad
Q^{-1} = Q^T = \begin{pmatrix}
\tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{3}} \\
\tfrac{1}{\sqrt{2}} & 0 & -\tfrac{1}{\sqrt{2}} \\
\tfrac{1}{\sqrt{6}} & -\tfrac{2}{\sqrt{6}} & \tfrac{1}{\sqrt{6}}
\end{pmatrix}.
$$

(4) Properties of matrix operations:
Associativity: $(AB)C = A(BC)$
Right distributivity: $(A + B)C = AC + BC$
Left distributivity: $C(A + B) = CA + CB$
In general, $AB \neq BA$
$(A + B)^T = A^T + B^T$, $(AB)^T = B^T A^T$

(5) Characteristics of a matrix: determinant, rank, trace.

2.2 Linear Equations and Solutions

(1) Linear equations: a set of $r$ linear equations can be represented by $Ax = y$, where $x$ is a vector of $s$ unknowns, $A$ is an $r \times s$ matrix of known coefficients on the $s$ unknowns, and $y$ is an $r \times 1$ vector of known constants. For example,
$$
\begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 3 & 3 & 3 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
=
\begin{pmatrix} 6 \\ 10 \\ 9 \end{pmatrix}.
$$

(2) The solution for $x$ falls into one of three cases: no solution, a unique solution, or infinitely many solutions. The system is consistent (has at least one solution) if and only if $\operatorname{rank}(A) = \operatorname{rank}([A \mid y])$; it has a unique solution if, in addition, $\operatorname{rank}(A) = s$. The sketch below checks these conditions for the example above.
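As a minimal numerical check of the rank conditions (assuming NumPy is available; the matrix and right-hand side are taken from the example above):

```python
import numpy as np

# The example system from Section 2.2: Ax = y.
A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [3., 3., 3.]])
y = np.array([6., 10., 9.])

rank_A = np.linalg.matrix_rank(A)                          # rank of the coefficient matrix
rank_aug = np.linalg.matrix_rank(np.column_stack([A, y]))  # rank of the augmented matrix [A | y]

# Consistent iff rank(A) == rank([A | y]); a unique solution
# additionally requires rank(A) == s, the number of unknowns.
if rank_A < rank_aug:
    # This branch fires here: row 2 of A is twice row 1, but 10 != 2 * 6.
    print("no solution")
elif rank_A == A.shape[1]:
    print("unique solution:", np.linalg.solve(A, y))
else:
    print("infinitely many solutions")
```

For this example, $\operatorname{rank}(A) = 2$ while $\operatorname{rank}([A \mid y]) = 3$, so the system has no solution.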
2.3 Expected Value and Covariance Matrix of a Random Vector

(1) Random vector: a vector of random variables $X = (X_1, X_2, \ldots, X_n)^T$, also known as a multivariate random variable. The distribution of each random variable $X_i$ is called a marginal distribution, while the distribution of the random vector $X$ is called a joint probability distribution or a multivariate distribution.

Example 2.1. Suppose $X_1 \sim N(\mu_{X_1}, \sigma_{X_1}^2)$ and $X_2 \sim N(\mu_{X_2}, \sigma_{X_2}^2)$. The joint probability distribution of $(X_1, X_2)^T$ is
$$
f(X_1, X_2) = \frac{1}{2\pi \sigma_{X_1} \sigma_{X_2} \sqrt{1 - \rho^2}}
\exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \frac{(X_1 - \mu_{X_1})^2}{\sigma_{X_1}^2} + \frac{(X_2 - \mu_{X_2})^2}{\sigma_{X_2}^2} - \frac{2\rho (X_1 - \mu_{X_1})(X_2 - \mu_{X_2})}{\sigma_{X_1} \sigma_{X_2}} \right] \right\}. \tag{1}
$$
We say
$$
\begin{pmatrix} X_1 \\ X_2 \end{pmatrix}
\sim N\left( \begin{pmatrix} \mu_{X_1} \\ \mu_{X_2} \end{pmatrix},
\begin{pmatrix} \sigma_{X_1}^2 & \rho \sigma_{X_1} \sigma_{X_2} \\ \rho \sigma_{X_1} \sigma_{X_2} & \sigma_{X_2}^2 \end{pmatrix} \right).
$$

In matrix form, the joint distribution of $X_1$ and $X_2$ is given by
$$
f_X(X_1, X_2) = \frac{1}{(2\pi)^{2/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (X - \mu)^T \Sigma^{-1} (X - \mu) \right),
$$
where
$$
X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}, \quad
\mu = \begin{pmatrix} \mu_{X_1} \\ \mu_{X_2} \end{pmatrix}, \quad
\Sigma = \begin{pmatrix} \sigma_{X_1}^2 & \rho \sigma_{X_1} \sigma_{X_2} \\ \rho \sigma_{X_1} \sigma_{X_2} & \sigma_{X_2}^2 \end{pmatrix}.
$$

The marginal distribution of $X_1$ is
$$
f(X_1) = \frac{1}{\sigma_{X_1} \sqrt{2\pi}} \exp\left\{ -\frac{(X_1 - \mu_{X_1})^2}{2\sigma_{X_1}^2} \right\},
$$
and the marginal distribution of $X_2$ is
$$
f(X_2) = \frac{1}{\sigma_{X_2} \sqrt{2\pi}} \exp\left\{ -\frac{(X_2 - \mu_{X_2})^2}{2\sigma_{X_2}^2} \right\}.
$$

(2) Expected value of a random vector $X$: $E(X) = (E(X_1), E(X_2), \ldots, E(X_n))^T$.
Linearity: if $Y = AX + b$, where $A_{m \times n}$ and $b_{m \times 1}$ are constants, then $E(Y) = A\,E(X) + E(b) = A\,E(X) + b$.

(3) The covariance matrix of a random vector $X = (X_1, X_2, \ldots, X_n)^T$ is the $n \times n$ matrix whose $(i, j)$th element is $\operatorname{Cov}(X_i, X_j) = E[(X_i - E(X_i))(X_j - E(X_j))]$; that is,
$$
\operatorname{Cov}(X) = E[(X - E(X))(X - E(X))^T] =
\begin{pmatrix}
\operatorname{Var}(X_1) & \operatorname{Cov}(X_1, X_2) & \cdots & \operatorname{Cov}(X_1, X_n) \\
\operatorname{Cov}(X_2, X_1) & \operatorname{Var}(X_2) & \cdots & \operatorname{Cov}(X_2, X_n) \\
\vdots & \vdots & \ddots & \vdots \\
\operatorname{Cov}(X_n, X_1) & \operatorname{Cov}(X_n, X_2) & \cdots & \operatorname{Var}(X_n)
\end{pmatrix}. \tag{2}
$$
If $A$ is an $m \times n$ matrix, the covariance matrix of $Y = AX$ is $\operatorname{Cov}(Y) = A\operatorname{Cov}(X)A^T$.

(4) The covariance matrix of two random vectors $X = (X_1, \ldots, X_n)^T$ and $Y = (Y_1, \ldots, Y_m)^T$ is defined as
$$
\operatorname{Cov}(X, Y) = E[(X - E(X))(Y - E(Y))^T] =
\begin{pmatrix}
\operatorname{Cov}(X_1, Y_1) & \operatorname{Cov}(X_1, Y_2) & \cdots & \operatorname{Cov}(X_1, Y_m) \\
\operatorname{Cov}(X_2, Y_1) & \operatorname{Cov}(X_2, Y_2) & \cdots & \operatorname{Cov}(X_2, Y_m) \\
\vdots & \vdots & \ddots & \vdots \\
\operatorname{Cov}(X_n, Y_1) & \operatorname{Cov}(X_n, Y_2) & \cdots & \operatorname{Cov}(X_n, Y_m)
\end{pmatrix}. \tag{3}
$$
If $A$ is a $p \times n$ matrix and $B$ is a $q \times m$ matrix, then $\operatorname{Cov}(AX, BY) = A\operatorname{Cov}(X, Y)B^T$.

2.4 Multivariate Normal Distribution Theory

(1) Standard multivariate normal distribution: if $Z_1, \ldots, Z_n$ are independent $N(0, 1)$ random variables, then $Z = (Z_1, \ldots, Z_n)^T \sim MVN_n(0, I)$.

(2) Multivariate normal distribution. Suppose that
$$
X = AZ + \mu, \tag{4}
$$
where $A$ is an $m \times n$ matrix of constants, $\mu$ is a vector of length $m$, and $Z \sim MVN_n(0, I)$. Then we say that $X$ has an $MVN_m(\mu, AA^T)$ distribution. Let $\Sigma = AA^T$. Provided $\Sigma$ is nonsingular, the density is
$$
f_X(X_1, \ldots, X_m) = \frac{1}{(2\pi)^{m/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (X - \mu)^T \Sigma^{-1} (X - \mu) \right).
$$

The mean follows from linearity: $E(X) = E(AZ + \mu)$ has $i$th element
$$
E\big( (AZ)_i + \mu_i \big) = E\Big( \sum_j A_{ij} Z_j \Big) + \mu_i = \sum_j A_{ij} E(Z_j) + \mu_i = \mu_i,
$$
so $E(X) = \mu$. The covariance matrix is
$$
\operatorname{Cov}(X) = E[(X - \mu)(X - \mu)^T] = E[(AZ)(AZ)^T] = E[A Z Z^T A^T] = A\,E[Z Z^T]\,A^T = A I A^T = AA^T.
$$

Example 2.2. Suppose $Z_1, Z_2, Z_3$ are independent standard normal random variables, and let $X_i = \mu + \sigma Z_i$, so that $X_1, X_2, X_3$ are independent $N(\mu, \sigma^2)$ random variables. Define
$$
X = \begin{pmatrix} X_1 \\ X_2 \\ X_3 \end{pmatrix}, \qquad Z = \begin{pmatrix} Z_1 \\ Z_2 \\ Z_3 \end{pmatrix}.
$$
Express $X$ in the form $AZ + b$ for a matrix $A$ and a vector $b$. Show that $X \sim MVN_3(\mu_X, \Sigma_X)$ and identify $\mu_X$ and $\Sigma_X$.

Let $U_1 = (Z_1 - Z_2)/\sqrt{2}$, $U_2 = (Z_1 + Z_2 - Z_3)/\sqrt{6}$, and $U_3 = (Z_1 + Z_2 + Z_3)/3$. Show that $U = (U_1, U_2, U_3)^T$ has a multivariate normal distribution and identify the mean and the variance of $U$. (A numerical sketch for this example appears at the end of the chapter.)

(3) The following hold for a random vector $X$ having a multivariate normal distribution.
Linear combinations of the components of $X$ are normally distributed.
All subsets of the components of $X$ have a (multivariate) normal distribution.
Zero covariance implies that the corresponding components are independently distributed.
The conditional distributions of the components are (multivariate) normal: partition $X$ into $X_1$ and $X_2$ with $E(X_1) = \mu_1$, $\operatorname{Var}(X_1) = \Sigma_{11}$, $E(X_2) = \mu_2$, and $\operatorname{Var}(X_2) = \Sigma_{22}$. Then
$$
X_1 \mid X_2 \sim MVN(\mu_{1|2}, \Sigma_{1|2}),
$$
where $\mu_{1|2} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (X_2 - \mu_2)$ and $\Sigma_{1|2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}$, with
$$
\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}.
$$

Example 2.3. Consider a linear combination $a^T X$ of a multivariate normal random vector $X \sim MVN_n(\mu, \Sigma)$ determined by the choice $a^T = (1, 0, \ldots, 0)$. Because
$$
a^T X = (1, 0, \ldots, 0) \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} = X_1,
\qquad
a^T \mu = (1, 0, \ldots, 0) \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix} = \mu_1,
$$
and
$$
a^T \Sigma a = (1, 0, \ldots, 0)
\begin{pmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\
\sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn}
\end{pmatrix}
\begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}
= \sigma_{11},
$$
$X_1$ is distributed as $N(\mu_1, \sigma_{11})$, where $\sigma_{11} = \operatorname{Var}(X_1)$.
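To illustrate the construction in (4) numerically, here is a minimal simulation sketch, assuming NumPy; the particular $A$ and $\mu$ below are hypothetical values chosen only for illustration. The sample mean and sample covariance of many draws of $X = AZ + \mu$ should be close to $\mu$ and $AA^T$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical A and mu, purely for illustration of X = A Z + mu.
A = np.array([[2., 0., 0.],
              [1., 1., 0.],
              [0., 1., 3.]])
mu = np.array([1., -2., 0.5])

Z = rng.standard_normal((100_000, 3))  # each row is a draw of Z ~ MVN_3(0, I)
X = Z @ A.T + mu                       # apply X = A Z + mu to every draw

print("sample mean (should approximate mu):", X.mean(axis=0))
print("sample covariance (should approximate A A^T):")
print(np.cov(X, rowvar=False))
print("theoretical A A^T:")
print(A @ A.T)
```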
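For the transformation $U$ in Example 2.2, the matrix $A$ with $U = AZ$ can be read off the definitions of $U_1$, $U_2$, $U_3$, and the result then follows from $U \sim MVN_3(0, AA^T)$. A small sketch computing $AA^T$ (again assuming NumPy):

```python
import numpy as np

s2, s6 = np.sqrt(2.0), np.sqrt(6.0)

# Rows of A read off U1 = (Z1 - Z2)/sqrt(2), U2 = (Z1 + Z2 - Z3)/sqrt(6),
# U3 = (Z1 + Z2 + Z3)/3 (as given in Example 2.2), so that U = A Z.
A = np.array([[1/s2, -1/s2,  0.0 ],
              [1/s6,  1/s6, -1/s6],
              [1/3.,  1/3.,  1/3.]])

# Since Z ~ MVN_3(0, I), we get E(U) = 0 and Cov(U) = A A^T.
Sigma_U = A @ A.T
print(np.round(Sigma_U, 4))
# Diagonal: Var(U1) = 1, Var(U2) = 1/2, Var(U3) = 1/3.
# Off-diagonal: Cov(U1, U2) = Cov(U1, U3) = 0, but Cov(U2, U3) = 1/(3*sqrt(6)) ~= 0.136.
```

By the zero-covariance property in (3), $U_1$ is therefore independent of $U_2$ and $U_3$, while $U_2$ and $U_3$ are not independent.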