Chapter 5: The multivariate normal distribution
Thommy Perlinger, Probability Theory

Linear transformations

A transformation is said to be linear if every coordinate function of the transformation is a linear combination of the input variables. When dealing with linear transformations it is convenient to use matrix notation. A linear transformation can then be written as $Y = BX + b$, where $B$ is a constant $m \times n$ matrix and $b$ is a constant $m$-vector.

The mean vector and the covariance matrix

Definition 2.1. Let $X$ be a random $n$-vector whose components have finite variance. The mean vector of $X$ is $\mu = E(X)$, and the covariance matrix of $X$ is
$$\Lambda = E\left[(X - \mu)(X - \mu)'\right].$$
When dealing with random vectors and matrices, expectations are taken componentwise, which means that $\mu_i = E(X_i)$; that is, the elements of the mean vector are the means of the components of $X$. For the covariance matrix of $X$ it follows that $\Lambda_{ij} = \mathrm{Cov}(X_i, X_j)$, so the diagonal elements are the variances of the components and the off-diagonal elements are the pairwise covariances.

Expectations for linear transformations

Theorem 2.2. Let $X$ be a random $n$-vector with mean vector $\mu$ and covariance matrix $\Lambda$. Further, let $B$ be a constant $m \times n$ matrix, let $b$ be a constant $m$-vector, and set $Y = BX + b$. Then
$$E(Y) = B\mu + b \quad \text{and} \quad \mathrm{Cov}(Y) = B\Lambda B'.$$

Proof (the covariance matrix). Because multiplicative constant matrices can be moved outside of the expectation, it follows that
$$\mathrm{Cov}(Y) = E\left[(Y - E(Y))(Y - E(Y))'\right] = E\left[B(X - \mu)(X - \mu)'B'\right] = B\,E\left[(X - \mu)(X - \mu)'\right]B' = B\Lambda B'.$$

The multivariate normal distribution: Definition I

Definition I. The random $n$-vector $X$ is normal iff, for every $n$-vector $a$, the linear combination $a'X$ is (univariate) normal.

Notation. The notation $X \in N(\mu, \Lambda)$ is used to denote that $X$ has a multivariate normal distribution with mean vector $\mu$ and covariance matrix $\Lambda$.

Theorem 3.1. Let $X \in N(\mu, \Lambda)$ and set $Y = BX + b$. Then $Y \in N(B\mu + b,\, B\Lambda B')$.

Proof. The correctness of the mean vector and the covariance matrix follows directly from Theorem 2.2. Next we prove that $Y$ is normal by showing that every linear combination of $Y$ is another linear combination of $X$: for any vector $a$,
$$a'Y = a'(BX + b) = (B'a)'X + a'b,$$
which is a linear combination of $X$ plus a constant and hence univariate normal by Definition I.

Exercise 5.3.2. Let $X = (X_1, X_2)'$ be a normal random vector with a given mean vector $\mu$ and covariance matrix $\Lambda$. What is the joint distribution of $Y_1 = X_1 + X_2$ and $Y_2 = 2X_1 - 3X_2$? Since $Y = BX$ with
$$B = \begin{pmatrix} 1 & 1 \\ 2 & -3 \end{pmatrix},$$
it follows from Theorem 3.1 that $Y \in N(B\mu,\, B\Lambda B')$.

The multivariate normal distribution: Definition II (transforms)

The moment generating function of a random vector $X$ is given by $\psi_X(t) = E\left[e^{t'X}\right]$.

Definition II. The random vector $X$ is normal, $N(\mu, \Lambda)$, iff its moment generating function is of the form
$$\psi_X(t) = \exp\left(t'\mu + \tfrac{1}{2}\,t'\Lambda t\right).$$

Theorem 4.2. Definition I and Definition II are equivalent.

The meaning. If every linear combination of $X$ is univariate normal, then the moment generating function of $X$ is of the form given above. If, on the other hand, the moment generating function of $X$ is of the form given above, then every linear combination of $X$ is univariate normal.

Proof of Theorem 4.2: Definition I implies Definition II. Let $X$ be $N(\mu, \Lambda)$ by Definition I. The mgf of $X$ is given by $\psi_X(t) = E\left[e^{t'X}\right]$, and since $Y = t'X$ is a linear combination of $X$, it follows from Definition I that $Y$ is (univariate) normal and therefore has a moment generating function. Furthermore, it follows from Theorem 2.2 that $E(Y) = t'\mu$ and $\mathrm{Var}(Y) = t'\Lambda t$. Hence
$$\psi_X(t) = E\left[e^{Y}\right] = \psi_Y(1) = \exp\left(E(Y) + \tfrac{1}{2}\mathrm{Var}(Y)\right) = \exp\left(t'\mu + \tfrac{1}{2}\,t'\Lambda t\right),$$
and the first part of the proof is established.
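As a numerical illustration of Theorems 2.2 and 3.1, the following NumPy sketch simulates $X \in N(\mu, \Lambda)$, forms $Y = BX + b$, and compares the sample mean vector and covariance matrix of $Y$ with $B\mu + b$ and $B\Lambda B'$. Here $B$ is the matrix from Exercise 5.3.2, while the numerical values of $\mu$, $\Lambda$, and $b$ are arbitrary choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, 2.0])                  # illustrative mean vector
Lam = np.array([[2.0, 0.5],
                [0.5, 1.0]])               # illustrative covariance matrix
B = np.array([[1.0, 1.0],
              [2.0, -3.0]])                # B from Exercise 5.3.2
b = np.array([1.0, -1.0])                  # illustrative constant vector

X = rng.multivariate_normal(mu, Lam, size=200_000)  # rows are draws of X
Y = X @ B.T + b                                     # Y = BX + b, row-wise

print(Y.mean(axis=0), B @ mu + b)                   # sample mean vs. B mu + b
print(np.cov(Y, rowvar=False))                      # sample covariance ...
print(B @ Lam @ B.T)                                # ... vs. B Lambda B'
```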
Properties of symmetric matrices

Definition. A symmetric matrix $A$ is said to be positive-definite if the quadratic form $x'Ax$ is positive for all $x \neq 0$. If the quadratic form is non-negative for all $x$, then $A$ is said to be nonnegative-definite (or positive-semidefinite).

Theorem 2.1. Every covariance matrix $\Lambda$ is nonnegative-definite.

Proof. Let $X$ be a random vector whose covariance matrix is $\Lambda$, and study the linear combination $y'X$. By Theorem 2.2,
$$0 \le \mathrm{Var}(y'X) = y'\Lambda y \quad \text{for every } y,$$
and the theorem is proved.

Orthogonal matrices. A square matrix $C$ is an orthogonal matrix if $C'C = I$, where $I$ is the identity matrix. It follows that the rows (and columns) of an orthogonal matrix are orthonormal; that is, they all have unit length and they are pairwise orthogonal.

Diagonal matrices. A symmetric matrix $D$ is a diagonal matrix if the diagonal elements are the only non-zero elements of $D$.

Diagonalization. Let $A$ be a symmetric matrix. Then there exists an orthogonal matrix $C$ and a diagonal matrix $D$ such that $A = CDC'$. Furthermore, the diagonal elements of $D$ are the eigenvalues of $A$.

The square root. Let $A$ be a nonnegative-definite symmetric matrix. The square root of $A$ is a matrix, usually denoted $A^{1/2}$, such that $A^{1/2}A^{1/2} = A$. It follows from the diagonalization of $A$ that $A^{1/2} = CD^{1/2}C'$.

Proof of Theorem 4.2: Definition II implies Definition I. Let $Y_1, \dots, Y_n$ be independent $N(0,1)$; that is, $Y = (Y_1, \dots, Y_n)'$ is $N(0, I)$ by Definition I. The moment generating function of $Y$ is given by
$$\psi_Y(t) = \prod_{i=1}^{n} E\left[e^{t_i Y_i}\right] = \prod_{i=1}^{n} e^{t_i^2/2} = \exp\left(\tfrac{1}{2}\,t't\right).$$
Next we let $X = \Lambda^{1/2}Y + \mu$, and since this is a linear transformation of $Y$, it follows from Theorem 2.2 that $E(X) = \mu$ and $\mathrm{Cov}(X) = \Lambda^{1/2} I \Lambda^{1/2} = \Lambda$. The moment generating function of $X$ is given by
$$\psi_X(t) = E\left[e^{t'X}\right] = e^{t'\mu}\,E\left[e^{(\Lambda^{1/2}t)'Y}\right] = e^{t'\mu}\,\psi_Y(\Lambda^{1/2}t) = \exp\left(t'\mu + \tfrac{1}{2}\,t'\Lambda t\right),$$
which is the mgf given in Definition II. Since $X = \Lambda^{1/2}Y + \mu$, it is clear that any linear combination of $X$ is another linear combination of $Y$, which means that $X$ is normal, $N(\mu, \Lambda)$, according to Definition I. By the uniqueness theorem for moment generating functions, any random vector whose mgf is of the form in Definition II is therefore distributed as $X$, and hence normal in the sense of Definition I.

Problem 5.10.30 (part 1). Let $X_1$, $X_2$, and $X_3$ have a given joint moment generating function. Since the mgf is of the form in Definition II, it follows that $X_1$, $X_2$, and $X_3$ are jointly normal, with mean vector $\mu$ and covariance matrix $\Lambda$ read off from the mgf. Find the joint distribution of $Y_1 = X_1 + X_3$ and $Y_2 = X_1 + X_2$; that is, the distribution of the linear transformation $Y = BX$, where
$$B = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}.$$
It follows from Theorem 3.1 that $Y \in N(B\mu,\, B\Lambda B')$; with the numbers of the problem this gives $Y \in N(\mu_Y, \Lambda_Y)$, where
$$\Lambda_Y = \begin{pmatrix} 4 & 2 \\ 2 & 10 \end{pmatrix}.$$

Important properties of determinants

1. A square matrix $A$ is invertible iff $\det A \neq 0$.
2. For the identity matrix $I$ we have that $\det I = 1$.
3. For the transpose of $A$ we have that $\det A = \det A'$.
4. Let $A$ and $B$ be square matrices of the same order. Then $\det AB = \det A \cdot \det B$.
5. Results 2 and 4 now imply that $\det A^{-1} = (\det A)^{-1}$.
6. Let $C$ be an orthogonal matrix. Results 2, 3, and 4 now imply that $\det C = \pm 1$.
7. Since a symmetric matrix $A$ can be diagonalized as $A = CDC'$, it follows by results 4 and 6 that $\det A = \det D = \lambda_1 \lambda_2 \cdots \lambda_n$, where $\lambda_1, \lambda_2, \dots, \lambda_n$ are the eigenvalues of $A$.

The multivariate normal distribution: Definition III (the density function)

Definition III. The random vector $X$ is normal, $N(\mu, \Lambda)$ (where $\det \Lambda > 0$), iff its density function is of the form
$$f_X(x) = \frac{1}{(2\pi)^{n/2}\sqrt{\det \Lambda}} \exp\left(-\tfrac{1}{2}(x - \mu)'\Lambda^{-1}(x - \mu)\right).$$

Theorem 5.2. Definitions I, II, and III are equivalent (in the nonsingular case).

Idea for the proof. First we find a normal random vector $Y$ whose density function is easy to derive. Then a suitably defined linear transformation $X = BY$ will be $N(\mu, \Lambda)$. Finally the transformation theorem (Theorem 1.2.1) will give us the density function of $X$; the steps are carried out after the sketch below.
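Before working through the steps, here is a numerical sketch of the machinery the proof idea relies on: diagonalize $\Lambda = CDC'$, form the square root $\Lambda^{1/2} = CD^{1/2}C'$, and map i.i.d. standard normals through $X = \Lambda^{1/2}Y + \mu$; the sample covariance of the result then matches $\Lambda$, as Theorem 2.2 predicts. The numerical values of $\mu$ and $\Lambda$ are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

mu = np.array([0.0, 1.0, -1.0])            # illustrative mean vector
Lam = np.array([[4.0, 1.0, 0.0],
                [1.0, 3.0, 1.0],
                [0.0, 1.0, 2.0]])          # illustrative covariance matrix

eigvals, C = np.linalg.eigh(Lam)                 # Lambda = C D C', C orthogonal
Lam_sqrt = C @ np.diag(np.sqrt(eigvals)) @ C.T   # Lambda^{1/2} = C D^{1/2} C'
print(np.allclose(Lam_sqrt @ Lam_sqrt, Lam))     # True: it squares to Lambda

Y = rng.standard_normal((500_000, 3))      # rows are draws of Y in N(0, I)
X = Y @ Lam_sqrt + mu                      # X = Lambda^{1/2} Y + mu (Lam_sqrt is symmetric)
print(np.cov(X, rowvar=False))             # ~ Lambda
```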
Proof of Theorem 5.2

Step 1. Find a normal random vector $Y$ whose density function is easy to derive. Let $Y_1, \dots, Y_n$ be independent $N(0,1)$. Then, by Definition I, $Y = (Y_1, \dots, Y_n)'$ is $N(0, I)$. The density function of $Y$ is given by
$$f_Y(y) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}}\,e^{-y_i^2/2} = (2\pi)^{-n/2} \exp\left(-\tfrac{1}{2}\,y'y\right).$$

Step 2. We know from before that $X = \Lambda^{1/2}Y + \mu$ is $N(\mu, \Lambda)$.

Step 3. Find the density function of $X$; recall Theorem 1.2.1 (the transformation theorem).

Step 3.1. Inversion yields $Y = \Lambda^{-1/2}(X - \mu)$.

Step 3.2. Since it is a linear transformation, the Jacobian becomes $J = \det \Lambda^{-1/2} = (\det \Lambda)^{-1/2}$.

Step 3.3. Finally, it follows from Theorem 1.2.1 that
$$f_X(x) = f_Y\left(\Lambda^{-1/2}(x - \mu)\right)|J| = (2\pi)^{-n/2}(\det \Lambda)^{-1/2} \exp\left(-\tfrac{1}{2}(x - \mu)'\Lambda^{-1}(x - \mu)\right).$$

Problem 5.10.30 (part 2). In the first part of the problem we found that $Y \in N(\mu_Y, \Lambda_Y)$. Since $\det \Lambda_Y = 4 \cdot 10 - 2 \cdot 2 = 36$ and
$$\Lambda_Y^{-1} = \frac{1}{36}\begin{pmatrix} 10 & -2 \\ -2 & 4 \end{pmatrix},$$
it follows from Definition III that the density of $Y$ is given by
$$f_Y(y) = \frac{1}{2\pi \cdot 6} \exp\left(-\tfrac{1}{2}(y - \mu_Y)'\Lambda_Y^{-1}(y - \mu_Y)\right).$$

Conditional distributions

General situation. Let $X$ be $N(\mu, \Lambda)$ with $\det \Lambda > 0$. Furthermore, let $X_1$ and $X_2$ be subvectors of $X$, where the components of $X_1$ and $X_2$ are assumed to be different. By definition,
$$f_{X_2 \mid X_1 = x_1}(x_2) = \frac{f_X(x)}{f_{X_1}(x_1)}.$$
Can anything be said about the distribution of $X_2 \mid X_1 = x_1$? Answer: YES! Conditional distributions of multivariate normal distributions are normal. In the bivariate case, completing the square in the exponent shows that $X_2 \mid X_1 = x_1$ is normal with mean $\mu_2 + (\sigma_{12}/\sigma_{11})(x_1 - \mu_1)$ and variance $\sigma_{22} - \sigma_{12}^2/\sigma_{11}$.

Problem 5.10.30 (part 3). Find the conditional density of $Y_1$ given that $Y_2 = 1$; that is, find $f_{Y_1 \mid Y_2 = 1}(y_1)$. Since
$$f_{Y_1 \mid Y_2 = 1}(y_1) = \frac{f_Y(y_1, 1)}{f_{Y_2}(1)},$$
completing the square (the bivariate formula above with the roles of the components interchanged) shows that the conditional distribution is normal with variance $4 - 2^2/10 = 18/5$ and, inserting the mean vector from part 1, mean $4/5$. Hence, the conditional distribution of $Y_1$ given that $Y_2 = 1$ is $N(4/5, 18/5)$.

Independence

Natural question 1. Is there an easy way to determine whether the components of a normal random vector are independent?

Theorem 7.1. Let $X$ be a normal random vector. The components of $X$ are independent iff they are uncorrelated.

Proof. It suffices to show that uncorrelated components imply independence. Since the components are uncorrelated, $\Lambda$ is diagonal, so that
$$t'\Lambda t = \sum_{i=1}^{n} \lambda_{ii}\,t_i^2.$$
It follows that the moment generating function factorizes,
$$\psi_X(t) = \exp\left(t'\mu + \tfrac{1}{2}\,t'\Lambda t\right) = \prod_{i=1}^{n} \exp\left(\mu_i t_i + \tfrac{1}{2}\lambda_{ii} t_i^2\right),$$
into the product of the marginal mgfs, which means that the components of $X$ are independent.

Problem 5.10.10. Suppose that the moment generating function of $(X,Y)'$ is given and contains a parameter $a$. Determine $a$ so that $U = X + 2Y$ and $V = 2X - Y$ become independent. Since the mgf is of the form in Definition II, it follows that $(X,Y)'$ is bivariate normal, with mean vector and covariance matrix read off from the mgf. Since $(U,V)'$ is a linear transformation of $(X,Y)'$, it is clear that $(U,V)'$ is also bivariate normal. The covariance matrix of $(U,V)'$ is given by $B\Lambda B'$ with
$$B = \begin{pmatrix} 1 & 2 \\ 2 & -1 \end{pmatrix}.$$
It is, however, by Theorem 7.1, enough to determine an off-diagonal element:
$$\mathrm{Cov}(U,V) = \mathrm{Cov}(X + 2Y,\, 2X - Y) = 2\,\mathrm{Var}(X) + 3\,\mathrm{Cov}(X,Y) - 2\,\mathrm{Var}(Y).$$
Setting this expression to zero, with the moments read off from the given mgf, it is thus clear that only for $a = 4/3$ will $U$ and $V$ be independent.

Independence and linear transformations

Natural question 2. A linear transformation of a normal random vector is itself normal. Is it always possible to find a linear transformation that has uncorrelated, and hence independent, components?

Theorem 8.1. Let $X$ be $N(\mu, \Lambda)$. Furthermore, let $C$ be the orthogonal matrix that diagonalizes $\Lambda$; that is, $C'\Lambda C = D$, where the diagonal elements of $D$ are the eigenvalues of $\Lambda$. Then $Y = C'X$ is $N(C'\mu, D)$.

Theorem 8.2. Let $X$ be $N(\mu, \sigma^2 I)$. Furthermore, let $C$ be an arbitrary orthogonal matrix. Then $Y = C'X$ is $N(C'\mu, \sigma^2 I)$.

Conclusion. For the general $N(\mu, \Lambda)$ there always exists an orthogonal transformation that yields a normal random vector with independent components. For the special case $N(\mu, \sigma^2 I)$, any orthogonal transformation will produce a normal random vector with independent components.

Problem 5.10.9 b. Let $X$ and $Y$ be independent $N(0, \sigma^2)$. Show that $X + Y$ and $X - Y$ are independent normal random variables. Since $X$ and $Y$ are independent, we have that $(X,Y)'$ is bivariate normal, $N(0, \sigma^2 I)$. Furthermore,
$$\begin{pmatrix} X + Y \\ X - Y \end{pmatrix} = \sqrt{2}\,C\begin{pmatrix} X \\ Y \end{pmatrix} \quad \text{with} \quad C = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},$$
and because $C$ is orthogonal (and multiplication by the constant $\sqrt{2}$ preserves independence), it follows from Theorem 8.2 that the components of $(X + Y, X - Y)'$ are independent normal random variables.
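The following sketch illustrates Theorems 7.1 and 8.1 numerically, reusing as an example the covariance matrix $\Lambda_Y$ obtained in Problem 5.10.30 (the mean vector is an arbitrary choice): the orthogonal matrix $C$ that diagonalizes $\Lambda$ turns $X$ into $Y = C'X$ with uncorrelated, and hence independent, components.

```python
import numpy as np

rng = np.random.default_rng(2)

mu = np.array([1.0, 2.0])                  # illustrative mean vector
Lam = np.array([[4.0, 2.0],
                [2.0, 10.0]])              # Lambda_Y from Problem 5.10.30

eigvals, C = np.linalg.eigh(Lam)           # C' Lam C = D = diag(eigvals)
X = rng.multivariate_normal(mu, Lam, size=300_000)
Y = X @ C                                  # row-wise Y = C'X

print(np.cov(Y, rowvar=False))             # ~ diag(eigvals): off-diagonals vanish,
print(eigvals)                             # so the components are independent (Thm 7.1)
```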
Problem 5.10.37. Let
$$(X, Y)' \in N(0, \Lambda) \quad \text{with} \quad \Lambda = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix},$$
where $\rho$ is the correlation coefficient. Determine the probability distribution of
$$W = \frac{X^2 - 2\rho XY + Y^2}{1 - \rho^2}.$$
The moment generating function of $W$ is defined by $\psi_W(t) = E\left[e^{tW}\right]$, and in order to find it we first have to find the joint density of $X$ and $Y$. Since $\det \Lambda = 1 - \rho^2$ and
$$\Lambda^{-1} = \frac{1}{1 - \rho^2}\begin{pmatrix} 1 & -\rho \\ -\rho & 1 \end{pmatrix},$$
it follows that the joint density function of $X$ and $Y$ is given by
$$f_{X,Y}(x,y) = \frac{1}{2\pi\sqrt{1 - \rho^2}} \exp\left(-\frac{x^2 - 2\rho xy + y^2}{2(1 - \rho^2)}\right).$$
It follows from the density of $(X,Y)'$ that the main part of the expression for the moment generating function of $W$ is the quadratic form
$$Q = \frac{x^2 - 2\rho xy + y^2}{1 - \rho^2},$$
since
$$\psi_W(t) = \iint \frac{1}{2\pi\sqrt{1 - \rho^2}} \exp\left(tQ - \tfrac{1}{2}Q\right) dx\,dy = \iint \frac{1}{2\pi\sqrt{1 - \rho^2}} \exp\left(-\tfrac{1}{2}(1 - 2t)\,Q\right) dx\,dy.$$
Since $(1 - 2t)\,Q/2$ is the main part of a multivariate normal density function with covariance matrix $\Lambda/(1 - 2t)$, whose normalizing constant is
$$2\pi\sqrt{\det\left(\frac{\Lambda}{1 - 2t}\right)} = \frac{2\pi\sqrt{1 - \rho^2}}{1 - 2t},$$
it follows that the moment generating function of $W$ is given by
$$\psi_W(t) = \frac{1}{1 - 2t}, \qquad t < \tfrac{1}{2},$$
and it is clear that $W \in \chi^2(2)$.

The multivariate normal distribution and the chi-square distribution

Theorem 9.1. Let $X$ be $N(\mu, \Lambda)$ with $\det \Lambda > 0$. Then
$$(X - \mu)'\Lambda^{-1}(X - \mu) \in \chi^2(n),$$
where $n$ is the dimension of $X$.

Proof. Set $Y = \Lambda^{-1/2}(X - \mu)$. Then $Y$ is $N(0, I)$, and it follows that
$$(X - \mu)'\Lambda^{-1}(X - \mu) = Y'Y = \sum_{i=1}^{n} Y_i^2,$$
and since $Y_1, Y_2, \dots, Y_n$ are i.i.d. $N(0,1)$, it is clear that $Y'Y \in \chi^2(n)$.
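A simulation sketch of Theorem 9.1 (the values of $\mu$ and $\Lambda$ are arbitrary illustrative choices, with $n = 3$): the quadratic form $(X - \mu)'\Lambda^{-1}(X - \mu)$ computed from draws of $X \in N(\mu, \Lambda)$ should behave like a $\chi^2(3)$ sample.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

mu = np.array([1.0, 0.0, -1.0])            # illustrative mean vector
Lam = np.array([[2.0, 0.5, 0.0],
                [0.5, 1.0, 0.3],
                [0.0, 0.3, 1.5]])          # illustrative covariance matrix
n = len(mu)

X = rng.multivariate_normal(mu, Lam, size=200_000)
Z = X - mu
W = np.einsum('ij,jk,ik->i', Z, np.linalg.inv(Lam), Z)  # (X-mu)' Lam^{-1} (X-mu), per row

print(W.mean())                               # ~ n = 3, the mean of chi^2(3)
print(stats.kstest(W, stats.chi2(df=n).cdf))  # small KS statistic: fits chi^2(3)
```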