1.12 Multivariate Random Variables

We will be using matrix notation to denote multivariate rvs and their distributions. Denote by $X = (X_1, \dots, X_n)^T$ an $n$-dimensional random vector whose components are random variables. Then all the definitions given for bivariate rvs extend to the multivariate case. For example, if $X$ is continuous, then we may write
\[
F_X(x_1, \dots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f_X(x_1, \dots, x_n)\, dx_1 \dots dx_n
\]
and
\[
P(X \in A) = \int \cdots \int_A f_X(x_1, \dots, x_n)\, dx_1 \dots dx_n,
\]
where $A \subseteq \mathcal{X}$ and $\mathcal{X} \subseteq \mathbb{R}^n$ is the support of $f_X$.

Example 1.35. Let $X = (X_1, X_2, X_3, X_4)^T$ be a four-dimensional random vector with the joint pdf
\[
f_X(x_1, x_2, x_3, x_4) = \frac{3}{4}(x_1^2 + x_2^2 + x_3^2 + x_4^2)\, I_{\mathcal{X}},
\]
where $\mathcal{X} = \{(x_1, x_2, x_3, x_4) \in \mathbb{R}^4 : 0 < x_i < 1,\ i = 1, 2, 3, 4\}$. Calculate:
1. the marginal pdf of $(X_1, X_2)$;
2. the expectation $E(X_1 X_2)$;
3. the conditional pdf $f(x_3, x_4 \mid x_1 = \tfrac{1}{3}, x_2 = \tfrac{2}{3})$;
4. the probability $P(X_1 < \tfrac{1}{2},\ X_2 < \tfrac{3}{4},\ X_4 > \tfrac{1}{2})$.

Solution:

1. Here we have to calculate the double integral of the joint pdf with respect to $x_3$ and $x_4$, that is,
\[
f(x_1, x_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_X(x_1, x_2, x_3, x_4)\, dx_3\, dx_4
= \int_0^1\int_0^1 \frac{3}{4}(x_1^2 + x_2^2 + x_3^2 + x_4^2)\, dx_3\, dx_4
= \frac{3}{4}(x_1^2 + x_2^2) + \frac{1}{2}.
\]

2. By the definition of expectation we have
\[
E(X_1 X_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1 x_2\, f(x_1, x_2)\, dx_1\, dx_2
= \int_0^1\int_0^1 x_1 x_2 \left[\frac{3}{4}(x_1^2 + x_2^2) + \frac{1}{2}\right] dx_1\, dx_2 = \frac{5}{16}.
\]

3. By the definition of a conditional pdf we have
\[
f(x_3, x_4 \mid x_1, x_2) = \frac{f_X(x_1, x_2, x_3, x_4)}{f(x_1, x_2)}
= \frac{\frac{3}{4}(x_1^2 + x_2^2 + x_3^2 + x_4^2)}{\frac{3}{4}(x_1^2 + x_2^2) + \frac{1}{2}}
= \frac{x_1^2 + x_2^2 + x_3^2 + x_4^2}{x_1^2 + x_2^2 + \frac{2}{3}}.
\]
Hence,
\[
f\left(x_3, x_4 \,\Big|\, x_1 = \frac{1}{3},\, x_2 = \frac{2}{3}\right)
= \frac{\left(\frac{1}{3}\right)^2 + \left(\frac{2}{3}\right)^2 + x_3^2 + x_4^2}{\left(\frac{1}{3}\right)^2 + \left(\frac{2}{3}\right)^2 + \frac{2}{3}}
= \frac{5}{11} + \frac{9}{11}\, x_3^2 + \frac{9}{11}\, x_4^2.
\]

4. Here we use (indirectly) the marginal pdf of $(X_1, X_2, X_4)$:
\[
P\left(X_1 < \frac{1}{2},\ X_2 < \frac{3}{4},\ X_4 > \frac{1}{2}\right)
= \int_0^{1/2}\int_0^{3/4}\int_0^1\int_{1/2}^1 \frac{3}{4}(x_1^2 + x_2^2 + x_3^2 + x_4^2)\, dx_4\, dx_3\, dx_2\, dx_1
= \frac{171}{1024}.
\]
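The numerical answers in Example 1.35 are easy to sanity-check by Monte Carlo integration. The sketch below is not part of the original notes; it assumes NumPy is available and uses the fact that the uniform density on the unit hypercube equals one, so $E[g(X)] = E_U[g(U) f_X(U)]$ for $U$ uniform on the cube.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000_000

    # Draw uniformly on the unit hypercube; there the joint pdf is
    # f(x) = (3/4) * (x1^2 + x2^2 + x3^2 + x4^2).
    u = rng.random((n, 4))
    f = 0.75 * (u ** 2).sum(axis=1)

    # E(X1 X2) = E_U[u1 * u2 * f(U)], since the uniform density on the cube is 1.
    e_x1x2 = np.mean(u[:, 0] * u[:, 1] * f)     # should be close to 5/16 = 0.3125

    # P(X1 < 1/2, X2 < 3/4, X4 > 1/2) = E_U[1{event} * f(U)].
    event = (u[:, 0] < 0.5) & (u[:, 1] < 0.75) & (u[:, 3] > 0.5)
    prob = np.mean(event * f)                   # should be close to 171/1024 ≈ 0.167

    print(e_x1x2, prob)

With a million draws both estimates typically agree with the exact values to two or three decimal places.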
The following results will be very useful in the second part of this course. They are extensions of Definition 1.18, Theorem 1.13, and Theorem 1.14, respectively, to $n$ random variables $X_1, X_2, \dots, X_n$.

Definition 1.22. Let $X = (X_1, X_2, \dots, X_n)^T$ denote a continuous $n$-dimensional rv with joint pdf $f_X(x_1, x_2, \dots, x_n)$ and marginal pdfs $f_{X_i}(x_i)$, $i = 1, 2, \dots, n$. The random variables are called mutually independent (or just independent) if
\[
f_X(x_1, x_2, \dots, x_n) = \prod_{i=1}^n f_{X_i}(x_i).
\]
In particular, all pairs $X_i, X_j$, $i \ne j$, are then independent.

Example 1.36. Suppose that $Y_i \sim \operatorname{Exp}(\lambda)$ independently for $i = 1, 2, \dots, n$. Then the joint pdf of $Y = (Y_1, Y_2, \dots, Y_n)^T$ is
\[
f_Y(y_1, \dots, y_n) = \prod_{i=1}^n \lambda e^{-\lambda y_i} = \lambda^n e^{-\lambda \sum_{i=1}^n y_i}.
\]

Theorem 1.21. If $X_1, X_2, \dots, X_n$ are mutually independent and $g_j(X_j)$ is a function of $X_j$ only, $j = 1, 2, \dots, m$, $m \le n$, then
\[
E\left(\prod_{j=1}^m g_j(X_j)\right) = \prod_{j=1}^m E\big(g_j(X_j)\big).
\]

Theorem 1.22. Let $X = (X_1, X_2, \dots, X_n)^T$ be a vector of mutually independent rvs with mgfs $M_{X_1}(t), M_{X_2}(t), \dots, M_{X_n}(t)$, and let $a_1, a_2, \dots, a_n$ and $b_1, b_2, \dots, b_n$ be fixed constants. Then the mgf of the random variable $Z = \sum_{i=1}^n (a_i X_i + b_i)$ is
\[
M_Z(t) = e^{t \sum b_i} \prod_{i=1}^n M_{X_i}(a_i t).
\]

Exercise 1.20. Prove Theorem 1.22.

Example 1.37. Calculate the mean and the variance of the random variable $Y = \sum_{i=1}^n X_i$, where $X_i \sim \operatorname{Gamma}(\alpha_i, \lambda)$ independently. First, we will find the mgf of $Y$ and then generate the first and second moments using this mgf (Theorem 1.7).

The $X_i$ are independent; hence, by Theorem 1.22, we have
\[
M_Y(t) = \prod_{i=1}^n M_{X_i}(t).
\]
The pdf of a single rv $X \sim \operatorname{Gamma}(\alpha, \lambda)$ is
\[
f_X(x) = \frac{\lambda^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\lambda x}\, I_{[0,\infty)}(x).
\]
Thus, by the definition of the mgf we have
\[
M_X(t) = E\big(e^{tX}\big)
= \int_0^\infty e^{tx}\, \frac{\lambda^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\lambda x}\, dx
= \frac{\lambda^\alpha}{\Gamma(\alpha)} \int_0^\infty x^{\alpha-1} e^{-(\lambda - t)x}\, dx
= \frac{\lambda^\alpha}{(\lambda - t)^\alpha}
\underbrace{\frac{(\lambda - t)^\alpha}{\Gamma(\alpha)} \int_0^\infty x^{\alpha-1} e^{-(\lambda - t)x}\, dx}_{=1,\ \text{(pdf of a Gamma rv)}}
= \left(\frac{\lambda}{\lambda - t}\right)^\alpha
= \left(1 - \frac{t}{\lambda}\right)^{-\alpha}, \qquad t < \lambda.
\]
Hence,
\[
M_Y(t) = \prod_{i=1}^n M_{X_i}(t) = \prod_{i=1}^n \left(1 - \frac{t}{\lambda}\right)^{-\alpha_i} = \left(1 - \frac{t}{\lambda}\right)^{-\sum_{i=1}^n \alpha_i}.
\]
This has the same form as the mgf of a Gamma random variable with parameters $\sum_{i=1}^n \alpha_i$ and $\lambda$, that is,
\[
Y \sim \operatorname{Gamma}\left(\sum_{i=1}^n \alpha_i,\ \lambda\right).
\]
The mean and variance of a Gamma rv can be obtained by calculating the derivatives of the mgf at $t = 0$, see Theorem 1.7. For $X \sim \operatorname{Gamma}(\alpha, \lambda)$ we have
\[
M_X(t) = \left(1 - \frac{t}{\lambda}\right)^{-\alpha}, \qquad
E X = \frac{\alpha}{\lambda}, \qquad
E X^2 = \frac{\alpha(\alpha + 1)}{\lambda^2}, \qquad
\operatorname{var}(X) = E X^2 - (E X)^2 = \frac{\alpha}{\lambda^2}.
\]
Hence, for $Y \sim \operatorname{Gamma}(\sum_{i=1}^n \alpha_i, \lambda)$ we get
\[
E Y = \frac{\sum_{i=1}^n \alpha_i}{\lambda}
\qquad\text{and}\qquad
\operatorname{var}(Y) = \frac{\sum_{i=1}^n \alpha_i}{\lambda^2}.
\]

The following definition is often used when we consider realizations of rvs (samples) coming from populations having the same distribution.

Definition 1.23. The random variables $X_1, X_2, \dots, X_n$ are identically distributed if their distribution functions are identical, that is,
\[
F_{X_1}(x) = F_{X_2}(x) = \dots = F_{X_n}(x) \quad \text{for all } x \in \mathbb{R}.
\]
If they are also independent, then we denote this briefly as IID, which means Independent and Identically Distributed. For example, the notation $\{X_i\}_{i=1,2,\dots,n} \sim \text{IID}$ means that the variables $X_i$ are IID but the type of the distribution is not specified. We will often use IID normal rvs, denoted by $X_i \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$, $i = 1, 2, \dots, n$.

Exercise 1.21. Find the pdf of the random variable $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$, where $X_i \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$, $i = 1, 2, \dots, n$.

1.12.1 Expectation and Variance of Random Vectors

The expectation of a random vector $X$ is the vector of expectations of its components, that is,
\[
E(X) = E\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}
= \begin{pmatrix} E(X_1) \\ E(X_2) \\ \vdots \\ E(X_n) \end{pmatrix}
= \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix} = \mu.
\]
The variance-covariance matrix of $X$ is
\[
V = \operatorname{Var}(X) = E\big[(X - E(X))(X - E(X))^T\big]
= \begin{pmatrix}
\operatorname{var}(X_1) & \operatorname{cov}(X_1, X_2) & \dots & \operatorname{cov}(X_1, X_n) \\
\operatorname{cov}(X_2, X_1) & \operatorname{var}(X_2) & \dots & \operatorname{cov}(X_2, X_n) \\
\vdots & \vdots & \ddots & \vdots \\
\operatorname{cov}(X_n, X_1) & \operatorname{cov}(X_n, X_2) & \dots & \operatorname{var}(X_n)
\end{pmatrix}. \tag{1.20}
\]

The following theorem shows a basic property of the variance-covariance matrix.

Theorem 1.23. If $X$ is a random vector, then its variance-covariance matrix $V$ is a non-negative definite matrix, that is, for any constant vector $b$ the quadratic form $b^T V b$ is non-negative.

Proof. For any constant vector $b \in \mathbb{R}^n$ we can construct a one-dimensional variable $Y = b^T X$ whose variance is
\[
0 \le \operatorname{var}(Y) = E\big[(Y - E(Y))^2\big]
= E\big[(b^T X - E(b^T X))^2\big]
= E\big[(b^T X - E(b^T X))(b^T X - E(b^T X))^T\big]
= E\big[b^T (X - E(X))(X - E(X))^T b\big]
= b^T E\big[(X - E(X))(X - E(X))^T\big]\, b
= b^T \operatorname{Var}(X)\, b = b^T V b.
\]
That is, $b^T V b \ge 0$ and so $V$ is a non-negative definite matrix.

The proof of the above theorem shows that the variance of a linear combination $Y = \sum_{i=1}^n b_i X_i$ of the random variables $X_i$ is a quadratic form in the variance-covariance matrix of $X$ and the vector $b$ of the coefficients of the combination. More generally, if $X$ is an $n$-dimensional rv, $B$ is an $m \times n$ constant matrix and $a$ is a real $m \times 1$ vector, then the expectation and the variance of the random vector $Y = a + BX$ are, respectively,
\[
E(Y) = a + B\, E(X) = a + B\mu,
\qquad\text{and}\qquad
\operatorname{Var}(Y) = B \operatorname{Var}(X)\, B^T.
\]
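The identities $E(Y) = a + B\mu$ and $\operatorname{Var}(Y) = B \operatorname{Var}(X) B^T$ are easy to check by simulation. The sketch below is not part of the original notes; it assumes NumPy is available, and the values of $\mu$, $V$, $a$ and $B$ are arbitrary illustrative choices.

    import numpy as np

    rng = np.random.default_rng(1)

    # Illustrative 3-dimensional X with a chosen mean and covariance matrix.
    mu = np.array([1.0, -2.0, 0.5])
    V = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, -0.4],
                  [0.0, -0.4, 1.5]])

    a = np.array([10.0, 20.0])
    B = np.array([[1.0, 2.0, 0.0],
                  [0.0, -1.0, 3.0]])

    # Simulate X (a multivariate normal is convenient, but the identities
    # E(Y) = a + B mu and Var(Y) = B V B^T hold for any X with this mean and covariance).
    X = rng.multivariate_normal(mu, V, size=200_000)
    Y = a + X @ B.T

    print(Y.mean(axis=0), a + B @ mu)              # sample mean vs a + B mu
    print(np.cov(Y, rowvar=False), B @ V @ B.T)    # sample covariance vs B V B^T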
The covariance of two random vectors, an $n$-dimensional $X$ and an $m$-dimensional $Y$, is defined as
\[
\operatorname{Cov}(X, Y) = E\big[(X - E(X))(Y - E(Y))^T\big].
\]
It is an $n \times m$ matrix.

1.12.2 Joint Moment Generating Function

Definition 1.24. Let $X = (X_1, X_2, \dots, X_n)^T$ be a random vector. We define the joint mgf as
\[
M_X(t) = E\big(e^{t^T X}\big),
\]
where $t = (t_1, t_2, \dots, t_n)^T$ is an $n$-dimensional argument of $M_X$.

As in the univariate case, there is a unique correspondence between the joint pdf and the joint mgf. The mgf related to the marginal distribution of a subset of variables $X_{i_1}, \dots, X_{i_s}$ can be obtained by setting $t_j = 0$ for all $j$ not in the set $\{i_1, \dots, i_s\}$. Note also that if the variables $X_1, X_2, \dots, X_n$ are mutually independent, then the joint mgf is the product of the marginal mgfs, that is,
\[
M_X(t) = E\big(e^{t^T X}\big) = E\big(e^{\sum_{j=1}^n t_j X_j}\big) = E\left(\prod_{j=1}^n e^{t_j X_j}\right) = \prod_{j=1}^n M_{X_j}(t_j).
\]

Another useful property of the joint mgf is given in the following theorem.

Theorem 1.24. Let $X = (X_1, X_2, \dots, X_n)^T$ be a random vector. If the joint mgf of $X$ can be written as a product of some functions $g_j(t_j)$, $j = 1, 2, \dots, n$, that is,
\[
M_X(t) = \prod_{j=1}^n g_j(t_j),
\]
then the variables $X_1, X_2, \dots, X_n$ are independent.

Proof. Let $t_i = 0$ for all $i \ne j$. Then the marginal mgf $M_{X_j}(t_j)$ is
\[
M_{X_j}(t_j) = g_j(t_j) \prod_{i \ne j} g_i(0).
\]
Also, note that if $t_i = 0$ for all $i = 1, 2, \dots, n$, then
\[
M_X(t) = E\big(e^{\sum_{j=1}^n t_j X_j}\big) = E\big(e^0\big) = 1.
\]
This gives
\[
1 = M_X(t) = \prod_{j=1}^n g_j(0)
\quad\Rightarrow\quad
\prod_{i \ne j} g_i(0) = \frac{1}{g_j(0)}.
\]
Therefore,
\[
M_{X_j}(t_j) = \frac{g_j(t_j)}{g_j(0)},
\]
and hence
\[
M_X(t) = \prod_{j=1}^n g_j(t_j) = \prod_{j=1}^n g_j(0)\, M_{X_j}(t_j) = 1 \times \prod_{j=1}^n M_{X_j}(t_j).
\]
This means that the joint mgf is the product of the marginal mgfs $M_{X_j}(t_j) = g_j(t_j)/g_j(0)$, so the joint pdf can also be written as a product of the marginal pdfs. Hence, the random variables $X_1, X_2, \dots, X_n$ are independent.

1.12.3 Transformations of Random Vectors

Let $X = (X_1, X_2, \dots, X_n)^T$ be a continuous random vector and let $g : \mathbb{R}^n \to \mathbb{R}^n$ be a one-to-one and onto function, written as
\[
g(x) = (g_1(x), g_2(x), \dots, g_n(x))^T,
\]
where $x = (x_1, x_2, \dots, x_n)^T$ and $g_i : \mathbb{R}^n \to \mathbb{R}$. Then, for the transformed random vector $Y = g(X)$, we have the following result.

Theorem 1.25. The density of $Y = g(X)$ is given by
\[
f_Y(y) = f_X\big(h(y)\big)\, J_h(y),
\]
where $h(y) = g^{-1}(y)$ and $J_h(y)$ denotes the absolute value of the Jacobian determinant
\[
J_h(y) = \left|\det \frac{\partial}{\partial y} h(y)\right|
= \left|\det \begin{pmatrix}
\frac{\partial}{\partial y_1} h_1(y) & \frac{\partial}{\partial y_1} h_2(y) & \dots & \frac{\partial}{\partial y_1} h_n(y) \\
\frac{\partial}{\partial y_2} h_1(y) & \frac{\partial}{\partial y_2} h_2(y) & \dots & \frac{\partial}{\partial y_2} h_n(y) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial}{\partial y_n} h_1(y) & \frac{\partial}{\partial y_n} h_2(y) & \dots & \frac{\partial}{\partial y_n} h_n(y)
\end{pmatrix}\right|.
\]
Another useful form of the Jacobian is
\[
J_h(y) = \big[J_g\big(h(y)\big)\big]^{-1},
\qquad\text{where}\qquad
J_g(x) = \left|\det \frac{\partial}{\partial x} g(x)\right|.
\]

Exercise 1.22. Let $A$ be a non-singular $n \times n$ real matrix and let $X$ be an $n$-dimensional random vector. Show that the linearly transformed random variable $Y = AX$ has the joint pdf given by
\[
f_Y(y) = \frac{1}{|\det A|}\, f_X\big(A^{-1} y\big).
\]

1.12.4 Multivariate Normal Distribution

A random vector $X$ has a multivariate normal distribution if its joint pdf can be written as
\[
f_X(x_1, \dots, x_n) = \frac{1}{(2\pi)^{n/2} \sqrt{\det V}}
\exp\left\{-\frac{1}{2}(x - \mu)^T V^{-1} (x - \mu)\right\},
\]
where the mean is $\mu = (\mu_1, \dots, \mu_n)^T$ and the variance-covariance matrix $V$ has the form (1.20).

Exercise 1.23. Use the result from Exercise 1.22 to show that if $X \sim N_n(\mu, V)$, then $Y = AX$ has an $n$-dimensional normal distribution with expectation $A\mu$ and variance-covariance matrix $AVA^T$.
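As a quick numerical check of the density formula above, the sketch below (not part of the original notes; it assumes NumPy and SciPy are available) evaluates the multivariate normal pdf directly from the formula and compares it with scipy.stats.multivariate_normal. The mean vector, covariance matrix and evaluation point are arbitrary illustrative choices.

    import numpy as np
    from scipy.stats import multivariate_normal

    # Illustrative mean, covariance and evaluation point.
    mu = np.array([0.0, 1.0, -1.0])
    V = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
    x = np.array([0.2, 0.8, -0.5])

    n = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(V, diff)     # (x - mu)^T V^{-1} (x - mu)
    pdf_formula = np.exp(-0.5 * quad) / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(V)))

    pdf_scipy = multivariate_normal(mean=mu, cov=V).pdf(x)
    print(pdf_formula, pdf_scipy)              # the two values should agree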
Lemma 1.3. If $X \sim N_n(\mu, V)$, $B$ is an $m \times n$ matrix, and $a$ is a real $m \times 1$ vector, then the random vector $Y = a + BX$ is also multivariate normal with
\[
E(Y) = a + B\, E(X) = a + B\mu,
\]
and the variance-covariance matrix
\[
V_Y = B V B^T.
\]

Note that taking $B = b^T$, where $b$ is an $n \times 1$ vector, and $a = 0$, we obtain
\[
Y = b^T X = b_1 X_1 + \dots + b_n X_n,
\qquad\text{and}\qquad
Y \sim N\big(b^T \mu,\ b^T V b\big).
\]
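This special case is easy to verify by simulation. The sketch below is not from the notes; it assumes NumPy, uses illustrative values of $\mu$, $V$ and $b$, and checks that $b^T X$ has mean close to $b^T\mu$ and variance close to $b^T V b$.

    import numpy as np

    rng = np.random.default_rng(2)

    # Hypothetical mean, covariance and coefficient vector.
    mu = np.array([1.0, 0.0, -2.0])
    V = np.array([[1.0, 0.2, 0.0],
                  [0.2, 2.0, -0.5],
                  [0.0, -0.5, 1.5]])
    b = np.array([0.5, -1.0, 2.0])

    X = rng.multivariate_normal(mu, V, size=500_000)
    Y = X @ b                                  # Y = b^T X for each draw

    print(Y.mean(), b @ mu)                    # sample mean vs b^T mu
    print(Y.var(ddof=1), b @ V @ b)            # sample variance vs b^T V b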