Multivariate Probability Distributions
Dr. Tai-kuang Ho, National Tsing Hua University
The slides draw from the textbooks of Wackerly, Mendenhall, and Scheaffer (2008) and Devore and Berk (2012).

5.1 Introduction

The intersection of two or more events is frequently of interest to an experimenter. Observing both the height and the weight of an individual represents the intersection of a specific pair of events associated with height-weight measurements.

5.2 Bivariate and Multivariate Probability Distributions

Many random variables can be defined over the same sample space.

Tossing a pair of dice. The sample space contains 36 sample points, corresponding to the 36 ways in which numbers may appear on the faces of the dice. Each pair is equally likely:

$P(y_1, y_2) = P(Y_1 = y_1, Y_2 = y_2) = \frac{1}{36}, \quad y_1 = 1, \ldots, 6, \; y_2 = 1, \ldots, 6.$

You can define the following variables over the sample space:

$Y_1$: the number of dots appearing on die 1.
$Y_2$: the number of dots appearing on die 2.
$Y_3$: the sum of the number of dots on the dice.
$Y_4$: the product of the number of dots appearing on the dice.

[Figure 5.1]

DEFINITION: joint probability function
Let $Y_1$ and $Y_2$ be discrete random variables. The joint (or bivariate) probability function for $Y_1$ and $Y_2$ is given by

$p(y_1, y_2) = P(Y_1 = y_1, Y_2 = y_2), \quad -\infty < y_1 < \infty, \; -\infty < y_2 < \infty.$

THEOREM
If $Y_1$ and $Y_2$ are discrete random variables with joint probability function $p(y_1, y_2)$, then
1. $p(y_1, y_2) \ge 0$ for all $y_1, y_2$.
2. $\sum_{y_1, y_2} p(y_1, y_2) = 1$, where the sum is over all values $(y_1, y_2)$ that are assigned nonzero probabilities.

The joint probability function for discrete random variables is called the joint probability mass function. It specifies the probability (mass) associated with each of the possible pairs of values for the random variables.

[Table 5.1]

DEFINITION: joint (cumulative) distribution function
For any random variables $Y_1$ and $Y_2$, the joint (bivariate) distribution function $F(y_1, y_2)$ is

$F(y_1, y_2) = P(Y_1 \le y_1, Y_2 \le y_2), \quad -\infty < y_1 < \infty, \; -\infty < y_2 < \infty.$

More precisely, in the discrete case,

$F(y_1, y_2) = \sum_{t_1 \le y_1} \sum_{t_2 \le y_2} p(t_1, t_2).$

Die-tossing experiment:

$F(2, 3) = P(Y_1 \le 2, Y_2 \le 3) = p(1,1) + p(1,2) + p(1,3) + p(2,1) + p(2,2) + p(2,3) = \frac{6}{36}.$

DEFINITION: joint distribution for continuous variables
Let $Y_1$ and $Y_2$ be continuous random variables with joint (cumulative) distribution function $F(y_1, y_2)$. If there exists a nonnegative function $f(y_1, y_2)$ such that

$F(y_1, y_2) = \int_{-\infty}^{y_1} \int_{-\infty}^{y_2} f(t_1, t_2)\, dt_2\, dt_1$

for all $-\infty < y_1 < \infty$, $-\infty < y_2 < \infty$, then $Y_1$ and $Y_2$ are said to be jointly continuous random variables. The function $f(y_1, y_2)$ is called the joint probability density function.

Two random variables are said to be jointly continuous if their joint distribution function $F(y_1, y_2)$ is continuous in both arguments.

THEOREM: properties of bivariate cumulative distribution functions
If $Y_1$ and $Y_2$ are random variables with joint distribution function $F(y_1, y_2)$, then
1. $F(-\infty, -\infty) = F(-\infty, y_2) = F(y_1, -\infty) = 0$.
2. $F(\infty, \infty) = 1$.
3. If $y_1^* \ge y_1$ and $y_2^* \ge y_2$, then
$F(y_1^*, y_2^*) - F(y_1^*, y_2) - F(y_1, y_2^*) + F(y_1, y_2) = P(y_1 < Y_1 \le y_1^*, \; y_2 < Y_2 \le y_2^*) \ge 0.$

THEOREM
If $Y_1$ and $Y_2$ are jointly continuous random variables with a joint density function given by $f(y_1, y_2)$, then
1. $f(y_1, y_2) \ge 0$ for all $y_1, y_2$.
2. $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_1\, dy_2 = 1$.

[Figure 5.2]

$P(a_1 \le Y_1 \le a_2, \; b_1 \le Y_2 \le b_2) = \int_{b_1}^{b_2} \int_{a_1}^{a_2} f(y_1, y_2)\, dy_1\, dy_2$

Volumes under this surface correspond to probabilities.

The above can be extended to n random variables.
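The die-tossing joint pmf and joint cdf above can be checked by direct enumeration. The sketch below is a small illustration added to these notes (it is not part of the original slides); the dictionary `p` and the helper `F` are hypothetical names chosen for the example.

```python
from fractions import Fraction

# Joint pmf of (Y1, Y2) for two fair dice: p(y1, y2) = 1/36 for y1, y2 = 1, ..., 6.
p = {(y1, y2): Fraction(1, 36) for y1 in range(1, 7) for y2 in range(1, 7)}

# Theorem check: every probability is nonnegative and the probabilities sum to 1.
assert all(v >= 0 for v in p.values())
assert sum(p.values()) == 1

def F(a, b):
    """Joint cdf F(a, b) = P(Y1 <= a, Y2 <= b), summing p(t1, t2) over t1 <= a, t2 <= b."""
    return sum(v for (t1, t2), v in p.items() if t1 <= a and t2 <= b)

print(F(2, 3))  # 1/6, i.e. 6/36, matching the F(2, 3) computation above
```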
$p(y_1, y_2, \ldots, y_n) = P(Y_1 = y_1, Y_2 = y_2, \ldots, Y_n = y_n)$

and, in the jointly continuous case,

$F(y_1, y_2, \ldots, y_n) = P(Y_1 \le y_1, Y_2 \le y_2, \ldots, Y_n \le y_n) = \int_{-\infty}^{y_1} \int_{-\infty}^{y_2} \cdots \int_{-\infty}^{y_n} f(t_1, t_2, \ldots, t_n)\, dt_n \cdots dt_2\, dt_1.$

5.3 Marginal and Conditional Probability Distributions

Tossing a pair of dice: draw a table, compute the joint probability mass function and the marginal probabilities.

DEFINITION: marginal probability functions
a. Let $Y_1$ and $Y_2$ be jointly discrete random variables with probability function $p(y_1, y_2)$. Then the marginal probability functions of $Y_1$ and $Y_2$, respectively, are given by

$p_1(y_1) = \sum_{\text{all } y_2} p(y_1, y_2), \qquad p_2(y_2) = \sum_{\text{all } y_1} p(y_1, y_2).$

For a specific $y_1$ value, sum over all possible $y_2$.

b. Let $Y_1$ and $Y_2$ be jointly continuous random variables with joint density function $f(y_1, y_2)$. Then the marginal density functions of $Y_1$ and $Y_2$, respectively, are given by

$f_1(y_1) = \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_2, \qquad f_2(y_2) = \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_1.$

[Table 5.2]

$P(A \cap B) = P(A)\, P(B \mid A)$
$p(y_1, y_2) = p(y_1)\, p(y_2 \mid y_1)$
$p(y_2 \mid y_1) = \dfrac{p(y_1, y_2)}{p(y_1)}$

Draw a table, compute the conditional probability.

DEFINITION: conditional distribution
If $Y_1$ and $Y_2$ are jointly discrete random variables with joint probability function $p(y_1, y_2)$ and marginal probability functions $p_1(y_1)$ and $p_2(y_2)$, respectively, then the conditional discrete probability function of $Y_1$ given $Y_2$ is

$p(y_1 \mid y_2) = P(Y_1 = y_1 \mid Y_2 = y_2) = \frac{P(Y_1 = y_1, Y_2 = y_2)}{P(Y_2 = y_2)} = \frac{p(y_1, y_2)}{p_2(y_2)},$

provided that $p_2(y_2) > 0$.

DEFINITION: conditional probability for continuous variables
Let $Y_1$ and $Y_2$ be jointly continuous random variables with joint density $f(y_1, y_2)$ and marginal densities $f_1(y_1)$ and $f_2(y_2)$, respectively. For any $y_2$ such that $f_2(y_2) > 0$, the conditional density of $Y_1$ given $Y_2 = y_2$ is given by

$f(y_1 \mid y_2) = \frac{f(y_1, y_2)}{f_2(y_2)},$

and, for any $y_1$ such that $f_1(y_1) > 0$, the conditional density of $Y_2$ given $Y_1 = y_1$ is given by

$f(y_2 \mid y_1) = \frac{f(y_1, y_2)}{f_1(y_1)}.$

DEFINITION: conditional (cumulative) distribution function
If $Y_1$ and $Y_2$ are jointly continuous random variables with joint density function $f(y_1, y_2)$, then the conditional distribution function of $Y_1$ given $Y_2 = y_2$ is

$F(y_1 \mid y_2) = P(Y_1 \le y_1 \mid Y_2 = y_2) = \int_{-\infty}^{y_1} \frac{f(t_1, y_2)}{f_2(y_2)}\, dt_1.$

Remark

$F(y_1) = \int_{-\infty}^{\infty} F(y_1 \mid y_2)\, f_2(y_2)\, dy_2,$

because

$F(y_1) = \int_{-\infty}^{y_1} f_1(t_1)\, dt_1 = \int_{-\infty}^{y_1} \left[ \int_{-\infty}^{\infty} f(t_1, y_2)\, dy_2 \right] dt_1 = \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{y_1} f(t_1, y_2)\, dt_1 \right] dy_2,$

while

$F(y_1 \mid y_2)\, f_2(y_2) = \int_{-\infty}^{y_1} f(t_1, y_2)\, dt_1, \qquad \text{since} \quad F(y_1 \mid y_2) = \frac{\int_{-\infty}^{y_1} f(t_1, y_2)\, dt_1}{f_2(y_2)}.$

5.4 Independent Random Variables

Two events A and B are independent if $P(A \cap B) = P(A)\, P(B)$. For two random variables $Y_1$ and $Y_2$, independence means

$P(a < Y_1 \le b, \; c < Y_2 \le d) = P(a < Y_1 \le b)\, P(c < Y_2 \le d).$

If $Y_1$ and $Y_2$ are independent, the joint probability can be written as the product of the marginal probabilities.

DEFINITION: independence
Let $Y_1$ have distribution function $F_1(y_1)$, $Y_2$ have distribution function $F_2(y_2)$, and $Y_1$ and $Y_2$ have joint distribution function $F(y_1, y_2)$. Then $Y_1$ and $Y_2$ are said to be independent if and only if

$F(y_1, y_2) = F_1(y_1)\, F_2(y_2)$

for every pair of real numbers $(y_1, y_2)$. If $Y_1$ and $Y_2$ are not independent, they are said to be dependent.

THEOREM: convenient method to establish independence of variables
If $Y_1$ and $Y_2$ are discrete random variables with joint probability function $p(y_1, y_2)$ and marginal probability functions $p_1(y_1)$ and $p_2(y_2)$, respectively, then $Y_1$ and $Y_2$ are independent if and only if

$p(y_1, y_2) = p_1(y_1)\, p_2(y_2)$

for all pairs of real numbers $(y_1, y_2)$.
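To make the marginal, conditional, and independence definitions concrete, the sketch below revisits the dice example: $Y_1$ and $Y_2$ are the two dice and $Y_3 = Y_1 + Y_2$ is their sum. This is an illustrative addition, not part of the slides, and the variable names (`p13`, `p1`, `p3`) are ad hoc.

```python
from collections import defaultdict
from fractions import Fraction

# Joint pmf of (Y1, Y3), where Y1 is die 1 and Y3 = Y1 + Y2 is the sum of two fair dice.
p13 = defaultdict(Fraction)
for y1 in range(1, 7):
    for y2 in range(1, 7):
        p13[(y1, y1 + y2)] += Fraction(1, 36)

# Marginal pmfs: sum the joint pmf over the other variable.
p1 = defaultdict(Fraction)
p3 = defaultdict(Fraction)
for (y1, y3), v in p13.items():
    p1[y1] += v
    p3[y3] += v

# Conditional pmf of Y1 given Y3 = 7: p(y1 | y3) = p(y1, y3) / p3(y3).
cond = {y1: p13[(y1, 7)] / p3[7] for y1 in range(1, 7)}
print(cond)  # each value is 1/6: given a sum of 7, every value of die 1 is equally likely

# Independence check: p(y1, y3) = p1(y1) * p3(y3) fails, e.g. at (1, 12), so Y1 and Y3 are dependent.
# For the two dice themselves, p(y1, y2) = 1/36 = (1/6)(1/6) everywhere, so Y1 and Y2 are independent.
print(p13[(1, 12)], p1[1] * p3[12])  # 0 versus 1/216
```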
If $Y_1$ and $Y_2$ are continuous random variables with joint density function $f(y_1, y_2)$ and marginal density functions $f_1(y_1)$ and $f_2(y_2)$, respectively, then $Y_1$ and $Y_2$ are independent if and only if

$f(y_1, y_2) = f_1(y_1)\, f_2(y_2)$

for all pairs of real numbers $(y_1, y_2)$.

Example: in the die-tossing problem, $p(1, 2) = p_1(1)\, p_2(2)$.

THEOREM: property of independence
Let $Y_1$ and $Y_2$ have a joint density $f(y_1, y_2)$ that is positive if and only if $a \le y_1 \le b$ and $c \le y_2 \le d$, for constants a, b, c, and d, and $f(y_1, y_2) = 0$ otherwise. Then $Y_1$ and $Y_2$ are independent random variables if and only if

$f(y_1, y_2) = g(y_1)\, h(y_2),$

where $g(y_1)$ is a nonnegative function of $y_1$ alone and $h(y_2)$ is a nonnegative function of $y_2$ alone.

Independence can be extended to n variables:

$F(y_1, y_2, \ldots, y_n) = F_1(y_1)\, F_2(y_2) \cdots F_n(y_n).$

5.5 The Expected Value of a Function of Random Variables

Analogous to the univariate situation.

DEFINITION: expected value
Let $g(Y_1, Y_2, \ldots, Y_k)$ be a function of the discrete random variables $Y_1, Y_2, \ldots, Y_k$, which have probability function $p(y_1, y_2, \ldots, y_k)$. Then the expected value of $g(Y_1, Y_2, \ldots, Y_k)$ is

$E[g(Y_1, Y_2, \ldots, Y_k)] = \sum_{\text{all } y_k} \cdots \sum_{\text{all } y_2} \sum_{\text{all } y_1} g(y_1, y_2, \ldots, y_k)\, p(y_1, y_2, \ldots, y_k).$

If $Y_1, Y_2, \ldots, Y_k$ are continuous random variables with joint density function $f(y_1, y_2, \ldots, y_k)$, then

$E[g(Y_1, Y_2, \ldots, Y_k)] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(y_1, y_2, \ldots, y_k)\, f(y_1, y_2, \ldots, y_k)\, dy_1\, dy_2 \cdots dy_k.$

Show the formula for the univariate case.

Example 5.15: for $(Y_1, Y_2)$ with joint density $f(y_1, y_2)$,

$E(Y_1) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y_1\, f(y_1, y_2)\, dy_1\, dy_2 = \int_{-\infty}^{\infty} y_1 \left[ \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_2 \right] dy_1 = \int_{-\infty}^{\infty} y_1\, f_1(y_1)\, dy_1.$

Example 5.16

5.6 Special Theorems

Similar to the univariate case.

THEOREM
Let c be a constant. Then $E(c) = c$.

THEOREM 5.7
Let $g(Y_1, Y_2)$ be a function of the random variables $Y_1$ and $Y_2$ and let c be a constant. Then

$E[c\, g(Y_1, Y_2)] = c\, E[g(Y_1, Y_2)].$

THEOREM
Let $Y_1$ and $Y_2$ be random variables and $g_1(Y_1, Y_2), g_2(Y_1, Y_2), \ldots, g_k(Y_1, Y_2)$ be functions of $Y_1$ and $Y_2$. Then

$E[g_1(Y_1, Y_2) + g_2(Y_1, Y_2) + \cdots + g_k(Y_1, Y_2)] = E[g_1(Y_1, Y_2)] + E[g_2(Y_1, Y_2)] + \cdots + E[g_k(Y_1, Y_2)].$

If the random variables are independent:

THEOREM
Let $Y_1$ and $Y_2$ be independent random variables and $g(Y_1)$ and $h(Y_2)$ be functions of only $Y_1$ and $Y_2$, respectively. Then

$E[g(Y_1)\, h(Y_2)] = E[g(Y_1)]\, E[h(Y_2)],$

provided that the expectations exist.

Proof. Since $Y_1$ and $Y_2$ are independent, $f(y_1, y_2) = f_1(y_1)\, f_2(y_2)$, so

$E[g(Y_1)\, h(Y_2)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(y_1)\, h(y_2)\, f(y_1, y_2)\, dy_1\, dy_2 = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(y_1)\, h(y_2)\, f_1(y_1)\, f_2(y_2)\, dy_1\, dy_2$

$= \int_{-\infty}^{\infty} g(y_1)\, f_1(y_1) \left[ \int_{-\infty}^{\infty} h(y_2)\, f_2(y_2)\, dy_2 \right] dy_1 = E[h(Y_2)] \int_{-\infty}^{\infty} g(y_1)\, f_1(y_1)\, dy_1 = E[h(Y_2)]\, E[g(Y_1)].$

5.7 The Covariance of Two Random Variables

Dependence of two variables. Two measures of dependence: the covariance and the correlation coefficient.

[Figure 5.8]

With $E(Y_1) = \mu_1$ and $E(Y_2) = \mu_2$, consider the product $(y_1 - \mu_1)(y_2 - \mu_2)$.

DEFINITION: covariance
If $Y_1$ and $Y_2$ are random variables with means $\mu_1$ and $\mu_2$, respectively, the covariance of $Y_1$ and $Y_2$ is

$\mathrm{Cov}(Y_1, Y_2) = E[(Y_1 - \mu_1)(Y_2 - \mu_2)].$

The covariance can take a positive, negative, or zero value. $\mathrm{Cov}(Y_1, Y_2)$ depends on the scale of measurement. This problem can be solved by using the correlation coefficient

$\rho = \frac{\mathrm{Cov}(Y_1, Y_2)}{\sigma_1 \sigma_2}, \qquad -1 \le \rho \le 1,$

where the cases $\rho = 0$, $\rho = -1$, $\rho = +1$, $\rho > 0$, $\rho < 0$ correspond to no, perfect negative, perfect positive, positive, and negative linear association.

THEOREM: covariance and mean
If $Y_1$ and $Y_2$ are random variables with means $\mu_1$ and $\mu_2$, respectively, then

$\mathrm{Cov}(Y_1, Y_2) = E[(Y_1 - \mu_1)(Y_2 - \mu_2)] = E(Y_1 Y_2) - E(Y_1)\, E(Y_2).$

Proof: expand the product and use linearity of expectation,
$E[(Y_1 - \mu_1)(Y_2 - \mu_2)] = E(Y_1 Y_2) - \mu_1 E(Y_2) - \mu_2 E(Y_1) + \mu_1 \mu_2 = E(Y_1 Y_2) - E(Y_1)\, E(Y_2).$

THEOREM: independence and covariance
If $Y_1$ and $Y_2$ are independent random variables, then

$\mathrm{Cov}(Y_1, Y_2) = 0.$

Thus, independent random variables must be uncorrelated (the converse is not true).
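As a numerical check of the formula $\mathrm{Cov}(Y_1, Y_2) = E(Y_1 Y_2) - E(Y_1)E(Y_2)$ and of the independence theorem, the sketch below (an illustrative addition, not from the slides) computes the covariances exactly for the dice example.

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 36)  # p(y1, y2) = p1(y1) * p2(y2) = (1/6)(1/6): Y1 and Y2 are independent

E_Y1 = sum(Fraction(y, 6) for y in faces)                    # 7/2
E_Y1Y2 = sum(y1 * y2 * p for y1 in faces for y2 in faces)
print(E_Y1Y2 - E_Y1 * E_Y1)   # Cov(Y1, Y2) = 0, as the independence theorem predicts

# Y3 = Y1 + Y2 is dependent on Y1 and has a nonzero covariance with it.
E_Y3 = 2 * E_Y1
E_Y1Y3 = sum(y1 * (y1 + y2) * p for y1 in faces for y2 in faces)
print(E_Y1Y3 - E_Y1 * E_Y3)   # Cov(Y1, Y3) = 35/12 > 0 (it equals V(Y1) here)
```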
Proof. Using the previous theorem and the independence result $E(Y_1 Y_2) = E(Y_1) E(Y_2)$,

$\mathrm{Cov}(Y_1, Y_2) = E[(Y_1 - \mu_1)(Y_2 - \mu_2)] = E(Y_1 Y_2) - E(Y_1)\, E(Y_2) = E(Y_1)\, E(Y_2) - E(Y_1)\, E(Y_2) = 0.$

The converse of the above theorem is not true (Example 5.24).

5.8 The Expected Value and Variance of Linear Functions of Random Variables

Many estimators are linear functions of the measurements in a sample. Given constants $a_1, a_2, \ldots, a_n$ and random variables $Y_1, Y_2, \ldots, Y_n$, define

$U_1 = a_1 Y_1 + a_2 Y_2 + \cdots + a_n Y_n = \sum_{i=1}^{n} a_i Y_i.$

We need to find the expected value and variance of $U_1$, and the covariance between two such linear combinations.

THEOREM: expected values, variances, and covariances of linear functions of random variables
Let $Y_1, Y_2, \ldots, Y_n$ and $X_1, X_2, \ldots, X_m$ be random variables with $E(Y_i) = \mu_i$ and $E(X_j) = \xi_j$. Define

$U_1 = \sum_{i=1}^{n} a_i Y_i, \qquad U_2 = \sum_{j=1}^{m} b_j X_j$

for constants $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_m$. Then the following hold:

a. $E(U_1) = \sum_{i=1}^{n} a_i \mu_i$.

b. $V(U_1) = \sum_{i=1}^{n} a_i^2 V(Y_i) + 2 \sum_{1 \le i < j \le n} a_i a_j \mathrm{Cov}(Y_i, Y_j)$, where the double sum is over all pairs $(i, j)$ with $i < j$.

c. $\mathrm{Cov}(U_1, U_2) = \sum_{i=1}^{n} \sum_{j=1}^{m} a_i b_j \mathrm{Cov}(Y_i, X_j)$.

Illustrate using the case of two variables. The proof of (a) is straightforward.

Proof of (b):

$V(U_1) = E[U_1 - E(U_1)]^2 = E\left[ \sum_{i=1}^{n} a_i Y_i - \sum_{i=1}^{n} a_i \mu_i \right]^2 = E\left[ \sum_{i=1}^{n} a_i (Y_i - \mu_i) \right]^2$

$= E\left[ \sum_{i=1}^{n} a_i^2 (Y_i - \mu_i)^2 + \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \ne i}}^{n} a_i a_j (Y_i - \mu_i)(Y_j - \mu_j) \right]$

$= \sum_{i=1}^{n} a_i^2 E(Y_i - \mu_i)^2 + \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \ne i}}^{n} a_i a_j E[(Y_i - \mu_i)(Y_j - \mu_j)]$

$= \sum_{i=1}^{n} a_i^2 V(Y_i) + \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \ne i}}^{n} a_i a_j \mathrm{Cov}(Y_i, Y_j).$

Because $\mathrm{Cov}(Y_i, Y_j) = \mathrm{Cov}(Y_j, Y_i)$,

$V(U_1) = \sum_{i=1}^{n} a_i^2 V(Y_i) + 2 \sum_{1 \le i < j \le n} a_i a_j \mathrm{Cov}(Y_i, Y_j).$

Proof of (c):

$\mathrm{Cov}(U_1, U_2) = E\{[U_1 - E(U_1)][U_2 - E(U_2)]\} = E\left[ \left( \sum_{i=1}^{n} a_i Y_i - \sum_{i=1}^{n} a_i \mu_i \right) \left( \sum_{j=1}^{m} b_j X_j - \sum_{j=1}^{m} b_j \xi_j \right) \right]$

$= E\left\{ \left[ \sum_{i=1}^{n} a_i (Y_i - \mu_i) \right] \left[ \sum_{j=1}^{m} b_j (X_j - \xi_j) \right] \right\} = E\left[ \sum_{i=1}^{n} \sum_{j=1}^{m} a_i b_j (Y_i - \mu_i)(X_j - \xi_j) \right]$

$= \sum_{i=1}^{n} \sum_{j=1}^{m} a_i b_j E[(Y_i - \mu_i)(X_j - \xi_j)] = \sum_{i=1}^{n} \sum_{j=1}^{m} a_i b_j \mathrm{Cov}(Y_i, X_j).$

5.9 Conditional Expectations

We have introduced conditional probability functions and conditional density functions. Conditional expectations are defined in the same manner as univariate expectations.

DEFINITION: conditional expectation
If $Y_1$ and $Y_2$ are any two random variables, the conditional expectation of $g(Y_1)$, given that $Y_2 = y_2$, is defined to be

$E(g(Y_1) \mid Y_2 = y_2) = \int_{-\infty}^{\infty} g(y_1)\, f(y_1 \mid y_2)\, dy_1$

if $Y_1$ and $Y_2$ are jointly continuous, and

$E(g(Y_1) \mid Y_2 = y_2) = \sum_{\text{all } y_1} g(y_1)\, p(y_1 \mid y_2)$

if $Y_1$ and $Y_2$ are jointly discrete.

The conditional expectation of $Y_1$ given $Y_2 = y_2$ is a function of $y_2$; $E(Y_1 \mid Y_2)$ is a function of the random variable $Y_2$.

THEOREM: law of iterated expectations
Let $Y_1$ and $Y_2$ denote random variables. Then

$E(Y_1) = E[E(Y_1 \mid Y_2)],$

where on the right-hand side the inside expectation is with respect to the conditional distribution of $Y_1$ given $Y_2$ and the outside expectation is with respect to the distribution of $Y_2$.

Proof (continuous case):

$E(Y_1) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y_1\, f(y_1, y_2)\, dy_1\, dy_2 = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y_1\, f(y_1 \mid y_2)\, f_2(y_2)\, dy_1\, dy_2$

$= \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} y_1\, f(y_1 \mid y_2)\, dy_1 \right] f_2(y_2)\, dy_2 = \int_{-\infty}^{\infty} E(Y_1 \mid Y_2 = y_2)\, f_2(y_2)\, dy_2 = E[E(Y_1 \mid Y_2)].$

Now consider the conditional variance:

$V(Y_1 \mid Y_2 = y_2) = E(Y_1^2 \mid Y_2 = y_2) - [E(Y_1 \mid Y_2 = y_2)]^2.$

$V(Y_1 \mid Y_2)$ is a function of $Y_2$.
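Before turning to the variance decomposition, the law of iterated expectations can be checked exactly for the dice pair $(Y_1, Y_3)$ used earlier, where $Y_3 = Y_1 + Y_2$. The code below is a small illustration added to these notes, not part of the slides; the helper name `cond_mean` is ad hoc.

```python
from collections import defaultdict
from fractions import Fraction

# Joint pmf of (Y1, Y3): Y1 is die 1, Y3 = Y1 + Y2 is the sum of two fair dice.
p13 = defaultdict(Fraction)
for y1 in range(1, 7):
    for y2 in range(1, 7):
        p13[(y1, y1 + y2)] += Fraction(1, 36)

p3 = defaultdict(Fraction)                 # marginal pmf of Y3
for (y1, y3), v in p13.items():
    p3[y3] += v

def cond_mean(y3):
    """E(Y1 | Y3 = y3) = sum over y1 of y1 * p(y1 | y3); a function of y3."""
    return sum(y1 * p13[(y1, y3)] / p3[y3] for y1 in range(1, 7))

# Law of iterated expectations: E(Y1) = E[E(Y1 | Y3)].
print(sum(cond_mean(y3) * p3[y3] for y3 in p3))   # 7/2, the mean of a single die
print(cond_mean(7))                               # 7/2 here; for other y3 the conditional mean differs
```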
THEOREM
Let $Y_1$ and $Y_2$ denote random variables. Then

$V(Y_1) = E[V(Y_1 \mid Y_2)] + V[E(Y_1 \mid Y_2)].$

Implication and interpretation: since $V[E(Y_1 \mid Y_2)] \ge 0$, the marginal variance satisfies $V(Y_1) \ge E[V(Y_1 \mid Y_2)]$, with strict inequality whenever the conditional mean of $Y_1$ varies with $Y_2$.

Proof. As previously indicated, $V(Y_1 \mid Y_2)$ is given by

$V(Y_1 \mid Y_2) = E(Y_1^2 \mid Y_2) - [E(Y_1 \mid Y_2)]^2,$

and therefore

$E[V(Y_1 \mid Y_2)] = E[E(Y_1^2 \mid Y_2)] - E\{[E(Y_1 \mid Y_2)]^2\}.$

By definition,

$V[E(Y_1 \mid Y_2)] = E\{[E(Y_1 \mid Y_2)]^2\} - \{E[E(Y_1 \mid Y_2)]\}^2.$

The variance of $Y_1$ is

$V(Y_1) = E(Y_1^2) - [E(Y_1)]^2 = E[E(Y_1^2 \mid Y_2)] - \{E[E(Y_1 \mid Y_2)]\}^2$

$= E[E(Y_1^2 \mid Y_2)] - E\{[E(Y_1 \mid Y_2)]^2\} + E\{[E(Y_1 \mid Y_2)]^2\} - \{E[E(Y_1 \mid Y_2)]\}^2$

$= E[V(Y_1 \mid Y_2)] + V[E(Y_1 \mid Y_2)].$
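Continuing the same dice example, the sketch below (again an illustrative addition, not from the slides) verifies the decomposition $V(Y_1) = E[V(Y_1 \mid Y_3)] + V[E(Y_1 \mid Y_3)]$ exactly; the helper `cond_moments` is a name chosen for the example.

```python
from collections import defaultdict
from fractions import Fraction

# Joint pmf of (Y1, Y3): Y1 is die 1, Y3 = Y1 + Y2 is the sum of two fair dice.
p13 = defaultdict(Fraction)
for y1 in range(1, 7):
    for y2 in range(1, 7):
        p13[(y1, y1 + y2)] += Fraction(1, 36)

p3 = defaultdict(Fraction)                 # marginal pmf of Y3
for (y1, y3), v in p13.items():
    p3[y3] += v

def cond_moments(y3):
    """Return E(Y1 | Y3 = y3) and V(Y1 | Y3 = y3) from the conditional pmf p(y1 | y3)."""
    cond = {y1: p13[(y1, y3)] / p3[y3] for y1 in range(1, 7) if p13[(y1, y3)] > 0}
    m = sum(y1 * q for y1, q in cond.items())
    v = sum((y1 - m) ** 2 * q for y1, q in cond.items())
    return m, v

means = {y3: cond_moments(y3)[0] for y3 in p3}
variances = {y3: cond_moments(y3)[1] for y3 in p3}

E_cond_var = sum(variances[y3] * p3[y3] for y3 in p3)                     # E[V(Y1 | Y3)]
E_cond_mean = sum(means[y3] * p3[y3] for y3 in p3)                        # E[E(Y1 | Y3)] = E(Y1)
V_cond_mean = sum((means[y3] - E_cond_mean) ** 2 * p3[y3] for y3 in p3)   # V[E(Y1 | Y3)]

print(E_cond_var + V_cond_mean)   # 35/12, which is exactly V(Y1) for a single fair die
print(E_cond_var)                 # strictly smaller than 35/12, illustrating V(Y1) >= E[V(Y1 | Y3)]
```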