Statistics 510: Notes 23
Reading: Section 7.8

I. The Multivariate Normal Distribution

One of the most important joint distributions is the multivariate normal distribution. Let $Z_1, \ldots, Z_n$ be mutually independent standard normal random variables. If, for some constants $a_{ij}$, $1 \le i \le m$, $1 \le j \le n$, and $\mu_i$, $1 \le i \le m$,
$$X_1 = a_{11} Z_1 + \cdots + a_{1n} Z_n + \mu_1$$
$$X_2 = a_{21} Z_1 + \cdots + a_{2n} Z_n + \mu_2$$
$$\vdots$$
$$X_i = a_{i1} Z_1 + \cdots + a_{in} Z_n + \mu_i$$
$$\vdots$$
$$X_m = a_{m1} Z_1 + \cdots + a_{mn} Z_n + \mu_m,$$
then the random variables $X_1, \ldots, X_m$ are said to have a multivariate normal distribution.

Note: Without loss of generality, we can assume that $n \ge m$ (if $n < m$, we can always append extra $Z_j$'s with zero coefficients).

It follows from the fact that the sum of independent normal random variables is itself a normal random variable that each $X_i$ is a normal random variable with mean and variance given by
$$E(X_i) = \mu_i, \qquad \mathrm{Var}(X_i) = \sum_{j=1}^{n} a_{ij}^2.$$

Let's specialize to the case of $m = 2$. This is called the bivariate normal distribution.
$$X_1 = a_{11} Z_1 + a_{12} Z_2 + \mu_1$$
$$X_2 = a_{21} Z_1 + a_{22} Z_2 + \mu_2$$
The means and variances of $X_1$ and $X_2$ are
$$E(X_1) = \mu_1, \qquad \mathrm{Var}(X_1) = a_{11}^2 + a_{12}^2 \equiv \sigma_1^2,$$
$$E(X_2) = \mu_2, \qquad \mathrm{Var}(X_2) = a_{21}^2 + a_{22}^2 \equiv \sigma_2^2.$$
The correlation between $X_1$ and $X_2$ is
$$\rho = \frac{a_{11} a_{21} + a_{12} a_{22}}{\sqrt{(a_{11}^2 + a_{12}^2)(a_{21}^2 + a_{22}^2)}}.$$

Using the method of Chapter 6.7 for calculating the joint pdf of functions of random variables and a lot of messy algebra leads to the conclusion that the joint density of $X_1$ and $X_2$ is
$$f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 - \frac{2\rho(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2 \right] \right\},$$
so the joint density depends only on the means, variances and correlation of $X_1$ and $X_2$.

The algebra (taken from the MathWorld page on the bivariate normal distribution) runs as follows. Let $Z_1$ and $Z_2$ be normally and independently distributed variates with mean 0 and variance 1 (Kenney and Keeping 1951, p. 92), and define $X_1$ and $X_2$ by the linear combinations above, so that $X_1$ and $X_2$ are themselves normally distributed with means $\mu_1, \mu_2$, variances $\sigma_1^2, \sigma_2^2$, and covariance $\rho\sigma_1\sigma_2$. As long as the coefficient matrix is nonsingular, the transformation can be inverted to express $(z_1, z_2)$ in terms of $(x_1, x_2)$; substituting into the joint density $\frac{1}{2\pi} e^{-(z_1^2 + z_2^2)/2}$ of $(Z_1, Z_2)$ and multiplying by the Jacobian of the inverse transformation gives, after simplification, the density displayed above.

Conditional expectations for the bivariate normal distribution:

We will now compute the conditional expectation $E(X_2 \mid X_1 = x_1)$ for $(X_1, X_2)$ having a bivariate normal distribution.
$$E(X_2 \mid X_1 = x_1) = \int_{-\infty}^{\infty} x_2\, f_{X_2 \mid X_1}(x_2 \mid x_1)\, dx_2$$
We now determine the conditional density of $X_2 \mid X_1 = x_1$. In doing so, we will continually collect all factors that do not depend on $x_2$, and represent them by the constants $C_i$. The final constant will then be found by using $\int_{-\infty}^{\infty} f_{X_2 \mid X_1}(x_2 \mid x_1)\, dx_2 = 1$.
$$f_{X_2 \mid X_1}(x_2 \mid x_1) = \frac{f(x_1, x_2)}{f_{X_1}(x_1)} = C_1 f(x_1, x_2)$$
$$= C_2 \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \frac{x_2^2}{\sigma_2^2} - \frac{2\rho\, x_2 (x_1 - \mu_1)}{\sigma_1 \sigma_2} - \frac{2\mu_2 x_2}{\sigma_2^2} \right] \right\}$$
$$= C_3 \exp\left\{ -\frac{1}{2\sigma_2^2(1-\rho^2)} \left[ x_2^2 - 2 x_2 \left( \mu_2 + \rho \frac{\sigma_2}{\sigma_1} (x_1 - \mu_1) \right) \right] \right\}$$
$$= C_4 \exp\left\{ -\frac{1}{2\sigma_2^2(1-\rho^2)} \left[ x_2 - \left( \mu_2 + \rho \frac{\sigma_2}{\sigma_1} (x_1 - \mu_1) \right) \right]^2 \right\}$$
The last expression is proportional to a normal density with mean $\mu_2 + \rho \frac{\sigma_2}{\sigma_1}(x_1 - \mu_1)$ and variance $\sigma_2^2(1 - \rho^2)$. Thus, the conditional distribution of $X_2 \mid X_1 = x_1$ is normal with mean $\mu_2 + \rho \frac{\sigma_2}{\sigma_1}(x_1 - \mu_1)$ and variance $\sigma_2^2(1-\rho^2)$, and
$$E(X_2 \mid X_1 = x_1) = \mu_2 + \rho \frac{\sigma_2}{\sigma_1}(x_1 - \mu_1).$$
Also, by interchanging $X_1$ and $X_2$ in the above calculations, it follows that
$$E(X_1 \mid X_2 = x_2) = \mu_1 + \rho \frac{\sigma_1}{\sigma_2}(x_2 - \mu_2).$$
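Since the conditional-mean formula is the workhorse of the examples that follow, here is a minimal simulation sketch in Python (not part of the original notes) that builds $(X_1, X_2)$ from independent standard normals exactly as in the construction above and checks $E(X_2 \mid X_1 = x_1)$ against the formula. All numerical choices below (the coefficients $a_{ij}$, the means, the conditioning point, and the window width) are arbitrary illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
a11, a12, a21, a22 = 2.0, 0.0, 1.0, 1.5   # arbitrary coefficients
mu1, mu2 = 1.0, -1.0                      # arbitrary means

# Build (X1, X2) from independent standard normals Z1, Z2, as in the notes.
n = 1_000_000
z1 = rng.standard_normal(n)
z2 = rng.standard_normal(n)
x1 = a11 * z1 + a12 * z2 + mu1
x2 = a21 * z1 + a22 * z2 + mu2

# Parameters implied by the construction.
sigma1 = np.hypot(a11, a12)
sigma2 = np.hypot(a21, a22)
rho = (a11 * a21 + a12 * a22) / (sigma1 * sigma2)

# Condition on X1 falling in a narrow window around x1_star and compare the
# empirical mean and variance of X2 with the theoretical conditional values.
x1_star = 2.5
near = np.abs(x1 - x1_star) < 0.05
emp_mean = x2[near].mean()
emp_var = x2[near].var()
theo_mean = mu2 + rho * (sigma2 / sigma1) * (x1_star - mu1)
theo_var = sigma2**2 * (1 - rho**2)
print(f"E[X2 | X1 ~ {x1_star}]: empirical {emp_mean:.3f}, formula {theo_mean:.3f}")
print(f"Var(X2 | X1 ~ {x1_star}): empirical {emp_var:.3f}, formula {theo_var:.3f}")

With a large sample, both empirical values should agree with the formulas to within simulation error.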
Example 1: The statistician Karl Pearson carried out a study on the resemblances between parents and children. He measured the heights of 1078 fathers and sons, and found that the fathers' and sons' joint heights approximately followed a bivariate normal distribution with: mean of the fathers' heights = 5 feet, 9 inches; mean of the sons' heights = 5 feet, 10 inches; standard deviation of the fathers' heights = 2 inches; standard deviation of the sons' heights = 2 inches; correlation between fathers' and sons' heights = 0.5.

(a) Predict the height of the son of a father who is 6'2'' tall.
(b) What is the probability that a father is taller than his son?

Example 2: Regression to the Mean. As part of their training, air force pilots make two practice landings with instructors, and are rated on performance. The instructors discuss the ratings with the pilots after each landing. Statistical analysis shows that pilots who make poor landings the first time tend to do better the second time. Conversely, pilots who make good landings the first time tend to do worse the second time. The conclusion: criticism helps the pilots while praise makes them do worse. As a result, instructors were ordered to criticize all landings, good or bad. Was this warranted by the facts?

Let $X_1$ = rating on the first landing and $X_2$ = rating on the second landing. We consider the following model for the ratings $(X_1, X_2)$ of a random pilot:
$$X_1 = \theta + a Z_1, \qquad X_2 = \theta + a Z_2,$$
where $(\theta, Z_1, Z_2)$ are independent standard normals. Here $\theta$ represents the true skill of the pilot and $(aZ_1, aZ_2)$ are chance errors for each landing. The joint distribution of $(X_1, X_2)$ is bivariate normal with parameters
$$\mu_1 = \mu_2 = 0, \qquad \sigma_1^2 = \sigma_2^2 = 1 + a^2, \qquad \rho = \frac{1}{1 + a^2}.$$
Thus,
$$E(X_2 \mid X_1 = x_1) = \mu_2 + \rho \frac{\sigma_2}{\sigma_1}(x_1 - \mu_1) = \frac{x_1}{1 + a^2}.$$
Thus, even if praise or punishment has no effect, for pilots who did well on the first landing ($x_1 > 0$), we expect them to do worse on the second landing, and for pilots who did poorly on the first landing ($x_1 < 0$), we expect them to do better on the second landing. This is called the regression to the mean effect.
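To make the regression-to-the-mean conclusion concrete, here is a small Python simulation sketch of the pilot model in Example 2 (not part of the original notes). The error scale $a = 1$, the random seed, and the cutoffs defining "good" and "poor" first landings are arbitrary illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)
a = 1.0                      # chance-error scale; rho = 1/(1 + a^2) = 0.5
n = 1_000_000

# X1 = theta + a*Z1, X2 = theta + a*Z2 with theta, Z1, Z2 independent N(0, 1).
theta = rng.standard_normal(n)           # true skill of each pilot
x1 = theta + a * rng.standard_normal(n)  # first-landing rating
x2 = theta + a * rng.standard_normal(n)  # second-landing rating

good = x1 > 1.0    # pilots rated well on the first landing
poor = x1 < -1.0   # pilots rated poorly on the first landing

print(f"rho: theory {1 / (1 + a**2):.3f}, sample {np.corrcoef(x1, x2)[0, 1]:.3f}")
print(f"good first landing: mean X1 = {x1[good].mean():.2f}, mean X2 = {x2[good].mean():.2f}")
print(f"poor first landing: mean X1 = {x1[poor].mean():.2f}, mean X2 = {x2[poor].mean():.2f}")

With $a = 1$, the average second-landing rating in each group is roughly half the group's average first-landing rating, matching $E(X_2 \mid X_1 = x_1) = x_1/(1+a^2)$: the good group gets worse and the poor group gets better even though no praise or criticism enters the model.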