Statistics 510: Notes 23
Reading: Section 7.8
I. The Multivariate Normal Distribution
One of the most important joint distributions is the multivariate normal distribution. Let $Z_1, \ldots, Z_n$ be mutually independent standard normal random variables. If, for some constants $a_{ij}$, $1 \le i \le m$, $1 \le j \le n$, and $\mu_i$, $1 \le i \le m$,
$$X_1 = a_{11} Z_1 + \cdots + a_{1n} Z_n + \mu_1$$
$$X_2 = a_{21} Z_1 + \cdots + a_{2n} Z_n + \mu_2$$
$$\vdots$$
$$X_i = a_{i1} Z_1 + \cdots + a_{in} Z_n + \mu_i$$
$$\vdots$$
$$X_m = a_{m1} Z_1 + \cdots + a_{mn} Z_n + \mu_m,$$
then the random variables $X_1, \ldots, X_m$ are said to have a multivariate normal distribution.
Note: Without loss of generality, we can assume that $n = m$.
It follows from the fact that the sum of independent normal random variables is itself a normal random variable that each $X_i$ is a normal random variable, with mean and variance given by
$$E(X_i) = \mu_i, \qquad \mathrm{Var}(X_i) = \sum_{j=1}^{n} a_{ij}^2.$$
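To make the construction concrete, here is a minimal simulation sketch; the coefficient matrix and means below are arbitrary illustrations, not values from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical coefficients a_ij (m = 2 variables built from n = 3
# standard normals) and means mu_i, chosen only for illustration.
A = np.array([[1.0, 0.5, 0.0],
              [0.3, 1.0, 0.8]])   # m x n
mu = np.array([1.0, -2.0])        # length m

# Draw many samples of Z ~ N(0, I_n) and form X = A Z + mu.
Z = rng.standard_normal((3, 100_000))
X = A @ Z + mu[:, None]

# Each X_i should have mean mu_i and variance sum_j a_ij^2.
print(X.mean(axis=1))             # approximately [ 1.0, -2.0]
print(X.var(axis=1))              # approximately [1.25, 1.73]
print((A**2).sum(axis=1))         # exact:         [1.25, 1.73]
```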
Let's specialize to the case of $m = 2$. This is called the bivariate normal distribution:
$$X_1 = a_{11} Z_1 + a_{12} Z_2 + \mu_1$$
$$X_2 = a_{21} Z_1 + a_{22} Z_2 + \mu_2.$$
The means and variances of $X_1$ and $X_2$ are
$$E(X_1) = \mu_1, \qquad \mathrm{Var}(X_1) = a_{11}^2 + a_{12}^2 \equiv \sigma_1^2,$$
$$E(X_2) = \mu_2, \qquad \mathrm{Var}(X_2) = a_{21}^2 + a_{22}^2 \equiv \sigma_2^2.$$
Because $Z_1$ and $Z_2$ are independent with unit variance, $\mathrm{Cov}(X_1, X_2) = a_{11}a_{21} + a_{12}a_{22}$, so the correlation between $X_1$ and $X_2$ is
$$\rho = \frac{a_{11}a_{21} + a_{12}a_{22}}{\sqrt{a_{11}^2 + a_{12}^2}\,\sqrt{a_{21}^2 + a_{22}^2}}.$$
Using the method of Section 6.7 for calculating the joint pdf of functions of random variables, and a lot of messy algebra, leads to the conclusion that the joint density of $X_1$ and $X_2$ is
$$f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 - \frac{2\rho(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2 \right] \right\},$$
so the joint density depends only on the means, variances, and correlation of $X_1$ and $X_2$.
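As a numerical sanity check, the closed-form density can be compared with scipy's built-in multivariate normal density; the parameter values below are arbitrary illustrations:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Arbitrary illustrative parameters.
mu1, mu2 = 1.0, -2.0
s1, s2, rho = 2.0, 1.5, 0.6

def bivariate_normal_pdf(x1, x2):
    """Closed-form bivariate normal density from the notes."""
    q = (((x1 - mu1) / s1) ** 2
         - 2 * rho * (x1 - mu1) * (x2 - mu2) / (s1 * s2)
         + ((x2 - mu2) / s2) ** 2)
    norm_const = 2 * np.pi * s1 * s2 * np.sqrt(1 - rho ** 2)
    return np.exp(-q / (2 * (1 - rho ** 2))) / norm_const

# The same distribution, parameterized by its covariance matrix.
cov = [[s1 ** 2, rho * s1 * s2],
       [rho * s1 * s2, s2 ** 2]]
mvn = multivariate_normal(mean=[mu1, mu2], cov=cov)

x = (0.5, -1.0)
print(bivariate_normal_pdf(*x))   # the two values agree
print(mvn.pdf(x))
```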
The algebra is shown on the MathWorld page on the bivariate normal distribution (following Kenney and Keeping 1951, p. 92). In outline: write $(X_1, X_2)$ as a linear transformation of independent standard normal variates $(Z_1, Z_2)$; as long as the transformation is nonsingular, invert it to solve for $(Z_1, Z_2)$; compute the Jacobian of the transformation; and substitute into the joint density of $(Z_1, Z_2)$ to obtain the density displayed above.
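In modern matrix notation, the same computation runs as follows (a sketch, not MathWorld's exact equations). Let $X = AZ + \mu$, where $A$ is the (assumed nonsingular) $2 \times 2$ matrix of coefficients $a_{ij}$, so that $Z = A^{-1}(X - \mu)$. The joint density of $(Z_1, Z_2)$ is $f_Z(z) = (2\pi)^{-1}\exp\{-\tfrac{1}{2}z^T z\}$, so the change-of-variables formula gives
$$f_X(x) = \frac{1}{2\pi\,|\det A|} \exp\left\{ -\tfrac{1}{2}(x-\mu)^T (A A^T)^{-1} (x-\mu) \right\}.$$
Here $\Sigma = A A^T$ is the covariance matrix $\begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$, so $|\det A| = \sqrt{\det\Sigma} = \sigma_1\sigma_2\sqrt{1-\rho^2}$, and expanding the quadratic form $(x-\mu)^T \Sigma^{-1} (x-\mu)$ yields the bracketed expression in the density above.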
Conditional expectations for the bivariate normal distribution:
We will now compute the conditional expectation $E(X_2 \mid X_1 = x_1)$ for $(X_1, X_2)$ having a bivariate normal distribution:
$$E(X_2 \mid X_1 = x_1) = \int_{-\infty}^{\infty} x_2\, f_{X_2|X_1}(x_2 \mid x_1)\, dx_2.$$
We now determine the conditional density of $X_2 \mid X_1 = x_1$. In doing so, we will continually collect all factors that do not depend on $x_2$, and represent them by the constants $C_i$. The final constant will then be found by using
$$\int_{-\infty}^{\infty} f_{X_2|X_1}(x_2 \mid x_1)\, dx_2 = 1.$$
$$f_{X_2|X_1}(x_2 \mid x_1) = \frac{f(x_1, x_2)}{f_{X_1}(x_1)} = C_1 f(x_1, x_2)$$
$$= C_2 \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \frac{x_2^2}{\sigma_2^2} - 2x_2 \left( \frac{\rho(x_1-\mu_1)}{\sigma_1\sigma_2} + \frac{\mu_2}{\sigma_2^2} \right) \right] \right\}$$
$$= C_3 \exp\left\{ -\frac{1}{2\sigma_2^2(1-\rho^2)} \left[ x_2^2 - 2x_2 \left( \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x_1-\mu_1) \right) \right] \right\}$$
$$= C_4 \exp\left\{ -\frac{\left[ x_2 - \left( \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x_1-\mu_1) \right) \right]^2}{2\sigma_2^2(1-\rho^2)} \right\}.$$
The last expression is proportional to a normal density with mean $\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x_1-\mu_1)$ and variance $\sigma_2^2(1-\rho^2)$. Thus, the conditional distribution of $X_2 \mid X_1 = x_1$ is normal with mean $\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x_1-\mu_1)$ and variance $\sigma_2^2(1-\rho^2)$, so
$$E(X_2 \mid X_1 = x_1) = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x_1-\mu_1).$$
Also, by interchanging $X_1$ and $X_2$ in the above calculations, it follows that
$$E(X_1 \mid X_2 = x_2) = \mu_1 + \rho\frac{\sigma_1}{\sigma_2}(x_2-\mu_2).$$
Example 1: The statistician Karl Pearson carried out a study on the resemblances between parents and children. He measured the heights of 1078 fathers and sons, and found that the fathers' and sons' joint heights approximately followed a bivariate normal distribution with mean of the fathers' heights = 5 feet, 9 inches; mean of the sons' heights = 5 feet, 10 inches; standard deviation of the fathers' heights = 2 inches; standard deviation of the sons' heights = 2 inches; and correlation between the fathers' and sons' heights = 0.5.
(a) Predict the height of the son of a father who is 6'2'' tall.
(b) What is the probability that a father is taller than his son?
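One way to work both parts numerically (a sketch: part (a) uses the conditional-mean formula above, and part (b) uses the fact that for jointly normal variables the difference $X_1 - X_2$ is normal with mean $\mu_1 - \mu_2$ and variance $\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2$):

```python
from scipy.stats import norm

# Heights in inches: father ~ N(69, 2^2), son ~ N(70, 2^2), rho = 0.5.
mu_f, mu_s, s_f, s_s, rho = 69.0, 70.0, 2.0, 2.0, 0.5

# (a) Predicted son's height given father = 6'2'' = 74 inches.
x = 74.0
pred = mu_s + rho * (s_s / s_f) * (x - mu_f)
print(pred)                      # 72.5 inches, i.e. about 6'0.5''

# (b) D = father - son is normal with mean mu_f - mu_s and
# variance s_f^2 + s_s^2 - 2*rho*s_f*s_s; we want P(D > 0).
mean_d = mu_f - mu_s                            # -1
var_d = s_f**2 + s_s**2 - 2 * rho * s_f * s_s   # 4
print(1 - norm.cdf(0, loc=mean_d, scale=var_d ** 0.5))  # ~0.31
```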
Example 2: Regression to the Mean.
As part of their training, air force pilots make two
practice landings with instructors, and are rated on
performance. The instructors discuss the ratings with the
pilots after each landing. Statistical analysis shows that
pilots who make poor landings the first time tend to do
better the second time. Conversely, pilots who make good
landings the first time tend to do worse the second time.
The conclusion: criticism helps the pilots while praise
makes them do worse. As a result, instructors were ordered
to criticize all landings, good or bad. Was this warranted
by the facts?
Let $X_1$ = rating on the first landing and $X_2$ = rating on the second landing.
We consider the following model for the ratings $(X_1, X_2)$ of a random pilot:
$$X_1 = \theta + a Z_1$$
$$X_2 = \theta + a Z_2$$
where $(\theta, Z_1, Z_2)$ are independent standard normals. $\theta$ represents the true skill of the pilot and $(aZ_1, aZ_2)$ are chance errors for each flight. The joint distribution of $(X_1, X_2)$ is bivariate normal with parameters
$$\mu_1 = \mu_2 = 0, \qquad \sigma_1^2 = \sigma_2^2 = 1 + a^2, \qquad \rho = \frac{1}{1+a^2}.$$
Thus,
$$E(X_2 \mid X_1 = x_1) = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x_1 - \mu_1) = \frac{x_1}{1+a^2}.$$
Since $0 < 1/(1+a^2) < 1$, the predicted second rating lies strictly between $x_1$ and the mean of 0. Thus, even if praise or punishment has no effect, for pilots who did well on the first landing ($x_1 > 0$) we expect a worse rating on the second landing, and for pilots who did poorly on the first landing ($x_1 < 0$) we expect a better rating on the second landing. This is called the regression to the mean effect.
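To see the effect numerically, here is a minimal simulation of the pilot model (the error scale $a = 1$ is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)
a = 1.0                          # arbitrary chance-error scale

# Pilot model from the notes: X_i = theta + a*Z_i, with
# (theta, Z1, Z2) independent standard normals.
n = 1_000_000
theta = rng.standard_normal(n)   # true skill
x1 = theta + a * rng.standard_normal(n)
x2 = theta + a * rng.standard_normal(n)

# Pilots who did well the first time do worse on average the second
# time, and vice versa -- with no feedback in the model at all.
good = x1 > 1.0
poor = x1 < -1.0
print(x1[good].mean(), x2[good].mean())   # second mean pulled toward 0
print(x1[poor].mean(), x2[poor].mean())   # second mean pulled toward 0
```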