1.12 Multivariate Random Variables
We will be using matrix notation to denote multivariate rvs and their distributions.
Denote by X = (X1 , . . . , Xn )T an n-dimensional random vector whose components are random variables. Then, all the definitions given for bivariate rvs extend
to the multivariate case. For example, if X is continuous, then we may write
\[
F_X(x_1, \dots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f_X(u_1, \dots, u_n)\, du_1 \dots du_n
\]
and
\[
P(X \in A) = \int \cdots \int_A f_X(x_1, \dots, x_n)\, dx_1 \dots dx_n,
\]
where A ⊆ 𝒳 and 𝒳 ⊆ Rn is the support of fX.
Example 1.35. Let X = (X1 , X2 , X3 , X4 )T be a four-dimensional random vector
with the joint pdf given by
\[
f_X(x_1, x_2, x_3, x_4) = \frac{3}{4}\left(x_1^2 + x_2^2 + x_3^2 + x_4^2\right) I_{\mathcal{X}}(x_1, x_2, x_3, x_4),
\]
where 𝒳 = {(x1, x2, x3, x4) ∈ R4 : 0 < xi < 1, i = 1, 2, 3, 4}. Calculate:
1. the marginal pdf of (X1, X2);
2. the expectation E(X1 X2);
3. the conditional pdf f(x3, x4 | x1 = 1/3, x2 = 2/3);
4. the probability P(X1 < 1/2, X2 < 3/4, X4 > 1/2).
Solution:
1. Here we have to calculate the double integral of the joint pdf with respect
to x3 and x4 , that is,
\[
f(x_1, x_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_X(x_1, x_2, x_3, x_4)\, dx_3\, dx_4
= \int_{0}^{1}\int_{0}^{1} \frac{3}{4}\left(x_1^2 + x_2^2 + x_3^2 + x_4^2\right) dx_3\, dx_4
= \frac{3}{4}\left(x_1^2 + x_2^2\right) + \frac{1}{2}.
\]
2. By definition of expectation we have
\[
E(X_1 X_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1 x_2 f(x_1, x_2)\, dx_1\, dx_2
= \int_{0}^{1}\int_{0}^{1} x_1 x_2 \left[ \frac{3}{4}\left(x_1^2 + x_2^2\right) + \frac{1}{2} \right] dx_1\, dx_2 = \frac{5}{16}.
\]
3. By definition of a conditional pdf we have,
\[
f(x_3, x_4 \mid x_1, x_2) = \frac{f_X(x_1, x_2, x_3, x_4)}{f(x_1, x_2)}
= \frac{\frac{3}{4}\left(x_1^2 + x_2^2 + x_3^2 + x_4^2\right)}{\frac{3}{4}\left(x_1^2 + x_2^2\right) + \frac{1}{2}}
= \frac{x_1^2 + x_2^2 + x_3^2 + x_4^2}{x_1^2 + x_2^2 + \frac{2}{3}}.
\]
Hence,
\[
f\!\left(x_3, x_4 \,\Big|\, x_1 = \tfrac{1}{3}, x_2 = \tfrac{2}{3}\right)
= \frac{\left(\tfrac{1}{3}\right)^2 + \left(\tfrac{2}{3}\right)^2 + x_3^2 + x_4^2}{\left(\tfrac{1}{3}\right)^2 + \left(\tfrac{2}{3}\right)^2 + \tfrac{2}{3}}
= \frac{5}{11} + \frac{9}{11}\left(x_3^2 + x_4^2\right).
\]
4. Here we use (indirectly) the marginal pdf for (X1 , X2 , X4 ):
\[
P\!\left(X_1 < \tfrac{1}{2},\, X_2 < \tfrac{3}{4},\, X_4 > \tfrac{1}{2}\right)
= \int_{\frac{1}{2}}^{1} \int_{0}^{1} \int_{0}^{\frac{3}{4}} \int_{0}^{\frac{1}{2}} \frac{3}{4}\left(x_1^2 + x_2^2 + x_3^2 + x_4^2\right) dx_1\, dx_2\, dx_3\, dx_4
= \frac{171}{1024}.
\]
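The integrals above are easy to get wrong by hand, so a quick symbolic check can be reassuring. The sketch below is not part of the original notes; it assumes the sympy library is available and simply recomputes the marginal pdf, the expectation and the probability from parts 1, 2 and 4.

```python
# Symbolic check of Example 1.35 (a sketch; assumes sympy is available).
import sympy as sp

x1, x2, x3, x4 = sp.symbols('x1 x2 x3 x4', positive=True)
f = sp.Rational(3, 4) * (x1**2 + x2**2 + x3**2 + x4**2)   # joint pdf on (0,1)^4

# 1. marginal pdf of (X1, X2): integrate out x3 and x4
f12 = sp.integrate(f, (x3, 0, 1), (x4, 0, 1))
print(sp.expand(f12))                                     # 3*x1**2/4 + 3*x2**2/4 + 1/2

# 2. E(X1*X2)
print(sp.integrate(x1 * x2 * f12, (x1, 0, 1), (x2, 0, 1)))  # 5/16

# 4. P(X1 < 1/2, X2 < 3/4, X4 > 1/2)
p = sp.integrate(f,
                 (x1, 0, sp.Rational(1, 2)),
                 (x2, 0, sp.Rational(3, 4)),
                 (x3, 0, 1),
                 (x4, sp.Rational(1, 2), 1))
print(p)                                                  # 171/1024
```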
The following results will be very useful in the second part of this course. They
are extensions of Definition 1.18, Theorem 1.13, and Theorem 1.14, respectively,
to n random variables X1 , X2 , . . . , Xn .
Definition 1.22. Let X = (X1 , X2 , . . . , Xn )T denote a continuous n-dimensional
rv with joint pdf fX (x1 , x2 , . . . , xn ) and marginal pdfs fXi (xi ), i = 1, 2, . . . , n.
The random variables are called mutually independent (or just independent) if
\[
f_X(x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} f_{X_i}(x_i).
\]
In particular, mutual independence implies that all pairs Xi, Xj, i ≠ j, are independent; the converse does not hold in general.
Example 1.36. Suppose that Yi ∼ Exp(λ) independently for i = 1, 2, . . . , n.
Then the joint pdf of Y = (Y1 , Y2 , . . . , Yn )T is
\[
f_Y(y_1, \dots, y_n) = \prod_{i=1}^{n} \lambda e^{-\lambda y_i} = \lambda^n e^{-\lambda \sum_{i=1}^{n} y_i}.
\]
Theorem 1.21. Let X1, X2, . . . , Xn be mutually independent rvs. Then, for gj(Xj), a function of Xj only, j = 1, 2, . . . , m, m ≤ n, we have
\[
E\left( \prod_{j=1}^{m} g_j(X_j) \right) = \prod_{j=1}^{m} E\, g_j(X_j).
\]
Theorem 1.22. Let X = (X1 , X2 , . . . , Xn )T be a vector of mutually independent
rvs with mgfs MX1(t), MX2(t), . . . , MXn(t), and let a1, a2, . . . , an and b1, b2, . . . , bn
be fixed constants. Then the mgf of the random variable Z = (a1 X1 + b1) + · · · + (an Xn + bn) is
\[
M_Z(t) = e^{t \sum_{i=1}^{n} b_i} \prod_{i=1}^{n} M_{X_i}(a_i t).
\]
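Before proving the theorem, it can be illustrated numerically. The following Monte Carlo sketch is not from the notes; it assumes numpy, uses Exp(λ) components (whose mgf is MXi(t) = λ/(λ − t) for t < λ), and picks arbitrary illustrative coefficients to compare a simulated estimate of E e^{tZ} with the product formula.

```python
# Monte Carlo illustration of Theorem 1.22 (a sketch, assuming numpy).
# Xi ~ Exp(lam) independently, so MXi(t) = lam / (lam - t) for t < lam.
import numpy as np

rng = np.random.default_rng(1)
lam = 2.0
a = np.array([0.5, 1.0, 0.25])       # illustrative coefficients a_i
b = np.array([0.1, -0.2, 0.3])       # illustrative constants b_i
t = 0.4                              # must satisfy a_i * t < lam for every i

X = rng.exponential(scale=1 / lam, size=(500_000, 3))
Z = X @ a + b.sum()                  # Z = sum_i (a_i X_i + b_i)

mc = np.exp(t * Z).mean()                                    # Monte Carlo E[e^{tZ}]
formula = np.exp(t * b.sum()) * np.prod(lam / (lam - a * t)) # e^{t sum b_i} prod MXi(a_i t)
print(mc, formula)                   # the two values should be close
```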
Exercise 1.20. Prove Theorem 1.22.
Example 1.37. Calculate the mean and the variance of the random variable Y = X1 + X2 + · · · + Xn, where Xi ∼ Gamma(αi, λ) independently.
First, we will find the mgf of Y and then obtain the first and second moments using this mgf (Theorem 1.7). The Xi are independent, hence, by Theorem 1.22 we have
\[
M_Y(t) = \prod_{i=1}^{n} M_{X_i}(t).
\]
The pdf of a single rv X ∼ Gamma(α, λ) is
\[
f_X(x) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\lambda x}\, I_{[0, \infty)}(x).
\]
Thus, by the definition of the mgf we have
\[
\begin{aligned}
M_X(t) = E\, e^{tX}
&= \frac{\lambda^{\alpha}}{\Gamma(\alpha)} \int_0^{\infty} e^{tx} x^{\alpha - 1} e^{-\lambda x}\, dx \\
&= \frac{\lambda^{\alpha}}{\Gamma(\alpha)} \int_0^{\infty} x^{\alpha - 1} e^{-(\lambda - t)x}\, dx \\
&= \frac{\lambda^{\alpha}}{(\lambda - t)^{\alpha}} \underbrace{\frac{(\lambda - t)^{\alpha}}{\Gamma(\alpha)} \int_0^{\infty} x^{\alpha - 1} e^{-(\lambda - t)x}\, dx}_{=1\ \text{(pdf of a Gamma rv)}} \\
&= \left( \frac{\lambda}{\lambda - t} \right)^{\alpha}
 = \left( 1 - \frac{t}{\lambda} \right)^{-\alpha}, \qquad t < \lambda.
\end{aligned}
\]
Hence,
\[
M_Y(t) = \prod_{i=1}^{n} M_{X_i}(t)
= \prod_{i=1}^{n} \left( 1 - \frac{t}{\lambda} \right)^{-\alpha_i}
= \left( 1 - \frac{t}{\lambda} \right)^{-\sum_{i=1}^{n} \alpha_i}.
\]
This has the same form as the mgf of a Gamma random variable with parameters α1 + · · · + αn and λ, that is,
\[
Y \sim \operatorname{Gamma}\!\left( \sum_{i=1}^{n} \alpha_i,\ \lambda \right).
\]
The mean and variance of a Gamma rv can be obtained by calculating the derivatives of the mgf at t = 0, see Theorem 1.7. For X ∼ Gamma(α, λ) we have
\[
M_X(t) = \left( 1 - \frac{t}{\lambda} \right)^{-\alpha}, \qquad
E\, X = \frac{\alpha}{\lambda}, \qquad
E\, X^2 = \frac{\alpha(\alpha + 1)}{\lambda^2}, \qquad
\operatorname{var}(X) = E\, X^2 - [E\, X]^2 = \frac{\alpha}{\lambda^2}.
\]
Hence, for Y ∼ Gamma(α1 + · · · + αn, λ) we get
\[
E\, Y = \frac{\sum_{i=1}^{n} \alpha_i}{\lambda}
\qquad \text{and} \qquad
\operatorname{var}(Y) = \frac{\sum_{i=1}^{n} \alpha_i}{\lambda^2}.
\]
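A small simulation supports this conclusion. The sketch below is not from the notes; it assumes numpy and uses illustrative values of the αi and λ to compare the sample mean and variance of Y with Σαi/λ and Σαi/λ².

```python
# Simulation check of Example 1.37 (a sketch, assuming numpy; alphas and lam are illustrative).
import numpy as np

rng = np.random.default_rng(2)
alphas = np.array([0.5, 1.0, 2.5])   # shape parameters alpha_i
lam = 1.5                            # common rate parameter lambda

# numpy's Gamma sampler uses a scale parameter, so scale = 1 / lambda
X = rng.gamma(shape=alphas, scale=1 / lam, size=(1_000_000, len(alphas)))
Y = X.sum(axis=1)                    # Y = X1 + X2 + X3

print(Y.mean(), alphas.sum() / lam)      # both ≈ 2.67
print(Y.var(), alphas.sum() / lam**2)    # both ≈ 1.78
```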
The following definition is often used when we consider realizations of rvs (samples) coming from populations having the same distribution.
Definition 1.23. The random variables X1 , X2 , . . . , Xn are identically distributed
if their distribution functions are identical, that is,
\[
F_{X_1}(x) = F_{X_2}(x) = \dots = F_{X_n}(x) \quad \text{for all } x \in \mathbb{R}.
\]
If they are also independent then we denote this briefly as IID, which means Independent, Identically Distributed. For example, the notation
\[
\{X_i\}_{i=1,2,\dots,n} \sim \text{IID}
\]
means that the variables Xi are IID but the type of the distribution is not specified.
We will often use IID normal rvs, denoted by
\[
X_i \overset{\text{iid}}{\sim} N(\mu, \sigma^2), \quad i = 1, 2, \dots, n.
\]
Exercise 1.21. Find the pdf of the random variable
\[
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad \text{where } X_i \overset{\text{iid}}{\sim} N(\mu, \sigma^2),\ i = 1, 2, \dots, n.
\]
1.12.1 Expectation and Variance of Random Vectors
The expectation of a random vector X is a vector of expectations of its components, that is,
\[
E(X) = E \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}
= \begin{pmatrix} E(X_1) \\ E(X_2) \\ \vdots \\ E(X_n) \end{pmatrix}
= \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix} = \mu.
\]
The variance-covariance matrix of X is
\[
V = \operatorname{Var}(X) = E\left[ (X - E(X))(X - E(X))^T \right]
= \begin{pmatrix}
\operatorname{var}(X_1) & \operatorname{cov}(X_1, X_2) & \dots & \operatorname{cov}(X_1, X_n) \\
\operatorname{cov}(X_2, X_1) & \operatorname{var}(X_2) & \dots & \operatorname{cov}(X_2, X_n) \\
\vdots & \vdots & \ddots & \vdots \\
\operatorname{cov}(X_n, X_1) & \operatorname{cov}(X_n, X_2) & \dots & \operatorname{var}(X_n)
\end{pmatrix}. \tag{1.20}
\]
The following theorem shows a basic property of the variance-covariance matrix.
Theorem 1.23. If X is a random vector then its variance-covariance matrix V
is a non-negative definite matrix, that is for any constant vector b the quadratic
form bT V b is non-negative.
Proof. For any constant vector b ∈ Rn we can construct a one-dimensional variable Y = bT X whose variance is
\[
\begin{aligned}
0 \le \operatorname{var}(Y) &= E\,(Y - E(Y))^2 \\
&= E\,(b^T X - E(b^T X))^2 \\
&= E\left[ (b^T X - E(b^T X))(b^T X - E(b^T X))^T \right] \\
&= E\left[ b^T (X - E(X))(X - E(X))^T b \right] \\
&= b^T E\left[ (X - E(X))(X - E(X))^T \right] b \\
&= b^T \operatorname{Var}(X)\, b = b^T V b.
\end{aligned}
\]
That is bT V b ≥ 0 and so V is a non-negative definite matrix.
The proof of the above theorem shows that the variance of a linear combination Y = b1 X1 + · · · + bn Xn of the random variables Xi is a quadratic form in the coefficient vector b, with matrix given by the variance-covariance matrix of X. More generally, if X is an n-dimensional rv, B is an m × n constant matrix and a is a real m × 1 vector, then the expectation and the variance of the random vector
\[
Y = a + BX
\]
are, respectively,
\[
E(Y) = a + B\,E(X) = a + B\mu,
\]
and
\[
\operatorname{Var}(Y) = B \operatorname{Var}(X)\, B^T.
\]
The covariance of two random vectors, n-dimensional X and m-dimensional Y ,
is defined as
\[
\operatorname{Cov}(X, Y) = E\left[ (X - E(X))(Y - E(Y))^T \right].
\]
It is an n × m-dimensional matrix.
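These moment formulas are easy to check by simulation. The sketch below is not part of the notes; it assumes numpy, the choices of µ, V, B and a are illustrative, and the multivariate normal is used only as a convenient distribution with the prescribed mean and covariance.

```python
# Simulation check of E(Y) = a + B mu and Var(Y) = B V B^T (a sketch, assuming numpy).
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -1.0, 0.5])                  # illustrative mean of X
V = np.array([[2.0, 0.3, 0.0],
              [0.3, 1.0, -0.2],
              [0.0, -0.2, 0.5]])                 # illustrative covariance of X
B = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])                 # 2 x 3 constant matrix
a = np.array([0.5, -0.5])                        # 2 x 1 constant vector

X = rng.multivariate_normal(mu, V, size=200_000) # any distribution with (mu, V) would do
Y = a + X @ B.T                                  # Y = a + B X, row by row

print(Y.mean(axis=0), a + B @ mu)                # sample mean vs a + B mu
print(np.cov(Y, rowvar=False))                   # sample covariance ...
print(B @ V @ B.T)                               # ... vs B V B^T
```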
1.12.2 Joint Moment Generating Function
Definition 1.24. Let X = (X1 , X2 , . . . , Xn )T be a random vector. We define the
joint mgf as
\[
M_X(t) = E\, e^{t^T X},
\]
where t = (t1 , t2 , . . . , tn )T is an n-dimensional argument of M.
As in the univariate case, there is a one-to-one relationship between the joint
pdf and the joint mgf. The mgf related to a marginal distribution of a subset of
variables Xi1 , . . . , Xis can be obtained by setting tj = 0 for all j not in the set
{i1 , . . . , is }.
Note also that if the variables X1 , X2 , . . . , Xn are mutually independent, then the
joint mgf is a product of the marginal mgfs, that is
\[
M_X(t) = E\, e^{t^T X} = E\, e^{\sum_{j=1}^{n} t_j X_j}
= E \prod_{j=1}^{n} e^{t_j X_j}
= \prod_{j=1}^{n} M_{X_j}(t_j).
\]
Another useful property of the joint mgf is given in the following theorem.
Theorem 1.24. Let X = (X1 , X2 , . . . , Xn )T be a random vector. If the joint mgf
of X can be written as a product of some functions gj (tj ), j = 1, 2, . . . , n, that is
\[
M_X(t) = \prod_{j=1}^{n} g_j(t_j),
\]
then the variables X1 , X2 , . . . , Xn are independent.
Proof. Let ti = 0 for all i ≠ j. Then the marginal mgf MXj(tj) is
\[
M_{X_j}(t_j) = g_j(t_j) \prod_{i \ne j} g_i(0).
\]
Also, note that if ti = 0 for all i = 1, 2, . . . , n, then
\[
M_X(t) = E\, e^{\sum_{j=1}^{n} t_j X_j} = E\, e^{0} = 1.
\]
This gives
\[
1 = M_X(t) = \prod_{j=1}^{n} g_j(0)
\quad \Rightarrow \quad
\prod_{i \ne j} g_i(0) = \frac{1}{g_j(0)}.
\]
Therefore,
\[
M_{X_j}(t_j) = \frac{g_j(t_j)}{g_j(0)},
\]
and hence
\[
M_X(t) = \prod_{j=1}^{n} g_j(t_j)
= \prod_{j=1}^{n} g_j(0)\, M_{X_j}(t_j)
= 1 \times \prod_{j=1}^{n} M_{X_j}(t_j).
\]
Thus the joint mgf is the product of the marginal mgfs MXj(tj) = gj(tj)/gj(0). By the uniqueness of the relationship between mgfs and distributions, the joint pdf can therefore be written as a product of the marginal pdfs, and hence the random variables X1, X2, . . . , Xn are independent.
1.12.3 Transformations of Random Vectors
Let X = (X1 , X2 , . . . , Xn )T be a continuous random vector and let g : Rn → Rn
be a one-to-one and onto function denoted by
\[
g(x) = (g_1(x), g_2(x), \dots, g_n(x))^T,
\]
where x = (x1 , x2 , . . . , xn )T and gi : Rn → R. Then, for a transformed random
vector Y = g(X) we have the following result.
Theorem 1.25. The density of Y = g(X) is given by
\[
f_Y(y) = f_X\big(h(y)\big)\, J_h(y),
\]
where h(y) = g −1 (y) and Jh (y) denotes the absolute value of the Jacobian determinant
\[
J_h(y) = \left| \det \frac{\partial}{\partial y}\, h(y) \right|
= \left| \det \begin{pmatrix}
\frac{\partial h_1(y)}{\partial y_1} & \frac{\partial h_1(y)}{\partial y_2} & \dots & \frac{\partial h_1(y)}{\partial y_n} \\
\frac{\partial h_2(y)}{\partial y_1} & \frac{\partial h_2(y)}{\partial y_2} & \dots & \frac{\partial h_2(y)}{\partial y_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial h_n(y)}{\partial y_1} & \frac{\partial h_n(y)}{\partial y_2} & \dots & \frac{\partial h_n(y)}{\partial y_n}
\end{pmatrix} \right|.
\]
Another useful form of the Jacobian is
\[
J_h(y) = \left| J_g\big(h(y)\big) \right|^{-1},
\]
where
\[
J_g(x) = \det \frac{\partial}{\partial x}\, g(x).
\]
Exercise 1.22. Let A be a non-singular n × n real matrix and let X be an n-dimensional random vector. Show that the linearly transformed random variable Y = AX has the joint pdf given by
\[
f_Y(y) = \frac{1}{|\det A|}\, f_X\!\left(A^{-1} y\right).
\]
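This change-of-variables formula can also be illustrated by Monte Carlo. In the sketch below (not part of the notes; it assumes numpy, and the matrix A, the Exp(1) components of X and the evaluation point are illustrative choices) the empirical probability of a small box around a point y0 is compared with fX(A⁻¹y0)/|det A| times the area of the box.

```python
# Monte Carlo sanity check of Exercise 1.22 (a sketch, assuming numpy; not a proof).
# X has independent Exp(1) components, so fX(x) = exp(-x1 - x2) for x1, x2 > 0.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.0, 1.0],
              [0.5, 1.5]])                      # any non-singular matrix (illustrative)
X = rng.exponential(scale=1.0, size=(1_000_000, 2))
Y = X @ A.T                                     # rows are A x for each sample x

y0 = np.array([2.0, 1.5])                       # point at which to check the density
h = 0.1                                         # side length of the small box around y0

x0 = np.linalg.solve(A, y0)                     # A^{-1} y0
f_Y = np.exp(-x0.sum()) / abs(np.linalg.det(A)) # fX(A^{-1} y0) / |det A|

in_box = np.all(np.abs(Y - y0) < h / 2, axis=1)
print("empirical:", in_box.mean() / h**2)       # ≈ fY(y0)
print("formula:  ", f_Y)
```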
1.12.4 Multivariate Normal Distribution
A random vector X has a multivariate normal distribution if its joint pdf can be written as
\[
f_X(x_1, \dots, x_n) = \frac{1}{(2\pi)^{n/2} \sqrt{\det V}}
\exp\!\left\{ -\frac{1}{2} (x - \mu)^T V^{-1} (x - \mu) \right\},
\]
where the mean is
\[
\mu = (\mu_1, \dots, \mu_n)^T,
\]
and the variance-covariance matrix V has the form (1.20).
Exercise 1.23. Use the result from Exercise 1.22 to show that if X ∼ Nn(µ, V) then Y = AX has an n-dimensional normal distribution with expectation Aµ and variance-covariance matrix AV A^T.
Lemma 1.3. If X ∼ Nn (µ, V ), B is an m × n matrix, and a is a real m × 1
vector, then the random vector
Y = a + BX
is also multivariate normal with
E(Y ) = a + B E(X) = a + Bµ,
and the variance-covariance matrix
\[
V_Y = B V B^T.
\]
Note that taking B = b^T, where b is an n × 1 vector, and a = 0, we obtain
\[
Y = b^T X = b_1 X_1 + \dots + b_n X_n,
\]
and
\[
Y \sim N\!\left(b^T \mu,\ b^T V b\right).
\]
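As a final check, the sketch below (not part of the notes; it assumes numpy, and µ, V and b are illustrative) simulates X ∼ N2(µ, V) and compares the sample mean and variance of Y = b^T X with b^Tµ and b^TVb.

```python
# Simulation check that b^T X has mean b^T mu and variance b^T V b for X ~ Nn(mu, V)
# (a sketch, assuming numpy; mu, V and b are illustrative).
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([0.0, 1.0])
V = np.array([[1.0, 0.5],
              [0.5, 2.0]])
b = np.array([2.0, -1.0])

X = rng.multivariate_normal(mu, V, size=500_000)
Y = X @ b                                # Y = b^T X for each sample

print(Y.mean(), b @ mu)                  # both ≈ -1.0
print(Y.var(), b @ V @ b)                # both ≈ 4.0
```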