Probability Theory
Chapter 5
The multivariate normal distribution
Thommy Perlinger

Linear transformations

A transformation is said to be linear if every single function in the transformation is a linear combination. When dealing with linear transformations it is convenient to use matrix notation. A linear transformation can then be written as Y = BX + b, where B is a constant matrix and b is a constant vector.
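The displayed equation on the slide is not in the transcript; written out componentwise (a standard expansion, with B an m×n matrix), the transformation reads

    Y_i = \sum_{j=1}^{n} b_{ij} X_j + b_i, \qquad i = 1, \dots, m.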
The mean vector and the covariance matrix
Definition 2.1. Let X be a random n-vector whose components have finite variance. The mean vector of X is µ = E(X), and the covariance matrix of X is Λ = E[(X − µ)(X − µ)′].

When dealing with random vectors and matrices, expectations are taken componentwise. That is, the elements of the mean vector are the means of the components of X, and for the covariance matrix of X it follows that the element in row i and column j is the covariance between X_i and X_j.
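The slides' displayed equations are not reproduced in the transcript; in standard notation the componentwise statements are

    \mu_i = E(X_i), \qquad i = 1, \dots, n,
    \lambda_{ij} = E\big[(X_i - \mu_i)(X_j - \mu_j)\big] = \mathrm{Cov}(X_i, X_j), \qquad i, j = 1, \dots, n.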
Expectations for linear transformations

Theorem 2.2. Let X be a random n-vector with mean vector µ and covariance matrix Λ. Further, let B be an m×n matrix, let b be a constant m-vector, and set Y = BX + b. Then E(Y) = Bµ + b and the covariance matrix of Y is BΛB′.

Proof (the covariance matrix). The result follows because multiplicative constant matrices can be moved outside of the expectation; the computation is sketched below.
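The displayed steps are images on the slide; a standard version of the computation, using E(Y) = Bµ + b:

    \Lambda_Y = E\big[(Y - E Y)(Y - E Y)'\big]
              = E\big[B(X - \mu)(X - \mu)'B'\big]
              = B\,E\big[(X - \mu)(X - \mu)'\big]\,B'
              = B \Lambda B'.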
The multivariate normal distribution
Definition I

Definition I. The random n-vector X is normal iff, for every n-vector a, the linear combination a′X is (univariate) normal.

Notation. The notation X ∈ N(µ, Λ) is used to denote that X has a multivariate normal distribution with mean vector µ and covariance matrix Λ.

Theorem 3.1. Let X ∈ N(µ, Λ) and set Y = BX + b. Then Y ∈ N(Bµ + b, BΛB′).

Proof. The correctness of the mean vector and the covariance matrix follows directly from Theorem 2.2. Next we prove that every linear combination of Y is normal by showing that a linear combination of Y is another linear combination of X.

Exercise 5.3.2

Let X = (X1, X2)′ be a normal random vector with the mean vector and covariance matrix given in the exercise. What is the joint distribution of Y1 = X1 + X2 and Y2 = 2X1 − 3X2? Since Y = BX is a linear transformation of X, it follows from Theorem 3.1 that Y is bivariate normal with mean vector Bµ and covariance matrix BΛB′.
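The exercise's specific mean vector and covariance matrix are not reproduced in the transcript, but the transformation matrix follows directly from the definitions of Y1 and Y2; a sketch of the application of Theorem 3.1:

    B = \begin{pmatrix} 1 & 1 \\ 2 & -3 \end{pmatrix}, \qquad
    Y = BX \in N\big(B\mu,\; B\Lambda B'\big).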
The multivariate normal distribution
Definition II: Transforms

The moment generating function of a random vector X is given by ψ_X(t) = E(e^{t′X}).

Definition II. The random vector X is normal, N(µ, Λ), iff its moment generating function is of the form given below.

Theorem 4.2. Definition I and Definition II are equivalent.

The meaning. If every linear combination of X is univariate normal, then the moment generating function of X is of the given form. If, on the other hand, the moment generating function of X is of the given form, then every linear combination of X is univariate normal.

Proof of Theorem 4.2
Definition I implies Definition II

Let X be N(µ, Λ) by Definition I. The mgf of X is given by ψ_X(t) = E(e^{t′X}), and since Y = t′X is a linear combination of X, it follows from Definition I that Y is (univariate) normal and therefore has a moment generating function. Furthermore, it follows from Theorem 2.2 that E(Y) = t′µ and Var(Y) = t′Λt. Hence the mgf of X has exactly the form required by Definition II, and the first part of the proof is established.
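The displayed equations are images on the slides; the standard forms, which is what the surrounding text describes, are

    \psi_X(t) = E\big(e^{t'X}\big) = \exp\!\Big(t'\mu + \tfrac{1}{2}\,t'\Lambda t\Big) \quad \text{(the form in Definition II)},

and, for the proof that Definition I implies Definition II,

    E\big(e^{t'X}\big) = E\big(e^{Y}\big) = \psi_Y(1)
    = \exp\!\Big(E(Y) + \tfrac{1}{2}\operatorname{Var}(Y)\Big)
    = \exp\!\Big(t'\mu + \tfrac{1}{2}\,t'\Lambda t\Big).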
Properties of non-negative definite symmetric matrices

Definition. A symmetric matrix A is said to be positive-definite if for all x ≠ 0 the quadratic form x′Ax is positive. If for all x the quadratic form x′Ax is non-negative, then A is said to be nonnegative-definite (or positive-semidefinite).

Theorem 2.1. Every covariance matrix Λ is nonnegative-definite.

Proof. Let X be a random vector whose covariance matrix is Λ, and now study the linear combination y′X. By Theorem 2.2, Var(y′X) = y′Λy, and since a variance is always non-negative it follows that y′Λy ≥ 0 for every y, and the theorem is proved.

Properties of symmetric matrices

Orthogonal matrices. A square matrix C is an orthogonal matrix if C′C = I, where I is the identity matrix. It follows that the rows (and columns) of an orthogonal matrix are orthonormal, that is, they all have unit length and they are pairwise orthogonal.

Diagonal matrices. A symmetric matrix D is a diagonal matrix if the diagonal elements are the only non-zero elements of D.

Diagonalization. Let A be a symmetric matrix. Then there exists an orthogonal matrix C and a diagonal matrix D such that A = CDC′. Furthermore, the diagonal elements of D are the eigenvalues of A.

The square root. Let A be a nonnegative-definite symmetric matrix. The square root of A is a matrix (usually denoted) A^{1/2} where A^{1/2} A^{1/2} = A. It follows from the diagonalization of A that A^{1/2} = CD^{1/2}C′.
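A one-line check (using C′C = I) that CD^{1/2}C′ really is a square root of A:

    A^{1/2} A^{1/2} = C D^{1/2} C'\, C D^{1/2} C' = C D^{1/2} D^{1/2} C' = C D C' = A.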
Proof of Theorem 4.2
Definition II implies Definition I

Let Y1, …, Yn be independent N(0,1); that is, Y = (Y1, …, Yn)′ is N(0, I) by Definition I. Next we let X = Λ^{1/2}Y + µ, and since this is a linear transformation of Y, it follows from Theorem 2.2 that E(X) = µ and that the covariance matrix of X is Λ^{1/2}IΛ^{1/2} = Λ.

The moment generating function of X, computed below from the moment generating function of Y, is exactly the mgf given in Definition II. Since a′X = (a′Λ^{1/2})Y + a′µ, it is clear that any linear combination of X is another linear combination of Y, which means that X is normal, N(µ, Λ), according to Definition I.
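The two displayed mgf computations are images on the slides; standard reconstructions:

    \psi_Y(t) = \prod_{i=1}^{n} E\big(e^{t_i Y_i}\big) = \prod_{i=1}^{n} e^{t_i^2/2} = \exp\!\Big(\tfrac{1}{2}\,t't\Big),

    \psi_X(t) = E\big(e^{t'(\Lambda^{1/2}Y + \mu)}\big) = e^{t'\mu}\,\psi_Y\big(\Lambda^{1/2}t\big)
              = \exp\!\Big(t'\mu + \tfrac{1}{2}\,t'\Lambda t\Big).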
Problem 5.10.30 (part 1)

Let X1, X2, and X3 have the joint moment generating function given in the problem. Find the joint distribution of Y1 = X1 + X3 and Y2 = X1 + X2, that is, the distribution of the linear transformation Y = BX, where B is the 2×3 matrix with rows (1, 0, 1) and (1, 1, 0).

By Definition II it follows that X1, X2, and X3 are jointly normal, with mean vector µ and covariance matrix Λ read off from the exponent of the mgf. Since Y = BX is a linear transformation of a normal random vector, it follows from Theorem 3.1 that Y ∈ N(Bµ, BΛB′).
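Written out, the transformation only restates the definitions of Y1 and Y2 (µ and Λ are the mean vector and covariance matrix read off from the mgf, which is not reproduced in the transcript):

    Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}
      = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}
        \begin{pmatrix} X_1 \\ X_2 \\ X_3 \end{pmatrix},
    \qquad Y \in N\big(B\mu,\; B\Lambda B'\big).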
Important properties of determinants

1. A square matrix A is invertible iff det A ≠ 0.
2. For the identity matrix I we have that det I = 1.
3. For the transpose of A we have that det A = det A′.
4. Let A and B be square matrices. Then det AB = det A · det B.
5. Results 2. and 4. now imply that det A^{-1} = (det A)^{-1}.
6. Let C be an orthogonal matrix. Results 2., 3., and 4. now imply that det C = ±1.
7. Since a symmetric matrix A can be diagonalized as A = CDC′, it follows by results 4. and 6. that det A = det D = λ1·λ2···λn, where λ1, λ2, …, λn are the eigenvalues of A.

The multivariate normal distribution
Definition III: The density function

Definition III. The random vector X is normal, N(µ, Λ) (where det Λ > 0), iff its density function is of the form given below.

Theorem 5.2. Definitions I, II, and III are equivalent (in the nonsingular case).
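The density in Definition III is an image on the slide; the standard nonsingular multivariate normal density, which is what Definition III refers to, is

    f_X(x) = \frac{1}{(2\pi)^{n/2}\,(\det\Lambda)^{1/2}}\,
             \exp\!\Big(-\tfrac{1}{2}\,(x-\mu)'\Lambda^{-1}(x-\mu)\Big), \qquad x \in \mathbb{R}^n.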
Proof of Theorem 5.2

Idea for the proof. First we find a normal random vector Y whose density function is easy to derive. Then a suitably defined linear transformation X = BY + b will be N(µ, Λ). Finally the transformation theorem (Theorem 1.2.1) will give us the density function of X.

Step 1. Find a normal random vector Y whose density function is easy to derive. Let Y1, …, Yn be independent N(0,1). Then, by Definition I, Y = (Y1, …, Yn)′ is N(0, I), and its density function is easy to write down.
Step 2. We know from before that X = Λ^{1/2}Y + µ is N(µ, Λ).
Step 3. Find the density function of X. Recall Theorem 1.2.1.
Step 3.1. Inversion yields that Y = Λ^{-1/2}(X − µ).
Step 3.2. Since the transformation is linear, the Jacobian is constant.
Step 3.3. Finally, it follows from Theorem 1.2.1 that X has the density given in Definition III. The density of Y, the Jacobian, and the resulting density of X are written out below.
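The expressions referred to in Steps 1–3.3 are images on the slides; standard reconstructions under the setup above:

    f_Y(y) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}}\, e^{-y_i^2/2}
           = \frac{1}{(2\pi)^{n/2}}\, \exp\!\Big(-\tfrac{1}{2}\, y'y\Big),

    |J| = \det\Lambda^{-1/2} = (\det\Lambda)^{-1/2},

    f_X(x) = f_Y\big(\Lambda^{-1/2}(x-\mu)\big)\,|J|
           = \frac{1}{(2\pi)^{n/2}(\det\Lambda)^{1/2}}\,
             \exp\!\Big(-\tfrac{1}{2}\,(x-\mu)'\Lambda^{-1}(x-\mu)\Big).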
Problem 5.10.30 (part 2)
In the first part of the problem we found the distribution of Y = (Y1, Y2)′, including its covariance matrix Λ. Since det Λ = 4·10 − 2·2 = 36 and the inverse Λ^{-1} can be written out, it follows from Definition III that the density of Y is given by the corresponding bivariate normal density (a sketch is given below).
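The matrices themselves are images on the slides. The determinant computation 4·10 − 2·2 = 36, together with the conditional variance 18/5 found in part 3, indicates the following covariance matrix for Y; this is an inference from the transcript, not a reproduction of the slide:

    \Lambda_Y = \begin{pmatrix} 4 & 2 \\ 2 & 10 \end{pmatrix}, \qquad
    \Lambda_Y^{-1} = \frac{1}{36}\begin{pmatrix} 10 & -2 \\ -2 & 4 \end{pmatrix}, \qquad
    \det\Lambda_Y = 36.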
Conditional distributions

General situation. Let X be N(µ, Λ) with det Λ > 0. Furthermore, let X1 and X2 be subvectors of X, where the components of X1 and X2 are assumed to be different. By definition, the conditional density of X2 given X1 = x1 is the joint density of X divided by the marginal density of X1.

Can anything be said about the distribution of X2 | X1 = x1? Answer: yes. Conditional distributions of multivariate normal distributions are normal.
Problem 5.10.30 (part 3)

Find the conditional density of Y1 given that Y2 = 1, that is, find f_{Y1|Y2=1}(y1). Dividing the joint density of (Y1, Y2)′ from part 2 by the marginal density of Y2 and simplifying gives the conditional density. Hence, the conditional distribution of Y1 given that Y2 = 1 is N(4/5, 18/5).
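The intermediate computation is an image on the slide. For a bivariate normal vector the standard conditional-distribution formulas give, with σ11, σ12, σ22 denoting the entries of the covariance matrix of (Y1, Y2)′ and µ1, µ2 its means,

    E(Y_1 \mid Y_2 = y_2) = \mu_1 + \frac{\sigma_{12}}{\sigma_{22}}\,(y_2 - \mu_2), \qquad
    \operatorname{Var}(Y_1 \mid Y_2 = y_2) = \sigma_{11} - \frac{\sigma_{12}^2}{\sigma_{22}}.

With σ11 = 4, σ12 = 2, σ22 = 10 (the values inferred in part 2), the conditional variance is 4 − 4/10 = 18/5, which matches the stated result N(4/5, 18/5).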
Independence

Natural question 1. Is there an easy way to determine whether the components of a normal random vector are independent?

Theorem 7.1. Let X be a normal random vector. The components of X are independent iff they are uncorrelated.

Proof. Show that uncorrelated components imply independence. Since the components are uncorrelated, the covariance matrix Λ is diagonal; it follows that the moment generating function of X factors into a product of the marginal moment generating functions, which means that the components of X are independent.
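The factorization itself is an image on the slide; a standard version, with Λ diagonal and diagonal elements σ1², …, σn²:

    \psi_X(t) = \exp\!\Big(t'\mu + \tfrac{1}{2}\,t'\Lambda t\Big)
              = \prod_{i=1}^{n} \exp\!\Big(t_i\mu_i + \tfrac{1}{2}\,\sigma_i^2 t_i^2\Big)
              = \prod_{i=1}^{n} \psi_{X_i}(t_i).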
Problem 5.10.10

Suppose that the moment generating function of (X,Y)′ is as given in the problem (the expression involves a constant a). Determine a so that U = X + 2Y and V = 2X − Y become independent.

Since the mgf is of the form in Definition II, it follows from Definition II that (X,Y)′ is bivariate normal, with mean vector and covariance matrix read off from the exponent.

Since (U,V)′ is a linear transformation of (X,Y)′, it is clear that (U,V)′ is also bivariate normal. The covariance matrix of (U,V)′ is given by BΛB′, where B is the matrix of the transformation. It is, however, by Theorem 7.1, enough to determine an off-diagonal element, and it is thus clear that only for a = 4/3 will U and V be independent.
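The matrices are images on the slides. The transformation matrix follows from the definitions of U and V, and the off-diagonal element can be written out in terms of the (co)variances of X and Y, which in the problem depend on a (a detail not reproduced in the transcript):

    \begin{pmatrix} U \\ V \end{pmatrix} = B \begin{pmatrix} X \\ Y \end{pmatrix}, \quad
    B = \begin{pmatrix} 1 & 2 \\ 2 & -1 \end{pmatrix}, \quad
    \operatorname{Cov}(U, V) = 2\operatorname{Var}(X) + 3\operatorname{Cov}(X, Y) - 2\operatorname{Var}(Y).

Setting Cov(U, V) = 0 and inserting the values from the mgf yields a = 4/3.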
Independence and linear transformations

Natural question 2. A linear transformation of a normal random vector is itself normal. Is it always possible to find a linear transformation that will have uncorrelated, and hence independent, components?

Theorem 8.1. Let X be N(µ, Λ). Furthermore, let C be the orthogonal matrix that diagonalizes Λ, that is, C′ΛC = D, where the diagonal elements of D are the eigenvalues of Λ. Then Y = C′X is N(C′µ, D).

Theorem 8.2. Let X be N(µ, σ²I). Furthermore, let C be an arbitrary orthogonal matrix. Then Y = C′X is N(C′µ, σ²I).

Conclusion. For the general N(µ, Λ) there always exists an orthogonal transformation that will yield a normal random vector with independent components. For the special case N(µ, σ²I) any orthogonal transformation will produce a normal random vector with independent components.
Problem 5.10.9 b
Let X and Y be independent N(0, σ²). Show that X + Y and X − Y are independent normal random variables.

Since X and Y are independent, we have that (X,Y)′ is bivariate normal, N(0, σ²I). Furthermore, (X+Y, X−Y)′ = B(X,Y)′, where B has rows (1, 1) and (1, −1), and because of the fact that C = B/√2 is an orthogonal matrix, so that (X+Y, X−Y)′ = √2·C(X,Y)′, it follows from Theorem 8.2 that the components of (X+Y, X−Y)′ are independent normal random variables.
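An equivalent check via Theorem 7.1 (a sketch; the slide's own displayed computation is not reproduced in the transcript): the covariance matrix of (X+Y, X−Y)′ is diagonal, so the components are uncorrelated and hence independent:

    \operatorname{Cov}\!\begin{pmatrix} X+Y \\ X-Y \end{pmatrix}
     = B\,(\sigma^2 I)\,B'
     = \sigma^2 \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
     = \begin{pmatrix} 2\sigma^2 & 0 \\ 0 & 2\sigma^2 \end{pmatrix}.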
Problem 5.10.37

Let (X,Y)′ have the bivariate normal distribution given in the problem, where ρ is the correlation coefficient. Determine the probability distribution of the random variable W defined in the problem. The moment generating function of W is defined by ψ_W(t) = E(e^{tW}), and in order to find it we first have to find the joint density of X and Y.

Since det Λ = 1 − ρ² and Λ^{-1} can be written out explicitly, it follows that the joint density function of X and Y is given by the bivariate normal density of Definition III. It then follows from the density of (X,Y)′ that the main part of the expression for the moment generating function of W is given by Q, where Q is a quadratic form in x and y.

Since Q is the main part of a multivariate normal density function, with a covariance matrix that depends on t, the remaining integral equals the corresponding normalizing constant. It follows that the moment generating function of W is that of a chi-square distribution with two degrees of freedom, and it is clear that W is χ²(2).
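The specific distribution of (X,Y)′ and the definition of W are images on the slides and are not reproduced in the transcript. A sketch under the natural assumption, consistent with det Λ = 1 − ρ² and with the χ²(2) answer, that X and Y are standard normal with correlation ρ and that W is the quadratic form (X, Y)Λ^{-1}(X, Y)′:

    \Lambda = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}, \qquad
    \Lambda^{-1} = \frac{1}{1-\rho^2}\begin{pmatrix} 1 & -\rho \\ -\rho & 1 \end{pmatrix}, \qquad
    W = \frac{X^2 - 2\rho XY + Y^2}{1-\rho^2},

    f_{X,Y}(x,y) = \frac{1}{2\pi\sqrt{1-\rho^2}}\,
                   \exp\!\Big(-\tfrac{1}{2}\,\frac{x^2 - 2\rho xy + y^2}{1-\rho^2}\Big),

    \psi_W(t) = E\big(e^{tW}\big) = \frac{1}{1-2t}, \qquad t < \tfrac{1}{2},

which is the mgf of the χ²(2) distribution.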
The multivariate normal distribution
and the Chi-square distribution

Theorem 9.1. Let X be N(µ, Λ) with det Λ > 0. Then (X − µ)′Λ^{-1}(X − µ) ∈ χ²(n), where n is the dimension of X.

Proof. Set Y = Λ^{-1/2}(X − µ). Then Y is N(0, I), and it follows that (X − µ)′Λ^{-1}(X − µ) = Y′Y, and since Y′Y = Y1² + Y2² + ⋯ + Yn², where Y1, Y2, …, Yn are i.i.d. N(0,1), it is clear that Y′Y is χ²(n).