The weak law of large numbers
The central limit theorem
Gaussian random vectors
Covariance matrices
The multidimensional gaussian law
Multidimensional gaussian density
Marginal distributions
Eigenvalues of the covariance matrix
Uncorrelation and independence
Linear combinations
Conditional densities
October 11, 2011
The weak law of large numbers

Theorem
Let X1, X2, ..., Xk, ... be a sequence of independent and identically distributed r.v., each having E(Xk) = m and finite variance. Let

  X̄n = (X1 + X2 + ··· + Xn)/n,   n ≥ 1.

Then, for all ε > 0,

  lim_{n→∞} P(|X̄n − m| ≥ ε) = 0.

The r.v. X̄n has mean m:

  E(X̄n) = E((X1 + X2 + ··· + Xn)/n) = (E(X1) + E(X2) + ··· + E(Xn))/n = nm/n = m.
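A quick way to see the theorem at work is to simulate it. The following minimal Python/NumPy sketch (not part of the original notes; Uniform(0,1) data are assumed, so m = 1/2) prints X̄n for increasing n:

# Illustrative sketch: the sample mean X̄n of i.i.d. Uniform(0,1) draws approaches m = 0.5.
import numpy as np

rng = np.random.default_rng(0)
m = 0.5
for n in (10, 100, 10_000, 1_000_000):
    x = rng.uniform(0.0, 1.0, size=n)
    xbar = x.mean()                                   # X̄n = (X1 + ... + Xn) / n
    print(f"n = {n:>9}   X̄n = {xbar:.5f}   |X̄n - m| = {abs(xbar - m):.5f}")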
If Var(Xk) = σ², then the variance of X̄n is σ²/n:

  Var(X̄n) = E((X̄n − m)²)
           = (1/n²) E( ( Σ_{k=1}^n (Xk − m) )² )
           = (1/n²) [ Σ_{k=1}^n E((Xk − m)²) + 2 Σ_{1≤i<j≤n} E((Xi − m)(Xj − m)) ]
           = σ²/n.

Notice that E((Xi − m)(Xj − m)) = 0 because the r.v. Xi are independent and, therefore, pairwise uncorrelated.

By Chebyshev's inequality:

  P(|X̄n − m| ≥ ε) ≤ σ²/(n ε²).

Hence, as n → ∞,

  P(|X̄n − m| ≥ ε) → 0.
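The bound above can be checked numerically. A small sketch (assumptions: Uniform(0,1) data, so m = 1/2 and σ² = 1/12; the sample size and ε are arbitrary choices):

# Illustrative check of P(|X̄n - m| ≥ ε) ≤ (σ²/n)/ε² by repeated simulation.
import numpy as np

rng = np.random.default_rng(1)
m, var = 0.5, 1.0 / 12.0          # mean and variance of Uniform(0,1)
eps, n, trials = 0.02, 400, 10_000

xbars = rng.uniform(0.0, 1.0, size=(trials, n)).mean(axis=1)   # one X̄n per trial
empirical = np.mean(np.abs(xbars - m) >= eps)
bound = var / (n * eps ** 2)
print(f"empirical tail = {empirical:.4f}   Chebyshev bound = {bound:.4f}")

The empirical tail probability is typically well below the Chebyshev bound; the bound is crude, but it is enough to prove the limit.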
Example: Probability as the limit of the relative frequency

- Let A be a given event with P(A) = p.
- Let us repeat the random experiment n times (independent repetitions).
- Let Xk be the indicator of the event "A happens in the k-th repetition". Hence, E(Xk) = P(A) = p.
- Then,

    X̄n = (X1 + X2 + ··· + Xn)/n = fn(A)

  is the relative frequency of the event A.
- Therefore, for all ε > 0,

    P(|fn(A) − p| ≥ ε) → 0   as n → ∞.

  Equivalently,

    P(|fn(A) − p| < ε) → 1   as n → ∞.

- Hence, in a certain sense, the relative frequency of the event A converges to its probability.
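A coin-flip style simulation of this example (a sketch, not from the notes; p = 0.3 is an arbitrary choice):

# Illustrative sketch: the relative frequency fn(A) approaches P(A) = p as n grows.
import numpy as np

rng = np.random.default_rng(2)
p = 0.3
for n in (10, 100, 10_000, 1_000_000):
    indicators = rng.random(n) < p        # X1, ..., Xn: indicators of "A happens in repetition k"
    fn = indicators.mean()                # fn(A) = (X1 + ... + Xn) / n
    print(f"n = {n:>9}   fn(A) = {fn:.5f}")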
The central limit theorem

Theorem
Let X1, X2, ..., Xk, ... be a sequence of independent and identically distributed r.v., each with E(Xk) = m and Var(Xk) = σ². Let

  Sn* = (1/√n) Σ_{k=1}^n (Xk − m)/σ.

Then,

  lim_{n→∞} F_Sn*(x) = F_N(0,1)(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−t²/2} dt.

We write Sn* →d N(0, 1) (convergence in distribution).

We have

  E(Sn*) = 0,   Var(Sn*) = 1.

Notice that Sn* is the normalized arithmetic mean X̄n:

  Sn* = (X̄n − m)/(σ/√n),   X̄n = (σ/√n) Sn* + m.
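The convergence of distribution functions can also be observed by simulation. A minimal sketch (assuming Exponential(1) data, so m = σ = 1; n and the number of repetitions are arbitrary):

# Illustrative sketch: the empirical cdf of Sn* is close to the N(0,1) cdf.
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)
m, sigma, n, trials = 1.0, 1.0, 200, 20_000

x = rng.exponential(scale=1.0, size=(trials, n))          # i.i.d. Exp(1) samples, one row per repetition
s_star = (x.mean(axis=1) - m) / (sigma / np.sqrt(n))      # Sn* = (X̄n - m) / (σ/√n) for each repetition

for t in (-2.0, -1.0, 0.0, 1.0, 2.0):
    empirical = np.mean(s_star <= t)                      # estimate of F_Sn*(t)
    phi = 0.5 * (1.0 + erf(t / sqrt(2.0)))                # F_N(0,1)(t)
    print(f"x = {t:+.1f}   F_Sn*(x) ≈ {empirical:.4f}   F_N(0,1)(x) = {phi:.4f}")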
Covariance matrices

- The covariance matrix KX of an n-dimensional r.v.

    X = (X1, X2, ..., Xn)^t

  is the square n × n matrix defined by

    KX = E((X − mX)(X − mX)^t) =
         [ k11  k12  ···  k1n ]
         [ k21  k22  ···  k2n ]
         [ ···  ···  ···  ··· ]
         [ kn1  kn2  ···  knn ]

  where mX is the expectation vector

    mX = E(X) = (mX1, mX2, ..., mXn)^t.

- For i ≠ j,

    kij = E((Xi − mXi)(Xj − mXj)) = Cov(Xi, Xj).

- The diagonal entries of KX are

    kii = E((Xi − mXi)²) = Var(Xi).
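In practice KX is estimated from data by replacing the expectations with sample averages. A minimal sketch (the vector m_true and matrix K_true below are made-up values used only to generate test data):

# Illustrative sketch: estimating KX = E((X - mX)(X - mX)^t) from samples.
import numpy as np

rng = np.random.default_rng(4)
m_true = np.array([1.0, -2.0, 0.5])
K_true = np.array([[2.0, 0.6, 0.0],
                   [0.6, 1.0, 0.3],
                   [0.0, 0.3, 0.5]])

samples = rng.multivariate_normal(m_true, K_true, size=100_000)   # each row is one realization of X^t
centered = samples - samples.mean(axis=0)
K_hat = centered.T @ centered / len(samples)                      # sample analogue of E((X - mX)(X - mX)^t)
print(np.round(K_hat, 2))    # close to K_true; the diagonal entries estimate Var(Xi)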
The matrix KX is

- symmetric:

    kij = Cov(Xi, Xj) = Cov(Xj, Xi) = kji

- positive-semidefinite. That is, for all z = (z1, z2, ..., zn)^t ∈ R^n,

    z^t KX z ≥ 0.

Moreover, the r.v.

  X1 − mX1, X2 − mX2, ..., Xn − mXn

are linearly independent if and only if KX is positive-definite, that is, if and only if, for all z ≠ 0,

  z^t KX z > 0.

Indeed, if

  Y = z1 X1 + ··· + zn Xn = z^t X,

we have

  z^t KX z = z^t E((X − mX)(X − mX)^t) z
           = E(z^t (X − mX)(X − mX)^t z)
           = E((Y − mY)(Y − mY)^t)
           = E((Y − mY)²) = σY² ≥ 0.

(Notice that z^t mX = Σ_{i=1}^n zi mXi = mY.)

Moreover,

  z^t KX z = 0 for some z ≠ 0
    ⇐⇒ σY² = 0 for some z ≠ 0
    ⇐⇒ Y − mY = 0, with probability 1, for some z ≠ 0
    ⇐⇒ Σ_{i=1}^n zi (Xi − mXi) = 0, with probability 1, for some z = (z1, z2, ..., zn)^t ≠ 0
    ⇐⇒ X1 − mX1, X2 − mX2, ..., Xn − mXn are linearly dependent, with probability 1.
Linear transformations

Theorem
Let X = (X1, X2, ..., Xn)^t be an n-dimensional r.v., let A be an m × n real matrix, and let Y = (Y1, Y2, ..., Ym)^t be the m-dimensional r.v. defined by

  Y = AX.

Then,

  mY = A mX,   KY = A KX A^t.

We have

  mY = E(Y) = E(AX) = A E(X) = A mX,

and

  KY = E((Y − mY)(Y − mY)^t)
     = E(A (X − mX)(X − mX)^t A^t)
     = A E((X − mX)(X − mX)^t) A^t = A KX A^t.
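The two identities can be verified numerically. A sketch (A, mX and KX below are arbitrary made-up values):

# Illustrative check of mY = A mX and KY = A KX A^t for Y = AX.
import numpy as np

rng = np.random.default_rng(5)
m_X = np.array([0.0, 1.0, 2.0])
K_X = np.array([[1.0, 0.2, 0.0],
                [0.2, 2.0, 0.5],
                [0.0, 0.5, 1.5]])
A = np.array([[1.0, -1.0, 0.0],
              [0.5,  0.5, 2.0]])                       # 2 x 3, so Y is 2-dimensional

X = rng.multivariate_normal(m_X, K_X, size=200_000)    # rows are realizations of X^t
Y = X @ A.T                                            # rows are realizations of (AX)^t

print("A mX      =", A @ m_X)
print("mean of Y =", Y.mean(axis=0).round(3))
print("A KX A^t  =\n", A @ K_X @ A.T)
print("cov of Y  =\n", np.cov(Y.T).round(3))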
Multidimensional gaussian density

Let X1, X2, ..., Xn be n independent gaussian r.v. We have

  fX(x1, x2, ..., xn) = fX1(x1) fX2(x2) ··· fXn(xn)
    = Π_{i=1}^n (1/(√(2π) σXi)) e^{−(1/2)((xi − mXi)/σXi)²}
    = (1/((2π)^{n/2} σX1 σX2 ··· σXn)) e^{−(1/2) Σ_{i=1}^n ((xi − mXi)/σXi)²}.

Hence,

  fX(x1, x2, ..., xn) = (1/((2π)^{n/2} √det(KX))) exp(−(1/2)(x − mX)^t KX^{−1} (x − mX)),
where

- x = (x1, x2, ..., xn)^t
- mX is the expectation vector mX = (mX1, ..., mXn)^t
- KX = diag(σX1², σX2², ..., σXn²) is the covariance matrix. It is a diagonal matrix because the r.v. are independent and, hence, pairwise uncorrelated.

Now, let us consider a linear transformation

  Y = AX,

where A is a non-singular n × n matrix.

- The linear system y = Ax has a unique solution x = A^{−1} y.
- The Jacobian determinant is J(x1, x2, ..., xn) = det(A).
Moreover,

  mY = A mX  ⟹  mX = A^{−1} mY,
  KY = A KX A^t  ⟹  KY^{−1} = (A^t)^{−1} KX^{−1} A^{−1} = (A^{−1})^t KX^{−1} A^{−1},
  det(KY) = det(KX) det(A)².

In this way, we have

  fY(y1, y2, ..., yn) = [ fX(x1, x2, ..., xn)/|J(x1, x2, ..., xn)| ] evaluated at x = A^{−1} y
    = (1/((2π)^{n/2} √det(KX) |det(A)|)) · exp(−(1/2)(A^{−1} y − A^{−1} mY)^t KX^{−1} (A^{−1} y − A^{−1} mY))
    = (1/((2π)^{n/2} √det(KY))) · exp(−(1/2)(A^{−1}(y − mY))^t KX^{−1} A^{−1}(y − mY))
    = (1/((2π)^{n/2} √det(KY))) · exp(−(1/2)(y − mY)^t (A^{−1})^t KX^{−1} A^{−1} (y − mY))
    = (1/((2π)^{n/2} √det(KY))) exp(−(1/2)(y − mY)^t KY^{−1} (y − mY)).

- Notice that the above expression is analogous to the one we have in the case of independent r.v.
- But now, the covariance matrix KY will not be, in general, a diagonal matrix.

Definition
If the random vector X has probability density

  fX(x1, x2, ..., xn) = (1/((2π)^{n/2} √det(K))) exp(−(1/2)(x − m)^t K^{−1} (x − m)),

where x = (x1, x2, ..., xn)^t, m is a column n × 1 vector, and K is a square positive-semidefinite n × n matrix, we say that X is an n-dimensional gaussian r.v.

- We also say that the r.v. X1, X2, ..., Xn are jointly gaussian.
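The density of the Definition can be evaluated directly and compared with a library implementation. A sketch (assuming SciPy is available; the values of m, K and x are made up):

# Illustrative sketch: evaluating the n-dimensional gaussian density at a point.
import numpy as np
from scipy.stats import multivariate_normal

m = np.array([0.0, 1.0])
K = np.array([[2.0, 0.8],
              [0.8, 1.0]])
x = np.array([0.5, 0.5])

d = x - m
quad = d @ np.linalg.inv(K) @ d                       # (x - m)^t K^{-1} (x - m)
f_manual = np.exp(-0.5 * quad) / ((2 * np.pi) ** (len(m) / 2) * np.sqrt(np.linalg.det(K)))

print(f_manual)
print(multivariate_normal(mean=m, cov=K).pdf(x))      # same value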
For instance, for n = 2 we obtain:

  fXY(x, y) = (1/(2π √(1 − ρ²) σX σY)) exp(−(1/2) · (1/(1 − ρ²)) · a(x, y)),

where

  a(x, y) = ((x − mX)/σX)² − 2ρ ((x − mX)/σX) · ((y − mY)/σY) + ((y − mY)/σY)².

[Figure: surface plot of the bivariate gaussian density with σX = σY, ρ = 0.]
[Figure: surface plot of the bivariate gaussian density with σX = σY, ρ = 0.5.]

[Figure: surface plot of the bivariate gaussian density with σX = σY, ρ = 0.9.]
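Surfaces like the ones above are obtained by evaluating fXY on a grid. A sketch (assuming mX = mY = 0, σX = σY = 1 and ρ = 0.5, as in the ρ = 0.5 surface above):

# Illustrative sketch: the bivariate gaussian density on a grid.
import numpy as np

m_X = m_Y = 0.0
s_X = s_Y = 1.0
rho = 0.5

x, y = np.meshgrid(np.linspace(-3, 3, 121), np.linspace(-3, 3, 121))
a = ((x - m_X) / s_X) ** 2 \
    - 2 * rho * ((x - m_X) / s_X) * ((y - m_Y) / s_Y) \
    + ((y - m_Y) / s_Y) ** 2
f = np.exp(-0.5 * a / (1 - rho ** 2)) / (2 * np.pi * np.sqrt(1 - rho ** 2) * s_X * s_Y)

print(f.max())    # peak value 1/(2π√(1-ρ²)) ≈ 0.18 for ρ = 0.5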
[Figure: surface plot of the bivariate gaussian density with σX = 2σY, ρ = 0.7.]

Marginal distributions

If m = (mi) and K = (kij),

- Each component Xi is a (1-dimensional) gaussian r.v. with parameters

    mXi = mi,   σi² = kii.

- K is the covariance matrix of X.
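These marginal parameters are easy to confirm by simulation. A sketch (m and K below are made-up values):

# Illustrative check: each component Xi of a gaussian vector is N(mi, kii).
import numpy as np

rng = np.random.default_rng(6)
m = np.array([1.0, -1.0, 3.0])
K = np.array([[4.0, 1.0, 0.5],
              [1.0, 2.0, 0.0],
              [0.5, 0.0, 1.0]])

X = rng.multivariate_normal(m, K, size=200_000)
for i in range(3):
    print(f"X{i+1}: sample mean = {X[:, i].mean():+.3f} (mi = {m[i]:+.1f}), "
          f"sample var = {X[:, i].var():.3f} (kii = {K[i, i]:.1f})")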
Eigenvalues of the covariance matrix

The matrix KX is symmetric. Hence, it can be transformed into a diagonal matrix by means of an orthogonal transformation. That is,

- There exists an orthogonal matrix C such that

    C KX C^t = D = diag(λ1, λ2, ..., λn).

- The numbers λi ∈ R are the eigenvalues of KX.
- Equivalently, KX = C^t D C.

Hence,

  0 ≤ z^t KX z = z^t C^t D C z = (Cz)^t D (Cz) = y^t D y = Σ_{i=1}^n λi yi²,

where y = Cz. Therefore,

  λi ≥ 0,   i = 1, 2, ..., n.

KX is a positive-definite matrix if and only if

  λi > 0,   i = 1, 2, ..., n

(that is, Xi − mXi, i = 1, ..., n, are linearly independent r.v.). This condition is equivalent to

  det(KX) ≠ 0.
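A numerical illustration of this spectral decomposition (the data used to build KX below are made up; np.linalg.eigh is the standard routine for symmetric matrices):

# Illustrative sketch: eigenvalues of a covariance matrix are non-negative.
import numpy as np

rng = np.random.default_rng(7)
data = rng.standard_normal((1000, 4)) @ rng.standard_normal((4, 4))   # correlated columns
K_X = np.cov(data.T)                                                  # symmetric, positive-semidefinite

eigvals, eigvecs = np.linalg.eigh(K_X)      # K_X = eigvecs @ diag(eigvals) @ eigvecs^t
print("eigenvalues:", np.round(eigvals, 4))                           # all ≥ 0 (here > 0: positive-definite)
print("det(K_X)  =", round(float(np.linalg.det(K_X)), 4),
      " product of eigenvalues =", round(float(np.prod(eigvals)), 4))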
Gaussian random vectors can be defined in a more general framework, in such a way that KX is not necessarily invertible; that is, the Xi − mXi, i = 1, ..., n, need not all be linearly independent. But only in the case det(KX) ≠ 0 does a density fX(x1, x2, ..., xn) exist.

Uncorrelation and independence

Theorem
If the r.v. X1, X2, ..., Xn are jointly gaussian and pairwise uncorrelated, then they are jointly independent.
Indeed,

  Cov(Xi, Xj) = 0  ⟹  KX = diag(σX1², σX2², ..., σXn²),

and, hence,

  fX(x1, x2, ..., xn) = (1/((2π)^{n/2} √det(KX))) exp(−(1/2)(x − mX)^t KX^{−1} (x − mX))
    = (1/((2π)^{n/2} σX1 σX2 ··· σXn)) e^{−(1/2) Σ_{i=1}^n ((xi − mXi)/σXi)²}
    = Π_{i=1}^n (1/(√(2π) σXi)) e^{−(1/2)((xi − mXi)/σXi)²} = fX1(x1) fX2(x2) ··· fXn(xn).

Linear combinations

Theorem
Let X be an n-dimensional gaussian r.v., let A be an m × n real matrix, and let

  Y = AX.

Then, Y is an m-dimensional gaussian r.v. with mY = A mX and KY = A KX A^t.

- If m ≤ n and A has full rank m, the random vector Y has a probability density fY(y1, ..., ym).
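For a single row a^t (m = 1) the theorem says that a^t X is a one-dimensional gaussian r.v. with mean a^t mX and variance a^t KX a, which is easy to check by simulation. A sketch (a, mX and KX below are made-up values):

# Illustrative check: a linear combination a^t X of jointly gaussian r.v. is gaussian.
import numpy as np

rng = np.random.default_rng(8)
m_X = np.array([0.0, 2.0, -1.0])
K_X = np.array([[1.0, 0.3, 0.0],
                [0.3, 2.0, 0.4],
                [0.0, 0.4, 0.5]])
a = np.array([1.0, -2.0, 0.5])

Y = rng.multivariate_normal(m_X, K_X, size=200_000) @ a
mu, var = a @ m_X, a @ K_X @ a
print(f"theory: mean = {mu:.3f}, var = {var:.3f}")
print(f"sample: mean = {Y.mean():.3f}, var = {Y.var():.3f}")
print("97.5% quantile:", round(float(np.quantile(Y, 0.975)), 3),
      " vs gaussian value:", round(mu + 1.96 * np.sqrt(var), 3))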
Theorem
The n-dimensional r.v. X = (X1, ..., Xn)^t is gaussian if and only if the 1-dimensional r.v.

  Y = a1 X1 + ··· + an Xn = a^t X

is gaussian for all a = (a1, a2, ..., an)^t ∈ R^n.

Conditional densities

Let X, Y be jointly gaussian. Then,

  fY|X(y | X = x) = fXY(x, y)/fX(x)
    = (1/(√(2π) √(1 − ρ²) σY)) exp(−(1/2) ((y − mY|X)/σY|X)²),

where

- mY|X is the expected value of Y given X:

    mY|X = E(Y | X = x) = ρ (σY/σX)(x − mX) + mY,

- σY|X² = (1 − ρ²) σY².
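The conditional mean and variance formulas can be checked by conditioning a simulation on a thin slice around x (a sketch; all parameter values below are made up):

# Illustrative check of E(Y | X = x) and Var(Y | X = x) for a jointly gaussian pair.
import numpy as np

rng = np.random.default_rng(9)
m_X, m_Y = 1.0, -2.0
s_X, s_Y = 2.0, 1.5
rho = 0.6

K = np.array([[s_X ** 2,          rho * s_X * s_Y],
              [rho * s_X * s_Y,   s_Y ** 2       ]])
xy = rng.multivariate_normal([m_X, m_Y], K, size=500_000)

x0 = 2.0
near = xy[np.abs(xy[:, 0] - x0) < 0.05, 1]        # Y values whose X lies in a thin slice around x0
print("simulated E(Y | X = x0)  :", round(float(near.mean()), 3),
      "  formula:", m_Y + rho * (s_Y / s_X) * (x0 - m_X))
print("simulated Var(Y | X = x0):", round(float(near.var()), 3),
      "  formula:", (1 - rho ** 2) * s_Y ** 2)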