Chapter 5, 6
Multiple Random Variables
ENCS6161 - Probability and Stochastic Processes
Concordia University
Vector Random Variables
A vector r.v. X is a function X : S → Rⁿ, where S is the sample space of a random experiment.
Example: randomly pick a student name from a list. S = {all student names on the list}. Let ω be a given outcome, e.g. Tom.
  H(ω): height of student ω
  W(ω): weight of student ω
  A(ω): age of student ω
H, W, and A are r.v.s. Let X = (H, W, A); then X is a vector r.v.
Events
Each event involving X = (X1, X2, ..., Xn) has a corresponding region in Rⁿ.
Example: X = (X1, X2) is a two-dimensional r.v.
A = {X1 + X2 ≤ 10}
B = {min(X1, X2) ≤ 5}
C = {X1² + X2² ≤ 100}
Pairs of Random Variables
Pairs of discrete random variables
Joint probability mass function:
PX,Y(xj, yk) = P{X = xj ∩ Y = yk} = P{X = xj, Y = yk}
Obviously Σ_j Σ_k PX,Y(xj, yk) = 1.
Marginal probability mass function:
PX(xj) = P{X = xj} = P{X = xj, Y = anything} = Σ_{k=1}^{∞} PX,Y(xj, yk)
Similarly PY(yk) = Σ_{j=1}^{∞} PX,Y(xj, yk).
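As a small numerical illustration (a hypothetical 2 x 3 joint pmf, not taken from the course), the marginal pmfs are just the row and column sums of the joint pmf table:

import numpy as np

# Hypothetical joint pmf PX,Y(xj, yk): rows index xj, columns index yk
P = np.array([[0.10, 0.20, 0.10],
              [0.25, 0.05, 0.30]])
assert np.isclose(P.sum(), 1.0)   # sum over all (j, k) must be 1

P_X = P.sum(axis=1)   # marginal pmf of X: sum over yk
P_Y = P.sum(axis=0)   # marginal pmf of Y: sum over xj
print(P_X)            # [0.4 0.6]
print(P_Y)            # [0.35 0.25 0.4]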
Pairs of Random Variables
The joint CDF of X and Y (for both discrete and continuous r.v.s):
FX,Y(x, y) = P{X ≤ x, Y ≤ y}
[Figure: a point (x, y) in the x-y plane; FX,Y(x, y) is the probability of the region below and to the left of it.]
Pairs of Random Variables
Properties of the joint CDF:
1. FX,Y(x1, y1) ≤ FX,Y(x2, y2), if x1 ≤ x2 and y1 ≤ y2.
2. FX,Y(−∞, y) = FX,Y(x, −∞) = 0
3. FX,Y(∞, ∞) = 1
4. FX(x) = P{X ≤ x} = P{X ≤ x, Y = anything} = P{X ≤ x, Y ≤ ∞} = FX,Y(x, ∞)
   FY(y) = FX,Y(∞, y)
   FX(x), FY(y): marginal CDFs
Pairs of Random Variables
The joint pdf of two jointly continuous r.v.s:
fX,Y(x, y) = ∂²FX,Y(x, y) / ∂x∂y
Obviously,
∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y(x, y) dx dy = 1
and
FX,Y(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} fX,Y(x', y') dy' dx'
Pairs of Random Variables
The probability
P{a ≤ X ≤ b, c ≤ Y ≤ d} = ∫_{a}^{b} ∫_{c}^{d} fX,Y(x, y) dy dx
In general,
P{(X, Y) ∈ A} = ∫∫_A fX,Y(x, y) dx dy
Example: for the triangular region A = {(x, y) : 0 ≤ y ≤ x ≤ 1} shown on the slide,
P{(X, Y) ∈ A} = ∫_{0}^{1} ∫_{0}^{x} fX,Y(x', y') dy' dx'
Pairs of Random Variables
Marginal pdf:
fX(x) = d/dx FX(x) = d/dx FX,Y(x, ∞)
      = d/dx ∫_{−∞}^{x} ∫_{−∞}^{∞} fX,Y(x', y') dy' dx'
      = ∫_{−∞}^{∞} fX,Y(x, y') dy'
fY(y) = ∫_{−∞}^{∞} fX,Y(x', y) dx'
Pairs of Random Variables
Example:
fX,Y(x, y) = 1 if 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1, and 0 otherwise.
Find FX,Y(x, y).
1) x ≤ 0 or y ≤ 0: FX,Y(x, y) = 0
2) 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1: FX,Y(x, y) = ∫_{0}^{x} ∫_{0}^{y} 1 dy' dx' = xy
3) 0 ≤ x ≤ 1 and y > 1: FX,Y(x, y) = ∫_{0}^{x} ∫_{0}^{1} 1 dy' dx' = x
4) x > 1 and 0 ≤ y < 1: FX,Y(x, y) = y
5) x > 1 and y > 1: FX,Y(x, y) = 1
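A minimal Monte Carlo sketch of case 2), assuming X and Y are independent Uniform(0,1) r.v.s (which is exactly this joint pdf): the empirical joint CDF should match xy.

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, 200_000)
Y = rng.uniform(0.0, 1.0, 200_000)

x, y = 0.3, 0.7
F_hat = np.mean((X <= x) & (Y <= y))   # empirical P{X <= x, Y <= y}
print(F_hat, x * y)                    # both approximately 0.21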
Independence
X and Y are independent iff
PX,Y(xj, yk) = PX(xj) PY(yk) for all xj and yk (discrete r.v.s)
or FX,Y(x, y) = FX(x) FY(y) for all x and y
or fX,Y(x, y) = fX(x) fY(y) for all x and y
Example:
a) fX,Y(x, y) = 1 for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and 0 otherwise (the unit square): here fX(x) = fY(y) = 1 on [0, 1], so fX,Y = fX fY and X, Y are independent.
b) fX,Y(x, y) = 1 for 0 ≤ x ≤ √2, 0 ≤ y ≤ x, and 0 otherwise (a triangle): the support is not a product set, so X, Y are not independent.
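A quick numerical check of case b) (a sketch, assuming the triangular support 0 ≤ y ≤ x ≤ √2 read off the slide): pick a point where the product of the marginals is positive but the joint pdf is zero.

import numpy as np

sqrt2 = np.sqrt(2.0)

def f_XY(x, y):
    # case b): joint pdf equal to 1 on the triangle 0 <= y <= x <= sqrt(2)
    return 1.0 if 0.0 <= y <= x <= sqrt2 else 0.0

def f_X(x):
    # marginal: integrate f_XY over y, giving x on [0, sqrt(2)]
    return x if 0.0 <= x <= sqrt2 else 0.0

def f_Y(y):
    # marginal: integrate f_XY over x, giving sqrt(2) - y on [0, sqrt(2)]
    return sqrt2 - y if 0.0 <= y <= sqrt2 else 0.0

x, y = 0.2, 1.0
print(f_XY(x, y), f_X(x) * f_Y(y))   # 0.0 vs ~0.083, so fX,Y != fX * fY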
Conditional Probability
If X is discrete,
FY(y | x) = P{Y ≤ y, X = x} / P{X = x},  for P{X = x} > 0
fY(y | x) = d/dy FY(y | x)
If X is continuous, P{X = x} = 0, so define
FY(y | x) = lim_{h→0} FY(y | x < X ≤ x + h) = lim_{h→0} P{Y ≤ y, x < X ≤ x + h} / P{x < X ≤ x + h}
          = lim_{h→0} [∫_{−∞}^{y} ∫_{x}^{x+h} fX,Y(x', y') dx' dy'] / [∫_{x}^{x+h} fX(x') dx']
          = lim_{h→0} [∫_{−∞}^{y} fX,Y(x, y') dy' · h] / [fX(x) · h]
          = ∫_{−∞}^{y} fX,Y(x, y') dy' / fX(x)
fY(y | x) = d/dy FY(y | x) = fX,Y(x, y) / fX(x)
Conditional Probability
If X, Y are independent,
fY(y|x) = fY(y)
Similarly,
fX(x|y) = fX,Y(x, y) / fY(y)
So,
fX,Y(x, y) = fX(x) · fY(y|x) = fY(y) · fX(x|y)
Bayes Rule:
fX(x|y) = fX(x) · fY(y|x) / fY(y)
Conditional Probability
Example: a r.v. X is uniformly selected in [0, 1], and then Y is selected uniformly in [0, X]. Find fY(y).
Solution:
fX,Y(x, y) = fX(x) fY(y|x) = 1 · (1/x) = 1/x
for 0 ≤ x ≤ 1, 0 ≤ y ≤ x, and 0 elsewhere.
fY(y) = ∫_{−∞}^{∞} fX,Y(x, y) dx = ∫_{y}^{1} (1/x) dx = −ln y
for 0 < y ≤ 1, and fY(y) = 0 elsewhere.
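A minimal simulation sketch of this two-stage experiment; away from y = 0 (where −ln y blows up), a histogram of Y tracks fY(y) = −ln y:

import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, 500_000)
Y = rng.uniform(0.0, X)              # Y | X = x ~ Uniform(0, x)

hist, edges = np.histogram(Y, bins=50, range=(0.0, 1.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
mask = centers > 0.05                # skip the first bins near y = 0
print(np.max(np.abs(hist[mask] + np.log(centers[mask]))))  # small: matches -ln(y)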
Conditional Probability
Example: binary communication.
[Block diagram: x ∈ {0, 1} → Channel fY(y | x) → Y ∈ R → Detector → X̂]
X = 0 → −A (volts); we assume P{X = 0} = p0
X = 1 → +A (volts); P{X = 1} = p1 = 1 − p0
Decide X̂ = 0 if P{X = 0 | y} ≥ P{X = 1 | y}
Decide X̂ = 1 if P{X = 0 | y} < P{X = 1 | y}
This is called maximum a posteriori probability (MAP) detection.
Conditional Probability
Binary communication over an Additive White Gaussian Noise (AWGN) channel:
fY(y|0) = (1/(√(2π)σ)) e^{−(y+A)²/(2σ²)}
fY(y|1) = (1/(√(2π)σ)) e^{−(y−A)²/(2σ²)}
To apply MAP detection, we need to find P{X = 0|y} and P{X = 1|y}. Note that here X is discrete and Y is continuous.
Conditional Probability
Using the same approach as before (considering x < X ≤ x + h and letting h → 0), we have
P{X = 0|y} = P{X = 0} fY(y|0) / fY(y)
P{X = 1|y} = P{X = 1} fY(y|1) / fY(y)
Decide X̂ = 0 if
P{X = 0|y} ≥ P{X = 1|y} ⇒ y ≤ (σ²/(2A)) ln(p0/p1)
Decide X̂ = 1 if
P{X = 0|y} < P{X = 1|y} ⇒ y > (σ²/(2A)) ln(p0/p1)
When p0 = p1 = 1/2:
Decide X̂ = 0 if y ≤ 0
Decide X̂ = 1 if y > 0
Conditional Probability
Probability of error (considering the special case p0 = p1 = 1/2):
Pε = p0 P{X̂ = 1|X = 0} + p1 P{X̂ = 0|X = 1}
   = p0 P{Y > 0|X = 0} + p1 P{Y ≤ 0|X = 1}
P{Y > 0|X = 0} = (1/(√(2π)σ)) ∫_{0}^{∞} e^{−(y+A)²/(2σ²)} dy = Q(A/σ)
Similarly,
P{Y ≤ 0|X = 1} = Q(A/σ)
∴ Pε = Q(A/σ)
As A ↑, Pε ↓; as σ ↑, Pε ↑.
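A simulation sketch of the equal-prior case (decision threshold at 0); the empirical error rate should agree with Q(A/σ):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
A, sigma, n = 1.0, 0.8, 1_000_000

bits = rng.integers(0, 2, n)                 # p0 = p1 = 1/2
signal = np.where(bits == 0, -A, +A)         # 0 -> -A volts, 1 -> +A volts
y = signal + rng.normal(0.0, sigma, n)       # AWGN channel
bits_hat = (y > 0).astype(int)               # MAP rule: threshold at 0

print(np.mean(bits_hat != bits))             # empirical error probability
print(norm.sf(A / sigma))                    # Q(A / sigma), about 0.1056 here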
Conditional Expectation
The conditional expectation
E[Y|x] = ∫_{−∞}^{∞} y fY(y|x) dy
In the discrete case,
E[Y|x] = Σ_{yi} yi PY(yi|x)
An important fact:
E[Y] = E[E[Y|X]]
Proof:
E[E[Y|X]] = ∫_{−∞}^{∞} E[Y|x] fX(x) dx = ∫_{−∞}^{∞} ∫_{−∞}^{∞} y fY(y|x) fX(x) dy dx
          = ∫_{−∞}^{∞} ∫_{−∞}^{∞} y fX,Y(x, y) dy dx = E[Y]
In general:
E[h(Y)] = E[E[h(Y)|X]]
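A numerical sketch of E[Y] = E[E[Y|X]], reusing the earlier example where X ~ Uniform(0,1) and Y | X = x ~ Uniform(0, x), so E[Y|X] = X/2 and E[Y] = 1/4:

import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, 1_000_000)
Y = rng.uniform(0.0, X)            # Y | X = x ~ Uniform(0, x)

print(np.mean(Y))                  # direct estimate of E[Y], ~0.25
print(np.mean(X / 2.0))            # E[E[Y|X]] = E[X/2], also ~0.25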
Multiple Random Variables
Joint cdf
FX1,...,Xn(x1, ..., xn) = P[X1 ≤ x1, ..., Xn ≤ xn]
Joint pdf
fX1,...,Xn(x1, ..., xn) = ∂ⁿ FX1,...,Xn(x1, ..., xn) / (∂x1 ··· ∂xn)
If discrete, joint pmf
PX1,...,Xn(x1, ..., xn) = P[X1 = x1, ..., Xn = xn]
Marginal pdf
fXi(xi) = ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} f(x1, ..., xn) dx1 ··· dxi−1 dxi+1 ··· dxn
(integrating over all of x1, ..., xn except xi)
Independence
X1, ..., Xn are independent iff
FX1,...,Xn(x1, ..., xn) = FX1(x1) ··· FXn(xn) for all x1, ..., xn
In terms of pdfs,
fX1,...,Xn(x1, ..., xn) = fX1(x1) ··· fXn(xn) for all x1, ..., xn
Functions of Several r.v.s
One function of several r.v.s:
Z = g(X1, ..., Xn)
Let RZ = {x = (x1, ..., xn) s.t. g(x) ≤ z}; then
FZ(z) = P{X ∈ RZ} = ∫ ··· ∫_{x ∈ RZ} fX1,...,Xn(x1, ..., xn) dx1 ··· dxn
Functions of Several r.v.s
Example: Z = X + Y. Find FZ(z) and fZ(z) in terms of fX,Y(x, y).
Z = X + Y ≤ z ⇒ Y ≤ z − X
FZ(z) = ∫_{−∞}^{∞} ∫_{−∞}^{z−x} fX,Y(x, y) dy dx
fZ(z) = d/dz FZ(z) = ∫_{−∞}^{∞} fX,Y(x, z − x) dx
[Figure: the region of integration is the half-plane below the line y = z − x.]
If X and Y are independent,
fZ(z) = ∫_{−∞}^{∞} fX(x) fY(z − x) dx
i.e. fZ is the convolution of fX and fY.
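A quick sanity check of this formula, assuming X and Y are independent Uniform(0,1) (so fZ is the triangular density: fZ(z) = z on [0,1] and 2 − z on [1,2]):

import numpy as np

rng = np.random.default_rng(4)
Z = rng.uniform(size=500_000) + rng.uniform(size=500_000)

# For independent Uniform(0,1): f_Z(z) = z on [0,1], 2 - z on [1,2]
z, dz = 0.75, 0.01
f_hat = np.mean((Z > z - dz / 2) & (Z <= z + dz / 2)) / dz
print(f_hat, z)     # both approximately 0.75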
Functions of Several r.v.s
Example: let Z = X/Y. Find the pdf of Z if X and Y are independent and both exponentially distributed with mean one.
We could use the same approach as in the previous example, but it is complicated. Instead, fix Y = y; then Z = X/y and fZ(z|y) = |y| fX(yz|y). So
fZ(z) = ∫_{−∞}^{∞} fZ,Y(z, y) dy = ∫_{−∞}^{∞} fZ(z|y) fY(y) dy
      = ∫_{−∞}^{∞} |y| fX(yz|y) fY(y) dy = ∫_{−∞}^{∞} |y| fX,Y(yz, y) dy
Functions of Several r.v.s
Since X, Y are independent and exponentially distributed,
fZ(z) = ∫_{0}^{∞} y fX(yz) fY(y) dy = ∫_{0}^{∞} y e^{−yz} e^{−y} dy = 1/(1 + z)²
for z > 0.
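A Monte Carlo sketch of this result; integrating fZ(z) = 1/(1+z)² gives the CDF FZ(z) = z/(1+z), which the empirical CDF should match:

import numpy as np

rng = np.random.default_rng(5)
X = rng.exponential(1.0, 1_000_000)
Y = rng.exponential(1.0, 1_000_000)
Z = X / Y

z = 2.0
print(np.mean(Z <= z))      # empirical P{Z <= z}
print(z / (1.0 + z))        # F_Z(z) = z / (1 + z) = 2/3 here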
Transformation of Random Vectors
Z1 = g1(X1, ..., Xn), Z2 = g2(X1, ..., Xn), ..., Zn = gn(X1, ..., Xn)
The joint CDF of Z is
FZ1,...,Zn(z1, ..., zn) = P{Z1 ≤ z1, ..., Zn ≤ zn}
                        = ∫ ··· ∫_{x: gk(x) ≤ zk, k=1,...,n} fX1,...,Xn(x1, ..., xn) dx1 ··· dxn
pdf of Linear Transformations
If Z = AX, where A is an n × n invertible matrix, then
fZ(z) = fZ1,...,Zn(z1, ..., zn) = fX1,...,Xn(x1, ..., xn)/|A| evaluated at x = A⁻¹z = fX(A⁻¹z)/|A|
where |A| is the absolute value of the determinant of A.
e.g. if A = [a b; c d], then |A| = |ad − bc|.
pdf of General Transformations
Z1 = g1(X), Z2 = g2(X), ..., Zn = gn(X), where X = (X1, ..., Xn).
We assume that the set of equations
z1 = g1(x), ..., zn = gn(x)
has a unique solution given by
x1 = h1(z), ..., xn = hn(z)
The joint pdf of Z is given by
fZ1,...,Zn(z1, ..., zn) = fX1,...,Xn(h1(z), ..., hn(z)) / |J(x1, ..., xn)|
                        = fX1,...,Xn(h1(z), ..., hn(z)) |J(z1, ..., zn)|   (*)
where J(x1, ..., xn) is called the Jacobian of the transformation.
pdf of General Transformations
The Jacobian of the transformation:
J(x1, ..., xn) = det [ ∂g1/∂x1 ··· ∂g1/∂xn ; ··· ; ∂gn/∂x1 ··· ∂gn/∂xn ]
and
J(z1, ..., zn) = det [ ∂h1/∂z1 ··· ∂h1/∂zn ; ··· ; ∂hn/∂z1 ··· ∂hn/∂zn ] = 1 / J(x1, ..., xn) evaluated at x = h(z)
A linear transformation is a special case of (*).
pdf of General Transformations
Example: let X and Y be zero-mean, unit-variance independent Gaussian r.v.s. Find the joint pdf of V and W defined by:
V = (X² + Y²)^{1/2}
W = ∠(X, Y) = arctan(Y/X),  W ∈ [0, 2π)
This is a transformation from Cartesian to polar coordinates. The inverse transformation is:
x = v cos(w)
y = v sin(w)
[Figure: the point (x, y) at radius v and angle w.]
pdf of General Transformations
The Jacobian:
J(v, w) = det [ cos w  −v sin w ; sin w  v cos w ] = v cos²w + v sin²w = v
Since X and Y are zero-mean, unit-variance independent Gaussian r.v.s,
fX,Y(x, y) = (1/√(2π)) e^{−x²/2} · (1/√(2π)) e^{−y²/2} = (1/(2π)) e^{−(x²+y²)/2}
The joint pdf of V, W is then
fV,W(v, w) = v · (1/(2π)) e^{−(v²cos²w + v²sin²w)/2} = (v/(2π)) e^{−v²/2}
for v ≥ 0 and 0 ≤ w < 2π.
pdf of General Transformations
The marginal pdfs of V and W:
fV(v) = ∫_{−∞}^{∞} fV,W(v, w) dw = ∫_{0}^{2π} (v/(2π)) e^{−v²/2} dw = v e^{−v²/2}
for v ≥ 0. This is called the Rayleigh distribution.
fW(w) = ∫_{−∞}^{∞} fV,W(v, w) dv = (1/(2π)) ∫_{0}^{∞} v e^{−v²/2} dv = 1/(2π)
for 0 ≤ w < 2π.
Since fV,W(v, w) = fV(v) fW(w) ⇒ V, W are independent.
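A simulation sketch: the magnitude and angle of two independent standard Gaussians should come out Rayleigh and Uniform[0, 2π) respectively, and essentially uncorrelated:

import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=1_000_000)
Y = rng.normal(size=1_000_000)

V = np.sqrt(X**2 + Y**2)
W = np.mod(np.arctan2(Y, X), 2 * np.pi)    # angle folded into [0, 2*pi)

print(np.mean(V), np.sqrt(np.pi / 2))      # Rayleigh mean = sqrt(pi/2) ~ 1.2533
print(np.mean(W), np.pi)                   # uniform angle mean = pi
print(np.corrcoef(V, W)[0, 1])             # ~0, consistent with independence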
Expected Value of Functions of r.v.s
Let Z = g(X1, X2, ..., Xn); then
E[Z] = ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} g(x1, ..., xn) fX1,...,Xn(x1, ..., xn) dx1 ··· dxn
For the discrete case,
E[Z] = Σ ··· Σ g(x1, ..., xn) PX1,...,Xn(x1, ..., xn)   (sum over all possible x)
Example: Z = X1 + X2 + ··· + Xn
E[Z] = E[X1 + X2 + ··· + Xn]
     = ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} (x1 + ··· + xn) fX1,...,Xn(x1, ..., xn) dx1 ··· dxn
     = E[X1] + ··· + E[Xn]
Expected Value of Functions of r.v.s
Example: Z = X1 X2 ··· Xn
E[Z] = ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} x1 ··· xn fX1,...,Xn(x1, ..., xn) dx1 ··· dxn
If X1, X2, ..., Xn are independent,
E[Z] = E[X1 X2 ··· Xn] = E[X1] E[X2] ··· E[Xn]
The (j, k)-th moment of two r.v.s X and Y is
E[X^j Y^k] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x^j y^k fX,Y(x, y) dx dy
If j = k = 1, it is called the correlation:
E[XY] = ∫∫ x y fX,Y(x, y) dx dy
If E[XY] = 0, we say X and Y are orthogonal.
Expected Value of Functions of r.v.s
The (j, k)-th central moment of X, Y is
E[(X − E[X])^j (Y − E[Y])^k]
When j = 2, k = 0 ⇒ Var(X);  j = 0, k = 2 ⇒ Var(Y).
When j = k = 1, it is called the covariance of X, Y:
Cov(X, Y) = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]E[Y] = Cov(Y, X)
The correlation coefficient of X and Y is defined as
ρX,Y = Cov(X, Y) / (σX σY)
where σX = √Var(X) and σY = √Var(Y).
Expected Value of Functions of r.v.s
The correlation coefficient satisfies −1 ≤ ρX,Y ≤ 1.
Proof:
0 ≤ E[((X − E[X])/σX ± (Y − E[Y])/σY)²] = 1 ± 2ρX,Y + 1 = 2(1 ± ρX,Y)
so 1 ± ρX,Y ≥ 0, i.e. −1 ≤ ρX,Y ≤ 1.
If ρX,Y = 0, X and Y are said to be uncorrelated.
If X, Y are independent, then E[XY] = E[X]E[Y] ⇒ Cov(X, Y) = 0 ⇒ ρX,Y = 0. Hence X, Y are uncorrelated.
The converse is not always true. It is true in the case of Gaussian r.v.s (to be discussed later).
Expected Value of Functions of r.v.s
Example: θ is uniform in [0, 2π). Let X = cos θ and Y = sin θ.
X and Y are not independent, since X² + Y² = 1. However,
E[XY] = E[sin θ cos θ] = E[(1/2) sin(2θ)] = (1/(2π)) ∫_{0}^{2π} (1/2) sin(2θ) dθ = 0
We can also show E[X] = E[Y] = 0. So
Cov(X, Y) = E[XY] − E[X]E[Y] = 0 ⇒ ρX,Y = 0
X, Y are uncorrelated but not independent.
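A short numerical sketch of this example: the sample covariance of (cos θ, sin θ) is essentially zero even though the pair always lies on the unit circle:

import numpy as np

rng = np.random.default_rng(7)
theta = rng.uniform(0.0, 2 * np.pi, 1_000_000)
X, Y = np.cos(theta), np.sin(theta)

print(np.cov(X, Y)[0, 1])                # ~0: uncorrelated
print(np.max(np.abs(X**2 + Y**2 - 1)))   # ~0: X^2 + Y^2 = 1, so clearly dependent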
Joint Characteristic Function
Joint characteristic function:
ΦX1,...,Xn(w1, ..., wn) = E[e^{j(w1 X1 + ··· + wn Xn)}]
For two variables:
ΦX,Y(w1, w2) = E[e^{j(w1 X + w2 Y)}]
Marginal characteristic functions:
ΦX(w) = ΦX,Y(w, 0),  ΦY(w) = ΦX,Y(0, w)
If X, Y are independent,
ΦX,Y(w1, w2) = E[e^{jw1 X + jw2 Y}] = E[e^{jw1 X}] E[e^{jw2 Y}] = ΦX(w1) ΦY(w2)
Joint Characteristic Function
If Z = aX + bY,
ΦZ(w) = E[e^{jw(aX + bY)}] = ΦX,Y(aw, bw)
If Z = X + Y with X and Y independent,
ΦZ(w) = ΦX,Y(w, w) = ΦX(w) ΦY(w)
Jointly Gaussian Random Variables
Consider a vector of random variables X = (X1, X2, ..., Xn), each with mean mi = E[Xi], for i = 1, ..., n, and the covariance matrix
K = [ Var(X1)       Cov(X1, X2)  ···  Cov(X1, Xn)
      ···           ···          ···  ···
      Cov(Xn, X1)   ···          ···  Var(Xn)     ]
Let m = [m1, ..., mn]^T be the mean vector and x = [x1, ..., xn]^T, where (·)^T denotes transpose. Then X1, X2, ..., Xn are said to be jointly Gaussian if their joint pdf is
fX(x) = (1 / ((2π)^{n/2} |K|^{1/2})) exp(−(1/2)(x − m)^T K⁻¹ (x − m))
where |K| is the determinant of K.
Jointly Gaussian Random Variables
For n = 1, let Var(X1) = σ²; then fX(x) = (1/(√(2π)σ)) e^{−(x−m)²/(2σ²)}.
For n = 2, denote the r.v.s by X and Y. Let
m = [mX, mY]^T,   K = [ σX²         ρXY σX σY
                        ρXY σX σY   σY²       ]
Then |K| = σX² σY² (1 − ρXY²) and
K⁻¹ = (1 / (σX² σY² (1 − ρXY²))) [ σY²          −ρXY σX σY
                                   −ρXY σX σY    σX²        ]
and
fX,Y(x, y) = (1 / (2π σX σY √(1 − ρXY²))) exp{ −(1/(2(1 − ρXY²))) [ ((x − mX)/σX)² − 2ρXY ((x − mX)/σX)((y − mY)/σY) + ((y − mY)/σY)² ] }
Jointly Gaussian Random Variables
The marginal pdf of X:
fX(x) = (1/(√(2π)σX)) e^{−(x−mX)²/(2σX²)}
The marginal pdf of Y:
fY(y) = (1/(√(2π)σY)) e^{−(y−mY)²/(2σY²)}
If ρXY = 0 ⇒ X, Y are independent.
The conditional pdf:
fX|Y(x|y) = fX,Y(x, y) / fY(y)
          = (1/√(2π σX² (1 − ρXY²))) exp{ −(1/(2(1 − ρXY²)σX²)) [x − ρXY (σX/σY)(y − mY) − mX]² }
i.e. X | Y = y ∼ N( ρXY (σX/σY)(y − mY) + mX ,  σX²(1 − ρXY²) )
with conditional mean E[X|Y] = ρXY (σX/σY)(Y − mY) + mX and conditional variance Var(X|Y) = σX²(1 − ρXY²).
Linear Transformation of Gaussian r.v.s
Let X ∼ N(m, K) and Y = AX; then Y ∼ N(m̂, C), where m̂ = Am and C = A K A^T.
Proof:
fY(y) = fX(A⁻¹y)/|A| = (1 / ((2π)^{n/2} |A| |K|^{1/2})) exp(−(1/2)(A⁻¹y − m)^T K⁻¹ (A⁻¹y − m))
Note that A⁻¹y − m = A⁻¹(y − Am) = A⁻¹(y − m̂), so
(A⁻¹y − m)^T K⁻¹ (A⁻¹y − m) = (y − m̂)^T (A⁻¹)^T K⁻¹ A⁻¹ (y − m̂) = (y − m̂)^T C⁻¹ (y − m̂)
and |A| |K|^{1/2} = (|A|² |K|)^{1/2} = |A K A^T|^{1/2} = |C|^{1/2}, so
fY(y) = (1 / ((2π)^{n/2} |C|^{1/2})) exp(−(1/2)(y − m̂)^T C⁻¹ (y − m̂)) ∼ N(m̂, C)
Linear Transformation of Gaussian r.v.s
Since K is symmetric, it is always possible to find a matrix A such that Λ = A K A^T is diagonal:
Λ = diag(λ1, λ2, ..., λn)
so
fY(y) = (1 / ((2π)^{n/2} |Λ|^{1/2})) e^{−(1/2)(y − m̂)^T Λ⁻¹ (y − m̂)}
      = (1/√(2πλ1)) e^{−(y1 − m̂1)²/(2λ1)} ··· (1/√(2πλn)) e^{−(yn − m̂n)²/(2λn)}
That is, we can transform X into n independent Gaussian r.v.s Y1, ..., Yn with means m̂i and variances λi.
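A sketch of this decorrelation step with a hypothetical 2 x 2 covariance K: the orthogonal eigenvector matrix of K gives an A with A K A^T diagonal, and the transformed samples have (numerically) diagonal covariance:

import numpy as np

rng = np.random.default_rng(8)
K = np.array([[2.0, 0.8],
              [0.8, 1.0]])              # hypothetical covariance matrix
m = np.array([1.0, -1.0])               # hypothetical mean vector

lam, Q = np.linalg.eigh(K)              # K = Q diag(lam) Q^T
A = Q.T                                 # then A K A^T = diag(lam)

X = rng.multivariate_normal(m, K, size=500_000)
Y = X @ A.T                             # apply Y = A X to every sample

print(np.cov(Y.T))                      # ~diag(lam): components uncorrelated,
print(lam)                              # hence independent (jointly Gaussian)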
Mean Square Estimation
We use g(X) to estimate Y, written as Ŷ = g(X). The cost associated with the estimation error is C(Y − Ŷ), e.g.
C(Y − Ŷ) = (Y − Ŷ)² = (Y − g(X))²
The mean square error is
E[C] = E[(Y − g(X))²]
Case 1: if Ŷ = a (a constant),
E[(Y − a)²] = E[Y²] − 2aE[Y] + a² = f(a)
df/da = 0 ⇒ a* = E[Y] = µY
The mean square error: E[(Y − µY)²] = Var[Y]
Mean Square Estimation
Case 2: if Ŷ = aX + b, then E[(Y − aX − b)²] = f(a, b).
∂f/∂a = 0 and ∂f/∂b = 0 ⇒ a* = ρXY (σY/σX),  b* = µY − a* µX
⇒ Ŷ = ρXY (σY/σX)(X − µX) + µY
This is called the MMSE linear estimation.
The mean square error:
E[(Y − Ŷ)²] = σY²(1 − ρXY²)
If ρXY = 0, Ŷ = µY and the error is σY², which reduces to case 1.
If ρXY = ±1, Ŷ = ±(σY/σX)(X − µX) + µY, and the error is 0.
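A sketch that checks the MMSE linear estimator and its error formula on hypothetical correlated Gaussian data (ρ = 0.6, σX = 1, σY = 2):

import numpy as np

rng = np.random.default_rng(9)
rho, sx, sy, mx, my = 0.6, 1.0, 2.0, 0.0, 3.0
X = rng.normal(mx, sx, 500_000)
noise = rng.normal(size=500_000)
Y = my + rho * (sy / sx) * (X - mx) + np.sqrt(1 - rho**2) * sy * noise

a = rho * sy / sx                 # a* = rho * sigma_Y / sigma_X
b = my - a * mx                   # b* = mu_Y - a* mu_X
Y_hat = a * X + b

print(np.mean((Y - Y_hat)**2))    # ~ sigma_Y^2 (1 - rho^2) = 4 * 0.64 = 2.56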
Mean Square Estimation
Case 3: Ŷ is a general function of X.
E[(Y − Ŷ)²] = E[(Y − g(X))²] = E[E[(Y − g(X))²|X]] = ∫_{−∞}^{∞} E[(Y − g(x))²|x] fX(x) dx
For each x, choose g(x) to minimize E[(Y − g(x))²|X = x]
⇒ g*(x) = E[Y|X = x]
This is called the MMSE estimation.
Example: X, Y jointly Gaussian.
E[Y|X] = ρX,Y (σY/σX)(X − µX) + µY
The MMSE estimator is linear in the Gaussian case.
Mean Square Estimation
Example: X ∼ Uniform(−1, 1), Y = X². We have E[X] = 0 and
Cov(X, Y) = E[XY] − E[X]E[Y] = E[XY] = E[X³] = 0 ⇒ ρXY = 0
So the MMSE linear estimate is Ŷ = µY and the error is σY².
The MMSE estimate:
g*(x) = E[Y|X = x] = E[X²|X = x] = x²
So Ŷ = X² and the error is 0.
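A last numerical sketch contrasting the two estimators for this example (Var(Y) = Var(X²) = 4/45 ≈ 0.089 for X uniform on (−1, 1)):

import numpy as np

rng = np.random.default_rng(10)
X = rng.uniform(-1.0, 1.0, 500_000)
Y = X**2

print(np.mean((Y - np.mean(Y))**2))   # linear MMSE error = Var(Y) ~ 4/45 ~ 0.089
print(np.mean((Y - X**2)**2))         # MMSE estimator g*(X) = X^2: error = 0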