Estimation of Random Variables
Two types of estimation:
1) Estimating parameters/statistics of a random variable (or several) from data.
2) Estimating the value of an inaccessible random variable X based on observations of another random variable, Y.
e.g. Estimate the future price of a stock based on its present (and past) price.
Two conditional estimators:

1) Maximum A Posteriori Probability (MAP) Estimator:

Given Y = y:
$\hat{X} = x$ such that $P(X = x \mid Y = y)$ is maximized.

$$P(X = x \mid Y = y) = \frac{P(Y = y \mid X = x)\,P(X = x)}{P(Y = y)}$$

So we need to know the probabilities on the right-hand side to perform this estimate, especially P(X = x), which may not be available. (Remember, X is hard to observe.)

If X and Y are jointly continuous:
$$\max_x f_X(x \mid y)$$
2) Maximum Likelihood (ML) Estimator:

Given Y = y:
$\hat{X} = x$ such that $P(Y = y \mid X = x)$ is maximized,
i.e. find the likeliest X value based on the observation.

This is useful when $P(Y = y \mid X)$ is available, i.e. the likelihood of observing a Y value given the value of X is known.
e.g. Probability of receiving a 0 on a communication channel given that a 0 or 1 was sent.

If X and Y are jointly continuous:
$$\max_x f_Y(y \mid x)$$
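As a concrete illustration of the difference between the two estimators, here is a minimal Python sketch for a hypothetical binary channel; the prior and crossover probabilities are made-up numbers chosen for illustration, not values from the lecture.

```python
# MAP vs. ML for a hypothetical binary channel (illustrative numbers only).
# X is the transmitted bit (hard to observe), Y is the received bit.

p_prior = {0: 0.95, 1: 0.05}        # assumed prior P(X = x): 0 is sent far more often
p_y_given_x = {                     # assumed channel likelihoods P(Y = y | X = x), keyed by (y, x)
    (0, 0): 0.95, (1, 0): 0.05,
    (0, 1): 0.10, (1, 1): 0.90,
}

def ml_estimate(y):
    """Pick the x that maximizes the likelihood P(Y = y | X = x)."""
    return max((0, 1), key=lambda x: p_y_given_x[(y, x)])

def map_estimate(y):
    """Pick the x that maximizes the posterior, proportional to P(Y = y | X = x) P(X = x)."""
    return max((0, 1), key=lambda x: p_y_given_x[(y, x)] * p_prior[x])

for y in (0, 1):
    print(f"Y = {y}:  ML -> {ml_estimate(y)},  MAP -> {map_estimate(y)}")
```

With this strongly skewed prior the two estimators disagree when y = 1: ML picks the bit that best explains the observation, while MAP is pulled toward the a priori more likely transmitted bit.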
Example 6.26: X, Y are jointly Gaussian

MAP:
$$f_X(x \mid y) = \frac{1}{\sqrt{2\pi(1-\rho^2)}\,\sigma_X}\exp\left\{-\frac{\left[x - \rho\frac{\sigma_X}{\sigma_Y}(y - m_Y) - m_X\right]^2}{2\sigma_X^2(1-\rho^2)}\right\}$$

$f_X(x \mid y)$ is maximized when the squared term in the exponent equals 0:
$$\Rightarrow \hat{X}_{MAP} = \rho\,\frac{\sigma_X}{\sigma_Y}\,(y - m_Y) + m_X$$
ML:
Again,
$$f_Y(y \mid x) = \frac{1}{\sqrt{2\pi(1-\rho^2)}\,\sigma_Y}\exp\left\{-\frac{\left[y - \rho\frac{\sigma_Y}{\sigma_X}(x - m_X) - m_Y\right]^2}{2\sigma_Y^2(1-\rho^2)}\right\}$$

$f_Y(y \mid x)$ is maximized when the squared term in the exponent equals 0, i.e.
$$y - \rho\,\frac{\sigma_Y}{\sigma_X}\,(x - m_X) - m_Y = 0$$
$$\Rightarrow \hat{X}_{ML} = \frac{\sigma_X}{\rho\,\sigma_Y}\,(y - m_Y) + m_X$$
Note that in this case $\hat{X}_{MAP} \neq \hat{X}_{ML}$.
This isn't always the case.
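Here is a short numerical sketch of this example with hypothetical parameter values (not from the text): it evaluates both closed-form estimates and confirms them by brute-force maximization of the two conditional densities over a grid. The normalizing constants are dropped since they do not affect the argmax.

```python
import numpy as np

# Hypothetical parameters for jointly Gaussian X, Y (illustrative values only).
m_X, m_Y = 1.0, 2.0
sigma_X, sigma_Y = 2.0, 1.0
rho = 0.5
y = 3.0                               # observed value of Y

# Closed-form estimates from the derivation above.
x_map = m_X + rho * (sigma_X / sigma_Y) * (y - m_Y)
x_ml  = m_X + (sigma_X / (rho * sigma_Y)) * (y - m_Y)

# Brute-force check: maximize f_X(x | y) and f_Y(y | x) over a grid of x values
# (normalizing constants omitted; they do not change the location of the maximum).
x_grid = np.linspace(-20.0, 20.0, 400001)
f_x_given_y = np.exp(-(x_grid - m_X - rho * sigma_X / sigma_Y * (y - m_Y)) ** 2
                     / (2 * sigma_X ** 2 * (1 - rho ** 2)))
f_y_given_x = np.exp(-(y - m_Y - rho * sigma_Y / sigma_X * (x_grid - m_X)) ** 2
                     / (2 * sigma_Y ** 2 * (1 - rho ** 2)))

print("MAP:", x_map, " grid argmax:", x_grid[np.argmax(f_x_given_y)])
print("ML :", x_ml,  " grid argmax:", x_grid[np.argmax(f_y_given_x)])
```

Since |ρ| ≤ 1, the ML estimate deviates further from m_X than the MAP estimate (by a factor of 1/ρ² here); ML ignores the marginal statistics of X, while MAP is pulled back toward them.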
Minimum Mean Square Estimator (MMSE)
Estimate X given observations of Y:
$$\hat{X} = g(Y)$$
such that the error $e = E\big[(X - g(Y))^2\big]$ is minimized.

Case 1: g(Y) = a (a constant):
$$\min_a E\big[(X - a)^2\big] = \min_a \big(E[X^2] + a^2 - 2a\,E[X]\big)$$
To minimize, solve:
$$\frac{d}{da}\big(E[X^2] + a^2 - 2a\,E[X]\big) = 0$$
$$\Rightarrow 2\big(a - E[X]\big) = 0$$
$$\Rightarrow a^* = E[X] \;\Rightarrow\; \hat{X}^* = E[X]$$
i.e. the best constant MMSE estimate of X is its mean.
The estimation error in this case is
$$E\big[(X - E[X])^2\big] = \mathrm{VAR}(X)$$
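A quick empirical check of Case 1, using an arbitrarily chosen distribution: scanning candidate constants shows the MSE is minimized near the sample mean, with minimum value near the sample variance.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)   # any distribution works; exponential chosen arbitrarily

candidates = np.linspace(0.0, 6.0, 601)                      # candidate constants a
mse = np.array([np.mean((x - a) ** 2) for a in candidates])  # estimates of E[(X - a)^2]

print("best constant a* ≈", candidates[np.argmin(mse)])      # should be close to the sample mean
print("sample mean      ≈", x.mean())                        # E[X]
print("minimum MSE      ≈", mse.min())                       # should be close to the sample variance
print("sample variance  ≈", x.var())                         # VAR(X)
```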
Case 2: Linear estimator g(Y) = aY + b
$$\min_{a,b} e = \min_{a,b} E\big[(X - aY - b)^2\big]$$
This is like estimating the random variable X - aY by the constant b. So, using Case 1,
$$b^* = E[X - aY] = E[X] - a\,E[Y],$$
which gives the new estimation problem
$$\min_a E\Big[\big((X - E[X]) - a\,(Y - E[Y])\big)^2\Big]$$
Taking the derivative w.r.t. a and setting it to 0:
$$\frac{d}{da}\,E\Big[\big((X - E[X]) - a\,(Y - E[Y])\big)^2\Big] = 0$$
$$\frac{d}{da}\,E\big[(X - E[X])^2\big] + \frac{d}{da}\,a^2\,E\big[(Y - E[Y])^2\big] - \frac{d}{da}\,2a\,E\big[(X - E[X])(Y - E[Y])\big]$$
$$= 2a\,\mathrm{VAR}(Y) - 2\,\mathrm{COV}(X,Y)$$
Solving
$$2a^*\,\mathrm{VAR}(Y) - 2\,\mathrm{COV}(X,Y) = 0$$
$$\Rightarrow a^* = \frac{\mathrm{COV}(X,Y)}{\mathrm{VAR}(Y)} = \rho_{XY}\,\frac{\sigma_X}{\sigma_Y}$$

$$\hat{X} = a^* Y + b^* = \rho_{XY}\,\frac{\sigma_X}{\sigma_Y}\,Y + E[X] - \rho_{XY}\,\frac{\sigma_X}{\sigma_Y}\,E[Y]$$
$$\Rightarrow \hat{X} = \rho_{XY}\,\frac{\sigma_X}{\sigma_Y}\,\big(Y - E[Y]\big) + E[X]$$
Error:
$$e^* = E\big[(X - a^* Y - b^*)^2\big]$$
$$= E\Big[\big((X - E[X]) - a^*\,(Y - E[Y])\big)^2\Big]$$
$$= E\big[(X - E[X])^2\big] + a^{*2}\,E\big[(Y - E[Y])^2\big] - 2a^*\,E\big[(X - E[X])(Y - E[Y])\big]$$
$$= \sigma_X^2 + a^{*2}\,\sigma_Y^2 - 2a^*\,\mathrm{COV}(X,Y)$$
$$= \sigma_X^2 + \rho_{XY}^2\,\frac{\sigma_X^2}{\sigma_Y^2}\,\sigma_Y^2 - 2\,\rho_{XY}\,\frac{\sigma_X}{\sigma_Y}\,\rho_{XY}\,\sigma_X\,\sigma_Y$$
$$= \sigma_X^2 + \rho_{XY}^2\,\sigma_X^2 - 2\,\rho_{XY}^2\,\sigma_X^2$$
$$= \sigma_X^2\,\big(1 - \rho_{XY}^2\big)$$
If $\rho_{XY} = \pm 1$ (X, Y perfectly correlated): error = 0.
If $|\rho_{XY}| < 1$:
error ~ $(1 - \rho_{XY}^2)$
error ~ VAR(X): a high-variance X is harder to estimate.
If $\rho_{XY} = 0$:
$g^*(Y) = E[X]$, error = VAR(X).
The best linear estimator is the constant estimator in this case.
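The following Python sketch estimates the linear MMSE coefficients from samples of a hypothetical linear-plus-noise model (the model and numbers are illustrative assumptions) and checks the error formula e* = σ_X²(1 − ρ_XY²).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical correlated pair: Y is observed, X is to be estimated (illustrative model only).
n = 200_000
y = rng.normal(2.0, 1.5, size=n)
x = 0.8 * y + rng.normal(0.0, 1.0, size=n)           # X depends linearly on Y plus noise

# Linear MMSE coefficients from the derivation above.
a_star = np.cov(x, y, bias=True)[0, 1] / np.var(y)   # a* = COV(X, Y) / VAR(Y)
b_star = x.mean() - a_star * y.mean()                # b* = E[X] - a* E[Y]

x_hat = a_star * y + b_star
rho = np.corrcoef(x, y)[0, 1]

print("a* =", a_star, " b* =", b_star)
print("empirical MSE        :", np.mean((x - x_hat) ** 2))
print("sigma_X^2 (1 - rho^2):", np.var(x) * (1 - rho ** 2))  # e* from the formula above
```

The empirical mean square error and the closed-form expression should agree up to sampling noise.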
Heuristic explanation of linear MMSE:
$$\hat{X} = \rho_{XY}\,\sigma_X\,\frac{Y - E[Y]}{\sigma_Y} + E[X]$$
$\dfrac{Y - E[Y]}{\sigma_Y}$ is distributed with mean 0 and variance 1 (the standardized version of Y).
$\rho_{XY}\,\sigma_X\,\dfrac{Y - E[Y]}{\sigma_Y}$ rescales the standardized Y so that its spread is the standard deviation of X scaled by $\rho_{XY}$.
$+\,E[X]$ shifts the rescaled distribution to have the mean of X.
Case 3: Nonlinear estimator g(Y)
$$\min_g E\big[(X - g(Y))^2\big]$$
Using conditional expectation:
$$E\big[(X - g(Y))^2\big] = \int E\big[(X - g(Y))^2 \mid Y = y\big]\,f_Y(y)\,dy$$
Since
$$E\big[(X - g(Y))^2 \mid Y = y\big] \ge 0 \quad \forall\,y,$$
the integral can be minimized by minimizing this quantity for every value of y.
However, given Y = y, g(Y) = g(y) = constant
$$\Rightarrow \min E\big[(X - g(y))^2 \mid Y = y\big],$$
which is a Case 1 problem with a = g(y)
$$\Rightarrow g^*(y) = E[X \mid Y = y]$$
$g^* = E[X \mid Y = y]$ is called the regression curve.
The error achieved by the regression curve is
$$e^* = E\big[(X - g^*(Y))^2\big] = \int E\big[(X - E[X \mid y])^2 \mid Y = y\big]\,f_Y(y)\,dy = \int \mathrm{VAR}(X \mid Y = y)\,f_Y(y)\,dy$$
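A minimal Monte Carlo sketch of the regression curve, assuming a hypothetical nonlinear model (X = Y² plus noise, numbers chosen arbitrarily): E[X | Y = y] is approximated by averaging X within narrow bins of Y, and its error is compared with that of the best linear estimator.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical nonlinear relationship (illustrative only): here E[X | Y = y] = y^2.
n = 500_000
y = rng.uniform(-2.0, 2.0, size=n)
x = y ** 2 + rng.normal(0.0, 0.3, size=n)

# Approximate the regression curve g*(y) = E[X | Y = y] by averaging X within narrow bins of Y.
bins = np.linspace(-2.0, 2.0, 41)
idx = np.digitize(y, bins)
for i in (5, 20, 35):                                  # a few sample bins
    y_mid = 0.5 * (bins[i - 1] + bins[i])
    print(f"y ≈ {y_mid:+.2f}:  E[X | Y ≈ y] ≈ {x[idx == i].mean():.3f}   (y_mid^2 = {y_mid**2:.3f})")

# The regression curve beats the best linear estimator when the relationship is nonlinear.
a_star = np.cov(x, y, bias=True)[0, 1] / np.var(y)     # ≈ 0 here, since COV(Y^2, Y) = 0
b_star = x.mean() - a_star * y.mean()                  # ≈ E[X]
print("linear MMSE error     :", np.mean((x - (a_star * y + b_star)) ** 2))
print("regression curve error:", np.mean((x - y ** 2) ** 2))
```

In this model COV(X, Y) ≈ 0, so the best linear estimator collapses to the constant E[X] (the ρ_XY = 0 case above), while the regression curve still tracks X closely.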
E[X | Y = y] for jointly Gaussian X, Y:
$$f_X(x \mid y) = \frac{1}{\sqrt{2\pi(1-\rho_{XY}^2)}\,\sigma_X}\exp\left\{-\frac{\left[x - E[X] - \rho_{XY}\frac{\sigma_X}{\sigma_Y}\,(y - E[Y])\right]^2}{2\sigma_X^2(1-\rho_{XY}^2)}\right\}$$
Thus, the optimal nonlinear MMSE is
$$E[X \mid Y = y] = E[X] + \rho_{XY}\,\frac{\sigma_X}{\sigma_Y}\,\big(y - E[Y]\big),$$
which is the same as the linear MMSE.
⇒ for jointly Gaussian X, Y, the linear MMSE is optimal and the same as the MAP estimator.
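As a final sanity check, this sketch samples a jointly Gaussian pair with hypothetical parameters and verifies empirically that the conditional mean E[X | Y ≈ y0] falls on the linear MMSE (and MAP) line.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical jointly Gaussian pair (illustrative parameters only).
m_X, m_Y = 1.0, 2.0
sigma_X, sigma_Y, rho = 2.0, 1.0, 0.6
cov = [[sigma_X ** 2, rho * sigma_X * sigma_Y],
       [rho * sigma_X * sigma_Y, sigma_Y ** 2]]
x, y = rng.multivariate_normal([m_X, m_Y], cov, size=500_000).T

# Empirical E[X | Y ≈ y0] versus the linear MMSE line for a few values of y0.
for y0 in (0.5, 2.0, 3.5):
    mask = np.abs(y - y0) < 0.05                              # samples with Y near y0
    cond_mean = x[mask].mean()                                # ≈ E[X | Y = y0]
    linear = m_X + rho * (sigma_X / sigma_Y) * (y0 - m_Y)     # linear MMSE / MAP estimate
    print(f"y0 = {y0}:  E[X | Y ≈ y0] ≈ {cond_mean:.3f},  linear MMSE = {linear:.3f}")
```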