Estimation of Random Variables
Two types of estimation:
1) Estimating parameters/statistics of a random variable (or several) from data.
2) Estimating the value of an inaccessible random variable X based on observations of another random variable, Y.
e.g. Estimate the future price of a stock based on its present (and past) price.
Two conditional estimators:

1) Maximum A Posteriori Probability (MAP) Estimator:

Given Y = y:
$\hat{X} = x$ such that $P(X = x \mid Y = y)$ is maximized.

$$P(X = x \mid Y = y) = \frac{P(Y = y \mid X = x)\,P(X = x)}{P(Y = y)}$$

So we need to know the probabilities on the right-hand side to perform this estimate, especially P(X = x), which may not be available. (Remember, X is hard to observe.)

If X and Y are jointly continuous:
$$\max_x f_X(x \mid y)$$
2) Maximum Likelihood (ML) Estimator:

Given Y = y:
$\hat{X} = x$ such that $P(Y = y \mid X = x)$ is maximized,
i.e. find the likeliest X value based on the observation.

This is useful when $P(Y = y \mid X)$ is available, i.e. the likelihood of observing a Y value given the value of X is known.
e.g. Probability of receiving a 0 on a communication channel given that a 0 or 1 was sent.

If X and Y are jointly continuous:
$$\max_x f_Y(y \mid x)$$
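As a concrete illustration of the difference between the two estimators, here is a minimal Python sketch for a hypothetical binary channel; the prior and crossover probabilities are made-up numbers chosen for illustration, not values from the lecture.

```python
# MAP vs. ML for a hypothetical binary channel (illustrative numbers only).
# X is the transmitted bit (hard to observe), Y is the received bit.

p_prior = {0: 0.95, 1: 0.05}        # assumed prior P(X = x): 0 is sent far more often
p_y_given_x = {                     # assumed channel likelihoods P(Y = y | X = x), keyed by (y, x)
    (0, 0): 0.95, (1, 0): 0.05,
    (0, 1): 0.10, (1, 1): 0.90,
}

def ml_estimate(y):
    """Pick the x that maximizes the likelihood P(Y = y | X = x)."""
    return max((0, 1), key=lambda x: p_y_given_x[(y, x)])

def map_estimate(y):
    """Pick the x that maximizes the posterior, proportional to P(Y = y | X = x) P(X = x)."""
    return max((0, 1), key=lambda x: p_y_given_x[(y, x)] * p_prior[x])

for y in (0, 1):
    print(f"Y = {y}:  ML -> {ml_estimate(y)},  MAP -> {map_estimate(y)}")
```

With this strongly skewed prior the two estimators disagree when y = 1: ML picks the bit that best explains the observation, while MAP is pulled toward the a priori more likely transmitted bit.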
Example 6.26: X, Y are jointly Gaussian

MAP:
$$f_X(x \mid y) = \frac{1}{\sqrt{2\pi(1-\rho^2)}\,\sigma_X}\exp\left\{-\frac{\left[x - \rho\frac{\sigma_X}{\sigma_Y}(y - m_Y) - m_X\right]^2}{2\sigma_X^2(1-\rho^2)}\right\}$$

$f_X(x \mid y)$ is maximized when the squared term in the exponent equals 0:
$$\Rightarrow \hat{X}_{MAP} = \rho\,\frac{\sigma_X}{\sigma_Y}\,(y - m_Y) + m_X$$
ML:
Again,
$$f_Y(y \mid x) = \frac{1}{\sqrt{2\pi(1-\rho^2)}\,\sigma_Y}\exp\left\{-\frac{\left[y - \rho\frac{\sigma_Y}{\sigma_X}(x - m_X) - m_Y\right]^2}{2\sigma_Y^2(1-\rho^2)}\right\}$$

$f_Y(y \mid x)$ is maximized when the squared term in the exponent equals 0, i.e.
$$y - \rho\,\frac{\sigma_Y}{\sigma_X}\,(x - m_X) - m_Y = 0$$
$$\Rightarrow \hat{X}_{ML} = \frac{\sigma_X}{\rho\,\sigma_Y}\,(y - m_Y) + m_X$$
Note that in this case $\hat{X}_{MAP} \neq \hat{X}_{ML}$.
This isn't always the case.
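Here is a short numerical sketch of this example with hypothetical parameter values (not from the text): it evaluates both closed-form estimates and confirms them by brute-force maximization of the two conditional densities over a grid. The normalizing constants are dropped since they do not affect the argmax.

```python
import numpy as np

# Hypothetical parameters for jointly Gaussian X, Y (illustrative values only).
m_X, m_Y = 1.0, 2.0
sigma_X, sigma_Y = 2.0, 1.0
rho = 0.5
y = 3.0                               # observed value of Y

# Closed-form estimates from the derivation above.
x_map = m_X + rho * (sigma_X / sigma_Y) * (y - m_Y)
x_ml  = m_X + (sigma_X / (rho * sigma_Y)) * (y - m_Y)

# Brute-force check: maximize f_X(x | y) and f_Y(y | x) over a grid of x values
# (normalizing constants omitted; they do not change the location of the maximum).
x_grid = np.linspace(-20.0, 20.0, 400001)
f_x_given_y = np.exp(-(x_grid - m_X - rho * sigma_X / sigma_Y * (y - m_Y)) ** 2
                     / (2 * sigma_X ** 2 * (1 - rho ** 2)))
f_y_given_x = np.exp(-(y - m_Y - rho * sigma_Y / sigma_X * (x_grid - m_X)) ** 2
                     / (2 * sigma_Y ** 2 * (1 - rho ** 2)))

print("MAP:", x_map, " grid argmax:", x_grid[np.argmax(f_x_given_y)])
print("ML :", x_ml,  " grid argmax:", x_grid[np.argmax(f_y_given_x)])
```

Since |ρ| ≤ 1, the ML estimate deviates further from m_X than the MAP estimate (by a factor of 1/ρ² here); ML ignores the marginal statistics of X, while MAP is pulled back toward them.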
Minimum Mean Square Estimator (MMSE)
Estimate X given observations of Y:
$$\hat{X} = g(Y)$$
such that the error $e = E\big[(X - g(Y))^2\big]$ is minimized.

Case 1: g(Y) = a (a constant):
$$\min_a E\big[(X - a)^2\big] = \min_a \big(E[X^2] + a^2 - 2a\,E[X]\big)$$
To minimize, solve:
$$\frac{d}{da}\big(E[X^2] + a^2 - 2a\,E[X]\big) = 0$$
$$\Rightarrow 2\big(a - E[X]\big) = 0$$
$$\Rightarrow a^* = E[X] \;\Rightarrow\; \hat{X}^* = E[X]$$
i.e. the best constant MMSE estimate of X is its mean.
The estimation error in this case is
$$E\big[(X - E[X])^2\big] = \mathrm{VAR}(X)$$
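A quick empirical check of Case 1, using an arbitrarily chosen distribution: scanning candidate constants shows the MSE is minimized near the sample mean, with minimum value near the sample variance.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)   # any distribution works; exponential chosen arbitrarily

candidates = np.linspace(0.0, 6.0, 601)                      # candidate constants a
mse = np.array([np.mean((x - a) ** 2) for a in candidates])  # estimates of E[(X - a)^2]

print("best constant a* ≈", candidates[np.argmin(mse)])      # should be close to the sample mean
print("sample mean      ≈", x.mean())                        # E[X]
print("minimum MSE      ≈", mse.min())                       # should be close to the sample variance
print("sample variance  ≈", x.var())                         # VAR(X)
```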
Case 2: Linear estimator g(Y) = aY + b
$$\min_{a,b} e = \min_{a,b} E\big[(X - aY - b)^2\big]$$
This is like estimating the random variable X - aY by the constant b. So, using Case 1,
$$b^* = E[X - aY] = E[X] - a\,E[Y],$$
which gives the new estimation problem
$$\min_a E\Big[\big((X - E[X]) - a\,(Y - E[Y])\big)^2\Big]$$
Taking the derivative w.r.t. a and setting it to 0:
$$\frac{d}{da}\,E\Big[\big((X - E[X]) - a\,(Y - E[Y])\big)^2\Big] = 0$$
$$\frac{d}{da}\,E\big[(X - E[X])^2\big] + \frac{d}{da}\,a^2\,E\big[(Y - E[Y])^2\big] - \frac{d}{da}\,2a\,E\big[(X - E[X])(Y - E[Y])\big]$$
$$= 2a\,\mathrm{VAR}(Y) - 2\,\mathrm{COV}(X,Y)$$
Solving
$$2a^*\,\mathrm{VAR}(Y) - 2\,\mathrm{COV}(X,Y) = 0$$
$$\Rightarrow a^* = \frac{\mathrm{COV}(X,Y)}{\mathrm{VAR}(Y)} = \rho_{XY}\,\frac{\sigma_X}{\sigma_Y}$$

$$\hat{X} = a^* Y + b^* = \rho_{XY}\,\frac{\sigma_X}{\sigma_Y}\,Y + E[X] - \rho_{XY}\,\frac{\sigma_X}{\sigma_Y}\,E[Y]$$
$$\Rightarrow \hat{X} = \rho_{XY}\,\frac{\sigma_X}{\sigma_Y}\,\big(Y - E[Y]\big) + E[X]$$
Error:
$$e^* = E\big[(X - a^* Y - b^*)^2\big]$$
$$= E\Big[\big((X - E[X]) - a^*\,(Y - E[Y])\big)^2\Big]$$
$$= E\big[(X - E[X])^2\big] + a^{*2}\,E\big[(Y - E[Y])^2\big] - 2a^*\,E\big[(X - E[X])(Y - E[Y])\big]$$
$$= \sigma_X^2 + a^{*2}\,\sigma_Y^2 - 2a^*\,\mathrm{COV}(X,Y)$$
$$= \sigma_X^2 + \rho_{XY}^2\,\frac{\sigma_X^2}{\sigma_Y^2}\,\sigma_Y^2 - 2\,\rho_{XY}\,\frac{\sigma_X}{\sigma_Y}\,\rho_{XY}\,\sigma_X\,\sigma_Y$$
$$= \sigma_X^2 + \rho_{XY}^2\,\sigma_X^2 - 2\,\rho_{XY}^2\,\sigma_X^2$$
$$= \sigma_X^2\,\big(1 - \rho_{XY}^2\big)$$
If $\rho_{XY} = \pm 1$ (X, Y perfectly correlated): error = 0.
If $|\rho_{XY}| < 1$:
error ~ $(1 - \rho_{XY}^2)$
error ~ VAR(X): a high-variance X is harder to estimate.
If $\rho_{XY} = 0$:
$g^*(Y) = E[X]$, error = VAR(X).
The best linear estimator is the constant estimator in this case.
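The following Python sketch estimates the linear MMSE coefficients from samples of a hypothetical linear-plus-noise model (the model and numbers are illustrative assumptions) and checks the error formula e* = σ_X²(1 − ρ_XY²).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical correlated pair: Y is observed, X is to be estimated (illustrative model only).
n = 200_000
y = rng.normal(2.0, 1.5, size=n)
x = 0.8 * y + rng.normal(0.0, 1.0, size=n)           # X depends linearly on Y plus noise

# Linear MMSE coefficients from the derivation above.
a_star = np.cov(x, y, bias=True)[0, 1] / np.var(y)   # a* = COV(X, Y) / VAR(Y)
b_star = x.mean() - a_star * y.mean()                # b* = E[X] - a* E[Y]

x_hat = a_star * y + b_star
rho = np.corrcoef(x, y)[0, 1]

print("a* =", a_star, " b* =", b_star)
print("empirical MSE        :", np.mean((x - x_hat) ** 2))
print("sigma_X^2 (1 - rho^2):", np.var(x) * (1 - rho ** 2))  # e* from the formula above
```

The empirical mean square error and the closed-form expression should agree up to sampling noise.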
Heuristic explanation of linear MMSE:
$$\hat{X} = \rho_{XY}\,\sigma_X\,\frac{Y - E[Y]}{\sigma_Y} + E[X]$$
$\dfrac{Y - E[Y]}{\sigma_Y}$ is distributed with mean 0 and variance 1 (the standardized version of Y).
$\rho_{XY}\,\sigma_X\,\dfrac{Y - E[Y]}{\sigma_Y}$ rescales the standardized Y so that its spread is the standard deviation of X scaled by $\rho_{XY}$.
$+\,E[X]$ shifts the rescaled distribution to have the mean of X.
Case 3: Nonlinear estimator g(Y)
$$\min_g E\big[(X - g(Y))^2\big]$$
Using conditional expectation:
$$E\big[(X - g(Y))^2\big] = \int E\big[(X - g(Y))^2 \mid Y = y\big]\,f_Y(y)\,dy$$
Since
$$E\big[(X - g(Y))^2 \mid Y = y\big] \ge 0 \quad \forall\,y,$$
the integral can be minimized by minimizing this quantity for every value of y.
However, given Y = y, g(Y) = g(y) = constant
$$\Rightarrow \min E\big[(X - g(y))^2 \mid Y = y\big],$$
which is a Case 1 problem with a = g(y)
$$\Rightarrow g^*(y) = E[X \mid Y = y]$$
$g^* = E[X \mid Y = y]$ is called the regression curve.
The error achieved by the regression curve is
$$e^* = E\big[(X - g^*(Y))^2\big] = \int E\big[(X - E[X \mid y])^2 \mid Y = y\big]\,f_Y(y)\,dy = \int \mathrm{VAR}(X \mid Y = y)\,f_Y(y)\,dy$$
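A minimal Monte Carlo sketch of the regression curve, assuming a hypothetical nonlinear model (X = Y² plus noise, numbers chosen arbitrarily): E[X | Y = y] is approximated by averaging X within narrow bins of Y, and its error is compared with that of the best linear estimator.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical nonlinear relationship (illustrative only): here E[X | Y = y] = y^2.
n = 500_000
y = rng.uniform(-2.0, 2.0, size=n)
x = y ** 2 + rng.normal(0.0, 0.3, size=n)

# Approximate the regression curve g*(y) = E[X | Y = y] by averaging X within narrow bins of Y.
bins = np.linspace(-2.0, 2.0, 41)
idx = np.digitize(y, bins)
for i in (5, 20, 35):                                  # a few sample bins
    y_mid = 0.5 * (bins[i - 1] + bins[i])
    print(f"y ≈ {y_mid:+.2f}:  E[X | Y ≈ y] ≈ {x[idx == i].mean():.3f}   (y_mid^2 = {y_mid**2:.3f})")

# The regression curve beats the best linear estimator when the relationship is nonlinear.
a_star = np.cov(x, y, bias=True)[0, 1] / np.var(y)     # ≈ 0 here, since COV(Y^2, Y) = 0
b_star = x.mean() - a_star * y.mean()                  # ≈ E[X]
print("linear MMSE error     :", np.mean((x - (a_star * y + b_star)) ** 2))
print("regression curve error:", np.mean((x - y ** 2) ** 2))
```

In this model COV(X, Y) ≈ 0, so the best linear estimator collapses to the constant E[X] (the ρ_XY = 0 case above), while the regression curve still tracks X closely.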
E[X | Y = y] for jointly Gaussian X, Y:
$$f_X(x \mid y) = \frac{1}{\sqrt{2\pi(1-\rho_{XY}^2)}\,\sigma_X}\exp\left\{-\frac{\left[x - E[X] - \rho_{XY}\frac{\sigma_X}{\sigma_Y}\,(y - E[Y])\right]^2}{2\sigma_X^2(1-\rho_{XY}^2)}\right\}$$
Thus, the optimal nonlinear MMSE is
$$E[X \mid Y = y] = E[X] + \rho_{XY}\,\frac{\sigma_X}{\sigma_Y}\,\big(y - E[Y]\big),$$
which is the same as the linear MMSE.
⇒ for jointly Gaussian X, Y, the linear MMSE is optimal and the same as the MAP estimator.
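As a final sanity check, this sketch samples a jointly Gaussian pair with hypothetical parameters and verifies empirically that the conditional mean E[X | Y ≈ y0] falls on the linear MMSE (and MAP) line.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical jointly Gaussian pair (illustrative parameters only).
m_X, m_Y = 1.0, 2.0
sigma_X, sigma_Y, rho = 2.0, 1.0, 0.6
cov = [[sigma_X ** 2, rho * sigma_X * sigma_Y],
       [rho * sigma_X * sigma_Y, sigma_Y ** 2]]
x, y = rng.multivariate_normal([m_X, m_Y], cov, size=500_000).T

# Empirical E[X | Y ≈ y0] versus the linear MMSE line for a few values of y0.
for y0 in (0.5, 2.0, 3.5):
    mask = np.abs(y - y0) < 0.05                              # samples with Y near y0
    cond_mean = x[mask].mean()                                # ≈ E[X | Y = y0]
    linear = m_X + rho * (sigma_X / sigma_Y) * (y0 - m_Y)     # linear MMSE / MAP estimate
    print(f"y0 = {y0}:  E[X | Y ≈ y0] ≈ {cond_mean:.3f},  linear MMSE = {linear:.3f}")
```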