Vector space Interpretation of Random Variables
Interpretation of random variables as elements of a vector space helps in understanding many operations involving random variables. We start with an introduction to the concepts of the vector space. We then discuss the principles of minimum mean-square error estimation and linear minimum mean-square error estimation of a signal in noise, and the vector-space interpretation of the latter.
Vector space
Consider a set $V$ with elements called vectors and the field of real numbers $\mathbb{R}$. $V$ is called a vector space if and only if
1. An operation vector addition '+' is defined in $V$ such that $(V, +)$ is a commutative group. Thus $(V, +)$ satisfies the following properties.
(i) For any pair of elements $v, w \in V$, there exists a unique element $(v + w) \in V$.
(ii) Vector addition is associative: $v + (w + z) = (v + w) + z$ for any three vectors $v, w, z \in V$.
(iii) There is a vector $\mathbf{0} \in V$ such that $v + \mathbf{0} = \mathbf{0} + v = v$ for any $v \in V$.
(iv) For any $v \in V$ there is a vector $-v \in V$ such that $v + (-v) = \mathbf{0} = (-v) + v$.
(v) For any $v, w \in V$, $v + w = w + v$.
2. For any element $v \in V$ and any $r \in \mathbb{R}$, the scalar product $rv \in V$. This scalar product has the following properties for any $r, s \in \mathbb{R}$ and any $v, w \in V$:
3. $r(sv) = (rs)v$
4. $r(v + w) = rv + rw$
5. $(r + s)v = rv + sv$
6. $1v = v$
It is easy to verify that the set of all random variables defined on a probability space $(S, \mathcal{F}, P)$ forms a vector space with respect to addition and scalar multiplication. Similarly, the set of all $n$-dimensional random vectors forms a vector space.
Linear Independence
Consider $N$ vectors $v_1, v_2, \ldots, v_N$. If $c_1 v_1 + c_2 v_2 + \cdots + c_N v_N = \mathbf{0}$ implies that $c_1 = c_2 = \cdots = c_N = 0$, then $v_1, v_2, \ldots, v_N$ are linearly independent.
For $N$ random vectors $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_N$, if $c_1\mathbf{X}_1 + c_2\mathbf{X}_2 + \cdots + c_N\mathbf{X}_N = \mathbf{0}$ implies that $c_1 = c_2 = \cdots = c_N = 0$, then $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_N$ are linearly independent.
Inner Product
If $v$ and $w$ are real vectors in a vector space $V$ defined over the field $\mathbb{R}$, the inner product $\langle v, w \rangle$ is a scalar such that, for any $v, w, z \in V$ and $r \in \mathbb{R}$,
1. $\langle v, w \rangle = \langle w, v \rangle$
2. $\langle v, v \rangle = \|v\|^2 \geq 0$, where $\|v\|$ is the norm induced by the inner product
3. $\langle v + w, z \rangle = \langle v, z \rangle + \langle w, z \rangle$
4. $\langle rv, w \rangle = r\langle v, w \rangle$
In the case of two random variables $X$ and $Y$, the joint expectation $EXY$ defines an inner product between $X$ and $Y$. Thus
$$\langle X, Y \rangle = EXY$$
We can easily verify that $EXY$ satisfies the axioms of an inner product.
The norm of a random variable $X$ is given by
$$\|X\|^2 = EX^2$$
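As a quick numerical illustration (a sketch, not part of the original notes), the inner product $\langle X, Y \rangle = EXY$ and the norm $\|X\|^2 = EX^2$ can be approximated by sample averages; the distributions below are arbitrary choices made for this sketch.

```python
# Sketch: approximating <X, Y> = E[XY] and ||X||^2 = E[X^2] by sample averages.
# The distributions are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

X = rng.normal(0.0, 2.0, n)            # X ~ N(0, 4)
Y = 0.5 * X + rng.normal(0.0, 1.0, n)  # Y depends on X, so E[XY] != 0

inner_XY = np.mean(X * Y)   # estimate of <X, Y> = E[XY]
norm_X_sq = np.mean(X**2)   # estimate of ||X||^2 = E[X^2]

print(f"<X, Y>  ~ {inner_XY:.3f}  (theory: 0.5 * E[X^2] = 2.0)")
print(f"||X||^2 ~ {norm_X_sq:.3f} (theory: 4.0)")
```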
For two $n$-dimensional random vectors $\mathbf{X} = [X_1, X_2, \ldots, X_n]^\top$ and $\mathbf{Y} = [Y_1, Y_2, \ldots, Y_n]^\top$, the inner product is
$$\langle \mathbf{X}, \mathbf{Y} \rangle = E\mathbf{X}^\top\mathbf{Y} = \sum_{i=1}^{n} EX_i Y_i$$
The norm of a random vector $\mathbf{X}$ is given by
$$\|\mathbf{X}\|^2 = \langle \mathbf{X}, \mathbf{X} \rangle = E\mathbf{X}^\top\mathbf{X} = \sum_{i=1}^{n} EX_i^2$$
The set of RVs along with the inner product defined through the joint expectation
operation and the corresponding norm defines a Hilbert Space.
Schwarz Inequality
For any two vectors $v$ and $w$ belonging to a Hilbert space $V$,
$$|\langle v, w \rangle| \leq \|v\|\,\|w\|$$
This means that for any two random variables $X$ and $Y$,
$$|E(XY)| \leq \sqrt{EX^2}\sqrt{EY^2}$$
Similarly, for any two random vectors $\mathbf{X}$ and $\mathbf{Y}$,
$$|E\mathbf{X}^\top\mathbf{Y}| \leq \sqrt{E\mathbf{X}^\top\mathbf{X}}\sqrt{E\mathbf{Y}^\top\mathbf{Y}}$$
Orthogonal Random Variables and Orthogonal Random Vectors
Two vectors $v$ and $w$ are called orthogonal if $\langle v, w \rangle = 0$.
Two random variables $X$ and $Y$ are called orthogonal if $EXY = 0$.
Similarly, two random vectors $\mathbf{X}$ and $\mathbf{Y}$ are called orthogonal if
$$E\mathbf{X}^\top\mathbf{Y} = \sum_{i=1}^{n} EX_i Y_i = 0$$
Just like independent random variables and uncorrelated random variables, orthogonal random variables form an important class of random variables.
Remark
If $X$ and $Y$ are uncorrelated, then
$$E(X - \mu_X)(Y - \mu_Y) = 0,$$
so $(X - \mu_X)$ is orthogonal to $(Y - \mu_Y)$.
If each of $X$ and $Y$ is of zero mean, then
$$\operatorname{Cov}(X, Y) = EXY$$
In this case, $EXY = 0 \iff \operatorname{Cov}(X, Y) = 0$.
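A short sketch of this remark (illustrative only, with deliberately non-zero means): independent $X$ and $Y$ are uncorrelated but not orthogonal, while their centered versions are orthogonal.

```python
# Sketch: for zero-mean variables, Cov(X, Y) and E[XY] coincide, so
# "uncorrelated" and "orthogonal" are the same condition. Illustrative choices only.
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

X = rng.normal(3.0, 1.0, n)    # nonzero mean
Y = rng.normal(-1.0, 2.0, n)   # independent of X, hence uncorrelated

# X and Y are uncorrelated but NOT orthogonal: E[XY] ~ 3 * (-1) = -3.
print("E[XY]             ~", np.mean(X * Y))
# Their centered versions are orthogonal: E[(X - mu_X)(Y - mu_Y)] ~ 0.
print("E[(X-mX)(Y-mY)]   ~", np.mean((X - X.mean()) * (Y - Y.mean())))
```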
Minimum Mean-square-error Estimation
Suppose $X$ is a random variable which is not observable and $Y$ is another, observable random variable which is statistically dependent on $X$ through the joint probability density function $f_{X,Y}(x, y)$. We pose the following problem: given a value of $Y$, what is the best guess for $X$? This problem is known as the estimation problem and has many practical applications. One application is the estimation of a signal from noisy observations, as illustrated in the figure below:
(Figure: the signal X is corrupted by additive noise to give the noisy observation Y, which is passed through an estimation block to produce the estimated signal X̂.)
Let $\hat{X}(Y)$ be the estimate of the random variable $X$ based on the random variable $Y$. Clearly $\hat{X}(Y)$ is a function of $Y$. We have to find the best estimate $\hat{X}(Y)$ in some meaningful sense. Observe that
• $X$ is the unknown random variable,
• $\hat{X}(Y)$ is the estimate of $X$,
• $X - \hat{X}(Y)$ is the estimation error,
• $E(X - \hat{X}(Y))^2$ is the mean of the square error.
One meaningful criterion is to minimize $E(X - \hat{X}(Y))^2$ with respect to $\hat{X}(Y)$; the corresponding estimation principle is called the minimum mean-square error (MMSE) principle. The function we want to minimize is called a cost function in optimization theory. For finding $\hat{X}(Y)$, we have to minimize the cost function
$$E(X - \hat{X}(Y))^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \hat{X}(y))^2 f_{X,Y}(x, y)\,dy\,dx$$
$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \hat{X}(y))^2 f_Y(y) f_{X/Y}(x/y)\,dy\,dx$$
$$= \int_{-\infty}^{\infty} f_Y(y)\left(\int_{-\infty}^{\infty} (x - \hat{X}(y))^2 f_{X/Y}(x/y)\,dx\right)dy$$
Since $f_Y(y)$ is always positive, the minimization of $E(X - \hat{X}(Y))^2$ with respect to $\hat{X}(Y)$ is equivalent to minimizing the inside integral $\int_{-\infty}^{\infty}(x - \hat{X}(y))^2 f_{X/Y}(x/y)\,dx$ with respect to $\hat{X}(y)$. The condition for the minimum is
$$\frac{\partial}{\partial \hat{X}}\int_{-\infty}^{\infty}(x - \hat{X}(y))^2 f_{X/Y}(x/y)\,dx = 0$$
or
$$-2\int_{-\infty}^{\infty}(x - \hat{X}(y)) f_{X/Y}(x/y)\,dx = 0$$
$$\Rightarrow \int_{-\infty}^{\infty}\hat{X}(y) f_{X/Y}(x/y)\,dx = \int_{-\infty}^{\infty} x f_{X/Y}(x/y)\,dx$$
$$\Rightarrow \hat{X}(y) = E(X/Y = y)$$
Thus, the minimum mean-square error estimation involves the conditional expectation $E(X/Y = y)$. To find $E(X/Y = y)$, we have to determine the a posteriori probability density $f_{X/Y}(x/y)$ and perform the integration $\int_{-\infty}^{\infty} x f_{X/Y}(x/y)\,dx$. These operations are computationally expensive when they have to be performed numerically.
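The sketch below (not from the original notes) shows one way to carry out these operations numerically for an assumed joint pdf; a zero-mean bivariate Gaussian with unit variances and an assumed correlation coefficient is used purely for illustration, so the closed-form answer of the next example is available for comparison.

```python
# Sketch: computing the MMSE estimate E(X / Y = y) numerically for an assumed
# joint pdf f_{X,Y}.  The bivariate Gaussian and the grid are illustrative choices.
import numpy as np

rho = 0.7  # assumed correlation coefficient for the illustrative pdf

def f_xy(x, y):
    """Zero-mean bivariate Gaussian pdf with unit variances and correlation rho."""
    c = 1.0 / (2 * np.pi * np.sqrt(1 - rho**2))
    return c * np.exp(-(x**2 - 2 * rho * x * y + y**2) / (2 * (1 - rho**2)))

def mmse_estimate(y, x_grid):
    """E(X / Y = y) = integral of x * f_{X/Y}(x/y) dx, done by a Riemann sum."""
    dx = x_grid[1] - x_grid[0]
    joint = f_xy(x_grid, y)        # f_{X,Y}(x, y) along the x grid
    f_y = joint.sum() * dx         # marginal f_Y(y)
    f_x_given_y = joint / f_y      # a posteriori density f_{X/Y}(x/y)
    return (x_grid * f_x_given_y).sum() * dx

x_grid = np.linspace(-10.0, 10.0, 4001)
y0 = 1.5
print("numerical E(X / Y = 1.5):", mmse_estimate(y0, x_grid))
print("closed form rho * y     :", rho * y0)
```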
Example: Consider two zero-mean jointly Gaussian random variables $X$ and $Y$ with the joint pdf
$$f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1 - \rho_{X,Y}^2}} \exp\left\{-\frac{1}{2(1 - \rho_{X,Y}^2)}\left[\frac{x^2}{\sigma_X^2} - \frac{2\rho_{X,Y}\,xy}{\sigma_X\sigma_Y} + \frac{y^2}{\sigma_Y^2}\right]\right\}, \qquad -\infty < x < \infty,\; -\infty < y < \infty$$
The marginal density $f_Y(y)$ is Gaussian and is given by
$$f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma_Y}\, e^{-\frac{y^2}{2\sigma_Y^2}}$$
$$\therefore\; f_{X/Y}(x/y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = \frac{1}{\sqrt{2\pi}\,\sigma_X\sqrt{1 - \rho_{X,Y}^2}} \exp\left\{-\frac{1}{2\sigma_X^2(1 - \rho_{X,Y}^2)}\left(x - \frac{\rho_{X,Y}\,\sigma_X}{\sigma_Y}\,y\right)^2\right\}$$
which is Gaussian with mean $\dfrac{\rho_{X,Y}\,\sigma_X}{\sigma_Y}\,y$. Therefore, the MMSE estimator of $X$ given $Y = y$ is given by
$$\hat{X}(y) = E(X/Y = y) = \frac{\rho_{X,Y}\,\sigma_X}{\sigma_Y}\,y$$
This example illustrates that in the case of jointly Gaussian random variables $X$ and $Y$, the minimum mean-square error estimator of $X$ given $Y = y$ is linearly related to $y$. This important result gives us a clue to a simpler version of the mean-square error estimation problem, discussed below.
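An empirical check of this linearity (a sketch, assuming illustrative values of $\sigma_X$, $\sigma_Y$ and $\rho_{X,Y}$, and approximating $E(X/Y = y)$ by averaging simulated samples with $Y$ near $y$):

```python
# Sketch: for zero-mean jointly Gaussian X, Y, the MMSE estimate E(X / Y = y)
# follows the straight line (rho * sigma_X / sigma_Y) * y.  Parameters and the
# bin width are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(3)
n = 2_000_000
sigma_x, sigma_y, rho = 2.0, 1.0, 0.6

cov = [[sigma_x**2, rho * sigma_x * sigma_y],
       [rho * sigma_x * sigma_y, sigma_y**2]]
X, Y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

# Approximate E(X / Y = y0) by averaging X over samples with Y close to y0.
for y0 in (-1.0, 0.5, 1.5):
    near = np.abs(Y - y0) < 0.02
    print(f"y = {y0:+.1f}: sample E(X/Y=y) = {X[near].mean():+.3f}, "
          f"theory (rho*sx/sy)*y = {rho * sigma_x / sigma_y * y0:+.3f}")
```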
Linear Minimum Mean-square-error Estimation and the Orthogonality Principle
We assume that $X$ and $Y$ are both of zero mean and that $\hat{X}(Y) = aY$. The estimation problem is now to find the optimal value of $a$. Thus, we have the linear minimum mean-square error criterion, which minimizes $E(X - aY)^2$ with respect to $a$.
(Figure: the signal X plus noise gives the noisy observation Y; the linear estimator produces the estimated signal X̂ = aY.)
Setting the derivative of the mean-square error with respect to $a$ to zero,
$$\frac{d}{da} E(X - aY)^2 = 0$$
$$\Rightarrow E(X - aY)Y = 0$$
$$\Rightarrow EeY = 0$$
where e is the estimation error.
Thus the optimum value of $a$ is such that the estimation error $(X - aY)$ is orthogonal to the observed random variable $Y$, and the optimal estimator $aY$ is the orthogonal projection of $X$ on $Y$. This orthogonality principle forms the heart of a class of estimation problems called Wiener filtering. The orthogonality principle is illustrated geometrically in the figure below.
(Figure: the vector X, its orthogonal projection aY on Y, and the estimation error e, which is orthogonal to Y.)
The optimum value of $a$ is given by
$$E(X - aY)Y = 0$$
$$\Rightarrow EXY - aEY^2 = 0$$
$$\Rightarrow a = \frac{EXY}{EY^2}$$
The corresponding minimum linear mean-square error (LMMSE) is
$$\text{LMMSE} = E(X - aY)^2 = E(X - aY)X - aE(X - aY)Y$$
$$= E(X - aY)X \qquad (\text{since } E(X - aY)Y = 0, \text{ using the orthogonality principle})$$
$$= EX^2 - aEXY$$
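The sketch below (with an assumed signal-plus-noise model chosen only for illustration) estimates $a = EXY/EY^2$ from samples and checks the orthogonality $EeY = 0$ and the LMMSE expression just derived.

```python
# Sketch: estimating the LMMSE coefficient a = EXY / EY^2 from samples and
# checking the orthogonality E[eY] ~ 0.  The signal model is illustrative.
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000

X = rng.normal(0.0, 1.0, n)       # zero-mean signal
Y = X + rng.normal(0.0, 0.5, n)   # zero-mean noisy observation

a = np.mean(X * Y) / np.mean(Y**2)  # a = EXY / EY^2  (theory: 1 / 1.25 = 0.8)
e = X - a * Y                       # estimation error

print("a           ~", a)
print("E[e Y]      ~", np.mean(e * Y))                      # ~ 0 (orthogonality)
print("LMMSE       ~", np.mean(e**2))                       # ~ 0.2
print("EX^2 - aEXY ~", np.mean(X**2) - a * np.mean(X * Y))  # same value
```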
The orthogonality principle can be applied to optimal estimation of a random variable
from more than one observation. We illustrate this in the following example.
Example: Suppose $X$ is a zero-mean random variable which is to be estimated from two zero-mean random variables $Y_1$ and $Y_2$. Let the LMMSE estimator be $\hat{X} = a_1Y_1 + a_2Y_2$. Then the optimal values of $a_1$ and $a_2$ are given by
$$\frac{\partial}{\partial a_i} E(X - a_1Y_1 - a_2Y_2)^2 = 0, \qquad i = 1, 2.$$
This results in the orthogonality conditions
$$E(X - a_1Y_1 - a_2Y_2)Y_1 = 0$$
and
$$E(X - a_1Y_1 - a_2Y_2)Y_2 = 0$$
Rewriting the above equations, we get
$$a_1 EY_1^2 + a_2 EY_2Y_1 = EXY_1$$
and
$$a_1 EY_2Y_1 + a_2 EY_2^2 = EXY_2$$
Solving these equations, we can find $a_1$ and $a_2$.
Further, the corresponding minimum linear mean-square error (LMMSE) is
$$\text{LMMSE} = E(X - a_1Y_1 - a_2Y_2)^2 = E(X - a_1Y_1 - a_2Y_2)X - a_1E(X - a_1Y_1 - a_2Y_2)Y_1 - a_2E(X - a_1Y_1 - a_2Y_2)Y_2$$
$$= E(X - a_1Y_1 - a_2Y_2)X \qquad (\text{using the orthogonality principle})$$
$$= EX^2 - a_1EXY_1 - a_2EXY_2$$
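A sketch of the same procedure with sample moments in place of the exact expectations; the data model for $X$, $Y_1$, $Y_2$ below is an illustrative assumption, not part of the original notes.

```python
# Sketch: solving the two-observation normal equations for a1, a2 using sample
# moments in place of exact expectations.  The data model is illustrative.
import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000

X = rng.normal(0.0, 1.0, n)              # zero-mean signal to be estimated
Y1 = X + rng.normal(0.0, 0.5, n)         # two zero-mean noisy observations
Y2 = 0.5 * X + rng.normal(0.0, 0.5, n)

# Normal equations:  [EY1^2   EY2Y1] [a1]   [EXY1]
#                    [EY2Y1   EY2^2] [a2] = [EXY2]
R = np.array([[np.mean(Y1**2),   np.mean(Y2 * Y1)],
              [np.mean(Y2 * Y1), np.mean(Y2**2)]])
r = np.array([np.mean(X * Y1), np.mean(X * Y2)])
a1, a2 = np.linalg.solve(R, r)

lmmse = np.mean(X**2) - a1 * r[0] - a2 * r[1]
print(f"a1 = {a1:.3f}, a2 = {a2:.3f}, LMMSE = {lmmse:.3f}")
```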