Vector space Interpretation of Random Variables

Interpretation of random variables as elements of a vector space helps in understanding many operations involving random variables. We start with an introduction to the concepts of a vector space. We will then discuss the principles of minimum mean-square error estimation and linear minimum mean-square error estimation of a signal from noise, and the vector-space interpretation of the latter.

Vector space

Consider a set $V$ with elements called vectors and the field of real numbers $\mathbb{R}$. $V$ is called a vector space if and only if

1. An operation of vector addition '+' is defined in $V$ such that $(V, +)$ is a commutative group. Thus $(V, +)$ satisfies the following properties:
(i) For any pair of elements $\mathbf{v}, \mathbf{w} \in V$, there exists a unique element $(\mathbf{v} + \mathbf{w}) \in V$.
(ii) Vector addition is associative: $\mathbf{v} + (\mathbf{w} + \mathbf{z}) = (\mathbf{v} + \mathbf{w}) + \mathbf{z}$ for any three vectors $\mathbf{v}, \mathbf{w}, \mathbf{z} \in V$.
(iii) There is a vector $\mathbf{0} \in V$ such that $\mathbf{v} + \mathbf{0} = \mathbf{0} + \mathbf{v} = \mathbf{v}$ for any $\mathbf{v} \in V$.
(iv) For any $\mathbf{v} \in V$ there is a vector $-\mathbf{v} \in V$ such that $\mathbf{v} + (-\mathbf{v}) = \mathbf{0} = (-\mathbf{v}) + \mathbf{v}$.
(v) For any $\mathbf{v}, \mathbf{w} \in V$, $\mathbf{v} + \mathbf{w} = \mathbf{w} + \mathbf{v}$.
2. For any element $\mathbf{v} \in V$ and any $r \in \mathbb{R}$, the scalar product $r\mathbf{v} \in V$. This scalar product has the following properties for any $r, s \in \mathbb{R}$ and any $\mathbf{v}, \mathbf{w} \in V$:
3. $r(s\mathbf{v}) = (rs)\mathbf{v}$
4. $r(\mathbf{v} + \mathbf{w}) = r\mathbf{v} + r\mathbf{w}$
5. $(r + s)\mathbf{v} = r\mathbf{v} + s\mathbf{v}$
6. $1\mathbf{v} = \mathbf{v}$

It is easy to verify that the set of all random variables defined on a probability space $(S, \mathcal{F}, P)$ forms a vector space with respect to addition and scalar multiplication. Similarly, the set of all $n$-dimensional random vectors forms a vector space.

Linear Independence

Consider $N$ vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N$. If $c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_N\mathbf{v}_N = \mathbf{0}$ implies that $c_1 = c_2 = \cdots = c_N = 0$, then $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N$ are linearly independent.

Similarly, for $N$ random vectors $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_N$, if $c_1\mathbf{X}_1 + c_2\mathbf{X}_2 + \cdots + c_N\mathbf{X}_N = \mathbf{0}$ implies that $c_1 = c_2 = \cdots = c_N = 0$, then $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_N$ are linearly independent.

Inner Product

If $\mathbf{v}$ and $\mathbf{w}$ are real vectors in a vector space $V$ defined over the field $\mathbb{R}$, the inner product $\langle \mathbf{v}, \mathbf{w} \rangle$ is a scalar such that for all $\mathbf{v}, \mathbf{w}, \mathbf{z} \in V$ and $r \in \mathbb{R}$:
1. $\langle \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{w}, \mathbf{v} \rangle$
2. $\langle \mathbf{v}, \mathbf{v} \rangle = \|\mathbf{v}\|^2 \ge 0$, where $\|\mathbf{v}\|$ is the norm induced by the inner product
3. $\langle \mathbf{v} + \mathbf{w}, \mathbf{z} \rangle = \langle \mathbf{v}, \mathbf{z} \rangle + \langle \mathbf{w}, \mathbf{z} \rangle$
4. $\langle r\mathbf{v}, \mathbf{w} \rangle = r\langle \mathbf{v}, \mathbf{w} \rangle$

In the case of two random variables $X$ and $Y$, the joint expectation $EXY$ defines an inner product between $X$ and $Y$. Thus
$\langle X, Y \rangle = EXY$
We can easily verify that $EXY$ satisfies the axioms of an inner product. The norm of a random variable $X$ is given by
$\|X\|^2 = EX^2$

For two $n$-dimensional random vectors $\mathbf{X} = [X_1, X_2, \ldots, X_n]^{\top}$ and $\mathbf{Y} = [Y_1, Y_2, \ldots, Y_n]^{\top}$, the inner product is
$\langle \mathbf{X}, \mathbf{Y} \rangle = E\mathbf{X}^{\top}\mathbf{Y} = \sum_{i=1}^{n} EX_i Y_i$
and the norm of a random vector $\mathbf{X}$ is given by
$\|\mathbf{X}\|^2 = \langle \mathbf{X}, \mathbf{X} \rangle = E\mathbf{X}^{\top}\mathbf{X} = \sum_{i=1}^{n} EX_i^2$

The set of random variables, together with the inner product defined through the joint expectation and the corresponding norm, defines a Hilbert space.

Schwarz Inequality

For any two vectors $\mathbf{v}$ and $\mathbf{w}$ belonging to a Hilbert space $V$,
$|\langle \mathbf{v}, \mathbf{w} \rangle| \le \|\mathbf{v}\|\,\|\mathbf{w}\|$
This means that for any two random variables $X$ and $Y$,
$|E(XY)| \le \sqrt{EX^2\,EY^2}$
Similarly, for any two random vectors $\mathbf{X}$ and $\mathbf{Y}$,
$|E(\mathbf{X}^{\top}\mathbf{Y})| \le \sqrt{E\mathbf{X}^{\top}\mathbf{X}\,E\mathbf{Y}^{\top}\mathbf{Y}}$

Orthogonal Random Variables and Orthogonal Random Vectors

Two vectors $\mathbf{v}$ and $\mathbf{w}$ are called orthogonal if $\langle \mathbf{v}, \mathbf{w} \rangle = 0$. Accordingly, two random variables $X$ and $Y$ are called orthogonal if $EXY = 0$. Similarly, two random vectors $\mathbf{X}$ and $\mathbf{Y}$ are called orthogonal if
$E\mathbf{X}^{\top}\mathbf{Y} = \sum_{i=1}^{n} EX_i Y_i = 0$
Just like independent random variables and uncorrelated random variables, orthogonal random variables form an important class of random variables.

Remark
If $X$ and $Y$ are uncorrelated, then $E(X - \mu_X)(Y - \mu_Y) = 0$, so $(X - \mu_X)$ is orthogonal to $(Y - \mu_Y)$.
If each of $X$ and $Y$ is of zero mean, then $\mathrm{Cov}(X, Y) = EXY$. In this case, $EXY = 0 \iff \mathrm{Cov}(X, Y) = 0$.
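As a quick numerical illustration, the following sketch approximates the inner product $\langle X, Y \rangle = EXY$ by a sample average and checks the Schwarz inequality and the remark on uncorrelated random variables. It assumes Python with NumPy; the particular random variables used are illustrative choices, not part of these notes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Build two dependent random variables from independent standard normals
# (illustrative choice: Cov(X, Y) = 2, and both have nonzero means).
Z1, Z2 = rng.standard_normal(n), rng.standard_normal(n)
X = 2.0 * Z1 + 1.0
Y = Z1 + 0.5 * Z2 - 3.0

def inner(U, V):
    """Sample estimate of the inner product <U, V> = E[UV]."""
    return np.mean(U * V)

def norm(U):
    """Induced norm ||U|| = sqrt(E[U^2])."""
    return np.sqrt(inner(U, U))

# Schwarz inequality: |E[XY]| <= sqrt(E[X^2] E[Y^2])
print(abs(inner(X, Y)), norm(X) * norm(Y))

# Remark: the centred variables (X - mu_X) and (Y - mu_Y) are orthogonal
# exactly when X and Y are uncorrelated; here Cov(X, Y) = 2, so the
# sample inner product of the centred variables is close to 2, not 0.
print(inner(X - X.mean(), Y - Y.mean()))
```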
Minimum Mean-square-error Estimation

Suppose $X$ is a random variable which is not observable and $Y$ is another observable random variable which is statistically dependent on $X$ through the joint probability density function $f_{X,Y}(x, y)$. We pose the following problem: given a value of $Y$, what is the best guess for $X$? This problem is known as the estimation problem and has many practical applications. One application is signal estimation from noisy observations, as illustrated in the figure below.

[Figure: the signal $X$ is corrupted by additive noise to produce the noisy observation $Y$; the estimator operates on $Y$ to produce the estimated signal $\hat{X}$.]

Let $\hat{X}(Y)$ be the estimate of the random variable $X$ based on the random variable $Y$. Clearly $\hat{X}(Y)$ is a function of $Y$. We have to find the best estimate $\hat{X}(Y)$ in some meaningful sense. Observe that
$X$ is the unknown random variable,
$\hat{X}(Y)$ is the estimate of $X$,
$X - \hat{X}(Y)$ is the estimation error,
$E(X - \hat{X}(Y))^2$ is the mean of the square error.

One meaningful criterion is to minimize $E(X - \hat{X}(Y))^2$ with respect to $\hat{X}(Y)$; the corresponding estimation principle is called the minimum mean-square error (MMSE) principle. Such a function which we want to minimize is called a cost function in optimization theory.

For finding $\hat{X}(Y)$, we have to minimize the cost function
$E(X - \hat{X}(Y))^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \hat{X}(y))^2 f_{X,Y}(x, y)\,dy\,dx$
$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \hat{X}(y))^2 f_Y(y) f_{X/Y}(x/y)\,dy\,dx$
$= \int_{-\infty}^{\infty} f_Y(y)\left(\int_{-\infty}^{\infty} (x - \hat{X}(y))^2 f_{X/Y}(x/y)\,dx\right) dy$

Since $f_Y(y)$ is always positive, the minimization of $E(X - \hat{X}(Y))^2$ with respect to $\hat{X}(Y)$ is equivalent to minimizing the inside integral $\int_{-\infty}^{\infty} (x - \hat{X}(y))^2 f_{X/Y}(x/y)\,dx$ with respect to $\hat{X}(y)$. The condition for the minimum is
$\frac{\partial}{\partial \hat{X}} \int_{-\infty}^{\infty} (x - \hat{X}(y))^2 f_{X/Y}(x/y)\,dx = 0$
or
$-2\int_{-\infty}^{\infty} (x - \hat{X}(y)) f_{X/Y}(x/y)\,dx = 0$
$\Rightarrow \hat{X}(y)\int_{-\infty}^{\infty} f_{X/Y}(x/y)\,dx = \int_{-\infty}^{\infty} x f_{X/Y}(x/y)\,dx$
$\Rightarrow \hat{X}(y) = E(X / Y = y)$

Thus, minimum mean-square error estimation involves the conditional expectation $E(X / Y = y)$. To find $E(X / Y = y)$, we have to determine the a posteriori probability density $f_{X/Y}(x/y)$ and then evaluate $\int_{-\infty}^{\infty} x f_{X/Y}(x/y)\,dx$. These operations are computationally expensive when we have to perform them numerically.

------------------------------------------------------------------------------------------------------
Example
Consider two zero-mean jointly Gaussian random variables $X$ and $Y$ with the joint pdf
$f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1 - \rho_{X,Y}^2}}\, e^{-\frac{1}{2(1 - \rho_{X,Y}^2)}\left[\frac{x^2}{\sigma_X^2} - \frac{2\rho_{X,Y}xy}{\sigma_X\sigma_Y} + \frac{y^2}{\sigma_Y^2}\right]}, \quad -\infty < x < \infty,\ -\infty < y < \infty$

The marginal density $f_Y(y)$ is Gaussian and is given by
$f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma_Y}\, e^{-\frac{y^2}{2\sigma_Y^2}}$

Therefore,
$f_{X/Y}(x/y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = \frac{1}{\sqrt{2\pi}\,\sigma_X\sqrt{1 - \rho_{X,Y}^2}}\, e^{-\frac{\left(x - \frac{\rho_{X,Y}\sigma_X}{\sigma_Y}y\right)^2}{2\sigma_X^2(1 - \rho_{X,Y}^2)}}$

which is Gaussian with mean $\frac{\rho_{X,Y}\sigma_X}{\sigma_Y}y$. Therefore, the MMSE estimator of $X$ given $Y = y$ is given by
$\hat{X}(y) = E(X / Y = y) = \frac{\rho_{X,Y}\sigma_X}{\sigma_Y}y$

This example illustrates that in the case of jointly Gaussian random variables $X$ and $Y$, the minimum mean-square estimator of $X$ given $Y = y$ is linearly related to $y$. This important result gives us a clue to a simpler version of the mean-square error estimation problem, discussed below.
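The closed-form estimator in the example above can be checked numerically. The sketch below, assuming Python with NumPy and illustrative values of $\sigma_X$, $\sigma_Y$ and $\rho_{X,Y}$, draws jointly Gaussian samples and compares the theoretical conditional mean $\rho_{X,Y}(\sigma_X/\sigma_Y)y$ with the empirical average of $X$ over samples whose $Y$ value lies close to $y$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
sigma_x, sigma_y, rho = 2.0, 1.0, 0.8   # illustrative parameter values

# Draw zero-mean jointly Gaussian (X, Y) with the chosen variances and correlation.
cov = [[sigma_x**2,              rho * sigma_x * sigma_y],
       [rho * sigma_x * sigma_y, sigma_y**2]]
X, Y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

# MMSE estimator from the example: E(X / Y = y) = rho * (sigma_x / sigma_y) * y
y0 = 1.5
mmse_estimate = rho * (sigma_x / sigma_y) * y0

# Empirical conditional mean: average X over samples with Y in a thin slab around y0.
near_y0 = np.abs(Y - y0) < 0.02
print(mmse_estimate, X[near_y0].mean())   # the two numbers should agree closely
```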
Linear Minimum Mean-square-error Estimation and the Orthogonality Principle

We assume that $X$ and $Y$ are both of zero mean and $\hat{X}(y) = ay$. The estimation problem is now to find the optimal value of $a$. Thus we have the linear minimum mean-square error criterion, which minimizes $E(X - aY)^2$ with respect to $a$.

[Figure: the signal $X$ is corrupted by additive noise to produce the noisy observation $Y$; the linear estimator operates on $Y$ to produce $\hat{X} = aY$.]

The condition for the minimum is
$\frac{d}{da} E(X - aY)^2 = 0$
$\Rightarrow E(X - aY)Y = 0$
$\Rightarrow EeY = 0$
where $e = X - aY$ is the estimation error.

Thus the optimum value of $a$ is such that the estimation error $(X - aY)$ is orthogonal to the observed random variable $Y$, and the optimal estimator $aY$ is the orthogonal projection of $X$ on $Y$. This orthogonality principle forms the heart of a class of estimation problems called Wiener filtering. The orthogonality principle is illustrated geometrically in the following figure.

[Figure: $X$, its orthogonal projection $aY$ on $Y$, and the estimation error $e = X - aY$, which is orthogonal to $Y$.]

The optimum value of $a$ is given by
$E(X - aY)Y = 0$
$\Rightarrow EXY - aEY^2 = 0$
$\Rightarrow a = \frac{EXY}{EY^2}$

The corresponding linear minimum mean-square error (LMMSE) is
$\mathrm{LMMSE} = E(X - aY)^2$
$= E(X - aY)X - aE(X - aY)Y$
$= E(X - aY)X - 0 \quad (\text{since } E(X - aY)Y = 0, \text{ by the orthogonality principle})$
$= EX^2 - aEXY$

The orthogonality principle can be applied to the optimal estimation of a random variable from more than one observation. We illustrate this in the following example.

Example
Suppose $X$ is a zero-mean random variable which is to be estimated from two zero-mean random variables $Y_1$ and $Y_2$. Let the LMMSE estimator be $\hat{X} = a_1Y_1 + a_2Y_2$. Then the optimal values of $a_1$ and $a_2$ are given by
$\frac{\partial}{\partial a_i} E(X - a_1Y_1 - a_2Y_2)^2 = 0, \quad i = 1, 2$
This results in the orthogonality conditions
$E(X - a_1Y_1 - a_2Y_2)Y_1 = 0$ and $E(X - a_1Y_1 - a_2Y_2)Y_2 = 0$
Rewriting the above equations, we get
$a_1EY_1^2 + a_2EY_2Y_1 = EXY_1$
and
$a_1EY_2Y_1 + a_2EY_2^2 = EXY_2$
Solving these equations, we can find $a_1$ and $a_2$. Further, the corresponding linear minimum mean-square error (LMMSE) is
$\mathrm{LMMSE} = E(X - a_1Y_1 - a_2Y_2)^2$
$= E(X - a_1Y_1 - a_2Y_2)X - a_1E(X - a_1Y_1 - a_2Y_2)Y_1 - a_2E(X - a_1Y_1 - a_2Y_2)Y_2$
$= E(X - a_1Y_1 - a_2Y_2)X - 0 \quad (\text{using the orthogonality principle})$
$= EX^2 - a_1EXY_1 - a_2EXY_2$
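As a numerical companion to the two-observation example, the following sketch solves the pair of orthogonality (normal) equations for $a_1$ and $a_2$ with sample averages in place of expectations, and then evaluates the resulting LMMSE. It assumes Python with NumPy; the signal-plus-noise model generating $Y_1$ and $Y_2$ is an illustrative assumption, not part of these notes.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400_000

# Illustrative zero-mean model: X observed through two noisy measurements.
X = rng.standard_normal(n)
Y1 = X + 0.5 * rng.standard_normal(n)
Y2 = X + 1.0 * rng.standard_normal(n)

# Orthogonality (normal) equations with expectations replaced by sample means:
#   a1 E[Y1^2]  + a2 E[Y2 Y1] = E[X Y1]
#   a1 E[Y2 Y1] + a2 E[Y2^2]  = E[X Y2]
R = np.array([[np.mean(Y1 * Y1), np.mean(Y2 * Y1)],
              [np.mean(Y2 * Y1), np.mean(Y2 * Y2)]])
p = np.array([np.mean(X * Y1), np.mean(X * Y2)])
a1, a2 = np.linalg.solve(R, p)

# LMMSE = E[X^2] - a1 E[X Y1] - a2 E[X Y2]
lmmse = np.mean(X * X) - a1 * p[0] - a2 * p[1]

# Direct check: sample mean-square error of the estimator a1*Y1 + a2*Y2.
print(a1, a2, lmmse, np.mean((X - a1 * Y1 - a2 * Y2) ** 2))
```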