Chapter 5 & 6: Multiple Random Variables
ENCS6161 - Probability and Stochastic Processes, Concordia University

Vector Random Variables

A vector r.v. $X$ is a function $X : S \to \mathbb{R}^n$, where $S$ is the sample space of a random experiment.
Example: randomly pick a student name from a list, so $S = \{\text{all student names on the list}\}$. Let $\omega$ be a given outcome, e.g. Tom, and define
$H(\omega)$: height of student $\omega$, $W(\omega)$: weight of student $\omega$, $A(\omega)$: age of student $\omega$.
Then $H, W, A$ are r.v.s, and $X = (H, W, A)$ is a vector r.v.

Events

Each event involving $X = (X_1, X_2, \cdots, X_n)$ has a corresponding region in $\mathbb{R}^n$.
Example: let $X = (X_1, X_2)$ be a two-dimensional r.v. Then
$A = \{X_1 + X_2 \le 10\}$, $B = \{\min(X_1, X_2) \le 5\}$, $C = \{X_1^2 + X_2^2 \le 100\}$
are events with corresponding regions in $\mathbb{R}^2$.

Pairs of Random Variables

Pairs of discrete random variables: the joint probability mass function is
$P_{X,Y}(x_j, y_k) = P\{X = x_j \cap Y = y_k\} = P\{X = x_j, Y = y_k\}$.
Obviously $\sum_j \sum_k P_{X,Y}(x_j, y_k) = 1$.
Marginal probability mass function:
$P_X(x_j) = P\{X = x_j\} = P\{X = x_j, Y = \text{anything}\} = \sum_{k=1}^{\infty} P_{X,Y}(x_j, y_k)$.
Similarly, $P_Y(y_k) = \sum_{j=1}^{\infty} P_{X,Y}(x_j, y_k)$.

The joint CDF of $X$ and $Y$ (for both discrete and continuous r.v.s) is
$F_{X,Y}(x, y) = P\{X \le x, Y \le y\}$,
i.e. the probability of the semi-infinite region below and to the left of the point $(x, y)$.

Properties of the joint CDF:
1. $F_{X,Y}(x_1, y_1) \le F_{X,Y}(x_2, y_2)$, if $x_1 \le x_2$, $y_1 \le y_2$.
2. $F_{X,Y}(-\infty, y) = F_{X,Y}(x, -\infty) = 0$
3. $F_{X,Y}(\infty, \infty) = 1$
4. $F_X(x) = P\{X \le x\} = P\{X \le x, Y = \text{anything}\} = P\{X \le x, Y \le \infty\} = F_{X,Y}(x, \infty)$, and $F_Y(y) = F_{X,Y}(\infty, y)$.
   $F_X(x)$ and $F_Y(y)$ are the marginal CDFs.

The joint pdf of two jointly continuous r.v.s is
$f_{X,Y}(x, y) = \frac{\partial^2}{\partial x \partial y} F_{X,Y}(x, y)$.
Obviously,
$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx\,dy = 1$
and
$F_{X,Y}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{X,Y}(x', y')\,dy'\,dx'$.

Probabilities from the joint pdf:
$P\{a \le X \le b, c \le Y \le d\} = \int_a^b \int_c^d f_{X,Y}(x, y)\,dy\,dx$
In general, $P\{(X, Y) \in A\} = \iint_A f_{X,Y}(x, y)\,dx\,dy$.
(Example in the figure: for the shaded region $A$ inside the unit square bounded by $0 \le y' \le x' \le 1$, this becomes $\int_0^1 \int_0^{x'} f_{X,Y}(x', y')\,dy'\,dx'$.)

Marginal pdf:
$f_X(x) = \frac{d}{dx} F_X(x) = \frac{d}{dx} F_{X,Y}(x, \infty) = \frac{d}{dx} \int_{-\infty}^{x} \int_{-\infty}^{\infty} f_{X,Y}(x', y')\,dy'\,dx' = \int_{-\infty}^{\infty} f_{X,Y}(x, y')\,dy'$
$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x', y)\,dx'$
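The discrete pmf relations at the start of this section are easy to verify numerically. The following is a minimal Python sketch of my own (not part of the slides), assuming numpy; the 3x4 joint pmf table is made up for illustration. It checks that the joint pmf sums to 1 and that the marginal pmfs are obtained by summing over the other variable.

```python
import numpy as np

# Hypothetical joint pmf P_{X,Y}(x_j, y_k) on a 3x4 grid (rows = x_j, columns = y_k);
# the values are illustrative and sum to 1.
P = np.array([[0.10, 0.05, 0.05, 0.10],
              [0.05, 0.15, 0.10, 0.05],
              [0.10, 0.05, 0.10, 0.10]])
assert np.isclose(P.sum(), 1.0)      # sum_j sum_k P_{X,Y}(x_j, y_k) = 1

P_X = P.sum(axis=1)                  # marginal pmf of X: sum over all y_k
P_Y = P.sum(axis=0)                  # marginal pmf of Y: sum over all x_j
print(P_X, P_X.sum())                # each marginal also sums to 1
print(P_Y, P_Y.sum())
```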
Example: let
$f_{X,Y}(x, y) = 1$ if $0 \le x \le 1$, $0 \le y \le 1$, and $0$ otherwise.
Find $F_{X,Y}(x, y)$.
1) $x \le 0$ or $y \le 0$: $F_{X,Y}(x, y) = 0$
2) $0 \le x \le 1$ and $0 \le y \le 1$: $F_{X,Y}(x, y) = \int_0^x \int_0^y 1\,dy'\,dx' = xy$
3) $0 \le x \le 1$ and $y > 1$: $F_{X,Y}(x, y) = \int_0^x \int_0^1 1\,dy'\,dx' = x$
4) $x > 1$ and $0 \le y \le 1$: $F_{X,Y}(x, y) = y$
5) $x > 1$ and $y > 1$: $F_{X,Y}(x, y) = 1$

Independence

$X$ and $Y$ are independent iff
$P_{X,Y}(x_j, y_k) = P_X(x_j) P_Y(y_k)$ for all $x_j$ and $y_k$ (discrete r.v.s),
or $F_{X,Y}(x, y) = F_X(x) F_Y(y)$ for all $x$ and $y$,
or $f_{X,Y}(x, y) = f_X(x) f_Y(y)$ for all $x$ and $y$.
Example:
a) $f_{X,Y}(x, y) = 1$ for $0 \le x \le 1$, $0 \le y \le 1$, and $0$ otherwise: here $f_{X,Y}(x, y) = f_X(x) f_Y(y)$, so $X$ and $Y$ are independent.
b) $f_{X,Y}(x, y) = 1$ for $0 \le x \le \sqrt{2}$, $0 \le y \le x$, and $0$ otherwise: the support is a triangle, not a product set, so $X$ and $Y$ are not independent.

Conditional Probability

If $X$ is discrete, for $P\{X = x\} > 0$,
$F_Y(y \mid x) = \frac{P\{Y \le y, X = x\}}{P\{X = x\}}$, and $f_Y(y \mid x) = \frac{d}{dy} F_Y(y \mid x)$.
If $X$ is continuous, $P\{X = x\} = 0$, so instead consider
$F_Y(y \mid x) = \lim_{h \to 0} F_Y(y \mid x < X \le x + h) = \lim_{h \to 0} \frac{P\{Y \le y, x < X \le x + h\}}{P\{x < X \le x + h\}}$
$= \lim_{h \to 0} \frac{\int_{-\infty}^{y} \int_x^{x+h} f_{X,Y}(x', y')\,dx'\,dy'}{\int_x^{x+h} f_X(x')\,dx'} = \frac{\int_{-\infty}^{y} f_{X,Y}(x, y')\,dy' \cdot h}{f_X(x) \cdot h} = \frac{\int_{-\infty}^{y} f_{X,Y}(x, y')\,dy'}{f_X(x)}$
$f_Y(y \mid x) = \frac{d}{dy} F_Y(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)}$

If $X$ and $Y$ are independent, $f_Y(y \mid x) = f_Y(y)$.
Similarly, $f_X(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}$.
So $f_{X,Y}(x, y) = f_X(x) f_Y(y \mid x) = f_Y(y) f_X(x \mid y)$.
Bayes' rule: $f_X(x \mid y) = \frac{f_X(x) f_Y(y \mid x)}{f_Y(y)}$.

Example: a r.v. $X$ is selected uniformly in $[0, 1]$, and then $Y$ is selected uniformly in $[0, x]$. Find $f_Y(y)$.
Solution: $f_{X,Y}(x, y) = f_X(x) f_Y(y \mid x) = 1 \cdot \frac{1}{x} = \frac{1}{x}$ for $0 \le x \le 1$, $0 \le y \le x$, and $0$ elsewhere.
$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx = \int_y^1 \frac{1}{x}\,dx = -\ln y$
for $0 < y \le 1$, and $f_Y(y) = 0$ elsewhere.
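As a quick sanity check of this result (my addition, not in the slides), here is a small Monte Carlo sketch in Python, assuming numpy: it draws $X$ uniformly on $[0, 1]$, then $Y$ uniformly on $[0, X]$, and compares an empirical histogram of $Y$ against the derived density $f_Y(y) = -\ln y$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.uniform(0.0, 1.0, size=n)    # X ~ Uniform(0, 1)
y = rng.uniform(0.0, x)              # Y | X = x ~ Uniform(0, x), drawn per sample

# Empirical density of Y vs. the derived f_Y(y) = -ln(y) on (0, 1).
hist, edges = np.histogram(y, bins=np.linspace(0.01, 0.99, 50), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
for c, h in zip(centers[::12], hist[::12]):
    print(f"y = {c:.2f}: empirical {h:.3f}, -ln(y) = {-np.log(c):.3f}")
```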
Example (MAP detection): a bit $X \in \{0, 1\}$ is sent over a channel with conditional density $f_Y(y \mid x)$, producing the received value $Y \in \mathbb{R}$, and a detector outputs the decision $\hat{X}$.
$X = 0 \rightarrow -A$ (volts), with $P\{X = 0\} = p_0$
$X = 1 \rightarrow +A$ (volts), with $P\{X = 1\} = p_1 = 1 - p_0$
Decide $\hat{X} = 0$ if $P\{X = 0 \mid y\} \ge P\{X = 1 \mid y\}$; decide $\hat{X} = 1$ if $P\{X = 0 \mid y\} < P\{X = 1 \mid y\}$.
This is called maximum a posteriori probability (MAP) detection.

Binary communication over an Additive White Gaussian Noise (AWGN) channel:
$f_Y(y \mid 0) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(y+A)^2}{2\sigma^2}}$, $f_Y(y \mid 1) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(y-A)^2}{2\sigma^2}}$
To apply MAP detection, we need $P\{X = 0 \mid y\}$ and $P\{X = 1 \mid y\}$. Note that here $X$ is discrete and $Y$ is continuous.
Using the same approach as before (considering $x < X \le x + h$ and letting $h \to 0$), we have
$P\{X = 0 \mid y\} = \frac{P\{X = 0\} f_Y(y \mid 0)}{f_Y(y)}$, $P\{X = 1 \mid y\} = \frac{P\{X = 1\} f_Y(y \mid 1)}{f_Y(y)}$
Decide $\hat{X} = 0$ if $P\{X = 0 \mid y\} \ge P\{X = 1 \mid y\} \Rightarrow y \le \frac{\sigma^2}{2A} \ln \frac{p_0}{p_1}$
Decide $\hat{X} = 1$ if $P\{X = 0 \mid y\} < P\{X = 1 \mid y\} \Rightarrow y > \frac{\sigma^2}{2A} \ln \frac{p_0}{p_1}$
When $p_0 = p_1 = \frac{1}{2}$: decide $\hat{X} = 0$ if $y \le 0$, and $\hat{X} = 1$ if $y > 0$.

Probability of error, considering the special case $p_0 = p_1 = \frac{1}{2}$:
$P_\varepsilon = p_0 P\{\hat{X} = 1 \mid X = 0\} + p_1 P\{\hat{X} = 0 \mid X = 1\} = p_0 P\{Y > 0 \mid X = 0\} + p_1 P\{Y \le 0 \mid X = 1\}$
$P\{Y > 0 \mid X = 0\} = \int_0^{\infty} \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(y+A)^2}{2\sigma^2}}\,dy = Q\!\left(\frac{A}{\sigma}\right)$
Similarly, $P\{Y \le 0 \mid X = 1\} = Q\!\left(\frac{A}{\sigma}\right)$, and therefore $P_\varepsilon = Q\!\left(\frac{A}{\sigma}\right)$.
As $A \uparrow$, $P_\varepsilon \downarrow$; as $\sigma \uparrow$, $P_\varepsilon \uparrow$.

Conditional Expectation

The conditional expectation is
$E[Y \mid x] = \int_{-\infty}^{\infty} y f_Y(y \mid x)\,dy$,
and in the discrete case, $E[Y \mid x] = \sum_{y_i} y_i P_Y(y_i \mid x)$.
An important fact: $E[Y] = E[E[Y \mid X]]$.
Proof:
$E[E[Y \mid X]] = \int_{-\infty}^{\infty} E[Y \mid x] f_X(x)\,dx = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y f_Y(y \mid x) f_X(x)\,dy\,dx = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y f_{X,Y}(x, y)\,dy\,dx = E[Y]$
In general: $E[h(Y)] = E[E[h(Y) \mid X]]$.

Multiple Random Variables

Joint CDF: $F_{X_1 \cdots X_n}(x_1, \cdots, x_n) = P[X_1 \le x_1, \cdots, X_n \le x_n]$
Joint pdf: $f_{X_1 \cdots X_n}(x_1, \cdots, x_n) = \frac{\partial^n}{\partial x_1 \cdots \partial x_n} F_{X_1 \cdots X_n}(x_1, \cdots, x_n)$
If discrete, joint pmf: $P_{X_1 \cdots X_n}(x_1, \cdots, x_n) = P[X_1 = x_1, \cdots, X_n = x_n]$
Marginal pdf: $f_{X_i}(x_i) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \cdots, x_n)\,dx_1 \cdots dx_n$, integrating over all of $x_1, \cdots, x_n$ except $x_i$.

Independence

$X_1, \cdots, X_n$ are independent iff
$F_{X_1 \cdots X_n}(x_1, \cdots, x_n) = F_{X_1}(x_1) \cdots F_{X_n}(x_n)$ for all $x_1, \cdots, x_n$,
or, in terms of pdfs, $f_{X_1 \cdots X_n}(x_1, \cdots, x_n) = f_{X_1}(x_1) \cdots f_{X_n}(x_n)$ for all $x_1, \cdots, x_n$.

Functions of Several r.v.s

One function of several r.v.s: $Z = g(X_1, \cdots, X_n)$. Let $R_z = \{x = (x_1, \cdots, x_n) \text{ s.t. } g(x) \le z\}$; then
$F_Z(z) = P\{X \in R_z\} = \int \cdots \int_{x \in R_z} f_{X_1, \cdots, X_n}(x_1, \cdots, x_n)\,dx_1 \cdots dx_n$

Example: $Z = X + Y$. Find $F_Z(z)$ and $f_Z(z)$ in terms of $f_{X,Y}(x, y)$.
$Z = X + Y \le z \Rightarrow Y \le z - X$, i.e. the region below the line $y = z - x$, so
$F_Z(z) = \int_{-\infty}^{\infty} \int_{-\infty}^{z-x} f_{X,Y}(x, y)\,dy\,dx$
$f_Z(z) = \frac{d}{dz} F_Z(z) = \int_{-\infty}^{\infty} f_{X,Y}(x, z - x)\,dx$
If $X$ and $Y$ are independent, $f_Z(z) = \int_{-\infty}^{\infty} f_X(x) f_Y(z - x)\,dx$.

Example: let $Z = X/Y$. Find the pdf of $Z$ if $X$ and $Y$ are independent and both exponentially distributed with mean one.
We could use the same approach as in the previous example, but it is complicated. Instead, fix $Y = y$; then $Z = X/y$ and $f_Z(z \mid y) = |y| f_X(yz \mid y)$. So
$f_Z(z) = \int_{-\infty}^{\infty} f_{Z,Y}(z, y)\,dy = \int_{-\infty}^{\infty} f_Z(z \mid y) f_Y(y)\,dy = \int_{-\infty}^{\infty} |y| f_X(yz \mid y) f_Y(y)\,dy = \int_{-\infty}^{\infty} |y| f_{X,Y}(yz, y)\,dy$
Since $X$ and $Y$ are independent and exponentially distributed,
$f_Z(z) = \int_0^{\infty} y f_X(yz) f_Y(y)\,dy = \int_0^{\infty} y e^{-yz} e^{-y}\,dy = \frac{1}{(1+z)^2}$ for $z > 0$.

Transformation of Random Vectors

$Z_1 = g_1(X_1, \cdots, X_n)$, $Z_2 = g_2(X_1, \cdots, X_n)$, $\cdots$, $Z_n = g_n(X_1, \cdots, X_n)$
The joint CDF of $Z$ is
$F_{Z_1 \cdots Z_n}(z_1, \cdots, z_n) = P\{Z_1 \le z_1, \cdots, Z_n \le z_n\} = \int \cdots \int_{x : g_k(x) \le z_k} f_{X_1 \cdots X_n}(x_1, \cdots, x_n)\,dx_1 \cdots dx_n$

pdf of Linear Transformations

If $Z = AX$, where $A$ is an $n \times n$ invertible matrix, then
$f_Z(z) = f_{Z_1 \cdots Z_n}(z_1, \cdots, z_n) = \left.\frac{f_{X_1 \cdots X_n}(x_1, \cdots, x_n)}{|A|}\right|_{x = A^{-1}z} = \frac{f_X(A^{-1}z)}{|A|}$
where $|A|$ is the absolute value of the determinant of $A$; e.g. if $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, then $|A| = |ad - bc|$.
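To illustrate the ratio-of-exponentials result above, here is a simulation sketch of my own (not part of the slides), assuming numpy: it draws $Z = X/Y$ for independent Exp(1) r.v.s and compares the empirical CDF with the CDF implied by $f_Z(z) = 1/(1+z)^2$, namely $F_Z(z) = z/(1+z)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000_000
x = rng.exponential(scale=1.0, size=n)   # X ~ Exp(1), mean one
y = rng.exponential(scale=1.0, size=n)   # Y ~ Exp(1), independent of X
z = x / y                                # Z = X / Y

# f_Z(z) = 1/(1+z)^2 for z > 0 integrates to F_Z(z) = z/(1+z).
for z0 in (0.5, 1.0, 2.0, 5.0):
    print(z0, (z <= z0).mean(), z0 / (1.0 + z0))
```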
pdf of General Transformations

$Z_1 = g_1(X), Z_2 = g_2(X), \cdots, Z_n = g_n(X)$, where $X = (X_1, \cdots, X_n)$.
We assume that the set of equations $z_1 = g_1(x), \cdots, z_n = g_n(x)$ has a unique solution given by $x_1 = h_1(z), \cdots, x_n = h_n(z)$.
The joint pdf of $Z$ is given by
$f_{Z_1 \cdots Z_n}(z_1, \cdots, z_n) = \frac{f_{X_1 \cdots X_n}(h_1(z), \cdots, h_n(z))}{|J(x_1, \cdots, x_n)|} = f_{X_1 \cdots X_n}(h_1(z), \cdots, h_n(z))\,|J(z_1, \cdots, z_n)| \quad (*)$
where $J(x_1, \cdots, x_n)$ is called the Jacobian of the transformation:
$J(x_1, \cdots, x_n) = \det \begin{pmatrix} \frac{\partial g_1}{\partial x_1} & \cdots & \frac{\partial g_1}{\partial x_n} \\ \cdots & \cdots & \cdots \\ \frac{\partial g_n}{\partial x_1} & \cdots & \frac{\partial g_n}{\partial x_n} \end{pmatrix}$
and
$J(z_1, \cdots, z_n) = \det \begin{pmatrix} \frac{\partial h_1}{\partial z_1} & \cdots & \frac{\partial h_1}{\partial z_n} \\ \cdots & \cdots & \cdots \\ \frac{\partial h_n}{\partial z_1} & \cdots & \frac{\partial h_n}{\partial z_n} \end{pmatrix} = \left.\frac{1}{J(x_1, \cdots, x_n)}\right|_{x = h(z)}$
A linear transformation is a special case of $(*)$.

Example: let $X$ and $Y$ be zero-mean, unit-variance, independent Gaussian r.v.s. Find the joint pdf of $V$ and $W$ defined by
$V = (X^2 + Y^2)^{1/2}$, $W = \angle(X, Y) = \arctan(Y/X)$, $W \in [0, 2\pi)$.
This is a transformation from Cartesian to polar coordinates. The inverse transformation is
$x = v\cos w$, $y = v\sin w$.
The Jacobian (of the inverse transformation) is
$J(v, w) = \det \begin{pmatrix} \cos w & -v\sin w \\ \sin w & v\cos w \end{pmatrix} = v\cos^2 w + v\sin^2 w = v$
Since $X$ and $Y$ are zero-mean, unit-variance, independent Gaussian r.v.s,
$f_{X,Y}(x, y) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \cdot \frac{1}{\sqrt{2\pi}} e^{-\frac{y^2}{2}} = \frac{1}{2\pi} e^{-\frac{x^2+y^2}{2}}$
The joint pdf of $V, W$ is then
$f_{V,W}(v, w) = v \cdot \frac{1}{2\pi} e^{-\frac{1}{2}(v^2\cos^2 w + v^2\sin^2 w)} = \frac{v}{2\pi} e^{-\frac{v^2}{2}}$ for $v \ge 0$ and $0 \le w < 2\pi$.
The marginal pdfs of $V$ and $W$:
$f_V(v) = \int_0^{2\pi} f_{V,W}(v, w)\,dw = \int_0^{2\pi} \frac{v}{2\pi} e^{-\frac{v^2}{2}}\,dw = v e^{-\frac{v^2}{2}}$ for $v \ge 0$.
This is called the Rayleigh distribution.
$f_W(w) = \int_{-\infty}^{\infty} f_{V,W}(v, w)\,dv = \frac{1}{2\pi} \int_0^{\infty} v e^{-\frac{v^2}{2}}\,dv = \frac{1}{2\pi}$ for $0 \le w < 2\pi$.
Since $f_{V,W}(v, w) = f_V(v) f_W(w)$, $V$ and $W$ are independent.

Expected Value of Functions of r.v.s

Let $Z = g(X_1, X_2, \cdots, X_n)$; then
$E[Z] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, \cdots, x_n) f_{X_1 \cdots X_n}(x_1, \cdots, x_n)\,dx_1 \cdots dx_n$
For the discrete case,
$E[Z] = \sum \cdots \sum_{\text{all possible } x} g(x_1, \cdots, x_n) P_{X_1 \cdots X_n}(x_1, \cdots, x_n)$
Example: $Z = X_1 + X_2 + \cdots + X_n$.
$E[Z] = E[X_1 + X_2 + \cdots + X_n] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} (x_1 + \cdots + x_n) f_{X_1 \cdots X_n}(x_1, \cdots, x_n)\,dx_1 \cdots dx_n = E[X_1] + \cdots + E[X_n]$
Example: $Z = X_1 X_2 \cdots X_n$.
$E[Z] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x_1 \cdots x_n f_{X_1 \cdots X_n}(x_1, \cdots, x_n)\,dx_1 \cdots dx_n$
If $X_1, X_2, \cdots, X_n$ are independent, $E[Z] = E[X_1 X_2 \cdots X_n] = E[X_1] E[X_2] \cdots E[X_n]$.

The $(j, k)$-th moment of two r.v.s $X$ and $Y$ is
$E[X^j Y^k] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x^j y^k f_{X,Y}(x, y)\,dx\,dy$
If $j = k = 1$, it is called the correlation: $E[XY] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x y f_{X,Y}(x, y)\,dx\,dy$.
If $E[XY] = 0$, we say $X$ and $Y$ are orthogonal.

The $(j, k)$-th central moment of $X, Y$ is
$E\left[(X - E[X])^j (Y - E[Y])^k\right]$
When $j = 2, k = 0 \Rightarrow \mathrm{Var}(X)$; when $j = 0, k = 2 \Rightarrow \mathrm{Var}(Y)$.
When $j = k = 1$, it is called the covariance of $X, Y$:
$\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y] = \mathrm{Cov}(Y, X)$
The correlation coefficient of $X$ and $Y$ is defined as
$\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$, where $\sigma_X = \sqrt{\mathrm{Var}(X)}$ and $\sigma_Y = \sqrt{\mathrm{Var}(Y)}$.
The correlation coefficient satisfies $-1 \le \rho_{X,Y} \le 1$.
Proof:
$0 \le E\left[\left(\frac{X - E[X]}{\sigma_X} \pm \frac{Y - E[Y]}{\sigma_Y}\right)^2\right] = 1 \pm 2\rho_{X,Y} + 1 = 2(1 \pm \rho_{X,Y})$
If $\rho_{X,Y} = 0$, $X$ and $Y$ are said to be uncorrelated.
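A small simulation sketch of the Cartesian-to-polar example (my addition, assuming numpy): $V$ should follow the Rayleigh distribution (mean $\sqrt{\pi/2}$), $W$ should be uniform on $[0, 2\pi)$ (mean $\pi$), and the two should be uncorrelated, consistent with the independence shown above.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
x = rng.standard_normal(n)                # X ~ N(0, 1)
y = rng.standard_normal(n)                # Y ~ N(0, 1), independent of X
v = np.hypot(x, y)                        # V = sqrt(X^2 + Y^2)
w = np.mod(np.arctan2(y, x), 2 * np.pi)   # W = angle(X, Y) in [0, 2*pi)

print(v.mean(), np.sqrt(np.pi / 2))       # Rayleigh mean
print(w.mean(), np.pi)                    # mean of Uniform[0, 2*pi)
# Zero correlation is necessary (though not sufficient) for independence.
print(np.corrcoef(v, w)[0, 1])            # approximately 0
```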
If $X$ and $Y$ are independent, $E[XY] = E[X]E[Y] \Rightarrow \mathrm{Cov}(X, Y) = 0 \Rightarrow \rho_{X,Y} = 0$. Hence independent r.v.s are uncorrelated. The converse is not always true; it is true in the case of Gaussian r.v.s (to be discussed later).

Example: $\theta$ is uniform in $[0, 2\pi)$. Let $X = \cos\theta$ and $Y = \sin\theta$.
$X$ and $Y$ are not independent, since $X^2 + Y^2 = 1$. However,
$E[XY] = E[\sin\theta \cos\theta] = E\left[\tfrac{1}{2}\sin(2\theta)\right] = \frac{1}{2\pi} \int_0^{2\pi} \tfrac{1}{2}\sin(2\theta)\,d\theta = 0$
We can also show $E[X] = E[Y] = 0$. So $\mathrm{Cov}(X, Y) = E[XY] - E[X]E[Y] = 0 \Rightarrow \rho_{X,Y} = 0$.
Thus $X$ and $Y$ are uncorrelated but not independent.

Joint Characteristic Function

$\Phi_{X_1 \cdots X_n}(\omega_1, \cdots, \omega_n) = E\left[e^{j(\omega_1 X_1 + \cdots + \omega_n X_n)}\right]$
For two variables: $\Phi_{X,Y}(\omega_1, \omega_2) = E\left[e^{j(\omega_1 X + \omega_2 Y)}\right]$
Marginal characteristic functions: $\Phi_X(\omega) = \Phi_{X,Y}(\omega, 0)$, $\Phi_Y(\omega) = \Phi_{X,Y}(0, \omega)$
If $X$ and $Y$ are independent: $\Phi_{X,Y}(\omega_1, \omega_2) = E[e^{j\omega_1 X + j\omega_2 Y}] = E[e^{j\omega_1 X}] E[e^{j\omega_2 Y}] = \Phi_X(\omega_1)\Phi_Y(\omega_2)$
If $Z = aX + bY$: $\Phi_Z(\omega) = E[e^{j\omega(aX + bY)}] = \Phi_{X,Y}(a\omega, b\omega)$
If $Z = X + Y$ with $X$ and $Y$ independent: $\Phi_Z(\omega) = \Phi_{X,Y}(\omega, \omega) = \Phi_X(\omega)\Phi_Y(\omega)$

Jointly Gaussian Random Variables

Consider a vector of random variables $X = (X_1, X_2, \cdots, X_n)$, each with mean $m_i = E[X_i]$ for $i = 1, \cdots, n$, and the covariance matrix
$K = \begin{pmatrix} \mathrm{Var}(X_1) & \mathrm{Cov}(X_1, X_2) & \cdots & \mathrm{Cov}(X_1, X_n) \\ \vdots & \vdots & \cdots & \vdots \\ \mathrm{Cov}(X_n, X_1) & \cdots & \cdots & \mathrm{Var}(X_n) \end{pmatrix}$
Let $m = [m_1, \cdots, m_n]^T$ be the mean vector and $x = [x_1, \cdots, x_n]^T$, where $(\cdot)^T$ denotes transpose. Then $X_1, X_2, \cdots, X_n$ are said to be jointly Gaussian if their joint pdf is
$f_X(x) = \frac{1}{(2\pi)^{n/2} |K|^{1/2}} \exp\left\{-\frac{1}{2}(x - m)^T K^{-1} (x - m)\right\}$
where $|K|$ is the determinant of $K$.

For $n = 1$, let $\mathrm{Var}(X_1) = \sigma^2$; then $f_X(x) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x-m)^2}{2\sigma^2}}$.
For $n = 2$, denote the r.v.s by $X$ and $Y$, and let
$m = \begin{pmatrix} m_X \\ m_Y \end{pmatrix}$, $K = \begin{pmatrix} \sigma_X^2 & \rho_{XY}\sigma_X\sigma_Y \\ \rho_{XY}\sigma_X\sigma_Y & \sigma_Y^2 \end{pmatrix}$
Then $|K| = \sigma_X^2 \sigma_Y^2 (1 - \rho_{XY}^2)$,
$K^{-1} = \frac{1}{\sigma_X^2 \sigma_Y^2 (1 - \rho_{XY}^2)} \begin{pmatrix} \sigma_Y^2 & -\rho_{XY}\sigma_X\sigma_Y \\ -\rho_{XY}\sigma_X\sigma_Y & \sigma_X^2 \end{pmatrix}$
and
$f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1 - \rho_{XY}^2}} \exp\left\{-\frac{1}{2(1 - \rho_{XY}^2)}\left[\left(\frac{x - m_X}{\sigma_X}\right)^2 - 2\rho_{XY}\left(\frac{x - m_X}{\sigma_X}\right)\left(\frac{y - m_Y}{\sigma_Y}\right) + \left(\frac{y - m_Y}{\sigma_Y}\right)^2\right]\right\}$

The marginal pdf of $X$: $f_X(x) = \frac{1}{\sqrt{2\pi}\sigma_X} e^{-\frac{(x - m_X)^2}{2\sigma_X^2}}$
The marginal pdf of $Y$: $f_Y(y) = \frac{1}{\sqrt{2\pi}\sigma_Y} e^{-\frac{(y - m_Y)^2}{2\sigma_Y^2}}$
If $\rho_{XY} = 0 \Rightarrow X, Y$ are independent.
The conditional pdf:
$f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = \frac{1}{\sqrt{2\pi\sigma_X^2(1 - \rho_{XY}^2)}} \exp\left\{-\frac{1}{2(1 - \rho_{XY}^2)\sigma_X^2}\left[x - \rho_{XY}\frac{\sigma_X}{\sigma_Y}(y - m_Y) - m_X\right]^2\right\}$
$\sim N\left(\rho_{XY}\frac{\sigma_X}{\sigma_Y}(y - m_Y) + m_X,\ \sigma_X^2(1 - \rho_{XY}^2)\right)$
so $E[X \mid Y] = \rho_{XY}\frac{\sigma_X}{\sigma_Y}(Y - m_Y) + m_X$ and $\mathrm{Var}(X \mid Y) = \sigma_X^2(1 - \rho_{XY}^2)$.

Linear Transformation of Gaussian r.v.s

Let $X \sim N(m, K)$ and $Y = AX$; then $Y \sim N(\hat{m}, C)$, where $\hat{m} = Am$ and $C = AKA^T$.
Proof:
$f_Y(y) = \frac{f_X(A^{-1}y)}{|A|} = \frac{1}{(2\pi)^{n/2}|A||K|^{1/2}} \exp\left\{-\frac{1}{2}(A^{-1}y - m)^T K^{-1} (A^{-1}y - m)\right\}$
Note that $A^{-1}y - m = A^{-1}(y - Am) = A^{-1}(y - \hat{m})$, so
$(A^{-1}y - m)^T K^{-1} (A^{-1}y - m) = (y - \hat{m})^T (A^{-1})^T K^{-1} A^{-1} (y - \hat{m}) = (y - \hat{m})^T C^{-1} (y - \hat{m})$
and $|A||K|^{1/2} = (|A|^2 |K|)^{1/2} = |AKA^T|^{1/2} = |C|^{1/2}$. Therefore
$f_Y(y) = \frac{1}{(2\pi)^{n/2}|C|^{1/2}} \exp\left\{-\frac{1}{2}(y - \hat{m})^T C^{-1} (y - \hat{m})\right\} \sim N(\hat{m}, C)$

Since $K$ is symmetric, it is always possible to find a matrix $A$ s.t. $\Lambda = AKA^T$ is diagonal:
$\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \cdots, \lambda_n)$
so
$f_Y(y) = \frac{1}{(2\pi)^{n/2}|\Lambda|^{1/2}} e^{-\frac{1}{2}(y - \hat{m})^T \Lambda^{-1} (y - \hat{m})} = \frac{1}{\sqrt{2\pi\lambda_1}} e^{-\frac{(y_1 - \hat{m}_1)^2}{2\lambda_1}} \cdots \frac{1}{\sqrt{2\pi\lambda_n}} e^{-\frac{(y_n - \hat{m}_n)^2}{2\lambda_n}}$
That is, we can transform $X$ into $n$ independent Gaussian r.v.s $Y_1, \cdots, Y_n$ with means $\hat{m}_i$ and variances $\lambda_i$.
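The diagonalization step can be made concrete with an eigendecomposition. The following is a sketch of my own (the 2x2 mean and covariance are made-up illustrative values), assuming numpy: with $K = U \Lambda U^T$ and $A = U^T$, the transformed samples $Y = AX$ have an approximately diagonal sample covariance.

```python
import numpy as np

rng = np.random.default_rng(3)

m = np.array([1.0, -2.0])                 # illustrative mean vector
K = np.array([[2.0, 1.2],                 # illustrative covariance matrix (symmetric, positive definite)
              [1.2, 1.0]])

lam, U = np.linalg.eigh(K)                # K = U diag(lam) U^T
A = U.T                                   # then A K A^T = diag(lam)

X = rng.multivariate_normal(m, K, size=500_000)   # samples of X ~ N(m, K)
Y = X @ A.T                                       # Y = A X, applied row-wise

print(np.cov(Y.T))                        # approximately diag(lam): uncorrelated components
print(lam)                                # the variances lambda_i
print(A @ m)                              # mean of Y, i.e. m_hat = A m
```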
Mean Square Estimation

We use $g(X)$ to estimate $Y$, written as $\hat{Y} = g(X)$. The cost associated with the estimation error is $C(Y - \hat{Y})$, e.g. $C(Y - \hat{Y}) = (Y - \hat{Y})^2 = (Y - g(X))^2$.
The mean square error is $E[C] = E[(Y - g(X))^2]$.

Case 1: $\hat{Y} = a$ (a constant).
$E[(Y - a)^2] = E[Y^2] - 2aE[Y] + a^2 = f(a)$
$\frac{df}{da} = 0 \Rightarrow a^* = E[Y] = \mu_Y$
The mean square error: $E[(Y - \mu_Y)^2] = \mathrm{Var}[Y]$.

Case 2: $\hat{Y} = aX + b$. Then $E[(Y - aX - b)^2] = f(a, b)$ and
$\frac{\partial f}{\partial a} = 0,\ \frac{\partial f}{\partial b} = 0 \Rightarrow a^* = \rho_{XY}\frac{\sigma_Y}{\sigma_X},\ b^* = \mu_Y - a^*\mu_X$
$\Rightarrow \hat{Y} = \rho_{XY}\frac{\sigma_Y}{\sigma_X}(X - \mu_X) + \mu_Y$
This is called the MMSE linear estimation. The mean square error is $E[(Y - \hat{Y})^2] = \sigma_Y^2(1 - \rho_{XY}^2)$.
If $\rho_{XY} = 0$: $\hat{Y} = \mu_Y$ and the error is $\sigma_Y^2$, which reduces to case 1.
If $\rho_{XY} = \pm 1$: $\hat{Y} = \pm\frac{\sigma_Y}{\sigma_X}(X - \mu_X) + \mu_Y$ and the error is $0$.

Case 3: $\hat{Y}$ is a general function of $X$.
$E[(Y - \hat{Y})^2] = E[(Y - g(X))^2] = E[E[(Y - g(X))^2 \mid X]] = \int_{-\infty}^{\infty} E[(Y - g(x))^2 \mid x] f_X(x)\,dx$
For any $x$, choose $g(x)$ to minimize $E[(Y - g(x))^2 \mid X = x]$:
$\Rightarrow g^*(x) = E[Y \mid X = x]$
This is called the MMSE estimation.
Example: $X$, $Y$ jointly Gaussian.
$E[Y \mid X] = \rho_{X,Y}\frac{\sigma_Y}{\sigma_X}(X - \mu_X) + \mu_Y$
The MMSE estimation is linear for jointly Gaussian r.v.s.

Example: $X \sim \mathrm{uniform}(-1, 1)$, $Y = X^2$. We have $E[X] = 0$ and
$\mathrm{Cov}(X, Y) = E[XY] - E[X]E[Y] = E[XY] = E[X^3] = 0 \Rightarrow \rho_{XY} = 0$
So the MMSE linear estimation is $\hat{Y} = \mu_Y$ and the error is $\sigma_Y^2$.
The MMSE estimation: $g^*(x) = E[Y \mid X = x] = E[X^2 \mid X = x] = x^2$.
So $\hat{Y} = X^2$ and the error is $0$.
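As a numerical companion to this last example (my addition, not in the slides, assuming numpy), the sketch below simulates $X \sim \mathrm{uniform}(-1, 1)$, $Y = X^2$, and compares the error of the best linear estimate (the constant $\mu_Y$, since $\rho_{XY} = 0$) with the zero error of the MMSE estimate $g^*(x) = x^2$.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000
x = rng.uniform(-1.0, 1.0, size=n)           # X ~ uniform(-1, 1)
y = x ** 2                                   # Y = X^2

# MMSE linear estimate: since rho_XY = 0, it reduces to the constant mu_Y.
y_lin = np.full(n, y.mean())
print(np.mean((y - y_lin) ** 2), y.var())    # error = Var(Y) = 4/45, about 0.0889

# MMSE estimate g*(x) = E[Y | X = x] = x^2: zero error.
y_mmse = x ** 2
print(np.mean((y - y_mmse) ** 2))            # 0.0
```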