Random Variables Overview
• Definitions
• Cumulative distribution function
• Probability density function
• Functions of random variables
• Expected values
• Markov & Chebyshev inequalities
• Independence & marginal distributions
• Bayes rule and conditional probability
• Mean & variance
• Mean square estimation
• Linear prediction

Definition: Random Variable
• A random variable X is a function that assigns a real number, X(ζ), to each outcome ζ in the sample space of a random experiment
• The sample space S is the domain of the random variable
• The set of all values that X can take on is the range of the random variable
• This is a many-to-one mapping: a set of points ζ1, ζ2, ... may take on the same value of the random variable
• Will abbreviate as simply "RV"

J. McNames, Portland State University, ECE 4/557, Random Variables, Ver. 1.06

Example 1: Random Variable Definitions
Suppose that a coin is tossed three times and the sequence of heads and tails is noted. What is the sample space for this experiment? Let X be the number of heads in three coin tosses. What is the range of X? List all of the points in the domain of the sample space and the corresponding values of X.

Cumulative Distribution Function
The cumulative distribution function (CDF) of a random variable X is defined as the probability of the event {X ≤ x}:
  F(x) = P[X ≤ x]
• Sometimes just called the distribution function
• Here X is the random variable and x is a non-random variable

Properties of the CDF
1. 0 ≤ F(x) ≤ 1
2. lim_{x→+∞} F(x) = 1
3. lim_{x→−∞} F(x) = 0
4. F(x) is a nondecreasing function of x. Thus, if a < b, then F(a) ≤ F(b).
5. F(x) is continuous from the right. That is, for h > 0, F(b) = lim_{h→0} F(b + h) = F(b⁺)
6. P[a < X ≤ b] = F(b) − F(a)
7. P[X = b] = F(b) − F(b⁻)

Example 2: Distribution Functions
The arrival time X of Joe's email obeys the exponential probability law with parameter λ:
  P[X > x] = 1 for x < 0,  P[X > x] = e^{−λx} for x ≥ 0
Find the CDF of X for λ = 2 and plot F(x) versus x.

Example 2: Distribution Plot
[Figure: exponential CDF for λ = 2, F(x) versus x on [0, 2]]

Example 2: MATLAB Code

```matlab
function [] = ExponentialCDF();
    close all;
    FigureSet(1,'LTX');
    lambda = 2;
    x = 0:0.01:2;
    y = 1-exp(-lambda*x);
    h = plot(x,y,'b',[0 100],[1 1],'k:');
    set(h,'LineWidth',1.5);
    axis([0 max(x) 0 1.1]);
    xlabel('x');
    ylabel('F(x)');
    title('Exponential Cumulative Distribution Function');
    set(gca,'Box','Off');
    AxisSet(8);
    print -depsc ExponentialCDF;
```

Empirical Distribution Function
Let X1, X2, ..., Xn be a random sample. The empirical distribution function (EDF) is a function of x that equals the fraction of the Xi's that are less than or equal to x, for each x, −∞ < x < ∞.
• The "true" CDF is never known
• All we have is data
• The EDF is a rough estimate of the CDF
• Piecewise-constant function (stairs)
• Assuming the sample consists of distinct values, each step has height 1/n
• Minimum value: 0; maximum value: 1
• Nondecreasing
• Is a random function

Example 3: Empirical Distribution Function Plot
[Figure: empirical distribution function S(x) of N = 25 exponential samples, overlaid on the true CDF]
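The EDF is straightforward to compute directly. As a quick sketch (in Python rather than the MATLAB used elsewhere in these notes; the function name `edf` is my own), compare it against the true exponential CDF F(x) = 1 − e^{−λx}:

```python
import random, math

def edf(sample, x):
    """Fraction of sample points <= x: a piecewise-constant estimate of the CDF."""
    return sum(1 for xi in sample if xi <= x) / len(sample)

random.seed(0)
lam = 2.0
sample = [random.expovariate(lam) for _ in range(1000)]

# Compare the EDF to the true exponential CDF F(x) = 1 - exp(-lam*x)
for x in (0.25, 0.5, 1.0):
    print(x, edf(sample, x), 1 - math.exp(-lam * x))
```

With n = 1000 the EDF typically tracks the true CDF to within a few percent, consistent with its roughly 1/√n accuracy.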
Example 3: MATLAB Code

```matlab
function [] = ExponentialEDF();
    close all;
    FigureSet(1,'LTX');
    lambda = 2;
    N = 25;
    R = exprnd(1/lambda,N,1);
    x = 0:0.02:max(R);
    F = 1-exp(-lambda*x);
    h = cdfplot(R);
    hold on;
    plot(x,F,'r',[0 100],[1 1],'k:');
    hold off;
    grid off;
    set(h,'LineWidth',1.0);
    set(gca,'XLim',[0 max(R)]);
    set(gca,'YLim',[0 1.1]);
    xlabel('x');
    ylabel('S(x)');
    title(sprintf('Exponential Empirical Distribution Function N:%d',N));
    box off;
    AxisSet(8);
    print -depsc ExponentialEDF;
```

Note: MATLAB parameterizes the exponential distribution as f(x) = (1/λ)e^{−x/λ}u(x) rather than f(x) = λe^{−λx}u(x), which is why exprnd is called with 1/lambda.

Random Variable Types
Discrete: An RV whose CDF is a right-continuous, staircase (piecewise-constant) function of x.
• Only takes on values from a countable set
• Encountered often in applications involving counting

Continuous: An RV whose CDF is continuous everywhere.
• Can be written as an integral of some nonnegative function f(x): F(x) = ∫_{−∞}^{x} f(u) du
• Implies P[X = x] = 0 everywhere
• In words, there is an infinitesimal probability that X will be equal to any specific number x
• Nonetheless, an experiment will cause X to equal some value

Mixed: An RV with a CDF that has jumps on a countable set of points, but also increases continuously over one or more intervals. In other words, everything else.

Definition: Probability Density Function (PDF)
The probability density function (PDF) of a continuous RV is defined as the derivative of F(x):
  f(x) = dF(x)/dx
Alternatively,
  f(x) = lim_{Δ→0} [F(x + Δ) − F(x − Δ)] / (2Δ)
• Conceptually, the PDF is more useful than the CDF
• Does not technically exist for discrete or mixed RVs
  – Can finesse this with impulse functions
  – δ(x) = du(x)/dx, where u(x) is the unit step function
• The PDF represents the density of probability at the point x

Properties of the PDF
1. f(x) ≥ 0
2. P[a ≤ X ≤ b] = ∫_{a}^{b} f(u) du
3. F(x) = ∫_{−∞}^{x} f(u) du
4. ∫_{−∞}^{+∞} f(u) du = 1
5. A valid PDF can be formed from any nonnegative, piecewise-continuous function g(x) that has a finite integral
6. The PDF must be defined for all real values of x
7. If X does not take on some values, this implies f(x) = 0 for those values

Example 4: Exponential PDF
[Figure: exponential CDF F(x) and PDF f(x) for λ = 2, plotted on [−0.5, 2]]

Example 4: MATLAB Code

```matlab
function [] = ExponentialPDF();
    close all;
    FigureSet(1,'LTX');
    lambda = 2;
    x = -0.5:0.005:2;
    xl = [min(x) max(x)];
    F = zeros(size(x));
    id = find(x>=0);
    F(id) = 1-exp(-lambda*x(id));
    f = zeros(size(x));
    f(id) = lambda*exp(-lambda*x(id));

    subplot(2,1,1);
    h = plot(x,F,'b',xl,[1 1],'k:');
    set(h,'LineWidth',1.5);
    xlim(xl);
    ylim([0 1.1]);
    ylabel('F(x)');
    title('Exponential CDF and PDF');
    box off;

    subplot(2,1,2);
    h = plot(x,f,'g');
    set(h,'LineWidth',1.5);
    xlim(xl);
    ylim([0 2.1]);
    xlabel('x');
    ylabel('f(x)');
    box off;

    AxisSet(8);
    print -depsc ExponentialPDF;
```

Histograms
Let X1, X2, ..., Xn be a random sample. The histogram is a function of x that equals the fraction of the Xi's that fall within specified intervals.
• Like the CDF, the "true" PDF is never known
• The histogram is a rough estimate of the PDF
• Usually shown in the form of a bar plot
• Minimum value: 0; maximum value: ∞
• Is a random function
• Perhaps the most common graphical representation of estimated PDFs

Example 5: Histograms
[Figure: three histograms of N = 100 exponential samples with the true and estimated PDFs overlaid]
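A histogram becomes a density estimate when each bin count is divided by n times the bin width, so that the bar areas sum to (at most) one. A small Python sketch of this idea, with my own helper name `hist_density` (not from the notes):

```python
import random, math

def hist_density(sample, edges):
    """Normalized histogram: bar heights estimate the PDF (bar areas sum to <= 1)."""
    n = len(sample)
    heights = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        count = sum(1 for x in sample if lo <= x < hi)
        heights.append(count / (n * (hi - lo)))
    return heights

random.seed(1)
lam = 2.0
sample = [random.expovariate(lam) for _ in range(5000)]
edges = [0.1 * k for k in range(21)]   # bin width h = 0.1 on [0, 2]
heights = hist_density(sample, edges)

# The first bar should be near the average of f(x) = lam*exp(-lam*x) over [0, 0.1]
print(heights[0], (1 - math.exp(-0.2)) / 0.1)
```

Samples falling beyond the last edge are simply not counted, which is why the total bar area can be slightly less than one.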
Histogram Comments
• Histograms can be misleading
• The apparent shape of the histogram is sensitive to
  – The bin locations
  – The bin widths
• It can be shown that the bin width affects the bias and the variance of this estimator of the PDF

Histogram Accuracy
  ISE = ∫_{−∞}^{∞} (f̂(u) − f(u))² du
  Bias[f̂(x)] = (1/2) f′(x) [h − 2(x − b_j)] + O(h²)  for x ∈ (b_j, b_{j+1}]
  Var[f̂(x)] = f(x)/(nh) + O(n⁻¹)
  MISE = 1/(nh) + (h²/12) R(f′) + O(n⁻¹) + O(h³)
where h is the bin width, b_j is the jth bin boundary, and R(φ) = ∫ φ(u)² du.
• The bin width controls the bias-variance tradeoff
• More on all of this later

Example 5: Histograms with Different Bin Centers
[Figure: two histograms of the same N = 100 exponential sample with shifted bin centers, true and estimated PDFs overlaid]

Example 6: Uniform Distribution
Plot the CDF and PDF for a uniform random variable X ~ U[a, b].
Note: X ~ U[a, b] denotes that X is drawn from a uniform distribution and has a range of [a, b].

Gaussian RVs
[Figure: Gaussian distribution function F(x) and density function f(x)]
  f(x) = (1/(√(2π) σ)) e^{−(x−m)²/(2σ²)}
  F(x) = P[X ≤ x] = (1/(√(2π) σ)) ∫_{−∞}^{x} e^{−(u−m)²/(2σ²)} du
• Denoted as X ~ N(μ_X, σ_X²)
• Also called the normal distribution
• Arises naturally in many applications
• Central limit theorem (more later)
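The Gaussian CDF has no closed form, but it is routinely evaluated through the error function. A brief Python sketch (not part of the original notes):

```python
import math

def gaussian_pdf(x, m=0.0, sigma=1.0):
    """f(x) = exp(-(x-m)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    return math.exp(-(x - m) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def gaussian_cdf(x, m=0.0, sigma=1.0):
    """F(x) = P[X <= x] for X ~ N(m, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - m) / (sigma * math.sqrt(2.0))))

print(gaussian_cdf(0.0))                       # 0.5 by symmetry
print(gaussian_cdf(1.0) - gaussian_cdf(-1.0))  # mass within one sigma of the mean
```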
Functions of RVs
• We will work with functions of RVs: Y = g(X)
• Y is also an RV
• Example: Y = aX + b
  F_Y(y) = P[Y ≤ y]
         = P[aX + b ≤ y]
         = P[X ≤ (y − b)/a]
         = F_X((y − b)/a)
  for a > 0.

Expected Values Overview
• To completely describe all of the known information about an RV, we must specify the CDF or PDF
• Given a data set, estimating the CDF/PDF is one of the most difficult problems we will discuss (density estimation)
• Often, much less information about the distribution of X is sufficient
  – Mean
  – Median
  – Standard deviation
  – Range
• These scalar descriptive statistics are called point estimates

Expected Values Defined
• The expected value of a random variable X is denoted E[X]:
  E[X] = ∫_{−∞}^{+∞} x f(x) dx
• This is called the mean of X
• The expected value of X is only defined if the integral converges absolutely:
  ∫_{−∞}^{∞} |x| f(x) dx < ∞
• The "best" estimate of the mean of X given a data set is the sample average,
  X̄ = (1/N) Σ_{i=1}^{N} x_i ≈ E[X]

Average versus Mean
  E[X] = μ_X = ∫_{−∞}^{+∞} x f(x) dx
  x̄ = μ̂_X = (1/N) Σ_{i=1}^{N} x_i
Note the distinction between the average and the mean.
• The average is
  – an estimate of the mean
  – calculated from a data set
  – a random variable
• The mean is
  – calculated from a PDF
  – not a random variable
  – a property of the PDF

Expected Values of Functions
We can also calculate the expected values of functions of random variables. Let Y = g(X). Then
  E[Y] = ∫_{−∞}^{∞} g(x) f(x) dx
Example: Let g(X) = I(X), where I(X) is the indicator function of the event {X in C} and C is some interval of the real line:
  g(X) = 0 if X is not in C, 1 if X is in C
Then
  E[Y] = ∫_{−∞}^{∞} g(x) f(x) dx = ∫_C f(x) dx = P[X ∈ C]
Thus, the expected value of the indicator of an event is equal to the probability of the event.

Expected Value Properties
1. E[c] = c, where c is a constant
2. E[cX] = c E[X]
3. E[Σ_{k=1}^{N} g_k(X)] = Σ_{k=1}^{N} E[g_k(X)]
Proofs left as a homework assignment.

Variance
The variance of a random variable is defined as
  σ_X² ≡ E[(X − μ_X)²]
The nth moment of an RV is defined as
  E[Xⁿ] ≡ ∫_{−∞}^{∞} xⁿ f(x) dx
• Variance is a measure of how wide a distribution is
• A measure of dispersion
• There are others as well
• The standard deviation is defined as σ ≡ √(σ²)
• σ_X² = E[X²] − E[X]²
• Both are properties of the CDF and are not RVs

Markov Inequality
The mean and variance of an RV X give us sufficient information to establish bounds on certain probabilities. Suppose that X is a nonnegative random variable.
Markov inequality: P[X ≥ a] ≤ E[X]/a
Proof:
  E[X] = ∫_{0}^{a} x f(x) dx + ∫_{a}^{∞} x f(x) dx
       ≥ ∫_{a}^{∞} x f(x) dx
       ≥ ∫_{a}^{∞} a f(x) dx = a P[X ≥ a]

Example 7: Markov Inequality
The mean height of children in a kindergarten class is 3.5 feet. Find the bound on the probability that a kid in the class is taller than 9 feet.
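The Markov bound can be checked numerically. The Python sketch below (the exponential choice is purely illustrative) compares empirical tail probabilities with E[X]/a, and evaluates the kindergarten bound 3.5/9:

```python
import random

# Monte Carlo check of the Markov inequality P[X >= a] <= E[X]/a
# for a nonnegative RV (exponential here, an arbitrary illustrative choice).
random.seed(2)
lam = 2.0
sample = [random.expovariate(lam) for _ in range(100000)]
mean = sum(sample) / len(sample)   # should be near 1/lam = 0.5

for a in (0.5, 1.0, 2.0):
    p = sum(1 for x in sample if x >= a) / len(sample)
    print(a, p, mean / a)          # empirical tail vs. Markov bound

# The kindergarten example from the slide: E[X] = 3.5 ft, a = 9 ft
print(3.5 / 9)                     # bound on P[height >= 9]
```

Note how loose the bound is: for a = 1 the true exponential tail is e^{−2} ≈ 0.135 while the bound is 0.5.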
Chebyshev Inequality
  P[|X − μ| ≥ a] ≤ σ²/a²
Proof: Let D² = (X − μ)², where μ = E[X]. Then apply the Markov inequality:
  P[|X − μ| ≥ a] = P[D² ≥ a²] ≤ E[(X − μ)²]/a² = σ²/a²
• These bounds are very loose
• Note: if σ² = 0, the Chebyshev inequality implies P[X = μ] = 1

Multiple Random Variables
A vector random variable is a function that assigns a vector of real numbers to each outcome ζ in S, the sample space of the random experiment.
• Example: randomly select a student
• X ≡ [H(ζ), W(ζ), A(ζ)]
• Where
  – H(ζ) = height of student ζ
  – W(ζ) = weight of student ζ
  – A(ζ) = age of student ζ

Jointly Continuous Random Variables
Random variables X and Y are jointly continuous if the probabilities of events involving (X, Y) can be expressed as an integral of a PDF. In other words, there is a joint probability density function defined on the real plane such that for any event A,
  P[(X, Y) in A] = ∫∫_A f_{X,Y}(u, v) du dv
Properties
• ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} f_{X,Y}(u, v) du dv = 1
• f_{X,Y}(x, y) = ∂²F_{X,Y}(x, y) / (∂x ∂y)

Example 8: Jointly Continuous RV
[Figure: surface and contour plots of a two-dimensional Gaussian density f(x, y)]

Example 8: MATLAB Code
[Code not recovered in these notes]

Example 8: Jointly Continuous RV Continued
[Figure: surface plots of the Gaussian density f(x, y) and the joint CDF F(x, y)]
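The normalization property of a joint PDF can be verified numerically. Assuming a product of two independent Gaussian marginals (a hypothetical choice for illustration, standing in for the Gaussian surface in Example 8), a Riemann sum over a grid should come out very close to 1:

```python
import math

def f_xy(x, y, m=0.5, sigma=0.5):
    """Joint PDF of two independent N(m, sigma^2) RVs (product of the marginals)."""
    c = 1.0 / (2.0 * math.pi * sigma ** 2)
    return c * math.exp(-((x - m) ** 2 + (y - m) ** 2) / (2.0 * sigma ** 2))

# Riemann-sum check that the joint PDF integrates to 1 over the plane;
# the grid [-3, 4]^2 captures essentially all of the mass for m=0.5, sigma=0.5.
h = 0.02
grid = [-3.0 + h * k for k in range(351)]
total = sum(f_xy(x, y) for x in grid for y in grid) * h * h
print(total)
```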
Joint Cumulative Distribution Function (CDF)
There is also a joint CDF:
  F_{X,Y}(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f_{X,Y}(u, v) dv du
              = P[X ≤ x & Y ≤ y]
              = P[X ≤ x, Y ≤ y]

Marginal PDFs
The marginal PDFs f_X(x) and f_Y(y) are given as follows:
  f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy
  f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx
• f_X(x) is the same as the PDF of X, if Y had not been considered
• The marginal PDFs can be obtained from the joint PDF
• The joint PDF cannot be obtained from the marginal PDFs

Independence
Two random variables X and Y are independent if and only if their joint PDF is equal to the product of the marginal PDFs:
  f_{X,Y}(x, y) = f_X(x) f_Y(y)
Equivalently, they are independent if and only if their joint CDF is equal to the product of the marginal CDFs:
  F_{X,Y}(x, y) = F_X(x) F_Y(y)
• If X and Y are independent, the random variables W = g(X) and Z = h(Y) are also independent

Conditional CDFs & Bayes' Theorem
The conditional CDF of Y given X = x:
  F_Y(y|x) = lim_{h→0} F_Y(y | x < X ≤ x + h) = [∫_{−∞}^{y} f_{X,Y}(x, y′) dy′] / f_X(x)
Proof omitted.
The conditional PDF of Y given X = x:
  f_Y(y|x) = (d/dy) F_Y(y|x) = f_{X,Y}(x, y) / f_X(x)
• This can be viewed as a form of Bayes' theorem
• Gives the a posteriori probability that Y is close to y given that X is close to x
Conditional CDFs & Independence
If X and Y are independent,
  f_{X,Y}(x, y) = f_X(x) f_Y(y)
and the conditional PDF of Y given X = x is
  f_Y(y|x) = f_{X,Y}(x, y)/f_X(x) = f_X(x) f_Y(y)/f_X(x) = f_Y(y)

Conditional Expectation
The conditional expectation of Y given X = x is defined by
  E_Y[Y | X = x] = ∫_{−∞}^{∞} y f_Y(y|x) dy
• E_Y[Y | X = x] can be viewed as a function of x: g(x) = E_Y[Y | x]
• g(X) = E_Y[Y | X] is a random variable
• It can be shown that E_Y[Y] = E_X[g(X)] = E_X[E_Y[Y | X]]
• More generally, E_Y[h(Y)] = E_X[E_Y[h(Y) | X]], where
  E_X[E_Y[h(Y) | X]] = ∫_{−∞}^{∞} (∫_{−∞}^{∞} h(y) f_Y(y|x) dy) f_X(x) dx

Correlation and Covariance
The jkth moment of X and Y is defined as
  E[X^j Y^k] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x^j y^k f_{X,Y}(x, y) dx dy
• The correlation of X and Y is defined as E[XY]
• If E[XY] = 0, we say X and Y are orthogonal
• The covariance of X and Y is defined as
  σ_{X,Y}² = E[(X − μ_X)(Y − μ_Y)]
• If X and Y are independent, σ_{X,Y}² = 0 (see homework)
• Uncorrelated variables are not necessarily independent

Correlation Coefficient
The correlation coefficient of X and Y is defined as
  ρ_{X,Y} = σ_{X,Y}² / (σ_X σ_Y)
• −1 ≤ ρ_{X,Y} ≤ 1
• Extreme values of ρ_{X,Y} indicate a linear relationship between X and Y: Y = aX + b
• ρ_{X,Y} = 1 implies a > 0; ρ_{X,Y} = −1 implies a < 0
• X and Y are said to be uncorrelated if ρ_{X,Y} = 0
• If X and Y are independent, ρ_{X,Y} = 0
• If ρ_{X,Y} = 0, X and Y may not be independent
• However, if X and Y are Gaussian random variables, then ρ_{X,Y} = 0 implies X and Y are independent

Mean Square Estimation
[Diagram: observed variables x1, ..., xn → Model → output y]
• Often we will want to estimate the value of one RV Y from one or more other RVs X: Ŷ = g(X)
• Encountered often in nonlinear modeling and classification
• It may be that Y ≠ g(X)
• The estimation error is defined as Y − g(X)
• We will assign a cost to each error, c(Y − g(X))
• Goal: find g(X) that minimizes E[c(Y − g(X))]
• The most common cost function is the mean squared error (MSE):
  MSE = E[(Y − g(X))²]

Example 9: Minimum MSE Estimation
Suppose we wish to estimate a random variable Y with a constant a. What is the best value of a that minimizes the MSE?

Example 10: Minimum Linear MSE Estimation
Suppose we wish to estimate a random variable Y with a linear function of X, Ŷ = aX + b. What values of a and b minimize the MSE?

Example 10: Workspace (1)
Example 10: Workspace (2)
[Workspace slides left blank]

MMSE Linear Estimation Discussion
  Ŷ = a*X + b* = ρ_{X,Y} σ_Y (X − E[X])/σ_X + E[Y]
• Note that (X − E[X])/σ_X is just a scaled version of X
  – Zero mean
  – Unit variance
  – Sometimes called a z-score
• X_s = σ_Y (X − E[X])/σ_X has the variance of Y
• The term E[Y] ensures that E[Ŷ] = E[Y]
• ρ_{X,Y} specifies the sign and extent of Ŷ relative to X_s
• If uncorrelated, Ŷ = E[Y]
• If perfectly correlated, Ŷ = ±σ_Y (X − E[X])/σ_X + E[Y]

Orthogonality Condition
• Fundamental result in mean square estimation
• Central to the area of linear estimation
• Enables us to more easily find the minimum MSE of the best linear estimator
• The notation will be simplified by the centered random variables
  X̃ = X − E[X],  Ỹ = Y − E[Y]
The orthogonality condition states that the error of the best linear estimator is orthogonal to the observation X − E[X].
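The coefficients of the best linear estimator, a* = σ_{X,Y}²/σ_X² and b* = E[Y] − a*E[X], can be estimated from data by replacing the true moments with sample moments. A Python sketch on simulated data (the model Y = 2X + 1 + noise is an arbitrary illustrative choice):

```python
import random, statistics

# Sample-based sketch of the best linear MMSE estimator Yhat = a*X + b*,
# with a* = cov(X, Y)/var(X) and b* = E[Y] - a*E[X].
random.seed(3)
n = 50000
X = [random.gauss(0.0, 1.0) for _ in range(n)]
Y = [2.0 * x + 1.0 + random.gauss(0.0, 0.5) for x in X]   # Y = 2X + 1 + noise

mx, my = statistics.fmean(X), statistics.fmean(Y)
vx = statistics.fmean((x - mx) ** 2 for x in X)
cov = statistics.fmean((x - mx) * (y - my) for x, y in zip(X, Y))

a = cov / vx        # should recover the slope, about 2
b = my - a * mx     # should recover the intercept, about 1
mse = statistics.fmean((y - (a * x + b)) ** 2 for x, y in zip(X, Y))
print(a, b, mse)    # mse should approach the noise variance, 0.25
```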
Best Linear Estimator MMSE
  MMSE = E[{(Y − E[Y]) − a*(X − E[X])}²]
       = E[(Ỹ − a*X̃)²]
       = E[(Ỹ − a*X̃)Ỹ] − a* E[(Ỹ − a*X̃)X̃]
       = E[(Ỹ − a*X̃)Ỹ]          (the second term vanishes by the orthogonality condition)
       = σ_Y² − a* σ_{X,Y}²
       = σ_Y² − (σ_{X,Y}²/σ_X²) σ_{X,Y}²
       = σ_Y² (1 − ρ_{X,Y}²)
• When ρ_{X,Y} = ±1, MMSE = 0
• Perfect correlation implies perfect prediction
• No correlation (ρ_{X,Y} = 0) implies MMSE = σ_Y²

Nonlinear Estimation
• In general, the best estimator of Y given X will be nonlinear
• Suppose we wish to find the g(X) that best approximates Y in the MMSE sense:
  min_{g(·)} E_{X,Y}[(Y − g(X))²]
Using conditional expectation,
  E_{X,Y}[(Y − g(X))²] = E_X[E_Y[(Y − g(x))² | X = x]]
    = ∫_{−∞}^{∞} E_Y[(Y − g(x))² | X = x] f_X(x) dx
    = ∫_{−∞}^{∞} (∫_{−∞}^{∞} (y − g(x))² f_Y(y|x) dy) f_X(x) dx
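That the regression curve E_Y[Y | X = x] can beat any linear estimator is easy to see with a deliberately nonlinear relationship. In the Python sketch below (my own construction, not from the notes), Y = X² + noise with X ~ N(0, 1), so X and Y are uncorrelated and the best linear predictor reduces to the constant E[Y], while the regression curve is x²:

```python
import random, statistics

# Linear vs. nonlinear MMSE estimation for Y = X^2 + noise, X ~ N(0, 1).
# E[X^3] = 0, so cov(X, Y) = 0: the best linear predictor is just E[Y].
# The regression curve E[Y|X=x] = x^2 does far better.
random.seed(4)
n = 50000
X = [random.gauss(0.0, 1.0) for _ in range(n)]
Y = [x * x + random.gauss(0.0, 0.3) for x in X]

my = statistics.fmean(Y)                                                   # about E[X^2] = 1
mse_linear = statistics.fmean((y - my) ** 2 for y in Y)                    # about var(Y) = 2.09
mse_regression = statistics.fmean((y - x * x) ** 2 for x, y in zip(X, Y))  # about 0.09
print(mse_linear, mse_regression)
```

The regression curve's MSE collapses to the noise variance, while the best linear estimator is stuck at σ_Y².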
Nonlinear Estimation Continued
  E_{X,Y}[(Y − g(X))²] = ∫_{−∞}^{∞} (∫_{−∞}^{∞} (y − g(x))² f_Y(y|x) dy) f_X(x) dx
• The integrand is positive for all x
• Minimized by minimizing E_Y[(Y − g(x))² | X = x] for each x
• g(x) is a constant relative to E_Y[·]
• Reduces to the equivalent example earlier: estimate Y with a constant
• Therefore, the g(x) that minimizes the MSE is
  Ŷ = g*(x) = E_Y[Y | X = x]
• The function g*(x) is called the regression curve
• Has the smallest possible MSE
• Linear estimators are generally worse (larger MSE)

Random Vectors
Let X be a random vector,
  X = [X1, X2, ..., XL]ᵀ
Then the expected value of X is defined as
  E[X] = [E[X1], E[X2], ..., E[XL]]ᵀ

Linear Estimation with Vectors
Suppose that we wish to estimate Y with a linear sum of the random variables X1, X2, ..., XL:
  Ŷ = Σ_{i=1}^{L} w_i X_i = Xᵀw
Then the error Y − Ŷ can be written as
  ε = Y − Xᵀw
and the squared error can be written as
  ε² = (Y − Xᵀw)² = Y² + wᵀXXᵀw − 2YXᵀw

Linear Estimation Error
The expected value of the squared error is
  MSE ≡ E[ε²] = E[(Y − Xᵀw)²] = E[Y²] + wᵀE[XXᵀ]w − 2E[YXᵀ]w

Correlation Matrix
Let X be a zero-mean random vector. The variance-covariance matrix of the vector X, also called the correlation matrix, can be written as
  R = σ²{X} = E[XXᵀ] =
    [ σ²_{X1}      σ²_{X1,X2}   ...  σ²_{X1,XL} ]
    [ σ²_{X2,X1}   σ²_{X2}      ...  σ²_{X2,XL} ]
    [ ...          ...          ...  ...        ]
    [ σ²_{XL,X1}   σ²_{XL,X2}   ...  σ²_{XL}    ]
R is a symmetric matrix: Rᵀ = R.

Cross-Correlation Matrix
Let Y be a zero-mean scalar random variable. Define P, the cross-correlation matrix, as
  P = E[YX] = [E[YX1], E[YX2], ..., E[YXL]]ᵀ

Minimum Mean Squared Error
Using the matrices R and P, the MSE can be rewritten
  E[ε²] = E[(Y − Xᵀw)²] = E[Y²] + wᵀE[XXᵀ]w − 2E[YXᵀ]w = σ_Y² + wᵀRw − 2wᵀP
Take the gradient of the MSE with respect to w and set the resulting expression equal to zero:
  ∇_w E[ε²] = ∇_w(σ_Y² + wᵀRw − 2wᵀP) = Rᵀw + Rw − 2P = 2Rw − 2P = 0
Solving for w*, we obtain
  w* = R⁻¹P

Minimum Mean Squared Error (continued)
Find the minimum MSE by substituting w* into the equation for the MSE:
  min E[ε²] = σ_Y² + w*ᵀRw* − 2Pᵀw*
            = σ_Y² + (R⁻¹P)ᵀR(R⁻¹P) − 2Pᵀ(R⁻¹P)
            = σ_Y² + PᵀR⁻¹P − 2PᵀR⁻¹P
            = σ_Y² − PᵀR⁻¹P = σ_Y² − Pᵀw*
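The solution w* = R⁻¹P can be exercised on simulated data, estimating R and P by sample averages (a Python/NumPy sketch; the true weight vector and noise level are arbitrary choices for illustration):

```python
import numpy as np

# Sketch of the vector MMSE solution w* = R^{-1} P for zero-mean data,
# with R = E[X X^T] and P = E[Y X] estimated from samples.
rng = np.random.default_rng(5)
n, L = 100000, 3
X = rng.standard_normal((n, L))
w_true = np.array([1.0, -2.0, 0.5])
Y = X @ w_true + 0.1 * rng.standard_normal(n)

R = (X.T @ X) / n                 # estimated correlation matrix E[X X^T]
P = (X.T @ Y) / n                 # estimated cross-correlation E[Y X]
w_star = np.linalg.solve(R, P)    # solve R w = P rather than forming R^{-1}
mmse = np.mean((Y - X @ w_star) ** 2)   # should approach the noise variance, 0.01
print(w_star, mmse)
```

Solving the linear system directly (np.linalg.solve) is preferred over explicitly inverting R, which is slower and less numerically stable.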
Closing Comments
• In general, we cannot calculate anything discussed so far
• Everything discussed requires that the true PDF (or CDF) be known
• In practice, we'll have data, not PDFs
• This material represents a best-case scenario
• The question becomes: how closely can we approximate the true point estimate given only data?
• We will compare our estimators on cases where the true PDF is known