Random Variables Overview
• Definitions
• Cumulative distribution function
• Probability density function
• Functions of random variables
• Expected values
• Markov & Chebyshev inequalities
• Independence & marginal distributions
• Bayes rule and conditional probability
• Mean & variance
• Mean square estimation
• Linear prediction

Definition: Random Variable
[Figure: an outcome ζ in the sample space S is mapped by X to the point X(ζ) on the real line; the set of values produced is the range.]
• Definition: a random variable X is a function that assigns a real number, X(ζ), to each outcome ζ in the sample space of a random experiment.
• The sample space S is the domain of the random variable
• The set of all values that X can have is the range of the random variable
• This is a many-to-one mapping. That is, a set of points ζ1, ζ2, . . . may take on the same value of the random variable
• Will abbreviate as simply “RV”
Example 1: Random Variable Definitions
Suppose that a coin is tossed three times and the sequence of heads and tails is noted. What is the sample space for this experiment? Let X be the number of heads in three coin tosses. What is the range of X? List all of the points in the domain of the sample space and the corresponding values of X.
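A brute-force enumeration sketch in MATLAB (the H/T string representation of outcomes is just one choice, not from the slides):

% Enumerate the sample space of three coin tosses and the value of
% X (number of heads) for each outcome.
outcomes = {'HHH','HHT','HTH','HTT','THH','THT','TTH','TTT'};
for k = 1:length(outcomes)
    X = sum(outcomes{k} == 'H');    % count heads in this outcome
    fprintf('%s -> X = %d\n', outcomes{k}, X);
end

The printout makes the range visible: X takes the values {0, 1, 2, 3}.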
Cumulative Distribution Function
The cumulative distribution function (CDF) of a random variable X is defined as the probability of the event {X ≤ x}:

F(x) = P[X ≤ x]

• Sometimes just called the distribution function
• Here X is the random variable and x is a non-random variable
Properties of the CDF
1. 0 ≤ F(x) ≤ 1
2. lim_{x→+∞} F(x) = 1
3. lim_{x→−∞} F(x) = 0
4. F(x) is a nondecreasing function of x. Thus, if a < b, then F(a) ≤ F(b).
5. F(x) is continuous from the right. That is, for h > 0, F(b) = lim_{h→0} F(b + h) = F(b⁺)
6. P[a < X ≤ b] = F(b) − F(a)
7. P[X = b] = F(b) − F(b⁻)

Example 2: Distribution Functions
The arrival time of Joe’s email obeys the exponential probability law with parameter λ, i.e., P[X > x] = e^{−λx} for x ≥ 0 (equivalently, the PDF is f(x) = 0 for x < 0 and λe^{−λx} for x ≥ 0).
Find the CDF of X for λ = 2 and plot F(x) versus x.
Example 2: Distribution Plot
[Figure: plot titled “Exponential Cumulative Distribution Function” showing F(x) = 1 − e^{−2x} for 0 ≤ x ≤ 2, with a dotted line marking the asymptote F(x) = 1.]
Example 2: MATLAB Code
function [] = ExponentialCDF();
close all;
FigureSet(1,'LTX');          % course-specific figure setup helper
lambda = 2;
x = 0:0.01:2;
y = 1-exp(-lambda*x);        % exponential CDF: F(x) = 1 - e^{-lambda x}
h = plot(x,y,'b',[0 100],[1 1],'k:');
set(h,'LineWidth',1.5);
axis([0 max(x) 0 1.1]);
xlabel('x');
ylabel('F(x)');
title('Exponential Cumulative Distribution Function');
set(gca,'Box','Off');
AxisSet(8);                  % course-specific font-size helper
print -depsc ExponentialCDF;
Empirical Distribution Function
Let X1, X2, . . . , Xn be a random sample. The empirical distribution function (edf) is a function of x which equals the fraction of Xi’s that are less than or equal to x, for each x, −∞ < x < ∞.
• The “true” CDF is never known
• All we have is data
• The edf is a rough estimate of the CDF
• Piecewise-constant function (stairs)
• Assuming the sample consists of distinct values, each step has height 1/n
• Minimum value: 0; maximum value: 1
• Nondecreasing
• Is a random function

Example 3: Empirical Distribution Function Plot
[Figure: staircase plot of the edf S(x) for N = 25 exponential samples (λ = 2) over 0 ≤ x ≤ 2, overlaid on the true CDF; titled “Exponential Empirical Distribution Function N:25”.]
Example 3: MATLAB Code
function [] = ExponentialEDF();
close all;
FigureSet(1,'LTX');              % course-specific figure setup helper
lambda = 2;
N = 25;
R = exprnd(1/lambda,N,1);        % exponential samples (Statistics Toolbox)
x = 0:0.02:max(R);
F = 1-exp(-lambda*x);            % true CDF
h = cdfplot(R);                  % empirical distribution function (stairs)
hold on;
plot(x,F,'r',[0 100],[1 1],'k:');
hold off;
grid off;
set(h,'LineWidth',1.0);
set(gca,'XLim',[0 max(R)]);
set(gca,'YLim',[0 1.1]);
xlabel('x');
ylabel('S(x)');
title(sprintf('Exponential Empirical Distribution Function N:%d',N));
box off;
AxisSet(8);                      % course-specific font-size helper
print -depsc ExponentialEDF;

Note: MATLAB defines the distribution as f(x) = (1/λ)e^{−x/λ}u(x) rather than f(x) = λe^{−λx}u(x), which is why exprnd is called with 1/λ.
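If the Statistics Toolbox functions exprnd and cdfplot are unavailable, the same picture can be built from first principles; a minimal sketch (the function name is illustrative):

function [] = ExponentialEDFPlain();
% Empirical distribution function without toolbox functions.
lambda = 2;
N = 25;
R = -log(rand(N,1))/lambda;      % inverse-CDF sampling of Exp(lambda)
Rs = sort(R);                    % order statistics
S = (1:N)'/N;                    % edf heights: fraction of samples <= x
stairs([0; Rs],[0; S],'b');      % piecewise-constant edf
hold on;
x = 0:0.02:max(R);
plot(x,1-exp(-lambda*x),'r');    % true CDF for comparison
hold off;
xlabel('x'); ylabel('S(x)');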
Random Variable Types
Discrete: An RV whose CDF is a right-continuous, staircase (piecewise constant) function of x.
• Only takes on values from a finite set
• Encountered often in applications involving counting

Continuous: An RV whose CDF is continuous everywhere
• Can be written as an integral of some nonnegative function f(x): F(x) = ∫_{−∞}^{x} f(u) du
• Implies P[X = x] = 0 everywhere.
• In words, there is an infinitesimal probability that X will be equal to any specific number x.
• Nonetheless, an experiment will cause X to equal some value.

Mixed: An RV with a CDF that has jumps on a countable set of points, but also increases continuously over one or more intervals. In other words, everything else.
Definition: Probability Density Function (PDF)
The probability density function (PDF) of a continuous RV is defined as the derivative of F(x):

f(x) = dF(x)/dx

Alternatively,

f(x) = lim_{ε→0} [F(x + ε) − F(x − ε)] / (2ε)

• Conceptually, it is more useful than the CDF
• Does not technically exist for discrete or mixed RV’s
  – Can finesse with impulse functions
  – δ(x) = du(x)/dx where u(x) is the unit step function
• The PDF represents the density of probability at the point x

Properties of the PDF
1. f(x) ≥ 0
2. P[a ≤ X ≤ b] = ∫_{a}^{b} f(u) du
3. F(x) = ∫_{−∞}^{x} f(u) du
4. ∫_{−∞}^{+∞} f(u) du = 1
5. A valid PDF can be formed from any nonnegative, piecewise continuous function g(x) that has a finite integral
6. The PDF must be defined for all real values of x
7. If X does not take on some values, this implies f(x) = 0 for those values
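A quick numeric check of properties 2 and 4 for the exponential PDF; a minimal sketch (λ, a, and b are arbitrary choices):

lambda = 2; a = 0.25; b = 1.0;
x = 0:1e-4:20;
f = lambda*exp(-lambda*x);                    % f(x) = lambda e^{-lambda x}, x >= 0
total = trapz(x,f);                           % should be ~1 (property 4)
pab = trapz(x(x>=a & x<=b), f(x>=a & x<=b));  % ~ P[a <= X <= b] (property 2)
exact = exp(-lambda*a) - exp(-lambda*b);      % F(b) - F(a) in closed form
fprintf('total=%.4f  numeric=%.4f  exact=%.4f\n', total, pab, exact);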
Example 4: Exponential PDF
[Figure: two-panel plot titled “Exponential CDF and PDF”; the top panel shows F(x) with a dotted asymptote at 1, the bottom panel shows f(x), both for λ = 2 over −0.5 ≤ x ≤ 2.]
Example 4: MATLAB Code
function [] = ExponentialPDF();
close all;
FigureSet(1,'LTX');                  % course-specific figure setup helper
lambda = 2;
x = -0.5:0.005:2;
xl = [min(x) max(x)];
F = zeros(size(x));
id = find(x>=0);
F(id) = 1-exp(-lambda*x(id));        % CDF: F(x) = 1 - e^{-lambda x}, x >= 0
f = zeros(size(x));
f(id) = lambda*exp(-lambda*x(id));   % PDF: f(x) = lambda e^{-lambda x}, x >= 0

subplot(2,1,1);
h = plot(x,F,'b',xl,[1 1],'k:');
set(h,'LineWidth',1.5);
xlim(xl);
ylim([0 1.1]);
ylabel('F(x)');
title('Exponential CDF and PDF');
box off;

subplot(2,1,2);
h = plot(x,f,'g');
set(h,'LineWidth',1.5);
xlim(xl);
ylim([0 2.1]);
xlabel('x');
ylabel('f(x)');
box off;

AxisSet(8);                          % course-specific font-size helper
print -depsc ExponentialPDF;
Histograms
Let X1, X2, . . . , Xn be a random sample. The histogram is a function of x which equals the fraction of Xi’s that are within specified intervals.
• Like the CDF, the “true” PDF is never known
• The histogram is a rough estimate of the PDF
• Usually shown in the form of a bar plot
• Minimum value: 0; maximum value: ∞
• Is a random function
• Perhaps the most common graphical representation of estimated PDFs
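A minimal sketch of such a bar-plot estimate overlaid on the true PDF, using the exponential example (sample size N and bin width h are arbitrary choices):

lambda = 2; N = 100; h = 0.25;
X = -log(rand(N,1))/lambda;          % exponential samples via inverse CDF
edges = 0:h:ceil(max(X));
counts = histc(X,edges);             % counts per bin (last entry: exact matches)
bar(edges, counts'/(N*h), 'histc');  % scale so bar areas sum to ~1
hold on;
x = 0:0.01:max(edges);
plot(x, lambda*exp(-lambda*x), 'r'); % true PDF
hold off;
xlabel('x'); ylabel('f(x) estimate');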
Example 5: Histograms
[Figure: panels titled “Exponential Empirical Distribution Function N:100” comparing “True” and “Estimated” PDFs from histograms of N = 100 exponential samples over 0 ≤ x ≤ 5.]

Histogram Comments
• Histograms can be misleading
• The apparent shape of the histogram is sensitive to
  – The bin locations
  – The bin widths
• It can be shown that the bin width affects the bias and the variance of this estimator of the PDF
Histogram Accuracy

ISE = ∫_{−∞}^{∞} (f̂(u) − f(u))² du

BIAS[f̂(x)] = ½ f′(x) [h − 2(x − b_j)] + O(h²)   for x ∈ (b_j, b_{j+1}]

Var[f̂(x)] = f(x)/(nh) + O(n⁻¹)

MISE = 1/(nh) + (h²/12) R(f′) + O(n⁻¹) + O(h³)

where h is the bin width, b_j is the jth bin boundary, and R(φ) = ∫ φ(u)² du.
• The bin width controls the bias-variance tradeoff
• More on all of this later

Example 5: Histograms with Different Bin Centers
[Figure: histogram estimates of the exponential PDF (titled “Exponential Empirical Distribution Function N:100”) using the same bin width but different bin centers, each overlaid on the true PDF, over 0 ≤ x ≤ 5.]
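The bias-variance tradeoff can be seen empirically by computing the integrated squared error for several bin widths; a small sketch (the h values and sample size are arbitrary choices):

lambda = 2; N = 100;
X = -log(rand(N,1))/lambda;                 % exponential samples
u = 0:0.01:5;
f = lambda*exp(-lambda*u);                  % true PDF on a fine grid
for h = [0.1 0.25 0.5 1.0]
    edges = 0:h:5;
    c = histc(X,edges);                     % counts per bin
    fhat = zeros(size(u));
    for j = 1:length(edges)-1
        fhat(u>=edges(j) & u<edges(j+1)) = c(j)/(N*h);  % density estimate
    end
    ISE = trapz(u,(fhat-f).^2);             % integrated squared error
    fprintf('h=%.2f  ISE=%.4f\n', h, ISE);
end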
Example 6: Uniform Distribution
Plot the CDF and PDF for a uniform random variable X ∼ U[a, b].
Note: X ∼ U[a, b] denotes that X is drawn from a uniform distribution and has a range of [a, b].
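A minimal sketch of the requested plot (the values of a and b are arbitrary choices):

a = 1; b = 3;
x = (a-1):0.01:(b+1);
f = (x>=a & x<=b)/(b-a);           % PDF: 1/(b-a) on [a,b], 0 elsewhere
F = min(max((x-a)/(b-a),0),1);     % CDF: 0, then a linear ramp, then 1
subplot(2,1,1); plot(x,F); ylabel('F(x)'); title('Uniform CDF and PDF');
subplot(2,1,2); plot(x,f); xlabel('x'); ylabel('f(x)');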
Gaussian RV’s
[Figure: two panels, “Gaussian Distribution Function” F(x) and “Gaussian Density Function” f(x), plotted for −5 ≤ x ≤ 5.]

f(x) = (1/(√(2π) σ)) e^{−(x−m)²/(2σ²)}

F(x) = P[X ≤ x] = (1/(√(2π) σ)) ∫_{−∞}^{x} e^{−(u−m)²/(2σ²)} du

• Denoted as X ∼ N(μ_X, σ_X²)
• Also called the normal distribution
• Arises naturally in many applications
• Central limit theorem (more later)
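A minimal sketch of how these two curves can be reproduced in base MATLAB (m, σ, and the plotting range are choices, not from the slides); the CDF uses the identity F(x) = ½[1 + erf((x−m)/(σ√2))]:

m = 0; sigma = 1;
x = -5:0.01:5;
f = exp(-(x-m).^2/(2*sigma^2))/(sqrt(2*pi)*sigma);  % Gaussian PDF
F = 0.5*(1 + erf((x-m)/(sigma*sqrt(2))));           % Gaussian CDF via erf
subplot(2,1,1); plot(x,F); ylabel('F(x)');
title('Gaussian Distribution and Density Functions');
subplot(2,1,2); plot(x,f); xlabel('x'); ylabel('f(x)');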
Functions of RV’s
• We will work with functions of RV’s: Y = g(X)
• Y is also an RV
• Example: Y = aX + b

F_Y(y) = P[Y ≤ y]
       = P[aX + b ≤ y]
       = P[X ≤ (y − b)/a]
       = F_X((y − b)/a)

for a > 0.
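A quick Monte Carlo sanity check of this result (a, b, and the test point y are arbitrary; X is taken to be exponential as in the earlier examples):

lambda = 2; a = 3; b = 1; N = 1e5;
X = -log(rand(N,1))/lambda;        % samples of X
Y = a*X + b;                       % Y = g(X) = aX + b
y = 2.5;                           % test point
lhs = mean(Y <= y);                % estimate of F_Y(y)
rhs = 1 - exp(-lambda*(y-b)/a);    % F_X((y-b)/a) for the exponential
fprintf('F_Y(y) ~ %.4f   F_X((y-b)/a) = %.4f\n', lhs, rhs);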
Expected Values Overview
• To completely describe all of the known information about an RV, we must specify the CDF or PDF
• Given a data set, estimating the CDF/PDF is one of the most difficult problems we will discuss (density estimation)
• Often, much less information about the distribution of X is sufficient
  – Mean
  – Median
  – Standard deviation
  – Range
• These scalar descriptive statistics are called point estimates

Expected Values Defined

E[X] = ∫_{−∞}^{+∞} x f(x) dx

• The expected value of a random variable X is denoted E[X]
• This is called the mean of X
• The expected value of X is only defined if the integral converges absolutely: ∫_{−∞}^{∞} |x| f(x) dx < ∞
• The “best” estimate of the mean of X given a data set is the sample average, X̄ = (1/N) Σ_{i=1}^{N} x_i ≈ E[X]
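A minimal sketch comparing the sample average to the true mean for the exponential example used earlier (the true mean is 1/λ = 0.5; the sample sizes are arbitrary):

lambda = 2;
for N = [10 100 10000]
    X = -log(rand(N,1))/lambda;   % exponential samples
    fprintf('N=%5d  sample average=%.4f  true mean=%.4f\n', ...
            N, mean(X), 1/lambda);
end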
Average versus Mean

E[X] = μ_x = ∫_{−∞}^{+∞} x f(x) dx

x̄ = μ̂_x = (1/N) Σ_{i=1}^{N} x_i

Note the distinction between the average and the mean.
• The average is
  – an estimate of the mean
  – calculated from a data set
  – a random variable
• The mean is
  – calculated from a PDF
  – not a random variable
  – a property of the PDF

Expected Values of Functions
We can also calculate the expected values of functions of random variables. Let Y = g(X). Then,

E[Y] = ∫_{−∞}^{∞} g(x) f(x) dx

Example: Let g(X) = I(X) where I(X) is the indicator function of the event {X in C}, where C is some interval of the real line:

g(X) = 0 if X is not in C, and 1 if X is in C

then

E[Y] = ∫_{−∞}^{∞} g(x) f(x) dx = ∫_C f(x) dx = P[X ∈ C]

Thus, the expected value of the indicator of an event is equal to the probability of the event.
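A minimal Monte Carlo check of this identity, using the exponential example with C = [0.25, 1.0] (the interval is an arbitrary choice):

lambda = 2; N = 1e5;
X = -log(rand(N,1))/lambda;                    % exponential samples
I = (X >= 0.25) & (X <= 1.0);                  % indicator of {X in C}
est = mean(I);                                 % sample average of the indicator
exact = exp(-lambda*0.25) - exp(-lambda*1.0);  % F(1.0) - F(0.25)
fprintf('estimate=%.4f  exact=%.4f\n', est, exact);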
Expected Value Properties
1. E[c] = c where c is a constant
2. E[cX] = c E[X]
3. E[Σ_{k=1}^{N} g_k(X)] = Σ_{k=1}^{N} E[g_k(X)]
4. Proof left as a homework assignment

The nth moment of an RV is defined as

E[Xⁿ] ≡ ∫_{−∞}^{∞} xⁿ f(x) dx

Variance
The variance of a random variable is defined as follows:

σ_X² ≡ E[(X − μ_x)²]

• Variance is a measure of how wide a distribution is
• A measure of dispersion
• There are others as well
• The standard deviation is defined as σ ≡ √(σ²)
• σ_X² = E[X²] − E[X]²
• Both are properties of the CDF and are not RVs
Markov Inequality
The mean and variance of an RV X give us sufficient information to establish bounds on certain probabilities. Suppose that X is a nonnegative random variable.

Markov inequality: P[X ≥ a] ≤ E[X]/a

Proof:

E[X] = ∫_{0}^{a} x f(x) dx + ∫_{a}^{∞} x f(x) dx
     ≥ ∫_{a}^{∞} x f(x) dx
     ≥ ∫_{a}^{∞} a f(x) dx
     = a P[X ≥ a]

Example 7: Markov Inequality
The mean height of children in a kindergarten class is 3.5 feet. Find the bound on the probability that a kid in the class is taller than 9 feet.
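Worked answer (the slide leaves the computation to the reader): treating height as a nonnegative RV with E[X] = 3.5 and taking a = 9, the Markov inequality gives P[X ≥ 9] ≤ 3.5/9 ≈ 0.39.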
Chebyshev Inequality

P[|X − μ| ≥ a] ≤ σ²/a²

Proof: Let D² = (X − μ)² where μ = E[X]. Then apply the Markov inequality:

P[|X − μ| ≥ a] = P[D² ≥ a²] ≤ E[(X − μ)²]/a² = σ²/a²

• These bounds are very loose
• Note: if σ² = 0, the Chebyshev inequality implies P[X = μ] = 1

Multiple Random Variables
A vector random variable is a function that assigns a vector of real numbers to each outcome ζ in S, the sample space of the random experiment.
• Example: randomly select a student
• X ≡ [H(ζ), W(ζ), A(ζ)]
• Where
  – H(ζ) = height of student ζ
  – W(ζ) = weight of student ζ
  – A(ζ) = age of student ζ
Jointly Continuous Random Variables
Random variables X and Y are jointly continuous if the probabilities of events involving (X, Y) can be expressed as an integral of a PDF. In other words, there is a joint probability density function defined on the real plane such that for any event A,

P[X, Y in A] = ∫∫_A f_X,Y(u, v) du dv

Properties
• ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} f_X,Y(u, v) du dv = 1
• f_X,Y(x, y) = ∂²F_X,Y(x, y)/(∂x ∂y)

Example 8: Jointly Continuous RV
[Figure: contour plot titled “Gaussian Density Function f(x,y)” of a bivariate Gaussian PDF over the (x, y) plane.]
Example 8: MATLAB Code
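The code for this example did not survive in this transcript; a minimal sketch of how such a surface plot could be produced (the function name, grid range, and the unit-variance, zero-correlation parameters are assumptions, not from the slides):

function [] = GaussianJointPDF();
% Sketch: surface plot of a bivariate Gaussian PDF with zero mean,
% unit variances, and zero correlation (parameters assumed).
[x,y] = meshgrid(-0.5:0.05:1.5, -0.5:0.05:1.5);
f = exp(-(x.^2 + y.^2)/2)/(2*pi);   % f(x,y) = (1/(2 pi)) e^{-(x^2+y^2)/2}
surf(x,y,f);
xlabel('x'); ylabel('y'); zlabel('f(x,y)');
title('Gaussian Density Function f(x,y)');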
Example 8: Jointly Continuous RV Continued
[Figure: surface plot of the joint CDF F(x,y) of the same bivariate Gaussian, increasing monotonically in x and y over the plotted region.]
Joint Cumulative Distribution Function (CDF)
[Figure: three sketches of the (X, Y) plane showing the regions whose probabilities are F_X(x), F_X,Y(x, y), and F_Y(y).]
There is also a joint CDF:

F_X,Y(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f_X,Y(u, v) dv du = P[X ≤ x, Y ≤ y]

Marginal PDF’s
The marginal PDF’s f_X(x) and f_Y(y) are given as follows:

f_X(x) = ∫_{−∞}^{∞} f_X,Y(x, y) dy
f_Y(y) = ∫_{−∞}^{∞} f_X,Y(x, y) dx

• f_X(x) is the same as the PDF of X, if Y had not been considered
• The marginal PDF can be obtained from the joint PDF
• The joint PDF cannot be obtained from the marginal PDF
Independence
Two random variables X and Y are independent if and only if their joint PDF is equal to the product of the marginal PDF’s:

f_X,Y(x, y) = f_X(x) f_Y(y)

Equivalently, they are independent if and only if their joint CDF is equal to the product of the marginal CDF’s:

F_X,Y(x, y) = F_X(x) F_Y(y)

• If X and Y are independent, the random variables W = g(X) and Z = h(Y) are also independent. Proof omitted.

Conditional CDF’s & Bayes’ Theorem
The conditional CDF of Y given X = x:

F_Y(y|x) = lim_{h→0} F_Y(y | x < X ≤ x + h) = [∫_{−∞}^{y} f_X,Y(x, y′) dy′] / f_X(x)

The conditional PDF of Y given X = x:

f_Y(y|x) = d/dy F_Y(y|x) = f_X,Y(x, y) / f_X(x)

• This can be viewed as a form of Bayes’ theorem
• Gives the a posteriori probability that Y is close to y given that X is close to x
Conditional CDF’s & Independence
If X and Y are independent,

f_X,Y(x, y) = f_X(x) f_Y(y)

and the conditional PDF of Y given X = x is

f_Y(y|x) = f_X,Y(x, y)/f_X(x) = f_X(x) f_Y(y)/f_X(x) = f_Y(y)

Conditional Expectation
The conditional expectation of Y given X = x is defined by

E_Y[Y|X = x] = ∫_{−∞}^{∞} y f_Y(y|x) dy

• E_Y[Y|X = x] can be viewed as a function of x: g(x) = E_Y[Y|x]
• g(X) = E_Y[Y|X] is a random variable
• It can be shown that E_Y[Y] = E_X[g(X)] = E_X[E_Y[Y|X]]
• More generally, E_Y[h(Y)] = E_X[E_Y[h(Y)|X]] where

E_X[E_Y[h(Y)|X]] = ∫_{−∞}^{∞} [∫_{−∞}^{∞} h(y) f_Y(y|x) dy] f_X(x) dx
Correlation and Covariance
The jk-th moment of X and Y is defined as

E[X^j Y^k] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x^j y^k f_X,Y(x, y) dx dy

• The correlation of X and Y is defined as E[XY]
• If E[XY] = 0, we say X and Y are orthogonal
• The covariance of X and Y is defined as

σ²_X,Y = E[(X − μ_X)(Y − μ_Y)]

• If X and Y are independent, σ²_X,Y = 0 (see homework)

Correlation Coefficient
The correlation coefficient of X and Y is defined as

ρ_X,Y = σ²_X,Y / (σ_X σ_Y)

• −1 ≤ ρ_X,Y ≤ 1
• Extreme values of ρ_X,Y indicate a linear relationship between X and Y: Y = aX + b
• ρ_X,Y = 1 implies a > 0; ρ_X,Y = −1 implies a < 0
• X and Y are said to be uncorrelated if ρ_X,Y = 0
• If X and Y are independent, ρ_X,Y = 0
• If ρ_X,Y = 0, X and Y may not be independent
• Uncorrelated variables are not necessarily independent
• However, if X and Y are Gaussian random variables, then ρ_X,Y = 0 implies X and Y are independent
Mean Square Estimation
[Figure: block diagram; observed variables x1, . . . , xn enter a model, which produces the output y.]
• Often will want to estimate the value of one RV Y from one or more other RVs X: Ŷ = g(X)
• Encountered often in nonlinear modeling and classification
• It may be that Y = g(X)
• The estimation error is defined as Y − g(X)
• We will assign a cost to each error, c(Y − g(X))
• Goal: find the g(X) that minimizes E[c(Y − g(X))]
• The most common cost function is mean squared error (MSE): MSE = E[(Y − g(X))²]

Example 9: Minimum MSE Estimation
Suppose we wish to estimate a random variable Y with a constant a. What is the best value of a that minimizes the MSE?
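The slide leaves the minimization to the reader; a short standard derivation: expanding E[(Y − a)²] = E[Y²] − 2a E[Y] + a² and setting the derivative with respect to a to zero gives 2a − 2E[Y] = 0, so a* = E[Y]. The resulting minimum MSE is E[(Y − E[Y])²] = σ_Y².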
Example 10: Minimum Linear MSE Estimation
Suppose we wish to estimate a random variable Y with a linear function of X, Ŷ = aX + b. What values of a and b minimize the MSE?

Example 10: Workspace (1)
[Blank workspace slide.]
Example 10: Workspace (2)
[Blank workspace slide.]

MMSE Linear Estimation Discussion

Ŷ = a*X + b* = ρ_X,Y σ_Y (X − E[X])/σ_X + E[Y]

• Note that (X − E[X])/σ_X is just a scaled version of X
  – Zero mean
  – Unit variance
  – Sometimes called a z score
• X_s = σ_Y (X − E[X])/σ_X has the variance of Y
• The term E[Y] ensures that E[Ŷ] = E[Y]
• ρ_X,Y specifies the sign and extent of Y relative to X_s
• If uncorrelated, Ŷ = E[Y]
• If perfectly correlated, Ŷ = ±σ_Y (X − E[X])/σ_X + E[Y] = Y
Orthogonality Condition
The orthogonality condition states that the error of the best linear estimator is orthogonal to the observation X − E[X].
• Fundamental result in mean square estimation
• Central to the area of linear estimation
• Enables us to more easily find the minimum MSE of the best linear estimator
• The notation will be simplified by the following notation:

Ỹ = Y − E[Y]        X̃ = X − E[X]

• These are called centered random variables

Best Linear Estimator MMSE

MMSE = E[{(Y − E[Y]) − a*(X − E[X])}²]
     = E[(Ỹ − a*X̃)²]
     = E[(Ỹ − a*X̃)Ỹ] − a* E[(Ỹ − a*X̃)X̃]
     = E[(Ỹ − a*X̃)Ỹ]          (by orthogonality, E[(Ỹ − a*X̃)X̃] = 0)
     = σ_Y² − a* σ²_X,Y
     = σ_Y² − (σ²_X,Y)²/σ_X²
     = σ_Y² (1 − ρ²_X,Y)
Best Linear Estimator MMSE

MMSE = E[{(Y − E[Y]) − a*(X − E[X])}²] = σ_Y² (1 − ρ²_X,Y)

• When ρ_X,Y = ±1, MMSE = 0
• Perfect correlation implies perfect prediction
• No correlation (ρ_X,Y = 0) implies MMSE = σ_Y²

Nonlinear Estimation
• In general, the best estimator of Y given X will be nonlinear
• Suppose we wish to find the g(X) that best approximates Y in the MMSE sense:

min_{g(·)} E_X,Y[(Y − g(X))²]

Using conditional expectation,

E_X,Y[(Y − g(X))²] = E_X[E_Y[(Y − g(x))² | X = x]]
                   = ∫_{−∞}^{∞} E_Y[(Y − g(x))² | X = x] f_X(x) dx
                   = ∫_{−∞}^{∞} [∫_{−∞}^{∞} (y − g(x))² f_Y(y|x) dy] f_X(x) dx
Nonlinear Estimation Continued

E_X,Y[(Y − g(X))²] = ∫_{−∞}^{∞} [∫_{−∞}^{∞} (y − g(x))² f_Y(y|x) dy] f_X(x) dx

• Integrand is positive for all x
• Minimized by minimizing E_Y[(Y − g(x))² | X = x] for each x
• g(x) is a constant relative to E_Y[·]
• Reduces to the equivalent example earlier: estimate Y with a constant g(x)
• Therefore, the g(x) that minimizes the MSE is

Ŷ = g*(x) = E_Y[Y | X = x]

• The function g*(x) is called the regression curve
• Has the smallest possible MSE
• Linear estimators are generally worse (larger MSE)

Random Vectors
Let X be a random vector,

X = [X1, X2, . . . , XL]ᵀ

Then, the expected value of X is defined as

E[X] = [E[X1], E[X2], . . . , E[XL]]ᵀ
Linear Estimation with Vectors
Suppose that we wish to estimate Y with a linear sum of random variables X1, X2, . . . , XL:

Ŷ = Σ_{i=1}^{L} X_i w_i = Xᵀw

Then the error Y − Ŷ can be written as

ε = Y − Xᵀw

and the squared error can be written as

ε² = (Y − Xᵀw)² = Y² + wᵀXXᵀw − 2Y Xᵀw

Linear Estimation Error
The expected value of the squared error is

MSE ≡ E[ε²] = E[(Y − Xᵀw)²] = E[Y²] + wᵀE[XXᵀ]w − 2 E[Y Xᵀ]w
Correlation Matrix
Let X be a zero-mean random vector. The variance-covariance of the vector X, also called the correlation matrix, can be written as

R = σ²{X} = E[XXᵀ] = [ σ²_X1       σ²_X1,X2   . . .  σ²_X1,XL
                       σ²_X2,X1    σ²_X2      . . .  σ²_X2,XL
                       ...         ...        . . .  ...
                       σ²_XL,X1    σ²_XL,X2   . . .  σ²_XL     ]

R is a symmetric matrix: Rᵀ = R.

Cross-Correlation Matrix
Let Y be a zero-mean scalar random variable. Define P, the cross-correlation matrix, as

P = E[Y X] = [E[Y X1], E[Y X2], . . . , E[Y XL]]ᵀ
Minimum Mean Squared Error
Using the matrices R and P, the MSE can be rewritten:

E[ε²] = E[(Y − Xᵀw)²]
      = E[Y²] + wᵀE[XXᵀ]w − 2 E[Y Xᵀ]w
      = σ_Y² + wᵀRw − 2wᵀP

Take the gradient of the MSE above with respect to w and set the resulting expression equal to zero:

∇_w E[ε²] = ∇_w (σ_Y² + wᵀRw − 2wᵀP)
          = Rᵀw + Rw − 2P
          = 2Rw − 2P
          = 0

Solving for w*, we obtain

w* = R⁻¹P

Minimum Mean Squared Error

w* = R⁻¹P

Find the minimum MSE by substitution into the equation for the MSE:

min E[ε²] = σ_Y² + w*ᵀRw* − 2Pᵀw*
          = σ_Y² + (R⁻¹P)ᵀR(R⁻¹P) − 2Pᵀ(R⁻¹P)
          = σ_Y² + PᵀR⁻¹P − 2PᵀR⁻¹P
          = σ_Y² − PᵀR⁻¹P
          = σ_Y² − Pᵀw*
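A minimal numeric sketch of w* = R⁻¹P estimated from data (the true weights and noise level are arbitrary choices, used only to generate test data):

N = 1e4; L = 3;
X = randn(L,N);                    % zero-mean observation vectors
wTrue = [0.5; -1.0; 2.0];          % hypothetical true weights
Y = wTrue'*X + 0.1*randn(1,N);     % Y is (approximately) linear in X
R = (X*X')/N;                      % sample correlation matrix E[XX']
P = (X*Y')/N;                      % sample cross-correlation E[YX]
w = R\P;                           % w* = R^{-1}P without an explicit inverse
mse = mean((Y - w'*X).^2);         % should be close to the noise variance
disp([wTrue w]);                   % estimated weights beside the true ones
fprintf('MSE = %.4f\n', mse);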
Closing Comments
• In general, we cannot calculate anything discussed so far
• Everything discussed requires that the true PDF (or CDF) be known
• In practice, we’ll have data, not PDF’s
• Represents a best-case scenario
• How closely can we approximate the true point estimate given only data?
• Will compare our estimators on cases where the true PDF is known