Pairs of Random Variables
• In this chapter, we consider experiments that produce a collection of random variables, X1 , X2 , . . . , Xn , where n can be any positive integer.
• For most of this chapter, we study n = 2 random variables: X and Y . A pair of
random variables is enough to show the important concepts and useful problem-solving techniques. Moreover, the definitions and theorems we introduce for X and
Y generalize to n random variables. These generalized definitions appear near the
end of this chapter in Section 5.10.
• We also note that a pair of random variables X and Y is the same as the two-dimensional vector [X Y]′. Similarly, the random variables X1 , . . . , Xn can be written as the n-dimensional vector X = [X1 · · · Xn]′. Since the components of X are random variables, X is called a random vector. Thus this chapter begins our study of random vectors.
• This subject is continued in Chapter 8, which uses techniques of linear algebra to
develop further the properties of random vectors.
Pairs of Random Variables: Definitions
• We begin here with the definition of FX,Y (x, y), the joint cumulative distribution
function of two random variables, a generalization of the CDF introduced in Section 3.4 and again in Section 4.2.
• The joint CDF is a complete probability model for any experiment that produces two
random variables. However, it is not very useful for analyzing practical experiments.
• More useful models are PX,Y (x, y), the joint probability mass function for two discrete random variables, presented in Sections 5.2 and 5.3, and fX,Y (x, y), the joint
probability density function of two continuous random variables, presented in Sections 5.4 and 5.5.
• Section 5.7 considers functions of two random variables and their expected values.
• We extend the definition of independent events to define independent random variables.
• The subject of Section 5.9 is the special case in which X and Y are Gaussian.
Example 5.1
We would like to measure random variable X, but we instead observe
Y = X + Z.
(5.1)
The noise Z prevents us from perfectly observing X. In some settings, Z
is an interfering signal. In the simplest setting, Z is just noise inside the
circuitry of your measurement device that is unrelated to X. In this case,
it is appropriate to assume that the signal and noise are independent;
that is, the events X = x and Z = z are independent. This simple model
produces three random variables, X, Y and Z, but any pair completely
specifies the remaining random variable. Thus we will see that a probability model for the pair (X, Z) or for the pair (X, Y ) will be sufficient to
analyze experiments related to this system.
Section 5.1
Joint Cumulative Distribution Function
Figure 5.1
The area {X ≤ x, Y ≤ y} of the (X, Y) plane corresponding to the joint cumulative distribution function FX,Y(x, y).
Definition 5.1 Joint Cumulative Distribution Function (CDF)
The joint cumulative distribution function of random variables X and Y
is
FX,Y (x, y ) = P [X ≤ x, Y ≤ y ] .
Theorem 5.1
For any pair of random variables, X, Y ,
(a) 0 ≤ FX,Y(x, y) ≤ 1,
(b) FX,Y(∞, ∞) = 1,
(c) FX(x) = FX,Y(x, ∞),
(d) FY(y) = FX,Y(∞, y),
(e) FX,Y(x, −∞) = 0,
(f) FX,Y(−∞, y) = 0,
(g) If x ≤ x1 and y ≤ y1, then FX,Y(x, y) ≤ FX,Y(x1, y1).
Example 5.2 Problem
X years is the age of children entering first grade in a school. Y years is
the age of children entering second grade. The joint CDF of X and Y is
FX,Y(x, y) =
  0                  x < 5,
  0                  y < 6,
  (x − 5)(y − 6)     5 ≤ x < 6, 6 ≤ y < 7,
  y − 6              x ≥ 6, 6 ≤ y < 7,
  x − 5              5 ≤ x < 6, y ≥ 7,
  1                  otherwise.                (5.3)
Find FX(x) and FY(y).
Example 5.2 Solution
Using Theorem 5.1(b) and Theorem 5.1(c), we find
FX(x) =
  0        x < 5,
  x − 5    5 ≤ x < 6,
  1        x ≥ 6,

FY(y) =
  0        y < 6,
  y − 6    6 ≤ y < 7,
  1        y ≥ 7.                (5.4)
Referring to Theorem 4.6, we see from Equation (5.4) that X is a continuous uniform (5, 6) random variable and Y is a continuous uniform (6, 7)
random variable.
Theorem 5.2
P [x1 < X ≤ x2, y1 < Y ≤ y2] = FX,Y (x2, y2) − FX,Y (x2, y1)
− FX,Y (x1, y2) + FX,Y (x1, y1) .
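As a quick numerical illustration of Theorem 5.2 (not part of the original text), the following MATLAB sketch evaluates the four-term expression for the Example 5.2 CDF; the rectangle 5.2 < x ≤ 5.5, 6.0 < y ≤ 6.5 is chosen arbitrarily, and the one-line CDF below is simply a compact rewriting of Equation (5.3).

% Hypothetical check of Theorem 5.2 with the CDF of Example 5.2.
F = @(x,y) min(max(x-5,0),1).*min(max(y-6,0),1);   % Eq. (5.3) in one expression
x1 = 5.2; x2 = 5.5; y1 = 6.0; y2 = 6.5;            % an arbitrary rectangle
p  = F(x2,y2) - F(x2,y1) - F(x1,y2) + F(x1,y1)     % Theorem 5.2: returns 0.15

Since X and Y turn out to be independent uniform random variables (Example 5.11), p also equals (x2 − x1)(y2 − y1) = 0.15.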
Quiz 5.1
Express the following extreme values of the joint CDF FX,Y (x, y) as numbers or in terms of the CDFs FX(x) and FY (y).
(a) FX,Y (−∞, 2)
(b) FX,Y (∞, ∞)
(c) FX,Y (∞, y)
(d) FX,Y (∞, −∞)
Quiz 5.1 Solution
Each value of the joint CDF can be found by considering the corresponding probability.
(a) FX,Y (−∞, 2) = P[X ≤ −∞, Y ≤ 2] ≤ P[X ≤ −∞] = 0 since X cannot
take on the value −∞.
(b) FX,Y (∞, ∞) = P[X ≤ ∞, Y ≤ ∞] = 1.
This result is given in Theorem 5.1.
(c) FX,Y (∞, y) = P[X ≤ ∞, Y ≤ y] = P[Y ≤ y] = FY (y).
(d) FX,Y (∞, −∞) = P[X ≤ ∞, Y ≤ −∞] = P[Y ≤ −∞] = 0 since Y cannot
take on the value −∞.
Section 5.2
Joint Probability Mass Function
Definition 5.2 Joint Probability Mass Function (PMF)
The joint probability mass function of discrete random variables X and
Y is
PX,Y (x, y ) = P [X = x, Y = y ] .
Example 5.3 Problem
Test two integrated circuits one after the other. On each test, the
possible outcomes are a (accept) and r (reject). Assume that all circuits
are acceptable with probability 0.9 and that the outcomes of successive
tests are independent. Count the number of acceptable circuits X and
count the number of successful tests Y before you observe the first reject.
(If both tests are successful, let Y = 2.) Draw a tree diagram for the
experiment and find the joint PMF PX,Y (x, y).
Example 5.3 Solution
The experiment has the tree diagram shown below. The sample space of the experiment is
S = {aa, ar, ra, rr}.                (5.7)
[Tree diagram: the first test is a with probability 0.9 or r with probability 0.1; given either outcome, the second test is a with probability 0.9 or r with probability 0.1. The leaves are aa (X = 2, Y = 2), ar (X = 1, Y = 1), ra (X = 1, Y = 0), and rr (X = 0, Y = 0).]
Observing the tree diagram, we compute
P[aa] = 0.81,        P[ar] = 0.09,                (5.8)
P[ra] = 0.09,        P[rr] = 0.01.                (5.9)
Each outcome specifies a pair of values X and Y . Let g(s) be the function
that transforms each outcome s in the sample space S into the pair of
random variables (X, Y ). Then
g(aa) = (2, 2),
g(ar) = (1, 1),
g(ra) = (1, 0),
g(rr) = (0, 0).
(5.10)
[Continued]
Example 5.3 Solution
(Continued 2)
For each pair of values x, y, PX,Y (x, y) is the sum of the probabilities of the
outcomes for which X = x and Y = y. For example, PX,Y (1, 1) = P[ar].
PX,Y(x, y)    y = 0    y = 1    y = 2
x = 0         0.01     0        0
x = 1         0.09     0.09     0
x = 2         0        0        0.81
The joint PMF can be represented by the table above, or as a set of labeled points in the x, y plane where each point is a possible value of the pair (x, y), or as a simple list. In the labeled-point representation, the points (0, 0), (1, 0), (1, 1), and (2, 2) carry the labels 0.01, 0.09, 0.09, and 0.81, respectively. As a simple list,
PX,Y(x, y) =
  0.81    x = 2, y = 2,
  0.09    x = 1, y = 1,
  0.09    x = 1, y = 0,
  0.01    x = 0, y = 0,
  0       otherwise.
Theorem 5.3
For discrete random variables X and Y and any set B in the X, Y plane,
the probability of the event {(X, Y ) ∈ B} is
P[B] = Σ_{(x,y)∈B} PX,Y(x, y).
Example 5.4 Problem
Continuing Example 5.3, find the probability of the event B that X,
the number of acceptable circuits, equals Y , the number of tests before
observing the first failure.
Example 5.4 Solution
Mathematically, B is the event {X = Y }. The elements of B with nonzero
probability are
B ∩ SX,Y = {(0, 0), (1, 1), (2, 2)} .
(5.12)
Therefore,
P [B ] = PX,Y (0, 0) + PX,Y (1, 1) + PX,Y (2, 2)
= 0.01 + 0.09 + 0.81 = 0.91.
(5.13)
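The same calculation can be sketched in MATLAB by storing the joint PMF of Example 5.3 in a matrix and applying Theorem 5.3 with logical indexing; the variable names here are illustrative, not from the text.

% Hypothetical check of Example 5.4.
PXY = [0.01 0    0   ; ...
       0.09 0.09 0   ; ...
       0    0    0.81];          % rows x = 0,1,2; columns y = 0,1,2
[SX,SY] = ndgrid(0:2,0:2);       % sample space grid (see Section 5.11)
checksum = sum(PXY(:))           % must equal 1
PB = sum(PXY(SX==SY))            % P[X = Y] = 0.91, matching Equation (5.13)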
Figure 5.2
Subsets B of the (X, Y) plane: B = {X + Y ≤ 3} and B = {X² + Y² ≤ 9}. Points (X, Y) ∈ SX,Y are marked by bullets.
Quiz 5.2
The joint PMF PQ,G(q, g) for random variables Q and G is given in the
following table:
PQ,G(q, g)    g = 0    g = 1    g = 2    g = 3
q = 0         0.06     0.18     0.24     0.12
q = 1         0.04     0.12     0.16     0.08
Calculate the following probabilities:
(a) P[Q = 0]
(b) P[Q = G]
(c) P[G > 1]
(d) P[G > Q]
Quiz 5.2 Solution
From the joint PMF of Q and G given in the table, we can calculate the requested
probabilities by summing the PMF over those values of Q and G that correspond to the
event.
(a) The probability that Q = 0 is
P [Q = 0] = PQ,G (0, 0) + PQ,G (0, 1) + PQ,G (0, 2) + PQ,G (0, 3)
= 0.06 + 0.18 + 0.24 + 0.12 = 0.6.
(1)
(b) The probability that Q = G is
P [Q = G] = PQ,G (0, 0) + PQ,G (1, 1) = 0.18.
(2)
(c) The probability that G > 1 is
P[G > 1] = Σ_{g=2}^{3} Σ_{q=0}^{1} PQ,G(q, g) = 0.24 + 0.16 + 0.12 + 0.08 = 0.6.                (3)
(d) The probability that G > Q is
P[G > Q] = Σ_{q=0}^{1} Σ_{g=q+1}^{3} PQ,G(q, g) = 0.18 + 0.24 + 0.12 + 0.16 + 0.08 = 0.78.                (4)
Section 5.3
Marginal PMF
Theorem 5.4
For discrete random variables X and Y with joint PMF PX,Y (x, y),
PX(x) = Σ_{y∈SY} PX,Y(x, y),        PY(y) = Σ_{x∈SX} PX,Y(x, y).
Example 5.5 Problem
In Example 5.3, we found that random variables X and Y have the joint PMF shown in the following table. Find the marginal PMFs for the random variables X and Y.
PX,Y(x, y)    y = 0    y = 1    y = 2
x = 0         0.01     0        0
x = 1         0.09     0.09     0
x = 2         0        0        0.81
Example 5.5 Solution
We note that both X and Y have range {0, 1, 2}. Theorem 5.4 gives
PX(0) = Σ_{y=0}^{2} PX,Y(0, y) = 0.01,        PX(1) = Σ_{y=0}^{2} PX,Y(1, y) = 0.18,                (5.14)
PX(2) = Σ_{y=0}^{2} PX,Y(2, y) = 0.81,        PX(x) = 0,  x ≠ 0, 1, 2.                (5.15)
Referring to the table representation of PX,Y(x, y), we observe that each value of PX(x) is the result of adding all the entries in one row of the table. Similarly, the formula for the PMF of Y in Theorem 5.4, PY(y) = Σ_{x∈SX} PX,Y(x, y), is the sum of all the entries in one column of the table.
[Continued]
Example 5.5 Solution
(Continued 2)
We display PX(x) and PY (y) by rewriting the table and placing the row sums and column
sums in the margins.
PX,Y(x, y)    y = 0    y = 1    y = 2    PX(x)
x = 0         0.01     0        0        0.01
x = 1         0.09     0.09     0        0.18
x = 2         0        0        0.81     0.81
PY(y)         0.10     0.09     0.81
Thus the column in the right margin shows PX(x) and the row in the bottom margin
shows PY (y). Note that the sum of all the entries in the bottom margin is 1 and
so is the sum of all the entries in the right margin. This is simply a verification of
Theorem 3.1(b), which states that the PMF of any random variable must sum to 1.
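In MATLAB, these row and column sums are one-liners; a minimal sketch, assuming the same matrix layout as in the table:

% Hypothetical check of Example 5.5.
PXY = [0.01 0    0   ; ...
       0.09 0.09 0   ; ...
       0    0    0.81];
PX = sum(PXY,2)'                 % row sums:    [0.01 0.18 0.81]
PY = sum(PXY,1)                  % column sums: [0.10 0.09 0.81]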
Quiz 5.3
The probability mass function PH,B(h, b) for the two random variables H
and B is given in the following table. Find the marginal PMFs PH(h) and
PB(b).
PH,B(h, b)    b = 0    b = 2    b = 4
h = −1        0        0.4      0.2
h = 0         0.1      0        0.1
h = 1         0.1      0.1      0                (5.16)
Quiz 5.3 Solution
By Theorem 5.4, the marginal PMF of H is
PH(h) = Σ_{b=0,2,4} PH,B(h, b).                (1)
For each value of h, this corresponds to calculating the row sum across the table of the
joint PMF. Similarly, the marginal PMF of B is
PB(b) = Σ_{h=−1}^{1} PH,B(h, b).                (2)
For each value of b, this corresponds to the column sum down the table of the joint
PMF. The easiest way to calculate these marginal PMFs is to simply sum each row and
column:
PH,B(h, b)    b = 0    b = 2    b = 4    PH(h)
h = −1        0        0.4      0.2      0.6
h = 0         0.1      0        0.1      0.2
h = 1         0.1      0.1      0        0.2
PB(b)         0.2      0.5      0.3
Section 5.4
Joint Probability Density Function
Definition 5.3 Joint Probability Density Function (PDF)
The joint PDF of the continuous random variables X and Y is a function
fX,Y (x, y) with the property
FX,Y(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} fX,Y(u, v) dv du.
Theorem 5.5
fX,Y(x, y) = ∂²FX,Y(x, y) / (∂x ∂y).
Example 5.6 Problem
Use the joint CDF for children's ages X and Y given in Example 5.2 to
derive the joint PDF presented in Equation (5.5).
Example 5.6 Solution
Referring to Equation (5.3) for the joint CDF FX,Y(x, y), we must evaluate the partial derivative ∂²FX,Y(x, y)/∂x ∂y for each of the six regions specified in Equation (5.3). However, ∂²FX,Y(x, y)/∂x ∂y is nonzero only if FX,Y(x, y) is a function of both x and y. In this example, only the region {5 ≤ x < 6, 6 ≤ y < 7} meets this requirement. Over this region,
fX,Y(x, y) = ∂²/(∂x ∂y) [(x − 5)(y − 6)] = (∂/∂x)[x − 5] · (∂/∂y)[y − 6] = 1.                (5.18)
Over all other regions, the joint PDF fX,Y(x, y) is zero.
Theorem 5.6
A joint PDF fX,Y (x, y) has the following properties corresponding to first
and second axioms of probability (see Section 1.2):
(a) fX,Y (x, y) ≥ 0 for all (x, y),
(b) ∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y(x, y) dx dy = 1.
Theorem 5.7
The probability that the continuous random variables (X, Y ) are in A is
P[A] = ∫∫_A fX,Y(x, y) dx dy.
Example 5.7 Problem
Random variables X and Y have joint PDF
fX,Y(x, y) =
  c    0 ≤ x ≤ 5, 0 ≤ y ≤ 3,
  0    otherwise.                (5.19)
Find the constant c and P[A] = P[2 ≤ X < 3, 1 ≤ Y < 3].
Example 5.7 Solution
The large rectangle in the diagram is the area of nonzero probability.
Theorem 5.6 states that the integral of the joint PDF over this rectangle
is 1:
1 = ∫_0^5 ∫_0^3 c dy dx = 15c.                (5.20)
Therefore, c = 1/15.
The small dark rectangle in the diagram is the event A = {2 ≤ X < 3, 1 ≤ Y < 3}. P[A] is the integral of the PDF over this rectangle, which is
P[A] = ∫_2^3 ∫_1^3 (1/15) dv du = 2/15.                (5.21)
This probability model is an example of a pair of random variables uniformly distributed over a rectangle in the X, Y plane.
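For a numerical cross-check (ours, not part of the example), MATLAB's integral2 can integrate the uniform PDF over the two rectangles directly:

% Hypothetical check of Example 5.7.
f = @(x,y) (1/15)*ones(size(x));          % joint PDF with c = 1/15
total = integral2(f,0,5,0,3)              % returns 1, confirming Theorem 5.6(b)
PA    = integral2(f,2,3,1,3)              % returns 2/15, matching Equation (5.21)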
Figure 5.3
Five cases for the CDF FX,Y(x, y) of Example 5.8: (a) x < 0 or y < 0; (b) 0 ≤ y ≤ x ≤ 1; (c) 0 ≤ x < y, 0 ≤ x ≤ 1; (d) 0 ≤ y ≤ 1, x > 1. (Case (e), in which the triangle is covered completely, is omitted.)
Example 5.8 Problem
Find the joint CDF FX,Y (x, y) when X and Y have joint PDF
[The figure shows the triangular region 0 ≤ y ≤ x ≤ 1 over which fX,Y(x, y) = 2.]
fX,Y(x, y) =
  2    0 ≤ y ≤ x ≤ 1,
  0    otherwise.                (5.22)
Example 5.8 Solution
We can derive the joint CDF using Definition 5.3 in which we integrate the joint PDF
fX,Y (x, y) over the area shown in Figure 5.1. To perform the integration it is extremely
useful to draw a diagram that clearly shows the area with nonzero probability and then
to use the diagram to derive the limits of the integral in Definition 5.3.
The difficulty with this integral is that the nature of the region of integration depends
critically on x and y. In this apparently simple example, there are five cases to consider!
The five cases are shown in Figure 5.3. First, we note that with x < 0 or y < 0, the
triangle is completely outside the region of integration, as shown in Figure 5.3a. Thus
we have FX,Y (x, y) = 0 if either x < 0 or y < 0. Another simple case arises when x ≥ 1
and y ≥ 1. In this case, we see in Figure 5.3e that the triangle is completely inside the
region of integration, and we infer from Theorem 5.6 that FX,Y (x, y) = 1. The other
cases we must consider are more complicated. In each case, since fX,Y (x, y) = 2 over
the triangular region, the value of the integral is two times the indicated area. When
(x, y) is inside the area of nonzero probability (Figure 5.3b), the integral is
FX,Y(x, y) = ∫_0^y ∫_v^x 2 du dv = 2xy − y²        (Figure 5.3b).                (5.23)
[Continued]
Example 5.8 Solution
(Continued 2)
In Figure 5.3c, (x, y) is above the triangle, and the integral is
FX,Y(x, y) = ∫_0^x ∫_v^x 2 du dv = x²        (Figure 5.3c).                (5.24)
The remaining situation to consider is shown in Figure 5.3d, when (x, y) is to the right
of the triangle of nonzero probability, in which case the integral is
FX,Y(x, y) = ∫_0^y ∫_v^1 2 du dv = 2y − y²        (Figure 5.3d).                (5.25)
The resulting CDF, corresponding to the five cases of Figure 5.3, is
FX,Y(x, y) =
  0           x < 0 or y < 0           (a),
  2xy − y²    0 ≤ y ≤ x ≤ 1            (b),
  x²          0 ≤ x < y, 0 ≤ x ≤ 1     (c),
  2y − y²     0 ≤ y ≤ 1, x > 1         (d),
  1           x > 1, y > 1             (e).                (5.26)
In Figure 5.4, the surface plot of FX,Y (x, y) shows that cases (a) through (e) correspond
to contours on the “hill” that is FX,Y (x, y). In terms of visualizing the random variables,
the surface plot of FX,Y (x, y) is less instructive than the simple triangle characterizing
the PDF fX,Y (x, y).
Because the PDF in this example is fX,Y (x, y) = 2 over (x, y) ∈ SX,Y , each probability is
just two times the area of the region shown in one of the diagrams (either a triangle
or a trapezoid). You may want to apply some high school geometry to verify that the
results obtained from the integrals are indeed twice the areas of the regions indicated.
The approach taken in our solution, integrating over SX,Y to obtain the CDF, works for
any PDF.
Figure 5.4
A graph of the joint CDF FX,Y(x, y) of Example 5.8.
Example 5.9 Problem
As in Example 5.7, random variables X and Y have joint PDF
fX,Y(x, y) =
  1/15    0 ≤ x ≤ 5, 0 ≤ y ≤ 3,
  0       otherwise.                (5.27)
What is P[A] = P[Y > X]?
Example 5.9 Solution
Applying Theorem 5.7, we integrate fX,Y (x, y) over the part of the X, Y
plane satisfying Y > X. In this case,
[The figure shows the region Y > X inside the rectangle of nonzero probability.]
P[A] = ∫_0^3 ( ∫_x^3 (1/15) dy ) dx                (5.28)
     = ∫_0^3 (3 − x)/15 dx = −(3 − x)²/30 |_0^3 = 3/10.                (5.29)
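The same probability can be checked numerically with integral2, using a function handle for the x-dependent lower limit of y; this is a sketch of our own, not part of the example:

% Hypothetical check of Example 5.9.
f  = @(x,y) (1/15)*ones(size(x));
PA = integral2(f, 0, 3, @(x) x, 3)        % integrates over 0<=x<=3, x<y<=3; returns 0.3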
Quiz 5.4
The joint probability density function of random variables X and Y is
fX,Y(x, y) =
  cxy    0 ≤ x ≤ 1, 0 ≤ y ≤ 2,
  0      otherwise.                (5.30)
Find the constant c. What is the probability of the event A = {X² + Y² ≤ 1}?
Quiz 5.4 Solution
To find the constant c, we apply
1 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y(x, y) dx dy = ∫_0^2 ∫_0^1 cxy dx dy
  = c ∫_0^2 y ( x²/2 |_0^1 ) dy = (c/2) ∫_0^2 y dy = (c/4) y² |_0^2 = c.                (1)
Thus c = 1.
To calculate P[A], we write
P[A] = ∫∫_A fX,Y(x, y) dx dy.                (2)
The region A is the quarter of the unit disk that lies inside the rectangle of nonzero probability. To integrate over A, we convert to polar coordinates using the substitutions x = r cos θ, y = r sin θ and dx dy = r dr dθ. This yields
P[A] = ∫_0^{π/2} ∫_0^1 r² sin θ cos θ · r dr dθ
     = ( ∫_0^1 r³ dr )( ∫_0^{π/2} sin θ cos θ dθ ) = ( r⁴/4 |_0^1 )( sin²θ/2 |_0^{π/2} ) = 1/8.                (3)
Section 5.5
Marginal PDF
Theorem 5.8
If X and Y are random variables with joint PDF fX,Y (x, y),
fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy,        fY(y) = ∫_{−∞}^{∞} fX,Y(x, y) dx.
Proof: Theorem 5.8
From the definition of the joint PDF, we can write
FX(x) = P[X ≤ x] = ∫_{−∞}^{x} ( ∫_{−∞}^{∞} fX,Y(u, y) dy ) du.                (5.31)
Taking the derivative of both sides with respect to x (which involves differentiating an integral with variable limits), we obtain
fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy.
A similar argument holds for fY(y).
Example 5.10 Problem
The joint PDF of X and Y is
fX,Y(x, y) =
  5y/4    −1 ≤ x ≤ 1, x² ≤ y ≤ 1,
  0       otherwise.                (5.32)
Find the marginal PDFs fX(x) and fY(y).
Example 5.10 Solution
We use Theorem 5.8 to find the marginal PDF fX(x). In the figure that
accompanies Equation (5.33) below, the gray bowl-shaped region depicts
those values of X and Y for which fX,Y (x, y) > 0. When x < −1 or when
x > 1, fX,Y (x, y) = 0, and therefore fX(x) = 0. For −1 ≤ x ≤ 1,
fX(x) = ∫_{x²}^{1} (5y/4) dy = 5(1 − x⁴)/8.                (5.33)
The complete expression for the marginal PDF of X is
fX(x) =
  5(1 − x⁴)/8    −1 ≤ x ≤ 1,
  0              otherwise.                (5.34)
[Continued]
Example 5.10 Solution
(Continued 2)
For the marginal PDF of Y , we note that for y < 0 or y > 1, fY (y) = 0. For 0 ≤ y ≤ 1,
we integrate over the horizontal bar marked Y = y. The boundaries of the bar are
x = −√y and x = √y. Therefore, for 0 ≤ y ≤ 1,
fY(y) = ∫_{−√y}^{√y} (5y/4) dx = (5y/4) x |_{x=−√y}^{x=√y} = 5y^{3/2}/2.                (5.35)
The complete marginal PDF of Y is
fY(y) =
  (5/2)y^{3/2}    0 ≤ y ≤ 1,
  0               otherwise.                (5.36)
Quiz 5.5
The joint probability density function of random variables X and Y is
fX,Y(x, y) =
  6(x + y²)/5    0 ≤ x ≤ 1, 0 ≤ y ≤ 1,
  0              otherwise.                (5.37)
Find fX(x) and fY(y), the marginal PDFs of X and Y.
Quiz 5.5 Solution
By Theorem 5.8, the marginal PDF of X is
fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy.                (1)
Note that fX(x) = 0 for x < 0 or x > 1. For 0 ≤ x ≤ 1,
fX(x) = (6/5) ∫_0^1 (x + y²) dy = (6/5)( xy + y³/3 |_{y=0}^{y=1} ) = (6x + 2)/5.                (2)
The complete expression for the PDF of X is
fX(x) =
  (6x + 2)/5    0 ≤ x ≤ 1,
  0             otherwise.                (3)
By the same method we obtain the marginal PDF for Y. For 0 ≤ y ≤ 1,
fY(y) = ∫_{−∞}^{∞} fX,Y(x, y) dx = (6/5) ∫_0^1 (x + y²) dx = (6/5)( x²/2 + xy² |_{x=0}^{x=1} ) = (6y² + 3)/5.                (4)
Since fY(y) = 0 for y < 0 or y > 1, the complete expression for the PDF of Y is
fY(y) =
  (3 + 6y²)/5    0 ≤ y ≤ 1,
  0              otherwise.                (5)
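As a sanity check of our own (not part of the quiz), both marginal PDFs should integrate to 1 over [0, 1]; MATLAB's integral confirms this:

% Hypothetical check of Quiz 5.5.
fX = @(x) (6*x+2)/5;
fY = @(y) (3+6*y.^2)/5;
check = [integral(fX,0,1)  integral(fY,0,1)]      % both entries equal 1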
Section 5.6
Independent Random Variables
Definition 5.4 Independent Random Variables
Random variables X and Y are independent if and only if
Discrete:
PX,Y (x, y) = PX(x)PY (y)
Continuous: fX,Y (x, y) = fX(x)fY (y).
Example 5.11 Problem
Are the children's ages X and Y in Example 5.2 independent?
Example 5.11 Solution
In Example 5.2, we derived the CDFs FX(x) and FY (y), which showed
that X is uniform (5, 6) and Y is uniform (6, 7). Thus X and Y have
marginal PDFs
fX(x) =
  1    5 ≤ x < 6,
  0    otherwise,

fY(y) =
  1    6 ≤ y < 7,
  0    otherwise.                (5.38)
Referring to Equation (5.5), we observe that fX,Y (x, y) = fX(x)fY (y).
Thus X and Y are independent.
Example 5.12 Problem
fX,Y(x, y) =
  4xy    0 ≤ x ≤ 1, 0 ≤ y ≤ 1,
  0      otherwise.
Are X and Y independent?
Example 5.12 Solution
The marginal PDFs of X and Y are
fX(x) =
  2x    0 ≤ x ≤ 1,
  0     otherwise,

fY(y) =
  2y    0 ≤ y ≤ 1,
  0     otherwise.                (5.39)
It is easily verified that fX,Y (x, y) = fX(x)fY (y) for all pairs (x, y), and so
we conclude that X and Y are independent.
Example 5.13 Problem
fU,V(u, v) =
  24uv    u ≥ 0, v ≥ 0, u + v ≤ 1,
  0       otherwise.                (5.40)
Are U and V independent?
Example 5.13 Solution
Since fU,V (u, v) looks similar in form to fX,Y (x, y) in the previous example,
we might suppose that U and V can also be factored into marginal PDFs
fU(u) and fV (v). However, this is not the case. Owing to the triangular
shape of the region of nonzero probability, the marginal PDFs are
fU(u) =
  12u(1 − u)²    0 ≤ u ≤ 1,
  0              otherwise,

fV(v) =
  12v(1 − v)²    0 ≤ v ≤ 1,
  0              otherwise.
Clearly, U and V are not independent. Learning U changes our knowledge
of V . For example, learning U = 1/2 informs us that P[V ≤ 1/2] = 1.
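A single test point makes the dependence concrete; the following sketch (our own check, with illustrative function handles) compares fU,V(1/2, 1/2) with fU(1/2)fV(1/2):

% Hypothetical check of Example 5.13.
fUV = @(u,v) 24*u.*v.*(u>=0 & v>=0 & u+v<=1);
fU  = @(u) 12*u.*(1-u).^2;
fV  = @(v) 12*v.*(1-v).^2;
[fUV(0.5,0.5)  fU(0.5)*fV(0.5)]     % returns [6  2.25]; the product rule fails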
Example 5.14 Problem
Consider again the noisy observation model of Example 5.1. Suppose X
is a Gaussian (0, σX ) information signal sent by a radio transmitter and
Y = X + Z is the output of a low-noise amplifier attached to the antenna
of a radio receiver. The noise Z is a Gaussian (0, σZ ) random variable
that is generated within the receiver. What is the joint PDF fX,Z(x, z)?
Example 5.14 Solution
From the information given, we know that X and Z have PDFs
fX(x) = (1/√(2πσX²)) e^{−x²/2σX²},        fZ(z) = (1/√(2πσZ²)) e^{−z²/2σZ²}.                (5.41)
The signal X depends on the information being transmitted by the sender
and the noise Z depends on electrons bouncing around in the receiver
circuitry. As there is no reason for these to be related, we model X and
Z as independent. Thus, the joint PDF is
fX,Z(x, z) = fX(x) fZ(z) = (1/(2π√(σX²σZ²))) e^{−(1/2)(x²/σX² + z²/σZ²)}.                (5.42)
Quiz 5.6(A)
Random variables X and Y in Example 5.3 and random variables Q and
G in Quiz 5.2 have joint PMFs:
PX,Y(x, y)    y = 0    y = 1    y = 2
x = 0         0.01     0        0
x = 1         0.09     0.09     0
x = 2         0        0        0.81

PQ,G(q, g)    g = 0    g = 1    g = 2    g = 3
q = 0         0.06     0.18     0.24     0.12
q = 1         0.04     0.12     0.16     0.08

(a) Are X and Y independent?
(b) Are Q and G independent?
Quiz 5.6(A) Solution
(a) For random variables X and Y from Example 5.3, we observe that PY (1) = 0.09
and PX(0) = 0.01. However,
PX,Y(0, 1) = 0 ≠ PX(0) PY(1).                (1)
Since we have found a pair x, y such that PX,Y(x, y) ≠ PX(x)PY(y), we can conclude that X and Y are dependent. Note that whenever PX,Y(x, y) = 0, independence requires that either PX(x) = 0 or PY(y) = 0.
(b) For random variables Q and G from Quiz 5.2, it is not obvious whether they are
independent. Unlike X and Y in part (a), there are no obvious pairs q, g that fail
the independence requirement. In this case, we calculate the marginal PMFs from
the table of the joint PMF PQ,G(q, g) in Quiz 5.2. In transposed form, this table is
PQ,G(q, g)    q = 0    q = 1    PG(g)
g = 0         0.06     0.04     0.10
g = 1         0.18     0.12     0.30
g = 2         0.24     0.16     0.40
g = 3         0.12     0.08     0.20
PQ(q)         0.60     0.40
Careful study of the table will verify that PQ,G(q, g) = PQ(q)PG(g) for every pair q, g.
Hence Q and G are independent.
Quiz 5.6(B)
Random variables X1 and X2 are independent and identically distributed with probability density function
fX(x) =
  x/2    0 ≤ x ≤ 2,
  0      otherwise.                (5.43)
What is the joint PDF fX1,X2(x1, x2)?
Quiz 5.6(B) Solution
Since X1 and X2 are identically distributed, fX1(x) = fX2(x) = fX(x). Since X1 and X2 are independent,
fX1,X2(x1, x2) = fX1(x1) fX2(x2) =
  x1 x2 / 4    0 ≤ x1 ≤ 2, 0 ≤ x2 ≤ 2,
  0            otherwise.                (1)
Section 5.7
Expected Value of a Function of Two Random Variables
Theorem 5.9
For random variables X and Y , the expected value of W = g(X, Y ) is
Discrete:        E[W] = Σ_{x∈SX} Σ_{y∈SY} g(x, y) PX,Y(x, y),
Continuous:      E[W] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) fX,Y(x, y) dx dy.
Theorem 5.10
E [a1g1(X, Y ) + · · · + angn(X, Y )] = a1 E [g1(X, Y )] + · · · + an E [gn(X, Y )] .
Proof: Theorem 5.10
Let g(X, Y) = a1 g1(X, Y) + · · · + an gn(X, Y). For discrete random variables X, Y, Theorem 5.9 states
E[g(X, Y)] = Σ_{x∈SX} Σ_{y∈SY} ( a1 g1(x, y) + · · · + an gn(x, y) ) PX,Y(x, y).                (5.44)
We can break the double summation into n weighted double summations:
E[g(X, Y)] = a1 Σ_{x∈SX} Σ_{y∈SY} g1(x, y) PX,Y(x, y) + · · · + an Σ_{x∈SX} Σ_{y∈SY} gn(x, y) PX,Y(x, y).
By Theorem 5.9, the ith double summation on the right side is E[gi(X, Y)]; thus,
E[g(X, Y)] = a1 E[g1(X, Y)] + · · · + an E[gn(X, Y)].                (5.45)
For continuous random variables, Theorem 5.9 says
E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} ( a1 g1(x, y) + · · · + an gn(x, y) ) fX,Y(x, y) dx dy.                (5.46)
To complete the proof, we express this integral as the sum of n integrals and recognize that each of the new integrals is a weighted expected value, ai E[gi(X, Y)].
Theorem 5.11
For any two random variables X and Y ,
E [X + Y ] = E [X ] + E [Y ] .
Theorem 5.12
The variance of the sum of two random variables is
Var [X + Y ] = Var [X ] + Var [Y ] + 2 E [(X − µX )(Y − µY )] .
Proof: Theorem 5.12
Since E[X + Y ] = µX + µY ,
Var[X + Y] = E[(X + Y − (µX + µY))²]
           = E[((X − µX) + (Y − µY))²]
           = E[(X − µX)² + 2(X − µX)(Y − µY) + (Y − µY)²].                (5.47)
We observe that each of the three terms in the preceding expected value is a function of X and Y. Therefore, Theorem 5.10 implies
Var[X + Y] = E[(X − µX)²] + 2 E[(X − µX)(Y − µY)] + E[(Y − µY)²].                (5.48)
The first and last terms are, respectively, Var[X] and Var[Y ].
Example 5.15 Problem
A company website has three pages. They require 750 kilobytes, 1500
kilobytes, and 2500 kilobytes for transmission. The transmission speed
can be 5 Mb/s for external requests or 10 Mb/s for internal requests.
Requests arrive randomly from inside and outside the company, independently of page length, which is also random. The probability models for transmission speed, R, and page length, L, are:
PR(r) =
  0.4    r = 5,
  0.6    r = 10,
  0      otherwise,

PL(l) =
  0.3    l = 750,
  0.5    l = 1500,
  0.2    l = 2500,
  0      otherwise.                (5.49)
Write an expression for the transmission time g(R, L) seconds. Derive the expected transmission time E[g(R, L)]. Does E[g(R, L)] = g(E[R], E[L])?
Example 5.15 Solution
The transmission time T seconds is the page length in kilobits, 8L, divided by the transmission speed in kilobits per second, 1000R; that is, T = 8L/(1000R). Because R and L are independent, PR,L(r, l) = PR(r)PL(l) and
E[g(R, L)] = Σ_r Σ_l PR(r) PL(l) · 8l/(1000r)
           = (8/1000) ( Σ_r PR(r)/r ) ( Σ_l l PL(l) )
           = (8/1000) (0.4/5 + 0.6/10) (0.3(750) + 0.5(1500) + 0.2(2500))
           = 1.652 s.                (5.50)
By comparison, E[R] = Σ_r r PR(r) = 8 Mb/s and E[L] = Σ_l l PL(l) = 1475 kilobytes. This implies
g(E[R], E[L]) = 8 E[L] / (1000 E[R]) = 1.475 s ≠ E[g(R, L)].                (5.51)
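A short MATLAB sketch (with variable names of our own choosing) reproduces both numbers by summing over the grid of (r, l) pairs:

% Hypothetical check of Example 5.15.
pr = [0.4 0.6];       r = [5 10];            % PMF of R in Mb/s
pl = [0.3 0.5 0.2];   l = [750 1500 2500];   % PMF of L in kilobytes
[R,L]   = ndgrid(r,l);
[PR,PL] = ndgrid(pr,pl);
EgRL = sum(sum(PR.*PL.*(8*L./(1000*R))))     % 1.652 s
gEE  = 8*(pl*l')/(1000*(pr*r'))              % 1.475 s; not equal to EgRL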
Section 5.8
Covariance, Correlation and Independence
Definition 5.5 Covariance
The covariance of two random variables X and Y is
Cov [X, Y ] = E [(X − µX ) (Y − µY )] .
Example 5.16
Suppose we perform an experiment in which we measure X and Y in
centimeters (for example the height of two sisters). However, if we change
units and measure height in meters, we will perform the same experiment
except we observe X̂ = X/100 and Ŷ = Y /100. In this case, X̂ and Ŷ
have expected values µX̂ = µX /100 m, µŶ = µY /100 m and
Cov[X̂, Ŷ] = E[(X̂ − µX̂)(Ŷ − µŶ)] = E[(X − µX)(Y − µY)]/10,000 = (Cov[X, Y]/10,000) m².                (5.53)
Changing the unit of measurement from cm² to m² reduces the covariance by a factor of 10,000. However, the tendency of X − µX and Y − µY
to have the same sign is the same as the tendency of X̂ − µX̂ and Ŷ − µŶ
to have the same sign. (Both are an indication of how likely it is that a
girl is taller than average if her sister is taller than average).
Definition 5.6 Correlation Coefficient
The correlation coefficient of two random variables X and Y is
ρX,Y = Cov[X, Y] / √(Var[X] Var[Y]) = Cov[X, Y] / (σX σY).
Figure 5.5
Scatter plots for (a) ρX,Y = −0.9, (b) ρX,Y = 0, and (c) ρX,Y = 0.9. Each graph has 200 samples, each marked by a dot, of the random variable pair (X, Y) such that E[X] = E[Y] = 0, Var[X] = Var[Y] = 1.
Theorem 5.13
If X̂ = aX + b and Ŷ = cY + d, then
(a) ρX̂,Ŷ = ρX,Y ,
(b) Cov[X̂, Ŷ ] = ac Cov[X, Y ].
Theorem 5.14
−1 ≤ ρX,Y ≤ 1.
Proof: Theorem 5.14
Let σX² and σY² denote the variances of X and Y, and for a constant a, let W = X − aY. Then,
Var[W] = E[(X − aY)²] − (E[X − aY])².                (5.54)
Since E[X − aY] = µX − aµY, expanding the squares yields
Var[W] = E[X² − 2aXY + a²Y²] − (µX² − 2aµXµY + a²µY²)
       = Var[X] − 2a Cov[X, Y] + a² Var[Y].                (5.55)
Since Var[W] ≥ 0 for any a, we have 2a Cov[X, Y] ≤ Var[X] + a² Var[Y].
Choosing a = σX /σY yields Cov[X, Y ] ≤ σY σX , which implies ρX,Y ≤ 1.
Choosing a = −σX /σY yields Cov[X, Y ] ≥ −σY σX , which implies ρX,Y ≥
−1.
Theorem 5.15
If X and Y are random variables such that Y = aX + b,
ρX,Y =
  −1    a < 0,
  0     a = 0,
  1     a > 0.
5.8 Comment: Examples of Correlation
Some examples of positive, negative, and zero correlation coefficients include:
• X is a student’s height. Y is the same student’s weight. 0 < ρX,Y < 1.
• X is the distance of a cellular phone from the nearest base station. Y is the power
of the received signal at the cellular phone. −1 < ρX,Y < 0.
• X is the temperature of a resistor measured in degrees Celsius. Y is the temperature
of the same resistor measured in Kelvins. ρX,Y = 1 .
• X is the gain of an electrical circuit measured in decibels. Y is the attenuation,
measured in decibels, of the same circuit. ρX,Y = −1.
• X is the telephone number of a cellular phone. Y is the Social Security number of
the phone’s owner. ρX,Y = 0.
Definition 5.7 Correlation
The correlation of X and Y is rX,Y = E[XY].
Theorem 5.16
(a) Cov[X, Y ] = rX,Y − µX µY .
(b) Var[X + Y ] = Var[X] + Var[Y ] + 2 Cov[X, Y ].
(c) If X = Y , Cov[X, Y ] = Var[X] = Var[Y ] and rX,Y = E[X 2] = E[Y 2].
Proof: Theorem 5.16
Cross-multiplying inside the expected value of Definition 5.5 yields
Cov [X, Y ] = E [XY − µX Y − µY X + µX µY ] .
(5.56)
Since the expected value of the sum equals the sum of the expected
values,
Cov [X, Y ] = E [XY ] − E [µX Y ] − E [µY X ] + E [µY µX ] .
(5.57)
Note that in the expression E[µY X], µY is a constant. Referring to
Theorem 3.12, we set a = µY and b = 0 to obtain E[µY X] = µY E[X] =
µY µX . The same reasoning demonstrates that E[µX Y ] = µX E[Y ] =
µX µY . Therefore,
Cov [X, Y ] = E [XY ] − µX µY − µY µX + µY µX = rX,Y − µX µY .
(5.58)
The other relationships follow directly from the definitions and Theorem 5.12.
Example 5.17 Problem
For the integrated circuits tests in Example 5.3, we found in Example 5.5
that the probability model for X and Y is given by the following matrix.
PX,Y(x, y)    y = 0    y = 1    y = 2    PX(x)
x = 0         0.01     0        0        0.01
x = 1         0.09     0.09     0        0.18
x = 2         0        0        0.81     0.81
PY(y)         0.10     0.09     0.81
Find rX,Y and Cov[X, Y ].
Example 5.17 Solution
By Definition 5.7,
rX,Y = E[XY] = Σ_{x=0}^{2} Σ_{y=0}^{2} x y PX,Y(x, y)                (5.59)
             = (1)(1)(0.09) + (2)(2)(0.81) = 3.33.                (5.60)
To use Theorem 5.16(a) to find the covariance, we find
E[X] = (1)(0.18) + (2)(0.81) = 1.80,        E[Y] = (1)(0.09) + (2)(0.81) = 1.71.                (5.61)
Therefore, by Theorem 5.16(a), Cov[X, Y ] = 3.33−(1.80)(1.71) = 0.252.
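These sums are easy to reproduce in MATLAB from the joint PMF matrix; a minimal sketch with our own variable names:

% Hypothetical check of Example 5.17.
PXY = [0.01 0 0; 0.09 0.09 0; 0 0 0.81];
[X,Y] = ndgrid(0:2,0:2);
rXY = sum(sum(X.*Y.*PXY))                    % E[XY] = 3.33
EX  = sum(sum(X.*PXY));  EY = sum(sum(Y.*PXY));
covXY = rXY - EX*EY                          % Theorem 5.16(a): 0.252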
Definition 5.8 Orthogonal Random Variables
Random variables X and Y are orthogonal if rX,Y = 0.
Definition 5.9 Uncorrelated Random Variables
Random variables X and Y are uncorrelated if Cov[X, Y ] = 0.
Theorem 5.17
For independent random variables X and Y ,
(a) E[g(X)h(Y )] = E[g(X)] E[h(Y )],
(b) rX,Y = E[XY ] = E[X] E[Y ],
(c) Cov[X, Y ] = ρX,Y = 0,
(d) Var[X + Y] = Var[X] + Var[Y].
Proof: Theorem 5.17
We present the proof for discrete random variables. By replacing PMFs and sums
with PDFs and integrals we arrive at essentially the same proof for continuous random
variables. Since PX,Y (x, y) = PX(x)PY (y),
E[g(X)h(Y)] = Σ_{x∈SX} Σ_{y∈SY} g(x)h(y) PX(x) PY(y)
            = ( Σ_{x∈SX} g(x)PX(x) ) ( Σ_{y∈SY} h(y)PY(y) ) = E[g(X)] E[h(Y)].                (5.62)
If g(X) = X, and h(Y ) = Y , this equation implies rX,Y = E[XY ] = E[X] E[Y ]. This
equation and Theorem 5.16(a) imply Cov[X, Y ] = 0. As a result, Theorem 5.16(b)
implies Var[X + Y ] = Var[X] + Var[Y ]. Furthermore, ρX,Y = Cov[X, Y ]/(σX σY ) = 0.
Example 5.18 Problem
For the noisy observation Y = X +Z of Example 5.1, find the covariances
Cov[X, Z] and Cov[X, Y ] and the correlation coefficients ρX,Z and ρX,Y .
Example 5.18 Solution
We recall from Example 5.1 that the signal X is Gaussian (0, σX ), that
the noise Z is Gaussian (0, σZ ), and that X and Z are independent. We
know from Theorem 5.17(c) that independence of X and Z implies
Cov [X, Z ] = ρX,Z = 0.
(5.63)
In addition, by Theorem 5.17(d),
Var[Y] = Var[X] + Var[Z] = σX² + σZ².                (5.64)
Since E[X] = E[Z] = 0, Theorem 5.11 tells us that E[Y] = E[X] + E[Z] = 0, and Theorem 5.17(b) says that E[XZ] = E[X] E[Z] = 0. This permits us to write
Cov[X, Y] = E[XY] = E[X(X + Z)] = E[X² + XZ] = E[X²] + E[XZ] = E[X²] = σX².                (5.65)
This implies
ρX,Y = Cov[X, Y] / √(Var[X] Var[Y]) = σX² / √(σX²(σX² + σZ²)) = √( (σX²/σZ²) / (1 + σX²/σZ²) ).                (5.66)
Quiz 5.8(A)
Random variables L and T have joint PMF
PL,T(l, t)      t = 40 sec    t = 60 sec
l = 1 page      0.15          0.1
l = 2 pages     0.30          0.2
l = 3 pages     0.15          0.1
Find the following quantities.
(a) E[L] and Var[L]
(b) E[T ] and Var[T ]
(c) The covariance Cov[L, T ]
(d) The correlation coefficient ρL,T
Quiz 5.8(A) Solution
It is helpful to first make a table that includes the marginal PMFs.
PL,T(l, t)    t = 40    t = 60    PL(l)
l = 1         0.15      0.1       0.25
l = 2         0.3       0.2       0.5
l = 3         0.15      0.1       0.25
PT(t)         0.6       0.4
(a) The expected value of L is
E[L] = 1(0.25) + 2(0.5) + 3(0.25) = 2.                (1)
Since the second moment of L is
E[L²] = 1²(0.25) + 2²(0.5) + 3²(0.25) = 4.5,                (2)
the variance of L is
Var[L] = E[L²] − (E[L])² = 0.5.                (3)
(b) The expected value of T is
E[T] = 40(0.6) + 60(0.4) = 48.                (4)
The second moment of T is
E[T²] = 40²(0.6) + 60²(0.4) = 2400.                (5)
[Continued]
Quiz 5.8(A) Solution
(Continued 2)
Thus
Var[T] = E[T²] − (E[T])² = 96.                (6)
(c) First we need to find
E[LT] = Σ_{t=40,60} Σ_{l=1}^{3} l t PL,T(l, t)
      = 1(40)(0.15) + 2(40)(0.3) + 3(40)(0.15) + 1(60)(0.1) + 2(60)(0.2) + 3(60)(0.1) = 96.                (7)
The covariance of L and T is
Cov[L, T] = E[LT] − E[L] E[T] = 96 − 2(48) = 0.                (8)
(d) Since Cov[L, T] = 0, the correlation coefficient is ρL,T = 0.
Quiz 5.8(B)
The joint probability density function of random variables X and Y is
fX,Y(x, y) =
  xy    0 ≤ x ≤ 1, 0 ≤ y ≤ 2,
  0     otherwise.                (5.67)
Find the following quantities.
(a) E[X] and Var[X]
(b) E[Y] and Var[Y]
(c) The covariance Cov[X, Y]
(d) The correlation coefficient ρX,Y
Quiz 5.8(B) Solution
As in the discrete case, the calculations become easier if we first calculate the marginal
PDFs fX(x) and fY (y). For 0 ≤ x ≤ 1,
fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy = ∫_0^2 xy dy = (1/2)xy² |_{y=0}^{y=2} = 2x.                (1)
Similarly, for 0 ≤ y ≤ 2,
fY(y) = ∫_{−∞}^{∞} fX,Y(x, y) dx = ∫_0^1 xy dx = (1/2)x²y |_{x=0}^{x=1} = y/2.                (2)
The complete expressions for the marginal PDFs are
fX(x) =
  2x    0 ≤ x ≤ 1,
  0     otherwise,

fY(y) =
  y/2    0 ≤ y ≤ 2,
  0      otherwise.                (3)
From the marginal PDFs, it is straightforward to calculate the various expectations.
(a) The first and second moments of X are
E[X] = ∫_{−∞}^{∞} x fX(x) dx = ∫_0^1 2x² dx = 2/3,                (4)
E[X²] = ∫_{−∞}^{∞} x² fX(x) dx = ∫_0^1 2x³ dx = 1/2.                (5)
[Continued]
Quiz 5.8(B) Solution
(Continued 2)
The variance of X is
Var[X] = E[X²] − (E[X])² = 1/2 − 4/9 = 1/18.
(b) The first and second moments of Y are
E[Y] = ∫_{−∞}^{∞} y fY(y) dy = ∫_0^2 (1/2)y² dy = 4/3,                (6)
E[Y²] = ∫_{−∞}^{∞} y² fY(y) dy = ∫_0^2 (1/2)y³ dy = 2.                (7)
The variance of Y is
Var[Y] = E[Y²] − (E[Y])² = 2 − 16/9 = 2/9.                (8)
(c) We start by finding
E[XY] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y fX,Y(x, y) dx dy = ∫_0^1 ∫_0^2 x² y² dy dx = ( x³/3 |_0^1 )( y³/3 |_0^2 ) = 8/9.                (9)
The covariance of X and Y is then
Cov[X, Y] = E[XY] − E[X] E[Y] = 8/9 − (2/3)(4/3) = 0.                (10)
(d) Since Cov[X, Y] = 0, the correlation coefficient is ρX,Y = 0.
Section 5.9
Bivariate Gaussian Random Variables
Definition 5.10 Bivariate Gaussian Random Variables
Random variables X and Y have a bivariate Gaussian PDF with parameters µX, µY, σX > 0, σY > 0, and ρX,Y satisfying −1 < ρX,Y < 1 if
fX,Y(x, y) = ( 1 / (2πσXσY√(1 − ρX,Y²)) ) exp( −[ ((x − µX)/σX)² − 2ρX,Y(x − µX)(y − µY)/(σXσY) + ((y − µY)/σY)² ] / (2(1 − ρX,Y²)) ).
Figure 5.6
The joint Gaussian PDF fX,Y(x, y) for µX = µY = 0, σX = σY = 1, and three values of ρX,Y = ρ: −0.9, 0, and 0.9. Next to each PDF, we plot 200 sample pairs (X, Y) generated with that PDF.
Theorem 5.18
If X and Y are the bivariate Gaussian random variables in Definition 5.10,
X is the Gaussian (µX , σX ) random variable and Y is the Gaussian (µY , σY )
random variable:
fX(x) = (1/(σX√(2π))) e^{−(x−µX)²/2σX²},        fY(y) = (1/(σY√(2π))) e^{−(y−µY)²/2σY²}.
Proof: Theorem 5.18
Integrating fX,Y (x, y) in Equation (5.69) over all y, we have
fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy
      = (1/(σX√(2π))) e^{−(x−µX)²/2σX²} ∫_{−∞}^{∞} (1/(σ̃Y√(2π))) e^{−(y−µ̃Y(x))²/2σ̃Y²} dy.                (5.70)
The remaining integral equals 1 because it is the integral of a
Gaussian PDF. The remainder of the formula is the PDF of the Gaussian
(µX , σX ) random variable. The same reasoning with the roles of X and
Y reversed leads to the formula for fY (y).
Theorem 5.19
Bivariate Gaussian random variables X and Y in Definition 5.10 have
correlation coefficient ρX,Y .
Theorem 5.20
Bivariate Gaussian random variables X and Y are uncorrelated if and only
if they are independent.
Theorem 5.21
If X and Y are bivariate Gaussian random variables with PDF given by
Definition 5.10, and W1 and W2 are given by the linearly independent
equations
W1 = a1X + b1Y,
W2 = a2X + b2Y,
then W1 and W2 are bivariate Gaussian random variables such that
E[Wi] = ai µX + bi µY,        i = 1, 2,
Var[Wi] = ai² σX² + bi² σY² + 2 ai bi ρX,Y σX σY,        i = 1, 2,
Cov[W1, W2] = a1 a2 σX² + b1 b2 σY² + (a1 b2 + a2 b1) ρX,Y σX σY.
Example 5.19 Problem
For the noisy observation in Example 5.14, find the PDF of Y = X + Z.
Example 5.19 Solution
Since X is Gaussian (0, σX ) and Z is Gaussian (0, σZ ) and X and Z are
independent, X and Z are jointly Gaussian. It follows from Theorem 5.21
that Y is Gaussian with E[Y] = E[X] + E[Z] = 0 and variance σY² = σX² + σZ². The PDF of Y is
fY(y) = (1/√(2π(σX² + σZ²))) e^{−y²/2(σX² + σZ²)}.                (5.71)
Example 5.20 Problem
Continuing Example 5.19, find the joint PDF of X and Y when σX = 4
and σZ = 3.
Example 5.20 Solution
From Theorem 5.21, we know that X and Y are bivariate Gaussian. We also know that µX = µY = 0 and that Y has variance σY² = σX² + σZ² = 25. Substituting σX = 4 and σZ = 3 in the formula for the correlation coefficient derived in Example 5.18, we have
ρX,Y = √( (σX²/σZ²) / (1 + σX²/σZ²) ) = 4/5.                (5.72)
Applying these parameters to Definition 5.10, we obtain
fX,Y(x, y) = (1/(24π)) e^{−(25x²/16 − 2xy + y²)/18}.                (5.73)
Quiz 5.9
Let X and Y be jointly Gaussian (0, 1) random variables with correlation
coefficient 1/2. What is the joint PDF of X and Y ?
Quiz 5.9 Solution
This problem just requires identifying the various parameters in Definition 5.10. Specifically, from the problem statement, we know ρ = 1/2
and
µX = 0,
µY = 0,
σX = 1,
σY = 1.
Applying these facts to Definition 5.10, we have
fX,Y(x, y) = e^{−2(x² − xy + y²)/3} / (√3 π).                (1)
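To confirm the algebra (our own check, not part of the quiz), the general formula of Definition 5.10 and the simplified expression above can be compared numerically at a few arbitrary points:

% Hypothetical check of Quiz 5.9.
rho = 1/2;
f1 = @(x,y) exp(-(x.^2 - 2*rho*x.*y + y.^2)/(2*(1-rho^2)))/(2*pi*sqrt(1-rho^2));
f2 = @(x,y) exp(-2*(x.^2 - x.*y + y.^2)/3)/(sqrt(3)*pi);
x = [-1 0 0.5 2]; y = [0.3 -1 1 0.5];
max(abs(f1(x,y)-f2(x,y)))                    % agrees to within rounding error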
Section 5.10
Multivariate Probability Models
Definition 5.11 Multivariate Joint CDF
The joint CDF of X1, . . . , Xn is
FX1,...,Xn (x1, . . . , xn) = P [X1 ≤ x1, . . . , Xn ≤ xn] .
Definition 5.12 Multivariate Joint PMF
The joint PMF of the discrete random variables X1, . . . , Xn is
PX1,...,Xn (x1, . . . , xn) = P [X1 = x1, . . . , Xn = xn] .
Definition 5.13 Multivariate Joint PDF
The joint PDF of the continuous random variables X1, . . . , Xn is the function
fX1,...,Xn(x1, . . . , xn) = ∂ⁿFX1,...,Xn(x1, . . . , xn) / (∂x1 · · · ∂xn).
Theorem 5.22
If X1, . . . , Xn are discrete random variables with joint PMF
PX1,...,Xn(x1, . . . , xn),
(a) PX1,...,Xn(x1, . . . , xn) ≥ 0,
(b) Σ_{x1∈SX1} · · · Σ_{xn∈SXn} PX1,...,Xn(x1, . . . , xn) = 1.
Theorem 5.23
If X1, . . . , Xn have joint PDF
fX1,...,Xn(x1, . . . , xn),
(a) fX1,...,Xn(x1, . . . , xn) ≥ 0,
(b) FX1,...,Xn(x1, . . . , xn) = ∫_{−∞}^{x1} · · · ∫_{−∞}^{xn} fX1,...,Xn(u1, . . . , un) du1 · · · dun,
(c) ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} fX1,...,Xn(x1, . . . , xn) dx1 · · · dxn = 1.
Theorem 5.24
The probability of an event A expressed in terms of the random variables
X1, . . . , Xn is
Discrete:        P[A] = Σ_{(x1,...,xn)∈A} PX1,...,Xn(x1, . . . , xn),
Continuous:      P[A] = ∫ · · · ∫_A fX1,...,Xn(x1, . . . , xn) dx1 dx2 · · · dxn.
Example 5.21 Problem
Consider a set of n independent trials in which there are r possible outcomes s1, . . . , sr for each trial. In each trial, P[si] = pi. Let Ni equal the
number of times that outcome si occurs over n trials. What is the joint
PMF of N1, . . . , Nr ?
Example 5.21 Solution
The solution to this problem appears in Theorem 2.9 and is repeated
here:
PN1,...,Nr(n1, . . . , nr) = ( n choose n1, . . . , nr ) p1^{n1} p2^{n2} · · · pr^{nr}.                (5.74)
Theorem 5.25
For a joint PMF PW,X,Y,Z(w, x, y, z) of discrete random variables W, X, Y, Z,
some marginal PMFs are
PX,Y,Z(x, y, z) = Σ_{w∈SW} PW,X,Y,Z(w, x, y, z),
PW,Z(w, z) = Σ_{x∈SX} Σ_{y∈SY} PW,X,Y,Z(w, x, y, z).
Theorem 5.26
For a joint PDF fW,X,Y,Z(w, x, y, z) of continuous random variables W, X, Y, Z,
some marginal PDFs are
fW,X,Y(w, x, y) = ∫_{−∞}^{∞} fW,X,Y,Z(w, x, y, z) dz,
fX(x) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} ∫_{−∞}^{∞} fW,X,Y,Z(w, x, y, z) dw dy dz.
Example 5.22 Problem
As in Quiz 5.10, the random variables Y1, . . . , Y4 have the joint PDF
fY1,...,Y4(y1, . . . , y4) =
  4    0 ≤ y1 ≤ y2 ≤ 1, 0 ≤ y3 ≤ y4 ≤ 1,
  0    otherwise.                (5.75)
Find the marginal PDFs fY1,Y4(y1, y4), fY2,Y3(y2, y3), and fY3(y3).
Example 5.22 Solution
fY1,Y4(y1, y4) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} fY1,...,Y4(y1, . . . , y4) dy2 dy3.                (5.76)
In the foregoing integral, the hard part is identifying the correct limits. These limits will depend on y1 and y4. For 0 ≤ y1 ≤ 1 and 0 ≤ y4 ≤ 1,
fY1,Y4(y1, y4) = ∫_{y1}^{1} ∫_0^{y4} 4 dy3 dy2 = 4(1 − y1)y4.                (5.77)
The complete expression for fY1,Y4(y1, y4) is
fY1,Y4(y1, y4) =
  4(1 − y1)y4    0 ≤ y1 ≤ 1, 0 ≤ y4 ≤ 1,
  0              otherwise.                (5.78)
Similarly, for 0 ≤ y2 ≤ 1 and 0 ≤ y3 ≤ 1,
fY2,Y3(y2, y3) = ∫_0^{y2} ∫_{y3}^{1} 4 dy4 dy1 = 4y2(1 − y3).                (5.79)
[Continued]
Example 5.22 Solution
(Continued 2)
The complete expression for fY2,Y3(y2, y3) is
fY2,Y3(y2, y3) =
  4y2(1 − y3)    0 ≤ y2 ≤ 1, 0 ≤ y3 ≤ 1,
  0              otherwise.                (5.80)
Lastly, for 0 ≤ y3 ≤ 1,
fY3(y3) = ∫_{−∞}^{∞} fY2,Y3(y2, y3) dy2 = ∫_0^1 4y2(1 − y3) dy2 = 2(1 − y3).                (5.81)
The complete expression is
fY3(y3) =
  2(1 − y3)    0 ≤ y3 ≤ 1,
  0            otherwise.                (5.82)
Definition 5.14 N Independent Random Variables
Random variables X1, . . . , Xn are independent if for all x1, . . . , xn,
Discrete:        PX1,...,Xn(x1, . . . , xn) = PX1(x1) PX2(x2) · · · PXn(xn),
Continuous:      fX1,...,Xn(x1, . . . , xn) = fX1(x1) fX2(x2) · · · fXn(xn).
Definition 5.15 Independent and Identically Distributed (iid)
X1, . . . , Xn are independent and identically distributed (iid) if
Discrete:
PX1,...,Xn(x1, . . . , xn) = PX(x1)PX(x2) · · · PX(xn)
Continuous: fX1,...,Xn(x1, . . . , xn) = fX(x1)fX(x2) · · · fX(xn).
Example 5.23 Problem
The random variables X1, . . . , Xn have the joint PDF
fX1,...,Xn(x1, . . . , xn) =
  1    0 ≤ xi ≤ 1, i = 1, . . . , n,
  0    otherwise.                (5.83)
Let A denote the event that maxi Xi ≤ 1/2. Find P[A].
Example 5.23 Solution
We can solve this problem by applying Theorem 5.24:
P[A] = P[maxi Xi ≤ 1/2] = P[X1 ≤ 1/2, . . . , Xn ≤ 1/2]
     = ∫_0^{1/2} · · · ∫_0^{1/2} 1 dx1 · · · dxn = 1/2ⁿ.                (5.84)
As n grows, the probability that the maximum is less than 1/2 rapidly goes to 0.
We note that inspection of the joint PDF reveals that X1, . . . , Xn are iid continuous uniform (0, 1) random variables. The integration in Equation (5.84) is easy because independence implies
P[A] = P[X1 ≤ 1/2, . . . , Xn ≤ 1/2] = P[X1 ≤ 1/2] × · · · × P[Xn ≤ 1/2] = (1/2)ⁿ.                (5.85)
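A Monte Carlo sketch (our own, with an arbitrary choice of n and sample size) gives a quick empirical check of the (1/2)ⁿ result:

% Hypothetical check of Example 5.23 for n = 4.
n = 4; m = 100000;
X = rand(m,n);                               % m samples of n iid uniform (0,1) RVs
phat = mean(max(X,[],2) <= 1/2)              % close to (1/2)^4 = 0.0625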
Quiz 5.10
The random variables Y1, . . . , Y4 have the joint PDF
fY1,...,Y4(y1, . . . , y4) =
  4    0 ≤ y1 ≤ y2 ≤ 1, 0 ≤ y3 ≤ y4 ≤ 1,
  0    otherwise.                (5.86)
Let C denote the event that maxi Yi ≤ 1/2. Find P[C].
Quiz 5.10 Solution
We find P[C] by integrating the joint PDF over the region of interest.
Specifically,
P[C] = ∫_0^{1/2} dy2 ∫_0^{y2} dy1 ∫_0^{1/2} dy4 ∫_0^{y4} 4 dy3
     = 4 ( ∫_0^{1/2} y2 dy2 ) ( ∫_0^{1/2} y4 dy4 )
     = 4 ( (1/2)y2² |_0^{1/2} ) ( (1/2)y4² |_0^{1/2} ) = 4 (1/8)(1/8) = 1/16.                (1)
Section 5.11
Matlab
Sample Space Grids
• We start with the case when X and Y are finite random variables with ranges
SX = {x1, . . . , xn},        SY = {y1, . . . , ym}.                (5.87)
In this case, we can take advantage of Matlab techniques for surface plots of g(x, y)
over the x, y plane.
• In Matlab, we represent SX and SY by the n-element vector sx and the m-element vector sy.
• The function [SX,SY]=ndgrid(sx,sy) produces the pair of n × m matrices
SX =
  x1  · · ·  x1
  .          .
  xn  · · ·  xn
SY =
  y1  · · ·  ym
  .          .
  y1  · · ·  ym                (5.88)
We refer to matrices SX and SY as a sample space grid because they are a grid
representation of the joint sample space
SX,Y = {(x, y)|x ∈ SX , y ∈ SY } .
That is, [SX(i,j) SY(i,j)] is the pair (xi , yj ).
(5.89)
Probabilities on Grids
• To complete the probability model, for X and Y , in Matlab, we employ the n × m
matrix PXY such that PXY(i,j) = PX,Y (xi , yj ).
• To make sure that probabilities have been generated properly, we note that
[SX(:) SY(:) PXY(:)]
is a matrix whose rows list all possible pairs xi , yj and corresponding probabilities
PX,Y (xi , yj ).
• Given a function g(x, y) that operates on the elements of vectors x and y, the
advantage of this grid approach is that the Matlab function g(SX,SY) will calculate
g(x, y) for each x ∈ SX and y ∈ SY .
• In particular, g(SX,SY) produces an n × m matrix with i, jth element g(xi , yj ).
Example 5.24 Problem
An Internet photo developer website prints compressed photo images.
Each image file contains a variable-sized image of X × Y pixels described
by the joint PMF
PX,Y(x, y)    y = 400    y = 800    y = 1200
x = 800       0.2        0.05       0.1
x = 1200      0.05       0.2        0.1
x = 1600      0          0.1        0.2                (5.90)
For random variables X, Y , write a script imagepmf.m that defines the
sample space grid matrices SX, SY, and PXY.
Example 5.24 Solution
In the script imagepmf.m, the matrix SX has [800 1200 1600]′ for each column and SY has [400 800 1200] for each row. After running imagepmf.m, we can inspect the variables:
%imagepmf.m
PXY=[0.2 0.05 0.1; ...
     0.05 0.2 0.1; ...
     0    0.1 0.2];
[SX,SY]=ndgrid([800 1200 1600],...
               [400 800 1200]);

>> imagepmf; SX
SX =
         800         800         800
        1200        1200        1200
        1600        1600        1600
>> SY
SY =
         400         800        1200
         400         800        1200
         400         800        1200
Example 5.25 Problem
At 24 bits (3 bytes) per pixel, a 10:1 image compression factor yields
image files with B = 0.3XY bytes. Find the expected value E[B] and the
PMF PB(b).
Example 5.25 Solution
The script imagesize.m produces the expected
value as eb, and produces the PMF, which is
represented by the vectors sb and pb. The 3 × 3
matrix SB has i, jth element g(xi, yj ) = 0.3xiyj .
The calculation of eb is simply a Matlab implementation of Theorem 5.9. Since some elements of SB are identical,
sb=unique(SB) extracts the unique elements. Although SB and PXY are
both 3 × 3 matrices, each is stored internally by Matlab as a 9-element
vector. Hence, we can pass SB and PXY to the finitepmf() function,
which was designed to handle a finite random variable described by a
pair of column vectors. Figure 5.7 shows one result of running the program imagesize. The vectors sb and pb comprise PB(b). For example,
PB(288000) = 0.3.
%imagesize.m
imagepmf;
SB=0.3*(SX.*SY);
eb=sum(sum(SB.*PXY))
sb=unique(SB)’
pb=finitepmf(SB,PXY,sb)’
Figure 5.7
>> imagesize
eb =
      319200
sb =
       96000      144000      192000      288000      384000      432000      576000
pb =
      0.2000      0.0500      0.0500      0.3000      0.1000      0.1000      0.2000
Output resulting from imagesize.m in Example 5.25.
Example 5.26 Problem
Write a function xy=imagerv(m) that generates m sample pairs of the
image size random variables X, Y of Example 5.25.
Example 5.26 Solution
The function imagerv uses the imagepmf.m script to define the matrices SX, SY, and PXY. It then calls the finiterv.m function. Here is the code
imagerv.m and a sample run:
function xy = imagerv(m);
imagepmf;
S=[SX(:) SY(:)];
xy=finiterv(S,PXY(:),m);
>> xy=imagerv(3)
xy =
         800         400
        1200         800
        1600         800
Example 5.27 Problem
Given a list xy of sample pairs of random variables X, Y with Matlab
range grids SX and SY, write a Matlab function
fxy=freqxy(xy,SX,SY)
that calculates the relative frequency of every pair x, y. The output fxy
should correspond to the matrix [SX(:) SY(:) PXY(:)].
Example 5.27 Solution
The matrix [SX(:) SY(:)] in freqxy has
rows that list all possible pairs x, y. We
append this matrix to xy to ensure that the
new xy has every possible pair x, y. Next,
the unique function copies all unique rows
of xy to the matrix U and also provides
the vector J that indexes the rows of xy in U; that is, xy=U(J). In
addition, the number of occurrences of j in J indicates the number of
occurrences in xy of row j in U. Thus we use the hist function on
J to calculate the relative frequencies. We include the correction factor
-1 because we had appended [SX(:) SY(:)] to xy at the start. Lastly,
we reorder the rows of fxy because the output of unique produces the
rows of U in a different order from [SX(:) SY(:) PXY(:)].
function fxy = freqxy(xy,SX,SY)
xy=[xy; SX(:) SY(:)];
[U,I,J]=unique(xy,’rows’);
N=hist(J,1:max(J))-1;
N=N/sum(N);
fxy=[U N(:)];
fxy=sortrows(fxy,[2 1 3]);
Example 5.28 Problem
Generate m = 10, 000 samples of random variables X, Y of Example 5.25.
Calculate the relative frequencies and use stem3 to graph them.
Example 5.28 Solution
The script imagestem.m generates the following relative frequency stem
plot.
%imagestem.m
imagepmf;
xy=imagerv(10000);
fxy=freqxy(xy,SX,SY);
stem3(fxy(:,1),...
fxy(:,2),fxy(:,3));
xlabel(’\it x’);
ylabel(’\it y’);
[The resulting stem plot shows the relative frequency of each pair (x, y) over the grid x ∈ {800, 1200, 1600}, y ∈ {400, 800, 1200}.]