Chapter 6: Multivariate Probability Distributions
6.1 Bivariate and Marginal Probability Distributions
Chapters 4 and 5 dealt with experiments that produced a single numerical response, or
random variable, of interest. We discussed, for example, the life length X of a battery or
the strength Y of a steel casing. Often, however, we want to study the joint behavior of
two random variables, such as the joint behavior of life length and casing strength for
these batteries. Perhaps in such a study we can identify a region in which some
combination of life length and casing strength is optimal in balancing the cost of
manufacturing with customer satisfaction. To proceed with such a study, we must know
how to handle a joint probability distribution. When only two random variables are
involved, joint distributions are called bivariate distributions. We will discuss the
bivariate case in some detail; extensions to more than two variables follow along similar
lines.
Other situations in which bivariate probability distributions are important come to
mind easily. A physician studies the joint behavior of pulse and exercise. An educator
studies the joint behavior of grades and time devoted to study, or the interrelationship of
pretest and posttest scores. An economist studies the joint behavior of business volume
and profits. In fact, most real problems we come across have more than one underlying
random variable of interest. On April 15, 1912, the ocean liner Titanic collided with an
iceberg and sank. Of the 2201 people on the Titanic, 1490 perished. The question as to
whether economic status was related to survival of this disaster has been discussed
extensively. In Table 6.1, the numbers of survivors and fatalities for four economic
classes from this incident are recorded: first-class passengers, second-class passengers,
third-class passengers, and the crew.
Table 6.1. Survival and Economic Class of Passengers on the Titanic.

Economic Status   Survivors   Fatalities   Total
First-class            203          122      325
Second-class           118          167      285
Third-class            178          528      706
Crew                   212          673      885
Total                  711         1490     2201
As noted in earlier chapters, it is more convenient to analyze such data in terms of
meaningful random variables rather than in terms of described categories. For each
passenger, we want to know whether or not s/he survived and what economic class s/he
was in. An X1 defined as
⎧0,
if passenger survived
⎪
X1 = ⎨
if passenger did not survive
⎪⎩1,
will keep track the number of fatalities. The other variable of interest is economic status.
An X2 defined as

X2 = 1, if the passenger was in first class
     2, if the passenger was in second class
     3, if the passenger was in third class
     4, if the passenger was a crew member

provides a random variable that keeps track of each passenger's economic class.
The frequencies from Table 6.1 are turned into the relative frequencies of Table 6.2 to produce the joint probability distribution of X1 and X2.

Table 6.2. Joint Probability Function of Survival and Economic Class of Passengers on the Titanic.

                X1
X2           0      1    Total
1          0.09   0.06   0.15
2          0.05   0.08   0.13
3          0.08   0.24   0.32
4          0.10   0.30   0.40
Total      0.32   0.68   1.00
In general, we write

P(X1 = x1, X2 = x2) = p(x1, x2)

and call p(x1, x2) the joint probability function of (X1, X2). From Table 6.2, we can see that

P(X1 = 0, X2 = 4) = p(0, 4) = 0.10

represents the approximate probability that a randomly selected passenger is a crew member who survived. The properties of the probability function for jointly distributed random variables are similar to those for univariate probability functions, as summarized in Definition 6.1.
Similarly, a distribution function is associated with every probability function. The extension from one to two or more dimensions is an intuitive one. For two dimensions, the distribution function is defined as

F(x1, x2) = P(X1 ≤ x1, X2 ≤ x2),   (x1, x2) ∈ ℝ²

As with the univariate case, every distribution function must satisfy four conditions:

1. lim_{x1 → −∞} lim_{x2 → −∞} F(x1, x2) = 0
2. lim_{x1 → ∞} lim_{x2 → ∞} F(x1, x2) = 1
3. The distribution function is a non-decreasing function; that is, for each a < b and c < d,
   F(b, d) − F(a, d) − F(b, c) + F(a, c) ≥ 0.
4. For fixed x1 or x2, the distribution function is right-hand continuous in the remaining variable. For example, for x1 fixed, lim_{h → 0+} F(x1, x2 + h) = F(x1, x2).

It is important to check each condition, as the following example illustrates.
Definition 6.1. Let X1 and X2 be discrete random variables. The joint probability distribution of X1 and X2 is given by

p(x1, x2) = P(X1 = x1, X2 = x2)

defined for all real numbers x1 and x2. The function p(x1, x2) is called the joint probability function of X1 and X2. All joint probability functions must satisfy two properties:

1. p(x1, x2) ≥ 0, for all x1, x2 ∈ ℝ
2. Σ_{x1} Σ_{x2} p(x1, x2) = 1

Further, any function that satisfies the two properties above is a probability function.
Example 6.1:
Let

F(x, y) = 0, if x + y < 0
          1, if x + y ≥ 0

Is F a distribution function? Justify your answer.
Solution:
Here we use X and Y instead of X1 and X2. These symbols serve as names and do not affect how we proceed.

The first condition is satisfied because lim_{x → −∞} F(x, y) = lim_{y → −∞} F(x, y) = 0. Similarly, lim_{x → ∞} F(x, y) = lim_{y → ∞} F(x, y) = 1, so the second condition is satisfied.

The fourth condition is also satisfied. This is verified by focusing on the points of discontinuity of the function, which occur along the line x + y = 0. Because the function is right-hand continuous at every point on this line, it is right-hand continuous.

The third condition, however, is not satisfied. To see this, notice that

P(−1 < X ≤ 2, −1 < Y ≤ 2) = F(2, 2) − F(−1, 2) − F(2, −1) + F(−1, −1)
                           = 1 − 1 − 1 + 0 = −1

Therefore, this function is not a distribution function.
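A quick numerical check of the failed rectangle condition makes the argument concrete. A minimal Python sketch (illustrative only):

    # Rectangle condition check for Example 6.1.
    def F(x, y):
        # The candidate "distribution function" of Example 6.1.
        return 1.0 if x + y >= 0 else 0.0

    a, b, c, d = -1, 2, -1, 2   # rectangle (a, b] x (c, d]
    print(F(b, d) - F(a, d) - F(b, c) + F(a, c))  # -1.0, a negative "probability"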
Definition 6.2. The joint distribution function F(a, b) for a bivariate random variable (X1, X2) is

F(a, b) = P(X1 ≤ a, X2 ≤ b)

If X1 and X2 are discrete,

F(a, b) = Σ_{x1 ≤ a} Σ_{x2 ≤ b} p(x1, x2)

where p(x1, x2) is the joint probability function.

The distribution function is often called the cumulative distribution function (c.d.f.).
Now we will return to the data for the Titanic. The probability that a randomly selected
passenger will be a crew member is
P(X2 = 4) = P(X1 = 0, X2 = 4) + P(X1 = 1, X2 = 4) = 0.10 + 0.30 = 0.40

which is one of the marginal probabilities for X2. The univariate distribution along the right margin is the marginal distribution for X2, and the one along the bottom row is the marginal distribution for X1. The conditional probability distribution of X1, given X2, fixes a value of X2 (a row of Table 6.2) and looks at the relative frequencies for values of X1 in that row. For example, conditioning on X2 = 1 produces

P(X1 = 0 | X2 = 1) = P(X1 = 0, X2 = 1) / P(X2 = 1) = 0.09 / 0.15 = 0.60

and

P(X1 = 1 | X2 = 1) = 0.40
These two values show how survivorship behaved for passengers in first class.
Notice that the conditional probabilities for third-class passengers were
P( X 1 = 0 | X 2 = 3) = 0.25 and P( X 1 = 1 | X 2 = 3) = 0.75 . Did the probability of survival
change with class? What were the chances of survival for the second-class passengers
and the crew?
Another set of conditional distributions here consists of those for X2 conditioning
on X1. In this case,
P(X2 = 1 | X1 = 0) = P(X1 = 0, X2 = 1) / P(X1 = 0) = 0.09 / 0.32 = 0.28

and, similarly,

P(X2 = 2 | X1 = 0) = 0.05 / 0.32 = 0.16

P(X2 = 3 | X1 = 0) = 0.08 / 0.32 = 0.25

P(X2 = 4 | X1 = 0) = 0.10 / 0.32 = 0.31
Notice that the probability that a randomly selected survivor is a crew member is 0.31.
Does that mean that they had the highest rate of survival? Not necessarily. There were
more crew members than any other economic class. If the survival was the same for all
classes, then we would expect more of the survivors to be crew members. Was the
probability of a crew member surviving larger than that for other economic classes?
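These row and column operations are easy to mechanize. The following Python sketch (values taken from Table 6.2, so results are approximate to the table's rounding) reproduces the marginal and conditional probabilities computed above:

    # Marginals and conditionals from the joint table (Table 6.2).
    joint = {  # keys (x1, x2): x1 = 0 survived, 1 died; x2 = economic class 1-4
        (0, 1): 0.09, (1, 1): 0.06, (0, 2): 0.05, (1, 2): 0.08,
        (0, 3): 0.08, (1, 3): 0.24, (0, 4): 0.10, (1, 4): 0.30,
    }

    def marginal_x2(x2):
        return sum(p for (v1, v2), p in joint.items() if v2 == x2)

    def cond_x1_given_x2(x1, x2):
        return joint[(x1, x2)] / marginal_x2(x2)

    print(round(marginal_x2(4), 2))          # 0.40, the crew marginal
    print(round(cond_x1_given_x2(0, 1), 2))  # 0.60, P(survived | first class)
    print(round(cond_x1_given_x2(0, 3), 2))  # 0.25, P(survived | third class)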
We first worked with the data from the Titanic in Exercises 3.38 and 3.54. There
we defined events and found the probabilities, odds, and odds ratios for particular
outcomes. Here we have used random variables, but notice that the basic processes are
the same. Keep in mind throughout this chapter that what we will be doing here is not so
much new as a different approach to the analysis.
Definition 6.3. The marginal probability functions of X1 and X2, respectively, are given by

p(x1) = Σ_{x2} p(x1, x2)   and   p(x2) = Σ_{x1} p(x1, x2)

The conditional distribution of X1 given X2 = x2 is

P(X1 = x1 | X2 = x2) = P(X1 = x1, X2 = x2) / P(X2 = x2),   provided P(X2 = x2) > 0

which is defined for all real values of x1. Similarly, for all real values of x2, the conditional distribution of X2 given X1 = x1 is

P(X2 = x2 | X1 = x1) = P(X1 = x1, X2 = x2) / P(X1 = x1),   provided P(X1 = x1) > 0

Further, X1 and X2 are independent if the marginal and conditional distributions are equal.
Example 6.2:
Three checkout counters are in operation at a local supermarket. Two customers arrive at
the counters at different times, when the counters are serving no other customers. It is
assumed that the customers then choose a checkout station at random and independently of one another. Let X1 denote the number of times counter A is selected, and let X2 denote the number of times counter B is selected by the two customers. Find the joint
probability distribution of X1 and X2. Find the probability that one of the customers visits
counter B, given that one of the customers is known to have visited counter A.
Solution:
For convenience, let us introduce X3—the number of customers who visit counter C.
Now the event (X1 = 0, X2 = 0) is equivalent to the event (X1 = 0, X2 = 0, X3 = 2). It
follows that
P(X1 = 0, X2 = 0) = P(X1 = 0, X2 = 0, X3 = 2)
= P(customer I selects counter C and customer II selects counter C)
= P(customer I selects counter C) × P(customer II selects counter C)
= (1/3) × (1/3) = 1/9

because customers' choices are independent and because each customer makes a random selection from among the three available counters.

It is slightly more complicated to calculate P(X1 = 1, X2 = 0) = P(X1 = 1, X2 = 0, X3 = 1). In this event, customer I could select counter A and customer II could select counter C, or I could select C and II could select A. Thus,

P(X1 = 1, X2 = 0) = P(I selects A)P(II selects C) + P(I selects C)P(II selects A)
= (1/3 × 1/3) + (1/3 × 1/3) = 2/9
Similar arguments allow one to derive the results given in Table 6.3.
Table 6.3. Probability Distribution Associated with Example 6.2

                      X1
X2            0     1     2    Marginal for X2
0            1/9   2/9   1/9        4/9
1            2/9   2/9    0         4/9
2            1/9    0     0         1/9
Marginal
for X1       4/9   4/9   1/9         1
The statement asks for

P(X2 = 1 | X1 = 1) = P(X1 = 1, X2 = 1) / P(X1 = 1) = (2/9) / (4/9) = 1/2
Does this answer (1/2) agree with your intuition?
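The probabilities in Table 6.3 can also be obtained by brute-force enumeration of the nine equally likely pairs of choices; a short Python sketch:

    # Enumerate the two customers' independent counter choices (Example 6.2).
    from itertools import product
    from collections import Counter
    from fractions import Fraction

    counts = Counter()
    for c1, c2 in product("ABC", repeat=2):   # 9 equally likely outcomes
        x1 = (c1 == "A") + (c2 == "A")        # visits to counter A
        x2 = (c1 == "B") + (c2 == "B")        # visits to counter B
        counts[(x1, x2)] += Fraction(1, 9)

    p_x1_is_1 = sum(p for (v1, v2), p in counts.items() if v1 == 1)
    print(counts[(1, 1)], counts[(1, 1)] / p_x1_is_1)  # 2/9 and 1/2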
Before we move to the continuous case, let us quickly review the situation in one
dimension. If f (x ) denotes the probability density function of a random variable X, then
f (x ) represents a relative frequency curve, and probabilities, such as P ( a ≤ X ≤ b) , are
represented as areas under this curve. In symbolic terms,
P(a ≤ X ≤ b) = ∫_a^b f(x) dx
Now, suppose that we are interested in the joint behavior of two continuous
random variables—say, X1 and X2—where X1 and X2 might represent the amounts of two
different hydrocarbons found in an air sample taken for a pollution study, for example.
The relative frequency of these two random variables can be modeled by a bivariate
function, f ( x1 , x2 ) , which forms a probability, or relative frequency, surface in three
dimensions. Figure 6.1 shows such a surface. The probability that X1 lies in one interval
and that X2 lies in another interval is then represented as a volume under this surface.
Thus,

P(a1 ≤ X1 ≤ a2, b1 ≤ X2 ≤ b2) = ∫_{b1}^{b2} ∫_{a1}^{a2} f(x1, x2) dx1 dx2

Notice that this integral simply gives the volume under the surface and over the shaded region in Figure 6.1. We trace the actual computations involved in such a bivariate problem with a very simple example.
Figure 6.1. A bivariate density function
Example 6.3:
A certain process for producing an industrial chemical yields a product that contains two
main types of impurities. For a certain volume of sample from this process, let X1 denote
the proportion of total impurities in the sample, and let X2 denote the proportion of type 1
impurity among all impurities found. Suppose that, after investigation of many such
samples, the joint distribution of X1 and X2 can be adequately modeled by the following
function:
f(x1, x2) = 2(1 − x1), for 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1
            0,         elsewhere
This function graphs as the surface given in Figure 6.2. Calculate the probability that X1
is less than 0.5 and that X2 is between 0.4 and 0.7.
Figure 6.2. Probability density function for Example 6.3.
Solution:
From the preceding discussion, we see that

P(0 ≤ X1 ≤ 0.5, 0.4 ≤ X2 ≤ 0.7) = ∫_{0.4}^{0.7} ∫_0^{0.5} 2(1 − x1) dx1 dx2
= ∫_{0.4}^{0.7} [−(1 − x1)²]_0^{0.5} dx2
= ∫_{0.4}^{0.7} 0.75 dx2
= 0.75 x2 ]_{0.4}^{0.7}
= (0.75)(0.3)
= 0.225
Thus, the fraction of such samples having less than 50% impurities and between 40% and
70% type 1 impurities is 0.225.
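The double integral can be confirmed by simulation. A Monte Carlo sketch (the inverse-transform step relies on the density factoring into a function of x1 times a function of x2):

    # Monte Carlo check of P(X1 <= 0.5, 0.4 <= X2 <= 0.7) for Example 6.3.
    import random

    n, hits = 200_000, 0
    for _ in range(n):
        u = random.random()
        x1 = 1 - (1 - u) ** 0.5    # inverse transform for density 2(1 - x1)
        x2 = random.random()       # X2 is uniform on (0, 1)
        if x1 <= 0.5 and 0.4 <= x2 <= 0.7:
            hits += 1
    print(hits / n)                # should be close to 0.225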
The distribution function continues to be

F(x1, x2) = P(X1 ≤ x1, X2 ≤ x2),   (x1, x2) ∈ ℝ²
However, to find the probabilities, we integrate, instead of sum, over the appropriate
regions. The four conditions of a distribution function are the same for both discrete and
continuous random variables.
Just as the univariate (or marginal) probabilities were computed by summing over
rows or columns in the discrete case, the univariate density function for X1 in the
continuous case can be found by integrating (“summing”) over values of X2. Thus, the
marginal density function of X1, f1(x1), is given by

f1(x1) = ∫_{−∞}^{∞} f(x1, x2) dx2

Similarly, the marginal density function of X2, f2(x2), is given by

f2(x2) = ∫_{−∞}^{∞} f(x1, x2) dx1
Example 6.4:
For the case given in Example 6.3, find the marginal probability density functions for X1
and X2.
Solution:
Let us first try to visualize what the answers should look like, before going through the
integration. To find f1 ( x1 ) , we accumulate all the probabilities in the x2 direction. Look
at Figure 6.2 and think of collapsing the wedge-shaped figure back onto the
[x1 , f ( x1 , x2 )] plane. Then more probability mass will build up toward the zero point of
the x1-axis than toward the unity point. In other words, the function f1 ( x1 ) should be
high at 0 and low at 1. Formally,
f1(x1) = ∫_{−∞}^{∞} f(x1, x2) dx2
= ∫_0^1 2(1 − x1) dx2
= 2(1 − x1) x2 ]_0^1
= 2(1 − x1), 0 ≤ x1 ≤ 1
  0,         elsewhere
The function graphs as in Figure 6.3. Notice that our conjecture is correct.
Thinking of how f 2 ( x2 ) should look geometrically, imagine that the wedge of Figure 6.2
has been forced back onto the [x2 , f ( x1 , x2 )] plane. Then the probability mass should
accumulate equally all along the (0, 1) interval on the x2-axis. Mathematically,
f2(x2) = ∫_{−∞}^{∞} f(x1, x2) dx1
= ∫_0^1 2(1 − x1) dx1
= [−(1 − x1)²]_0^1
= 1, 0 ≤ x2 ≤ 1
  0, elsewhere
and again our conjecture is verified, as shown in Figure 6.4.
Figure 6.3. Probability density function f1(x1) for Example 6.3
Figure 6.4. Probability density function f2(x2) for Example 6.3.
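Both integrations can also be checked symbolically. A minimal sketch using sympy (assumed to be available):

    # Symbolic marginals for Example 6.4.
    import sympy as sp

    x1, x2 = sp.symbols("x1 x2")
    f = 2 * (1 - x1)                   # joint density on the unit square

    f1 = sp.integrate(f, (x2, 0, 1))   # marginal of X1
    f2 = sp.integrate(f, (x1, 0, 1))   # marginal of X2
    print(f1, f2)                      # 2(1 - x1) and 1, matching the derivation above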
Definition 6.4. Let X1 and X2 be continuous random variables. The joint probability density function of X1 and X2, if it exists, is given by a nonnegative function f(x1, x2) such that

P(a1 ≤ X1 ≤ a2, b1 ≤ X2 ≤ b2) = ∫_{b1}^{b2} ∫_{a1}^{a2} f(x1, x2) dx1 dx2

The cumulative distribution function is

F(a, b) = P(X1 ≤ a, X2 ≤ b) = ∫_{−∞}^{b} ∫_{−∞}^{a} f(x1, x2) dx1 dx2

The marginal probability density functions of X1 and X2, respectively, are given by

f1(x1) = ∫_{−∞}^{∞} f(x1, x2) dx2   and   f2(x2) = ∫_{−∞}^{∞} f(x1, x2) dx1

Following is another (and somewhat more complicated) example.
Example 6.5:
Gasoline is to be stocked in a bulk tank once each week and then sold to customers. Let
X1 denote the proportion of the tank that is stocked in a particular week, and let X2 denote
the proportion of the tank that is sold in the same week. Due to limited supplies, X1 is not
fixed in advance but varies from week to week. Suppose that a study of many weeks
shows the joint relative frequency behavior of X1 and X2 to be such that the following
density function provides an adequate model:
f(x1, x2) = 3x1, 0 ≤ x2 ≤ x1 ≤ 1
            0,   elsewhere

Notice that X2 must always be less than or equal to X1. This density function is graphed in Figure 6.5. Find the probability that the amount sold is between 20% and 40% of the tank for a given week.
Solution:
Because the question refers to the proportion of the tank that was sold in a given week,
the marginal behavior of X2 is to be addressed. Thus, it is necessary to find the marginal
density, f 2 ( x2 ) , by integrating the joint density of X1 and X2 over X1. Unlike the last
example, the bounds on the region of positive probability for one random variable
depend on the value of the other random variable. A graph depicting the region of
positive probability is shown in Figure 6.6. From this graph, we see that, for a given
value of X2 = x2, positive probability is associated with X1 values from x2 to 1. Thus, we
have
f2(x2) = ∫_{−∞}^{∞} f(x1, x2) dx1
= ∫_{x2}^{1} 3x1 dx1
= (3/2) x1² ]_{x2}^{1}
= (3/2)(1 − x2²), 0 ≤ x2 ≤ 1
  0,              elsewhere

Now, we can find the desired probability.
P(0.2 ≤ X2 ≤ 0.4) = ∫_{0.2}^{0.4} (3/2)(1 − x2²) dx2
= (3/2)[x2 − x2³/3]_{0.2}^{0.4}
= (3/2){[0.4 − (0.4)³/3] − [0.2 − (0.2)³/3]}
= 0.272
Notice that, when working with the marginal density of X2, the region of positive
probability does not depend on X1. This is always true. After integrating out X1, it plays
no role in the marginal probability density of X2. The marginal density of X2 graphs as a
function that is high at x2 = 0 and then tends toward 0 as x2 tends toward 1. Does this
agree with your intuition after looking at Figure 6.5?
Figure 6.5. The joint density function of Example 6.5.
Figure 6.6. The region of positive probability for Example 6.5.
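Because the inner limit of integration depends on x2, this example is a good candidate for a symbolic check; a sketch using sympy (assumed to be available):

    # Marginal of X2 and P(0.2 <= X2 <= 0.4) for Example 6.5.
    import sympy as sp

    x1, x2 = sp.symbols("x1 x2")
    f = 3 * x1                          # joint density on 0 <= x2 <= x1 <= 1

    f2 = sp.integrate(f, (x1, x2, 1))   # marginal of X2: (3/2)(1 - x2**2)
    prob = sp.integrate(f2, (x2, sp.Rational(1, 5), sp.Rational(2, 5)))
    print(f2, prob, float(prob))        # 34/125 = 0.272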
6.2 Conditional Probability Distributions
Before considering conditional probability distributions, think back to conditional
probability discussed in Chapter 3. For two events A and B, the conditional probability of
A given B may be computed as

P(A | B) = P(AB) / P(B)

where P(B) > 0. An analogous approach can be used with probability density functions, where the joint density of X1 and X2 replaces P(AB), and the marginal densities replace P(A) and P(B). That is, an intuitive definition of the conditional density of X1 given X2 is

f(x1 | x2) = f(x1, x2) / f2(x2)

for f2(x2) > 0. We shall soon see that this intuition is correct.
More formally, recall that, in the bivariate discrete case, the conditional
probabilities for X1 for a given X2 were found by fixing attention on the particular row in
which X 2 = x2 , and then looking at the relative probabilities within that row; that is, the
individual cell probabilities were divided by the marginal total for that row, to obtain
conditional probabilities.
In the bivariate continuous case, the form of the probability density function
representing the conditional behavior of X1 for a given X2 is found by slicing through the
joint density in the x1 direction at the particular value of X2. The resulting slice then has to be rescaled, by dividing by the marginal density of X2 at that point, so that it integrates to 1. Let's look at a specific
example before examining the general definition of conditional density functions.
Example 6.6:
Refer to Example 6.5. If in a given week the tank is stocked to be 50% full, find the
probability that the proportion of the tank that is sold during that week is less than 0.2.
Solution:
To address this question, we need to first find the conditional distribution of X2 given X1. Slicing through f(x1, x2) in the x2 direction at x1 = 0.5 yields

f(0.5, x2) = 3(0.5) = 1.5,   0 ≤ x2 ≤ 0.5

Thus, the conditional behavior of X2 for a given X1 of 0.5 is constant over the interval (0, 0.5). The marginal value of f1(x1) at x1 = 0.5 is obtained as follows:
f1(x1) = ∫_{−∞}^{∞} f(x1, x2) dx2
= ∫_0^{x1} 3x1 dx2
= 3x1², 0 ≤ x1 ≤ 1
  0,    elsewhere

f1(0.5) = 3(0.5)² = 0.75
Upon dividing, we see that the conditional behavior of X2 for a given X1 of 0.5 is represented by the function

f(x2 | x1 = 0.5) = f(0.5, x2) / f1(0.5) = 1.5 / 0.75
= 2, 0 < x2 < 0.5
  0, otherwise
This function of x2 has all the properties of a probability density function and is the
conditional density function for X2 at X1 = 0.5. (Notice that, unlike the marginal
distribution, the conditional distribution of X2 depends on the particular value of X1 that is
observed.) Then,
P(X2 < 0.2 | X1 = 0.5) = ∫_0^{0.2} f(x2 | x1 = 0.5) dx2 = ∫_0^{0.2} 2 dx2 = 0.4
Thus, among all weeks in which the tank was half full immediately after stocking, sales
amounted to less than 20% of the tank 40% of the time.
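The slice-and-rescale computation can be reproduced symbolically; a sketch using sympy (assumed to be available):

    # Conditional density f(x2 | x1 = 0.5) and P(X2 < 0.2 | X1 = 0.5) for Example 6.6.
    import sympy as sp

    x1, x2 = sp.symbols("x1 x2")
    f = 3 * x1                                     # joint density on 0 <= x2 <= x1 <= 1

    f1 = sp.integrate(f, (x2, 0, x1))              # marginal of X1: 3*x1**2
    cond = (f / f1).subs(x1, sp.Rational(1, 2))    # conditional density: 2 on (0, 0.5)
    prob = sp.integrate(cond, (x2, 0, sp.Rational(1, 5)))
    print(cond, prob)                              # 2 and 2/5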
Clearly, these manipulations to obtain conditional density functions in the continuous
case are analogous to those used to obtain conditional probabilities in the discrete case,
except that integrals are used in place of sums. The formal definition of a conditional
density function follows.
Definition 6.5. Let X1 and X2 be continuous random variables with joint probability density function f(x1, x2) and marginal densities f1(x1) and f2(x2), respectively. Then, the conditional probability density function of X1 given X2 = x2 is defined by

f(x1 | x2) = f(x1, x2) / f2(x2), for f2(x2) > 0
             0,                  elsewhere

and the conditional probability density function of X2 given X1 = x1 is defined by

f(x2 | x1) = f(x1, x2) / f1(x1), for f1(x1) > 0
             0,                  elsewhere

_______________________________________________________________________
Example 6.7:
A soft-drink machine has a random supply Y2 at the beginning of a given day and
dispenses a random amount Y1 during the day (with measurements in gallons). It is not
resupplied during the day; hence, Y1 ≤ Y2. It has been observed that Y1 and Y2 have joint
density
f(y1, y2) = 1/2, 0 ≤ y1 ≤ y2, 0 ≤ y2 ≤ 2
            0,   elsewhere
In other words, the points (y1, y2) are uniformly distributed over the triangle with the
given boundaries. Find the conditional density of Y1, given Y2 = y2. Evaluate the
probability that less than ½ gallon is sold, given that the machine contains 1 gallon at the
start of the day.
Solution:
The marginal density of Y2 is given by

f2(y2) = ∫_{−∞}^{∞} f(y1, y2) dy1
= ∫_0^{y2} (1/2) dy1
= (1/2) y2, 0 ≤ y2 ≤ 2
  0,        elsewhere

By Definition 6.5,

f(y1 | y2) = f(y1, y2) / f2(y2) = (1/2) / ((1/2) y2)
= 1/y2, 0 ≤ y1 ≤ y2 ≤ 2
  0,    elsewhere

The probability of interest is

P(Y1 < 1/2 | Y2 = 1) = ∫_0^{1/2} f(y1 | y2 = 1) dy1
= ∫_0^{1/2} 1 dy1
= 1/2

since the conditional density at y2 = 1 is 1/y2 = 1 on the interval (0, 1). Thus, the amount sold is highly dependent on the amount in supply.
Recall that the marginal density of Y1 (or Y2) does not involve Y2 (or Y1). However, note
that a relationship between Y1 and Y2 may again be evident in the conditional distribution
of Y1 given Y2 (or Y2 given Y1), both in expressing the height of the density function and
in designating the region of positive probability.
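Because the conditioning event {Y2 = 1} has probability zero, a simulation can only approximate it by a narrow band around y2 = 1; the following sketch does exactly that and should land near the exact answer of 1/2:

    # Simulation check of P(Y1 < 1/2 | Y2 = 1) for Example 6.7.
    import random

    def sample_triangle():
        # Uniform over the triangle 0 <= y1 <= y2 <= 2 (joint density 1/2).
        while True:
            y1, y2 = 2 * random.random(), 2 * random.random()
            if y1 <= y2:
                return y1, y2

    eps, kept, hits = 0.02, 0, 0
    for _ in range(500_000):
        y1, y2 = sample_triangle()
        if abs(y2 - 1) < eps:
            kept += 1
            hits += y1 < 0.5
    print(hits / kept)   # close to 1/2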
____________________________________________________________________
6.3 Independent Random Variables
Before defining independent random variables, let us recall once again that two events A and B are independent if P(AB) = P(A)P(B). Somewhat analogously, two discrete random variables are independent if

P(X1 = x1, X2 = x2) = P(X1 = x1)P(X2 = x2)

for all real numbers x1 and x2. A similar idea carries over to the continuous case.
Definition 6.6. Discrete random variables X1 and X2 are said to be independent if

P(X1 = x1, X2 = x2) = P(X1 = x1)P(X2 = x2)

for all real numbers x1 and x2.
Continuous random variables X1 and X2 are said to be independent if

f(x1, x2) = f1(x1)f2(x2)

for all real numbers x1 and x2.
The concepts of joint probability density functions and independence extend immediately to n random variables, where n is any finite positive integer. The n random variables X1, X2, …, Xn are said to be independent if their joint density function, f(x1, x2, …, xn), is given by

f(x1, x2, …, xn) = f1(x1) f2(x2) ⋯ fn(xn)

for all real numbers x1, x2, …, xn.
Example 6.8:
Show that the random variables survival and economic class that have the joint
distribution identified in Table 6.2 are not independent.
Solution:
It is necessary to check only one entry in the table. We see that P(X1 = 0, X2 = 0) = 0.38,
whereas P(X1 = 0) = 0.76 and P(X2 = 0) = 0.55. Because
P( X 1 = 0, X 2 = 0) = 0.76 ≠ 0.418 = P( X 1 = 0) P( X 2 = 0)
the random variables cannot be independent.
Note: To prove that two variables are not independent, it is sufficient to find one value
x1 and one value x2 such that P( X 1 = x1 , X 2 = x2 ) ≠ P( X 1 = x1 ) P( X 2 = x2 ) if (X1, X2)
are discrete or f ( x1 , x2 ) ≠ f ( x1 ) f ( x2 ) if (X1, X2) are continuous. In contrast, to prove
the two variables are independent, equality of the joint probability function and the
product of the marginal probability functions must be shown to hold for all (x1, x2).
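This check can be automated for any discrete joint table by comparing each joint probability against the product of the corresponding marginals; a sketch, applied to Table 6.2:

    # Independence check for a discrete joint table.
    joint = {(0, 1): 0.09, (1, 1): 0.06, (0, 2): 0.05, (1, 2): 0.08,
             (0, 3): 0.08, (1, 3): 0.24, (0, 4): 0.10, (1, 4): 0.30}

    p1, p2 = {}, {}   # marginals of X1 and X2
    for (a, b), p in joint.items():
        p1[a] = p1.get(a, 0) + p
        p2[b] = p2.get(b, 0) + p

    independent = all(abs(p - p1[a] * p2[b]) < 1e-9 for (a, b), p in joint.items())
    print(independent)   # False: e.g., 0.09 != 0.32 * 0.15 = 0.048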
Example 6.9:
Show that the random variables in Example 6.3 are independent.
Solution:
Here,

f(x1, x2) = 2(1 − x1), 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1
            0,         otherwise

We saw in Example 6.4 that

f1(x1) = 2(1 − x1), 0 ≤ x1 ≤ 1
         0,         elsewhere

and

f2(x2) = 1, 0 ≤ x2 ≤ 1
         0, elsewhere

Thus f(x1, x2) = f1(x1)f2(x2) for all real numbers x1 and x2; and X1 and X2 are independent random variables.
Exercises
6.1. Two construction contracts are to be randomly assigned to one or more of three
firms. Numbering the firms I, II, and III, let X1 be the number of contracts assigned to
firm I, and let X2 be the number assigned to firm II. (A firm may receive more than one
contract.)
a. Find the joint probability distribution for X1 and X2.
b. Find the marginal probability distribution for X1.
c. Find P(X1 = 1|X2 = 1).
6.2. A mountain rescue service studied the behavior of lost hikers so that more effective
search strategies could be devised. They decided to determine both the direction traveled
and the experience level. From this study, it is known that the probabilities of a hiker
being experienced or not and of going uphill, downhill or remaining in the same place are
as shown in the table below:
                          Direction
              Uphill   Downhill   Remain in Same Place
Novice         0.10      0.25            0.25
Experienced    0.05      0.10            0.25
a. Find the marginal distribution of the direction a lost hiker travels.
b. Find the conditional distribution of direction given that the hiker is experienced.
c. Are experience and the direction a hiker walks independent? Justify your answer.
6.3. The U.S. Census Bureau collects information on certain characteristics in the U.S.
population. For 2004, it reported the following numbers of citizens by age and whether
or not they were in poverty as shown below (all counts are in 1000s of people).
                     Live in Poverty   Do Not Live in Poverty   Totals
Under 18 years             347               12,680             13,027
18 to 64 years             517               19,997             20,514
65 years and older         130                3,327              3,457
Totals                     994               35,984             36,978
Suppose a person is selected at random from the citizens of the United States, and his or her age and poverty status are observed. Further, define X1 and X2 as follows:

Let X1 = 1, if age is under 18 years
         2, if age is 18 to 64 years
         3, if age is 65 years or older

Let X2 = 0, if the person lives in poverty
         1, if the person does not live in poverty
a. Find the joint probability distribution of X1 and X2.
b. Find the marginal probability distribution of age.
c. Find the marginal probability distribution of poverty in the U.S.
d. Find the conditional distribution of age given that a person is in poverty.
e. Are age and whether or not a person lives in poverty in the U.S. independent? Justify
your answer.
6.4. The National Center for Health Statistics (NCHS) of the Centers for Disease Control
(CDC), among others, is interested in the causes of death and its relationship to gender,
age, race, etc. For children, ages 1 to 4, the numbers of deaths for each gender and by
cause during 2002 are presented in the table below. Let X1 be the random variable
associated with cause of death and X2 be the random variable representing gender.
Cause of Death                        Female   Male   Total
Accidents (unintentional injuries)      617    1024    1641
Congenital Problems                     248     282     530
Assault (homicide)                      189     234     423
Other                                   998    1266    2264
Total                                  2052    2806    4858
Source: CDC/NCHS
a. Fully define the random variables X1 and X2.
b. Find the joint distribution of gender and cause of death.
c. Find the marginal distribution of gender.
d. Find the marginal distribution of cause of death.
e. What is the probability that a randomly selected male in this age range who died had a
congenital problem leading to his death?
f. What is the probability that a child, aged 1 to 4, who died as a consequence of a
homicide was a female?
g. Find the conditional distribution of cause of death given gender.
h. Are cause of death and gender independent? Justify your answer.
6.5. Refer to Example 6.3.
a. Find the cumulative distribution function for this distribution.
b. Use the distribution function found in (a) to find the probability that there are no more
than 50% total impurities and that the proportion of type I impurities is no more than 25%
of the total.
6.6. A radioactive particle is randomly located in a square area with sides that are 1 unit
in length. Let X1 and X2 denote the coordinates of the particle. Since the particle is
equally likely to fall in any subarea of fixed size, a reasonable model of (X1, X2) is given
by
f(x1, x2) = 1, 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1
            0, elsewhere
a. Sketch the probability density surface.
b. Find P( X 1 ≤ 0.2, X 2 ≤ 0.4) .
c. Find P(0.1 ≤ X 1 ≤ 0.3, X 2 > 0.4)
6.7. Refer to Exercise 6.6.
a. Find the joint cumulative distribution function for X1 and X2.
b. Use the joint cumulative distribution function to find the probability that X1 < 0.5 and
X2 <0.8.
c. Verify the probability in (b) using the joint probability density function.
6.8. An environmental engineer measures the amount (by weight) of particulate pollution
in air samples (of a certain volume) collected over the smokestack of a coal fueled power
plant. Let X1 denote the amount of pollutant per sample when a certain cleaning device
on the stack is not operating, and let X2 denote the amount of pollutants per sample when
the cleaning device is operating under similar environmental conditions. It is observed
that X1 is always greater than 2X2; and the relative frequency of (X1, X2) can be modeled
by
f(x1, x2) = k, 0 ≤ x1 ≤ 2, 0 ≤ x2 ≤ 1, 2x2 ≤ x1
            0, elsewhere
(In other words, X1 and X2 are randomly distributed over the region inside the triangle
bounded by x1 =2, x2 = 0, and 2x2 = x1.)
a. Find the value of k that makes this a probability density function.
b. Find P(X1 ≥ 3X2). (In other words, find the probability that the cleaning device reduces the amount of pollutant to one-third or less of what it would be without the device.)
6.9. Refer to the situation described in Exercise 6.8.
a. Find the marginal density function of X2.
b. Find P(X2 ≤ 0.4).
c. Are X1 and X2 independent?
d. Find P(X2 ≤ ¼|X1 = 1).
6.10. Let X1 and X2 denote the proportions of two different chemicals found in a sample
mixture of chemicals used as an insecticide. Suppose that X1 and X2 have a joint
probability density given by
f(x1, x2) = 2, 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1, 0 ≤ x1 + x2 ≤ 1
            0, elsewhere
a. Find P(X1 ≤ ¼, X2 ≤ ½)
b. Find P(X1 ≤ ¾, X2 ≤ ¾)
c. Find P(X1 ≤ ½|X2 ≤ ½)
6.11. Refer to the setting described in Exercise 6.10.
a. Find the marginal density functions for X1 and X2.
b. Are X1 and X2 independent?
6.12. Refer to the setting described in Exercise 6.10.
a. Find the conditional distribution of X1 given X2.
b. Find P(X1 > ½|X2 = ¼).
6.13. Let X1 and X2 denote the proportions of time, out of one workweek, that employees
I and II, respectively, actually spend performing their assigned tasks. The joint relative
frequency behavior of X1 and X2 is modeled by the probability density function
f(x1, x2) = x1 + x2, 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1
            0,       elsewhere
a. Find P(X1 < ½, X2 > ¼).
b. Find P(X1 + X2 ≤ 1).
6.14. Refer to Exercise 6.13.
a. Find the marginal distributions of X1 and X2, the distributions of the time each employee spends performing his assigned tasks.
b. Find the probability that the first employee spends more than 75% of the workweek on
his assigned task.
c. Are X1 and X2 independent?
6.15. Referring to Exercise 6.13, find the probability that employee I spends more than
75% of the workweek on his assigned task, given that employee II spends exactly 50% of
the workweek on her assigned task.
6.16. An electronic surveillance system has one of each of two different types of
components in joint operations. If X1 and X2 denote the random life lengths of the
components of type I and type II, respectively, the joint probability density function is given by

f(x1, x2) = (1/8) x1 e^(−(x1 + x2)/2), x1 > 0, x2 > 0
            0,                         elsewhere

(measurements are in hundreds of hours).
a. Are X1 and X2 independent?
b. Find P(X1 >1, X2 >1).
6.17. A bus arrives at a bus stop at a randomly selected time within a 1-hour period. A passenger arrives at the bus stop at a randomly selected time within the same hour. The passenger is willing to wait for the bus for up to ¼ of an hour. What is the probability that the passenger will catch the bus? [Hint: Let X1 denote the bus arrival time, and let X2 denote the passenger arrival time. If these arrivals are independent, then

f(x1, x2) = 1, 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1
            0, elsewhere

Now find P(X2 ≤ X1 ≤ X2 + ¼).]
6.18. Two friends are to meet at a library. Each arrives at an independently selected time
within a fixed 1-hour period. Each agrees to wait not more than 10 minutes for the other.
Find the probability that they will meet.
6.19. Two quality control inspectors each interrupt a production line at randomly (and independently) selected times within a given day (of 8 hours). Find the probability that the two interruptions will be more than 4 hours apart.
6.20. Two customers enter a bank at random times within a fixed 1-hour period. Assume that the arrival times are independent of each other.
a. Find the probability that both arrive in the first half hour.
b. Find the probability that the two customers are within 5 minutes of each other.
6.21. A bombing target is in the center of a circle with radius 1 mile. A bomb falls at a
randomly selected point inside that circle. If the bomb destroys everything within ½ mile
of its landing point, what is the probability that it will destroy the target?
6.4 Expected Values of Functions of Random Variables
When we encounter problems that involve more than one random variable, we often
combine the variables into a single function. We may be interested in the life lengths of
five different electronic components within the same system or the difference between
two strength test measurements on the same section of cable. We now discuss how to
find expected values of functions of more than one random variable. Definition 6.7 gives the basic result for finding expected values in the bivariate case. The definition, of course, can be generalized to more variables.

Notice that, if X1 and X2 are independent, then

E[g(X1)h(X2)] = E[g(X1)]E[h(X2)]
A function of two variables that is commonly of interest in probabilistic and
statistical problems is the covariance.
Definition 6.7. Suppose that the discrete random variables (X1, X2) have a joint probability function given by p(x1, x2). If g(X1, X2) is any real-valued function of (X1, X2), then

E[g(X1, X2)] = Σ_{x1} Σ_{x2} g(x1, x2) p(x1, x2)

The sum is over all values of (x1, x2) for which p(x1, x2) > 0. If (X1, X2) are continuous random variables, with probability density function f(x1, x2), then

E[g(X1, X2)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x1, x2) f(x1, x2) dx1 dx2
Definition 6.8. The covariance between two random variables X1 and X2 is given by

cov(X1, X2) = E[(X1 − μ1)(X2 − μ2)]

where μ1 = E(X1) and μ2 = E(X2).
The covariance helps us assess the relationship between two variables in the
following sense. If X2 tends to be large when X1 is large, and if X2 tends to be small when
X1 is small, then X1 and X2 have a positive covariance. If, on the other hand, X2 tends to
be small when X1 is large and large when X1 is small, then the two variables have a
negative covariance.
While covariance measures the direction of the association between two random
variables, correlation measures the strength of the association.
Definition 6.9. The correlation coefficient between two random variables X1 and X2 is given by

ρ = cov(X1, X2) / √(V(X1)V(X2))

The correlation coefficient is a unitless quantity that takes on values between −1 and +1. If ρ = +1 or ρ = −1, then X2 must be a linear function of X1. The next theorem gives an easier computational form for the covariance.
Theorem 6.1. If X1 has a mean μ1 and X2 has a mean of μ2, then
cov( X 1 , X 2 ) = E ( X 1 X 2 ) − μ1 μ 2
The proof (not included here) is an application of the properties of expected value.
If X1 and X2 are independent random variables, then E ( X 1 X 2 ) = E ( X 1 )E ( X 2 ) .
Using Theorem 6.1, it is then clear that independence between X1 and X2 implies that
cov( X 1 , X 2 ) = 0 . The converse is not necessarily true; that is, zero covariance does not
imply that the variables are independent.
To see how this theorem works, look at the joint distribution in Table 6.4. Clearly,
E ( X 1 ) = E ( X 2 ) = 0 , and E ( X 1 X 2 ) = 0 as well. Therefore, cov( X 1 , X 2 ) = 0 . On the
other hand,
P(X1 = −1, X2 = −1) = 1/8 ≠ 9/64 = P(X1 = −1)P(X2 = −1)

so the random variables are dependent.
Table 6.4. Joint Distribution of Dependent Variables with a Covariance of Zero

                 X1
X2          −1    0    1   Totals
−1         1/8  1/8  1/8    3/8
 0         1/8   0   1/8    2/8
 1         1/8  1/8  1/8    3/8
Totals     3/8  2/8  3/8     1
We shall now calculate the covariance in two examples.
Example 6.9:
A travel agency keeps track of the number of customers who call on any one day and the
number of trips booked on any one day. Let X1 denote the number of calls, let X2 denote
the number of orders placed, and let p(x1, x2) denote the joint probability function for (X1,
X2). Records show that
p(0, 0) = 0.04   p(1, 0) = 0.08   p(1, 1) = 0.06
p(2, 0) = 0.12   p(2, 1) = 0.20   p(2, 2) = 0.12
p(3, 0) = 0.10   p(3, 1) = 0.16   p(3, 2) = 0.10   p(3, 3) = 0.02
Thus, for any given day, the probability of, say, two calls and one order is 0.20. Find
cov(X1, X2) and the correlation between X1 and X2.
Solution:
Theorem 6.1 suggests that we first find E(X1X2), which is

E(X1X2) = Σ_{x1} Σ_{x2} x1 x2 p(x1, x2)
= (0 × 0)p(0, 0) + (1 × 0)p(1, 0) + (1 × 1)p(1, 1) + (2 × 0)p(2, 0) + (2 × 1)p(2, 1)
  + (2 × 2)p(2, 2) + (3 × 0)p(3, 0) + (3 × 1)p(3, 1) + (3 × 2)p(3, 2) + (3 × 3)p(3, 3)
= 0(0.04) + 0(0.08) + 1(0.06) + 0(0.12) + 2(0.20) + 4(0.12) + 0(0.10) + 3(0.16) + 6(0.10) + 9(0.02)
= 2.20
Now we must find E(X1) = μ1 and E(X2) = μ2. Summing the joint probabilities across rows and columns gives the marginal distributions of X1 and X2:

x1:      0     1     2     3
p(x1):  0.04  0.14  0.44  0.38

x2:      0     1     2     3
p(x2):  0.34  0.42  0.22  0.02

It follows that

μ1 = 1(0.14) + 2(0.44) + 3(0.38) = 2.16

and

μ2 = 1(0.42) + 2(0.22) + 3(0.02) = 0.92

Thus,

cov(X1, X2) = E(X1X2) − μ1μ2 = 2.20 − 2.16(0.92) = 0.2128

From the marginal distributions of X1 and X2, it follows that

V(X1) = E(X1²) − μ1² = 5.32 − (2.16)² = 0.6544

and

V(X2) = E(X2²) − μ2² = 1.48 − (0.92)² = 0.6336

Hence,

ρ = cov(X1, X2) / √(V(X1)V(X2)) = 0.2128 / √((0.6544)(0.6336)) ≈ 0.33

A modest positive association exists between the number of calls and the number of trips booked. Do the positive covariance and the strength of the correlation agree with your intuition?
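All of these quantities follow mechanically from the joint probability function; the following Python sketch recomputes them directly from the p(x1, x2) values listed above:

    # Covariance and correlation for Example 6.9, computed from the joint table.
    from math import sqrt

    p = {(0, 0): 0.04, (1, 0): 0.08, (1, 1): 0.06, (2, 0): 0.12, (2, 1): 0.20,
         (2, 2): 0.12, (3, 0): 0.10, (3, 1): 0.16, (3, 2): 0.10, (3, 3): 0.02}

    e1 = sum(x1 * pr for (x1, x2), pr in p.items())        # 2.16
    e2 = sum(x2 * pr for (x1, x2), pr in p.items())        # 0.92
    e12 = sum(x1 * x2 * pr for (x1, x2), pr in p.items())  # 2.20
    v1 = sum(x1**2 * pr for (x1, x2), pr in p.items()) - e1**2
    v2 = sum(x2**2 * pr for (x1, x2), pr in p.items()) - e2**2

    cov = e12 - e1 * e2
    rho = cov / sqrt(v1 * v2)
    print(round(cov, 4), round(rho, 2))   # 0.2128 and 0.33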
_____________________________________________________________________
When a problem involves n random variables X 1 , X 2 ,…, X n , we are often
interested in studying linear combinations of those variables. For example, if the random
variables measure the quarterly incomes for n plants in a corporation, we may want to
look at their sum or average. If X1 represents the monthly cost of servicing defective
plants before a new quality control system was installed, and X2 denotes that cost after the
system went into operation, then we may want to study X 1 − X 2 . Theorem 6.2 gives a
general result for the mean and variance of a linear combination of random variables.
Theorem 6.2. Let Y1, Y2, …, Yn and X1, X2, …, Xm be random variables, with E(Yi) = μi and E(Xj) = ξj. Define

U1 = Σ_{i=1}^n ai Yi   and   U2 = Σ_{j=1}^m bj Xj

for constants a1, a2, …, an and b1, b2, …, bm. Then the following relationships hold:

(a) E(U1) = Σ_{i=1}^n ai μi

(b) V(U1) = Σ_{i=1}^n ai² V(Yi) + 2 Σ Σ_{i<j} ai aj cov(Yi, Yj),
    where the double sum is over all pairs (i, j) with i < j.

(c) cov(U1, U2) = Σ_{i=1}^n Σ_{j=1}^m ai bj cov(Yi, Xj)
Proof:
(a) The proof of part (a) follows directly from the definition of expected value and from properties of sums or integrals.

(b) To prove this part, we resort to the definition of variance and write

V(U1) = E[U1 − E(U1)]²
= E[Σ_{i=1}^n ai Yi − Σ_{i=1}^n ai μi]²
= E[Σ_{i=1}^n ai (Yi − μi)]²
= E[Σ_{i=1}^n ai² (Yi − μi)² + Σ Σ_{i≠j} ai aj (Yi − μi)(Yj − μj)]
= Σ_{i=1}^n ai² E[(Yi − μi)²] + Σ Σ_{i≠j} ai aj E[(Yi − μi)(Yj − μj)]

By the definitions of variance and covariance, we then have

V(U1) = Σ_{i=1}^n ai² V(Yi) + Σ Σ_{i≠j} ai aj cov(Yi, Yj)

Notice that cov(Yi, Yj) = cov(Yj, Yi); hence, we can write

V(U1) = Σ_{i=1}^n ai² V(Yi) + 2 Σ Σ_{i<j} ai aj cov(Yi, Yj)

(c) The proof of this part is obtained by similar steps. We have

cov(U1, U2) = E{[U1 − E(U1)][U2 − E(U2)]}
= E{[Σ_{i=1}^n ai (Yi − μi)][Σ_{j=1}^m bj (Xj − ξj)]}
= Σ_{i=1}^n Σ_{j=1}^m ai bj E[(Yi − μi)(Xj − ξj)]
= Σ_{i=1}^n Σ_{j=1}^m ai bj cov(Yi, Xj)

On observing that cov(Yi, Yi) = V(Yi), we see that part (b) is a special case of part (c).
Example 6.10:
Let Y1, Y2, …, Yn be independent random variables, with E(Yi) = μ and V(Yi) = σ². (These variables may denote the outcomes of n independent trials of an experiment.) Defining

Ȳ = (1/n) Σ_{i=1}^n Yi

show that E(Ȳ) = μ and V(Ȳ) = σ²/n.

Solution:
Notice that Ȳ is a linear function with all constants ai equal to 1/n; that is,

Ȳ = (1/n)Y1 + (1/n)Y2 + ⋯ + (1/n)Yn

By Theorem 6.2 part (a),

E(Ȳ) = Σ_{i=1}^n ai μ = μ Σ_{i=1}^n ai = μ Σ_{i=1}^n (1/n) = μ(n/n) = μ

By Theorem 6.2 part (b),

V(Ȳ) = Σ_{i=1}^n ai² V(Yi) + 2 Σ Σ_{i<j} ai aj cov(Yi, Yj)

But the covariance terms are all zero, since the random variables are independent. Thus,

V(Ȳ) = Σ_{i=1}^n (1/n)² σ² = nσ²/n² = σ²/n
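Both results are easy to see empirically; a simulation sketch (normal draws are an arbitrary choice here, since any distribution with mean μ and variance σ² would do):

    # Simulation check that E(Ybar) = mu and V(Ybar) = sigma^2 / n.
    import random
    from statistics import mean, pvariance

    mu, sigma, n = 5.0, 2.0, 10
    ybars = [mean(random.gauss(mu, sigma) for _ in range(n))
             for _ in range(20_000)]
    print(round(mean(ybars), 2), round(pvariance(ybars), 2))  # near 5.0 and 0.4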
Example 6.11:
Let X1 denote the amount of gasoline stocked in a bulk tank at the beginning of a week,
and let X2 denote the amount sold during the week; then Y = X 1 − X 2 represents the
amount left over at the end of the week. Find the mean and the variance of Y, assuming
that the joint density function of (X1, X2) is given by
f(x1, x2) = 3x1, for 0 ≤ x2 ≤ x1 ≤ 1
            0,   elsewhere
Solution:
We must first find the means and the variances of X1 and X2. The marginal density of X1
is found to be
f1(x1) = 3x1², for 0 ≤ x1 ≤ 1
         0,    elsewhere

Thus,

E(X1) = ∫_0^1 x1(3x1²) dx1 = 3[x1⁴/4]_0^1 = 3/4

The marginal distribution of X2 is found to be

f2(x2) = (3/2)(1 − x2²), for 0 ≤ x2 ≤ 1
         0,              elsewhere

Thus,

E(X2) = ∫_0^1 x2 (3/2)(1 − x2²) dx2
= (3/2) ∫_0^1 (x2 − x2³) dx2
= (3/2){[x2²/2]_0^1 − [x2⁴/4]_0^1}
= (3/2)(1/2 − 1/4)
= 3/8

By similar arguments, it follows that

E(X1²) = 3/5   and   V(X1) = 3/5 − (3/4)² = 0.0375

E(X2²) = 1/5   and   V(X2) = 1/5 − (3/8)² = 0.0594

The next step is to find cov(X1, X2). Now

E(X1X2) = ∫_0^1 ∫_0^{x1} (x1x2) 3x1 dx2 dx1
= 3 ∫_0^1 ∫_0^{x1} x1² x2 dx2 dx1
= 3 ∫_0^1 x1² [x2²/2]_0^{x1} dx1
= (3/2) ∫_0^1 x1⁴ dx1
= (3/2)[x1⁵/5]_0^1
= 3/10

and

cov(X1, X2) = E(X1X2) − μ1μ2 = 3/10 − (3/4)(3/8) = 0.0188

From Theorem 6.2,

E(Y) = E(X1) − E(X2) = 3/4 − 3/8 = 3/8 = 0.375

and

V(Y) = V(X1) + V(X2) + 2(1)(−1)cov(X1, X2)
= 0.0375 + 0.0594 − 2(0.0188)
= 0.0593
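The whole calculation can be verified symbolically, with expectations taken over the triangular region (a sketch using sympy, assumed to be available):

    # Verifying E(Y) and V(Y) for Example 6.11.
    import sympy as sp

    x1, x2 = sp.symbols("x1 x2")
    f = 3 * x1                      # joint density on 0 <= x2 <= x1 <= 1

    def E(g):                       # expectation over the triangular region
        return sp.integrate(g * f, (x2, 0, x1), (x1, 0, 1))

    mu1, mu2 = E(x1), E(x2)
    v_y = (E(x1**2) - mu1**2) + (E(x2**2) - mu2**2) - 2 * (E(x1 * x2) - mu1 * mu2)
    print(mu1 - mu2, v_y, float(v_y))   # 3/8, 19/320, 0.059375 (0.0593 up to rounding)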
When the random variables in use are independent, calculating the variance of a
linear function simplifies, since the covariances are zero.
Example 6.12:
A firm purchases two types of industrial chemicals. The amount of type I chemical
purchased per week, X1, has E(X1) = 40 gallons, with V(X1) =4. The amount of type II
chemical purchased, X2, has E(X2) = 65 gallons, with V(X2) = 8. Type I costs $3 per
gallon, whereas type II costs $5 per gallon. Find the mean and the variance of the total
weekly amount spent on these types of chemicals, assuming that X1 and X2 are
independent.
Solution:
The dollar amount spent per week is given by

Y = 3X1 + 5X2

From Theorem 6.2,

E(Y) = 3E(X1) + 5E(X2) = 3(40) + 5(65) = 445

and

V(Y) = (3)²V(X1) + (5)²V(X2) = 9(4) + 25(8) = 236

The firm can expect to spend $445 per week on these chemicals.
Example 6.13:
Sampling problems involving finite populations can be modeled by selecting balls from
urns. Suppose that an urn contains r white balls and (N – r) black balls. A random
sample of n balls is drawn without replacement; and Y, the number of white balls in the
sample, is observed. From Chapter 4, we know that Y has a hypergeometric probability
distribution. Find the mean and the variance of Y.
Solution:
We first observe some characteristics of sampling without replacement. Suppose that the
sampling is done sequentially, and we observe outcomes from X1, X2, …, Xn, where
Xi = 1, if the ith draw results in a white ball
     0, otherwise
Unquestionably, P(X1 = 1) = r/N. But it is also true that P(X2 = 1) = r/N, because

P(X2 = 1) = P(X1 = 1, X2 = 1) + P(X1 = 0, X2 = 1)
= P(X1 = 1)P(X2 = 1 | X1 = 1) + P(X1 = 0)P(X2 = 1 | X1 = 0)
= (r/N)((r − 1)/(N − 1)) + ((N − r)/N)(r/(N − 1))
= r(N − 1)/(N(N − 1))
= r/N

The same is true for Xk; that is,

P(Xk = 1) = r/N,   k = 1, 2, …, n

Thus, the probability of drawing a white ball on any draw, given no knowledge of the outcomes of previous draws, is r/N. In a similar way, it can be shown that

P(Xj = 1, Xk = 1) = r(r − 1)/(N(N − 1)),   j ≠ k

Now observe that Y = Σ_{i=1}^n Xi; hence,

E(Y) = Σ_{i=1}^n E(Xi) = n(r/N)

To find V(Y), we need V(Xi) and cov(Xi, Xj). Because Xi is 1 with probability r/N and 0 with probability 1 − (r/N), it follows that

V(Xi) = (r/N)(1 − r/N)

Also, because XiXj = 1 if and only if Xi = 1 and Xj = 1,

cov(Xi, Xj) = E(XiXj) − E(Xi)E(Xj)
= r(r − 1)/(N(N − 1)) − (r/N)²
= −(r/N)(1 − r/N)(1/(N − 1))

From Theorem 6.2, we know that
V(Y) = Σ_{i=1}^n V(Xi) + 2 Σ Σ_{i<j} cov(Xi, Xj)
= n(r/N)(1 − r/N) + 2 Σ Σ_{i<j} [−(r/N)(1 − r/N)(1/(N − 1))]
= n(r/N)(1 − r/N) − n(n − 1)(r/N)(1 − r/N)(1/(N − 1))

because there are n(n − 1)/2 terms in the double summation. A little algebra yields

V(Y) = n(r/N)(1 − r/N)((N − n)/(N − 1))
The usefulness of Theorem 6.2 becomes especially clear when one tries to find
the expected value and the variance for the hypergeometric random variable. Without it,
by proceeding directly from the definition of an expectation, the necessary summations
are exceedingly difficult to obtain.
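These formulas agree with the standard hypergeometric results; a quick check against scipy (assumed to be available; note that scipy's argument order is population size, number of white balls, sample size):

    # Hypergeometric mean and variance: formula vs. scipy.
    from scipy.stats import hypergeom

    N, r, n = 20, 8, 5
    mean_formula = n * r / N
    var_formula = n * (r / N) * (1 - r / N) * (N - n) / (N - 1)

    m, v = hypergeom.stats(N, r, n, moments="mv")
    print(mean_formula, float(m))   # 2.0 and 2.0
    print(var_formula, float(v))    # about 0.947 for both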
Exercises
6.22. A radioactive particle is randomly located in a square area with sides that are 1
unit in length. Let X1 and X2 denote the coordinates of the particle. Since the particle is
equally likely to fall in any subarea of fixed size, a reasonable model of (X1, X2) is given
by
f(x1, x2) = 1, 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1
            0, elsewhere
a. Find the mean of X1.
b. Find the mean of X2 .
6.23. Refer to Exercise 6.22.
a. Find the covariance between X1 and X2.
b. Find the correlation between X1 and X2.
6.24. In a study of particulate pollution in air samples over a smokestack, X1 represents
the amount of pollutants per sample when a cleaning device is not operating, and X2
represents the amount per sample when the cleaning device is operating. Assume that (X1,
X2) has a joint probability density function
f(x1, x2) = 1, for 0 ≤ x1 ≤ 2, 0 ≤ x2 ≤ 1, 2x2 ≤ x1
            0, elsewhere
a. Find the mean and variance of pollutants per sample when the cleaning device is not in
place.
b. Find the mean and variance of pollutants per sample when the cleaning device is in
place.
6.25. Refer to Exercise 6.24. The random variable Y = X1 – X2 represents the amount by
which the weight of emitted pollutant can be reduced by using the cleaning device.
a. Find the mean and variance of Y.
b. Find an interval within which values of Y should lie at least 75% of the time.
6.26. Let X1 and X2 denote the proportions of two different chemicals found in a sample
mixture of chemicals used as an insecticide. Suppose that X1 and X2 have a joint
probability density given by
f(x1, x2) = 2, 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1, 0 ≤ x1 + x2 ≤ 1
            0, elsewhere
a. Find the mean of X1.
b. Find the variance of X1.
6.27. Refer to Exercise 6.26.
a. Find the mean of X2.
b. Find the variance of X2.
6.28. Refer to Exercise 6.26.
a. Find the covariance between X1 and X2.
b. Find the correlation between X1 and X2.
6.29. The proportions X1 and X2 of two chemicals found in samples in an insecticide
have the joint probability density function
f(x1, x2) = 2, for 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1, 0 ≤ x1 + x2 ≤ 1
            0, elsewhere
The random variable Y = X1 + X2 denotes the proportion of the insecticides due to the
combination of both chemicals.
a. Find the mean and variance of the proportion of the first chemical (X1) used in the
insecticide.
b. Find the mean and variance of the proportion of the insecticides due to the
combination of both chemicals.
c. Find an interval within which values of Y should lie for at least 50% of the samples of
insecticide.
6.30. For a sheet-metal stamping machine in a certain factory, the time between failures, X1, has a mean time between failures (MTBF) of 56 hours and a variance of 16 hours. The repair time, X2, has a mean time to repair (MTTR) of 5 hours and a variance of 4 hours.
If X1 and X2 are independent, find the expected value and the variance of Y = X1 + X2,
which represents one operation-repair cycle.
6.31. A particular fast-food outlet is interested in the joint behavior of the random
variable Y1, which is defined as the total time, in minutes, between a customer’s arrival at
the fast-food outlet and his or her leaving the service window, and Y2, the time that the
customer waits in line before reaching the service window. Because Y1 includes the time a customer waits in line, we must have Y1 ≥ Y2. The relative frequency distribution of observed values of Y1 and Y2 can be modeled by the probability density function

f(y1, y2) = e^(−y1), for 0 ≤ y2 ≤ y1 < ∞
            0,       elsewhere

a. Find P(Y1 < 2, Y2 > 1).
b. Find P(Y1 ≥ 2Y2).
c. Find P(Y1 − Y2 ≥ 1). [Note: Y1 − Y2 denotes the time spent at the service window.]
d. Find the marginal density functions for Y1 and Y2.
6.32. For the situation described in Exercise 6.31, the random variable Y1 - Y2 represents
the time spent at the service window.
a. Find E(Y1 - Y2)
b. Find V(Y1 - Y2).
c. Is it highly likely that a customer will spend more than 2 minutes at the service
window?
6.33. Prove Theorem 6.1.
6.5 Conditional Expectations
Section 6.2 contains a discussion of conditional probability functions and conditional
density functions, which we shall now relate to conditional expectations. Conditional
expectations are defined in the same manner as univariate expectations are, except that
the conditional density is used in place of the marginal density function.
Definition 6.10. If X1 and X2 are any two random variables, the conditional expectation of X1, given that X2 = x2, is defined to be

E(X1 | X2 = x2) = ∫_{−∞}^{∞} x1 f(x1 | x2) dx1

if X1 and X2 are jointly continuous, and

E(X1 | X2 = x2) = Σ_{x1} x1 p(x1 | x2)

if X1 and X2 are jointly discrete.
Example 6.14:
Refer to Example 6.7, with X1 denoting the amount of drink sold, X2 denoting the amount
in supply, and
f(x1, x2) = 1/2, 0 ≤ x1 ≤ x2, 0 ≤ x2 ≤ 2
            0,   elsewhere
Find the conditional expectation of amount of sales X1, given that X2 = 1.
Solution:
In Example 6.7, we found that

f(x1 | x2) = 1/x2, 0 ≤ x1 ≤ x2 ≤ 2
             0,    elsewhere

Thus, from Definition 6.10,

E(X1 | X2 = 1) = ∫_{−∞}^{∞} x1 f(x1 | x2 = 1) dx1
= ∫_0^1 x1 (1) dx1
= x1²/2 ]_0^1
= 1/2

It follows that, if the soft-drink machine contains 1 gallon at the start of the day, the expected amount that will be sold that day is ½ gallon.
It is important to note that we could solve this problem more generally. That is, given that the amount in supply for a given day is X2 = x2, what is the expected amount that will be sold that day?
E(X1 | X2 = x2) = ∫_{−∞}^{∞} x1 f(x1 | x2) dx1
= ∫_0^{x2} x1 (1/x2) dx1
= x1²/(2x2) ]_0^{x2}
= x2/2,   0 ≤ x2 ≤ 2
Notice that, if X2 = 1, meaning that the soft-drink machine contains 1 gallon at the start of
the day, the expected amount sold that day is still ½ gallon. However, we can also easily
determine that, if the soft-drink machine contains the full 2 gallons at the start of the day,
the expected amount sold is 2/2 = 1 gallon. Similarly, we can determine the expected
amount sold given any initial amount in the machine.
Suppose we kept records over many days of the gallons in the machine at the start of the day and the amount sold. We could plot the amount sold against the amount available at the start of the day. The points of the scatter plot should lie roughly along a line. If we fit a least-squares line to these points, the intercept should be 0, and the slope should be about ½. In fact, we would be estimating E(X1 | X2 = x2) = x2/2. Of course, here we know enough to obtain the exact relationship between X1 and X2. In real life, this is often not the case, and regression is used to obtain estimates of the very things we have derived exactly here.
_______________________________________________________________________
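That connection to regression can be seen in a simulation; the sketch below draws (supply, sales) pairs uniformly from the triangle and fits a least-squares line by the usual closed-form formulas:

    # Least-squares fit of amount sold (x1) on supply (x2) for Example 6.14.
    import random

    pairs = []
    while len(pairs) < 10_000:
        x1, x2 = 2 * random.random(), 2 * random.random()
        if x1 <= x2:                  # keep points in the triangle 0 <= x1 <= x2 <= 2
            pairs.append((x2, x1))    # (predictor, response)

    n = len(pairs)
    sx = sum(x for x, _ in pairs)
    sy = sum(y for _, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    sxy = sum(x * y for x, y in pairs)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    print(round(slope, 2), round(intercept, 2))   # near 0.5 and 0.0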
In Example 6.14, we found the conditional expectation of X1, given X 2 = x2 ,
obtaining the conditional expectation as being a function of the random variable X2.
Hence, we can find the expected value of the conditional expectation. The results of this
type of iterated expectation are given in Theorem 6.3.
Theorem 6.3. Let X1 and X2 denote random variables. Then

E(X1) = E[E(X1 | X2)]

where, on the right-hand side, the inside expectation is taken with respect to the conditional distribution of X1 given X2, and the outside expectation is taken with respect to the distribution of X2.

Proof:
Let X1 and X2 have joint density function f(x1, x2) and marginal densities f1(x1) and f2(x2), respectively. Then

E(X1) = ∫_{−∞}^{∞} x1 f1(x1) dx1
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} x1 f(x1, x2) dx1 dx2
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} x1 f(x1 | x2) f2(x2) dx1 dx2
= ∫_{−∞}^{∞} [∫_{−∞}^{∞} x1 f(x1 | x2) dx1] f2(x2) dx2
= ∫_{−∞}^{∞} E(X1 | X2 = x2) f2(x2) dx2
= E[E(X1 | X2)]

The proof is similar for the discrete case.
Conditional expectations can also be used to find the variance of a random variable, in
the spirit of Theorem 6.3. The main result is given in Theorem 6.4.
Theorem 6.4. For random variables X1 and X2,

V(X1) = E[V(X1 | X2)] + V[E(X1 | X2)]

Proof:
We begin with E[V(X1 | X2)] and write

E[V(X1 | X2)] = E{E(X1² | X2) − [E(X1 | X2)]²}
= E[E(X1² | X2)] − E{[E(X1 | X2)]²}
= E(X1²) − E{[E(X1 | X2)]²}
= E(X1²) − [E(X1)]² − E{[E(X1 | X2)]²} + [E(X1)]²
= V(X1) − (E{[E(X1 | X2)]²} − {E[E(X1 | X2)]}²)
= V(X1) − V[E(X1 | X2)]

where the next-to-last step uses E[E(X1 | X2)] = E(X1), from Theorem 6.3. Rearranging terms, we have

V(X1) = E[V(X1 | X2)] + V[E(X1 | X2)]
Example 6.15:
A quality control plan for an assembly line involves sampling n finished items per day
and counting Y, the number of defective items. If p denotes the probability of observing a
defective item, then Y has (approximately) a binomial distribution when the number of items produced by
the line is large. However, p varies from day to day and is assumed to have a uniform
distribution on the interval from 0 to ¼.
1. Find the expected value of Y for any given day.
2. Find the standard deviation of Y.
3. Given n = 10 items are sampled, find the expected value and standard deviation of the
number of defectives for that day.
Solution:
1. From Theorem 6.3, we know that
E (Y ) = E [E (Y | p )]
For a given p, Y has a binomial distribution; hence,
E (Y | p ) = np
Thus,

E(Y) = E(np) = n E(p) = n ∫_0^{1/4} 4p dp = n(1/8)

And for n = 10,

E(Y) = 10/8 = 5/4

2. From Theorem 6.4,
V(Y) = E[V(Y | p)] + V[E(Y | p)]
     = E[np(1 − p)] + V(np)
     = n E(p) − n E(p^2) + n^2 V(p)
     = n E(p) − n V(p) − n [E(p)]^2 + n^2 V(p)
     = n E(p)[1 − E(p)] + n(n − 1) V(p)

The standard deviation, of course, is simply √V(Y) = √( n E(p)[1 − E(p)] + n(n − 1) V(p) ).
Using the properties of the uniform distribution, we have E(p) = 1/8 and
V(p) = (1/4)^2 / 12 = 1/192. Thus, for n = 10,

V(Y) = 10 (1/8)(7/8) + 10(9)(1/192) = 1.56

and

√V(Y) = 1.25
3. In the long run, if n = 10 items are inspected each day, this inspection policy should
discover, on average, 5/4 defective items per day, with a standard deviation of 1.25. The
calculations should be checked by actually finding the unconditional distribution of Y and
computing E(Y) and V(Y) directly.
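One such check, short of deriving the unconditional distribution, is simulation. The Python sketch below (the seed and sample size are arbitrary) draws p from the uniform distribution on (0, 1/4) and then Y | p from the binomial distribution; it should reproduce a mean and a standard deviation of about 1.25.

import numpy as np

rng = np.random.default_rng(2)
p = rng.uniform(0.0, 0.25, size=1_000_000)   # daily defect probability
y = rng.binomial(10, p)                      # defectives among n = 10 sampled items

print(y.mean(), y.std())                     # both should be near 5/4 = 1.25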
________________________________________________________________________
Exercises
6.34. Let X1 and X2 denote the proportions of time, out of one workweek, that employees
I and II, respectively, actually spend performing their assigned tasks. The joint relative
frequency behavior of X1 and X2 is modeled by the probability density function
f(x1, x2) = x1 + x2,  0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1
          = 0,        elsewhere
a. If employee I works half of the time, what is the mean time employee II works?
b. If employee I works half of the time, what is the variance of the time employee II
works?
6.35. An environmental engineer measures the amount (by weight) of particulate
pollution in air samples (of a certain volume) collected over the smokestack of a coal
fueled power plant. Let X1 denote the amount of pollutant per sample when a certain
cleaning device on the stack is not operating, and let X2 denote the amount of pollutants
per sample when the cleaning device is operating under similar environmental conditions.
The relative frequency of (X1, X2) can be modeled by
f(x1, x2) = 1,  0 ≤ x1 ≤ 2, 0 ≤ x2 ≤ 1, 2 x2 ≤ x1
          = 0,  elsewhere
a. If the amount of pollutant is 0.5 when the cleaning device is operating, what is the
mean of the pollutant emitted when the cleaning device is not operating?
b. If the amount of pollutant is 0.5 when the cleaning device is operating, what is the
variance of the pollutant emitted when the cleaning device is not operating?
6.36. Refer to exercise 6.35.
a. If the amount of pollutant is 0.5 when the cleaning device is not operating, what is the
mean of the pollutant emitted when the cleaning device is operating?
b. If the amount of pollutant is 0.5 when the cleaning device is not operating, what is the
variance of the pollutant emitted when the cleaning device is operating?
6.37. Referring to Exercise 6.31, if a randomly observed customer’s total time between
arrival and leaving the service window is known to have been 2 minutes, find the
probability that the customer waited less than 1 minute to be served.
6.38. Referring to Exercise 6.31, suppose that a customer spends a length of time y1 in
the fast-food outlet. Find the probability that this customer spends less than half of that
time at the service window.
6.39. Suppose that Y is the random variable denoting the number of bacteria per cubic
centimeter in water samples and that, for a given location, Y has a Poisson distribution
with mean λ. But λ varies from location to location and has a gamma distribution with
parameters α and β. Find the expressions for E(Y) and V(Y).
6.40. The number of eggs N found in nests of a certain species of turtle has a Poisson
distribution with mean λ. Each egg has probability p of being viable, and this event is
independent from egg to egg. Find the mean and the variance of the number of viable
eggs per nest.
6.41. Let Y have a geometric distribution; that is, let

p(y) = p(1 − p)^y,  y = 0, 1, ...

The goal is to find V(Y) through a conditional argument. Let X = 1 if the first trial is a
success, and let X = 0 if the first trial is a failure.
a. Argue that

E(Y^2 | X = 1) = 0  and  E(Y^2 | X = 0) = E[(1 + Y)^2]

b. Write E(Y^2) in terms of E(Y^2 | X).
c. Simplify the expression in part (b) to show that

E(Y^2) = (1 − p)(2 − p) / p^2

d. Show that

V(Y) = (1 − p) / p^2
6.6 The Multinomial Distribution
Suppose that an experiment consists of n independent trials, much like the binomial case,
except that each trial can result in any one of k possible outcomes. For example, a
customer checking out of a grocery store may choose any one of k checkout counters.
Now suppose that the probability that a particular trial results in outcome i is denoted by
pi, for i = 1, 2, …, k, and let Yi denote the number of the n trials that result in outcome i.
developing a formula for P(Y1 = y1 , Y2 = y 2 , … , Yk = y k ) , we first observe that, because
of independence of trials, the probability of having y1 outcomes of type 1 through yk
outcomes of type k in a particular order is
p1^{y1} p2^{y2} ⋯ pk^{yk}

The number of such orderings equals the number of ways to partition the n trials into y1
type 1 outcomes, y2 type 2 outcomes, and so on through yk type k outcomes, or

n! / (y1! y2! ⋯ yk!)

where

Σ_{i=1}^{k} yi = n

Hence,

P(Y1 = y1, Y2 = y2, …, Yk = yk) = [n! / (y1! y2! ⋯ yk!)] p1^{y1} p2^{y2} ⋯ pk^{yk}
This is called the multinomial probability distribution. Notice that, for k = 2, we are back
to the binomial case, because Y2 is then equal to n − Y1. The following example
illustrates the computations involved.
Example 6.16:
Items under inspection are subject to two types of defects. About 70% of the items in a
large lot are judged to be free of defects; another 20% have only a type A defect; and the
final 10% have only a type B defect (none has both types of defect). If six of these items
are randomly selected from the lot, find the probability that three have no defects, one has
a type A defect, and two have type B defects.
Solution:
If we can legitimately assume that the outcomes are independent from trial to trial (that is,
from item to item in our sample), which they usually would be in a large lot, then the
multinomial distribution provides a useful model. Letting Y1, Y2, and Y3 denote the
number of trials resulting in zero, type A, and type B defectives, respectively, we have p1
= 0.7, p2 = 0.2, and p3 = 0.1. It follows that

P(Y1 = 3, Y2 = 1, Y3 = 2) = [6! / (3! 1! 2!)] (0.7)^3 (0.2)(0.1)^2 = 0.041
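For readers who want to confirm such arithmetic, scipy (if available) evaluates the multinomial probability directly; the counts, n, and cell probabilities below are those of this example.

from scipy.stats import multinomial

# P(Y1 = 3, Y2 = 1, Y3 = 2) for n = 6 trials with p = (0.7, 0.2, 0.1).
print(multinomial.pmf([3, 1, 2], n=6, p=[0.7, 0.2, 0.1]))   # about 0.041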
Example 6.17:
Find E(Yi) and V(Yi) for the multinomial probability distribution.
Solution:
We are concerned with the marginal distribution of Yi, the number of trials that fall in cell
i. Imagine that all the cells, excluding cell i, are combined into a single large cell. Hence,
every trial will result either in cell i or not in cell i, with probabilities pi and 1 – pi,
respectively; and Yi possesses a binomial marginal probability distribution. Consequently,
E (Yi ) = npi
V (Yi ) = npi qi
where
qi = 1 − pi
The same results can be obtained by setting up the expectations and evaluating. For
example,

E(Yi) = Σ_{y1} Σ_{y2} ⋯ Σ_{yk} yi [n! / (y1! y2! ⋯ yk!)] p1^{y1} p2^{y2} ⋯ pk^{yk}
Because we have already derived the expected value and the variance of Yi, the tedious
summation of this expectation is left to the interested reader.
Example 6.18:
If Y1, Y2, …, Yk have the multinomial distribution given in Example 6.16, find cov(Ys,Yt),
with s ≠ t.
Solution:
Thinking of the multinomial experiment as a sequence of n independent trials, we define

Ui = 1, if trial i results in class s
   = 0, otherwise

and

Wi = 1, if trial i results in class t
   = 0, otherwise

Then

Ys = Σ_{i=1}^{n} Ui  and  Yt = Σ_{j=1}^{n} Wj
To evaluate cov(Ys, Yt), we need the following results:

E(Ui) = ps,  E(Wj) = pt

cov(Ui, Wj) = 0,  if i ≠ j, since the trials are independent; and

cov(Ui, Wi) = E(UiWi) − E(Ui)E(Wi) = 0 − ps pt

because UiWi always equals zero (a single trial cannot fall in both class s and class t).
From Theorem 6.2, we then have

cov(Ys, Yt) = Σ_{i=1}^{n} Σ_{j=1}^{n} cov(Ui, Wj)
            = Σ_{i=1}^{n} cov(Ui, Wi) + ΣΣ_{i≠j} cov(Ui, Wj)
            = Σ_{i=1}^{n} (−ps pt) + 0
            = −n ps pt
The covariance is negative, which is to be expected because a large number of outcomes
in cell s would force the number in cell t to be small.
Multinomial Experiment
1. The experiment consists of n identical trials.
2. The outcome of each trial falls into one of k classes or cells.
3. The probability that the outcome of a single trial will fall in a particular cell—
say, cell i—is pi (i = 1, 2, …, k), and remains the same from trial to trial.
Notice that p1 + p2 + ⋯ + pk = 1.
4. The trials are independent.
5. The random variables of interest are Y1 , Y2 , …, Yk , where Yi (i = 1, 2, …, k) is
equal to the number of trials in which the outcome falls in cell i. Notice that
Y1 + Y2 + ⋯ + Yk = n
The Multinomial Distribution
P(Y1 = y1, Y2 = y2, …, Yk = yk) = [n! / (y1! y2! ⋯ yk!)] p1^{y1} p2^{y2} ⋯ pk^{yk}

where Σ_{i=1}^{k} yi = n and Σ_{i=1}^{k} pi = 1

E(Yi) = n pi,  V(Yi) = n pi (1 − pi),  i = 1, 2, …, k

cov(Yi, Yj) = −n pi pj,  i ≠ j
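These boxed formulas are easy to check empirically. The sketch below uses numpy's multinomial sampler with arbitrarily chosen n and cell probabilities; each printed pair should agree to simulation accuracy.

import numpy as np

n, p = 20, np.array([0.5, 0.3, 0.2])
rng = np.random.default_rng(3)
y = rng.multinomial(n, p, size=200_000)   # each row is (Y1, Y2, Y3)

print(y.mean(axis=0), n * p)                               # E(Yi) = n pi
print(y.var(axis=0), n * p * (1 - p))                      # V(Yi) = n pi (1 - pi)
print(np.cov(y[:, 0], y[:, 1])[0, 1], -n * p[0] * p[1])    # cov(Y1, Y2) = -n p1 p2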
Exercises
6.42. The National Fire Incident Reporting Service says that, among residential fires,
approximately 74% are in one- or two-family homes, 20% are in multi-family homes
(such as apartments), and the other 6% are in other types of dwellings. If five fires are
reported independently in one day, find the probability that 2 are in one- or two-family
homes, 2 are in multi-family homes, and 1 is in another type of dwelling.
6.43. The typical cost of damages for a fire in a family home is $50,000, whereas the
typical cost of damage in an apartment fire is $20,000, and in other dwellings only $5000.
Using the information given in Exercise 6.42, find the expected total damage cost for the
five independently reported fires.
6.44. In inspections of aircraft, wing cracks are reported as nonexistent, detectable, or
critical. The history of a certain fleet reveals that 70% of the planes inspected have no
wing cracks, 25% have detectable wing cracks, and 5% have critical wing cracks. For the
next four planes inspected, find the probabilities of the following events.
a. One has a critical crack, one has a detectable crack, and two have no cracks.
b. At least one critical crack is observed.
6.45. Of the 544,685 babies born in California in 2004, 24% were born in winter
(December, January, February), 24% were born in spring (March, April, May), 26% were
born in summer (June, July, August), and 26% were born in the fall (September, October,
November). If four babies are randomly selected, what is the approximate probability
that one was born in each season? Does it matter whether the babies were born in 2005?
In California?
6.46. The U.S. Bureau of Labor Statistics reported that, as of 2005, 19% of the adult
population under age 65 was between 16 and 24 years of age, 20% was between 25 and
34, 23% was between 35 and 44, and 38% was between 45 and 64. An automobile
manufacturer wants to obtain opinions on a new design from five randomly chosen adults
from the under 65 group of adults. Of the five people so selected, find the approximate
probability that two will be between 16 and 24, two will be between 25 and 44, and one
will be between 45 and 64.
6.47. Customers leaving a subway station can exit through any one of three gates.
Assuming that any particular customer is equally likely to select any one of the three
gates, find the probabilities of the following events among a sample of four customers.
a. Two select gate A, one selects gate B, and one selects gate C.
b. All four select the same gate.
c. All three gates are used.
6.48. Among a large number of applicants for a certain position, 60% have only a high-school education, 30% have some college training, and 10% have completed a college
degree. If five applicants are selected to be interviewed, find the probability that at least
one will have completed a college degree. What assumptions must be true for your
answer to be valid?
6.49. In a large lot of manufactured items, 10% contain exactly one defect, and 5%
contain more than one defect. Ten items are randomly selected from this lot for sale, and
the repair costs total
Y1 + 3Y2
where Y1 denotes the number among the ten having exactly one defect, and Y2 denotes the
number among the ten having two or more defects. Find the expected value of the repair
costs. Find the variance of the repair costs.
6.50. Referring to Exercise 6.49, if Y denotes the number of items that contain at least
one defect, find the probabilities of the following events among the ten sampled items.
a. Y is exactly 2
b. Y is at least 1
6.51. Vehicles arriving at an intersection can turn right or left or can continue straight
ahead. In a study of traffic patterns at this intersection over a long period of time,
engineers have noted that 40% of the vehicles turn left, 25% turn right, and the remainder
continue straight ahead.
a. For the next five cars entering this intersection, find the probability that one turns left,
one turns right, and three continue straight ahead.
b. For the next five cars entering the intersection, find the probability that at least one
turns right.
c. If 100 cars enter the intersection in a day, find the expected value and the variance of
the number that turn left. What assumptions must be true for your answer to be valid?
6.7 More on the Moment-Generating Function
The use of moment-generating functions to identify distributions of random variables is
particularly useful in work with sums of independent random variables. Suppose that X1
and X2 are independent random variables, and let Y = X 1 + X 2 . Using the properties of
expectations, we have
M_Y(t) = E(e^{tY}) = E(e^{t(X1 + X2)}) = E(e^{tX1} e^{tX2}) = E(e^{tX1}) E(e^{tX2}) = M_{X1}(t) M_{X2}(t)
That is, the moment generating function of the sum of two independent random variables
is the product of the moment-generating functions of the two variables. This result can
be extended to the sum of n independent random variables; that is, if X 1 , X 2 ,…, X n are n
independent random variables, the moment generating function of
Y = X 1 + X 2 + + X n is the product of the n moment generating functions:
M Y (t ) = M X 1 (t ) M X 2 (t )
M X n (t )
.
If we can then identify the form of the moment-generating function, we can determine the
distribution of Y, as illustrated in the next example.
Example 6.19:
Suppose that X1 and X2 are independent, exponential random variables, each with mean θ.
Define Y = X 1 + X 2 . Find the distribution of Y.
Solution:
The moment-generating functions of X1 and X2 are

M_{X1}(t) = M_{X2}(t) = (1 − θt)^{−1}

Thus, the moment-generating function of Y is

M_Y(t) = M_{X1}(t) M_{X2}(t) = (1 − θt)^{−1} (1 − θt)^{−1} = (1 − θt)^{−2}
Upon recognizing the form of this moment-generating function, we can immediately
conclude that Y has a gamma distribution, with α = 2 and β = θ. This result can be
generalized to the sum of independent gamma random variables with common scale
parameter β.
Note: A key to this method is identifying the moment-generating function of the derived
random variable Y.
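A simulation can corroborate the identification. In the sketch below (numpy and scipy assumed available; θ = 3 is arbitrary), sums of two independent exponential samples are compared to the gamma distribution with α = 2 and β = θ using a Kolmogorov-Smirnov test; a large p-value is consistent with the moment-generating-function argument.

import numpy as np
from scipy import stats

theta = 3.0
rng = np.random.default_rng(4)
y = rng.exponential(theta, size=(100_000, 2)).sum(axis=1)   # X1 + X2

# Compare to gamma with shape 2, location 0, scale theta.
print(stats.kstest(y, "gamma", args=(2, 0, theta)))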
Example 6.20:
Let X1 denote the number of vehicles passing a particular point on the eastbound lane of a
highway in one hour. Suppose that the Poisson distribution with mean λ1 is a reasonable
model for X1. Now, let X2 denote the number of vehicles passing a point on the
westbound lane of the same highway in 1 hour. Suppose that X2 has a Poisson
distribution with mean λ2. Of interest is Y = X1 + X2, the total traffic count in both lanes
in one hour. Find the probability distribution for Y if X1 and X2 are assumed to be
independent.
Solution:
It is known from Chapter 4 that

M_{X1}(t) = e^{λ1(e^t − 1)}

and

M_{X2}(t) = e^{λ2(e^t − 1)}

By the properties of moment-generating functions described earlier,

M_Y(t) = M_{X1}(t) M_{X2}(t) = e^{λ1(e^t − 1)} e^{λ2(e^t − 1)} = e^{(λ1 + λ2)(e^t − 1)}

Now, the moment-generating function for Y has the form of a Poisson moment-generating
function, with mean λ1 + λ2. Thus, by the uniqueness property, Y must have a
Poisson distribution with mean λ1 + λ2. Being able to add independent Poisson random
variables and still retain the Poisson properties is important in many applications.
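The result can also be verified exactly, without simulation: convolving the two Poisson probability functions must reproduce the Poisson(λ1 + λ2) probabilities. The sketch below (scipy assumed available; the rates are arbitrary) does just that.

import numpy as np
from scipy.stats import poisson

lam1, lam2 = 2.0, 3.0
k = np.arange(60)   # support truncated where the remaining mass is negligible
conv = np.convolve(poisson.pmf(k, lam1), poisson.pmf(k, lam2))[: len(k)]

# Maximum discrepancy from the Poisson(lam1 + lam2) pmf; should be near 1e-16.
print(np.abs(conv - poisson.pmf(k, lam1 + lam2)).max())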
Exercises
6.52. A quality control inspector randomly selects n1 items from the production line in the
morning and determines the number of defectives, X1. She randomly selects n2 items
from the same production line in the afternoon and determines the number of defectives,
X2.
a. Find the distribution of the number of defectives observed on a randomly selected day.
What assumptions did you have to make?
b. If n1 = 5 and n2 =4, what is the probability of not obtaining any defectives if the line is
producing 1% defectives?
6.53. Let X1 and X2 denote independent, normally distributed random variables, not
necessarily having the same mean and variance.
a. Show that, for any constants a and b, Y = aX 1 + bX 2 is normally distributed.
b. Use the moment generating function to find the mean and variance of Y.
6.54. Resistors of a certain type have resistances that are normally distributed, with a
mean of 100 ohms and a standard deviation of 10 ohms. Two such resistors are
connected in series, which causes the total resistance of the circuit to be the sum of the
individual resistances. Find the probabilities of the following events.
a. The total resistance exceeds 220 ohms.
b. The total resistance is less than 190 ohms.
6.55. A certain type of elevator has a maximum weight capacity X1, which is normally
distributed with a mean and a standard deviation of 5000 and 300 pounds, respectively.
For a certain building equipped with this type of elevator, the elevator loading X2 is a
normally distributed random variable with a mean and a standard deviation of 4000 and
400 pounds, respectively. For any given time that the elevator is in use, find the
probability that it will be overloaded, assuming that X1 and X2 are independent.
6.8 Compounding and its Applications
The univariate probability distributions of Chapters 4 and 5 depend on one or more
parameters; once the parameters are known, the distributions are specified completely.
These parameters frequently are unknown, however, and (as in Example 6.15) sometimes
may be regarded as random quantities. Assigning distributions to these parameters and
then finding the marginal distribution of the original random variable is known as
compounding. This process has theoretical as well as practical uses, as we see next.
Example 6.21:
Suppose that Y denotes the number of bacteria per cubic centimeter in a certain liquid and
that, for a given location, Y has a Poisson distribution with mean λ. Suppose, too, that λ
varies from location to location and that, for a location chosen at random, λ has a gamma
distribution with parameters α and β, where α is a positive integer. Find the probability
distribution for the bacteria count Y at a randomly selected location.
Solution:
Because λ is random, the Poisson assumption applies to the conditional distribution of Y
for fixed λ. Thus,
p(y | λ) = λ^y e^{−λ} / y!,  y = 0, 1, 2, ...
Also,
f(λ) = [1 / (Γ(α) β^α)] λ^{α−1} e^{−λ/β},  λ > 0
     = 0,                                  otherwise
Then the joint distribution of λ and Y is given by

g(y, λ) = p(y | λ) f(λ) = [1 / (y! Γ(α) β^α)] λ^{y+α−1} e^{−λ[1 + (1/β)]}

The marginal distribution of Y is found by integrating over λ; this yields

p(y) = [1 / (y! Γ(α) β^α)] ∫_0^∞ λ^{y+α−1} e^{−λ[1 + (1/β)]} dλ
     = [1 / (y! Γ(α) β^α)] Γ(y + α) (1 + 1/β)^{−(y+α)}
Since α is an integer,

p(y) = [(y + α − 1)! / ((α − 1)! y!)] (1/β)^α (1 + 1/β)^{−(y+α)}
     = [(y + α − 1)! / ((α − 1)! y!)] (1 / (1 + β))^α (β / (1 + β))^y,  y = 0, 1, 2, ...
If we let α = r and 1/(1 + β) = p, then p(y) has the form of a negative binomial distribution.
Hence, the negative binomial distribution is a reasonable model for counts in which the
mean count may vary.
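The identification can be checked numerically. In the sketch below (numpy and scipy assumed available; α = 3 and β = 2 are arbitrary), counts generated by the two-stage gamma-Poisson scheme are compared with the negative binomial probabilities with r = α and p = 1/(1 + β).

import numpy as np
from scipy.stats import nbinom

alpha, beta = 3, 2.0
rng = np.random.default_rng(5)
lam = rng.gamma(alpha, beta, size=500_000)   # location-to-location mean counts
y = rng.poisson(lam)                         # counts at randomly chosen locations

k = np.arange(5)
print(np.array([(y == v).mean() for v in k]))      # simulated relative frequencies
print(nbinom.pmf(k, alpha, 1.0 / (1.0 + beta)))    # negative binomial probabilities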
Exercises
6.56. Suppose that a customer arrives at a checkout counter in a store just as the counter
is opening. A random number of customers N will be ahead of him, since some
customers may arrive early. Suppose that this number has the probability distribution
p(n) = P(N = n) = p q^n,  n = 0, 1, 2, ...
where 0 < p < 1 and q = 1 – p (this is a form of the geometric distribution). Customer
service times are assumed to be independent and identically distributed exponential
random variables, with a mean of θ.
a. For a given n, find the waiting time W for the customer to complete his checkout.
b. Find the distribution of waiting time for the customer to complete his checkout.
c. Find the mean and variance of the waiting time for the customer to complete his
checkout.
6.9 Summary
Most phenomena that one studies, such as the effect of a drug on the body or the life
length of a computer system, are the result of many variables acting jointly. In such
studies, it is important to look not only at the marginal distributions of random variables
individually, but also at the joint distribution of the variables acting together and at the
conditional distributions of one variable for fixed values of another. Expected values of
joint distributions lead to covariance and correlation, which help assess the direction and
strength of association between two random variables. Moment-generating functions
help identify distributions as being of a certain type (more on this in the next chapter) and
help in the calculation of expected values.
As in the univariate case, multivariate probability distributions are models of
reality and do not precisely fit real situations. Nevertheless, they serve as very useful
approximations that help build understanding of the world around us.
Supplementary Exercises
6.57. Consider the joint probability density function given below:

f(x1, x2) = c x1 x2,  0 ≤ x2 ≤ x1, 0 ≤ x1 ≤ 2
          = 0,        otherwise

a. Determine the constant c that makes f a legitimate joint probability density function.
b. Find the marginal density of X1.
c. Find the marginal density of X2.
d. Find the joint distribution function for X1 and X2.
e. Find the probability P(X1 < 1/2, X2 < 3/4).
f. Find the probability P(X1 ≤ 1/2 | X2 > 3/4).
6.58. Refer to Exercise 6.57.
a. Find the conditional density of X2 given X1.
b. Find the expected value of X2 given X1.
c. Are X1 and X2 independent?
6.59. The joint density of X1 and X2 is given by
f(x1, x2) = 3 x1,  0 ≤ x2 ≤ x1 ≤ 1
          = 0,     otherwise
a. Find the marginal density functions of X1 and X2.
b. Find the joint distribution function.
c. Find P( X 1 ≤ 3 / 4, X 2 ≤ 1 / 2).
d. Find P( X 1 ≤ 1 / 2 | X 2 ≥ 3 / 4).
6.60. Refer to the situation described in Exercise 6.59.
a. Find the conditional density of X1, given X2 = x2.
b. Find the conditional density of X2, given X1 = x1.
c. Are X1 and X2 independent? Justify your answer.
d. Find the probability P( X 1 ≤ 3 / 4 | X 2 = 1 / 2).
6.61. A committee of three persons is to be randomly selected from a group consisting of
four Republicans, three Democrats, and two independents. Let X1 denote the number of
Republicans on the committee, and let X2 denote the number of Democrats on the
committee.
a. Find the joint probability distribution of X1 and X2.
b. Find the marginal distributions of X1 and X2.
c. Find the probability P( X 1 = 1 | X 2 ≥ 1).
6.62. The joint density of X1 and X2 is given by
f(x1, x2) = 6 x1^2 x2,  0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1
          = 0,          otherwise
a. Find the mean and variance of X1.
b. Find the mean and variance of X2.
6.63. Refer to Exercise 6.62.
a. Find the covariance of X1 and X2.
b. Find the correlation between X1 and X2.
6.64. Let X1 denote the amount of a certain bulk item stocked by a supplier at the
beginning of a week, and suppose that X1 has a uniform distribution over the interval 0 ≤
x1 ≤ 1. Let X2 denote the amount of this item sold by the supplier during the week, and
suppose that X2 has a uniform distribution over the interval 0 ≤ x2 ≤ x1, where x1 is a
specific value of X1.
a. Find the joint density function of X1 and X2.
b. If the supplier stocks an amount of the item of ½, what is the probability that she sells
an amount greater than ¼?
6.65. Let X and Y have the joint probability density function given by
f(x, y) = 3y,  0 ≤ x ≤ y, 0 ≤ y ≤ 1
        = 0,   otherwise

a. Find the marginal density of X.
b. Find the marginal density of Y.
c. Find the conditional density of Y given X = x.
d. Find the expected value of Y given X = x.
e. Are X and Y independent? Justify your answer.
6.66. Let X and Y have joint density given by
f(x, y) = 2,  0 ≤ x ≤ y, 0 ≤ y ≤ 1
        = 0,  otherwise
a. Find the covariance of X and Y.
b. Find the correlation coefficient. What, if anything, does it tell you about the
relationship of X and Y?
6.67. Let (X1, X2) denote the coordinates of a point at random inside a unit circle, with
center at the origin; that is, X1 and X2 have a joint density function given by
f(x1, x2) = 1/π,  x1^2 + x2^2 ≤ 1
          = 0,    otherwise
a. Find the marginal density function of X1.
b. Find P( X 1 ≤ X 2 )
6.68. The joint density of X1 and X2 is given by
f(x1, x2) = x1 + x2,  0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1
          = 0,        otherwise
a. Find the marginal density functions of X1 and X2.
b. Are X1 and X2 independent? Justify your answer.
c. Find the conditional density of X1 given X2 = x2.
6.69. Let X1 and X2 have the joint density function given by
f(x1, x2) = K,  0 ≤ x1 ≤ 2, 0 ≤ x2 ≤ 1, 2 x2 ≤ x1
          = 0,  otherwise

a. Find the value of K that makes the function a probability density.
b. Find the marginal density functions of X1 and X2.
c. Find the conditional density of X1 given X2 = x2.
d. Find the conditional density of X2 given X1 = x1.
e. Find P(X1 ≤ 1.5, X2 ≤ 0.5).
f. Find P(X2 ≤ 0.5 | X1 = 1.5).
6.70. Let X1 and X2 have a joint distribution that is uniform over the region shaded in the
accompanying diagram.
a. Find the marginal density of X2.
b. Find the marginal density of X1.
c. Find the conditional density function of X2 given X1.
d. Find P[(X1 − X2) ≥ 0].
6.71. Refer to the situation described in Exercise 6.70.
a. Find E(X1).
b. Find V(X1).
c. Find cov(X1, X2).
6.72. Referring to Exercises 6.57 and 6.58, find cov(X1, X2).
6.73. Refer to the situation described in Exercise 6.61.
a. Find cov(X1, X2).
b. Find E ( X 1 + X 2 ) and V ( X 1 + X 2 ) by finding the probability distribution of X 1 + X 2 .
c. Find E ( X 1 + X 2 ) and V ( X 1 + X 2 ) by using Theorem 6.2.
6.74. Refer to the situation described in Exercise 6.68.
a. Find cov(X1, X2).
b. Find E (3 X 1 − 2 X 2 ) .
c. Find V (3 X 1 − 2 X 2 ) .
6.75. Refer to the situation described in Exercise 6.69.
a. Find E ( X 1 + 3 X 2 ) .
b. Find V ( X 1 + 3 X 2 ) .
6.76. A quality control plan calls for randomly selecting three items from the daily
production (assumed to be large) of a certain machine and observing the number of
defectives. The proportion p of defectives produced by the machine varies from day to
day and is assumed to have a uniform distribution on the interval (0, 1). For a randomly
chosen day, find the unconditional probability that exactly two defectives are observed in
the sample.
6.77. The number of defects per year, denoted by X, for a certain fabric is known to have
a Poisson distribution with parameter λ. However, λ is not known and is assumed to be
random, with its probability density function given by
f(λ) = e^{−λ},  λ ≥ 0
     = 0,       otherwise
Find the unconditional probability function for X.
6.78. Prove that, if X1 and X2 are continuous, independent random variables, cov(X1, X2)
= 0.
6.79. The time that it takes any given machine of a particular type to fail is
exponentially distributed with parameter λ. However, λ varies from machine to machine
according to a gamma distribution with parameters r and θ.
a. Find the mean time that it would take for a randomly selected machine to fail.
b. Find the variance of the time that it would take for a randomly selected machine to
break down.
6.80. The length of life X of a fuse has the probability density
f(x) = (1/θ) e^{−x/θ},  x > 0, θ > 0
     = 0,               otherwise
Three such fuses operate independently. Find the joint density of their lengths of life, X1,
X2, and X3.
6.81. A retail grocer figures that his daily gain X from sales is a normally distributed
random variable, with μ = 50 and σ^2 = 10 (measurements in dollars). X could be negative
if he is forced to dispose of perishable goods. He also figures daily overhead costs Y to
have a gamma distribution, with α = 4 and β = 2. If X and Y are independent, find the
expected value and the variance of his net daily gain. Would you expect his net gain for
tomorrow to go above $70?
6.82. A coin has probability p of coming up heads when tossed. In n independent tosses
of the coin, let Xi = 1 if the ith toss results in heads and Xi = 0 if the ith toss results in tails.
Then Y, the number of heads in the n tosses, has a binomial distribution and can be
represented as
Y = Σ_{i=1}^{n} Xi
Find E(Y) and V(Y) using Theorem 6.2.
6.83. Refer to the situation and results of Exercises 6.57 and 6.58.
a. Find E ( X 2 | X 1 = x1 ) .
b. Use Theorem 6.3 to find E(X2).
c. Find E ( X 2 ) directly from the marginal density of X2.
6.84. Refer to the situation described in Exercise 6.77.
a. Find E(X) by first finding the conditional expectation of X for given λ and then using
Theorem 6.3.
b. Find E(X) directly from the probability distribution of X.
6.85. Referring to Exercise 6.64, if the supplier stocks an amount equal to 3/4, what is the
expected amount that she will sell during the week?
6.86. Among typical July days in Tampa, 30% have total radiation of at most 5 calories,
60% have total radiation of more than 5 calories but no more than 6 calories, and 10%
have total radiation of more than 6 calories. A solar collector for a hot water system is to
be run for 6 days. Find the probability that 3 days will produce no more than 5 calories
each, 1 day will produce between 5 and 6 calories, and 2 days will produce at least 6
calories each. What assumption must be true for your answer to be correct?
6.87. Let X be a continuous random variable with distribution function F(x) and density
function f(x). We can then write, for x1 ≤ x2,
P(X ≤ x2 | X ≥ x1) = [F(x2) − F(x1)] / [1 − F(x1)]
As a function of x2 for fixed x1, the right-hand side of this expression is called the
conditional distribution function of X, given that X ≥ x1. On taking the derivative with
respect to x2, we see that the corresponding conditional density function is given by
f(x2) / [1 − F(x1)],  x2 ≥ x1
Suppose that a certain type of electronic component has life length X, with the density
function (life length measured in hours)
f(x) = (1/400) e^{−x/400},  x ≥ 0
     = 0,                   otherwise
Find the expected length of life for a component of this type that has already been in use
for 100 hours.
6.88. Let X1, X2, and X3 be random variables, either continuous or discrete. The joint
moment-generating function of X1, X2, and X3 is defined by
M(t1, t2, t3) = E( e^{t1 X1 + t2 X2 + t3 X3} )
a. Show that M(t, t, t) gives the moment-generating function of X1 + X2 + X3.
b. Show that M(t, t, 0) gives the moment-generating function of X1 + X2.
c. Show that

∂^{k1+k2+k3} M(t1, t2, t3) / (∂t1^{k1} ∂t2^{k2} ∂t3^{k3}) |_{t1 = t2 = t3 = 0} = E( X1^{k1} X2^{k2} X3^{k3} )
6.89. Let X1, X2, and X3 have a multinomial distribution with probability function
p(x1, x2, x3) = [n! / (x1! x2! x3!)] p1^{x1} p2^{x2} p3^{x3},  Σ_{i=1}^{3} xi = n
Employ the results of Exercise 6.88 to perform the following tasks.
a. Find the joint moment-generating function of X1, X2, and X3.
b. Use the joint moment-generating function to find cov(X1, X2).
6.90. The negative binomial variable X is defined as the number of failures prior to
the rth success in a sequence of independent trials with constant probability p of success
on each trial. Let Xi denote a geometric random variable, defined as the number of
failures prior to the trial on which the first success occurs. Then, we can write
X = Σ_{i=1}^{r} Xi

for independent random variables X1, X2, …, Xr. Use Theorem 6.2 to show that
E(X) = r(1 − p)/p and V(X) = r(1 − p)/p^2.
6.91. A box contains four balls, numbered 1 through 4. One ball is selected at random
from this box. Let
X1 = 1 if ball 1 or ball 2 is drawn
X2 = 1 if ball 1 or ball 3 is drawn
X3 = 1 if ball 1 or ball 4 is drawn
and let the values of Xi be zero otherwise. Show that any two of the random variables X1,
X2, and X3 are independent, but the three together are not.
6.92. Let X1 and X2 be jointly distributed random variables with finite variances.
a. Show that [E(X1X2)]^2 ≤ E(X1^2) E(X2^2). (Hint: Observe that, for any real number t,
E[(tX1 − X2)^2] ≥ 0; or equivalently,

t^2 E(X1^2) − 2t E(X1X2) + E(X2^2) ≥ 0

This is a quadratic expression of the form At^2 + Bt + C; and because it is not negative,
we must have B^2 − 4AC ≤ 0. The preceding inequality follows directly.)
b. Let ρ denote the correlation coefficient of X1 and X2; that is,

ρ = cov(X1, X2) / √( V(X1) V(X2) )

Using the inequality of part (a), show that ρ^2 ≤ 1.
6.93. A box contains N1 white balls, N2 black balls, and N3 red balls ( N1 + N 2 + N 3 = N ).
A random sample of n balls is selected from the box, without replacement. Let X1, X2,
and X3 denote the number of white, black, and red balls, respectively, observed in the
sample. Find the correlation coefficient for X1 and X2. (Let pi = N i / N for i = 1, 2, 3.)
6.94. Let X1, X2, …, Xn be independent random variables with E ( X i ) = μ and
V ( X i ) = σ 2 , for i = 1, 2, …, n. Let
U1 = Σ_{i=1}^{n} ai Xi

and

U2 = Σ_{i=1}^{n} bi Xi

where a1, a2, …, an and b1, b2, …, bn are constants. U1 and U2 are said to be orthogonal if
cov(U1, U2) = 0. Show that U1 and U2 are orthogonal if and only if Σ_{i=1}^{n} ai bi = 0.
6.95. The life length X of fuses of a certain type is modeled by the exponential
distribution with
f(x) = (1/3) e^{−x/3},  x > 0
     = 0,               otherwise
The measurements are in hundreds of hours.
a. If two such fuses have independent life lengths X1 and X2, find their joint probability
density function.
b. One fuse in part (a) is in a primary system, and the other is in a backup system that
comes into use only if the primary system fails. The total effective life length of the two
fuses is then X 1 + X 2 . Find P( X 1 + X 2 ≤ 1) .
6.96. Referring to Exercise 6.95, suppose that three such fuses are operating
independently in a system.
a. Find the probability that exactly two of the three last longer than 500 hours.
b. Find the probability that at least one of the three fails before 500 hours.