4 Multivariate Distributions

4.1 Multivariate Distributions of the Discrete Type
There are many random experiments or situations that involve more than one
random variable. For example, a college admissions department might be
interested in the ACT mathematics score X and the ACT verbal score Y of
prospective students. Or manufactured items might be classified into three or
more categories: here X might represent the number of good items among n
items, Y the number of "seconds," and n − X − Y the number of defectives. Or,
in a biology laboratory, 400 kernels of corn could be classified into four categories: smooth and yellow, smooth and purple, wrinkled and yellow, and wrinkled and purple. The numbers in these four categories could be denoted by
X_1, X_2, X_3, and X_4 = 400 − X_1 − X_2 − X_3, respectively.
In order to deal with situations such as these, it will be necessary to extend
certain definitions as well as give new ones.
DEFINITION 4.1-1 Let X and Y be two functions defined on a discrete probability space. Let R denote the corresponding two-dimensional space of X and Y,
the two random variables of the discrete type. The probability that X = x and
Y = y is denoted by f(x, y) = P(X = x, Y = y), and it is induced from the discrete probability space through the functions X and Y. The function f(x, y) is
called the joint probability density function (joint p.d.f.) of X and Y and has the
following properties:

(i) 0 ≤ f(x, y) ≤ 1.

(ii) ∑∑_{(x,y)∈R} f(x, y) = 1.

(iii) P[(X, Y) ∈ A] = ∑∑_{(x,y)∈A} f(x, y), where A is a subset of the space R.
The following example will make this definition more meaningful.
Example 4.1-1 Roll a pair of unbiased dice. For each of the 36 sample
points with probability 1/36, let X denote the smaller and Y the larger
outcome on the dice. For example, if the outcome is (3, 2) then the observed
values are X = 2, Y = 3; if the outcome is (2, 2), then the observed values
are X = Y = 2. The joint p.d.f. of X and Y is given by the induced probabilities
f(x, y) = 1/36,  1 ≤ x = y ≤ 6,
f(x, y) = 2/36,  1 ≤ x < y ≤ 6,
when x and y are integers. Figure 4.1-1 depicts the probabilities of the
various points of the space R.
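Because the 36 ordered outcomes are equally likely, this joint p.d.f. can be confirmed by brute-force enumeration. The following short Python sketch (purely illustrative; the variable names are ours) tabulates the induced probabilities:

```python
from collections import Counter
from fractions import Fraction

# Enumerate the 36 equally likely ordered outcomes of the two dice and
# tabulate the induced joint p.d.f. of X = smaller and Y = larger outcome.
joint = Counter()
for d1 in range(1, 7):
    for d2 in range(1, 7):
        joint[(min(d1, d2), max(d1, d2))] += Fraction(1, 36)

# f(x, y) = 1/36 on the diagonal (x = y) and 2/36 above it (x < y).
assert all(p == (Fraction(1, 36) if x == y else Fraction(2, 36))
           for (x, y), p in joint.items())
assert sum(joint.values()) == 1
```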
Notice that certain numbers have been recorded in the bottom and left-hand margins of Figure 4.1-1. These numbers are the respective column and
row totals of the probabilities. The column totals are the respective probabilities that X will assume the values in the x space R_1 = {1, 2, 3, 4, 5, 6}, and the
row totals are the respective probabilities that Y will assume the values in the
y space R_2 = {1, 2, 3, 4, 5, 6}. That is, the totals describe the probability density
functions of X and Y, respectively. Since each collection of these probabilities
satisfies the properties of a p.d.f. of one random variable, and since each is
frequently recorded in the margins, each is called a marginal p.d.f.
DEFINITION 4.1-2 Let X and Y have the joint probability density function
f(x, y) with space R. The probability density function of X alone, called the
marginal probability density function of X, is defined by
f_1(x) = ∑_y f(x, y),  x ∈ R_1,

where the summation is taken over all possible y values for each given x in the x
space R_1. That is, the summation is over all (x, y) in R with a given x value.
Similarly, the marginal probability density function of Y is defined by

f_2(y) = ∑_x f(x, y),  y ∈ R_2,

where the summation is taken over all possible x values for each given y in the y
space R_2. The random variables X and Y are independent if and only if

f(x, y) = f_1(x) f_2(y),  x ∈ R_1, y ∈ R_2;
otherwise X and Y are said to be dependent.
Example 4.1-2
Let the joint p.d.f. of X and Y be defined by

f(x, y) = (x + y)/21,  x = 1, 2, 3,  y = 1, 2.

Then

f_1(x) = ∑_y f(x, y) = (x + 1)/21 + (x + 2)/21 = (2x + 3)/21,  x = 1, 2, 3,

and

f_2(y) = ∑_x f(x, y) = (3y + 6)/21,  y = 1, 2.
Note that both f_1(x) and f_2(y) satisfy the properties of a probability density
function. Since f(x, y) ≠ f_1(x) f_2(y), X and Y are dependent.
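These marginal sums are easy to verify mechanically. Here is a small Python check of f_1, f_2, and the dependence claim for this example, using exact fractions (a minimal sketch; names are ours):

```python
from fractions import Fraction

f = {(x, y): Fraction(x + y, 21) for x in (1, 2, 3) for y in (1, 2)}

f1 = {x: sum(f[(x, y)] for y in (1, 2)) for x in (1, 2, 3)}   # marginal of X
f2 = {y: sum(f[(x, y)] for x in (1, 2, 3)) for y in (1, 2)}   # marginal of Y

assert all(f1[x] == Fraction(2 * x + 3, 21) for x in (1, 2, 3))
assert all(f2[y] == Fraction(3 * y + 6, 21) for y in (1, 2))
# Dependence: the joint p.d.f. is not the product of the marginals.
assert any(f[(x, y)] != f1[x] * f2[y] for x in (1, 2, 3) for y in (1, 2))
```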
Example 4.1-3 Let the joint p.d.f. of X and Y be
f(x, y) = xy²/30,  x = 1, 2, 3,  y = 1, 2.

The marginal probability density functions are

f_1(x) = ∑_{y=1}^{2} xy²/30 = x/6,  x = 1, 2, 3,

and

f_2(y) = ∑_{x=1}^{3} xy²/30 = y²/5,  y = 1, 2.

Then f(x, y) ≡ f_1(x) f_2(y) for x = 1, 2, 3 and y = 1, 2; thus X and Y are
independent.
Example 4.1-4 Let the joint p.d.f. of X and Y be

f(x, y) = xy²/13,  (x, y) = (1, 1), (1, 2), (2, 2).

Then the p.d.f. of X is

f_1(x) = 5/13 for x = 1,  8/13 for x = 2,

and that of Y is

f_2(y) = 1/13 for y = 1,  12/13 for y = 2.

Thus f(x, y) ≠ f_1(x) f_2(y) for x = 1, 2 and y = 1, 2, and X and Y are dependent.
Note that in Example 4.1-4 the support R of X and Y is "triangular."
Whenever this support R is not "rectangular," the random variables must be
dependent because R cannot then equal the product set {(x, y): x ∈ R_1, y ∈ R_2}.
That is, if we observe that the support R of X and Y is not a product set,
then X and Y must be dependent. For illustration, in Example 4.1-4, X and Y
are dependent because R = {(1, 1), (1, 2), (2, 2)} is not a product set. On the
other hand, if R equals the product set {(x, y): x ∈ R_1, y ∈ R_2} and if the
formula for f(x, y) is the product of an expression in x alone and an expression
in y alone, then X and Y are independent, as illustrated in Example 4.1-3.
Example 4.1-2 illustrates the fact that the support can be rectangular while the
formula for f(x, y) is not such a product, and thus X and Y are dependent.
The notion of a joint p.d.f. of two discrete random variables can be extended
to a joint p.d.f. of n random variables of the discrete type. Briefly, the joint
p.d.f. of the n random variables X_1, X_2, ..., X_n is defined by

f(x_1, x_2, ..., x_n) = P(X_1 = x_1, X_2 = x_2, ..., X_n = x_n)

over an appropriate space R. Furthermore, f(x_1, x_2, ..., x_n) satisfies properties
similar to those given in Definition 4.1-1. In addition, the marginal probability
density function of one of the n discrete random variables, say X_1, is found by
summing f(x_1, x_2, ..., x_n) over all x_i's except x_1; that is,

f_1(x_1) = ∑_{x_2} ··· ∑_{x_n} f(x_1, x_2, ..., x_n).

The random variables X_1, X_2, ..., X_n are mutually independent if and only if

f(x_1, x_2, ..., x_n) = f_1(x_1) f_2(x_2) ··· f_n(x_n),  x_1 ∈ R_1, x_2 ∈ R_2, ..., x_n ∈ R_n.

If X_1, X_2, ..., X_n are not mutually independent, they are said to be dependent.
We are now in a position to examine more formally the concept of a
random sample from a distribution. Recall that when we collected the n observations, x_1, x_2, ..., x_n, of X, we wanted them in some sense to be independent,
which we now observe is actually mutual independence. That is, before the
sample is actually taken, we want the corresponding random variables X_1,
X_2, ..., X_n to be mutually independent and each to have the same distribution
and, of course, the same p.d.f., say f(x). That is, the numbers X_1, X_2, ..., X_n
that are to be observed should be mutually independent and identically distributed random variables with joint p.d.f. f(x_1) f(x_2) ··· f(x_n).
Example 4.1-5 Let X_1, X_2, X_3, X_4 be four mutually independent and
identically distributed random variables with the common Poisson p.d.f.

f(x) = 2^x e^{−2} / x!,  x = 0, 1, 2, ....

Then, for illustration,

P(X_1 = 3, X_2 = 1, X_3 = 2, X_4 = 1) = f(3) f(1) f(2) f(1)
  = [2^3 · 2 · 2^2 · 2 / (3! 1! 2! 1!)] e^{−8} = (2^7 / 12) e^{−8} = (32/3) e^{−8}.

Let us also compute the probability that exactly one of the X's equals zero.
First we treat zero as "success." If W equals the number of successes, then
the distribution of W is b(4, e^{−2}) because

P(X_i = 0) = e^{−2},  i = 1, 2, 3, 4.

Thus the probability of one success and three failures is

P(W = 1) = C(4, 1) e^{−2} (1 − e^{−2})^3.
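Since the joint p.d.f. of a random sample factors into the product of the common marginal p.d.f.'s, both computations above can be reproduced numerically. A minimal Python sketch (the helper name f is ours, with the Poisson mean 2 read off the example):

```python
from math import comb, exp, factorial

def f(x, lam=2.0):
    # Poisson p.d.f. with mean lam: f(x) = lam^x e^{-lam} / x!
    return lam ** x * exp(-lam) / factorial(x)

# P(X1 = 3, X2 = 1, X3 = 2, X4 = 1) = f(3) f(1) f(2) f(1) = (32/3) e^{-8}.
p_joint = f(3) * f(1) * f(2) * f(1)
assert abs(p_joint - (32 / 3) * exp(-8)) < 1e-12

# Exactly one of the four X's equals zero: W is b(4, e^{-2}).
p0 = exp(-2)
p_one_zero = comb(4, 1) * p0 * (1 - p0) ** 3
print(p_joint, p_one_zero)
```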
We are now prepared to define officially a random sample and other related
terms. Say a random experiment that results in a random variable X having
p.d.f. f(x) is repeated n independent times. Let X_1, X_2, ..., X_n denote the n
random variables associated with these outcomes. The collection of these
random variables, which are mutually independent and identically distributed,
is called a random sample from a distribution with p.d.f. f(x). The number n is
called the sample size. The common distribution of the random variables in a
random sample is sometimes called the population from which the sample is
taken.
Let us now consider joint marginal distributions of two or more of n random
variables. A joint marginal p.d.f. of X_1 and X_2 is found by summing
f(x_1, x_2, ..., x_n) over all x_i's except x_1 and x_2; that is,

f_12(x_1, x_2) = ∑_{x_3} ··· ∑_{x_n} f(x_1, x_2, ..., x_n).
Extensions of these marginal probability density functions to more than two
random variables are made in an obvious way.
Example 4.1-6 Consider a population of 200 students who have just finished a first course in calculus. Of these 200, 40 have earned A's, 60 B's, 70
C's, 20 D's, and 10 F's. A sample of size 25 is taken at random and without
replacement from this population so that each possible sample has probability

1 / C(200, 25),

where C(n, k) denotes the binomial coefficient "n choose k," of being selected.
Within the sample of 25, let X_1 be the number of A students, X_2 the number
of B students, X_3 the number of C students, X_4 the number of D students,
and 25 − X_1 − X_2 − X_3 − X_4 the number of F students. The space R of
(X_1, X_2, X_3, X_4) is defined by the collection of ordered 4-tuples of nonnegative
integers (x_1, x_2, x_3, x_4) such that x_1 + x_2 + x_3 + x_4 ≤ 25. The joint p.d.f.
of X_1, X_2, X_3, and X_4 is

f(x_1, x_2, x_3, x_4) = [C(40, x_1) C(60, x_2) C(70, x_3) C(20, x_4) C(10, 25 − x_1 − x_2 − x_3 − x_4)] / C(200, 25)

for (x_1, x_2, x_3, x_4) ∈ R, where it is understood that C(k, j) = 0 if j > k.
Without actually summing, we know that the marginal p.d.f. of X_3 is

f_3(x_3) = [C(70, x_3) C(130, 25 − x_3)] / C(200, 25),  x_3 = 0, 1, 2, ..., 25,

and the joint marginal p.d.f. of X_1 and X_2 is

f_12(x_1, x_2) = [C(40, x_1) C(60, x_2) C(100, 25 − x_1 − x_2)] / C(200, 25),  0 ≤ x_1, 0 ≤ x_2, x_1 + x_2 ≤ 25.

Of course, f_3(x_3) is a hypergeometric p.d.f., and f_12(x_1, x_2) and
f(x_1, x_2, x_3, x_4) are extensions of that type of p.d.f. It is easy to see that

f(x_1, x_2, x_3, x_4) ≠ f_1(x_1) f_2(x_2) f_3(x_3) f_4(x_4),

and thus X_1, X_2, X_3, and X_4 are dependent. Note also that the space R is
not rectangular, which would also imply that the random variables are
dependent.
The distribution in Example 4.1-6 illustrates an extension of the hypergeometric distribution. In general, instead of two classes, suppose that each of the
n objects can be placed into one of s disjoint classes, so that n_1 objects are in
the first class, n_2 in the second, and so on until we find n_s in the sth class. Of
course, n = n_1 + n_2 + ··· + n_s. Say r objects are selected from these
n at random and without replacement. Let the random variable X_i denote the
number of observed objects in the sample belonging to the ith class. Find the
probability that exactly x_i objects belong to the ith class, i = 1, 2, ..., s. Here

0 ≤ x_i ≤ n_i  and  x_1 + x_2 + ··· + x_s = r.

We can select x_i objects from the ith class in any one of C(n_i, x_i) ways, i = 1, 2, ...,
s. By the multiplication principle, the product

C(n_1, x_1) C(n_2, x_2) ··· C(n_s, x_s)

equals the number of ways the joint operation can be performed. If we assume
that each of the C(n, r) ways of selecting r objects from n = n_1 + n_2 + ··· + n_s
objects has the same probability, we have that the probability of selecting exactly
x_i objects from the ith class, i = 1, 2, ..., s, is

P(X_1 = x_1, X_2 = x_2, ..., X_s = x_s) = [C(n_1, x_1) C(n_2, x_2) ··· C(n_s, x_s)] / C(n, r),

where 0 ≤ x_i ≤ n_i and x_1 + x_2 + ··· + x_s = r.
Example 4.1-7 The probability that a 13-card bridge hand (selected at
random and without replacement) contains two clubs, four diamonds, three
hearts, and four spades is

[C(13, 2) C(13, 4) C(13, 3) C(13, 4)] / C(52, 13) = 11,404,407,300 / 635,013,559,600 ≈ 0.018.
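Such multivariate hypergeometric probabilities are convenient to evaluate with exact integer arithmetic. A short Python check of Example 4.1-7 (illustrative):

```python
from math import comb

# P(2 clubs, 4 diamonds, 3 hearts, 4 spades in a 13-card bridge hand).
num = comb(13, 2) * comb(13, 4) * comb(13, 3) * comb(13, 4)
den = comb(52, 13)
assert (num, den) == (11_404_407_300, 635_013_559_600)
print(num / den)  # about 0.018
```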
We now consider an extension of the binomial distribution, namely the
multinomial distribution. Consider a sequence of repetitions of an experiment
for which the following conditions are satisfied:

(a) The experiment has k possible outcomes that are mutually exclusive and
exhaustive, say A_1, A_2, ..., A_k.
(b) n independent trials of this experiment are observed.
(c) P(A_i) = p_i, i = 1, 2, ..., k, on each trial, with ∑_{i=1}^{k} p_i = 1.
(d) The random variable X_i is equal to the number of times A_i occurs in the
n trials, i = 1, 2, ..., k.

If x_1, x_2, ..., x_k are nonnegative integers such that their sum equals n, then,
for such a sequence, the probability that A_i occurs x_i times, i = 1, 2, ..., k, is
given by

P(X_1 = x_1, X_2 = x_2, ..., X_k = x_k) = [n! / (x_1! x_2! ··· x_k!)] p_1^{x_1} p_2^{x_2} ··· p_k^{x_k}.

To see that this is correct, note that the number of distinguishable arrangements of x_1 A_1's, x_2 A_2's, ..., x_k A_k's is

n! / (x_1! x_2! ··· x_k!)

and that the probability of each of these distinguishable arrangements is

p_1^{x_1} p_2^{x_2} ··· p_k^{x_k}.

Hence the product of these two latter expressions gives the correct probability, which is in agreement with the expression for
P(X_1 = x_1, X_2 = x_2, ..., X_k = x_k).
We say that X_1, X_2, ..., X_k have a multinomial distribution. The reason is
that

∑ [n! / (x_1! x_2! ··· x_k!)] p_1^{x_1} p_2^{x_2} ··· p_k^{x_k} = (p_1 + p_2 + ··· + p_k)^n = 1,

where the summation is over the set of all nonnegative integers x_1, x_2, ..., x_k
whose sum is n. That is, P(X_1 = x_1, X_2 = x_2, ..., X_k = x_k) is a typical term in
the expansion of the nth power of the multinomial (p_1 + p_2 + ··· + p_k).
When k = 3, we often let X = X_1 and Y = X_2; then n − X − Y = X_3. We
say that X and Y have a trinomial distribution. The joint p.d.f. of X and Y is

f(x, y) = [n! / (x! y! (n − x − y)!)] p_1^x p_2^y (1 − p_1 − p_2)^{n−x−y},

where x and y are nonnegative integers such that x + y ≤ n. Since the marginal distributions of X and Y are, respectively, b(n, p_1) and b(n, p_2), it is obvious
that the product of their probability density functions does not equal f(x, y),
and hence they are dependent random variables. Also note that the support of
X and Y is triangular, so the random variables must be dependent.
Example 4.1-8 In manufacturing a certain item, it is found that in normal
production about 95% of the items are good ones, 4% are "seconds," and
1% are defective. This particular company has a program of statistical quality
control, and each hour an on-line inspector observes 20 items
selected at random, counting the number X of seconds, and the number Y
of defectives. If, in fact, the production is normal, the probability of finding
in this sample of size n = 20 at least two seconds or at least two defective
items is

1 − P(X = 0 or 1 and Y = 0 or 1)
  = 1 − [20!/(0! 0! 20!)](0.04)^0 (0.01)^0 (0.95)^20
      − [20!/(1! 0! 19!)](0.04)^1 (0.01)^0 (0.95)^19
      − [20!/(0! 1! 19!)](0.04)^0 (0.01)^1 (0.95)^19
      − [20!/(1! 1! 18!)](0.04)^1 (0.01)^1 (0.95)^18
  = 0.204.
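The four trinomial terms in this computation are easy to reproduce. A small Python sketch, with a helper function we introduce just for illustration:

```python
from math import factorial

def trinomial(n, x, y, p1, p2):
    # P(X = x, Y = y) when (X, Y) is trinomial with cell probabilities p1, p2.
    c = factorial(n) // (factorial(x) * factorial(y) * factorial(n - x - y))
    return c * p1 ** x * p2 ** y * (1 - p1 - p2) ** (n - x - y)

# 1 - P(X = 0 or 1 and Y = 0 or 1) with n = 20, p1 = 0.04, p2 = 0.01.
p = 1 - sum(trinomial(20, x, y, 0.04, 0.01) for x in (0, 1) for y in (0, 1))
print(round(p, 3))  # 0.204
```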
Exercises
4.1-1 Let the joint p.d.f. of X and Y be defined by

f(x, y) = (x + y)/32,  x = 1, 2,  y = 1, 2, 3, 4.
Find
(a) f_1(x), the marginal p.d.f. of X;
(b) f_2(y), the marginal p.d.f. of Y;
(c) P(X > Y);
(d) P(Y = 2X);
(e) P(X + Y = 3);
(f) P(X ≥ 3 − Y).
(g) Are X and Y independent or dependent?
4.1-2 Roll a red and a black four-sided die. Let X equal the outcome on the red die,
and let Y equal the outcome on the black die.
(a) On graph paper, show the space of X and Y.
(b) Define the joint p.d.f. on the space (similar to Figure 4.1-1).
(c) Give the marginal p.d.f. of X in the margin.
(d) Give the marginal p.d.f. of Y in the margin.
(e) Are X and Y dependent or independent? Why?
4.1-3 Roll a red and a black four-sided die. Let X equal the outcome on the red die
and let Y equal the sum of the two dice.
(a) On graph paper, describe the space of X and Y.
(b) Define the joint p.d.f. on the space (similar to Figure 4.1-1).
(c) Give the marginal p.d.f. of X in the margin.
(d) Give the marginal p.d.f. of Y in the margin.
(e) Are X and Y dependent or independent? Why?
4.1-4 Let X_1, X_2, X_3 denote a random sample of size n = 3 from a distribution with
the geometric p.d.f.

f(x) = (1/4)(3/4)^{x−1},  x = 1, 2, 3, ....

That is, X_1, X_2, and X_3 are mutually independent and each has this geometric
distribution.
(a) Compute P(X_1 = 1, X_2 = 3, X_3 = 1).
(b) Determine P(X_1 + X_2 + X_3 = 5).
(c) If Y equals the maximum of X_1, X_2, X_3, find P(Y ≤ 2).
4.1-5 A box contains a mixture of tea bags—15 spice, 5 orange, 10 mint, and 20 green.
Select 4 tea bags at random and without replacement. Find the probability that
(a) one of each kind of tea is selected,
(b) all 4 tea bags are green tea.
4.1-6 Draw 13 cards at random and without replacement from an ordinary deck of
playing cards. Among these 13 cards let X_1 be the number of spades, X_2 the number
of hearts, X_3 the number of diamonds, and 13 − X_1 − X_2 − X_3 the number of
clubs.
(a) Determine P(X_1 = 5, X_2 = 4, X_3 = 3).
(b) Among the 13 cards, what is the probability that the numbers of cards in the
four suits are 5, 4, 3, and 1?
HINT: Part (a) presents one way this could occur, but there are also other ways, for
example, X_1 = 3, X_2 = 5, X_3 = 1.
4.1-7 A particular pound of candy contains 136 jelly beans of which the numbers of
black, green, orange, pink, purple, red, white, and yellow are 11, 12, 13, 16, 25, 32, 13,
and 14, respectively. Sixteen jelly beans are selected at random and without replacement.
(a) Give the probability that exactly two of each color are selected.
(b) Let X equal the number of black, and let Y equal the number of red in the
sample of n = 16 jelly beans. Give the joint p.d.f. of X and Y.
(c) Find P(X ≥ 2).
4.1-8 A box contains 100 Christmas tree light bulbs of which 30 are red, 35 are blue, 15
are white, and 20 are green. Fifteen bulbs are to be drawn at random from the box
to fill a string with 15 sockets. Let X_1 denote the number of red, X_2 the number of
blue, and X_3 the number of white bulbs drawn.
(a) Give f(x_1, x_2, x_3), the joint p.d.f. of X_1, X_2, and X_3.
(b) Describe the set of points for which f(x_1, x_2, x_3) > 0.
(c) Determine f_1(x_1), the marginal p.d.f. of X_1, and find P(X_1 = 10).
(d) Find f_12(x_1, x_2), the joint marginal p.d.f. of X_1 and X_2.
4.1-9 In a biology laboratory, corn is used to illustrate the Mendelian theory of inheritance. It is claimed that the four categories for the kernels of corn, smooth and
yellow, wrinkled and yellow, smooth and purple, and wrinkled and purple, will occur
in the ratio 9:3:3:1. Out of 208 kernels of corn, let X_1, X_2, X_3, and X_4 = 208 − X_1
− X_2 − X_3 denote, respectively, the numbers of kernels in the four categories. If the
theory is true,
(a) Give the joint p.d.f. of X_1, X_2, X_3, and X_4, and describe the support in 3-space.
(b) Give the marginal p.d.f. of X_3.
(c) Give the joint marginal p.d.f. of X_1 and X_2.
(d) Find E(X_1), E(X_2), E(X_3), and E(X_4).
4.1-10 Toss a fair die 12 independent times. Let X_i denote the number of times i
occurs, i = 1, 2, 3, 4, 5, 6.
(a) What is the joint p.d.f. of X_1, X_2, ..., X_6?
(b) Find the probability that each outcome occurs two times.
(c) Find P(X_1 = 2).
(d) Are X_1, X_2, ..., X_6 mutually independent?
4.1-11 A manufactured item is classified as good, a "second," or defective, with probabilities 6/10, 3/10, and 1/10, respectively. Fifteen such items are selected at random
from the production line. Let X denote the number of good items, Y the number of
seconds, and 15 − X − Y the number of defective items.
(a) Give the joint p.d.f. of X and Y, f(x, y).
(b) Sketch the set of points for which f(x, y) > 0. From the shape of this region, can
X and Y be independent? Why?
(c) Find P(X = 10, Y = 4).
(d) Give the marginal p.d.f. of X.
(e) Find P(X ≥ 11).
4.1-12 Following the second "Great Debate," assume that the proportions of listeners
who thought that Reagan had won, Mondale had won, and it was a tie were p_R =
0.40, p_M = 0.35, and p_T = 0.25, respectively. In a random sample of n = 100 listeners,
let X equal the number who thought Reagan had won and let Y equal the number
who thought Mondale had won.
(a) Give the joint p.d.f. of X and Y.
(b) What is the marginal distribution of X?
4.2 The Correlation Coefficient

Let X_1, X_2, ..., X_n be random variables of the discrete type having a joint
distribution. In this section we consider the mathematical expectation of functions of these random variables. If u(X_1, X_2, ..., X_n) is a function of n random
variables of the discrete type that have a joint p.d.f. f(x_1, x_2, ..., x_n) and space
R, then

E[u(X_1, X_2, ..., X_n)] = ∑ ··· ∑_R u(x_1, x_2, ..., x_n) f(x_1, x_2, ..., x_n),

if it exists, is called the mathematical expectation (or expected value) of
u(X_1, X_2, ..., X_n).
Example 4.2-1 There are eight similar chips in a bowl: three marked (0, 0),
two marked (1, 0), two marked (0, 1), and one marked (1, 1). A player selects
a chip at random and is given the sum of the two coordinates in dollars. If
X_1 and X_2 represent those two coordinates, respectively, their joint p.d.f. is

f(x_1, x_2) = (3 − x_1 − x_2)/8,  x_1 = 0, 1 and x_2 = 0, 1.

Thus

E(X_1 + X_2) = ∑_{x_2=0}^{1} ∑_{x_1=0}^{1} (x_1 + x_2)(3 − x_1 − x_2)/8
  = (0)(3/8) + (1)(2/8) + (1)(2/8) + (2)(1/8) = 3/4.

That is, the expected payoff is 75¢.
The following mathematical expectations, subject to their existence, have
special names:

(i) If u_1(X_1, X_2, ..., X_n) = X_i, then

E[u_1(X_1, X_2, ..., X_n)] = E(X_i) = μ_i

is called the mean of X_i, i = 1, 2, ..., n.

(ii) If u_2(X_1, X_2, ..., X_n) = (X_i − μ_i)², then

E[u_2(X_1, X_2, ..., X_n)] = E[(X_i − μ_i)²] = σ_i² = Var(X_i)

is called the variance of X_i, i = 1, 2, ..., n.

(iii) If u_3(X_1, X_2, ..., X_n) = (X_i − μ_i)(X_j − μ_j), i ≠ j, then

E[u_3(X_1, X_2, ..., X_n)] = E[(X_i − μ_i)(X_j − μ_j)] = σ_ij = Cov(X_i, X_j)

is called the covariance of X_i and X_j.

(iv) If the standard deviations σ_i and σ_j are positive, then

ρ_ij = Cov(X_i, X_j)/(σ_i σ_j) = σ_ij/(σ_i σ_j)

is called the correlation coefficient of X_i and X_j.
It is convenient to observe that the mean and the variance of X_i can be
computed from either the joint p.d.f. or the marginal p.d.f. of X_i. For
example, if n = 2,

μ_1 = E(X_1) = ∑_{x_1} ∑_{x_2} x_1 f(x_1, x_2) = ∑_{x_1} x_1 [∑_{x_2} f(x_1, x_2)] = ∑_{x_1} x_1 f_1(x_1).

Before considering the meaning of the covariance and the correlation
coefficient, let us note a few simple facts. With i ≠ j,

E[(X_i − μ_i)(X_j − μ_j)] = E(X_i X_j − μ_j X_i − μ_i X_j + μ_i μ_j)
  = E(X_i X_j) − μ_j E(X_i) − μ_i E(X_j) + μ_i μ_j,

because it is true that, even in the multivariate situation, E is still a linear or
distributive operator (see Exercise 4.2-4). Thus

Cov(X_i, X_j) = E(X_i X_j) − μ_j μ_i − μ_i μ_j + μ_i μ_j = E(X_i X_j) − μ_i μ_j.

Since ρ_ij = Cov(X_i, X_j)/(σ_i σ_j), we also have

E(X_i X_j) = μ_i μ_j + ρ_ij σ_i σ_j.

That is, the expected value of the product of two random variables is equal
to the product μ_i μ_j of their expectations plus their covariance ρ_ij σ_i σ_j.
A simple example at this point would be helpful.
Example 4.2-2 Let X_1 and X_2 have the joint p.d.f.

f(x_1, x_2) = (x_1 + 2x_2)/18,  x_1 = 1, 2,  x_2 = 1, 2.

The marginal probability density functions are, respectively,

f_1(x_1) = ∑_{x_2=1}^{2} (x_1 + 2x_2)/18 = (2x_1 + 6)/18,  x_1 = 1, 2,

and

f_2(x_2) = ∑_{x_1=1}^{2} (x_1 + 2x_2)/18 = (3 + 4x_2)/18,  x_2 = 1, 2.
Since f(x_1, x_2) ≠ f_1(x_1) f_2(x_2), X_1 and X_2 are dependent. The mean and the
variance of X_1 are

μ_1 = ∑_{x_1=1}^{2} x_1 (2x_1 + 6)/18 = (1)(8/18) + (2)(10/18) = 28/18 = 14/9

and

σ_1² = ∑_{x_1=1}^{2} x_1² (2x_1 + 6)/18 − (14/9)² = 48/18 − 196/81 = 20/81.

The mean and the variance of X_2 are

μ_2 = ∑_{x_2=1}^{2} x_2 (3 + 4x_2)/18 = (1)(7/18) + (2)(11/18) = 29/18

and

σ_2² = ∑_{x_2=1}^{2} x_2² (3 + 4x_2)/18 − (29/18)² = 51/18 − 841/324 = 77/324.

The covariance of X_1 and X_2 is

Cov(X_1, X_2) = E(X_1 X_2) − μ_1 μ_2
  = (1)(1)(3/18) + (2)(1)(4/18) + (1)(2)(5/18) + (2)(2)(6/18) − (14/9)(29/18)
  = 45/18 − 406/162 = −1/162.

Hence the correlation coefficient is

ρ = (−1/162) / √((20/81)(77/324)) = −1/√1540 ≈ −0.025.
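All of the moments in Example 4.2-2 are rational, so they can be verified with exact arithmetic. A brief Python check (the helper E is ours):

```python
from fractions import Fraction
from math import sqrt

f = {(x1, x2): Fraction(x1 + 2 * x2, 18) for x1 in (1, 2) for x2 in (1, 2)}
E = lambda g: sum(g(x1, x2) * p for (x1, x2), p in f.items())

mu1, mu2 = E(lambda a, b: a), E(lambda a, b: b)
var1 = E(lambda a, b: a * a) - mu1 ** 2
var2 = E(lambda a, b: b * b) - mu2 ** 2
cov = E(lambda a, b: a * b) - mu1 * mu2

assert (mu1, mu2) == (Fraction(14, 9), Fraction(29, 18))
assert (var1, var2, cov) == (Fraction(20, 81), Fraction(77, 324), Fraction(-1, 162))
print(float(cov) / sqrt(float(var1) * float(var2)))  # about -0.025
```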
Insight into the correlation coefficient ρ of two discrete random variables X
and Y may be gained by thoughtfully examining its definition

ρ = [∑_R (x − μ_X)(y − μ_Y) f(x, y)] / (σ_X σ_Y),

where μ_X, μ_Y, σ_X, and σ_Y denote the respective means and standard deviations. If positive probability is assigned to pairs (x, y) in which both x and y
are either simultaneously above or simultaneously below their respective
means, the corresponding terms in the summation that defines ρ are positive
because both factors (x − μ_X) and (y − μ_Y) will be positive or both will be
negative. If pairs (x, y) which yield large positive products (x − μ_X)(y − μ_Y)
contain most of the probability of the distribution, the correlation coefficient
will tend to be positive (see Example 4.2-3). If, on the other hand, the points
(x, y), in which one component is below its mean and the other above its
mean, have most of the probability, then the coefficient of correlation will tend
to be negative because the products (x − μ_X)(y − μ_Y) are negative (see Example
4.2-5). This interpretation of the sign of the correlation coefficient will play an
important role in subsequent work.
To gain additional insight into the meaning of the correlation coefficient ρ,
consider the following problem. Think of the points (x, y) in the space R and
their corresponding probabilities. Let us consider all possible lines in two-dimensional space, each with finite slope, that pass through the point associated with the means, namely (μ_X, μ_Y). These lines are of the form y − μ_Y =
b(x − μ_X) or, equivalently, y = μ_Y + b(x − μ_X). For each point
(x_0, y_0) in R, so that f(x_0, y_0) > 0, consider the vertical distance from that point to
one of these lines. Since y_0 is the height of the point above the x axis and μ_Y +
b(x_0 − μ_X) is the height of the point on the line that is directly above or below the
point (x_0, y_0), the absolute value of the difference of these two heights is
the vertical distance from the point (x_0, y_0) to the line y = μ_Y + b(x − μ_X). That is,
the required distance is |y_0 − μ_Y − b(x_0 − μ_X)|. Let us now square this distance and take the weighted average of all such squares; that is, let us consider
the mathematical expectation

K(b) = E{[(Y − μ_Y) − b(X − μ_X)]²}.

The problem is to find that line (or that b) which minimizes this expectation of
the square [Y − μ_Y − b(X − μ_X)]². This is an application of the principle of
least squares, and the line is sometimes called the least squares regression line.
The solution of the problem is very easy, since

K(b) = E[(Y − μ_Y)² − 2b(X − μ_X)(Y − μ_Y) + b²(X − μ_X)²]
     = σ_Y² − 2bρσ_X σ_Y + b²σ_X²,

because E is a linear operator and E[(X − μ_X)(Y − μ_Y)] = ρσ_X σ_Y. Accordingly, the derivative

K′(b) = −2ρσ_X σ_Y + 2bσ_X²

equals zero at b = ρσ_Y/σ_X, and we see that K(b) attains its minimum for that
b since K″(b) = 2σ_X² > 0. Consequently, the least squares regression line (the line
of the given form that is the best fit in the foregoing sense) is

y = μ_Y + ρ(σ_Y/σ_X)(x − μ_X).
Of course, if ρ > 0, the slope of the line is positive; but if ρ < 0, the slope is
negative.

It is also instructive to note the value of the minimum of

K(b) = E{[(Y − μ_Y) − b(X − μ_X)]²} = σ_Y² − 2bρσ_X σ_Y + b²σ_X².

It is

K(ρσ_Y/σ_X) = σ_Y² − 2(ρσ_Y/σ_X)ρσ_X σ_Y + (ρσ_Y/σ_X)²σ_X²
            = σ_Y² − 2ρ²σ_Y² + ρ²σ_Y² = σ_Y²(1 − ρ²).

Since K(b) is the expected value of a square, it must be nonnegative for all b,
and we see that σ_Y²(1 − ρ²) ≥ 0; that is, ρ² ≤ 1, and hence −1 ≤ ρ ≤ 1, which
is an important property of the correlation coefficient ρ. Note that if ρ = 0,
then K(ρσ_Y/σ_X) = σ_Y²; on the other hand, K(ρσ_Y/σ_X) is relatively small if ρ is
close to 1 or −1. That is, the vertical deviations of the points with
positive density from the line y = μ_Y + ρ(σ_Y/σ_X)(x − μ_X) are small if ρ is close
to 1 or −1, because K(ρσ_Y/σ_X) is the expectation of the square of those
deviations. Thus ρ measures, in this sense, the amount of linearity in the probability distribution. As a matter of fact, in the discrete case, all the points of
positive density lie on this straight line if and only if ρ is equal to 1 or −1.
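The claim that b = ρσ_Y/σ_X minimizes K(b), with minimum value σ_Y²(1 − ρ²), can be checked numerically. A crude grid-search sketch in Python, reusing the moments of Example 4.2-2 (purely illustrative):

```python
import numpy as np

# Moments from Example 4.2-2: Var(X1) = 20/81, Var(X2) = 77/324, Cov = -1/162.
sx2, sy2, cov = 20 / 81, 77 / 324, -1 / 162

K = lambda b: sy2 - 2 * b * cov + b ** 2 * sx2   # K(b) = E{[(X2-mu2) - b(X1-mu1)]^2}
bs = np.linspace(-1, 1, 200001)
b_star = bs[np.argmin(K(bs))]

assert abs(b_star - cov / sx2) < 1e-4            # minimizer b = rho*sigma_2/sigma_1
rho2 = cov ** 2 / (sx2 * sy2)
assert abs(K(b_star) - sy2 * (1 - rho2)) < 1e-8  # minimum value sigma_2^2 (1 - rho^2)
```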
REMARK More generally, we could have fitted the line y(x) = a + bx by the
same application of the principle of least squares. We would then have proved
that the "best" line actually passes through the point (μ_X, μ_Y). Recall that in
the discussion above we assumed our line to be of that form. Students will find
this derivation to be an interesting exercise using partial derivatives (see Exercise 4.2-5).
The following three examples illustrate joint discrete distributions for which
ρ is positive, zero, and negative, respectively. In Figures 4.2-1 and 4.2-2 the line
of best fit, or least squares regression line, is also drawn.
Example 4.2-3 Roll a pair of four-sided dice for which the outcome is 1, 2,
3, or 4 on each die. Let X denote the smaller and Y the larger outcome on
the dice. Then the joint p.d.f. of X and Y is defined by

f(x, y) = 1/16,  1 ≤ x = y ≤ 4,
f(x, y) = 2/16,  1 ≤ x < y ≤ 4.

It can be shown that E(X) = 15/8, E(Y) = 25/8, Var(X) = 55/64,
Var(Y) = 55/64, Cov(X, Y) = 25/64, and ρ = 25/55. Thus the line of best fit
is

y = 25/8 + (25/55)(x − 15/8).
The joint p.d.f. is depicted in Figure 4.2-1. On this figure we have drawn
horizontal and vertical lines through (μ_X, μ_Y) and also the line of best fit.
Example 4.2-4 Roll an unbiased four-sided die two independent times. Let
X equal the outcome on the first roll and Y the outcome on the second roll.
The joint p.d.f. of X and Y is

f(x, y) = 1/16,  x = 1, 2, 3, 4, and y = 1, 2, 3, 4.

Since the marginal p.d.f.'s are the same, we have μ_X = μ_Y = 2.5 and
Var(X) = Var(Y) = 5/4. Because E(XY) = 100/16, Cov(X, Y) = 100/16
− (2.5)(2.5) = 0, and thus ρ = 0. The line of best fit would simply be the
horizontal line through the point (μ_X, μ_Y). Of course, if we were minimizing
the expected value of the square of the horizontal distances, then the line of
best fit would be the vertical line through (μ_X, μ_Y). The joint and marginal
p.d.f.'s are depicted in Figure 4.2-2.
Example 4.2-5 Let X equal the number of ones and Y the number of twos
and threes when a pair of fair four-sided dice is rolled. Then X and Y have
a trinomial distribution with p.d.f.

f(x, y) = [2!/(x! y! (2 − x − y)!)] (1/4)^x (1/2)^y (1/4)^{2−x−y},  0 ≤ x + y ≤ 2,

where x and y are nonnegative integers. Since the marginal p.d.f. of X is
b(2, 1/4) and the marginal p.d.f. of Y is b(2, 1/2), we know that μ_X = 1/2,
Var(X) = 6/16, μ_Y = 1, and Var(Y) = 1/2. Since E(XY) = (1)(1)(4/16) = 4/16,
Cov(X, Y) = 4/16 − (1/2)(1) = −4/16; therefore, the correlation coefficient
is ρ = −1/√3. Using these values for the parameters, we obtain the line of
best fit, namely

y = 1 − (2/3)(x − 1/2).

The joint p.d.f. is displayed in Figure 4.2-3. On this figure we have drawn
horizontal and vertical lines through (μ_X, μ_Y) and also the line of best fit.
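A Monte Carlo experiment gives a quick sanity check of ρ = −1/√3 ≈ −0.577 in Example 4.2-5. The following Python sketch simulates the two four-sided dice (sample size and seed are arbitrary choices made here for illustration):

```python
import random

random.seed(1)
n = 200_000
xs, ys = [], []
for _ in range(n):
    rolls = [random.randint(1, 4) for _ in range(2)]
    xs.append(sum(r == 1 for r in rolls))          # X = number of ones
    ys.append(sum(r in (2, 3) for r in rolls))     # Y = number of twos and threes

mx, my = sum(xs) / n, sum(ys) / n
cov = sum(x * y for x, y in zip(xs, ys)) / n - mx * my
vx = sum(x * x for x in xs) / n - mx ** 2
vy = sum(y * y for y in ys) / n - my ** 2
print(cov / (vx * vy) ** 0.5)  # close to -1/sqrt(3) = -0.577
```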
Suppose that X and Y are independent, so that f(x, y) ≡ f_1(x) f_2(y), and we
want to find the expected value of the product u(X)v(Y). Subject to
the existence of the expectations, we know that

E[u(X)v(Y)] = ∑∑_R u(x)v(y) f(x, y)
  = ∑_{R_1} ∑_{R_2} u(x)v(y) f_1(x) f_2(y)
  = [∑_{R_1} u(x) f_1(x)][∑_{R_2} v(y) f_2(y)]
  = E[u(X)]E[v(Y)].

This can be used to show that the correlation coefficient of two independent
variables is zero (see Example 4.2-4). For, in a standard notation, we have

Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)] = E(X − μ_X)E(Y − μ_Y) = 0.

The converse of this fact is not necessarily true, however; zero correlation does
not in general imply independence. It is most important to keep this straight:
independence implies zero correlation, but zero correlation does not necessarily imply independence. The latter is now illustrated.
Example 4.2-6 Let X and Y have the joint p.d.f.

f(x, y) = 1/3,  (x, y) = (0, 1), (1, 0), (2, 1).

Since the support is not "rectangular," X and Y must be dependent. The
means of X and Y are μ_X = 1 and μ_Y = 2/3, respectively. Hence

Cov(X, Y) = E(XY) − μ_X μ_Y
  = (0)(1)(1/3) + (1)(0)(1/3) + (2)(1)(1/3) − (1)(2/3) = 0.

That is, ρ = 0, but X and Y are dependent.
Exercises
4.2-1 Let the random variables X and Y have the joint p.d.f.

f(x, y) = (x + y)/32,  x = 1, 2,  y = 1, 2, 3, 4.

Find the means μ_X and μ_Y, the variances σ_X² and σ_Y², and the correlation coefficient ρ.
Are X and Y independent or dependent?
4.2-2 Let X and Y have the joint p.d.f. described as follows:

(x, y)     (0, 0)  (1, 0)  (1, 1)  (2, 1)
f(x, y)     1/6     2/6     2/6     1/6

Find the correlation coefficient ρ and the "best-fitting" line.
HINT: First depict the points in R and their corresponding probabilities.
4.2-3 Roll a fair four-sided die twice. Let X denote the outcome on the first roll, and let
Y equal the sum of the two rolls. Find
(a) μ_X,
(b) σ_X²,
(c) μ_Y,
(d) σ_Y²,
(e) Cov(X, Y),
(f) ρ,
(g) the best-fitting line.
(h) Display the joint p.d.f. as done in Figure 4.2-1 and draw the best-fitting line on
this display.
4.2-4 In the multivariate situation, show that E is a linear (distributive) operator. For
convenience, let n = 2 and show that

E[a_1 u_1(X_1, X_2) + a_2 u_2(X_1, X_2)] = a_1 E[u_1(X_1, X_2)] + a_2 E[u_2(X_1, X_2)].
4.2-5 Let X and Y be random variables with respective means μ_X and μ_Y, respective
variances σ_X² and σ_Y², and correlation coefficient ρ. Fit the line y = a + bx by the
method of least squares to the probability distribution by minimizing the expectation

K(a, b) = E[(Y − a − bX)²]

with respect to a and b.
HINT: Consider ∂K/∂a = 0 and ∂K/∂b = 0 and solve simultaneously.
4.2-6 Let X and Y have a trinomial distribution with parameters n = 3, p_1 = 1/6, and
p_2 = 1/2. Find
(a) E(X),
(b) E(Y),
(c) Var(X),
(d) Var(Y),
(e) Cov(X, Y),
(f) ρ.
Note that ρ = −√( p_1 p_2 / [(1 − p_1)(1 − p_2)] ).
4.2-7 Let the joint p.d.f. of X and Y be f(x, y) = 1/4, (x, y) ∈ R = {(0, 0), (1, 1), (1, −1),
(2, 0)}.
(a) Are X and Y independent?
(b) Calculate Cov(X, Y) and ρ.
This also illustrates the fact that dependent random variables can have a correlation
coefficient of zero.
4.2-8 The joint p.d.f. of X and Y is f(x, y) = 1/6, 0 ≤ x + y ≤ 2, where x and y are
nonnegative integers.
(a) Sketch the support of X and Y.
(b) Record the marginal p.d.f.'s f_1(x) and f_2(y) in the "margins."
(c) Find Cov(X, Y).
(d) Find ρ, the correlation coefficient.
(e) Find the best-fitting line and draw it on your figure.
4.2-9 Let X_1, X_2 be a random sample of size n = 2 from the distribution with the
binomial p.d.f.

f(x) = C(2, x)(1/2)²,  x = 0, 1, 2.

Find the joint p.d.f. of Y = X_1 and W = X_1 + X_2, determine the marginal p.d.f. of
W, and compute the correlation coefficient of Y and W.
HINT: Map the nine points (x_1, x_2) in the space of X_1, X_2 into the nine points (y, w) in
the space of Y, W along with the corresponding probabilities, and proceed as in
earlier exercises.
4.3 Conditional Distributions
Let X and Y have a joint discrete distribution with p.d.f. f(x, y) on space R.
Say the marginal probability density functions are f_1(x) and f_2(y) with spaces
R_1 and R_2, respectively. Let event A = {X = x} and event B = {Y = y},
(x, y) ∈ R. Thus A ∩ B = {X = x, Y = y}. Because

P(A ∩ B) = P(X = x, Y = y) = f(x, y)

and

P(B) = P(Y = y) = f_2(y) > 0  (since y ∈ R_2),

we see that the conditional probability of event A given event B is

P(A | B) = P(A ∩ B)/P(B) = f(x, y)/f_2(y).

This leads to the following definition.
DEFINITION 4.3-1 The conditional probability density function of X, given
that Y = y, is defined by

g(x | y) = f(x, y)/f_2(y),  provided that f_2(y) > 0.

Similarly, the conditional probability density function of Y, given that X = x, is
defined by

h(y | x) = f(x, y)/f_1(x),  provided that f_1(x) > 0.
Example 4.3-1 Let X and Y have the joint p.d.f.

f(x, y) = (x + y)/21,  x = 1, 2, 3,  y = 1, 2.

In Example 4.1-2 we showed that

f_1(x) = (2x + 3)/21,  x = 1, 2, 3,

and

f_2(y) = (3y + 6)/21,  y = 1, 2.

Thus the conditional p.d.f. of X, given Y = y, is equal to

g(x | y) = [(x + y)/21] / [(3y + 6)/21] = (x + y)/(3y + 6),  x = 1, 2, 3, when y = 1 or 2.

For example,

P(X = 2 | Y = 2) = g(2 | 2) = 4/12 = 1/3.

Similarly, the conditional p.d.f. of Y, given X = x, is equal to

h(y | x) = (x + y)/(2x + 3),  y = 1, 2, when x = 1, 2, or 3.

The joint p.d.f. f(x, y) is depicted in Figure 4.3-1 along with the marginal
p.d.f.'s. Conditionally, if y = 2, we would expect the outcomes x = 1, 2, and
3 to occur in the ratios 3:4:5. This is precisely what g(x | y) does, namely

g(1 | 2) = 3/12,  g(2 | 2) = 4/12,  g(3 | 2) = 5/12.

Figure 4.3-2 displays g(x | 1) and g(x | 2), while Figure 4.3-3 gives h(y | 1),
h(y | 2), and h(y | 3). Compare the probabilities in Figure 4.3-3 with those in
Figure 4.3-1. They should agree with your intuition as well as with the
formula for h(y | x).
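The conditional p.d.f.'s of this example follow directly from Definition 4.3-1 and can be verified with exact fractions. A short Python check (the function names g and h mirror the text's notation):

```python
from fractions import Fraction

f = {(x, y): Fraction(x + y, 21) for x in (1, 2, 3) for y in (1, 2)}
f1 = {x: Fraction(2 * x + 3, 21) for x in (1, 2, 3)}
f2 = {y: Fraction(3 * y + 6, 21) for y in (1, 2)}

g = lambda x, y: f[(x, y)] / f2[y]   # conditional p.d.f. of X, given Y = y
h = lambda y, x: f[(x, y)] / f1[x]   # conditional p.d.f. of Y, given X = x

assert g(2, 2) == Fraction(1, 3)     # P(X = 2 | Y = 2) = 4/12
assert [g(x, 2) for x in (1, 2, 3)] == [Fraction(3, 12), Fraction(4, 12), Fraction(5, 12)]
assert all(sum(h(y, x) for y in (1, 2)) == 1 for x in (1, 2, 3))
```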
Note that 0 ≤ h(y | x) and

∑_y h(y | x) = ∑_y f(x, y)/f_1(x) = f_1(x)/f_1(x) = 1.

Thus h(y | x) satisfies the conditions of a probability density function, and so
we can compute conditional probabilities such as

P(a < Y < b | X = x) = ∑_{y: a<y<b} h(y | x)

and conditional expectations such as

E[u(Y) | X = x] = ∑_y u(y) h(y | x)

in a manner similar to those associated with unconditional probabilities and expectations.
Two special conditional expectations are the conditional mean of Y, given
X = x, defined by

μ_{Y|x} = E(Y | x) = ∑_y y h(y | x),

and the conditional variance of Y, given X = x, defined by

σ²_{Y|x} = E{[Y − E(Y | x)]² | x} = ∑_y [y − E(Y | x)]² h(y | x),

which can be computed using

σ²_{Y|x} = E(Y² | x) − [E(Y | x)]².

The conditional mean μ_{X|y} and the conditional variance σ²_{X|y} are given by
similar expressions.
Example 4.3-2 We use the background of Example 4.3-1 and compute
μ_{Y|x} and σ²_{Y|x} when x = 3:

E(Y | 3) = ∑_{y=1}^{2} y (3 + y)/9 = (1)(4/9) + (2)(5/9) = 14/9

and

σ²_{Y|3} = E{[Y − 14/9]² | 3} = (1 − 14/9)²(4/9) + (2 − 14/9)²(5/9)
        = (25/81)(4/9) + (16/81)(5/9) = 20/81.
The conditional mean of X, given Y = y, is a function of y alone; the conditional mean of Y, given X = x, is a function of x alone. Suppose that the latter
conditional mean is a linear function of x; that is, E(Y | x) = a + bx. Let us
find the constants a and b in terms of the characteristics μ_X, μ_Y, σ_X², σ_Y², and ρ.
This development will shed additional light on the correlation coefficient ρ;
accordingly, we assume that the respective standard deviations σ_X and σ_Y are
both positive, so that the correlation coefficient will exist.

It is given that

∑_y y f(x, y)/f_1(x) = a + bx,  x ∈ R_1,

where R_1 is the space of X. Hence

∑_y y f(x, y) = (a + bx) f_1(x),  x ∈ R_1,   (4.3-1)

and

∑_{x∈R_1} ∑_y y f(x, y) = ∑_{x∈R_1} (a + bx) f_1(x);

that is, with μ_X and μ_Y representing the respective means, we have

μ_Y = a + bμ_X.   (4.3-2)

In addition, if we multiply both members of equation (4.3-1) by x and sum, we
obtain

∑_{x∈R_1} ∑_y xy f(x, y) = ∑_{x∈R_1} (ax + bx²) f_1(x);

that is,

E(XY) = aE(X) + bE(X²)

or, equivalently,

μ_X μ_Y + ρσ_X σ_Y = aμ_X + b(μ_X² + σ_X²).   (4.3-3)

The solution of equations (4.3-2) and (4.3-3) is

b = ρ(σ_Y/σ_X),  a = μ_Y − ρ(σ_Y/σ_X)μ_X,

which implies that if E(Y | x) is linear, it is given by

E(Y | x) = μ_Y + ρ(σ_Y/σ_X)(x − μ_X).

That is, if the conditional mean of Y, given X = x, is linear, it is exactly the
same as the best-fitting line considered in Section 4.2.

Of course, if the conditional mean of X, given Y = y, is linear, it is given by

E(X | y) = μ_X + ρ(σ_X/σ_Y)(y − μ_Y).

We see that the point [x = μ_X, E(Y | x) = μ_Y] satisfies the expression for
E(Y | x); and [E(X | y) = μ_X, y = μ_Y] satisfies the expression for E(X | y). That
is, the point (μ_X, μ_Y) is on each of the two lines. In addition, we note that the
product of the coefficient of x in E(Y | x) and the coefficient of y in E(X | y)
equals ρ², and the ratio of these two coefficients equals σ_Y²/σ_X². These observations sometimes prove useful in particular problems.
Example 4.3-3 Let X and Y have the trinomial p.d.f. with parameters n,
p_1, p_2, and 1 − p_1 − p_2; that is,

f(x, y) = [n!/(x! y! (n − x − y)!)] p_1^x p_2^y (1 − p_1 − p_2)^{n−x−y},

where x and y are nonnegative integers such that x + y ≤ n. From the
development of the trinomial distribution, it is obvious that X and Y have
marginal binomial distributions b(n, p_1) and b(n, p_2), respectively. Thus

h(y | x) = f(x, y)/f_1(x)
  = [(n − x)!/(y! (n − x − y)!)] (p_2/(1 − p_1))^y ((1 − p_1 − p_2)/(1 − p_1))^{n−x−y},
  y = 0, 1, 2, ..., n − x.

That is, the conditional p.d.f. of Y, given X = x, is binomial

b(n − x, p_2/(1 − p_1))

and thus has conditional mean

E(Y | x) = (n − x) p_2/(1 − p_1).

In a similar manner, we obtain

E(X | y) = (n − y) p_1/(1 − p_2).

Since each of the conditional means is linear, the product of the respective
coefficients of x and y is

ρ² = (−p_2/(1 − p_1))(−p_1/(1 − p_2)) = p_1 p_2 / [(1 − p_1)(1 − p_2)].

However, ρ must be negative because the coefficients of x and y are negative; thus

ρ = −√( p_1 p_2 / [(1 − p_1)(1 − p_2)] ).
Exercises
4.3-1 Let X and Y have the joint p.d.f.

f(x, y) = (x + y)/32,  x = 1, 2,  y = 1, 2, 3, 4.

(a) Display the joint p.d.f. and the marginal p.d.f.'s on a graph like Figure 4.3-1.
(b) Find g(x | y) and draw a figure like Figure 4.3-2, depicting the conditional p.d.f.'s
for y = 1, 2, 3, and 4.
(c) Find h(y | x) and draw a figure like Figure 4.3-3, depicting the conditional p.d.f.'s
for x = 1 and 2.
(d) Find (i) P(1 ≤ Y ≤ 3 | X = 1), (ii) P(Y ≤ 2 | X = 2), and (iii) P(X = 2 | Y = 3).
(e) Find E(Y | X = 1) and Var(Y | X = 1).
4.3-2 Let the joint p.d.f. f(x, y) of X and Y be given by the following:

(x, y)     (1, 1)  (2, 1)  (1, 2)  (2, 2)
f(x, y)     3/8     1/8     1/8     3/8

Find the two conditional probability density functions and the corresponding means
and variances.
4.3-3 Let W equal the weight of laundry soap in a 1-kilogram box that is distributed in
Southeast Asia. Suppose that P(W < 1) = 0.02 and P(W > 1.072) = 0.08. Call a box
of soap light, good, or heavy depending on whether W < 1, 1 ≤ W ≤ 1.072, or
W > 1.072, respectively. In a random sample of n = 50 boxes, let X equal the
number of light boxes and Y the number of good boxes.
(a) What is the joint p.d.f. of X and Y?
(b) Give the name of the distribution of Y along with the values of the parameters of
this distribution.
(c) Given that X = 3, how is Y distributed conditionally?
(d) Determine E(Y | X = 3).
(e) Find ρ, the correlation coefficient of X and Y.
4.3-4 The genes for eye color for a certain male fruit fly are (R, W). The genes for eye
color for the mating female fruit fly are (R, W). Their offspring receive one gene for
eye color from each parent. If an offspring ends up with either (R, R), (R, W), or
(W, R), its eyes will look red. Let X equal the number of offspring having red eyes.
Let Y equal the number of red-eyed offspring having (R, W) or (W, R) genes.
(a) If the total number of offspring is n = 400, how is X distributed?
(b) Give the values of E(X) and Var(X).
(c) Given that m = 300 offspring have red eyes, how is Y distributed?
(d) Give the values of E(Y) and Var(Y).
4.3-5 Let X and Y have a trinomial distribution with n = 2, p_1 = 1/4, and p_2 = 1/2.
(a) Give E(Y | x).
(b) Compare your answer to part (a) with the equation of the line of best fit in
Example 4.2-5. Are they the same? Why?
4.3-6 (a) With the background of Example 4.2-3, find E(Y | x) for x = 1, 2, 3, 4.
(b) Do the points [x, E(Y | x)] lie on the line of best fit? Why?
REMARK It was stated that if the conditional mean of Y, given X = x, is linear, then
it is exactly the same as the line of best fit.
4.3-7 Using the joint p.d.f. given in Exercise 4.2-3, give the value of E(Y | x) for x = 1, 2,
3, 4. Is this linear? Do these points lie on the best-fitting line?
4.3-8 An unbiased six-sided die is cast 30 independent times. Let X be the number of
one's and Y the number of two's.
(a) What is the joint p.d.f. of X and Y?
(b) Find the conditional p.d.f. of X, given Y = y.
(c) Compute E(X² − 4XY + 3Y²).
4.3-9 Let X and Y have a uniform distribution on the set of points with integer coordinates in R = {(x, y): 0 ≤ x ≤ 7, x ≤ y ≤ 7}. That is, f(x, y) = 1/36, (x, y) ∈ R,
and both x and y are integers. Find
(a) f_1(x),
(b) h(y | x),
(c) E(Y | x),
(d) f_2(y),
(e) E(X | y).
4.3-10 Let f_1(x) = 1/10, x = 0, 1, 2, ..., 9, and h(y | x) = 1/(10 − x), y = x, x + 1, ..., 9.
Find
(a) f(x, y),
(b) f_2(y),
(c) E(Y | x).
4.3-11 Let X and Y have a joint uniform distribution on the set of points with integer
coordinates in R = {(x, y): 1 ≤ x ≤ 4, 4 − x ≤ y ≤ 6 − x}. That is, f(x, y) = 1/12,
(x, y) ∈ R.
(a) Sketch the set R.
(b) Define the marginal p.d.f.'s f_1(x) and f_2(y) in the "margins."
(c) Define h(y | x), the conditional p.d.f. of Y, given X = x.
(d) Find E(Y | x) and draw y = E(Y | x) on the sketch in part (a).
4.3-12 Referring to Exercise 4.2-9, determine the conditional p.d.f. of Y, given W = w,
and the conditional mean E(Y | w).
4.4 Multivariate Distributions of the Continuous Type
In this section we extend the idea of the p.d.f. of one random variable of the
continuous type to that of two or more random variables of the continuous
type. As in the one variable case, the definitions are the same as those in the
discrete case except that integrals replace summations. For the most part we
simply accept this substitution, and thus this section consists mainly of examples and exercises.
The joint probability density function of n random variables X_1, X_2, ..., X_n
of the continuous type is an integrable function f(x_1, x_2, ..., x_n) with the
following properties:

(a) f(x_1, x_2, ..., x_n) ≥ 0.

(b) ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} f(x_1, x_2, ..., x_n) dx_1 ··· dx_n = 1.

(c) P[(X_1, X_2, ..., X_n) ∈ A] = ∫ ··· ∫_A f(x_1, x_2, ..., x_n) dx_1 ··· dx_n,

where (X_1, X_2, ..., X_n) ∈ A is an event defined in n-dimensional Euclidean space.
For the special case of a joint distribution of two random variables X and Y,
note that

P[(X, Y) ∈ A] = ∫∫_A f(x, y) dx dy,

and thus P[(X, Y) ∈ A] is the volume of the solid over the region A in the xy
plane and bounded by the surface z = f(x, y).
Example 4.4-1 Let X and Y have the joint p.d.f.

f(x, y) = e^{−x−y},  0 < x < ∞,  0 < y < ∞.

The graph of z = f(x, y) is given in Figure 4.4-1 for x + y < 9. Let
A = {(x, y): 0 < x < ∞, 0 < y < x/3}. The probability that (X, Y) falls in A is
given by

P[(X, Y) ∈ A] = ∫_0^∞ ∫_0^{x/3} e^{−x−y} dy dx = ∫_0^∞ e^{−x}(1 − e^{−x/3}) dx
  = ∫_0^∞ [e^{−x} − e^{−4x/3}] dx = [−e^{−x} + (3/4)e^{−4x/3}]_0^∞ = 1 − 3/4 = 1/4.
The marginal p.d.f. of any one of these n random variables, say X_1, is given
by the (n − 1)-fold integral

f_1(x_1) = ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} f(x_1, x_2, ..., x_n) dx_2 ··· dx_n.

The definitions associated with mathematical expectations are the same as
those associated with the discrete case after replacing the summations by integrations.
Example 4.4-2 Let X and Y have the joint p.d.f.

f(x, y) = 2,  0 ≤ x ≤ y ≤ 1.

Then R = {(x, y): 0 ≤ x ≤ y ≤ 1} is the support and, for illustration,

P(0 ≤ X ≤ 1/2, 0 ≤ Y ≤ 1/2) = P(0 ≤ X ≤ Y, 0 ≤ Y ≤ 1/2)
  = ∫_0^{1/2} ∫_0^y 2 dx dy = ∫_0^{1/2} 2y dy = 1/4.

The shaded region in Figure 4.4-2 is the region of integration that is a
subset of R, and the given probability is the volume above that region under
the surface z = 2. The marginal p.d.f.'s are given by

f_1(x) = ∫_x^1 2 dy = 2(1 − x),  0 ≤ x ≤ 1,

and

f_2(y) = ∫_0^y 2 dx = 2y,  0 ≤ y ≤ 1.
Four illustrations of expected values are

E(X) = ∫_0^1 ∫_x^1 2x dy dx = ∫_0^1 2x(1 − x) dx = 1/3,

E(Y) = ∫_0^1 ∫_0^y 2y dx dy = ∫_0^1 2y² dy = 2/3,

E(Y²) = ∫_0^1 ∫_0^y 2y² dx dy = ∫_0^1 2y³ dy = 1/2,

and

Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)] = E(XY) − μ_X μ_Y
  = ∫_0^1 ∫_0^y 2xy dx dy − (1/3)(2/3) = ∫_0^1 y³ dy − 2/9 = 1/4 − 2/9 = 1/36.

From these calculations it is obvious that E(X), E(Y), and E(Y²) could be
calculated using the marginal p.d.f.'s as well as the joint one.
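These expectations can also be reproduced symbolically. A short SymPy sketch (assuming SymPy is available; the inner integral in each case is listed first):

```python
import sympy as sp

x, y = sp.symbols('x y', nonnegative=True)
f = 2  # joint p.d.f. on the support 0 <= x <= y <= 1

EX  = sp.integrate(x * f, (y, x, 1), (x, 0, 1))      # 1/3
EY  = sp.integrate(y * f, (x, 0, y), (y, 0, 1))      # 2/3
EY2 = sp.integrate(y**2 * f, (x, 0, y), (y, 0, 1))   # 1/2
EXY = sp.integrate(x * y * f, (x, 0, y), (y, 0, 1))  # 1/4
print(EX, EY, EY2, EXY - EX * EY)                    # covariance 1/36
```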
Let X and Y have a distribution of the continuous type with joint p.d.f.
f(x, y) and marginal p.d.f.'s f_1(x) and f_2(y), respectively. So, in accord with our
policy of transition from the discrete to the continuous case, we have that the
conditional p.d.f., mean, and variance of Y, given X = x, are, respectively,

h(y | x) = f(x, y)/f_1(x),  provided that f_1(x) > 0,

E(Y | x) = ∫_{−∞}^{∞} y h(y | x) dy,

and

Var(Y | x) = E{[Y − E(Y | x)]² | x} = ∫_{−∞}^{∞} [y − E(Y | x)]² h(y | x) dy
           = E[Y² | x] − [E(Y | x)]².

Similar expressions are associated with the conditional distribution of X, given
Y = y.
Example 4.4-3 Let X and Y be the random variables of Example 4.4-2.
Thus

f(x, y) = 2,  0 ≤ x ≤ y ≤ 1,
f_1(x) = 2(1 − x),  0 ≤ x ≤ 1,

and

f_2(y) = 2y,  0 ≤ y ≤ 1.

Before we actually find the conditional p.d.f. of Y, given X = x, we shall give
an intuitive argument. The joint p.d.f. is constant over the triangular region
shown in Figure 4.4-2. If the value of X is known, say X = x, then the
possible values of Y are between x and 1. Furthermore, we would expect Y
to be uniformly distributed on the interval [x, 1]. That is, we would anticipate that h(y | x) = 1/(1 − x), x ≤ y ≤ 1. This is easily verified from the
definition:

h(y | x) = f(x, y)/f_1(x) = 2/[2(1 − x)] = 1/(1 − x),  x ≤ y ≤ 1,  0 ≤ x < 1.

The conditional mean of Y, given X = x, is

E(Y | x) = ∫_x^1 y [1/(1 − x)] dy = [y²/(2(1 − x))]_x^1 = (1 + x)/2,  0 ≤ x < 1.

Note that, for a given x, the conditional mean of Y lies on the dotted line in
Figure 4.4-2, a result that also agrees with our intuition. Similarly, it could
be shown that

E(X | y) = y/2,  0 < y ≤ 1.

The conditional variance of Y, given X = x, is

E{[Y − E(Y | x)]² | x} = ∫_x^1 [y − (1 + x)/2]² [1/(1 − x)] dy = (1 − x)²/12.

Recall that if a random variable W is U(a, b), then E(W) = (a + b)/2 and
Var(W) = (b − a)²/12. Since the conditional distribution of Y, given X = x,
is U(x, 1), we could have written down immediately that E(Y | x) = (x + 1)/2
and Var(Y | x) = (1 − x)²/12.

An illustration of a computation of a conditional probability is

P(1/2 < Y < 3/4 | X = 1/4) = ∫_{1/2}^{3/4} [1/(1 − 1/4)] dy = (1/4)(4/3) = 1/3.
In general, if E(Y | x) is linear, it is equal to

E(Y | x) = μ_Y + ρ(σ_Y/σ_X)(x − μ_X).

If E(X | y) is linear, then

E(X | y) = μ_X + ρ(σ_X/σ_Y)(y − μ_Y).

Thus, in Example 4.4-2, we see that the product of the coefficients of x in
E(Y | x) and y in E(X | y) is ρ² = 1/4. Thus ρ = 1/2, since each coefficient is
positive. Since the ratio of those coefficients is equal to σ_Y²/σ_X² = 1, we have
that σ_X² = σ_Y².
We could have calculated the correlation coefficient directly from the definition

ρ = E[(X − μ_X)(Y − μ_Y)] / (σ_X σ_Y) = Cov(X, Y) / (σ_X σ_Y).

In Example 4.4-2 we showed that Cov(X, Y) = 1/36. We also found E(Y) and
E(Y²), so that σ_Y² = E(Y²) − [E(Y)]² = 1/2 − (2/3)² = 1/18. Since σ_X² = σ_Y²,

ρ = (1/36) / √((1/18)(1/18)) = 1/2.
Of course, the definition of independent random variables of the continuous
type carries over naturally from the discrete case. That is, X and Y are independent if and only if the joint p.d.f. factors into the product of their marginal
p.d.f.'s, namely,

f(x, y) = f_1(x) f_2(y),  x ∈ R_1,  y ∈ R_2.

Thus the random variables X and Y in Example 4.4-1 are independent. In
addition, the rules that allow us to determine easily dependent and independent random variables are also valid here. For illustration, X and Y in
Example 4.4-2 are obviously dependent because the support R is not a
product space, since it is bounded by the diagonal line y = x.
Exercises
4.4-1 Let f(x, y) = 2e^{−x−y}, 0 < x ≤ y < ∞, be the joint p.d.f. of X and Y. Find f_1(x)
and f_2(y), the marginal p.d.f.'s of X and Y, respectively. Are X and Y independent?
4.4-2 Let f(x, y) = 3/2, x² ≤ y ≤ 1, 0 ≤ x ≤ 1, be the joint p.d.f. of X and Y. Find
(a) P(0 ≤ X ≤ 1/2);
(b) P(1/2 ≤ X ≤ 1, 1/2 ≤ Y ≤ 1);
(c) Are X and Y independent?
4.4-3 Let f(x, y) = 1/4, 0 ≤ x ≤ 2, 0 ≤ y ≤ 2, be the joint p.d.f. of X and Y. Find f_1(x)
and f_2(y), the marginal probability density functions. Are the two random variables
independent?
4.4-4 Let X and Y have the joint p.d.f. f(x, y) = x + y, 0 ≤ x ≤ 1, 0 ≤ y ≤ 1.
(a) Find the marginal p.d.f.'s f_1(x) and f_2(y) and show that f(x, y) ≠ f_1(x)f_2(y). Thus
X and Y are dependent. Compute
(b) μ_X,
(c) μ_Y,
(d) σ_X²,
(e) σ_Y²,
(f) the correlation coefficient ρ.
4.4-5 Let f(x, y) = e^{−x−y}, 0 < x < ∞, 0 < y < ∞, be the joint p.d.f. of X and Y. Argue
that X and Y are independent and compute
(a) P(X < Y),
(b) P(X > 1, Y > 1),
(c) P(X = Y),
(d) P(X < 2),
(e) P(0 < X < ∞, X/3 < Y < 3X),
(f) P(0 < X < ∞, 3X < Y < ∞).
4.4-6 Let f(x, y) = 1/20, x ≤ y ≤ x + 2, 0 ≤ x ≤ 10, be the joint p.d.f. of X and Y.
(a) Sketch the region for which f(x, y) > 0, that is, the support.
(b) Find f_1(x), the marginal p.d.f. of X.
(c) Find h(y | x), the conditional p.d.f. of Y, given X = x.
(d) Find the conditional mean and variance of Y, given X = x.
(e) Find f_2(y), the marginal p.d.f. of Y.
4.4-7 Let f(x, y) = 1/40, 0 ≤ x ≤ 10, 10 − x ≤ y ≤ 14 − x, be the joint p.d.f. of X and
Y.
(a) Sketch the region for which f(x, y) > 0.
(b) Find f_1(x), the marginal p.d.f. of X.
(c) Find h(y | x), the conditional p.d.f. of Y, given X = x.
(d) Find E(Y | x), the conditional mean of Y, given X = x.
4.4-8 Let f(x, y) = 1/8, 0 ≤ y ≤ 4, y ≤ x ≤ y + 2, be the joint p.d.f. of X and Y.
(a) Sketch the region for which f(x, y) > 0.
(b) Find f_1(x), the marginal p.d.f. of X.
(c) Find h(y | x), the conditional p.d.f. of Y, given X = x.
(d) Find E(Y | x), the conditional mean of Y, given X = x.
(e) Graph y = E(Y | x) on your sketch in part (a). Is y = E(Y | x) linear?
4.4-9 Let X have a uniform distribution U(0, 2), and let the conditional distribution of
Y, given X = x, be U(0, x²).
(a) Define the joint p.d.f. of X and Y, f(x, y).
(b) Calculate f_2(y), the marginal p.d.f. of Y.
(c) Find E(X | y), the conditional mean of X, given Y = y.
(d) Find E(Y | x), the conditional mean of Y, given X = x.
4.4-10 The joint moment-generating function of the random variables X and Y of the
continuous type with joint p.d.f. f(x, y) is defined by

M(t_1, t_2) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{t_1 x + t_2 y} f(x, y) dx dy,

if this integral exists for −h_1 < t_1 < h_1, −h_2 < t_2 < h_2. If X and Y are independent,
show that

M(t_1, t_2) = M(t_1, 0) M(0, t_2).

From the uniqueness property of the moment-generating function, it can be argued
that this is also a sufficient condition for independence. These statements are also
true in the discrete case. That is, X and Y are independent if and only if M(t_1, t_2) =
M(t_1, 0) M(0, t_2).
4.4-11 Let (X, Y) denote a point selected at random from the rectangle
R = {(x, y): 0 ≤ x ≤ …, 0 ≤ y ≤ …}.
Compute P[(X, Y) ∈ A], where A = {(x, y): y ≤ e^x} ∩ R.
4.4-12 Let X_1, X_2 be independent and have distributions that are U(0, 1). The joint
p.d.f. of X_1 and X_2 is f(x_1, x_2) = 1, 0 ≤ x_1 ≤ 1, 0 ≤ x_2 ≤ 1.
(a) Show that P(X_1² + X_2² ≤ 1) = π/4.
(b) Using pairs of random numbers, find an approximation of π/4.
HINT: For n pairs of random numbers, the relative frequency

#[{(x_1, x_2): x_1² + x_2² ≤ 1}] / n

is an approximation of π/4.
(c) Let Y equal #[{(x_1, x_2): x_1² + x_2² ≤ 1}] for n independent pairs of random
numbers. How is Y distributed?
4.4-13 Let X have a uniform distribution U(0, 1), and let the conditional distribution
of Y, given X = x, be U(x², x² + 1).
(a) Record the joint p.d.f. of X and Y, f(x, y). Sketch the region for which
f(x, y) > 0.
(b) Find the marginal p.d.f. of Y, f_2(y).
(c) What is E(Y | x)?
4.4-14 The distribution of X is U(0, 1), and the conditional distribution of Y, given
X = x, is U(0, 1 − x).
(a) Record the joint p.d.f. of X and Y. Be certain to include the domain of the p.d.f.
(b) Find f_2(y), the marginal p.d.f. of Y.
(c) What is E(Y | x)?
4.4-15 Let X_1, X_2, ..., X_n be independent and have distributions that are U(0, 1). The
joint p.d.f. of X_1, X_2, ..., X_n is f(x_1, x_2, ..., x_n) = 1, 0 ≤ x_1 ≤ 1, 0 ≤ x_2 ≤ 1, ...,
0 ≤ x_n ≤ 1. Show that
(a) P(X_1 + X_2 ≤ 1) = 1/2!,
(b) P(X_1 + X_2 + X_3 ≤ 1) = 1/3!,
(c) P(X_1 + X_2 + ··· + X_n ≤ 1) = 1/n!.
HINT: Draw figures for parts (a) and (b).
4.5 The Bivariate Normal Distribution
Let X and Y be random variables with joint p.d.f. f(x, y) of the continuous
type. Many applications are concerned with the conditional distribution of
one of the random variables, say Y, given that X = x. For example, X and Y
might be a student's grade point averages from high school and from the first
year in college, respectively. Persons in the field of educational testing and
measurement are extremely interested in the conditional distribution of Y,
given X = x, in such situations.

Suppose that we have an application in which we can make the following
three assumptions about the conditional distribution of Y, given X = x:
(a) It is normal for each real x.
(b) Its mean, E(Y | x), is a linear function of x.
(c) Its variance is constant; that is, it does not depend upon the given value
of x.

Of course, assumption (b), along with a result given in Section 4.4, implies
that

E(Y | x) = μ_Y + ρ(σ_Y/σ_X)(x − μ_X).
Let us consider the implication of assumption (c). The conditional variance is
given by

σ²_{Y|x} = ∫_{−∞}^{∞} [y − μ_Y − ρ(σ_Y/σ_X)(x − μ_X)]² h(y | x) dy,

where h(y | x) is the conditional p.d.f. of Y given X = x. Multiply each member
of this equation by f_1(x) and integrate on x. Since σ²_{Y|x} is a constant, the left-hand member is equal to σ²_{Y|x}. Thus we have

σ²_{Y|x} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} [y − μ_Y − ρ(σ_Y/σ_X)(x − μ_X)]² h(y | x) f_1(x) dy dx.

However, h(y | x) f_1(x) = f(x, y); hence the right-hand member is just an expectation, and the equation can be written as

σ²_{Y|x} = E{(Y − μ_Y)² − 2ρ(σ_Y/σ_X)(X − μ_X)(Y − μ_Y) + ρ²(σ_Y²/σ_X²)(X − μ_X)²}.

But using the fact that the expectation E is linear, and recalling that
E[(X − μ_X)(Y − μ_Y)] = ρσ_X σ_Y, we see that

σ²_{Y|x} = σ_Y² − 2ρ(σ_Y/σ_X)ρσ_X σ_Y + ρ²(σ_Y²/σ_X²)σ_X² = σ_Y²(1 − ρ²).

That is, the conditional variance of Y, for each given x, is σ_Y²(1 − ρ²). These
facts about the conditional mean and variance, along with assumption (a),
require that the conditional p.d.f. of Y, given X = x, be

h(y | x) = [1/(σ_Y √(2π) √(1 − ρ²))] exp{ −[y − μ_Y − ρ(σ_Y/σ_X)(x − μ_X)]² / [2σ_Y²(1 − ρ²)] },  −∞ < y < ∞.
Before we make any assumptions about the distribution of X, we give an
example and figure to illustrate the implications of our current assumptions.

Example 4.5-1 Let μ_X = 10, σ_X² = 9, μ_Y = 15, σ_Y² = 16, and ρ = 0.8. We
have seen that assumptions (a), (b), and (c) imply that the conditional distribution of Y, given X = x, is

N(15 + (0.8)(4/3)(x − 10), 16(1 − 0.8²)).

In Figure 4.5-1 the conditional mean line

E(Y | x) = 15 + (0.8)(4/3)(x − 10) = (16/15)x + 13/3

has been graphed. For each of x = 5, 10, and 15, the p.d.f. of Y, given
X = x, is given.
Up to this point, nothing has been said about the distribution of X other
than that it has mean μ_X and positive variance σ_X². Suppose, in addition, we
assume that this distribution is also normal; that is, the marginal p.d.f. of X is

f_1(x) = [1/(σ_X √(2π))] exp[ −(x − μ_X)²/(2σ_X²) ],  −∞ < x < ∞.
4.5
The Bivariate Normal Distribution
Hence the joint p.d.f. of X and Y is given by the product

f(x, y) = h(y | x)f₁(x) = [1/(2πσ_Xσ_Y√(1 − ρ²))] exp[−q(x, y)/2],          (4.5-1)

where it is easy to show (see Exercise 4.5-2) that

q(x, y) = [1/(1 − ρ²)]{[(x − μ_X)/σ_X]² − 2ρ[(x − μ_X)/σ_X][(y − μ_Y)/σ_Y] + [(y − μ_Y)/σ_Y]²}.
A joint p.d.f. of this form is called a bivariate normal p.d.f.
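As a sanity check on equation (4.5-1), the density can be coded directly and compared against a library implementation. The following sketch assumes NumPy and SciPy are available; the function name is ours.

```python
import numpy as np
from scipy.stats import multivariate_normal

def bivariate_normal_pdf(x, y, mu_x, sig_x, mu_y, sig_y, rho):
    """Evaluate f(x, y) of equation (4.5-1) directly."""
    u = (x - mu_x) / sig_x
    v = (y - mu_y) / sig_y
    q = (u * u - 2.0 * rho * u * v + v * v) / (1.0 - rho ** 2)
    return np.exp(-q / 2.0) / (2.0 * np.pi * sig_x * sig_y * np.sqrt(1.0 - rho ** 2))

mu_x, sig_x, mu_y, sig_y, rho = 10.0, 3.0, 15.0, 4.0, 0.8
cov = [[sig_x ** 2, rho * sig_x * sig_y],
       [rho * sig_x * sig_y, sig_y ** 2]]
print(bivariate_normal_pdf(11.0, 16.0, mu_x, sig_x, mu_y, sig_y, rho))
print(multivariate_normal([mu_x, mu_y], cov).pdf([11.0, 16.0]))  # same value
```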
Example 4.5-2 Let us assume that in a certain population of college students, the respective grade point averages, say X and Y, in high school and the first year in college have an approximate bivariate normal distribution with parameters μ_X = 2.9, μ_Y = 2.4, σ_X = 0.4, σ_Y = 0.5, and ρ = 0.8. Then, for illustration,

P(2.1 < Y < 3.3) = P[(2.1 − 2.4)/0.5 < Z < (3.3 − 2.4)/0.5]
                 = Φ(1.8) − Φ(−0.6) = 0.6898.

Since the conditional p.d.f. of Y, given X = 3.2, is normal with mean

2.4 + (0.8)(0.5/0.4)(3.2 − 2.9) = 2.7

and standard deviation (0.5)√(1 − 0.64) = 0.3, we have that

P(2.1 < Y < 3.3 | X = 3.2) = Φ[(3.3 − 2.7)/0.3] − Φ[(2.1 − 2.7)/0.3]
                           = Φ(2) − Φ(−2) = 0.9544.
From a practical point of view, however, the reader should be warned that
the correlation coefficient of these grade point averages is, in many
instances, much smaller than 0.8.
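The two probabilities in Example 4.5-2 are easy to reproduce with any normal c.d.f. routine; a sketch using scipy.stats.norm (assumed available) follows.

```python
from scipy.stats import norm

mu_x, sig_x, mu_y, sig_y, rho = 2.9, 0.4, 2.4, 0.5, 0.8

# Marginal probability: P(2.1 < Y < 3.3) = Phi(1.8) - Phi(-0.6).
print(norm.cdf(3.3, mu_y, sig_y) - norm.cdf(2.1, mu_y, sig_y))   # 0.6898

# Conditional distribution of Y given X = 3.2 is N(2.7, 0.3^2).
m = mu_y + rho * (sig_y / sig_x) * (3.2 - mu_x)   # 2.7
s = sig_y * (1.0 - rho ** 2) ** 0.5               # 0.3
print(norm.cdf(3.3, m, s) - norm.cdf(2.1, m, s))  # 0.9544
```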
Since x and y enter the bivariate normal p.d.f. in a similar manner, the roles of X and Y could have been interchanged. That is, Y could have been assigned the marginal normal p.d.f. N(μ_Y, σ²_Y), and the conditional p.d.f. of X, given Y = y, would have then been normal, with mean μ_X + ρ(σ_X/σ_Y)(y − μ_Y) and variance σ²_X(1 − ρ²). The reader should make special note of this symmetry.
In order to have a better understanding of the geometry of the bivariate normal distribution, consider the graph of z = f(x, y), where f(x, y) is given by equation (4.5-1). If we intersect this surface with planes parallel to the yz plane, that is, with x = x₀, we have

f(x₀, y) = f₁(x₀)h(y | x₀).

In this equation f₁(x₀) is a constant, and h(y | x₀) is a normal p.d.f. Thus z = f(x₀, y) is bell-shaped, that is, has the shape of a normal p.d.f. However, note that it is not necessarily a p.d.f. because of the factor f₁(x₀). Similarly, intersections of the surface z = f(x, y) with planes y = y₀, parallel to the xz plane, will be bell-shaped.
If we intersect z = f(x, y) with the plane z = z₀, which is parallel to the xy plane, where

0 < z₀ < 1/(2πσ_Xσ_Y√(1 − ρ²)),

we have

[1/(2πσ_Xσ_Y√(1 − ρ²))] exp[−q(x, y)/2] = z₀.

Taking the natural logarithm of each side, we obtain

[(x − μ_X)/σ_X]² − 2ρ[(x − μ_X)/σ_X][(y − μ_Y)/σ_Y] + [(y − μ_Y)/σ_Y]² = −2(1 − ρ²) ln(2πσ_Xσ_Y z₀√(1 − ρ²)).

Thus we see that these intersections are ellipses.
Example 4.5-3 With μ_X = 10, σ²_X = 9, μ_Y = 15, σ²_Y = 16, and ρ = 0.8, the bivariate normal p.d.f. has been graphed in Figure 4.5-2. For ρ = 0.8, the level curves for z₀ = 0.001, 0.006, 0.011, 0.016, and 0.021 are given in Figure 4.5-3. The conditional mean line,

E(Y | x) = 15 + (0.8)(4/3)(x − 10) = (16/15)x + 13/3,

is also drawn on Figure 4.5-3. Note that this line intersects the level curves at points through which vertical tangents can be drawn to the ellipses.
Figure 4.5-2
We close this section by observing another important property of the correlation coefficient ρ if X and Y have a bivariate normal distribution. In the product h(y | x)f₁(x) of equation (4.5-1), consider the factor h(y | x) when ρ = 0: it is then a normal p.d.f. with mean μ_Y and variance σ²_Y, namely f₂(y). We see that this product, which is the joint p.d.f. of X and Y, then equals f₁(x)f₂(y). That is, if ρ = 0, the joint p.d.f. factors into the product of the two marginal probability density functions, and, hence, X and Y are independent random variables. Of course, if X and Y are any independent random variables (not necessarily normal), we know that ρ, if it exists, is always equal to zero. Thus we have proved the following.
THEOREM 4.5-1 If X and Y have a bivariate normal distribution with correlation coefficient ρ, then X and Y are independent if and only if ρ = 0.

Thus, in the bivariate normal case, ρ = 0 does imply independence of X and Y.
It should be mentioned here that these characteristics of the bivariate normal distribution can be extended to the trivariate normal distribution or, more generally, the multivariate normal distribution. This is done in more advanced texts assuming some knowledge of matrices; for illustration, see Chapter 12 of Hogg and Craig (1978).

Figure 4.5-3
Exercises
4.5-1 Let X and Y have a bivariate normal distribution with parameters μ_X = −3, μ_Y = 10, σ²_X = 25, σ²_Y = 9, and ρ = 3/5. Compute
(a) P(−5 < X < 5),
(b) P(−5 < X < 5 | Y = 13),
(c) P(7 < Y < 16),
(d) P(7 < Y < 16 | X = 2).
4.5-2 Show that the expression in the exponent of equation (4.5-1) is equal to the function q(x, y) given in the text.
4.5-3 Let X and Y have a bivariate normal distribution with parameters μ_X = 2.8, μ_Y = 110, σ²_X = 0.16, σ²_Y = 100, and ρ = 0.6. Compute
(a) P(106 < Y < 124),
(b) P(106 < Y < 124 | X = 3.2).
4.5-4 Let X and Y have a bivariate normal distribution with parameters μ_X = 50, μ_Y = ....
(a) P(65.8 < Y ≤ ...).
4.5-5 Let X denote the height in centimeters and Y the weight in kilograms of male college students. Assume that X and Y have a bivariate normal distribution with parameters μ_X = 185, σ²_X = 100, μ_Y = 84, σ²_Y = 64, and ρ = 3/5.
(a) Determine the conditional distribution of Y, given that X = 190.
(b) Find P(86.4 < Y < 95.36 | X = 190).
4.5-6 For a freshman taking introductory statistics and majoring in psychology, let X equal the student's ACT mathematics score and Y the student's ACT verbal score. Assume that X and Y have a bivariate normal distribution with μ_X = 22.7, σ²_X = 17.64, μ_Y = 22.7, σ²_Y = 12.25, and ρ = 0.78. Find
(a) P(18.5 < Y < 25.5),
(b) E(Y | x),
(c) Var(Y | x),
(d) P(18.5 < Y < 25.5 | X = 23),
(e) P(18.5 < Y < 25.5 | X = 25).
(f) For x = 21, 23, and 25, draw a graph of z = h(y | x) similar to Figure 4.5-1.
4.5-7 For a pair of gallinules, let X equal the weight in grams of the male and Y the weight in grams of the female. Assume that X and Y have a bivariate normal distribution with μ_X = 413.6, σ²_X = 457.96, μ_Y = 346.7, σ²_Y = 519.84, and ρ = −0.32. Find
(a) P(309.3 < Y < 380.9),
(b) E(Y | x),
(c) Var(Y | x),
(d) P(309.3 < Y < 380.9 | X = 384.9).
4.5-8 Let X and Y have a bivariate normal distribution with parameters μ_X = 10, σ²_X = 9, μ_Y = 15, σ²_Y = 16, and ρ = 0. Find
(a) P(13.6 < Y < 17.2),
(b) E(Y | x),
(c) Var(Y | x),
(d) P(13.6 < Y < 17.2 | X = 9.1).
4.5-9 Let X and Y have a bivariate normal distribution. Find two different lines, a(x) and b(x), parallel to and equidistant from E(Y | x), such that

P[a(x) < Y < b(x) | X = x] = 0.9544

for all real x. Plot a(x), b(x), and E(Y | x) when μ_X = 2, μ_Y = −..., and ρ = 3/5.
4.5-10 In a college health fitness program, let X denote the weight in kilograms of a male freshman at the beginning of the program and let Y denote his weight change during a semester. Assume that X and Y have a bivariate normal distribution with μ_X = 72.30, σ²_X = 110.25, μ_Y = 2.80, σ²_Y = 2.89, and ρ = −0.57. (The lighter students tend to gain weight, while the heavier students tend to lose weight.) Find
(a) P(2.80 ≤ Y < 5.35),
(b) P(1.76 < Y ≤ 5.34 | X = 82.3).
4.5-11 For a female freshman in a health fitness program, let X equal her percentage of body fat at the beginning of the program and let Y equal the change in her percentage of body fat measured at the end of the program. Assume that X and Y have a bivariate normal distribution with μ_X = 24.5, σ²_X = 4.8² = 23.04, μ_Y = −0.2, σ²_Y = 3.0² = 9.0, and ρ = −0.32. Find
(a) P(1.3 ≤ Y < ...),
(b) μ_{Y|x}, the conditional mean of Y, given X = x,
(c) σ²_{Y|x}, the conditional variance of Y, given X = x,
(d) P(1.3 ≤ Y < 5.8 | X = 18).
4.5-12 For a male freshman in a health fitness program, let X equal his percentage of body fat at the beginning of the program and let Y equal the change in his percentage of body fat measured at the end of the program. Assume that X and Y have a bivariate normal distribution with μ_X = 15.00, σ²_X = 4.5², μ_Y = −1.55, σ²_Y = 1.5², and ρ = −0.60. Find
(a) P(0.205 ≤ Y ≤ ...),
(b) P(0.21 ≤ Y < 0.81 | X = 20).
4.6
Sampling from Bivariate Distributions
In Section 1.4 we found that we could compute characteristics of a sample in
exactly the same way that we computed the corresponding characteristics of a
distribution of one random variable. Of course, this can be done because the
sample characteristics are actually those of a distribution, namely the empirical distribution. Also, since the empirical distribution approximates the actual
distribution when the sample size is large, the sample characteristics can be
used as estimates of the corresponding characteristics of the actual distribution. These notions can be extended to samples from distributions of two or
more variables, and the purpose of this section is to illustrate this extension.
Let (X₁, Y₁), (X₂, Y₂), ..., (Xₙ, Yₙ) be n independent observations of the pair of random variables (X, Y) with a fixed, but possibly unknown, distribution. We say that the collection of pairs (X₁, Y₁), (X₂, Y₂), ..., (Xₙ, Yₙ) is a random sample from this distribution involving two random variables. The definition of a random sample can obviously be extended to samples from distributions with more than two random variables and hence will not be given formally. If we now assign the weight 1/n to each observed pair (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ), we create a discrete-type distribution of probability in two-dimensional space. Using this empirical distribution, we can compute the means, the variances, the covariance, the correlation coefficient, and the best-fitting line.
We use the means of the empirical distribution, x̄ and ȳ, as the sample means. However, in defining the sample variances and the sample covariance, we modify, as before, those corresponding characteristics of the empirical distribution by multiplying them by the factor n/(n − 1). The purpose of this modification is to create better estimates (in some sense to be defined later) of the variances and covariance of the underlying distribution. In particular, the means, variances, covariance, and correlation coefficient of the sample are given by
x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ,    ȳ = (1/n) Σᵢ₌₁ⁿ yᵢ,

s²_x = [1/(n − 1)] Σᵢ₌₁ⁿ (xᵢ − x̄)² = [n Σxᵢ² − (Σxᵢ)²] / [n(n − 1)],

s²_y = [1/(n − 1)] Σᵢ₌₁ⁿ (yᵢ − ȳ)² = [n Σyᵢ² − (Σyᵢ)²] / [n(n − 1)],

c_xy = [1/(n − 1)] Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) = [n Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)] / [n(n − 1)],

and

r = c_xy/(s_x s_y) = [n Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)] / √{[n Σxᵢ² − (Σxᵢ)²][n Σyᵢ² − (Σyᵢ)²]}.

From the last equation we see that the covariance of the sample can be written as c_xy = r s_x s_y. The best-fitting line (least squares regression line) is

y = ȳ + r(s_y/s_x)(x − x̄) = ȳ + (c_xy/s²_x)(x − x̄).
Recall that in Section 4.2 we fit that straight line by minimizing a certain expectation, which for the empirical distribution would be

(1/n) Σᵢ₌₁ⁿ [yᵢ − a − bxᵢ]².

Of course, we do not expect that each observed pair (xᵢ, yᵢ) will lie on this line, y = a + bx. That is, yᵢ does not usually equal ȳ + r(s_y/s_x)(xᵢ − x̄). However, we do expect this line to fit the collection of points (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ) in some best way, namely best by the principle of least squares.
Example 4.6-1 To simplify the calculations, we take a small sample size,
n = 5. Let the five observed points be
(3,2), (6,0), (5,2), (1,6), (3,5).
It is sometimes helpful to construct the following table to simplify the calculations.
  x    y    xy   x²   y²
  3    2     6    9    4
  6    0     0   36    0
  5    2    10   25    4
  1    6     6    1   36
  3    5    15    9   25
 --   --    --   --   --
 18   15    37   80   69
Thus

x̄ = (1/5) Σxᵢ = (1/5)(18) = 3.6,
ȳ = (1/5) Σyᵢ = (1/5)(15) = 3.0,

s²_x = [5(80) − (18)²]/[5(4)] = 76/20 = 3.8,

s²_y = [5(69) − (15)²]/[5(4)] = 120/20 = 6.0,

c_xy = [5(37) − (18)(15)]/[5(4)] = −85/20 = −4.25,

and

r = c_xy/(s_x s_y) = −4.25/(√3.8 √6.0) = −0.89.

Thus the best-fitting line is

y = ȳ + r(s_y/s_x)(x − x̄) = 3.0 − 0.89√(6.0/3.8)(x − 3.6) = 7.03 − 1.12x.
We plot the five observed points and the best-fitting line on the same graph
(Figure 4.6-1) in order to compare the fit of this line to the collection of
points.
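The hand computation of Example 4.6-1 can be mirrored in a few lines of plain Python implementing the shortcut formulas above; nothing beyond the standard library is assumed.

```python
import math

pts = [(3, 2), (6, 0), (5, 2), (1, 6), (3, 5)]
n = len(pts)
sx = sum(x for x, _ in pts)            # 18
sy = sum(y for _, y in pts)            # 15
sxx = sum(x * x for x, _ in pts)       # 80
syy = sum(y * y for _, y in pts)       # 69
sxy = sum(x * y for x, y in pts)       # 37

xbar, ybar = sx / n, sy / n                    # 3.6 and 3.0
s2x = (n * sxx - sx ** 2) / (n * (n - 1))      # 3.8
s2y = (n * syy - sy ** 2) / (n * (n - 1))      # 6.0
cxy = (n * sxy - sx * sy) / (n * (n - 1))      # -4.25
r = cxy / math.sqrt(s2x * s2y)                 # about -0.89

b = r * math.sqrt(s2y / s2x)   # slope, about -1.12
a = ybar - b * xbar            # intercept, about 7.03
print(f"r = {r:.2f}; best-fitting line: y = {a:.2f} + ({b:.2f})x")
```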
Example 4.6-2 Let the joint p.d.f. of the random variables X and Y of the discrete type be f(x, y) = 1/15, 1 ≤ y ≤ x ≤ 5, where x and y are integers. Then the marginal p.d.f. of X is f₁(x) = x/15, x = 1, 2, 3, 4, 5, and the marginal p.d.f. of Y is f₂(y) = (6 − y)/15, y = 1, 2, 3, 4, 5. It is easy to show that μ_X = 11/3, σ²_X = 14/9, μ_Y = 7/3, σ²_Y = 14/9, and ρ = 1/2. Thus the "best-fitting" line associated with this distribution is

y = μ_Y + ρ(σ_Y/σ_X)(x − μ_X) = 7/3 + (1/2)(1)(x − 11/3) = (1/2)x + 1/2.
A random sample of n = 30 observations from this joint distribution was
generated and yielded the following data:
(2, 2)  (4, 1)  (5, 4)  (3, 1)  (4, 1)  (2, 1)
(5, 2)  (3, 2)  (1, 1)  (5, 1)  (3, 2)  (5, 4)
(5, 2)  (4, 3)  (5, 1)  (4, 3)  (4, 2)  (4, 2)
(4, 2)  (4, 2)  (2, 1)  (4, 3)  (4, 1)  (3, 3)
(4, 3)  (5, 2)  (4, 4)  (5, 5)  (3, 1)  (3, 2)
For these 30 observations, in which some points were observed more than once, we have x̄ = 3.767, ȳ = 2.133, s_x = 1.073, s_y = 1.106, and r = 0.405.
Thus the observed best-fitting line for these 30 points is
y = 0.417x + 0.560;
this should be compared to the best-fitting line of the distribution.
To visualize better the relationship between r and a plot of n points (x₁, y₁), ..., (xₙ, yₙ), we have generated three different sets of 50 pairs of observations from three bivariate normal distributions. In the next example we list the corresponding values of x̄, ȳ, s²_x, s²_y, r, and the observed best-fitting line. Each set of points and corresponding line are plotted on the same graph.
Example 4.6-3 Three random samples, each of size n = 50, were taken from three different bivariate normal distributions. For each of these distributions, μ_X = 12, σ²_X = 16, μ_Y = 8, σ²_Y = 9. The respective values of the correlation coefficient, ρ, are 0.8, 0.2, and −0.6. The corresponding sample characteristics are
Figure 4.6-2(a)
(a) x̄ = 11.905, s²_x = 14.095, ȳ = 8.271, s²_y = 6.851, and r = 0.799, so the best-fitting line is

y = 8.271 + 0.799√(6.851/14.095)(x − 11.905) = 0.557x + 1.639.

(b) x̄ = 12.038, s²_x = 15.011, ȳ = 7.790, s²_y = 7.931, and r = 0.169, so the best-fitting line is

y = 7.790 + 0.169√(7.931/15.011)(x − 12.038) = 0.123x + 6.311.

(c) x̄ = 12.095, s²_x = 16.762, ȳ = 8.040, s²_y = 7.655, and r = −0.689, so the best-fitting line is

y = 8.040 − 0.689√(7.655/16.762)(x − 12.095) = −0.466x + 13.672.
In Figure 4.6-2, these respective lines and the corresponding sample points
are plotted. Note the effect that the value of r has on the slope of the line
and the variability of the points about that line.
Figure 4.6-2(c)
The next example shows that two random variables X and Y may be clearly related (dependent) but yet have a correlation coefficient ρ close to zero. This, however, is not unexpected, since we recall that ρ does, in a sense, measure the linear relationship between two random variables. That is, the linear relationship between X and Y could be zero, whereas higher order ones could be quite strong.
Example 4.6-4 Twenty-five observations of X and Y, generated from a
certain bivariate distribution, are
(6.91, 17.52)  (4.32, 22.69)  (2.38, 17.61)  (7.98, 14.29)  (8.26, 10.77)
(2.00, 12.87)  (3.10, 18.63)  (7.69, 16.77)  (2.21, 14.97)  (3.42, 19.16)
(8.18, 11.15)  (5.39, 22.41)  (1.19, 7.50)   (3.21, 19.06)  (5.47, 23.89)
(7.35, 16.63)  (2.32, 15.09)  (7.54, 14.75)  (1.27, 10.75)  (7.33, 17.42)
(8.41, 9.40)   (8.72, 9.83)   (6.09, 22.33)  (5.30, 21.37)  (7.30, 17.36)
For this set of observations x̄ = 5.33, ȳ = 16.17, s²_x = 6.521, s²_y = 20.865, and r = −0.06. Note that r is very close to zero even though X and Y seem very dependent; that is, it seems that a quadratic expression would fit the data very well. In Exercise 4.6-11 the reader is asked to fit y = a + bx + cx² to these 25 points by the method of least squares. See Figure 4.6-3 for a plot of the 25 points.
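The phenomenon of Example 4.6-4 is easy to reproduce by simulation: points that lie nearly on a parabola, symmetric over the x range, give a sample correlation near zero. A sketch assuming NumPy (the particular parabola is our choice, not the one behind the book's data):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(1.0, 9.0, 200)
y = -(x - 5.0) ** 2 + 20.0 + rng.normal(0.0, 1.0, 200)  # strong quadratic dependence

r = np.corrcoef(x, y)[0, 1]
print(f"sample r = {r:.3f}")  # near 0, even though Y is almost a function of X
```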
Exercises
4.6-1 Three observed values of the pair of random variables (X, Y) yielded the three points (2, 3), (4, 7), and (6, 5).
(a) Calculate x̄, ȳ, s²_x, s²_y, r, and the equation of the best-fitting line.
(b) Plot the points and the best-fitting line on the same graph.

4.6-2 Three observed values of the pair of random variables (X, Y) yielded the three points (1, 2), (3, 1), and (2, 3).
(a) Calculate x̄, ȳ, s²_x, s²_y, r, and the equation of the best-fitting line.
(b) Plot the points and the best-fitting line on the same graph.

4.6-3 A pair of unbiased dice was rolled six independent times. Let X denote the smaller outcome and Y the larger outcome on the dice. The following outcomes were observed:

(2, 5)  (3, 5)  (3, 6)  (2, 3)  (5, 5)  (1, 3)

(a) Find x̄, ȳ, s²_x, s²_y, r, and the best-fitting line for the sample.
(b) Plot the points and the line on the same graph.
Figure 4.6-3
(c) Define the joint p.d.f. of X and Y (see Example 4.1-1) and then calculate μ_X, μ_Y, σ_X, σ_Y, ρ, and the best-fitting line of this joint distribution.
4.6-4 Ten college students took the Undergraduate Record Exam (URE) when they were juniors and the Graduate Record Exam (GRE) when they were seniors. The Quantitative URE score (x) and the Quantitative GRE score (y) for each of these 10 students is given in the following list of ordered pairs (x, y):

(550, 570)  (670, 730)  (490, 450)  (410, 540)  (570, 560)
(490, 400)  (450, 420)  (490, 520)  (780, 710)  (520, 620)

(a) Verify that x̄ = 542.0, ȳ = 552.0, s²_x = 12,040.0, s²_y = 12,640.0, and r = 0.79.
(b) Find the equation of the best-fitting line.
(c) Plot the 10 points and the line on the same graph.
4.6-5 The respective high school and college grade-point averages for 20 college seniors as ordered pairs (x, y) are
(3.75, 3.19)  (3.42, 2.97)  (3.47, 3.15)  (2.47, 2.11)
(3.30, 3.05)  (3.45, 3.34)  (4.00, 3.79)  (2.60, 2.26)
(3.36, 3.01)  (2.58, 2.63)  (2.87, 2.23)  (2.65, 2.55)
(4.00, 3.76)  (3.60, 2.92)  (3.80, 3.22)  (3.60, 3.46)
(3.10, 2.50)  (2.30, 2.11)  (3.65, 3.09)  (3.79, 3.27)
(a) Verify that x̄ = 3.29, ȳ = 2.93, s²_x = 0.283, s²_y = 0.260, and r = 0.92.
(b) Find the equation of the best-fitting line.
(c) Plot the 20 points and the line on the same graph.
4.6-6 The respective high school grade-point average and the SAT mathematics score for 25 college students as ordered pairs (x, y) are
(4.00, 577)  (2.53, 453)  (3.45, 407)  (2.48, 539)  (2.69, 534)
(2.82, 584)  (2.33, 464)  (2.21, 525)  (2.59, 545)  (3.37, 499)
(3.00, 446)  (2.93, 466)  (3.25, 491)  (2.90, 433)  (3.64, 556)
(3.23, 394)  (2.46, 497)  (2.62, 460)  (2.75, 413)  (2.82, 440)
(3.51, 608)  (4.00, 657)  (3.72, 449)  (2.78, 323)  (3.33, 413)
(a) Verify that x̄ = 3.02, ȳ = 486.92, s²_x = 0.258, s²_y = 5824.7, and r = 0.275.
(b) Find the equation of the best-fitting line.
(c) Plot the 25 points and the line on the same graph.
4.6-7 Let the set R = {(x, y): x/2 < y < x/2 + 2, 0 < x < 10}. Let (X, Y) denote a random point selected uniformly from R.
(a) Sketch the set R in the xy plane. Does it seem intuitive to you that E(Y | x) = x/2 + 1?
(b) Twenty-five points selected at random from R by the computer are
(6.58, 3.58)  (4.73, 2.36)  (1.52, 1.96)  (7.35, 4.68)
(9.17, 5.50)  (9.74, 5.61)  (9.96, 5.68)  (6.43, 3.61)
(9.06, 4.81)  (2.05, 1.92)  (3.39, 2.70)  (4.78, 4.05)
(1.70, 0.97)  (3.37, 3.62)  (2.75, 2.25)  (6.57, 4.26)
(5.29, 3.17)  (3.13, 1.60)  (8.06, 4.34)  (1.79, 1.26)
(9.81, 6.39)  (1.32, 1.87)  (9.30, 5.95)  (0.46, 2.04)
Use the 25 observed values to obtain the best-fitting line of this sample. Note that it is close to E(Y | x) = x/2 + 1.
4.6-8 The following data give the ACT Math and ACT Verbal scores for 15 students:
(16, 19)  (25, 21)  (27, 29)  (18, 17)  (21, 24)
(28, 24)  (22, 18)  (23, 18)  (30, 24)  (20, 23)
(24, 18)  (27, 23)  (17, 20)  (31, 25)  (28, 24)
(a) Verify that x̄ = 23.8, ȳ = 21.8, s²_x = 22.457, s²_y = 11.600, and r = 0.626.
(b) Find the equation of the best-fitting line.
(c) Plot the 15 points and the line on the same graph.
4.6-9 Each of 16 professional golfers hits off the tee a golf ball of brand A and a golf
ball of brand B, eight hitting ball A before ball B and eight hitting ball B before ball
A. Let X and Y equal the distances traveled in yards for ball A and for ball B,
respectively. The following data, (x, y), were observed:
(265, 252)  (255, 244)  (244, 251)  (272, 276)
(258, 245)  (212, 222)  (246, 243)  (276, 259)
(254, 255)  (260, 246)  (274, 267)  (224, 231)
(274, 275)  (274, 260)  (263, 246)  (269, 267)
(a) Verify that x̄ = 257.50, ȳ = 252.44, s²_x = 341.333, s²_y = 218.796, and r = 0.867.
(b) Find the equation of the best-fitting line.
(c) Plot the 16 points and the line on the same graph.
4.6-10 Fourteen pairs of gallinules were captured and weighed. Let X equal the male
weight and Y the female weight. The following weights in grams were observed:
(405, 321)  (403, 328)  (415, 355)  (396, 378)
(370, 372)  (400, 340)  (457, 351)  (435, 314)
(425, 398)  (450, 320)  (425, 375)  (420, 330)
(415, 365)  (425, 355)
(a) Verify that x̄ = 417.2, s_x = 22.36, ȳ = 350.1, s_y = 25.56, and r = −0.252.
(b) Find the equation of the best-fitting line.
(c) Plot the points and the line on the same graph.
4.6-11 We would like to fit the quadratic curve y = a + bx + cx² to a set of points (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ) by the method of least squares. To do this, let

h(a, b, c) = Σᵢ₌₁ⁿ (yᵢ − a − bxᵢ − cxᵢ²)².

(a) By setting the three first partial derivatives of h with respect to a, b, and c equal to zero, show that a, b, and c satisfy the following set of equations, all sums going from 1 to n:

an + b Σxᵢ + c Σxᵢ² = Σyᵢ;
a Σxᵢ + b Σxᵢ² + c Σxᵢ³ = Σxᵢyᵢ;
a Σxᵢ² + b Σxᵢ³ + c Σxᵢ⁴ = Σxᵢ²yᵢ.

(b) For the data given in Example 4.6-4, Σxᵢ = 133.34, Σxᵢ² = 867.75, Σxᵢ³ = 6197.21, Σxᵢ⁴ = 46,318.88, Σyᵢ = 404.22, Σxᵢyᵢ = 2138.38, Σxᵢ²yᵢ = 13,380.30. Show that a = −1.88, b = 9.86, and c = −0.995.
(c) Plot the points and this least squares quadratic regression curve on the same graph.
4.6-12 Let a random number X be selected uniformly from the interval (1, 9). For each observed value of X = x, let a random number Y be selected uniformly from the interval (x² − ..., ...). Twenty-five pairs of observations generated on a computer are
(4.16, 2.66)   (2.88, 8.60)   (4.97, 3.76)   (2.02, 12.81)
(2.69, 10.14)  (2.54, 8.69)   (1.49, 14.07)  (2.13, 10.36)
(2.44, 8.04)   (3.20, 4.41)   (4.20, 3.01)   (8.74, 17.15)
(3.17, 6.79)   (5.39, 1.63)   (8.43, 14.23)  (6.10, 4.75)
(5.47, 1.82)   (8.17, 14.55)  (2.18, 12.76)  (3.18, 6.56)
(8.26, 12.89)  (6.62, 6.72)   (2.68, 9.53)   (8.06, 11.63)
(6.87, 5.96)
(a) For these data, Σxᵢ = 116.04, Σxᵢ² = 675.35, Σxᵢ³ = 4551.52, Σxᵢ⁴ = 33,331.38, Σyᵢ = 213.52, Σxᵢyᵢ = 1036.97, Σxᵢ²yᵢ = 6661.79. Show that the equation of the least squares quadratic regression curve is equal to

y = 1.026x² − 10.296x + 28.612.

(b) Plot the points and the least squares quadratic regression curve on the same graph.
4.6-13 After keeping records for 6 years, a telephone company is interested in predicting the number of telephones that will be in service in year 7. The following data are available. The number of telephones in service is given in thousands.

Year                    1    2    3    4    5    6
Number of Telephones   91   93   95   99  102  105
(a) Find the quadratic curve of best fit, y = a + bx + cx², that could be used for this prediction.
(b) Plot the points and the curve on the same graph.
(c) Predict the number of telephones that will be in service in year 7.
4.6-14 For male freshmen in a health fitness program, let X equal a participant's percentage of body fat at the beginning of the semester and let Y equal the change in this percentage (percentage at the end of the semester minus percentage at the beginning of the semester, so that a negative y indicates a loss). Twelve observations of (X, Y) are
(13.1, 1.1)   (8.2, −1.1)   (5.4, 0.5)    (16.8, 0.5)
(10.4, −0.2)  (14.3, −4.6)  (17.9, −1.3)  (17.4, −2.0)
(11.1, 1.0)   (10.6, −2.2)  (10.5, −1.4)  (5.3, 1.7)
(a) Verify that x̄ = 11.750, s_x = 4.298, ȳ = −0.667, s_y = 1.788, and r = −0.395.
(b) Find the equation of the best-fitting line (i.e., the least squares regression line).
(c) Plot the points and the line on the same graph.
4.6-15 Let X equal the number of milligrams of tar and Y the number of milligrams of carbon monoxide per filtered cigarette (100 millimeters in length) measured by the Federal Trade Commission. A sample of 12 brands yielded the following data:
(5, 7)    (15, 15)  (17, 16)  (20, 20)
(9, 11)   (11, 10)  (8, 9)    (13, 13)
(11, 9)   (13, 11)  (12, 11)  (11, 14)
(a) Verify that x̄ = 12.08, s_x = 4.010, ȳ = 12.17, s_y = 3.614, and r = 0.915.
(b) Find the equation of the best-fitting line.
(c) Plot the points and the line on the same graph.
4.6-16 For each of 20 statistics students, let X and Y equal the mother's and father's
ages, respectively. The observed data are
(50, 52)  (48, 50)  (51, 56)  (51, 50)
(48, 48)  (54, 58)  (50, 53)  (64, 65)
(44, 46)  (52, 51)  (40, 41)  (53, 56)
(47, 50)  (52, 62)  (50, 49)  (48, 51)
(46, 48)  (51, 55)  (51, 52)  (49, 52)
(a) Verify that x̄ = 49.95, s_x = 4.628, ȳ = 52.25, s_y = 5.418, and r = 0.888.
(b) Find the equation of the best-fitting line.
(c) Plot the points and the line on the same graph.
4.7
The t and F Distributions
Two distributions that play an important role in statistical applications will be
introduced in this section.
THEOREM 4.7-1 If Z is a random variable that is N(0, 1), if U is a random variable that is χ²(r), and if Z and U are independent, then

T = Z/√(U/r)

has a t distribution with r degrees of freedom. Its p.d.f. is

g(t) = Γ[(r + 1)/2] / {√(πr) Γ(r/2)(1 + t²/r)^{(r+1)/2}},   −∞ < t < ∞.
REMARK This distribution was first discovered by W. S. Gosset when he was
working for an Irish brewery. Because Gosset published under the pseudonym
Student, this distribution is sometimes known as Student's t distribution.
Proof: In the proof, we first find an expression for the distribution function of T and then take its derivative to find the p.d.f. of T. Since Z and U are independent, the joint p.d.f. of Z and U is

g(z, u) = [1/√(2π)] e^{−z²/2} · [1/(Γ(r/2)2^{r/2})] u^{r/2−1} e^{−u/2},   −∞ < z < ∞, 0 < u < ∞.

The distribution function F(t) = P(T ≤ t) of T is given by

F(t) = P(Z/√(U/r) ≤ t) = P(Z ≤ √(u/r) t)
     = ∫₀^∞ [ ∫_{−∞}^{√(u/r)t} g(z, u) dz ] du.

That is,

F(t) = [1/(√(2π)Γ(r/2))] ∫₀^∞ [ ∫_{−∞}^{√(u/r)t} e^{−z²/2} dz ] (u^{r/2−1} e^{−u/2}/2^{r/2}) du.

The p.d.f. of T is the derivative of the distribution function; so applying the Fundamental Theorem of Calculus to the inner integral we see that

f(t) = F′(t) = [1/(√(2π)Γ(r/2))] ∫₀^∞ √(u/r) e^{−ut²/(2r)} (u^{r/2−1} e^{−u/2}/2^{r/2}) du
     = [1/(√(2πr)Γ(r/2))] ∫₀^∞ (u^{(r+1)/2−1}/2^{r/2}) e^{−(u/2)(1 + t²/r)} du.

In the integral make the change of variables

y = (1 + t²/r)u   so that   du/dy = 1/(1 + t²/r).

Thus we find that

f(t) = {Γ[(r + 1)/2] / [√(πr)Γ(r/2)(1 + t²/r)^{(r+1)/2}]} ∫₀^∞ {y^{(r+1)/2−1} e^{−y/2} / (Γ[(r + 1)/2] 2^{(r+1)/2})} dy.

The integral in this last expression for f(t) is equal to 1 because the integrand is like the p.d.f. of a chi-square distribution with r + 1 degrees of freedom. Thus the p.d.f. is as given in the theorem. □
Note that the distribution of T is completely determined by the number r. Since it is, in general, difficult to evaluate the distribution function of T, some values of P(T ≤ t) are found in Table VI in the Appendix for r = 1, 2, 3, ..., 30. Also observe that the graph of the p.d.f. of T is symmetrical with respect to the vertical axis t = 0 and is very similar to the graph of the p.d.f. of the standard normal distribution N(0, 1). Figure 4.7-1 shows the graphs of the probability density functions of T when r = 1, 3, and 7 and of N(0, 1). In this figure we see that the tails of the t distribution are heavier than those of a normal one; that is, there is more extreme probability in the t distribution than in the standardized normal one.
Because of the symmetry of the t distribution about t = 0, the mean (if it exists) must equal zero. That is, it can be shown that E(T) = 0 when r ≥ 2. When r = 1, the t distribution is the same as the Cauchy distribution, and we noted in Section 3.2 that the mean and thus the variance do not exist for the Cauchy distribution. The variance of T is

Var(T) = E(T²) = r/(r − 2)   when r ≥ 3.

The variance does not exist when r = 1 or 2. Although it is fairly difficult to compute these moments from the p.d.f. of T, they can be found (Exercise 4.7-4) using the definition of T and the independence of Z and U, namely

E(T) = E(Z)E(√(r/U))   and   E(T²) = E(Z²)E(r/U).
For notational purposes we shall let t_α(r) denote the constant for which

P(T ≥ t_α(r)) = α

when T has a t distribution with r degrees of freedom. That is, t_α(r) is the 100(1 − α) percentile of the t distribution with r degrees of freedom (see Figure 4.7-2). In this figure, r = 7. Let us consider some illustrations of the use of the t-table and this notation for right tail probabilities.
Example 4.7-1 Let T have a t distribution with seven degrees of freedom. Then, from Table VI in the Appendix, we have

P(T ≤ 1.415) = 0.90,
P(T ≤ −1.415) = 1 − P(T ≤ 1.415) = 0.10,

and

P(−1.895 ≤ T ≤ 1.415) = 0.90 − 0.05 = 0.85.

We also have, for example, t_{0.10}(7) = 1.415, t_{0.90}(7) = −t_{0.10}(7) = −1.415, and t_{0.025}(7) = 2.365.
Example 4.7-2 Let T have a t distribution with a variance of 5/4. Thus r/(r − 2) = 5/4, and r = 10. Then

P(−1.812 ≤ T ≤ 1.812) = 0.90

and t_{0.05}(10) = 1.812, t_{0.01}(10) = 2.764, and t_{0.99}(10) = −2.764.
Example 4.7-3 Let T have a t distribution with 14 degrees of freedom. Find a constant c such that P(|T| < c) = 0.90. From Table VI in the Appendix we see that P(T ≤ 1.761) = 0.95 and therefore c = 1.761 = t_{0.05}(14).
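Readers without Table VI at hand can reproduce Examples 4.7-1 through 4.7-3 with any t distribution routine; this sketch assumes scipy.stats.t is available.

```python
from scipy.stats import t

print(t.cdf(1.415, 7))                     # P(T <= 1.415) = 0.90 for r = 7
print(t.cdf(1.415, 7) - t.cdf(-1.895, 7))  # 0.85, as in Example 4.7-1
print(t.ppf(0.975, 7))                     # t_{0.025}(7) = 2.365
print(t.ppf(0.95, 14))                     # c = 1.761 with P(|T| < c) = 0.90, r = 14
```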
Another important distribution in statistical applications is introduced in
the following theorem.
THEOREM 4.7-2 If U and V are independent chi-square variables with r₁ and r₂ degrees of freedom, respectively, then

F = (U/r₁)/(V/r₂)

has an F distribution with r₁ and r₂ degrees of freedom. Its p.d.f. is

f(w) = {Γ[(r₁ + r₂)/2](r₁/r₂)^{r₁/2} w^{r₁/2−1}} / {Γ(r₁/2)Γ(r₂/2)(1 + r₁w/r₂)^{(r₁+r₂)/2}},   0 < w < ∞.
REMARK For many years the random variable defined in Theorem 4.7-2 has
been called F, a symbol first proposed by George Snedecor to honor R. A.
Fisher, who used a modification of this ratio in several statistical applications.
Proof: In this proof, we first find an expression for the distribution function of F and then take its derivative to find the p.d.f. of F. Since U and V are independent, the joint p.d.f. of U and V is

g(u, v) = [u^{r₁/2−1} e^{−u/2} / (Γ(r₁/2)2^{r₁/2})] · [v^{r₂/2−1} e^{−v/2} / (Γ(r₂/2)2^{r₂/2})],   0 < u < ∞, 0 < v < ∞.

In this derivation we let W = F to avoid using f as a symbol for a variable. The distribution function F(w) = P(W ≤ w) of F is

F(w) = P[(U/r₁)/(V/r₂) ≤ w] = P[U ≤ (r₁/r₂)wV]
     = ∫₀^∞ [ ∫₀^{(r₁/r₂)wv} g(u, v) du ] dv.

The p.d.f. of F = W is the derivative of the distribution function; so applying the Fundamental Theorem of Calculus to the inner integral, we have

f(w) = F′(w) = ∫₀^∞ (r₁v/r₂){[(r₁/r₂)wv]^{r₁/2−1} e^{−(r₁/r₂)wv/2} / (Γ(r₁/2)2^{r₁/2})}[v^{r₂/2−1} e^{−v/2} / (Γ(r₂/2)2^{r₂/2})] dv
     = [(r₁/r₂)^{r₁/2} w^{r₁/2−1} / (Γ(r₁/2)Γ(r₂/2))] ∫₀^∞ [v^{(r₁+r₂)/2−1}/2^{(r₁+r₂)/2}] e^{−(v/2)(1 + r₁w/r₂)} dv.

In the integral, make the change of variables

y = (1 + r₁w/r₂)v   so that   dv/dy = 1/(1 + r₁w/r₂).

Thus we see that

f(w) = {(r₁/r₂)^{r₁/2} Γ[(r₁ + r₂)/2] w^{r₁/2−1} / [Γ(r₁/2)Γ(r₂/2)(1 + r₁w/r₂)^{(r₁+r₂)/2}]} ∫₀^∞ {y^{(r₁+r₂)/2−1} e^{−y/2} / (Γ[(r₁ + r₂)/2] 2^{(r₁+r₂)/2})} dy.

The integral in this last expression for f(w) is equal to 1 because the integrand is like a p.d.f. of a chi-square distribution with r₁ + r₂ degrees of freedom. Thus the p.d.f. f(w) is as given in the theorem. □
Note that the F distribution depends on two parameters, r₁ and r₂, in that order. The first parameter is the number of degrees of freedom in the numerator, and the second is the number of degrees of freedom in the denominator. See Figure 4.7-3 for graphs of the p.d.f. of the F distribution for four pairs of degrees of freedom. It can be shown that

E(F) = r₂/(r₂ − 2)   and   Var(F) = 2r₂²(r₁ + r₂ − 2) / [r₁(r₂ − 2)²(r₂ − 4)].

To verify these two expressions, we note, using the independence of U and V in the definition of F, that

E(F) = (r₂/r₁)E(U)E(1/V)   and   E(F²) = (r₂/r₁)²E(U²)E(1/V²).
In Exercise 4.7-9 the student is asked to find E(U), E(1/V), E(U²), and E(1/V²).
Some values of the distribution function P(F ≤ f) of the F distribution are given in Table VII in the Appendix. For notational purposes, if F has an F distribution with r₁ and r₂ degrees of freedom, we say that the distribution of F is F(r₁, r₂). Furthermore, we will let F_α(r₁, r₂) denote the constant [the upper 100α percent point of F(r₁, r₂)] for which

P[F ≥ F_α(r₁, r₂)] = α.
Example 4.7-5 If the distribution of F is F(r₁, r₂), then from Table VII in the Appendix we see that

when r₁ = 7, r₂ = 8, P(F ≤ 3.50) = 0.95, so F_{0.05}(7, 8) = 3.50;
when r₁ = 9, r₂ = 4, P(F ≤ 14.66) = 0.99, so F_{0.01}(9, 4) = 14.66.

To use Table VII to find the F values corresponding to the 0.01, 0.025, and 0.05 cumulative probabilities we need to note the following: Since F = (U/r₁)/(V/r₂), where U and V are independent and χ²(r₁) and χ²(r₂), respectively, then 1/F = (V/r₂)/(U/r₁) must have a distribution that is F(r₂, r₁). Note the change in the order of the parameters in the distribution of 1/F. Now if the distribution of F is F(r₁, r₂), then

P[F ≥ F_α(r₁, r₂)] = α

and

P[1/F ≤ 1/F_α(r₁, r₂)] = α.

The complement of {1/F ≤ 1/F_α(r₁, r₂)} is {1/F > 1/F_α(r₁, r₂)}. Thus

P[1/F > 1/F_α(r₁, r₂)] = 1 − α.                                  (4.7-1)

Since the distribution of 1/F is F(r₂, r₁), by definition of F_{1−α}(r₂, r₁), we have

P[1/F ≥ F_{1−α}(r₂, r₁)] = 1 − α.                                (4.7-2)

From equations (4.7-1) and (4.7-2) we see that

F_{1−α}(r₂, r₁) = 1/F_α(r₁, r₂).                                 (4.7-3)

The use of equation (4.7-3) is illustrated in the next example.
Example 4.7-6 If the distribution of F is F(4, 9), constants c and d such that

P(F < c) = 0.01   and   P(F ≤ d) = 0.05

are given by

c = F_{0.99}(4, 9) = 1/F_{0.01}(9, 4) = 1/14.66 = 0.0682;
d = F_{0.95}(4, 9) = 1/F_{0.05}(9, 4) = 1/6.00 = 0.1667.

Furthermore, if F is F(6, 9), then

P(F < 0.2439) = P(1/F > 1/0.2439) = P(1/F ≥ 4.10) = 0.05,

because the distribution of 1/F is F(9, 6).
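Equation (4.7-3) and Example 4.7-6 can be checked numerically; the sketch below assumes scipy.stats.f, whose ppf is the inverse of the c.d.f.

```python
from scipy.stats import f

# Equation (4.7-3): F_{1-alpha}(r2, r1) = 1 / F_alpha(r1, r2).
alpha, r1, r2 = 0.01, 4, 9
upper = f.ppf(1 - alpha, r1, r2)  # F_alpha(r1, r2), the upper 100*alpha % point
lower = f.ppf(alpha, r2, r1)      # F_{1-alpha}(r2, r1)
print(lower, 1.0 / upper)         # the two values agree

# Example 4.7-6: c with P(F < c) = 0.01 when the distribution of F is F(4, 9).
print(f.ppf(0.01, 4, 9))          # about 0.0682 = 1/14.66
```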
Exercises
4.7-1 Let T have a t distribution with r degrees of freedom. Find
(a) P(T ≤ ...),
(b) P(T ≤ ...),
(c) P(|T| > 2.228) when r = 10,
(d) P(−1.753 ≤ T ≤ 2.602) when r = 15,
(e) P(1.330 ≤ T ≤ 2.552) when r = 18.
4.7-2 Let T have a t distribution with r = 19. Find c such that
(a) P(|T| ≥ c) = 0.05,
(b) P(|T| ≥ c) = 0.01,
(c) P(T ≥ c) = 0.025,
(d) P(|T| ≤ c) = 0.95.
4.7-3 Find
(a) t_{0.01}(13),
(b) t_{0.01}(15),
(c) t_{0.95}(17),
(d) t_{0.99}(5).
4.7-4 Let T have a t distribution with r degrees of freedom. Show that E(T) = 0, r ≥ 2, and Var(T) = r/(r − 2), provided that r ≥ 3, by first finding E(Z), E(1/√U), E(Z²), and E(1/U).
4.7-5 Let T have a t distribution with r degrees of freedom. Show that T² has an F distribution with 1 and r degrees of freedom.
HINT: Consider T² = Z²/(U/r).
4.7-6 Let F have an F distribution with r₁ and r₂ degrees of freedom. Find
(a) P(F ≤ ...),
(b) P(F ≥ 4.14) when r₁ = 7, r₂ = 15,
(c) P(F < 0.1508) when r₁ = 8, r₂ = 5,
HINT: 0.1508 = 1/6.63.
(d) P(0.1323 ≤ F ≤ 2.79) when r₁ = 6, r₂ = 15.
4.7-7 Let F have an F distribution with r₁ and r₂ degrees of freedom. Find numbers a and b such that
(a) P(a ≤ F ≤ b) = 0.90 when r₁ = 8, r₂ = 6,
(b) P(a ≤ F ≤ b) = ....
4.7-8 Find
(a) F_{0.05}(5, 9),
(b) F_{0.025}(9, 7),
(c) F_{0.95}(8, 5),
(d) F_{0.99}(5, 7).
4.7-9 Find the mean and the variance of an F random variable with r₁ and r₂ degrees of freedom by first finding E(U), E(1/V), E(U²), and E(1/V²) as suggested in the text.
4.7-10 Let X₁ and X₂ have independent gamma distributions with parameters α, θ and β, θ, respectively. Let W = X₁/(X₁ + X₂). Use a method which is similar to that given in the proofs of Theorems 4.7-1 and 4.7-2 to show that the p.d.f. of W is

g(w) = [Γ(α + β)/(Γ(α)Γ(β))] w^{α−1}(1 − w)^{β−1},   0 < w < 1.

We say that W has a beta distribution with parameters α and β (see Example 4.8-3).
4.7-11 Let X have a beta distribution with parameters α and β. Show that the mean and variance of X are

μ = α/(α + β)   and   σ² = αβ/[(α + β + 1)(α + β)²].

HINT: In evaluating E(X) and E(X²) compare the integrands to the p.d.f.'s of beta distributions with parameters α + 1, β and α + 2, β, respectively.
4.7-12 Let Z₁, Z₂, Z₃ be a random sample of size 3 from a standard normal distribution N(0, 1).
(a) How is U distributed if

U = Z₁/√[(Z₂² + Z₃²)/2]?

(b) Let V = Z₁/Z₂. Show that V has a Cauchy distribution.
HINT: Use a method similar to the proofs of Theorems 4.7-1 and 4.7-2. Note the quadrants in which V > 0, V < 0, Z₂ > 0, and Z₂ < 0.
(c) Let

W = Z₁/√[(Z₁² + Z₂²)/2].

Show that the distribution function of W is

F(w) = 0,                                        w ≤ −√2,
     = (1/π) Arctan[√((2 − w²)/w²)],             −√2 < w < 0,
     = 1/2,                                      w = 0,
     = 1 − (1/π) Arctan[√((2 − w²)/w²)],         0 < w < √2,
     = 1,                                        √2 ≤ w.

HINT: What relationship is there between parts (b) and (c)?
(d) Show that the p.d.f. of W is

f(w) = 1/(π√(2 − w²)),   −√2 < w < √2.

Note that this is a U-shaped distribution. Why does it differ so much from that in part (a) when the definitions for U and W are so similar?
(e) Show that the distribution function of W, for −√2 < w < √2, can be defined by

F(w) = 1/2 + (1/π) Arcsin(w/√2) = 1/2 + (1/π) Arctan(w/√(2 − w²)).

(f) Find the means and variances (if they exist) of U, V, and W.
*4.8
Transformations of Random Variables
In this section we consider another important method of constructing models.
As a matter of fact, to go very far in theory or application of statistics, one
must know something about transformations of random variables. We saw
how important this was in dealing with the normal distribution when we
noted that if X is N(μ, σ²), then Z = u(X) = (X − μ)/σ is N(0, 1); this simple transformation allows us to use one table for probabilities associated with all normal distributions. In the proof of this, we found that the distribution function of Z was given by the integral

G(z) = P[(X − μ)/σ ≤ z] = P(X ≤ zσ + μ)
     = ∫_{−∞}^{zσ+μ} [1/(σ√(2π))] exp[−(x − μ)²/(2σ²)] dx.
In Section 3.4 we changed variables, x = wσ + μ, in the integral to determine the p.d.f. of Z; but let us now simply differentiate the integral with respect to z. If we recall from calculus that the derivative

(d/dz) ∫_a^{v(z)} f(t) dt = f[v(z)]v′(z),

then, since certain assumptions are satisfied in our case, we have

g(z) = G′(z) = [1/(σ√(2π))] exp[−(zσ + μ − μ)²/(2σ²)] · d(zσ + μ)/dz
     = [1/√(2π)] exp(−z²/2).
That is, Z has the p.d.f. of a standard normal distribution.
Motivated by the preceding argument we note that, in general, if X is a continuous-type random variable with p.d.f. f(x) on support a < x < b and if Y = u(X) and its inverse X = v(Y) are increasing continuous functions, then the p.d.f. of Y is

g(y) = f[v(y)]v′(y)

on the support given by a < v(y) < b or, equivalently, u(a) < y < u(b). Moreover, if u and v are decreasing functions, then the p.d.f. is

g(y) = f[v(y)][−v′(y)],   u(b) < y < u(a).

Hence, to cover both cases, we can simply write

g(y) = |v′(y)| f[v(y)],   c < y < d,

where the support c < y < d corresponds to a < x < b through the transformation x = v(y).
Example 4.8-1 Let the positive random variable X have the p.d.f. f(x) = e^{−x}, 0 < x < ∞, which is skewed to the right. To find a distribution that is more symmetric than that of X, statisticians frequently use the square root transformation, namely Y = √X. Here y = √x corresponds to x = y², which has derivative 2y. Thus the p.d.f. of Y is

g(y) = 2y e^{−y²},   0 < y < ∞,

which is of the Weibull type (see Section 3.5). The graphs of f(x) and g(y) should convince the reader that the latter is more symmetric than the former (see Figure 4.8-1).
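A quick Monte Carlo check of Example 4.8-1 (assuming NumPy): simulate X, apply the square root transformation, and compare an empirical probability with the one implied by g(y) = 2y e^{−y²}, whose c.d.f. is 1 − e^{−y²}.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(1.0, 100_000)  # f(x) = e^{-x}, 0 < x
y = np.sqrt(x)                     # the square root transformation

t0 = 1.2
print((y <= t0).mean())            # empirical P(Y <= 1.2)
print(1.0 - np.exp(-t0 ** 2))      # theoretical value, about 0.763
```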
Example 4.8-2 Let X be binomial with parameters n and p. Since X has a discrete distribution, Y = u(X) will also have a discrete distribution with the same probabilities as those in the support of X. For illustration, with n = 3, p = 1/4, and Y = X², we have

g(y) = P(Y = y) = C(3, √y)(1/4)^{√y}(3/4)^{3−√y},   y = 0, 1, 4, 9.
For a more interesting problem with the binomial random variable X, suppose we were to search for a transformation u(X/n) of the relative frequency X/n that would have a variance very little dependent on p itself when n is large. That is, we want the variance of u(X/n) to be essentially a constant. Consider the function u(X/n) and find, using two terms of Taylor's expansion about p, that

u(X/n) ≈ u(p) + u′(p)(X/n − p).

Here terms of higher powers can be disregarded if n is large enough so that X/n is close enough to p. Thus

Var[u(X/n)] ≈ [u′(p)]² Var(X/n − p) = [u′(p)]² p(1 − p)/n.          (4.8-1)

However, if Var[u(X/n)] is to be constant with respect to p, then

[u′(p)]² p(1 − p) = k   or   u′(p) = c/√(p(1 − p)),

where k and c are constants. We see that

u(p) = 2c Arcsin √p

is a solution to this differential equation. Thus with c = 1/2, we frequently see, in the literature, use of the arcsine transformation, namely

Y = Arcsin √(X/n),

which, with large n, has an approximate normal distribution with mean

μ = Arcsin √p

and variance [using formula (4.8-1)]

Var(Y) ≈ [1/(2√(p(1 − p)))]² · p(1 − p)/n = 1/(4n).
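The variance-stabilizing property is visible in simulation: for several values of p, the sample variance of Arcsin √(X/n) stays near 1/(4n). A sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 400, 20_000
print("target 1/(4n) =", 1.0 / (4 * n))     # 0.000625
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    xs = rng.binomial(n, p, reps)
    ys = np.arcsin(np.sqrt(xs / n))          # the arcsine transformation
    print(f"p = {p}:  Var(Y) is about {ys.var():.6f}")
# Each printed variance is close to 0.000625, whatever the value of p.
```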
There should be one note of warning here: If the function Y = u(X) does not have a single-valued inverse, the determination of the distribution of Y will not be as simple. We did consider one such example in Section 3.5 by finding the distribution of Z², where Z is N(0, 1). In that case there were "two inverse functions" and special care was exercised. In our examples, we will not consider problems with "many inverses"; however, we thought that such a warning should be issued here.
When two or more random variables are involved, many interesting problems can result. In the case of a single-valued inverse, the rule is about the same as that in the one-variable case, with the derivative being replaced by the Jacobian. Namely, if X₁ and X₂ are two continuous-type random variables with joint p.d.f. f(x₁, x₂), and if Y₁ = u₁(X₁, X₂), Y₂ = u₂(X₁, X₂) has the single-valued inverse X₁ = v₁(Y₁, Y₂), X₂ = v₂(Y₁, Y₂), then the joint p.d.f. of Y₁ and Y₂ is

g(y₁, y₂) = |J| f[v₁(y₁, y₂), v₂(y₁, y₂)],

where the Jacobian J is the determinant

J = | ∂x₁/∂y₁   ∂x₁/∂y₂ |
    | ∂x₂/∂y₁   ∂x₂/∂y₂ |.

Of course, we find the support of Y₁, Y₂ by considering the mapping of the support of X₁, X₂ under the transformation y₁ = u₁(x₁, x₂), y₂ = u₂(x₁, x₂).
Example 4.8-3 Let X₁ and X₂ have independent gamma distributions with parameters α, θ and β, θ, respectively. That is, the joint p.d.f. of X₁ and X₂ is

f(x₁, x₂) = [1/(Γ(α)Γ(β)θ^{α+β})] x₁^{α−1} x₂^{β−1} exp[−(x₁ + x₂)/θ],   0 < x₁ < ∞, 0 < x₂ < ∞.

Consider

Y₁ = X₁/(X₁ + X₂),   Y₂ = X₁ + X₂,

or, equivalently,

X₁ = Y₁Y₂,   X₂ = Y₂ − Y₁Y₂.

The Jacobian is

J = |  y₂       y₁    |
    | −y₂    1 − y₁   | = y₂(1 − y₁) + y₁y₂ = y₂.

Thus the joint p.d.f. g(y₁, y₂) of Y₁ and Y₂ is

g(y₁, y₂) = |y₂| [1/(Γ(α)Γ(β)θ^{α+β})] (y₁y₂)^{α−1} [y₂(1 − y₁)]^{β−1} e^{−y₂/θ},

where the support is 0 < y₁ < 1, 0 < y₂ < ∞. The marginal p.d.f. of Y₁ is

g₁(y₁) = [y₁^{α−1}(1 − y₁)^{β−1} / (Γ(α)Γ(β))] ∫₀^∞ (y₂^{α+β−1}/θ^{α+β}) e^{−y₂/θ} dy₂.

But the integral in this expression is that of a gamma p.d.f. with parameters α + β and θ, except for Γ(α + β); thus the integral equals Γ(α + β) and

g₁(y₁) = [Γ(α + β)/(Γ(α)Γ(β))] y₁^{α−1}(1 − y₁)^{β−1},   0 < y₁ < 1.

We say that Y₁ has a beta p.d.f. with parameters α and β.
Example 4.8-4 (Box-Muller Transformation) Let the distribution of X be N(μ, σ²). It is not easy to simulate an observation of X using Y = F(X), where Y is uniform U(0, 1) and F is the distribution function of the desired normal distribution, because we cannot express the normal distribution function F(x) in closed form. Consider the following transformation, however, where X₁ and X₂ are observations of a random sample from U(0, 1):

Z₁ = √(−2 ln X₁) cos(2πX₂),   Z₂ = √(−2 ln X₁) sin(2πX₂)

or, equivalently,

X₁ = exp[−(Z₁² + Z₂²)/2],   X₂ = (1/(2π)) Arctan(Z₂/Z₁),

which has Jacobian

J = | −z₁ exp[−(z₁² + z₂²)/2]    −z₂ exp[−(z₁² + z₂²)/2] |
    | −z₂/[2π(z₁² + z₂²)]         z₁/[2π(z₁² + z₂²)]     |
  = −(1/(2π)) exp[−(z₁² + z₂²)/2].

Since the joint p.d.f. of X₁ and X₂ is

f(x₁, x₂) = 1,   0 < x₁ < 1, 0 < x₂ < 1,

we have that the joint p.d.f. of Z₁ and Z₂ is

g(z₁, z₂) = |J| · 1 = (1/(2π)) exp[−(z₁² + z₂²)/2],   −∞ < z₁ < ∞, −∞ < z₂ < ∞.

(The student should note that there is some difficulty with the definition of the transformation, particularly when z₁ = 0. However, these difficulties occur at events with probability zero and hence cause no problems; see Exercise 4.8-12.) To summarize, from two independent U(0, 1) random variables, we have generated two independent N(0, 1) random variables through this Box-Muller transformation.
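The Box-Muller transformation of Example 4.8-4 is directly usable as a normal generator; a minimal sketch in plain Python follows.

```python
import math
import random

def box_muller(u1, u2):
    """Map two independent U(0, 1) observations to two independent N(0, 1) values."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)

sample = []
for _ in range(50_000):
    u1 = 1.0 - random.random()  # in (0, 1], so log(u1) is always defined
    z1, z2 = box_muller(u1, random.random())
    sample.extend((z1, z2))

mean = sum(sample) / len(sample)
var = sum(z * z for z in sample) / len(sample) - mean ** 2
print(mean, var)  # close to 0 and 1, as expected for N(0, 1)
```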
The techniques described for two random variables can be extended to three or more random variables. We do not give any details here but mention, for illustration, that with three random variables X₁, X₂, X₃ of the continuous type, we need three "new" random variables Z₁, Z₂, Z₃ so that the corresponding Jacobian of the single-valued inverse transformation is the nonzero determinant

J = | ∂x₁/∂z₁   ∂x₁/∂z₂   ∂x₁/∂z₃ |
    | ∂x₂/∂z₁   ∂x₂/∂z₂   ∂x₂/∂z₃ |
    | ∂x₃/∂z₁   ∂x₃/∂z₂   ∂x₃/∂z₃ |.
Exercises
4.8-1 Let the p.d.f. of X be defined by f(x) = x³/4, 0 < x < 2. Find the p.d.f. of Y = X³.

4.8-2 Let the p.d.f. of X be defined by f(x) = 1/π, −π/2 < x < π/2. Find the p.d.f. of Y = tan X. We say that Y has a Cauchy distribution.

4.8-3 Let the p.d.f. of X be defined by f(x) = (3/2)x², −1 < x < 1. Find the p.d.f. of Y = (X³ + 1)/2.

4.8-4 If Y has a uniform distribution on the interval (0, 1), find the p.d.f. of X = (2Y − 1)^{1/3}.
4.8-5 Let X₁, X₂ denote a random sample from a distribution that is χ²(2). Find the joint p.d.f. of Y₁ = X₁ and Y₂ = X₁ + X₂. Here note that the support of Y₁, Y₂ is 0 < y₁ < y₂ < ∞. Also find the marginal p.d.f. of each of Y₁ and Y₂. Are Y₁ and Y₂ independent?
4.8-6 Let Z₁, Z₂ be a random sample from the standard normal distribution N(0, 1). Use the transformation defined by polar coordinates Z₁ = X₁ cos X₂, Z₂ = X₁ sin X₂.
(a) Show that the Jacobian equals x₁. (This explains the factor r of r dr dθ in the usual polar coordinate notation.)
(b) Find the joint p.d.f. of X₁ and X₂.
(c) Are X₁ and X₂ independent?
4.8-7 Let the independent random variables X₁ and X₂ be N(0, 1) and χ²(r), respectively. Let Y₁ = X₁/√(X₂/r) and Y₂ = X₂.
(a) Find the joint p.d.f. of Y₁ and Y₂.
(b) Determine the marginal p.d.f. of Y₁ and show that Y₁ has a Student's t distribution.
4.8-8 Let A", and X; be independent chi-square random variables with i-i and r^
degrees of freedom, respectively. Let r, = (X,/r,)/(X,/r,)and y.,=X,.
(a) Find the joint p.d.f. of V, and V,.
(b) Determine the marginal p.d.f. of Yi and show that V, has an F distribution.
4.8-9 Let X have a Poisson distribution with mean λ. Find a transformation u(X) so that Var[u(X)] is about free of λ, for large values of λ.
HINT: u(X) ≈ u(λ) + u′(λ)(X − λ), provided [u″(λ)](X − λ)²/2 and higher terms can be neglected when λ is large. A solution is u(X) = √X, and the latter restriction should be checked.
4.8-10 Generalize Exercise 4.8-9 by assuming that the variance of a distribution is equal to cμ^p, where c is a constant (note in Exercise 4.8-9 that this is the case with p = 1). In particular, the transformation Y = u(X) = X^{1−p/2}, when p ≠ 2, or Y = u(X) = ln X, when p = 2, seems to produce a random variable Y whose variance is about free of μ. This is the reason transformations like √X, ln X, and more generally powers of X are so popular in applications.
4.8-11 Let Z₁ and Z₂ be independent standard normal random variables N(0, 1). Show that Y₁ = Z₁/Z₂ has a Cauchy distribution.
4.8-12 In Example 4.8-4 verify that the given transformation maps {(x₁, x₂): 0 < x₁ < 1, 0 < x₂ < 1} onto {(z₁, z₂): −∞ < z₁ < ∞, −∞ < z₂ < ∞} except for a set of points that has probability 0.
HINT: What is the image of vertical line segments? What is the image of horizontal line segments?
4.8-13 Let A"i and X^ be independent chi-square random variables with r, and r;
degrees of freedom, respectively. Show that
(a) U = X J { X ^ + X t ) has a beta distribution with a = r,/2 and j8=r;,/2;
(b) V = JC;/(^i + X ^ ) has a beta distribution with a = 1-2/2 and /? = r^/2.
4.8-14 (a) Let X have a beta distribution with parameters α and β (see Example 4.8-3). Show that the mean and variance of X are

μ = α/(α + β)   and   σ² = αβ/[(α + β + 1)(α + β)²].

(b) Show that the beta p.d.f. has a maximum (mode) at x = (α − 1)/(α + β − 2).
4.8-15 Determine the constant c such that f(x) = cx³(1 − x)⁶, 0 < x < 1, is a p.d.f.
4.8-16 When α and β are integers and 0 < p < 1, then

∫_p^1 [Γ(α + β)/(Γ(α)Γ(β))] y^{α−1}(1 − y)^{β−1} dy = Σ_{y=0}^{α−1} C(n, y) p^y (1 − p)^{n−y},

where n = α + β − 1.
4.8-17 Evaluate

∫₀^p [Γ(7)/(Γ(4)Γ(3))] y³(1 − y)² dy

(a) using integration,
(b) using the result of Exercise 4.8-16.