4 Multivariate Distributions

4.1 Multivariate Distributions of the Discrete Type

There are many random experiments or situations that involve more than one random variable. For example, a college admissions department might be interested in the ACT mathematics score X and the ACT verbal score Y of prospective students. Or manufactured items might be classified into three or more categories: here X might represent the number of good items among n items, Y the number of "seconds," and the number of defectives would then be n - X - Y. In a biology laboratory, 400 kernels of corn could be classified into four categories: smooth and yellow, smooth and purple, wrinkled and yellow, and wrinkled and purple. The numbers in these four categories could be denoted by X_1, X_2, X_3, and X_4 = 400 - X_1 - X_2 - X_3, respectively. In order to deal with situations such as these, it will be necessary to extend certain definitions as well as give new ones.

DEFINITION 4.1-1 Let X and Y be two functions defined on a discrete probability space. Let R denote the corresponding two-dimensional space of X and Y, the two random variables of the discrete type. The probability that X = x and Y = y is denoted by f(x, y) = P(X = x, Y = y), and it is induced from the discrete probability space through the functions X and Y. The function f(x, y) is called the joint probability density function (joint p.d.f.) of X and Y and has the following properties:

(i) 0 \le f(x, y) \le 1.
(ii) \sum_{(x, y) \in R} f(x, y) = 1.
(iii) P[(X, Y) \in A] = \sum_{(x, y) \in A} f(x, y), where A is a subset of the space R.

The following example will make this definition more meaningful.

Example 4.1-1 Roll a pair of unbiased dice. For each of the 36 sample points with probability 1/36, let X denote the smaller and Y the larger outcome on the dice. For example, if the outcome is (3, 2), then the observed values are X = 2, Y = 3; if the outcome is (2, 2), then the observed values are X = Y = 2. The joint p.d.f. of X and Y is given by the induced probabilities

f(x, y) = 1/36, \quad 1 \le x = y \le 6,
f(x, y) = 2/36, \quad 1 \le x < y \le 6,

when x and y are integers. Figure 4.1-1 depicts the probabilities of the various points of the space R.

Notice that certain numbers have been recorded in the bottom and left-hand margins of Figure 4.1-1. These numbers are the respective column and row totals of the probabilities. The column totals are the respective probabilities that X will assume the values in the x space R_1 = {1, 2, 3, 4, 5, 6}, and the row totals are the respective probabilities that Y will assume the values in the y space R_2 = {1, 2, 3, 4, 5, 6}. That is, the totals describe probability density functions of X and Y, respectively. Since each collection of these probabilities is frequently recorded in the margins and satisfies the properties of a p.d.f. of one random variable, each is called a marginal p.d.f.

DEFINITION 4.1-2 Let X and Y have the joint probability density function f(x, y) with space R. The probability density function of X alone, called the marginal probability density function of X, is defined by

f_1(x) = \sum_y f(x, y), \quad x \in R_1,

where the summation is taken over all possible y values for each given x in the x space R_1. That is, the summation is over all (x, y) in R with a given x value. Similarly, the marginal probability density function of Y is defined by

f_2(y) = \sum_x f(x, y), \quad y \in R_2,

where the summation is taken over all possible x values for each given y in the y space R_2.
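To make Definition 4.1-2 and Example 4.1-1 concrete, here is a minimal Python sketch (an illustration added here, not part of the text) that enumerates the 36 equally likely outcomes, tabulates the induced joint p.d.f. of X = smaller and Y = larger outcome, and recovers the marginal p.d.f.'s as the column and row totals described above. The use of the fractions module is an implementation choice, made only to keep the probabilities exact.

```python
from fractions import Fraction
from collections import defaultdict

# Joint p.d.f. of X = smaller outcome, Y = larger outcome (Example 4.1-1).
joint = defaultdict(Fraction)
for d1 in range(1, 7):
    for d2 in range(1, 7):
        x, y = min(d1, d2), max(d1, d2)
        joint[(x, y)] += Fraction(1, 36)   # each sample point has probability 1/36

# Marginal p.d.f.'s (Definition 4.1-2): sum the joint p.d.f. over the other variable.
f1 = {x: sum(p for (a, b), p in joint.items() if a == x) for x in range(1, 7)}
f2 = {y: sum(p for (a, b), p in joint.items() if b == y) for y in range(1, 7)}

print(joint[(2, 3)])        # 1/18 (= 2/36), since (2, 3) and (3, 2) both map here
print(joint[(4, 4)])        # 1/36
print(f1[1], f2[6])         # 11/36 and 11/36, a column total and a row total
print(sum(joint.values()))  # 1, which is property (ii) of Definition 4.1-1
```

Running the sketch reproduces the marginal totals that appear in the margins of Figure 4.1-1.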
The random variables X and Y are independent if and only if

f(x, y) = f_1(x) f_2(y), \quad x \in R_1, \; y \in R_2;

otherwise X and Y are said to be dependent.

Example 4.1-2 Let the joint p.d.f. of X and Y be defined by

f(x, y) = \frac{x + y}{21}, \quad x = 1, 2, 3, \; y = 1, 2.

Then

f_1(x) = \sum_{y=1}^{2} \frac{x + y}{21} = \frac{x + 1}{21} + \frac{x + 2}{21} = \frac{2x + 3}{21}, \quad x = 1, 2, 3,

and

f_2(y) = \sum_{x=1}^{3} \frac{x + y}{21} = \frac{3y + 6}{21}, \quad y = 1, 2.

Note that both f_1(x) and f_2(y) satisfy the properties of a probability density function. Since f(x, y) \ne f_1(x) f_2(y), X and Y are dependent.

Example 4.1-3 Let the joint p.d.f. of X and Y be

f(x, y) = \frac{x y^2}{30}, \quad x = 1, 2, 3, \; y = 1, 2.

The marginal probability density functions are

f_1(x) = \sum_{y=1}^{2} \frac{x y^2}{30} = \frac{x}{6}, \quad x = 1, 2, 3,

and

f_2(y) = \sum_{x=1}^{3} \frac{x y^2}{30} = \frac{y^2}{5}, \quad y = 1, 2.

Then f(x, y) = f_1(x) f_2(y) for x = 1, 2, 3 and y = 1, 2; thus X and Y are independent.

Example 4.1-4 Let the joint p.d.f. of X and Y be

f(x, y) = \frac{x y^2}{13}, \quad (x, y) = (1, 1), (1, 2), (2, 2).

Then the p.d.f. of X is

f_1(1) = 5/13, \quad f_1(2) = 8/13,

and that of Y is

f_2(1) = 1/13, \quad f_2(2) = 12/13.

Thus f(x, y) \ne f_1(x) f_2(y) for x = 1, 2 and y = 1, 2, and X and Y are dependent.

Note that in Example 4.1-4 the support R of X and Y is "triangular." Whenever this support R is not "rectangular," the random variables must be dependent because R cannot then equal the product set {(x, y): x \in R_1, y \in R_2}. That is, if we observe that the support R of X and Y is not a product set, then X and Y must be dependent. For illustration, in Example 4.1-4, X and Y are dependent because R = {(1, 1), (1, 2), (2, 2)} is not a product set. On the other hand, if R equals the product set {(x, y): x \in R_1, y \in R_2} and if the formula for f(x, y) is the product of an expression in x alone and an expression in y alone, then X and Y are independent, as illustrated in Example 4.1-3. Example 4.1-2 illustrates the fact that the support can be rectangular while the formula for f(x, y) is not such a product, and thus X and Y are dependent.

The notion of a joint p.d.f. of two discrete random variables can be extended to a joint p.d.f. of n random variables of the discrete type. Briefly, the joint p.d.f. of the n random variables X_1, X_2, ..., X_n is defined by

f(x_1, x_2, \ldots, x_n) = P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)

over an appropriate space R. Furthermore, f(x_1, x_2, \ldots, x_n) satisfies properties similar to those given in Definition 4.1-1. In addition, the marginal probability density function of one of the n discrete random variables, say X_1, is found by summing f(x_1, x_2, \ldots, x_n) over all x_i's except x_1; that is,

f_1(x_1) = \sum_{x_2} \cdots \sum_{x_n} f(x_1, x_2, \ldots, x_n).

The random variables X_1, X_2, ..., X_n are mutually independent if and only if

f(x_1, x_2, \ldots, x_n) = f_1(x_1) f_2(x_2) \cdots f_n(x_n), \quad x_1 \in R_1, \; x_2 \in R_2, \ldots, x_n \in R_n.

If X_1, X_2, ..., X_n are not independent, they are said to be dependent.

We are now in a position to examine more formally the concept of a random sample from a distribution. Recall that when we collected the n observations x_1, x_2, ..., x_n of X, we wanted them in some sense to be independent, which we now observe is actually mutual independence. That is, before the sample is actually taken, we want the corresponding random variables X_1, X_2, ..., X_n to be mutually independent and each to have the same distribution and, of course, the same p.d.f., say f(x). That is, the random variables X_1, X_2, ..., X_n that are to be observed should be mutually independent and identically distributed random variables with joint p.d.f.
f ( x ^ f ( x - i ) • Example 4.1-5 Let X i , X ^ , X y , X^ be four mutually independent and identically distributed random variables with the common Poisson p.d.f. Ve-2 213 Multivariate Distributions Ch. 4 Then, for illustration, P(A-i = 3, X , = 1, Xa = 2, X, = 1) = /(3)/(1)/(2)/(1) _2^^_1^8 3!1!2!1! _ 2 ^ e _ ' _ 3 2 _, 12 - 3 '' • Let us also compute the probability that exactly one of the X's equals zero. First we treat zero as " success." If W equals the number of successes, then the distribution of W is b(4, e~2) because P(A-,=0)=e- 2 , 1=1,2,3,4. Thus the probability of one success and three failures is PiW^l^-^i.e-V-e-1)3. We are now prepared to define officially a random sample and other related terms. Say a random experiment that results in a random variable X having p.d.f./(X) is repeated n independent times. Let X,, X j , ..., X , denote the n random variables associated with these outcomes. The collection of these random variables, which are mutually independent and identically distributed, is called a random sample from a distribution with p.d.f. f ( x ) . The number n is called the sample size. The common distribution of the random variables in a random sample is sometimes called the population from which the sample is taken. Let us now consider joint marginal distributions of 2 or more of n random variables. A joint marginal p.d.f. of X j and X^ is found by summing/^, x;, ..., x,,) over all x,'s except X j and x^; that is, .U^^)=£--- E £ Extensions of these marginal probability density functions to more than two random variables are made in an obvious way. Example 4.1-6 Consider a population of 200 students who have just finished a first course in calculus. Of these 200, 40 have earned A's, 60 B's, 70 C's, 20 D's, and 10 F's. A sample of size 25 is taken at random and without 214 4.1 Multivariate Distributions of the Discrete Type replacement from this population so that each possible sample has probability poo\ \25J of being selected. Within the sample of 25, let A', be the number of A students, Jf, the number of B students, X , the number of C students, X^ the number of D students, and 25 — dents. The space R of (A\, X;, X ^ , X ^ ) is defined by the collection of ordered 4-tuplets of nonnegative integers (x;, x;, X j , x^) such that x, + x; + x, + X4 < 25. The joint p.d.f. of X ^ , X ^ , X , , and ^"4 is rw\r6o'\no\^ov 10 \ \xJ\x^}\x,}\xJ\'15 - Xi - x; - xj - xj f(x,. x,. x,, xj = — — — — — — — — — — ^ — — — — — / — — — — — — — — — [is) for (xi, x-i, X s , X f ) e R , where it is understood that l'k\ . ) = 0 if j > k. Without actually summing, we know that the marginal p.d.f. of X , is poy 130 \ ^--{-^ .=0,1,2,...,25 [u) and the joint marginal p.d.f. of X ^ and X; is fll(Xi, X,) = /40y60Y 100 \ \xJ\xJ\25 - Xi - x j /200\ ' \V 0 < Xi, 0 < x,, Xt + x-t <, 25. Of course,/3(X3) is a hypergeometric p.d.f. and/izt^i* X2) sud/fxi, x^, -X3, ^4) are extensions of that type of p.d.f. It is easy to see that f(x^, X 2 , JC3, ^4) ^/i(Xi)/2(X2)/3(X3)/4(X4), and thus X^ X ^ , X ^ , and X^ are dependent. Note also that the space R is not rectangular, which would also imply that the random variables are dependent. The distribution in Example 4.1-6 illustrates an extension of the hypergeometric distribution. In general, instead of two classes, suppose that each of the 215 Multivariate Distributions Ch. 4 n objects can be placed into one of 5 disjoint classes so that «i objects are in the first class, n^ in the second, and so on until we find n, in the sth class. 
Of course, n = n, + n; + • n at random and without replacement. Let the random variable X, denote the number of observed objects in the sample belonging to the t'th class. Find the probability that exactly x; objects belong to the t'th class, i = 1, 2,..., s. Here 0 < x, < n, and Xi + x; + • /n,\ We can select x; objects from the ith class in any one of i I ways, i = 1, 2,..., 5. By the multiplication principle, the product equals the number of ways the joint operation can be performed. If we assume that each of the ( ) ways of selecting r objects from n = «i + "2 + "' V/ + n, objects has the same probability, we have that the probability of selecting exactly x, objects from the ith class, i = 1, 2,.... s, is •".y"2 P(Xt = X t , X ^ = X 2 , . . . , X , = x , ) = 0 ^ x, < n, and ^,Ax, x, + x; + • Example 4.1-7 The probability that a 13-card bridge hand (selected at random and without replacement) contains two clubs, four diamonds, three hearts, and four spades is f'TTT3') I 2 A 4A 3A 4 I <'52\ ll3/ 11,404,407,300 = 0.018. 635,013,559,600 We now consider an extension of the binomial distribution, namely the multinomial distribution. Consider a sequence of repetitions of an experiment for which the following conditions are satisfied: (a) The experiment has k possible outcomes that are mutually exclusive and exhaustive, say A^, A^,..., A^. 216 4.1 Muttivariate Distributions of the Discrete Type (b) n independent trials of this experiment are observed. (c) P(A,) = p,, i = 1, 2,..., k, on each trial with ^?= i P; = 1. (d) The random variable X , is equal to the number of times A, occurs in the n trials, i = 1, 2,..., k. If x^, x;,..., Xfc are nonnegative integers such that their sum equals n, then, for such a sequence, the probability that A, occurs x, times, i = 1, 2, ..,, k, is given by P{X, = x,, X, = x^, .... X, = x,) = x^. x^. • x^. To see that this is correct, note that the number of distinguishable arrangements ofxi Ai's, x; A^'s,..., Xk At'sis \JCi,JC2,...,Xfc/ X J X ; ! -•• Xk' and that the probability of each of these distinguishable arrangements is P^Pi2 • Hence the product for these two latter expressions gives the correct probability, which is in agreement with the expression for P(X^ = X i , Xy, = x^. ..., X,, = Xt). We say that X ^ , X ^ , ..., X^ have a multinomial distribution. The reason is that S~ 1 — T — — i P^Py ••• Pskk=(Pi+P2+•••+ PkT = ^ -^r^-'" ^k- where the summation is over the set of all nonnegative integers \i, x^, . - - , x,, whose sum is n. That is, P(Xi = a:i, X^ = x ^ , ..., X^ = x^) is a typical term in the expansion of the nth power of the multinomial (pi + pi + " ' + Pit)When k = 3, we often let X = X^ and Y = X^, then n - X - Y = X ^ . We say that X and Y have a trinomial distribution. The joint p.d.f. o f X and Y is ^•^-.Wn-.-yY.^-^-^""'- where x and y are nonnegative integers such that x + y < n. Since the marginal distributions of X and Y are, respectively, b{n, p^) and b(n, p^), it is obvious 217 Multivariate Distributions Ch. 4 that the product of their probability density functions does not equal /(x, y\ and hence they are dependent random variables. Also note that the support of X and y is triangular, so the random variables must be dependent. Example 4.1-8 In manufacturing a certain item, it is found that in normal production about 95% of the items are good ones, 4% are "seconds," and 1 % are defective. 
This particular company has a program of quality control by statistical method; and each hour an on-line inspector observes 20 items selected at random, counting the number X of seconds, and the number Y of defectives. If, in fact, the production is normal, the probability of finding in this sample of size n -= 20 at least two seconds or at least two defective items is 1 - P(X = 0 or 1 and Y = 0 or 1) 901 7ft l = i - ^o, (o.o^o.oino^)20 - -^^ (o.o4)\o.o\f{o.95r - 0^9, (o.o^o.oi)1^^)19 - Yv^g, (o.o^to.oi^o^)18 = 0.204. Exercises 4.1-1 Let the joint p.d.f. of X and Y be denned by /(x,^)=^, x = l , 2 , y=\,l, 3, 4. Find (a) /i(x), the marginal p.d.f. of X ; (b) f^y\ the marginal p.d.f. of V; (c) P[X > V); (d) P(Y = 2X); (e) P(X +Y = 3); (f) P(X ^ 3 - V). (g) Are X and Y independent or dependent? 4.1-2 Roll a red and a black four-sided die. Let X equal the outcome on the red die, and let Y equal the outcome on the black die. (a) On graph paper, show the space of X and Y. (b) Define the joint p.d.f. on the space (similar to Figure 4.1-1). (c) Give the marginal p.d.f, of X in the margin. (d) Give the marginal p.d.f. of Y in the margin. (e) Are X and Y dependent or independent? Why? 4.1-3 Roll a red and a black four-sided die. Let X equal the outcome on the red die and let Y equal the sum of the two dice. (a) On graph paper, describe the space of X and Y. (b) Define the joint p.d.f, on the space (similar to Figure 4.1-1). 218 4.1 Multivariate Distributions of the Discrete Type (c) Give the marginal p.d.f. of X in the margin. (d) Give the marginal p.d.f. of Y in the margin. (e) Are X and Y dependent or independent? Why? 4.1-4 Let ,X\, X - i , X^ denote a random sample of size n = 3 from a distribution with the geometric p.d.f. ^''©({r'• ^1.2,3,.... That is, -X'i, X ^ , and X^ are mutually independent and each has this geometric distribution. (a) Compute P(X, = 1, X ^ = 3, X^ = 1». (b) Determine P(X, + X ^ + X ^ = 5). (c) If V equals the maximum of X ^ , X ^ , X ^ , find P(V £ 4.1-5 A box contains a mixture of tea bags—15 spice, 5 orange, 10 mint, and 20 green. Select 4 tea bags at random and without replacement. Find the probability that (a) one of each kind of tea is selected, (b) all 4 tea bags are green tea. 4.1-6 Draw 13 cards at random and without replacement from an ordinary deck of playing cards. Among these 13 cards let A\ be the number of spades, -Y; the number of hearts, X^ the number of diamonds, and 13 — clubs. (a) Determine P(Xi = 5, X ^ = 4, X., = 3). (b) Among the 13 cards, what is the probability that the numbers of cards in the four suits are 5, 4, 3, and 1 ? HINT: Part (a) presents one way this could occur, but there are also other ways, for example, X , =3,X^= 5, X j = 1. 4.1-7 A particular pound of candy contains 136 jelly beans of which the numbers of black, green, orange, pink, purple, red, white, and yellow are 11, 12, 13, 16, 25, 32, 13, and 14, respectively. Sixteen jelly beans are selected at random and without replacement. (a) Give the probability that exactly two of each color are selected. (b) Let X equal the number of black, and let Y equal the number of red in the sample o f n = 16 Jelly beans. Give the joint p.d.f. of-Y and Y. (c» Find P(JS: ^ 2). 4.1-8 A box contains 100 Christmas tree light bulbs of which 30 are red, 35 are blue, 15 are white, and 20 are green. Fifteen bulbs are to be drawn at random from the box to fill a string with 15 sockets. Let X ^ denote the number of red, X^ the number of blue, and X j the number of white bulbs drawn. 
(a) Givef(xi,x^,x,,}, the joint p.d.f. of X ^ X ^ , and X ^ . (b) Describe the set of points for which/(xj, x^, ^3) > 0. (c) Determine/Jxi), the marginal p.d.f. of X ^ and find P(Xi = 10). (d) Find/iaOci, x;), the joint marginal p.d.f. of X, and X ^ . 219 Muftivariate Distributions Ch. 4 4.1-9 In a biology laboratory, corn is used to illustrate the Mendelian theory of inheritance. It is claimed that the four categories for the kernels of corn, smooth and yellow, wrinkled and yellow, smooth and purple, and wrinkled and purple, will occur in the ratio 9:3:3:1. Out of 208 kernels of corn, let A], X i , X j and X^ = 208 - Xi — theory is true, (a) Give the joint p.d-f. of X i , X ^ , Xj,and ^4,and describe the support in 3-space, (b) Give the marginal p.d.f. of X ^ . (c) Give the joint marginal p.d.f. of X ^ and -X";. (d» Find E(X^), E[X^ E(X^}, and E(X^). 4.1-10 Toss a fair die 12 independent times. Let X , denote the number of times i occurs, i = 1, 2, 3,4, 5, 6. (a) What is the joint p.d.f. of X ^ X ^ , . . . , X^ (b) Find the probability that each outcome occurs two times. (c) Find P(X, = 2». (d) Are X ^ , X ^ , . . . , X ^ mutually independent? 4.1-11 A manufactured item is classified as good, a "second," or defective, with probabilities 6/10, 3/10, and 1/10, respectively. Fifteen such items are selected at random from the production line. Let X denote the number of good items, Y the number of seconds, and 15 — (a) Give the joint p.d.f. of X and V,/(x, y}. (b) Sketch the set of points for which f ( x , v} > 0, From the shape of this region, can X and r be independent? Why? (c) Find P{X = 10, Y = 4). (d) Give the marginal p-d.f. of X . (e) FindP(X ^ 11). 4.1-12 Following the second "Great Debate," assume that the proportions of listeners who thought that Reagan had won, Mondale had won, and it was a tie were ?„ = 0.40, py = 0.35, and pj- = 0.25, respectively. In-a random sample o f n = 100 listeners, let X equal the number who thought Reagan had won and let Y equal the number who thought Mondale had won. (a) Give the joint p.d.f. of X and y. (b) What is the marginal distribution o f X ' ? 4.2 The Correlation Coefficient Let A"i, X ^ , ..., X^ be random variables of the discrete type having a joint distribution. In this section we consider the mathematical expectation of functions of these random variables. If u(X ^ X ^ , . . - , X ^ ) is a function of « variables of the discrete type that have a joint p.d.f. f { x ^ x;,..., x,,) and space R, then EW,,X,,...,X^=^--- ^«(x,,x,,...,xJ/(x,,x,,...,xJ, 220 4.2 The Correlation Coefficient if it exists, is called the mathematical expectation (or expected value) of u(X,,X^...,X^. Example 4.2-1 There are eight similar chips in a bowl: three marked (0, 0), two marked (1, 0), two marked (0, 1), and one marked (1, 1). A player selects a chip at random and is given the sum of the two coordinates in dollars. If -X"! and X^ represent those two coordinates, respectively, their joint p.d.f. is /(xi, x,) = 3 ~ xl ~ x l , x, = 0, 1 and ^ = 0, 1. 0 Thus E(X,+X,)= i i (x, + x,) 3 ~ x^ "xl : ^)^^^- That is, the expected payoff is 75fi. The following mathematical expectations, subject to their existence, have special names: (i) If Ui(^i, X - i , . . . , X.) = X , , then £[ui(A-i,A',,...,JC.)]=£(A-,)=^ is called the mean of X,., j = 1, 2,..., n. (ii) If u,(Xi, X , , . . . , X.) = ( X , - i i f , then £["2(A'i, X , , . . . , X,)-] = EKX, - /i,)2] = ff.2 = Var(A',) is called the variance of -Y;, i = 1, 2,..., n. (iii) If u,(X^, X-,,..., X.) 
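Definitions (i) through (iv) can be checked numerically. The following short Python sketch (added for illustration; it is not part of the text) computes the means, variances, covariance, and correlation coefficient directly from a joint p.d.f., using f(x, y) = (x + y)/21 from Example 4.1-2. The fractions module is used only to keep the intermediate arithmetic exact.

```python
from fractions import Fraction
from math import sqrt

# Joint p.d.f. of Example 4.1-2: f(x, y) = (x + y)/21, x = 1, 2, 3, y = 1, 2.
joint = {(x, y): Fraction(x + y, 21) for x in (1, 2, 3) for y in (1, 2)}

def E(u):
    """E[u(X, Y)] = sum of u(x, y) f(x, y) over the space R."""
    return sum(u(x, y) * p for (x, y), p in joint.items())

mu_x = E(lambda x, y: x)                              # 46/21
mu_y = E(lambda x, y: y)                              # 11/7
var_x = E(lambda x, y: (x - mu_x) ** 2)
var_y = E(lambda x, y: (y - mu_y) ** 2)
cov = E(lambda x, y: (x - mu_x) * (y - mu_y))         # -2/147, a slight negative dependence

print(mu_x, mu_y, var_x, var_y, cov)
rho = float(cov) / (sqrt(float(var_x)) * sqrt(float(var_y)))
print(rho)   # the correlation coefficient of X and Y, a small negative number
```

The same expectation routine can be pointed at any finite joint p.d.f., including those in the examples and exercises of this section.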
= ( X , - MX, - f4 i ¥• j , then £[u3(A-i, X,, ..., X.)~\ = £[(A-, - ^X, - ^)] = s,, = Cov(;y,, X,) is called the covariance of X , and X j . (iv) If the standard deviations (T| and a, are positive, then u Covpf,, X ) a fT i j g,j ^i^i is called the correlation coefficient of X, and X j . 221 Multivariate Distributions Ch It is convenient to observe that the mean and the variance of X , can computed from either the joint p.d.f. or the marginal p.d.f. of X , . F example, if n = 2, tl, = E(X,) = S E ^/(^i, ^) =I:xi[l:/(*i,^)]=E<i/i(^i)Before considering the meaning of the covariance and the correlatii coefficient, let us note a few simple facts. With i ^ j , EUX, - MX, - ^.)] = E(x;\i - v.;x, -fi,x,+ /i,/;,) = E(X; A-,) - n,E(X,) - ^E(X,) + w, because it is true that even in the multivariate situation, £ distributive operator (see Exercise 4.2-4). Thus Cov(X,, X,) = E(X,X,) - w, - ^11; + w, = E(X,X) - f i i i i , . Since p^ = CoviX,, ^•)/<T, a,, we also have E(X, X j l = it, itj + py (T, s,. That is, the expected value of the product of two random variables is eqi to the product ^, f i j of their expectations plus their covariance py a, a j . A simple example at this point would be helpful. Example 4.2-2 Let X^ and X^ have the joint p.d.f. /(^., x,) = xl ^x2 . xi=l,2, x,=l,2. The marginal probability density functions are, respectively, ^)=£^2^6, ^1,2, and c / v i - y x! + ^2 . 3 + 4x, _ '~L, 18 "" 18 • J1{X1 222 4.2 The Correlation Coefficient Since/(Xi, x;) ^/i(x,)/,(x;), A'i and X; are dependent. The mean and the variance of X ^ are 2 2xi+6 t'S\ /'l0\ 14 ^=^^-^=^\^)+^\Ts)^• ;_ „ ''l~,L,xl 18 - ^7 9 - '8T- 8T • The mean and the variance of X^ are ^w-w^, 2 fc= E 3+4x2 18 / 7\ + • \ / *2 ——————— = ( l ) 7 o 10 .2-1 10 and 2 _ v 2 3+4x; _ /My 51 841 77 ^ " - i ^ 2 18 "iTsY - 18" 324" 324' The covariance of X^ and X^ is Cov^.^^^x,.-^2-^) (lx!) 3+^ 2 l)^ +( l) 0 2) © < " (^) ( ( (,l •/^-f^ •\18; ^ 9 A -o-^xs. = 45 406_ J_ 18- 162 -- 162 ' Hence the correlation coefficient is p= , -l/w =——==-0.025. V'(20/81)(77/324) yT540 Insight into the correlation coefficient p of two discrete random variables X and V may be gained by thoughtfully examining its definition E (x - f»Xy - fr)/fe V) p="—————————————, 223 Multivariate Distributions Ch. 4 where ^, jUy, a^, and o-y denote the respective means and standard deviations. If positive probability is assigned to pairs (x, y) in which both x and y are either simultaneously above or simultaneously below their respective means, the corresponding terms in the summation that defines p are positive because both factors (x — negative. If pairs (x, y), which yield large positive products (x - /^)(y ~ ^\ contain most of the probability of the distribution, the correlation coefficient will tend to be positive (see Example 4.2-3). If, on the other hand, the points (x, y), in which one component is below its mean and the other above its mean, have most of the probability, then the coefficient of correlation will tend to be negative because the products (x — 4.2-5). This interpretation of the sign of the correlation coefficient will play an important role in subsequent work. To gain additional insight into the meaning of the correlation coefficient p, consider the following problem. Think of the points (x, y) in the space R and their corresponding probabilities. Let us consider all possible lines in twodimensional space, each with finite slope, that pass through the point associated with the means, namely [p.^, ^y). 
These lines are of the form y — b(x — (XQ, Yo) so that/(xo, y'o) > 0, consider the vertical distance from that point to one of these lines. Since j/p 1s the height of the point above the x axis and ^y + b(xo — point (xo, Vo), then the absolute value of the difference of these two heights is the vertical distance from point (XQ , yy) to the line y = p.y + b(x — the required distance is | y^ — tance and take the weighted average of all such squares; that is, let us consider the mathematical expectation E[\,(Y-^-b(X-^}Y}^K{b). The problem is to find that line (or that b) which minimizes this expectation of the square {Y — least squares, and the line is sometimes called the least squares regression line. The solution of the problem is very easy, since K(b) = E[{Y - ^ - 2b(X - ^}(Y - ^} + b\X - /^)2} = o-^ — because £ ingly, the derivative K'(b) = -Ipffxffy + 2^ equals zero at b = p f f y / ^ x ' an(! we see that K(b) obtains its minimum for that b since K"{b} = 2a^ > 0. Consequently the least squares regression line (the line 224 4.2 The Correlation Coefficient of the given form that is the best fit in the foregoing sense) is y = /ty + p -JL (x - fijc). Of course, if p > 0, the slope of the line is positive; but if p < 0, the slope is negative. It is also instructive to note the value of the minimum of K{b] = £{[(V - ^» - b{X - ^)]2} = o2 - 2bpa^ a, + b2^. It is K.[p ^^a^-lp^ po^oy + [p ^ ) a\ \ Ox} a x \ a\) = 0} - 2p2o-2 + p 2 ^ 2 = o^l - p2). Since K(b} is the expected value of a square, it must be nonnegative for all b, and we see that cr^(l - p2) >: 0; that is, p2 < 1, and hence - 1 < p < 1, which is an important property of the correlation coefficient p. Note that if p = 0, then K(pay/Ujc) = o^\ on the other hand, X(piTy/cr^) is relatively small if p is close to 1 or negative 1. That is, the vertical deviations of the points with positive density from the line y = ^y 4- p((7y/<r^)(x - p^) are sm^11 if P *s close to 1 or negative I because X(p(rr/ff^) is the expectation of the square of those deviations. Thus p measures, in this sense, the amount of linearity in the probability distribution. As a matter of fact, in the discrete case, all the points of positive density lie on this straight line if and only ifp is equal to 1 or negative 1. REMARK More generally, we could have fitted the line v(x) = a + bx by the same application of the principle of least squares. We would then have proved that the " best" line actually passes through the point (/^, py). Recall that in the discussion above we assumed our line to be of that form. Students will find this derivation to be an interesting exercise using partial derivatives (see Exercise 4.2-5). The following three examples illustrate joint discrete distributions for which p is positive, zero, and negative, respectively. In Figures 4.2-1 and 4.2-2 the line of best fit or the least squares regression line is also drawn. E^inple 4.2-3 Roll a pair of four-sided dice for which the outcome is 1, 2, EJ^mple 3, or 4 on each die. Let X denote the smaller and Y the larger outcome on 225 Multivariate Distributions Ch. 4 the dice. Then the joint p.d.f. o f X and Y is denned by ^, l^x=^4, f^, y)= ^ 16- l <Jt<^4• It can be shown that E(X) = 15/8, E(Y) = 25/8, VarpO = 55/64, Var(r) = 55/64, Cov(A', r) = 25/64, and p = 25/55. Thus the line of best fit F6 r6 16 Figure 4.2-2 226 4.2 The Correlation Coefficient The joint p.d.f. is depicted in Figure 4.2-1. On this figure we have drawn horizontal and vertical lines through (/ix, /^y) and also the line of best fit. 
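The moments quoted in Example 4.2-3 can be verified by enumerating the sixteen equally likely rolls. The Python sketch below (an added illustration; the brute-force enumeration is simply one convenient way to check the claims) recovers E(X) = 15/8, E(Y) = 25/8, Var(X) = Var(Y) = 55/64, and Cov(X, Y) = 25/64, hence rho = 25/55 = 5/11, which is also the slope of the least squares regression line here because sigma_X = sigma_Y.

```python
from fractions import Fraction

# Example 4.2-3: X = smaller and Y = larger outcome of two fair four-sided dice.
joint = {}
for d1 in range(1, 5):
    for d2 in range(1, 5):
        key = (min(d1, d2), max(d1, d2))
        joint[key] = joint.get(key, 0) + Fraction(1, 16)

def E(u):
    return sum(u(x, y) * p for (x, y), p in joint.items())

mx, my = E(lambda x, y: x), E(lambda x, y: y)      # 15/8 and 25/8
vx = E(lambda x, y: (x - mx) ** 2)                 # 55/64
vy = E(lambda x, y: (y - my) ** 2)                 # 55/64
cov = E(lambda x, y: (x - mx) * (y - my))          # 25/64

# Since sigma_X = sigma_Y, rho = Cov(X, Y)/Var(X) = 25/55, and the least squares
# line y = mu_Y + rho*(sigma_Y/sigma_X)(x - mu_X) has slope rho.
rho = cov / vx
print(mx, my, vx, vy, cov, rho)   # 15/8 25/8 55/64 55/64 25/64 5/11
```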
Example 4.2-4 Roll an unbiased four-sided die two independent times. Let X equal the outcome on the first roll and Y the outcome on the second roll. The joint p.d.f. of AT and Y is f ( x , y)=— 16 x = 1, 2, 3, 4, and y = 1, 2, 3, 4. Since the marginal p.d.f.'s are the same, we have {ijc = {iy = 2.5, Var(JQ = Var(Y) = 5/4. Because E(XY) = 100/16, Cov(X, Y) = 100/16 - (2.5)(2.5) = 0, and thus p = 0. The line of best fit would simply be the horizontal line through the point (/^, /iy). Of course, if we were minimizing the expected value of the square of the horizontal distances, then the line of best fit would be the vertical line through (jUy, /<y). The joint and marginal p.d.f.'s are depicted in Figure 4.2-2. Example 4.2-5 Let X equal the number of ones and Y the number of twos and threes when a pair of fair four-sided dice are rolled. Then X and Y have a trinomial distribution with p.d.f. ^••^xl^-x-^W 0 < x + y < 2, where x, and y are nonnegative integers. Since the marginal p.d.f. of X is t(2, 1/4) and the marginal p.d.f. of V is b(2, 1/2), we know that it, = 1/2, Var(X) = 6/16, ii, = 1, and Var(Y) = 1/2. Since E(XY) = (1X1X4/16) = 4/16, Cov(X, Y) = 4/16 - (1/2X1) = -4/16; therefore, the correlation coefficient is p = - 1A/3. Using these values for the parameters, we obtain the line of best fit, namely The joint p.d.f. is displayed in Figure 4.2-3. On this figure we have drawn horizontal and vertical lines through (/^x, /^y) and also the line of best fit. 227 Multivariate Distributions Suppose that X and Y are independent so that/(x, y) s f i ( x ) f ^ y ) and we want to find the expected value of the product (X - ii^Y - fi,). Subject to the existence of the expectations, we know that EWW] = S £ R =£ Rl Rl = Y, uW,(x) £ »1 Sl = EWWW)-]. This can be used to show that the correlation coefficient of two independent variables is zero (see Example 4.2-4). For, in a standard notation, we have Cov(X, Y)=EQX-^W-h)'} = E(X - ^)E(Y - 11,) = 0. The converse of this fact is not necessarily true, however; zero correlation does not in general imply independence. It is most important to keep this straight; independence implies zero correlation, but zero correlation does not necessarily imply independence. The latter is now illustrated. E^,liple 4.2-6 Let X and Y have the joint p.d.f. fl.x, y) = j , 228 (x, y) = (0, 1), (1, 0), (2, 1). 4.2 The Correlation Coefficient Since the support is not "rectangular," X and V must be dependent. The means of X and Y are ^ = 1 and f t , = 2/3, respectively. Hence Cov(X, Y) = E(XY) - fe fly = (0)(1)Q) + (D(O)Q') + (2)(1)Q') - (1/J') = 0. That is, p = 0, but X and Y are dependent. Exercises 4.2-1 Let the random variables X and Y have the joint p.d.f. /(x,?)-^, x=l,2, y - 1,2, 3, 4. Find the means ^y and /^y, the variances CT^ and ffy, and the correlation coefficient p. Are X and Y independent or dependent? 4.2-2 Let X and Y have the joint p.d.f. described as follows: (x. y} f(x, y) (0, 0) (1.0) (1.1) (2,1) 1/6 2/6 2/6 1/6 Find the correlation coefficient p and the " best-fitting " line. HINT: First depict the points in R and their corresponding probabilities. 4.2-3 Roll a fair four-sided die twice. Let X denote the outcome on the first roll, and let V equal the sum of the two rolls. Find (•) C,, (b) ^, (<•) f 1, , w " ,, (e) Cov(A-, V), (t) p. (g) the best fitting line. , (h) Display the joint p.d.f. as done in Figure 4.2-1 and draw the best-fitting line on fc „ i.s 229 Muttivariate Distributions Ch. 
4 4.2-4 In the multivariate situation, show that £ convenience, let n = 1 and show that E\_a,u^Xi,X^+a2Ui(Xi,X^'] = a^E[u^X^, X^ + a^E^u^X^ X^]. 4.2-5 Let X and Y be random variables with respective means ^ and ^, respective variances cs\ and 0}, and correlation coefficient p. Fit the line y = a + bx by the method of least squares to the probability distribution by minimizing the expectation K(a, b) = £[(r - a - bX)^ with respect to a and b. HINT: Consider 8K/9a = 0 and SK/Sb = 0 and solve simultaneously. 4.2-6 Let X and V have a trinomial distribution with parameters n = 3, p, = 1/6, and p; = 1/2. Find (a) E(X), (b) £(n (c) (d) (e) (0 Var(JO, Var(r), Cov(X, Y), P- Note that p = -Vpipz/O - PiXl - p2>. 4.2-7 Let the joint p.d.f. of X and V b e f ( x , y) = 1/4, (x, >') 6 ^ = {(0, 0), (1, 1), (1, -1), (2, 0)}. (a) Are X and V independent ? (b) Calculate Cov(A", Y} and p. This also illustrates [he fact that dependent random variables can have a correlation coefficient of zero. \4.f-S The joint p.d.f. of X and Y is f { x , y) = 1/6, 0 £ Y/integers. \(a) Sketch the support of X and V. •^(b) Record the marginal p.d.f.'s/i^and/^tin the "margins." >/(c) Find Cov[X, Y). ^(d) Find p, the correlation coefficient. ./(e) Find the best-fitting line and draw it on your figure. 4.2-9 Let X ^ , X^ be a random sample of size n = 2 from the distribution with the binomial p.d.f. /M-mrr", -0,1,3. Find the joint p.d.f. of Y = X^ and W = X^ + X ^ , determine the marginal p.d.f, of W and compute the correlation coefficient of Y and W. 230 4.3 Conditional Distributions HINT: Map the nine points (x^, x;) in the space ofA'i, X^ into the nine points (y, w} in the space of V, W along with the corresponding probabilities and proceed as in earlier exercises. 4.3_________________________________ Conditional Distributions Let X and Y have a joint discrete distribution with p.d.f. f { x , y} on space R. Say the marginal probability density functions are /i(x) and f ^ y ) with spaces RI and R,, respectively. Let event A = {X = x} and event B = {Y = y}, (x, y) e R. Thus A n B = [X = x, Y = y}. Because P(/l r ^ B ) = P ( X = x , Y = y) = f ( x , y) and P(B) = P( Y = y) = f,(y) > 0 (since y e R,), we see that the conditional probability of event A given event B is P(A\B)=p(Ar'B)=f{x^ •v1"" P(B) /AO This leads to the following definition. DEFINITION 4.3-1 The conditional probability density function of X, giten that y = y, is defined by S(x | y) = f<—^ , provided that f,(y) > 0. Similarly, the conditional probability density function of V, given that .Y = v. is defined by f ( x v) h(y | x) = — — —, Ji(x) provided that f ^ x ) > 0. Example 4.3-1 Let X and Y have the joint p.d.f. f^x,y)=x^y. x = l , 2 , 3 , y = l , 2. 231 Multivariate Distributions • 2 ' 21 3 4 • ! ! ^ ^) Figure 4.3-1 In Example 4.1-2 we showed that ,. > 2x + 3 /iM: x = 1, 2, 3, and Uy) = 3y+6 Thus the conditional p.d.f. of X given Y == y, is equal to (x + y)/21 »(x|y)= (3J' + 6)/21 x +J' 3^ + 6 ' x = 1, 2, 3, when y = 1 or 2. For example, P(^=2|y=2)=s,(2|2)=^=^. Similarly, the conditional p.d.f. of Y, given -T = x, is equal to h(y]x)= x+y 2x+3' y = 1, 2, when ,x = 1, 2, or 3. The joint p.d.f. f { x , y) is depicted in Figure 4.3-1 along with the marginal p.d.f.'s. Conditionally, if y = 2, we would expect the outcomes ofx, 1, 2, and 3, to occur in the ratios 3:4:5. This is precisely what g(x \ y) does, namely 9(112)= 1 +2 12 9(21 2)= 2+2 12 9(3| 2)= 3+2 12 Figure 4.3-2 displays g(x|l) and g(x\2), while Figure 4.3-3 gives h(y\l), h(y) 2), and h(y | 3). 
Compare the probabilities in Figure 4.3-3 with those in 232 Conditional Distributions • Figure 4.3-2 Figure 4.3-1. They should agree with your intuition as well as with the formula for h(y | x). Note that 0 < h(y | x) and ^'^^f^1- Thus h(y \ x) satisfies the conditions of a probability density function, and so we can compute conditional probabilities such as P(a<Y<b\X=x)= ^ (y:a<y<b( h (y\x) and conditional expectations such as EW)\X=x-]=^u(y}h(y\x) in a manner similar to those associated with probabilities and expectations. Two special conditional expectations are the conditional mean of Y, given X = x, defined by ^^=E(Y\x)=^yh(y\x), h(y\\)\ h(y\1)\ h(.y\3) ' y 'T • Figure 4.3-3 233 Multivariate Distributions Ch. 4 and the conditional variance of V, given X = x, denned by o?i, = £{[Y - £(Y i x)] 2 1 x] = S: [>. - £(r | x)]2^ | x), y which can be computed using a2y^=E(Y2\x)~\_E(Y\x)•]2. The conditional mean f i ^ \ y and the conditional variance o-^iy are given by similar expressions. Example 4.3-2 We use the background of Example 4.3-1 and compute f i y I ^ and a2, i _r, when x = 3: ^^^^^^^ and -<(r-?^-'H,('-?)•(^) ^ 25 /4\ ^6 /5^ ^ 20 ~ 8 1 W^Sl W ' S l ' The conditional mean of X , given V = y, is a function ofy alone; the conditional mean of Y, given X = x, is a function of x alone. Suppose that the latter conditional mean is a linear function of x; that is, E[Y\x)=a + bx. Let us find the constants a and b in terms of characteristics ^, f l y , ajc, c^, and p. This development will shed additional light on the correlation coefficient p ; accordingly we assume that the respective standard deviations Uy and ffy are both positive so that the correlation coefficient will exist. It is given that yl! £/ -f="+<", y JlW ^R., where R^ is the space o f X . Hence £ y and £ xeR,. y 234 xeRi 4.3 Conditional Distributions that is, with to and ^y representing the respective means, we have P.Y = a + fcto . (4.3-2) In addition, if we multiply both members of equation (4.3-1) by x and sum, we obtain Z Zxy/(x,y)= ^ (ax + bx'Vifx); xeRi y xeRi that is, E(XY) = aE{X} + bE(X2) or, equivalently, to pi, + pa, ffy = ato + 6(» + ri). (4.3-3) The solution of equations (4.3-2) and (4.3-3) is "i , , "i a = P.Y — "x "x which implies that if£(y!x) is linear, it is given by E(Y^x)=^l,+paf-^,x-^l^. "x That is, if the conditional mean of Y, given X = x, is linear, it is exactly the same as the best-fitting line considered in Section 4.2. Of course, if the conditional mean of X , given Y = y, is linear, it is given by £(;y|>.)=to+p^b'-ft). ffy We see that the point [x = to, E(Y\x) = ^iy] satisfies the expression for £(V|x); and \_E(X\y) = to, » is, the point (to, fy) is on each of the two lines. In addition, we note that the product of the coefficient of x in £<y|x) and the coefficient of y in E(X\y) equals p2 and the ratio of these two coefficients equals o^jo\ • tions sometimes prove useful in particular problems. Example 4.3-3 Let X and Y have the trinomial p.d.f. with parameters n, pi, p2, and 1 — f^y^xWn-x-yV.^'11''3'"'- 235 Multivariate Distributions Ch. 4 where x and y are nonnegative integers such that x + y < n. From the development of the trinomial distribution, it is obvious that X and Y have marginal binomial distributions b(n, pi) and b(n, p^), respectively. Thus ^^f^A, ("-^' "'' AM p^W^Y'"' >•'("- *->')!li-piAi-pJ ' y = 0, 1, 2 , . . . . n - x . That is, the conditional p.d.f. 
of Y, given X = x, is binomial »L,,^1 L i-pj and thus has conditional mean £(V|x)=(n-x)- p2 1-Pi In a similar manner, we obtain E(X\y}=(n-y)- p1-P2 Since each of the conditional means is linear, the product of the respective coefficients of x and y is ^ ^ (^Pi_\( ^_Pi_'\ _ . _PiP2 _ \1-PJ\1-PJ (l-Pl)(l-P2) However, p must be negative because the coefficients of x and y are negative, thus PiPi ^ (1-PiXl-Pi)' p= Exercises '^•1 Let ^ and V have the joint p.d.f. /(*, y} = ^ , x = 1, 2, >• = 1, 2, 3, 4. (a) Display the joint p.d.f. and the marginal p.d.f.'s on a graph like Figure 4.3-1. (b) Find g(x | v) and draw a figure like Figure 4.3-2, depicting the conditional p.d.f.'s for y = 1, 2, 3, and 4. 236 4.3 Conditional Distributions (c) Find h(y\ x) and draw a figure like Figure 4.3-3, depicting the conditional p.d.f.'s for x = 1 and 2. (d» Find (i) P(l ^ Y ^ 31 X = 1), (ii) P(Y sS 21 X = 2), and (iii) P(X = 2 \ Y = 3), j(e) Find £(V | X = 1) and Var(r | X == 1). ^(L3-2 Let the joint p.d.f./(x, y) of X and y be given by the following: (x, y) f(x. y) (1,1) (2,1) 3/8 1/8 (1,2) (2, 2) 1/8 3/8 Find the two conditional probability density functions and the corresponding means and variances. 4.3-3 Let W equal the weight of laundry soap in a 1-kilogram box that is distributed in Southeast Asia. Suppose that P(W < 1) = 0.02 and P{W > 1.072) = 0.08. Call a box of soap light, good, or heavy depending on whether W < 1, 1 <. W <: 1.072, or W > 1.072, respectively. In a random sample of n = 50 boxes, let X equal the number of light boxes and Y the number of good boxes. (a» What is the joint p.d.f. of X and V? (b) Give the name of the distribution of Y along with the values of the parameters of this distribution. (c) Given that X = 3, how is Y distributed conditionally? (d) Determine E{Y\X =3). (e) Find p, the correlation coefficient of X and Y. 4.3-4 The genes for eye color for a certain male fruit fly are (R, W). The genes for eye color for the mating female fruit fly are (R, W). Their offspring receive one gene for eye color from each parent. If an offspring ends up with either (R, R), (R, W), or (W, R), its eyes will look red. Let X equal the number of offspring having red eyes. Let V equal the number of red-eyed offspring having (R, W) or (W, R) genes. (a) If the total number of offspring is n = 400, how is X distributed? (b) Give the values of E(X} and Var(A'). (c) Given that m = 300 offspring have red eyes, how is Y distributed? (d) Give the values of£(T) and VarfY). 4.3-5 Let X and Y have a trinomial distribution with n = 2, pi = 1/4, and p; = 1/2. (a) Give£(r|x). (b) Compare your answer to part (a) with the equation of the line of best fit in Example 4.2-5. Are they the same? Why? 4.3-6 (a) With the background of Example 4.2-3, find £(V | x) for x= 1,2,3,4. (b) Do the points [x, £( Y \ x»] lie on the lie of best fit ? Why ? REMARK It was stated that I/the conditional mean of Y, given X = x, is linear, then it is exactly the same as the line of best fit. 237 MultivariatB Distributions 4.3-7 Using the joint p.d.f. given in Exercise 4.2-3, give the value of £( Y | x} for x = 1,2, yr\/, 4. Is this linear? Do these points lie on the best-fitting line? ^AE^-8 An unbiased six-sided die is cast 30 independent times. Let X be the number of one's and Y the number of two's. (a) What is the joint p.d.f. of X and Y'! (b) Find the conditional p.d.f. of X , given Y = y. (c) Compute E(X1 - 4XY + 3r 2 ). 
4.3-9 Let X and V have a uniform distribution on the set of points with integer coordinates in R = {(x, y): 0 <, x s 7, x £ and both x and y are integers. Find (a) /,M, (b) t&N, (c) Em.t), /,(<1)^, dMe) Ar). jyt.3-10 Let/i(.i) = 1/10, x = 0, 1, 2, .... 9, and t(y|») = 1/(10 - x}, y = x, x + I, .... 9. " Find 00 Ax, y), W fiW, te) £(y|x). MJp-n Let X and V have a joint uniform distribution on the set of points with integer coordinates in R = {{x, y]: I < x <. 4, 4 - x <, y a 6 - x}. That is, f ( s , y) •= 1/12, ffi^' rf e •"• ($W Sketch the set 8.. (b) Define the marginal p.d.f.'s/iOc} and/at^) in the "margins." (c) Define h{y \ x), the conditional p.d-f, of Y, given X = x. (d) Find £(Y | x) and draw y = £(V | x) on the sketch in part (a). 4.3-12 Referring to Exercise 4.2-9, determine the conditional p.d.f. of Y, given W = w, and the conditional mean E{Y \ w}. 4.4 Multivariate Distributions of the Continuous Type In this section we extend the idea of the p.d.f. of one random variable of the continuous type to that of two or more random variables of the continuous type. As in the one variable case, the definitions are the same as those in the discrete case except that integrals replace summations. For the most part we simply accept this substitution, and thus this section consists mainly of examples and exercises. The joint probability density function of n random variables X ^ , X ^ , . . . , X, of the continuous type is an integrable function f ( x ^ , x^, ..., .x,,) with the 238 4.4 Multivariate Distributions of the Continuous Type following properties: (a) f(x,,x,,...,x.)>0. /*» (b» f CO • • • f ( x ^ x ^ , . . . , x ^ d x , ••• dx,=l. (c) P [ ( ^ ^ , ^ , . . . , ^ ) e A ] = f •-• \ f ( x , , x , , . . . , x ^ d x , •••dx,, J A J where (-.^i, X ^ , . . . , X^ e A is an event denned in n-dimensional Euclidean space. For the special case of a joint distribution of two random variables X and Y, note that PQX, Y) e A] = j^... f(x, y) dx dy, and thus P[_(X, Y) e A~\ is the volume of the solid over the region A in the xy plane and bounded by the surface z =/(x, y). Example 4.4-1 Let X and Y have the joint p.d.f. f ( x , y) = e'''', 0<x<aa,0<y<w. The graph of z =f(x, y) is given in Figure 4.4-1 Figure 4.4-1 239 Multivariate Distributions 2 Ch. 4 1 when x + y < 9. Let A = {(x, y): 0 < x < ao, 0 < y < x/3}. The probability that (X, Y) falls in A is given by P[{X, Y) e A} = | | e-'-' dy dx = \ e-^-e-^S' dx Jo Jo Jo = f"[e-- - e-4"3] & = f-e-' + 3 e-4"3]"' Jo ff \ 4 L Jo The marginal p.d.f. of any one of these n random variables, say X,,, is given by the (n — ' c« p» „ ^ The definitions associated with mathematical expectations are the same as those associated with the discrete case after replacing the summations by integrations. Example 4.4-2 Let X and Y have the joint p.d.f. f ( x , y) == 2, 0£ T h e n / ? = { ( x , } ' ) : 0 < x < j ' < l } i s t h e support and, for illustration, p ( o & X <<{{.,,,00 < < iiY Y< < l^}}==Pp((oo < <X X< <Y Y , 00 < <Y Y< < l' }\ f \ t 2/ \ 2 rw r, •'/ 2y ly i rw -Jo i ^^'! ' ^• The shaded region in Figure 4.4-2 is the region of integration that is a subset of R, and the given probability is the volume above that region under the surface z = 2. The marginal p.d.f.'s are given by -I /,(x) = and 2 dy = 2(1 - x), M=r f,(y) = 240 Jo 2 dx = 2y, 0 < x< 1, 0 ^ y ;s 1. 
4.4 Multivariate Distributions of the Continuous Type Figure 4.4-2 Four illustrations of expected values are £(JQ=j I 2 x d y d<xx = | j ' 22x(l x ( l-- xx)) dx d x= =l1 , n r) = £(V)= ri (•» ' i rir' 22 2y dx dy = \ 2y2 riy dy =j- ,. Jo 1 E(Y')= } = iI ^l 22 yy 22 ddxx ddyy = {I l 2/ 2 y 3dy d y==^^ , and Cov(A-, 7) = EKX - ft^Y - 11,1} = E(XY) - ^^ .n;..-,-,-i-;-j-^ = r [° ^/(^ >•) ^ dy - QYJ') From these calculations it is obvious that E(X), E{Y), and £(V2) could be calculated using the marginal p.d.f.'s as well as the joint one. Let X and Y have a distribution of the continuous type with joint p.d.f. /(x, y) and marginal p.d.f.'s/i(x) and/^), respectively. So in accord with our policy of transition from the discrete to the continuous case, we have that the 241 Multivariate Distributions Ch. 4 •^K^^^K&t-'pAA^-ra-icaTi.^rib-yatiaiice (3i'?, given ^ = x, are, respectively, h(y | x) = ^x' y > , AM £(r|x)= provided that /,(x) > 0, r" y/i(v|x)^, and Varfylxl^Cy-Stri.ia-'l.i} = r' [.y-E(Y\x)-}lh{y\x)dy =£[r2|x]-[£(y|x)]•l. Similar expressions are associated with the conditional distribution of X, given Y=y. Example 4.4-3 Let X and Y be the random variables of Example 4.4-2. Thus /(x, y) = 2, 0 $ x < y < 1, f,(x) = 2(1 - x), 0 < x S 1, My) =2y. 0 < y < 1. and Before we actually find the conditional p.d.f. of Y, given X = x, we shall give an intuitive argument. The joint p.d.f. is constant over the triangular region shown in Figure 4.4-2. If the value of X is known, say X = x, then the possible values of Y are between x and 1. Furthermore, we would expect Y to be uniformly distributed on the interval [x, 1]. That is, we would anticipate that h(y\x) = 1/(1 — definition that ^^-y^rT-xThe conditional mean of Y, given X = x, is 242 - 0<X<1 - ,^•'.hr^-lx-^l— p\yT—x i . dyr =[w^)! y2 r =li+;-^• 0 < X < 1 - E Y = ( x<y<l L4 Multivariate Distributions of the Continuous Type Note that, for a given x, the conditional mean of Y lies on the dotted line in Figure 4.4-2, a result that also agrees with our intuition. Similarly, it could be shown that E(X\y)=^, 0^y<l. The conditional variance of Y, given X = x, is E{\_Y-E(Y\x)•]2^x}= ['(y-1—^ Jx \ L -{wRecall that if a random variable W is U(a, b), then E{W) = (a + b)/2, and Var(W) =(b— a f l l l . Since the conditional distribution of V, given X = x, is U(x, 1), we could have written down immediately that E{Y\x) = (x + 1)/2 andVar(r|x)=(l - x f l l Z . An illustration of a computation of a conditional probability is <!' f^o '-Hr'G r' ' ^ •"'= L^^-' In general, if £(Y | x) is linear, it is equal to E(Y\x)=ii,+p(°'\(x-^). ^x/ If E(X | y) is linear, then E(X\y)=^+p(^\y-fi,). V^y/ Thus, in Example 4.4-2, we see that the product of the coefficients of x in E(Y\x) and y in E ( X \ y ) is p2 = 1/4. Thus p = 1/2 since each coefficient is positive. Since the ratio of those coefficients is equal to ffy/o-2- = 1, we have that a1 = o?. 243 Multivarjate Distributions Ch. 4 We could have calculated the correlation coefficient directly from the definition __ E[,(X - ^Y - fa)] _ Cov(X, Y) "x "r "x "i In Example 4.4-2 we showed that Cov(X, V) = 1/36. We also found E(Y) and £(V2) so that ff 2 = E(Y2} - [£(V)]2 = 1/2 - (2/3)2 = 1/18. Since a\ = ff 2 , 1/36 p- 1 \[W^[\IW~'2' Of course, the definition of independent random variables of the continuous type carries over naturally from the discrete case. That is, X and Y are independent if and only if the joint p.d.f. factors into the product of their marginal p.d.f.'s, namely, f(x, y) =fi(x)f^). x e Rt, y e R,. 
Thus the random variables X and Y in Example 4.4-1 are independent. In addition, the rules that allow us to determine easily dependent and independent random variables are also valid here. For illustration, X and Y in Example 4.4-2 are obviously dependent because the support R is not a product space, since it is bounded by the diagonal line y = x. Exercises 4.4-1 Let/(x, y) = 2e-x-', 0 < x -, y < oo, be the joint p.d.f. of X and Y. Find/iM and/^), the marginal p.d.f.'s of X and Y, respectively. Are X and Y independent? 4.4-2 (a) (c) (e) Let/(;(, y) = 3/2, x2 a y S 1, 0 a x < 1, be the joint p.d.f. of A" and Y. Find P(0 £ P(l/2 s X •£ 1, 1/2 < Y £ Are X and Y independent? 4.4-3 Let/(x, y} = 1/4, 0 < x < 2, 0 <, y £ and/^l, the marginal probability density functions. Are the two random variables independent? 4.4-4 Let X and Y have the joint p.d.f. f ( x , y} = x + y,0 < x < 1,0 < y < 1. (a) Find the marginal p.d.f.'s y^(x) andf^y) and show that/(x, y) ?'/i(x)/;(y). Thus X and Y are dependent. Compute (b) f t ) ; , (c) ^y, (d) tj\, (e) <7 2 , (f) the correlation coefficient p. 4.4-5 Let/(x, y) = e ~ x ~ y , Q < x < oo, 0 < y < oc, be the joint p.d.f. of X and V. Argue that X and y are independent and compute (a) P(X < n (b) pyr > i, Y > i), 244 4.4 Multivariate Distributions of the Continuous Type (c) P(X = F), (d) P(X < 2), (e» P(0 < X < oo, X / 3 < Y < 3X), (f) P(0 < X < oo, 3X < Y < cc). 4.4-6 (a) (b) (c) (d) (e) Let/(x, y) = 1/20, x ^ ^ ^ x + 2,0 < x < 10, be the joint p.d.f. of X and Y. Sketch the region for which f ( x , y} > 0, that is, the support. Find/i(x), the marginal p.d.f. of X . Find h(y | x), the conditional p.d.f. of Y, given X = x. Find the conditional mean and variance of V, given X = x. Find/^(y), the marginal p.d.f. of Y. 4.4-7 Let /(x, y) = 1/40, 0 ^ x ^ 10, 10 - x ^ y ^ 14 - x, be the joint p.d.f. of X and y. (a) (b) (c) (d) 4.4-8 (a) (b) (c) (d) (e) Sketch the region for which/(x, y) > 0. Find/i(x), the marginal p.d.f. ofX. Find h[y \ x), the conditional p.d.f. of Y, given X = x. Find E(Y | x), the conditional mean of Y, given X = x. Let/(x, y) = 1/8,0 ^ y <4, y ^ x ^ y + 2,bethe joint p.d.f. of X and Y. Sketch the region for which f [ x , y} > 0. Find/i(x), the marginal p.d.f. of X . Find h[y \ x), the conditional p.d.f. of Y, given X = x. Find E(Y j x), the conditional mean of Y, given X •= x. Graph y = E[Y\ x) on your sketch in part (a). Is y = E(Y\ x} linear? 4.4-9 Let X have a uniform distribution (/(O, 2), and let the conditional distribution of V, given X = x, be U{0, x2). (a) Define the joint p.d.f. of X and V,/(x, y}. (b) Calculate ^(y), the marginal p.d.f. of Y. (c) Find E(X \ y), the conditional mean of X , given Y = y. (d) Find E(Y \ x), the conditional mean of Y, given X = x. 4.4-10 The Joint moment-generating function of the random variables X and Y of the continuous type with joint p.d.f. f i x , y) is defined by ^h,t,}= 1'" [" el^^f{x,y}dxdy if this integral exists for -h^ <ti< /ip -A; < t; < h^. If X and Y are independent, show that M(ti, t;) = M(ti, 0)M(0, la). From the uniqueness property of the moment-generating function, it can be argued that this is also a sufficient condition for independence. These statements are also true in the discrete case. That is, X and Y are independent if and only if M(;i, (3) = M{t,, 0)M(0, t;). 4.4-11 Let ( X , Y) denote a point selected at random from the rectangle R = {(x, y): 0 ^ x £ Compute F[(X, Y) e A], where A = {(x, y): y <: e1'} n R. 245 Multivariate Distributions Ch. 
4 4.4-12 Let Xi, X , be independent and have distributions that are [/(O, 1). The joint p.d.f. o f X i and Jf, is/(Xi,.(,) = 1, 0 <. Xi s 1, 0 <, x, 5 1. (i) Show that P(Xi + Xj < 1) = tr/4. (b) Using pairs of random numbers, find an approximation of n/4. HINT: For n pairs of random numbers, the relative frequency #[{(x,, x,}: x\ + xl s 1}] is an approximation of n/4. (c) Let V equal the #[{(^i, x^}: x\ + x| < 1}] for n independent pairs of random numbers. How is Y distributed? 4.4-13 Let X have a uniform distribution [/(O, 1), and let the conditional distribution of r, given X = x, be U(x2, ;c2 + 1). (a) Record the joint p.d.f. of X and Y, f ( x , y). Sketch the region for which f ( x , y) > 0. (b) Find the marginal p.d.f. of Y,{j.y). (c) Whatis£(Y|x). 4.4-14 The distribution of X is UiQ, 1) and the conditional distribution of Y, given X = x, is U(0, 1 - x). (a) Record the joint p.d.f- of-Y and Y. Be certain to include the domain of the p.d.f. (b) Find f^(y}, the marginal p.d.f. of Y. (c) Whatis£(y|x)? 4.4-15 Let X,, X,, ..., X . be independent and have distributions that are [7(0, 1). The joint p.d.f. of Xi, X , , ..., X . is/(x,, x ^ , ..., x,) = 1, 0 <. x, < 1, 0 <. x, £ 0£ (a) PO-,+X,<:1)=1/2!, (b) / ' ( ^ i + J ( - , + . Y 3 f i l ) = l / 3 ! , (c) P(J?i+ X , + " - + X . < t ) = l / n } . HINT: Draw figures for parts (a) and (b). 4.5 The Bjvariate Normal Distribution Let X and Y be random variables with joint p.d.f. f ( x , y) of the continuous type. Many applications are concerned with the conditional distribution of one of the random variables, say Y, given that X == x. For example, X and Y might be a student's grade point averages from high school and from the first year in college, respectively. Persons in the field of educational testing and measurement are extremely interested in the conditional distribution of Y, given X = x, in such situations. Suppose that we have an application in which we can make the following three assumptions about the conditional distribution of Y, given X = x'. 246 4.5 The Bivariate Normal Distribution (a) It is normal for each real x. (b) Its mean £( Y \ x} is a linear function of x. (c) Its variance is constant; that is, it does not depend upon the given value ofx. Of course, assumption (b), along with a result given in Section 4.4 implies that E(Y\x)=|^,+pal(x-^). "x Let us consider the implication of assumption (c). The conditional variance is given by ^1,=) \y-h-P^(x-h)\h(y\x)dy, where h(y \ x) is the conditional p.d.f. of Y given X = x. Multiply each member of this equation o f f ^ x ) and integrate on x. Since cr 2 ^ is a constant, the lefthand member is equal to <r2 i,. Thus we have x 2 dx ff,i. = J-«J-»L r F \^ - ^ - r ffy "x ( - ^'x)\J ^'^y\x)f,^x) ^ - However, h{y \ x)f^(x) =f(x, y); hence the right-hand member is just an expectation and the equation can be written as ff 2 !, = £{(r - fi,)2 - Ip ^ {X - ^ - /ly) + p2 ^ (X - |^,)1{. But using the fact that the expectation £ ing E\_(X - ^)(Y - 11,)'] = pax a,, that -i i ^y •; ^v i Tri, = ff? - 2p — "]! ax = CT? - 2p1 a1, + p1 o? = i7?(l - p2). That is, the conditional variance of Y, for each given x, is ff2;! - p2). These facts about the conditional mean and variance, along with assumption (a), require that the conditional p.d.f. of V, given X = x, be 1 2 ^ _ ,,p r _ b'-^-pw^xx-^q ] " ' ' ~ ,,^ ^~? 
L 2a (l - p ) J• p 2 2 — 247 Multivariate Distributions Figure 4.5-1 Before we make any assumptions about the distribution of X , we give an example and figure to illustrate the implications of our current assumptions, Example 4.5-1 Let ^ == 10, a\ =9, ^y = 15, (T? = 16, and p = 0.8. We have seen that assumptions (a), (b), and (c) imply that the conditional distribution of V, given X = x, is ••]1 N\ 15 + (0.8)1-KX- 10), 16(1 -0.82) In Figure 4.5-1 the conditional mean line E(Y | x) = 15 + (0.8)(^ - 10) = (^x + (") has been graphed. For each of x = 5, 10, and 15, the p.d.f. of Y, given X = x, is given. Up to this point, nothing has been said about the distribution of X other than that it has mean ^ and positive variance o^. Suppose, in addition, we assume that this distribution is also normal; that is, the marginal p.d.f. o f X i s <r.tV2' 248 :exp r _ (x - ^f~\ L 2,i J- — 4.5 The Bivariate Normal Distribution Hence the joint p.d.f. of X and Y is given by the product f ( x , y) = h(y\x)f,(x) = ——————== exp [~ -q^-yl~\, 2 2na^^l-p L where it is easy to show (see Exercise 4.5-2) that 2 J (4.5-1) ^'-n ^, y) = —— 2p(^Y^'^ . (y-^ i - p2 LV ixrf^Y I \- "x A ", , A joint p.d.f. of this form is called a bivariate normal p.d.f. Example 4.5-2 Let us assume that in a certain population of college students, the respective grade point averages, say X and V, in high school and the first year in college have an approximate bivariate normal distribution with parameters ^ = 2.9, /iy = 2.4, dy = 0.4, Uy = 0.5, and p = 0.8. Then, for illustration, ^<.<3.3,=^<^<"^) = <I(1.8) - <S>(-0.6) = 0.6898. Since the conditional p.d.f. of Y, given X = 3.2, is normal with mean •o' 2.4 + (0.8/^\3.2 - 2.9) = 2.7 and standard deviation (0.5K/1 - 0.64 = 0.3, we have that P(2.1 < V < 3.31 X = 3.2) •2.1-2.7 r-2.7^»_17l 0.3 0.3 -(- 0.3 =d)(2)-D(- 2) =0.9544. From a practical point of view, however, the reader should be warned that the correlation coefficient of these grade point averages is, in many instances, much smaller than 0.8. Since x and y enter the bivariate normal p.d.f. in a similar manner, the roles of X and Y could have been interchanged. That is, Y could have been assigned the marginal normal p.d.f. N(^r, o"?), and the conditional p.d.f. of X , given — V = y, would have then been normal, with mean ^ + Pt^/^rX)' 249 Multivariate Distributions Ch. 4 variance <7^(1 — special note of it. In order to have a better understanding of the geometry of the bivariate normal distribution, consider the graph of z =f{x, y), where f ( x , y) is given by equation (4.5-1). If we intersect this surface with planes parallel to the yz plane, that is, with x = X y , we have f(xo, y)=fi(Xo)h(y\Xo). In this equation /i(xo) is a constant, and h(y\Xo) is a normal p.d.f. Thus z = f ( x Q , y) is bell-shaped, that is, has the shape of a normal p.d.f. However, note that it is not necessarily a p.d.f. because of the factor /i(xo). Similarly, intersections of the surface z =/(x, y) with planes y = yo, parallel to the xz plane will be bell-shaped. 0 < ZQ < - 2n(Tjt O-y ^/l — then 0 < ZQ litdjf <jy -\/1 — If we intersect z =/(x, y) with the plane z = Z y , which is parallel to the xy plane, we have m A^vT^^expr^l Taking the natural logarithm of each side, we obtain f^'-.v zpf^^v^a) lx^\ __2p ^-^v^ia ++ ^v z_"i V "x I \ V "x A "Y I \ a, "r ) ___ = -2(1 - p2) In (^2m^,^\-p1). Thus we see that these intersections are ellipses. Example 4.5-3 With //, = 10, a\ = 9, ^ = 15, o2 = 16, and p = 0.8, the bivariate normal p.d.f. 
Since x and y enter the bivariate normal p.d.f. in a similar manner, the roles of X and Y could have been interchanged. That is, Y could have been assigned the marginal normal p.d.f. N(μ_Y, σ²_Y), and the conditional p.d.f. of X, given Y = y, would have then been normal, with mean μ_X + ρ(σ_X/σ_Y)(y − μ_Y) and variance σ²_X(1 − ρ²); the reader should make special note of it.

In order to have a better understanding of the geometry of the bivariate normal distribution, consider the graph of z = f(x, y), where f(x, y) is given by equation (4.5-1). If we intersect this surface with planes parallel to the yz plane, that is, with x = x₀, we have

f(x_0, y) = f_1(x_0)\, h(y \mid x_0).

In this equation f₁(x₀) is a constant, and h(y | x₀) is a normal p.d.f. Thus z = f(x₀, y) is bell-shaped, that is, has the shape of a normal p.d.f. However, note that it is not necessarily a p.d.f. because of the factor f₁(x₀). Similarly, intersections of the surface z = f(x, y) with planes y = y₀, parallel to the xz plane, will be bell-shaped.

If we intersect z = f(x, y) with the plane z = z₀, where

0 < z_0 < \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}},

which is parallel to the xy plane, we have

z_0 = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\left[ -\frac{q(x, y)}{2} \right].

Taking the natural logarithm of each side, we obtain

\left(\frac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho\left(\frac{x-\mu_X}{\sigma_X}\right)\left(\frac{y-\mu_Y}{\sigma_Y}\right) + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2 = -2(1-\rho^2)\ln\left( 2\pi\sigma_X\sigma_Y z_0\sqrt{1-\rho^2} \right).

Thus we see that these intersections are ellipses.

Example 4.5-3 With μ_X = 10, σ²_X = 9, μ_Y = 15, σ²_Y = 16, and ρ = 0.8, the bivariate normal p.d.f. has been graphed in Figure 4.5-2. For ρ = 0.8, the level curves for z₀ = 0.001, 0.006, 0.011, 0.016, and 0.021 are given in Figure 4.5-3. The conditional mean line,

E(Y \mid x) = 15 + (0.8)\frac{4}{3}(x - 10) = \frac{16}{15}x + \frac{13}{3},

is also drawn on Figure 4.5-3. Note that this line intersects the level curves at points through which vertical tangents can be drawn to the ellipses.

Figure 4.5-2

We close this section by observing another important property of the correlation coefficient ρ if X and Y have a bivariate normal distribution. In the product f(x, y) = h(y | x)f₁(x) of equation (4.5-1), let us consider the factor h(y | x) when ρ = 0. We see that this product, which is the joint p.d.f. of X and Y, equals f₁(x)f₂(y) because h(y | x) is, when ρ = 0, a normal p.d.f. with mean μ_Y and variance σ²_Y. That is, if ρ = 0, the joint p.d.f. factors into the product of the two marginal probability density functions, and, hence, X and Y are independent random variables. Of course, if X and Y are any independent random variables (not necessarily normal), we know that ρ, if it exists, is always equal to zero. Thus we have proved the following.

THEOREM 4.5-1 If X and Y have a bivariate normal distribution with correlation coefficient ρ, then X and Y are independent if and only if ρ = 0.

Thus, in the bivariate normal case, ρ = 0 does imply independence of X and Y. It should be mentioned here that these characteristics of the bivariate normal distribution can be extended to the trivariate normal distribution or, more generally, the multivariate normal distribution. This is done in more advanced texts assuming some knowledge of matrices; for illustration, see Chapter 12 of Hogg and Craig (1978).

Figure 4.5-3

Exercises

4.5-1 Let X and Y have a bivariate normal distribution with parameters μ_X = −3, μ_Y = 10, σ²_X = 25, σ²_Y = 9, and ρ = 3/5. Compute
(a) P(−5 < X < 5),
(b) P(−5 < X < 5 | Y = 13),
(c) P(7 < Y < 16),
(d) P(7 < Y < 16 | X = 2).

4.5-2 Show that the expression in the exponent of equation (4.5-1) is equal to the function q(x, y) given in the text.

4.5-3 Let X and Y have a bivariate normal distribution with parameters μ_X = 2.8, μ_Y = 110, σ²_X = 0.16, σ²_Y = 100, and ρ = 0.6. Compute
(a) P(106 < Y < 124),
(b) P(106 < Y < 124 | X = 3.2).

4.5-4 Let X and Y have a bivariate normal distribution with parameters μ_X = 50, μ_Y = …, … . Find
(a) P(65.8 < Y ≤ …), … .

4.5-5 Let X denote the height in centimeters and Y the weight in kilograms of male college students. Assume that X and Y have a bivariate normal distribution with parameters μ_X = 185, σ²_X = 100, μ_Y = 84, σ²_Y = 64, and ρ = 3/5.
(a) Determine the conditional distribution of Y, given that X = 190.
(b) Find P(86.4 < Y < 95.36 | X = 190).

4.5-6 For a freshman taking introductory statistics and majoring in psychology, let X equal the student's ACT mathematics score and Y the student's ACT verbal score. Assume that X and Y have a bivariate normal distribution with μ_X = 22.7, σ²_X = 17.64, μ_Y = 22.7, σ²_Y = 12.25, and ρ = 0.78. Find
(a) P(18.5 < Y < 25.5),
(b) E(Y | x),
(c) Var(Y | x),
(d) P(18.5 < Y < 25.5 | X = 23),
(e) P(18.5 < Y < 25.5 | X = 25).
(f) For x = 21, 23, and 25, draw a graph of z = h(y | x) similar to Figure 4.5-1.

4.5-7 For a pair of gallinules, let X equal the weight in grams of the male and Y the weight in grams of the female. Assume that X and Y have a bivariate normal distribution with μ_X = 413.6, σ²_X = 457.96, μ_Y = 346.7, σ²_Y = 519.84, and ρ = −0.32.
Find
(a) P(309.3 < Y < 380.9),
(b) E(Y | x),
(c) Var(Y | x),
(d) P(309.3 < Y < 380.9 | X = 384.9).

4.5-8 Let X and Y have a bivariate normal distribution with parameters μ_X = 10, σ²_X = 9, μ_Y = 15, σ²_Y = 16, and ρ = 0. Find
(a) P(13.6 < Y < 17.2),
(b) E(Y | x),
(c) Var(Y | x),
(d) P(13.6 < Y < 17.2 | X = 9.1).

4.5-9 Let X and Y have a bivariate normal distribution. Find two different lines, a(x) and b(x), parallel to and equidistant from E(Y | x), such that P[a(x) < Y < b(x) | X = x] = 0.9544 for all real x. Plot a(x), b(x), and E(Y | x) when μ_X = 2, μ_Y = …, and ρ = 3/5.

4.5-10 In a college health fitness program, let X denote the weight in kilograms of a male freshman at the beginning of the program and let Y denote his weight change during a semester. Assume that X and Y have a bivariate normal distribution with μ_X = 72.30, σ²_X = 110.25, μ_Y = 2.80, σ²_Y = 2.89, and ρ = −0.57. (The lighter students tend to gain weight, while the heavier students tend to lose weight.) Find
(a) P(2.80 ≤ Y < 5.35),
(b) P(1.76 < Y ≤ 5.34 | X = 82.3).

4.5-11 For a female freshman in a health fitness program, let X equal her percentage of body fat at the beginning of the program and let Y equal the change in her percentage of body fat measured at the end of the program. Assume that X and Y have a bivariate normal distribution with μ_X = 24.5, σ²_X = 4.8² = 23.04, μ_Y = −0.2, σ²_Y = 3.0² = 9.0, and ρ = −0.32. Find
(a) P(1.3 ≤ Y ≤ …),
(b) μ_{Y|x}, the conditional mean of Y, given X = x,
(c) σ²_{Y|x}, the conditional variance of Y, given X = x,
(d) P(1.3 ≤ Y < 5.8 | X = 18).

4.5-12 For a male freshman in a health fitness program, let X equal his percentage of body fat at the beginning of the program and let Y equal the change in his percentage of body fat measured at the end of the program. Assume that X and Y have a bivariate normal distribution with μ_X = 15.00, σ²_X = 4.5², μ_Y = −1.55, σ²_Y = 1.5², and ρ = −0.60. Find
(a) P(0.205 ≤ Y ≤ …),
(b) P(0.21 ≤ Y < 0.81 | X = 20).

4.6 Sampling from Bivariate Distributions

In Section 1.4 we found that we could compute characteristics of a sample in exactly the same way that we computed the corresponding characteristics of a distribution of one random variable. Of course, this can be done because the sample characteristics are actually those of a distribution, namely the empirical distribution. Also, since the empirical distribution approximates the actual distribution when the sample size is large, the sample characteristics can be used as estimates of the corresponding characteristics of the actual distribution. These notions can be extended to samples from distributions of two or more variables, and the purpose of this section is to illustrate this extension.

Let (X₁, Y₁), (X₂, Y₂), ..., (Xₙ, Yₙ) be n independent observations of the pair of random variables (X, Y) with a fixed, but possibly unknown, distribution. We say that the collection of pairs (X₁, Y₁), (X₂, Y₂), ..., (Xₙ, Yₙ) is a random sample from this distribution involving two random variables. The definition of a random sample can obviously be extended to samples from distributions with more than two random variables and hence will not be given formally. If we now assign the weight 1/n to each observed pair (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ), we create a discrete-type distribution of probability in two-dimensional space.
Using this empirical distribution, we can compute the means, the variances, the covariance, the correlation coefficient, and the best-fitting line. We use the means of the empirical distribution, x̄ and ȳ, as the sample means. However, in defining the sample variances and the sample covariance, we modify, as before, those corresponding characteristics of the empirical distribution by multiplying them by the factor n/(n − 1). This is done to create better estimates (in some sense to be defined later) of the variances and covariance of the underlying distribution. In particular, the means, variances, covariance, and correlation coefficient of the sample are given by

\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i,

s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n-1)},

s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2 = \frac{n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}{n(n-1)},

c_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n(n-1)},

and

r = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{\sqrt{\left[n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right]\left[n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right]}}.

From the last equation we see that the covariance of the sample can be written as c_{xy} = r s_x s_y. The best-fitting line (least squares regression line) is

y = \bar{y} + r\frac{s_y}{s_x}(x - \bar{x}) = \bar{y} + \frac{c_{xy}}{s_x^2}(x - \bar{x}).

Recall that in Section 4.2 we fit that straight line by minimizing a certain expectation, which for the empirical distribution would be

\frac{1}{n}\sum_{i=1}^{n} (y_i - a - bx_i)^2.

Of course, we do not expect that each observed pair (xᵢ, yᵢ) will lie on this line y = a + bx. That is, yᵢ does not usually equal ȳ + r(s_y/s_x)(xᵢ − x̄); but we expect this line to fit the collection of points (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ) in some best way, namely best by the principle of least squares.

Example 4.6-1 To simplify the calculations, we take a small sample size, n = 5. Let the five observed points be (3, 2), (6, 0), (5, 2), (1, 6), (3, 5). It is sometimes helpful to construct the following table to simplify the calculations.

x    y    xy    x²    y²
3    2     6     9     4
6    0     0    36     0
5    2    10    25     4
1    6     6     1    36
3    5    15     9    25
--   --   --    --    --
18   15   37    80    69

Thus

\bar{x} = \frac{1}{5}(18) = 3.6, \qquad \bar{y} = \frac{1}{5}(15) = 3.0,

s_x^2 = \frac{5(80) - (18)^2}{5(4)} = \frac{76}{20} = 3.8, \qquad s_y^2 = \frac{5(69) - (15)^2}{5(4)} = \frac{120}{20} = 6.0,

c_{xy} = \frac{5(37) - (18)(15)}{5(4)} = \frac{-85}{20} = -4.25,

and

r = \frac{-4.25}{\sqrt{3.8}\,\sqrt{6.0}} = -0.89.

Thus the best-fitting line is

y = \bar{y} + r\frac{s_y}{s_x}(x - \bar{x}) = 3.0 - 0.89\sqrt{\frac{6.0}{3.8}}\,(x - 3.6) = 7.03 - 1.12x.

We plot the five observed points and the best-fitting line on the same graph (Figure 4.6-1) in order to compare the fit of this line to the collection of points.
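As an aside (and assuming access to Python with NumPy, which the text does not use), the hand computations of Example 4.6-1 can be reproduced directly from the formulas above; the variable names in this sketch are ours.

```python
# A small sketch reproducing Example 4.6-1 with NumPy.
import numpy as np

x = np.array([3.0, 6.0, 5.0, 1.0, 3.0])
y = np.array([2.0, 0.0, 2.0, 6.0, 5.0])
n = len(x)

xbar, ybar = x.mean(), y.mean()                    # 3.6, 3.0
s2x = ((x - xbar) ** 2).sum() / (n - 1)            # 3.8
s2y = ((y - ybar) ** 2).sum() / (n - 1)            # 6.0
cxy = ((x - xbar) * (y - ybar)).sum() / (n - 1)    # -4.25
r = cxy / np.sqrt(s2x * s2y)                       # about -0.89

# Best-fitting (least squares regression) line: y = ybar + r*(sy/sx)*(x - xbar)
slope = r * np.sqrt(s2y / s2x)
intercept = ybar - slope * xbar
print(xbar, ybar, s2x, s2y, round(r, 2))   # 3.6 3.0 3.8 6.0 -0.89
print(round(slope, 2), round(intercept, 2))  # about -1.12 and 7.03
```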
Example 4.6-2 Let the joint p.d.f. of the random variables X and Y of the discrete type be f(x, y) = 1/15, 1 ≤ y ≤ x ≤ 5, where x and y are integers. Then the marginal p.d.f. of X is f₁(x) = x/15, x = 1, 2, 3, 4, 5, and the marginal p.d.f. of Y is f₂(y) = (6 − y)/15, y = 1, 2, 3, 4, 5. It is easy to show that μ_X = 11/3, σ²_X = 14/9, μ_Y = 7/3, σ²_Y = 14/9, and ρ = 1/2. Thus the "best-fitting" line associated with this distribution is

y = \mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X) = \frac{7}{3} + \frac{1}{2}\left(x - \frac{11}{3}\right) = \frac{1}{2}x + \frac{1}{2}.

A random sample of n = 30 observations from this joint distribution was generated and yielded the following data:

(2, 2) (4, 1) (5, 4) (3, 1) (4, 1) (2, 1)
(5, 2) (3, 2) (1, 1) (5, 1) (3, 2) (5, 4)
(5, 2) (4, 3) (5, 1) (4, 3) (4, 2) (4, 2)
(4, 2) (4, 2) (2, 1) (4, 3) (4, 1) (3, 3)
(4, 3) (5, 2) (4, 4) (5, 5) (3, 1) (3, 2)

For these 30 observations, in which some points were observed more than once, we have x̄ = 3.767, ȳ = 2.133, s_x = 1.073, s_y = 1.106, and r = 0.405. Thus the observed best-fitting line for these 30 points is y = 0.417x + 0.560; this should be compared to the best-fitting line of the distribution.

To visualize better the relationship between r and a plot of n points (x₁, y₁), ..., (xₙ, yₙ), we have generated three different sets of 50 pairs of observations from three bivariate normal distributions. In the next example we list the corresponding values of x̄, ȳ, s²_x, s²_y, r, and the observed best-fitting line. Each set of points and the corresponding line are plotted on the same graph.

Example 4.6-3 Three random samples, each of size n = 50, were taken from three different bivariate normal distributions. For each of these distributions, μ_X = 12, σ²_X = 16, μ_Y = 8, σ²_Y = 9. The respective values of the correlation coefficient, ρ, are 0.8, 0.2, and −0.6. The corresponding sample characteristics are

(a) x̄ = 11.905, s²_x = 14.095, ȳ = 8.271, s²_y = 6.851, and r = 0.799, so the best-fitting line is

y = 8.271 + 0.799\sqrt{\frac{6.851}{14.095}}\,(x - 11.905) = 0.557x + 1.639.

(b) x̄ = 12.038, s²_x = 15.011, ȳ = 7.790, s²_y = 7.931, and r = 0.169, so the best-fitting line is

y = 7.790 + 0.169\sqrt{\frac{7.931}{15.011}}\,(x - 12.038) = 0.123x + 6.311.

(c) x̄ = 12.095, s²_x = 16.762, ȳ = 8.040, s²_y = 7.655, and r = −0.689, so the best-fitting line is

y = 8.040 - 0.689\sqrt{\frac{7.655}{16.762}}\,(x - 12.095) = -0.466x + 13.672.

In Figure 4.6-2, these respective lines and the corresponding sample points are plotted. Note the effect that the value of r has on the slope of the line and the variability of the points about that line.

Figure 4.6-2

The next example shows that two random variables X and Y may be clearly related (dependent) but yet have a correlation coefficient ρ close to zero. This, however, is not unexpected, since we recall that ρ does, in a sense, measure the linear relationship between two random variables. That is, the linear relationship between X and Y could be zero, whereas higher-order ones could be quite strong.

Example 4.6-4 Twenty-five observations of X and Y, generated from a certain bivariate distribution, are

(6.91, 17.52) (4.32, 22.69) (2.38, 17.61) (7.98, 14.29) (8.26, 10.77)
(2.00, 12.87) (3.10, 18.63) (7.69, 16.77) (2.21, 14.97) (3.42, 19.16)
(8.18, 11.15) (5.39, 22.41) (1.19, 7.50) (3.21, 19.06) (5.47, 23.89)
(7.35, 16.63) (2.32, 15.09) (7.54, 14.75) (1.27, 10.75) (7.33, 17.42)
(8.41, 9.40) (8.72, 9.83) (6.09, 22.33) (5.30, 21.37) (7.30, 17.36)

For this set of observations x̄ = 5.33, ȳ = 16.17, s²_x = 6.521, s²_y = 20.865, and r = −0.06. Note that r is very close to zero even though X and Y seem very dependent; that is, it seems that a quadratic expression would fit the data very well. In Exercise 4.6-11 the reader is asked to fit y = a + bx + cx² to these 25 points by the method of least squares. See Figure 4.6-3 for a plot of the 25 points.
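In the spirit of Example 4.6-3, the following sketch (not from the text; it assumes NumPy and an arbitrary seed) draws one sample of 50 pairs from a bivariate normal distribution with the same parameters and computes r and the observed best-fitting line, which should be close to the theoretical line y = 0.6x + 0.8 when ρ = 0.8.

```python
# Draw 50 pairs from a bivariate normal and fit the least squares line.
import numpy as np

rng = np.random.default_rng(seed=1)
rho = 0.8
mu = [12.0, 8.0]
cov = [[16.0, rho * 4.0 * 3.0],      # sigma_x^2 = 16, sigma_y^2 = 9
       [rho * 4.0 * 3.0, 9.0]]
sample = rng.multivariate_normal(mu, cov, size=50)
x, y = sample[:, 0], sample[:, 1]

r = np.corrcoef(x, y)[0, 1]
slope = r * y.std(ddof=1) / x.std(ddof=1)
intercept = y.mean() - slope * x.mean()
print(round(r, 3), round(slope, 3), round(intercept, 3))
# r should be near 0.8 and the fitted line near y = 0.6x + 0.8
```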
Exercises

4.6-1 Three observed values of the pair of random variables (X, Y) yielded the three points (2, 3), (4, 7), and (6, 5).
(a) Calculate x̄, ȳ, s²_x, s²_y, r, and the equation of the best-fitting line.
(b) Plot the points and the best-fitting line on the same graph.

4.6-2 Three observed values of the pair of random variables (X, Y) yielded the three points (1, 2), (3, 1), and (2, 3).
(a) Calculate x̄, ȳ, s²_x, s²_y, r, and the equation of the best-fitting line.
(b) Plot the points and the best-fitting line on the same graph.

4.6-3 A pair of unbiased dice was rolled six independent times. Let X denote the smaller outcome and Y the larger outcome on the dice. The following outcomes were observed: (2, 5) (3, 5) (3, 6) (2, 3) (5, 5) (1, 3).
(a) Find x̄, ȳ, s²_x, s²_y, r, and the best-fitting line for the sample.
(b) Plot the points and the line on the same graph.
(c) Define the joint p.d.f. of X and Y (see Example 4.1-1) and then calculate μ_X, μ_Y, σ_X, σ_Y, ρ, and the best-fitting line of this joint distribution.

4.6-4 Ten college students took the Undergraduate Record Exam (URE) when they were juniors and the Graduate Record Exam (GRE) when they were seniors. The Quantitative URE score (x) and the Quantitative GRE score (y) for each of these 10 students is given in the following list of ordered pairs (x, y):

(550, 570) (670, 730) (490, 450) (410, 540) (570, 560)
(490, 400) (450, 420) (490, 520) (780, 710) (520, 620)

(a) Verify that x̄ = 542.0, ȳ = 552.0, s²_x = 12,040.0, s²_y = 12,640.0, and r = 0.79.
(b) Find the equation of the best-fitting line.
(c) Plot the 10 points and the line on the same graph.

4.6-5 The respective high school and college grade-point averages for 20 college seniors as ordered pairs (x, y) are

(3.75, 3.19) (3.42, 2.97) (3.47, 3.15) (2.47, 2.11) (3.30, 3.05)
(3.45, 3.34) (4.00, 3.79) (2.60, 2.26) (3.36, 3.01) (2.58, 2.63)
(2.87, 2.23) (2.65, 2.55) (4.00, 3.76) (3.60, 2.92) (3.80, 3.22)
(3.60, 3.46) (3.10, 2.50) (2.30, 2.11) (3.65, 3.09) (3.79, 3.27)

(a) Verify that x̄ = 3.29, ȳ = 2.93, s²_x = 0.283, s²_y = 0.260, and r = 0.92.
(b) Find the equation of the best-fitting line.
(c) Plot the 20 points and the line on the same graph.

4.6-6 The respective high school grade-point average and the SAT mathematics score for 25 college students as ordered pairs (x, y) are

(4.00, 577) (2.53, 453) (3.45, 407) (2.48, 539) (2.69, 534)
(2.82, 584) (2.33, 464) (2.21, 525) (2.59, 545) (3.37, 499)
(3.00, 446) (2.93, 466) (3.25, 491) (2.90, 433) (3.64, 556)
(3.23, 394) (2.46, 497) (2.62, 460) (2.75, 413) (2.82, 440)
(3.51, 608) (4.00, 657) (3.72, 449) (2.78, 323) (3.33, 413)

(a) Verify that x̄ = 3.02, ȳ = 486.12, s²_x = 0.258, s²_y = …, and r = 0.275.
(b) Find the equation of the best-fitting line.
(c) Plot the 25 points and the line on the same graph.

4.6-7 Let the set R = {(x, y): x/2 < y < x/2 + 2, 0 < x < 10}. Let (X, Y) denote a random point selected uniformly from R.
(a) Sketch the set R in the xy plane. Does it seem intuitive to you that E(Y | x) = x/2 + 1?
(b) Twenty-five points selected at random from R by the computer are

(6.58, 3.58) (4.73, 2.36) (1.52, 1.96) (7.35, 4.68) (9.17, 5.50)
(9.74, 5.61) (9.96, 5.68) (6.43, 3.61) (9.06, 4.81) (2.05, 1.92)
(3.39, 2.70) (4.78, 4.05) (1.70, 0.97) (3.37, 3.62) (2.75, 2.25)
(6.57, 4.26) (5.29, 3.17) (3.13, 1.60) (8.06, 4.34) (1.79, 1.26)
(9.81, 6.39) (1.32, 1.87) (9.30, 5.95) (0.46, 2.04)

Use the 25 observed values to obtain the best-fitting line of this sample. Note that it is close to E(Y | x) = x/2 + 1.
4.6-8 The following data give the ACT Math and ACT Verbal scores for 15 students:

(16, 19) (25, 21) (27, 29) (18, 17) (21, 24)
(28, 24) (22, 18) (23, 18) (30, 24) (20, 23)
(24, 18) (27, 23) (17, 20) (31, 25) (28, 24)

(a) Verify that x̄ = 23.8, ȳ = 21.8, s²_x = 22.457, s²_y = 11.600, and r = 0.626.
(b) Find the equation of the best-fitting line.
(c) Plot the 15 points and the line on the same graph.

4.6-9 Each of 16 professional golfers hits off the tee a golf ball of brand A and a golf ball of brand B, eight hitting ball A before ball B and eight hitting ball B before ball A. Let X and Y equal the distances traveled in yards for ball A and for ball B, respectively. The following data, (x, y), were observed:

(265, 252) (255, 244) (244, 251) (272, 276)
(258, 245) (212, 222) (246, 243) (276, 259)
(254, 255) (260, 246) (274, 267) (224, 231)
(274, 275) (274, 260) (263, 246) (269, 267)

(a) Verify that x̄ = 257.50, ȳ = 252.44, s²_x = 341.333, s²_y = 218.796, and r = 0.867.
(b) Find the equation of the best-fitting line.
(c) Plot the 16 points and the line on the same graph.

4.6-10 Fourteen pairs of gallinules were captured and weighed. Let X equal the male weight and Y the female weight. The following weights in grams were observed:

(405, 321) (403, 328) (415, 355) (396, 378) (370, 372)
(400, 340) (457, 351) (435, 314) (425, 398) (450, 320)
(425, 375) (420, 330) (415, 365) (425, 355)

(a) Verify that x̄ = 417.2, s_x = 22.36, ȳ = 350.1, s_y = 25.56, and r = −0.252.
(b) Find the equation of the best-fitting line.
(c) Plot the points and the line on the same graph.

4.6-11 We would like to fit the quadratic curve y = a + bx + cx² to a set of points (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ) by the method of least squares. To do this, let

h(a, b, c) = \sum_{i=1}^{n} (y_i - a - bx_i - cx_i^2)^2.

(a) By setting the three first partial derivatives of h with respect to a, b, and c equal to zero, show that a, b, and c satisfy the following set of equations, all sums going from 1 to n:

an + b\sum x_i + c\sum x_i^2 = \sum y_i;
a\sum x_i + b\sum x_i^2 + c\sum x_i^3 = \sum x_i y_i;
a\sum x_i^2 + b\sum x_i^3 + c\sum x_i^4 = \sum x_i^2 y_i.

(b) For the data given in Example 4.6-4, Σxᵢ = 133.34, Σxᵢ² = 867.75, Σxᵢ³ = 6197.21, Σxᵢ⁴ = 46,318.88, Σyᵢ = 404.22, Σxᵢyᵢ = 2138.38, Σxᵢ²yᵢ = 13,380.30. Show that a = −1.88, b = 9.86, and c = −0.995.
(c) Plot the points and this least squares quadratic regression curve on the same graph.

4.6-12 Let a random number X be selected uniformly from the interval (1, 9). For each observed value of X = x, let a random number Y be selected uniformly from the interval (x² − …). Twenty-five observations generated on a computer are

(4.16, 2.66) (2.69, 10.14) (2.44, 8.04) (3.17, 6.79) (5.47, 1.82)
(8.26, 12.89) (6.87, 5.96) (2.88, 8.60) (2.54, 8.69) (3.20, 4.41)
(5.39, 1.63) (8.17, 14.55) (6.62, 6.72) (4.97, 3.76) (1.49, 14.07)
(4.20, 3.01) (8.43, 14.23) (2.18, 12.76) (2.68, 9.53) (2.02, 12.81)
(2.13, 10.36) (8.74, 17.15) (6.10, 4.75) (3.18, 6.56) (8.06, 11.63)

(a) For these data, Σxᵢ = 116.04, Σxᵢ² = 675.35, Σxᵢ³ = 4551.52, Σxᵢ⁴ = 33,331.38, Σyᵢ = 213.52, Σxᵢyᵢ = 1036.97, Σxᵢ²yᵢ = 6661.79. Show that the equation of the least squares quadratic regression curve is y = 1.026x² − 10.296x + 28.612.
(b) Plot the points and the least squares quadratic regression curve on the same graph.

4.6-13 After keeping records for 6 years, a telephone company is interested in predicting the number of telephones that will be in service in year 7. The following data are available. The number of telephones in service is given in thousands.
Year:                              1    2    3    4    5    6
Number of Telephones (thousands): 91   93   95   99  102  105

(a) Find the quadratic curve of best fit, y = a + bx + cx², that could be used for this prediction.
(b) Plot the points and the curve on the same graph.
(c) Predict the number of telephones that will be in service in year 7.

4.6-14 For male freshmen in a health fitness program, let X equal a participant's percentage of body fat at the beginning of the semester and let Y equal the change in this percentage (percentage at the end of the semester minus percentage at the beginning of the semester, so that a negative y indicates a loss). Twelve observations of (X, Y) are

(13.1, 1.1) (8.2, −1.1) (5.4, 0.5) (16.8, 0.5) (10.4, −0.2) (14.3, −4.6)
(17.9, −1.3) (17.4, −2.0) (11.1, 1.0) (10.6, −2.2) (10.5, −1.4) (5.3, 1.7)

(a) Verify that x̄ = 11.750, s_x = 4.298, ȳ = −0.667, s_y = 1.788, and r = −0.395.
(b) Find the equation of the best-fitting line (i.e., the least squares regression line).
(c) Plot the points and the line on the same graph.

4.6-15 Let X equal the number of milligrams of tar and Y the number of milligrams of carbon monoxide per filtered cigarette (100 millimeters in length) measured by the Federal Trade Commission. A sample of 12 brands yielded the following data:

(5, 7) (15, 15) (17, 16) (20, 20) (9, 11) (11, 10)
(8, 9) (13, 13) (11, 9) (13, 11) (12, 11) (11, 14)

(a) Verify that x̄ = 12.08, s_x = 4.010, ȳ = 12.17, s_y = 3.614, and r = 0.915.
(b) Find the equation of the best-fitting line.
(c) Plot the points and the line on the same graph.

4.6-16 For each of 20 statistics students, let X and Y equal the mother's and father's ages, respectively. The observed data are

(50, 52) (48, 50) (51, 56) (51, 50) (48, 48)
(54, 58) (50, 53) (64, 65) (44, 46) (52, 51)
(40, 41) (53, 56) (47, 50) (52, 62) (50, 49)
(48, 51) (46, 48) (51, 55) (51, 52) (49, 52)

(a) Verify that x̄ = 49.95, s_x = 4.628, ȳ = 52.25, s_y = 5.418, and r = 0.888.
(b) Find the equation of the best-fitting line.
(c) Plot the points and the line on the same graph.

4.7 The t and F Distributions

Two distributions that play an important role in statistical applications will be introduced in this section.

THEOREM 4.7-1 If Z is a random variable that is N(0, 1), if U is a random variable that is χ²(r), and if Z and U are independent, then

T = \frac{Z}{\sqrt{U/r}}

has a t distribution with r degrees of freedom. Its p.d.f. is

g(t) = \frac{\Gamma[(r+1)/2]}{\sqrt{\pi r}\,\Gamma(r/2)\,(1 + t^2/r)^{(r+1)/2}}, \qquad -\infty < t < \infty.

REMARK This distribution was first discovered by W. S. Gosset when he was working for an Irish brewery. Because Gosset published under the pseudonym Student, this distribution is sometimes known as Student's t distribution.

Proof: In the proof, we first find an expression for the distribution function of T and then take its derivative to find the p.d.f. of T. Since Z and U are independent, the joint p.d.f. of Z and U is

g(z, u) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2} \cdot \frac{1}{\Gamma(r/2)\,2^{r/2}}\, u^{r/2 - 1} e^{-u/2}, \qquad -\infty < z < \infty, \quad 0 < u < \infty.

The distribution function F(t) = P(T ≤ t) of T is given by

F(t) = P\left( \frac{Z}{\sqrt{U/r}} \le t \right) = P\left( Z \le \sqrt{U/r}\; t \right) = \int_0^{\infty} \int_{-\infty}^{\sqrt{u/r}\,t} g(z, u)\, dz\, du.
The p.d.f. of T is the derivative of the distribution function; so, applying the Fundamental Theorem of Calculus to the inner integral, we see that

f(t) = F'(t) = \int_0^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-(u/r)t^2/2}\, \sqrt{\frac{u}{r}} \cdot \frac{u^{r/2-1} e^{-u/2}}{\Gamma(r/2)\,2^{r/2}}\, du = \frac{1}{\sqrt{2\pi r}\,\Gamma(r/2)\,2^{r/2}} \int_0^{\infty} u^{(r+1)/2 - 1} \exp\left[ -\frac{u}{2}\left(1 + \frac{t^2}{r}\right) \right] du.

In the integral make the change of variables

y = \left(1 + \frac{t^2}{r}\right) u \qquad \text{so that} \qquad \frac{du}{dy} = \frac{1}{1 + t^2/r}.

Thus we find that

f(t) = \frac{\Gamma[(r+1)/2]}{\sqrt{\pi r}\,\Gamma(r/2)} \cdot \frac{1}{(1 + t^2/r)^{(r+1)/2}} \int_0^{\infty} \frac{y^{(r+1)/2 - 1} e^{-y/2}}{\Gamma[(r+1)/2]\,2^{(r+1)/2}}\, dy.

The integral in this last expression for f(t) is equal to 1 because the integrand is like the p.d.f. of a chi-square distribution with r + 1 degrees of freedom. Thus the p.d.f. is as given in the theorem. □

Note that the distribution of T is completely determined by the number r. Since it is, in general, difficult to evaluate the distribution function of T, some values of P(T ≤ t) are found in Table VI in the Appendix for r = 1, 2, 3, ..., 30. Also observe that the graph of the p.d.f. of T is symmetric with respect to the vertical axis t = 0 and is very similar to the graph of the p.d.f. of the standard normal distribution N(0, 1). Figure 4.7-1 shows the graphs of the probability density functions of T when r = 1, 3, and 7 and of N(0, 1). In this figure we see that the tails of the t distribution are heavier than those of a normal one; that is, there is more extreme probability in the t distribution than in the standardized normal one.

Because of the symmetry of the t distribution about t = 0, the mean (if it exists) must equal zero. That is, it can be shown that E(T) = 0 when r ≥ 2. When r = 1, the t distribution is the same as the Cauchy distribution, and we noted in Section 3.2 that the mean and thus the variance do not exist for the Cauchy distribution. The variance of T is

Var(T) = E(T^2) = \frac{r}{r-2} \qquad \text{when } r \ge 3.

The variance does not exist when r = 1 or 2. Although it is fairly difficult to compute these moments from the p.d.f. of T, they can be found (Exercise 4.7-4) using the definition of T and the independence of Z and U, namely

E(T) = E(Z)\, E\!\left(\frac{1}{\sqrt{U/r}}\right) \qquad \text{and} \qquad E(T^2) = E(Z^2)\, E\!\left(\frac{r}{U}\right).

For notational purposes we shall let t_α(r) denote the constant for which

P[T \ge t_\alpha(r)] = \alpha

when T has a t distribution with r degrees of freedom. That is, t_α(r) is the 100(1 − α)th percentile of the t distribution with r degrees of freedom (see Figure 4.7-2; in that figure, r = 7). Let us consider some illustrations of the use of the t-table and this notation for right-tail probabilities.

Example 4.7-1 Let T have a t distribution with seven degrees of freedom. Then, from Table VI in the Appendix, we have

P(T ≤ 1.415) = 0.90,
P(T ≤ −1.415) = 1 − P(T ≤ 1.415) = 0.10,
P(−1.895 < T < 1.415) = 0.90 − 0.05 = 0.85.

We also have, for example, t₀.₁₀(7) = 1.415, t₀.₉₀(7) = −t₀.₁₀(7) = −1.415, and t₀.₀₂₅(7) = 2.365.

Example 4.7-2 Let T have a t distribution with a variance of 5/4. Thus r/(r − 2) = 5/4, and r = 10. Then P(−1.812 ≤ T ≤ 1.812) = 0.90 and t₀.₀₅(10) = 1.812, t₀.₀₁(10) = 2.764, and t₀.₉₉(10) = −2.764.

Example 4.7-3 Let T have a t distribution with 14 degrees of freedom. Find a constant c such that P(|T| < c) = 0.90. From Table VI in the Appendix we see that P(T < 1.761) = 0.95 and therefore c = 1.761 = t₀.₀₅(14).
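As a brief aside (assuming access to Python with SciPy, which plays the role of Table VI here; the calls below are an illustration, not part of the text), the values used in Examples 4.7-1 and 4.7-2 can be recomputed directly.

```python
# A short check of Examples 4.7-1 and 4.7-2 with SciPy's t distribution.
from scipy.stats import t

# Example 4.7-1, r = 7 degrees of freedom
print(round(t.cdf(1.415, df=7), 3))                    # about 0.90
print(round(t.cdf(1.415, df=7) - t.cdf(-1.895, df=7), 3))  # about 0.85
print(round(t.ppf(0.975, df=7), 3))                    # t_{0.025}(7), about 2.365

# Example 4.7-2, variance 5/4 gives r = 10
print(round(t.ppf(0.95, df=10), 3))                    # t_{0.05}(10), about 1.812
print(round(t.ppf(0.99, df=10), 3))                    # t_{0.01}(10), about 2.764
```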
Another important distribution in statistical applications is introduced in the following theorem.

THEOREM 4.7-2 If U and V are independent chi-square variables with r₁ and r₂ degrees of freedom, respectively, then

F = \frac{U/r_1}{V/r_2}

has an F distribution with r₁ and r₂ degrees of freedom. Its p.d.f. is

h(w) = \frac{\Gamma[(r_1 + r_2)/2]\,(r_1/r_2)^{r_1/2}\, w^{r_1/2 - 1}}{\Gamma(r_1/2)\,\Gamma(r_2/2)\,(1 + r_1 w/r_2)^{(r_1 + r_2)/2}}, \qquad 0 < w < \infty.

REMARK For many years the random variable defined in Theorem 4.7-2 has been called F, a symbol first proposed by George Snedecor to honor R. A. Fisher, who used a modification of this ratio in several statistical applications.

Proof: In this proof, we first find an expression for the distribution function of F and then take its derivative to find the p.d.f. of F. Since U and V are independent, the joint p.d.f. of U and V is

g(u, v) = \frac{u^{r_1/2 - 1} e^{-u/2}\, v^{r_2/2 - 1} e^{-v/2}}{\Gamma(r_1/2)\,\Gamma(r_2/2)\,2^{(r_1 + r_2)/2}}, \qquad 0 < u < \infty, \quad 0 < v < \infty.

In this derivation we let W = F to avoid using f as a symbol for a variable. The distribution function F(w) = P(W ≤ w) of F is

F(w) = P\left( \frac{U/r_1}{V/r_2} \le w \right) = P\left( U \le \frac{r_1}{r_2}\, w V \right) = \int_0^{\infty} \int_0^{(r_1/r_2)wv} g(u, v)\, du\, dv.

The p.d.f. of F = W is the derivative of the distribution function; so, applying the Fundamental Theorem of Calculus to the inner integral, we have

f(w) = F'(w) = \int_0^{\infty} \frac{r_1 v}{r_2} \cdot \frac{[(r_1/r_2)wv]^{r_1/2 - 1} e^{-(r_1/r_2)wv/2}\, v^{r_2/2 - 1} e^{-v/2}}{\Gamma(r_1/2)\,\Gamma(r_2/2)\,2^{(r_1 + r_2)/2}}\, dv = \frac{(r_1/r_2)^{r_1/2}\, w^{r_1/2 - 1}}{\Gamma(r_1/2)\,\Gamma(r_2/2)\,2^{(r_1 + r_2)/2}} \int_0^{\infty} v^{(r_1 + r_2)/2 - 1} \exp\left[ -\frac{v}{2}\left(1 + \frac{r_1 w}{r_2}\right) \right] dv.

In the integral, make the change of variables

y = \left(1 + \frac{r_1 w}{r_2}\right) v \qquad \text{so that} \qquad \frac{dv}{dy} = \frac{1}{1 + (r_1/r_2)w}.

Thus we see that

f(w) = \frac{(r_1/r_2)^{r_1/2}\,\Gamma[(r_1 + r_2)/2]\, w^{r_1/2 - 1}}{\Gamma(r_1/2)\,\Gamma(r_2/2)\,[1 + (r_1 w/r_2)]^{(r_1 + r_2)/2}} \int_0^{\infty} \frac{y^{(r_1 + r_2)/2 - 1} e^{-y/2}}{\Gamma[(r_1 + r_2)/2]\,2^{(r_1 + r_2)/2}}\, dy.

The integral in this last expression for f(w) is equal to 1 because the integrand is like a p.d.f. of a chi-square distribution with r₁ + r₂ degrees of freedom. Thus the p.d.f. f(w) is as given in the theorem. □

Note that the F distribution depends on two parameters, r₁ and r₂, in that order. The first parameter is the number of degrees of freedom in the numerator, and the second is the number of degrees of freedom in the denominator. See Figure 4.7-3 for graphs of the p.d.f. of the F distribution for four pairs of degrees of freedom. It can be shown that

E(F) = \frac{r_2}{r_2 - 2} \qquad \text{and} \qquad Var(F) = \frac{2 r_2^2 (r_1 + r_2 - 2)}{r_1 (r_2 - 2)^2 (r_2 - 4)}.

To verify these two expressions, we note, using the independence of U and V in the definition of F, that

E(F) = \frac{r_2}{r_1}\, E(U)\, E\!\left(\frac{1}{V}\right) \qquad \text{and} \qquad E(F^2) = \frac{r_2^2}{r_1^2}\, E(U^2)\, E\!\left(\frac{1}{V^2}\right).

In Exercise 4.7-9 the student is asked to find E(U), E(1/V), E(U²), and E(1/V²).

Some values of the distribution function P(F ≤ f) of the F distribution are given in Table VII in the Appendix. For notational purposes, if F has an F distribution with r₁ and r₂ degrees of freedom, we say that the distribution of F is F(r₁, r₂). Furthermore, we will let F_α(r₁, r₂) denote the constant [the upper 100α percent point of F(r₁, r₂)] for which

P[F \ge F_\alpha(r_1, r_2)] = \alpha.

Example 4.7-5 If the distribution of F is F(r₁, r₂), then from Table VII in the Appendix we see that when r₁ = 7, r₂ = 8, P(F ≤ 3.50) = 0.95, so F₀.₀₅(7, 8) = 3.50; when r₁ = 9, r₂ = 4, P(F ≤ 14.66) = 0.99, so F₀.₀₁(9, 4) = 14.66.

To use Table VII to find the F values corresponding to the 0.01, 0.025, and 0.05 cumulative probabilities we need to note the following: Since F = (U/r₁)/(V/r₂), where U and V are independent and χ²(r₁) and χ²(r₂), respectively, then 1/F = (V/r₂)/(U/r₁) must have a distribution that is F(r₂, r₁). Note the change in the order of the parameters in the distribution of 1/F. Now if the distribution of F is F(r₁, r₂), then

P[F \ge F_\alpha(r_1, r_2)] = \alpha \qquad \text{and} \qquad P\left[ \frac{1}{F} \le \frac{1}{F_\alpha(r_1, r_2)} \right] = \alpha.

The complement of {1/F ≤ 1/F_α(r₁, r₂)} is {1/F > 1/F_α(r₁, r₂)}. Thus

P\left[ \frac{1}{F} > \frac{1}{F_\alpha(r_1, r_2)} \right] = 1 - \alpha.    (4.7-1)

Since the distribution of 1/F is F(r₂, r₁), by definition of F_{1−α}(r₂, r₁), we have

P\left[ \frac{1}{F} \ge F_{1-\alpha}(r_2, r_1) \right] = 1 - \alpha.    (4.7-2)

From equations (4.7-1) and (4.7-2) we see that

F_{1-\alpha}(r_2, r_1) = \frac{1}{F_\alpha(r_1, r_2)}.    (4.7-3)

The use of equation (4.7-3) is illustrated in the next example.

Example 4.7-6 If the distribution of F is F(4, 9), constants c and d such that P(F ≤ c) = 0.01 and P(F ≤ d) = 0.05 are given by

c = F_{0.99}(4, 9) = \frac{1}{F_{0.01}(9, 4)} = \frac{1}{14.66} = 0.0682, \qquad d = F_{0.95}(4, 9) = \frac{1}{F_{0.05}(9, 4)} = 0.1661.

Furthermore, if F is F(6, 9), then

P(F \le 0.2439) = P\left( \frac{1}{F} \ge \frac{1}{0.2439} \right) = P\left( \frac{1}{F} \ge 4.10 \right) = 0.05,

because the distribution of 1/F is F(9, 6).
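As a numerical aside (assuming access to Python with SciPy, which again stands in for Table VII; the calls are illustrative only), the table values of Examples 4.7-5 and 4.7-6 and the reciprocal relation (4.7-3) can be checked as follows.

```python
# A brief check of Examples 4.7-5 and 4.7-6 and of relation (4.7-3).
from scipy.stats import f

print(round(f.ppf(0.95, 7, 8), 2))     # F_{0.05}(7, 8), about 3.50
print(round(f.ppf(0.99, 9, 4), 2))     # F_{0.01}(9, 4), about 14.66

# Relation (4.7-3): F_{1-alpha}(r2, r1) = 1 / F_alpha(r1, r2)
c = f.ppf(0.01, 4, 9)                  # lower 1 percent point of F(4, 9)
print(round(c, 4), round(1.0 / f.ppf(0.99, 9, 4), 4))   # both about 0.0682
```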
Exercises

4.7-1 Let T have a t distribution with r degrees of freedom. Find
(a) P(T ≤ …),
(b) P(T ≤ …),
(c) P(|T| > 2.228) when r = 10,
(d) P(−1.753 ≤ T ≤ 2.602) when r = 15,
(e) P(1.330 ≤ T ≤ 2.552) when r = 18.

4.7-2 Let T have a t distribution with r = 19. Find c such that
(a) P(|T| ≥ c) = 0.05,
(b) P(|T| ≥ c) = 0.01,
(c) P(T ≥ c) = 0.025,
(d) P(|T| ≤ c) = 0.95.

4.7-3 Find
(a) t₀.₀₁(13),
(b) t₀.₀₁(15),
(c) t₀.₉₅(17),
(d) t₀.…(5).

4.7-4 Let T have a t distribution with r degrees of freedom. Show that E(T) = 0, r ≥ 2, and Var(T) = r/(r − 2), provided that r ≥ 3, by first finding E(Z), E(1/√U), E(Z²), and E(1/U).

4.7-5 Let T have a t distribution with r degrees of freedom. Show that T² has an F distribution with 1 and r degrees of freedom. HINT: Consider T² = Z²/(U/r).

4.7-6 Let F have an F distribution with r₁ and r₂ degrees of freedom. Find
(a) P(F ≤ …),
(b) P(F ≤ 4.14) when r₁ = 7, r₂ = 15,
(c) P(F < 0.1508) when r₁ = 8, r₂ = 5. HINT: 0.1508 = 1/6.63.
(d) P(0.1323 < F < 2.79) when r₁ = 6, r₂ = 15.

4.7-7 Let F have an F distribution with r₁ and r₂ degrees of freedom. Find numbers a and b such that
(a) P(a ≤ F ≤ b) = 0.90 when r₁ = 8, r₂ = 6,
(b) P(a ≤ F ≤ b) = … .

4.7-8 Find
(a) F₀.₀₅(5, 9),
(b) F₀.…(9, 7),
(c) F₀.…(8, 5),
(d) F₀.…(5, 7).

4.7-9 Find the mean and the variance of an F random variable with r₁ and r₂ degrees of freedom by first finding E(U), E(1/V), E(U²), and E(1/V²) as suggested in the text.

4.7-10 Let X₁ and X₂ have independent gamma distributions with parameters α, θ and β, θ, respectively. Let W = X₁/(X₁ + X₂). Use a method similar to that given in the proofs of Theorems 4.7-1 and 4.7-2 to show that the p.d.f. of W is

g(w) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, w^{\alpha - 1}(1 - w)^{\beta - 1}, \qquad 0 < w < 1.

We say that W has a beta distribution with parameters α and β (see Example 4.8-3).

4.7-11 Let X have a beta distribution with parameters α and β. Show that the mean and variance of X are

\mu = \frac{\alpha}{\alpha + \beta} \qquad \text{and} \qquad \sigma^2 = \frac{\alpha\beta}{(\alpha + \beta + 1)(\alpha + \beta)^2}.

HINT: In evaluating E(X) and E(X²), compare the integrands to the p.d.f.'s of beta distributions with parameters α + 1, β and α + 2, β, respectively.

4.7-12 Let Z₁, Z₂, Z₃ be a random sample of size 3 from a standard normal distribution N(0, 1).
(a) How is U distributed if

U = \frac{Z_1}{\sqrt{(Z_2^2 + Z_3^2)/2}}\,?

(b) Let V = Z₁/Z₂. Show that V has a Cauchy distribution. HINT: Use a method similar to the proofs of Theorems 4.7-1 and 4.7-2. Note the quadrants in which V > 0, V < 0, Z₂ > 0, and Z₂ < 0.
(c) Let

W = \frac{Z_1}{\sqrt{(Z_1^2 + Z_2^2)/2}}.

Show that the distribution function of W is

F(w) = \begin{cases} 0, & w \le -\sqrt{2}, \\ \dfrac{1}{\pi}\,\mathrm{Arctan}\sqrt{\dfrac{2 - w^2}{w^2}}, & -\sqrt{2} < w < 0, \\ \dfrac{1}{2}, & w = 0, \\ 1 - \dfrac{1}{\pi}\,\mathrm{Arctan}\sqrt{\dfrac{2 - w^2}{w^2}}, & 0 < w < \sqrt{2}, \\ 1, & \sqrt{2} \le w. \end{cases}

HINT: What relationship is there between parts (b) and (c)?
(d) Show that the p.d.f. of W is

f(w) = \frac{1}{\pi\sqrt{2 - w^2}}, \qquad -\sqrt{2} < w < \sqrt{2}.

Note that this is a U-shaped distribution. Why does it differ so much from that in part (a) when the definitions for U and W are so similar?
(e) Show that the distribution function of W, for −√2 < w < √2, can also be defined by

F(w) = \frac{1}{2} + \frac{1}{\pi}\,\mathrm{Arcsin}\left(\frac{w}{\sqrt{2}}\right) = \frac{1}{2} + \frac{1}{\pi}\,\mathrm{Arctan}\left(\frac{w}{\sqrt{2 - w^2}}\right).

(f) Find the means and variances (if they exist) of U, V, and W.

*4.8 Transformations of Random Variables

In this section we consider another important method of constructing models.
As a matter of fact, to go very far in theory or application of statistics, one must know something about transformations of random variables. We saw how important this was in dealing with the normal distribution when we noted that if X is N(μ, σ²), then Z = u(X) = (X − μ)/σ is N(0, 1); this simple transformation allows us to use one table for probabilities associated with all normal distributions. In the proof of this, we found that the distribution function of Z was given by the integral

G(z) = P\left( \frac{X - \mu}{\sigma} \le z \right) = P(X \le z\sigma + \mu) = \int_{-\infty}^{z\sigma + \mu} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right] dx.

In Section 3.4 we changed variables, x = wσ + μ, in the integral to determine the p.d.f. of Z; but let us now simply differentiate the integral with respect to z. If we recall from calculus that

\frac{d}{dz}\left[ \int_{c}^{v(z)} f(t)\, dt \right] = f[v(z)]\, v'(z),

then, since certain assumptions are satisfied in our case, we have

g(z) = G'(z) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ -\frac{(z\sigma + \mu - \mu)^2}{2\sigma^2} \right] \frac{d(z\sigma + \mu)}{dz} = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{z^2}{2} \right).

That is, Z has the p.d.f. of a standard normal distribution.

Motivated by the preceding argument we note that, in general, if X is a continuous-type random variable with p.d.f. f(x) on support a < x < b and if Y = u(X) and its inverse X = v(Y) are increasing continuous functions, then the p.d.f. of Y is

g(y) = f[v(y)]\, v'(y)

on the support given by a < v(y) < b or, equivalently, u(a) < y < u(b). Moreover, if u and v are decreasing functions, then the p.d.f. is

g(y) = f[v(y)]\,[-v'(y)], \qquad u(b) < y < u(a).

Hence, to cover both cases, we can simply write

g(y) = |v'(y)|\, f[v(y)], \qquad c < y < d,

where the support c < y < d corresponds to a < x < b through the transformation x = v(y).

Example 4.8-1 Let the positive random variable X have the p.d.f. f(x) = e^{−x}, 0 < x < ∞, which is skewed to the right. To find a distribution that is more symmetric than that of X, statisticians frequently use the square root transformation, namely Y = √X. Here y = √x corresponds to x = y², which has derivative 2y. Thus the p.d.f. of Y is

g(y) = 2y\, e^{-y^2}, \qquad 0 < y < \infty,

which is of the Weibull type (see Section 3.5). The graphs of f(x) and g(y) should convince the reader that the latter is more symmetric than the former (see Figure 4.8-1).

Figure 4.8-1

Example 4.8-2 Let X be binomial with parameters n and p. Since X has a discrete distribution, Y = u(X) will also have a discrete distribution with the same probabilities as those in the support of X. For illustration, with n = 3, p = 1/4, and Y = X², we have

P(Y = x^2) = \binom{3}{x}\left(\frac{1}{4}\right)^{x}\left(\frac{3}{4}\right)^{3 - x}, \qquad x^2 = 0, 1, 4, 9.

For a more interesting problem with the binomial random variable X, suppose we were to search for a transformation u(X/n) of the relative frequency X/n that would have a variance very little dependent on p itself when n is large. That is, we want the variance of u(X/n) to be essentially a constant. Consider the function u(X/n) and find, using two terms of Taylor's expansion about p, that

u\left(\frac{X}{n}\right) \approx u(p) + u'(p)\left(\frac{X}{n} - p\right).

Here terms of higher powers can be disregarded if n is large enough so that X/n is close enough to p. Thus

Var\left[ u\left(\frac{X}{n}\right) \right] \approx [u'(p)]^2\, Var\left(\frac{X}{n} - p\right) = [u'(p)]^2\, \frac{p(1 - p)}{n}.    (4.8-1)

However, if Var[u(X/n)] is to be constant with respect to p, then

[u'(p)]^2\, p(1 - p) = k \qquad \text{or} \qquad u'(p) = \frac{c}{\sqrt{p(1 - p)}},

where k and c are constants. We see that u(p) = 2c Arcsin √p is a solution to this differential equation. Thus, with c = 1/2, we frequently see, in the literature, use of the arcsine transformation, namely

Y = \mathrm{Arcsin}\sqrt{\frac{X}{n}},

which, with large n, has an approximate normal distribution with mean μ = Arcsin √p and variance [using formula (4.8-1)]

Var\left( \mathrm{Arcsin}\sqrt{\frac{X}{n}} \right) \approx \left[ \frac{1}{2\sqrt{p(1 - p)}} \right]^2 \frac{p(1 - p)}{n} = \frac{1}{4n}.
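A quick simulation (assuming NumPy; the choices of n, the number of replications, the p values, and the seed are ours, for illustration only) suggests how well this variance-stabilizing property holds: the sample variance of Arcsin √(X/n) stays close to 1/(4n) across quite different values of p.

```python
# Simulation sketch of the arcsine variance-stabilizing transformation.
import numpy as np

rng = np.random.default_rng(seed=2)
n, reps = 400, 100_000
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    x = rng.binomial(n, p, size=reps)
    y = np.arcsin(np.sqrt(x / n))
    print(p, round(y.var(ddof=1), 6), round(1 / (4 * n), 6))
    # the two printed variances should nearly agree for each p
```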
Thus with c = 1/2, we frequently see, in the literature, use of the arcsine transformation, namely [x Y= Arcsin /-, V n which, with large n, has an approximate normal distribution with mean ^ = Arcsin ^ / p 278 Transformations of Random Variables and variance [using formula (4.8-1)] ^r^Arcsm^''0^-^. There should be one note of warning here: If the function Y = u(X) does not have a single-valued inverse, the determination of the distribution of Y will not be as simple. We did consider one such example in Section 3.5 by finding the distribution of Z2, where Z is N(0, 1). In this case, there were " two inverse functions" and special care was exercised. In our examples, we will not consider problems with "many inverses"; however, we thought that such a warning should be issued here. When two or more random variables are involved, many interesting problems can result. In the case of a single-valued inverse, the rule is about the same as that in the one-variable case, with the derivative being replaced by the Jacobian. Namely, if X^ and X^ are two continuous-type random variables with joint p.d.f. /(xi, x;) and if V, = Ui(A'i, X^), Y, = u^fXi, X,) has the single-valued inverse X^ = i>i(S'i, Y^), X^ = v^Y^, Y;), then the joint p.d.f. of V, and y; is aCri, f i ) = l^l/["i(3'i. Vi), vi(yi. yi)~\, where the Jacobian J is the determinant 8x^ 8x^ SYt 8x^ Syi 8x^ By, By, Of course, we find the support of V,, V; by considering the mapping of the support of X ^ , X^ under the transformation y^ = Mi(xi, jc;), y^ = u^tii, x^). Example 4.8-3 Let X ^ and X ^ have independent gamma distributions with parameters «, 9 and ft, 9, respectively. That is, the joint p.d.f. of X i and X j is /(^i, x,) = r(«)r(/i)9"-^ x\ -x; exp 0 < x, < oo, 0 < x, < oo. Consider A, + X, Y, = X , + X , 279 Multivariate Distributions or, equivalcntly, x, = y, r,, A-; = r; - Yi r;. The Jacobian is ^2 -:>'2 >'! 1 - Yl =.»'2(i - y i ) + y i f i = y i - Thus the joint p.d.f. g(yi, y^) of Yi and V; is 1 9 ( y t , y i ) = \ y i \•TW^-^(yiyiT'^yi-y^iY'''-'12", where the support is 0 < y^ < 1, 0 < y^ < GO. The marginal p.d.f. of Yi is ^--^^[^e-^ But the integral in this expression is that of a gamma p.d.f. with parameters, « equals r(a 4- P) and ; ^-r^^1'1-^'- o<yl<l - I We say that V, has a beta p.d.f. with parameters « Example 4.8-4 (Box-Muller Transformation) Let the distribution of X be N{fi, v1). It is not easy to simulate an observation of X using Y = F(X), where Y is uniform U(0, 1) and F is the distribution function of the desired normal distribution, because we cannot express the normal distribution function F{x) in closed form. Consider the following transformation, however, where X ^ and X^ are observations of a random sample from U(0, I): Zi = v ^ l n X i cos (2nX,), Z, = ^/-2 In X , sin (2-iiX,) or, equivalently, zf + z'\ X , = exp — ^Arctan(| 280 Transformations of Random Variables Figure 4.8-2 which has Jacobian ] = | 2)i(z; + z|) 2ir(z; + z|) In Since the joint p.d.f. of X^ and X^ is /(xi, x;) = 1, 0 < x, < 1, 0 < x, < 1, we have that the joint p.d.f. of Z^ and Z^ is ff(zi, 22) = 2>t (1) — (The student should note that there is some difficulty with the definition of the transformation, particularly when Zi == 0. However, these difficulties occur at events with probability zero and hence cause no problems; see Exercise 4.8-12.) To summarize, from two independent U(0, 1) random variables, we have generated two independent N(0, 1) random variables through this Box-MuUer transformation. 
The techniques described for two random variables can be extended to three or more random variables. We do not give any details here but mention, for 281 Multivariate Distributions illustration, that with three random variables X ^ , X ^ , X^ of the continuous type, we need three "new" random variables Zi, Z ^ , Z^ so that the corresponding Jacobian of the single-valued inverse transformation is the nonzero determinant S^_ Bz, Sx,_ 3zi ^ az, 5Z2 OZ3 5x; (?x; az., oz, ax, ax, Exercises 4.8-1 Let the p.d.f. of X be defined by/(x) = x'/4, 0 < x < 2. Find the p.d.f. of V - X'. 4.8-2 Let the p.d.f. of X be defined /(x) = 1/n, -n/2 < x < n/2. Find the p.d.f. of Y = tan X . We say that Y has a Cawhy distribution. 4.8-3 Let the p.d.f. of X be defined by /(x) = (3/2)x2, - 1 < x < 1. Find the p.d.f. of Y ={X3 + 1)/2. 4.8-4 If V has a uniform distribution on the interval (0, 1), find the p.d.f. of J f = ( 2 y - l)'^. 4.8-5 Let X i , X^ denote a random sample from a distribution ^^l. Find the joint p.d.f. of YI = X ^ and Y; = X^ + X i . Here note that the support of Y^, Y^ is 0 < y, < y-i < oo. Also find the marginal p.d.f. of each of Yi and Y^. Are Vi and V; independent? 4.8-6 Let Zi, Z; be a random sample from the standard normal distribution N(0, 1). Use the transformation defined by polar coordinates Z , = X , c o s X ^ , Z; = XtSinX,. (a) Show that the Jacobian equals Xi. (This explains the factor r of r dr d6 in the usual polar coordinate notation.) (b) Find the joint p.d.f. ofX, and X;. (c) Are X, and X, independent? 4.8-7 Let the independent random variables X ^ and X^ be M(0, 1) and ^(r), respectively. Let YI = X J ^ X ^ r and Y, = X;. (a) Find the joint p.d.f. of y, and Y,. (b) Determine the marginal p.d.f. of Y^ and show that Yi has a Students t distribution. 4.8-8 Let A", and X; be independent chi-square random variables with i-i and r^ degrees of freedom, respectively. Let r, = (X,/r,)/(X,/r,)and y.,=X,. (a) Find the joint p.d.f. of V, and V,. (b) Determine the marginal p.d.f. of Yi and show that V, has an F distribution. 282 4.8 Transformations of Random Variables 4.8-9 Let X have a Poisson distribution with mean A. Find a transformation u(X) so that Var [u(X)] is about free of /., for large values of /.. HINT: u[X) % u(/l) + [u't/'W - /l», provided KO)]tY - /l)2/^ and higher terms can be neglected when ). is large. A solution is u(X) = ^ / X , and the latter restriction should be checked. 4.8-10 Generalize Exercise 4.8-9 by assuming that the variance of a distribution is equal to Cfi", where c is a constant (note in Exercise 4.8-9 that this is the case with p =- 1). In particular, the transformation Y = u(X} = X1'^2, when p / 2, or V = u(X) = In X, when p = 2, seems to produce a random variable Y whose variance is about free of p.. This is the reason transformations like ^ f x , In X , and more generally X1' are so popular in applications. 4.8-11 Let Z^ and Z^ be independent standard normal random variables N(0, 1). Show that Vi = Zi/Z; has a Cauchy distribution. 4.8-12 In Example 4.8-4 verify that the given transformation maps {(x^, x^}:0 < Xi < 1, 0 < x^ < 1} onto {^z^, 2;): — o o < Z i < o o , —oo<z;<oo} except for a set of points that has probability 0. HINT: What is the image of vertical line segments? What is the image of horizontal line segments? 4.8-13 Let A"i and X^ be independent chi-square random variables with r, and r; degrees of freedom, respectively. 
Show that (a) U = X J { X ^ + X t ) has a beta distribution with a = r,/2 and j8=r;,/2; (b) V = JC;/(^i + X ^ ) has a beta distribution with a = 1-2/2 and /? = r^/2. 4.8-14 (a) Let X have a beta distribution with parameters a and fi (see Example 4-8-3). Show that the mean and variance of X are ^ fi = — — a+P and (a + fi +l)(a + ^»2 " (b) Show that the beta p.d.f. has a maximum (mode) at x = (a — 4.8-15 Determine the constant c such that/(x) = cx\l - x)6,0 < x < 1, is a p.d.f. 4.8-16 When a and ^ are integers and 0 < p < 1, then f'n^i) , J. HaTO ' {l ^,-, yr uy ..M ,1-. \y) where « 4.8-17 Evaluate r. r(7) ^ y'(l - yf dy Jo r(4)r(3) (a) using integration, (b) using the result of Exercise 4.8-16. 283