Chapter 6: Multivariate Probability Distributions
6.1 Bivariate and Marginal Probability Distributions
Chapters 4 and 5 dealt with experiments that produced a single numerical response, or
random variable, of interest. We discussed, for example, the life length X of a battery or
the strength Y of a steel casing. Often, however, we want to study the joint behavior of
two random variables, such as the joint behavior of life length and casing strength for
these batteries. Perhaps in such a study we can identify a region in which some
combination of life length and casing strength is optimal in balancing the cost of
manufacturing with customer satisfaction. To proceed with such a study, we must know
how to handle a joint probability distribution. When only two random variables are
involved, joint distributions are called bivariate distributions. We will discuss the
bivariate case in some detail; extensions to more than two variables follow along similar
lines.
Other situations in which bivariate probability distributions are important come to
mind easily. A physician studies the joint behavior of pulse and exercise. An educator
studies the joint behavior of grades and time devoted to study, or the interrelationship of
pretest and posttest scores. An economist studies the joint behavior of business volume
and profits. In fact, most real problems we come across have more than one underlying
random variable of interest. On April 15, 1912, the ocean liner Titanic collided with an
iceberg and sank. Of the 2201 people on the Titanic, 1490 perished. The question as to
whether economic status was related to survival of this disaster has been discussed
extensively. In Table 6.1, the numbers of survivors and fatalities for four economic
classes from this incident are recorded: first-class passengers, second-class passengers,
third-class passengers, and the crew.
Table 6.1. Survival and Economic Class of Passengers on the Titanic.

Economic Status   Survivors   Fatalities   Total
First-class            203          122      325
Second-class           118          167      285
Third-class            178          528      706
Crew                   212          673      885
Total                  711         1490     2201
As noted in earlier chapters, it is more convenient to analyze such data in terms of
meaningful random variables rather than in terms of described categories. For each
passenger, we want to know whether or not s/he survived and what economic class s/he
was in. An X1 defined as
⎧0,
if passenger survived
⎪
X1 = ⎨
if passenger did not survive
⎪⎩1,
will keep track the number of fatalities. The other variable of interest is economic status.
An X2 defined as

X2 = 1, if the passenger was in first class
     2, if the passenger was in second class
     3, if the passenger was in third class
     4, if the passenger was a crew member

provides a random variable that keeps track of each passenger's economic class.
The frequencies from Table 6.1 are turned into the relative frequencies of Table 6.2 to produce the joint probability distribution of X1 and X2.

Table 6.2. Joint Probability Function of Survival and Economic Class of Passengers on the Titanic.

                X1
X2           0      1    Total
1          0.09   0.06   0.15
2          0.05   0.08   0.13
3          0.08   0.24   0.32
4          0.10   0.30   0.40
Total      0.32   0.68   1.00
In general, we write

P(X1 = x1, X2 = x2) = p(x1, x2)

and call p(x1, x2) the joint probability function of (X1, X2). From Table 6.2, we can see that

P(X1 = 0, X2 = 4) = p(0, 4) = 0.10

represents the approximate probability that a randomly selected passenger is a crew member who survived. The properties of the probability function for jointly distributed random variables are similar to those for univariate probability functions, as summarized in Definition 6.1.
Similarly, a distribution function is associated with every probability function. The extension from one to two or more dimensions is an intuitive one. For two dimensions, the distribution function is defined as

F(x1, x2) = P(X1 ≤ x1, X2 ≤ x2),   (x1, x2) ∈ ℝ²

As with the univariate case, every distribution function must satisfy four conditions:

1. lim_{x1 → −∞} lim_{x2 → −∞} F(x1, x2) = 0
2. lim_{x1 → ∞} lim_{x2 → ∞} F(x1, x2) = 1
3. The distribution function is a non-decreasing function; that is, for each a < b and c < d,
   F(b, d) − F(a, d) − F(b, c) + F(a, c) ≥ 0.
4. For fixed x1 or x2, the distribution function is right-hand continuous in the remaining variable. For example, for x1 fixed, lim_{h → 0+} F(x1, x2 + h) = F(x1, x2).

It is important to check each condition, as the following example illustrates.
Definition 6.1. Let X1 and X2 be discrete random variables. The joint probability distribution of X1 and X2 is given by

p(x1, x2) = P(X1 = x1, X2 = x2)

defined for all real numbers x1 and x2. The function p(x1, x2) is called the joint probability function of X1 and X2. All joint probability functions must satisfy two properties:

1. p(x1, x2) ≥ 0, for all x1, x2 ∈ ℝ
2. Σ_{x1} Σ_{x2} p(x1, x2) = 1

Further, any function that satisfies the two properties above is a probability function.
Example 6.1:
Let

F(x, y) = 0, if x + y < 0
          1, if x + y ≥ 0

Is F a distribution function? Justify your answer.
Solution:
Here we use X and Y instead of X1 and X2. These symbols serve as names and do not affect how we proceed.

The first condition is satisfied because lim_{x → −∞} F(x, y) = lim_{y → −∞} F(x, y) = 0. Similarly, lim_{x → ∞} F(x, y) = lim_{y → ∞} F(x, y) = 1, so the second condition is satisfied.

The fourth condition is also satisfied. This is verified by focusing on the points of discontinuity of the function, which occur along the line x + y = 0. Because the function is right-hand continuous at every point on this line, it is right-hand continuous.

The third condition, however, is not satisfied. To see this, notice that

P(−1 < X ≤ 2, −1 < Y ≤ 2) = F(2, 2) − F(−1, 2) − F(2, −1) + F(−1, −1)
                           = 1 − 1 − 1 + 0 = −1

Therefore, this function is not a distribution function.
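A quick numerical check of the failed rectangle condition makes the argument concrete. A minimal Python sketch (illustrative only):

    # Rectangle condition check for Example 6.1.
    def F(x, y):
        # The candidate "distribution function" of Example 6.1.
        return 1.0 if x + y >= 0 else 0.0

    a, b, c, d = -1, 2, -1, 2   # rectangle (a, b] x (c, d]
    print(F(b, d) - F(a, d) - F(b, c) + F(a, c))  # -1.0, a negative "probability"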
Definition 6.2. The joint distribution function F(a, b) for a bivariate random variable (X1, X2) is

F(a, b) = P(X1 ≤ a, X2 ≤ b)

If X1 and X2 are discrete,

F(a, b) = Σ_{x1 ≤ a} Σ_{x2 ≤ b} p(x1, x2)

where p(x1, x2) is the joint probability function.

The distribution function is often called the cumulative distribution function (c.d.f.).
Now we will return to the data for the Titanic. The probability that a randomly selected
passenger will be a crew member is
P(X2 = 4) = P(X1 = 0, X2 = 4) + P(X1 = 1, X2 = 4) = 0.10 + 0.30 = 0.40

which is one of the marginal probabilities for X2. The univariate distribution along the right margin is the marginal distribution for X2, and the one along the bottom row is the marginal distribution for X1. The conditional probability distribution of X1, given X2, fixes a value of X2 (a row of Table 6.2) and looks at the relative frequencies for values of X1 in that row. For example, conditioning on X2 = 1 produces

P(X1 = 0 | X2 = 1) = P(X1 = 0, X2 = 1) / P(X2 = 1) = 0.09 / 0.15 = 0.60

and

P(X1 = 1 | X2 = 1) = 0.40
These two values show how survivorship behaved for passengers in first class.
Notice that the conditional probabilities for third-class passengers were
P( X 1 = 0 | X 2 = 3) = 0.25 and P( X 1 = 1 | X 2 = 3) = 0.75 . Did the probability of survival
change with class? What were the chances of survival for the second-class passengers
and the crew?
Another set of conditional distributions here consists of those for X2 conditioning
on X1. In this case,
P(X2 = 1 | X1 = 0) = P(X1 = 0, X2 = 1) / P(X1 = 0) = 0.09 / 0.32 = 0.28

and, similarly,

P(X2 = 2 | X1 = 0) = 0.05 / 0.32 = 0.16

P(X2 = 3 | X1 = 0) = 0.08 / 0.32 = 0.25

P(X2 = 4 | X1 = 0) = 0.10 / 0.32 = 0.31
Notice that the probability that a randomly selected survivor is a crew member is 0.31.
Does that mean that they had the highest rate of survival? Not necessarily. There were
more crew members than any other economic class. If the survival was the same for all
classes, then we would expect more of the survivors to be crew members. Was the
probability of a crew member surviving larger than that for other economic classes?
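These row and column operations are easy to mechanize. The following Python sketch (values taken from Table 6.2, so results are approximate to the table's rounding) reproduces the marginal and conditional probabilities computed above:

    # Marginals and conditionals from the joint table (Table 6.2).
    joint = {  # keys (x1, x2): x1 = 0 survived, 1 died; x2 = economic class 1-4
        (0, 1): 0.09, (1, 1): 0.06, (0, 2): 0.05, (1, 2): 0.08,
        (0, 3): 0.08, (1, 3): 0.24, (0, 4): 0.10, (1, 4): 0.30,
    }

    def marginal_x2(x2):
        return sum(p for (v1, v2), p in joint.items() if v2 == x2)

    def cond_x1_given_x2(x1, x2):
        return joint[(x1, x2)] / marginal_x2(x2)

    print(round(marginal_x2(4), 2))          # 0.40, the crew marginal
    print(round(cond_x1_given_x2(0, 1), 2))  # 0.60, P(survived | first class)
    print(round(cond_x1_given_x2(0, 3), 2))  # 0.25, P(survived | third class)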
We first worked with the data from the Titanic in Exercises 3.38 and 3.54. There
we defined events and found the probabilities, odds, and odds ratios for particular
outcomes. Here we have used random variables, but notice that the basic processes are
the same. Keep in mind throughout this chapter that what we will be doing here is not so
much new as a different approach to the analysis.
Definition 6.3. The marginal probability functions of X1 and X2, respectively, are given by

p(x1) = Σ_{x2} p(x1, x2)   and   p(x2) = Σ_{x1} p(x1, x2)

The conditional distribution of X1 given X2 = x2 is

P(X1 = x1 | X2 = x2) = P(X1 = x1, X2 = x2) / P(X2 = x2),   provided P(X2 = x2) > 0

which is defined for all real values of x1. Similarly, for all real values of x2, the conditional distribution of X2 given X1 = x1 is

P(X2 = x2 | X1 = x1) = P(X1 = x1, X2 = x2) / P(X1 = x1),   provided P(X1 = x1) > 0

Further, X1 and X2 are independent if the marginal and conditional distributions are equal.
Example 6.2:
Three checkout counters are in operation at a local supermarket. Two customers arrive at
the counters at different times, when the counters are serving no other customers. It is
assumed that the customers then choose a checkout station at random and independently of one another. Let X1 denote the number of times counter A is selected, and let X2 denote the number of times counter B is selected by the two customers. Find the joint
probability distribution of X1 and X2. Find the probability that one of the customers visits
counter B, given that one of the customers is known to have visited counter A.
Solution:
For convenience, let us introduce X3—the number of customers who visit counter C.
Now the event (X1 = 0, X2 = 0) is equivalent to the event (X1 = 0, X2 = 0, X3 = 2). It
follows that
P(X1 = 0, X2 = 0) = P(X1 = 0, X2 = 0, X3 = 2)
= P(customer I selects counter C and customer II selects counter C)
= P(customer I selects counter C) × P(customer II selects counter C)
= (1/3) × (1/3) = 1/9

because customers' choices are independent and because each customer makes a random selection from among the three available counters.

It is slightly more complicated to calculate P(X1 = 1, X2 = 0) = P(X1 = 1, X2 = 0, X3 = 1). In this event, customer I could select counter A and customer II could select counter C, or I could select C and II could select A. Thus,

P(X1 = 1, X2 = 0) = P(I selects A)P(II selects C) + P(I selects C)P(II selects A)
= (1/3 × 1/3) + (1/3 × 1/3) = 2/9
Similar arguments allow one to derive the results given in Table 6.3.
Table 6.3. Probability Distribution Associated with Example 6.2

                      X1
X2            0     1     2    Marginal for X2
0            1/9   2/9   1/9        4/9
1            2/9   2/9    0         4/9
2            1/9    0     0         1/9
Marginal
for X1       4/9   4/9   1/9         1
The statement asks for

P(X2 = 1 | X1 = 1) = P(X1 = 1, X2 = 1) / P(X1 = 1) = (2/9) / (4/9) = 1/2
Does this answer (1/2) agree with your intuition?
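The probabilities in Table 6.3 can also be obtained by brute-force enumeration of the nine equally likely pairs of choices; a short Python sketch:

    # Enumerate the two customers' independent counter choices (Example 6.2).
    from itertools import product
    from collections import Counter
    from fractions import Fraction

    counts = Counter()
    for c1, c2 in product("ABC", repeat=2):   # 9 equally likely outcomes
        x1 = (c1 == "A") + (c2 == "A")        # visits to counter A
        x2 = (c1 == "B") + (c2 == "B")        # visits to counter B
        counts[(x1, x2)] += Fraction(1, 9)

    p_x1_is_1 = sum(p for (v1, v2), p in counts.items() if v1 == 1)
    print(counts[(1, 1)], counts[(1, 1)] / p_x1_is_1)  # 2/9 and 1/2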
Before we move to the continuous case, let us quickly review the situation in one
dimension. If f (x ) denotes the probability density function of a random variable X, then
f (x ) represents a relative frequency curve, and probabilities, such as P ( a ≤ X ≤ b) , are
represented as areas under this curve. In symbolic terms,
P(a ≤ X ≤ b) = ∫_a^b f(x) dx
Now, suppose that we are interested in the joint behavior of two continuous
random variables—say, X1 and X2—where X1 and X2 might represent the amounts of two
different hydrocarbons found in an air sample taken for a pollution study, for example.
The relative frequency of these two random variables can be modeled by a bivariate
function, f ( x1 , x2 ) , which forms a probability, or relative frequency, surface in three
dimensions. Figure 6.1 shows such a surface. The probability that X1 lies in one interval
and that X2 lies in another interval is then represented as a volume under this surface.
Thus,

P(a1 ≤ X1 ≤ a2, b1 ≤ X2 ≤ b2) = ∫_{b1}^{b2} ∫_{a1}^{a2} f(x1, x2) dx1 dx2

Notice that this integral simply gives the volume under the surface and over the shaded region in Figure 6.1. We trace the actual computations involved in such a bivariate problem with a very simple example.
Figure 6.1. A bivariate density function
Example 6.3:
A certain process for producing an industrial chemical yields a product that contains two
main types of impurities. For a certain volume of sample from this process, let X1 denote
the proportion of total impurities in the sample, and let X2 denote the proportion of type 1
impurity among all impurities found. Suppose that, after investigation of many such
samples, the joint distribution of X1 and X2 can be adequately modeled by the following
function:
f(x1, x2) = 2(1 − x1), for 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1
            0,         elsewhere
This function graphs as the surface given in Figure 6.2. Calculate the probability that X1
is less than 0.5 and that X2 is between 0.4 and 0.7.
Figure 6.2. Probability density function for Example 6.3.
Solution:
From the preceding discussion, we see that

P(0 ≤ X1 ≤ 0.5, 0.4 ≤ X2 ≤ 0.7) = ∫_{0.4}^{0.7} ∫_0^{0.5} 2(1 − x1) dx1 dx2
= ∫_{0.4}^{0.7} [−(1 − x1)²]_0^{0.5} dx2
= ∫_{0.4}^{0.7} 0.75 dx2
= 0.75 x2 ]_{0.4}^{0.7}
= (0.75)(0.3)
= 0.225
Thus, the fraction of such samples having less than 50% impurities and between 40% and
70% type 1 impurities is 0.225.
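The double integral can be confirmed by simulation. A Monte Carlo sketch (the inverse-transform step relies on the density factoring into a function of x1 times a function of x2):

    # Monte Carlo check of P(X1 <= 0.5, 0.4 <= X2 <= 0.7) for Example 6.3.
    import random

    n, hits = 200_000, 0
    for _ in range(n):
        u = random.random()
        x1 = 1 - (1 - u) ** 0.5    # inverse transform for density 2(1 - x1)
        x2 = random.random()       # X2 is uniform on (0, 1)
        if x1 <= 0.5 and 0.4 <= x2 <= 0.7:
            hits += 1
    print(hits / n)                # should be close to 0.225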
The distribution function continues to be

F(x1, x2) = P(X1 ≤ x1, X2 ≤ x2),   (x1, x2) ∈ ℝ²
However, to find the probabilities, we integrate, instead of sum, over the appropriate
regions. The four conditions of a distribution function are the same for both discrete and
continuous random variables.
Just as the univariate (or marginal) probabilities were computed by summing over
rows or columns in the discrete case, the univariate density function for X1 in the
continuous case can be found by integrating (“summing”) over values of X2. Thus, the
marginal density function of X1, f1(x1), is given by

f1(x1) = ∫_{−∞}^{∞} f(x1, x2) dx2

Similarly, the marginal density function of X2, f2(x2), is given by

f2(x2) = ∫_{−∞}^{∞} f(x1, x2) dx1
Example 6.4:
For the case given in Example 6.3, find the marginal probability density functions for X1
and X2.
Solution:
Let us first try to visualize what the answers should look like, before going through the
integration. To find f1 ( x1 ) , we accumulate all the probabilities in the x2 direction. Look
at Figure 6.2 and think of collapsing the wedge-shaped figure back onto the
[x1 , f ( x1 , x2 )] plane. Then more probability mass will build up toward the zero point of
the x1-axis than toward the unity point. In other words, the function f1 ( x1 ) should be
high at 0 and low at 1. Formally,
f1(x1) = ∫_{−∞}^{∞} f(x1, x2) dx2
= ∫_0^1 2(1 − x1) dx2
= 2(1 − x1) x2 ]_0^1
= 2(1 − x1), 0 ≤ x1 ≤ 1
  0,         elsewhere
The function graphs as in Figure 6.3. Notice that our conjecture is correct.
Thinking of how f 2 ( x2 ) should look geometrically, imagine that the wedge of Figure 6.2
has been forced back onto the [x2 , f ( x1 , x2 )] plane. Then the probability mass should
accumulate equally all along the (0, 1) interval on the x2-axis. Mathematically,
f2(x2) = ∫_{−∞}^{∞} f(x1, x2) dx1
= ∫_0^1 2(1 − x1) dx1
= [−(1 − x1)²]_0^1
= 1, 0 ≤ x2 ≤ 1
  0, elsewhere
and again our conjecture is verified, as shown in Figure 6.4.
Figure 6.3. Probability density function f1(x1) for Example 6.3
Figure 6.4. Probability density function f2(x2) for Example 6.3.
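Both integrations can also be checked symbolically. A minimal sketch using sympy (assumed to be available):

    # Symbolic marginals for Example 6.4.
    import sympy as sp

    x1, x2 = sp.symbols("x1 x2")
    f = 2 * (1 - x1)                   # joint density on the unit square

    f1 = sp.integrate(f, (x2, 0, 1))   # marginal of X1
    f2 = sp.integrate(f, (x1, 0, 1))   # marginal of X2
    print(f1, f2)                      # 2(1 - x1) and 1, matching the derivation above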
Definition 6.4. Let X1 and X2 be continuous random variables. The joint probability density function of X1 and X2, if it exists, is given by a nonnegative function f(x1, x2) such that

P(a1 ≤ X1 ≤ a2, b1 ≤ X2 ≤ b2) = ∫_{b1}^{b2} ∫_{a1}^{a2} f(x1, x2) dx1 dx2

The cumulative distribution function is

F(a, b) = P(X1 ≤ a, X2 ≤ b) = ∫_{−∞}^{b} ∫_{−∞}^{a} f(x1, x2) dx1 dx2

The marginal probability density functions of X1 and X2, respectively, are given by

f1(x1) = ∫_{−∞}^{∞} f(x1, x2) dx2   and   f2(x2) = ∫_{−∞}^{∞} f(x1, x2) dx1

Following is another (and somewhat more complicated) example.
Example 6.5:
Gasoline is to be stocked in a bulk tank once each week and then sold to customers. Let
X1 denote the proportion of the tank that is stocked in a particular week, and let X2 denote
the proportion of the tank that is sold in the same week. Due to limited supplies, X1 is not
fixed in advance but varies from week to week. Suppose that a study of many weeks
shows the joint relative frequency behavior of X1 and X2 to be such that the following
density function provides an adequate model:
f(x1, x2) = 3x1, 0 ≤ x2 ≤ x1 ≤ 1
            0,   elsewhere

Notice that X2 must always be less than or equal to X1. This density function is graphed in Figure 6.5. Find the probability that the amount sold is between 20% and 40% of the tank for a given week.
Solution:
Because the question refers to the proportion of the tank that was sold in a given week,
the marginal behavior of X2 is to be addressed. Thus, it is necessary to find the marginal
density, f 2 ( x2 ) , by integrating the joint density of X1 and X2 over X1. Unlike the last
example, the bounds on the region of positive probability for one random variable
depend on the value of the other random variable. A graph depicting the region of
positive probability is shown in Figure 6.6. From this graph, we see that, for a given
value of X2 = x2, positive probability is associated with X1 values from x2 to 1. Thus, we
have
f2(x2) = ∫_{−∞}^{∞} f(x1, x2) dx1
= ∫_{x2}^{1} 3x1 dx1
= (3/2) x1² ]_{x2}^{1}
= (3/2)(1 − x2²), 0 ≤ x2 ≤ 1
  0,              elsewhere

Now, we can find the desired probability.
P(0.2 ≤ X2 ≤ 0.4) = ∫_{0.2}^{0.4} (3/2)(1 − x2²) dx2
= (3/2)[x2 − x2³/3]_{0.2}^{0.4}
= (3/2){[0.4 − (0.4)³/3] − [0.2 − (0.2)³/3]}
= 0.272
Notice that, when working with the marginal density of X2, the region of positive
probability does not depend on X1. This is always true. After integrating out X1, it plays
no role in the marginal probability density of X2. The marginal density of X2 graphs as a
function that is high at x2 = 0 and then tends toward 0 as x2 tends toward 1. Does this
agree with your intuition after looking at Figure 6.5?
Figure 6.5. The joint density function of Example 6.5.
Figure 6.6. The region of positive probability for Example 6.5.
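Because the inner limit of integration depends on x2, this example is a good candidate for a symbolic check; a sketch using sympy (assumed to be available):

    # Marginal of X2 and P(0.2 <= X2 <= 0.4) for Example 6.5.
    import sympy as sp

    x1, x2 = sp.symbols("x1 x2")
    f = 3 * x1                          # joint density on 0 <= x2 <= x1 <= 1

    f2 = sp.integrate(f, (x1, x2, 1))   # marginal of X2: (3/2)(1 - x2**2)
    prob = sp.integrate(f2, (x2, sp.Rational(1, 5), sp.Rational(2, 5)))
    print(f2, prob, float(prob))        # 34/125 = 0.272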
6.2 Conditional Probability Distributions
Before considering conditional probability distributions, think back to conditional
probability discussed in Chapter 3. For two events A and B, the conditional probability of
A given B may be computed as

P(A | B) = P(AB) / P(B)

where P(B) > 0. An analogous approach can be used with probability density functions, where the joint density of X1 and X2 replaces P(AB), and the marginal densities replace P(A) and P(B). That is, an intuitive definition of the conditional density of X1 given X2 is

f(x1 | x2) = f(x1, x2) / f2(x2)

for f2(x2) > 0. We shall soon see that this intuition is correct.
More formally, recall that, in the bivariate discrete case, the conditional
probabilities for X1 for a given X2 were found by fixing attention on the particular row in
which X 2 = x2 , and then looking at the relative probabilities within that row; that is, the
individual cell probabilities were divided by the marginal total for that row, to obtain
conditional probabilities.
In the bivariate continuous case, the form of the probability density function
representing the conditional behavior of X1 for a given X2 is found by slicing through the
joint density in the x1 direction at the particular value of X2. The resulting slice then has to be rescaled, by dividing by the marginal density of X2 at that point, so that it integrates to 1. Let's look at a specific
example before examining the general definition of conditional density functions.
Example 6.6:
Refer to Example 6.5. If in a given week the tank is stocked to be 50% full, find the
probability that the proportion of the tank that is sold during that week is less than 0.2.
Solution:
To address this question, we need to first find the conditional distribution of X2 given X1. Slicing through f(x1, x2) in the x2 direction at x1 = 0.5 yields

f(0.5, x2) = 3(0.5) = 1.5,   0 ≤ x2 ≤ 0.5

Thus, the conditional behavior of X2 for a given X1 of 0.5 is constant over the interval (0, 0.5). The marginal value of f1(x1) at x1 = 0.5 is obtained as follows:
f1(x1) = ∫_{−∞}^{∞} f(x1, x2) dx2
= ∫_0^{x1} 3x1 dx2
= 3x1², 0 ≤ x1 ≤ 1
  0,    elsewhere

f1(0.5) = 3(0.5)² = 0.75
Upon dividing, we see that the conditional behavior of X2 for a given X1 of 0.5 is represented by the function

f(x2 | x1 = 0.5) = f(0.5, x2) / f1(0.5) = 1.5 / 0.75
= 2, 0 < x2 < 0.5
  0, otherwise
This function of x2 has all the properties of a probability density function and is the
conditional density function for X2 at X1 = 0.5. (Notice that, unlike the marginal
distribution, the conditional distribution of X2 depends on the particular value of X1 that is
observed.) Then,
P(X2 < 0.2 | X1 = 0.5) = ∫_0^{0.2} f(x2 | x1 = 0.5) dx2 = ∫_0^{0.2} 2 dx2 = 0.4
Thus, among all weeks in which the tank was half full immediately after stocking, sales
amounted to less than 20% of the tank 40% of the time.
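The slice-and-rescale computation can be reproduced symbolically; a sketch using sympy (assumed to be available):

    # Conditional density f(x2 | x1 = 0.5) and P(X2 < 0.2 | X1 = 0.5) for Example 6.6.
    import sympy as sp

    x1, x2 = sp.symbols("x1 x2")
    f = 3 * x1                                     # joint density on 0 <= x2 <= x1 <= 1

    f1 = sp.integrate(f, (x2, 0, x1))              # marginal of X1: 3*x1**2
    cond = (f / f1).subs(x1, sp.Rational(1, 2))    # conditional density: 2 on (0, 0.5)
    prob = sp.integrate(cond, (x2, 0, sp.Rational(1, 5)))
    print(cond, prob)                              # 2 and 2/5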
Clearly, these manipulations to obtain conditional density functions in the continuous
case are analogous to those used to obtain conditional probabilities in the discrete case,
except that integrals are used in place of sums. The formal definition of a conditional
density function follows.
Definition 6.5. Let X1 and X2 be continuous random variables with joint probability density function f(x1, x2) and marginal densities f1(x1) and f2(x2), respectively. Then, the conditional probability density function of X1 given X2 = x2 is defined by

f(x1 | x2) = f(x1, x2) / f2(x2), for f2(x2) > 0
             0,                  elsewhere

and the conditional probability density function of X2 given X1 = x1 is defined by

f(x2 | x1) = f(x1, x2) / f1(x1), for f1(x1) > 0
             0,                  elsewhere

_______________________________________________________________________
Example 6.7:
A soft-drink machine has a random supply Y2 at the beginning of a given day and
dispenses a random amount Y1 during the day (with measurements in gallons). It is not
resupplied during the day; hence, Y1 ≤ Y2. It has been observed that Y1 and Y2 have joint
density
f(y1, y2) = 1/2, 0 ≤ y1 ≤ y2, 0 ≤ y2 ≤ 2
            0,   elsewhere
In other words, the points (y1, y2) are uniformly distributed over the triangle with the
given boundaries. Find the conditional density of Y1, given Y2 = y2. Evaluate the
probability that less than ½ gallon is sold, given that the machine contains 1 gallon at the
start of the day.
Solution:
The marginal density of Y2 is given by

f2(y2) = ∫_{−∞}^{∞} f(y1, y2) dy1
= ∫_0^{y2} (1/2) dy1
= (1/2) y2, 0 ≤ y2 ≤ 2
  0,        elsewhere

By Definition 6.5,

f(y1 | y2) = f(y1, y2) / f2(y2) = (1/2) / ((1/2) y2)
= 1/y2, 0 ≤ y1 ≤ y2 ≤ 2
  0,    elsewhere

The probability of interest is

P(Y1 < 1/2 | Y2 = 1) = ∫_0^{1/2} f(y1 | y2 = 1) dy1
= ∫_0^{1/2} 1 dy1
= 1/2

since the conditional density at y2 = 1 is 1/y2 = 1 on the interval (0, 1). Thus, the amount sold is highly dependent on the amount in supply.
Recall that the marginal density of Y1 (or Y2) does not involve Y2 (or Y1). However, note
that a relationship between Y1 and Y2 may again be evident in the conditional distribution
of Y1 given Y2 (or Y2 given Y1), both in expressing the height of the density function and
in designating the region of positive probability.
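Because the conditioning event {Y2 = 1} has probability zero, a simulation can only approximate it by a narrow band around y2 = 1; the following sketch does exactly that and should land near the exact answer of 1/2:

    # Simulation check of P(Y1 < 1/2 | Y2 = 1) for Example 6.7.
    import random

    def sample_triangle():
        # Uniform over the triangle 0 <= y1 <= y2 <= 2 (joint density 1/2).
        while True:
            y1, y2 = 2 * random.random(), 2 * random.random()
            if y1 <= y2:
                return y1, y2

    eps, kept, hits = 0.02, 0, 0
    for _ in range(500_000):
        y1, y2 = sample_triangle()
        if abs(y2 - 1) < eps:
            kept += 1
            hits += y1 < 0.5
    print(hits / kept)   # close to 1/2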
____________________________________________________________________
6.3 Independent Random Variables
Before defining independent random variables, let us recall once again that two events A and B are independent if P(AB) = P(A)P(B). Somewhat analogously, two discrete random variables are independent if

P(X1 = x1, X2 = x2) = P(X1 = x1)P(X2 = x2)

for all real numbers x1 and x2. A similar idea carries over to the continuous case.
Definition 6.6. Discrete random variables X1 and X2 are said to be independent if

P(X1 = x1, X2 = x2) = P(X1 = x1)P(X2 = x2)

for all real numbers x1 and x2.
Continuous random variables X1 and X2 are said to be independent if

f(x1, x2) = f1(x1)f2(x2)

for all real numbers x1 and x2.
The concepts of joint probability density functions and independence extend immediately to n random variables, where n is any finite positive integer. The n random variables X1, X2, …, Xn are said to be independent if their joint density function, f(x1, x2, …, xn), is given by

f(x1, x2, …, xn) = f1(x1) f2(x2) ⋯ fn(xn)

for all real numbers x1, x2, …, xn.
Example 6.8:
Show that the random variables survival and economic class that have the joint
distribution identified in Table 6.2 are not independent.
Solution:
It is necessary to check only one entry in the table. We see that P(X1 = 0, X2 = 0) = 0.38,
whereas P(X1 = 0) = 0.76 and P(X2 = 0) = 0.55. Because
P( X 1 = 0, X 2 = 0) = 0.76 ≠ 0.418 = P( X 1 = 0) P( X 2 = 0)
the random variables cannot be independent.
Note: To prove that two variables are not independent, it is sufficient to find one value
x1 and one value x2 such that P( X 1 = x1 , X 2 = x2 ) ≠ P( X 1 = x1 ) P( X 2 = x2 ) if (X1, X2)
are discrete or f ( x1 , x2 ) ≠ f ( x1 ) f ( x2 ) if (X1, X2) are continuous. In contrast, to prove
the two variables are independent, equality of the joint probability function and the
product of the marginal probability functions must be shown to hold for all (x1, x2).
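This check can be automated for any discrete joint table by comparing each joint probability against the product of the corresponding marginals; a sketch, applied to Table 6.2:

    # Independence check for a discrete joint table.
    joint = {(0, 1): 0.09, (1, 1): 0.06, (0, 2): 0.05, (1, 2): 0.08,
             (0, 3): 0.08, (1, 3): 0.24, (0, 4): 0.10, (1, 4): 0.30}

    p1, p2 = {}, {}   # marginals of X1 and X2
    for (a, b), p in joint.items():
        p1[a] = p1.get(a, 0) + p
        p2[b] = p2.get(b, 0) + p

    independent = all(abs(p - p1[a] * p2[b]) < 1e-9 for (a, b), p in joint.items())
    print(independent)   # False: e.g., 0.09 != 0.32 * 0.15 = 0.048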
Example 6.9:
Show that the random variables in Example 6.3 are independent.
Solution:
Here,

f(x1, x2) = 2(1 − x1), 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1
            0,         otherwise

We saw in Example 6.4 that

f1(x1) = 2(1 − x1), 0 ≤ x1 ≤ 1
         0,         elsewhere

and

f2(x2) = 1, 0 ≤ x2 ≤ 1
         0, elsewhere

Thus f(x1, x2) = f1(x1)f2(x2) for all real numbers x1 and x2; and X1 and X2 are independent random variables.
Exercises
6.1. Two construction contracts are to be randomly assigned to one or more of three
firms. Numbering the firms I, II, and III, let X1 be the number of contracts assigned to
firm I, and let X2 be the number assigned to firm II. (A firm may receive more than one
contract.)
a. Find the joint probability distribution for X1 and X2.
b. Find the marginal probability distribution for X1.
c. Find P(X1 = 1|X2 = 1).
6.2. A mountain rescue service studied the behavior of lost hikers so that more effective
search strategies could be devised. They decided to determine both the direction traveled
and the experience level. From this study, it is known that the probabilities of a hiker
being experienced or not and of going uphill, downhill or remaining in the same place are
as shown in the table below:
                          Direction
              Uphill   Downhill   Remain in Same Place
Novice         0.10      0.25            0.25
Experienced    0.05      0.10            0.25
a. Find the marginal distribution of the direction a lost hiker travels.
b. Find the conditional distribution of direction given that the hiker is experienced.
c. Are experience and the direction a hiker walks independent? Justify your answer.
6.3. The U.S. Census Bureau collects information on certain characteristics in the U.S.
population. For 2004, it reported the following numbers of citizens by age and whether
or not they were in poverty as shown below (all counts are in 1000s of people).
                     Live in Poverty   Do Not Live in Poverty   Totals
Under 18 years             347               12,680             13,027
18 to 64 years             517               19,997             20,514
65 years and older         130                3,327              3,457
Totals                     994               35,984             36,978
Suppose a person is selected at random from the citizens of the United States, and his or her age and poverty status are observed. Further, define X1 and X2 as follows:

Let X1 = 1, if age is under 18 years
         2, if age is 18 to 64 years
         3, if age is 65 years or older

Let X2 = 0, if the person lives in poverty
         1, if the person does not live in poverty
a. Find the joint probability distribution of X1 and X2.
b. Find the marginal probability distribution of age.
c. Find the marginal probability distribution of poverty in the U.S.
d. Find the conditional distribution of age given that a person is in poverty.
e. Are age and whether or not a person lives in poverty in the U.S. independent? Justify
your answer.
6.4. The National Center for Health Statistics (NCHS) of the Centers for Disease Control
(CDC), among others, is interested in the causes of death and its relationship to gender,
age, race, etc. For children, ages 1 to 4, the numbers of deaths for each gender and by
cause during 2002 are presented in the table below. Let X1 be the random variable
associated with cause of death and X2 be the random variable representing gender.
Cause of Death                        Female   Male   Total
Accidents (unintentional injuries)      617    1024    1641
Congenital Problems                     248     282     530
Assault (homicide)                      189     234     423
Other                                   998    1266    2264
Total                                  2052    2806    4858
Source: CDC/NCHS
a. Fully define the random variables X1 and X2.
b. Find the joint distribution of gender and cause of death.
c. Find the marginal distribution of gender.
d. Find the marginal distribution of cause of death.
e. What is the probability that a randomly selected male in this age range who died had a
congenital problem leading to his death?
f. What is the probability that a child, aged 1 to 4, who died as a consequence of a
homicide was a female?
g. Find the conditional distribution of cause of death given gender.
h. Are cause of death and gender independent? Justify your answer.
6.5. Refer to Example 6.3.
a. Find the cumulative distribution function for this distribution.
b. Use the distribution function found in (a) to find the probability that there are no more
than 50% total impurities and that the proportion of type I impurities is no more than 25%
of the total.
6.6. A radioactive particle is randomly located in a square area with sides that are 1 unit
in length. Let X1 and X2 denote the coordinates of the particle. Since the particle is
equally likely to fall in any subarea of fixed size, a reasonable model of (X1, X2) is given
by
f(x1, x2) = 1, 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1
            0, elsewhere
a. Sketch the probability density surface.
b. Find P( X 1 ≤ 0.2, X 2 ≤ 0.4) .
c. Find P(0.1 ≤ X 1 ≤ 0.3, X 2 > 0.4)
6.7. Refer to Exercise 6.6.
a. Find the joint cumulative distribution function for X1 and X2.
b. Use the joint cumulative distribution function to find the probability that X1 < 0.5 and
X2 <0.8.
c. Verify the probability in (b) using the joint probability density function.
6.8. An environmental engineer measures the amount (by weight) of particulate pollution
in air samples (of a certain volume) collected over the smokestack of a coal fueled power
plant. Let X1 denote the amount of pollutant per sample when a certain cleaning device
on the stack is not operating, and let X2 denote the amount of pollutants per sample when
the cleaning device is operating under similar environmental conditions. It is observed
that X1 is always greater than 2X2; and the relative frequency of (X1, X2) can be modeled
by
f(x1, x2) = k, 0 ≤ x1 ≤ 2, 0 ≤ x2 ≤ 1, 2x2 ≤ x1
            0, elsewhere
(In other words, X1 and X2 are randomly distributed over the region inside the triangle
bounded by x1 =2, x2 = 0, and 2x2 = x1.)
a. Find the value of k that makes this a probability density function.
b. Find P(X1 ≥ 3X2). (In other words, find the probability that the cleaning device reduces the amount of pollutant to one-third or less of what it would be without the device.)
6.9. Refer to the situation described in Exercise 6.8.
a. Find the marginal density function of X2.
b. Find P(X2 ≤ 0.4).
c. Are X1 and X2 independent?
d. Find P(X2 ≤ ¼|X1 = 1).
6.10. Let X1 and X2 denote the proportions of two different chemicals found in a sample
mixture of chemicals used as an insecticide. Suppose that X1 and X2 have a joint
probability density given by
f(x1, x2) = 2, 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1, 0 ≤ x1 + x2 ≤ 1
            0, elsewhere
a. Find P(X1 ≤ ¼, X2 ≤ ½)
b. Find P(X1 ≤ ¾, X2 ≤ ¾)
c. Find P(X1 ≤ ½|X2 ≤ ½)
6.11. Refer to the setting described in Exercise 6.10.
a. Find the marginal density functions for X1 and X2.
b. Are X1 and X2 independent?
6.12. Refer to the setting described in Exercise 6.10.
a. Find the conditional distribution of X1 given X2.
b. Find P(X1 > ½|X2 = ¼).
6.13. Let X1 and X2 denote the proportions of time, out of one workweek, that employees
I and II, respectively, actually spend performing their assigned tasks. The joint relative
frequency behavior of X1 and X2 is modeled by the probability density function
f(x1, x2) = x1 + x2, 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1
            0,       elsewhere
a. Find P(X1 < ½, X2 > ¼).
b. Find P(X1 + X2 ≤ 1).
6.14. Refer to Exercise 6.13.
a. Find the marginal distributions of X1 and X2, the distributions of the time each employee spends performing his assigned tasks.
b. Find the probability that the first employee spends more than 75% of the workweek on
his assigned task.
c. Are X1 and X2 independent?
6.15. Referring to Exercise 6.13, find the probability that employee I spends more than
75% of the workweek on his assigned task, given that employee II spends exactly 50% of
the workweek on her assigned task.
6.16. An electronic surveillance system has one of each of two different types of
components in joint operations. If X1 and X2 denote the random life lengths of the
components of type I and type II, respectively, the joint probability density function is given by

f(x1, x2) = (1/8) x1 e^(−(x1 + x2)/2), x1 > 0, x2 > 0
            0,                         elsewhere

(measurements are in hundreds of hours).
a. Are X1 and X2 independent?
b. Find P(X1 >1, X2 >1).
6.17. A bus arrives at a bus stop at a randomly selected time within a 1-hour period. A passenger arrives at the bus stop at a randomly selected time within the same hour. The passenger is willing to wait for the bus for up to ¼ of an hour. What is the probability that the passenger will catch the bus? [Hint: Let X1 denote the bus arrival time, and let X2 denote the passenger arrival time. If these arrivals are independent, then

f(x1, x2) = 1, 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1
            0, elsewhere

Now find P(X2 ≤ X1 ≤ X2 + ¼).]
6.18. Two friends are to meet at a library. Each arrives at an independently selected time
within a fixed 1-hour period. Each agrees to wait not more than 10 minutes for the other.
Find the probability that they will meet.
6.19. Two quality control inspectors each interrupt a production line at randomly (and independently) selected times within a given day (of 8 hours). Find the probability that the two interruptions will be more than 4 hours apart.
6.20. Two customers enter a bank at random times within a fixed 1-hour period. Assume that the arrival times are independent of each other.
a. Find the probability that both arrive in the first half hour.
b. Find the probability that the two customers are within 5 minutes of each other.
6.21. A bombing target is in the center of a circle with radius 1 mile. A bomb falls at a
randomly selected point inside that circle. If the bomb destroys everything within ½ mile
of its landing point, what is the probability that it will destroy the target?
6.4 Expected Values of Functions of Random Variables
When we encounter problems that involve more than one random variable, we often
combine the variables into a single function. We may be interested in the life lengths of
five different electronic components within the same system or the difference between
two strength test measurements on the same section of cable. We now discuss how to
find expected values of functions of more than one random variable. Definition 6.7 gives the basic result for finding expected values in the bivariate case. The definition, of course, can be generalized to more variables.

Notice that, if X1 and X2 are independent, then

E[g(X1)h(X2)] = E[g(X1)]E[h(X2)]
A function of two variables that is commonly of interest in probabilistic and
statistical problems is the covariance.
Definition 6.7. Suppose that the discrete random variables (X1, X2) have a joint probability function given by p(x1, x2). If g(X1, X2) is any real-valued function of (X1, X2), then

E[g(X1, X2)] = Σ_{x1} Σ_{x2} g(x1, x2) p(x1, x2)

The sum is over all values of (x1, x2) for which p(x1, x2) > 0. If (X1, X2) are continuous random variables, with probability density function f(x1, x2), then

E[g(X1, X2)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x1, x2) f(x1, x2) dx1 dx2
Definition 6.8. The covariance between two random variables X1 and X2 is given by

cov(X1, X2) = E[(X1 − μ1)(X2 − μ2)]

where μ1 = E(X1) and μ2 = E(X2).
The covariance helps us assess the relationship between two variables in the
following sense. If X2 tends to be large when X1 is large, and if X2 tends to be small when
X1 is small, then X1 and X2 have a positive covariance. If, on the other hand, X2 tends to
be small when X1 is large and large when X1 is small, then the two variables have a
negative covariance.
While covariance measures the direction of the association between two random
variables, correlation measures the strength of the association.
Definition 6.9. The correlation coefficient between two random variables X1 and X2 is given by

ρ = cov(X1, X2) / √(V(X1)V(X2))

The correlation coefficient is a unitless quantity that takes on values between −1 and +1. If ρ = +1 or ρ = −1, then X2 must be a linear function of X1. The next theorem gives an easier computational form for the covariance.
Theorem 6.1. If X1 has a mean μ1 and X2 has a mean of μ2, then
cov( X 1 , X 2 ) = E ( X 1 X 2 ) − μ1 μ 2
The proof (not included here) is an application of the properties of expected value.
If X1 and X2 are independent random variables, then E ( X 1 X 2 ) = E ( X 1 )E ( X 2 ) .
Using Theorem 6.1, it is then clear that independence between X1 and X2 implies that
cov( X 1 , X 2 ) = 0 . The converse is not necessarily true; that is, zero covariance does not
imply that the variables are independent.
To see how this theorem works, look at the joint distribution in Table 6.4. Clearly,
E ( X 1 ) = E ( X 2 ) = 0 , and E ( X 1 X 2 ) = 0 as well. Therefore, cov( X 1 , X 2 ) = 0 . On the
other hand,
P(X1 = −1, X2 = −1) = 1/8 ≠ 9/64 = P(X1 = −1)P(X2 = −1)

so the random variables are dependent.
Table 6.4. Joint Distribution of Dependent Variables with a Covariance of Zero

                 X1
X2          −1    0    1   Totals
−1         1/8  1/8  1/8    3/8
 0         1/8   0   1/8    2/8
 1         1/8  1/8  1/8    3/8
Totals     3/8  2/8  3/8     1
We shall now calculate the covariance in two examples.
Example 6.9:
A travel agency keeps track of the number of customers who call on any one day and the
number of trips booked on any one day. Let X1 denote the number of calls, let X2 denote
the number of orders placed, and let p(x1, x2) denote the joint probability function for (X1,
X2). Records show that
p(0, 0) = 0.04   p(1, 0) = 0.08   p(1, 1) = 0.06
p(2, 0) = 0.12   p(2, 1) = 0.20   p(2, 2) = 0.12
p(3, 0) = 0.10   p(3, 1) = 0.16   p(3, 2) = 0.10   p(3, 3) = 0.02
Thus, for any given day, the probability of, say, two calls and one order is 0.20. Find
cov(X1, X2) and the correlation between X1 and X2.
Solution:
Theorem 6.1 suggests that we first find E(X1X2), which is

E(X1X2) = Σ_{x1} Σ_{x2} x1 x2 p(x1, x2)
= (0 × 0)p(0, 0) + (1 × 0)p(1, 0) + (1 × 1)p(1, 1) + (2 × 0)p(2, 0) + (2 × 1)p(2, 1)
  + (2 × 2)p(2, 2) + (3 × 0)p(3, 0) + (3 × 1)p(3, 1) + (3 × 2)p(3, 2) + (3 × 3)p(3, 3)
= 0(0.04) + 0(0.08) + 1(0.06) + 0(0.12) + 2(0.20) + 4(0.12) + 0(0.10) + 3(0.16) + 6(0.10) + 9(0.02)
= 2.20
Now we must find E(X1) = μ1 and E(X2) = μ2. Summing the joint probabilities across rows and columns gives the marginal distributions of X1 and X2:

x1:      0     1     2     3
p(x1):  0.04  0.14  0.44  0.38

x2:      0     1     2     3
p(x2):  0.34  0.42  0.22  0.02

It follows that

μ1 = 1(0.14) + 2(0.44) + 3(0.38) = 2.16

and

μ2 = 1(0.42) + 2(0.22) + 3(0.02) = 0.92

Thus,

cov(X1, X2) = E(X1X2) − μ1μ2 = 2.20 − 2.16(0.92) = 0.2128

From the marginal distributions of X1 and X2, it follows that

V(X1) = E(X1²) − μ1² = 5.32 − (2.16)² = 0.6544

and

V(X2) = E(X2²) − μ2² = 1.48 − (0.92)² = 0.6336

Hence,

ρ = cov(X1, X2) / √(V(X1)V(X2)) = 0.2128 / √((0.6544)(0.6336)) ≈ 0.33

A modest positive association exists between the number of calls and the number of trips booked. Do the positive covariance and the strength of the correlation agree with your intuition?
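All of these quantities follow mechanically from the joint probability function; the following Python sketch recomputes them directly from the p(x1, x2) values listed above:

    # Covariance and correlation for Example 6.9, computed from the joint table.
    from math import sqrt

    p = {(0, 0): 0.04, (1, 0): 0.08, (1, 1): 0.06, (2, 0): 0.12, (2, 1): 0.20,
         (2, 2): 0.12, (3, 0): 0.10, (3, 1): 0.16, (3, 2): 0.10, (3, 3): 0.02}

    e1 = sum(x1 * pr for (x1, x2), pr in p.items())        # 2.16
    e2 = sum(x2 * pr for (x1, x2), pr in p.items())        # 0.92
    e12 = sum(x1 * x2 * pr for (x1, x2), pr in p.items())  # 2.20
    v1 = sum(x1**2 * pr for (x1, x2), pr in p.items()) - e1**2
    v2 = sum(x2**2 * pr for (x1, x2), pr in p.items()) - e2**2

    cov = e12 - e1 * e2
    rho = cov / sqrt(v1 * v2)
    print(round(cov, 4), round(rho, 2))   # 0.2128 and 0.33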
_____________________________________________________________________
When a problem involves n random variables X 1 , X 2 ,…, X n , we are often
interested in studying linear combinations of those variables. For example, if the random
variables measure the quarterly incomes for n plants in a corporation, we may want to
look at their sum or average. If X1 represents the monthly cost of servicing defective
plants before a new quality control system was installed, and X2 denotes that cost after the
system went into operation, then we may want to study X 1 − X 2 . Theorem 6.2 gives a
general result for the mean and variance of a linear combination of random variables.
Theorem 6.2. Let Y1, Y2, …, Yn and X1, X2, …, Xm be random variables, with E(Yi) = μi and E(Xj) = ξj. Define

U1 = Σ_{i=1}^n ai Yi   and   U2 = Σ_{j=1}^m bj Xj

for constants a1, a2, …, an and b1, b2, …, bm. Then the following relationships hold:

(a) E(U1) = Σ_{i=1}^n ai μi

(b) V(U1) = Σ_{i=1}^n ai² V(Yi) + 2 Σ Σ_{i<j} ai aj cov(Yi, Yj),
    where the double sum is over all pairs (i, j) with i < j.

(c) cov(U1, U2) = Σ_{i=1}^n Σ_{j=1}^m ai bj cov(Yi, Xj)
Proof:
(a) The proof of part (a) follows directly from the definition of expected value and from properties of sums or integrals.

(b) To prove this part, we resort to the definition of variance and write

V(U1) = E[U1 − E(U1)]²
= E[Σ_{i=1}^n ai Yi − Σ_{i=1}^n ai μi]²
= E[Σ_{i=1}^n ai (Yi − μi)]²
= E[Σ_{i=1}^n ai² (Yi − μi)² + Σ Σ_{i≠j} ai aj (Yi − μi)(Yj − μj)]
= Σ_{i=1}^n ai² E[(Yi − μi)²] + Σ Σ_{i≠j} ai aj E[(Yi − μi)(Yj − μj)]

By the definitions of variance and covariance, we then have

V(U1) = Σ_{i=1}^n ai² V(Yi) + Σ Σ_{i≠j} ai aj cov(Yi, Yj)

Notice that cov(Yi, Yj) = cov(Yj, Yi); hence, we can write

V(U1) = Σ_{i=1}^n ai² V(Yi) + 2 Σ Σ_{i<j} ai aj cov(Yi, Yj)

(c) The proof of this part is obtained by similar steps. We have

cov(U1, U2) = E{[U1 − E(U1)][U2 − E(U2)]}
= E{[Σ_{i=1}^n ai (Yi − μi)][Σ_{j=1}^m bj (Xj − ξj)]}
= Σ_{i=1}^n Σ_{j=1}^m ai bj E[(Yi − μi)(Xj − ξj)]
= Σ_{i=1}^n Σ_{j=1}^m ai bj cov(Yi, Xj)

On observing that cov(Yi, Yi) = V(Yi), we see that part (b) is a special case of part (c).
Example 6.10:
Let Y1, Y2, …, Yn be independent random variables, with E(Yi) = μ and V(Yi) = σ². (These variables may denote the outcomes of n independent trials of an experiment.) Defining

Ȳ = (1/n) Σ_{i=1}^n Yi

show that E(Ȳ) = μ and V(Ȳ) = σ²/n.

Solution:
Notice that Ȳ is a linear function with all constants ai equal to 1/n; that is,

Ȳ = (1/n)Y1 + (1/n)Y2 + ⋯ + (1/n)Yn

By Theorem 6.2 part (a),

E(Ȳ) = Σ_{i=1}^n ai μ = μ Σ_{i=1}^n ai = μ Σ_{i=1}^n (1/n) = μ(n/n) = μ

By Theorem 6.2 part (b),

V(Ȳ) = Σ_{i=1}^n ai² V(Yi) + 2 Σ Σ_{i<j} ai aj cov(Yi, Yj)

But the covariance terms are all zero, since the random variables are independent. Thus,

V(Ȳ) = Σ_{i=1}^n (1/n)² σ² = nσ²/n² = σ²/n
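Both results are easy to see empirically; a simulation sketch (normal draws are an arbitrary choice here, since any distribution with mean μ and variance σ² would do):

    # Simulation check that E(Ybar) = mu and V(Ybar) = sigma^2 / n.
    import random
    from statistics import mean, pvariance

    mu, sigma, n = 5.0, 2.0, 10
    ybars = [mean(random.gauss(mu, sigma) for _ in range(n))
             for _ in range(20_000)]
    print(round(mean(ybars), 2), round(pvariance(ybars), 2))  # near 5.0 and 0.4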
Example 6.11:
Let X1 denote the amount of gasoline stocked in a bulk tank at the beginning of a week,
and let X2 denote the amount sold during the week; then Y = X 1 − X 2 represents the
amount left over at the end of the week. Find the mean and the variance of Y, assuming
that the joint density function of (X1, X2) is given by
f(x1, x2) = 3x1, for 0 ≤ x2 ≤ x1 ≤ 1
            0,   elsewhere
Solution:
We must first find the means and the variances of X1 and X2. The marginal density of X1
is found to be
f1(x1) = 3x1², for 0 ≤ x1 ≤ 1
         0,    elsewhere

Thus,

E(X1) = ∫_0^1 x1(3x1²) dx1 = 3[x1⁴/4]_0^1 = 3/4

The marginal distribution of X2 is found to be

f2(x2) = (3/2)(1 − x2²), for 0 ≤ x2 ≤ 1
         0,              elsewhere

Thus,

E(X2) = ∫_0^1 x2 (3/2)(1 − x2²) dx2
= (3/2) ∫_0^1 (x2 − x2³) dx2
= (3/2){[x2²/2]_0^1 − [x2⁴/4]_0^1}
= (3/2)(1/2 − 1/4)
= 3/8

By similar arguments, it follows that

E(X1²) = 3/5   and   V(X1) = 3/5 − (3/4)² = 0.0375

E(X2²) = 1/5   and   V(X2) = 1/5 − (3/8)² = 0.0594

The next step is to find cov(X1, X2). Now

E(X1X2) = ∫_0^1 ∫_0^{x1} (x1x2) 3x1 dx2 dx1
= 3 ∫_0^1 ∫_0^{x1} x1² x2 dx2 dx1
= 3 ∫_0^1 x1² [x2²/2]_0^{x1} dx1
= (3/2) ∫_0^1 x1⁴ dx1
= (3/2)[x1⁵/5]_0^1
= 3/10

and

cov(X1, X2) = E(X1X2) − μ1μ2 = 3/10 − (3/4)(3/8) = 0.0188

From Theorem 6.2,

E(Y) = E(X1) − E(X2) = 3/4 − 3/8 = 3/8 = 0.375

and

V(Y) = V(X1) + V(X2) + 2(1)(−1)cov(X1, X2)
= 0.0375 + 0.0594 − 2(0.0188)
= 0.0593
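The whole calculation can be verified symbolically, with expectations taken over the triangular region (a sketch using sympy, assumed to be available):

    # Verifying E(Y) and V(Y) for Example 6.11.
    import sympy as sp

    x1, x2 = sp.symbols("x1 x2")
    f = 3 * x1                      # joint density on 0 <= x2 <= x1 <= 1

    def E(g):                       # expectation over the triangular region
        return sp.integrate(g * f, (x2, 0, x1), (x1, 0, 1))

    mu1, mu2 = E(x1), E(x2)
    v_y = (E(x1**2) - mu1**2) + (E(x2**2) - mu2**2) - 2 * (E(x1 * x2) - mu1 * mu2)
    print(mu1 - mu2, v_y, float(v_y))   # 3/8, 19/320, 0.059375 (0.0593 up to rounding)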
When the random variables in use are independent, calculating the variance of a
linear function simplifies, since the covariances are zero.
Example 6.12:
A firm purchases two types of industrial chemicals. The amount of type I chemical
purchased per week, X1, has E(X1) = 40 gallons, with V(X1) =4. The amount of type II
chemical purchased, X2, has E(X2) = 65 gallons, with V(X2) = 8. Type I costs $3 per
gallon, whereas type II costs $5 per gallon. Find the mean and the variance of the total
weekly amount spent on these types of chemicals, assuming that X1 and X2 are
independent.
Solution:
The dollar amount spent per week is given by

Y = 3X1 + 5X2

From Theorem 6.2,

E(Y) = 3E(X1) + 5E(X2) = 3(40) + 5(65) = 445

and

V(Y) = (3)²V(X1) + (5)²V(X2) = 9(4) + 25(8) = 236

The firm can expect to spend $445 per week on these chemicals.
Example 6.13:
Sampling problems involving finite populations can be modeled by selecting balls from
urns. Suppose that an urn contains r white balls and (N – r) black balls. A random
sample of n balls is drawn without replacement; and Y, the number of white balls in the
sample, is observed. From Chapter 4, we know that Y has a hypergeometric probability
distribution. Find the mean and the variance of Y.
Solution:
We first observe some characteristics of sampling without replacement. Suppose that the
sampling is done sequentially, and we observe outcomes from X1, X2, …, Xn, where
Xi = 1, if the ith draw results in a white ball
     0, otherwise
Unquestionably, P(X1 = 1) = r/N. But it is also true that P(X2 = 1) = r/N, because

P(X2 = 1) = P(X1 = 1, X2 = 1) + P(X1 = 0, X2 = 1)
= P(X1 = 1)P(X2 = 1 | X1 = 1) + P(X1 = 0)P(X2 = 1 | X1 = 0)
= (r/N)((r − 1)/(N − 1)) + ((N − r)/N)(r/(N − 1))
= r(N − 1)/(N(N − 1))
= r/N

The same is true for Xk; that is,

P(Xk = 1) = r/N,   k = 1, 2, …, n

Thus, the probability of drawing a white ball on any draw, given no knowledge of the outcomes of previous draws, is r/N. In a similar way, it can be shown that

P(Xj = 1, Xk = 1) = r(r − 1)/(N(N − 1)),   j ≠ k

Now observe that Y = Σ_{i=1}^n Xi; hence,

E(Y) = Σ_{i=1}^n E(Xi) = n(r/N)

To find V(Y), we need V(Xi) and cov(Xi, Xj). Because Xi is 1 with probability r/N and 0 with probability 1 − (r/N), it follows that

V(Xi) = (r/N)(1 − r/N)

Also, because XiXj = 1 if and only if Xi = 1 and Xj = 1,

cov(Xi, Xj) = E(XiXj) − E(Xi)E(Xj)
= r(r − 1)/(N(N − 1)) − (r/N)²
= −(r/N)(1 − r/N)(1/(N − 1))

From Theorem 6.2, we know that
V(Y) = Σ_{i=1}^n V(Xi) + 2 Σ Σ_{i<j} cov(Xi, Xj)
= n(r/N)(1 − r/N) + 2 Σ Σ_{i<j} [−(r/N)(1 − r/N)(1/(N − 1))]
= n(r/N)(1 − r/N) − n(n − 1)(r/N)(1 − r/N)(1/(N − 1))

because there are n(n − 1)/2 terms in the double summation. A little algebra yields

V(Y) = n(r/N)(1 − r/N)((N − n)/(N − 1))
The usefulness of Theorem 6.2 becomes especially clear when one tries to find
the expected value and the variance for the hypergeometric random variable. Without it,
by proceeding directly from the definition of an expectation, the necessary summations
are exceedingly difficult to obtain.
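These formulas agree with the standard hypergeometric results; a quick check against scipy (assumed to be available; note that scipy's argument order is population size, number of white balls, sample size):

    # Hypergeometric mean and variance: formula vs. scipy.
    from scipy.stats import hypergeom

    N, r, n = 20, 8, 5
    mean_formula = n * r / N
    var_formula = n * (r / N) * (1 - r / N) * (N - n) / (N - 1)

    m, v = hypergeom.stats(N, r, n, moments="mv")
    print(mean_formula, float(m))   # 2.0 and 2.0
    print(var_formula, float(v))    # about 0.947 for both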
Exercises
6.22. A radioactive particle is randomly located in a square area with sides that are 1
unit in length. Let X1 and X2 denote the coordinates of the particle. Since the particle is
equally likely to fall in any subarea of fixed size, a reasonable model of (X1, X2) is given
by
f(x1, x2) = 1, 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1
            0, elsewhere
a. Find the mean of X1.
b. Find the mean of X2 .
6.23. Refer to Exercise 6.22.
a. Find the covariance between X1 and X2.
b. Find the correlation between X1 and X2.
6.24. In a study of particulate pollution in air samples over a smokestack, X1 represents
the amount of pollutants per sample when a cleaning device is not operating, and X2
represents the amount per sample when the cleaning device is operating. Assume that (X1,
X2) has a joint probability density function
f(x1, x2) = 1, for 0 ≤ x1 ≤ 2, 0 ≤ x2 ≤ 1, 2x2 ≤ x1
            0, elsewhere
a. Find the mean and variance of pollutants per sample when the cleaning device is not in
place.
b. Find the mean and variance of pollutants per sample when the cleaning device is in
place.
6.25. Refer to Exercise 6.24. The random variable Y = X1 – X2 represents the amount by
which the weight of emitted pollutant can be reduced by using the cleaning device.
a. Find the mean and variance of Y.
b. Find an interval within which values of Y should lie at least 75% of the time.
6.26. Let X1 and X2 denote the proportions of two different chemicals found in a sample
mixture of chemicals used as an insecticide. Suppose that X1 and X2 have a joint
probability density given by
f(x1, x2) = 2, 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1, 0 ≤ x1 + x2 ≤ 1
            0, elsewhere
a. Find the mean of X1.
b. Find the variance of X1.
6.27. Refer to Exercise 6.26.
a. Find the mean of X2.
b. Find the variance of X2.
6.28. Refer to Exercise 6.26.
a. Find the covariance between X1 and X2.
b. Find the correlation between X1 and X2.
6.29. The proportions X1 and X2 of two chemicals found in samples in an insecticide
have the joint probability density function
f(x1, x2) = 2, for 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1, 0 ≤ x1 + x2 ≤ 1
            0, elsewhere
The random variable Y = X1 + X2 denotes the proportion of the insecticides due to the
combination of both chemicals.
a. Find the mean and variance of the proportion of the first chemical (X1) used in the
insecticide.
b. Find the mean and variance of the proportion of the insecticides due to the
combination of both chemicals.
c. Find an interval within which values of Y should lie for at least 50% of the samples of
insecticide.
6.30. For a sheet-metal stamping machine in a certain factory, the time between failures, X1, has a mean time between failures (MTBF) of 56 hours and a variance of 16 hours. The repair time, X2, has a mean time to repair (MTTR) of 5 hours and a variance of 4 hours.
If X1 and X2 are independent, find the expected value and the variance of Y = X1 + X2,
which represents one operation-repair cycle.
6.31. A particular fast-food outlet is interested in the joint behavior of the random
variable Y1, which is defined as the total time, in minutes, between a customer’s arrival at
the fast-food outlet and his or her leaving the service window, and Y2, the time that the
customer waits in line before reaching the service window. Because Y1 includes the time a customer waits in line, we must have Y1 ≥ Y2. The relative frequency distribution of observed values of Y1 and Y2 can be modeled by the probability density function

f(y1, y2) = e^(−y1), for 0 ≤ y2 ≤ y1 < ∞
            0,       elsewhere

a. Find P(Y1 < 2, Y2 > 1).
b. Find P(Y1 ≥ 2Y2).
c. Find P(Y1 − Y2 ≥ 1). [Note: Y1 − Y2 denotes the time spent at the service window.]
d. Find the marginal density functions for Y1 and Y2.
6.32. For the situation described in Exercise 6.31, the random variable Y1 - Y2 represents
the time spent at the service window.
a. Find E(Y1 - Y2)
b. Find V(Y1 - Y2).
c. Is it highly likely that a customer will spend more than 2 minutes at the service
window?
6.33. Prove Theorem 6.1.
6.5 Conditional Expectations
Section 6.2 contains a discussion of conditional probability functions and conditional
density functions, which we shall now relate to conditional expectations. Conditional
expectations are defined in the same manner as univariate expectations are, except that
the conditional density is used in place of the marginal density function.
Definition 6.10. If X1 and X2 are any two random variables, the conditional expectation of X1, given that X2 = x2, is defined to be

E(X1 | X2 = x2) = ∫_{−∞}^{∞} x1 f(x1 | x2) dx1

if X1 and X2 are jointly continuous, and

E(X1 | X2 = x2) = Σ_{x1} x1 p(x1 | x2)

if X1 and X2 are jointly discrete.
Example 6.14:
Refer to Example 6.7, with X1 denoting the amount of drink sold, X2 denoting the amount
in supply, and
f(x1, x2) = 1/2, 0 ≤ x1 ≤ x2, 0 ≤ x2 ≤ 2
            0,   elsewhere
Find the conditional expectation of amount of sales X1, given that X2 = 1.
Solution:
In Example 6.7, we found that

f(x1 | x2) = 1/x2, 0 ≤ x1 ≤ x2 ≤ 2
             0,    elsewhere

Thus, from Definition 6.10,

E(X1 | X2 = 1) = ∫_{−∞}^{∞} x1 f(x1 | x2 = 1) dx1
= ∫_0^1 x1 (1) dx1
= x1²/2 ]_0^1
= 1/2

It follows that, if the soft-drink machine contains 1 gallon at the start of the day, the expected amount that will be sold that day is ½ gallon.
It is important to note that we could solve this problem more generally. That is, given that the amount in supply for a given day is X2 = x2, what is the expected amount that will be sold that day?
E(X1 | X2 = x2) = ∫_{−∞}^{∞} x1 f(x1 | x2) dx1
= ∫_0^{x2} x1 (1/x2) dx1
= x1²/(2x2) ]_0^{x2}
= x2/2,   0 ≤ x2 ≤ 2
Notice that, if X2 = 1, meaning that the soft-drink machine contains 1 gallon at the start of
the day, the expected amount sold that day is still ½ gallon. However, we can also easily
determine that, if the soft-drink machine contains the full 2 gallons at the start of the day,
the expected amount sold is 2/2 = 1 gallon. Similarly, we can determine the expected
amount sold given any initial amount in the machine.
Suppose we kept records over many days of the gallons in the machine at the start of the day and the amount sold. We could plot the amount sold against the amount available at the start of the day. The points of the scatter plot should lie roughly along a line. If we fit a least-squares line to these points, the intercept should be 0, and the slope should be about ½. In fact, we would be estimating E(X1 | X2 = x2) = x2/2. Of course, here we know enough to obtain the exact relationship between X1 and X2. In real life, this is often not the case, and regression is used to obtain estimates of the very things we have derived exactly here.
_______________________________________________________________________
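That connection to regression can be seen in a simulation; the sketch below draws (supply, sales) pairs uniformly from the triangle and fits a least-squares line by the usual closed-form formulas:

    # Least-squares fit of amount sold (x1) on supply (x2) for Example 6.14.
    import random

    pairs = []
    while len(pairs) < 10_000:
        x1, x2 = 2 * random.random(), 2 * random.random()
        if x1 <= x2:                  # keep points in the triangle 0 <= x1 <= x2 <= 2
            pairs.append((x2, x1))    # (predictor, response)

    n = len(pairs)
    sx = sum(x for x, _ in pairs)
    sy = sum(y for _, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    sxy = sum(x * y for x, y in pairs)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    print(round(slope, 2), round(intercept, 2))   # near 0.5 and 0.0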
In Example 6.14, we found the conditional expectation of X1, given X 2 = x2 ,
obtaining the conditional expectation as being a function of the random variable X2.
Hence, we can find the expected value of the conditional expectation. The results of this
type of iterated expectation are given in Theorem 6.3.
Theorem 6.3. Let X1 and X2 denote random variables. Then

E(X1) = E[E(X1 | X2)]

where, on the right-hand side, the inside expectation is taken with respect to the conditional distribution of X1 given X2, and the outside expectation is taken with respect to the distribution of X2.

Proof:
Let X1 and X2 have joint density function f(x1, x2) and marginal densities f1(x1) and f2(x2), respectively. Then

E(X1) = ∫_{−∞}^{∞} x1 f1(x1) dx1
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} x1 f(x1, x2) dx1 dx2
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} x1 f(x1 | x2) f2(x2) dx1 dx2
= ∫_{−∞}^{∞} [∫_{−∞}^{∞} x1 f(x1 | x2) dx1] f2(x2) dx2
= ∫_{−∞}^{∞} E(X1 | X2 = x2) f2(x2) dx2
= E[E(X1 | X2)]

The proof is similar for the discrete case.
Conditional expectations can also be used to find the variance of a random variable, in
the spirit of Theorem 6.3. The main result is given in Theorem 6.4.
Theorem 6.4. For random variables X1 and X2,

V(X1) = E[V(X1 | X2)] + V[E(X1 | X2)]

Proof:
We begin with E[V(X1 | X2)] and write

E[V(X1 | X2)] = E{E(X1² | X2) − [E(X1 | X2)]²}
= E[E(X1² | X2)] − E{[E(X1 | X2)]²}
= E(X1²) − E{[E(X1 | X2)]²}
= E(X1²) − [E(X1)]² − E{[E(X1 | X2)]²} + [E(X1)]²
= V(X1) − (E{[E(X1 | X2)]²} − {E[E(X1 | X2)]}²)
= V(X1) − V[E(X1 | X2)]

where the next-to-last step uses E[E(X1 | X2)] = E(X1), from Theorem 6.3. Rearranging terms, we have

V(X1) = E[V(X1 | X2)] + V[E(X1 | X2)]
Example 6.15:
A quality control plan for an assembly line involves sampling n finished items per day
and counting Y, the number of defective items. If p denotes the probability of observing a
defective item, then Y has (approximately) a binomial distribution when the number of items produced by
the line is large. However, p varies from day to day and is assumed to have a uniform
distribution on the interval from 0 to ¼.
1. Find the expected value of Y for any given day.
2. Find the standard deviation of Y.
3. Given n = 10 items are sampled, find the expected value and standard deviation of the
number of defectives for that day.
Solution:
1. From Theorem 6.3, we know that
E (Y ) = E [E (Y | p )]
For a given p, Y has a binomial distribution; hence,
E (Y | p ) = np
Thus,

E(Y) = E(np) = n E(p) = n ∫_0^{1/4} 4p dp = n(1/8)

And for n = 10,

E(Y) = 10/8 = 5/4

2. From Theorem 6.4,
V(Y) = E[V(Y | p)] + V[E(Y | p)]
     = E[np(1 − p)] + V(np)
     = n E(p) − n E(p^2) + n^2 V(p)
     = n E(p) − n V(p) − n [E(p)]^2 + n^2 V(p)
     = n E(p)[1 − E(p)] + n(n − 1) V(p)

The standard deviation, of course, is simply √V(Y) = √( n E(p)[1 − E(p)] + n(n − 1) V(p) ).
Using the properties of the uniform distribution, we have E(p) = 1/8 and
V(p) = (1/4)^2 / 12 = 1/192. Thus, for n = 10,

V(Y) = 10 (1/8)(7/8) + 10(9)(1/192) = 1.56

and

√V(Y) = 1.25
3. In the long run, if n = 10 items are inspected each day, this inspection policy should
discover, on average, 5/4 defective items per day, with a standard deviation of 1.25. The
calculations should be checked by actually finding the unconditional distribution of Y and
computing E(Y) and V(Y) directly.
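One such check, short of deriving the unconditional distribution, is simulation. The Python sketch below (the seed and sample size are arbitrary) draws p from the uniform distribution on (0, 1/4) and then Y | p from the binomial distribution; it should reproduce a mean and a standard deviation of about 1.25.

import numpy as np

rng = np.random.default_rng(2)
p = rng.uniform(0.0, 0.25, size=1_000_000)   # daily defect probability
y = rng.binomial(10, p)                      # defectives among n = 10 sampled items

print(y.mean(), y.std())                     # both should be near 5/4 = 1.25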
________________________________________________________________________
Exercises
6.34. Let X1 and X2 denote the proportions of time, out of one workweek, that employees
I and II, respectively, actually spend performing their assigned tasks. The joint relative
frequency behavior of X1 and X2 is modeled by the probability density function
f(x1, x2) = x1 + x2,  0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1
          = 0,        elsewhere
a. If employee I works half of the time, what is the mean time employee II works?
b. If employee I works half of the time, what is the variance of the time employee II
works?
6.35. An environmental engineer measures the amount (by weight) of particulate
pollution in air samples (of a certain volume) collected over the smokestack of a coal
fueled power plant. Let X1 denote the amount of pollutant per sample when a certain
cleaning device on the stack is not operating, and let X2 denote the amount of pollutants
per sample when the cleaning device is operating under similar environmental conditions.
The relative frequency of (X1, X2) can be modeled by
f(x1, x2) = 1,  0 ≤ x1 ≤ 2, 0 ≤ x2 ≤ 1, 2 x2 ≤ x1
          = 0,  elsewhere
a. If the amount of pollutant is 0.5 when the cleaning device is operating, what is the
mean of the pollutant emitted when the cleaning device is not operating?
b. If the amount of pollutant is 0.5 when the cleaning device is operating, what is the
variance of the pollutant emitted when the cleaning device is not operating?
6.36. Refer to exercise 6.35.
a. If the amount of pollutant is 0.5 when the cleaning device is not operating, what is the
mean of the pollutant emitted when the cleaning device is operating?
b. If the amount of pollutant is 0.5 when the cleaning device is not operating, what is the
variance of the pollutant emitted when the cleaning device is operating?
6.37. Referring to Exercise 6.31, if a randomly observed customer’s total time between
arrival and leaving the service window is known to have been 2 minutes, find the
probability that the customer waited less than 1 minute to be served.
6.38. Referring to Exercise 6.31, suppose that a customer spends a length of time y1 in
the fast-food outlet. Find the probability that this customer spends less than half of that
time at the service window.
6.39. Suppose that Y is the random variable denoting the number of bacteria per cubic
centimeter in water samples and that, for a given location, Y has a Poisson distribution
with mean λ. But λ varies from location to location and has a gamma distribution with
parameters α and β. Find the expressions for E(Y) and V(Y).
6.40. The number of eggs N found in nests of a certain species of turtle has a Poisson
distribution with mean λ. Each egg has probability p of being viable, and this event is
independent from egg to egg. Find the mean and the variance of the number of viable
eggs per nest.
6.41. Let Y have a geometric distribution; that is, let

p(y) = p(1 − p)^y,  y = 0, 1, ...

The goal is to find V(Y) through a conditional argument. Let X = 1 if the first trial is a
success, and let X = 0 if the first trial is a failure.
a. Argue that

E(Y^2 | X = 1) = 0  and  E(Y^2 | X = 0) = E[(1 + Y)^2]

b. Write E(Y^2) in terms of E(Y^2 | X).
c. Simplify the expression in part (b) to show that

E(Y^2) = (1 − p)(2 − p) / p^2

d. Show that

V(Y) = (1 − p) / p^2
6.6 The Multinomial Distribution
Suppose that an experiment consists of n independent trials, much like the binomial case,
except that each trial can result in any one of k possible outcomes. For example, a
customer checking out of a grocery store may choose any one of k checkout counters.
Now suppose that the probability that a particular trial results in outcome i is denoted by
pi, for i = 1, 2, …, k, and let Yi denote the number of the n trials that result in outcome i.
developing a formula for P(Y1 = y1 , Y2 = y 2 , … , Yk = y k ) , we first observe that, because
of independence of trials, the probability of having y1 outcomes of type 1 through yk
outcomes of type k in a particular order is
p1^{y1} p2^{y2} ⋯ pk^{yk}

The number of such orderings equals the number of ways to partition the n trials into y1
type 1 outcomes, y2 type 2 outcomes, and so on through yk type k outcomes, or

n! / (y1! y2! ⋯ yk!)

where

Σ_{i=1}^{k} yi = n

Hence,

P(Y1 = y1, Y2 = y2, …, Yk = yk) = [n! / (y1! y2! ⋯ yk!)] p1^{y1} p2^{y2} ⋯ pk^{yk}
This is called the multinomial probability distribution. Notice that, for k = 2, we are back
to the binomial case, because Y2 is then equal to n − Y1. The following example
illustrates the computations involved.
Example 6.16:
Items under inspection are subject to two types of defects. About 70% of the items in a
large lot are judged to be free of defects; another 20% have only a type A defect; and the
final 10% have only a type B defect (none has both types of defect). If six of these items
are randomly selected from the lot, find the probability that three have no defects, one has
a type A defect, and two have type B defects.
Solution:
If we can legitimately assume that the outcomes are independent from trial to trial (that is,
from item to item in our sample), which they usually would be in a large lot, then the
multinomial distribution provides a useful model. Letting Y1, Y2, and Y3 denote the
number of trials resulting in zero, type A, and type B defectives, respectively, we have p1
= 0.7, p2 = 0.2, and p3 = 0.1. It follows that

P(Y1 = 3, Y2 = 1, Y3 = 2) = [6! / (3! 1! 2!)] (0.7)^3 (0.2)(0.1)^2 = 0.041
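For readers who want to confirm such arithmetic, scipy (if available) evaluates the multinomial probability directly; the counts, n, and cell probabilities below are those of this example.

from scipy.stats import multinomial

# P(Y1 = 3, Y2 = 1, Y3 = 2) for n = 6 trials with p = (0.7, 0.2, 0.1).
print(multinomial.pmf([3, 1, 2], n=6, p=[0.7, 0.2, 0.1]))   # about 0.041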
Example 6.17:
Find E(Yi) and V(Yi) for the multinomial probability distribution.
Solution:
We are concerned with the marginal distribution of Yi, the number of trials that fall in cell
i. Imagine that all the cells, excluding cell i, are combined into a single large cell. Hence,
every trial will result either in cell i or not in cell i, with probabilities pi and 1 – pi,
respectively; and Yi possesses a binomial marginal probability distribution. Consequently,
E (Yi ) = npi
V (Yi ) = npi qi
where
qi = 1 − pi
The same results can be obtained by setting up the expectations and evaluating. For
example,

E(Yi) = Σ_{y1} Σ_{y2} ⋯ Σ_{yk} yi [n! / (y1! y2! ⋯ yk!)] p1^{y1} p2^{y2} ⋯ pk^{yk}
Because we have already derived the expected value and the variance of Yi, the tedious
summation of this expectation is left to the interested reader.
Example 6.18:
If Y1, Y2, …, Yk have the multinomial distribution given in Example 6.16, find cov(Ys,Yt),
with s ≠ t.
Solution:
Thinking of the multinomial experiment as a sequence of n independent trials, we define

Ui = 1, if trial i results in class s
   = 0, otherwise

and

Wi = 1, if trial i results in class t
   = 0, otherwise

Then

Ys = Σ_{i=1}^{n} Ui  and  Yt = Σ_{j=1}^{n} Wj
To evaluate cov(Ys, Yt), we need the following results:

E(Ui) = ps,  E(Wj) = pt

cov(Ui, Wj) = 0,  if i ≠ j, since the trials are independent; and

cov(Ui, Wi) = E(UiWi) − E(Ui)E(Wi) = 0 − ps pt

because UiWi always equals zero (a single trial cannot fall in both class s and class t).
From Theorem 6.2, we then have

cov(Ys, Yt) = Σ_{i=1}^{n} Σ_{j=1}^{n} cov(Ui, Wj)
            = Σ_{i=1}^{n} cov(Ui, Wi) + ΣΣ_{i≠j} cov(Ui, Wj)
            = Σ_{i=1}^{n} (−ps pt) + 0
            = −n ps pt
The covariance is negative, which is to be expected because a large number of outcomes
in cell s would force the number in cell t to be small.
Multinomial Experiment
1. The experiment consists of n identical trials.
2. The outcome of each trial falls into one of k classes or cells.
3. The probability that the outcome of a single trial will fall in a particular cell—
say, cell i—is pi (i = 1, 2, …, k), and remains the same from trial to trial.
Notice that p1 + p2 + ⋯ + pk = 1.
4. The trials are independent.
5. The random variables of interest are Y1 , Y2 , …, Yk , where Yi (i = 1, 2, …, k) is
equal to the number of trials in which the outcome falls in cell i. Notice that
Y1 + Y2 + ⋯ + Yk = n
The Multinomial Distribution
P(Y1 = y1, Y2 = y2, …, Yk = yk) = [n! / (y1! y2! ⋯ yk!)] p1^{y1} p2^{y2} ⋯ pk^{yk}

where Σ_{i=1}^{k} yi = n and Σ_{i=1}^{k} pi = 1

E(Yi) = n pi,  V(Yi) = n pi (1 − pi),  i = 1, 2, …, k

cov(Yi, Yj) = −n pi pj,  i ≠ j
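These boxed formulas are easy to check empirically. The sketch below uses numpy's multinomial sampler with arbitrarily chosen n and cell probabilities; each printed pair should agree to simulation accuracy.

import numpy as np

n, p = 20, np.array([0.5, 0.3, 0.2])
rng = np.random.default_rng(3)
y = rng.multinomial(n, p, size=200_000)   # each row is (Y1, Y2, Y3)

print(y.mean(axis=0), n * p)                               # E(Yi) = n pi
print(y.var(axis=0), n * p * (1 - p))                      # V(Yi) = n pi (1 - pi)
print(np.cov(y[:, 0], y[:, 1])[0, 1], -n * p[0] * p[1])    # cov(Y1, Y2) = -n p1 p2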
Exercises
6.42. The National Fire Incident Reporting Service says that, among residential fires,
approximately 74% are in one- or two-family homes, 20% are in multi-family homes
(such as apartments), and the other 6% are in other types of dwellings. If five fires are
reported independently in one day, find the probability that 2 are in one- or two-family
homes, 2 are in multi-family homes, and 1 is in another type of dwelling.
6.43. The typical cost of damages for a fire in a family home is $50,000, whereas the
typical cost of damage in an apartment fire is $20,000, and in other dwellings only $5000.
Using the information given in Exercise 6.42, find the expected total damage cost for the
five independently reported fires.
6.44. In inspections of aircraft, wing cracks are reported as nonexistent, detectable, or
critical. The history of a certain fleet reveals that 70% of the planes inspected have no
wing cracks, 25% have detectable wing cracks, and 5% have critical wing cracks. For the
next four planes inspected, find the probabilities of the following events.
a. One has a critical crack, one has a detectable crack, and two have no cracks.
b. At least one critical crack is observed.
6.45. Of the 544,685 babies born in California in 2004, 24% were born in winter
(December, January, February), 24% were born in spring (March, April, May), 26% were
born in summer (June, July, August), and 26% were born in the fall (September, October,
November). If four babies are randomly selected, what is the approximate probability
that one was born in each season? Does it matter whether the babies were born in 2005?
In California?
6.46. The U.S. Bureau of Labor Statistics reported that, as of 2005, 19% of the adult
population under age 65 was between 16 and 24 years of age, 20% was between 25 and
34, 23% was between 35 and 44, and 38% was between 45 and 64. An automobile
manufacturer wants to obtain opinions on a new design from five randomly chosen adults
from the under 65 group of adults. Of the five people so selected, find the approximate
probability that two will be between 16 and 24, two will be between 25 and 44, and one
will be between 45 and 64.
6.47. Customers leaving a subway station can exit through any one of three gates.
Assuming that any particular customer is equally likely to select any one of the three
gates, find the probabilities of the following events among a sample of four customers.
a. Two select gate A, one selects gate B, and one selects gate C.
b. All four select the same gate.
c. All three gates are used.
6.48. Among a large number of applicants for a certain position, 60% have only a high-school education, 30% have some college training, and 10% have completed a college
degree. If five applicants are selected to be interviewed, find the probability that at least
one will have completed a college degree. What assumptions must be true for your
answer to be valid?
6.49. In a large lot of manufactured items, 10% contain exactly one defect, and 5%
contain more than one defect. Ten items are randomly selected from this lot for sale, and
the repair costs total
Y1 + 3Y2
where Y1 denotes the number among the ten having exactly one defect, and Y2 denotes the
number among the ten having two or more defects. Find the expected value of the repair
costs. Find the variance of the repair costs.
6.50. Referring to Exercise 6.49, if Y denotes the number of items that contain at least
one defect, find the probabilities of the following events among the ten sampled items.
a. Y is exactly 2
b. Y is at least 1
6.51. Vehicles arriving at an intersection can turn right or left or can continue straight
ahead. In a study of traffic patterns at this intersection over a long period of time,
engineers have noted that 40% of the vehicles turn left, 25% turn right, and the remainder
continue straight ahead.
a. For the next five cars entering this intersection, find the probability that one turns left,
one turns right, and three continue straight ahead.
b. For the next five cars entering the intersection, find the probability that at least one
turns right.
c. If 100 cars enter the intersection in a day, find the expected value and the variance of
the number that turn left. What assumptions must be true for your answer to be valid?
6.7 More on the Moment-Generating Function
The use of moment-generating functions to identify distributions of random variables is
particularly useful in work with sums of independent random variables. Suppose that X1
and X2 are independent random variables, and let Y = X 1 + X 2 . Using the properties of
expectations, we have
M_Y(t) = E(e^{tY}) = E(e^{t(X1 + X2)}) = E(e^{tX1} e^{tX2}) = E(e^{tX1}) E(e^{tX2}) = M_{X1}(t) M_{X2}(t)
That is, the moment generating function of the sum of two independent random variables
is the product of the moment-generating functions of the two variables. This result can
be extended to the sum of n independent random variables; that is, if X 1 , X 2 ,…, X n are n
independent random variables, the moment generating function of
Y = X 1 + X 2 + + X n is the product of the n moment generating functions:
M Y (t ) = M X 1 (t ) M X 2 (t )
M X n (t )
.
If we can then identify the form of the moment-generating function, we can determine the
distribution of Y, as illustrated in the next example.
Example 6.19:
Suppose that X1 and X2 are independent, exponential random variables, each with mean θ.
Define Y = X 1 + X 2 . Find the distribution of Y.
Solution:
The moment-generating functions of X1 and X2 are

M_{X1}(t) = M_{X2}(t) = (1 − θt)^{−1}

Thus, the moment-generating function of Y is

M_Y(t) = M_{X1}(t) M_{X2}(t) = (1 − θt)^{−1} (1 − θt)^{−1} = (1 − θt)^{−2}
Upon recognizing the form of this moment-generating function, we can immediately
conclude that Y has a gamma distribution, with α = 2 and β = θ. This result can be
generalized to the sum of independent gamma random variables with common scale
parameter β.
Note: A key to this method is identifying the moment-generating function of the derived
random variable Y.
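A simulation can corroborate the identification. In the sketch below (numpy and scipy assumed available; θ = 3 is arbitrary), sums of two independent exponential samples are compared to the gamma distribution with α = 2 and β = θ using a Kolmogorov-Smirnov test; a large p-value is consistent with the moment-generating-function argument.

import numpy as np
from scipy import stats

theta = 3.0
rng = np.random.default_rng(4)
y = rng.exponential(theta, size=(100_000, 2)).sum(axis=1)   # X1 + X2

# Compare to gamma with shape 2, location 0, scale theta.
print(stats.kstest(y, "gamma", args=(2, 0, theta)))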
Example 6.20:
Let X1 denote the number of vehicles passing a particular point on the eastbound lane of a
highway in one hour. Suppose that the Poisson distribution with mean λ1 is a reasonable
model for X1. Now, let X2 denote the number of vehicles passing a point on the
westbound lane of the same highway in 1 hour. Suppose that X2 has a Poisson
distribution with mean λ2. Of interest is Y = X1 + X2, the total traffic count in both lanes
in one hour. Find the probability distribution for Y if X1 and X2 are assumed to be
independent.
Solution:
It is known from Chapter 4 that

M_{X1}(t) = e^{λ1(e^t − 1)}

and

M_{X2}(t) = e^{λ2(e^t − 1)}

By the properties of moment-generating functions described earlier,

M_Y(t) = M_{X1}(t) M_{X2}(t) = e^{λ1(e^t − 1)} e^{λ2(e^t − 1)} = e^{(λ1 + λ2)(e^t − 1)}

Now, the moment-generating function for Y has the form of a Poisson moment-generating
function, with mean λ1 + λ2. Thus, by the uniqueness property, Y must have a
Poisson distribution with mean λ1 + λ2. Being able to add independent Poisson random
variables and still retain the Poisson properties is important in many applications.
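The result can also be verified exactly, without simulation: convolving the two Poisson probability functions must reproduce the Poisson(λ1 + λ2) probabilities. The sketch below (scipy assumed available; the rates are arbitrary) does just that.

import numpy as np
from scipy.stats import poisson

lam1, lam2 = 2.0, 3.0
k = np.arange(60)   # support truncated where the remaining mass is negligible
conv = np.convolve(poisson.pmf(k, lam1), poisson.pmf(k, lam2))[: len(k)]

# Maximum discrepancy from the Poisson(lam1 + lam2) pmf; should be near 1e-16.
print(np.abs(conv - poisson.pmf(k, lam1 + lam2)).max())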
Exercises
6.52. A quality control inspector randomly selects n1 items from the production line in the
morning and determines the number of defectives, X1. She randomly selects n2 items
from the same production line in the afternoon and determines the number of defectives,
X2.
a. Find the distribution of the number of defectives observed on a randomly selected day.
What assumptions did you have to make?
b. If n1 = 5 and n2 =4, what is the probability of not obtaining any defectives if the line is
producing 1% defectives?
6.53. Let X1 and X2 denote independent, normally distributed random variables, not
necessarily having the same mean and variance.
a. Show that, for any constants a and b, Y = aX 1 + bX 2 is normally distributed.
b. Use the moment generating function to find the mean and variance of Y.
6.54. Resistors of a certain type have resistances that are normally distributed, with a
mean of 100 ohms and a standard deviation of 10 ohms. Two such resistors are
connected in series, which causes the total resistance of the circuit to be the sum of the
individual resistances. Find the probabilities of the following events.
a. The total resistance exceeds 220 ohms.
b. The total resistance is less than 190 ohms.
6.55. A certain type of elevator has a maximum weight capacity X1, which is normally
distributed with a mean and a standard deviation of 5000 and 300 pounds, respectively.
For a certain building equipped with this type of elevator, the elevator loading X2 is a
normally distributed random variable with a mean and a standard deviation of 4000 and
400 pounds, respectively. For any given time that the elevator is in use, find the
probability that it will be overloaded, assuming that X1 and X2 are independent.
6.8 Compounding and its Applications
The univariate probability distributions of Chapters 4 and 5 depend on one or more
parameters; once the parameters are known, the distributions are specified completely.
These parameters frequently are unknown, however, and (as in Example 6.15) sometimes
may be regarded as random quantities. Assigning distributions to these parameters and
then finding the marginal distribution of the original random variable is known as
compounding. This process has theoretical as well as practical uses, as we see next.
Example 6.21:
Suppose that Y denotes the number of bacteria per cubic centimeter in a certain liquid and
that, for a given location, Y has a Poisson distribution with mean λ. Suppose, too, that λ
varies from location to location and that, for a location chosen at random, λ has a gamma
distribution with parameters α and β, where α is a positive integer. Find the probability
distribution for the bacteria count Y at a randomly selected location.
Solution:
Because λ is random, the Poisson assumption applies to the conditional distribution of Y
for fixed λ. Thus,
p(y | λ) = λ^y e^{−λ} / y!,  y = 0, 1, 2, ...
Also,
f(λ) = [1 / (Γ(α) β^α)] λ^{α−1} e^{−λ/β},  λ > 0
     = 0,                                  otherwise
Then the joint distribution of λ and Y is given by

g(y, λ) = p(y | λ) f(λ) = [1 / (y! Γ(α) β^α)] λ^{y+α−1} e^{−λ[1 + (1/β)]}

The marginal distribution of Y is found by integrating over λ; this yields

p(y) = [1 / (y! Γ(α) β^α)] ∫_0^∞ λ^{y+α−1} e^{−λ[1 + (1/β)]} dλ
     = [1 / (y! Γ(α) β^α)] Γ(y + α) (1 + 1/β)^{−(y+α)}
Since α is an integer,

p(y) = [(y + α − 1)! / ((α − 1)! y!)] (1/β)^α (1 + 1/β)^{−(y+α)}
     = [(y + α − 1)! / ((α − 1)! y!)] (1 / (1 + β))^α (β / (1 + β))^y,  y = 0, 1, 2, ...
If we let α = r and 1/(1 + β) = p, then p(y) has the form of a negative binomial distribution.
Hence, the negative binomial distribution is a reasonable model for counts in which the
mean count may vary.
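The identification can be checked numerically. In the sketch below (numpy and scipy assumed available; α = 3 and β = 2 are arbitrary), counts generated by the two-stage gamma-Poisson scheme are compared with the negative binomial probabilities with r = α and p = 1/(1 + β).

import numpy as np
from scipy.stats import nbinom

alpha, beta = 3, 2.0
rng = np.random.default_rng(5)
lam = rng.gamma(alpha, beta, size=500_000)   # location-to-location mean counts
y = rng.poisson(lam)                         # counts at randomly chosen locations

k = np.arange(5)
print(np.array([(y == v).mean() for v in k]))      # simulated relative frequencies
print(nbinom.pmf(k, alpha, 1.0 / (1.0 + beta)))    # negative binomial probabilities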
Exercises
6.56. Suppose that a customer arrives at a checkout counter in a store just as the counter
is opening. A random number of customers N will be ahead of him, since some
customers may arrive early. Suppose that this number has the probability distribution
p(n) = P(N = n) = p q^n,  n = 0, 1, 2, ...
where 0 < p < 1 and q = 1 – p (this is a form of the geometric distribution). Customer
service times are assumed to be independent and identically distributed exponential
random variables, with a mean of θ.
a. For a given n, find the waiting time W for the customer to complete his checkout.
b. Find the distribution of waiting time for the customer to complete his checkout.
c. Find the mean and variance of the waiting time for the customer to complete his
checkout.
6.9 Summary
Most phenomena that one studies, such as the effect of a drug on the body or the life
length of a computer system, are the result of many variables acting jointly. In such
studies, it is important to look not only at the marginal distributions of random variables
individually, but also at the joint distribution of the variables acting together and at the
conditional distributions of one variable for fixed values of another. Expected values of
joint distributions lead to covariance and correlation, which help assess the direction and
strength of association between two random variables. Moment-generating functions
help identify distributions as being of a certain type (more on this in the next chapter) and
help in the calculation of expected values.
As in the univariate case, multivariate probability distributions are models of
reality and do not precisely fit real situations. Nevertheless, they serve as very useful
approximations that help build understanding of the world around us.
Supplementary Exercises
6.57. Consider the joint probability density function given below:

f(x1, x2) = c x1 x2,  0 ≤ x2 ≤ x1, 0 ≤ x1 ≤ 2
          = 0,        otherwise

a. Determine the constant c that makes f a legitimate joint probability density function.
b. Find the marginal density of X1.
c. Find the marginal density of X2.
d. Find the joint distribution function for X1 and X2.
e. Find the probability P(X1 < 1/2, X2 < 3/4).
f. Find the probability P(X1 ≤ 1/2 | X2 > 3/4).
6.58. Refer to Exercise 6.57.
a. Find the conditional density of X2 given X1.
b. Find the expected value of X2 given X1.
c. Are X1 and X2 independent?
6.59. The joint density of X1 and X2 is given by
f(x1, x2) = 3 x1,  0 ≤ x2 ≤ x1 ≤ 1
          = 0,     otherwise
a. Find the marginal density functions of X1 and X2.
b. Find the joint distribution function.
c. Find P( X 1 ≤ 3 / 4, X 2 ≤ 1 / 2).
d. Find P( X 1 ≤ 1 / 2 | X 2 ≥ 3 / 4).
6.60. Refer to the situation described in Exercise 6.59.
a. Find the conditional density of X1, given X2 = x2.
b. Find the conditional density of X2, given X1 = x1.
c. Are X1 and X2 independent? Justify your answer.
d. Find the probability P( X 1 ≤ 3 / 4 | X 2 = 1 / 2).
6.61. A committee of three persons is to be randomly selected from a group consisting of
four Republicans, three Democrats, and two independents. Let X1 denote the number of
Republicans on the committee, and let X2 denote the number of Democrats on the
committee.
a. Find the joint probability distribution of X1 and X2.
b. Find the marginal distributions of X1 and X2.
c. Find the probability P( X 1 = 1 | X 2 ≥ 1).
6.62. The joint density of X1 and X2 is given by
f(x1, x2) = 6 x1^2 x2,  0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1
          = 0,          otherwise
a. Find the mean and variance of X1.
b. Find the mean and variance of X2.
6.63. Refer to Exercise 6.62.
a. Find the covariance of X1 and X2.
b. Find the correlation between X1 and X2.
6.64. Let X1 denote the amount of a certain bulk item stocked by a supplier at the
beginning of a week, and suppose that X1 has a uniform distribution over the interval 0 ≤
x1 ≤ 1. Let X2 denote the amount of this item sold by the supplier during the week, and
suppose that X2 has a uniform distribution over the interval 0 ≤ x2 ≤ x1, where x1 is a
specific value of X1.
a. Find the joint density function of X1 and X2.
b. If the supplier stocks an amount of the item of ½, what is the probability that she sells
an amount greater than ¼?
6.65. Let X and Y have the joint probability density function given by
f(x, y) = 3y,  0 ≤ x ≤ y, 0 ≤ y ≤ 1
        = 0,   otherwise

a. Find the marginal density of X.
b. Find the marginal density of Y.
c. Find the conditional density of Y given X = x.
d. Find the expected value of Y given X = x.
e. Are X and Y independent? Justify your answer.
6.66. Let X and Y have joint density given by
f(x, y) = 2,  0 ≤ x ≤ y, 0 ≤ y ≤ 1
        = 0,  otherwise
a. Find the covariance of X and Y.
b. Find the correlation coefficient. What, if anything, does it tell you about the
relationship of X and Y?
6.67. Let (X1, X2) denote the coordinates of a point at random inside a unit circle, with
center at the origin; that is, X1 and X2 have a joint density function given by
f(x1, x2) = 1/π,  x1^2 + x2^2 ≤ 1
          = 0,    otherwise
a. Find the marginal density function of X1.
b. Find P( X 1 ≤ X 2 )
6.68. The joint density of X1 and X2 is given by
f(x1, x2) = x1 + x2,  0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1
          = 0,        otherwise
a. Find the marginal density functions of X1 and X2.
b. Are X1 and X2 independent? Justify your answer.
c. Find the conditional density of X1 given X2 = x2.
6.69. Let X1 and X2 have the joint density function given by
f(x1, x2) = K,  0 ≤ x1 ≤ 2, 0 ≤ x2 ≤ 1, 2 x2 ≤ x1
          = 0,  otherwise

a. Find the value of K that makes the function a probability density.
b. Find the marginal density functions of X1 and X2.
c. Find the conditional density of X1 given X2 = x2.
d. Find the conditional density of X2 given X1 = x1.
e. Find P(X1 ≤ 1.5, X2 ≤ 0.5).
f. Find P(X2 ≤ 0.5 | X1 = 1.5).
6.70. Let X1 and X2 have a joint distribution that is uniform over the region shaded in the
accompanying diagram.
a. Find the marginal density of X2.
b. Find the marginal density of X1.
c. Find the conditional density function of X2 given X1.
d. Find P[(X1 − X2) ≥ 0].
6.71. Refer to the situation described in Exercise 6.70.
a. Find E(X1).
b. Find V(X1).
c. Find cov(X1, X2).
6.72. Referring to Exercises 6.57 and 6.58, find cov(X1, X2).
6.73. Refer to the situation described in Exercise 6.61.
a. Find cov(X1, X2).
b. Find E ( X 1 + X 2 ) and V ( X 1 + X 2 ) by finding the probability distribution of X 1 + X 2 .
c. Find E ( X 1 + X 2 ) and V ( X 1 + X 2 ) by using Theorem 6.2.
6.74. Refer to the situation described in Exercise 6.68.
a. Find cov(X1, X2).
b. Find E (3 X 1 − 2 X 2 ) .
c. Find V (3 X 1 − 2 X 2 ) .
6.75. Refer to the situation described in Exercise 6.69.
a. Find E ( X 1 + 3 X 2 ) .
b. Find V ( X 1 + 3 X 2 ) .
6.76. A quality control plan calls for randomly selecting three items from the daily
production (assumed to be large) of a certain machine and observing the number of
defectives. The proportion p of defectives produced by the machine varies from day to
day and is assumed to have a uniform distribution on the interval (0, 1). For a randomly
chosen day, find the unconditional probability that exactly two defectives are observed in
the sample.
6.77. The number of defects per year, denoted by X, for a certain fabric is known to have
a Poisson distribution with parameter λ. However, λ is not known and is assumed to be
random, with its probability density function given by
f(λ) = e^{−λ},  λ ≥ 0
     = 0,       otherwise
Find the unconditional probability function for X.
6.78. Prove that, if X1 and X2 are continuous, independent random variables, cov(X1, X2)
= 0.
6.79. The time that it takes any given machine of a particular type to fail is
exponentially distributed with parameter λ. However, λ varies from machine to machine
according to a gamma distribution with parameters r and θ.
a. Find the mean time that it would take for a randomly selected machine to fail.
b. Find the variance of the time that it would take for a randomly selected machine to
break down.
6.80. The length of life X of a fuse has the probability density
f(x) = (1/θ) e^{−x/θ},  x > 0, θ > 0
     = 0,               otherwise
Three such fuses operate independently. Find the joint density of their lengths of life, X1,
X2, and X3.
6.81. A retail grocer figures that his daily gain X from sales is a normally distributed
random variable, with μ = 50 and σ^2 = 10 (measurements in dollars). X could be negative
if he is forced to dispose of perishable goods. He also figures daily overhead costs Y to
have a gamma distribution, with α = 4 and β = 2. If X and Y are independent, find the
expected value and the variance of his net daily gain. Would you expect his net gain for
tomorrow to go above $70?
6.82. A coin has probability p of coming up heads when tossed. In n independent tosses
of the coin, let Xi = 1 if the ith toss results in heads and Xi = 0 if the ith toss results in tails.
Then Y, the number of heads in the n tosses, has a binomial distribution and can be
represented as
Y = Σ_{i=1}^{n} Xi
Find E(Y) and V(Y) using Theorem 6.2.
6.83. Refer to the situation and results of Exercises 6.57 and 6.58.
a. Find E ( X 2 | X 1 = x1 ) .
b. Use Theorem 6.3 to find E(X2).
c. Find E ( X 2 ) directly from the marginal density of X2.
6.84. Refer to the situation described in Exercise 6.77.
a. Find E(X) by first finding the conditional expectation of X for given λ and then using
Theorem 6.3.
b. Find E(X) directly from the probability distribution of X.
6.85. Referring to Exercise 6.64, if the supplier stocks an amount equal to 3/4, what is the
expected amount that she will sell during the week?
6.86. Among typical July days in Tampa, 30% have total radiation of at most 5 calories,
60% have total radiation of more than 5 calories but no more than 6 calories, and 10%
have total radiation of more than 6 calories. A solar collector for a hot water system is to
be run for 6 days. Find the probability that 3 days will produce no more than 5 calories
each, 1 day will produce between 5 and 6 calories, and 2 days will produce at least 6
calories each. What assumption must be true for your answer to be correct?
6.87. Let X be a continuous random variable with distribution function F(x) and density
function f(x). We can then write, for x1 ≤ x2,
P(X ≤ x2 | X ≥ x1) = [F(x2) − F(x1)] / [1 − F(x1)]
As a function of x2 for fixed x1, the right-hand side of this expression is called the
conditional distribution function of X, given that X ≥ x1. On taking the derivative with
respect to x2, we see that the corresponding conditional density function is given by
f(x2) / [1 − F(x1)],  x2 ≥ x1
Suppose that a certain type of electronic component has life length X, with the density
function (life length measured in hours)
f(x) = (1/400) e^{−x/400},  x ≥ 0
     = 0,                   otherwise
Find the expected length of life for a component of this type that has already been in use
for 100 hours.
6.88. Let X1, X2, and X3 be random variables, either continuous or discrete. The joint
moment-generating function of X1, X2, and X3 is defined by
M(t1, t2, t3) = E( e^{t1 X1 + t2 X2 + t3 X3} )
a. Show that M(t, t, t) gives the moment-generating function of X1 + X2 + X3.
b. Show that M(t, t, 0) gives the moment-generating function of X1 + X2.
c. Show that

∂^{k1+k2+k3} M(t1, t2, t3) / (∂t1^{k1} ∂t2^{k2} ∂t3^{k3}) |_{t1 = t2 = t3 = 0} = E( X1^{k1} X2^{k2} X3^{k3} )
6.89. Let X1, X2, and X3 have a multinomial distribution with probability function
p(x1, x2, x3) = [n! / (x1! x2! x3!)] p1^{x1} p2^{x2} p3^{x3},  Σ_{i=1}^{3} xi = n
Employ the results of Exercise 6.88 to perform the following tasks.
a. Find the joint moment-generating function of X1, X2, and X3.
b. Use the joint moment-generating function to find cov(X1, X2).
6.90. The negative binomial variable X is defined as the number of failures prior to
the rth success in a sequence of independent trials with constant probability p of success
on each trial. Let Xi denote a geometric random variable, defined as the number of
failures prior to the trial on which the first success occurs. Then, we can write
X = Σ_{i=1}^{r} Xi

for independent random variables X1, X2, …, Xr. Use Theorem 6.2 to show that
E(X) = r(1 − p)/p and V(X) = r(1 − p)/p^2.
6.91. A box contains four balls, numbered 1 through 4. One ball is selected at random
from this box. Let
X1 = 1 if ball 1 or ball 2 is drawn
X2 = 1 if ball 1 or ball 3 is drawn
X3 = 1 if ball 1 or ball 4 is drawn
and let the values of Xi be zero otherwise. Show that any two of the random variables X1,
X2, and X3 are independent, but the three together are not.
6.92. Let X1 and X2 be jointly distributed random variables with finite variances.
a. Show that [E(X1X2)]^2 ≤ E(X1^2) E(X2^2). (Hint: Observe that, for any real number t,
E[(tX1 − X2)^2] ≥ 0; or equivalently,

t^2 E(X1^2) − 2t E(X1X2) + E(X2^2) ≥ 0

This is a quadratic expression of the form At^2 + Bt + C; and because it is not negative,
we must have B^2 − 4AC ≤ 0. The preceding inequality follows directly.)
b. Let ρ denote the correlation coefficient of X1 and X2; that is,

ρ = cov(X1, X2) / √( V(X1) V(X2) )

Using the inequality of part (a), show that ρ^2 ≤ 1.
6.93. A box contains N1 white balls, N2 black balls, and N3 red balls ( N1 + N 2 + N 3 = N ).
A random sample of n balls is selected from the box, without replacement. Let X1, X2,
and X3 denote the number of white, black, and red balls, respectively, observed in the
sample. Find the correlation coefficient for X1 and X2. (Let pi = N i / N for i = 1, 2, 3.)
6.94. Let X1, X2, …, Xn be independent random variables with E ( X i ) = μ and
V ( X i ) = σ 2 , for i = 1, 2, …, n. Let
U1 = Σ_{i=1}^{n} ai Xi

and

U2 = Σ_{i=1}^{n} bi Xi

where a1, a2, …, an and b1, b2, …, bn are constants. U1 and U2 are said to be orthogonal if
cov(U1, U2) = 0. Show that U1 and U2 are orthogonal if and only if Σ_{i=1}^{n} ai bi = 0.
6.95. The life length X of fuses of a certain type is modeled by the exponential
distribution with
f(x) = (1/3) e^{−x/3},  x > 0
     = 0,               otherwise
The measurements are in hundreds of hours.
a. If two such fuses have independent life lengths X1 and X2, find their joint probability
density function.
b. One fuse in part (a) is in a primary system, and the other is in a backup system that
comes into use only if the primary system fails. The total effective life length of the two
fuses is then X 1 + X 2 . Find P( X 1 + X 2 ≤ 1) .
6.96. Referring to Exercise 6.95, suppose that three such fuses are operating
independently in a system.
a. Find the probability that exactly two of the three last longer than 500 hours.
b. Find the probability that at least one of the three fails before 500 hours.