STAT 421 Lecture Notes

3.5 Marginal Distributions

Definition 3.5.1. Suppose that X and Y have a joint distribution. The c.d.f. of X derived by integrating (or summing) over the support of Y is called the marginal c.d.f. of X. The p.f. or p.d.f. associated with the marginal c.d.f. is called the marginal p.f. (or p.d.f.).

Theorem 3.5.1. If X and Y have a discrete joint distribution with p.f. f, then the marginal p.f. of X is
$$f_1(x) = \sum_y f(x, y).$$
The notation $\sum_y$ implies that the summation is over all values of y such that f(x, y) is positive.

Recall the urn example in which an urn contains 4 green, 6 red, and 10 black balls. Two balls are drawn randomly and without replacement. Let X count the number of green balls and Y count the number of red balls. The (joint) probability distribution for (X, Y) is determined by counting the number of ways to draw 0 ≤ x ≤ 2 from the 4 greens and 0 ≤ y ≤ 2 from the 6 reds, and the number of ways to draw 2 − x − y from the 10 black balls, given that x + y ≤ 2. The joint probability function is
$$f(x, y) = \begin{cases} \dfrac{\binom{4}{x}\binom{6}{y}\binom{10}{2-x-y}}{\binom{20}{2}}, & x, y \in \{0, 1, 2\} \text{ and } x + y \le 2, \\[1ex] 0, & \text{otherwise.} \end{cases} \tag{1}$$

The p.f. may also be presented by enumerating all possible values of (X, Y) and their probabilities. Table 1 illustrates.

Table 1: The joint probability function for the number of green (X) and red (Y) balls from the urn example.

              y
   x       0      1      2
   0    .237   .316   .079
   1    .210   .126      0
   2    .032      0      0

The marginal distributions of X and Y can be obtained by summing along rows (yielding the marginal p.f. of X) and along columns (yielding the marginal p.f. of Y). The result is displayed in Table 2.

Table 2: Joint and marginal probability functions for the number of green (X) and red (Y) balls from the urn example.

              y
   x       0      1      2   f1(x)
   0    .237   .316   .079    .632
   1    .210   .126      0    .336
   2    .032      0      0    .032
 f2(y)  .479   .442   .079       1

Theorem 3.5.2. If X and Y have a continuous joint distribution with p.d.f. f, then the marginal p.d.f. f_1 of X is
$$f_1(x) = \int_{-\infty}^{\infty} f(x, y)\, dy \quad \text{for } -\infty < x < \infty.$$
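The joint p.f. in equation (1) and the marginals in Table 2 can be verified numerically. A minimal Python sketch (exact arithmetic via `fractions`; the function names are illustrative, not from the text):

```python
from fractions import Fraction
from math import comb

# Joint p.f. from equation (1): x greens (of 4), y reds (of 6),
# 2 - x - y blacks (of 10), out of C(20, 2) equally likely draws.
def f(x, y):
    if x < 0 or y < 0 or x + y > 2:
        return Fraction(0)
    return Fraction(comb(4, x) * comb(6, y) * comb(10, 2 - x - y), comb(20, 2))

# Marginals (Theorem 3.5.1): sum over the other variable.
f1 = {x: sum(f(x, y) for y in range(3)) for x in range(3)}
f2 = {y: sum(f(x, y) for x in range(3)) for y in range(3)}

print(float(f(0, 1)))    # 0.3157... (Table 1 entry .316)
print(float(f1[0]))      # 0.6315... (Table 2 marginal .632)
print(sum(f1.values()))  # 1
```

Summing `f1` (or `f2`) over its support returns exactly 1, confirming that each marginal is a valid p.f.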
The proof is based on recognizing that a probability involving X alone, say Pr(X ≤ x), is equivalent to Pr(X ≤ x, Y < ∞). The condition that Y may take on any value y ∈ ℝ leads to the integral
$$\Pr(X \le x, Y < \infty) = \int_{-\infty}^{x} \int_{-\infty}^{\infty} f(x, y)\, dy\, dx = \int_{-\infty}^{x} f_1(x)\, dx.$$

Example 3.5.3. Suppose that X and Y have the following joint p.d.f.:
$$f(x, y) = \begin{cases} \dfrac{21}{4} x^2 y, & x^2 \le y \le 1, \\[1ex] 0, & \text{otherwise.} \end{cases}$$
To find the marginals, the support of X and Y must be determined from the condition x² ≤ y ≤ 1. Obviously, the support is limited to y ≤ 1. Furthermore, 0 ≤ x² ⇒ 0 ≤ y. Finally, x² ≤ y implies that −√y ≤ x ≤ √y.

[Figure: the support region {(x, y) : x² ≤ y ≤ 1}, shown in red, for −1 ≤ x ≤ 1 and 0 ≤ y ≤ 1.]

Then
$$f_1(x) = \frac{21}{4} \int_{x^2}^{1} x^2 y \, dy = \frac{21}{4} x^2 \left[\frac{y^2}{2}\right]_{x^2}^{1} = \begin{cases} \dfrac{21}{8} x^2 (1 - x^4), & -1 \le x \le 1, \\[1ex] 0, & \text{otherwise.} \end{cases}$$

The marginal p.d.f. of Y is
$$f_2(y) = \frac{21}{4} \int_{-\sqrt{y}}^{\sqrt{y}} x^2 y \, dx = \frac{21}{12} y \left[x^3\right]_{-\sqrt{y}}^{\sqrt{y}} = \begin{cases} \dfrac{7}{2} y^{5/2}, & 0 < y < 1, \\[1ex] 0, & \text{otherwise.} \end{cases}$$
Notice that the bounds of integration are determined by the inequality x² ≤ y ⇒ −√y ≤ x ≤ √y.

Theorem 3.5.3. Suppose that X is discrete and Y is continuous, and the joint p.f./p.d.f. is f. Then, the marginal p.f. of X is
$$f_1(x) = \int_{-\infty}^{\infty} f(x, y)\, dy, \quad \text{for all } x.$$
The marginal p.d.f. of Y is
$$f_2(y) = \sum_x f(x, y), \quad \text{for } y \in \mathbb{R}.$$

Example 3.5.4. Suppose that X is discrete and Y is continuous, and the joint p.f./p.d.f. is
$$f(x, y) = \begin{cases} \dfrac{x y^{x-1}}{3}, & x \in \{1, 2, 3\},\ 0 < y < 1, \\[1ex] 0, & \text{otherwise.} \end{cases}$$
Then, the marginal p.f. of X is
$$f_1(x) = \int_0^1 \frac{x y^{x-1}}{3}\, dy = \frac{1}{3}\left[y^x\right]_0^1 = \begin{cases} \dfrac{1}{3}, & x \in \{1, 2, 3\}, \\[1ex] 0, & \text{otherwise.} \end{cases}$$

It's useful to establish some notation not used by DeGroot and Schervish. Let I_A denote the indicator function of the set A, defined according to
$$I_A(x) = \begin{cases} 1, & \text{if } x \in A, \\ 0, & \text{if } x \notin A. \end{cases}$$
Now, f_1 can be defined without separate cases:
$$f_1(x) = \frac{1}{3} I_{\{1,2,3\}}(x).$$
Returning to the example, the marginal p.d.f. of Y is
$$f_2(y) = \sum_{x=1}^{3} \frac{x y^{x-1}}{3} = \frac{1}{3}\left(1 + 2y + 3y^2\right) I_{(0,1)}(y).$$
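A quick numeric sanity check on Example 3.5.3: each marginal should integrate to 1 over its support. The sketch below uses a simple midpoint rule in pure Python (the helper name and grid size are arbitrary choices for illustration):

```python
# Midpoint-rule check that the marginals of Example 3.5.3 integrate to 1.
def midpoint_integral(g, a, b, n=20000):
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

f1 = lambda x: 21 / 8 * x**2 * (1 - x**4)  # marginal of X on [-1, 1]
f2 = lambda y: 7 / 2 * y**2.5              # marginal of Y on (0, 1)

print(midpoint_integral(f1, -1.0, 1.0))  # ~1.0
print(midpoint_integral(f2, 0.0, 1.0))   # ~1.0
```

Both integrals return 1 to within the quadrature error, as required of a p.d.f.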
While it is always possible to compute the marginal distributions from the joint distribution, the reverse is generally not: the joint p.f./p.d.f. of random variables X and Y cannot be derived from the marginal distributions of X and Y unless X and Y are independent random variables.

Definition 3.5.2. Random variables X and Y are independent if, for every two sets of real numbers A and B,
$$\Pr(X \in A, Y \in B) = \Pr(X \in A) \Pr(Y \in B).$$

If X and Y are independent, then
$$\Pr(X \le x, Y \le y) = \Pr(X \le x) \Pr(Y \le y) \;\Rightarrow\; F(x, y) = F_1(x) F_2(y).$$
Thus, when X and Y are independent, the joint cumulative distribution function of X and Y can be constructed as the product of the univariate cumulative distribution functions.

Theorem 3.5.4. X and Y are independent if and only if F(x, y) = F_1(x) F_2(y) for all real numbers x and y.

The next theorem is very useful for proving that two (or more) random variables are independent.

Theorem 3.5.5. Suppose that X and Y have a joint p.f./p.d.f. f. Then, X and Y are independent if and only if
$$f(x, y) = h_1(x) h_2(y) \quad \forall x, y \in \mathbb{R}, \tag{2}$$
where h_1 is a nonnegative function of x alone and h_2 is a nonnegative function of y alone.

Corollary 3.5.1 extends Theorem 3.5.5.

Corollary 3.5.1. X and Y are independent if and only if
$$f(x, y) = f_1(x) f_2(y) \quad \forall x, y \in \mathbb{R}, \tag{3}$$
where f_1 and f_2 are the marginal p.f./p.d.f.'s of X and Y.

Simply factoring f does not necessarily yield the marginals f_1 and f_2, since h_1 may differ from f_1 by a multiplicative constant (and similarly, factoring may yield h_2(y) = c f_2(y) for some c ≠ 1).

A more intuitive definition of independent random variables is presented in the next section on conditional probability, but looking ahead, it will be stated that X and Y are independent discrete random variables if knowing that y is the realized value of Y does not change the probability that X takes on any particular value.
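The mixed-case marginals of Example 3.5.4 can be checked the same way, following Theorem 3.5.3: integrate the joint over y to get the marginal p.f. of X, and sum over x to get the marginal p.d.f. of Y. A sketch (function names are illustrative):

```python
# Numeric check of Example 3.5.4: X discrete on {1, 2, 3}, Y continuous
# on (0, 1), joint f(x, y) = x * y**(x - 1) / 3.
def joint(x, y):
    return x * y ** (x - 1) / 3 if x in (1, 2, 3) and 0 < y < 1 else 0.0

def f1(x, n=20000):            # midpoint rule over y in (0, 1)
    h = 1.0 / n
    return sum(joint(x, (i + 0.5) * h) for i in range(n)) * h

f2 = lambda y: sum(joint(x, y) for x in (1, 2, 3))  # = (1 + 2y + 3y^2)/3

print(round(f1(2), 6))  # ~0.333333, i.e. 1/3 for each x
print(f2(0.5))          # (1 + 1 + 0.75)/3
```

Each value of `f1` comes out to 1/3, matching the closed form $f_1(x) = \tfrac{1}{3} I_{\{1,2,3\}}(x)$ derived above.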
Mathematically, X and Y are independent if and only if Pr(X = x | Y = y) = Pr(X = x) for all x and y. The definition extends to continuous random variables by replacing the events {X = x} and {Y = y} with events {X ∈ A} and {Y ∈ B}, where A and B are sets such that Pr(X ∈ A) > 0 and Pr(Y ∈ B) > 0.

Example. Consider discrete random variables X and Y counting the number of heads when coins A and B are each tossed twice. Coin B is fair, but A is not and yields Pr(X = 0) = 1/2 and Pr(X = 1) = 1/4 = Pr(X = 2). The marginal p.f.'s are shown in the margins of Table 3. The body of Table 3 gives the joint p.f. of (X, Y). Each entry was obtained by computing Pr(X = x, Y = y) = Pr(X = x) Pr(Y = y). Notice that
$$\sum_{x=0}^{2} \Pr(X = x, Y = y) = \Pr(Y = y) \quad \text{and} \quad \sum_{y=0}^{2} \Pr(X = x, Y = y) = \Pr(X = x).$$

Table 3: The joint and marginal p.f.'s of X and Y. Values in the body of the table are Pr(X = x, Y = y).

              y
   x       0      1      2   f1(x)
   0     1/8    1/4    1/8     1/2
   1    1/16    1/8   1/16     1/4
   2    1/16    1/8   1/16     1/4
 f2(y)   1/4    1/2    1/4

Example. The data in Table 4 enumerate the outcomes of the Titanic passengers and crew. Table 5 contains the proportion of all passengers and crew cross-classified into a particular cell (e.g., Pr(Survived, First) = 203/2201 = .092).

Table 4: Titanic data.

 Outcome    First  Second  Third  Crew  Total
 Survived     203     118    178   212    711
 Died         122     167    528   673   1490
 Total        325     285    706   885   2201

Table 5: Outcome probabilities for the Titanic passengers and crew.

 Outcome      First  Second  Third  Crew  Pr(Outcome)
 Survived      .092    .054   .081  .096         .323
 Died          .055    .076   .240  .306         .677
 Pr(Class)     .148    .129   .321  .402

The survivorship random variable, with marginal p.f. Pr(Survived) = .323 and Pr(Died) = .677, is not independent of the class random variable, since the product of the marginal probabilities Pr(First) = .148 and Pr(Survived) = .323 is .148 × .323 = .0478 ≠ .092 = Pr(Survived, First).

Example 3.5.9. Suppose that X and Y are independent random variables and g(x) = 2x I_{[0,1]}(x) is the p.d.f.
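The Titanic dependence argument can be reproduced directly from the counts in Table 4: compute the marginals, then compare the joint cell probability with the product of its marginals. A sketch (the dictionary layout is an arbitrary representation of Table 4):

```python
from math import isclose

# Table 4 counts, keyed by (outcome, class).
counts = {
    ("Survived", "First"): 203, ("Survived", "Second"): 118,
    ("Survived", "Third"): 178, ("Survived", "Crew"): 212,
    ("Died", "First"): 122, ("Died", "Second"): 167,
    ("Died", "Third"): 528, ("Died", "Crew"): 673,
}
n = sum(counts.values())  # 2201

# Marginal probabilities: sum each row (outcome) and column (class).
pr_outcome = {o: sum(v for (oo, c), v in counts.items() if oo == o) / n
              for o in ("Survived", "Died")}
pr_class = {c: sum(v for (o, cc), v in counts.items() if cc == c) / n
            for c in ("First", "Second", "Third", "Crew")}

joint = counts[("Survived", "First")] / n              # ~.092
product = pr_outcome["Survived"] * pr_class["First"]   # ~.0477
print(isclose(joint, product, abs_tol=1e-3))  # False: not independent
```

Since the joint probability differs from the product of the marginals, independence fails, exactly as argued in the text.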
of both random variables. Recall that
$$I_{[0,1]}(x) = \begin{cases} 1, & \text{if } x \in [0, 1], \\ 0, & \text{if } x \notin [0, 1]. \end{cases}$$
The probability Pr(X + Y ≤ 1) is computed as follows:

1. By independence, f(x, y) = 4xy I_{[0,1]}(x) I_{[0,1]}(y) = 4xy I_{[0,1] × [0,1]}(x, y).

2. Let S_0 = {X + Y ≤ 1} = {(x, y) | 0 ≤ x ≤ 1 − y}. Then
$$\Pr[(X, Y) \in S_0] = \int_0^1 \int_0^{1-y} 4xy \, dx\, dy = \int_0^1 2(1 - y)^2 y \, dy = \frac{1}{6}.$$

Example 3.5.10. Consider the joint p.d.f. of (X, Y):
$$f(x, y) = \begin{cases} k x^2 y^2, & x^2 + y^2 \le 1, \\ 0, & \text{otherwise.} \end{cases}$$
It might appear that X and Y are independent random variables, since kx²y² is easily factored as two functions, each of which depends on only one variable. However, the support of (X, Y) is not rectangular with edges parallel to the x- and y-axes, so there is no possibility of defining the support of X without reference to Y. Another view of this complication writes f using an indicator function to explicitly identify the support:
$$f(x, y) = k x^2 y^2 I_{\{(r,s) \mid r^2 + s^2 \le 1\}}(x, y).$$
There is no possibility of factoring $I_{\{(r,s) \mid r^2 + s^2 \le 1\}}(x, y)$ as two indicator functions, each of which involves only one variable. Factorization could be accomplished if the support were rectangular with edges parallel to the x- and y-axes, say
$$I_{\{(r,s) \mid 0 \le r \le 1,\ 0 \le s \le 2\}}(x, y) = I_{[0,1]}(x) I_{[0,2]}(y).$$
If this were true of f(x, y), then X and Y would be independent. DeGroot and Schervish establish the lack of independence of X and Y by making the rectangular-support argument. They then give Theorem 3.5.6, which states that the support must be rectangular with edges parallel to the x- and y-axes for independence to hold.

Example 3.5.11. Suppose that X and Y have joint p.d.f.
$$f(x, y) = \begin{cases} k e^{-(x+2y)}, & 0 \le x,\ 0 \le y, \\ 0, & \text{otherwise.} \end{cases}$$
Theorems 3.5.4 and 3.5.6 can be used to establish that X and Y are independent. Alternatively, write
$$f(x, y) = k e^{-(x+2y)} I_{\{(r,s) \mid 0 \le r,\ 0 \le s\}}(x, y) = h_1(x) h_2(y),$$
where
$$h_1(x) = e^{-x} I_{[0,\infty)}(x), \qquad h_2(y) = k e^{-2y} I_{[0,\infty)}(y).$$
The function h_1 is a p.d.f.
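The value Pr(X + Y ≤ 1) = 1/6 in Example 3.5.9 can be confirmed by Monte Carlo. Since the c.d.f. of a random variable with p.d.f. 2x on [0, 1] is x², inverse-c.d.f. sampling draws X = √U with U ~ Uniform(0, 1). A sketch (the seed and sample size are arbitrary choices):

```python
import random

# Monte Carlo check of Example 3.5.9: X, Y independent, each with
# p.d.f. 2x on [0, 1], sampled as sqrt(U) by inverse-c.d.f. sampling.
random.seed(421)
n = 200000
hits = sum(random.random() ** 0.5 + random.random() ** 0.5 <= 1
           for _ in range(n))
print(hits / n)  # close to 1/6 ≈ 0.1667
```

The estimate lands within Monte Carlo error of 1/6, agreeing with the double-integral calculation above.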
(it integrates to 1), but h_2 is not, at least until k is replaced by 2, since $\int_0^\infty k e^{-2y}\, dy = k/2$.
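The normalization k = 2 in Example 3.5.11 is easy to confirm numerically: with k = 2, both factors h₁(x) = e^{-x} and h₂(y) = 2e^{-2y} integrate to 1. A sketch, truncating [0, ∞) at an upper limit beyond which the tails are negligible:

```python
from math import exp

# Midpoint-rule check that k = 2 makes both factors in Example 3.5.11
# proper p.d.f.'s: h1(x) = exp(-x), h2(y) = 2*exp(-2y) on [0, inf).
def midpoint_integral(g, a, b, n=200000):
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

h1 = lambda x: exp(-x)
h2 = lambda y: 2 * exp(-2 * y)

# Truncate the infinite range at 40; the omitted tail mass is ~e^{-40}.
print(midpoint_integral(h1, 0.0, 40.0))  # ~1.0
print(midpoint_integral(h2, 0.0, 40.0))  # ~1.0
```

Because f(x, y) = h₁(x)h₂(y) with each factor a p.d.f. on a rectangular support, h₁ and h₂ are in fact the marginals f₁ and f₂, and X and Y are independent.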