STAT 421 Lecture Notes

3.5 Marginal Distributions
Definition 3.5.1. Suppose that X and Y have a joint distribution. The c.d.f. of X derived
by integrating (or summing) over the support of Y is called the marginal c.d.f. of X. The
p.f. or p.d.f. associated with the marginal c.d.f. is called the marginal p.f. (or p.d.f.) of X.
Theorem 3.5.1. If X and Y have a discrete joint distribution with p.f. f , then the marginal
p.f. of X is
$$f_1(x) = \sum_y f(x, y).$$
The notation $\sum_y$ implies that the summation is over all values of y such that f(x, y) is
positive.
Recall the urn example in which an urn contains 4 green, 6 red and 10 black balls. Two
balls are drawn randomly and without replacement.
Let X count the number of green balls and Y count the number of reds. The (joint)
probability distribution for (X, Y ) is determined by counting the number of ways to draw
0 ≤ x ≤ 2 from 4 (greens) and 0 ≤ y ≤ 2 from 6 (reds), and the number of ways to draw
2 − x − y from 10 (blacks), given that x + y ≤ 2. The joint probability function is
 (4)(6)( 10 )

 x y( 2−x−y
)
, x, y ∈ {0, 1, 2} and x + y ≤ 2,
20
f (x, y) =
(1)
2


0
otherwise.
The p.f. may also be presented by enumerating all possible values of (X, Y ) and their
probabilities. Table 1 illustrates.
Table 1: The joint probability function for the number of green (X) and red (Y ) balls from
the urn example.

                 y
  x        0     1     2
  0     .237  .316  .079
  1     .210  .126     0
  2     .032     0     0
The marginal distributions of X and Y can be obtained by summing along rows (yielding
the marginal p.f. of X) and along columns (yielding the marginal p.f. of Y ). The result is
displayed in Table 2.
Table 2: Joint and marginal probability functions for the number of green (X) and red (Y )
balls from the urn example.

                    y
  x         0     1     2   f1(x)
  0      .237  .316  .079    .632
  1      .210  .126     0    .336
  2      .032     0     0    .032
  f2(y)  .479  .442  .079       1
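As a quick numerical check of Eq. (1) and Table 2, the following Python sketch (not part of the original notes; it assumes Python 3.8+ for math.comb) computes the joint p.f. and both marginals:

```python
from math import comb

# Joint p.f. from Eq. (1): x green balls from 4, y red balls from 6, and the
# remaining 2 - x - y black balls from 10, in a draw of 2 from all 20 balls.
def f(x, y):
    if x + y > 2:
        return 0.0
    return comb(4, x) * comb(6, y) * comb(10, 2 - x - y) / comb(20, 2)

joint = [[f(x, y) for y in range(3)] for x in range(3)]
f1 = [sum(row) for row in joint]                              # row sums: marginal of X
f2 = [sum(joint[x][y] for x in range(3)) for y in range(3)]   # column sums: marginal of Y
print([round(p, 3) for p in f1])   # [0.632, 0.337, 0.032] (Table 2 rounds entries first)
print([round(p, 3) for p in f2])   # [0.479, 0.442, 0.079]
```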
Theorem 3.5.2. If X and Y have a continuous joint distribution with p.d.f. f , then the
marginal p.d.f. f1 of X is
$$f_1(x) = \int_{-\infty}^{\infty} f(x, y)\,dy \qquad \text{for } -\infty < x < \infty.$$
The proof is based on recognizing that a probability involving X alone, say Pr(X ≤ x), is
equivalent to Pr(X ≤ x, Y < ∞). The condition that Y may take on any value y ∈ R leads
to the integral
$$\Pr(X \le x, Y < \infty) = \int_{-\infty}^{x} \int_{-\infty}^{\infty} f(x, y)\,dy\,dx = \int_{-\infty}^{x} f_1(x)\,dx.$$
Example 3.5.3 Suppose that X and Y have the following joint p.d.f.:
$$f(x, y) = \begin{cases} \frac{21}{4}\, x^2 y, & \text{for } x^2 \le y \le 1, \\ 0, & \text{otherwise.} \end{cases}$$
To find the marginals, the support of X and Y must be determined from the condition
x² ≤ y ≤ 1. Obviously, the support is limited to y ≤ 1. Furthermore, 0 ≤ x² ⇒ 0 ≤ y.
Finally, x² ≤ y implies that −√y ≤ x ≤ √y. The support is shown in red:

[Figure: the support {(x, y) : x² ≤ y ≤ 1}, the region between the parabola y = x² and the
line y = 1, plotted for −1 ≤ x ≤ 1 and 0 ≤ y ≤ 1.]
Then
$$f_1(x) = \frac{21}{4} \int_{x^2}^{1} x^2 y\,dy = \frac{21}{4}\,\frac{x^2 y^2}{2}\bigg|_{x^2}^{1} = \begin{cases} \frac{21}{8}\left(x^2 - x^6\right), & \text{for } -1 \le x \le 1, \\[4pt] 0, & \text{otherwise.} \end{cases}$$
The marginal p.d.f. of Y is
$$f_2(y) = \frac{21}{4} \int_{-\sqrt{y}}^{\sqrt{y}} x^2 y\,dx = \frac{21}{12}\, x^3 y\,\bigg|_{-\sqrt{y}}^{\sqrt{y}} = \begin{cases} \frac{7}{2}\, y^{5/2}, & \text{for } 0 < y < 1, \\[4pt] 0, & \text{otherwise.} \end{cases}$$
Notice that the bounds of integration are determined by the inequality x² ≤ y ⇒ −√y ≤ x ≤ √y.
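Both marginals, and the fact that each integrates to 1, can be verified symbolically. A minimal sketch, assuming SymPy is available (the library is my choice, not the notes'):

```python
import sympy as sp

x = sp.symbols('x', real=True)
y = sp.symbols('y', positive=True)
f = sp.Rational(21, 4) * x**2 * y      # joint p.d.f. on x**2 <= y <= 1

f1 = sp.integrate(f, (y, x**2, 1))                   # 21*(x**2 - x**6)/8
f2 = sp.integrate(f, (x, -sp.sqrt(y), sp.sqrt(y)))   # 7*y**(5/2)/2
print(sp.factor(f1), sp.simplify(f2))
print(sp.integrate(f1, (x, -1, 1)), sp.integrate(f2, (y, 0, 1)))   # 1 1
```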
Theorem 3.5.3. Suppose that X is discrete and Y is continuous, and the joint p.f./p.d.f.
is f . Then, the marginal p.f. of X is
$$f_1(x) = \int_{-\infty}^{\infty} f(x, y)\,dy, \qquad \text{for all } x.$$
The marginal p.d.f. of Y is
$$f_2(y) = \sum_x f(x, y), \qquad \text{for } y \in \mathbb{R}.$$
Example 3.5.4 Suppose that X is discrete and Y is continuous, and the joint p.f./p.d.f. is
$$f(x, y) = \begin{cases} \dfrac{x y^{x-1}}{3}, & \text{for } x \in \{1, 2, 3\},\ 0 < y < 1, \\[4pt] 0, & \text{otherwise.} \end{cases}$$
Then, the marginal p.f. of X is
$$f_1(x) = \int_0^1 \frac{x y^{x-1}}{3}\,dy = \frac{y^x}{3}\bigg|_0^1 = \begin{cases} \frac{1}{3}, & \text{for } x \in \{1, 2, 3\}, \\[4pt] 0, & \text{otherwise.} \end{cases}$$
It’s useful to establish some notation not used by DeGroot and Schervish.
Let $I_A$ denote an indicator function of the set A. The function is defined according to
$$I_A(x) = \begin{cases} 1, & \text{if } x \in A, \\ 0, & \text{if } x \notin A. \end{cases}$$
Now, f1 can be defined without separate cases:
$$f_1(x) = \frac{1}{3}\, I_{\{1,2,3\}}(x).$$
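The indicator notation has a direct computational analogue. A hypothetical Python sketch (the helper indicator is my own naming, not notation from the notes):

```python
# I_A as a higher-order function: A is any membership test for the set A.
def indicator(A):
    return lambda x: 1 if A(x) else 0

I_123 = indicator(lambda x: x in {1, 2, 3})
f1 = lambda x: I_123(x) / 3          # f1(x) = (1/3) I_{1,2,3}(x), no separate cases
print([f1(x) for x in range(5)])     # [0.0, 0.333..., 0.333..., 0.333..., 0.0]
```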
Returning to the example, the marginal p.d.f. of Y is
$$f_2(y) = \sum_{x=1}^{3} \frac{x y^{x-1}}{3} = \frac{1}{3}\left(1 + 2y + 3y^2\right) I_{(0,1)}(y).$$
While it is always possible to compute the marginal distributions from the joint distribution,
the reverse is not generally true. The joint p.f./p.d.f. of random variables X and Y cannot
be derived from the marginal distributions of X and Y unless X and Y are independent
random variables.
Definition 3.5.2. Random variables X and Y are independent if for every two sets of real
numbers A and B,
Pr(X ∈ A, Y ∈ B) = Pr(X ∈ A) Pr(Y ∈ B).
If X and Y are independent, then
Pr(X ≤ x, Y ≤ y) = Pr(X ≤ x) Pr(Y ≤ y)
⇒ F (x, y) = F1 (x)F2 (y).
Thus, when X and Y are independent, the joint cumulative distribution function of X and
Y can be constructed as the product of the univariate cumulative distribution functions.
Theorem 3.5.4. X and Y are independent if and only if F (x, y) = F1 (x)F2 (y) for all
real numbers x and y.
The next theorem is very useful for proving that two (or more) random variables are independent.
Theorem 3.5.5. Suppose that X and Y have a joint p.f./p.d.f. f . Then, X and Y are
independent if and only if
$$f(x, y) = h_1(x) h_2(y) \quad \forall\, x, y \in \mathbb{R}, \tag{2}$$
where h1 is a nonnegative function of x alone and where h2 is a nonnegative function of y
alone.
Corollary 3.5.1 extends Theorem 3.5.5.
Corollary 3.5.1. X and Y are independent if and only if
$$f(x, y) = f_1(x) f_2(y) \quad \forall\, x, y \in \mathbb{R}, \tag{3}$$
where f1 and f2 are the marginal p.f./p.d.f.’s of X and Y .
Simply factoring f does not necessarily yield the marginals f1 and f2, since h1 may differ
from f1 by a multiplicative constant (and similarly, factoring may yield h2(y) = c f2(y)
for some c ≠ 1).
A more intuitive definition of independent random variables is presented in the next section
on conditional probability, but looking ahead, it will be stated that X and Y are independent
discrete random variables if knowing that y is the realized value of Y does not change the
probability that X takes on any particular value. Mathematically, X and Y are independent
if and only if Pr(X = x|Y = y) = Pr(X = x) for all x and y. The definition extends
to continuous random variables by replacing the events {X = x} and {Y = y} with events
{X ∈ A} and {Y ∈ B} where A and B are sets such that Pr(X ∈ A) > 0 and Pr(Y ∈ B) > 0.
Example Consider discrete random variables X and Y counting the numbers of heads when
coins A and B are each tossed twice. Coin B is fair, but A is not and yields
Pr(X = 0) = 1/2, Pr(X = 1) = 1/4 = Pr(X = 2). The marginal p.f.s are shown in
the margins of Table 3. The body of Table 3 gives the joint p.f. of (X, Y ). Each entry was
obtained by computing Pr(X = x, Y = y) = Pr(X = x) Pr(Y = y).
Notice that $\sum_{x=0}^{2} \Pr(X = x, Y = y) = \Pr(Y = y)$ and $\sum_{y=0}^{2} \Pr(X = x, Y = y) = \Pr(X = x)$.
Table 3: The joint and marginal p.f.'s of X and Y . Values in the body of the table are
Pr(X = x, Y = y).

                     y
  x          0     1     2   f1(x)
  0        1/8   1/4   1/8     1/2
  1       1/16   1/8  1/16     1/4
  2       1/16   1/8  1/16     1/4
  f2(y)    1/4   1/2   1/4
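Because each entry of Table 3 is the product of the two marginal probabilities, the table body is the outer product of the marginal vectors. A minimal check, assuming NumPy:

```python
import numpy as np

f1 = np.array([1/2, 1/4, 1/4])   # marginal p.f. of X (biased coin A)
f2 = np.array([1/4, 1/2, 1/4])   # marginal p.f. of Y (fair coin B)
joint = np.outer(f1, f2)         # independence: Pr(X=x, Y=y) = Pr(X=x) Pr(Y=y)
print(joint)                                  # reproduces the body of Table 3
print(joint.sum(axis=1), joint.sum(axis=0))   # row/column sums recover f1 and f2
```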
Example The data in Table 4 enumerate the outcomes for the Titanic passengers and crew.
Table 5 contains the proportion of all passengers and crew cross-classified into a particular
cell (e.g., Pr(Survived, First) = 203/2201 = .092).
Table 4: Titanic data.

                      Class
            First  Second  Third  Crew   Total
  Survived    203     118    178   212     711
  Died        122     167    528   673    1490
  Total       325     285    706   885    2201
Table 5: Outcome probabilities for the Titanic passengers and crew.

                       Class
             First  Second  Third  Crew   Pr(Outcome)
  Survived    .092    .054   .081  .096          .323
  Died        .055    .076   .240  .306          .677
  Pr(Class)   .148    .129   .321  .402
The survivorship random variable with marginal p.f. Pr(Survived) = .323 and Pr(Died) =
.677 is not independent of the class random variable, since the product of the marginal
probabilities Pr(First) = .148 and Pr(Survived) = .323 is .148 × .323 = .0478 ≠ .092 =
Pr(Survived, First).
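The same conclusion can be reproduced from the raw counts in Table 4; a plain-Python sketch:

```python
# Cell counts from Table 4; rows are Survived/Died, columns First/Second/Third/Crew.
counts = [[203, 118, 178, 212],
          [122, 167, 528, 673]]
n = sum(map(sum, counts))                        # 2201
pr_survived = sum(counts[0]) / n                 # ~0.323
pr_first = (counts[0][0] + counts[1][0]) / n     # ~0.148
print(pr_survived * pr_first)                    # ~0.0477, whereas
print(counts[0][0] / n)                          # ~0.0922 -> not independent
```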
Example 3.5.9. Suppose that X and Y are independent random variables and
$$g(x) = 2x\, I_{[0,1]}(x)$$
is the p.d.f. of both random variables. Recall that
$$I_{[0,1]}(x) = \begin{cases} 1, & \text{if } x \in [0, 1], \\ 0, & \text{if } x \notin [0, 1]. \end{cases}$$
The probability Pr(X + Y ≤ 1) is computed as follows:
1. By independence, $f(x, y) = g(x)g(y) = 4xy\, I_{[0,1]}(x) I_{[0,1]}(y) = 4xy\, I_{[0,1] \times [0,1]}(x, y)$.
2. Let S0 = {X + Y ≤ 1} = {(x, y) | 0 ≤ y ≤ 1, 0 ≤ x ≤ 1 − y}. Then
$$\Pr[(X, Y) \in S_0] = \int_0^1 \int_0^{1-y} 4xy\,dx\,dy = \int_0^1 2(1 - y)^2 y\,dy = \frac{1}{6}.$$
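The result is easy to confirm by Monte Carlo. Since F(x) = x² on [0, 1], the inverse-c.d.f. method gives X = √U for U uniform; a sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
# X, Y independent with c.d.f. F(x) = x**2 on [0, 1], so X = sqrt(U).
x = np.sqrt(rng.random(1_000_000))
y = np.sqrt(rng.random(1_000_000))
print((x + y <= 1).mean())   # ~0.1667 = 1/6
```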
Example 3.5.10. Consider the joint p.d.f. of (X, Y ):
$$f(x, y) = \begin{cases} k x^2 y^2, & x^2 + y^2 \le 1, \\ 0, & \text{otherwise.} \end{cases}$$
It might appear that X and Y are independent random variables since kx²y² is easily factored
as two functions, each of which depends only on one variable. However, the support of
(X, Y ) is not rectangular with edges parallel to the x- and y-axes, so there is no possibility
of defining the support of X without reference to Y .
Another view of this complication writes f using an indicator function to explicitly identify
the support:
$$f(x, y) = k x^2 y^2\, I_{\{(r,s)\,|\,r^2 + s^2 \le 1\}}(x, y).$$
There is no possibility of factoring $I_{\{(r,s)\,|\,r^2 + s^2 \le 1\}}(x, y)$ as two indicator functions, each of
which involves only one variable. Factorization could be accomplished if the support were
rectangular with edges parallel to the x- and y-axes, say
$$I_{\{(r,s)\,|\,0 \le r \le 1,\ 0 \le s \le 2\}}(x, y) = I_{[0,1]}(x)\, I_{[0,2]}(y).$$
If this were true of f (x, y), then X and Y would be independent. DeGroot and Schervish
establish the lack of independence of X and Y by making the rectangular support argument.
Then they give Theorem 3.5.6, which states that the support must be rectangular with edges
parallel to the x- and y-axes for independence to hold.
Example 3.5.11. Suppose that X and Y have joint p.d.f.
$$f(x, y) = \begin{cases} k e^{-(x + 2y)}, & 0 \le x,\ 0 \le y, \\ 0, & \text{otherwise.} \end{cases}$$
Theorems 3.5.4 and 3.5.6 can be used to establish that X and Y are independent. Alternatively, write
$$f(x, y) = k e^{-(x + 2y)}\, I_{\{(r,s)\,|\,0 \le r,\ 0 \le s\}}(x, y) = h_1(x) h_2(y),$$
where
$$h_1(x) = e^{-x}\, I_{[0,\infty)}(x) \qquad \text{and} \qquad h_2(y) = k e^{-2y}\, I_{[0,\infty)}(y).$$
The function h1 is a p.d.f. (it integrates to 1), but h2 is not, at least until k is replaced by 2.
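The value k = 2 follows from requiring f to integrate to 1 over the support; a short symbolic confirmation, assuming SymPy:

```python
import sympy as sp

x, y, k = sp.symbols('x y k', positive=True)
total = sp.integrate(k * sp.exp(-(x + 2*y)), (x, 0, sp.oo), (y, 0, sp.oo))
print(total)                                           # k/2, so k = 2 makes f a p.d.f.
print(sp.integrate(2 * sp.exp(-2*y), (y, 0, sp.oo)))   # 1: with k = 2, h2 is a p.d.f.
```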