Conditional Distributions
The goal is to provide a general definition of the conditional distribution of Y given X,
when (X, Y ) are jointly distributed.
Let F be a distribution function on $\mathbb{R}$. Let G(·, ·) be a map from $\mathbb{R} \times \mathcal{B}_{\mathbb{R}}$ to [0, 1] satisfying: (a) G(x, ·) is a probability measure on $\mathcal{B}_{\mathbb{R}}$ for every x in $\mathbb{R}$, and (b) G(·, A) is a measurable function for every Borel set A.
We can then form the generalized product F × G in the following sense: there exists a measure H on $\mathcal{B}_{\mathbb{R}^2}$, which we call F × G, such that:
$$(F \times G)\big[(-\infty, x_0] \times (-\infty, y_0]\big] = \int_{-\infty}^{x_0} G(x, (-\infty, y_0]) \, dF(x) \,.$$
More generally, for Borel subsets A, B, we should have:
$$(F \times G)(A \times B) = \int_A G(x, B) \, dF(x) \,.$$
If (X, Y ) has distribution F × G, then the marginal of X is F and the conditional of Y given
X = x is G(x, ·). From F × G, we can recover the marginal distribution of Y , say F̃ and the
conditional of X given Y = y, say G̃(y, ·), where G̃ has the same properties as G and F̃ × G̃
gives the distribution of (Y, X). Note that:
$$\tilde{F}(y_0) = P(X < \infty, Y \le y_0) = \int_{-\infty}^{\infty} G(x, (-\infty, y_0]) \, dF(x) \,.$$
When G does not depend on its first coordinate, i.e. G(x, ·) is the same measure for all x, then X and Y are independent and G(·) ≡ G(0, ·) gives the marginal distribution of Y.
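The measure F × G has a natural sampling interpretation: draw X from F, then draw Y from the probability measure G(X, ·). Below is a minimal Python sketch of this two-stage mechanism, using the setup of Example 1 below (F uniform on (0, 1) and G(x, ·) a Bernoulli(x) law); the helper name sample_from_product is ours, not from any library.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_from_product(n):
    """Two-stage sampling from F x G: X ~ F, then Y ~ G(X, .).

    Here F = Uniform(0, 1) and G(x, .) = Bernoulli(x), as in Example 1.
    """
    x = rng.uniform(0.0, 1.0, size=n)          # X ~ F
    y = (rng.uniform(size=n) < x).astype(int)  # Y | X = x  ~  Bernoulli(x)
    return x, y

x, y = sample_from_product(500_000)
# By (0.1) below, P(X <= 1/2, Y = 1) = (1/2)**2 / 2 = 0.125.
print(np.mean((x <= 0.5) & (y == 1)))
```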
Example 1: Suppose that F is U(0, 1), and that G is defined by:
$$G(x, \{1\}) = x \quad \text{and} \quad G(x, \{0\}) = 1 - x \,.$$
We seek to find F × G, which is defined on $\big([0,1] \times \{0,1\}, \, \mathcal{B}_{[0,1]} \times 2^{\{0,1\}}\big)$. Let (X, Y) follow F × G. Now,
$$P(X \le x, Y = 1) = \int_0^x G(u, \{1\}) \, dF(u) = \int_0^x u \, du = \frac{x^2}{2} \,. \tag{0.1}$$
Similarly,
$$P(X \le x, Y = 0) = x - \frac{x^2}{2} \,. \tag{0.2}$$
So P(Y = 1) = 1/2; therefore Y ∼ Ber(1/2). It is also clear from the above discussion that, given X = x, Y ∼ Ber(x). This can also be verified through the limiting definition of conditional probabilities that was discussed before:
$$P(Y = 1 \mid X = x) = \lim_{h \to 0} P(Y = 1 \mid X \in [x, x+h]) = \lim_{h \to 0} \frac{P(Y = 1, X \in [x, x+h])}{P(X \in [x, x+h])} = \lim_{h \to 0} \frac{\int_x^{x+h} u \, du}{h} = x \,.$$
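This limit is easy to check by simulation: estimate P(Y = 1 | X ∈ [x, x + h]) for a small window h using the two-stage sampler sketched earlier (a rough check, reusing the illustrative sample_from_product helper):

```python
x0, h = 0.3, 0.01
x, y = sample_from_product(2_000_000)
window = (x >= x0) & (x <= x0 + h)
# Empirical P(Y = 1 | X in [x0, x0 + h]); should be close to x0 = 0.3.
print(y[window].mean())
```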
Next, we seek to find H(y, ·), the conditional of X given Y = y. We can do it via the
definition of conditional probabilities when conditioning on a discrete random variable but
let’s try the more formal recipe. We have:
$$P(Y = 1, X \le x) = \int_{\{1\}} H(y, (0, x]) \, d\tilde{F}(y) = \frac{1}{2} \, H(1, (0, x]) \,,$$
and
$$P(Y = 0, X \le x) = \int_{\{0\}} H(y, (0, x]) \, d\tilde{F}(y) = \frac{1}{2} \, H(0, (0, x]) \,,$$
and using (0.1) and (0.2), we get:
$$H(1, (0, x]) = x^2 \quad \text{and} \quad H(0, (0, x]) = 2x - x^2 \,.$$
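A quick Monte Carlo sanity check of these conditional distribution functions, again using the illustrative sampler from above:

```python
x, y = sample_from_product(1_000_000)
t = 0.6
# H(1, (0, t]) = P(X <= t | Y = 1) should be near t**2 = 0.36, and
# H(0, (0, t]) = P(X <= t | Y = 0) near 2*t - t**2 = 0.84.
print(np.mean(x[y == 1] <= t), np.mean(x[y == 0] <= t))
```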
Example 2: Let X be a continuous random variable with density:
$$f(x) = \frac{2}{3} \, e^{-x} \, 1(x > 0) + \frac{1}{3} \, e^{x} \, 1(x < 0) \,.$$
Note that the distribution function of X is:
$$F(x) = P(X \le x) = \frac{1}{3} \, e^{x} \, 1(x \le 0) + \left[ \frac{1}{3} + \frac{2}{3} \left( 1 - e^{-x} \right) \right] 1(x > 0) \,.$$
We first find the conditional distribution of Y = sign(X) given |X|, i.e. $P(Y = 1 \mid |X| = x)$ for x > 0. So, it suffices to get a transition function G(t, a), a ∈ {−1, 1}, t > 0, such that:
$$P(|X| \le x, Y = 1) = \int_0^x G(t, 1) \, dF_{|X|}(t) \tag{0.3}$$
and
$$P(|X| \le x, Y = -1) = \int_0^x G(t, -1) \, dF_{|X|}(t) \,.$$
Now,
$$P(|X| \le x, Y = 1) = P(0 < X \le x) = \frac{2}{3} \left( 1 - e^{-x} \right) ,$$
and
$$P(|X| \le x, Y = -1) = P(-x \le X < 0) = \frac{1}{3} \left( 1 - e^{-x} \right) .$$
So $P(|X| \le x) = 1 - e^{-x}$. Now, by (0.3),
$$\frac{2}{3} \left( 1 - e^{-x} \right) = \int_0^x G(t, 1) \, d\left(1 - e^{-t}\right) = \int_0^x G(t, 1) \, e^{-t} \, dt \,,$$
showing that G(x, 1) = 2/3. Similarly, G(x, −1) = 1/3.
Note that the distribution corresponding to f can be generated by the following stochastic mechanism: let V follow Exp(1) and let B be a {−1, 1}-valued random variable independent of V, with $p_B(1) = 2/3 = 1 - p_B(-1)$, and let $X = V \, 1\{B = 1\} - V \, 1\{B = -1\}$. Then V is precisely |X| and X ∼ f. Note that the sign of X is precisely B, and it is independent of V = |X| by the mechanism itself. So the conditional of sign(X) given |X| is simply the unconditional distribution of B, and we obtain the same result as with the formal derivation.
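A short simulation of this mechanism makes the independence visible; a minimal sketch under the construction above (nothing here is assumed beyond the 2/3 probability and the Exp(1) draw):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
v = rng.exponential(1.0, size=n)                  # V ~ Exp(1)
b = np.where(rng.uniform(size=n) < 2 / 3, 1, -1)  # B = 1 w.p. 2/3, else -1
x = v * b                                         # X = V 1{B=1} - V 1{B=-1}

# P(sign(X) = 1 | |X|) should be 2/3 regardless of |X|:
small, large = np.abs(x) < 0.5, np.abs(x) >= 0.5
print(np.mean(x[small] > 0), np.mean(x[large] > 0))  # both near 2/3
```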
Next, consider the distribution of X given Y. Note that P(Y = −1) = 1/3 and P(Y = 1) = 2/3. We have:
$$P(Y = -1, X \le x) = \frac{1}{3} \, H(-1, (-\infty, x]) \,,$$
so
$$\begin{aligned}
H(-1, (-\infty, x]) &= 3 \, P(Y = -1, X \le x) = 3 \, P(X \le 0 \wedge x) \\
&= 3 \left[ \frac{1}{3} \, 1(x > 0) + P(X \le x) \, 1(x \le 0) \right] = e^{x} \, 1(x \le 0) + 1(x > 0) \,.
\end{aligned}$$
Similarly, we compute H(1, (−∞, x]).
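For completeness, the analogous computation (not spelled out in the text above) runs the same way:
$$H(1, (-\infty, x]) = \frac{3}{2} \, P(Y = 1, X \le x) = \frac{3}{2} \cdot \frac{2}{3} \left( 1 - e^{-x} \right) 1(x > 0) = \left( 1 - e^{-x} \right) 1(x > 0) \,,$$
i.e. given Y = 1, X is Exp(1), and, by the previous display, given Y = −1, −X is Exp(1).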
0.1 Order statistics and conditional distributions
Let $X_1, X_2, \dots, X_n$ be i.i.d. from a distribution F with Lebesgue density f. Let $X_{(1)}, X_{(2)}, \dots, X_{(n)}$ be the corresponding order statistics. Note that the order statistics are all distinct with probability 1 and
$$P\big( (X_{(1)}, X_{(2)}, \dots, X_{(n)}) \in B \big) = 1 \,,$$
where $B = \{(x_1, x_2, \dots, x_n) : x_1 < x_2 < \dots < x_n\}$. Let's first find the joint density of the order statistics. Let $\Pi$ be the set of all permutations of the numbers 1 through n. For a measurable subset of B, say A, we have:
$$\begin{aligned}
P\big((X_{(1)}, X_{(2)}, \dots, X_{(n)}) \in A\big) &= P\big( \cup_{\pi \in \Pi} \{ (X_{\pi_1}, X_{\pi_2}, \dots, X_{\pi_n}) \in A \} \big) \\
&= \sum_{\pi \in \Pi} P\big( (X_{\pi_1}, X_{\pi_2}, \dots, X_{\pi_n}) \in A \big) \\
&= n! \, P\big( (X_1, X_2, \dots, X_n) \in A \big) \\
&= \int_A n! \, \prod_{i=1}^n f(x_i) \, dx_1 \, dx_2 \cdots dx_n \,,
\end{aligned}$$
where the union is a.s. disjoint since A ⊆ B.
This shows that:
$$f_{\mathrm{ord}}(x_1, x_2, \dots, x_n) = n! \, \prod_{i=1}^n f(x_i) \,, \qquad (x_1, x_2, \dots, x_n) \in B \,.$$
Remark: If we assumed that the $X_i$'s were not independent but came from an exchangeable distribution with density $f(x_1, x_2, \dots, x_n)$, i.e. one invariant under permutations of the coordinates, then f is necessarily symmetric in its arguments and an argument similar to the one above would show that
$$f_{\mathrm{ord}}(x_1, x_2, \dots, x_n) = n! \, f(x_1, x_2, \dots, x_n) \,, \qquad (x_1, x_2, \dots, x_n) \in B \,.$$
Now, consider the situation where the distribution of $(X_1, X_2, \dots, X_n)$ is exchangeable. We seek to find:
$$P\big( (X_1, X_2, \dots, X_n) = (x_{\pi_1}, x_{\pi_2}, \dots, x_{\pi_n}) \mid X_{(1)} = x_1, X_{(2)} = x_2, \dots, X_{(n)} = x_n \big)$$
for some permutation π. Let τ be an arbitrary permutation. Note that $(Y_1, Y_2, \dots, Y_n) \equiv (X_{\tau_1}, X_{\tau_2}, \dots, X_{\tau_n})$ has the same distribution as $(X_1, X_2, \dots, X_n)$. Thus,
$$\begin{aligned}
&P\big( (X_1, \dots, X_n) = (x_{\pi_1}, \dots, x_{\pi_n}) \mid X_{(1)} = x_1, \dots, X_{(n)} = x_n \big) \\
&\quad = P\big( (Y_1, \dots, Y_n) = (x_{\pi_1}, \dots, x_{\pi_n}) \mid Y_{(1)} = x_1, \dots, Y_{(n)} = x_n \big) \\
&\quad = P\big( (X_{\tau_1}, \dots, X_{\tau_n}) = (x_{\pi_1}, \dots, x_{\pi_n}) \mid X_{(1)} = x_1, \dots, X_{(n)} = x_n \big) \\
&\quad = P\big( (X_1, \dots, X_n) = (x_{(\pi \circ \tau^{-1})_1}, \dots, x_{(\pi \circ \tau^{-1})_n}) \mid X_{(i)} = x_i, \; i = 1, \dots, n \big) \,.
\end{aligned}$$
As τ runs over all permutations, so does $\pi \circ \tau^{-1}$, showing that the conditional probability under consideration does not depend upon the permutation π initially fixed. As there are n! permutations, we conclude that:
$$P\big( (X_1, X_2, \dots, X_n) = (x_{\pi_1}, x_{\pi_2}, \dots, x_{\pi_n}) \mid X_{(1)} = x_1, X_{(2)} = x_2, \dots, X_{(n)} = x_n \big) = \frac{1}{n!} \,.$$
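A simulation sanity check of the 1/n! claim in the i.i.d. (hence exchangeable) case: every rank ordering of $(X_1, \dots, X_n)$ should occur with frequency 1/n!. A minimal sketch:

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(2)
n, reps = 3, 600_000
x = rng.exponential(size=(reps, n))      # i.i.d. draws, hence exchangeable
patterns = Counter(tuple(row) for row in np.argsort(x, axis=1))
# Each of the 3! = 6 orderings should appear with frequency ~ 1/6 = 0.1667.
for p, c in sorted(patterns.items()):
    print(p, round(c / reps, 4))
```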
An example with Uniforms: Suppose that $X_1, X_2, \dots, X_n$ are i.i.d. Uniform(0, θ). The joint density of $\{X_{(i)}\}$ is given by:
$$f_{\mathrm{ord}}(x_1, x_2, \dots, x_n) = \frac{n!}{\theta^n} \, 1\{0 < x_1 < x_2 < \dots < x_n < \theta\} \,.$$
The marginal density of the maximum, $X_{(n)}$, is:
$$f_{X_{(n)}}(x_n) = \frac{n}{\theta^n} \, x_n^{n-1} \, 1\{0 < x_n < \theta\} \,.$$
So, the conditional density of $(X_{(1)}, X_{(2)}, \dots, X_{(n-1)})$ given $X_{(n)} = x_n$, by direct division, is seen to be:
$$f_{\mathrm{cond}}(x_1, x_2, \dots, x_{n-1}) = \frac{(n-1)!}{x_n^{n-1}} \, 1\{0 < x_1 < x_2 < \dots < x_{n-1} < x_n\} \,.$$
This shows that the first n − 1 order statistics, given the maximum $x_n$, are distributed as the n − 1 order statistics from a sample of size n − 1 from Uniform(0, $x_n$). But note that the distribution of the vector $\{X_1, X_2, \dots, X_n\} \setminus \{X_{(n)}\}$ given $(X_{(1)}, X_{(2)}, \dots, X_{(n)})$ must be uniform over all the (n − 1)! permutations of the first n − 1 order statistics. Thus, the random vector $\{X_1, X_2, \dots, X_n\} \setminus \{X_{(n)}\}$, conditional on $X_{(n)}$, must behave like an i.i.d. random sample from Uniform(0, $X_{(n)}$). These arguments can be made more rigorous, but at the expense of much notation.
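Here is a quick simulation of this fact (the uniformity check via scipy.stats.kstest is just one convenient choice): conditional on the maximum, a uniformly chosen non-maximal order statistic, rescaled by the maximum, should be exactly Uniform(0, 1).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
theta, n, reps = 2.5, 5, 50_000
x = np.sort(rng.uniform(0.0, theta, size=(reps, n)), axis=1)
# Pick one of the n-1 non-maximal order statistics at random per sample
# (an exchangeable pick, hence a uniformly chosen sample point), and
# rescale by that sample's maximum:
j = rng.integers(0, n - 1, size=reps)
u = x[np.arange(reps), j] / x[:, -1]
print(stats.kstest(u, "uniform"))  # large p-value expected
```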
Order statistics and non-exchangeable distributions: Take (X, Y) to be a pair of independent random variables, each taking values in (0, 1), with X having Lebesgue density f and Y having Lebesgue density g. Now X and Y are not exchangeable (unless f = g). We consider $P((X, Y) \in A, \, (U, V) \in B)$, where $U = X \wedge Y$, $V = X \vee Y$, A is a Borel subset of $(0, 1)^2$ and B a Borel subset of $\{(x, y) \in (0, 1)^2 : x < y\}$. Let π be the permutation on {1, 2} that swaps coordinates. Then:
$$\begin{aligned}
P((X, Y) \in A, \, (U, V) \in B) &= P\big((X, Y) \in A \cap (B \cup \pi B)\big) \\
&= P\big((X, Y) \in A \cap B\big) + P\big((X, Y) \in A \cap \pi B\big) \\
&= \int_{A \cap B} f(x) g(y) \, dx \, dy + \int_{A \cap \pi B} f(x) g(y) \, dx \, dy \\
&= \int_{A \cap B} f(u) g(v) \, du \, dv + \int_{\pi A \cap B} f(v) g(u) \, du \, dv \quad \text{(change of variable)} \\
&= \int_B \big\{ f(u) g(v) \, 1((u, v) \in A) + f(v) g(u) \, 1((u, v) \in \pi A) \big\} \, du \, dv \,.
\end{aligned}$$
From the above derivation, taking A to be the unit square, we find that:
$$P((U, V) \in B) = \int_B \big( f(u) g(v) + f(v) g(u) \big) \, du \, dv \,,$$
so that $dF_{U,V}(u, v) = (f(u) g(v) + f(v) g(u)) \, du \, dv$. Conclude that:
$$P((X, Y) \in A, \, (U, V) \in B) = \int_B \xi((u, v), A) \, dF_{U,V}(u, v) \,,$$
where, for u < v,
$$\xi((u, v), A) = \frac{f(u) g(v)}{f(u) g(v) + f(v) g(u)} \, 1((u, v) \in A) + \frac{f(v) g(u)}{f(u) g(v) + f(v) g(u)} \, 1((u, v) \in \pi A) \,.$$
Remark: If $(X_1, X_2, \dots, X_n)$ is a random vector with density $\prod_{i=1}^n f_i(x_i)$, you should be able to guess the form of the conditional distribution of the $X_i$'s given the order statistics.
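A natural guess, extrapolating the two-variable formula above (the verification runs along the same lines as before): for $x_1 < x_2 < \dots < x_n$,
$$P\big( (X_1, \dots, X_n) = (x_{\pi_1}, \dots, x_{\pi_n}) \mid X_{(1)} = x_1, \dots, X_{(n)} = x_n \big) = \frac{\prod_{i=1}^n f_i(x_{\pi_i})}{\sum_{\tau \in \Pi} \prod_{i=1}^n f_i(x_{\tau_i})} \,,$$
which reduces to 1/n! in the i.i.d. case.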