Conditional Distributions
The goal is to provide a general definition of the conditional distribution of Y given X, when (X, Y) are jointly distributed.
Let F be a distribution function on $\mathbb{R}$. Let G(·, ·) be a map from $\mathbb{R} \times \mathcal{B}_{\mathbb{R}}$ to [0, 1] satisfying: (a) G(x, ·) is a probability measure on $\mathcal{B}_{\mathbb{R}}$ for every x in $\mathbb{R}$, and (b) G(·, A) is a measurable function for every Borel set A.
We can then form the generalized product F × G in the following sense: there exists a measure H on $\mathcal{B}_{\mathbb{R}^2}$, which we call F × G, such that:
$$(F \times G)\left[(-\infty, x_0] \times (-\infty, y_0]\right] = \int_{-\infty}^{x_0} G(x, (-\infty, y_0]) \, dF(x) .$$
More generally, for Borel subsets A, B, we should have:
$$(F \times G)(A \times B) = \int_{A} G(x, B) \, dF(x) .$$
If (X, Y) has distribution F × G, then the marginal of X is F and the conditional of Y given X = x is G(x, ·). From F × G we can recover the marginal distribution of Y, say F̃, and the conditional of X given Y = y, say G̃(y, ·), where G̃ has the same properties as G and F̃ × G̃ gives the distribution of (Y, X). Note that:
$$\tilde{F}(y_0) = P(X < \infty, Y \le y_0) = \int_{-\infty}^{\infty} G(x, (-\infty, y_0]) \, dF(x) .$$
When G does not depend on its first coordinate, i.e. G(x, ·) is the same measure for all x, X and Y are independent, and G(·) ≡ G(0, ·) gives the marginal distribution of Y.
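To make the construction concrete, here is a minimal simulation sketch in Python (the particular F and G are the ones from Example 1 below, used here purely for illustration): drawing X from F and then Y from the kernel G(X, ·) produces a draw from F × G, and the marginal of Y arises by averaging the kernel over F, exactly as in the display above.

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_product(n):
        # Draw X ~ F, then Y | X = x ~ G(x, .); the pair (X, Y) then has
        # the generalized product law F x G described above.
        # Here F = Uniform(0, 1) and G(x, .) = Bernoulli(x), anticipating
        # Example 1 below; any marginal/kernel pair works the same way.
        x = rng.uniform(size=n)
        y = rng.binomial(1, x)
        return x, y

    x, y = sample_product(10**6)
    # marginal of Y: F~((-inf, y0]) = integral of G(x, (-inf, y0]) dF(x)
    print(y.mean())   # ~ integral of x dx over (0, 1) = 1/2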
Example 1: Suppose that F is U(0, 1), and that G is defined by:
$$G(x, \{1\}) = x \quad \text{and} \quad G(x, \{0\}) = 1 - x .$$
We seek to find F × G, which is defined on $([0, 1] \times \{0, 1\}, \mathcal{B}_{[0,1]} \times 2^{\{0,1\}})$. Let (X, Y) follow F × G. Now,
$$P(X \le x, Y = 1) = \int_0^x G(u, \{1\}) \, dF(u) = \int_0^x u \, du = \frac{x^2}{2} . \tag{0.1}$$
Similarly,
$$P(X \le x, Y = 0) = x - \frac{x^2}{2} . \tag{0.2}$$
So P(Y = 1) = 1/2; therefore Y ∼ Ber(1/2). It is also clear from the above discussion that, given X = x, Y ∼ Ber(x). This can also be verified through the limiting definition of conditional probabilities that was discussed before:
$$P(Y = 1 \mid X = x) = \lim_{h \to 0} P(Y = 1 \mid X \in [x, x+h]) = \lim_{h \to 0} \frac{P(Y = 1, X \in [x, x+h])}{P(X \in [x, x+h])} = \lim_{h \to 0} \frac{\int_x^{x+h} u \, du}{h} = x .$$
Next, we seek to find H(y, ·), the conditional of X given Y = y. We can do it via the definition of conditional probabilities when conditioning on a discrete random variable, but let's try the more formal recipe. We have:
$$P(Y = 1, X \le x) = \int_{\{1\}} H(y, (0, x]) \, d\tilde{F}(y) = \frac{1}{2} \, H(1, (0, x]) ,$$
and
$$P(Y = 0, X \le x) = \int_{\{0\}} H(y, (0, x]) \, d\tilde{F}(y) = \frac{1}{2} \, H(0, (0, x]) ,$$
and using (0.1) and (0.2), we get:
$$H(1, (0, x]) = x^2 \quad \text{and} \quad H(0, (0, x]) = 2x - x^2 .$$
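All of Example 1's conclusions lend themselves to a quick Monte Carlo check; in the following sketch the sample size, the grid point x0 = 0.3, the bandwidth h, and the test point t = 0.4 are arbitrary choices:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 10**6
    x = rng.uniform(size=n)          # X ~ F = Uniform(0, 1)
    y = rng.binomial(1, x)           # Y | X = x ~ Ber(x)

    print(y.mean())                  # ~ 1/2, i.e. Y ~ Ber(1/2)

    # limiting definition: P(Y = 1 | X in [x0, x0 + h]) -> x0 as h -> 0
    x0, h = 0.3, 0.01
    band = (x >= x0) & (x <= x0 + h)
    print(y[band].mean())            # ~ 0.3

    # conditional of X given Y: H(1, (0, t]) = t^2, H(0, (0, t]) = 2t - t^2
    t = 0.4
    print((x[y == 1] <= t).mean())   # ~ 0.16 = t**2
    print((x[y == 0] <= t).mean())   # ~ 0.64 = 2*t - t**2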
Example 2: Let X be a continuous random variable with density:
$$f(x) = \frac{2}{3} \, e^{-x} \, 1(x > 0) + \frac{1}{3} \, e^{x} \, 1(x < 0) .$$
Note that the distribution function of X is:
$$F(x) = P(X \le x) = \frac{1}{3} \, e^{x} \, 1(x \le 0) + \left[ \frac{1}{3} + \frac{2}{3} \left(1 - e^{-x}\right) \right] 1(x > 0) .$$
We first find the conditional distribution of Y = sign(X) given |X|, i.e. P(Y = 1 | |X| = x) for x > 0. So, it suffices to get a transition function G(t, a), a ∈ {−1, 1}, t > 0, such that:
$$P(|X| \le x, Y = 1) = \int_0^x G(t, 1) \, dF_{|X|}(t) , \tag{0.3}$$
and
$$P(|X| \le x, Y = -1) = \int_0^x G(t, -1) \, dF_{|X|}(t) .$$
Now,
$$P(|X| \le x, Y = 1) = P(0 < X \le x) = \frac{2}{3} \left(1 - e^{-x}\right) ,$$
and
$$P(|X| \le x, Y = -1) = P(-x \le X < 0) = \frac{1}{3} \left(1 - e^{-x}\right) .$$
So $P(|X| \le x) = 1 - e^{-x}$, i.e. |X| ∼ Exp(1). Now, by (0.3),
$$\frac{2}{3} \left(1 - e^{-x}\right) = \int_0^x G(t, 1) \, d\left(1 - e^{-t}\right) = \int_0^x G(t, 1) \, e^{-t} \, dt ,$$
showing (upon differentiating both sides in x) that G(x, 1) = 2/3. Similarly, G(x, −1) = 1/3.
Note that the distribution corresponding to f can be generated by the following stochastic mechanism: let V follow Exp(1) and let B be a {−1, 1}-valued random variable independent of V, with $p_B(1) = 2/3 = 1 - p_B(-1)$, and let $X = V \, 1\{B = 1\} - V \, 1\{B = -1\}$. Then V is precisely |X| and X ∼ f. Note that the sign of X is precisely B, and it is independent of V = |X| by the mechanism itself. So the conditional of sign(X) given |X| is simply the unconditional distribution of B, and we obtain the same result as with the formal derivation.
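A minimal simulation of this mechanism (the band endpoints below are arbitrary choices) confirms that the conditional probability of a positive sign is 2/3 regardless of where |X| falls:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 10**6
    v = rng.exponential(1.0, size=n)                 # V ~ Exp(1)
    b = rng.choice([1, -1], size=n, p=[2/3, 1/3])    # p_B(1) = 2/3
    x = np.where(b == 1, v, -v)                      # X = V 1{B=1} - V 1{B=-1}

    # sign(X) = B is independent of |X| = V, so the conditional probability
    # of a positive sign given |X| in any band should be ~ G(., 1) = 2/3
    for lo, hi in [(0.0, 0.5), (0.5, 1.5), (1.5, np.inf)]:
        band = (np.abs(x) > lo) & (np.abs(x) < hi)
        print((x[band] > 0).mean())   # each ~ 2/3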
Next, consider the distribution of X given Y. Note that P(Y = −1) = 1/3 and P(Y = 1) = 2/3. We have:
$$P(Y = -1, X \le x) = \frac{1}{3} \, H(-1, (-\infty, x]) ,$$
so
$$H(-1, (-\infty, x]) = 3 \, P(Y = -1, X \le x) = 3 \, P(X \le 0 \wedge x)$$
$$= 3 \left[ \frac{1}{3} \, 1(x > 0) + P(X \le x) \, 1(x \le 0) \right] = e^{x} \, 1(x \le 0) + 1(x > 0) .$$
Similarly, we compute H(1, (−∞, x]).
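Continuing the simulation sketch above: for x ≤ 0 the formula gives $H(-1, (-\infty, x]) = e^{x}$, and the analogous computation for H(1, ·) should give $H(1, (-\infty, x]) = (1 - e^{-x}) \, 1(x > 0)$; both are easy to check empirically (the evaluation points are arbitrary):

    # continuing with x from the previous sketch
    neg, pos = x[x < 0], x[x > 0]
    for t in [-2.0, -1.0, -0.5]:
        print((neg <= t).mean(), np.exp(t))        # H(-1, (-inf, t]) = e^t
    for t in [0.5, 1.0, 2.0]:
        print((pos <= t).mean(), 1 - np.exp(-t))   # H(1, (-inf, t]) = 1 - e^{-t}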
0.1 Order statistics and conditional distributions
Let $X_1, X_2, \ldots, X_n$ be i.i.d. from a distribution F with Lebesgue density f. Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the corresponding order statistics. Note that the order statistics are all distinct with probability 1 and
$$P\left( (X_{(1)}, X_{(2)}, \ldots, X_{(n)}) \in B \right) = 1 ,$$
where $B = \{(x_1, x_2, \ldots, x_n) : x_1 < x_2 < \cdots < x_n\}$. Let's first find the joint density of the order statistics. Let Π be the set of all permutations of the numbers 1 through n. For a measurable subset of B, say A, we have:
$$P\left((X_{(1)}, X_{(2)}, \ldots, X_{(n)}) \in A\right) = P\left(\bigcup_{\pi \in \Pi} \left\{(X_{\pi_1}, X_{\pi_2}, \ldots, X_{\pi_n}) \in A\right\}\right) = \sum_{\pi \in \Pi} P\left((X_{\pi_1}, X_{\pi_2}, \ldots, X_{\pi_n}) \in A\right)$$
$$= n! \, P\left((X_1, X_2, \ldots, X_n) \in A\right) = \int_A n! \, \prod_{i=1}^n f(x_i) \, dx_1 \, dx_2 \cdots dx_n ,$$
where the second equality holds because A ⊆ B makes the events in the union pairwise disjoint (up to null sets), and the third because each permuted vector has the same distribution as $(X_1, X_2, \ldots, X_n)$.
This shows that:
$$f_{\mathrm{ord}}(x_1, x_2, \ldots, x_n) = n! \, \prod_{i=1}^n f(x_i) , \qquad (x_1, x_2, \ldots, x_n) \in B .$$
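As a sanity check, the formula can be tested by Monte Carlo; here is a sketch for n = 2 uniforms on (0, 1) (the cutoffs 0.3 and 0.6 are arbitrary), where integrating $f_{\mathrm{ord}} = 2!$ over $\{x_1 \le 0.3, \, x_2 \le 0.6, \, x_1 < x_2\}$ gives $2(0.3)(0.6) - (0.3)^2 = 0.27$:

    import numpy as np

    rng = np.random.default_rng(3)
    n_samp = 10**6
    # order statistics of two i.i.d. Uniform(0, 1) draws per row
    u = np.sort(rng.uniform(size=(n_samp, 2)), axis=1)

    # f_ord(x1, x2) = 2! on {0 < x1 < x2 < 1}; integrating it over
    # {x1 <= 0.3, x2 <= 0.6, x1 < x2} gives 2*0.3*0.6 - 0.3**2 = 0.27
    print(((u[:, 0] <= 0.3) & (u[:, 1] <= 0.6)).mean())   # ~ 0.27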
Remark: If we assumed that the $X_i$'s were not independent but came from an exchangeable distribution with density $f(x_1, x_2, \ldots, x_n)$, i.e. the distribution of the $X_i$'s is invariant to permutations, then f is necessarily symmetric in its arguments, and an argument similar to the one above would show that
$$f_{\mathrm{ord}}(x_1, x_2, \ldots, x_n) = n! \, f(x_1, x_2, \ldots, x_n) , \qquad (x_1, x_2, \ldots, x_n) \in B .$$
Now, consider the situation where the distribution of $(X_1, X_2, \ldots, X_n)$ is exchangeable. We seek to find:
$$P\left((X_1, X_2, \ldots, X_n) = (x_{\pi_1}, x_{\pi_2}, \ldots, x_{\pi_n}) \mid X_{(1)} = x_1, X_{(2)} = x_2, \ldots, X_{(n)} = x_n\right)$$
for some permutation π. Let τ be an arbitrary permutation. Note that $(Y_1, Y_2, \ldots, Y_n) \equiv (X_{\tau_1}, X_{\tau_2}, \ldots, X_{\tau_n})$ has the same distribution as $(X_1, X_2, \ldots, X_n)$. Thus,
$$P\left((X_1, \ldots, X_n) = (x_{\pi_1}, \ldots, x_{\pi_n}) \mid X_{(1)} = x_1, \ldots, X_{(n)} = x_n\right)$$
$$= P\left((Y_1, \ldots, Y_n) = (x_{\pi_1}, \ldots, x_{\pi_n}) \mid Y_{(1)} = x_1, \ldots, Y_{(n)} = x_n\right)$$
$$= P\left((X_{\tau_1}, \ldots, X_{\tau_n}) = (x_{\pi_1}, \ldots, x_{\pi_n}) \mid X_{(1)} = x_1, \ldots, X_{(n)} = x_n\right)$$
$$= P\left((X_1, \ldots, X_n) = (x_{(\pi \circ \tau^{-1})_1}, \ldots, x_{(\pi \circ \tau^{-1})_n}) \mid X_{(i)} = x_i, \; i = 1, \ldots, n\right) .$$
As τ runs over all permutations, so does $\pi \circ \tau^{-1}$, showing that the conditional probability under consideration does not depend upon the permutation π initially fixed. As there are n! permutations, we conclude that:
$$P\left((X_1, X_2, \ldots, X_n) = (x_{\pi_1}, x_{\pi_2}, \ldots, x_{\pi_n}) \mid X_{(1)} = x_1, X_{(2)} = x_2, \ldots, X_{(n)} = x_n\right) = \frac{1}{n!} .$$
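One empirically checkable consequence: for an i.i.d. (hence exchangeable) continuous sample, the permutation that carries the order statistics back to their observed positions must be uniform over all n! possibilities. A small sketch follows (normal draws, n = 3, and the trial count are arbitrary choices; this checks the marginal law of that permutation, while the result above is the stronger conditional statement):

    import numpy as np

    rng = np.random.default_rng(4)
    n, trials = 3, 10**5
    # argsort of each row records which permutation of the sorted values
    # was observed; for exchangeable continuous rows it should be uniform
    perms = np.argsort(rng.normal(size=(trials, n)), axis=1)
    uniq, counts = np.unique(perms, axis=0, return_counts=True)
    for p, c in zip(uniq, counts):
        print(p, c / trials)   # each ~ 1/3! = 0.1667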
An example with Uniforms: Suppose that $X_1, X_2, \ldots, X_n$ are i.i.d. Uniform(0, θ). The joint density of $\{X_{(i)}\}$ is given by:
$$f_{\mathrm{ord}}(x_1, x_2, \ldots, x_n) = \frac{n!}{\theta^n} \, 1\{0 < x_1 < x_2 < \cdots < x_n < \theta\} .$$
The marginal density of the maximum, $X_{(n)}$, is:
$$f_{X_{(n)}}(x_n) = \frac{n}{\theta^n} \, x_n^{n-1} \, 1\{0 < x_n < \theta\} .$$
So, the conditional density of $(X_{(1)}, X_{(2)}, \ldots, X_{(n-1)})$ given $X_{(n)} = x_n$, by direct division, is seen to be:
$$f_{\mathrm{cond}}(x_1, x_2, \ldots, x_{n-1}) = \frac{(n-1)!}{x_n^{n-1}} \, 1\{0 < x_1 < x_2 < \cdots < x_{n-1} < x_n\} .$$
This shows that the first n − 1 order statistics given the maximum, $x_n$, are distributed as the n − 1 order statistics from a sample of size n − 1 from Uniform(0, $x_n$). But note that, given $(X_{(1)}, X_{(2)}, \ldots, X_{(n)})$, the vector of remaining values $\{X_1, X_2, \ldots, X_n\} \setminus \{X_{(n)}\}$ must be uniformly distributed over all the (n − 1)! permutations of the first n − 1 order statistics. Thus, the random vector $\{X_1, X_2, \ldots, X_n\} \setminus \{X_{(n)}\}$, conditional on $X_{(n)}$, must behave like an i.i.d. random sample from Uniform(0, $X_{(n)}$). These arguments can be made more rigorous, but at the expense of much notation.
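A short simulation supports this (θ, n, and the evaluation points below are arbitrary choices): if the remaining values given $X_{(n)}$ are i.i.d. Uniform(0, $X_{(n)}$), then their ratios to the maximum, pooled over many samples, should match the Uniform(0, 1) distribution function:

    import numpy as np

    rng = np.random.default_rng(5)
    theta, n, trials = 2.0, 5, 10**5
    x = rng.uniform(0.0, theta, size=(trials, n))
    mx = x.max(axis=1)
    # drop the maximum from each row and rescale by it
    ratios = np.sort(x, axis=1)[:, :-1] / mx[:, None]
    r = ratios.ravel()
    # if the remaining values given X_(n) are i.i.d. Uniform(0, X_(n)),
    # the ratios are i.i.d. Uniform(0, 1): check a few CDF values
    for t in [0.25, 0.5, 0.75]:
        print(t, (r <= t).mean())   # each ~ t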
Order statistics and non-exchangeable distributions: Take (X, Y) to be a pair of independent random variables, each taking values in (0, 1), with X having Lebesgue density f and Y having Lebesgue density g. Now, X and Y are in general not exchangeable. We consider $P((X, Y) \in A, (U, V) \in B)$, where $U = X \wedge Y$, $V = X \vee Y$, A is a Borel subset of $(0, 1)^2$, and B a Borel subset of $\{x < y : x, y \in (0, 1)\}$. Let π be the permutation on {1, 2} that swaps indices. Then:
$$P((X, Y) \in A, (U, V) \in B) = P((X, Y) \in A \cap (B \cup \pi B)) = P((X, Y) \in A \cap B) + P((X, Y) \in A \cap \pi B)$$
$$= \int_{A \cap B} f(x) g(y) \, dx \, dy + \int_{A \cap \pi B} f(x) g(y) \, dx \, dy$$
$$= \int_{A \cap B} f(u) g(v) \, du \, dv + \int_{\pi A \cap B} f(v) g(u) \, du \, dv \quad \text{(change of variable)}$$
$$= \int_B \left\{ f(u) g(v) \, 1((u, v) \in A) + f(v) g(u) \, 1((u, v) \in \pi A) \right\} du \, dv .$$
(The second equality holds because B and πB lie on opposite sides of the diagonal and are therefore disjoint; the change of variable in the second integral is (u, v) = (y, x).)
From the above derivation, taking A to be the unit square, we find that:
$$P((U, V) \in B) = \int_B \left( f(u) g(v) + f(v) g(u) \right) du \, dv ,$$
so that $dF_{U,V}(u, v) = (f(u) g(v) + f(v) g(u)) \, du \, dv$. Conclude that:
$$P((X, Y) \in A, (U, V) \in B) = \int_B \xi((u, v), A) \, dF_{U,V}(u, v) ,$$
where, for u < v,
$$\xi((u, v), A) = \frac{f(u) g(v)}{f(u) g(v) + f(v) g(u)} \, 1((u, v) \in A) + \frac{f(v) g(u)}{f(u) g(v) + f(v) g(u)} \, 1((u, v) \in \pi A) .$$
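The formula for ξ can be checked by simulation with concrete (hypothetical) choices f(x) = 2x and g ≡ 1 on (0, 1); conditioning on the continuous pair (U, V) is approximated by binning on a small box around (u0, v0), where ξ predicts a probability of u0/(u0 + v0) for the event {X < Y}:

    import numpy as np

    rng = np.random.default_rng(6)
    n = 10**6
    x = np.sqrt(rng.uniform(size=n))    # X ~ f(x) = 2x on (0, 1)
    y = rng.uniform(size=n)             # Y ~ g = Uniform(0, 1)
    u, v = np.minimum(x, y), np.maximum(x, y)

    # xi((u0, v0), {x < y}) = f(u0) g(v0) / (f(u0) g(v0) + f(v0) g(u0))
    #                       = u0 / (u0 + v0) for these f and g
    u0, v0, h = 0.2, 0.7, 0.02
    box = (np.abs(u - u0) < h) & (np.abs(v - v0) < h)
    print((x[box] < y[box]).mean())     # ~ 0.2/0.9 = 0.222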
Remark: If $(X_1, X_2, \ldots, X_n)$ is a random vector with density $\prod_{i=1}^n f_i(x_i)$, you should be able to guess the form of the conditional distribution of the $X_i$'s given the order statistics.