Conditional Distributions

The goal is to provide a general definition of the conditional distribution of $Y$ given $X$, when $(X, Y)$ are jointly distributed. Let $F$ be a distribution function on $\mathbb{R}$. Let $G(\cdot, \cdot)$ be a map from $\mathbb{R} \times \mathcal{B}_{\mathbb{R}}$ to $[0, 1]$ satisfying:

(a) $G(x, \cdot)$ is a probability measure on $\mathcal{B}_{\mathbb{R}}$ for every $x$ in $\mathbb{R}$, and

(b) $G(\cdot, A)$ is a measurable function for every Borel set $A$.

We can then form the generalized product $F \times G$ in the following sense: there exists a measure $H$ on $\mathcal{B}_{\mathbb{R}^2}$, which we call $F \times G$, such that
\[
(F \times G)\big[(-\infty, x_0] \times (-\infty, y_0]\big] = \int_{-\infty}^{x_0} G(x, (-\infty, y_0]) \, dF(x) .
\]
More generally, for Borel subsets $A, B$, we should have
\[
(F \times G)(A \times B) = \int_A G(x, B) \, dF(x) .
\]
If $(X, Y)$ has distribution $F \times G$, then the marginal of $X$ is $F$ and the conditional of $Y$ given $X = x$ is $G(x, \cdot)$. From $F \times G$, we can recover the marginal distribution of $Y$, say $\tilde{F}$, and the conditional of $X$ given $Y = y$, say $\tilde{G}(y, \cdot)$, where $\tilde{G}$ has the same properties as $G$ and $\tilde{F} \times \tilde{G}$ gives the distribution of $(Y, X)$. Note that
\[
\tilde{F}(y_0) = P(X < \infty, Y \le y_0) = \int_{-\infty}^{\infty} G(x, (-\infty, y_0]) \, dF(x) .
\]
When $G$ does not depend on its first co-ordinate, i.e. $G(x, \cdot)$ is the same measure for all $x$, then $X$ and $Y$ are independent, and $G(\cdot) \equiv G(0, \cdot)$ gives the marginal distribution of $Y$.

Example 1: Suppose that $F$ is $U(0, 1)$, and that $G$ is defined by
\[
G(x, \{1\}) = x \quad \text{and} \quad G(x, \{0\}) = 1 - x .
\]
We seek to find $F \times G$, which is defined on $([0, 1] \times \{0, 1\}, \mathcal{B}_{[0,1]} \times 2^{\{0,1\}})$. Let $(X, Y)$ follow $F \times G$. Now,
\[
P(X \le x, Y = 1) = \int_0^x G(u, \{1\}) \, dF(u) = \int_0^x u \, du = \frac{x^2}{2} . \tag{0.1}
\]
Similarly,
\[
P(X \le x, Y = 0) = x - \frac{x^2}{2} . \tag{0.2}
\]
So $P(Y = 1) = 1/2$; therefore $Y \sim \mathrm{Ber}(1/2)$. It is also clear from the above discussion that given $X = x$, $Y \sim \mathrm{Ber}(x)$. This can also be verified through the limiting definition of conditional probabilities that was discussed before:
\begin{align*}
P(Y = 1 \mid X = x) &= \lim_{h \to 0} P(Y = 1 \mid X \in [x, x + h]) \\
&= \lim_{h \to 0} \frac{P(Y = 1, X \in [x, x + h])}{P(X \in [x, x + h])} \\
&= \lim_{h \to 0} \frac{\int_x^{x+h} u \, du}{h} = x .
\end{align*}
Next, we seek to find $H(y, \cdot)$, the conditional of $X$ given $Y = y$. We could do this via the definition of conditional probabilities when conditioning on a discrete random variable, but let us try the more formal recipe. We have
\[
P(Y = 1, X \le x) = \int_{\{1\}} H(y, (0, x]) \, d\tilde{F}(y) = \frac{1}{2} \, H(1, (0, x]) ,
\]
and
\[
P(Y = 0, X \le x) = \int_{\{0\}} H(y, (0, x]) \, d\tilde{F}(y) = \frac{1}{2} \, H(0, (0, x]) ,
\]
and using (0.1) and (0.2), we get
\[
H(1, (0, x]) = x^2 \quad \text{and} \quad H(0, (0, x]) = 2x - x^2 .
\]
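As a quick numerical sanity check on Example 1 (not part of the formal development), here is a minimal Python/numpy simulation sketch of the two-stage sampling implicit in $F \times G$: draw $X \sim U(0,1)$, then $Y \mid X = x \sim \mathrm{Ber}(x)$. The variable names are ours. The empirical marginal of $Y$ should be close to $\mathrm{Ber}(1/2)$, and the empirical conditional of $X$ given $Y = 1$ should match $H(1, (0, t]) = t^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Two-stage sampling from F x G: first X ~ F = U(0,1),
# then Y | X = x ~ G(x, .) = Ber(x).
x = rng.uniform(0.0, 1.0, size=n)
y = (rng.uniform(0.0, 1.0, size=n) < x).astype(int)

# Marginal of Y: P(Y = 1) should be about 1/2.
print("P(Y = 1) ~", y.mean())

# Conditional of X given Y = 1: H(1, (0, t]) = t^2.
x_given_y1 = x[y == 1]
for t in (0.25, 0.5, 0.75):
    print(f"P(X <= {t} | Y = 1) ~ {(x_given_y1 <= t).mean():.4f}  (theory: {t**2:.4f})")
```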
Example 2: Let $X$ be a continuous random variable with density
\[
f(x) = \frac{2}{3} e^{-x} \, 1(x > 0) + \frac{1}{3} e^{x} \, 1(x < 0) .
\]
Note that the distribution function of $X$ is
\[
F(x) = P(X \le x) = \frac{1}{3} e^{x} \, 1(x \le 0) + \left( \frac{1}{3} + \frac{2}{3} \left(1 - e^{-x}\right) \right) 1(x > 0) .
\]
We first find the conditional distribution of $Y = \mathrm{sign}(X)$ given $|X|$, i.e. $P(Y = 1 \mid |X| = x)$ for $x > 0$. It suffices to find a transition function $G(t, a)$, $a \in \{-1, 1\}$, $t > 0$, such that
\[
P(|X| \le x, Y = 1) = \int_0^x G(t, 1) \, dF_{|X|}(t) , \tag{0.3}
\]
and
\[
P(|X| \le x, Y = -1) = \int_0^x G(t, -1) \, dF_{|X|}(t) .
\]
Now,
\[
P(|X| \le x, Y = 1) = P(0 < X \le x) = \frac{2}{3} \left(1 - e^{-x}\right) ,
\]
and
\[
P(|X| \le x, Y = -1) = P(-x \le X < 0) = \frac{1}{3} \left(1 - e^{-x}\right) .
\]
So $P(|X| \le x) = 1 - e^{-x}$. Now, by (0.3),
\[
\frac{2}{3} \left(1 - e^{-x}\right) = \int_0^x G(t, 1) \, d\left(1 - e^{-t}\right) = \int_0^x G(t, 1) \, e^{-t} \, dt ,
\]
showing that $G(x, 1) = 2/3$. Similarly, $G(x, -1) = 1/3$.

Note that the distribution corresponding to $f$ can be generated by the following stochastic mechanism: let $V$ follow $\mathrm{Exp}(1)$ and let $B$ be a $\{-1, 1\}$-valued random variable independent of $V$, with $p_B(1) = 2/3 = 1 - p_B(-1)$, and let $X = V \, 1\{B = 1\} - V \, 1\{B = -1\}$. Then $V$ is precisely $|X|$ and $X \sim f$. Note that the sign of $X$ is precisely $B$ and it is independent of $V = |X|$ by the mechanism itself. So the conditional of $\mathrm{sign}(X)$ given $|X|$ is simply the unconditional distribution of $B$, and we obtain the same result as with the formal derivation.

Next, consider the distribution of $X$ given $Y$. Note that $P(Y = -1) = 1/3$ and $P(Y = 1) = 2/3$. We have
\[
P(Y = -1, X \le x) = \frac{1}{3} \, H(-1, (-\infty, x]) ,
\]
so
\begin{align*}
H(-1, (-\infty, x]) &= 3 \, P(Y = -1, X \le x) = 3 \, P(X \le 0 \wedge x) \\
&= 3 \left( \frac{1}{3} \, 1(x > 0) + P(X \le x) \, 1(x \le 0) \right) \\
&= e^{x} \, 1(x \le 0) + 1(x > 0) .
\end{align*}
Similarly, we compute $H(1, (-\infty, x])$.
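The stochastic mechanism above is easy to simulate. Here is a minimal sketch (Python/numpy; the binning check and the names are ours) that generates $X$ from $(V, B)$ and verifies that the fraction of positive signs is about $2/3$ within any bin of $|X|$, consistent with $G(x, 1) = 2/3$ not depending on $x$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300_000

# Mechanism: V ~ Exp(1); B = +1 w.p. 2/3, -1 w.p. 1/3, independent of V; X = B * V.
v = rng.exponential(1.0, size=n)
b = np.where(rng.uniform(size=n) < 2 / 3, 1, -1)
x = b * v

# sign(X) given |X|: within any bin of |X| = V, the fraction of positive
# signs should be about 2/3, reflecting independence of sign(X) and |X|.
for lo, hi in [(0.0, 0.5), (0.5, 1.0), (1.0, 2.0), (2.0, 4.0)]:
    sel = (v >= lo) & (v < hi)
    print(f"P(Y = 1 | {lo} <= |X| < {hi}) ~ {(b[sel] == 1).mean():.3f}  (theory: 0.667)")
```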
0.1 Order statistics and conditional distributions

Let $X_1, X_2, \ldots, X_n$ be i.i.d. from a distribution $F$ with Lebesgue density $f$. Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the corresponding order statistics. Note that the order statistics are all distinct with probability 1 and
\[
P\left( (X_{(1)}, X_{(2)}, \ldots, X_{(n)}) \in B \right) = 1 ,
\]
where $B = \{(x_1, x_2, \ldots, x_n) : x_1 < x_2 < \ldots < x_n\}$.

Let's first find the joint density of the order statistics. Let $\Pi$ be the set of all permutations of the numbers 1 through $n$. For a measurable subset of $B$, say $A$, we have:
\begin{align*}
P\left( (X_{(1)}, X_{(2)}, \ldots, X_{(n)}) \in A \right) &= P\left( \cup_{\pi \in \Pi} \{ (X_{\pi_1}, X_{\pi_2}, \ldots, X_{\pi_n}) \in A \} \right) \\
&= \sum_{\pi \in \Pi} P\left( (X_{\pi_1}, X_{\pi_2}, \ldots, X_{\pi_n}) \in A \right) \\
&= n! \, P\left( (X_1, X_2, \ldots, X_n) \in A \right) \\
&= n! \int_A \prod_{i=1}^n f(x_i) \, dx_1 \, dx_2 \ldots dx_n .
\end{align*}
This shows that:
\[
f_{\mathrm{ord}}(x_1, x_2, \ldots, x_n) = n! \, \prod_{i=1}^n f(x_i) , \qquad (x_1, x_2, \ldots, x_n) \in B .
\]

Remark: If we assumed that the $X_i$'s were not independent but came from an exchangeable distribution with density $f(x_1, x_2, \ldots, x_n)$, i.e. the distribution of the $X_i$'s is invariant under permutations of the $X_i$'s, then $f$ is necessarily symmetric in its arguments, and an argument similar to the one above would show that
\[
f_{\mathrm{ord}}(x_1, x_2, \ldots, x_n) = n! \, f(x_1, x_2, \ldots, x_n) , \qquad (x_1, x_2, \ldots, x_n) \in B .
\]

Now, consider the situation where the distribution of $(X_1, X_2, \ldots, X_n)$ is exchangeable. We seek to find
\[
P\left( (X_1, X_2, \ldots, X_n) = (x_{\pi_1}, x_{\pi_2}, \ldots, x_{\pi_n}) \mid X_{(1)} = x_1, X_{(2)} = x_2, \ldots, X_{(n)} = x_n \right)
\]
for some permutation $\pi$. Let $\tau$ be an arbitrary permutation. Note that $(Y_1, Y_2, \ldots, Y_n) \equiv (X_{\tau_1}, X_{\tau_2}, \ldots, X_{\tau_n})$ has the same distribution as $(X_1, X_2, \ldots, X_n)$. Thus,
\begin{align*}
& P\left( (X_1, \ldots, X_n) = (x_{\pi_1}, \ldots, x_{\pi_n}) \mid X_{(1)} = x_1, \ldots, X_{(n)} = x_n \right) \\
&= P\left( (Y_1, \ldots, Y_n) = (x_{\pi_1}, \ldots, x_{\pi_n}) \mid Y_{(1)} = x_1, \ldots, Y_{(n)} = x_n \right) \\
&= P\left( (X_{\tau_1}, \ldots, X_{\tau_n}) = (x_{\pi_1}, \ldots, x_{\pi_n}) \mid X_{(1)} = x_1, \ldots, X_{(n)} = x_n \right) \\
&= P\left( (X_1, \ldots, X_n) = (x_{(\pi \circ \tau^{-1})_1}, \ldots, x_{(\pi \circ \tau^{-1})_n}) \mid X_{(i)} = x_i, \ i = 1, \ldots, n \right) .
\end{align*}
As $\tau$ runs over all permutations, so does $\pi \circ \tau^{-1}$, showing that the conditional probability under consideration does not depend upon the permutation $\pi$ initially fixed. As there are $n!$ permutations, we conclude that:
\[
P\left( (X_1, X_2, \ldots, X_n) = (x_{\pi_1}, x_{\pi_2}, \ldots, x_{\pi_n}) \mid X_{(1)} = x_1, X_{(2)} = x_2, \ldots, X_{(n)} = x_n \right) = \frac{1}{n!} .
\]

An example with Uniforms: Suppose that $X_1, X_2, \ldots, X_n$ are i.i.d. Uniform$(0, \theta)$. The joint density of $\{X_{(i)}\}$ is given by
\[
f_{\mathrm{ord}}(x_1, x_2, \ldots, x_n) = \frac{n!}{\theta^n} \, 1\{0 < x_1 < x_2 < \ldots < x_n < \theta\} .
\]
The marginal density of the maximum, $X_{(n)}$, is
\[
f_{X_{(n)}}(x_n) = \frac{n}{\theta^n} \, x_n^{n-1} \, 1\{0 < x_n < \theta\} .
\]
So, the conditional density of $(X_{(1)}, X_{(2)}, \ldots, X_{(n-1)})$ given $X_{(n)} = x_n$, by direct division, is seen to be
\[
f_{\mathrm{cond}}(x_1, x_2, \ldots, x_{n-1}) = \frac{(n-1)!}{x_n^{n-1}} \, 1\{0 < x_1 < x_2 < \ldots < x_{n-1} < x_n\} .
\]
This shows that the first $n - 1$ order statistics given the maximum, $x_n$, are distributed as the $n - 1$ order statistics from a sample of size $n - 1$ from Uniform$(0, x_n)$. But note that the distribution of the vector $\{X_1, X_2, \ldots, X_n\} \setminus \{X_{(n)}\}$ given $(X_{(1)}, X_{(2)}, \ldots, X_{(n)})$ must be uniformly distributed over all the $(n-1)!$ permutations of the first $n - 1$ order statistics. Thus, the random vector $\{X_1, X_2, \ldots, X_n\} \setminus \{X_{(n)}\}$, conditional on $X_{(n)}$, must behave like an i.i.d. random sample from Uniform$(0, X_{(n)})$. These arguments can be made more rigorous, but at the expense of much notation.

Order statistics and non-exchangeable distributions: Take $(X, Y)$ to be a pair of independent random variables, each taking values in $(0, 1)$, with $X$ having Lebesgue density $f$ and $Y$ having Lebesgue density $g$. Now, $X$ and $Y$ are not exchangeable. We consider
\[
P\left( (X, Y) \in A , \ (U, V) \in B \right) ,
\]
where $U = X \wedge Y$, $V = X \vee Y$, $A$ is a Borel subset of the unit square $(0, 1)^2$ and $B$ a Borel subset of $\{(x, y) : x < y , \ x, y \in (0, 1)\}$. Let $\pi$ be the permutation on $\{1, 2\}$ that swaps indices, so that $\pi B = \{(y, x) : (x, y) \in B\}$. Then:
\begin{align*}
P\left( (X, Y) \in A , \ (U, V) \in B \right) &= P\left( (X, Y) \in A \cap (B \cup \pi B) \right) \\
&= P\left( (X, Y) \in A \cap B \right) + P\left( (X, Y) \in A \cap \pi B \right) \\
&= \int_{A \cap B} f(x) g(y) \, dx \, dy + \int_{A \cap \pi B} f(x) g(y) \, dx \, dy \\
&= \int_{A \cap B} f(u) g(v) \, du \, dv + \int_{\pi A \cap B} f(v) g(u) \, du \, dv \qquad \text{(change of variable)} \\
&= \int_B \left\{ f(u) g(v) \, 1((u, v) \in A) + f(v) g(u) \, 1((u, v) \in \pi A) \right\} du \, dv .
\end{align*}
From the above derivation, taking $A$ to be the unit square, we find that
\[
P\left( (U, V) \in B \right) = \int_B \left( f(u) g(v) + f(v) g(u) \right) du \, dv ,
\]
so that $dF_{U,V}(u, v) = (f(u) g(v) + f(v) g(u)) \, du \, dv$. Conclude that
\[
P\left( (X, Y) \in A , \ (U, V) \in B \right) = \int_B \xi((u, v), A) \, dF_{U,V}(u, v) ,
\]
where, for $u < v$,
\[
\xi((u, v), A) = \frac{f(u) g(v)}{f(u) g(v) + f(v) g(u)} \, 1((u, v) \in A) + \frac{f(v) g(u)}{f(u) g(v) + f(v) g(u)} \, 1((u, v) \in \pi A) .
\]

Remark: If $(X_1, X_2, \ldots, X_n)$ is a random vector with density $\prod_{i=1}^n f_i(x_i)$, you should be able to guess the form of the conditional distribution of the $X_i$'s given the order statistics.
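The uniform example above can be seen numerically. Here is a minimal Python/numpy sketch (parameter choices and names are ours): conditionally on the maximum, the remaining $n - 1$ observations, rescaled by the maximum, should look like i.i.d. Uniform$(0, 1)$ draws, with the same behaviour inside any slice of the maximum.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 2.0, 5, 100_000

# Each row: a sample of size n from Uniform(0, theta).
samples = rng.uniform(0.0, theta, size=(reps, n))
sorted_rows = np.sort(samples, axis=1)
maxima = sorted_rows[:, -1]

# Drop the maximum from each row and rescale the rest by it. If the other
# n-1 points are i.i.d. Uniform(0, max) given the maximum, the rescaled
# values should be i.i.d. Uniform(0, 1), independent of the maximum.
rest = sorted_rows[:, :-1] / maxima[:, None]

print("mean of rescaled points (theory 0.5):", rest.mean())
print("P(rescaled <= 0.3) (theory 0.3):", (rest <= 0.3).mean())

# Independence check: same probability within a slice of the maximum.
sel = maxima > 1.8
print("P(rescaled <= 0.3 | max > 1.8):", (rest[sel] <= 0.3).mean())
```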
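Finally, a simulation sketch of the min/max construction (Python/numpy; the specific densities, $f(x) = 2x$ for a Beta$(2,1)$ variable and $g \equiv 1$ for a uniform, are our own illustrative choice, not from the notes): conditionally on $(U, V) \approx (u, v)$, the probability that $X$ took the smaller value should be $f(u) g(v) / (f(u) g(v) + f(v) g(u))$, which here simplifies to $u/(u + v)$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000

# X ~ Beta(2, 1) (density f(x) = 2x on (0,1)), Y ~ Uniform(0, 1) (g = 1),
# independent; U = min(X, Y), V = max(X, Y).
x = rng.beta(2.0, 1.0, size=n)
y = rng.uniform(0.0, 1.0, size=n)
u, v = np.minimum(x, y), np.maximum(x, y)

# Conditionally on (U, V) near (u0, v0), the chance that X is the smaller
# value should be f(u0)g(v0) / (f(u0)g(v0) + f(v0)g(u0)) = u0 / (u0 + v0).
u0, v0, eps = 0.2, 0.7, 0.02
sel = (np.abs(u - u0) < eps) & (np.abs(v - v0) < eps)
print("P(X = U | (U,V) ~ (0.2, 0.7)) ~", (x[sel] <= y[sel]).mean())
print("theory:", u0 / (u0 + v0))
```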