APPENDIX C. Distribution Functions

Let (Ω, F, P) be a probability space and let X be a random variable on (Ω, F, P).

Theorem C.1. Each random variable X on (Ω, F, P) induces a probability space (R^1, B^1, μ_X) by

    μ_X(B) = P({ω : X(ω) ∈ B}) = P(X ∈ B)  for all B ∈ B^1.   (C.1)

Equation (C.1) defines μ_X as a probability measure on (R^1, B^1): the random variable X transports the original probability space (Ω, F, P) to a new probability space (R^1, B^1, μ_X). The measure μ_X is called the distribution of X.

Notation C.2. {ω : X(ω) ∈ B} = {X ∈ B} = X^{-1}(B). The set X^{-1}(B) is called the pre-image of B under X. With this notation,

    μ_X(B) = P(X ∈ B) = P(X^{-1}(B)) = (P ∘ X^{-1})(B),

i.e., μ_X = P ∘ X^{-1}.

Definition C.3. The function F defined by

    F(x) = μ_X((-∞, x]) = P(X ≤ x)

is called the cumulative distribution function (c.d.f.) of the random variable X.

Example C.4.
(1) If the cumulative distribution function of X is given by

    F(x) = (1 / (√(2π) σ)) ∫_{-∞}^{x} exp( -(z - μ)^2 / (2σ^2) ) dz,

we say X is normally distributed with mean μ and variance σ^2, denoted by X ∼ N(μ, σ^2).

(2) Suppose (Ω, F, P) = ([0, 1], B^1, m), where m is Lebesgue measure.

  (i) X_1(ω) = ω for all ω ∈ [0, 1]. Then

      F(x) = P(X_1 ≤ x) = m({ω ∈ [0, 1] : ω ≤ x}) = { 0 if x < 0;  x if 0 ≤ x ≤ 1;  1 if x > 1 }.

  (ii) X_2(ω) = 1 - ω for all ω ∈ [0, 1]. Then

      F(x) = { 0 if x < 0;  x if 0 ≤ x ≤ 1;  1 if x > 1 },

which is the same as the distribution function of X_1. A random variable with this distribution function is said to be uniformly distributed on [0, 1].

Proposition C.5. Let F(x) be the distribution function of a random variable X. Then
(1) F(-∞) := lim_{x → -∞} F(x) = 0 and F(∞) := lim_{x → ∞} F(x) = 1.
(2) F is non-decreasing and right-continuous.

Remark C.6.
(1) If F is differentiable, the function f(x) = F′(x) is called the probability density function (p.d.f.) of X.
(2) If F can be represented in the form

    F = Σ_{j=1}^{∞} b_j I_{[a_j, ∞)},

where (a_j) ⊆ R, b_j > 0 for all j, and Σ_{j=1}^{∞} b_j = 1, then F is called a discrete distribution function.
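The following numerical sketch is not part of the text; it illustrates Example C.4(2) under the stated setup. We simulate ω uniformly from [0, 1], form X_1(ω) = ω and X_2(ω) = 1 - ω, and check that both empirical c.d.f.s are close to F(x) = x, even though X_1 and X_2 are different functions on Ω. The helper name `empirical_cdf` is illustrative, not from the text.

```python
import random

def empirical_cdf(samples, x):
    """Fraction of samples that are <= x: the empirical c.d.f. at x."""
    return sum(1 for s in samples if s <= x) / len(samples)

random.seed(0)
# omega drawn from ([0, 1], B^1, m); X1(omega) = omega, X2(omega) = 1 - omega.
omegas = [random.random() for _ in range(100_000)]
x1 = omegas
x2 = [1 - w for w in omegas]

# Both empirical c.d.f.s approximate F(x) = x on [0, 1]:
# distinct random variables can share the same distribution function.
for x in (0.25, 0.5, 0.75):
    assert abs(empirical_cdf(x1, x) - x) < 0.01
    assert abs(empirical_cdf(x2, x) - x) < 0.01
```

With 100,000 samples the Monte Carlo error at any fixed x is on the order of 1/√N ≈ 0.003, so the 0.01 tolerance comfortably holds.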
Moreover, if a_i ≠ a_j whenever i ≠ j, then the function m({a_i}) = b_i is called the probability mass function.

(3) Any distribution function F admits the decomposition

    F = α F_d + β F_ac + γ F_sc,

where α, β, γ ≥ 0, α + β + γ = 1, and
• F_d is a discrete distribution function;
• F_ac is an absolutely continuous distribution function, i.e., F′_ac(x) exists for all x;
• F_sc is a singular continuous distribution function: F′_sc exists almost everywhere and F′_sc(x) = 0 almost everywhere, but

    F_sc(x) ≠ ∫_{-∞}^{x} F′_sc(y) dy.

For example, the Cantor function shown in Figure C.1 is a singular continuous distribution function.

[Figure C.1. Cantor function]

Remark C.7. The expectation of a random variable X can be written as

    E[X] = ∫_Ω X dP = ∫_Ω X(ω) dP(ω) = ∫_Ω X(ω) P(dω)
         = ∫_R x dμ_X = ∫_R x dμ_X(x) = ∫_R x μ_X(dx)
         = ∫_{-∞}^{∞} x dF_X(x).

Theorem C.8 (Change of variables). If X is a random variable and g : R^1 → R^1 is a Borel-measurable function, then

    E[g(X)] = ∫_Ω g(X) dP = ∫_R g(x) dμ_X(x) = ∫_{-∞}^{∞} g(x) dF_X(x).

Question: What about the higher-dimensional case?

Definition C.9. Consider the random vector X = (X_1, X_2, …, X_d), so that X : Ω → R^d. Define a probability measure on (R^d, B^d) by

    μ_X(B) = (P ∘ X^{-1})(B) = P(X^{-1}(B)) = P((X_1, X_2, …, X_d) ∈ B)  for B ∈ B^d.

The distribution function of X is defined by

    F_X(x) = P(X_1 ≤ x_1, X_2 ≤ x_2, …, X_d ≤ x_d)  for x = (x_1, x_2, …, x_d).

Theorem C.10. If X is a random vector and g : R^d → R is a Borel-measurable function, then

    E[g(X)] = ∫_Ω g(X) dP = ∫_{R^d} g(x) dμ_X(x)
            = ∫_{-∞}^{∞} ⋯ ∫_{-∞}^{∞} g(x_1, x_2, …, x_d) dF_X(x_1, x_2, …, x_d).

Theorem C.11. The following statements are equivalent.
(1) X_1, X_2, …, X_d are independent, i.e.,

    P(X_1 ∈ B_1, X_2 ∈ B_2, …, X_d ∈ B_d) = P(X_1 ∈ B_1) P(X_2 ∈ B_2) ⋯ P(X_d ∈ B_d).

(2) μ_X = μ_{X_1} × μ_{X_2} × ⋯ × μ_{X_d} for X = (X_1, X_2, …, X_d).
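The change-of-variables formula of Theorem C.8 can be illustrated numerically; this sketch is not from the text. Taking X uniform on [0, 1] and g(x) = x^2, the left-hand side ∫_Ω g(X) dP is approximated by Monte Carlo sampling of ω, while the right-hand side ∫ g(x) dF_X(x) = ∫_0^1 x^2 dx = 1/3 is evaluated in closed form; the two agree.

```python
import random

random.seed(1)
N = 200_000
g = lambda x: x * x  # a Borel-measurable test function

# LHS of Theorem C.8: integral of g(X(omega)) over Omega = [0, 1]
# with P = Lebesgue measure, approximated by Monte Carlo.
lhs = sum(g(random.random()) for _ in range(N)) / N

# RHS: integral of g(x) dF_X(x); for the uniform distribution on [0, 1]
# this is the Riemann integral of x^2 dx from 0 to 1, which equals 1/3.
rhs = 1 / 3

assert abs(lhs - rhs) < 0.005
```

The Monte Carlo standard error here is about √(Var(X^2)/N) ≈ 0.0007, so the 0.005 tolerance is safe.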
(3) F_X(x_1, x_2, …, x_d) = F_{X_1}(x_1) F_{X_2}(x_2) ⋯ F_{X_d}(x_d).
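The equivalence (1) ⇔ (3) of Theorem C.11 can also be checked numerically for d = 2; this sketch is not from the text, and the helper names are illustrative. For two independent uniform random variables, the empirical joint c.d.f. factors into the product of the empirical marginal c.d.f.s.

```python
import random

random.seed(2)
N = 100_000
# Two independent uniform random variables on [0, 1].
pairs = [(random.random(), random.random()) for _ in range(N)]

def joint_cdf(pairs, x, y):
    """Empirical F_X(x, y) = P(X1 <= x, X2 <= y)."""
    return sum(1 for (a, b) in pairs if a <= x and b <= y) / len(pairs)

def marginal_cdf(values, t):
    """Empirical one-dimensional c.d.f. at t."""
    return sum(1 for v in values if v <= t) / len(values)

x, y = 0.3, 0.7
fx = marginal_cdf([a for a, _ in pairs], x)
fy = marginal_cdf([b for _, b in pairs], y)

# Theorem C.11 (1) <=> (3): the joint c.d.f. factors into the marginals.
assert abs(joint_cdf(pairs, x, y) - fx * fy) < 0.01
```

For dependent coordinates (e.g., pairs (ω, 1-ω)) the factorization fails, which is exactly the content of the equivalence.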