
MATH 7550-01 INTRODUCTION TO PROBABILITY FALL 2011

Lecture 7. Densities. Distribution functions.

Here I want to introduce the concept of a measure $m$ on a measurable space $(X, \mathcal{X})$ being $\sigma$-finite. This means that there exists an infinite sequence of sets $B_i \in \mathcal{X}$ such that $m(B_i) < \infty$ and $\bigcup_{i=1}^{\infty} B_i = X$. The Lebesgue measure $\lambda^n$ is $\sigma$-finite (we can take $B_i = [-i, i]^n$), while the counting measure $\#$ is not if the space $X$ is uncountable.

Now, how is the integral $\int_A f\,dm$ of a measurable function $f$ with respect to the measure $m$ over some subset $A \subset X$, and not over the whole $X$, defined? The simplest way to do so is to take by definition, for $A \in \mathcal{X}$,

$$\int_A f(x)\,m(dx) = \int_X I_A(x)\,f(x)\,m(dx). \tag{7.1}$$

This definition leads to the same result as if we consider simple functions on $A$, then arbitrary nonnegative measurable functions on $A$, etc.

Now, we started the whole digression about Lebesgue integrals when we spoke about distribution densities; our last formula before that was (5.13). If a set function $n$ is given by formula (5.13) for all $C \in \mathcal{X}$, it is necessarily countably additive: for disjoint $A_1, A_2, \dots, A_i, \dots$

$$\begin{aligned}
n\Bigl(\bigcup_{i=1}^{\infty} A_i\Bigr) &= \int_{\bigcup_{i=1}^{\infty} A_i} f(x)\,m(dx) = \int_X I_{\bigcup_{i=1}^{\infty} A_i}(x)\cdot f(x)\,m(dx) \\
&= \int_X \sum_{i=1}^{\infty} I_{A_i}(x)\cdot f(x)\,m(dx) = \sum_{i=1}^{\infty} \int_X I_{A_i}(x)\cdot f(x)\,m(dx) = \sum_{i=1}^{\infty} n(A_i)
\end{aligned} \tag{7.2}$$

(the interchange of summation and integration was proved above for nonnegative integrands; so something remains here to be proved).

So a set function $n$ having a density with respect to a measure $m$ is necessarily countably additive. (In Kolmogorov & Fomin's book countably additive set functions are called charges, because such functions provide a mathematical model for electric charges, $n(A)$ being the total charge, positive or negative, carried by the region $A$.)
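As a rough numerical illustration of (7.1) and of additivity over disjoint sets, here is a minimal Python sketch. It is only a sketch under stated assumptions: the density $f(x) = e^{-x}$, the space $X = [0, 10]$, the sets $A_1 = [0, 1)$ and $A_2 = [1, 3)$, and the helper name `integral_over_set` are all chosen for illustration, and a midpoint Riemann sum stands in for the Lebesgue integral.

```python
import math

def integral_over_set(f, indicator, lo, hi, steps=100_000):
    """Midpoint-rule stand-in for the integral of I_A(x) * f(x) over X = [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h          # midpoint of the k-th cell
        if indicator(x):                # I_A(x) = 1 exactly when x is in A
            total += f(x)
    return total * h

def density(x):
    """Illustrative density f(x) = exp(-x); not taken from the lecture."""
    return math.exp(-x)

def in_a1(x):
    return 0.0 <= x < 1.0               # A_1 = [0, 1)

def in_a2(x):
    return 1.0 <= x < 3.0               # A_2 = [1, 3), disjoint from A_1

def in_union(x):
    return in_a1(x) or in_a2(x)         # indicator of A_1 union A_2

n_a1 = integral_over_set(density, in_a1, 0.0, 10.0)
n_a2 = integral_over_set(density, in_a2, 0.0, 10.0)
n_union = integral_over_set(density, in_union, 0.0, 10.0)

print(n_a1, n_a2, n_union)
print(abs(n_union - (n_a1 + n_a2)))     # difference is at the level of rounding error
```

The check only covers two sets; the countable case in (7.2) needs the limit argument discussed in the text.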
A density $f(x)$ of $n$ with respect to $m$ is, generally, not unique: if we change the function $f(x)$ arbitrarily on a non-empty set $A$ having zero $m$-measure (such sets may not exist for a given measure $m$; they certainly do exist for the Lebesgue measure), the integrals $\int_C f\,dm$ do not change, and the changed $f$ is still a version of the density.

Theorem 7.1. The density of a countably additive set function $n$ with respect to $m$ is almost unique (of course supposing that it exists); i.e., if the measurable functions $f_1$, $f_2$ are two versions of this density:

$$\int_C f_1(x)\,m(dx) = \int_C f_2(x)\,m(dx) = n(C) \tag{7.3}$$

for every $C \in \mathcal{X}$, then

$$f_1(x) = f_2(x) \text{ almost everywhere,} \tag{7.4}$$

that is,

$$m\{x : f_1(x) \neq f_2(x)\} = 0. \tag{7.5}$$

"Proof": Let $A = \{x : f_1(x) < f_2(x)\}$; then we have $\int_A [f_2(x) - f_1(x)]\,m(dx) = \int_A f_2\,dm - \int_A f_1\,dm = n(A) - n(A) = 0$. The integral of the function $f_2(x) - f_1(x)$, which is strictly positive on $A$, is equal to 0, and it follows from this that $m(A) = 0$. Similarly, for $B = \{x : f_1(x) > f_2(x)\}$ we have $m(B) = 0$; and $m\{x : f_1(x) \neq f_2(x)\} = m(A) + m(B) = 0$.

However, it turns out that the statement of Theorem 7.1 is false (although it's pretty easy to make it correct by introducing some extra conditions), and the "proof" contains a mistake.

Problem 7: Produce an example of measures $m$ and $n$ on a space $(X, \mathcal{X})$ such that $n(C) = \int_C f_1(x)\,m(dx) = \int_C f_2(x)\,m(dx)$ for every $C \in \mathcal{X}$, but $m\{x : f_1(x) \neq f_2(x)\} \neq 0$. Show at which point the "proof" of Theorem 7.1 fails.

I'll give the corrected version of Theorem 7.1 at the end of the next version of Lecture Note 7 (not in the present version, since I want you to solve Problem 7 by yourselves).

The notations for (every version of) the density of $n$ with respect to $m$ will be

$$f(x) = \frac{n(dx)}{m(dx)} = \frac{dn}{dm}(x). \tag{7.6}$$

Why such notations are used, we'll discuss later.

It is clear that if $n$ has a density with respect to $m$, then for $C \in \mathcal{X}$

$$m(C) = 0 \;\Rightarrow\; n(C) = 0. \tag{7.7}$$

A countably additive set function $n$ is called absolutely continuous with respect to $m$ if condition (7.7) is satisfied. The notation for absolute continuity is

$$n \ll m. \tag{7.8}$$

The following theorem (a real big one, not so easy to prove) is proved in measure theory:

Theorem 7.2 (Radon-Nikodym theorem: see, e.g., Kolmogorov & Fomin's book, Theorem 2 of Section 34). Let a measure $m$ on $(X, \mathcal{X})$ be $\sigma$-finite, and let $n$ be a finite countably additive set function on $(X, \mathcal{X})$ that is absolutely continuous with respect to $m$. Then $n$ has a density $f(x)$ with respect to $m$.

Note that the Lebesgue measure $\lambda^n$ is $\sigma$-finite ($B_i = \{x : |x| \le i\}$); so a density $\dfrac{\mu_\xi(dx)}{\lambda^n(dx)} = \dfrac{\mu_\xi(dx)}{dx}$ with respect to the Lebesgue measure exists if and only if $\mu_\xi$ is absolutely continuous with respect to the Lebesgue measure. (Usually, for shortness, we speak of just continuous distributions. Random variables having such distributions are called continuous random variables.)

It turns out that the description of discrete distributions by means of their "probability mass functions" $p_\xi(x) = P\{\xi = x\}$ is also a description involving a density: not with respect to the Lebesgue measure, but with respect to the counting measure $\#$:

$$p_\xi(x) = \frac{\mu_\xi(dx)}{\#(dx)}. \tag{7.9}$$

Indeed, this means that for every set $C \in \mathcal{X}$

$$P\{\xi \in C\} = \int_C p(x)\,\#(dx). \tag{7.10}$$

By (6.28), this is the same as

$$P\{\xi \in C\} = \sum_{x \in C} p(x), \tag{7.11}$$

and this is just formula (5.8).

This is a reason for treating probability mass functions of discrete distributions and probability densities of (absolutely) continuous ones in parallel ways (e.g., the statistical maximum-likelihood estimates are defined by the same formulas in the discrete and in the continuous case).

Note that since the counting measure $\#$ is not, generally, $\sigma$-finite, the above results about existence and (almost) uniqueness of the density cannot be applied.
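Formula (7.11) says that integrating the probability mass function against the counting measure is just summation over the points of $C$. A small sketch in Python, assuming for illustration that $\xi$ is the number of heads in three fair coin tosses (this example and the helper name `prob` are not from the lecture):

```python
from fractions import Fraction
from math import comb

N_TRIALS = 3  # illustrative choice: three fair coin tosses

# Probability mass function p(x) = P{xi = x} of the number of heads.
pmf = {k: Fraction(comb(N_TRIALS, k), 2 ** N_TRIALS) for k in range(N_TRIALS + 1)}

def prob(event):
    """P{xi in C} as the sum of p(x) over x in C, as in (7.11).

    Points of C outside the support contribute p(x) = 0.
    """
    return sum((pmf.get(x, Fraction(0)) for x in event), Fraction(0))

p_odd = prob({1, 3})        # 3/8 + 1/8
p_total = prob(range(10))   # C contains the whole support {0, 1, 2, 3}

print(p_odd, p_total)
```

Exact fractions are used so that the sums can be compared without rounding error.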
As for uniqueness of a density with respect to $\#$, it still is there, and even without "almost" (because there are no non-empty sets with $\#(A) = 0$). Existence is another matter: absolute continuity with respect to $\#$ requires only $\mu_\xi(\varnothing) = 0$, and it does not follow from this that the distribution is a discrete one. (Nothing at all can follow from $\mu_\xi(\varnothing) = 0$, because this is a general property of all measures.)

Sometimes it is reasonable to consider densities of probability distributions not with respect to the Lebesgue measure, or to the counting measure $\#$, but with respect to some other measures. This is the case, for example, if we consider random variables taking values in an infinite-dimensional space: in such spaces no infinite-dimensional Lebesgue measure can be defined.

It is worthwhile to consider the density $\dfrac{d\mu_\eta}{d\mu_\xi}$ of one distribution with respect to another.

Every distribution of a random variable is a probability measure (i.e., a measure whose value on the whole space is equal to 1); and every probability measure $m$ on a measurable space $(X, \mathcal{X})$ is the distribution of some random variable taking values in $(X, \mathcal{X})$ (in fact, of infinitely many random variables). Indeed, we can take the probability space $(\Omega, \mathcal{F}, P)$ in this way: $\Omega = X$, $\mathcal{F} = \mathcal{X}$, $P = m$; we define on this space the random variable $\xi(\omega) = \omega$; and the distribution $\mu_\xi$ of this random variable will be nothing but the measure $m$: $\mu_\xi(C) = P\{\omega : \xi(\omega) \in C\} = P(C) = m(C)$.

In Lecture 5, discrete distributions and (absolutely) continuous distributions were introduced. Are there distributions that do not belong to these two classes?

Of course there are: mixtures of discrete and continuous distributions, $\mu(C) = q_1 \cdot \mu_1(C) + q_2 \cdot \mu_2(C)$, where $q_1, q_2 > 0$, $q_1 + q_2 = 1$, $\mu_1$ is a discrete distribution, and $\mu_2$ an absolutely continuous one. The mixture is neither discrete nor continuous, because

$$\sum_{x \in \mathbb{R}^1} \mu\{x\} = q_1 \neq 0, 1, \tag{7.12}$$

while for discrete distributions this sum should be equal to 1, and for continuous ones, to 0.

Are there distributions on the real line that are not such mixtures (and not discrete, not absolutely continuous)?
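The mixture construction and formula (7.12) can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration: the weights $q_1 = 1/3$, $q_2 = 2/3$, a fair coin on $\{0, 1\}$ as the discrete part $\mu_1$, the uniform distribution on $[0, 2]$ as the absolutely continuous part $\mu_2$, and the function names.

```python
from fractions import Fraction

Q1, Q2 = Fraction(1, 3), Fraction(2, 3)           # mixture weights, Q1 + Q2 = 1
ATOMS = {0: Fraction(1, 2), 1: Fraction(1, 2)}    # discrete part mu_1: fair coin on {0, 1}

def mu1(a, b):
    """Discrete part: mu_1((a, b]) is the total mass of the atoms in (a, b]."""
    return sum((p for x, p in ATOMS.items() if a < x <= b), Fraction(0))

def mu2(a, b):
    """Absolutely continuous part: uniform on [0, 2], so mu_2((a, b]) is
    half the length of the overlap of (a, b] with [0, 2]."""
    lo, hi = max(a, 0), min(b, 2)
    return Fraction(max(hi - lo, 0)) / 2

def mu(a, b):
    """Mixture mu((a, b]) = q1 * mu_1((a, b]) + q2 * mu_2((a, b])."""
    return Q1 * mu1(a, b) + Q2 * mu2(a, b)

def point_mass(x):
    """mu{x}: only the discrete part can charge a single point."""
    return Q1 * ATOMS.get(x, Fraction(0))

atom_total = sum((point_mass(x) for x in ATOMS), Fraction(0))

print(mu(-1, 3))     # the whole mass of the mixture
print(atom_total)    # the sum in (7.12); it equals Q1, neither 0 nor 1
```

Since the atoms carry total mass $q_1$ strictly between 0 and 1, this $\mu$ falls into neither of the two classes from Lecture 5.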
We'll return to the question of distributions that are neither discrete, nor absolutely continuous, nor mixtures of the two, after considering distribution functions.

Let $\xi$ be a real-valued random variable. Its distribution function is a function on $(-\infty, \infty)$ defined by

$$F(x) = F_\xi(x) = P\{\xi \le x\}. \tag{7.13}$$

Clearly the distribution function depends only on the distribution $\mu$ of the random variable:

$$F(x) = F_\mu(x) = \mu(-\infty, x]. \tag{7.14}$$

Theorem 7.3. We have, for all $-\infty < a \le b < \infty$:

$$P\{a < \xi \le b\} = F_\xi(b) - F_\xi(a). \tag{7.15}$$

The proof is very simple: Clearly, $(-\infty, b] = (-\infty, a] \cup (a, b]$, and these intervals are disjoint; the same holds for their inverse images under the mapping $\xi$:

$$\{\xi \le b\} = \{\xi \le a\} \cup \{a < \xi \le b\}, \tag{7.16}$$

and by (finite) additivity of $P$,

$$P\{\xi \le b\} = P\{\xi \le a\} + P\{a < \xi \le b\}, \tag{7.17}$$

which leads to (7.15).

Theorem 7.4. If $F(x)$ is the distribution function of a random variable $\xi$, then

$$F(x) \text{ is non-decreasing,} \tag{7.18}$$

$$F(\infty) = 1, \tag{7.19}$$

$$F(-\infty) = 0, \tag{7.20}$$

$$F(c+) = F(c) \quad \text{for } c \in (-\infty, \infty), \tag{7.21}$$

$$F(c-) = P\{\xi < c\} \tag{7.22}$$

($F(\infty)$, $F(-\infty)$, $F(c+)$, $F(c-)$ are the notations for the limits $\lim_{x\to\infty} F(x)$, $\lim_{x\to-\infty} F(x)$, the right-hand limit at $c$: $\lim_{x\to c+} F(x)$, and the left-hand limit $\lim_{x\to c-} F(x)$. Note that by definition the formula (7.13) defines $F(x)$ on the real line, and not on the extended real line).

Equality (7.21) means that every distribution function is continuous from the right at every point of the real axis.

Proof. The statement (7.18) follows from Theorem 7.3. For a monotone function, limits at $\pm\infty$ and one-sided limits at every finite point necessarily exist, so the limits in (7.19)-(7.22) do exist.

The limit at $+\infty$ is equal to the limit along every sequence of numbers going to $\infty$; e.g.,

$$F(\infty) = \lim_{n\to\infty} F(n) = \lim_{n\to\infty} P\{\xi \le n\}. \tag{7.23}$$

The sequence of events $B_n = \{\xi \le n\}$ is clearly non-decreasing, so by Theorem 3.3 we have:

$$F(\infty) = P\bigl(\lim_{n\to\infty} \{\xi \le n\}\bigr). \tag{7.24}$$

What is the limit of this sequence of events? It is a non-decreasing sequence, so the limit is the union:

$$\lim_{n\to\infty} \{\xi \le n\} = \bigcup_{i=1}^{\infty} \{\xi \le i\}. \tag{7.25}$$

But this union clearly is the whole sample space $\Omega$: for every sample point $\omega \in \Omega$, there exists at least one natural number $i$ such that $i > \xi(\omega)$ (or $i \ge \xi(\omega)$). So

$$F(\infty) = P(\Omega) = 1. \tag{7.26}$$

Now to (7.20):

$$F(-\infty) = \lim_{n\to\infty} F(-n) = \lim_{n\to\infty} P\{\xi \le -n\} = P\bigl(\lim_{n\to\infty} \{\xi \le -n\}\bigr) = P\Bigl(\bigcap_{i=1}^{\infty} \{\xi \le -i\}\Bigr) = P(\varnothing) = 0 \tag{7.27}$$

(the events $\{\xi \le -n\}$ form a non-increasing sequence with (clearly) empty intersection).

To (7.21) (with a non-increasing sequence):

$$F(c+) = \lim_{n\to\infty} P\{\xi \le c + 1/n\} = P\Bigl(\bigcap_{i=1}^{\infty} \{\xi \le c + 1/i\}\Bigr) = P\{\xi \le c\} = F(c) \tag{7.28}$$

(the event $\{\xi \le c\}$ occurs if and only if all events $\{\xi \le c + 1/i\}$ occur: the "if" part by limit passage in the inequality $\xi \le c + 1/i$, and the "only if": $\xi \le c \Rightarrow \xi \le c + 1/i$ for all natural $i$).

Finally, (7.22) (again a non-decreasing sequence of events):

$$\bigcup_{i=1}^{\infty} \{\xi \le c - 1/i\} = \{\xi < c\} \tag{7.29}$$

(if $\xi(\omega) \le c - 1/i$ for at least one $i$, then certainly $\xi(\omega) < c$; and if $\xi(\omega)$ is less than $c$, there exists a natural number $i$ such that $c - 1/i \in [\xi(\omega), c)$, and $\omega$ belongs to the event in the left-hand side of (7.29)), so

$$F(c-) = \lim_{n\to\infty} P\{\xi \le c - 1/n\} = P\Bigl(\bigcup_{i=1}^{\infty} \{\xi \le c - 1/i\}\Bigr) = P\{\xi < c\}. \tag{7.30}$$

From (7.13) and (7.22) we obtain easily:

$$P\{\xi = c\} = P\{\xi \le c\} - P\{\xi < c\} = F(c) - F(c-) = F(c+) - F(c-); \tag{7.31}$$

this is the jump of the distribution function $F$ at the point $c$.

Now, the corrected formulation of the almost-uniqueness theorem:

Theorem 7.1′. Let the measure $m$ be $\sigma$-finite. Then the density of a countably additive set function $n$ with respect to $m$ is almost unique (supposing that it exists).

Try to work out the proof by yourself (anyway I don't have to give proofs of measure-theory theorems).
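To see Theorem 7.3, right-continuity (7.21), the left limit (7.22) and the jump formula (7.31) on a concrete case, here is a small Python sketch. The fair die, the point $c = 3$ and the function names are illustrative assumptions; the left and right limits are only probed along a few values of $1/n$, which illustrates the limits but does not prove them.

```python
from fractions import Fraction

# Illustrative discrete distribution: a fair die with values 1..6.
PMF = {k: Fraction(1, 6) for k in range(1, 7)}

def F(x):
    """Distribution function F(x) = P{xi <= x}, as in (7.13)."""
    return sum((p for v, p in PMF.items() if v <= x), Fraction(0))

def prob_less(c):
    """P{xi < c}, which equals the left limit F(c-) by (7.22)."""
    return sum((p for v, p in PMF.items() if v < c), Fraction(0))

c = 3
steps = (2, 10, 1000)  # sample values of n for c + 1/n and c - 1/n

# Right-continuity (7.21): F(c + 1/n) already equals F(c) for these n.
right_values = [F(c + Fraction(1, n)) for n in steps]

# Left limit (7.22): F(c - 1/n) equals P{xi < c} for these n.
left_values = [F(c - Fraction(1, n)) for n in steps]

jump = F(c) - prob_less(c)     # (7.31): the jump at c is P{xi = c}
interval = F(5) - F(2)         # Theorem 7.3: P{2 < xi <= 5}

print(right_values, left_values)
print(jump, interval)
```

For a discrete distribution the jumps of $F$ recover the whole probability mass function, which is why the die makes a convenient test case.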
