MATH 7550-01
INTRODUCTION TO PROBABILITY
FALL 2011
Lecture 7. Densities. Distribution functions.
Here I want to introduce the concept of a measure $m$ on a measurable space $(X, \mathcal{X})$ being $\sigma$-finite.
This means that there exists an infinite sequence of sets $B_i \in \mathcal{X}$ such that $m(B_i) < \infty$ and $\bigcup_{i=1}^{\infty} B_i = X$. The Lebesgue measure $\lambda^n$ is $\sigma$-finite (we can take $B_i = [-i, i]^n$), while the counting measure $\#$ is not if the space $X$ is uncountable.
Now, how is the integral $\int_A f\,dm$ of a measurable function $f$ with respect to the measure $m$ over some subset $A \subset X$, and not over the whole $X$, defined? The simplest way to do so is to take by definition, for $A \in \mathcal{X}$,
$$\int_A f(x)\,m(dx) = \int_X I_A(x) \cdot f(x)\,m(dx). \tag{7.1}$$
This definition leads to the same result as if we consider simple functions on $A$, then arbitrary nonnegative measurable functions on $A$, etc.
Now, we started this whole digression about Lebesgue integrals when we spoke about distribution densities; our last formula before that was (5.13).
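To see definition (7.1) at work in the simplest possible setting, here is a minimal Python sketch. It assumes a hypothetical finite space $X = \{0, \ldots, 4\}$ with a measure given by point weights, so that every integral is a finite weighted sum; the names `m`, `f`, `indicator` and `integral` are illustrative only. The integral over $A$ is compared with the integral over the whole $X$ of the product of the indicator $I_A$ and $f$.

```python
from fractions import Fraction

# Hypothetical finite measure space: X = {0, ..., 4}, m given by point weights.
X = range(5)
m = {x: Fraction(x + 1, 2) for x in X}   # m{x} = (x + 1)/2

def f(x):
    """An arbitrary test function on X."""
    return x * x

def indicator(A):
    """Indicator function I_A of the set A."""
    return lambda x: 1 if x in A else 0

def integral(g, domain):
    """Integral of g over `domain` with respect to m (a finite weighted sum here)."""
    return sum((g(x) * m[x] for x in domain), Fraction(0))

A = {1, 3}
I_A = indicator(A)
lhs = integral(f, A)                          # integral of f over A
rhs = integral(lambda x: I_A(x) * f(x), X)    # integral of I_A * f over X
assert lhs == rhs
```

On a finite space the equality is immediate, since the indicator simply switches off the terms outside $A$; the general case needs the step-by-step construction of the integral.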
If a set function $\nu$ is given by formula (5.13) for all $C \in \mathcal{X}$, it is necessarily countably additive: for disjoint $A_1, A_2, \ldots, A_n, \ldots$
$$\begin{aligned}
\nu\Bigl(\bigcup_{i=1}^{\infty} A_i\Bigr) &= \int_{\bigcup_{i=1}^{\infty} A_i} f(x)\,m(dx) = \int_X I_{\bigcup_{i=1}^{\infty} A_i}(x) \cdot f(x)\,m(dx) \\
&= \int_X \sum_{i=1}^{\infty} I_{A_i}(x) \cdot f(x)\,m(dx) = \sum_{i=1}^{\infty} \int_X I_{A_i}(x) \cdot f(x)\,m(dx) = \sum_{i=1}^{\infty} \nu(A_i)
\end{aligned} \tag{7.2}$$
(proved above for nonnegative integrands; so something remains here to be proved).
So a set function $\nu$ having a density with respect to a measure $m$ is necessarily countably additive. (In Kolmogorov & Fomin's book countably additive set functions are called charges, because such functions provide a mathematical model for electric charges, $\nu(A)$ being the total charge, positive or negative, carried by the region $A$.)
A density $f(x)$ of $\nu$ with respect to $m$ is, generally, not unique: if we change the function $f(x)$ arbitrarily on a non-empty set $A$ having zero $m$-measure (such sets may not exist for a measure $m$; they certainly do exist for the Lebesgue measure), the integrals $\int_C f\,dm$ do not change, and the changed $f$ is still a version of the density.
Theorem 7.1. The density of a countably additive set function $\nu$ with respect to $m$ is almost unique (of course supposing that it exists); i. e., if the measurable functions $f_1$, $f_2$ are two versions of this density:
$$\int_C f_1(x)\,m(dx) = \int_C f_2(x)\,m(dx) = \nu(C) \tag{7.3}$$
for every $C \in \mathcal{X}$, then
$$f_1(x) = f_2(x) \quad \text{almost everywhere}, \tag{7.4}$$
that is,
$$m\{x : f_1(x) \ne f_2(x)\} = 0. \tag{7.5}$$
"Proof": Let $A = \{x : f_1(x) < f_2(x)\}$; then we have
$$\int_A [f_2(x) - f_1(x)]\,m(dx) = \int_A f_2\,dm - \int_A f_1\,dm = \nu(A) - \nu(A) = 0.$$
The integral of the function $f_2(x) - f_1(x)$ that is strictly positive on $A$ is equal to 0, and it follows from this that $m(A) = 0$. Similarly, for $B = \{x : f_1(x) > f_2(x)\}$ we have $m(B) = 0$; and $m\{x : f_1(x) \ne f_2(x)\} = m(A) + m(B) = 0$.
However, it turns out that the statement of Theorem 7.1 is false (although it's pretty easy to make it correct by introducing some extra conditions), and the "proof" contains a mistake.
Problem 7: Produce an example of measures $m$ and $\nu$ on a space $(X, \mathcal{X})$ such that $\nu(C) = \int_C f_1(x)\,m(dx) = \int_C f_2(x)\,m(dx)$, but $m\{x : f_1(x) \ne f_2(x)\} \ne 0$. Show at which point the "proof" of Theorem 7.1 fails.
I'll give the corrected version of Theorem 7.1 at the end of the next version of Lecture note # 7 (not in the present version, since I want you to solve Problem 7 by yourselves).
The notations for (every version of) the density of $\nu$ with respect to $m$ will be
$$f(x) = \frac{\nu(dx)}{m(dx)} = \frac{d\nu}{dm}(x). \tag{7.6}$$
Why such notations are used, we'll discuss later.
It is clear that if $\nu$ has a density with respect to $m$, then for $C \in \mathcal{X}$
$$m(C) = 0 \;\Rightarrow\; \nu(C) = 0. \tag{7.7}$$
A countably additive set function $\nu$ is called absolutely continuous with respect to $m$ if implication (7.7) is satisfied for every $C \in \mathcal{X}$. The notation for absolute continuity is
$$\nu \ll m. \tag{7.8}$$
The following theorem (a real big one, not so easy to prove) is proved in measure theory:
Theorem 7.2 (Radon-Nikodym's Theorem: see, e. g., Kolmogorov & Fomin's book, Theorem 2 of Section 34). Let a measure $m$ on $(X, \mathcal{X})$ be $\sigma$-finite, and let $\nu$ be a finite countably additive set function on $(X, \mathcal{X})$ that is absolutely continuous with respect to $m$. Then $\nu$ has a density $f(x)$ with respect to $m$.
Note that the Lebesgue measure $\lambda^n$ is $\sigma$-finite ($B_i = \{x : |x| \le i\}$); so a density $\dfrac{\mu_\xi(dx)}{\lambda^n(dx)} = \dfrac{\mu_\xi(dx)}{dx}$ with respect to the Lebesgue measure exists if and only if $\mu_\xi$ is absolutely continuous with respect to the Lebesgue measure. (Usually, for shortness, we speak of just continuous distributions. Random variables having such distributions are called continuous random variables.)
It turns out that the description of discrete distributions by means of their "probability mass functions" $p_\xi(x) = P\{\xi = x\}$ is also a description involving the density: not with respect to the Lebesgue measure, but with respect to the counting measure $\#$:
$$p_\xi(x) = \frac{\mu_\xi(dx)}{\#(dx)}. \tag{7.9}$$
Indeed, this means that for every set $C \in \mathcal{X}$
$$P\{\xi \in C\} = \int_C p(x)\,\#(dx). \tag{7.10}$$
By (6.28), this is the same as
$$P\{\xi \in C\} = \sum_{x \in C} p(x), \tag{7.11}$$
and this is just the formula (5.8).
This is a reason for treating probability mass functions of discrete distributions and
probability densities of (absolutely) continuous ones in parallel ways (e. g., the statistical
maximum-likelihood estimates are defined by the same formulas in the discrete and in the
continuous case).
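As a small numerical illustration of (7.10) and (7.11), here is a minimal Python sketch. It assumes a hypothetical fair-die distribution (the names `pmf` and `prob` are illustrative, not from the lecture), and it treats integration with respect to the counting measure over a finite set as plain summation.

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die, p(x) = P{xi = x} = 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def prob(C):
    """P{xi in C}: the integral of p over C with respect to the counting
    measure, i.e. the sum of p(x) over the points x of C."""
    return sum((pmf.get(x, Fraction(0)) for x in C), Fraction(0))

evens = {2, 4, 6}
print(prob(evens))       # probability of an even outcome
print(prob(set(pmf)))    # total mass of the distribution
```

Exact rational arithmetic is used so that the sums can be compared without rounding error.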
Note that since the counting measure $\#$ is not, generally, $\sigma$-finite, the above results about existence and (almost) uniqueness of the density cannot be applied. As for uniqueness, it still is there, and even without "almost" (because there are no non-empty sets with $\#(A) = 0$); but it does not follow from $\mu_\xi(\emptyset) = 0$ that this distribution is a discrete one. (Nothing at all can follow from $\mu_\xi(\emptyset) = 0$, because this is a general property of all measures.)
Sometimes it is reasonable to consider densities of probability distributions not with respect to the Lebesgue measure, or to the counting measure $\#$, but with respect to some other measures. This is the case, for example, for random variables taking values in an infinite-dimensional space: in such spaces no infinite-dimensional Lebesgue measure can be defined. It is worthwhile to consider the density $\dfrac{d\mu_\eta}{d\mu_\xi}$ of one distribution with respect to another.
Every distribution of a random variable is a probability measure (i. e., a measure whose value on the largest set is equal to 1); and every probability measure $m$ on a measurable space $(X, \mathcal{X})$ is the distribution of some random variable taking values in $(X, \mathcal{X})$ (in fact, of infinitely many random variables). Indeed, we can take the probability space $(\Omega, \mathcal{F}, P)$ in this way: $\Omega = X$, $\mathcal{F} = \mathcal{X}$, $P = m$; we define on this space the random variable $\xi(\omega) = \omega$; and the distribution $\mu_\xi$ of this random variable will be nothing but the measure $m$: $\mu_\xi(C) = P\{\omega : \xi(\omega) \in C\} = P(C) = m(C)$.
In Lecture 5, discrete distributions and (absolutely) continuous distributions were introduced. Are there distributions that do not belong to these two classes?
Of course, there are: mixtures of discrete and continuous distributions: $\mu(C) = q_1 \cdot \mu_1(C) + q_2 \cdot \mu_2(C)$, where $q_1, q_2 > 0$, $q_1 + q_2 = 1$, $\mu_1$ is a discrete distribution, and $\mu_2$ an absolutely continuous one. The mixture is neither discrete nor continuous, because
$$\sum_{x \in \mathbb{R}^1} \mu\{x\} = q_1 \ne 0, 1, \tag{7.12}$$
while for discrete distributions it should be equal to 1, and for continuous, to 0.
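The computation behind (7.12) can be checked on a concrete case. The sketch below is only an illustration under assumptions of my own choosing: the discrete part is a point mass at 0, the continuous part is the uniform distribution on $[0, 1]$, and the weights are $q_1 = 1/3$, $q_2 = 2/3$. The functions `mu1`, `mu2` and `mu` return the mass of a closed interval $[a, b]$.

```python
from fractions import Fraction

# Hypothetical mixture: weight q1 on a point mass at 0 (discrete part),
# weight q2 on the uniform distribution on [0, 1] (absolutely continuous part).
q1, q2 = Fraction(1, 3), Fraction(2, 3)

def mu1(a, b):
    """Discrete part: mass of [a, b] under the point mass at 0."""
    return Fraction(1) if a <= 0 <= b else Fraction(0)

def mu2(a, b):
    """Continuous part: mass of [a, b] under the uniform distribution on [0, 1]."""
    lo, hi = max(a, 0), min(b, 1)
    return Fraction(hi - lo) if hi > lo else Fraction(0)

def mu(a, b):
    """Mixture mass of the closed interval [a, b]."""
    return q1 * mu1(a, b) + q2 * mu2(a, b)

# Only x = 0 carries positive mass, so the sum of mu{x} over all x is mu{0}.
atom_sum = mu(0, 0)
print(atom_sum)     # equals q1, which is neither 0 nor 1
print(mu(-1, 2))    # total mass of the mixture
```

A single point has zero mass under the continuous part, so only the discrete part contributes to the sum of the atoms.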
Are there distributions on the real line that are not such mixtures (and not discrete,
not absolutely continuous)?
We'll return to this question after considering distribution functions.
Let $\xi$ be a real-valued random variable. Its distribution function is a function on $(-\infty, \infty)$ defined by
$$F(x) = F_\xi(x) = P\{\xi \le x\}. \tag{7.13}$$
Clearly the distribution function depends only on the distribution $\mu$ of a random variable:
$$F(x) = F_\mu(x) = \mu(-\infty, x]. \tag{7.14}$$
Theorem 7.3. We have, for all $-\infty < a \le b < \infty$:
$$P\{a < \xi \le b\} = F_\xi(b) - F_\xi(a). \tag{7.15}$$
The proof is very simple: Clearly, $(-\infty, b] = (-\infty, a] \cup (a, b]$, and these intervals are disjoint; the same holds for their inverse images under the mapping $\xi$:
$$\{\xi \le b\} = \{\xi \le a\} \cup \{a < \xi \le b\}, \tag{7.16}$$
and by (finite) additivity of $P$,
$$P\{\xi \le b\} = P\{\xi \le a\} + P\{a < \xi \le b\}, \tag{7.17}$$
which leads to (7.15).
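Formula (7.15) is easy to verify numerically for a discrete distribution. The following Python sketch assumes a hypothetical distribution on $\{0, 1, 2, 3\}$ (the values in `pmf` are made up for the illustration) and compares the probability of a half-open interval, computed directly, with the difference of the values of the distribution function.

```python
from fractions import Fraction

# Hypothetical discrete distribution on {0, 1, 2, 3}.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def F(x):
    """Distribution function F(x) = P{xi <= x}."""
    return sum((p for v, p in pmf.items() if v <= x), Fraction(0))

def prob_interval(a, b):
    """P{a < xi <= b}, computed directly from the probability mass function."""
    return sum((p for v, p in pmf.items() if a < v <= b), Fraction(0))

# Check (7.15) on a few intervals, including the degenerate case a = b.
for a, b in [(0, 2), (-1, 1), (1, 1), (Fraction(1, 2), 3)]:
    assert prob_interval(a, b) == F(b) - F(a)

print(F(2) - F(0))
```

Note that the left endpoint is excluded and the right one included, matching the half-open interval $(a, b]$ in the proof.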
Theorem 7.4. If $F(x)$ is the distribution function of a random variable $\xi$, then
$$F(x) \text{ is non-decreasing}, \tag{7.18}$$
$$F(\infty) = 1, \tag{7.19}$$
$$F(-\infty) = 0, \tag{7.20}$$
and, for $c \in (-\infty, \infty)$,
$$F(c+) = F(c), \tag{7.21}$$
$$F(c-) = P\{\xi < c\} \tag{7.22}$$
($F(\infty)$, $F(-\infty)$, $F(c+)$, $F(c-)$ are the notations for the limits $\lim_{x \to \infty} F(x)$, $\lim_{x \to -\infty} F(x)$, the right-hand limit at $c$: $\lim_{x \to c+} F(x)$, and the left-hand limit $\lim_{x \to c-} F(x)$. Note that by definition the formula (7.13) defines $F(x)$ on the real line, and not on the extended real line).
Equality (7.21) means that every distribution function is continuous from the right at every point of the real axis.
Proof. The statement (7.18) follows from Theorem 7.3. For a monotone function, limits at $\pm\infty$ and one-sided limits at every finite point necessarily exist, so the limits (7.19)-(7.22) do exist.
The limit at $+\infty$ is equal to the limit along every sequence of numbers going to $\infty$; e. g.,
$$F(\infty) = \lim_{n \to \infty} F(n) = \lim_{n \to \infty} P\{\xi \le n\}. \tag{7.23}$$
The sequence of events $B_n = \{\xi \le n\}$ is clearly non-decreasing, so by Theorem 3.3 we have:
$$F(\infty) = P\bigl(\lim_{n \to \infty} \{\xi \le n\}\bigr). \tag{7.24}$$
What is the limit of this sequence of events? It is a non-decreasing sequence, so the limit is the union:
$$\lim_{n \to \infty} \{\xi \le n\} = \bigcup_{i=1}^{\infty} \{\xi \le i\}. \tag{7.25}$$
But this union clearly is the whole sample space $\Omega$: for every sample point $\omega \in \Omega$, there exists at least one natural number $i$ such that $i > \xi(\omega)$ (or $i \ge \xi(\omega)$). So
$$F(\infty) = P(\Omega) = 1. \tag{7.26}$$
Now to (7.20):
$$F(-\infty) = \lim_{n \to \infty} F(-n) = \lim_{n \to \infty} P\{\xi \le -n\} = P\bigl(\lim_{n \to \infty} \{\xi \le -n\}\bigr) = P\Bigl(\bigcap_{i=1}^{\infty} \{\xi \le -i\}\Bigr) = P(\emptyset) = 0 \tag{7.27}$$
(the events $\{\xi \le -n\}$ form a non-increasing sequence with (clearly) empty intersection).
To (7.21) (with a non-increasing sequence):
$$F(c+) = \lim_{n \to \infty} P\{\xi \le c + 1/n\} = P\Bigl(\bigcap_{i=1}^{\infty} \{\xi \le c + 1/i\}\Bigr) = P\{\xi \le c\} = F(c) \tag{7.28}$$
(the event $\{\xi \le c\}$ occurs if and only if all events $\{\xi \le c + 1/i\}$ occur: the "if" part by limit passage in the inequality $\xi \le c + 1/i$, and the "only if": $\xi \le c \Rightarrow \xi \le c + 1/i$ for all natural $i$). Finally, (7.22) (again a non-decreasing sequence of events):
$$\bigcup_{i=1}^{\infty} \{\xi \le c - 1/i\} = \{\xi < c\} \tag{7.29}$$
(if $\xi(\omega) \le c - 1/i$ for at least one $i$, it is certainly less than $c$; and if $\xi(\omega)$ is less than $c$, there exists a natural number $i$ such that $c - 1/i \in [\xi(\omega), c)$, and $\omega$ belongs to the event on the left-hand side of (7.29)), so
$$F(c-) = \lim_{n \to \infty} P\{\xi \le c - 1/n\} = P\Bigl(\bigcup_{i=1}^{\infty} \{\xi \le c - 1/i\}\Bigr) = P\{\xi < c\}. \tag{7.30}$$
From (7.13) and (7.22) we obtain easily:
$$P\{\xi = c\} = P\{\xi \le c\} - P\{\xi < c\} = F(c) - F(c-) = F(c+) - F(c-); \tag{7.31}$$
this is the jump of the distribution function $F$ at the point $c$.
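The relation between jumps of $F$ and point masses can be illustrated on a discrete example. The Python sketch below assumes a hypothetical distribution with atoms at 0, 1 and 2, and approximates the left-hand limit by evaluating $F$ at $c - 1/n$ for a fixed large $n$; both the distribution and the helper `F_left` are illustrative choices, not part of the lecture.

```python
from fractions import Fraction

# Hypothetical discrete distribution with atoms at 0, 1 and 2.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def F(x):
    """Distribution function F(x) = P{xi <= x}."""
    return sum((p for v, p in pmf.items() if v <= x), Fraction(0))

def F_left(c, n=1000):
    """F(c - 1/n). For the points c tested below no atom lies in the
    interval (c - 1/n, c) when n = 1000, so this equals F(c-) exactly."""
    return F(c - Fraction(1, n))

for c in [0, 1, 2, Fraction(3, 2)]:
    jump = F(c) - F_left(c)
    # The jump of F at c equals P{xi = c}; it is 0 where there is no atom.
    assert jump == pmf.get(c, Fraction(0))
    print(c, jump)
```

For a distribution whose atoms accumulate near $c$, a fixed $n$ would not be enough and the limit in $n$ would have to be taken in earnest.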
Now, the corrected formulation of the almost-uniqueness theorem:
Theorem 7.1′. Let the measure $m$ be $\sigma$-finite. Then the density of a countably additive set function $\nu$ with respect to $m$ is almost unique (supposing that it exists).
Try to work out the proof by yourself (anyway I don't have to give proofs of measure-theory theorems).