A Measure Theory Tutorial (Measure Theory for Dummies)

Maya R. Gupta
gupta@ee.washington.edu
Dept. of EE, University of Washington
Seattle, WA

UWEE Technical Report
Department of Electrical Engineering, University of Washington, Seattle, Washington
URL: http://www.ee.washington.edu
Abstract

This tutorial is an informal introduction to measure theory for people who are interested in reading papers that use measure theory. The tutorial assumes one has had at least a year of college-level calculus, some graduate-level exposure to random processes, and familiarity with terms like "closed" and "open." The focus is on the terms and ideas relevant to applied probability and information theory. There are no proofs and no exercises.

Measure theory is a bit like grammar: many people communicate clearly without worrying about all the details, but the details do exist and for good reasons. There are a number of great texts that do measure theory justice; this is not one of them. Rather, this is a hack way to get the basic ideas down so you can read through research papers and follow what's going on. Hopefully, you'll get curious and excited enough about the details to check out some of the references for a deeper understanding.

A Something to measure

First, we need something to measure. So we define a measurable space. A measurable space is a collection of possible events B together with the set of all outcomes Ω, which is sometimes called the sample space. A measurable space is written (Ω, B).

Why do you need to state Ω? For one, having a sample space makes it possible to define complements of sets: if the event F ∈ B, then the event F^c is the set of outcomes in Ω that are disjoint from F.

Algebras and Fields

Often, you will see that the collection of events B in a measurable space is a σ-algebra. A σ-algebra is a special kind of collection of subsets of the sample space: a σ-algebra is complete in that if some set A is in your σ-algebra, then you have to have A^c (the complement of A) in your σ-algebra too. Also, it must be that if you have two sets A and B in your collection of sets, then the union A ∪ B must also be in your collection of sets; in fact, σ-algebras are closed under countable unions, not just finite unions. Another term sometimes used to mean the same thing as σ-algebra is σ-field. The largest possible σ-field is the collection of all the possible subsets of Ω; this is called the powerset. The smallest possible σ-field is a collection of just two sets: Ω and the empty set ∅.

B Measure

A measure takes a set A from a measurable collection of sets B and returns the measure of A, which is some nonnegative real number. So one writes μ: B → [0, ∞). An example measure is volume, which goes by the name Lebesgue measure; in general, measures are generalized notions of volume. The triple (Ω, B, μ) combines a measurable space and a measure, and thus the triple is called a measure space.
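The σ-algebra closure rules from Section A can be checked mechanically on a small finite sample space. Here is a minimal Python sketch (my own illustration, not from the report; the helper name `is_sigma_algebra` is invented), using the fact that on a finite space closure under countable unions reduces to closure under finite unions:

```python
from itertools import chain, combinations

def is_sigma_algebra(omega, collection):
    """Check the sigma-algebra axioms for a collection of subsets
    of a finite sample space omega."""
    sets = {frozenset(s) for s in collection}
    if frozenset(omega) not in sets:          # must contain the sample space
        return False
    for a in sets:
        if frozenset(omega) - a not in sets:  # closed under complement
            return False
    for a in sets:
        for b in sets:
            if a | b not in sets:             # closed under (finite) union
                return False
    return True

omega = {1, 2, 3, 4}
smallest = [set(), omega]            # the smallest sigma-field: just emptyset and Omega
powerset = [set(s) for s in chain.from_iterable(
    combinations(omega, r) for r in range(len(omega) + 1))]  # the largest sigma-field
not_closed = [set(), {1}, omega]     # missing the complement {2, 3, 4}

print(is_sigma_algebra(omega, smallest))    # True
print(is_sigma_algebra(omega, powerset))    # True
print(is_sigma_algebra(omega, not_closed))  # False
```

The first two collections are exactly the smallest and largest σ-fields described above; the third fails because it is not closed under complement.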
A measure is defined by two properties:

1. Nonnegativity: μ(A) ≥ 0 for all A ∈ B.

2. Countable additivity: if A_i ∈ B are disjoint sets for i = 1, 2, ..., then the measure of the union of the A_i is equal to the sum of the measures of the A_i.

You can see how our ordinary notion of volume satisfies these two properties. There are a couple of variations on measure that you will run into. One is a signed measure, which can be negative.

A special case of measure is the probability measure: a probability measure P has the two above properties of a measure, but it is also normalized, such that P(Ω) = 1. A probability space is just a measure space with a probability measure. A probability measure P over a discrete set of events is basically what you know as a probability mass function. For example, given probability measure P and two sets A, B ∈ B, we can familiarly write P(B|A) = P(A ∩ B) / P(A).

Lebesgue measure

The Lebesgue measure L(A) is just the volume, area, or length of the set A. For example, the unit line segment [0, 1] has Lebesgue measure 1.

Borel sets

A σ-algebra (collection of sets) that appears often is the Borel σ-algebra, which is the smallest σ-algebra that includes the open subsets of the real line. A Borel set is an element of a Borel σ-algebra. You can define the sample space to be the real line R, and then the corresponding Borel σ-algebra is the collection of Borel sets of R; thus, (R, B) is a measurable space. You'll usually see people talk about the Borel σ-algebra on the real line, but you could also define a Borel measurable space for R², R³, ..., or in general R^d.

It turns out that just about any set you can describe on the real line is a Borel set; for example, the irrational numbers form a Borel set, and the unit line segment is a Borel set. The Borel sets can be equivalently defined by intervals of various types: you could use the set of all open intervals (a, b), or all closed intervals [a, b] of the real line, or all right-closed intervals (a, b], etc.

One thing that makes the Borel sets so powerful is that if you know what a probability measure does on every interval, then you know what it does on all the Borel sets. In terms of probability, it must be that for all a < b, P((a, b]) = F(b) − F(a); that's how one defines a cumulative distribution function (cdf), and this is the basis for the math that tells you that if two cdfs are equal for all choices of b, then the two probability measures must be equal. This shows up in proofs where people want to say that some measure is really equivalent to some other measure: to do that, they just need to show that the two measures are equivalent on all intervals, and then they have proven that the two measures are equivalent for all the Borel sets and hence over the measurable space.

Borel measure

To call a measure a Borel measure means it is defined over a Borel σ-algebra.

Measure zero

A set of measure zero is some set A ∈ B such that μ(A) = 0. For example, you might say P(A) = 0; that means I've got this event A in my collection of events, but it's never going to happen. The nice thing about sets of measure zero is that they don't count when you want to state things about the collection of sets B. For example, you could integrate a function f over B, and the value f(A) might be infinite for some set A, but you can ignore f(A) if set A has measure zero.

Support

The support of a measure is all the sets that do not have measure zero. For example, if a probability measure P over the Borel sets of the real line puts all of its probability on the unit interval, then the probability measure has support only on the unit interval, by which you mean there is zero probability of drawing a point bigger than one or smaller than zero. You often see written "the measure has compact support" to note that the support of the measure forms a compact (closed and bounded) set.

Measurable functions

A function defined over a measurable set is called a measurable function.
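Pulling this section together, ordinary length on the real line gives a concrete instance of the definitions. A hedged Python sketch (my own illustration, not from the report; `length` is an invented helper) showing nonnegativity, additivity on disjoint intervals, and a set of measure zero:

```python
def length(a, b):
    """A sketch of a measure mu: B -> [0, infinity): the Lebesgue
    measure (length) of the interval from a to b."""
    return max(0.0, b - a)

# Nonnegativity: the measure is never negative, even for an empty interval.
assert length(2.0, 1.0) == 0.0

# Additivity (finite case shown): disjoint pieces of [0, 1] have
# lengths that sum to the length of their union.
pieces = [(0.0, 0.25), (0.25, 0.5), (0.5, 1.0)]
assert abs(sum(length(a, b) for a, b in pieces) - length(0.0, 1.0)) < 1e-12

# A single point is a set of measure zero: shrink intervals onto 0.5
# and watch the measure go to 0.
widths = [length(0.5 - 1.0 / n, 0.5) for n in (1, 10, 100, 1000)]
print(widths)  # the lengths shrink toward 0
```

Countable additivity is stronger than the finite check shown here, but the finite case already captures the intuition that volume adds up over disjoint pieces.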
dened over a Borel algebra. Example This example is based on an example from Capinski
and Kopps book. As shorthand. the mass of m. its common not to write the induced measure
at all. there is some set of events w that X maps to the output event A. b F b F a. the term
distribution is also used in a more specic way. and consider the eld M of measurable subsets
of . UWEETR . so that the measure of . then a random variable is a measurable function X
that maps the measurable space . since the Borel algebra can be generated by the set of
intervals . R. such as FX or FY . usually the Borel algebra of the real numbers R. / both are /.
then f is called the density of the distribution F . Since all subintervals of . the probability of a
Borel set A in the output space is equal to the probability of the inverse image under X of that
Borel set PA P X A P w Xw A. Consider an output event A R. To have a density you need the
distribution function F to be what is called absolutely continuous. there might not be a
corresponding density. Lebesgue measurable function. The distribution function is usually
indexed by the random variable. . b F b F a a f xdx. Distributions The probability measure P
over the output measurable space induced by a random variable X is called the distribution
of X .B. x for all x R. b is Pa. Restrict Lebesgue measure m to the interval B . P . C. let there
be some measure space . so the random variable X induces a probability measure over the
space. This works for every Borel set in the output space. That is. that is. More formally.
Thus all numerals are equally likely to appear as rst digits of the decimal expansion of a
number drawn randomly according to this measure. one writes the probability PA P X A. We
can see that the formal denition is saying the same thing as the basic denition. That is. with
the same length have the same measure. C. at random. . C The truth about Random
Variables A basic denition of a random variable is that it species a set of events that happen
with corresponding probabilities. However. is spread uniformly over . the complete
description of the probability measure induced by a random variable X requires knowledge
about P X A for all sets A R. and just write P X A. F to another measurable space. B. As we
foreshadowed in the section on Borel sets. page of what it means to say draw a number from
. / is the same as the measure of /. which is PA P w Xw A. and the probability of output event
A is the total probability of those events w. If f exists such that the above holds. x. However.
is a probability measure on M. Probability densities Given a distribution. where f is a
nonnegative. The random variable X induces a probability on the output event A. F. In fact.
Then one can say that the induced probability measure over the interval a. Then m. Then.
the distribution function of X is F x P X x. we only have to know P X A for every set A .
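The inverse-image definition P(X ∈ A) = P({w : X(w) ∈ A}) and the distribution function F(x) = P(X ≤ x) can be made concrete on a finite probability space. A Python sketch (my own toy example, not from the report; `induced_prob` and `F` are invented names):

```python
# A toy probability space (Omega, B, P): a fair six-sided die, with B the
# powerset of Omega and P uniform over the six outcomes.
omega = [1, 2, 3, 4, 5, 6]
P = {w: 1.0 / 6.0 for w in omega}

def X(w):
    """A random variable: a measurable function from Omega into R."""
    return w % 2  # 1 for odd rolls, 0 for even rolls

def induced_prob(A):
    """Distribution of X: P_X(A) = P({w : X(w) in A}), the probability
    of the inverse image of the output event A under X."""
    return sum(P[w] for w in omega if X(w) in A)

def F(x):
    """The distribution function F(x) = P(X <= x)."""
    return sum(P[w] for w in omega if X(w) <= x)

print(induced_prob({1}))  # P(X = 1): about 1/2
print(F(0.5))             # P(X <= 0.5) = P(X = 0): about 1/2
print(F(1) - F(0))        # P((0, 1]) = F(b) - F(a) with a = 0, b = 1
```

Note how the probability of an output event is always computed by summing P over the inverse image in Ω, exactly as in the formal definition.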
Discrete distributions

A discrete distribution F has a familiar corresponding point mass function, or probability mass function (pmf): for a discrete distribution F there is some countable set of numbers x_j and point masses p_j such that F(x) = Σ_{x_j ≤ x} p_j for all x ∈ R. The p_j form the probability mass function over the events.

Expectation and integration

Expectation and integration are fundamentally the same thing. We are going to start with expectation (the average) as the more fundamental concept, and from there develop integration. The most common measure to use in integration is the Lebesgue measure, and Lebesgue integration is for almost all practical purposes equivalent to the standard Riemann integration that one first learns. However, Riemann integration has some problems that make it not as useful as Lebesgue integration, and the reader is referred, for example, to Capinski and Kopp for more details.

Here is a limit definition of expectation for nonnegative random variables X. This is our starting definition, from which we will get to the familiar integral formula:

E[X] = lim_{n→∞} Σ_{k=0}^{n·2^n − 1} (k / 2^n) P(k / 2^n ≤ X < (k + 1) / 2^n).

This formula has all the parts you expect: it's a sum over an increasingly larger number of increasingly smaller components, and for each component one takes the measure of the interval (note this limit might not exist).

What about negative random variables? We'll need a couple of extra definitions. Let X+ = max(X, 0), and let X− = −min(X, 0). Note that X− turns a negative X into a positive number, but we then negate it, so the negative aspect is not lost: for an arbitrary random variable X, let E[X] = E[X+] − E[X−], as long as E[X+] and/or E[X−] are finite. This general definition of expectation is the subtraction of two limit-sums. If E[X+] and E[X−] are both finite, then one says that the random variable X is integrable.

We give this general definition of expectation a new notation, the integral notation: E[X] = ∫ X(w) dP(w), which is also written ∫ X dP, and sometimes written E[X] = ∫ X(x) P(dx).

You'll note from the limit-sum definition that if one takes the integral over a set of measure 0, one gets 0. That is a simple but key idea in many proofs, and you'll often see equivalence relationships where some q equals some p if they agree on all sets that do not have measure 0.

D Entropy

Entropy is a useful function of a random variable, and in this section we will use it to solidify some of the ideas and notation introduced above. Consider the entropy of a discrete alphabet random variable f defined on the probability space (Ω, B, P). Then the entropy is

H_P(f) = −Σ_{a∈A} P(f = a) ln P(f = a).
A discrete random variable f induces a probability mass function (pmf) p_f, where p_f(a) = P({w : f(w) = a}) = P(f = a), so you can equivalently write

H_P(f) = −Σ_{a∈A} p_f(a) ln p_f(a).

A discrete random variable f also induces a partition on the input space Ω that corresponds to the inverse image of each event: let the partition Q consist of sets Q_i for i = 1, 2, ..., where Q_i = {w : f(w) = a_i} = f^{-1}(a_i). Then you can also write entropy in terms of the induced partition Q:

H_P(Q) = −Σ_i P(Q_i) ln P(Q_i).

E Limits

To quote Gut, one of the basic questions in mathematics is to what extent limits of objects carry over to limits of functions of objects. One of the more important results in this area is Lebesgue's Dominated Convergence Theorem. Basically, it says that if f_n is a sequence of measurable functions that converges to f, then the integral of the function f with a measure is the same as the limit of the integral of the f_n. There are some other restrictions on the f_n's; see the formal statement, which can be found on MathWorld's page (www.mathworld.com). What's powerful about this theorem, though, is that one doesn't have to assume that f is measurable; instead, the theorem concludes that f is integrable, and shows you how to integrate it by instead taking the limit of the integral of the sequence of functions. This theorem doesn't work for Riemann integration, and that is considered one of the flaws of Riemann integration that makes Lebesgue integration more general and more useful. For more on the flaws of Riemann integration, see for example Capinski and Kopp.

F Read on

If you are interested in a more thorough understanding of measure theory and probability, one of the friendliest books is Resnick's, which teaches measure-theoretic graduate-level probability with the assumption that you do not have a B.A. in mathematics. Other good texts are the undergraduate text on measure theory by Capinski and Kopp, and Gut's graduate-level measure-theoretic probability book. There are certainly plenty of other probability and measure theory textbooks, but these three are relatively well-suited for self-study.

If you've decided you aren't so interested in formal probability, but want to learn more about approaches to solving probability problems, I recommend Richard Hamming's book, The Art of Probability for Scientists and Engineers.

If you are interested in information theory, you can solidify your understanding of the use of measure theory in information theory by reading Bob Gray's book, Entropy and Information Theory (Gray's book is available free online). See also Kullback's book, Information Theory and Statistics, or the information theory book by Ash (you might want to read the less formal book by Reza before Ash). The Kullback, Ash, and Reza books are available in inexpensive Dover editions.

References

R. B. Ash, Information Theory. Dover.
M. Capinski and E. Kopp, Measure, Integral, and Probability. Springer Undergraduate Mathematics Series.
T. M. Cover and J. A. Thomas, Elements of Information Theory. Wiley Series in Telecommunications.
R. M. Gray, Entropy and Information Theory. Springer Verlag (available free online).
A. Gut, Probability: A Graduate Course. Springer.
R. W. Hamming, The Art of Probability for Scientists and Engineers. Addison Wesley.
S. Kullback, Information Theory and Statistics. Dover.
F. M. Reza, An Introduction to Information Theory. Dover.
S. Resnick, A Probability Path. Birkhäuser.