A Measure Theory Tutorial (Measure Theory for Dummies)

Maya R. Gupta
gupta@ee.washington.edu
Dept. of EE, University of Washington, Seattle WA

UWEE Technical Report (UWEETR), May
Department of Electrical Engineering, University of Washington, Seattle, Washington
URL: http://www.ee.washington.edu

Abstract

This tutorial is an informal introduction to measure theory for people who are interested in reading papers that use measure theory. The tutorial assumes one has had at least a year of college-level calculus, some graduate-level exposure to random processes, and familiarity with terms like "closed" and "open". The focus is on the terms and ideas relevant to applied probability and information theory. There are no proofs and no exercises.

There are a number of great texts that do measure theory justice. This is not one of them. Rather, this is a hack way to get the basic ideas down so you can read through research papers and follow what's going on. Hopefully, you'll get curious and excited enough about the details to check out some of the references for a deeper understanding. Measure theory is a bit like grammar: many people communicate clearly without worrying about all the details, but the details do exist and for good reasons.

A  Something to measure

First, we need something to measure. In general, measures are generalized notions of volume; an example measure is ordinary volume. The set of all outcomes Ω is sometimes called the sample space. Also, why do you need to state Ω? For one, having a sample space makes it possible to define complements of sets: if the event F ∈ B, then the event F^C is the set of outcomes in Ω that are disjoint from F.

Algebras and Fields

Given a collection of possible events B, we can define a measurable space: a measurable space is the set of all outcomes Ω together with a collection of events B, and so one writes it as the pair (Ω, B).

Often you will see that the collection of events B in a measurable space is a σ-algebra. A σ-algebra is a special kind of collection of subsets of the sample space. A σ-algebra is complete in the sense that if some set A is in your σ-algebra, then you have to have A^C (the complement of A) in your σ-algebra too. Also, it must be that if you have two sets A and B in your collection of sets, then the union A ∪ B must also be in your collection; in fact, σ-algebras are closed under countable unions, not just finite unions. Another term sometimes used to mean the same thing as σ-algebra is σ-field. The largest possible σ-field is the collection of all the possible subsets of Ω; this is called the powerset. The smallest possible σ-field is a collection of just two sets: ∅ and Ω.

B  Measure

A measure μ takes a set A from a measurable collection of sets B and returns the measure of A, which is some positive real number. The triple (Ω, B, μ) combines a measurable space and a measure, and thus the triple is called a measure space. A special case of measure is the probability measure. One variation on measure that you will run into is the signed measure, which can be negative.

Lebesgue measure

One measure you already know is volume, which goes by the name Lebesgue measure. The Lebesgue measure L(A) is just the volume (or area) of the set A.

Borel sets

A σ-algebra (collection of sets) that appears often is the Borel σ-algebra, and a Borel set is an element of a Borel σ-algebra. You can define the sample space to be the real line R, and then the corresponding Borel σ-algebra is the collection of Borel sets of R, so that (R, B_R) is a measurable space. It turns out that just about any set you can describe on the real line is a Borel set. For example, intervals such as (a, b] and (−∞, b] are Borel sets; that's how one defines a cumulative distribution function.

A measure is defined by two properties, and you can see how our ordinary notion of volume satisfies them:
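The closure requirements on a σ-algebra are easy to sanity-check on a tiny finite sample space. Here is a minimal Python sketch (not part of the report; the helper name `powerset` is made up) that builds the powerset of a three-element Ω and verifies it is closed under complements and unions, and contains ∅ and Ω:

```python
from itertools import chain, combinations

def powerset(omega):
    """All subsets of omega, as frozensets: the largest possible sigma-algebra."""
    items = list(omega)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))}

omega = frozenset({1, 2, 3})
B = powerset(omega)  # 2^3 = 8 subsets

# Closed under complement: A in B implies omega \ A in B.
assert all(omega - A in B for A in B)
# Closed under (finite) union: A, C in B implies A | C in B.
assert all(A | C in B for A in B for C in B)
# Contains the empty set and the whole sample space.
assert frozenset() in B and omega in B
```

On a finite Ω, countable unions reduce to finite ones, so checking pairwise unions is enough here; on an infinite Ω that distinction is exactly where σ-algebras differ from plain algebras.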
Nonnegativity: μ(A) ≥ 0 for all A ∈ B.

Countable additivity: if A_i ∈ B are disjoint sets for i = 1, 2, ..., then the measure of the union of the A_i is equal to the sum of the measures of the A_i.

A probability measure P has the two above properties of a measure, but it's also normalized, such that P(Ω) = 1. For example, given a probability measure P and two sets A, B ∈ B, we can familiarly write the conditional probability P(B|A) = P(A ∩ B) / P(A). A probability measure P over a discrete set of events is basically what you know as a probability mass function, and a probability space is just a measure space with a probability measure.

Measure zero

A set of measure zero is some set A ∈ B such that μ(A) = 0. The nice thing about sets of measure zero is that they don't count when you want to state things about the collection of sets B. For example, you could integrate a function f over B, and the value of f on some set A might be infinite, but you can ignore f on A if the set A has measure zero.

Support

The support of a measure is all the sets that do not have measure zero. You often see written "the measure μ has compact support" to note that the support of the measure forms a compact (closed and bounded) set. For example, you might say "the probability measure has support only on the unit interval," by which you mean there is zero probability of drawing a point bigger than one or smaller than zero.

Measurable functions

A function defined over a measurable set is called a measurable function.

More on Borel sets

You'll usually see people talk about the Borel σ-algebra on the real line, which is the collection of sets that is the smallest σ-algebra that includes the open subsets of the real line. Just about any set you can describe on the real line is in there; the irrational numbers, for example, form a Borel set. The Borel σ-algebra can be equivalently defined by intervals of various types: you could use the set of all open intervals (a, b), or all closed intervals [a, b], or all right-closed intervals (−∞, b], for example. You could also define a Borel measurable space for R^2, ..., R^d.

One thing that makes the Borel sets so powerful is that if you know what a probability measure does on every interval, then you know what it does on all the Borel sets. This shows up in proofs where people want to say that some measure μ is really equivalent to some other measure ν: to do that, they just need to show that μ and ν are equivalent on all intervals, and then they have proven that the two measures are equivalent for all the Borel sets and hence over the measurable space. This is the basis for the math that tells you that if two cdfs are equal for all choices of b, then the two probability measures must be equal.

Borel measure

To call a measure a Borel measure means it is defined over a Borel σ-algebra.

Example

This example is based on an example from Capinski and Kopp's book of what it means to say "draw a number from [0, 1] at random." Restrict Lebesgue measure m to the interval B = [0, 1], and consider the σ-field M of measurable subsets of [0, 1]. Then m, restricted to M, is a probability measure on M. Since all subintervals of [0, 1] with the same length have the same measure, the mass of m is spread uniformly over [0, 1]; for example, the measure of [0, 1/10) is the same as the measure of [4/10, 5/10), and both are 1/10. Thus all numerals are equally likely to appear as first digits of the decimal expansion of a number drawn randomly according to this measure.

Distributions

The probability measure P over the output measurable space induced by a random variable X is called the distribution of X. (However, the term distribution is also used in a more specific way.) The distribution function of X is F(x) = P(X ≤ x) for all x ∈ R; the distribution function is usually indexed by the random variable, such as F_X or F_Y. Then one can say that the induced probability measure of the interval (a, b] is P((a, b]) = F(b) − F(a).

Probability densities

Given a distribution F, if there exists a nonnegative, Lebesgue measurable function f such that for all a < b,

F(b) − F(a) = ∫_a^b f(x) dx,

then f is called the density of the distribution F. To have a density, you need the distribution function F to be what is called absolutely continuous; for some distributions there might not be a corresponding density.

C  The truth about Random Variables

A basic definition of a random variable is that it specifies a set of events that happen with corresponding probabilities. More formally, let there be some measure space (Ω, F, P); then a random variable is a measurable function X that maps the measurable space (Ω, F) to another measurable space, usually the Borel σ-algebra of the real numbers, (R, B). Consider an output event A ⊂ R. There is some set of events w ∈ Ω that X maps to the output event A, and the probability of the output event A is the total probability of those events w. That is, the random variable X induces a probability on the output event A, which is P(A) = P({w : X(w) ∈ A}). This works for every Borel set in the output space: the probability of a Borel set A in the output space is equal to the probability of the inverse image under X of that Borel set,

P(A) = P(X ∈ A) = P({w : X(w) ∈ A}),

so the random variable X induces a probability measure over the space. We can see that the formal definition is saying the same thing as the basic definition.

In fact, the complete description of the probability measure induced by a random variable X requires knowledge of P(X ∈ A) for all sets A ⊂ R. However, as we foreshadowed in the section on Borel sets, since the Borel σ-algebra can be generated by the set of intervals (−∞, x], we only have to know P(X ∈ A) for every set A = (−∞, x], x ∈ R. As shorthand, one writes the probability P(A) = P(X ∈ A); in fact, it's common not to write the induced measure at all and just write P(X ∈ A).
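The inverse-image definition above is concrete enough to compute. Here is a hypothetical sketch (not from the report) using a finite sample space of two fair dice: the random variable X is the sum of the faces, and the induced probability of an output event A is the total mass of the inverse image {w : X(w) ∈ A}:

```python
from fractions import Fraction

# A finite probability space: all outcomes of two fair dice.
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]
P = {w: Fraction(1, 36) for w in omega}  # probability measure on the powerset

def X(w):
    """A random variable: a measurable function from omega to the reals."""
    return w[0] + w[1]

def induced(A):
    """P(X in A): the total mass of the inverse image {w : X(w) in A}."""
    return sum(P[w] for w in omega if X(w) in A)

print(induced({7}))           # -> 1/6, since six outcomes sum to 7
print(induced(range(2, 13)))  # -> 1, since X always lands in its range
```

Note the two probability measures in play: P lives on the input space of dice pairs, while `induced` is the distribution of X on the output space, defined entirely through inverse images.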
Discrete distributions

A discrete distribution F has the familiar corresponding point mass function, or probability mass function: for a discrete distribution F, there is some countable set of numbers x_j and point masses p_j such that F(x) = Σ_{x_j ≤ x} p_j. The p_j form the probability mass function (pmf) over the events.

Expectation and integration

Expectation and integration are fundamentally the same thing. We are going to start with expectation (the average) as the more fundamental concept, and from there develop integration. Here is a limit definition of expectation for a nonnegative random variable X, from which we will get to the familiar integral formula:

E[X] = lim_{n→∞} Σ_{k=0}^{n2^n − 1} (k/2^n) P(k/2^n ≤ X < (k+1)/2^n).

This formula has all the parts you expect: it's a sum over an increasingly larger number of increasingly smaller components, defined by intervals of the real line, and for each component one takes the measure of the interval. Note this limit might not exist.

What about negative random variables? We'll need a couple of extra definitions. Let X+ = max(X, 0) and let X− = −min(X, 0). Note that X− turns a negative X into a positive, but we then negate it, so the negative aspect is not lost. Then, for an arbitrary random variable X, let E[X] = E[X+] − E[X−], as long as E[X+] and/or E[X−] are finite; this general definition of expectation is the subtraction of two limit-sums. If E[|X|] < ∞, then one says that the random variable X is integrable. We give this general definition of expectation a new notation, the integral notation: E[X] = ∫ X(w) dP(w), which is also written ∫ X dP and sometimes written E[X] = ∫ X(x) P(dx).

The most common measure to use in integration is the Lebesgue measure. As noted in Section E, Riemann integration has some problems that make it not as useful as Lebesgue integration; for more on the flaws of Riemann integration, see for example the references.

D  Entropy

Entropy is a useful function of a random variable, and in this section we will use it to solidify some of the ideas and notation introduced above. Consider the entropy of a discrete-alphabet random variable f defined on the probability space (Ω, F, P). Then the entropy is

H(P_f) = −Σ_{a∈A} P(f = a) ln P(f = a).
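The entropy formula is easy to check numerically. A minimal sketch (the function name `entropy` is made up, not from the report), using the usual convention that outcomes with P(f = a) = 0 contribute nothing to the sum:

```python
import math

def entropy(pmf):
    """H(P_f) = -sum_a p(a) ln p(a), with the 0 * ln 0 = 0 convention."""
    return -sum(p * math.log(p) for p in pmf.values() if p > 0)

fair_coin = {"heads": 0.5, "tails": 0.5}
print(entropy(fair_coin))  # ln 2, about 0.6931

# A deterministic random variable carries no information: entropy 0.
assert entropy({"a": 1.0}) == 0.0
```

Note that the entropy depends only on the induced pmf, not on which underlying events w map to each letter a; that observation is exactly what the partition view of entropy below makes precise.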
A discrete random variable f induces a probability mass function (pmf) p_f, where p_f(a) = P({w : f(w) = a}) = P(f = a), so you can equivalently write

H(P_f) = −Σ_{a∈A} p_f(a) ln p_f(a).

A discrete random variable f also induces a partition on the input space that corresponds to the inverse image of each event: let the partition Q consist of the sets Q_i, i = 1, 2, ..., where Q_i = {w : f(w) = a_i}. Then you can also write entropy in terms of the induced partition Q:

H(P_Q) = −Σ_i P(Q_i) ln P(Q_i).

You'll note from the limit-sum definition of expectation that if one takes the integral over a set of measure 0, one gets 0. That is a simple but key idea in many proofs, and you'll often see equivalence relationships where some q equals some p if they agree on all sets that do not have measure 0.

E  Limits

To quote Gut: "One of the basic questions in mathematics is to what extent limits of objects carry over to limits of functions of objects." The integration one learns as a kid is Riemann integration; Lebesgue integration is, for almost all practical purposes, equivalent to the standard Riemann integration that one first learns. One of the more important results in this area is Lebesgue's Dominated Convergence Theorem. Basically, it says that if f_n is a sequence of measurable functions that converges to f, then the integral of the function f with a measure μ is the same as the limit of the integrals of the f_n; that is, it shows you how to integrate f by instead taking the limit of the integral of the sequence of functions. (There are some other restrictions on the f_n's; see the formal statement, which can be found for example on mathworld's page, www.mathworld.com, and the reader is referred for example to Capinski and Kopp for more details.) What's powerful about this theorem, though, is that one doesn't have to assume that f is measurable; instead, the theorem concludes that f is integrable. This theorem doesn't work for Riemann integration, and that is considered one of the flaws of Riemann integration that makes Lebesgue integration more general and more useful.

F  Read on

If you are interested in a more thorough understanding of measure theory and probability, one of the friendliest books is Resnick's, and another good text is Gut's graduate-level measure-theoretic probability book, which teaches measure-theoretic, graduate-level probability without assuming you have a B.A. in mathematics. Other good texts include an undergraduate text on measure theory by Capinski and Kopp. There are certainly plenty of other probability and measure theory textbooks, but these three are relatively well-suited for self-study. If you are interested in information theory, you can solidify your understanding of the use of measure theory in information theory by reading Bob Gray's book, which is available free online. Also worth reading are Kullback's book and the information theory book by Ash; you might want to read the less formal book by Reza before Ash. The Kullback, Ash, and Reza books are available in inexpensive Dover editions. If you've decided you aren't so interested in formal probability, but want to learn more about approaches to solving probability problems, I recommend Richard Hamming's book.

References

M. Capinski and E. Kopp, Measure, Integral and Probability, Springer Undergraduate Mathematics Series.
A. Gut, Probability: A Graduate Course, Springer.
S. Resnick, A Probability Path, Birkhäuser.
R. B. Ash, Information Theory, Dover.
F. M. Reza, An Introduction to Information Theory, Dover.
R. W. Hamming, The Art of Probability for Scientists and Engineers, Addison Wesley.
R. M. Gray, Entropy and Information Theory, Springer Verlag (available free online).
S. Kullback, Information Theory and Statistics, Dover.
T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley Series in Telecommunications.