Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
MATH 7550-01 INTRODUCTION TO PROBABILITY FALL 2011 Lecture 4. Measurable functions. Random variables. I have told you that we can work with π-algebras generated by classes of sets in some indirect ways. The following microtheorem is useful here. Theorem 4.1. Suppose π and π are classes of subsets of a space π. If π β π(π), π β π(π), (4.1) then π(π) = π(π). (4.2) Proof. It is clear that if π β β°, then π(π) β π(β°). Applying this to the ο¬rst inclusion in (4.1), we get: ( ) π(π) β π π(π) = π(π) (4.3) (the last equality is quite clear). From the second inclusion in (4.1) we obtain, the same way, that π(π) β π(π); which, together with (4.3), yields (4.2). Let us show an example of how this is used. Let π be the class of all intervals, (π, π], (π, π), [π, π), [π, π], ο¬nite or inο¬nite, in the real line (a one-point set {π} is also an interval: {π} = [π, π]: it is the set of all real π₯ such that π β€ π₯ β€ π; and the empty set can also be considered as an interval). Let π be the class of all intervals, ο¬nite or inο¬nite, of the form (π, π] (we understand what the interval (β β, π] is; but what is (π, β]? β By deο¬nition, it is the set {π₯ β β1 : π < π₯ β€ β}; that is, the same as {π₯ β β1 : π < π₯ < β} = (π, β): no real number can be equal to β). Clearly π β π. Let us prove that the π-algebras π(π) and π(π) are the same. It is enough to check that π β π(π), which means that every interval of any kind (( , ), [ , ), [ , ]) belongs to π(π). And to do this, it is enough to represent βͺ β© every interval as the result of applying countably many set-theoretic operations ( , , and π ) to intervals of the form ( , ]. An interval (π, π) is the union of smaller semi-closed intervals: (π, π) = β βͺ (π, π β 1/π] (4.4) π=π (make a picture; I did not write the union from 1 to inο¬nity because I did not want questions about what happens if the length of the interval (π, π) is less than 1. But in fact, it would be OK: several ο¬rst summands in (4.4) would be intervals with the βrightβ end π β 1/π to the left of the βleftβ end π; such intervals are just empty, and it wouldnβt aο¬ect the union). So (π, π) belongs to π(π) by the π-algebra axiom 3π) (see Lectures 1β 2). The interval [π, π] is the complement of the union (ββ, π) βͺ (π, β] 1 (4.5) (if π = β β or π = β, the corresponding intervals are empty), and such intervals are both in π(π); so [π, π] β π(π). Finally, the interval [π, π) is the complement of (β β, π) βͺ [π, β], and also belongs to π(π). The same π-algebra is also the same as the π-algebra generated by) all semi-inο¬nite ( π intervals (β β, π], β β < π < β (because (π, π] = (β β, π] βͺ (β β, π]π ), and the same as that generated by all open subsets of β1 (the last statement is true because every open subset of the real line is a countable union of open intervals). The π-algebra π{all intervals} = π{(π, π] : ββ β€ π β€ π β€ β} = π{(ββ, π] : π β β1 } = π{all open sets} (4.6) that we introduced is very important for us. Remember that I said that the choice of the sample space Ξ© in applications of probability theory is natural, related to the nature of the experiment whose mathematical model we are trying to build; and the choice of the π-algebra β± of its subsets proclaimed as events is standard. Namely, if the sample space Ξ© is countable (possibly, ο¬nite), we take β± = π«(Ξ©), the class of all subsets of Ξ©; and if Ξ© is the real line, or part of it, or a set in βπ , ..., then the choice of the π-algebra β± is also standard. The time has come to set this standard. If Ξ© = β1 , we take as β± the π-algebra (4.6). This π-algebra is called the (one-dimensional) Borel π-algebra, and its elements are called Borel sets. The notation for it that we are going to use is β¬ 1 . When Henri Lebesgue constructed what we call now the Lebesgue measure π1 on the real line, it was deο¬ned on some π -algebra β1 of subsets of β1 , called measurable subsets. This π -algebra contains all intervals, and the Lebesgue measure of every interval is equal to its length. Since β1 is some π -algebra containing all intervals, and β¬ 1 is the smallest such π -algebra, we have clearly β1 β β¬ 1 . Is the inclusion β, in fact, a strict inclusion β, or is β1 = β¬ 1 ? You can be sure that mathematicians have worked to ascertain this; it turned out that β1 is wider than β¬ 1 : β1 β β¬ 1 (and even much wider β though I donβt want to spend time explaining what it means). So which of these π -algebras should we use as the standard? It turns out that the π -algebra β¬ 1 of Borel sets is very large: quite large enough for all practical purposes, and for most of theoretic ones. As a matter of fact, we cannot even construct an example of a non-Borel set; and we learn about existence of such only by indirect methods. So in our probability theory we are quite satisο¬ed with the Borel π -algebra β¬ 1 , and we donβt need in fact any sets belonging to β1 but not to β¬ 1 . We even donβt need to know that β1 is strictly larger than β¬ 1 . (The question arises: why did Lebesgue introduce the π -algebra β1 if β¬ 1 is quite enough? β But this was discovered only later.) The second reason that we are using the Borel π -algebra rather than some larger ones is that the π -algebra β1 is closely related to a speciο¬c measure: the Lebesgue measure π1 ; and for other measures on the real line the π -algebras on which one can consider them are diο¬erent from β1 . It would be unreasonable to consider many diο¬erent π -algebras on the same space β1 if we can do with just one, β¬ 1 . The words to describe the π-algebra π(π) will be: the π-algebra generated by the class of sets π. 2 How can we work with π-algebras generated by class of sets? We can do so in some indirect ways. We can consider Borel π-algebras in multidimensional spaces βπ ; these π-algebras can be deο¬ned either as generated by all multidimensional βintervalsβ β i. e., rectangles (π, π] × (π, π], or parallelepipeds in the three-dimensional case, etc.; or as generated by all open sets. Problem 7 given to you is about the fact that it is all the same. The Borel π-algebra in the π-dimensional Euclidean space is denoted β¬π . We can also deο¬ne the Borel π-algebra β¬π in every space π that is a metric space, or even a non-metric topological space: in every space where a class of open sets is deο¬ned. Of course, since there neednβt be any βintervalsβ or rectangles in the space π, the π-algebra β¬π is deο¬ned as one generated by the class πͺ of open subsets of π. I said that standard π-algebras are used in spaces that are either Euclidean spaces βπ , or their parts; but I have spoken only of the whole spaces. If π is a Borel subset of βπ , we can deο¬ne the π-algebra β¬π either as one generated by subsets of π that are open in π (a set π΄ β π is called open in π if for every point π₯0 β π΄ there is its neighborhood in π that is entirely within π΄: there exists a positive radius π such that {π₯ β π : β£π₯ β π₯0 β£ < π} β π΄); or as the π-algebra consisting of all Borel subsets of π΄. See Problem 8 . Now we go to random variables. First, a piece belonging to the set-theoretic introduction to probability and measure theory: what a measurable function is. When Lebesgue introduced the concept of measurability, it was understood as measurability with respect to the Lebesgue measure. But afterwards mathematicians understood that this concept can be formulated without reference to any measure: as one belonging to the set-theoretic introduction. A pair (π, π³ ) of a space π and a π-algebra π³ in it is called a measurable space. Let (π, π³ ) and (π, π΄) be two measurable spaces. Let π (π₯) be a function π : π 7β π (i. e., a function deο¬ned on π, with values in π ). We call this function (π³ , π΄)-measurable (or measurable with respect to π³ , π΄) if for every set πΆ β π΄ its inverse image π β1 (πΆ) = {π₯ : π (π₯) β πΆ} belongs to π³ : πΆ β π΄ β π β1 (πΆ) β π³ . (4.7) If we consider a number-valued or a vector-valued function (π = β1 or βπ ), and take the standard π-algebra β¬1 or β¬π as the π-algebra π΄, we omit the mention of π΄ and say βπ³ -measurableβ, or βmeasurable with respect to the π-algebra π³ β. If π³ is a Borel π-algebra, we call a function π that is measurable with respect to it Borel measurable. Let Ξ© be a sample space, β±, a π-algebra in it whose elements we proclaim events. Let (π, π³ ) be a measurable space. A random variable taking values in this measurable space is, by deο¬nition, a measurable function from (Ξ©, β±) to (π, π³ ). We will denote random variables with Greek letters. So, a random variable with values in (π, π³ ) is a function π : Ξ© 7β π that is (β±, π³ )-measurable: πΆ β π³ β π β1 (πΆ) = {π : π(π) β πΆ} [the short notation: {π β πΆ}] β β± (is an event). (4.8) 3 Of course, for the most part we consider π = β1 (just random variables, without any mention of the space β1 in which they take values); or π = βπ (random vectors); or, say, the space of all π × π matrices (or symmetric matrices) β then we speak of random matrices (or random symmetric matrices). In all these cases as the π-algebra π³ we take the corresponding Borel π-algebra. Theorem 4.2. Let π be a measurable function from (π, π³ ) to (π, π΄), and π a measurable function from (π, π΄) to (π, ( π΅). ) Then the composition π β π (i. e. the function π 7β π deο¬ned by (π β π )(π₯) = π π (π₯) ) is an (π³ , π΅)-measurable function. The proof is so simple that I omit it. The βprobabilisticβ formulation (i. e., in the language of (the set-theoretic introduction to) probability theory): Let π be a random variable with values in (π, π³ ). If π is an (π³ , π΅)-measurable ( ) function π 7β π, then π = π(π) (which is the short notation for π(π) = π π(π) ) is a random variable with values in (π, π΅). The particular case of π = π = β1 , π³ = π΅ = β¬1 : Let π be a random variable (by default, taking values in the real line), and π a Borel measurable real-valued function. Then π(π) also is a random variable. The deο¬nition (4.7) requires checking π β1 (πΆ) β π³ for all sets in π΄. This may be too many sets. But usually we are able to do it for much fewer sets: Theorem 4.3. Let π΄ be the π-algebra generated by some class π of subsets of π. Then if πΆ β π β π β1 (πΆ) β π³ , (4.9) the function π is (π³ , π΄)-measurable. Proof. Let us denote with π the class of all subsets of π for which π β1 (πΆ) β π³ . Clearly π β π. If we prove that π is a π-algebra, then, by deο¬nition, π β π(π) = π΄, and π΄ β π is just the statement of our theorem. So let us check that π is a π-algebra in π . This means, ο¬rst, that π β π; (4.10) πΆ β π β πΆ π = π β πΆ β π, (4.11) second and third, that πΆ1 , πΆ2 , ..., πΆπ , ... β π β β βͺ πΆπ β π. (4.12) π=1 Checking (4.10): π β1 (π ) = {π₯ : π (π₯) β π } = π β π³ . Now to (4.11): π β1 (π β πΆ) = π β π β1 (πΆ) β π³ ; 4 (4.13) and (4.12): π β1 β (βͺ π=1 β ) βͺ πΆπ = π β1 (πΆπ ) β π³ . (4.14) π=1 So, e. g., since the one-dimensional Borel π-algebra is generated by all semi-inο¬nite intervals (ββ, π] (or by all (β β, π), β β < π < β), to check that a real-valued function π = π(π) is a random variable it is enough to check that all sets {π : π(π) β€ π} (or <) are events. The proof of Theorem 4.3 is the standard thing that we have to do inο¬nitely many times if we want to set probability theory rigorously; we cannot do without such thins, but they are pretty simple, and we have to remember that they are not what is very important in probability theory. Theorem 4.4. Every continuous function π : π 7β π (of course, if we want to consider continuous functions, we have to suppose that π and π are sets in Euclidean spaces, or metric, or at least topological spaces) is (β¬π , β¬π )-measurable (in short: Borel measurable). Proof. By deο¬nition, β¬π is the π-algebra generated by the open subsets of π ; so by Theorem 4.3 it is enough to check that for every open πΆ β π π β1 (πΆ) = {π₯ : π (π₯) β πΆ} β β¬π [= π{open subsets of π}]. (4.15) But if the function π is continuous, the inverse image of every open set is again open; so (4.15) is true. In the same way we prove that every monotone function π : β1 7β β1 is Borel measurable: we use the fact that β¬1 = π{all intervals}, and that the inverse image π β1 of every interval is again an interval. Now, the most important thing in probability theory must be the probabilities; and we havenβt even mentioned them. Of course it is because we just deο¬ned what random variables are, and this is only the ο¬rst, and not the most important thing about random variables. I donβt really know in what place of my lecture notes to put what follows β just as I didnβt know in what place of the lecture I should put it. We are going to use some things of measure theory. I would like to tell you that measure theory: the theory of measure and integration, contains material of diο¬erent degrees of diο¬culty: some things are very simple; some other are of medium diο¬culty (such is, for example, the construction of Lebesgue integral); and some other parts are pretty complicated. Such is the construction of the Lebesgue measure on the real line or in βπ . We could develop our theory using only very simple and medium-simple things; but in probability theory this would restrict us to discrete random variables, and we wouldnβt be able to speak of continuous distributions, for which integration with respect to the Lebesgue measure is needed. So in fact we cannot do without the Lebesgue measure, and with it, the more complicated things in measure theory. In contrast with this, things belonging to the set-theoretic introduction both to measure theory and probability theory are all pretty simple. 5