PROBABILITY FALL 2014 - CLASS NOTES

3. SEPTEMBER 16

We begin with an example.

Example 3.1. As an example of a random experiment with sample space the interval $[0, 2\pi)$, we described a spinner on September 11. It is reasonable to assume that the probability of the needle ending up between the angles $a$ and $b$ is proportional to the length of the interval normalized by $2\pi$, i.e. $(b-a)/2\pi$. We have verified this experimentally using the random number generator in MatLab. We now want to look at the distribution of the sum of two such uniformly distributed random numbers in $[0, 1)$ (we renormalize for convenience); call it $X$.

3.1. Probability distribution functions on $\mathbb{R}$, $\mathbb{R}^n$. We want to be able to describe random experiments whose natural sample space $\Omega$ is a subset of the real line $\mathbb{R}$ or of the Euclidean space $\mathbb{R}^n$. We get a simplified theory if we restrict $\Omega$ to be of the following type:
· $\Omega \subset \mathbb{R}$ is a finite or countable union of (closed, half-open, open) possibly unbounded intervals $\{I_n\}$;
· $\Omega \subset \mathbb{R}^n$ is a countable union of products of intervals as above;
· $\Omega \subset \mathbb{R}^n$ is a domain for which the Riemann integral $\int_\Omega \cdot \, dx_1 \cdots dx_n$ makes sense.
We will refer to such an $\Omega$ by the phrase sample space (where we mean in fact admissible sample space).

Definition 3.2. Let $\Omega \subset \mathbb{R}^n$ be a sample space. A function $f : \Omega \to \mathbb{R}$ is a probability distribution function on $\Omega$ if
· $f(x) \ge 0$ for all $x \in \Omega$;
· $f$ is Riemann integrable on $\Omega$ and $\int_\Omega f(x)\,dx = 1$.
Note that we can always think of $f$ as being defined on $\mathbb{R}^n$ by setting $f = 0$ on $\mathbb{R}^n \setminus \Omega$. If we do so, then $\int_{\mathbb{R}^n} f = 1$.

Definition 3.3. We say that $X : \mathbb{R} \to \mathbb{R}$ is a random variable with probability distribution function $f = f_X$ if
$$P(X \le x) = \int_{-\infty}^{x} f(t)\,dt.$$
The function
$$F_X(x) = \int_{-\infty}^{x} f(t)\,dt = P(X \le x)$$
is called the cumulative distribution function of $X$.

We recall that, for instance, when $\Omega \subset \mathbb{R}$, $\Omega = \bigcup_n I_n$ as above, so that
$$\int_\Omega f(x)\,dx = \sum_n \int_{I_n} f(x)\,dx.$$

Lemma 3.4.
The cumulative distribution function $F_X$ of $f$ is a nondecreasing, absolutely continuous function such that $F_X' = f$ at all points where $F_X$ is differentiable, and
$$\lim_{x \to -\infty} F_X(x) = 0, \qquad \lim_{x \to +\infty} F_X(x) = 1.$$

Proof. That $F_X$ is nondecreasing is immediate from $f$ being nonnegative. That $F_X$ is absolutely continuous and $F_X' = f$ wherever $F_X$ is differentiable is immediate from the fundamental theorem of calculus for Riemann-integrable functions. Finally, the last two properties are obvious, respectively, from the Riemann integrability of $f$ and from the fact that $\int_{\mathbb{R}} f = 1$. $\square$

Example 3.5. Let $X$ be a random variable which is uniformly distributed on the interval $[a, b]$, with $a < b$ real numbers. Intuitively, this means that
$$P(X \le x) = \begin{cases} 0 & x \le a, \\ \dfrac{x-a}{b-a} & a \le x < b, \\ 1 & x \ge b; \end{cases}$$
this means that the function
$$f(x) = \begin{cases} \dfrac{1}{b-a} & a < x < b, \\ 0 & x \le a \text{ or } x \ge b \end{cases}$$
is a probability distribution function for $X$.

The $\mathbb{R}^n$ case of Definition 3.3 is as follows.

Definition 3.6. We say that $X = (X_1, \dots, X_n) : \mathbb{R}^n \to \mathbb{R}^n$ is a random variable with probability distribution function $f$ if
$$P(X_1 \le x_1, \dots, X_n \le x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f(t_1, \dots, t_n)\,dt_1 \cdots dt_n.$$
The function
$$F_X(x_1, \dots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f(t_1, \dots, t_n)\,dt_1 \cdots dt_n$$
is called the cumulative distribution function of $X$.

Example 3.7. Let $X = (X_1, X_2)$ be a random variable describing the landing position of a dart thrown at a target $\Omega$, a disc of radius $R > 0$, in Cartesian coordinates centered at the center of the target. Assuming that the landing position is uniformly distributed on the disc, a probability distribution function is given by
$$f(x_1, x_2) = \begin{cases} \dfrac{1}{\pi R^2} & x_1^2 + x_2^2 < R^2, \\ 0 & \text{otherwise}. \end{cases}$$
So in particular, for instance,
$$P(X_1 \ge 0, X_2 \ge 0) = \int_0^{\infty} \int_0^{\infty} f(x_1, x_2)\,dx_1\,dx_2 = \int_{x_1^2 + x_2^2 < R^2,\; x_1 \ge 0,\; x_2 \ge 0} \frac{1}{\pi R^2}\,dx_1\,dx_2 = \frac{1}{4}.$$
Our theory will later justify that for any $E \subset \Omega$
$$P(X \in E) := \int_E f(x_1, x_2)\,dx_1\,dx_2 = \frac{|E|}{\pi R^2},$$
as we had postulated.
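The quarter-disc probability computed above can also be checked by simulation. The following is a minimal sketch in Python, not part of the original notes (the in-class verification used MatLab); it draws uniform points on the disc by rejection sampling from the enclosing square:

```python
import random

def sample_dart(R=1.0):
    """Draw a uniform point on the disc of radius R by rejection
    sampling from the enclosing square [-R, R]^2."""
    while True:
        x1 = random.uniform(-R, R)
        x2 = random.uniform(-R, R)
        if x1 * x1 + x2 * x2 < R * R:
            return (x1, x2)

random.seed(0)
N = 200_000
# Count landings in the first quadrant, i.e. X1 >= 0 and X2 >= 0.
hits = sum(1 for _ in range(N)
           if all(c >= 0 for c in sample_dart()))
p_est = hits / N
print(p_est)  # should be close to 1/4
```

Accepted points are uniform on the disc because rejection sampling preserves the uniform density of the square on the kept region.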
We can define a new random variable $\rho := \sqrt{X_1^2 + X_2^2}$, the distance from the landing point to the center of the target. We have, using the above, that
$$P(\rho \le r) = P\big(X \in \{x_1^2 + x_2^2 \le r^2\}\big) = \begin{cases} 0 & r \le 0, \\ \dfrac{r^2}{R^2} & 0 \le r \le R, \\ 1 & r > R. \end{cases}$$
Can we find the probability distribution function of $\rho$?

Example 3.8 (improper integrals). Let
$$f(x) = \begin{cases} \dfrac{c}{x \log^2 \frac{e}{x}} & 0 < x < 1, \\ 0 & \text{elsewhere.} \end{cases}$$
Choose $c$ such that $f$ is a probability distribution function.

Example 3.9 (uniform probability). Let $(X, Y)$ be uniformly distributed on the square $[0, 1]^2$. Find the cumulative distribution function and the probability distribution function of $Z = X + Y$.

Example 3.10 (uniform probability). The train to Boston leaves every hour, but Francesco has forgotten the exact time (i.e. each hour a train leaves at hr:mm and he does not know what mm is). He will show up at the train station between 5pm and 6pm. Let $T$ be the time that Francesco will have to wait at the station. Assuming that both Francesco's arrival time and the train departure time 5:mm are uniformly distributed between 5pm and 6pm, calculate the cumulative distribution function of $T$.

3.2. Probability measure associated to a distribution function. Given a random variable $X : \mathbb{R} \to \mathbb{R}$ with probability distribution function $f = f_X$, we would like to calculate $P(X \in E)$ for as many sets $E \subset \mathbb{R}$ as possible: these sets will be our events.

Definition 3.11. Let $B \subset \mathbb{R}$. Then $B \in \mathcal{B}(\mathbb{R})$ if either $B$ or its complement can be written as a countable union of subintervals $(a_n, b_n]$ with $a_n < b_n$ real numbers.

Lemma 3.12. The collection $\mathcal{B}(\mathbb{R})$ is a $\sigma$-algebra on $\mathbb{R}$, and furthermore:
· $\mathcal{B}(\mathbb{R})$ contains all intervals of the form $(a, b]$ (with possibly $a = -\infty$), $[a, b]$, $[a, b)$ (with possibly $b = +\infty$), $(a, b)$ (with possibly $a = -\infty$, $b = +\infty$);
· $\mathcal{B}(\mathbb{R})$ contains all the points, in the sense that $\{a\} \in \mathcal{B}(\mathbb{R})$ for all $a \in \mathbb{R}$.

With the present definitions, see e.g. Definition 3.3, we can only compute $P(X \in E)$ for $E = (-\infty, x]$.
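Returning to Example 3.9, the distribution of $Z = X + Y$ can be explored numerically. The following Python sketch is not part of the notes; the closed form used below is the triangular CDF one obtains by integrating the uniform density over the region $\{x + y \le z\}$ of the unit square, so it effectively gives away one answer to the exercise:

```python
import random

def cdf_Z(z):
    """CDF of Z = X + Y for independent X, Y uniform on [0, 1]:
    the area of {x + y <= z} intersected with the unit square."""
    if z <= 0:
        return 0.0
    if z <= 1:
        return z * z / 2          # triangle below the line x + y = z
    if z <= 2:
        return 1 - (2 - z) ** 2 / 2  # complement of the upper triangle
    return 1.0

random.seed(1)
N = 100_000
samples = [random.random() + random.random() for _ in range(N)]

# Compare the empirical CDF against the closed form at a few points.
for z in (0.5, 1.0, 1.5):
    empirical = sum(s <= z for s in samples) / N
    print(z, empirical, cdf_Z(z))
```

Differentiating `cdf_Z` recovers the tent-shaped density $f_Z(z) = z$ on $[0,1]$ and $f_Z(z) = 2 - z$ on $[1,2]$, which also answers the question raised in Example 3.1 about the sum of two spinner readings.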
With the following theorem, we extend the definition of $P(\cdot)$ to all sets of $\mathcal{B}(\mathbb{R})$.

Theorem 3.1. Let $X : \mathbb{R} \to \mathbb{R}$ be a random variable with probability distribution function $f = f_X$. There is a unique probability measure $P_X$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ in the sense of Definition 2.13, given by $P_X(B) = P(X \in B)$, such that for all $a < b$ with possibly $a = -\infty$,
$$(3) \qquad P(X \in (a, b]) = P(X \in (a, b)) = F_X(b) - F_X(a) = \int_a^b f_X(x)\,dx.$$
In particular, the following properties hold true.
(1) If $E \subset F \subset \mathbb{R}$ are both in $\mathcal{B}(\mathbb{R})$, then $P_X(E) \le P_X(F)$;
(2) $P_X(\mathbb{R}) = 1$, and if $\Omega$ is the set where $f$ is nonzero, then $P_X(B \cap \Omega) = P_X(B)$ for all $B \in \mathcal{B}(\mathbb{R})$;
(3) if $\{A_j\}$ is a collection of pairwise disjoint sets of $\mathcal{B}(\mathbb{R})$, meaning that $A_j \cap A_k = \emptyset$ unless $k = j$, then
$$P_X\Big(\bigcup_{j=1}^{\infty} A_j\Big) = \sum_{j=1}^{\infty} P_X(A_j);$$
(4) $P_X(\{x\}) = P(X = x) = 0$ for all $x \in \mathbb{R}$;
(5) $P_X(A) + P_X(\tilde{A}) = 1$, $\tilde{A}$ being the complement of $A$.

We will not see a complete proof, but the idea is to define $P_X$ on countable unions of disjoint left-open intervals by
$$P_X\Big(\bigcup_{j=1}^{\infty} (a_j, b_j]\Big) := \sum_{j=1}^{\infty} \big(F_X(b_j) - F_X(a_j)\big)$$
and then extend this to any $B \in \mathcal{B}(\mathbb{R})$ by postulating that the countable additivity property (3) in the Theorem holds.

3.3. Further examples.

Example 3.13 (exponential distribution). Let $T$ be the random variable describing the (random) time between two consecutive breakdowns of a certain machine, which is assumed to be wear-free in the following sense: if we set the origin of time at the last breakdown, the probability of having a breakdown in the time interval $[t, t+s)$, given that the machine is still working at time $t > 0$, is the same as the probability of having one in $[0, s)$. By this assumption, we can determine the cumulative distribution function of $T$ up to some parameter $\lambda > 0$. Indeed, we only consider positive times, so $P(T \le 0) = 0$. If $t > 0$ and $s > 0$, a moment's thought leads to
$$P(T > t + s) = P(T > t)\,P(T > s),$$
and thus, setting $G(t) = P(T > t) = 1 - P(T \le t) = 1 - F_T(t)$, we have that
$$G(t + s) = G(t)\,G(s), \qquad \forall\, t, s > 0.$$
It is clear that $G(t) = e^{-\lambda t}$ satisfies the above equation for all $s, t$. Some work shows that these are the only continuous solutions to the above equation (the idea of the proof: $H = \ln G$ satisfies $H(t+s) = H(t) + H(s)$, and such a function, if continuous, must be linear). Moreover, $\lambda > 0$ is needed for $G$ to go to zero at infinity. Therefore, we have found
$$F_T(t) = \begin{cases} 0 & t \le 0, \\ 1 - e^{-\lambda t} & t > 0. \end{cases}$$
Note that $\lambda > 0$ is the reciprocal of the expected time between occurrences.