Lecture 2: Infinite Discrete and Continuous Sample Spaces
CS244 - Randomness and Computation
February 2, 2015

To extend our formal framework to discrete sample spaces with an infinite number of outcomes, we need a small tweak to the definitions. To extend it to continuous sample spaces, we will need more than a small tweak! The details follow. Related reading: Grinstead and Snell treat all this in Chapter 2, and the notes here are a very abbreviated version of their approach.

1 Infinite Discrete Sample Spaces

Flip a fair coin until two consecutive tosses are tails. We can list the possible outcomes of this experiment as

Ω = {TT, HTT, HHTT, THTT, HHHTT, HTHTT, THHTT, HHHHTT, HHTHTT, HTHHTT, THHHTT, THTHTT, ...}

In other words, Ω consists of all sequences of H's and T's that end in two consecutive T's, but do not contain any other occurrence of consecutive T's. The important thing here is that Ω is infinite.

What sort of probability should we assign to the individual outcomes? Following our usual reasoning, we would estimate that each sequence of length k occurs with frequency 2^{−k}. How many outcomes are there of each length? In the tabulation above, we see 1 sequence of length 2, 1 of length 3, 2 of length 4, 3 of length 5, and 5 of length 6. You might recognize 1, 1, 2, 3, 5, ... as the start of the Fibonacci sequence F_0, F_1, F_2, F_3, ..., where for each k ≥ 2, F_k = F_{k−1} + F_{k−2}. (If you're so inclined, you might enjoy trying to prove that Ω contains exactly F_{k−2} elements of length k for every k ≥ 2.)

Does this define a probability function? We would want the probabilities of the individual outcomes to add up to 1. But 'adding up all the individual outcomes' now means summing the infinite series

F_0/4 + F_1/8 + F_2/16 + ···

Does this series converge to 1? Does it converge at all? If you add more and more terms, you do find that the sum appears to be approaching 1. Again, it is not hard to prove that this really is a convergent infinite series with sum 1, and we leave it as an exercise for the mathematically curious. Since all the terms are positive, we can drop any set of terms, and the remaining terms will still form a convergent infinite series with sum less than 1. We can thus set, for any event E ⊆ Ω,

P[E] = Σ_{e ∈ E} 2^{−|e|},

where |e| denotes the length of the outcome e (since each outcome is a string of coin tosses). Implicit in this way of defining probability functions is the following extension of our third axiom, applicable to any probability space:

Axiom 3'. If E_1, E_2, E_3, ... is a sequence of pairwise disjoint events, then

P[⋃_{j=1}^{∞} E_j] = Σ_{j=1}^{∞} P[E_j].

The other probability properties imply that the infinite sum on the right has to be convergent, with a limit no more than 1. This is the small tweak we needed.

Observe that 'equally likely outcomes' makes no sense in this setting: if each outcome had equal probability greater than 0, the sum over all probabilities would not converge; if each outcome had probability 0, the series would converge to 0, not 1.

We can define a random variable X by X(e) = |e|; in other words, X(e) is the number of tosses made until two consecutive tails occur. The probability mass function is given by

p_X(j) = P[X = j] = F_{j−2}/2^j for j ≥ 2.

While the plot of the PMF is infinite in extent, we can display a finite portion of it; this is shown in Figure 1.

Figure 1: PMF of Time to Two Tails
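As a sanity check on the formula p_X(j) = F_{j−2}/2^j, here is a minimal simulation sketch in Python (an illustration added to these notes, not part of the original; the function names are ours). It runs the flip-until-two-consecutive-tails experiment many times and compares the empirical frequencies with the formula:

```python
import random
from collections import Counter

def tosses_until_two_tails():
    """Flip a fair coin until two consecutive tails appear;
    return the total number of tosses (the value of X)."""
    count, prev = 0, ""
    while True:
        cur = random.choice("HT")
        count += 1
        if prev == "T" and cur == "T":
            return count
        prev = cur

def fib(k):
    """Fibonacci numbers as in the notes: F_0 = F_1 = 1."""
    a, b = 1, 1
    for _ in range(k):
        a, b = b, a + b
    return a

trials = 100_000
counts = Counter(tosses_until_two_tails() for _ in range(trials))
for j in range(2, 9):
    # empirical frequency of X = j versus F_{j-2} / 2^j
    print(j, counts[j] / trials, fib(j - 2) / 2 ** j)
```

For j = 2 and j = 3 the printed values should be close to 1/4 and 1/8, the probabilities of the outcomes TT and HTT.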
2 Continuous Sample Spaces

2.1 Definition

Figure 2 shows a game spinner. The points on the circumference of the circle are labeled by the set of real numbers

Ω = {x : 0 ≤ x < 1},

so this is the sample space for our experiment. We also denote this set as [0, 1), a half-open interval.

Figure 2: A spinner

The whole point of the spinner is that the outcomes are somehow 'equally likely'. (This is the experiment simulated by a call to the random number generator rand().) We have no problem saying that the event depicted in Figure 3 has probability 0.25, since it occupies exactly one-quarter of the circumference of the circle. But what sort of probability should we assign to an individual outcome? If we say that each individual point has some fixed probability p > 0, then the sum of these probabilities would exceed 1, so that can't be right. But if we say that the probability of an individual point is 0, then that makes it sound as if, for each individual point Q on the circumference, it is impossible that the spinner points to Q, and so it is impossible that the spinner points anywhere. So that can't be right either....

In fact, we DO say that the probability of each individual point is 0, and we just have to be careful about how we apply the intuition that probability 0 means 'impossible'. With this in mind, we assign to each event E ⊆ Ω the total length of E. This makes sense for individual points, intervals, unions of disjoint intervals, etc. This is the uniform probability distribution on Ω. (Mathematical subtlety: it is impossible to assign a length to every subset of the interval [0, 1) and preserve the properties that you think length functions ought to have. This makes getting the formal definition quite right pretty tricky. For our purposes...don't worry about it.)

Figure 3: The spinner, showing the event {x : 0.5 ≤ x ≤ 0.75}

A continuous sample space can be 2-dimensional, 3-dimensional, or higher-dimensional. In the uniform distribution model, the probability function on a 2-dimensional sample space Ω of area A assigns to each event E ⊆ Ω the probability

area(E)/A.

For instance, in our dartboard example, the board, with radius 1 foot, has area π square feet, and the bulls-eye, which has radius 1/12 foot, has area π/144 square feet. So in the uniform model, the bulls-eye would have probability 1/144. There is no strong reason to believe that this is a good model of what happens in reality; that would depend on the skill of the players whose behavior we're modeling.
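To make the 'probability 0 but not impossible' point concrete, here is a small sketch (Python; an added illustration, with random.random() playing the role of rand()). It estimates the probability of the quarter-circle event of Figure 3, and of the spinner landing on one particular point exactly:

```python
import random

trials = 100_000

# Event of Figure 3: one quarter of the circumference.
quarter = sum(0.5 <= random.random() <= 0.75 for _ in range(trials))

# A single fixed point: virtually never hit, matching probability 0
# in the continuous model (yet each spin does land somewhere).
point = sum(random.random() == 0.25 for _ in range(trials))

print("P[0.5 <= x <= 0.75] ~", quarter / trials)  # close to 0.25
print("P[x == 0.25]        ~", point / trials)    # 0 in practice
```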
2.2 Buffon's Needle

Here is an amusing and, at first glance, astonishing application of these ideas. Mark off a large area on the floor with parallel lines one inch apart. Toss a one-inch-long needle in the air and let it land on the floor. Perform this experiment repeatedly, and count the proportion of trials in which the needle crosses one of the lines. This proportion will be approximately 2/π. (The problem, in slightly different form, was described in the 18th century by G. L. Leclerc, Comte de Buffon, who was more famous as a naturalist than as a mathematician.)

Figure 4 illustrates a simulation of the experiment with 600 throws. In this particular run, the needle crossed a line in 65.33% of trials. The correct value of 2/π is about 0.6366. In another run, with 60000 trials, the estimate of 2/π was 0.6389.

Figure 4: Buffon's Needle, 600 trials

Let's see how this remarkable result is obtained. An outcome is where the needle lands, and this is determined by two parameters: the coordinates of the left extremity of the needle, and the angle −π/2 ≤ θ ≤ π/2 that the needle makes with the horizontal. Now we are only interested in whether the needle crosses one of the vertical lines, so we don't care about the y-coordinate at all, and we only care about the fractional part of the x-coordinate, that is, the horizontal distance 0 ≤ z ≤ 1 from the vertical line immediately to the left. So we can view the sample space as the rectangle

{(θ, z) : −π/2 ≤ θ ≤ π/2, 0 ≤ z ≤ 1}.

Here it is reasonable to assume a uniform distribution: we don't think that any horizontal displacement z or angle θ is more likely than any other one.

The right-hand extremity of the needle has x-coordinate z + cos θ, and thus the needle crosses the line if and only if z > 1 − cos θ. In the plot of Figure 5, this event is the shaded area. The probability of crossing the line is thus the area of the shaded region divided by the area of the rectangle. If we turn the picture upside-down, the shaded region is just the area under the graph of cos θ for −π/2 ≤ θ ≤ π/2, which is

∫_{−π/2}^{π/2} cos θ dθ = sin(π/2) − sin(−π/2) = 1 − (−1) = 2,

and the area of the rectangle is π, so the probability that the needle crosses a line is 2/π, as claimed.

Figure 5: Sample space for the Buffon Needle Problem, with the event 'the needle crosses a line' shaded

2.3 Monte Carlo Integration

The foregoing example (minus the story about the needle) shows a method for approximating an integral: to find the area of a region X (in this case the shaded region of Figure 5), we enclose it in a region Y whose area we know (in our example, the rectangle) and select points uniformly at random in Y. Let N be the number of points selected and N_hits the number of these points that are in X. Then

N_hits/N ≈ area(X)/area(Y),

so we obtain an approximation of the area of X. Of course, in the above example we knew the integral exactly, but for peculiarly-shaped regions of integration, this method might be a practical alternative. Approximating integrals in this way is called Monte Carlo integration.
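The simulation shown in Figure 4 is easy to reproduce. Here is a sketch (Python; an added illustration, not the code used to produce the figure) that samples (θ, z) uniformly from the rectangle and counts crossings, which is exactly the Monte Carlo estimate of the shaded area divided by the rectangle's area:

```python
import math
import random

def estimate_two_over_pi(trials):
    """Sample (theta, z) uniformly from the rectangle
    [-pi/2, pi/2] x [0, 1] and count the fraction of points
    landing in the crossing event z > 1 - cos(theta)."""
    hits = 0
    for _ in range(trials):
        theta = random.uniform(-math.pi / 2, math.pi / 2)
        z = random.uniform(0.0, 1.0)
        if z > 1 - math.cos(theta):
            hits += 1
    return hits / trials

print(estimate_two_over_pi(60_000))  # near 2/pi, about 0.6366
```

Multiplying the returned proportion by the rectangle's area π gives a Monte Carlo approximation of the integral ∫ cos θ dθ = 2.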
2.4 Continuous random variables: cumulative distribution function and density function

For continuous sample spaces, we still define a random variable as a function X from the sample space to the set of real numbers. Generally speaking, X will be a continuous function. Here are several examples:

• Let Ω = [0, 1) with the uniform probability distribution. This is the spinner example described above. Set X(x) = x, so that the value of the random variable is simply the outcome of the spin.

• Consider now a pair of spinners. When we spin each one, we get a pair (x, y) of numbers in [0, 1). We can model the set of outcomes as the 1 × 1 square. It makes sense here to use the uniform probability distribution on this square, as the outcome on one spinner presumably has no influence on the outcome of the other. Let Y(x, y) = x + y, the sum of the values on the two spinners. Y is a random variable, a kind of continuous analogue of the roll of two dice.

• Let's go back to an earlier example of an infinite discrete sample space, that of flipping a fair coin until tails comes up. We might use this to model a different situation: suppose we observe people entering a store. We divide time into one-minute intervals and find that, on average, one customer enters the store in about half of the one-minute periods. We can think of the experiment as starting a clock and noting the time at which the first customer arrives. Using the coin model for the store arrivals is not a terrible thing to do, but it does sweep a lot under the rug. Clearly, this is a continuous process, and we might instead view the sample space as the set of positive real numbers, where again the outcome of the experiment is the arrival time of the first customer. Let's define Z(t) = t; that is, the value of the random variable Z is just the outcome of the experiment.

It is not very helpful to talk about the probability mass function for any of these random variables: the probability of the random variable having a particular value x will generally be 0. However, it does make sense to talk about the cumulative distribution function

F_X(a) = P[X < a].

Let's calculate this for each of the three random variables in the examples above.

For the random variable X giving the value of a single spinner, we have F_X(a) = P[X < a] = a for all a between 0 and 1, while F_X(a) = 0 for a < 0 and F_X(a) = 1 for a > 1. The cdf F_X is plotted in Figure 6.

Figure 6: Cumulative distribution function for the spinner

For the sum Y of two spinner outcomes, the cumulative probability F_Y(a) is a²/2 for 0 < a < 1 and 1 − (2 − a)²/2 for 1 < a < 2. This is illustrated in Figure 7. The cdf F_Y itself is plotted in Figure 8.

Figure 7: Events Y < 0.8 and Y > 1.2 for the sum of two spinners
Figure 8: Cumulative distribution function for the sum of two spinners Y

For the waiting time Z, we know that F_Z(1) = 1/2, F_Z(2) = 3/4, and in general, for positive integers k, F_Z(k) = 1 − 2^{−k}. In fact, we can argue that for all positive real numbers x, F_Z(x) = 1 − 2^{−x}, giving the plot in Figure 9.

Figure 9: Cumulative distribution function for the waiting-time random variable Z

For discrete random variables, the probability mass function gives the difference between successive values of the cumulative distribution function. For continuous random variables, the appropriate analogue is the derivative of the cdf. This is called the probability density function, or pdf, p_X of X. Another way to look at it is that the cumulative distribution function is the integral of the density function:

F_X(x) = ∫_{−∞}^{x} p_X(t) dt.

The probability that a random variable X has value between a and b is the area under the graph of the density function between x = a and x = b, that is,

P[a ≤ X ≤ b] = ∫_{a}^{b} p_X(t) dt.

The density functions for our three example random variables are shown in Figures 10-12. Observe that the density functions for the single spinner, the sum of two spinners, and the waiting time have the same basic shapes as the probability mass functions for a single die, the sum of two dice, and the number of tosses until a head comes up, which are discrete versions of these continuous random variables.

Figure 10: Probability density function for the uniform random variable X: the area under the graph of the density function is shaded
Figure 11: Probability density function for the sum of two spinners Y: the area under the graph of the density function is shaded
Figure 12: Probability density function for the waiting time random variable Z. The bounding curve is y = ln(2) · 2^{−x}; the area under this curve is shaded
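As with the needle, these cdf formulas can be checked by simulation. Here is a short sketch (Python; an added illustration, not part of the original notes) comparing the empirical cdf of Y, the sum of two spinners, with the piecewise formula above; the waiting time Z could be checked the same way by sampling minute-by-minute coin flips.

```python
import random

def F_Y(a):
    """Piecewise cdf of Y = sum of two independent uniform [0,1) spins."""
    if a <= 0:
        return 0.0
    if a < 1:
        return a * a / 2
    if a < 2:
        return 1 - (2 - a) ** 2 / 2
    return 1.0

trials = 100_000
for a in (0.5, 0.8, 1.0, 1.2, 1.5):
    # Empirical P[Y < a] from simulated pairs of spins.
    empirical = sum(1 for _ in range(trials)
                    if random.random() + random.random() < a) / trials
    print(f"a={a}: empirical {empirical:.4f}, formula {F_Y(a):.4f}")
```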