Lecture 2: Infinite Discrete and Continuous
Sample Spaces
CS244-Randomness and Computation
February 2, 2015
To extend our formal framework to discrete sample spaces with an infinite
number of outcomes, we need a small tweak to the definitions. In order to extend
them to continuous sample spaces, we will need more than a small tweak! The
details follow.
Related reading: Grinstead and Snell treat all this in Chapter 2, and the notes
here are a very abbreviated version of their approach.
1 Infinite Discrete Sample Spaces
Flip a fair coin until two consecutive tosses are tails. We can list the possible
outcomes of this experiment as
Ω = {TT, HTT, HHTT, THTT, HHHTT, HTHTT, THHTT, HHHHTT, HHTHTT, HTHHTT, THHHTT, THTHTT, . . . }
In other words, Ω consists of all sequences of H’s and T’s that end in two
consecutive T’s, but do not contain any other occurrence of consecutive T’s. The
important thing here is that Ω is infinite.
What sort of probability should we assign to the individual outcomes? Following our usual reasoning, we would estimate that each sequence of length k occurs with frequency $2^{-k}$.
How many outcomes are there of each length? In the tabulation above, we
see 1 sequence of length 2, 1 of length 3, 2 of length 4, 3 of length 5, 5 of length
6. You might recognize the sequence 1,1,2,3,5,... as the start of the Fibonacci
sequence $F_0, F_1, F_2, F_3, \ldots$, where for each $k \ge 2$, $F_k = F_{k-1} + F_{k-2}$. (If you're so inclined, you might enjoy trying to prove that Ω contains exactly $F_{k-2}$ elements of length k for every $k \ge 2$.)
Does this define a probability function? We would want all the probabilities of
the individual outcomes to add to 1. But ‘adding up all the individual outcomes’
now means summing the infinite series
$$\frac{F_0}{4} + \frac{F_1}{8} + \frac{F_2}{16} + \cdots$$
Does this series converge to 1? Does it converge at all? If you add more and
more terms, you do find that the sum appears to be approaching 1. Again, it is
not hard to prove that this really is a convergent infinite series with sum 1, and
we leave it as an exercise for the mathematically curious. Since all the terms are
positive, we can drop out any set of terms, and the remaining terms will still form
a convergent infinite series, with sum less than 1. We can thus set, for any event
E ⊆ Ω,
$$P[E] = \sum_{e \in E} 2^{-|e|},$$
where |e| denotes the length of the outcome e (since each outcome is a string of
coin tosses).
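The two claims above (that the number of outcomes of length k is the Fibonacci number $F_{k-2}$, and that the probabilities sum to 1) are easy to check numerically. Here is a minimal Python sketch; it is my own illustration rather than part of the notes, and the helper name outcomes_of_length is made up for the example.

```python
from itertools import product

def outcomes_of_length(k):
    """All H/T strings of length k that end in TT and have no earlier TT."""
    strings = ("".join(bits) for bits in product("HT", repeat=k))
    return [s for s in strings if s.endswith("TT") and "TT" not in s[:-1]]

# Fibonacci numbers F_0, F_1, F_2, ... = 1, 1, 2, 3, 5, ...
fib = [1, 1]
while len(fib) < 200:
    fib.append(fib[-1] + fib[-2])

# Brute-force check that there are exactly F_{k-2} outcomes of length k.
for k in range(2, 15):
    assert len(outcomes_of_length(k)) == fib[k - 2]

# Partial sums of F_{k-2} / 2^k creep up toward 1.
total = sum(fib[k - 2] / 2**k for k in range(2, 200))
print(total)  # very close to 1.0
```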
Implicit in this way of defining probability functions is the following extension
of our third axiom, applicable to any probability space:
Axiom 3'. If $E_1, E_2, E_3, \ldots$ is a sequence of pairwise disjoint events, then
$$P\left[\bigcup_{j=1}^{\infty} E_j\right] = \sum_{j=1}^{\infty} P[E_j].$$
The other probability properties imply that the infinite sum on the right has to
be convergent with a limit no more than 1. This is the small tweak we needed.
Observe that ‘equally likely outcomes’ makes no sense in this setting: If each
outcome had equal probability greater than 0, the sum over all probabilities would
not converge. If each outcome had probability 0, the series would converge to 0,
not 1.
We can define a random variable X by X(e) = |e|; in other words, X(e) is the number of tosses made until two consecutive tails occur. The probability mass function is given by
$$p_X(j) = P[X = j] = \frac{F_{j-2}}{2^j}$$
for $j \ge 2$. While the plot of the PMF is infinite in extent, we can display a finite portion of it; this is shown in Figure 1.

Figure 1: PMF of Time to Two Tails
2 Continuous Sample Spaces

2.1 Definition
Figure 2 shows a game spinner. The points on the circumference of the circle are labeled by the set of real numbers
$$\Omega = \{x : 0 \le x < 1\},$$
Figure 2: A spinner
so this is the sample space for our experiment. We also denote this set as [0, 1), a
half-open interval.
The whole point of the spinner is that the outcomes are somehow ‘equally
likely'. (This is the experiment simulated by a call to the random number generator rand().) We have no problem saying that the event depicted in Figure 3 has
probability 0.25, since it occupies exactly one-quarter of the circumference of the
circle. But what sort of probability should we assign to an individual outcome?
If we say that each individual point has some fixed probability p > 0, then
the sum of these probabilities would exceed 1, so that can’t be right. But if we
say that the probability of an individual point is 0, then that makes it sound that
for each individual point Q on the circumference, it is impossible that the spinner
points to Q, and so it is impossible that the spinner points anywhere. So that can’t
be right either....
In fact, we DO say that the probability of each individual point is 0, and we
just have to be careful about how we apply the intuition that probability 0 means
‘impossible’.
With this in mind, we assign to each event E ⊆ Ω the total length of E. This
makes sense for individual points, intervals, unions of disjoint intervals, etc. This
is the uniform probability distribution on Ω.
Figure 3: The spinner, showing the event {x : 0.5 ≤ x ≤ 0.75}

(Mathematical subtlety: it is impossible to assign a length to every subset of the interval [0, 1) and preserve the properties that you think length functions ought to have. This makes getting the formal definition quite right pretty tricky. For our purposes... don't worry about it.)
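Since the text mentions rand(), it is worth noting how short the corresponding experiment is in code. The following is a small Python sketch of my own (not from the notes) that spins the uniform spinner many times and estimates the probability of the event in Figure 3.

```python
import random

N = 100_000
# Each call to random.random() is one spin: a uniform outcome in [0, 1).
hits = sum(1 for _ in range(N) if 0.5 <= random.random() <= 0.75)
print(hits / N)  # close to 0.25, the length of the interval [0.5, 0.75]
```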
A continuous sample space can be 2-dimensional, or 3- or higher-dimensional.
In the uniform distribution model, the probability function on a 2-dimensional
sample space Ω assigns to each event E ⊆ Ω the probability
$$\frac{\mathrm{area}(E)}{\mathrm{area}(\Omega)}.$$
For instance, in our dartboard example, the board, with radius 1, has area π and the
bulls-eye, which has radius 1/12, has area π/144 square feet. So in the uniform
model, the bulls-eye would have probability 1/144. There is no strong reason to
believe that this is a good model of what happens in reality—that would depend
on the skill of the players whose behavior we're modeling.
2.2 Buffon's Needle
Here is an amusing—and at first glance, astonishing—application of these ideas.
Mark off a large area on the floor with parallel lines one inch apart. Toss a one-inch-long needle in the air and let it land on the floor. Perform this experiment repeatedly, and count the proportion of trials in which the needle crosses one of
the lines. This proportion will be approximately 2/π. (The problem, in slightly
different form, was described in the 18th century by G. L. Leclerc, Comte de
Buffon, who was more famous as a naturalist than as a mathematician.)
Figure 4: Buffon's Needle, 600 trials

Figure 4 illustrates a simulation of the experiment with 600 throws. In this particular run, the needle crossed a line in 65.33% of trials. The correct value of 2/π is about 0.6366.
In another run, with 60000 trials, the estimate of 2/π was 0.6389.
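A simulation along these lines takes only a few lines of code. Here is a rough Python sketch of one way to do it (my reconstruction, not the program behind Figure 4), using the parameterization by the offset z and the angle θ that is worked out below.

```python
import math
import random

def needle_crosses():
    """One toss: z is the left end's horizontal distance from the line to its
    left; theta is the angle the needle makes with the horizontal."""
    z = random.random()                                # uniform on [0, 1)
    theta = random.uniform(-math.pi / 2, math.pi / 2)  # uniform angle
    return z + math.cos(theta) > 1                     # right end reaches the next line

N = 60_000
estimate = sum(needle_crosses() for _ in range(N)) / N
print(estimate, 2 / math.pi)  # the two values should agree to a couple of decimal places
```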
Let's see how this remarkable result is obtained. An outcome is where the needle lands, and this is determined by two parameters: the coordinates of the left extremity of the needle, and the angle $-\frac{\pi}{2} \le \theta \le \frac{\pi}{2}$ that the needle makes with the horizontal. Now we are only interested in whether the needle crosses one of the vertical lines, so we don't care about the y-coordinate at all, and we only care about the fractional part of the x-coordinate, that is, the horizontal distance $0 \le z \le 1$ from the vertical line immediately to the left.
Figure 5: Sample space for Buffon Needle Problem, with the event ‘the needle
crosses a line’ shaded
So we can view the sample space as the rectangle $\{(\theta, z) : -\frac{\pi}{2} \le \theta \le \frac{\pi}{2},\ 0 \le z \le 1\}$. Here it is reasonable to assume a uniform distribution: we don't think
that any horizontal displacement z or angle θ is more likely than any other one.
The right-hand extremity of the needle has x-coordinate z + cos θ, and thus
the needle crosses the line if and only if z > 1 − cos θ. In the plot of Figure 5, this
event is the shaded area.
The probability of crossing the line is thus the area of the shaded region divided by the area of the rectangle. If we turn the picture upside-down, the shaded
region is just the area under the graph of $\cos\theta$ for $-\frac{\pi}{2} \le \theta \le \frac{\pi}{2}$, which is
$$\int_{-\pi/2}^{\pi/2} \cos\theta \, d\theta = \sin\frac{\pi}{2} - \sin\left(-\frac{\pi}{2}\right) = 1 - (-1) = 2,$$
and the area of the rectangle is π, so the probability that the needle crosses a line is 2/π, as claimed.
2.3 Monte Carlo Integration
The foregoing example (minus the story about the needle) shows a method for
approximating an integral: To find the area of a region X (in this case the shaded
region of Figure 5), we enclose it in a region Y whose area we know (in our
example, the rectangle) and select points uniformly and at random in Y. Let N be
the number of points selected and $N_{\mathrm{hits}}$ the number of these points that are in X. Then
$$\frac{N_{\mathrm{hits}}}{N} \approx \frac{\mathrm{area}(X)}{\mathrm{area}(Y)},$$
so we obtain an approximation of the area of X. Of course in the above example,
we knew the integral exactly, but for peculiarly-shaped regions of integration, this
method might be a practical alternative. Approximating integrals in this way is
called Monte Carlo integration.
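As a concrete illustration of the method, here is a short Python sketch (my own example, with made-up names such as mc_area) applied to the shaded region of Figure 5: the bounding region Y is the rectangle $[-\pi/2, \pi/2] \times [0, 1]$, whose area is π, and a sampled point lies in X exactly when $z > 1 - \cos\theta$.

```python
import math
import random

def mc_area(in_region, theta_lo, theta_hi, z_lo, z_hi, n=100_000):
    """Estimate the area of a region X inside a bounding rectangle Y by the
    hit ratio: area(X) is approximately (N_hits / N) * area(Y)."""
    area_Y = (theta_hi - theta_lo) * (z_hi - z_lo)
    hits = 0
    for _ in range(n):
        theta = random.uniform(theta_lo, theta_hi)
        z = random.uniform(z_lo, z_hi)
        if in_region(theta, z):
            hits += 1
    return hits / n * area_Y

# The shaded region of Figure 5: the needle crosses a line when z > 1 - cos(theta).
shaded = mc_area(lambda t, z: z > 1 - math.cos(t),
                 -math.pi / 2, math.pi / 2, 0.0, 1.0)
print(shaded)  # should be close to the exact value 2
```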
2.4 Continuous random variables—cumulative distribution function and density function
For continuous sample spaces, we still define a random variable as a function X from the sample space to the set of real numbers. Generally speaking, X will be a
continuous function. Here are several examples:
• Let Ω = [0, 1) with the uniform probability distribution. This is the spinner
example described above. Set X(x) = x, so that the value of the random
variable is simply the outcome of the spin.
• Consider now a pair of spinners. When we spin each one, we get a pair
x, y of numbers in [0, 1). We can model the set of outcomes as the 1 × 1
square. It makes sense here to use the uniform probability distribution on
this square, as the outcome on one spinner presumably has no influence on
the outcome of the other spinner. Let Y (x, y) = x+y, the sum of the values
on the two spinners. Y is a random variable, a kind of continuous analogue
of the roll of two dice.
• Let's go back to an earlier example of an infinite discrete sample space, that of flipping a fair coin until tails comes up. We might use this to model a different situation: Suppose we observe people entering a store. We divide time into one-minute intervals and find that on average, one customer enters the store in about half of the one-minute periods. We can think of the experiment as starting a clock and noting the time at which the first customer arrives. Using the coin model for the store arrivals is not a terrible thing to do, but it does sweep a lot under the rug. Clearly, this is a continuous process, and we might instead view the sample space as the set of positive real numbers, where again the outcome of the experiment is the arrival time of the first customer. Let's define Z(t) = t; that is, the value of the random variable Z is just the outcome of the experiment.
It is not very helpful to talk about the probability mass function for any of these
random variables: The probability of the random variable having a particular value
x will generally be 0. However, it does make sense to talk about the cumulative
probability distribution
$$F_X(a) = P[X < a].$$
Let’s calculate this for each of the three random variables in the examples
above.
For the random variable X giving the value of a single spinner, we have
$$F_X(a) = P[X < a] = a$$
for all a between 0 and 1, while $F_X(a) = 0$ for $a < 0$, and $F_X(a) = 1$ for $a > 1$. The cdf $F_X$ is plotted in Figure 6.
For the sum Y of two spinner outcomes, the cumulative probability $F_Y(a)$ is $a^2/2$ for $0 < a < 1$ and $1 - (2-a)^2/2$ for $1 < a < 2$: for $a < 1$, the event Y < a is the triangle $\{(x, y) : x + y < a\}$ in the unit square, which has area $a^2/2$, and the case $a > 1$ follows from the symmetric argument applied to Y > a. This is illustrated in Figure 7. The cdf $F_Y$ itself is plotted in Figure 8.
For the waiting time Z, we know that $F_Z(1) = 1/2$, $F_Z(2) = 3/4$, and in general, for positive integers k, $F_Z(k) = 1 - 2^{-k}$. In fact, we can argue that for all positive real numbers x, $F_Z(x) = 1 - 2^{-x}$, giving the plot in Figure 9.
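To check these formulas empirically, here is a brief Python sketch (again mine, not from the notes) that simulates the spinner X and the two-spinner sum Y and compares the empirical fraction of samples below a few thresholds with the formulas above.

```python
import random

N = 200_000
xs = [random.random() for _ in range(N)]                    # single spinner X
ys = [random.random() + random.random() for _ in range(N)]  # sum of two spinners Y

def empirical_cdf(samples, a):
    """Fraction of samples strictly below a, estimating P[value < a]."""
    return sum(1 for s in samples if s < a) / len(samples)

for a in (0.25, 0.5, 0.8):
    print(a, empirical_cdf(xs, a), a)  # empirical F_X(a) vs the formula F_X(a) = a
for a in (0.5, 0.8, 1.2, 1.5):
    formula = a**2 / 2 if a < 1 else 1 - (2 - a)**2 / 2
    print(a, empirical_cdf(ys, a), formula)  # empirical F_Y(a) vs the two-piece formula
```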
For discrete random variables, the probability mass function gives the difference between the successive values of the cumulative distribution function. For
continuous random variables the appropriate analogue is the derivative of the cdf.
This is called the probability density function or pdf pX of X. Another way to
look at it is that the cumulative distribution function is the integral of the density function:
$$F_X(x) = \int_{-\infty}^{x} p_X(t)\, dt.$$
Figure 6: Cumulative distribution function for the spinner
Figure 7: Events Y < 0.8 and Y > 1.2 for the sum of two spinners
Figure 8: Cumulative distribution function for the sum of two spinners Y
Figure 9: Cumulative distribution function for the waiting-time random variable
Z.
Figure 10: Probability density function for the uniform random variable X: the
area under the graph of the density function is shaded
The probability that a random variable X has value between a and b is the area
under the graph of the density function between x = a and x = b, that is,
$$P[a \le X \le b] = \int_a^b p_X(t)\, dt.$$
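For example, differentiating the two-piece formula for $F_Y$ given earlier yields the density $p_Y(t) = t$ for $0 < t < 1$ and $p_Y(t) = 2 - t$ for $1 < t < 2$, the triangular shape of Figure 11. The following Python sketch (my own illustration, not from the notes) checks the displayed relation by comparing a simulated estimate of $P[0.8 \le Y \le 1.2]$ with a numerical integral of that density.

```python
import random

def p_Y(t):
    """Density of Y obtained by differentiating F_Y from the text."""
    if 0 < t < 1:
        return t
    if 1 <= t < 2:
        return 2 - t
    return 0.0

# Numerical integral of p_Y over [0.8, 1.2] (midpoint Riemann sum).
a, b, steps = 0.8, 1.2, 10_000
dt = (b - a) / steps
integral = sum(p_Y(a + (i + 0.5) * dt) for i in range(steps)) * dt

# Simulated estimate of P[0.8 <= Y <= 1.2].
N = 200_000
hits = sum(1 for _ in range(N)
           if 0.8 <= random.random() + random.random() <= 1.2)
print(integral, hits / N)  # both should be close to 0.36
```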
The density functions for our three example random variables are shown in Figures 10-12. Observe that the density functions for the single spinner, the sum of two spinners, and the waiting time have the same basic shapes as the probability mass functions for a single die, the sum of two dice, and the number of times until a head is tossed, which are discrete versions of these continuous random variables.
Figure 11: Probability density function for the sum of two spinners Y : the area
under the graph of the density function is shaded
Figure 12: Probability density function for the waiting time random variable. The bounding curve is $y = (\ln 2) \cdot 2^{-x}$. The area under this curve is shaded