Chapter 3 – Special Discrete Random Variables

Section 3.4
Binomial random variable
An experiment that has only two possible outcomes is called a Bernoulli trial, for
example, a single coin toss. For the sake of argument, we will call one of the possible
outcomes “success”, and the other one “failure”. The probability of a success is p, and the
probability of failure is 1 − p. We are interested in studying a sequence of identical and
independent Bernoulli trials, and looking at the total number of successes that occur.
Definition. A binomial random variable is the number of successes in n independent
and identical Bernoulli trials.
Examples.
A fair coin is tossed 100 times and Y , the number of heads, is recorded. Then Y is
a binomial random variable with n = 100 and p = 1/2.
Two evenly matched teams play a series of 6 games. The number of wins Y is a
binomial random variable with n = 6 and p = 1/2.
An inspector looks at five computers where the chance that each computer is defective
is 1/6. The number Y of defective computers that he sees is a binomial random variable
with n = 5 and p = 1/6.
If Y is a binomial random variable, then the possible outcomes for Y are obviously
0, 1, . . . , n. In other words, the number of observed successes could be any number between
0 and n. The sample space consists of all strings of length n that consist of S ’s and F ’s;
for example,
$$\overbrace{SSFSFSSSF\cdots SF}^{n\ \text{trials}}.$$
Now let us choose a value of 0 ≤ y ≤ n, and look at a couple of typical sample points
belonging to the event (Y = y),
$$\overbrace{SSS\cdots S}^{y}\,\overbrace{FFF\cdots F}^{n-y},\qquad
\overbrace{SSS\cdots S}^{y-1}\,\overbrace{FFF\cdots F}^{n-y}\,S,\qquad
\overbrace{SS\cdots S}^{y-2}\,\overbrace{FF\cdots F}^{n-y}\,SS.$$
Every sample point in the event (Y = y) is an arrangement of y S's and n − y F's, and
therefore has probability $p^y(1-p)^{n-y}$.
How many such sample points are there? The number of sample points in (Y = y)
is the number of distinct arrangements of y S's and n − y F's, that is, $\binom{n}{y}$. Putting it
together gives the formula for binomial probabilities.
Binomial probabilities.
If Y is a binomial random variable with parameters n and p, then
$$P(Y = y) = \binom{n}{y}\,p^{y}(1-p)^{n-y}, \qquad y = 0, 1, \ldots, n.$$
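This formula is easy to check numerically. Here is a minimal Python sketch (the function name binom_pmf is ours, not from the text), verified against the computer-inspection example above:

from math import comb

def binom_pmf(y, n, p):
    """P(Y = y) for a binomial random variable with parameters n and p."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

# Inspector example above: n = 5 computers, each defective with chance p = 1/6.
# The chance of seeing no defective computers is (5/6)^5.
print(binom_pmf(0, 5, 1/6))   # 0.4018...
print((5/6)**5)               # same value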
Example.
Best-of-seven series
In section 1.6 we figured out that the probability of a best-of-seven series between two
evenly matched teams going the full seven games was 20/64. This can also be calculated
using binomial probabilities. If you play six games against an equally skilled opponent,
and Y is the number of wins, then Y has a binomial distribution with n = 6 and p = 1/2.
The series goes seven games if Y = 3, and the chance of that happening is
$P(Y = 3) = \binom{6}{3}(1/2)^3(1/2)^3 = 20/64 = .3125$. So a best-of-seven series ought to be seven games long about
30% of the time. But, in fact, if you look at the Stanley Cup final series for the last fifty
years (1946-1995), there were seven-game series only 8 times (1950, 1954, 1955, 1964, 1965,
1971, 1987, 1994). This seems to show that a lot of these match-ups were not even, which
tends to make the series end sooner.
If you are twice as good as your opponent, what is the chance of a full seven games?
This time p = 2/3, and so $P(Y = 3) = \binom{6}{3}(2/3)^3(1/3)^3 = .2195$. This agrees more closely
with the actual results, although it's still a bit high.
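Both series calculations can be reproduced with the same hypothetical binom_pmf helper as above (a sketch, not part of the original text):

from math import comb

def binom_pmf(y, n, p):
    return comb(n, y) * p**y * (1 - p)**(n - y)

print(round(binom_pmf(3, 6, 1/2), 4))   # 0.3125, evenly matched teams
print(round(binom_pmf(3, 6, 2/3), 4))   # 0.2195, one team twice as good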
Example.
An even split
If I toss a fair coin ten times, what is the chance that I get exactly 5 heads and 5
tails? The answer is $P(Y = 5) = \binom{10}{5}(1/2)^5(1/2)^5 = .2461$. If I toss a fair coin 100
times, what is the chance of exactly fifty heads? This time the answer is
$P(Y = 50) = \binom{100}{50}(1/2)^{50}(1/2)^{50} = .0796$. You may be a bit surprised that this is such an uncommon
event. If you flip a coin 100 times the odds are pretty good that you will get about an equal
number of heads and tails, but getting exactly one half heads and one half tails gets harder
and harder as the sample size increases. Just for fun, here is an approximate formula for
the chance of getting exactly n heads in 2n coin tosses: $P(\text{an even split}) \approx 1/\sqrt{\pi n}$.
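The following sketch (our own code) compares the exact even-split probability with this approximation for the two cases above:

from math import comb, pi, sqrt

def even_split(n):
    """Exact chance of exactly n heads in 2n fair coin tosses."""
    return comb(2 * n, n) * 0.5 ** (2 * n)

for n in (5, 50):   # 10 tosses and 100 tosses
    print(n, round(even_split(n), 4), round(1 / sqrt(pi * n), 4))
# n = 5 : exact 0.2461, approximation 0.2523
# n = 50: exact 0.0796, approximation 0.0798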
Example.
Testing for ESP
In order to test for ESP you draw a card from an ordinary deck and ask the subject
what color it is. You repeat this 20 times and the subject is correct 15 times. How likely
is it that this is due to chance?
If the subject is guessing, then Y , the number of correct readings, follows a binomial
distribution with n = 20 and p = 1/2. We want to know the probability that someone
can do this well (or better) by guessing. Thus
$$\begin{aligned}
P(Y \ge 15) &= P(Y = 15) + P(Y = 16) + \cdots + P(Y = 20)\\
&= \binom{20}{15}(1/2)^{15}(1/2)^{5} + \binom{20}{16}(1/2)^{16}(1/2)^{4} + \cdots + \binom{20}{20}(1/2)^{20}(1/2)^{0}\\
&= 21700\,(1/2)^{20}\\
&= 0.0207.
\end{aligned}$$
This is a pretty unlikely event but certainly not impossible. What conclusion can we draw?
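As a quick check, the tail probability and the count of favourable guessing patterns can be computed directly (sketch; the helper name is ours):

from math import comb

def binom_pmf(y, n, p):
    return comb(n, y) * p**y * (1 - p)**(n - y)

tail = sum(binom_pmf(y, 20, 0.5) for y in range(15, 21))
print(round(tail, 4))                           # 0.0207
print(sum(comb(20, y) for y in range(15, 21)))  # 21700 favourable outcomes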
Example.
Quality control
In mass production manufacturing there is a certain percentage of acceptable loss
due to defective units. To check the level of defectives, you take a sample from the day’s
production. If the number of defectives is small you continue, but if there are too many
defectives you shut down the production line for repairs.
Suppose that 5% defectives is considered acceptable, but 10% defectives is unacceptable. Our strategy is to take a sample of n = 40 units and shut down production if we find
4 or more defectives. Our inspection strategy has two conflicting goals: it is supposed to
shut down when p ≥ .10, but continue if p ≤ .05. There are two possible wrong decisions:
to continue when p ≥ .10, and to shut down even though p ≤ .05.
How often will we unnecessarily shut down? Suppose that there are acceptably many
defectives, and to take the worst case, say there are 5% defectives, so that p = .05. Let
Y be the number of observed defective units in the sample. The probability of shutting
down production is
$$\begin{aligned}
P(\text{shut down}) &= P(Y \ge 4)\\
&= 1 - P(Y \le 3)\\
&= 1 - P(Y = 0) - P(Y = 1) - P(Y = 2) - P(Y = 3)\\
&= 1 - \binom{40}{0}(.05)^{0}(.95)^{40} - \binom{40}{1}(.05)^{1}(.95)^{39} - \binom{40}{2}(.05)^{2}(.95)^{38} - \binom{40}{3}(.05)^{3}(.95)^{37}\\
&= 1 - .1285 - .2705 - .2777 - .1851\\
&= .1382
\end{aligned}$$
On the other hand, how often will we fail to spot an unacceptably high level of
defectives? Let us now suppose that there are unacceptably many defectives, and again to
take the worst case, let’s say there are 10% defectives, so that p = .10. The chance that
the day’s production passes inspection anyway is
$$\begin{aligned}
P(\text{passes inspection}) &= P(Y \le 3)\\
&= P(Y = 0) + P(Y = 1) + P(Y = 2) + P(Y = 3)\\
&= \binom{40}{0}(.10)^{0}(.90)^{40} + \binom{40}{1}(.10)^{1}(.90)^{39} + \binom{40}{2}(.10)^{2}(.90)^{38} + \binom{40}{3}(.10)^{3}(.90)^{37}\\
&= .0148 + .0657 + .1423 + .2003\\
&= .4231
\end{aligned}$$
We see that this scheme is fairly likely to make errors. If we wanted to be more certain
about our decision, we would need to take a larger sample size.
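Both error probabilities are easy to check with a short sketch (our own code, using the same hypothetical binom_pmf helper):

from math import comb

def binom_pmf(y, n, p):
    return comb(n, y) * p**y * (1 - p)**(n - y)

n = 40
# Unnecessary shutdown: p = .05, yet 4 or more defectives appear in the sample.
p_shutdown = 1 - sum(binom_pmf(y, n, 0.05) for y in range(4))
# Missed detection: p = .10, yet the sample passes with 3 or fewer defectives.
p_passes = sum(binom_pmf(y, n, 0.10) for y in range(4))
print(round(p_shutdown, 3), round(p_passes, 3))   # roughly 0.138 and 0.423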
Example.
Multiple choice exams
If a multiple choice exam has 30 questions, each with 5 responses, what is the probability
of passing the exam by guessing? If you guess on every question, then Y , the number
of correct answers, will be a binomial random variable with n = 30 and p = 1/5. To pass
you need 15 or more correct answers, so P (pass the exam) = P (Y ≥ 15) = 0.000231.
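A two-line check of this tail probability (our own sketch):

from math import comb

tail = sum(comb(30, y) * 0.2**y * 0.8**(30 - y) for y in range(15, 31))
print(round(tail, 6))   # about 0.000231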
Binomial moments.
If Y is a binomial random variable with parameters n and p, then
$$E(Y) = np \qquad\text{and}\qquad VAR(Y) = np(1-p).$$
Example.
The accuracy of empirical probabilities
If we simulate n random events, where the chance of a success is p, then the number of
observed successes Y has a binomial distribution with parameters n and p. The empirical
probability is $\hat{p} = Y/n$. Now the binomial moments given above show that $E(\hat{p}) = (np)/n = p$,
and $VAR(\hat{p}) = (np(1-p))/n^2 = p(1-p)/n$. By computing the two standard
deviation interval, we get some idea about how close $\hat{p}$ is to p. Since the quantity p(1 − p)
is maximized when p = 1/2, we find that regardless of the value of p,
$$2\,STD(\hat{p}) = 2\sqrt{\frac{p(1-p)}{n}} \le \frac{1}{\sqrt{n}}.$$
In most of our examples, the empirical probabilities have been based on n = 1000 repetitions. Thus, our empirical probabilities are typically within ±.03 of the true probabilities.
For example, suppose we simulate 1000 throws of five dice, and find that on 71 occasions we get a sum of 14. Then we are fairly certain that the true probability of getting
14 lies somewhere between .041 and .101.
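A small sketch of this calculation for the dice example (our own code):

from math import sqrt

n = 1000
bound = 1 / sqrt(n)     # 2*STD(p_hat) is at most 1/sqrt(1000), about 0.0316
p_hat = 71 / n          # 71 sums of 14 in 1000 simulated throws of five dice
print(round(bound, 4))                                  # 0.0316
print(round(p_hat - 0.03, 3), round(p_hat + 0.03, 3))   # 0.041 0.101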
Section 3.5
Geometric and negative binomial random variables
Like the binomial, the geometric and negative binomial random variables are based
on a sequence of independent and identical Bernoulli trials. Instead of fixing the number
of trials n and counting up how many successes there are, we fix the number of successes
k and count up how many trials it takes to get them. The geometric random variable is
the number of trials until the first success. Given an integer k ≥ 1, the negative binomial
random variable is the number of trials until the k th success. You see that a geometric
random variable is a negative binomial random variable where k = 1. On the other hand,
note that a negative binomial random variable Y is the sum of k independent geometric
random variables. That is, Y = X1 + X2 + · · · + Xk , where X1 is the number of trials until
the first success, X2 is the number of trials after the first success until the second success,
etc. All of these X ’s have geometric distributions with parameter p. If Y is negative
binomial, then a typical sample point belonging to (Y = y) looks like F F S · · · F S S,
where the first y − 1 symbols in the string contain exactly k − 1 successes and y − k
failures, and then the y-th symbol is an S. Since there are $\binom{y-1}{k-1}$ such strings, and they all
have probability $p^{k}(1-p)^{y-k}$, we get the following formula.
Negative binomial probabilities.
If Y is a negative binomial random variable with parameters k and p, then
$$P(Y = y) = \binom{y-1}{k-1}\,p^{k}(1-p)^{y-k}, \qquad y = k, k+1, \ldots.$$
It follows that the geometric distribution is given by $p(y) = p(1-p)^{y-1}$, y = 1, 2, . . . .
Example. The chance of a packet arrival to a distribution hub is 1/10 during each
time interval. Let Y be the arrival time of the first packet; it has a geometric distribution
with p = .10. The probability that the first packet arrives during the third time interval
is $P(Y = 3) = (1/10)^{1}(9/10)^{2} = .081$. The probability that the first packet arrives on or
after the third time interval is
$$P(Y \ge 3) = 1 - P(Y = 1) - P(Y = 2) = 1 - .10 - (.90)(.10) = .81.$$
If X is the arrival time of the tenth packet, the chance that it arrives on the 99th time
interval is $P(X = 99) = \binom{98}{9}(1/10)^{10}(9/10)^{89} = 0.01332$.
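A sketch of these calculations (the helper names geom_pmf and negbin_pmf are ours):

from math import comb

def geom_pmf(y, p):
    """P(first success occurs on trial y)."""
    return p * (1 - p) ** (y - 1)

def negbin_pmf(y, k, p):
    """P(k-th success occurs on trial y)."""
    return comb(y - 1, k - 1) * p**k * (1 - p) ** (y - k)

p = 0.10
print(round(geom_pmf(3, p), 3))                       # 0.081
print(round(1 - geom_pmf(1, p) - geom_pmf(2, p), 2))  # 0.81
print(round(negbin_pmf(99, 10, p), 5))                # 0.01332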
Example.
The 500 goal club
With only 30 games remaining in the NHL season, veteran winger Flash LaRue is
starting to get worried. With a career total of 488 goals, it is not at all certain that he
will be able to score his 500th career goal before the end of the season. He will get a big
bonus from his team if he manages this feat, but unfortunately Flash only scores at a rate
of about once every three games. Is there any hope that he will get his 500th goal before
the end of the season?
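As a rough illustration (not a calculation from the text), we can treat each remaining game as a Bernoulli trial with p = 1/3 and ask for the chance that the 12th remaining goal arrives within the 30 games, using the negative binomial formula above with k = 12:

from math import comb

def negbin_pmf(y, k, p):
    return comb(y - 1, k - 1) * p**k * (1 - p) ** (y - k)

k, p = 12, 1/3          # 12 goals still needed, about one goal every three games
p_bonus = sum(negbin_pmf(y, k, p) for y in range(k, 31))
print(round(p_bonus, 3))   # chance the 500th goal comes within the 30 games
# Under these assumptions the mean number of games needed is k/p = 36 > 30
# (see the moments below), so the odds are against Flash.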
Let’s try to calculate the moments of a negative binomial random variable.
Write the terms of $E(Y) = \sum_{y\ge 1} y\,p(1-p)^{y-1}$ for a geometric random variable in a
triangular array, so that the y-th column contains y copies of $p(1-p)^{y-1}$. Each row is a
geometric series whose sum appears on the right, and the right-hand column itself sums to 1/p:
$$\begin{array}{ccccccccl}
p &+& p(1-p) &+& p(1-p)^2 &+& p(1-p)^3 &+ \cdots &=\ 1\\
  & & p(1-p) &+& p(1-p)^2 &+& p(1-p)^3 &+ \cdots &=\ (1-p)\\
  & &        & & p(1-p)^2 &+& p(1-p)^3 &+ \cdots &=\ (1-p)^2\\
  & &        & &          & & p(1-p)^3 &+ \cdots &=\ (1-p)^3\\
  & &        & &          & &          &         &\ \ \ \vdots\\ \hline
p &+& 2p(1-p) &+& 3p(1-p)^2 &+& 4p(1-p)^3 &+ \cdots &=\ 1/p
\end{array}$$
This sum ought to convince you that the mean of a geometric random variable is 1/p,
and the result for negative binomial follows from the equation Y = X1 + X2 + · · · + Xk .
Confirming the variance formula is left as an exercise.
Negative binomial moments.
If Y is a negative binomial random variable with parameters k and p, then
$$E(Y) = \frac{k}{p} \qquad\text{and}\qquad VAR(Y) = \frac{k(1-p)}{p^{2}}.$$
We note that, as you would expect, the rarer an event is, the longer you will have to
wait for it. Taking the geometric case (k = 1), we see that we will wait on average µ = 2
trials to see the first “heads” in a coin tossing experiment, we will wait on average µ = 36
trials to see the first pair of sixes in tossing a pair of dice, and we will buy on average
µ = 13,983,816 tickets before we win Lotto 6-49.
We also note that σ decreases from infinity to zero as p ranges from 0 to 1. This says
that predicting the first occurrence of an event is difficult for rare events, and easy for
common events.
Section 3.7
Hypergeometric random variable
A hypergeometric random variable is the number of successes that arise when sampling
without replacement. We suppose that there is a population of size N , of which r are
“successes” and the rest “failures”, and that a sample of size n is drawn.
The probability formula below is simply the ratio of the number of samples containing
y successes and n − y failures, to the total number of possible samples of size n. The weird
looking conditions on y just ensure that you don’t try to find the probability of some
impossible event.
Hypergeometric probabilities.
If Y is a hypergeometric random variable with parameters n, r , and N , then
$$P(Y = y) = \frac{\binom{r}{y}\binom{N-r}{n-y}}{\binom{N}{n}}, \qquad y = \max(0,\, n-(N-r)), \ldots, \min(n, r).$$
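This translates directly into a short Python sketch (the function name is ours); the same idea is used in the examples that follow:

from math import comb

def hypergeom_pmf(y, n, r, N):
    """P(Y = y): y successes in a sample of size n drawn without replacement
    from a population of size N containing r successes."""
    return comb(r, y) * comb(N - r, n - y) / comb(N, n)

# Poker-chip example below: 7 green and 5 blue chips, draw 8 without replacement.
print(round(hypergeom_pmf(5, 8, 7, 12), 4))   # P(5 green chips) = 0.4242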
Example.
A box contains 12 poker chips of which 7 are green and 5 are blue.
Eight chips are selected at random without replacement from this box. Let X denote the
number of green chips selected. The probability mass function is
$$p(x) = \frac{\binom{7}{x}\binom{5}{8-x}}{\binom{12}{8}}, \qquad x = 3, 4, \ldots, 7.$$
Note that the range of possible x values is restricted by the make-up of the population.
Example.
Lotto 6-49
In Lotto 6-49 you buy a ticket with six numbers chosen from the set {1, 2, . . . , 49}.
The draw consists of a random sample drawn without replacement from the same set,
and your prize depends on how many “successes” were drawn. Here a “success” is any
number that was on your ticket. So Y , the number of matches, follows a hypergeometric
distribution with r = 6, n = 6, and N = 49. The probabilities for the different number of
matches are obtained using the formula
$$P(Y = y) = \frac{\binom{6}{y}\binom{43}{6-y}}{\binom{49}{6}}, \qquad y = 0, \ldots, 6.$$
To four decimal places, we have
y       0      1      2      3      4      5      6
p(y)  .4360  .4130  .1324  .0176  .0010  .0000  .0000
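The table can be reproduced with a short loop (our own sketch):

from math import comb

def lotto_pmf(y):
    """P(y matches): hypergeometric with r = 6, n = 6, N = 49."""
    return comb(6, y) * comb(43, 6 - y) / comb(49, 6)

for y in range(7):
    print(y, round(lotto_pmf(y), 4))   # compare with the table above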
Hypergeometric moments.
If Y is a hypergeometric random variable with parameters n, r , and N , then
$$E(Y) = n\,\frac{r}{N} \qquad\text{and}\qquad VAR(Y) = n\,\frac{r}{N}\cdot\frac{N-r}{N}\cdot\frac{N-n}{N-1}.$$
For example, the average number of green chips drawn in the first problem is µ =
(8)(7)/12 = 4.66666. Also, the average number of matches on your Lotto 6-49 ticket is
µ = (6)(6)/49 = .73469.
Example.
Capture-tag-recapture
A scientific expedition has captured, tagged, and released eight sea turtles in a particular region. The expedition assumes that the population size in this region is 35, which
means that 8 are tagged and 27 not tagged. The expedition will now capture 10 turtles
and note how many of them are tagged. If the assumption about the population size is
correct, what is the probability that the new sample will have 3 or fewer tagged turtles in
it?
$$\begin{aligned}
P(Y \le 3) &= P(Y = 0) + P(Y = 1) + P(Y = 2) + P(Y = 3)\\
&= \frac{\binom{8}{0}\binom{27}{10}}{\binom{35}{10}} + \frac{\binom{8}{1}\binom{27}{9}}{\binom{35}{10}}
 + \frac{\binom{8}{2}\binom{27}{8}}{\binom{35}{10}} + \frac{\binom{8}{3}\binom{27}{7}}{\binom{35}{10}}\\
&= .04595 + .20424 + .33861 + .27089\\
&= .85969.
\end{aligned}$$
We would certainly expect to get three or fewer tagged turtles in the new sample. If the
expedition found five tagged turtles, is that evidence that they have over-estimated the
population size?
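The following sketch checks the calculation and also looks at how surprising five or more tagged turtles would be under the assumed population size (our own code):

from math import comb

def hypergeom_pmf(y, n, r, N):
    return comb(r, y) * comb(N - r, n - y) / comb(N, n)

# 8 tagged turtles in an assumed population of N = 35; recapture n = 10.
p_le3 = sum(hypergeom_pmf(y, 10, 8, 35) for y in range(4))
print(round(p_le3, 4))    # 0.8597, agreeing with the calculation above
p_ge5 = sum(hypergeom_pmf(y, 10, 8, 35) for y in range(5, 9))
print(round(p_ge5, 3))    # about 0.027, quite unlikely if N really is 35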
Example.
A political poll
The population of Alberta is around 2,545,000, and let’s suppose that about 70% of
these are eligible to vote in the next provincial election. Then the population of eligible
voters has N = 1781500 people in it. Suppose that n = 100 people are randomly selected
from the eligible voters (without replacement) and asked whether or not they support
Ralph Klein. Also suppose, for the sake of argument, that exactly 60%, or 1068900 eligible
voters do support Ralph Klein. How accurately will the poll reflect that?
Let Y stand for the number of Klein supporters included in the random sample. Then
Y has a hypergeometric distribution with n = 100, r = 1068900, and N = 1781500. The
mean and variance of Y are given by
$$\mu = 100\,\frac{1068900}{1781500} = 60
\qquad\text{and}\qquad
\sigma^{2} = 100\,\frac{1068900}{1781500}\cdot\frac{712600}{1781500}\cdot\frac{1781400}{1781499} = 23.998666.$$
A two standard deviation interval says that probably between 50 and 70 people in the poll
will be Klein supporters.
Note that if the sampling were done with replacement, then Y would follow a binomial
distribution with n = 100 and p = .6. In this case, we would have
$$\mu = 100(.6) = 60 \qquad\text{and}\qquad \sigma^{2} = 100(.6)(.4) = 24.$$
Since n is small relative to N , the ratio
$$\frac{N-n}{N-1} = \frac{1781400}{1781499} \approx 1,$$
and the mean and variance of the hypergeometric distribution coincide with the mean and
variance of the binomial distribution. The distributions of these two random variables are
also essentially the same whenever n is small relative to N .
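A sketch comparing the exact hypergeometric moments with their binomial counterparts (our own code):

from math import sqrt

N, r, n = 1781500, 1068900, 100
mu = n * r / N                                              # 60.0
var = n * (r / N) * ((N - r) / N) * ((N - n) / (N - 1))     # 23.9986...
print(mu, round(var, 6))
print(round(mu - 2 * sqrt(var), 1), round(mu + 2 * sqrt(var), 1))  # about 50 to 70
# With replacement (binomial): mean = 100 * .6 = 60 and variance = 100 * .6 * .4 = 24.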
Section 3.8
Poisson random variable
This probability distribution is named after the French mathematician Poisson, according to whom. . .
Life is good for only two things, discovering mathematics and
teaching mathematics – Siméon Poisson
In Recherches sur la probabilité des jugements en matière criminelle et en matière
civile, an important work on probability published in 1837, the Poisson distribution first
appeared. The Poisson distribution describes the probability that a random event will
occur in a time or space interval under the conditions that the probability of the event
occurring is very small, but the number of trials is very large so that the event actually
occurs a few times.
To illustrate this idea, suppose you are interested in the number of arrivals to a queue
in a one day period. You could divide the time interval up into little subintervals, so that
for all practical purposes, only one arrival can occur per subinterval. Therefore, for each
subinterval of time, we have
P (no arrival) = 1 − p,
P (one arrival) = p,
P (more than one arrival) = 0.
The total number of arrivals X is the number of subintervals that contain an arrival. This
has a binomial distribution, where n is the number of subintervals. The probability of
seeing x arrivals during the day is
$$P(X = x) = \binom{n}{x}\,p^{x}(1-p)^{n-x}.$$
Now let’s suppose that you keep on dividing the time interval into smaller and smaller
subintervals; increasing n but decreasing p so that the product µ = np remains constant.
What happens to P (X = x )?
$$\begin{aligned}
\binom{n}{x}p^{x}(1-p)^{n-x}
&= \binom{n}{x}\Big(\frac{\mu}{n}\Big)^{x}\Big(1-\frac{\mu}{n}\Big)^{n-x}\\
&= \frac{n(n-1)\cdots(n-x+1)}{x!}\Big(\frac{\mu}{n}\Big)^{x}\Big(1-\frac{\mu}{n}\Big)^{n}\Big(1-\frac{\mu}{n}\Big)^{-x}\\
&= \frac{\mu^{x}}{x!}\Big(1-\frac{\mu}{n}\Big)^{n}\Big(\frac{n}{n}\Big)\Big(\frac{n-1}{n}\Big)\cdots\Big(\frac{n-x+1}{n}\Big)\Big(1-\frac{\mu}{n}\Big)^{-x}.
\end{aligned}$$
Now you take the limit as n → ∞, and obtain
$$\Big(1-\frac{\mu}{n}\Big)^{n} \to e^{-\mu}
\qquad\text{and}\qquad
\Big(\frac{n}{n}\Big)\Big(\frac{n-1}{n}\Big)\cdots\Big(\frac{n-x+1}{n}\Big)\Big(1-\frac{\mu}{n}\Big)^{-x} \to 1.$$
This leads to the following formula.
Poisson probabilities.
If X is a Poisson random variable with parameter µ, then
$$P(X = x) = \frac{e^{-\mu}\mu^{x}}{x!}, \qquad x = 0, 1, \ldots.$$
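A sketch of this formula, together with a numerical illustration of the limiting argument above (the choice n = 10000 and µ = 2 is ours, purely for illustration):

from math import comb, exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson random variable with mean mu."""
    return exp(-mu) * mu**x / factorial(x)

# Binomial with large n and small p = mu/n is close to Poisson with mean mu.
n, mu = 10_000, 2.0
p = mu / n
print(round(comb(n, 3) * p**3 * (1 - p)**(n - 3), 4))  # binomial P(X = 3), about 0.1805
print(round(poisson_pmf(3, mu), 4))                    # Poisson P(X = 3), about 0.1804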
The derivation of the Poisson distribution explains why it is sometimes called the law
of rare events. Let’s look at an example involving the rarest event I can think of.
Example.
More Lotto 6-49
The odds of winning the jackpot in Lotto 6-49 are one in 13,983,816, or $p = 7.1511 \times 10^{-8}$.
Suppose you play twice a week, every week for 10,000 years. The total number
of plays is then n = 2 × 52 × 10000 = 1,040,000. Setting µ = np = .07437 and using
the Poisson formula, we see that the chance of hitting zero jackpots during this time is
$P(X = 0) = e^{-.07437}(.07437)^{0}/0! = .928327$. After all that time, we still have only
about a 7% chance of getting a Lotto 6-49 jackpot. The probability of getting exactly two
jackpots during this time is $P(X = 2) = e^{-.07437}(.07437)^{2}/2! = .002567$.
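A quick check of these numbers (sketch):

from math import exp

p = 7.1511e-8                   # chance of a jackpot on a single play
n = 2 * 52 * 10_000             # plays over 10,000 years
mu = n * p                      # about 0.07437
print(round(exp(-mu), 6))               # P(X = 0) = 0.928327
print(round(exp(-mu) * mu**2 / 2, 6))   # P(X = 2) = 0.002567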
Example.
Hashing
Hashing is a tool for organizing files, where a hashing function transforms a key into
an address, which is then the basis for searching for and storing records. Hashing has two
important features:
1. With hashing, the addresses generated appear to be random — there is no immediate
connection between the key and the location of the record.
2. With hashing, two different keys may be transformed into the same address, in which
case we say that a collision has occurred.
Given that it is nearly impossible to achieve a uniform distribution of records among
the available addresses in a file, it is important to be able to predict how records are likely
to be distributed. Suppose that there are N addresses available, and that the hashing
function assigns them in a completely random fashion. This means that for any fixed
address, the probability that it is selected is 1/N . If r keys are hashed, we can use the
Poisson approximation to the binomial to obtain the probability that exactly x records
are assigned to a given address. This is
$$p(x) = \frac{e^{-r/N}(r/N)^{x}}{x!}, \qquad x = 0, 1, \ldots.$$
For instance, if we are trying to fit r = 10000 records into N = 10000 addresses, the
proportion of addresses that will remain empty is $p(0) = 1^{0}e^{-1}/0! = .3679$. We would
expect a total of about 3679 empty addresses. Since $p(1) = 1^{1}e^{-1}/1! = .3679$, we would
also expect a total of about 3679 addresses with 1 record assigned, and about 10000 −
2(3679) = 2642 addresses with more than 1 record assigned. Because we have a packing
density r /N of 1, we must expect a large number of collisions. In order to reduce the
number of collisions we should increase the number N of available addresses.
For more about hashing, the reader is referred to Chapter 11 of the book File Structures: A conceptual toolkit by Michael J. Folk and Bill Zoellick.
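A sketch of the record-distribution calculation for r = 10000 and N = 10000 (our own code):

from math import exp, factorial

def records_pmf(x, r, N):
    """Poisson approximation: P(exactly x records hash to a fixed address)."""
    lam = r / N
    return exp(-lam) * lam**x / factorial(x)

r = N = 10_000
p0 = records_pmf(0, r, N)     # 0.3679, proportion of empty addresses
p1 = records_pmf(1, r, N)     # 0.3679, addresses holding exactly one record
print(round(N * p0), round(N * p1), round(N * (1 - p0 - p1)))
# about 3679 empty, 3679 with one record, 2642 with collisions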
Poisson moments.
If X is a Poisson random variable, then
$$E(X) = \mu \qquad\text{and}\qquad VAR(X) = \mu.$$
Example.
Particle emissions
In 1910, Hans Geiger and Ernest Rutherford conducted a famous experiment in which
they counted the number of α-particle emissions during 2608 time intervals of equal length.
Their data is as follows.
x           0    1    2    3    4    5    6    7   8   9  10  >10
intervals  57  203  383  525  532  408  273  139  45  27  10    6
A total of 10097 particles were observed, giving a rate of µ = 10097/2608 = 3.8715
particles per time period. If these particles were following a Poisson distribution, then the
number of intervals with no particles should be about
$$2608 \times \frac{e^{-3.8715}(3.8715)^{0}}{0!} = 54.31,$$
the number of intervals with exactly one particle should be about
$$2608 \times \frac{e^{-3.8715}(3.8715)^{1}}{1!} = 210.27,$$
and so on. In fact, the frequencies that we would expect to observe are
x          0       1       2       3       4       5       6       7      8      9     10    >10
expected  54.31  210.27  407.06  525.31  508.44  393.69  254.03  140.50  67.99  29.25  11.32  5.83
By comparing these two tables, you can see that the Poisson distribution seems to
describe this phenomenon quite well.
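The expected frequencies can be recomputed in a few lines (our own sketch):

from math import exp, factorial

counts = [57, 203, 383, 525, 532, 408, 273, 139, 45, 27, 10, 6]  # observed, x = 0..10, >10
intervals = sum(counts)              # 2608
mu = 10097 / intervals               # about 3.8715 particles per interval

expected = [intervals * exp(-mu) * mu**x / factorial(x) for x in range(11)]
expected.append(intervals - sum(expected))        # the ">10" tail
labels = list(range(11)) + [">10"]
for lab, obs, e in zip(labels, counts, expected):
    print(lab, obs, round(e, 2))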