Download Notes 11 - Wharton Statistics

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of randomness wikipedia , lookup

Ars Conjectandi wikipedia , lookup

Indeterminism wikipedia , lookup

Inductive probability wikipedia , lookup

Birthday problem wikipedia , lookup

Probability interpretations wikipedia , lookup

Infinite monkey theorem wikipedia , lookup

Random variable wikipedia , lookup

Stochastic geometry models of wireless networks wikipedia , lookup

Conditioning (probability) wikipedia , lookup

Randomness wikipedia , lookup

Law of large numbers wikipedia , lookup

Transcript
Statistics 510: Notes 11
Reading: Sections 4.7.
I. Poisson Random Variables
A random variable X taking on one of the values 0, 1, 2, ...
is said to be a Poisson random variable with parameter  if
for some   0 ,
p(i)  P{ X  i}  e
i

i!
, i  0,1, 2,
This is a pmf because


p(i)  e
i 0



i 0
i
i!
 e   e  1
Poisson random variables provide an approximation for a
binomial random variable when n is large and p is small
enough so that np   is of moderate size.
Let X be a binomial random variable with parameters (n,p)
and let   np . Then
n!
P( X  i) 
p i (1  p ) n i
(n  i )!i !
n!   

 
(n  i )!i !  n 
n i
 
1  
 n
n(n  1) (n  i  1)  i (1   / n) n

ni
i ! (1   / n)i
For n large,  moderate and i considerably smaller than n
i
1
n(n  1) (n  i  1)
 
 

1


e
,

1,


1    1
i
n
 n
 n
Hence for n large and  moderate,
n
P( X  i )  e
i

i
i! .
(Note that if i is not considerably smaller than n
and p is small, then P( X  i )  0 for a binomial random
variable and P( X  i )  0 for a Poisson random variable
since i would be much greater than   np if i is not
considerably smaller than n).
The following tables gives an idea of the accuracy of the
Poisson approximation to the binomial. For n=100,
p=1/100, the Poisson approximation is remarkably good.
Binomial probabilities and Poisson probabilities for n=5
and p=1/5 (   1 )
5 
e (1)
x
  (.2) (.8)
x
5 x
 x
0
1
2
3
4
5
6
1
x
x!
0.328
0.410
0.205
0.051
0.006
0.000
0
0.368
0.368
0.184
0.061
0.015
0.003
0.001
Binomial probabilities and Poisson probabilities for n=100
and p=1/100 (   1 )
2
X
100 
x
100  x

 (.01) (.99)
x


e1 (1) x
x!
0
1
2
3
4
5
6
7
8
9
10
0.366032
0.369730
0.184865
0.060999
0.014942
0.002898
0.000463
0.000063
0.000007
0.000001
0.000000
0.367879
0.367879
0.183940
0.061313
0.015328
0.003066
0.000511
0.000073
0.000009
0.000001
0.000000
Applications of Poisson random variables: The Poisson
family of random variables provides a good model for the
number of successes in an experiment consisting of a large
number of independent trials with a small probability of
success for each trial (since the number of successes is a
binomial random variable with n large and p small)
Examples of random phenomenon that are accurately
modeled as Poisson random variables include:
 The number of misprints on a page (or a group of
pages) of a book
 The number of people in a community living to 100
years of age
 The number of wrong telephone numbers that are
dialed in a day
3
Example 1: A chromosome mutation believed to be linked
with colorblindness is known to occur, on the average, once
in every 10,000 births. If 20,000 babies are born this year
in a certain city, what is the probability that at least one will
develop color blindness?
An advantage of the Poisson approximation to the binomial
is that we do not need to know the precise number of trials
and the precise value of the probability of success; it is
enough to know what the product of these two values is.
Example 2: Suppose that in a certain country commercial
airplane crashes occur at the rate of 2.5 per year. Find the
probability that four or more crashes will occur next year.
4
Expected Value and Variance of Poisson Random
Variables
Let X be a Poisson (  ) random variable.
ie    i
E( X )  
i!
i 0

e    i 1
 
i 1 (i  1)!

 e



i 0
j
j!
(by letting j  i  1)
  e   e

i 2e  i
E( X )  
i!
i 0

2
ie    i 1
 
i 1 (i  1)!

( j  1)e    j
 e 
(by letting j  i  1)
j!
j 0


  je    j  e    j 
  


j
!
j
!
j

0
j

0


  (  1)
where the final equality follows since the first sum is the
expected value of a Poisson random variable with
5
parameter  and the second is the sum of the probabilities
of this random variable.
Therefore,
Var ( X )  E ( X 2 )  ( E ( X )) 2   .
Poisson random variables for number of events occurring in
a time period
Another use of the Poisson probability distribution, besides
approximating the binomial for large n, small p, is to model
the number of “events” occurring in a certain period of
time, e.g.,
 the number of earthquakes occurring during some
fixed time span
 the number of wars per year
 the number of electrons emitted from a radioactive
source during a given period of time
 the number of freak accidents, such as falls in the
shower, for a large population during a given period of
time (used by insurance companies)
 number of vehicles that pass a marker on a roadway
during a given period of time.
Let X denote the number of events occurring in a certain
period of time. Suppose for a positive constant  , the
following assumptions hold true:
1. The probability that exactly 1 event occurs in a given
interval of length h is equal to  h  o(h) where o(h) stands
6
f (h) / h  0 [for
for any function f (h) that is such that lim
h 0
2
instance, f (h)  h is o(h) whereas f (h)  h is not.]
2. The probability that 2 or more events occur in an
interval of length h is equal to o(h) .
3. For any integers n, j1 , j2 ,
, jn and any set of n
nonoverlapping intervals, if we define Ei to be the event
that exactly ji of the events under consideration occur in
the ith of these intervals, then events E1 ,
independent.
, En are
Under Assumptions 1-3, the number of events occurring in
any interval of length t is a Poisson random variable with
parameter t .
Proof: Let N (t ) denote the number of events occurring in
the interval [0, t ] . To obtain an expression for
P{N (t )  k} , we start by breaking the interval [0, t ] into n
non-overlapping subintervals of length t / n . Now,
P{N (t )  k}  P{k of the subintervals contain exactly 1 event
and the other n  k contain 0 events}
+P{ N (t )  k and at least one subinterval contains
2 or more events}
(1.1)
7
Let A and B denote the two mutually exclusive events on
the right hand side of the above equation. We have
P ( B )  P (at least one subinterval contains 2 or more events)
n
 P(
{ith subinterval contains 2 or more events})
i 1
n
  P(ith subinterval contains 2 or more events)
i 1
n
=  o(t / n)
i=1
 no(t / n)
 o(t / n) 
t

 t/n 
Now for any t, t / n  0 as n   and so
o(t / n)(t / n)  0 as n   by the definition of o. Hence,
P( B)  0 as n  
(1.2)
.
On the other hand, since assumptions 1 and 2 imply that
P(0 events occur in an interval of length h) 
1  [ h  o(h)  o(h)]  1   h  o(h)
we see from Assumption 3 that
P ( A)  P{k of the subintervals contain exactly 1 event and
and the other n-k contain 0 events}
 n   t
 t 
     o  
 n 
k  n
k
  t   t  
1   n   o  n  
    
However, since
8
nk
 t
 t 
 o(t / n) 
n   o     t  t 
 t as n   ,

 n 
 t/n 
n
it follows by the same argument that verified the Poisson
approximation to the binomial that
 t 
P( A)  e t
k!
k
as n  
(1.3)
Thus, by letting n   and using (1.1), (1.2) and (1.3), we
obtain
e   t ( t ) k
P{N (t )  k} 
, k  0,1,...
k!
Hence, if assumptions 1-3 are satisfied, the number of
events occurring in any fixed interval of length t is a
Poisson random variable with mean t . The value  is the
rate per unit time at which events occur.
Example 3: Suppose that earthquakes occur in the western
portion of the United States in accordance with
assumptions 1, 2 and 3 with   2 and with 1 week as the
unit of time (That is, earthquakes occur in accordance with
the three assumptions at the rate of 2 per week).
Find the probability that at least 3 earthquakes occur during
the next 2 weeks.
9
10