Chapter 5, Probability Distributions
5.1 Introduction
- In this chapter, we will discuss various probability distributions including discrete
probability distributions and continuous probability distributions.
- Discrete probability distributions are used when the sample space is discrete, that is, finite or countably infinite. The following is a list of discrete probability distributions:
  • discrete uniform
  • binomial and multinomial
  • hypergeometric
  • negative binomial
  • geometric
  • Poisson
- Continuous probability distributions are used when the sample space is continuous. The following is a list of continuous probability distributions:
  • uniform
  • normal (or Gaussian)
  • Gamma
  • Beta
  • t distribution
  • F distribution
  • χ² distribution
5.2 Discrete uniform distribution
- Definition: if a random variable X assumes the values x1, x2, ..., xk with equal probabilities, then X follows the discrete uniform distribution, and its probability function is:
$$f(x; k) = \frac{1}{k}, \quad x = x_1, x_2, \ldots, x_k$$
- The mean and variance:
$$\mu = \frac{1}{k}\sum_{i=1}^{k} x_i, \qquad \sigma^2 = \frac{1}{k}\sum_{i=1}^{k} (x_i - \mu)^2$$
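A minimal sketch of the two formulas, using Python's `fractions` module for exact arithmetic. The fair-die values 1 to 6 are a hypothetical example, not from the text, and `uniform_mean_var` is a hypothetical helper name.

```python
from fractions import Fraction

def uniform_mean_var(values):
    """Mean and variance of a discrete uniform r. v. taking each of `values` with probability 1/k."""
    k = len(values)
    mu = Fraction(sum(values), k)
    var = sum((Fraction(x) - mu) ** 2 for x in values) / k
    return mu, var

mu, var = uniform_mean_var([1, 2, 3, 4, 5, 6])  # hypothetical: a fair die
print(mu, var)  # 7/2 35/12
```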
5.3 Binomial and multinomial distributions
- First, let us introduce the Bernoulli process. If:
  • the outcome of each trial is either success (X = 1) or failure (X = 0),
  • the probability of success is P(X = 1) = p and the probability of failure is P(X = 0) = 1 - p = q, and
  • the trials are independent of one another,
then the process is a Bernoulli process.
- The probability distribution of the Bernoulli process:
$$p(x) = p^x (1-p)^{1-x}, \quad x = 0, 1, \quad 0 < p < 1$$
- The mean and the variance:
E(X) = p
V(X) = p(1 - p)
- An example: picking from the 12 students (8 male, 4 female), what is the probability of picking a male student?
X = 1: male student, with probability p = 8/12 = 2/3
X = 0: female student, with probability 1 - p = 1/3
Thus, the probability distribution is:
P(x) = (2/3)^x (1/3)^(1-x), x = 0, 1
In addition, the mean is p = 2/3 and the variance is V = (2/3)(1/3) = 2/9.
- Binomial distribution: the binomial distribution is defined based on the Bernoulli process. It is made up of n independent Bernoulli trials. Suppose that X1, X2, ..., Xn are independent Bernoulli random variables with the same p; then Y = X1 + X2 + ... + Xn follows the binomial distribution (note that Y is the number of successes among the n trials).
- The probability distribution of the binomial distribution is:
$$P(Y = y) = \binom{n}{y} p^y (1-p)^{n-y}, \quad y = 0, 1, \ldots, n$$
- The student example: pick three students from the 12 students (note that we must sample with replacement so that every pick has the same probability p and the picks are independent).
  • none of the 3 is male:
    the possibility: FFF
    the probability: $\binom{3}{0}(1-p)^3 = 0.037$
  • one of the 3 is male:
    the possibilities: MFF, FMF, FFM
    the probability: $\binom{3}{1}p(1-p)^2 = 3p(1-p)^2 = 0.222$
  • two of the 3 are male:
    the possibilities: MMF, MFM, FMM
    the probability: $\binom{3}{2}p^2(1-p) = 3p^2(1-p) = 0.444$
  • all three are male:
    the possibility: MMM
    the probability: $\binom{3}{3}p^3 = 0.296$
In general, for n = 3 the formula is:
$$P(Y = y) = \binom{3}{y} p^y (1-p)^{3-y}, \quad y = 0, 1, 2, 3$$
The general formula for any n is derived in the same manner.
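- Mean and variance of the binomial distribution (the variances add because the Xi are independent):
$$E(Y) = \sum_{i=1}^{n} E(X_i) = \sum_{i=1}^{n} p = np$$
$$V(Y) = \sum_{i=1}^{n} V(X_i) = \sum_{i=1}^{n} p(1-p) = np(1-p)$$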
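A short sketch, using Python's `math.comb` and exact fractions, that reproduces the four probabilities of the student example and confirms μ = np and σ² = np(1 - p) by direct summation. `binomial_pmf` is a hypothetical helper name.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(y, n, p):
    """P(Y = y) for Y ~ Binomial(n, p)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

n, p = 3, Fraction(2, 3)  # pick 3 students; P(male) = 8/12
probs = [binomial_pmf(y, n, p) for y in range(n + 1)]
mean = sum(y * q for y, q in enumerate(probs))
var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))

print([round(float(q), 3) for q in probs])  # [0.037, 0.222, 0.444, 0.296]
print(mean, var)                            # 2 2/3
```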
- An example: find the mean and variance of the number of male students picked, and then use Chebyshev's theorem to interpret the interval μ ± 2σ.
μ = (3)(2/3) = 2
σ² = (3)(2/3)(1/3) = 2/3, σ = 0.816
at k = 2,
μ + 2σ = 2 + (2)(0.816) = 3.63
μ - 2σ = 2 - (2)(0.816) = 0.37
1 - 1/k² = 3/4. Therefore, there is a probability of at least 3/4 that the number of male students picked lies between 0.37 and 3.63, that is, is 1, 2 or 3. In fact, the probability is p(1) + p(2) + p(3) = 0.963.
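The Chebyshev comparison can be checked directly. This sketch computes μ ± 2σ, sums the binomial probabilities of the integer values inside that interval, and compares the result with the bound 1 - 1/k².

```python
from fractions import Fraction
from math import comb, sqrt

n, p, k = 3, Fraction(2, 3), 2
pmf = {y: comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(n + 1)}

mu = n * p                      # 2
sigma = sqrt(n * p * (1 - p))   # sqrt(2/3), about 0.816
lo, hi = mu - k * sigma, mu + k * sigma

inside = sum(q for y, q in pmf.items() if lo < y < hi)  # y = 1, 2, 3
bound = 1 - Fraction(1, k**2)                            # Chebyshev: at least 3/4

print(round(lo, 2), round(hi, 2))       # 0.37 3.63
print(inside, round(float(inside), 3))  # 26/27 0.963
```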
- Binomial probabilities can also be read from the binomial distribution table, which is tabulated as a function of n and p.
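- Multinomial distribution: this is an extension of the binomial distribution. Suppose each of n independent trials results in one of k outcomes with probabilities p1, p2, ..., pk, and let xi be the number of trials that result in outcome i, where
$$\sum_{i=1}^{k} x_i = n, \qquad \sum_{i=1}^{k} p_i = 1$$
Then x1, x2, ..., xk follow the multinomial distribution with the probability distribution:
$$f(x_1, x_2, \ldots, x_k; p_1, p_2, \ldots, p_k, n) = \binom{n}{x_1, x_2, \ldots, x_k} p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$$
where $\binom{n}{x_1, \ldots, x_k} = \frac{n!}{x_1!\,x_2!\cdots x_k!}$.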
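A small sketch of the multinomial probability using exact fractions. The three-outcome probabilities (1/2, 3/10, 1/5) are a hypothetical example; the second call checks that with k = 2 the formula reduces to the binomial value from the student example. `multinomial_pmf` is a hypothetical helper name.

```python
from fractions import Fraction
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """n!/(x1! ... xk!) * p1^x1 * ... * pk^xk, where n = sum(counts)."""
    coef = factorial(sum(counts))
    for x in counts:
        coef //= factorial(x)  # exact: every partial quotient is an integer
    return coef * prod(p**x for p, x in zip(probs, counts))

# Hypothetical 3-outcome example: one of each outcome in n = 3 trials
print(multinomial_pmf((1, 1, 1), (Fraction(1, 2), Fraction(3, 10), Fraction(1, 5))))  # 9/50
# With k = 2 it reduces to the binomial: 2 males in 3 picks, p = 2/3
print(multinomial_pmf((2, 1), (Fraction(2, 3), Fraction(1, 3))))  # 4/9
```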
5.4 Hypergeometric Distribution
- An example: if three students are picked in a row from the 12 students (8 male, 4 female), what is the probability distribution of the number of male students picked? Note that this time the picks are not independent, because sampling is without replacement. As a result, we need the hypergeometric distribution. The following shows how the distribution is formed:
  • no male student among the 3 picked:
    total: $\binom{12}{3}$, male: $\binom{8}{0}$, female: $\binom{4}{3}$
    probability $= \binom{8}{0}\binom{4}{3} \big/ \binom{12}{3}$
  • one male student among the 3 picked:
    total: $\binom{12}{3}$, male: $\binom{8}{1}$, female: $\binom{4}{2}$
    probability $= \binom{8}{1}\binom{4}{2} \big/ \binom{12}{3}$
  • two male students among the 3 picked:
    total: $\binom{12}{3}$, male: $\binom{8}{2}$, female: $\binom{4}{1}$
    probability $= \binom{8}{2}\binom{4}{1} \big/ \binom{12}{3}$
  • three male students among the 3 picked:
    total: $\binom{12}{3}$, male: $\binom{8}{3}$, female: $\binom{4}{0}$
    probability $= \binom{8}{3}\binom{4}{0} \big/ \binom{12}{3}$
In general, the probability distribution is as follows:
$$P(Y = y) = \frac{\binom{8}{y}\binom{4}{3-y}}{\binom{12}{3}}, \quad y = 0, 1, 2, 3$$
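- The general formula of the hypergeometric distribution, for a sample of n items drawn without replacement from N items of which k are successes:
$$P(Y = y) = \frac{\binom{k}{y}\binom{N-k}{n-y}}{\binom{N}{n}}, \quad y = 0, 1, 2, \ldots, n$$
where $\binom{a}{b} = 0$ whenever b > a.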
- The mean and the variance of the hypergeometric distribution:
$$\mu = \frac{nk}{N}, \qquad \sigma^2 = \frac{N-n}{N-1} \cdot n \cdot \frac{k}{N}\left(1 - \frac{k}{N}\right)$$
As a special case, let N → ∞ while k/N = p stays fixed; then (N - n)/(N - 1) → 1. Hence:
μ = np
σ² = np(1 - p)
That is, the hypergeometric distribution approaches the binomial distribution when the population is very large compared with the sample.
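The hypergeometric example can be checked with exact fractions. `hypergeom_pmf` is a hypothetical helper name; the sketch sums the pmf to get the mean and variance and compares them with the closed forms above (2 and 6/11 for N = 12, k = 8, n = 3).

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(y, N, k, n):
    """P(Y = y): y successes in a sample of n drawn without replacement
    from N items, k of which are successes (math.comb returns 0 when b > a)."""
    return Fraction(comb(k, y) * comb(N - k, n - y), comb(N, n))

N, k, n = 12, 8, 3  # 12 students, 8 male, pick 3
probs = [hypergeom_pmf(y, N, k, n) for y in range(n + 1)]
mean = sum(y * q for y, q in enumerate(probs))
var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))

print([round(float(q), 3) for q in probs])  # [0.018, 0.218, 0.509, 0.255]
print(mean, var)                            # 2 6/11
```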
- We can also define the multivariate hypergeometric distribution.
5.5 Negative Binomial and Geometric Distributions
- An example: picking students one at a time (with replacement, so that p stays constant), what is the probability that the third student picked is the second male?
  • one possibility is FMM, with probability $(1-p)p^2$
  • the other possibility is MFM, with probability $p(1-p)p = (1-p)p^2$
Note that there are $\binom{3-1}{2-1} = 2$ such arrangements (the third pick must be male, and the other male can be either of the first two picks), and hence the probability is:
$$f(X = 3; k = 2) = \binom{3-1}{2-1}(1-p)p^2$$
- The general formula for the negative binomial distribution is as follows:
$$f(X = x) = \binom{x-1}{k-1} p^k (1-p)^{x-k}, \quad x = k, k+1, k+2, \ldots$$
where x is the number of trials needed to obtain the kth success.
- The mean and variance of the negative binomial distribution:
E(X) = k/p
V(X) = k(1 - p)/p²
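- Another example: picking until a male student is obtained. The probability that the first male appears on:
  • the first pick: p
  • the second pick: (1 - p)p
  • the third pick: (1 - p)²p
- The general formula is:
$$f(X = x) = (1-p)^{x-1}p, \quad x = 1, 2, 3, \ldots$$
This is the geometric distribution, which is the special case k = 1 of the negative binomial distribution.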
- The mean and variance of the geometric distribution (the negative binomial with k = 1):
E(X) = 1/p
V(X) = (1 - p)/p²
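A numerical check of the negative binomial and geometric results, assuming p = 2/3 from the student example. The moments are obtained by summing the pmf up to a cut-off `x_max` (a truncation assumption that is harmless here because the tail decays like (1/3)^x). For X counted as the number of trials, the sums land on k/p and k(1 - p)/p²; `neg_binomial_pmf` and `moments` are hypothetical helper names.

```python
from math import comb

def neg_binomial_pmf(x, k, p):
    """P(X = x): the kth success occurs on trial x, for x = k, k+1, ..."""
    return comb(x - 1, k - 1) * p**k * (1 - p) ** (x - k)

def moments(k, p, x_max=200):
    """Mean and variance by summing the pmf up to x_max (the tail is negligible for p = 2/3)."""
    xs = range(k, x_max + 1)
    m1 = sum(x * neg_binomial_pmf(x, k, p) for x in xs)
    m2 = sum(x * x * neg_binomial_pmf(x, k, p) for x in xs)
    return m1, m2 - m1**2

p = 2 / 3
print(neg_binomial_pmf(3, 2, p))  # third pick is the second male: 8/27, about 0.296
print(moments(2, p))  # approximately (k/p, k(1-p)/p^2) = (3, 1.5)
print(moments(1, p))  # geometric, k = 1: approximately (1/p, (1-p)/p^2) = (1.5, 0.75)
```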
5.6 Poisson Distribution
- A Poisson process is a random process in which discrete events occur over a continuous interval of time or region. Examples of Poisson processes include:
  • the arrival of telephone calls at a switchboard,
  • cars passing an electronic checking device.
Note that all these examples involve a discrete random event. Over any small period of time (or small region), the probability that the event occurs is small; however, over a long time (or large region), the number of occurrences can be large.
- The Poisson distribution plays an extremely important role in science and engineering, because it is an appropriate probabilistic model for a large number of observed phenomena.
- The Poisson distribution can be described by the following formula:
$$p(x; \lambda t) = \frac{e^{-\lambda t}(\lambda t)^x}{x!}, \quad x = 0, 1, 2, \ldots$$
where λ is the average number of outcomes per unit time or region. Hence, λt represents the average number of outcomes in an interval of length t.
Proof: refer to the textbook.
- The Poisson distribution can be considered as an approximation to the binomial distribution when n is large and p is small, with λt = np.
- From a physical point of view, consider a time interval of length T divided into n equal sub-intervals of length Δt (so T = nΔt, and Δt → 0 as n → ∞), and assume:
  • the probability of a success in any sub-interval Δt is λΔt;
  • the probability of more than one success in any sub-interval Δt is negligible;
  • the probability of a success in any sub-interval does not depend on what happened prior to that time.
Then the number of successes in T is binomial with n trials and p = λΔt = λT/n, and as n → ∞ it follows the Poisson distribution with mean λT.
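A minimal sketch of this limit, assuming a hypothetical λT = 3: the interval is split into n sub-intervals with success probability p = λT/n each, and the largest gap between the binomial and Poisson probabilities (over x = 0, ..., 10) shrinks as n grows.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, mu):
    """p(x; lambda*t) with mu = lambda*t."""
    return exp(-mu) * mu**x / factorial(x)

mu = 3.0  # hypothetical lambda*T: mean number of successes in the interval
gaps = {}
for n in (10, 100, 1000):
    p = mu / n  # lambda*dt, the success probability of one sub-interval
    gaps[n] = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, mu)) for x in range(11))
    print(n, round(gaps[n], 5))
```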
- Mean and variance of the Poisson distribution:
μ = λt
σ² = λt
- An example: in a large company, industrial accidents occur at a mean rate of three per week (λt = 3), and accidents occur independently.
  • the probability distribution:
$$p(y) = \frac{3^y e^{-3}}{y!}, \quad y = 0, 1, 2, \ldots$$
  • the probabilities can be determined by direct calculation or from the Poisson distribution table.
  • the probability of at most four accidents in a week:
P(Y ≤ 4) = p(0) + p(1) + p(2) + p(3) + p(4) = 0.815
  • the probability of four or more accidents:
P(Y ≥ 4) = 1 - P(Y ≤ 3) = 0.353
  • the probability of exactly four:
P(Y = 4) = P(Y ≤ 4) - P(Y ≤ 3) = 0.168
Note that this is the same as p(4) = 0.168.
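The accident calculations can be reproduced in a few lines instead of using the Poisson table. `poisson_cdf` is a hypothetical helper that simply sums the pmf.

```python
from math import exp, factorial

def poisson_pmf(y, mu):
    """p(y) = mu^y e^(-mu) / y!, with mu = lambda*t."""
    return mu**y * exp(-mu) / factorial(y)

def poisson_cdf(y, mu):
    """P(Y <= y), summing the pmf."""
    return sum(poisson_pmf(i, mu) for i in range(y + 1))

mu = 3  # mean accidents per week
print(round(poisson_cdf(4, mu), 3))                       # P(Y <= 4): 0.815
print(round(1 - poisson_cdf(3, mu), 3))                   # P(Y >= 4): 0.353
print(round(poisson_cdf(4, mu) - poisson_cdf(3, mu), 3))  # P(Y = 4): 0.168
print(round(poisson_pmf(4, mu), 3))                       # p(4): 0.168
```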
5.7 Uniform Distribution
- The uniform distribution is a continuous probability distribution.
  • the assumption: the random event is equally likely anywhere in an interval
  • an example: receiving an express mail delivery between 1 and 5 pm
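- The probability density function (pdf):
$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & \text{elsewhere} \end{cases}$$
- By integration, we obtain the probability function (pf), that is, the cumulative distribution:
$$F(x) = \begin{cases} 0, & x < a \\ \dfrac{x-a}{b-a}, & a \le x \le b \\ 1, & x > b \end{cases}$$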
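- A comparison between discrete and continuous distributions:
  • for a discrete r. v., we have the probability function:
P(X = x) = p(x)
  • for a continuous r. v., any single value has probability zero, and the pf and the pdf are related by integration and differentiation:
P(X = x) = 0
$$F(x) = \int_{-\infty}^{x} f(t)\,dt, \qquad f(x) = \frac{dF(x)}{dx}$$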
- An example: receiving an express mail delivery, equally likely at any time between 1 and 5 pm:
$$f(x) = \begin{cases} 1/4, & 1 \le x \le 5 \\ 0, & \text{elsewhere} \end{cases}$$
Hence, the probability of receiving the express mail between 2 and 5 pm is
P(2 ≤ X ≤ 5) = F(5) - F(2) = (5 - 1)/(5 - 1) - (2 - 1)/(5 - 1) = 3/4.
- The mean and the variance:
E(X) = (a + b)/2
V(X) = (b - a)²/12
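A sketch of the express-mail example using exact fractions; `uniform_cdf` is a hypothetical helper implementing the piecewise F(x) for a uniform distribution on [a, b].

```python
from fractions import Fraction

def uniform_cdf(x, a, b):
    """F(x) for the continuous uniform distribution on [a, b]."""
    if x < a:
        return Fraction(0)
    if x > b:
        return Fraction(1)
    return Fraction(x - a, b - a)

a, b = 1, 5  # express mail equally likely between 1 pm and 5 pm
prob = uniform_cdf(5, a, b) - uniform_cdf(2, a, b)
mean, var = Fraction(a + b, 2), Fraction((b - a) ** 2, 12)
print(prob, mean, var)  # 3/4 3 4/3
```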
5.8 Normal Distribution
- In the natural world, outcomes are often not equally likely. Instead, there is a most likely value, and the likelihood decreases symmetrically on both sides of it. This leads to the normal distribution.
- The normal distribution is by far the most widely used probability distribution. Why is it so popular?
  • the central limit theorem: the sum (or mean) of a large number of independent random variables is approximately normal
  • a linear combination of independent normal random variables is still normal
- The probability density function:
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-\mu)^2/(2\sigma^2)}, \quad -\infty < x < \infty$$
Note that the probability function F(x) has no closed-form expression; hence, we rely on numerical values (Table A.3).
- The mean, variance and standard deviation of a normal distribution:
E(X) = μ
V(X) = σ², with standard deviation σ
These two parameters uniquely determine the normal distribution. Hence, a normal distribution is often denoted as N(μ, σ).
- Illustration of the normal distribution:
  • the bell shape
  • centred at the mean μ
  • the standard deviation: μ ± σ covers about 68% of the area, μ ± 2σ about 95.4%, and μ ± 3σ about 99.7%.
In particular, with
E(X) = 0
V(X) = 1
we have the standard normal distribution N(0, 1).
- Calculate the probability through the standard normal distribution:
  • transform the normal random variable X into a standard normal random variable by
$$Z = \frac{X - \mu}{\sigma}$$
  • use the standard normal distribution table (Table A.3)
- An example: given N(16, 1), P(X > 17) = ?
  • Z = (X - 16)/1
  • P[Z > (17 - 16)/1] = P(Z > 1)
= 1 - P(Z < 1)
= 1 - 0.8413 (from Table A.3)
= 0.1587
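The table lookup can be reproduced with `statistics.NormalDist` from Python's standard library (3.8+), which computes F(x) numerically instead of reading Table A.3. The helper names `prob_between` and `upper_quantile` are hypothetical; they sketch answers to the kind of questions posed next, assuming μ and σ are known.

```python
from statistics import NormalDist

X = NormalDist(mu=16, sigma=1)

# Worked example: P(X > 17) = 1 - P(Z < 1)
p_tail = 1 - X.cdf(17)
print(round(p_tail, 4))  # 0.1587

def prob_between(dist, c1, c2):
    """P(c1 <= X <= c2) = F(c2) - F(c1) for a continuous r. v."""
    return dist.cdf(c2) - dist.cdf(c1)

def upper_quantile(dist, p):
    """x such that P(X > x) = p, i.e. F(x) = 1 - p."""
    return dist.inv_cdf(1 - p)

print(round(prob_between(X, 15, 17), 4))    # mu +/- sigma: 0.6827
print(round(upper_quantile(X, p_tail), 4))  # recovers x = 17
```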
- Questions:
  • given μ and σ, how do we calculate P(c1 ≤ X ≤ c2)?
  • given p, μ and σ, how do we calculate x so that P(X > x) = p?
- Given a set of data, it is often necessary to check whether the data set follows a normal distribution.
- The student example: the number of hours of study per week of the 12 students.
  • sort the data: 10, 12, 12, 14, 14, 14, 15, 15, 15, 20, 20, 25
  • note that there are just 6 different values, so each value advances the percentile by 100 ÷ 6 ≈ 16.7 (steps of 16 are used below)
  • find the percentile of each data point: 16, 32, 32, 48, 48, 48, 64, 64, 64, 80, 80, 96
  • find the z-values of the percentiles: -1.0, -.47, -.47, -.05, -.05, -.05, .36, .36, .36, .84, .84, 1.75
  • plot the data against the z-values:
[Figure: normal probability plot of the study hours (vertical axis, 10 to 25) against the z-values (horizontal axis, -1.5 to 2)]
Because the horizontal axis is on the standard normal scale, an approximately linear relationship indicates that the distribution of the data can be approximated by a normal distribution.
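The percentile and z-value steps can be sketched with `statistics.NormalDist().inv_cdf`, which replaces the table lookup. The percentile rule (16 per distinct value) follows the text; summarizing linearity with a Pearson correlation coefficient is an added convenience, not part of the method above, and `pearson` is a hypothetical helper.

```python
from statistics import NormalDist, mean

hours = [10, 12, 12, 14, 14, 14, 15, 15, 15, 20, 20, 25]

# As in the text: each of the 6 distinct values advances the percentile by 16
distinct = sorted(set(hours))
percentile = {v: 16 * (i + 1) for i, v in enumerate(distinct)}
z = [NormalDist().inv_cdf(percentile[h] / 100) for h in hours]
print([round(v, 2) for v in z])  # first value -0.99; the text rounds it to -1.0

def pearson(xs, ys):
    """Pearson correlation; values near 1 mean the plotted points lie near a line."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

print(round(pearson(hours, z), 2))
```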
If a data set follows a normal distribution, related probabilities are easy to calculate. Continuing the 12 students example:
μ = 15.5
σ² ≈ 16, so σ ≈ 4
Question: what is the probability of picking a student who studies at least 15 hours per week?
Answer: we first calculate the z value:
z = (15 - 15.5)/4 = -0.125
hence, the probability is P(Z > -0.125) = 1 - P(Z < -0.125) = 1 - 0.45 = 0.55.
- As another example, suppose an exam is coming and everybody puts in an extra 3 hours of study per week, so the mean becomes 15.5 + 3 = 18.5 while σ stays at 4. What is the probability of picking a student who studies at least 20 hours per week? We first calculate the z value:
z = (20 - 18.5)/4 = 0.375
hence, P(X > 20) = P(Z > 0.375) = 1 - P(Z < 0.375) = 1 - 0.646 = 0.354.
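Both worked answers can be checked with the standard library. `pstdev` treats the 12 students as the whole population, which is an assumption (the sample standard deviation, `stdev`, would give about 4.19); either way σ ≈ 4, as used in the text.

```python
from statistics import NormalDist, mean, pstdev

hours = [10, 12, 12, 14, 14, 14, 15, 15, 15, 20, 20, 25]
mu, sigma = mean(hours), pstdev(hours)
print(mu, round(sigma, 2))  # 15.5 4.01 (the text uses sigma = 4)

now = NormalDist(mu=15.5, sigma=4)
print(round(1 - now.cdf(15), 2))   # P(X >= 15), about 0.55

exam = NormalDist(mu=15.5 + 3, sigma=4)  # +3 hours shifts the mean only
print(round(1 - exam.cdf(20), 2))  # P(X >= 20), about 0.35
```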
- As an exercise, try to find the range of study hours per week within which a randomly picked student falls with a probability of 95%.
- Normal approximation to the binomial. Assuming n is large (and p is not too close to 0 or 1), then
$$Z = \frac{X - np}{\sqrt{np(1-p)}}$$
is approximately standard normal. This can be demonstrated by an example. In the students example, the probability of picking a student who studies more than 15 hours per week is p = 3/12 = 1/4. Consider sampling with replacement: the probability that exactly 3 of 12 picked students study more than 15 hours per week is:
b(X = 3; n = 12, p = 1/4) = 0.258
Use the normal distribution to approximate it:
μ = np = (12)(1/4) = 3
σ² = np(1 - p) = (12)(1/4)(3/4) = 9/4 = 2.25 (σ = 1.5)
hence, with the continuity correction (X = 3 corresponds to 2.5 < X < 3.5),
P(2.5 < X < 3.5) = P[(2.5 - 3)/1.5 < Z < (3.5 - 3)/1.5]
= P(-0.333 < Z < 0.333)
= 0.6306 - 0.3694
= 0.261
The two results are close; the remaining approximation error is due to the small n (n = 12).
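A short sketch comparing the exact binomial probability with its normal approximation, using the interval 2.5 < X < 3.5 as the continuity correction for X = 3.

```python
from math import comb
from statistics import NormalDist

n, p = 12, 1 / 4
exact = comb(n, 3) * p**3 * (1 - p) ** (n - 3)  # b(3; 12, 1/4)

approx_dist = NormalDist(mu=n * p, sigma=(n * p * (1 - p)) ** 0.5)  # N(3, 1.5)
approx = approx_dist.cdf(3.5) - approx_dist.cdf(2.5)  # continuity correction

print(round(exact, 3), round(approx, 3))  # 0.258 0.261
```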
- The normal approximation to the binomial distribution is very useful when n is large, because the exact binomial probabilities then require tedious calculation.
5.9 Exponential distribution, Gamma distribution and Chi-Square (χ²) distribution
- There are cases, for example the time to failure of a component, in which the probability density decreases exponentially. This leads to the exponential distribution.
- The probability density function of the exponential distribution:
$$f(x) = \begin{cases} \dfrac{1}{\beta} \exp\left(-\dfrac{x}{\beta}\right), & x > 0, \; \beta > 0 \\ 0, & \text{elsewhere} \end{cases}$$
- The probability function:
F(x) = 1 - exp(-x/β), x > 0, β > 0
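To calculate the mean and variance, we need the Gamma function Γ(α):
$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx$$
Using integration by parts:
$$(uv)' = u'v + uv' \;\Rightarrow\; uv = \int u'v + \int uv' \;\Rightarrow\; \int uv' = uv - \int u'v$$
Let $u = x^{\alpha-1}$ and $dv = e^{-x}dx$; it follows that:
$$\Gamma(\alpha) = \left[-e^{-x}x^{\alpha-1}\right]_0^{\infty} + \int_0^{\infty} e^{-x}(\alpha-1)x^{\alpha-2}\,dx = (\alpha-1)\Gamma(\alpha-1)$$
(the boundary term vanishes for α > 1).
In particular:
Γ(α + 1) = αΓ(α)
Γ(n) = (n - 1)! for positive integers n
Γ(1/2) = √π
In general:
$$\int_0^{\infty} x^{\alpha-1} e^{-x/\beta}\,dx = \beta^{\alpha}\Gamma(\alpha)$$
For the exponential distribution, applying this result with α = 2 and α = 3 gives E(X) = (1/β)β²Γ(2) = β and E(X²) = (1/β)β³Γ(3) = 2β², so:
E(X) = β
V(X) = β²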
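The Gamma-function identities can be checked with `math.gamma`. The midpoint-rule `integral` helper is a hypothetical illustration of the last identity (truncating the upper limit at 200 is an assumption that is harmless for these parameters), not a production integrator.

```python
from math import exp, factorial, gamma, pi, sqrt

print(gamma(5), factorial(4))  # Gamma(n) = (n-1)!: 24.0 24
print(gamma(0.5), sqrt(pi))    # Gamma(1/2) = sqrt(pi), about 1.7725
alpha = 2.7                    # hypothetical non-integer alpha
print(gamma(alpha + 1), alpha * gamma(alpha))  # Gamma(alpha+1) = alpha*Gamma(alpha)

def integral(alpha, beta, upper=200.0, steps=200_000):
    """Midpoint-rule estimate of the integral of x^(alpha-1) e^(-x/beta) over (0, upper)."""
    h = upper / steps
    return h * sum(((i + 0.5) * h) ** (alpha - 1) * exp(-(i + 0.5) * h / beta)
                   for i in range(steps))

print(integral(3, 2), 2**3 * gamma(3))  # both about 16 = beta^alpha * Gamma(alpha)
```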
- The exponential distribution is related to the Poisson distribution: if events follow a Poisson process with rate λ, the waiting time T until the first occurrence is exponential with β = 1/λ, because P(T > t) = P(no occurrence in [0, t]) = e^(-λt).
- Another common case is that the probability density is low close to zero; this leads to the Gamma distribution. The probability density function of the Gamma distribution:
$$f(x) = \frac{1}{\beta^{\alpha}\Gamma(\alpha)} x^{\alpha-1} e^{-x/\beta}, \quad x > 0, \; \alpha > 0, \; \beta > 0$$
- The mean and variance:
E(X) = αβ
V(X) = αβ²
- Note that the exponential distribution is a special case of the Gamma distribution with α = 1.
- Another special case of the Gamma distribution is the χ² distribution. Letting α = ν/2 and β = 2 results in the χ² distribution with ν degrees of freedom:
$$f(x) = \frac{1}{2^{\nu/2}\Gamma(\nu/2)} x^{\nu/2-1} e^{-x/2}, \quad x > 0$$
Its mean and variance are as follows:
μ = ν
σ² = 2ν
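The Gamma moments and the two special cases can be checked numerically. `gamma_pdf`, `chi2_pdf` and `moments` are hypothetical helpers, and the midpoint rule truncated at x = 150 is a simplifying assumption adequate for these parameters (α = 3, β = 2; ν = 4; and α = 1, β = 2 for the exponential).

```python
from math import exp, gamma

def gamma_pdf(x, alpha, beta):
    """Gamma density x^(alpha-1) e^(-x/beta) / (beta^alpha Gamma(alpha)), x > 0."""
    return x ** (alpha - 1) * exp(-x / beta) / (beta**alpha * gamma(alpha))

def chi2_pdf(x, nu):
    """Chi-square with nu degrees of freedom: Gamma with alpha = nu/2, beta = 2."""
    return gamma_pdf(x, nu / 2, 2)

def moments(pdf, upper=150.0, steps=150_000):
    """Mean and variance by the midpoint rule on (0, upper)."""
    h = upper / steps
    xs = [(i + 0.5) * h for i in range(steps)]
    m1 = h * sum(x * pdf(x) for x in xs)
    m2 = h * sum(x * x * pdf(x) for x in xs)
    return m1, m2 - m1**2

print(moments(lambda x: gamma_pdf(x, 3, 2)))  # about (alpha*beta, alpha*beta^2) = (6, 12)
print(moments(lambda x: chi2_pdf(x, 4)))      # about (nu, 2*nu) = (4, 8)
print(moments(lambda x: gamma_pdf(x, 1, 2)))  # exponential, beta = 2: about (2, 4)
```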
- Illustration:
[Figure: density curves of the Gamma (or χ²) distribution and the exponential distribution]
5.10 Weibull distribution
- The assumption: similar to Gamma
- The probability density function:
$$f(x) = \begin{cases} \dfrac{\beta}{\alpha} x^{\beta-1} e^{-x^{\beta}/\alpha}, & x > 0 \\ 0, & \text{otherwise} \end{cases}$$
- The probability function:
$$F(x) = 1 - \exp\left(-x^{\beta}/\alpha\right), \quad x > 0$$
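- The mean and variance:
$$E(X) = \alpha^{1/\beta}\,\Gamma\!\left(1 + \frac{1}{\beta}\right)$$
$$V(X) = \alpha^{2/\beta}\left\{\Gamma\!\left(1 + \frac{2}{\beta}\right) - \left[\Gamma\!\left(1 + \frac{1}{\beta}\right)\right]^2\right\}$$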
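A numerical check of the Weibull mean and variance for the parameterization above, with hypothetical parameters α = 2 and β = 1.5 (and β = 1, where the Weibull becomes the exponential with mean α). `weibull_pdf`, `weibull_moments` and `numeric_moments` are hypothetical helpers; the midpoint rule truncated at x = 60 is a simplifying assumption.

```python
from math import exp, gamma

def weibull_pdf(x, alpha, beta):
    """f(x) = (beta/alpha) x^(beta-1) exp(-x^beta / alpha), x > 0."""
    return (beta / alpha) * x ** (beta - 1) * exp(-(x**beta) / alpha)

def weibull_moments(alpha, beta):
    """Closed-form mean and variance from the formulas above."""
    g1, g2 = gamma(1 + 1 / beta), gamma(1 + 2 / beta)
    return alpha ** (1 / beta) * g1, alpha ** (2 / beta) * (g2 - g1**2)

def numeric_moments(alpha, beta, upper=60.0, steps=120_000):
    """Midpoint-rule mean and variance, truncated at `upper`."""
    h = upper / steps
    xs = [(i + 0.5) * h for i in range(steps)]
    m1 = h * sum(x * weibull_pdf(x, alpha, beta) for x in xs)
    m2 = h * sum(x * x * weibull_pdf(x, alpha, beta) for x in xs)
    return m1, m2 - m1**2

for alpha, beta in ((2.0, 1.5), (2.0, 1.0)):  # hypothetical parameters
    print(weibull_moments(alpha, beta), numeric_moments(alpha, beta))
```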
- Application in reliability, defining:
f(t): the pdf of failure
F(t): the pf of failure
R(t) = 1 - F(t): the probability of no failure (reliability function)
r(t) = f(t)/R(t): the failure rate function
If
$$r(t) = \frac{f(t)}{R(t)} = \frac{f(t)}{1 - F(t)} = \frac{1}{\beta}$$
(a constant failure rate), then f(t) is exponential.
- Proof: since dF(t)/dt = f(t), the condition above gives
βF'(t) = 1 - F(t) ⇒ βF'(t) + F(t) = 1
Solving this with the initial condition F(0) = 0 gives:
F(t) = 1 - exp(-t/β), t ≥ 0
or
f(t) = (1/β) exp(-t/β), t ≥ 0
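A sketch of the failure rate r(t) = f(t)/R(t) for the Weibull form of Section 5.10, assuming the hypothetical value α = 2. For that parameterization r(t) = (β/α)t^(β-1), so the rate is the constant 1/α when β = 1 (the exponential case, matching r(t) = 1/β in the exponential notation above), increases with t when β > 1, and decreases when β < 1.

```python
from math import exp

def weibull_pdf(t, alpha, beta):
    return (beta / alpha) * t ** (beta - 1) * exp(-(t**beta) / alpha)

def reliability(t, alpha, beta):
    """R(t) = 1 - F(t) = exp(-t^beta / alpha), computed directly to avoid cancellation."""
    return exp(-(t**beta) / alpha)

def failure_rate(t, alpha, beta):
    """r(t) = f(t) / R(t)."""
    return weibull_pdf(t, alpha, beta) / reliability(t, alpha, beta)

alpha = 2.0
for beta in (0.5, 1.0, 2.0):
    print(beta, [round(failure_rate(t, alpha, beta), 3) for t in (0.5, 1.0, 2.0, 4.0)])
```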
5.11 Summary
- Discrete distributions
  • discrete uniform: equally likely outcomes
  • binomial and multinomial: number of successes in n independent Bernoulli trials
  • hypergeometric: sampling without replacement from a finite population (dependent picks)
  • negative binomial: number of trials needed for the kth success
  • geometric: number of trials until the first success
  • Poisson: discrete events in continuous intervals
- Continuous distributions
  • uniform: equally likely
  • normal: a most likely value, with probability decreasing symmetrically
  • exponential: gradually decreasing
  • Gamma: small when close to zero (generalized exponential)
  • Beta: contained in a finite interval
  • Weibull: similar to Gamma; reduces to the exponential when β = 1