Dr. Ka-fu Wong
ECON1003
Analysis of Economic Data
Ka-fu Wong © 2003
Chap 4- 1
Chapter Four
Discrete Probability Distributions
GOALS
1. Define the terms random variable and probability distribution.
2. Distinguish between discrete and continuous probability distributions.
3. Calculate the mean, variance, and standard deviation of a discrete probability distribution.
4. Describe the characteristics of the binomial probability distribution and use it to compute probabilities.
5. Describe the characteristics of the hypergeometric distribution and use it to compute probabilities.
6. Describe the characteristics of the Poisson distribution and use it to compute probabilities.

Random Variables
• A random variable is a numerical value determined by the outcome of an experiment.
• A probability distribution is a listing of all possible outcomes of an experiment together with their corresponding probabilities.

Types of Probability Distributions
• A discrete probability distribution can assume only certain distinct outcomes (not necessarily finitely many); it applies to variables that take discrete values. Examples:
  • The number of students in a class.
  • The number of children in a family.
  • The number of cars entering a carwash in an hour.
  • The number of home mortgages approved by Coastal Federal Bank last week.
  • The number of CDs you own.
  • The number of trips made outside Hong Kong in the past year.
  • The number of ten-cent coins in your pocket.

Types of Probability Distributions
• A continuous probability distribution can assume an infinite number of values within a given range; it applies to variables that take continuous values. Examples:
  • The distance students travel to class.
  • The time it takes an executive to drive to work.
  • The length of an afternoon nap.
  • The length of time of a particular phone call.
  • The amount of money spent on your last haircut.

Features of a Discrete Distribution
• Let $X_1, \dots, X_N$ be the list of all possible outcomes (N of them).
• The main features of a discrete probability distribution are:
  • The probability of a particular outcome, $P(X_i)$, is between 0 and 1.00.
  • The sum of the probabilities of the various outcomes is 1.00. That is,
    $$P(X_1) + \dots + P(X_N) = 1$$
  • The outcomes are mutually exclusive. That is,
    $$P(X_1 \text{ and } X_2) = 0 \quad\text{and}\quad P(X_1 \text{ or } X_2) = P(X_1) + P(X_2)$$
    Generally, for all $i \neq j$:
    $$P(X_i \text{ and } X_j) = 0 \quad\text{and}\quad P(X_i \text{ or } X_j) = P(X_i) + P(X_j)$$

Example 1
• Consider a random experiment in which a coin is tossed three times. Let x be the number of heads. Let H represent the outcome of a head and T the outcome of a tail.
• The possible outcomes for such an experiment are:
  TTT, TTH, THT, THH,
  HTT, HTH, HHT, HHH.
• Thus the possible values of x (number of heads) and, if the coin is fair, their probabilities are:
  x=0: TTT              P(x=0) = 1/8
  x=1: TTH, THT, HTT    P(x=1) = 3/8
  x=2: THH, HTH, HHT    P(x=2) = 3/8
  x=3: HHH              P(x=3) = 1/8
• By the definition of a random variable, x as defined in this experiment is a random variable.

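The counting in Example 1 can be reproduced by brute-force enumeration. The following is a minimal sketch, assuming a fair coin; the names `outcomes`, `counts`, and `distribution` are illustrative and not part of the lecture.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 2**3 equally likely outcomes of three tosses (T = tail, H = head).
outcomes = ["".join(tosses) for tosses in product("TH", repeat=3)]

# The random variable x maps each outcome to its number of heads.
counts = Counter(outcome.count("H") for outcome in outcomes)

# Assumption: the coin is fair, so each outcome has probability 1/8.
distribution = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(f"P(x={x}) = {p}")
```

Using `Fraction` keeps the probabilities exact, so they can be compared directly with 1/8 and 3/8 and checked to sum to 1.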
A note on Random variables
• Note that a random variable is usually a transformation of the original outcome.
• For example, if x is a random variable, the following are also random variables:
  • $Y = 2x$
  • $Y = 3 + 2x$
  • $Y = x^2$
  • $Y = \log(x)$

The Mean of a Discrete Probability Distribution
• The mean:
  • reports the central location of the data.
  • is the long-run average value of the random variable, that is, the average of the outcomes of many experiments.
  • is also referred to as the expected value, E(X), of the probability distribution.
  • is a weighted average.

The Mean of a Discrete Probability Distribution
The mean is computed by the formula:
$$\mu = \sum [x P(x)] = x_1 P(x_1) + x_2 P(x_2) + \dots + x_n P(x_n)$$
where $\mu$ represents the mean and P(x) is the probability of the various outcomes x.
This is similar to the formula for the grouped mean (weighted mean), in which relative frequencies (weights) take the place of P(x).

The Variance of a Discrete Probability Distribution
• The variance measures the amount of spread (variation) of a distribution.
• The variance of a discrete distribution is denoted by the Greek letter $\sigma^2$ (sigma squared).
• The standard deviation is the square root of $\sigma^2$.

The Variance of a Discrete Probability Distribution
• The variance of a discrete probability distribution is computed from the formula:
$$\sigma^2 = \sum [(x - \mu)^2 P(x)] = (x_1 - \mu)^2 P(x_1) + (x_2 - \mu)^2 P(x_2) + \dots + (x_n - \mu)^2 P(x_n)$$
This is similar to the formula for the grouped variance, in which relative frequencies take the place of P(x).

EXAMPLE 2
• Dan Desch, owner of College Painters, studied his records for the past 20 weeks and reports the following number of houses painted per week:

Number of houses painted, x    Weeks    Probability, P(x)
10                             5        .25
11                             6        .30
12                             7        .35
13                             2        .10
Total                          20       1.00

EXAMPLE 2 continued

x        P(x)
10       .25
11       .30
12       .35
13       .10
Total    1.00

• Compute the mean and variance of the number of houses painted per week:
$$\mu = E(x) = \sum [x P(x)] = (10)(.25) + (11)(.30) + (12)(.35) + (13)(.10) = 11.3$$
$$\sigma^2 = \sum [(x - \mu)^2 P(x)] = (10 - 11.3)^2 (.25) + \dots + (13 - 11.3)^2 (.10) = 0.4225 + 0.0270 + 0.1715 + 0.2890 = 0.91$$

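The arithmetic of Example 2 can be checked with a few lines of code. This is a minimal sketch that applies the mean and variance formulas directly; the helper names `mean` and `variance` and the dictionary `dist` are illustrative choices, not part of the lecture.

```python
import math

# Distribution from Example 2: houses painted per week -> probability.
dist = {10: 0.25, 11: 0.30, 12: 0.35, 13: 0.10}

def mean(dist):
    """Weighted average of the outcomes: sum of x * P(x)."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Probability-weighted squared deviation from the mean."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

# A valid discrete distribution must have probabilities summing to 1.
assert math.isclose(sum(dist.values()), 1.0)

mu = mean(dist)
var = variance(dist)
sd = math.sqrt(var)  # standard deviation is the square root of the variance
print(f"mean={mu:.2f}  variance={var:.2f}  std dev={sd:.4f}")
```

The standard deviation, which the slide does not compute, comes out at roughly 0.95 houses per week.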
Binomial Probability Distribution
• The binomial distribution has the following characteristics:
  • An outcome of an experiment is classified into one of two mutually exclusive categories, such as a success or failure.
  • The data collected are the results of counts in a series of trials.
  • The probability of success stays the same for each trial.
  • The trials are independent.
• For example, tossing an unfair coin three times:
  • H is labeled success and T is labeled failure.
  • The data collected are the number of H in the three tosses.
  • The probability of H stays the same for each toss.
  • The results of the tosses are independent.

Binomial Probability Distribution
• To construct a binomial distribution, let
  • n be the number of trials
  • x be the number of observed successes
  • $\pi$ be the probability of success on each trial
• The formula for the binomial probability distribution is:
$$P(x) = {}_{n}C_{x}\, \pi^{x} (1 - \pi)^{n - x}$$

[Figure: The density functions of binomial distributions with n = 20 and different success rates p]

Binomial Probability Distribution
• The formula for the binomial probability distribution is:
$$P(x) = {}_{n}C_{x}\, \pi^{x} (1 - \pi)^{n - x}$$
• Possible outcomes of three coin tosses:
  TTT, TTH, THT, THH,
  HTT, HTH, HHT, HHH.
• X = number of heads
• The coin is fair, i.e., P(head) = 1/2.
  • P(x=0) = 1/8
  • P(x=1) = 3/8
  • P(x=2) = 3/8
  • P(x=3) = 1/8
• Please verify these probabilities, obtained by simple counting, against the formula for the binomial probability distribution.
• When the coin is not fair, the simple counting rule no longer works.

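The verification requested on this slide can be sketched as follows. The function name `binomial_pmf` is an illustrative choice, and the success probability 0.7 used for the unfair coin is a hypothetical value that does not come from the lecture.

```python
from math import comb

def binomial_pmf(x, n, pi):
    """P(x) = nCx * pi**x * (1 - pi)**(n - x)."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)

# Fair coin, three tosses: the formula should match simple counting (1/8, 3/8, 3/8, 1/8).
fair = [binomial_pmf(x, 3, 0.5) for x in range(4)]
print(fair)

# Unfair coin (hypothetical P(head) = 0.7): the eight outcomes are no longer
# equally likely, so counting fails, but the formula still applies.
unfair = [binomial_pmf(x, 3, 0.7) for x in range(4)]
print(unfair)
```

For the unfair coin the probabilities still sum to 1, but they are no longer symmetric around 1.5 heads.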
EXAMPLE 3
The Alabama Department of Labor reports that 20% of the workforce in Mobile is unemployed. From a sample of 14 workers, calculate the following probabilities:
• Exactly three are unemployed.
• At least three are unemployed.
• At least one is unemployed.

EXAMPLE 3 continued
The Alabama Department of Labor reports that 20% of the workforce in Mobile is unemployed. From a sample of 14 workers:
• The probability of exactly 3:
$$P(3) = {}_{14}C_{3} (.20)^3 (1 - .20)^{11} = (364)(.0080)(.0859) = .2501$$
• The probability of at least 3:
$$P(x \ge 3) = {}_{14}C_{3} (.20)^3 (.80)^{11} + \dots + {}_{14}C_{14} (.20)^{14} (.80)^{0} = .250 + .172 + \dots + .000 \approx .552$$

EXAMPLE 3 continued
The Alabama Department of Labor reports that 20% of the workforce in Mobile is unemployed. From a sample of 14 workers:
• The probability of at least one being unemployed:
$$P(x \ge 1) = 1 - P(0) = 1 - {}_{14}C_{0} (.20)^{0} (1 - .20)^{14} = 1 - .044 = .956$$

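The three probabilities in Example 3 can be computed directly from the binomial formula. This is a minimal sketch; `binomial_pmf` and the variable names are illustrative, and the "at least" cases are obtained by summing or by taking the complement.

```python
from math import comb

def binomial_pmf(x, n, pi):
    """P(x) = nCx * pi**x * (1 - pi)**(n - x)."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)

n, pi = 14, 0.20  # sample size and unemployment rate from Example 3

p_exactly_3 = binomial_pmf(3, n, pi)
# "At least 3" sums the probabilities for x = 3, 4, ..., 14.
p_at_least_3 = sum(binomial_pmf(x, n, pi) for x in range(3, n + 1))
# "At least 1" is easiest as the complement of "none".
p_at_least_1 = 1 - binomial_pmf(0, n, pi)

print(f"{p_exactly_3:.4f} {p_at_least_3:.4f} {p_at_least_1:.4f}")
```

Summing unrounded terms gives about 0.5519 for "at least 3"; adding terms that were each rounded to three decimals gives a slightly smaller figure.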
Mean & Variance of the Binomial Distribution
• The mean is found by:
$$\mu = n\pi$$
• The variance is found by:
$$\sigma^2 = n\pi(1 - \pi)$$

EXAMPLE 3 continued
• From EXAMPLE 3, recall that $\pi = .2$ and $n = 14$.
• Hence, the mean is:
  $\mu = n\pi = 14(.2) = 2.8$.
• The variance is:
  $\sigma^2 = n\pi(1 - \pi) = (14)(.2)(.8) = 2.24$.

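As a cross-check, the shortcut formulas for the binomial mean and variance should agree with the general definitions applied to the full distribution. The sketch below makes that comparison for n = 14 and a success probability of 0.2; the variable names are illustrative.

```python
from math import comb

n, pi = 14, 0.20

# Full binomial distribution: pmf[x] = P(x) for x = 0, ..., n.
pmf = [comb(n, x) * pi**x * (1 - pi) ** (n - x) for x in range(n + 1)]

# General definitions for any discrete distribution.
mean_by_sum = sum(x * p for x, p in enumerate(pmf))
var_by_sum = sum((x - mean_by_sum) ** 2 * p for x, p in enumerate(pmf))

# Binomial shortcuts.
mean_shortcut = n * pi
var_shortcut = n * pi * (1 - pi)

print(mean_by_sum, mean_shortcut)
print(var_by_sum, var_shortcut)
```

Both routes give 2.8 and 2.24 up to floating-point rounding.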
Example 4
x = number of patients who will experience nausea following treatment with Phe-Mycin
n = 4, p = 0.1, q = 1 - p = 1 - 0.1 = 0.9
Find the probability that 2 of the 4 patients treated will experience nausea.
$$p(2) = P(x = 2) = \frac{4!}{2!(4-2)!} (0.1)^2 (0.9)^{4-2} = 6 (0.1)^2 (0.9)^2 = 0.0486$$

[Figure: Example: Binomial Distribution, n = 4, p = 0.1]
[Figure: Several Binomial Distributions]

Finite Population
• A finite population is a population consisting of a fixed number of known individuals, objects, or measurements. Examples include:
  • The number of students in this class.
  • The number of cars in the parking lot.
  • The number of homes built in Blackmoor.

Hypergeometric Distribution
• The hypergeometric distribution has the following characteristics:
  • There are only 2 possible outcomes.
  • The probability of a success is not the same on each trial.
  • It results from a count of the number of successes in a fixed number of trials.

EXAMPLE 8 of last lecture
In a bag containing 7 red chips and 5 blue chips you select 2 chips one after the other without replacement.

  First chip     Second chip
  R1 (7/12)      R2 (6/11)
                 B2 (5/11)
  B1 (5/12)      R2 (7/11)
                 B2 (4/11)

The probability of a success (red chip) is not the same on each trial.

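The branch probabilities of the tree can be multiplied out to get the probability of each two-chip sequence. This is a minimal sketch using exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

red, blue = 7, 5
total = red + blue

# Multiply along each branch; the second-draw probability depends on the first draw
# because the first chip is not put back.
p_rr = Fraction(red, total) * Fraction(red - 1, total - 1)    # 7/12 * 6/11
p_rb = Fraction(red, total) * Fraction(blue, total - 1)       # 7/12 * 5/11
p_br = Fraction(blue, total) * Fraction(red, total - 1)       # 5/12 * 7/11
p_bb = Fraction(blue, total) * Fraction(blue - 1, total - 1)  # 5/12 * 4/11

# Distribution of the number of red chips in the sample of 2.
num_red = {0: p_bb, 1: p_rb + p_br, 2: p_rr}
print(num_red)
```

The resulting distribution of the number of red chips is a hypergeometric distribution with a population of 12, of which 7 are successes, and a sample of 2.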
Hypergeometric Distribution
• The formula for finding a probability using the hypergeometric distribution is:
$$P(x) = \frac{({}_{S}C_{x})({}_{N-S}C_{n-x})}{{}_{N}C_{n}}$$
where N is the size of the population, S is the number of successes in the population, and x is the number of successes in a sample of n observations.

Hypergeometric Distribution
• Use the hypergeometric distribution to find the probability of a specified number of successes or failures if:
  • the sample is selected from a finite population without replacement (recall that a criterion for the binomial distribution is that the probability of success remains the same from trial to trial), and
  • the size of the sample n is greater than 5% of the size of the population N.

[Figure: The density functions of hypergeometric distributions with N = 100, n = 20 and different success rates p (= S/N)]

EXAMPLE 5
• The National Air Safety Board has a list of 10 reported safety violations. Suppose only 4 of the reported violations are actual violations and the Safety Board will only be able to investigate five of the violations. What is the probability that three of the five violations randomly selected to be investigated are actually violations?
$$P(3) = \frac{({}_{4}C_{3})({}_{10-4}C_{5-3})}{{}_{10}C_{5}} = \frac{({}_{4}C_{3})({}_{6}C_{2})}{{}_{10}C_{5}} = \frac{4(15)}{252} = .238$$

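Example 5 can be reproduced with a small function for the hypergeometric formula. This is a minimal sketch; the name `hypergeometric_pmf` and its argument order are illustrative choices rather than anything defined in the lecture.

```python
from math import comb

def hypergeometric_pmf(x, N, S, n):
    """P(x) = (S C x) * (N-S C n-x) / (N C n).

    N: population size, S: successes in the population,
    n: sample size, x: successes in the sample.
    """
    return comb(S, x) * comb(N - S, n - x) / comb(N, n)

# Example 5: 10 reported violations, 4 actual, 5 investigated, 3 of them actual.
p_three = hypergeometric_pmf(3, N=10, S=4, n=5)
print(round(p_three, 3))

# The probabilities over all feasible x (0 to 4 actual violations) sum to 1.
total = sum(hypergeometric_pmf(x, N=10, S=4, n=5) for x in range(5))
print(total)
```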
Poisson Probability Distribution
The formula for the binomial probability distribution is:
$$P(x) = {}_{n}C_{x}\, \pi^{x} (1 - \pi)^{n - x}$$
• The binomial distribution becomes more skewed to the right (positive) as the probability of success becomes smaller.
• The limiting form of the binomial distribution where the probability of success $\pi$ is small and n is large is called the Poisson probability distribution.

Poisson Probability Distribution
The Poisson distribution can be described mathematically using the formula:
$$P(x) = \frac{\mu^{x} e^{-\mu}}{x!}$$
where $\mu$ is the mean number of successes in a particular interval of time, e is the constant 2.71828, and x is the number of successes.

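The idea that the Poisson distribution is a limiting form of the binomial can be illustrated numerically. The sketch below holds the mean fixed while the number of trials grows and the success probability shrinks; the mean of 2.0 and the trial counts are hypothetical values chosen only for illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, pi):
    """P(x) = nCx * pi**x * (1 - pi)**(n - x)."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)

def poisson_pmf(x, mu):
    """P(x) = mu**x * e**(-mu) / x!."""
    return mu**x * exp(-mu) / factorial(x)

mu = 2.0  # hypothetical mean, held fixed so that n * pi = mu
gaps = []
for n in (10, 100, 10_000):
    pi = mu / n  # success probability shrinks as n grows
    # Largest difference between the two distributions over x = 0, ..., 10.
    gap = max(abs(binomial_pmf(x, n, pi) - poisson_pmf(x, mu)) for x in range(11))
    gaps.append(gap)
    print(f"n={n:>6}  pi={pi:.4f}  max gap={gap:.6f}")
```

The maximum gap shrinks as n increases, which is what "limiting form" means here.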
Poisson Probability Distribution
• The mean number of successes $\mu$ can be determined in binomial situations by $n\pi$, where n is the number of trials and $\pi$ the probability of a success.
• The variance of the Poisson distribution is also equal to $n\pi$.
• X, the number of successes, generally has no specific upper limit.
• The probability distribution is always skewed to the right.
• It becomes more nearly symmetrical as $\mu$ gets large.

EXAMPLE 6
• The Sylvania Urgent Care facility specializes in caring for minor injuries, colds, and flu. For the evening hours of 6-10 PM the mean number of arrivals is 4.0 per hour. What is the probability of 2 arrivals in an hour?
$$P(2) = \frac{\mu^{x} e^{-\mu}}{x!} = \frac{4^{2} e^{-4}}{2!} = .1465$$

Example: Poisson Probabilities
x = number of Cleveland air traffic control errors during one week
$\mu$ = 0.4 (expected number of errors per week)
Find the probability that 3 errors will occur in a week.
$$p(3) = P(x = 3) = \frac{e^{-0.4} (0.4)^{3}}{3!} = .0072$$

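Both Poisson examples can be checked with one short function. This is a minimal sketch; `poisson_pmf` and the variable names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(x) = mu**x * e**(-mu) / x!, where mu is the mean count per interval."""
    return mu**x * exp(-mu) / factorial(x)

# Urgent care example: mean of 4.0 arrivals per hour, probability of exactly 2.
p_two_arrivals = poisson_pmf(2, 4.0)

# Air traffic control example: mean of 0.4 errors per week, probability of exactly 3.
p_three_errors = poisson_pmf(3, 0.4)

print(f"{p_two_arrivals:.4f} {p_three_errors:.4f}")
```

Note that the mean must refer to the same interval as the question: 4.0 per hour for an hourly count and 0.4 per week for a weekly count.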
[Figure: Example: Poisson Distribution, μ = 0.4]

Mean and Variance of a Poisson Random Variable
If x is a Poisson random variable with parameter $\mu$, then
Mean: $\mu_x = \mu$
Variance: $\sigma_x^2 = \mu$
Standard Deviation: $\sigma_x = \sqrt{\sigma_x^2} = \sqrt{\mu}$

[Figure: Several Poisson Distributions]

What distributions to use?
• Poisson considers the number of times an event occurs over an INTERVAL of TIME or SPACE. Note that we are not considering a sample of a given number of observations.
  • Thus, if we are considering a sample of 10 observations and we are asked to compute the probability of having 6 successes, we should not use Poisson. Instead, we should consider Binomial or Hypergeometric.
• Hypergeometric considers the number of successes in a sample when the probability of success varies across trials because sampling is done without replacement. To compute the Hypergeometric probability, one needs to know N and S separately.
  • Suppose we know that the probability of success is 0.3. We are considering a sample of 10 observations and we are asked to compute the probability of having 6 successes. We cannot use Hypergeometric because we do not have N and S separately. Instead, we have to use Binomial.

What distributions to use?
Example
In a shipment of 15 hard disks, 5 are defective. If 4 of the disks are inspected, what is the probability that exactly 1 is defective?
• First, we recognize that it is not Poisson because "4 of the disks are inspected" (i.e., sample size = 4).
• Second, it is sampling without replacement because, when inspecting four disks for defects, we would not want to inspect the same disk twice.
• Third, both N (15 hard disks) and S (5 are defective) are given. Hence we will use Hypergeometric.

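The slide stops at the choice of distribution. A minimal sketch of the resulting calculation follows, using the hypergeometric formula with N = 15, S = 5, n = 4 and x = 1. The binomial figure is included only as a hypothetical comparison, showing what would result if the disks were wrongly treated as sampled with replacement.

```python
from math import comb

N, S, n, x = 15, 5, 4, 1  # shipment size, defectives, disks inspected, defectives found

# Hypergeometric: sampling without replacement from a finite population.
p_hypergeometric = comb(S, x) * comb(N - S, n - x) / comb(N, n)

# Hypothetical comparison: binomial with pi = S / N, i.e. pretending the
# probability of a defective stays constant from draw to draw.
pi = S / N
p_binomial = comb(n, x) * pi**x * (1 - pi) ** (n - x)

print(f"hypergeometric={p_hypergeometric:.4f}  binomial={p_binomial:.4f}")
```

The two answers differ noticeably (roughly 0.44 versus 0.40) because the sample of 4 is well above 5% of the population of 15.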
Chapter Four
Discrete Probability Distributions
- END -