Download Notes Ch. 4

Document related concepts

Statistics wikipedia , lookup

History of statistics wikipedia , lookup

Probability wikipedia , lookup

Transcript
Chapter 4:
Commonly Used Distributions
1
Section 4.1:
The Bernoulli Distribution
We use the Bernoulli distribution when we have an
experiment which can result in one of two outcomes.
One outcome is labeled “success,” and the other
outcome is labeled “failure.”
The probability of a success is denoted by p. The
probability of a failure is then 1 – p.
Such a trial is called a Bernoulli trial with success
probability p.
2
X ~ Bernoulli(p)
For any Bernoulli trial, we define a random variable X
as follows: If the experiment results in a success, then X
= 1. Otherwise, X = 0. It follows that X is a discrete
random variable, with probability mass function p(x)
defined by:
x 1
 p

p( x)  P( X  x)  1  p
x0
 0
otherwise

3
Figure 4.1
Bernoulli(p=0.5)
Bernoulli(p=0.8)
Examples 1 and 2
1. Toss a coin. If we define heads to be the success,
then p is the probability that the coin comes up
heads. For a fair coin, p = 1/2.
2. Select a component from a population of
components, some of which are defective. Define
success to be a defective component, then p is the
proportion of defective components in the
population.
3. Roll a six-sided die. Let success be when you roll a
4.
5
Mean and Variance
If X ~ Bernoulli(p), then
1

  E[ X ]   x p( x)
i 0
X = 0(1- p) + 1(p) = p
1

  E[( X   ) ]   ( x   ) p( x)
2
2
2
.
i 0
  (0  p) (1  p)  (1  p) ( p)  p(1  p)
2
X
2
2
6
Example 3
Ten percent of components manufactured by a certain
process are defective. A component is chosen at
random. Let X = 1 if the component is defective, and
X = 0 otherwise.
1. What is the distribution of X?
2. Find the mean and variance of X?
7
Section 4.2:
The Binomial Distribution
If a total of n Bernoulli trials are conducted, and
 The trials are independent.
 Each trial has the same success probability p.
 X is the number of successes in the n trials.
then X has the binomial distribution with parameters n
and p, denoted X ~ Bin(n,p).
In other words, X counts the number of successes in n
independent trials (experiments)
8
Binomial R.V.:
pmf, mean, and variance
 If X ~ Bin(n, p), the probability mass function of X
is (derive):
n!

x
n x
p
(1

p
)
, x  0,1,..., n

p( x)  P( X  x)   x!(n  x)!
0, otherwise

 Mean (derive): X = np
 Variance (derive):   np(1  p)
2
X
9
More on the Binomial
• Assume n independent Bernoulli trials are conducted.
• Each trial has probability of success p.
• Let Y1, …, Yn be defined as follows: Yi = 1 if the ith
trial results in success, and Yi = 0 otherwise. (Each of
the Yi has the Bernoulli(p) distribution.)
• Now, let X represent the number of successes among
the n trials. So, X = Y1 + …+ Yn .
This shows that a binomial random variable can be
expressed as a sum of Bernoulli random variables.
It also helps us calculate the mean and variance of a
Binomial RV
10
Figure 4.2
X ~ Bin(10,0.4)
X ~ Bin(20,0.1)
Example 6
A large industrial firm allows a discount on any invoice
that is paid within 30 days. Of all invoices, 10% receive
the discount. In a company audit, 12 invoices are
sampled at random. What is the probability that fewer
than 4 of the 12 sampled invoices receive the discount?
• Assume that invoices are independent of one another
12
Another Use of the Binomial
Assume that a finite population contains items of two
types, successes and failures, and that a simple random
sample is drawn from the population. Then if the
sample size is no more than 5% of the population, the
binomial distribution may be used to model the number
of successes.
13
Example 5
A lot contains several thousand components, 10% of which are
defective. 15 components are sampled from the lot. Let X
represent the number of defective components in the sample.
• What is the distribution of X?
• How many components do we expect to be defective?
• What is that standard deviation of the number of defective
components?
• A lot is only accepted if no defects are found in the inspected
items. How many components should be inspected so that the
chance of accepting a lot when the lot contains 10% defective
items is less than 5%?
14
Estimate of p
To estimate p, conduct n independent trials and let X =
the number of success.
If X ~ Bin(n, p), then the sample proportion pˆ  X / n is
used to estimate the (true) success probability p.
Bias is the difference: pˆ  p
An estimator is unbiased if:
E[ bias ] = 0
E[estimator - parameter estimating] = 0 or
E[estimator] = parameter estimating
15
Estimate of p
Is the sample proportion pˆ  X / n an unbiased
estimator of p?
 p  E[ pˆ ]  (derive)  p
The uncertainty (standard error) in p̂ is:
(derive)  pˆ 
p (1  p )
.
n
In practice, when computing , we substitute p̂ for p,
since p is unknown.
16
Example 7
In a sample of 100 newly manufactured automobile
tires, 7 are found to have minor flaws on the tread. If
four newly manufactured tires are selected at random
and installed on a car, estimate the probability that none
of the four tires have a flaw, and find the uncertainty in
this estimate.
17
Section 4.3:
The Poisson Distribution
 One way to think of the Poisson distribution is as
an approximation to the binomial distribution when n
is large and p is small.
 We can approximate the binomial mass function with
a quantity λ = np; this λ is the parameter in the
Poisson distribution.
 x
n
  x
e 
n x
lim   p (1  p) 
, x  0,1,
n  x
x
!


np
18
Poisson R.V.:
pmf, mean, and variance
If X ~ Poisson(λ), the probability mass function of X is
 e   x
, for x = 0, 1, 2, ...

p ( x)  P( X  x)   x !
0, otherwise
 Mean: X = λ
 Variance:  X2  
Note: X must be a discrete random variable and λ must
be a positive constant.
19
Figure 4.3
Poisson(1)
Poisson(10)
Example 8
Particles are suspended in a liquid medium at a concentration of 6
particles per mL. A large volume of the suspension is thoroughly
agitated, and then 3 mL are withdrawn. What is the probability
that exactly 15 particles are withdrawn?
•  is thought of as a rate, the rate that event X occurs per some
unit, on average (e.g. 6 particles per mL).
• However, to choose the value of lambda properly, we must take
into account the units of the question. Here, the question is,
P(15 particles in 3 mL)
• So, what is the average rate of particles per 3 mL?
21
Example 9
A machine randomly drops Hershey’s Kisses into a crate of 50
Easter baskets. How many kisses should the machine drop so that
the probability of selecting a random basket without any kisses is
0.03 or less?
• What is probability question?
• What is the rate?
22
Section 4.4:
Some Other Discrete Distributions
• Consider a finite population containing two types of
items, which may be called successes and failures.
• A simple random sample is drawn from the population.
• Each item sampled constitutes a Bernoulli trial.
• As each item is selected, the probability of successes in
the remaining population decreases or increases,
depending on whether the sampled item was a success or
a failure.
• This scenario is sometimes called Sampling without
replacement.
23
Hypergeometric
• The trials are not independent, so the number of
successes in the sample does not follow a
binomial distribution.
• The distribution that properly describes the
number of successes is the hypergeometric
distribution.
24
Hypergeometric pmf
Assume a finite population contains N items, of which R are
classified as successes and N – R are classified as failures.
Assume that n items are sampled from this population, and let X
represent the number of successes in the sample. Then X has a
hypergeometric distribution with parameters N, R, and n, which
can be denoted X ~ H(N, R, n). The probability mass function of
X is
  R  N  R 
  

x
n

x
 , if max(0, R  n  N )  x  min(n, R)
  
p( x)  P( X  x)  
N
n 

 

0, otherwise
25
Mean and Variance
If X ~ H(N, R, n), then
nR
Mean of X:  X 
N
R  N  n 
 R 
Variance of X:   n  1  

 N  N  N  1 
2
X
26
Example 10
• Of 50 buildings in an industrial park, 12 have electrical code
violations. If 10 buildings are selected at random for inspection,
what is the probability that exactly 3 of the 10 have code
violations? What is the mean and variance of X?
• Pick 5 cards from a standard deck, (a) what is distribution for
the number of 8’s that are drawn? (b) What is the probability of
drawing exactly two aces?
• There are 5 cokes, 5 sprites, 3 pepsi’s, and 2 dr.pepper’s in a
cooler. Pick 3 drinks at random, (a) what is distribution for the
number of dr. peppers that are drawn? (b) What is the probability
of drawing exactly one dr. pepper?
27
Geometric Distribution
• Assume that a sequence of independent Bernoulli
trials is conducted, each with the same probability of
success, p.
• Let X represent the number of trials up to and
including the first success.
• Then X is a discrete random variable, which is said to
have the geometric distribution with parameter p.
• We write X ~ Geom(p).
28
Geometric R.V.:
pmf, mean, and variance
If X ~ Geom(p), then
 The pmf of X is
 p(1  p) x 1 , x  1,2,...
p ( x)  P( X  x)  
0, otherwise
The mean of X is
1
X 
p
.
1 p
The variance of X is   2 .
p
2
X
29
Example 11
Products are inspected on an assembly line. It is known
that approximately 10% of products have a defect. What
is the probability that the first defect occurs on the 5th
inspection.
1. Define a RV
2. What is the distribution of X?
3. Answer the question
30
Section 4.8:
The Uniform Distribution
The uniform distribution has two parameters, a and b,
with a < b. If X is a random variable with the
continuous uniform distribution then it is uniformly
distributed on the interval (a, b). We write X ~ U(a,b).
The pdf is
 1
, a xb

f ( x)   b  a
0, otherwise
31
Mean and Variance
If X ~ U(a, b).
Then the mean is (derive):
ab
μX 
2
and the variance is (derive):
(b  a )
σ 
.
12
2
2
X
32
Example 19
When a motorist stops at a red light at a certain
intersection, the waiting time for the light to turn green,
in seconds, is uniformly distributed on the interval
(0, 30). Find the probability that the waiting time is
between 10 and 15 seconds.
33
Section 4.5:
The Normal Distribution
The normal distribution (also called the Gaussian
distribution) is by far the most commonly used
distribution in statistics. This distribution provides a
good model for many, although not all, continuous
populations.
The normal distribution is continuous rather than
discrete. The mean of a normal population may have
any value, and the variance may have any positive
value.
34
Normal R.V.:
pdf, mean, and variance
The probability density function of a normal population
with mean  and variance 2 is given by
1
f ( x) 
e  ( x   ) / 2 ,    x  
 2
2
2
If X ~ N(, 2), then the mean and variance of X are
given by
X  
 X2   2
35
68-95-99.7% Rule
This figure represents a plot of the normal probability density
function with mean  and standard deviation . Note that the
curve is symmetric about , so that  is the median as well as the
mean. It is also the case for the normal population.
About 68% of the population is in the interval   .
About 95% of the population is in the interval   2.
About 99.7% of the population is in the interval   3.
36
Standard Units
•The proportion of a normal population that is within a
given number of standard deviations of the mean is the
same for any normal population.
•For this reason, when dealing with normal populations,
we often convert from the units in which the population
items were originally measured to standard units.
•Standard units tell how many standard deviations an
observation is from the population mean.
37
Standard Normal Distribution
In general, we convert to standard units by subtracting
the mean and dividing by the standard deviation. Thus,
if x is an item sampled from a normal population with
mean  and variance 2, the standard unit equivalent of
x is the number z, where
z = (x - )/.
The number z is sometimes called the “z-score” of x.
The z-score is an item sampled from a normal
population with mean 0 and standard deviation of 1.
This normal distribution is called the standard normal
distribution.
38
Example 13
Aluminum sheets used to make beverage cans have
thicknesses that are normally distributed with mean 10
and standard deviation 1.3. A particular sheet is 10.8
thousandths of an inch thick. Find the z-score.
39
Example 13 cont.
The thickness of a certain sheet has a z-score of -1.7.
Find the thickness of the sheet in the original units of
thousandths of inches.
40
Finding Areas Under the Normal Curve
The proportion of a normal population that lies within a given
interval is equal to the area under the normal probability density
above that interval. This would suggest integrating the normal
pdf, but this integral does not have a closed form solution.
So, the areas under the curve are approximated numerically and
are available in Table A.2. (See website link for the normal
table)This table provides area under the curve for the standard
normal density. We can convert any normal into a standard
normal so that we can compute areas under the curve.
The table gives the area in the left-hand tail of the curve. Other
areas can be calculated by subtraction or by using the fact that the
total area under the curve is 1.
41
Example 14
Find the area under normal curve to the left of z = 0.47.
Find the area under the curve to the right of z = 1.38.
42
Example 15
Find the area under the normal curve between z = 0.71
and z = 1.28.
What z-score corresponds to the 75th percentile of a
normal curve?
43
Example 15b
SAT Math scores are Normally distributed with a mean
of 550 and standard deviation 27.
a) What is the probability that a student scores less than
610? More than 500? Between 510 and 610?
b) Suppose administrators want to use the SAT score to
determine if a student is “at risk”. At risk, are students
in the lower 7th percentile. What scores should be used
as a cut-off for admission to the “at risk” group.
c) If you select 10 people at random, what is the
probability that exactly 3 people scored 610 or higher?
44
Estimating the Parameters
If X1,…,Xn are a random sample from a N(,2)
distribution,  is estimated with the sample mean and 2
is estimated with the sample variance.
As with any sample mean, the uncertainty in
X is  / n which we replace with s / n , if  is
unknown. The mean is an unbiased estimator of .
45
Linear Functions of Normal Random
Variables
Let X ~ N(, 2) and let a ≠ 0 and b be constants.
Then aX + b ~ N(a + b, a22).
Let X1, X2, …, Xn be independent and normally distributed with
means 1, 2,…, n and variances σ12 , σ 22 ,K , σ n2 . Let c1, c2,…, cn
be constants, and c1 X1 + c2 X2 +…+ cnXn be a linear combination.
Then
c1 X1 + c2 X2 +…+ cnXn
~ N(c11 + c2 2 +…+ cnn, ,c12σ12  c22σ 22  L  cn2σ n2 )
46
Example 16
A chemist measures the temperature of a solution in oC.
The measurement is denoted C, and is normally
distributed with mean 40oC and standard deviation 1oC.
oF by the equation F =
The measurement is converted
to
to oF
1.8C + 32.
What is the distribution of F?
What is the probability that a randomly selected sample
is less than 104.5 oF
47
Distributions of Functions of Normals
Let X1, X2, …, Xn be independent and normally distributed with
mean  and variance 2. Then
 σ2 
X ~ N  μ,  .
 n 
Let X and Y be independent, with X ~ N(X,
2
Y ~ N(Y, σY ). Then
σ X2 ) and
X  Y ~ N ( μ X  μY , σ  σ )
2
X
2
Y
X  Y ~ N ( μ X  μY , σ  σ )
2
X
2
Y
48
Example 16b
A length of board is Normally distributed with mean 20
cm and standard deviation 4 cm. Select 16 boards at
random.
What is the probability distribution for the average
length of these 16 boards?
What is the probability that the average board length is
more than 21.3 cm?
49
Example 16c
A length of board that is Normally distributed with
mean 20 cm and standard deviation 4 cm is joined endto-end with another board that is Normally distributed
with mean 30 cm and standard deviation 3 cm. Find the
probability that the total length of such a component is
less than 45 cm.
50
Section 4.7:
The Exponential Distribution
The exponential distribution is a continuous
distribution that is sometimes used to model the time
that elapses before an event occurs. Such a time is often
called a waiting time.
The probability density of the exponential distribution
involves a parameter, which is a positive constant λ
whose value determines the density function’s location
and shape.
We write X ~ Exp(λ).
51
Exponential R.V.:
pdf, cdf, mean and variance
The pdf of an exponential r.v. is
 e   x , x  0
f ( x)  
.
0, otherwise
The cdf of an exponential r.v. is
0, x  0
F ( x)  
.
x
1  e , x  0
The mean of an exponential r.v. is
 X  1/  .
The variance of an exponential r.v. is
  1/  .
2
X
2
52
Figure 4.17
Example 17
The lifetime of a component (in months) is
exponentially distributed with lambda=0.2.
1. What is the probability that a randomly select
component lasts longer that 8 months?
2.Between 4 and 10 months?
3.What is the mean lifetime?
4.What is the standard deviation of the lifetime?
5.What is the 3rd quartile of lifetimes?
54
Example 17b
The time between trains is exponentially distributed
with mean 5 minutes. A train just departed, what is the
probability that the next train arrives in less than a
minute?
1. What is the probability that a randomly select
component lasts longer that 8 months?
2.Between 4 and 10 months?
3.What is the mean lifetime?
4.What is the standard deviation of the lifetime?
5.What is the 3rd quartile of lifetimes?
55
Lack of Memory Property
Interesting, but not responsible for.
The exponential distribution has a property known as the lack of
memory property: If T ~ Exp(λ), and t and s are positive
numbers, then P(T > t + s | T > s) = P(T > t).
Suppose the time between trains is exponentially distributed with
mean 5 minutes. Given that you have waited for 3 minutes, what
is the probability that takes less than a minute from now to
arrive? The answer is that it is simply the probability that the train
arrives in less than a minute. In other words, it doesn’t matter
how long you have waited.
56
Section 4.11: The Central Limit Thereom
The Central Limit Theorem
Let X1,…,Xn be a random sample from a population with mean  and variance
2 .
Let X 
X1  L  X n
be the sample mean.
n
Let Sn = X1+…+Xn be the sum of the sample observations. Then if n is
sufficiently large,
 2 

X ~ N   ,
n 

and S n ~ N (n , n ) approximately.
2
For most populations, if the sample size is greater than 30, the Central Limit
Theorem approximation is good.
57
Example 22
The manufacturer of a certain part requires two different
machine operations. The time on machine 1 has mean
0.4 hours and standard deviation 0.1 hours. The time on
machine 2 has mean 0.45 hours and standard deviation
0.15 hours. The times needed on the machines are
independent. Suppose that 65 parts are manufactured.
What is the distribution of the total time on machine 1?
On machine 2? What is the probability that the total
time used by both machines together is between 50 and
55 hours?
58