Chapter 7
The Normal Probability Distribution
© 2010 Pearson Prentice Hall. All rights reserved
Section 7.1 Properties of the Normal Distribution
EXAMPLE
Illustrating the Uniform Distribution
Suppose that United Parcel Service is supposed to deliver a
package to your front door and the arrival time is somewhere
between 10 am and 11 am. Let the random variable X
represent the time from 10 am when the delivery is supposed
to take place. The delivery could be at 10 am (x = 0) or at 11
am (x = 60) with all 1-minute intervals of time between x = 0
and x = 60 equally likely. That is to say your package is just as
likely to arrive between 10:15 and 10:16 as it is to arrive
between 10:40 and 10:41. The random variable X can be any
value in the interval from 0 to 60, that is, 0 ≤ X ≤ 60. Because
any two intervals of equal length between 0 and 60, inclusive,
are equally likely, the random variable X is said to follow a
uniform probability distribution.
The graph below illustrates the properties for the “time”
example. Notice the area of the rectangle is one and the graph
is greater than or equal to zero for all x between 0 and 60,
inclusive.
Because the area of a rectangle is height times width, and the
width of the rectangle is 60, the height must be 1/60.
Values of the random variable X less than 0 or greater than
60 are impossible, thus the equation must be zero for X
less than 0 or greater than 60.
The area under the graph of the density function over an
interval represents the probability of observing a value of the
random variable in that interval.
EXAMPLE
Area as a Probability
The probability of choosing a time that is between 15 and 30
seconds after the minute is the area under the uniform density
function.
Area = P(15 < x < 30) = 15/60 = 0.25
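The uniform density and the area calculation can be sketched in a few lines of Python. The function names `uniform_density` and `uniform_probability` are illustrative choices, not part of the slides, and the defaults assume the 0-to-60 interval used in the example.

```python
def uniform_density(x, low=0.0, high=60.0):
    """Height of the uniform density: 1/(high - low) on [low, high], 0 elsewhere."""
    return 1.0 / (high - low) if low <= x <= high else 0.0


def uniform_probability(a, b, low=0.0, high=60.0):
    """P(a < X < b): width of the part of (a, b) inside [low, high], times the height."""
    left, right = max(a, low), min(b, high)
    return max(right - left, 0.0) / (high - low)


print(uniform_density(20))          # height of the rectangle, 1/60
print(uniform_density(75))          # 0.0, outside the interval
print(uniform_probability(15, 30))  # 0.25
print(uniform_probability(0, 60))   # 1.0, total area under the density
```

Because the density is a rectangle, the probability is just width times height; no integration is needed.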
Relative frequency histograms that are symmetric and
bell-shaped are said to have the shape of a normal
curve.
If a continuous random variable is normally distributed,
or has a normal probability distribution, then a relative
frequency histogram of the random variable has the
shape of a normal curve (bell-shaped and symmetric).
EXAMPLE
A Normal Random Variable
The data on the next slide represent the heights
(in inches) of a random sample of 50 two-year-old
males.
(a) Draw a histogram of the data using a lower
class limit of the first class equal to 31.5 and a
class width of 1.
(b) Do you think that the variable “height of 2-year-old males” is normally distributed?
36.0  34.7  34.4  33.2  35.1  38.3  37.2  36.2  33.4  35.7
36.1  35.2  33.6  39.3  34.8  37.4  37.9  35.2  34.4  39.8
36.0  38.2  39.3  35.6  36.7  37.0  34.6  31.5  34.0  33.0
36.0  37.2  38.4  37.7  36.9  36.8  36.0  34.8  35.4  36.9
35.1  33.5  35.7  35.7  36.8  34.0  37.0  35.0  35.7  38.9
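As a rough sketch of part (a), the class frequencies can be tallied in Python using the stated lower class limit of 31.5 and class width of 1. The helper name `class_index` and the text-bar output are illustrative assumptions; the slides themselves use a drawn histogram.

```python
heights = [
    36.0, 34.7, 34.4, 33.2, 35.1, 38.3, 37.2, 36.2, 33.4, 35.7,
    36.1, 35.2, 33.6, 39.3, 34.8, 37.4, 37.9, 35.2, 34.4, 39.8,
    36.0, 38.2, 39.3, 35.6, 36.7, 37.0, 34.6, 31.5, 34.0, 33.0,
    36.0, 37.2, 38.4, 37.7, 36.9, 36.8, 36.0, 34.8, 35.4, 36.9,
    35.1, 33.5, 35.7, 35.7, 36.8, 34.0, 37.0, 35.0, 35.7, 38.9,
]
lower, width = 31.5, 1.0  # lower class limit of the first class, class width


def class_index(value):
    """Index of the class containing value; class k covers [31.5 + k, 32.5 + k)."""
    # Work in whole tenths of an inch to avoid floating-point edge effects.
    return (round(value * 10) - round(lower * 10)) // round(width * 10)


counts = {}
for h in heights:
    k = class_index(h)
    counts[k] = counts.get(k, 0) + 1

# Text histogram: one star per observation in each class.
for k in sorted(counts):
    start = lower + k * width
    print(f"{start:.1f}-{start + width:.1f}: {'*' * counts[k]}")
```

The bars rise to a single peak near 36 inches and fall off on both sides, which is the shape part (b) asks about.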
In the next slide, we have a normal density
curve drawn over the histogram. How does
the area of the rectangle corresponding to a
height between 34.5 and 35.5 inches relate
to the area under the curve between these
two heights?
EXAMPLE
Interpreting the Area Under a Normal Curve
The weights of giraffes are approximately normally distributed with mean μ =
2200 pounds and standard deviation σ = 200 pounds.
(a) Draw a normal curve with the parameters labeled.
(b) Shade the area under the normal curve to the left of x = 2100 pounds.
(c) Suppose that the area under the normal curve to the left of x = 2100
pounds is 0.3085. Provide two interpretations of this result.
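(a), (b) [Figure: normal curve with μ = 2200 and σ = 200 labeled and the area to the left of x = 2100 shaded]
(c)
• The proportion of giraffes whose weight is less than 2100 pounds is 0.3085.
• The probability that a randomly selected giraffe weighs less than 2100 pounds is 0.3085.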
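The area quoted in part (c) can be checked numerically. This is a minimal sketch using `NormalDist` from Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import NormalDist

# Giraffe weights: mean 2200 pounds, standard deviation 200 pounds.
giraffe_weight = NormalDist(mu=2200, sigma=200)

# Area under the normal curve to the left of x = 2100.
area_left = giraffe_weight.cdf(2100)
print(f"{area_left:.4f}")  # 0.3085
```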
EXAMPLE
Relation Between a Normal Random Variable and a
Standard Normal Random Variable
The weights of giraffes are approximately normally distributed with mean μ =
2200 pounds and standard deviation σ = 200 pounds. Draw a graph that
demonstrates the area under the normal curve between 2000 and 2300
pounds is equal to the area under the standard normal curve between the
z-scores of 2000 and 2300 pounds.
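The equality of the two areas can also be verified numerically. The sketch below standardizes each weight with z = (x − μ)/σ and compares the area under each curve; it assumes Python's standard `statistics.NormalDist`, and the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 2200, 200
weights = NormalDist(mu, sigma)  # normal curve for giraffe weights
standard = NormalDist()          # standard normal curve (mean 0, sd 1)

# Standardize the two endpoints.
z_low = (2000 - mu) / sigma
z_high = (2300 - mu) / sigma

area_x = weights.cdf(2300) - weights.cdf(2000)
area_z = standard.cdf(z_high) - standard.cdf(z_low)

print(z_low, z_high)                     # -1.0 0.5
print(f"{area_x:.4f}", f"{area_z:.4f}")  # same area under both curves
```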
Section 7.2 The Standard Normal Distribution
The table gives the area under the standard normal curve for
values to the left of a specified Z-score, z₀, as shown in the
figure.
EXAMPLE
Finding the Area Under the Standard Normal Curve
Find the area under the standard normal curve to the left of z = -0.38.
Area left of z = -0.38 is 0.3520.
Area under the normal curve to the
right of z₀ = 1 – Area to the left of z₀
EXAMPLE
Finding the Area Under the Standard Normal Curve
Find the area under the standard normal curve to the right of Z = 1.25.
Area right of 1.25 = 1 – area left of 1.25
= 1 – 0.8944
= 0.1056
EXAMPLE
Finding the Area Under the Standard Normal Curve
Find the area under the standard normal curve between z = -1.02 and z = 2.94.
Area between -1.02 and 2.94 = (Area left of z = 2.94) – (area left of z = -1.02)
= 0.9984 – 0.1539
= 0.8445
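The three table lookups in these examples (area to the left, to the right, and between two z-scores) can be reproduced in software. This is a sketch that uses Python's standard `statistics.NormalDist` in place of the printed table; the helper names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def area_left(z):
    """Area under the standard normal curve to the left of z."""
    return Z.cdf(z)


def area_right(z):
    """Area to the right of z = 1 - area to the left of z."""
    return 1 - Z.cdf(z)


def area_between(z1, z2):
    """Area between z1 and z2 = (area left of z2) - (area left of z1)."""
    return Z.cdf(z2) - Z.cdf(z1)


print(f"{area_left(-0.38):.4f}")            # 0.3520
print(f"{area_right(1.25):.4f}")            # 0.1056
print(f"{area_between(-1.02, 2.94):.4f}")   # 0.8445
```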
EXAMPLE
Finding a z-score from a Specified Area to the Left
Find the z-score such that the area to the left of the z-score is 0.7157.
The z-score such that the area to the left of the z-score is 0.7157 is z = 0.57.
EXAMPLE
Finding a z-score from a Specified Area to the Right
Find the z-score such that the area to the right of the z-score is 0.3021.
The area left of the z-score is 1 – 0.3021 = 0.6979.
The approximate z-score that corresponds to an area of 0.6979 to the left (0.3021
to the right) is 0.52. Therefore, z = 0.52.
EXAMPLE
Finding a z-score from a Specified Area
Find the z-scores that separate the middle 80% of the area under the normal curve
from the 20% in the tails.
[Figure: standard normal curve with area 0.8 in the middle and area 0.1 in each tail]
z1 is the z-score such that the area left is 0.1, so z1 = -1.28.
z2 is the z-score such that the area left is 0.9, so z2 = 1.28.
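Finding a z-score from an area is the table lookup run in reverse. As a sketch, `NormalDist.inv_cdf` from Python's standard `statistics` module returns the z-score with a given area to its left; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# Area 0.7157 to the left.
z_left = Z.inv_cdf(0.7157)
print(f"{z_left:.2f}")  # 0.57

# Area 0.3021 to the right means area 1 - 0.3021 = 0.6979 to the left.
z_right = Z.inv_cdf(1 - 0.3021)
print(f"{z_right:.2f}")  # 0.52

# Middle 80%: area (1 - 0.80)/2 = 0.10 in each tail.
tail = (1 - 0.80) / 2
z1, z2 = Z.inv_cdf(tail), Z.inv_cdf(1 - tail)
print(f"{z1:.2f} {z2:.2f}")  # -1.28 1.28
```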
The notation zα (pronounced “z sub alpha”) is the z-score such that
the area under the standard normal curve to the right of zα is α.
EXAMPLE
Finding the Value of z
Find the value of z0.25
We are looking for the z-value such that the area to the right of the z-value is
0.25. This means that the area left of the z-value is 0.75.
z0.25 = 0.67
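The zα notation translates directly into a small function. In this sketch, `z_alpha` is a hypothetical helper name: it converts the area α to the right into the area 1 − α to the left and then inverts the standard normal curve.

```python
from statistics import NormalDist


def z_alpha(alpha):
    """z-score with area alpha to its RIGHT under the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha)


print(f"{z_alpha(0.25):.2f}")  # 0.67
```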
Notation for the Probability of a Standard Normal Random
Variable
P(a < Z < b)
represents the probability a standard
normal random variable is between
a and b
P(Z > a)
represents the probability a standard
normal random variable is greater
than a.
P(Z < a)
represents the probability a standard
normal random variable is less than a.
EXAMPLE
Finding Probabilities of Standard Normal Random Variables
Find each of the following probabilities:
(a) P(Z < -0.23)
(b) P(Z > 1.93)
(c) P(0.65 < Z < 2.10)
(a) P(Z < -0.23) = 0.4090
(b) P(Z > 1.93) = 0.0268
(c) P(0.65 < Z < 2.10) = 0.2399
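Part (a) is a direct table lookup. Parts (b) and (c) use the area-to-the-right and area-between rules from the earlier examples. The intermediate left-tail areas below are standard normal table values, written out here as a worked sketch:

```latex
\begin{align*}
P(Z > 1.93) &= 1 - P(Z < 1.93) = 1 - 0.9732 = 0.0268 \\
P(0.65 < Z < 2.10) &= P(Z < 2.10) - P(Z < 0.65) = 0.9821 - 0.7422 = 0.2399
\end{align*}
```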
For any continuous random variable, the probability of observing a specific
value of the random variable is 0. For example, for a standard normal
random variable, P(Z = a) = 0 for any value of a. This is because there is no area
under the standard normal curve associated with a single value, so the
probability must be 0. Therefore, the following probabilities are equivalent:
P(a ≤ Z ≤ b) = P(a ≤ Z < b) = P(a < Z ≤ b) = P(a < Z < b)
Section 7.3 Applications of the Normal Distribution
EXAMPLE
Finding the Probability of a Normal Random
Variable
It is known that the length of a certain steel rod is normally
distributed with a mean of 100 cm and a standard deviation of
0.45 cm.* What is the probability that a randomly selected steel
rod has a length less than 99.2 cm?
$$P(X < 99.2) = P\left(Z < \frac{99.2 - 100}{0.45}\right) = P(Z < -1.78) = 0.0375$$
Interpretation: If we randomly selected 100 steel rods, we would expect about 4
of them to be less than 99.2 cm.
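The same probability can be computed without rounding the z-score. This sketch assumes Python's standard `statistics.NormalDist`. Because the table method rounds z to −1.78, the software answer differs slightly in the fourth decimal place, but the interpretation is unchanged.

```python
from statistics import NormalDist

rod_length = NormalDist(mu=100, sigma=0.45)  # steel rod lengths in cm

z = (99.2 - 100) / 0.45   # unrounded z-score
p = rod_length.cdf(99.2)  # P(X < 99.2) using the unrounded z-score

print(f"z = {z:.2f}")
print(f"P(X < 99.2) = {p:.4f}")
print(f"Expected count in 100 rods: {100 * p:.1f}")
```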
*Based upon information obtained from Stefan Wilk.
EXAMPLE
Finding the Probability of a Normal Random
Variable
It is known that the length of a certain steel rod is normally
distributed with a mean of 100 cm and a standard deviation of
0.45 cm. What is the probability that a randomly selected steel
rod has a length between 99.8 and 100.3 cm?
$$P(99.8 \le X \le 100.3) = P\left(\frac{99.8 - 100}{0.45} \le Z \le \frac{100.3 - 100}{0.45}\right) = P(-0.44 \le Z \le 0.67) = 0.4186$$
Interpretation: If we randomly selected 100 steel rods, we would expect about 42
of them to be between 99.8 cm and 100.3 cm.
EXAMPLE
Finding the Percentile Rank of a Normal Random
Variable
The combined (verbal + quantitative reasoning) score on the GRE is normally
distributed with mean 1049 and standard deviation 189.
(Source: http://www.ets.org/Media/Tests/GRE/pdf/994994.pdf.)
The Department of Psychology at Columbia University in New York requires a
minimum combined score of 1200 for admission to their doctoral program.
(Source: www.columbia.edu/cu/gsas/departments/psychology/department.html.)
What is the percentile rank of a student who earns a combined GRE score of
1300?
The area under the normal curve is a probability, proportion, or percentile.
Here, the area under the normal curve to the left of 1300 represents the
percentile rank of the student.
Area left of 1300 = Area left of (z = 1.33)
= 0.91 (rounded to two decimal places)
Interpretation: The student scored at the 91st percentile. This means the student
scored better than 91% of the students who took the GRE.
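As a sketch, the percentile rank is the area to the left of the score, expressed as a percentage. The code assumes Python's standard `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

gre = NormalDist(mu=1049, sigma=189)  # combined GRE score

z = (1300 - 1049) / 189     # z-score of a 1300
area_left = gre.cdf(1300)   # proportion of scores below 1300
percentile = round(100 * area_left)

print(f"z = {z:.2f}")       # 1.33
print(f"percentile rank = {percentile}")  # 91
```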
EXAMPLE
Finding the Proportion Corresponding to a Normal
Random Variable
It is known that the length of a certain steel rod is normally
distributed with a mean of 100 cm and a standard deviation of
0.45 cm. Suppose the manufacturer must discard all rods less
than 99.1 cm or longer than 100.9 cm. What proportion of rods
must be discarded?
The proportion is the area under the normal curve to the left of 99.1 cm plus
the area under the normal curve to the right of 100.9 cm.
Area left of 99.1 + area right of 100.9 = (Area left of z = -2) + (Area right of z = 2)
= 0.0228 + 0.0228
= 0.0456
Interpretation: The proportion of rods that must be discarded is 0.0456. If the company
manufactured 1000 rods, they would expect to discard about 46 of them.
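The two-tail proportion can be sketched in Python as the sum of a left-tail and a right-tail area. It assumes the standard `statistics.NormalDist`. The unrounded result is about 0.0455, while adding two table values of 0.0228 gives 0.0456.

```python
from statistics import NormalDist

rod_length = NormalDist(mu=100, sigma=0.45)  # steel rod lengths in cm

too_short = rod_length.cdf(99.1)       # area left of 99.1 (z = -2)
too_long = 1 - rod_length.cdf(100.9)   # area right of 100.9 (z = 2)
discarded = too_short + too_long

print(f"Proportion discarded: {discarded:.4f}")
print(f"Expected discards per 1000 rods: {1000 * discarded:.1f}")
```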
EXAMPLE
Finding the Value of a Normal Random Variable
The combined (verbal + quantitative reasoning) score on the GRE is normally
distributed with mean 1049 and standard deviation 189.
(Source: http://www.ets.org/Media/Tests/GRE/pdf/994994.pdf.)
What is the score of a student whose percentile rank is at the 85th percentile?
The z-score that corresponds to the 85th percentile is the z-score such that the
area under the standard normal curve to the left is 0.85. This z-score is 1.04.
x = µ + zσ
= 1049 + 1.04(189)
= 1246
Interpretation: A student whose score is at the 85th percentile earned a combined
GRE score of about 1246.
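The formula x = µ + zσ can be sketched in Python with the standard `statistics.NormalDist`. Software does not round z to two decimal places, so it returns a score about one point lower than the table-based 1246.

```python
from statistics import NormalDist

mu, sigma = 1049, 189

# Table-based calculation from the example: z rounded to 1.04.
x_table = mu + 1.04 * sigma

# Unrounded z-score with area 0.85 to its left.
z = NormalDist().inv_cdf(0.85)
x_exact = mu + z * sigma

print(f"table z = 1.04 -> x = {x_table:.0f}")
print(f"unrounded z = {z:.4f} -> x = {x_exact:.0f}")
```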
EXAMPLE
Finding the Value of a Normal Random Variable
It is known that the length of a certain steel rod is normally distributed with
a mean of 100 cm and a standard deviation of 0.45 cm. Suppose the
manufacturer wants to accept 90% of all rods manufactured. Determine
the length of rods that make up the middle 90% of all steel rods
manufactured.
z1 = -1.645 and z2 = 1.645 (area = 0.05 in each tail)
x1 = µ + z1σ
= 100 + (-1.645)(0.45)
= 99.26 cm
x2 = µ + z2σ
= 100 + (1.645)(0.45)
= 100.74 cm
Interpretation: The length of steel rods that make up the middle 90% of all steel rods
manufactured would have lengths between 99.26 cm and 100.74 cm.
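The two cutoffs can be computed for any middle proportion. In this sketch, `middle_interval` is a hypothetical helper built on Python's standard `statistics.NormalDist`: it splits the leftover area equally between the tails and applies x = µ + zσ to each z-score.

```python
from statistics import NormalDist


def middle_interval(mu, sigma, proportion):
    """Values x1 < x2 that bound the middle `proportion` of a normal distribution."""
    tail = (1 - proportion) / 2           # area in each tail
    z2 = NormalDist().inv_cdf(1 - tail)   # upper z-score; the lower one is -z2
    return mu - z2 * sigma, mu + z2 * sigma


x1, x2 = middle_interval(100, 0.45, 0.90)
print(f"{x1:.2f} cm to {x2:.2f} cm")  # 99.26 cm to 100.74 cm
```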
Section 7.4 Assessing Normality
Suppose that we obtain a simple random sample from a
population whose distribution is unknown. Many of the
statistical tests that we perform on small data sets (sample
size less than 30) require that the population from which the
sample is drawn be normally distributed. Up to this point,
we have said that a random variable X is normally
distributed, or at least approximately normal, provided the
histogram of the data is symmetric and bell-shaped. This
method works well for large data sets, but the shape of a
histogram drawn from a small sample of observations does
not always accurately represent the shape of the population.
For this reason, we need additional methods for assessing
the normality of a random variable X when we are looking at
sample data.
A normal probability plot plots observed data
versus normal scores.
A normal score is the expected Z-score of the
data value if the distribution of the random
variable is normal. The expected Z-score of an
observed value will depend upon the number of
observations in the data set.
The idea behind finding the expected Z-score is that if the data
comes from a population that is normally distributed, we should
be able to predict the area left of each of the data values. The
value of fi represents the expected area left of the ith data value
assuming the data comes from a population that is normally
distributed. For example, f1 is the expected area left of the
smallest data value, f2 is the expected area left of the second
smallest data value, and so on.
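The formula for fi did not survive in this transcript. The sketch below assumes one common choice, fi = (i − 0.375)/(n + 0.25); this is an assumption, and other plotting positions exist. Each expected area is then converted to an expected z-score with the inverse of the standard normal curve. The function name `normal_scores` is illustrative.

```python
from statistics import NormalDist


def normal_scores(n):
    """Expected z-scores for the n ordered values of a sample.

    ASSUMPTION: uses f_i = (i - 0.375) / (n + 0.25) as the expected area
    to the left of the i-th smallest value.
    """
    z = NormalDist()
    return [z.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


scores = normal_scores(5)
for i, s in enumerate(scores, start=1):
    print(f"i = {i}: expected z-score = {s:.2f}")
```

Plotting the sorted data against these scores gives the normal probability plot; the scores depend only on the sample size n, not on the data values themselves.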
If sample data is taken from a population that is
normally distributed, a normal probability plot of the
actual values versus the expected Z-scores will be
approximately linear.
We will be content in reading normal probability plots
constructed using the statistical software package,
MINITAB. In MINITAB, if the points plotted lie within
the bounds provided in the graph, then we have
reason to believe that the sample data comes from a
population that is normally distributed.
EXAMPLE
Interpreting a Normal Probability Plot
The following data represent the time between eruptions
(in seconds) for a random sample of 15 eruptions at the
Old Faithful Geyser in California. Is there reason to
believe the time between eruptions is normally
distributed?
728  730  726  678  722  716  723  708  736
735  708  736  735  714  719
The random variable “time between eruptions” is
likely not normal.
EXAMPLE
Interpreting a Normal Probability Plot
Suppose that seventeen randomly selected workers at a detergent factory
were tested for exposure to a Bacillus subtilis enzyme by measuring the ratio
of forced expiratory volume (FEV) to vital capacity (VC). NOTE: FEV is the
maximum volume of air a person can exhale in one second; VC is the
maximum volume of air that a person can exhale after taking a deep breath. Is
it reasonable to conclude that the FEV to VC (FEV/VC) ratio is normally
distributed?
Source: Shore, N.S.; Greene R.; and Kazemi, H. “Lung Dysfunction in Workers Exposed to Bacillus subtilis Enzyme,”
Environmental Research, 4 (1971), pp. 512 - 519.
It is reasonable to believe that FEV/VC is normally
distributed.
Section 7.5 The Normal Approximation to the Binomial
Probability Distribution
Criteria for a Binomial Probability Experiment
An experiment is said to be a binomial experiment
provided:
1. The experiment is performed n independent times.
Each repetition of the experiment is called a trial.
Independence means that the outcome of one trial will
not affect the outcome of the other trials.
2. For each trial, there are two mutually exclusive
outcomes - success or failure.
3. The probability of success, p, is the same for each
trial of the experiment.
For a fixed p, as the number of trials n in a
binomial experiment increases, the
probability distribution of the random
variable X becomes more nearly symmetric
and bell-shaped. As a general rule of
thumb, if np(1 – p) ≥ 10, the probability
distribution will be approximately symmetric
and bell-shaped.
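The rule of thumb is simple to check before using a normal approximation. This is a sketch only; `normal_approx_ok` is a hypothetical helper name, and the threshold of 10 is the one stated in the rule.

```python
def normal_approx_ok(n, p, threshold=10):
    """Rule of thumb: the binomial is roughly bell-shaped when np(1 - p) reaches the threshold."""
    return n * p * (1 - p) >= threshold


print(normal_approx_ok(400, 0.35))  # True:  np(1 - p) = 91
print(normal_approx_ok(20, 0.35))   # False: np(1 - p) = 4.55
```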
P(X = 18) ≈ P(17.5 < X < 18.5)
P(X ≤ 18) ≈ P(X ≤ 18.5)
EXAMPLE
Using the Binomial Probability Distribution Function
According to Experian Automotive, 35% of all car-owning households
have three or more cars.
(a) In a random sample of 400 car-owning households, what is the
probability that fewer than 150 have three or more cars?
$$np(1 - p) = 400(0.35)(1 - 0.35) = 91 \ge 10$$
$$\mu_X = 400(0.35) = 140$$
$$\sigma_X = \sqrt{400(0.35)(1 - 0.35)} \approx 9.54$$
$$P(X < 150) = P(X \le 149) \approx P(X \le 149.5) = P\left(Z \le \frac{149.5 - 140}{9.54}\right) = 0.8413$$
(b) In a random sample of 400 car-owning households, what is the
probability that at least 160 have three or more cars?
$$P(X \ge 160) \approx P(X \ge 159.5) = P\left(Z \ge \frac{159.5 - 140}{9.54}\right) = 0.0207$$
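As a check on both parts of the example, the normal approximation with the 0.5 continuity correction can be compared with the exact binomial probabilities. This is a sketch using only Python's standard library; `binom_pmf` is an illustrative helper. Because the z-scores are not rounded here, the approximate values differ slightly from the table-based 0.8413 and 0.0207.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 400, 0.35
mu = n * p                       # 140
sigma = sqrt(n * p * (1 - p))    # about 9.54
approx = NormalDist(mu, sigma)

# Normal approximation with the continuity correction.
p_fewer_150 = approx.cdf(149.5)         # P(X < 150) = P(X <= 149)
p_at_least_160 = 1 - approx.cdf(159.5)  # P(X >= 160)


def binom_pmf(k):
    """Exact binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


exact_fewer_150 = sum(binom_pmf(k) for k in range(150))
exact_at_least_160 = sum(binom_pmf(k) for k in range(160, n + 1))

print(f"P(X < 150):  approx {p_fewer_150:.4f}, exact {exact_fewer_150:.4f}")
print(f"P(X >= 160): approx {p_at_least_160:.4f}, exact {exact_at_least_160:.4f}")
```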