Chapter 6: Normal Probability Distributions
[Figure: normal curve illustrating P(a ≤ x ≤ b), with a and b marked on the x-axis]
Chapter Goals
• Learn about the normal, bell-shaped, or
Gaussian distribution.
• How probabilities are found.
• How probabilities are represented.
• How normal distributions are used in the
real world.
6.1: Normal Probability Distributions
• The normal probability distribution is the
most important distribution in all of
statistics.
• Many continuous random variables have
normal or approximately normal
distributions.
• Need to learn how to describe a normal
probability distribution.
Normal Probability Distribution:
1. A continuous random variable.
2. Description involves two functions:
a. A function to determine the ordinates of the graph
picturing the distribution.
b. A function to determine probabilities.
3. Normal probability distribution function:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
This is the function for the normal (bell-shaped) curve.
4. The probability that x lies in some interval is the area
under the curve.
The normal probability distribution:
[Figure: bell-shaped curve with the horizontal axis marked at μ − 3σ, μ − 2σ, μ − σ, μ, μ + σ, μ + 2σ, μ + 3σ]
Illustration of probabilities for a normal distribution:
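$$P(a \le x \le b) = \int_a^b f(x)\,dx$$
[Figure: normal curve with a and b marked on the x-axis; the probability is the area under the curve between them]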
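The idea that a probability is an area under the curve can be checked numerically. The sketch below is only an illustration: the mean of 100, the standard deviation of 15, and the interval from 90 to 120 are hypothetical values that do not come from these notes, and the trapezoid rule stands in for the definite integral.

```python
from statistics import NormalDist

def area_under_curve(dist, a, b, steps=10_000):
    """Approximate the integral of dist.pdf over [a, b] with the trapezoid rule."""
    width = (b - a) / steps
    total = 0.5 * (dist.pdf(a) + dist.pdf(b))
    for i in range(1, steps):
        total += dist.pdf(a + i * width)
    return total * width

# Hypothetical values chosen only for illustration: mean 100, standard deviation 15.
dist = NormalDist(mu=100, sigma=15)
a, b = 90, 120

numeric = area_under_curve(dist, a, b)   # area under f(x) from a to b
from_cdf = dist.cdf(b) - dist.cdf(a)     # same probability from the cumulative function

print(f"numeric integral: {numeric:.6f}")
print(f"from cdf:         {from_cdf:.6f}")
```

The two printed values should agree to several decimal places, which is the sense in which the second function in the description "determines probabilities".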
Note:
1. The definite integral is a calculus topic.
2. We will use a table to find probabilities for normal
distributions.
3. We will learn how to compute probabilities for one special
normal distribution: the standard normal distribution.
4. Transform all other normal probability questions to this
special distribution.
5. Recall the empirical rule: the percentages that lie within
certain intervals about the mean come from the normal
probability distribution.
6. We need to refine the empirical rule to be able to find the
percentage that lies between any two numbers.
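As a quick check of note 5, the empirical-rule percentages can be recovered from the standard normal distribution. This is a minimal sketch using Python's standard library; the helper name within is made up for this illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

def within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {within(k):.4f}")
```

The three proportions are approximately 0.6827, 0.9545, and 0.9973, which the empirical rule rounds to 68%, 95%, and 99.7%.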
Percentage, proportion, and probability:
1. Basically the same concepts.
2. Percentage (30%) is usually used when talking about a
proportion (3/10) of a population.
3. Probability is usually used when talking about the chance
that the next individual item will possess a certain
property.
4. Area is the graphic representation of all three when we
draw a picture to illustrate the situation.
6.2: The Standard Normal Distribution
• There are infinitely many normal
probability distributions.
• They are all related to the standard normal
distribution.
• The standard normal distribution is the
normal distribution of the standard variable
z (the z-score).
Properties of the Standard Normal Distribution:
1. The total area under the normal curve is equal to 1.
2. The distribution is mounded and symmetric; it extends
indefinitely in both directions, approaching but never
touching the horizontal axis.
3. The distribution has a mean of 0 and a standard deviation
of 1.
4. The mean divides the area in half, 0.50 on each side.
5. Nearly all the area is between z = −3.00 and z = 3.00.
Note:
1. Table 3, Appendix B lists the probabilities associated with
the intervals from the mean (0) to a specific value of z.
2. Probabilities of other intervals are found using the table
entries, addition, subtraction, and the properties above.
Table 3, Appendix B entries:
The table contains the area under the standard normal curve
between 0 and a specific value of z.
Example: Find the area under the standard normal curve
between z = 0 and z = 1.45.
A portion of Table 3:
z      0.00    0.01    0.02    0.03    0.04    0.05    0.06
...
1.4                                            0.4265
...
P(0 < z < 1.45) = 0.4265
Example: Find the area under the normal curve to the right of
z = 1.45; P(z > 1.45).
[Figure: the area asked for lies to the right of z = 1.45; the table area between 0 and 1.45 is 0.4265]
P(z > 1.45) = 0.5000 − 0.4265 = 0.0735
Example: Find the area to the left of z = 1.45; P(z < 1.45).
[Figure: area of 0.5000 to the left of 0, plus the table area of 0.4265 between 0 and 1.45]
P(z < 1.45) = 0.5000 + 0.4265 = 0.9265
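The three table lookups for z = 1.45 can be reproduced in software. The sketch below assumes Python's statistics.NormalDist as a stand-in for Table 3; the helper table_area is a hypothetical name that mimics the table's "area from 0 to z" entries.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def table_area(z):
    """Area between 0 and z, the quantity tabulated in Table 3."""
    return Z.cdf(abs(z)) - 0.5

between = table_area(1.45)   # P(0 < z < 1.45)
right = 0.5 - between        # P(z > 1.45): half the area minus the table entry
left = 0.5 + between         # P(z < 1.45): half the area plus the table entry

print(f"P(0 < z < 1.45) = {between:.4f}")
print(f"P(z > 1.45)     = {right:.4f}")
print(f"P(z < 1.45)     = {left:.4f}")
```

Rounded to four places, these agree with the table-based answers 0.4265, 0.0735, and 0.9265.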
Note:
1. The addition and subtraction used in the previous examples
are correct because the “areas” represent mutually
exclusive events.
2. The symmetry of the normal distribution is a key factor in
determining probabilities associated with values below (to
the left of) the mean. For example: the area between the
mean and z = −1.37 is exactly the same as the area between
the mean and z = +1.37.
3. When finding normal distribution probabilities, a sketch is
always helpful.
Example: Find the area between the mean (z = 0) and z = −1.26.
[Figure: the area asked for, between −1.26 and 0, equals the table area of 0.3962 between 0 and 1.26]
P(−1.26 < z < 0) = 0.3962
Example: Find the area to the left of z = −0.98; P(z < −0.98).
[Figure: the area asked for lies to the left of −0.98; the table area between 0 and 0.98 is 0.3365]
P(z < −0.98) = 0.5000 − 0.3365 = 0.1635
Example: Find the area between z = −2.3 and z = 1.8.
[Figure: area of 0.4893 between −2.3 and 0, and area of 0.4641 between 0 and 1.8]
P(−2.3 < z < 1.8) = P(−2.3 < z < 0) + P(0 < z < 1.8)
= 0.4893 + 0.4641 = 0.9534
Example: Find the area between z = −1.4 and z = −0.5.
[Figure: the area asked for lies between −1.4 and −0.5; by symmetry it equals the area between 0.5 and 1.4]
P(−1.4 < z < −0.5) = P(0 < z < 1.4) − P(0 < z < 0.5)
= 0.4192 − 0.1915
= 0.2277
Note: The normal distribution table may also be used to
determine a z-score if we are given the area (to work
backwards).
Example: What is the z-score associated with the 85th
percentile?
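[Figure: the 85th percentile, P85, leaves 15% of the area to its right, which implies a table area of 0.3500 between 0 and z]
Solution:
In Table 3, Appendix B, find the “area” entry that is closest to
0.3500.
z      0.00    0.01    0.02    0.03    0.04    0.05
...
1.0                            0.3485  0.3508
...
The target area 0.3500 falls between the entries 0.3485 and 0.3508.
The area entry closest to 0.3500 is 0.3508.
The z-score that corresponds to this area is 1.04.
The 85th percentile of the standard normal distribution is z = 1.04.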
Example: What z-scores bound the middle 90% of a normal
distribution?
[Figure: the middle 90% of the area is centered on 0, which implies a table area of 0.4500 on each side of the mean]
Solution:
The 90% is split into two equal parts by the mean.
Find the area in Table 3 closest to 0.4500.
z      0.00    0.01    0.02    0.03    0.04    0.05
...
1.6                                    0.4495  0.4505
...
0.4500 is exactly halfway between 0.4495 (z = 1.64) and 0.4505 (z = 1.65).
Therefore, z = 1.645.
z = −1.645 and z = 1.645 bound the middle 90% of a normal
distribution.
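Working backwards from an area to a z-score corresponds to the inverse of the cumulative distribution. The sketch below assumes Python's NormalDist.inv_cdf in place of a reverse table lookup; the variable names are illustrative only.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

# 85th percentile: the z-score with 85% of the area to its left.
z_85 = Z.inv_cdf(0.85)

# Middle 90%: 5% of the area lies in each tail.
z_upper = Z.inv_cdf(0.95)
z_lower = Z.inv_cdf(0.05)

print(f"85th percentile: z = {z_85:.3f}")
print(f"middle 90%: z = {z_lower:.3f} to z = {z_upper:.3f}")
```

The computed 85th percentile is about 1.036, which the closest-entry rule in the table rounds to 1.04, and the bounds for the middle 90% are about ±1.645.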
6.3: Applications of Normal Distributions
• Apply the techniques learned for the z
distribution to all normal distributions.
• Start with a probability question in terms of
x-values.
• Convert, or transform, the question into an
equivalent probability statement involving
z-values.
Standardization:
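Suppose x is a normal random variable with mean μ and
standard deviation σ.
The random variable
$$z = \frac{x - \mu}{\sigma}$$
has a standard normal distribution.
[Figure: normal curve with paired axes; μ and c on the x-axis correspond to 0 and (c − μ)/σ on the z-axis]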
Example: A bottling machine is adjusted to fill bottles with a
mean of 32.0 oz of soda and a standard deviation of 0.02 oz.
Assume the amount of fill is normally distributed and a bottle
is selected at random.
1. Find the probability the bottle contains between 32 oz and
32.025 oz.
2. Find the probability the bottle contains more than 31.97 oz.
When x = 32:
$$z = \frac{32 - \mu}{\sigma} = \frac{32 - 32}{0.02} = 0$$
When x = 32.025:
$$z = \frac{32.025 - \mu}{\sigma} = \frac{32.025 - 32}{0.02} = 1.25$$
Illustration:
[Figure: the area asked for lies between x = 32 (z = 0) and x = 32.025 (z = 1.25)]
$$P(32 < x < 32.025) = P\left(\frac{32 - 32}{0.02} < \frac{x - 32}{0.02} < \frac{32.025 - 32}{0.02}\right) = P(0 < z < 1.25) = 0.3944$$
Illustration:
[Figure: the area asked for lies to the right of x = 31.97 (z = −1.50); x = 32 corresponds to z = 0]
$$P(x > 31.97) = P\left(\frac{x - 32}{0.02} > \frac{31.97 - 32}{0.02}\right) = P(z > -1.50) = 0.5000 + 0.4332 = 0.9332$$
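The bottling example can be worked in code by standardizing each x-value and then using the standard normal distribution. This is a sketch, not the method of these notes: the function standardize and the variable names are hypothetical, and NormalDist replaces Table 3.

```python
from statistics import NormalDist

MU, SIGMA = 32.0, 0.02      # fill mean and standard deviation, in ounces
Z = NormalDist()            # standard normal distribution

def standardize(x, mu=MU, sigma=SIGMA):
    """Convert an x-value to its z-score: z = (x - mu) / sigma."""
    return (x - mu) / sigma

# 1. P(32 < x < 32.025) becomes P(0 < z < 1.25).
p_between = Z.cdf(standardize(32.025)) - Z.cdf(standardize(32.0))

# 2. P(x > 31.97) becomes P(z > -1.50).
p_more = 1 - Z.cdf(standardize(31.97))

print(f"P(32 < x < 32.025) = {p_between:.4f}")
print(f"P(x > 31.97)       = {p_more:.4f}")
```

Both results match the table-based answers of 0.3944 and 0.9332 to four decimal places.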
Note:
1. The normal table may be used to answer many kinds of
questions involving a normal distribution.
2. Often we need to find a cutoff point: a value of x such that
there is a certain probability in a specified interval defined
by x.
Example: The waiting time x at a certain bank is
approximately normally distributed with a mean of 3.7
minutes and a standard deviation of 1.4 minutes. The bank
would like to claim that 95% of all customers are waited on
by a teller within c minutes. Find the value of c that makes
this statement true.
Solution:
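[Figure: area of 0.5000 to the left of the mean 3.7 (z = 0), area of 0.4500 between the mean and c (z = 1.645), and area of 0.0500 to the right of c]
$$P(x \le c) = 0.95$$
$$P\left(\frac{x - 3.7}{1.4} \le \frac{c - 3.7}{1.4}\right) = 0.95$$
$$P\left(z \le \frac{c - 3.7}{1.4}\right) = 0.95$$
$$\frac{c - 3.7}{1.4} = 1.645$$
$$c = (1.645)(1.4) + 3.7 = 6.003$$
c ≈ 6 minutes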
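Finding a cutoff point is the reverse of standardizing: look up the z-score for the required area, then solve z = (c − μ)/σ for c. The sketch below assumes NormalDist.inv_cdf in place of the table; the variable names are illustrative.

```python
from statistics import NormalDist

MU, SIGMA = 3.7, 1.4        # waiting-time mean and standard deviation, in minutes

# z-score with 95% of the area to its left (about 1.645).
z_cut = NormalDist().inv_cdf(0.95)

# Undo the standardization: c = z * sigma + mu.
c = z_cut * SIGMA + MU

# The same cutoff taken directly from the waiting-time distribution.
c_direct = NormalDist(MU, SIGMA).inv_cdf(0.95)

print(f"z = {z_cut:.3f}, c = {c:.3f} minutes")
```

The cutoff is about 6.003 minutes, in line with the answer of roughly 6 minutes.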
Example: A radar unit is used to measure the speed of
automobiles on an expressway during rush-hour traffic. The
speeds of individual automobiles are normally distributed
with a mean of 62 mph. Find the standard deviation of all
speeds if 3% of the automobiles travel faster than 72 mph.
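Illustration:
[Figure: area of 0.4700 between the mean 62 (z = 0) and 72 (z = 1.88), and area of 0.0300 to the right of 72]
Solution:
$$P(x > 72) = 0.03$$
$$P\left(\frac{x - 62}{\sigma} > \frac{72 - 62}{\sigma}\right) = 0.03$$
$$P\left(z > \frac{72 - 62}{\sigma}\right) = 0.03$$
Since P(z > 1.88) = 0.03:
$$\frac{72 - 62}{\sigma} = 1.88$$
$$(1.88)(\sigma) = 10$$
$$\sigma = 10 / 1.88 = 5.32$$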
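The radar example solves the standardization formula for σ instead of x. A minimal sketch, assuming NormalDist.inv_cdf supplies the z-score that the table gives as 1.88; the variable names are illustrative.

```python
from statistics import NormalDist

MU = 62.0            # mean speed, mph
SPEED = 72.0         # speed exceeded by 3% of automobiles, mph
TAIL = 0.03          # area to the right of SPEED

# z-score with 3% of the area to its right, i.e. 97% to its left.
z_tail = NormalDist().inv_cdf(1 - TAIL)

# Solve z = (SPEED - MU) / sigma for sigma.
sigma = (SPEED - MU) / z_tail

# Check: with this sigma, the right-tail area beyond 72 mph should be 0.03.
check = 1 - NormalDist(MU, sigma).cdf(SPEED)

print(f"z = {z_tail:.2f}, sigma = {sigma:.2f} mph, tail area = {check:.4f}")
```

The result is close to the table-based value of 5.32 mph; small differences come from rounding z to 1.88.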
Notation:
If x is a normal random variable with mean μ and standard
deviation σ, this is often denoted: x ~ N(μ, σ).
Example: Suppose x is a normal random variable with μ = 35
and σ = 6. A convenient notation to identify this random
variable is: x ~ N(35, 6).
6.4: Notation
• z-score used throughout statistics in a
variety of ways.
• Need convenient notation to indicate the
area under the standard normal distribution.
• z(a) is the token, or algebraic name, for the
z-score (point on the z-axis) such that the
area (probability) to the right of z(a) is a.
Illustrations:
z(0.10) represents the value of z such that the area to the right
under the standard normal curve is 0.10.
[Figure: area of 0.10 to the right of z(0.10), which lies to the right of 0]
z(0.80) represents the value of z such that the area to the right
under the standard normal curve is 0.80.
[Figure: area of 0.80 to the right of z(0.80), which lies to the left of 0]
Example: Find the numerical value of z(0.10).
[Figure: the notation gives the area of 0.10 to the right of z(0.10); the table shows the remaining area of 0.4000 between 0 and z(0.10)]
Use Table 3: look for an area as close as possible to 0.4000.
z(0.10) = 1.28
Example: Find the numerical value of z(0.80).
Look for 0.3000; remember that z must be negative.
[Figure: z(0.80) lies to the left of 0 on the z-axis]
Use Table 3: look for an area as close as possible to 0.3000.
z(0.80) = −0.84
Note:
The values of z that will be used regularly come from one of
the following situations:
1. The z-score such that there is a specified area in one tail of
the normal distribution.
2. The z-scores that bound a specified middle proportion of
the normal distribution.
Example: Find the numerical value of z(0.99).
[Figure: z(0.99) lies to the left of 0, with an area of 0.01 in the left tail]
Because of the symmetrical nature of the normal distribution,
z(0.99) = −z(0.01).
Using Table 3: z(0.99) = −2.33
Example: Find the z-scores that bound the middle 0.99 of the
normal distribution.
[Figure: area of 0.005 in each tail and area of 0.495 on each side of 0; the lower bound is z(0.995), or −z(0.005), and the upper bound is z(0.005)]
Use Table 3:
z(0.005) = 2.575 and z(0.995) = −z(0.005) = −2.575
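The z(a) notation translates into a one-line function: since a is the area to the right, the area to the left is 1 − a. The function name z_of below is a hypothetical choice for this sketch, and NormalDist.inv_cdf stands in for Table 3.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def z_of(a):
    """Return z(a): the z-score with area a to its right (0 < a < 1)."""
    return Z.inv_cdf(1 - a)

for a in (0.10, 0.80, 0.99, 0.005, 0.995):
    print(f"z({a}) = {z_of(a):.3f}")
```

The computed values are about 1.282, −0.842, −2.326, 2.576, and −2.576, which the table-based examples report as 1.28, −0.84, −2.33, 2.575, and −2.575.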
6.5: Normal Approximation of the Binomial
• Recall: the binomial distribution is a
probability distribution of the discrete
random variable x, the number of successes
observed in n repeated independent trials.
• Binomial probabilities can be reasonably
estimated by using the normal probability
distribution.
Background: Consider the distribution of the binomial
variable x when n = 20 and p = 0.5.
Histogram:
[Figure: histogram of P(x) for the binomial distribution with n = 20 and p = 0.5; vertical axis P(x) from 0.00 to 0.18, horizontal axis x from 0 to 20]
The histogram may be approximated by a normal curve.
Note:
1. The normal curve has mean and standard deviation from
the binomial distribution.
$$\mu = np = (20)(0.5) = 10$$
$$\sigma = \sqrt{npq} = \sqrt{(20)(0.5)(0.5)} = \sqrt{5} = 2.236$$
2. Can approximate the area of the rectangles with the area
under the normal curve.
3. The approximation becomes more accurate as n becomes
larger.
Two Problems:
1. As p moves away from 0.5, the binomial distribution is
less symmetric, less normal-looking.
Solution: The normal distribution provides a reasonable
approximation to a binomial probability distribution
whenever the values of np and n(1 − p) both equal or
exceed 5.
2. The binomial distribution is discrete, and the normal
distribution is continuous.
Solution: Use the continuity correction factor. Add or
subtract 0.5 to account for the width of each rectangle.
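The effect of the continuity correction can be seen on the n = 20, p = 0.5 distribution from the background discussion. The question asked here, P(x ≤ 8), is a hypothetical illustration that does not appear in these notes; the sketch compares the exact binomial sum with the normal approximation, with and without the correction.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.5
q = 1 - p

# Rule of thumb from the notes: np and n(1 - p) should both be at least 5.
assert n * p >= 5 and n * q >= 5

# Exact binomial probability P(x <= 8): sum of the rectangles for x = 0..8.
exact = sum(comb(n, k) * p**k * q**(n - k) for k in range(9))

# Normal curve with the binomial mean and standard deviation.
approx_dist = NormalDist(mu=n * p, sigma=sqrt(n * p * q))

# The rectangle for x = 8 extends to 8.5, so the corrected cutoff is 8.5.
corrected = approx_dist.cdf(8.5)
uncorrected = approx_dist.cdf(8)

print(f"exact binomial:        {exact:.4f}")
print(f"normal, cutoff at 8.5: {corrected:.4f}")
print(f"normal, cutoff at 8:   {uncorrected:.4f}")
```

With the correction the normal value lands much closer to the exact binomial probability than it does without it, because the cutoff at 8.5 includes the full width of the rectangle for x = 8.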
Example: Research indicates 40% of all students entering a
certain university withdraw from a course during their first
year. What is the probability that fewer than 650 of this
year’s entering class of 1800 will withdraw from a class?
Let x be the number of students that withdraw from a course
during their first year.
x has a binomial distribution: n = 1800, p = 0.4
The probability function is given by:
$$P(x) = \binom{1800}{x} (0.4)^x (0.6)^{1800 - x} \quad \text{for } x = 0, 1, 2, \ldots, 1800$$
Solution:
Use the normal approximation method.
$$\mu = np = (1800)(0.4) = 720$$
$$\sigma = \sqrt{npq} = \sqrt{(1800)(0.4)(0.6)} = \sqrt{432} = 20.78$$
P(x is fewer than 650) = P(x < 650) (for discrete variable x)
= P(x < 649.5) (for a continuous variable x)
$$= P\left(\frac{x - 720}{20.78} < \frac{649.5 - 720}{20.78}\right)$$
= P(z < −3.39)
= 0.5000 − 0.4997 = 0.0003