Download Elamentary 7.The Normal Distribution Text Statistics

Document related concepts

Foundations of statistics wikipedia , lookup

History of statistics wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

Taylor's law wikipedia , lookup

Misuse of statistics wikipedia , lookup

German tank problem wikipedia , lookup

Student's t-test wikipedia , lookup

Transcript
Bluman: Elamentary
Statistics: AStep byStep
Approach, Fourth Edition
7.The Normal Distribution
Text
© The McGraw-Hili
Companies, 2001
.;"
:':':"",:<':~':"; '.>
. l.."
.,.:,.,
..'/-"'!. ,:;:;,"':<>~:~:;;-'\:::
I) I Bluman: Elementary
I 7.TheNOnl1al Distribution I Text
Statistics: AStep byStep
Approach, Fourth Edition
248
Chapter 7 The NormalDistribution
© The McGraw-Hili
Companies, 2001
1:,
;,
What Is Normal?
Medical researchers have determined so-called normal intervals for a person's blood
pressure, cholesterol, triglycerides, and the like. For example, the normal range of systolic blood pressure is 110 to 140. The normal interval for a person's triglycerides is
from 30 to 200 milligrams per deciliter (mg/dl). By measuring these variables, a physician can determine if a patient's vital statistics are within the normal interval, or if some
type of treatment is needed to correct a condition and avoid future illnesses. The question then is, "How does one determine the so-called normal intervals?"
In this chapter, you will learn how researchers determine normal intervals for
specific medical tests using the normal distribution. You will see how the same methods are used to determine the lifetimes of batteries, the strength of ropes, and many
other traits.
-
Introduction
Random variables can be either discrete or continuous. Discrete variables and their distributions were explained in Chapter 6. Recall that a discrete variable cannot assume all
values between any two given values of the variables. On the other hand, a continuous
variable can assume all values between any two given values of the variables. Examples
of continuous variables are the heights of adult men, body temperatures of rats, and cholesterollevels of adults. Many continuous variables, such as the examples just mentioned, have distributions that are bell-shaped and are called approximately normally
distributed variables. For example, if a researcher selects a random sample of 100 adult
women, measures their heights, and constructs a histogram, the researcher gets a graph
similar to the one shown in Figure 7-l(a). Now, if the researcher increases the sample
size and decreases the width of the classes, the histograms will look like the ones shown
in Figure 7-l(b) and 7-l(c). Finally, if it were possible to measure exactly the heights
of all adult females in the United States and plot them, the histogram would approach.
what is called the normal distribution, shown in Figure 7-1(d). This distribution is also
-c----
~_
Bluman:Elementary
Statistics: A Step by Step
Approach, Fourth Edition
7.TheNormal Distribution
Text
IW
© The McGraw-Hili
Companies, 2001
Section 7-1
Introduction
249
Figure 7-1
Histograms for the
Distribution of Heights of
Adult Women
Oiljectivll 1. Identify
distributions as symmetrical
or skewed.
(a)Random sample of100 women
(b)Sample size increased and class width decreased
(e) Sample size increased and class width
decreased further
(d)Normal distribution forthe population
known as the bell curve or the Gaussian distribution, named for the German mathematician Carl Friedrich Gauss (1777-1855), who derived its equation.
No variable fits the normal distribution perfectly, since the normal distribution is a~
theoretical distribution. However, the normal distribution can be used to describe many
variables, because the deviations from the normal distribution are very small. This concept will be explained further in the next section.
When the data values are evenly distributed about the mean, the distribution is said
to be symmetrical. Figure 7-2(a) shows a symmetrical distribution. When the majority
of the data values fall to the left or right of the mean, the distribution is said to be
skewed. When the majority of the data values fall to the right of the mean, the distribution is said to be negatively or left skewed. The mean is to the left of the median, and
the mean and the median are to the left of the mode. See Figure 7-2(b). When the majority of the data values fall to the left of the mean, the distribution is said to be positively or right skewed. The mean falls to the right of the median and both the mean and
the median fall to the right of the mode. See Figure 7-2(c).
Figure 7-2
Normal and Skewed Distributions
";:1
:)
Mean
Median
Mode
(a)Normal
I
Mean Median Mode
(b)Negatively skewed
Mode Median Mean
(e)Positively skewed
---------
CD I
250
8luman: Elementary
Statistics:AStepbyStap
Approach. Fourth Edition
, 7.TheNormal Distribution
I Text
© The McGraw-Hili
Companies. 2001
Chapter 7 The Normal Distribution
.
The "tail" of the curve indicates the direction of skewness (right is positive, left
negative). This distribution can be compared with the one in Figure 3-1. Both types follow the same principles.
This chapter will present the properties of the normal distribution and discuss its applications. Then a very important theorem called the central limit theorem will be explained. Finally, the chapter will explain how the normal curve distribution can be used
as an approximation to other distributions, such as the binomial distribution. Since the
binomial distribution is a discrete distribution, a correction for continuity may be employed when the normal distribution is used for its approximation.
Properties of the
Normal Distribution
6bjeeiive 2. Identify the
properties ofthe normal
distribution.
Figure 7-3
Graph of a Circle and an
Application
In mathematics, curves can be represented by equations. For example, the equation of
the circle shown in Figure 7-3 is x 2 + y2 = r 2, where r is the radius. The circle can be
used to represent many physical objects, such as a wheel or a gear. Even though it is not
possible to manufacture a wheel that is perfectly round, the equation and the properties
of the circle can be used to study the many aspects of the wheel, such as area, velocity,
and acceleration. In a similar manner, the theoretical curve, called the normal distribution curve, can be used to study many variables that are not perfectly normally distributed but are nevertheless approximately normal.
The mathematical equation for the normal distribution is
~ ,
e- IX- JL)2f(2u')
y=
u-yl21T
where
e r« 2.718 (= means "is approximately equal to")
'IT
3.14
J.L = population mean
a = population standard deviation
=
Wheel
This equation may look formidable, but in applied statistics, tables are used for specific
problems instead of the equation.
Another important aspect in applied statistics is that the area under the normal distribution curve is more important than thefrequencies. Therefore, when the normal distribution is pictured, the y axis, which indicates the frequencies, is sometimes omitted.
Circles can be different sizes, depending on their diameters (or radii) and can be
used to represent wheels of different sizes. Likewise, normal curves have different
shapes and can be used to represent different variables.
The shape and position of the normal distribution curve depend on two parameters, the mean and the standard deviation. Each normally distributed variable
has its own normal distribution curve, which depends on the values of the variable's
mean and standard deviation. Figure 7-4(a) shows two normal distributions with
the same mean values but different standard deviations. The larger the standard deviation, the more dispersed, or spread out, the distribution is. Figure 7-4(b) shows two
normal distributions with the same standard deviation but with different means. These
curves have the same shapes but are located at different positions on the x axis. Figure 7-4(c) shows two normal distributions with different means and different standard
deviations.
I-
I
i
The normal distribution isacontinuous, symmetric, bell-shaped distribution of avariable.
Bluman: Elementary
Statistics: AStep by Step
Approach, Fourth Edition
7.TheNormal Distribution
Text
© TheMcGraw-Hili
Companias.2001
Section 7-3
The Standard Normal Distribution
251
Figure 7-4
Shapes of Normal Distributions
1l1=~
(a) Same means but different standard deviations
~
(f1 =(fz
111
J.iz
(b) Different means but same standard deviations
(c) Different means and different standard deviations
The properties of the normal distribution, including those mentioned in the definition, are explained next.
jill
mberb
-; coinswere tossed,'Nof
·'real~ing anycorinection
,withthe naturally occur
'bles; hashowedl
ulato only afew '
, dB. AboiJti OOye
,.Iater, twom3tl1erilati
,', Pierre Uiplace,inFra
",'
auss i' "
aquatio '
•,nbrmalciJive'iiidepe
.and withciut any krio
,',
,ofDeMoivre's work; I ',' .',;
'1924,Karl Pearson found';'
that DeMoivre had',
discovered the formula: ' '
• before laplace orGauss,. .'
These values follow the empirical rule for data given in Section 3-3.
One must know these properties in order to solve problems using applications involving distributions that are approximately normal .
The Standard Normal
Distribution
Since each normally distributed variable has its own mean and standard deviation, as
stated earlier, the shape and location of these curves will vary. In practical applications,
then, one would have to have a table of areas under the curve for each variable. To simplify this situation, statisticians use what is called the standard normal distribution.
-:,
e.
':>
...'.~.: ..~~.~·~~'.~·~~~~~~.'~~u~."~~.~~~~.~~·~~-.~;~'"u
bz
Bluman: Elementary
Statistics:A StepbyStep
Approach. Fourth Edition
252
Chapter 7
7.TheNormel Distribution
©TheMcGraw-Hili
Compenies. 2001
Text
The Normal Distribution
Figure 7-5
.
Areas under the Normal
Distribution Curve
J.l-30'
J.l-20'
I
J.l-10'
J.l+ 10'
J.l
I
I
' - - About 68% - - l
J.l+ 20'
About 95%
About 99.7%
l:lllj&~tiva 3.
Firid the area
under the standard normal
distribution, given various z
values.
}
J.l+ 30'
I
}
The standard normal distribution is anormal distribution with amean of 0and astandard deviation of1.
The standard normal distribution is shown in Figure 7-fJ.
Figure 7-6
Standard Normal
Distribution
The values wider the curve indicate the proportion of area in each section. For example, the area between the mean and one standard deviation above or below the mean
is about 0.3413, or 34.13%.
The formula for the standard normal distribution is
All normally distributed variables can be transformed into the standard normally
distributed variable by using the formula for the standard score:
Z=
value - mean
standard deviation
m@
This is the same formula used in Section 3-4. The use of this formula will be explained
in the next section.
As stated earlier, the area under the normal distribution curve is used to solve practical application problems, such as finding the percentage of adult women whose height
is between 5 feet 4 inches and 5 feet 7 inches, or finding the probability that a new
battery will last longer than four years. Hence, the major emphasis of this section will
be to show the procedure for finding the area under the standard normal distribution
curve for any z value. The applications will be shown in the next section. Once the X
values are transformed using the preceding formula, they are called z values. The z
value is actually the number of standard deviations that a particular X value is away
-t
8luman: Elementary
7.TheNormal Distribution
Text
© The McGraw-Hili
Companies. 2001
Statistics: AStep byStep
Approach. Fourth Edition
Section7-3
The Standard Normal Distribution
253
from the mean. Table E in Appendix C gives the area (to four decimal places) under the
standard normal curve for any z value from 0 to 3.09.
Finding Areas under
the Standard Normal
Distribution Curve
For the solution of problems using the normal distribution, a four-step procedure is recommended with the use of the Procedure Table shown below.
STEP 1
Draw a picture.
STEP 2
Shade the area desired.
STEP 3
Find the correct figure in the following Procedure Table (the figure that is
similar to the one you've drawn).
STEP 4
Follow the directions given in the appropriate block of the Procedure Table
to get the desired area.
There are seven basic types of problems and all seven are summarized in the Procedure Table. Note that this table is presented as an aid in understanding how to use the
normal distribution table and in visualizing the problems. After learning the procedures,
one should not find it necessary to refer to the procedure table for every problem.
eI
254
Bluman: Elementary
Statistics: AStep byStep
Approach, Fourth Edition
Chapter 7
I 7.TheNormal Distribution
j Text
© The McGraw-Hili
Companies, 2001
The Normal Distribution
t·,
.
Situation 1 Find the area under the normal curve between 0 and any z value;
, Example 7-1
""
<
Find the area under the normal distribution curve between z = 0 and z = 2.34.
Solution
Draw the figure and represent the area as shown in Figure 7-7.
figure 7-7
Area under the Standard
Normal Curve for
. Example 7-1
o
2.34
Since Table E gives the area between 0 and any z value to the right of 0, one need
only look up the z value in the table. Find 2.3 in the left column and 0.04 in the top tow.
The value where the column and row meet in the table is the answer, 0.4904. See Figure
7-8. Hence, the area is 0.4904, or 49.04%.
figure 7-8
Using Table E in
the Appendix for
Example 7-1
Bluman: Elementary
Statistics: AStepby Step
Approach. Fourth Edition
7.TheNormal Distribution
Text
© The McGraw-Hili
Companies. 2001
Section 7-3
Find the area between z
The Standard Normal Distribution
255
= 0 and z = 1.8.
Solution
Draw the figure and represent the area as shown in Figure 7-9.
Figure 7-9
Area under the Standard
Normal Curve for
Example 7-2
o
1.8
Find the area in Table E by finding 1.8 in the left column and 0.00 in the top row. The
area is 0.4641 or 46.41%.
Next, one must be able to find the areas for values that are not in Table E. This is
done by using the properties of the normal distribution described in Section 7-2.
EXample 7"-3
Find the area between z = 0 and z = -1.75.
Solution
Represent the area as shown in Figure 7-10.
Figure 7-10
Area under the Standard
Normal Curve for
Example 7-3
o
-1.75
Table E does not give the areas for negative values of z, But since the normal distribution is symmetric about the mean, the area to the left of the mean (in this case, the
mean is 0) is the same as the area to the right of the mean. Hence one need only look up
the area for z = + 1.75, which is 0.4599, or 45.99%. This solution is summarized in
block 1 in the Procedure Table.
Remember that area is always a positive number, even
if the z value is negative.
Situation 2 Find the area under the curve in either tail.
mple7+4
Find the area to the right of z
= 1.11.
Solution
Draw the figure and represent the area as shown in Figure 7-11.
b
_
~-----
-----
-~-~-~----
e
256
--~-
Bluman: Elementary
Statistics:AStepbyStep
Approach, Fourth Edition
-
--~
---
7.The Normal Distribution
© The McGraw-Hili
Text
Companies. 2001
Chapter 7 The Normal Distribution
Figure 7-11
Area under the Standard
• Normal Curve for
Example 7-4
o
1.11
The required area is in the tail of the curve. Since Table E gives the area between z
= 0 and z = 1.11, first fmd that area. Then subtract this value from 0.5000, since half of
the area under the curve is to the right of z = O. See Figure 7-12.
Figure 7-12
Finding the Area in the
Tail of the Curve
(Example 7-4)
o
1.11
The areabetweenz = Oandz = 1.11 is 0.3665, and the area to the right ofz = 1.11
is 0.1335, or 13.35%, obtained by subtracting 0.3665 from 0.5000.
Example 7+5
.
Find the area to the left of z = -1.93.
Solution
The desired area is shown in Figure 7-13.
Figure 7-13
Area under the Standard
Normal Curve for
Example 7-5
-1.93
o
Again, Table E gives the area for positive z values. But from the symmetric property of the normal distribution, the area to the left of -1.93 is the same as the area to the
right of z = + 1.93, as shown in Figure 7-14.
Now find the area between 0 and + 1.93 and subtract it from 0.5000, as shown:
0.5000
-0.4732
.0.0268, or 2.68%
IIII
Bluman: Elementary
7.TheNormal Distribution
Text
© The McGraw-Hili
, Companies, 2001
Statistics: AStepbyStep
Approech, Fourth Edition
IJ
I'
'IIII'
'II,
II
Section 7-3
The Standard Nanna! Distribution
257
I
II
,I
'I
:!
Figure 7-14
I
Comparison of Areas to
the Right of + 1.93 and to
the Left of -1.93
(Example 7-5)
o
+1.93
o
-1.93
This procedure was summarized in block 2 of the Procedure Table.
Situation 3 Find the area under the curve between any two z values on the same side of
the mean.
ample 7
Find the area between z = 2.00 and z = 2.47.
Solution
The desired area is shown in Figure 7-15.
Figure 7-15
Area under the Curve for
Example 7-6
o
2.00 2.47
For this situation, look up the area from z = 0 to Z = 2.47 and the area from z = 0
to Z = 2.00. Then subtract the two areas, as shown in Figure 7-16.
Figure 7-16
~ - - 0.4932 --~
Finding the Area
under the Curve for
Example 7-6
! '
k_'
o
2.00 2.47
---------
8luman: Elamentary
Statistics:AStep bV Step
Approach. Fourth Edition
258
Chapter 7
7.TheNormal Distribution
© The McGrew-Hili
Companies. 2001
Text
The Normal Distribution
"
The area between z = 0 and z = 2.47 is 0.4932. The area between z = 0 and z =
2.00 is 0.4772. Hence, the desired area is 0.4932 - 0.4772 = 0.0160, or 1.60%. This
procedure is summarized in block 3 of the Procedure Table.
Two things should be noted here. First, the areas, not the z values, are subtracted.
Subtracting the z values will yield an incorrect answer. Second, the procedure in Example 7-6 is used when both z values are on the same side of the mean.
e
Example 7~7
Find the area between z = -2.48 and z = -0.83.
Solution
The desired area is shown in Figure 7-17.
Figure 7-17
Area under the Curve for
Example 7-7
o
-0.83
-2.48
The area between z = 0 and z = -2.48 is 0.4934. The area between z = 0 and z =
-0.83 is 0.2967. Subtracting yields 0.4934 - 0.2967 = 0.1967, or 19.67%. This solution is summarized in block 3 of the Procedure Table.
Situation 4 Find the area under the curve between any two z values on opposite sides of
the mean.
"Examp e7-+8 "
Find the area between z = + 1.68 and
z=
-1.37.
Solution
The desired area is shown in Figure 7-18.
Figure 7-18
Area under the Curvefor
Example 7-8
-1.37
o
1.68
Now, since the two areas are on opposite sides of z = 0, one must find both areas
and add them. The area between z = 0 and z = 1.68 is 0.4535. The area between z = 0
and z = -1.37 is 0.4147. Hence, the total area between z = -1.37 and z -= +1.68 is
0.4535 + 0.4147 = 0.8682, or 86.82%.
7.TheNormal Distribution
Bluman: Elementary
Statistics: AStepby Step
Approach, fourth Edition
Text
© The McGrew-Hili
Companies. 2001
Section 7-3
The Standard Normal Distribution
259
This type of problem is summarized in block 4 of the Procedure Table.
Situation 5 Find the area under the curve to the left of any z value, where z is greater
than the mean.
Example 7--9 '
Find the area to the left of z = 1.99.
\
Solution
The desired area is shown in Figure 7-19.
Figure 7-19
Area under the Curve for
Example 7-9
'I
':\
•...
Ii
o
1.99
Since Table E gives only the area between z = 0 and z = 1.99, one must add 0.5000
to the table area, since 0.5000 (half) of the total area lies to the left of z = O. The area
betweenz = 0 andz = 1.99 is 0.4767, and the total area is 0.4767 + 0.5000 = 0.9767,
or 97.67%.
This solution is summarized in block 5 of the Procedure Table.
The same procedure is used when the z value is to the left of the mean, as shown in
the next example.
Situation 6 Find the area under the curve to the right of any z value, where z is less than
the mean.
EXample 7-10
'
Find the area to the right of z = -1.16.
Solution
The desired area is shown in Figure 7-20.
Figure 7-20
Area under the Curve for
Example 7-10
-1.16
The area betweenz = 0 andz
0.5000 = 0.8770, or 87.70%.
0
= -l.16is 0.3770. Hence, the total area is 0.3770 +
Ii
II
I!
\1
---,
eI
!
II
Bluman: Elementary _
Statistics: A Step byStep
Approach. Fourth Edition
I 7.TheNormal Distribution I Text
© The McGraw-Hili
Companies. 2001
Ii
I
260
Chapter 7 The Normal Distribution
This type of problem is summarized in block 6 of the Procedure Table.
.
;.
The final type of problem is that of finding the area in two tails. To solve it, find the
area in each tail and add them, as shown in the next example.
Situation 7 Find the total area under the curve in any two tails.
; EXample 7-1
Find the area to the right of z = +2.43 and to the left of z = -3.01.
Solution
The desired area is shown in Figure 7-21.
Figure 7-21
Area under the Curve for
Example 7-11
-3.01
o
2.43
The area to the right of 2.43 is 0.5000 - 0.4925 = 0.0075. The area to the left of
-3.01 is 0.5000 - 0.4987 = 0.0013. The total area, then, is 0.0075 + 0.0013 =
0:0088, or 0.88%.
This solntion is summarized in block 7 of the Procedure Table.
z=
The Normal
Distribution Curve as
a Probability
Distribution Curve
Example 7-12
The normal distribution curve can be used as a probability distribution curve for normally distributed variables. Recall that the normal distribution is a continuous distribution, as opposed to a discrete probability distribution, as explained in Chapter 6. The fact
that it is continuous means that there are no gaps in the curve. In other words, for every
z value on the x axis, there is a corresponding height, or frequency value.
However, as stated earlier, the area under the curve is more important than the frequencies. This area corresponds to a probability. That is, if it were possible to select any
z value at random, the probability of choosing one, say, between 0 and 2.00 would be the
same as the area under the curve between 0 and 2.00. In this case, the area is 0.4772.
Therefore, the probability of selecting any z value between 0 and 2.00 is 0.4772. The
problems involving probability are solved in the same manner as the previous examples
involving areas in this section. For example, if the problem is to fmd the probability of
selecting az value between 2.25 and 2.94, solve it by using the method shown in block
3 of the Procedure Table.
For probabilities, a special notation is used. For example, if the problem is to
find the probability of any z value between 0 and 2.32, this probability is written as
P(O < z < 2.32).
Find the probability for each.
a.
P(O
< z < 2.32)
f;luman: Elementary
~.
-r 7.~N;;r;;;al m~rib-~ion
IT~~--­
r
I © The McGraw-Hili
IV
Companies. 2001
SUrtistics: AStepby Step
Approach. Fourth Edition
Section 7-3
The Standard Normal Distribution
261
b. P(Z < 1.65)
c. P(z > 1.91)
Solution
a.
P(O < z < 2.32) means to find the area under the normal distribution curve
between 0 and 2.32. Look up the area in Table E corresponding to z = 2.32. It is
0.4898, or 48.98%. The area is shown in Figure 7-22.
Figure 7-22
Area under the Curve for
Part a of Example 7-12
o
2.32
b. P(z < 1.65) is represented in Figure 7-23.
Figure 7-23
Area under the Curve for
Part b of Example 7-12
o
1.65
First, find the area between 0 and 1.65 in Table E. Then add it to 0.5000 to get
0.4505 + 0.5000 = 0.9505, or 95.05%.
c. P(z> 1.91) is shown in Figure 7-24.
Figure 7-24
Area under the Curve for
Part c of Example 7-12
o
1.91
Since this area is a tail area, find the area between 0 and 1.91 and subtract it from
0.5000. Hence, 0.5000 - 0.4719 = 0.0281, or 2.81 %.
Sometimes, one must fmd a specific z value for a given area under the normal distribution. The procedure is to work backward, using Table E.
Find the z value such that the area under the normal distribution curve between 0 and the
z value is 0.2123.
eI
262
Bluman: Elementary
Statistics: A Step byStep
Approach. Fourth Edition
I 7.TheNormal Distribution I Text
© The McGraw-Hili
Companies. 2001
Chapter 7 The Normal Distribution
t,
Solution
,.
Draw the figure. The area is shown in Figure 7-25.
Figure 7-25
Area under the Curve for
Example 7-13
o
z
Next, find the area in Table E, as shown in Figure 7-26. Then read the correct Z
value in the left column as 0.5 and in the top row as 0.06 and add these two values to get
0.56.
Figure 7-26
Finding the z Value from
Table E (Example 7-13)
If the exact area cannot be found, use the closest value. For example, if one wanted
to find the z value for an area 0.4241, the closest area is 0.4236, which gives a Z value of
1.43. See Table E in Appendix C.
Finding the area under the standard normal distribution curve is the first step in
solving a wide variety of practical applications in which the variables are normally distributed. Some of these applications will be presented in Section 7~.
7-1. What are the characteristics of the normal
distribution?
7-2. Why is the normal distribution important in statistical
analysis?
7-3. What is the total area under the normal distribution
curve?
7-4•. What percentage of the area falls below the mean?
Above the mean?
7-5. What percentage of the area under the normal
distribution curve falls within one standard deviation above
and below the mean? Two standard deviations? Three
standard deviations?
For Exercises 7-6 through 7-25, find the area under the
normal distribution curve.
= 0 and z = 1.97
Between z = 0 and z = 0.56
7-6. Between z
7-7.
Bluman: Elementary
Statistics: AStep byStep
Approach, Fourth Edition
7.TheNormal Distribution
Text
© The McGraw-Hili
Companies. 2001
Section 7-3
7-S. Between z = 0 and z = -0.48
The Standard Normal Distribution
7-40.
7-9. Between z = 0 and z = -2.07
7-10. To the right of z = 1.09
7-U. To the right of z = 0.23
7-12. To the left of z = -0.64
o
7-13. To the left of z = -1.43
7-14. Between z = 1.23 and z = 1.90
z
7-41.
7-15. Between z = 0.79 and z = 1.28
7-16. Between z = -0.87 and z = -0.21
7-17. Between z = -1.56 and z = -1.83
7-18. Between z = 0.24 and z = -1.12
z
7-19. Between z = 2.47 and z = -1.03
7-20. To the left of z = 1.31
o
7-42.
7-21. To the left of z = 2.11
7-22. To the right of z = -1.92
0.0239
7-23. To the right of z = -0.18
7-24. To the left of z = -2.15 and to the right of z = 1.62
o
7-25. To the right of z = 1.92 and to the left of z = -0.44
In Exercises 7-26 through 7-39, find probabilities for
each, using the standard normal distribution.
< z < 1.69)
P(O < z < 0.67)
P( -1.23 < z < 0)
P( -1.57 < z < 0)
P(z > 2.59)
P(z > 2.83)
P(z < -1.77)
P(z < -1.51)
P( -0.05 < z < 1.10)
P(-2.46 < z < 1.74)
P(1.32 < z < 1.51)
P(1.46 < z < 2.97)
P(z > -1.39)
P(z < 1.42)
z
7-43.
7-26. P(O
7-27.
7-28.
7-29.
7-30.
7-31.
7-32.
7-33.
7-34.
7-35.
7-36.
7-37.
7-38.
7-39.
For Exercises 7-40 through 7-45, find the z value that
corresponds to the given area.
o
z
7-44.
o
7-45.
z
o
I
I
~------------
z
263
Bluman: Elementary
Statistics:AStep bV Step
Approach, Fourth Edition
264
Chapter 7
7.TheNormal Distribution
© The McGraw-Hili
Companies. 2001
TeXl
The Normal Distribution
7-46. Find the z value to the right of the mean so that
a.5%
a. 53.98% of the area under the distribution curve lies to
the left of it.
b. 71.90% ef the area under the distribution curve lies to
the left of it.
c. 96.78% of the area under the distribution curve lies to
the left of it.
b. 10%
c. 1%
*7-50. Find the z values that correspond to the 90th
percentile, 80th percentile, 50th percentile, and 5th
percentile.
*7-51. Draw a normal distribution with a mean of 100 and
a standard deviation of 15.
7-47. Find the z value to the left of the mean so that
a. 98.87% of the area under the distribution curve lies to
the right of it.
b. 82.12% of the area under the distribution curve lies to
the right of it.
c. 60.64% of the area under the distribution curve lies to
the right of it.
*7-52. Find the equation for the standard normal
distribution by substituting 0 for J.L and 1 for o-in the
equation
e- rx- p.'P/(2u 2 )
y
*7-48. Find two z values so that 40% of the middle area is
bounded by them.
*7-49. Find two z values, one positive and one negative,
so that the areas in the two tails total the following values.
Applications of the
Normal Distribution
~!)jet\tille
4. Find
probabilities foranormally
distributed variable by
transforming it into astandard
normal variable.
= 0-V27i-
*7-53. Graph the standard normal distribution by using the
formula derived in Exercise 7-52. Let 7T = 3.14 and e =
2.718. Use X values of -2, -1.5, -1, -0.5,0,0.5,1,1.5,
and 2.
The standard normal distribution curve can be used to solve a wide variety of practical
problems. The only requirement is that the variable be normally or approximately normally distributed. There are several mathematical tests to determine whether a variable
is normally distributed. See the Critical Thinking Challenge on page 294. For all the
problems presented in this chapter, one can assume that the variable is normally or
approximately normally distributed.
To solve problems by using the standard normal distribution, transform the original
variable into a standard normal distribution variable by using the formula
Z=
value - mean
standard deviation
or
x-p.,
z=-0-
This is the same formula presented in Section 3-4. This formula transforms the values
of the variable into standard units or z values. Once the variable is transformed, then the
Procedure Table and Table E in Appendix C can be used to solve problems.
For example, suppose that the scores for a standardized test are normally distributed, have a mean of 100, and have a standard deviation of 15. When the scores are
transformed into z values, the two distributions coincide, as shown in Figure 7-27. (Recall that the z distribution has a mean of a and a standard deviation of 1.)
Figure 7-27
Test Scores and Their
Corresponding z Values
-3
55
-2
70
-1
85
o
1
2
100
115
130
3 z
145
Blumen: Elementary
Statistics: AStep byStep
Approach, Fourth Edition
7.TheNonnel Distribution
© The McGraw-Hili
Text
Companies. 2001
Section 7-4
Applications of the Normal Distribution
265
To solve the application problems in this section, transform the values of the variable into z values and then use the Procedure Table and Table E, as shown in the next
examples.
Example 7-14
If the scores for the test have a mean of 100 and a standard deviation of 15, find the percentage of scores that will fall below 112.
Solution
STEP 1
Draw the figure and represent the area, as shown in Figure 7-28.
Figure 7-28
Area under the Curve for
Example 7-14
100
STEP 2
112
Find the z value corresponding to a score of 112.
z =X (T
JL
= 112 -
100
15
= 12 = 0 8
15'
Hence, 112 is 0.8 standard deviation above the mean of 100, as shown
for the z distribution in Figure 7-29.
Figure 7-29
Area and z Values for
Example 7-14
o
STEP 3
0.8
Find the area using Table E. The area between z = 0 and z =:= 0.8 is 0.2881. t
Since the area under the curve to the left of z = 0.8 is desired, add 0.5000 to
0.2881 (0.5000 + 0.2881 = 0.7881). Therefore, 78.81% of the scores fall
below 112.
Each month, an American household generates an average of 28 pounds of newspaper
for garbage or recycling. Assume the standard deviation is two pounds. If a household
is selected at random, find the probability of its generating
•
--"f
:
8luman:Elementary
Statistics:AStep by Step
Approach. Fourth Edition
I"
I
7.The Normal Distribution
© The McGraw-Hili
Companies. 2001
Text
I
i
266
Chapter 7
The Normal Distribution
a. Between 27 and 31 pounds per month.
b. More than 30.2 pounds per month.
Assume the variable is approximately normally distributed.
:.principie
'ormal
Orrec!
'~t
,chatliJ;g.
i;ftie'pllme
Source:Michael D. Shook and Robert L. Shook,The Book of Odds (New York: Penguin Putnam Inc., 1991).
Solution a
STEP 1
Draw the figure and represent the area. See Figure 7-30.
Figure 7-30
Area under the Curve for
Part a of Example 7-15
27 28
STEP 2
Find the two Z values.
Zl =
Zz =
STEP 3
31
X- JL
-(F-
X - JL
-(F-
=
=
27 - 28
2
=
-:21 =
31 - 28
:2 =
2
=
3
-0.5
1.5
Find the appropriate area, using Table E. The area between Z = 0 and Z =
-0.5 is 0.1915. The area between z = 0 and z = 1.5 is 0.4332. Add 0.1915
and 0.4332 (0.1915 + 0.4332 = 0.6247). Thus, the total area is 62.47%. See
Figure 7-31.
Figure 7-31
Area and z Values for Part
a of Example 7-15
27 28
-0.5 0
31
1.5
Hence, the probability that a randomly selected household generates between 27 and 31
pounds of newspapers per month is 62.47%.
Solution b
STEP 1
Draw the figure and represent the area, as shown in Figure 7-32.
Bluman: Elementary
Statistics: AStepbyStep
Approach. Fourth Edition
7.TheNormal Distribution
Text
© The McGraw-Hili
Companies. 2001
Section 7-4 Applications of the Normal Distribution
267
Figure 7-32
Area under the Curve for
Part b of Example 7-15
28
STEP 2
Find the z value for 30.2.
z =X -
I.J.
= 30.2 -
a'
STEP 3
30.2
28 = 2.2
2· 2
= 1.1
Find the appropriate area. The area between z = 0 and z = 1.1 obtained
from Table E is 0.3643. Since the desired area is in the right tail, subtract
0.3643 from 0.5000.
0.5000 - 0.3643 = 0.1357
Hence, the probability that a randomly selected household will accumulate
more than 30.2 pounds of newspapers is 0.1357, or 13.57%.
The normal distribution can also be used to answer questions of "How many?" This
application is shown in the next example.
The American Automobile Association reports that the average time it takes to respond
to an emergency call is 25 minutes. Assume the variable is approximately normally distributed and the standard deviation is 4.5 minutes. If 80 calls are randomly selected, approximately how many will be responded to in less than 15 minutes?
Source: Michael D. Shookand Robert L. Shook, The Book of Odds (NewYork: Penguin Putnam Inc.,
1991),70.
Solution
To solve the problem, find the area under the normal distribution curve to the left of 15.
STEP 1
Draw a figure and represent the area as shown in Figure 7-33.
Figure 7-33
Area under the Curve for
Example 7-16
15
25
eI
8luman: Elementary
Statistics: AStepbyStep
Approach, Fourth Edition
268
I 7.TheNormal Distribution I Text
© The McGraw-Hili
.Companies. 2001
Chapter 7 The Normal Distribution
t,
;.
STEP 2
Find the z value for 15.
z =X (T
f.L
= 15 -
25
4.5
= -2.22
STEP 3
Find the appropriate area. The area obtained from Table E is 0.4868, which
corresponds to the area between z = 0 and z = -2.22 (Use +2.22.)
STEP 4
Subtract 0.4868 from 0.5000 to get 0.0132.
STEP 5
To find how many calls will be made in less than 15 minutes, multiply the
sample size (80) by the area (0.0132) to get 1.056. Hence, 1.056, or
approximately one, call will be responded to in under 15 minutes.
Note: For problems using percentages, be sure to change the percentage to a decimal before multiplying. Also, round the answer to the nearest whole number, since it is
not possible to have 1.056 calls.
finding Data Values
Given Specific
Probabilities
Example 7-17
~lljllil::tlve 5.
Find specific
data values forgiven
percentag as using the
standard normal distribution.
The normal distribution can also be used to find specific data values for given percentages. This application is shown in the next example.
In order to qualify for a police academy, candidates must score in the top 10% on a general abilities test. The test has a mean of 200 and a standard deviation of 20. Find the
lowest possible score to qualify. Assume the test scores are normally distributed.
Solution
Since the test scores are normally distributed, the test value (X) that cuts off the upper
10% of the area under the normal distribution curve is desired. This area is shown in
Figure 7-34.
Figure 7-34
Area under the Curve for
Example 7-17
200
X
Work backward to solve this problem
STEP 1
Subtract 0.1000 from 0.5000 to get the area under the normal distribution
between 200 and X: 0.5000 - 0.1000 = 0.4000.
STEP 2
Find the z value that corresponds to an area of 0.4000 by looking up 0.4000
in the area portion of Table E. If the specific value cannot be found, use the
closest value-in this case, 0.3997, as shown in Figure 7-35. The
- - - - - - - - - - - - - - -- - -
Bluman: Elementary
Statistics: A Step byStep
Approach, Fourth Edition
---
--
----
7.The Normal Distribution
Text
© The McGraw-Hili
Companies. 2001
Section 7-4
Applications of the Normal Distribution
269
corresponding Z value is 1.28. (If the area falls exactly halfway between two
z values, use the larger of the two z values. For example, the area 0.4500
falls halfway between 0.4495 and 0.4505. In this case use 1.65 rather than
1.64 for the z value.)
Figure 7-35
Finding the z Value from
TableE (Example 7-17)
Substitute in the formula z = (X - JL)/ 0' and solve for X.
STEP 3
1.28 =X ~goo
+ 200
25.60 + 200
(1.28)(20)
= X
= X
225.60 =X
226 =X
A score of 226 should be used as a cutoff. Anybody scoring 226 or higher qualifies.
Instead of using the formula shown in Step 3 one can use the formula X
This is obtained by solving
z=
(X - JL)
= z . 0' + JL.
for X
0'
as shown.
z·O'=X-JL
z'u+JL=X
X=z'u+JL
Multiply both sides by
0'.
Add JL to both sides.
Exchange both sides of the equation.
For a medical study, a researcher wishes to select people in the middle 60% of the population based on blood pressure. If the mean systolic blood pressure is 120 and the
b
_
e
270
8luman: Elementary
Statistics: AStep byStep
Approach. Fourth Edition
Chapter 7
7.TheNormal Distribution
© The McGraw-Hili
Text
Companies. 2001
The Normal Distribution
standard deviation is 8, find the upper and lower readings that would qualify people to
participate in the study.
Solution
Assume that blood pressure readings are normally distributed; then cutoff points are as
shown in Figure 7-36.
Figure 7-36
Area under the Curve for
Example 7-18
Note that two values are needed, one above the mean and one below the mean. Find
the value to the right of the mean first. The closest z value for an area of 0.3000 is 0.84.
Substituting in the formula X = za + p" one gets
Xl = zo: + p, = (0.84)(8)
On the other side,
z=
X2 = (-0.84)(8)
+ 120 =
126.72
-0.84; hence,
+ 120 = 113.28
Therefore, the middle 60% will have blood pressure readings of 113.28
126.72.
<
X
<
As shown in this section, the normal distribution is a useful tool in answering many
questions about variables that are normally or approximately normally distributed.
7-54. Explain why the standard normal distribution can be
used to solve many real-life problems.
7-55. The average hourly wage of production workers in
manufacturing is $11.76. Assume the variable is normally
distributed. If the standard deviation of earnings is $2.72,
find these probabilities for a randomly selected production
worker.
a. The production worker earns more than $12.55.
b. The production worker earns less than $8.00.
Source: Statistical Abstract ofthe United States 1994 (Washington,
DC: U.S. Bureau of the Census).
7-56. The Speedmaster N automobile gets an average 22.0
miles per gallon in the city. The standard deviation is 3 miles
per gallon. Assume the variable is normally distributed. Find
the probability that on any given day, the car will get more
than 26 miles per gallon when driven in the city.
7-57. If the mean salary of telephone operators in the
United States is $31,256, and the standard deviation is
$3,000, find the following probabilities for a randomly
selected telephone operator. Assume the variable is
normally distributed.
a. The operator earns more than $35,000.
b. The operator earns less than $25,000.
7-58. For a specific year, Americans spent an average of
$71.12 for books. Assume the variable is normally
distributed. If the standard deviation of the amount spent on
books is $8.42, find these probabilities for a randomly
selected American.
a. He or she spent more than $60 per year on books.
b. He or she spent less than $80 per year on books.
Source: Statistical Abstract ofthe United States 1994 (Washington,
DC: U.S. Bureau of the Census).
Bluman: Elementary
Slatistics: ASlep by Slap
Approach, Fourth Edition
7.TheNormal Distribution
Text
©TheMcGraw-Hili
Companies, 2001
Section 7-4
7-59. A survey found that people keep their television sets
an average of 4.8 years. The standard deviation is 0.89 year.
If a person decides to buy a new TV set, fmd the probability
that he or she has owned the old set for the following
amount of time. Assume the variable is normally distributed.
a. Less than 2.5 years
b. Between 3 and 4 years
c. More than 4.2 years
7-60. The average age of CEOs is 56 years. Assume the
variable is normally distributed. If the standard deviation is
four years, find the probability that the age of a randomly
selected CEO will be in the following range.
a. Between 53 and 59 years old
b. Between 58 and 63 years old
c. Between 50 and 55 years old
Source: MichaelD. ShookandRobertL. Shook, The Book of Odds
(New York: Penguin PutnamInc., 1991),49.
7-61. The average life of a brand of automobile tires is
30,000 miles, with a standard deviation of 2,000 miles. If a
tire is selected and tested, find the probability that it will
have the following lifetime. Assume the variable is
normally distributed.
a. Between 25,000 and 28,000 miles
b. Between 27,000 and 32,000 miles
c. Between 31,500 and 33,500 miles
7-62. The average time a visitor spends at the Renzie Park
Art Exhibit is 62 minutes. The standard deviation is 12
minutes. If a visitor is selected at random, find the
probability that he or she will spend the following amount
of time at the exhibit. Assume the variable is normally
distributed.
a. At least 82 minutes
b. At most 50 minutes
7-63. The average time for a courier to travel from
Pittsburgh to Harrisburg is 200 minutes, and the standard
deviation is 10 minutes. If one of these trips is selected at
random, find the probability that the courier will have the
following travel time. Assume the variable is normally
distributed.
a. At least 180 minutes
b. At most 205 minutes
7-64. The average amount of rain per year in South
Summerville is 49 inches. The standard deviation is
5.6 inches. Find the probability that next year South
Summerville will receive the following amount of rain.
Assume the variable is normally distributed.
a. At most 51 inches of rain
b. At least 58 inches of rain
7-65. The average waiting time for a drive-in window at a
local bank is 9.2 minutes, with a standard deviation of 2.6
minutes. When a customer arrives at the bank, find the
Applications of the Normal Distribution
271
probability that the customer will have to wait the following
amount of time. Assume the variable is normally distributed.
a. Between 5 and 10 minutes
b. Less than 6 minutes or more than 9 minutes
7-66. The average time it takes college freshmen to
complete the Mason Basic Reasoning Test is 24.6 minutes.
The standard deviation is 5.8 minutes. Find these
probabilities. Assume the variable is normally distributed.
a. It will take a student between 15 and 30 minutes to
complete the test.
b. It will take a student less than 18 minutes or more than
28 minutes to complete the test. .
7-67. A brisk walk at 4 miles per hour burns an average of
300 calories per hour. If the standard deviation of the
distribution is 8 calories, find the probability that a person
who walks one hour at the rate of 4 miles per hour will
burn the following calories. Assume the variable is
normally distributed.
a. More than 280 calories
b. Less than 293 calories
c. Between 285 and 320 calories
7-68. During September, the average temperature of
Laurel Lake is 64.2° and the standard deviation is 3.2°.
Assume the variable is normally distributed. For a
randomly selected day, find the probability that the
temperature will be as follows:
a. Above62°
b. Below 67°
c. Between 65° and 68°
7-69. If the systolic blood pressure for a certain group of
obese people has a mean of 132 anda standard deviation of
8, find the probability that a randomly selected obese
person will have the following blood pressure. Assume the
variable is normally distributed.
a. Above 130
b. Below 140
c. Between 131 and 136
7-70. In order to qualify for letter sorting, applicants are
given a speed-reading test. The scores are normally
distributed, with a mean of 80 and a standard deviation of
8. If only the top 15% of the applicants are selected, find
the cutoff score.
7-71. The scores on a test have a mean of 100 and a
standard deviation of 15. If a personnel manager wishes to
select from the top 75% of applicants who take the test,
find the cutoff score. Assume the variable is normally
distributed.
7-72. For an educational study, a volunteer must place in
the middle 50% on a test. If the mean for the population is
100 and the standard deviation is 15, find the two limits
(upper and lower) for the scores that would enable a
.
r
I
e
272
Bluman: Elementary
Statistics:AStepbyStep
Approach. Fourth Edition
Chapter 7
7.TheNormal Distribution
© The McGraw-Hili
Text
Companies, 2001
The Normal Distribution
volunteer to participate in the study. Assume the variable is
normally distributed.
7-73. A contractor decided to build homes that will
include the middle 80% of the market. If the average size
(in square feet) of homes built is 1,810, find the maximum
and minimum sizes of the homes the contractor should
build. Assume that the standard deviation is 92 square feet
and the variable is normally distributed.
Source: Michael D. Shook andRobert L. Shook, The Book ofOdds
(New York:Penguin PutnamInc., 1991), 15.
7-74. If the average price of a new home is $145,500, find
the maximum and minimum prices of the houses a
contractor will build to include the middle 80% of the
market. Assume that the standard deviation of prices is
$1,500 and the variable is normally distributed.
Source: Michael D. Shook and Robert L. Shook, The Book of Odds
(New York:Penguin PutnamInc., 1991), 15.
7-75. An athletic association wants to sponsor a footrace.
The average time it takes to run the course is 58.6 minutes,
with a standard deviation of 4.3 minutes. If the association
decides to award certificates to the fastest 20% of the
racers, what should the cutoff time be? Assume the variable
is normally distributed.
7-76. In order to help students improve their reading, a
school district decides to implement a reading program. It
is to be administered to the bottom 5% of the students in
the district, based on the scores on a reading achievement
exam. If the average score for the students in the district is
122.6, find the cutoff score that will make a student eligible
for the program. The standard deviation is 18. Assume the
variable is normally distributed.
The owner reads in a study that the mean price of children's
talking books is $10.52, with a standard deviation of $1.08.
Find the maximum and minimum prices of talking books
the owner should sell. Assume the variable is normally
distributed.
7-81. An advertising company plans to market a product
to low-income families. A study states that for a particular
area, the average income per family is $24,596 and the
standard deviation is $6,256. If the company plans to target
the bottom 18 % of the families based on income, find the
cutoff income. Assume the variable is normally distributed.
7-82. If a one-person household spends an average of $40
per week on groceries, find the maximum and minimum
dollar amount spent per week for the middle 50% of oneperson households. Assume that the standard deviation is
$5 and the variable is normally distributed.
Source: Michael D. Shook and Robert L. Shook, The Book of Odds
(New York:Penguin Putnam Inc., 1991), 192.
7-83. The mean lifetime of a wristwatch is 25 months,
with a standard deviation of 5 months. If the distribution is
normal, for how many months should a guarantee be if the
manufacturer does not want to exchange more than 10% of
the watches? Assume the variable is normally distributed.
7-84. In order to quality for security officers' training,
recruits are tested for stress tolerance. The scores are
normally distributed, with a mean of 62 and a standard
deviation of 8. If only the top 15% of recruits are selected,
find the cutoff score.
7-85. In the distributions shown, state the mean and
standard deviation for each. Hint: See Figures 7-5 and 7-6.
Hint: The vertical lines are one standard deviation apart.
7-77. An automobile dealer finds that the average price of
a previously owned vehicle is $8,256. He decides to sell
cars that will appeal to the middle 60% of the market in
terms of price. Find the maximum and minimum prices of
the cars the dealer will sell: The standard deviation is
$1,150 and the variable is normally distributed.
7-78. A small publisher wishes to publish selfimprovement books. After a survey of the market, the publisher finds that the average cost of the type of book that she
wishes to publish is $12.80. If she wants to price her books
to sell in the middle 70% range, what should the maximum
and minimum prices of the books be? The standard deviation
is $0.83 and the variable is normally distributed.
60
80
100
120
140
160
180
7.5
10
12.5
15
17.5
20
22.5
7-79. A special enrichment program in mathematics is to
be offered to the top 12% of students in a school district. A
standardized mathematics achievement test given to all
students has a mean of 57.3 and a standard deviation of 16.
Find the cutoff score. Assume the variable is normally
distributed.
7-80. A book store owner decides to sell children's talking
books that will appeal to the middle 60% of his customers.
. Blumen: Elementary
StBtistics: AStep byStep
7.TheNormel Distribution
Text
© The McGraw-Hili
Companies, ZOOl
Approech, Fourth Edition
Section 7-4
Applications of the Normal Distribution
273
7-f,7. Given a data set, how could you decide if the
distribution of the data was approximately normal?
7-f,8. If a distribution of raw scores were plotted and then
the scores were transformed into zscores, would the shape
of the distribution change? Explain your answer.
7-f,9. In a normal distribution, find
0" when JL
2.68% of the area lies to the right of 105.
15
20
25
30
35
= 100 and
7-90. In a normal distribution, find JL when rr is 6 and
40
3.75% of the area lies to the left of 85.
7-91. In a certain normal distribution, 1.25% of the area
lies to the left of 42 and 1.25% of the area lies to the right
of 48. Find JL and 0".
7-f,6. Suppose that the mathematics SAT scores for high
school seniors for a specific year have a mean of 456 and
a standard deviation of 100 and are approximately
normally distributed. If a subgroup of these high school
seniors, those who are in the National Honor Society, is
selected, would you expect the distribution of scores to
have the same mean and standard deviation? Explain your
answer.
TI·83
Step by Step
7-92. An instructor gives a 100-point examination in
which the grades are normally distributed. The mean is 60
and the standard deviation is 10. If there are 5% Xs and 5%
F's, 15% B's and 15% D's, and 60% C's, find the scores
that divide the distribution into those categories.
Tilll® l'T\tvlfi.m~n If)1l;sitdfu>wtl1\0llli
To find the area under the standard normal distribution curve between any two z values:
Example TI7-1
Find the area between z
= 2.00 and z = 2.47, as in Example 7-6.
1. Press 2nd [DISTR] then 2 to get
normalcdf (.
2. Enter 2, 2.47) then press ENTER.
The calculator will display .0159944012.
To find the area under the normal distribution curve less than (or greater than) any z value,
use -1E99 and 1E99 to specify infinity.
Example TI7-2
Find the area to the left of z = -1.93, as in Example 7-5.
1. Press 2nd [DISTR] then 2 to get normalcdf (.
2. Enter -lE99, -1.93) then press ENTER. Note: Use 2nd [EE] to paste E into the cursor
location (this indicates the following number-here, 99-is an exponent).
The calculator will display .0268033499.
nor~a
"1
.'
,',-, .-, 4.....(
1c df ~~,~.
.0159944012
nor-ma1cdf':: -1 E99,
-1 q~
...
--.1 ..
•
.I'
CI'-::1~"':!'4.~q
• •"::1--:'':::
".:.I..:.'_'....
~.:. ...J •..J
.'.'
-
o
274
Bluman: Elementary
Statistics:AStep by Step
Approach. Fourth Edition
Chapter 7
7.TheNormal Distribution
© The McGraw-Hili
Companies, 2001
TeXl
The Normal Distribution
To find the area under a normal distribution curve between any two z values given a specific
mean and standard deviation:
Example TI7-3
= 28 and = 2, as in Example 7-15a.
Find the area between 27 and 31 when JL
(J"
1. Press 2nd [DISTR] then 2 to get normalcdf (.
2. Enter 27,31,28,2) then press ENTER.
The calculator will display .6246552391.
.- '-'4.-f.:·oJ
C'S'-'791
• b..::.
.....::. ..:J _
Excel
Step by Step
Th~ r'f@rr"lffill&lH Di15lJ;!1"liblillti!@illl
Excel has a built-in table for the standard normal distribution, in the form of the function
NORMSDIST. To find the area between two z values under the normal distribution curve:
1. Select a blank cell.
2. Click the f x icon to call up the function list.
3. Select NORMSDIST from the Statistical function category.
4. Enter the smaller z value and click [OK].
5. Repeat steps 1-4 for the larger z value.
6. Subtract the two computed values.
The Excel Function
NORMSDIST
The Central Limit
Theorem
In addition to knowing how individual data values vary about the mean for a population, statisticians are also interested in knowing about the distribution of the means
of samples taken from a population. This topic is discussed in the subsections that
follow.
Siuman: Elementary
Statistics: A Step byStep
Approach. Fourth Edition
7.The Normal Distribution
Text
© The McGraw-Hili
Companies, 2001
Section 7-5
Distribution of Sample
Means
Ililjeciille 6. Use the central
limit theorem tosolve
problems involving sample
means for large and small
samples.
The Central Limit Theorem
275
Suppose a researcher selects 100 samples of a specific size from a large population and
comput~ ~ ~ean of ~e same variable for each of the 100 samples. These sample
means, Xl> X2 , X3 , ..• , XlQO, constitute a sampling distribution of sample means.
Asampling distribution ofsample means isadistribution obtained by using the means computed from
random samples of aspecific size taken from apopulation.
If the samples are randomly selected with replacement, the sample means, for the
most part, will be somewhat different from the population mean JL. These differences
are caused by sampling error.
Sampling error is the difference between the sample measure and the corresponding population measure due
to the fact that the sample isnot apertect representation of the populatlon,
When all possible samples of a specific size are selected with replacement from a
population, the distribution of the sample means for a variable has two important properties, which are explained next.
The following example illustrates these two properties. Suppose a professor gave
an eight-point quiz to a small class of four students. The results of the quiz were 2, 6, 4,
and 8. For the sake of discussion, assume that the four students constitute the population. The mean of the population is
11.=2+6+4+8=5
4
f""
The standard deviation of the population is
c
y(2 - 5)2
>
+ (6 -
5)2 + (4 - 5)2
4
+ (8 -
5)2
= 2.236
The graph of the original distribution is shown in Figure 7-37. This is called a uniform
distribution.
Figure 7-37
Distribution of Quiz
Scores
2
6
4
8
Score
Now, if all samples of size 2 are taken with replacement, and the mean of each sample is found, the distribution is as shown next.
oI
276
8luman: Elementary
Statistics: A Step byStep
Approach. Fourth Edition
I 7.TheNormal Distribution I Text
© The McGraw-Hili
Companies. 2001
Chapter 7 The Normal Distribution
"
Sample
Mean
Sample
Mean
2,2
2,4
2,6
2,8
4,2
4,4
4,6
4, 8
2
4
5
6,2
6,4
6,6
6, 8
8,2
8,4
8,6
7
6
8,8
8
3
4
5
3
4
5
6
7
5
6
A frequency distribution of sample means is as follows.
X
f
2
3
4
5
6
1
2
3
4
3
2
1
7
8
For the data from the example just discussed, Figure 7-38 shows the graph of the
sample means. The graph appears to be somewhat normal, even though it is a histogram.
Figure 7-38
Distribution of
Sample Means
The mean of the sample means, denoted by /J.lg, is
/Lx =
2+3+ .. ·+880
16
= 16 = 5
which is the same as the population mean. Hence,
/Lx = /L
The standard deviation of sample means, denoted by
0\
(Tx
=
x' is
(T
/(2 - 5)2 + (3 - 5)2 + ... + (8 - 5)2
16
= 1.581
V
which is the same as the population standard deviation divided by 0:
- 2.236 - 1 581
(Tx -
VI - .
Blumen: Elementary
Statistics: A Step byStep
Approach. Fourth Edition
7.TheNormal Distribution
Text
© The McGraw-Hili
Companies. 2001
Section 7-5
The Central Limit Theorem
277
(Note: Rounding rules were not used here in order to show that the answers coincide.)
In summary, if all possible samples of size n are taken with replacement from the
same population, the mean of the sample means, denoted by JLx' equals the population
mean JL; and the standard deviation of the sample means, denoted by CTi' equals a/Vn.
The standard deviation of the sample means is called the standard error of the mean.
Hence,
CT
CTX = Vii
A third property of the sampling distribution of sample means pertains to the shape
of the distribution and is explained by the central limit theorem.
The central limit theorem can be used to answer questions about sample means in
the same manner that the normal distribution can be used to answer questions about individual values. The only difference is that a new formula must be used for the z values.
It is
X-JL
z=--
CT/Vii
Notice that X is the sample mean, and the denominator is the standard error of the
mean.
If a large number of samples of a given size were selected from a large population,
and the sample means computed, the distribution of sample means would look like the
one shown in Figure 7-39. The percentages indicate the areas of the regions.
Figure 7-39
Distribution of Sample
Means for Large Number
of Samples
Il +1ax
Il + 2ax
Il + 3ax
It's important to remember two things when using the central limittheorem:
1. When the original variable is normally distributed, the distribution of the sample
means will be normally distributed, for any sample size n:
2. When the distribution of the original variable departs from normality, a sample size
of 30 or more is needed to use the normal distribution to approximate the
distribution of the sample means. The larger the sample, the better the
approximation will be.
Blumsn:Elementary
© The McGraw-Hili
Companies. 2001
Text
7.TheNormal Distribution
Statistics: A Step by Step
Approach,FnurthEdition
278
Chapter 7 The Nonna! Distribution
The next several examples show how the standard normal distribution can be used
to answer questions about sample means .
...............................................................................................................................................................................
A. C. Neilsen reported that children between the ages of 2 and 5 watch an average of
25 hours of television per week. Assume the variable is normally distributed and the
standard deviation is 3 hours. If 20 children between the ages of 2 and 5 are randomly
selected, [rod the probability that the mean of the number of hours they watch television
will be greater than 26.3 hours.
Source: Michael D. Shook and Robert L. Shook, The Book of Odds (New York: Penguin Putnam Inc.,
1991),161.
Solution
Since the variable is approximately normally distributed, the distribution of sample
means will be approximately normal, with a mean of 25. The standard deviation of the
sample means is
U'
U'jf
=
Vii =
3
VW = 0.671
The distribution of the means is shown in Figure 7-40, with the appropriate area
shaded.
Figure 7-40
Distribution of the Means
for Example 7-19
25
26.3
The z value is
z=
x-
p:
U'/Vii
=
26.3 - 25
1.3
3/VW = 0.671 = 1.94
The area between 0 and 1.94 is 0.4738. Since the desired area is in the tail, subtract
0.4738 from 0.5000. Hence, 0.5000 - 0.4738 = 0.0262, or 2.62%.
One can conclude that the probability of obtaining a sample mean larger than 26.3
hours is 2.62% (i.e., p(Jt. > 26.3) = 2.62%).
The average age of a vehicle registered in the United States is 8 years, or 96 months. Assume the standard deviation is 16 months. If a random sample of 36 vehicles is selected,
:find the probability that the mean of their age is between 90 and 100 months.
Source: Harper's Index 290, no. 1740 (May 1995), p. 11.
Solution
The desired area is shown in Figure 7-41.
Bluman: Elementary
Statistics: AStep byStep
7.TheNormal Distribution
Text
© The McGraw-Hili
Companies. 2001
Approach, Fourth Edition
Section 7-5
The Central Limit Theorem
279
Figure 7-41
Areaunder the Curve for
Example 7-20
96
90
100
The two Z values are
90 - 96
Zl
= 16/v'36 = -2.25
Zz
=
100 - 96
16/v'36 = 1.50
The two areas corresponding to the Z values of -2.25 and 1.50, respectively, are
0.4878 and 0.4332. Since the z values are on opposite sides of the mean, find the probability by adding the areas: 0.4878 + 0.4332 = 0.921, or 92.1%.
Hence, the probability of obtaining a sample mean between 90 and 100 months is
92.1% i.e., P(90 < f < 100) = 92.1 %.
Since the sample size is 30 or larger, the normality assumption is not necessary, as
shown in Example 7-20.
Students sometimes have difficulty deciding whether to use
f-p,
z=--
a/Vii
or
X-p,
z=-(F
The formula
f - p,
z=--
a/Vii
should be used to gain information about a sample mean, as shown in this section. The
formula
X-p,
z=-(F
-
is used to gain information about an individual data value obtained from the population.
Notice that the first formula contains X, the symbol for the sample mean, while the sec-,
ond formula contains X, the symbol for an individual data value. The next example illustrates the uses of the two formulas.
The average number of pounds of meat a person consumes a year is 218.4 pounds.
Assume that the standard deviation is 25 pounds and the distribution is approximately
normal.
Source: Michael D. Shook and Robert L. Shook, The Book ofOdds (New York: Penguin Putnam Inc.,
1991),164.
e
280
Bluman: Elementary
Statistics:AStepbyStep
Approach. Fourth Edition
7.TheNormal Distribution
Text
© The McGraw-Hili
Companies. 2001
Chapter 7 The Normal Distribution
a. Find the probability that a person selected at random consumes less than 224
pounds per year.
b. If a sample of 40 individuals is selected, find the probability that the mean of the
sample will be less than 224 pounds per year.
Solution
a. Since the question asks about an individual person, the formula z = (X -
p.,)1 a
is used.
The distribution is shown in Figure 7-42.
Figure 7-42
Area under.the Curve for
Part a of Example 7-21
218;4 224
Distribution ofindividual data values forthe population
The
z value is
z= X:
p., = 224
~5218.4
= 0.22
The area between 0 and 0.22 is 0.0871; this area must be added to 0.5000 to get the
total area to the left of z = 0.22.
0.0871
+ 0.5000 =
0.5871
Hence, the probability of selecting an individual who consumes less than 224
pounds of meat per year is 0.5871, or 58.71 % (i.e., P(X < 224) = 0.5871).
b. Since the question concerns the mean of a sample with a size of 40, the formula
z = (X - p.,)/(u/vn) is used.
The area is shown in Figure 7-43.
Figure 7-43
Area under the Curve for
Part b of Example 7-21
218;4
224
Distribution of means forallsamples ofsize 40taken from the population
Bluman: Elementary
Statistics: AStepbyStep
Approach, Fourth Edition
7.TheNormal Distribution
Text
© The McGraw-Hili
Companies, 2001
Section 7-5
The CentralLimit Theorem
281
The z value is
z
= X-
JL
cr/yn
= 224 - 218.4 = 1 42
25/V45
.
The area between z = 0 and z = 1.42 is 0.4222; this value must be added to 0.5000
to get the total area.
0.4222
+ 0.5000 =
0.9222
Hence, the probability that the mean of a sample of 40 individuals is less than 224
pounds per year is 0.9222, or 92.22%. That is, P(X < 224) = 0.9222.
Comparing the two probabilities, one can see that the probability of selecting
an individual who consumes less than 224 pounds of meat per year is 58.71%, but
the probability of selecting a sample of 40 people with a mean consumption of
meat that is less than 224 pounds per year is 92.22%. This rather large difference is
due to the fact that the distribution of sample means is much less variable than the
distribution of individual data values.
finite Population
Correction factor
(Optional)
The formula for the standard error of the mean, cr/yn, is accurate when the samples are
drawn with replacement or are drawn without replacement from a very large or infinite
population. Since sampling with replacement is for the most part unrealistic, a correctionfactor is necessary for computing the standard error of the mean for samples drawn
without replacement from a finite population. Compute the correction factor by using
the following formula:
1~
VN=1
where N is the population size and n is the sample size.
This correction factor is necessary if relatively large samples are taken from a small
population, because the sample mean will then more accurately estimate the population
mean and there will be less error in the estimation. Therefore, the standard error of the
mean must be multiplied by the correction factor to adjust it for large samples taken
from a small population. That is,
Finally, the formula for the z value becomes
X-JL
z= a
yn'
~-n
N- 1
When the population is large and the sample is small, the correction factor is generally not used,since it will be very close to 1.00.
The formulas and their uses are summarized in Table 7-1.
41) , 8luman: Elementary
I 7.TheNormal Distribution I Text
Statistics:AStep bV Step
Approach. Fourth Edition
282
Chapter 7
© The McGraw-Hili
Companies. 2001
The Normal Distribution
7-93. If samples of a specific size are selected from a
population and the means are computed, what is this
distribution of means called?
7-94. Why do most of the sample means differ somewhat
from the population mean? What is this difference called?
7-95. What is the mean of the sample means?
7-96. What is the standard deviation of the sample means
called? What is the formula for this standard deviation?
7-97. What does the central limit theorem say about the
shape of the distribution of sample means?
7-98. What formula is used to gain information about an
individual data value when the variable is normally
distributed?
7-99. What formula is used to gain information about a
sample mean when the variable is normally distributed or
when the sample size is 30 or more?
For Exercises 7-100 through 7-117, assume that the
sample is taken from a large population and the
correction factor can be ignored.
7-100. A survey found that Americans generate an average
individuals is selected, find the probability that the mean
will be greater than 21.3 g/ml. Assume the variable is
normally distributed.
7-103. The mean weight of 18-year-old females is 126
pounds, and the standard deviation is 15.7. If a sample of
25 females is selected, find the probability that the mean-of
the sample will be greater than 128.3 pounds. Assume the
variable is normally distributed.
7-104. The mean grade point average of the engineering
majors at a large university is 3.23, with a standard
deviation of 0.72. In a class of 48 students, find the
probability that the mean grade point average of the
students is less than 3.15.
7-105. The average price of a pound of sliced bacon is
$2.02. Assume the standard deviation is $0.08. If a random
sample of 40 one-pound packages is selected, find the
probability that the mean of the sample will be less than
$2.00.
Source: Statistical Abstract ofthe United States 1994.
(Washington, DC: U. S. Bureau of the Census).
7-106. The average hourly wage of fast-food workers
of 17.2 pounds of glass garbage each year. Assume the
standard deviation of the distribution is 2.5 pounds. Find
the probability that the mean of a sample of 55 families will
be between 17 and 18 pounds.
employed by a nationwide chain is $5.55. The standard
deviation is $1.15. If a sample of 50 workers is selected,
find the probability that the mean of the sample will be
between $5.25 and $5.90.
Source:MichaelD. Shook and Robert L. Shook, TheBook of Odds
(New York: Penguin PutnamInc., 1991),14.
7-107. The mean score on a dexterity test for
7-101. The mean serum cholesterol of a large population
of overweight adults is 220 mg/dl and the standard
deviation is 16.3 mg/dl, If a sample of 30 adults is selected,
fmd the probability that the mean will be between 220 and
222mg/dl.
7-102. For a certain large group of individuals, the mean
hemoglobin level in the blood is 21.0 grams per milliliter
(g/ml). The standard deviation is 2 g/mI. If a sample of 25
12-year-olds is 30. The standard deviation is 5. If a
psychologist administers the test to a class of 22
students, find the probability that the mean of the sample
will be between 27 and 31. Assume the variable is
normally distributed.
7-108. A recent study of the life span of portable radios
found the average to be 3.1 years, with a standard
deviation of 0.9 year. If the number of radios owned by
the students in one dormitory is 47, find the probability
8luman: Elementary
Statistics: A Step byStep
Approach. Fourth Edition
7.TheNormal Distribution
Text
© The McGraw-Hili
Companies, 2001
Section 7-5
that the mean lifetime of these radios will be less than
2.7 years.
7-109. The average age oflawyers is 43.6 years, with a
standard deviation of 5.1 years. If a law firm employs 50
lawyers, find the probability that the average age of the
group is greater than 44.2 years old.
7-110. The average annual precipitation for Des Moines is
30.83 inches, with a standard deviation of 5 inches. If a
random sample of 10 years is selected, find the probability
that the mean will be between 32 and 33 inches. Assume
the variable is normally distributed.
7-111. Procter & Gamble reported that an American family
of 4 washes an average of one ton (2000 pounds) of clothes
each year. If the standard deviation of the distribution is
187.5 pounds, find the probability that the mean of a
randomly selected sample of 50 families of four will be
between 1980 and 1990 pounds.
Source: Lewis H. Lapham, MichaelPollan, and Eric Etheridge,
The Harper's Index Book (NewYork: Henry Holt & Co., 1987).
7-112. The average annual salary in Pennsylvania was
$24,393 in 1992. Assume that salaries were normally
distributed for a certain group of wage earners, and the
standard deviation of this group was $4,362.
a. Find the probability that a randomly selected individual
earned less than $26,000.
b. Find the probability that for a randomly selected sample
of 25 individuals, the mean salary was less than $26,000.
c. Why is the probability for Part b higher than the
probability for Part a?
Associated Press,December23, 1992.
7-113. The average time it takes a group of adults to
complete a certain achievement test is 46.2 minutes. The
standard deviation is 8 minutes. Assume the variable is
normally distributed.
a. Find the probability that a randomly selected adult will
complete the test in less than 43 minutes.
b. Find the probability that if 50 randomly selected adults
take the test, the mean time it takes the group to complete
the test will be less than 43 minutes.
c. Does it seem reasonable that an adult would finish the
test in less than 43 minutes? Explain.
d. Does it seem reasonable that the mean of the 50 adults
could be less than 43 minutes?
7-114. Assume that the mean systolic blood pressure of
normal adults is 120 millimeters of mercury (mm Hg) and
the standard deviation is 5.6. Assume the variable is
normally distributed.
a. If an individual is selected, find the probability
that the individual's pressure will be between 120 and
121.8 lIllIl Hg.
The Central Limit Theorem
283
b. If a sample of 30 adults is randomly selected, find the
probability that the sample mean will be between 120 and
121.8 lIllIl Hg.
c. Why is the answer to Part a so much smaller than the
answer to Part b?
7-115. The average cholesterol content of a certain
brand of eggs is 215 milligrams and the standard
deviation is 15 milligrams. Assume the variable is
normally distributed.
a. If a single egg is selected, find the probability that the
cholesterol content will be more than 220 milligrams.
b. If a sample of 25 eggs is selected, find the probability
that the mean of the sample will be larger than 220
milligrams.
Source: Living Fit (EnglewoodCliffs,NJ: Best Foods, CPC
International, Inc., 1991).
7-116. At a large publishing company, the mean age of
proofreaders is 36.2 years, and the standard deviation is 3.7
years. Assume the variable is normally distributed.
a. If a proofreader from the company is randomly
selected, find the probability that his or her age will be
between 36 and 37.5 years.
b. If a random sample of 15 proofreaders is selected, find
the probability that the mean age of the proofreaders in the
sample will be between 36 and 37.5 years.
7-117. The average labor cost for car repairs for a
large chain of car repair shops is $48.25. The standard
deviation is $4.20. Assume the variable is normally
distributed.
a. If a store is selected at random, find the probability that
the labor cost will range between $46 and $48.
b. If 20 stores are selected at random, find the
probability that the mean of the sample will be between
$46 and $48.
c. Which answer is larger? Explain why.
For Exercises 7-118 and 7-119, check to see whether the
correction factor should be used. H so, be sure to
include it in the calculations.
*7-118. In a study of the life expectancy of 500 people in a
certain geographic region, the mean age at death was 72.0
years and the standard deviation was 5.3 years. If a sample
of 50 people from this region is selected, find the
probability that the mean life expectancy will be less than
70 years.
*7-119. A study of 800 homeowners in a certain area
showed that the average value of the homes was $82,000
.and the standard deviation was $5,000. If 50 homes are for
sale, find the probability that the mean of the values of
these homes is greater than $83,500.
oI
284
Bluman: Elementary
Statistics:AStep byStep
Approach. Fourth Edition
Chapter 7
I 7.TheNormal Distribution I Text
© The McGraw-Hili
Companies, 2001
The Normal Distribution
*7-120. The average breaking strength of a certain
brand of steel cable is 2000 pounds, with a standard
deviation of 100 pounds. A sample of 20 cables is
selected and tested. Find the sample mean that will cut
off the upper 95 % of all of the samples of size 20 taken
from the population. Assume the variable is normally
distributed.
The Normal
Approximation to the
Binomial Distribution
*7-121. The standard deviation of a variable is 15. If a
sample of 100 individuals is selected, compute the standard
error of the mean. What size sample is necessary to double
the standard error of the mean?
*7-122. In Exercise 7-121, what size sample is needed to
cut the standard error of the mean in half?
The normal distribution is often used to solve problems that involve the binomial distribution since when n is large (say, 100), the calculations are too difficult to do by hand
using the binomial distribution. Recall from Chapter 6 that a binomial distribution has
the following characteristics:
1. There must be a fixed number of trials.
2. The outcome of each trial must be independent.
3. Each experiment can have only two outcomes or be reduced to two outcomes.
4. The probability of a success must remain the same for each trial.
f]lij<lctive 1. Use the normal
approximation to compute
probabilities fora binomial
variable.
Also, recall that a binomial distribution is determined by n (the number of trials)
andp (the probability of a success). Whenp is approximately 0.5, and as nincreases, the
shape of the binomial distribution becomes similar to the normal distribution. The larger
n and the closer p is to 0.5, the more similar the shape of the binomial distribution is to
the normal distribution.
But when p is close to 0 or 1 and n is relatively small, the normal approximation is
inaccurate. As a rule of thumb, statisticians generally agree that the normal approximation should be used only when n . p and n . q are both greater than or equal to 5. (Note:
q = 1 - p.) For example, if pis 0.3 and n is 10, then np = (10)(0.3) = 3, and the normal distribution should not be used as an approximation. On the other hand, if p = 0.5
and n = 10, then np = (10)(0.5) = 5 and nq = (10)(0.5) = 5, and the normal distribution can be used as an approximation. See Figure 7-44.
In addition to the previous condition of np ;::: 5 and nq ;::: 5, a correction for continuity may be used in the normal approximation.
Acorrection for continuity isacorrection employed when acontinuous distribution is used to
approximate adiscrete distribution.
The continuity correction means that for any specific value of X, say 8, the
boundaries of X in the binomial distribution (in this case, 7.5 to 8.5) must be used. (See
Chapter 1, Section 1-3.) Hence, when one employs the normal distribution to approximate the binomial, the boundaries of any specific value X must be used as they are
shown in the binomial distribution. For example, for P(X = 8), the correction is
P(7.5 < X < 8.5). For P(X:5 7), the correction is P(X < 7.5). For P(X;::: 3), the correction is P(X > 2.5).
Students sometimes have difficulty deciding whether to add 0.5 or subtract 0.5
from the data value for the correction factor. Table 7-2 summarizes the different
situations.
Bluman: Elementary
Statistics: AStep by Step
7.TheNormal Distribution
Text
© The McGraw-Hili
Companies, 2001
Approach, Fourth Edition
Section 7-6
Figure 7-44
Comparison of the
Binomial Distribution and
the Normal Distribution
The Normal Approximation to the Binomial Distribution
Binomial Probabilities for n = 10, p = 0.3
P(X)
In· p = 10(0.3) =3; n» q =10(0.7) =7]
0.3
x
P(X)
o
0.028
0.121
0.233
0.267
0.200
0.103
0.037
0.009
0.001
0.000
0.000
1
2
3
4
5
6
7
8
9
10
0.2
0.1
x
0
2
3
4
5
6
7
9
B
10
Binomial Probabilities for n= 10, P= 0.5
In· p= 10(0.5),= 5; n» q= 10(0.5) =5]
P(X)
0.3
0.2
0.1
X
P(X)
0
1
2
3
4
5
6
7
B
9
10
0.001
0.010
0.044
0.117
0.205
0.246
0.205
0.117
0.044
0.010
0.001
X
o
2
3
4
5
6
7
B
9
285
_
•
......
eI
286
Bluman: Elementary
Statistics:AStep by Step
Approach. Fourth Edition
I © The McGrew-Hili
I 7.The Normal Distribution I Text
Compenies, 2001
Chapter 7 The Normal Distribution
The formulas for the mean and standard deviation for the binomial distribution are
necessary for calculations. They are
,.
J.L
= n . p and a = \/n . p
.q
The steps for using the normal distribution to approximate the binomial distribution
are shown in the next Procedure Table.
! ample"7-22
~
A magazine reported that 6% of American drivers read the newspaper while driving. If
300 drivers are selected at random, find the probability that exactly 25 say they read the
newspaper while driving.
Source: USASnapshot, USAToday, June 26, 1995.
Solution
Here, p = 0.06, q = 0.94, and n
STEP 1
= 300.
Check to see whether the normal approximation can be used.
np = (300)(0.06) = 18
nq = (300)(0.94) = 282
Since np :::: 5 and nq :::: 5, the normal distribution can be used.
STEP 2 Find the mean and standard deviation.
J.L = np = (300)(0.06) = 18
tr
= ynpq = \/(300)(0.06)(0.94) =
\116.92
= 4.11
STEP 3
Write the problem in probability notation: P(X = 25).
STEP 4
Rewrite the problem by using the continuity correction factor. See
approximation number 1 in Table 7-2. P(25 - 0.5 < X < 25 + 0.5) =
P(24.5 < X < 25.5). Show the corresponding area under the normal
distribution curve (see Figure 7-45).
Bluman: Elementary
Statistics: AStepbyStep
Approach, Fourth Edition
7.TheNormal Distribution
Text
© The McGraw-Hili
Companies. 2001
Section 7-6 The Normal Approximation to the Binomial Distribution
287
Figure 7-45
Area under the Curve
and X Values for
Example 7-22
18
STEP 5
24.5
(25.5
Find the corresponding z values. Since 25 represents any value between 24.5
and 25.5, find both z values.
Zl
STEP 6
= 25.5 - 18 = 1 82
4.11
.
Zz
= 24.5 -
4.11
18
= 1 58
.
Find the solution. Find the corresponding areas in the table: The area for z =
1.82 is 0.4656, and the area for z = 1.58 is 0.4429. Subtract the areas to get
the approximate value: 0.4656 - 0.4429 = 0.0227, or 2.27%.
Hence, the probability that exactly 25 people read the newspaper while driving is
2.27%.
. ample7-2
.............................................................................................................................................................................
Of the members of a bowling league, 10% are widowed. If 200 bowling league members are selected at random, find the probability that 10 or more will be widowed.
Solution
Here,p
STEP 1
= 0.10, q = 0.90, andn = 200.
Since np is (200)(0.10) = 20 and nq is (200)(0.90) = 180, the normal
approximation can be used.
STEP 2
JL
(J'
= np = (200)(0.10) = 20
= vnpq = y'(200)(0.1O)(0.90) = y18 = 4.24
STEP 3
P(X ~ 10).
STEP 4
See approximation number 2 in Table 7-2. P(X> 10 - 0.5) = P(X> 9.5).
The desired area is shown in Figure 7-46.
Figure 7-46
Area under the Curve and
X Value for Example 7-23
9.510
-
20
CD
Bluman:Elemelll8ry
Statistics: A Step by Step
7. The Normal Distribution
© The McGraw-Hili
Companies, 2001
Text
Approach,FourthEdition
288
Chapter 7
The Normal Distribution
STEP 5
Since the problem is to find the probability of 10 or more positive responses,
the normal distribution graph is as shown in Figure 7-46. Hence, the area
between 9.5 and 20 must be added to 0.5000 to get the correct
approximation.
The z value is
z
= 9.5 - 20 = -248
4.24
.
STEP 6
The area between 20 and 9.5 is 0.4934. Thus, the probability of getting 10 or
more responses is 0.5000 + 0.4934 = 0.9934, or 99.34%.
It can be concluded, then, that the probability of 10 or more widowed people in a
random sample of 200 bowling league members is 99.34%.
If a baseball player's batting average is 0.320 (32%), find the probability that the player
will get at most 26 hits in 100 times at bat.
,\
Solution
Here, p
STEP 1
= 0.32, q = 0.68, and n = 100.
Since np = (100)(0.320) = 32 and nq = (100)(0.680) = 68, the normal
distribution can be used to approximate the binomial distribution.
STEP 2
p:
(T
= np = (100)(0.320) = 32
= yTijiq = Y(I00)(0.32)(0.68) = Y21.76 = 4.66
STEP 3
P(X:5 26).
STEP 4
See approximation number 4 in Table 7-2. P(X < 26
The desired area is shown in Figure 7-47.
+ 0.5)
= P(X
<
26.5).
Figure 7-47
Area under the Curve for
Example 7-24
26 26.5
STEP 5
The z value is
z
STEP 6
32.0
= 26.5 - 32 = -1 18
4.66
.
The area between the mean and 26.5 is 0.3810. Since the area in the left tail
is desired, 0.3810 must be subtracted from 0.5000. So the probability is
0.5000....; 0.3810 = 0.1190, or 11.9%.
The closeness of the normal approximation is shown in the next example.
8luman: Elementary
Statistics: AStepbyStep
Approach, Fourth Edition
7.TheNormal Distribution
Text
© The McGrew-Hili
Companies, 2001
Section 7-6 The Normal Approximationto the BinomialDistribution
I
.
289
When n = 10 and p = 0.5, use the binomial distribution table (Table B in Appendix C)
to [rod the probability that X = 6. Then use the normal approximation to [rod the probability that X = 6.
Solution
From Table B, for n = 10, p = 0.5, and X = 6, the probability is 0.205.
For the normal approximation,
p, = np
(T
= (10)(0.5) = 5
= ynpq = \1(10)(0.5)(0.5) = 1.58
Now, X = 6 is represented by the boundaries 5.5 and 6.5. So the z values are
Zl
=
6.5 - 5
1.58 = 0.95
Zz
=
5.5 - 5
1.58
= 0.32
The corresponding area for 0.95 is 0.3289, and the corresponding area for 0.32 is
0.1255.
The solution is 0.3289 - 0.1255 = 0.2034, which is very close to the binomial
table value of 0.205. The desired area is shown in Figure 7-48.
Figure 7-48
Area under the Curve for
Example7-25
The normal approximation also can be used to approximate other distributions,
such as the Poisson distribution (see Table C in Appendix C).
7-123. Explainwhy the normal distributioncan be used as
an approximation to the binomial distribution. What
conditionsmust be met to use the normal distributionto
approximatethe binomial distribution?Why is a correction
for continuitynecessary?
7-124. (ans) Use the normal approximation to the
binomial to find the probabilitiesfor the specific value(s)
ofX.
a. n = 30,p = O.5,X= 18
b.
c.
d.
e.
f.
>
n = 50,p = 0.8,X = 44
n=l00,p=O.I,X= 12
n = 1O,p = 0.5, X 2=: 7
n=20,p=0.7,XSl2
n = 50, p = 0.6, X s 40
7-125. Check each binomial distributionto see whether
it can be approximatedby the normal distribution (i.e.,
are np 2=: 5 and nq 2=: 5?).
a. n = 20,p = 0.5
d. n = 50,p = 0.2
b. n = 1O,p = 0.6
e. n = 30,p = 0.8c. n = 40,p = 0.9
f. n = 20,p = 0.85
7-126. Of al13- to 5-year-oldchildren, 56% are enrolled in
school. If a sample of 500 such childrenis randomly
selected, find the probabilitythat at least 250 will be
enrolled in school.
Source: Statistical Abstract ofthe United States 1994 (Washington,
DC: U.S. Bureau of the Census).
e
290
Bluman: Elementary
7.TheNormal Distribution
Text
Statistics:A StepbyStep
Approach. Fourth Edition
© The McGraw-Hili
Companies, 2001
Chapter 7 The Normal Distribution
7-127. Twoout of five adult smokers acquiredthe habit by
age 14. If 400 smokers are randomly selected,find the
probabilitythat 170 or more acquired the habit by age 14.
Source: Harper's Index 289, no. 1735 (December 1994).
7-128. A theater owner has found that 5% of his patrons
do not showup for the performancethat they purchased
tickets for.If the theatre has 100 seats, find the probability
that six or more patrons will not show up for the sold-out
performance.
7-129. In a survey, 15%of Americans said they believe
that they will eventually get cancer. In a random sampleof
80 Americansused in the survey,find these probabilities.
a. At least six people in the sample believe they will get
cancer.
b. Fewer than five people in the sample believe they will
get cancer.
Source: Harper's Index (December 1994), p. 13.
7-130. If the probability that a newborn child will be
female is 50%, find the probabilitythat in 100 births,55 or
more will be females.
7-131. Prevention magazinereports that 18% of drivers
use a car phone while driving.Find the probabilitythat in a
random sample of 100 drivers, exactly 18 will use a car
phone while driving.
Source: USA Snapshot, USA Today, June 26, 1995.
7-132. A contractor statesthat 90% of all homes soldhave
burglar alarm systems. If a contractor sells 250 homes,find
Summary
the probability that fewer than 5 of them will not have
burglar alarm systems.
7-133. In 1993,only 3% of elementary and
secondary schoolsin the United States did not have
microcomputers. If a random sample of 180 elementary
and secondaryschools is selected, find the probability
that in 19936 or fewer schools did not have
microcomputers.
Source: Statistical Abstract of the United States 1994.
(Washington, DC: U. S. Bureau of the Census).
7-134. Last year, 17% of American workers carpooled
to work. If a random sample of 30 workers is selected,
find the probabilitythat exactly 5 people will carpool to
work.
\
Source: USA Snapshot, USA Today, July 6, 1995.
7-135. Apolitical candidate estimates that 30% of the
voters in his party favor his proposed tax reform bill. If
there are 400 people at a rally, find the probability that at
least 100 favor his tax bill.
*7-136. Recall that for use of the normal distribution
as an approximation to the binomial distribution,the
conditions np ~ 5 and nq ~ 5 must be met For each
given probability, compute the minimum sample size
needed for use of the normal approximation.
a. p = 0.1
d. P = 0.8
b. P = 0.3
e. p = 0.9
c. p = 0.5
The normal distribution can be used to describe a variety of variables, such as heights,
weights, and temperatures. The normal distribution is bell-shaped, unimodal, symmetric, and continuous; its mean, median, and mode are equal. Since each variable has its
own distribution with mean p, and standard deviation CT, mathematicians use the standard normal distribution, which has a mean of 0 and a standard deviation of 1. Other approximately normally distributed variables can be transformed into the standard normal
distribution with the formula z = (X - p,)/CT.
The normal distribution can also be used to describe a sampling distribution of sample means. These samples must be of the same size and randomly selected with replacement from the population. The means of the samples will differ somewhat from the
population mean, since samples are generally not perfect representations of the population from which they came. The mean of the sample means will be equal to the population mean, and the standard deviation of the sample means will be equal to the
population standard deviation divided by the square root of the sample size. The central
limit theorem states that as the size of the samples increases, the distribution of sample
means will be approximately normal.
The normal distribution can be used to approximate other distributions, such as the
binomial distribution. For the normal distribution to be used as an approximation, the
conditions np ;::: 5 and nq ;::: 5 must be met. Also, a correction for continuity may be
used for more accurate results.
7.TheNormal Distribution
, Bluman: Elementary
Statistics: A Step byStep
Approach, Fourth Edition
© The McGraw-Hili
Companies. 2001
Text
Section 7-7
negatively or left skewed
distribution 249
central limit
theorem 277
correction for
continuity 284
normal distribution
250
positively or right skewed
distribution 249
Formula for the z value (or standard score):
0'
- , ...
sampling error
275
standard error of the
mean 277
291
standard normal
distribution 252
symmetrical
distribution 249
zvalue
I
i, I
I'
252
Formula for the standard error of the mean:
0'
(z=X-JL
.
sampling distribution of
sample means 275
Summary
O'x= VTi
_'
Formula for finding a specific data value:
X=z'O'+JL
Formula for the mean of the sample means:
JLx = JL
Formula for the z value for the central limit theorem:
X-JL
z= u/VTi
Formulas for the mean and standard deviation for the
binomial distribution:
O'=~
7-137. Find the area under the standard normal
distribution curve for each.
a. Between z = 0 and z = 1.95
b. Betweeaz = 0 and z = 0.37
c. Between z = 1.32 and z = 1.82
d. Between z = -1.05 and z = 2.05
e. Between z = -0.03 and z = 0.53
f Between z = + 1.10 and z = -1.80
g. To the right of z = 1.99
h. To the right of z = -1.36
i. To the left of z = -2.09
j. To the left of z = 1.68
7-138. Using the standard normal distribution, find each
probability.
a. P(O < z < 2.07)
b. P(-1.83 < z < 0)
c. P( -1.59 < z < +2.01)
deviation of 0.3 second. Find the probability that it will
take the child the following time to react to the noise.
Assume the variable is normally distributed.
a. Between 1.0 and 1.3 seconds
b. More than 2.2 seconds
c. Less than 1.1 seconds
7-140. The average diastolic blood pressure of a certain
age group of people is 85 mm Hg. The standard deviation is
6. If an individual from this age group is selected, find the
probability that his or her pressure will be the following.
Assume the variable is normally distributed.
a. Greater than 90
b. Below 80
c. Between 85 and 95
d. Between 88 and 92
7-141. The average number of miles a mail carrier walks
on a typical route in a certain city is 5.6. The standard
deviation is 0.8 mile. Find the probability that a carrier
d. P(1.33 < z < 1.88)
e. P( -2.56 < z < 0.37)
f P(z> 1.66)
g. P(z < -2.03)
h. P(z> -1.19)
i. P(z < 1.93)
j. P(z> -1.77)
chosen at random walks the following miles. Assume the
variable is normally distributed.
a. Between 5 and 6 miles
b. Less than 4 miles
c. More than 6.3 miles
7-139. The average reaction time for a four-year-old child
to respond to a noise is 1.5 seconds, with a standard
7-142. The average number of years a person lives after
being diagnosed with a certain disease is 4 years. The
I i
I
-o
292
7.TheNormal Distribution
Bluman: Elementary
© The McGraw-Hili
Companies. 2001
Text
Statistics:A Step by Step
Approach, Fourth Edition
Chapter 7
The Normal Distribution
standard deviation is 6 months. Find the probability that an
individual diagnosed with the disease will live the
following numbers of years. Assume the variable is
normally distributed.
a. More than 5 years
b. Less than 2 years
c. Between 3.5 and 5.2 years
d. Between 4.2 and 4.8 years
7-143. Americans spend on average $617 per year on their
insured vehicles. Assume the variable is normally
distributed. If the standard deviation of the distribution is
$52, find the probability that a randomly selected insuredvehicle owner will spend the following amounts on his or
her vehicle per year.
a. Between $600 and $700
b. Less than $575
c. More than $624
Source: Statistical Abstract ofthe United States 1994 (Washington,
DC: U.S. Bureau of the Census).
running shoes are selected, find the probability that the
mean cost of a pair of shoes will be less than $80. Assume
the variable is normally distributed.
7-147. Americans spend on average 12.2 minutes in the
shower. If the standard deviation of the variable is 2.3
minutes and the variable is normally distributed, find the
probability that the mean time of a sample of 12 Americans
who shower will be less than 11 minutes.
Source: USA Snapshot, USA Today, July 11, 1995.
7-148. The probability of winning on a slot machine is
5%. If a person plays the machine 500 times, [rod the
probability of winning 30 times. Use the normal
approximation to the binomial distribution.
7-149. Of the total population of older Americans, 18%
live in Florida. For a randomly selected sample of 200
older Americans, find the probability that more than 40 live
in Florida.
Source: Elizabeth Vierck, Fact Book on Aging (Santa Barbara, CA:
7-144. The average weight of an airline'passenger's
ABC-cUD 1990).
suitcase is 45 pounds. The standard deviation is 2 pounds.
If 15% of the suitcases are overweight, find the maximum
weight allowed by the airline. Assume the variable is
normally distributed.
7-150. In a large university, 30% of the incoming
freshmen elect to enroll in a Personal Finance course
offered by the university. Find the probability that of 800
randomly selected incoming freshmen, at least 260 have
elected to enroll in the course.
7-145. An educational study to be conducted requires a
test score in the middle 40% range. If JL = 100 and a = 15,
find the highest and lowest acceptable test scores that
would enable a candidate to participate in the study.
Assume the variable is normally distributed.
7-146. The average cost of XYZ brand running shoes is
$83 per pair, with a standard deviation of $8. If 9 pairs of
7-151. Of the total population of the United States, 20%
live in the Northeast. If 200 residents of the United States
are selected at random, find the probability that at least 50
live in the Northeast.
Source: Statistical Abstract ofthe United States 1994 (Washington,
DC: U.S. Bureau of the Census).
. What IsNormal? Revisited
Many of the variables measured in medical tests-blood pressure, triglyceride level,
etc.-are approximately normally distributed for the majority of the population in the
United States. Thus, researchers can find the mean and standard deviation of these variables. Then, using these two measures along with the z values, they can find normal in. tervals for healthy individuals. For example, 95% of the systolic blood pressures of
healthy individuals fall within two standard deviations of the mean. If an individual's
1 pressure is outside the determined normal range (either above or below), the physician
!
...............1...:~.~~Ok for a possible cause and prescribe treatment if necessary.
Determine whether each statement is true or false. If the
statement is false, explain why.
1. The total area under the normal distribution is infinite.
2. The standard normal distribution is a continuous
distribution.
3. All variables that are approximately normally
distributed can be transformed into standard normal
variables.
4. The z value corresponding to a number below the mean
is always negative.
,
.'\1
I.
Bluman: Elementary
Statistics: AStepbyStep
Approach. Fourth Edition
7.TheNormal Distribution
Text
© The McGraw-Hili
Companies. 2001
Section 7-7
5. The area under the standard normal distribution to the
left of z = 0 is negative.
6. The central limit theorem applies to means of samples
selected from different populations.
Select the best answer.
7. The mean of the standard normal distribution is
a.O
b. 1
c. 100
d. variable
8. Approximately what percentage of normally
distributed data values will fall within one standard
deviation above or below the mean?
a.68%
b.95%
c. 99.7%
d. variable
9. Which is not a property of the standard normal
distribution?
a. It's symmetric about the mean.
b. It's uniform.
c. It's bell-shaped.
d. It's unimodal.
\
11. The standard deviation of all possible sample means
equals
a. The population standard deviation
b. The population standard deviation divided by the
population mean
c. The population standard deviation divided by the
square root of the sample size
d. The square root of the population standard
deviation
Complete the following statements with the best answer.
12. When using the standard normal distribution, P(z < 0)
=
13. The difference between a sample mean and a
population mean is due to
_
14. The mean of the sample means equals
.
15. The standard deviation of all possible sample means is
called
.
16. The normal distribution can be used to approximate the
binomial distribution when n . p and n . q are both
greater than or equal to
.
293
17. The correction factor for the central limit theorem
should be used when the sample size is greater than
_ _ _ the size of the population.
18. Find the area under the standard normal distribution for
each.
a. Between 0 and 1.50
b. Between 0 and -1.25
c. Between 1.56 and 1.96
d. Between -1.20 and -2.25
e. Between -0.06 and 0.73
f Between 1.10 and -1.80
g. To the right of z = 1.75
h. To the right of z = -1.28
i. To the left of z = -2.12
j. To the left of z = 1.36
19. Using the standard normal distribution, find each
probability.
a. P(O < z < 2.16)
b. P( -1.87 < z < 0)
c. P( -1.63 < z < 2.17)
d. P(1.72 < z < 1.98)
e.
f
10. When a distribution is positively skewed, the
relationship of the mean, median, and mode from left
to right will be
a. Mean, median, mode
b. Mode, median, mean
c. Median, mode, mean
d. Mean, mode, median
Summary
P( -2.17
,
"
I
I
< z < 0.71)
P(z> 1.77)
g. P(z < -2.37)
h. P(z > -1.73)
i. P(z < 2.03)
j. P(z > -1.02)
20. The time it takes for a certain pain reliever to begin to
reduce symptoms is 30 minutes, with a standard
deviation of 4 minutes. Assuming the variable is
normally distributed, find the probability that it will
take the medication
a. Between 34 and 35 minutes to begin to work
b. More than 35 minutes to begin to work
c. Less than 25 minutes to begin to work
d. Between 35 and 40 minutes to begin to work
21. The average height of a certain age group of people is
53 inches. The standard deviation is 4 inches. If the
variable is normally distributed, find the probability
that a selected individual's height will be
a. Greater than 59 inches
b. Less than 45 inches
c. Between 50 and 55 inches
d. Between 58 and 62 inches
22. The average number of gallons of lemonade consumed
by the football team during a game is 20, with a
standard deviation of 3 gallons. Assume the variable is
normally distributed. When a game is played, find the
probability of using
a. Between 20 and 25 gallons
b. Less than 19 gallons
c. More than 21 gallons
d. Between 26 and 28 gallons
---------_._--------------------
e
294
Bluman: Elamentary
Statistics: A Step bV Step
Approach, Fourth Edition
Chapter 7
7. TheNormal Distribution
Text
© The McGraw-Hili
Companies, 2001
The Normal Distribution
23. The average number of years a person takes to
complete a graduate degree program is 3. The standard
deviation is 4 months. Assume the variable is normally
distributed. If an individual enrolls in the program, find
the probability that it will take
a. More than 4 years to complete the program
b. Less than 3 years to complete the program
c. Between 3.8 and 4.5 years to complete the program
d. Between 2.5 and 3.1 years to complete the program
24. On the daily run of an express bus, the average number
of passengers is 48. The standard deviation is 3.
Assume the variable is normally distributed. Find the
probability that the bus will have
a. Between 36 and 40 passengers
b. Fewer than 42 passengers
c. More than 48 passengers
d. Between 43 and 47 passengers
25. The average thickness of books on a library shelf is 8.3
centimeters. The standard deviation is 0.6 centimeter.
If 20% of the books are oversized, find the minimum
thickness of the oversized books on the library shelf.
Assume the variable is normally distributed.
27. The average repair cost of a microwave oven is. $55,
with a standard deviation of $8. The costs are normally
distributed. If 12 ovens are repaired, find the
probability that the mean of the repair bills will be
greater than $60.
28. The average electric bill in a residential area is $72 for
the month of April. The standard deviation is $6. If the
amounts of the electric bills are normally distributed,
find the probability that the mean of the bill for 15
residents will be less than $75.
29. The probability of winning on a slot machine is 8%. If
a person plays the machine 300 times, find the
probability of winning at most 20 times. Use the
normal approximation to the binomial distribution.
30. If 10% of the people in a certain factory are members
of a union, find the probability that in a sample of
2000, fewer than 180 people are union members.
31. In a corporation, 30% of the employees contributed to
a retirement program offered by the company. Find the
probability that of 300 randomly selected employees,
at least 105 are enrolled in the program.
26. Membership in an elite organization requires a test
score in the upper 30% range. If p., = 115 and (J' = 12,
find the lowest acceptable score that would enable a
candidate to apply for membership. Assume the
variable is normally distributed.
32. A company that installs swimming pools for residential
customers sells a pool package to 18% of the
customers its representatives visit. If the sales reps visit
180 customers, find the probability of their making at
least 40 sales.
Sometimes a researcher must decide whether or not a
variable is normally distributed. There are several ways
to do this. One simple but very subjective method uses
special graph paper, which is called normal probability
paper. For the distribution of systolic blood pressure
readings given in Chapter 3 of the textbook, the following
method can be used:
2. Find the cumulative frequencies for each class and place
the results in the third column.
1. Make a table, as shown.
Cumulative
Cumulative
Boundaries
Frequency
89.5-104.5
104.5-119.5
119.5-134.5
134.5-149.5
149.5-164.5
164.5-179.5
24
62
72
26
12
4
200
frequency
percent
frequency
3. Find the cumulative percents for each class by
dividing each cumulative frequency by 200 (the total
frequencies) and multiplying by 100%. (For the first
class, it would be 24/200 X 100% = 12%.) Place these
values in the last column.
4. Using the normal probability paper, label the x axis
with the class boundaries as shown and plot the
percents.
5. If the points fall approximately in a straight line, it
can be concluded that the distribution is normal.
Do you feel that this distribution is approximately
normal? Explain your answer.
Bluman: Elementary
7.TheNormal Distribution
Text
© The McGraw-Hili
Companies. 2001
Statistics: AStepby Step
Approach, Fourth Edition
Section 7-7
6. To find an approximation of the mean or median, draw a
horizontal line from the 50% point on the y axis over to
the curve and then a vertical line down to the x axis.
Compare this approximation of the mean with the
computed mean.
Summary
16% and 84% values on the y axis. Subtract these two
values and divide the result by 2. Compare this
approximate standard deviation to the computed
standard deviation.
8. Explain why the method used in Step 7 works.
7. To fmd an approximation of the standard deviation,
locate the values on the x axis that correspond to the
1. Select a variable (interval or ratio) and collect 30 data
values.
a. Construct a frequency distribution for the variable.
b. Use the procedure described in the critical thinking
challenge "When Is a Distribution Normal?" to
graph the distribution on normal probability paper.
c. Can you conclude that the data are approximately
normally distributed? Explain your answer.
295
2. Repeat exercise 1 using the data from Data Set X in
Appendix D. Use all the data.
3. Repeat exercise 1 using the data from Data Set XI in
AppendixD.
e
BlumaR: Elementary
Statistics: A Step byStep
Approach. Fourth Edition
B. Confidence Intervals and
Sample Size
Text
© The McGrew-Hili
Companies, 2001
Bluman: Elementary
Statistics: A Step bVStep
Approach, Fourth Edition
8.Confidence Intervals and
Sample Size
Text
© The McGraw-Hili
Companies, 2001
Statistics Today
297
Would You Change the Channel?
In the article shown on the next page, a survey by the Roper Organization found that
45% of the people who were offended by a television program would change the channel, while 15% would turn off their television sets. The article further states that the
margin of error is 3 percentage points, and 4000 adults were interviewed.
Several questions arise:
1. How do these estimates compare with the true population percentages?
2. What is meant by a margin of error of 3 percentage points?
3. Is the sample of 4000 large enough to represent the population of all adults who
watch television in the United States?
".
After reading this chapter, you will be able to answer these questions, since this
chapter explains how statisticians can use statistics to make estimates of parameters.
CD I
298
8luman: Elementary
Statistics: A Step by Step
Approach. Fourth Edition
Chapter 8
8.Confidence IntelVals and
Sample Size
I Text
© The McGraw-Hili
Companies. 2001
Confidence Intervals and Sample Size
TV Survey Eyes. On-Off Habits
NEW YORK (AP)-You're watching TV
and the show offends you. Do you sit there
and take it or do you get up and go?
A survey released yesterday on the public's attitudes toward television said 12 percent of the respondents do nothing, 45
percent change channels and 15 percent turn
off their sets.
National Association of Broadcasters and
the Network Television Association, an alliance of the Big Three broadcast networks,
co-sponsored the survey, which has been
taken every two years since 1959.
The Roper Organization polled 4,000
adults nationwide in personal interviews between November and December last year.
The results had a margin of sampling error
of plus or minus 3 percentage points.
The study, "America's Watching," said
that even in an age of channel "grazing"
about two in three viewers regularly make a
special effort to watch shows they particularly like.
And despite increasing cable alternatives,
about three-quarters of these "appointment"
viewers with cable said that ABC, CBS and
NBC still offer most of their favorites.
Viewers find TV news highly credible,
the survey also said. In the event of conflicting news reports, 56 percent of respondents
said they would believe television's account
above others.
A hefty 69 percent said they get most of
their news from TV. Newspapers were second, with 43 percent. Radio, word-of-mouth
and magazines fell far behind.
A majority of the respondents reported
having seen something on television that
they found "either personally offensive or
morally objectionable" in the past few
weeks. But only 42 percent could specify
whether it was profanity, sex or violence.
Source: Associated Press. Reprinted with permission,
Introduction
One aspect of inferential statistics is estimation, which is the process of estimating the
value of a parameter from information obtained from a sample. For example, The Book
of Odds, by Michael D. Shook and Robert L. Shook (New York: Penguin Putnam, Inc.,
1991), contains the following statements:
"One out of4Americans is currently dieting." (Calorie Control Council. Used with
permission.)
"Seventy-two percent ofAmericans haveflown in commercial airlines." ("The Bristol
Meyers Report: Medicine in the Next Century." Used with permission.)
"The average kindergarten student has seen more than 5000 hours oftelevision." (U.S.
Department ofEducation.)
"The average school nurse makes $32,786 a year." (National Association ofSchool
Nurses. Used with permission.)
"The average amount oflife insurance is $108,000 per household with life insurance."
(American Council ofLife Insurance. Used with permission.)
Since the populations from which these values were obtained are large, these values
are only estimates of the true parameters and are derived from data collected from samples.
The statistical procedures for estimating the population mean, proportion, variance,
and standard deviation will be explained in this chapter.
An important question in estimation is that of sample size. How large should the
sample be in order to make an accurate estimate? This question is not easy to answer
since the size of the sample depends on several factors, such as the accuracy desired and
the probability of making a correct estimate. The question of sample size will be explained in this chapter also.
Bluman: Elementary
Statistics: AStep byStep
APproach, Fourth Edition
8.Confidence Intervals and
SampleSize
Section 8-2
Confidence Intervals
for the Mean (0' Known
Dr n2: 30) and Sample
Size
Objective 1. Find the
confidence interval forthe
mean when (T isknown or
n2: 30.
Confideru::e intervals
Text
© The McGraw-Hili
Companies, 2001
Confidence Intervals for the Mean (0' Known or n ~ 30) and Sample Size
. 299
Suppose a college president wishes to estimate the average age of the students attending
classes this semester. The president could select a random sample of 100 students and
find the average age of these students, say 22.3 years. From the sample mean, the president could infer that the average age of all the students is 22.3 years. This type of estimate is called a point estimate.
A point estimate isaspecific numeric~ value estimate of aparameter. The best point estimate ofthe
population mean t-t isthe sample mean X.
One might ask why other measures of central tendency, such as the median and
mode, are not used to estimate the population mean. The reason is that the means of
samples vary less than other statistics (such as medians and modes) when many samples
are selected from the same population. Therefore, the sample mean is the best estimate
of the population mean.
Sample measures (i.e., statistics) are used to estimate population measures (i.e., parameters). These statistics are called estimators. As previously stated, the sample mean
is a better estimator of the population mean than the sample median or sample mode.
A good estimator should satisfy the three properties described next.
As stated in Chapter 7, the sample mean will be, for the most part, somewhat different
from the population mean due to sampling error. Therefore, one might ask a second
question: How good is a point estimate? The answer is that there is no way of knowing
how close the point estimate is to the population mean.
This answer places some doubt on the accuracy of point estimates. For this reason,
statisticians prefer another type of estimate called an interval estimate.
An interval estimate ofaparameter is an interval or arange of values used to estimate the parameter. This
estimate mayor may not contain the value ofthe parameter being estimated.
In an interval estimate, the parameter is specified as being between two values.
For example, an interval estimate for the average age of all students might be
26.9 < JL < 27.7, or 27.3 ± 0.4 years.
Either the interval contains the parameter or it does not. A degree of confidence
(usually a percent) can be assigned before an interval estimate is made. For instance,
one may wish to be 95% confident that the interval contains the true population mean.
Another question then arises. Why 95%? Why not 99% or 99.5%?
If one desires to be more confident, such as 99% or 99.5% confident, then the interval must be larger. For example, a 99% confidence interval for the mean age of
h_--------
e
300
Bluman: Elementary
Statistics: AStep byStep
Approach, Fourth Edition
8. Confidence Intervals and
SampleSize
© The McGraw-Hili
Companies. 2001
Text
Chapter 8 Confidence Intervals and Sample Size
'Rointandinterv~l, '..........•
.•·•.·.estjmates.were.known;a~i'.
~,~~lw~i.{t~~6E~
··m~th~inatjCian·.
Neym~n;fprmUIaf,
i,!praqlic~L?~pliqatt9il~
i'iJhelTl. ; .,:;d"':;
..
;~;,~.~-~ ~~~'~'~~:~ '~:'-~~'.;;~.-~.~'';~.''.~'~'~~
college students might be 26.7 < JL < 27.9, or 27.3 :±: 0.6. Hence, a trade-off occurs. To
be more confident that the interval contains the true population mean, one must make
the interval wider.
The confidence level ofan interval estimate of aparameter is the probability that the interval estimate will
contain the parameter.
Aconfidence interval isaspecific interval estimate ofaparameter determined by using data obtained from
asample and by using the specific confidence level ofthe estimate.
.
Intervals constructed in this way are called confidence intervals. Three common
confidence intervals are used: the 90%, the 95%, and the 99% confidence intervals.
The algebraic derivation of the formula for determining a confidence interval for a
mean will be shown later. A brief intuitive explanation will be given first.
The central limit theorem states that when the sample size is large, approximately
95% of the sample means will fall within :±: 1.96 standard errors of the population mean.
That is,
JL :±: 1.96 (
.vn)
6 :-
::'~) ,J!.
-;1 -.;.- ';; o
c? ,_.')-~
"J'0...~ _....: "-
Now, if a specific sample mean is selected, say X, there is a 95% probability that it falls
within the range of JL :±: 1.96 (oo/Vii). Likewise, there is a 95% probability that the interval specified by
will contain JL, as will be shown later. Stated another way,
X-
1.96(
.vn) < JL < X + 1.96( .vn)
Hence, one can be 95% confident that the population mean is contained within that interval when the values of the variable are normally distributed in the population.
The value used for the 95% confidence interval, 1.96, is obtained from Table E in
- Appendix C. For a 99% confidence interval, the value 2.58 is used instead of 1.96 in the
formula. This value is also obtained from Table E and is based on the standard normal
distribution. Since other confIdence intervals are used in statistics, the symbol Za/2 (read
, "zee sub alpha over two") is used in the general formula for confidence intervals. The
Greek letter a (alpha) represents the total area in both of the tails of the standard normal
distribution curve. al2 represents the area in each one of the tails. More will be said after Examples 8-1 and 8-2 about fmding other values for Za/2'
.
The relationship between a and the confIdence level is that the stated confidence level
is the percentage equivalent to the decimal value of 1 - a, and vice versa. When the 95%
confidence interval is to be found, a = 0.05, since 1 - 0.05 = 0.95, or 95%. When a =
0.01, then 1 - a = 1 - 0.01 = 0.99, and the 99% confidence interval is being calculated.
. 8luman: Elementary
Statistics: A Step by Step
APproacb. Fourth Edition
8.Confidence Intervals and
. Sample Size
Section 8-2
Text
© The McGraw-Hili
Companies, 2001
Confidence Intervals for the Mean (0" Known or n
~
30) and Sample Size
301
The term Za/2 (ulyn) is called the maximum error of estimate. For a specific value,
say a = 0.05, 95% of the sample means will fall within this error value on either side of
the population mean, as previously explained. See Figure 8-1.
Figure 8-1
95% Confidence Interval
lX
ZlX/2(Ji)
=0.05
ZlX/2(Ji)
Distribution of Xs
The maximum error ofestimate isthe maximum likely difference between the point estimate ofaparameter
and the actual value of the parameter.
Amore detailed explanation of the error of estimate follows the examples illustrating the computation of confidence intervals.
Rounding Rule for a Confidence Interval for a Mean When computing a confidence interval for a population mean using raw data, round off to one more decimal place than the
number of decimal places in the original data. When computing a confidence interval
for a population mean using a sample mean and a standard deviation, round off to the
same number of decimal places as given for the mean.
Exampe~
The president of a large university wishes to estimate the average age of the students
presently enrolled. From past studies, the standard deviation is known to be 2 years. A
sample of 50 students is selected, and the mean is found to be 23.2 years. Find the 95%
confidence interval of the population mean.
Solution·
Since the 95% confidence interval is desired,
formula
Za/2 =
1.96. Hence, substituting
in the
x- Za/2(Vn) < p, < X + Za/2(Vn)
one gets
23.2 -
1.96(Jso) <
23.2 - 0.6
22.6
P,
< 23.2 +
1.96(Jso)
< P, < 23.2 + 0.6
< P, < 23.8
or 23.2 ± 0.6 years. Hence, the president can say, with 95% confidence, that the average age of the students is between 22.6 and 23.8 years, based on 50 students.
eI
302
Blumen: Elementary
Statistics:AStep by Step
Approach. Fourth Edition
I 8.Confidence Intervels and , Text
© The McGraw-Hili
Companies, 2001
SampleSize
Chapter 8 Confidence Intervals and Sample Size
A certain medication is known to increase the pulse rate of its users. The standard deviation of the pulse rate is known to be 5 beats per minute. A sample of 30 users had an
average pulse rate of 104 beats per minute. Find the 99% confidence interval of the true
mean.
Solution
Since the 99% confidence interval is desired,
formula
Za/Z
= 2.58. Hence, substituting in the
x - Za/Z(.;n) <!J- < X + Za/Z(.;n)
one gets
104 - 2.58(Jo) <!J- < 104 + 2.58(Jo)
104 - 2.4 < !J- < 104 + 2.4
101.6 < !J- < 106.4
rounded to 102 <!J- < 106 or 104 ± 2
Hence one can be 99% confident that the mean pulse rate of all users of this medication
is between 102 and 106 beats per minute, based on a sample of 30 users.
Another way of looking at a confidence interval is shown in Figure 8-2.
According to the central limit theorem, approximately 95% of the sample means
fall within 1.96 standard deviations of the population mean if the sample size is
30 or more or if (T is known when n is less than 30 when the population is normally distributed. If it were possible to build a confidence interval about each sample mean, as was done in the previous examples for !J-, 95% of these intervals would
contain the population mean, as shown in Figure 8-3. Hence, one can be 95% confident that an interval built around a specific sample mean would contain the population mean.
Figure 8-2
95% Confidence Interval
for Sample Means
Bluman: Elementary
Statistics: A Step by Step
Approach, Fourth Edition
8.Confidence Intervals and
Sample Size
Section 8-2
Text
Confidence Intervals for the Mean (0' Known or n
© The McGraw-Hili
Companies, 2001
~
30) and Sample Size
303
Figure 8-3
95% Confidence Intervals
for Each Sample Mean
If one desires to be 99% confident, the confidence intervals must be enlarged so that 99
out of every 100 intervals contain the population mean.
Since other confidence intervals (besides 90%, 95%, and 99%) are sometimes used
in statistics, an explanation of how to find the values for Zctl2 is necessary. As stated
previously, the Greek letter IX represents the total of the areas in both tails of the normal
distribution. The value for IX is found by subtracting the decimal equivalent for the desired confidence level from 1. For example, if one wanted to find the 98% confidence
interval, one would change 98% to 0.98 and find IX = 1 - 0.98, or 0.02. Then al2 is
obtained by dividing IX by 2. So al2 is 0.02/2, or 0.01. Finally, Zo.01 is the z value that
will give an area of 0.01 in the right tail of the standard normal distribution curve. See
Figure 8-4.
Figure 8-4
Finding al2 for a 98%
Confidence Interval
Once al2 is determined, the corresponding Zctl2 value can be found by using the procedure shown in Chapter 7 (see Example 7-17), which is reviewed here. To get the Zctl2
value for a 98% confidence interval, subtract 0.01 from 0.5000 to get 0.4900. Next,
•
I
I'?,'F"''-
eI
304
Bluman: ElamentBry
Statistics:AStep by Step
Approach. Fourth Edition
Chapter 8
I 8.Confidence Intervals and I Text
© The McGraw-Hili
Companies. 2001
SampleSize
Confidence Intervals and Sample Size
locate the area that is closest to 0.4900 (in this case, 0.4901) in Table E, and then [rod
the corresponding z value. In this example, it is 2.33. See Figure 8-5.
"
Figure 8-5
Finding Za/2 for a 98%
Confidence Interval
For confidence intervals, only the positive z value is used in the formula.
When the original variable is normally distributed and a is known, the standard
normal distribution can be used to find confidence intervals regardless of the size of the
sample. When n ~ 30, the distribution of means will be approximately normal even if
the original distribution of the variable departs from normality. Also, if n =::: 30 (some
authors use n > 30), S can be substituted for ain the formula for confidence intervals;
and the standard normal distribution can be used to find confidence intervals for means,
as shown in the next example.
, Example 1E3
The following data represent a sample of the assets (in millions of dollars) of 30 credit
unions in Southwestern Pennsylvania. Find the 90% confidence interva1 of the mean.
$12.23
2.89
13.19
73.25
11.59
8.74
7.92
40.22
5.01
2.27
$16.56
1.24
9.16
1.91
6.69
3.17
4.78
2.42
1.47
$ 4.39
12.77
2.76
2.17
1.42
14.64
1.06
18.13
16.85
21.58
12.24
Source: Pittsburgh Post Gazette, June 7, 1998.
Solution:
STEP 1
Find the mean and standard deviation for the data. Use the formulas shown
in Chapter 3 or your calculator. The mean X = 11.091.
The standard deviation s = -14.405.
STEP 2
Find al2. Since the 90% confidence interval is to be used, a = 1 - 0.90
= 0.10, and
~ = 0.;0 = 0.05
Bluman: Elementary
Statistics: A Step by Step
Approach. Fourth Edition
8.Confidence IntelVals and
Sample Size
Text
© The McGraw-Hili
Companies. 2001
Section 8-2 Confidence Intervals for the Mean (0' Known or n ~ 30) and Sample Size
305
STEP 3
Find Zei!2' Subtract 0.05 from 0.5000 to get 0.4500. The corresponding Z
value obtained from Table E is 1.65. (Note: This value is found by using the
Z value for an area between 0.4495 and 0.4505. Amore precise Z value
obtained mathematically is 1.645 and is sometimes used; however, 1.65 will
be used in this textbook.)
STEP 4
Substitute in the formula
JI- Zri/2(Vn) < JL < JI + Zri/2(Vn)
(s is used in place of
(T when (Tis unknown,
11.091 - 1.65 ( 14.405)
yI30
< JL <
since n
11.091
11.091 - 4.339
<
JL < 11.091
6.752
<
JL
<
2::
30.)
+ 1.65 (14.405)
V30
+ 4.339
15.430
Hence, one can be 90% confident that the population mean of the assets of all credit
unions is between $6.752 million and $15.430 million, based on a sample of 30 credit
unions.
Sample Size
Objective 2, Determine the
minimum sample size for
finding aconfidence interval
forthe mean.
Sample size determination is closely related to statistical estimation. Quite often, one asks,
''How large a sample is necessary to make an accurate estimate?" The answer is not simple, since it depends on three things: the maximum error of estimate, the population standard deviation, and the degree of confidence. For example, how close to the true mean
does one want to be (2 units, 5 units, etc.), and how confident does one wish to be (90%,
95%,99%, etc.)? For the purpose of this chapter, it will be assumed that the population
standard deviation of the variable is known or has been estimated from a previous study.
The formula for sample size is derived from the maximum error of estimate
formula,
E
= ZeiI2(V"n)
and this formula is solved for n as follows:
Eyn = Zri/z{(T)
yn = Zei/2'
E
Hence,
(T
o
306
Bluman: ElemantBry
Statistics: A Step by Step
Approach. Fourth Edition
8.Confidence Intervals and
Sample Size
© The McGraw-Hili
Companies. 2001
Text
Chapter 8 Confidence Intervals and Sample Size
. Example H
.'
The college president asks the statistics teacher to estimate the average age of the students at their college. How large a sample is necessary? The statistics teacher would like
to be 99% confident that the estimate should be accurate within one year. From a previous study, the standard deviation of the ages is known to be 3 years.
Solution
Since a
one gets
= 0.01 (or 1 n=
0.99),
Za/2
= 2.58,
and E
= 1, substituting in
the formula,
(Zcrlk O'f = [(2.5~)(3)r = 59.9
which is rounded up to 60. Therefore, in order to be 99% confident that the estimate
is within 1 year of the true mean age, the teacher needs a sample size of at least 60
students.
Notice that when one is finding the sample size, the size of the population is irrelevant when the population is large or infinite or when sampling is done with replacement.
In other cases, an adjustment is made in the formula for computing sample size. This adjustment is beyond the scope of this book.
The formula for determining sample size requires the use of the population standard
deviation. What then happens when 0' is unknown? In this case, an attempt is made to
estimate 0'. One such way is to use the standard deviation, S, obtained from a sample
taken previously as an estimate for 0'. The standard deviation can also be estimated by
dividing the range by 4.
Sometimes, interval estimates rather than point estimates are reported. For instance,
one may read a statement such as "On the basis of a sample of 200 families, the survey
estimates that an American family of two spends an average of $84 per week for groceries. One can be 95% confident that this estimate is accurate within $3 of the true
mean." This statement means that the 95% confidence interval of the true mean is
< JL < $84 + $3
$81 < JL < $87
$84 - $3
The algebraic derivation of the formula for a confidence interval is shown next. As
explained in Chapter 7, the sampling distribution of the mean is approximately normal
when large samples (n ~ 30) are taken from a population. Also,
X-JL
z=-u/vn
I BlumBn: Elementary
I II. connaence Intervals and I Text
Statistics: A Step by Step
© The McGraw-Hili
Companies. 2001
Sample Size
AppfDBCh, FDurth EditiDn
Section 8-2
Confidence Intervals for the Mean (0' Known or n ze 30) and Sample Size
307
Furthermore, there is a probability of 1 - a that a Z will have a value between -ZaJ2 and
+ZaJ2' Hence,
- Za/2
<
X-/-L
cr/Vii
< Za/2
Using algebra,
u
U
-Za/2' Vii<X- /-L<Za/2'
vn
Subtracting X from both sides and from the middle, one gets
-
u
-X - Za/2' Vii
<
-/-L
<
-
u
-X + Za/2 . Vii
Multiplying by -1, one gets
cr
X+Za/2' Vii> /-L>X-Za/2'
U
vn
Reversing the inequality, one gets the formula for the confidence interval:
8-1. What is the difference between a point estimate and
an interval estimate of a parameter? Which is better? Why?
47,596
68,751
69,831
28,843
5,838
53,107
8-2. What information is necessary to calculate a
confidence interval?
31,391
62,892
48,829
55,105
50,706
63,974
8-3. What is the maximum error of estimate?
56,674
38,362
31,851
mean?
34,906
38,359
51,549
56,088
72,086
8-5. What are three properties of a good estimator?
34,009
46,127
50,850
49,926
54,960
8-6. What statistic best estimat~s J.L?""
32,785
48,321
49,671
8-4. What is meant by the 95 % confidence interval of the
8-7. What is necessary to determine sample size?
8-8. When one is determining the sample size for a
confidence interval, is the size of the population relevant?
8-9. Find each.
a. Za/2 for the 99%
b. Za/2 for the 98%
c. Za/2 for the 95%
d. ZcJ2 for the 90%
e. ZcJ2 for the 94%
confidence interval
confidence interval
confidence interval
confidence interval
confidence interval
8-10. Find the 95% confidence interval for the mean paid
attendance at the Major League AIl Star games. A random
sample of the paid attendances is shown.
31,938
43,801
Source: Time Almanac, Boston, MA, 1999.
8-11. A sample of the reading scores of 35 fifth-graders has
a mean of 82. The standard deviation of the sample is 15.
a. Find the 95 % confidence interval of the mean reading
scores of all fifth-graders.
b. Find the 99% confidence interval of the mean reading
scores of all fifth-graders.
c. Which interval is larger? Explain why.
8-12. Find the 90% confidence interval of the population
mean for the incomes of Western Pennsylvania Credit
Unions. A random sample of 50 credit unions is shown. The
data are in thousands of dollars.
oI
308
Chapter 8
$
..
8.Confidence IntelVals and
Sample Size
8luman: Elementary
Statistics: A Step byStep
Approach, Fourth Edition
84
;49
I Text
©TheMcGraw-Hili
Companies, 2001
Confidence Intervals and Sample Size
14
31
72
26
7,685
3,100
725
850
252
104
31
8
11,778
7,300
3,472
540
3
18
72
23
55
11,370
5,400
1,570
160
133
16
29
225
138
9,953
3,114
2,600
2,821
85
24
391
72
158
6,200
3,483
8,954
8
4340
346
19
5
846
1,000
1,650
1,200
390
461
254
125
61
123
1,999
400
3,473
600
60
29
10
366
47
1,270
873
400
713
28
254
6
21
11,960
1,195
2,290
175
97
6
17
77
8
887
1,703
4,236
1,400
82
Source: Pittsburgh Post Gazette, June 7, 1998.
8-13. A study of 40 English composition professors
showed that they spent, on average, 12.6 minutes correcting
a student's term paper.
a. Find the 90% confidence interval of the mean time for
all composition papers when a = 2.5 minutes.
b. If a professor stated that he spent, on average, 30
minutes correcting a term paper, what would be your
reaction?
8-14. A study of 40 bowlers showed that their average
score was 186. The standard deviation of the population
is 6.
a. Find the 95% confidence interval of the mean score for
all bowlers.
b. Find the 95% confidence interval of the mean score
if a sample of 100 bowlers is used instead of a sample
of 40.
c. Which interval is smaller? Explain why.
8-15. A study found that 8- to 12-year-olds spend an
average of $18.50 per trip to a mall. If a sample of 49
children was used, find the 90% confidence interval of
the mean. Assume the standard deviation of the sample
is $1.56.
Source: USA Today, July 25, 1995.
8-16. A study found that the average time it took a person
to find a new job was 5.9 months. If a sample of 36 job
seekers was surveyed, find the 95% confidence interval of
the mean. Assume the standard deviation of the sample is
0.8 month.
Source: The Book ofOddsby Michael D. Shook and Robert Shook
(New York: Penguin Putnam, Inc., 1991),46.
8-17. Find the 90% confidence interval for the mean
number of local jobs for top corporations in Southwestern
Pennsylvania. A sample of 40 selected corporations is
shown.
Source: PittsburghTribuneReview, July 6, 1997.
8-18. A random sample of 48 days taken at a large hospital
shows that an average of 38 patients were treated in the
emergency room per day. The standard deviation of the
population is 4.
a. Find the 99% confidence interval of the mean number
of ER patients treated each day at the hospital.
b. Find the 99% confidence interval of the mean number
of ER patients treated each day if the standard deviation
were 8 instead of 4.
c. Why is the confidence interval for part b wider than the
one for part a?
8-19. Noise levels at various area urban hospitals
were measured in decibels. The mean of the noise levels
in 84 corridors was 61.2 decibels, and the standard
deviation was 7.9. Find the 95% confidence interval of
the true mean.
Source: M. Bayo, A. Garcia, and A. Garcia, "Noise Levels in an
Urban Hospital and Workers' Subjective Responses," Archives of
Environmental Health 50, no. 3 (May-June 1995).
8-20. A researcher is interested in estimating the
average salary of police officers in a large city. She wants to
be 95% confident that her estimate is correct. If the standard
deviation is $1050, how large a sample is needed to get the
desired information and to be accurate within $200?
8-21. A university dean wishes to estimate the average
number of hours his part-time instructors teach per week.
The standard deviation from a previous study is 2.6 hours.
How large a sample must be selected if he wants to be 99%
confident of finding whether the true mean differs from the
sample mean by 1 hour?
8-22. In the hospital study cited in Exercise 8-19, the
mean noise level in the 171 ward areas was 58.0 decibels,
and the standard deviation was 4.8. Find the 90%
confidence interval of the true mean.
Source: M. Bayo, A. Garcia, and A. Garcia, "Noise Levels in an
Urban Hospital and Workers' Subjective Responses," Archives of
EnvironmentalHealth 50, no. 3 (May-June 1995).
8-23. An insurance company is trying to estimate the
average number of sick days that full-time food-service
8luman: Elementary
Statistics: AStep byStep
Approach, Fourth Edition
8.Confidence lntervals and
SampleSize
Section 8-2
© The McGraw-Hili
Companies, 2001
Text
Confidence Intervals for the Mean
«J' Known or n ;;::: 30) and Sample Size
309
accurate within $0.1O?A previous study showed that the
standard deviation of the price was $0.12.
workers use per year. A pilot study found the standard
deviation to be 2.5 days. How large a sample must be
selected if the company wants to be 95% confident of
getting an interval that contains the true mean with a
maximum error of 1 day?
8-25. A health care professional wishes to estimate the
birth weights of infants. How large a sample must she
select if she desires to be 90% confident that the true mean
is within 6 ounces of the sample mean? The standard
deviation of the birth weights is known to be 8 ounces.
8-24. A restaurant owner wishes to find the 99%
confidence interval of the true mean cost of a dry martini.
HoWlarge should the sample be if she wishes to be
MINITAB
Step by Step
Example MT8-1
Find the 95% confidence interval when
43
42
52
41
18
20
41
53
25
22
(J'
45
43
25
23
= 11 using the following sample:
21
21
42
27
32
33
24
36
32
47
19
19
25
20
26
1. Enter the data into C1 of a MINITAB worksheet.
2. Select Stat>Basic
Statistics>1-Sample Z.
3. Select C1 for the variable.
4. Click in the box for Sigma and enter 11.If the standard deviation for the population were
unknown, you would calculate the standard deviation for the sample and use s.
l-Sample Z Dialog Box
5. Click the [Options] button. In the dialog box make sure the Confidence Level is
95.0 (in Release 12 the Confidence Level is set in the I-Sample Z dialog box, without
clicking Options). You may need to click inside the textbox before you type the new level.
6. Click [OK].
The results will be displayed in the session window.
Session Window with .
Z-Interval
0t1'Il·S;f'l;mpIEl·:Z~C'1
11n,et~s:~'lJtJ!j,c;.jj, !11'~~~ ~
~,QU~tiillJlrl
);1
s....
.it
;;t()efM
1.11-.• til.
HE"
t;~t1:;£U'],
2:;..111
95·;~·ift~ 1:1'
.2Jft~
]l~#91lJ
44
CD
310
8luman:Elementary
Statistics:A Step by Step
Approach. Fourth Edition
8. Confidence Intervalsand
Sample Size
Text
© The McGraw-Hili
Companies. 2001
Chapter 8 Confidence Intervals and Sample Size
TI-83
Step by Step
For confidence intervals, the calculator will accept either raw data or summary statistics.
IF'fjWJi!lii~ ~l Z C\Q)!lllft!1itll{t:;]j'll~ Th.rll11Jell'w~R mt!JI£' tlfu.1e JWJ(~lm
(IDF;dlt-my
1. Enter the data into L1.
2. Press STAT and move the cursor to TESTS.
3. Press 7.
4.
5.
6.
7.
Select Data, and press ENTER and move cursor to 0".
Enter the values for 0". Make sure L1 is selected and Freq is 1.
Type in the correct confidence level.
Select Calculate and press ENTER.
Example TIS-I
Find the 99% confidence interval for the mean when 0" = 6 using the following data:
27
16
9
14
32
ZInter'val
I npt.: I~ Stats
.,.:6
List:L1
Freo:.t:1
C-Le.. .'e 1: • 99
Calculate
15
16
18
16
13
ZInterval
"1'-'
"":'1....
.-,..... 4°"')
1;
• .£..(
.::..,..::.....:...
~I(
x=17.6
Sx=6.818276094
n=10
•
]Ei'imli1Em~ a it Ci[j)mrl1d~!flJ.~ mrerwaH f®Ir tlhl~ MelEu!ll (Sbl&tics)
1. Press STAT and move the cursor to TESTS.
2. Press 7.
3.
4.
5.
6.
Select Stats and press ENTER. Move cursor to 0".
Enter the value for the population standard deviation, 0".
Enter the sample mean, X.
Enter the sample size.
7. Type in the correct confidence level.
8. Select Calculate and press ENTER.
E1<iiLlI!!!llpnfJ 1['][8=2
Find the 90% confidence interval of the mean when X = 182, 0" = 8, and n
Input
ZInterval
Inpt:Oata ~ ~
.,.:8
x: 182
n:50
C-Le ...)e 1: • 9
Calculate
= 50.
Output
ZInterval
(180.14,183.86)
x=182
n=50
•
I
Siuman: tlem."'.... J
Statistics: AStepby Step
Approach, Fourth Edition
Sample Size
Companies, 2001
Confidence Intervals for the Mean (0" Unknown and n
Section 8-3
< 30)
311
As shown, the 90% confidence interval of the mean is 180.14 < JL < 183.86.
Excel
Step by Step
Example XL8-1
Find a 95% confidence interval for the mean based on the following 3D-number sample data set
using the sample standard deviation:
43
19
21
52
18
25
27
26
33
20
44
25
42
36
47
45
41
19
43
41
20
21
42
53
22
32
25
24
23
32
1. Enter the data in column A.
2. Use the function STDEV to find the standard deviation.
3. Select a blank cell, and select the function CONFIDENCE.
4. Enter the sample size and standard deviation for the data set.
5. Enter alpha (here, 0.05) and click [OK].
The number returned by the function CONFIDENCE specifies the interval around the
mean as [mean - CONFIDENCE, mean + CONFIDENCE]. That is, the width of the
interval is twice the value returned by this function. Using the AVERAGE function, we
find the mean of data set to be 32.0333. So, the 95% confidence interval of the population
mean is
32.0333 - 3.9407 < JL < 32.0333
28.0926 < JL < 35.9740
Confidence Intervals
for the Mean
(u Unknown and
n< 3D)
~l3jeciille
3. Find the
confidence interval for the
mean when o: isunknown and
n « 30.
+ 3.9407
When (T is known and the variable is normally distributed or when (T is unknown and
n 2': 30, the standard normal distribution is used to find confidence intervals for the
mean. However, in many situations, the population standard deviation is not known
and the sample size is less than 30. In such situations, the standard deviation from the
sample can be used in place of the population standard deviation for confidence intervals. But a somewhat different distribution, called the t distribution, must be used
when the sample size is less than 30 and the variable is normally or approximately normally distributed.
Some important characteristics of the t distribution are described next.
CD
312
Bluman: Elementary
Statistics:AStep by Step
Approach, Fourth Edition
Chapter 8
8.Confidence Intervals Bnd
SampleSize
© The McGraw-Hili
Companies, 2001
Text
Confidence Intervals and Sample Size
,Tlletdistrlbutionwas;
'forrnulated;in.1.908'gY'an. .j.
:lris~"breVl'iQg·.ernpIOye~::
• \nameitw.s~G()sset."':;'
~~~t~i~~li;
J,Bscause:b
!:iernployea
:';~l{()~ed,to;p~bl
}Gpssetpllplish'
:;fihdinQ~lli6gt .
Figure 8-6
z
The t Familyof Curves
o
Many statistical distributions use the concept of degrees of freedom, and the formulas for fmding the degrees of freedom vary for different statistical tests. The degrees
of freedom are the number of values that are free to vary after a sample statistic has
been computed, and they tell the researcher which specific curve to use when a distribution consists of a family of curves.
For example, if the mean of 5 values is 10, then 4 of the 5 values are free to vary.
But once 4 values are selected, the fifth value must be a specific number to get a sum of
50, since 50 -;- 5 = 10. Hence, the degrees offreedom are 5 - 1 = 4, and this value tells
the researcher which t curve to use.
The symbol d.f. will be used for degrees of freedom. The degrees of freedom for a
confidence interval for the mean are found by subtracting 1 from the sample size. That
is, d.f. = n - 1. Note: For some statistical tests used later in this book, the degrees of
freedom are not equal to n - 1.
The formula for fmding a confidence interval about the mean using the t distribution is given next.
----"
.------_._-
..
---------------_._----._------------------------'
Bluman:Elementary
Statistics: A Step by Step
Approach.Fourth Edition
8.Confidence Intervals and
Sample Size
© 1he McGraw-HiII
companies, 1001
Text
Section 8-3
Confidence Intervals for the Mean ((J' UnJOl0wn and n < 30)
313
Thevaluesfor tcJ2 are foundin TableF inAppendix C.Thetop roW ofTable F,labeled
"Confidence Intervals,"is used to get thesevalues. Theothertwo rows, labeled "One Tail"
and''TwoTails," will be explainedin the next chapterand shouldnot be used here.
The next example shows how to find the valuein Table F for taf).'
F;d"ili~'~~~'~'~~~'f~;'~'95'%'~~;d~~~~';~~;~';h~~'ili;';;;~1~'~i~~'i·~"22:·"""··"""""··
EXample &-5
Solution
d.f = 22 - 1, or 21. Find 21 in the left columnand 95% in the roW labeled "Confidence
Intervals." The intersection where the two meet gives the value for taf)., which is 2.080.
SeeFigure 8-7.
Figure 8-7
Finding tan for
Example 8-5
Note: At the bottom of Table F where d.f, = 00, the Za/2 values can befound for specific confidence intervals. The reason is that as the degrees of freedom increase, the t
distribution approaches the standard normal distribution.
The next two examples show how to find the confidenceinterval when one is using
the t distribution.
.............................................................................................................................................................................
Tenrandomly selected automobiles were stopped, and the tread depth ofthe right front
tire was measured. The mean was 0.32 inch, and the standard deviation was 0.08 inch.
Find the 95% confidence interval of the mean depth. Assume that the variable is approximately normally distributed.
Solution
Since c is unknown and s must replace it, the t distribution (Table F) mustbe used for
95%. Hence, with 9 degrees of freedom,
= 2.262.
The 95% confidenceinterval of the populationmeanis foundby substituting in the
tal2
Ii
" ",e "
a·population,of: " ,'
rcerateci criminals. For, , formula
x- ta/2(0z) < JL < X+ ta/2k~n)
easures:heusedthe, !.,:
thscif.cmB ciftheir
ern.'" ", ,,>
,.;
~.~~~~~.~~'.'~~~'.;:-~~:,i·~~~'~~~ ~~::'~~~'i ~'. ~~
..
Hence,
0.32 -
(2.262}(~) < JL < 0.32 + (2.26Z)(~)
0.32 - 0.057 < JL < 0.32
+ 0.051
0.26 < JL < 0.38
Therefore, one can be 95% confident that the populationmean tr~ad depthof all right
front tires is between 0.26 and 0.38 inch based on a sample of 10 ores.
"
Bluman: Elementary
Statistics: A Step bVStep
Approach. Fourth Edition
I
314
8. Confidence Intervalsand
Sample Size
© The McGraw-Hili
Companies. 2001
Text
Chapter 8 Confidence Intervals and Sample Size
Exampe ' 7
;,
'.
..............................................................................................................................................................................
The data represent a sample of the number of home fires started by candles for the past
several years. (Data are from the National Fire Protection Association.) Find the 99%
confIdence interval for the mean number of home fires started by candles each year.
5460
6090
5900
6310
7160
8440
9930
Solution
STEP 1
Find the mean and standard deviation for the data.
Use the formulas in Chapter 3 or your calculator.
The mean X = 7041.4
The standard deviation s = 1610.3
STEP 2
Find ta/2 in Table F. Use the 99% confIdence interval with d.f. = 6. It is
3.707.
STEP 3
Substitute in the formula and solve
X - ta/2(Jn) <
7041.4 -
/L <
X + ta/2 +
(In)
3.707(1~3) < /L < 7041.4 + 3.707(1~3)
< /L < 7041.4 + 2256.2
4785.2 < /L <: 9297.6
7041.4 - 2256.2
One can be 99% confIdent that the population mean of home fires started by
candles each year is between 4785.2 and 9297.6, based on a sample of home
fires occurring over a period of 7 years.
Students sometimes have difficulty deciding whether to use Zal2 or ta/2 values when
finding confidence intervals for the mean. As stated previously, when the (J'is known,
Zal2 values can be used no matter what the sample size is, as long as the variable is normally distributed or n === 30. When (J'is unknown and n === 30, s can be used in the formula and Za/2 values can be used. Finally, when rr is unknown and n < 30, s is used in
the formula and tal2 values are used, as long as the variable is approximately normally
distributed. These rules are summarized in Figure 8-8.
Figure s-a
When to Use the z or t
Distribution
Yes
Yes
*Variable must be normally distibuted when n < 30.
**Variable must be approximately normally distibuted.
Bluman:Elementary
Statistics: AStep by Step
Approach, Fourth Edition
8.Confidence Intervals and
Sample Size ,
Text
Section 8-3
I
©The McGraw-Hili
Companies, 2001
Confidence Intervals for the Mean ((J' Unknown and n
(l
:1
"I
< 30)
315
It should be pointed out that some statisticians have a different point of view. They
use Zal2 values when o is known and tal2 values when o is unknown. In these circumstances, a t table that contains t values for sample sizes greater than or equal to 30 would
be needed. The procedure shown in Figure 8-8 is the one used throughout this textbook.
8-26. What are the properties of the t distribution?
8-27. Who developed the t distribution?
8-28. What is meant by degrees of freedom?
8-29. When should the t distribution be used to find a
confidence interval for the mean?
8-30. (ans) Find the values for each.
a. tal2 and n = 18 for the 99% confidence interval for
the mean
b. tal2 and n
the mean
c. tal2 and n
the mean
d. tal2 and n
the mean
e. tal2 and n
the mean
= 23 for the 95% confidence interval for
= 15 for the 98% confidence interval for
= 10 for the 90% confidence interval for
= 20 for the 95% confidence interval for
the sample was 1.6. Find the 90% confidence interval of the
true mean.
8-35. A sample of 6 adult elephants had an average weight
of 12,200 pounds, with a sample standard deviation of 200
pounds. Find the 95% confidence interval of the true mean.
8-36. The daily salaries of substitute teachers for 8 local
school districts is shown. What is the point estimate for the
mean? Find the 90% confidence interval of the mean for
the salaries of substitute teachers in the region.
$60
$56
$60
$55
$70
$55
$60
$55
Source: Pittsburgh Tribune Review, November 30, 1997.
8-37. A recent study of 28 city residents showed that the
mean of the time they had lived at their present address was
9.3 years. The standard deviation of the sample was 2
years. Find the 90% confidence interval of the true mean.
8-38. An automobile shop manager timed six employees
For Exercises 8-31 through 8-46, assume that all
variables are approximately normally distributed.
8-31. The average hemoglobin reading for a sample of 20
teachers was 16 grams per 100 milliliters, with a sample
standard deviation of 2 grams. Find the 99% confidence
interval of the true mean.
8-32. A meteorologist who sampled 15 cold weather fronts
found that the average speed at which they traveled across a
certain state was 18 miles per hour. The standard deviation
of the sample was 2 miles per hour. Find the 95%
confidence interval of the mean.
8-33. A state representative wishes to estimate the mean
number of women representatives per state legislature. A
random sample of 17 states is selected, and the number of
women representatives is shown below. Based on the
sample, what is the point estimate of the mean? Find the
90% confidence interval of the mean population. (Note: The
population mean is actually 31.72 or about 32.) Compare
this value to the point estimate and the confidence interval.
There is something unusual about the data. Describe it and
state how it would affect the confidence interval.
5
31
33
16
35
45
37
19
24
13
18
29
15
39
18
58
132
8-34. A sample of 20 tuna showed that they swim an
average of 8.6 miles per hour. The standard deviation for
and found that the average time it took them to change a
water pump was 18 minutes. The standard deviation of the
sample was 3 minutes. Find the 99% confidence interval of
the true mean.
8-39. A recent study of 25 students showed that they spent
an average of $18.53 for gasoline per week. The standard
deviation of the sample was $3.00. Find the.95%
confidence interval of the true mean.
.,
i'
8-40. For a group of 10 men subjected to a stress situation,
the mean number of heartbeats per minute was 126, and the,
standard deviation was 4. Find the 95% confidence interval
of the true mean.
8-41. For the stress test described in Exercise 8-40, six
women had an average heart rate of 115 beats per minute.
The standard deviation of the sample was 6 beats. Find the
95% confidence interval of the true mean for the women.
8-42. For a sample of 24 operating rooms taken in the
hospital study mentioned in Exercise 8-19" ~e ~ean noise
level was 41.6 decibels, and the standard deviation was 7.5.
Find the 95% confidence interval of the true mean of the
noise levels in the operating rooms.
Source: M. Bayo, A. Garcia, and A. Garcia, ''Noise Levels in an
Urban Hospital and Workers' Subjective Responses," Archivesof
"
Environmental Health 50, no. 3 (May-June 1995).
8-43. Find the 98% confidence interval of the mean
taxable assessed values for the City of Pittsburgh. A
random sample of 14 wards is shown. Data are in millions
------~--
'-'----------~---'------
I
e
316
Bluman: Elementary
Statistics:AStep byStep
Approach. Fourth Edition
Chapter 8
8. Confidence Intervals and
SampleSize
Text
© The McGraw-Hili
Companies, 2001
Confidence Intervals and Sample Size
answers. The data represent the daily revenues from 20
parking meters in a small municipality.
of dollars. Note: Based on all the wards, the population
mean is $65,.70. Does the confidence interval actually
include thepopulation mean?
$22
29
$24
$120
$382
$50
$38
23
70
56
17
51
Source: Pittsburgh Tribune Review, May 22, 1999.
$2.60
1.30
2.40
2.80
1.00
$297
38
8-44. For a group of 20 students taking a final exam, the
mean heart rate was 96 beats per minute, and the standard
deviation was 5. Find the 95% confidence interval of the
true mean.
$1.05
3.10
2.35
2.50
2.75
$2.45
2.35
2.40
2.10
1.80
$2.90
2.00
1.95
1.75
1.95
8-45. The average yearly income for 28 married couples
living in city C is $58,219. The standard deviation of the
sample is $56. Find the 95% confidence interval of the true
mean.
*8-46. A one-sided confidence interval can be found for a
mean by using
-
s
J.L > X - tayTi
or
J.L
-
s
< X + tayn
.J...JTME MAN IN i}ol(
where ta is the value found under the row labeled one tail.
Find two one-sided 95 % confidence intervals of the
population mean for the data shown and interpret the
~TRE£T
()\lI5'1l'tT TOA ~PLINc;.I"Uloll0'" PLUS Oil.
"'1"'V~ TMUE 1£«(N1.&( CIo'JolTS)
Source: Drawing by Nurit Karlin: © 1988
The New YorkerMagazine, Inc.
MINITAB
Step by Step
IffiinmlTIlg tl.\l t
OrDdd~IDl~ htl];elt"~n f®lt" th~ 1\1[~
Example MT8-2
Find the 95% confidence interval when a
625
675
535
406
= 11 using the following sample:
512
680
483
522
619
575
1. Type the data into C1 of a MINITAB worksheet.
2. Select
Stat>Basic Statistics>1-Sample t.
I-Sample t Dialog Box
3. Double-click C1 for the variable.
4. Click on Options and be sure the Confidence Level is 95.0 (in Release 12 the
Confidence Level is set in the l-Sample t dialogue box, without clicking Options). You may
need to click inside the textbox to change it. Click [OK].
! .
I
iH:
i :Iii!
'1_
ilJil}
_i!llki
5. Click [OK].
_
r;;:man: Elementary
Statistics: A Step by Step
B. Confidence IntelVals and
I Text
© The McGraw-Hili
Sample Size
Companies. 2001
Approach, Fourth Edition
Section 8-3
Confidence Intervals for the Mean (0"Unknown and n < 30)
317
In the session window you will see the results. The 95% confidence interval estimate for p: is between 500.4 and 626.0. The sample size, mean, standard deviation, and standard error of the mean
are also shown.
Session Window with
t-interval
:'\re:lJjj;.~,'i~
Jl:J' , ,
.:ai~[)J0.~,
J~tii!u:i:
~:;'J!ie;Wfi;
~.'7;,,;~(~1:.,a:
:5&;3.2:'
t
95"[i~1tI
'iWlr,,,,Al~,
1~,i$,,':l1i~
TI·83
Step by Step
1. Enter the data into L1•
2. Press STAT and move the cursor to TESTS.
3. Press 8.
4. Select Data and press ENTER. Move cursor to
c-level.
5. Type in the correct confidence level. Make sure L1 is selected and Freq is 1.
6. Select
Calculate and press ENTER.
Example TI8-3
Find the 95% confidence interval for the following data:
62
78
81
59
86
63
79
97
73
93
90
88
84
98
78
93
87
82
Input
TI nter'.. ' al
Inpt:lm,E St.at.s
List.:~
Fr·e·=:.j: 1
(:-Le '..,1e 1: • 95
Calculate
Output
TI nt.er' 1...1a 1
...
71:' g':::4 ' ; C,,,,,:,
4P_1' . '.~
'.1·_1 • • ,_,
IV(.
:.-.
1
...,.-..-..-..-..-..-..-.
.:-: =I:' .... L. L L. L'::' L L.
- ....- 11 • 5-'I.'.q-· .- ..:.;.
.-. '.....'- t-:_' ...1...
'_' .•..• n=18
•
As shown on the screen, the 95% confidence interval of the mean is 75.964
values of the sample mean and standard deviation are also given.
h
_
<
t-t < 87.481. The
G
318
B. Confidence IlIIervalsand
Semple Size
Blumen: Elemelllary
Statistics: A Step by Step
Approach, Fourth Edition
Chapter 8
© The McGraw-Hili
Text
Companies. 2001
Confidence Intervals and Sample Size
fifum!LUH:[d~ &111 OGmllifHi!l(!3IT~{('''if; mreil'¥7il~ [@rI' ftllitl1:e M®wJ(!l (~f£~twJ)
"
1. Press STAT and move the cursor to TESTS.
2. Press 8 for
TInterval.
3. Select Stats and press ENTER. Move cursor to
4. Enter the value for the sample mean,
x.
X.
5. Enter the value for the sample standard deviation, SX'
6. Enter the sample size, n.
7. Type in the correct confidence interval.
8. Select Calculate and press ENTER.
Example TI8-4
As in Example 8-6, find the 95% confidence interval when X
= 0.32, s = 0.08, and n = 10.
Output
Input
Tlnter'.. . al
I,.-,p+..:Da+..a
:;::. -:::?
~
r-. •• ........
Sx:.08
,.-,: 10
TIt'",+..er'-.Ja 1
(.26277:0.37723)
5<=.32
C-Level:.95
Calculate
As shown, the 95% confidence interval is 0.26277
-
Confidence Intervals
and Sample Size for
Proportions
I',Ibjactlve 4. Rnd the
confidence interval fora
proportion.
i!FL
S>~=.
,.-,=10
08
< JL < 0.37723.
A USA Today Snapshots feature (July 2, 1993) stated that 12% of the pleasure boats in
the United States were named Serenity. The parameter 12% is called a proportion. It
means that of all the pleasure boats in the United States, 12 out of every 100 are named
Serenity. A proportion represents a part of a whole. It can be expressed as a fraction, decimal, or percentage. In this case, 12% = 0.12 = l~ orrs. Proportions can also represent
probabilities. In this case, if a pleasure boat is selected at random, the probability that it
is called Serenity is 0.12.
Proportions can be obtained from samples or populations. The following symbols
will be used.
·-· ...- -.....
8luman: Elementary
Statistics: A Step by Step
Approach, Fourth Edition
8.Confidence Intervalsand
Sample Size
Text
© The McGraw-Hili
Companies, 2001
Section 8-4
319
Confidence Intervals and Sample Size for Proportions
For example, in a study, 200 people were asked if they were satisfied with their job
or profession; 162 said that they were. In this case, n = 200, X = 162, and p = Xln =
162/200 = 0.81. It can be said that for this sample, 0.81 or 81% of those surveyed were
satisfied with their job or profession. The sample proportion is p = 0.81.
The proportion of people who did not respond favorably when asked if they were
satisfied with their job or profession constituted q, where q = (n - X)ln. For this survey, q = (200 - 162)/200 = 38/200, or 0.19, or 19%.
When p and q are given in decimals or fractions, p + q = 1. When p and q are
given in percentages, p + ij = 100%. It follows, then, that q = 1 - p, or p = 1 - q,
when p and ij are in decimal or fraction form. For the sample survey on job satisfaction,
ij can also be found by using q = 1 - p, or 1 - 0.81 = 0.19.
-Similar reasoning applies to population proportions; i.e.,p = 1 - q, q = 1 - p, and
p + q = 1, when p and q are expressed in decimal or fraction form. When p and q are
expressed as percentages, p + q = 100%, p = 100% - q, and q = 100% - p.
, ampeH
In a recent survey of 150 households, 54 had central air conditioning. Find p and ij,
where p is the proportion of households that have central air conditioning.
Solution
Since X = 54 and n = 150,
P=
~ = :5~ =
q=
n - X = 150 - 54 = ~ = 064
n
One can also find
0.36 = 36%
150
150
.
= 64%
qby using the formula q = 1 - p. In this case, q =
1 - 0.36
= 0.64.
..............................................................................................................................................................................
As with means, the statistician, given the sample proportion, tries to estimate the
population proportion. Point and interval estimates for a population proportion can be
made by using the sample proportion. For a point estimate of p (the population proportion), p (the sample proportion) is used. On the basis of the three properties of a good
estimator, p is unbiased, consistent, and relatively efficient. But as with means, one is
not able to decide how good the point estimate of p is. Therefore, statisticians also use
an interval estimate for a proportion, and they can assign a probability that the interval
will contain the population proportion.
Confidence Intervals
To construct a confidence interval about a proportion, one must use the maximum error
of estimate, which is
E=
Za/zVi!-
Confidence intervals about proportions must meet the criteria that np
2::
5 and nq
2::
5.
CD
320
Bluman:Elamentary
8. Confidence Intervals and
Statistics: A Step.by Step
Approach.FourthEdition
SampleSize
Chapter 8
© The McGraw-Hili
Text
Companies. 2001
Confidence Intervals and Sample Size
Rounding Rule for a Confidence Interval for a Proportion Round off to three decimal
places.
'.'i
ample
A sample of 500 nursing applications included 60 from men. Find the 90% confidence
interval of the true proportion of men who applied to the nursing program.
Solution
= 1-
Since a
pwhenft
0.90
(Za/2)
= 0.10, and Za/2 = 1.65, substituting in the formula
Vi!- <
p
< P + (Za/2)
= 601500 = 0.12 and q = 1 -
Vi!-
0.12
= 0.88, one gets
(O.1~~~.88) < p < 0.12 + (1.65)
0.12 - (1.65)
(0.12)(0.88)
500
< P < 0.12 + 0.024
0.096 < P < 0.144
9.6% < P < 14.4%
or
Hence, one can be 90% confident that the percentage of men who applied is between
9.6% and 14.4%.
When a specific percentage is given, the percentage becomes p when it is changed
to a decimal. For example, if the problem states that 12% of the applicants were men,
then p = 0.12.
" Example 8-10
A survey of 200,000 boat owners found that 12% of the pleasure boats were named
Serenity. Find the 95% confidence interval of the true proportion of boats named Serenity. (Source: USA Today Snapshot, July 2, 1993)
Solution
From the Snapshot, ft = 0.12 (i.e., 12%), and n = 200,000. Since Za/2 = 1.96, substituting in the formula
ft 0.12 - (1.96)
(Za/2)
Vi!- <
p
< p + (Za/2)
Vi!-
yields
(02~,~~8) < p < 0.12 + (1.96) (02~,~~8)
0.119
< P < 0.121
Hence, one can say with 95% confidence that the true percentage of boats named Serenity is between 11.9% and 12.1 %.
Sample Size for
Proportions
.:::.;
C;'i
0.12 - 0.024
In order to find the sample size needed to determine a confidence interval about a proportion, use the following formula.
Bluman: Elementary
Statistics: A Step by Step
Approach, Fourth Edition
8.Confidence Intervalsand
Sample Size
Text
Section 8-4
© The McGraw-Hili
Companies. 2001
Confidence Intervals and Sample Size for Proportions
321
ObjectiveS. Determine the
minimum sample size for
finding aconfidence interval
foraproportion.
This formula can be found by solving the maximum error of estimate value for n:
E=
Za/2Vi!-
There are two situations to consider. First, if some approximation of P is known
(e.g., from a previous study), that value can be used in the formula.
Second, if no approximation of Pis mown, one should use P = 0.5. This value will
give a sample size sufficiently large to guarantee an accurate prediction.'given the confidence interval and the error of estimate. The reason is that when P and q are each 0.5,
the product P. qis at maximum, as shown here.
fiq
0.1
0.2
0.3
0.9
0.8
0.7
0.4
0.5
0.6
0.6
0.7
0.8
0.9
0.5
0.4
0.3
0.09
0.16
0.21
0.24
0.25
0.24
0.2
0.21
0.16
0.1
0.09
.............................................................................................................................................................................
Example 8+11
A researcher wishes to estimate, with 95% confidence, the proportion of people who
own a home computer. A previous study shows that 40% of those interviewed had a
computer at home. The researcher wishes to be accurate within 2% of the true proportion. Find the minimum sample size necessary.
Solution
Since Za/2
= 1.96, E = 0.02, P = 0040, and q = 0.60, then
n
=pq(~r = (OAO)(0.60)(~:~~r = 2304.96
which, when rounded up, is 2305 people to interview.
..............................................................................................................................................................................
The same researcher wishes to estimate the proportion of executives who own a car
phone. She wants to be 90% confident and be accurate within 5% of the true proportion.
Find the minimum sample size necessary.
Blumen: Elementery
Statistics:AStep byStep
Approech, Fourth Edition
322
I 8. Confidence Intervalsand I Text
© The McGraw-Hili
Companies, 2001
SempleSize
Chapter 8 Confidence Intervals and Sample Size
Solution
Since no prior knowledge of p is known, statisticians assign the values p = 0.5 and
fj = 0.5. The sample size obtained by using these values will be large enough to ensure
the specified degree of confidence. Hence,
n=
pq(~r = (O.5)(O.5)(~:~~r = 272.25
which, when rounded up, is 273 executives to ask.
In determining the sample size, the size of the population is irrelevant. Only the degree of confidence and the maximum error are necessary to make the determination.
.'
8-47. In each case, findp and q.
a. n = 80 and X = 40
b. n = 200 and X = 90
c. n = 130 and X = 60
d. n = 60 and X = 35
e. n = 95 and X = 43
8-48. (aus) Find p and q for each percentage. (Use each
percentage for p.)
a. 12%
b.29%
c. 65%
d.53%
e. 67%
8-49. A U.S. TravelData Center surveyconductedfor
BetterHomesand Gardens of 1500 adultsfound that 39%
said that they would take more vacations this year than last
year.Find the 95% confidenceintervalof the true proportion
of adults who said that they will travelmore this year.
Source:USA Today, April20, 1995.
8-50. Arecent study of loa people in Miami found 27 were
obese.Find the 90% confidenceinterval of the population
proportionof individualsliving in Miami who are obese.
Source: Based on information from the Centerfor DiseaseControl
andPrevention, USAToday, March4, 1997.
8-51. An employment counselor found that in a sample of
100 unemployed workers, 65% were not interested in
returning to work. Find the 95% confidenceinterval of the
true proportion of workers who do not wish to return to
work.
8-52. A Today/CNN/GallupPoll of 1015 adults found that
13% approved of the job Congress was doingin 1995. Find
the 95% confidence interval of the true proportion of adults
who feel this way.
Source: USAToday, March31, 1995.
8-53. A survey found that out of 200 workers, 168 said
they were interrupted three or more times an hour by phone
messages, faxes, etc. Find the 90% confidence interval of
the population proportion of workers who are interrupted
three or more times an hour.
Source:Basedon informationfrom USAToday Snapshot, August
6,1997.
8-54. A survey of 120 female freshmen showed that 18 did
not wish to work after marriage. Find the 95% confidence
interval of the true proportion of females who do not wish
to work after marriage.
8-55. A study by the University of Michigan found that
one in five 13- and l4-year-olds is a sometime smoker. To
see how the smoking rate of the students at a large school
district compared to the national rate, the superintendent
surveyed 200 13- and l4-year-old students and found that
23% said they were sometime smokers. Find the 99%
confidence interval of the true proportion and compare this
with the University of Michigan's study.
Source: USAToday, July 20, 1995.
8-56. A survey of 80 recent fatal traffic accidents showed
that 46 were alcohol-related. Find the 95% confidence
interval of the true proportion of fatal alcohol-related
accidents.
8-57. A survey of 90 families showed that 40 owned at
least one gun. Find the 95% confidence interval of the true
proportion of families who own at least one gun.
8-58. In a certain state, a survey of 500 workers showed
that 45% belonged to a union. Find the 90% confidence
interval of the true proportion of workers who belong to a
union.
8-59. For a certain age group, a study of 100 people who
died showed that 25% had died of cancer. Find the
I
UIUIIIO
-._-~-
•
Statistics;A Step byStep
Approach, Fourth Edition
SampleSize
Companies. 2001
Section 8-4
proportion of individuals who die of cancer in that age
group. Use the 98% confidence interval.
8-60. A researcher wishes to be 95% confident that her
~stimate of the true proportion of individuals who travel
overseas is within,0.04 of the true proportion. Find the
sample size necessary. In a prior study, a sample of 200
people showed that 80 traveled overseas last year.
8-61. A medical researcher wishes to determine the
percentage of females who take vitamins. He wishes to be
99% confident that the estimate is within 2 percentage
points of the true proportion. A recent study of 180 females
showed that 25% took vitamins.
a. How large should the sample size be?
b. If no estimate of the sample proportion is available,
how large should the sample be?
8-62. A recent study indicated that 29% of the 100 women
over 55 in the study were widows.
a. How large a sample must one take to be 90% confident
that the estimate is within 0.05 of the true proportion of
women over 55 who are widows?
b. If no estimate of the sample proportion is available,
how large should the sample be?
8-63. A researcher wishes to estimate the proportion of
adult males who are under 5 feet 5 inches tall. She wants to
MINITAB
Step by Step
Confidence Intervals and Sample Size for Proportions
323
be 90% confident that her estimate is within 5% of the true
proportion.
a. How large a sample should be taken if in a sample of
300 males, 30 were under 5 feet 5 inches tall?
b. If no estimate of the sample proportion is available,
how large should the sample be?
8-64. An educator desires to estimate, within 0.03, the true
proportion of high school students who study at least 1 hour
each school night. He wants to be 98% confident.
a. How large a sample is necessary? Previously, he
conducted a study and found that 60% of the 250 students
surveyed spent at least 1 hour each school night studying.
b. If no estimate of the sample proportion is available,
how large should the sample be?
*8-65. If a sample of 600 people is chosen and the
researcher desires to have a maximum error of estimate of
4% on the specific proportion who favor gun control, find
the degree of confidence. A recent study showed that 50%
were in favor of some form of gun control.
*8-66. In a study, 68% of 1015 adults said they believe the
Republicans favor the rich. If the margin of error was 3
percentage points, what was the confidence interval used
for the proportion?
Source: USAToday, March 31, 1995.
cr. l?,nnl\(lillJr1~
""I"' ;ID fl..
/P
it:!l,J1
11' " : i l l '"
TIl)
....Jt'"
M.1
'/(D'!Elill!©i2JITll(C',J;
ilmUvBlr'W'i£\ill
tFIDIf' Iffi Ji
ifif.k1J!.D@lPJ]@1ill
MINITAB will calculate a confidence interval given the statistics from a sample or given the raw
data. These instructions use the sample information in Example 8-9.
Example MT8-3
There were 500 nursing applications in a sample, including 60 from men. Find the 90% confidence interval of the actual proportion of male applicants.
1. Select
Stat>Basic Statistics>1 Proportion.
I Proportion Dialog Box
2. Click on the button for Summarized data.
CD I
Bluman: Elementllry
Statistics: AStep byStep
I 8.Confidence Intervels end , Text
I © The McGraw-Hili
Companies. 2001
Sample Size
Approach. Fourth Edition
324
Chapter 8 Confidence Intervals and Sample Size
Number of trials and enter 500.
4. In the Number of successes box enter 60.
5. Click on [Options].
3. Click in the box for
;.
1 Proportion Options
Dialog Box
6. Type 90.0 for the confidence level.
7. Check the box for Use test and
interval based on normal
dist ribution.
8. Click [OK] twice.
The results for the confidence interval will be displayed in the session window. The population
proportion is most likely between 10% and 14%. The Z- and P-values shown are used in hypothesis testing, covered in the next chapter.
Confidence Interval for
1 Proportion
.:!~
::;N':
'j,~'
TI·83
Step by Step
1. Press STAT and move the cursor to TESTS.
2. Press A (ALPHA, MATH) for 1 - PropZlnt.
3. Enter the value for X.
4. Enter the value for n.
5. Type in the correct confidence interval.
6. Select
Calculate and press ENTER.
Example TI8-5
Find the 95% confidence interval of p when X = 60 and n
= 500, as in Example 8-9.
Input
Output
I-ProF'Z I nt
I-PropZInt
>::: 60
·c--0
n ....
II:::!
C-Level: .95
Ca Lcu l at.e
(.09152~.14848)
.
~.=.
1~
L.
n=500
The 95% confidence level for pis 0.09152 < P < 0.14848. Pis also given.
Bluman: Elementary
Statistics: A Step by Step
Approach. Fourth Edition
B. Confidence Intervals and
Sample Size
Text
Section 8-5
Confidence Intervals
for Variances and
Standard Deviations
Objective 6. Find a
confidence interval fora
variance and astandard
deviation.
© The McGraw-Hili
Companies. 2001
Confidence Intervals for Variances and Standard Deviations
325
In the previous section, confidence intervals were calculated for means and proportions.
This section will explain how to find confidence intervals for variances and standard deviations. In statistics, the variance and standard deviation of a variable are as important
as the mean. For example, when products that fit together (such as pipes) are manufactured, it is important to keep the variations of the diameters of the products as small as
possible; otherwise, they will not fit together properly and have to be scrapped. In the
manufacture of medicines, the variance and standard deviation of the medication in the
pills play an important role in making sure patients receive the proper dosage. For these
reasons, confidence intervals for variances and standard deviations are necessary.
In order to calculate these confidence intervals, a new statistical distribution is
needed. It is called the chi-square distribution.
The chi-square variable is similar to the t variable in that its distribution is a family
of curves based on the number of degrees of freedom. The symbol for chi-square is X2
(Greek letter chi, pronounced "ki"). Several of the distributions are shown in Figure
8-9, along with the corresponding degrees of freedom. The chi-square distribution is obtained from the values of (n - 1)s2/cr when random samples are selected from a normally distributed population whose variance is cr.
Figure 8-9
The Chi-Square
Family of Curves
d.t. =1
•
e
326
8luman: Elementary
Statistics:AStep byStep
Approach. Fourth Edition
Chapter 8
1'.
.: ',"'.,'
:.' ",,','. ' .: ,".::
,dellrees
,fq~mulated", , i{
"inathemalician'.name
,;H,¥"s,hel·jD;1.~f)~~6.il~',
·\.was'studYitJ~1h~;:ac;cu
•
!~:;~aM~
i>111~flt
. Example 8-'13
Text
© The McGraw-Hili
Companies, 2001
Confidence Intervals and Sample Size
,Jhex~distrlbutioiJwith'2n.
, ' ';contdh
8.Confidence Intervals and
Sample Size
A em-square variable cannot be negative, and the distributions are positively
skewed. At about 100 degrees of freedom, the em-square distribution becomes somewhat symmetrical. The area under each chi-square distribution is equal to 1.00 or 100%.
Table G in Appendix C gives the values for the chi-square distribution. These values are used in the denominators of the formulas for confidence intervals. Two different
values are used in the formula. One value is found on the left side of the table and the
other is on the right. For example, to find the table values corresponding to the 95% confidence interval, one must first change 95% to a decimal and subtract it from 1 (1 - 0.95
= 0.05). Then divide the answer by 2 (cd2 = 0.05/2 = 0.025). This is the column on the
right side of the table, used to get the values for :dght. To get the value for Xfeft, subtract
the value of cd2 from 1 (1 - 0.0512 = 0.975). Finally, find the appropriate row corresponding to the degrees of freedom, n - 1. A similar procedure is used to find the val-.
ues for a 90% or 99% confidence interval.
Find the values for ~ght and Xfeft for a 90% confidence interval when n = 25.
Solution
To find X~ght, subtract 1 - 0.90 = 0.10 and divide by 2 to get 0.05.
To find Xfeft' subtract 1 - 0.05 to get 0.95. Hence, use the 0.95 and 0.05 columns
and the row corresponding to 24 d.f. See Figure 8-10.
Figure 8-10
x2Table for
Example 8-13
The answers are
= 36.415
X~ft = 13.848
X~ght
The best estimates for if and (T are i2 and s respectively.
In order to find confidence intervals for variances and standard deviations, one
must assume that the variable is normally distributed.
The formulas for the confidence intervals are shown next.
I
---
StBtistics: AStepbyStep
Approach. Fourth Edition
Sample Size
Companies. 2001
Section 8-5
Confidence Intervals for Variances and Standard Deviations
327
Recall that i2 is the symbol for the sample variance and s is the symbol for the sample standard deviation. If the problem gives the sample standard deviation (s), be sure to
square it when using the formula. But if the problem gives the sample variance (S2), do
not square it when using the formula, since the variance is already in square units.
Rounding Rule for a Confidence Interval for a Variance or Standard Deviation When computing a confidence interval for a population variance or standard deviation using raw data,
round off to one more decimal place than the number of decimal places in the original data.
When computing a confidence interval for a population variance or standard deviation using a sample variance or standard deviation, round off to the same number of
decimal places as given for the sample variance or standard deviation.
The next example shows how to find a confidence interval for a variance and standard deviation.
Find the 95 % confidence interval for the variance and standard deviation of the nicotine
content of cigarettes manufactured if a sample of 20 cigarettes has a standard deviation
of 1.6 milligrams.
Solution
Since a = 0.05, the two critical values for the 0.025 and 0.975 levels for 19 degrees of
freedom are 32.852 and 8.907. The 95% confidence interval for the variance is found by
substituting in the formula:
(n - 1)s2
2
X right
2
<0'<
(20 - 1)(1.6)2
32.852
<
2
(n - 1)i2
2
Xleft
(20 - 1)(1.6)2
8.907
<
1.5 < cr < 5.5
0'
Hence, one can be 95 % confident that the true variance for the nicotine content is
between 1.5 and 5.5.
For the standard deviation, the confidence interval is
yff.5 <
0'
<
0'
1.2
-
< V53
< 2.3
t
Hence, one can be 95% confident that the true standard deviation for the nicotine content of all cigarettes manufactured is between 1.2 and 2.3 milligrams based on a sample
of 20 cigarettes.
Find the 90% confidence interval for the variance and standard deviation for the price
of an adult single-day ski lift ticket. The data represent a selected sample of nationwide
ski resorts. Assume the variable is normally distributed.
e
328
Bluman: Elementary
Statistics: A Step by Step
Approach. Fourth Edition
Chapter 8
8.Confidence Intervalsand
Sample Size
© The McGraw-Hili
Companies. 2001
Text
Confidence Intervals and Sample Size
$59
39
$54
$53
$52
$51
49
46
49
48
Source: USA Today, October 6, 1998.
Solution
STEP 1
Find the variance for the data. Use the formulas in Chapter 3 or your
calculator.
The variance s'l = 28.2
STEP 2
Find X~ght and Xfeft from Table G in Appendix C. Since a = 0.10, the two
critical values are 3.325 and 16.919 using d.f. = 9 and 0.95 and 0.05.
STEP 3
Substitute in the formula and solve.
(n - 1)s2
2
X right
2
<0"<
(10 - 1)(28.2)
16.919
<
15.0
<
2
0"
0"2
(n - 1)s'l
2
X left
(10 - 1)(28.2)
<
3.325·
< 76.3
For the standard deviation
y'I5 <
<
3.87 < 0" <
0"
V76.3
8.73
Hence one can be 90% confident that the standard deviation for the price of all singleday ski lift tickets of the population is between $3.87 and $8.73 based on a sample
of 10 nationwide ski resorts. (Two decimal places are used since the data are in dollars
and cents.)
Note: If you are using the standard deviation instead (as in Example 8-14) of the
variance, be sure to square the standard deviation when substituting in the formula.
8-67. What distribution must be used when computing
8-71. Find the 90% confidence interval for the variance
confidence intervals for variances and standard deviations?
and standard deviation for the time it takes an inspector to
check a bus for safety if a sample of 27 buses has a
standard deviation of 6.8 minutes. Assume the variable is
normally distributed.
8:-68.
What assumption must be made when computing
confidence intervals for variances and standard deviations?
8-69. Using Table G, find the values for Xreft and Xaght.
a. a = 0.05
b. a = 0.10
c. a = 0.01
d. a = 0.05
e. a = 0.10
n = 16
n= 5
n = 23
n = 29
n = 14
8-70. Find the 95% confidence interval for the variance
and standard deviation for the lifetime of batteries if a
sample of 20 batteries has a standard deviation of 1.7
months. Assume the variable is normally distributed.
8-72. Find the 99% confidence interval for the variance
and standard deviation of the weights of 25 gallon
containers of motor oil if a sample of 14 containers has a
variance of 3.2. The weights are given in ounces. Assume
the variable is normally distributed.
8-73. The sugar content (in grams) for a random sample of
4-ounce containers of applesauce is shown. Find the 99%
confidence interval for the population variance and
standard deviation. Assume the variable is normally
distributed.
I
.,J
,:tj
-------------------
;.
IlIURlan: Elementary
'surtisliCS:AStep byStep
,APproach. Fourth Edition
I 8.Confidence Intervals and I Text
© The McGraw-Hili
Companies, 2001
Sample Size
Section 8-6
18.6
21.0
22.1
21.6
19.5
20.3
19.7
19.5
20.2
19.6
20.8
20.1
19.3
18.9
20.7
19.9
20.4
20.7
18.9
20.3
8-74. Find the 90% confidence interval for the variance
andstandarddeviation of the ages of seniors at Oak Park
College if a sample of 24 students has a standard deviation
, of 2.3 years.Assume the variable is normally distributed.
8-75. Find the 98% confidence interval for the variance
andstandarddeviation for the time it takes a telephone
company to transfer a call to the correct office. A sample of
15callshas a standard deviation of 1.6 minutes. Assume
the variableis normally distributed.
8-76. A random sample of stock prices per share is shown.
Find the 90% confidence interval for the variance and
standard deviation for the prices. Assume the variable is
normallydistributed.
$26.69
75.37
3.81
6.94
40.25
$13.88
7.50
53.81
28.25
10.87
$12.00
43.00
45.12
60.50
14.75
$28.37
47.50
13.62
28.00
46.12
Summary
329
8-77. A service station advertises that customers will have
to wait no more than 30 minutes for an oil change. A
sample of 28 oil changes has a standard deviation of
5.2 minutes.Find the 95% confidence interval of the
population standard deviation of the time spent waiting
for an oil change.
8-78. Find the 95% confidence interval for the variance
and standard deviation of the ounces of coffee that a
machine dispensesin 12-ounce cups. Assume the variable
is normally distributed.The data are given here.
12.03, 12.10, 12.02, 11.98, 12.00, 12.05, 11.97, 11.99.
*8-79. A confidenceinterval for a standard deviation for
large samplestaken from a normally distributed population
can be approximatedby
s
s - za/2 • V2ii <
s
(T
< S + za/2 • V2ii
Find the 95% confidence interval for the population
standard deviationof calculator batteries. A sample of
200 calculatorbatteries has a standard deviation of
18 months.
Source: Pittsburgh Tribune Review, July 6,1997.
..
,
-
,
Summary
An important aspect of inferential statistics is estimation. Estimations of parameters of
populations are accomplished by selecting a random sample from that population and
choosing and computing a statistic that is the best estimator of the parameter. A good estimator must be unbiased, consistent, and relatively efficient. The best estimators of J.L and
p are X and p, respectively. The best estimators of a2 and (J' are S2 and s, respectively.
There are two types of estimates of a parameter: point estimates and interval estimates. A point estimate is a specific value. For example, if a researcher wishes to estimate the average length of a certain adult fish, a sample of the fish is selected and
measured. The mean of this sample is computed-e.g., 3.2 centimeters. From this sample mean, the researcher estimates the population mean to be 3.2 centimeters.
The problem with point estimates is that the accuracy of the estimate cannot be determined. For this reason, statisticians prefer to use the interval estimate. By computing
an interval about the sample value, statisticians can be 95% or 99% (or some other percentage) confident that their estimate contains the true parameter. The confidence level
is determined by the researcher. The higher the confidence level, the wider the interval
of the estimate must be. For example, a 95% confidence interval of the true mean length
of a certain species of fish might be
3.17
< J.L < 3.23
whereas the 99% confidence interval might be
3.15
< J.L < 3.25
When the confidence interval of the mean is computed, the z or t values are used,
depending on whether or not the population standard deviation is known and depending
L
_
e
330
Bluman: Elementary
Statistics: A Step by Step
Approach, Fourth Edition
8.Confidence IntelVals and
Sample Size
© The McGraw-Hili
Companies, 2001
Text
Chapter 8 Confidence Intervals and Sample Size
on the size of the sample. If a is known or n ;::: 30, the z values can be used. If o: is not
known, the t values must be used when the sample size is less than 30, and the population is normally distributed.
Closely related to computing confidence intervals is determining the sample size to
make an estimate of the mean. The following information is needed to determine the
minimum sample size necessary.
1. The degree of confidence must be stated.
2. The population standard deviation must be known or be able to be estimated.
3. The maximum error of estimate must be stated.
Confidence intervals and sample sizes can also be computed for proportions, using
the normal distribution, and confidence intervals for variances and standard deviations
can be computed using the chi-square distribution.
chi-square
distribution
degrees of freedom 312
estimation 298
estimator 299
interval estimate 299
325
confidence interval 300
confidence level 300
consistent estimator 299
Formula for the confidence interval of the mean when a is
known (when n 2:: 30, S can be used if o is unknown):
x -(Z0{2}:n) < JL < X + f0{2} :n)
= (ZW~
Formula for the confidence interval for a variance:
2
(n - 1)8
< u 2 < "'---::-"'--
Mort
.i:lghl
Formula for confidence interval for a standard deviation:
y(n -
where E is the maximum error.
Formula for the confidence interval of the mean when a is
unknown and n < 30:
X-
n=pq(~r
"'---=-"'--
ur
relatively efficient
estimator 299
t distribution 311
unbiased estimator 299
Formula for the sample size for proportions:
(n - 1)82
Formula for the sample size for means:
n
maximum error of
estimate 301
point estimate 299
proportion 318
1)82
rrighl
<u<
y(n -
1)82
Meft
ta/2(v,,) < JL < X + ta/2(v,,)
Formula for the confidence interval for a proportion:
ft -
(z0{2)
Vi!
< P < ft + (z0{2)
whereft = X/n and q = 1 -
Vi!-
ft.
8-80. TV Guide reported that Americans have their
television sets turned on an average of 54 hours per week.
A researcher surveyed 50 local households and found they
had their sets on an average of 47.3 hours. The standard
deviation of the sample was 6.2 hours. Find the 90%
confidence interval of the true mean. How does this
compare with the TV Guide findings?
Source: TV Guide 43, no. 30 (1995), p. 26.
e
------r-----r-------'T---~--I
Bluman:Elementary
Statistics: A Stepby Step
Approach. Fourth Edition
8. Confidence Intervals and
Sample Size
Text
© The McGraw-Hili
Companies. 2001
Section 8--6 Summary
8-81. A study of 36 members of the Central Park Walkers
showed that they could walk at an average rate of 2.6 miles
per hour. The sample standard deviation is 0.4. Find the
95% confidence interval for the mean for all walkers.
8-82. The average weight of 60 randomly selected
compact automobiles was 2627 pounds. The sample
standard deviation was 400 pounds. Find the 99%
confidence interval of a true mean weight of the
automobiles.
8-83. A U.S. Travel Data Center survey reported that
Americans stayed an average of 7.5 nights when they went
on vacation. The sample size was 1500. Find the 95%
confidence interval of the true mean. Assume the standard
deviation was 0.8 day.
Source: USA Today, April 20, 1995.
8-84. A recent study of 25 commuters showed that they
spent an average of $18.23 on public transportation per
week. The standard deviation of the sample was $3.00.
Find the 95 % confidence interval of the true mean. Assume
the variable is normally distributed.
8-85. For a certain urban area, in a sample of five months,
an average of 28 mail carriers were bitten by dogs each
month. The standard deviation of the sample was 3. Find
the 90% confidence interval of the true mean number of
mail carriers who are bitten by dogs each month. Assume
the variable is normally distributed.
8-86. A researcher is interested in estimating the average
salary of teachers in a large urban school district. She wants
to be 95% confident that her estimate is correct. If the
standard deviation is $1050, how large a sample is needed
to be accurate within $200?
331
interval of the true proportion of all adults who favor
visiting historical sites as vacations.
Source: USA Today, April 20, 1995.
8-89. In a study of 200 accidents that required treatment in
an emergency room, 40% occurred at home. Find the 90%
confidence interval of the true proportion of accidents that
occur at home.
8-90. In a recent study of 100 people, 85 said that they
were dissatisfied with their local elected officials. Find the
90% confidence interval of the true proportion of
individuals who are dissatisfied with their local elected
officials.
8-91. A nutritionist wishes to determine, within 2%, the
true proportion of adults who snack before bedtime. If she
wishes to be 95% confident that her estimate contains the
population proportion. how large a sample will she need? A
previous study found that 18% of the 100 people surveyed
said they did snack before bedtime.
8-92. A survey of 200 adults showed that 15% played
basketball for regular exercise. If a researcher desires to
find the 99% confidence interval of the true proportion of
adults who play basketball and be within 1% of the true
population, how large a sample should be selected?
8-93. The standard deviation of the diameter of 28 oranges
was 0.34 inch. Find the 99% confidence interval of the true
standard deviation of the diameters of the oranges.
8-94. A random sample of 22 lawn mowers was selected.
and the motors were tested to see how many miles per
gallon of gasoline each one obtained. The variance of the
measurements was 2.6. Find the 95% confidence interval of
the true variance.
8-87. A researcher wishes to estimate, within $25, the true
average amount of postage a community college spends
each year. If she wishes to be 90% confident, how large a
sample is necessary? The standard deviation is known to
be $80.
8-95. A random sample of 15 snowmobiles was selected,
and the lifetimes (in months) of the batteries was measured.
The variance of the sample was 8.6. Find the 90%
confidence interval of the true variance.
8-88. A U.S. Travel Data Center's survey of 1500 adults
found that 42% of respondents stated that they favor
historical sites as vacations. Find the 95% confidence
8-96. The heights of 28 police officers from a large-city
police force were measured. The standard deviatio~ of the
sample was 1.83 inches. Find the 95% confidence mterval
of the standard deviation of the heights of the officers.
l Would You Change the Channel? Revisited
! The estimates given in the article are point estimates. However, sin~e the mar~n of er\ ror is stated to be 3 percentage points, an interval estimate can easily be.obtamed. For
1 example. if 45% of the people changed the channel, then the confidence interval of ~e
1 true percentages of people who changed channels would be 42% < P < 48%. The arti1 cle fails to state whether a 90%, 95%. or some other percentage was used for the confi1 dence interval.
e
332
8luman: Elementary
Statistics:AStep by Step
Approach. Fourth Edition
8.Confidence Intervals and
Sample Size
© The McGraw-Hili
Text
Companies, 2001
Chapter 8 Confidence Intervals and Sample Size
Using the formula given in Section 8-4, a minimum sample size of 1068 would be
needed to obtain a 95% confidence interval for p, as shown next. Use p and q as 0.5,
since no value is known for p.
n
=pq(~r
1.96)2
n = (0.5)(0.5) (0.03 = 1067.1
n = 1068
The Data Bank is found in Appendix D, or on the World
Wide Web by following links from
www.mhhe.comlmath/statlbluman/
intervals of the mean length in miles of major North
American rivers. Find the mean of all the values and
determine if the confidenceintervals contain the mean.
1. From the Data Bank choosea variable,find the mean,
and constructthe 95% and 99% confidenceintervals of
the population mean.Use a sampleof at least 30
subjects. Find the mean of the populationand determine
whether it falls withinthe confidence interval.
S. From Data Set VI in Appendix D, select a sample of 20
values and find the 90% confidence interval of the mean
of the number of acres. Find the mean of all the values
and determine if the confidence interval contains the
mean.
2. Repeat Exercise 1 using a different variable and a
sample of 15.
6. From Data Set XIV in Appendix D, select a sample of
20 weights for the Pittsburgh Steelers and find the
proportion of players who weigh over 250 pounds.
Construct a 95% confidenceinterval for this proportion.
Find the proportion of all Steelers who weigh more than
250 pounds. Does the confidence interval contain this
value?
3. Repeat Exercise 1 using a proportion. For example,
construct a confidenceintervalfor the proportion of
individuals who did not completehigh school.
4. From Data Set ill in AppendixD, select a sample of 30
values and constructthe 95% and 99% confidence
Determine whether each statement is true or false. H the
statement is false, explain why.
1. Interval estimatesare preferredover point estimates
since a confidencelevel canbe specified. .
2. For a specificconfidence interval, the larger the sample
size, the smallerthe maximum error of estimate will be.
3. An estimatoris consistentif, as the sample size
decreases, the value of the estimatorapproaches the
value of the parameterestimated.
4. In order to determine the samplesize needed to
estimate a parameter, one mustknow the maximum
error of estimate.
Select the best answer.
S. When a 99% confidenceintervalis calculated instead
of a 95% confidenceintervalwith n being the same,
the maximum error of estimatewill be
a. Smaller
b. Larger
c. The same
d. It cannot be determined
6. The best point estimate of the population mean is
a. The samplemean
b. The samplemedian
c. The samplemode
d. The samplemidrange
7. When the populationstandard deviation is unknown
and sample size is less than 30, what table value should
be used in computinga confidence interval for a mean?
a.z
b. t
c. chi-square
d. None of the above
Complete the following statements with the best answer.
8. A good estimator shouldbe
---~~
-
_._._._._----_.
, and
__
----------
.~-_ ..
I Bluman: Elementary
Statistics: A Stepby Step
Approach, Fourth Edition
8.Confidence Intervalsand
Sample Size
I Text
© The McGraw-Hili
Companies, 2001
Section 8-ti
9. The maximum difference between the point estimate of
a parameter and the actual value of the parameter is
called_-10. The statement "The average height of an adult male is 5
feet 10 inches" is an example of a(n)
estimate.
11. The three confidence intervals used most often are the
_ _ _ %,
%, and
%.
12. A random sample of 49 shoppers showed that they
spend an average of $23.45 per visit at the Saturday
Mornings Bookstore. The standard deviation of the
sample was $2.80. Find the 90% confidence interval of
the true mean.
13. An irate patient complained that the cost of a doctor's
visit was too high. She randomly surveyed 20 other
patients and found that the meari amount of money
they spent on each doctor's visit was $44.80. The
standard deviation of the sample was $3.53. Find the
95% confidence interval of the population mean.
Assume the variable is normally distributed.
14. The average weight of 40 randomly selected school
buses was 4150 pounds. The standard deviation was
480 pounds. Find the 99% confidence interval of the
true mean weight of the buses.
15. In a study of 10 insurance sales reps from a certain
large city, the average age of the group was 48.6, and
the standard deviation was 4.1 years. Find the 95%
confidence interval of the population mean age of all
insurance sales reps in that city.
16. In a hospital, a sample of 8 weeks was selected, and it
was found that an average of 438 patients were treated
in the emergency room each week. The standard
deviation was 16. Find the 99% confidence interval of
the true mean. Assume the variable is normally
distributed.
17. For a certain urban area, it was found that in a sample
of 4 months, an average of 31 burglaries occurred each
month. The standard deviation was 4. Find the 90%
confidence interval of the true mean number of
burglaries each month.
18. A university dean wishes to estimate the average
number of hours freshmen study each week. The
A confidence interval for a median can be found by using
the formulas
n +-I + Za12vn
U -- 2
2
(round up)
333
standard deviation from a previous study is 2.6 hours.
How large a sample must be selected if he wants to be
99% confident of finding whether the true mean differs
from the sample mean by 0.5 hour?
19. A researcher wishes to estimate within $300 the true
average amount of money a county spends on road
repairs each year. If she wants to be 90% confident,
how large a sample is necessary? The standard
deviation is known to be $900.
20. Arecent study of 75 workers found that 53 people rode
the bus to work each day. Find the 95% confidence
interval of the proportion of all workers who rode the
bus to work.
21. In a study of 150 accidents that required treatment in
an emergency room, 36% involved children under 6
years of age. Find the 90% confidence interval of the
true proportion of accidents that involve children under
the age of 6.
22. A survey of 90 families showed that 40 owned at least
one television set. Find the 95% confidence interval of
the true proportion of families who own at least one
television set.
23. A nutritionist wishes to determine, within 3%, the true
proportion of adults who do not eat any lunch. If he
wishes to be 95 % confident that his estimate contains
the population proportion, how large a sample will be
necessary? A previous study found that 15% of the 125
people surveyed said they did not eat lunch.
24. A sample of 25 novels has a standard deviation of nine
pages. Find the 95% confidence interval of the
population standard deviation.
25. Find the 90% confidence interval for the variance and
standard deviation for the time it takes a state police
inspector to check a truck for safety if a sample of 27
trucks has a standard deviation of 6.8 minutes. Assume
the variable is normally distributed.
26. A sample of 20 automobiles has a pollution by-product
release standard deviation of 2.3 ounces when one
gallon of gasoline is used. Find the 90% confidence
interval of the population standard deviation.
Suppose a data set has 30 values, and one wants to find the
95% confidence interval for the median. Substituting in the
formulas, one gets
U = 30
L=n-U+I
to define positions in the set of ordered data values.
Summary
I..,
+ 1 + 1.96V30 = 21
2
2
L = 30 - 21 + 1 = 10
when n = 30 and
Za/2
= 1.96.
(rounded up)
e
334
Bluman: Elementary
Statistics:AStepby Step
Approach. Fourth Edition
8.Confidence Intervals and
SampleSize
© The McGraw-Hili
Text
Companies, 2001
Chapter 8 Confidence Intervals and Sample Size
Arrange the data in order from smallest to largest, and
then select the 1Ot)1 and 21st values of the data array;
Aence, XIO < med < X21•
Find the 90% confidence interval for the median for
the data in Exercise 8-12.
Use MlNITAB, the TI-83, or a computer program of your
choice to complete the following exercises.
2. Using the same data or different data, construct a
confidence interval for a proportion. For example, you
might want to find the proportion of passes completed
by the quarterback or the proportion of passes that were
intercepted. Write a short paragraph summarizing the
results.
1. Select several variables, such as the number of points a
football team scored in each game of a specific season,
the number of passes completed, or the number of yards
gained. Using confidence intervals for the mean,
determine the 90%,95%, and 99% confidence intervals.
(Use z or t, whichever is relevant.) Decide which you
think is most appropriate. When this is completed, write
a summary of your findings by answering the following
questions.
a. What was the purpose of the study?
b. What was the population?
c. How was the sample selected?
d. What were the results obtained using confidence
intervals?
e. Did you use z or t? Why?
You may use the following websites to obtain raw data:
http://www.mhhe.com/math/statlbluman/
http://lib.stat.cmu.eduIDASL
hrtp:llwww.oecd.orglstatlist.htm
http://www.statcan.calenglishl