302360_File_B.qxd
7/7/03
7:18 AM
Page 1
Distribution of Data and the Empirical Rule
■ Stem-and-Leaf Diagrams
■ Frequency Distributions and Histograms
■ Normal Distributions and the Empirical Rule
■ z-Scores

■ Stem-and-Leaf Diagrams
Although the mean, the median, the mode, and the standard deviation provide
some information about a set of data and the distribution of the data, it is often
helpful to use graphical procedures that visually illustrate precisely how the values in a set of data are distributed.
Many small sets of data can be graphically displayed by using a stem-and-leaf diagram. For instance, consider the following history test scores:
65, 72, 96, 86, 43, 61, 75, 86, 49, 68, 98, 74, 84, 78, 85, 75, 86, 73
In this form the data are called raw data because the data have not been organized. With raw data it is generally difficult to observe how the data are distributed. In the stem-and-leaf diagram shown below, we have organized the test
scores by placing all the scores that are in the 40s in the top row, the scores that are
in the 50s in the second row, the scores that are in the 60s in the third row, and so
on. The tens digits of the scores have been placed to the left of the vertical line. In
this diagram they are referred to as stems. The ones digits of the test scores have
been placed in the proper row to the right of the vertical line. In this diagram they
are the leaves. It is now easy to make observations about the distribution of the
scores. Only two of the scores are in the 90s, six of the scores are in the 70s, and
none of the scores are in the 50s. The lowest score is 43 and the highest is 98.
A Stem-and-Leaf Diagram of a Set of History Test Scores
Stems | Leaves
    4 | 39
    5 |
    6 | 158
    7 | 234558
    8 | 45666
    9 | 68
Legend: 8|6 represents 86
Steps in the Construction of a Stem-and-Leaf Diagram
1. Determine the stems and list the stems in a column from smallest to largest.
2. List the remaining digits of each stem as a leaf to the right of its stem.
3. Include a legend that explains the meaning of the stem and the leaves. Include a
title for the diagram.
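The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `stem_and_leaf` and its `leaf_unit` parameter are assumptions introduced here, not part of the text. The parameter gives the place value of the leaf digit (1 for test scores, 100 for dollar amounts whose hundreds digit is the leaf).

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1):
    """Group values into stems with sorted single-digit leaves.

    leaf_unit (hypothetical parameter) is the place value of the leaf digit.
    """
    rows = defaultdict(list)
    for v in values:
        scaled = v // leaf_unit            # drop any digits below the leaf digit
        stem, leaf = divmod(scaled, 10)    # last remaining digit is the leaf
        rows[stem].append(leaf)
    # Step 1: list every stem from smallest to largest, including empty stems.
    stems = range(min(rows), max(rows) + 1)
    # Step 2: attach the sorted leaves to each stem.
    return {s: sorted(rows.get(s, [])) for s in stems}

scores = [65, 72, 96, 86, 43, 61, 75, 86, 49, 68, 98, 74, 84, 78, 85, 75, 86, 73]
diagram = stem_and_leaf(scores)

# Step 3: print the diagram with a title and a legend.
print("History Test Scores")
for stem, leaves in diagram.items():
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")
print("Legend: 8|6 represents 86")
```

Calling `stem_and_leaf(amounts, leaf_unit=100)` on a list of dollar amounts would use the thousands digit as the stem and the hundreds digit as the leaf.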
The choice of how many leading digits to use as the stem will depend on the
particular application and can be best explained with an example.
EXAMPLE 1
Construct a Stem-and-Leaf Diagram
A travel agent has recorded the amount spent by customers for a cruise. Construct
a stem-and-leaf diagram for the data.
Amount Spent for a Cruise, Summer of 2003
$3600  $4700  $7200  $2100  $5700  $4400  $9400
$6200  $5900  $2100  $4100  $5200  $7300  $6200
$3800  $4900  $5400  $5400  $3100  $3100  $4500
$4500  $2900  $3700  $3700  $4800  $4800  $2400
Copyright © Houghton Mifflin Company. All rights reserved.
Solution One method of choosing the stems is to let each thousands digit be a
stem and each hundreds digit be a leaf. If the stems and leaves are assigned in this
manner, then the notation 2|1, which has a stem of 2 and a leaf of 1, represents a
cost of $2100 and the notation 5|4 represents a cost of $5400. The diagram can now
be constructed by writing all of the stems, from smallest to largest, in a column to
the left of a vertical line and writing the corresponding leaves to the right of the
vertical line.
Amount Spent for a Cruise
Stems | Leaves
    2 | 1149
    3 | 116778
    4 | 14557889
    5 | 24479
    6 | 22
    7 | 23
    8 |
    9 | 4
Legend: 7|3 represents $7300
CHECK YOUR PROGRESS 1
The following table lists the ages of the customers
who purchased a cruise. Construct a stem-and-leaf diagram for the data.
Ages of Customers Who Purchased a Cruise
32  45  66  21  62  68  61
55  23  38  44  77  46  50
33  35  42  45  51  28  40
41  52  52  72  64  51  33
Solution See page S1.
Sometimes two sets of data can be compared by using a back-to-back stem-and-leaf diagram, which has common stems with leaves from one data set displayed to the right of the stems and leaves from the other data set displayed to the
left of the stems. For instance, the following back-to-back stem-and-leaf diagram
shows the test scores for two biology classes that took the same test.
Biology Test Scores
8 A.M. class | Stems | 10 A.M. class
           2 |   4   | 58
           7 |   5   | 6799
          58 |   6   | 2348
     1233378 |   7   | 1335568
    44556889 |   8   | 23666
       24558 |   9   | 45
Legend: 3|7 represents 73 (8 A.M. class)
Legend: 8|2 represents 82 (10 A.M. class)
QUESTION Which biology class did better on the test?
■ Frequency Distributions and Histograms
Large sets of data are often displayed using a frequency distribution or a histogram. For example, consider the following situation. An Internet service
provider (ISP) has installed new computers. To estimate the new download times
its subscribers will experience, the ISP surveyed 1000 subscribers to determine the
time each subscriber required to download a particular file from the Internet site
music.net. The results of that survey are summarized in the following table.
Download time (in seconds) | Number of subscribers
 0–10 | 28
10–20 | 129
20–30 | 355
30–40 | 345
40–50 | 121
50–60 | 22
A grouped frequency distribution
[Figure: A histogram of the frequency distribution above. Horizontal axis: download time, in seconds (0 to 60 in steps of 10); vertical axis: number of subscribers (0 to 400 in steps of 50).]
ANSWER
The 8 A.M. class did better on the test because it had more scores in the 80s
and 90s and fewer scores in the 40s, 50s, and 60s. The scores in the 70s were
similar for both classes.
The above table is called a grouped frequency distribution. It shows how often (frequently) certain events occurred. Each interval 0–10, 10–20, . . . is called a
class. This distribution has six classes. For the 10–20 class, 10 is the lower class
boundary and 20 is the upper class boundary. Any data value that lies on a common boundary is assigned to the higher class. The graph of a frequency distribution is called a histogram. A histogram provides a pictorial view of how the data
are distributed. In the above histogram, the height of each bar indicates how many
subscribers experienced the download times indicated by the class represented
below on the horizontal axis. The center point of a class is called a class mark. In
the above histogram, the class marks 5, 15, 25, 35, 45, 55 are shown by the red tick
marks on the horizontal axis.
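The following Python sketch illustrates how raw values could be tallied into classes and how the class marks are found. The raw download times here are hypothetical (the survey's individual times are not given in the text), and the function name `grouped_frequency` is an assumption made for illustration.

```python
def grouped_frequency(values, lower, width, num_classes):
    """Tally values into classes [lower + k*width, lower + (k+1)*width).

    A value on a common boundary falls into the higher class, as in the text.
    Values outside the listed classes are ignored in this sketch.
    """
    counts = [0] * num_classes
    for v in values:
        k = int((v - lower) // width)
        if 0 <= k < num_classes:
            counts[k] += 1
    table = []
    for k, count in enumerate(counts):
        lo = lower + k * width
        hi = lo + width
        # The class mark is the center point of the class.
        table.append({"class": (lo, hi), "mark": (lo + hi) / 2, "count": count})
    return table

# Hypothetical download times in seconds (not the survey data).
times = [4.2, 10, 12.5, 19.9, 20, 27.3, 31, 38.6, 40, 55.5]
table = grouped_frequency(times, lower=0, width=10, num_classes=6)
for row in table:
    lo, hi = row["class"]
    print(f"{lo}-{hi}  class mark {row['mark']}  frequency {row['count']}")
```

Note that the times 10, 20, and 40 lie on common boundaries, so each is counted in the higher of its two neighboring classes.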
Instead of using classes with a width of 10 seconds, the ISP could have chosen
a smaller class width. A smaller class width produces more classes. For instance, if
each class width were 5 seconds, the frequency distribution and histogram for the
music.net example would have the form shown below.
Download time (in seconds) | Number of subscribers
  0–5 | 8
 5–10 | 20
10–15 | 40
15–20 | 89
20–25 | 155
25–30 | 200
30–35 | 196
35–40 | 149
40–45 | 76
45–50 | 45
50–55 | 14
55–60 | 8
A frequency distribution with 12 classes
[Figure: A histogram of the frequency distribution above. Horizontal axis: download time, in seconds (0 to 60 in steps of 5); vertical axis: number of subscribers (0 to 200 in steps of 25).]
Examine the following distribution. It shows the percent of subscribers who
are in each class, as opposed to the frequency distribution above, which shows the
number of subscribers in each class. The type of frequency distribution that lists
the percent of data in each class is called a relative frequency distribution. The
relative frequency histogram shown at the right below was drawn by using the
data in the relative frequency distribution. It shows the percent of subscribers
along its vertical axis.
Download time (in seconds) | Percent of subscribers
  0–5 | 0.8
 5–10 | 2.0
10–15 | 4.0
15–20 | 8.9
20–25 | 15.5
25–30 | 20.0
30–35 | 19.6
35–40 | 14.9
40–45 | 7.6
45–50 | 4.5
50–55 | 1.4
55–60 | 0.8
A relative frequency distribution
[Figure: A relative frequency histogram. Horizontal axis: download time, in seconds (0 to 60 in steps of 5); vertical axis: percent of subscribers (0 to 20 in steps of 5).]
One advantage of using a relative frequency distribution instead of a frequency distribution is that there is a direct correspondence between the percent of
the data that lie in a particular portion of the relative frequency distribution and
probability. For instance, in the relative frequency distribution above, the percent
of the data that lie between 35 and 40 seconds is 14.9%. Thus, if a subscriber is
chosen at random, the probability that the subscriber will require between 35 and
40 seconds to download the music file is 0.149.
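As a small illustration, the relative frequencies can be computed from the 12-class frequency distribution by dividing each count by the total. The variable names below are assumptions made for this sketch; the counts are the ones listed in the text.

```python
# Number of subscribers in each 5-second class, from the 12-class distribution.
counts = {
    (0, 5): 8, (5, 10): 20, (10, 15): 40, (15, 20): 89,
    (20, 25): 155, (25, 30): 200, (30, 35): 196, (35, 40): 149,
    (40, 45): 76, (45, 50): 45, (50, 55): 14, (55, 60): 8,
}

total = sum(counts.values())  # number of subscribers surveyed

# Relative frequency of a class, as a percent of all subscribers.
relative = {cls: 100 * n / total for cls, n in counts.items()}

# The same fraction, read as a probability for a randomly chosen subscriber.
probability_35_to_40 = counts[(35, 40)] / total

for (lo, hi), pct in relative.items():
    print(f"{lo}-{hi}: {pct:.1f}%")
print("P(35 to 40 seconds) =", probability_35_to_40)
```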
EXAMPLE 2
Use a Relative Frequency Distribution
Use the music.net relative frequency distribution above to determine
a. the percent of subscribers who required at least 25 seconds to download the file.
b. the probability that a subscriber chosen at random will require from 5 to
20 seconds to download the file.
Download time (in seconds) | Percent of subscribers
  0–5 | 0.8
 5–10 | 2.0
10–15 | 4.0
15–20 | 8.9
20–25 | 15.5
25–30 | 20.0
30–35 | 19.6
35–40 | 14.9
40–45 | 7.6
45–50 | 4.5
50–55 | 1.4
55–60 | 0.8
Classes 5–10 through 15–20: Sum is 14.9%
Classes 25–30 through 55–60: Sum is 68.8%
Solution
a. The percent of data in all classes with a lower bound of 25 seconds or more is
the sum of the percents for the classes 25–30 through 55–60 in the distribution above. The percent of subscribers who required at least 25 seconds to
download the file is 68.8%.
b. The percent of data in all classes with a lower bound of at least 5 seconds and
an upper bound of 20 seconds or less is the sum of the percents for the
classes 5–10 through 15–20 in the distribution above. Thus the percent of
subscribers who required from 5 to 20 seconds to download the file is 14.9%.
The probability that a subscriber chosen at random will require from 5 to
20 seconds to download the file is 0.149.
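The two sums in Example 2 can be checked with a short Python sketch. The helper `percent_in_range` is a hypothetical name introduced here; it simply adds the percents of every class that lies entirely inside a given range of download times.

```python
# Percent of subscribers in each class, from the relative frequency distribution.
percents = {
    (0, 5): 0.8, (5, 10): 2.0, (10, 15): 4.0, (15, 20): 8.9,
    (20, 25): 15.5, (25, 30): 20.0, (30, 35): 19.6, (35, 40): 14.9,
    (40, 45): 7.6, (45, 50): 4.5, (50, 55): 1.4, (55, 60): 0.8,
}

def percent_in_range(table, low, high):
    """Sum the percents of all classes with lower bound >= low and upper bound <= high."""
    total = sum(p for (lo, hi), p in table.items() if lo >= low and hi <= high)
    return round(total, 1)  # rounding removes floating-point noise

at_least_25 = percent_in_range(percents, 25, 60)       # part a
from_5_to_20 = percent_in_range(percents, 5, 20)       # part b, as a percent
probability_5_to_20 = round(from_5_to_20 / 100, 3)     # part b, as a probability

print(at_least_25, from_5_to_20, probability_5_to_20)
```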
CHECK YOUR PROGRESS 2
Use the relative frequency distribution below to
determine
a. the percent of the states that pay an average teacher salary of at least $45,000.
b. the probability that a state selected at random pays an average teacher salary
of at least $30,000 but less than $39,000.
Average Salaries of Public School Teachers, 1998–1999
Average Salary, s | Number of States | Relative Frequency
$27,000 ≤ s < $30,000 | 3 | 6%
$30,000 ≤ s < $33,000 | 7 | 14%
$33,000 ≤ s < $36,000 | 12 | 24%
$36,000 ≤ s < $39,000 | 9 | 18%
$39,000 ≤ s < $42,000 | 6 | 12%
$42,000 ≤ s < $45,000 | 3 | 6%
$45,000 ≤ s < $48,000 | 5 | 10%
$48,000 ≤ s < $51,000 | 3 | 6%
$51,000 ≤ s < $54,000 | 2 | 4%
Source: www.nea.org.
Solution See page S1.
There is a geometric analogy between the percents of data and probabilities we calculated in Example 2 and the relative frequency histogram for the data. For instance, the percent of data described in part a. of Example 2 corresponds to the
area shown by the red bars in the histogram on the left below. The percent of data
described in part b. corresponds to the area shown by the blue bars in the histogram on the right below.
[Figure: Two relative frequency histograms of the music.net data. Horizontal axis: download time, in seconds (0 to 60 in steps of 5); vertical axis: percent of subscribers (0 to 20 in steps of 5). Left panel: 25 seconds or more. Right panel: At least 5 but less than 20 seconds.]
■ Normal Distributions and the Empirical Rule
A histogram for a set of data provides us with a tool that can indicate patterns or
trends in the distribution of data. The terms uniform, skewed, symmetrical, and normal are used to describe the distributions of some sets of data.
A uniform distribution, shown in the figure below, is generated when all of
the observed events occur with the same frequency. The graph of a uniform distribution remains at the same height over the range of the data. Some random
processes produce distributions that are uniform or nearly uniform. For example,
if the spinner below is used to generate numbers, then in the long run each of the
numbers 1, 2, 3, . . . , 8 will be generated with approximately the same frequency.
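[Figure: Random number generator, a spinner numbered 1 through 8.]
[Figure: Uniform distribution. Horizontal axis: x (1 through 8); vertical axis: frequency of x.]
[Figure: Symmetrical distribution about a center line, with mean = median = mode. Horizontal axis: x; vertical axis: frequency of x.]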
A symmetrical distribution, shown at the left, is symmetrical about a vertical
center line. If you fold a symmetrical distribution along the center line, the right
side of the distribution will match the left side. The following data sets are examples of distributions that are nearly symmetrical: the weights of all male students,
the heights of all teenage females, the prices of a gallon of regular gasoline in a
large city, the mileages for a particular type of automobile tire, and the amounts of
soda dispensed by a vending machine. In a symmetrical distribution, the mean,
the median, and the mode are all equal and they are located at the center of the
distribution.
Skewed distributions, shown in the figures below, have a longer tail on one
side of the distribution and a shorter tail on the other side. A distribution is skewed
to the left if it has a longer tail on the left and is skewed to the right if it has a longer
tail on the right. In a distribution that is skewed to the left, the mean is less than
the median, which is less than the mode. In a distribution that is skewed to the
right, the mode is less than the median, which is less than the mean.
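[Figure: Skewed distributions. Horizontal axis: x; vertical axis: frequency of x. Skewed left: the mean, the median, and the mode appear in that order from left to right. Skewed right: the mode, the median, and the mean appear in that order from left to right.]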
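The ordering of the mean, the median, and the mode described above can be seen in a small numerical sketch. The two data sets below are hypothetical, chosen only for illustration, and the helper name `summarize` is an assumption. The ordering shown is typical of skewed data with a single peak, although it is not guaranteed for every data set.

```python
import statistics

def summarize(data):
    """Return the (mean, median, mode) of a data set."""
    return statistics.mean(data), statistics.median(data), statistics.mode(data)

# Hypothetical data with a long tail on the right.
skewed_right = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]
# Mirror image of the same data, so the long tail is on the left.
skewed_left = [15 - v for v in skewed_right]

mean_r, median_r, mode_r = summarize(skewed_right)
mean_l, median_l, mode_l = summarize(skewed_left)

print("skewed right: mode", mode_r, "< median", median_r, "< mean", mean_r)
print("skewed left:  mean", mean_l, "< median", median_l, "< mode", mode_l)
```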
Many examinations yield test scores that have skewed distributions. For instance, if a test designed for students in the sixth grade is given to students in a
ninth grade class, most of the scores will be high, and the distribution of the test
scores will be skewed to the left.
Discrete values are separated from each other by an increment, or “space.”
For example, only whole numbers are used to record the number of points a
basketball player scores in a game. The possible numbers of points s that the
player can score are restricted to the discrete values 0, 1, 2, 3, 4, . . . . The variable s
is a discrete variable. Different scores are separated from each other by at least 1
point. Any variable that is based on counting procedures is a discrete variable.
Histograms are generally used to show the distribution of discrete variables.
Continuous values are values that can take on all real numbers in some interval. For example, the possible times that it takes to drive to the grocery store represent a continuous value. The time is not restricted to natural numbers such as
4 minutes or 5 minutes. In fact, the time may be any part of a minute, or of a second if we care to measure that precisely. A variable such as time that is based on
measuring with smaller and smaller units is a continuous variable. Continuous
curves, rather than histograms, are used to show the distributions of continuous
variables.
[Figure: Distributions of continuous variables. a. Bimodal, f(t) plotted against t. b. Skewed right, f(x) plotted against x. c. Symmetrical, f(w) plotted against w.]
In some cases a continuous curve is used to display the distribution of a set of
discrete data. For instance, when we have a large set of data and the class intervals are very small, the shape of the top of the histogram approaches a smooth
curve. See the two figures below. Thus, when graphing the distribution of very
large sets of data with very small class intervals, it is common practice to replace
the histogram with a smooth continuous curve.
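[Figure: A histogram for discrete data (left) and a continuous distribution curve (right), each plotting f(x) against x.]
TAKE NOTE
If x is a continuous variable with mean $\mu$ (the Greek letter mu)
and standard deviation $\sigma$, then
its normal distribution is given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$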
One of the most important statistical distributions is known as a normal distribution. The precise mathematical definition of a normal distribution is given by
the equation in the Take Note at the left; however, for many problems it is sufficient to know that all normal distributions have the following properties.
Properties of a Normal Distribution
A normal distribution has a bell shape that is symmetric about a vertical line through
its center. The mean, the median, and the mode of a normal distribution are all equal
and they are located at the center of the distribution.
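[Figure: A normal distribution, f(x) plotted against x. The areas between successive marks µ − 3σ, µ − 2σ, µ − σ, µ, µ + σ, µ + 2σ, µ + 3σ are 2.15%, 13.6%, 34.1%, 34.1%, 13.6%, and 2.15%. 68.2% of the data lie between µ − σ and µ + σ, 95.4% of the data lie between µ − 2σ and µ + 2σ, and 99.7% of the data lie between µ − 3σ and µ + 3σ.]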
The Empirical Rule: In a normal distribution, about
68.2% of the data lies within 1 standard deviation of the mean.
95.4% of the data lies within 2 standard deviations of the mean.
99.7% of the data lies within 3 standard deviations of the mean.
The Empirical Rule can be used to solve many problems that involve a normal distribution.
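The percents in the Empirical Rule can be checked numerically. For a normal distribution, the fraction of the data within k standard deviations of the mean equals erf(k/√2), where erf is the error function available in Python's standard `math` module. The function name `fraction_within` is an assumption made for this sketch.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * fraction_within(k):.2f}%")
```

The computed values are roughly 68.3%, 95.4%, and 99.7%, which agree with the percents stated in the rule to within about a tenth of a percentage point.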
EXAMPLE 3
Use the Empirical Rule
A survey of 1000 U.S. gas stations found that the price
charged for a gallon of regular gas can be closely approximated by a normal distribution with a mean of
$1.90 and a standard deviation of $0.20. How many of
the stations charge
a. between $1.50 and $2.30 for a gallon of regular gas?
b. less than $2.10 for a gallon of regular gas?
c. more than $2.30 for a gallon of regular gas?
[Figure: Data within 2σ of µ, f(x) plotted against x. The areas 13.6%, 34.1%, 34.1%, and 13.6% between µ − 2σ and µ + 2σ total 95.4%.]
Solution
a. The $1.50 per gallon price is 2 standard deviations below the mean. The $2.30
price is 2 standard deviations above the mean. In a normal distribution,
95.4% of all data lies within 2 standard deviations of the mean. (See the normal distribution at the left.) Therefore, approximately
(95.4%)(1000) = (0.954)(1000) = 954
of the stations charge between $1.50 and $2.30 for a gallon of regular gas.
b. The $2.10 price is 1 standard deviation above the mean. (See the figure below.) In a normal distribution, 34.1% of all data lies between
the mean and 1 standard deviation above the mean. Thus, approximately
(34.1%)(1000) = (0.341)(1000) = 341
of the stations charge between $1.90 and $2.10 for a gallon of regular gasoline.
Half of the stations charge less than the mean. Therefore, about 341 + 500 = 841 of the stations charge less than $2.10 for a gallon of regular gas. This
problem can also be solved by computing 34.1% + 50% = 84.1% of 1000.
[Figure: Data less than 1σ above µ, f(x) plotted against x. The 50% of the data below µ and the 34.1% between µ and µ + σ make up 84.1% of the data.]
c. The $2.30 price is 2 standard deviations above the mean. In a normal distribution, 95.4% of all data is within 2 standard deviations of the mean. This
means that the other 4.6% of the data will lie either more than 2 standard deviations above the mean or more than 2 standard deviations below the mean.
We are only interested in the data that lie more than 2 standard deviations
above the mean, which is 1/2 of 4.6%, or 2.3%, of the data. (See the figure
below.) Thus about (2.3%)(1000) = (0.023)(1000) = 23 of the stations
charge more than $2.30 for a gallon of regular gas.
[Figure: Data more than 2σ above µ, f(x) plotted against x. 95.4% of the data lie between µ − 2σ and µ + 2σ, with 2.3% in each tail.]
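The arithmetic in Example 3 can be retraced with the following Python sketch. It uses the rounded Empirical Rule percents from the text, and it works in cents so that the number of standard deviations comes out exactly. All names here are assumptions introduced for illustration.

```python
N_STATIONS = 1000
MEAN_CENTS, SD_CENTS = 190, 20          # mean $1.90, standard deviation $0.20

# Empirical Rule fractions as stated in the text.
WITHIN = {1: 0.682, 2: 0.954, 3: 0.997}

def deviations_from_mean(price_cents):
    """Number of standard deviations between a price and the mean."""
    return (price_cents - MEAN_CENTS) / SD_CENTS

# a. $1.50 and $2.30 are 2 standard deviations below and above the mean.
k_low, k_high = deviations_from_mean(150), deviations_from_mean(230)
part_a = round(WITHIN[2] * N_STATIONS)

# b. Below $2.10: the lower half of the data plus half of the data within 1 sigma.
part_b = round((0.5 + WITHIN[1] / 2) * N_STATIONS)

# c. Above $2.30: half of the data lying outside 2 sigma.
part_c = round((1 - WITHIN[2]) / 2 * N_STATIONS)

print(k_low, k_high, part_a, part_b, part_c)
```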
CHECK YOUR PROGRESS 3 A vegetable distributor knows that during the month
of August, the weights of its tomatoes were normally distributed with a mean of
0.61 pound and a standard deviation of 0.15 pound.
a. What percent of the tomatoes weighed less than 0.76 pound?
b. In a shipment of 6000 tomatoes, how many tomatoes can be expected to
weigh more than 0.31 pound?
c. In a shipment of 4500 tomatoes, how many tomatoes can be expected to
weigh between 0.31 and 0.91 pound?
Solution See page S1.
■ z-Scores
When you take a test, it is natural to wonder how you will do compared to the
other students in the class. Will you finish in the top 10%, or will you be closer to
the middle? One statistic that is used to measure the position of a data value with
respect to other data values is known as the z-score.
z-Score
The z-score for a given data value x is the number of standard deviations between x
and the mean of the data. The following formulas are used to calculate the z-score for
a data value x.
Population: $z_x = \dfrac{x - \mu}{\sigma}$
Sample: $z_x = \dfrac{x - \bar{x}}{s}$
In the next example, we use a student’s z-scores for two tests to determine
how well the student did on each test in comparison to the other students.
EXAMPLE 4
Use z-Scores
a. Ruben has taken two tests in his math class. He scored 72 on the first test, for
which the mean was 65 and the standard deviation was 8. He received a 60
on the second test, for which the mean was 45 and the standard deviation
was 12. In comparison to the other students, did Ruben do better on the first
or the second test?
b. Stacy is in the same math class as Ruben. Stacy’s z-score for the first test was
−0.75. What was Stacy’s score on the first test?
TAKE NOTE
In any application, the quantity
x − µ and the standard deviation σ
are both measured in the same
units. Thus a z-score, which is
the quotient of x − µ and σ, is a
dimensionless measure.
Solution
a. The z-score formula yields $z_{72} = \frac{72 - 65}{8} = 0.875$ and $z_{60} = \frac{60 - 45}{12} = 1.25$. Thus
Ruben scored 0.875 standard deviations above the mean on his first test and
1.25 standard deviations above the mean on the second test. In comparison to
his classmates, Ruben scored better on the second test than on the first test.
b. Substitute into the z-score formula and solve for x.
$-0.75 = \frac{x - 65}{8}$
$-6 = x - 65$
$x = 59$
Stacy’s score on the first test was 59.
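The two calculations in Example 4 can be written as a pair of small Python functions, one that computes a z-score and one that solves the z-score formula for x. The function names are assumptions made for this sketch.

```python
def z_score(x, mean, sd):
    """Number of standard deviations between x and the mean."""
    return (x - mean) / sd

def value_from_z(z, mean, sd):
    """Solve z = (x - mean) / sd for x."""
    return mean + z * sd

# Part a: Ruben's two tests.
ruben_first = z_score(72, mean=65, sd=8)
ruben_second = z_score(60, mean=45, sd=12)
better_test = "second" if ruben_second > ruben_first else "first"

# Part b: Stacy's score on the first test, from her z-score of -0.75.
stacy_first_score = value_from_z(-0.75, mean=65, sd=8)

print(ruben_first, ruben_second, better_test, stacy_first_score)
```

Because a z-score has no units, the two tests can be compared directly even though their means and standard deviations differ.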
CHECK YOUR PROGRESS 4
a. Cheryl took two quizzes in her history class. She scored 15 on the first quiz,
for which the mean was 12 and the standard deviation was 2.4. Her score on
the second quiz, for which the mean was 11.5 and the standard deviation was
2.2, was 14. In comparison to her classmates, did Cheryl do better on the first
or the second quiz?
b. Greg is in the same history class as Cheryl. Greg’s z-score for the first quiz
was 2.5. What was Greg’s score on the first quiz?
Solution See page S1.
Topics for Discussion
1. Is it possible, in a normal distribution of data, for the mean to be much larger
than the median? Explain.
2. Must all large data sets have a normal distribution? Explain.
3. A professor gave a final examination to 110 students. Eighteen students had
examination scores that were more than one standard deviation above the
mean. Does this indicate that 18 of the students had examination scores that
were less than one standard deviation below the mean? Explain.
4. A set of data consists of the 525 monthly salaries, listed in dollars, of the employees of a large company. What units should be used for the z-scores associated with the salaries? Explain.
EXERCISES
In Exercises 1 to 8, determine whether the given statement
is true or false.
1. If a distribution is symmetric about a vertical line, then it
is a normal distribution.
2. Every normal distribution has a bell-shaped graph.
3. In a normal distribution, the mean, the median, and the
mode of the distribution all are located at the center of
the distribution.
4. In a distribution that is skewed to the left, the median of
the data is greater than the mean.
5. If a z-score for a data value x is negative, then x must also
be negative.
6. In every data set, 68.2% of the data lies within 1 standard
deviation of the mean.
7. Let x be the number of people who attend a baseball
game. The variable x is a discrete variable.
8. The time of day d in the lobby of a bank is measured with
a digital clock. The variable d is a continuous variable.
In Exercises 9 and 10, use the Empirical Rule to answer each
question.
9. In a normal distribution, what percent of the data lies
a. within 2 standard deviations of the mean?
b. more than 1 standard deviation above the mean?
c. between 1 standard deviation below the mean and
2 standard deviations above the mean?
10. In a normal distribution, what percent of the data lies
a. within 3 standard deviations of the mean?
b. less than 2 standard deviations below the mean?
c. between 2 standard deviations below the mean and
3 standard deviations above the mean?
Business and Economics
11. State Sales Tax Rates Use the following frequency
distribution to determine
a. the percent of states in the U.S. that had a 2001 sales tax
of at least 5%.
b. the probability that a state selected at random had a
2001 sales tax rate of at least 3% but less than 5%.
2001 State Sales Tax Rate
Tax rate, r | Number of states | Relative frequency
0% ≤ r < 1% | 5 | 10%
1% ≤ r < 2% | 0 | 0%
2% ≤ r < 3% | 1 | 2%
3% ≤ r < 4% | 0 | 0%
4% ≤ r < 5% | 13 | 26%
5% ≤ r < 6% | 15 | 30%
6% ≤ r < 7% | 13 | 26%
7% ≤ r < 8% | 3 | 6%
Source: Time Almanac 2002
12. Waiting Time The amount of time customers spend
waiting in line at a bank is normally distributed, with
a mean of 3.5 minutes and a standard deviation of
0.75 minute. Find the probability that the time a customer
will spend waiting is
a. at most 2.75 minutes.
b. less than 2 minutes.
13. Weights of Parcels During a particular week, an
overnight delivery company found that the weights of
its parcels were normally distributed, with a mean of
24 ounces and a standard deviation of 6 ounces.
a. What percent of the parcels weighed between
12 ounces and 30 ounces?
b. What percent of the parcels weighed more than
42 ounces?
14. Weights of Boxes of Corn Flakes The weights of the boxes
of corn flakes filled by a machine are normally
distributed, with an average weight of 14.5 ounces and
a standard deviation of 0.5 ounce. What percent of
the boxes
a. weigh less than 14.0 ounces?
b. weigh between 13.5 and 15.0 ounces?
15. Duration of Long Distance Telephone Calls A telephone
company has found that the lengths of its long distance
telephone calls are normally distributed, with a mean of
225 seconds and a standard deviation of 55 seconds.
What percent of its long distance calls last
a. more than 335 seconds?
b. between 170 and 390 seconds?
Life and Health Sciences
16. Median Income for Physicians The 1995 median
income for physicians was $160,000. (Source: AMA
Center for Health Policy Research) The distribution of
these incomes is skewed to the right. Is the mean of these
incomes greater than or less than $160,000?
17. Heights of Women A survey of 1000 women aged 20 to 30
found that their heights are normally distributed, with a
mean of 65 inches and a standard deviation of 2.5 inches.
a. How many of the women have a height that is within
1 standard deviation of the mean?
b. How many of the women have a height that is between 60 inches and 70 inches?
18. Distribution of Data Consider the set of the heights
of all babies born in the United States during a
particular year. Do you think this data set can be closely
approximated by a normal distribution? Explain.
Social Sciences
19. Presidential Inauguration Ages and Ages at Death The
table in Exercise 26 of Section 8.4 lists the U.S.
presidents and their ages at inauguration.
The table in Exercise 27 of Section 8.4 lists the
deceased U.S. presidents as of December 2002, and
their ages at death.
a. Construct a back-to-back stem-and-leaf diagram for
the data in the tables.
b. What patterns, if any, are evident from the
diagram?
20. Average Salaries of Teachers Use the following
frequency distribution to determine
a. the percent of states in the U.S. that paid a 1998–1999
average teacher salary of at least $39,000.
b. the probability that a state selected at random paid a
1998–1999 average teacher salary of at least $36,000
but less than $45,000.
Average Salaries of Public School Teachers, 1998–1999
Average salary, s | Number of states | Relative frequency
$27,000 ≤ s < $30,000 | 3 | 6%
$30,000 ≤ s < $33,000 | 7 | 14%
$33,000 ≤ s < $36,000 | 12 | 24%
$36,000 ≤ s < $39,000 | 9 | 18%
$39,000 ≤ s < $42,000 | 6 | 12%
$42,000 ≤ s < $45,000 | 3 | 6%
$45,000 ≤ s < $48,000 | 5 | 10%
$48,000 ≤ s < $51,000 | 3 | 6%
$51,000 ≤ s < $54,000 | 2 | 4%
Source: www.nea.org.
21. Test Scores The following relative frequency histogram
shows the distribution of test scores for 50 students who
took a history test.
[Figure: Relative frequency histogram of test scores. Horizontal axis: test scores, with class boundaries 28, 36, 44, 52, 60, 68, 76, 84, 92, 100; vertical axis: relative frequency (0% to 25% in steps of 5%).]
a. What percent of the students scored at least 76 on
the test?
b. How many of the students received a score of at least
60 but less than 84?
22. Examination Duration Times At a university, 500 law
students took an examination. One student completed
the exam in 24 minutes. The mode for the completion
time is 50 minutes. The distribution of the times the
students took to complete the exam is skewed to the left.
Is the mean of these times greater than or less than
50 minutes?
23. Intelligence Quotients A psychologist finds that the
intelligence quotients of a group of patients are normally
distributed, with a mean of 104 and a standard deviation
of 26. Find the percent of the patients with IQs
a. above 130.
b. between 130 and 182.
24. Distribution of Data The population of a resort city
consists mostly of wealthy families and families
with low incomes. Do you think the set of family incomes
for this city can be closely approximated by a normal
distribution? Explain.
25. Comparison of Quiz Scores Ryan took two quizzes in his
art class. He scored 45 on the first quiz, for which the
mean was 51.4 and the standard deviation was 9.5. His
score on the second quiz, for which the mean was 53.6
and the standard deviation was 7.2, was 49. In comparison
to his classmates, did Ryan do better on the first or the
second quiz?
26. Comparison of Test Scores Tanya took two tests in her
chemistry class. She scored 85 on the first test, for which
the mean was 79.4 and the standard deviation was 6.4.
Her score on the second test, for which the mean was
70.5 and the standard deviation was 5.3, was 78. In
comparison to her classmates, did Tanya do better on the
first or the second test?
Sports and Recreation
27. Super Bowl Scores The following table lists the
winning and losing scores for all of the Super Bowl
games up to the year 2001.
Super Bowl Results, 1967–2001
35–10  24–7   27–10  42–10  49–26
33–14  16–6   26–21  20–16  27–17
16–7   21–17  27–17  55–10  35–21
23–7   32–14  38–9   20–19  31–24
16–13  27–10  38–16  37–24  34–19
24–3   35–31  46–10  52–17  23–16
14–7   31–19  39–20  30–13  34–7
a. Construct a back-to-back stem-and-leaf diagram for
the winning scores and the losing scores.
b. What patterns, if any, are evident from the back-to-back stem-and-leaf diagram?
28. Ironman Triathlon The following table lists the
winning times for the men’s and women’s Ironman
Triathlon World Championships, held in Kailua-Kona,
Hawaii. (Source: http://www.3athlon.org/races/ironman/hawaii2001/statistik/index.php)
Ironman Triathlon World Championships (Winning times
rounded to the nearest minute)
Men, 1978–2000
11:47  8:29  8:20
11:16  8:34  8:21
 9:25  8:31  8:04
 9:38  8:09  8:33
 9:08  8:28  8:24
 9:06  8:19  8:17
 8:54  8:09  8:21
 8:51  8:08
Women, 1979–2000
12:55  9:35  9:17
11:21  9:01  9:07
12:01  9:01  9:32
10:54  9:14  9:24
10:44  9:08  9:13
10:25  8:55  9:26
10:25  8:58
 9:49  9:20
a. Construct a back-to-back stem-and-leaf diagram
for the data in the tables. Hint: Use the two-digit
“minutes” as your leaves, and insert a comma
between the leaves in each row so that they can be
easily distinguished from each other.
b. What patterns, if any, are evident from the back-to-back stem-and-leaf diagram?
29. Home Run Leaders The following tables list the
numbers of home runs hit by the home run leaders
in the National and the American League from 1971 to
2001.
Home Run Leaders, 1971–2001
National League
48  40  44  36  38  38  52  40  48  48  31
37  40  36  37  37  49  39  47  40  38  35
46  43  40  47  49  70  65  50  73
American League
33  37  32  32  36  32  39  46  45  41  22
39  39  43  40  40  49  42  36  51  44  43
46  40  50  52  56  56  48  47  52
a. Construct a back-to-back stem-and-leaf diagram
for the data in the tables.
b.
30. Race Times The following relative frequency histogram
shows the distribution of times for the 1200 contestants
who finished a race.
[Figure: Relative frequency histogram of race times. Horizontal axis: time, in seconds (50 to 120 in steps of 10); vertical axis: relative frequency (0% to 24% in steps of 4%).]
a. What percent of the contestants finished the race in
less than 80 seconds?
b. How many contestants had a time of at least 60 seconds but less than 80 seconds?
31. Baseball Attendance A baseball franchise finds that the
attendance at its home games is normally distributed,
with a mean of 16,000 and a standard deviation of 4000.
a. What percent of the home games have an attendance
between 8000 and 16,000?
b. What percent of the home games have an attendance
of less than 12,000?
Physical Sciences and Engineering
a. less than 326 pounds?
b. between 302 and 398 pounds?
32. Breaking Points of Ropes The breaking points of a
particular type of rope are normally distributed, with a
mean of 350 pounds and a standard deviation of
24 pounds. What is the probability that a piece of this
rope chosen at random will have a breaking point of
What patterns, if any, are evident from the backto-back stem-and-leaf diagram?
33. Tire Mileage The mileages of WearEver tires are normally
distributed, with a mean of 48,000 miles and a standard
deviation of 6000 miles. What is the probability that the
WearEver tire you purchase will provide a mileage of
a. more than 60,000 miles?
b. between 42,000 and 54,000 miles?
34. Highway Speed of Vehicles A study of 8000 vehicles that
passed by a highway checkpoint found that their speeds
were normally distributed, with a mean of 61 miles per
hour and a standard deviation of 7 miles per hour.
a. How many of the vehicles had a speed of more than
68 miles per hour?
b. How many of the vehicles had a speed of less than
40 miles per hour?
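Several of the exercises above combine z-scores with the Empirical Rule. The following Python sketch shows one way to organize that arithmetic. It is only an illustration: the function names and the sample numbers (mean 500, standard deviation 100) are made up, and the lookup table assumes the usual Empirical Rule values of 68%, 95%, and 99.7% within 1, 2, and 3 standard deviations of the mean.

```python
# Cumulative percent of a normal distribution lying below
# mean + k standard deviations, for whole-number k from -3 to 3.
# These values follow from the Empirical Rule (68%, 95%, 99.7%)
# together with the symmetry of the normal curve.
PERCENT_BELOW = {-3: 0.15, -2: 2.5, -1: 16.0, 0: 50.0, 1: 84.0, 2: 97.5, 3: 99.85}


def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


def percent_below(x, mean, std_dev):
    """Empirical Rule percent of data below x.

    x must lie a whole number of standard deviations (at most 3) from the mean.
    """
    z = z_score(x, mean, std_dev)
    if not float(z).is_integer() or abs(z) > 3:
        raise ValueError("Empirical Rule lookup needs a whole-number z from -3 to 3")
    return PERCENT_BELOW[int(z)]


def percent_between(low, high, mean, std_dev):
    """Empirical Rule percent of data between low and high."""
    return round(percent_below(high, mean, std_dev) - percent_below(low, mean, std_dev), 2)


def percent_above(x, mean, std_dev):
    """Empirical Rule percent of data above x."""
    return round(100 - percent_below(x, mean, std_dev), 2)


# Hypothetical data set: mean 500, standard deviation 100.
print(z_score(650, 500, 100))               # 1.5
print(percent_between(400, 600, 500, 100))  # 68.0
print(percent_above(700, 500, 100))         # 2.5
```

When an exercise asks how many data values fall in a range rather than what percent, the percent returned here would be multiplied by the size of the data set and divided by 100.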
Explorations
Chebyshev’s Theorem The following well-known theorem is
called Chebyshev’s theorem. It is named after the Russian
mathematician Pafnuty Lvovich Chebyshev (1821–1894).
Chebyshev’s theorem states that a mathematical
relationship exists between the spread of data and the
standard deviation of the data. A remarkable property of
Chebyshev’s theorem is that it is valid for any set of data.
This is unlike the Empirical Rule, which applies only to
sets of data that have normal distributions.
Chebyshev’s Theorem
The proportion or percentage of any data set that lies
within z standard deviations of the mean, where z is any
positive number greater than 1, is at least
$1 - \frac{1}{z^2}$
Applying Chebyshev’s theorem with z = 2 yields
$1 - \frac{1}{z^2} = 1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4}$
This result of $\frac{3}{4}$ = 75% means that at least 75% of the data
in any data set must lie within 2 standard deviations of the
mean of the data set.
1. Use Chebyshev’s theorem to determine the minimum
percentage of data (to the nearest percent) in any data set
that must lie within
a. 1.2 standard deviations of the mean.
b. 2.5 standard deviations of the mean.
c. 3.1 standard deviations of the mean.
2. A new automobile dealership found that during the
month of March, the mean selling price of its cars was
$29,200, with a standard deviation of $5100. Use
Chebyshev’s theorem to determine the minimum percentage (to the nearest percent) of the dealership’s cars
that have a selling price within
a. 1.5 standard deviations of the mean, that is, between
$21,550 and $36,850.
b. 2.8 standard deviations of the mean, that is, between
$14,920 and $43,480.
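The bound in Chebyshev’s theorem, 1 − 1/z², is simple to evaluate directly. The following Python sketch reproduces the z = 2 calculation from this exploration. The function name is chosen for illustration only, and the second value of z (4) is an arbitrary example rather than one taken from the exercises.

```python
def chebyshev_min_proportion(z):
    """Smallest proportion of any data set that must lie within
    z standard deviations of the mean, by Chebyshev's theorem."""
    if z <= 1:
        # The theorem as stated in the text requires z greater than 1.
        raise ValueError("z must be greater than 1")
    return 1 - 1 / z**2


# z = 2 is the case worked in the text: 1 - 1/4 = 3/4.
print(chebyshev_min_proportion(2))  # 0.75
# An arbitrary second example: 1 - 1/16.
print(chebyshev_min_proportion(4))  # 0.9375
```

Multiplying the result by 100 and rounding gives the minimum percentage to the nearest percent. Because the bound holds for any data set, it is usually lower than the percentage the Empirical Rule gives for normally distributed data.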