Download Describing Location in a Distribution

Document related concepts

Central limit theorem wikipedia , lookup

Transcript
Describing Location in a Distribution
I
~
j
2.1
Measures of Relative Standing and Density
Curves
2.2
Normal Distributions
Chapter Review
C A S E:
ST' UJ DY
ThenewSAT
For many years, high school students across the United States have taken
the SAT as part of the college admissions process . The SAT has undergone
regular changes over time, but until recently, test takers have always
received two scores : SAT Verbal and SAT Math . Motivated by the findings
of a Blue Ribbon Panel in 1990 and threats by the mammoth University of
California system to discontinue use of the SAT in its admissions process,
the College Board decided to add a Writing section to the test. This section
would include multiple-choice questions on grammar and usage from the old
SAT II Writing Subject Test, as well as a t imed, argumentative essay. At the
same time, the existing Verbal and Math sections of the test underwent
major content revisions. Analogies were removed from the Verbal section,
which was renamed Critical Reading . In the Math section, quantitative comparison questions were eliminated, and new questions testing advanced
algebra content were added .
In March 2005, the College Board administered the new SAT for the first
time. Students, parents, teachers, high school counselors , and college
admissions officers waited anxiously to hear about the results from this new
exam. Would the scores on the new SAT be comparable to those from previqus years? How would students perform on the new Writing section (and
particularly on the timed essay)? In the past, boys had earned higher average
scores than girls on both the Verbal and Math sections of the SAT. Would
similar gender differences emerge on the new SAT?
By the end of this chapter, you will have developed the statistical tools
you need to answer important questions about the new SAT.
113
-A~
'
I·
•
't
114
CHAPTER 2
Describing Location in a Distribution
Activity 2A
A fine-grained distribution
Materials: Sheet ofgrid paper; salt; can of spray paint; paint easel; newspapers
1. Place the grid paper on the easel with a horizontal fold as shown, at
about a 45o angle to the horizontal. Provide a "lip" at the bottom to
catch the salt. Place newspaper behind the grid paper and extending
out on all sides so you will not get paint on the easel.
2.
Pour a strean1 of salt slowly from a point near the n1iddle of the top edge
of the grid paper. The grains of salt will hop and skip their way down the
grid as they collide with one another and bounce left and right. They will
accun1ulate at the bottom, piled against the grid, with the s1nooth profile
of a bell-shaped curve, known as a Nonnal distribution. We will learn
about the Normal distribution in this chapter.
3.
Now carefully spray the grid-salt and all-with paint. Then discard the
salt. You should be able to easily measure the height of the curve at different places by sin1ply counting lines on the grid, or you could approxitnate areas by counting small squares or portions of squares on the grid.
How could you get a tall, narrow curve? How could you get a short,
broad curve? What factors might affect the height and breadth of the curve?
Frmn the members of the class, collect a set of Normal curves that differ
from one another.
Activity 28
Roll a Normal distribution
Materials: Several marbles, all the same size; two metersticks for a "ramp"; a
ruled sheet of paper; a flat table about 4 feet long; carbon paper; Scotch Tape
or masking tape
1. At one end of the table prop up the two metersticks in a V shape to provide a ran1p for the marbles to roll down. Each marble will roll down
t
Activity 28
the chute, continue across the table, and fall off the table to the floor
below. Make sure that the ramp is secure and that the tabletop does not
have any grooves or obstructions.
2.
Roll a n1arble down the ramp several tirnes to get a good idea of the area
of the floor where it will fall.
3.
Center the ruled sheet of paper (see Figure 2.1) over this area, face up,
with the bottmn edge toward the table and parallel to the edge of the
table. The ruled lines should go in the same direction as the marble's
path. Tape the sheet securely to the floor. Place the sheet of carbon
paper, carbon side down, over the ruled sheet.
4.
Roll the rnarble for a class total of 200 tirnes. The spots where it hits the
floor will be recorded on the ruled paper as black dots. When the marble hits the floor, it will probably bounce, so try to catch it in midair
after the impact so that you don't get any extra marks. After the first 100
rolls, replace the sheet of paper. This will make it easier for you to
count the spots. Make sure that the second sheet is in exactly the same
position as the first one.
5. When the rnarble has been rolled 200 tin1es, make a histograrn of the
distribution of the points as follows. First, count the number of dots
in each colurnn. Then graph these numbers by drawing horizontal
lines in the colun1ns at the appropriate levels. Use the scale on the
left-hand side of the sheet.
Figure 2. 1
Example of ruled sheet for Activity 28.
-I-
-
30 -I-
-
-I-
25 -I-I-
20
-I-
15
-
10
-
5
--
-
-
-
-
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
CHAPTER 2
Describing Location in a Distribution
Introduction
Suppose that Jenny earns an 86 (out of 100) on her next statistics test. Should she
be satisfied or disappointed with her perfonnance? That depends on how her score
compares to those of the other students who took the test. If 86 is the highest score,
Jenny might be very pleased. Perhaps her teacher will "curve" the grades so that
Jenny's 86 becmnes an "A." But if Jenny's 86 falls below the a middle" of this set of
test scores, she may not be so happy.
Section 2.1 focuses on describing the location of an individual within a distribution. We begin by using raw data to calculate two useful measures oflocation.
One is based on the mean and standard deviation. The other is connected to the
n1edian and quartiles.
Smnetitnes it is helpful to use graphical models called density curves to
describe the location of individuals within a distribution, rather than relying on
actual data values. This is especially true when data fall in a bell-shaped pattern
called a Nonnal distribution. Section 2.2 exan1ines the properties of Norn1al distributions and shows you how to perforn1 useful calculations with them.
2.1
Measures of Relative Standing and Density Curves
Here are the scores of all 2 5 students in Mr. Pryor's statistics class on their first test:
79
77
6
7
7
2334
7
8
8
9
Bm899
00123334
81
83
80
86
77
73
90
79
83
85
74
83
93
89
78
84
80
82
75
67
77
72
73
The bold score is Jenny's 86. How did she perforn1 on this test relative to her
class1nates?
The stem plot in the margin displays this distribution of test scores. Notice
that the distribution is roughly symn1etric with no apparent outliers. Figure 2.2
provides Minitab output that includes summary statistics for the test score data.
569
03
Figure 2.2
Minitab output showing descriptive statistics for the test scores of
Mr. Pryor's statistics class.
Mini tab
Descriptive Statistics: Test 1 scores
Variable
Test 1 scores
N
Variable
Test 1 scores
Minimum
67.00
25
Mean
80.00
Maximum
93.00
Median
80.00
Q1
76.00
TrMean
80.00
Q3
83.50
StDev
6.07
SE Mean
1. 21
2.1 Measures of Relative Standing and Density Curves
Where does Jenny's 86 fall relative to the center of this distribution? Since the
mean and n1edian are both 80, we can say that Jenny's result is "above average."
But how much above average is it?
Measuring Relative Standing: z-Scores
standardizing
One way to describe Jenny's position within the distribution of test scores is to
tell how 1nany standard deviations above or below the mean her score is. Since
the n1ean is 80 and the standard deviation is about 6, Jenny's score of 86 is about
one standard deviation above the n1ean. Converting scores like this from original values to standard deviation units is known as standardizing. To standardize
a value, subtract the mean of the distribution and then divide by the standard
deviation.
Standardized Values and z-Scores
If xis an observation from a distribution that has known mean and standard deviation, the standardized value of x is
x- mean
z = standard deviation
A standardized value is often called a z-score.
A z-score tells us how n1any standard deviations away from the mean the original observation falls, and in which direction. Observations larger than the mean are
positive when standardized, and observations smaller than the mean are negative.
Example2.1
~~
Pryorsfirsttest
Standardizing scores
Jenny's score on the test was x = 86. Her standardized test score is
z=
80 = 86 - 80 = 0.99
6.07
6.07
X -
Katie earned the highest score in the class, 93. Her corresponding z-score is
z=
93- 80 = 2.14
6.07
In other words, Katie's result is 2.14 standard deviations above the mean score for this test.
Norman got a 72 on this test. His standardized score is
z=
72 - 80 = - 1.32
6.07
Norman's score is 1.32 standard deviations below the class mean of 80.
118
CHAPTER 2
Describing Location in a Distribution
We can also use z-scores to compare the relative standing of individuals in different distributions, as the following exa1nple illustrates.
Example2.2
Statistics or chemistry: which is easier?
Using z-scores for comparison
The day after receiving her statistics test result of 86 from Mr. Pryor, Jenny earned an 82
on Mr. Goldstone's chemistry test. At first, she was disappointed. Then Mr. Goldstone told
the class that the distribution of scores was fairly symmetric with a mean of 76 and a standard deviation of 4. Jenny quickly calculated her z-score:
z=
82 - 76 = 1.5
4
Her 82 in chemistry was 1. 5 standard deviations above the mean score for the class. Since
she only scored 1 standard deviation above the mean on the statistics test, Jenny actually
did better in chemistry relative to the class.
We often standardize observations from symmetric distributions to express
them on a common scale. We might, for example, compare the heights of two
children of different ages by calculating their z-scores. The standardized heights
tell us where each child stands in the distribution for his or her age group.
Exercises
2.1 SAT versus ACT Eleanor scores 680 on the mathematics part of the SAT. The distribution of SAT scores in a reference population is symmetric and single-peaked with
mean 500 and standard deviation 100. Gerald takes the American College Testing
(ACT) mathematics test and scores 27. ACT scores also follow a symmetric, singlepeaked distribution but with mean 18 and standard deviation 6. Find the standardized
scores for both students. Assuming that both tests measure the same kind of ability, who
has the higher score?
2.2 Comparing batting averages Three landmarks of baseball achievement are Ty Cobb's
batting average of .420 in 1911, Ted Williams's .406 in 1941, and George Brett's .390 in
1980. These batting averages cannot be compared directly because the distribution of
major league batting averages has changed over the years. The distributions are quite symmetric, except for outliers such as Cobb, Williams, and Brett. While the mean batting
average has been held roughly constant by rule changes and the balance between hitting
and pitching, the standard deviation has dropped over time. Here are the facts:
Decade
Mean
Std. dev.
1910s
1940s
1970s
.266
.267
.261
.0371
.0326
.0317
Compute the standardized batting averages for Cobb, Williams, and Brett to compare how
far each stood above his peers. 1
2.1 Measures of Relative Standing and Density Curves
2.3 Measuring bone density Individuals with low bone density (osteoporosis) have a high
risk of broken bones (fractures). Physicians who are concerned about low bone density in
patients can refer them for specialized testing. Currently, the most common method for
testing bone density is dual-energy X-ray absorptiometry (DXA). A patient who undergoes
a DXA test usually gets bone density results in grams per square centimeter (g/cm 2) and in
standardized units.
Judy, who is 25 years old, has her bone density measured using DXA. Her results indicate a bone density in the hip of948 g/cm 2 and a standardized score of z = -1.45. In the
reference population 2 of 25-year-old women like Judy, the mean bone density in the hip
is 956 g/cm 2 .
(a) Judy has not taken a statistics class in a few years. Explain to her in simple language
what the standardized score tells her about her bone density.
(b) Use the information provided to calculate the standard deviation of bone density in the
reference population.
2.4 Comparing bone density Refer to the previous exercise. One of Judy's friends, Mary,
has the bone density in her hip measured using DXA. Mary is 35 years old. Her bone density is also reported as 948 g/cm 2, but her standardized score is z = 0.50. The mean bone
density in the hip for the reference population of 35-year-old women is 944 g/cm 2 .
(a) Whose bones are healthier: Judy's or Mary's? Justify your answer.
(b) Calculate the standard deviation of the bone density in Mary's reference population.
How does this compare with your answer to Exercise 2.3(b)? Are you surprised?
Measuring Relative Standing: Percentiles
We can also describe Jenny's performance on her first statistics test using percentiles. In Chapter 1, we defined the pth percentile of a distribution as the value
with p percent of the observations less than or equal to it. Using the stemplot on
page 116, we can see that Jenny's 86 is the 22nd highest score on this test, counting up from the lowest score. Since 22 of the 25 observations (88%) are at or below
her score, Jenny scored at the 88th percentile.
Example2.3
Mr. Pryor's first test, continued
Finding percentiles
For Norman, who got a 72, only 2 of the 25 test scores in the class are at or below his. Norman's percentile is computed as follows: 2/25=0.08, or 8%. So Norman scored at the 8th
percentile on this test.
Katie's 93 puts her by definition at the lOOth percentile, since 25 out of 25 test scores
fall at or below her result. Note that some people define the pth percentile of a distribution as the value with p percent of observations below it. Using this alternative definition of percentile, it is never possible for an individual to fall at the 1OOth percentile.
That is why you never see a standardized test score reported above the 99th percentile.
120
CHAPTER 2
Describing Location in a Distribution
Two students scored an 80 on Mr. Pryor's first test. By our definition of percentile, 14 of
the 25 scores in the class were at or below 80, so these two students are at the 56th percentile.
If we used the alternative definition of percentile, these two students would fall at the 48th
percentile ( 12 of 25 scores were below 80). Of course, since 80 is the median score, we prefer to think of it as being the 50th percentile. Calculating percentiles is not an exact science!
~·
'1 can't speak for everyone in the top one percent, but I'm fine."
You may have wondered by now whether there is a sitnple way to convert a
z-score to a percentile. The percent of observations falling at or below a particular
z-score depends on the shape of the distribution. An observation that is equal to
the mean has a z-score of 0. In a heavily left-skewed distribution, where the tnean
will be less than the median, this observation will be somewhere below the 50th
percentile (the median).
There is an interesting result that describes the percent of observations in any
distribution that fall within a specified number of standard deviations of the mean.
It is known as Chebyshev's inequality.
Chebyshev's Inequality
In any distribution, the percent of observations falling within k standard
deviations of the mean is at least (I 00) (I If k
(I 00)
=
kl' )-
2, for example, Chebyshev's inequality tells us that at least
(I - kl') =
7 5% of the o bserva ti ons in any d istri bu tion must fall within
2 standard deviations of the tnean. The following table displays the percent of
observations within k standard deviations of the mean for several values of k.
k
2
1
12
at least (I 00) ( - k ) of observations
within k standard deviations of mean
(100)( I -
/z) = 0%
(100)(1 -
2;) = 75%
2.1 Measures of Relative Standing and Density Curves
3~) 89%
(100)( I - ~) = 93.75%
4
(IOO)(I - ~) = 96%
5
3 (100)( I -
4
5
Going with 85% of the
flow According to the Los
Angeles Times, speed
limits on California
highways are set at the
85th percentile of vehicle
speeds on those stretches
of road. The idea,
apparently, is that 85% of
people behave in a safe
and sane manner. The
other 15% are reckless
individuals who go too
fast. To which group do
you belong?
=
So it is fairly unusual to find an observation that is more than 5 standard deviations
away frmn the mean in any distribution.
Chebyshev's inequality gives us some insight into how observations are distributed within distributions. It does not help us detennine the percentile corresponding to a given z-score, however. For that, we need 111ore advanced statistical
models known as density curves.
Exercises
2.5 Unemployment in the states, I Each month the Bureau of Labor Statistics announces
the unemployment rate for the previous month. Unemployment rates are economically
important and politically sensitive. Unemployment may vary greatly among the states
because types of work are unevenly distributed across the country. Table 2.1 presents the
unemployment rates for each of the 50 states in May 2005.
Table 2.1
State
Alabama
Alaska
Arizona
Arkansas
California
Colorado
Connecticut
Delaware
Florida
Georgia
Hawaii
Idaho
Illinois
Indiana
Iowa
Kansas
Kentucky
Unemployment rates by state, May 2005
Percent
4.4
6.4
4.8
5.0
5.3
5.3
5.3
4.1
4.0
5.2
2.7
3.9
5.8
4.8
4.8
5.3
5.7
State
Louisiana
Maine
Maryland
Massachusetts
Michigan
Minnesota
Mississippi
Missouri
Montana
Nebraska
Nevada
New Hampshire
New Jersey
New Mexico
New York
North Carolina
North Dakota
Source: Bureau of Labor Statistics Web site, www.bls.gov.
Percent
5.4
5.0
4.2
4.8
7.1
4.3
7.1
5.6
4.5
4.0
4.0
3.6
3.9
6.0
5.0
5.1
3.5
State
Ohio
Oklahoma
Oregon
Pennsylvania
Rhode Island
South Carolina
South Dakota
Tennessee
Texas
Utah
Vermont
Virginia
Washington
West Virginia
Wisconsin
Wyoming
Percent
6.1
4.5
6.5
4.8
4.5
6.3
4.0
6.2
5.5
4.9
3.1
3.6
5.7
4.5
4.7
4.0
122
CHAPTER 2
Describing Location in a Distribution
(a) Make a histogram of these data. Be sure to label and scale your axes.
(b) Calculate numerical summaries for this data set. Describe the shape, center, and
spread of the distribution of unemployment rates.
(c) Determine the percentile for Illinois. Explain in simple terms what this says about the
unemployment rate in Illinois relative to the other states.
(d) Which state is at the 30th percentile? Calculate the z-score for this state.
(e) Compare the percent of state unemployment rates that fall within 1, 2, 3, 4, and 5 standard deviations of the mean with the percents guaranteed by Chebyshev's inequality (refer
to the table on page 121).
2.6 Unemployment in the states, II Refer to the previous exercise. The December 2000
unemployment rates for the 50 states had a symmetric, single-peaked distribution with
a mean of 3.47% and a standard deviation of about 1%. The unemployment rate for
Illinois that month was 4.5%. There were 42 states with lower unemployment rates than
Illinois.
(a) Write a sentence comparing the actual rates of unemployment in Illinois in December 2000 and May 2005.
(b) Compare the z-scores for the Illinois unemployment rate in these same two months in
a sentence or two.
(c) Compare the percentiles for the Illinois unemployment rate in these same two months
in a sentence or two.
2.7 PSAT scores In October 2004, about 1.4 million college-bound high school juniors
took the Preliminary SAT (PSAT). The mean score on the Critical Reading test was 46.9,
and the standard deviation was 10.9. Nationally, 5.2 % of test takers earned a score of 65 or
higher on the Critical Reading test's 20 to 80 scale. 3
Scott was one of 50 junior boys to take the PSAT at his school. He scored 64 on the
Critical Reading test. This placed Scott at the 68th percentile within the group of boys.
Looking at all 50 boys' Critical Reading scores, the mean was 58.2, and the standard deviation was 9.4.
(a) Write a sentence or two comparing Scott's percentile among the national group of test
takers and among the 50 boys at his school.
(b) Calculate and compare Scott's z-score among these same two groups of test takers.
(c) How well did the boys at Scott's school perform on the PSAT? Give appropriate evidence to support your answer.
(d) What does Chebyshev's inequality tell you about the performance of students nationally? At Scott's school?
2.8 Blood pressure Larry came home very excited after a VISit to his doctor. He
announced proudly to his wife, "My doctor says my blood pressure is at the 90th percentile
among men like me. That means I'm better off than about 90% of similar men." How
should his wife, who is a statistician, respond to Larry's statement?
r
2.1 Measures of Relative Standing and Density Curves
Density Curves
In Chapter l, we developed a kit of graphical and numerical tools that we call a
"Data Analysis Toolbox" for describing distributions. In this chapter, we have
explored some methods for measuring the relative standing of individuals within
distributions. We now have a clear strategy for exploring data fron1 a single quantitative variable.
l. Always plot your data: make a graph, usually a histogratn or a stemplot.
2. Look for the overall pattern (shape, center, spread) and for striking deviations
such as outliers.
3. Calculate a nun1erical sumtnary to briefly describe center and spread.
Here is one n1ore step to add to this strategy:
4. Sometimes the overall pattern of a large nmnber of observations is so regular
that we can describe it by a s1nooth curve. Doing so can help us describe the
location of individual observations within a distribution.
Figure 2. 3 is a histogram of the scores of all 94 7 seventh-grade students in
Gary, Indiana, on the vocabulary part of the Iowa Test of Basic Skills. 4 Scores on
this national test have a very regular distribution. The histogram is symmetric,
and both tails fall off smoothly frmn a single center peak. There are no large gaps
or obvious outliers. The s1nooth curve drawn through the tops of the histogram
bars in Figure 2. 3 is a good description of the overall pattern of the data.
Figure2.3
Histogram of the vocabulary scores of all seventh-grade students
in Gary, Indiana.
2
4
6
8
Vocabulary scores
10
12
CHAPTER 2
124
mathematical
model
Example2.4
Describing Location in a Distribution
The curve is a mathematical model for the distribution. A n1athematical
model is an idealized description. It gives a compact picture of the overall pattern of the data but ignores minor irregularities as well as any outliers.
We will see that it is easier to work with the smooth curve in Figure 2.3 than
with the histogram. The reason is that the histogran1 depends on our choice of
classes, while with a little care we can use a curve that does not depend on any
choices we make. Here's how we do it.
Seventh-grade vocabulary scores
From histogran1 to density curve
Our eyes respond to the areas of the bars in a histogram. The bar areas represent proportions
of the observations. Figure 2.4(a) is a copy of Figure 2.3 with the leftmost bars shaded blue.
The area of the blue bars in Figure 2.4(a) represents the students with vocabulary scores of 6.0
or lower. There are 287 such students, who make up the proportion 287/947 = 0.303 of all
Gary seventh-graders. In other words, a score of 6.0 corresponds to about the 30th percentile.
Figure 2.4
(a) The proportion of scores less than or equal to 6.0 from the histogram is 0.303. (b) The
proportion of scores less than or equal to 6.0 from the density curve is 0.293.
The blue bars
represent scores
The blue area
represents scores
~6.0.
~6.0.
4
6
Vocabulary scores
10
12
2
4
6
8
10
12
Vocabulary scores
Now concentrate on the curve drawn through the bars. In Figur~ 2.4(b ), the area
under the curve to the left of 6.0 is shaded. Adjust the scale of the graph so that the total
area under the curve is exactly 1. This area represents the proportion l, that is, all the
observations. Areas under the curve then represent proportions of the observations. The
curve is now a density curve. The blue area under the density curve in Figure 2.4(b) represents the proportion of students with score 6.0 or lower. T'his area is 0.293, only 0.010
away from the histogram result. So our estimate based on the density curve is that a score
of 6.0 falls at about the 29th percentile. You can see that areas under the density curve give
quite good approximations of areas given by the histogram.
t
2.1 Measures of Relative Standing and Density Curves
Density Curve
A density curve is a curve that
•
is always on or above the horizontal axis, and
•
has area exactly 1 underneath it.
A density curve describes the overall pattern of a distribution. The area under
the curve and above any interval of values on the horizontal axis is the proportion of all observations that fall in that interval.
Normal curve
Example2.5
The density curve in Figures 2. 3 and 2.4 is a Normal curve. Density curves, like
distributions, come in many shapes. In later chapters, we will encounter important density curves that are skewed to the left or right and curves that may look like
Normal curves but are not.
A left-skewed distribution
Density curves and areas
Figure 2.5 shows the density curve for a distribution that is skewed to the left. The
smooth curve makes the overall shape of the distribution clearly visible. The shaded area
under the curve covers the range of values from 7 to 8. This area is 0.12. This means that
the proportion 0.12 ( 12 %) of all observations from this distribution have values between
7 and 8.
Figure 2.5
The shaded area under this density curve is the proportion of
observations taking values between 7 and B.
Total area under
7
8
CHAPTER 2
Describing Location in a Distribution
Figure 2.6 shows two density curves: a symmetric, Normal density curve and
a right-skewed curve. The mean and median of each distribution, which will be
discussed in more detail shortly, have been 1narked on the horizontal axis.
Figure 2.6
(a) The median and mean of a symmetric density curve. (b) The median and mean of a
right-skewed density curve.
The long right tail pulls
the mean to the right.
t
Median and mean
llean
Median
A density curve of the appropriate shape is often an adequate description of
the overall pattern of a distribution. Outliers, which are deviations from the overall pattern, are not described by the curve. Of course, no set of real data is exactly
described by a density curve. The curve is an approximation that is easy to use
and accurate enough for practical use.
The Median and Mean of a Density Curve
Our n1easures of center and spread apply to density curves as well as to actual sets
of observations. The 1nedian and quartiles are easy. Areas under a density curve
represent proportions of the total number of observations. The 1nedian is the point
with half the observations on either side. So the median of a density curve is the
"equal-areas point," the point with half the area under the curve to its left and
the remaining half of the area to its right. The quartiles divide the area under the
curve into quarters. One-fourth of the area under the curve is to the left of the first
quartile, and three-fourths of the area is to the left of the third quartile. You can
roughly locate the median and quartiles of any density curve by eye by dividing
the area under the curve into four equal parts.
Because density curves are idealized patterns, a symmetric density curve is
exactly syn1n1etric. The median of a symmetric density curve is therefore at its
center. Figure 2.6(a) shows the median of a symn1etric curve. It isn't so easy to spot
the equal-areas point on a skewed curve. There are mathematical ways of finding
the median for any density curve. We did that to 111ark the median on the skewed
curve in Figure 2.6(b ).
What about the mean? The mean of a set of observations is their arithmetic
average. If we think of the observations as weights strung out along a thin rod,
2.1 Measures of Relative Standing and Density Curves
127
the mean is the point at which the rod would balance. This fact is also true of
density curves. The mean of a density curve is the "balance point," the point at
which the curve would balance if n1ade of solid material. Figure 2. 7 illustrates
this fact about the 1nean. A symn1etric curve balances at its center because the
two sides are identical. The mean and median of a symmetric density curve are
equal, as in Figure 2.6(a). We know that the mean of a skewed distribution is
pulled toward the long tail. Figure 2.6(b) shows how the mean of a skewed density curve is pulled toward the long tail 111ore than is the median. It's hard to
locate the balance point by eye on a skewed curve. There are mathematical ways
of calculating the n1ean for any density curve, so we are able to mark the mean
as well as the n1edian in Figure 2.6(b ).
Figure 2.7
The mean is the balance point of a density curve.
~
Median and Mean of a Density Curve
The median of a density curve is the "equal-areas point," the point that divides
the area under the curve in half.
The mean of a density curve is the "balance point," at which the curve would
balance if made of solid material.
The 1nedian and .m ean are the same for a symmetric density curve. They both
lie at the center of the curve. The mean of a skewed curve is pulled away from
the 1nedian in the direction of the long tail.
mean 1-L
standilrd
deviation u
We can roughly locate the mean, median, and quartiles of any density curve
by eye. This is not true of the standard deviation. When necessary, we can once
again call on more advanced mathematics to learn the value of the standard deviation. The study of mathematical methods for doing calculations with density
curves is part of theoretical statistics. Though we are concentrating on statistical
practice, we often make use of the results of 1nathematical study.
Because a density curve is an idealized description of the distribution of
data, we need to distinguish between the mean and standard deviation of the
density curve and the mean x and standard deviations computed from the actual
observations. The usual notation for the mean of a density curve is JL (the Greek
letter mu). We write the standard deviation of a density curve as u (the Greek
letter sigma).
CHAPTER 2
128
Describing Location in a Distribution
Exercises
2.9 Density curves Sketch density curves that might describe distributions with the following shapes:
(a) Symmetric, but with two peaks
(b) Single peak and skewed to the left
uniform
distribution
2.10 A uniform distribution, I Figure 2.8 displays the density curve of a uniform distribution. The curve takes the constant value l over the interval from 0 to l and is 0 outside the
range of values. This means that data described by this distribution take values that are uniformly spread between 0 and l.
Figure 2.8
The density curve of a uniform distribution, for Exercise 2.10.
"''
} height = 1
J
0
Use areas under this density curve to answer the following questions.
(a) Why is the total area under this curve equal to l?
(b) What percent of the observations lie above 0.8?
(c) What percent of the observations lie below 0.6?
(d) What percent of the observations lie between 0.25 and 0.75?
(e) What is the mean J.-t for this distribution?
2.11 A uniform distribution, II Refer to the previous exercise. Can you construct a modified boxplot for the uniform distribution? If so, do it. If not, explain why not.
2.12 Finding means and medians Figure 2.9 displays three density curves, each with three
points indicated. At which of these points on each curve do the mean and the median fall?
Figure 2.9
Three density curves, for Exercise 2.12.
ABC
ABC
(a)
(b)
~
AB C
(c)
2.1 Measures of Relative Standing and Density Curves
129
2.13 A weird density curve A line segment can be considered a density "curve," as shown
in Exercise 2.1 0. A "broken-line" graph can also be considered a density curve. Figure 2.10
shows such a density curve.
Figure 2 . 10
An unusual "broken-line" density curve, for Exercise 2.13.
y
2~----r----+-----r----r
X
0
0.2
0.4
0.6
0.8
(a) Verify that the graph in Figure 2.10 is a valid density curve.
For each of the following, use areas under this density curve to find the proportion of
observations within the given interval:
(b) 0. 6 ~ X ~ 0. 8
(c)O~x~o.4
(d) 0 ~X~ 0.2
(e) The median of this density curve is a point between X= 0.2 and X= 0.4. Explain why.
outcomes
simulation
2.14 Roll a distribution In this exercise you will pretend to roll a regular, six-sided die 120
times. Each time you roll the die, you will record the result. The numbers 1, 2, 3, 4, 5, and
6 are called the outcomes of this chance phenomenon.
In 120 rolls, how many of each number would you expect to roll? The TI-83/84 and
TI-89 are useful devices for imitating chance behavior, especially in situations like this that
involve performing many repetitions. Because you are only pretending to roll the die
repeatedly, we call this process a simulation. There will be a more formal treatment of
simulations in Chapter 6.
•
Begin by clearing L 1/list1 on your calculator.
•
Use your calculator's random integer generator to generate 120 random whole numbers between 1 and 6 (inclusive), and then store these numbers in L 1/listl.
•
On the TI-83/84, press
IFJII,choose PRB, then 5: Randint. On the TI-89, press
llt!iffllel~[i] and choose randint ....
CHAPTER 2
•
Describing Location in a Distribution
On the TI-83/84, complete the command Randint ( 1 6 12 0) ~l. On the
TI-89, complete the command tis tat. randint ( 1 6 12 0 ~ list1.
I
I
I
•
•
•
•
I
Set the viewing window as follows: X[ l, 7] 1 by Y[- 5, 2 5]5.
Specify a histogram using the data in 1 1/listl .
Then graph. Are you surprised?
Repeat the simulation several times. You can recall and reuse the previous command
by pressing flfiJIM~IIMU. It's a good habit to clear L1/listl before you roll the die again.
(a) In theory, each number should come up 20 times. But in practice, there is chance variation, so the bars in the histogram will probably have different heights. Theoretically, what
should the distribution look like?
(b) How is this distribution similar to the uniform distribution in Exercise 2.1 0? How is it
different?
Section 2.1 Summa!}
Two ways of describing an individual's location within a distribution are
z-scores and percentiles. To standardize any observation x, subtract the mean
of the distribution and then divide by the standard deviation. The resulting z-score
x- mean
standard deviation
z=------says how many standard deviations x lies fron1 the distribution n1ean. An observation's percentile is the percent of the distribution that is at or below the value of
that observation. We can also use z-scores and percentiles to compare the relative
standing of individuals in different distributions.
Chebyshev's inequality gives us a useful rule of thumb for the percent of
observations in any distribution that are within a specified number of standard
deviations of the mean.
We can sometin1es describe the overall pattern of a distribution by a density
curve. A density curve always remains on or above the horizontal axis and has total
area l underneath it. An area under a density curve gives the proportion of observations that fall in a range of values. A density curve is an idealized description of
the overall pattern of a distribution that s1nooths out the irregularities in the actual
data. Write the mean of a density curve as IL and the standard deviation of a density curve as O" to distinguish then1 from the mean x and the standard deviation s
of the actual data.
The mean, the n1edian, and the quartiles of a density curve can be located by
eye. The mean IL is the balance point of the curve. The median divides the area
under the curve in half. The quartiles, with the 1nedian, divide the area under the
curve into quarters. The standard deviation O" cannot be located by eye on most
density curves.
The mean and median are equal for symtnetric density curves. The mean of
a skewed curve is located farther toward the long tail than is the median.
2.1 Measures of Relative Standing and Density Curves
Sectio11 2.1 Exercises
2.15 Men's and women's heights The heights of women aged 20 to 29 are symmetrically
distributed with a mean of 64 inches and standard deviation of 2.7 inches. Men the same
age have mean height 69.3 inches with a standard deviation of 2.8 inches. What are the zscores for a woman 6 feet tall and a man 6 feet tall? Describe in simple language what
information the z-scores give that the actual heights do not.
2.16 Baseball salaries, I Here are the salaries for each member of the Boston Red Sox
baseball team on the opening day of the 2005 season: 5
Player
Ramirez, Manny
Schilling, Curt
Damon, Johnny
Renteria, Edgar
Varitek, Jason
Foulke, Keith
Nixon, Trot
Clement, Matt
Ortiz, David
Wakefield, Tim
Wells, David
Millar, Kevin
Payton, Jay
Embree, Alan
Bellhorn, Mark
Timlin, Mike
Mueller, Bill
Arroyo, Bronson
Miller, Wade
Mirabelli, Doug
Halama, John
Mantei, Matt
Vazquez, Ramon
Myers, Mike
McCarty, David
Youkilis, Kevin
Neal, Blaine
Stern, Adam
Salary
$ 22,000,000
14,500,000
8,250,000
8,000,000
8,000,000
7,500,000
7,500,000
6,500,000
5,250,000
4,670,000
4,075,000
3,500,000
3,500,000
3,000,000
2,750,000
2,750,000
2,500,000
1,850,000
1,500,000
1,500,000
850,000
750,000
. 700,000
600,000
550,000
323,125
321,000
316,000
(a) Construct a histogram of the distribution of salaries.
(b) Calculate numerical summaries for this distribution of salaries. Describe the shape,
center, and spread of the distribution. Are there any outliers?
(c) Describe David McCarty's position within the distribution using both his z-score and
his percentile.
CHAPTER 2
Describing Location in a Distribution
(d) According to our definition of percentile, who are the players at the 25th and 75th percentiles of this distribution, and what are their salaries? Do these values match the first and
third quartiles you calculated in part (b)? Why or why not?
2.17 Baseball salaries, II Refer to the previous exercise. On the opening day of the 2004
season, David McCarty had a $500,000 salary. Johnny Damon's salary was $8,000,000. For
the 30 players on the roster that clay, the mean salary was $4,243,283.33 and the standard
deviation was $5,324,827.26. Damon's salary was the fifth highest on the team, while
McCarty's salary ranked 24th. Compare David McCarty's and Johnny Damon's openingday salaries for the 2004 and 2005 seasons using actual data values, z-scores, and percentiles. Write a few sentences describing your conclusions.
2.18 Statistics test scores revisited Refer to the scores of Mr. Pryor's statistics students on
their first test (page 116).
(a) Compare the percent of test scores that fall within l, 2, 3, 4, and 5 standard deviations of the mean with the percents guaranteed by Chebyshev's inequality (see the table
on page 116).
(b) Mr. Pryor decides that the test scores are a little low. One option he considers is adding
4 points to each student's score. How will this affect each student's z-score? Their percentile?
(c) Rather than adding 4 points, Mr. Pryor also thinks about multiplying each student's
score by 1.06. How will this affect each student's z-score? Their percentile?
(d) The last idea Mr. Pryor contemplates is determining each student's final score as follows: 84 + (student's z-score)(4 points). Which of the three plans- (b), (c), or (d) -would
you recommend? Explain your reasoning.
2.19 Comparing performance: percentiles Erik is a star runner on the track team, and
Erica is one of the best sprinters on the swim team. Both athletes qualify for the state championship meet based on their performance during the regular season.
(a) In the track playoffs, Erik records a time that would fall at the lOth percentile of all his
race times that season. But his performance places him at the 40th percentile in the state
championship• meet. Explain how this is possible.
(b) Erica swims a bit lethargically for her in the state swim meet, recording a time that would
fall at the 40th percentile of all her meet times that season. But her performance places
Erica at the lOth percentile in this event at the state meet. Explain how this could happen.
2.20 Another weird density curve A certain density curve looks like an inverted letter V.
The first segment goes from the point (0, 0.6) to the point (0.5, 1.4). The second segment
goes from (0.5, 1.4) to (1, 0.6).
(a) Sketch the curve. Verify that the area under the curve is l, so that it is a valid density
curve.
(b) Determine the median. Mark the median and the approximate locations of the quartiles Q1 and Q3 on your sketch.
(c) What percent of the observations lie below 0.3?
(d) What percent of the observations lie between 0.3 and 0.7?
2.2 Normal Distributions
2.21 Calculator-generated density curve Like Minitab and similar computer software,
the TI-83/84 and TI-89 has a "random number generator" that produces decimal numbers
between 0 and l.
mJD,then choose PRB and 1 :Rand.
•
On the TI-83/84, press
•
On the Tl-89, press P.m!Jm (MATH) then choose 7: Probability and 4: rand (.
Be sure to close the parentheses.
Press !j~lljO several times to see the results. The command 2rand ( 2rand () on the
TI-89) produces a random number between 0 and 2. The density curve of the outcomes
has constant height between 0 and 2, and height 0 elsewhere.
(a) Draw a graph of the density curve.
(b) Use your graph from (a) and the fact that areas under the curve are relative frequencies
of outcomes to find the proportion of outcomes that are less than l .
(c) What is the median of the distribution? The mean? What are the quartiles?
(d) Find the proportion of outcomes that lie between 0.5 and 1.3.
2.22 FLIP50 The program FLIP50 simulates flipping a fair coin 50 times and counts the
number of times the coin comes up heads. It prints the number of heads on the screen.
Then it repeats the simulation for a total of l 00 times, each time displaying the number of
heads in 50 flips. When it finishes, it draws a histogram of the 100 results. (You have to set
up the plot first on the TI-89.)
(a) What outcomes are likely? What outcomes are the most likely? If you made a histogram
of the results of the l 00 repetitions, what shape distribution would you expect?
(b) Link the program from a classmate or your teacher. Run the program and observe the
variations in the results of the l 00 repetitions.
I
t
(c) When the histogram appears, TRACE to see the classes and frequencies . Record the
results in a frequency table. (On the TI-89, set up Plot l to be a histogram of list2 with a
bucket width of2 . Then press C[i) (GRAPH).
(d) Describe the distribution: symmetric versus nonsymmetric; center; spread; number of
peaks; gaps; suspected outliers. What shape density curve would best fit your distribution?
2.2 Normal Distributions
Nomwl
distributions
One particularly in1portant class of density curves has already appeared in Figures
2.3, 2.4, and 2.6(a) and the ufine-grained distribution" of Activity 2A. These density curves are syn1n1etric, single-peaked, and bell-shaped. They are called Normal
curves, and they describe Norma l distributions. Norn1al distributions play a large
role in statistics, but they are rather special and not at all unormal" in the sense of
being average or natural. We capitalize Normal to remind you that these curves
are special.
CHAPTER 2
Describing Location in a Distribution
All Normal distributions have the same overall shape. The exact density curve
for a particular Normal distribution is described by giving its mean J.L and its standard deviation a-. The n1ean is located at the center of the symmetric curve, and is
the same as the median. Changing J.L without changing a- moves the Normal curve
along the horizontal axis without changing its spread. The standard deviation a- controls the spread of a Normal curve. Figure 2.11 shows two Nonnal curves with different values of a-. The curve with the larger standard deviation is more spread out.
Figure 2.11
Two Normal curves, showing the mean J..L and standard deviation cr.
~
J..L
J..L
The standard deviation a- is the natural measure of spread for Normal distributions. Not only do J.L and a- completely detennine the shape of a Nonnal curve,
but we can locate a- by eye on the curve. Here's how. As we move out in either
direction frmn the center J.L, the curve changes fron1 falling ever more steeply
to falling ever less steeply.
_/
1\
~
The points at which this change of curvature takes place are located at distance aon either side of the mean J.L. Advanced math students know these points as inflection points. Figure 2.11 shows a- for two different Normal curves. You can feel the
change as you run a pencil along a Normal curve, and so find the standard deviation. Ren1ember that J.L and a- alone do not specify the shape of most distributions,
and that the shape of density curves in general does not reveal a-. These are special properties of Nonnal distributions.
Why are the Normal distributions i1nportant in statistics? Here are three reasons. First, Normal distributions are good descriptions for some distributions of
real data. Distributions that are often close to Normal include
•
scores on tests taken by many people (such as SAT exams and many psychological tests),
•
repeated careful1neasurements of the same quantity, and
•
characteristics of biological populations (such as yields of corn and lengths of
anin1al pregnancies).
2.2 Normal Distributions
Second, Normal distributions are good approximations to the results of many
kinds of chance outcomes, such as tossing a coin many times. (See the FLIP 50 simulation in Exercise 2.22, page 133.) Third, and 1nost in1portant, we will see that
many statistical inference procedures based on Nonnal distributions work well for
other roughly symmetric distributions.
However, even though many sets of data follow a Normal distribution,
many do not. Most inc01ne distributions, for exarnple, are skewed to the right and
so are not Normal. Some distributions are symmetric but not Normal or even
close to Normal. The uniform distribution of Exercise 2.10 (page 128) is one such
example. Non-Normal data, like non-normal people, not only are common but
are s01netimes 1nore interesting than their Normal counterparts.
I
~
The 68-95-99.7 Rule
Although there are many Nonnal curves, they all have common properties. In particular, all Normal distributions obey the following rule.
The 68-95-99.7 Rule
In the Nonnal distribution with mean JL and standard deviation a:
•
Approximately 68% of the observations fall within a of the 1nean J.L.
•
Approximately 95% of the observations fall within 2a of JL.
•
Approximately 99.7% of the observations fall within 3a of JL.
Figure 2.12 illustrates the 68-95-99.7 rule. Some authors refer to it as the a empirical rule." By remembering these three numbers, you can think about Normal
Figure 2.12
The 68-95-99.7 rule for Normal distributions.
68% of data -+
I
I
-3
-2
-1
95% of data
I
99.7% of data
I
0
\ •I
I'
2
c
•I
3
136
CHAPTER 2
Describing Location in a Distribution
distributions without constantly making detailed calculations when rough approximations will suffice. Notice that the 68-95-99.7 rule gives us 1nuch more precise
inforn1ation about how the observations fall in a Normal distribution than Chebyshev's inequality (page 120) did.
Example2.6
Young women's heights
Using the 68-95-99.7 rule
The distribution of heights of young women agedl8 to 24 is approximately Normal with
mean J.L = 64.5 inches and standard deviation cr = 2. 5 inches. Figure 2.13 shows the application of the 68-95-99.7 rule in this example.
Figure 2.13
57
The 68-95-99.7 rule applied to the distribution of the heights of
young women. Here, J.L = 64.5 and u = 2.5.
59.5
62
64.5 .
67
69.5
72
Height (in inches)
Two standard deviations is 5 inches for this distribution. The 9 5 part of the
68-95-99.7 rule says that the middle 95 % of young women are between 64.5 - 5 and
64.5 + 5 inches tall, that is, between 59.5 and69.5 inches. This fact is exactly true for an
exactly Normal distribution. It is approximately true for the heights of young women
because the distribution of heights is approximately Normal.
The other 5% of young women have heights outside the range from 59.5 to 69.5
inches. Because the Normal distributions are symmetric, half of these women are on the
tall side. So the tallest 2.5% of young women are taller than 69 .5 inches.
The 99.7 part of the 68-95-99.7 rule says that almost all young women (99.7% of
them) have heights between J.L- 3crand J.L + 3cr. This range of heights is 57 to 72 inches.
N(J.L, u)
Because we will n1ention Normal distributions often, a short notation is helpful.
We abbreviate the Normal distribution with n1ean JL and standard deviation a as
N(JL,a). For example, the distribution of young women's heights is N(64.5, 2.5).
2.2 Normal Distributions
Activity 2C
The Normal Curve Applet
The Normal Curve Applet allows you to investigate the properties of Normal distributions very easily. It is somewhat limited by the number of pixels available for use, so that it can't hit every value exactly. For the questions
below, use the closest available values.
l. How accurate is 68-95-99. 7? The 68-95-99. 7 rule for Norn1al distributions is a useful approximation. But how accurate is this rule? On the
applet, drag one flag past the other so that the applet shows the area
under the curve between the two flags .
(a) Place the flags one standard deviation on either side of the mean.
What is the area between these two values? What does the
68-95-99. 7 rule say this area is?
(b) Repeat for locations two and three standard deviations on either
side of the mean. Again compare the 68-9 5-99. 7 rule with the area
given by the applet.
2.
How 1nany standard deviations above or below the mean does the
40th percentile of the Nonnal distribution with n1ean 0 and standard
deviation l fall? Describe how you used the applet to answer this
question.
Exercises
2.23 Estimating standard deviations Figure 2.14 on the next page shows two Normal curves,
both with mean 0. Approximately what is the standard deviation of each of these curves?
2.24 Men·s heights T he distribution of h eights of adult Am erican men is approximately
Normal with mean 69 inches and standard deviation 2.5 inch es. Draw a Normal curve on
which this mean and standard deviation are correctly located. (Hint: Draw the curve first,
locate the points wh ere the curvature changes, then mark the h orizontal axis.)
2.25 More on men's heights T h e distribution of heights of adult American men is approximately Normal with mean 69 inch es and standard deviation 2.5 inch es. Use th e
68-95-99.7 rule to answer the following questions.
(a) W hat percent of men are taller than 74 inches?
(b) Between what heights do the m iddle 95% of men fall?
(c) W hat percent of men are shorter than 66.5 in ches?
(d) A h eight of 71. 5 inch es corresponds to what perce ntile of adult male Am erican
h eights?
CHAPTER 2
Describing Location in a Distribution
Figure 2.14
-1.6
Two Normal curves with the same mean but different standard
deviations, for Exercise 2.23.
-1.2
- 0.8
-0.4
0
0.4
0.8
1.2
1.6
2.26 Potato chips The distribution of weights of 9-ounce bags of a particular brand of
potato chips is approximately Normal with mean J.L = 9.12 ounces and standard deviation
u = 0.15 ounce.
(a) Draw an accurate sketch of the distribution of potato chip bag weights. Be sure to label
the mean, as well as the points one, two, and three standard deviations away from the mean
on the horizontal axis.
(b) A bag that weighs 8.97 ounces is at what percentile in this distribution?
(c) What percent of 9-ounce bags of this brand of potato chips weigh between 8.67 ounces
and 9.27 ounces?
2.27 FLIP50 Refer to Exercise 2.22 (page 133). Run the program FLIP50 again.
(a) Use the Data Analysis Toolbox from Chapter 1 to describe your results.
(b) What percent of your observations fall within one standard deviation of the mean?
Within two standard deviations? Within three standard deviations?
(c) How well do your data follow the 68-95-99.7 rule for Normal distributions? Explain.
2.28 A fine-grained distribution You can do this exercise if you spray-painted a Normal
distribution in Activity 2A. On your "fine-grained distribution," first count the number of
whole squares and parts of squares under the curve. Approximate as best you can. This represents the total area under the curve.
(a) Mark vertical lines at J.L - 1u and J.L + 1u. Count the number of squares or parts of
squares between these two vertical lines. Now divide the number of squares within one
standard deviation of J.L by the total number of squares under the curve and express your
answer as a percent. How does this compare with 68%? Why would you expect your
answer to differ somewhat from 68%?
2.2 Normal Distributions
(b) Count squares to determine the percent of area within 2u of J.L. How does your answer
compare with 95 %?
(c) Count squares to determ ine the percent of area within 3u of J.L. How does your answer
compare with 99. 7%?
The Standard Normal Distribution
As the 68-95-99.7 rule suggests, all Normal distributions share many common
properties. In fact, all Norn1al distributions are the same if we m easure in units of
size u about the m ean J.L as center. Changing to these un its requires us to standardize, just as we did in Section 2.1:
z=
He said, she said The
height and weight
distributions in this
chapter come from actual
measurements in a
government survey. Why
not just ask people? When
asked their weight, almost
all women say they weigh
less than they really do.
Heavier men also
underreport their weight,
but lighter men claim to
weigh more than the scale
shows. We leave you to
ponder the psychology of
the two sexes. Just
remember that "say so"
is no replacement for
measuring.
x- J.L
-
(J
If the variable we standardize has a Normal distribution, then so does the new
variable z. (This is because standardizing is a linear transfon11ation, which, as we
learned in C hapter 1, does not change the shape of a distribution. ) This new distribution is called the standard Normal distribution.
Standard Normal Distribution
T h e standard Normal distribution is the N ormal distribution N(O, l ) with
m ean 0 and standard deviation 1 (Figure 2.15 ).
Figure 2.15
Standard Normal distribution.
- 3.0
- 2.0
- 1.0
1.0
0
2.0
3.0
If a variable x h as any Normal distribution N( J.L, u ) with m ean J.L and standard
deviation u, then the standardized variable
X-
J.L
z=--u
has the standard Normal distribution.
140
CHAPTER 2
Describing Location in a Distribution
Standard Normal Calculations
An area under a density curve is a proportion of the observations in a distribution.
Any question about what proportion of observations lie in some range of values
can be answered by finding an area under the curve. Because all Normal distributions are the same when we standardize, we can find areas under any Normal
curve fro1n a single table, a table that gives areas under the curve for the standard
Normal distribution. Table A, the standard Normal table, gives areas under the
standard Normal curve. You can find Table A inside the front cover.
Ths Standard Norms/ Tsbls
Table A is a table of areas under the standard Normal curve. The table entry for
each value z is the area under the curve to the left of z.
z
The next two examples show how to use the table.
Example2.7
Area to the left
Using the standard Nonnal table
Problem: Find the proportion of observations from the standard Normal distribution that
are less than 2.22.
Solution: To find the area to the left of 2.22, locate 2.2 in the left-hand column of Table
A, then locate the remaining digit 2 as .02 in the top row. The entry opposite 2.2 and under
.02 is 0.9868. This is the area we seek.
z
.00
.01
.02
.03
2.1
.9821
.9826
.9830
.9834
2.2
.9861
.9864
.9868
.9871
2.3
.9893
.9896
.9898
.9901
Figure 2.16 illustrates the relationship between the value z = 2.22 and the area 0.9868.
2.2 Normal Distributions
Figure 2.16
The area under a standard Normal curve to the left of the point
z = 2.22 is 0.9868.
Table entry for z is always
the area under the curve
to the left of z .
Z=
Example2.8
2.22
Area to the right
More on using the standard Normal table
z
.04
.05
.08
Problem: Find the proportion of observations fro m the standard Normal distribution that
- 2.2
- 2.1
- 2.0
.0125
.0 162
.0207
.0 122
.0158
.0202
.0 11 9
.0 154
.0197
are greater than -2. 15.
Solution: Enter Table A under z = -2. 15. T hat is, fi nd -2. 1 in the left-hand column and
.05 in the top row. T he table entry is 0.01 58. T his is the area to the left of- 2.1 5. Because
the total area under the curve is 1, the area lying to the right of - 2. 15 is 1 - 0. 0158 =
0.9842. Figure 2.17 illustrates these areas .
Figure 2.17
Areas under the standard Normal curve to the right and left of
z = - 2.15. Table A gives only areas to the left.
Z=
- 2.15
Caution! A common student mistake is to look up a z-value in Table A and
report the entry corresponding to that z-value, regardless of whether the problem
asks for the area to the left or to the right of that z-value. Always sketch the standard Normal curve, mark the z-value, and shade the area of interest. And before you
finish, make sure your answer is reasonable in the context of the problem.
142
CHAPTER 2
Describing Location in a Distribution
Exercises
2.29 Table A practice Use Table A to find the proportion of observations from a standard
Normal distribution that satisfies each of the following statements. In each case, sketch a
standard Normal curve and shade the area under the curve that is the answer to the question. Use the Normal Curve Applet to check your answers.
(a) z < 2.85
(b) z > 2.85
(c) z > -1.66
(d) -1.66 < z < 2.85
2.30 More Table A practice Use Table A to find the proportion of observations from a
standard Normal distribution that satisfies each of the following statements. In each case,
sketch a standard Normal curve and shade the area under the curve that is the answer to
the question. Use the Normal Curve Applet to check your answers.
(a) z < -2.46
(b) z > 2.46
(c) 0.89 < z < 2.46
(d) -2.95 < z < -1.27
B.C.
by johnny hart
NORMALIZ.E
WHAI YoiJ AAvt:: IF 'lOu Dot-lriJEt:D GI..-A~S~s.
f'fl
-<- -
::::.,.;;oo;·
--
~4-,.;--A- ~ • .
1m-
Normal Distribution Calculations
We can answer any question about proportions of observations in a Normal distribution by standardizing and then using the standard Normal table. Here is an
outline of the method for finding the proportion of the distribution in any region.
Solving Problems Involving Norms/ Distributions
Step 1: State the problem in terms of the observed variable x. Draw a picture
of the distribution and shade the area of interest under the curve.
2.2 Normal Distributions
Step 2: Standardize and draw a picture. Standardize x to restate the problem in
terms of a standard Normal variable z. Draw a picture to show the area of interest under the standard Norn1al curve.
Step 3: Use the table. Find the required area under the standard Normal curve,
using Table A and the fact that the total area under the curve is 1.
Step 4: Conclusion. Write your conclusion in the context of the problem.
Example2.9
Is cholesterol a problem for young boys?
Normal calculations
The level of cholesterol in the blood is important because high cholesterol levels may
increase the risk of heart disease . The distribution of blood cholesterol levels in a large
population of people of the same age and sex is roughly Normal. For 14-year-old boys, 6 the
mean is JL = 170 milligrams of cholesterol per deciliter of blood (mg/dl) and the standard
deviation is a= 30 mg/dl. Levels above 240 mg/dl may require medical attention. What
percent of 14-year-old boys have more than 240 mg/dl of cholesterol?
Step 1: State the problem. Call the level of cholesterol in the blood x. The variable x has
theN( 170, 30) distribution. We want the proportion of boys with cholesterol level x > 240.
Sketch the distribution, mark the important points on the horizontal axis, and shade the
area of interest. See Figure 2.18(a).
Figure 2. 18a
(a) Cholesterol levels for 14-year-o/d boys who may require
medical attention.
170
200
240
Step 2: Standardize and draw a picture. On both sides of the inequality, subtract the
mean, then divide by the standard deviation, to turn x into a standard Normal z:
X> 240
X -
170 > 240 - 170
30
30
z > 2.33
144
CHAPTER 2
Describing Location in a Distribution
Sketch a standard Normal curve, and shade the area of interest. See Figure 2.18(b).
Figure 2. 18b
(b) Areas under the standard Normal curve.
Area
=0.9901
Area
Z=
=0.0099
2.33
Step 3: Use the table. From Table A, we see that the proportion of observations less than
2.33 is 0.9901. About 99% of boys have cholesterol levels less than 240. The area to the
right of 2.33 is therefore 1 - 0.9901 = 0.0099. This is about 0.01 , or 1%.
Step 4: Conclusion. (Write your conclusion in the context of the problem.) Only about
1% of 14-year-old boys have dangerously high cholesterol.
In a Norn1al distribution, the proportion of observations with x > 240 is the
same as the proportion with x ~ 240. There is no area under the curve exactly
above the point 240 on the horizontal axis, so the areas under the curve with
x > 240 and x ~ 240 are the sa1ne. This isn't true of the actual data. There n1ay
be a boy with exactly 240 mg/dl of blood cholesterol. The Normal distribution
is just an easy-to-use approximation, not a description of every detail in the
actual data.
The key to doing a Normal calculation is to sketch the area you want, then
match that area with the area that the table gives you. Here is another example.
Example 2. 10 Cholesterol in young boys (again)
Working with an interval
What percent of 14-year-old boys have blood cholesterol between 170 and 240 mg/dl?
Step 1: State the problem. We want the proportion of boys with 170:::; x:::; 240.
Step 2: Standardize and draw a picture.
170 ::S
170 - 170
30
< X -
X ::S
240
170 < 240 - 170
30
30
0:::; z:::; 2.33
2.2 Normal Distributions
Sketch a standard Normal curve, and shade the area of interest. See Figure 2.19.
Figure 2.19
Areas under the standard Normal curve.
Area
=0.5
I
Area
=0.4901
0
Area
2.33
=0.9901
Step 3: Use the table. The area between 0 and 2.33 is the area to the left of 2.33 minus
the area to the left of 0. Look at Figure 2.19 to check this. From Table A,
area between 0 and 2. 3 3 = area to th e left of 2. 33 - area to the left of 0.00
= 0.9901 - 0.5000 = 0.4901
Step 4: Conclusion. (State your conclusion in context.) About 49% of 14-year-old boys
have cholesterol levels between 170 and 240 mg/cll.
What if we meet a z that falls outside the range covered by Table A? For
exa1nple, the area to the left of z = -4 does not appear in the table. But since
-4 is less than - 3.4, this area is sn1aller than the entry for z = - 3.40, which is
0.0003. There is very little area under the standard Normal curve outside the
range covered by Table A. You can take this area to be zero with little loss of
accuracy.
Finding a Value, Given a Proportion
Examples 2.9 and2.10 illustrate the use of Table A to find what proportion of the
observations satisfies some condition, such as "blood cholesterol between 170 and
240 n1g/dl." We n1ay instead want to find the observed value with a given proportion of the observations above or below it. To do this, use Table A backwards. Find
the given proportion in the body of the table , read the corresponding z from the
left column and top row, then "unstandardize" to get the observed value. Here is
an example.
146
CHAPTER 2
Describing Location in a Distribution
Example 2. 11 SAT Verbal scores
Inverse Normal calculations
Scores on the SAT Verbal test in recent years follow approximately the N(505, 110) distribution. How high must a student score in order to place in the top 10% of all students
taking the SAT?
Step 1: State the problem and draw a picture. We want to find the SAT score x with area
0.1 to its right under the Normal curve with mean J-t = 505 and standard deviation u =
110. That's the same as finding the SAT score x with area 0.9 to its left. Figure 2.20 poses
the question in graphical form.
Figure 2.20
Locating the point on a Normal curve with area 0.10 to its right.
X= 505
The bell curve? Does the
distribution of human
intelligence follow the
"bell curve" of a Normal
distribution? Scores on IQ
tests do roughly follow a
Normal distribution. That
is because a test score is
calculated from a
person's answers in a
way that is designed to
produce a Normal
distribution. To conclude
that intelligence follows a
bell curve, we must agree
thatthetestscores
directly measure
intelligence. Many
psychologists don't think
there is one human
characteristic that we can
call "intelligence" and
can measure by a single
test score.
Z=
X=?
0
Z=
1.28
Because Table A gives the areas to the left of z-values, always state the problem in
terms of the area to the left of x.
Step 2: Use the table. Look in the body of Table A for the entry closest to 0.9. It is 0.8997.
This is the entry corresponding to z
0.9 to its left.
= 1.28. So z = 1.28 is the standardized value with area
Step 3: Unstandardize to transform the solution from the z scale back to the original x scale.
We know that the standardized value of the unknown xis z = 1.28. Sox itself satisfies
505 = 1.28
110
X-
Solving this equation for x gives
X=
505 + (1.28)(110)
= 645.8
This equation should make sense: it finds the x that lies 1.28 standard deviations above the
mean on this particular Normal curve. That is the "unstandardized" meaning of z = 1.28.
Step 4: Conclusion. We see that a student must score at least 646 to place in the highest
10%.
2.2 Normal Distributions
Exercises
2.31 How hard do locomotives pull? An important measure of the performance of a
locomotive is its "adhesion," which is the locomotive's pulling force as a multiple of
its weight. The adhesion of one 4400-horsepower diesel locomotive varies in actual
use according to a Normal distribution with mean JL = 0.37 and standard deviation
u = 0.04. For each part that follows, sketch and shade an appropriate Normal distribution. Then show your work.
(a) What proportion of adhesions measured in use are higher than 0.40?
(b) What proportion of adhesions are between 0.40 and 0.50?
(c) Improvements in the locomotive's computer controls change the distribution of adhesion to a Normal distribution with mean JL = 0.41 and standard deviation u = 0.02. Find
the proportions in (a) and (b) after this improvement.
2.32 Table A in reverse Use Table A to find the value z from a standard Normal distribution that satisfies each of the following conditions. (Use the value of z from Table A that
comes closest to satisfying the condition.) In each case, sketch a standard N annal curve with
your value of z marked on the axis. Use the Nomwl Curve Applet to check your answers.
(a) The point z with 25% of the observations falling to the left of it.
(b) The point z with 40% of the observations falling to the right of it.
2.33 Length of pregnancies The length of human pregnancies from conception to birth
varies according to a distribution that is approximately Normal with mean 266 days and
standard deviation 16 days. For each part that follows, sketch and shade an appropriate
Normal distribution. Then show your work.
(a) What percent of pregnancies last less than 240 days (that's about 8 months)?
(b) What percent of pregnancies last between 240 and 270 days (roughly between 8
months and 9 months)?
(c) How long do the longest 20% of pregnancies last?
2.34 IQ test scores Scores on the Wechsler Adult Intelligence Scale (a standard IQ test)
for the 20 to 34 age group are approximately Normally distributed with JL = 110 and
u = 25. For each part that follows, sketch and shade an appropriate Normal distribution.
Then show your work.
(a) What percent of people aged 20 to 34 have IQ scores above 100?
(b) What percent have scores above 150?
(c) MENSA is an elite organization that admits as members people who score in the top
2% on IQ tests. What score on the Wechsler Adult Intelligence Scale would an individual
have to earn to qualify for MENSA membership?
2.35 Locating quartiles The quartiles of any density curve are the points with area 0.25
and 0.75 to their left under the curve. Use Table A or the Nomwl Curve Applet to answer
the following questions.
CHAPTER 2
Describing Location in a Distribution
(a) What are the quartiles of a standard Normal distribution?
(b) How many standard deviations away from the mean do the quartiles lie in any Normal
distribution? What are the quartiles for the lengths of human pregnancies in Exercise 2.33?
2.36 Brush your teeth The amount of time Ricardo spends brushing his teeth follows a
Normal distribution with unknown mean and standard deviation. Ricardo spends less than
one minute brushing his teeth about 40% of the time. He spends more than two minutes
brushing his teeth 2% of the time. Use this information to determine the mean and standard deviation of this distribution.
Assessing Normality
The Normal distributions provide good 1nodels for smne distributions of real data.
Examples include the highway gas mileage of 2006 Corvette convertibles, statewide
une1nployment rates, and weights of 9-ounce bags of potato chips. The distributions
of smne other con1mon variables are usually skewed and therefore distinctly nonNorn1al. Exan1ples include economic variables such as personal income and gross
sales of business firn1s, the survival times of cancer patients after treatment, and the
service lifetime of mechanical or electronic components. While experience can suggest whether or not a Normal distribution is plausible in a particular case, it is risky
to assume that a distribution is Nonnal without actually inspecting the data.
In the latter part of this course we will want to use various statistical inference
procedures to try to answer questions that are in1portant to us. These tests involve
sampling people or objects and recording data to gain insights into the populations
from which they come. Many of these procedures are based on the assumption that
the host population is approximately Normally distributed. Consequently, we need
to develop n1ethods for assessing Nonnality.
Method I Construct a histogram or a sten1plot. See if the graph is approximately
bell-shaped and syn1n1etric about the n1ean.
A histogram or stemplot can reveal distinctly non-Normal features of a distribution, such as outliers, pronounced skewness, or gaps and clusters. You can
improve the effectiveness of these plots for assessing whether a distribution is Normal by marking the points :X, :X ± s, and :X ± 2s on the horizontal axis. This gives
the scale natural to Normal distributions. Then compare the count of observations
in each interval with the 68-95-99.7 rule.
Example 2. 12 Seventh-grade vocabulary scores (again)
Assessing N onnality
The histogram in Figure 2.3 (page 123) suggests thatthe distribution of the 947 Gary vocabulary scores is close to Normal. It is hard to assess by eye how close to Normal a histogram is.
Let's use the 68-95-99.7 rule to check more closely. We enter the scores into a statistical software package and ask for the mean and standard deviation. The computer replies,
MEAN= 6. 8585
STDEV = 1. 5952
2.2 Normal Distributions
Now that we know that x = 6.8585 and s = 1.5952, we check the 68-95-99.7 rule by finding the actual counts of Gary vocabulary scores in intervals of length s about the mean x.
The computer will also do this for us. Here are the counts:
21
2.07
3s
x-
129
3.67
x-
2s
331
5.26
x-s
125
318
6.86
8.45
X
x+s
21
10.05
x + 2s
11.64
x + 3s
The distribution is very close to symmetric. It also follows the 68-95-99.7 rule closely:
68.5% of the scores (649 out of 94 7) are within one standard deviation of the mean, 95.4%
(903 out of947) are within two standard deviations, and 99.8% (94 5 out of947) are within
three standard deviations. These counts confirm that the Normal distribution with
J.L = 6.86 and a = 1.595 fits these data well.
Sn1aller data sets rarely fit the 68-95-99.7 rule as well as the Gary vocabulary scores. This is even true of observations taken from a larger population
that really has a Normal distribution. There is 1nore chance variation in small
data sets.
Method 2 Construct a Normal probability plot. A Nonnal probability plot provides a good assess1nent of the adequacy of the Nonnal1nodel for a set of data.
Most software packages, including Minitab, Fathom, and Crunchlt!, can construct Norn1al probability plots from entered data. The TI-83/84 and TI-89 will
also make Normal probability plots. You will need to be able to produce a Normal probability plot (either with a calculator or with computer software) and
interpret it.
Here is the basic idea of a Norn1al probability plot. The graphs produced by
technology use more sophisticated versions of this idea.
l. Arrange the observed data values frmn smallest to largest. Record what percentile of the data each value occupies. For example, the smallest observation in a set of 20 is at the 5% point, the second smallest is at the 10% point,
and so on.
2. Use the standard Normal distribution (Table A) to find the z-scores at these
san1e percentiles. For example, z = -1.645 is the 5% point of the standard
Norn1al distribution, and z = -1.28 is the 10% point.
3. Plot each data point x against the corresponding z. If the data distribution is
close to Nonnal, the plotted points will lie close to some straight line. Note:
some software plots the x-values on the horizontal axis and the z-scores on the
vertical axis, while other software does just the reverse. You should be able to
interpret the plot in either case.
150
CHAPTER 2
Describing Location in a Distribution
Use of Normal Probability Plots
If the points on a Normal probability plot lie close to a straight line, the plot
indicates that the data are Normal. Systematic deviations from a straight line
indicate a non-Normal distribution. Outliers appear as points that are far away
from the overall pattern of the plot.
The next three examples show you how to interpret Normal probability plots.
Example 2. 13 Connecting wires and wafers
Interpreting Normal probability plots
Let's return to Example 1. 8 (page 54), which described the breaking strengths of connections between very fine wires and a semiconductor wafer. Figure 1. 7 showed that the
distribution of breaking strengths is symmetric and single-peaked. Now we ask: do the
breaking strengths follow a Normal distribution?
Figure 2.21 is a Normal probability plot of the breaking strengths. Lay a transparent
straightedge over the center of the plot to see that most points fall close to a straight line.
A Normal distribution describes these points quite well. The only substantial deviations
are short horizontal runs of points. Each run represents repeated observations having the
same value-there are five measurements at 1150, for example. This phenomenon (called
granularity) is a result of the limited precision of the measurements and does not represent
an important deviation from Normality.
Figure 2.21
Normal probability plot of the breaking strengths of wires bonded
to a semiconductor wafer. This distribution has a Normal shape
except for outliers in both tails.
3000
l
•
2500
!
c:n
"C
1000
I
•
Cl
~ 1500
CJ)
Cl
c::
~
~
1000
05
•
500
•
•
•••
•
•••
•
••
0
-3
-2
-1
0
z-score
1
2
3
151
2.2 Normal Distributions
The high outlier at 3150 pounds lies above the line formed by the center of the datait is farther out in the high direction than we expect Normal data to be. The two low outliers at 0 lie below the line-they are suspiciously far out in the low direction. These outliers were explained in Example 1.8.
Example 2. 14 Guinea pig survival
Some non-Nonnal data
Table 2.2 gives the survival times in days of 72 guinea pigs after they were injected with
tubercle bacilli in a medical experiment.
Table2.2
43
80
91
103
137
191
45
80
92
104
138
198
Survival times {days) of guinea pigs in a medical experiment
53
81
92
107
139
211
56
81
97
108
144
214
56
81
99
109
145
243
57
82
99
113
147
249
58
83
100
114
156
329
66
83
100
118
162
380
67
84
101
121
174
403
73
88
102
123
178
511
74
89
102
126
179
522
79
91
102
128
184
598
Source: T. Bjerkedal. "Acquisition of resistance in guinea pigs infected with different doses of virulent tubercle bacilli,"
American Journal of Hygiene. 72 (1960). pp. 130-148.
Figure 2.22 is a Normal probability plot of the guinea pig survival times. Draw a line
through the leftmost points, which correspond to the smaller observations. The larger
observations fall systematically above this line. That is, the right-of-center observations
have larger values than the Normal distribution. This Normal probability plot indicates
that the guinea pig survival data are strongly right-skewed. In a right-skewed distribution,
the largest observations fall distinctly above a line drawn through the main body of points.
Similarly, left-skewness is evident when the smallest observations fall below the line.
There appear to be no outliers in this set of data.
Figure 2.22
Normal probability plot of the survival times of guinea pigs in a
medical experiment. This distribution is skewed to the right.
•
600
••
500
(ij"
~
••
•
400
Q)
E
·~
-ro
>
300
-~
#'
(I)
200
....
100
-3
-2
/
-1
z-score
152
CHAPTER 2
Describing Location in a Distribution
Example 2. 15 Acidity of rainwater
Some Normal data
Researchers collected data on the acidity levels (measured by pH) in 105 samples of
rainwater. Distilled water has pH 7.00. As water becomes more acidic, the pH goes
down. The pH of rainwater is important to environmentalists because of the problem of
acid rain. 7
Figure 2.23 is a Normal probability plot of the 105 acidity (pH) measurements. The
Normal probability plot makes it clear that a Normal distribution is a good descriptionthere are only minor wiggles in a generally straight-line pattern.
Figure2.23
Normal probability plot of the acidity (pH) values of 105 samples of
rainwater. This distribution is roughly Normal.
•• •
•
•
•
6.5
•
~
6.0
Q)
:::1
-ro
>
:J:
c..
~
co
5.5
:5:
.!:::
co
0:
5.0
J'
..
,.
• •
4.5
-3
-2
-1
0
2
3
z-score
As Figure 2.23 indicates, real data ahnost always show some departure from
the theoretical Nonnal1nodel. When you examine a Normal probability plot,
look for shapes that show clear departures from Normality. Don't overreact to
minor wiggles in the plot. When we discuss statistical methods that are based on
the Norn1al model, we will pay attention to the sensitivity of each method to
departures from Normality. Many co1nmon methods work well as long as the data
are approximately Normal and outliers are not present.
2.2 Normal Distributions
Technology Toolbox
Normal probability plots on the Tl-83/84/89
If you ran the progran1 FLIP 50 in Exercise 2.22 (page 13 3), and you still have the data (I 00 numbers mostly in the 20s) in Ll/listl, then use these data. If you have not entered the program and
run it, take a few 1ninutes to do that now. Duplicate this example with your data. Here is the histogram that was generated at the end of one run of this sin1ulation on each calculator.
TI-83/84
TI-89
P1:L1
min=24
max <26
n=28
n= 29 .
MAIN
RAD AUTO
FUNC
Ask for one-variable statistics:
• Press B
, choose CALC, then 1: 1-Var
Stats and iliJU (Ll)
• In the Stats/List Editor, press; II] (Calc)
and choose 1: 1-Var Stats for list!.
This will give us the following:
1-Var Stats
x=25.04
Ix=2504
I,x 2 =63732
Sx=3.228409246
OX=3.21 2226642
,J,n=100
1 -Var Stats ...
~
x
-E
LX
Ix2
3.
Sx
ax
n
I ~ ·I ~ ~~~X
rfil ( Enter=OK )
MAIN
1-Var Stats
tn=100
minX=18
Q1 =23
Med=25
Q3=27
maxX=33
J
Fl
Too l
RAD AUTO
ax
n
1.
=24.45
=2445.
=60525.
=3.40635624 14 2
=3.38532146602
=100.
=14.
=22 .
MinX
Q1X
MedX
Q3X
MaxX
I (x-x) 2
FUNC
=3.3853216602
=100.
=14.
=22.
=24.
=27.
=32.
=1148. 75
l i CEn~~J::=OJ< :>
MAIN
RAD AUTO
FUNC
f-----
f---
4/4
f-----
4 /4
• Cmnparing the 1neans and medians (x = 25.04 versus M = 25 on the TI-83/84 and x = 24.45
versus M = 24 on the TI-89) suggests that the distributions are fairly symmetric. Boxplots confirm the roughly symmetric shape.
~3~
Pl
~
Med=25
Med: 2 4.
USE HjJ, OR TYPE+ [ESC] =CANCEL
CHAPTER 2
154
Describing Location in a Distribution
Technology Toolbox
Normal probability plots on the Tl-83/84/89 (continued)
• To construct a Normal probability plot of the data with the x-values on the horizontal axis, define
Plot 1 like this:
~ Plot2
gQff
Type : L
I~
Plot3
~
3
dlJ:b
o{]t··>{l}-;·
-
Data List---:-11
Data Axis:f3 Y
Mark: m + ·
_j
No rm Prob Plot ...
r
r-
Plot Number:
List:
Data Axis:
Mark:
Store Zscor es to:
Plotl--7
llistl
I
X--7
Box-?
lstatvarsSz I
<: ESC-CANCEL::>
=8
5
1
6
5
l <: Enter-OK >
zscores [20] =-. 85961736186 ...
~
USE f- AND --7 TO OPEN CHOICES
• Use ZoomStat (ZoomData on the TI-89) to see the finished graph.
P1:L1
D
i.
•••
rl-'
••••
80
)::(
X=18
Y=-2.575829
[]
MAIN
RAD AUTO
FUNC
Interpretation: The Nonnal probability plot is quite linear, so it is reasonable to believe that the data
follow a Normal distribution.
Comment: The cursor on the bottom TI-83/84 screen shot is on the point with an x-value of 18 and
a z-score of- 2. 58. This is the point with the smallest x-value of the 100 data values generated by
the program FLIP 50. By our definition, that puts the value of x = 18 at the 1% point. The alternative definition of percentile (see page 119) would place x = 18 at the 0% point, since no data values are below this one. The Normal probability plot on the TI-83/84 and TI-89 uses a compromise:
it puts this point at the 0. 5% mark. If we look up an area to the left of 0.0050 (0. 5%) in Table A, we
find that z = - 2.58.
Exercises
2.37 Taking stock Figure 2.24 (on the next page) is a Normal probability plot of the
monthly percent returns for U.S. common stocks from June 1950 to June 2000. Because
there are 601 observations, the individual points in the middle of the plot run together. In
what way do monthly returns on stocks deviate from a Normal distribution?
2.2 Normal Distributions
Figure2.24
Normal probability plot of the percent returns on U.S. common
stocks in 601 consecutive months, for Exercise 2.37.
•
10
E
~
0
e
.....
c
Q)
(.)
Q;
c..
-10
•
-20
J
•
-3
-2
-1
2
0
3
z-score
2.38 Carbon dioxide emissions Figure 2.25 is a Normal probability plot of the emissions of
carbon dioxide per person in 48 countries. 8 In what ways is this distribution non-Normal?
Figure2.25
Normal probability plot of C02 emissions in 48 countries, for
Exercise 2.38.
•
20
18
c
0
en
•
16
Q;
0..
Q;
•
14
0..
en
c
12
·s
10
.8
Q)
, ••••
.§.
en
c
8
I
0
·u;
-~
E
6
,
Q)
N
0
u
4
....../
2
0
-3
-2
-1
,
•
tJ/1"
0
z-score
2
3
156
CHAPTER 2
Describing Location in a Distribution
2.39 Great white sharks Here are the lengths in feet of 44 great white sharks:
18.7
16.4
13.2
19.1
12.3
16.7
15.8
16.2
18.6
17.8
14.3
22.8
16.4
16.2
16.6
16.8
15.7
12.6
9.4
13.6
18.3
17.8
18.2
13.2
14.6
13.8
13.2
15.7
15.8
12.2
13.6
19.7
14.9
15.2
15.3
18.7
17.6
14.7
16.1
13.2
12.1
12.4
13.5
16.8
(a) Use the Data Analysis Toolbox (page 93) to describe the distribution of these lengths.
(b) Compare the mean with the median. Does this comparison support your assessment
in (a) of the shape of the distribution? Explain.
(c) Is the distribution approximately Normal? If you haven't done this already, enter the
data into your calculator, and reorder them from smallest to largest. Then calculate the
percent of the data that lies within one standard deviation of the mean. Within two standard deviations of the mean. Within three standard deviations of the mean.
(d) Use your calculator to construct a Normal probability plot. Interpret this plot.
(e) Having inspected the data from several different perspectives, do you think these data
are approximately Normal? Write a brief summary of your assessment that combines your
findings from (a) through (d).
2.40 Cavendish and the density of the earth Repeated careful measurements of the same
physical quantity often have a distribution that is close to Normal. Look back at
Cavendish's 29 measurements of the density of the earth from Exercise 1.60 (page 106):
5.50
5.57
5.42
5.61
5.53
5.47
4.88
5.62
5.63
5.07
5.29
5.34
5.26
5.44
5.46
5.55
5.34
5.30
5.36
5.79
5.75
5.29
5.10
5.68
5.58
5.27
5.85
5.65
5.39
(a) Construct a stemplot to show that the data are reasonably symmetric.
(b) Now check how closely they follow the 68-95-99.7 rule. Find x and s, then count the
number of observations that fall between x - sand x + s, between x - 2s and x + 2s,
and between :X- 3s and x + 3s. Compare the percents of the 29 observations in each of
these intervals with the 68-95-99.7 rule. We expect that, when we have only a few observations from a Normal distribution, the percents will show some deviation from 68, 95,
and 99.7. Cavendish's measurements are in fact close to Normal.
(c) Use your calculator to construct a Normal probability plot for Cavendish's density of
the earth data, and write a brief statement about the Normality of the data. Does the Normal probability plot reinforce your findings in (a) and (b)?
2.41 Normal random numbers Use technology to generate 100 observations from the standard Normal distribution. (The TI-83/84 command randNorrn ( 0, 1, 10 0) --7L 1 or
the TI-89 command tis tat. randNorrn ( 0, 1, 100) --7list1 will do this.)
(a) Make a histogram of these observations. How does the shape of the histogram compare
with a Normal density curve?
(b) Make a Normal probability plot of the data. Does the plot suggest any important deviations from Normality?
2.2 Normal Distributions
(c) Repeat the process of generating 100 observations and completing parts (a) and (b) several times. This should give you a better idea of how close to Normal the data will appear
when we take a sample from a Normal population.
2.42 Uniform random numbers Use technology to generate 100 observations from the uniform distribution described in Exercise 2.10 (page 128). (The TI-83/84 command
rand ( 100) ~L 1 or the TI-89 command rand ( 100) ~list1 will do this.)
(a) Make a histogram of these observations. How does the histogram compare with the
density curve in Figure 2.8 (page 128)?
(b) Make a Normal probability plot of your data. According to this plot, how does the uniform distribution deviate from Normality?
Section 2.2 Summarl
The Normal distributions are described by a special family of bell-shaped,
symmetric density curves, called Normal curves. The mean J.L and standard
deviation a completely specify a Normal distribution N(J.L, a). The 1nean is the
center of the curve, and a is the distance from J.L to the inflection points on
either side.
In particular, all Norn1al distributions satisfy the 68-95-99.7 rule, which
describes what percent of observations lie within one, two, and three standard
deviations of the mean.
All Normal distributions are the same when 1neasuren1ents are standardized. If x has the N(J.L,a) distribution, then the standardized variable
z = (x - J.L)Ia has the standard Normal distribution N(O, l) with mean 0 and
standard deviation l.
Table A gives the proportions of standard Normal observations that are less
than z for many values of z. By standardizing, we can use Table A for any Normal
distribution. The Normal Curve Applet allows you to do Normal calculations
quickly.
In order to perform certain inference procedures in later chapters, we will
need to know that the data cmne from populations that are approximately Norn1ally distributed. To assess Normality, one can observe the shape of histograms,
stemplots, and boxplots and see how well the data fit the 68-95-99.7 rule for Normal distributions. Another good 1nethod for assessing Normality is to construct a
Normal probability plot.
Section 2.2 Exercises
For Exercises 2.43 to 2.47, use Table A or the Nomwl Curve Applet.
2.43 Finding areas from z-scores Find the proportion of observations from a standard
Normal distribution that fall in each of the following regions. In each case, sketch a standard Normal curve and shade the area representing the region.
(a)
z < 1.28
(b) z > -0.42
CHAPTER 2
(c) -0.42 <
Describing Location in a Distribution
z < 1.28
(d) -1.28 < z < -0.42
2.44 Working backward: finding z-values
(a) Find the number z such that the proportion of observations that are less than z in a standard Normal distribution is 0.98.
(b) Find the number z such that 22% of all observations from a standard Normal distribution are greater than z.
2.45 A surprising calculation Changing the mean of a Normal distribution by a moderate amount (while keeping the standard deviation about the same) can greatly change the
percent of observations in the tails. Suppose that a college is looking for applicants with
SAT Math scores of750 and above.
(a) In 2004, the scores of males on the SAT Math test followed the N(537, 116) distribution. What percent of males scored 750 or better? Show your work.
(b) Females' SAT Math scores that year had the N(501, 110) distribution. What percent
of females scored 750 or better? Show your work. You see that the percent of males above
750 is almost three times the percent of females with such scores.
2.46 The stock market The annual rate of return on stock indexes (which combine many
individual stocks) is approximately Normal. Since 1945, the Standard & Poor's 500 index
has had a mean yearly return of 12%, with a standard deviation of 16.5 %. Take this Normal distribution to be the distribution of yearly returns over a long period.
(a) In what range do the middle 95 % of all yearly returns lie? Justify your answer.
(b) The market is down for the year if the return on the index is less than zero. In what proportion of years is the market down? Show your work.
(c) In what proportion of years does the index gain 2 5% or more? Show your work.
2.47 Deciles The deciles of any distribution are the points that mark off the lowest 10%
and the highest 10%. The deciles of a density curve are therefore the points with area 0.1
and 0.9 to their left under the curve.
(a) What are the deciles of the standard Normal distribution?
(b) The heights of young women are approximately Normal with mean 64.5 inches and
standard deviation 2.5 inches. What are the deciles of this distribution? Show your work.
2.48 Outliers in a Normal distribution The percent of the observations that are classified
as outliers by the 1.5 X IQR rule is the same in any Normal distribution. What is this percent? Show your method clearly.
2.49 Runners' heart rates Figure 2.26 is a Normal probability plot of the heart rates of 200
male runners after six minutes of exercise on a treadmill. 9 The distribution is close to Normal. How can you see this? Describe the nature of the small deviations from Normality
that are visible in the plot.
2.2 Normal Distributions
Figure2.26
159
Normal probability plot of the heart rates of 200 male runners, for
Exercise 2.49.
,.,..
140~
130 -i
~
::I
c
.E
Q;
a.
120
I
•
•
110
~
ro
Q)
:::::.
~
~
t
ro
Q)
100
90
:I:
··-'
BOj
•
70
-3
2
3
z-score
2.50 Weights aren't Normal The heights of people of the same sex and similar ages follow
Normal distributions reasonably closely. Weights, on the other hand, are not Normally distributed. The weights of women aged 20 to 29 have mean 141.7 pounds and median 13 3.2
pounds. The first and third quartiles are 118.3 pounds and 157.3 pounds. What can you
say about the shape of the weight distribution? Why?
.nna¥~
... ...
EJ.·,:;.,r;x
,_
......
-==Qlllgll.t; . . . ,
j
Ij
j
j
i
j
42
:Ji
Many plu~.rs £1t the' nonnal OJ:t'V~ when 1t
col!l61
to dlstrlbuUcm or we.liht.
160
CHAPTER 2
e
Describing Location in a Distribution
CLOS~Dl
As~
ThenewSAT
Using what you have learned in this chapter, you should now be ready to
examine some results from the Spring 2005 administrations of the new
SAT discussed in the chapter-opening Case Study (page 113).
1.
Overall, the distribution of scores on the new Writing section followed a
Normal distribution with mean 516 and standard deviation 115.
(a) What percent of students earned scores between 600 and 700 on
the Writing section? Show your work.
(b) What score would place a student at the 65th percentile?
2.
In past years, males have earned higher mean scores on the Verbal and
Math sections of the SAT. On the new Writing section, male scores followed a N(491, 11 0) distribution. Female scores followed a N(502, 108)
distribution .
(a) What percent of male test takers earned scores below the mean
score for female test takers?
(b) What percent of female test takers earned scores above the mean
score for male test takers?
(c) What percent of male test takers earned scores above the 85th percentile of the female scores?
3.
Here are the Writing scores on the new SAT for all test takers at a
medium-sized private high school :
Males:
Females:
660
540
500
620
530
520
440
700
580
430
460
720
610
580
520
630
540
490
760
720
550
71 0
560
530
470
670
530
690
600
570
570
690
550
700
690
530
500
430
660
520
690
600
580
550
420
700
500
460
480
670
560
580
630
690
580
590
640 640 560
620 580 570
580 530 630
550
550
520
570
580
640
530
610
540
640
630
570
520
490 630 620
560 780 450
560 630 610
(a) Did males or females at this school perform better? Give appropriate
graphical and numerical evidence to support your answer.
(b) How do the scores of the males and females at this school compare
with the national results? Write a few sentences discussing this issue.
(c) Is the distribution of scores for the males at this school approximately
Normal? How about the distribution of scores for the females at this
school? Justify your answer with appropriate statistical evidence.
Chapter Review
Chapter Review
Summary
This chapter focused on two big issues: describing an individual value's location
within a distribution of data and modeling distributions with density curves. Both
z-scores and percentiles provide easily calculated Ineasures of relative standing for
individuals. Density curves come in assorted shapes, but all share the property that
the area beneath the curve is 1. We can use areas under density curves to estimate
the proportion of individuals in a distribution whose values fall in a specified
range. In the special case of Normally distributed data, we can use the standard
Normal curve and Table A to calculate such areas. There are many real-world
examples of Normal distributions. When you n1eet a new set of data, you can use
the graphical and numerical tools discussed in this chapter to assess the Normality of the data.
What You Should Have Learned
Here is a review list of the most important skills you should have acquired from
your study of this chapter.
A. Measures of Relative Standing
1. Find the standardized value (z-score) of an observation. Interpret z-scores
in context.
2.
Use percentiles to locate individual values within distributions of data.
3. Apply Chebyshev's inequality to a given distribution of data.
B. Density Curves
1. Know that areas under a density curve represent proportions of all observations and that the total area under a density curve is 1.
2. Approximately locate the median (equal-areas point) and the mean (balance point) on a density curve.
3.
Know that the mean and median both lie at the center of a symmetric
density curve and that the Inean 1noves farther toward the long tail of a
skewed curve.
C. Normal Distributions
1. Recognize the shape of Normal curves and be able to estimate both the
n1ean and standard deviation from such a curve.
2.
Use the 68-95-99.7 rule and sym1netry to state what percent of the observations from a Normal distribution fall between two points when the
points lie at the mean or one, two, or three standard deviations on either
side of the n1ean.
162
CHAPTER 2
Describing Location in a Distribution
3.
Use the standard Normal distribution to calculate the proportion of values
in a specified range and to determine a z-score from a percentile.
4.
Given that a variable has the Normal distribution with mean p., and standard deviation 0', use Table A and your calculator to detern1ine the proportion of values in a specified range.
5.
Given that a variable has the Normal distribution with mean p., and standard deviation 0', calculate the point having a stated proportion of all values to the left or to the right of it.
D. Assessing Normality
1. Plot a histogram, stem plot, and/or boxplot to detennine if a distribution is
bell-sha peel.
2.
Determine the proportion of observations within one, two, and three
standard deviations of the mean, and cmnpare with the 68-95-99.7 rule
for Normal distributions.
3.
Construct and interpret Normal probability plots.
Web Links
You can get more information about standardized tests (like the PSAT, SAT, and
ACT) and intelligence tests (like the Wechsler Adult Intelligence Scale) online.
College Board Web site
www.collegeboard.com
ACT Web site
www.act.org
MENSA Web site
www.us.mensa.org
Chapter Review Exercises
2.51 IQ scores for children The scores of a reference population on the Wechsler Intelligence Scale for Children (WISC) are Normally distributed with JL = 100 and (J' = 15. A
school district classified children as "gifted" if their WISC score exceeded 13 5. There are
1300 sixth-graders in the school district. About how many of them are gifted? Show your
work.
2.52 Understanding density curves Remember that it is areas under a density curve, not
the height of the curve, that give proportions in a distribution. To illustrate this, sketch a
density curve that has its peak at 0 on the horizontal axis but has greater area within 0.25
on either side of 1 than within 0.25 on either side of 0.
2.53 Are the data Normal? Scores on the ACT test for the 2004 high school graduating
class had mean 20.9 and standard deviation 4.8. In all, 1,171,460 students in this class took
the test, and 1,052,490 of them had scores of 27 or lower. 10 If the distribution of scores
were Normal, what percent of scores would be 27 or lower? What percent of the actual
scores were 27 or lower? Does the Normal distribution describe the actual data well?
Chapter Review
2.54 Standardized test scores as percentiles Joey received a report that he scored in the
97th percentile on a national standardized reading test but in the 72nd percentile on the
math portion of the test.
(a) Explain to Joey's grandmother, who knows no statistics, what these numbers mean.
(b) Can we determine Joey's z-scores for his reading and math performance? Why or why
not?
2.55 Helmet sizes The army reports that the distribution of head circumference among
soldiers is approximately Normal with mean 22.8 inches and standard deviation 1.1
inches. Helmets are mass-produced for all except the smallest 5% and the largest 5% of
head sizes. Soldiers in the smallest or largest 5% get custom-made helmets. What head
sizes get custom-made helmets? Show your work.
2.56 More weird density curves A certain density curve consists of a straight-line segment
that begins at the origin, (0, 0), and has slope l.
(a) Sketch the density curve. What are the coordinates of the right endpoint of the segment?
(b) Determine the median, the first quartile (Q 1), and the third quartile (Q 3 ).
(c) Relative to the median, where would you expect the mean of the distribution to be
located?
(d) What percent of the observations lie below 0.5? Above 1.5?
2.57 Are the data Normal? A government report looked at the amount borrowed for college by students who graduated in 2000 and had taken out student loans. 11 The mean
amount was x = $ 17,776 and the standard deviation was s = $12,034. The quartiles were
Ql = $9900, M = $ 15,532, and Q3 = $22,500.
(a) Compare the mean :X and the median M. Also compare the distances of Q 1 and Q 3
from the median. Explain why both comparisons suggest that the distribution is rightskewed. Interpret this in the context of student loans.
(b) The right-skew increases the standard deviation. So a Normal distribution with the
same mean and standard deviation would have a third quartile larger than the actual Q3.
Find the third quartile of the Normal distribution with /.1. = $17,776 and cr = $12,034 and
compare it with Q3 = $22,500.
2.58 Osteoporosis Osteoporosis is a condition in which the bones become brittle due to
loss of minerals. To diagnose osteoporosis, an elaborate apparatus measures bone mineral
density (BMD). BMD is usually reported in standardized form. The standardization is
based on a population of healthy young adults. The World Health Organization (WHO)
criterion for osteoporosis is a BMD score that is 2.5 standard deviations below the mean
for young adults. BMD measurements in a population of people similar in age and sex
roughly follow a Normal distribution.
(a) What percent of healthy young adults have osteoporosis by the WHO criterion?
(b) Women aged 70 to 79 are of course not young adults. The mean BMD in this age group
is about- 2 on the standard scale for young adults. Suppose that the standard deviation is the
same as for young adults. What percent of this older population has osteoporosis?
164
CHAPTER 2
Describing Location in a Distribution
2.59 Normal probability plots Figure 2.27 displays Normal probability plots for four different sets of data. Describe what each plot tells you about the Normality of the given
data set.
Figure 2.27
Four Normal probability plots, for Exercise 2.59.
.. • •
•
•-2 •••
-1
-3
,
/
•
• ••
......
••
2
0
z-score
3
-3
• -•2
••·"
-1
0
1
2
3
z-score
(a)
,....
(b)
•
•
•
•
I
/
I'
•
•
..
-3
•
•
/
• • If
-2
•
-1
0
z-score
(c)
1
2
3
-3
• -2...1'
-1
0
1
2
3
z-score
(d)
2.60 Grading managers Many companies "grade on a bell curve" to compare the performance of their managers and professional workers. This forces the use of some low
performance ratings, so that not all workers are listed as "above average." Ford Motor
Company's "performance management process" for a time assigned 10% A grades, 80% B
grades, and 10% C grades to the company's 18,000 managers. Suppose that Ford's performance scores really are Normally distributed. This year, managers with scores less than 25
received C's and those with scores above 475 received A's. What are the mean and standard deviation of the scores? Show your work.
Chapter Review
Technology Toolbox
Finding areas with ShadeNorm
The TI-83/84/89 can be used to find the area to the left or right of a point or above an interval in a Normal distribution without referring to a standard Nonnal table. Consider the WISC scores for children
ofExercise 2.51. This distribution is N( 100, 15). Suppose we want to find the percent of children whose
WISC scores are above 125. Begin by specifying a viewing window as follows: X[55, 145]1 2 and
Y[ -0.008, 0.028J.oi· You will generally need to experiment with they settings to get a good graph.
TI-83/84
TI-89
• Press Pll!l l!im (DISTR), then choose DRAW
and 1 : ShadeNorm ( .
• Press lih-1ri11el! (i (Flash Apps) and choose
shadNorm(.
• Cmnplete the command ShadeNorm
( 1251 1E99 1100115) and preSS
• Complete the comtnand tis tat. shadNorm
( 125 1 1E99 100 1 15) and preSS
am.
am.
1
FlT
Tools
Area=.04779
low=125
/l
Area =. 04779
low=125.
UP=lE99
MAIN
RAD APPROX
up=l.E99
FUNC
You must always specify an interval. An area in the right tail of the distribution would theoretically be the interval (125, oo). The calculator limitation dictates that we use a nutnber that is
at least 5 or l 0 standard deviations to the right of the mean. To find the area to the left of 8 5,
you would specify ShadeNorm ( -1E9 9 8 5 10 0 15) . Or we could specify Shade
Norm ( 0 8 5 10 0 15) since WISC scores can't be negative. Both yield at least four-decimalplace accuracy. If you're using standard Normal values, then you need to specify only the endpoints of the interval; the 1nean 0 and standard deviation l will be understood. For example, use
ShadeNorm ( 11 2) to find the area above the interval z = l to z = 2 to four decimal places.
Compare this result to the 68-95-99.7 rule.
I
I
I
I
I
I
2.61 Made in the shade Use the calculator's ShadeNorm feature to find the following
areas for the WISC scores of Exercise 2. 51 (page 162), to four-decimal-place accuracy.
Then write your findings in a sentence.
(a) The relative frequency of scores greater than 135 .
(b) The relative frequency of scores lower than 7 5.
(c) Show two ways to find the percent of scores within two standard deviations of the
mean.
CHAPTER 2
Describing Location in a Distribution
Technology Toolbox
Finding areas with normalcdf
The normalcdf com1nand on the TI-83/84/89 can be used to find areas under a Normal curve.
This method has the advantage over ShadeNorn1 of being quicker, and the disadvantage of not
providing a picture of the area it is finding. Here are the keystrokes for the WISC scores of Exercise2.5l:
Tl-83/84
TI-89
• Press mDJ ~ (DISTR) and choose
2: normalcdf (.
• Press liti)fi111Iij [i] (Flash Apps) and choose
normCdf (.
• Complete
• Complete the con1mand tis tat. normCdf
( 12 5 1 E 9 9 1 0 0 15 ) and press [llliil.
command normalcdf
( 12 5 1 E 9 9 1 0 0 15 ) and press ll~lll;l
I
the
I
I
normalcdf(125,1E99,
100,15)
.0477903304
I
I
I
I ~=..........=~....L...:..::.=-==:....::.<...
• tistat.normcdf(l25,l. E99 .,_.
.04779033036
·•·~•eJt
MAIN
RAD APPRO X
FUNC
1/ 30
We can say that about 5% of the WISC scores are above 125. If the Normal values have already
been standardized, then you need to specify only the left and right endpoints of the interval. For
example, normalcdf ( -1~ 1) returns 0.6827, meaning that the area from z = -l to z = l is
approximately 0.6827, correct to four decimal places.
Chapter Review
2.62 Areas by calculator Use the calculator's normalcdf function to verify your answer to
Exercise 2. 53 (page 162).
Technology Toolbox
Finding values with invNorm
The TI-83/84/89 invNorm function calculates the raw or standardized Normal value corresponding to a known area under a Normal curve or a relative frequency. The following example uses the
WISC scores, which have a N(100, 15) distribution. Here are the keystrokes:
TI-83/84
TI-89
• Press PIDJ ~ (DISTR), then choose
3 : invNorm ( .
• Press Press llf.tif!11eln li) (Flash Apps) and
choose invNorm ( .
• Complete the command invNorm ( . 9
10 0 15) and press I@UII;J. Compare this
with the command invNorm ( . 9) .
• Complete the command tis tat. invNorm
(. 9 100 15) and press
Compare
this with the command tis tat. inv
Norm(. 9).
I
I
invNorm(.9,100,15)
119.2232735
invNorm ( . 9)
1.281551567
I
mm.
I
• tistat.invnorm( . 9,100,15)
119 223
• tistat.invnorm(.9)
1.28155
0
MAIN
RAD APPROX
FUNC
2/3 0
The first cmnmand finds that the WISC score that has 90% of the scores below it from the N(lOO,
15 ) distribution is x = 119. The second command finds that the standardized WISC score that has
90% of the scores below it is z = 1.28.
2.63 Inverse normal on the calculator Use the calculator's invNorm function to verify
your answer to Exercise 2.55 (page 163).