MM 207
Unit #5
The Normal Distribution
Copyright © 2009 Pearson Education, Inc.
Slide 1.1- 1
Section 5.1
WHAT IS NORMAL?
Suppose a friend is pregnant and due to give birth on
June 30. Would you advise her to schedule an important
business meeting for June 16, two weeks before the due
date?
Figure 5.1 is a histogram for a distribution of 300 natural births. The left vertical axis shows the number of births for each 4-day bin. The right vertical axis shows relative frequencies.
Figure 5.1
We can find the proportion of births that occurred more
than 14 days before the due date by adding the relative
frequencies for the bins to the left of -14.
These bins have a total relative frequency of about 0.21, which says that about 21% of the births in this data set occurred more than 14 days before the due date.
Figure 5.1
The Normal Shape
The distribution of the birth data has a fairly distinctive
shape, which is easier to see if we overlay the
histogram with a smooth curve (Figure 5.2).
For our present purposes, the shape of this smooth
distribution has three very important characteristics:
• The distribution is single-peaked. Its mode, or most
common birth date, is the due date.
• The distribution is symmetric around its single peak;
therefore, its median and mean are the same as its mode.
The median is the due date because equal numbers of
births occur before and after this date. The mean is also the
due date because, for every birth before the due date, there
is a birth the same number of days after the due date.
• The distribution is spread out in a way that makes it
resemble the shape of a bell, so we call it a “bell-shaped”
distribution.
Figure 5.3 Both distributions are normal and have the same mean
of 75, but the distribution on the left has a larger standard deviation.
Definition
The normal distribution is a symmetric, bell-shaped
distribution with a single peak. Its peak corresponds to
the mean, median, and mode of the distribution. Its
variation can be characterized by the standard deviation
of the distribution.
The Normal Distribution and Relative Frequencies
Relative Frequencies and the Normal Distribution
• The area that lies under the normal distribution curve
corresponding to a range of values on the horizontal axis
is the relative frequency of those values.
• Because the total relative frequency must be 1, the total
area under the normal distribution curve must equal 1, or
100%.
Figure 5.5 The percentage of the total area in any region under the
normal curve tells us the relative frequency of data values in that region.
EXAMPLE 2 Estimating Areas
Look again at the normal distribution in Figure 5.5.
a. Estimate the percentage of births occurring between 0 and 60
days after the due date.
Solution:
a. About half of the total area under the curve lies in the region
between 0 days and 60 days. This means that about 50% of
the births in the sample occur between 0 and 60 days after
the due date.
EXAMPLE 2 Estimating Areas
Look again at the normal distribution in Figure 5.5.
b. Estimate the percentage of births occurring between 14 days
before and 14 days after the due date.
Solution:
b. Figure 5.5 shows that about 18% of the births occur more
than 14 days before the due date. Because the distribution is
symmetric, about 18% must also occur more than 14 days
after the due date. Therefore, a total of about 18% + 18% =
36% of births occur either more than 14 days before or more
than 14 days after the due date. The question asks about
the remaining region, between 14 days before and 14 days
after the due date, so this region must represent
100% - 36% = 64% of the births.
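The area reasoning in part (b) can be cross-checked numerically. The sketch below uses Python's standard-library statistics.NormalDist (Python 3.8+) and assumes, as the "Toward Probability" discussion later in this unit does, that births are normally distributed around the due date with a standard deviation of 15 days; Figure 5.5 itself does not state that value, so treat it as an assumption.

```python
from statistics import NormalDist

# Assumption: birth dates measured in days relative to the due date (day 0),
# normally distributed with a standard deviation of 15 days.
births = NormalDist(mu=0, sigma=15)

early_tail = births.cdf(-14)               # born more than 14 days early
late_tail = 1 - births.cdf(14)             # born more than 14 days late
middle = births.cdf(14) - births.cdf(-14)  # within 14 days of the due date

print(f"more than 14 days early: {early_tail:.1%}")
print(f"more than 14 days late:  {late_tail:.1%}")
print(f"within 14 days:          {middle:.1%}")
```

Under that assumption each tail holds roughly 17.5% and the middle roughly 65%, close to the 18% and 64% read from the figure. The two tails come out equal, which is the symmetry argument used in the solution.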
When Can We Expect a Normal Distribution?
Conditions for a Normal Distribution
A data set that satisfies the following four criteria is likely
to have a nearly normal distribution:
1. Most data values are clustered near the mean, giving
the distribution a well-defined single peak.
2. Data values are spread evenly around the mean,
making the distribution symmetric.
3. Larger deviations from the mean become increasingly
rare, producing the tapering tails of the distribution.
4. Individual data values result from a combination of
many different factors, such as genetic and
environmental factors.
EXAMPLE 3 Is It a Normal Distribution?
Which of the following variables would you expect to
have a normal or nearly normal distribution?
a. Scores on a very easy test
Solution:
a. Tests have a maximum possible score (100%) that
limits the size of data values. If the test is easy, the
mean will be high and many scores will be close to the
maximum possible. The few lower scores may be
spread out well below the mean. We therefore expect
the distribution of scores to be left-skewed and nonnormal.
EXAMPLE 3 Is It a Normal Distribution?
Which of the following variables would you expect to
have a normal or nearly normal distribution?
b. Heights of a random sample of adult women
Solution:
b. Height is determined by a combination of many
factors (the genetic makeup of both parents and
possibly environmental or nutritional factors). We
expect the mean height for the sample to be close to
the mode (most common height). We also expect there
to be roughly equal numbers of women above and
below the mean, and extremely large and small heights
should be rare. We therefore expect height to be nearly
normally distributed.
Section 5.2
PROPERTIES OF THE NORMAL DISTRIBUTION
Consider a Consumer Reports survey in which
participants were asked how long they owned their
last TV set before they replaced it. The variable of
interest in this survey is replacement time for
television sets.
Based on the survey, the distribution of replacement
times has a mean of about 8.2 years, which we denote
as μ (the Greek letter mu).
The standard deviation of the distribution is about 1.1
years, which we denote as σ (the Greek letter sigma).
Making the reasonable assumption that the
distribution of TV replacement times is approximately
normal, we can picture it as shown in Figure 5.16.
Figure 5.16 Normal distribution for replacement times for TV sets with a
mean of μ = 8.2 years and a standard deviation of σ = 1.1 years.
A simple rule, called the 68-95-99.7 rule, gives quick
approximations for the percentage of data values
that lie within 1, 2, and 3 standard deviations of the
mean for any normal distribution.
Figure 5.17 Normal distribution illustrating the 68-95-99.7 rule.
The 68-95-99.7 Rule for a Normal Distribution
• About 68% (more precisely, 68.3%), or just over two-thirds, of the data points fall within 1 standard deviation
of the mean.
• About 95% (more precisely, 95.4%) of the data points
fall within 2 standard deviations of the mean.
• About 99.7% of the data points fall within 3 standard
deviations of the mean.
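The "more precise" percentages quoted in the rule are areas under the standard normal curve between -k and +k. A minimal sketch with Python's statistics.NormalDist (Python 3.8+) reproduces them:

```python
from statistics import NormalDist

standard = NormalDist()  # standard normal: mean 0, standard deviation 1

# Share of a normal distribution lying within k standard deviations of the mean
coverage = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}

for k, share in coverage.items():
    print(f"within {k} standard deviation(s): {share:.2%}")
```

Because the computation depends only on k, not on μ or σ, the same three percentages apply to every normal distribution, which is why the rule can be stated once for all of them.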
EXAMPLE 1 SAT Scores
The tests that make up the verbal (critical reading) and
mathematics SAT (and the GRE, LSAT, and GMAT) are
designed so that their scores are normally distributed with a
mean of μ = 500 and a standard deviation of σ = 100. Interpret
this statement.
Solution: From the 68-95-99.7 rule, about 68% of students
have scores within 1 standard deviation (100 points) of the
mean of 500 points; that is, about 68% of students score
between 400 and 600.
About 95% of students score within 2 standard deviations (200
points) of the mean, or between 300 and 700.
And about 99.7% of students score within 3 standard deviations
(300 points) of the mean, or between 200 and 800.
EXAMPLE 1 SAT Scores
Solution: (cont.)
Figure 5.18 shows this interpretation graphically; note
that the horizontal axis shows both actual scores and
distance from the mean in standard deviations.
Figure 5.18 Normal distribution for SAT scores, showing the percentages
associated with 1, 2, and 3 standard deviations.
EXAMPLE 2 Detecting Counterfeits
Vending machines can be adjusted to reject coins above and
below certain weights. The weights of legal U.S. quarters have a
normal distribution with a mean of 5.67 grams and a standard
deviation of 0.0700 gram. If a vending machine is adjusted to
reject quarters that weigh more than 5.81 grams or less than 5.53
grams, what percentage of legal quarters will be rejected by the
machine?
Solution: A weight of 5.81 is 0.14 gram, or 2 standard deviations,
above the mean. A weight of 5.53 is 0.14 gram, or 2 standard
deviations, below the mean. Therefore, by accepting
only quarters within the weight range 5.53 to 5.81 grams, the
machine accepts quarters that are within 2 standard deviations of
the mean and rejects those that are more than 2 standard
deviations from the mean. By the 68-95-99.7 rule, 95% of legal
quarters will be accepted and 5% of legal quarters will be rejected.
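The 5% figure comes from the rounded 68-95-99.7 rule. A short sketch using Python's statistics.NormalDist (Python 3.8+) confirms that both cutoffs sit 2 standard deviations from the mean and computes the unrounded normal-curve share outside them:

```python
from statistics import NormalDist

# Weights of legal quarters (grams), as given in the example
quarters = NormalDist(mu=5.67, sigma=0.0700)
low, high = 5.53, 5.81  # machine rejects anything outside this range

z_low = (low - quarters.mean) / quarters.stdev
z_high = (high - quarters.mean) / quarters.stdev

# Exact normal-curve share of legal quarters outside the accepted range
rejected = quarters.cdf(low) + (1 - quarters.cdf(high))

print(f"cutoffs sit at z = {z_low:.2f} and z = {z_high:.2f}")
print(f"legal quarters rejected: {rejected:.2%}")
```

The exact share is about 4.6%, slightly below the rule-of-thumb 5%, because 95% is a rounded version of 95.45%.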
Applying the 68-95-99.7 Rule
We can apply the 68-95-99.7 rule to determine when
data values lie 1, 2, or 3 standard deviations from the
mean.
For example, suppose that 1,000 students take an
exam and the scores are normally distributed with a
mean of μ = 75 and a standard deviation of σ = 7.
Figure 5.19 A normal distribution of test scores with a mean of 75 and a
standard deviation of 7. (a) 68% of the scores lie within 1 standard
deviation of the mean. (b) 95% of the scores lie within 2 standard deviations
of the mean.
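The figure gives the percentages, but the score ranges and head counts follow from simple arithmetic on μ ± kσ. A minimal sketch for this exam example, using the rule's rounded 68% and 95%:

```python
# Exam example: 1,000 students, scores normal with mean 75 and standard deviation 7
n_students = 1000
mu, sigma = 75, 7

bands = {}
for k, share in ((1, 0.68), (2, 0.95)):
    low, high = mu - k * sigma, mu + k * sigma
    expected_count = round(share * n_students)
    bands[k] = (low, high, expected_count)
    print(f"within {k} SD: scores {low} to {high}, about {expected_count} students")
```

So roughly 680 students should score between 68 and 82, and roughly 950 between 61 and 89.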
Identifying Unusual Results
In statistics, we often need to distinguish values that are
typical, or “usual,” from values that are “unusual.” By
applying the 68-95-99.7 rule, we find that about 95% of all
values from a normal distribution lie within 2 standard
deviations of the mean.
This implies that, among all values, 5% lie more than 2
standard deviations away from the mean. We can use
this property to identify values that are relatively
“unusual”:
Unusual values are values that are more than 2 standard
deviations away from the mean.
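The definition translates directly into a check on the distance from the mean measured in standard deviations. The helper names `standard_score` and `is_unusual` below are hypothetical, chosen for this sketch; note the strict "more than", so a value exactly 2 standard deviations away is not flagged.

```python
def standard_score(value: float, mean: float, sd: float) -> float:
    """Number of standard deviations `value` lies above (+) or below (-) the mean."""
    return (value - mean) / sd


def is_unusual(value: float, mean: float, sd: float, cutoff: float = 2.0) -> bool:
    """Rule of thumb: a value is unusual if it is MORE than `cutoff` SDs from the mean."""
    return abs(standard_score(value, mean, sd)) > cutoff


# SAT-style scale (mean 500, standard deviation 100)
print(is_unusual(650, 500, 100))  # 1.5 SDs above the mean -> False
print(is_unusual(720, 500, 100))  # 2.2 SDs above the mean -> True
```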
EXAMPLE 4 Normal Heart Rate
You measure your resting heart rate at noon every day for a year
and record the data. You discover that the data have a normal
distribution with a mean of 66 and a standard deviation of 4. On
how many days was your heart rate below 58 beats per minute?
Solution: A heart rate of 58 is 8 (or 2 standard deviations) below
the mean. According to the 68-95-99.7 rule, about 95% of the data
points are within 2 standard deviations of the mean.
Therefore, 2.5% of the data points are more than 2 standard
deviations below the mean, and 2.5% of the data points are more
than 2 standard deviations above the mean. On 2.5% of 365 days,
or about 9 days, your measured heart rate was below 58 beats per
minute.
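The count of about 9 days uses the rounded 2.5% tail from the 68-95-99.7 rule. The sketch below, using Python's statistics.NormalDist (Python 3.8+), sets that estimate beside the exact normal-curve area below 58:

```python
from statistics import NormalDist

# Resting heart rate from the example: mean 66, standard deviation 4
heart = NormalDist(mu=66, sigma=4)
days = 365

rule_estimate = 0.025 * days            # 68-95-99.7 rule: 2.5% in the lower tail
normal_estimate = heart.cdf(58) * days  # exact normal-curve area below 58

print(f"rule of thumb: {rule_estimate:.1f} days")
print(f"normal curve:  {normal_estimate:.1f} days")
```

The exact tail area (about 2.3%) gives roughly 8 days instead of 9; both are estimates, since a real year of measurements will not follow the model perfectly.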
EXAMPLE 5 Finding a Percentile
On a visit to the doctor’s office, your fourth-grade daughter is told
that her height is 1 standard deviation above the mean for her age
and sex. What is her percentile for height? Assume that heights of
fourth-grade girls are normally distributed.
Solution: Recall that a data value lies in the nth percentile of a
distribution if n% of the data values are less than or equal to it
(see Section 4.3). According to the 68-95-99.7 rule, 68% of the
heights are within 1 standard deviation of the mean. Therefore,
34% of the heights (half of 68%) are between 0 and 1 standard
deviation above the mean. We also know that, because the
distribution is symmetric, 50% of all heights are below the mean.
Therefore, 50% + 34% = 84% of all heights are less than 1
standard deviation above the mean (Figure 5.21). Your daughter is
in the 84th percentile for heights among fourth-grade girls.
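The percentile argument above (50% below the mean plus half of 68%) can be compared with the exact area to the left of z = 1. A short sketch with Python's statistics.NormalDist (Python 3.8+):

```python
from statistics import NormalDist

# Rule-of-thumb reasoning from the solution: 50% below the mean plus half of 68%
rule_percentile = 50 + 68 / 2

# Exact area to the left of z = 1 under the standard normal curve
exact_percentile = NormalDist().cdf(1.0) * 100

print(rule_percentile, round(exact_percentile, 1))
```

Both approaches place a value 1 standard deviation above the mean at about the 84th percentile.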
Figure 5.21 Normal distribution curve showing 84% of scores less than 1
standard deviation above the mean.
Standard Scores
Computing Standard Scores
The number of standard deviations a data value lies
above or below the mean is called its standard score (or
z-score), defined by
$$z = \text{standard score} = \frac{\text{data value} - \text{mean}}{\text{standard deviation}}$$
The standard score is positive for data values above the
mean and negative for data values below the mean.
EXAMPLE 6 Finding Standard Scores
The Stanford-Binet IQ test is scaled so that scores have a mean of
100 and a standard deviation of 16. Find the standard scores for
IQs of 85, 100, and 125.
Solution: We calculate the standard scores for these IQs by using
the standard score formula with a mean of 100 and standard
deviation of 16.
standard score for 85: $$z = \frac{85 - 100}{16} \approx -0.94$$
standard score for 100: $$z = \frac{100 - 100}{16} = 0.00$$
EXAMPLE 6 Finding Standard Scores
Solution: (cont.)
standard score for 125: $$z = \frac{125 - 100}{16} \approx 1.56$$
We can interpret these standard scores as follows: 85 is 0.94
standard deviation below the mean, 100 is equal to the mean, and
125 is 1.56 standard deviations above the mean.
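The standard score formula is a one-line function. This sketch (the function name `standard_score` is a hypothetical choice) applies it to the three IQ values from Example 6:

```python
def standard_score(value: float, mean: float, sd: float) -> float:
    """z = (data value - mean) / standard deviation"""
    return (value - mean) / sd


# Stanford-Binet scaling from the example: mean 100, standard deviation 16
iq_z = {iq: standard_score(iq, 100, 16) for iq in (85, 100, 125)}

for iq, z in iq_z.items():
    print(f"IQ {iq}: z = {z:+.2f}")
```

The unrounded values are -0.9375, 0, and 1.5625, which round to the -0.94, 0.00, and 1.56 used above.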
Figure 5.22 shows the values on the distribution of IQ
scores from Example 6.
Figure 5.22 Standard scores for IQ scores of 85, 100, and 125.
Standard Scores and Percentiles
Once we know the standard score of a data value, the
properties of the normal distribution allow us to find
its percentile in the distribution. This is usually done
with a standard score table, such as Table 5.1 (next
slide).
(Appendix A has a more detailed standard score
table.)
EXAMPLE 7 Cholesterol Levels
Cholesterol levels in men 18 to 24 years of age are normally
distributed with a mean of 178 and a standard deviation of 41.
a. What is the percentile for a 20-year-old man with a cholesterol
level of 190?
Solution:
a. The standard score for a cholesterol level of 190 is
$$z = \text{standard score} = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{190 - 178}{41} \approx 0.29$$
Table 5.1 shows that a standard score of 0.29 corresponds to about
the 61st percentile.
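Instead of reading Table 5.1, the percentile can be computed from the normal curve directly. A minimal sketch with Python's statistics.NormalDist (Python 3.8+):

```python
from statistics import NormalDist

# Cholesterol levels, men 18 to 24: mean 178, standard deviation 41
chol = NormalDist(mu=178, sigma=41)

z = (190 - chol.mean) / chol.stdev  # standard score for a level of 190
percentile = chol.cdf(190) * 100    # share of men at or below 190

print(f"z = {z:.2f}, percentile = {percentile:.1f}")
```

The unrounded z is about 0.293, giving roughly the 61.5th percentile, consistent with the table lookup.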
EXAMPLE 7 Cholesterol Levels
Cholesterol levels in men 18 to 24 years of age are normally
distributed with a mean of 178 and a standard deviation of 41.
b. What cholesterol level corresponds to the 90th percentile, the
level at which treatment may be necessary?
Solution:
b. Table 5.1 shows that 90.32% of all data values have a standard
score less than 1.3. Thus, the 90th percentile is about 1.3 standard
deviations above the mean. Given the mean cholesterol level of
178 and the standard deviation of 41, a cholesterol level 1.3
standard deviations above the mean is
$$178 + (1.3 \times 41) = 231.3$$
A cholesterol level of about 231 corresponds to the 90th percentile.
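Going from a percentile back to a data value is the inverse of the previous calculation. The sketch below uses `NormalDist.inv_cdf` from Python's statistics module (Python 3.8+) and sets it beside the table-based estimate:

```python
from statistics import NormalDist

chol = NormalDist(mu=178, sigma=41)

table_level = 178 + 1.3 * 41      # Table 5.1 approach: z rounded to 1.3
exact_level = chol.inv_cdf(0.90)  # exact 90th percentile (z is about 1.2816)

print(f"table estimate: {table_level:.1f}")
print(f"exact value:    {exact_level:.1f}")
```

The exact value is about 230.5; the table estimate is slightly higher because z = 1.3 corresponds to the 90.32nd percentile rather than exactly the 90th.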
Toward Probability
Suppose you pick a baby at random and ask whether the baby
was born more than 15 days prior to his or her due date. Because
births are normally distributed around the due date with a
standard deviation of 15 days, we know that 16% of all births
occur more than 15 days prior to the due date (see Example 3).
For an individual baby chosen at random, we can therefore say
that there’s a 0.16 chance (about 1 in 6) that the baby was born
more than 15 days early.
In other words, the properties of the normal distribution allow us
to make a probability statement about an individual. In this case,
our statement is that the probability of a birth occurring more
than 15 days early is 0.16.
This example shows that the properties of the normal distribution
can be restated in terms of ideas of probability.
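The link between a relative frequency and a probability can be illustrated by simulation. This sketch uses Python's statistics.NormalDist (Python 3.8+) with the assumption stated in the text (births centred on the due date, standard deviation 15 days). It draws many "babies" at random and compares the observed share born more than 15 days early with the model probability; the seed value 42 is arbitrary.

```python
from statistics import NormalDist

# Assumption from the text: births centred on the due date, standard deviation 15 days
births = NormalDist(mu=0, sigma=15)

p_early = births.cdf(-15)  # probability of being born more than 15 days early

# Simulate picking babies at random and count how often one was that early
sample = births.samples(100_000, seed=42)
observed = sum(day < -15 for day in sample) / len(sample)

print(f"model probability: {p_early:.3f}")
print(f"simulated share:   {observed:.3f}")
```

Both numbers come out near 0.16, about 1 in 6, matching the probability statement above.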
Section 5.3
THE CENTRAL LIMIT THEOREM
Suppose we roll one die 1,000 times and record the outcome
of each roll, which can be the number 1, 2, 3, 4, 5, or 6.
Figure 5.23 shows a histogram of outcomes. All six outcomes have roughly the same relative frequency, because the die is equally likely to land in each of the six possible ways. That is, the histogram shows a (nearly) uniform distribution (see Section 4.2).
It turns out that the distribution in Figure 5.23 has a mean of 3.41 and a standard deviation of 1.73.
Figure 5.23 Frequency and relative frequency distribution of outcomes from rolling one die 1,000 times.
Now suppose we roll two dice 1,000 times and record the mean
of the two numbers that appear on each roll. To find the mean
for a single roll, we add the two numbers and divide by 2.
Figure 5.25a shows a typical result. The most common values in this distribution are the central values 3.0, 3.5, and 4.0. These values are common because they can occur in several ways.
The mean and standard deviation for this distribution are 3.43 and 1.21, respectively.
Figure 5.25a Frequency and relative frequency distribution of sample means from rolling two dice 1,000 times.
Suppose we roll five dice 1,000 times and record the mean of the five numbers on each roll. A histogram for this experiment is shown in Figure 5.25b.
Once again we see that the central values around 3.5 occur most frequently, but the spread of the distribution is narrower than in the two previous cases.
The mean and standard deviation are 3.46 and 0.74, respectively.
Figure 5.25b Frequency and relative frequency distribution of sample means from rolling five dice 1,000 times.
If we further increase the number of dice to ten on each of 1,000 rolls, we find the histogram in Figure 5.25c, which is even narrower.
In this case, the mean is 3.49 and the standard deviation is 0.56.
Figure 5.25c Frequency and relative frequency distribution of sample means from rolling ten dice 1,000 times.
Table 5.2 shows that as the sample size increases, the mean
of the distribution of means approaches the value 3.5 and the
standard deviation becomes smaller (making the distribution
narrower).
More important, the distribution looks more and more like a
normal distribution as the sample size increases.
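The dice experiment is easy to repeat by simulation. The sketch below (the function name `dice_means` and the seed are hypothetical choices) rolls 1, 2, 5, and 10 fair dice 1,000 times each and records the mean of each roll. It then compares the spread of those means with σ/√n, where σ ≈ 1.71 is the standard deviation of a single fair die. Because the rolls are random, the printed values will differ somewhat from the slide figures, but they show the same pattern.

```python
import random
from math import sqrt
from statistics import mean, pstdev
from typing import List


def dice_means(n_dice: int, n_rolls: int = 1000, seed: int = 1) -> List[float]:
    """Roll `n_dice` fair dice `n_rolls` times; return the mean of each roll."""
    rng = random.Random(seed)
    return [
        sum(rng.randint(1, 6) for _ in range(n_dice)) / n_dice
        for _ in range(n_rolls)
    ]


sigma = pstdev(range(1, 7))  # standard deviation of one fair die, about 1.71

results = {}
for n in (1, 2, 5, 10):
    means = dice_means(n)
    results[n] = (mean(means), pstdev(means))
    print(f"{n:>2} dice: mean {results[n][0]:.2f}, "
          f"SD {results[n][1]:.2f}, sigma/sqrt(n) {sigma / sqrt(n):.2f}")
```

The mean of the means stays near 3.5 while the standard deviation shrinks roughly like σ/√n, the pattern that the Central Limit Theorem states formally.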
The Central Limit Theorem
Suppose we take many random samples of size n for a
variable with any distribution (not necessarily a normal
distribution) and record the distribution of the means of
each sample. Then,
1. The distribution of means will be approximately a
normal distribution for large sample sizes.
2. The mean of the distribution of means approaches the
population mean, μ, for large sample sizes.
3. The standard deviation of the distribution of means
approaches σ/√n for large sample sizes, where σ is
the standard deviation of the population.
Be sure to note the very important adjustment, described
by item 3 above, that must be made when working with
samples or groups instead of individuals:
The standard deviation of the distribution of sample
means is not the standard deviation of the population, σ,
but rather σ/√n, where n is the size of the samples.
TECHNICAL NOTE
(1) For practical purposes, the distribution of means
will be nearly normal if the sample size is larger than
30.
(2) If the original population is normally distributed,
then the sample means will be normally distributed
for any sample size n.
(3) In the ideal case, where the distribution of
means is formed from all possible samples, the
mean of the distribution of means equals μ and the
standard deviation of the distribution of means
equals σ/√n.
Figure 5.26 As the sample size increases (n = 5, 10, 30), the distribution of sample
means approaches a normal distribution, regardless of the shape of the original
distribution. The larger the sample size, the smaller is the standard deviation of the
distribution of sample means.
EXAMPLE 1 Predicting Test Scores
You are a middle school principal and your 100 eighth-graders are
about to take a national standardized test. The test is designed so
that the mean score is μ = 400 with a standard deviation of σ = 70.
Assume the scores are normally distributed.
a. What is the likelihood that one of your eighth-graders, selected
at random, will score below 375 on the exam?
Solution:
a. In dealing with an individual score, we use the method of
standard scores discussed in Section 5.2. Given the mean of
400 and standard deviation of 70, a score of 375 has a standard
score of
$$z = \text{standard score} = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{375 - 400}{70} \approx -0.36$$
EXAMPLE 1 Predicting Test Scores
Solution: (cont.)
According to Table 5.1, a standard score of -0.36 corresponds to
about the 36th percentile; that is, 36% of all students can be
expected to score below 375. Thus, there is about a 0.36 chance
that a randomly selected student will score below 375.
Notice that we need to know that the scores have a normal
distribution in order to make this calculation, because the table of
standard scores applies only to normal distributions.
EXAMPLE 1 Predicting Test Scores
You are a middle school principal and your 100 eighth-graders are
about to take a national standardized test. The test is designed so
that the mean score is μ = 400 with a standard deviation of σ = 70.
Assume the scores are normally distributed.
b. Your performance as a principal depends on how well your
entire group of eighth-graders scores on the exam. What is the
likelihood that your group of 100 eighth-graders will have a
mean score below 375?
Solution:
b. The question about the mean of a group of students must be
handled with the Central Limit Theorem. According to this
theorem, if we take random samples of size n = 100 students
and compute the mean test score of each group, the distribution
of means is approximately normal.
EXAMPLE 1 Predicting Test Scores
Solution: (cont.)
Moreover, the mean of this distribution is μ = 400 and its standard
deviation is σ/√n = 70/√100 = 7. With these values for the mean
and standard deviation, the standard score for a mean test score of
375 is
$$z = \text{standard score} = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{375 - 400}{7} \approx -3.57$$
Table 5.1 shows that a standard score of -3.5 corresponds to the
0.02th percentile, and the standard score in this case is even lower.
In other words, fewer than 0.02% of all random samples of 100
students will have a mean score of less than 375.
EXAMPLE 1 Predicting Test Scores
Solution: (cont.)
Therefore, the chance that a randomly selected group of 100
students will have a mean score below 375 is less than 0.0002,
or less than 1 in 5,000.
Notice that this calculation regarding the group mean did not
depend on the individual scores’ having a normal distribution.
This example has an important lesson. The likelihood of an
individual scoring below 375 is more than 1 in 3 (36%), but
the likelihood of a group of 100 students having a mean score
below 375 is less than 1 in 5,000 (0.02%).
In other words, there is much more variation in the scores of
individuals than in the means of groups of individuals.
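The contrast between parts (a) and (b) comes down to which standard deviation is used: σ for one student, σ/√n for the mean of n students. A minimal sketch with Python's statistics.NormalDist (Python 3.8+) computes both probabilities side by side:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 400, 70, 100  # test design and group size from the example

individual = NormalDist(mu, sigma)            # one student's score
group_mean = NormalDist(mu, sigma / sqrt(n))  # CLT: SD of sample means is sigma/sqrt(n)

p_individual = individual.cdf(375)
p_group = group_mean.cdf(375)

print(f"one student below 375: {p_individual:.3f}")
print(f"group mean below 375:  {p_group:.6f}")
```

The individual probability is about 0.36, while the group probability is below 0.0002, the same gap described above.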
The Value of the Central Limit Theorem
The Central Limit Theorem allows us to say something about
the mean of a group if we know the mean, μ, and the standard
deviation, σ, of the entire population. This can be useful, but it
turns out that the opposite application is far more important.
Two major activities of statistics are making estimates of
population means and testing claims about population means. Is
it possible to make a good estimate of the population mean
knowing only the mean of a much smaller sample?
As you can probably guess, being able to answer this type of
question lies at the heart of statistical sampling, especially in
polls and surveys. The Central Limit Theorem provides the key
to answering such questions.
Q & A