Chapter 6
The Standard Deviation as a
Ruler and the Normal Model
Copyright © 2009 Pearson Education, Inc.
NOTE on slides / What we can and cannot do

The following notice accompanies these slides, which have been downloaded
from the publisher’s Web site:
“This work is protected by United States copyright laws and is provided solely
for the use of instructors in teaching their courses and assessing student
learning. Dissemination or sale of any part of this work (including on the
World Wide Web) will destroy the integrity of the work and is not permitted.
The work and materials from this site should never be made available to
students except by instructors using the accompanying text in their classes.
All recipients of this work are expected to abide by these restrictions and to
honor the intended pedagogical purposes and the needs of other instructors
who rely on these materials.”

We can use these slides because we are using the text for this course.
Please help us stay legal. Do not distribute these slides any further.

The original slides are done in orange / brown and black. My additions are in
red and blue. Topics in green are optional.
Topics in this chapter









Shifting and Rescaling Data
Standardized values (z-scores)
Using the standard deviation as a ruler
The Normal Model
The 68-95-99.7 Rule
Finding normal percents
and the reverse
Normal Probability Plots
The Normality Assumption
Division of Mathematics, HCC
Course Objectives for Chapter 6
After studying this chapter, the student will be able to:
20. Compare values from two different distributions using their z-scores.
21. Use Normal models (when appropriate) and the 68-95-99.7
Rule to estimate the percentage of observations falling
within one, two, or three standard deviations of the mean.
22. Determine the percentages of observations that satisfy
certain conditions by using the Normal model and determine
“extraordinary” values, and the reverse.
23. Determine whether a variable satisfies the Nearly Normal
condition by making a normal probability plot or histogram.

Note: It is essential that this chapter be mastered. Almost
everything in Unit 3 depends on it.
Let’s review our summary statistics



Workers in a particular office have the following annual
salaries (in thousands)
 62, 62, 58, 54, 50, 46, 44
What are the summary statistics (rounded)?
 Mean: 53.7
 Median: 54
 Range: 18
 Standard Deviation: 7.34
The boss wants to reward everyone for a job well-done.
He can give them a one-time bonus or an extra raise.
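As a quick check, the summary statistics for the office salaries can be reproduced with Python's standard `statistics` module. This is only a sketch; the variable names are illustrative, and `stdev` is the sample standard deviation (it divides by n - 1), which is the version that matches the 7.34 quoted on the slide.

```python
import statistics

# Annual salaries in thousands of dollars, as listed on the slide.
salaries = [62, 62, 58, 54, 50, 46, 44]

mean = statistics.mean(salaries)
median = statistics.median(salaries)
data_range = max(salaries) - min(salaries)
sd = statistics.stdev(salaries)  # sample SD (divides by n - 1)

print(f"Mean: {mean:.1f}")
print(f"Median: {median}")
print(f"Range: {data_range}")
print(f"Standard deviation: {sd:.2f}")
```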
Option 1 - $5K Bonus





The data become
 67, 67, 63, 59, 55, 51, 49
The summary statistics become
 Mean: 58.7 (was 53.7)
 Median: 59 (was 54)
 Range: 18 (was 18)
 Standard Deviation: 7.34 (was 7.34)
What has changed? By what?
What has stayed the same?
This is called Shifting
Option 2 – 5% raise





The data become
 65.1, 65.1, 60.9, 56.7, 52.5, 48.3, 46.2
The summary statistics become
 Mean: 56.4 (was 53.7)
 Median: 56.7 (was 54)
 Range: 18.9 (was 18)
 Standard Deviation: 7.71 (was 7.34)
What has changed? By what?
What has stayed the same?
This is called Rescaling
Shifting Data

Shifting data:
 Adding (or subtracting) a constant to every data value
adds (or subtracts) the same constant to measures of
position.
 Adding (or subtracting) a constant to each value will
increase (or decrease) measures of position: center,
percentiles, max or min by the same constant.
 Shape and spread (range, IQR, standard deviation) remain unchanged. When we gave the employees a
$5K bonus, we shifted their salaries.
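The effect of shifting can be checked numerically. The sketch below adds the 5K bonus to every salary and compares summaries before and after; the helper name `summarize` is made up for illustration.

```python
import statistics

salaries = [62, 62, 58, 54, 50, 46, 44]   # thousands of dollars
shifted = [x + 5 for x in salaries]        # everyone gets a 5K bonus


def summarize(values):
    """Return measures of center and spread (illustrative helper)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),
    }


before = summarize(salaries)
after = summarize(shifted)

for name in before:
    change = after[name] - before[name]
    print(f"{name}: {before[name]:.2f} -> {after[name]:.2f} (change {change:+.2f})")
```

The mean and median each move up by 5, while the range and standard deviation show no change (up to floating-point rounding).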
Another example – shifting data





80 men of a particular height and body frame were
weighed.
The average weight in kilograms is 82.36.
The NIH recommends that the average be 74 kg.
Let’s shift by subtracting 74 kg from every weight.
The new mean is 82.36 - 74 = 8.36 kg. Note also
 If I weigh 80 kg, then I am +6 kg with respect to normal
weight.
 If I weigh 70 kg, then I am -4 kg with respect to normal
weight.
Shifting Data (cont.)

NIH example: The following histograms show a
shift from men’s actual weights to kilograms
above (or if negative, below) recommended
weight:
Rescaling Data

Rescaling data:
 When we multiply (or divide) all the data values
by any constant, all measures of position (such
as the mean, median, and percentiles) and
measures of spread (such as the range, the
IQR, and the standard deviation) are multiplied
(or divided) by that same constant.
 When we gave the employees a 5% raise, we
rescaled their salaries.
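Rescaling can be checked the same way. This sketch multiplies every salary by 1.05 (the 5% raise) and reports the ratio of each summary after the raise to its value before; the variable names are illustrative.

```python
import statistics

salaries = [62, 62, 58, 54, 50, 46, 44]    # thousands of dollars
raised = [x * 1.05 for x in salaries]       # 5% raise for everyone

# Ratio of each summary after the raise to the same summary before it.
mean_ratio = statistics.mean(raised) / statistics.mean(salaries)
median_ratio = statistics.median(raised) / statistics.median(salaries)
range_ratio = (max(raised) - min(raised)) / (max(salaries) - min(salaries))
sd_ratio = statistics.stdev(raised) / statistics.stdev(salaries)

print(f"mean ratio:   {mean_ratio:.3f}")
print(f"median ratio: {median_ratio:.3f}")
print(f"range ratio:  {range_ratio:.3f}")
print(f"SD ratio:     {sd_ratio:.3f}")
print(f"new SD:       {statistics.stdev(raised):.2f}")
```

Every ratio comes out as 1.05: measures of position and measures of spread are all multiplied by the same constant.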
Rescaling Data (cont.)

NIH Example: The men’s weight data set measured
weights in kilograms. If we want to think about these
weights in pounds, we would rescale the data:
Summary of effects

                     5K bonus (Shifted)   5% raise (Rescaled)
Mean                 Up amt of shift      Up same percent
Median               Up amt of shift      Up same percent
Range                No change            Up same percent
Standard Deviation   No change            Up same percent
Summary of Effects


Shifting: Adding (or subtracting) a constant to
every data value adds (or subtracts) the same
constant to measures of center, but leaves the
measure of spread unchanged.
Rescaling: When we multiply (or divide) every
data value by a constant, all measures of center
and spread are multiplied (or divided) by that
same constant.
Let’s do one more thing




We have our office salaries
 62, 62, 58, 54, 50, 46, 44
 Recall: Mean = 53.7, St. Dev. = 7.34
Let’s shift them so that the average is 0.
 We get 8.3, 8.3, 4.3, 0.3, -3.7, -7.7, -9.7
 Mean is (approximately) 0.
Now divide them by the standard deviation.
 We get 1.13, 1.13, 0.58, 0.04, -0.51, -1.05, -1.32
 Mean is still 0, Standard Deviation is 1.
We have “standardized” the salaries.
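The two steps (shift by the mean, then divide by the standard deviation) can be sketched in a few lines of Python. The z-scores here are computed from the unrounded mean and standard deviation, so a value may differ in the last digit from figures worked by hand with 53.7 and 7.34.

```python
import statistics

salaries = [62, 62, 58, 54, 50, 46, 44]   # thousands of dollars

mean = statistics.mean(salaries)
sd = statistics.stdev(salaries)            # sample SD (divides by n - 1)

# Step 1: shift so the mean is 0.  Step 2: rescale so the SD is 1.
shifted = [x - mean for x in salaries]
z_scores = [d / sd for d in shifted]

print([round(z, 2) for z in z_scores])
print(f"mean of z-scores: {statistics.mean(z_scores):.2f}")
print(f"SD of z-scores:   {statistics.stdev(z_scores):.2f}")
```

Note that the salary of 54 sits slightly above the mean of 53.7, so its z-score is slightly positive.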
Benefits of Standardizing



Standardized values have been converted from
their original units to the standard statistical unit
of standard deviations from the mean.
Thus, we can compare values that are measured
on different scales, with different units, or from
different populations.
Compare:


62, 62, 58, 54, 50, 46, 44
1.13, 1.13, 0.58, 0.04, -0.51, -1.05, -1.32
The Standard Deviation as a Ruler



The trick in comparing very different-looking
values is to use standard deviations as our rulers.
The standard deviation tells us how the whole
collection of values varies, so it’s a natural ruler
for comparing an individual to a group.
As the most common measure of variation, the
standard deviation plays a crucial role in how we
look at data.
Standardizing with z-scores

We compare individual data values to their mean,
relative to their standard deviation using the
following formula:
$$z = \frac{y - \bar{y}}{s}$$

We call the resulting values standardized values,
denoted as z. They can also be called z-scores.
Standardizing with z-scores (cont.)




Standardized values have no units.
z-scores measure the distance of each data
value from the mean in standard deviations.
That is, a z-score measures how many standard
deviations we are from the mean.
A negative z-score tells us that the data value is
below the mean, while a positive z-score tells us
that the data value is above the mean.
Standardizing with z-scores (cont.)

Standardizing data into z-scores shifts the data
by subtracting the mean and rescales the values
by dividing by their standard deviation.
 Standardizing into z-scores does not change
the shape of the distribution.
 Standardizing into z-scores changes the center
by making the mean 0.
 Standardizing into z-scores changes the
spread by making the standard deviation 1.
When Is a z-score BIG?




A z-score gives us an indication of how unusual a
value is because it tells us how far it is from the
mean.
In particular, the z-score of a data point measures
the number of standard deviations the data point is
from the mean.
Remember that a negative z-score tells us that the
data value is below the mean, while a positive z-score
tells us that the data value is above the mean.
The larger a z-score is (negative or positive), the
more unusual it is.
When Is a z-score Big? (cont.)




There is no universal standard for z-scores, but
there is a model that shows up over and over in
Statistics.
This model is called the Normal model (You may
have heard of “bell-shaped curves.”).
Normal models are appropriate for distributions
whose shapes are unimodal and roughly
symmetric.
These distributions provide a measure of how
extreme a z-score is.
When Is a z-score Big? (cont.)



There is a Normal model for every possible combination
of mean and standard deviation.
 We write N(μ,σ) to represent a Normal model with a
mean of μ and a standard deviation of σ.
We use Greek letters because this mean and standard
deviation do not come from data—they are numbers
(called parameters) that specify the model.
No real distribution is ever perfectly Normal (or a perfect match to any
other “nice” distribution). However, the Normal model is useful
for a wide variety of situations.
Example



Suppose we asked 30 people the question:
At what age did you get your first real job?
We could construct a histogram and see if
any pattern emerges.
Source (next ten slides including this one): Marc Boyer
and Martine Ferguson, slides for “Basic Statistics
Course”, given in Fall 2008 at the FDA Center for Food
Safety and Applied Nutrition.

Now suppose that we asked 300 people the
same question.

Observe as the number of people we ask
increases, the graph begins to take the shape of
what the population would look like.

The histogram now begins to take the shape of a
normal or Gaussian distribution because the
underlying distribution is normal.
Now suppose that we asked 3000 people
the same question.
As the number of people increases the
histogram appears more smooth.
The histogram now looks like a Normal
probability distribution.




Next we will see what the population looks like
(plot of the distribution of values in the entire
population).
The previous histograms had a vertical scale that
showed the percentage of observations in each
category.
Now the vertical axis doesn’t show the percent of
observations since we have an infinite
population.
We must start thinking in terms of area under the
curve.
The distribution of all values in the
population is no longer a histogram.
The area under the entire curve represents
the entire population, and the proportion of
that area that falls between two values is
the probability of observing a value in that
interval.




The mean is the center of the normal distribution.
The standard deviation gives an expression of
the spread.
A special case of the normal distribution is called
the standard normal distribution.
The standard normal distribution has mean zero
and standard deviation 1.


Why is a normal distribution like a lion?
They both have a mean µ!
When Is a z-score Big? (cont.)


Summaries of data, like the sample mean and
standard deviation, are written with Latin letters.
Such summaries of data are called statistics.
When we standardize Normal data, we still call the
standardized value a z-score, and we write
$$z = \frac{y - \mu}{\sigma}$$
When Is a z-score Big? (cont.)


Once we have standardized, we need only one
model:
 The N(0,1) model is called the standard
Normal model (or the standard Normal
distribution).
Be careful—don’t use a Normal model for just any
data set, since standardizing does not change the
shape of the distribution.
When Is a z-score Big? (cont.)


When we use the Normal model, we are
assuming the distribution is Normal.
We cannot check this assumption in practice, so
we check the following condition:
 Nearly Normal Condition: The shape of the
data’s distribution is unimodal and symmetric.
 This condition can be checked by making a
histogram or a Normal probability plot (to be
explained later).
A little history – Normal Model

First published in 1733 by Abraham de Moivre, in a short Latin paper later
included in the second edition of “The Doctrine of Chances” (1738).
 He had no idea how to apply it to experimental
observations.
 Context of estimating binomial (coin-toss, etc.)
probabilities for large n.
 The paper remained largely unknown until the
statistician Karl Pearson rediscovered it in
1924!
A little history – Normal Model





Pierre-Simon, Marquis Laplace - Analytical Theory of
Probabilities (1812) – first used the normal distribution in
1778 for the analysis of errors of experiments.
Carl Friedrich Gauss, who claimed to have used the
method since 1794, justified it rigorously in 1809
(independently of Laplace). Sometimes the Normal
distribution is referred to as the Gaussian distribution.
The name "bell curve" goes back to Jouffret who first used
the term "bell surface" in 1872 for a “multivariate normal”
distribution, i.e. an extension to three dimensions.
The name "normal distribution" was coined independently
by Charles S. Peirce, Francis Galton and Wilhelm Lexis
around 1875.
The independent discoveries show how naturally the
Normal Model arises.
The 68-95-99.7 Rule


Normal models give us an idea of how extreme a
value is by telling us how likely it is to find one
that far from the mean.
We can find these numbers precisely, but until
then we will use a simple rule that tells us a lot
about the Normal model…
The 68-95-99.7 Rule (cont.)

It turns out that in a Normal model:
 about 68% of the values fall within one
standard deviation of the mean;
 about 95% of the values fall within two
standard deviations of the mean; and,
 about 99.7% (almost all!) of the values fall
within three standard deviations of the mean.
The 68-95-99.7 Rule (cont.)

The following shows what the 68-95-99.7 Rule
tells us:
More on the 68-95-99.7 rule
If the population is normally distributed then:
1.
Approximately 68% of the observations are within 1
standard deviation of the population mean.
2.
Approximately 95% of the observations are within 2
standard deviations of the population mean.
3.
Approximately 99.7% of the observations are within
3 standard deviations of the population mean.
Source (this and the next 5 slides): Marc Boyer and Martine
Ferguson, slides for “Basic Statistics Course”, given in Fall
2008 at FDA Center for Food Safety and Applied Nutrition.
Approximately 68% of the observations fall within 1
standard deviation of the mean
More on the 68-95-99.7 rule




Note that the range "within one standard deviation of the mean" is
highlighted in green.
The area under the curve over this range is the relative frequency of
observations in the range.
That is, 0.68 = 68% of the observations fall within one standard
deviation of the mean (µ ± σ).
Below the axis, in red, is another set of numbers.
 These numbers are simply measures of standard deviations from
the mean.
 In working with the variable X we will often find it necessary to
convert into units of standard deviations from the mean.
 When the variable is measured this way, the letter Z is commonly
used.
Approximately 95% of the observations fall within 2
standard deviations of the mean
Approximately 99.7% of the observations fall within 3
standard deviations of the mean
The First Three Rules for Working with
Normal Models




Make a picture.
Make a picture.
Make a picture.
And, when we have data, make a histogram to
check the Nearly Normal Condition to make sure
we can use the Normal model to model the
distribution.
Example: 68-95-99.7 rule.



Example: For men aged 18 to 24, serum
cholesterol levels have a mean of 178 mg/100mL
with a standard deviation of 40.7 mg/100mL.
Pete’s cholesterol reading is 231.
Where is Pete with respect to the mean
cholesterol level?
Watching Pete’s Cholesterol level






231 – 178 = 53.
The standard deviation was 40.7.
53 / 40.7 ≈ 1.3, so Pete is 1.3 standard deviations above the mean.
Pete’s z-score is 1.3.
Because 1.3 lies between 1 and 2, the 68-95-99.7 Rule says that
between 68% and 95% of the stated population has a cholesterol
level closer to the mean than Pete’s. Between 2.5% and 16% have
a cholesterol level higher than Pete’s.
We can say more using technology.
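As one illustration of what technology adds, Python's standard library provides `statistics.NormalDist`. The sketch below assumes, as the slide does, that a Normal model is reasonable for these cholesterol levels; the variable names are illustrative.

```python
from statistics import NormalDist

mean, sd = 178, 40.7     # men aged 18 to 24, mg/100mL
pete = 231

z = (pete - mean) / sd   # Pete's z-score

# Under the standard Normal model, the fraction of the population above Pete
# is the area to the right of z.
fraction_higher = 1 - NormalDist().cdf(z)

print(f"z-score: {z:.2f}")
print(f"fraction with higher cholesterol: {fraction_higher:.3f}")
```

The result falls inside the 2.5% to 16% window given by the 68-95-99.7 Rule, as it should.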
Importance of the z-score




The z-score is a ruler for comparing populations,
even those which do not have the same mean
and standard deviation.
One study showed the mean cholesterol of
American women as 188 mg/100mL and a
standard deviation of 24 mg/100 mL.
By coincidence, Susan has a cholesterol reading
of 231 mg/100mL.
Whose is really higher – Pete’s or Susan’s?
Pete and Susan






Susan is above her mean by 231-188, or 43
mg/100mL.
43 / 24 = 1.792 standard deviations.
Susan’s z-score is 1.792.
Pete’s z-score is 1.3.
Susan’s is higher.
Medically, Susan may have a bigger problem
than Pete.
Pete and Susan




Susan’s reading is closer to the mean (43
mg/100ml vs. Pete’s of 53 mg/100ml).
But Susan’s population has smaller variability
than Pete’s.
This made Susan’s cholesterol more extreme
than Pete’s.
It’s about variability!
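The comparison can be sketched in Python. The helper `z_score` is a made-up name for the standardizing formula; the means and standard deviations are the ones quoted on the slides.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from mean."""
    return (value - mean) / sd


# Same reading (231 mg/100mL), different reference populations.
pete_z = z_score(231, mean=178, sd=40.7)
susan_z = z_score(231, mean=188, sd=24)

print(f"Pete:  {231 - 178} above the mean, z = {pete_z:.3f}")
print(f"Susan: {231 - 188} above the mean, z = {susan_z:.3f}")
```

Susan is fewer units above her mean, but her population's smaller standard deviation makes her reading more extreme.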
*Finding Normal Percentiles by Hand
(This is both slow and inaccurate)



When a data value doesn’t fall exactly 1, 2, or 3
standard deviations from the mean, we can look it
up in a table of Normal percentiles.
Table Z in Appendix D provides us with normal
percentiles, but many calculators and statistics
computer packages provide these as well.
Let’s use the technology. As for the tables – let’s
not and say we did!
*Finding Normal Percentiles by Hand (cont.)


Table Z is the standard Normal table. We have to convert
our data to z-scores before using the table.
The figure shows us how to find the area to the left when
we have a z-score of 1.80:
Finding Normal Percentiles Using Technology
Much preferred method



Many calculators and statistics programs have the
ability to find normal percentiles for us.
Both the TI and StatCrunch will easily do it.
The ActivStats Multimedia Assistant offers two methods
for finding normal percentiles:

The “Normal Model Tool” makes it easy to see how
areas under parts of the Normal model correspond to
particular cut points.

There is also a Normal table in which the picture of
the normal model is interactive.
Finding Normal Percentiles Using Technology
(cont.)
The following was produced with the “Normal
Model Tool” in ActivStats:
Finding Normal Percentiles Using the TI


To find the percent of values between z = -0.5
and z = 1:

Press 2nd VARS, which will get
you “DISTR”.

Press 2 for normalcdf(, then
ENTER.

When normalcdf( appears, type (-).5,1) using the
negation key (-), not the subtraction key.

Your answer will display.
To find the percent of values less than z = 1:

Enter normalcdf(-999,1) as above.

Similarly, we can enter
normalcdf(-.5,999) to find the
percent of values bigger than -0.5.
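For readers without a TI at hand, the same areas can be sketched in Python. The function name `normalcdf` below is only chosen to mirror the calculator command; it is a hypothetical helper built on the standard library's `statistics.NormalDist`, not a built-in.

```python
from statistics import NormalDist


def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the N(mu, sigma) curve between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


between = normalcdf(-0.5, 1)     # between z = -0.5 and z = 1
below_1 = normalcdf(-999, 1)     # very low lower bound, as on the TI
above = normalcdf(-0.5, 999)     # very high upper bound

print(f"between -0.5 and 1: {between:.4f}")
print(f"less than 1:        {below_1:.4f}")
print(f"bigger than -0.5:   {above:.4f}")
```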
Normalcdf with the 2.55 operating system



See the screen
captures at the right for
the percentile between
z = -0.5 and z = 1.
Enter -.5 and 1 in the
menu; then select
Paste.
Normalcdf appears on
the next screen (you
must scroll to see it
all.).
(Make a picture)^3 with ShadeNorm




Select [DISTR] as
before, but this time,
select [DRAW], then
ShadeNorm.
Enter -0.5 and 1.
The result is at the
lower right.
You might have to
select an appropriate
window.
Finding Normal Percentiles Using StatCrunch


Under Stat, select
“Calculators”, then
“Normal”.
First, select <= and
enter -0.5. Note the
answer as 0.3085.
Finding Normal Percentiles Using StatCrunch





Now select >= and
type 1. Note the
answer as 0.1587.
Then calculate
1 – 0.3085 – 0.1587
=0.5328
I recommend the TI –
it is faster and more
direct.
Another way with StatCrunch




Under Data, select
“Compute Expression”
Type the expression
as shown.
The answer will be
added to the first
nonempty column
(0.5328023).
I still recommend the
TI.
Let’s verify the 68-95-99.7 rule with the TI!

It turns out that in a Normal model:
 about 68% of the values fall within one
standard deviation of the mean:
Normalcdf((-)1,1)=.6826894809
 about 95% of the values fall within two
standard deviations of the mean:
Normalcdf((-)2,2)=.954499876
 about 99.7% (almost all!) of the values fall
within three standard deviations of the mean:
Normalcdf((-)3,3)=.9973000656
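The same verification can be sketched in Python with `statistics.NormalDist`. The later digits may differ slightly from the calculator's display, since the two use different numerical methods.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1

# Area within k standard deviations of the mean, for k = 1, 2, 3.
within = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, area in within.items():
    print(f"within {k} SD: {area:.4f}")
```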
From Percentiles to Scores: z in Reverse


Sometimes we start with areas and need to find
the corresponding z-score or even the original
data value.
Example: What z-score represents the first
quartile in a Normal model?
Z in reverse with the TI.





SAT Math scores are normally distributed with a
mean score of 500 and a standard deviation of 100.
Great Eastern Technical University brags that they
will consider for admission only students in the 90th
percentile of the population as measured by the
SAT Math scores.
What is the cutoff for admission consideration at
GETU?
There is a TI function, InvNorm “Inverse Normal”.
This function goes the other way from normalcdf.
SAT score with InvNorm




The arguments are InvNorm(Pct,Mean,StDev)
Because InvNorm takes only Percentile, it gives
the score corresponding to the area from the
percentile to the extreme left side.
InvNorm(0.90,500,100) = 628.155.
Since SAT scores are typically reported in units of
10, a 630 is required for admission consideration
at GETU.
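A Python counterpart to InvNorm is the `inv_cdf` method of `statistics.NormalDist`. The sketch below reproduces the 90th-percentile cutoff and then rounds up to the next multiple of 10; the rounding step reflects the slide's remark about reporting units and is otherwise an assumption.

```python
import math
from statistics import NormalDist

sat_math = NormalDist(mu=500, sigma=100)

# Score with 90% of the area to its left (the 90th percentile).
cutoff = sat_math.inv_cdf(0.90)

# Scores reported in units of 10: round up to the next reportable score.
reported_cutoff = math.ceil(cutoff / 10) * 10

print(f"cutoff: {cutoff:.3f}")
print(f"lowest reportable score at or above the cutoff: {reported_cutoff}")
```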
Using InvNorm



Enter 2nd, then DISTR, then InvNorm
Old Operating System:
 Enter “.9,500,100” (this includes entering the commas)
 Close Parentheses
 You will then have “invNorm(.9,500,100)” entered.
New Operating System: Fill in as shown below:
Using InvNorm




Notice that the default mean is 0, standard
deviation is 1.
Leaving it this way will get the z-score
corresponding to the 90th percentile, i.e.
InvNorm(0.9) = 1.2816.
This will be very useful when we get to Unit 3.
Z in reverse with the TI.






SAT Math scores are normally distributed with a mean
score of 500 and a standard deviation of 100.
Great Western Technical University brags that they will
consider for admission only students in the top 5% of
the population as measured by the SAT Math scores.
What is the cutoff for admission consideration at
GWTU?
Note that the cutoff for the top 5% is the same as the cutoff for the bottom 95%.
InvNorm(0.95,500,100) = 664.49
Again, if SAT reports in units of 10, you would need an
SAT Math score of 670 for consideration at GWTU.
What z-scores correspond to
the middle 95%?
Middle 95%


The z-score cutoffs for the middle 95% are +z and
–z. How to find z?
Issue:





InvNorm goes only from a cutoff to the extreme left.
We need to “fudge” to accommodate InvNorm!
There is 0.95 in the middle, plus 0.025 on the
extreme left, so the area to the left of +z is 0.975.
InvNorm(0.975) is 1.959963 or 1.96. By symmetry, the lower cutoff is -1.96.
This is extremely important for Unit 3. You need to
understand this and keep it in mind.
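The "fudge" generalizes to any middle fraction: add half of what is left over to get the area to the left of +z. A minimal Python sketch follows; the function name `middle_cutoff` is hypothetical.

```python
from statistics import NormalDist


def middle_cutoff(middle_fraction):
    """Positive z such that middle_fraction of the area lies between -z and +z."""
    # Area to the left of +z = the middle part plus the left tail.
    left_area = middle_fraction + (1 - middle_fraction) / 2
    return NormalDist().inv_cdf(left_area)


z = middle_cutoff(0.95)
print(f"middle 95%: between {-z:.2f} and {z:.2f}")
```

Calling `middle_cutoff(0.90)` or `middle_cutoff(0.99)` in the same way gives the cutoffs for other middle fractions.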
z in Reverse with StatCrunch


Start out the same way as before
This time, use the right-hand side and look for the
answer on the left (1.959964)
An Application to Test Scores
QUESTION 1
 The SAT Verbal has a mean of 500
and a standard deviation of 100.
 Pat got 650 on the SAT Verbal.
How well did Pat do?
An Application to Test Scores

ANSWER – actually, two answers!
 If we standardize Pat’s score, we get
$$z = \frac{650 - 500}{100}$$
Or a z-score of +1.5.
That is, Pat’s score is 1.5 standard deviations above
the mean SAT score of 500.
Only people who have had statistics think in terms of
z-scores, so let’s figure a percentile.
We use normalcdf(lower bound, 1.5), but what is the lower bound?
On the TI, use a very low lower bound such as -999.
Normalcdf((-)999,1.5)=0.9332, or 93.32%, a respectable job.
An Application to Test Scores

Tell: If Pat got a 650 on the SAT Verbal, his
score was in the 93.32 percentile.
An Application to Test Scores


QUESTION 2

One college that Pat is considering requires the ACT. Pat took it
as well and got a 27.

The ACT has a mean of 21 and a standard deviation of 4.7

How well did Pat do? As well as the SAT?
ANSWER

Standardizing Pat’s score, we get
$$z = \frac{27 - 21}{4.7}$$
Or a z-score of +1.28. Not quite as good.
As before, on the TI, use a low lower bound such as -999.
For a percentile, Normalcdf((-)999,1.28)=0.8997, or 89.97%, still a good
showing.
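Questions 1 and 2 can be worked side by side in Python. The sketch assumes, as the slides do, that both tests follow Normal models. It uses the unrounded ACT z-score, so its ACT percentile comes out a little below the 89.97% obtained from the rounded z of 1.28.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)
act = NormalDist(mu=21, sigma=4.7)

sat_z = (650 - sat.mean) / sat.stdev
act_z = (27 - act.mean) / act.stdev

# cdf gives the area to the left of the score, i.e. the percentile as a fraction.
sat_pct = sat.cdf(650)
act_pct = act.cdf(27)

print(f"SAT: z = {sat_z:.2f}, percentile = {100 * sat_pct:.2f}")
print(f"ACT: z = {act_z:.2f}, percentile = {100 * act_pct:.2f}")
```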
An Application to Test Scores

Tell: If Pat got a 27 on the ACT, which is
N(21,4.7), then Pat’s score was in the 89.97
percentile. This score was not as good as his
SAT score, which was in the 93.32 percentile.
An Application to Test Scores


QUESTION 3
 How well would Pat have to do on the ACT to match
his percentile (93.32) and equivalent z-score (1.5) on
the SAT?
ANSWER
 Remembering our standardization, we have (X is Pat’s
ACT score)
$$1.5 = \frac{X - 21}{4.7}$$
We solve for X: X = (1.5)(4.7) + 21 = 28.05.
Pat’s 27 falls about one point short. Since ACT scores are
reported in whole numbers, a 28 (z ≈ 1.49) is the score that
essentially matches the SAT result.
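Solving the standardization formula for X can be checked in Python in two ways: by un-standardizing the z-score directly, and by going through the percentile with `inv_cdf`. Both routes assume the Normal models N(500, 100) and N(21, 4.7) from the slides; the arithmetic works out to about 28.05.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)
act = NormalDist(mu=21, sigma=4.7)

# Route 1: match the z-score, then undo the standardization on the ACT scale.
target_z = (650 - sat.mean) / sat.stdev
act_needed = act.mean + target_z * act.stdev

# Route 2: match the percentile, then invert the ACT model.
act_from_percentile = act.inv_cdf(sat.cdf(650))

print(f"ACT score matching z = {target_z}: {act_needed:.2f}")
print(f"ACT score matching the SAT percentile: {act_from_percentile:.2f}")
```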
An Application to Test Scores

Tell: In order to do as well on the ACT as on the
SAT, Pat would need an ACT score of 28.
Shortcut with the TI



If we have, for
example, N(500,100)
and Pat’s 650, we do
not have to compute
the z-score.
Use normalcdf with
two more parameters
as shown.
normalcdf(-999,1.5) vs
normalcdf(-999,650,500,100)
Going the other way






Question 4: Chris scored in the 60th percentile.
 (a)
What is Chris’s z-score?
 (b)
What is Chris’s SAT score?
We can use the TI function InvNorm.
InvNorm(0.6) = 0.2533.
Then 0.2533 = (x – 500)/100
Then x = 525
Shortcut:
 InvNorm(0.6,500,100) = 525.335
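Both the two-step route and the shortcut can be sketched in Python, again assuming SAT scores follow N(500, 100) as stated on the slide.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)

# (a) z-score at the 60th percentile, from the standard Normal model.
z = NormalDist().inv_cdf(0.60)

# (b) Undo the standardization: x = mean + z * SD.
score_two_step = sat.mean + z * sat.stdev

# Shortcut: invert the N(500, 100) model directly.
score_shortcut = sat.inv_cdf(0.60)

print(f"z = {z:.4f}")
print(f"two-step score: {score_two_step:.3f}")
print(f"shortcut score: {score_shortcut:.3f}")
```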
Are You Normal? Normal Probability Plots


When you actually have your own data, you must
check to see whether a Normal model is
reasonable.
Looking at a histogram of the data is a good way
to check that the underlying distribution is roughly
unimodal and symmetric.
Are You Normal? Normal Probability Plots (cont)


A more specialized graphical display that can
help you decide whether a Normal model is
appropriate is the Normal probability plot.
If the distribution of the data is roughly Normal,
the Normal probability plot approximates a
diagonal straight line. Deviations from a straight
line indicate that the distribution is not Normal.
Are You Normal? Normal Probability Plots (cont)

Nearly Normal data have a histogram and a
Normal probability plot that look somewhat like
this example:
Are You Normal? Normal Probability Plots (cont)

A skewed distribution might have a histogram
and Normal probability plot like this:
Normal probability plots with the TI








Assume that your data are in L1 (the data from Chapter 5 are there.)
Data: 62, 63, 65, 66, 68, 70, 71, 73, 75
Choose X:[60,80],Y:[-4,4]
Press 2nd, Y= to bring up “STAT PLOT”
Pick one of the plots 1 through 3; say 1 and turn it on – make sure the
others are off.
The TYPE is the one at the lower right (the squiggly one)
Be sure that L1 is after “Data List”
Select ZOOM, then type 9
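To see what a Normal probability plot is made of, the points can be computed by hand in Python. This sketch pairs each sorted data value with a Normal score taken at the plotting position (i - 0.5) / n. That convention is an assumption; calculators and statistics packages may use slightly different positions. The correlation between the two columns is used here as a rough numerical stand-in for "how straight is the plot".

```python
from math import sqrt
from statistics import NormalDist, mean

data = sorted([62, 63, 65, 66, 68, 70, 71, 73, 75])
n = len(data)

# Assumed plotting positions; other conventions exist.
positions = [(i - 0.5) / n for i in range(1, n + 1)]
normal_scores = [NormalDist().inv_cdf(p) for p in positions]

# Each pair is one point on the Normal probability plot.
pairs = list(zip(data, normal_scores))


def correlation(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = correlation(data, normal_scores)

for value, score in pairs:
    print(f"{value:5d}  {score:6.2f}")
print(f"correlation of the plotted points: {r:.3f}")
```

A correlation close to 1 means the points lie near a straight line, which is consistent with the Nearly Normal Condition; looking at the picture is still the recommended check.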
Normal Probability Plot –
StatCrunch (called a QQ plot)







Assume that your data are in the first column
Select Graph, then QQ Plot
Select the column where your data are.
Continue on as in all of the other StatCrunch
graphs.
The graph comes up, but with the normal scale
on the x-axis and the data on the y-axis.
This is the opposite of how most books do it!
Again, I would recommend the TI.
Publisher’s Instructions:
Normal Probability (QQ) Plot





QQ Plot Displays the sample quantiles of a variable
versus the quantiles of a standard normal distribution.
Select the column(s) to be displayed in the plot(s).
Enter an optional Where clause to specify the data rows
to be included in the computation.
Select an optional Group by column to generate a
separate QQ plot for each distinct value of the Group by
column.
Click the Next button to specify graph layout options.
Click the Create Graph! button to create the plot(s).
What Can Go Wrong?

Don’t use a Normal model when the distribution is
not unimodal and symmetric.
An example – incorrect z-score



Below : µ = 0.5; σ = 0.288
The point 0.99 is actually at the 99th percentile.
If you assume N(.5,.288), the z-score would be
1.701; percentile would be 95.56!
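
The arithmetic on this slide can be checked with a short Python sketch. One assumption is made that the slide does not state outright: the pictured distribution is taken to be uniform on [0, 1], which is consistent with µ = 0.5 and σ ≈ 0.288 and with 0.99 sitting at the 99th percentile.

```python
from statistics import NormalDist

mu, sigma = 0.5, 0.288   # values from the slide
x = 0.99

# z-score and percentile if we (wrongly) assume a Normal model
z = (x - mu) / sigma
normal_percentile = 100 * NormalDist().cdf(z)

# ASSUMPTION: the pictured distribution is uniform on [0, 1],
# where the fraction of values below x is simply x.
uniform_percentile = 100 * x

print(f"z = {z:.3f}")
print(f"Normal-model percentile: {normal_percentile:.2f}")
print(f"Uniform percentile:      {uniform_percentile:.2f}")
```

The gap between the two percentiles is the cost of applying a Normal model to a distribution that is not unimodal.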
An example – incorrect z-score


Below : µ = 0.5; σ = 0.5
The point 5.98 would be at the 95th percentile.
If you assume N(.5,.5), the z-score of 5.98 would
be off the charts!
[Figure: plot with horizontal axis from 0.00 to 15.00 and vertical axis from 0 to 0.6]
What Can Go Wrong? (cont.)



Don’t use the mean and standard deviation when
outliers are present—the mean and standard
deviation can both be distorted by outliers.
Don’t round your results in the middle of a
calculation.
Don’t worry about minor differences in results.
What have we learned?

The story data can tell may be easier to
understand after shifting or rescaling the data.
 Shifting data by adding or subtracting the same
amount from each value affects measures of
center and position but not measures of
spread.
 Rescaling data by multiplying or dividing every
value by a constant changes all the summary
statistics—center, position, and spread.
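
A small Python sketch can confirm both statements. It reuses the nine values listed earlier for the TI plot; the shift of 10 and the factor of 2 are arbitrary choices made only for illustration.

```python
from statistics import mean, stdev

data = [62, 63, 65, 66, 68, 70, 71, 73, 75]  # values from the TI slide

shifted = [x + 10 for x in data]   # add the same amount to every value
rescaled = [x * 2 for x in data]   # multiply every value by a constant

# Shifting moves the mean by 10 but leaves the standard deviation alone.
print(mean(data), mean(shifted))
print(stdev(data), stdev(shifted))

# Rescaling multiplies both the mean and the standard deviation by 2.
print(mean(data), mean(rescaled))
print(stdev(data), stdev(rescaled))
```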
What have we learned? (cont.)

We’ve learned the power of standardizing data.
 Standardizing uses the SD as a ruler to
measure distance from the mean (z-scores).
 With z-scores, we can compare values from
different distributions or values based on
different units.
 z-scores can identify unusual or surprising
values among data.
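
As a quick illustration of comparing values from different distributions, here is a Python sketch. All the numbers in it are hypothetical (two made-up tests with made-up means and standard deviations), and the function name z_score is chosen just for this example.

```python
def z_score(value, mean, sd):
    """Distance of value from the mean, measured in standard deviations."""
    return (value - mean) / sd

# HYPOTHETICAL numbers, chosen only for illustration.
z_a = z_score(85, mean=75, sd=5)    # score of 85 on test A
z_b = z_score(90, mean=80, sd=10)   # score of 90 on test B

print(z_a, z_b)  # 2.0 1.0
```

Although 90 is the higher raw score, the 85 is the more unusual result because it lies 2 standard deviations above its mean.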
What have we learned? (cont.)

We’ve learned that the 68-95-99.7 Rule can be a
useful rule of thumb for understanding
distributions:
 For data that are unimodal and symmetric,
about 68% fall within 1 SD of the mean, 95%
fall within 2 SDs of the mean, and 99.7% fall
within 3 SDs of the mean.
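
The three percentages in the rule are rounded values from the Normal model. A minimal Python sketch, using the standard library's NormalDist, shows where they come from:

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard Normal model: mean 0, SD 1

# Proportion of the model within k standard deviations of the mean
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, p in within.items():
    print(f"within {k} SD: {100 * p:.1f}%")
```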
What have we learned? (cont.)

We see the importance of Thinking about
whether a method will work:
 Normality Assumption: We sometimes work
with Normal tables (Table Z). These tables are
based on the Normal model. But the TI is
faster and more accurate.
 Data can’t be exactly Normal, so we check the
Nearly Normal Condition by making a
histogram (is it unimodal, symmetric and free
of outliers?) or a normal probability plot (is it
straight enough?).
Example from the Video




The situation: A company fills cereal boxes to an
average of 16 oz. There are minor variations.
The standard deviation is 0.2 oz.
If the company sets its standard at 16.0 oz, half of
the boxes will be underweight.
Suppose the machine is set at 16.3 oz. What
percent of the boxes will be underweight?
Example from the Video




Easy way: With the TI,
normalcdf(-999,16,16.3,0.2) = 0.0668
It can also be worked by hand.
About 6.7%, or about 1 in 15 boxes will be
underweight.
Tell: If the machine is set at 16.3 oz with a st.
dev. of 0.2 oz, about 1 in 15 boxes will be
underweight.
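
The same percentage can be checked in Python. This sketch treats a box as underweight when it holds less than the 16 oz on the label and models fill weights as N(16.3, 0.2); the variable names are chosen only for this example.

```python
from statistics import NormalDist

label_weight = 16.0                      # boxes below this are underweight
fill = NormalDist(mu=16.3, sigma=0.2)    # machine set at 16.3 oz, SD 0.2 oz

# Area under the Normal model to the left of 16 oz
p_under = fill.cdf(label_weight)

print(f"Proportion underweight: {p_under:.4f}")
print(f"About 1 box in {1 / p_under:.0f}")
```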
Example from the Video




The company decides that 1 in 15 is too high.
They now want 1 in 25, or 4%. How high should
the machine be set?
With the TI,
invNorm(0.04) = -1.75
So 16 oz must sit 1.75 standard deviations below the
mean: the mean must be 1.75 × 0.2 = 0.35 oz above
16 oz.
Tell: The machine must be set for a mean of
16.35 oz to have 4% underweight boxes.
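
A short Python sketch of this reverse calculation, with the standard library's inv_cdf standing in for invNorm (variable names are chosen only for this example):

```python
from statistics import NormalDist

label_weight = 16.0   # boxes below this are underweight
sd = 0.2              # standard deviation of the filling machine, in oz
target = 0.04         # desired proportion of underweight boxes

# z-score with 4% of the Normal model to its left (negative)
z = NormalDist().inv_cdf(target)

# 16 oz must sit z standard deviations from the mean:
# z = (16 - mean) / sd, so mean = 16 - z * sd
required_mean = label_weight - z * sd

print(f"z = {z:.2f}")
print(f"Set the machine to about {required_mean:.2f} oz")
```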
Example from the Video






The company president wants less free cereal!
The mean must be set at 16.2 and no more than
4% underweight. What to do?
Change the standard deviation. We need
N(16.2,σ) and we have to find σ.
We know that the z-score that corresponds to 4%
underweight is “-1.75”.
We can then solve -1.75 = (16 – 16.2)/ σ for σ.
We get σ = 0.114.
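
The step of solving for σ can be sketched in Python as well. Here the z-score is computed rather than rounded to -1.75, so the result may differ slightly in later decimal places; variable names are chosen only for this example.

```python
from statistics import NormalDist

label_weight = 16.0   # boxes below this are underweight
mean_fill = 16.2      # mean the company president insists on
target = 0.04         # allowed proportion of underweight boxes

z = NormalDist().inv_cdf(target)   # about -1.75

# z = (16 - 16.2) / sigma, so sigma = (16 - 16.2) / z
sigma = (label_weight - mean_fill) / z

print(f"Required standard deviation: {sigma:.3f} oz")
```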
Example from the Video

Tell: The company must get the machine to box
cereal with a standard deviation of 0.114 ounces
in order to meet the stated objective of a mean of
16.2 oz. and no more than 4% underweight
boxes.
Topics in this chapter









Shifting and Rescaling Data
Standardized values (z-scores)
Using the standard deviation as a ruler
The Normal Model
The 68-95-99.7 Rule
Finding normal percents
and the reverse
Normal Probability Plots
The Normality Assumption
Division of Mathematics, HCC
Course Objectives for Chapter 6
After studying this chapter, the student will be able to:
20. Compare values from two different distributions using their
z-scores.
21. Use Normal models (when appropriate) and the 68-95-99.7
Rule to estimate the percentage of observations falling
within one, two, or three standard deviations of the mean.
22. Determine the percentages of observations that satisfy
certain conditions by using the Normal model and determine
“extraordinary” values, and the reverse.
23. Determine whether a variable satisfies the Nearly Normal
condition by making a normal probability plot or histogram.

Note: It is essential that this chapter be mastered. Almost
everything in Unit 3 depends on it.