Chapter 5
The Standard Deviation as a Ruler and the Normal Model
Copyright © 2014, 2012, 2009 Pearson Education, Inc.
NOTE on slides / What we can and cannot do

The following notice accompanies these slides, which have been downloaded from
the publisher’s Web site:
“This work is protected by United States copyright laws and is provided solely for the
use of instructors in teaching their courses and assessing student learning.
Dissemination or sale of any part of this work (including on the World Wide Web) will
destroy the integrity of the work and is not permitted. The work and materials from
this site should never be made available to students except by instructors using the
accompanying text in their classes. All recipients of this work are expected to abide
by these restrictions and to honor the intended pedagogical purposes and the needs
of other instructors who rely on these materials.”

Some of these slides are taken from the Third Edition; others are my own additions.
We can use these slides because we are using the text for this course. Please help
us stay legal. Do not distribute these slides any further.

The original slides are done in green / red and black. My additions are in red and blue.

Topics in brown and maroon are optional.
Topics in this chapter









Shifting and Rescaling Data
Standardized values (z-scores)
Using the standard deviation as a ruler
The Normal Model
The 68-95-99.7 Rule
Finding normal percents and the reverse
Normal Probability Plots
The Normality Assumption
Division of Mathematics, HCC
Course Objectives for Chapter 5
After studying this chapter, the student will be able to:
19. Compare values from two different distributions using their z-scores.
20. Use Normal models (when appropriate) and the 68-95-99.7 Rule to estimate the percentage of observations falling within one, two, or three standard deviations of the mean.
21. Determine the percentages of observations that satisfy certain conditions by using the Normal model and determine “extraordinary” values.
22. Determine whether a variable satisfies the Nearly Normal condition by making a normal probability plot or histogram.
23. Determine the z-score that corresponds to a given percentage of observations.

Note: It is essential that this chapter be mastered. Almost everything in
Unit 3 depends on it.
5.2 Shifting and Scaling
National Health and Examination Survey
• Who? 80 male participants between 19 and 24 who measured between 68 and 70 inches tall
• What? Their weights in kilograms
• When? 2001 – 2002
• Where? United States
• Why? To study nutrition and health issues and trends
• How? National survey
Shifting Weights
• Mean: 82.36 kg
• Maximum Healthy Weight: 74 kg
• How are shape, center, and spread affected when 74 is subtracted from all values?
• Shape and spread are unaffected.
• Center is shifted down by 74 (the new mean is 82.36 − 74 = 8.36 kg).
Rules for Shifting
• If the same number is added to or subtracted from all data values, then:
  • The measures of spread (standard deviation, range, and IQR) are all unaffected.
  • The measures of position (mean, median, and mode) are all changed by that number.
Rescaling
• If we multiply all data values by the same number, what happens to the position and spread?
• To go from kg to lbs, multiply by 2.2.
• The mean and spread are also multiplied by 2.2.
How Rescaling Affects the Center and Spread
• When we multiply (or divide) all the data values by a constant, all measures of position and all measures of spread are multiplied (or divided) by that same constant.
Example: Rescaling Combined Times in the Olympics
• The mean and standard deviation in the men’s combined event at the Olympics were 168.93 seconds and 2.90 seconds, respectively.
• If the times are measured in minutes, what will be the new mean and standard deviation?
• Mean: 168.93 / 60 = 2.816 minutes
• Standard Deviation: 2.90 / 60 = 0.048 minutes
Example: Office reward
Workers in a particular office have the following annual salaries (in thousands):
• 62, 62, 58, 54, 50, 46, 44
What are the summary statistics (rounded)?
• Mean: 53.7
• Median: 54
• Range: 18
• Standard Deviation: 7.34
The boss wants to reward everyone for a job well done. He can give them a one-time bonus or an extra raise.
Option 1 - $5K Bonus (Shifting)
The data become
• 67, 67, 63, 59, 55, 51, 49
The summary statistics become
• Mean: 58.7 (was 53.7)
• Median: 59 (was 54)
• Range: 18 (was 18)
• Standard Deviation: 7.34 (was 7.34)
What has changed? By what?
What has stayed the same?
Option 2 – 5% raise (Rescaling)
The data become
• 65.1, 65.1, 60.9, 56.7, 52.5, 48.3, 46.2
The summary statistics become
• Mean: 56.4 (was 53.7)
• Median: 56.7 (was 54)
• Range: 18.9 (was 18)
• Standard Deviation: 7.71 (was 7.34)
What has changed? By what?
What has stayed the same?
Summary of effects

Statistic            5K bonus (Shifted)   5% raise (Rescaled)
Mean                 Up amt of shift      Up same percent
Median               Up amt of shift      Up same percent
Range                No change            Up same percent
Standard Deviation   No change            Up same percent
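
The shifting and rescaling effects in the office example can be checked numerically. The following is a minimal Python sketch using only the standard library; the helper name `summarize` is an illustrative choice and is not part of the slides.

```python
from statistics import mean, median, stdev

salaries = [62, 62, 58, 54, 50, 46, 44]  # annual salaries in thousands (from the slides)

def summarize(data):
    """Return (mean, median, range, sample standard deviation)."""
    return mean(data), median(data), max(data) - min(data), stdev(data)

shifted = [x + 5 for x in salaries]       # Option 1: $5K bonus (shifting)
rescaled = [x * 1.05 for x in salaries]   # Option 2: 5% raise (rescaling)

for label, data in [("original", salaries), ("shifted", shifted), ("rescaled", rescaled)]:
    m, med, rng, sd = summarize(data)
    print(f"{label:9s} mean={m:.1f} median={med:.1f} range={rng:.1f} sd={sd:.2f}")
```

Shifting moves the mean and median by 5 and leaves the range and standard deviation alone; rescaling multiplies all four by 1.05.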
5.1 Standardizing with z-Scores
Let’s do one more thing
We have our office salaries
• 62, 62, 58, 54, 50, 46, 44
• Recall: Mean = 53.7, St. Dev. = 7.34
Let’s shift them so that the average is 0.
• We get 8.3, 8.3, 4.3, 0.3, -3.7, -7.7, -9.7
• Mean is (approximately) 0.
Now divide them by the standard deviation.
• We get 1.13, 1.13, 0.59, 0.04, -0.50, -1.05, -1.32
• Mean is still 0, Standard Deviation is 1.
We have “standardized” the salaries.
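
The two standardizing steps (subtract the mean, then divide by the standard deviation) can be sketched in Python as follows. This is an illustration, not part of the slides; because it uses the unrounded mean and standard deviation, a few second decimals may differ slightly from the hand-rounded values above. Note that 54 lies just above the mean, so its z-score is slightly positive.

```python
from statistics import mean, stdev

salaries = [62, 62, 58, 54, 50, 46, 44]  # in thousands (from the slides)

y_bar = mean(salaries)   # sample mean
s = stdev(salaries)      # sample standard deviation

# Step 1: shift so the mean is 0. Step 2: divide by the standard deviation.
z_scores = [(y - y_bar) / s for y in salaries]

print([round(z, 2) for z in z_scores])
print(f"mean of z = {mean(z_scores):.6f}, sd of z = {stdev(z_scores):.6f}")
```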
Benefits of Standardizing
• Standardized values have been converted from their original units to the standard statistical unit of standard deviations from the mean.
• Thus, we can compare values that are measured on different scales, with different units, or from different populations.
• Compare:
  – 62, 62, 58, 54, 50, 46, 44
  – 1.13, 1.13, 0.59, 0.04, -0.50, -1.05, -1.32
Comparing Athletes
• Natalya Dobrynska (Ukraine) won gold in the women’s heptathlon at the Olympics with a long jump of 6.63 m, about 0.5 m longer than average.
• Hyleas Fountain (USA) won the 200 m run with a time of 23.21 s, 1.5 s faster than average.
• Whose performance was more impressive?
How Many Standard Deviations Above?
             Long Jump   200 m Run
Mean         6.11 m      24.71 s
SD           0.24 m      0.70 s
Individual   6.63 m      23.21 s

• The standard deviation helps us compare.
• Long Jump:
  • 1 SD above: 6.11 + 0.24 = 6.35
  • 2 SD above: 6.11 + (2)(0.24) = 6.59
  • 6.63 m is just over 2 standard deviations above the mean.
The z-Score
• In general, to find the distance between the value and the mean in standard deviations:
  1. Subtract the mean from the value.
  2. Divide by the standard deviation.
$$z = \frac{y - \bar{y}}{s}$$
• This is called the z-score.
The z-score
• The z-score measures the distance of the value from the mean in standard deviations.
• A positive z-score indicates the value is above the mean.
• A negative z-score indicates the value is below the mean.
• A small z-score indicates the value is close to the mean when compared to the rest of the data values.
• A large z-score indicates the value is far from the mean when compared to the rest of the data values.
How Many Standard Deviations Above?
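             Long Jump   200 m Run
Mean         6.11 m      24.71 s
SD           0.24 m      0.70 s
Individual   6.63 m      23.21 s

• Standard Deviations from the Mean
Long Jump: $z = \frac{6.63 - 6.11}{0.24} = 2.17$
200 m Run: $z = \frac{23.21 - 24.71}{0.70} = -2.14$ (negative is better here, because a lower time is faster)
• Natalya Dobrynska’s long jump (2.17 standard deviations better than the mean) was a little more impressive than Hyleas Fountain’s 200 m run (2.14 standard deviations better than the mean).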
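
The comparison of the two athletes can be reproduced with a small helper. This is a minimal sketch; the function name `z_score` is an illustrative choice, and the means and standard deviations are the ones given in the table.

```python
def z_score(value, mean, sd):
    """Distance of value from the mean, measured in standard deviations."""
    return (value - mean) / sd

z_long_jump = z_score(6.63, 6.11, 0.24)   # longer is better, so positive is good
z_200m = z_score(23.21, 24.71, 0.70)      # faster is better, so negative is good

print(f"long jump z = {z_long_jump:.2f}")
print(f"200 m run z = {z_200m:.2f}")
# Compare sizes: which performance is further from average in the good direction?
print("long jump more impressive:", abs(z_long_jump) > abs(z_200m))
```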
Shifting, Scaling, and z-Scores
• Converting to z-scores:
  • Subtract the mean
  • Divide by the standard deviation
• The shape of the distribution does not change.
• Changes the center by making the mean 0 ($\bar{y} \to 0$)
• Changes the spread by making the standard deviation 1 ($s \to 1$)
Example: SAT and ACT Scores
• How high does a college-bound senior need to score on the ACT to match the SAT score that marks the top quarter at a college whose middle 50% of SAT scores lies between 1530 and 1850?
• SAT: Mean = 1500, Standard Deviation = 250
• ACT: Mean = 20.8, Standard Deviation = 4.8
• Think →
  • Plan: Want the ACT score for the upper quarter. Have ȳ and s.
  • Variables: Both are quantitative. Units are points.
Show → Mechanics: Standardize the Variable
• It is known that the middle 50% of SAT scores are between 1530 and 1850, ȳ = 1500, s = 250.
• The top quarter starts at 1850.
• Find the z-score: $z = \frac{1850 - 1500}{250} = 1.40$
• For the ACT, 1.40 standard deviations above the mean: $20.8 + 1.40(4.8) = 27.52$
Conclusion
• To be in the top quarter of applicants in terms of combined SAT scores, a college-bound senior would need to have an ACT score of at least 27.52.
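
The SAT-to-ACT conversion is a standardize-then-unstandardize calculation. A minimal Python sketch, using the means and standard deviations stated in the example (the variable names are illustrative):

```python
sat_mean, sat_sd = 1500, 250
act_mean, act_sd = 20.8, 4.8

sat_cutoff = 1850                        # start of the top quarter at this college
z = (sat_cutoff - sat_mean) / sat_sd     # standardize on the SAT scale
act_cutoff = act_mean + z * act_sd       # unstandardize on the ACT scale

print(f"z = {z:.2f}, equivalent ACT score = {act_cutoff:.2f}")
```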
Practice
Example – Which student performed better?
• Student A received an 85 on a 100 point quiz with a mean of 90 and standard deviation of 5.
• Student B received a 35 on a 50 point quiz with a mean of 37 and a standard deviation of 3.
• We must compare z-scores!
Student A: $z = \frac{85 - 90}{5} = -1$
Student B: $z = \frac{35 - 37}{3} = -\frac{2}{3} \approx -0.67$
Student B did better: both scores are below their class means, but B’s is fewer standard deviations below.
Source: Mrs. Emily Francis, Instructor of Mathematics, HCC
Who is relatively taller:
• A non-basketball-playing man who is 75 inches tall (assume non-basketball-playing men have a mean height of 71.5 inches and a standard deviation of 2.1 inches).
• A male basketball player who is 85 inches tall (assume male basketball players have a mean height of 80 inches and a standard deviation of 3.3 inches).
z(Nonplayer): $\frac{75 - 71.5}{2.1} = +1.667$
z(Player): $\frac{85 - 80}{3.3} = +1.515$
The non-player is relatively taller.
Source: Mrs. Emily Francis, Instructor of Mathematics, HCC
5.3 Normal Models
Models
• “All models are wrong, but some are useful.” George Box, statistician
• −1 < z < 1: Not uncommon
• z = ±3: Rare
• z = 6: Shouts out for attention!
Example
• Suppose we asked 30 people the question: At what age did you get your first real job?
• We could construct a histogram and see if any pattern emerges.
Source (next ten slides including this one): Marc Boyer and Martine Ferguson, slides for “Basic Statistics Course”, given in Fall 2008 at the FDA Center for Food Safety and Applied Nutrition.
• Now suppose that we asked 300 people the same question.
• Observe that as the number of people we ask increases, the graph begins to take the shape of what the population would look like.
• The histogram now begins to take the shape of a normal or Gaussian distribution because the underlying distribution is normal.
• Now suppose that we asked 3000 people the same question.
• As the number of people increases, the histogram appears smoother.
• The histogram now looks like a Normal probability distribution.
• Next we will see what the population looks like (a plot of the distribution of values in the entire population).
• The previous histograms had a vertical scale that showed the percentage of observations in each category.
• Now the vertical axis doesn’t show the percent of observations, since we have an infinite population.
• We must start thinking in terms of area under the curve.
• The distribution of all values in the population is no longer a histogram.
• The area under the entire curve represents the entire population, and the proportion of that area that falls between two values is the probability of observing a value in that interval.
• The mean is the center of the normal distribution.
• The standard deviation gives an expression of the spread.
• A special case of the normal distribution is called the standard normal distribution.
• The standard normal distribution has mean zero and standard deviation 1.
The Normal Model
• Bell Shaped: unimodal, symmetric
• There is a Normal model for every mean and standard deviation.
• μ (read “mew”) represents the population mean.
• σ (read “sigma”) represents the population standard deviation.
• N(μ, σ) represents a Normal model with mean μ and standard deviation σ.
A little history – Normal Model
• First published in 1733 by Abraham de Moivre (and later included in the 1738 second edition of “The Doctrine of Chances”).
• He had no idea how to apply it to experimental observations.
• Context: estimating binomial (coin-toss, etc.) probabilities for large n.
• The paper remained unknown until the statistician Karl Pearson discovered it in 1924!
A little history – Normal Model
• Pierre-Simon, Marquis de Laplace (Analytical Theory of Probabilities, 1812) first used the normal distribution in 1778 for the analysis of errors of experiments.
• Carl Friedrich Gauss, who claimed to have used the method since 1794, justified it rigorously in 1809 (independently of Laplace). Sometimes the Normal distribution is referred to as the Gaussian distribution.
• The name "bell curve" goes back to Jouffret, who first used the term "bell surface" in 1872 for a “multivariate normal” distribution, i.e. an extension to three dimensions.
• The name "normal distribution" was coined independently by Charles S. Peirce, Francis Galton and Wilhelm Lexis around 1875.
• The independent discoveries show how naturally the Normal Model arises.
Parameters and Statistics
• Parameters: Numbers that help specify the model
  • μ, σ
• Statistics: Numbers that summarize the data
  • ȳ, s, median, mode (We will see this in Chapter 10).
• N(0, 1) is called the standard Normal model, or the standard Normal distribution.
• The Normal model should only be used if the data are approximately symmetric and unimodal.
The 68-95-99.7 Rule
(also called the “Empirical Rule”)
• 68% of the values fall within 1 standard deviation of the mean.
• 95% of the values fall within 2 standard deviations of the mean.
• 99.7% of the values fall within 3 standard deviations of the mean.
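
The three percentages in the rule are rounded areas under the Normal curve. The sketch below recomputes them with `statistics.NormalDist` from the Python standard library (Python 3.8 or later is assumed); it is an illustration and not one of the tools used in the slides.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # the standard Normal model N(0, 1)

# Proportion of a Normal population within k standard deviations of the mean
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, p in within.items():
    print(f"within {k} SD of the mean: {p:.4f}")
```

The values come out close to, but not exactly, 68%, 95%, and 99.7%, which is why the rule is treated as an approximation.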
More on the 68-95-99.7 rule
If the population is normally distributed then:
1. Approximately 68% of the observations are within 1
standard deviation of the population mean.
2. Approximately 95% of the observations are within 2
standard deviations of the population mean.
3. Approximately 99.7% of the observations are within 3
standard deviations of the population mean.
Source (this and the next 5 slides): Marc Boyer and Martine Ferguson,
slides for “Basic Statistics Course”, given in Fall 2008 at FDA
Center for Food Safety and Applied Nutrition.
Approximately 68% of the observations fall within 1
standard deviation of the mean
More on the 68-95-99.7 rule
• Note that the range "within one standard deviation of the mean" is highlighted in green.
• The area under the curve over this range is the relative frequency of observations in the range.
• That is, 0.68 = 68% of the observations fall within one standard deviation of the mean (µ ± σ).
• Below the axis, in red, is another set of numbers. These numbers are simply measures of standard deviations from the mean.
• In working with the variable X we will often find it necessary to convert into units of standard deviations from the mean.
• When the variable is measured this way, the letter Z is commonly used.
Approximately 95% of the observations fall within 2
standard deviations of the mean
Approximately 99.7% of the observations fall within 3
standard deviations of the mean
Example of the 68-95-99.7 Rule
• In the 2010 winter Olympics men’s slalom, Li Lei’s time was 120.86 sec, about 1 standard deviation slower than the mean. Given the Normal model, how many of the 48 skiers were slower?
• About 68% are within 1 standard deviation of the mean.
• 100% – 68% = 32% are outside.
• “Slower” is just one of the two tails.
• 32% / 2 = 16% are slower.
• 16% of 48 is 7.7.
• About 7 are slower than Li Lei.
The Empirical Rule is only an approximation.
• IQs have a mean of 100 and a standard deviation of 16.
• If a student has an IQ of 116, what percent of students have a higher score?
• Answer: 16% using the Empirical Rule.
• We will see later that the correct answer is closer to 15.87%.
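
The more precise figure can be checked directly. A minimal sketch, assuming Python 3.8+ and the N(100, 16) model stated on the slide:

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=16)

# Area to the right of 116, which is one standard deviation above the mean
p_above_116 = 1 - iq.cdf(116)

print(f"Empirical Rule estimate: 16%, Normal model: {p_above_116:.2%}")
```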
Three Rules For Using the Normal Model
1. Make a picture.
2. Make a picture.
3. Make a picture.
• When data is provided, first make a histogram to make sure that the distribution is symmetric and unimodal.
• Then sketch the Normal model.
Working With the 68-95-99.7 Rule
• Each part of the SAT has a mean of 500 and a standard deviation of 100. Assume the data is symmetric and unimodal. If you earned a 700 on one part of the SAT, how do you stand among all others who took the SAT?
• Think →
  • Plan: The variable is quantitative and the distribution is symmetric and unimodal. Use the Normal model N(500, 100).
Show and Tell
• Show → Mechanics:
  • Make a picture.
  • 700 is 2 standard deviations above the mean.
• Tell → Conclusion:
  • 95% lies within 2 standard deviations of the mean.
  • 100% - 95% = 5% are outside of 2 standard deviations of the mean.
  • Above 2 standard deviations is half of that.
  • 5% / 2 = 2.5%
  • Only about 2.5% of all scores on this test are higher than yours, so your score is higher than about 97.5% of them.
Example: 68-95-99.7 rule.
• Example: For men aged 18 to 24, serum cholesterol levels have a mean of 178 mg/100mL with a standard deviation of 40.7 mg/100mL.
• Pete’s cholesterol reading is 231.
• Where is Pete with respect to the mean cholesterol level?
Watching Pete’s Cholesterol level
• (231 – 178) = 53.
• The standard deviation was 40.7.
• 53 / 40.7 = 1.3, so Pete is 1.3 standard deviations above the mean.
• Pete’s z-score is 1.3.
• Because 1.3 lies between 1 and 2, between 68% and 95% of the stated population has a cholesterol level less extreme than Pete’s. Between 2.5% and 16% have a cholesterol level higher than Pete’s.
• We can say more using technology.
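
As one example of "using technology", the bracket of 2.5% to 16% can be narrowed to a single estimate. This sketch uses Python's `statistics.NormalDist` (Python 3.8+) rather than the StatCrunch or TI tools the slides rely on, and it assumes a Normal model with the stated mean and standard deviation.

```python
from statistics import NormalDist

mean_chol, sd_chol = 178, 40.7   # mg/100mL, men aged 18 to 24 (from the slides)
pete = 231

z_pete = (pete - mean_chol) / sd_chol
p_higher = 1 - NormalDist(mean_chol, sd_chol).cdf(pete)  # area above Pete's reading

print(f"Pete's z-score: {z_pete:.2f}")
print(f"Estimated share with a higher reading: {p_higher:.1%}")
```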
Importance of the z-score
• The z-score is a ruler for comparing populations, even those which do not have the same mean and standard deviation.
• One study showed the mean cholesterol of American women as 188 mg/100mL, with a standard deviation of 24 mg/100mL.
• By coincidence, Susan has a cholesterol reading of 231 mg/100mL.
• Whose is really higher, Pete’s or Susan’s?
Pete and Susan
• Susan is above her mean by 231 - 188, or 43 mg/100mL.
• 43 / 24 = 1.792 standard deviations.
• Susan’s z-score is 1.792.
• Pete’s z-score is 1.3.
• Susan’s is higher.
• Medically, Susan may have a bigger problem than Pete.
Pete and Susan
• Susan’s reading is closer to her population’s mean (43 mg/100mL vs. Pete’s 53 mg/100mL).
• But Susan’s population has smaller variability than Pete’s.
• This made Susan’s cholesterol more extreme than Pete’s.
• It’s about variability!
5.4 Finding Normal Percentiles
What if z is not −3, −2, −1, 0, 1, 2, or 3?
• If the data value we are working with in the Normal model does not have such a nice z-score, we will use a computer.
• Example: Where do you stand if your SAT math score was 680? μ = 500, σ = 100
• Note that the z-score is not an integer: $z = \frac{680 - 500}{100} = 1.8$
*Finding Normal Percentiles by Hand
(This is both slower and less accurate.
Don’t do it this way!)
When a data value doesn’t fall exactly 1, 2, or 3 standard
deviations from the mean, we can look it up in a table
of Normal percentiles.
Table Z in Appendix D provides us with normal
percentiles, but many calculators and statistics
computer packages provide these as well.
Let’s use the technology. As for the tables – let’s not and
say we did!
Using StatCrunch for the Normal Model
• What percent of all SAT scores are below 680?
  • μ = 500, σ = 100
• Stat → Calculators → Normal
• Fill in info, hit Compute
• 96.4% of SAT scores are below 680.
Using the TI for the Normal Model
• Same exercise: what percent of SAT scores are lower than 680?
• On the TI, press [DISTR], then choose normalcdf.
• The syntax for normalcdf is normalcdf(min,max,mean,stdev).
• Here, the minimum is “minus infinity”.
• Input a large negative number, say -99999.
• Use the negative sign below the 3.
• normalcdf(-99999,680,500,100) = 0.9641
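
For readers without StatCrunch or a TI calculator, the same area can be found with Python's standard library. This is a hedged equivalent, not part of the course tools: `NormalDist.cdf(x)` gives the area to the left of `x` directly, so no stand-in for "minus infinity" is needed.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)   # N(500, 100), as stated on the slide

p_below_680 = sat.cdf(680)            # area to the left of 680

print(f"Proportion of SAT scores below 680: {p_below_680:.4f}")
```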
Using the TI for the Normal Model
A Probability Involving “Between”
• What is the proportion of SAT scores that fall between 450 and 600? μ = 500, σ = 100
• Think →
  • Plan: Probability that x is between 450 and 600 = Probability that x < 600 – Probability that x < 450
  • Variable: We are told that the Normal model works: N(500, 100)
A Probability Involving “Between”
• What is the proportion of SAT scores that fall between 450 and 600? μ = 500, σ = 100
• Show → Mechanics: Use StatCrunch to find each of the probabilities.
• Probability that x is between 450 and 600
  = Probability that x < 600 – Probability that x < 450
  = 0.8413 – 0.3085 = 0.5328
A Probability Involving “Between” (SC)
• What is the proportion of SAT scores that fall between 450 and 600? μ = 500, σ = 100
• Probability that x is between 450 and 600
  = Probability that x < 600 – Probability that x < 450
  = 0.8413 – 0.3085
  = 0.5328
• Conclusion: The Normal model estimates that about 53.28% of SAT scores fall between 450 and 600.
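
The "between" calculation is a difference of two left-tail areas. A minimal Python sketch of that subtraction, assuming Python 3.8+ and the N(500, 100) model from the slide:

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)

p_below_600 = sat.cdf(600)
p_below_450 = sat.cdf(450)
p_between = p_below_600 - p_below_450   # area between 450 and 600

print(f"{p_below_600:.4f} - {p_below_450:.4f} = {p_between:.4f}")
```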
A Probability Involving “Between” (TI)
From Percentiles to Scores: z in Reverse
• Suppose a college admits only people with SAT scores in the top 10%. How high a score does it take to be eligible? μ = 500, σ = 100
• Think →
  • Plan: We are given the probability and want to go backwards to find x.
  • Variable: N(500, 100)
From Percentiles to Scores: z in Reverse (SC)
• Suppose a college admits only people with SAT scores in the top 10%. How high a score does it take to be eligible? μ = 500, σ = 100
• Show → Mechanics: Use StatCrunch, putting in 0.9 for the probability.
• Probability x < 628 = 0.9
• Conclusion: Because the school wants the SAT Verbal scores in the top 10%, the cutoff is 628.
From Percentiles to Scores: z in Reverse (TI)
• Going from a percent to a score is the inverse of normalcdf.
• Therefore, use InvNorm(pct,mean,stdev).
• However, InvNorm only works with the area to the left of the cutoff (the lowest x%).
• Fortunately, the cutoff for the highest 10% is also the cutoff for the lowest 90%.
• Therefore, use InvNorm(0.9,500,100) = 628.16
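
The same reverse lookup can be sketched in Python. `NormalDist.inv_cdf(p)` plays the role of InvNorm here (an analogy, not an exact replica of the calculator): it returns the value with area `p` to its left, so the top 10% cutoff is again requested as the lowest 90%.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)

top_fraction = 0.10
cutoff = sat.inv_cdf(1 - top_fraction)   # value with 90% of scores below it

print(f"Cutoff for the top 10%: {cutoff:.2f}")
```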
From Percentiles to Scores:
z in Reverse (TI)
What z-scores correspond to
the middle 95%?
Middle 95%
The z-score cutoffs for the middle 95% are +z and –z. How do we find z?
Issue:
• InvNorm goes only from a cutoff to the extreme left.
• We need to “fudge” to accommodate InvNorm!
There is 0.95 in the middle, plus 0.025 on the extreme left, so 0.975 of the area lies to the left of +z.
InvNorm(0.975) is 1.959964, or 1.96.
This is extremely important for Unit 3. You need to understand this and keep it in mind.
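
The "0.95 in the middle plus 0.025 on the left" bookkeeping can be written out as a short Python sketch (Python 3.8+ assumed; the variable names are illustrative):

```python
from statistics import NormalDist

std_normal = NormalDist()          # standard Normal model N(0, 1)

middle = 0.95
tail = (1 - middle) / 2            # 0.025 in each tail

z_upper = std_normal.inv_cdf(middle + tail)   # area 0.975 to the left of +z
z_lower = std_normal.inv_cdf(tail)            # area 0.025 to the left of -z

print(f"middle 95% lies between z = {z_lower:.2f} and z = {z_upper:.2f}")
```

Changing `middle` to 0.90 or 0.99 gives the cutoffs for other central regions in the same way.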
An Application to Test Scores
QUESTION 1
• The SAT Verbal has a mean of 500 and a standard deviation of 100.
• Pat got 650 on the SAT Verbal. How well did Pat do?
An Application to Test Scores
ANSWER – actually, two answers!
• If we standardize Pat’s score, we get $z = \frac{650 - 500}{100}$,
• or a z-score of +1.5.
• That is, Pat’s score is 1.5 standard deviations above the mean SAT score of 500.
• Only people who have had statistics think in terms of z-scores, so let’s figure a percentile.
• We use normalcdf(?,1.5), but what goes in the lower bound?
• On the TI, use a low lower bound such as -999.
• normalcdf(-999,1.5) = 0.9332, or 93.32%, a respectable job.
An Application to Test Scores
Tell: If Pat got a 650 on the SAT Verbal, his score was at about the 93rd percentile (93.32%).
An Application to Test Scores
QUESTION 2
• One college that Pat is considering requires the ACT. Pat took it as well and got a 27.
• The ACT has a mean of 21 and a standard deviation of 4.7.
• How well did Pat do? As well as on the SAT?
ANSWER
• Standardizing Pat’s score, we get $z = \frac{27 - 21}{4.7}$,
• or a z-score of +1.28. Not quite as good.
• As before, on the TI, use a low lower bound such as -999.
• For a percentile, normalcdf(-999,1.28) = 0.8997, or 89.97%, still a good showing.
An Application to Test Scores
• Tell: If Pat got a 27 on the ACT, which is N(21, 4.7), then Pat’s score was at about the 90th percentile (89.97%). This score was not as good as his SAT score, which was at about the 93rd percentile (93.32%).
• We are using z-scores to, in effect, compare apples and oranges: two datasets with completely different means and standard deviations.
An Application to Test Scores
QUESTION 3
• How well would Pat have to do on the ACT to match his percentile (93.32) and equivalent z-score (1.5) on the SAT?
ANSWER
• Remembering our standardization, we have (X is Pat’s ACT score)
$$1.5 = \frac{X - 21}{4.7}$$
• We solve for X: (1.5)(4.7) + 21 = 28.05.
• Pat’s 27 falls about one point short. Since ACT scores are reported in whole numbers, a 28 (z ≈ 1.49) is the closest match.
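
Pat's three questions can be worked in one short Python sketch. It assumes the Normal models N(500, 100) for the SAT Verbal and N(21, 4.7) for the ACT as given on the slides; the percentile for the ACT is computed from the unrounded z-score, so it differs slightly from the value obtained with z = 1.28. Solving 1.5 = (X − 21)/4.7 gives X = 1.5 × 4.7 + 21 = 28.05.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)
act = NormalDist(mu=21, sigma=4.7)

z_sat = (650 - sat.mean) / sat.stdev             # Question 1: z-score on the SAT
pct_sat = sat.cdf(650)                           # percentile on the SAT

z_act = (27 - act.mean) / act.stdev              # Question 2: z-score on the ACT
pct_act = act.cdf(27)

act_equivalent = act.mean + z_sat * act.stdev    # Question 3: ACT score with z = 1.5

print(f"SAT: z = {z_sat:.2f}, percentile = {pct_sat:.2%}")
print(f"ACT: z = {z_act:.2f}, percentile = {pct_act:.2%}")
print(f"ACT score matching the SAT z-score: {act_equivalent:.2f}")
```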
An Application to Test Scores
Tell: In order to do as well on the ACT as on the SAT, Pat
would need an ACT score of 28.
Percentiles and Z-scores
• What percent of a standard Normal model is found in each region? Draw a picture for each.
a) z > -2.05
b) z < -0.33
c) 1.2 < z < 1.8
d) |z| < 1.28
• In a standard Normal model, what value(s) of z cut(s) off the region described? Draw a picture first!
a) The highest 20%
b) The highest 75%
c) The lowest 3%
d) The middle 90%
Source: Mrs. Emily Francis, Instructor of Mathematics, HCC
More Percentiles and Z-scores
• What percent of a standard Normal model is found in each region? Draw a picture for each.
a) z > -1.05
b) z < -0.40
c) 1.3 < z < 2.0
• In a standard Normal model, what value(s) of z cut(s) off the region described? Draw a picture first!
a) The highest 20%
b) The highest 60%
c) The lowest 6%
d) The middle 75%
Source: Mrs. Emily Francis, Instructor of Mathematics, HCC
Additional exercises
Some IQ tests are standardized to a Normal model with a mean of 100 and a standard deviation of 16.
A) Draw the model for these IQ scores, clearly labeling what the 68-95-99.7 Rule predicts about the scores.
B) In what interval would you expect to find the central 95% of IQ scores?
C) About what percent of people should have IQ scores above 116?
D) About what percent of people should have IQ scores between 68 and 84?
E) About what percent of people would have IQ scores above 132?
Source: Mrs. Emily Francis, Instructor of Mathematics, HCC
44) Based on the Normal model N(100, 16) describing
IQ scores, what percent of people’s IQ scores would
you expect to be
– Over 80?
– Under 90?
– Between 112 and 132?
46) In the same model, what cutoff value bounds
– The highest 5% of all IQs?
– The lowest 30% of the IQs?
– The middle 80% of the IQs?
Source: Mrs. Emily Francis, Instructor of Mathematics, HCC
Underweight Cereal Boxes
• Based on experience, a manufacturer makes cereal boxes that fit the Normal model with mean 16.3 ounces and standard deviation 0.2 ounces, but the label reads 16.0 ounces. What fraction will be underweight?
• Think →
  • Plan: Find Probability that x < 16.0
  • Variable: N(16.3, 0.2)
Underweight Cereal Boxes
• What fraction of the cereal boxes will be underweight (less than 16.0)? μ = 16.3, σ = 0.2
• Show → Mechanics: Use StatCrunch to find the probability.
• Probability x < 16.0 = 0.0668
Underweight Cereal Boxes
• What fraction of the cereal boxes will be underweight
(less than 16.0)? µ = 16.3, σ = 0.2
• Probability x < 16.0 = 0.0668
• Conclusion: I estimate that approximately 6.7% of the
boxes will contain less than 16.0 ounces of cereal.
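The same probability can be checked outside StatCrunch. This is a small sketch, assuming Python's standard-library NormalDist; the names boxes, z, and fraction_underweight are illustrative.

```python
from statistics import NormalDist

boxes = NormalDist(mu=16.3, sigma=0.2)  # the N(16.3, 0.2) model for box weights

# z-score of the label weight: how many SDs 16.0 sits from the mean.
z = (16.0 - boxes.mean) / boxes.stdev

# Area to the left of 16.0 under the model.
fraction_underweight = boxes.cdf(16.0)

print(round(z, 2), round(fraction_underweight, 4))
```

The label weight is 1.5 standard deviations below the mean, and the area to its left is the 0.0668 reported on the slide.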
Underweight Cereal Boxes Part II
• Lawyers say that 6.7% is too high and recommend that
at most 4% be underweight. What should they set the
mean at? σ = 0.2
• Think →
• Plan: Find the mean such that
Probability(x < 16.0) = 0.04.
• Variable: N(?, 0.2)
• Reality Check: Note that the mean must be greater
than 16.3 ounces, because a mean of 16.3 ounces leaves
6.7% of the boxes underweight.
How I would do it
We cannot do this directly with InvNorm or normalcdf,
because the mean is the unknown. We can, however, get the
z-score that corresponds to the lowest 4%, and then
solve: (16 – µ)/0.2 = z
Use 16 because “underweight” is defined as less than 16
oz.
The lowest 4% in the standard normal corresponds to
InvNorm(0.04,0,1) = -1.7507.
We need 16 to be 1.75 standard deviations below the mean.
Solve: (16 – µ)/0.2 = -1.75.
How I would do it (next step)
(16 – µ)/0.2 = -1.75
16 – µ = -1.75 * 0.2
16 – µ = -0.35
16 + 0.35 = µ
µ = 16.35 oz. (to two decimal places)
This should clarify the next slide.
Underweight Cereal Boxes Part II
• Lawyers say that 6.7% is too high and recommend that
at most 4% be underweight. What should they set the
mean at? σ = 0.2
• Mechanics: Sketch a picture.
• Use StatCrunch to find z such that the area to the
left of z under the standard Normal model is 0.04.
• z = −1.75
• Find 16 + 1.75(0.2) = 16.35 ounces
Underweight Cereal Boxes Part II
• Lawyers say that 6.7% is too high and recommend that
at most 4% be underweight. What should they set the
mean at? σ = 0.2
• z = −1.75
• Find 16 + 1.75(0.2) = 16.35 ounces
• Conclusion: The company must set the machine to
average 16.35 ounces per box.
• Note: This agrees with the publisher’s slide, which said 16.35.
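A proposed mean is easy to check numerically: solve for it from the z-score, then confirm that the resulting model really puts 4% below the label weight. The sketch below assumes Python's standard-library NormalDist with σ = 0.2 as given in the problem; the variable names are illustrative.

```python
from statistics import NormalDist

sigma = 0.2          # standard deviation given in the problem
label = 16.0         # boxes below this weight count as underweight
target = 0.04        # largest acceptable fraction underweight

# z-score with 4% of the standard Normal model to its left (about -1.75).
z = NormalDist().inv_cdf(target)

# Rearranging z = (label - mean) / sigma gives mean = label - z * sigma.
mean_needed = label - z * sigma

# Check: the fraction below the label under the new model.
check = NormalDist(mean_needed, sigma).cdf(label)

print(round(z, 4), round(mean_needed, 2), round(check, 4))
```

Because z is negative, subtracting z * sigma moves the mean above the label weight, which is the direction needed to reduce the underweight fraction.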
Underweight Cereal Boxes Part III
• The CEO vetoes that plan and sticks with a mean of
16.2 ounces and 4% weighing under 16.0 ounces.
She demands a machine with a lower standard
deviation. What standard deviation must the machine
achieve?
• Think →
• Plan: Find σ such that Probability x < 16.0 = 0.04.
• Variable: N(16.2, ?)
Underweight Cereal Boxes Part III
• What standard deviation must the machine
achieve? N(16.2, ?)
• Show → Mechanics: From before, z = −1.75
• −1.75 = (16.0 − 16.2)/σ
• 1.75σ = 0.2
• σ = 0.114
• Conclusion: The company must get the machine to
box cereal with a standard deviation of no more than
0.114 ounces. The machine must be more consistent.
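The same rearrangement works when the standard deviation is the unknown. A minimal sketch, again assuming Python's standard-library NormalDist and illustrative variable names:

```python
from statistics import NormalDist

mean = 16.2      # the mean the CEO insists on
label = 16.0     # boxes below this weight count as underweight
target = 0.04    # fraction allowed to be underweight

# z-score with 4% of the standard Normal model to its left (about -1.75).
z = NormalDist().inv_cdf(target)

# Rearranging z = (label - mean) / sigma gives sigma = (label - mean) / z.
sigma_needed = (label - mean) / z

# Check: the fraction below the label under N(16.2, sigma_needed).
check = NormalDist(mean, sigma_needed).cdf(label)

print(round(sigma_needed, 3), round(check, 4))
```

Both the numerator and z are negative, so the resulting standard deviation is positive, as it must be.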
Section 5.5
Normal Probability Plots
Checking if the Normal Model Applies
• A histogram will work, but there is an alternative
method.
• One problem with histograms: they look different with
different bin widths.
• Instead use a Normal Probability Plot.
• It plots each value against the z-score that would be
expected had the distribution been perfectly Normal.
• If the plot is straight or nearly straight, then the
Normal model works.
• If the plot strays from being a line, then the Normal
model is not a good model.
The Normal Model Applies
• The Normal probability plot is nearly straight, so the
Normal model applies. Note that the histogram is
unimodal and somewhat symmetric.
The Normal Model Does Not Apply
• The Normal probability plot is not straight, so the
Normal model does not apply. Note that the
histogram is skewed right.
Histogram with the TI
• Example: Data: 62, 63,
65, 66, 68, 70, 71, 73,
75
• Use [STAT][EDIT] to put
the dataset in L1.
• The first few data points
are shown.
• NOTE: You will do this a
lot in this course!
Histogram with the TI
First, press [Y=] and turn off any
functions from Algebra class!
Press [2nd][Y=] (STAT PLOT) and go to one of
the three plots. Turn it on.
Select the histogram.
Make sure that L1 (or wherever
you put the data) is in Xlist.
Make sure that 1 is in Freq.
Histogram with the TI (default)
You can get a default window
by pressing Zoom
and then 9 (ZoomStat).
Below is the window. It
shows a bin width of 3.25.
It includes all of the values.
Because we have integers,
I’d rather have 3 as a bin
width.
Histogram with the TI
Choose as window X:[60,78]; Y:[1,3]. You may have to play with
this.
• For X, I picked a little lower than the
min and a little higher than the max.
• For Y, I picked a maximum a little bigger
than the largest bin frequency I
expected.
Xscl is the width of the bin. In
this case, choosing 3 makes cut
points at 60, 63, 66, 69, 72, 75,
and 78.
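To see which bin frequency to expect before setting the Y window, the bins can be tallied by hand or with a short script. This is a sketch in Python; it assumes each bin includes its left cut point and excludes its right one, which may not match every calculator or package.

```python
data = [62, 63, 65, 66, 68, 70, 71, 73, 75]   # the example dataset in L1

xmin, xmax, xscl = 60, 78, 3                   # window settings from the slide

# Cut points at 60, 63, 66, ..., 78.
cut_points = list(range(xmin, xmax + 1, xscl))

# Count values in each bin [lo, lo + xscl); the left-closed rule is an assumption.
counts = [sum(lo <= x < lo + xscl for x in data) for lo in cut_points[:-1]]

for lo, count in zip(cut_points, counts):
    print(f"[{lo}, {lo + xscl}): {count}")
```

With these cut points no bin holds more than 2 values, so a Y maximum of 3 leaves a little headroom.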
NPP with the TI
o Press [STATPLOT], then turn
Plot 1 on.
o Select the lower right
plot. This is the NPP.
o Press Zoom, 9
o Looks pretty good!
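What the calculator plots can be imitated by pairing each sorted data value with the z-score expected at its rank. The sketch below is in Python and assumes the plotting positions (i − 0.375)/(n + 0.25); that formula is one common choice, not necessarily the one the TI or StatCrunch uses. The correlation is included only as a rough numerical stand-in for "is it straight enough?".

```python
from statistics import NormalDist, mean

data = sorted([62, 63, 65, 66, 68, 70, 71, 73, 75])  # example dataset from L1
n = len(data)

# Expected z-score for the i-th smallest value (assumed plotting positions).
expected_z = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25))
              for i in range(1, n + 1)]

def pearson_r(xs, ys):
    """Correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Points of the plot: (expected z-score, data value).
points = list(zip(expected_z, data))
r = pearson_r(expected_z, data)

for z, x in points:
    print(f"{z:6.2f}  {x}")
print("correlation:", round(r, 3))
```

A correlation close to 1 goes with a nearly straight plot, but the picture should still be the primary check.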
Normal Probability Plot –
StatCrunch (called a QQ plot)
• Assume that your data are in the first column.
• Select Graph, then QQ Plot.
• Select the column where your data are.
• Continue on as in all of the other StatCrunch graphs.
• The graph comes up, but with the normal scale on the
x-axis and the data on the y-axis.
• This is the opposite of how most books do it!
• Again, I would recommend the TI.
Publisher’s Instructions:
Normal Probability (QQ) Plot
• QQ Plot: Displays the sample quantiles of a variable versus the
quantiles of a standard normal distribution. Select the column(s)
to be displayed in the plot(s).
• Enter an optional Where clause to specify the data rows to be
included in the computation.
• Select an optional Group by column to generate a separate QQ
plot for each distinct value of the Group by column.
• Click the Next button to specify graph layout options.
• Click the Create Graph! button to create the plot(s).
Other tests for Normality
There are several analytical (as opposed to graphical)
tests to see if data fit a normal distribution.
• Goodness of fit test – will demonstrate in Chapter 22
• Shapiro-Wilk test – used by FDA / CFSAN; this is also
in StatCrunch (I’ll show it to you after we do Unit 3.)
• Lilliefors Test
• Anderson-Darling Test
• and several others.
5.end
Wrap-up
What Can Go Wrong?
Don’t use a Normal model when the distribution is not
unimodal and symmetric.
An example – incorrect z-score
Below : µ = 0.5; σ = 0.288
The point 0.99 is actually at the 99th percentile.
If you assume N(.5,.288), the z-score would be 1.701;
percentile would be 95.56!
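The mismatch can be reproduced numerically. The sketch below is in Python and assumes the pictured distribution is Uniform(0, 1); that is an assumption, chosen because it has mean 0.5 and standard deviation 1/√12 ≈ 0.288, matching the values on the slide.

```python
from statistics import NormalDist

mu, sigma = 0.5, 0.288   # summary values from the slide
x = 0.99                 # the point being examined

# Percentile if we (wrongly) treat the data as N(0.5, 0.288).
z = (x - mu) / sigma
normal_pct = 100 * NormalDist().cdf(z)

# Percentile under the assumed Uniform(0, 1) distribution: P(X < x) = x.
uniform_pct = 100 * x

print(round(z, 3), round(normal_pct, 2), round(uniform_pct, 2))
```

The Normal model understates the percentile by a few points here because a flat distribution has no thin tails for the model to describe.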
An example – incorrect z-score
• Below : µ = 0.5; σ = 0.5
• The point 5.98 would be at the 95th percentile.
• If you assume N(.5,.5), the z-score of 5.98 would be
off the charts!
[Figure: plot of the distribution; vertical axis from 0 to 0.6]
What Can Go Wrong? (cont.)
Don’t use the mean and standard deviation when outliers
are present—the mean and standard deviation can
both be distorted by outliers.
Don’t round your results in the middle of a calculation.
Don’t worry about minor differences in results.
What Can Go Wrong
• Don’t use the Normal model when the distribution is not
unimodal and symmetric.
• Always look at the picture first.
• Don’t use the mean and standard deviation when
outliers are present.
• Check by making a picture.
• Don’t round your results in the middle of the calculation.
• Always wait until the end to round.
• Don’t worry about minor differences in results.
• Different rounding can produce slightly different results.
What have we learned?
The story data can tell may be easier to understand after
shifting or rescaling the data.
• Shifting data by adding or subtracting the same amount
from each value affects measures of center and position
but not measures of spread.
• Rescaling data by multiplying or dividing every value by
a constant changes all the summary statistics—center,
position, and spread.
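Both effects are easy to confirm on a small dataset. The sketch below is in Python and reuses the nine-value example from the histogram slides; the shift of 60 and the scale factor of 2.54 are arbitrary choices for illustration.

```python
from statistics import mean, median, stdev

data = [62, 63, 65, 66, 68, 70, 71, 73, 75]

shifted = [x - 60 for x in data]      # subtract the same amount from each value
rescaled = [x * 2.54 for x in data]   # multiply every value by a constant

for name, values in [("original", data), ("shifted", shifted), ("rescaled", rescaled)]:
    print(f"{name:9s} mean={mean(values):7.2f} "
          f"median={median(values):7.2f} sd={stdev(values):6.2f}")
```

Shifting moves the mean and median by 60 and leaves the standard deviation alone; rescaling multiplies all three by 2.54.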
What have we learned? (cont.)
We’ve learned the power of standardizing data.
• Standardizing uses the SD as a ruler to measure
distance from the mean (z-scores).
• With z-scores, we can compare values from different
distributions or values based on different units.
• z-scores can identify unusual or surprising values
among data.
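As a small illustration of comparing values measured in different units, the sketch below (Python) standardizes one value from each of the two models used in this chapter's exercises: an IQ of 132 under N(100, 16) and a 16.0-ounce box under N(16.3, 0.2). The helper name z_score is illustrative.

```python
def z_score(value, mean, sd):
    """Distance from the mean, measured in standard deviations."""
    return (value - mean) / sd

iq_z = z_score(132, 100, 16)        # IQ score, in IQ points
box_z = z_score(16.0, 16.3, 0.2)    # cereal box weight, in ounces

# The value with the larger |z| is the more unusual one within its own model.
more_unusual = "IQ" if abs(iq_z) > abs(box_z) else "box"

print(round(iq_z, 2), round(box_z, 2), more_unusual)
```

The units cancel in the division, so the two z-scores can be compared directly even though the original values cannot.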
What have we learned? (cont.)
We’ve learned that the 68-95-99.7 Rule can be a useful
rule of thumb for understanding distributions:
• For data that are unimodal and symmetric, about 68%
fall within 1 SD of the mean, 95% fall within 2 SDs of the
mean, and 99.7% fall within 3 SDs of the mean.
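The three percentages in the rule are rounded areas under the standard Normal model. A quick check, sketched in Python with the standard-library NormalDist:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Area within k standard deviations of the mean, for k = 1, 2, 3.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(f"within {k} SD: {100 * area:.2f}%")
```

The exact areas are slightly different from 68, 95, and 99.7, which is why the rule is described as a rule of thumb.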
What have we learned? (cont.)
We see the importance of Thinking about whether a
method will work:
• Normality Assumption: We sometimes work with
Normal tables (Table Z). These tables are based on the
Normal model. But the TI is faster and more accurate.
• Data can’t be exactly Normal, so we check the Nearly
Normal Condition by making a histogram (is it
unimodal, symmetric and free of outliers?) or a normal
probability plot (is it straight enough?).
Division of Mathematics, HCC
Course Objectives for Chapter 5
After studying this chapter, the student will be able to:
19. Compare values from two different distributions using their z-scores.
20. Use Normal models (when appropriate) and the 68-95-99.7 Rule to
estimate the percentage of observations falling within one, two, or
three standard deviations of the mean.
21. Determine the percentages of observations that satisfy certain
conditions by using the Normal model and determine “extraordinary”
values.
22. Determine whether a variable satisfies the Nearly Normal condition
by making a normal probability plot or histogram.
23. Determine the z-score that corresponds to a given percentage of
observations.
Note: It is essential that this chapter be mastered. Almost everything in
Unit 3 depends on it.