Download standard deviation

Document related concepts
no text concepts found
Transcript
Chapter 6
The Standard Deviation as a
Ruler and the Normal Model
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Normal Distributions


Many real-life histograms have a symmetric,
unimodal and tapered shape.
Example: Here is the earthquakes distribution
again ....
Tuesday, June 18, 13
Normal Distributions
• Example: The distribution of college age males’ heights
Tuesday, June 18, 13
The Normal Model
• George Box:
– “All models are wrong, but some are useful.”
• If data are unimodal and roughly symmetric, then the
normal model is appropriate.
– Written N(µ, σ)
• µ is the mean for the model
• σ is the SD for the model
Tuesday, June 18, 13
The Standard Deviation as a Ruler



QUESTION: How do we compare very different
looking Normal Distributions - some wide, some
narrow, some flat and some tall?
ANSWER: Use standard deviations as our rulers.
The standard deviation tells us how the whole
collection of values varies, so it’s a natural ruler
for comparing an individual to a group.
It is the most common measure of variation.
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Slide 1- 3
Standardizing with z-scores

We introduce the z-score.

We compare individual data values to their
group’s mean value, relative to their standard
deviation:

How far is a data point from it’s group’s mean?
Subtract the value from the mean: (y - mean)
How many standard deviations is that distance?
Divide the distance by s: (y-mean) / s

Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Slide 1- 4
Standardizing with z-scores (cont.)




Standardized values have no units.
z-scores measure the distance of each data value
from the mean in multiples of standard deviations.
A negative z-score tells us that the data value is
below the mean (to the left)
A positive z-score tells us that the data value is
above the mean (to the right).
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Slide 1- 5
Benefits of Standardizing


Standardized values have been converted from
their original units to the standard statistical
unit of standard deviations from the mean.
Thus, we can compare values that are measured
on different scales, with different units, or from
different populations.
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Slide 1- 6
Example of Z-Score Comparisons

Q: Which is more overweight - a 13 lbs
Chihuahua or a 118 lb Doberman Pincer?

Chihuahua: mean weight = 7 lb, SD = 1.2 lb
 z-score = (13 - 7) / 1.2 = 5.0
Doberman: mean weight = 107 lb, SD = 24 lb
 z-score = (118 - 107) / 24 = 0.46


So we see that the Chihuahua is far more
overweight than the Doberman.
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Ex 2: Compare Female to Male Height
•
Who is relatively ‘taller’; A 65” female or a 70” male?
•
Standardize the Results
– Compute the z-score…
– Calculates the distance of a value from the mean in standard
deviations
•
Females: Mean = 64.08,
SD = 3.422
z = (65 - 64.08) / 3.422 = 0.29
Tuesday, June 18, 13
•
Males: Mean = 70.49,
SD = 4.080
z = (70 - 70.49) / 4.080 = -0.12
Female
Tuesday, June 18, 13
Male
Why the z-score works

SHIFTING DATA
 Adding a constant to every data value adds
the same constant to measures of position.
 SO ... adding a constant to each value will
increase position (center, quartiles, max or
min) by the same amount.
 Its shape (range, IQR, standard deviation)
remain unchanged.
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Slide 1- 7
Shifting Data (cont.)

EXAMPLE: The following histograms show a
shift from men’s actual weights to kilograms
above recommended weight:
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Slide 1- 8
SCALING DATA


When we multiply all the data values by any
constant,
 all measures of position (such as the mean,
median, and percentiles) and
 all measures of spread (such as the range, the
IQR, and the standard deviation)
are multiplied by that same constant.
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Slide 1- 9
SCALING DATA (cont.)

EXAMPLE: The men’s weight data set measured
weights in kilograms. If we want to think about these
weights in pounds, we would rescale the data:
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Slide 1- 10
Back to z-scores


Standardizing data into z-scores both
 (i) shifts the data by subtracting the mean and
 (ii) rescales the values by dividing by their
standard deviation.
Standardizing into z-scores does not change the
shape of the distribution.
 Standardizing into z-scores changes the
center so that the mean is 0.
 Standardizing into z-scores changes the
spread so that the standard deviation 1.
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Slide 1- 11
When Is a z-score BIG?


A z-score gives us an indication of how many
standard deviations a data point is from it’s mean
 Remember that a negative z-score tells us that
the data value is below the mean . . .
 While a positive z-score tells us that the data
value is above the mean.
The larger a z-score is (negative or positive), the
more unusual it is.
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Slide 1- 12
Models

There is a Normal model for every possible
combination of mean and standard deviation.
 We write N(µ,σ) to represent a Normal model
with a mean of µ and a standard deviation of σ.

We use Greek letters because this mean and
standard deviation do not come from data—they
are numbers (called parameters) that specify the
model.
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Slide 1- 14
When Is a z-score Big? (cont.)



Summaries of data, like the sample mean and
standard deviation, are written with Latin letters.
Such summaries of data are called statistics.
When we standardize Normal data, we still call the
standardized value a z-score, and we write
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Slide 1- 15
When Is a z-score Big? (cont.)


Once we have standardized, we need only one
model:
 The N(0,1) model is called the standard
Normal model (or the standard Normal
distribution).
Be careful—don’t use a Normal model for just any
data set, since standardizing does not change the
shape of the distribution.
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Slide 1- 16
When Is a z-score Big? (cont.)


When we use the Normal model, we are
assuming the distribution is Normal.
We often cannot check this assumption in
practice, so we check the following condition:
 Nearly Normal Condition: The shape of the
data’s distribution is unimodal and symmetric.
 This condition can be checked by making a
histogram or a Normality plot (to be explained
later).
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Slide 1- 17
The 68-95-99.7 Rule of Thumb

It turns out that in a Normal model:
 68% of the values fall within one SD of the
mean;
 95% of the values fall within two SD of the
mean; and,
 99.7% (almost all!) of the values fall within
three SD of the mean.
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Slide 1- 19
The 68-95-99.7 Rule (cont.)

The following shows what the 68-95-99.7 Rule
tells us (visualize your histogram embedded
within):
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Slide 1- 20
The First Three Rules for Working with
Normal Models




Make a picture.
Make a picture.
Make a picture.
And, when we have data, make a histogram to
check the Nearly Normal Condition to make sure
we can use the Normal model to model the
distribution.
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Slide 1- 21
Finding Normal Percentiles by Hand

When a data value doesn’t fall exactly 1, 2, or 3
standard deviations from the mean, we can look it
up in a table of Normal percentiles.

Table Z in Appendix D provides us with normal
percentiles, but
Many calculators and statistics computer
packages provide these as well.

Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Slide 1- 22
Example – Bone Density Test
A bone mineral density test can be helpful in identifying the
presence of osteoporosis.
The result of the test is commonly measured as a z score,
which has a normal distribution with a mean of 0 and a
standard deviation of 1.
A randomly selected adult undergoes a bone density test.
Q: Find the percentage of people that result is a bone
density reading less than +1.27.
Copyright © 2014, 2012, 2010 Pearson Education, Inc.
Tuesday, June 18, 13
Section 6.2-
Example – continued
The percentage is the AREA under the ND curve to the LEFT of
the given z-score, that is that is less than 1.27.
Copyright © 2014, 2012, 2010 Pearson Education, Inc.
Tuesday, June 18, 13
Section 6.2-
Look at Table A-2
Copyright © 2014, 2012, 2010 Pearson Education, Inc.
Tuesday, June 18, 13
Section 6.2-
Example – continued
The percentage of randomly sampled adults
having a bone density less than 1.27 is 89.8%
Copyright © 2014, 2012, 2010 Pearson Education, Inc.
Tuesday, June 18, 13
Section 6.2-
Example #2
Using the same bone density test, find the percentage of
adults with a bone density above –1.00 (which is considered to
be in the “normal” range of bone density readings.
The percentage of randomly sampled adults having a bone
density above –1 is 84.1%
Copyright © 2014, 2012, 2010 Pearson Education, Inc.
Tuesday, June 18, 13
Section 6.2-
Example #3
What percentage of adults have a bone density reading
between –1.00 and –2.50 (these subjects have osteopenia).
1. The area to the left of z = –2.50 is 0.0062.
2. The area to the left of z = –1.00 is 0.1587.
3. The area between z = –2.50 and z = –1.00 is the difference
between the areas found above.
The percentage of randomly sampled adults with osteopenia is
© 2014, 2012, 2010 Pearson Education, Inc.
Section 6.20.1587 - 0.0062Copyright
= 0.1525
or 15.25%
Tuesday, June 18, 13
Notation
denotes the percentage of z scores between a and b.
denotes the percentage of z scores greater than a.
denotes the percentage of z scores less than a.
Copyright © 2014, 2012, 2010 Pearson Education, Inc.
Tuesday, June 18, 13
Section 6.2-
Quiz: EPA mpg




We now cover every which way we can use the z-table and z-score
to find what we need.
We just need to know three things:
 1st - We are dealing with a Normal Distribution
 2nd - Mean value
 3rd - Standard Deviation (SD)
EXAMPLE: The EPA finds that consumer car mileage follow a
normal distribution with a mean of 24 mpg and SD of 6 mpg.
Q1: What is the Rule-of-Thumb breakdown?
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Quiz: EPA mpg


Q1: What is the Rule-of-Thumb breakdown?
A: We expect that ...
 68% of cars to get between 18 and 30 mpg.
 95% of cars to get between 12 and 36 mpg, and
 99.7% of cars to get between 6 and 42 mpg.
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Quiz: EPA mpg

Q2: What percentage of cars get less than 15 mpg?
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Quiz: EPA mpg


Q2: What percentage of cars get less than 15 mpg?
A Find the Area to the left of 15 mpg:
 z-score for 15 mpg: z = ( 15 - 24 ) / 6 = -1.5
 Z-Table: => 0.0668 or 6.68%
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Quiz: EPA mpg

Q3: What percentage of cars get between 20 and 30 mpg?
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Quiz: EPA mpg


Q3: What percentage of cars get between 20 and 30 mpg?
A: Find area between 20 and 30 mpg:
 30 mpg: z = (30-24) / 6 = +1.00 => 0.8413 area to the left
 20 mpg: z = (20-24) / 6 = -0.67
=> 0.2514 area to the left
 In-between: 0.8413 - 0.2514 = 0.5889 or 58.89% of cars
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Quiz: EPA mpg

Q4: What percentage of cars get better than 40 mpg?
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Quiz: EPA mpg


Q4: What percentage of cars get better than 40 mpg?
A: Find the area to the left of 40 mpg and subtract from 1.000
 40 mpg: z= (40-30) / 6 = 2.67 => 0.9962 area to the left
 Area to the right = 1.000 - 0.9962 = 0.0038 or 0.38% of cars
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Quiz: EPA mpg

Q5: What mpg separates the lower 20% or cars from the
upper 80%?
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Quiz: EPA mpg


Q5: What mpg separates the lower 20% or cars from the upper
80%?
A: Do a reverse table lookup to get the z-score, then a reverse
z-score to the mpg value:
 Body of Z-Table: 0.2005 => z = -0.84
 z-score formula: -0.84 = ( y - 24) / 6
solve for y: 18.95 mpg
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Quiz: EPA mpg

Q6: Find the 3rd Quartile?
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Quiz: EPA mpg


Q6: Find the 3rd Quartile? I.E. the 75th percentile.
A: Reverse lookup 75%, or an area of 0.7500, then solve for
the mpg:
 Body of Z Table: 0.7486 => z = 0.67
 Z score in reverse: 0.67 = ( y - 24 ) / 6 solve for y: 28.0 mpg
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Quiz: EPA mpg

Q7: Find the mpg for the top 5% of cars?
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Quiz: EPA mpg


Q7: Find the mpg for the top 5% of cars?
A: Find the z-score for the lower 95%
 Reverse lookup: 0.9500 => z = +1.645
 reverse z-score: 1.645 = (y - 24) / 6 solve for y: 33.87mpg
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Reverse Table Lookup: From Percentiles to
Scores


Sometimes we start with areas and need to find
the corresponding z-score and maybe even the
original data value.
Example: What z-score represents the first
quartile in a Normal model?
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Slide 1- 24
From Percentiles to Scores: z in Reverse (cont.)





Look in body of the Table Z for an area of 0.2500
The exact area is not there, but 0.2514 is pretty
close.
Identify the row and column: This figure is
associated with z = -0.60, and 0.07 respectively
Put these numbers together: z = - 0.67
SO ... the first quartile is 0.67 SD below the
mean.
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Slide 1- 25
Software: Finding
Normal Percentiles
•
Given a particular range of values, what
percent of values fall in that range
– MAKE A PICTURE FIRST!!!!!
•
Minitab > Graph > Probability Distribution
Plot > View Probability
– Distribution: Normal, Enter Mean, Enter SD
– Shaded Area:
• ‘Probability’ if you’re given the %
• ‘X Value’ if you want the %
• Select the appropriate picture
• Enter the value
•
Ex: IQ scores are nearly normal with mean
100 and SD 16. Find the percent of IQ
scores over 80.
Tuesday, June 18, 13
Just Checking:
0.894
•
IQ Scores Again… Mean 100, SD 16
1.
2.
Find the percent of scores below 90
Find the percent of scores between 112 and 132
Tuesday, June 18, 13
Checking NORMALITY:
Are You Normal? Normal Probability Plots


When you actually have your own data, you
must check to see whether a Normal model is
reasonable.
Looking at a histogram of the data is a good way
to check that the underlying distribution is roughly
unimodal and symmetric.
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Slide 1- 26
Are You Normal? Normal Probability Plots (cont)


Try software: A more specialized graphical
display that can help you decide whether a
Normal model is appropriate is the Normal
probability plot.
If the distribution of the data is roughly Normal,
the Normal probability plot approximates a
diagonal straight line. Deviations from a straight
line indicate that the distribution is not Normal.
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Slide 1- 27
Are You Normal? Normal Probability Plots (cont)

Nearly Normal data have a histogram and a
Normal probability plot that look somewhat like
this example:
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Slide 1- 28
Are You Normal? Normal Probability Plots (cont)

A skewed distribution might have a histogram
and Normal probability plot like this:
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Slide 1- 29
Does the Normal Model Apply?
•
Check #1: Histogram
– Make sure unimodal, roughly symmetric
•
Check #2: Normal Probability Plot
– Minitab > Graph > Probability Plot
• Single
• Enter the variable
– The Normal Probability Plot plots the data values against what we
would expect under a perfect normal model
– Look for: Points that form a roughly straight, diagonal line
– Note: This graph is slightly different from your book, but same general
idea. You can do a lot of work in the options to get it to look identical
(not worth it).
•
http://www.canyons.edu/faculty/morrowa/140/datasets
– Use the Fall 2009 COC Math 140 Survey Results (cleaned heights)
– Determine if the normal model applies for height, age, and gpa
Tuesday, June 18, 13
Some
more
examples
Tuesday, June 18, 13
What Can Go Wrong?

Don’t use a Normal model when the distribution is
not unimodal and symmetric.
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Slide 1- 30
What Can Go Wrong? (cont.)

Don’t use the mean and standard deviation when
outliers are present—the mean and standard
deviation can both be distorted by outliers.

Don’t round your results in the middle of a
calculation.

Don’t worry about minor differences in results.
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Slide 1- 31
What have we learned?

The story data can tell may be easier to
understand after shifting or rescaling the data.
 Shifting data by adding or subtracting the
same amount from each value affects
measures of center and position but not
measures of spread.
 Rescaling data by multiplying or dividing every
value by a constant changes all the summary
statistics—center, position, and spread.
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Slide 1- 32
What have we learned? (cont.)

We’ve learned the power of standardizing data.
 Standardizing uses the SD as a ruler to
measure distance from the mean (z-scores).
 With z-scores, we can compare values from
different distributions or values based on
different units.
 z-scores can identify unusual or surprising
values among data.
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Slide 1- 33
What have we learned? (cont.)

We see the importance of Thinking about whether
a method will work:
 Normality Assumption: We sometimes work
with Normal tables (Table Z). These tables are
based on the Normal model.
 Data can’t be exactly Normal, so we check the
Nearly Normal Condition by making a
histogram (is it unimodal, symmetric and free
of outliers?) or a normal probability plot (is it
straight enough?).
Copyright © 2009 Pearson Education, Inc.
Tuesday, June 18, 13
Slide 1- 35