Download Standard Deviation

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Time series wikipedia , lookup

Transcript
Chapter 6
The Standard Deviation
and
The Normal Model
How can we compare?
1. A sister is 67 inches tall. Her brother is 72 inches
tall. Who is taller (comparatively)?
2. You took SAT and scored 1410. Your friend took
ACT and scored 30. Which score is better?
Solution – compare by standardizing those values,
Bring them to “common denominator”!
Slide 6- 2
The Standard Deviation as a Ruler:

The trick in comparing very different-looking
values is to use standard deviations as our rulers

The standard deviation tells how the whole
collection of values varies, therefore it is a natural
ruler for comparing an individual to a group
Slide 6- 3
Standardizing with z-scores
We compare individual data values to their mean,
relative to their standard deviation using the
following formula:
y y
z
s
We call the resulting values standardized values,
denoted as z. They can also be called z-scores.
Slide 6- 4
Standardizing with z-scores (cont.)
IMPORTANT:
 Standardized values have no units.
 z-scores measure the distance of each data
value from the mean in standard deviations.
 A negative z-score tells us that the data value is
below the mean, while a positive z-score tells us
that the data value is above the mean.
Slide 6- 5
When Standardizing,

Standardized values have been converted from
their original units to the standard statistical unit
of standard deviations from the mean.

Thus, we can compare values that are
measured on different scales, with different
units, or from different populations.
Slide 6- 6
Question1:
A sister is 67 inches tall. Her brother is 72 inches tall.
Who is taller (comparatively)?
Height of women
y  66, s y  2.5
Height of men
x  70, sx  3
Slide 6- 7
Standardizing Variables
y  y 67  66
z

 0.4
sy
2.5
x  x 72  70
z

 0.67
sx
3
A sister is 0.4 standard deviations above mean height for
women.
Her brother is 0.67 standard deviations above mean height
for men.
The brother is taller (comparatively).
Slide 6- 8
Question 2: SAT vs. ACT
You took SAT and scored 1410.
Your friend took ACT and scored 30.
Which score is better?
SAT has mean 1150 and standard deviation 100.
ACT has mean 18 and standard deviation 6.
Slide 6- 9
Standardizing: SAT vs. ACT
Your score
Friend’s score
1410  1150
 2.6
100
30  18
2
6
You scored better on SAT than your friend did on ACT.
Slide 6- 10
What is the standardizing actually?
- it is just Shifting and Rescaling Data
Shifting data:
 Adding (or subtracting) a constant amount to
each value just adds (or subtracts) the same
constant to (from) the mean. This is true for the
median and other measures of position too.
 In general, adding a constant to every data
value adds the same constant to measures of
center and percentiles, but leaves measures of
spread unchanged.
Slide 6- 11
Shifting Data (cont.)
The following histograms show a shift from men’s
actual weights to kilograms above recommended
weight:
Slide 6- 12
Rescaling data:
When we divide or multiply all the data values
by any constant value, both measures of
location (e.g., mean and median) and
measures of spread (e.g., range, IQR,
standard deviation) are divided and multiplied
by the same value.
Slide 6- 13
Rescaling Data (cont.)

The men’s weight data set measured weights in
kilograms. If we want to think about these weights in
pounds, we would rescale the data:
Slide 6- 14
Back to z-scores
y y
z
s
Standardizing data into z-scores shifts the data by
subtracting the mean and rescales the values by
dividing by their standard deviation.
 Standardizing into z-scores does not change
the shape of the distribution.
 Standardizing into z-scores changes the center
by making the mean 0.
 Standardizing into z-scores changes the
spread by making the standard deviation 1.
Slide 6- 15
When Is a z-score Big?


A z-score gives us an indication of how unusual a
value is because it tells us how far it is from the
mean.
The larger a z-score is (negative or positive), the
more unusual it is.
Slide 6- 16
When Is a z-score Big? (cont.)



There is no universal standard for z-scores, but
there is a model that shows up over and over in
Statistics.
This model is called the Normal model
(You may have heard of “bell-shaped curves.”).
Normal models are appropriate for distributions
whose shapes are unimodal and roughly
symmetric.
Slide 6- 17
Normal Model

We write N(μ,σ) to represent a Normal model
with a mean of μ and a standard deviation of σ.



We use Greek letters because this mean and standard
deviation do not come from data - they are numbers (called
parameters) that specify the model.
Summaries of data, like the sample mean and standard
deviation, are written with Latin letters. Such summaries of
data are called statistics.
When we standardize Normal data, we still call the
standardized value a z-score, and we write
z
y

Slide 6- 18
Normal Model


Once we have standardized, we need only one
model:
The N(0,1) model is called the standard Normal
model (or the standard Normal distribution).
Don’t use a Normal model for just any data set check the following condition:
Nearly Normal Condition:


The shape of the data’s distribution is unimodal
and symmetric.
This condition can be checked with a histogram or a
Normal probability plot.**
Slide 6- 19
The 68-95-99.7 Rule


Normal models give us an idea of how extreme a value is by
telling us how likely it is to find one that far from the mean.
In a Normal model:



about 68% of the values fall within one standard deviation of the mean;
about 95% of the values fall within two standard deviations of the
mean; and,
about 99.7% (almost all!) of the values fall within three standard
deviations of the mean.
Slide 6- 20
The First Three Rules for Working with
Normal Models

Make a picture.
 Make a picture.
 Make a picture.
And, when we have data, make a histogram
to check the Nearly Normal Condition to
make sure we can use the Normal model to
model the distribution.
Slide 6- 21
Example: pg. 125 #18
Some IQ tests are standardized to a Normal model,
with a mean of 100 and a standard deviation of 16.
a.
b.
c.
d.
e.
Draw the model for these IQ scores. Clearly label it, showing
what the 68-95-99.7 Rule predicts about the scores.
In what interval would you expect the central 95% of the IQ
scores to be found?
Answer: 68 to 132 IQ points
About what percent of people should have IQ scores
above 116?
Answer: 16%
About what percent of people should have IQ scores
between 68 and 84?
Answer: 13.5%
About what percent of people should have IQ scores
Answer: 2.5%
above 132?
Slide 6- 22
Finding Normal Percentiles by Hand –
see textbook***
When a data value doesn’t fall exactly 1, 2,
or 3 standard deviations from the mean, we
can look it up in a table of Normal percentiles
– review the textbook.
Slide 6- 23
Finding Normal Percentiles Using Technology
Most calculators and statistics programs have the ability
to find normal percentiles for us.
Look under: 2nd DISTR
You will see THREE “norm” functions that know the
Normal model. They are:
normalpdf(
normalcdf(
invNorm(
Slide 6- 24
normalpdf(

Calculates y values for graphing a Normal curve.
You won’t use this often, if at all. Try graphing:
Y1=normalpdf(x)
Use graphing window xmin = -4,
xmax = 4,
ymin = -.1,
ymax = .5
Slide 6- 25
normalcdf(


Finds the area between two z-score cut points,
by specifying
normalcdf(zleft,zright)
You will use this function OFTEN!!
Slide 6- 26
Example: Using normalcdf
Find the area of the shaded region between
z = -0.5 and z = 1.0. (Always draw a picture first.)
-.5
Calculator entry: Lower:
Upper: 1
μ: 0
σ: 1
Paste
normalcdf(-.5,1.0), or
normalcdf(-.5,1.0,0,1) =
0.532807
Slide 6- 27
Example: SAT Scores Again
Question 2: SAT II scores are described by model
N(500,100). What proportion of SAT II scores
fall between 450 and 600?
1. Make a picture of this normal model.
Shade desired region.
2. Find z-scores for the cut points (-0.5 and 1).
3. Use your calculator to find this area.
4. The other method!!! –
normalcdf(450,600,500,100)=0.532807, 53.3%
Slide 6- 28
To infinity and beyond! (not really)



Theoretically, the standard Normal model extends
right and left forever. Since you can’t tell the
calculator to use infinity as a right of left cut point,
the book suggests that you use 99 (or -99).
What area of the Normal model does
normalcdf(.67,99) represent?
What area of the Normal model does
normalcdf(-99.1.38) represent?
Slide 6- 29
Example: pg. 127 #34 – IQs revisited (Cont.)
Based on the Normal model N(100,16) describing
IQ scores, what percent of people’s IQs would
you expect to be:
Answer: 89.4%
a.
Over 80?
b.
Under 90?
Answer: 26.6%
c.
Between 112 and 132?
Answer: 20.4%
Slide 6- 30
From Percentiles to z-Scores in Reverse


Sometimes we start with areas and need to find
the corresponding z-score or even the original
data value.
Example: What z-score represents the first
quartile in a Normal model?
Slide 6- 31
invNorm(


Finds the z scores of a specific percentile, in
decimal form
Find the z score at the 25th percentile (first
quartile):
invNorm(.25) = -.6744897495
The calculator says that the cut point for the
leftmost 25% of a Normal model is approximately
z = -0.674.
Slide 6- 32
Example: SAT once again
Question 3: Suppose a college says it admits only
people with SAT II verbal test scores among the
top 10%. How high a score does it take to be
eligible?
1. Draw a picture and shade appropriate region.
2. What percentile will we be using?
3. Use calculator to find z score: invNorm(0.9)
4. Convert z score back to the original units, OR….
invNorm(0.9,500,100)
Answer: z = 1.28, and the score is 628 points on the SAT.
Slide 6- 33
How Can You Tell - Normal or not?



When you actually have your own data, you must check
to see whether a Normal model is reasonable.
Looking at a histogram of the data is a good way to check
that the underlying distribution is roughly unimodal and
symmetric.
A more specialized graphical display that can help you
decide whether a Normal model is appropriate is the
Normal probability plot:
If the distribution of the data is roughly Normal, the Normal
probability plot approximates a diagonal straight line.
Deviations from a straight line indicate that the distribution
is not Normal.
Slide 6- 34
How Can You Tell - Normal or not? (cont.)

Nearly Normal data have a histogram and a
Normal probability plot that look somewhat like
this example:
Slide 6- 35
How Can You Tell - Normal or not? (cont.)

A skewed distribution might have a histogram
and Normal probability plot like this:
Slide 6- 36
What Can Go Wrong?

Don’t use a Normal model when the distribution is
not unimodal and symmetric.
Don’t use the mean and standard deviation when
outliers are present—the mean and standard
deviation can both be distorted by outliers.

Don’t round off too soon too much.



Don’t round your results in the middle of a
calculation.
Don’t worry about minor differences in results.
Slide 6- 37
What have we learned?


The analysis of the data can be easier after shifting or rescaling
the data.
 Shifting data by adding or subtracting the same amount from
each value affects measures of center and position but not
measures of spread.
 Rescaling data by multiplying or dividing every value by a
constant changes all the summary statistics—center, position,
and spread.
The power of standardizing data:
 Standardizing uses the SD as a ruler to measure distance from
the mean (z-scores).
 With z-scores, we can compare values from different
distributions or values based on different units.
 z-scores can identify unusual values among data.
Slide 6- 38
What have we learned? (cont.)
Before using a Normal model - check the Nearly
Normal Condition with a histogram or Normal
probability plot.


Normal models follow the 68-95-99.7 Rule, and
we can use technology or tables for a more
detailed analysis.
Slide 6- 39