Wednesday 10-1: Class PowerPoint on Standard Deviation

STANDARD DEVIATION
CHAPTER 6
To determine who should get the GOLD MEDAL in the heptathlon, the performances in all 7 events somehow need to be combined into ONE score!
How do they do this? The running events are recorded in minutes and seconds, while the throwing and jumping events are recorded in meters.
WHAT DO YOU THINK?
The Standard Deviation as a Ruler



• Use standard deviation when comparing unlike measures.
• Standard deviation is the most common measure of spread.
• Remember: the standard deviation is the square root of the variance.
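To make the last bullet concrete, here is a minimal Python sketch that computes the sample variance by hand (dividing by n - 1, the usual textbook convention) and then takes its square root. The data values are hypothetical, chosen only for illustration.

```python
import math
import statistics

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    mean = sum(data) / len(data)
    return sum((y - mean) ** 2 for y in data) / (len(data) - 1)

def sample_sd(data):
    """Standard deviation = square root of the variance."""
    return math.sqrt(sample_variance(data))

# Hypothetical data, for illustration only
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(data))
print(sample_sd(data))
# statistics.stdev uses the same n - 1 convention, so the two should agree
print(math.isclose(sample_sd(data), statistics.stdev(data)))
```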
Standardizing
• We standardize to eliminate units.
• A standardized value can be found by subtracting the mean from the value and dividing by the standard deviation: $z = \frac{y - \bar{y}}{s}$

• A z-score has no units.
• A z-score measures the distance of each data value from the mean in standard deviations.
– Negative z-score: the data value is below the mean
– Positive z-score: the data value is above the mean
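A short sketch of the z-score calculation and how its sign is read. The mean of 50 and standard deviation of 10 are hypothetical values, not taken from the slides.

```python
def z_score(y, mean, sd):
    """How many standard deviations y lies from the mean (unitless)."""
    return (y - mean) / sd

# Hypothetical values: mean 50, standard deviation 10
for y in (35, 50, 72):
    z = z_score(y, mean=50, sd=10)
    position = "below" if z < 0 else "above" if z > 0 else "at"
    print(f"y = {y}: z = {z:+.2f} ({position} the mean)")
```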
Benefits of Standardizing
• Standardized values (z-scores) are expressed in the standard statistical unit: standard deviations from the mean.
• Values that are measured on different scales or in different units can now be compared.
• Example: Judges will use z-scores to determine the winner of a heptathlon.

Bacher ran the 800 m in 129 seconds, 1.6 standard deviations faster than the mean, so that performance counts as +1.6 (in a race, a lower time is better). Her long jump of 5.84 m had a z-score of -0.44 (below the mean). Combining the two: 1.6 + (-0.44) = 1.16.
Judges would do this for every athlete in every event, and the highest total score wins.
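A minimal sketch of the combining idea: orient each event's z-score so that positive always means "better", then add. The helper name `event_z` and the 800 m field mean and standard deviation in the first call are hypothetical; the two z-scores summed at the end are the ones quoted for Bacher.

```python
def event_z(value, mean, sd, lower_is_better=False):
    """z-score oriented so that a positive value always means 'better than average'.
    For timed events (lower is better) the sign is flipped."""
    z = (value - mean) / sd
    return -z if lower_is_better else z

# Hypothetical race: time 120 s in a field with mean 130 s and sd 5 s
print(event_z(120, mean=130, sd=5, lower_is_better=True))

# Bacher's z-scores from the slide, already oriented so positive = better
bacher = {"800 m": 1.6, "long jump": -0.44}
total = sum(bacher.values())
print(round(total, 2))
```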
Shifting Data
• Adding a constant to (or subtracting a constant from) each value shifts each of these by that same constant:
– the mean and median
– the maximum, minimum, and quartiles
• The spread does not change because the distribution is simply shifting.
– The range, IQR, and standard deviation remain the same.
• Recap: Adding a constant to every data value adds the same constant to measures of center and percentiles, but leaves measures of spread unchanged.
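The recap can be checked numerically. This sketch uses Python's `statistics` module on a hypothetical data set and adds 5 to every value; note that `statistics.quantiles` uses one particular quartile method, so its quartiles may differ slightly from a calculator's, but the shift behaves the same way.

```python
import statistics

def summary(data):
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "min": min(data),
        "max": max(data),
        "range": max(data) - min(data),
        "IQR": q3 - q1,
        "sd": statistics.stdev(data),
    }

data = [3, 7, 8, 12, 20]          # hypothetical values
shifted = [y + 5 for y in data]   # add the constant 5 to every value

before, after = summary(data), summary(shifted)
for key in before:
    print(f"{key:>6}: {before[key]:7.3f} -> {after[key]:7.3f}")
```

Center and position measures move up by 5; range, IQR, and standard deviation print the same before and after.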
Rescaling Data
• Rescaling data means multiplying or dividing all values by the same number.
– This changes the measurement units.
– Ex. Inches to feet (divide by 12); feet to inches (multiply by 12).
• When we multiply or divide all the data values by the same positive constant, both measures of location (mean and median) and measures of spread (range, IQR, and standard deviation) are multiplied or divided by that same value.
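A quick sketch of rescaling from inches to feet. The heights are hypothetical; the point is that every summary, spread included, ends up divided by 12.

```python
import statistics

heights_in = [62, 66, 68, 71, 75]           # hypothetical heights in inches
heights_ft = [h / 12 for h in heights_in]   # inches -> feet: divide by 12

measures = [("mean", statistics.mean), ("median", statistics.median), ("sd", statistics.stdev)]
for name, f in measures:
    a, b = f(heights_in), f(heights_ft)
    print(f"{name:>6}: {a:.3f} in -> {b:.3f} ft (ratio {a / b:.1f})")
```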
Back to z-scores
• Standardizing data into z-scores shifts the values by the mean and rescales them by the standard deviation.
• Standardizing:
– does not change the shape of the distribution of a variable
– changes the center by making the mean 0
– changes the spread by making the standard deviation 1
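The three effects of standardizing can be verified with a short sketch on a hypothetical, deliberately skewed data set. Because standardizing is a shift followed by a rescale, the ordering of the values and the relative sizes of the gaps between them (the shape) are preserved.

```python
import statistics

def standardize(data):
    """Shift by the mean, then rescale by the standard deviation."""
    mean, sd = statistics.mean(data), statistics.stdev(data)
    return [(y - mean) / sd for y in data]

data = [12, 15, 15, 18, 22, 31]    # hypothetical, right-skewed values
z = standardize(data)
print([round(v, 3) for v in z])
print(statistics.mean(z), statistics.stdev(z))   # approximately 0 and 1
```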

The First Three Rules for Working with Normal Models
Make a picture!
Make a picture!
Make a picture!
The Normal Model
• There are a lot of variables in the real world that have similar-looking distributions…
The Normal Model
• The Normal Model is a simplified, idealized model for symmetric, unimodal, mound-shaped data (often referred to as a "Bell-Shaped Curve").
The Normal Model
• We use a normal model to mathematically smooth the distribution (more on this later).
The Normal Model
• When can we use the normal model?
– The data is unimodal
– The data is symmetric
– The data is mound shaped
– The data has no outliers
• You need to check (with a histogram, a Normal Probability Plot, or something similar) if the data is nearly normal before you can use the Normal Distribution to model your data.
Example
• Is the normal model appropriate? Why or why not?
Example
• Is the normal model appropriate? Why or why not?
Mean = 19.810 mm
StDev = 112.016 mm
Min = 0.316 mm
LQ = 1.872 mm
Med. = 3.589 mm
UQ = 9.457 mm
Max = 2682.000 mm
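The summary statistics above can be checked against the conditions for a Normal model with a little arithmetic. This sketch uses only the numbers on the slide plus the familiar 1.5 × IQR outlier rule of thumb, which is an assumption here since the slide does not say which check to use.

```python
# Summary statistics from the slide (all in mm)
mean, sd = 19.810, 112.016
q1, median, q3, max_value = 1.872, 3.589, 9.457, 2682.000

iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr   # common 1.5 x IQR rule of thumb for outliers

print(f"mean - median = {mean - median:.3f} mm")   # a large gap suggests right skew
print(f"upper fence   = {upper_fence:.3f} mm")
print(f"max is an outlier: {max_value > upper_fence}")
```

The mean sits far above the median and the maximum is far beyond the upper fence, so the distribution is strongly right-skewed with at least one outlier, which violates the symmetric and no-outliers conditions.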
The Normal Model
• Whether or not the normal model is appropriate for
a particular variable is based only on the shape of
the variable’s distribution…
• A variable’s distribution can have any center and
any spread and still fit a normal model.
How is this possible?
The Normal Model
• There’s a different normal model for every center and
every spread.
• Center
– The center of the normal model is denoted μ (say "mu")
– μ is the mean and the median of the normal model
• Spread
– The spread of the normal model is denoted σ (say "sigma")
– Smaller σ makes the curve "tall and skinny"; larger σ makes the curve "flat and wide"
• Write N(μ, σ)
– This says that we have a normal model with a mean of μ and a std. dev. of σ
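A small sketch using Python's `statistics.NormalDist` to compare normal models with the same center and different spreads. It shows the "tall and skinny" versus "flat and wide" effect through the height of the curve's peak, and also that the share of the area within 1 standard deviation is the same for every σ. The particular σ values are arbitrary.

```python
from statistics import NormalDist

# Same center (mu = 0), different spreads
for sigma in (0.5, 1, 2):
    model = NormalDist(mu=0, sigma=sigma)
    within_1_sd = model.cdf(sigma) - model.cdf(-sigma)
    # The peak of the curve is at the mean; a smaller sigma gives a taller peak
    print(f"N(0, {sigma}): peak height = {model.pdf(0):.4f}, "
          f"area within 1 sd = {within_1_sd:.4f}")
```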
The Normal Model
• We could work with all of the different normal
models, but then we’d have to keep track of a lot of
stuff. It is much, much easier to convert all the
different normal models into one Standard Normal
Model
• Standardize our model by transforming the data: $z = \frac{y - \mu}{\sigma}$
• Standard Normal Model:
– N(0,1)
– Mean = 0
– Standard Deviation = 1
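The transformation z = (y − μ)/σ, sketched in Python with the N(70, 3) heights model that appears later in these slides. The helper `standardize` is written out by hand; `NormalDist.zscore` (Python 3.9+) computes the same quantity and is shown for comparison.

```python
from statistics import NormalDist

heights = NormalDist(mu=70, sigma=3)   # the N(70, 3) model used in the heights example

def standardize(y, model):
    """z = (y - mu) / sigma"""
    return (y - model.mean) / model.stdev

for y in (64, 70, 73, 79):
    print(y, standardize(y, heights), heights.zscore(y))
```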
The Normal Model
• Once standardized…
• We’ve simply shifted and rescaled the distributions.
• Think of it as adding a z-scale/z-axis which always has a
mean of 0 and a standard deviation of 1.
Notes on the Normal Model
• The curve is always above the x-axis
– Never reaches the x-axis
– Continues forever in both directions (toward +∞ and –∞)
• Total area under the curve = 1
• It is perfectly symmetric
– Mean = Median
• For any segment of the normal curve we know
the area underneath the curve within that
segment
The Normal Model
• Why use the normal model?
– It’s simpler than real data.
– The data from a normal model is distributed in a
predictable pattern
– For any segment of the standard normal model
we know the area underneath the curve.
– The area underneath the curve tells us what
proportion of the data has values that
correspond to our segments, thus giving us the
probability that a data point will be in that
segment
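"Area under the curve for a segment" corresponds to a difference of cumulative areas. This sketch uses `statistics.NormalDist().cdf`, which returns the area to the left of a z-value under N(0, 1); the segment endpoints are arbitrary examples.

```python
from statistics import NormalDist

Z = NormalDist()   # the standard Normal model N(0, 1)

def area_between(a, b, model=Z):
    """Area under the Normal curve between a and b = proportion of data in that segment."""
    return model.cdf(b) - model.cdf(a)

print(Z.cdf(0))                  # area left of the mean: half of the total area of 1
print(area_between(-1.5, 0.5))   # any segment works, not just whole standard deviations
print(1 - Z.cdf(2))              # right tail beyond z = 2
```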
The Normal Model: 68-95-99.7 Rule
• (about) 68% of the data is between -1 and 1
standard deviations from the mean
• (about) 95% of the data is between -2 and 2
standard deviations from the mean
• (about) 99.7% of the data is between -3 and 3
standard deviations from the mean
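The rule's percentages are rounded. A quick computation with `statistics.NormalDist` gives the more precise areas, about 68.27%, 95.45%, and 99.73%.

```python
from statistics import NormalDist

Z = NormalDist()   # N(0, 1)
coverage = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within {k} sd of the mean: {p:.2%}")
```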
The Normal Model: 68-95-99.7 Rule
68% of observations are within 1 σ of the mean μ
• For N(0,1), 68% of the observations are between –1 and 1

The Normal Model: 68-95-99.7 Rule
95% of observations are within 2 σ of the mean μ
• For N(0,1), 95% of the observations are between –2 and 2

The Normal Model: 68-95-99.7 Rule
99.7% of observations are within 3 σ of the mean μ
• For N(0,1), 99.7% of the observations are between –3 and 3

The Normal Model: 68-95-99.7 Rule

The heights of men are thought to follow a Normal
Model with a mean of 70 in. and a std. dev. of 3 in.

N(70, 3)

Draw a picture of this Normal Model. Clearly label
what the 68-95-99.7 Rule indicates about men’s
heights.
Example: How to proceed
Step 1: Check to see if the normal model is appropriate
Step 2: Draw the standard normal picture.
Step 3: Find the mean and standard deviation of the
sample data.
Step 4: Make the conversion from the standard normal to the data and draw the new axis (directly below the standard normal picture) using this formula: $z = \frac{y - \mu}{\sigma}$, or equivalently $y = \mu + z\sigma$.
Step 5: Use the picture to answer any questions.
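Step 4 amounts to computing y = μ + zσ for each whole-number z on the standard normal axis. A minimal sketch for the N(70, 3) heights model; the function name `data_axis` is just an illustrative choice.

```python
def data_axis(mu, sigma, ks=range(-3, 4)):
    """Step 4: y = mu + z * sigma for each whole-number z on the standard Normal axis."""
    return {z: mu + z * sigma for z in ks}

# Men's heights, N(70, 3)
for z, y in data_axis(70, 3).items():
    print(f"z = {z:+d}  ->  y = {y} in.")
```

Reading the labels in pairs gives the 68-95-99.7 intervals for heights: 67 to 73 in., 64 to 76 in., and 61 to 79 in.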
Example
1. What percentage of men have
heights greater than 70 inches?
2. What percentage of men have
heights less than 64 inches?
3. What percentage of men have
heights greater than 73 inches?
4. 0.15% of men have heights above
what value?
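The four questions are meant to be answered from the picture with the 68-95-99.7 Rule (50%, 2.5%, 16%, and 79 in.). As a cross-check, this sketch computes the same quantities from the N(70, 3) model with `statistics.NormalDist`; the exact values differ slightly from the rule-of-thumb answers because 68, 95, and 99.7 are rounded.

```python
from statistics import NormalDist

men = NormalDist(mu=70, sigma=3)   # N(70, 3)

# 1. Above 70 in. (the mean): half of the model
p_above_70 = 1 - men.cdf(70)
# 2. Below 64 in. (z = -2): the rule gives (100% - 95%) / 2 = 2.5%
p_below_64 = men.cdf(64)
# 3. Above 73 in. (z = +1): the rule gives (100% - 68%) / 2 = 16%
p_above_73 = 1 - men.cdf(73)
# 4. Height with 0.15% of men above it: the rule points to z = +3, i.e. 79 in.
cutoff = men.inv_cdf(1 - 0.0015)

print(f"{p_above_70:.2%}  {p_below_64:.2%}  {p_above_73:.2%}  {cutoff:.2f} in.")
```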
Example
De Veaux, Velleman & Bock, Ch6, #19:
EPA fuel economy estimates for automobile models tested
recently predicted a mean of 24.8 mpg and a standard
deviation of 6.2 mpg for highway driving. Assume that a
Normal model can be applied.
1. What does the 68-95-99.7 rule say about the
distribution of autos’ fuel efficiency?
2. About what percent of autos should get more than 31
mpg?
3. Describe the gas mileage of the worst 2.5% of all the
cars.
4. If I pick an automobile at random, what is the probability that I select a car that gets more than 37.2 mpg?
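A sketch for checking answers to this exercise, using the N(24.8, 6.2) model stated in the problem and `statistics.NormalDist`. Each printed line pairs the rule-of-thumb reasoning with the exact model value; the rule answers (16%, below about 12.4 mpg, 2.5%) are what the exercise expects.

```python
from statistics import NormalDist

mpg = NormalDist(mu=24.8, sigma=6.2)   # model given in the exercise

# 1. The 68-95-99.7 intervals
for k, pct in ((1, 68), (2, 95), (3, 99.7)):
    low, high = mpg.mean - k * mpg.stdev, mpg.mean + k * mpg.stdev
    print(f"about {pct}% of cars: {low:.1f} to {high:.1f} mpg")

# 2. More than 31 mpg is z = +1
print(f"P(mpg > 31)   = {1 - mpg.cdf(31):.2%}  (rule: about 16%)")
# 3. The worst 2.5% lie below z = -2
print(f"worst 2.5% below about {mpg.mean - 2 * mpg.stdev:.1f} mpg "
      f"(exact 2.5th percentile: {mpg.inv_cdf(0.025):.2f})")
# 4. More than 37.2 mpg is z = +2
print(f"P(mpg > 37.2) = {1 - mpg.cdf(37.2):.2%}  (rule: about 2.5%)")
```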