z-scores: Using Standard Deviation as a Ruler



The trick in comparing very different-looking values is to use standard deviations as our
rulers.
The standard deviation tells us how the whole collection of values varies, so it’s a natural
ruler for comparing an individual to a group.
As the most common measure of variation, the standard deviation plays a crucial role in
how we look at data.
Remember:
Standard Deviation (s) is a measure of spread. It approximately measures the average
distance of each data point from the mean.
Standardizing with z-scores
A z-score measures how many standard deviations a number is from the mean.
Example:
6, 8, 10, 12, 13, 15, 16, 17, 18, 20, 22, 25, 26, 30
n = 14 (there are 14 numbers)
Mean: $\bar{y} = 17.0$

Minimum:  6
Q1:       12
Median:   16.5
Q3:       22
Maximum:  30

Because the mean (17.0) is close to the median (16.5), there is a good chance that
the data is symmetric. To be sure, we can look at the histogram of the data. (above)
With symmetric data, the standard deviation is the appropriate measure of spread.
The standard deviation:
$s \approx 7.05$
6, 8, 10, 12, 13, 15, 16, 17, 18, 20, 22, 25, 26, 30
Because 10 is almost exactly 1 standard deviation below the mean ($17.0 - 7.05 = 9.95$), the z-score
corresponding to the number 10 is $z \approx -1$. (The negative sign is because the number is to
the left of the mean.)
The z-score corresponding to the number 20 is $(20 - 17.0)/7.05 \approx 0.43$, a little under ½.
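
The summary statistics and the two z-scores in this example can be checked with a short script. This is only a minimal sketch using Python's standard `statistics` module. It assumes the sample standard deviation (dividing by n - 1), and `z_score` is a hypothetical helper name chosen for illustration.

```python
import statistics

# Data set from the example above.
data = [6, 8, 10, 12, 13, 15, 16, 17, 18, 20, 22, 25, 26, 30]

mean = statistics.mean(data)      # arithmetic mean
median = statistics.median(data)  # average of the two middle values (n is even)
s = statistics.stdev(data)        # sample standard deviation (divides by n - 1)


def z_score(y, mean, s):
    """Number of standard deviations that y lies from the mean."""
    return (y - mean) / s


print(f"n = {len(data)}, mean = {mean:.2f}, median = {median}, s = {s:.2f}")
for y in (10, 20):
    print(f"y = {y}: z = {z_score(y, mean, s):+.2f}")
```

A mean and median that land close together support the symmetry check described above, although a histogram is still the better test.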
Standardizing with z-scores

We compare individual data values to their mean, relative to their standard deviation using
the following formula:
$$z = \frac{y - \bar{y}}{s}$$
We call the resulting values standardized values, denoted as z. They can also be called z-scores.
Standardized values have no units.
z-scores measure the distance of each data value from the mean in standard deviations.
A negative z-score tells us that the data value is below the mean, while a positive z-score
tells us that the data value is above the mean.



Example:
Use the formula $z = \frac{y - \bar{y}}{s}$ to standardize the following data by converting to z-scores.
6, 8, 10, 12, 13, 15, 16, 17, 18, 20, 22, 25, 26, 30
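First, what is the mean?  $\bar{y}$ = ______
What is the standard deviation?  $s$ = ______

y      $y - \bar{y}$      $z = \frac{y - \bar{y}}{s}$
6      ______             ______
8      ______             ______
10     ______             ______
12     ______             ______
13     ______             ______
15     ______             ______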
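
The same table can be produced for every value in the list at once. The sketch below is one possible way to do it, not the handout's own method. `standardize` is a hypothetical function name, and it assumes the sample standard deviation (n - 1).

```python
import statistics


def standardize(values):
    """Return one (y, y - mean, z) row for every value in the list."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1), an assumption
    return [(y, y - mean, (y - mean) / s) for y in values]


data = [6, 8, 10, 12, 13, 15, 16, 17, 18, 20, 22, 25, 26, 30]
rows = standardize(data)

# Print the three worksheet columns: value, deviation from the mean, z-score.
print(f"{'y':>4} {'y - mean':>9} {'z':>7}")
for y, deviation, z in rows:
    print(f"{y:>4} {deviation:>9.2f} {z:>7.2f}")
```

Working a few rows by hand first and then comparing against the printed table is a reasonable way to check the arithmetic.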
The Standard Deviation as a Ruler
z-scores measure the number of standard deviations a number is from the mean.
A positive z-score means that the datum is to the right of the mean.
A negative z-score means that the datum is to the left of the mean.
If a z-score is near zero (close to the mean), that indicates that the datum is typical (not unusual).
Most z-scores fall between -2 and 2. A z-score higher than 2 or lower than -2 is unusual.
(About 95% of the data in a normally distributed set are less than 2 standard deviations away from the mean.)
If a z-score is higher than 3 or lower than -3, then the corresponding datum is very unusual.
(About 99.7% of the data in a normally distributed set are less than 3 standard deviations away from the mean.)
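
These rules of thumb are easy to apply mechanically. The following is a minimal sketch that uses the cutoffs of 2 and 3 given above. The function name `describe_z` and the label "typical" are assumptions made for illustration.

```python
def describe_z(z):
    """Label a z-score using the rule-of-thumb cutoffs of 2 and 3."""
    distance = abs(z)  # how far from the mean, ignoring direction
    if distance > 3:
        return "very unusual"
    if distance > 2:
        return "unusual"
    return "typical"


for z in (-3.4, -1.0, 0.2, 2.5):
    side = "below" if z < 0 else "above"
    print(f"z = {z:+.1f}: {describe_z(z)}, {side} the mean")
```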
Normal Curve
Normally distributed data has a histogram that looks similar to the bell curve below. As you can
see, most of the data is near the mean (where z = 0).
[Figure: normal (bell) curve, with the horizontal axis labeled in z-scores]
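
The share of a normal distribution that lies within a given number of standard deviations of the mean can be computed directly. This sketch relies on the standard identity that this share equals erf(k / √2) and uses `math.erf` from Python's standard library. `fraction_within` is a hypothetical helper name.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(k):.1%}")
```

The results for k = 2 and k = 3 are where the "about 95%" and "about 99.7%" figures come from.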
Example:
The z-scores for the data set:
2, 5, 7, 8, 13 are as follows:
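
The worked values are not reproduced in this transcript, so the following is a reconstruction rather than the original answer key. It assumes the sample standard deviation (dividing by n - 1) and rounds to two decimal places.

```latex
\begin{align*}
\bar{y} &= \frac{2 + 5 + 7 + 8 + 13}{5} = 7 \\
s &= \sqrt{\frac{(-5)^2 + (-2)^2 + 0^2 + 1^2 + 6^2}{5 - 1}} = \sqrt{16.5} \approx 4.06 \\
z_{2} &= \frac{2 - 7}{4.06} \approx -1.23 \\
z_{5} &= \frac{5 - 7}{4.06} \approx -0.49 \\
z_{7} &= \frac{7 - 7}{4.06} = 0 \\
z_{8} &= \frac{8 - 7}{4.06} \approx 0.25 \\
z_{13} &= \frac{13 - 7}{4.06} \approx 1.48
\end{align*}
```

All five z-scores fall between -2 and 2, so none of these values would count as unusual.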
Benefits of Standardizing (Converting to z-scores)


Standardized values have been converted from their original units to the standard statistical
unit of standard deviations from the mean.
Thus, we can compare values that are measured on different scales, with different units, or
from different populations.
Example:
JaNathan earned a 93% on a test in Mr. Kane’s class. The test scores for that test
were normally distributed with a mean of 75 and a standard deviation of 12.
During the football season, JaNathan ran the 40-yard dash in 4.5 seconds. The
team's times in the 40-yard dash were normally distributed with a mean
of 5.1 seconds and a standard deviation of 0.33 seconds.
Which is more impressive, JaNathan’s 93% test score or his 4.5 second 40-yard
dash time?
Because the two numbers we are trying to compare use different units, we need to
standardize the units (convert them to z-scores) before we can compare them.
Convert each to z-scores (show work):
Test score 93: $z = \frac{93 - 75}{12} = 1.5$
Running time 4.5: $z = \frac{4.5 - 5.1}{0.33} \approx -1.82$
Because the 4.5-second run time has a z-score that is farther from zero than the z-score of
the 93% test score, it is more impressive that JaNathan ran the 40-yard dash in 4.5 seconds.
(Keep in mind that a negative z-score for run time is good because we want our run time to be less than the mean
run time. Also, a positive z-score for test grade is good because we want to score higher than the mean on a test.)
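
The comparison can also be written out as a short script. This is a sketch under the numbers stated in the example. The helper `z_score` and the sign flip for run time (so that a larger number always means "better") are illustrative choices rather than part of the original handout.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies from the mean."""
    return (y - mean) / sd


z_test = z_score(93, mean=75, sd=12)     # test score, higher is better
z_run = z_score(4.5, mean=5.1, sd=0.33)  # run time, lower is better

# Flip the sign of the run-time z-score so both are on a "higher is better" scale.
impressiveness = {"test score": z_test, "40-yard dash": -z_run}
best = max(impressiveness, key=impressiveness.get)

print(f"test score z = {z_test:+.2f}")
print(f"40-yard dash z = {z_run:+.2f}")
print(f"more impressive: {best}")
```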
What have we learned?

We’ve learned the power of standardizing data.
- Standardizing uses the standard deviation as a ruler to measure distance from
  the mean (z-scores).
- With z-scores, we can compare values from different distributions or values
  based on different units.
- z-scores can identify unusual or surprising values among data.