Survey

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts

Bootstrapping (statistics) wikipedia, lookup

Categorical variable wikipedia, lookup

Student's t-test wikipedia, lookup

Regression toward the mean wikipedia, lookup

Transcript
```z-scores
z-scores
The Standard Deviation as a Ruler



The trick in comparing very different-looking
Remember:
values
is to use standard deviations as our rulers.
The standard deviation tells us how the whole
Standard
(s) isso
a measure
of ruler
collection
of Deviation
values varies,
it’s a natural
which
approximately
the average
comparing
anisindividual
to a group.
each data
point from
the mean.
Asdistance
the mostofcommon
measure
of variation,
the
standard deviation plays a crucial role in how we
look at data.
z-scores
Standardizing with z-scores
A z-score measures how many standard
deviations a number is from the mean.
For example, let’s look at the following data set.
6, 8, 10, 12, 13, 15, 16, 17, 18, 20, 22, 25, 26, 30
z-scores
6, 8, 10, 12, 13, 15, 16, 17, 18, 20, 22, 25, 26, 30
n = 14
Mean:
y  16.6
Minimum:
Q1:
Median:
Q3:
Maximum:
6
12
16.5
22
30
Because the mean(16.6)
is very close to the
median(16.5), there is a
good chance that the
data is symmetric.
With symmetric data, the
To be
sure, we
can lookis the
standard
deviation
at the
histogrammeasure
of the of
appropriate
The standard deviation:
s ≈ 6.6
z-scores
z = -1
6.6
6, 8, 10, 12, 13, 15, 16, 17, 18, 20, 22, 25, 26, 30
Mean ≈ 16.6
With a standard deviation
of 6.6 and a mean of
16.6, we can see that the
number 10 is exactly one
standard deviation below
the mean.
Because 10 is exactly 1
standard
away
(mean) deviation
– (std. dev.)
from the mean, the zscore corresponding to
16.6 – 6.6
10 be
the number
10=would
z = -1
(The negative is because the
number is to the left of the mean.)
z-scores
z ≈ 0.5
3.3
6, 8, 10, 12, 13, 15, 16, 17, 18, 20, 22, 25, 26, 30
Mean ≈ 16.6
With a standard deviation of
6.6, half (0.5) of one standard
deviation would be 3.3.
Notice that the number 20 is
approximately ½ of one
standard deviation from the
mean.
(mean) + (0.5)(std. dev.)
The z-score corresponding to
the number 20 would be
approximately 0.5 or ½ .
16.6 +3.3 = 19.9
≈ 20
z-scores
Standardizing with z-scores

We compare individual data values to their
mean, relative to their standard deviation
using the following formula:
y  y

z
s

We call the resulting values standardized
values, denoted as z. They can also be
called z-scores
z-scores
Standardizing with z-scores (cont.)



Standardized values have no units.
z-scores measure the distance of each data
value from the mean in standard deviations.
A negative z-score tells us that the data value is
below the mean, while a positive z-score tells us
that the data value is above the mean.
z-scores
Use the formula
y  y

z
to standardize the
s
following data by converting to z-scores.
6, 8, 10, 12, 13, 15, 16, 17, 18, 20, 22, 24, 25, 26
Recall:
mean: y ≈ 16.6
standard deviation: s ≈ 6.6
and that the data are the y values.
z-scores
y  y

z
s
6, 8, 10, 12, 13, 15, 16, 17, 18, 20, 22, 24, 25, 26
y
6
 y  y
 6  16.6 
s
6.6
6.6
6.6
6.6
z
-1.61
-1.30
-1
-0.70
8
10
12
8  16.6  10  16.6  12  16.6 
Complete the rest of this table in your notes.
z-scores
The Standard Deviation as a Ruler
z-scores measure the number of standard
deviations a number is from the mean.
A positive z-score means that the datum is to the
right of the mean.
A negative z-score means that the datum is to
the left of the mean.
z-scores
The Standard Deviation as a Ruler
If a z-score is near zero that indicates that the datum is
typical (close to the mean).
Most z-scores fall between -2 and 2. A z-score higher
than 2 or less than -2 are unusual.
(95% of the data in a set that is normally distributed are less than 2
standard deviations away from the mean)
If a z-score has a value higher than 3 or lower than -3,
then the corresponding datum is very unusual.
(97.7% of the data in a set that is normally distributed are less than 3
standard deviations away from the mean)
z-scores
Normally distributed data has a histogram that looks
similar to the bell curve below. As you can see, most of
the data is near the mean (where z = 0).
z
z-scores
Example:
The z-scores for the data set: 2, 5, 7, 8, 13 are as follows:
Mean = 7
2
Standard Deviation = 4.06
5
7
8
13
2  7
5  7
7  7
8  7 
13  7 
4.06
4.06
4.06
4.06
4.06
-1.23
-0.25
0
Bigger z-scores
mean that data
is less typical
Small z-scores mean
that data is typical
0.25
1.48
Bigger z-scores
mean that data
is less typical
z-scores
Benefits of Standardizing
(Converting to z-scores)


Standardized values have been converted
from their original units to the standard
statistical unit of standard deviations from the
mean.
Thus, we can compare values that are
measured on different scales, with different
units, or from different populations.
z-scores
Example:
JaNathan earned a 93% on a test in Mr. Kane’s class.
The test scores for that test were normally distributed
with a mean of 75 and a standard deviation of 12.
During the football season, JaNathan ran the 40-yard
dash in 4.5 seconds. The mean time for the team in the
40-yd dash was normally distributed with a mean of 5.1
seconds and a standard deviation of 0.33.
Which is more impressive, JaNathan’s 93% test score or
his 5.1 second 40-yard dash time?
z-scores
Because the two numbers we are trying to
compare use different units, we need to
standardize the units (convert them to z-scores)
before we can compare them.
During the football season, JaNathan ran the 40-yard dash in 4.5 seconds. The
JaNathan
93%inon
a 40-yd
test in dash
Mr. Kane’s
class. The
test scores
that
mean
timeearned
for the ateam
the
was normally
distributed
with for
a mean
test
were
normally
a mean
75 and a standard deviation of 12.
of 5.1
seconds
anddistributed
a standardwith
deviation
of of
0.33.
Converting 93
4.5to
toaaz-score:
z-score:
xx4.5
93
xx5.1
75
ss0.33
12
93  75
5.1
x  x 4.5

z
0.33
12
s
z z 1.5
1.82
z-scores
The standardized scores compare as follows:
93% test score  z = 1.5
5.1 sec run time  z = -1.8
Because the 5.1 second run time has a z-score
that is further away from zero than the z-score of
the 93% test score, it is more impressive that
JaNathan ran the 40-yard dash in 5.1 seconds.
(Keep in mind that a negative z-score for run time is good because we want
our run time to be less than the mean run time.. Also, a positive z-score for
test grade is good because we want to score higher than the mean on a
test.)
z-scores
What have we learned?

We’ve learned the power of standardizing data.
 Standardizing uses the Standard Deviation as
a ruler to measure distance from the mean (zscores).
 With z-scores, we can compare values from
different distributions or values based on
different units.
 z-scores can identify unusual or surprising
values among data.
```