Download The Standard Deviation as a Ruler and the Normal

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Central limit theorem wikipedia , lookup

Transcript
The Standard Deviation as a Ruler
and the Normal Model
Normal curve, z-scores and calculating probabilities
The Shape and Distribution of Data
Data sets consisting of physical measurements (heights,
weights, lengths of bones, heart rates, blood pressures and so
on) for adults of the same species and sex tend to follow a
similar pattern.
The pattern is that most individuals are clumped around the
average, with numbers decreasing the farther values are from
the average in either direction.
30
Sample Distribution of Men's Height in U.S.
Sample Size = 200
25
Sample mean ( x)= 69.65
Sample standard deviation(s) = 3.36
Frequency
20
15
10
5
0
60.00
62.00
64.00
66.00
68.00
70.00
72.00
74.00
76.00
78.00
80.00
82.00
Men's Height - Sample Size n = 200
We use a theoretical distribution, instead of the sample distribution. This allows us to answer certain
questions using only the mean and standard deviation of the data
Example: The distribution of hemoglobin
measures
Hemoglobin: Substance in the blood that carries oxygen
Swedish Men: Aged 35 - 50
Events that effects Hemoglobin levels
Event
Effect on hemoglobin
Has a gene for producing a lot of hemoglobin
Increase
Just returned from a trip to the mountains
Increase
Has been eating a poor diet lately
Decrease
Recent illness
Decrease
If we assume that everyone has a 50:50 chance of each event and
that each event leads to an increase or decrease by about the same
amount, the histogram of hemoglobin levels will have a symmetric,
normal shape.
Distribution of car insurance damage claims

The measurements follow a right skewed
distribution. Majority of claims were below $5,000,
but there were occasionally a few extremely high
claims.
Normal Distribution (or Normal Curve)
1. Naturally occurring distribution: Many populations of
measurements follow approximately a normal curve:
•
Physical measurements within a homogeneous population – heights of
male adults.
•
Standard academic tests given to a large group – SAT scores.
Normal Distribution (or Normal Curve)
Proportion of population of measurements falling in a certain range =
area under curve over that range.
Knowing only the mean and standard deviation, we can use the
assumption of normality to answer certain questions. Example: The
probability that a British man will be between 65 and 70 inches.
The Standard Deviation as a Ruler

The trick in comparing very different-looking values is to
use standard deviations as our rulers.

The standard deviation tells us how the whole collection
of values varies, so it’s a natural ruler for comparing an
individual to a group.

As the most common measure of variation, the standard
deviation plays a crucial role in how we look at data.
Characteristics of a Normal Curve
 Symmetric
 Empirical
and Bell-Shaped
Rule
 For any normal curve, approximately …
 68% of the values fall within 1 standard deviation of
the mean in either direction
 95% of the values fall within 2 standard deviations of
the mean in either direction
 99.7% of the values fall within 3 standard deviations
of the mean in either direction
Empirical Rule for Any Normal Curve
68%
-1sd
95%
+1sd
-2 sd
+2 sd
99.7%
-3 sd
+3 sd
Standardizing with z-scores

We compare individual data values to their mean, relative
to their standard deviation using the following formula:
x  x

z
s

We call the resulting values standardized values, denoted
as z. They can also be called z-scores.
Standardizing with z-scores

Standardized values have no units.

z-scores measure the distance of each data value from
the mean in standard deviations.

A negative z-score tells us that the data value is below the
mean, while a positive z-score tells us that the data value
is above the mean.
Benefits of Standardizing

Standardized values have been converted from their
original units to the standard statistical unit of standard
deviations from the mean.

Thus, we can compare values that are measured on
different scales, with different units, or from different
populations.
Standardized Score
A “standardized score” is simply the number of standard
deviations an individual falls above or below the mean for
the whole group.
Example
Male heights have a mean of 70 inches and a standard deviation of 3
inches. Female heights have a mean of 65 inches and a standard
deviation of 2 ½ inches. Thus, a man who is 73 inches tall has a
standardized score of 1.
Thought Question: What is the standardized score corresponding
to your own height?
Standardizing with z-scores

Standardized Score (or z-score) is :
observed value – mean
standard deviation
Note: The term z-score relates only to standard normal
curves
Example: Male heights have a mean of 70 inches and a
Standard deviation of 3 inches. If my height is 71.5
inches, then my standardized score is 71.5 – 70/3 = 0.50
More on z-scores

Standardizing data into z-scores shifts the data by
subtracting the mean and rescales the values by dividing
by their standard deviation.

Standardizing into z-scores does not change the shape of the
distribution.

Standardizing into z-scores changes the center by making
the mean 0.

Standardizing into z-scores changes the spread by making
the standard deviation 1.
When is a z-score BIG?

A z-score gives us an indication of how unusual a value is
because it tells us how far it is from the mean.

A data value that sits right at the mean, has a z-score
equal to 0.

A z-score of 1 means the data value is 1 standard
deviation above the mean.

A z-score of –1 means the data value is 1 standard
deviation below the mean.
Describing Populations with Theoretical
Normal Models

There is a Normal model for every possible combination
of mean and standard deviation.

We write N(μ,σ) to represent a Normal model with a mean
of μ and a standard deviation of σ.

We use Greek letters because this mean and standard
deviation are not numerical summaries of the data. They
are part of the model. They don’t come from the data.
They are numbers that we choose to help specify the
model.

Such numbers are called parameters of the model.
Describing Populations with Theoretical
Normal Models

When we standardize using theoretical normal model
N(μ,σ), we still call the standardized value a z-score, and
we write
z
y

Heights of Adult Women in US
N(μ = 65, σ=2.5)
z
 What
y

is z-score for height of 68 inches?
Using the Normal Model

Once we have standardized, we need only one model:


The N(0,1) model is called the standard Normal model
(or the standard Normal distribution).
Be careful—don’t use a Normal model for just any data
set, since standardizing does not change the shape of the
distribution.
The Standard Normal Curve
1. The standard normal
curve has a mean of 0
and standard deviation
of 1.
2. We convert our data
values to the their
corresponding values
on the standard
normal curve.
3. This enables us to get
exact probabilities of
certain events.
Using the Normal Model

When we use the Normal model, we are assuming the
population distribution is Normal.

We cannot check this assumption in practice, so we
check the following condition:

Nearly Normal Condition:The shape of the sample data’s
distribution is unimodal and symmetric.

This condition can be checked with a histogram or a Normal
probability plot.
NHANES of 1976-1980

The National Health and Nutrition Examination Survey
(NHANES) is a program of studies designed to assess the health
and nutritional status of adults and children in the United States.
The survey is unique in that it combines interviews and physical
examinations.

Heights of adults, ages 18-24
 women
 mean: 65.0 inches
 standard deviation: 2.5 inches

men
 mean: 70.0 inches
 standard deviation: 2.8 inches
NHANES of 1976-1980

Empirical Rule
 Women: mean: 65.0 inches, standard deviation: 2.5 inches



68% are between 62.5 and 67.5 inches
[mean  1 std dev = 65.0  2.5]
95% are between 60.0 and 70.0 inches
99.7% are between 57.5 and 72.5 inches
Thought Question
 Men: mean: 70.0 inches, standard deviation: 2.8 inches

95% are between A. [64.4, 75.6 ] B. [67.2,72.8] C. [61.6, 78.4 ]
Using the Mean, Standard Deviation and
the Normal Model

With the Mean and Standard Deviation of the Normal
Distribution we can determine:
1.
At what percentile a given individual falls, if you know their
value? Example: What proportion (or percentage) of men
are less than 72.8 inches tall?
2.
What proportion of individuals fall into any range of values?
Example: What proportion (or percentage) of men are
between 68 inches tall and 74 inches tall?
3.
What value corresponds to a given percentile? Example:
What height value is the 10th percentile for men ages 18 to
24?
NHANES of 1976-1980
At what percentile a given individual falls, if you
know their value?
 Q. What proportion (or percentage) of men are less than
72.8 inches tall?
68%
(using the Empirical Rule)
1.
32% / 2 =
?
16%
-1
standard
deviations:
height values:
+1
70
72.8
(height values)
-3
-2
-1
0
1
2
3
61.6
64.4
67.2
70
72.8
75.6
78.4
NHANES of 1976-1980
Q. What proportion (or percentage) of men are less
than 68 inches tall?
Note: Here we can’t use the Empirical Rule

?
standard
deviations:
height values:
68
70
(height values)
-3
-2
-1
0
1
2
3
61.6
64.4
67.2
70
72.8
75.6
78.4
NHANES: Calculate Standardized Score

Q. What proportion of men are less than 68 inches tall?

How many standard deviations is 68 from 70?

standardized score =
(observed value minus mean) / (std dev)
= (68  70) / 2.8 = 0.71

The value 68 is 0.71 standard deviations below the mean 70.
NHANES of 1976-1980

What proportion (or percentage) of men are less than 68
inches tall?
?

68
70
(height values)
-0.71
0
(standardized values)
What is the probability of a standardized-score less than -0.71?
NHANES of 1976-1980

What proportion (or percentage) of men are less than 68
inches tall?
24%

68
70
(height values)
-0.71
0
(standardized values)
The proportion (or percentage) of men less than 68 inches tall
is 0.24 (or 24%)
Question What proportion of mean are 74 inches or shorter?
(hint: the standardized score for 74 inches is 1.43)
NHANES of 1976-1980
2. What proportion (or percentage) of individuals fall into any range
of values?
Q. What proportion (or percentage) of men are between 68 inches tall and
74 inches tall?

First, draw and label a normal curve.
standard
deviations:
height values:
-3
-2
-1
0
1
2
3
61.6
64.4
67.2
70
72.8
75.6
78.4
NHANES of 1976-1980

Shade on the graph the range of heights between 68
and 74 inches.
?
(height values)
(standardized values)
68
70
74
-0.71
0
1.43
NHANES of 1976-1980


24% of the men are less than 68 inches tall
and 92.4% of men are less than 74 inches tall
What percentage are between 68 inches tall and
74 inches tall?
92.4%
?
24%
(height values)
68
70
74
(standardized values) -0.71
0
1.43
NHANES of 1976-1980

Q. What proportion (or percentage) of men are
between 68 and 74 inches tall?
92.4%
92.4% – 24%
= 68.4%
24%
(height values)
(standardized values)

68
70
74
-0.71
0
1.43
The proportion (or percentage) of men between 68
inches tall and 74 inches tall is 0.684 (or 68.4%)
NHANES of 1976-1980
3.
What value corresponds to a given percentile?
Q. What height value is the 10th percentile for men ages 18
to 24?
10%
?
70
(height values)
Look up the closest percentile in the table.
Find the corresponding standardized score.
The value you seek is that many standard deviations from the mean.
Normal Tables
NHANES of 1976-1980

What height value is the 10th percentile for men ages 18 to 24?
10%
?
-1.28
70
(height values)
0
(standardized values)
NHANES of 1976-1980

Observed Value for a Standardized Score

What height value is the 10th percentile for men ages
18 to 24?
observed value =
mean plus [(standardized score)  (std dev)]

= 70 + [(1.28 )  (2.8)]
= 70 + (3.58) = 66.36

The value 66.36 is approximately the 10th percentile of
the population.
Summary
1.
At what percentile a given individual falls, if you know their
value?



2.
What proportion of individuals fall into any range of values?



3.
If possible, use Empirical Rule
Otherwise calculate standardized score
Look up normal tables to find percentile
Calculate standardized score for lowest and highest value for range
Look up normal tables to find percentile below both values
Subtract the larger percentile from the smaller percentile
What value corresponds to a given percentile?



Look up the closest percentile in the table
Find the corresponding standardized score
The value you seek is that many standard deviations from the mean
How to check data for Normality

When you actually have your own data, you must check
to see whether a Normal model is reasonable.

Looking at a histogram of the data is a good way to
check that the underlying distribution is roughly
unimodal and symmetric.
How to check data for Normality

A more specialized graphical display that can help you
decide whether a Normal model is appropriate is the
Normal probability plot.

If the distribution of the data is roughly Normal, the
Normal probability plot approximates a diagonal straight
line. Deviations from a straight line indicate that the
distribution is not Normal.
How to check data for Normality

Nearly Normal data have a histogram and a Normal
probability plot that look somewhat like this example:
How to check data for Normality

A skewed distribution might have a histogram and
Normal probability plot like this:
SAS - PROC UNIVARIATE
title "Demonstrating MIDPOINT= Histogram Option";
proc univariate data=example.Blood_Pressure;
id Subj;
var SBP;
histogram / normal midpoints=100 to 170 by 5;
probplot / normal(mu=est sigma=est);
run;
PROC UNIVARIATE - HISTOGRAM
SAS – Normal Probaility Plot
Heart Rate Example
Sex HR Sex HR Sex HR Sex HR Sex HR Sex HR Sex HR
F
55
M
66
F
70
M
73
F
77
M
79
F
82
M
57
F
67
F
70
M
73
F
77
M
79
M
82
M
59
F
67
M
70
M
73
F
77
M
79
F
83
F
61
F
68
M
70
M
73
M
77
F
80
M
83
M
61
F
68
F
71
F
74
M
77
F
80
M
83
M
62
F
68
F
71
F
74
F
78
M
80
F
84
M
62
M
68
M
71
F
74
F
78
F
81
F
84
F
63
F
69
M
71
M
74
F
78
F
81
M
85
F
64
M
69
F
72
F
75
F
78
F
81
F
86
M
64
M
69
M
72
F
75
M
78
M
81
F
86
M
64
M
69
F
73
M
75
M
78
F
82
M
89
M
66
F
70
M
73
M
76
M
79
F
82
M
89
For the heart rate data for 84
adults:
Mean HR = 74.0 bpm
SD = 7.5 bpm
Mean  1SD = 74.0  7.5
= 66.5-81.5 bpm
25
Mean  2SD = 74.0  15.0
= 59.0-89.0 bpm
Frequency
20
15
10
Mean  3SD = 74.0  22.5
= 51.5-96.5 bpm
5
0
//
55
60
65
70
75
80
Heart Rate (BPM)
85
90
Heart Rate Example
HR Data:
57/84 (67.9%) subjects are between mean ± 1SD
82/84 (97.6%) are between mean ± 2SD
84/84 (100%) are between mean ± 3SD
100
+3 SD
95
+2 SD
90
Heart rate (bpm)
85
+ 1SD
80
Mean
75
70
-1 SD
65
-2 SD
60
55
-3 SD
50
45
0
4
8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80 84
Subject number
Question