Download Lecture 6 - Brian Paciotti

Lecture 6
Bell Shaped Curves
Thought Question 1:
The heights of adult women
in the United States follow,
at least approximately,
a bell-shaped curve.
What do you think this means?
Thought Question 2:
What does it mean to say
that a man’s weight
is in the 30th percentile
for all adult males?
Thought Question 3:
A “standardized score” is simply the number of
standard deviations an individual falls above or
below the mean for the whole group.
Male heights have a mean of 70 inches and a
standard deviation of 3 inches. Female heights
have a mean of 65 inches and a standard deviation
of 2 ½ inches. Thus, a man who is 73 inches tall
has a standardized score of 1.
What is the standardized score corresponding to
your own height?
Thought Question 4:
Data sets consisting of physical measurements
(heights, weights, lengths of bones, and so on)
for adults of the same species and sex tend to
follow a similar pattern.
The pattern is that most individuals are clumped
around the average, with numbers decreasing
the farther values are from the average in either
direction.
Describe what shape a histogram of such
measurements would have.
8.1 Populations, Frequency
Curves, and Proportions
Move from pictures and shapes
of a set of data to …
Pictures and shapes for
populations of measurements.
Frequency Curves
Smoothed-out histogram by connecting tops of
rectangles with smooth curve.
Frequency curve
for population
of British male
heights.
The measurements
follow a normal
distribution (or a
bell-shaped or
Gaussian curve).
Note: Height of curve set so area under entire curve is 1.
Frequency Curves
Not all frequency curves are bell-shaped!
Frequency curve
for population
of dollar amounts
of car insurance
damage claims.
The measurements follow a right skewed distribution.
Majority of claims were below $5,000, but there were
occasionally a few extremely high claims.
Proportions
Recall: Total area under frequency curve = 1 for 100%
Key: Proportion of population of measurements
falling in a certain range =
area under curve over that range.
Mean British Height is
68.25 inches. Area to
the right of the mean is
0.50. So about half of
all British men are
68.25 inches or taller.
Tables will provide other areas under normal curves.
8.2 The Pervasiveness
of Normal Curves
Many populations of measurements
follow approximately a normal curve:
• Physical measurements within a homogeneous
population – heights of male adults.
• Standard academic tests given to a large group
– SAT scores.
Normal Distribution
Probability
Probability is area under curve!
$$P(c \le x \le d) = \int_c^d f(x)\,dx$$
[Figure: curve f(x) over the x-axis, with the points c and d marked.]
Normal Distribution
• The height of a normal density curve at any
point x is given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
where μ is the mean and σ is the standard deviation.
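The density formula and the "probability is area under the curve" idea can be checked numerically. The sketch below is illustrative only: the function names are made up for this example, the trapezoid rule is just one convenient way to approximate the integral, and the numbers use the SAT scale from later in this lecture (mean 500, standard deviation 100).

```python
import math
from statistics import NormalDist

def normal_density(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -0.5 * ((x - mu) / sigma) ** 2
    return coefficient * math.exp(exponent)

def area_between(c, d, mu, sigma, steps=10_000):
    """Approximate P(c <= x <= d) as the area under the curve (trapezoid rule)."""
    width = (d - c) / steps
    total = 0.5 * (normal_density(c, mu, sigma) + normal_density(d, mu, sigma))
    for i in range(1, steps):
        total += normal_density(c + i * width, mu, sigma)
    return total * width

# Assumed illustration values: SAT-like scores with mean 500 and s.d. 100.
peak_height = normal_density(500, 500, 100)
area_400_to_600 = area_between(400, 600, 500, 100)

# Cross-check the hand-written formula against the standard library.
library_height = NormalDist(mu=500, sigma=100).pdf(500)

print(peak_height, library_height)
print(area_400_to_600)
```

The peak height comes out near 0.004, which agrees with the vertical scale of the SAT frequency curve in this lecture, and the area from 400 to 600 is close to 0.68.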
Importance of
Normal Distribution
• 1. Describes Many Random Processes or
Continuous Phenomena
• 2. Basis for Classical Statistical Inference
Examples with approximate
Normal distributions
• Height
• Weight
• IQ scores
• Standardized test scores
• Body temperature
• Repeated measurement of same quantity
• These distributions, which are like generalised relative
frequency histograms, can take many different shapes, some
symmetrical, some skewed.
• There is one shape, however, that crops up all through the
natural world, and that is …
• The Normal Distribution is symmetric.
• There are many different Normal curves; some are
fat, some are thin.
• Some are centred at 0, some at 1, some at 5, etc.
• Each Normal curve can be uniquely identified by
two parameters:
• the Mean and the Standard Deviation.
• Once you know the Mean and the Standard Deviation for
a Normal curve, it is possible to draw the
curve.
• Normal curves are centred at the Mean, and the
Standard Deviation describes how spread out they
are.
[Figure: A Normal Frequency Curve for the Population of SAT scores. Horizontal axis: SAT Scores, 200 to 800; vertical axis: Height of Curve, 0 to 0.0045.]
• The area under a Normal curve to the left of the
mean is .5. This indicates that the probability that
something which is normally distributed is less
than its mean is .5.
• The area under the curve to the left of any point A
on the X axis represents the probability that a
Normal variable is less than A.
Example: The normal distribution is the most important
distribution in Statistics. Typical normal curves with
different sigma (standard deviation) values are shown
below.
8.3 Percentiles and
Standardized Scores
Your percentile = the percentage of
the population that falls below you.
Finding percentiles for normal curves requires:
• Your own value.
• The mean for the population of values.
• The standard deviation for the population.
Then any bell curve can be standardized
so one table can be used to find percentiles.
Percentiles
Example: Have you ever wondered what percentage of the population
(of your gender) is taller than you are?
• Your percentile in a population represents the position of your
measurement in comparison with everyone else’s.
•It gives the percentage of the population that fall below you. For
example, if you are in the 98th percentile, it means that 98% of the
population falls below you and only 2% is above you.
•Your percentile value is easy to find if the population of values has an
approximate bell shape.
• Although there are an unlimited number of potential bell-shaped
curves, each one can be completely determined once you know the
mean and standard deviation of the population.
• In addition, each curve can be “standardized” in a way such that the
same table can be used to find percentiles for any of them.
Infinite Number
of Tables
Normal distributions differ by
mean & standard deviation.
Each distribution would
require its own table.
That’s an infinite number!
[Figure: normal curves, f(X) plotted against X.]
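Standardize the
Normal Distribution
$$Z = \frac{X - \mu}{\sigma}$$
[Figure: a Normal Distribution of X with mean μ and standard deviation σ, converted to the Standardized Normal Distribution of Z with μ = 0 and σ = 1.]
One table!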
The standardized score is often called the z-score.
Once you know the z-score for an observed value,
you can easily find the percentile corresponding to
the observed value by using the table that gives
the percentiles for a normal distribution with mean
0 and standard deviation 1.
A normal curve with a mean of 0 and a standard
deviation of 1 is called a standard normal curve. It
is the curve that results when any normal curve is
converted to standardized scores.
Standardized Scores
Standardized Score (standard score or z-score):
(observed value – mean) / (standard deviation)
IQ scores have a normal distribution with
a mean of 100 and a standard deviation of 16.
• Suppose your IQ score was 116.
• Standardized score = (116 – 100)/16 = +1
• Your IQ is 1 standard deviation above the mean.
• Suppose your IQ score was 84.
• Standardized score = (84 – 100)/16 = –1
• Your IQ is 1 standard deviation below the mean.
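The standardized-score formula is a one-line computation. The following minimal sketch uses a hypothetical helper name, `standardized_score`, and reuses only numbers quoted in this lecture (IQ: mean 100, s.d. 16; male heights: mean 70, s.d. 3).

```python
def standardized_score(observed, mean, sd):
    """Number of standard deviations the observed value lies above (+) or below (-) the mean."""
    return (observed - mean) / sd

z_iq_116 = standardized_score(116, 100, 16)   # IQ of 116
z_iq_84 = standardized_score(84, 100, 16)     # IQ of 84
z_man_73 = standardized_score(73, 70, 3)      # a man who is 73 inches tall

print(z_iq_116, z_iq_84, z_man_73)  # prints 1.0 -1.0 1.0
```

Keeping the sign of the result matters: a negative score means the value is below the mean.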
A normal curve with mean = 0 and standard deviation = 1
is called a standard normal curve.
Table 8.1: Proportions and Percentiles
for Standard Normal Scores

Standard    Proportion                    Standard    Proportion
Score, z    Below z        Percentile     Score, z    Below z        Percentile
-6.00       0.000000001    0.0000001      0.03        0.51           51
-5.20       0.0000001      0.00001        0.05        0.52           52
-4.26       0.00001        0.001          0.08        0.53           53
-3.00       0.0013         0.13           0.10        0.54           54
:           :              :              :           :              :
-1.00       0.16           16             0.58        0.72           72
:           :              :              :           :              :
-0.58       0.28           28             1.00        0.84           84
:           :              :              :           :              :
0.00        0.50           50             6.00        0.999999999    99.9999999
Finding a Percentile from an observed value:
1. Find the standardized score = (observed value – mean)/s.d.,
where s.d. = standard deviation.
Don’t forget to keep the plus or minus sign.
2. Look up the percentile in Table 8.1.
• Suppose your IQ score was 116.
• Standardized score = (116 – 100)/16 = +1
• Your IQ is 1 standard deviation above the mean.
• From Table 8.1 you would be at the 84th percentile.
• Your IQ would be higher than that of 84%
of the population.
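The two-step recipe (standardize, then look up the percentile) can be sketched in code, with Python's `statistics.NormalDist` standing in for the printed table. The helper name `percentile_of` is made up for this illustration, and the result will differ slightly from a table lookup because the table is rounded.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def percentile_of(observed, mean, sd):
    """Percentage of a normal population that falls below the observed value."""
    z = (observed - mean) / sd               # step 1: standardized score, sign kept
    return 100.0 * STANDARD_NORMAL.cdf(z)    # step 2: area to the left of z

iq_116_percentile = percentile_of(116, 100, 16)
iq_84_percentile = percentile_of(84, 100, 16)

print(round(iq_116_percentile), round(iq_84_percentile))  # prints 84 16
```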
Finding an Observed Value from a Percentile:
1. Look up the percentile in Table 8.1 and
find the corresponding standardized score.
2. Compute observed value = mean +(standardized score)(s.d.),
where s.d. = standard deviation.
Example 1: Tragically Low IQ
“Jury urges mercy for mother who killed baby. …
The mother had an IQ lower than 98 percent of the population.”
(Scotsman, March 8, 1994,p. 2)
• Mother was in the 2nd percentile.
• Table 8.1 gives her standardized score = –2.05,
or 2.05 standard deviations below the mean of 100.
• Her IQ = 100 + (–2.05)(16) = 100 – 32.8 = 67.2 or about 67.
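Going from a percentile back to an observed value reverses the lookup. In this sketch the inverse of the standard normal table is taken from `NormalDist.inv_cdf`; the helper name `observed_value_at` is hypothetical. Because the library uses an unrounded z (about -2.054 rather than the table's -2.05), the answer differs from 67.2 in the first decimal place.

```python
from statistics import NormalDist

def observed_value_at(percentile, mean, sd):
    """Observed value that has the given percentage of the population below it."""
    z = NormalDist().inv_cdf(percentile / 100.0)   # step 1: percentile -> standardized score
    return mean + z * sd                           # step 2: observed value = mean + z * s.d.

iq_at_2nd_percentile = observed_value_at(2, 100, 16)

print(round(iq_at_2nd_percentile))  # prints 67
```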
The Standard Normal Table: Table 8.1
• Table 8.1 is a table of areas under the standard normal density
curve. The table entry for each value z is the area under the
curve to the left of z.
• Table 8.1 can be used to find the proportion of observations of a
variable which fall to the left of a specific value, once that value
is converted to a standardized score z, if the variable
follows a normal distribution.
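Each table entry is the area under the standard normal curve to the left of z. One standard way to compute that area, not necessarily how the printed table was produced, is through the error function: area = 0.5 * (1 + erf(z / sqrt(2))). The sketch below reproduces a few rows, rounded to two decimals as in the table; `area_left_of` is an illustrative name.

```python
import math

def area_left_of(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# A few standardized scores that appear in Table 8.1.
table_rows = {z: round(area_left_of(z), 2) for z in (-1.00, -0.58, 0.00, 0.58, 1.00)}

for z, proportion in table_rows.items():
    print(f"{z:+.2f}  {proportion:.2f}  percentile {100 * proportion:.0f}")
```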
Example 2: Calibrating Your GRE Score
GRE Exams between 10/1/89
and 9/30/92 had mean verbal
score of 497 and a standard
deviation of 115. (ETS, 1993)
Suppose your score was 650
and scores were bell-shaped.
• Standardized score =
(650 – 497)/115 = +1.33.
• Table 8.1, z = 1.33 is between
the 90th and 91st percentile.
• Your score was higher than
about 90% of the population.
Example 3: Removing Moles
Company Molegon: remove unwanted moles from gardens.
Weights of moles are approximately normal with a mean of
150 grams and a standard deviation of 56 grams.
Only moles between 68 and 211 grams can be legally caught.
• Standardized score =
(68 – 150)/56 = –1.46, and
Standardized score =
(211 – 150)/56 = +1.09.
• Table 8.1: 86% weigh 211
or less; 7% weigh 68 or less.
• About 86% – 7% = 79% are
within the legal limits.
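The proportion between two values is the area below the upper value minus the area below the lower value. Here is a minimal sketch of the mole calculation, using `statistics.NormalDist` in place of Table 8.1; the variable names are illustrative.

```python
from statistics import NormalDist

# Mole weights: approximately normal, mean 150 g, standard deviation 56 g.
mole_weights = NormalDist(mu=150, sigma=56)

proportion_211_or_less = mole_weights.cdf(211)
proportion_68_or_less = mole_weights.cdf(68)
legal_proportion = proportion_211_or_less - proportion_68_or_less

print(round(proportion_211_or_less, 2),
      round(proportion_68_or_less, 2),
      round(legal_proportion, 2))  # prints 0.86 0.07 0.79
```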
Standardizing Example
$$Z = \frac{X - \mu}{\sigma} = \frac{6.2 - 5}{10} = 0.12$$
[Figure: Normal Distribution with μ = 5, σ = 10 and X = 6.2 marked, beside the Standardized Normal Distribution with μ = 0, σ = 1 and Z = 0.12 marked.]
Some Examples
Suppose it is known that verbal SAT scores are normally
distributed with a mean of 500 and a standard deviation of
100.
Find the proportion of the population of SAT scores that are less
than or equal to 600.
First we need to find the standardized score:
z-score = (observed value – mean)/(standard deviation)
= (600 – 500)/100 = +1
From Table 8.1 we see that a z-score of +1 is the 84th
percentile, so the proportion of population SAT scores that
are less than or equal to 600 is 0.84.
[Figure: SAT SCORES. Normal curve; horizontal axis: SAT Scores, 200 to 800; vertical axis: Height of Curve.]
[Figure: Standardized Scores (Z-Scores). Standard normal curve; horizontal axis: Standardized Score (Z-score), -3 to 3; vertical axis: Height of Curve.]
Estimate the proportion of population SAT scores that are
greater than 400.
First, we need to find the standardized score:
z-score = (400 – 500)/100 = –1
From Table 8.1 we see that 16% of population values have
a z-score less than or equal to –1 (or equivalently, 16% of
population values have an observed score less than 400).
However, we are interested in the proportion of the
population with scores GREATER than 400.
proportion ABOVE 400 = 1 – proportion BELOW 400
= 1 – 0.16
= 0.84
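Since the table only gives areas to the left, an area to the right comes from the complement rule. The following is a sketch of the SAT calculation under the same assumptions (mean 500, standard deviation 100), with `statistics.NormalDist` standing in for Table 8.1.

```python
from statistics import NormalDist

sat_scores = NormalDist(mu=500, sigma=100)

proportion_below_400 = sat_scores.cdf(400)
proportion_above_400 = 1.0 - proportion_below_400   # complement rule

print(round(proportion_below_400, 2), round(proportion_above_400, 2))  # prints 0.16 0.84
```

By the symmetry of the normal curve, the proportion above 400 (one s.d. below the mean) equals the proportion below 600 (one s.d. above the mean).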
[Figure: standard normal curve; horizontal axis: Standardized Score (Z-score), -3 to 3; vertical axis: Height of Curve.]
Estimate the proportion of population SAT scores that are
between 400 and 600.
An observed value of 400 has a z-score of -1 and
represents the 16th percentile (proportion below z = -1 is
0.16).
An observed value of 600 has a z-score of +1 and
represents the 84th percentile (proportion below z = +1 is
0.84).
Let’s draw a picture….
[Figure: standard normal curve; horizontal axis: Standardized Score (Z-score), -3 to 3; vertical axis: Height of Curve.]
So the proportion with scores between 400 and 600
= Proportion below 600 – Proportion below 400
= 0.84 – 0.16 = 0.68
Find an SAT score such that 70% of the population had SAT
scores less than or equal to this number (i.e., estimate the
70th percentile of the population).
First we need to find the z-score that corresponds to the
70th percentile.
From Table 8.1 we see that this z-score is +0.52.
Next we need to find the observed value (from the z-score):
Observed value = mean + (z-score)*(standard deviation)
= 500 + 0.52*100
= 552
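The 70th-percentile calculation can be cross-checked with a round trip: find the score, then confirm that 70% of the population falls at or below it. This sketch applies `inv_cdf` directly on the SAT scale (mean 500, standard deviation 100), so no separate standardizing step is needed; the variable names are illustrative.

```python
from statistics import NormalDist

sat_scores = NormalDist(mu=500, sigma=100)

score_at_70th = sat_scores.inv_cdf(0.70)             # percentile -> observed value
proportion_at_or_below = sat_scores.cdf(score_at_70th)  # observed value -> proportion

print(round(score_at_70th), round(proportion_at_or_below, 2))  # prints 552 0.7
```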
8.4 z-Scores and
Familiar Intervals
Empirical Rule
For any normal curve, approximately …
• 68% of the values fall within 1 standard deviation of the
mean in either direction
• 95% of the values fall within 2 standard deviations of the
mean in either direction
• 99.7% of the values fall within 3 standard deviations of
the mean in either direction
A measurement would be an extreme outlier
if it fell more than 3 s.d. above or below the mean.
The 68-95-99.7 Rule
The Empirical Rule Applet
• http://www.stat.sc.edu/~west/applets/empiricalrule.html
Heights of Adult Women
Since adult women
in U.S. have a mean
height of 65 inches
with a s.d. of 2.5
inches and heights
are bell-shaped,
approximately …
• 68% of adult women are between 62.5 and 67.5 inches,
• 95% of adult women are between 60 and 70 inches,
• 99.7% of adult women are between 57.5 and 72.5 inches.
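The 68-95-99.7 percentages and the height intervals can both be recomputed from the standard normal curve. In the sketch below, `fraction_within` and `interval_around` are hypothetical helper names; the mean of 65 inches and s.d. of 2.5 inches are the values quoted for adult women.

```python
from statistics import NormalDist

def fraction_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    standard_normal = NormalDist()
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

def interval_around(mean, sd, k):
    """Endpoints lying k standard deviations below and above the mean."""
    return (mean - k * sd, mean + k * sd)

coverage = {k: fraction_within(k) for k in (1, 2, 3)}
women_heights = {k: interval_around(65, 2.5, k) for k in (1, 2, 3)}

for k in (1, 2, 3):
    print(k, round(100 * coverage[k], 1), women_heights[k])
```

The exact coverages are slightly above 68% and 95%, which is why the rule is stated as approximate.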
For Those Who Like Formulas
• Example
• In Tombstone, Arizona Territory people used Colt
.45 revolvers. However people used different
ammunition.
• Wyatt Earp knew that his brothers and Doc
Holliday were the only ones in the territory who
used Colt .45s with Winchester ammunition.
• The Earp brothers conducted tests on many
different combinations of weapons and
ammunition. They found that the observations
produced by the combination of Colt
.45 with Winchester shells showed a Mean
velocity of 936 feet/second and a Standard
Deviation of 10 feet/second.
• The measurements were taken at a distance
of 15 feet from the gun.
• When Wyatt examined the body of a
cowboy shot in the back in cold blood he
concluded that he was shot at a distance of
15 feet and that the velocity of the bullet at
impact was 1,000 feet/second.
• The dastardly Ike Clanton claimed that this
cowboy was shot by the Earp brothers or
Doc Holliday. Was Wyatt able to clear his
good name using the Empirical Rule?
• The distribution of this bullet velocity data should
be approximately bell-shaped. This implies that
the empirical rule should give a good estimation of
the percentages of the data within each interval.
k (# of Standard    Interval       Empirical approximate
Deviations)                        Percentage
2                   916, 956       95%
3                   906, 966       99.7%
4                   896, 976       ~100%
5                   886, 986       ~100%
6                   876, 996       ~100%
7                   866, 1006      ~100%
• This table quite clearly demonstrates that since the
bullet velocity in the shooting was 1000 ft/sec and
since this lies more than 6 Standard Deviations
away from the mean the probability is extremely
high that the Earps were not responsible for this
shooting.
• This is especially evident from looking at the
column showing percentages from the empirical
rule.
• Practically 100% of bullet velocities should be
between 896 and 976 ft/sec.
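The intervals in the table and the standardized score of the 1,000 ft/sec bullet can be recomputed directly. The sketch below assumes, as the example does, that velocities are approximately normal with mean 936 and s.d. 10. The upper-tail area uses `math.erfc`, a choice made here because subtracting a value very close to 1 from 1 loses precision.

```python
import math

MEAN_VELOCITY = 936      # feet/second, Colt .45 with Winchester shells
SD_VELOCITY = 10         # feet/second
OBSERVED_VELOCITY = 1000

# Intervals of k standard deviations around the mean, for k = 2..7.
intervals = [(MEAN_VELOCITY - k * SD_VELOCITY, MEAN_VELOCITY + k * SD_VELOCITY)
             for k in range(2, 8)]

z_observed = (OBSERVED_VELOCITY - MEAN_VELOCITY) / SD_VELOCITY

# Area under the standard normal curve to the right of z_observed.
upper_tail = 0.5 * math.erfc(z_observed / math.sqrt(2.0))

print(intervals)
print(z_observed, upper_tail)
```

A velocity 6.4 standard deviations above the mean has an upper-tail area smaller than the smallest proportion listed in Table 8.1, so such a shot would be extremely unusual for this weapon and ammunition combination.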
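Example
P(3.8 ≤ X ≤ 5)
$$Z = \frac{X - \mu}{\sigma} = \frac{3.8 - 5}{10} = -0.12$$
[Figure: Normal Distribution with μ = 5, σ = 10, beside the Standardized Normal Distribution with μ = 0, σ = 1. The area between X = 3.8 and μ = 5 (Z = -0.12 to 0) is 0.0478. Shaded area exaggerated.]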
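Example
P(2.9 ≤ X ≤ 7.1)
$$Z = \frac{X - \mu}{\sigma} = \frac{2.9 - 5}{10} = -0.21$$
$$Z = \frac{X - \mu}{\sigma} = \frac{7.1 - 5}{10} = 0.21$$
[Figure: Normal Distribution with μ = 5, σ = 10, beside the Standardized Normal Distribution with μ = 0, σ = 1. The area between X = 2.9 and X = 7.1 (Z = -0.21 to 0.21) is 0.0832 + 0.0832 = 0.1664. Shaded area exaggerated.]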
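Example
P(X ≥ 8)
$$Z = \frac{X - \mu}{\sigma} = \frac{8 - 5}{10} = 0.30$$
[Figure: Normal Distribution with μ = 5, σ = 10, beside the Standardized Normal Distribution with μ = 0, σ = 1. The area to the right of X = 8 (Z = 0.30) is 0.5000 - 0.1179 = 0.3821. Shaded area exaggerated.]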
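Example
P(7.1 ≤ X ≤ 8)
$$Z = \frac{X - \mu}{\sigma} = \frac{7.1 - 5}{10} = 0.21$$
$$Z = \frac{X - \mu}{\sigma} = \frac{8 - 5}{10} = 0.30$$
[Figure: Normal Distribution with μ = 5, σ = 10, beside the Standardized Normal Distribution with μ = 0, σ = 1. The area between X = 7.1 and X = 8 (Z = 0.21 to 0.30) is 0.1179 - 0.0832 = 0.0347. Shaded area exaggerated.]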
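Solution*
P(2000 ≤ X ≤ 2400)
$$Z = \frac{X - \mu}{\sigma} = \frac{2400 - 2000}{200} = 2.0$$
[Figure: Normal Distribution with μ = 2000, σ = 200, beside the Standardized Normal Distribution with μ = 0, σ = 1. The area between X = 2000 and X = 2400 (Z = 0 to 2.0) is 0.4772.]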
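The closing worked examples all reduce to differences of areas to the left. The following sketch recomputes them with `statistics.NormalDist`; the helper name `probability_between` is made up for this illustration. Small differences in the fourth decimal place are expected because the slides use z-scores rounded to two decimals.

```python
from statistics import NormalDist

def probability_between(dist, low, high):
    """P(low <= X <= high) as area below high minus area below low."""
    return dist.cdf(high) - dist.cdf(low)

x = NormalDist(mu=5, sigma=10)        # first four examples
y = NormalDist(mu=2000, sigma=200)    # final example

results = {
    "P(3.8 <= X <= 5)": probability_between(x, 3.8, 5),
    "P(2.9 <= X <= 7.1)": probability_between(x, 2.9, 7.1),
    "P(X >= 8)": 1.0 - x.cdf(8),
    "P(7.1 <= X <= 8)": probability_between(x, 7.1, 8),
    "P(2000 <= X <= 2400)": probability_between(y, 2000, 2400),
}

for label, value in results.items():
    print(f"{label}: {value:.4f}")
```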