Chapter 6
Measures of Location
Normal Distributions
Larson/Farber 4th ed
Objectives
• Define & Understand Percentiles & their Graphs
• Define & interpret z-scores
• Define and Understand the Normal Curve
• Interpret and Apply the Empirical Rule
• Interpret graphs of normal probability distributions
• Find areas under the standard normal curve
Describing Location in a Distribution: Measuring Position: Percentiles
One way to describe the location of a value in a distribution is to tell what percent of observations are less than it.
Definition:
The pth percentile of a distribution is the value with p percent of the observations less than it.
Example, p. 85
Jenny earned a score of 86 on her test. How did she perform relative to the rest of the class?
Stemplot of the class's test scores (stem = tens digit, leaf = ones digit):
6 | 7
7 | 2334
7 | 5777899
8 | 00123334
8 | 569
9 | 03
Her score was greater than 21 of the 25
observations. Since 21 of the 25, or 84%, of the
scores are below hers, Jenny is at the 84th
percentile in the class’s test score distribution.
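A minimal Python sketch (not from the original slides) that rebuilds the 25 scores from the stemplot and applies the definition above. It assumes stems are tens digits and leaves are ones digits (so 8 | 569 means 85, 86, 89); `percentile_rank` is a hypothetical helper name.

```python
# Stemplot rows from the example: (stem, leaves); stem = tens digit, leaf = ones digit.
stems_and_leaves = [(6, "7"), (7, "2334"), (7, "5777899"), (8, "00123334"), (8, "569"), (9, "03")]
scores = [10 * stem + int(leaf) for stem, leaves in stems_and_leaves for leaf in leaves]

def percentile_rank(value, data):
    """Percent of observations strictly less than value (the definition of a percentile above)."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)

print(len(scores), percentile_rank(86, scores))  # 25 84.0
```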
Describing Location in a Distribution: Cumulative Relative Frequency Graphs
A cumulative relative frequency graph (or ogive)
displays the cumulative relative frequency of each
class of a frequency distribution.
Age | Frequency | Relative frequency | Cumulative frequency | Cumulative relative frequency
40–44 | 2 | 2/44 = 4.5% | 2 | 2/44 = 4.5%
45–49 | 7 | 7/44 = 15.9% | 9 | 9/44 = 20.5%
50–54 | 13 | 13/44 = 29.5% | 22 | 22/44 = 50.0%
55–59 | 12 | 12/44 = 27.3% | 34 | 34/44 = 77.3%
60–64 | 7 | 7/44 = 15.9% | 41 | 41/44 = 93.2%
65–69 | 3 | 3/44 = 6.8% | 44 | 44/44 = 100%
[Figure: ogive, “Age of First 44 Presidents When They Were Inaugurated”; x-axis: Age at inauguration (40 to 70); y-axis: Cumulative relative frequency (%) (0 to 100)]
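The relative and cumulative columns of the table can be reproduced from the frequency column alone. A minimal Python sketch, not from the original slides; the class labels use plain hyphens.

```python
# Class frequencies from the table of presidents' inauguration ages.
classes = ["40-44", "45-49", "50-54", "55-59", "60-64", "65-69"]
freqs = [2, 7, 13, 12, 7, 3]
n = sum(freqs)  # 44 presidents

rows = []
running_total = 0
for label, f in zip(classes, freqs):
    running_total += f  # cumulative frequency up to and including this class
    rows.append((label, f, 100 * f / n, running_total, 100 * running_total / n))

for label, f, rel, cum, cum_rel in rows:
    print(f"{label}  {f:2d}  {rel:5.1f}%  {cum:2d}  {cum_rel:5.1f}%")
```

Each cumulative relative frequency is just the running total divided by 44, which is why the last entry is always 100%.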
Describing Location in a Distribution: Cumulative Relative Frequency Graphs
Estimate and interpret the 65th percentile of the distribution.
[Graph annotation: reading across from 65% on the vertical axis to the ogive and then down gives an age of about 58. So the 65th percentile is about 58 years: about 65% of these presidents were younger than 58 when they were inaugurated.]
• Was Barack Obama, who was inaugurated at age 47, unusually young?
[Graph annotation: age 47 corresponds to a cumulative relative frequency of about 11%, so Obama's inauguration age is at about the 11th percentile.]
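Reading values off the ogive by eye amounts to linear interpolation between its plotted points. The Python sketch below is not from the original slides and assumes each class's cumulative relative frequency is plotted at its upper boundary (45, 50, ..., 70), starting from 0% at 40, with straight segments in between; `interpolate` is a hypothetical helper.

```python
# Assumed ogive points: ages at class boundaries and cumulative percent below each.
ages = [40, 45, 50, 55, 60, 65, 70]
cum_pct = [100 * c / 44 for c in (0, 2, 9, 22, 34, 41, 44)]

def interpolate(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x is outside the range of the ogive")

pct_at_47 = interpolate(47, ages, cum_pct)   # percentile of Obama's inauguration age
age_at_65 = interpolate(65, cum_pct, ages)   # inverse direction: the 65th percentile age

print(f"age 47 -> about {pct_at_47:.1f}%; 65th percentile -> about {age_at_65:.1f} years")
```

Swapping the roles of the two lists answers the reverse question, because the cumulative percentages increase strictly from class to class.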
Comparing data sets
• How do we compare results when they are measured on two completely different scales?
• One solution might be to look at percentiles.
• What might you say about a woman who is in the 50th percentile and a man in the 15th percentile?
Another way of comparing
• Another way of comparing: Look at
whether the data point is above or below
the mean, and by how much.
• Example: Compare an SAT score of 1080 to an ACT score of 28, given that the mean SAT score is 896 with a standard deviation of 174 and the mean ACT score is 20.6 with a standard deviation of 5.2.
The Standard Deviation as a Ruler
• The trick in comparing very different-looking values
is to use standard deviations as our rulers.
• The standard deviation tells us how the whole
collection of values varies, so it’s a natural ruler for
comparing an individual to a group.
• As the most common measure of variation, the
standard deviation plays a crucial role in how we look
at data.
Standardizing with z-scores
• We compare individual data values to their mean,
relative to their standard deviation using the
following formula:
$$z = \frac{y - \bar{y}}{s}$$
• We call the resulting values standardized values,
denoted as z. They can also be called z-scores.
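Applying this formula to the SAT/ACT comparison posed earlier gives a concrete answer. A minimal Python sketch, not from the original slides; `z_score` is a hypothetical helper name.

```python
def z_score(value, mean, sd):
    """How many standard deviations value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

z_sat = z_score(1080, 896, 174)   # about 1.06
z_act = z_score(28, 20.6, 5.2)    # about 1.42

better = "ACT" if z_act > z_sat else "SAT"
print(f"SAT z = {z_sat:.2f}, ACT z = {z_act:.2f}; the {better} score is relatively higher")
```

Both scores are above their means, but the ACT score is farther above its mean when measured in standard deviations.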
Standardizing with z-scores (cont.)
• Standardized values have no units.
• z-scores measure the distance of each data value from
the mean in standard deviations.
• A negative z-score tells us that the data value is below
the mean, while a positive z-score tells us that the
data value is above the mean.
Benefits of Standardizing
• Standardized values have been converted from their
original units to the standard statistical unit of
standard deviations from the mean.
• Thus, we can compare values that are measured on
different scales, with different units, or from different
populations.
Back to z-scores
• As the formula indicates, standardizing data into z-scores shifts the data by subtracting the mean and rescales the values by dividing by their standard deviation.
  ◦ Standardizing into z-scores does not change the shape of the distribution.
  ◦ Standardizing into z-scores changes the center by making the mean 0.
  ◦ Standardizing into z-scores changes the spread by making the standard deviation 1.
When Is a z-score Big?
• A z-score gives us an indication of how unusual a
value is because it tells us how far it is from the
mean.
• Remember that a negative z-score tells us that the
data value is below the mean, while a positive z-score
tells us that the data value is above the mean.
• The larger a z-score is in absolute value (whether it is negative or positive), the more unusual the value is.
EXAMPLE
• Enter the following data into L1 on your calculator and construct a histogram. Also run 1-Var Stats and record the mean and sample standard deviation.
Amount of precipitation in Newark over the course of a year:
3.54  2.88  4.16  4.2  4.09  4.02  4.76  3.7  3.82  3.6  3.65  3.81
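The same summary statistics can be computed off the calculator, and standardizing the Newark data checks the claims that z-scores have mean 0 and standard deviation 1. A minimal Python sketch, not from the original slides; `statistics.stdev` is the sample standard deviation, which is assumed to match the calculator's Sx.

```python
import statistics

precip = [3.54, 2.88, 4.16, 4.2, 4.09, 4.02, 4.76, 3.7, 3.82, 3.6, 3.65, 3.81]

ybar = statistics.mean(precip)   # sample mean
s = statistics.stdev(precip)     # sample standard deviation (n - 1 in the denominator)
z = [(y - ybar) / s for y in precip]

# Standardizing moves the center to 0 and the spread to 1 (up to floating-point rounding).
print(f"mean = {ybar:.4f}, s = {s:.4f}")
print(f"mean of z = {statistics.mean(z):.6f}, SD of z = {statistics.stdev(z):.6f}")

# The value with the largest |z| is the most unusual month in this data set.
most_unusual = max(precip, key=lambda y: abs(y - ybar))
print(most_unusual, round((most_unusual - ybar) / s, 2))
```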
When Is a z-score Big? (cont.)
• There is no universal standard for z-scores, but there
is a model that shows up over and over in Statistics.
• This model is called the Normal model (You may
have heard of “bell-shaped curves.”).
• Normal models are appropriate for distributions
whose shapes are unimodal and roughly symmetric.
• These distributions provide a measure of how
extreme a z-score is.
When Is a z-score Big? (cont.)
• There is a Normal model for every possible combination of mean and standard deviation. Rescaling values from any of these models to z-scores gives the Normal model with mean 0 and standard deviation 1.
  ◦ We write N(μ,σ) to represent a Normal model with a mean of μ and a standard deviation of σ.
Properties of Normal Distributions
Normal distribution
• A continuous probability distribution for a random
variable, x.
• The most important continuous probability
distribution in statistics.
• The graph of a normal distribution is called the
normal curve.
Properties of Normal Distributions
1. The mean, median, and mode are equal.
2. The normal curve is bell-shaped and symmetric
about the mean.
3. The total area under the curve is equal to one.
4. The normal curve approaches, but never touches the
x-axis as it extends farther and farther away from the
mean.
[Figure: normal curve centered at μ on the x-axis; total area = 1]
Properties of Normal Distributions
5. Between μ – σ and μ + σ (in the center of the curve),
the graph curves downward. The graph curves
upward to the left of μ – σ and to the right of μ + σ.
The points at which the curve changes from curving
upward to curving downward are called the
inflection points.
[Figure: normal curve with its inflection points marked at μ − σ and μ + σ; x-axis ticks at μ − 3σ, μ − 2σ, μ − σ, μ, μ + σ, μ + 2σ, μ + 3σ]
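Properties 2, 3 and 5 can be checked numerically for a particular curve. A minimal Python sketch, not from the original slides, assuming Python 3.8+ for `statistics.NormalDist`; μ = 3.5 and σ = 1.5 are borrowed from one of the curves in the next figure, and `concavity` is a hypothetical helper that estimates the second derivative by finite differences.

```python
from statistics import NormalDist

mu, sigma = 3.5, 1.5          # any mean and positive standard deviation would do
curve = NormalDist(mu, sigma)

# Property 3: total area is 1. Midpoint rule on mu +/- 10 sigma (the area beyond is negligible).
n = 10_000
a, b = mu - 10 * sigma, mu + 10 * sigma
h = (b - a) / n
area = h * sum(curve.pdf(a + (i + 0.5) * h) for i in range(n))

# Property 2: the curve is symmetric about the mean.
symmetric = all(abs(curve.pdf(mu - d) - curve.pdf(mu + d)) < 1e-12 for d in (0.5, 1.0, 2.7))

# Property 5: curving downward (negative second derivative) only between mu - sigma and mu + sigma.
def concavity(x, eps=1e-4):
    """Central second-difference estimate of the density's second derivative at x."""
    return (curve.pdf(x + eps) - 2 * curve.pdf(x) + curve.pdf(x - eps)) / eps ** 2

curving_down = [concavity(mu + k * sigma) < 0 for k in (-1.1, -0.9, 0, 0.9, 1.1)]
print(round(area, 6), symmetric, curving_down)
```

The concavity flips sign between 0.9σ and 1.1σ on each side of the mean, which is where the inflection points sit.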
Means and Standard Deviations
• A normal distribution can have any mean and any
positive standard deviation.
• The mean gives the location of the line of symmetry.
• The standard deviation describes the spread of the
data.
[Figure: three normal curves with μ = 3.5, σ = 1.5; μ = 3.5, σ = 0.7; and μ = 1.5, σ = 0.7]
Example: Understanding Mean and
Standard Deviation
1. Which curve has the greater mean?
Solution:
Curve A has the greater mean (The line of symmetry
of curve A occurs at x = 15. The line of symmetry of
curve B occurs at x = 12.)
Example: Understanding Mean and
Standard Deviation
2. Which curve has the greater standard deviation?
Solution:
Curve B has the greater standard deviation (Curve
B is more spread out than curve A.)
Example: Interpreting Graphs
The heights of fully grown white oak trees are normally
distributed. The curve represents the distribution. What
is the mean height of a fully grown white oak tree?
Solution:
μ = 90 (a normal curve is symmetric about the mean)
σ = 3.5 (the inflection points are one standard deviation away from the mean)
The Standard Normal Distribution
Standard normal distribution
• A normal distribution with a mean of 0 and a standard
deviation of 1.
[Figure: standard normal curve over z from −3 to 3; total area = 1]
• Any x-value can be transformed into a z-score by using the formula
$$z = \frac{\text{Value} - \text{Mean}}{\text{Standard deviation}} = \frac{x - \mu}{\sigma}$$
The Standard Normal Distribution
• If each data value of a normally distributed random
variable x is transformed into a z-score, the result will
be the standard normal distribution.
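[Figure: a normal distribution of x with mean μ and standard deviation σ is transformed by $z = \frac{x - \mu}{\sigma}$ into the standard normal distribution of z, with mean 0 and standard deviation 1]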
When Is a z-score Big? (cont.)
• When we use the Normal model, we are assuming the
distribution is Normal.
• We cannot check this assumption in practice, so we check the following condition:
  ◦ Nearly Normal Condition: The shape of the data’s distribution is unimodal and roughly symmetric.
  ◦ This condition can be checked with a histogram or a Normal probability plot (to be explained later).
The First Three Rules for Working with
Normal Models
• Make a picture.
• Make a picture.
• Make a picture.
• And, when we have data, make a histogram to check
the Nearly Normal Condition to make sure we can
use the Normal model to model the distribution.
The 68-95-99.7 Rule
• Normal models give us an idea of how extreme a
value is by telling us how likely it is to find one that
far from the mean.
• We can find these numbers precisely, but until then
we will use a simple rule that tells us a lot about the
Normal model…
The 68-95-99.7 Rule (cont.)
• It turns out that in a Normal model:
  ◦ about 68% of the values fall within one standard deviation of the mean;
  ◦ about 95% of the values fall within two standard deviations of the mean; and,
  ◦ about 99.7% (almost all!) of the values fall within three standard deviations of the mean.
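The three rounded percentages come from exact areas under the standard Normal curve. A minimal Python sketch, not from the original slides, assuming Python 3.8+ for `statistics.NormalDist`.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal model N(0, 1)

# Area within k standard deviations of the mean = area left of +k minus area left of -k.
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}
for k, share in within.items():
    print(f"within {k} SD of the mean: {share:.4f}")  # 0.6827, 0.9545, 0.9973
```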
The 68-95-99.7 Rule (cont.)
• The following shows what the 68-95-99.7 Rule tells
us:
The Empirical Rule
AN EXAMPLE:
Normal Distributions
The distribution of Iowa Test of Basic Skills (ITBS) vocabulary
scores for 7th grade students in Gary, Indiana, is close to
Normal. Suppose the distribution is N(6.84, 1.55).
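a) What are the mean and standard deviation of this Normal distribution?
b) Sketch the Normal density curve for this distribution.
c) What percent of ITBS vocabulary scores are less than 3.74?
d) What percent of the scores are between 5.29 and 9.94?
Answer to (d): 5.29 is one standard deviation below the mean and 9.94 is two standard deviations above it, so the percent is 68% + 13.5% = 81.5%.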
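The Empirical Rule reasoning for parts (c) and (d) can be checked by first converting each cutoff to a number of standard deviations from the mean, then comparing with exact Normal areas. A minimal Python sketch, not from the original slides, assuming Python 3.8+ for `statistics.NormalDist`.

```python
from statistics import NormalDist

mu, sigma = 6.84, 1.55
itbs = NormalDist(mu, sigma)

# How many SDs from the mean is each cutoff? (about -2, -1 and +2)
k_c, k_d_low, k_d_high = [(x - mu) / sigma for x in (3.74, 5.29, 9.94)]

# Empirical Rule answers
pct_below_374 = (100 - 95) / 2        # (c) 2.5% of scores lie below mu - 2 sigma
pct_between = 68 + (95 - 68) / 2      # (d) 68% + 13.5% = 81.5%

# Exact Normal areas, for comparison
exact_below = 100 * itbs.cdf(3.74)                        # about 2.3%
exact_between = 100 * (itbs.cdf(9.94) - itbs.cdf(5.29))   # about 81.9%
print(pct_below_374, pct_between, round(exact_below, 1), round(exact_between, 1))
```

The small gaps between the rule and the exact areas come from the rule's rounding of 95.45% to 95%.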
Example – Using the Normal Distribution (Empirical
Rule)
IQ scores are normally distributed with mean 100 and
standard deviation 15. Find the proportion of the
population with IQ scores in the given interval. Also
find the probability that a randomly selected individual
has an IQ score in the given interval.
(a) Between 85 and 115
(b) At least 130
(c) At most 130
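Because 85 and 115 are one standard deviation from 100 and 130 is two above it, the Empirical Rule gives each answer directly; the proportion of the population and the probability for one randomly selected person are the same number. A minimal Python sketch, not from the original slides, comparing the rule with exact areas from `statistics.NormalDist` (Python 3.8+).

```python
from statistics import NormalDist

iq = NormalDist(100, 15)

rule = {"(a) 85 to 115": 0.68, "(b) at least 130": 0.025, "(c) at most 130": 0.975}
exact = {
    "(a) 85 to 115": iq.cdf(115) - iq.cdf(85),   # within 1 SD of the mean
    "(b) at least 130": 1 - iq.cdf(130),          # more than 2 SD above the mean
    "(c) at most 130": iq.cdf(130),               # complement of (b)
}
for part in rule:
    print(f"{part}: rule {rule[part]:.3f}, exact {exact[part]:.4f}")
```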
Finding Normal Percentiles by Hand
• When a data value doesn’t fall exactly 1, 2, or 3
standard deviations from the mean, we can look it up
in a table of Normal percentiles.
• Table Z in Appendix G provides us with normal
percentiles, but many calculators and statistics
computer packages provide these as well.
Finding Normal Percentiles by Hand
(cont.)
• Table Z is the standard Normal table. We have to convert our data
to z-scores before using the table.
• Figure 6.5 shows us how to find the area to the left when we
have a z-score of 1.80:
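Software can stand in for the table lookup. A minimal Python sketch, not from the original slides, assuming Python 3.8+ for `statistics.NormalDist`; its `cdf` method returns the same area to the left that Table Z lists.

```python
from statistics import NormalDist

area_left = NormalDist().cdf(1.80)  # area under the standard Normal curve to the left of z = 1.80
print(round(area_left, 4))          # 0.9641
```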
Normal Distributions: Normal Distribution Calculations
When Tiger Woods hits his driver, the distance the ball travels can be described by N(304, 8). What percent of Tiger’s drives travel between 305 and 325 yards?
When x = 305, $z = \frac{305 - 304}{8} = 0.125 \approx 0.13$.
When x = 325, $z = \frac{325 - 304}{8} = 2.625 \approx 2.63$.
Using Table A, we can find the area to the left of z = 2.63 and the area to the left of z = 0.13: 0.9957 – 0.5517 = 0.4440. About 44% of Tiger’s drives travel between 305 and 325 yards.
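The table method subtracts two left-tail areas. A minimal Python sketch, not from the original slides, assuming Python 3.8+ for `statistics.NormalDist`; it reproduces the table answer with the slide's rounded z-values and, for comparison, the area computed without rounding z.

```python
from statistics import NormalDist

Z = NormalDist()
drives = NormalDist(304, 8)

# Table method, using the z-values as rounded on the slide (0.125 -> 0.13, 2.625 -> 2.63).
table_answer = Z.cdf(2.63) - Z.cdf(0.13)          # about 0.4440
# The same area computed directly from N(304, 8), with no rounding of z.
exact_answer = drives.cdf(325) - drives.cdf(305)  # about 0.446

print(f"{table_answer:.4f} {exact_answer:.4f}")
```

Both versions round to about 44%, so the conclusion does not depend on the rounding.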
Cautions
• We should only use the z-table when the distribution is Normal and the data have been standardized.
• The z-table only gives the amount of data found below the z-score, THAT IS, THE AREA TO THE LEFT OF THE z-score!
• If you want to find the proportion found above the z-score, subtract the probability found in the table from 1.
AP Statistics,
Section 2.2, Part 1
EXAMPLE:
• Find the proportion of observations from the standard Normal distribution that are greater than .81.
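Following the caution above, the area to the right of z = .81 is 1 minus the area to its left. A minimal Python sketch, not from the original slides, assuming Python 3.8+ for `statistics.NormalDist`.

```python
from statistics import NormalDist

area_left = NormalDist().cdf(0.81)   # what the table gives: about 0.7910
area_right = 1 - area_left           # proportion greater than .81: about 0.2090
print(f"{area_left:.4f} {area_right:.4f}")
```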
Will my calculator do any of this normal stuff?
• normalpdf( – use for graphing ONLY
• normalcdf( – finds the proportion of area from a lower bound to an upper bound, in z-scores or data values
• invNorm( (inverse normal) – finds the z-score or data value for a given proportion of area to the left
• THESE COMMANDS ARE FOUND IN THE DISTR (DISTRIBUTION) MENU, reached by pressing 2nd VARS
Finding Normal Percentiles using the calculator
• Go to the DISTR (distribution) menu on your calculator.
• Find normalcdf(.
• Using z-scores, the keystroke is: normalcdf(min z, max z)
• Without using z-scores: normalcdf(min value, max value, mean, st. dev.)
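For readers working outside the calculator, the two normalcdf forms can be mimicked in Python. This is a sketch under stated assumptions, not the calculator's implementation: `normalcdf` and `inv_norm` are hypothetical helper names that wrap `statistics.NormalDist` (Python 3.8+), with the mean and standard deviation defaulting to 0 and 1 as in the z-score form.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Hypothetical analog of normalcdf(: area between lower and upper under N(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

def inv_norm(area_left, mu=0.0, sigma=1.0):
    """Hypothetical analog of invNorm(: the value with area_left of the distribution below it."""
    return NormalDist(mu, sigma).inv_cdf(area_left)

# The z-score form and the data-value form agree: 68 and 70 are 0.4 SD either side of 69.
print(round(normalcdf(-0.4, 0.4), 4), round(normalcdf(68, 70, 69, 2.5), 4))
```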
Example Using Data Values
• Men’s heights are Normally distributed according to
N(69,2.5).
a) What proportion of men are between 68 and 70
inches tall?
b) What percent of men are taller than 68 inches?
c) What percent of men are shorter than 68 inches?
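Each part is one area under N(69, 2.5). A minimal Python sketch, not from the original slides, assuming Python 3.8+ for `statistics.NormalDist`.

```python
from statistics import NormalDist

heights = NormalDist(69, 2.5)

a = heights.cdf(70) - heights.cdf(68)   # between 68 and 70 inches: about 0.3108
b = 1 - heights.cdf(68)                 # taller than 68 inches: about 65.5%
c = heights.cdf(68)                     # shorter than 68 inches: about 34.5%
print(f"a) {a:.4f}  b) {100 * b:.1f}%  c) {100 * c:.1f}%")
```

Parts (b) and (c) are complements, so their percentages add to 100%.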
Application of the Normal
Curve
The amount of time it takes for a pizza delivery is
approximately normally distributed with a mean of 25
minutes and a standard deviation of 2 minutes. If you order
a pizza, find the probability that the delivery time will be:
a. between 25 and 27 minutes. Answer: .3413
b. less than 30 minutes. Answer: .9938
c. less than 22.7 minutes. Answer: .1251
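The three pizza answers can be reproduced as areas under N(25, 2). A minimal Python sketch, not from the original slides, assuming Python 3.8+ for `statistics.NormalDist`.

```python
from statistics import NormalDist

delivery = NormalDist(25, 2)   # delivery time in minutes

a = delivery.cdf(27) - delivery.cdf(25)   # z from 0 to 1: about .3413
b = delivery.cdf(30)                      # z = 2.5: about .9938
c = delivery.cdf(22.7)                    # z = -1.15: about .1251
print(f"a. {a:.4f}  b. {b:.4f}  c. {c:.4f}")
```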
Example: Finding Probabilities for
Normal Distributions
A survey indicates that for each trip to the supermarket,
a shopper spends an average of 45 minutes with a
standard deviation of 12 minutes in the store. The length
of time spent in the store is normally distributed and is
represented by the variable x. A shopper enters the store.
Find the probability that the shopper will be in the store
for between 24 and 54 minutes.
Example: Finding Probabilities for
Normal Distributions
Find the probability that the shopper will be in the store
more than 39 minutes. (Recall μ = 45 minutes and
σ = 12 minutes)
Example: Finding Probabilities for
Normal Distributions
If 200 shoppers enter the store, how many shoppers
would you expect to be in the store more than 39
minutes?
Solution:
Recall P(x > 39) = 0.6915
200(0.6915) =138.3 (or about 138) shoppers
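All three shopper questions use the same model N(45, 12). A minimal Python sketch, not from the original slides, assuming Python 3.8+ for `statistics.NormalDist`.

```python
from statistics import NormalDist

shopping_time = NormalDist(45, 12)   # minutes spent in the store

p_24_to_54 = shopping_time.cdf(54) - shopping_time.cdf(24)   # z from -1.75 to 0.75: about 0.7333
p_over_39 = 1 - shopping_time.cdf(39)                        # z = -0.5: about 0.6915
expected_over_39 = 200 * p_over_39                           # about 138.3, so about 138 shoppers
print(f"{p_24_to_54:.4f} {p_over_39:.4f} {expected_over_39:.1f}")
```

The expected count is the number of shoppers times the probability for one shopper.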
Example: Finding Probabilities for
Normal Distributions
A survey indicates that people use their computers an
average of 2.4 years before upgrading to a new
machine. The standard deviation is 0.5 year. A
computer owner is selected at random. Find the
probability that he or she will use it for fewer than 2
years before upgrading. Assume that the variable x
is normally distributed.
1. Draw a picture.
2. Use your calculator to find the proportion of area under the Normal model less than 2.
Example: Using Technology to find
Normal Probabilities
Assume that cholesterol levels of men in the United
States are normally distributed, with a mean of 215
milligrams per deciliter and a standard deviation of 25
milligrams per deciliter. You randomly select a man
from the United States. What is the probability that his
cholesterol level is less than 175?
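Both "less than" questions (computer upgrades and cholesterol) are single left-tail areas. A minimal Python sketch, not from the original slides, assuming Python 3.8+ for `statistics.NormalDist`.

```python
from statistics import NormalDist

upgrade_years = NormalDist(2.4, 0.5)
p_under_2 = upgrade_years.cdf(2)     # z = -0.8: about 0.2119

cholesterol = NormalDist(215, 25)    # milligrams per deciliter
p_under_175 = cholesterol.cdf(175)   # z = -1.6: about 0.0548
print(f"{p_under_2:.4f} {p_under_175:.4f}")
```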
From Percentiles to Scores: z in Reverse
• Sometimes we start with areas and need to find the
corresponding z-score or even the original data value.
• Example: What z-score represents the first quartile
(25% mark) in a Normal model?
From Percentiles to Scores: z in Reverse
(cont.)
• Look in Table Z for an area of 0.2500.
• The exact area is not there, but 0.2514 is pretty close.
• This figure is associated with z = -0.67, so the first
quartile is 0.67 standard deviations below the mean.
INVERSE NORM
• We can also use the calculator to find the z-score for a particular area:
invNorm(proportion of area to the left)
So, to solve the previous problem, keystroke:
invNorm(.25)
Note: if you also enter μ and σ, as in invNorm(area, μ, σ), the calculator “un-standardizes” the result; that is, it returns a data value rather than a z-score.
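The table approach and the inverse-Normal approach to the first quartile can be compared directly. A minimal Python sketch, not from the original slides, assuming Python 3.8+ for `statistics.NormalDist`, whose `inv_cdf` plays the role of invNorm.

```python
from statistics import NormalDist

Z = NormalDist()

table_z = -0.67
table_area = Z.cdf(table_z)   # about 0.2514, the closest Table Z entry to 0.2500
exact_q1 = Z.inv_cdf(0.25)    # about -0.6745, the exact z-score with 25% of the area below it
print(f"{table_area:.4f} {exact_q1:.4f}")
```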
Working backwards
• How tall must a man be in order to be in the 90th
percentile? (Recall men’s model was N(69, 2.5)
51
AP Statistics,
Section 2.2, Part 1
Working backwards
• How tall must a woman be in order to be in the top
15% of all women if heights follow a Normal model
N(60, 2.3)?
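Working backwards means supplying the area to the left and asking for the data value, as invNorm(area, μ, σ) does. A minimal Python sketch, not from the original slides, assuming Python 3.8+ for `statistics.NormalDist`; note that "top 15%" means 85% of the area lies to the left.

```python
from statistics import NormalDist

men = NormalDist(69, 2.5)
man_90th = men.inv_cdf(0.90)            # about 72.2 inches

women = NormalDist(60, 2.3)
woman_top_15 = women.inv_cdf(1 - 0.15)  # 85th percentile: about 62.4 inches
print(f"{man_90th:.1f} {woman_top_15:.1f}")
```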
Scores for a civil service exam are normally distributed,
with a mean of 75 and a standard deviation of 6.5. To be
eligible for civil service employment, you must score in
the top 5%. What is the lowest score you can earn and
still be eligible for employment?
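Solution:
An exam score in the top 5% is any score above the 95th percentile. Find the score that corresponds to a cumulative area of 1 – 0.05 = 0.95.
[Figure: normal curve centered at x = 75 (z = 0) with the upper 5% shaded; the cutoff score x and its z-score are unknown.]
Solution: Finding a Specific Data Value
Use invNorm(1-.05, 75, 6.5) = 85.69.
[Figure: the cutoff 85.69 marked to the right of the mean 75, with 5% of the area above it.]
Rounding 85.69 up to the next whole score, the lowest score you can earn and still be eligible for employment is 86.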
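The civil service cutoff is the 95th percentile of N(75, 6.5). A minimal Python sketch, not from the original slides, assuming Python 3.8+ for `statistics.NormalDist` and whole-number exam scores, which is why the cutoff is rounded up rather than to the nearest integer.

```python
import math
from statistics import NormalDist

exam = NormalDist(75, 6.5)
cutoff = exam.inv_cdf(1 - 0.05)       # about 85.69
lowest_eligible = math.ceil(cutoff)   # 86, assuming scores are whole numbers
print(f"{cutoff:.2f} {lowest_eligible}")
```

Rounding down to 85 would give a score below the cutoff, that is, outside the top 5%.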
The Verbal SAT test has a mean score of 500 and a standard deviation of 100. Scores are normally distributed. A major university determines that it will accept only students whose Verbal SAT scores are in the top 4%. What is the minimum score that a student must earn to be accepted?
Solution: ...students whose Verbal SAT scores are in the top 4%. Mean = 500, standard deviation = 100.
The top 4% lies above the cut-off score X, so the area to the left of X is 1 – .04 = .9600.
[Figure: normal curve with mean 500; area .9600 to the left of the unknown cut-off X and .04 to the right.]
Using invNorm(.9600, 500, 100), the cut-off score is 675. Check your answer using normalcdf(675, 5000, 500, 100), which should return about .04 (the upper bound 5000 simply stands in for “no upper limit”).
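The SAT cutoff and its check mirror the two calculator steps. A minimal Python sketch, not from the original slides, assuming Python 3.8+ for `statistics.NormalDist`; the check subtracts two left-tail areas just as normalcdf(675, 5000, 500, 100) does.

```python
from statistics import NormalDist

sat = NormalDist(500, 100)
cutoff = sat.inv_cdf(0.96)               # about 675.07, reported as 675
check = sat.cdf(5000) - sat.cdf(675)     # area above 675: about .04
print(f"{cutoff:.2f} {check:.4f}")
```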
HW: p. 133, problems 31 through 35 and 45
REMINDER:
• CAUTION!!!
• Whether using the calculator or the table, use these Normal calculations only when the distribution is Normal. To use the z-table, the data must first be standardized.
Are You Normal? How Can You Tell?
• When you actually have your own data, you must
check to see whether a Normal model is reasonable.
• Looking at a histogram of the data is a good way to
check that the underlying distribution is roughly
unimodal and symmetric.
Are You Normal? How Can You Tell?
(cont.)
• A more specialized graphical display that can help
you decide whether a Normal model is appropriate is
the Normal probability plot.
• If the distribution of the data is roughly Normal, the
Normal probability plot approximates a diagonal
straight line. Deviations from a straight line indicate
that the distribution is not Normal.
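A Normal probability plot pairs each sorted data value with the z-score expected for its rank. The Python sketch below is not from the original slides and uses the Newark precipitation data from earlier; it assumes the plotting positions (i − 0.5)/n, which is one common convention (calculators and packages may use slightly different ones), and `correlation` is a hypothetical helper. The printed pairs are the points to plot; the correlation is only a rough numeric summary of how straight they are, not a formal test.

```python
from statistics import NormalDist, mean

precip = [3.54, 2.88, 4.16, 4.2, 4.09, 4.02, 4.76, 3.7, 3.82, 3.6, 3.65, 3.81]
data = sorted(precip)
n = len(data)

# Expected z-score for the i-th smallest of n Normal observations (assumed convention).
scores = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

for z, y in zip(scores, data):   # plot z on one axis and the data value on the other
    print(f"{z:6.2f}  {y:5.2f}")
r = correlation(scores, data)
print(f"correlation: {r:.3f}")   # close to 1 when the points lie near a straight line
```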
Are You Normal? How Can You Tell?
(cont.)
• Nearly Normal data have a histogram and a Normal
probability plot that look somewhat like this example:
Are You Normal? How Can You Tell?
(cont.)
• A skewed distribution might have a histogram and
Normal probability plot like this:
What Can Go Wrong?
• Don’t use a Normal model when the distribution is
not unimodal and symmetric.