Suppose we measured the right foot length of 30 teachers and graphed the
results.
Assume the first person had a 10 inch foot.
If our second subject had a 9 inch foot, we would add her to the graph.
As we continued to plot foot lengths, a pattern would
begin to emerge.
[Figure: histogram. Vertical axis: Number of People with that Shoe Size (1 to 8). Horizontal axis: Length of Right Foot (4 to 14).]
Notice how there are more people (n=6) with a 10 inch right foot than any other length.
Notice also how as the length becomes larger or smaller, there are fewer and fewer
people with that measurement. This is a characteristic of many variables that we
measure. There is a tendency to have most measurements in the middle, and fewer
as we approach the high and low extremes.
If we were to connect the top of each bar, we would create a
frequency polygon.
[Figure: frequency polygon drawn over the histogram. Vertical axis: Number of People with that Shoe Size. Horizontal axis: Length of Right Foot.]
You will notice that if we smooth the lines, our data almost creates a bell
shaped curve.
[Figure: the frequency polygon smoothed into a bell-shaped curve. Vertical axis: Number of People with that Shoe Size. Horizontal axis: Length of Right Foot.]
This bell shaped curve is known as the “Bell Curve” or the “Normal Curve.”
[Figure: Normal curve. Horizontal axis: Length of Right Foot.]
Whenever you see a normal curve, you should imagine the HISTOGRAM within it.
[Figure: histogram inside a Normal curve. Vertical axis: Number of Students (1 to 9). Horizontal axis: Points on a Quiz (12 to 22).]
What can you say about the Mean, Median and Mode of a Normal Curve?
So WHAT ABOUT THE
STANDARD DEVIATION????

The inflection points (where the curve changes from bending downward to bending upward) lie one standard deviation on either side of the mean, at μ − σ and μ + σ.
AP Statistics, Section 2.1, Part 1
Normal distributions are a
family of distributions that
have the same general bell
shape. They are symmetric
(the left side is an exact
mirror of the right side) with
scores more concentrated in
the middle than in the tails.
Examples of normal
distributions are shown to
the right. Notice that they
differ in how spread out they
are although the area under
each curve is always the
same and equal to 1!
The mean and standard deviation are useful ways to describe a set of
scores as they also determine the shape of the bell curve. If the scores
are grouped closely together, the curves will have a smaller standard
deviation than if they are spread farther apart.
[Figures: a curve with a large standard deviation and a curve with a small standard deviation. Panels: same means, different standard deviations; different means, same standard deviations; different means, different standard deviations.]
THE NORMAL EQUATION
Notation:
N(μ,σ) is a normal distribution
N(0,1) is the standard normal
distribution
“Standardizing” is the process
of doing a linear transformation
from N(μ,σ) into N(0,1)
IN GENERAL, A Normal Distribution is:
• A symmetrical, bell-shaped (unimodal) density curve
• Above the horizontal axis
• Specified by its mean and standard deviation: N(μ, σ)
• The transition (inflection) points occur at μ ± σ
• Probability is calculated by finding the area under the curve
• As σ increases, the curve flattens and spreads out
• As σ decreases, the curve gets taller and thinner
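A minimal sketch of the last two points, using Python's standard-library `statistics.NormalDist`. The three σ values and the helper name `peak_and_area` are chosen only for illustration. The sketch reports the height of each curve at its mean and the area within six standard deviations of the mean, which should be essentially 1 in every case.

```python
from statistics import NormalDist

def peak_and_area(sigma, mu=0.0):
    """Return the curve's height at the mean and the area within 6 SDs of it."""
    curve = NormalDist(mu=mu, sigma=sigma)
    peak = curve.pdf(mu)
    area = curve.cdf(mu + 6 * sigma) - curve.cdf(mu - 6 * sigma)
    return peak, area

# Illustrative sigma values (assumptions, not from the slides).
for sigma in (0.5, 1.0, 2.0):
    peak, area = peak_and_area(sigma)
    print(f"sigma={sigma}: peak height={peak:.4f}, area={area:.6f}")
```

A larger σ gives a lower peak while the area stays the same, which is why the curve must spread out.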
The Standard Normal Distribution
All Normal distributions can be “standardized” and are therefore the
same if we measure in units of size σ from the mean µ as center.
Definition:
The standard Normal distribution is the Normal distribution
with mean 0 and standard deviation 1.
If a variable x has any Normal distribution N(µ,σ) with mean µ
and standard deviation σ, then the standardized variable
z = (x − µ) / σ
has the standard Normal distribution, N(0,1).
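The standardizing formula can be sketched as a one-line function. The name `standardize` and the sample values are hypothetical, chosen only to show the arithmetic.

```python
def standardize(x, mu, sigma):
    """Return the z-score: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

# Hypothetical value: 10 drawn from N(8, 2) lies one standard deviation above the mean.
z = standardize(10, mu=8, sigma=2)
print(z)
```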

Normal Distributions

Standard Normal Distribution
+
 The
The EMPIRICAL Rule
• Normal models give us an idea of how extreme a value is by telling us how likely it is to find one that far from the mean.
• We can find these numbers precisely, but until then we will use a simple rule that tells us a lot about the Normal model…
It turns out that in a Normal model:
• about 68% of the values fall within one standard deviation of the mean;
• about 95% of the values fall within two standard deviations of the mean; and,
• about 99.7% (almost all!) of the values fall within three standard deviations of the mean.
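As a rough check of the rule, the areas can be computed from the standard Normal model itself. This sketch uses `statistics.NormalDist` from the Python standard library; the helper name `area_within` is an assumption made for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # the standard Normal model, N(0, 1)

def area_within(k):
    """Area under the standard Normal curve within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k):.4f}")
```

The computed areas are close to, but not exactly, 68%, 95%, and 99.7%, which is why the rule says "about".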
Copyright © 2010, 2007, 2004 Pearson Education, Inc.
68-95-99.7 (or Empirical) Rule
Example, p. 113
The distribution of Iowa Test of Basic Skills (ITBS)
vocabulary scores for 7th grade students in Gary,
Indiana, is close to Normal. Suppose the distribution is
N(6.84, 1.55).
a) Sketch the Normal density curve for this distribution.
b) What percent of ITBS vocabulary scores are less than
3.74?
c) What percent of the scores are between 5.29 and 9.94?
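One way to approach parts (b) and (c), sketched under the assumption that the 68-95-99.7 rule is the intended tool: each boundary turns out to be a whole number of standard deviations from the mean, so the rule applies directly. The helper name `sds_from_mean` is hypothetical.

```python
mu, sigma = 6.84, 1.55  # N(6.84, 1.55) from the example

def sds_from_mean(x):
    """Number of standard deviations x lies from the mean, rounded to 2 places."""
    return round((x - mu) / sigma, 2)

z_b = sds_from_mean(3.74)                              # boundary for part (b)
z_low, z_high = sds_from_mean(5.29), sds_from_mean(9.94)  # boundaries for part (c)

# 95% of scores lie within 2 SDs, so half of the remaining 5% lies below -2 SDs.
pct_below = (100 - 95) / 2
# Between -1 SD and +2 SDs: half of the 68% band plus half of the 95% band.
pct_between = 68 / 2 + 95 / 2

print(z_b, pct_below)
print(z_low, z_high, pct_between)
```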
Finding Normal Percentiles by Hand


When a data value doesn’t fall exactly 1, 2, or 3
standard deviations from the mean, we can look it
up in a table of Normal percentiles.
A Table of Standard Normal Probabilities provides
us with normal percentiles, but many calculators
and statistics computer packages provide these
as well.

The Standard Normal Table
Because all Normal distributions are the same when we
standardize, we can find areas under any Normal curve from
a single table.
Definition: The Standard Normal Table
Table A is a table of areas under the standard Normal curve. The table
entry for each value z is the area under the curve to the left of z.
Suppose we want to find the proportion of observations from the
standard Normal distribution that are less than 0.81.
We can use Table A:
Z      .00     .01     .02
0.7   .7580   .7611   .7642
0.8   .7881   .7910   .7939
0.9   .8159   .8186   .8212
P(z < 0.81) = .7910
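The table lookup can be cross-checked in software. This is a sketch using the standard-library `statistics.NormalDist`, not the method the slides use; its `cdf` method returns the area to the left of a value, which is what Table A tabulates.

```python
from statistics import NormalDist

# Area under the standard Normal curve to the left of z = 0.81.
area_left = NormalDist().cdf(0.81)
print(f"P(z < 0.81) is approximately {area_left:.4f}")
```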

Example, p. 117
Finding Areas Under the Standard Normal Curve
Find the proportion of observations from the standard Normal
distribution that are between -1.25 and 0.81.
From Table A, the area to the left of 0.81 is 0.7910 and the area to the left of -1.25 is 0.1056, so the proportion is 0.7910 – 0.1056 = 0.6854.
Can you find the same proportion using a different approach?
1 – (0.1056 + 0.2090) = 1 – 0.3146 = 0.6854
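Both approaches can be sketched with `statistics.NormalDist`: subtracting two left-hand areas, or subtracting both tails from 1. The variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()
left_of_low = std_normal.cdf(-1.25)       # area to the left of -1.25
left_of_high = std_normal.cdf(0.81)       # area to the left of 0.81
right_of_high = 1 - left_of_high          # area to the right of 0.81

between_by_difference = left_of_high - left_of_low
between_by_tails = 1 - (left_of_low + right_of_high)

print(f"{between_by_difference:.4f} {between_by_tails:.4f}")
```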
Strategies for finding probabilities or proportions in normal distributions
1. Express the problem in terms of the observed variable x.
2. Make a graph of the data to check the Nearly Normal Condition to make sure we can use the Normal distribution to model the distribution.
3. Draw a picture of the distribution and shade the area of interest under the curve.
4. Standardize x to restate the problem in terms of a standard Normal variable z.
5. Use Table A or the calculator and the fact that the total area under the curve is 1 to find the required area under the standard Normal curve.
6. Conclude: Write your conclusion in the context of the problem.
Normal Distribution Calculations
When Tiger Woods hits his driver, the distance the ball travels can be
described by N(304, 8). What percent of Tiger’s drives travel between 305
and 325 yards?
When x = 305, z = (305 – 304) / 8 ≈ 0.13
When x = 325, z = (325 – 304) / 8 ≈ 2.63
Using Table A, we can find the area to the left of z = 2.63 and the area to the left of z = 0.13.
0.9957 – 0.5517 = 0.4440. About 44% of Tiger’s drives travel between 305 and 325 yards.
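The same calculation can be sketched without rounding the z-scores, again with `statistics.NormalDist`. Because the table method rounds z to two decimal places, the software answer differs slightly in the third decimal place; both round to roughly 44% or 45%.

```python
from statistics import NormalDist

drive = NormalDist(mu=304, sigma=8)  # N(304, 8) from the example

# Area between 305 and 325 yards, computed from the unrounded values.
proportion = drive.cdf(325) - drive.cdf(305)
print(f"Proportion of drives between 305 and 325 yards: {proportion:.4f}")
```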
Cautions
• We should only use the z-table when the distribution is Normal and the data have been standardized.
• The z-table only gives the amount of data found below the z-score, THAT IS, THE AREA TO THE LEFT OF THE z-score!
• If you want to find the portion found above the z-score, subtract the probability found in the table from 1.
AP Statistics, Section 2.2, Part 1
EXAMPLE:
Find the proportion of observations from the standard Normal
distribution that is greater than 0.81.
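A short sketch of the "subtract from 1" step for this example, using `statistics.NormalDist` in place of the table:

```python
from statistics import NormalDist

area_left = NormalDist().cdf(0.81)   # what the table reports
area_right = 1 - area_left           # proportion greater than 0.81
print(f"P(z > 0.81) is approximately {area_right:.4f}")
```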
From Percentiles to Scores: z in Reverse
• Sometimes we start with areas and need to find the corresponding z-score or even the original data value.
• Example: What z-score represents the first quartile in a Normal model?
From Percentiles to Scores: z in Reverse (cont.)
• Look in the Table for an area of 0.2500.
• The exact area is not there, but 0.2514 is pretty close.
• This figure is associated with z = –0.67, so the first quartile is 0.67 standard deviations below the mean.
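The reverse lookup can also be sketched in software. `NormalDist.inv_cdf` in the Python standard library takes an area to the left and returns the corresponding value, so no "closest table entry" is needed.

```python
from statistics import NormalDist

# z-score with 25% of the area to its left (the first quartile).
z_q1 = NormalDist().inv_cdf(0.25)
print(f"First quartile z-score: {z_q1:.2f}")
```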
Will my calculator do any of
this normal stuff?
• Normalpdf – use for graphing ONLY
• Normalcdf – will find probability of
area from lower bound to upper bound
• Invnorm (inverse normal) – will find the z-score for a given probability (area to the left)
• THESE COMMANDS ARE FOUND IN
THE DISTRIBUTION MENU
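For readers without a graphing calculator, here is a rough sketch of Python analogues built on `statistics.NormalDist`. The function names mirror the calculator commands but are hypothetical wrappers written for this illustration; they are not part of any library, and the argument order is an assumption.

```python
from statistics import NormalDist

def normalpdf(x, mu=0.0, sigma=1.0):
    """Height of the Normal curve at x (useful for graphing only)."""
    return NormalDist(mu, sigma).pdf(x)

def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the Normal curve between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

def invnorm(area_left, mu=0.0, sigma=1.0):
    """Value with the given area to its left."""
    return NormalDist(mu, sigma).inv_cdf(area_left)

print(f"{normalcdf(-1.25, 0.81):.4f}")
print(f"{invnorm(0.25):.2f}")
```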
Finding Normal Percentiles using the calculator
• Go to the Distribution key on your calculator
• Find NORMCDF
• Use the key stroke: NORMCDF(min z,max z)
Example


Men’s heights are N(69,2.5).
What percent of men are taller than 68 inches?
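A sketch of this example with `statistics.NormalDist`: the area to the right of 68 inches is 1 minus the area to its left.

```python
from statistics import NormalDist

men = NormalDist(mu=69, sigma=2.5)   # N(69, 2.5) from the example

taller_than_68 = 1 - men.cdf(68)
print(f"Proportion of men taller than 68 inches: {taller_than_68:.4f}")
```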
Working with intervals

What proportion of men are between 68 and 70
inches tall?
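For an interval, the area is the difference of two left-hand areas. A sketch using the same N(69, 2.5) model for men's heights:

```python
from statistics import NormalDist

men = NormalDist(mu=69, sigma=2.5)   # N(69, 2.5) from the earlier example

between_68_and_70 = men.cdf(70) - men.cdf(68)
print(f"Proportion of men between 68 and 70 inches: {between_68_and_70:.4f}")
```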
INVERSE NORM

We can also use the calculator to find the z-score for a particular area:
INVNORM(prop of area to the left)
Note: when using the calculator, entering μ, σ as well will “un-standardize” the result, returning the data value rather than the z-score
Working backwards

How tall must a woman be in order to be in the
top 15% of all women?
Working backwards

How tall must a man be in order to be in the 90th
percentile?
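A sketch of working backwards, assuming the men's heights model N(69, 2.5) from the earlier example still applies. `inv_cdf` with μ and σ supplied returns a height directly, not a z-score.

```python
from statistics import NormalDist

men = NormalDist(mu=69, sigma=2.5)   # assumed: N(69, 2.5) from the earlier example

# Height with 90% of men's heights to its left.
height_90th = men.inv_cdf(0.90)
print(f"90th percentile height: {height_90th:.1f} inches")
```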
Working backwards

What range of values make up the middle 50% of
men’s heights?
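The middle 50% runs from the first quartile to the third quartile, that is, from 25% to 75% of the area to the left. A sketch under the same assumed model N(69, 2.5):

```python
from statistics import NormalDist

men = NormalDist(mu=69, sigma=2.5)   # assumed: N(69, 2.5) from the earlier example

low = men.inv_cdf(0.25)    # first quartile
high = men.inv_cdf(0.75)   # third quartile
print(f"Middle 50% of heights: {low:.2f} to {high:.2f} inches")
```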
REMINDER: CAUTION!!!
Whether using the calculator or the Table, we should only use the z-table when the distribution is Normal and the data have been standardized.
Therefore, we need a strategy for assessing Normality.
Assessing Normality
Plot the data.
• Make a dotplot, stemplot, or histogram and see if the graph is approximately symmetric and bell-shaped.
• Check the 1-Var Stats and compare the mean and median.
Check whether the data follow the 68-95-99.7 rule.
• Count how many observations fall within one, two, and three standard deviations of the mean and check to see if these percents are close to the 68%, 95%, and 99.7% targets for a Normal distribution.
Are You Normal? How Can You Tell?
• A more specialized graphical display that can help you decide whether a Normal model is appropriate is the Normal probability plot.
• If the distribution of the data is roughly Normal, the Normal probability plot approximates a diagonal straight line. Deviations from a straight line indicate that the distribution is not Normal.
Most software packages can construct Normal probability plots.
These plots are constructed by plotting each observation in a data set
against its corresponding percentile’s z-score.
Interpreting Normal Probability Plots
If the points on a Normal probability plot lie close to a straight line,
the plot indicates that the data are Normal. Systematic deviations from
a straight line indicate a non-Normal distribution. Outliers appear as
points that are far away from the overall pattern of the plot.
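A minimal sketch of how the points of a Normal probability plot could be computed. The data are the Walter Johnson win totals listed later in this transcript. The percentile formula (i − 0.5)/n is one common plotting-position convention and is an assumption here; software packages and calculators may use a slightly different one. The function name `normal_plot_points` is hypothetical.

```python
from statistics import NormalDist

# Walter Johnson's season win totals, as listed later in this transcript.
wins = [5, 14, 13, 25, 25, 33, 36, 28, 27, 25, 23, 23,
        20, 8, 17, 15, 17, 23, 20, 15, 5]

def normal_plot_points(data):
    """Pair each sorted observation with the z-score of its percentile."""
    n = len(data)
    std_normal = NormalDist()
    points = []
    for i, x in enumerate(sorted(data), start=1):
        percentile = (i - 0.5) / n          # assumed plotting-position convention
        points.append((std_normal.inv_cdf(percentile), x))
    return points

for z, x in normal_plot_points(wins):
    print(f"z = {z:6.2f}   wins = {x}")
```

Plotting these pairs and checking whether they fall near a straight line is the visual test described above.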
Are You Normal? How Can You Tell? (cont.)

Nearly Normal data have a histogram and a
Normal probability plot that look somewhat like
this example:
Are You Normal? How Can You Tell? (cont.)

A skewed distribution might have a histogram and
Normal probability plot like this:
Are Walter Johnson’s Wins Normal?
• Enter 5, 14, 13, 25, 25, 33, 36, 28, 27, 25, 23, 23, 20, 8, 17, 15, 17, 23, 20, 15, 5 into list L1
• Run “1-Var Stats”
• Is the data set symmetric? Where do you look?
AP Statistics, Section 2.2, Part 2
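A rough Python stand-in for the calculator's "1-Var Stats" on this list, using the standard-library `statistics` module. The dictionary keys are illustrative labels. Comparing the mean with the median is one place to look for symmetry.

```python
from statistics import mean, median, stdev

wins = [5, 14, 13, 25, 25, 33, 36, 28, 27, 25, 23, 23,
        20, 8, 17, 15, 17, 23, 20, 15, 5]

summary = {
    "n": len(wins),
    "mean": mean(wins),
    "Sx": stdev(wins),       # sample standard deviation
    "min": min(wins),
    "median": median(wins),
    "max": max(wins),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```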
Are Walter Johnson’s Wins Normal?
• Look also at a boxplot
• Is the data set symmetric?
68-95-99.7 Rule?

You can use the 68-95-99.7 rule with a histogram
to see if the distribution roughly fits the rule.

You will also want to check the Normal Probability
Plot.
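The 68-95-99.7 check can be sketched by counting. This example uses the Walter Johnson win totals from the earlier slide and the sample standard deviation; the helper name `share_within` is hypothetical.

```python
from statistics import mean, stdev

wins = [5, 14, 13, 25, 25, 33, 36, 28, 27, 25, 23, 23,
        20, 8, 17, 15, 17, 23, 20, 15, 5]

def share_within(data, k):
    """Fraction of observations within k sample standard deviations of the mean."""
    m, s = mean(data), stdev(data)
    inside = [x for x in data if abs(x - m) <= k * s]
    return len(inside) / len(data)

for k, target in ((1, 68), (2, 95), (3, 99.7)):
    print(f"within {k} SD: {share_within(wins, k):.1%} (target about {target}%)")
```

With only 21 observations the percentages move in steps of nearly 5 points, so only a rough match to the targets can be expected.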
In Summary:
What Can Go Wrong?

Don’t use a Normal model when the distribution is
not unimodal and symmetric.
What Can Go Wrong? (cont.)
• Don’t use the mean and standard deviation when outliers are present; the mean and standard deviation can both be distorted by outliers.
• Don’t round off too soon.
• Don’t round your results in the middle of a calculation.
• Don’t worry about minor differences in results.
What have we learned?
• The story data can tell may be easier to understand after shifting or rescaling the data.
  • Shifting data by adding or subtracting the same amount from each value affects measures of center and position but not measures of spread.
  • Rescaling data by multiplying or dividing every value by a constant changes all the summary statistics: center, position, and spread.
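A small sketch of shifting versus rescaling. The four heights are hypothetical values made up for illustration; the shift of 2 inches and the inches-to-centimetres factor 2.54 are likewise just examples.

```python
from statistics import mean, stdev

heights_in = [66, 68, 70, 72]                  # hypothetical heights in inches

shifted = [h + 2 for h in heights_in]          # add the same amount to every value
rescaled = [h * 2.54 for h in heights_in]      # multiply every value (inches to cm)

for label, data in (("original", heights_in),
                    ("shifted", shifted),
                    ("rescaled", rescaled)):
    print(f"{label}: mean={mean(data):.2f}, SD={stdev(data):.2f}")
```

Shifting moves the mean and leaves the SD alone; rescaling multiplies both by the same constant.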
What have we learned? (cont.)
• We’ve learned the power of standardizing data.
  • Standardizing uses the SD as a ruler to measure distance from the mean (z-scores).
  • With z-scores, we can compare values from different distributions or values based on different units.
  • z-scores can identify unusual or surprising values among data.
What have we learned? (cont.)
• We’ve learned that the 68-95-99.7 Rule can be a useful rule of thumb for understanding NORMAL distributions:
  • For data that are unimodal and symmetric, about 68% fall within 1 SD of the mean, 95% fall within 2 SDs of the mean, and 99.7% fall within 3 SDs of the mean.
What have we learned? (cont.)
• We see the importance of Thinking about whether a method will work:
  • Normality Assumption: We sometimes work with Normal tables. These tables are based on the Normal model.
  • Data can’t be exactly Normal, so we check the Nearly Normal Condition by making a histogram (is it unimodal, symmetric and free of outliers?) or a normal probability plot (is it straight enough?).