Download Chapter 2 The Normal Distributions

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Central limit theorem wikipedia , lookup

Transcript
Section 2.2
Density curves and Normal
Distributions
Activity


Roll one die 20 times and record the
results
Make a dotplot
Roll two dice 20 times and record the
sum of the two die
Make a dotplot
Exploring Quantitative Data
In Chapter 1, we developed a kit of graphical and
numerical tools for describing distributions. Now, we’ll add
one more step to the strategy.
Exploring Quantitative Data
1. Always plot your data: make a graph, usually a dotplot,
stemplot, or histogram.
2. Look for the overall pattern (shape, center, and spread) and
for striking departures such as outliers.
3. Calculate a numerical summary to briefly describe center
and spread.
4. Sometimes the overall pattern of a large number of
observations is so regular that we can describe it by a
smooth curve (density curve)
2
4
8
6
S u m o f Tw o D i c e
10
12
Density Curves
• A density curve is a mathematical model for
the distribution
– A mathematical model is an idealized description
• Ignores minor irregularities as well as any
outliers
• Describes the overall pattern of a distribution
Density Curves
• Is always on or above the horizontal axis
• Has area exactly 1 underneath it
• The area under the curve and between any range
of values is the proportion of all observations
that fall in that range
• Most common density
curve is the Normal curve
Mean and median of a density curve
• Median of a density curve is the equal-areas point (hard
to find with skewed curves)
• Mean is the balance point of a density curve (at which
the curve would balance if made of solid material)
• In a symmetric distribution the mean and median are
in the same place (the center of the curve)
• In a skewed distribution the mean is pulled towards the
tail
• Can basically locate the mean, median, and quartiles of
any density curve (not the standard deviation)
http://www.shodor.org/interactivate/activities/PlopIt/
Describing Density Curves

Because a density curve is an idealized description of the
distribution of data, we need to distinguish between the
mean and standard deviation of the density curve and the
mean and standard deviation computed from the actual
(sample) observations
actual observations
density curve
Mean
x

standard deviation
s
 (sigma)
(mu)
Density Curve practice
Density Curve practice
Normal Distributions
One particularly important class of density curves are the
Normal curves, which describe Normal distributions.
• All Normal curves have the same overall shape: symmetric,
single- peaked, and bell-shaped
• Any specific Normal curve is completely described by giving its
mean µ and its standard deviation σ.
• Changing µ without changing σ moves the normal curve along
the horizontal axis without changing its spread.
• The standard deviation σ controls the spread of a normal
curve.
• Inflection points are the points at which the change of the
curvature takes place and are located at a distance σ on either
side of the mean.
• We abbreviate the Normal distribution with mean µ and
standard deviation σ as N(µ,σ).
Empirical Rule: 68-95-99.7
In the Normal distribution with mean μ and standard deviation σ.
Remember
values in a
data set must
appear to be a
normal bellshaped
histogram,
dotplot, or
stemplot to
use the
Empirical
Rule!
Warmup
1. Suppose the average height of freshmen at MRHS
is 60 inches with a standard deviation of 1.5 inches.
What is the z-score for a freshman who has a height
of a. 58 inches? Z= -1.33
b. 60.15 inches?
Z= 0.1
2. Suppose the average height of sophomores at
MRHS is 62 inches with a standard deviation of 2
inches. What is the height of the sophomore (x- value)
that corresponds to a
a) z-score = 0?
b) z-score = -2.44?
c) z-score = 3.1?
x= 62 in
x= 57.12 in
x= 68.2 in
3. Suppose the average height of juniors at MRHS is
unknown but the standard deviation is 2.5 inches.
What is the population mean height of juniors if a
junior 66 inches tall corresponds to a z-score of μ= 67.8 in
0.75?
4. Suppose the height of seniors at MRHS is 67 inches
but the standard deviation is unknown. What is the
standard deviation knowing
a) a senior 68.5 inches tall has a corresponding z-score of .87?
σ = 1.73 in
b) a senior 63 inches tall has a corresponding z-score of –2.43?
(This resulting population standard deviation should be
σ = 1.65
different from the answer to a.)
in
5. An incoming freshman took her college’s
placement exams in French and math. In French,
she scored 82 and in math, 86. The overall results
on the French exam had a mean of 72 and a
standard deviation of 8, while the mean math
score was 76 with a standard deviation 12. On
which exam did she do better RELATIVE to her
classmates?
This shows that she
performed better on
the French exam
than on the Math
exam relative to her
classmates.
Standard Normal
• Normal distributions vary from one another in two main
ways:
• The mean, µ, may be located anywhere on the x-axis
• The bell shape may be more or less spread according to the size
of the standard deviation, σ
• We can standardize our distribution in order to find
proportions and compare to other distributions
The Standard Normal Distribution
All Normal distributions are the same if we measure in units of size σ
from the mean µ as center.
The standard Normal distribution is the Normal distribution with mean 0 and
standard deviation 1.
If a variable x has any Normal distribution N(µ,σ) with mean µ and standard
deviation σ, then the standardized variable
z
x -

has the standard Normal distribution, N(0,1).
Standard Normal Table
• Table A – table of areas under the standard
normal curve. The table entry for each value z is
the area under the curve to the left
• We can answer any
question about
proportion of
observations in a
normal distribution
by standardizing and
then using the
standard normal
Always draw your curve and shade what you
are looking for!!!!!
table
The Standard Normal Table
The standard Normal Table (Table A) is a table of areas under the
standard Normal curve. The table entry for each value z is the area
under the curve to the left of z.
Suppose we want to find the proportion
of observations from the standard Normal
distribution that are less than z = 0.81.
We can use Table A:
Z
.00
.01
.02
0.7
.7580
.7611
.7642
0.8
.7881
.7910
.7939
0.9
.8159
.8186
.8212
P(z < 0.81) = .7910
Practice
1. Find the proportion of observations from the standard
normal distribution that are less than 1.4
0.9192 or 91.92%
Practice
2. Find the proportion of observations from the standard normal
distribution that are greater than –2.15
1-0.0158=0.9842 or
98.42%
Always make sure that you draw a picture and shade the area of
interest and ensure that your answer is reasonable in the context
of the problem
Find the following probabilities:
3. P(z < 1)
4. P(z < 0)
50%
84.13%
5. P(z < 1.5)
93.32%
6. P(z > 1.5)
1-.9332 = 6.68%
7. P(2.3 < z < 3.1)
.9990-.9893= 0.97%
Normal Distribution Calculations
We can answer a question about areas in
any Normal distribution by standardizing and
using Table A or by using technology.
How To Find Areas In Any Normal Distribution
Step 1: State the distribution and the values of interest.
Draw a Normal curve with the area of interest shaded
and the mean, standard deviation, and boundary
value(s) clearly identified.
Step 2: Perform calculations—show your work! Do one of
the following: (i) Compute a z-score for each boundary
value and use Table A or technology to find the desired
area under the standard Normal curve; or (ii) use the
normalcdf command and label each of the inputs.
Step 3: Answer the question in context.
Going back to Warmup #5
An incoming freshman took her college’s placement
exams in French and math. In French, she scored 82
and in math, 86. The overall results on the French
exam had a mean of 72 and a standard deviation of 8,
while the mean math score was 76 with a standard
deviation 12. What percent of students did she score
higher than for both tests?
French: 89.44%
Math: 79.67%
The area on Table A corresponds to
the percentiles.
Using your Calculatornormalcdf (instead of Table A)
Scores on the SAT are approximately normal
N(505, 110) distribution. What proportion of
students fall:
Normalcdf
Below 100? Lower: -1E99
Upper: 100
µ: 505
σ: 110
P(x<100) ≈ 0
Above 330?
Normalcdf
Lower: 330
Upper: 1E99
µ: 505
σ: 110
P(x<100) ≈ 0.944
Between 200 and 600?
Normalcdf
Lower: 200
Upper: 600
µ: 505
σ: 110
P(x<100) ≈ 0.803
Working Backwards: Normal Distribution Calculations
Sometimes, we may want to find the observed
value that corresponds to a given percentile. There
are again three steps.
How To Find Values From Areas In Any Normal Distribution
Step 1: State the distribution and the values of interest.
Draw a Normal curve with the area of interest shaded
and the mean, standard deviation, and unknown
boundary value clearly identified.
Step 2: Perform calculations—show your work! Do one of
the following: (i) Use Table A or technology to find the
value of z with the indicated area under the standard
Normal curve, then “unstandardize” to transform back
to the original distribution; or (ii) Use the invNorm
command and label each of the inputs.
Step 3: Answer the question.
How high must a student score in order to place in the
top 10% of all students taking the SAT?
A percentile of 90% corresponds to a zscore around 1.29
𝑧=
𝑥−𝜇
𝜎
1.29 =
𝑥 − 505
110
𝑥 = 646.9
invNorm
Area= 0.9
µ= 505
σ=110
X=645.97
Remember to answer in context and MAKE SURE YOU
ROUND THE CORRECT DIRECTION!!!!!
Warm up
1. According to http://www.cdc.gov/growthcharts/, the
heights of three-year-old females are approximately Normally
distributed with a mean of 94.5 cm and a standard deviation
of 4 cm. What is the third quartile of this distribution?
#61 An airline flies the same route at the same time each
day. The flight time varies according to a Normal distribution
with unknown mean and standard deviation. On 15% of
days, the flight takes more than an hour. On 3% of days, the
flight lasts 75 minutes or more. Use this information to
determine the mean and standard deviation of the flight time
distribution.
Ways for Assessing Normality
1. Construct a stemplot or histogram
Determine if the graph has an approximate bell shaped curve
2. See if your data follows the Empirical Rule (68, 95,
99.7%)
3. Construct a normal probability plot (aka normal
quantile plot)
Assess the graph using the graphing calculator
Does the data look approximately linear?
Assessing Normality
The Normal distributions provide good models for some
distributions of real data.
Many statistical inference procedures are based on the
assumption that the population is approximately
Normally distributed.
A Normal probability plot provides a good assessment
of whether a data set follows a Normal distribution.
Interpreting Normal Probability Plots
If the points on a Normal probability plot lie close to a straight
line, the plot indicates that the data are Normal.
Systematic deviations from a straight line indicate a non-Normal
distribution.
Outliers appear as points that are far away from the overall
pattern of the plot.
Example of GRE scores
Normal probability plot of hurricane
losses
Normal Probability Plots
Remember the comparison of Barry Bonds to
Hank Aaron in Ch 1?
What was the shape of Aaron’s data:
13
20
24
26
27
29
30
32
34
34
38
39
39
40
40
44
44
44
44
45
47
• If the data distribution is close to a normal
distribution, the plotted points will lie close
to a straight line.
• Nonnormal data will show a nonlinear trend
• Outliers appear as points that are far away
from the overall pattern of the plot
• Try our Barry Bond’s example
from Ch 1.
• We proved before that 73 was
an outlier. See how it looks on
the Normal probability plot.
The measurements listed below describe the usable capacity (in cubic
feet) of a sample of 36 side-by-side refrigerators (Consumer Reports, May
2010). Are the data close to Normal?
12.9 13.7 14.1 14.2 14.5 14.5 14.6 14.7 15.1 15.2 15.3 15.3
15.3 15.3 15.5 15.6 15.6 15.8 16.0 16.0 16.2 16.2 16.3 16.4
16.5 16.6 16.6 16.6 16.8 17.0 17.0 17.2 17.4 17.4 17.9 18.4
Here is a histogram of these data. It
seems roughly symmetric and bellshaped.
Does this data follow the Empirical rule?
What does this tell you?
Create a Normal Probability Plot. What can you conclude?
Since we have proved out
data are very close to
Normal, we can go ahead
and use our Normal
calculations (normalcdf,
invNorm, z-scores, etc)