3.3 Density Curves and Normal Distributions
• Density Curves
• Measuring Center and Spread for Density Curves
• Normal Distributions
• The 68-95-99.7 Rule
• Standardizing Observations
• Using the Standard Normal Table
• Inverse Normal Calculations
• Normal Quantile Plots
Exploring Quantitative Data
We now have a kit of graphical and numerical tools for describing
distributions. We also have a strategy for exploring data on a single
quantitative variable. Now, we’ll add one more step to the strategy.
Exploring Quantitative Data
1. Always plot your data: Make a graph.
2. Look for the overall pattern (shape, center, and spread) and
for striking departures such as outliers.
3. Calculate a numerical summary to briefly describe center
and spread.
4. Sometimes, the overall pattern of a large number of
observations is so regular that we can describe it by a
smooth curve.
Density curves
A density curve is a mathematical model of a distribution.
The total area under the curve is, by definition, equal to 1 (100%).
The area under the curve over a range of values is the proportion of
all observations that fall in that range.
Area under the density curve ≈ relative frequency in the histogram
[Figure: a histogram of a sample with the smoothed density curve that describes the population in theory. Left: the shaded histogram bars have relative frequency 287/947 = 0.303. Right: the corresponding area under the density curve is 0.293.]
Density Curves
A density curve is a curve that:
• Is always on or above the horizontal axis
• Has an area of exactly 1 underneath it
A density curve describes the overall pattern of a
distribution. The area under the curve and above any
range of values on the horizontal axis is the
proportion of all observations that fall in that range.
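As a quick numerical check of the two defining properties, the Python sketch below uses a hypothetical triangular density on [0, 2] (chosen only for illustration, not taken from the text) and approximates areas with the trapezoid rule. The total area comes out as 1, and the area over a range is the proportion of observations the model expects in that range.

```python
# Hypothetical triangular density on [0, 2], chosen only for illustration:
# f(x) = x on [0, 1], f(x) = 2 - x on (1, 2], and 0 everywhere else.
def triangle_density(x: float) -> float:
    if 0.0 <= x <= 1.0:
        return x
    if 1.0 < x <= 2.0:
        return 2.0 - x
    return 0.0


def area_under(f, a: float, b: float, n: int = 10_000) -> float:
    """Approximate the area under f between a and b with the trapezoid rule."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h


# Property 1: the curve is never below the horizontal axis (checked on a grid).
never_negative = all(triangle_density(k / 100) >= 0 for k in range(-100, 301))

# Property 2: the total area is 1, so areas over ranges act as proportions.
total_area = area_under(triangle_density, 0.0, 2.0)
share_below_half = area_under(triangle_density, 0.0, 0.5)  # proportion with 0 <= x <= 0.5

print(f"never negative:   {never_negative}")
print(f"total area:       {total_area:.4f}")
print(f"area on [0, 0.5]: {share_below_half:.4f}")
```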
Density Curves
Density curves come in many shapes. Some are well known
mathematically and others aren't, but they all lie on or above the
horizontal axis and have total area = 1.
Our measures of center and spread apply to density
curves as well as to actual sets of observations.
Distinguishing the Median and Mean of a Density Curve
• The median of a density curve is the equal-areas point: the point that
divides the area under the curve in half.
• The mean of a density curve is the balance point, at which the curve
would balance if made of solid material.
• The median and the mean are the same for a symmetric density curve.
They both lie at the center of the curve. The mean of a skewed curve is
pulled away from the median in the direction of the long tail.
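To see the mean being pulled toward the long tail, the sketch below finds both measures numerically for one right-skewed density, the exponential density f(x) = e^(−x) for x ≥ 0 (an assumed example, not one from the text). The median is located as the equal-areas point by bisection, and the mean as the balance point, the integral of x·f(x); the exact values are ln 2 ≈ 0.693 and 1.

```python
import math


# Assumed right-skewed density for illustration: f(x) = e^(-x) for x >= 0.
def expo_density(x: float) -> float:
    return math.exp(-x) if x >= 0 else 0.0


def area_under(f, a: float, b: float, n: int = 2_000) -> float:
    """Trapezoid-rule approximation of the area under f from a to b."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h


# Median: the equal-areas point, found by bisection on the cumulative area.
lo, hi = 0.0, 50.0
for _ in range(60):
    mid = (lo + hi) / 2
    if area_under(expo_density, 0.0, mid) < 0.5:
        lo = mid
    else:
        hi = mid
median = (lo + hi) / 2

# Mean: the balance point, the area under x * f(x). The upper limit 50
# stands in for infinity; the tail beyond it is negligible.
mean = area_under(lambda x: x * expo_density(x), 0.0, 50.0, n=200_000)

print(f"median = {median:.4f}  (exact: ln 2 = {math.log(2):.4f})")
print(f"mean   = {mean:.4f}  (exact: 1)")
```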
Density Curves
• The mean and standard deviation computed from
actual observations (data) are denoted by x̄ and s,
respectively, and are called the sample mean and
sample standard deviation.
• The mean and standard deviation of the idealized
distribution represented by the density curve are
denoted by µ (“mu”) and σ (“sigma”), respectively,
and are sometimes called the population mean and
population standard deviation.
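The distinction shows up in a short simulation: draw a large sample from a population whose µ and σ are known, then compute x̄ and s from the data. This Python sketch assumes a population that is N(64.5, 2.5) and a fixed random seed, both chosen for illustration. The sample statistics land close to, but not exactly on, the population parameters.

```python
import random
import statistics

MU, SIGMA = 64.5, 2.5     # population parameters µ and σ, assumed for illustration
rng = random.Random(42)   # fixed seed so the run is reproducible

sample = [rng.gauss(MU, SIGMA) for _ in range(5_000)]

x_bar = statistics.mean(sample)   # sample mean, x̄
s = statistics.stdev(sample)      # sample standard deviation (divides by n - 1)

print(f"population: mu = {MU}, sigma = {SIGMA}")
print(f"sample:     x-bar = {x_bar:.3f}, s = {s:.3f}")
```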
Normal Distributions
One particularly important class of density curves is the class of Normal curves,
which describe Normal distributions.
• All Normal curves are symmetric, single-peaked, and bell-shaped.
• A specific Normal curve is described by giving its mean µ and
standard deviation σ.
Normal Distributions
A Normal distribution is described by a Normal density curve.
Any particular Normal distribution is completely specified by two
numbers: its mean µ and standard deviation σ.
• The mean of a Normal distribution is the center of the
symmetric Normal curve.
• The standard deviation is the distance from the center to the
change-of-curvature points on either side, the points of
inflection of the density.
• We abbreviate the Normal distribution with mean µ and
standard deviation σ as N(µ,σ).
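Why σ marks the change-of-curvature points can be checked by differentiating the Normal density twice. Writing u = (x − µ)/σ:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-u^{2}/2}, \qquad u = \frac{x-\mu}{\sigma}

f'(x) = -\frac{u}{\sigma}\, f(x)

f''(x) = \frac{f(x)}{\sigma^{2}}\left(u^{2} - 1\right)
```

Because f(x) > 0 everywhere, f''(x) changes sign exactly where u² = 1, that is, at x = µ ± σ. The curve is concave down between these two points and concave up outside them.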
Normal distributions
Normal (or Gaussian) distributions are a family of symmetrical, bell-shaped density curves defined by a mean µ (mu) and a standard deviation σ (sigma): N(µ,σ).

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$

e = 2.71828…, the base of the natural logarithm
π = pi = 3.14159…
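The formula translates directly into code. This Python sketch (an illustration, assuming Python 3.8+ for `statistics.NormalDist`) implements f(x), cross-checks it against `NormalDist.pdf`, and confirms numerically that the area under one Normal curve is 1. The integration range µ ± 10σ is an assumed stand-in for the whole real line.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density: exp(-0.5 * ((x - mu) / sigma)**2) / (sigma * sqrt(2*pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Cross-check the hand-written formula against the standard library.
matches = all(
    math.isclose(normal_pdf(x, 0.0, 1.0), NormalDist(0.0, 1.0).pdf(x))
    for x in (-2.0, -0.5, 0.0, 1.0, 3.0)
)

# Area under N(15, 3) by the midpoint rule over mu +/- 10 sigma
# (the tails beyond that range contribute a negligible amount).
mu, sigma, n = 15.0, 3.0, 60_000
a, b = mu - 10 * sigma, mu + 10 * sigma
h = (b - a) / n
total_area = h * sum(normal_pdf(a + (i + 0.5) * h, mu, sigma) for i in range(n))

print(f"formula matches NormalDist.pdf: {matches}")
print(f"area under N(15, 3): {total_area:.6f}")
```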
A family of density curves
[Figure: first panel, Normal curves with the same mean (µ = 15) but different standard deviations (σ = 2, 4, and 6); second panel, Normal curves with different means (µ = 10, 15, and 20) but the same standard deviation (σ = 3). Both horizontal axes run from 0 to 30.]
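The two families differ in a way that is easy to quantify: changing µ only slides the curve along the axis, while a larger σ spreads it out and lowers its peak, whose height is 1/(σ√(2π)). This Python sketch, assuming `statistics.NormalDist` (Python 3.8+), computes the peak heights for the curves described above.

```python
import math
from statistics import NormalDist

# Same mean (15), different standard deviations: larger sigma -> flatter curve.
peaks_by_sigma = {s: NormalDist(15, s).pdf(15) for s in (2, 4, 6)}

# Different means, same standard deviation (3): identical shape, shifted center.
peaks_by_mu = {m: NormalDist(m, 3).pdf(m) for m in (10, 15, 20)}

for s, height in peaks_by_sigma.items():
    formula = 1 / (s * math.sqrt(2 * math.pi))
    print(f"N(15, {s}): peak height {height:.4f} (formula: {formula:.4f})")
for m, height in peaks_by_mu.items():
    print(f"N({m}, 3): peak at x = {m}, height {height:.4f}")
```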
The 68-95-99.7 Rule
In the Normal distribution with mean µ and standard deviation σ:
• Approximately 68% of the observations fall within σ of µ.
• Approximately 95% of the observations fall within 2σ of µ.
• Approximately 99.7% of the observations fall within 3σ of µ.
Here’s a N(64.5”, 2.5”)
distribution of heights of
college-aged females.
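The percentages in the rule are rounded. This Python sketch, assuming `statistics.NormalDist` (Python 3.8+), computes the exact areas within 1, 2, and 3 standard deviations and applies the first one to the N(64.5, 2.5) height model, where µ ± σ runs from 62 to 67 inches.

```python
from statistics import NormalDist

standard = NormalDist()  # N(0, 1)

# Exact area within k standard deviations of the mean, for k = 1, 2, 3.
coverage = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}
for k, area in coverage.items():
    print(f"within {k} sd of the mean: {area:.2%}")

# Heights of college-aged females, N(64.5, 2.5): mu +/- sigma is 62 to 67 inches.
heights = NormalDist(64.5, 2.5)
within_one_sd = heights.cdf(67.0) - heights.cdf(62.0)
print(f"between 62 and 67 inches: {within_one_sd:.2%}")
```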
The 68-95-99.7 Rule
The distribution of Iowa Test of Basic Skills (ITBS) vocabulary scores for
7th-grade students in Gary, Indiana, is close to Normal. Suppose the
distribution is N(6.84, 1.55).
• Sketch the Normal density curve for this distribution.
• What percent of ITBS vocabulary scores are less than 3.74?
• What percent of the scores are between 5.29 and 9.94?
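Each cutoff sits a whole number of standard deviations from the mean: 3.74 = 6.84 − 2(1.55), 5.29 = 6.84 − 1.55, and 9.94 = 6.84 + 2(1.55). The 68-95-99.7 rule then gives about (100% − 95%)/2 = 2.5% below 3.74, and about 34% + 47.5% = 81.5% between 5.29 and 9.94. The Python sketch below, assuming `statistics.NormalDist` (Python 3.8+), checks those rule-of-thumb answers against exact Normal areas.

```python
from statistics import NormalDist

itbs = NormalDist(mu=6.84, sigma=1.55)

# How many standard deviations each cutoff lies from the mean.
z_values = {x: (x - itbs.mean) / itbs.stdev for x in (3.74, 5.29, 9.94)}
for x, z in z_values.items():
    print(f"score {x}: z = {z:+.2f}")

# Answers from the 68-95-99.7 rule, in percent.
rule_below = (100 - 95) / 2        # below mu - 2 sigma
rule_between = 68 / 2 + 95 / 2     # from mu - sigma up to mu + 2 sigma

# Exact Normal areas, in percent, for comparison.
exact_below = 100 * itbs.cdf(3.74)
exact_between = 100 * (itbs.cdf(9.94) - itbs.cdf(5.29))

print(f"below 3.74:   rule {rule_below}%, exact {exact_below:.2f}%")
print(f"5.29 to 9.94: rule {rule_between}%, exact {exact_between:.2f}%")
```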
Standardizing Observations
If a variable x has a distribution with mean µ and standard
deviation σ, then the standardized value of x, or its z-score, is

$$z = \frac{x - \mu}{\sigma}$$

Note: z is the number of standard deviations that x lies away from µ
(positive above the mean, negative below it).
All Normal distributions are the same if we measure in units of size σ
from the mean µ as center.
The standard Normal distribution is the
Normal distribution with mean 0 and
standard deviation 1. That is, the standard
Normal distribution is N(0,1). It is
represented by Z, and we write Z ~ N(0,1).
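Standardizing is a one-line computation. The Python sketch below defines a hypothetical helper `z_score`, applies it to two values from this section (a 67-inch height under N(64.5, 2.5) and an ITBS score of 3.74 under N(6.84, 1.55)), and then standardizes a simulated N(64.5, 2.5) sample (the seed is an arbitrary assumption) to show that the z-scores have mean near 0 and standard deviation near 1.

```python
import random
import statistics


def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) mu."""
    return (x - mu) / sigma


height_z = z_score(67.0, 64.5, 2.5)   # a 67-inch height under N(64.5, 2.5)
itbs_z = z_score(3.74, 6.84, 1.55)    # an ITBS score of 3.74 under N(6.84, 1.55)
print(f"height z = {height_z:.2f}, ITBS z = {itbs_z:.2f}")

# Standardizing a whole N(64.5, 2.5) sample gives values that look like N(0, 1).
rng = random.Random(1)
zs = [z_score(rng.gauss(64.5, 2.5), 64.5, 2.5) for _ in range(10_000)]
print(f"mean of z = {statistics.mean(zs):.3f}, sd of z = {statistics.stdev(zs):.3f}")
```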
The Standard Normal Table
Because all Normal distributions are the same when we standardize, we
can find areas under any Normal curve from a single table.
Table A is a table of areas under the standard Normal curve. The
table entry for each value z is the area under the curve to the left
of z.
The Standard Normal Table
Suppose we want to find the proportion of observations from the
standard Normal distribution that are less than 0.81.
We can use Table A:
P(z < 0.81) = 0.7910
Table A excerpt (area to the left of z):

z      0.00     0.01     0.02
0.7    0.7580   0.7611   0.7642
0.8    0.7881   0.7910   0.7939
0.9    0.8159   0.8186   0.8212
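Table A entries are left-tail areas of N(0, 1) rounded to four decimal places, so they can be reproduced in software. This Python sketch, assuming `statistics.NormalDist` (Python 3.8+) and a hypothetical helper `table_a`, recomputes P(Z < 0.81) and the three-row excerpt of the table.

```python
from statistics import NormalDist

STANDARD = NormalDist()  # N(0, 1)


def table_a(z: float) -> float:
    """Area under the standard Normal curve to the left of z, rounded to 4 places."""
    return round(STANDARD.cdf(z), 4)


print(f"P(Z < 0.81) = {table_a(0.81):.4f}")

# Rows give z to one decimal place; columns add the second decimal place.
rows = {}
for row in (0.7, 0.8, 0.9):
    rows[row] = [table_a(round(row + col, 2)) for col in (0.00, 0.01, 0.02)]
    print(f"{row:.1f}  " + "  ".join(f"{p:.4f}" for p in rows[row]))
```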
Tips on using Table A
To calculate the area between two z-values, first use Table A to get the
area under N(0,1) to the left of each z-value.
Then subtract the smaller area from the larger area.
A common mistake made by students is to subtract the two z-values;
it is the areas that are subtracted, not the z-scores!
area between z1 and z2 (with z1 < z2) = area left of z2 – area left of z1
area right of z = 1 – area left of z
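The two rules above can be written as small helper functions. In this Python sketch (hypothetical helper names, assuming `statistics.NormalDist` from Python 3.8+), `area_between` sorts its arguments so the smaller left-tail area is always subtracted from the larger, and `area_right` applies the complement rule.

```python
from statistics import NormalDist

STANDARD = NormalDist()  # N(0, 1)


def area_left(z: float) -> float:
    """Table A lookup: area under N(0, 1) to the left of z."""
    return STANDARD.cdf(z)


def area_right(z: float) -> float:
    """Total area is 1, so the right tail is 1 minus the left tail."""
    return 1 - area_left(z)


def area_between(z1: float, z2: float) -> float:
    """Subtract the areas, never the z-values themselves."""
    lo, hi = sorted((z1, z2))
    return area_left(hi) - area_left(lo)


print(f"area right of 0.81:    {area_right(0.81):.4f}")
print(f"area between -1 and 2: {area_between(2, -1):.4f}")
```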
Normal Calculations
How to Solve Problems Involving Normal Distributions
1. Express the problem in terms of the observed variable X.
2. Draw a picture of the distribution of X and shade the area of
interest under the curve.
3. Perform calculations.
• Standardize X to restate the problem in terms of a standard
Normal variable Z.
• Use Table A and the fact that the total area under the curve is
1 to find the required area under the standard Normal curve.
4. Write your conclusion in the context of the problem.
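The four steps can be traced in a short Python sketch. The question is hypothetical, chosen only to exercise the steps: what proportion of college-aged females, with heights modeled as N(64.5, 2.5), are taller than 68 inches? It assumes `statistics.NormalDist` (Python 3.8+) in place of a paper copy of Table A, rounding to four places the way the table does.

```python
from statistics import NormalDist

# Step 1: state the problem in terms of X. Hypothetical question (not from the
# text): what proportion of heights X ~ N(64.5, 2.5) exceed 68 inches?
mu, sigma, cutoff = 64.5, 2.5, 68.0

# Step 2, the sketch, happens on paper: shade the area to the right of 68.

# Step 3a: standardize X to restate the problem in terms of Z.
z = (cutoff - mu) / sigma  # z = 1.4

# Step 3b: look up the left-tail area (rounded like Table A), then use the
# fact that the total area under the curve is 1.
left = round(NormalDist().cdf(z), 4)
right = 1 - left

# Step 4: state the conclusion in context.
print(f"z = {z:.2f}; about {right:.1%} of college-aged females are taller than 68 inches")
```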
Inverse Normal Calculations
According to the Health and Nutrition Examination
Study of 1976–1980, the heights X (in inches) of
adult men aged 18–24 are N(70, 2.8).
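How tall must a man be to be in the lower 10% for men
aged 18–24?
[Figure: the N(70, 2.8) curve with an area of 0.10 shaded in the lower tail, to the left of an unknown height "?" below the mean of 70.]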
Inverse Normal Calculations
Look up the probability closest to 0.10 in the table.
Find the corresponding z, the standardized score.
The value you seek is that many standard deviations from the
mean.
z       0.07     0.08     0.09
–1.3    0.0853   0.0838   0.0823
–1.2    0.1020   0.1003   0.0985
–1.1    0.1210   0.1190   0.1170
The entry closest to 0.10 is 0.1003, in row –1.2 and column 0.08, so Z = –1.28.
Normal Calculations
How tall must a man be to be in the lower 10% for men aged 18–24?
[Figure: the N(70, 2.8) curve with the lower 0.10 area shaded; the unknown cutoff "?" lies below the mean of 70, at Z = –1.28.]
We need to “unstandardize” the z-score to find the observed value (x):

$$z = \frac{x - \mu}{\sigma} \quad\Longrightarrow\quad x = \mu + z\sigma$$

x = 70 + z(2.8)
= 70 + [(–1.28) × (2.8)]
= 70 + (–3.58) = 66.42
A man would have to be approximately 66.42 inches tall or less to place
in the lower 10% of all men in the population.
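Software can skip the table lookup. In this Python sketch, assuming `statistics.NormalDist` (Python 3.8+), `inv_cdf(0.10)` returns the height with area 0.10 to its left directly. It gives about 66.41 inches, slightly below the table answer of 66.42, because Table A forces z to two decimals (the exact z is about −1.2816).

```python
from statistics import NormalDist

heights = NormalDist(mu=70, sigma=2.8)  # heights of men aged 18-24

# Table route: z = -1.28 is the closest entry to 0.10; unstandardize x = mu + z * sigma.
x_table = heights.mean + (-1.28) * heights.stdev

# Software route: inv_cdf returns the value with area 0.10 to its left.
z_exact = NormalDist().inv_cdf(0.10)
x_exact = heights.inv_cdf(0.10)

print(f"table:  z = -1.28,  x = {x_table:.2f} inches")
print(f"exact:  z = {z_exact:.4f}, x = {x_exact:.2f} inches")
```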
Normal Quantile Plots
One way to assess if a distribution is indeed approximately Normal is to
plot the data on a Normal quantile plot.
The data points are ordered from smallest to largest, and their
percentile ranks are converted to z-scores with Table A. These z-scores
are then plotted against the data to create a Normal quantile plot.
• If the distribution is indeed Normal, the points will lie close to a straight
line, indicating a good match between the data and a Normal
distribution; in JMP, the points fall within the dotted lines.
• Systematic deviations from a straight line indicate a non-Normal
distribution. Outliers appear as points that are far away from the
overall pattern of the plot; some points fall outside the dotted
lines in JMP.
Normal Quantile Plots
Good fit to a straight line: the distribution
of rainwater pH values is close to
Normal. The intercept of the line ≈ mean
of the data, and the slope of the line ≈
s.d. of the data.
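The construction can be sketched in a few lines of Python. This is an illustration under stated assumptions, not JMP's algorithm: it turns each sorted observation's rank into a percentile with the plotting position (i − 0.5)/n (one common convention; textbooks and software differ), converts those percentiles to z-scores with `statistics.NormalDist.inv_cdf`, and fits a least-squares line of data on z. Both data sets are simulated and hypothetical. For the roughly Normal sample, the correlation is very close to 1, the intercept essentially equals the sample mean, and the slope is close to σ; the right-skewed sample gives a noticeably lower correlation.

```python
import math
import random
from statistics import NormalDist, fmean


def normal_quantile_points(data):
    """Pair sorted observations with standard Normal quantiles (z-scores).

    Uses the plotting position (i - 0.5) / n, an assumed convention.
    """
    xs = sorted(data)
    n = len(xs)
    zs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return zs, xs


def fit_line(zs, xs):
    """Least-squares line x = intercept + slope * z, plus the correlation r."""
    z_bar, x_bar = fmean(zs), fmean(xs)
    szz = sum((z - z_bar) ** 2 for z in zs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    szx = sum((z - z_bar) * (x - x_bar) for z, x in zip(zs, xs))
    slope = szx / szz
    return x_bar - slope * z_bar, slope, szx / math.sqrt(szz * sxx)


rng = random.Random(7)
samples = {
    "roughly Normal": [rng.gauss(10.0, 2.0) for _ in range(1_000)],  # hypothetical data
    "right-skewed": [rng.expovariate(1.0) for _ in range(1_000)],    # hypothetical data
}

results = {}
for name, data in samples.items():
    intercept, slope, r = fit_line(*normal_quantile_points(data))
    results[name] = (intercept, slope, r)
    print(f"{name}: intercept {intercept:.3f} (mean {fmean(data):.3f}), "
          f"slope {slope:.3f}, r = {r:.4f}")
```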
Curved pattern: the data are not
Normally distributed. Instead, they show
a right skew: a few individuals have
particularly long survival times.
Normal quantile plots are complex to make by hand, but they are easy to
make in JMP: under the red triangle, choose Normal Quantile Plot. Notice
how JMP's version differs from the plots above…
HW:
• Finish reading section 3.3.
• Work through all the examples!
• I've put up some videos on computing Normal
probabilities as an online assignment, due
9/19 & 9/22 at 9:00am.
• Work on #3.82-3.93, 3.95, 3.100-3.102,
3.104, 3.106-3.109, 3.110-3.128. Do as many
of these problems as you can to really
understand what's going on here!
• Quiz in class on Monday 9/22
• Test #1 on October 1, covering Chapters 1-4