1.3 Density Curves and Normal Distributions
Ulrich Hoensch
Tuesday, September 11, 2012
Fitting Density Curves to Histograms
Advanced statistical software (NOT Microsoft Excel) can produce
“smoothed versions” of histograms.
Example 1: The following are histograms and corresponding
density curves for data representing (a) the acidity of rainwater
and (b) the survival times of guinea pigs.
When fitting a density curve to a histogram, we want the following
to hold for any interval on the horizontal axis that spans the width
of a collection of adjacent rectangles:
area of rectangles ≈ area under density curve.
This requirement follows from the more general fact that, for both
histograms (scaled so that the total area of the bars is 1) and
density curves,
area = proportion.
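As a small illustration of area = proportion for histograms, the Python sketch below scales a histogram so that each bar's area equals the proportion of observations in its bin. The ten observations and the bin choice are made up for illustration; they are not the rainwater or guinea pig data.

```python
# Hypothetical, made-up observations, used only to illustrate "area = proportion".
data = [4.1, 4.7, 5.0, 5.2, 5.6, 5.9, 6.3, 6.8, 7.4, 8.9]
start, width, n_bins = 4.0, 1.0, 5   # bins [4, 5), [5, 6), ..., [8, 9)

def density_histogram(values, start, width, n_bins):
    """Return (left_edge, height) pairs, with heights scaled so that each
    bar's area (height * width) equals the proportion of values in its bin."""
    n = len(values)
    bars = []
    for i in range(n_bins):
        left = start + i * width
        count = sum(1 for v in values if left <= v < left + width)
        bars.append((left, count / (n * width)))
    return bars

bars = density_histogram(data, start, width, n_bins)

# Total area of the bars: every observation lies in some bin, so this is 1.
total_area = sum(height * width for _, height in bars)

# Area of the bars spanning [4, 6) versus the proportion of observations in [4, 6).
area_4_to_6 = sum(height * width for left, height in bars if left < 6.0)
proportion_4_to_6 = sum(1 for v in data if 4.0 <= v < 6.0) / len(data)
print(total_area, area_4_to_6, proportion_4_to_6)
```

A fitted density curve is then judged by how closely its area over such intervals matches these bar areas.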
Example 2: Scores of Seventh-Graders
The following figure shows the histogram and a fitted density curve
of the test scores of 947 seventh-grade students on the Iowa Test
of Basic Skills.
(a) The bars shaded in blue represent the actual proportion of
students who scored 6.0 or less (this proportion is
0.303 = 30.3%).
(b) The area under the density curve shaded in blue represents
the proportion of students who scored 6.0 or less, as predicted
by the density curve (this proportion is 0.293 = 29.3%).
Definition of Density Curve
A density curve is a curve that
• is always on or above the horizontal axis, and
• has area exactly 1 underneath it.
In addition, for any two values a and b on the horizontal axis,
area under the density curve between a and b ≈
proportion of observations that fall between a and b.
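The two defining properties can be checked numerically. The sketch below assumes a hypothetical density curve f(x) = 2x on [0, 1] (not one from these notes) and approximates areas with the midpoint rule.

```python
def f(x):
    """Hypothetical density curve: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def area_under(g, a, b, steps=10_000):
    """Approximate the area under g between a and b with the midpoint rule."""
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

# Property 1: the curve is never below the horizontal axis (checked on a grid).
grid = [i / 1000 for i in range(-500, 1501)]
assert all(f(x) >= 0 for x in grid)

# Property 2: the total area is 1 (f is zero outside [0, 1]).
total = area_under(f, 0, 1)

# Area between a = 0.5 and b = 1: the predicted proportion of observations there.
upper_half = area_under(f, 0.5, 1)   # exact value: 1 - 0.5**2 = 0.75
print(round(total, 6), round(upper_half, 6))
```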
Median of a Density Curve
The median of a density curve is the point M on the horizontal
axis so that the area below the density curve and to the left of M
is 50% (and consequently the area to the right is also 50%).
[Figure: the median divides the area under the density curve into two halves of 50% each.]
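Because the area to the left of a point grows as the point moves right, the median can be located by bisection. This Python sketch assumes a hypothetical right-skewed density, the exponential curve f(x) = e^(−x) for x ≥ 0, whose left-area function has the closed form 1 − e^(−x).

```python
import math

def F(x):
    """Area to the left of x under a hypothetical right-skewed density curve,
    the exponential density f(x) = e^(-x) for x >= 0 (so F(x) = 1 - e^(-x))."""
    return 1 - math.exp(-x) if x > 0 else 0.0

def median_of(F, lo, hi, tol=1e-10):
    """Bisection: narrow [lo, hi] until it pins down the M with F(M) = 0.5.
    Assumes F is increasing with F(lo) < 0.5 < F(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) < 0.5:
            lo = mid    # less than 50% of the area lies left of mid: move right
        else:
            hi = mid    # at least 50% lies left of mid: move left
    return (lo + hi) / 2

M = median_of(F, 0.0, 10.0)
print(M, math.log(2))   # for this density the exact median is ln 2
```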
Percentiles of a Density Curve
The pth percentile of a density curve is the point P on the
horizontal axis such that p percent of the area under the density
curve lies to the left of P. The quartiles Q1 and Q3 are the 25th
and 75th percentiles, so the interquartile range IQR = Q3 − Q1 is
the width of the interval containing the middle 50% of the area.
[Figure: the quartiles Q1 and Q3 enclose the middle 50% of the area under the density curve.]
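For a density whose left-area function can be inverted by hand, percentiles follow directly from the definition. This sketch again assumes the hypothetical exponential density f(x) = e^(−x), x ≥ 0; for most density curves one would solve for P numerically instead.

```python
import math

def percentile(p):
    """pth percentile (0 < p < 100) of the hypothetical exponential density
    f(x) = e^(-x), x >= 0. The area left of P is 1 - e^(-P); setting it
    equal to p/100 and solving gives P = -ln(1 - p/100)."""
    return -math.log(1 - p / 100)

q1, median, q3 = percentile(25), percentile(50), percentile(75)
iqr = q3 - q1   # width of the interval holding the middle 50% of the area
print(f"Q1={q1:.4f}  M={median:.4f}  Q3={q3:.4f}  IQR={iqr:.4f}")
```

The upper quartile lies farther from the median than the lower quartile (about 0.69 versus 0.41), as expected for a right-skewed curve.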
Mean of a Density Curve
The mean of a density curve is the “balance point” of the curve: if
the area under the curve were made of a solid material, the mean
would be the position of the fulcrum on which it balances.
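In calculus terms, the balance point is the integral of x·f(x) over the horizontal axis. The sketch below assumes a hypothetical exponential density f(x) = e^(−x), x ≥ 0 (exact mean 1), approximates that integral with the midpoint rule, and then checks that the weighted distances (x − mean)·f(x) on the two sides of the mean cancel.

```python
import math

def f(x):
    """Hypothetical right-skewed density curve: f(x) = e^(-x) for x >= 0."""
    return math.exp(-x) if x >= 0 else 0.0

def integrate(g, a, b, steps=100_000):
    """Midpoint-rule approximation of the area under g between a and b."""
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

# Integrate over [0, 50]: the area beyond x = 50 is e^(-50), which is negligible.
mean = integrate(lambda x: x * f(x), 0, 50)

# Balance check: the weighted distances (x - mean) * f(x) to the left and to the
# right of the mean cancel, so their total is (approximately) zero.
torque = integrate(lambda x: (x - mean) * f(x), 0, 50)
print(round(mean, 6), round(torque, 9))
```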
Mean and Median of a Density Curve
For a symmetric density curve, the mean and the median are equal.
For a skewed density curve they generally differ:
• for right-skewed distributions, the mean is larger than the
median;
• for left-skewed distributions, the mean is smaller than the
median.
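The same pattern shows up in data. The sketch below compares mean and median with Python's standard `statistics` module for a small made-up right-skewed sample and its mirror image, which is left-skewed.

```python
from statistics import mean, median

# Hypothetical right-skewed sample: mostly small values, a long tail of large ones.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10, 20]
# Negating every value mirrors the sample, giving a left-skewed one.
left_skewed = [-v for v in right_skewed]

print(mean(right_skewed), median(right_skewed))   # tail pulls the mean up
print(mean(left_skewed), median(left_skewed))     # tail pulls the mean down
```

The median ignores how far the tail values lie, while the mean does not, which is why the long tail drags the mean toward it.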
Normal Distributions
Normal curves are the density curves of normal distributions.
They have the following general properties:
• They are symmetric, unimodal (have only one peak), and
bell-shaped.
• The mean is denoted by the symbol µ (lowercase Greek letter
“mu”), and the standard deviation is denoted by the symbol
σ (lowercase Greek letter “sigma”).
• There are two inflection points, one on either side of the mean,
where the curve changes from bending downward to bending upward
(or vice versa).
• The standard deviation σ is the horizontal distance from the
mean µ to either inflection point.
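The inflection-point property can be checked numerically with Python's `statistics.NormalDist`, estimating the curve's second derivative by a central finite difference (a helper written here for illustration). The parameters µ = 64.5 and σ = 2.5 are arbitrary choices; any µ and σ > 0 behave the same way.

```python
from statistics import NormalDist

# Arbitrary illustrative parameters; any mu and sigma > 0 behave the same way.
mu, sigma = 64.5, 2.5
curve = NormalDist(mu, sigma)

def second_derivative(g, x, h=1e-3):
    """Central finite-difference estimate of g''(x): negative where the curve
    bends downward, positive where it bends upward."""
    return (g(x + h) - 2 * g(x) + g(x - h)) / h**2

# On each side, compare a point 0.9 sigma from the mean with one 1.1 sigma away:
# the sign of g'' flips between them, so an inflection point lies in between.
for side in (+1, -1):
    near = second_derivative(curve.pdf, mu + side * 0.9 * sigma)
    far = second_derivative(curve.pdf, mu + side * 1.1 * sigma)
    print(f"side {side:+d}: g''(0.9 sigma) = {near:.5f}, g''(1.1 sigma) = {far:.5f}")
```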
Two normal curves are shown here.
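The 68-95-99.7 Rule
In any normal distribution with mean µ and standard deviation σ,
approximately 68% of the observations fall within one standard
deviation of the mean (between µ − σ and µ + σ), approximately 95%
fall within two standard deviations, and approximately 99.7% fall
within three standard deviations.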
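The percentages in the 68-95-99.7 rule are rounded. A short Python sketch using the standard library's `statistics.NormalDist` computes the areas within one, two, and three standard deviations of the mean for the standard normal curve (µ = 0, σ = 1); standardizing gives the same areas for every normal curve.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mu = 0, sigma = 1

# Area under the normal curve within k standard deviations of the mean.
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}
for k, area in within.items():
    print(f"within {k} sigma: {area:.4f}")
```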
Example 3: Height of Young Women
The height of young women aged 18 to 24 is approximately
normally distributed with mean µ = 64.5 inches and standard
deviation σ = 2.5 inches.
We write X ∼ N(µ, σ) if a variable X has a normal distribution
with mean µ and standard deviation σ. Thus, for the height X of
young women, X ∼ N(64.5, 2.5).
Question. What is the percentage of women that are between 62
and 69.5 inches tall?
Answer.
• The value 62 = 64.5 − 2.5 is one standard deviation below the
mean, so by symmetry the area between 62 and the mean is
68%/2 = 34%.
• The value 69.5 = 64.5 + 2 × 2.5 is two standard deviations above
the mean, so the area between the mean and 69.5 is
95%/2 = 47.5%.
• Adding the two areas, the percentage of women who are between 62
and 69.5 inches tall is 34% + 47.5% = 81.5%.
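The 68-95-99.7 rule gives an approximate answer. As a cross-check, the sketch below uses Python's `statistics.NormalDist` (software the notes themselves do not use here) to compute the area under the N(64.5, 2.5) curve between 62 and 69.5.

```python
from statistics import NormalDist

heights = NormalDist(mu=64.5, sigma=2.5)   # X ~ N(64.5, 2.5)

# Approximate answer from the 68-95-99.7 rule: half of 68% plus half of 95%.
rule_estimate = 0.68 / 2 + 0.95 / 2

# Area under the N(64.5, 2.5) curve between 62 and 69.5 inches.
exact = heights.cdf(69.5) - heights.cdf(62)
print(f"rule: {rule_estimate:.3f}  normal curve: {exact:.4f}")
```

The normal-curve area is about 81.9%, slightly above 81.5%, because the rule rounds the underlying percentages.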
Example 4: LSAT Scores
The scores X on the LSAT are normally distributed with mean
µ = 150 and standard deviation σ = 15; that is, X ∼ N(150, 15).
[Figure: the N(150, 15) density curve, with the horizontal axis marked from 100 to 200.]
Find each of the following, using a TI-83/TI-83 Plus/TI-84
Plus calculator.
Q. Find the percentage of people who score between 120 and 160
on the LSAT.
A.
1. Press [2ND] [VARS] (DISTR) and select 2:normalcdf(.
2. Type normalcdf(120,160,150,15).
3. Press ENTER. The proportion is 0.72475 . . ., so about 72.5% of
people score between 120 and 160 on the LSAT.
The shaded area is 72.5%.
Note: The general syntax for finding the proportion between a and
b is normalcdf(a,b,µ,σ).
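Without a TI calculator, the same computation can be sketched in Python. The function `normalcdf` below is a hypothetical helper that mimics the calculator's argument order using the standard library's `statistics.NormalDist`; it is not part of any Python library.

```python
from statistics import NormalDist

def normalcdf(a, b, mu, sigma):
    """Hypothetical helper mirroring the argument order of the TI call
    normalcdf(a,b,mu,sigma): area under the N(mu, sigma) curve between a and b."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(b) - dist.cdf(a)

p = normalcdf(120, 160, 150, 15)
print(f"{p:.6f}")
```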
Q. Suppose you scored 172 on the LSAT. Find the percentage of
people who scored at or below 172.
A.
1. Press [2ND] [VARS] (DISTR) and select 2:normalcdf(.
2. Type normalcdf(-1000000,172,150,15). (The lower bound −1000000
stands in for minus infinity; any other very negative number works
as well.)
3. Press ENTER. The proportion is 0.92876 . . ., so about 92.9% of
people score at or below 172.
[Figure: the area to the left of 172, which is 92.9% of the total, is shaded.]
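In Python, `statistics.NormalDist.cdf` already returns the entire area to the left of a value, so no stand-in for minus infinity is needed. The sketch below also evaluates the calculator-style expression with −1000000 as the lower bound to show that the two agree.

```python
from statistics import NormalDist

lsat = NormalDist(mu=150, sigma=15)   # X ~ N(150, 15)

# cdf(x) is the whole area to the left of x, i.e. the proportion at or below x.
at_or_below_172 = lsat.cdf(172)

# Calculator-style version with a very negative lower bound; the extra term is
# effectively 0, so the result is the same.
calculator_style = lsat.cdf(172) - lsat.cdf(-1_000_000)
print(f"{at_or_below_172:.5f} {calculator_style:.5f}")
```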
Q. What is the cutoff score for the top 10% (i.e., the 90th
percentile)?
A.
1. Press [2ND] [VARS] (DISTR) and select 3:invNorm(.
2. Type invNorm(0.9,150,15).
3. Press ENTER. The 90th percentile is 169.22 . . ., so 90% of people
score below about 169 (and 10% score above it).
[Figure: 90% of the area lies to the left of the 90th percentile.]
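The Python counterpart of invNorm is `statistics.NormalDist.inv_cdf`, which takes a proportion and returns the cutoff with that proportion of the area to its left. A minimal sketch for the top-10% cutoff:

```python
from statistics import NormalDist

lsat = NormalDist(mu=150, sigma=15)   # X ~ N(150, 15)

# inv_cdf(p) returns the cutoff with proportion p of the area to its left,
# the same quantity invNorm(p,150,15) gives on the calculator.
cutoff_90 = lsat.inv_cdf(0.90)
print(f"{cutoff_90:.2f}")   # 169.22
```

Feeding the cutoff back into `lsat.cdf` returns 0.9, which is a quick way to confirm the two functions are inverses.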
The general syntax for finding the cutoff such that a proportion p
of the observations falls below it is invNorm(p,µ,σ).
Q. Find the scores of the middle 80% of the distribution.
A. The middle 80% leaves 10% of the area in each tail, so we need
the 10th and the 90th percentiles.
[Figure: the middle 80% of the area, between the 10th and 90th percentiles.]
1. The 90th percentile was computed above; it is about 169.
2. Type invNorm(0.1,150,15) to find the 10th percentile. It
is 130.77 . . . ≈ 131. So the middle 80% of test takers score
between about 131 and 169 on the LSAT.
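The middle-80% calculation can be sketched the same way with `statistics.NormalDist.inv_cdf`, taking the 10th and 90th percentiles; the last lines check that the area between them is 0.8 and that they sit symmetrically about the mean.

```python
from statistics import NormalDist

lsat = NormalDist(mu=150, sigma=15)   # X ~ N(150, 15)

# Middle 80%: 10% of the area in each tail, so use the 10th and 90th percentiles.
low, high = lsat.inv_cdf(0.10), lsat.inv_cdf(0.90)
print(f"middle 80%: {low:.2f} to {high:.2f}")

# Checks: the area between the cutoffs is 0.8, and by the symmetry of the
# normal curve the cutoffs are equally far from the mean 150.
middle_area = lsat.cdf(high) - lsat.cdf(low)
print(round(middle_area, 6), round(150 - low, 4), round(high - 150, 4))
```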