Normal Distributions
Overview
Introduction
So far we have two types of tools for describing
distributions: graphical and numerical. We also have a strategy
for exploring data on a single quantitative variable:
1. Always plot your data: make a histogram. Remember to label
and scale these for good communication!
2. Look for (and verbally describe) the overall pattern of the
variable’s distribution (shape, center, spread, peaks) and for
striking deviations such as outliers or gaps.
3. Based on the results of the graphical analysis, choose either
the five-number summary or the mean and standard
deviation to briefly describe center and spread
numerically. Be aware of the limitations of numerical summaries.
4. NOW we add a step: if the overall pattern of a large number of
observations is very regular, we can describe it by a smooth
curve.
Density Curves
Think of drawing a curve through the tops of the bars in a
histogram, smoothing out the irregular ups and downs of the
bars.
1. Most histograms show the counts of observations in each
class by the heights of their bars and therefore by the areas
of the bars. We set up curves to show the proportion of
observations in any region by areas under the curve.
• Choose the scale so that the total area under the curve is
exactly 1. We then have a density curve.
2. A histogram is a plot of data obtained from a sample. We use
this histogram to understand the actual distribution of the
population from which the sample was selected.
• The density curve is intended to reflect the
idealized shape of the population distribution.
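The rescaling step described above can be sketched in a few lines. This is only an illustration: the sample values and class boundaries are invented, and the helper name density_heights is hypothetical. Each bar's height becomes (proportion of observations in the class) / (class width), so the bar areas are proportions and add up to 1.

```python
# Hypothetical sample and class boundaries, invented for illustration.
data = [61, 64, 66, 67, 68, 68, 69, 70, 70, 70, 71, 71, 72, 73, 74, 76]
edges = [60, 64, 68, 72, 76, 80]

def density_heights(values, bin_edges):
    """Return bar heights scaled so that the bars' total area is 1."""
    n = len(values)
    heights = []
    for left, right in zip(bin_edges[:-1], bin_edges[1:]):
        count = sum(1 for v in values if left <= v < right)
        # proportion in the class divided by class width gives a density
        heights.append(count / n / (right - left))
    return heights

heights = density_heights(data, edges)
widths = [right - left for left, right in zip(edges[:-1], edges[1:])]
total_area = sum(h * w for h, w in zip(heights, widths))

print(heights)
print(total_area)  # 1, up to floating-point rounding
```

With this scaling, the area of any group of bars is the proportion of observations in those classes, which is exactly how areas under a density curve are read.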
Center and Spread of Density Curves
Density curves help us better understand our measures of center
and spread. Areas under a density curve represent proportions of
the total number of observations.
• The median is the point with half the observations on either
side. So the median of a density curve is the equal-areas point,
the point with half the area under the curve to its left and the
remaining half of the area to its right.
• The quartiles divide the area under the curve into quarters.
One-fourth of the area under the curve is to the left of the first
quartile, and three-fourths of the area is to the left of the third
quartile.
• You can roughly locate the median and quartiles of any density
curve by eye by dividing the area under the curve into four
equal parts.
If we think of the observations as weights stacked on a seesaw,
the mean is the point at which the seesaw would balance. This
fact is also true of density curves. The mean is the point at which
the curve would balance if made of solid material.
• A symmetric curve balances at its center because the two sides
are identical.
• The mean and median of a symmetric density curve are equal.
We know that the mean of a skewed distribution is pulled
toward the long tail.
The mean of a density curve is the point at which it would balance.
Median and Mean of a Density Curve
The median of a density curve is the equal-areas point, the
point that divides the area under the curve in half.
The mean of a density curve is the balance point, or center of
gravity, at which the curve would balance if made of solid
material.
The median and mean are the same for a symmetric density
curve. They both lie at the center of the curve. The mean of a
skewed curve is pulled away from the median in the direction of
the long tail.
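To see the equal-areas point and the balance point separate on a skewed curve, here is a rough numerical sketch. The right-skewed density exp(-x) for x ≥ 0 is an assumption chosen only for illustration (it is not from the slides), and the areas are approximated with a simple midpoint sum.

```python
import math

def density(x):
    """A right-skewed density curve on x >= 0 (chosen for illustration)."""
    return math.exp(-x)

# Midpoint Riemann sum on [0, 40]; the area beyond 40 is negligible.
step = 0.001
xs = [(i + 0.5) * step for i in range(40_000)]

total_area = sum(density(x) * step for x in xs)
mean = sum(x * density(x) * step for x in xs)  # balance point

# Median: move right until half of the area lies to the left.
running = 0.0
median = None
for x in xs:
    running += density(x) * step
    if running >= 0.5:
        median = x
        break

print(round(total_area, 4), round(median, 3), round(mean, 3))
```

For this curve the median comes out near 0.69 and the mean near 1.0, so the mean sits to the right of the median, in the direction of the long tail.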
Normal Distributions
The mathematical, ideal versions of Normal distributions
are perfectly symmetrical, bell-shaped distributions with a
single peak. In the ideal version, the peak corresponds to the
mean, median, and mode of the distribution. There are an
infinite number of Normal Distributions.
BE CAUTIOUS! Not all symmetric, bell-shaped distributions
are Normal, so do not assume that every bell-shaped
curve is a Normal curve. This is addressed in
courses in Statistics. Keep in mind that:
• No real-world data set matches these idealized curves
exactly.
• You cannot judge the Normality of data on the basis of visual
examination. We will use the methods discussed here
only when TOLD that the variable is Normally distributed.
Normal curves are symmetric, single-peaked, and bell-shaped.
Their tails fall off quickly, so that we do not expect outliers.
Because Normal distributions are symmetric, the mean and
median lie together at the peak in the center of the curve.
Normal curves have the special property that giving the mean
and the standard deviation completely specifies the curve. The
mean fixes the center of the curve, and the standard deviation
determines its shape.
• Changing the mean of a Normal distribution does not change
its shape, only its location on the axis.
• Changing the standard deviation does change the shape of a
Normal curve.
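The two bullets above can be checked numerically. The sketch below assumes Python's standard-library statistics.NormalDist; the helper name peak_height and the particular means and standard deviations are illustrative choices, not values from the slides.

```python
from statistics import NormalDist

def peak_height(mu, sigma):
    """Height of the Normal curve at its center."""
    return NormalDist(mu, sigma).pdf(mu)

original = peak_height(0, 1)
shifted = peak_height(5, 1)   # different mean, same standard deviation
wider = peak_height(0, 2)     # same mean, doubled standard deviation

print(original, shifted, wider)
```

Shifting the mean leaves the peak height unchanged (the curve only slides along the axis), while doubling the standard deviation halves the peak height, because the curve must spread out and still enclose a total area of 1.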
Characteristics of Normal Distributions
The curve drops smoothly on both sides, flattening near but never
touching the x-axis.
The points of inflection (where the curve changes from concave down
to concave up) occur on either side of the peak (the common mean, median, and mode),
at about 60% of the height of the highest point. The region between them contains about
2/3 of the total area.
The inflection points are located horizontally 1 standard deviation, σ
(sigma), on either side of the mean, μ (mu).
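The "about 60%" and "about 2/3" figures can be verified with a short sketch, again assuming the standard-library statistics.NormalDist. The mean and standard deviation used here are arbitrary; any Normal curve gives the same ratio and area.

```python
import math
from statistics import NormalDist

curve = NormalDist(mu=10, sigma=2)  # arbitrary illustrative values

peak = curve.pdf(curve.mean)
at_one_sd = curve.pdf(curve.mean + curve.stdev)
ratio = at_one_sd / peak  # equals exp(-1/2), about 0.607

area_within_one_sd = (curve.cdf(curve.mean + curve.stdev)
                      - curve.cdf(curve.mean - curve.stdev))

print(round(ratio, 3), round(area_within_one_sd, 3))
```

The height ratio is exactly e^(-1/2), and the area between the two inflection points is about 0.683, which the slides round to 2/3.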
There Are Infinitely Many Normal Distributions
• Each specific Normal curve is described completely by its
mean, μ, and standard deviation, σ.
• μ gives the location along the x-axis; σ determines the shape.
• The total area under the curve is 1. Probabilities and
percentages are found by determining areas over intervals.
Normal Density Curves
Here is a summary of basic facts about Normal curves.
Normal Density Curves
The Normal curves are symmetric, bell-shaped curves
that have these properties:
• A specific Normal curve is completely described by
giving its mean and its standard deviation.
• The mean determines the center of the distribution. It is
located at the center of symmetry of the curve.
• The standard deviation determines the shape of the
curve. It is the distance from the mean to the change-of-curvature points on either side.
Same mean, μ, but different standard deviations, σ
The 68-95-99.7 Rule
There are many Normal curves, each described by its mean and
standard deviation. All Normal curves share many properties. In
particular, the standard deviation is the natural unit of
measurement for Normal distributions. This fact is reflected in
the following rule.
The 68-95-99.7 Rule
In any Normal distribution, approximately
• 68% of the observations fall within one standard
deviation of the mean.
• 95% of the observations fall within two standard
deviations of the mean.
• 99.7% of the observations fall within three standard
deviations of the mean.
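The three percentages in the rule are rounded versions of exact areas under the Normal curve. A minimal check, assuming the standard-library statistics.NormalDist and using the standard Normal curve (mean 0, standard deviation 1), since the areas are the same for every Normal distribution:

```python
from statistics import NormalDist

standard = NormalDist()  # mean 0, standard deviation 1

areas = {}
for k in (1, 2, 3):
    # area within k standard deviations of the mean
    areas[k] = standard.cdf(k) - standard.cdf(-k)
    print(f"within {k} standard deviation(s): {areas[k]:.4f}")
# within 1 standard deviation(s): 0.6827
# within 2 standard deviation(s): 0.9545
# within 3 standard deviation(s): 0.9973
```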
The 68-95-99.7 Rule (Empirical Rule)
The distribution of Iowa Test of Basic Skills (ITBS) vocabulary
scores for 7th-grade students in Gary, Indiana, is Normal with
mean 6.84 and standard deviation 1.55.
• Sketch the Normal density curve for this distribution.
• What percent of ITBS vocabulary scores are less than 3.74?
Given that the ITBS vocabulary scores for 7th graders in Gary, Indiana, are
Normal with µ = 6.84 and σ = 1.55, the score 3.74 lies two standard deviations
below the mean (6.84 - 2 × 1.55 = 3.74). We would therefore expect about
2.5% of their ITBS vocabulary scores to be less than 3.74.
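The reasoning behind the 2.5% figure can be written out step by step. The sketch below is illustrative only; it assumes the standard-library statistics.NormalDist for the comparison value, and the variable names are made up for this example.

```python
from statistics import NormalDist

mu, sigma = 6.84, 1.55
score = 3.74

# How many standard deviations from the mean? About -2.
z = (score - mu) / sigma

# Rule: 95% lies within two standard deviations, so 5% lies outside,
# split evenly between the two tails.
rule_estimate = (1 - 0.95) / 2

# Area to the left of the score under the exact Normal curve.
exact = NormalDist(mu, sigma).cdf(score)

print(round(z, 2), round(rule_estimate, 3), round(exact, 4))
```

The rule gives 0.025, and the exact area is slightly smaller (about 0.0228), which shows how close the rule-of-thumb answer is.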
• What percent of the scores are between 5.29 and 9.94?
Given that the ITBS vocabulary scores for 7th graders in Gary, Indiana, are
Normal with µ = 6.84 and σ = 1.55, the score 5.29 lies one standard deviation
below the mean and 9.94 lies two standard deviations above it. We would expect
about 34% + 47.5% = 81.5% of their ITBS vocabulary scores to be
between 5.29 and 9.94.
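Because this interval is not symmetric about the mean, the area has to be built from two pieces: half of 68% (from one standard deviation below the mean up to the mean) plus half of 95% (from the mean up to two standard deviations above). A small sketch, with the exact area from the standard-library statistics.NormalDist for comparison; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 6.84, 1.55
low, high = 5.29, 9.94

z_low = (low - mu) / sigma    # about -1
z_high = (high - mu) / sigma  # about +2

# Piece 1: -1 sd to the mean (half of 68%).
# Piece 2: the mean to +2 sd (half of 95%).
rule_estimate = 0.68 / 2 + 0.95 / 2

scores = NormalDist(mu, sigma)
exact = scores.cdf(high) - scores.cdf(low)

print(round(z_low, 2), round(z_high, 2))
print(round(rule_estimate, 3), round(exact, 4))
```

The rule gives 0.815, that is, about 81.5% of scores; the exact area is about 0.819.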
Empirical Rule Usage is very limited!!!
The empirical rule is a quick and easy way to approximate
areas under a normal curve…but it only works if we are
interested in areas exactly 1, 2 or 3 standard deviations
from the mean.
To look at other areas (probabilities), we need a different
method. The most typical way to do this involves using a
standard Normal table. This is the method used in most
MATH 220 sections, so we will cover that next.
But for today, we will practice the 68-95-99.7 rule
approach.
Your turn:
68-95-99.7 (Empirical) rule examples
The distribution of heights of young men in the US is
nearly Normal, with a mean of 70 inches and a
standard deviation of 2.5 inches. Use the 68-95-99.7 rule
to answer the questions that follow.
• Start by labeling the variable on the x-axis of a sketched
Normal curve, then mark the mean and the values 1, 2, and 3
standard deviations on either side of it.
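The seven values to mark on the x-axis follow directly from the mean and standard deviation. A minimal sketch (the variable names are illustrative):

```python
mu, sigma = 70, 2.5  # inches

# mean - 3 sd, mean - 2 sd, ..., mean, ..., mean + 3 sd
ticks = [mu + k * sigma for k in range(-3, 4)]
print(ticks)  # [62.5, 65.0, 67.5, 70.0, 72.5, 75.0, 77.5]
```

Every height that appears in the questions below (65, 67.5, 75) is one of these marks, which is why the rule can be applied directly.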
• About what percent of US young men are taller than
75 inches?
Shade the relevant area on curve.
Find the relevant area/probability.
Write an appropriate contextual sentence that answers the question
asked.
Given that the heights of young men from the US are
nearly Normal with µ = 70 inches and σ = 2.5 inches, we
would expect about 2.5% of these young men to have
heights above 75 inches.
• Between what values do the heights of the middle
95% of US young men fall?
Shade the relevant area on curve.
Find the relevant area/probability.
Write an appropriate contextual sentence that answers the question asked.
Since the heights of young men from the US are nearly
Normal with µ = 70 inches and σ = 2.5 inches, we
would expect the middle 95% of such heights to
be between 65 inches and 75 inches.
• Approximately how short are the shortest 16% of US
young men?
Shade the relevant area on curve.
Find the relevant area/probability.
Write an appropriate contextual sentence that answers the question asked.
Since the heights of young men from the US are nearly Normal with
µ = 70 inches and σ = 2.5 inches, we would expect that the
shortest 16% of these men have heights less than 67.5 inches.
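This question runs the rule backwards: it starts from an area (16%) and asks for a height. A short sketch of the reasoning, with the exact cutoff from the standard-library statistics.NormalDist (its inv_cdf method) for comparison; the variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=70, sigma=2.5)

# Rule: 68% lies within one standard deviation, so 32% lies outside,
# 16% in each tail. The lower tail therefore ends at mean - 1 sd.
tail_area = (1 - 0.68) / 2
rule_cutoff = heights.mean - heights.stdev

# Height with exactly 16% of the area to its left.
exact_cutoff = heights.inv_cdf(0.16)

print(round(tail_area, 2), rule_cutoff, round(exact_cutoff, 2))
```

The rule gives a cutoff of 67.5 inches; the exact value is only slightly larger, about 67.51 inches.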
• Approximately what percent of US young men are
taller than 67.5 inches?
Shade the relevant area on curve.
Find the relevant area/probability.
Write an appropriate contextual sentence that answers the question asked.
Given that the heights of young men from the US are nearly
Normal with µ = 70 inches and σ = 2.5 inches, we would expect that
about 84% of these young men would have heights above 67.5
inches.
• What is the approximate probability that a
randomly chosen young man from the US has a
height between 67.5 and 75 inches?
Shade the relevant area on curve.
Find the relevant area/probability.
Write an appropriate contextual sentence that answers the question asked.
Since the heights of young men from the US are nearly
Normal with µ = 70 inches and σ = 2.5 inches, there is about a
0.815 probability that a young man randomly selected from
this group would have a height between 67.5 inches and 75
inches.
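The height questions above all reduce to looking up areas at whole numbers of standard deviations. One way to organize that lookup is sketched below. The table HALF_AREA and the helper area_left_of are hypothetical names introduced for this example; the numbers in the table are simply half of 68%, 95%, and 99.7%.

```python
# Area between the mean and k standard deviations on ONE side,
# taken from the 68-95-99.7 rule (half of 68%, 95%, 99.7%).
HALF_AREA = {0: 0.0, 1: 0.34, 2: 0.475, 3: 0.4985}

MU, SIGMA = 70, 2.5  # heights of young men, in inches

def area_left_of(height):
    """Rule-of-thumb area to the left of a height that sits a whole
    number of standard deviations (0 to 3) from the mean."""
    z = (height - MU) / SIGMA
    k = round(z)
    if abs(z - k) > 1e-9 or abs(k) > 3:
        raise ValueError("the rule only covers 0, 1, 2 or 3 standard deviations")
    return 0.5 + HALF_AREA[k] if k >= 0 else 0.5 - HALF_AREA[-k]

taller_than_75 = 1 - area_left_of(75)                        # about 0.025
taller_than_67_5 = 1 - area_left_of(67.5)                    # about 0.84
between_67_5_and_75 = area_left_of(75) - area_left_of(67.5)  # about 0.815

print(round(taller_than_75, 3),
      round(taller_than_67_5, 3),
      round(between_67_5_and_75, 3))
```

A height such as 71 inches is not a whole number of standard deviations from the mean, so the helper raises an error instead of guessing. That is the limitation of the rule noted earlier, and the reason a standard Normal table is needed for other areas.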