Download 501_Lecture_01

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Central limit theorem wikipedia , lookup

Transcript
The Normal Distributions







Density curves
Normal distributions
The 68-95-99.7 rule
The standard normal distribution
Normal distribution calculations
Standardizing observations
Normal quantile plots
Example: Here is a histogram
of vocabulary scores of 947
seventh graders.
The smooth curve drawn
over the histogram is a
mathematical model for the
distribution.
3
A density curve is a curve that:


is always on or above the horizontal axis
has an area of exactly 1 underneath it
A density curve describes the overall pattern of a
distribution. The area under the curve and above any
range of values on the horizontal axis is the proportion
of all observations that fall in that range.
Density curves
The areas of the shaded bars in this
histogram is equal to 0.303.
For the density curve, the
proportion of the area to the left of
6.0 is now equal to 0.293.
The areas of the shaded bars in this
histogram represent the proportion
of scores in the observed data that
are less than or equal to 6.0. This
proportion is equal to 0.303.
Now the area under the smooth
curve to the left of 6.0 is shaded. If
the scale is adjusted so the total
area under the curve is exactly 1,
then this curve is called a density
curve. The proportion of the area to
the left of 6.0 is now equal to 0.293.
6



Mode – Location where the curve is high-peaked
Median – the equal-areas point. Half the area on
each side.
Mean – the balance point of the density curve
◦ Think of placing a wedge so that the density would
balance like on a see-saw or teeter totter
◦ Hard to visually find the mean for skewed curves

Mathematical formulas are used to calculate the
mean, median, standard deviation, etc.
Normal Density Curve
A right skewed density curve
Mean is the balance of the density curve


μ – mean of the idealized distribution (of the
density curve)
σ – standard deviation of the idealized
distribution

x – mean of the actual observations (sample

mean)
s – standard deviation of the actual
observations (sample standard deviation)






Symmetric, unimodal, bell-shaped
Characterized by mean (μ) and stdev (σ).
Mean is the point of symmetry
Can visually speculate σ (inflection point?)
Good description of many real variables (test
scores, crop yields, height)
Approximates many other distributions
Normal Distributions
1
f ( x) 
e
 2
1  x 
 

2  
2


Described only by mean and standard deviation
Instead of writing it out each time, we shorthand
Normal using N and put the mean and stdev in
parenthesis:
general normal: N(µ, σ)
standard normal: N(0,1)
The 68-95-99.7 Rule
In the Normal distribution with mean µ and standard deviation σ:
 Approximately 68% of the observations fall within σ of µ.

Approximately 95% of the observations fall within 2σ of µ.

Approximately 99.7% of the observations fall within 3σ of µ.
16


The distribution of heights of young women aged
18 to 24 is approximately normal with mean µ =
64.5 inches and standard deviation σ = 2.5
inches.
Between what two points do 68% of the women
fall into? 95%? 99.7%?

Tables for normal distribution with mean µ = 0 and
stdev σ = 1 (N(0,1)) are available
◦ see page T-2 near the back of the book
1. First learn how to find out different types of
probabilities for N(0,1) (standard normal curve).
2. Then learn to convert ANY normal distribution
to a standard normal and find the
corresponding probability
The Standard Normal Table
•
•
Table always give the area to the left
Suppose we want to find the proportion of observations from the
standard Normal distribution that are less than 0.81.
P(z < 0.81) = .7910
Z
.00
.01
.02
0.7
.7580
.7611
.7642
0.8
.7881
.7910
.7939
0.9
.8159
.8186
.8212
20

What proportion of observations on a
standard normal variable Z take
values
◦ less than 2.2 ?
◦ greater than -2.05 ?

If I give you a probability, can you find the
corresponding z value?
 called percentiles
◦ What is the z-score for the 25th percentile of the N(0,1)
curve?
◦ 90th percentile?
Standardizing Observations
If a variable x has a distribution with mean µ and standard
deviation σ, then the standardized value of x, or its z-score, is
z
x-μ

All Normal distributions are the same if we measure in units of size σ from
the mean µ as center.
The standard Normal distribution
is the Normal distribution with mean
0 and standard deviation 1. That is,
the standard Normal distribution is
N(0,1).
25

Example: Compute the standardized scores for
women 70 inches tall and 60 inches tall. (μ=64.5
σ=2.5)

For SAT scores on individual sections, the
scores are approximately normal with mean 500
and standard deviation 100.
◦ What percent of students score a 700 or higher?
◦ What percent of students score in between 600 and
700?
◦ How high must you score on a section to be in the top
1% of all test takers?
◦ Mark’s score on one section was the 68th percentile,
what score did he get?
Normal Quantile Plots
One way to assess if a distribution is indeed approximately Normal is to
plot the data on a Normal quantile plot or QQplot from SAS.
The data points are ranked and the percentile ranks are converted to zscores with Table A. The z-scores are then used for the x-axis against
which the data are plotted on the y-axis of the Normal quantile plot.
 If the distribution is indeed Normal, the plot will show a straight
line along the 45 degree line, indicating a good match between
the data and a Normal distribution.
 Systematic deviations from a straight line indicate a non-Normal
distribution. Outliers appear as points that are far away from the
overall pattern of the plot.
28
Good fit to a straight line: The
distribution of rainwater pH values
is close to Normal.
Curved pattern: The data are not Normally
distributed. Instead, the data are right
skewed: A few individuals have particularly
long survival times.
Normal quantile plots are complex to do by hand, but they are standard features
in most statistical software.
29





Hypothesized mathematical models for
distributions: Density curves
Normal Distribution
Evaluating probabilities for standard normal
distribution
Evaluating probabilities for ANY normal
distribution by converting it to a standard normal
distribution
Normal quantile plot (qqplot)