Density Curves and the Normal Distributions

Statistics in the hands of an engineer are like a lamppost to a drunk—
they're used more for support than illumination.
A. E. Housman
Chapter 2
Sec 2.1
In Chapter 1 we learned that exploring a single quantitative variable starts with plotting the
data in a graph, usually a histogram or stemplot. We should look for an overall pattern by
describing shape, center, and spread and noting any outliers. Then we can calculate a
numerical summary to describe the center and spread: the mean and standard deviation for
symmetric distributions and the five-number summary for skewed distributions.
NOW we add one more step...
Sometimes the overall pattern of a very large number of observations is SO REGULAR that
we can describe it by a smooth curve. This curve is a mathematical model for the
distribution, i.e., an idealized description. It gives a quick picture of the overall pattern but
ignores minor irregularities as well as outliers. It is easier to work with a smooth curve than
with a histogram because the histogram depends on the choice of classes, while the curve
does not depend on any choices made by us.
Rationale: the bars of a histogram suggest areas, and these areas represent proportions of the
observations. We scale the smooth curve outlining the histogram so that the total area under it
is exactly 1 (representing ALL the observations). The curve is now a density curve. A density
curve is a curve that is always on or above the horizontal axis and has area exactly 1
underneath it. The area under the curve above any range of values is the proportion of
observations that fall in that range.
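To make the "area equals proportion" idea concrete, here is a minimal sketch. The density f(x) = 2(1 - x) on the interval from 0 to 1 is a hypothetical example chosen for illustration (it is not from the text), and the helper name `area_under` is likewise made up here.

```python
def density(x):
    """Hypothetical density curve: f(x) = 2(1 - x) on [0, 1], zero elsewhere."""
    return 2.0 * (1.0 - x) if 0.0 <= x <= 1.0 else 0.0


def area_under(curve, left, right, steps=10_000):
    """Approximate the area under `curve` between left and right (midpoint rule)."""
    width = (right - left) / steps
    return sum(curve(left + (i + 0.5) * width) for i in range(steps)) * width


total_area = area_under(density, 0.0, 1.0)        # stands for ALL the observations
area_below_half = area_under(density, 0.0, 0.5)   # proportion of observations below 0.5

print(f"total area: {total_area:.4f}")
print(f"proportion below 0.5: {area_below_half:.4f}")
```

The total area comes out as 1, and the area to the left of 0.5 comes out as 0.75, so under this made-up curve three quarters of the observations would lie below 0.5.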
Any specific point on the horizontal axis splits this area into area to its left and area to
its right. A NORMAL curve mimics a symmetric histogram, and, as for every symmetric density
curve, its mean and median are EQUAL. Other curves may be skewed, like their corresponding
histograms, with the mean pulled away from the median in the direction of the long tail.
Graphically, the MEDIAN of a density curve is the equal-area point, the point with half the
area under the curve to its left and the other half to its right. The quartiles divide the area
under the curve into quarters. 1/4 of the area is to the left of Q1 and 3/4 of the area is to the
left of Q3. You can roughly locate the median and quartiles of any density curve by eye by
dividing the area under the curve into four equal parts. The MEAN is the point at which the
curve would BALANCE if made of solid material.
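The equal-area and balance-point descriptions can be checked numerically. The sketch below assumes a hypothetical right-skewed density, f(x) = 2(1 - x) on the interval from 0 to 1, and uses made-up helper names. It finds the quartiles and median by searching for the points with 1/4, 1/2, and 3/4 of the area to their left, and finds the mean as the balance point.

```python
def density(x):
    """Hypothetical right-skewed density: f(x) = 2(1 - x) on [0, 1]."""
    return 2.0 * (1.0 - x) if 0.0 <= x <= 1.0 else 0.0


def area_left_of(x, steps=2_000):
    """Area under the density from 0 up to x (midpoint rule)."""
    width = x / steps
    return sum(density((i + 0.5) * width) for i in range(steps)) * width


def point_with_area(target, low=0.0, high=1.0):
    """Bisection search for the point that has `target` area to its left."""
    for _ in range(40):
        mid = (low + high) / 2.0
        if area_left_of(mid) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


def balance_point(steps=2_000):
    """Mean of the density: sum of x times height times width over thin slices."""
    width = 1.0 / steps
    midpoints = ((i + 0.5) * width for i in range(steps))
    return sum(x * density(x) for x in midpoints) * width


q1 = point_with_area(0.25)
median = point_with_area(0.50)
q3 = point_with_area(0.75)
mean = balance_point()

print(f"Q1 = {q1:.4f}, median = {median:.4f}, Q3 = {q3:.4f}, mean = {mean:.4f}")
```

For this curve the median is roughly 0.29 and the mean is roughly 0.33, so the mean sits farther toward the long right tail than the median does.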
New notation: since a density curve is an idealized description of the data (not actual data),
we distinguish between the mean and standard deviation of the curve and the mean (x̄) and
standard deviation (s) computed from the actual observations. The notation for the mean of
this idealized distribution is μ (Greek mu, lowercase "m") and for the standard deviation is
σ (Greek sigma, lowercase "s").
Normal distributions arise from some chance outcomes that are repeated many, many
times. Chance experiments can be carried out on the TI-83 with some skill. Since you are
only pretending to do the experiment, this is called a "simulation." See page 85 in the text
for the rolling of a die.
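The same kind of simulation can be sketched off the calculator. This is not the TI-83 procedure from the text; it is a Python illustration under assumed settings (50 rolls added up per experiment, 10,000 experiments, and an arbitrary seed). It shows that the totals pile up in a roughly bell-shaped pattern around their mean.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the run is repeatable (arbitrary choice)

ROLLS_PER_EXPERIMENT = 50  # assumption: dice added up in one experiment
EXPERIMENTS = 10_000       # assumption: how many times the experiment is repeated

# Each experiment: roll a fair die 50 times and record the total.
totals = [
    sum(rng.randint(1, 6) for _ in range(ROLLS_PER_EXPERIMENT))
    for _ in range(EXPERIMENTS)
]

center = statistics.mean(totals)
spread = statistics.stdev(totals)
share_within_one_sd = sum(abs(t - center) <= spread for t in totals) / EXPERIMENTS

print(f"mean of totals: {center:.1f}")
print(f"standard deviation of totals: {spread:.1f}")
print(f"share within 1 standard deviation: {share_within_one_sd:.2f}")
```

The mean of the totals should land near 50 × 3.5 = 175, and the share of totals within one standard deviation of the mean should be in the neighborhood of two thirds, which is what a normal curve would predict.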
Normal distributions are symmetric, single-peaked, and bell-shaped. Their density curves are
called normal curves. All normal distributions have the same overall shape. The exact density
curve for a particular normal distribution is described by giving its mean μ and standard
deviation σ. The mean is located at the center of the symmetric curve and is the same as the
median. Changing μ without changing σ moves the normal curve along the horizontal axis
WITHOUT changing its spread. The standard deviation σ controls the spread. The curve with
the larger standard deviation is more spread out.
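A short sketch of how the mean and the standard deviation act on the curve, using `statistics.NormalDist` from the Python standard library. The particular means and standard deviations below are arbitrary choices for illustration.

```python
from statistics import NormalDist

base = NormalDist(mu=0.0, sigma=1.0)
shifted = NormalDist(mu=3.0, sigma=1.0)  # same standard deviation, larger mean
wider = NormalDist(mu=0.0, sigma=2.0)    # same mean, larger standard deviation

# Same standard deviation: the curve has the same height at the same distance from its own mean.
offsets = [0.0, 0.5, 1.0, 2.0]
same_shape = all(
    abs(base.pdf(base.mean + d) - shifted.pdf(shifted.mean + d)) < 1e-12
    for d in offsets
)

# Larger standard deviation: lower peak and more height far from the mean.
peak_base = base.pdf(base.mean)
peak_wider = wider.pdf(wider.mean)

print(f"shifted curve has the same shape: {same_shape}")
print(f"peak height with sigma = 1: {peak_base:.4f}")
print(f"peak height with sigma = 2: {peak_wider:.4f}")
print(f"height 3 units from the mean: {base.pdf(3.0):.4f} vs {wider.pdf(3.0):.4f}")
```

Because the total area must stay equal to 1, a curve that is more spread out also has to be lower at its peak.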
A normal curve has points at which the curvature changes; these are called "inflection
points." They are located on both sides of the mean at a distance of σ, that is, at μ − σ
and μ + σ.
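The change in curvature one standard deviation from the mean can be checked numerically. In this sketch the mean of 100 and standard deviation of 15 are hypothetical values, and `bend` is a made-up helper that uses a second difference to tell whether the curve bends downward (negative) or upward (positive) at a point.

```python
from statistics import NormalDist

MU, SIGMA = 100.0, 15.0  # hypothetical mean and standard deviation
curve = NormalDist(MU, SIGMA)


def bend(x, h=0.01):
    """Second difference of the density: negative = bends downward, positive = bends upward."""
    return curve.pdf(x - h) - 2.0 * curve.pdf(x) + curve.pdf(x + h)


# Points slightly closer to the mean than one standard deviation, and slightly farther.
just_inside = [MU - 0.9 * SIGMA, MU + 0.9 * SIGMA]
just_outside = [MU - 1.1 * SIGMA, MU + 1.1 * SIGMA]

bends_down_inside = all(bend(x) < 0 for x in just_inside)
bends_up_outside = all(bend(x) > 0 for x in just_outside)

print(f"bends downward just inside one standard deviation: {bends_down_inside}")
print(f"bends upward just outside one standard deviation: {bends_up_outside}")
```

Both checks come out True, so the switch from bending downward to bending upward happens between 0.9 and 1.1 standard deviations from the mean on each side. This is why the standard deviation of a normal curve can be estimated by eye.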
Normal distributions are important because they are good descriptions of many sets of real data
(like SAT scores and psychological tests), they are good approximations to the results of
many kinds of chance outcomes (tossing a coin or rolling a die many times), and many
statistical inference procedures based on normal distributions work well for other roughly
symmetric distributions.
Empirical Rule: in a normal distribution with mean μ and standard deviation σ,
• about 68% of the observations fall within 1 standard deviation of the mean (μ ± σ)
• about 95% of the observations fall within 2 standard deviations of the mean (μ ± 2σ)
• about 99.7% of the observations fall within 3 standard deviations of the mean (μ ± 3σ)

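The percentages in the Empirical Rule are rounded areas under the normal curve. As a quick check, the sketch below computes them with `statistics.NormalDist` from the Python standard library. The helper name `share_within` is made up for this illustration.

```python
from statistics import NormalDist


def share_within(k, mu=0.0, sigma=1.0):
    """Area under the normal curve within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
```

This prints 0.6827, 0.9545, and 0.9973. The answer does not depend on the mean or standard deviation chosen: `share_within(2, mu=500, sigma=100)` gives the same area as `share_within(2)`.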
Frequently test scores are reported as percentiles rather than raw scores. If you score at the
94th percentile, then 94% of the students taking the test scored lower than or equal to you.
Percentiles are used to examine where an individual observation stands
relative to the other individuals in the distribution. In our statistical language, the median is
the 50th percentile, Q1 is the 25th, and Q3 is the 75th. A good explanation appears in Ex. 2.6 and
2.7 on page 89.
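A percentile can be computed directly from this definition. The sketch below is an illustration, not the text's Ex. 2.6 or 2.7: the ten scores are made up, and the normal model with mean 500 and standard deviation 100 is an assumption.

```python
from statistics import NormalDist, median

scores = [55, 62, 68, 70, 71, 75, 78, 80, 84, 91]  # hypothetical test scores


def percentile_of(value, data):
    """Percent of observations in `data` that are less than or equal to `value`."""
    return 100.0 * sum(x <= value for x in data) / len(data)


rank_84 = percentile_of(84, scores)                       # 9 of the 10 scores are <= 84
rank_of_median = percentile_of(median(scores), scores)    # the median is the 50th percentile

# With a normal model, the percentile is the area to the left of the score.
model = NormalDist(mu=500, sigma=100)  # assumed mean and standard deviation
model_percentile = 100.0 * model.cdf(655)

print(f"a score of 84 is at percentile {rank_84:.0f}")
print(f"the median is at percentile {rank_of_median:.0f}")
print(f"under the normal model, 655 is at about percentile {model_percentile:.0f}")
```

In the made-up data a score of 84 sits at the 90th percentile. Under the assumed normal model a score of 655 is 1.55 standard deviations above the mean, which puts it at about the 94th percentile.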