Reality Math
Dot Sulock, University of North Carolina at Asheville
Normal Distributions
Purpose: Understand enough about Normal Distributions to enable an
understanding of the March Madness module.
1. What is a Frequency Distribution?
A Frequency Distribution is a way of organizing data by counting it.
Quiz Grade    Frequency
10            2
9             5
8             4
7             1
6             0
5             1
The above frequency distribution of quiz grades represents 13 grades
{10, 10, 9, 9, 9, 9, 9, 8, 8, 8, 8, 7, 5}
The Quiz Grade Frequency Distribution is graphed below. Any graph which has
frequency or percentage on the vertical axis is a frequency distribution graph, also called
a histogram.
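For readers who would rather count with a program than by hand, here is a minimal Python sketch of the same tally. Python and the names grades and frequency are our own choices for illustration; the handout itself only uses Excel.

```python
from collections import Counter

# The 13 quiz grades listed above.
grades = [10, 10, 9, 9, 9, 9, 9, 8, 8, 8, 8, 7, 5]

# Counter tallies how many times each grade occurs.
frequency = Counter(grades)

# One row per possible grade, highest first; a grade nobody earned counts as 0.
# The row of stars is a rough sideways histogram.
for grade in range(10, 4, -1):
    count = frequency[grade]
    print(f"{grade:>2}  {count}  {'*' * count}")
```

The stars give a quick text version of the histogram: the longest row marks the most common grade.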
2. Adult Male Heights in America
The above frequency distribution is that of adult male heights in America. Adult male
heights are normally distributed with approximate average 70 inches and standard
deviation 5 inches. Inches are on the horizontal axis and frequency (%) on the vertical
axis. Where the curve is high, there are a lot of males with that height.
What does “normally distributed” mean? A normal frequency distribution is symmetrical
and “bell-shaped.” Normal distributions are described by well-known mathematical
equations, and those equations give rise to many useful properties shared by all normal
curves. Really understanding normal distributions requires more time than we will devote
to them.
Lots of biological variables are normally distributed, like height, head size, tulip height,
acorn size, etc. Ultra-accurate measurements of machined parts are normally distributed.
If you toss a coin 100 times, record the number of heads, and repeat that process many
times, the number of heads obtained will be approximately normally distributed.
Standardized test scores are normally distributed. And the list goes on and on. In fact,
sample averages for repeated samples of size greater than 30 from not-normally
distributed populations are approximately normally distributed. This is a powerful fact
from statistics called the Central Limit Theorem.
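The claim about sample averages can be tried out with a small simulation. This is only a sketch under our own assumptions: the population is rolls of a fair six-sided die (flat, not bell-shaped), each sample has 36 rolls, and we repeat 5000 times. None of these choices come from the handout.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Average of n rolls of a fair six-sided die (a flat, non-normal population)."""
    rolls = [rng.randint(1, 6) for _ in range(n)]
    return statistics.mean(rolls)

# Collect the averages of 5000 samples of 36 rolls each.
means = [sample_mean(36) for _ in range(5000)]

center = statistics.mean(means)     # should land close to 3.5
spread = statistics.pstdev(means)   # standard deviation of the sample averages

# Fraction of sample averages within one standard deviation of their center.
within_one = sum(center - spread <= m <= center + spread for m in means) / len(means)
print(round(center, 2), round(spread, 2), round(within_one, 2))
```

If the sample averages are roughly bell-shaped, the last number printed should come out somewhere near two thirds, even though a single die roll is nothing like a bell curve.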
Statisticians have the ability to determine whether the variables under study actually are
normally distributed. When variables are normally distributed, which is often, much can
be known about them. Testing whether a variable is normally distributed or not is
beyond the scope of this unit. We will limit ourselves to what can be known about
variables that are actually normally distributed.
Back to the adult male heights. You know what “average” is, but what is “standard
deviation”?
3. Calculating a Standard Deviation
The standard deviation is another measure of variability, or spread, or diversity of the
data. You already know how to determine the range of a variable, which is the easiest
measure of the diversity of the data. The range measures the width of the distribution
graph.
Generally when you want to know the standard deviation of a data set, the data set is
large and the standard deviation is determined by a computer program. But knowing how
to calculate the standard deviation is empowering, so let’s do the standard deviation of a
little data set by hand. Consider the data set A = {1, 1, 2, 5, 11}.
5. (a) What is the median (middle number)?
(b) Calculate the mean (average).
The first step toward calculating the standard deviation is to determine the deviation
from the mean for each data point. A deviation from the mean is simply how far above or
below the mean the data point is. The mean of our data set is
(1 + 1 + 2 + 5 + 11)/5 = 20/5 = 4.
data point    deviation from the mean
1             1 - 4 = -3
1             1 - 4 = -3
2             2 - 4 = -2
5             5 - 4 = +1
11            11 - 4 = +7
total         0
Notice that the sum of the deviations from the mean is always 0. Adding up the
deviations from the mean is a good check. If the sum of the deviations from the mean is
not 0, then your mean is not really the mean or you have made a mistake calculating the
deviations from the mean.
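The zero-sum check described above is easy to sketch in Python. The variable names are ours, chosen for illustration.

```python
data = [1, 1, 2, 5, 11]

mean = sum(data) / len(data)

# How far each data point sits above (+) or below (-) the mean.
deviations = [x - mean for x in data]

print(mean, deviations, sum(deviations))
# prints: 4.0 [-3.0, -3.0, -2.0, 1.0, 7.0] 0.0
```

With less tidy data the computer's rounding can leave a sum that is extremely close to 0 rather than exactly 0, so in practice the check is "is the sum essentially zero?"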
The sum of the deviations from the mean doesn’t measure variability, since it is always 0.
So statisticians square all the deviations from the mean, which makes every term positive
so that they no longer cancel when added. The bigger the sum of squared deviations, the
more variability.
data point    deviation from the mean    squared deviation from the mean
1             1 - 4 = -3                 9
1             1 - 4 = -3                 9
2             2 - 4 = -2                 4
5             5 - 4 = +1                 1
11            11 - 4 = +7                49
total                                    72
However, if there were a lot of data, the sum of the squared deviations might be pretty
big just because there were a lot of squared deviations. It turns out that the average
squared deviation is what is needed. The average of the squared deviations is called the
variance.
(9 + 9 + 4 + 1 + 49)/ 5 = 72/5 = 14.4
Among other problems with the variance, it will have the wrong units. If our data was in
inches, for example, the variance would be in square inches. So the earlier squaring is
undone by taking the square root of the variance to get the standard deviation.
Standard deviation = √14.4 ≈ 3.8
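The whole hand calculation for data set A can be sketched in a few lines of Python, following the same steps: deviations, squares, average, square root. The variable names are our own.

```python
import math

data = [1, 1, 2, 5, 11]

mean = sum(data) / len(data)

# Square each deviation from the mean so negatives and positives cannot cancel.
squared_deviations = [(x - mean) ** 2 for x in data]

# Variance: the average squared deviation (dividing by the number of data points).
variance = sum(squared_deviations) / len(data)

# Standard deviation: the square root of the variance, back in the original units.
standard_deviation = math.sqrt(variance)

print(sum(squared_deviations), variance, round(standard_deviation, 1))
# prints: 72.0 14.4 3.8
```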
Excel will do all of this for you if you type in = stdevp(1,1,2,5,11). Stdevp will find the
standard deviation of a known data set which is the population of interest to you. The p
after stdev means that you want the standard deviation of a whole population of numbers.
Another way to use Excel to get the standard deviation of these numbers is to enter them
into cells, say A3 – A7 and type into another cell = stdevp(A3:A7).
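Python's standard statistics module draws the same population-versus-sample distinction. The sketch below assumes that pstdev plays the role of Excel's stdevp (divide by the number of data points) and that stdev plays the role of Excel's plain stdev (divide by one less than the number of data points); only the population version matches the hand calculation in this section.

```python
import statistics

data = [1, 1, 2, 5, 11]

# Population standard deviation: divides by n, like Excel's stdevp.
population_sd = statistics.pstdev(data)

# Sample standard deviation: divides by n - 1, like Excel's stdev.
sample_sd = statistics.stdev(data)

print(round(population_sd, 2), round(sample_sd, 2))
# prints: 3.79 4.24
```

The sample version is a little larger because it divides the same total, 72, by 4 instead of 5.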
6. For another small data set B = {1,7,7, 10,10,10,11} find
(a) mean
(b) mode
(c) median
(d) average
(e) range
(f) standard deviation (by hand)
(g) standard deviation (using Excel)
4. Understanding “Within One Standard Deviation of the Mean”
Within one standard deviation of the mean is between the mean minus one standard
deviation and the mean plus one standard deviation.
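As an illustration of the definition, here is a small Python sketch that counts how much of a data set falls within one standard deviation of its mean. To avoid giving away the questions that follow, it uses the quiz grades from Section 1; the function name percent_within_one_sd is our own invention.

```python
import statistics

def percent_within_one_sd(data):
    """Percent of data between (mean - sd) and (mean + sd), using the population sd."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    inside = [x for x in data if mean - sd <= x <= mean + sd]
    return 100 * len(inside) / len(data)

quiz_grades = [10, 10, 9, 9, 9, 9, 9, 8, 8, 8, 8, 7, 5]
print(round(percent_within_one_sd(quiz_grades), 1))
# prints: 69.2
```

For the quiz grades the mean is about 8.4 and the standard deviation about 1.3, so only the 8s and 9s (9 of the 13 grades) fall inside the interval.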
7. (a) What percent of the data in data set A is “within one standard
deviation of the mean,” that is, between 4 - 3.8 and 4 + 3.8?
(b) What percent of the data in data set B is “within one standard
deviation of the mean”?
5. The 68%-95%-Almost 100% rule
For a normally distributed variable, the distance from the left-hand side of the bell-shaped
curve to the right-hand side of the bell-shaped normal curve is about 6 standard
deviations.
Amazingly, all normally distributed variables have the following characteristics:
• 68% of the scores are within one standard deviation of the mean.
• 95% of the scores are within two standard deviations of the mean.
• 99.7% of the scores are within three standard deviations of the mean, almost 100%.
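These three percents can be checked numerically. The sketch below uses NormalDist from Python's standard statistics module (available in Python 3.8 and later) with mean 0 and standard deviation 1; because the rule holds for every normal curve, any other mean and standard deviation would give the same percents.

```python
from statistics import NormalDist

# A normal curve with mean 0 and standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

for k in (1, 2, 3):
    # Area under the curve between k standard deviations below and above the mean.
    share = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(k, round(100 * share, 1))
# prints:
# 1 68.3
# 2 95.4
# 3 99.7
```

The rule's 68% and 95% are these values rounded to whole percents.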
The normal curve that follows is labeled with the mean, called µ, and the standard
deviation, called σ. For example, µ + 3σ refers to three standard deviations above the
mean. The red lettering at the bottom indicates how many standard deviations above or
below average the interval edges are.


8. For a normally distributed variable, what percent of the scores will be
(a) below µ (the mean)?
(b) above µ (the mean)?
(c) between µ - 1σ and µ + 1σ?
(d) between µ - 2σ and µ + 2σ?
(e) below µ + 2σ?
9. Adult Male Heights. What percent of adult males are
(a) less than 70 inches tall?
(b) less than 75 inches tall?
(c) more than 80 inches tall?
(d) between 65 and 75 inches tall?
So the chart giving all the percents for all normal curve distributions is very useful if
(1) we know we have a normal distribution and (2) we only want to know percents
associated with the mean ± 1, 2, or 3 standard deviations. What if we want to know a
percent connected with a value of the variable that is not the mean ± 1, 2, or 3 standard
deviations? For example, what percent of adult males are below 6 feet (72 inches) in
height?

Looking at the graph, we could estimate our answer. 50% are below 70 inches, and
maybe half of the 34% between 70 inches and 75 inches would be below 72 inches. So
our estimate would be about 50% + 34%/2 = 67% below 72 inches.
6. Excel to the Rescue
How surprised are you that, once again, Excel comes to our aid? The somewhat
mysterious command = normdist(score, average, standard deviation, true) will give us
the fraction of scores at or below the score we input, which we can read as a percent.
The entries score, average, and standard deviation are question specific. So to get a
more accurate answer to the question of what percent of adult males are below 6 feet in
height, we type = normdist(72,70,5,true) into any Excel cell, obtaining
0.655 = 65.5%, not far from our estimate.
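The same area can be checked outside Excel. As a sketch, Python's NormalDist (standard statistics module, Python 3.8 and later) has a cdf method that, like normdist with true as the last entry, returns the fraction of the curve at or below a given score. The name heights is our own.

```python
from statistics import NormalDist

# Adult male heights as modeled in this handout: mean 70 inches, sd 5 inches.
heights = NormalDist(mu=70, sigma=5)

# Fraction of the curve at or below 72 inches.
below_72 = heights.cdf(72)
print(round(below_72, 3))
# prints: 0.655
```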
Sketch a new normal curve for each part of this question so you can see what area you
need. The curve above is labeled in standard deviations above and below the mean.
Excel doesn’t necessarily give you the answer to the questions, since Excel gives you the
percent of data below your number, that is, the area from your number all the way to
the left of the curve. When Excel doesn’t give you the answer, Excel gives you a
number that leads to the answer, but you have to figure out how to use it.
For example, the percent of adult males who have heights above 6 feet would be:
100% - 65.5% = 34.5%
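Here is a hedged sketch of turning "area to the left" into the area you actually want, again using Python's NormalDist rather than Excel. An area above a value is the whole curve minus the area below it, and an area between two values is the difference of two areas below. The variable names are ours, and the 70-to-72 interval is just an example.

```python
from statistics import NormalDist

heights = NormalDist(mu=70, sigma=5)

below_72 = heights.cdf(72)

# Above 72 inches: everything (1, or 100%) minus the part below 72.
above_72 = 1 - below_72

# Between 70 and 72 inches: area below 72 minus area below 70.
between_70_and_72 = heights.cdf(72) - heights.cdf(70)

print(round(100 * above_72, 1), round(100 * between_70_and_72, 1))
# prints: 34.5 15.5
```

Sketching the curve and shading the region first makes it much easier to see which subtraction is needed.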
10. What is the probability that an adult male height will be
(a) less than 68 inches?
(b) greater than 77 inches?
(c) less than 58 inches?
(d) between 58 inches and 68 inches?
(e) less than 50 inches?
Use Excel and draw labeled, shaded, normal curves for each part of this question.
[Figure: normal curve with the horizontal axis marked at 55, 60, 65, 70, 75, 80, and 85 inches]
11. Yao Ming, 姚明, from China, who plays basketball for the Houston Rockets
of the National Basketball Association, is 7’ 6” tall. What percent of adult males
are at least as tall as Yao?
15. SAT mathematical reasoning scores are normally distributed with mean 500
and standard deviation 100. What percent of SAT scores are (a) below 750, (b) above
750, (c) between 400 and 600 (easy question!), (d) above 550?
The percentile rank of a score is the percent of scores at or below that score. The normal
curve model is continuous, based on a curve rather than a histogram, and in the model
the probability of getting any one exact score is zero. So for continuous models, the
percentile rank is simply the percent of scores below the score of interest.
16. What is the percentile rank of an SAT score of (a) 450 (b) 550 (c) 650
(d) 750?
17. IQ scores for some tests are normally distributed with average 100 and
standard deviation 15. What percent of people have an IQ (a) below 100? (b) below 85?
(c) below 50? (d) above 50?
18. Assume that light bulb life expectancy is normally distributed. If light bulbs
have a mean life expectancy of 1000 hours, with standard deviation 100 hours, what is
the chance of a light bulb lasting (a) less than 800 hours? (b) more than 800 hours?
19. (a) If boxes of Raisin Bran are supposed to weigh 24 ounces, and the
standard deviation of their weights is ½ ounce, what is the chance of a box of Raisin Bran
weighing less than 22 ounces? (b) So if you bought a box of Raisin Bran labeled 24
ounces and found that it only weighed 22 ounces, would you have evidence to question
the accuracy of the packaging machine?
20. You work for quality control in a manufacturing plant. A certain machine is
making rods that need to be 11.25 cm ± 0.05 cm long. By measuring and calculating you
have determined that the standard deviation of these rod lengths is 0.02 cm. If the
machine is producing rods with average 11.25 cm, what percent of the rods will be too
long or too short? Turn in a labeled, shaded normal curve with this question.

21. Compare standardized test results. Assume the grades are normally
distributed. Which is a higher percentile, getting a 78 on a test with average 70 and
standard deviation 4 or getting a 95 on a test with average 85 and standard deviation 10?