Unit 6 Summary
Unit 6 covers the Normal Curve and the Central Limit Theorem. What do we mean
by "normal"? We encounter the normal (bell) curve in our daily lives but may not recognize
it. I would be willing to say that your commute times to work over the course of a month
or two are approximately normally distributed. The average miles per gallon you get with
your car is probably normally distributed. Beyond the fact that we encounter the normal
curve in our daily lives, it has tremendous value in statistics.
The properties of the normal curve are:
• The curve is unimodal, that is, one mode over the mean
• The curve is symmetrical, that is, the median is the same as the mean and mode
• The curve approaches but never touches the horizontal axis
We can describe a distribution (remember unit 4) using the mean and the standard
deviation. A very important feature of the normal curve is the applicability of the Empirical
Rule. Using the 68-95-99.7 Rule or Empirical Rule (page 205) we can determine the
percentage of the area under the curve between 1, 2 or 3 standard deviations on either side
of the mean. Approximately 68% of the values fall within ± 1 standard deviation, 95% fall within
± 2 standard deviations, and 99.7% fall within ± 3 standard deviations. A major concept
that we will address in the next unit is the fact that the area under the curve is equal to
the probability. Therefore, if 68% of the scores in a distribution are between ± 1 standard
deviation, then the probability of randomly selecting a score from the distribution in that
range is also 68%.
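The 68-95-99.7 percentages can be checked numerically. The course itself uses StatCrunch, so the following is only a minimal Python sketch; the function name area_within is made up for this illustration. It relies on the standard fact that the area within k standard deviations of the mean of a normal curve is erf(k/√2).

```python
import math


def area_within(k: float) -> float:
    """Area under a normal curve within k standard deviations of the mean."""
    # For any normal distribution, P(|Z| <= k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    # Prints roughly 68%, 95% and 99.7%, matching the Empirical Rule.
    print(f"within ± {k} SD: {area_within(k):.1%}")
```

Because area equals probability, the same numbers are the probabilities of randomly selecting a score within 1, 2 or 3 standard deviations of the mean.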
The 68-95-99.7 rule only applies to data values that are 1, 2, or 3 standard deviations
away from the mean. We can generalize this rule if we know precisely how many standard
deviations from the mean a particular data value is. The number of standard deviations a
data value is above or below the mean is called a standard score or z-score. The z-score
distribution has a mean of 0 and a standard deviation of 1, because a data value at the mean
is 0 standard deviations away from the mean (z = 0) and a value that is one standard deviation
above the mean has a z-score of exactly 1. The z-score locates a single
score on a distribution of scores in terms of how many standard deviations away from the
mean the score is. The formula for the z-score is:
Z = (X – μ) / σ
So if we have a distribution that has a mean of 225 and a standard deviation of 50, we can
find the z score for a value of 300 by using the formula:
Z = (300 - 225) / 50 = 75/50 = 1.5
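The same arithmetic can be written as a small function. This is a sketch in Python rather than StatCrunch, and the name z_score is only an illustrative choice, not something from the textbook.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# The example from the text: mean 225, standard deviation 50, value 300.
z = z_score(300, 225, 50)
print(z)  # 1.5
```

A value below the mean gives a negative z-score; for instance, z_score(150, 225, 50) is -1.5.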
We can now use StatCrunch {Stat -- Calculators -- Normal} to get the percentile, but we could
also use Appendix A: z-Score Tables. Again, StatCrunch will do this much more quickly for you,
with less risk of error. You should read that as: I recommend you use StatCrunch!
To use the table we need the z-score. For our z-score of +1.5 we look in the first column
for the X.X part of our z-score and across the columns for the .XX values. In the column
headings, the first digit after the decimal point is a placeholder for the tenths digit taken
from the first column. Looking on page 447, we find the 1.5 row in the first column, then use
the .00 column and find that the percentile is .9332, or the 93.32nd percentile. If we had
used StatCrunch, the Normal calculator would show the same percentile:
[Figure: StatCrunch Normal calculator output]
You will notice that StatCrunch is more precise than the table because it shows 7
decimal places instead of just 4! As we move toward our next unit, probability, we can make
a probability statement about that score. In our example, the probability of getting a score of
300 or less is 0.93.
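For readers who want to check the table lookup without StatCrunch, here is a minimal sketch using NormalDist from Python's standard statistics module (Python 3.8 or later). It assumes, as the example does, a normal distribution with mean 225 and standard deviation 50; the variable names are illustrative only.

```python
from statistics import NormalDist

# Distribution from the example: mean 225, standard deviation 50.
scores = NormalDist(mu=225, sigma=50)

# Area to the left of 300 = probability of a score of 300 or less.
p = scores.cdf(300)
print(round(p, 4))  # 0.9332

# Same answer from the standard normal curve at z = 1.5.
p_from_z = NormalDist().cdf(1.5)
print(round(p_from_z, 4))  # 0.9332
```

Working from the raw score or from the z-score gives the same area, which is why the z-table can serve every normal distribution.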
Section 5.3 introduces us to the Central Limit Theorem. The Central Limit Theorem (CLT) is
applicable when the sample size is large (n > 30). In this case, the shape of the distribution
of x is irrelevant and the CLT can be used to describe the sampling distribution of the mean. Here is a
demonstration of a sampling distribution and how the CLT works for us. I want you to
consider the following population of scores: 2, 4, 6, 8
Now, I want you to calculate the mean (μ) and standard deviation (σ) of this population.
Remember that this is a population so the formula for the Standard Deviation does not use
N-1 but just N.
μ = ΣX / N = 20 / 4 = 5
σ = √[ (ΣX² – (ΣX)²/N) / N ]
  = √[ (120 – 20²/4) / 4 ]
  = √[ (120 – 100) / 4 ]
  = √(20/4)
  = √5
  = 2.236
These are population values that usually no one knows. They are what we will try to estimate
when we make inferences about the population in Unit 8.
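The population mean and standard deviation above can be reproduced with a few lines of Python. This is a sketch only, with illustrative variable names; pstdev is the standard-library function that divides by N (the population formula), while stdev would divide by N – 1.

```python
from math import sqrt
from statistics import fmean, pstdev

population = [2, 4, 6, 8]
N = len(population)

# Mean: sum of the scores divided by N.
mu = fmean(population)

# Computational formula from the text: sqrt((ΣX² - (ΣX)²/N) / N).
sum_x = sum(population)                    # 20
sum_x_sq = sum(x * x for x in population)  # 120
sigma = sqrt((sum_x_sq - sum_x ** 2 / N) / N)

print(mu, round(sigma, 3))  # 5.0 2.236

# The library's population standard deviation agrees with the hand formula.
sigma_check = pstdev(population)
```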
Next let’s look at the Central Limit Theorem.
First take all possible samples (with replacement) of size two (n = 2) from this population.
There will be 16 such samples. They are represented below by Pick 1 and Pick 2. For
example, our first sample is 2, 2. All possible samples are listed. We will then treat the
16 sample means as our population and calculate the mean and the standard deviation of
this set of data.
Pick 1   Pick 2   Mean   Mean²   Variance   Standard Deviation
   2        2       2       4        0            0.000
   2        4       3       9        2            1.414
   2        6       4      16        8            2.828
   2        8       5      25       18            4.243
   4        2       3       9        2            1.414
   4        4       4      16        0            0.000
   4        6       5      25        2            1.414
   4        8       6      36        8            2.828
   6        2       4      16        8            2.828
   6        4       5      25        2            1.414
   6        6       6      36        0            0.000
   6        8       7      49        2            1.414
   8        2       5      25       18            4.243
   8        4       6      36        8            2.828
   8        6       7      49        2            1.414
   8        8       8      64        0            0.000
 Sum               80     440
Remember these are not estimated but actual values since we have all possible samples of
size 2. This is the population of all these samples.
The mean of the means from this distribution is 80/16 = 5, which equals the
population mean. So we have shown that the mean of the means is equal to µ or the
population mean.
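The list of 16 samples and the mean of their means can be generated rather than typed out. The following Python sketch uses itertools.product to enumerate every ordered pair drawn with replacement; the variable names are illustrative.

```python
from itertools import product
from statistics import fmean

population = [2, 4, 6, 8]

# Every ordered sample of size 2 drawn with replacement: (2, 2), (2, 4), ...
samples = list(product(population, repeat=2))
sample_means = [fmean(s) for s in samples]

print(len(samples))         # 16
print(sum(sample_means))    # 80.0
print(fmean(sample_means))  # 5.0, the same as the population mean
```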
Now let’s see about what the theorem tells us about the standard deviation. First we will
calculate it from our data and then use the Theorem formula to see if it matches.
From our data, we calculate the Population Standard Deviation:
Sx = √[ (ΣX² – (ΣX)²/N) / N ]
   = √[ (440 – (80)²/16) / 16 ]   (notice we divide by N since this is a population)
   = √(40/16)
   = √2.5
   = 1.58
Now, we will calculate what the Central Limit Theorem tells us the standard deviation will
be. It is
σx = σ / √n
   = 2.236 / √2
   = 2.236 / 1.4142
   = 1.58
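The two standard deviation calculations can be compared side by side in code. This is a minimal Python sketch with illustrative names: the first value is computed directly from the 16 sample means, and the second comes from the σ/√n formula.

```python
from itertools import product
from math import sqrt
from statistics import fmean, pstdev

population = [2, 4, 6, 8]
n = 2  # sample size

# Standard deviation of all 16 sample means, dividing by N (a population).
sample_means = [fmean(s) for s in product(population, repeat=n)]
sd_of_means = pstdev(sample_means)

# What the Central Limit Theorem formula gives: sigma / sqrt(n).
clt_value = pstdev(population) / sqrt(n)

print(round(sd_of_means, 2), round(clt_value, 2))  # 1.58 1.58
```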
Notice that the two results are identical. Now, if you graph all the means from our example
in a histogram, you will have the Sampling Distribution for n = 2 from our population, and you
will see that it looks somewhat like a normal distribution. We will cover sampling distributions
later in Unit 8. Also, you need to remember that the Theorem applies to any distribution
only when n > 30, and here we have only n = 2.
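A quick way to see the shape of this sampling distribution without graphing software is to count how often each mean occurs. The sketch below is in Python with illustrative names and prints a sideways text histogram of the 16 sample means.

```python
from collections import Counter
from itertools import product

population = [2, 4, 6, 8]

# Mean of every ordered sample of size 2 drawn with replacement.
means = [(a + b) / 2 for a, b in product(population, repeat=2)]
counts = Counter(means)

# One row per distinct mean, one '#' per sample that produced it.
for value in sorted(counts):
    print(f"{value:>3.0f} | {'#' * counts[value]}")
```

The counts rise from 1 at a mean of 2 to 4 at a mean of 5 and fall back to 1 at a mean of 8. The shape is symmetric with a single peak at the population mean, which is why it looks somewhat normal even though n is only 2.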
Once we have the value for Z we can use StatCrunch to give us the probability.