CHAPTER 7
INTRODUCTION TO SAMPLING DISTRIBUTIONS
CENTRAL LIMIT THEOREM (SECTION 7.2 OF UNDERSTANDABLE STATISTICS)
The Central Limit Theorem says that if x is a random variable with any distribution having mean µ and
standard deviation σ, then the distribution of sample means x̄ based on random samples of size n is such that for
sufficiently large n:
(a) The mean of the x̄ distribution is approximately the same as the mean of the x distribution.
(b) The standard deviation of the x̄ distribution is approximately σ/√n.
(c) The x̄ distribution is approximately a normal distribution.
Furthermore, as the sample size n becomes larger and larger, the approximations mentioned in (a), (b), and (c)
become better.
We can use SPSS to demonstrate the Central Limit Theorem. The computer does not prove the theorem. A
proof of the Central Limit Theorem requires advanced mathematics and is beyond the scope of an introductory
course. However, we can use the computer to gain a better understanding of the theorem.
To demonstrate the Central Limit Theorem, we need a specific x distribution. One of the simplest is the
uniform probability distribution.
Copyright © Houghton Mifflin Company. All rights reserved.
Part IV: SPSS Guide
The normal distribution is the usual bell-shaped curve, but the uniform distribution is the rectangular or
box-shaped graph. The two distributions are very different.
The uniform distribution has the property that all subintervals of the same length inside the interval 0 to 9
have the same probability of occurrence no matter where they are located. This means that the uniform
distribution on the interval from 0 to 9 could be represented on the computer by selecting random numbers from
0 to 9. Since all numbers from 0 to 9 would be equally likely to be chosen, we say we are dealing with a
uniform (equally likely) probability distribution. Note that when we say we are selecting random numbers from
0 to 9, we do not just mean whole numbers or integers; we mean real numbers in decimal form such as
2.413912, and so forth.
Because the interval from 0 to 9 is 9 units long and because the total area under the probability graph must
be 1, the height of the uniform probability graph must be 1/9. The mean of the uniform distribution on the
interval from 0 to 9 is the balance point. Looking at the figure, it is fairly clear that the mean is 4.5. Using
advanced methods of statistics, it can be shown that for the uniform probability distribution x between 0 and 9,
µ = 4.5 and σ = 3√3/2 ≈ 2.598
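The values of µ and σ quoted above agree with the standard formulas for a uniform distribution on an interval from a to b. The symbols a and b are introduced here only for illustration, with a = 0 and b = 9:

```latex
\mu = \frac{a + b}{2} = \frac{0 + 9}{2} = 4.5,
\qquad
\sigma = \frac{b - a}{\sqrt{12}} = \frac{9}{2\sqrt{3}} = \frac{3\sqrt{3}}{2} \approx 2.598
```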
The figure shows us that the uniform x distribution and the normal distribution are quite different. However,
using the computer we will construct one hundred sample means x̄ from the x distribution using a sample size of
n = 40. We will use 100 rows (for the 100 samples) and 40 columns (sample size is 40). We can vary the
number of samples as well as the sample size n according to how many rows and columns we use.
We will see that even though the uniform distribution is very different from the normal distribution, the
histogram of the sample means is somewhat bell shaped. We will also see that the mean of the x̄ distribution is
close to the predicted mean of 4.5 and that the standard deviation is close to σ/√n, or 2.598/√40, or 0.411.
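The predicted standard deviation can be checked with a few lines of arithmetic. The sketch below is written in Python rather than SPSS, purely as an illustration. The names MU, SIGMA, and predicted_sd_of_means are hypothetical helpers, not part of the guide.

```python
import math

# Population parameters of the uniform distribution on the interval 0 to 9.
MU = 4.5
SIGMA = 9 / math.sqrt(12)  # equals 3*sqrt(3)/2, about 2.598


def predicted_sd_of_means(n):
    """Standard deviation of the sample-mean distribution, sigma / sqrt(n)."""
    return SIGMA / math.sqrt(n)


# Sample sizes used in the example and the lab activities.
for n in (5, 20, 40):
    print(f"n = {n:2d}: predicted mean = {MU}, "
          f"predicted sd = {predicted_sd_of_means(n):.3f}")
```

For n = 40 this gives about 0.411, matching the value above. For n = 5 it gives about 1.162.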
Example
In order for us to get familiar with the procedure, let us first work with 100 samples using a sample size of
n = 5. Follow these steps, and note that your results will vary.
First, name the first column (variable) x1. Enter a number (any number) in the 100th cell of the first column
to define the variable size (that is, the number of samples). Then use Transform > Compute five times
(since our sample size is n = 5). Transform > Compute works with one target variable at a time, and
we need random numbers from the uniform distribution in 5 columns (that is, 5 variables), so we must
run it once for each column. Each time we use the formula
xi = RV.UNIFORM(0, 9),
where i = 1, 2, 3, 4, 5.
Note that the Transform > Compute dialog box preserves the numeric expression used most recently.
Therefore the expression RV.UNIFORM(0, 9) only needs to be entered once. After that, all you have to do in
the Transform > Compute dialog box is change the target variable name, that is, change the value of i.
Displayed below is our fifth use of Transform > Compute with this formula. Here i = 5, so the formula
reads x5 = RV.UNIFORM(0, 9).
Technology Guide Understandable Statistics, 8th Edition
Click on OK. Another hundred random numbers will be generated in the fifth column under the variable name
x5. We now have 100 random samples of size 5 from the uniform distribution on (0, 9).
Next, let us take the mean of each of the 100 rows (5 columns across) and store the values under the variable
name xbar. Use Transform > Compute with the formula xbar = MEAN(x1, x2, x3, x4, x5) as shown below.
Click on OK. The results follow.
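For readers who want to see the same data layout outside SPSS, here is a minimal Python sketch of the steps above. It assumes that random.uniform is an acceptable stand-in for RV.UNIFORM(0, 9). The seed value and the names rng, rows, NUM_SAMPLES, and SAMPLE_SIZE are hypothetical and do not come from the guide.

```python
import random
from statistics import mean

rng = random.Random(2024)  # hypothetical seed, used only to make the sketch repeatable

NUM_SAMPLES = 100  # rows of the data sheet (one row per sample)
SAMPLE_SIZE = 5    # columns x1 through x5

# Each row is one random sample of size 5 from the uniform distribution on 0 to 9.
rows = [[rng.uniform(0, 9) for _ in range(SAMPLE_SIZE)] for _ in range(NUM_SAMPLES)]

# Counterpart of xbar = MEAN(x1, x2, x3, x4, x5): one mean per row.
xbar = [mean(row) for row in rows]

# Show the first three rows together with their means.
for row, m in zip(rows[:3], xbar[:3]):
    print("  ".join(f"{x:6.3f}" for x in row), "| xbar =", f"{m:.3f}")
```

Removing the seed argument makes each run produce different numbers, just as each use of Transform > Compute does.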
Let us now look at the mean and standard deviation of xbar (the sample means) as well as its histogram, using
the menu options Analyze > Descriptive Statistics > Frequencies. Uncheck “Display frequency table”, click
on “Charts” and select “Histogram”, then click on “Statistics” and select “Mean” and “Std deviation”. Click on
OK. The results follow.
Note that the histogram is already fairly close to bell shaped, even though the sample size is only 5. As the
sample size increases, the histogram of the sample means will look more like a normal distribution.
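The summary for n = 5 can also be approximated outside SPSS. The Python sketch below is only an illustration: it computes the mean and the sample standard deviation (n - 1 divisor) of 100 simulated sample means and prints a rough text histogram. The seed and the bin width of 1 are arbitrary assumptions, so the bars will not match the SPSS chart exactly.

```python
import random
from statistics import mean, stdev

rng = random.Random(7)  # hypothetical seed; without it, results vary from run to run

# 100 sample means, each from a sample of size 5 drawn uniformly from 0 to 9.
xbar = [mean([rng.uniform(0, 9) for _ in range(5)]) for _ in range(100)]

print(f"Mean of xbar    = {mean(xbar):.3f}")   # theorem predicts about 4.5
print(f"Std dev of xbar = {stdev(xbar):.3f}")  # theorem predicts about 1.162

# Text histogram with bins of width 1 covering the interval 0 to 9.
counts = [0] * 9
for m in xbar:
    counts[min(int(m), 8)] += 1  # int() truncates, so 4.7 falls in the bin 4-5
for left, count in enumerate(counts):
    print(f"{left}-{left + 1}: {'*' * count}")
```

The bars should pile up around the 4-5 bin and thin out toward 0 and 9, which is the rough bell shape described above.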
Now let us draw 100 random samples of size 40 from the uniform distribution on the interval from 0 to
9. The steps are the same as above, except that now we need to repeat Transform > Compute 40 times
with the formula
xi = RV.UNIFORM(0, 9),
where i = 1, 2, . . . , 40.
After that we compute the sample mean by xbar = MEAN(x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12,
x13, x14, x15, x16, x17, x18, x19, x20, x21, x22, x23, x24, x25, x26, x27, x28, x29, x30, x31, x32, x33, x34,
x35, x36, x37, x38, x39, x40). Carry out these steps, and the results follow. (Your results will vary.)
Now look at the mean and standard deviation of xbar (the sample means) as well as its histogram, using the
menu options Analyze > Descriptive Statistics > Frequencies. Uncheck “Display frequency table”, click on
“Charts” and select “Histogram”, then click on “Statistics” and select “Mean” and “Std deviation”. Click on
OK. The results follow.
Note the Mean and Std Dev are very close to the values predicted by the Central Limit Theorem. The
histogram for this sample does not appear very similar to a normal distribution. Let’s try another
sample. The following are the results.
This histogram looks more like a normal distribution. You will get slightly different results each time
you draw 100 samples.
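The run-to-run variation can be illustrated with a short Python sketch that repeats the size-40 experiment several times and compares each draw with the predicted values. This is an assumption-laden illustration, not the SPSS procedure: the function sample_means and the three seeds are hypothetical, and each seed simply stands for one fresh draw of 100 samples.

```python
import math
import random
from statistics import mean, stdev


def sample_means(num_samples, sample_size, rng):
    """Return one mean per sample, each sample drawn uniformly from 0 to 9."""
    return [
        mean([rng.uniform(0, 9) for _ in range(sample_size)])
        for _ in range(num_samples)
    ]


predicted_sd = (9 / math.sqrt(12)) / math.sqrt(40)  # sigma / sqrt(n), about 0.411

summaries = []
for seed in (1, 2, 3):  # hypothetical seeds, one per repeated draw
    xbar = sample_means(100, 40, random.Random(seed))
    summaries.append((mean(xbar), stdev(xbar)))
    print(f"draw {seed}: mean = {summaries[-1][0]:.3f}, sd = {summaries[-1][1]:.3f}")

print(f"predicted: mean = 4.500, sd = {predicted_sd:.3f}")
```

Each draw should land near the predicted values without matching them exactly, which is the behavior noted above.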
LAB ACTIVITIES FOR CENTRAL LIMIT THEOREM
1. Repeat the experiment of Example 1. That is, draw 100 random samples of size 40 each from the uniform
probability distribution between 0 and 9. Then take the mean of each of these samples and put the results
under the variable name xbar. Next use Analyze > Descriptive Statistics > Frequencies on xbar. How
do the mean and standard deviation of the distribution of sample means compare to those predicted by
the Central Limit Theorem? How does the histogram of the distribution of sample means compare to a
normal curve?
2. Next take 100 random samples of size 20 from the uniform probability distribution between 0 and 9.
Again put the means under the variable name xbar and then use Analyze > Descriptive Statistics >
Frequencies on xbar. How do these results compare to those in problem 1? How do the standard
deviations compare?
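As a cross-check on SPSS output for these activities, the comparison between the two sample sizes can be sketched in Python. This is a rough illustration under stated assumptions, not a substitute for doing the lab in SPSS: the seed and the names SIGMA and results are hypothetical, and the numbers will differ from any SPSS run.

```python
import math
import random
from statistics import mean, stdev

SIGMA = 9 / math.sqrt(12)  # standard deviation of the uniform distribution on 0 to 9
rng = random.Random(1)     # hypothetical seed for repeatability

results = {}
for n in (40, 20):
    # 100 sample means, each from a sample of size n.
    xbar = [mean([rng.uniform(0, 9) for _ in range(n)]) for _ in range(100)]
    results[n] = (mean(xbar), stdev(xbar))
    print(f"n = {n}: mean = {results[n][0]:.3f}, sd = {results[n][1]:.3f}, "
          f"predicted sd = {SIGMA / math.sqrt(n):.3f}")

# The theorem predicts sd(n = 20) / sd(n = 40) = sqrt(40 / 20) = sqrt(2), about 1.414.
print(f"Observed sd ratio = {results[20][1] / results[40][1]:.3f}")
```

Halving the sample size leaves the predicted mean at 4.5 but multiplies the predicted standard deviation by √2, so the observed ratio should fall somewhere near 1.414.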