Statistics 11/Economics 40
Lecture 13
Distribution of the Sample Mean (5.2)
1. Introduction
Conceptually, Chapter 5.2 says the same things as Chapter 5.1; we are just working with means now
instead of counts and proportions.
2. Sampling Distributions for Means (5.2)
Suppose we draw a simple random sample of size n from a large population. Call the observed values
X1, X2, ..., Xn.
An example might be -- draw a simple random sample (SRS) of 25 stocks from 7,497 currently traded
stocks on the NYSE, AMEX & NASD. Measure the average percentage change from the sample of 25
and compare it to the population average.
Some Stata output for the sample:
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
chgprice |      25     17.3368    23.54894     -13.8      76.01
And from the population from which it was drawn:
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
chgprice |    7497    12.27294      28.133    -89.36        525
A statistic: The mean of the sample of 25 is 17.3368, and it is just the familiar x-bar (from Chapter 1.2).
You could define x-bar = (X1 + X2 + ... + Xn)/n.
x-bar can be thought of as the mean from a single sample selected at random from all possible samples that
could have been generated from the population. It could also be thought of as a random variable -- it's an
outcome of a random experiment (sample).
The expected value of x-bar is µx, the mean of the population from which the X values are drawn. In other
words, the mean of all sample means should be equal to the population mean. We can check this using a simulation.
If I were to draw 10,000 samples of size 25 (with replacement) from our population of 7,497 stocks this is
the result:
. summ
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
chgprice |   10000     12.2264    5.588577   -4.1552    45.1392
This is the overall average of the 10,000 sample means. We got 12.2264, which is very close to µx = 12.27294.
x-bar is considered an unbiased estimator of µx when it comes from a random sample. If your samples are
not random, this relationship will not hold. For our sample of 25 stocks, the mean of the sample is
17.3368 and the mean of the population is 12.27294; any single sample mean can miss µx, but the sample
means average out to µx.
The theoretical standard deviation of all possible x-bars from all possible samples is σ/√n, where sigma is
the standard deviation of the population. In our data here, sigma is 28.133, so the theoretical standard
deviation for a distribution of means from samples of size 25 should be 28.133/√25 = 5.6266.
We can check whether this holds true by examining the results of a simulation (you will do this in
Lab #3; see the handout). From the output above, the standard deviation of the 10,000 sample means
(from samples of size 25) is 5.588577, again very close to what we would expect in theory.
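The simulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lecture's stock data are not available here, so the population below is a synthetic, right-skewed stand-in of the same size, and the variable names are made up for this example. The lecture's own numbers come from Stata, not from this code.

```python
import math
import random
import statistics

rng = random.Random(13)  # fixed seed so the run is repeatable

# Synthetic, right-skewed stand-in for the 7,497 stock returns.
# This is NOT the lecture's data set; it only mimics its size and rough spread.
population = [rng.expovariate(1 / 28.0) - 16.0 for _ in range(7497)]
mu = statistics.fmean(population)       # population mean
sigma = statistics.pstdev(population)   # population standard deviation

n, reps = 25, 10_000
# Each entry is the mean of one sample of size n drawn with replacement.
sample_means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
theoretical_sd = sigma / math.sqrt(n)

print(f"population mean       = {mu:.4f}")
print(f"mean of sample means  = {mean_of_means:.4f}")
print(f"sigma / sqrt(n)       = {theoretical_sd:.4f}")
print(f"SD of sample means    = {sd_of_means:.4f}")
```

If the theory holds, the first two printed values should nearly agree, and so should the last two, just as 12.2264 sits near 12.27294 and 5.588577 sits near 5.6266 in the lecture's output.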
So note:
A sample has a mean x-bar and a standard deviation s.
A population has a mean µx and a standard deviation σ.
A sampling distribution, or a distribution of all possible sample statistics (in this case a
mean), also has a mean µx but a standard deviation σ/√n.
Your sample is just one realization of all possible samples from a population.
The standard deviation σ/√n of the SAMPLE MEAN will be smaller than the standard deviation for
individual measurements (as in Chapter 1.3). In other words, it is easier to predict the mean of many
observations than it is to predict the value of a single observation (or to predict the average of small
samples). What is causing this? Examine the formula for the standard deviation of the sampling
distribution and note the effect of sample size on the standard deviation.
Some things to consider
How close is x-bar to µx? In other words, how accurate will our guesses be? To answer this, you will
need to know the standard deviation of the population σ and the sample size n.
Note how the standard deviation of the sampling distribution changes with sample size. For big samples,
the standard deviation of the sample mean will be small, and for small samples it will be large.
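To see the effect of sample size concretely, the short sketch below evaluates σ/√n for a few sample sizes, using the population standard deviation of 28.133 from the stock data. The function name `standard_error` and the particular sample sizes are choices made for this illustration only.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sample mean for samples of size n."""
    return sigma / math.sqrt(n)

sigma = 28.133  # population SD from the lecture's stock data

# Quadrupling the sample size halves the standard deviation of the sample mean.
for n in (1, 25, 100, 400):
    print(f"n = {n:4d}   sigma/sqrt(n) = {standard_error(sigma, n):.4f}")
```

Because n sits under a square root, cutting the standard deviation of the sample mean in half takes four times as many observations, not twice as many.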
3. The Central Limit Theorem and the Normal Distribution
Given a simple random sample of size n from a population having mean µ and standard deviation σ, the
sample mean x-bar will come from a sampling distribution of means with mean µ and standard deviation σ/√n.
A. Basic Distributional Result
If the original population had a normal distribution, then the distribution of the sample mean will also be
normally distributed. This is good, because it means we can use the normal table (Table A) to make
inferences with a statement of probability or chance.
Example. IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. A sample
of 25 persons is drawn. How likely is it to get a sample average of 108 or more? (0.38%) How likely is it
for the first score to be 108 or more? (29.8%)
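The two IQ probabilities can be checked without Table A. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are made up for this example. Exact values can differ slightly from table look-ups, because the table rounds z to two decimal places.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 100, 15, 25

single_score = NormalDist(mu, sigma)            # distribution of one IQ score
sample_mean = NormalDist(mu, sigma / sqrt(n))   # distribution of the mean of 25 scores

p_mean = 1 - sample_mean.cdf(108)     # P(x-bar >= 108)
p_single = 1 - single_score.cdf(108)  # P(one score >= 108)

print(f"P(sample mean of 25 >= 108) = {p_mean:.4f}")
print(f"P(single score >= 108)      = {p_single:.4f}")
```

The same cutoff of 108 is a rare event for an average of 25 scores but an ordinary one for a single score, because the sample mean has standard deviation 15/√25 = 3 rather than 15.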
B. The Central Limit Theorem (p. 401)
No matter what the distribution of the original population, if the sample size is "large" (your textbook
believes that samples greater than or equal to 15 are large), the distribution of the possible sample means
will be close to the normal distribution. It is a very powerful theorem and it is the reason why the normal
distribution is so well studied.
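One way to watch the theorem at work is to start from a clearly non-normal population and measure how lopsided the sample means are as n grows. The sketch below is an illustration under assumptions that are not in the lecture: the population is exponential (strongly right-skewed), skewness is used as a rough yardstick for "close to normal" (a normal curve has skewness 0), and the helper names are hypothetical.

```python
import random
import statistics

rng = random.Random(401)  # fixed seed so the run is repeatable

def skewness(values):
    """Average cubed z-score: 0 for a symmetric curve, positive for a right tail."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def simulated_means(n, reps=10_000):
    """Means of `reps` samples of size n from an exponential population."""
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]

skews = {n: skewness(simulated_means(n)) for n in (1, 15, 60)}
for n, g in skews.items():
    print(f"n = {n:3d}   skewness of sample means = {g:.3f}")
```

The skewness should shrink toward 0 as n increases. For a population this skewed, some lopsidedness may remain at n = 15, so how large is "large" depends on how far the population is from normal.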
C. Summary
Take a simple random sample of size n from a population with mean µ and standard deviation σ. Let x-bar
be the average of the sample values. If either
the original population is normally distributed OR the sample size n is sufficiently large,
then x-bar will be (at least approximately) normally distributed with expected value µ and standard
deviation σ/√n.
If the histogram for the population follows a normal curve, or if the sample size is large enough each
time, then the histogram for the possible values of x-bar will follow a normal curve that has a mean of µ
and a standard deviation of σ/√n.
Thus, about 68% of the x-bars will be within one standard deviation of µ, about 95% of the x-bars will be
within two standard deviations, and about 99.7% of the x-bars will be within 3 SD.
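These percentages come straight from the normal curve, and they can be applied to the stock example. The sketch below computes the share of a normal distribution within k standard deviations and the matching intervals µ ± k·σ/√n, using µ = 12.27294 and σ = 28.133 from the population output and n = 25. The variable names are chosen for this illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 12.27294, 28.133, 25   # population values from the lecture, n = 25
se = sigma / sqrt(n)                  # standard deviation of x-bar

z = NormalDist()  # standard normal curve
shares = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    low, high = mu - k * se, mu + k * se
    print(f"within {k} SD: {share:.4f} of x-bars, from {low:.2f} to {high:.2f}")
```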
Let's go back to our first sample of 25 with its mean of 17.3368. The chance of getting a mean that large
or larger is found as follows: first calculate Z = (17.3368 − 12.27294) / (28.133/√25) = 5.06386/5.6266,
about .90, then do a look-up and get .8159, and then take 1 − .8159 to get .1841. So the chance of drawing
a sample of size 25 with an average of 17.33 or higher was a little over 18%.
NOTE: The Central Limit Theorem only applies to the distribution of possible sample averages (i.e. the
sampling distribution). It says nothing about the distribution of individual scores in either the sample or
the population.
D. Example
You are interested in valuing an e-brokerage for your employers. The owners claim clients have an
average brokerage account of $19,000 with a standard deviation of $10,000 (clearly not normal). They
allow you to draw a random sample of 150 clients from their database, and you get a sample average of
$16,500. If the claim is truthful, how likely is it to get a sample average of $16,500 or less?
Well, the standard deviation of all sample means is 10000/√150 = 816.497 dollars, so a single sample mean
of $16,500 or less has z = (16500 − 19000)/816.497 = −3.06 or below. The chance of that is about 0.11%, or
something like 1 in 1,000 samples.
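The brokerage calculation can be reproduced step by step. The sketch below assumes the owners' claim is true (mean $19,000, standard deviation $10,000) and relies on the central limit theorem to treat the mean of 150 accounts as approximately normal; the variable names are made up for this example.

```python
from math import sqrt
from statistics import NormalDist

claimed_mean = 19_000   # owners' claimed average account, in dollars
sigma = 10_000          # claimed standard deviation, in dollars
n = 150                 # number of sampled clients
observed = 16_500       # sample average actually obtained

se = sigma / sqrt(n)                  # standard deviation of the sample mean
z = (observed - claimed_mean) / se    # how many SDs below the claim we landed
p = NormalDist().cdf(z)               # chance of a sample mean this low or lower

print(f"standard deviation of sample means = {se:.3f}")
print(f"z = {z:.2f}")
print(f"P(sample mean <= 16,500) = {p:.4f}")
```

A probability this small means either a very unlucky sample was drawn or the claimed average of $19,000 is too high.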
Things to note
The e-brokerage clients' accounts need not be normally distributed, given the central limit theorem. The
CLT also lets you use normal calculations to figure out the chance of getting an average of $16,500 or
less if the claimed average is $19,000.