Sampling Distributions and
The Central Limit Theorem
The BIG PICTURE of statistics is to make inferences about an
UNKNOWN population using SAMPLE information.
For example, we use the sample mean as an estimate of the population
mean in a study.
This chapter tells us how well a sample statistic, such as the sample
mean, performs when it is used to estimate the unknown
population mean.
Recall the difference between ‘statistic’ and ‘parameter’.

                      Population parameter    Sample statistic
Mean                  μ                       X̄
Variance              σ²                      s²
Standard deviation    σ                       s

Population parameters do not change, since they describe the entire
population.
Sample statistics vary from sample to sample. Therefore, a sample statistic
such as the sample mean is a random variable.
For each sample we can compute a sample mean, which will differ
from sample to sample, and we can study the distribution of these
sample means to see how sample means behave.
To characterize the behavior of sample means, we need to study the
distribution of all possible sample means.
Sampling distribution of sample mean
In a real-world situation, the entire population is often not available. All
we can do is use sample information to make an
estimate or prediction of the population characteristics.
How do we know if our estimate or prediction is a 'good' one?
Example: To estimate the average weekly grocery spending for a family
in a city, a random sample of 25 families is surveyed. The sample
average is $80 and the s.d. is $30.
Is $80 a 'good' estimate of the average grocery spending per
family in the city?
What if we take another random sample of 25 families and
obtain an average of $90? Which one is better?
Q: How do we decide on a good way of estimating the average family grocery
spending?
The Idea:
• Study the behavior of all potential sample means, each
computed from the spending of 25 families.
• Then use the general pattern of behavior of the sample
means to figure out how much confidence we have when
we make our estimate or prediction.
The behavior of all possible sample means can be described,
in statistics, by the distribution of the sample mean.
Based on the distribution of the sample mean, when we take a
sample and obtain only one sample mean, we can tell how
close the observed sample mean is likely to be to the unknown
population mean.
So the first thing is to learn the distributional behavior of all
possible sample means. This is called the
Sampling distribution of Sample Mean
The distributional behavior of sample means is
characterized by answering four questions:
1. How do we determine the distribution of the sample mean?
2. What is the center of the distribution?
3. What is the variation of the distribution?
4. What is the shape of the distribution?
The sampling distribution of the sample mean is the
probability distribution of all possible sample means,
where each sample mean is obtained from a random sample
of n observations drawn from a population with mean
μ and standard deviation σ.
NOTE: The distribution of the sample mean depends on (a) the
population from which we draw the sample and (b) the
sample size, n.
How do we determine the sampling distribution of
sample mean?
[Figure: a POPULATION of individual SAT scores. Six samples are drawn;
each sample is a random sample of 8 SAT scores from the entire
population, and each gives one sample mean: x̄1, x̄2, x̄3, x̄4, x̄5, x̄6.]
Sample Means: each sample mean is computed as x̄ = Σ xi / 8.
In this example, you see only six samples and six sample means. That is not
enough to demonstrate the distribution of sample means. If we continue
to go through the same process and obtain, say, 1000 sample means,
then we can construct a histogram of these sample means. The
distribution of the sample mean is shown by this histogram.
OUR GOAL is to describe the distribution of all possible sample means.
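The repeated-sampling process described above can be sketched in Python. This is only an illustration under stated assumptions: the population of SAT scores is made up (10,000 values generated around 500 with s.d. 100), and the names `population`, `sample_mean` and `sample_means` are hypothetical. It draws 1000 samples of 8 scores each and prints a rough text histogram of the 1000 sample means.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 SAT scores (made-up values for illustration)
population = [random.gauss(500, 100) for _ in range(10_000)]

def sample_mean(pop, n):
    """Draw a random sample of size n (without replacement) and return its mean."""
    return statistics.fmean(random.sample(pop, n))

# Repeat the sampling process 1000 times, each time with 8 scores
sample_means = [sample_mean(population, 8) for _ in range(1000)]

# Crude text histogram of the sample means, using bins 25 points wide
bins = {}
for m in sample_means:
    lo = int(m // 25) * 25
    bins[lo] = bins.get(lo, 0) + 1
for lo in sorted(bins):
    print(f"{lo:4d}-{lo + 25:4d} {'*' * (bins[lo] // 5)}")
```

The histogram should be centered near the population mean and be visibly narrower than the spread of individual scores.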
A graphical illustration of distribution of population and
distribution of sample mean
Figure A represents the weights for a sample of 26 pebbles, each weighed to
the nearest gram. Figure B represents the mean weights of random samples
of 3 pebbles each, with the mean weights rounded to the nearest gram. One
value is circled in each distribution. Is there a difference between what is
represented by the X circled in A and the X circled in B? Please select
the best answer from the list below.
a) No, in both Figure A and Figure B, the X represents one pebble that
weighs 6 grams.
b) Yes, Figure A has a larger range of values than Figure B
c) Yes, the X in Figure A is the weight for a single pebble, while the X in
Figure B represents the average weight of 3 pebbles.
Dot plot (A): each dot
represents the weight of
an individual pebble.
This is the distribution of
the population
Dot plot (B): each dot
represents the AVERAGE
weight of THREE pebbles in
the sample.
This is the sampling distribution
of the sample mean, X̄
Must-know facts about the Sampling Distribution of X̄
Suppose a random sample of size n is drawn from a population with mean μ
and s.d. σ. Then we can describe the distribution of the sample mean X̄
based on the following two situations:
(A) If the population from which we draw our sample is normal:
X̄ will be normal with mean μ and s.d. σ/√n
(B) If the population from which we draw our sample is not normal:
(B-1) When the sample size n is small (< 30):
X̄ has a distribution shape similar to that of the population,
and its mean will be μ and its s.d. will be σ/√n
(B-2) When n is large (>= 30), then, regardless of the distribution of the
original population from which we draw our samples,
X̄ will be approximately normal with mean μ and s.d. σ/√n
[Fact (B-2) is called the Central Limit Theorem]
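The claims about the center and spread of the sample mean can be checked with a small simulation. The sketch below assumes a hypothetical right-skewed population (exponential with mean 3000, so its s.d. is also 3000); the constants `MU` and `SIGMA` and the helper `sample_mean` are illustrative names, not part of the original slides. For each sample size it compares the observed s.d. of 5000 sample means with σ/√n.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

MU = 3000.0     # mean of the hypothetical skewed population
SIGMA = 3000.0  # an exponential population has s.d. equal to its mean

def sample_mean(n):
    """Mean of n independent draws from the exponential population."""
    return statistics.fmean([random.expovariate(1 / MU) for _ in range(n)])

results = {}
for n in (5, 30, 100):
    means = [sample_mean(n) for _ in range(5000)]
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(f"n={n:3d}  mean of sample means={results[n][0]:8.1f}  "
          f"s.d. of sample means={results[n][1]:7.1f}  "
          f"sigma/sqrt(n)={SIGMA / math.sqrt(n):7.1f}")
```

In each row the mean of the sample means should sit close to μ and the observed s.d. close to σ/√n, whatever the shape of the population.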
The Distribution of Sample Mean
When Population is NOT Normal
[FACT B-2: Central Limit Theorem] [Similar exam questions]
Take a random sample of n observations from a population that is NOT normal. Then:
(1) The center (the mean) of the sample means
= the center (mean) of the population: μ_X̄ = μ
(2) The spread (s.d.) of the sample means = the spread (s.d.) of the population / √n:
s.d.(X̄) = σ_X̄ = σ/√n = s.d.(Population)/√n = s.d.(X)/√n
(3) If the population is not normal (it could be skewed to the right, to the left, or otherwise), then the
shape of the distribution of the sample mean depends on the sample size n.
The larger n is, the closer the distribution shape of the sample mean is to Normal. This is what
is called the Central Limit Theorem.
A general guideline is that when n > 30, we say the sampling distribution is
approximately normal.
[Figure: two curves, each centered at μ.
Population: skewed to the right, with mean μ and s.d. σ.
Distribution of sample means: still skewed, but not as skewed as the
population, with mean μ and s.d. σ/√n.]
FACT B-2: The Central Limit Theorem
If the population from which the samples are drawn is NOT
Normal, the shape of the sampling distribution of sample
mean:
(a) If the sample size n is small, the distribution shape of the sample
mean is similar to the population distribution shape.
(b) If the sample size n is large, the distribution shape of the sample mean
is closer to normal. In general, when n is larger than 30, the
distribution of the sample mean is approximately NORMAL,
regardless of the distribution shape of the population.
X̄ is approximately Normal with mean μ_X̄ = μ (the population mean),
and the s.d. of X̄ is σ_X̄ = σ/√n = (population s.d.)/√n
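How the shape changes with n can also be illustrated by simulation. The sketch below is a rough check rather than a proof: it assumes a hypothetical exponential population (strongly skewed to the right) and measures the skewness of 5000 simulated sample means for several sample sizes. The helper names `skewness` and `sample_means` are illustrative.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

def skewness(values):
    """Sample skewness: the average cubed z-score (0 for a symmetric shape)."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])

def sample_means(n, reps=5000):
    """Means of `reps` samples of size n from a right-skewed population."""
    return [statistics.fmean([random.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]

# n = 1 corresponds to the population itself (each "mean" is one observation)
skew_by_n = {n: skewness(sample_means(n)) for n in (1, 5, 30, 100)}
for n, sk in skew_by_n.items():
    print(f"n={n:3d}  skewness of sample means = {sk:.2f}")
```

The skewness should shrink toward 0 as n grows, which is the sense in which the distribution of the sample mean gets closer to normal.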
Example : Sampling Distribution of Sample Mean
[Similar Exam problems]
1. Suppose we draw a random sample of size n = 10 from bank
accounts in a large city. We are interested in the average amount
of savings per 10 accounts.
Individual savings do not follow a normal curve. In fact, the
distribution of individual savings is very skewed to the right.
Suppose we know the population average savings is μ = $3000
and σ = $2000.
Q: What would be the distribution of sample means, where each is the
average of 10 accounts drawn from this population?
Answer
ANS: The sampling distribution of sample means,
each the average of 10 account savings drawn
from this very skewed population, would be as follows.
The shape of the distribution of sample means is still
skewed, but less skewed than the distribution of individual
account savings. (This is Fact B-1.)
The mean of the distribution of sample means is
μ_X̄ = $3000,
and the standard deviation is σ_X̄ = σ/√n = 2000/√10 = $632.46
Example : Sampling Distribution of
Sample Mean [Similar Exam problems]
2. Suppose we draw a random sample of size n = 50 from bank accounts in
a large city. We are interested in the average amount of savings per 50
accounts.
Individual savings do not follow a normal curve. In fact, the distribution
of individual savings is very skewed to the right. Suppose we know the
population average savings is μ = $3000 and σ = $2000. Question: What
would be the distribution of sample means, where each is the average of 50
accounts drawn from this population?
Answer
ANS: The sampling distribution of sample means,
each the average of 50 account savings drawn
from this very skewed population, would be as follows.
The shape of the distribution of sample means is
approximately normal. (This is the Central Limit
Theorem, Fact B-2.)
The mean of the distribution of sample means is
μ_X̄ = $3000,
and the standard deviation is σ_X̄ = σ/√n = 2000/√50 = $282.84
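The arithmetic in the two bank-account examples can be reproduced with a few lines of Python. The function name `sd_of_sample_mean` is just an illustrative label for σ/√n.

```python
import math

def sd_of_sample_mean(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 2000.0  # population s.d. of individual savings, from the examples

for n in (10, 50):
    print(f"n={n}: s.d. of the sample mean = ${sd_of_sample_mean(sigma, n):.2f}")
# n=10: s.d. of the sample mean = $632.46
# n=50: s.d. of the sample mean = $282.84
```

Increasing the sample size from 10 to 50 cuts the s.d. of the sample mean by a factor of √5, about 2.24.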
Some Important Points related to Sampling
distribution of Sample Mean
• The difference between the distribution of the sample mean and the original
population distribution is that the variation of the sample mean gets smaller
as the sample size gets larger:
s.d.(X̄) = σ_X̄ = σ/√n = s.d.(Population)/√n = s.d.(X)/√n
• The formula s.d.(X̄) = σ_X̄ = σ/√n tells us that sample means will be closer to
the population mean when the sample size is larger.
• Applying the empirical rule to the distribution of the sample mean (when it is
normal or approximately normal) tells us that about 68% of sample means will be
within one σ/√n of the population mean, μ, and about 95% of sample means will be
within two σ/√n of μ. This works like magic, since it
allows us to say that one unit of error in using the sample mean to
estimate the population mean is σ/√n.
• As you can see, when the sample size is large, this error becomes smaller.
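The 68% and 95% figures, and the way one unit of error σ/√n shrinks with n, can be checked with Python's standard-library `statistics.NormalDist`. The sketch assumes the sample mean is (approximately) normal; the value σ = 2000 is borrowed from the bank-account examples purely for illustration.

```python
import math
from statistics import NormalDist

z = NormalDist()  # standard normal distribution

# Probability that a normal variable falls within 1 or 2 s.d. of its mean
within_one = z.cdf(1) - z.cdf(-1)
within_two = z.cdf(2) - z.cdf(-2)
print(f"within one s.d.: {within_one:.4f}")  # 0.6827
print(f"within two s.d.: {within_two:.4f}")  # 0.9545

# One unit of error, sigma / sqrt(n), for increasing sample sizes
sigma = 2000.0
errors = {n: sigma / math.sqrt(n) for n in (10, 50, 100, 400)}
for n, err in errors.items():
    print(f"n={n:3d}  sigma/sqrt(n) = {err:7.2f}")
```

Because the error depends on √n, quadrupling the sample size only halves it.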
Examples: calculate probabilities based on the sampling
distribution of sample mean.
[Similar exam questions]
A random sample of size n = 25 is chosen from a normal population with
known mean, μ = 8, and s.d., σ = 4.
(a) Determine the sampling distribution of sample mean.
(b) Determine the probability of having sample mean less than 7.
(c) Determine the probability of having sample mean between 7 and 9.
(d) What is the 75th percentile of the sample mean?
Answer to Q(b)
From Q(a) we have X̄ ~ N(8, 0.8), where the s.d. is σ/√n = 4/√25 = 0.8.
Q(b) asks for P(X̄ < 7). Note that the mean = 8 and the s.d. = 0.8. Now
use your TI calculator or the table to find the answer.
Answer is .10565
Answer to Q(c)
From Q(a) we have X̄ ~ N(8, 0.8)
Q(c) asks for P(7 < X̄ < 9). Note that the mean = 8 and the s.d. = 0.8;
then use your TI calculator or the table to get the answer.
Answer is .7887
Answer to Q(d)
From Q(a) we have X̄ ~ N(8, 0.8)
Q(d) asks for a value x of the sample mean such that
P(X̄ < x) = .75. Use mean = 8 and s.d. = 0.8 in your TI
calculator or the table to get the answer.
Answer is: the 75th percentile = 8.5396
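As an alternative to a TI calculator or the table, the same three answers can be obtained with Python's standard-library `statistics.NormalDist`. This is one possible way to do the calculation, not the method used in the slides; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 8, 4, 25
# Sampling distribution of the sample mean: normal with s.d. sigma / sqrt(n) = 0.8
xbar = NormalDist(mu, sigma / math.sqrt(n))

p_less_than_7 = xbar.cdf(7)                  # Q(b)
p_between_7_9 = xbar.cdf(9) - xbar.cdf(7)    # Q(c)
percentile_75 = xbar.inv_cdf(0.75)           # Q(d)

print(f"P(Xbar < 7)     = {p_less_than_7:.5f}")  # 0.10565
print(f"P(7 < Xbar < 9) = {p_between_7_9:.4f}")  # 0.7887
print(f"75th percentile = {percentile_75:.4f}")  # 8.5396
```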
Exercises for Sampling Distribution
[Similar Exam Problems].
1. In a marketing study of gas prices for a State, a random
sample of 16 prices will be observed. Suppose the
individual prices follow a normal distribution with a mean
price of $1.45 and a standard deviation of $0.20.
(a) What will be the distribution of the sample mean for samples of size n
= 16?
(b) Suppose you observe 16 prices from a middle-sized city and
compute the average of these 16 prices, and the
average price is $1.38. What is the chance that the
average price from a sample of 16 is lower than $1.38?
(c) The city manager claims that the average price of 16 stations,
$1.38, is extremely low compared with all other averages,
each from 16 prices. Is this claim correct?
(d) Can you find the 40th percentile of the average price of 16
prices?
2. In a household income survey for a State, a random
sample of 64 households will be observed. We do not know the
distribution of individual household incomes, but we do have
information about the overall average household income, μ =
$45,000, and s.d. σ = $16,000.
(a) Based on this information, what can you say about the
distribution of the sample means, each from 64 household
incomes?
(b) Is an average household income of $52,000 from 64
households an indication of an unusually high average?
(c) Find the 95th percentile of average household incomes from
64 households.
3. Suppose that the mean time for an oil change at a “10-minute
oil change joint” is 11.4 minutes with a standard
deviation of 3.2 minutes.
(a) If a random sample of n = 35 oil changes is selected,
describe the sampling distribution of the sample mean.
(b) If a random sample of n = 35 oil changes is selected, what
is the probability that the mean oil change time is less than 11
minutes?
(c) If a random sample of n = 50 oil changes is selected, what
is the probability that the mean oil change time is less than 11
minutes?
(d) What effect did increasing the sample size have on the
probability?
4. In a marketing study of gas prices for a State, a random
sample of 16 prices will be observed. Suppose the
individual prices follow a normal distribution with a mean price of
$1.45 and a standard deviation of $0.20.
(a) What will be the distribution of the sample mean, where each sample is a
random sample of n = 16 prices?
(b) Suppose you observe 16 prices from a middle-sized city and
compute the average of these 16 prices, and the average
price is $1.38. What is the chance that the average price
from a sample of 16 is lower than $1.38?
(c) The city manager claims that the average price of 16 stations,
$1.38, is extremely low compared with all other averages,
each from 16 prices. Is this claim correct?
(d) Can you find the 40th percentile of the average price of 16 prices?
5. A random sample of n = 64 observations is to be selected.
Determine whether each of the following statements is correct or not:
• The sampling distribution of the sample mean in this case is the histogram of
the 64 observations that are to be collected.
• The average of all possible sample means must be equal to the true
population mean, that is, E(X̄) = μ. (The center of the distribution of X̄ is
the population mean, μ. This is the property called UNBIASED.)
• Since each sample mean is an average of 64 observations, different
samples will result in different sample averages. Therefore, there will be
variation among sample means.
• The standard deviation of the sample mean, σ_X̄, is less than σ, the population
standard deviation.
• The shape of the sampling distribution of the sample mean cannot be close to
normal because the original population distribution shape is not known.
• The shape of the sampling distribution of the sample mean will be close to
normal because the sample size is large.
• The Central Limit Theorem says: when the population is normal, the shape of the
sampling distribution of the sample mean is close to normal, regardless of the
size of the sample.