Sampling Distribution of Sample Mean

Introduction to Inference
Sampling Distributions
Inference with a Single Observation
[Diagram: Population (parameter μ, unknown) → sampling → observation Xi; inference runs from the observation back to the population]
• Each observation Xi in a random sample is a
representative of the unobserved variables in the population
• How different would this observation be if we took a
different random sample?
Inference with Sample Mean
[Diagram: Population (parameter μ, unknown) → sampling → sample → estimation → statistic x̄; inference runs from the statistic back to the population]
• Sample mean is our estimate of population mean
• How much would the sample mean change if we took
a different sample?
• Key to this question: Sampling Distribution of x̄
Sampling Distribution of a Sample Statistic
• Sampling Distribution of a Sample Statistic: The
distribution of values for a sample statistic obtained
from repeated samples, all of the same size and all
drawn from the same population
Example: Consider the set {1, 2, 3, 4}:
1) Make a list of all samples of size 2 that can be drawn
from this set (Sample with replacement)
2) Construct the sampling distribution for the sample mean
for samples of size 2
3) Construct the sampling distribution for the minimum for
samples of size 2
Table of All Possible Samples
This table lists all
possible samples of size
2, the mean for each
sample, and the
probability of each
sample occurring (all
equally likely)
# of possible samples
(with replacement) = N^n = 4^2 = 16
Sample    x̄     Minimum   Probability
{1, 1}    1.0   1         1/16
{1, 2}    1.5   1         1/16
{1, 3}    2.0   1         1/16
{1, 4}    2.5   1         1/16
{2, 1}    1.5   1         1/16
{2, 2}    2.0   2         1/16
{2, 3}    2.5   2         1/16
{2, 4}    3.0   2         1/16
{3, 1}    2.0   1         1/16
{3, 2}    2.5   2         1/16
{3, 3}    3.0   3         1/16
{3, 4}    3.5   3         1/16
{4, 1}    2.5   1         1/16
{4, 2}    3.0   2         1/16
{4, 3}    3.5   3         1/16
{4, 4}    4.0   4         1/16
Sampling Distribution
• Summarize the information in the previous table to obtain
the sampling distribution of the sample mean and the
sample minimum:
Sampling Distribution
of the Sample Mean
x̄      P(x̄)
1.0    1/16
1.5    2/16
2.0    3/16
2.5    4/16
3.0    3/16
3.5    2/16
4.0    1/16
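The enumeration in this example is small enough to check by machine. The sketch below is one possible way to do it, not part of the original slides: it lists every sample of size 2 drawn with replacement from {1, 2, 3, 4} and tallies the sampling distribution of the sample mean and of the sample minimum. The helper name sampling_distribution is made up for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]
n = 2

# All ordered samples of size n drawn with replacement: N**n = 16 of them.
samples = list(product(population, repeat=n))
prob = Fraction(1, len(samples))  # every sample is equally likely


def sampling_distribution(statistic):
    """Map each value of the statistic to its total probability."""
    counts = Counter(statistic(s) for s in samples)
    return {value: count * prob for value, count in sorted(counts.items())}


mean_dist = sampling_distribution(lambda s: Fraction(sum(s), len(s)))
min_dist = sampling_distribution(min)

print("Sample mean:")
for value, p in mean_dist.items():
    print(f"  {float(value):.1f}  {p}")

print("Sample minimum:")
for value, p in min_dist.items():
    print(f"  {value}  {p}")
```

Fraction reduces to lowest terms, so 2/16 prints as 1/8 and 4/16 as 1/4. The same table of 16 samples also yields the distribution of the minimum, which the enumeration tallies alongside the mean.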
Histogram: Sampling Distribution
of the Sample Mean
[Histogram: P(x̄) on the vertical axis (0.00 to 0.25) against x̄ on the horizontal axis (1.0 to 4.0)]
Sampling Distribution of Sample Mean
• Distribution of values taken by the statistic in all possible
samples of size n from the same population
• Model assumption: our observations xi are sampled
from a population with mean μ and variance σ²
[Diagram: a population with unknown parameter μ yields Sample 1, Sample 2, ..., Sample 8, ..., each of size n; each sample gives its own x̄. What is the distribution of these values?]
Mean of Sample Mean
• First, we examine the center of the sampling
distribution of the sample mean.
• Center of the sampling distribution of the sample
mean is the unknown population mean:
mean( X ) = μ
• Over repeated samples, the sample mean will, on
average, be equal to the population mean
– no guarantees for any one sample!
Variance of Sample Mean
• Next, we examine the spread of the sampling
distribution of the sample mean
• The variance of the sampling distribution of the
sample mean is
variance( X ) = σ²/n
• As sample size increases, variance of the sample
mean decreases!
• Averaging over many observations is more accurate than
just looking at one or two observations
• Comparing the sampling distribution of the
sample mean when n = 1 (parent population)
vs. n = 10
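For a tiny population, both results (center μ, spread σ²/n) can be checked exactly rather than taken on faith. The sketch below assumes the population {1, 2, 3, 4} from the earlier example, with each value equally likely and sampling with replacement; the function name is made up for this illustration.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]
N = len(population)

# Population mean and variance (each value equally likely).
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N


def mean_and_variance_of_sample_mean(n):
    """Exact mean and variance of the sample mean over all N**n samples."""
    means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
    center = sum(means) / len(means)
    spread = sum((m - center) ** 2 for m in means) / len(means)
    return center, spread


for n in (1, 2, 3, 4):
    center, spread = mean_and_variance_of_sample_mean(n)
    print(f"n = {n}: mean = {center}, variance = {spread}, sigma^2/n = {sigma2 / n}")
```

For every n the mean stays at μ = 5/2 while the variance equals σ²/n, with σ² = 5/4 for this population.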
Law of Large Numbers
• Remember the Law of Large Numbers:
• If one draws independent samples from a
population with mean μ, then as the number of
observations increases, the sample mean x̄ gets
closer and closer to the population mean μ
• This is easier to see now since we know that
mean(x̄) = μ
variance(x̄) = σ²/n → 0 as n gets large
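A quick simulation makes the Law of Large Numbers visible. The sketch below is an illustration only and assumes a population that is not in the slides: rolls of a fair six-sided die, for which μ = 3.5. The seed is fixed so the run is repeatable.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

MU = 3.5  # population mean of a fair six-sided die (assumed population)


def sample_mean(n):
    """Mean of n independent rolls of a fair die."""
    return sum(random.randint(1, 6) for _ in range(n)) / n


errors = {}
for n in (10, 1_000, 100_000):
    x_bar = sample_mean(n)
    errors[n] = abs(x_bar - MU)
    print(f"n = {n:>7}: sample mean = {x_bar:.4f}, |error| = {errors[n]:.4f}")
```

With n = 100,000 the standard deviation of x̄ is about 0.005, so the sample mean should land very close to 3.5.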
Example
• Population: seasonal home-run totals for
7032 baseball players from 1901 to 1996
• Take different samples from this population and
compare the sample mean we get each time
• In real life, we can’t do this because we don’t
usually have the entire population!
Sample Size                      Mean    Variance
100 samples of size n = 1        3.69    46.8
100 samples of size n = 10       4.43    4.43
100 samples of size n = 100      4.42    0.43
100 samples of size n = 1000     4.42    0.06
Population Parameter: μ = 4.42
Distribution of Sample Mean
• We now know the center and spread of the
sampling distribution for the sample mean.
• What about the shape of the distribution?
• If our data x1, x2, …, xn follow a Normal
distribution, then the sample mean x̄ will also
follow a Normal distribution!
Example
• Mortality in US cities (deaths/100,000 people)
• This variable seems to approximately follow a
Normal distribution, so the sample mean will
also approximately follow a Normal distribution
irrespective of the sample size drawn.
Central Limit Theorem
• What if the original data doesn’t follow a Normal
distribution?
• HR/Season for sample of baseball players
• If the sample is large enough, it doesn’t matter!
Central Limit Theorem
• If the sample size is large enough (a common rule of
thumb is n ≥ 30), then the sample mean x̄ has an
approximately Normal distribution
• This is true no matter what the shape of
the distribution of the original data, provided the
population variance is finite!
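One way to watch the Central Limit Theorem at work is to track how lopsided the distribution of x̄ is as n grows. The sketch below is an illustration under stated assumptions: the population is exponential with mean 1 (strongly right-skewed, skewness 2), and "closeness to Normal" is measured only by sample skewness, which is 0 for a Normal distribution. The helper names are made up for this example.

```python
import random
from statistics import fmean

random.seed(2)  # fixed seed so the run is repeatable


def skewness(values):
    """Sample skewness: third central moment divided by variance ** 1.5."""
    m = fmean(values)
    m2 = fmean([(v - m) ** 2 for v in values])
    m3 = fmean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5


def sample_means(n, reps=5000):
    """Means of `reps` samples of size n from an exponential population."""
    return [fmean([random.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]


skew = {n: skewness(sample_means(n)) for n in (1, 10, 100)}
for n, s in skew.items():
    print(f"n = {n:>3}: skewness of sample means = {s:.2f}")
```

For this population the skewness of x̄ is 2/√n in theory, so it should fall from about 2 at n = 1 toward 0.2 at n = 100.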
Example: Home Runs per Season
• Take many different samples from the seasonal HR
totals for a population of 7032 players
• Calculate sample mean for each sample
[Histograms of the sample means for n = 1, n = 10, and n = 100]
Important Definition & Theorem
Sampling Distribution of Sample Means
If all possible random samples, each of size n, are taken from any
population with a mean μ and a standard deviation σ, the sampling
distribution of sample means will:
1. have a mean μ_x̄ equal to μ
2. have a standard deviation σ_x̄ equal to σ/√n
Further, if the sampled population has a normal distribution, then the
sampling distribution of x̄ will also be normal for samples of all
sizes
Central Limit Theorem
The sampling distribution of sample means will more closely
approximate a normal distribution as the sample size increases.
Summary
• The mean of the sampling distribution of x̄ is equal to the mean of the
original population: μ_x̄ = μ
• The standard deviation of the sampling distribution of x̄ (also called the
standard error of the mean) is equal to the standard deviation of the
original population divided by the square root of the sample size: σ_x̄ = σ/√n
Notes:
– The distribution of x̄ becomes more compact as n increases. (Why?)
– The variance of x̄: σ_x̄² = σ²/n
• The distribution of x̄ is (exactly) normal when the original population
is normal
• The CLT says: the distribution of x̄ is approximately normal regardless
of the shape of the original distribution, when the sample size is large
enough!
Standard Error of the Mean
Standard Error of the Mean: The standard deviation of
the sampling distribution of sample means: σ_x̄ = σ/√n
Notes:
• The n in the formula for the standard error of the mean is
the size of the sample
• The proof of the Central Limit Theorem is beyond the
scope of this course
• The following example illustrates the results of the
Central Limit Theorem
Graphical Illustration of the Central Limit Theorem
[Figure: four panels showing the original population and the distribution of x̄ for n = 2, n = 10, and n = 30; horizontal axes marked at 10, 20, 30]
7.3 ~ Applications of the Central Limit Theorem
• When the sampling distribution of the
sample mean is (exactly) normally
distributed, or approximately normally
distributed (by the CLT), we can answer
probability questions by converting x̄ to a
standard score z and using the standard
normal distribution
Example 2
• Example: Consider a normal population with μ = 50
and σ = 15. Suppose a sample of size 9 is selected at
random. Find:
1) P(45 ≤ x̄ ≤ 60)
2) P(x̄ ≤ 47.5)
Solutions: Since the original population is normal, the
distribution of the sample mean is also (exactly) normal
1) μ_x̄ = μ = 50
2) σ_x̄ = σ/√n = 15/√9 = 15/3 = 5
Example 2
z = (x̄ − μ)/(σ/√n)
[Figure: normal curve for x̄ centered at 50 with 45 and 60 marked (z = −1.00 and z = 2.00); the shaded areas are 0.3413 and 0.4772]
P(45 ≤ x̄ ≤ 60) = P((45 − 50)/5 ≤ z ≤ (60 − 50)/5)
= P(−1.00 ≤ z ≤ 2.00)
= 0.3413 + 0.4772 = 0.8185
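Example 2
z = (x̄ − μ)/(σ/√n)
[Figure: normal curve for x̄ centered at 50 with 47.5 marked (z = −0.50); the areas are 0.3085 (left tail) and 0.1915 (between 47.5 and 50)]
P(x̄ ≤ 47.5) = P((x̄ − 50)/5 ≤ (47.5 − 50)/5)
= P(z ≤ −0.50)
= 0.5000 − 0.1915
= 0.3085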
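Both probabilities in Example 2 can be checked without a z table. The sketch below is one possible check, not the method used in the slides: it uses NormalDist from Python's standard statistics module to model the sampling distribution of x̄ directly, with mean 50 and standard error 5.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50, 15, 9

# Standard error of the mean: sigma / sqrt(n) = 15 / 3 = 5.
standard_error = sigma / sqrt(n)

# Sampling distribution of the sample mean (exactly normal here,
# because the population itself is normal).
x_bar = NormalDist(mu, standard_error)

p_between = x_bar.cdf(60) - x_bar.cdf(45)  # P(45 <= x-bar <= 60)
p_below = x_bar.cdf(47.5)                  # P(x-bar <= 47.5)

print(f"P(45 <= x-bar <= 60) = {p_between:.4f}")
print(f"P(x-bar <= 47.5)     = {p_below:.4f}")
```

The results agree with the table-based answers to about three decimal places; small differences in the fourth decimal come from rounding in the z table.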