Chapter 10. Sampling and Sampling
Distributions
Simple Random Sampling and Sampling
Distributions
An important step in statistical analysis is the extraction of a “correct” sample. The process of obtaining such a sample is called sampling, and sample design is the method used to collect samples.
There are multiple methods of sampling:
Simple random sampling
Systematic sampling
Stratified sampling
Cluster sampling
Simple random sampling is assumed here.
If a sample of size n is drawn from a finite population of size N, there are NCn possible samples, and each has probability 1/NCn of being selected.
For an infinite population, the values of the sample must be drawn independently from the same population (distribution).
If we are interested in a particular parameter, these different samples may lead to different statistics. The distribution of these values is referred to as the sampling distribution.
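The count of possible samples and the equal selection probability under simple random sampling from a finite population can be illustrated with a short Python sketch. The function names and the numbered population below are hypothetical, chosen only for illustration; `random.sample` draws without replacement, so every subset of size n is equally likely.

```python
import math
import random


def count_samples(N, n):
    """Number of distinct samples of size n from a population of size N (NCn)."""
    return math.comb(N, n)


def draw_simple_random_sample(population, n, rng=None):
    """Draw n items without replacement; each subset of size n is equally likely."""
    rng = rng or random.Random()
    return rng.sample(population, n)


# Hypothetical population of N = 5 numbered units, sample size n = 2.
N, n = 5, 2
total = count_samples(N, n)
print("possible samples:", total)
print("probability of any one sample:", 1 / total)

sample = draw_simple_random_sample(list(range(1, N + 1)), n, random.Random(0))
print("one simple random sample:", sample)
```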
Simple Random Sampling Distributions
Suppose an office has 5 people whose working experiences are {3, 5, 7, 9, 11}. If a random sample is taken to calculate the average working experience, what is the distribution of the sample means? Note that µ = 7 and σ² = 8.
If a sample of size 2 is taken, there are 5C2 = 10 possible samples: {3,5}, {3,7}, {3,9}, {3,11}, {5,7}, {5,9}, {5,11}, {7,9}, {7,11}, and {9,11}; the averages, in increasing order, are: 4, 5, 6, 6, 7, 7, 8, 8, 9, and 10.
If a sample of size 4 is taken, there are 5C4 = 5 possible samples. What about samples of sizes 3, 5, and 1?
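Sampling distributions of the sample mean $\bar{x}$:
n=1
$\bar{x}$   3     5     7     9     11
Prob.       1/5   1/5   1/5   1/5   1/5
n=2
$\bar{x}$   4     5     6     7     8     9     10
Prob.       1/10  1/10  2/10  2/10  2/10  1/10  1/10
n=4
$\bar{x}$   6     6.5   7     7.5   8
Prob.       1/5   1/5   1/5   1/5   1/5
n=5
$\bar{x}$   7
Prob.       1
[Figure: bar chart of the n=2 sampling distribution, with $\bar{x}$ = 4, 5, ..., 10 on the horizontal axis and probabilities of 1/10 or 2/10 on the vertical axis]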
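The sampling distributions for this example can be reproduced by enumerating every possible sample. The following is a minimal sketch using exact fractions; the function name `sampling_distribution` is an illustrative choice, not something from the course material. It also covers the sample sizes left as a question (3, 5, and 1).

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [3, 5, 7, 9, 11]  # working experiences from the example


def sampling_distribution(values, n):
    """Exact distribution of the sample mean over all equally likely samples of size n."""
    samples = list(combinations(values, n))
    counts = Counter(Fraction(sum(s), n) for s in samples)
    # Each sample has probability 1 / (number of samples).
    return {mean: Fraction(c, len(samples)) for mean, c in sorted(counts.items())}


for n in range(1, 6):
    dist = sampling_distribution(population, n)
    print(f"n={n}:", {float(m): str(p) for m, p in dist.items()})
```

Probabilities are printed in lowest terms, so 2/10 appears as 1/5.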
The Mean and Variance of Sampling Distributions
Just like any probability distribution, we can compute the means and variances of the above sampling distributions as follows:
Sample Size                      n=1   n=2   n=4   n=5
Mean $\mu_{\bar{x}}$             7     7     7     7
Variance $\sigma_{\bar{x}}^2$    8     3     0.5   0
The mean of the sampling distribution equals the population mean.
The variance of the sampling distribution is smaller than the population variance (for n > 1).
For random samples of size n taken from a population with mean µ and standard deviation σ, the mean and standard deviation of the sampling distribution are
$\mu_{\bar{x}} = \mu$; $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ or $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \cdot \sqrt{\frac{N-n}{N-1}}$
if n is not a small proportion of N. The factor $\sqrt{\frac{N-n}{N-1}}$ is called the finite population correction factor.
Also, from now on, $\sigma_{\bar{x}}$ is called the standard error of the mean. Why is it named so?
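As a check on the formula with the finite population correction, the sketch below computes the mean and variance of the sample mean by brute-force enumeration for the office example and compares them with $\frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}$. The helper names are hypothetical; exact fractions are used so the comparison involves no rounding.

```python
from fractions import Fraction
from itertools import combinations

population = [3, 5, 7, 9, 11]
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance


def mean_and_variance_of_sample_means(values, n):
    """Mean and variance of the sample mean over all samples of size n."""
    means = [Fraction(sum(s), n) for s in combinations(values, n)]
    m = sum(means) / len(means)
    v = sum((x - m) ** 2 for x in means) / len(means)
    return m, v


def variance_with_fpc(sigma2, N, n):
    """sigma^2 / n multiplied by the squared correction factor (N - n) / (N - 1)."""
    return Fraction(sigma2) / n * Fraction(N - n, N - 1)


for n in (1, 2, 4, 5):
    m, v = mean_and_variance_of_sample_means(population, n)
    print(f"n={n}: mean={m}, variance={v}, formula={variance_with_fpc(sigma2, N, n)}")
```

For n = 2, for instance, the formula gives (8/2)(3/4) = 3, which agrees with the enumerated variance.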
The Central Limit Theorem (CLT)
When both n and N are small, not much can be said about the sampling distribution.
If a random sample of size n is drawn from an infinite population with mean µ and standard deviation σ, then when n is large the sample mean $\bar{x}$ can be viewed as a normal random variable. That is, as n → ∞,
$\bar{x} \to N\left(\mu, \frac{\sigma^2}{n}\right)$ and $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \to N(0, 1)$
How large is large enough? n ≥ 30 for $\bar{x}$. This can be checked with Monte Carlo simulations.
CLT in Graph: the Average Number of Rolling Dice
[Figure: four bar charts of the probability distribution of the average of die rolls. Panels: Roll a die once (6); Roll a die twice (11); Roll a die 3 times (16); Roll a die 4 times (21)]
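The dice illustration can be reproduced numerically. The sketch below is an exact enumeration rather than a Monte Carlo simulation: it builds the distribution of the sum of several fair six-sided dice one roll at a time and rescales it to the average. The function names are hypothetical. The number of distinct averages it reports (6, 11, 16, 21) coincides with the numbers in parentheses in the panel titles, which suggests those numbers count the possible outcomes.

```python
from fractions import Fraction


def distribution_of_sum(num_rolls):
    """Exact distribution of the sum of num_rolls fair six-sided dice."""
    dist = {0: Fraction(1)}
    for _ in range(num_rolls):
        new = {}
        for total, p in dist.items():
            for face in range(1, 7):
                key = total + face
                new[key] = new.get(key, Fraction(0)) + p * Fraction(1, 6)
        dist = new
    return dist


def distribution_of_average(num_rolls):
    """Same probabilities, with each sum divided by the number of rolls."""
    sums = distribution_of_sum(num_rolls)
    return {Fraction(total, num_rolls): p for total, p in sums.items()}


for k in (1, 2, 3, 4):
    avg = distribution_of_average(k)
    peak = max(avg.values())
    print(f"{k} roll(s): {len(avg)} possible averages, peak probability {float(peak):.3f}")
```

As the number of rolls grows, the distribution moves from flat (one roll) to triangular (two rolls) to an increasingly bell-shaped form, which is the behavior the CLT describes.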
Sampling Distributions of Other Statistics
The sample median: the distribution tends to be normal, with mean equal to the population median and standard deviation $1.25\sigma/\sqrt{n}$. This implies that the sample median has a larger variation than the sample mean; it also requires larger samples.
The sample variance: the distribution also tends to be normal with large samples; it follows a different distribution for small samples.
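The larger variation of the sample median can be examined with a small Monte Carlo experiment. This sketch assumes a normal population, the setting in which the 1.25 factor is the standard large-sample result; the sample size, number of repetitions, and seed are arbitrary illustrative choices.

```python
import random
import statistics


def simulate_standard_errors(n=101, reps=4000, mu=0.0, sigma=1.0, seed=1):
    """Estimate the standard deviations of the sample mean and sample median
    from repeated samples of size n drawn from an assumed normal population."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.pstdev(means), statistics.pstdev(medians)


se_mean, se_median = simulate_standard_errors()
print(f"SE of mean   ~ {se_mean:.4f}")
print(f"SE of median ~ {se_median:.4f}")
print(f"ratio        ~ {se_median / se_mean:.3f}")
```

The ratio should land near 1.25, with some simulation noise. For non-normal populations the factor can differ.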
HW: 10.33, 34, 36, 42, 43 and 45.