Sampling
Why Sample?
On 8 January 2003, Air Midwest Flight 5481 from Douglas International
Airport in North Carolina stalled after take-off, crashed into a hangar
and burst into flames. All 21 people on board perished.
A subsequent investigation revealed that the weight of the passengers
was a factor that contributed to the crash.
This prompted the FAA to collect weight information from randomly
selected flights so that old assumptions about passenger weights could
be updated.
Sampling Distributions
Suppose that we draw all possible samples of size n from a
given population. Suppose further that we compute a
statistic (e.g., a mean) for each sample. The probability
distribution of this statistic is called a sampling distribution.
Population: 1, 2, 5

Sample    Mean    Probability
(1, 1)    1       1/9
(1, 2)    1.5     1/9
(1, 5)    3       1/9
(2, 1)    1.5     1/9
(2, 2)    2       1/9
(2, 5)    3.5     1/9
(5, 1)    3       1/9
(5, 2)    3.5     1/9
(5, 5)    5       1/9
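The sampling distribution for this small population can be checked by brute force. The following is a minimal Python sketch, not part of the original slides; the function name `sampling_distribution` is an illustrative assumption. It lists every ordered sample of size 2 drawn with replacement from the population 1, 2, 5 and collects the probability of each distinct sample mean.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sampling_distribution(population, n):
    """Exact distribution of the sample mean over all ordered samples
    of size n drawn with replacement (each sample equally likely)."""
    samples = list(product(population, repeat=n))
    counts = Counter(Fraction(sum(s), n) for s in samples)
    return {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}


dist = sampling_distribution([1, 2, 5], 2)
for sample_mean, prob in dist.items():
    print(float(sample_mean), prob)
```

Each of the nine samples has probability 1/9, so a mean that arises from two different samples (for example 1.5, from (1, 2) and (2, 1)) has probability 2/9.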
Sampling With Replacement
1. When selecting a relatively small sample from a large
population, it makes no significant difference whether we
sample with replacement or without replacement.
2. Sampling with replacement results in independent events
that are unaffected by previous outcomes. Independent
events are easier to analyse and result in simpler formulas.
Simple Random Sample
Advantages of Simple Random Sampling
• Every member of the population has an equal chance of being
represented in the sample
• The simple random sample should be representative of the
population. Theoretically the only thing that can compromise
its representativeness is luck
• If, by chance, the sample is not representative of the
population, the resulting random variation is called sampling error
Disadvantages of Simple Random Sampling
• A complete and up to date list of all the population is
required
• Such a list is usually not available for large populations
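As a minimal sketch of drawing a simple random sample in Python: the sampling frame `members` (ID numbers 1 to 100) and the helper name `simple_random_sample` are hypothetical, chosen only for illustration. The standard library's `random.sample` selects without replacement, so every member of the list has the same chance of being chosen.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Select n distinct members, each equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population, n)


members = list(range(1, 101))  # hypothetical complete list of the population
chosen = simple_random_sample(members, 10, seed=1)
print(chosen)
```

The sketch also shows the main disadvantage: it cannot run without the complete list `members`.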
Estimators
Population: 1, 2, 5
Population Mean (µ) = 8/3

Sample    Sample Mean ($\bar{x}$)
(1, 1)    1
(1, 2)    1.5
(1, 5)    3
(2, 1)    1.5
(2, 2)    2
(2, 5)    3.5
(5, 1)    3
(5, 2)    3.5
(5, 5)    5

Mean of Sampling Distribution = 8/3
The sample statistic targets the population parameter
Estimators
Population: 1, 2, 5
Population Standard Deviation (σ) = 1.6997

Sample    Sample SD (s)
(1, 1)    0
(1, 2)    0.707
(1, 5)    2.828
(2, 1)    0.707
(2, 2)    0
(2, 5)    2.121
(5, 1)    2.828
(5, 2)    2.121
(5, 5)    0

Mean of Sample Standard Deviations = 1.257
The sample statistic does not target the population parameter
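The two estimator tables can be reproduced with Python's `statistics` module. This is a sketch under the slides' setup (population 1, 2, 5; ordered samples of size 2 with replacement); the variable names are illustrative. `pstdev` divides by N (population formula) and `stdev` divides by n - 1 (sample formula).

```python
from itertools import product
from statistics import mean, pstdev, stdev

population = [1, 2, 5]
samples = list(product(population, repeat=2))  # the nine ordered samples

sigma = pstdev(population)                      # population SD, about 1.6997
mean_of_xbar = mean(mean(s) for s in samples)   # average of the sample means
mean_of_s = mean(stdev(s) for s in samples)     # average of the sample SDs

print(round(mean_of_xbar, 4), round(mean(population), 4))
print(round(mean_of_s, 4), round(sigma, 4))
```

The average of the sample means matches µ = 8/3, while the average of the sample standard deviations falls well short of σ, which is what "does not target" means here.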
Stratified Random Sampling
Advantages of Stratified Random Sampling
• Provides greater precision than a simple random sample of
the same size
• Smaller samples are required, thereby saving money
• Can guard against an unrepresentative sample
Disadvantages of Stratified Random Sampling
• May require more administrative effort than a simple random
sample
• A complete and up to date list of the population is required
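A minimal Python sketch of stratified random sampling follows. It assumes proportional allocation (the same fraction is drawn from every stratum), which is one common choice and is not specified in the slides; the strata `year_1` and `year_2` and the helper `stratified_sample` are hypothetical.

```python
import random


def stratified_sample(strata, fraction, seed=None):
    """Draw a simple random sample of the same fraction from each stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical population split into two strata of different sizes.
strata = {
    "year_1": [f"y1_{i}" for i in range(60)],
    "year_2": [f"y2_{i}" for i in range(40)],
}
picked = stratified_sample(strata, 0.1, seed=0)
print({name: len(members) for name, members in picked.items()})
```

Because every stratum contributes in proportion to its size, no group can be missed entirely, which is how the method guards against an unrepresentative sample.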
Uniform Population Distribution
[Figure: histogram of the population; x-axis: raw score (1 to 9), y-axis: frequency (1 to 6)]
What is the mean of this population? 5
What is the standard deviation of this population? $\sqrt{5} \approx 2.24$
Distribution of Sample Means: Samples of Size 2

Sample    Scores    Mean ($\bar{X}$)
1         2, 2      2
2         2, 4      3
3         2, 6      4
4         2, 8      5
5         4, 2      3
6         4, 4      4
7         4, 6      5
8         4, 8      6
9         6, 2      4
10        6, 4      5
11        6, 6      6
12        6, 8      7
13        8, 2      5
14        8, 4      6
15        8, 6      7
16        8, 8      8
Distribution of Sample Means from Samples of Size n = 2
[Figure: histogram; x-axis: sample mean (1 to 9), y-axis: frequency (1 to 6)]
$P(\bar{X} > 7) = ?$
$P(\bar{X} > 7) = \frac{1}{16} \approx 6\%$
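Population Distribution and Distribution of Sample Means
[Figure: two histograms side by side; left: population distribution (x-axis: raw score, 1 to 9); right: distribution of sample means (x-axis: sample mean, 1 to 9); y-axis: frequency (1 to 6)]
Population: $P(X > 7) = 25\%$
Sample means: $P(\bar{X} > 7) = 6\%$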
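Both probabilities can be checked by counting. The sketch below assumes the population consists of the four scores 2, 4, 6, 8, each equally likely, which is what the table of 16 samples suggests; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]  # assumed from the table of 16 samples
samples = list(product(population, repeat=2))

# Share of individual scores above 7.
p_score = Fraction(sum(x > 7 for x in population), len(population))

# Share of sample means above 7 (only the sample (8, 8) qualifies).
p_mean = Fraction(sum(sum(s) / 2 > 7 for s in samples), len(samples))

print(p_score, p_mean)
```

An extreme sample mean is much less likely than an extreme single score, because the sample mean exceeds 7 only when both draws are high.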
Cluster Sampling
Advantages of Cluster Sampling
• Inexpensive
• Limited resources can be allocated to a few randomly selected
clusters.
• Easy to implement
• Subjects are easily accessed
Disadvantages of Cluster Sampling
• Of all the probability sampling methods, this technique is
the least representative of the population.
• Individuals within a cluster tend to have similar
characteristics, so there is a chance that the sample will
contain an over-represented or under-represented cluster.
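A minimal Python sketch of one-stage cluster sampling follows: a few clusters are chosen at random and every member of each chosen cluster is included. The clusters (`school_0` to `school_7`) and the helper `cluster_sample` are hypothetical and serve only as an illustration.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep all of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sample cluster names
    return {name: clusters[name] for name in chosen}


# Hypothetical population: 8 schools with 5 students each.
clusters = {f"school_{i}": [f"s{i}_{j}" for j in range(5)] for i in range(8)}
picked = cluster_sample(clusters, 2, seed=0)
print(sorted(picked))
```

Only the list of clusters has to be known in advance, which is why the method is cheap; the cost is that two schools may not reflect the other six.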
Mean of Sample Means
Mean of Population = 5
Means of the 16 samples of size 2:
2    3    4    5
3    4    5    6
4    5    6    7
5    6    7    8
Mean of Sample Means = $\frac{80}{16} = 5$
Standard Deviation of Population = $\sqrt{5} \approx 2.24$
Standard Deviation of Sample Means = $\frac{\sqrt{5}}{\sqrt{2}} \approx 1.58$
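These figures can be verified numerically. The sketch below assumes the population is the four scores 2, 4, 6, 8 (consistent with a mean of 5 and a standard deviation of √5) and uses all 16 ordered samples of size 2; the variable names are illustrative.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [2, 4, 6, 8]  # assumed population behind the 16 sample means
sample_means = [mean(s) for s in product(population, repeat=2)]

mu = mean(population)               # population mean
sigma = pstdev(population)          # population SD, sqrt(5)
mean_of_means = mean(sample_means)  # 80 / 16
sd_of_means = pstdev(sample_means)  # spread of the 16 sample means

print(mu, mean_of_means)
print(round(sigma, 2), round(sd_of_means, 2), round(sigma / sqrt(2), 2))
```

The standard deviation of the sample means equals the population standard deviation divided by the square root of the sample size, here √2.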
Skewed Population Distribution
[Figure: histogram of a skewed population; x-axis: raw score (1 to 9), y-axis: frequency (1 to 6)]
Distribution of Sample Means: Samples of Size 2
[Figure: histogram; x-axis: sample mean (2 to 6), y-axis: frequency (2 to 12)]
Systematic Random Sample
Advantages of Systematic Random Sampling
• Representative of the population
• Because the sample is random, we can make statistical
conclusions that would be considered valid
Disadvantages of Systematic Random Sampling
• A complete and up to date list of all the population is
required
• If the population is listed in some standardised pattern,
then systematic sampling could pick out similar members
rather than completely random members
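A minimal Python sketch of systematic sampling follows, assuming the usual scheme of a random starting point followed by every k-th member of the list; the slides do not spell out the procedure, and the frame of IDs 1 to 100 and the helper `systematic_sample` are hypothetical.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Pick every k-th member of the list after a random start."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and len(frame)")
    k = len(frame) // n                       # sampling interval
    start = random.Random(seed).randrange(k)  # random start in [0, k)
    return frame[start::k][:n]


frame = list(range(1, 101))  # hypothetical ordered list of the population
chosen = systematic_sample(frame, 10, seed=3)
print(chosen)
```

If the list repeats in a pattern with the same period as the interval k (for example, every tenth entry is a team captain), the fixed step selects similar members every time, which is the risk described above.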
Uniform Population Distribution
[Figure: histogram of the population; x-axis: raw score (1 to 8), y-axis: frequency (1 to 6)]
Mean = 5
Standard deviation = $\sqrt{5} \approx 2.24$
Distribution of Sample Means: Sample Size 3
[Figure: histogram; x-axis: sample mean (1 to 9), y-axis: frequency (2 to 24)]
Things to Notice
1. The sample means tend to pile up around the population mean.
2. The distribution of sample means is approximately normal in
shape, even though the population distribution was not.
3. The distribution of sample means has less variability than
does the population distribution.
4. Increasing sample size decreases the variability in the
distribution of sample means.
The Central Limit Theorem
The Central Limit Theorem states:
The sampling distribution of the sample mean will be normal or nearly normal if the
sample size is large enough.
The mean of the sampling distribution is equal to the mean of the
population:
$\mu_{\bar{X}} = \mu$
The standard deviation of the sampling distribution (also known as the standard
error) is the standard deviation of the population divided by the square root of
the sample size:
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
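The theorem can be explored with a small simulation. The sketch below is illustrative only: the skewed population, the sample size of 30, the 5000 repetitions and the helper `simulate_sample_means` are all assumptions, not taken from the slides. It draws repeated samples with replacement and compares the spread of the sample means with σ/√n.

```python
import random
from math import sqrt
from statistics import mean, pstdev


def simulate_sample_means(population, n, reps, seed=0):
    """Means of `reps` random samples of size n drawn with replacement."""
    rng = random.Random(seed)
    return [mean(rng.choices(population, k=n)) for _ in range(reps)]


population = [1, 1, 1, 1, 2, 2, 2, 3, 3, 9]  # hypothetical skewed population
n = 30
means = simulate_sample_means(population, n, reps=5000)

expected_se = pstdev(population) / sqrt(n)  # sigma / sqrt(n)
observed_se = pstdev(means)                 # spread of simulated means

print(round(mean(population), 3), round(mean(means), 3))
print(round(expected_se, 3), round(observed_se, 3))
```

The simulated values should land close to the theoretical ones but will not match exactly, since the simulation uses a finite number of samples. A histogram of `means` would look roughly bell-shaped even though the population is strongly skewed.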
Non Probabilistic Sampling
• Quota Sampling
• Convenience Sampling
• Snowball Sampling