Statistics
Instructor: 余清祥, Department of Statistics
Date: November 13, 2014
Chapter 7: Sampling and Sampling Distributions
Fall 2014
STATISTICS in PRACTICE
• MeadWestvaco Corporation,
a leading producer of
packaging, coated and specialty
papers, and specialty chemicals.
• MeadWestvaco’s internal consulting group uses
sampling to provide a variety of information
that enables the company to obtain significant
productivity benefits and remain competitive.
• Data collected from sample plots throughout the
forests are the basis for learning about the
population of trees owned by the company.
2
Chapter 7
Sampling and Sampling Distributions
• Selecting a Sample
• Point Estimation
• Introduction to Sampling Distributions
• Sampling Distribution of x̄
• Sampling Distribution of p̄
• Properties of Point Estimators
• Other Sampling Methods
3
Introduction -1
An element is the entity on which data are collected.
A population is a collection of all the elements of
interest.
A sample is a subset of the population.
The sampled population is the population from
which the sample is drawn.
A frame is a list of the elements that the sample will
be selected from.
4
Introduction -2
The reason we select a sample is to collect data to
answer a research question about a population.
The sample results provide only estimates of the
values of the population characteristics.
The reason is simply that the sample contains only
a portion of the population.
With proper sampling methods, the sample results
can provide “good” estimates of the population
characteristics.
5
7.1 The Electronics Associates
Sampling Problem
• The director of personnel for Electronics Associates,
Inc. (EAI), has been assigned the task of developing
a profile of the company’s 2500 managers.
• Suppose that the necessary information on all the
EAI managers was not readily available in the
company’s database.
• The question is how the firm’s director of personnel
can obtain estimates of the population parameters by
using a sample of managers rather than all 2500
managers in the population.
• Suppose that a sample of 30 managers will be used.
How can we identify a sample of 30 managers?
6
7.2 Selecting a Sample
• Sampling from a Finite Population
• Sampling from an Infinite Population
7
Sampling from a Finite Population -1
• Finite populations are often defined by lists
such as:
– Organization membership roster
– Credit card account numbers
– Inventory product numbers
• A simple random sample of size n from a
finite population of size N is a sample
selected such that each possible sample of
size n has the same probability of being
selected.
8
Sampling from a Finite Population -2
• Replacing each sampled element before
selecting subsequent elements is called
sampling with replacement.
• Sampling without replacement is the
procedure used most often.
• In large sampling projects, computer-generated random numbers are often used to
automate the sample selection process.
• Excel provides a function for generating
random numbers in its worksheets.
9
Sampling from a Finite Population -3
(Random Numbers)
10
Sampling from a Finite Population -4
• Example: To select a simple random sample
from the finite population of EAI managers,
1. We first construct a frame by assigning each
manager a number, the numbers 1 to 2500.
2. We refer to the table of random numbers. We
may start the selection of random numbers
anywhere in the table and move systematically in
a direction of our choice.
3. For example, given the random numbers 6327 1599 8671 7445 1102 1514
1807, we include 1599, 1102, 1514, and 1807 in the random
sample and ignore 6327, 8671, and 7445 because they are larger than 2500.
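The table-based selection above can be sketched in a few lines of Python. This is only an illustration: the function name `select_from_random_numbers` is hypothetical, and skipping repeated numbers is an assumption that corresponds to sampling without replacement.

```python
def select_from_random_numbers(numbers, population_size, sample_size):
    """Keep table values that are valid frame numbers (1..population_size)."""
    chosen = []
    for value in numbers:
        # Ignore values outside the frame and values already used
        # (assumption: sampling without replacement).
        if 1 <= value <= population_size and value not in chosen:
            chosen.append(value)
        if len(chosen) == sample_size:
            break
    return chosen

# The seven four-digit random numbers quoted on the slide.
table_row = [6327, 1599, 8671, 7445, 1102, 1514, 1807]
picked = select_from_random_numbers(table_row, population_size=2500, sample_size=30)
print(picked)  # [1599, 1102, 1514, 1807]
```

With only seven table entries the sample is still incomplete; in practice we would keep reading the table until 30 managers are selected.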
11
Sampling from a Finite Population -5
• Example: St. Andrew’s College
– St. Andrew’s College received 900 applications
for admission in the upcoming year from
prospective students. The applicants were
numbered, from 1 to 900, as their applications
arrived. The Director of Admissions would like to
select a simple random sample of 30 applicants.
12
Sampling from a Finite Population -6
• Example: St. Andrew’s College
– Step 1: Assign a random number to each of the
900 applicants.
The random numbers generated by Excel’s
RAND function follow a uniform probability
distribution between 0 and 1.
– Step 2: Select the 30 applicants corresponding to
the 30 smallest random numbers.
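The two steps above can be mimicked outside Excel. The sketch below uses Python's `random` module in place of the RAND function; the function name `sample_by_smallest_random` and the seed value are assumptions made only for illustration.

```python
import random

def sample_by_smallest_random(ids, sample_size, seed=None):
    """Attach a uniform(0, 1) number to each id and keep the smallest ones."""
    rng = random.Random(seed)
    # Step 1: assign a random number to every applicant.
    keyed = [(rng.random(), applicant_id) for applicant_id in ids]
    # Step 2: sort by the random number and keep the first sample_size ids.
    keyed.sort()
    return [applicant_id for _, applicant_id in keyed[:sample_size]]

applicants = list(range(1, 901))  # applicants numbered 1 to 900
sample = sample_by_smallest_random(applicants, 30, seed=2014)
print(sorted(sample))
```

Because every ordering of the random numbers is equally likely, each possible set of 30 applicants has the same chance of being selected, which is the definition of a simple random sample.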
13
Sampling from an Infinite Population -1
• Sometimes we want to select a sample, but
find it is not possible to obtain a list of all
elements in the population.
• As a result, we cannot construct a frame for
the population.
• Hence, we cannot use the random number
selection procedure.
• Most often this situation occurs in infinite
population cases.
14
Sampling from an Infinite Population -2
• Populations are often generated by an
ongoing process where there is no upper
limit on the number of units that can be
generated.
• Some examples of on-going processes, with
infinite populations, are:
– parts being manufactured on a production line
– transactions occurring at a bank
– telephone calls arriving at a technical help desk
– customers entering a store
15
Sampling from an Infinite Population -3
• In the case of an infinite population, we must
select a random sample in order to make
valid statistical inferences about the
population from which the sample is taken.
• A random sample from an infinite
population is a sample selected such that the
following conditions are satisfied.
1. Each element selected comes from the population
of interest.
2. Each element is selected independently, which prevents
selection bias.
16
7.3 Point Estimation -1
Point estimation is a form of statistical inference.
In point estimation we use the data from the sample
to compute a value of a sample statistic that serves
as an estimate of a population parameter.
We refer to x̄ as the point estimator of the population
mean µ.
s is the point estimator of the population standard
deviation σ.
p̄ is the point estimator of the population proportion p.
17
Point Estimation -2
• Example: the EAI Problem, annual salary and
training program status for a simple random sample
of 30 EAI managers
18
Point Estimation -3
• Example: the EAI Problem, to estimate the
population mean, the population standard
deviation and population proportion.
19
Point Estimation -4
• Example: St. Andrew’s College
– Recall that St. Andrew’s College received 900
applications from prospective students. The
application form contains a variety of information
including the individual’s Scholastic Aptitude
Test (SAT) score and whether or not the
individual desires on-campus housing.
– At a meeting in a few hours, the Director of
Admissions would like to announce the average
SAT score and the proportion of applicants that
want to live on campus, for the population of 900
applicants.
20
Point Estimation -5
• Example: St. Andrew’s College
– However, the necessary data on the applicants
have not yet been entered in the college’s
computerized database. So, the Director decides
to estimate the values of the population
parameters of interest based on sample statistics.
The sample of 30 applicants is selected using
computer-generated random numbers.
21
Point Estimation -6
• Example: St. Andrew’s College
– x̄ as Point Estimator of µ
$\bar{x} = \frac{\sum x_i}{30} = \frac{50{,}520}{30} = 1684$
– s as Point Estimator of σ
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{29}} = \sqrt{\frac{210{,}512}{29}} = 85.2$
– p̄ as Point Estimator of p
$\bar{p} = \frac{20}{30} = 0.67$
– Note: Different random numbers would have
identified a different sample which would have
resulted in different point estimates.
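As a quick arithmetic check, the three point estimates can be recomputed from the totals shown on the slide. The variable names are illustrative; the individual SAT scores are not given in the transcript, so only the reported sums are used.

```python
import math

n = 30                    # sample size
sum_x = 50_520            # sum of the 30 sampled SAT scores
sum_sq_dev = 210_512      # sum of squared deviations from the sample mean
count_housing = 20        # sampled applicants wanting on-campus housing

x_bar = sum_x / n                      # point estimate of the population mean
s = math.sqrt(sum_sq_dev / (n - 1))    # point estimate of the population std. deviation
p_bar = count_housing / n              # point estimate of the population proportion

print(round(x_bar), round(s, 1), round(p_bar, 2))  # 1684 85.2 0.67
```

Note the divisor n - 1 = 29 in the sample standard deviation, in contrast to the divisor N = 900 used for the population standard deviation.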
22
Point Estimation -7
• Example: St. Andrew’s College
– Once all the data for the 900 applicants were
entered in the college’s database, the values of the
population parameters of interest were calculated.
– Population Mean SAT Score
$\mu = \frac{\sum x_i}{900} = 1697$
– Population Standard Deviation for SAT Score
$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{900}} = 87.4$
– Population Proportion Wanting On-Campus
Housing
$p = \frac{648}{900} = 0.72$
23
Summary of Point Estimates
Obtained from a Simple Random Sample
Population Parameter | Parameter Value | Point Estimator | Point Estimate
µ = Population mean SAT score | 1697 | x̄ = Sample mean SAT score | 1684
σ = Population std. deviation for SAT score | 87.4 | s = Sample standard deviation for SAT score | 85.2
p = Population proportion wanting campus housing | 0.72 | p̄ = Sample proportion wanting campus housing | 0.67
24
Practical Advice
The target population is the population we want to
make inferences about.
The sampled population is the population from
which the sample is actually taken.
Whenever a sample is used to make inferences
about a population, we should make sure that the
target population and the sampled population
are in close agreement.
25
7.4 Introduction to Sampling Distributions
-1
• Example: the EAI Problem, Values of x̄ and p̄
from 500 Simple Random Samples of 30 EAI
Managers
26
Introduction to Sampling Distributions -2
• Example: the EAI Problem, Frequency and
Relative Frequency Distributions of x̄ from 500
Simple Random Samples of 30 EAI Managers
27
Introduction to Sampling Distributions -3
• Example: the EAI Problem, Relative Frequency
Histogram of x Values from 500 Simple Random
Samples of 30 each.
28
Introduction to Sampling Distributions -4
• Example: the EAI Problem, Relative Frequency
Histogram of p Values from 500 Simple Random
Samples of 30 each.
29
Introduction to Sampling Distributions -5
• From the approximation we observe the bell-shaped appearance of the distribution.
• In practice, we select only one simple
random sample from the population.
• We repeated the sampling process 500 times
in this section simply to illustrate that many
different samples are possible and that the
different samples generate a variety of values
for the sample statistics x̄ and p̄.
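The repeated-sampling experiment can be imitated with a small simulation. The EAI data are not in the transcript, so the sketch below builds a synthetic population of 2500 managers; the normal salary model (mean 51,800, standard deviation 4,000), the 0.60 training rate, and the seed are all assumptions made for illustration.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

# Synthetic population (assumption): (salary, completed_training) per manager.
population = [(rng.gauss(51_800, 4_000), rng.random() < 0.60) for _ in range(2500)]

x_bars, p_bars = [], []
for _ in range(500):
    sample = rng.sample(population, 30)  # simple random sample, no replacement
    x_bars.append(statistics.fmean([salary for salary, _ in sample]))
    p_bars.append(sum(trained for _, trained in sample) / 30)

# Center and spread of the 500 sample means and sample proportions.
print(statistics.fmean(x_bars), statistics.stdev(x_bars))
print(statistics.fmean(p_bars), statistics.stdev(p_bars))
```

A histogram of `x_bars` or `p_bars` should show roughly the bell shape described above, centered near the corresponding population value.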
30
7.5 Sampling Distribution of x -1
• Expected Value of x̄
• Standard Deviation of x̄
• Form of the Sampling Distribution of x̄
• Practical Value of the Sampling Distribution
of x̄
• Relationship Between the Sample Size and
the Sampling Distribution of x̄
31
Sampling Distribution of x -2
• Process of Statistical Inference
1. Population with mean µ = ?
2. A simple random sample of n elements is selected
from the population.
3. The sample data provide a value for
the sample mean x̄.
4. The value of x̄ is used to
make inferences about the value of µ.
32
Expected Value of x
• The sampling distribution of x̄ is the
probability distribution of all possible values
of the sample mean x̄.
• Expected Value of x̄
E(x̄) = µ
where: µ = the population mean
• When the expected value of the point
estimator equals the population parameter,
we say the point estimator is unbiased.
33
Standard Deviation of x -1
• We will use the following notation to define
the standard deviation of the sampling
distribution of x̄.
σ_x̄ = the standard deviation of x̄
σ = the standard deviation of the population
n = the sample size
N = the population size
34
Standard Deviation of x -2
• Standard Deviation of x̄
– Finite Population
$\sigma_{\bar{x}} = \sqrt{\frac{N-n}{N-1}} \left( \frac{\sigma}{\sqrt{n}} \right)$
– Infinite Population
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
– A finite population is treated as being infinite if
n/N ≤ 0.05.
– $\sqrt{(N-n)/(N-1)}$ is the finite population correction
factor.
– σ_x̄ is referred to as the standard error of the mean.
35
Standard Deviation of x -3
• Use the following expression to compute
the standard deviation of x̄:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
whenever:
1. The population is infinite; or
2. The population is finite and the sample size is less
than or equal to 5% of the population size; that is,
n/N ≤ 0.05.
36
Standard Deviation of x -4
• Example: the EAI problem, the standard
deviation of annual salary for the population
of 2500 EAI managers is σ = 4000. The
population is finite with N = 2500. With n =
30, we have n/N = 30/2500 = 0.012.
• Because n/N = 0.012 ≤ 0.05, we can ignore the finite
population correction factor and compute the standard error:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{4000}{\sqrt{30}} = 730.3$
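The rule for the standard error of the mean, including the 5% guideline for the finite population correction factor, can be written as a small helper. The function name `standard_error_mean` and its optional `N` argument are illustrative choices, not part of the slides.

```python
import math

def standard_error_mean(sigma, n, N=None):
    """Standard error of the sample mean.

    The finite population correction is applied only when the population
    size N is given and n/N exceeds 0.05, following the slide's guideline.
    """
    se = sigma / math.sqrt(n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

# EAI problem: sigma = 4000, n = 30, N = 2500, so n/N = 0.012 and no correction.
print(round(standard_error_mean(4000, 30, N=2500), 1))  # 730.3
```

Calling the helper with `sigma=87.4, n=100, N=900` applies the correction, since 100/900 is greater than 0.05.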
37
Form of the Sampling Distribution of x̄ -1
When the population has a normal distribution, the
sampling distribution of x̄ is normally distributed
for any sample size.
In most applications, the sampling distribution of x̄
can be approximated by a normal distribution
whenever the sample size is 30 or more.
In cases where the population is highly skewed or
outliers are present, samples of size 50 may be
needed.
38
Form of the Sampling Distribution of x̄ -2
The sampling distribution of x̄ can be used to
provide probability information about how close
the sample mean x̄ is to the population mean µ.
39
Central Limit Theorem
• When the population from which we are
selecting a random sample does not have a
normal distribution, the central limit theorem
is helpful in identifying the shape of the
sampling distribution of x̄.
CENTRAL LIMIT THEOREM
In selecting random samples of size n from a
population, the sampling distribution of the sample
mean x̄ can be approximated by a normal
distribution as the sample size becomes large.
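One way to see the theorem at work is to draw samples from a clearly non-normal population and watch the skewness of the sample means shrink as n grows. The sketch below assumes an exponential population with rate 1.0; the helper names, sample sizes, repetition count, and seed are illustrative choices.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

def skewness(values):
    """Moment-based skewness; close to 0 for a symmetric, normal-like shape."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def sample_mean_skewness(n, repetitions=2000):
    """Skewness of the sample mean over repeated samples of size n."""
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(repetitions)
    ]
    return skewness(means)

skew_by_n = {n: sample_mean_skewness(n) for n in (2, 30, 100)}
for n, skew in skew_by_n.items():
    print(n, round(skew, 2))
```

The skewness should fall toward zero as n increases, which is consistent with the sampling distribution of x̄ becoming approximately normal.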
40
Central Limit Theorem
• Illustration of The Central Limit Theorem
41
Central Limit Theorem
• Illustration of The Central Limit Theorem
42
Sampling Distribution of x for the EAI
Problem
• Example: the EAI problem, we previously
showed that E(x̄) = $51,800 and σ_x̄ = 730.3.
43
Practical Value of the Sampling
Distribution of x -1
• Example: the EAI problem
– What is the probability that the sample mean
computed using a simple random sample of 30
EAI managers will be within $500 of the
population mean?
– Refer to the sampling distribution of x̄ shown again
in Figure 7.5. With a population mean of $51,800,
the personnel director wants to know the
probability that x̄ is between $51,300 and $52,300.
44
Practical Value of the Sampling
Distribution of x -2
• Example: the EAI problem, Probability of a Sample
Mean being within $500 of the Population Mean for
a Simple Random Sample of 30 EAI Managers
45
Practical Value of the Sampling
Distribution of x -3
• Example: the EAI problem
– At the upper endpoint x̄ = 52,300:
$z = \frac{52{,}300 - 51{,}800}{730.30} = 0.68$
– We find a cumulative probability
P(z ≤ 0.68) = 0.7517
– At the lower endpoint x̄ = 51,300:
$z = \frac{51{,}300 - 51{,}800}{730.30} = -0.68$
– We find a cumulative probability
P(z ≤ –0.68) = 0.2483
46
Practical Value of the Sampling
Distribution of x -4
• Example: the EAI problem
– Calculate the area under the curve between the
lower and upper endpoints of the interval.
P(– 0.68 ≤ z ≤ 0.68) = P(z ≤ 0.68) – P(z ≤ –0.68)
= 0.7517 – 0.2483
= 0.5034
– A simple random sample of 30 EAI managers has a
0.5034 probability of providing a sample mean x̄
that is within $500 of the population mean.
– There is a 1 – 0.5034 = 0.4966 probability that the
difference between x̄ and µ = $51,800 will be more
than $500.
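The same probability can be checked numerically with the standard normal distribution from Python's `statistics` module. The helper name `prob_within` is hypothetical. The slide rounds z to two decimals before using the table, so the sketch computes both the table-style value and the unrounded value.

```python
from statistics import NormalDist

def prob_within(margin, standard_error, round_z=False):
    """P(|x_bar - mu| <= margin) when x_bar is approximately normal."""
    z = margin / standard_error
    if round_z:
        z = round(z, 2)  # mimic reading a z-table with two-decimal entries
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(z) - std_normal.cdf(-z)

table_style = prob_within(500, 730.30, round_z=True)  # uses z = 0.68
unrounded = prob_within(500, 730.30)                   # uses z = 500/730.30
print(round(table_style, 3), round(unrounded, 3))
```

The table-style result should agree with the slide's 0.5034 to about three decimals, while the unrounded z gives a slightly larger probability.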
47
Practical Value of the Sampling
Distribution of x -5
• Example: St. Andrew’s College
Sampling distribution of x̄ for SAT scores:
$E(\bar{x}) = 1697$
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{87.4}{\sqrt{30}} = 15.96$
48
Practical Value of the Sampling
Distribution of x -6
• Example: St. Andrew’s College
– What is the probability that a simple random
sample of 30 applicants will provide an estimate
of the population mean SAT score that is within
+/-10 of the actual population mean µ ?
– In other words, what is the probability that x will
be between 1687 and 1707?
49
Practical Value of the Sampling
Distribution of x -7
• Example: St. Andrew’s College
– Step 1: Calculate the z-value at the upper
endpoint of the interval.
z = (1707 – 1697)/15.96= 0.63
– Step 2: Find the area under the curve to the left
of the upper endpoint.
P(z ≤ 0.63) = 0.7357
50
Practical Value of the Sampling
Distribution of x -8
• Example: St. Andrew’s College
Cumulative Probabilities for
the Standard Normal Distribution
51
Practical Value of the Sampling
Distribution of x -9
• Example: St. Andrew’s College
Figure: sampling distribution of x̄ for SAT scores (σ_x̄ = 15.96); the area to the left of x̄ = 1707 is 0.7357, with the mean at 1697.
52
Practical Value of the Sampling
Distribution of x -10
• Example: St. Andrew’s College
– Step 3: Calculate the z-value at the lower
endpoint of the interval.
z = (1687 – 1697)/15.96 = – 0.63
– Step 4: Find the area under the curve to the left of
the lower endpoint.
P(z ≤ – 0.63) = 0.2643
53
Practical Value of the Sampling
Distribution of x -11
• Example: St. Andrew’s College
Figure: sampling distribution of x̄ for SAT scores (σ_x̄ = 15.96); the area to the left of x̄ = 1687 is 0.2643, with the mean at 1697.
54
Practical Value of the Sampling
Distribution of x -12
• Example: St. Andrew’s College
– Step 5: Calculate the area under the curve
between the lower and upper endpoints
of the interval.
P(–0.63 ≤ z ≤ 0.63) = P(z ≤ 0.63) – P(z ≤ –0.63)
= 0.7357 – 0.2643
= 0.4714
– The probability that the sample mean SAT score
will be between 1687 and 1707 is:
P(1687 ≤ x̄ ≤ 1707) = 0.4714
55
Practical Value of the Sampling
Distribution of x -13
• Example: St. Andrew’s College
Figure: sampling distribution of x̄ for SAT scores (σ_x̄ = 15.96); the area between x̄ = 1687 and x̄ = 1707 is 0.4714, with the mean at 1697.
56
Relationship Between the Sample Size
and the Sampling Distribution of x -1
• Example: the EAI problem
– A Comparison of The Sampling Distributions of x
for Simple Random Samples of n = 30 and n = 100
EAI Managers
57
Relationship Between the Sample Size
and the Sampling Distribution of x -2
• Example: the EAI problem
– Suppose that in the EAI sampling problem we
select a simple random sample of 100 EAI
managers instead of the 30 originally considered.
– E(x̄) = µ regardless of the sample size. In our
example, E(x̄) remains at $51,800.
58
Relationship Between the Sample Size
and the Sampling Distribution of x -3
• Example: the EAI problem
– Whenever the sample size is increased, the
standard error of the mean σ_x̄ is decreased. With
a sample size of n = 30, the
standard error of the mean is 730.3.
– With the increase in the sample size to n = 100,
the standard error of the mean is decreased to
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{4000}{\sqrt{100}} = 400$
59
Relationship Between the Sample Size
and the Sampling Distribution of x -4
• Example: the EAI problem, Probability of a Sample
Mean Being Within $500 of the Population Mean for
a Simple Random Sample of 100 EAI Managers
60
Relationship Between the Sample Size
and the Sampling Distribution of x -5
• Example: St. Andrew’s College
– Suppose we select a simple random sample of 100
applicants instead of the 30 originally considered.
– E( x ) = µ regardless of the sample size. In our
example, E( x ) remains at 1697.
– Whenever the sample size is increased, the
standard error of the mean σ x is decreased. With
the increase in the sample size to n = 100, the
standard error of the mean is decreased from
15.96 to:
$\sigma_{\bar{x}} = \sqrt{\frac{N-n}{N-1}} \left( \frac{\sigma}{\sqrt{n}} \right) = \sqrt{\frac{900-100}{900-1}} \left( \frac{87.4}{\sqrt{100}} \right) = 0.94333(8.74) = 8.2$
61
Relationship Between the Sample Size
and the Sampling Distribution of x -6
• Example: St. Andrew’s College
Figure: sampling distributions of x̄ for SAT scores, both centered at E(x̄) = 1697, with σ_x̄ = 8.2 for n = 100 and σ_x̄ = 15.96 for n = 30.
62
Relationship Between the Sample Size
and the Sampling Distribution of x -7
• Example: St. Andrew’s College
Figure: sampling distribution of x̄ for SAT scores (σ_x̄ = 8.2); the area between x̄ = 1687 and x̄ = 1707 is 0.7776, with the mean at 1697.
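The effect of the larger sample on the St. Andrew’s College example can be reproduced with a short calculation. The helper names are hypothetical, and the results use unrounded z-values, so they differ slightly from the table-based 0.4714 and 0.7776.

```python
import math
from statistics import NormalDist

def standard_error_mean(sigma, n, N):
    """Standard error of the mean; correction applied only when n/N > 0.05."""
    se = sigma / math.sqrt(n)
    if n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

def prob_within(margin, se):
    """P(|x_bar - mu| <= margin) under the normal approximation."""
    z = margin / se
    return NormalDist().cdf(z) - NormalDist().cdf(-z)

sigma, N, margin = 87.4, 900, 10
results = {}
for n in (30, 100):
    se = standard_error_mean(sigma, n, N)
    results[n] = (se, prob_within(margin, se))
    print(n, round(se, 2), round(results[n][1], 3))
```

Going from n = 30 to n = 100 roughly halves the standard error, and the chance of landing within 10 points of µ rises from just under one half to more than three quarters.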
63
7.6 Sampling Distribution of p -1
• Expected Value of p̄
• Standard Deviation of p̄
• Form of the Sampling Distribution of p̄
• Practical Value of the Sampling Distribution
of p̄
64
Sampling Distribution of p -2
• Example: Relative Frequency Histogram of Sample
Proportion Values from 500 Simple Random
Samples of 30 each.
65
Sampling Distribution of p -3
• Making Inferences about a Population
Proportion
1. Population with proportion p = ?
2. A simple random sample of n elements is selected
from the population.
3. The sample data provide a value for the
sample proportion p̄.
4. The value of p̄ is used to make inferences
about the value of p.
66
Expected Value of p -1
• The sampling distribution of p̄ is the
probability distribution of all possible values
of the sample proportion p̄.
• Expected Value of p̄
E(p̄) = p
where: p = the population proportion
67
Expected Value of p -2
• Example: the EAI problem
– Because E(p̄) = p, p̄ is an unbiased estimator of p.
– We noted that p = 0.60 for the EAI problem,
where p is the proportion for the population of
managers who participated in the company’s
management training program.
– Thus, the expected value of p̄ for the EAI
sampling problem is 0.60.
68
Standard Deviation of p̄ -1
• Standard Deviation of p̄
– Finite Population
$\sigma_{\bar{p}} = \sqrt{\frac{N-n}{N-1}} \sqrt{\frac{p(1-p)}{n}}$
– Infinite Population
$\sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}}$
– σ_p̄ is referred to as the standard error of the
proportion.
– $\sqrt{(N-n)/(N-1)}$ is the finite population correction
factor.
69
Standard Deviation of p̄ -2
• Example: the EAI problem
– For the EAI study, p = 0.60. With n/N = 30/2500 =
0.012, we can ignore the finite population
correction factor when we compute the standard
error of the proportion.
– For the simple random sample of 30 managers,
$\sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.60(1-0.60)}{30}} = 0.0894$
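The standard error of the proportion follows the same pattern as the standard error of the mean. A minimal sketch, with the hypothetical helper name `standard_error_proportion` and the same 5% guideline for the finite population correction:

```python
import math

def standard_error_proportion(p, n, N=None):
    """Standard error of the sample proportion.

    The finite population correction is applied only when N is given
    and n/N exceeds 0.05.
    """
    se = math.sqrt(p * (1 - p) / n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

# EAI problem: p = 0.60, n = 30, N = 2500 (n/N = 0.012, so no correction).
print(round(standard_error_proportion(0.60, 30, N=2500), 4))  # 0.0894
```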
70
Form of the Sampling Distribution of p -1
The sampling distribution of p̄ can be approximated
by a normal distribution whenever the sample size
is large enough to satisfy the two conditions:
np > 5
and
n(1 – p) > 5
. . . because when these conditions are satisfied, the
probability distribution of x in the sample proportion,
p̄ = x/n, can be approximated by a normal distribution
(and because n is a constant).
71
Form of the Sampling Distribution of p -2
• Example: the EAI problem
– The proportion for the population of managers
who participated in the company’s management
training program is p = 0.60.
– With a simple random sample of size 30, we have
np = 30(0.60) = 18 and n(1 – p) = 30(0.40) = 12.
72
Form of the Sampling Distribution of p -3
• Example: the EAI problem, Sampling Distribution
of p for the Proportion of EAI Managers who
Participated in the Management training program
73
Practical Value of the Sampling
Distribution of p -1
• Example: St. Andrew’s College
– Recall that 72% of the prospective students
applying to St. Andrew’s College desire on-campus housing.
– What is the probability that a simple random
sample of 30 applicants will provide an estimate
of the population proportion of applicants desiring
on-campus housing that is within plus or minus
0.05 of the actual population proportion?
74
Practical Value of the Sampling
Distribution of p -2
• Example: St. Andrew’s College
– For our example, with n = 30 and p = 0.72, the
normal distribution is an acceptable
approximation because:
np = 30(0.72) = 21.6 > 5
and
n(1 – p) = 30(0.28) = 8.4 > 5
75
Practical Value of the Sampling
Distribution of p -3
• Example: St. Andrew’s College
Sampling distribution of p̄:
$E(\bar{p}) = 0.72$
$\sigma_{\bar{p}} = \sqrt{\frac{0.72(1-0.72)}{30}} = 0.082$
76
Practical Value of the Sampling
Distribution of p -4
• Example: St. Andrew’s College
– Step 1: Calculate the z-value at the upper
endpoint of the interval.
z = (0.77 – 0.72)/0.082 = 0.61
– Step 2: Find the area under the curve to the left of
the upper endpoint.
P(z < 0.61) = 0.7291
77
Practical Value of the Sampling
Distribution of p -5
• Example: St. Andrew’s College
Cumulative Probabilities for
the Standard Normal Distribution
78
Practical Value of the Sampling
Distribution of p -6
• Example: St. Andrew’s College
Figure: sampling distribution of p̄ (σ_p̄ = 0.082); the area to the left of p̄ = 0.77 is 0.7291, with the mean at 0.72.
79
Practical Value of the Sampling
Distribution of p -7
• Example: St. Andrew’s College
– Step 3: Calculate the z-value at the lower
endpoint of the interval.
z = (0.67 – 0.72)/0.082 = – 0.61
– Step 4: Find the area under the curve to the left of
the lower endpoint.
P(z ≤ –0.61) = 0.2709
80
Practical Value of the Sampling
Distribution of p -8
• Example: St. Andrew’s College
Figure: sampling distribution of p̄ (σ_p̄ = 0.082); the area to the left of p̄ = 0.67 is 0.2709, with the mean at 0.72.
81
Practical Value of the Sampling
Distribution of p -9
• Example: St. Andrew’s College
– Step 5: Calculate the area under the curve
between the lower and upper endpoints
of the interval.
P(–0.61 ≤ z ≤ 0.61) = P(z ≤ 0.61) – P(z ≤ – 0.61)
= 0.7291 – 0.2709 = 0.4582
– The probability that the sample proportion of
applicants wanting on-campus housing will be
within +/–0.05 of the actual population
proportion is:
P(0.67 ≤ p̄ ≤ 0.77) = 0.4582
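The five steps for the sample proportion can be condensed into one calculation. This sketch checks the np and n(1 – p) conditions first and then uses a normal distribution centered at p; the variable names are illustrative, and the result uses an unrounded z-value, so it may differ from the table-based 0.4582 in the fourth decimal.

```python
import math
from statistics import NormalDist

p, n, margin = 0.72, 30, 0.05

# Conditions for the normal approximation stated on the slides.
normal_ok = n * p > 5 and n * (1 - p) > 5

# Standard error of the proportion (n/N = 30/900 < 0.05, so no correction).
se = math.sqrt(p * (1 - p) / n)

# Sampling distribution of p_bar: approximately normal with mean p.
sampling_dist = NormalDist(mu=p, sigma=se)
prob = sampling_dist.cdf(p + margin) - sampling_dist.cdf(p - margin)

print(normal_ok, round(se, 3), round(prob, 3))
```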
82
Practical Value of the Sampling
Distribution of p -10
• Example: St. Andrew’s College
Figure: sampling distribution of p̄ (σ_p̄ = 0.082); the area between p̄ = 0.67 and p̄ = 0.77 is 0.4582, with the mean at 0.72.
83
Practical Value of the Sampling
Distribution of p -11
• Example: the EAI problem
– Suppose that the personnel director wants to
know the probability of obtaining a value of p̄
that is within 0.05 of the population proportion of
EAI managers who participated in the training
program.
– That is, what is the probability of obtaining a
sample with a sample proportion p̄ between 0.55
and 0.65?
84
Practical Value of the Sampling
Distribution of p -12
• Example: the EAI problem, Probability of
obtaining p between 0.55 and 0.65
85
7.7 Properties of Point Estimators -1
• Because several different sample statistics can be
used as point estimators of different
population parameters, we use the following
general notation in this section:
θ = the population parameter of interest
θ̂ = the sample statistic or point estimator of θ
• The notation θ is the Greek letter theta and
the notation θ̂ is pronounced “theta-hat”.
86
Properties of Point Estimators -2
• Before using a sample statistic as a point
estimator, statisticians check to see whether
the sample statistic has the following
properties associated with good point
estimators.
Unbiased
Efficiency
Consistency
87
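Judging the Quality of a Sample Statistic
• A characteristic of the population is a parameter.
• A value computed from the sample to estimate a population characteristic
is a statistic.
• Requirements for a statistic:
(1) Unbiased: E(statistic) = parameter
(2) Variance: the smaller the better; variance has a meaning similar to risk.
Figure: target diagrams contrasting precise vs. imprecise and biased vs. unbiased estimators.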
Properties of Point Estimators -3
Unbiased
• If the expected value of the sample statistic
is equal to the population parameter being
estimated, the sample statistic is said to be
an unbiased estimator of the population
parameter.
• The sample statistic θ̂ is an unbiased
estimator of the population parameter θ if
E(θ̂) = θ
where
E(θ̂) = the expected value of the sample statistic θ̂
90
Properties of Point Estimators -4
• Examples of Unbiased and Biased Point
Estimators
91
Properties of Point Estimators -5
Efficiency
• Given the choice of two unbiased estimators
of the same population parameter, we would
prefer to use the point estimator with the
smaller standard deviation, since it tends to
provide estimates closer to the population
parameter.
• The point estimator with the smaller
standard deviation is said to have greater
relative efficiency than the other.
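Relative efficiency can be illustrated by comparing two unbiased estimators of the center of a normal population: the sample mean and the sample median. The standard normal population, the sample size of 30, the number of repetitions, and the seed are assumptions chosen only for this sketch.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

means, medians = [], []
for _ in range(2000):
    # Draw a sample of 30 from a normal population with mean 0 and std. dev. 1.
    sample = [rng.gauss(0, 1) for _ in range(30)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

# Standard deviation of each estimator across the repeated samples.
sd_mean = statistics.stdev(means)
sd_median = statistics.stdev(medians)
print(round(sd_mean, 3), round(sd_median, 3))
```

Both estimators center on the true value 0, but the sample mean varies less from sample to sample, so for a normal population it is the more efficient of the two.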
92
Properties of Point Estimators -6
• Example: Sampling Distributions of Two
Unbiased Point Estimators.
93
Properties of Point Estimators -7
Consistency
• A point estimator is consistent if the values
of the point estimator tend to become closer
to the population parameter as the sample
size becomes larger.
94
7.8 Other Sampling Methods
• Stratified Random Sampling
• Cluster Sampling
• Systematic Sampling
• Convenience Sampling
• Judgment Sampling
95
Stratified Random Sampling -1
The population is first divided into groups of
elements called strata.
Each element in the population belongs to one and
only one stratum.
Best results are obtained when the elements within
each stratum are as much alike as possible (i.e. a
homogeneous group).
96
Stratified Random Sampling -2
• Diagram for Stratified Random Sampling
97
Stratified Random Sampling (diagram)
Stratum 1: ○○○○○○○○○○○○○ → sample: ○○○○○○
Stratum 2: XXXXXXXXXX → sample: XXXXX
Stratum 3: ∆∆∆∆∆∆∆∆∆ → sample: ∆∆∆∆
Stratified Random Sampling -3
A simple random sample is taken from each stratum.
Formulas are available for combining the stratum
sample results into one population parameter
estimate.
Advantage: If strata are homogeneous, this method
is as “precise” as simple random sampling but with
a smaller total sample size.
Example: The basis for forming the strata might be
department, location, age, industry type, and so on.
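A stratified selection can be sketched as one simple random sample per stratum. The slides do not say how many elements to take from each stratum, so the proportional allocation used here, the function name `stratified_sample`, and the department sizes are all assumptions for illustration.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Take a simple random sample of the same fraction from every stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        # Proportional allocation (assumption), with at least one element.
        k = max(1, round(len(members) * fraction))
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical frame: employee ids grouped by department.
strata = {
    "sales": list(range(0, 120)),
    "engineering": list(range(120, 200)),
    "support": list(range(200, 240)),
}
picked = stratified_sample(strata, fraction=0.10, seed=3)
print({name: len(ids) for name, ids in picked.items()})
# {'sales': 12, 'engineering': 8, 'support': 4}
```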
99
Cluster Sampling -1
The population is first divided into separate groups
of elements called clusters.
Ideally, each cluster is a representative small-scale
version of the population (i.e. heterogeneous group).
A simple random sample of the clusters is then taken.
All elements within each sampled (chosen) cluster
form the sample.
100
Cluster Sampling -2
• Diagram for Cluster Sampling
101
Cluster Random Sampling (diagram)
Population: six clusters A, B, C, D, E, F, each containing ○○○○○○○ ××× △△△△△
Clusters A and D are selected ⇒ the sample consists of all elements in A and D:
A: ○○○○○○○ ××× △△△△△
D: ○○○○○○○ ××× △△△△△
Cluster Sampling -3
Example: A primary application is area sampling,
where clusters are city blocks or other well-defined
areas.
Advantage: The close proximity of elements can be
cost effective (i.e. many sample observations can be
obtained in a short time).
Disadvantage: This method generally requires a
larger total sample size than simple or stratified
random sampling.
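Cluster sampling as described in this section selects whole clusters at random and keeps every element inside them. A minimal sketch, with a hypothetical function name and made-up city blocks:

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly choose clusters, then keep all elements of the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    elements = [element for name in chosen for element in clusters[name]]
    return chosen, elements

# Hypothetical frame: households grouped by city block.
blocks = {
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2"],
    "C": ["c1", "c2", "c3", "c4"],
    "D": ["d1", "d2", "d3"],
}
chosen, households = cluster_sample(blocks, num_clusters=2, seed=11)
print(chosen, households)
```

Only the clusters are selected at random; within a chosen cluster no further sampling takes place.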
103
Systematic Sampling -1
If a sample size of n is desired from a population
containing N elements, we might sample one
element for every N/n elements in the population.
We randomly select one of the first N/n elements
from the population list.
We then select every (N/n)th element that follows in
the population list.
104
Systematic Sampling -2
This method has the properties of a simple random
sample, especially if the list of the population
elements is a random ordering.
Advantage: The sample usually will be easier to
identify than it would be if simple random sampling
were used.
Example: Selecting every 100th listing in a telephone
book after the first randomly selected listing
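Systematic sampling reduces to choosing a random starting point and then stepping through the list at a fixed interval. The sketch assumes the sampling interval is N divided by n, rounded down to a whole number; the function name and the example frame are hypothetical.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick a random start among the first k elements, then every k-th element."""
    rng = random.Random(seed)
    k = len(population) // n      # sampling interval (assumes N >= n)
    start = rng.randrange(k)      # random position among the first k elements
    return population[start::k][:n]

listings = list(range(1, 2501))   # hypothetical frame of 2500 listings
sample = systematic_sample(listings, n=25, seed=5)
print(sample[:3], len(sample))
```

With N = 2500 and n = 25, the interval is 100, which matches the idea of selecting every 100th listing after a random start.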
105
Systematic Sampling (diagram)
Population: ∣○●○○∣○●○○∣○●○○∣○●○○∣○●○○∣
↓
Sample: ●●●●●
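Common Nonrandom Sampling Methods
• Judgment sampling: the sample is not drawn at random; instead,
typical, representative elements are chosen from the population (e.g., expert
opinion).
• Convenience sampling: the sample is not determined in advance; whoever is
encountered is asked, or respondents volunteer (e.g., street surveys).
• Snowball sampling: existing sample members are used to find further members;
used when members of a specific group are hard to reach (e.g., the number of
people with AIDS).
• Quota sampling: the proportion of the sample having certain characteristics is
fixed in advance; similar to stratified random sampling.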
Convenience Sampling -1
It is a nonprobability sampling technique. Items are
included in the sample without known probabilities
of being selected.
The sample is identified primarily by convenience.
Example: A professor conducting research might
use student volunteers to constitute a sample.
108
Convenience Sampling -2
Advantage: Sample selection and data collection are
relatively easy.
Disadvantage: It is impossible to determine how
representative of the population the sample is.
109
Judgment Sampling -1
The person most knowledgeable on the subject of
the study selects elements of the population that he
or she feels are most representative of the population.
It is a nonprobability sampling technique.
Example: A reporter might sample three or four
senators, judging them as reflecting the general
opinion of the senate.
110
Judgment Sampling -2
Advantage: It is a relatively easy way of selecting a
sample.
Disadvantage: The quality of the sample results
depends on the judgment of the person selecting the
sample.
111
Recommendation
It is recommended that probability sampling methods
(simple random, stratified, cluster, or systematic) be
used.
For these methods, formulas are available for
evaluating the “goodness” of the sample results in
terms of the closeness of the results to the population
parameters being estimated.
An evaluation of the goodness cannot be made with
non-probability (convenience or judgment) sampling
methods.
112
End of Chapter 7
113