CHAPTER
7
Sampling
© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Q. 1. Define Sample.
Sample
 A subgroup of the elements of the population selected for
participation in the study.
Q. 2. What are the benefits of
Sampling?
Sampling: An Introduction
 benefits of sampling, relative to a complete census:
 lower cost
 less time investment
 can be more accurate
 allows control of interactive testing effect
Q. 3. What are some basic sampling
concepts?
Sampling: An Introduction
 some basic sampling concepts
 element – the unit about which we seek information
 population – aggregate of all the elements, defined prior to
sample selection
 sampling unit – elements available for selection
 sampling frame – list of all sampling units available for
selection, from which the actual sample will be drawn
 study population – group we wish to make inferences about
Q. 4. What are the five steps in the
sampling process?
The Sampling Process: An Overview
 Step 1. Define the population
 elements
 sampling units
 extent
 time frame
 Step 2. Identify the sampling frame
 Step 3. Decide sample size
 too large: resources wasted
 too small: inference impossible
 Step 4. Select procedure for selection
 Step 5. Physically select the sample
Q. 5. What are the two types of
sampling procedures?
Sampling Procedures
 probability sampling – each element has a known chance
of selection
 nonprobability sampling – selection is based in some part
on human judgment
FIGURE 7.1 Sampling procedures
Probability Sampling Procedures (continued)
sample standard deviation (s) – degree of variation (from X̄) in a sample
standard error of the mean (s_X̄) – estimated variation in X̄
population variance (σ²)
sample variance (s²)
population proportion (π) – proportion of population with binary response
sample proportion (p) – proportion of sample with binary response
population size (N) – number of population units
sample size (n) – number of sample units
sum of squares (SS)
degrees of freedom (df)
confidence interval (CI) – known degree of certainty for quantity
level of confidence (α) – degree of certainty required
z-value (z) – points on normal distribution
t-value (t) – points on a t distribution
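A minimal sketch of how several of the quantities in this list relate to one another for a single sample. It assumes the usual n − 1 degrees of freedom for the sample variance and a standard error of s divided by the square root of n; the function name and the data are hypothetical and not from the slides.

```python
import math

def describe_sample(values):
    """Compute the sample quantities named in the list of statistical terms."""
    n = len(values)                              # sample size n
    mean = sum(values) / n                       # sample mean
    ss = sum((x - mean) ** 2 for x in values)    # sum of squares SS
    df = n - 1                                   # degrees of freedom (assumed n - 1)
    variance = ss / df                           # sample variance s^2
    sd = math.sqrt(variance)                     # sample standard deviation s
    se = sd / math.sqrt(n)                       # standard error of the mean
    return {"n": n, "mean": mean, "ss": ss, "df": df,
            "variance": variance, "sd": sd, "se": se}

# Hypothetical data, chosen only so the arithmetic is easy to follow.
summary = describe_sample([4, 6, 8, 10, 12])
```

With these five values the deviations from the mean are −4, −2, 0, 2, 4, so SS is 40 and the sample variance is 40 / 4 = 10.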
Drawing Inferences from Sample Statistics
for Continuous Variables
FIGURE 7.2 Estimating the Population Mean: Area Within One Standard Error of the Sample Mean for a Normal Distribution
The Normal Curve (figure; horizontal axis marked from 70 to 130 in steps of 10)
Mean Distribution
Drawing Inferences from Sample Statistics
for Continuous Variables (continued)
The central limit theorem determines the likelihood that the
true population mean lies within a calculated confidence
interval (around the sample mean):
 sample mean ± 1 standard error ≈ 68% confidence interval
 sample mean ± 2 standard errors ≈ 95% confidence interval
 sample mean ± 3 standard errors ≈ 99.7% confidence interval
Note: this measure of sampling error is only possible with the use of probability-based sample selection procedures.
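The intervals above can be sketched in a few lines. This is only an illustration: it assumes the standard error is estimated as the sample standard deviation divided by the square root of n, and the function name and scores are hypothetical.

```python
import math
import statistics

def mean_confidence_interval(values, num_standard_errors=2):
    """Return (lower, upper) as the sample mean +/- a multiple of the standard error."""
    n = len(values)
    sample_mean = statistics.mean(values)
    # statistics.stdev uses the n - 1 denominator (sample standard deviation).
    standard_error = statistics.stdev(values) / math.sqrt(n)
    margin = num_standard_errors * standard_error
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample of ten scores with a mean of 100.
scores = [92, 105, 98, 110, 101, 95, 99, 104, 97, 99]
low_68, high_68 = mean_confidence_interval(scores, num_standard_errors=1)
low_95, high_95 = mean_confidence_interval(scores, num_standard_errors=2)
```

The interval built from two standard errors is twice as wide as the one built from a single standard error, which is the price of the higher confidence level.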
Q. 6. What are the types of probability
sampling?
Probability Sampling Procedures
simple random sampling – simplest probability-based method
Statistical terms:
parameter – true value of quantity of interest in a specific population
statistic – any quantity deriving from a sample, used to estimate a population parameter
population mean (µ) – average of all population units
sample mean (X̄) – average of all sample units
population standard deviation (σ) – degree of variation (from µ) in the population
sample standard deviation (s) – degree of variation (from X̄) in a sample
standard error of the mean (s_X̄) – estimated variation in X̄
population variance (σ²)
Q. 7. Define simple random sampling.
Simple Random Sampling
A process by which each element has an equal chance of
being selected, and any combination of sample elements has the
same chance of being selected as any other combination of the same size.
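As a rough sketch of this definition, Python's standard `random` module can draw a sample without replacement in which every subset of a given size is equally likely. The frame of 100 numbered elements and the function name are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements; every size-n subset of the frame is equally likely."""
    rng = random.Random(seed)  # a seed is used here only to make the example repeatable
    return rng.sample(frame, n)

# Hypothetical sampling frame: elements numbered 1 to 100.
frame = list(range(1, 101))
sample = simple_random_sample(frame, 10, seed=7)
```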
Stratified Sampling
If strata within the population are more homogeneous than
the overall population, stratified sampling can decrease the
standard error (i.e., increase efficiency) of an estimator.
1. Divide the defined population into mutually exclusive
and collectively exhaustive subgroups or strata.
2. Select an independent simple random sample from each
stratum.
 mutually exclusive – membership in one stratum
precludes membership in any other stratum
 collectively exhaustive – all possible categories of a
variable are used to define the strata
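The two steps above might be sketched as follows. The customer records, the `stratum_of` function, and the per-stratum sample sizes are all hypothetical; because each element is assigned to exactly one stratum, the grouping is mutually exclusive and collectively exhaustive by construction.

```python
import random
from collections import defaultdict

def stratified_sample(elements, stratum_of, sizes, seed=None):
    """Step 1: group elements into strata. Step 2: draw a simple random sample from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for element in elements:
        # Each element lands in exactly one stratum.
        strata[stratum_of(element)].append(element)
    # Independent simple random sample within every stratum.
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}

# Hypothetical frame: 30 customers, 10 in the "north" region and 20 in the "south".
customers = [(i, "north" if i % 3 == 0 else "south") for i in range(1, 31)]
picked = stratified_sample(customers, lambda c: c[1], {"north": 2, "south": 4}, seed=1)
```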
Stratified Sampling
(continued)
Variance and standard deviation within each stratum are
much lower than the variance and standard deviation of the
total sample.
Overall sample mean for the stratified sample (X̄ st.) is a
weighted average of the within-strata means, with the weight
given to each stratum Wj = Nst.j /N.
standard error of the mean (squared), summed over the A strata:
$$ s_{\bar{X}_{st.}}^{2} = \sum_{j=1}^{A} W_j^{2} \, s_{\bar{X}_{st.j}}^{2} $$
 proportionate stratified sampling – simple random
sample of each stratum with sample sizes proportional to
the number of elements in that stratum
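A small worked sketch of the weighted mean and the standard error formula for a stratified sample. It assumes that the squared standard error within stratum j is the stratum's sample variance divided by its sample size (no finite population correction); the two strata and their numbers are hypothetical.

```python
import math

def stratified_estimate(strata):
    """Return the stratified mean and its standard error.

    Each stratum is a dict with N (population size), n (sample size),
    mean (sample mean) and variance (sample variance).
    """
    total_population = sum(s["N"] for s in strata)
    mean = 0.0
    squared_se = 0.0
    for s in strata:
        weight = s["N"] / total_population            # W_j = N_st.j / N
        mean += weight * s["mean"]                    # weighted average of stratum means
        # Assumption: squared standard error within a stratum is s_j^2 / n_j.
        squared_se += weight ** 2 * s["variance"] / s["n"]
    return mean, math.sqrt(squared_se)

# Hypothetical proportionate design: 10% of each stratum is sampled.
strata = [
    {"N": 600, "n": 60, "mean": 10.0, "variance": 4.0},
    {"N": 400, "n": 40, "mean": 20.0, "variance": 9.0},
]
st_mean, st_se = stratified_estimate(strata)
```

Here the weights are 0.6 and 0.4, so the stratified mean is 0.6 × 10 + 0.4 × 20 = 14, and the squared standard error is 0.36 × 4 / 60 + 0.16 × 9 / 40 = 0.06.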
Stratified Sampling
(continued)
 formulas for proportionate stratified sampling:
N st.1 = population size in stratum 1
N st.2 = population size in stratum 2
n st.1 = sample size in stratum 1
n st.2 = sample size in stratum 2
X̄ st.1 = sample mean of stratum 1
X̄ st.2 = sample mean of stratum 2
s² st.1 = sample variance of stratum 1
s² st.2 = sample variance of stratum 2
 disproportionate stratified sampling – overall standard
error can be reduced by sampling more heavily in strata
with higher variability
 overall mean and standard error calculated with the same
formulas as in proportionate stratified sampling:
Wj = Nst.j /N
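To illustrate why sampling more heavily in the more variable strata can help, the sketch below compares a proportionate allocation with one that splits the sample in proportion to stratum size times stratum standard deviation. That second rule is one common choice (often called Neyman allocation) and is an assumption here, not something the slides specify; the stratum sizes and standard deviations are hypothetical.

```python
def proportionate_allocation(population_sizes, total_n):
    """Split the sample in proportion to stratum population sizes."""
    total = sum(population_sizes)
    return [total_n * N / total for N in population_sizes]

def variability_weighted_allocation(population_sizes, std_devs, total_n):
    """Split the sample in proportion to N_j * s_j (assumed rule, not from the slides)."""
    products = [N * s for N, s in zip(population_sizes, std_devs)]
    total = sum(products)
    return [total_n * p / total for p in products]

def squared_standard_error(population_sizes, std_devs, sample_sizes):
    """Sum of W_j^2 * s_j^2 / n_j over strata, with W_j = N_j / N."""
    total = sum(population_sizes)
    return sum((N / total) ** 2 * s ** 2 / n
               for N, s, n in zip(population_sizes, std_devs, sample_sizes))

# Hypothetical strata: the smaller stratum is far more variable.
sizes, sds = [600, 400], [2.0, 8.0]
proportionate = proportionate_allocation(sizes, 100)
heavier = variability_weighted_allocation(sizes, sds, 100)
se2_proportionate = squared_standard_error(sizes, sds, proportionate)
se2_heavier = squared_standard_error(sizes, sds, heavier)
```

In this example the allocations are fractional; in practice they would be rounded to whole respondents. The weights Wj stay the same in both designs, and only the stratum sample sizes change.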
Cluster Sampling
 population divided into mutually exclusive and
collectively exhaustive groups, followed by selection of a
random sample from among these groups
 clusters defined as close as possible to the population in
heterogeneity on the variables of interest – each cluster
should look like a simple random sample
 if the groups are less heterogeneous than the population,
the standard error will be larger
 can be far less costly than other procedures
 2 types:
 systematic sampling
 area sampling
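A minimal sketch of the basic idea of cluster sampling: the random selection is made among groups rather than among individual elements. The classroom data and function name are hypothetical, and this one-stage version keeps every element of each selected cluster.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly select whole clusters and keep every element in each selected cluster."""
    rng = random.Random(seed)
    # sorted() turns the cluster names into a list with a stable order for sampling.
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical population already divided into four clusters.
classrooms = {
    "room_a": ["Ana", "Ben", "Cy"],
    "room_b": ["Dee", "Eli"],
    "room_c": ["Fay", "Gus", "Hal", "Ida"],
    "room_d": ["Jo", "Kim"],
}
selected = cluster_sample(classrooms, 2, seed=3)
```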
Systematic Sampling
 every kth element in the frame is selected
 advantages:
 ease of sample selection
 lower cost
 no need for a complete sampling frame
 mean from any one sample is an unbiased estimator of the population
mean – results are often nearly identical to simple random sampling
 disadvantages:
 possible periodicity – cyclical pattern coinciding with sampling
interval
 potential for larger standard errors than random sample of same size
 sampling interval, k, is the reciprocal of the desired sampling
fraction: k = N / n
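The selection rule can be sketched as below. Two details are assumptions rather than statements from the slides: the starting position is drawn at random from the first k elements, and N is taken to be an exact multiple of n so that k is a whole number.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every kth element of the frame, starting from a random position below k."""
    k = len(frame) // n          # sampling interval k = N / n (assumed to divide evenly)
    rng = random.Random(seed)
    start = rng.randrange(k)     # assumed random start among the first k elements
    return frame[start::k][:n]

# Hypothetical frame of N = 100 elements and a desired sample of n = 10, so k = 10.
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=5)
```

If the frame has a cyclical pattern whose period matches k, every selected element falls at the same point in the cycle, which is the periodicity risk listed under disadvantages.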
Area Sampling
From a population of NB geographic areas, a simple random
sample of nB areas is selected.
 multistage area samples – multiple stages of selection,
from largest geographic area down to actual residents
 less statistically efficient than simple random sampling
1. equal-chance (or equal-proportion) selection of clusters
– each element has an equal chance of being selected
 each cluster given equal chance of selection
 same proportion of elements selected from each cluster
 disadvantage: elements selected to represent larger
clusters may not be very representative
Area Sampling
(continued)
2. probability proportionate to size (PPS) – probability
of cluster selection proportional to size of cluster,
with equal probability of element selection
 probability of selecting element in block B =
block probability × within-block element
probability
 net result of selection stages is equal probability of
selection
 relatively statistically efficient
 all but guarantees that large clusters are represented
in the sample
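A short arithmetic sketch of why the two stages cancel out under PPS. It assumes a single block is drawn with probability equal to its share of all elements, and that the same fixed number of elements is then selected within whichever block is drawn; the block sizes are hypothetical.

```python
from fractions import Fraction

def pps_element_probability(block_size, total_elements, elements_per_block):
    """Probability that a given element is selected: block stage times within-block stage."""
    # Stage 1: block chosen with probability proportionate to its size.
    block_probability = Fraction(block_size, total_elements)
    # Stage 2: a fixed number of elements chosen at random within the block.
    within_block_probability = Fraction(elements_per_block, block_size)
    return block_probability * within_block_probability

# Hypothetical blocks of very different sizes, 1,000 elements in total.
block_sizes = {"A": 50, "B": 100, "C": 250, "D": 600}
total = sum(block_sizes.values())
probabilities = {name: pps_element_probability(size, total, 10)
                 for name, size in block_sizes.items()}
```

The block size appears in the numerator of the first stage and the denominator of the second, so it cancels and every element ends up with the same overall chance of selection.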
Q. 8. Define Parameter and Statistic.
Parameter and Statistic
Parameter is the true value of quantity of interest in a
specific population.
Statistic is any quantity deriving from a sample, used to
estimate a population parameter.
Q. 9. Define Cluster Sampling.
Cluster Sampling
A process of selecting a cluster or group of elements
randomly.
Q. 10. What are the four types of nonprobability sampling?
Nonprobability Sampling Procedures
convenience sampling – convenience-based selection
 advantages:
 helpful for generating hypotheses in exploratory research
 can occasionally be used for conclusive studies when
inaccuracies are considered acceptable
 disadvantages:
 difference in size and direction between the population and
sample value can never be known
 cannot measure sampling error
 cannot make any conclusive statements about the results
 inappropriate for conclusive research
Nonprobability Sampling Procedures
(continued)
judgment sampling – selection based on judgment by
expert
 advantages:
 good for deciding which cities to test market
 can improve sample with selection for informed opinion
 if the expert’s judgment is valid, the sample will be better
than a convenience sample
 limitations:
 degree and direction of error are unknown
 conclusive statements cannot be viewed as meaningful
Nonprobability Sampling Procedures
(continued)
quota sampling – selection engineered to match the
population on specified “control characteristics”
 number of categories of control characteristics multiplied to
determine the number of cells of combinations
 cell sample size = [total sample size] x [proportion desired]
 limitations:
 respondents must be assigned to cells in accurate proportions
 all characteristics related to the measures of interest must be
included in the control characteristics
 can be difficult to find a sufficient number of desired respondents for
each cell
 selection of respondents may be biased
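The cell arithmetic for quota sampling can be sketched as follows. The control characteristics and their proportions are hypothetical, and the sketch assumes the characteristics are independent, so that each cell's desired proportion is the product of its category proportions; when joint population proportions are known, those would be used instead.

```python
from itertools import product

def quota_cells(total_sample_size, control_characteristics):
    """Return the target sample size for every combination of control categories."""
    names = list(control_characteristics)
    cells = {}
    # One cell per combination: the number of cells is the product of the category counts.
    for combo in product(*(control_characteristics[name].items() for name in names)):
        labels = tuple(category for category, _ in combo)
        proportion = 1.0
        for _, share in combo:
            proportion *= share  # assumed independence between characteristics
        # cell sample size = [total sample size] x [proportion desired]
        cells[labels] = total_sample_size * proportion
    return cells

# Hypothetical control characteristics: 2 x 3 = 6 cells.
controls = {
    "gender": {"female": 0.5, "male": 0.5},
    "age": {"18-34": 0.3, "35-54": 0.4, "55+": 0.3},
}
cells = quota_cells(200, controls)
```

For a total sample of 200, the cell for women aged 18-34 would call for roughly 200 × 0.5 × 0.3 = 30 respondents.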