SAMPLING AND SAMPLING DISTRIBUTIONS
CONTENTS
STATISTICS IN PRACTICE: MEAD CORPORATION
7.1 THE ELECTRONICS ASSOCIATES SAMPLING PROBLEM
7.2 SIMPLE RANDOM SAMPLING
Sampling from a Finite Population
Sampling from an Infinite Population
7.3 POINT ESTIMATION
7.4 INTRODUCTION TO SAMPLING DISTRIBUTIONS
7.5 SAMPLING DISTRIBUTION OF x̄
Expected Value of x̄
Standard Deviation of x̄
Central Limit Theorem
Sampling Distribution of x̄ for the EAI Sampling Problem
Practical Value of the Sampling Distribution of x̄
Relationship Between the Sample Size and the Sampling Distribution of x̄
7.6 SAMPLING DISTRIBUTION OF p̄
Expected Value of p̄
Standard Deviation of p̄
Form of the Sampling Distribution of p̄
Practical Value of the Sampling Distribution of p̄
7.7 PROPERTIES OF POINT ESTIMATORS
Unbiasedness
Efficiency
Consistency
7.8 OTHER SAMPLING METHODS
Stratified Random Sampling
Cluster Sampling
Systematic Sampling
Convenience Sampling
Judgment Sampling
WHY WE SHOULD USE SAMPLES
It is impractical to observe every element of a population in order to collect the necessary data.
Reasons for using samples:
There are a great many elements, so collecting data on all of them wastes too much time and money and is not timely.
The population is too large to study all of its elements.
The examination is destructive, as when testing artillery shells, light bulbs, bricks, and so on.
7.1 THE ELECTRONICS ASSOCIATES SAMPLING PROBLEM
The director of personnel for Electronics Associates, Inc. (EAI),
has been assigned the task of developing a profile of the
company’s 2500 managers. The characteristics to be identified
include the mean annual salary for the managers and the
proportion of managers having completed the company’s
management training program.
Using the 2500 managers as the population for this study, we
can find the annual salary and the training program status for
each individual by referring to the firm’s personnel records. The
data file containing this information for all 2500 managers in
the population is on the disk at the back of the book.
Using the formulas presented in Chapter 3, we can
compute the population mean and the population standard
deviation for the annual salary data.
Population mean: μ = $51,800
Population standard deviation: σ = $4000
Furthermore, the data for the training program status show
that 1500 of the 2500 managers have completed the
training program. Letting p denote the proportion of the
population having completed the training program, we see
that p = 1500/2500 = .60.
Now suppose that the necessary information on all the EAI
managers was not readily available in the company's database,
and that a sample of 30 managers will be used instead.
Clearly, the time and the cost of developing a profile would
be substantially less for 30 managers than for the entire
population.
If the personnel director could be assured that a sample
of 30 managers would provide adequate information about
the population of 2500 managers, working with a sample
would be preferable to working with the entire population.
Let us explore the possibility of using a sample for the EAI
study by first considering how we can identify a sample of
30 managers.
7.2 SIMPLE RANDOM SAMPLING
Several methods can be used to select a sample
from a population; one of the most common is simple
random sampling.
7.2.1 Sampling from a Finite Population
Simple Random Sample (Finite Population)
A simple random sample of size n from a finite
population of size N is a sample selected such that each
possible sample of size n has the same probability of
being selected.
In implementing the simple random sample selection
process, it is possible that a random number used
previously may appear again in the table before the
sample of 30 EAI managers has been selected. Because
we do not want to select a manager more than one time,
any previously used random numbers are ignored: the
corresponding manager is already included in the
sample. Selecting a sample in this manner is referred to as
sampling without replacement.
If we had selected the sample such that previously used
random numbers were acceptable and specific managers
could be included in the sample two or more times, we
would be sampling with replacement.
(When we refer to simple random sampling, we will assume
that the sampling is without replacement.)
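The distinction can be sketched in Python, assuming the 2500 managers are identified by hypothetical IDs 1 to 2500 and that a seeded pseudo-random generator stands in for the table of random numbers. `random.Random.sample` never repeats an element (without replacement), while `random.Random.choices` may repeat one (with replacement).

```python
import random

# Hypothetical manager IDs 1..2500 standing in for the EAI personnel list.
population = list(range(1, 2501))

rng = random.Random(42)  # fixed seed so the illustration is reproducible

# Sampling without replacement: no manager can be selected twice.
without_replacement = rng.sample(population, k=30)

# Sampling with replacement: the same manager may appear more than once.
with_replacement = rng.choices(population, k=30)

print(len(without_replacement), len(set(without_replacement)))  # 30 30
print(len(with_replacement))                                    # 30
```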
The number of different simple random samples of size n
that can be selected from a finite population of size N is
$$\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$$
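The counting formula can be checked numerically. This is a minimal sketch; the function name `number_of_simple_random_samples` is hypothetical, and the standard library's `math.comb` (Python 3.8+) computes the same binomial coefficient directly.

```python
import math

def number_of_simple_random_samples(N: int, n: int) -> int:
    """Distinct samples of size n drawn without replacement from N elements: N! / (n!(N - n)!)."""
    return math.factorial(N) // (math.factorial(n) * math.factorial(N - n))

# A small illustration: N = 5 elements, samples of size n = 2.
print(number_of_simple_random_samples(5, 2))                     # 10
print(math.comb(5, 2) == number_of_simple_random_samples(5, 2))  # True

# For the EAI problem (N = 2500, n = 30) the count is astronomically large;
# print how many decimal digits it has.
print(len(str(math.comb(2500, 30))))
```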
7.2.2 Sampling from an Infinite Population
Simple Random Sample (Infinite Population)
A simple random sample from an infinite
population is a sample selected such that the following
conditions are satisfied.
1. Each element selected comes from the same population.
2. Each element is selected independently.
For example, populations consisting of all possible
parts to be manufactured, all possible customer visits, all
possible bank transactions, and so on can be classified
as infinite populations.
7.3 POINT ESTIMATION
Now, let us return to the EAI problem. Assume that a
simple random sample of 30 managers has been selected
and that the corresponding data on annual salary and
management training program participation are as shown
in Table 7.2.
To estimate the value of a population parameter, we
compute a corresponding characteristic of the sample,
referred to as a sample statistic. For example, to estimate
the population mean μ and the population standard
deviation σ for the annual salary of EAI managers, we
simply use the data in Table 7.2 to calculate the
corresponding sample statistics: the sample mean x̄ and
the sample standard deviation s. The sample mean is
$$\bar{x} = \frac{\sum x_i}{n} = \frac{1{,}554{,}420}{30} = \$51{,}814.00$$
and the sample standard deviation is
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{325{,}009{,}260}{29}} = \$3347.72$$
In addition, by computing the proportion of managers in the
sample who responded Yes, we can estimate the proportion of
managers in the population who have completed the management
training program. Table 7.2 shows that 19 of the 30 managers
in the sample have completed the training program. Thus,
the sample proportion, denoted by p̄, is given by
$$\bar{p} = \frac{19}{30} = .63$$
This value is used as an estimate of the population proportion p.
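The three point estimates can be reproduced from the summary quantities quoted above. Table 7.2 itself is not reproduced here, so this sketch starts from the totals rather than from the 30 raw salaries.

```python
import math

n = 30
sum_x = 1_554_420          # sum of the 30 sampled annual salaries (Table 7.2)
sum_sq_dev = 325_009_260   # sum of squared deviations from the sample mean
yes_count = 19             # sampled managers who completed the training program

x_bar = sum_x / n                        # sample mean, point estimate of mu
s = math.sqrt(sum_sq_dev / (n - 1))      # sample standard deviation (n - 1 divisor)
p_bar = yes_count / n                    # sample proportion, point estimate of p

print(f"x_bar = {x_bar:,.2f}")   # x_bar = 51,814.00
print(f"s     = {s:,.2f}")       # s     = 3,347.72
print(f"p_bar = {p_bar:.2f}")    # p_bar = 0.63
```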
By making the preceding computations, we have performed
the statistical procedure called point estimation. We refer to x̄
as the point estimator of the population mean μ, s as the point
estimator of the population standard deviation σ, and p̄ as
the point estimator of the population proportion p. The
actual numerical value obtained for x̄, s, or p̄ in a
particular sample is called the point estimate of the
parameter.
7.4 INTRODUCTION TO SAMPLING DISTRIBUTIONS
The probability distribution of any particular sample
statistic is called the sampling distribution of the statistic.
Because the various possible values of x̄ and p̄ are
the result of different simple random samples, the
probability distributions of x̄ and p̄ are called the sampling
distributions of x̄ and p̄.
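One way to see a sampling distribution is to simulate it: draw many simple random samples, compute x̄ and p̄ for each, and examine the resulting collection of values. The EAI data file is not available here, so this sketch assumes a synthetic population of 2500 salaries drawn from a normal distribution with mean 51,800 and standard deviation 4000, with 1500 of the 2500 managers marked as having completed the training program.

```python
import random
import statistics

rng = random.Random(7)

# Synthetic stand-in for the EAI population (an assumption, not the real data file).
salaries = [rng.gauss(51_800, 4_000) for _ in range(2500)]
trained = [1] * 1500 + [0] * 1000  # 1 = completed the training program

def sample_mean_and_proportion(n: int) -> tuple[float, float]:
    """Draw one simple random sample (without replacement) of n managers; return (x_bar, p_bar)."""
    idx = rng.sample(range(2500), k=n)
    x_bar = statistics.fmean(salaries[i] for i in idx)
    p_bar = sum(trained[i] for i in idx) / n
    return x_bar, p_bar

# Repeating the sampling many times builds up the two sampling distributions.
results = [sample_mean_and_proportion(30) for _ in range(2000)]
x_bars = [r[0] for r in results]
p_bars = [r[1] for r in results]

print(round(statistics.fmean(x_bars)), round(statistics.stdev(x_bars)))
print(round(statistics.fmean(p_bars), 3), round(statistics.stdev(p_bars), 3))
```

The mean of the simulated x̄ values should land close to the mean of the synthetic population, and the mean of the simulated p̄ values close to .60.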
7.5 SAMPLING DISTRIBUTION OF x̄
The sampling distribution of x̄ is the probability distribution
of all possible values of the sample mean x̄.
Figure: The statistical process of using a sample mean to make inferences about a population mean. A population with mean μ = ? → a simple random sample of n elements is selected from the population → the sample data provide a value for the sample mean x̄ → the value of x̄ is used to make inferences about the value of μ.
7.5.1 Expected Value of x̄
$$E(\bar{x}) = \mu$$
where
E(x̄) = the expected value of x̄
μ = the population mean
This result shows that with simple random sampling, the
expected value or mean of the sampling distribution of x̄ is
equal to the mean of the population.
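A short derivation of this result: under simple random sampling, each selected value x_i, viewed on its own, is equally likely to be any element of the population, so E(x_i) = μ, and linearity of expectation does the rest.

```latex
E(\bar{x}) = E\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)
           = \frac{1}{n}\sum_{i=1}^{n} E(x_i)
           = \frac{1}{n}\,(n\mu)
           = \mu
```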
7.5.2 Standard Deviation of x̄
Let us define the standard deviation of the sampling
distribution of x̄. We will use the following notation.
σ_x̄ = the standard deviation of the sampling distribution of x̄
σ = the standard deviation of the population
n = the sample size
N = the population size
Standard Deviation of x̄
Finite Population:
$$\sigma_{\bar{x}} = \sqrt{\frac{N-n}{N-1}}\left(\frac{\sigma}{\sqrt{n}}\right)$$
Infinite Population:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
We can see that the factor $\sqrt{(N-n)/(N-1)}$ is required for
the finite population case but not for the infinite population
case. This factor is commonly referred to as the finite
population correction factor.
Use the Following Expression to Calculate the Standard Deviation of x̄
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
whenever
1. The population is infinite; or
2. The population is finite and the sample size is less
than or equal to 5% of the population size; that is, n/N ≤ .05.
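A minimal sketch of the standard deviation of x̄ that applies the 5% rule above; the function name `standard_error_of_mean` is hypothetical. For the EAI problem, n/N = 30/2500 = .012, so the simpler formula applies.

```python
import math
from typing import Optional

def standard_error_of_mean(sigma: float, n: int, N: Optional[int] = None) -> float:
    """Standard deviation of the sampling distribution of x_bar.

    The finite population correction is applied only when a population size N is
    supplied and the sample exceeds 5% of it (n/N > .05); otherwise sigma/sqrt(n).
    """
    se = sigma / math.sqrt(n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))  # finite population correction factor
    return se

# EAI: sigma = 4000, n = 30, N = 2500 -> n/N = .012, so no correction is applied.
print(round(standard_error_of_mean(4000, 30, 2500), 2))  # 730.3

# For comparison, the finite-population formula applied anyway.
with_fpc = math.sqrt((2500 - 30) / (2500 - 1)) * 4000 / math.sqrt(30)
print(round(with_fpc, 2))
```

Including the correction anyway gives about $726.05, a difference of under 1%, which is why the 5% rule is a reasonable shortcut.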
7.5.3 Central Limit Theorem
The final step in identifying the characteristics of the sampling
distribution of x̄ is to determine the form of the probability
distribution of x̄. We consider two cases: one in which the
population distribution is unknown and one in which the
population distribution is known to be normally distributed.
When the population distribution is unknown, we rely on
one of the most important theorems in statistics: the
central limit theorem. A statement of the central limit theorem
as it applies to the sampling distribution of x̄ follows.
Central Limit Theorem
In selecting simple random samples of size n from a
population, the sampling distribution of the sample mean x̄
can be approximated by a normal probability distribution as
the sample size becomes large.
Figure: Illustration of the central limit theorem for three populations.
In summary, if we use a large (n ≥ 30) simple random sample, the central limit
theorem enables us to conclude that the sampling distribution of x̄ can be
approximated by a normal probability distribution. When the population itself
is normally distributed, the sampling distribution of x̄ is normal for any sample size.
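The central limit theorem can be watched at work in a small simulation. As an illustrative assumption, the population is exponential with mean 1, which is strongly right-skewed (skewness 2). The sketch draws many samples of size n and reports the mean and the skewness of the resulting sample means; a skewness near 0 is consistent with an approximately normal shape.

```python
import random
import statistics

rng = random.Random(2024)

def skewness(values: list[float]) -> float:
    """Sample skewness computed as the average cubed z-score (population-moment form)."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - m) / sd) ** 3 for v in values)

def sample_means(n: int, reps: int = 5000) -> list[float]:
    """Means of `reps` independent samples of size n from an exponential population with mean 1."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

for n in (1, 5, 30):
    means = sample_means(n)
    print(n, round(statistics.fmean(means), 3), round(skewness(means), 2))
```

As n grows from 1 to 30, the skewness of the sample means should fall toward 0 (theory gives 2/√n, about 0.37 at n = 30), while their mean stays near the population mean of 1.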
7.5.4 Relationship Between the Sample Size and the Sampling Distribution of x̄
Figure: A comparison of the sampling distributions of x̄ for simple random samples of n = 30 and n = 100 EAI managers. Both distributions are centered at μ = 51,800; with n = 30, σ_x̄ = 730.30, and with n = 100, σ_x̄ = 400.
As the sample size increases, the standard error of the
mean decreases. As a result, a larger sample size
provides a higher probability that the sample mean is within a
specified distance of the population mean.
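To make "a higher probability" concrete, the sketch below uses the normal approximation to compute P(μ − 500 ≤ x̄ ≤ μ + 500) for the two sample sizes in the figure, with σ = 4000. The ±$500 margin is an illustrative choice, not a value from the text, and the function name `prob_within` is hypothetical.

```python
from statistics import NormalDist

def prob_within(margin: float, sigma: float, n: int) -> float:
    """Normal-approximation probability that x_bar falls within +/- margin of mu."""
    se = sigma / n ** 0.5        # standard error of the mean (infinite-population formula)
    z = margin / se
    std_normal = NormalDist()    # standard normal distribution
    return std_normal.cdf(z) - std_normal.cdf(-z)

for n in (30, 100):
    print(n, round(4000 / n ** 0.5, 2), round(prob_within(500, 4000, n), 4))
```

The probability rises from roughly .51 with n = 30 to roughly .79 with n = 100.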
7.6 SAMPLING DISTRIBUTION OF p̄
The sampling distribution of p̄ is the probability distribution
of all possible values of the sample proportion p̄.
Figure: The statistical process of using a sample proportion to make inferences about a population proportion. A population with proportion p = ? → a simple random sample of n elements is selected from the population → the sample data provide a value for the sample proportion p̄ → the value of p̄ is used to make inferences about the value of p.
7.6.1 Expected Value of p̄
$$E(\bar{p}) = p$$
where
E(p̄) = the expected value of p̄
p = the population proportion
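The same argument used for x̄ gives this result. Code each sampled element as y_i = 1 if it has the characteristic of interest and y_i = 0 otherwise; then p̄ is the sample mean of the y_i, and the population mean of the y values is p.

```latex
\bar{p} = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad E(y_i) = 1 \cdot p + 0 \cdot (1 - p) = p
\quad\Longrightarrow\quad
E(\bar{p}) = \frac{1}{n}\sum_{i=1}^{n} E(y_i) = \frac{1}{n}\,(np) = p
```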
7.6.2 Standard Deviation of p̄
Finite Population:
$$\sigma_{\bar{p}} = \sqrt{\frac{N-n}{N-1}}\sqrt{\frac{p(1-p)}{n}}$$
Infinite Population:
$$\sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}}$$
We see that the only difference is the use of the finite
population correction factor $\sqrt{(N-n)/(N-1)}$.
Use the Following Expression to Calculate the Standard Deviation of p̄
$$\sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}}$$
whenever
1. The population is infinite; or
2. The population is finite and the sample size is less
than or equal to 5% of the population size; that is, n/N ≤ .05.
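The standard deviation of p̄ follows the same pattern as for x̄. In this sketch the helper name `standard_error_of_proportion` is hypothetical; for EAI, p = .60, n = 30, and n/N = .012 ≤ .05, so the correction factor is skipped.

```python
import math
from typing import Optional

def standard_error_of_proportion(p: float, n: int, N: Optional[int] = None) -> float:
    """Standard deviation of p_bar, with the finite population correction only when n/N > .05."""
    se = math.sqrt(p * (1 - p) / n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))  # finite population correction factor
    return se

print(round(standard_error_of_proportion(0.60, 30, 2500), 4))  # 0.0894
```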
7.6.3 Form of the Sampling Distribution of p̄
The sampling distribution of p̄ can be approximated by a
normal probability distribution whenever the sample size is
large.
With p̄, the sample size can be considered large
whenever the following two conditions are satisfied.
$$np \ge 5$$
$$n(1-p) \ge 5$$
7.7 PROPERTIES OF POINT ESTIMATORS
The properties of good point estimators: unbiasedness, efficiency, and consistency.
Because several different sample statistics can be used
as point estimators of different population parameters, we
will use the following general notation in this section.
θ = the population parameter of interest
θ̂ = the sample statistic or point estimator of θ
In general, θ represents any population parameter; θ̂
represents the corresponding sample statistic.
7.7.1 Unbiasedness
If the expected value of the sample statistic is equal to
the population parameter being estimated, the sample
statistic is said to be an unbiased estimator of the
population parameter.
Unbiasedness
The sample statistic θ̂ is an unbiased estimator of the
population parameter θ if
$$E(\hat{\theta}) = \theta$$
where
E(θ̂) = the expected value of the sample statistic θ̂
Hence, the expected value, or mean, of all possible values
of an unbiased sample statistic is equal to the population
parameter being estimated.
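Unbiasedness can be verified exactly on a tiny example by enumerating every possible sample. As an illustrative assumption, take a population of three equally likely values {1, 2, 3} and draw samples of size 2 with replacement, which makes the draws independent (the infinite-population case). Averaging over all 9 equally likely samples shows that x̄ is unbiased for μ and that the sample variance with divisor n − 1 is unbiased for the population variance σ², while dividing by n gives a biased estimator. Exact `Fraction` arithmetic avoids rounding.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population of three equally likely values.
population = [1, 2, 3]
mu = Fraction(sum(population), len(population))
sigma2 = sum((Fraction(v) - mu) ** 2 for v in population) / len(population)

def sample_variance(xs, divisor_offset):
    """Sum of squared deviations from the sample mean, divided by n - divisor_offset."""
    n = len(xs)
    m = Fraction(sum(xs), n)
    return sum((Fraction(v) - m) ** 2 for v in xs) / (n - divisor_offset)

# Every ordered sample of size 2 drawn with replacement is equally likely.
samples = list(product(population, repeat=2))
e_xbar = sum(Fraction(sum(s), 2) for s in samples) / len(samples)
e_s2 = sum(sample_variance(s, 1) for s in samples) / len(samples)      # divisor n - 1
e_biased = sum(sample_variance(s, 0) for s in samples) / len(samples)  # divisor n

print(mu, e_xbar)      # 2 2
print(sigma2, e_s2)    # 2/3 2/3
print(e_biased)        # 1/3
```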
Figure: Examples of unbiased and biased point estimators. (a) Unbiased estimator: the parameter θ is located at the mean of the sampling distribution of θ̂, so E(θ̂) = θ. (b) Biased estimator: the parameter θ is not located at the mean of the sampling distribution of θ̂, so E(θ̂) ≠ θ; the distance between E(θ̂) and θ is the bias.
7.7.2 Efficiency
When two unbiased point estimators of the same population
parameter are available, the point estimator with the smaller
standard deviation is said to have greater relative efficiency
than the other.
Figure: Sampling distributions of two unbiased point estimators, θ̂₁ and θ̂₂, both centered at the parameter θ.
Note that the standard deviation of θ̂₁ is less than the standard
deviation of θ̂₂; thus, values of θ̂₁ have a greater chance of
being close to the parameter θ than do values of θ̂₂. Because
the standard deviation of point estimator θ̂₁ is less than the
standard deviation of point estimator θ̂₂, θ̂₁ is relatively more
efficient than θ̂₂ and is the preferred point estimator.
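For a normal population, both the sample mean and the sample median are unbiased estimators of the population mean, so relative efficiency decides between them. This simulation sketch (a standard normal population and samples of size 25, both illustrative assumptions) estimates the standard deviation of each estimator's sampling distribution.

```python
import random
import statistics

rng = random.Random(11)

def estimator_spread(n: int = 25, reps: int = 4000) -> tuple[float, float]:
    """Standard deviations of the sample mean and the sample median over repeated
    samples of size n from a normal population with mean 0 and standard deviation 1."""
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.stdev(means), statistics.stdev(medians)

sd_mean, sd_median = estimator_spread()
print(round(sd_mean, 3), round(sd_median, 3))
```

The sample mean's standard deviation should come out near 1/√25 = .20, with the median's noticeably larger (theory puts it about 25% higher for large n), so x̄ is the relatively more efficient estimator in this setting.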
7.7.3 Consistency
Loosely speaking, a point estimator is consistent if the
values of the point estimator tend to become closer to the
population parameter as the sample size becomes larger.
In other words, a large sample size tends to provide a
better point estimate than a small sample size.
Note that for the sample mean x̄, we showed that the
standard deviation of x̄ is given by σ_x̄ = σ/√n. Because σ_x̄ is
related to the sample size such that larger sample sizes
provide smaller values for σ_x̄, we conclude that a larger
sample size tends to provide point estimates closer to the
population mean μ. In this sense, we can say that the
sample mean x̄ is a consistent estimator of the population
mean μ. Using a similar rationale, we can also conclude
that the sample proportion p̄ is a consistent estimator of
the population proportion p.
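The consistency argument for x̄ rests on σ_x̄ = σ/√n shrinking as n grows. Using the EAI value σ = 4000 and sample sizes that stay within 5% of N = 2500 (so no correction factor is needed), the sketch tabulates σ_x̄.

```python
import math

sigma = 4000                   # EAI population standard deviation
sizes = [10, 30, 60, 125]      # all satisfy n/N <= .05 for N = 2500
std_errors = [sigma / math.sqrt(n) for n in sizes]

for n, se in zip(sizes, std_errors):
    print(f"n = {n:4d}  sigma_xbar = {se:8.2f}")
```

Because σ_x̄ falls with √n, quadrupling the sample size halves σ_x̄: precision keeps improving as n grows, but with diminishing returns.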
7.8 OTHER SAMPLING METHODS
7.8.1 Stratified Random Sampling
In stratified random sampling, the elements in the
population are first divided into groups called strata, such
that each element in the population belongs to one and
only one stratum. The basis for forming the strata, such as
department, location, age, industry type, and so on, is at
the discretion of the designer of the sample.
Diagram for stratified random sampling: the population is divided into Stratum 1, Stratum 2, ..., Stratum H.
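A minimal sketch of stratified random sampling. The text leaves the basis for the strata, and the per-stratum sample sizes, to the designer; this sketch assumes proportional allocation (each stratum's share of the sample matches its share of the population) and a hypothetical split of the 2500 EAI managers into three departments. Within each stratum, a simple random sample is drawn.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n, rng):
    """Stratified random sample with proportional allocation (an assumed design choice).

    Each element is placed in exactly one stratum by `stratum_of`; a simple random
    sample is then drawn within every stratum. Per-stratum sizes are rounded, so in
    general the total can differ slightly from n.
    """
    strata = defaultdict(list)
    for element in population:
        strata[stratum_of(element)].append(element)

    sample = []
    for members in strata.values():
        n_h = round(n * len(members) / len(population))  # proportional allocation
        sample.extend(rng.sample(members, k=n_h))         # SRS within the stratum
    return sample

rng = random.Random(3)
# Hypothetical EAI strata: 1000 Engineering, 750 Sales, 750 Operations managers.
managers = [(i, "Engineering" if i < 1000 else "Sales" if i < 1750 else "Operations")
            for i in range(2500)]
sample = stratified_sample(managers, lambda m: m[1], 30, rng)
print(len(sample))  # 30
```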
7.8.2 Cluster Sampling
In cluster sampling, the elements in the population are
first divided into separate groups called clusters. Each
element of the population belongs to one and only one
cluster.
Diagram for cluster sampling: the population is divided into Cluster 1, Cluster 2, ..., Cluster K.
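A sketch of single-stage cluster sampling, one common form in which a simple random sample of clusters is selected and every element of each selected cluster enters the sample. The clusters here (city blocks of households) and the function name `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, k, rng):
    """Single-stage cluster sample: choose k clusters by simple random sampling,
    then include every element of each chosen cluster."""
    chosen = rng.sample(list(clusters), k=k)
    return [element for label in chosen for element in clusters[label]]

rng = random.Random(5)
# Hypothetical clusters: 10 city blocks with 20 households each.
blocks = {f"block-{b}": [f"block-{b}/house-{h}" for h in range(20)] for b in range(10)}
sample = cluster_sample(blocks, 3, rng)
print(len(sample))  # 60
```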
7.8.3 Systematic Sampling
An alternative to simple random sampling is systematic
sampling.
For example, if a sample size of 50 is desired from a
population containing 5000 elements, we will sample one
element for every 5000/50=100 elements in the population.
A systematic sample for this case involves selecting
randomly one of the first 100 elements from the population
list. Other sample elements are identified by starting with
the first sampled element and then selecting every 100th
element that follows in the population list. In effect, the
sample of 50 is identified by moving systematically through
the population and identifying every 100th element after
the first randomly selected element.
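The systematic-sampling procedure just described translates almost line for line into code. This sketch assumes a hypothetical list of 5000 element IDs and a seeded generator for the random start; `systematic_sample` is an illustrative name.

```python
import random

def systematic_sample(population, n, rng):
    """Pick a random start among the first k elements, then take every k-th element,
    where k = N // n is the sampling interval."""
    k = len(population) // n          # sampling interval (5000 // 50 = 100)
    start = rng.randrange(k)          # random position among the first k elements
    return population[start::k][:n]

rng = random.Random(9)
population = list(range(1, 5001))     # hypothetical list of 5000 element IDs
sample = systematic_sample(population, 50, rng)
print(len(sample), sample[1] - sample[0])  # 50 100
```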
7.8.4 Convenience Sampling
Convenience sampling is a nonprobability sampling
technique. As the name implies, the sample is identified
primarily by convenience. Elements are included in the
sample without prespecified or known probabilities of being
selected.
For example, a professor conducting research at a
university may use student volunteers to constitute a
sample simply because they are readily available and will
participate as subjects for little or no cost.
Convenience samples have the advantage of relatively
easy sample selection and data collection; however, it is
impossible to evaluate the “goodness” of the sample in
terms of its representativeness of the population.
7.8.5 Judgment Sampling
One additional nonprobability sampling technique is
judgment sampling. In this approach, the person most
knowledgeable on the subject of the study selects
elements of the population that he or she feels are most
representative of the population. Often this method is a
relatively easy way of selecting a sample.
For example, a reporter may sample two or three
senators, judging that those senators reflect the general
opinion of all senators. However, the quality of the sample
results depends on the judgment of the person selecting
the sample. Again, great caution is warranted in drawing
conclusions based on judgment samples used to make
inferences about populations.
SUMMARY
GLOSSARY
Parameter, Simple random sampling, Sampling without
replacement, Sampling with replacement, Sample statistic,
Point estimate, Point estimator, Sampling error, Sampling
distribution, Finite population correction factor, Standard
error, Central limit theorem, Unbiasedness, Relative efficiency,
Consistency, Stratified random sampling, Cluster sampling,
Systematic sampling, Convenience sampling.
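KEY FORMULAS
Expected Value of x̄
$$E(\bar{x}) = \mu$$
Standard Deviation of x̄
Finite Population:
$$\sigma_{\bar{x}} = \sqrt{\frac{N-n}{N-1}}\left(\frac{\sigma}{\sqrt{n}}\right)$$
Infinite Population:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
Expected Value of p̄
$$E(\bar{p}) = p$$
Standard Deviation of p̄
Finite Population:
$$\sigma_{\bar{p}} = \sqrt{\frac{N-n}{N-1}}\sqrt{\frac{p(1-p)}{n}}$$
Infinite Population:
$$\sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}}$$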