Download sample

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Taylor's law wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

Student's t-test wikipedia , lookup

Resampling (statistics) wikipedia , lookup

Misuse of statistics wikipedia , lookup

Gibbs sampling wikipedia , lookup

Sampling (statistics) wikipedia , lookup

Transcript
Anderson
u
Sweeney
u
Williams
CONTEMPORARY
BUSINESS
STATISTICS
WITH MICROSOFT EXCEL
u Slides Prepared by JOHN LOUCKS u
© 2001 South-Western/Thomson Learning
Slide 1
Chapter 7
Sampling and Sampling Distributions
Simple Random Sampling
 Point Estimation
 Introduction to Sampling
Distributions
 Sampling Distribution of
x
 Sampling Distribution of
p
 Other Sampling Methods

Slide 2
Statistical Inference






The purpose of statistical inference is to obtain
information about a population from information
contained in a sample.
A population is the set of all the elements of interest.
A sample is a subset of the population.
The sample results provide only estimates of the
values of the population characteristics.
A parameter is a numerical characteristic of a
population.
With proper sampling methods, the sample results
will provide “good” estimates of the population
characteristics.
Slide 3
Simple Random Sampling

Finite Population
• A simple random sample from a finite population
of size N is a sample selected such that each
possible sample of size n has the same probability
of being selected.
• Replacing each sampled element before selecting
subsequent elements is called sampling with
replacement.
• Sampling without replacement is the procedure
used most often.
• In large sampling projects, computer-generated
random numbers are often used to automate the
sample selection process.
Slide 4
Simple Random Sampling

Infinite Population
• A simple random sample from an infinite
population is a sample selected such that the
following conditions are satisfied.
• Each element selected comes from the same
population.
• Each element is selected independently.
• The population is usually considered infinite if it
involves an ongoing process that makes listing or
counting every element impossible.
• The random number selection procedure cannot
be used for infinite populations.
Slide 5
Point Estimation




In point estimation we use the data from the sample
to compute a value of a sample statistic that serves as
an estimate of a population parameter.
We refer to x as the point estimator of the population
mean .
s is the point estimator of the population standard
deviation .
p is the point estimator of the population proportion
p.
Slide 6
Sampling Error



The absolute difference between an unbiased point
estimate and the corresponding population
parameter is called the sampling error.
Sampling error is the result of using a subset of the
population (the sample), and not the entire
population to develop estimates.
The sampling errors are:
| x   | for sample mean
|s -  | for sample standard deviation
| p  p | for sample proportion
Slide 7
Example: St. Edward’s
St. Edward’s University receives 1,500 applications
annually from prospective students. The application
forms contain a variety of information including the
individual’s scholastic aptitude test (SAT) score and
whether or not the individual is an in-state resident.
The director of admissions would like to know, at
least roughly, the following information:
• the average SAT score for the applicants, and
• the proportion of applicants that are in-state
residents.
We will now look at two alternatives for obtaining
the desired information.
Slide 8
Example: St. Edward’s

Alternative #1: Take a Census of 1,500 Applicants
• SAT Scores
• Population Mean
xi


 990
1,500
• Population Standard Deviation

2
(
x


)
 i
1,500
 80
• In-State Applicants
• Population Proportion
1,080
p
 .72
1,500
Slide 9
Example: St. Edward’s

Alternative #2: Take a Sample of 50 Applicants
• Excel can be used to select a simple random
sample without replacement.
• The process is based on random numbers
generated by Excel’s RAND function.
• RAND function generates numbers in the interval
from 0 to 1.
• Any number in the interval is equally likely.
• The numbers are actually values of a uniformly
distributed random variable.
Slide 10
Example: St. Edward’s

Using Excel to Select a Simple Random Sample
• 1500 random numbers are generated, one for each
applicant in the population.
• Then we choose the 50 applicants corresponding
to the 50 smallest random numbers as our sample.
• Each of the 1500 applicants have the same
probability of being included.
Slide 11
Using Excel to Select
a Simple Random Sample

Formula Worksheet
A
1
2
3
4
5
6
7
8
9
SAT Score
1008
1025
952
1090
1127
1015
965
1161
B
In-State
Yes
No
Yes
Yes
Yes
No
Yes
No
C
Random
Number
=RAND()
=RAND()
=RAND()
=RAND()
=RAND()
=RAND()
=RAND()
=RAND()
Note: Rows 10-1501 are not shown.
Slide 12
Using Excel to Select
a Simple Random Sample

Value Worksheet
A
1
2
3
4
5
6
7
8
9
SAT Score
1008
1025
952
1090
1127
1015
965
1161
B
In-State
Yes
No
Yes
Yes
Yes
No
Yes
No
C
Random
Number
0.38184
0.08037
0.25515
0.82225
0.38700
0.52999
0.27962
0.28245
Note: Rows 10-1501 are not shown.
Slide 13
Using Excel to Select
a Simple Random Sample

Put Random Numbers in Ascending Order
• Step 1 Select cells A2:A1501
• Step 2 Select the Data pull-down menu
• Step 3 Choose the Sort option
• Step 4 When the Sort dialog box appears:
Choose Random Numbers
in the Sort by text box
Choose Ascending
Click OK
Slide 14
Using Excel to Select
a Simple Random Sample

Value Worksheet (Sorted)
A
1
2
3
4
5
6
7
8
9
SAT Score
1107
1043
991
1008
1127
982
1163
1008
B
In-State
No
Yes
Yes
No
Yes
Yes
Yes
No
C
Random
Number
0.00027
0.00192
0.00303
0.00481
0.00538
0.00583
0.00649
0.00667
Note: Rows 10-1501 are not shown.
Slide 15
Example: St. Edward’s

Point Estimates
• x as Point Estimator of 
 xi 49, 850
x

 997
50
50
• s as Point Estimator of 
277, 097
 ( xi  x )
s

 75. 2
49
49
• p as Point Estimator of p
p  34 50 . 68
 Note: Different random numbers would have
identified a different sample which would have
resulted in different point estimates.
2
Slide 16
Sampling Distribution of x
The sampling distribution of x is the probability
distribution of all possible values of the sample
mean x .

Expected Value of x
E(x) = 
where:
 = the population mean
Slide 17
Sampling Distribution of x

Standard Deviation of x
Finite Population

N n
x  ( )
n N 1
Infinite Population
x 

n
• A finite population is treated as being
infinite if n/N < .05.
• ( N  n) / ( N  1) is the finite correction factor.
•  x is referred to as the standard error of the mean.
Slide 18
Sampling Distribution of x

If we use a large (n > 30) simple random sample, the
central limit theorem enables us to conclude that the
sampling distribution of x can be approximated by a
normal probability distribution.

When the simple random sample is small (n < 30), the
sampling distribution of x can be considered normal
only if we assume the population has a normal
probability distribution.
Slide 19
Example: St. Edward’s

Sampling Distribution of
x for the SAT Scores

80
x 

 11. 3
n
50
E ( x )    990
x
Slide 20
Example: St. Edward’s

Sampling Distribution of x for the SAT Scores
What is the probability that a simple random
sample of 50 applicants will provide an estimate of
the population mean SAT score that is within plus or
minus 10 of the actual population mean  ?
In other words, what is the probability that x
will be between 980 and 1000?
Slide 21
Example: St. Edward’s

Sampling Distribution of x for the SAT Scores
Sampling
distribution
of x
Area = .3106
Area = .3106
x
980 990 1000
Using the standard normal probability table with
z = 10/11.3 = .88, we have area = (.3106)(2) = .6212.
Slide 22
Sampling Distribution of p
The sampling distribution of p is the probability
distribution of all possible values of the sample
proportion p.

Expected Value of p
E ( p)  p
where:
p = the population proportion
Slide 23
Sampling Distribution of p

Standard Deviation of p
Finite Population
p 
p(1  p) N  n
n
N 1
Infinite Population
p 
p(1  p)
n
•  p is referred to as the standard error of the
proportion.
Slide 24
Example: St. Edward’s

Sampling Distribution of p for In-State Residents
p
.72(1.72)

.0635
50
E ( p )  p . 72
The normal probability distribution is an acceptable
approximation since np = 50(.72) = 36 > 5 and
n(1 - p) = 50(.28) = 14 > 5.
Slide 25
Example: St. Edward’s

Sampling Distribution of p for In-State Residents
What is the probability that a simple random
sample of 50 applicants will provide an estimate of
the population proportion of in-state residents that is
within plus or minus .05 of the actual population
proportion?
In other words, what is the probability that p
will be between .67 and .77?
Slide 26
Example: St. Edward’s

Sampling Distribution of p for In-State Residents
Sampling
distribution
of p
Area = .2852
Area = .2852
0.67 0.72 0.77
p
For z = .05/.0635 = .79, the area = (.2852)(2) = .5704.
The probability is .5704 that the sample proportion will
be within +/-.05 of the actual population proportion.
Slide 27
Other Sampling Methods





Stratified Random Sampling
Cluster Sampling
Systematic Sampling
Convenience Sampling
Judgment Sampling
Slide 28
Stratified Random Sampling
The population is first divided into groups of
elements called strata.
 Each element in the population belongs to one and
only one stratum.
 Best results are obtained when the elements within
each stratum are as much alike as possible (i.e.
homogeneous group).
 A simple random sample is taken from each stratum.
 Formulas are available for combining the stratum
sample results into one population parameter
estimate.

Slide 29
Stratified Random Sampling


Advantage: If strata are homogeneous, this method
is as “precise” as simple random sampling but with a
smaller total sample size.
Example: The basis for forming the strata might be
department, location, age, industry type, etc.
Slide 30
Cluster Sampling
The population is first divided into separate groups
of elements called clusters.
 Ideally, each cluster is a representative small-scale
version of the population (i.e. heterogeneous group).
 A simple random sample of the clusters is then taken.
 All elements within each sampled (chosen) cluster
form the sample.
… continued

Slide 31
Cluster Sampling



Advantage: The close proximity of elements can be
cost effective (I.e. many sample observations can be
obtained in a short time).
Disadvantage: This method generally requires a
larger total sample size than simple or stratified
random sampling.
Example: A primary application is area sampling,
where clusters are city blocks or other well-defined
areas.
Slide 32
Systematic Sampling
If a sample size of n is desired from a population
containing N elements, we might sample one element
for every n/N elements in the population.
 We randomly select one of the first n/N elements
from the population list.
 We then select every n/Nth element that follows in
the population list.
 This method has the properties of a simple random
sample, especially if the list of the population
elements is a random ordering.
… continued

Slide 33
Systematic Sampling


Advantage: The sample usually will be easier to
identify than it would be if simple random sampling
were used.
Example: Selecting every 100th listing in a telephone
book after the first randomly selected listing.
Slide 34
Convenience Sampling
It is a nonprobability sampling technique. Items are
included in the sample without known probabilities
of being selected.
 The sample is identified primarily by convenience.
 Advantage: Sample selection and data collection are
relatively easy.
 Disadvantage: It is impossible to determine how
representative of the population the sample is.
 Example: A professor conducting research might use
student volunteers to constitute a sample.

Slide 35
Judgment Sampling
The person most knowledgeable on the subject of the
study selects elements of the population that he or
she feels are most representative of the population.
 It is a nonprobability sampling technique.
 Advantage: It is a relatively easy way of selecting a
sample.
 Disadvantage: The quality of the sample results
depends on the judgment of the person selecting the
sample.
 Example: A reporter might sample three or four
senators, judging them as reflecting the general
opinion of the senate.

Slide 36
End of Chapter 7
Slide 37