Business Statistics
Week 7
Sampling and Sampling Distribution
Agenda
Time          Activity
30 minutes    Sampling
60 minutes    Sampling Distribution of the Mean
60 minutes    Sampling Distribution of the Proportion
50 minutes    Exercise
Learning Objectives
In this chapter, you learn:
• To distinguish between different sampling
methods
• The concept of the sampling distribution
• To compute probabilities related to the
sample mean and the sample proportion
• The importance of the Central Limit Theorem
SAMPLING
Why Sample?
• Selecting a sample is less time-consuming
than selecting every item in the population
(census).
• Selecting a sample is less costly than selecting
every item in the population.
• An analysis of a sample is less cumbersome
and more practical than an analysis of the
entire population.
A Sampling Process Begins With A
Sampling Frame
• The sampling frame is a listing of items that
make up the population
• Frames are data sources such as population
lists, directories, or maps
• Inaccurate or biased results can result if a
frame excludes certain portions of the
population
• Using different frames to generate data can
lead to dissimilar conclusions
Types of Samples
• Non-Probability Samples
– Judgment
– Convenience
• Probability Samples
– Simple Random
– Systematic
– Stratified
– Cluster
Types of Samples:
Nonprobability Sample
• In a nonprobability sample, items included are
chosen without regard to their probability of
occurrence.
– In convenience sampling, items are selected based
only on the fact that they are easy, inexpensive, or
convenient to sample.
– In a judgment sample, you get the opinions of preselected experts in the subject matter.
Types of Samples:
Probability Sample
• In a probability sample, items in the sample
are chosen on the basis of known
probabilities.
• The four types of probability samples are simple
random, systematic, stratified, and cluster.
Probability Sample:
Simple Random Sample
• Every individual or item from the frame has an
equal chance of being selected
• Selection may be with replacement (selected
individual is returned to frame for possible
reselection) or without replacement (selected
individual isn’t returned to the frame).
• Samples are obtained from a table of random
numbers or from computer random number
generators.
Selecting a Simple Random Sample Using A
Random Number Table
Sampling Frame For Population With 850 Items
Item Name    Item #
Bev R.       001
Ulan X.      002
...          ...
Joann P.     849
Paul F.      850
Portion Of A Random Number Table
49280 88924 35779 00283 81163 07275
11100 02340 12860 74697 96644 89439
09893 23997 20048 49420 88872 08401
The First 5 Items in a simple
random sample
Item # 492
Item # 808
Item # 892 -- does not exist so ignore
Item # 435
Item # 779
Item # 002
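The selection above can be reproduced with a short Python sketch. It assumes the table is read left to right in three-digit groups and that sampling is without replacement (repeated numbers are skipped); the function name `sample_from_digits` is made up for this illustration.

```python
def sample_from_digits(digits, frame_size, n, width=3):
    """Read `width`-digit numbers left to right, skipping any outside the frame."""
    picked = []
    for i in range(0, len(digits) - width + 1, width):
        number = int(digits[i:i + width])
        # Keep only item numbers that exist in the frame and were not already drawn
        # (assumption: sampling without replacement).
        if 1 <= number <= frame_size and number not in picked:
            picked.append(number)
        if len(picked) == n:
            break
    return picked


# First row of the random number table, with the spaces removed.
row = "49280 88924 35779 00283 81163 07275".replace(" ", "")
first_five = sample_from_digits(row, frame_size=850, n=5)
print(first_five)  # [492, 808, 435, 779, 2]
```

Item 892 is dropped because the frame only has 850 items, which matches the slide.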
Probability Sample:
Systematic Sample
• Decide on sample size: n
• Divide frame of N individuals into groups of k
individuals: k=N/n
• Randomly select one individual from the 1st
group
• Select every kth individual thereafter
Example: N = 40, n = 4, k = 10
(Figure: the frame of 40 individuals, with the first group of 10 marked.)
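The steps of a systematic sample can be sketched in Python as follows. The frame of individuals numbered 1 to 40 and the helper name `systematic_sample` are assumptions for illustration, and the sketch assumes N is an exact multiple of n.

```python
import random


def systematic_sample(frame, n, rng):
    """Pick a random start in the first group of k items, then take every k-th item."""
    k = len(frame) // n       # group size, k = N / n
    start = rng.randrange(k)  # random position inside the first group
    return [frame[start + i * k] for i in range(n)]


frame = list(range(1, 41))    # N = 40 individuals, numbered 1..40 (hypothetical frame)
sample = systematic_sample(frame, n=4, rng=random.Random(7))
print(sample)                 # four individuals spaced k = 10 apart
```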
Probability Sample:
Stratified Sample
• Divide population into two or more subgroups (called
strata) according to some common characteristic
• A simple random sample is selected from each
subgroup, with sample sizes proportional to strata sizes
• Samples from subgroups are combined into one
• This is a common technique when sampling a population
of voters, stratifying across racial or socio-economic
lines.
(Figure: population divided into 4 strata.)
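A minimal sketch of proportional stratified sampling in Python is shown here. The four strata and their sizes (400, 300, 200, 100) are invented for illustration, as is the helper name `stratified_sample`. Rounding the per-stratum sizes can make the total differ slightly from n when the proportions are not exact.

```python
import random


def stratified_sample(strata, n, rng):
    """Draw a simple random sample from each stratum, sized in proportion to the stratum."""
    total = sum(len(items) for items in strata.values())
    sample = []
    for items in strata.values():
        n_stratum = round(n * len(items) / total)    # proportional allocation
        sample.extend(rng.sample(items, n_stratum))  # simple random sample, no replacement
    return sample


# Hypothetical population split into 4 strata of different sizes.
strata = {
    "A": [("A", i) for i in range(400)],
    "B": [("B", i) for i in range(300)],
    "C": [("C", i) for i in range(200)],
    "D": [("D", i) for i in range(100)],
}
sample = stratified_sample(strata, n=50, rng=random.Random(1))
counts = {name: sum(1 for s, _ in sample if s == name) for name in strata}
print(counts)  # {'A': 20, 'B': 15, 'C': 10, 'D': 5}
```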
Probability Sample
Cluster Sample
• Population is divided into several “clusters,” each
representative of the population
• A simple random sample of clusters is selected
• All items in the selected clusters can be used, or items can
be chosen from a cluster using another probability
sampling technique
• A common application of cluster sampling involves election
exit polls, where certain election districts are selected and
sampled.
(Figure: population divided into 16 clusters; the randomly selected
clusters form the sample.)
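Cluster sampling can be sketched in the same way. This example assumes 16 clusters of 5 items each (the cluster contents and the name `cluster_sample` are hypothetical) and uses every item in each selected cluster, which is the first of the two options listed above.

```python
import random


def cluster_sample(clusters, n_clusters, rng):
    """Select clusters by simple random sampling, then keep all items in those clusters."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    items = [item for c in chosen for item in clusters[c]]
    return chosen, items


# Hypothetical population: 16 election districts with 5 voters each.
clusters = {c: [f"district{c}-voter{i}" for i in range(5)] for c in range(1, 17)}
chosen, sample = cluster_sample(clusters, n_clusters=4, rng=random.Random(3))
print(chosen)
print(len(sample))  # 20
```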
Probability Sample:
Comparing Sampling Methods
• Simple random sample and Systematic sample
– Simple to use
– May not be a good representation of the population’s
underlying characteristics
• Stratified sample
– Ensures representation of individuals across the entire
population
• Cluster sample
– More cost effective
– Less efficient (need larger sample to acquire the same
level of precision)
Evaluating Survey Worthiness
• What is the purpose of the survey?
• Is the survey based on a probability sample?
• Coverage error – appropriate frame?
• Nonresponse error – follow up
• Measurement error – good questions elicit
good responses
• Sampling error – always exists
Types of Survey Errors
• Coverage error or selection bias
– Exists if some groups are excluded from the frame and
have no chance of being selected
• Non response error or bias
– People who do not respond may be different from those
who do respond
• Sampling error
– Variation from sample to sample will always exist
• Measurement error
– Due to weaknesses in question design, respondent error,
and interviewer’s effects on the respondent (“Hawthorne
effect”)
Types of Survey Errors
(continued)
• Coverage error: excluded from frame
• Nonresponse error: follow up on nonresponses
• Sampling error: random differences from sample to sample
• Measurement error: bad or leading question
SAMPLING DISTRIBUTION OF THE
MEAN
Sampling Distributions
• A sampling distribution is a distribution of all of
the possible values of a sample statistic for a
given size sample selected from a population.
• For example, suppose you sample 50 students
from your college and compute their mean GPA. If
you obtained many different samples of 50, you
would compute a different mean for each sample.
We are interested in the distribution of all the
potential mean GPAs we might calculate for
samples of 50 students.
Developing a
Sampling Distribution
• Assume there is a population …
• Population size N=4
• Random variable, X,
is age of individuals
• Values of X: 18, 20,
22, 24 (years)
(Figure: the four individuals, labeled A, B, C, and D.)
Developing a
Sampling Distribution
(continued)
Summary Measures for the Population Distribution:
$$\mu = \frac{\sum X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$$
$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = 2.236$$
(Figure: Uniform Distribution. P(x) plotted against x = 18 (A), 20 (B), 22 (C), 24 (D).)
Developing a Sampling Distribution
(continued)
Now consider all possible samples of size n=2
16 possible samples (sampling with replacement)
1st Obs    2nd Observation
           18       20       22       24
18         18,18    18,20    18,22    18,24
20         20,18    20,20    20,22    20,24
22         22,18    22,20    22,22    22,24
24         24,18    24,20    24,22    24,24
16 Sample Means
1st Obs    2nd Observation
           18    20    22    24
18         18    19    20    21
20         19    20    21    22
22         20    21    22    23
24         21    22    23    24
Developing a Sampling Distribution
(continued)
Sampling Distribution of All Sample Means
16 Sample Means
1st Obs    2nd Observation
           18    20    22    24
18         18    19    20    21
20         19    20    21    22
22         20    21    22    23
24         21    22    23    24
(Figure: Sample Means Distribution. $P(\bar{X})$ plotted against $\bar{X}$ = 18, 19, 20, 21, 22, 23, 24; no longer uniform.)
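Developing a Sampling Distribution
(continued)
Summary Measures of this Sampling Distribution:
$$\mu_{\bar{X}} = \frac{\sum \bar{X}_i}{N} = \frac{18 + 19 + 19 + \cdots + 24}{16} = 21$$
$$\sigma_{\bar{X}} = \sqrt{\frac{\sum (\bar{X}_i - \mu_{\bar{X}})^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$$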
Comparing the Population Distribution
to the Sample Means Distribution
Population: N = 4, μ = 21, σ = 2.236
Sample Means Distribution: n = 2, $\mu_{\bar{X}}$ = 21, $\sigma_{\bar{X}}$ = 1.58
(Figure: P(X) against X = 18 (A), 20 (B), 22 (C), 24 (D) for the population, beside $P(\bar{X})$ against $\bar{X}$ = 18 to 24 for the sample means.)
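The whole development can be checked by brute force. The following Python sketch enumerates all 16 samples of size 2 (with replacement) from the population of ages and recomputes the summary measures; the variable names are chosen for this illustration only.

```python
from itertools import product
from statistics import mean, pstdev

population = [18, 20, 22, 24]                  # ages of individuals A, B, C, D

# All ordered samples of size n = 2, drawn with replacement: 4 * 4 = 16 samples.
samples = list(product(population, repeat=2))
sample_means = [mean(s) for s in samples]

mu = mean(population)                          # population mean
sigma = pstdev(population)                     # population standard deviation
mu_xbar = mean(sample_means)                   # mean of the 16 sample means
sigma_xbar = pstdev(sample_means)              # standard deviation of the sample means

print(len(samples), mu, round(sigma, 3))       # 16 21 2.236
print(mu_xbar, round(sigma_xbar, 2))           # 21 1.58
print(round(sigma / 2 ** 0.5, 2))              # 1.58, i.e. sigma / sqrt(n) with n = 2
```

The last line shows that the standard deviation of the sample means equals the population standard deviation divided by the square root of the sample size.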
Sample Mean Sampling Distribution:
Standard Error of the Mean
• Different samples of the same size from the same
population will yield different sample means
• A measure of the variability in the mean from sample
to sample is given by the Standard Error of the Mean:
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
(This assumes that sampling is with replacement or sampling
is without replacement from an infinite population)
• Note that the standard error of the mean decreases as
the sample size increases
Sample Mean Sampling Distribution:
If the Population is Normal
• If a population is normally distributed with mean
μ and standard deviation σ, the sampling
distribution of $\bar{X}$ is also normally distributed
with
$$\mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
Z-value for Sampling Distribution
of the Mean
• Z-value for the sampling distribution of $\bar{X}$:
$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
where:
$\bar{X}$ = sample mean
μ = population mean
σ = population standard deviation
n = sample size
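Sampling Distribution Properties
$$\mu_{\bar{x}} = \mu$$
(i.e. $\bar{x}$ is unbiased)
(Figure: the normal population distribution, centered at μ, above the normal sampling distribution of $\bar{x}$, centered at $\mu_{\bar{x}}$, which has the same mean.)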
Sampling Distribution Properties
(continued)
– As n increases, $\sigma_{\bar{x}}$ decreases
(Figure: sampling distributions of $\bar{x}$ centered at μ for a larger sample size and a smaller sample size.)
Example
Oxford Cereals fills thousands of boxes of cereal during an
eight-hour shift. As the plant operations manager, you are
responsible for monitoring the amount of cereal placed in
each box. To be consistent with package labeling, boxes
should contain a mean of 368 grams of cereal.
Because of the speed of the process, the cereal weight
varies from box to box, causing some boxes to be
underfilled and others overfilled. If the process is not
working properly, the mean weight in the boxes could
vary too much from the label weight of 368 grams to be
acceptable.
Example
Because weighing every single box is too time-consuming,
costly, and inefficient, you must take a
sample of boxes. For each sample you select, you
plan to weigh the individual boxes and calculate a
sample mean. You need to determine the
probability that such a sample mean could have
been randomly selected from a population whose
mean is 368 grams. Based on your analysis, you will
have to decide whether to maintain, alter, or shut
down the cereal-filling process.
Example
a. If you randomly select a sample of 25 boxes without
replacement from the thousands of boxes filled
during a shift, the sample contains much less than
5% of the population. Given that the standard
deviation of the cereal-filling process is 15 grams,
compute the standard error of the mean.
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{25}} = 3$$
Example
b. How is the standard error of the mean
affected by increasing the sample size from
25 to 100 boxes?
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{100}} = 1.5$$
Quadrupling the sample size halves the standard error, from 3 grams to 1.5 grams.
Example
c. If you select a sample of 100 boxes, what is the
probability that the sample mean is below 365
grams?
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{365 - 368}{15 / \sqrt{100}} = \frac{-3}{1.5} = -2$$
$$P(\bar{X} < 365) = P(Z < -2) = 0.0228$$
(Figure: sampling distribution of $\bar{X}$ centered at 368, with the area below 365 shaded.)
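Parts (a) to (c) of the cereal example can be recomputed with Python's standard library. This is a sketch that uses `statistics.NormalDist` in place of a printed normal table; the variable names are chosen for this illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 368, 15             # label weight and process standard deviation (grams)

se_25 = sigma / sqrt(25)        # part (a): standard error for n = 25
se_100 = sigma / sqrt(100)      # part (b): standard error for n = 100

z = (365 - mu) / se_100         # part (c): standardize a sample mean of 365 grams
p_below = NormalDist().cdf(z)   # P(Z < z) under the standard normal distribution

print(se_25, se_100, z, round(p_below, 4))  # 3.0 1.5 -2.0 0.0228
```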
Example
d. Find an interval symmetrically distributed
around the population mean that will include
95% of the sample means, based on samples
of 25 boxes.
(Figure: sampling distribution of $\bar{X}$ centered at 368, with the middle 95% lying between $\bar{X}_L$ and $\bar{X}_U$.)
Therefore:
$$P(\bar{X} < \bar{X}_L) = 0.025 \quad \text{and} \quad P(\bar{X} < \bar{X}_U) = 0.975$$
$$Z_L = -1.96 \quad \text{and} \quad Z_U = +1.96$$
$$\bar{X}_L = 368 - 1.96 \cdot \frac{15}{\sqrt{25}} = 368 - 1.96(3) = 362.12$$
$$\bar{X}_U = 368 + 1.96 \cdot \frac{15}{\sqrt{25}} = 368 + 1.96(3) = 373.88$$
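The interval in part (d) can also be found without a table. The sketch below assumes `statistics.NormalDist.inv_cdf` is used to obtain the Z value that cuts off the upper 2.5% of the standard normal distribution.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 368, 15, 25
se = sigma / sqrt(n)                 # standard error of the mean: 3 grams

z = NormalDist().inv_cdf(0.975)      # Z value with 97.5% of the area below it
lower = mu - z * se                  # lower limit of the symmetric 95% interval
upper = mu + z * se                  # upper limit of the symmetric 95% interval

print(round(z, 2), round(lower, 2), round(upper, 2))  # 1.96 362.12 373.88
```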
EXERCISE
Exercise 1
The U.S. Census Bureau announced that the median
sales price of new houses sold in 2009 was
$215,600, and the mean sales price was $270,100.
Assume that the standard deviation of the prices is
$90,000.
a. If you select a random sample of n = 100, what
is the probability that the sample mean will be
less than $300,000?
b. If you select a random sample of n = 100, what
is the probability that the sample mean will be
between $275,000 and $290,000?
Exercise 2
Time spent using e-mail per session is normally
distributed, with minutes and minutes. If you select a
random sample of 25 sessions,
a. what is the probability that the sample mean is
between 7.8 and 8.2 minutes?
b. what is the probability that the sample mean is
between 7.5 and 8 minutes?
c. If you select a random sample of 100 sessions, what is
the probability that the sample mean is between 7.8
and 8.2 minutes?
d. Explain the difference in the results of (a) and (c).
Exercise 3
The amount of time a bank teller spends with each customer
has a population mean, μ = 3.10 minutes, and a standard
deviation, σ = 0.40 minute. If you select a random sample of 16
customers,
a. what is the probability that the mean time spent per
customer is at least 3 minutes?
b. there is an 85% chance that the sample mean is less than
how many minutes?
c. What assumption must you make in order to solve (a) and
(b)?
d. If you select a random sample of 64 customers, there is an
85% chance that the sample mean is less than how many
minutes?
ANSWER
Exercise 1
a. Because the mean is greater than the median, the population
distribution of sales prices is skewed to the right. Because n = 3
(n < 30), the sampling distribution will also be
skewed to the right.
b. Because n = 100, the sampling distribution
will be approximately normal, with a mean of
$270,100 and a standard deviation of $9,000.
c. 0.9996
d. 0.2796
Exercise 3
a. P($\bar{X}$ > 3) = P(Z > -1.00) = 1.0 – 0.1587 = 0.8413
b. P(Z < 1.04) = 0.85
$\bar{X}$ = 3.10 + 1.04 (0.1) = 3.204
c. The population distribution must be at least symmetric
d. P(Z < 1.04) = 0.85
$\bar{X}$ = 3.10 + 1.04 (0.05) = 3.152
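The Exercise 3 answers can be checked with a short Python sketch. It assumes the sample mean is treated as normally distributed, and it uses the exact 85th percentile of the standard normal (about 1.036) instead of the table value 1.04. The helper name `upper_limit` is made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 3.10, 0.40               # population mean and standard deviation (minutes)
std_normal = NormalDist()


def upper_limit(n, prob):
    """Value below which the sample mean of n customers falls with probability `prob`."""
    return mu + std_normal.inv_cdf(prob) * sigma / sqrt(n)


se_16 = sigma / sqrt(16)                             # standard error for n = 16
p_at_least_3 = 1 - std_normal.cdf((3 - mu) / se_16)  # P(Z > -1.00)
limit_16 = upper_limit(16, 0.85)                     # 85% limit for n = 16
limit_64 = upper_limit(64, 0.85)                     # 85% limit for n = 64

print(round(p_at_least_3, 4), round(limit_16, 3), round(limit_64, 3))
```

The results agree with the table-based answers to about three decimal places.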
CENTRAL LIMIT THEOREM
Central Limit Theorem
As the sample size gets large enough (n↑), the sampling
distribution of $\bar{x}$ becomes almost normal regardless of the
shape of the population.
Sample Mean Sampling Distribution:
If the Population is not Normal
How Large is Large Enough?
• For most distributions, n > 30 will give a
sampling distribution that is nearly normal
• For fairly symmetric distributions, n > 15 will
usually give a sampling distribution that is almost
normal
• For normal population distributions, the
sampling distribution of the mean is always
normally distributed
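A small simulation can illustrate the Central Limit Theorem. The sketch below assumes a strongly right-skewed population (exponential with mean 1 and standard deviation 1, chosen only for illustration) and draws 5,000 samples of size n = 30 with a fixed seed.

```python
import random
from math import sqrt
from statistics import mean, pstdev

rng = random.Random(0)    # fixed seed so the run is repeatable
n, reps = 30, 5000        # sample size and number of simulated samples

# Each sample mean comes from n draws of an exponential population (mean 1, sd 1).
sample_means = [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

center = mean(sample_means)     # should be close to the population mean, 1
spread = pstdev(sample_means)   # should be close to sigma / sqrt(n)

print(round(center, 3), round(spread, 3), round(1 / sqrt(n), 3))
```

A histogram of `sample_means` would look close to a bell curve even though the population itself is far from normal.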
SAMPLING DISTRIBUTION OF THE
PROPORTION
Population Proportions
π = the proportion of the population having some
characteristic
Sample proportion ( p ) provides an estimate of π:
$$p = \frac{X}{n} = \frac{\text{number of items in the sample having the characteristic of interest}}{\text{sample size}}$$
• 0≤ p≤1
• p is approximately distributed as a normal distribution
when n is large
• (assuming sampling with replacement from a finite population
or without replacement from an infinite population)
Sampling Distribution of p
• Approximated by a
normal distribution if:
$$n\pi \geq 5 \quad \text{and} \quad n(1 - \pi) \geq 5$$
where
$$\mu_p = \pi \quad \text{and} \quad \sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}$$
(where π = population proportion)
(Figure: Sampling Distribution. P(p) plotted against p from 0 to 1.)
Z-Value for Proportions
Standardize p to a Z value with the formula:
$$Z = \frac{p - \pi}{\sigma_p} = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}}$$
Example
• Suppose that the manager of the local branch of a bank
determines that 40% of all depositors have multiple
accounts at the bank.
• If you select a random sample of 200 depositors,
because nπ = 200(0.40) = 80 ≥ 5 and n(1 – π) =
200(0.60) = 120 ≥ 5, the sample size is large enough to
assume that the sampling distribution of the
proportion is approximately normally distributed.
• Calculate the probability that the sample proportion of
depositors with multiple accounts is less than 0.30
Example
$$Z = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}} = \frac{0.30 - 0.40}{\sqrt{\frac{(0.40)(0.60)}{200}}} = \frac{-0.10}{\sqrt{\frac{0.24}{200}}} = -2.89$$
P(Z < -2.89) = 0.0019
Therefore, if the population proportion of
items of interest is 0.40, only 0.19% of the
samples of 200 would be expected to have sample
proportions less than 0.30.
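The bank example can be recomputed in a few lines of Python. This sketch uses `statistics.NormalDist` in place of a normal table, so the probability is based on the unrounded Z value; the variable names are chosen for this illustration.

```python
from math import sqrt
from statistics import NormalDist

pi, n, p = 0.40, 200, 0.30           # population proportion, sample size, sample proportion

# Rule of thumb for using the normal approximation.
normal_ok = n * pi >= 5 and n * (1 - pi) >= 5

sigma_p = sqrt(pi * (1 - pi) / n)    # standard error of the proportion
z = (p - pi) / sigma_p               # standardized sample proportion
prob = NormalDist().cdf(z)           # P(p < 0.30)

print(normal_ok, round(z, 2), round(prob, 4))  # True -2.89 0.0019
```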
EXERCISE
Exercise 4
A political pollster is conducting an analysis of sample
results in order to make predictions on election night.
Assuming a two-candidate election, if a specific candidate
receives at least 55% of the vote in the sample, that
candidate will be forecast as the winner of the election. If
you select a random sample of 100 voters, what is the
probability that a candidate will be forecast as the winner
when
a. the population percentage of her vote is 50.1%?
b. the population percentage of her vote is 60%?
c. the population percentage of her vote is 49% (and she
will actually lose the election)?
Exercise 5
In a recent survey of full-time female workers ages 22 to 35
years, 46% said that they would rather give up some of their
salary for more personal time. Suppose you select a sample of
100 full-time female workers 22 to 35 years old.
a. What is the probability that in the sample, fewer than
50% would rather give up some of their salary for more
personal time?
b. What is the probability that in the sample, between 40%
and 50% would rather give up some of their salary for
more personal time?
c. What is the probability that in the sample, more than 40%
would rather give up some of their salary for more
personal time?
ANSWER
Exercise 4
a. π = 0.501
$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.501(1 - 0.501)}{100}} = 0.05$$
P(p > 0.55) = P(Z > 0.98) = 1.0 – 0.8365 = 0.1635
b. π = 0.60
$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.6(1 - 0.6)}{100}} = 0.04899$$
P(p > 0.55) = P(Z > -1.021) = 1.0 – 0.1539 = 0.8461
c. π = 0.49
$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.49(1 - 0.49)}{100}} = 0.05$$
P(p > 0.55) = P(Z > 1.20) = 1.0 – 0.8849 = 0.1151
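The Exercise 4 answers can be checked with a short Python sketch. It assumes the normal approximation to the sampling distribution of the proportion, and the helper name `forecast_probability` is made up for this illustration. Because it does not round Z to two decimals, the results can differ from the table-based answers in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist


def forecast_probability(pi, n=100, cutoff=0.55):
    """P(sample proportion > cutoff) when the population proportion is pi."""
    sigma_p = sqrt(pi * (1 - pi) / n)   # standard error of the proportion
    z = (cutoff - pi) / sigma_p         # standardized cutoff
    return 1 - NormalDist().cdf(z)      # upper-tail probability


# Population vote shares from parts (a), (b), and (c).
results = {pi: forecast_probability(pi) for pi in (0.501, 0.60, 0.49)}
for pi, prob in results.items():
    print(pi, round(prob, 4))
```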
THANK YOU