Download sample

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

Taylor's law wikipedia , lookup

Statistical inference wikipedia , lookup

Student's t-test wikipedia , lookup

Sampling (statistics) wikipedia , lookup

Gibbs sampling wikipedia , lookup

Transcript
Business Statistics:
Chapter 7
Introduction to
Sampling Distributions
QMIS 220, by Dr. M. Zainal
Chapter Goals
After completing this chapter, you should be
able to:

Define the concept of sampling error

Determine the mean and standard deviation
_ for the
sampling distribution of the sample mean, x

Determine the mean and standard deviation for the
_
sampling distribution of the sample proportion, p

Describe the Central Limit Theorem and its importance
_
_
Apply sampling distributions for both x and p

QMIS 220, by Dr. M. Zainal
Chap 7-2
Review: Inferential Statistics

Inferential statistics



Drawing conclusions and/or making decisions
concerning a population based only on
sample data
Consists of methods that use sample results
to help make decisions or predictions about a
population.
Elections
QMIS 220, by Dr. M. Zainal
Chap 7-3
Review: Inferential Statistics
Sample statistics
(known)
Inference
Sample
QMIS 220, by Dr. M. Zainal
Population parameters
(unknown, but can
be estimated from
sample evidence)
Population
Chap 7-4
Review: Inferential Statistics
Drawing conclusions and/or making decisions
concerning a population based on sample results.

Estimation


e.g., Estimate the population mean
weight using the sample mean
weight
Hypothesis Testing

e.g., Use sample evidence to test
the claim that the population mean
weight is 120 pounds
QMIS 220, by Dr. M. Zainal
Chap 7-5
Review: Key Definitions

A population is the entire collection of things
under consideration


A parameter is a summary measure computed to
describe a characteristic of the population
A sample is a portion of the population
selected for analysis

A statistic is a summary measure computed to
describe a characteristic of the sample
QMIS 220, by Dr. M. Zainal
Chap 7-6
Review: Population vs. Sample
Population
a b
Sample
cd
b
ef gh i jk l m n
o p q rs t u v w
x y
QMIS 220, by Dr. M. Zainal
z
c
gi
o
n
r
u
y
Chap 7-7
Review: Why Sample?

Less time consuming than a census

Less costly to administer than a census

It is possible to obtain statistical results of a
sufficiently high precision based on samples.
QMIS 220, by Dr. M. Zainal
Chap 7-8
Review: Sampling Techniques
Sampling Techniques
Nonstatistical Sampling
Convenience
Statistical Sampling
Simple
Random
Systematic
Judgment
Stratified
QMIS 220, by Dr. M. Zainal
Cluster
Chap 7-9
Review: Statistical Sampling

Items of the sample are chosen based on
known or calculable probabilities
Statistical Sampling
(Probability Sampling)
Simple Random
QMIS 220, by Dr. M. Zainal
Stratified
Systematic
Cluster
Chap 7-10
Simple Random Sampling

Every possible sample of a given size has an
equal chance of being selected

Selection may be with replacement or without
replacement

The sample can be obtained using a table of
random numbers or computer random number
generator
QMIS 220, by Dr. M. Zainal
Chap 7-11
Stratified Random Sampling

Divide population into subgroups (called strata)
according to some common characteristic

Select a simple random sample from each
subgroup

Combine samples from subgroups into one
Population
Divided
into 4
strata
Sample
QMIS 220, by Dr. M. Zainal
Chap 7-12
Systematic Random Sampling

Decide on sample size: n

Divide frame of N individuals into groups of k
individuals: k=N/n

Randomly select one individual from the 1st
group

Select every kth individual thereafter
N = 64
n=8
First Group
k=8
QMIS 220, by Dr. M. Zainal
Chap 7-13
Cluster Sampling


Divide population into several “clusters,” each
representative of the population
Select a simple random sample of clusters

All items in the selected clusters can be used, or items can be
chosen from a cluster using another probability sampling
technique
Population
divided into
16 clusters.
QMIS 220, by Dr. M. Zainal
Randomly selected
clusters for sample
Chap 7-14
Examples of poor samplings
The technique of sampling has been widely
used, both properly and improperly, in the area of
politics.

During the 1936 presidential race where the
Literary Digest predicted Alf Landon to win the
election over Franklin D. Roosevelt.

QMIS 220, by Dr. M. Zainal
Chap 7-15
Sampling Error
So far, we have stressed the benefits of drawing
a sample from a population.

However, in statistics, as in life, there's no such
thing as a free lunch.

By sampling, we expose ourselves to errors that
can lead to inaccurate conclusions about the
population.

The type of error that a statistician is most
concerned about is called sampling error.

QMIS 220, by Dr. M. Zainal
Chap 7-16
Sampling Error

Sample Statistics are used to estimate
Population Parameters
ex: X is an estimate of the population mean, μ

Problems:

Different samples provide different estimates of
the population parameter

Sample results have potential variability, thus
sampling error exits
QMIS 220, by Dr. M. Zainal
Chap 7-17
Sampling Error
As the entire population is rarely measured, the
sampling error cannot be directly calculated.

With inferential statistics, we'll be able to assign
probabilities to certain amounts of sampling error
later.

It occurs when we select a sample that is not a
perfect match to the entire population.

Sampling errors are a small price to pay to avoid
measuring an entire population.

QMIS 220, by Dr. M. Zainal
Chap 7-18
Sampling Error
One way to reduce the sampling error of a
statistical study is to increase the size of the
sample.


In general, the larger the sample size, the
smaller the sampling error.

If you increase the sample size until it reaches
the size of the population, then the sampling
error will be reduced to 0.

But in doing so, we lose the benefits of sampling.
QMIS 220, by Dr. M. Zainal
Chap 7-19
Calculating Sampling Error

Sampling Error:
The difference between a value (a statistic)
computed from a sample and the corresponding
value (a parameter) computed from a population
Example: (for the mean)
Sampling Error  x - μ
where:
x  sample mean
μ  population mean
QMIS 220, by Dr. M. Zainal
Chap 7-20
Review

Population mean:
x

μ
N
i
Sample Mean:
x

x
i
n
where:
μ = Population mean
x = sample mean
xi = Values in the population or sample
N = Population size
n = sample size
QMIS 220, by Dr. M. Zainal
Chap 7-21
Example
If the population mean is μ = 98.6 degrees
and a sample of n = 5 temperatures yields a
sample mean of x = 99.2 degrees, then the
sampling error is
x  μ  99.2  98.6  0.6 degrees
QMIS 220, by Dr. M. Zainal
Chap 7-22
Sampling Errors

Different samples will yield different sampling
errors

The sampling error may be positive or negative
( x may be greater than or less than μ)

The expected sampling error decreases as the
sample size increases
QMIS 220, by Dr. M. Zainal
Chap 7-23
Sampling Distribution

A sampling distribution is a
distribution of the possible values of
a statistic for a given size sample
selected from a population
QMIS 220, by Dr. M. Zainal
Chap 7-24
Developing a
Sampling Distribution

Assume there is a population …

Population size N=4

Random variable, x,
is age of individuals

Values of x: 18, 20,
22, 24 (years)
QMIS 220, by Dr. M. Zainal
A
B
C
D
Chap 7-25
Developing a
Sampling Distribution
(continued)
Summary Measures for the Population Distribution:
x

μ
P(x)
i
N
.3
18  20  22  24

 21
4
σ
 (x
QMIS 220, by Dr. M. Zainal
i
 μ)
N
.2
.1
0
2
 2.236
18
20
22
24
x
A
B
C
D
Uniform Distribution
Chap 7-26
Developing a
Sampling Distribution
(continued)
Now consider all possible samples of size n=2
1st
Obs
2nd Observation
18
20
22
24
18 18,18 18,20 18,22 18,24
16 Sample
Means
20 20,18 20,20 20,22 20,24
1st 2nd Observation
Obs 18 20 22 24
22 22,18 22,20 22,22 22,24
18 18 19 20 21
24 24,18 24,20 24,22 24,24
20 19 20 21 22
16 possible samples
(sampling with
replacement)
QMIS 220, by Dr. M. Zainal
22 20 21 22 23
24 21 22 23 24
Chap 7-27
Developing a
Sampling Distribution
(continued)
Sampling Distribution of All Sample Means
Sample Means
Distribution
16 Sample Means
1st 2nd Observation
Obs 18 20 22 24
18 18 19 20 21
20 19 20 21 22
22 20 21 22 23
24 21 22 23 24
QMIS 220, by Dr. M. Zainal
P(x)
.3
.2
.1
0
18 19
20 21 22 23
(no longer uniform)
24
_
x
Chap 7-28
Developing a
Sampling Distribution
(continued)
Summary Measures of this Sampling Distribution:
μx
x


18  19  21   24

 21
N
16
σx 

QMIS 220, by Dr. M. Zainal
i
2
(
x

μ
)
 i x
N
(18 - 21)2  (19 - 21)2    (24 - 21)2
 1.58
16
Chap 7-29
Comparing the Population with
its Sampling Distribution
Population
N=4
μ  21
σ  2.236
Sample Means Distribution
n=2
μx  21
P(x)
.3
P(x)
.3
.2
.2
.1
.1
0
18
20
22
24
A
B
C
D
QMIS 220, by Dr. M. Zainal
x
0
18 19
σ x  1.58
20 21 22 23
24
_
x
Chap 7-30
Properties of a Sampling
Distribution

For any population,

the average value of all possible sample means computed from
all possible random samples of a given size from the population
is equal to the population mean:
μx  μ

Theorem 1
The standard deviation of the possible sample means
computed from all random samples of size n is equal to the
population standard deviation divided by the square root of the
sample size:
σ
σx 
n
QMIS 220, by Dr. M. Zainal
Theorem 2
Chap 7-31
If the Population is Normal
If a population is normal with mean μ and
standard deviation σ, the sampling distribution
of
x
is also normally distributed with
μx  μ
QMIS 220, by Dr. M. Zainal
and
σ
σx 
n
Theorem 3
Chap 7-32
z-value for Sampling Distribution
of x

Z-value for the sampling distribution of
x:
(x  μ)
z
σ
n
where:
x = sample mean
μ = population mean
σ = population standard deviation
n = sample size
QMIS 220, by Dr. M. Zainal
Chap 7-33
Finite Population Correction

Apply the Finite Population Correction if:
 the sample is large relative to the population
(n is greater than 5% of N)
and…
 Sampling is without replacement
Then
QMIS 220, by Dr. M. Zainal
(x  μ)
z
σ Nn
n N 1
Chap 7-34
Sampling Distribution Properties

The sample mean is an unbiased estimator
Normal Population
Distribution
μ
μx  μ
Normal Sampling
Distribution
(has the same mean)
μx
QMIS 220, by Dr. M. Zainal
x
x
Chap 7-35
Sampling Distribution Properties
(continued)

The sample mean is a consistent estimator
(the value of x becomes closer to μ as n increases):
Population
x
Small
sample size
As n increases,
x
σ x  σ/ n decreases
QMIS 220, by Dr. M. Zainal
Larger
sample size
μ
x
Chap 7-36
If the Population is not Normal

We can apply the Central Limit Theorem:



Even if the population is not normal,
…sample means from the population will be
approximately normal as long as the sample size is
large enough
…and the sampling distribution will have
μx  μ
QMIS 220, by Dr. M. Zainal
and
σ
σx 
n
Theorem 4
Chap 7-37
Central Limit Theorem
As the
sample
size gets
large
enough…
QMIS 220, by Dr. M. Zainal
n↑
the sampling
distribution
becomes
almost normal
regardless of
shape of
population
x
Chap 7-38
If the Population is not Normal
(continued)
Population Distribution
Sampling distribution
properties:
Central Tendency
μx  μ
σ
σx 
n
Variation
μ
x
Sampling Distribution
(becomes normal as n increases)
Larger
sample
size
Smaller
sample size
(Sampling with replacement)
QMIS 220, by Dr. M. Zainal
μx
x
Chap 7-39
How Large is Large Enough?

For most distributions, n > 30 will give a
sampling distribution that is nearly normal

For fairly symmetric distributions, n > 15 is
sufficient

For normal population distributions, the
sampling distribution of the mean is always
normally distributed
QMIS 220, by Dr. M. Zainal
Chap 7-40
Example


Suppose a population has mean μ = 8 and
standard deviation σ = 3. Suppose a random
sample of size n = 36 is selected.
What is the probability that the sample mean is
between 7.8 and 8.2?
QMIS 220, by Dr. M. Zainal
Chap 7-41
Example
(continued)
Solution:




Even if the population is not normally
distributed, the central limit theorem can be
used (n > 30)
… so the sampling distribution of x is
approximately normal
… with mean μx = μ = 8
σ
3
…and standard deviation σ x  n  36  0.5
QMIS 220, by Dr. M. Zainal
Chap 7-42
Example
(continued)
Solution (continued) -- find z-scores:


μ
μ
 7.8 - 8
8.2 - 8 
x
P(7.8  μ x  8.2)  P



3
σ
3


36
n
36 

 P(-0.4  z  0.4)  0.3108
Population
Distribution
?
Sampling
Distribution
???
??
?
??
μ8
QMIS 220, by Dr. M. Zainal
?
?
Standard Normal
Distribution
Sample
.1554
+.1554
Standardize
?
x
7.8
μx  8
8.2
x
-0.4
μz  0
0.4
z
Chap 7-43
Population Proportions, π
π = the proportion of the population having
some characteristic

Sample proportion ( p ) provides an estimate
of π :
x
number of successes in the sample
p

n
sample size

If two outcomes, p has a binomial distribution
QMIS 220, by Dr. M. Zainal
Chap 7-44
Sampling Distribution of p

Approximated by a
normal distribution if:

Sampling Distribution
.3
.2
.1
0
nπ  5
n(1  π)  5
0
where
μp  π
P( p )
and
.2
.4
.6
8
1
p
π(1 π)
σp 
n
(where π = population proportion)
QMIS 220, by Dr. M. Zainal
Chap 7-45
z-Value for Proportions
Standardize p to a z value with the formula:
pπ
z

σp

If sampling is without replacement
and n is greater than 5% of the
population size, then σ p must use
the finite population correction
factor:
QMIS 220, by Dr. M. Zainal
pπ
π(1  π)
n
σp 
π(1  π) N  n
n
N 1
Chap 7-46
Example


If the true proportion of voters who support
Proposition A is π = .4, what is the probability
that a sample of size 200 yields a sample
proportion between .40 and .45?
i.e.: if π = .4 and n = 200, what is
P(.40 ≤ p ≤ .45) ?
QMIS 220, by Dr. M. Zainal
Chap 7-47
Example
(continued)

Find σ p :
if π = .4 and n = 200, what is
P(.40 ≤ p ≤ .45) ?
π(1 π)
.4(1  .4)
σp 

 .03464
n
200
.45  .40 
Convert to
 .40  .40
P(.40  p  .45)  P
z

standard
.03464 
 .03464
normal:
 P(0  z  1.44)
QMIS 220, by Dr. M. Zainal
Chap 7-48
Example
(continued)
if π = .4 and n = 200, what is
P(.40 ≤ p ≤ .45) ?

Use standard normal table:
P(0 ≤ z ≤ 1.44) = .4251
Standardized
Normal Distribution
Sampling Distribution
.4251
Standardize
.40
QMIS 220, by Dr. M. Zainal
.45
p
0
1.44
z
Chap 7-49
Chapter Summary



Discussed sampling error
Introduced sampling distributions
Described the sampling distribution of the mean





For normal populations
Using the Central Limit Theorem
Described the sampling distribution of a
proportion
Calculated probabilities using sampling
distributions
Discussed sampling from finite populations
QMIS 220, by Dr. M. Zainal
Chap 7-50
Copyright
The materials of this presentation were mostly
taken from the PowerPoint files accompanied
Business Statistics: A Decision-Making Approach,
7e © 2008 Prentice-Hall, Inc.
QMIS 220, by Dr. M. Zainal
Chap 7-51