```Chapter 8
Sampling Variability & Sampling Distributions
Basic Terms
Any quantity computed from values in a
sample is called a statistic.
The observed value of a statistic depends
on the particular sample selected from
the population; typically, it varies from
sample to sample. This variability is
called sampling variability.
Sampling Distribution
The distribution of a statistic is called its
sampling distribution.
Example
Consider a population that consists of the
numbers 1, 2, 3, 4 and 5 generated in a
manner that the probability of each of those
values is 0.2 no matter what the previous
selections were. This population could be
described as the outcome associated with a
spinner such as given below. The distribution is
next to it.
[Figure: spinner]
x      1    2    3    4    5
p(x)   0.2  0.2  0.2  0.2  0.2
Example
If the sampling distribution for the means of
samples of size two is analyzed, it looks like
Sample  Mean    Sample  Mean
1, 1    1       3, 4    3.5
1, 2    1.5     3, 5    4
1, 3    2       4, 1    2.5
1, 4    2.5     4, 2    3
1, 5    3       4, 3    3.5
2, 1    1.5     4, 4    4
2, 2    2       4, 5    4.5
2, 3    2.5     5, 1    3
2, 4    3       5, 2    3.5
2, 5    3.5     5, 3    4
3, 1    2       5, 4    4.5
3, 2    2.5     5, 5    5
3, 3    3

Mean    Frequency   Probability
1       1           0.04
1.5     2           0.08
2       3           0.12
2.5     4           0.16
3       5           0.20
3.5     4           0.16
4       3           0.12
4.5     2           0.08
5       1           0.04
Total   25
Example
The original distribution and the sampling
distribution of means of samples with n=2
are given below.
[Figure: original distribution and sampling distribution for n = 2, each plotted over the values 1 to 5]
Example
Sampling distributions for n=3 and n=4 were
calculated and are illustrated below.
[Figure: sampling distributions for n = 3 and n = 4, each plotted over the values 1 to 5]
Simulations
To illustrate the general behavior of samples of fixed size n,
10,000 samples of each of the sizes 30, 60 and 120 were generated
from this uniform distribution and their means calculated.
Probability histograms were created for each of these (simulated)
sampling distributions.
[Figure: probability histograms of the simulated means for n = 30, n = 60 and n = 120]
Notice that all three look essentially normally distributed.
Further, note that the variability decreases as the sample size
increases.
Simulations
To further illustrate the general behavior of samples of fixed
size n, 10,000 samples of each of the sizes 4, 16 and 30 were
generated from the positively skewed distribution pictured below.
[Figure: skewed distribution]
Notice that these sampling distributions are all skewed, but as n
increased the sampling distributions became more symmetric and
eventually appeared to be almost normally distributed.
Terminology
Let $\bar{x}$ denote the mean of the observations
in a random sample of size n from a
population having mean $\mu$ and standard
deviation $\sigma$. Denote the mean value of the
$\bar{x}$ distribution by $\mu_{\bar{x}}$ and the standard
deviation of the $\bar{x}$ distribution by $\sigma_{\bar{x}}$
(called the standard error of the mean). Then the
rules on the next two slides hold.
Properties of the Sampling
Distribution of the Sample Mean.
Rule 1: $\mu_{\bar{x}} = \mu$
Rule 2: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
This rule is approximately correct as
long as no more than 5% of the
population is included in the sample.
Rule 3: When the population distribution is
normal, the sampling distribution of $\bar{x}$
is also normal for any sample size n.
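As a check on Rules 1 and 2, the spinner example above can be worked by hand
(an added illustration, using only the values in the n = 2 table):
Population: $\mu = (1 + 2 + 3 + 4 + 5)(0.2) = 3$ and
$\sigma^2 = [(-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2](0.2) = 2$
Sample means, n = 2: $\mu_{\bar{x}} = \sum \bar{x}\, p(\bar{x}) = 3 = \mu$
$\sigma_{\bar{x}}^2 = \sum (\bar{x} - 3)^2 p(\bar{x}) = 2[4(0.04) + 2.25(0.08) + 1(0.12) + 0.25(0.16)] = 1 = \sigma^2 / n$
so $\sigma_{\bar{x}} = 1 = \sqrt{2} / \sqrt{2}$, as Rule 2 predicts. Because each
spin is independent of the previous ones, Rule 2 holds exactly here.
By the same rule, the simulated means for n = 30, 60 and 120 should have
standard deviations of about $\sqrt{2}/\sqrt{30} \approx 0.258$,
$\sqrt{2}/\sqrt{60} \approx 0.183$ and $\sqrt{2}/\sqrt{120} \approx 0.129$.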
Central Limit Theorem.
Rule 4: When n is sufficiently large, the
sampling distribution of $\bar{x}$ is
approximately normally
distributed, even when the
population distribution is not
itself normal.
Illustrations of Sampling
Distributions
[Figure: symmetric, normal-like population with sampling distributions for n = 4, n = 9 and n = 16]
[Figure: skewed population with sampling distributions for n = 4, n = 10 and n = 30]
Theorem.
The Central Limit Theorem can safely
be applied when n exceeds 30.
If n is large or the population distribution
is normal, the standardized variable
$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
has (approximately) a standard normal
(z) distribution.
Example
A food company sells “18 ounce” boxes
of cereal. Let x denote the actual amount
of cereal in a box of cereal. Suppose that
x is normally distributed with $\mu = 18.03$
ounces and $\sigma = 0.05$.
a) What proportion of the boxes will
contain less than 18 ounces?
$P(x < 18) = P\left(z < \frac{18 - 18.03}{0.05}\right) = P(z < -0.60) = 0.2743$
Example - continued
b) A case consists of 24 boxes of cereal.
What is the probability that the mean
amount of cereal (per box in a case)
is less than 18 ounces?
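Because x is normally distributed, the sampling
distribution of $\bar{x}$ is also normal (Rule 3), with
standard deviation $\sigma / \sqrt{n} = 0.05 / \sqrt{24}$, so
$P(\bar{x} < 18) = P\left(z < \frac{18 - 18.03}{0.05 / \sqrt{24}}\right) = P(z < -2.94) = 0.0016$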
Some proportion distributions
where $\pi = 0.2$
Let p be the proportion of successes in a
random sample of size n from a population
whose proportion of S’s (successes) is $\pi$.
[Figure: sampling distributions of p for n = 10, n = 20, n = 50 and n = 100, each with 0.2 marked on the horizontal axis]
Properties of the Sampling
Distribution of p
Let p be the proportion of successes
in a random sample of size n from a
population whose proportion of S’s
(successes) is $\pi$.
Denote the mean of p by $\mu_p$ and the
standard deviation by $\sigma_p$. Then the
following rules hold.
Rule 1: $\mu_p = \pi$
Rule 2: $\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}$
Rule 3: When n is large and $\pi$ is not too near
0 or 1, the sampling distribution of p is
approximately normal.
Condition for Use
The further the value of $\pi$ is from 0.5, the
larger n must be for the normal
approximation to the sampling distribution
of p to be accurate.
Rule of Thumb
If both $n\pi \ge 10$ and $n(1 - \pi) \ge 10$, then it is
safe to use a normal approximation.
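For instance (an added illustration), with $\pi = 0.2$ as in the proportion
distributions shown earlier, $n\pi \ge 10$ requires $n \ge 50$ and
$n(1 - \pi) \ge 10$ requires $n \ge 12.5$. Of the sample sizes pictured,
n = 50 and n = 100 meet the rule of thumb, while n = 10 and n = 20 do not.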
Example
If the true proportion of defectives
produced by a certain manufacturing
process is 0.08 and a sample of 400 is
chosen, what is the probability that the
proportion of defectives in the sample is
greater than 0.10?
Since $n\pi = 400(0.08) = 32 > 10$ and
$n(1 - \pi) = 400(0.92) = 368 > 10$,
it’s reasonable to use the normal
approximation.
Example - continued
$\mu_p = \pi = 0.08$
$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.08(1 - 0.08)}{400}} = 0.013565$
$z = \frac{p - \mu_p}{\sigma_p} = \frac{0.10 - 0.08}{0.013565} = 1.47$
$P(p > 0.1) = P(z > 1.47) = 1 - 0.9292 = 0.0708$
Example
Suppose 3% of the people contacted by phone
are receptive to a certain sales pitch and buy
your product. If your sales staff contacts 2000
people, what is the probability that more than
100 of the people contacted will purchase your
product?
Clearly $\pi = 0.03$ and p = 100/2000 = 0.05 so
$P(p > 0.05) = P\left(z > \frac{0.05 - 0.03}{\sqrt{\frac{(0.03)(0.97)}{2000}}}\right) = P\left(z > \frac{0.05 - 0.03}{0.0038145}\right) = P(z > 5.24) \approx 0$
Example - continued
If your sales staff contacts 2000 people, what
is the probability that fewer than 50 of the
people contacted will purchase your product?
Now $\pi = 0.03$ and p = 50/2000 = 0.025 so
$P(p < 0.025) = P\left(z < \frac{0.025 - 0.03}{\sqrt{\frac{(0.03)(0.97)}{2000}}}\right) = P\left(z < \frac{0.025 - 0.03}{0.0038145}\right) = P(z < -1.31) = 0.0951$
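Both parts rely on the normal approximation, which the rule of thumb
supports here: $n\pi = 2000(0.03) = 60 \ge 10$ and
$n(1 - \pi) = 2000(0.97) = 1940 \ge 10$.
As a cross-check (an added illustration), the same z values follow from
working with the count of buyers, which has mean $n\pi = 60$ and standard
deviation $\sqrt{n\pi(1 - \pi)} = \sqrt{58.2} \approx 7.629$:
$(100 - 60)/7.629 \approx 5.24$ and $(50 - 60)/7.629 \approx -1.31$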