Chapter 8
Sampling Variability & Sampling Distributions
Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.
Basic Terms
Any quantity computed from values in a
sample is called a statistic.
The observed value of a statistic depends
on the particular sample selected from
the population; typically, it varies from
sample to sample. This variability is
called sampling variability.
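Sampling variability is easy to see by drawing a few samples and computing the same statistic on each. The sketch below is an added illustration, not part of the original slides; the population (the integers 1 to 100), the sample size and the seed are all assumptions chosen only for the demonstration.

```python
import random
import statistics

# Hypothetical population: the integers 1..100 (an assumption for illustration).
population = list(range(1, 101))

rng = random.Random(8)  # arbitrary fixed seed so the run is repeatable

# Draw five random samples of size 10 and compute the mean of each one.
sample_means = [
    statistics.mean(rng.sample(population, 10))
    for _ in range(5)
]

# The statistic (the sample mean) differs from sample to sample.
print(sample_means)
```

Each printed value is an observed value of the same statistic; the differences between them are the sampling variability.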
Sampling Distribution
The distribution of a statistic is called its
sampling distribution.
Example
Consider a population that consists of the
numbers 1, 2, 3, 4 and 5, generated in such a
manner that the probability of each of those
values is 0.2 no matter what the previous
selections were. This population could be
described as the outcomes of a spinner with
five equally likely results (spinner figure
omitted). Its probability distribution is
given below.
x       1     2     3     4     5
p(x)    0.2   0.2   0.2   0.2   0.2
Example
If the sampling distribution for the means of
samples of size two is analyzed, it looks like
Sample   Mean      Sample   Mean
1, 1     1         3, 4     3.5
1, 2     1.5       3, 5     4
1, 3     2         4, 1     2.5
1, 4     2.5       4, 2     3
1, 5     3         4, 3     3.5
2, 1     1.5       4, 4     4
2, 2     2         4, 5     4.5
2, 3     2.5       5, 1     3
2, 4     3         5, 2     3.5
2, 5     3.5       5, 3     4
3, 1     2         5, 4     4.5
3, 2     2.5       5, 5     5
3, 3     3

Mean     Frequency   Probability
1        1           0.04
1.5      2           0.08
2        3           0.12
2.5      4           0.16
3        5           0.20
3.5      4           0.16
4        3           0.12
4.5      2           0.08
5        1           0.04
Total    25
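The enumeration of all 25 equally likely samples can be reproduced by brute force. The following is a minimal sketch added for illustration, not part of the original slides; the function name sampling_distribution_of_mean is hypothetical, and it assumes equally likely values and independent draws, as in the spinner example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def sampling_distribution_of_mean(values, n):
    """Exact sampling distribution of the mean for samples of size n,
    assuming each value is equally likely and draws are independent."""
    # Count how many of the len(values)**n ordered samples give each mean.
    counts = Counter(
        Fraction(sum(sample), n) for sample in product(values, repeat=n)
    )
    total = len(values) ** n
    return {mean: Fraction(count, total) for mean, count in sorted(counts.items())}

dist_n2 = sampling_distribution_of_mean([1, 2, 3, 4, 5], 2)
for mean, prob in dist_n2.items():
    print(float(mean), float(prob))
```

Calling the same function with n=3 or n=4 enumerates the 125 or 625 possible samples in the same way.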
Example
The original distribution and the sampling
distribution of means of samples with n=2
are given below.
[Figure: histograms of the original distribution and of the
sampling distribution of the mean for n = 2, each on a
horizontal axis from 1 to 5]
Example
Sampling distributions for n=3 and n=4 were
calculated and are illustrated below.
[Figure: histograms of the sampling distributions for n = 3
and n = 4, each on a horizontal axis from 1 to 5]
Simulations
To illustrate the general behavior of samples of
fixed size n, 10,000 samples of each of the sizes
30, 60 and 120 were generated from this uniform
distribution and their means calculated.
Probability histograms were created for each of
these (simulated) sampling distributions.
[Figure: probability histograms of the simulated
means for n = 30, n = 60 and n = 120]
Notice that all three of these look to be
essentially normally distributed. Further, note
that the variability decreases as the sample size
increases.
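A simulation along these lines can be sketched with the Python standard library. This is an added illustration under stated assumptions, not the code used for the slides: the helper name simulate_sample_means and the seed are hypothetical, and the population is taken to be the values 1 to 5, each with probability 0.2.

```python
import random
import statistics

def simulate_sample_means(n, num_samples, rng):
    """Means of num_samples random samples of size n from the values 1..5,
    each value having probability 0.2."""
    return [
        statistics.fmean(rng.choices([1, 2, 3, 4, 5], k=n))
        for _ in range(num_samples)
    ]

rng = random.Random(2005)  # arbitrary seed, chosen only for repeatability
spread = {}
for n in (30, 60, 120):
    means = simulate_sample_means(n, 10_000, rng)
    spread[n] = statistics.stdev(means)
    # Print the sample size, the average of the means, and their spread.
    print(n, round(statistics.fmean(means), 3), round(spread[n], 3))
```

The average of the simulated means should stay close to 3, while the standard deviation of the means should shrink as n grows.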
Simulations
To further illustrate the general behavior of
samples of fixed size n, 10,000 samples of each of
the sizes 4, 16 and 30 were generated from the
positively skewed distribution pictured below.
[Figure: skewed distribution]
Notice that these sampling distributions are all skewed,
but as n increased the sampling distributions became
more symmetric and eventually appeared to be almost
normally distributed.
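The same effect can be sketched numerically. The slides do not say which skewed distribution was used, so the example below assumes an exponential population with rate 1 purely for illustration; the skewness helper and the seed are also hypothetical choices.

```python
import random
import statistics

def skewness(data):
    """Sample skewness: the average cubed z-score (0 for a symmetric sample)."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return statistics.fmean([((x - m) / s) ** 3 for x in data])

rng = random.Random(9)  # arbitrary seed for repeatability
skew = {}
for n in (4, 16, 30):
    # 10,000 sample means, each from n draws of the assumed skewed population.
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(10_000)
    ]
    skew[n] = skewness(means)
    print(n, round(skew[n], 2))
```

The skewness of the simulated means stays positive but moves toward 0 as n increases, which matches the histograms becoming more symmetric.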
Terminology
Let $\bar{x}$ denote the mean of the observations
in a random sample of size n from a
population having mean $\mu$ and standard
deviation $\sigma$. Denote the mean value of the
$\bar{x}$ distribution by $\mu_{\bar{x}}$ and the standard
deviation of the $\bar{x}$ distribution by $\sigma_{\bar{x}}$
(called the standard error of the mean). Then the
rules on the next two slides hold.
Properties of the Sampling
Distribution of the Sample Mean.
Rule 1: $\mu_{\bar{x}} = \mu$
Rule 2: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
This rule is approximately correct as
long as no more than 5% of the
population is included in the sample.
Rule 3: When the population distribution is
normal, the sampling distribution of $\bar{x}$
is also normal for any sample size n.
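As a check on Rules 1 and 2, the spinner population from the earlier example (values 1 to 5, each with probability 0.2) and its sampling distribution for n = 2 can be worked through by hand. This worked calculation is an added illustration, not part of the original slides.

```latex
\begin{align*}
\mu &= \tfrac{1}{5}(1 + 2 + 3 + 4 + 5) = 3, \\
\sigma^2 &= \tfrac{1}{5}\left[(-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2\right] = 2,
  \qquad \sigma = \sqrt{2} \approx 1.414, \\
\mu_{\bar{x}} &= \sum \bar{x}\, p(\bar{x}) = 3 = \mu, \\
\sigma_{\bar{x}}^2 &= \sum (\bar{x} - 3)^2\, p(\bar{x})
  = 2\left[4(0.04) + 2.25(0.08) + 1(0.12) + 0.25(0.16)\right] = 1, \\
\sigma_{\bar{x}} &= 1 = \frac{\sqrt{2}}{\sqrt{2}} = \frac{\sigma}{\sqrt{n}}.
\end{align*}
```

Both rules hold exactly here because each spin is independent of the previous ones, so the 5% condition does not come into play.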
Central Limit Theorem.
Rule 4: When n is sufficiently large, the
sampling distribution of $\bar{x}$ is
approximately normally
distributed, even when the
population distribution is not
itself normal.
Illustrations of Sampling
Distributions
[Figure: a symmetric, normal-like population and the
sampling distributions for n = 4, n = 9 and n = 16]
Illustrations of Sampling
Distributions
[Figure: a skewed population and the sampling
distributions for n = 4, n = 10 and n = 30]
More about the Central Limit
Theorem.
The Central Limit Theorem can safely
be applied when n exceeds 30.
If n is large or the population distribution
is normal, the standardized variable
$$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
has (approximately) a standard normal
(z) distribution.
Example
A food company sells “18 ounce” boxes
of cereal. Let x denote the actual amount
of cereal in a box of cereal. Suppose that
x is normally distributed with $\mu$ = 18.03
ounces and $\sigma$ = 0.05.
a) What proportion of the boxes will
contain less than 18 ounces?
$$P(x < 18) = P\left(z < \frac{18 - 18.03}{0.05}\right) = P(z < -0.60) = 0.2743$$
Example - continued
b) A case consists of 24 boxes of cereal.
What is the probability that the mean
amount of cereal (per box in a case)
is less than 18 ounces?
Because x is normally distributed, the sampling
distribution of $\bar{x}$ is also normal (Rule 3),
with standard deviation $\sigma / \sqrt{n}$, so
$$P(\bar{x} < 18) = P\left(z < \frac{18 - 18.03}{0.05 / \sqrt{24}}\right) = P(z < -2.94) = 0.0016$$
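Both parts of the cereal example can be checked with a short calculation. The sketch below is an added illustration that uses Python's statistics.NormalDist in place of a z table; the variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 18.03, 0.05      # population mean and standard deviation (ounces)
std_normal = NormalDist()    # standard normal (z) distribution

# a) A single box: P(x < 18)
p_single_box = std_normal.cdf((18 - mu) / sigma)

# b) Mean of the 24 boxes in a case: the standard error is sigma / sqrt(n)
n = 24
p_case_mean = std_normal.cdf((18 - mu) / (sigma / sqrt(n)))

print(round(p_single_box, 4), round(p_case_mean, 4))  # about 0.2743 and 0.0016
```

A single box falls below 18 ounces fairly often, while the mean of a whole case almost never does, because the standard error is smaller than the population standard deviation by a factor of the square root of 24.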
Some proportion distributions
where $\pi$ = 0.2
Let p be the proportion of successes in a
random sample of size n from a population
whose proportion of S’s (successes) is $\pi$.
[Figure: sampling distributions of p for n = 10, n = 20,
n = 50 and n = 100, each with 0.2 marked on the
horizontal axis]
Properties of the Sampling
Distribution of p
Let p be the proportion of successes
in a random sample of size n from a
population whose proportion of S’s
(successes) is $\pi$.
Denote the mean of p by $\mu_p$ and the
standard deviation by $\sigma_p$. Then the
following rules hold.
Properties of the Sampling
Distribution of p
Rule 1: $\mu_p = \pi$
Rule 2: $\sigma_p = \sqrt{\dfrac{\pi(1 - \pi)}{n}}$
Rule 3: When n is large and $\pi$ is not too near
0 or 1, the sampling distribution of p is
approximately normal.
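Rules 1 and 2 can be checked by simulation for a success proportion of 0.2 and the sample sizes 10, 20, 50 and 100. This sketch is an added illustration, not part of the original slides; the number of simulated samples (5,000) and the seed are arbitrary assumptions.

```python
import random
import statistics
from math import sqrt

pi = 0.2                  # population proportion of successes
rng = random.Random(20)   # arbitrary seed for repeatability
results = {}
for n in (10, 20, 50, 100):
    # Each simulated sample proportion is (number of successes) / n.
    props = [
        sum(rng.random() < pi for _ in range(n)) / n
        for _ in range(5_000)
    ]
    # Store: simulated mean of p, simulated SD of p, SD predicted by Rule 2.
    results[n] = (
        statistics.fmean(props),
        statistics.pstdev(props),
        sqrt(pi * (1 - pi) / n),
    )
    print(n, [round(v, 4) for v in results[n]])
```

For each n, the simulated mean should sit near 0.2 and the simulated standard deviation near the value given by Rule 2.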
Condition for Use
The further the value of $\pi$ is from 0.5, the
larger n must be for the normal
approximation to the sampling distribution
of p to be accurate.
Rule of Thumb
If both $n\pi \geq 10$ and $n(1 - \pi) \geq 10$, then it is
safe to use a normal approximation.
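The rule of thumb is simple to express as a check. The function below is a hypothetical helper written for illustration; its name and the threshold parameter are not from the slides.

```python
def normal_approximation_ok(n, pi, threshold=10):
    """Rule of thumb: both n*pi and n*(1 - pi) must be at least the threshold."""
    return n * pi >= threshold and n * (1 - pi) >= threshold

print(normal_approximation_ok(100, 0.2))  # n*pi = 20, n*(1 - pi) = 80
print(normal_approximation_ok(20, 0.2))   # n*pi = 4, which is below 10
```

With a success proportion of 0.2, a sample of 100 passes the check and a sample of 20 does not.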
Example
If the true proportion of defectives
produced by a certain manufacturing
process is 0.08 and a sample of 400 is
chosen, what is the probability that the
proportion of defectives in the sample is
greater than 0.10?
Since $n\pi$ = 400(0.08) = 32 > 10 and
$n(1 - \pi)$ = 400(0.92) = 368 > 10,
it’s reasonable to use the normal
approximation.
Example - continued
$$\mu_p = \pi = 0.08$$
$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.08(1 - 0.08)}{400}} = 0.013565$$
$$z = \frac{p - \mu_p}{\sigma_p} = \frac{0.10 - 0.08}{0.013565} = 1.47$$
$$P(p > 0.1) \approx P(z > 1.47) = 1 - 0.9292 = 0.0708$$
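The defectives calculation can be reproduced in a few lines. This is an added sketch that uses statistics.NormalDist rather than a z table, so the final probability differs slightly from the slide's 0.0708, which comes from rounding z to 1.47 before the table lookup.

```python
from math import sqrt
from statistics import NormalDist

pi, n = 0.08, 400
sigma_p = sqrt(pi * (1 - pi) / n)    # standard deviation of p (Rule 2)
z = (0.10 - pi) / sigma_p            # standardized cutoff for p = 0.10
prob = 1 - NormalDist().cdf(z)       # normal approximation to P(p > 0.10)

print(round(sigma_p, 6), round(z, 2), round(prob, 4))
```

The first two printed values match the slide's 0.013565 and 1.47; the probability comes out close to 0.07.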
Example
Suppose 3% of the people contacted by phone
are receptive to a certain sales pitch and buy
your product. If your sales staff contacts 2000
people, what is the probability that more than
100 of the people contacted will purchase your
product?
Here $\pi$ = 0.03, and more than 100 purchases
corresponds to p > 100/2000 = 0.05, so
$$P(p > 0.05) \approx P\left(z > \frac{0.05 - 0.03}{\sqrt{\frac{(0.03)(0.97)}{2000}}}\right) = P\left(z > \frac{0.05 - 0.03}{0.0038145}\right) = P(z > 5.24) \approx 0$$
Example - continued
If your sales staff contacts 2000 people, what
is the probability that less than 50 of the
people contacted will purchase your product?
Now $\pi$ = 0.03, and fewer than 50 purchases
corresponds to p < 50/2000 = 0.025, so
$$P(p < 0.025) \approx P\left(z < \frac{0.025 - 0.03}{\sqrt{\frac{(0.03)(0.97)}{2000}}}\right) = P\left(z < \frac{0.025 - 0.03}{0.0038145}\right) = P(z < -1.31) = 0.0951$$
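Both sales-call probabilities follow the same pattern: convert the count to a sample proportion, standardize it, and look up the normal probability. The sketch below is an added illustration; the helper z_for_count is a hypothetical name, and the results come from statistics.NormalDist rather than a z table.

```python
from math import sqrt
from statistics import NormalDist

pi, n = 0.03, 2000
sigma_p = sqrt(pi * (1 - pi) / n)    # about 0.0038145
std_normal = NormalDist()

def z_for_count(count):
    """Standardize the sample proportion count / n."""
    return (count / n - pi) / sigma_p

p_more_than_100 = 1 - std_normal.cdf(z_for_count(100))  # P(p > 0.05)
p_less_than_50 = std_normal.cdf(z_for_count(50))        # P(p < 0.025)

print(p_more_than_100, round(p_less_than_50, 4))
```

The first probability is effectively 0, and the second is close to the table value of 0.0951.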