Chapter 8
Sampling Variability and
Sampling Distributions
Basic Terms
Any quantity computed from values in a
sample is called a statistic.
The observed value of a statistic depends
on the particular sample selected from the
population; typically, it varies from sample
to sample. This variability is called
sampling variability.
Copyright (c) 2001 Brooks/Cole, a division of Thomson Learning, Inc.
Sampling Distribution
The distribution of a statistic is called its
sampling distribution.
Example
Consider a population that consists of the
numbers 1, 2, 3, 4 and 5, generated in such a
manner that the probability of each value is
0.2 no matter what the previous selections
were. This population could be described as the
outcome associated with a spinner, with the
distribution given below.
x       1     2     3     4     5
p(x)    0.2   0.2   0.2   0.2   0.2
Example
If the sampling distribution for the means
of samples of size two is analyzed, it
looks like
Sample   Mean $\bar{x}$     Sample   Mean $\bar{x}$
1, 1     1                  3, 4     3.5
1, 2     1.5                3, 5     4
1, 3     2                  4, 1     2.5
1, 4     2.5                4, 2     3
1, 5     3                  4, 3     3.5
2, 1     1.5                4, 4     4
2, 2     2                  4, 5     4.5
2, 3     2.5                5, 1     3
2, 4     3                  5, 2     3.5
2, 5     3.5                5, 3     4
3, 1     2                  5, 4     4.5
3, 2     2.5                5, 5     5
3, 3     3

Mean $\bar{x}$    1     1.5   2     2.5   3     3.5   4     4.5   5     Total
frequency         1     2     3     4     5     4     3     2     1     25
$p(\bar{x})$      0.04  0.08  0.12  0.16  0.20  0.16  0.12  0.08  0.04
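The table above can be reproduced by listing every ordered sample of size two and tallying the means. The following is a minimal sketch, assuming sampling with replacement (so all 25 ordered pairs are equally likely); the variable names are illustrative and not part of the original slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5]   # each value has probability 0.2
n = 2                          # sample size

# Every ordered sample of size n is equally likely under sampling with replacement.
samples = list(product(population, repeat=n))

# Tally how many samples produce each mean (exact arithmetic avoids rounding issues).
counts = Counter(Fraction(sum(s), n) for s in samples)

# Convert frequencies to probabilities, keyed by the mean as a float.
sampling_dist = {
    float(mean): Fraction(freq, len(samples))
    for mean, freq in sorted(counts.items())
}

for mean, prob in sampling_dist.items():
    print(f"{mean:>4}  {float(prob):.2f}")
```

Changing `n` to 3 or 4 gives the sampling distributions for larger samples in the same way.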
Example
The original distribution and the sampling
distribution of means of samples with n=2
are given below.
[Figure: probability histograms of the original distribution and of the sampling distribution for n = 2, each on a horizontal axis running from 1 to 5.]
Example
Sampling distributions for n=3 and n=4
were calculated and are illustrated below.
[Figure: probability histograms of the original distribution and of the sampling distributions for n = 2, n = 3 and n = 4, each on a horizontal axis running from 1 to 5.]
Simulations
To illustrate the general behavior of samples of
fixed size n, 10000 samples each of size 30, 60 and 120
were generated from this uniform distribution and
the means calculated. Probability histograms
were created for each of these (simulated)
sampling distributions.
Notice that all three of these look to be essentially
normally distributed. Further, note that the
variability decreases as the sample size increases.
[Figure: probability histograms of the simulated sample means for n = 30, n = 60 and n = 120, each on a horizontal axis running from 2 to 4.]
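A simulation of this kind can be sketched in a few lines. The version below assumes that "this uniform distribution" is the five-value spinner population from the earlier example (the slide does not say so explicitly); the seed and the names are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(8)                  # arbitrary seed, used only for reproducibility
population = [1, 2, 3, 4, 5]    # assumed population: each value equally likely
num_samples = 10_000

def simulate_means(n):
    """Return the means of num_samples random samples of size n."""
    return [
        statistics.fmean(random.choices(population, k=n))
        for _ in range(num_samples)
    ]

spread = {}  # sample size -> standard deviation of the simulated means
for n in (30, 60, 120):
    means = simulate_means(n)
    spread[n] = statistics.stdev(means)
    print(f"n={n:>3}  mean of means={statistics.fmean(means):.3f}  "
          f"sd of means={spread[n]:.3f}")
```

The printed standard deviations should shrink as n grows, which is the decrease in variability described above.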
Simulations
To further illustrate the general behavior of samples of
fixed size n, 10000 samples each of size 4, 16 and 30
were generated from the positively skewed distribution
pictured below.
[Figure: Skewed distribution]
Notice that these sampling distributions are all skewed,
but as n increased the sampling distributions became
more symmetric and eventually appeared to be almost
normally distributed.
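The same effect can be checked numerically. The slide's skewed population is not specified, so this sketch substitutes an exponential distribution as a stand-in (an assumption) and measures how the skewness of the simulated sample means falls as n increases.

```python
import random
import statistics

random.seed(9)          # arbitrary seed, used only for reproducibility
num_samples = 10_000

def skewness(values):
    """Sample skewness: the average cubed standardized deviation."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

skew = {}  # sample size -> skewness of the simulated sample means
for n in (4, 16, 30):
    means = [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]
    skew[n] = skewness(means)
    print(f"n={n:>2}  skewness of sample means={skew[n]:.2f}")
```

A skewness near 0 indicates a nearly symmetric distribution, so values that shrink toward 0 are consistent with the histograms becoming more normal in shape.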
Terminology
Let $\bar{x}$ denote the mean of the observations in a
random sample of size n from a population
having mean $\mu$ and standard deviation $\sigma$.
Denote the mean value of the $\bar{x}$ distribution by $\mu_{\bar{x}}$
and the standard deviation of the $\bar{x}$ distribution
by $\sigma_{\bar{x}}$ (called the standard error of the mean).
Then the rules on the next two slides hold.
Properties of the Sampling
Distribution of the Sample Mean.
Rule 1: $\mu_{\bar{x}} = \mu$
Rule 2: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$. This rule is approximately
correct as long as no more than 5% of
the population is included in the
sample.
Rule 3: When the population distribution is
normal, the sampling distribution of $\bar{x}$ is
also normal for any sample size n.
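Rules 1 and 2 can be verified exactly for a small case. The sketch below reuses the five-value population from the earlier example with n = 2 and sampling with replacement (both assumptions made for illustration), and compares the mean and standard deviation of all possible sample means with the values the rules predict.

```python
import math
from itertools import product

population = [1, 2, 3, 4, 5]   # each value equally likely
n = 2                          # sample size

def mean_and_sd(values):
    """Mean and standard deviation of a list of equally likely values."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, math.sqrt(var)

mu, sigma = mean_and_sd(population)

# All equally likely ordered samples of size n, reduced to their means.
sample_means = [sum(s) / n for s in product(population, repeat=n)]
mu_xbar, sigma_xbar = mean_and_sd(sample_means)

print("Rule 1:", mu_xbar, "vs", mu)
print("Rule 2:", sigma_xbar, "vs", sigma / math.sqrt(n))
```

Each printed pair should agree up to floating-point rounding.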
Central Limit Theorem.
Rule 4: When n is sufficiently large, the sampling
distribution of $\bar{x}$ is approximately normally
distributed, even when the population
distribution is not itself normal.
Illustrations of Sampling
Distributions
[Figure: a symmetric, normal-like population and the sampling distributions of $\bar{x}$ for n = 4, n = 9 and n = 25.]
Illustrations of Sampling
Distributions
[Figure: a skewed population and the sampling distributions of $\bar{x}$ for n = 4, n = 10 and n = 30.]
More about the Central Limit
Theorem.
The Central Limit Theorem can safely be
applied when n exceeds 30.
If n is large or the population distribution
is normal, the standardized variable
$$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
has (approximately) a standard normal
(z) distribution.
Example
A food company sells “18 ounce” boxes of
cereal. Let x denote the actual amount of
cereal in a box of cereal. Suppose that x is
normally distributed with $\mu$ = 18.03 ounces
and $\sigma$ = 0.05 ounces.
a) What proportion of the boxes will contain
less than 18 ounces?
$$P(x < 18) = P\left(z < \frac{18 - 18.03}{0.05}\right) = P(z < -0.60) = 0.2743$$
Example - continued
b) A case consists of 24 boxes of cereal.
What is the probability that the mean
amount of cereal (per box in a case) is
less than 18 ounces?
Because x is normally distributed, the sampling
distribution of $\bar{x}$ is also normal (Rule 3), so
$$P(\bar{x} < 18) = P\left(z < \frac{18 - 18.03}{0.05 / \sqrt{24}}\right) = P(z < -2.94) = 0.0016$$
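Both parts of the cereal example can be checked with the standard library's `statistics.NormalDist` instead of a z table. This is a sketch with illustrative variable names; it uses the unrounded z value, so the results agree with the table-based answers only to about four decimal places.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 18.03, 0.05   # population mean and standard deviation (ounces)
n = 24                    # boxes per case

# a) A single box: x is Normal(mu, sigma).
p_box = NormalDist(mu, sigma).cdf(18)

# b) The mean of a case: x-bar is Normal(mu, sigma / sqrt(n)).
p_case = NormalDist(mu, sigma / sqrt(n)).cdf(18)

print(f"P(x < 18)     = {p_box:.4f}")
print(f"P(x-bar < 18) = {p_case:.4f}")
```

The only difference between the two parts is the standard deviation: dividing by the square root of n makes a case mean below 18 ounces far less likely than a single box below 18 ounces.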
Some proportion distributions
where $\pi$ = 0.2
Let p be the proportion of successes in
a random sample of size n from a
population whose proportion of S’s
(successes) is $\pi$.
[Figure: sampling distributions of p for n = 10, n = 20, n = 50 and n = 100, each with 0.2 marked on the horizontal axis.]
Properties of the Sampling
Distribution of p
Let p be the proportion of successes in a
random sample of size n from a
population whose proportion of S’s
(successes) is $\pi$. Denote the mean of p
by $\mu_p$ and the standard deviation by $\sigma_p$.
Then the following rules hold.
Properties of the Sampling
Distribution of p
Rule 1: $\mu_p = \pi$
Rule 2: $\sigma_p = \sqrt{\dfrac{\pi(1 - \pi)}{n}}$
Rule 3: When n is large and $\pi$ is not too near 0
or 1, the sampling distribution of p is
approximately normal.
Condition for Use
The further the value of $\pi$ is from 0.5, the larger
n must be for a normal approximation to the
sampling distribution of p to be accurate.
Rule of Thumb
If both $n\pi \ge 10$ and $n(1 - \pi) \ge 10$, then it is safe to
use a normal approximation.
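The rule of thumb is easy to express as a small check. In this sketch the function name `normal_approx_ok` and its parameters are hypothetical, chosen only for illustration; `pop_prop` stands for the population proportion of successes.

```python
def normal_approx_ok(n, pop_prop, threshold=10):
    """Rule of thumb: expected successes and expected failures both reach the threshold."""
    expected_successes = n * pop_prop
    expected_failures = n * (1 - pop_prop)
    return expected_successes >= threshold and expected_failures >= threshold

# With a population proportion of 0.2, small samples fail the check.
for n in (10, 20, 50, 100):
    print(n, normal_approx_ok(n, 0.2))
```

With a population proportion of 0.2, the check first passes at n = 50, where the expected number of successes is exactly 10.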
Example
If the true proportion of defectives produced by
a certain manufacturing process is 0.08 and a
sample of 400 is chosen, what is the
probability that the proportion of defectives in
the sample is greater than 0.10?
Since $n\pi = 400(0.08) = 32 > 10$ and
$n(1 - \pi) = 400(0.92) = 368 > 10$,
it’s reasonable to use the normal approximation.
Example
(continued)
$$\mu_p = \pi = 0.08$$
$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.08(1 - 0.08)}{400}} = 0.013565$$
$$z = \frac{p - \mu_p}{\sigma_p} = \frac{0.10 - 0.08}{0.013565} = 1.47$$
$$P(p > 0.1) \approx P(z > 1.47) = 1 - 0.9292 = 0.0708$$
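The steps of this calculation can be followed in code. The sketch below uses illustrative names and keeps z unrounded, so the final probability comes out near 0.070 rather than the table-based 0.0708.

```python
from math import sqrt
from statistics import NormalDist

pop_prop = 0.08    # true proportion of defectives
n = 400            # sample size
cutoff = 0.10      # sample proportion of interest

# Standard deviation of the sample proportion (Rule 2).
se = sqrt(pop_prop * (1 - pop_prop) / n)

# Standardize the cutoff and take the upper-tail area of the standard normal.
z = (cutoff - pop_prop) / se
p_exceeds = 1 - NormalDist().cdf(z)

print(f"se = {se:.6f}, z = {z:.2f}, P(p > {cutoff}) = {p_exceeds:.4f}")
```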
Example
Suppose 3% of the people contacted by phone are
receptive to a certain sales pitch and buy your
product. If your sales staff contacts 2000 people,
what is the probability that more than 100 of the
people contacted will purchase your product?
Clearly $\pi$ = 0.03 and p = 100/2000 = 0.05, so
$$P(p > 0.05) \approx P\left(z > \frac{0.05 - 0.03}{\sqrt{\dfrac{(0.03)(0.97)}{2000}}}\right) = P\left(z > \frac{0.05 - 0.03}{0.0038145}\right) = P(z > 5.24) \approx 0$$
Example - continued
If your sales staff contacts 2000 people, what is the
probability that fewer than 50 of the people contacted
will purchase your product?
Now $\pi$ = 0.03 and p = 50/2000 = 0.025, so
$$P(p < 0.025) \approx P\left(z < \frac{0.025 - 0.03}{\sqrt{\dfrac{(0.03)(0.97)}{2000}}}\right) = P\left(z < \frac{0.025 - 0.03}{0.0038145}\right) = P(z < -1.31) = 0.0951$$
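Both parts of the sales example use the same approximating normal distribution, so it is convenient to build it once and query it twice. This is a sketch; the helper name `sample_prop_dist` is hypothetical, and the results differ slightly from the table-based answers because z is not rounded.

```python
from math import sqrt
from statistics import NormalDist

def sample_prop_dist(pop_prop, n):
    """Normal approximation to the sampling distribution of the sample proportion."""
    return NormalDist(pop_prop, sqrt(pop_prop * (1 - pop_prop) / n))

dist = sample_prop_dist(0.03, 2000)   # 3% receptive, 2000 people contacted

p_more_than_100 = 1 - dist.cdf(100 / 2000)   # upper tail beyond p = 0.05
p_fewer_than_50 = dist.cdf(50 / 2000)        # lower tail below p = 0.025

print(f"P(more than 100 buy)  = {p_more_than_100:.2e}")
print(f"P(fewer than 50 buy)  = {p_fewer_than_50:.4f}")
```

The first probability is tiny but not exactly zero, which is what "approximately 0" means in the worked solution.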