Unit 7 (Ch 8)
Sampling Variability and
Sampling Distributions
Suppose we are interested in finding the true
mean (μ) fat content of quarter-pound
hamburgers marketed by a national fast food
chain. To learn something about μ, we could
obtain a sample of n = 50 hamburgers and
determine the fat content of each one.
To answer these questions, we will
examine the sampling distribution,
which describes the long-run behavior
of a sample statistic.
Recall that the sample
mean is a statistic.
Statistic
• A number that can be computed from
sample data
• Some statistics we will use include
x̄ – sample mean
s – sample standard deviation
p̂ – sample proportion
• The observed value of the statistic
depends on the particular sample selected
from the population, and it will vary from
sample to sample.
This variability is called sampling variability.
The campus of Wolf City College
has a fish pond. Suppose there
are 20 fish in the pond. The
lengths of the fish (in inches)
are given below:
 4.5   5.4  10.3   7.9   6.3
 4.3   9.6   8.5   6.6  11.7
 8.9   2.2   9.8   8.7  13.3
 4.6  10.7  13.4   7.7   5.6
The true mean μ = 8.
Suppose we randomly catch a sample of 3 fish from this pond and measure
their length. What would the mean length of the sample be?
1st sample – We caught fish with lengths 6.3 inches, 2.2 inches, and 13.3 inches.
x̄ = 7.27 inches
This is a statistic!
Let’s catch two more samples and look at the sample means.
2nd sample – 8.5, 4.6, and 5.6 inches.
x̄ = 6.23 inches
3rd sample – 10.3, 8.9, and 13.4 inches.
x̄ = 10.87 inches
This is an example of sampling variability.
Notice that some sample means are closer and some farther away; some
above and some below the mean.
Fish Pond Continued . . .
 4.5   5.4  10.3   7.9   6.3
 4.3   9.6   8.5   6.6  11.7
 8.9   2.2   9.8   8.7  13.3
 4.6  10.7  13.4   7.7   5.6
There are 1140 (20C3) different possible samples
of size 3 from this population. If we were to
catch all those different samples and calculate
the mean length of each sample, we would have a
distribution of all possible x̄.
This would be the sampling distribution of x̄.
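As a rough check of the count above, the following Python sketch lists every sample of size 3 from the 20 fish lengths and averages the sample means. The variable names are illustrative and are not part of the original slides.

```python
from itertools import combinations
from statistics import mean

# Lengths (in inches) of the 20 fish in the pond, as listed above.
lengths = [4.5, 5.4, 10.3, 7.9, 6.3, 4.3, 9.6, 8.5, 6.6, 11.7,
           8.9, 2.2, 9.8, 8.7, 13.3, 4.6, 10.7, 13.4, 7.7, 5.6]

# Every possible sample of 3 fish, and the mean length of each sample.
sample_means = [mean(sample) for sample in combinations(lengths, 3)]

num_samples = len(sample_means)      # how many distinct samples exist (20C3)
mean_of_means = mean(sample_means)   # center of the sampling distribution
print(num_samples, round(mean_of_means, 2))
```

The list sample_means is the full sampling distribution of the sample mean for n = 3; its average should agree with the population mean of 8 inches.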
Sampling Distributions of x̄
• The distribution that would be formed
by considering the value of a sample
statistic for every possible different
sample of a given size from a
population.
In this case, the sample
statistic is the sample
mean x̄.
Fish Pond Revisited . . .
Suppose there are only 5 fish
in the pond. The lengths of
the fish (in inches) are given
below:
6.6  11.7  8.9  2.2  9.8
What is the mean and standard
deviation of this population?
μ = 7.84
σ = 3.262
We will keep the
population size
small so that we
can find ALL the
possible samples.
Fish Pond Revisited . . .
6.6  11.7  8.9  2.2  9.8
μ = 7.84 and σ = 3.262
Let’s find all the samples of size 2.
Pairs          x̄
6.6 & 11.7     9.15
6.6 & 8.9      7.75
6.6 & 2.2      4.4
6.6 & 9.8      8.2
11.7 & 8.9     10.3
11.7 & 2.2     6.95
11.7 & 9.8     10.75
8.9 & 2.2      5.55
8.9 & 9.8      9.35
2.2 & 9.8      6
How many samples of size 2 are possible?
These values determine the sampling distribution of x̄ for
samples of size 2.
What is the mean and standard deviation of these sample means?
μ_x̄ = 7.84
σ_x̄ = 1.998
How do these values compare to the population mean and
standard deviation?
Fish Pond Revisited . . .
6.6  11.7  8.9  2.2  9.8
μ = 7.84 and σ = 3.262
Now let’s find all the samples of size 3.
Triples            x̄
6.6, 11.7, 8.9     9.067
6.6, 11.7, 2.2     6.833
6.6, 11.7, 9.8     9.367
6.6, 8.9, 2.2      5.9
6.6, 8.9, 9.8      8.433
6.6, 2.2, 9.8      6.2
11.7, 8.9, 2.2     7.6
11.7, 8.9, 9.8     10.133
11.7, 2.2, 9.8     7.9
8.9, 2.2, 9.8      6.967
How many samples of size 3 are possible?
These values determine the sampling distribution of x̄ for
samples of size 3.
What is the mean and standard deviation of these sample means?
μ_x̄ = 7.84
σ_x̄ = 1.332
How do these values compare to the population mean and
standard deviation?
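The two fish pond tables can be reproduced with a short Python sketch. It lists every sample of size 2 and size 3 from the five lengths and summarizes the sample means. The function name sampling_distribution is hypothetical, chosen only for illustration.

```python
from itertools import combinations
from statistics import mean, pstdev

population = [6.6, 11.7, 8.9, 2.2, 9.8]   # the 5 fish lengths, in inches

def sampling_distribution(values, n):
    """Return the mean of every possible sample of size n (no repeats)."""
    return [mean(sample) for sample in combinations(values, n)]

means_2 = sampling_distribution(population, 2)
means_3 = sampling_distribution(population, 3)

for n, means in ((2, means_2), (3, means_3)):
    # pstdev divides by the number of values, which is appropriate here
    # because the list holds ALL possible sample means, not a sample of them.
    print(n, len(means), round(mean(means), 2), round(pstdev(means), 3))
```

Both sample sizes give 10 possible samples with a mean of 7.84, while the standard deviation of the sample means drops from about 1.998 to about 1.332.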
What do you notice?
• The mean of the sampling distribution
EQUALS the mean of the population.
μ_x̄ = μ
• As the sample size increases, the
standard deviation of the sampling
distribution decreases.
as n ↑, σ_x̄ ↓
General Properties of Sampling
Distributions of x̄
Rule 1: μ_x̄ = μ
Rule 2: σ_x̄ = σ/√n
This rule is exact if the population is infinite, and is
approximately correct if the population is finite and
no more than 10% of the population is included in
the sample.
Note that in the previous fish pond examples this standard
deviation formula was not correct because the sample sizes were
more than 10% of the population.
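To see how far off Rule 2 is for the five-fish pond, the sketch below compares σ/√n with the exact values 1.998 and 1.332 found by listing every sample. It also applies the finite population correction factor √((N − n)/(N − 1)), a standard result that the slides do not mention, so treat that part as a supplementary assumption. The function names are hypothetical.

```python
from math import sqrt

SIGMA = 3.262   # population standard deviation of the 5 fish lengths
N = 5           # population size

def rule_2_sd(sigma, n):
    """Rule 2: standard deviation of the sample mean, ignoring population size."""
    return sigma / sqrt(n)

def corrected_sd(sigma, n, pop_size):
    """Rule 2 multiplied by the finite population correction (not on the slides)."""
    return rule_2_sd(sigma, n) * sqrt((pop_size - n) / (pop_size - 1))

for n in (2, 3):
    print(n, round(rule_2_sd(SIGMA, n), 3), round(corrected_sd(SIGMA, n, N), 3))
```

For n = 2 the uncorrected rule gives about 2.307, noticeably larger than the exact 1.998, because the sample is 40% of the population.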
The paper “Mean Platelet Volume in Patients with
Metabolic Syndrome and Its Relationship with Coronary
Artery Disease” (Thrombosis Research, 2007) includes
data that suggests that the distribution of platelet
volume of patients who do not have metabolic syndrome
is approximately normal with mean μ = 8.25
and standard deviation σ = 0.75.
We can use Minitab to generate random samples from this
population. We will generate 500 random samples
of n = 5 and compute the sample mean for each.
Platelets Continued . . .
Similarly, we will generate 500 random samples of n = 10,
n = 20, and n = 30. The density histograms below display
the resulting 500 x̄ for each of the given sample sizes.
[Figure: density histograms of the 500 sample means for n = 5, 10, 20, and 30]
What do you notice about the means of these histograms?
What do you notice about the standard deviation of these histograms?
What do you notice about the shape of these histograms?
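The slides run this simulation in Minitab. A comparable sketch in Python, using only the standard library, is shown here. The seed, the function name simulate_sample_means, and the use of random.gauss are choices made for this illustration and are not taken from the source.

```python
import random
from statistics import mean, stdev

def simulate_sample_means(mu, sigma, n, reps=500, seed=0):
    """Draw `reps` random samples of size n from a normal population
    and return the list of sample means."""
    rng = random.Random(seed)
    return [mean([rng.gauss(mu, sigma) for _ in range(n)]) for _ in range(reps)]

for n in (5, 10, 20, 30):
    xbars = simulate_sample_means(mu=8.25, sigma=0.75, n=n)
    # Center and spread of the 500 simulated sample means for this n.
    print(n, round(mean(xbars), 3), round(stdev(xbars), 3))
```

Each row should show a center near 8.25 and a spread that shrinks as n grows, roughly tracking 0.75/√n.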
General Properties Continued . . .
Rule 3:
When the population distribution is
normal, the sampling distribution of x̄ is
also normal for any sample size n.
The paper “Is the Overtime Period in an NHL Game Long
Enough?” (American Statistician, 2008) gave data on the
time (in minutes) from the start of the game to the
first goal scored for the 281 regular season games from
the 2005-2006 season that went into overtime. The
density histogram for the data is shown below.
Let’s consider these 281 values as
a population. The distribution is
strongly positively skewed with
mean μ = 13 minutes and with a
median of 10 minutes.
Using Minitab, we will
generate 500 samples of
the following sample sizes
from this distribution:
n = 5, n = 10, n = 20, n = 30.
These are the density histograms for the 500 samples.
Are these histograms centered at approximately μ = 13?
What do you notice about the standard deviations of these histograms?
What do you notice about the shape of these histograms?
General Properties Continued . . .
Rule 4: Central Limit Theorem
When n is sufficiently large, the sampling
distribution of x̄ is well approximated by
a normal curve, even when the population
distribution is not itself normal.
How large is “sufficiently large” anyway?
CLT can safely be applied if n exceeds 30.
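The effect described by the Central Limit Theorem can be sketched with a small simulation. The 281 NHL times are not reproduced in the slides, so this example assumes an exponential population with mean 13 as a stand-in for a strongly right-skewed distribution. That choice, the seed, and the function names are all assumptions made for illustration.

```python
import random
from statistics import mean, pstdev

def skewness(values):
    """Skewness measured as the average cubed z-score (0 for a symmetric shape)."""
    m, s = mean(values), pstdev(values)
    return mean([((v - m) / s) ** 3 for v in values])

def sample_mean_skewness(n, reps=500, seed=1):
    """Skewness of `reps` sample means, each computed from n draws of an
    exponential population with mean 13 (an assumed stand-in population)."""
    rng = random.Random(seed)
    xbars = [mean([rng.expovariate(1 / 13) for _ in range(n)]) for _ in range(reps)]
    return skewness(xbars)

for n in (5, 10, 20, 30):
    print(n, round(sample_mean_skewness(n), 2))
```

The skewness of the simulated sample means should generally move toward 0 as n increases, which is what a histogram approaching a normal curve looks like numerically. With only 500 samples per size the printed values are noisy, so the decline may not be perfectly steady.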
A soft-drink bottler claims that, on average,
cans contain 12 oz of soda. Let x denote the
actual volume of soda in a randomly selected can.
Suppose that x is normally distributed with
σ = .16 oz. Sixteen cans are randomly selected,
and the soda volume is determined for each one.
Let x̄ = the resulting sample mean soda volume.
If the bottler’s claim is correct, then the
sampling distribution of x̄ is normally distributed
with:
μ_x̄ = μ = 12
σ_x̄ = σ/√n = .16/√16 = .04
Soda Problem Continued . . .
μ_x̄ = μ = 12
σ_x̄ = σ/√n = .16/√16 = .04
What is the probability that the sample mean
soda volume is between 11.96 ounces and 12.08 ounces?
To standardize these endpoints, use
z = (x̄ - μ_x̄)/σ_x̄ = (x̄ - μ)/(σ/√n)
a* = (11.96 - 12)/.04 = -1
b* = (12.08 - 12)/.04 = 2
Look these up in the table and subtract the probabilities.
P(11.96 < x̄ < 12.08) = .9772 - .1587 = .8185
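The table lookup for the soda problem can be checked in software. This sketch uses Python's statistics.NormalDist. The variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 12, 0.16, 16

# Sampling distribution of the sample mean: normal, mean 12, sd 0.16/sqrt(16).
xbar_dist = NormalDist(mu, sigma / sqrt(n))

z_low = (11.96 - mu) / xbar_dist.stdev    # standardized lower endpoint
z_high = (12.08 - mu) / xbar_dist.stdev   # standardized upper endpoint
prob = xbar_dist.cdf(12.08) - xbar_dist.cdf(11.96)
print(round(z_low, 2), round(z_high, 2), round(prob, 4))
```

Software carries more decimal places than the z table, so the result is about .8186 rather than the .8185 obtained from the rounded table values.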
A hot dog manufacturer asserts that one of its
brands of hot dogs has an average fat content of 18
grams per hot dog with standard deviation of 1
gram. Consumers of this brand would probably not
be disturbed if the mean was less than 18 grams,
but would be unhappy if it exceeded 18 grams. An
independent testing organization is asked to
analyze a random sample of 36 hot dogs. Suppose
the resulting sample mean is 18.4 grams. Does this
result suggest that the manufacturer’s claim is
incorrect?
Since the sample size is greater than 30, the
Central Limit Theorem applies.
So the distribution of x̄ is
approximately normal with
μ_x̄ = 18 and σ_x̄ = 1/√36 = .1667
Hot Dogs Continued . . .
μ_x̄ = 18 and σ_x̄ = 1/√36 = .1667
Suppose the resulting sample mean is 18.4 grams.
Does this result suggest that the manufacturer’s
claim is incorrect?
z = (18.4 - 18)/.1667 = 2.40
P(x̄ > 18.4) = 1 - .9918 = .0082
Values of x̄ at least as large as 18.4 would be
observed only about .82% of the time.
The sample mean of 18.4 is large enough to
cause us to doubt that the manufacturer’s
claim is correct.
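The hot dog calculation can be verified the same way. This is a minimal sketch using statistics.NormalDist, with illustrative variable names.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 18, 1, 36
observed_mean = 18.4

std_error = sigma / sqrt(n)                 # sd of the sample mean, 1/sqrt(36)
z = (observed_mean - mu) / std_error        # standardized sample mean
upper_tail = 1 - NormalDist().cdf(z)        # P(sample mean > 18.4) if the claim holds
print(round(std_error, 4), round(z, 2), round(upper_tail, 4))
```

The upper-tail probability comes out near .0082, matching the table-based answer.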
Let’s explore what happens with distributions
of sample proportions (p̂). Have students
perform the following experiment.
•Toss a penny 20 times and record the number
of heads.
•Calculate the proportion of heads and mark it on
the dot plot on the board.
This is a statistic!
What shape do you think the dot plot will
have?
The dotplot is a partial graph of the
sampling distribution of all sample
proportions of sample size 20.
What would happen to the dotplot if we
flipped the penny 50 times and
recorded the proportion of heads?
Sampling Distribution of p̂
The distribution that would be formed by
considering the value of a sample statistic for
every possible different sample of a given size
from a population.
We will use:
p for the population proportion
and
p̂ for the sample proportion
In this case, we will use
p̂ = (number of successes in the sample)/n
Suppose we have a population of six students:
Alice, Ben, Charles, Denise, Edward, & Frank
We will keep the population small
so that we can find ALL the
possible samples of a given size.
We are interested in the proportion of females.
This is called the parameter of interest.
What is the proportion of females?
1/3
Let’s select samples of two from this population.
How many different samples are possible?
6C2 = 15
Find the 15 different samples that are possible
and find the sample proportion of the number
of females in each sample.
Sample              p̂
Alice & Ben         .5
Alice & Charles     .5
Alice & Denise      1
Alice & Edward      .5
Alice & Frank       .5
Ben & Charles       0
Ben & Denise        .5
Ben & Edward        0
Ben & Frank         0
Charles & Denise    .5
Charles & Edward    0
Charles & Frank     0
Denise & Edward     .5
Denise & Frank      .5
Edward & Frank      0
How does the mean of the sampling distribution
compare to the population parameter (p)?
Find the mean and standard deviation of these
sample proportions.
μ_p̂ = 1/3 and σ_p̂ = 0.29814
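The 15 samples and their summary values can be reproduced with a short enumeration. The sketch treats Alice and Denise as the two females, which is what the sample proportions on the slide imply. The dictionary layout and variable names are illustrative.

```python
from itertools import combinations
from statistics import mean, pstdev

# "F" marks the two females implied by the slide's sample proportions.
students = {"Alice": "F", "Ben": "M", "Charles": "M",
            "Denise": "F", "Edward": "M", "Frank": "M"}

# Sample proportion of females for every possible pair of students.
p_hats = [sum(students[name] == "F" for name in pair) / 2
          for pair in combinations(students, 2)]

mean_p_hat = mean(p_hats)     # mean of the sampling distribution
sd_p_hat = pstdev(p_hats)     # sd over ALL possible samples (divide by 15)
print(len(p_hats), round(mean_p_hat, 4), round(sd_p_hat, 5))
```

The mean of the 15 sample proportions equals the population proportion 1/3, and the standard deviation is about 0.29814.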
General Properties for Sampling
Distributions of p̂
Rule 1: μ_p̂ = p
Rule 2: σ_p̂ = √(p(1 - p)/n)
This rule is exact if the population is infinite, and is
approximately correct if the population is finite and
no more than 10% of the population is included in
the sample.
Note that in the previous student example this standard
deviation formula was not correct because the sample size was
more than 10% of the population.
In the fall of 2008, there were 18,516 students
enrolled at California Polytechnic State University,
San Luis Obispo. Of these students, 8091 (43.7%)
were female. We will use a statistical software
package to simulate sampling from this Cal Poly
population.
We will generate 500 samples of each of the
following sample sizes: n = 10, n = 25, n = 50, n = 100
and compute the proportion of females for each
sample.
The following histograms display
the distributions of the sample
proportions for the 500 samples of
each sample size.
Are these histograms centered around the true
proportion p = .437?
What do you notice about the standard deviation of
these distributions?
What do you notice about the shape of
these distributions?
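The slides do not name the software used for the Cal Poly simulation. One way to sketch it in Python is to build a list with 8091 females among 18,516 students and draw samples without replacement. The seed, the True/False coding, and the function name simulate_p_hats are assumptions made for illustration.

```python
import random
from statistics import mean, stdev

# 18,516 students, 8,091 of them female (True = female).
population = [True] * 8091 + [False] * (18516 - 8091)

def simulate_p_hats(n, reps=500, seed=2):
    """Return the sample proportion of females for `reps` random samples of size n."""
    rng = random.Random(seed)
    return [sum(rng.sample(population, n)) / n for _ in range(reps)]

for n in (10, 25, 50, 100):
    p_hats = simulate_p_hats(n)
    # Center and spread of the 500 simulated sample proportions for this n.
    print(n, round(mean(p_hats), 3), round(stdev(p_hats), 3))
```

The centers should sit near .437 for every sample size, while the spread shrinks as n increases.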
The development of viral hepatitis after a blood
transfusion can cause serious complications for
a patient. The article “Lack of Awareness
Results in Poor Autologous Blood Transfusions” (Health
Care Management, May 15, 2003) reported that
hepatitis occurs in 7% of patients who receive blood
transfusions during heart surgery. We will simulate
sampling from a population of blood recipients.
We will generate 500 samples of each of the following
sample sizes: n = 10, n = 25, n = 50, n = 100 and
compute the proportion of people who contract
hepatitis for each sample.
The following histograms display the distributions of
the sample proportions for the 500 samples of each
sample size.
Are these histograms centered around
the true proportion p = .07?
What happens to the shape of these
histograms as the sample size increases?
General Properties Continued . . .
Rule 3: When n is large and p is not too
near 0 or 1, the sampling distribution of p̂
is approximately normal.
The farther the value of p is from 0.5, the larger n must
be for the sampling distribution of p̂ to be approximately
normal.
A conservative rule of thumb:
If np > 10 and n(1 – p) > 10, then a normal
distribution provides a reasonable
approximation to the sampling distribution of p̂.
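The rule of thumb is easy to apply mechanically. The sketch below checks it for the two proportions used in the earlier simulations (.437 and .07). The function name normal_approx_ok is hypothetical.

```python
def normal_approx_ok(n, p, threshold=10):
    """Rule of thumb from the slide: both np and n(1 - p) must exceed 10."""
    return n * p > threshold and n * (1 - p) > threshold

for p in (0.437, 0.07):
    for n in (10, 25, 50, 100, 200):
        # Print the two quantities being compared with 10, then the verdict.
        print(p, n, round(n * p, 2), round(n * (1 - p), 2), normal_approx_ok(n, p))
```

With p = .07 the condition fails even at n = 100, since np = 7, which is consistent with p far from 0.5 requiring a larger n.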
Blood Transfusions Revisited . . .
Let p = proportion of patients who contract
hepatitis after a blood transfusion
p = .07
Suppose a new blood screening procedure is
believed to reduce the incidence rate of hepatitis.
Blood screened using this procedure is given to
n = 200 blood recipients. Only 6 of the 200
patients contract hepatitis. Does this result
indicate that the true proportion of patients who
contract hepatitis when the new screening is
used is less than 7%?
To answer this question, we must consider the
sampling distribution of p̂.
Blood Transfusions Revisited . . .
Let p = .07
p̂ = 6/200 = .03
Is the sampling distribution approximately
normal?
np = 200(.07) = 14 > 10
n(1-p) = 200(.93) = 186 > 10
Yes, we can use a normal approximation.
What is the mean and standard deviation of the
sampling distribution?
μ_p̂ = .07
σ_p̂ = √(.07(.93)/200) = .018
Blood Transfusions Revisited . . .
Let p = .07
p̂ = 6/200 = .03
μ_p̂ = .07
σ_p̂ = √(.07(.93)/200) = .018
Does this result indicate that the true
proportion of patients who contract hepatitis
when the new screening is used is less than 7%?
Assume the screening procedure is not effective and p = .07.
z = (.03 - .07)/√(.07(.93)/200) = -2.22
P(p̂ < .03) = .0132
This small probability tells us that it is
unlikely that a sample proportion of .03 or
smaller would be observed if the screening
procedure was ineffective.
This new screening procedure appears to yield a
smaller incidence rate for hepatitis.
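The final calculation can be checked with statistics.NormalDist. This is a minimal sketch with illustrative variable names. It assumes, as the slide does, that p = .07 when computing the standard deviation.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.07, 200          # assumed true proportion and sample size
p_hat = 6 / n             # observed sample proportion

std_error = sqrt(p * (1 - p) / n)        # sd of the sample proportion when p = .07
z = (p_hat - p) / std_error              # standardized sample proportion
lower_tail = NormalDist().cdf(z)         # P(sample proportion < .03) if p = .07
print(round(std_error, 3), round(z, 2), round(lower_tail, 4))
```

Software uses the unrounded z of about -2.217, so the probability prints as roughly .0133 instead of the table value .0132 for z = -2.22. The conclusion is the same.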