Download Chapter 8 - Algebra I PAP

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

Taylor's law wikipedia , lookup

Resampling (statistics) wikipedia , lookup

Sampling (statistics) wikipedia , lookup

Gibbs sampling wikipedia , lookup

Student's t-test wikipedia , lookup

Transcript
Chapter 8
Sampling Variability
&
Sampling Distributions
Section 8.1
Statistics and Sampling
Variability
Suppose we are interested in finding the true
mean (m) fat content of quarter-pound hamburgers
marketed by a national fast food chain. To learn
something about m, we could obtain a sample of
To answer and
these
questions,
n = 50 hamburgers
determine
thewe
fatwill
content
examine the sampling distribution, which
of each one.
describes the long-run behavior of sample
statistics.
Recall that the sample
mean is a statistic
Statistic
The campus of Wolf City College
has a fish pond. Suppose there
are 20 fish in the pond. The
lengths of the fish (in inches) are
given below:
4.5
5.4 10.3 7.9
6.3
4.3
8.5
6.6 11.7 8.9
2.2
9.8
9.6
8.7 13.3 4.6 10.7 13.4 7.7 5.6
This is aThe
statistic!
true mean m = 8.
We caught fish with lengths 6.3
inches,
Notice
that some
Let’sare
catch two
2.2 inches,
and
13.3
inches.
sample
means
Suppose we randomly catch a sample
ofan
This
is
more
samples
closer
and
some
farther
x = 7.273inches
example
of
fish from this pond and measure
theirat the
and
look
away;
some above and
2nd sample
8.5,
4.6,
and
5.6
inches.
sampling
length. What would
thebelow
mean
of
some
thelength
mean.
sample
means.
x = 6.23 inches
the sample be?
rd
3 sample – 10.3, 8.9, and 13.4 inches.
x = 10.87 inches
variability
The distribution that would be formed by
considering the value of a sample statistic
for every possible different sample of a
given size from a population.
In this case, the sample
statistic is the sample
mean x.
Fish Pond Continued . . .
4.5
5.4 10.3 7.9
6.3
4.3
9.6
8.5
6.6 11.7 8.9
2.2
9.8
8.7 13.3 4.6 10.7 13.4 7.7
5.6
Section 8.2
The Sampling Distribution of a
Sample Mean
Fish Pond Revisited . . .
Suppose there are only 5 fish
in the pond. The lengths of the
fish (in inches) are given below:
6.6 11.7 8.9
2.2
What is the mean
mand
7.84
x =standard
deviation of this
sxpopulation?
= 3.262
9.8
We will keep the
population size
small so that we
can find ALL the
possible samples.
Fish Pond Revisited . . .
6.6 11.7 8.9
2.2
9.8
mx = 7.84 and sx = 3.262
Pairs
6.6 &
11.7
6.6 &
8.9
6.6 &
2.2
6.6 &
9.8
x
9.15
7.75
4.4
8.2
mx = 7.84
sx = 1.998
11.7 &
8.9
11.7 &
2.2
11.7 &
9.8
8.9 &
2.2
8.9 &
9.8
2.2 &
9.8
find
all
the
10.3 Let’s
6.95
10.75
5.55
9.35
6
How many samples
samplesofofsize
size22.
are
possible?
How
doisthese
What
the mean
valuesand
compare
to
standard
the
population
deviation
of these
meansample
and standard
means?
deviation?
Fish Pond Revisited . . .
6.6 11.7 8.9
2.2
9.8
mx = 7.84 and sx = 3.262
Triples
x
6.6,
11.7,
8.9
6.6,
11.7,
2.2
6.6,
11.7,
9.8
6.6,
8.9, 2.2
9.067
6.833
9.367
5.9
mx = 7.84
sx = 1.332
6.6, let’s
11.7,
Now
find 11.7,
all the11.7,
2.2, 9.8 many
8.9, 2.2 samples
8.9, 9.8 2.2, 9.8
How
samples
of size
3.
of size
310.133
are 7.9
8.433
6.2
7.6
possible?
6.6,
8.9, 9.8
What is the mean
How do
values
andthese
standard
compare
deviation to
of the
these
population
and
samplemean
means?
standard deviation?
8.9,
2.2,
9.8
6.967
What do you notice?
• The mean of the sampling distribution
EQUALS the mean of the population.
mx = m
• As the sample size increases, the
standard deviation of the sampling
distribution decreases.
as n
sx
General Properties of Sampling
Distributions of x
Rule 1:
Rule 2:
mx  m
sx 
s
n
Note that in the previous
fish pond examples this
standard deviation
formula was not correct
because the sample
sizes were more than
10% of the population.
This rule is exact if the population is infinite, and is
approximately correct if the population is finite and no
more than 10% of the population is included in the
sample
The paper “Mean Platelet Volume in Patients with Metabolic
Syndrome and Its Relationship with Coronary Artery
Disease” (Thrombosis Research, 2007) includes data that
suggests
that
the
distribution
of platelet
volume
of patients
We can
use
Minitab
to generate
random
samples
from
whothis
dopopulation.
not have metabolic
syndrome500
is approximately
We will generate
random samples
normalofwith
8.25
and s =the
0.75.
n = m5 =and
compute
sample mean for each.
Platelets Continued . . .
Similarly, we will generate 500 random samples of n = 10,
n = 20, and n = 30. The density histograms below display
the resulting 500 x for each of the given sample sizes.
What do
doyou
you notice
notice
What
do
you
notice
What
aboutthe
themeans
standard
about
the
shape
of
about
deviation
of these
these
histograms?
these
histograms?
histograms?
General Properties Continued . . .
Rule 3:
When the population distribution is normal,
the sampling distribution of x is also normal
for any sample size n.
The paper “Is the Overtime Period in an NHL Game Long
Enough?” (American Statistician, 2008) gave data on the
time (in minutes) from the start of the game to the first goal
scored for the 281 regular season games from the 20052006 season that went into overtime. The density
histogram for the data is shown below.
Let’s consider these 281 values as a
population. The distribution is
strongly positively skewed with
mean m = 13 minutes and with a
median of 10Using
minutes.
Minitab, we will
generate 500 samples of the
following sample sizes from
this distribution:
n = 5, n = 10, n = 20, n = 30.
What
do
you
notice
These
Are
these
are
histograms
What
dothe
youdensity
notice
about
the
standard
histograms
centered
the
at 500
about
thefor
shape
of
deviations
of
these
approximately
samples
m = 13?
these
histograms?
histograms?
General Properties Continued . . .
How large is “sufficiently large”
CLT can safely be applied
if n exceeds 30.
anyway?
A soft-drink bottler claims that, on average, cans
contain 12 oz of soda. Let x denote the actual
volume of soda in a randomly selected can.
Suppose that x is normally distributed with
s = .16 oz. Sixteen cans are randomly selected,
and the soda volume is determined for each one.
Let x = the resulting sample mean soda.
If the bottler’s claim is correct, then the sampling
distribution of x is normally distributed with:
m x  m  12
s
.16
sx 

 .04
n
16
Soda Problem Continued . . .
m x  m  12
s
.16
sx 

 .04
n
16
To standardized these
endpoints, use
in
mthe
xmean
 m soda
What is the probability
that x
the
sample
Look these
up
table
x
z 

volume is between 11.96
andounces
subtract
the12.08
s
sx and
ounces?
n
probabilities.
P(11.96 < x < 12.08) = .9772 - .1587 = .8185
11.96  12
a* 
 1
.04
12.08  12
b* 
2
.04
A hot dog manufacturer asserts that one of its brands
of hot dogs has an average fat content of 18 grams
per hot dog with standard deviation of 1 gram.
Consumers of this brand would probably not be
disturbed if the mean was less than 18 grams, but
would be unhappy if it exceeded 18 grams. An
independent testing organization is asked to analyze
a random sample of 36 hot dogs. Suppose the
resulting sample mean is 18.4 grams. Does this
Since the sample size is
result suggest that the manufacturer’s claim is
greater than 30, the
incorrect?
Central Limit Theorem
applies.
So the distribution of x is
approximately normal with
mx  18 and sx 
1
 .1667
36
Hot Dogs Continued . . .
mx  18 and sx 
1
 .1667
36
Suppose the resulting sample mean is 18.4 grams.
Does this result suggest that the manufacturer’s claim
is incorrect?
P(x > 18.4) = 1 - .9918 = .0082
Values of x at least as large as 18.4 would be
observed
18
.4  18 only about .82% of the time.
z 
 2.40
The sample
.1667 mean of 18.4 is large enough to
cause us to doubt that the manufacturer’s claim is
correct.
Section 8.3
The Sampling Distribution of a
Sample Proportion
Let’s explore what happens within distributions of
sample proportions (p). Have students perform the
following experiment.
This is a statistic!
•Toss a penny 20 times and record the number of
heads.
The
is a partial
graph
ofmark
the
•Calculate
the
proportion
oftoheads
and
Whatdotplot
would
happen
the
dotplot
if weit on
sampling
distribution
of all
the dot
plot on
board.
flipped
thethe
penny
50 times
andsample
recorded
proportions
of sample
size 20.
the proportion
of heads?
What shape do you think the dot plot will have?
Sampling Distribution of p
The distribution that would be formed by
considering the value of a sample statistic for
We will use:
every possible different sample of a given size
p for the population proportion
from a population.
and
p for the sample proportion
In this case, we will use
number of successes in the sample
pˆ 
n
Suppose we have a population of six students:
Alice, Ben, Charles, Denise, Edward, & Frank
will keep
so
We areWe
interested
in the
the population
proportion small
of females.
that we the
can parameter
find ALL theof
possible
interest
This is called
samples of a given size.
What is the proportion of females?
1/3
Let’s select samples of two from this population.
How many different samples are possible?
6C2
=15
Find the 15 different samples that are possible and
find the sample proportion of the number of
females in each sample.
Ben & Frank
Alice & Ben
.5
Charles & Denise
Alice & Charles
.5
Charles & Edward
Alice & Denise
1
Charles & Frank
Alice & Edward
.5
Denise of
& Edward
the mean
the
Alice & Frank How does
.5
Denise & Frank
Ben & Charles
0
sampling
distribution
& Frank
Ben & Denise compare
.5 to theEdward
population
Ben & Edward
0
parameter
(p)?
0
.5
0
0
.5
.5
0
Find the mean and standard deviation of these
sample proportions.
1
m pˆ 
and s pˆ  0.29814
3
General Properties for Sampling
Distributions of p
Rule 1:
m pˆ  p
Rule 2: s pˆ 
p (1  p )
n
Note that in the previous
student example this
standard deviation
formula was not correct
because the sample
size was more than 10%
of the population.
This rule is exact if the population is infinite, and is
approximately correct if the population is finite and
no more than 10% of the population is included in
the sample
In the fall of 2008, there were 18,516 students enrolled
at California Polytechnic State University, San Luis
Obispo. Of these students, 8091 (43.7%) were
female. We will use a statistical software package to
simulate sampling from this Cal Poly population.
We will generate 500 samples of each of the following
sample sizes: n = 10, n = 25, n = 50, n = 100 and
compute the proportion of females for each sample.
The following histograms display the
distributions of the sample proportions
for the 500 samples of each sample
size.
What
do you
notice about
about
What
do
you
notice
Are these
histograms
thethe
standard
deviation
shape
of the
these
centered
around
trueof
these
distributions?
distributions?
proportion
p = .437?
The development of viral hepatitis after a blood
transfusion can cause serious complications for
a patient. The article “Lack of Awareness Results
in Poor Autologous Blood Transfusions” (Health Care
Management, May 15, 2003) reported that hepatitis
occurs in 7% of patients who receive blood transfusions
during heart surgery. We will simulate sampling from a
population of blood recipients.
We will generate 500 samples of each of the following
sample sizes: n = 10, n = 25, n = 50, n = 100 and
compute the proportion of people who contract hepatitis
for each sample.
The following histograms display the distributions of the
sample proportions for the 500 samples of each sample
size.
Are these
histograms
centered
around the
true
proportion
p = .07?
What
happens to
the shape
of these
histograms
as the
sample size
increases?
General Properties Continued . . .
Rule 3: When n is large and p is not too near
0 or 1, the sampling distribution of p is
approximately normal.
The farther the value of p is from 0.5, the larger n must be for
the sampling distribution of p to be approximately normal.
A conservative rule of thumb:
If np > 10 and n (1 – p) > 10, then a normal
distribution provides a reasonable approximation
to the sampling distribution of p.
Blood Transfusions Revisited . . .
Let p = proportion of patients who contract
hepatitis after a blood transfusion
p = .07
Suppose a new bloodToscreening
procedure
is we
answer this
question,
believed to reduce themust
incident
rate of
consider
thehepatitis.
sampling
distributionisofgiven
p. to
Blood screened using this procedure
n = 200 blood recipients. Only 6 of the 200 patients
contract hepatitis. Does this result indicate that the
true proportion of patients who contract hepatitis
when the new screening is used is less than 7%?
Blood Transfusions Revisited . . .
Let p = .07
p = 6/200 = .03
Is the sampling distribution approximately normal?
np = 200(.07) = 14 > 10
Yes, we can
n(1-p) = 200(.93) = 186 > 10
use a normal
approximation.
What is the mean and standard deviation of the
sampling distribution?
m pˆ  .07
.07(.93)
s pˆ 
 .018
200
Blood Transfusions Revisited . . .
m pˆ  .07
Let p = .07
p = 6/200 = .03
.07(.93)
s pˆ 
 .018
200
Does this result indicate that the true proportion of
patients who contract hepatitis when the new
screening is used is less than 7%?
This small probability
tells us
thatscreening
it is unlikely
This
new
P(p < .03) = .0132
that a sample
procedure
proportion appears
of .03 or
yield awould
smaller
.03  .07
Assume thetoscreening
smaller
be
z 
 2.22
incidence
rate
observed
if thefor
procedure is
not
effective
.07(.93)
screening
procedure
hepatitis.
and
p
=
.07.
200
was ineffective.
Defective Example
If the true proportion of defectives produced by a
certain manufacturing process is 0.08 and a
sample of 400 is chosen, what is the probability
that the proportion of defectives in the sample is
greater than 0.10?
Would a normal distribution be reasonable?
Since np  400(0.08)  32 > 10 and
n(1- p) = 400(0.92) = 368 > 10,
it’s reasonable to use the normal approximation.
Defective Example--Continued
mp    0.08
(1  )
0.08(1  0.08)
sp 

 0.013565
n
400
p  mp 0.10  0.08
z

 1.47
sp
0.013565
P(p > 0.1)  P(z > 1.47)
 1  0.9292  0.0708
© 2008 Brooks/Cole, a division of Thomson
Learning, Inc.
Sales Example
Suppose 3% of the people contacted by phone are
receptive to a certain sales pitch and buy your product.
If your sales staff contacts 2000 people, what is the
probability that more than 100 of the people contacted
will purchase your product?
Clearly p = 0.03 and p = 100/2000 = 0.05 so



0.05  0.03 
P(p > 0.05)  P  z >

(0.03)(0.97) 


2000


0.05  0.03 

 P z >
  P(z > 5.24)  0
0.0038145 

Sales Example - continued
If your sales staff contacts 2000 people, what is the
probability that less than 50 of the people contacted
will purchase your product?
Now p = 0.03 and p = 50/2000 = 0.025 so



0.025  0.03 
P(p  0.025)  P  z 

(0.03)(0.97) 


2000


0.025  0.03 

 P z 
  P(z  1.31)  0.0951
0.0038145 
