Chapter 18
Sampling Distributions – Sample Means & Sample Proportions
Parameter
• A number that describes the population
• Symbols we will use for parameters include
$\mu$ – mean
$\sigma$ – standard deviation
$\pi$ or $p$ – proportion
$\alpha$ – y-intercept of LSRL
$\beta$ – slope of LSRL
Statistic
• A number that can be computed from sample data without making use of any unknown parameter
• Symbols we will use for statistics include
$\bar{x}$ – mean
$s$ – standard deviation
$\hat{p}$ – proportion
$a$ – y-intercept of LSRL
$b$ – slope of LSRL
The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.
Consider the population of 5 fish in my pond – the length of fish (in inches):
2, 7, 10, 11, 14
What are the mean $\mu$ and standard deviation $\sigma$ of this population?
$\mu = 8.8$
$\sigma = 4.0694$
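Both answers can be double-checked with Python's standard `statistics` module (a minimal sketch; Python is our choice for illustration, not part of the original activity). Because the five fish are the whole population, `pstdev`, which divides by N, is the right function here rather than `stdev`, which divides by N − 1.

```python
import statistics

# Lengths (inches) of all 5 fish in the pond: this is the entire population
population = [2, 7, 10, 11, 14]

mu = statistics.mean(population)       # population mean
sigma = statistics.pstdev(population)  # population SD (divides by N, not N - 1)

print(mu, round(sigma, 4))
```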
Let’s take samples of size 2 (n = 2) from this population:
How many samples of size 2 are possible?
${}_5C_2 = 10$
Find all 10 of these samples and record the sample means.
What are the mean and standard deviation of the sample means?
$\mu_{\bar{x}} = 8.8$
$\sigma_{\bar{x}} \approx 2.4920$
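Listing the samples by hand is the point of the activity, but a short sketch can confirm the results afterward. It assumes sampling without replacement, which is what the count ${}_5C_2 = 10$ implies, and uses `itertools.combinations` to produce every sample of size 2.

```python
import itertools
import statistics

population = [2, 7, 10, 11, 14]   # the 5 fish lengths (inches)

# Every possible sample of size n = 2, drawn without replacement: 5C2 = 10 samples
samples = list(itertools.combinations(population, 2))
sample_means = [statistics.fmean(s) for s in samples]

center = statistics.fmean(sample_means)   # mean of the sample means
spread = statistics.pstdev(sample_means)  # SD over all 10 sample means (divides by 10)

for s, m in zip(samples, sample_means):
    print(s, m)
print(len(samples), round(center, 4), round(spread, 4))
```

`pstdev` is used because the 10 sample means are the entire sampling distribution, not a sample from it.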
Repeat this procedure with sample size n = 3.
How many samples of size 3 are possible?
${}_5C_3 = 10$
Find all 10 of these samples and record the sample means.
What are the mean and standard deviation of the sample means?
$\mu_{\bar{x}} = 8.8$
$\sigma_{\bar{x}} \approx 1.66132$
What do you notice?
• The mean of the sampling distribution EQUALS the mean of the population: $\mu_{\bar{x}} = \mu$.
• As the sample size increases, the standard deviation of the sampling distribution decreases: as $n \uparrow$, $\sigma_{\bar{x}} \downarrow$.
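Both patterns can be checked by brute force over every sample size from 1 to 5. This is a sketch that assumes sampling without replacement from the 5-fish population; `sampling_distribution` is a hypothetical helper name introduced here.

```python
import itertools
import statistics

population = [2, 7, 10, 11, 14]

def sampling_distribution(pop, n):
    """Hypothetical helper: mean and SD of x-bar over every sample of size n."""
    means = [statistics.fmean(s) for s in itertools.combinations(pop, n)]
    return statistics.fmean(means), statistics.pstdev(means)

results = {n: sampling_distribution(population, n) for n in range(1, 6)}
for n, (center, spread) in results.items():
    print(n, round(center, 4), round(spread, 4))
```

The center stays at 8.8 for every n while the spread shrinks, reaching 0 at n = 5 because the only possible "sample" is then the whole population.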
A statistic used to estimate a parameter is unbiased if the mean of its sampling distribution is equal to the true value of the parameter being estimated.
Remember the Jelly Blubbers? The judgmental samples were centered too high & were biased, while the randomly selected samples were centered over the true mean.
General Properties
Rule 1: $\mu_{\bar{x}} = \mu$
Rule 2: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
This rule is approximately correct as long as no more than 10% of the population is included in the sample.
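The fish pond shows why Rule 2 carries the 10% caveat. Each sample of 2 fish contains 40% of the population, so $\sigma/\sqrt{n}$ overstates the exact spread found by listing every sample. The sketch below compares the two.

```python
import itertools
import math
import statistics

population = [2, 7, 10, 11, 14]
n = 2

sigma = statistics.pstdev(population)

# Exact SD of x-bar, found by listing every sample of size n (without replacement)
means = [statistics.fmean(s) for s in itertools.combinations(population, n)]
exact_sd = statistics.pstdev(means)

rule2_sd = sigma / math.sqrt(n)        # Rule 2: sigma / sqrt(n)
share_sampled = n / len(population)    # fraction of the population in each sample

print(round(exact_sd, 4), round(rule2_sd, 4), share_sampled)
```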
General Properties
Rule 3:
When the population distribution is normal, the sampling distribution of $\bar{x}$ is also normal for any sample size n.
Activity – drawing samples
General Properties
Rule 4: Central Limit Theorem
When n is sufficiently large, the sampling distribution of $\bar{x}$ is well approximated by a normal curve, even when the population distribution is not itself normal.
How large is “sufficiently large” anyway?
As a rule of thumb, the CLT can safely be applied if n exceeds 30; strongly skewed populations may need larger samples.
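A small simulation can make the CLT concrete. This sketch assumes a hypothetical right-skewed population (exponential with mean 1 and SD 1), draws 2,000 samples of size n = 40, and checks whether the sample means behave like a Normal model with mean $\mu$ and SD $\sigma/\sqrt{n}$.

```python
import math
import random
import statistics

rng = random.Random(18)  # fixed seed so the run is reproducible

# Hypothetical skewed population: exponential with mean 1 and SD 1
def draw_sample_mean(n):
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))

n, reps = 40, 2000
means = [draw_sample_mean(n) for _ in range(reps)]

center = statistics.fmean(means)
spread = statistics.stdev(means)
predicted_sd = 1.0 / math.sqrt(n)   # sigma / sqrt(n) with sigma = 1

# Share of sample means within 1.96 predicted SDs of the population mean;
# near 0.95 if the Normal approximation is good
within = sum(abs(m - 1.0) <= 1.96 * predicted_sd for m in means) / reps
print(round(center, 3), round(spread, 3), round(predicted_sd, 3), within)
```

Rerunning with a smaller n (say 5) typically leaves a visibly right-skewed set of sample means, which is why skewed populations need larger samples.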
EX) The army reports that the distribution of
head circumference among soldiers is
approximately normal with mean 22.8 inches and
standard deviation of 1.1 inches.
a) What is the probability that a randomly
selected soldier’s head will have a circumference
that is greater than 23.5 inches?
P(X > 23.5) = .2623
b) What is the probability that a random sample of five soldiers will have an average head circumference that is greater than 23.5 inches?
Do you expect the probability to be more or less than the answer to part (a)? Explain.
What normal curve are you now working with?
$P(\bar{x} > 23.5) = .0774$
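Parts (a) and (b) differ only in which normal curve is used. A sketch with Python's `statistics.NormalDist` (Python 3.8+) computes both side by side; the curve for $\bar{x}$ keeps the same center but has SD $1.1/\sqrt{5}$.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 22.8, 1.1, 5   # head circumference (inches); sample size for part (b)

one_soldier = NormalDist(mu, sigma)
mean_of_five = NormalDist(mu, sigma / sqrt(n))   # Rules 1 and 2; normal by Rule 3

p_single = 1 - one_soldier.cdf(23.5)   # part (a): P(X > 23.5)
p_mean = 1 - mean_of_five.cdf(23.5)    # part (b): P(x-bar > 23.5)

print(round(p_single, 4), round(p_mean, 4))
```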
Suppose a team of biologists has been
studying the Pinedale children’s fishing
pond. Let x represent the length of a single
trout taken at random from the pond. This
group of biologists has determined that the
length has a normal distribution with mean of
10.2 inches and standard deviation of 1.4
inches. What is the probability that a single
trout taken at random from the pond is
between 8 and 12 inches long?
P(8 < X < 12) = .8427
What is the probability that the mean length of five trout taken at random is between 8 and 12 inches long?
Do you expect the probability to be more or less than the answer to part (a)? Explain.
$P(8 < \bar{x} < 12) = .9978$
What sample mean would be at the 95th percentile? (Assume n = 5)
$\bar{x} \approx 11.23$ inches
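The three trout questions can be sketched with `statistics.NormalDist`. Both curves are exactly normal because the population is normal (Rule 3), so no large-sample argument is needed even with n = 5.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 10.2, 1.4, 5   # trout length (inches) from the biologists' model

one_trout = NormalDist(mu, sigma)
mean_of_five = NormalDist(mu, sigma / sqrt(n))   # Rules 1 and 2

p_single = one_trout.cdf(12) - one_trout.cdf(8)        # P(8 < X < 12)
p_mean = mean_of_five.cdf(12) - mean_of_five.cdf(8)    # P(8 < x-bar < 12)
pct95 = mean_of_five.inv_cdf(0.95)                     # 95th percentile of x-bar

print(round(p_single, 4), round(p_mean, 4), round(pct95, 2))
```

`inv_cdf(0.95)` corresponds to z ≈ 1.645; a z of 1.96 would instead mark the 97.5th percentile.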
A soft-drink bottler claims that, on average,
cans contain 12 oz of soda. Let x denote the
actual volume of soda in a randomly selected
can. Suppose that x is normally distributed
with $\sigma = .16$ oz. Sixteen cans are randomly
selected and a mean of 12.1 oz is calculated.
What is the probability that the mean of 16
cans will exceed 12.1 oz?
$P(\bar{x} > 12.1) = .0062$
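A short sketch of the calculation: the SD of $\bar{x}$ is $0.16/\sqrt{16} = 0.04$, so 12.1 oz sits 2.5 standard deviations above the claimed mean of 12 oz.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 12.0, 0.16, 16                   # claimed mean (oz), SD (oz), cans sampled
mean_of_16 = NormalDist(mu, sigma / sqrt(n))    # SD of x-bar = 0.16 / 4 = 0.04
p = 1 - mean_of_16.cdf(12.1)                    # P(x-bar > 12.1)
print(round(p, 4))  # 0.0062
```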
A hot dog manufacturer asserts that one of its
brands of hot dogs has an average fat content of
18 grams per hot dog with standard deviation of
1 gram. Consumers of this brand would
probably not be disturbed if the mean was less
than 18 grams, but would be unhappy if it
exceeded 18 grams. An independent testing
organization is asked to analyze a random
sample of 36 hot dogs. Suppose the resulting
sample mean is 18.4 grams. What is the
probability that the sample mean is greater
than 18.4 grams?
$P(\bar{x} > 18.4) = .0082$
Does this result indicate that the
manufacturer’s claim is incorrect?
Yes. If the claim were true, a sample mean this large would be unlikely to happen by chance alone.
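The slide does not say that fat content is normally distributed, but with n = 36 the CLT (Rule 4) justifies a Normal model for $\bar{x}$. A minimal sketch under that assumption:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 18.0, 1.0, 36                    # claimed mean (g), SD (g), hot dogs tested
mean_of_36 = NormalDist(mu, sigma / sqrt(n))    # SD of x-bar = 1/6; Normal by the CLT
z = (18.4 - mu) / mean_of_36.stdev              # how many SDs above the claim
p = 1 - mean_of_36.cdf(18.4)                    # P(x-bar > 18.4)
print(round(z, 2), round(p, 4))   # 2.4 0.0082
```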
Modeling the Distribution of
Sample Proportions
• Rather than showing real repeated samples,
imagine what would happen if we were to
actually draw many samples.
• Now imagine what would happen if we
looked at the sample proportions for these
samples. What would the histogram of all
the sample proportions look like?
Modeling the Distribution of
Sample Proportions (cont.)
• We would expect the histogram of the
sample proportions to center at the true
proportion, p, in the population.
• As far as the shape of the histogram goes,
we can simulate a bunch of random samples
that we didn’t really draw.
Modeling the Distribution of
Sample Proportions (cont.)
• It turns out that the histogram is unimodal,
symmetric, and centered at p.
• More specifically, it’s an amazing and
fortunate fact that a Normal model is just
the right one for the histogram of sample
proportions.
• To use a Normal model, we need to specify
its mean and standard deviation. The mean
of this particular Normal is at p.
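The "imagine many samples" idea can be simulated directly. This sketch assumes a hypothetical true proportion p = 0.35 and samples of n = 200, draws 2,000 samples, and prints the center of the sample proportions plus a crude text histogram (one `#` per 20 samples).

```python
import random
from collections import Counter

rng = random.Random(2024)   # fixed seed for reproducibility

p, n, reps = 0.35, 200, 2000   # hypothetical true proportion, sample size, number of samples

# Number of "successes" in each simulated sample
counts = [sum(rng.random() < p for _ in range(n)) for _ in range(reps)]
p_hats = [c / n for c in counts]

center = sum(p_hats) / reps
print(round(center, 3))

# Bins of width 0.02 in p-hat, i.e. 4 successes out of 200
bins = Counter(c // 4 for c in counts)
for b in sorted(bins):
    print(f"{b * 4 / n:.2f} {'#' * (bins[b] // 20)}")
```

The printed histogram should look roughly unimodal and symmetric around 0.35.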
The Sampling Distribution Model for a Proportion (cont.)
• Provided that the sampled values are independent and the sample size is large enough, the sampling distribution of $\hat{p}$ is modeled by a Normal model with
– Mean: $\mu(\hat{p}) = p$
– Standard deviation: $SD(\hat{p}) = \sqrt{\dfrac{p(1-p)}{n}}$
Assumptions and Conditions
• Most models are useful only when specific assumptions are true.
• There are two assumptions in the case of the model for the distribution of sample proportions:
1. The sampled values must be independent of each other.
2. The sample size, n, must be large enough.
Assumptions and Conditions (cont.)
10% condition: If sampling is done without replacement, then the sample size, n, must be no larger than 10% of the population (the population is at least 10 times as large as the sample).
Success/failure condition: The sample size has to be big enough so that both $n\hat{p}$ and $n(1-\hat{p})$ are greater than or equal to 10.
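The two checkable conditions can be sketched as a function. `normal_model_conditions` is a hypothetical helper. Independence itself cannot be verified from the numbers; it has to be judged from how the data were gathered.

```python
def normal_model_conditions(n, p_hat, population_size, with_replacement=False):
    """Hypothetical helper: return (ten_percent_ok, success_failure_ok)."""
    # 10% condition: only matters when sampling without replacement
    ten_percent_ok = with_replacement or n <= 0.10 * population_size
    # Success/failure condition: n*p_hat and n*(1 - p_hat) both at least 10
    success_failure_ok = n * p_hat >= 10 and n * (1 - p_hat) >= 10
    return ten_percent_ok, success_failure_ok

print(normal_model_conditions(1500, 0.35, 1_700_000))  # (True, True)
print(normal_model_conditions(50, 0.10, 1_700_000))    # (True, False): only 5 expected successes
```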
EXAMPLE
You ask an SRS of 1500 first-year college students whether they applied to any other college. There are over 1.7 million first-year college students. 35% of all first-year students applied to other colleges. What is the probability that your sample will give a result within 2 percentage points of this true value?
$P(.33 \le \hat{p} \le .37)$
EXAMPLE CONTINUED
Step 1: Calculate the mean and standard
deviation
Step 2: Standardize the scores
EXAMPLE CONTINUED
Step 1: Calculate the mean and standard deviation
$\mu_{\hat{p}} = p = .35$
$\sigma_{\hat{p}} = \sqrt{\dfrac{.35(1-.35)}{1500}} \approx .0123$
Step 2: Standardize the scores
$z = \dfrac{.33 - .35}{.0123} \approx -1.626$
$z = \dfrac{.37 - .35}{.0123} \approx 1.626$
EXAMPLE CONTINUED
Step 3: Find $P(-1.626 < z < 1.626) \approx .896$
So almost 90% of all samples will give a result within 2 percentage points of the true population value.
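The three steps can be reproduced in a few lines. This sketch uses `statistics.NormalDist()` as the standard Normal curve in place of a z-table.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.35, 1500

# Step 1: mean and SD of the sampling distribution of p-hat
mean = p
sd = sqrt(p * (1 - p) / n)

# Step 2: standardize the endpoints .33 and .37
z_low = (0.33 - mean) / sd
z_high = (0.37 - mean) / sd

# Step 3: area under the standard Normal curve between the two z-scores
std_normal = NormalDist()
prob = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(round(sd, 4), round(z_low, 3), round(z_high, 3), round(prob, 4))
```

Because the SD is not rounded to .0123 before standardizing, the z-scores come out near ±1.624 rather than ±1.626. That shifts the probability by well under .001, and the conclusion of "almost 90%" is unchanged.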
What have we learned?
• We know that no sample fully and exactly describes the
population; sample proportions and means will vary
from sample to sample. That’s sampling error (or, better,
sampling variability). We know it will always be present
– indeed, the world would be a boring place if variability
didn’t exist. You might think that sampling variability
would prevent us from learning anything reliable about a
population by looking at a sample, but that’s just not so.
The fortunate fact is that sampling variability is not just
unavoidable – it’s predictable!
What have we learned? (cont.)
• We’ve learned to describe the behavior of
sample proportions when our sample is
random and large enough to expect at least 10
successes and failures.
• We’ve also learned to describe the behavior of
sample means (thanks to the CLT!) when our
sample is random (and larger if our data come
from a population that’s not roughly unimodal
and symmetric).
What Can Go Wrong? (cont.)
• Beware of observations that are not
independent.
– The CLT depends crucially on the assumption of
independence.
– You can’t check this with your data—you have
to think about how the data were gathered.
• Watch out for small samples from skewed
populations.
– The more skewed the distribution, the larger the
sample size we need for the CLT to work.
ASSIGNMENT
p. 428 #27, 28, 33, 38 (online)
#21, 22, 23, 26 (book)