Chapter 18
Sampling Distribution Models
Copyright © 2009 Pearson Education, Inc.
Objectives:

The student will be able to:
- State and apply the conditions and uses of the
Central Limit Theorem.
- Determine the mean and standard deviation
(standard error) for a sampling distribution of
proportions or means.
- Apply the sampling distribution of a proportion
or a mean to application problems.
Sample Proportions and Sampling
Distributions



The Harris poll found that of 889 U.S. adults, 40%
said they believe in ghosts. CBS News found that
of 808 U.S. adults, 48% said they believe in ghosts.
- Why are these two sample proportions different?
- What is the true population proportion (of ALL
U.S. adults)?
We’ll denote the population proportion p and the
sample proportion p̂.
Consider all possible samples of size 808. If we
made a histogram of the number of samples having
a given p̂, what might it look like?
The Central Limit Theorem for Sample
Proportions



Rather than showing real repeated samples,
imagine what would happen if we were to actually
draw many samples and look at their proportions.
The histogram we’d get if we could see all the
proportions from all possible samples is called
the sampling distribution of the proportions.
What would the histogram of all the sample
proportions look like?
The Central Limit Theorem for Sample
Proportions (cont)



We would expect the histogram of the sample
proportions to center at the true proportion, p, in
the population.
It turns out that the histogram is unimodal,
symmetric, and centered at p.
More specifically, it’s an amazing and fortunate
fact that a Normal model is just the right one for
the histogram of sample proportions.
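
The thought experiment above can be tried out with a short simulation. This is only a sketch: the true proportion p = 0.45 is a made-up value (the slides do not say what p really is), and the function name is hypothetical. It draws many samples of size 808 and summarizes the resulting sample proportions.

```python
import random
import statistics

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples samples of size n and return each sample's proportion."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        # Each respondent says "yes" with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions

# Hypothetical true proportion; chosen only for illustration.
p_hats = simulate_sample_proportions(p=0.45, n=808, num_samples=2000)
print("center of the simulated proportions:", round(statistics.mean(p_hats), 3))
print("spread of the simulated proportions:", round(statistics.stdev(p_hats), 4))
```

A histogram of `p_hats` should look unimodal, roughly symmetric, and centered near the chosen p.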
The Central Limit Theorem for Sample
Proportions (cont)


A sampling distribution model for how a
sample proportion varies from sample to sample
allows us to quantify that variation and how likely
it is that we’d observe particular sample
proportions.
- Since the sampling distribution is approximately
Normal, we can use the full power of the
Normal model!
To use a Normal model, we need to specify its
mean and standard deviation. We’ll put µ, the
mean of the Normal, at p.
The Central Limit Theorem for Sample
Proportions (cont)

When working with proportions, the standard
deviation we will use is

$$SD(\hat{p}) = \sqrt{\frac{pq}{n}}, \quad \text{where } q = 1 - p.$$

So, the Central Limit Theorem for Sample
Proportions says: the distribution of the sample
proportions is modeled with a probability model
that is

$$N\!\left(p, \sqrt{\frac{pq}{n}}\right)$$
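
As a quick illustration of the model N(p, √(pq/n)), here is a minimal sketch that computes its mean and standard deviation. The function name and the example values (p = 0.5, n = 100 and n = 400) are chosen only for illustration.

```python
import math

def proportion_sampling_model(p, n):
    """Return (mean, sd) of the Normal model N(p, sqrt(pq/n)) for a sample proportion."""
    q = 1 - p
    return p, math.sqrt(p * q / n)

# Illustrative values, not taken from the slides.
mean_100, sd_100 = proportion_sampling_model(p=0.5, n=100)
mean_400, sd_400 = proportion_sampling_model(p=0.5, n=400)
print(mean_100, sd_100)
print(mean_400, sd_400)
```

Because n sits under a square root, quadrupling the sample size only halves the standard deviation.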


The Central Limit Theorem for Sample
Proportions (cont)

A picture of what we just discussed is as follows:
Another way of saying this… Sampling Distributions


Sample statistics are random variables
themselves:
- Sample proportion (for categorical data)
- Sample mean (for quantitative data)
They have a probability distribution, mean,
standard deviation, etc.
Assumptions and Conditions


Most models are useful only when specific
assumptions are true.
There are two assumptions in the case of the
model for the distribution of sample proportions:
1. The Independence Assumption: The sampled
values must be independent of each other.
2. The Sample Size Assumption: The sample
size, n, must be large enough.
Assumptions and Conditions (cont.)



Assumptions are hard—often impossible—to
check. That’s why we assume them.
Still, we need to check whether the assumptions
are reasonable by checking conditions that
provide information about the assumptions.
The corresponding conditions to check before
using the Normal to model the distribution of
sample proportions are the Randomization
Condition, the 10% Condition, and the
Success/Failure Condition.
Assumptions and Conditions (cont.)
1. Randomization Condition: The sample should
be a simple random sample of the population.
2. 10% Condition: If the sample is drawn without
replacement, then the sample size, n,
must be no larger than 10% of the population.
3. Success/Failure Condition: The sample size
has to be big enough so that both np and nq
(the expected numbers of successes and failures)
are at least 10.
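
The two numerical conditions can be checked with simple arithmetic. The sketch below assumes a hypothetical helper name and a hypothetical optional `population_size` argument. The Randomization Condition cannot be checked by computation, so it is left out.

```python
def check_proportion_conditions(p, n, population_size=None):
    """Return which of the checkable conditions hold for modeling a sample proportion."""
    q = 1 - p
    results = {
        # Expected successes (np) and expected failures (nq) should both be at least 10.
        "success_failure": n * p >= 10 and n * q >= 10,
    }
    if population_size is not None:
        # Only relevant when the sample is drawn without replacement.
        results["ten_percent"] = n <= 0.10 * population_size
    return results

# Illustrative values: 100 students sampled from a hypothetical campus of 20,000.
print(check_proportion_conditions(p=0.30, n=100, population_size=20_000))
# A small p with a small n fails the Success/Failure Condition.
print(check_proportion_conditions(p=0.02, n=100))
```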
The Central Limit Theorem for Sample
Proportions (cont)

Because we have a Normal model, for example,
we know that about 95% of Normally distributed values
fall within two standard deviations of the mean.
So we should not be surprised if about 95% of various
polls gave results that were near the mean but
varied above and below that by no more than two
standard deviations.
- This is what we mean by sampling error. It’s
not really an error at all, but just variability
you’d expect to see from one sample to
another.
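
To make the two-standard-deviation idea concrete, the sketch below computes the share of a Normal model within two standard deviations of its mean, and the corresponding band for sample proportions. The value p = 0.45 is hypothetical and the function name is made up for this example. The sample size 808 matches the CBS News poll mentioned earlier.

```python
import math
from statistics import NormalDist

# Share of any Normal model that lies within two standard deviations of its mean.
coverage = NormalDist().cdf(2) - NormalDist().cdf(-2)

def two_sd_band(p, n):
    """Return the interval p ± 2 standard deviations for sample proportions."""
    sd = math.sqrt(p * (1 - p) / n)
    return p - 2 * sd, p + 2 * sd

low, high = two_sd_band(p=0.45, n=808)  # p = 0.45 is a hypothetical value
print(round(coverage, 4), round(low, 3), round(high, 3))
```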
Worked examples


12) Public Health statistics indicate that 26.4% of American
adults smoke cigarettes. Describe the sampling distribution
model for the proportion of smokers among a randomly
selected group of 50 adults. What are your assumptions and
conditions?
15) Based on past experience, a bank believes that 7% of
the people who receive loans will not make payments on
time. The bank has recently approved 200 loans.
- What are the mean and standard deviation of the
proportion of clients in this group who may not make
timely payments?
- What assumptions underlie your model? Are the
conditions met?
- What is the probability that over 10% of these clients will
not make timely payments?
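
One way to work exercise 15 is sketched below. It assumes the 200 loans can be treated as a random, independent sample, so that the Normal model N(p, √(pq/n)) applies. The variable names are chosen for this example.

```python
import math
from statistics import NormalDist

p, n = 0.07, 200

# Success/Failure Condition: expected late payers and expected on-time payers.
expected_late = n * p
expected_on_time = n * (1 - p)

# Mean and standard deviation of the sample proportion.
sd = math.sqrt(p * (1 - p) / n)
model = NormalDist(mu=p, sigma=sd)

# Probability that more than 10% of the 200 clients pay late.
prob_over_10_percent = 1 - model.cdf(0.10)

print("np =", round(expected_late, 1), "nq =", round(expected_on_time, 1))
print("mean =", p, "sd =", round(sd, 4))
print("P(more than 10% late) =", round(prob_over_10_percent, 3))
```

Both expected counts are above 10, so the Success/Failure Condition is met under these assumptions.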
Practice

16) Assume that 30% of students at a university
wear contact lenses.
- We randomly pick 100 students. Let p̂
represent the proportion of students who wear
contact lenses. What’s the appropriate model
for the distribution of p̂?
  - Specify the name of the distribution, the mean, and
  the standard deviation.
  - Be sure to verify that the conditions are met.
- What’s the approximate probability that more
than one third of this sample wear contacts?
What About Quantitative Data?




Proportions summarize categorical variables.
The Normal sampling distribution model looks like
it will be very useful.
Can we do something similar with quantitative
data?
We can indeed. Even more remarkable, not only
can we use all of the same concepts, but almost
the same model.
Simulating the Sampling Distribution of a Mean


Like any statistic computed from a random
sample, a sample mean also has a sampling
distribution.
We can use simulation to get a sense as to what
the sampling distribution of the sample mean
might look like…
Means – The “Average” of One Die

Let’s start with a simulation of 10,000 tosses of a
die. A histogram of the results is:
Means – Averaging More Dice

Looking at the average
(mean) of two dice after a
simulation of 10,000
tosses:

The average (mean) of
three dice after a
simulation of 10,000
tosses looks like:
Means – Averaging Still More Dice

The average (mean) of 5
dice after a simulation of
10,000 tosses looks like:

The average (mean) of 20
dice after a simulation of
10,000 tosses looks like:
Means – What the Simulations Show


As the sample size (number of dice) gets larger,
each sample average is more likely to be closer
to the population mean.
- So, we see the shape continuing to tighten
around 3.5.
And, it probably does not shock you that the
sampling distribution of a mean becomes
approximately Normal.
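
The dice simulations can be reproduced with a few lines of code. This is a sketch using Python's standard random number generator with a fixed seed. The function name is hypothetical, and the exact numbers will differ from whatever software produced the original histograms.

```python
import random
import statistics

def simulate_dice_means(num_dice, num_tosses=10_000, seed=0):
    """Return the average of num_dice fair dice for each of num_tosses tosses."""
    rng = random.Random(seed)
    return [
        sum(rng.randint(1, 6) for _ in range(num_dice)) / num_dice
        for _ in range(num_tosses)
    ]

# Center and spread of the simulated averages for each number of dice.
summary = {}
for num_dice in (1, 2, 3, 5, 20):
    means = simulate_dice_means(num_dice)
    summary[num_dice] = (statistics.mean(means), statistics.pstdev(means))
    center, spread = summary[num_dice]
    print(num_dice, "dice: center", round(center, 2), "spread", round(spread, 2))
```

The centers stay near 3.5 while the spreads shrink as more dice are averaged.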
The Fundamental Theorem of Statistics (cont.)
The Central Limit Theorem (CLT)
The mean of a random sample has a sampling
distribution whose shape can be approximated by
a Normal model. The larger the sample, the
better the approximation will be.
The Central Limit Theorem: The
Fundamental Theorem of Statistics (cont.)


The CLT is surprising and a bit weird:
- Not only does the histogram of the sample
means get closer and closer to the Normal
model as the sample size grows, but this is
true regardless of the shape of the population
distribution.
- For example, the result of rolling a single die is
Uniformly distributed (not Normal!), but the
sampling distribution of the mean of many dice
is still approximately Normal.
The CLT works better (and faster) the closer the
population model is to a Normal itself. It also
works better for larger samples.
Assumptions and Conditions

The CLT requires essentially the same
assumptions we saw for modeling proportions:
- Independence Assumption: The sampled
values must be independent of each other.
- Sample Size Assumption: The sample size
must be sufficiently large.
Assumptions and Conditions (cont.)

We can’t check these directly, but we can think about
whether the Independence Assumption is plausible.
We can also check some related conditions:

Randomization Condition: The data values must
be sampled randomly.

10% Condition: When the sample is drawn without
replacement, the sample size, n, should be no
more than 10% of the population.

Large Enough Sample Condition: The CLT doesn’t
tell us how large a sample we need. For now, you
need to think about your sample size in the context
of what you know about the population.
But Which Normal?



The CLT says that the sampling distribution of
any mean or proportion is approximately Normal
when the sample is large enough.
But which Normal model?
- For proportions, the sampling distribution is
centered at the population proportion.
- For means, it’s centered at the population
mean.
But what about the standard deviations?
But Which Normal? (cont.)

The Normal model for the sampling distribution of
the mean has a standard deviation equal to

$$SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$$

where σ is the population standard deviation.

So the sampling distribution of means can be
modeled by N( μ, σ/√n )
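
For readers who want to see where σ/√n comes from, here is a short derivation. It assumes the n sampled values y_1, …, y_n are independent and each has variance σ², which is what the Independence Assumption provides.

```latex
\begin{aligned}
\operatorname{Var}(\bar{y})
  &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right)
   = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(y_i)
   = \frac{n\sigma^2}{n^2}
   = \frac{\sigma^2}{n}, \\
SD(\bar{y})
  &= \sqrt{\frac{\sigma^2}{n}}
   = \frac{\sigma}{\sqrt{n}}.
\end{aligned}
```

The formula for proportions is the same result in disguise: a yes/no variable coded as 1 or 0 has standard deviation √(pq), so its sample mean (the sample proportion) has standard deviation √(pq)/√n = √(pq/n).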

38) Statistics indicate that Ithaca, NY gets an average
of 35.4” of rain each year, with a standard
deviation of 4.2”. Assume that a Normal model applies.
- During what percentage of years does Ithaca get more
than 40” of rain?
- Less than how much rain falls in the driest 20% of all
years?
- A Cornell student is in Ithaca for 4 years. Let ȳ
represent the mean amount of rain for those 4 years.
Describe the sampling distribution model of this
sample mean ȳ.
- What’s the probability that those 4 years average less
than 30” of rain?
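
One way to work exercise 38 is sketched below. It relies on the stated Normal model for yearly rainfall and additionally assumes the 4 years behave like independent draws from that model. The variable names are chosen for this example.

```python
import math
from statistics import NormalDist

mu, sigma = 35.4, 4.2  # yearly rainfall in inches, from the exercise

# Model for a single year's rainfall.
yearly = NormalDist(mu=mu, sigma=sigma)
pct_over_40 = 1 - yearly.cdf(40)
driest_20_cutoff = yearly.inv_cdf(0.20)

# Model for the mean of 4 years: same center, standard deviation sigma / sqrt(4).
four_year_mean = NormalDist(mu=mu, sigma=sigma / math.sqrt(4))
prob_mean_under_30 = four_year_mean.cdf(30)

print("P(year > 40 in):", round(pct_over_40, 3))
print("driest 20% of years get less than:", round(driest_20_cutoff, 1), "in")
print("SD of 4-year mean:", round(four_year_mean.stdev, 2))
print("P(4-year mean < 30 in):", round(prob_mean_under_30, 4))
```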

A restaurateur anticipates serving 180 people on
a Friday evening and believes that about 20% of
the patrons will order the steak special. How
many of those specials should he plan on
ordering in order to be 95% sure (i.e. only a 5%
chance of running out of food) of having enough
steaks on hand to meet customer demand?


The 95th percentile of the Normal model for the
sample proportion yields a proportion of about
0.249 (or 45 steaks).
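
The arithmetic behind that answer can be sketched as follows, assuming the 180 patrons order independently so that the Normal model N(p, √(pq/n)) applies. The variable names are chosen for this example.

```python
import math
from statistics import NormalDist

n, p = 180, 0.20

# Standard deviation of the proportion of patrons ordering the steak special.
sd = math.sqrt(p * (1 - p) / n)

# Proportion that is exceeded only 5% of the time (the 95th percentile).
cutoff = NormalDist(mu=p, sigma=sd).inv_cdf(0.95)

# Convert the proportion to a whole number of steaks, rounding up.
steaks = math.ceil(cutoff * n)

print(round(sd, 4), round(cutoff, 3), steaks)
```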


43) The College Board reported the score distribution
shown in the table for all students who took the 2006
AP Statistics Exam:
- Find the mean and standard deviation of the
scores.
- If we select a random sample of 40 AP students,
would we expect their scores to follow a Normal
model?
- Consider the mean scores of random samples of
40 AP stats students. Describe the sampling
model for these means.
- An AP stats teacher had 63 students preparing to
take the AP exam. He considers his students to be
“typical” of all the national students. What’s the
probability that his students will achieve an average
score of at least 3?
Score   Percent of Students
5       12.6
4       22.2
3       25.3
2       18.3
1       21.6
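
One way to work exercise 43 from the table is sketched below. The mean and standard deviation come straight from the score distribution. The last step assumes the teacher's 63 students can be treated as a random sample of all test takers, as the exercise suggests. The variable names are chosen for this example.

```python
import math
from statistics import NormalDist

# Score distribution from the table (percent of students at each score).
score_percent = {5: 12.6, 4: 22.2, 3: 25.3, 2: 18.3, 1: 21.6}
probs = {score: pct / 100 for score, pct in score_percent.items()}

# Mean and standard deviation of a single student's score.
mu = sum(score * pr for score, pr in probs.items())
sigma = math.sqrt(sum((score - mu) ** 2 * pr for score, pr in probs.items()))

# Sampling model for the mean of 40 students: N(mu, sigma / sqrt(40)).
sd_mean_40 = sigma / math.sqrt(40)

# Probability that a class of 63 averages at least 3.
class_model = NormalDist(mu=mu, sigma=sigma / math.sqrt(63))
prob_class_mean_at_least_3 = 1 - class_model.cdf(3)

print(round(mu, 3), round(sigma, 3), round(sd_mean_40, 3))
print(round(prob_class_mean_at_least_3, 3))
```

Individual scores take only the values 1 through 5, so they cannot follow a Normal model themselves. It is the mean of a reasonably large sample that is approximately Normal.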

48) The weight of potato chips in a bag is stated
to be 10 ounces. The amount that the machine
puts in these bags is believed to have a Normal
model with mean 10.2 oz and standard deviation
of 0.12 oz.
- What fraction of all bags are underweight?
- Some of the chips are sold in “bargain packs”
of 3 bags. What is the probability that none of
the 3 is underweight?
- What’s the probability that the mean weight of
the 3 bags is below 10 oz?
- What’s the probability that the mean weight of
a 24-bag case is below 10 oz?
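
One way to work exercise 48 is sketched below. It uses the stated Normal model for a single bag and additionally assumes the bags in a pack or case are filled independently. The names are chosen for this example.

```python
import math
from statistics import NormalDist

mu, sigma, stated_weight = 10.2, 0.12, 10.0  # ounces, from the exercise

# Fraction of individual bags that weigh less than the stated 10 oz.
p_under = NormalDist(mu=mu, sigma=sigma).cdf(stated_weight)

# Probability that none of 3 independent bags is underweight.
p_none_of_3 = (1 - p_under) ** 3

def prob_mean_below_stated(n):
    """Probability that the mean weight of n bags is below the stated weight."""
    return NormalDist(mu=mu, sigma=sigma / math.sqrt(n)).cdf(stated_weight)

print("single bag underweight:", round(p_under, 4))
print("none of 3 underweight:", round(p_none_of_3, 3))
print("mean of 3 below 10 oz:", round(prob_mean_below_stated(3), 4))
print("mean of 24 below 10 oz:", prob_mean_below_stated(24))
```

The mean of 24 bags has a much smaller standard deviation than a single bag, so a case averaging under 10 oz is practically impossible under this model.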