Chapter 15
Sampling Distribution Models
Copyright © 2014, 2012, 2009 Pearson Education, Inc.
Objectives
54. State and apply the conditions and uses of the Central Limit Theorem.
55. Determine the mean and standard deviation (standard error) for a sampling distribution of proportions or means.
56. Apply the sampling distribution of a proportion or a mean to application problems.
15.1 Sampling Distribution of a Proportion
Sample Proportions and Sampling Distributions
The Harris poll found that of 889 U.S. adults, 40% said
they believe in ghosts. CBS News found that of 808
U.S. adults, 48% said they believe in ghosts.
• Why are these two sample proportions different?
• What is the true population proportion (of ALL U.S.
adults)?
We’ll denote the population proportion $p$ and the sample proportion $\hat{p}$.
Consider all possible samples of size 808. If we made a histogram of the $\hat{p}$ values from all of those samples, what might it look like?
The Central Limit Theorem for Sample Proportions
Rather than showing real repeated samples, imagine
what would happen if we were to actually draw many
samples and look at their proportions.
The histogram we’d get if we could see all the
proportions from all possible samples is called the
sampling distribution of the proportions.
What would the histogram of all the sample proportions
look like?
Sampling About Evolution
According to a Gallup poll, 43% of Americans believe in evolution. Assume that 43% is the true proportion for all Americans.
• If many surveys of 1007 Americans were done, we could calculate the sample proportion for each.
• The histogram shows the distribution of a simulation of 2000 sample proportions.
• The distribution of all possible sample proportions from samples of the same size is called the sampling distribution.
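To see where a histogram like this comes from, here is a minimal Python sketch that imitates the simulation, assuming each respondent independently answers "yes" with probability p = 0.43. The function name `simulate_sample_proportions` and the random seed are illustrative choices, not something from the original slides.

```python
import random
import statistics

def simulate_sample_proportions(p, n, reps, seed=0):
    """Simulate `reps` polls of size n from a population where a fraction p says "yes".

    Returns the list of sample proportions (one p-hat per simulated poll).
    """
    rng = random.Random(seed)
    proportions = []
    for _ in range(reps):
        # Each respondent independently says "yes" with probability p.
        yes_count = sum(rng.random() < p for _ in range(n))
        proportions.append(yes_count / n)
    return proportions

# Evolution example: p = 0.43, polls of 1007 Americans, 2000 simulated polls.
props = simulate_sample_proportions(p=0.43, n=1007, reps=2000)
print(f"mean of the 2000 sample proportions: {statistics.mean(props):.4f}")
print(f"SD of the 2000 sample proportions:   {statistics.stdev(props):.4f}")
```

The mean of the simulated proportions should land close to 0.43 and their SD close to 0.0156, the values the Normal model predicts on the following slides.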
Sampling Distributions
Sampling Distribution for Proportions
• Symmetric
• Unimodal
• Centered at p
• The sampling distribution follows the Normal model $N\left(p, \sqrt{\frac{pq}{n}}\right)$, where $q = 1 - p$.
What does the sampling distribution tell us?
• The sampling distribution allows us to make
statements about where we think the corresponding
population parameter is and how precise these
statements are likely to be.
Another way of saying this…
Sample statistics are themselves random variables:
• Sample proportion (for categorical data)
• Sample mean (for quantitative data)
They have a probability distribution, mean, standard
deviation, etc.
Mean and Standard Deviation
Sampling Distribution for Proportions
• Mean = $p$
• $\sigma(\hat{p}) = \frac{\sqrt{npq}}{n} = \sqrt{\frac{pq}{n}}$
• $N\left(p, \sqrt{\frac{pq}{n}}\right)$
The Normal Model for Evolution
Population proportion: p = 0.43; sample size: n = 1007. Sampling distribution:
• Mean = 0.43
• Standard deviation = $\sigma(\hat{p}) = \sqrt{\frac{(0.43)(0.57)}{1007}} \approx 0.0156$
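As a quick arithmetic check of $\sigma(\hat{p}) = \sqrt{pq/n}$ for the evolution numbers, a short Python sketch (the helper name `sd_sample_proportion` is hypothetical):

```python
import math

def sd_sample_proportion(p, n):
    """Standard deviation of the sampling distribution of p-hat: sqrt(p*q/n)."""
    q = 1 - p
    return math.sqrt(p * q / n)

# Evolution example: p = 0.43, n = 1007.
sd = sd_sample_proportion(0.43, 1007)
print(round(sd, 4))  # 0.0156
```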
Assumptions and Conditions
Most models are useful only when specific assumptions
are true.
There are two assumptions in the case of the model for
the distribution of sample proportions:
1. The Independence Assumption: The sampled
values must be independent of each other.
2. The Sample Size Assumption: The sample size, n,
must be large enough.
Assumptions and Conditions (cont.)
Assumptions are hard—often impossible—to check.
That’s why we assume them.
Still, we need to check whether the assumptions are
reasonable by checking conditions that provide
information about the assumptions.
The corresponding conditions to check before using the Normal model for the distribution of sample proportions are the Randomization Condition, the 10% Condition, and the Success/Failure Condition.
Assumptions and Conditions (cont.)
1. Randomization Condition: The sample should be a
simple random sample of the population.
2. 10% Condition: If sampling is done without replacement, the sample size, n, must be no larger than 10% of the population.
3. Success/Failure Condition: The sample size has to
be big enough so that both np and nq are at least
10.
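The two numeric conditions can be checked mechanically. A minimal sketch, assuming the caller supplies p, n, and (optionally) the population size; the function `check_proportion_conditions` and its `population_size` parameter are hypothetical names. The Randomization Condition depends on how the data were collected, so no code can verify it.

```python
def check_proportion_conditions(p, n, population_size=None):
    """Check the numeric conditions for the Normal model of p-hat.

    Returns a dict of booleans. The Randomization Condition depends on how the
    sample was collected, so it is not (and cannot be) checked here.
    """
    q = 1 - p
    results = {
        # Success/Failure Condition: np and nq must both be at least 10.
        "success_failure": n * p >= 10 and n * q >= 10,
    }
    if population_size is not None:
        # 10% Condition (sampling without replacement): n at most 10% of the population.
        results["ten_percent"] = n <= 0.10 * population_size
    return results

# Contact-lens setup from the practice problems: p = 0.30, n = 100 (np = 30, nq = 70).
print(check_proportion_conditions(0.30, 100))  # {'success_failure': True}
```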
The Central Limit Theorem for Sample Proportions (cont.)
Because we have a Normal model, for example, we
know that 95% of Normally distributed values fall
within two standard deviations of the mean. So we
should not be surprised if 95% of various polls gave
results that were near the mean but varied above and
below that by no more than two standard deviations.
• This is what we mean by sampling error. It’s not really
an error at all, but just variability you’d expect to see
from one sample to another.
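To make the two-standard-deviation statement concrete, the sketch below uses the evolution example (p = 0.43, n = 1007) as an assumed illustration. It also computes the exact Normal-model fraction within ±2 SD, which the slide rounds to 95%.

```python
import math
from statistics import NormalDist

# Assumed illustration: the evolution example, p = 0.43 and n = 1007.
p, n = 0.43, 1007
sd = math.sqrt(p * (1 - p) / n)

# Range within two standard deviations of p
low, high = p - 2 * sd, p + 2 * sd
print(f"about 95% of polls should fall between {low:.3f} and {high:.3f}")

# Exact fraction of a Normal model within two SDs (the slide rounds this to 95%)
standard_normal = NormalDist()
within_two_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)
print(round(within_two_sd, 4))  # 0.9545
```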
Solving Sampling Distribution Problems (Proportions)
o First identify which sampling distribution is involved.
   o Hint: you must know the underlying population proportion p, and there must be a sample proportion involved.
o The sampling distribution is given by $N\left(p, \sqrt{\frac{pq}{n}}\right)$.
o Check the conditions to be sure the sampling distribution applies.
o Draw a picture of the sampling distribution (Normal curve).
o Find where $\hat{p}$ falls on this distribution and use normalcdf to find the probability of seeing $\hat{p}$ or something more extreme (shade from $\hat{p}$ to the nearest tail).
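The steps above can be sketched in Python, with `statistics.NormalDist` standing in for the calculator's normalcdf. The function `phat_tail_probability` and its `upper` flag are hypothetical names; checking the conditions is left to the user, as in the step list.

```python
import math
from statistics import NormalDist

def phat_tail_probability(p, n, phat, upper=True):
    """Probability of a sample proportion at least as extreme as `phat`
    under the Normal model N(p, sqrt(pq/n)).

    upper=True  -> P(p-hat > phat)
    upper=False -> P(p-hat < phat)
    The conditions (randomization, 10%, success/failure) must be checked separately.
    """
    model = NormalDist(mu=p, sigma=math.sqrt(p * (1 - p) / n))
    below = model.cdf(phat)  # plays the role of normalcdf(-1E99, phat, p, sd)
    return 1 - below if upper else below

# Hypothetical illustration: p = 0.5, n = 100, so SD = 0.05 and 0.6 is 2 SDs above p.
print(round(phat_tail_probability(0.5, 100, 0.6), 4))  # 0.0228
```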
Practice
12) Public Health statistics indicate that 26.4% of American adults
smoke cigarettes. Describe the sampling distribution model for
the proportion of smokers among a randomly selected group of
50 adults. What are your assumptions and conditions?
15) Based on past experience, a bank believes that 7% of the
people who receive loans will not make payments on time. The
bank has recently approved 200 loans.
• What are the mean and standard deviation of the proportion of
clients in this group who may not make timely payments?
• What assumptions underlie your model? Are the conditions
met?
• What is the probability that over 10% of these clients will not
make timely payments?
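One way to work Practice 15 numerically, as a sketch that assumes the conditions are met (in particular, that clients pay late independently of each other, which the code cannot check):

```python
import math
from statistics import NormalDist

# Practice 15: 7% of loan recipients pay late; the bank approved n = 200 loans.
p, n = 0.07, 200
q = 1 - p

mean = p                                   # center of the sampling distribution
sd = math.sqrt(p * q / n)                  # about 0.018
print(f"mean = {mean}, SD = {sd:.4f}")

# Success/Failure Condition: np = 14 and nq = 186 are both at least 10.
print("success/failure ok:", n * p >= 10 and n * q >= 10)

# P(p-hat > 0.10) under the Normal model N(0.07, sd)
prob = 1 - NormalDist(mean, sd).cdf(0.10)
print(f"P(p-hat > 0.10) = {prob:.3f}")
```

Under this model the chance that more than 10% of the clients pay late comes out a little under 5%.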
Practice
16) Assume that 30% of students at a university wear
contact lenses.
• We randomly pick 100 students. Let p^ represent the
proportion of students who wear contact lenses.
What’s the appropriate model for the distribution of p^?
– Specify the name of the distribution, the mean, and
the standard deviation.
– Be sure to verify that the conditions are met.
• What’s the approximate probability that more than one
third of this sample wear contacts?
Enough Lefty Seats?
13% of all people are left-handed.
• A 200-seat auditorium has 15 lefty seats.
• What is the probability that there will not be enough lefty seats for a class of 90 students?
Think→
• Plan: $\hat{p} = 15/90 \approx 0.167$; we want $P(\hat{p} > 0.167)$.
• Model:
– Independence Assumption: With respect to lefties, the students are independent.
– 10% Condition: 90 students are far fewer than 10% of all people.
– Success/Failure Condition: $np = 90(0.13) = 11.7 \ge 10$ and $nq = 90(0.87) = 78.3 \ge 10$.
Enough Lefty Seats?
Think→
• Model: p = 0.13, n = 90
$SD(\hat{p}) = \sqrt{\frac{(0.13)(0.87)}{90}} \approx 0.035$
The model is N(0.13, 0.035).
Show→
• Plot
• Mechanics: $z = \frac{0.167 - 0.13}{0.035} \approx 1.06$
$P(\hat{p} > 0.167) = P(z > 1.06) \approx 0.1446$
Or normalcdf(0.167, 1E99, 0.13, 0.035)
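The lefty-seat computation can be reproduced with Python's `statistics.NormalDist`. This sketch runs it twice: once with the slide's rounded inputs and once carrying full precision.

```python
import math
from statistics import NormalDist

p, n = 0.13, 90
sd = math.sqrt(p * (1 - p) / n)    # about 0.0354, rounded to 0.035 on the slide

# With the slide's rounded inputs: p-hat = 0.167 and SD = 0.035
rounded = 1 - NormalDist(0.13, 0.035).cdf(0.167)

# With full precision: p-hat = 15/90 and the unrounded SD
unrounded = 1 - NormalDist(p, sd).cdf(15 / 90)

print(f"rounded inputs:   P(p-hat > 0.167) = {rounded:.4f}")
print(f"unrounded inputs: P(p-hat > 15/90) = {unrounded:.4f}")
```

The rounded inputs give about 0.145, in line with the slide's 14.5%; with full precision the answer is closer to 0.15. The practical conclusion does not change.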
Enough Lefty Seats?
Tell →
• Conclusion: There is about a 14.5% chance that there will not be enough seats for the left-handed students in the class.
15.3 The Sampling Distribution of Other Statistics
The Sampling Distribution for Others
• There is a sampling distribution for any statistic, but the Normal model may not fit.
• Below are histograms showing results of simulations of sampling distributions.
The Sampling Distribution For Others
• The medians seem to be approximately Normal.
• The variances seem somewhat skewed right.
• The minimums are all over the place.
• In this course, we will focus on proportions and means.
Sampling Distribution of the Means
• Imagine we roll a number of dice and take the average of the rolls over and over again.
• For 1 die, the distribution is Uniform.
• For 3 dice, the sampling distribution for the means is closer to Normal.
• For 20 dice, the sampling distribution for the means is very close to Normal, and its standard deviation is much smaller.
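The dice experiment is easy to simulate. A minimal sketch (the seed and the 20,000 repetitions are arbitrary choices) that compares the simulated SD of the mean with σ/√n, where σ = √(35/12) ≈ 1.708 is the SD of a single fair die:

```python
import math
import random
import statistics

rng = random.Random(1)
die_sd = math.sqrt(35 / 12)   # SD of one fair die roll, about 1.708

def simulated_means(num_dice, reps=20_000):
    """Roll `num_dice` fair dice and average them, `reps` times."""
    return [statistics.fmean(rng.randint(1, 6) for _ in range(num_dice))
            for _ in range(reps)]

sds = {}
for k in (1, 3, 20):
    sds[k] = statistics.stdev(simulated_means(k))
    print(f"{k:2d} dice: simulated SD = {sds[k]:.3f}, "
          f"sigma/sqrt(n) = {die_sd / math.sqrt(k):.3f}")
```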
15.4 The Central Limit Theorem: The Fundamental Theorem of Statistics
The Central Limit Theorem
• The sampling distribution of any mean becomes
nearly Normal as the sample size grows.
Requirements
• Independent observations
• Randomly collected sample
The sampling distribution of the means is close to Normal
if either:
• Large sample size
• Population close to Normal
Video on the Central Limit Theorem
http://www.nytimes.com/video/science/100000002452709/bunnies-dragons-and-the-normalworld.html?playlistId=100000002438160
How Normal?
Population Distribution and Sampling Distribution of the Means
Population Distribution → Sampling Distribution for the Means
• Normal → Normal (any sample size)
• Uniform → Normal (large sample size)
• Bimodal → Normal (larger sample size)
• Skewed → Normal (larger sample size)
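To watch the "Skewed" row in action, this sketch draws sample means from a right-skewed population. The exponential population with mean 1 is an assumption chosen for illustration; for it, theory says the skewness of the sample mean is 2/√n, so it should shrink toward 0 (the Normal value) as n grows.

```python
import random
import statistics

rng = random.Random(2)

def skewness(data):
    """Sample skewness: the average cubed z-score (0 for a symmetric distribution)."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return statistics.fmean(((x - m) / s) ** 3 for x in data)

def means_from_skewed_population(n, reps=10_000):
    """Sample means of size n from an assumed right-skewed population:
    exponential with mean 1."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

skew = {n: skewness(means_from_skewed_population(n)) for n in (1, 5, 50)}
for n, g in skew.items():
    print(f"n = {n:2d}: skewness of sample means = {g:.2f} (theory: {2 / n ** 0.5:.2f})")
```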
Standard Deviation of the Means
• Which would be more unusual: a student in the class who is 6’9” tall, or a class whose mean height is 6’9”?
• The sample means have a smaller standard deviation than the individuals.
• The standard deviation of the sample means goes down by the square root of the sample size: $SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$
The Sampling Distribution Model for a Mean
When a random sample is drawn from a population with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of the sample mean has:
• Mean: $\mu$
• Standard deviation: $\frac{\sigma}{\sqrt{n}}$
• For a large sample size, the distribution is approximately Normal regardless of the population the random sample comes from.
• The larger the sample size, the closer to Normal.
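In code, the model above is a one-liner with Python's `statistics.NormalDist`. A sketch with a hypothetical population (μ = 100, σ = 15) and samples of n = 25; the helper name `sampling_model_for_mean` is also hypothetical.

```python
import math
from statistics import NormalDist

def sampling_model_for_mean(mu, sigma, n):
    """Normal model for the sample mean: N(mu, sigma / sqrt(n)).

    Exact when the population is Normal; otherwise an approximation that
    improves as n grows (Central Limit Theorem).
    """
    return NormalDist(mu, sigma / math.sqrt(n))

# Hypothetical population: mu = 100, sigma = 15, samples of n = 25.
model = sampling_model_for_mean(100, 15, 25)
print(model.mean, model.stdev)   # 100.0 3.0
```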
Solving Sampling Distribution Problems (Means)
Caution!
Pay attention to how the sampling distribution of means
differs depending on the size of the sample.
Be careful to distinguish between the underlying distribution of the population (which may or may not be Normal) and the sampling distribution of means (which depends on sample size n).
38) Statistics indicate that Ithaca, NY gets an average of 35.4” of rain each year, with a standard deviation of 4.2”. Assume that a Normal model applies.
– During what percentage of years does Ithaca get more than 40” of rain?
– Less than how much rain falls in the driest 20% of all years?
– A Cornell student is in Ithaca for 4 years. Let $\bar{y}$ represent the mean amount of rain for those 4 years. Describe the sampling distribution model of this sample mean $\bar{y}$.
– What’s the probability that those 4 years average less than 30” of rain?
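A sketch of how the Ithaca questions could be worked with `statistics.NormalDist`, assuming (as the problem states) a Normal model for yearly rainfall and also assuming the 4 years are independent:

```python
import math
from statistics import NormalDist

annual = NormalDist(mu=35.4, sigma=4.2)          # yearly rainfall (inches)

more_than_40 = 1 - annual.cdf(40)                # fraction of years above 40"
driest_20_cutoff = annual.inv_cdf(0.20)          # the driest 20% of years fall below this

# Mean of 4 years, assumed independent: N(35.4, 4.2 / sqrt(4)) = N(35.4, 2.1)
four_year_mean = NormalDist(mu=35.4, sigma=4.2 / math.sqrt(4))
avg_below_30 = four_year_mean.cdf(30)

print(f"P(a year gets more than 40 in) = {more_than_40:.3f}")
print(f"driest 20% of years: below {driest_20_cutoff:.1f} in")
print(f"P(4-year mean < 30 in) = {avg_below_30:.4f}")
```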
Too Heavy for the Elevator?
The mean weight of U.S. men is 190 lb, and the standard deviation is 59 lb. An elevator has a weight limit of 10 persons or 2500 lb. Find the probability that 10 men in the elevator will exceed the weight limit.
Think →
• Plan: 10 men weighing more than 2500 lb in total is the same as their mean weight being more than 250 lb.
• Model:
– Independence Assumption: The men are not a random sample, but their weights are probably independent.
– Sample Size Condition: Weights are approximately Normal, so a sample of 10 is large enough.
Too Heavy for the Elevator
Think →
• Model: $\mu = 190$, $\sigma = 59$
By the CLT, the sampling distribution of $\bar{y}$ is approximately Normal:
$\mu(\bar{y}) = 190, \quad SD(\bar{y}) = \frac{\sigma}{\sqrt{n}} = \frac{59}{\sqrt{10}} \approx 18.66$
Show→
• Plot:
Too Heavy for the Elevator?
• Mechanics: $z = \frac{\bar{y} - \mu}{SD(\bar{y})} = \frac{250 - 190}{18.66} \approx 3.21$
$P(\bar{y} > 250) \approx P(z > 3.21) \approx 0.0007$
Tell →
• Conclusion: There is only a 0.0007 chance that the
10 men will exceed the elevator’s weight limit.
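The elevator mechanics can be checked the same way. This sketch carries full precision instead of rounding $SD(\bar{y})$ to 18.66 first.

```python
import math
from statistics import NormalDist

mu, sigma, n = 190, 59, 10                  # men's weights (lb) and number of riders
sd_mean = sigma / math.sqrt(n)              # SD(y-bar), about 18.66
z = (250 - mu) / sd_mean                    # about 3.22 at full precision
p_overload = 1 - NormalDist(mu, sd_mean).cdf(250)

print(f"SD(y-bar) = {sd_mean:.2f}, z = {z:.2f}, P(y-bar > 250) = {p_overload:.5f}")
```

With full precision z is about 3.22 rather than 3.21, and the probability comes out around 0.00065, consistent with the slide's 0.0007.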
43) The College Board reported the score distribution
shown in the table for all students who took the
2006 AP Statistics Exam:
• Find the mean and standard deviation of the scores.
• If we select a random sample of 40 AP students, would we expect their scores to follow a Normal model?
• Consider the mean scores of random samples of 40 AP Statistics students. Describe the sampling model for these means.
An AP stats teacher had 63 students preparing to take
the AP exam. He considers his students to be
“typical” of all the national students. What’s the
probability that his students will achieve an average
score of at least 3?
Score | Percent of Students
5 | 12.6
4 | 22.2
3 | 25.3
2 | 18.3
1 | 21.6
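A sketch of the AP computations, treating the table percentages as exact probabilities and assuming the teacher's 63 students behave like a random sample of the national population (the problem's own "typical" assumption):

```python
import math
from statistics import NormalDist

# Score distribution from the table (percent of students -> probability).
dist = {5: 0.126, 4: 0.222, 3: 0.253, 2: 0.183, 1: 0.216}

mean = sum(score * prob for score, prob in dist.items())
variance = sum((score - mean) ** 2 * prob for score, prob in dist.items())
sd = math.sqrt(variance)

# Sampling model for the mean of 40 students (CLT): N(mean, sd / sqrt(40))
sd_40 = sd / math.sqrt(40)

# Teacher's class of 63, treated as a random sample of "typical" students.
p_at_least_3 = 1 - NormalDist(mean, sd / math.sqrt(63)).cdf(3)

print(f"mean = {mean:.3f}, SD = {sd:.3f}")
print(f"SD of mean for n = 40: {sd_40:.3f}")
print(f"P(mean of 63 >= 3) = {p_at_least_3:.3f}")
```

The individual scores can only be 1 through 5 and their distribution is not bell-shaped, so a sample of 40 scores would not look Normal; it is the mean of such samples that the CLT lets us model as Normal.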
48) The weight of potato chips in a bag is stated to be 10
ounces. The amount that the machine puts in these bags is believed to have a Normal model with mean 10.2 oz and standard deviation 0.12 oz.
• What fraction of all bags are underweight?
• Some of the chips are sold in “bargain packs” of 3
bags. What is the probability that none of the 3 is
underweight?
• What’s the probability that the mean weight of the 3 bags is below 10 oz?
• What’s the probability that the mean weight of a 24-bag
case is below 10 oz?
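A sketch of the potato-chip questions, assuming the bag weights are independent and follow the stated Normal model:

```python
import math
from statistics import NormalDist

bag = NormalDist(mu=10.2, sigma=0.12)        # weight of one bag (oz), from the problem

p_under = bag.cdf(10)                        # fraction of bags under 10 oz
p_none_of_3 = (1 - p_under) ** 3             # assumes the 3 bags are independent

mean_of_3 = NormalDist(10.2, 0.12 / math.sqrt(3))
mean_of_24 = NormalDist(10.2, 0.12 / math.sqrt(24))

print(f"P(one bag < 10 oz)       = {p_under:.4f}")
print(f"P(none of 3 underweight) = {p_none_of_3:.4f}")
print(f"P(mean of 3 < 10 oz)     = {mean_of_3.cdf(10):.4f}")
print(f"P(mean of 24 < 10 oz)    = {mean_of_24.cdf(10):.2e}")
```

The last probability is essentially zero: averaging 24 bags shrinks the SD to about 0.024 oz, so 10 oz is more than 8 SDs below the mean.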
15.5 Sampling Distributions: A Summary
Sample Size and Standard Deviation
• $SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$ and $SD(\hat{p}) = \sqrt{\frac{pq}{n}}$
• Larger sample size → smaller standard deviation
• Multiply n by 4 → divide the standard deviation by 2.
• Multiply n by 100 → divide the standard deviation by 10.
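The rules about multiplying n follow from the square root in the denominator. Written out for the sample mean (the same algebra applies to $SD(\hat{p})$, since n also sits under a square root there); the subscript on SD, naming the sample size, is notation introduced here for the comparison, not the slides' own:

```latex
\frac{SD_{4n}(\bar{y})}{SD_{n}(\bar{y})}
  = \frac{\sigma/\sqrt{4n}}{\sigma/\sqrt{n}}
  = \frac{1}{\sqrt{4}} = \frac{1}{2},
\qquad
\frac{SD_{100n}(\bar{y})}{SD_{n}(\bar{y})}
  = \frac{1}{\sqrt{100}} = \frac{1}{10}
```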
Billion Dollar Misunderstanding
The Bill and Melinda Gates Foundation found that 12% of the top 50 performing schools were among the smallest 3% of schools. They funded a transformation to small schools.
• Small schools have a smaller n, and thus a larger standard deviation of the mean, $SD(\bar{y})$.
• So small schools are likely to show both higher and lower means.
• In fact, 18% of the bottom 50 schools were also among the smallest 3%.
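To see why small schools crowd both ends of a ranking even when no school is better, here is a hypothetical simulation. All numbers in it (scores drawn from N(500, 100), 20 students per small school, 400 per large school, 1000 schools) are assumptions for illustration, not data from the Gates Foundation study; only the 3% share of small schools and the top/bottom 50 cut come from the slide.

```python
import random
import statistics

rng = random.Random(3)

# Hypothetical setup: every student's score comes from the same N(500, 100)
# population, so no school is truly better than another.
# 30 small schools (3% of 1000) have 20 students; 970 large schools have 400.
schools = [("small", 20)] * 30 + [("large", 400)] * 970

results = []
for size_label, n in schools:
    mean_score = statistics.fmean(rng.gauss(500, 100) for _ in range(n))
    results.append((mean_score, size_label))

results.sort(reverse=True)                 # highest school mean first
top_50 = [label for _, label in results[:50]]
bottom_50 = [label for _, label in results[-50:]]

print("small schools in top 50:   ", top_50.count("small"))
print("small schools in bottom 50:", bottom_50.count("small"))
```

Small schools should appear in both the top 50 and the bottom 50 far more often than their 3% share (1.5 of 50 schools) would suggest, purely because their means have a larger standard deviation.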
Distribution of the Sample
vs. the Sampling Distribution
Don’t confuse the distribution of the sample and the
sampling distribution.
• If the population’s distribution is not Normal, then the sample’s distribution will not be Normal either, even if the sample size is very large.
• For large sample sizes, the sampling distribution, which is the distribution of all possible sample means from samples of that size, will be approximately Normal.
Two Truths About Sampling Distributions
• Sampling distributions arise because samples vary. Each random sample will contain different cases and so a different value of the statistic.
• Although we can always simulate a sampling distribution, the Central Limit Theorem saves us the trouble for proportions and means. This is especially important when we do not know the population’s distribution.
What Can Go Wrong?
Don’t confuse the sampling distribution with the
distribution of the sample.
• A histogram of the data shows the sample’s
distribution. The sampling distribution is more
theoretical.
• Beware of observations that are not independent.
• The CLT fails for dependent samples. A good
survey design can ensure independence.
• Watch out for small samples from skewed or bimodal
populations.
• The CLT requires large samples or a Normal
population or both.