Sampling Distributions of Proportions
• Toss a penny 20 times and record the
number of heads.
• Now, really think about it for a
minute:
• We are tossing exactly 20 times,
expecting that the probability of a
head will be .5 each time and that
each toss will be independent
• SOUND FAMILIAR???????
• Okay now, imagine 1000 people
lining up and tossing the penny
20 times each.
• If we were to record each sample
proportion and make a histogram of the
results, what would we expect the
shape to be?
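The thought experiment above can be tried out with a short simulation. This is only a sketch: the function name, the seed, and the text histogram are illustrative choices, and a fair penny (probability of heads 0.5) is assumed, as on the slide.

```python
import random
from collections import Counter

def simulate_sample_proportions(num_people=1000, tosses=20, p_heads=0.5, seed=1):
    """Each simulated person tosses a penny `tosses` times; return every sample proportion of heads."""
    rng = random.Random(seed)  # fixed seed (arbitrary) so the run is repeatable
    proportions = []
    for _ in range(num_people):
        heads = sum(rng.random() < p_heads for _ in range(tosses))
        proportions.append(heads / tosses)
    return proportions

proportions = simulate_sample_proportions()

# Crude text histogram: one '#' per 5 people who got that sample proportion.
counts = Counter(proportions)
for value in sorted(counts):
    print(f"{value:4.2f} {'#' * (counts[value] // 5)}")
```

The printed bars should pile up around 0.5 and thin out symmetrically on both sides, which is the shape the question is asking about.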
A Sampling Distribution Model
for a Proportion


• A proportion is no longer just a computation from
a set of data.
  • It is now a random variable quantity that has a
  probability distribution.
  • This distribution is called the sampling
  distribution model for proportions.
• Even though we depend on sampling distribution
models, we never actually get to see them.
  • We never actually take repeated samples from
  the same population and make a histogram. We
  only imagine or simulate them.
Copyright © 2010, 2007, 2004 Pearson Education, Inc.

The Sampling Distribution of Proportions
How good is the statistic p̂ as an estimate of the parameter p? The
sampling distribution of p̂ answers this question.
Consider the approximate sampling distributions generated by a simulation in
which SRSs of Reese’s Pieces are drawn from a population whose
proportion of orange candies is either 0.45 or 0.15.
What do you notice about the shape, center, and spread of each?
The Sampling Distribution of Proportions
What did you notice about the shape, center, and spread of
each sampling distribution?
Shape: In some cases, the sampling distribution of p̂ can be
approximated by a Normal curve. This seems to depend on both the
sample size n and the population proportion p.
Center: The mean of the distribution is $\mu_{\hat{p}} = p$. This makes sense
because the sample proportion p̂ is an unbiased estimator of p.
Spread: For a specific value of p, the standard deviation $\sigma_{\hat{p}}$ gets
smaller as n gets larger. The value of $\sigma_{\hat{p}}$ depends on both n and p.
There is an important connection between the sample proportion p̂ and
the number of "successes" X in the sample:
$$\hat{p} = \frac{\text{count of successes in sample}}{\text{size of sample}} = \frac{X}{n}$$
Sample Proportions
In Chapter 6, we learned that the mean and standard
deviation of a binomial random variable X are
$$\mu_X = np, \qquad \sigma_X = \sqrt{np(1-p)}$$
Since $\hat{p} = X/n = (1/n) \cdot X$, we are just multiplying the random variable X
by a constant (1/n) to get the random variable p̂. Therefore,
$$\mu_{\hat{p}} = \frac{1}{n}(np) = p$$
so p̂ is an unbiased estimator of p, and
$$\sigma_{\hat{p}} = \frac{1}{n}\sqrt{np(1-p)} = \sqrt{\frac{np(1-p)}{n^2}} = \sqrt{\frac{p(1-p)}{n}}$$
As sample size increases, the spread decreases.
The Sampling Distribution Model
for a Proportion (cont.)

Provided that the sampled values are
independent and the sample size is large enough,
the sampling distribution of p̂ is very much like the
Binomial Distribution and for large sample sizes is
modeled by a Normal model with

Mean: $\mu(\hat{p}) = p$

Standard deviation: $SD(\hat{p}) = \sqrt{\frac{pq}{n}}$, where q = 1 − p
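One way to see that the model's mean and standard deviation are reasonable is to simulate many samples and compare. The sketch below assumes n = 100 and p = 0.3 purely for illustration; the function names, the number of repetitions, and the seed are not from the slides.

```python
import math
import random
import statistics

def sample_proportion(n, p, rng):
    """Draw n independent success/failure values and return the proportion of successes."""
    return sum(rng.random() < p for _ in range(n)) / n

def compare_with_model(n=100, p=0.3, repetitions=5000, seed=7):
    """Simulate many sample proportions and set them beside the model's mean and SD."""
    rng = random.Random(seed)  # arbitrary fixed seed for repeatability
    p_hats = [sample_proportion(n, p, rng) for _ in range(repetitions)]
    return {
        "simulated_mean": statistics.mean(p_hats),
        "model_mean": p,
        "simulated_sd": statistics.pstdev(p_hats),
        "model_sd": math.sqrt(p * (1 - p) / n),  # sqrt(pq/n)
    }

result = compare_with_model()
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

The simulated values should land close to the model values, though never exactly on them, since the simulation is itself random.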
Modeling the Distribution of
Sample Proportions (cont.)

A picture of what we just discussed is as follows:
How Good Is the Normal Model?


The Normal model becomes a better model
for the distribution of sample proportions as the
sample size gets bigger.
Just how big of a sample do we need? This will
soon be revealed…
Assumptions and Conditions


Most models are useful only when specific
assumptions are true.
There are two assumptions in the case of the
model for the distribution of sample proportions:
1. The Independence Assumption: The sampled
values must be independent of each other.
2. The Sample Size Assumption: The sample
size, n, must be large enough.
Assumptions and Conditions (cont.)



Assumptions are hard—often impossible—to
check. That’s why we assume them.
Still, we need to check whether the assumptions
are reasonable by checking conditions that
provide information about the assumptions.
The corresponding conditions to check before
using the Normal to model the distribution of
sample proportions are the Randomization
Condition, the 10% Condition and the
Success/Failure Condition.
Assumptions and Conditions (cont.)
1. Randomization Condition: The sample should
be a simple random sample of the population.
2. 10% Condition: The sample size, n, must be no
larger than 10% of the population.
3. Success/Failure Condition: The sample size
has to be big enough so that both np (expected number of
successes) and nq (expected number of failures) are at
least 10.
…So, we need a large enough sample that is not
too large.
Why does the third condition ensure an
approximately Normal distribution?
Remember back to binomial distributions.
Suppose n = 10 & p = 0.1
(probability of a success). A
histogram of this distribution
is strongly skewed right!
Now use n = 100 & p = 0.1
(now np = 10). The histogram is
still skewed right, but look
what happens to the tail!
np ≥ 10 & n(1 − p) ≥ 10
ensures that the sample size
is large enough to have a
Normal approximation!
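The effect of sample size on skewness can be checked numerically rather than by eye. The sketch below computes the skewness of the exact binomial distribution for p = 0.1 at a few sample sizes. The function names are illustrative, and n = 400 is an extra case added here that is not on the slide.

```python
import math

def binomial_pmf(n, p):
    """Exact binomial probabilities P(X = k) for k = 0..n."""
    return [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def skewness(n, p):
    """Standardized third moment of the binomial(n, p) distribution, computed from its pmf."""
    pmf = binomial_pmf(n, p)
    mean = sum(k * prob for k, prob in enumerate(pmf))
    variance = sum((k - mean) ** 2 * prob for k, prob in enumerate(pmf))
    third_moment = sum((k - mean) ** 3 * prob for k, prob in enumerate(pmf))
    return third_moment / variance ** 1.5

for n in (10, 100, 400):
    print(f"n = {n:3d}, np = {n * 0.1:5.1f}, skewness = {skewness(n, 0.1):.3f}")
```

A skewness of 0 means a symmetric distribution. The printed values stay positive (skewed right) but shrink toward 0 as n grows, which is why a larger np makes the Normal curve a better fit.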
Consider the following situation:
Suppose we have a population of six
people: Alice, Ben, Charles, Denise,
Edward, & Frank.
What is the proportion of females? 1/3
What is the parameter of interest in
this population?
The proportion of females, p = 1/3 (gender is the variable being recorded).
Draw samples of two from this population.
How many different samples are
possible?
6C2 = 15
Find the 15 different samples that are
possible & find the sample proportion of
females in each sample.

Sample               p-hat
Alice & Ben          .5
Alice & Charles      .5
Alice & Denise       1
Alice & Edward       .5
Alice & Frank        .5
Ben & Charles        0
Ben & Denise         .5
Ben & Edward         0
Ben & Frank          0
Charles & Denise     .5
Charles & Edward     0
Charles & Frank      0
Denise & Edward      .5
Denise & Frank       .5
Edward & Frank       0

How does the mean of the sampling distribution ($\mu_{\hat{p}}$)
compare to the population parameter (p)?
$\mu_{\hat{p}} = p$
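Find the mean & standard deviation of all p-hats.
$\mu_{\hat{p}} = \frac{1}{3}$ & $\sigma_{\hat{p}} = 0.29814$
But WAIT! We said that the standard
deviation should equal $SD(\hat{p}) = \sqrt{\frac{pq}{n}}$:
$$\sigma_{\hat{p}} = \sqrt{\frac{\left(\frac{1}{3}\right)\left(\frac{2}{3}\right)}{2}} = \frac{1}{3} \neq 0.29814$$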
WHY did this happen?
We are sampling more than 10% of our
population!
So – in order to calculate the
standard deviation of the
sampling distribution, we
MUST be sure that our sample
size is less than 10% of the
population!
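The six-person example is small enough to enumerate completely. The sketch below lists all 15 samples, computes the exact mean and standard deviation of p-hat, and compares them with the formula value. The dictionary layout and variable names are illustrative choices.

```python
import math
import statistics
from itertools import combinations

# The population from the example, with gender recorded for each person.
population = {
    "Alice": "F", "Ben": "M", "Charles": "M",
    "Denise": "F", "Edward": "M", "Frank": "M",
}
sample_size = 2

# Population proportion of females (the parameter p).
p = sum(gender == "F" for gender in population.values()) / len(population)

# Sample proportion of females for every possible sample of two people.
p_hats = [
    sum(population[name] == "F" for name in sample) / sample_size
    for sample in combinations(population, sample_size)
]

exact_mean = statistics.mean(p_hats)
exact_sd = statistics.pstdev(p_hats)  # SD over all 15 equally likely samples
formula_sd = math.sqrt(p * (1 - p) / sample_size)  # sqrt(pq/n)

print(f"number of samples: {len(p_hats)}")
print(f"mean of p-hats: {exact_mean:.5f} (p = {p:.5f})")
print(f"exact SD: {exact_sd:.5f}, formula SD: {formula_sd:.5f}")
```

The mean matches p, but the exact standard deviation is smaller than the formula value, because a sample of 2 is a third of this population and the draws are far from independent.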
Assumptions (Rules of Thumb)
• Must start with a Simple Random
Sample
• Sample size must be less than 10% of
the population (independence)
• Sample size must be large enough to
ensure a Normal approximation can be
used:
np ≥ 10 & n(1 – p) ≥ 10
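These rules of thumb are mechanical enough to write as a small checker. This is a hypothetical helper, not part of any library. Whether the sample is random cannot be computed, so it is passed in as a flag. The thresholds follow the "at least 10" and "no larger than 10%" wording used in these slides.

```python
def check_conditions(n, p, population_size=None, is_random_sample=True):
    """Return which rules of thumb hold for a sample of size n and proportion p."""
    checks = {
        # Cannot be verified from numbers alone; the caller has to assert it.
        "randomization": is_random_sample,
        # Expected successes and expected failures both at least 10.
        "success_failure": n * p >= 10 and n * (1 - p) >= 10,
    }
    if population_size is not None:
        # Sample no larger than 10% of the population.
        checks["ten_percent"] = n <= 0.10 * population_size
    return checks

# Bank example from these slides: 200 loans, p = .07 (population size not given).
print(check_conditions(n=200, p=0.07))

# Six-person population sampled two at a time: both numeric checks fail.
print(check_conditions(n=2, p=1 / 3, population_size=6))
```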

Sample Proportions
Using the Normal Approximation for p̂
Inference about a population proportion p is based on the sampling distribution
of p̂. When the sample size is large enough for np and n(1 − p) to both be at
least 10 (the Normal condition), the sampling distribution of p̂ is
approximately Normal.

A polling organization asks an SRS of 1500 first-year college students how far away
their home is. Suppose that 35% of all first-year students actually attend college within
50 miles of home. What is the probability that the random sample of 1500 students will
give a result within 2 percentage points of this true value?

STATE: We want to find the probability that the sample proportion falls between 0.33
and 0.37 (within 2 percentage points, or 0.02, of 0.35).

PLAN: We have an SRS of size n = 1500 drawn from a population in which the
proportion p = 0.35 attend college within 50 miles of home.
$$\mu_{\hat{p}} = 0.35, \qquad \sigma_{\hat{p}} = \sqrt{\frac{(0.35)(0.65)}{1500}} = 0.0123$$

DO: Since np = 1500(0.35) = 525 and n(1 – p) = 1500(0.65) = 975 are both greater
than 10, we'll standardize and then use Table A to find the desired probability.
$$z = \frac{0.33 - 0.35}{0.0123} = -1.63, \qquad z = \frac{0.37 - 0.35}{0.0123} = 1.63$$
$$P(0.33 \le \hat{p} \le 0.37) = P(-1.63 \le Z \le 1.63) = 0.9484 - 0.0516 = 0.8968$$

CONCLUDE: About 90% of all SRSs of size 1500 will give a result
within 2 percentage points of the truth about the population.
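The same probability can be computed without Table A. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n = 1500       # sample size
p = 0.35       # true population proportion
margin = 0.02  # "within 2 percentage points"

# Normal condition: expected successes and failures both at least 10.
assert n * p >= 10 and n * (1 - p) >= 10

sd = math.sqrt(p * (1 - p) / n)     # standard deviation of p-hat
model = NormalDist(mu=p, sigma=sd)  # Normal model for p-hat

probability = model.cdf(p + margin) - model.cdf(p - margin)
print(f"SD of p-hat: {sd:.4f}")
print(f"P({p - margin:.2f} <= p-hat <= {p + margin:.2f}) = {probability:.4f}")
```

Because this version does not round the standard deviation or the z-scores, the result is close to, but not identical to, the Table A answer of 0.8968.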
Based on past experience, a bank believes
that 7% of the people who receive loans
will not make payments on time. The bank
recently approved 200 loans.
What are the mean and standard deviation
of the proportion of clients in this group
who may not make payments on time?
$\mu_{\hat{p}} = .07$ and $\sigma_{\hat{p}} = \sqrt{\frac{.07(.93)}{200}} = .01804$
Are assumptions met?
Yes: np = 200(.07) = 14 and n(1 - p) = 200(.93) = 186
What is the probability that over 10% of
these clients will not make payments on
time?
Ncdf(.10, 1E99, .07, .01804) = .0482
Suppose one student tossed a coin
200 times and found only 42% heads.
Do you believe that this is likely to
happen?
$\text{ncdf}\left(-\infty,\ .42,\ .5,\ \sqrt{\frac{.5(.5)}{200}}\right) = .0118$
No – since there is
approximately a 1% chance of
this happening, I do not
believe the student did this.
Assume that 30% of the students
at MSU wear contacts. In a
sample of 100 students, what is
the probability that more than 35%
of them wear contacts?
$\mu_{\hat{p}} = .3$ & $\sigma_{\hat{p}} = .045826$
Check assumptions!
np = 100(.3) = 30 & n(1-p) =100(.7) = 70
Ncdf(.35, 1E99, .3, .045826) = .1376
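The calculator's Ncdf(lower, upper, mean, SD) calls in the last three examples can be mirrored in a few lines. The `ncdf` and `sd_of_p_hat` functions below are hypothetical stand-ins written for this sketch, built on `statistics.NormalDist`. As on the calculator, ±1E99 plays the role of infinity.

```python
import math
from statistics import NormalDist

def ncdf(lower, upper, mean, sd):
    """Area under the Normal(mean, sd) curve between lower and upper."""
    model = NormalDist(mu=mean, sigma=sd)
    return model.cdf(upper) - model.cdf(lower)

def sd_of_p_hat(n, p):
    """Standard deviation of the sample proportion, sqrt(p(1 - p)/n)."""
    return math.sqrt(p * (1 - p) / n)

# Bank loans: P(p-hat > .10) when p = .07, n = 200
bank = ncdf(0.10, 1e99, 0.07, sd_of_p_hat(200, 0.07))

# Coin tosses: P(p-hat < .42) when p = .5, n = 200
coin = ncdf(-1e99, 0.42, 0.5, sd_of_p_hat(200, 0.5))

# Contacts: P(p-hat > .35) when p = .3, n = 100
contacts = ncdf(0.35, 1e99, 0.3, sd_of_p_hat(100, 0.3))

print(f"bank: {bank:.4f}, coin: {coin:.4f}, contacts: {contacts:.4f}")
```

Each result should agree with the calculator answers on the slides (.0482, .0118, .1376) to about four decimal places.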
Section 7.2
Sample Proportions
Summary
In this section, we learned that…
When we want information about the population proportion p of successes, we
often take an SRS and use the sample proportion p̂ to estimate the unknown
parameter p. The sampling distribution of p̂ describes how the statistic varies
in all possible samples from the population.

The mean of the sampling distribution of p̂ is equal to the population proportion
p. That is, p̂ is an unbiased estimator of p.

The standard deviation of the sampling distribution of p̂ is
$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$ for
an SRS of size n. This formula can be used if the population is at least 10 times
as large as the sample (the 10% condition). The standard deviation of p̂ gets
smaller as the sample size n gets larger.

When the sample size n is large, the sampling distribution of p̂ is close to a
Normal distribution with mean p and standard deviation
$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$.
In practice, use this Normal approximation when both np ≥ 10 and n(1 - p) ≥ 10 (the
Normal condition).