Inference for Proportions
One Sample
Confidence Intervals
One Sample Proportions
Rate your confidence
0 - 100
• Name my age within 10 years?
• within 5 years?
• within 1 year?
• Shooting a basketball at a wading
pool, will make basket?
• Shooting the ball at a large trash can,
will make basket?
• Shooting the ball at a carnival, will
make basket?
What happens to your
confidence as the interval gets
smaller?
The larger your confidence,
the wider the interval.
Point Estimate
• Use a single statistic based on sample
data to estimate a population
parameter
• Simplest approach
• But not always very precise due to
variation in the sampling distribution
Confidence intervals
• Are used to estimate the
unknown population parameter
• Formula:
estimate ± margin of error
Margin of error
• Shows how accurate we believe our estimate
is
• The smaller the margin of error, the more
precise our estimate of the true parameter
• Formula:
E = (critical value) × (standard deviation of the statistic)
Assumptions:
• SRS
• Normal distribution
n p̂ > 10 & n(1- p̂) > 10
• Population is at least 10n
Formula for Confidence interval:
CI = statistic ± (critical value) × (SD of statistic)
$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
(z* comes from the standard normal curve)
Note: For confidence intervals, we DO NOT know p – so
we MUST substitute p-hat for p in both the SD & when
checking assumptions.

Critical value (z*)
• Found from the confidence level
• The upper z-score with probability p lying to its
right under the standard normal curve
Confidence level   Tail area   z*
90%                .05         1.645
95%                .025        1.96
99%                .005        2.576
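The critical values in the table can be reproduced from the tail area. This is a minimal Python sketch using the standard library's `statistics.NormalDist`; the function name `critical_value` is made up for illustration.

```python
from statistics import NormalDist

def critical_value(confidence_level):
    """Return z* so that the central `confidence_level` area lies within ±z*."""
    tail_area = (1 - confidence_level) / 2      # area in each tail
    return NormalDist().inv_cdf(1 - tail_area)  # upper z-score for that tail

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {critical_value(level):.3f}")
```

The loop prints z* for the three confidence levels in the table, rounded to three decimal places.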
Confidence level
• Is the success rate of the method used
to construct the interval
• Using this method, ____% of the
time the intervals constructed will
contain the true population
parameter
What does it mean to be 95%
confident?
• 95% chance that p is contained in the
confidence interval
• The probability that the interval
contains p is 95%
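• The method used to construct the
interval will produce intervals that
contain p 95% of the time.
(This is the correct interpretation: p is fixed, so the 95% describes
the method, not any one computed interval.)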
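One way to see the "success rate of the method" idea is to simulate many samples and count how often the interval captures p. The sketch below is only an illustration: the true proportion 0.38, the sample size 200, the number of trials, and the seed are all assumed values, not from the slides.

```python
import random
from math import sqrt

def coverage_rate(p_true=0.38, n=200, trials=1000, z_star=1.96, seed=1):
    """Fraction of simulated 95% intervals that contain the true proportion."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and count the successes.
        successes = sum(rng.random() < p_true for _ in range(n))
        p_hat = successes / n
        margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p_true <= p_hat + margin:
            hits += 1
    return hits / trials

rate = coverage_rate()
print(f"Share of intervals containing p: {rate:.3f}")
```

The printed share should land close to 0.95, though it varies with the seed and the number of trials.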
A May 2000 Gallup Poll found that
38% of a random sample of 1012
adults said that they believe in ghosts.
Find a 95% confidence interval for the
true proportion of adults who believe
in ghosts.
Assumptions:
Step 1: check assumptions!
•Have an SRS of adults
•n p̂ = 1012(.38) = 384.56 & n(1- p̂ ) = 1012(.62) = 627.44
Since both are greater than 10, the distribution can be
approximated by a normal curve
•Population of adults is at least 10,120.
Step 2: make calculations
$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .38 \pm 1.96 \sqrt{\frac{.38(.62)}{1012}} = (.35, .41)$$
Step 3: conclusion in context
We are 95% confident that the true proportion of adults
who believe in ghosts is between 35% and 41%.
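The three steps for the Gallup example can be sketched in Python. The function name `one_proportion_ci` is hypothetical; the numbers are the ones from the example.

```python
from math import sqrt

def one_proportion_ci(p_hat, n, z_star=1.96):
    """Confidence interval p_hat ± z* · sqrt(p_hat(1 - p_hat)/n)."""
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

n, p_hat = 1012, 0.38

# Step 1: normality check using p-hat, since p is unknown.
normal_ok = n * p_hat > 10 and n * (1 - p_hat) > 10

# Step 2: calculations.
low, high = one_proportion_ci(p_hat, n)
print(normal_ok, f"({low:.2f}, {high:.2f})")  # True (0.35, 0.41)
```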
Another Gallup Poll is taken in order to
measure the proportion of adults who
approve of attempts to clone humans. What
sample size is necessary to be within ± 0.04
of the true proportion of adults who approve
of attempts to clone humans with a 95%
Confidence Interval?
To find sample size:
$$E = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
However, since we have not yet taken a
sample, we do not know a p-hat (or p)
to use!
What p-hat (p) do you use when trying
to find the sample size for a given
margin of error?
.1(.9) = .09
.2(.8) = .16
.3(.7) = .21
.4(.6) = .24
.5(.5) = .25
By using .5 for p-hat, we
are using the worst-case
scenario and using the
largest SD in our
calculations.
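The products listed above can be checked quickly. This is a small sketch with an assumed list of candidate values for p-hat, showing that p(1 - p) is largest at .5.

```python
# Candidate values of p-hat (an assumed list for illustration).
candidates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# p(1 - p) for each candidate; a larger product gives a larger SD.
products = {p: p * (1 - p) for p in candidates}
worst_case = max(products, key=products.get)

for p, product in products.items():
    print(f"{p:.1f}({1 - p:.1f}) = {product:.2f}")
print("Largest product at p-hat =", worst_case)
```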
Another Gallup Poll is taken in order to measure the
proportion of adults who approve of attempts to clone
humans. What sample size is necessary to be within ± 0.04 of
the true proportion of adults who approve of attempts to
clone humans with a 95% Confidence Interval?
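$$
\begin{aligned}
E &= z^* \sqrt{\frac{p(1-p)}{n}} \\
.04 &= 1.96 \sqrt{\frac{.5(.5)}{n}} && \text{Use p-hat = .5} \\
\frac{.04}{1.96} &= \sqrt{\frac{.5(.5)}{n}} && \text{Divide by 1.96} \\
\left(\frac{.04}{1.96}\right)^2 &= \frac{.25}{n} && \text{Square both sides} \\
n &= 600.25 \rightarrow 601 && \text{Round up on sample size}
\end{aligned}
$$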
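Solving the margin-of-error formula for n gives n = p(1 - p)(z*/E)². The sketch below applies it; the function name and the `p_guess` parameter are made up for illustration.

```python
from math import ceil

def sample_size_for_margin(margin, z_star, p_guess=0.5):
    """Smallest n whose margin of error is at most `margin` (always round up)."""
    return ceil(p_guess * (1 - p_guess) * (z_star / margin) ** 2)

print(sample_size_for_margin(0.04, 1.96))  # 601
```

Rounding up rather than to the nearest integer keeps the margin of error at or below the target.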
Hypothesis Tests
One Sample
Proportions
Example 1: Julie and Megan
wonder if heads and tails are
equally likely if a penny is spun.
They spin pennies 40 times and
get 17 heads. Should they
reject the standard that
pennies land heads 50% of the
time?
What is their sample proportion?
How can I tell if pennies really
land heads 50% of the time?
Hypothesis test will help me decide!
But how do I know if this p̂ is one
that I expect to happen or is it
one that is unlikely to happen?
What are hypothesis tests?
Calculations that tell us if a value
occurs by random chance or not – if
it is statistically significant
Is it . . .
– a random occurrence due to
variation?
– a biased occurrence due to some
other reason?
Nature of hypothesis tests
• First begin by supposing the
“effect” is NOT present
• Next, see if data provides
evidence against the
supposition
Example: murder trial
How does a murder trial work?
First - assume that the
person is innocent
Then - must have sufficient
evidence to prove guilty
Steps:
1) Assumptions
2) Hypothesis statements &
define parameters
3) Calculations
4) Conclusion, in context
Notice the steps are the
same except we add
hypothesis statements,
which you will learn today
Assumptions for z-test:
• Have an SRS from a binomial
distribution
• Distribution is (approximately)
normal
np ≥ 10
n(1- p) ≥ 10
• N > 10n
YES - These are the same
assumptions as confidence
intervals!!
Use the hypothesized parameter
in the null hypothesis
to check assumptions!
Example 1: Julie and Megan wonder
if heads and tails are equally likely if
a penny is spun. They spin pennies
40 times and get 17 heads. Should
they reject the standard that
pennies land heads 50% of the time?
Are the assumptions met?
• Binomial Random Sample
• 40(.5) >10 and 40(1-.5) >10
• Infinite number of spins > 10(40)
Writing Hypothesis statements:
• Null hypothesis – is the statement
being tested; this is a statement of
“no effect” or “no difference”
H0:
• Alternative hypothesis – is the
statement that we suspect is true
Ha:
The form:
Null hypothesis
H0: parameter = hypothesized value
Alternative hypothesis
Ha: parameter ≠ hypothesized value
Ha: parameter > hypothesized value
Ha: parameter < hypothesized value
Example 1 Contd.: Julie and Megan
wonder if heads and tails are equally
likely if a penny is spun. They spin
pennies 40 times and get 17 heads.
Should they reject the standard
that pennies land heads 50% of the time?
State the hypotheses :
H0: p = .5
Ha: p ≠ .5
Where p is the true
proportion of heads
Example 2:
A company is willing to renew its
advertising contract with a local radio station only if
the station can prove that more than 20% of the
residents of the city have heard the ad and recognize
the company’s product. The radio station conducts a
random sample of 400 people and finds that 90 have
heard the ad and recognize the product. Is this
sufficient evidence for the company to renew its
contract? State the hypotheses :
H0: p = .2
Ha: p > .2
Where p is the
true proportion
that heard the ad.
Formula for hypothesis test:
Test statistic = (statistic - parameter) / (SD of statistic)
$$z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}$$
Example 1 Contd. Test Statistics for
Julie and Megan’s Data
Test statistic = (statistic - parameter) / (SD of statistic)
$$z = \frac{.425 - .5}{\sqrt{\frac{.5(1-.5)}{40}}} = -0.95$$
P-values
• The probability, assuming the null
hypothesis is true, that the test
statistic would have a value as
extreme as or more extreme than what
is actually observed
Level of significance
• Is the amount of evidence
necessary before we begin to doubt
that the null hypothesis is true
• Is the probability that we will
reject the null hypothesis, assuming
that it is true
• Denoted by α
– Can be any value
– Usual values: 0.1, 0.05, 0.01
– Most common is 0.05
Statistically significant –
• The p-value is as small or smaller
than the level of significance (α)
• If p-value > α, “fail to reject” the null
hypothesis at the α level.
• If p-value < α, “reject” the null
hypothesis at the α level.
Facts about p-values:
• ALWAYS make decision about the null
hypothesis!
• Large p-values show support for the
null hypothesis, but never that it is
true!
• Small p-values show support that the
null is not true.
• Double the p-value for two-tail (≠)
tests
• Never accept the null hypothesis!
At an α level of .05, would you
reject or fail to reject H0 for
the given p-values?
a) .03     Reject
b) .15     Fail to reject
c) .45     Fail to reject
d) .023    Reject
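The decision rule is a single comparison. This minimal sketch uses a hypothetical helper `decide` and treats a p-value equal to α as significant, following the "as small or smaller" wording.

```python
def decide(p_value, alpha=0.05):
    """Return the decision about H0 for a given p-value and significance level."""
    return "Reject" if p_value <= alpha else "Fail to reject"

for label, p_value in zip("abcd", (0.03, 0.15, 0.45, 0.023)):
    print(f"{label}) {p_value}: {decide(p_value)}")
```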
Writing Conclusions:
1) A statement of the decision
being made (reject or fail to
reject H0) & why (linkage)
AND
2) A statement of the results in
context. (state in terms of Ha)
“Since the p-value < (>) α,
I reject (fail to reject)
the H0. I do (do not)
have statistically
significant evidence to
suggest that Ha.”
Be sure to write Ha in
context (words)!
Example 1 Contd. The Decision
$$z = \frac{.425 - .5}{\sqrt{\frac{.5(1-.5)}{40}}} = -0.95$$
P-Value = .342
Compare the P-Value to the Alpha Level
.342 > .05
Since the P-Value is greater than the alpha level I fail to
reject that spinning a penny lands heads 50% of the
time. I do not have statistically significant evidence to
suggest that spinning a penny is anything other than fair.
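Julie and Megan's test can be reproduced with a short Python sketch. The function name and the `alternative` argument are made up for illustration; the standard deviation uses the hypothesized p, as the slides require.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0, alternative="two-sided"):
    """z statistic and p-value for H0: p = p0, using p0 in the standard deviation."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    cdf = NormalDist().cdf
    if alternative == "greater":
        p_value = 1 - cdf(z)
    elif alternative == "less":
        p_value = cdf(z)
    else:
        p_value = 2 * cdf(-abs(z))  # double the one-tail area
    return z, p_value

z, p_value = one_proportion_z_test(17, 40, 0.5)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

The p-value printed here can differ from .342 in the third decimal place, because the slide looks up the p-value from z rounded to -0.95.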
What? You and Jeff
spun your pennies and got
10 heads out of 40 spins?
Well, that's not what Meg
and I got. So what now?
You Decide
Joe and Jeff decide to test the
same hypothesis but gather
their own evidence. They spin
pennies 40 times and get 10
heads. Should they reject the
standard that pennies land
heads 50% of the time?
We DID NOT reject!
But we DID reject!
Who is Correct?
BOTH OF THEM!!!
Conclusions are based on
your data. It is important,
however, to discuss possible
ERRORS that could have
been made.
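To see how the two pairs reach different decisions from the same hypotheses, here is a rough sketch that runs both data sets through the same two-sided test. The helper name `two_sided_p_value` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(heads, spins, p0=0.5):
    """Two-sided p-value for H0: p = p0 using the one-proportion z-test."""
    p_hat = heads / spins
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / spins)
    return 2 * NormalDist().cdf(-abs(z))

alpha = 0.05
for names, heads in (("Julie and Megan", 17), ("Joe and Jeff", 10)):
    p_value = two_sided_p_value(heads, 40)
    decision = "reject" if p_value <= alpha else "fail to reject"
    print(f"{names}: p-value = {p_value:.4f}, {decision} H0")
```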
Errors in Hypothesis Tests
Every time you make a decision there is a possibility
that an error occurred.
ERRORS
Murder Trial Revisited

Decision                      H0 is True             H0 is False
                              (Actually Innocent)    (Actually Guilty)
Reject (Guilty)               Type I Error           Correct
Fail to Reject (Not Guilty)   Correct                Type II Error
Type I Error
When you reject a null hypothesis
when it is actually true.
Its probability is denoted by alpha (α)
-the level of significance of a test
Type II Error
When you fail to reject the null
hypothesis when it is false
Its probability is denoted by beta (β)
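Both error probabilities can be computed exactly for the penny test (n = 40, two-sided, α = .05) by summing binomial probabilities over the rejection region. This is a sketch under assumptions: the alternative p = 0.4 used for the Type II calculation is a hypothetical true value chosen for illustration, and the function names are made up.

```python
from math import comb, sqrt

def rejects(successes, n, p0=0.5, z_star=1.96):
    """True if the two-sided z-test rejects H0: p = p0 for this count."""
    z = (successes / n - p0) / sqrt(p0 * (1 - p0) / n)
    return abs(z) > z_star

def rejection_probability(p_true, n=40):
    """Exact binomial probability that the test rejects when the true proportion is p_true."""
    return sum(comb(n, k) * p_true**k * (1 - p_true)**(n - k)
               for k in range(n + 1) if rejects(k, n))

type_1_rate = rejection_probability(0.5)      # H0 true, yet rejected
type_2_rate = 1 - rejection_probability(0.4)  # H0 false (assumed p = 0.4), not rejected
print(f"Type I rate: {type_1_rate:.3f}, Type II rate: {type_2_rate:.3f}")
```

The Type I rate comes out somewhat below .05 because the number of heads is discrete. The Type II rate depends entirely on the assumed true proportion.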
Example 2 Revisited: A company is willing
to renew its advertising contract with a local
radio station only if the station can prove that
more than 20% of the residents of the city have
heard the ad and recognize the company’s
product. The radio station conducts a random
sample of 400 people and finds that 90 have
heard the ad and recognize the product. Is this
sufficient evidence for the company to renew its
contract?
Assumptions:
•Have an SRS of people
•np = 400(.2) = 80 & n(1-p) = 400(.8) = 320 - Since both are greater than
10, this distribution is approximately normal.
•Population of people is at least 4000.
Use the parameter in the null hypothesis to
check assumptions!
H0: p = .2
Ha: p > .2
where p is the true proportion of people who heard the ad
$$z = \frac{.225 - .2}{\sqrt{\frac{.2(.8)}{400}}} = 1.25$$
p-value = .1056
α = .05
Use the parameter in the null hypothesis to
calculate standard deviation!
Since the p-value >α, I fail to reject the null hypothesis. There is not
sufficient evidence to suggest that the true proportion of people who heard
the ad is greater than .2.
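The radio station calculation is a one-sided (upper-tail) test. Here is a minimal sketch with a hypothetical function name, using the hypothesized p = .2 in the standard deviation.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_z_test(successes, n, p0):
    """z statistic and p-value for H0: p = p0 against Ha: p > p0."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 1 - NormalDist().cdf(z)  # area to the right of z
    return z, p_value

z, p_value = upper_tail_z_test(90, 400, 0.2)
alpha = 0.05
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {p_value <= alpha}")
```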
What type of error could the
radio station have made?
Type I
OR
Type II
Two-Sample
Proportions Inference
Sampling Distributions for the
difference in proportions
When tossing pennies, the probability of the coin landing on heads
is 0.5. However, when spinning the coin, the probability of the
coin landing on heads is 0.4. Let’s investigate.
Looking at the sampling distribution of the difference in
sample proportions:
•What is the mean of the difference in sample proportions (flip - spin)?
$$\mu_{\hat{p}_f - \hat{p}_s} = 0.1$$
•What is the standard deviation of the difference in sample
proportions (flip - spin)?
$$\sigma_{\hat{p}_f - \hat{p}_s} = 0.14$$
•Can the sampling distribution of difference in sample proportions
(flip - spin) be approximated by a normal distribution?
Yes, since n1p1=12.5, n1(1-p1)=12.5, n2p2=10, n2(1-p2)=15 (so all are at least 5)
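The mean and standard deviation above follow from the usual formulas for a difference of independent sample proportions. In the sketch below, the sample sizes of 25 flips and 25 spins are inferred from n1p1 = 12.5 and n2p2 = 10; they are not stated on the slide.

```python
from math import sqrt

def diff_sampling_distribution(p1, n1, p2, n2):
    """Mean and SD of p1-hat minus p2-hat for two independent samples."""
    mean = p1 - p2
    sd = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return mean, sd

# Flip: p = 0.5; spin: p = 0.4; n = 25 each (inferred, see above).
mean, sd = diff_sampling_distribution(0.5, 25, 0.4, 25)
print(f"mean = {mean:.2f}, sd = {sd:.2f}")  # mean = 0.10, sd = 0.14
```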
Assumptions:
• Two, independent SRS’s from
populations
• Populations at least 10n
• Normal approximation for both
n1p1 ≥ 5
n1(1 - p1) ≥ 5
n2p2 ≥ 5
n2(1 - p2) ≥ 5
Formula for confidence interval:
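CI = statistic ± (critical value) × (SD of statistic)
$$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
Margin of error! (everything after the ±)
Standard error! (the square-root term)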
Note: use p-hat when p is not known
Example 1: At Community Hospital, the burn center is
experimenting with a new plasma compress treatment.
A random sample of 316 patients with minor burns
received the plasma compress treatment. Of these
patients, it was found that 259 had no visible scars after
treatment. Another random sample of 419 patients with
minor burns received no plasma compress treatment.
For this group, it was found that 94 had no visible scars
after treatment. What is the shape & standard error of the
sampling distribution of the difference in the proportions
of people with visible scars between the two groups?
Since n1p1=259, n1(1-p1)=57, n2p2=94, n2(1-p2)=325 and all > 5,
then the distribution of difference in proportions is
approximately normal.
$$S.E. = \sqrt{\frac{.82(.18)}{316} + \frac{.22(.78)}{419}} = 0.0296$$
Example 1: At Community Hospital, the burn center is
experimenting with a new plasma compress treatment.
A random sample of 316 patients with minor burns
received the plasma compress treatment. Of these
patients, it was found that 259 had no visible scars after
treatment. Another random sample of 419 patients with
minor burns received no plasma compress treatment.
For this group, it was found that 94 had no visible scars
after treatment. What is a 95% confidence interval of
the difference in proportion of people who had no visible
scars between the plasma compress treatment & control
group?
Assumptions:
•Have 2 independent SRS of burn patients
•Both distributions are approximately normal since n1p1=259, n1(1-p1)=57,
n2p2=94, n2(1-p2)=325 and all > 5
•Population of burn patients is at least 7350.
Since these are all burn patients, we can add 316 + 419 = 735.
If not the same, you MUST list separately.
$$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
$$(.82 - .22) \pm 1.96 \sqrt{\frac{.82(.18)}{316} + \frac{.22(.78)}{419}} = (.537, .654)$$
(The endpoints are computed from the unrounded proportions 259/316 and 94/419.)
We are 95% confident that the true difference in the proportion of people
who had no visible scars between the plasma compress treatment &
control group is between 53.7% and 65.4%
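The interval for the burn center data can be reproduced from the raw counts. This is a minimal sketch with a hypothetical function name. It uses the unrounded sample proportions, which is why the endpoints differ slightly from what .82 and .22 would give.

```python
from math import sqrt

def two_proportion_ci(x1, n1, x2, n2, z_star=1.96):
    """CI for p1 - p2, using each sample's own p-hat in the standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    standard_error = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    margin = z_star * standard_error
    difference = p1_hat - p2_hat
    return difference - margin, difference + margin

low, high = two_proportion_ci(259, 316, 94, 419)
print(f"({low:.3f}, {high:.3f})")  # (0.537, 0.654)
```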
Example 2: Suppose that researchers want
to estimate the difference in proportions of
people who are against the death penalty in
Texas & in California. If the two sample
sizes are the same, what size sample is
needed to be within 2% of the true
difference at 90% confidence?
$$.02 = 1.645 \sqrt{\frac{.5(.5)}{n} + \frac{.5(.5)}{n}}$$
Since both n’s are the same size, you
have common denominators, so add!
$$.02 = 1.645 \sqrt{\frac{.25 + .25}{n}}$$
n = 3383
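With equal sample sizes, solving for n gives n = (p1(1 - p1) + p2(1 - p2))(z*/E)². The sketch below uses the worst-case .5 for both proportions; the function name is made up for illustration.

```python
from math import ceil

def equal_sample_size_for_difference(margin, z_star, p1=0.5, p2=0.5):
    """Per-group n so the margin of error for p1 - p2 is at most `margin`."""
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)  # common denominator n, so add
    return ceil(variance_sum * (z_star / margin) ** 2)

print(equal_sample_size_for_difference(0.02, 1.645))  # 3383
```

This is the size needed in each group, not the total across both.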
Example 3: Researchers comparing the effectiveness of two pain
medications randomly selected a group of patients who had been
complaining of a certain kind of joint pain. They randomly
divided these people into two groups, and then administered the
painkillers. Of the 112 people in the group who received
medication A, 84 said this pain reliever was effective. Of the 108
people in the other group, 66 reported that pain reliever B was
effective. (BVD, p. 435)
a) Construct separate 95% confidence intervals for the proportion
of people who reported that the pain reliever was effective. Based
on these intervals how do the proportions of people who reported
pain relief with medication A or medication B compare?
CIA = (.67, .83)
CIB = (.52, .70)
Since the intervals overlap, it appears that there
is no difference in the proportion of people who
reported pain relief between the two medicines.
b) Construct a 95% confidence interval for the difference in the
proportions of people who may find these medications effective.
CI = (0.017, 0.261)
Since zero is not in the interval, there is a
difference in the proportion of people who
reported pain relief between the two
medicines.
SO, which is correct?
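Both approaches in Example 3 can be computed side by side. This is a rough sketch with hypothetical function names. It shows the separate intervals overlapping while the interval for the difference excludes zero.

```python
from math import sqrt

Z_STAR = 1.96  # critical value for 95% confidence

def proportion_ci(successes, n):
    """One-sample 95% CI for a proportion."""
    p_hat = successes / n
    margin = Z_STAR * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def difference_ci(x1, n1, x2, n2):
    """Two-sample 95% CI for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    margin = Z_STAR * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - margin, (p1 - p2) + margin

ci_a = proportion_ci(84, 112)
ci_b = proportion_ci(66, 108)
ci_diff = difference_ci(84, 112, 66, 108)

intervals_overlap = ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]
zero_in_diff = ci_diff[0] <= 0 <= ci_diff[1]
print(intervals_overlap, zero_in_diff)  # True False
```

The interval for the difference is the procedure built for comparing two proportions. Checking whether two separate intervals overlap is a more conservative comparison, which is why the two can disagree.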
Hypothesis statements:
• H0: p1 = p2
• Ha: p1 > p2
• Ha: p1 < p2
• Ha: p1 ≠ p2
Be sure to
define both p1
& p2!
Since we assume that the
population proportions are equal
in the null hypothesis, the
variances are equal.
Therefore, we pool the
variances!
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$
Formula for Hypothesis test:
Test statistic = (statistic - parameter) / (SD of statistic)
p1 = p2
So . . .
p1 - p2 = 0
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
Example 4: A forest in Oregon has an infestation of
spruce moths. In an effort to control the moth, one
area has been regularly sprayed from airplanes. In
this area, a random sample of 495 spruce trees
showed that 81 had been killed by moths. A second
nearby area receives no treatment. In this area, a
random sample of 518 spruce trees showed that 92
had been killed by the moth. Do these data indicate
that the proportion of spruce trees killed by the moth
is different for these areas?
Assumptions:
•Have 2 independent SRS of spruce trees
•Both distributions are approximately normal since n1p1=81, n1(1-p1)=414,
n2p2=92, n2(1-p2)=426 and all > 5
•Population of spruce trees is at least 10,130.
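H0: p1=p2
Ha: p1≠p2
where p1 is the true proportion of trees killed by moths
in the treated area and p2 is the true proportion of trees
killed by moths in the untreated area
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{.16 - .18}{\sqrt{.17(.83)\left(\frac{1}{495} + \frac{1}{518}\right)}} = -0.59$$
(z is computed from the unrounded proportions 81/495, 92/518 and the pooled 173/1013; the values shown are rounded.)
P-value = 0.5547
α = 0.05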
Since p-value > α, I fail to reject H0. There is not sufficient evidence to
suggest that the proportion of spruce trees killed by the moth is different
for these areas.
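The spruce moth test can be sketched in Python with the pooled proportion in the standard error. The function name is hypothetical. The calculation works from the raw counts, so it uses unrounded proportions throughout.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sided z-test for H0: p1 = p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # pooled p-hat under H0
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / standard_error
    p_value = 2 * NormalDist().cdf(-abs(z))  # two-tail: double the tail area
    return z, p_value

z, p_value = two_proportion_z_test(81, 495, 92, 518)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

The result should be close to z = -0.59 with a p-value near 0.55, matching the decision to fail to reject H0 at α = 0.05.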