Simple and Effective Confidence Intervals for Proportions and Differences of Proportions
Result from Adding Two Successes and Two Failures
Author(s): Alan Agresti and Brian Caffo
Source: The American Statistician, Vol. 54, No. 4 (Nov., 2000), pp. 280-288
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2685779
Teacher's Corner
The standard confidence intervals for proportions and their differences used in introductory statistics courses have poor performance, the actual coverage probability often being much lower than intended. However, simple adjustments of these intervals based on adding four pseudo observations, half of each type, perform surprisingly well even for small samples. To illustrate, for a broad variety of parameter settings with 10 observations in each sample, a nominal 95% interval for the difference of proportions has actual coverage probability below .93 in 88% of the cases with the standard interval but in only 1% with the adjusted interval; the mean distance between the nominal and actual coverage probabilities is .06 for the standard interval, but .01 for the adjusted one. In teaching with these adjusted intervals, one can bypass awkward sample size guidelines and use the same formulas with small and large samples.

KEY WORDS: Binomial distribution; Score test; Small sample; Wald test.

Alan Agresti is Professor, and Brian Caffo is a Graduate Student, Department of Statistics, University of Florida, Gainesville, FL 32611-8545 (Email: [email protected]). This work was partially supported by grants from the National Institutes of Health and the National Science Foundation. The authors appreciate helpful comments from Brent Coull and Yongyi Min.

1. INTRODUCTION

Let $X$ denote a binomial variate for $n$ trials with parameter $p$, denoted bin$(n, p)$, and let $\hat p = X/n$ denote the sample proportion. For two independent samples, let $X_1$ be bin$(n_1, p_1)$, and let $X_2$ be bin$(n_2, p_2)$. Let $z_a$ denote the $1 - a$ quantile of the standard normal distribution. Nearly all elementary statistics textbooks present the following confidence intervals for $p$ and $p_1 - p_2$:

* An approximate $100(1 - \alpha)\%$ confidence interval for $p$ is

$$\hat p \pm z_{\alpha/2}\sqrt{\frac{\hat p(1 - \hat p)}{n}}. \qquad (1)$$

* An approximate $100(1 - \alpha)\%$ confidence interval for $p_1 - p_2$ is

$$(\hat p_1 - \hat p_2) \pm z_{\alpha/2}\sqrt{\frac{\hat p_1(1 - \hat p_1)}{n_1} + \frac{\hat p_2(1 - \hat p_2)}{n_2}}. \qquad (2)$$

These confidence intervals result from inverting large-sample Wald tests, which evaluate standard errors at the maximum likelihood estimates. For instance, the interval for $p$ is the set of $p_0$ values for which $|\hat p - p_0|/\sqrt{\hat p(1 - \hat p)/n} < z_{\alpha/2}$; that is, the set of $p_0$ having $P$-value exceeding $\alpha$ in testing $H_0\colon p = p_0$ against $H_a\colon p \neq p_0$ using the approximately normal test statistic. The intervals are sometimes called Wald intervals. Although these intervals are simple and natural for students who have previously seen analogous large-sample formulas for means, a considerable literature shows that they behave poorly (e.g., Ghosh 1979; Vollset 1993; Newcombe 1998a, 1998b). This can be true even when the sample size is very large (Brown, Cai, and DasGupta 1999). In this article, we describe simple adjustments of these intervals that perform much better but can be easily taught in the typical non-calculus-based statistics course.

These references showed that a much better confidence interval for a single proportion is based on inverting the test with standard error evaluated at the null hypothesis, which is the score test approach. This confidence interval, due to Wilson (1927), is the set of $p_0$ values for which $|\hat p - p_0|/\sqrt{p_0(1 - p_0)/n} < z_{\alpha/2}$, which is

$$\left(\hat p\,\frac{n}{n + z_{\alpha/2}^2} + \frac{1}{2}\,\frac{z_{\alpha/2}^2}{n + z_{\alpha/2}^2}\right) \pm z_{\alpha/2}\sqrt{\frac{1}{n + z_{\alpha/2}^2}\left[\hat p(1 - \hat p)\,\frac{n}{n + z_{\alpha/2}^2} + \frac{1}{2}\cdot\frac{1}{2}\cdot\frac{z_{\alpha/2}^2}{n + z_{\alpha/2}^2}\right]}.$$
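The two single-sample formulas above are easy to check numerically. The following is a minimal Python sketch using only the standard library; the function names (`z_quantile`, `wald_interval`, `score_interval`) are our own illustrative choices, not from the article. It codes interval (1) directly and writes the Wilson score interval in the weighted-average form shown above; the `__main__` block then checks, for a hypothetical sample with $x = 3$ successes in $n = 10$ trials, that each score endpoint $p_0$ makes the score statistic equal to $z_{\alpha/2}$.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def z_quantile(conf: float = 0.95) -> float:
    """Return z_{alpha/2}, the 1 - alpha/2 quantile of the standard normal."""
    alpha = 1.0 - conf
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def wald_interval(x: int, n: int, conf: float = 0.95) -> Tuple[float, float]:
    """Wald interval (1): p_hat +/- z * sqrt(p_hat (1 - p_hat) / n)."""
    z = z_quantile(conf)
    p_hat = x / n
    half = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half, p_hat + half


def score_interval(x: int, n: int, conf: float = 0.95) -> Tuple[float, float]:
    """Wilson score interval, written in its weighted-average form."""
    z = z_quantile(conf)
    z2 = z * z
    p_hat = x / n
    w = n / (n + z2)  # weight on p_hat; the remaining 1 - w goes to 1/2
    mid = w * p_hat + (1.0 - w) * 0.5
    # weighted average of p_hat(1 - p_hat) and (1/2)(1/2), over n + z^2
    var = (w * p_hat * (1.0 - p_hat) + (1.0 - w) * 0.25) / (n + z2)
    half = z * sqrt(var)
    return mid - half, mid + half


if __name__ == "__main__":
    z = z_quantile()
    lo, hi = score_interval(3, 10)
    for p0 in (lo, hi):
        # score statistic |p_hat - p0| / sqrt(p0 (1 - p0) / n) at each endpoint
        stat = abs(0.3 - p0) / sqrt(p0 * (1 - p0) / 10)
        print(f"p0 = {p0:.4f}, score statistic = {stat:.4f}, z = {z:.4f}")
    print("Wald interval for x = 0, n = 10:", wald_interval(0, 10))
```

The last line shows the degenerate Wald interval (0, 0) for $x = 0$; the score interval for $x = 0$ still has a positive upper limit, $z_{\alpha/2}^2/(n + z_{\alpha/2}^2)$.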
The midpoint is a weighted average of $\hat p$ and 1/2, and it equals the sample proportion after adding $z_{\alpha/2}^2$ pseudo observations, half of each type. The square of the coefficient of $z_{\alpha/2}$ in this formula is a weighted average of the variance of a sample proportion when $p = \hat p$ and the variance of a sample proportion when $p = 1/2$, using $n + z_{\alpha/2}^2$ in place of the usual sample size $n$. For the 95% case, Agresti and Coull (1998) used this representation to motivate approximating the score interval by the ordinary Wald interval (1) after adding $z_{.025}^2 = 1.96^2 \approx 4$ pseudo observations, two of each type. That is, their adjusted "add two successes and two failures" interval has the simple form

$$\tilde p \pm z_{.025}\sqrt{\frac{\tilde p(1 - \tilde p)}{\tilde n}} \qquad (3)$$

but with $\tilde n = n + 4$ trials and $\tilde p = (X + 2)/(n + 4)$. The midpoint equals that of the 95% score confidence interval (rounding $z_{.025}$ to 2.0 for that interval), but the coefficient of $z_{.025}$ uses the variance $p(1 - p)/\tilde n$ at the weighted average of $\hat p$ and 1/2 rather than the weighted average of the variances; by Jensen's inequality, the adjusted interval is wider than the score interval.

[Figure 1. Coverage probabilities for the binomial parameter $p$ with the nominal 95% and 99% Wald confidence interval and the adjusted interval based on adding four pseudo observations, for $n = 5$, 10, 20. Each panel plots coverage probability against $p$.]

© 2000 American Statistical Association

For small samples, the improvement in performance of the adjusted interval compared to the ordinary Wald interval is dramatic. To illustrate, Figure 1 shows the actual coverage probabilities for the nominal 95% Wald and adjusted intervals, plotted as a function of $p$, for $n = 5$, 10, and 20. For all $n$, great improvement occurs for $p$ near 0 or 1. For instance, Brown et al. (1999) stated that when $p = .01$, the size of $n$ required such that the actual coverage probability of a nominal 95% Wald interval is uniformly at least .94 for all $n$ above that value is $n = 7963$, whereas for the adjusted interval this is true for every $n$; when $p = .10$ the values are $n = 646$ for the Wald interval and $n = 11$ for the adjusted interval. The Wald interval behaves especially poorly with small $n$ for $p$ near the boundary, partly because of the nonnegligible probability of having $\hat p = 0$ or 1 and thus the degenerate interval [0, 0] or [1, 1]. Agresti and Coull (1998) recommended the adjusted interval for use in elementary statistics courses, since the Wald interval behaves poorly yet the score interval is too complex for most students. Many students in non-calculus-based courses are mystified by quadratic equations (which are needed to solve for the score interval) and would have difficulty using the weighted average formula above. In such courses, it is often easier to show how to adapt a simple method so that it works well rather than to present a more complex method.
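Coverage curves like those in Figure 1 can be computed exactly, because $X$ takes only $n + 1$ values: the coverage at a given $p$ is the total binomial probability of the outcomes $x$ whose interval contains $p$. The sketch below is our own illustration in Python (standard library only; the names `wald`, `adjusted`, and `coverage` are hypothetical, not from the article). The parameter `t` adds $t/2$ successes and $t/2$ failures, so the default `t=4` gives interval (3); no truncation to [0, 1] is applied, which does not change whether a $p$ in [0, 1] is covered.

```python
from math import comb, sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # z_{.025}, about 1.96


def wald(x, n, z=Z):
    """Ordinary Wald interval (1) for a single proportion."""
    p_hat = x / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def adjusted(x, n, t=4, z=Z):
    """Wald interval after adding t/2 successes and t/2 failures.

    With t = 4 this is the "add two successes and two failures" interval (3).
    """
    n_tilde = n + t
    p_tilde = (x + t / 2) / n_tilde
    half = z * sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return p_tilde - half, p_tilde + half


def coverage(interval, n, p):
    """Exact coverage: sum P(X = x) over the x whose interval contains p."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = interval(x, n)
        if lo <= p <= hi:
            total += comb(n, x) * p**x * (1 - p) ** (n - x)
    return total


if __name__ == "__main__":
    for p in (0.01, 0.1, 0.3, 0.5):
        print(f"n = 10, p = {p}: Wald {coverage(wald, 10, p):.3f}, "
              f"adjusted {coverage(adjusted, 10, p):.3f}")
```

For $n = 10$ and $p = .01$, for example, the Wald interval contains $p$ only when $x = 1$ or 2, whereas the adjusted interval contains it when $x = 0$ or 1, the two outcomes that carry almost all of the probability.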
Let $I_t(n, x)$ denote the adjustment of the Wald interval that adds $t/2$ successes and $t/2$ failures. With confidence levels $(1 - \alpha)$ other than .95, the Agresti and Coull approximation of the score interval uses $I_t(n, x)$ with $t = z_{\alpha/2}^2$ instead of $t = 4$, for instance adding 2.7 pseudo observations for a 90% interval and 6.6 for a 99% interval. Many instructors in elementary courses will find it simpler to tell students to use the same constant for all cases. One will do reasonably well, especially at high nominal confidence levels, by the recipe of always using $t = 4$. The performance of the adjusted interval $I_4(n, x)$ is much better than the Wald interval (1) for the usual confidence levels. To illustrate, Figure 1 also shows coverage probabilities for nominal 99% intervals, when $n = 5$, 10, 20. Since the .95 confidence level is the most common in practice and since this "add two successes and two failures" adjustment provides strong improvement over the Wald interval for other levels as well, it is simplest for elementary courses to recommend that adjustment uniformly. Of the elementary texts that recommend adjustment of the Wald interval by adding pseudo observations, some (e.g., McClave and Sincich 2000) direct students to use $I_4(n, x)$ regardless of the confidence coefficient, whereas others (e.g., Samuels and Witmer 1999) recommend $t = z_{\alpha/2}^2$.

[Figure 2. Boxplots of coverage probabilities for nominal 95% adjusted confidence intervals based on adding $t$ pseudo observations; distributions refer to 10,000 cases, with $n_1$ and $n_2$ each chosen uniformly between 10 and 30 and $p_1$ and $p_2$ chosen uniformly between 0 and 1. Axes: coverage probability versus $t$ pseudo observations.]

Table 1. Summary of Performance of Nominal 95% Confidence Intervals for $p_1 - p_2$ Based on Adding $t$ Pseudo Observations, Averaging With Respect to a Uniform Distribution for $(p_1, p_2)$

| Characteristic | $n$ | $t = 0$ | $t = 2$ | $t = 4$ | $t = 6$ | $t = 8$ | Approximate Bayes | Hybrid Score |
|---|---|---|---|---|---|---|---|---|
| Coverage | 10 | .891 | .949 | .960 | .958 | .945 | .954 | .952 |
| | 20 | .924 | .949 | .956 | .955 | .948 | .953 | .951 |
| | 30 | .933 | .949 | .954 | .954 | .949 | .950 | .951 |
| | 30, 10 | .895 | .948 | .959 | .959 | .950 | .950 | .952 |
| Distance | 10 | .059 | .014 | .013 | .020 | .035 | .014 | .012 |
| | 20 | .026 | .008 | .008 | .012 | .022 | .009 | .007 |
| | 30 | .017 | .006 | .006 | .008 | .016 | .008 | .006 |
| | 30, 10 | .055 | .018 | .012 | .013 | .023 | .010 | .011 |
| Length | 10 | .647 | .670 | .673 | .668 | .659 | .654 | .647 |
| | 20 | .480 | .487 | .488 | .487 | .485 | .481 | .477 |
| | 30 | .398 | .401 | .401 | .401 | .401 | .398 | .396 |
| | 30, 10 | .537 | .551 | .553 | .551 | .545 | .537 | .536 |
| Cov. Prob. < .93 | 10 | .880 | .090 | .010 | .100 | .235 | .072 | .046 |
| | 20 | .404 | .016 | .002 | .046 | .175 | .020 | .008 |
| | 30 | .180 | .005 | .000 | .023 | .131 | .009 | .002 |
| | 30, 10 | .934 | .112 | .004 | .028 | .173 | .029 | .018 |

NOTE: The table reports the mean of the coverage probabilities $C_t(n, p_1; n, p_2)$, the mean of the distances $|C_t(n, p_1; n, p_2) - .95|$ from the nominal level, the mean of the expected interval lengths, and the proportion of cases with $C_t < .93$. Rows labeled "30, 10" have $n_1 = 30$ and $n_2 = 10$.

The purpose of this article is to show that a simple adjustment, adding two successes and two failures (total), also works quite well for two-sample comparisons of proportions. The simple Wald formula (2) improves substantially after adding a pseudo observation of each type to each sample, regarding sample $i$ as $(n_i + 2)$ trials with $\tilde p_i = (X_i + 1)/(n_i + 2)$. There is no reason to expect an optimal interval to result from this method, or in particular from adding the same number of pseudo observations to each sample or even the same number of cases of each type, but we restricted attention to this form because of the simplicity of explaining it in a classroom setting.
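A minimal sketch of the two-sample adjustment follows, written for a general number $t$ of pseudo observations so that it also covers the family $I_t(n_1, x_1; n_2, x_2)$ studied in Section 2: each sample receives $t/4$ successes and $t/4$ failures, $t = 0$ gives the Wald interval (2), and $t = 4$ gives the "add two successes and two failures" interval with $\tilde p_i = (X_i + 1)/(n_i + 2)$. The function name is our own, the truncation to $[-1, 1]$ follows the description in Section 2, and the default $z$ is the 97.5th normal percentile computed with Python's `statistics.NormalDist` rather than a rounded value.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple

Z = NormalDist().inv_cdf(0.975)  # z_{.025}


def two_sample_interval(x1: int, n1: int, x2: int, n2: int,
                        t: float = 4, z: float = Z) -> Tuple[float, float]:
    """Wald interval (2) for p1 - p2 after adding t/4 successes and
    t/4 failures to each sample, truncated to [-1, 1].

    t = 0 reproduces the ordinary Wald interval; t = 4 adds one success
    and one failure to each sample.
    """
    m1, m2 = n1 + t / 2, n2 + t / 2      # adjusted sample sizes
    q1 = (x1 + t / 4) / m1               # adjusted sample proportions
    q2 = (x2 + t / 4) / m2
    diff = q1 - q2
    half = z * sqrt(q1 * (1 - q1) / m1 + q2 * (1 - q2) / m2)
    return max(-1.0, diff - half), min(1.0, diff + half)


if __name__ == "__main__":
    print("Wald, 0/10 vs 0/10:    ", two_sample_interval(0, 10, 0, 10, t=0))
    print("Adjusted, 0/10 vs 0/10:", two_sample_interval(0, 10, 0, 10))
    print("Adjusted, 10/10 vs 0/10:", two_sample_interval(10, 10, 0, 10))
```

With no successes in either of two samples of 10, the Wald interval is the degenerate (0.0, 0.0), while the adjusted interval is symmetric about 0; with 10 of 10 versus 0 of 10, the adjusted upper limit overshoots 1 and is truncated.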
2. COMPARING PERFORMANCE OF WALD INTERVALS AND ADJUSTED INTERVALS

For the two-sample comparison of proportions, we now study the performance of the Wald confidence formula (2) after adding $t$ pseudo observations, $t/4$ of each type to each sample, truncating when the interval for $p_1 - p_2$ contains values $< -1$ or $> 1$. Denote this interval by $I_t(n_1, x_1; n_2, x_2)$, or $I_t$ for short, so $I_0$ denotes the ordinary Wald interval. Our discussion refers mainly to the .95 confidence coefficient, but our evaluations also studied .90 and .99 coefficients. Let $C_t(n_1, p_1; n_2, p_2)$, or $C_t$ for short, denote the true coverage probability of a nominal 95% confidence interval $I_t$. We investigated whether there is a $t$ value for which $|C_t(n_1, p_1; n_2, p_2) - .95|$ tends to be small for most $(p_1, p_2)$, even with small $n_1$ and $n_2$, with $C_t$ rarely very far (say .02) below .95. To explore the performance for a variety of $t$ with small $n_i$, we randomly sampled 10,000 values of $(n_1, p_1; n_2, p_2)$, taking $p_1$ and $p_2$ independently from a uniform distribution over [0, 1] and taking $n_1$ and $n_2$ independently from a uniform distribution over $\{10, 11, \ldots, 30\}$. For each realization we evaluated $C_t(n_1, p_1; n_2, p_2)$ for $t$ between 0 and 8. Figure 2 illustrates results, showing skeletal box plots of $C_t$ for $t = 0, 2, 4, 6, 8$ (i.e., adding 0, .5, 1, 1.5, 2 observations of each type to each sample).

[Figure 3. Proportion of $(p_1, p_2)$ cases, with $p_1$ and $p_2$ chosen uniformly between 0 and 1, for which nominal 95% adjusted confidence intervals based on adding $t$ pseudo observations have actual coverage probabilities below .93, for $n_1 = n_2 = 10$ and for $n_1 = 30$, $n_2 = 10$. Axes: proportion below .93 versus $t$ pseudo observations.]
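The coverage probability $C_t(n_1, p_1; n_2, p_2)$ can be evaluated exactly by summing the product of the two binomial probabilities over the $(n_1 + 1)(n_2 + 1)$ outcomes $(x_1, x_2)$ whose interval $I_t$ contains $p_1 - p_2$. The sketch below is our own reconstruction of that computation under this reading of the definitions, not the authors' code, and the function names are hypothetical. Its `__main__` block mimics the random-sampling design described above on a much smaller scale (200 draws with an arbitrary seed instead of 10,000 draws), so the proportions it prints should not be expected to match Figure 3 or Table 1 exactly.

```python
import random
from math import comb, sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # z_{.025}


def interval_t(x1, n1, x2, n2, t, z=Z):
    """I_t: Wald interval (2) after adding t/4 successes and t/4 failures
    to each sample, truncated to [-1, 1]."""
    m1, m2 = n1 + t / 2, n2 + t / 2
    q1, q2 = (x1 + t / 4) / m1, (x2 + t / 4) / m2
    half = z * sqrt(q1 * (1 - q1) / m1 + q2 * (1 - q2) / m2)
    return max(-1.0, q1 - q2 - half), min(1.0, q1 - q2 + half)


def binom_pmf(n, p):
    """List of P(X = x) for x = 0, ..., n under bin(n, p)."""
    return [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]


def coverage_t(n1, p1, n2, p2, t):
    """Exact C_t(n1, p1; n2, p2): total probability of the outcomes
    (x1, x2) whose interval I_t contains p1 - p2."""
    f1, f2 = binom_pmf(n1, p1), binom_pmf(n2, p2)
    delta = p1 - p2
    total = 0.0
    for x1 in range(n1 + 1):
        for x2 in range(n2 + 1):
            lo, hi = interval_t(x1, n1, x2, n2, t)
            if lo <= delta <= hi:
                total += f1[x1] * f2[x2]
    return total


if __name__ == "__main__":
    rng = random.Random(1)  # arbitrary seed for a small illustrative run
    draws = [(rng.randint(10, 30), rng.random(), rng.randint(10, 30), rng.random())
             for _ in range(200)]
    for t in (0, 2, 4, 6, 8):
        low = sum(coverage_t(n1, p1, n2, p2, t) < 0.93
                  for n1, p1, n2, p2 in draws)
        print(f"t = {t}: proportion of draws with C_t < .93 = {low / len(draws):.3f}")
```

A tiny case is easy to verify by hand: with $n_1 = n_2 = 1$ and $p_1 = p_2 = .5$, the Wald interval covers $p_1 - p_2 = 0$ only for the outcomes (0, 0) and (1, 1), so $C_0 = .5$, while $I_4$ covers it for all four outcomes, so $C_4 = 1$.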
The ordinary 95% Wald interval behaves poorly. Its coverage probabilities tend to be too small, and they converge to 0 as each $p_i$ moves toward 1 or 0. The coverages for $I_t$ improve greatly for the positive values of $t$. The case $I_4$ with four pseudo observations behaves especially well, having relatively few poor coverage probabilities. For instance, the proportions of cases for $t = (0, 2, 4, 6, 8)$ that had $C_t < .93$ were (.572, .026, .002, .046, .171). Similarly, the proportions of nominal 99% intervals that had actual coverage probability below .97 were (.310, .012, .000, .000, .000), and the proportions of nominal 90% intervals that had actual coverage probability below .88 were (.623, .045, .016, .131, .255). The pattern exhibited here is illustrative of a variety of results from analyzing $C_t$ more closely, as we now discuss.

[Figure 4. Coverage probabilities for nominal 95% Wald and adjusted confidence intervals (adding $t = 4$ pseudo observations) as a function of $p_1$ when $p_2 = .1, .3, .5$, with $n_1 = n_2 = 20$.]

We analyzed the performance of the $I_t$ interval for various fixed $(n_1, n_2)$ combinations. Table 1 summarizes some characteristics, in an average sense based on taking $(p_1, p_2)$ uniform over the unit square, for $(n_1, n_2) = (10, 10), (20, 20), (30, 30), (30, 10)$. Although the adjusted interval $I_4$ tends to be conservative, it compares well to the other cases in the mean of the distances $|C_t - .95|$ and especially in the proportion of cases for which $C_t < .93$. For $n_1 = n_2 = 10$, for instance, the actual coverage probability is below .93 for 88% of such cases with the Wald interval, but for only 1% of them with $I_4$. Figure 3 shows the proportions of coverage probabilities that are below .93 as a function of $t$, for $(n_1, n_2) = (10, 10)$ and $(30, 10)$. The improvement over the ordinary Wald interval from adding $t = 4$ pseudo observations is substantial. The remaining figures concentrate on this particular adjustment, which fared well in a variety of evaluations we conducted.

Averaging performance over the unit square for $(p_1, p_2)$ can mask poor behavior in certain regions, and in practice certain pairings (e.g., $|p_1 - p_2|$ small) are often more common or more important than others. Thus, besides studying these summary expectations, we plotted $C_t$ as a function of $p_1$ for various fixed values of $p_2$, $p_1 - p_2$, and $p_1/p_2$. To illustrate, Figure 4 plots the Wald coverage $C_0$ and the coverage $C_4$ for the adjusted interval, fixing $p_2$ at .1, .3, and .5, for $n_1 = n_2 = 20$. The poor coverage spikes for the Wald interval disappear with $I_4$, but this adjustment is quite conservative when $p_1$ and $p_2$ are both close to 0 or both close to 1. The adjustment $I_4$ performs reasonably well, and much better than the Wald interval, even with very small or unbalanced sample sizes. Figure 5 illustrates, plotting $C_0$ and $C_4$ as a function of $p_1$ with $p_2$ fixed at .3, for $(n_1, n_2) = (10, 10)$, $(20, 10)$, and $(40, 10)$. Figure 6 shows $C_0$ and $C_4$ as a function of $p_1$ when $p_1 - p_2 = 0$ or .2 and when the relative risk $p_1/p_2 = 2.0$ or 4.0, when $n_1 = n_2 = 10$.

[Figure 5. Coverage probabilities for nominal 95% Wald and adjusted confidence intervals (adding $t = 4$ pseudo observations) as a function of $p_1$ when $p_2 = .3$, for $n_1 = n_2 = 10$; $n_1 = 20$, $n_2 = 10$; and $n_1 = 40$, $n_2 = 10$.]

In Figures 4-6, only rarely does the adjusted interval have coverage significantly below the nominal level. On the other hand, Figures 4 and 6 show that it can be very conservative when $p_1$ and $p_2$ are both close to 0 or 1, say with $(p_1 + p_2)/2$ below about .2 or above about .8 for the small sample sizes studied here. This is preferable, however, to the very low coverages of the Wald interval in these cases. Figures 7 and 8 illustrate their behavior, showing surface plots of $C_0$ and $C_4$ over the unit square when $n_1 = n_2 = 10$. The spikes at values of $p_1$ in Figures 4 and 5 become ridges in these surface plots.

The poor performance of the Wald interval does not occur because it is too short. In fact, for moderate-sized $p_i$ it tends to be too long. For instance, when $n_1 = n_2 = 10$, $I_0$ has greater expected length than $I_4$ for $p_2$ between .11 and .89 when $p_1 = .5$, and for $p_2$ between .18 and .82 when $p_1 = .3$. When $n_1 = n_2 = n$ and $p_1 = p_2 = p$, $I_0$ has greater length than $I_t$ when $p$ falls within $\sqrt{.25 - n(4n + t)/[24n^2 + 12nt + 2t^2]}$ of .5. For all $t > 0$, this interval around .5 shrinks monotonically as $n$ increases, to $.50 \pm .50/\sqrt{3}$, or (.21, .79), which applies also to the Agresti and Coull (1998) adjusted interval in the single-sample case. As in the single-proportion case, the Wald interval suffers from having the maximum likelihood estimate exactly in the middle of the interval.

There is nothing unique about $t = 4$ pseudo observations in providing good performance of adjusted intervals in the one- and two-sample problems. For instance, Figure 3 and Table 1 show that other adjustments often work well. A region of $t$ values provides substantial improvement over the Wald interval, with values near $t = 2$ being less conservative than $t = 4$. We emphasized the case $t = 4$ earlier for the two-sample case because it rarely has poor coverage. We believe it is worth permitting some conservativeness to ensure that the coverage probability rarely falls much below the nominal level. In the one-sample case the adjusted interval $I_2(n, x)$ is better than $I_4(n, x)$ in approximating the score interval with small confidence levels, such as 90%. An advantage of the interval $I_2(n, x)$ for $p$ is consistency between the single-sample case and our recommended adjustment $I_4(n_1, x_1; n_2, x_2)$ for two samples. For instance, as $n_2 \to \infty$ and the second sample yields a perfect estimate, the resulting "add two successes and two failures" two-sample interval uses the first sample in the same way as does the "add one success and one failure" single-sample interval. However, for the single-sample problem we prefer the $I_4(n, x)$ interval, since .95 is by far the most common confidence level in practice and this interval works somewhat better than $I_2(n, x)$ in that case.

[Figure 6. Coverage probabilities for nominal 95% Wald and adjusted confidence intervals (adding $t = 4$ pseudo observations) as a function of $p_1$ when $p_1 - p_2 = 0$ or .2 and when $p_1/p_2 = 2$ or 4, for $n_1 = n_2 = 10$.]

3. COMPARING THE ADJUSTED INTERVAL WITH OTHER GOOD INTERVALS

Many methods have been proposed for improving on the ordinary Wald confidence interval for $p_1 - p_2$. Since this article discusses methods appropriate in elementary statistics courses, it focuses on the simple $I_t$ adjustment rather than methods that may be suggested by statistical principles. To find a good method more generally, one approach is to invert a test of $H_0\colon p_1 - p_2 = \Delta$ that has good properties, such as the large-sample score test (Mee 1984) or profile likelihood methods (Newcombe 1998b). The score test of $p_1 - p_2 = 0$ is the familiar Pearson chi-squared test, so this approach has the advantage that the confidence interval is consistent with the most commonly taught test of the same nominal level. The method of obtaining the confidence interval is too complex for elementary courses, however, partly because the test of $p_1 - p_2 = \Delta$ requires finding the maximum likelihood estimates of $(p_1, p_2)$ for the standard error subject to the constraint $p_1 - p_2 = \Delta$.

Newcombe (1998b) evaluated various confidence interval methods for $p_1 - p_2$. He proposed a method that performs substantially better than the Wald interval and similarly to the score interval, while being computationally simpler (although too complex for most elementary statistics courses). His method is a hybrid of results from the single-sample score intervals for $p_1$ and $p_2$. Specifically, let $(l_i, u_i)$ be the roots for $p_i$ in $z_{\alpha/2} = |\hat p_i - p_i|/\sqrt{p_i(1 - p_i)/n_i}$. Newcombe's hybrid score interval is

$$\left(\hat p_1 - \hat p_2 - z_{\alpha/2}\sqrt{\frac{l_1(1 - l_1)}{n_1} + \frac{u_2(1 - u_2)}{n_2}},\;\; \hat p_1 - \hat p_2 + z_{\alpha/2}\sqrt{\frac{u_1(1 - u_1)}{n_1} + \frac{l_2(1 - l_2)}{n_2}}\right).$$

Compared to the adjusted interval $I_4$, the hybrid score interval also is conservative when $p_1$ and $p_2$ are both close to 0 or 1; overall, however, it is less conservative, with mean coverage probability closer to the nominal level (see Table 1). Likewise, it tends to be a bit shorter. It has a somewhat higher proportion of cases with coverage probability that is too small, mainly for values of $|p_1 - p_2|$ near 1; for the 10,000 randomly selected cases with $n_i$ also random between 10 and 30, the minimum coverage probability was .92 for the 95% adjusted interval and .86 for the 95% hybrid score interval.

[Figure 7. Coverage probabilities for the 95% nominal Wald confidence interval as a function of $p_1$ and $p_2$, when $n_1 = n_2 = 10$.]

[Figure 8. Coverage probabilities for the 95% nominal adjusted confidence interval (adding $t = 4$ pseudo observations) as a function of $p_1$ and $p_2$, when $n_1 = n_2 = 10$.]

The adjusted interval $I_4$ and the hybrid score interval both have a greater tendency for distal non-coverage than for mesial non-coverage. For instance, for the 10,000 randomly selected cases, the mean probability that the lower limit exceeds $p_1 - p_2$ when $p_1 - p_2 > 0$, or that the upper limit is less than $p_1 - p_2$ when $p_1 - p_2 < 0$, was .030 for $I_4$ and .033 for the 95% hybrid score interval, whereas the mean probability that the upper limit is less than $p_1 - p_2$ when $p_1 - p_2 > 0$, or that the lower limit exceeds $p_1 - p_2$ when $p_1 - p_2 < 0$, was .013 for $I_4$ and .014 for the 95% hybrid score interval. As $t$ increases for $I_t$, the ratio of the incidence of distal non-coverage to mesial non-coverage increases; for these randomly selected cases, for $t = (0, 2, 4, 6, 8)$ it equals (.7, 1.2, 2.2, 4.3, 8.1). Unlike the adjusted interval and the Wald interval, the hybrid score interval cannot produce overshoot, with the interval for $p_1 - p_2$ extending below $-1$ or above $+1$ and thus requiring truncation. Overshoot for $I_t$ is less common as $t$ increases. For instance, for these randomly selected cases, the mean probability of overshoot for $t = (0, 2, 4, 6, 8)$ was (.048, .033, .016, .006, .000).

Since standard intervals for $p$ and $p_1 - p_2$ improve greatly with adjustment corresponding to shrinkage of point estimates, one would expect intervals resulting from a Bayesian approach with comparable shrinkage also to perform well in a frequentist sense. Carlin and Louis (1996, pp. 117-123) provided evidence of this type for estimating $p$. For $p_1 - p_2$, consider independent uniform prior distributions for $p_1$ and $p_2$. The posterior distribution of $p_i$ is beta with mean $\tilde p_i = (X_i + 1)/(n_i + 2)$ and variance $\tilde p_i(1 - \tilde p_i)/(n_i + 3)$. Using a crude normal approximation for the distribution of the difference of the posterior beta variates leads to the interval

$$(\tilde p_1 - \tilde p_2) \pm z_{\alpha/2}\sqrt{\frac{\tilde p_1(1 - \tilde p_1)}{n_1 + 3} + \frac{\tilde p_2(1 - \tilde p_2)}{n_2 + 3}}. \qquad (4)$$

This has the same center as the adjusted interval $I_4$ but uses $n_i + 3$ instead of $n_i + 2$ in the denominators of the standard error. For elementary courses, this interval was suggested by Berry (1996, p. 291). Like Newcombe's hybrid score interval, it tends to perform quite well, being slightly shorter and less conservative than $I_4$ but suffering occasional poorer coverages (see Table 1). For the sample size combinations we considered, its minimum coverage probability was only slightly below that for the adjusted interval. If conservativeness is a concern (e.g., if both $p_i$ are likely to be close to 0), the approximate Bayes and hybrid score intervals are slightly preferable to $I_4$.

The adjusted interval $I_4$ (and the similar approximate Bayes interval (4)) is simpler than other methods that improve greatly over the Wald interval. Thus, we believe it is appropriate for elementary statistics courses. We do not claim optimality in any sense, nor that other methods may not be better for some purposes. Some applications, for instance, may require that the true confidence level be no lower than the nominal level, mandating a method that is necessarily conservative (e.g., Chan and Zhang 1999). Also, we recommend $I_4$ for interval estimation and not for an implicit test of $H_0\colon p_1 - p_2 = 0$, although such a test would be more reliable than one based on the Wald interval. For a significance test, we would continue to teach the Pearson chi-squared test in elementary courses. The test based on $I_4$ is too conservative when the common value of $p_i$ under the null is close to 0 or close to 1; for most sample sizes it is then more conservative than the Pearson test. Although the adjusted interval is not guaranteed to be consistent with the result of the Pearson test, it usually agrees. For instance, for common values (.1, .2, .3, .4, .5) of $p_i$, the 95% version of $I_4$ and the Pearson test with nominal significance level .05 agree with probability (.972, .996, .9996, 1.000, 1.000) when $n_1 = n_2 = 30$ and (1.0, 1.0, 1.0, 1.0, 1.0) when $n_1 = n_2 = 10$.

Finally, an alternative way to improve the Wald method is with a continuity correction (Fleiss 1981, p. 29). As with other continuity corrections, this generally results in conservative performance, usually more so than the adjusted interval. However, the coverage probabilities, like those of the Wald interval, can dip substantially below the nominal level when both $p_i$ are near 0 or 1.

4. TEACHING THE ADJUSTED INTERVALS

Agresti and Coull (1998) motivated their adjusted interval (3) for a single proportion as a simple approximation to the score 95% confidence interval. We know of no such simple motivation for the adjusted interval for the two-sample comparison, other than its similarity to the Bayesian interval (4). A problem for future research is to study whether theoretical support exists for this simple yet effective adjustment, such as Edgeworth or saddlepoint expansions that might provide improved approximations for the tail behavior of $\hat p_1 - \hat p_2$.

The motivation needed for teaching in the elementary statistics course is quite different. How can one motivate adding the pseudo observations? In the single-sample case we remind students that the binomial distribution is highly skewed as $p$ approaches 0 and 1, and because of this perhaps the midpoint of the interval should not be $\hat p$. As support for this, we have students use the ExplorStat simulation software (available at http://www.stat.ufl.edu/~dwack/). Through simulation, it shows how operating characteristics of statistical methods vary as sample sizes and population distributions change. For instance, students observe a relatively high proportion of Wald intervals failing to contain $p$ when $p$ takes values such as .10 or .90 and the sample size is $n = 30$, the size their text suggests is adequate for large-sample inference for a mean.

Most students, however, seem more convinced by specific examples where the Wald method seems nonsensical, such as when $\hat p = 0$ or 1. We often use data from a questionnaire administered to the students at the beginning of the term. For instance, one of us (Agresti) taught a class of 24 honors students in fall 1999. In response to the question "Are you a vegetarian?", 0 of the 24 students responded "yes," yet they realized that the Wald interval of [0, 0] was not plausible for the corresponding population proportion. We have also used homework exercises such as estimating the probability of success for a new medical treatment when all 10 subjects in a sample experience success, or estimating the probability of death due to suicide when a sample of 30 death records has no occurrences. (Again, the Wald interval is [0, 0], but the National Center for Health Statistics reports that in the United States the probability of death due to suicide is about .01.) Although one can amend the Wald method to improve its behavior when $\hat p = 0$ or 1, such as by replacing the endpoints by ones based on the exact binomial test, making such exceptions to a general recipe distracts students from the main idea of taking the estimate plus and minus a normal-score multiple of a standard error.
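The classroom examples above can be reproduced in a few lines. This is a hedged sketch in Python: the function names are hypothetical, and clipping the adjusted interval to [0, 1] is our own choice for the single-sample case (the article describes truncation only for $p_1 - p_2$). The data are the counts quoted in the text: 0 of 24 vegetarians, 10 of 10 treatment successes, and 0 of 30 death records.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # z_{.025}


def wald(x, n, z=Z):
    """Ordinary Wald interval (1); degenerate when x = 0 or x = n."""
    p = x / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half


def add_two_and_two(x, n, z=Z):
    """Interval (3), clipped to [0, 1] (the clipping is our assumption)."""
    n_t = n + 4
    p_t = (x + 2) / n_t
    half = z * sqrt(p_t * (1 - p_t) / n_t)
    return max(0.0, p_t - half), min(1.0, p_t + half)


examples = {
    "vegetarians, 0 of 24": (0, 24),
    "treatment successes, 10 of 10": (10, 10),
    "suicide deaths, 0 of 30 records": (0, 30),
}

if __name__ == "__main__":
    for label, (x, n) in examples.items():
        w = wald(x, n)
        a = add_two_and_two(x, n)
        print(f"{label}: Wald [{w[0]:.3f}, {w[1]:.3f}], "
              f"adjusted [{a[0]:.3f}, {a[1]:.3f}]")
```

For 0 of 30 records, the adjusted interval runs from 0 to about .14, so it contains the value of about .01 cited above, whereas the Wald interval is [0, 0].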
Why four pseudo observations? In the single-sample case we explain that this approximates the results of a more complex method that does not require estimating the unknown standard error; here, we explain the concept of inverting the test with null standard error, or finding solutions of $(\hat p - p) = \pm 2\sqrt{p(1 - p)/n}$, which do not require estimating $\sqrt{p(1 - p)/n}$. In the two-sample case one could explain that this approximates a statistical analysis that represents prior beliefs about each $p_i$ by a uniform distribution. (Some instructors, of course, will prefer a more fully Bayesian approach, as in Berry 1996.)

The poor performance of the ordinary Wald intervals for $p$ and for $p_1 - p_2$ is unfortunate, since they are the simplest and most obvious ones to present in elementary courses. Also unfortunate for these intervals is the difficulty of providing adequate sample size guidelines. Introductory textbooks provide a variety of recommendations, but these have inadequacies (Leemis and Trivedi 1996; Brown et al. 1999). And, needless to say, most texts do not indicate what to do when the guidelines are violated, other than perhaps to consult a statistician. The results in this article suggest that for the "add two successes and two failures" adjusted confidence intervals, one might simply bypass sample size rules. The adjusted intervals have safe operating characteristics for practical application with almost all sample sizes. In fact, we note in closing (and with tongue in cheek) that the adjusted intervals $I_4(n, x)$ and $I_4(n_1, x_1; n_2, x_2)$ have the advantage that, as with Bayesian methods, one can do an analysis without having any data. In the single-sample case the adjusted sample then has $\tilde p = 2/4$, and the 95% confidence interval is $.5 \pm 2\sqrt{(.5)(.5)/4}$, or [0, 1]. In the two-sample case the adjusted samples have $\tilde p_1 = 1/2$ and $\tilde p_2 = 1/2$, and the 95% confidence interval is $(.5 - .5) \pm 2\sqrt{[(.5)(.5)/2] + [(.5)(.5)/2]}$, or $[-1, +1]$. Both analyses are uninformative, as one would hope from a frequentist approach with no data. No one will get into too much trouble using them!

[Received September 1999. Revised February 2000.]

REFERENCES

Agresti, A., and Coull, B. A. (1998), "Approximate Is Better than 'Exact' for Interval Estimation of Binomial Proportions," The American Statistician, 52, 119-126.
Berry, D. A. (1996), Statistics: A Bayesian Perspective, Belmont, CA: Wadsworth.
Brown, L. D., Cai, T. T., and DasGupta, A. (1999), "Confidence Intervals for a Binomial Proportion and Edgeworth Expansions," Technical Report 99-18, Purdue University, Statistics Department.
Carlin, B. P., and Louis, T. A. (1996), Bayes and Empirical Bayes Methods for Data Analysis, London: Chapman and Hall.
Chan, I. S. F., and Zhang, Z. (1999), "Test-Based Exact Confidence Intervals for the Difference of Two Binomial Proportions," Biometrics, 55, 1202-1209.
Fleiss, J. L. (1981), Statistical Methods for Rates and Proportions (2nd ed.), New York: Wiley.
Ghosh, B. K. (1979), "A Comparison of Some Approximate Confidence Intervals for the Binomial Parameter," Journal of the American Statistical Association, 74, 894-900.
Leemis, L. M., and Trivedi, K. S. (1996), "A Comparison of Approximate Interval Estimators for the Bernoulli Parameter," The American Statistician, 50, 63-68.
McClave, J. T., and Sincich, T. (2000), Statistics (8th ed.), Englewood Cliffs, NJ: Prentice Hall.
Mee, R. W. (1984), "Confidence Bounds for the Difference Between Two Probabilities," Biometrics, 40, 1175-1176.
Newcombe, R. (1998a), "Two-Sided Confidence Intervals for the Single Proportion: Comparison of Seven Methods," Statistics in Medicine, 17, 857-872.
Newcombe, R. (1998b), "Interval Estimation for the Difference Between Independent Proportions: Comparison of Eleven Methods," Statistics in Medicine, 17, 873-890.
Samuels, M. L., and Witmer, J. W. (1999), Statistics for the Life Sciences (2nd ed.), Englewood Cliffs, NJ: Prentice Hall.
Vollset, S. E. (1993), "Confidence Intervals for a Binomial Proportion," Statistics in Medicine, 12, 809-824.
Wilson, E. B. (1927), "Probable Inference, the Law of Succession, and Statistical Inference," Journal of the American Statistical Association, 22, 209-212.