Statistics
A SAS® MACRO FOR EXACT POWER OF CHI-SQUARE TESTS
OF EQUALITY OF TWO PROPORTIONS
Carl E. Pierchala
Abstract: Approximations are typically used when computing the power of the Pearson chi-square test for the equality of two proportions. However, it is straightforward, although computationally intensive, to calculate the power exactly. By judiciously restricting the region of the sample space where the appropriate probabilities are summed, it is possible to speed up the computations and still obtain essentially exact power values. Also, it is easy to generalize the computations to handle the corrected chi-square test of Yates and a similarly corrected chi-square test proposed by Garside. This paper describes the methods used to compute exact power values for these chi-square tests, and gives a SAS macro for the DATA step to do the computations. Examples of the macro's usage are provided.
1. INTRODUCTION
The power of a statistical test is often of interest when planning or interpreting statistically based investigations. For studies comparing two proportions, there is no unanimity of opinion as to the most appropriate statistical test. Pearson's uncorrected chi-square test and Yates' corrected chi-square test both have their supporters. Another variant of the chi-square test has been studied by Garside and Mack (1976). The capability to accurately compute power for each of these tests is thus of interest.
Power of chi-square tests usually is determined by using approximations. However, Pierchala (1988) noted a situation involving the uncorrected chi-square test in which two variant approximations did not agree well with each other, and also did not agree well with essentially exact power values for the test. In that paper, Pierchala developed an approach for computing 'almost' exact power of the uncorrected test. The approach substantially reduced the computational burden by restricting the computations to a rectangular subspace of the full sample space. Depending on the desired accuracy of the power computation, the size of the rectangular subspace was varied in a specific way so as to guarantee accuracy to a prespecified level.
Pierchala's work in 1988 was programmed in APL and only applied to the uncorrected chi-square test. For various reasons cited in his paper, Pierchala did not release the actual code used in performing that work.
The purpose of this work is to make an improved routine, programmed as a SAS macro called POWCHIEP, available to users. This new routine is more efficient, faster, and more user-friendly, and it is generalized to give power for Yates' and Garside's tests as well as for the uncorrected test. Finally, the routine returns a refined estimate of the error in the computation due to restricting the computation to the rectangular subspace.
In the next section, the theory behind the computation is
reviewed in sufficient detail to enable the user to
employ the routine in a SAS data step. Also, the notion
of 'almost exact' power is discussed. After that comes
a section on SAS programming considerations, which
includes a description of the computational flow in the
routine. Two examples of the use of the macro are
then given. Finally, some conclusions are offered, and
then the macro to compute almost exact power,
POWCHIEP, is listed together with a companion
macro, POWCHIDR, which makes it easy for the user
to drop unneeded intermediate SAS variables used by
POWCHIEP.
It is to be noted that SAS-style notation is used in this paper rather than conventional notation. The SAS notation closely mimics the mathematical notation; for example, p1 is used in lieu of p₁. The reader interested in a treatment using the mathematical notation is referred to Pierchala (1988). Also note that in the actual macro, an underscore character '_' precedes most of the variable names; for example, _p1 is used rather than p1. This reduces the chance that the user will choose a variable name in the calling DATA step that is the same as one used temporarily by POWCHIEP. For simplicity, in most of this paper the leading underscore character is not shown.
2. THEORETICAL BACKGROUND
a) Review of Hypothesis Testing
Many studies are designed with the hope of demonstrating that a difference exists between two population proportions (of success, say), denoted by p1 and p2 for the first and second populations respectively. Because it is usually impractical (if not impossible) to study the entire populations, samples are taken from each population. In the ideal scenario, independent simple random samples are taken with replacement from the two populations.
Two cases are distinguished, depending on the formulation of the alternative hypothesis (H1) and the null hypothesis (H0):
i. One-sided tests (nside=1)
   H1: p2 GT p1
   H0: p2 LE p1
ii. Two-sided tests (nside=2)
   H1: p1 NE p2
   H0: p1 EQ p2.
Note that the conventional ordering of the two hypotheses is reversed. This is because H1 is what the investigator hopes to demonstrate, so H0 is set up as the straw hypothesis, which hopefully will be contradicted (probabilistically speaking) by the data when they are gathered and analyzed, allowing us to conclude that H1 appears to be true. In other words, we hope that the samples' results can be shown to be unlikely were H0 actually true, in which case we conclude that the data tend to falsify H0, leading us to conclude that H1 is true.
When the study is carried out and the results come in,
they are often tabulated in a table similar to the
following.
                SAMPLE 1    SAMPLE 2
   SUCCESSES    s1          s2
   FAILURES     n1-s1       n2-s2
   TOTALS       n1          n2
Next, a test statistic CHISQ is calculated, where the exact formula for CHISQ depends on the test chosen by the investigator. If CHISQ exceeds a critical value CRVAL (and, in the one-sided case, if also s1/n1 LT s2/n2), then H0 is rejected and it is concluded that the alternative hypothesis is true.
Now with random sampling, there is for any given p1 and p2 some probability of rejecting the null hypothesis. In the ideal scenario this probability can be calculated, and that is what the macro POWCHIEP given in this paper computes 'almost' exactly. When the alternative hypothesis is actually true, the probability of rejecting the null hypothesis is called the 'power' of the test. In what follows, the term power is used loosely as a synonym for 'probability of rejecting the null hypothesis'. In other words, POWCHIEP really computes the probability of rejecting the null hypothesis, but for short we say it is computing power, keeping in mind that we are not using the latter term in its strict technical sense.
In practice, the test statistic CHISQ is usually one of the variants of chi-square test statistics. Some of these involve a so-called continuity correction, denoted by CC. One such test statistic is the uncorrected chi-square, sometimes attributed to Pearson. This and two 'corrected' variants can be calculated with a single formula that includes a term CC. A single formula for CHISQ could be given, but for efficiency in programming it is preferable to calculate CHISQ in six steps. They are:
   n     = n1+n2 ;
   noprd = n/n1/n2 ;
   f1    = n1-s1 ;
   s     = s1+s2 ;
   d     = abs((s1*(n2-s2))-(s2*f1)) - cc*n ;
   chisq = noprd*d*d/s/(n-s) ;
Taken together, the six steps give CHISQ = n*d*d/(n1*n2*s*(n-s)), where d is the absolute value of s1*(n2-s2)-s2*(n1-s1) reduced by the continuity correction cc*n.
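As an independent cross-check outside SAS, the six steps can be transcribed into a short Python function. This is a minimal sketch, not part of POWCHIEP. The function name `chisq` is hypothetical, and the zero-denominator guard mirrors the macro's handling of the two cases in which both sample proportions are 0 or both are 1.

```python
def chisq(s1: int, s2: int, n1: int, n2: int, cc: float = 0.0) -> float:
    """CHISQ for s1 successes out of n1 and s2 successes out of n2.

    cc = 0 gives the uncorrected test and cc = 0.5 gives Yates' test;
    a Garside-type correction must be supplied by the caller.
    """
    n = n1 + n2                                   # step 1: total sample size
    noprd = n / n1 / n2                           # step 2: n/(n1*n2)
    f1 = n1 - s1                                  # step 3: failures in sample 1
    s = s1 + s2                                   # step 4: total successes
    d = abs(s1 * (n2 - s2) - s2 * f1) - cc * n    # step 5: corrected |ad - bc|
    if s == 0 or s == n:
        # Both sample proportions are 0 (or both are 1): step 6 would divide
        # by zero, and the data agree with H0, so return 0 as the macro does.
        return 0.0
    return noprd * d * d / s / (n - s)            # step 6: the statistic
```

For example, with s1=10, s2=5 and n1=n2=20 the uncorrected value is 8/3 (about 2.667) and Yates' value is 640/375 (about 1.707), both of which can be checked by hand from the usual 2x2 formulas.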
Now, the three variants of the chi-square statistic are: the uncorrected, obtained using CC = 0; Yates' corrected, obtained using CC = 0.5; and Garside's corrected, obtained using values of CC that vary as a function of the sample sizes, n1 and n2, and nside, the number of sides (1 or 2) in the test. See Garside and Mack (1976) for more detail regarding the latter test.
For the two-sided test, CRVAL is taken to be the 1-alpha quantile of the chi-square distribution with one degree of freedom, while for the one-sided test CRVAL is taken to be the 1-2*alpha quantile of the chi-square distribution with one degree of freedom. In either case, the nominal 'size' of the test will be alpha. Alpha typically is chosen to be 0.05 or 0.01. Because of the discrete nature of the distributions of s1 and s2, the true size of the test is not exactly equal to alpha. This fact seems to be part of the controversy as to which version of the chi-square test is the best to use. Some authors prefer to guarantee that the size of the test will almost always be less than or equal to alpha (hence choosing Yates' test), while other, less conservative authors are willing to live with a size often a bit larger than alpha (hence choosing the uncorrected test).
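The critical value and the rejection rule can be sketched in Python as well. The standard library has no chi-square quantile function, so this sketch uses the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable. Its q quantile is therefore the square of the (1+q)/2 normal quantile. This substitution is an assumption of convenience for the sketch; POWCHIEP itself calls CINV. The names `crval` and `rejects` are hypothetical.

```python
from statistics import NormalDist


def crval(alpha: float, nside: int) -> float:
    """Chi-square(1) critical value: 1-alpha quantile if nside=2,
    1-2*alpha quantile if nside=1 (same argument as the macro's CINV call)."""
    q = 1 - (3 - nside) * alpha
    z = NormalDist().inv_cdf((1 + q) / 2)   # chi-square(1) = Z**2
    return z * z


def rejects(chi: float, crit: float, s1: int, n1: int,
            s2: int, n2: int, nside: int) -> bool:
    """Reject H0 if CHISQ exceeds CRVAL; for the one-sided test
    (H1: p2 GT p1) also require s1/n1 LT s2/n2."""
    if nside == 2:
        return chi > crit
    return chi > crit and s1 / n1 < s2 / n2
```

With alpha = 0.05 this gives about 3.8415 for the two-sided test and about 2.7055 for the one-sided test, the familiar tabled chi-square values for one degree of freedom.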
b) Computation of Power
There are exactly (n1+1)*(n2+1) distinct possible results that could occur when taking the two samples, because s1 could be 0, 1, 2, ..., or n1, and s2 could be 0, 1, 2, ..., or n2. The set of (n1+1)*(n2+1) points (s1,s2) is called the sample space. It is divided into the acceptance region (those points where the test does not reject H0) and the rejection region (those points where the test does reject H0). Adding up the probabilities for the points (s1,s2) in the rejection region gives the probability of rejecting H0, which is the power of the test (strictly speaking) when the true p1 and p2 are consistent with the alternative hypothesis. In the ideal scenario, the probability of rejecting H0 is straightforward to compute because s1 and s2 then follow independent binomial distributions.
Let ps1 denote the probability that the first sample yields s1, and let ps2 denote the probability that the second sample yields s2. The sum of the binomial probabilities from 0 to s1 is called the cumulative binomial probability, denoted cps1. Let the sum of the binomial probabilities from 0 to s1-1 be denoted cpsp. Then cps1-cpsp equals ps1. Because SAS provides the cumulative binomial function PROBBNML, this method of taking differences is used to compute ps1, rather than the usual formula given in many statistics texts. Similarly, ps2 is easily computed.
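The difference method can be illustrated in Python. In this sketch `binom_cdf` is a hypothetical stand-in for SAS's PROBBNML(p, n, m), computed by direct summation with `math.comb`. The sketch only shows that successive differences of cumulative probabilities recover the individual binomial probabilities. It is not how PROBBNML is implemented.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """Pr[S <= k] for S ~ Binomial(n, p); stand-in for PROBBNML(p, n, k)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def binom_pmf_by_difference(n: int, p: float) -> list[float]:
    """Pr[S = s] for s = 0..n, obtained as cps - cpsp."""
    probs = []
    cpsp = 0.0                        # cumulative probability through s-1
    for s in range(n + 1):
        cps = binom_cdf(s, n, p)      # cumulative probability through s
        probs.append(cps - cpsp)      # ps = cps - cpsp
        cpsp = cps
    return probs
```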
Then, because the two samples are statistically independent in the ideal scenario, the probability of obtaining (s1,s2) is just ps1s2 = ps1*ps2. Simply adding up these values for all the points in the rejection region gives the probability of rejecting H0. In practice, this can be done using nested loops indexing on s1 and s2. In the inner loop it is determined whether (s1,s2) is in the rejection region, and if so, ps1s2 is added to the cumulating power.
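A brute-force Python sketch of this exact computation over the full sample space is given below for checking purposes. The function names are illustrative, and the critical value uses the normal-quantile identity rather than CINV. The sketch is not the macro, which restricts the loops to a subspace as described in the following subsection.

```python
from math import comb
from statistics import NormalDist


def binom_pmf(n: int, p: float) -> list[float]:
    """Binomial probabilities Pr[S = k], k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def chisq(s1, s2, n1, n2, cc=0.0):
    """CHISQ via the six steps; 0 when both proportions are 0 or both are 1."""
    n, s = n1 + n2, s1 + s2
    if s == 0 or s == n:
        return 0.0
    d = abs(s1 * (n2 - s2) - s2 * (n1 - s1)) - cc * n
    return (n / n1 / n2) * d * d / s / (n - s)


def exact_power(n1, p1, n2, p2, alpha=0.05, nside=2, cc=0.0):
    """Probability of rejecting H0, summed over all (n1+1)*(n2+1) points."""
    q = 1 - (3 - nside) * alpha                   # 1-alpha or 1-2*alpha
    crval = NormalDist().inv_cdf((1 + q) / 2) ** 2
    ps1, ps2 = binom_pmf(n1, p1), binom_pmf(n2, p2)
    power = 0.0
    for s1 in range(n1 + 1):                      # outer loop on s1
        for s2 in range(n2 + 1):                  # inner loop on s2
            x = chisq(s1, s2, n1, n2, cc)
            if x > crval and (nside == 2 or s1 / n1 < s2 / n2):
                power += ps1[s1] * ps2[s2]        # ps1s2 = ps1*ps2
    return power
```

The sketch checks each of the (n1+1)*(n2+1) points, which is exactly the cost that the rectangular-subspace method below is designed to reduce.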
c) Almost Exact Power
Exact computation of power as described above is rather time-consuming, even on a 486 PC. However, almost exact values of power can be obtained much more quickly by restricting the computations to a 'rectangular subspace' of the sample space.
Let e be a small nonnegative value, e.g., 0.00000005. We restrict s1 to the range [s1b, s1e], where s1b is the largest s1 such that Pr[s1 LT s1b] is less than or equal to e/2, and s1e is the smallest s1 such that Pr[s1 GT s1e] is less than or equal to e/2. Then the probability that s1 will be in [s1b, s1e] is greater than or equal to 1-e. Likewise, restricting s2 to an analogously defined range [s2b, s2e] guarantees a probability of 1-e or greater of including s2. The rectangular subspace is the set of all points (s1,s2) where s1 and s2 are restricted to the above two 'marginal subranges' respectively.
Because s1 and s2 are statistically independent under the ideal scenario, the probability of being in the rectangular subspace must equal or exceed (1-e)*(1-e) = 1-2*e+e*e. It follows that the chance of being outside the rectangular subspace is less than or equal to 1-(1-2*e+e*e) = 2*e-e*e, which is less than 2*e when e is positive. Hence we can compute almost exact power by restricting computations to the rectangular subspace based on a reasonably small e. Since the amount of probability mass outside the rectangular subspace does not exceed 2*e, the amount of error in power (due to restricting computation to the rectangular subspace) cannot exceed 2*e.
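A marginal subrange can be sketched in Python using the inclusion test that the macro applies: keep s when the cumulative probability through s is at least e/2 and the cumulative probability through s-1 is at most 1-e/2. In this hypothetical helper the cumulative values are built by accumulating binomial probabilities rather than by calling PROBBNML. By the argument above, the retained mass should be at least 1-e.

```python
from math import comb


def marginal_subrange(n: int, p: float, e: float) -> tuple[int, list[float]]:
    """Return (sb, probs): the first s in the subrange and Pr[S = s] for each
    s in [sb, se], using 'cps GE e/2 and cpsp LE 1-e/2' as in POWCHIEP."""
    minl, minh = e / 2, 1 - e / 2
    sb, probs, cps = None, [], 0.0
    for s in range(n + 1):
        cpsp = cps                                # Pr[S <= s-1]
        ps = comb(n, s) * p**s * (1 - p) ** (n - s)
        cps = cpsp + ps                           # Pr[S <= s]
        if cps >= minl and cpsp <= minh:
            if sb is None:
                sb = s                            # first member of subrange
            probs.append(ps)
    if sb is None:
        raise ValueError("error limiting parameter e set too high")
    return sb, probs
```

For n=10, p=0.5 and e=0.01 the retained range is s = 1 to 9. Both Pr[S=0] and Pr[S=10] equal 1/1024, which is below e/2 = 0.005, so the two endpoints are dropped.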
3. SAS PROGRAMMING CONSIDERATIONS
a) The calling DATA step program uses the statement '%powchiep(...);' to invoke the macro to compute the power of a chi-square test of equality of two proportions. This is a name-style invocation with keyword parameters. The parameters and their defaults are defined at the end of this paper, near the beginning of Listing 1, which gives the macro. Defaults are provided for some of the parameters. Relatively little error checking is done, so the user must take care when passing parameters lest the routine return bad results or cause SAS to malfunction.
b) Invoking the macro causes the SAS code for doing the power computations to be inserted into the calling DATA step program. Some examples of the use of POWCHIEP are given in the next section of this paper. Thirty-six intermediate variables and two temporary arrays are used in the computations. To help prevent one of the intermediate variables or temporary arrays from having the same name as a variable or array in the calling DATA step, the name of each of the intermediate variables and temporary arrays begins with an underscore '_' character. The user may wish to drop the temporary variables in his DATA step program. This can readily be done by invoking the statement '%powchidr;', in which case no KEEP statements should be used. See the DROP statement in POWCHIDR, displayed near the end of Listing 1, for an alphabetical list of the 36 temporary variables. The two temporary arrays are named _p1a and _p2a in POWCHIEP; recall that for simplicity they are referred to as p1a and p2a in the body of this text.
c) The parameters DIM1 and DIM2 must be integer constants, not variables, and must be sufficiently large. Choosing DIM1 to always equal or exceed n1 and choosing DIM2 to always equal or exceed n2 is sure to work. DIM1 and DIM2 are the upper dimensions, respectively, for arrays p1a and p2a, which hold the binomial probabilities corresponding to the two marginal subranges. Both of these arrays begin at element zero, which reduces computations when indexing elements of these arrays. Also, p1a and p2a are temporary arrays, so they require less memory. They have no associated variables, and they do not need to be dropped.
d) Most of the parameters are assigned to temporary variables at the beginning of POWCHIEP, primarily to guarantee that their values can be printed if so requested through the 'round' option.
e) Cumulative binomial probabilities are given by the SAS function PROBBNML. For s1, binomial probabilities are determined iteratively by taking the difference between the current cumulative value, cps1, and the previous value, cpsp. The basic code is as follows:
   cpsp=0; * cumulative values start at zero;
   do s1=0 to n1;
      cps1=probbnml(p1,n1,s1);
      ps1=cps1-cpsp;
      cpsp=cps1;
   end;
Thus at iteration s1, the binomial probability is ps1, and the cumulative binomial probability is cps1.
f) In the macro, the code shown directly above is intertwined with additional statements that give the limits [s1b, s1e] for the marginal subrange and put the binomial probability for each s1 in the subrange into the array p1a. A key statement is
   if cps1 GE minl and cpsp LE minh then do;
Note first that the assignments
   minl=e/2;
   minh=1-e/2;
have been made earlier. The condition 'cps1 GE minl' excludes from the marginal subrange those s1 in the low end of the distribution whose cumulative probability mass is less than e/2. The condition 'cpsp LE minh' excludes from the marginal subrange those s1 in the high end of the distribution whose cumulative (upper-tail) probability mass is less than e/2. The use of cpsp rather than cps1 in the latter condition is a subtlety that arises because s1 follows a binomial, hence discrete rather than continuous, distribution. It follows basically from the fact that the probability of s1 or more successes is equal to one minus the probability of s1-1 or fewer successes.
g) The marginal subrange [s2b, s2e] and the corresponding binomial probabilities (stored in array p2a) are determined for the second population in exactly the same manner as for the first population.
h) The actual determination of the power is then done using two loops, the second nested in the first. The outer loop, indexing on s1, obtains ps1 from the array p1a. The inner loop, indexing on s2, obtains ps2 from the array p2a. Then ps1s2 is obtained as the product ps1*ps2. Next CHISQ is determined. Finally, CHISQ is compared to CRVAL. If CHISQ is larger than CRVAL (and, in the one-sided case, if the sample proportions also differ in the appropriate direction), then ps1s2 is added to the cumulating power value.
i) When CHISQ is computed for use as described directly above, the six steps described previously are carried out at different points in the code for computational efficiency. The first two steps are done prior to the outer (s1) loop, the third step is done inside the outer (s1) loop but outside the inner (s2) loop, and the last three steps are done inside the inner loop. CHISQ is set to zero in lieu of the final calculation in two special cases that would otherwise give division by zero. In both cases, the sample proportions are identical (both are zero or both are one). This is consistent with H0, so CHISQ is set to zero in both of those cases.
j) The parameter 'round' can be employed to put intermediate results onto the log if checks are desired in regard to either of two situations. The first is the possibility of computational inaccuracy in either CHISQ or CRVAL. (Indeed, in a phone call to SAS Technical Support, it was learned that CINV is accurate only to 1E-5. CINV is used to compute CRVAL, so CRVAL is only accurate to this level.) The second situation is the effect of rounding CRVAL and CHISQ prior to comparing their magnitudes in the hypothesis test. This rounding is implicitly done when the chi-square computations are done by hand and tabled chi-square values are used to determine CRVAL. Additional details regarding the usage of the 'round' parameter are given near the top of Listing 1. If the user wishes to suppress this option, 'round=.' can be specified in the invocation of the macro. However, using the default 1E-5 will alert the user should CHISQ be close enough to CRVAL for the computational inaccuracy in CRVAL to be worrisome.
k) While the macro is iterating through the rectangular subspace in the two nested loops, the quantity 'sum' is cumulated as the sum of all ps1s2 in the subspace. Then 1-sum, the probability mass outside the subspace, is returned to the variable specified by the parameter ERRBD. This is a conservative estimate (an upper bound) of the actual error in the power. It never exceeds, and is usually well below, the a priori bound 2*e.
l) If the parameter PUT_ALL is not equal to 'N' (for no), then at the very end of the macro-generated code a PUT _ALL_ statement is issued to print all the variables (including the intermediate variables from the macro) onto the log. Users will normally accept the default, which suppresses this output.
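The restricted computation, including the returned error bound 1-sum, can be cross-checked against full enumeration in Python. This is a hypothetical re-implementation for testing and not a transcription of POWCHIEP. The subrange is found by accumulating binomial probabilities instead of calling PROBBNML, and CRVAL comes from the normal-quantile identity instead of CINV. The placement of the six CHISQ steps inside and outside the loops follows item i).

```python
from math import comb
from statistics import NormalDist


def subrange(n: int, p: float, e: float) -> tuple[int, list[float]]:
    """First value and probabilities of the marginal subrange (macro's test)."""
    minl, minh = e / 2, 1 - e / 2
    sb, probs, cps = None, [], 0.0
    for s in range(n + 1):
        cpsp = cps
        ps = comb(n, s) * p**s * (1 - p) ** (n - s)
        cps = cpsp + ps
        if cps >= minl and cpsp <= minh:
            if sb is None:
                sb = s
            probs.append(ps)
    if sb is None:
        raise ValueError("error limiting parameter e set too high")
    return sb, probs


def almost_exact_power(n1, p1, n2, p2, alpha=0.05, nside=2, cc=0.0, e=5e-8):
    """Return (power, errbd) summed over the rectangular subspace only."""
    q = 1 - (3 - nside) * alpha
    crval = NormalDist().inv_cdf((1 + q) / 2) ** 2
    s1b, p1a = subrange(n1, p1, e)
    s2b, p2a = subrange(n2, p2, e)
    n = n1 + n2                                  # steps 1-2: before the loops
    noprd = n / n1 / n2
    power = total = 0.0
    for i, ps1 in enumerate(p1a):                # outer loop over [s1b, s1e]
        s1 = s1b + i
        f1 = n1 - s1                             # step 3: outer loop only
        for j, ps2 in enumerate(p2a):            # inner loop over [s2b, s2e]
            s2 = s2b + j
            ps1s2 = ps1 * ps2
            total += ps1s2                       # mass of the subspace ('sum')
            s = s1 + s2                          # steps 4-6: inner loop
            d = abs(s1 * (n2 - s2) - s2 * f1) - cc * n
            x = noprd * d * d / s / (n - s) if 0 < s < n else 0.0
            if x > crval and (nside == 2 or s1 / n1 < s2 / n2):
                power += ps1s2
    return power, 1.0 - total                    # errbd = 1 - sum
```

For instance, `almost_exact_power(375, 0.001, 375, 0.010, nside=1)` sets up the same problem as the p2 = 0.010 row of the first example in the next section. No claim is made here that this sketch reproduces Listing 2 digit for digit.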
4. EXAMPLES
The following two examples demonstrate the use of the macro POWCHIEP. They also serve to demonstrate that the macro gives accurate computations.
a) Replication of Pierchala (1988)
The following code replicates Pierchala's APL results.
   * Short test to replicate Pierchala (1988);
   options linesize=80 ps=62 mprint symbolgen;
   data powchie1;
     keep p1 p2 power err_lim;
     nside=1;      * one-tail test;
     alpha=.05;
     e=.00000005;  * so error limit 2*e = 1E-7;
     n1=375;
     p1=.001;
     n2=375;
     do p2=.001,.002 to .020 by .002;
       cc=0;       * uncorrected test;
       %powchiep(nside=nside,alpha=alpha,e=e,
                 n1=n1,p1=p1,n2=n2,p2=p2,cc=cc,
                 dim1=375,dim2=375,put_all=yes,
                 round=.001,pw=power,errbd=err_lim);
       power=round(power,.0001); output;
     end; * of p2 loop;
   run;
   title 'Replication of Pierchala (1988)';
   title2 'Table 1 Power Computations';
   proc print noobs;
     var p1 p2 power err_lim;
   run;
The resulting power values, rounded to four decimal places, agree exactly with Pierchala's APL results for the 'exact' method. The results are given in Listing 2 near the end of this paper. It can be seen in the results that the actual error limits on the power values are less than the 1E-7 predicted by the inequality given earlier.
Each of the power values computed by the code above took about 1 second on a 486/50 PC with an 80387-compatible math coprocessor chip. This compares to 92 seconds per power value using Pierchala's APL routine on an IBM PC-XT with an 8087 math coprocessor chip.
b) Replication of Garside and Mack (1976)
The following code was used to replicate a portion of one of the tables given by Garside and Mack for the probability of rejecting the null hypothesis.
   * Test to replicate Garside and Mack (1976);
   options mprint symbolgen;
   %let table=2;
   %let part=3;
   %let m=100;
   %let mprime=50;
   %let alpha=.05;
   %let lamda=0.143;
   data p_rej_h0;
     %powchidr;
     drop method cc;
     do p=.1 to .9 by .1;
       array pw(3) y u g;
       do method=1,2,3;
         if method=1 then cc=.5;
         else if method=2 then cc=0;
         else if method=3 then cc=&lamda;
         %powchiep(nside=1,alpha=&alpha,
                   n1=&mprime,p1=p,dim1=&mprime,
                   n2=&m,p2=p,dim2=&m,
                   cc=cc,pw=pw(method));
       end;
       y=round(y,.0001);
       u=round(u,.0001);
       g=round(g,.0001);
       output;
     end;
   run;
   proc print noobs;
   run;
The results are given (with additional formatting) in Listing 3 at the end of this paper. The values all agree exactly with the corresponding values given by Garside and Mack. One subtlety is to be noted. In their formulation of the alternative hypothesis, Garside and Mack reversed the direction of the inequality as compared to the formulation in this paper. Thus, in the call to POWCHIEP above, n1 is the sample size for their second population, and n2 is that for their first population. That is, reversing the numbering of the populations switches the direction of the inequalities in the alternative and null hypotheses.
5. CONCLUSIONS
The macro POWCHIEP gives accurate results
compared to two examples in the literature. This macro
gives a reliable method to compute the probability of
rejecting the null hypothesis of the equality of two
independent proportions. The values are computed
much more quickly than they were by a preliminary
version of the macro that performed the computations
over the entire sample space. Albeit on more modern
equipment, POWCHIEP also gives much quicker
computations than did the earlier APL routine on older
computing equipment. The values computed by
POWCHIEP are well within the accuracy prespecified
by the inequality given above. The power values can
be computed relatively quickly by POWCHIEP, making
it an attractive alternative to the commonly used
approximations.
6. REFERENCES
Garside, G. R. and Mack, C. (1976), "Actual Type 1 Error Probabilities for Various Tests in the Homogeneity Case of the 2 x 2 Contingency Table", The American Statistician, 30, 18-21.
Pierchala, C. E. (1988), "Exact Power Calculations for
the Chi-Square Test of Two Proportions", Computing
Science and Statistics: 1988 Proceedings of the 20th
Symposium on the Interface, 470-473.
CAVEAT. While there are no known problems in
accuracy with the macros given in this paper, no
warranty implied or otherwise is given as to the
correctness and accuracy of the routine. The user is
responsible to study the code and determine if it appears
correct for his or her purposes.
The author may be contacted at:
2400 Sixteenth St., N.W. # 537
Washington, DC 20009-6629
ph. (202) 483-8131
If you mail to me either a 3.5" or 5.25" floppy diskette
formatted for IBM compatible DOS-based personal
computers, along with a self-addressed stamped
envelope with sufficient postage to mail back the
diskette, I will make a copy and send back to you in
ASCII format the macros given in this paper. Please
include your phone number with your request for the
macros.
SAS® is a registered trademark of SAS Institute Inc.,
Cary, NC, USA.
LISTING 1.
The Macros POWCHIEP and POWCHIDR.
%macro powchiep
 /* Version of 9-JUL-94;
    WARNING: The parameter values below are not
    generally checked for appropriateness, so
    the user is responsible to do so -- passing
    incorrect parameters may cause the routine to
    abort, or worse, to return incorrect results */
 (nside=2,      /* no. of sides of test, 1 or 2 -- for one-sided
                   tests, the alternative hypothesis is that p2 is
                   greater than p1 */
  alpha=.05,    /* nominal significance level of the test */
  e=.00000005,  /* error limiting parameter -- guarantees that
                   the error (due to restricting the computations
                   to the rectangular subspace) will not exceed 2*e */
  n1=,          /* sample size of the first sample */
  p1=,          /* true proportion of success in first population */
  dim1=,        /* an integer specifying the size of the array used
                   to hold binomial probabilities for first marginal
                   subrange. A conservative choice for this integer
                   is to have it equal or exceed every n1
                   used by the calling program. This parameter
                   MUST be a constant, NOT a variable name. */
  n2=,          /* sample size of the second sample */
  p2=,          /* true proportion of success in second population */
  dim2=,        /* an integer specifying the size of the array used
                   to hold binomial probabilities for second marginal
                   subrange. A conservative choice for this integer
                   is to have it equal or exceed every n2
                   used by the calling program. This parameter
                   MUST be a constant, NOT a variable name. */
  cc=0,         /* continuity correction: 0 for uncorrected test,
                   .5 for Yates corrected test, or the Garside/Mack
                   value for their test */
  put_all=N,    /* control switch to cause all variables to be put to
                   log immediately prior to finishing macro execution:
                   anything except 'N' causes the put to occur */
  round=1E-5,   /* Rounding level and tolerance level: if the absolute
                   difference between the computed chi-square (_chisq)
                   and the critical value (_crval) is less than or
                   equal to 'round', then various intermediate results
                   are put to the log, including _rchsq, which equals
                   _chisq rounded to the nearest 'round', and _rcrvl,
                   which equals _crval rounded to the nearest 'round'.
                   If set to '.', then these intermediate results are
                   not put to the log. Typical choices are .01 or .001. */
  pw=_pw,       /* variable name in which to store the computed power */
  errbd=_errbd  /* variable name in which to store the probability of
                   being outside the rectangular subspace, which is a
                   conservative estimate of the error in the computed
                   power due to restricting computations to this
                   subspace. (The rectangular subspace is the subspace
                   used for performing the computations.) Note
                   that errbd is dropped if macro POWCHIDR is used in
                   the calling DATA step. */
 ); * (This right-parenthesis ends macro statement);
 * Power determined by computation over the rectangular subspace of
   the entire sample space;
 * Cumulative binomial used to determine marginal subranges;
 * Variable names altered for macro identification;
 * This was done by putting in a leading underscore character ('_')
   before variable names;
 * Guarantee that certain items are variable names, not constants,
   mainly as a convenience for printing intermediate results;
 _n1=&n1;
 _n2=&n2;
 _p1=&p1;
 _p2=&p2;
 _cc=&cc;
 _nside=&nside;
 _alpha=&alpha;
 _e=&e;
 _errbd=&errbd; * in the event that macro powchidr is utilized but
   errbd was not assigned the default _errbd by macro
   powchiep, this assignment ensures that _errbd exists
   and can be dropped via powchidr;
 * initialize sum of bivariate probabilities in rectangular subspace;
 _sum=0;
 _crval=cinv(1-(3-_nside)*_alpha,1); * determine critical value;
 &pw=0; * initialize sum for power;
 _n=_n1+_n2; * used for chi-square computation;
 _noprd=_n/_n1/_n2; * used for chi-square computation;
 * constants needed to use cumulative binomial to determine marginal
   subranges;
 _minl=_e/2; * this produces the lower end of the marginal subrange;
 _minh=1-_minl; * this produces the upper end of the marginal subrange;
 * For margin 1, set up array used to hold the binomial probabilities
   for elements of the marginal subrange, and calculate and save those
   probabilities;
 array _p1a(0:&dim1) _temporary_;
 _cpsp=0; * cumulative binomial probability initialized;
 _j1=0; * initialize counter of binomial prbs. exceeding threshold;
 do _s1=0 to _n1;
    _cps1=probbnml(_p1,_n1,_s1);
    _ps1=_cps1-_cpsp;
    if _cps1 ge _minl and _cpsp le _minh then do;
       _p1a{_j1}=_ps1; _j1=_j1+1; if _j1=1 then _s1b=_s1;
    end;
    _cpsp=_cps1;
 end;
 if _j1=0 then do; put "*** ABORT -- error limiting parameter set too high";
    abort 16; end;
 _s1e=_s1b+_j1-1;
 * For margin 2, set up array used to hold the binomial probabilities
   for elements of the marginal subrange, and calculate and save those
   probabilities;
 array _p2a(0:&dim2) _temporary_;
 _cpsp=0; * cumulative binomial probability initialized;
 _j2=0; * initialize counter of binomial prbs. exceeding threshold;
 do _s2=0 to _n2;
    _cps2=probbnml(_p2,_n2,_s2);
    _ps2=_cps2-_cpsp;
    if _cps2 ge _minl and _cpsp le _minh then do;
       _p2a{_j2}=_ps2; _j2=_j2+1; if _j2=1 then _s2b=_s2;
    end;
    _cpsp=_cps2;
 end;
 if _j2=0 then do; put "*** ABORT -- error limiting parameter set too high";
    abort 16; end;
 _s2e=_s2b+_j2-1;
 * Perform computations over rectangular subspace;
 * ITERATE ALONG MARGIN 1 FOR BINOMIAL PROBABILITIES IN SUBRANGE;
 do _s1=_s1b to _s1e;
    _ps1=_p1a{_s1-_s1b};
    _f1=_n1-_s1; * used for chi-square computation;
    * ITERATE ALONG MARGIN 2 FOR BINOMIAL PROBABILITIES IN SUBRANGE;
    do _s2=_s2b to _s2e;
       _ps2=_p2a{_s2-_s2b};
       * COMPUTE BIVARIATE BINOMIAL PROBABILITIES;
       _ps1s2=_ps1*_ps2;
       _sum=_sum+_ps1s2;
       * compute chi-square;
       _s=_s1+_s2;
       _d=abs((_s1*(_n2-_s2))-(_s2*_f1))-_cc*_n; * adjusted for continuity corrctn;
       if _s ne 0 and _s ne _n then
          _chisq=_noprd*_d*_d/_s/(_n-_s); else _chisq=0;
       * note that without the 'if' statement above, the special cases
         s1=s2=0 and s1=n1,s2=n2 would yield a missing value
         due to division by zero in calculating _chisq; * That would
         cause a 'division by zero' message and a 'put _all_' to be put
         to the log; * In both cases in the 'if' clause, the sample
         proportions are exactly equal, which is consistent with the
         null hypothesis, so setting _chisq to zero causes it to be
         less than _crval below, so power is calculated correctly;
       * cumulate power;
       if _nside=2 then
          if _chisq gt _crval then &pw=&pw+_ps1s2; * this is for 2 tail test;
          else; * this else completes the nested if;
       else if _chisq gt _crval and _s1/_n1 lt _s2/_n2 then &pw=&pw+_ps1s2;
       * 2nd else is for 1 tail test;
       * put messages to log if _chisq is closer to _crval than 'round';
       _csmcv=_chisq-_crval; if abs(_csmcv) le &round then do;
          _rchsq=round(_chisq,&round);
          _rcrvl=round(_crval,&round);
          put;
          put '_chisq is close to _crval';
          put _alpha= _nside= _cc= _e=
              _s1= _n1= _p1= _s2= _n2= _p2= _ps1= _ps2= _ps1s2=
              &pw= _chisq= _rchsq= _crval= _rcrvl= _csmcv=;
          put;
       end;
    end;
 end;
 &errbd=1-_sum;
 if "&put_all" ne "N" then do; put _all_; put; end;
%mend powchiep;
%macro powchidr;
 * The user may use macro POWCHIDR to issue a DROP statement to
   drop all of the above temporary variables. An alternative is
   to use a KEEP statement in the calling data step to keep only
   those variables needed by the user. Do NOT use both DROP and
   KEEP statements in the same data step.;
 * PURPOSE: ISSUE A DROP STATEMENT TO DROP ALL OF THE TEMPORARY
   VARIABLES USED BY MACRO POWCHIEP;
 * drop the following 36 temporary variables used by the macro POWCHIEP;
 drop _alpha _cc _chisq _cps1 _cps2 _cpsp _crval
      _csmcv _d _e _errbd _f1 _j1 _j2 _minh _minl _n _n1
      _n2 _noprd _nside _p1 _p2 _ps1 _ps2 _ps1s2
      _rchsq _rcrvl _s _s1 _s2 _s1b _s1e _s2b _s2e _sum;
%mend powchidr;
Listing 2. Output from First Example
   Replication of Pierchala (1988)
   Table 1 Power Computations
     P1      P2      POWER    ERR_LIM
    0.001   0.001   0.0045   .000000012992
    0.001   0.002   0.0278   .000000013586
    0.001   0.004   0.1323   .000000013113
    0.001   0.006   0.2759   .000000021114
    0.001   0.008   0.4200   .000000023338
    0.001   0.010   0.5474   .000000020664
    0.001   0.012   0.6539   .000000016464
    0.001   0.014   0.7403   .000000012762
    0.001   0.016   0.8088   .000000022667
    0.001   0.018   0.8621   .000000015083
    0.001   0.020   0.9024   .000000024083
Listing 3. Output from Second Example (with added formatting)
   Garside and Mack (1976), Table 2 (Part 3)
   M=100, M'=50, ALPHA=.05, LAMDA_G=0.143
                    TEST
     P       Y        U        G
    0.1   0.0200   0.0425   0.0421
    0.2   0.0284   0.0478   0.0417
    0.3   0.0328   0.0491   0.0429
    0.4   0.0340   0.0494   0.0438
    0.5   0.0366   0.0473   0.0470
    0.6   0.0354   0.0524   0.0459
    0.7   0.0355   0.0530   0.0471
    0.8   0.0345   0.0521   0.0464
    0.9   0.0317   0.0517   0.0492
   Key: Y = Yates' corrected chi-square,
        U = Uncorrected chi-square,
        G = Garside's corrected chi-square