USING SAS FUNCTIONS FOR POWER ANALYSIS AND SAMPLE SIZE
ESTIMATION
Mel Widawski, UCLA, Los Angeles, California
ABSTRACT
The power of a test is the likelihood of detecting an effect (of a given size) with a test that uses a set alpha and sample size. Conversely, if you know the power you want to achieve, you can determine the sample size. What do you do if you need to know the power of a statistical test and you don't have a power analysis package handy? The SAS® Language has a number of functions for calculating a probability from the value of a statistic, and also for calculating the value of the statistic from a probability.

This paper will show some simple ways to determine power and sample size. You will also take home some sample programs to help with sample size and power determination. You should also develop a better understanding of what goes into power and sample size calculations. In order to make use of these functions you need to know how to calculate the noncentrality parameter. We will spend some time on these calculations and on some tricks in calculation. Finally, the paper will introduce you to the %POWER macro from SAS, which can be used to determine both power and sample size.
INTRODUCTION
The basic theme of this paper may be considered to be "it is all related", as we will also explore the relationships between tests in relation to power. We will start by looking at determining power or sample size for the t test. I will then suggest thinking about power in relation to multiple regression analysis, as I feel that it is more general and allows a maximum of freedom.

Understanding power analysis depends on understanding that all statistical tests have a theoretical distribution. We will start with the t distribution, and build from there.
The figure demonstrates the concept of power for the t distribution. The curve on the left is the central t distribution for a mean of zero. The curve on the right is the t distribution for an arbitrary effect size. Alpha ($\alpha$) is set to .05 and is the area of the first curve to the right of the line. Since the degrees of freedom are 20, the critical value of t is approximately 1.72. The area of the second curve to the right of that line is power. In the discussion of the t distribution below, I will show you a small program for calculating this critical value of t.

When people draw this distribution they tend to draw the two curves as identical curves, but as you can see they are actually a little different. The second curve is a noncentral t distribution, given that some alternate hypothesized value for t is correct. As the true effect gets bigger, the curve flattens and becomes a little skewed. You access this different curve by use of the noncentrality parameter, which is related to both the size of the effect and the sample size.

The noncentrality parameter for the t distribution is called delta ($\delta$), and is related to the effect size (d) for this statistic. Here n is the number of cases in each group.

Effect size:

$$ d = \frac{|\mu_1 - \mu_2|}{\sigma} $$

NC parameter:

$$ \delta = \frac{d}{\sqrt{2/n}} $$

You may notice a distinct relationship between the formula for the noncentrality parameter and that for t itself.

$$ t = \frac{|\mu_1 - \mu_2|}{\sigma\sqrt{2/n}} $$

These two formulas are the same. So you now know how to calculate the noncentrality parameter for t. In the discussion below we will see how these formulas can easily be calculated in the DATA step.

After exploring the calculation of power for t, we will show that you can easily calculate an $f^2$ that relates to the effect size for t, and that this is virtually identical to $R^2$ from regression analysis. The rest of the discussion will involve power calculations given estimates of $R^2$. Particular attention will be paid to how you go about specifying the effect sizes for power analysis.

t TESTS
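As a first small check, the critical value of about 1.72 quoted for 20 degrees of freedom can be reproduced with TINV. This is only a minimal sketch: it assumes a one-tailed alpha of .05, as in the figure description, and the variable names `alpha`, `df` and `crit_t` are illustrative choices, not names from the paper's programs.

```sas
/* Sketch: critical t for a one-tailed test at alpha = .05 with df = 20 */
DATA _NULL_;
  alpha  = .05;                   /* one-tailed alpha (assumed, as in the figure) */
  df     = 20;                    /* degrees of freedom                           */
  crit_t = TINV(1 - alpha, df);   /* quantile of the central t distribution       */
  PUT crit_t= 8.4;                /* written to the log; should be close to 1.72  */
RUN;
```

For a two-tailed test the first argument would be 1 - alpha/2 instead of 1 - alpha.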
When you want to estimate the power of a t test, the most difficult problem is determining the effect size (d) you wish to be able to detect. It is possible to use Cohen's small (.2), medium (.5) and large (.8) effect sizes for the differences between means (Cohen, 1988). Let us look at another way of specifying effect size that may be more meaningful to you.

If in review of the literature you find studies using the same dependent measures that you are using, you may find the means and standard deviations of those measures. Since you usually calculate power to convince someone (e.g. a granting agency) that your study is likely to succeed, you must decide what would constitute a meaningful change in the mean of this score. Armed with this information, you have all that is necessary to begin calculating the power or sample size for your study.

There are two main functions in SAS that we will be making use of for this problem: PROBT and TINV.

    probt(t,df,nc)

When using this for power analysis, t would be the critical t for the alpha level you are using (.05), df will be determined in the program, and nc can be calculated as shown above. This will yield a probability equivalent to $\beta$, the probability of a Type II error, and $1-\beta$ = power. We can use TINV to determine the critical t as follows.

    tinv(p,df)

For this calculation p is $1-\alpha/2$ for a two-tailed test.

A Sample Problem

Let's try to put this in concrete terms. Suppose you were examining a new method of preparation for the SAT, and we know that the current mean for the SAT is 1019 and the current standard deviation is 207. Under the method you are currently using, your students score around 1200 on the SAT. This translates to the 64th percentile. The mean SAT value for entering freshmen at UCLA is 1304. Thus we would like to know the sample size necessary to detect an increase of at least 100 points in the mean SAT score of your students.

The following program will calculate power and search for sample size:

    DATA temp;
      alpha=.05;
      tails=2;
      /* Specify lower/upper N and even increment */
      DO N_est=130 TO 140 BY 2;   /* testing N's */
        n1=ROUND(N_est/2);
        n2=N_est-n1;
        mean1=1300;   /* replace with Group1 mean est. */
        mean2=1200;   /* replace with Group2 mean est. */
        sigma=207;    /* replace with your std estim   */
        meandiff=ABS(mean1-mean2);
        alph=alpha/tails;
        df=N_est-2;
        t=meandiff/(sigma*SQRT((1/n1)+(1/n2)));
        rsq=((meandiff/2)**2)/
            ((sigma**2)+((meandiff/2)**2));
        effsize=(meandiff)/(sigma);
        NCt=effsize/SQRT((1/n1)+(1/n2));
        talph=TINV(1-alph,df);
        power=1-(PROBT(talph,df,NCt));
        OUTPUT;
      END;
    PROC PRINT;
      BY alpha tails mean1 mean2 sigma effsize rsq;
      FORMAT rsq 18.16;
      VAR N_est df n1 n2 talph NCt power;
    RUN;

You specify the low and high values to try for N, and the means and standard deviations. I usually start with a fairly wide range for N and an increment of 10 or 20, to determine how to narrow the range when searching for an N corresponding to .80 power. Notice the formulas for calculating effsize and NCt (the noncentrality parameter). The formula for rsq ($R^2$) is also given. An alternative way of calculating rsq is:

    Rsq=(t**2)/((t**2)+df);

This gives an approximation for $R^2$ based on the relationship between $R^2$ and F, and the relationship between F and t. You would calculate t in the usual fashion, or you can substitute NCt for t in the above equation. It is interesting that using N instead of df gives a stable estimate of $R^2$ that is identical to the formula used in the program above.

The program above produces the following output.

    alpha=0.05 tails=2 mean1=1300 std1=207 mean2=1200 std2=207
    effsize=0.4830917874 rsq=0.0551280072327946

    N_est    df    n1    n2    talph     NCt    power
      130   128    65    65   1.9786   2.754    0.780
      132   130    66    66   1.9783   2.775    0.786
      134   132    67    67   1.9781   2.796    0.792
      136   134    68    68   1.9778   2.816    0.798
      138   136    69    69   1.9775   2.837    0.804
      140   138    70    70   1.9773   2.858    0.810

A total N of 138, which corresponds to 69 per group, yields a power of at least .80.

A Second Sample Problem

Sometimes you don't have hard information about the means and standard deviations of your measures. In that case, one method you might use is to set the standard deviation to one and estimate the difference in means in relation to the standard deviation. Thus if your difference in means is estimated to be .5, then you are expecting a change in means of half a standard deviation. This corresponds to an effect size of .5 and will result in approximately the same sample size requirements as above. If you get a chance, try running the program with means of 1 and 1.4 and standard deviations of 1.

More General Sample Program

The following program will calculate power and search for sample size, and allows for different sample sizes in the two groups and slightly different standard deviations.

    DATA temp;
      alpha=.05;
      tails=2;
      mean1=1300;
      nn1=50;       /* replace 50 w/ grp1 n  */
      std1=207;     /* replace with G1 std   */
      mean2=1200;
      nn2=50;       /* replace 50 w/ grp2 n  */
      std2=207;     /* replace with G2 std   */
      N=nn1+nn2;    /* calc your total N     */
      meandiff=ABS(mean1-mean2);
      alph=alpha/tails;
      DO N_est=130 TO 140 BY 2;   /* testing Ns */
        n1=ROUND(nn1/N*(N_est));
        n2=N_est-n1;
        ss1=((n1)-1)*(std1**2);
        ss2=((n2)-1)*(std2**2);
        rsq=((meandiff/2)**2)/
            ((((ss1+ss2)/(N_est-2)))
            +((meandiff/2)**2));
        effsize=(meandiff)/
            SQRT((ss1+ss2)/(N_est-2));
        NCt=effsize/SQRT((1/n1)+(1/n2));
        df=N_est-2;
        talph=TINV(1-alph,df);
        power=1-(PROBT(talph,df,NCt));
        OUTPUT;
      END;
    PROC PRINT;
      BY alpha tails mean1 std1 mean2 std2
         effsize rsq;
      FORMAT rsq 18.16;
      VAR N_est df n1 n2 talph NCt power;
    RUN;

You specify the low and high values to try for N and the increment, which should be an even number unless the ratio you specify would require a different multiple. For example, if you specify nn1 of 1 and nn2 of 2, then you are requesting proportional sampling in a 2 to 1 ratio. In that case you would want to specify an increment of 3.

This program is more amenable to sample estimates of sigma, but also gives the same answer as the previous program when population estimates are used.

F AND $R^2$

The reason we produced the estimate of $R^2$ in the t test example was to show the relationship between power estimates for F and t. I would also like to introduce $R^2$ as a common interface. This allows us to use what I feel is a more flexible approach to calculating power. First we need to look at the definitions of the effect size ($f^2$) and the noncentrality parameter (lambda, $\lambda$).

Effect size:

$$ f^2 = \frac{R^2}{1 - R^2} $$

NC parameter:

$$ \lambda = f^2 \cdot N $$

You may notice a distinct relationship between the formula for the noncentrality parameter and that for F, in which the numerator and denominator are divided by their respective degrees of freedom.

Computing these values is rather simple and can be accomplished with the following code:

    effsz_f2=rsq/(1-rsq);   /* for effect size */

and

    lambda=effsz_f2*(edf + ndf +1);   /* NC param */

where $R^2$ (rsq) is either specified or calculated in your program, edf is the error degrees of freedom ((N-ndf)-1), and ndf is the numerator degrees of freedom. The reason for specifying this in terms of degrees of freedom, rather than N, is that it is more flexible. It enables specifying covariates, or multiple regression problems with other predictors. The FINV function is used to determine the critical value for F given alpha and the appropriate degrees of freedom.

    crit_F=FINV(1-alph,ndf,edf);

The following program performs a power and sample size analysis with F that corresponds to the one done for the t test above:

    DATA powerf;
      alph=.05;
      tails=2;
      rsq=0.0551280072327946;
      rsqcov=0;
      *rsqexp=.;
      error=1-(rsq+rsqcov);
      ndf=1;
      cvdf=0;
      DO N=130 TO 140 BY 2;
        edf=N-(ndf+cvdf+1);
        crit_F=FINV(1-(alph*(3-tails)),ndf,edf);
        Fexp=(rsq/ndf)/(error/edf);
        effsz_f2=rsq/error;
        lambda=effsz_f2*(edf+ndf+1);
        power=1-PROBF(crit_F,ndf,edf,lambda);
        OUTPUT;
      END;
    PROC PRINT DATA=powerf NOOBS;
      TITLE "Power and Sample Size; R & F";
      BY alph tails rsq rsqcov ndf cvdf error;
      VAR N edf crit_F Fexp effsz_f2
          lambda power;
    RUN;

Notice that the critical value for F (crit_F), the degrees of freedom, and lambda are all that are necessary to calculate power using the PROBF function.

The program above produces the following output.

    alph=0.05 tails=2 rsq=0.0551280072 rsqcov=0
    ndf=1 cvdf=0 error=0.9448719928

      N   edf    crit_F   effsz_f2    lambda     power
    130   128   3.91514   0.058344   7.58477   0.78035
    132   130   3.91399   0.058344   7.70146   0.78659
    134   132   3.91288   0.058344   7.81815   0.79268
    136   134   3.91179   0.058344   7.93484   0.79861
    138   136   3.91075   0.058344   8.05153   0.80441
    140   138   3.90973   0.058344   8.16822   0.81006

The power, N, and degrees of freedom in this output should look familiar, as they are the same as in the previous output. Since we have already calculated these values with a t test, you might think that this exercise was unnecessary.
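The agreement between the two outputs can be checked in a single DATA step. The following is a minimal sketch, not one of the paper's programs: it assumes two equal groups of 69 (total N of 138), a two-tailed alpha of .05, and the effect size of 100/207 from the sample problem. The variable names `power_t`, `power_f`, `f2` and `nct_sq` are illustrative.

```sas
/* Sketch: power at N = 138 from the noncentral t and from the noncentral F */
DATA _NULL_;
  alpha = .05;                             /* two-tailed alpha                     */
  n1 = 69;  n2 = 69;                       /* equal groups, total N = 138          */
  d  = 100/207;                            /* effect size from the sample problem  */
  df = n1 + n2 - 2;                        /* error degrees of freedom             */

  /* t route */
  nct     = d / SQRT(1/n1 + 1/n2);         /* noncentrality parameter delta        */
  talph   = TINV(1 - alpha/2, df);         /* two-tailed critical t                */
  power_t = 1 - PROBT(talph, df, nct);     /* upper tail of the noncentral t       */

  /* F route */
  rsq     = (d/2)**2 / (1 + (d/2)**2);     /* R-square implied by d, equal groups  */
  f2      = rsq / (1 - rsq);               /* effect size f-square                 */
  lambda  = f2 * (df + 1 + 1);             /* edf + ndf + 1, which equals N here   */
  crit_f  = FINV(1 - alpha, 1, df);        /* critical F with 1 numerator df       */
  power_f = 1 - PROBF(crit_f, 1, df, lambda);

  nct_sq  = nct**2;                        /* algebraically the same as lambda     */
  PUT nct= nct_sq= lambda= power_t= power_f=;
RUN;
```

Under these assumptions the squared t noncentrality and lambda are algebraically the same quantity, so the two power values written to the log should correspond to the N = 138 rows of the t and F outputs. The F-based power also includes the negligible lower tail that the one-sided PROBT call leaves out.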
Benefiting from a Covariate
What if you cannot afford to sample 138 people for this study? You can accept a larger effect size to detect, or you can scrap the study. Another solution would be to introduce a covariate that is related to SAT but should not be related to the method of instruction you are evaluating. You will have to further assume that the method of instruction will not change the relationship between this covariate and SAT. Let us assume that IQ is such a variable, and that you can easily administer an IQ test to your students.

Assume that it is known that IQ relates to SAT with an $R^2$ of .2, and that there is zero correlation between IQ and the method of teaching. Luckily, in such a case the $R^2$ values for the effects are additive. Taking advantage of both of these facts, we can reapply the program we used previously, but specify two additional variables: rsqcov, and cvdf for the degrees of freedom of the covariate.

    rsqcov=.2;
    cvdf=1;

After re-running the program we see the following results.

    alph=0.05 tails=2 rsq=0.0551280072 rsqcov=0.2
    ndf=1 cvdf=1 error=0.7448719928

      N   edf    crit_F   effsz_f2    lambda     power
    100    97   3.93913   0.074010   7.32699   0.76424
    102    99   3.93712   0.074010   7.47501   0.77261
    104   101   3.93519   0.074010   7.62303   0.78074
    106   103   3.93334   0.074010   7.77105   0.78861
    108   105   3.93156   0.074010   7.91907   0.79625
    110   107   3.92984   0.074010   8.06709   0.80365

Notice that in this case we can get by with 28 fewer people in our sample. An $R^2$ of .2 corresponds to a correlation of about .44 for IQ with SAT. That is not that unlikely; in fact, if the correlation were more like .60, then the $R^2$ would be .36 and we would need only 88 students. If there is reason to believe that your covariate is related to the teaching method, then the problem becomes more complicated, and you have to know the $R^2$ for IQ and the $R^2$ for the combination of IQ and teaching method.

Finding Minimum Effect Size for an N and Power

There are times when the limitations are such that you can only manage a given sample size, and you would like to determine the effect size that would correspond to that sample size with .80 power. A simple modification of the preceding code will accomplish this task. All we have to do is add an outer DO loop to the program, which steps through the changes in effect size until we reach .80 power.

The code to accomplish this follows:

    DATA power;
      ndf=1;
      cvdf=0;
      alph=.05;
      tails=2;
      rsqcov=0;
      DO rsq=.09135 TO .0914 BY .00001;
        error=1-(rsq+rsqcov);
        DO n=80 TO 80 BY 2;
          edf=n-(ndf+cvdf+1);
          crit_F=FINV(1-(alph*(3-tails)),ndf,edf);
          Fexp=(rsq/ndf)/(error/edf);
          effsz_f2=rsq/error;
          delta=SQRT(effsz_f2*(edf+ndf+1));
          effsz_d=delta/SQRT((n/2)/2);
          lambda=effsz_f2*(edf+ndf+1);
          power=1-PROBF(crit_F,ndf,edf,lambda);
          OUTPUT;
        END;
      END;
    RUN;
    PROC PRINT NOOBS;
      BY alph tails n ndf edf cvdf rsqcov
         crit_F;
      VAR rsq error effsz_f2 lambda power
          delta effsz_d;
    RUN;

A good strategy is to narrow in on the range by first using a large increment with a fairly broad range. Then, as you narrow the range, you can decrease the increment until you feel you have the precision necessary.

Notice the formulas for the relationship between $f^2$ and $\delta$, and the formula for obtaining the effect size d from $\delta$.

    delta=sqrt(effsz_f2*(edf+ndf+1));
    effsz_d=delta/sqrt((N/2)/2);

This is useful if you want to be able to specify the effect in terms of the metric of your scale for a t test. The output produced by this code follows:

    alph=0.05 tails=2 n=80 ndf=1 edf=78 cvdf=0 rsqcov=0
    crit_F=3.9634720514

        rsq   effsz_f2    lambda    power    delta   effsz_d
    0.09135    0.10053   8.04270   0.7998   2.8359   0.63414
    0.09136    0.10055   8.04367   0.7998   2.8361   0.63418
    0.09137    0.10056   8.04464   0.7999   2.8363   0.63422
    0.09138    0.10057   8.04561   0.7999   2.8364   0.63426
    0.09139    0.10058   8.04658   0.7999   2.8366   0.63429
    0.09140    0.10059   8.04755   0.8000   2.8368   0.63433

An $R^2$ of .09140 is associated with .80 power. Applying this to the original problem, the effect size d of .63433 means that we can detect a change of .63433*207, or 131, with .80 power. This translates to being confident of detecting a mean change to 1331 from 1200.

THE %POWER MACRO

This is a macro written by Kristin R. Latour, and a short article on it is available as a technical report from SAS Institute at the following URL:

http://ftp.sas.com/techsup/download/technote/ts272.pdf

The macro itself is available at the following URL:

http://ewe3.sas.com/techsup/download/stat/power.html

This macro calculates power for ANOVA designs; it works with GLM and uses the OUTSTAT data set from GLM as input. It is set up to handle both prospective power analysis and retrospective power analysis, that is, power calculation for a study already completed. Prospective power analysis is what we have been doing up until now, where power and sample size are calculated for a study not yet undertaken.

A sample program for using the macro for prospective analysis is available at the following URL:

http://ewe3.sas.com/techsup/download/stat/powerex.html

I am presenting an excerpt from that program so that you might notice a trick that is provided for aiding in some of the specifications for the %POWER macro. PROC GLM is used to determine the sum of squares for a given mean structure. Then the standard deviation (sigma) that is specified is used in conjunction with this to aid in the calculation of effect size and the noncentrality parameter. The program code follows:
    DATA prospect;
      INPUT group mean count;
      CARDS;
    1 40 5
    2 45 10
    3 35 10
    ;
    PROC GLM DATA=prospect OUTSTAT=prosout;
      CLASS group;
      FREQ count;
      MODEL mean=group;
    RUN;

    %power(data=prosout,
           out=powout,
           effect=group,
           calcs=power lsn,
           alpha=.01 .05,
           sigma=4.0 8.0,
           delta=2.0 5.0)

This is a convenient way to approach ANOVA problems. It is possible to use GLM to produce the sum of squares and to determine the sum of squares error, which is simply the error degrees of freedom times the variance, in order to determine $R^2$, which is the sum of squares for the effect over the sum of squares total.

Another convenient site for calculation of power for ANOVA designs that include repeated measures is Michael Friendly's site at the following URL, where his fpower macro is available.

http://www.math.yorku.ca/SCS/sasmac/fpower.html

CONCLUSION

We have covered the calculation of effect size and noncentrality parameters for the t and F distributions, demonstrated the relationship between these and $R^2$, and shown a number of ways to calculate power and determine sample size. I hope this helps you both to use the SAS functions to calculate power and to better specify the parameters required by power analysis packages.

To recap, the functions that assist in determining power are TINV and FINV for determining the critical values for t and F, and PROBT and PROBF for calculating power using the noncentral t and F distributions. You have learned to calculate $\delta$ and $\lambda$, the noncentrality parameters that need to be supplied to these functions.

ACKNOWLEDGMENTS

I would like to thank some of the aforementioned individuals and some unnamed individuals who have shared their knowledge of power analysis on the Web.

REFERENCES

Cohen, J. Statistical Power Analysis for the Behavioral Sciences, Hillsdale, NJ: Lawrence Erlbaum Associates, 1988. 567 pp.

SAS Institute Inc., SAS® Language: Reference, Version 6, First Edition, Cary, NC: SAS Institute Inc., 1990.

SAS Institute Inc., SAS/STAT® User's Guide, Version 6, Fourth Edition, Volume 2, Cary, NC: SAS Institute Inc., 1989. 1351-1194.

SAS is a registered trademark or trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Mel Widawski
UCLA
E-mail Address: [email protected]