USING SAS FUNCTIONS FOR POWER ANALYSIS AND SAMPLE SIZE ESTIMATION

Mel Widawski, UCLA, Los Angeles, California

ABSTRACT

The power of a test is the likelihood of detecting an effect of a given size with a test that uses a set alpha and sample size. Conversely, if you know the power you want to achieve, you can determine the sample size. What do you do if you need to know the power of a statistical test and you don't have a power analysis package handy? The SAS® Language has a number of functions for calculating a probability from the value of a statistic, and for calculating the value of the statistic from a probability. This paper will show some simple ways to determine power and sample size. You will take home some sample programs to help with sample size and power determination. You should also develop a better understanding of what goes into power and sample size calculations.

In order to make use of these functions you need to know how to calculate the noncentrality parameter. We will spend some time on these calculations and on some tricks in calculation. Finally, the paper will also introduce you to the %POWER macro from SAS, which can be used to determine both power and sample size.

INTRODUCTION

The basic theme of this paper may be considered to be "it is all related," as we will also explore the relationships between tests in relation to power. We will start by looking at determining power or sample size for t tests. I will suggest thinking about power in relation to multiple regression analysis, as I feel that it is more general and allows a maximum of freedom.

Understanding power analysis depends on understanding that all statistical tests have a theoretical distribution. We will start with the t distribution and build from there.

[Figure: central and noncentral t distributions, df = 20]

This figure demonstrates the concept of power for the t distribution. The curve on the left is the central t distribution for a mean of zero. The curve on the right is the t distribution for an arbitrary effect size. Alpha (α) is set to .05 and is the area of the first curve to the right of the line. Since the degrees of freedom are 20, the critical value of t is approximately 1.72. The area of the second curve to the right of that line is power. In the discussion of the t distribution below, I will show you a small program for calculating this critical value of t.

When people draw this distribution they tend to draw the two curves as identical curves, but as you can see they are actually a little different. The second curve is a noncentral t distribution, given that some alternate hypothesized value for t is correct. As the true effect gets bigger, the curve flattens and becomes a little skewed. You access this different curve by use of the noncentrality parameter, which is related to both the size of the effect and the sample size.

The noncentrality parameter for the t distribution is called delta (δ), and is related to the effect size (d) for this statistic. With n the sample size in each group:

    Effect size:   $d = \dfrac{|\mu_1 - \mu_2|}{\sigma}$

    NC parameter:  $\delta = \dfrac{d}{\sqrt{2/n}}$

You may notice a distinct relationship between the formula for the noncentrality parameter and that for t itself.

    $t = \dfrac{|\mu_1 - \mu_2|}{\sigma\sqrt{2/n}}$

These two formulas are the same: substituting the definition of d into the expression for δ gives the expression for t. So you now know how to calculate the noncentrality parameter for t. In the discussion below we will see how these formulas can easily be calculated in the DATA step.

After exploring the calculation of power for t, we will show that you can easily calculate f², which relates to the effect size for t, and that this is virtually identical to R² from regression analysis. The rest of the discussion will involve power calculations given estimates of R². Particular attention will be paid to how you go about specifying the effect sizes for power analysis.

t TESTS

When you want to estimate the power of a t test, the most difficult problem is determining the effect size (d) you wish to be able to detect. It is possible to use Cohen's small (.2), medium (.5) and large (.8) effect sizes for the differences between means (Cohen, 1988).
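As a rough illustration of what these conventional effect sizes imply, the sketch below computes the critical t and the power for each of Cohen's three values of d. It is only a sketch under stated assumptions: the data set name cohen_power, the variable names, and the choice of 11 subjects per group (picked so that df = 20, matching the figure's one-tailed alpha of .05) are illustrative and do not come from the paper's own programs. Only the PROBT and TINV functions discussed in this paper are used.

```sas
/* Sketch only: data set name, variable names and n_per_group are assumptions */
DATA cohen_power;
  alpha = .05;
  tails = 1;                          /* one-tailed, as in the figure          */
  n_per_group = 11;                   /* assumed; gives df = 2*11 - 2 = 20     */
  df = 2*n_per_group - 2;
  t_crit = TINV(1 - alpha/tails, df); /* critical t, about 1.72 for df = 20    */
  DO d = .2, .5, .8;                  /* Cohen's small, medium, large          */
    delta = d / SQRT(2/n_per_group);  /* noncentrality parameter               */
    power = 1 - PROBT(t_crit, df, delta); /* area of noncentral t beyond t_crit */
    OUTPUT;
  END;
RUN;

PROC PRINT DATA=cohen_power NOOBS;
  VAR d delta t_crit power;
RUN;
```

Changing n_per_group and rerunning shows how quickly power rises with sample size for a fixed d, which is the same search the sample programs in this paper carry out with a DO loop over N.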
Let us look at another way of specifying effect size that may be more meaningful to you. If in a review of the literature you find studies using the same dependent measures that you are using, you may find the means and standard deviations of those measures. Since you usually calculate power to convince someone (e.g. a granting agency) that your study is likely to succeed, you must decide what would constitute a meaningful change in the mean of this score. Then, armed with this information, you have all that is necessary to begin calculating the power or sample size for your study.

There are two main functions in SAS that we will be making use of for this problem: PROBT and TINV.

    probt(t,df,nc)

When using this for power analysis, t would be the critical t for the alpha level you are using (.05), df will be determined in the program, and nc can be calculated as shown above. This will yield a probability equivalent to β, the probability of a Type II error, and 1-β = power. We can use TINV to determine the critical t as follows.

    tinv(p,df)

For this calculation p is 1-α/2 for a two-tailed test.

A Sample Problem

Let's try to put this in concrete terms. Suppose you were examining a new method of preparation for the SAT, and we know that the current mean for the SAT is 1019 and the current standard deviation is 207. Under the current method you are using, your students score around 1200 on the SAT. This translates to the 64th percentile. The mean SAT value for entering freshmen at UCLA is 1304. Thus we would like to know the sample size necessary to detect an increase of at least 100 points in the mean SAT score of your students. The following program will calculate power and search for sample size:

    DATA temp;
      alpha=.05;
      tails=2;
      /* Specify lower/upper N and even increment */
      DO N_est=130 TO 140 BY 2;      /* testing N's */
        n1=ROUND(N_est/2);
        n2=N_est-n1;
        mean1=1300;   /* replace with Group1 mean est. */
        mean2=1200;   /* replace with Group2 mean est. */
        sigma=207;    /* replace with your std estim   */
        meandiff=ABS(mean1-mean2);
        alph=alpha/tails;
        df=N_est-2;
        t=meandiff/(sigma*SQRT((1/n1)+(1/n2)));
        rsq=((meandiff/2)**2)/((sigma**2)+((meandiff/2)**2));
        effsize=meandiff/sigma;
        NCt=effsize/SQRT((1/n1)+(1/n2));
        talph=TINV(1-alph,df);
        power=1-(PROBT(talph,df,NCt));
        OUTPUT;
      END;
    RUN;

    PROC PRINT;
      BY alpha tails mean1 mean2 sigma effsize rsq;
      FORMAT rsq 18.16;
      VAR N_est df n1 n2 talph NCt power;
    RUN;

You specify the low and high values to try for N, and the means and standard deviations. I usually start with fairly widespread values for the range for N, and an increment value of 10 or 20, to determine how to narrow the range for searching for an N corresponding to .80 power.

Notice the formulas for calculating effsize and NCt (the noncentrality parameter). Also the formula for rsq (R²) is given. An alternative way of calculating rsq is:

    Rsq=(t**2)/((t**2)+df);

This gives an approximation for the R² based on the relationship between R² and F, and the relationship between F and t. You would calculate t in the usual fashion, or you can substitute the NCt for t in the above equation. It is interesting that using N instead of df gives a stable estimate of R² that is identical to the formula used in the program above.

The program above produces the following output.

    alpha=0.05 tails=2 mean1=1300 std1=207 mean2=1200 std2=207
    effsize=0.4830917874 rsq=0.0551280072327946

    N_est    df    n1    n2    talph     NCt     power
     130    128    65    65    1.9786    2.754   0.780
     132    130    66    66    1.9783    2.775   0.786
     134    132    67    67    1.9781    2.796   0.792
     136    134    68    68    1.9778    2.816   0.798
     138    136    69    69    1.9775    2.837   0.804
     140    138    70    70    1.9773    2.858   0.810

A total N of 138, which corresponds to 69 per group, yields a power of at least .80.

A Second Sample Problem

Sometimes you don't have hard information about the means and standard deviations of your measures. In that case one method you might use is to set the standard deviation to one and estimate a difference in means in relation to the standard deviation. Thus if your difference in means is estimated to be .5, then you are expecting a half a standard deviation change in means. This corresponds to an effect size of .5 and will result in approximately the same sample size requirements as above. If you get a chance, try running the program with means of 1 and 1.4 and standard deviations of 1.

More General Sample Program

The following program will calculate power and search for sample size, and allows for different sample sizes in the two groups and slightly different standard deviations.

    DATA temp;
      alpha=.05;
      tails=2;
      mean1=1300;
      nn1=50;       /* replace 50 w/ grp1 n */
      std1=207;     /* replace with G1 std  */
      mean2=1200;
      nn2=50;       /* replace 50 w/ grp2 n */
      std2=207;     /* replace with G2 std  */
      N=nn1+nn2;    /* calc your total N    */
      meandiff=ABS(mean1-mean2);
      alph=alpha/tails;
      DO N_est=130 TO 140 BY 2;      /* testing Ns */
        n1=ROUND(nn1/N*(N_est));
        n2=N_est-n1;
        ss1=((n1)-1)*(std1**2);
        ss2=((n2)-1)*(std2**2);
        rsq=((meandiff/2)**2)/
            (((ss1+ss2)/(N_est-2))+((meandiff/2)**2));
        effsize=meandiff/SQRT((ss1+ss2)/(N_est-2));
        NCt=effsize/SQRT((1/n1)+(1/n2));
        df=N_est-2;
        talph=TINV(1-alph,df);
        power=1-(PROBT(talph,df,NCt));
        OUTPUT;
      END;
    RUN;

    PROC PRINT;
      BY alpha tails mean1 std1 mean2 std2 effsize rsq;
      FORMAT rsq 18.16;
      VAR N_est df n1 n2 talph NCt power;
    RUN;

You specify the low and high values to try for N and the increment, which should be an even number unless the ratio you specify would require a different multiple. For example, if you specify nn1 of 1 and nn2 of 2, then you are requesting proportional sampling in a 2 to 1 ratio. In that case you would want to specify an increment of 3.

This program is more amenable to sample estimates of sigma, but also gives the same answer as the previous program when population estimates are used.

F AND R²

The reason we produced the estimate of R² in the t test example was to show the relationship between power estimates for F and t. Also, I would like to introduce R² as a common interface. This also allows us to use what I feel is a more flexible approach to calculating power. First we need to look at the definitions for effect size (f²) and the noncentrality parameter (lambda, λ).

    Effect size:   $f^2 = \dfrac{R^2}{1 - R^2}$

    NC parameter:  $\lambda = f^2 \cdot N$

You may notice a distinct relationship between the formula for the noncentrality parameter and that for F, in which the numerator and denominator are divided by their respective degrees of freedom. Computing these values is rather simple and can be accomplished with the following code:

    effsz_f2=rsq/(1-rsq);            /* for effect size */

and

    lambda=effsz_f2*(edf+ndf+1);     /* NC param */

where R² (rsq) is either specified or calculated in your program, edf is the error degrees of freedom ((N-ndf)-1), and ndf is the numerator degrees of freedom. The reason for specifying this as degrees of freedom, rather than as N, is that it is more flexible. This enables specifying covariates or multiple regression problems with other predictors.

The FINV function is used to determine the critical value for F given alpha and the appropriate degrees of freedom.

    crit_F=FINV(1-alph,ndf,edf);

The following program will do a power and sample size analysis that corresponds to that done for the t test above:

    DATA powerf;
      alph=.05;
      tails=2;
      rsq=0.0551280072327946;
      rsqcov=0;
      *rsqexp=.;
      error=1-(rsq+rsqcov);
      ndf=1;
      cvdf=0;
      DO N=130 TO 140 BY 2;
        edf=N-(ndf+cvdf+1);
        crit_F=FINV(1-(alph*(3-tails)),ndf,edf);
        Fexp=(rsq/ndf)/(error/edf);
        effsz_f2=rsq/error;
        lambda=effsz_f2*(edf+ndf+1);
        power=1-PROBF(crit_F,ndf,edf,lambda);
        OUTPUT;
      END;
    RUN;

    PROC PRINT DATA=powerf NOOBS;
      TITLE "Power and Sample Size; R & F";
      BY alph tails rsq rsqcov ndf cvdf error;
      VAR N edf crit_F Fexp effsz_f2 lambda power;
    RUN;

Notice that the critical value for F (crit_F), the degrees of freedom, and lambda are all that are necessary to calculate power using the PROBF function.

The program above produces the following output.

    alph=0.05 tails=2 rsq=0.0551280072 rsqcov=0 ndf=1 cvdf=0 error=0.9448719928

      N    edf    crit_F    effsz_f2    lambda     power
     130   128    3.91514   0.058344    7.58477    0.78035
     132   130    3.91399   0.058344    7.70146    0.78659
     134   132    3.91288   0.058344    7.81815    0.79268
     136   134    3.91179   0.058344    7.93484    0.79861
     138   136    3.91075   0.058344    8.05153    0.80441
     140   138    3.90973   0.058344    8.16822    0.81006

The power, N, and degrees of freedom in this output should look familiar, as they are the same as in the previous output. Since we have already calculated these values as a t test, it might be thought that this exercise is unnecessary.

Benefiting from a Covariate

What if you cannot afford to sample 138 people for this study? You can accept a larger effect size to detect, or you can scrap the study. Another solution would be to introduce a covariate that is related to SAT but should not be related to the method of instruction you are evaluating.
You will have to further assume that the method of evaluation will not change the relationship between this covariate and SAT. Let us assume that IQ is such a variable, and you can easily administer this IQ test to your students. Assume that it is known that IQ relates to SAT with an R² of .2, and that there is zero correlation between IQ and the method of teaching. Luckily, in such a case the R² values for each effect are additive. Taking advantage of both of these facts, we can reapply the program we used previously, but specify two additional variables: rsqcov, and cvdf for the degrees of freedom of the covariate.

    rsqcov=.2;
    cvdf=1;

Now after re-running the program we see the following results.

    alph=0.05 tails=2 rsq=0.0551280072 rsqcov=0.2 ndf=1 cvdf=1 error=0.7448719928

      N    edf    crit_F    effsz_f2    lambda     power
     100    97    3.93913   0.074010    7.32699    0.76424
     102    99    3.93712   0.074010    7.47501    0.77261
     104   101    3.93519   0.074010    7.62303    0.78074
     106   103    3.93334   0.074010    7.77105    0.78861
     108   105    3.93156   0.074010    7.91907    0.79625
     110   107    3.92984   0.074010    8.06709    0.80365

Notice that in this case we can get by with 28 fewer people in our sample. This corresponds to a correlation of about .44 for IQ with SAT. That is not that unlikely; in fact, if the correlation were more like .60 then the R² would be .36 and we would need only 88 students.

If there is reason to believe that your covariate is related to the teaching method, then the problem becomes more complicated and you have to know the R² for IQ and the R² for the combination of IQ and teaching method.

Finding Minimum Effect Size for an N and Power

There are times when the limitations are such that you can only manage a given sample size, and you would like to determine the effect size that would correspond to that sample size with .80 power. A simple modification of the preceding code will accomplish this task. All we have to do is add an outer DO loop to the program which steps through the changes in effect sizes until we reach .80 power. The code to accomplish this follows:

    DATA power;
      ndf=1;
      cvdf=0;
      alph=.05;
      tails=2;
      rsqcov=0;
      DO rsq=.09135 TO .0914 BY .00001;
        error=1-(rsq+rsqcov);
        DO n=80 TO 80 BY 2;
          edf=n-(ndf+cvdf+1);
          crit_F=FINV(1-(alph*(3-tails)),ndf,edf);
          Fexp=(rsq/ndf)/(error/edf);
          effsz_f2=rsq/error;
          delta=SQRT(effsz_f2*(edf+ndf+1));
          effsz_d=delta/SQRT((N/2)/2);
          lambda=effsz_f2*(edf+ndf+1);
          power=1-PROBF(crit_F,ndf,edf,lambda);
          OUTPUT;
        END;
      END;
    RUN;

    PROC PRINT NOOBS;
      BY alph tails N ndf edf cvdf rsqcov crit_F;
      VAR rsq error effsz_f2 lambda power delta effsz_d;
    RUN;

A good strategy is to narrow in on the range by using a large increment first with a fairly broad range. Then as you narrow the range you can decrease the increment until you feel you have the precision necessary.

Notice the formulas for the relationship between f² and δ, and the formula for obtaining the effect size d from δ.

    delta=sqrt(effsz_f2*(edf+ndf+1));
    effsz_d=delta/sqrt((N/2)/2);

This is useful if you want to be able to specify the effect in terms of the metric of your scale for a t test. The output produced by this code follows:

    alph=0.05 tails=2 n=80 ndf=1 edf=78 cvdf=0 rsqcov=0 crit_F=3.9634720514

      rsq      effsz_f2    lambda     power     delta     effsz_d
    0.09135    0.10053     8.04270    0.7998    2.8359    0.63414
    0.09136    0.10055     8.04367    0.7998    2.8361    0.63418
    0.09137    0.10056     8.04464    0.7999    2.8363    0.63422
    0.09138    0.10057     8.04561    0.7999    2.8364    0.63426
    0.09139    0.10058     8.04658    0.7999    2.8366    0.63429
    0.09140    0.10059     8.04755    0.8000    2.8368    0.63433

An R² of .09140 is associated with .80 power. Applying this to the original problem, the effect size d of .63433 means that we can detect a change of .63433*207, or 131, with .80 power. This translates to being confident of detecting a mean change to 1331 from 1200.

THE %POWER MACRO

This is a macro written by Kristin R. Latour, and a short article on it is available as a technical report from SAS Institute at the following URL:
http://ftp.sas.com/techsup/download/technote/ts272.pdf

The macro itself is available at the following URL:
http://ewe3.sas.com/techsup/download/stat/power.html

This macro calculates power for ANOVA designs, works with GLM, and uses the OUTSTAT data set from GLM as input. It is set up to handle both prospective power analysis and retrospective power analysis, that is, power calculation for a study already completed. Prospective power analysis is what we have been doing up until now, where power and sample size are calculated for a study not yet undertaken. A sample program for using the macro for prospective analysis is available at the following URL:
http://ewe3.sas.com/techsup/download/stat/powerex.html

I am presenting an excerpt from that program so that you might notice a trick that is provided for aiding in some of the specifications for the %POWER macro. PROC GLM is used to determine the sum of squares for a given mean structure. Then the standard deviation (sigma) that is specified is used in conjunction with this to aid in the calculation of effect size and the noncentrality parameter. The program code follows:

    DATA prospect;
      INPUT group mean count;
      CARDS;
    1 40 5
    2 45 10
    3 35 10
    ;

    PROC GLM DATA=prospect OUTSTAT=prosout;
      CLASS group;
      FREQ count;
      MODEL mean=group;
    RUN;

    %power(data=prosout, out=powout,
           effect=group, calcs=power lsn,
           alpha=.01 .05, sigma=4.0 8.0,
           delta=2.0 5.0)

This is a convenient way to approach ANOVA problems. It is possible to use GLM to produce the sum of squares and determine the sum of squares error, which is simply the error degrees of freedom times the variance, in order to determine R², which is the sum of squares for the procedure over the sum of squares total.

Another convenient site for calculation of power for ANOVA designs that include repeated measures is Michael Friendly's site at the following URL, where his fpower macro is available:
http://www.math.yorku.ca/SCS/sasmac/fpower.html

CONCLUSION

We have covered the calculation of effect size and noncentrality parameters for the t and F distributions, demonstrated the relationship between these and R², and shown a number of ways to calculate power and determine sample size. I hope this helps you both to use the SAS functions to calculate power and to better specify the parameters required by power analysis packages.

To recap, the functions that assist in determining power are TINV and FINV for determining the critical values for t and F, and PROBT and PROBF for calculating power using the noncentral t and F distributions. You have learned to calculate δ and λ, the noncentrality parameters that you need to supply to these functions.

ACKNOWLEDGMENTS

I would like to thank some of the aforementioned individuals and some unnamed individuals who have shared their knowledge of power analysis on the Web.

REFERENCES

Cohen, J. Statistical Power Analysis for the Behavioral Sciences, Hillsdale, NJ: Lawrence Erlbaum Associates, 1988. 567 pp.

SAS Institute Inc., SAS® Language: Reference, Version 6, First Edition, Cary, NC: SAS Institute Inc., 1990.

SAS Institute Inc., SAS/STAT® User's Guide, Version 6, Fourth Edition, Volume 2, Cary, NC: SAS Institute Inc., 1989. 1351-1194.

SAS is a registered trademark or trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

    Mel Widawski
    UCLA
    E-mail Address: [email protected]