PO04

%YAMGAST: Yet Another Macro to Generate a Summary Table
Zang Xiong, St. Jude Children’s Research Hospital, Tennessee

ABSTRACT

Many published macros generate a summary table for clinical study reports. All of these macros require users to define both the type of descriptive statistics and the method of statistical testing. However, for a given sample size, the choice of descriptive statistic usually implies a standard test for comparison. For example, when we report the mean with standard deviation of a continuous variable, we use a two-sample t-test or ANOVA to test whether the variable is distributed differently in the groups of interest. If the sample size is small, an exact test might be more appropriate. This paper introduces a user-friendly macro that automatically chooses an appropriate test based on the type of descriptive statistics and the sample size. It produces a highly customized, ready-to-publish table in RTF format.

INTRODUCTION

A table summarizing descriptive statistics and statistical comparisons appears in many clinical study reports. In such a table, the mean with standard deviation or the median with range is used to summarize continuous variables, whereas frequency with row or column percentage is used for categorical variables. When the mean is reported, the p-value from a two-sample t-test or ANOVA is used to examine the difference in the distribution of the variable across the groups of interest. When the median is reported, the Wilcoxon rank sum test is used. The chi-square test is usually used for comparison when we report frequency with percentage.

Exact p-values are recommended over asymptotic p-values when the asymptotic assumptions cannot be satisfied. A small sample size is one such violation. An exact test is useful for comparing sample proportions when the expected number of occurrences in some cells is less than 5.[1]
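The expected-count rule above can be checked directly in PROC FREQ. The following is a minimal sketch, not code from the macro: the data set small and its counts are hypothetical and chosen only so that some expected cell counts fall below 5. The EXPECTED option prints the expected counts, and the EXACT statement requests the exact Pearson chi-square p-value alongside the asymptotic one.

```sas
/* Hypothetical 2x2 table with small counts (illustration only) */
data small;
    input group $ response $ count;
    datalines;
A yes 3
A no 1
B yes 1
B no 4
;
run;

proc freq data=small;
    weight count;                            /* each row is a cell count, not one subject */
    tables group*response / expected chisq;  /* expected counts and asymptotic tests      */
    exact pchi;                              /* exact p-value for the Pearson chi-square  */
run;
```

Comparing the asymptotic and exact p-values in the output shows how much the two can differ when the expected counts are small.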
There is no clear guideline for when to use an exact test for a continuous variable. One recommendation is to use an exact test when the sample size is less than 30.[2] Some SAS procedures, for example NPAR1WAY and FREQ, provide an EXACT statement that can be used for exact tests.

Many published macros generate a summary table for clinical study reports. All of these macros require users to define both the type of descriptive statistics and the method of statistical testing.[3, 4] The macro described in this paper automatically chooses an appropriate test from the type of summary statistics and the sample size for each variable. The available tests under each circumstance are summarized in the table below.

Variable Type  Summary Statistics  Sample Size*              Test                                               SAS Procedure (PROC)
Continuous     Mean                Total ≥ 30                ANOVA                                              GLM
                                   Total < 30                Not recommended
               Median              Total ≥ 30                Wilcoxon rank sum / Kruskal-Wallis                 NPAR1WAY
                                   Total < 30                Exact Wilcoxon rank sum / Exact Kruskal-Wallis     NPAR1WAY (with Exact statement)
Categorical    Freq                Expected cell count ≥ 5   Fisher test / Pearson chi-square test              FREQ
                                   Expected cell count < 5   Exact Fisher test / Exact Pearson chi-square test  FREQ (with Exact statement)

* The small-sample criteria can be set in the macro; the defaults are 30 for continuous variables and 5 for categorical variables.

MACRO PARAMETERS

Required parameters:

dat = Working SAS data set.
grp = Group/column variable. Observations with a missing group variable are not summarized in the table.
vlist = Names of the variables to list in the table, each followed by its desired summary statistics. Use ‘\’ to separate the variable names and statistics. Multiple statistics are allowed for each variable. The test is chosen according to the first statistic listed for that variable.
        The valid values for summary statistics and their corresponding output are:

        mean1   Mean ± SD
        mean2   Mean (LowerCL – UpperCL)
        med1    Median (Min – Max)
        med2    Median (Q1 – Q3)
        freq    Frequency (Percentage %) (the percentage can be a row or column percentage)

        Example: vlist = age_at_dx\med1 mean1\ gender\freq\ height\med2\
file = Path and file name of the RTF file.

Optional parameters:

test = Specify whether to perform the tests and report the p-values. yes (default)/no. If the grp variable has only one level, no test is performed.
exacttest = Specify whether to automatically choose an exact test for each variable. yes (default)/no. If yes, all exact p-values carry a single dagger (†) as a superscript.
small1 = Specify the sample size cutoff below which a continuous variable is treated as a small sample. Default: 30.
small2 = Specify the expected cell count cutoff below which a categorical variable is treated as a small sample. Default: 5.
total = Specify whether to report the summary statistics for the entire cohort when there is more than one group. yes (default)/no
pct = Specify row or column percentage for categorical variables. col: column percentage (default); row: row percentage
ncont = Specify whether to report the number of non-missing observations for each continuous variable in each group. yes/no (default)
missing = Specify whether to report the number of missing observations for each variable (both continuous and categorical) in each group. yes/no (default)
pctfmt = Specify the format for all percentages. Default is 5.1.
pfmt = Specify the format for all p-values. Default is pvalue5.3.
alpha = Specify the alpha level for calculating the confidence interval. Default is 0.05.
w1 = Specify the width of the first column. Default is 4cm.
w2 = Specify the width of the columns other than the first column and the p-value column. Default is 3.5cm.
w3 = Specify the width of the p-value column. Default is 2cm.
orient = Specify the orientation of the page. portrait (default)/landscape
ps = Specify the maximum number of rows per page. Default is 35.
style = Specify the style of the RTF file.
title = Specify the title for the table.
footnote = Specify the footnote for the summary table. Use ‘\’ to separate multiple footnote lines.

TECHNICAL DETAILS

    %yamgast (
        dat=, grp=, vlist=, total=yes, test=yes, exacttest=yes,
        small1=30, small2=5, ncont=no, pct=col, missing=no,
        pctfmt=5.1, pfmt=pvalue5.3, alpha=0.05,
        w1=4cm, w2=3.5cm, w3=2cm, orient=portrait, ps=35,
        style=, title=%str(), footnote=%str(), file=);

STEP 1: Create the macro variables that are used later. Assign a default format to any variable that has none, and use the variable name as the label if no label is assigned. Also decide whether an exact test is appropriate for comparing the continuous variables between or among groups.

    /* Confidence level in percent, e.g. 95 when alpha = 0.05 */
    data _null_;
        ci = right(left(trim(put(round((1 - &alpha)*100, 1.), 2.0))));
        call symput('ci', ci);
    run;

    /* Keep only observations with a non-missing group variable */
    data _dat;
        set &dat;
        where &grp is not missing;
    run;

    /* The total sample size decides whether exact tests are used for continuous variables */
    proc sql noprint;
        select count(*) into :ntotal from _dat;
    quit;
    %let ntotal = %sysevalf(&ntotal);
    %let exactwilcox = no;
    %if &ntotal < &small1 %then %let exactwilcox = yes;

    /* Variable attributes, with default formats and labels filled in */
    ods output variables = _content;
    proc contents data = _dat;
    run;
    ods output close;

    data _content;
        set _content;
        if upcase(type) = 'CHAR' and format = ' ' then format = '$30.';
        if upcase(type) = 'NUM' and format = ' ' then format = '8.1';
        if label = ' ' then label = propcase(variable);
    run;

    /* Label, format and number of levels of the group variable */
    proc sql noprint;
        select label into :grplbl from _content
            where upcase(variable) = "%upcase(&grp)";
        select format into :grpfmt from _content
            where upcase(variable) = "%upcase(&grp)";
        select count(*) into :ngrp from (select distinct &grp from _dat);
    quit;
    %let ngrp = %sysevalf(&ngrp);

    /* Formatted group value; it needs &grpfmt, so it is built after the lookup above */
    data _dat;
        set _dat;
        _grp = left(''||left(put(&grp, &grpfmt)));
    run;

STEP 2: Based on the type of summary statistics and the options, each row variable is passed to several macros that generate the summary statistics and the p-value.
The results are appended and saved in a data set named _report. The macro %nametest generates a data set for each variable that stores the label of the variable and the p-value of the test. The macro %count generates a data set for each continuous variable that contains the number of non-missing values in each group. The macro %missing generates a data set for each variable that contains the number of missing values in each group. Please refer to the appendix for the details of %nametest.

    data _report;
        set _null_;
    run;

    /* vlist holds variable\statistics pairs, so step through it two items at a time */
    %let _I = 1;
    %do %while (%scan(&vlist, &_I, %str(\)) NE );
        %let var = %scan(&vlist, &_I, %str(\));
        %let display = %scan(&vlist, %eval(&_I+1), %str(\));
        %let firstmacro = %scan(&display, 1, %str( ));
        %let _exacttest = 0;

        /* Type, label and format of the current row variable */
        proc sql noprint;
            select type into :vartype from _content
                where upcase(variable) = "%upcase(&var)";
            select label into :varlbl from _content
                where upcase(variable) = "%upcase(&var)";
            select format into :varfmt from _content
                where upcase(variable) = "%upcase(&var)";
        quit;

        %nametest;

        %if %upcase(%substr(&firstmacro, 1, 1)) = %str(M)
            and %upcase(%substr(&ncont, 1, 1)) = %str(Y) %then %do;
            %count;
        %end;

        /* Call the macro named after each requested statistic (mean1, med1, freq, ...) */
        %let _J = 1;
        %do %while (%scan(&display, &_J, %str( )) NE );
            %let macro = %scan(&display, &_J, %str( ));
            %&macro;
            %let _J = %eval(&_J + 1);
        %end;

        %if %upcase(%substr(&firstmacro, 1, 1)) = %str(M)
            and %upcase(%substr(&missing, 1, 1)) = %str(Y) %then %do;
            %missing;
        %end;

        %let _I = %eval(&_I+2);
    %end;

STEP 3: After the previous two steps, all the summary statistics and p-values are stored in the _report data set. PROC REPORT and ODS RTF are then used to generate a summary table in RTF format. Please refer to the appendix for the details of %report.
    /* Assign a page number to every row: &ps rows per page */
    data _report;
        set _report;
        _one = 1;
        retain mypage 1 nrowprint &ps;
        counter = _n_;
        if counter > nrowprint then do;
            nrowprint = nrowprint + &ps;
        end;
        mypage = nrowprint/&ps;
    run;

    proc sql noprint;
        select max(mypage) into :maxpage from _report;
    quit;

    title ' ';
    footnote ' ';
    options orientation=&orient center;
    ods rtf file = "&file" bodytitle startpage=yes style=&style;
    ods noptitle;
    ods escapechar='^';

    /* One PROC REPORT call per page; pages after the first are marked (cont.) */
    %do _ii = 1 %to &maxpage;
        %if &_ii = 1 %then %do;
            title "^S={just=center font_size = 12pt fontweight=bold} &title";
            %report;
        %end;
        %else %do;
            title "^S={just=center font_size = 12pt fontweight=bold} &title (cont.)";
            %report;
        %end;
    %end;
    ods rtf close;

EXAMPLE

The macro is especially useful when the sample size is small. In this example, the variable ‘risk’ is the group (column) variable. Age, height, gender and race are the row variables. The mean and standard deviation are calculated for age, and ANOVA is used for the group comparison (a warning in the log notes that this test is not recommended at this sample size). The median with minimum and maximum, and the median with interquartile range, are calculated for height, and the Wilcoxon rank sum test is used for the group comparison. The frequencies, the column percentages and the p-value from the χ² test are calculated for gender and race. The summary table is shown after the example SAS code.

    proc format;
        value risk    1 = 'Average Risk'
                      2 = 'High Risk';
        value gender  1 = 'Male'
                      2 = 'Female';
        value $race   'B' = 'Black'
                      'W' = 'White'
                      'O' = 'Other';
    run;

    data dat;
        input risk age height gender race $ @@;
        format risk risk. gender gender. race $race.;
        label risk = 'Risk Group'
              age = 'Age (year)';
        datalines;
    1 15 165 1 B 1 20 170 2 . 1 17 182 2 W 1 . 160 2 O
    2 14 155 1 W 2 10 120 1 O 2 11 130 . B 2 18 170 2 W 2 16 175 1 O
    ;
    run;

    %yamgast(dat=dat, grp=risk,
             vlist= age\mean1\ height\med1 med2\ gender\freq\ race\freq\,
             ncont=yes, missing=yes, style=custom,
             title=Example: Summary Statistics and Comparison between Risk Groups,
             footnote=Example of Output,
             file=C:\test.rtf);

Example: Summary Statistics and Comparison between Risk Groups

                                                Risk Group
Variables            Total (N=9)             Average Risk (N=4)      High Risk (N=5)         P Value
Age (year)                                                                                   0.169
  N                  8                       3                       5
  Mean ± SD          15.1 ± 3.4              17.3 ± 2.5              13.8 ± 3.3
  Missing            1                       1                       0
Height                                                                                       0.317 †
  N                  9                       4                       5
  Median (min-max)   165.0 (120.0 - 182.0)   167.5 (160.0 - 182.0)   155.0 (120.0 - 175.0)
  Median (q25-q75)   165.0 (155.0 - 170.0)   167.5 (162.5 - 176.0)   155.0 (130.0 - 170.0)
  Missing            0                       0                       0
Gender                                                                                       0.486 †
  Male (%)           4 (44.4%)               1 (25.0%)               3 (60.0%)
  Female (%)         4 (44.4%)               3 (75.0%)               1 (20.0%)
  Missing            1                       0                       1
Race                                                                                         1.000 †
  Black (%)          2 (22.2%)               1 (25.0%)               1 (20.0%)
  Other (%)          3 (33.3%)               1 (25.0%)               2 (40.0%)
  White (%)          3 (33.3%)               1 (25.0%)               2 (40.0%)
  Missing            1                       1                       0

Date: 10MAR08
† Exact test
Example of Output

REFERENCES

1. Carr, W.E., Fisher's Exact Test Extended to More Than Two Samples of Equal Size. Technometrics, 1980. 22(2): p. 269.
2. Cytel, StatXact Manual.
3. Zhou, Y., L. Zhang, and M.L. Hancock, %SummaryTable: A SAS® Macro to Produce a Summary Table in Clinical Trial. Pharmaceutical Industry SAS Users Group. Paper AD13.
4. Grandits, G. and K. Svendsen, A System of SAS® Macros for Producing Statistical Reports. SAS® Users Group International. Paper 141-31.

CONTACT

Zang (Ann) Xiong
Department of Biostatistics
St. Jude Children's Research Hospital
332 N. Lauderdale St.
Memphis, TN 38105-2794
Tel: 901.495.5762
[email protected]

APPENDIX

    %macro nametest;
        %let &var._p = NA;

        /* Mean reported: ANOVA through PROC GLM */
        %if %upcase(%substr(&firstmacro, 1, 4)) = %str(MEAN) and &ngrp > 1
            and %upcase(%substr(&test, 1, 1)) = %str(Y) %then %do;
            /* exactwilcox is yes when the total sample size is below &small1 */
            %if %upcase(%substr(&exactwilcox, 1, 1)) = %str(Y) %then %do;
                %put WARNING: Sample size is less than &small1, ANOVA is not recommended;
            %end;
            ods output modelanova = _test;
            proc glm data = _dat;
                class _grp;
                model &var = _grp;
            run;
            quit;
            ods output close;
            data _null_;
                set _test;
                if hypothesistype = 3 then do;
                    p = put(probf, &pfmt);
                    call symput("&var._p", p);
                end;
            run;
            proc datasets;
                delete _test;
            run;
            quit;
        %end;

        /* Median reported: Wilcoxon rank sum (2 groups) or Kruskal-Wallis (more than 2 groups) */
        %if %upcase(%substr(&firstmacro, 1, 3)) = %str(MED) and &ngrp > 1
            and %upcase(%substr(&test, 1, 1)) = %str(Y) %then %do;
            proc npar1way data=_dat;
                class _grp;
                var &var;
                %if %upcase(%substr(&exactwilcox, 1, 1)) = %str(Y) %then %do;
                    exact wilcoxon;
                %end;
                output out = p1 WILCOXON;
            run;
            data _null_;
                set p1;
                %if &ngrp = 2 %then %do;
                    %if %upcase(%substr(&exactwilcox, 1, 1)) = %str(N) %then %do;
                        p = put(p2_wil, &pfmt);
                    %end;
                    %if %upcase(%substr(&exactwilcox, 1, 1)) = %str(Y) %then %do;
                        p = put(xp2_wil, &pfmt);
                    %end;
                %end;
                %if &ngrp > 2 %then %do;
                    %if %upcase(%substr(&exactwilcox, 1, 1)) = %str(N) %then %do;
                        p = put(p_kw, &pfmt);
                    %end;
                    %if %upcase(%substr(&exactwilcox, 1, 1)) = %str(Y) %then %do;
                        p = put(xp_kw, &pfmt);
                    %end;
                %end;
                call symput("&var._p", p);
            run;
        %end;

        /* Frequency reported: Pearson chi-square, exact when any expected count is below &small2 */
        %if &firstmacro = %str(freq) and &ngrp > 1
            and %upcase(%substr(&test, 1, 1)) = %str(Y) %then %do;
            ods output CrossTabFreqs = freqexpct PearsonChiSq = p1;
            proc freq data=_dat;
                exact chisq;
                tables &var*_grp/expected chisq;
            run;
            ods output close;
            %if %upcase(%substr(&exacttest, 1, 1)) = %str(Y) %then %do;
                data _null_;
                    set freqexpct;
                    if . < expected < &small2 then do;
                        call symput('_exacttest', '1');
                    end;
                run;
            %end;
            data _null_;
                set p1;
                %if &_exacttest = 0 %then %do;
                    if Name1 = 'P_PCHI' then do;
                %end;
                %if &_exacttest = 1 %then %do;
                    if Name1 = 'XP_PCHI' then do;
                %end;
                    p = put(nValue1, &pfmt);
                    call symput("&var._p", p);
                end;
            run;
        %end;

        /* Header row for the variable: bold label plus p-value, with a dagger for exact tests */
        data _nametest;
            keep col0-col&ngrp total pvalue;
            length col0-col&ngrp total pvalue $99.;
            col0 = '^S={leftmargin=0pt fontweight=bold}'||"&varlbl";
            %if %upcase(%substr(&test, 1, 1)) = %str(Y) %then %do;
                %if (&firstmacro = %str(freq) and &_exacttest = 1)
                    or (%upcase(%substr(&firstmacro, 1, 3)) = %str(MED)
                        and %upcase(%substr(&exactwilcox, 1, 1)) = %str(Y)) %then %do;
                    pvalue = "&&&var._p"||' ^{super †}';
                %end;
                %else %do;
                    pvalue = "&&&var._p";
                %end;
            %end;
        run;

        data _report;
            set _report _nametest;
        run;

        proc datasets;
            delete _nametest;
        run;
        quit;
    %mend;

    %macro report;
        proc report data = _report nowd headline split='#' box;
            where mypage = &_ii;
            column ( col0
                     ( %if &ngrp > 1 %then %do;
                           "\B &grplbl"
                           %if %upcase(%substr(&total, 1, 1)) = %str(Y) %then %do; total %end;
                       %end;
                       %do i = 1 %to &ngrp; col&i %end;
                     )
                     %if %upcase(%substr(&test, 1, 1)) = %str(Y) and &ngrp > 1 %then %do; pvalue %end;
                   );
            define col0/display '^S={just=left}Variables'
                style(column)=[cellwidth=&w1 just=left leftmargin=10pt];
            %if %upcase(%substr(&total, 1, 1)) = %str(Y) %then %do;
                define total/display "^S={just=center}Total#(N=&ntotal)"
                    style(column)=[cellwidth=&w2 just=c];
            %end;
            %do i = 1 %to &ngrp;
                define col&i /display "^S={just=center}&&grp&i.#%str((N=&&ngrp&i))"
                    style(column)=[cellwidth=&w2 just=c];
            %end;
            %if %upcase(%substr(&test, 1, 1)) = %str(Y) and &ngrp > 1 %then %do;
                define pvalue /display "^S={just=center}P Value"
                    style(column)=[cellwidth=&w3 just=c font_style=italic];
            %end;
            %if %upcase(%substr(&exacttest, 1, 1)) = %str(Y) or &footnote ^= %str() %then %do;
                compute after _page_/style={just=l font_size=8pt};
                    line "^S={protectspecialchars = off just=left}Date: &sysdate.";
                    %if %upcase(%substr(&exacttest, 1, 1)) = %str(Y)
                        and %upcase(%substr(&test, 1, 1)) = %str(Y) %then %do;
                        line "^{super †}Exact test";
                    %end;
                    /* Each backslash-separated piece of &footnote becomes its own line */
                    %if &footnote ^= %str() %then %do;
                        %let _I = 1;
                        %do %while (%scan(&footnote, &_I, %str(\)) NE );
                            %let text = %scan(&footnote, &_I, %str(\));
                            line "&text";
                            %let _I = %eval(&_I + 1);
                        %end;
                    %end;
                endcomp;
            %end;
        run;
    %mend;
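
The backslash-delimited parsing that Step 2 applies to vlist, and that %report applies to footnote, can be tried in isolation. The sketch below is illustrative only: %show_pairs is a hypothetical helper that is not part of %yamgast, and it writes each variable and its requested statistics to the log instead of computing anything. It assumes, as the macro does, that the list alternates between a variable name and its statistics.

```sas
/* Hypothetical helper (not part of %yamgast): walk a variable\statistics list */
%macro show_pairs(vlist=);
    %local i var stats;
    %let i = 1;
    /* Stop when %scan returns nothing for the next variable position */
    %do %while (%length(%scan(&vlist, &i, %str(\))) > 0);
        %let var   = %scan(&vlist, &i, %str(\));              /* odd items: variable names      */
        %let stats = %scan(&vlist, %eval(&i + 1), %str(\));   /* even items: statistic keywords */
        %put NOTE: variable=&var statistics=&stats;
        %let i = %eval(&i + 2);                               /* move to the next pair          */
    %end;
%mend show_pairs;

%show_pairs(vlist=age\mean1\ height\med1 med2\ gender\freq\)
```

Running the helper before calling the macro is one way to confirm that a long vlist value is split into the intended pairs.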