PO04
%YAMGAST: Yet Another Macro to Generate a Summary Table
Zang Xiong, St. Jude Children’s Research Hospital, Tennessee
ABSTRACT
There are many published macros that generate a summary table for a clinical study report. All of these macros require users to
define both the type of descriptive statistics and the method of statistical test. However, for a fixed sample size, once the type of
descriptive statistics is chosen, there is usually a standard test for the comparison. For example, when we report the mean
with standard deviation of a continuous variable, we use a two-sample t-test or ANOVA to test whether the variable is distributed
differently in the groups of interest. If the sample size is small, an exact test might be more appropriate. This paper introduces a
user-friendly macro that automatically chooses an appropriate test based on the type of descriptive statistics and the sample size.
It produces a highly customized, ready-to-publish table in RTF format.
INTRODUCTION
A table summarizing descriptive statistics and statistical comparisons appears in many clinical study reports. In such a table, the mean
with standard deviation or the median with range summarizes continuous variables, whereas frequency with row or column
percentage is used for categorical variables. When the mean is reported, the p-value from a two-sample t-test or ANOVA is used to
examine whether the distribution of the variable differs among the groups of interest. When the median is reported, the Wilcoxon rank
sum test is used. The chi-square test is usually used for comparison when frequency with percentage is reported.
Exact p-values are recommended over asymptotic p-values when the asymptotic assumptions cannot be
satisfied; a small sample size is one such violation. An exact test is useful for comparing sample proportions when the
expected number of occurrences in some cells is less than 5.[1] There is no clear guideline for when to use an exact test for
continuous variables; one recommendation is to use an exact test when the sample size is less than 30.[2] Some
SAS procedures, for example NPAR1WAY and FREQ, provide an EXACT statement that can be used for exact tests.
There are many published macros that generate a summary table for a clinical study report. All of these macros require users to
define both the type of descriptive statistics and the method of statistical test.[3, 4] The macro described in this paper
automatically chooses an appropriate test for each variable based on the type of summary statistics and the sample size. The available
tests under each circumstance are summarized in the table below.
Variable Type   Summary Statistics   Sample Size*              Test                                   SAS Procedure (PROC)
Continuous      Mean                 Total ≥ 30                ANOVA                                  GLM
                                     Total < 30                Not recommended
                Median               Total ≥ 30                Wilcoxon rank sum/                     NPAR1WAY
                                                               Kruskal-Wallis
                                     Total < 30                Exact Wilcoxon rank sum/               NPAR1WAY
                                                               Exact Kruskal-Wallis                   (with Exact statement)
Categorical     Freq                 Expected cell count ≥ 5   Fisher test/                           FREQ
                                                               Pearson chi-square test
                                     Expected cell count < 5   Exact Fisher test/                     FREQ
                                                               Exact Pearson chi-square test          (with Exact statement)
* The small sample size criteria can be defined in the macro; the defaults are 30 for continuous variables and 5 for categorical
variables.
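The decision rule in the table can be sketched as a small macro. This is a hypothetical helper written for illustration, not part of %YAMGAST: the name %choose_test and its parameters stat, n, ngrp and minexp (the smallest expected cell count) are assumptions, while small1 and small2 mirror the macro's own cutoffs. For the categorical branch the sketch names the Pearson chi-square test, which is the p-value the appendix code extracts.

```sas
/* Hypothetical helper (not part of %yamgast): write the implied test to the log */
%macro choose_test(stat=, n=, ngrp=2, minexp=5, small1=30, small2=5);
  %local test;
  %if %upcase(%substr(&stat, 1, 3)) = MEA %then %do;
    /* mean1/mean2: ANOVA, flagged when the total sample is small */
    %if &n >= &small1 %then %let test = ANOVA (PROC GLM);
    %else %let test = ANOVA not recommended;
  %end;
  %else %if %upcase(%substr(&stat, 1, 3)) = MED %then %do;
    /* med1/med2: rank-based test, exact when the total sample is small */
    %if &ngrp = 2 %then %let test = Wilcoxon rank sum;
    %else %let test = Kruskal-Wallis;
    %if &n < &small1 %then %let test = Exact &test;
    %let test = &test (PROC NPAR1WAY);
  %end;
  %else %do;
    /* freq: chi-square test, exact when an expected count is small */
    %if %sysevalf(&minexp >= &small2) %then %let test = Pearson chi-square;
    %else %let test = Exact Pearson chi-square;
    %let test = &test (PROC FREQ);
  %end;
  %put NOTE: stat=&stat n=&n ngrp=&ngrp test=&test;
%mend choose_test;

%choose_test(stat=med1, n=9, ngrp=2);
%choose_test(stat=freq, n=9, minexp=1.8);
```

%sysevalf is used for the expected-count comparison because expected counts are generally not integers, which the implicit integer evaluation of %if would not compare numerically.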
MACRO PARAMETERS
Required parameters:
dat =        Working SAS data set.
grp =        Group/column variable. Observations with a missing group variable are not summarized in the table.
vlist =      Names and desired summary statistics of the variables to be listed in the table. Use ‘\’ to separate
             the variable names and statistics. Multiple statistics are allowed for each variable. The test is
             based on the first statistic listed for that variable.
             The valid values for summary statistics and their corresponding output are listed here:
             mean1    Mean ± SD
             mean2    Mean (LowerCL – UpperCL)
             med1     Median (Min – Max)
             med2     Median (Q1 – Q3)
             freq     Frequency (Percentage %) (Percentage can be row or column percentage)
             Example: vlist = age_at_dx\med1 mean1\
                              gender\freq\
                              height\med2\
file =       Path and file name of the RTF file.
Optional parameters:
test =       Specify whether to perform the tests and report the p-values.
             yes (default)/no
             If the grp variable has only 1 level, no test is performed.
exacttest =  Specify whether to automatically choose an exact test for each variable.
             yes (default)/no
             If yes, all exact p-values are marked with a single dagger (†) as a superscript.
small1 =     Specify the small sample size cutoff for continuous variables. Default: 30.
small2 =     Specify the small expected cell count cutoff for categorical variables. Default: 5.
total =      Specify whether to report the summary statistics for the entire cohort when there is more than 1
             group.
             yes (default)/no
pct =        Specify row or column percentage for categorical variables.
             col: column percentage (default); row: row percentage
ncont =      Specify whether to report the number of nonmissing observations for each continuous variable
             in each group.
             yes/no (default)
missing =    Specify whether to report the number of missing observations for each variable (both
             continuous and categorical) in each group.
             yes/no (default)
pctfmt =     Specify the format for all percentages. Default is 5.1.
pfmt =       Specify the format for all p-values. Default is pvalue5.3.
alpha =      Specify the alpha level for calculating the confidence interval. Default is 0.05.
w1 =         Specify the width of the first column. Default is 4cm.
w2 =         Specify the width of the columns other than the first column and the p-value column. Default is
             3.5cm.
w3 =         Specify the width of the p-value column. Default is 2cm.
orient =     Specify the orientation of the page.
             portrait (default)/landscape
ps =         Specify the maximum number of rows per page. Default is 35.
style =      Specify the style of the RTF file.
title =      Specify the title for the table.
footnote =   Specify the footnote for the summary table.
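A call that combines several of the optional parameters might look like the sketch below. It assumes that %yamgast has already been compiled in the session and behaves as documented above; the data set sashelp.class, the style name journal and the output file name class_summary.rtf are illustrative choices, not values from the paper.

```sas
/* Illustrative call only: assumes %yamgast is already compiled in this session */
%yamgast(dat=sashelp.class, grp=sex,
         vlist= age\mean2\
                height\med2\,
         alpha=0.10,            /* mean2 then reports 90% confidence limits */
         total=no,              /* omit the column for the entire cohort    */
         exacttest=no,          /* do not switch to exact tests             */
         orient=landscape,
         style=journal,         /* a SAS-supplied ODS style                 */
         title=Summary by Sex,
         file=class_summary.rtf);
```

Parameters that are not listed, such as pct, ncont and missing, keep their defaults.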
TECHNICAL DETAILS
%yamgast (
dat=, grp=, vlist=,
total=yes, test=yes, exacttest=yes, small1=30, small2=5,
ncont=no, pct=col, missing=no,
pctfmt=5.1, pfmt=pvalue5.3, alpha=0.05,
w1=4cm, w2=3.5cm, w3=2cm,
orient=portrait, ps=35, style=,
title=%str(), footnote=%str(), file=);
STEP 1: Create macro variables that will be used later. Assign a default format to variables that have no
format, and use the variable name as the label for variables that have no label. Then decide
whether an exact test is appropriate for comparing the continuous variables between/among groups.
data _null_;
ci = right(left(trim(put(round((1 - &alpha)*100, 1.), 2.0))));
call symput('ci', ci);
run;
data _dat; set &dat; where &grp is not missing;
_grp = left(''||left(put(&grp, &grpfmt)));
run;
proc sql noprint; select count(*) into :ntotal from _dat; quit;
%let ntotal = %sysevalf(&ntotal);
%let exactwilcox = no;
%if &ntotal < &small1 %then %let exactwilcox = yes;
ods output variables = _content;
proc contents data = _dat;
run;
ods output close;
data _content; set _content;
if upcase(type) = 'CHAR' and format = ' ' then format = '$30.';
if upcase(type) = 'NUM' and format = ' ' then format = '8.1';
if label = ' ' then label = propcase(variable);
run;
proc sql noprint;
select label into :grplbl
from _content
where upcase(variable) = "%upcase(&grp)"; quit;
proc sql noprint;
select format into :grpfmt
from _content
where upcase(variable) = "%upcase(&grp)"; quit;
proc sql noprint;
select count(*) into :ngrp
from (select distinct &grp from _dat); quit;
%let ngrp = %sysevalf(&ngrp);
STEP 2: Based on the type of summary statistics and the options, each row variable is passed to several macros that
generate the summary statistics and p-value. The results are appended to a data set named _report. The
macro %nametest generates a data set for each variable that stores the label of the variable and the p-value of the
test. The macro %count generates a data set for each continuous variable that contains the number of nonmissing
values in each group. The macro %missing generates a data set for each variable that contains the number of missing values in
each group. Please refer to the appendix for the details of %nametest.
data _report; set _null_; run;
%let _I=1;
%do %while (%scan(&vlist, &_I, %str(\)) NE );
%let var=%scan(&vlist, &_I, %str(\));
%let display=%scan(&vlist, %eval(&_I+1), %str(\));
%let firstmacro=%scan(&display, 1, %str( ));
%let _exacttest = 0;
proc sql noprint;
select type into :vartype from _content
where upcase(variable) = "%upcase(&var)"; quit;
proc sql noprint;
select label into :varlbl from _content
where upcase(variable) = "%upcase(&var)"; quit;
proc sql noprint;
select format into :varfmt from _content
where upcase(variable) = "%upcase(&var)"; quit;
%nametest;
%if %upcase(%substr(&firstmacro, 1, 1)) = %str(M)
and %upcase(%substr(&ncont, 1, 1)) = %str(Y) %then %do;
%count;
%end;
%let _J = 1;
%do %while (%scan(&display, &_J, %str( )) NE );
%let macro = %scan(&display, &_J, %str( ));
%&macro;
%let _J = %eval(&_J + 1);
%end;
%if %upcase(%substr(&firstmacro, 1, 1)) = %str(M)
and %upcase(%substr(&missing, 1, 1)) = %str(Y) %then %do;
%missing;
%end;
%let _I=%eval(&_I+2);
%end;
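The way the loop above walks through vlist can be seen in isolation with the following sketch. %show_vlist is a hypothetical macro written for illustration; it only writes each variable and its statistics list to the log, using the same %scan pattern with ‘\’ as the delimiter.

```sas
/* Hypothetical demonstration macro: list the variable/statistics pairs in vlist */
%macro show_vlist(vlist=);
  %local i var display;
  %let i = 1;
  /* Tokens come in pairs: odd positions are variables, even positions are statistics */
  %do %while (%length(%scan(&vlist, &i, %str(\))) > 0);
    %let var     = %scan(&vlist, &i, %str(\));
    %let display = %scan(&vlist, %eval(&i + 1), %str(\));
    %put NOTE: variable=&var statistics=&display;
    %let i = %eval(&i + 2);
  %end;
%mend show_vlist;

%show_vlist(vlist= age\mean1\ height\med1 med2\ gender\freq\);
```

Because %let removes leading and trailing blanks, the spaces and line breaks between the pairs in vlist do not end up in the variable names.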
STEP 3: After the previous two steps, all the summary statistics and p-values are stored in the _report data set.
PROC REPORT and ODS RTF are then used to generate the summary table in RTF format. Please refer to the appendix for
the details of %report.
data _report; set _report;
_one = 1;
retain mypage 1 nrowprint &ps;
counter = _n_;
if counter > nrowprint then do;
nrowprint = nrowprint + &ps;
end;
mypage = nrowprint/&ps;
run;
proc sql noprint;
select max(mypage) into :maxpage from _report; quit;
title ' ';
footnote ' ';
options orientation=&orient center;
ods rtf file = "&file" bodytitle startpage=yes style=&style;
ods noptitle;
ods escapechar='^';
%do _ii = 1 %to &maxpage;
%if &_ii = 1 %then %do;
title "^S={just=center font_size = 12pt fontweight=bold} &title";
%report;
%end;
%else %do;
title "^S={just=center font_size = 12pt fontweight=bold} &title (cont.)";
%report;
%end;
%end;
ods rtf close;
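The page assignment in the first data step of Step 3 places rows 1 to &ps on page 1, the next &ps rows on page 2, and so on. The sketch below shows an equivalent and shorter way to compute the same page number with CEIL; it is an alternative formulation for illustration, not the macro's code, and the data set work.page_demo with 80 rows is a made-up stand-in for _report.

```sas
%let ps = 35;                      /* rows per page, the macro default */

data work.page_demo;               /* hypothetical stand-in for _report */
  do row = 1 to 80;
    mypage = ceil(row / &ps);      /* rows 1-35: page 1, 36-70: page 2, 71-80: page 3 */
    output;
  end;
run;

proc sql noprint;
  select max(mypage) into :maxpage trimmed from work.page_demo;
quit;

%put NOTE: number of pages = &maxpage;
```

For 80 rows and 35 rows per page, the log reports 3 pages, which is the value the %do loop would use as its upper bound.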
EXAMPLE
The macro is especially useful when the sample size is small. In this example, the variable ‘risk’ is the group (column)
variable. Age, height, gender and race are the row variables. The mean and standard deviation are calculated for age,
and ANOVA is used for the group comparison (a warning message in the log file notes that this test is not
recommended for a sample of this size). The median, minimum, maximum and interquartile range are calculated for height, and the
Wilcoxon rank sum test is used for the group comparison. The frequencies, the column percentages and the p-values
from the χ² test are calculated for gender and race. The summary table is shown after the example SAS code.
proc format;
value risk    1 = 'Average Risk'
              2 = 'High Risk';
value gender  1 = 'Male'
              2 = 'Female';
value $race   'B' = 'Black'
              'W' = 'White'
              'O' = 'Other';
run;
data dat;
input risk age height gender race $ @@;
format risk risk. gender gender. race $race.;
label risk = 'Risk Group' age = 'Age (year)';
datalines;
1 15 165 1 B
1 20 170 2 .
1 17 182 2 W
1 . 160 2 O
2 14 155 1 W
2 10 120 1 O
2 11 130 . B
2 18 170 2 W
2 16 175 1 O
;
run;
%yamgast(dat=dat, grp=risk,
vlist= age\mean1\
height\med1 med2\
gender\freq\
race\freq\,
ncont=yes, missing=yes,
style=custom,
title=Example: Summary Statistics and Comparison between Risk Groups,
footnote=Example of Output,
file=C:\test.rtf);
Example: Summary Statistics and Comparison between Risk Groups

                                                      Risk Group
Variables            Total                    Average Risk             High Risk                P Value
                     (N=9)                    (N=4)                    (N=5)
Age (year)                                                                                      0.169
  N                  8                        3                        5
  Mean ± SD          15.1 ± 3.4               17.3 ± 2.5               13.8 ± 3.3
  Missing            1                        1                        0
Height                                                                                          0.317 †
  N                  9                        4                        5
  Median (min-max)   165.0 (120.0 - 182.0)    167.5 (160.0 - 182.0)    155.0 (120.0 - 175.0)
  Median (q25-q75)   165.0 (155.0 - 170.0)    167.5 (162.5 - 176.0)    155.0 (130.0 - 170.0)
  Missing            0                        0                        0
Gender                                                                                          0.486 †
  Male (%)           4 (44.4%)                1 (25.0%)                3 (60.0%)
  Female (%)         4 (44.4%)                3 (75.0%)                1 (20.0%)
  Missing            1                        0                        1
Race                                                                                            1.000 †
  Black (%)          2 (22.2%)                1 (25.0%)                1 (20.0%)
  Other (%)          3 (33.3%)                1 (25.0%)                2 (40.0%)
  White (%)          3 (33.3%)                1 (25.0%)                2 (40.0%)
  Missing            1                        1                        0
Date: 10MAR08
† Exact test
Example of Output
REFERENCE
1. Carr, W.E., Fisher's Exact Test Extended to More Than Two Samples of Equal Size. Technometrics, 1980. 22(2): p. 269.
2. Cytel, StatXact Manual.
3. Zhou, Y., L. Zhang, and M.L. Hancock, %SummaryTable: A SAS® Macro to Produce a Summary Table in
   Clinical Trial. Pharmaceutical Industry SAS Users Group. Paper AD13.
4. Grandits, G. and K. Svendsen, A System of SAS® Macros for Producing Statistical Reports. SAS® Users
   Group International. Paper 141-31.
CONTACT
Zang (Ann) Xiong
Department of Biostatistics
St. Jude Children's Research Hospital
332 N. Lauderdale St.
Memphis, TN 38105-2794
Tel: 901.495.5762
[email protected]
APPENDIX
%macro nametest;
%let &var._p = NA;
%if %upcase(%substr(&firstmacro, 1, 4)) = %str(MEAN) and &ngrp > 1
and %upcase(%substr(&test, 1, 1)) = %str(Y) %then %do;
%* exactwilcox is yes when the total sample size is below &small1, so warn in that case;
%if %upcase(%substr(&exactwilcox, 1, 1)) = %str(Y) %then %do;
%put WARNING: Sample size is less than &small1, ANOVA is not recommended;
%end;
ods output modelanova = _test;
proc glm data = _dat;
class _grp;
model &var = _grp;
run; quit;
ods output close;
data _null_; set _test;
if hypothesistype = 3 then do;
p = put(probf, &pfmt);
call symput("&var._p", p);
end;
run;
proc datasets; delete _test; run; quit;
%end;
%if %upcase(%substr(&firstmacro, 1, 3)) = %str(MED) and &ngrp > 1
and %upcase(%substr(&test, 1, 1)) = %str(Y) %then %do;
proc npar1way data=_dat;
class _grp;
var &var;
%if %upcase(%substr(&exactwilcox, 1, 1)) = %str(Y) %then %do;
exact wilcoxon;
%end;
output out = p1
WILCOXON ;
run;
data _null_; set p1;
%if &ngrp = 2 %then %do;
%if %upcase(%substr(&exactwilcox, 1, 1)) = %str(N) %then %do;
p = put(p2_wil, &pfmt);
%end;
%if %upcase(%substr(&exactwilcox, 1, 1)) = %str(Y) %then %do;
p = put(xp2_wil, &pfmt);
%end;
%end;
%if &ngrp > 2 %then %do;
%if %upcase(%substr(&exactwilcox, 1, 1)) = %str(N) %then %do;
p = put(p_kw, &pfmt);
%end;
%if %upcase(%substr(&exactwilcox, 1, 1)) = %str(Y) %then %do;
p = put(xp_kw, &pfmt);
%end;
%end;
call symput("&var._p", p);
run;
%end;
%if &firstmacro = %str(freq) and &ngrp > 1
and %upcase(%substr(&test, 1, 1)) = %str(Y) %then %do;
ods output CrossTabFreqs = freqexpct PearsonChiSq = p1;
proc freq data=_dat ;
exact chisq;
tables &var*_grp/expected chisq;
run;
ods output close;
%if %upcase(%substr(&exacttest, 1, 1)) = %str(Y) %then %do;
data _null_; set freqexpct;
if . < expected < &small2 then do;
call symput('_exacttest', 1);
end;
run;
%end;
data _null_; set p1;
%if &_exacttest = 0 %then %do;
if Name1 = 'P_PCHI' then do;
%end;
%if &_exacttest = 1 %then %do;
if Name1 = 'XP_PCHI' then do;
%end;
p = put(nValue1, pvalue5.3);
call symput("&var._p", p); end;
run;
%end;
data _nametest;
keep col0-col&ngrp total pvalue;
length col0-col&ngrp total pvalue $99.;
col0 = '^S={leftmargin=0pt fontweight=bold}'||"&varlbl";
%if %upcase(%substr(&test, 1, 1)) = %str(Y) %then %do;
%if (&firstmacro = %str(freq) and &_exacttest = 1) or
(%upcase(%substr(&firstmacro, 1, 3)) = %str(MED)
and %upcase(%substr(&exactwilcox, 1, 1)) = %str(Y)) %then %do;
pvalue = "&&&var._p"||' ^{super †}';
%end;
%else %do;
pvalue = "&&&var._p";
%end;
%end;
run;
data _report; set _report _nametest; run;
proc datasets; delete _nametest; run; quit;
%mend;
%macro report;
proc report data = _report nowd headline split='#' box;
where mypage = &_ii;
column
(
col0
(%if &ngrp > 1 %then %do;
"\B &grplbl"
%if %upcase(%substr(&total, 1, 1)) = %str(Y) %then %do;
total
%end;
%end;
%do i = 1 %to &ngrp;
col&i
%end;
)
%if %upcase(%substr(&test, 1, 1)) = %str(Y) and &ngrp > 1 %then %do;
pvalue
%end;
)
;
define col0/display '^S={just=left}Variables'
style(column)=[cellwidth=&w1 just=left leftmargin=10pt];
%if %upcase(%substr(&total, 1, 1)) = %str(Y) %then %do;
define total/display "^S={just=center}Total#(N=&ntotal)"
style(column)=[cellwidth=&w2 just=c];
%end;
%do i = 1 %to &ngrp;
define col&i /display "^S={just=center}&&grp&i.#%str((N=&&ngrp&i))"
style(column)=[cellwidth=&w2 just=c];
%end;
%if %upcase(%substr(&test, 1, 1)) = %str(Y) and &ngrp > 1 %then %do;
define pvalue /display "^S={just=center}P Value"
style(column)=[cellwidth=&w3 just=c font_style=italic];
%end;
%if %upcase(%substr(&exacttest, 1, 1)) = %str(Y)
or &footnote ^= %str()%then %do;
compute after _page_/style={just=l font_size=8pt};
line "^S={protectspecialchars = off just=left}Date: &sysdate.";
%if %upcase(%substr(&exacttest, 1, 1)) = %str(Y)
and %upcase(%substr(&test, 1, 1)) = %str(Y) %then %do;
line "^{super †}Exact test";
%end;
%if &footnote ^= %str() %then %do;
%let _I = 1;
%do %while (%scan(&footnote, &_I, %str(\)) NE );
%let text=%scan(&footnote, &_I, %str(\));
line "&text";
%let _I = %eval(&_I + 1);
%end;
%end;
endcomp;
%end;
run;
%mend;