Paper PO01
Statistical Application of SAS in Method Comparison Analysis
David Shen, ClinForce Consulting Inc., Philadelphia, PA
Zaizai Lu, AstraZeneca Pharmaceutical Inc., Wilmington, DE
ABSTRACT
It is common in clinical studies to assess the agreement between measurements obtained
by two methods. This paper introduces a set of techniques, consisting of exploratory
graphics and statistical analyses, for identifying the relationships between the results
from two methods. All SAS code related to the topic is also presented in this paper.
INTRODUCTION
New quantitative methods are often needed to meet specific requirements in clinical
studies. For example, a study concerning blood glucose concentrations may require
frequent blood glucose monitoring (up to 12 measurements per medication). To
implement such monitoring without drawing an unacceptably high volume of blood from
subjects, a more sensitive technique that requires less blood is needed to replace the
traditional one. Since two different methods are unlikely to give identical results for all
individuals, a method comparison must be performed to estimate accuracy and errors.
Agreement between two clinical measurement methods can be quantified using results
obtained for the same quantity by both methods. This paper describes the factors to be
considered during experimental design, the visual graphics used to explore method
agreement, and the final statistical analyses that decide whether two methods are
interchangeable in clinical interpretation.
FACTORS TO CONSIDER
Sample size is important in experimental studies; it should be carefully determined from
the desired power. Specimens should be taken at different time points, and the sample
contents should cover the clinically relevant range; for example, fasting blood glucose
concentration ranges from 60 to 126 mg/dL. Use of a routine laboratory method as the
comparative method does not imply its correctness. If the differences between methods
are small, the two methods have the same relative accuracy. If the differences are large
and medically unacceptable, it is necessary to identify which method is inaccurate by
repeating measurements. Common practice is to analyze each specimen once by both
the test and comparative methods; however, repeating the analysis within each method
provides a check on the validity of the data, and duplicate analyses help identify
discrepancies between methods.
GRAPHICAL EXPLORATION
Visual graphics allow direct inspection of the agreement between the test and
comparative methods. Four plots are widely used in data exploration.
1. Comparison plot: displays the test result on the y-axis versus the comparative result
on the x-axis. As points are accumulated, a visual line of best fit can be drawn to show
the general relationship between the methods and help identify discrepant results.
2. Difference plot: displays the difference (test result minus comparative result) on the
y-axis versus the comparative result on the x-axis. Ideally, these differences should
scatter around the line of zero difference, half above and half below. Any large
differences will stand out in the plot and draw attention.
3. Altman-Bland plot: displays the differences in measurements on the y-axis versus the
average of the two methods' measurements on the x-axis. The mean and standard
deviation of the differences are calculated, and lines at the mean difference plus or
minus 1.96 standard deviations are drawn; 95% of the differences should lie between
these two lines. This approach, called the 95% limits of agreement method, is simple to
express and interpret.
4. Frequency histogram: displays the distribution of the differences between the two
methods, a useful complement to the Altman-Bland plot.
These graphics are generally advantageous for showing the analytical range of data, the
linearity of response over the range, and the general relationship between methods. The
data and program codes for these four plots are shown in the appendix.
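The quantities behind the Altman-Bland plot (item 3 above) are easy to compute directly. As a minimal illustrative sketch outside SAS (Python, with hypothetical paired measurements, not the paper's data set):

```python
from statistics import mean, stdev

# Hypothetical paired measurements (test vs. comparative method)
result1 = [82.5, 124.5, 96.0, 77.5, 107.5, 75.0]   # test method
result2 = [90.5, 134.5, 91.0, 83.0, 100.0, 67.5]   # comparative method

diffs = [t - c for t, c in zip(result1, result2)]
bias = mean(diffs)                      # mean difference (bias)
sd = stdev(diffs)                       # SD of the differences
lower = bias - 1.96 * sd                # lower 95% limit of agreement
upper = bias + 1.96 * sd                # upper 95% limit of agreement

# Each plotted point is (average of the pair, difference of the pair)
averages = [(t + c) / 2 for t, c in zip(result1, result2)]
```

Roughly 95% of the differences are expected to fall between the lower and upper limits when the differences are approximately normal.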
STATISTICAL ANALYSIS
Common statistical methods include summary statistics, t-test, correlation, regression
and analysis of variances.
1. Summary statistics: PROC UNIVARIATE provides descriptive summary statistics of
the data.
proc univariate data = clinlab normal plots;
var result1 result2 dif;
run;
The output contains:
• Sample size
• Range
• Arithmetic mean
• Median
• Standard deviation: the standard deviation is the square root of the variance.
When the distribution of the observations is normal, 95% of observations
lie in the interval Mean ± 1.96 SD.
• Test for normal distribution: PROC UNIVARIATE calculates the Shapiro-Wilk
W statistic. If the P-value is higher than 0.05, it may be assumed that the data have a
normal distribution. If the P-value is less than 0.05, then the hypothesis that the
distribution of the observations in the sample is normal should be rejected. In the
latter case, the sample cannot be accurately described by the arithmetic mean and
standard deviation, and such samples should not be submitted to any
parametric statistical test or procedure, such as the t-test, which will be discussed
later. To test for a possible difference between non-normally distributed samples,
the Wilcoxon test should be used, and correlation can be estimated by means of
rank correlation. When the sample size is small, you can visually evaluate the
symmetry and peakedness of the distribution using the histogram or cumulative
frequency distribution. The plots generated by PROC UNIVARIATE also
display the data distribution, which makes it easy to see whether the data are
approximately normal.
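These descriptive quantities are easy to verify outside SAS; a minimal illustrative Python sketch (hypothetical values, not the paper's data set):

```python
from statistics import mean, median, stdev

# Hypothetical results from one method
values = [82.5, 124.5, 96.0, 77.5, 107.5, 75.0, 89.5, 77.5]

n = len(values)                     # sample size
rng = max(values) - min(values)     # range
m = mean(values)                    # arithmetic mean
md = median(values)                 # median
sd = stdev(values)                  # standard deviation (square root of the variance)

# Under normality, about 95% of observations fall within mean +/- 1.96 SD
interval = (m - 1.96 * sd, m + 1.96 * sd)
```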
2. T-test: PROC TTEST with the PAIRED statement tests the null hypothesis that the
mean of the differences between the paired observations in the two samples is zero.
If the calculated P-value is greater than 0.05, the conclusion is that the mean difference
between the paired observations is not significantly different from 0; otherwise, the
results from the two methods are significantly different.
proc ttest data = clinlab ;
paired result1*result2;
run;
The TTEST Procedure

                              Statistics

Difference: RESULT1 - RESULT2

        Lower CL            Upper CL   Lower CL             Upper CL
 N  DF      Mean     Mean       Mean    Std Dev   Std Dev    Std Dev   t Value   Pr > |t|
30  29    -2.518   -0.433     1.6513     4.4462    5.5828     7.5051     -0.43     0.6739
The first section of the output displays simple summary statistics such as n, mean,
standard deviation, and the 95% confidence interval for the mean. The second section
displays the null hypothesis test. The high P-value of 0.6739 leads to the conclusion that
the difference is statistically non-significant.
Note that the two sample sizes must be equal. The paired t-test is actually a one-sample
test on the differences between the two results, so the t-test can also be conducted using
PROC MEANS.
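Because the paired t-test reduces to a one-sample test on the differences, the t value in the output above can be reproduced from the reported summary statistics alone (mean difference -0.4333, standard deviation 5.5828, n = 30); a quick Python check:

```python
from math import sqrt

n = 30                    # number of pairs
mean_dif = -0.4333        # mean of the paired differences (from the output)
sd_dif = 5.5828           # SD of the paired differences (from the output)

se = sd_dif / sqrt(n)     # standard error of the mean difference
t = mean_dif / se         # one-sample t statistic on the differences

print(round(t, 2))        # -0.43, matching the PROC TTEST output
```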
proc means data = clinlab n mean std lclm uclm t prt;
var dif;
run;
The MEANS Procedure

                       Analysis Variable : DIF

                              Lower 95%     Upper 95%
  N        Mean     Std Dev   CL for Mean   CL for Mean   t Value   Pr > |t|
 30  -0.4333333   5.5828144    -2.5179905     1.6513238     -0.43     0.6739
The t-test assumes that the data to be tested are normally distributed. This assumption
can be checked using the UNIVARIATE procedure. If the normality assumption for the
t-test is not satisfied, a nonparametric Wilcoxon test should be used to analyze the data;
for paired differences the appropriate form is the Wilcoxon signed-rank test, which
PROC UNIVARIATE reports.
3. Correlation: Correlation analysis is used to determine whether the values of two
variables are associated. Simple correlation analysis can be conducted with PROC CORR.
The default correlation analysis includes descriptive statistics, Pearson correlation
statistics, and the associated probabilities. Correlation coefficients carry information on
both the strength and the direction of a linear relationship between two numeric variables.
proc corr data = clinlab ;
var result1 result2;
run;
        Pearson Correlation Coefficients, N = 30
               Prob > |r| under H0: Rho=0

                    RESULT1          RESULT2

RESULT1             1.00000          0.96171
                                      <.0001

RESULT2             0.96171          1.00000
                     <.0001
The P-value is the probability of obtaining the current result if the correlation coefficient
were in fact zero (null hypothesis). If this probability is lower than the conventional 5%
level (P < 0.05), the correlation is called statistically significant.
Since correlation measures only the association rather than the agreement between two
methods, it may misrepresent the degree of agreement. The comparison plot can identify
and prevent this potential mistake. The red diagonal line is the line of equality; if the
green regression line does not coincide with the red line of equality, it is clearly
inappropriate to use correlation to interpret the agreement of the two methods.
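The distinction between association and agreement can be made concrete with a small illustrative Python sketch (hypothetical data): a test method that reads a constant 20 units high is perfectly correlated with the comparative method, yet never agrees with it.

```python
from math import sqrt

result2 = [60.0, 75.0, 90.0, 105.0, 120.0]      # comparative method
result1 = [x + 20.0 for x in result2]           # test method with a constant +20 bias

# Pearson correlation coefficient, computed directly
n = len(result1)
mx = sum(result1) / n
my = sum(result2) / n
cov = sum((a - mx) * (b - my) for a, b in zip(result1, result2))
sx = sqrt(sum((a - mx) ** 2 for a in result1))
sy = sqrt(sum((b - my) ** 2 for b in result2))
r = cov / (sx * sy)          # 1: perfect linear association

diffs = [a - b for a, b in zip(result1, result2)]   # every difference is 20: no agreement
```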
4. Linear regression: Regression is used to describe the relationship between two
variables and to predict one variable from another. The linear equation Y = a + bX can
reflect the agreement between two methods: if the results from the two methods are
comparable, the intercept a should be 0 or close to 0, and the slope b should be 1 or
very close to 1.
proc reg data=clinlab;
model result1 = result2;
run; quit;
The output shows the P-value, R-square and estimates. If the significance level for the
F-test is very small (< 0.0001), the hypothesis that there is no linear relationship can be
rejected.
                        Analysis of Variance

                               Sum of         Mean
Source             DF         Squares       Square    F Value    Pr > F
Model               1           10707        10707     344.81    <.0001
Error              28       869.40541     31.05019
Corrected Total    29           11576

Root MSE            5.57227    R-Square    0.9249
Dependent Mean     92.13333    Adj R-Sq    0.9222
Coeff Var           6.04805

                        Parameter Estimates

               Parameter     Standard
Variable   DF   Estimate        Error    t Value    Pr > |t|
Intercept   1    4.53636      4.82579       0.94      0.3552
RESULT2     1    0.94631      0.05096      18.57      <.0001
The t-values and P-values test the hypothesis that these coefficients are equal to 0. The
P-value for the intercept is 0.3552, which indicates that the intercept is not significantly
different from 0, while the slope, with a P-value of <.0001, is significantly different from 0.
A TEST statement added to PROC REG can explicitly test whether the slope equals 1.
proc reg data=clinlab;
model result1 = result2;
slope: test result2 = 1;
run;
quit;
      Test SLOPE Results for Dependent Variable RESULT1

                          Mean
Source           DF     Square    F Value    Pr > F
Numerator         1   34.46126       1.11    0.3011
Denominator      28   31.05019
The P-value to test slope in the above result is 0.3011, which indicates that the slope is
not significantly different from 1.
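With one numerator degree of freedom, the F statistic from the TEST statement is the square of the t statistic for H0: b = 1, so it can be reproduced from the parameter estimates reported earlier (slope 0.94631, standard error 0.05096); a quick Python check:

```python
slope = 0.94631         # parameter estimate for RESULT2 (from PROC REG)
se = 0.05096            # its standard error (from PROC REG)

t = (slope - 1.0) / se  # t statistic for the hypothesis slope = 1
F = t ** 2              # with 1 numerator df, F = t squared

print(round(F, 2))      # 1.11, matching the TEST output
```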
The PROC REG procedure above has tested the linearity of the two methods and the
values of the intercept and slope. Next, a diagnostic Shapiro-Wilk test is conducted to
check the normality of the residuals (residuals = differences between observed and
predicted values).
proc reg data=clinlab;
model result1 = result2 /r clm cli;
output out = resid r=resid p=pred;
slope: test result2 = 1;
run; quit;
proc univariate data = resid normal plots;
var resid;
run;
The residual plot shows the goodness of fit of the selected model or equation. Residuals
also point out possible outliers (unusual values) in the data and problems with the
regression model. Note the options added to the MODEL statement above: option R
requests residual diagnostics; option CLM provides the 95% confidence interval for the
regression line, which includes the true regression line with 95% probability; and option
CLI presents the 95% prediction interval, which, for any given value of the independent
variable, covers the values of the dependent variable with 95% probability. The 95%
prediction interval is much wider than the 95% confidence interval.
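The difference between the CLM and CLI intervals comes from their standard errors: the prediction interval adds the residual variance of a single new observation. An illustrative Python sketch with hypothetical data (standard simple-OLS formulas, not output from the paper):

```python
from math import sqrt

# Hypothetical paired results (x = comparative method, y = test method)
x = [60.0, 75.0, 90.0, 105.0, 120.0]
y = [62.0, 73.5, 91.0, 104.0, 121.5]
n = len(x)

# Ordinary least squares fit y = a + b*x
mx = sum(x) / n
my = sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
a = my - b * mx

# Mean squared error of the residuals (n - 2 degrees of freedom)
mse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)

x0 = 90.0  # point at which the intervals are evaluated
se_mean = sqrt(mse * (1 / n + (x0 - mx) ** 2 / sxx))       # CLM: SE of the mean response
se_pred = sqrt(mse * (1 + 1 / n + (x0 - mx) ** 2 / sxx))   # CLI: SE of a new observation

# se_pred > se_mean, so the prediction interval is always the wider one
```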
5. Analysis of variance: To study the influence of qualitative (discrete) factors on another
(continuous) variable, we can use analysis of variance via PROC GLM.
proc glm data = rawlab;
class subject method ;
model result = subject method;
run; quit;
Source      DF    Type III SS    Mean Square    F Value    Pr > F
SUBJECT     29    23079.90000      795.85862      51.07    <.0001
METHOD       1        2.81667        2.81667       0.18    0.6739
We can see that the output above gives the same result as the paired t-test.
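This agreement is exact: with one numerator degree of freedom, the F statistic for METHOD is the square of the paired t statistic. A one-line Python check using the t value reported above:

```python
t = -0.43               # paired t statistic from PROC TTEST
F = round(t ** 2, 2)    # 0.18, matching the F Value for METHOD in the PROC GLM output
```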
To evaluate the additional unknown random effect of SUBJECT on the results, we can
use the MIXED procedure. In a mixed model, the variances of the random-effects
parameter SUBJECT, which is assumed to affect the variability of the data, become the
covariance parameters of the mixed model.
proc mixed data=rawlab;
class method subject;
model result=method /ddfm=satterth ;
random subject;
lsmeans method/pdiff cl alpha=.05;
estimate 'Result1 vs Result2' method 1 -1;
run;
Type 3 Tests of Fixed Effects

          Num    Den
Effect     DF     DF    F Value    Pr > F
METHOD      1     29       0.18    0.6739

                          Estimates

                                  Standard
Label                 Estimate       Error    DF    t Value    Pr > |t|
Result1 vs Result2     -0.4333      1.0193    29      -0.43      0.6739

                       Least Squares Means

                            Standard
Effect   METHOD   Estimate     Error      DF   t Value   Pr > |t|   Alpha     Lower      Upper
METHOD        1    92.1333    3.6775    30.1     25.05     <.0001    0.05   84.6243    99.6424
METHOD        2    92.5667    3.6775    30.1     25.17     <.0001    0.05   85.0576     100.08

                 Differences of Least Squares Means

                               Standard
METHOD   _METHOD   Estimate       Error    DF   t Value   Pr > |t|   Alpha     Lower     Upper
     1         2    -0.4333      1.0193    29     -0.43     0.6739    0.05   -2.5180    1.6513
The RANDOM statement defines the random effect SUBJECT that constitutes the
random-effects vector in the mixed model. The DDFM=SATTERTH option performs a
general Satterthwaite approximation for the denominator degrees of freedom. PROC
MIXED also provides several statistics suitable for generating hypothesis tests and
confidence intervals; the validity of these statistics depends upon the mean and
variance-covariance structure.
TRADEMARK INFORMATION
SAS and all other SAS Institute Inc. product or service names are registered trademarks
or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA
registration. Other brand and product names are registered trademarks or trademarks of
their respective companies.
CONTACT INFORMATION
David Shen
ClinForce Consulting Inc.
Philadelphia, PA
[email protected]
Zaizai Lu
AstraZeneca Pharmaceutical Inc.
Wilmington, DE
[email protected]
Appendix
data set: rawlab (60 observations: one per SUBJECT 101-130 and METHOD 1-2)

Obs    SUBJECT    METHOD    RESULT
  1      101         1        82.5
  2      101         2        90.5
  3      102         1       124.5
  4      102         2       134.5
  5      103         1        96.0
  6      103         2        91.0
  7      104         1        77.5
  8      104         2        83.0
  9      105         1       107.5
 10      105         2       100.0
 11      106         1        75.0
 12      106         2        67.5
...
 51      126         1        96.0
 52      126         2       100.0
 53      127         1        83.5
 54      127         2        87.5
 55      128         1        75.0
 56      128         2        81.5
 57      129         1        99.5
 58      129         2       100.5
 59      130         1       133.5
 60      130         2       130.0
proc transpose data = rawlab out = temp
               prefix = result;
by subject;
id method;
run;
data clinlab;
set temp (drop=_name_);
dif = result1 - result2;
mean = mean(result1, result2);
run;
%macro Explots (data=, interv= 10);
data clinlab;
set &data;
run;
proc means data = clinlab noprint;
var result1 result2;
output out = range min=min1 min2 max=max1 max2;
run;
data _null_;
set range;
a=round(min(min1, min2)*0.90 - &interv/2, &interv);
b=round(max(max1, max2)*1.05 + &interv/2, &interv);
call symput ('lowerax', a);
call symput ('upperax', b);
run;
goptions reset=global;
axis1 order = (&lowerax to &upperax by &interv)
      minor=(number=1)
      label = (a=90 'Test Method Result');
axis2 order = (&lowerax to &upperax by &interv)
      minor=(number=1)
      label = ('Comparative Method Result');
axis3 label =(a=90 'Difference') minor=(number=1);
axis4 label =(a=90 'Counts') minor=(number=1);
axis5 label =('Difference of Means');
data line;
length function color $8;
retain hsys ysys xsys '2' color 'red';
function = 'move'; x=&lowerax; y=&lowerax; output;
function = 'draw'; x=&upperax; y=&upperax; line=2; output;
run;
symbol v=circle color=blue i=r;
proc gplot data = clinlab annotate=line;
plot result1*result2 / vaxis=axis1
haxis=axis2 ;
run; quit;
symbol v=dot i=none color= blue;
proc gplot data = clinlab;
plot dif*result2 / haxis=axis2
                   vaxis=axis3
                   vref=0;
run; quit;
proc means data = clinlab noprint ;
var dif;
output out = abplot mean = bias std = std range=range;
run;
data _null_;
set abplot ;
upper = bias + 1.96*std;
lower = bias - 1.96*std;
call symput ('upper', upper);
call symput ('middle', bias);
call symput ('lower', lower);
call symput ('range', range/10);
run;
data abline;
length function color style $8 text $15;
retain hsys ysys xsys '2' color 'red' line 2 position '6';
function = 'move'; x=&lowerax; y=&lower; output;
function = 'draw'; x=&upperax; y=&lower; output;
function = 'move'; x=&lowerax; y=&upper; output;
function = 'draw'; x=&upperax; y=&upper; output;
function = 'move'; x=&lowerax; y=&middle; output;
function = 'draw'; x=&upperax; y=&middle; line=1; output;
function = 'label'; x=&lowerax + 0.05*&interv;
                    y=&lower + &range;
style= 'swissb'; color='green'; size=1.5;
text='BIAS-1.96SD'; output;
function = 'label'; x=&lowerax + 0.05*&interv;
                    y=&middle + &range;
style= 'swissb'; color='green'; size=1.5;
text='BIAS'; output;
function = 'label'; x=&lowerax + 0.05*&interv;
                    y=&upper + &range;
style= 'swissb'; color='green'; size=1.5;
text='BIAS+1.96SD'; output;
run;
symbol v=dot i=none color= blue;
proc gplot data = clinlab annotate=abline;
plot dif*mean / haxis=axis2
vaxis=axis3 ;
run; quit;
proc gchart data = clinlab;
vbar dif / levels=9
raxis=axis4
maxis=axis5;
run; quit;
%mend;
%Explots (data=clinlab);
[Output of %Explots: four figures: the comparison plot (Test Method Result vs.
Comparative Method Result, with the regression line and the line of equality), the
difference plot (Difference vs. Comparative Method Result, with a reference line at 0),
the Altman-Bland plot (Difference vs. mean of the two methods, with BIAS and
BIAS ± 1.96SD lines), and the frequency histogram (Counts vs. Difference of Means).]