Paper PO01

Statistical Application of SAS in Method Comparison Analysis

David Shen, ClinForce Consulting Inc., Philadelphia, PA
Zaizai Lu, AstraZeneca Pharmaceutical Inc., Wilmington, DE

ABSTRACT
It is common in clinical studies to assess the agreement between measurements made by two methods. This paper introduces techniques, consisting of exploratory graphics and statistical analyses, to identify the relationship between the results from two methods. All SAS code related to the topic is also presented.

INTRODUCTION
New quantitative methods are often needed to meet specific requirements in clinical studies. For example, a study concerning blood glucose concentrations may require frequent blood glucose monitoring (up to 12 measurements per medication). To implement such monitoring without drawing an unacceptably high volume of blood from subjects, a more sensitive technique requiring a lower blood volume is needed to replace the traditional one. Since it is unlikely that two different methods will give identical results for all individuals, a method comparison must be performed to estimate accuracy and error. Agreement between two clinical measurement methods can be quantified using results obtained on the same quantity by both methods. This paper describes the factors to consider during experimental design, the visual graphics used to explore method agreement, and the statistical analyses used to decide whether two methods are interchangeable in clinical interpretation.

FACTORS TO CONSIDER
Sample size is important in experimental studies and should be carefully determined from the defined power. Specimens should be taken at different time points, and their contents should cover the clinically relevant range; for example, fasting blood glucose concentration is 60-126 mg/dL. Using the routine laboratory method as the comparative method does not imply that it is correct.
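As an illustration of determining sample size from a defined power, the number of paired specimens can be approximated from the expected standard deviation of the between-method differences and the smallest bias worth detecting. The sketch below is a minimal Python illustration using the normal approximation (not part of the paper's SAS code; the SD and bias values are hypothetical, not taken from this study):

```python
import math
from statistics import NormalDist

def paired_sample_size(sd_diff, delta, alpha=0.05, power=0.80):
    """Approximate number of specimen pairs needed to detect a mean
    between-method bias of size delta, given the SD of the paired
    differences: n = ((z_{1-alpha/2} + z_{power}) * sd_diff / delta)**2."""
    z = NormalDist().inv_cdf
    n = ((z(1 - alpha / 2) + z(power)) * sd_diff / delta) ** 2
    return math.ceil(n)

# Hypothetical planning values: SD of differences 6 mg/dL, bias to detect 3 mg/dL
print(paired_sample_size(6.0, 3.0))  # -> 32 pairs
```

Because this uses the normal rather than the t distribution, it slightly underestimates the required n for small samples; rounding up partly compensates.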
If the differences are small, the two methods have the same relative accuracy. If the differences are large and medically unacceptable, it is necessary to identify which method is inaccurate by repeating measurements. Common practice is to analyze each specimen once by both the test and comparative methods; however, repeating the analysis for each method provides a check on the validity of the data, and duplicate analyses help to identify discrepancies between methods.

GRAPHICAL EXPLORATION
Visual graphics allow direct inspection of the agreement between the test and comparative methods. Four plots are widely used in data exploration.
1. Comparison plot: displays the test result on the y-axis versus the comparative result on the x-axis. As points accumulate, a visual line of best fit can be drawn to show the general relationship between the methods and to help identify discrepant results.
2. Difference plot: displays the difference (test minus comparative result) on the y-axis versus the comparative result on the x-axis. Ideally, these differences scatter around the line of zero difference, half above and half below. Any large differences stand out in the plot and draw attention.
3. Altman-Bland plot: displays the difference between the measurements on the y-axis versus the average of the two methods' measurements on the x-axis. The mean and standard deviation of the differences are calculated, along with the mean difference plus or minus 1.96 standard deviations. About 95% of the differences should lie between these two lines. This is called the 95% limits of agreement method; it is simple and makes the data easy to express and interpret.
4. Frequency histogram: displays the distribution of the differences between the two methods. It is a useful complement to the Altman-Bland plot.
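The limits drawn in the Altman-Bland plot are straightforward to compute directly. As a cross-check outside SAS, a minimal Python sketch of the 95% limits of agreement calculation (the paired readings below are hypothetical, not the study data):

```python
from statistics import mean, stdev

def limits_of_agreement(test, comp):
    """Bland-Altman 95% limits: bias +/- 1.96 * SD of the differences."""
    d = [t - c for t, c in zip(test, comp)]
    bias, sd = mean(d), stdev(d)
    return bias - 1.96 * sd, bias, bias + 1.96 * sd

# Hypothetical paired glucose readings (mg/dL)
test_m = [82.5, 96.0, 107.5, 89.5, 101.0]
comp_m = [90.5, 91.0, 100.0, 77.5, 118.5]
lower, bias, upper = limits_of_agreement(test_m, comp_m)
```

If roughly 95% of the plotted differences fall between `lower` and `upper`, and that span is clinically acceptable, the methods may be considered interchangeable.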
These graphics are generally useful for showing the analytical range of the data, the linearity of response over that range, and the general relationship between the methods. The data and program code for these four plots are shown in the appendix.

STATISTICAL ANALYSIS
Common statistical methods include summary statistics, the t-test, correlation, regression, and analysis of variance.
1. Summary statistics: PROC UNIVARIATE provides descriptive summary statistics of the data.

proc univariate data = clinlab normal plots;
   var result1 result2 dif;
run;

The output contains:
• Sample size
• Range
• Arithmetic mean
• Median
• Standard deviation: the square root of the variance. When the distribution of the observations is normal, 95% of observations lie in the interval Mean ± 1.96 SD.
• Test for normal distribution: PROC UNIVARIATE calculates the Shapiro-Wilk W statistic. If P is higher than 0.05, it may be assumed that the data follow a normal distribution. If the P-value is less than 0.05, the hypothesis that the observations are normally distributed should be rejected. In the latter case, the sample cannot be accurately described by the arithmetic mean and standard deviation, and should not be submitted to a parametric statistical test or procedure, such as the t-test discussed later. To test for a possible difference between non-normally distributed samples, the Wilcoxon test should be used, and correlation can be estimated by means of rank correlation.

When the sample size is small, the symmetry and peakedness of the distribution can be evaluated visually using the histogram or the cumulative frequency distribution. The three plots generated by PROC UNIVARIATE also display the data distribution, making it easy to see whether the data are approximately normal.
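The summary statistics above are simple to reproduce as a cross-check of the PROC UNIVARIATE output; an illustrative Python sketch on hypothetical between-method difference values (not the study data):

```python
from statistics import mean, median, stdev

# Hypothetical between-method differences
dif = [-8.0, 5.0, 7.5, 12.0, -17.5, 3.0, -0.5, 2.5]

n = len(dif)
rng = (min(dif), max(dif))
m, md, sd = mean(dif), median(dif), stdev(dif)

# If the differences are approximately normal, about 95% of
# observations fall within Mean +/- 1.96 SD:
normal_interval = (m - 1.96 * sd, m + 1.96 * sd)
print(f"n={n} mean={m:.2f} median={md:.2f} sd={sd:.2f}")
```

A large gap between the mean and the median is itself a quick warning that the differences may not be normally distributed.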
2. T-test: PROC TTEST with the PAIRED statement tests the null hypothesis that the mean of the differences between the paired observations in the two samples is zero. If the calculated P-value is greater than 0.05, the conclusion is that the mean difference between the paired observations is not significantly different from 0; otherwise, the results from the two methods are significantly different.

proc ttest data = clinlab;
   paired result1*result2;
run;

The TTEST Procedure

Statistics
Difference          N    Lower CL Mean   Mean     Upper CL Mean   Lower CL Std Dev   Std Dev   Upper CL Std Dev
RESULT1 - RESULT2   30   -2.518          -0.433   1.6513          4.4462             5.5828    7.5051

T-Tests
Difference          DF   t Value   Pr > |t|
RESULT1 - RESULT2   29   -0.43     0.6739

The first section of the output displays simple summary statistics: n, mean, standard deviation, and the 95% confidence interval for the mean. The second section displays the null hypothesis test. The high P-value of 0.6739 leads to the conclusion that the difference is statistically non-significant. Note that the two samples must have equal size, since the observations are paired. The paired t-test is actually a one-sample test on the differences between the two results, so the t-test can also be conducted with PROC MEANS.

proc means data = clinlab n mean std lclm uclm t prt;
   var dif;
run;

The MEANS Procedure
Analysis Variable : DIF
N    Mean         Std Dev     Lower 95% CL for Mean   Upper 95% CL for Mean   t Value   Pr > |t|
30   -0.4333333   5.5828144   -2.5179905              1.6513238               -0.43     0.6739

The t-test assumes that the data are normally distributed. This assumption can be checked using the UNIVARIATE procedure. If the normality assumption is not satisfied, the nonparametric Wilcoxon rank sum test (PROC NPAR1WAY) should be used to analyze the data.

3. Correlation: Correlation analysis is used to see whether the values of two variables are associated.
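Pearson's r, which PROC CORR reports below, is the sum of centered cross-products scaled by the two variables' sums of squares; a minimal Python sketch (illustrative only, with made-up data):

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# A perfect linear relationship gives r = 1 even when the methods
# disagree systematically (here y is always twice x):
print(pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # -> 1.0
```

The example illustrates the caution raised below: r measures association, not agreement, so a high r does not mean the two methods return the same values.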
Simple correlation analysis can be conducted with PROC CORR. The default correlation analysis includes descriptive statistics, Pearson correlation statistics, and the associated probabilities. Correlation coefficients carry information on both the strength and the direction of the linear relationship between two numeric variables.

proc corr data = clinlab;
   var result1 result2;
run;

Pearson Correlation Coefficients, N = 30
Prob > |r| under H0: Rho=0
          RESULT1   RESULT2
RESULT1   1.00000   0.96171
                    <.0001
RESULT2   0.96171   1.00000
          <.0001

The P-value is the probability of obtaining the observed result if the correlation coefficient were in fact zero (the null hypothesis). If this probability is lower than the conventional 5% (P < 0.05), the correlation is called statistically significant. However, since correlation measures association rather than agreement between two methods, it may misrepresent the degree of agreement. The comparison plot can identify and prevent this potential mistake: the red diagonal line is the line of equality, and if the green regression line does not coincide with it, it is clearly inappropriate to use correlation to interpret the agreement of the two methods.

4. Linear regression: Regression is used to describe the relationship between two variables and to predict one variable from the other. The linear equation Y = a + bX can reflect the agreement between the two methods: if the results from the two methods are comparable, the intercept a should be equal or close to 0 and the slope b equal or very close to 1.

proc reg data=clinlab;
   model result1 = result2;
run;
quit;

The output shows the P-values, R-square, and parameter estimates. If the significance level of the F-test is very small (< 0.0001), the hypothesis that there is no linear relationship can be rejected.
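The estimates PROC REG produces can be cross-checked with the standard least-squares formulas, along with the t statistic for the hypothesis that the slope equals 1 (t = (b - 1)/SE(b), with SE(b) = sqrt(MSE/Sxx)). A minimal Python sketch with hypothetical data:

```python
from math import sqrt
from statistics import mean

def fit_and_test_slope(x, y, slope0=1.0):
    """OLS fit y = a + b*x, plus the t statistic for H0: b = slope0,
    where SE(b) = sqrt(MSE / Sxx) and MSE = SSE / (n - 2)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = sqrt(sse / (n - 2) / sxx)
    return a, b, (b - slope0) / se_b

# Hypothetical comparative (x) and test (y) results
a, b, t = fit_and_test_slope([1.0, 2.0, 3.0, 4.0], [1.9, 3.1, 3.9, 5.1])
```

Comparing t with the t(n-2) critical value gives the same decision as the F test produced by a TEST statement in PROC REG, since for a one-degree-of-freedom hypothesis F = t².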
Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1   10707            10707         344.81    <.0001
Error             28   869.40541       31.05019
Corrected Total   29   11576

Root MSE         5.57227    R-Square   0.9249
Dependent Mean   92.13333   Adj R-Sq   0.9222
Coeff Var        6.04805

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1   4.53636              4.82579          0.94      0.3552
RESULT2      1   0.94631              0.05096          18.57     <.0001

The t-values and P-values test the hypothesis that these coefficients are equal to 0. The P-value for the intercept is 0.3552, which indicates that the intercept is not significantly different from 0, while the slope, with a P-value < .0001, is significantly different from 0. A TEST statement added to PROC REG can explicitly test whether the slope equals 1.

proc reg data=clinlab;
   model result1 = result2;
   slope: test result2 = 1;
run;
quit;

Test SLOPE Results for Dependent Variable RESULT1
Source        DF   Mean Square   F Value   Pr > F
Numerator      1   34.46126      1.11      0.3011
Denominator   28   31.05019

The P-value of 0.3011 indicates that the slope is not significantly different from 1. The PROC REG steps above have tested the linearity of the two methods and the values of the intercept and slope. Next, a diagnostic Shapiro-Wilk test is conducted to check the normality of the residuals (residuals = differences between observed and predicted values).

proc reg data=clinlab;
   model result1 = result2 /r clm cli;
   output out = resid r=resid p=pred;
   slope: test result2 = 1;
run;
quit;

proc univariate data = resid normal plots;
   var resid;
run;

The residual plot shows the goodness of fit of the selected model or equation. Residuals also point out possible outliers (unusual values) in the data and problems with the regression model. Note the options added to the MODEL statement above: option R requests residual diagnostics, and option CLM provides the 95% confidence interval for the regression line.
This interval includes the true regression line with 95% probability. Option CLI presents the 95% prediction interval, which is much wider than the 95% confidence interval: for any given value of the independent variable, it covers the values of the dependent variable with 95% probability.

5. Analysis of variance: To study the influence of qualitative (discrete) factors on a continuous variable, we can use analysis of variance with PROC GLM.

proc glm data = rawlab;
   class subject method;
   model result = subject method;
run;
quit;

Source    DF   Type III SS   Mean Square   F Value   Pr > F
SUBJECT   29   23079.90000   795.85862     51.07     <.0001
METHOD     1   2.81667       2.81667       0.18      0.6739

Note that this output gives the same result as the paired t-test. To evaluate additional unknown random effects of SUBJECT on the results, we can use the MIXED procedure. In a mixed model, the variances of the random-effects parameter SUBJECT, assumed to affect the variability of the data, become the covariance parameters of the model.
proc mixed data=rawlab;
   class method subject;
   model result=method /ddfm=satterth;
   random subject;
   lsmeans method/pdiff cl alpha=.05;
   estimate 'Result1 vs Result2' method 1 -1;
run;

Type 3 Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
METHOD        1       29      0.18   0.6739

Estimates
Label                Estimate   Standard Error   DF   t Value   Pr > |t|
Result1 vs Result2   -0.4333    1.0193           29   -0.43     0.6739

Least Squares Means
Effect   METHOD   Estimate   Standard Error   DF     t Value   Pr > |t|   Alpha   Lower     Upper
METHOD   1        92.1333    3.6775           30.1   25.05     <.0001     0.05    84.6243   99.6424
METHOD   2        92.5667    3.6775           30.1   25.17     <.0001     0.05    85.0576   100.08

Differences of Least Squares Means
Effect   METHOD   _METHOD   Estimate   Standard Error   DF   t Value   Pr > |t|   Alpha   Lower     Upper
METHOD   1        2         -0.4333    1.0193           29   -0.43     0.6739     0.05    -2.5180   1.6513

The RANDOM statement defines the random effect SUBJECT in the mixed model. The DDFM=SATTERTH option performs a general Satterthwaite approximation for the denominator degrees of freedom. PROC MIXED also provides several statistics suitable for generating hypothesis tests and confidence intervals; the validity of these statistics depends on the assumed mean and variance-covariance structure.

TRADEMARK INFORMATION
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

CONTACT INFORMATION
David Shen
ClinForce Consulting Inc.
Philadelphia, PA
[email protected]

Zaizai Lu
AstraZeneca Pharmaceutical Inc.
Wilmington, DE
[email protected]

APPENDIX

Data set rawlab: 60 observations with variables SUBJECT (101-130), METHOD (1 = test, 2 = comparative), and RESULT (one measurement per subject and method, roughly 47.0-134.5).

proc transpose data = rawlab out = temp prefix = result;
   by subject;
   id method;
run;

data clinlab;
   set temp (drop=_name_);
   dif = result1 - result2;
   mean = mean(result1, result2);
run;

%macro Explots (data=, interv= 10);

data clinlab;
   set &data;
run;

proc means data = clinlab noprint;
   var result1 result2;
   output out = range min=min1 min2 max=max1 max2;
run;

data _null_;
   set range;
   a=round(min(min1, min2)*0.90 - &interv/2, &interv);
   b=round(max(max1, max2)*1.05 + &interv/2, &interv);
   call symput ('lowerax', a);
   call symput ('upperax', b);
run;

goptions reset=global;
axis1 order = (&lowerax to &upperax by &interv) minor=(number=1)
      label = (a=90 'Test Method Result');
axis2 order = (&lowerax to &upperax by &interv) minor=(number=1)
      label = ('Comparative Method Result');
axis3 label =(a=90 'Difference') minor=(number=1);
axis4 label =(a=90 'Counts') minor=(number=1);
axis5 label =('Difference of Means');

data line;
   length function color $8;
   retain hsys ysys xsys '2' color
'red';
   function = 'move'; x=&lowerax; y=&lowerax; output;
   function = 'draw'; x=&upperax; y=&upperax; line=2; output;
run;

symbol v=circle color=blue i=r;
proc gplot data = clinlab annotate=line;
   plot result1*result2 / vaxis=axis1 haxis=axis2;
run;
quit;

symbol v=dot i=none color=blue;
proc gplot data = clinlab;
   plot dif*result2 / haxis=axis2 vaxis=axis3 vref=0;
run;
quit;

proc means data = clinlab noprint;
   var dif;
   output out = abplot mean=bias std=std range=range;
run;

data _null_;
   set abplot;
   upper = bias + 1.96*std;
   lower = bias - 1.96*std;
   call symput ('upper', upper);
   call symput ('middle', bias);
   call symput ('lower', lower);
   call symput ('range', range/10);
run;

data abline;
   length function color style $8 text $15;
   retain hsys ysys xsys '2' color 'red' line 2 position '6';
   function = 'move'; x=&lowerax; y=&lower; output;
   function = 'draw'; x=&upperax; y=&lower; output;
   function = 'move'; x=&lowerax; y=&upper; output;
   function = 'draw'; x=&upperax; y=&upper; output;
   function = 'move'; x=&lowerax; y=&middle; output;
   function = 'draw'; x=&upperax; y=&middle; line=1; output;
   function = 'label'; x=&lowerax + 0.05*&interv; y=&lower + &range;
      style='swissb'; color='green'; size=1.5; text='BIAS-1.96SD'; output;
   function = 'label'; x=&lowerax + 0.05*&interv; y=&middle + &range;
      style='swissb'; color='green'; size=1.5; text='BIAS'; output;
   function = 'label'; x=&lowerax + 0.05*&interv; y=&upper + &range;
      style='swissb'; color='green'; size=1.5; text='BIAS+1.96SD'; output;
run;

symbol v=dot i=none color=blue;
proc gplot data = clinlab annotate=abline;
   plot dif*mean / haxis=axis2 vaxis=axis3;
run;
quit;

proc gchart data = clinlab;
   vbar dif / levels=9 raxis=axis4 maxis=axis5;
run;
quit;

%mend;

%Explots (data=clinlab);

[Figure: comparison plot, Test Method Result versus Comparative Method Result]
[Figure: difference plot versus Comparative Method Result]
[Figure: Altman-Bland plot versus Comparative Method Result]
[Figure: frequency histogram of the Difference of Means]