Estimation of the Values below the Detection Limit by Regression Techniques

Gayland Ridley, Merck & Co. Inc., Blue Bell, PA
Shi-Tao Yeh, EDP/Temps & Contract Services, Bala Cynwyd, PA

Abstract

Environmental or laboratory data sets may contain observations that are very small or near zero. These data are measured, but the measuring devices or procedures used are unable to detect low concentrations, so the analytical laboratories report them as below the "detection limit" (DL). The data report forms from the laboratories may indicate:

o not detected,
o less than a specified detection limit, or
o zero.

The most common techniques for handling values below the DL are the deletion and substitution techniques. Deletion techniques delete the observations below the DL from the data set, so only the observations above the DL are used. Substitution techniques replace each below-DL observation with zero, half of the DL, or the DL itself. Both deletion and substitution techniques bias the parameter estimates, which produces inaccurate descriptive statistics and incorrect statistical inferences.

This paper provides two regression methods, regression order statistics and a log probability method, with SAS® code for estimating the values below the DL.

Regression Order Statistics Method

1.1 Theoretical Background

The regression order statistics method converts the observations into a ranked order series, then regresses it on the values of the inverse cumulative normal distribution function at the corresponding plotting positions. Observations are ranked from smallest to largest, with values below the DL treated as the smallest values. Let $X_i$ be the $i$th ranked observation, so that $X_1$ is the smallest value, $X_{k+1}$ is the smallest detectable value (the first $k$ observations are below the DL), and $X_n$ is the largest value.

If the observations comprising the sample are randomly drawn from the population, the ordered data values divide the underlying probability density function into equal areas. Thus, an estimated plotting position on an appropriate coordinate system can be calculated for each point such that the data above the DL fall on a straight line. Let

$$A_i = F^{-1}\left[\frac{i - 3/8}{n + 1/4}\right]$$

where $F^{-1}[x]$ is the inverse cumulative normal distribution function. The method regresses $X_i$ on $A_i$ to estimate $a$ and $b$ in the equation

$$X_i = a + b A_i + e_i .$$

Only the detected observations ($i = k+1, \dots, n$) enter the fit; the below-DL values are then estimated from the fitted line at $A_1, \dots, A_k$. The mean of the distribution for the whole data set is estimated by $a$ and the standard deviation is estimated by $b$.

1.2 Computation Procedure

Figure 1 shows the computation procedure.

Figure 1 Flowchart of Computation Steps (retrieve the data set with missing values; assign the numbers of missing values, non-missing values, and total observations to global macro variables; compute z scores, F = (i-3/8)/(N+1/4), Z = PROBIT(F); perform regression analysis; retrieve the predicted values; assign the predicted values to the missing values and flag them; print the new data)

1.3 SAS Macro Module

The macro module takes three arguments: fname, the name of the user's SAS data set; vname, the variable whose below-DL values are to be estimated; and roff, the user-specified roundoff unit. Below-DL values are coded as SAS missing values. The module calls the SAS ROUND function to round the estimated values to the nearest roundoff unit.

    %macro reg1(fname,vname,roff);
      %global nmiss nomiss n;

      /* computes # of missing values and # of obs. in the data file */
      proc univariate data=&fname noprint;
        var &vname;
        output out=p1 nmiss=nmiss n=nomiss;
      run;

      data p1;
        set p1;
        n = nmiss + nomiss;
        call symput('nmiss',nmiss);
        call symput('nomiss',nomiss);
        call symput('n',n);
      run;

      /* creates an order list; j keeps the original row number */
      data f2;
        do i = 1 to &n;
          j = i;
          output;
        end;
      run;

      data f1(keep=j &vname);
        set &fname;
        set f2;
      run;

      /* ranks the data; missing (below DL) values sort first */
      proc sort data=f1;
        by &vname;
      run;

      /* computes z scores from the rank i */
      data f3;
        set f1;
        set f2(keep=i);
        f = (i - 3/8)/(&n + 1/4);
        z = probit(f);
      run;

      /* fits a regression line; rows with a missing response are
         left out of the fit but still receive a predicted value */
      proc reg data=f3 noprint;
        model &vname = z;
        output out=p1 p=xhat;
      run;

      /* assigns the rounded predicted values to the missing values and flags them */
      data f3(keep=j &vname flag);
        set p1;
        length flag $ 13;
        xhat = round(xhat,&roff);
        if &vname = . then do;
          flag = '<= ESTIMATED';
          &vname = xhat;
        end;
      run;

      /* restores the original row order before merging back */
      proc sort data=f3;
        by j;
      run;

      data &fname;
        set &fname;
        set f3;
      run;

      proc print data=&fname noobs;
      run;
    %mend reg1;

Figure 2 SAS Macro Module

1.4 Sample Input and How to Use the Module

Display 1 shows a sample data file named SUGI.DAT.

Display 1 Sample Data (variables y and x)

The following example uses the sample data set and illustrates how to use the macro module.

    data sugi;
      infile 'sugi.dat';
      input y x;
    run;

    %reg1(sugi,x,1)
    run;

Figure 3 SAS Program Example

Display 2 shows the output from the example SAS program. The estimated values are marked "<= ESTIMATED" in the FLAG column.

Display 2 Output from Example Program

Log Probability Method

2.1 Theoretical Background

Gilliom and Helsel conducted an experiment in which samples were generated from a wide range of parent distributions, with the DL set at varying levels, to evaluate eight different methods for estimating distribution parameters [1] [3]. They found that the most robust method for minimizing error in distribution parameter estimation was the log-probability regression method. This method uses the observed data above the DL, with the missing values extrapolated, to compute the distribution parameters, assuming that the below-DL observations follow a lognormal distribution in the zero-to-DL range.

2.2 Computation Procedure

The estimation steps are as follows:

o perform the log transformation on the observations above the DL,
o compute their z scores, where $z = F^{-1}\left[\frac{i - 3/8}{n + 1/4}\right]$,
o fit a regression line to the log-transformed observations and their z scores,
o predict the below-DL observations from the regression line, and
o back-transform all values to arithmetic units.
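The two procedures can be summarized in a short sketch. The Python function below is not the authors' SAS module; it is a minimal re-expression of the same steps under stated assumptions: below-DL values are passed as None, ranks follow the SAS sort order (below-DL values first, ties in input order), statistics.NormalDist().inv_cdf (Python 3.8+) stands in for the SAS PROBIT function, and the function name ros_impute and its log_scale flag are hypothetical. With the default it follows the regression order statistics method of Section 1; with log_scale=True it follows the log probability steps of Section 2.2. Unlike the macro, it does not round the estimates.

```python
from math import exp, log
from statistics import NormalDist


def ros_impute(values, log_scale=False):
    """Estimate below-DL entries (given as None) by regression on normal scores.

    Hypothetical helper, not the paper's SAS macro. Returns
    (filled_values, intercept, slope). With log_scale=True the line is fitted
    to log(values) and the estimates are back-transformed with exp().
    """
    n = len(values)
    observed = [k for k in range(n) if values[k] is not None]
    if len(observed) < 2:
        raise ValueError("need at least two detected values to fit a line")

    # Rank all n observations: below-DL (None) first, then ascending value.
    # sorted() is stable, so ties keep their input order.
    order = sorted(range(n), key=lambda k: (values[k] is not None,
                                            values[k] if values[k] is not None else 0.0))

    # Normal score of rank i (1-based): A_i = F^{-1}[(i - 3/8) / (n + 1/4)].
    std_normal = NormalDist()
    score = [0.0] * n
    for rank, k in enumerate(order, start=1):
        score[k] = std_normal.inv_cdf((rank - 3 / 8) / (n + 1 / 4))

    # Ordinary least squares of the detected values (or their logs) on the scores.
    ys = [log(values[k]) if log_scale else values[k] for k in observed]
    zs = [score[k] for k in observed]
    z_mean = sum(zs) / len(zs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((z - z_mean) ** 2 for z in zs)
    sxy = sum((z - z_mean) * (y - y_mean) for z, y in zip(zs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * z_mean

    # Predict each below-DL value from its score; back-transform if logged.
    filled = list(values)
    for k in range(n):
        if values[k] is None:
            fitted = intercept + slope * score[k]
            filled[k] = exp(fitted) if log_scale else fitted
    return filled, intercept, slope
```

For example, ros_impute([None, 7, 10, 13, 9, 11], log_scale=True) estimates the first entry from a line fitted to the logs of the other five values. Detected values are returned unchanged, which mirrors the macro's behavior of replacing only the missing entries.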
2.3 How to Use the Module and Output Sample

The following SAS program illustrates how to perform log probability regression with the same macro module: the variable is log-transformed, the module is applied on the log scale with a roundoff unit of .001, and the results are back-transformed with EXP. Because the module overwrites its input data set, the SUGI data set should be re-created from SUGI.DAT (the DATA step in Figure 3) before this program is run. It produces the SAS output shown in Display 3.

    data sugi(drop=x);
      set sugi;
      lx = log(x);
    run;

    %reg1(sugi,lx,.001)
    run;

    data sugi(drop=lx);
      set sugi;
      x = exp(lx);
    run;

    proc print data=sugi noobs;
      var y x flag;
    run;

Figure 4 SAS Program for Log Probability Method

     y          x    FLAG
     1     6.1227    <= ESTIMATED
     2     7.0000
     3    10.0000
     4    13.0000
     5     9.0000
     6    11.0000
     7    10.0000
     8    10.0000
     9    13.0000
    10    12.0000
    11     9.0000
    12    10.0000
    13    12.0000
    14    10.0000
    15    16.0000
    16     8.0000
    17     6.9379    <= ESTIMATED
    18    16.0000
    19     8.0000
    20    12.0000

Display 3 Output from Log Probability Method

Conclusion

This paper provides SAS modules to estimate the values below the DL. Two regression methods, rank order and log probability, are presented and discussed along with their computation procedures and SAS macro code. The log probability regression method was found by Gilliom and Helsel [1] [3] to be the most robust method for minimizing error in missing-value estimates. The module in this paper provides a computational tool for users to estimate the values below the DL.

References

[1] Gilliom, R. J. and Helsel, D. R. (1986). "Estimation of Distributional Parameters for Censored Trace Level Water Quality Data. 2. Verification and Applications." Water Resources Res., 22(2), 147-155.

[2] Helsel, D. R. (1990). "Less Than Obvious: Statistical Treatment of Data below the Detection Limit." Environ. Sci. Technol., 24(12), 1766-1774.

[3] Helsel, D. R. and Gilliom, R. J. (1986). "Estimation of Distributional Parameters for Censored Trace Level Water Quality Data. 1. Estimation Techniques." Water Resources Res., 22(2), 135-146.

[4] Newman, M. C., Greene, D., Dixon, P. M., Looney, B. B., and Segal, C. (1992). Uncensor V3.0. Savannah River Ecology Laboratory, Aiken, SC, 1-15.

[5] Newman, M. C., Dixon, P. M., Looney, B. B., and Pinder, J. E. III (1989). "Estimating Mean and Variance for Environmental Samples with below Detection Limit Observations." Water Resources Bull., 25(4), 905-916.

SAS is a registered trademark or trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.