Analysis of Chromium Emissions Data

Nagaraj Neerchal and Justin Newcomer, UMBC and OIAA/OEI
Mohamed Seregeldin, Office of Air Quality Planning and Standards, EPA, RTP

Objective
• To develop a protocol (methodology) for obtaining confidence bounds for the “Mean Chromium Emissions” for each welding process and rod type combination.
• Incorporate all the data, including the averages, to the best of our ability.

About The Data
• Three Welding Processes – GMAW, SMAW, FCAW
• Three Rod Types – E308, E309, and E316
• Multiple Sources of Data
– Some report individual measurements
– Some report only averages without the original observations
– Units of reporting vary; all are converted to g/kg

Summary Statistics

Total Chromium (g/kg)

| Welding Process | Rod Type | Mean | SE of Mean | Sample Size | Standard Deviation |
|---|---|---|---|---|---|
| SMAW | E308 | 0.7804222 | 0.1056113 | 9 | 0.3168339 |
| SMAW | E316 | 0.8287333 | 0.1443974 | 3 | 0.2501037 |
| SMAW | E309 | 0.63952857 | 0.08441454 | 7 | 0.22333989 |
| GMAW | E308 | No Data | N/A | 0 | N/A |
| GMAW | E316 | 1.0323333 | 0.1338336 | 3 | 0.2318067 |
| GMAW | E309 | 4.599625 | 1.350994 | 4 | 2.701988 |
| FCAW | E308 | No Data | N/A | 0 | N/A |
| FCAW | E316 | 2.45 | 0.59 | 2 | 0.834386 |
| FCAW | E309 | 2.219625 | 0.4905373 | 4 | 0.9810745 |

Chromium 6 (g/kg)

| Welding Process | Rod Type | Mean | SE of Mean | Sample Size | Standard Deviation |
|---|---|---|---|---|---|
| SMAW | E308 | 0.14595556 | 0.01328651 | 9 | 0.03985954 |
| SMAW | E316 | 0.19013333 | 0.03337715 | 3 | 0.0578109 |
| SMAW | E309 | 0.092064 | 0.02513234 | 7 | 0.06649392 |
| GMAW | E308 | No Data | N/A | 0 | N/A |
| GMAW | E316 | 0.02333333 | 0.01151468 | 3 | 0.019944 |
| GMAW | E309 | 0.047525 | 0.01384038 | 4 | 0.02768077 |
| FCAW | E308 | No Data | N/A | 0 | N/A |
| FCAW | E316 | 0.05586667 | 0.01138015 | 3 | 0.019711 |
| FCAW | E309 | 0.05635 | 0.02121692 | 4 | 0.04243383 |

Note: Summary statistics are based only on observations with a single measurement.
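As a quick consistency check on the summary statistics above, the reported standard errors appear to follow the usual relation SE = SD / sqrt(n). The sketch below is only an illustration of that arithmetic using two SMAW Total Chromium entries from the table; the function name `standard_error` is a hypothetical helper, not part of the original analysis.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean from a sample standard deviation and size."""
    return sd / math.sqrt(n)


# SMAW, Total Chromium: (standard deviation, sample size, reported SE) from the table
smaw_rows = {
    "E308": (0.3168339, 9, 0.1056113),
    "E316": (0.2501037, 3, 0.1443974),
}

for rod_type, (sd, n, reported_se) in smaw_rows.items():
    computed = standard_error(sd, n)
    # Print the recomputed value next to the reported one for comparison
    print(rod_type, round(computed, 7), reported_se)
```

With only two or three observations in some cells, these standard errors are themselves very uncertain, which is part of the motivation for the later slides.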
Combining Rod Types
• Combine E308+E316 because of the similar technology and small sample size
• Sample Sizes:

Total Chromium (g/kg)

| Welding Process | E308 | E316 | E309 |
|---|---|---|---|
| SMAW | 9 Single Measurements (NSRP 0574); 2 Averages based on 3 tests (AP-42) | 3 Single Measurements (NSRP 0574); 5 Averages based on 3 tests (AP-42) | 7 Single Measurements (NSRP 0574, ESAB); 0 Averages |
| GMAW | 0 Single Measurements; 1 Average based on 3 tests (AP-42) | 3 Single Measurements (NSRP 0574); 1 Average based on 3 tests (AP-42) | 4 Single Measurements (NSRP 0574, ESAB); 0 Averages |
| FCAW | 0 Single Measurements; 0 Averages | 2 Single Measurements (NSRP 0574); 2 Averages based on 3 tests (AP-42) | 4 Single Measurements (NSRP 0574, ESAB); 0 Averages |

Chromium 6 (g/kg)

| Welding Process | E308 | E316 | E309 |
|---|---|---|---|
| SMAW | 9 Single Measurements (NSRP 0574); 1 Average based on 3 tests (AP-42) | 3 Single Measurements (NSRP 0574); 3 Averages based on 3 tests (AP-42); 1 Average based on 4 tests (CARB) | 7 Single Measurements (NSRP 0574, ESAB); 0 Averages |
| GMAW | 0 Single Measurements; 1 Average based on 3 tests (CARB) | 3 Single Measurements (NSRP 0574); 3 Averages based on 3 tests (AP-42, CARB); 1 Average based on 4 tests (CARB) | 4 Single Measurements (NSRP 0574, ESAB); 0 Averages |
| FCAW | 0 Single Measurements; 0 Averages | 3 Single Measurements (NSRP 0574); 1 Average based on 3 tests (AP-42) | 4 Single Measurements (NSRP 0574, ESAB); 1 Average based on 6 tests (CARB) |

Summary Statistics After Combining Data for Rod Types

Total Chromium (g/kg)

| Welding Process | Rod Type | Mean | SE of Mean | Sample Size | Standard Deviation |
|---|---|---|---|---|---|
| SMAW | E308+E316 | 0.7925 | 0.08409162 | 12 | 0.29130191 |
| SMAW | E309 | 0.63952857 | 0.08441454 | 7 | 0.2233399 |
| GMAW | E308+E316 | 1.0323333 | 0.1338336 | 3 | 0.2318067 |
| GMAW | E309 | 4.599625 | 1.350994 | 4 | 2.701988 |
| FCAW | E308+E316 | 2.45 | 0.59 | 2 | 0.834386 |
| FCAW | E309 | 2.219625 | 0.4905373 | 4 | 0.9810745 |

Chromium 6 (g/kg)

| Welding Process | Rod Type | Mean | SE of Mean | Sample Size | Standard Deviation |
|---|---|---|---|---|---|
| SMAW | E308+E316 | 0.157 | 0.01342367 | 12 | 0.04650097 |
| SMAW | E309 | 0.092064 | 0.02513234 | 7 | 0.0664939 |
| GMAW | E308+E316 | 0.02333333 | 0.01151468 | 3 | 0.019944 |
| GMAW | E309 | 0.047525 | 0.01384038 | 4 | 0.0276808 |
| FCAW | E308+E316 | 0.05586667 | 0.01138015 | 3 | 0.019711 |
| FCAW | E309 | 0.05635 | 0.02121692 | 4 | 0.0424338 |

Note: Summary statistics are based only on observations with a single measurement.

Traditional Approaches
• Assume Normality?
– Normality is not a good assumption for this data set at all
– Sample sizes are very small for certain combinations
– Bounds obtained assuming normality give meaningless results (e.g. negative bounds) when the data are not normally distributed
• 95% Confidence Intervals for the Mean:

| Welding Process | Total Chromium (g/kg), E308+E316 | Total Chromium (g/kg), E309 | Chromium 6 (g/kg), E308+E316 | Chromium 6 (g/kg), E309 |
|---|---|---|---|---|
| SMAW | (0.60742, 0.97758) | (0.432974, 0.846084) | (0.127455, 0.18655) | (0.030567, 0.15356) |
| GMAW | (0.45649, 1.60817) | (0.30016, 8.89909) | (-0.02621, 0.07288) | (0.003479, 0.091571) |
| FCAW | (-5.04667, 9.9467) | (0.65852, 3.78073) | (0.0069, 0.10483) | (-0.01117, 0.123872) |

Note: Summary statistics are based only on observations with a single measurement.

Traditional Approaches
• Transform the data to normality
– The optimal transformation for the Total Chromium data is different from the optimal transformation for the Chromium 6 data.
– It is hard to transform the confidence bounds back to the original scale (the mean of the log is not the same as the log of the mean!)
• Box-Cox Log-Likelihood Plots:

[Figure: Box-Cox Transformation Results. Two panels: Log-Likelihood vs. lambda for Chromium6 in WeldindSingles.Cr6, and Log-Likelihood vs. lambda for TotalChromium in WeldingSingles.TC. x-axis: lambda (-2 to 2); y-axis: Log-Likelihood.]

Traditional Approaches
• Weighted regression to incorporate the averages

$$Y_{ijk} = \mu_{ij} + \epsilon_{ijk}, \quad i = 1, \ldots, 3 \ (\text{processes}); \ j = 1, \ldots, 2 \ (\text{rod types}); \ k = 1, \ldots, n_{ij} \ (\text{data points for each combination}).$$

Therefore,

$$\bar{Y}_{ij} \sim \left(\mu_{ij}, \ \frac{\sigma^2}{n_{ij}}\right), \qquad n_{ij} \begin{cases} = 1 & \text{for single measurements} \\ > 1 & \text{for measurements based on several tests} \end{cases}$$

Traditional Approaches
• Weighted Regression
– Estimates have good properties (such as BLUE) in general, not only for normal data
– But the confidence bounds are sensitive to the normality assumption, especially when the sample sizes are small, as in our case.

Total Chromium (g/kg)

| Welding Process | Rod Type | Estimated Mean | SE of Mean | 95% Confidence Interval |
|---|---|---|---|---|
| SMAW | E308+E316 | 0.589727 | 0.167497 | (0.2503, 0.9291) |
| SMAW | E309 | 0.63953 | 0.363675 | (-0.09735, 1.3764) |
| GMAW | E308+E316 | 0.629444 | 0.320731 | (-0.02042, 1.27931) |
| GMAW | E309 | 4.599625 | 0.4810972 | (3.62483, 5.57442) |
| FCAW | E308+E316 | 1.16375 | 0.3401871 | (0.47447, 1.85303) |
| FCAW | E309 | 2.219625 | 0.481097 | (1.2448, 3.19442) |

Chromium 6 (g/kg)

| Welding Process | Rod Type | Estimated Mean | SE of Mean | 95% Confidence Interval |
|---|---|---|---|---|
| SMAW | E308+E316 | 0.20996 | 0.0156086 | (0.1783, 0.2416) |
| SMAW | E309 | 0.092064 | 0.0294975 | (0.0323, 0.15183) |
| GMAW | E308+E316 | 0.0175875 | 0.01951076 | (-0.0219, 0.05712) |
| GMAW | E309 | 0.047525 | 0.0390215 | (-0.03154, 0.12659) |
| FCAW | E308+E316 | 0.097933 | 0.03186 | (0.03337, 0.16248) |
| FCAW | E309 | 0.0313 | 0.0246793 | (-0.018705, 0.0813) |

Traditional Approaches
• Nonparametric Approaches?
– Nonparametric approaches usually use ranks. When only averages are reported, we completely lose the information regarding ranks. Therefore, reported averages cannot be incorporated into nonparametric approaches.
• Bootstrapping?
– Made popular by Bradley Efron in the 1980s
– Efron and Tibshirani (1993)
– Millard, S. P. and Neerchal, N. K. (2000)

Bootstrapping
• What is Bootstrapping?
– Resampling the observed data
– It is a simulation-type method in which the observed data (not a mathematical model) are repeatedly sampled to generate representative data sets
– The only indispensable assumption is that “observations are a random sample from a single population”
– There are some fixes available when the single population assumption is violated, as in our case.
– Can be implemented in quite a few software packages, e.g. S-Plus, SAS
– Millard and Neerchal (2000) gives S-Plus code

Bootstrapping - The Details

Data: X = (X1, X2, X3, …, Xn)    Statistic: T = T(X)
rep #1: X*1 = (X*1, X*2, X*3, …, X*n)    T*1 = T(X*1)
rep #2: X*2 = (X*1, X*2, X*3, …, X*n)    T*2 = T(X*2)
…
rep #B: X*B = (X*1, X*2, X*3, …, X*n)    T*B = T(X*B)

Bootstrapping inference is based on the distribution of the replicated values of the statistic: T*1, T*2, …, T*B. For example, the bootstrap 95% upper confidence bound based on T is given by the 95th percentile of the distribution of the T*s.

Bootstrapping Single Tests Data

Total Chromium (g/kg): Percentiles of the Bootstrap Distribution

| Welding Process | Rod Type | 2.5% | 5% | 95% | 97.5% |
|---|---|---|---|---|---|
| SMAW | E308+E316 | 0.6377529 | 0.6620375 | 0.9251633 | 0.9589048 |
| SMAW | E309 | 0.4696429 | 0.5008814 | 0.7565686 | 0.7693221 |
| GMAW | E308+E316 | 0.898 | 0.8983333 | 1.1663333 | 1.3 |
| GMAW | E309 | 2.146375 | 2.74425 | 6.4275 | 6.455 |
| FCAW | E308+E316 | 1.86 | 1.86 | 3.04 | 3.04 |
| FCAW | E309 | 1.288875 | 1.298875 | 2.83 | 2.84 |

Chromium 6 (g/kg): Percentiles of the Bootstrap Distribution

| Welding Process | Rod Type | 2.5% | 5% | 95% | 97.5% |
|---|---|---|---|---|---|
| SMAW | E308+E316 | 0.13258208 | 0.13667417 | 0.1789025 | 0.18285 |
| SMAW | E309 | 0.04525865 | 0.05157771 | 0.12786571 | 0.1341867 |
| GMAW | E308+E316 | 0.0074 | 0.01056667 | 0.0361 | 0.0457 |
| GMAW | E309 | 0.02091 | 0.0217375 | 0.0640075 | 0.064835 |
| FCAW | E308+E316 | 0.0335 | 0.04346667 | 0.06826667 | 0.0707 |
| FCAW | E309 | 0.02335 | 0.024925 | 0.08935 | 0.090625 |

Note: The 95% columns (highlighted in yellow on the original slide) represent the 95% upper confidence bound.

Bootstrapping the Combined Data
• Group the data points according to the number of tests used in reporting the average, within each welding process and rod type combination. Then bootstrap within each such group.
• For example, for GMAW and E316:

| Source of Data | Welding Process | Rod Type | NTESTS | Total Chromium (g/kg) | Chromium 6 (g/kg) |
|---|---|---|---|---|---|
| NSRP 0587 | GMAW | E316 | 1 | 0.898 | 0.0457 |
| NSRP 0587 | GMAW | E316 | 1 | 1.3 | 0.0169 |
| NSRP 0587 | GMAW | E316 | 1 | 0.899 | 0.0074 |
| CARB | GMAW | E316 | 3 | | 0.025 |
| AP-42 | GMAW | E316 | 3 | 0.532 | 0.007 |
| CARB | GMAW | E316 | 4 | | 0.0086 |

Note: Each color on the original slide represents a separate group, i.e. rows sharing the same NTESTS value.

Bootstrapping - Results

Total Chromium (g/kg): Percentiles of the Bootstrap Distribution

| Welding Process | Rod Type | 2.5% | 5% | 95% | 97.5% |
|---|---|---|---|---|---|
| SMAW | E308+E316 | 0.5268929 | 0.539621 | 0.665441 | 0.6758741 |
| SMAW | E309 | 0.47395 | 0.49778 | 0.6781325 | 0.6878395 |
| GMAW | E308+E316 | 0.5154444 | 0.5155556 | 0.7433333 | 0.7434444 |
| GMAW | E309 | 2.146375 | 2.74425 | 6.455 | 6.455 |
| FCAW | E308+E316 | 0.66375 | 0.66375 | 1.66375 | 1.66375 |
| FCAW | E309 | 1.188875 | 1.298875 | 2.83 | 2.84 |

Chromium 6 (g/kg): Percentiles of the Bootstrap Distribution

| Welding Process | Rod Type | 2.5% | 5% | 95% | 97.5% |
|---|---|---|---|---|---|
| SMAW | E308+E316 | 0.1615378 | 0.16393 | 0.2144114 | 0.2177658 |
| SMAW | E309 | 0.1382514 | 0.1435892 | 0.1974498 | 0.2029217 |
| GMAW | E308+E316 | 0.0104625 | 0.0114438 | 0.0231484 | 0.0237313 |
| GMAW | E309 | 0.02091 | 0.030215 | 0.0640075 | 0.0648557 |
| FCAW | E308+E316 | 0.08675 | 0.0917333 | 0.1041333 | 0.10535 |
| FCAW | E309 | 0.0181 | 0.01873 | 0.0445 | 0.04501 |

Note: The 95% columns (highlighted in yellow on the original slide) represent the 95% upper confidence bound.

Final Remarks
• The normality assumption is not appropriate for either the Total Chromium or the Chromium 6 data.
• The weighted regression model can incorporate the averages into the estimates.
• Bootstrapping the data seems to be a way to ensure that meaningful confidence bounds are obtained.
• More work is needed to study the robustness of the bootstrapping results with respect to some extreme values in the data.
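The grouped bootstrap described under "Bootstrapping the Combined Data" can be illustrated with a minimal Python sketch. It resamples with replacement within each NTESTS group and takes the 95th percentile of the replicated statistic as the upper confidence bound. Several pieces are assumptions of this sketch rather than details given in the slides: the statistic (a mean weighted by the number of tests), the simple index-based percentile rule, the number of replicates, and the names `grouped_bootstrap` and `percentile`. The data are the GMAW/E316 Chromium 6 values from the example table, so the result is not expected to match the E308+E316 rows of the results tables.

```python
import random
from collections import defaultdict


def grouped_bootstrap(data, n_reps=2000, seed=0):
    """Return sorted bootstrap replicates of an NTESTS-weighted mean.

    data: list of (ntests, value) pairs. Resampling is done with replacement
    within each ntests group. Weighting by ntests is an assumption of this
    sketch, not something stated in the slides.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for ntests, value in data:
        groups[ntests].append(value)
    total_weight = sum(ntests for ntests, _ in data)

    replicates = []
    for _ in range(n_reps):
        weighted_sum = 0.0
        for ntests, values in groups.items():
            # Resample this group only from its own values
            resample = rng.choices(values, k=len(values))
            weighted_sum += ntests * sum(resample)
        replicates.append(weighted_sum / total_weight)
    return sorted(replicates)


def percentile(sorted_values, q):
    """Simple index-based percentile of an already sorted list (an assumed rule)."""
    index = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[index]


# (NTESTS, Chromium 6 in g/kg) for GMAW and E316, taken from the example table
gmaw_e316_cr6 = [
    (1, 0.0457), (1, 0.0169), (1, 0.0074),
    (3, 0.025), (3, 0.007),
    (4, 0.0086),
]

reps = grouped_bootstrap(gmaw_e316_cr6)
upper_95 = percentile(reps, 0.95)  # bootstrap 95% upper confidence bound
print(upper_95)
```

Because every replicate is an average of observed values, the resulting bounds stay within the range of the data and cannot go negative, in contrast to the normal-theory intervals. A group with a single reported average (here NTESTS = 4) contributes the same value to every replicate, so it shifts the estimate without adding any bootstrap variability.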