Analysis of Chromium Emissions Data
Nagaraj Neerchal and Justin Newcomer, UMBC and OIAA/OEI
and
Mohamed Seregeldin, Office of Air Quality Planning and Standards,
EPA, RTP
Objective
• To develop a protocol (methodology) for
obtaining confidence bounds for the “Mean
Chromium Emissions” for each welding
process and rod type combination.
• Incorporate all the data, including the
averages, to the best of our ability.
About The Data
• Three Welding Processes
– GMAW, SMAW, FCAW
• Three Rod Types
– E308, E309, and E316
• Multiple Sources of Data
– Some report individual measurements
– Some report only averages without the original
observations.
– Reporting units vary; all are converted to g/kg
Summary Statistics
Total Chromium (g/kg)
Welding Process  Rod Type  Mean        (SE of Mean)  Sample Size  Standard Deviation
SMAW             E308      0.7804222   (0.1056113)   9            0.3168339
SMAW             E316      0.8287333   (0.1443974)   3            0.2501037
SMAW             E309      0.63952857  (0.08441454)  7            0.22333989
GMAW             E308      No Data     (N/A)         0            (N/A)
GMAW             E316      1.0323333   (0.1338336)   3            0.2318067
GMAW             E309      4.599625    (1.350994)    4            2.701988
FCAW             E308      No Data     (N/A)         0            (N/A)
FCAW             E316      2.45        (0.59)        2            0.834386
FCAW             E309      2.219625    (0.4905373)   4            0.9810745
Chromium 6 (g/kg)
Welding Process  Rod Type  Mean        (SE of Mean)  Sample Size  Standard Deviation
SMAW             E308      0.14595556  (0.01328651)  9            0.03985954
SMAW             E316      0.19013333  (0.03337715)  3            0.0578109
SMAW             E309      0.092064    (0.02513234)  7            0.06649392
GMAW             E308      No Data     (N/A)         0            (N/A)
GMAW             E316      0.02333333  (0.01151468)  3            0.019944
GMAW             E309      0.047525    (0.01384038)  4            0.02768077
FCAW             E308      No Data     (N/A)         0            (N/A)
FCAW             E316      0.05586667  (0.01138015)  3            0.019711
FCAW             E309      0.05635     (0.02121692)  4            0.04243383
Note: Summary statistics are based only on observations with a single measurement.
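The standard errors in parentheses follow from the other two columns as SE = SD / sqrt(n). A minimal Python sketch that checks this for two SMAW cells of the Total Chromium table; the function name `se_of_mean` is ours and does not come from the slides:

```python
import math

def se_of_mean(sd: float, n: int) -> float:
    """Standard error of the mean from a sample SD and a sample size."""
    return sd / math.sqrt(n)

# (SD, n, SE reported on the slide) for Total Chromium, SMAW
cells = {
    "SMAW/E308": (0.3168339, 9, 0.1056113),
    "SMAW/E309": (0.22333989, 7, 0.08441454),
}
for name, (sd, n, reported) in cells.items():
    print(f"{name}: computed SE = {se_of_mean(sd, n):.7f}, reported = {reported}")
```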
Combining Rod Types
• Combine E308+E316 because of the similar technology
and small sample size
• Sample Sizes:
Total Chromium (g/kg)
• SMAW
– E308: 9 Single Measurements (NSRP 0574); 2 Averages based on 3 tests (AP-42)
– E316: 3 Single Measurements (NSRP 0574); 5 Averages based on 3 tests (AP-42)
– E309: 7 Single Measurements (NSRP 0574, ESAB); 0 Averages
• GMAW
– E308: 0 Single Measurements; 1 Average based on 3 tests (AP-42)
– E316: 3 Single Measurements (NSRP 0574); 1 Average based on 3 tests (AP-42)
– E309: 4 Single Measurements (NSRP 0574, ESAB); 0 Averages
• FCAW
– E308: 0 Single Measurements; 0 Averages
– E316: 2 Single Measurements (NSRP 0574); 2 Averages based on 3 tests (AP-42)
– E309: 4 Single Measurements (NSRP 0574, ESAB); 0 Averages
Chromium 6 (g/kg)
• SMAW
– E308: 9 Single Measurements (NSRP 0574); 1 Average based on 3 tests (AP-42)
– E316: 3 Single Measurements (NSRP 0574); 3 Averages based on 3 tests (AP-42); 1 Average based on 4 tests (CARB)
– E309: 7 Single Measurements (NSRP 0574, ESAB); 0 Averages
• GMAW
– E308: 0 Single Measurements; 1 Average based on 3 tests (CARB)
– E316: 3 Single Measurements (NSRP 0574); 3 Averages based on 3 tests (AP-42, CARB); 1 Average based on 4 tests (CARB)
– E309: 4 Single Measurements (NSRP 0574, ESAB); 0 Averages
• FCAW
– E308: 0 Single Measurements; 0 Averages
– E316: 3 Single Measurements (NSRP 0574); 1 Average based on 3 tests (AP-42)
– E309: 4 Single Measurements (NSRP 0574, ESAB); 1 Average based on 6 tests (CARB)
Summary Statistics After Combining Data for Rod Types
Total Chromium (g/kg)
Welding Process  Rod Type     Mean        (SE of Mean)  Sample Size  Standard Deviation
SMAW             (E308+E316)  0.7925      (0.08409162)  12           0.29130191
SMAW             E309         0.63952857  (0.08441454)  7            0.2233399
GMAW             (E308+E316)  1.0323333   (0.1338336)   3            0.2318067
GMAW             E309         4.599625    (1.350994)    4            2.701988
FCAW             (E308+E316)  2.45        (0.59)        2            0.834386
FCAW             E309         2.219625    (0.4905373)   4            0.9810745
Chromium 6 (g/kg)
Welding Process  Rod Type     Mean        (SE of Mean)  Sample Size  Standard Deviation
SMAW             (E308+E316)  0.157       (0.01342367)  12           0.04650097
SMAW             E309         0.092064    (0.02513234)  7            0.0664939
GMAW             (E308+E316)  0.02333333  (0.01151468)  3            0.019944
GMAW             E309         0.047525    (0.01384038)  4            0.0276808
FCAW             (E308+E316)  0.05586667  (0.01138015)  3            0.019711
FCAW             E309         0.05635     (0.02121692)  4            0.0424338
Note: Summary statistics are based only on observations with a single measurement.
Traditional Approaches
• Assume Normality?
– Normality is a poor assumption for this data set
– Sample sizes are very small for certain combinations
– Bounds obtained assuming normality can be meaningless
(e.g. negative bounds) when the data are not normally distributed
• 95% Confidence Intervals for the Mean:
Welding Process  Rod Type     Total Chromium (g/kg)  Chromium 6 (g/kg)
SMAW             (E308+E316)  (0.60742, 0.97758)     (0.127455, 0.18655)
SMAW             E309         (0.432974, 0.846084)   (0.030567, 0.15356)
GMAW             (E308+E316)  (0.45649, 1.60817)     (-0.02621, 0.07288)
GMAW             E309         (0.30016, 8.89909)     (0.003479, 0.091571)
FCAW             (E308+E316)  (-5.04667, 9.9467)     (0.0069, 0.10483)
FCAW             E309         (0.65852, 3.78073)     (-0.01117, 0.123872)
Note: Intervals are based only on observations with a single measurement.
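The normal-theory intervals can be recomputed from the combined summary statistics as mean ± t(0.975, n−1) × SE. The sketch below is illustrative only: the t quantiles are standard table values that we hardcode, and the helper name `t_interval` is ours. It shows how a sample of size 2 or 3 produces a negative lower bound.

```python
# 97.5% Student-t quantiles from a standard t table, keyed by degrees of freedom.
T_975 = {1: 12.7062, 2: 4.30265, 3: 3.18245, 6: 2.44691, 11: 2.20099}

def t_interval(mean, se, n):
    """Two-sided 95% interval: mean +/- t_{0.975, n-1} * SE."""
    half_width = T_975[n - 1] * se
    return mean - half_width, mean + half_width

# (mean, SE of mean, sample size) taken from the combined summary tables
cases = {
    "Total Cr, SMAW, E308+E316": (0.7925, 0.08409162, 12),
    "Total Cr, FCAW, E308+E316": (2.45, 0.59, 2),
    "Cr6, GMAW, E308+E316": (0.02333333, 0.01151468, 3),
}
for label, (m, se, n) in cases.items():
    lo, hi = t_interval(m, se, n)
    print(f"{label}: ({lo:.5f}, {hi:.5f})")
```

With only two observations the multiplier is about 12.7, which is why the FCAW interval for Total Chromium is so wide.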
Traditional Approaches
• Transform the data to normality
– Optimal transformation for Total Chromium data is different from
optimal for Chrom6 data.
– It is hard to transform the confidence bounds back to the original
scale (the mean of the log is not the same as the log of the mean!)
• Box-Cox Log-Likelihood Plots:
[Figure: two panels. Left: "Box-Cox Transformation Results: Log-Likelihood vs. lambda for Chromium6 in WeldindSingles.Cr6". Right: "Box-Cox Transformation Results: Log-Likelihood vs. lambda for TotalChromium in WeldingSingles.TC". Horizontal axis: lambda, from -2 to 2; vertical axis: Log-Likelihood.]
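A curve of this kind can be sketched by evaluating the Box-Cox profile log-likelihood over a grid of lambda values. The code below is a hand-written illustration in plain Python, not the S-Plus routine behind the slides. The input is the six Chromium 6 values listed in the slides for GMAW with rod type E316, used purely as a toy data set (it mixes single measurements and averages). The last lines show the back-transformation problem: exp(mean of logs) is the geometric mean, which is smaller than the arithmetic mean whenever the values differ.

```python
import math

def boxcox_loglik(x, lam):
    """Box-Cox profile log-likelihood for lambda (additive constant dropped)."""
    n = len(x)
    if abs(lam) < 1e-12:
        y = [math.log(v) for v in x]              # lambda = 0 is the log transform
    else:
        y = [(v ** lam - 1.0) / lam for v in x]
    y_bar = sum(y) / n
    var_mle = sum((v - y_bar) ** 2 for v in y) / n
    # Gaussian likelihood term plus the Jacobian of the transformation
    return -0.5 * n * math.log(var_mle) + (lam - 1.0) * sum(math.log(v) for v in x)

# Toy input: Chromium 6 values (g/kg) for GMAW/E316 as listed in the slides
x = [0.0457, 0.0169, 0.0074, 0.025, 0.007, 0.0086]

grid = [i / 10 for i in range(-20, 21)]           # lambda from -2 to 2
curve = [(lam, boxcox_loglik(x, lam)) for lam in grid]
best_lam, best_ll = max(curve, key=lambda point: point[1])
print(f"lambda maximizing the log-likelihood on the grid: {best_lam}")

# The mean of the log is not the log of the mean
mean_x = sum(x) / len(x)
geo_mean = math.exp(sum(math.log(v) for v in x) / len(x))
print(f"arithmetic mean = {mean_x:.5f}, exp(mean of logs) = {geo_mean:.5f}")
```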
Traditional Approaches
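• Weighted regression to incorporate the
averages
\[ Y_{ijk} = \mu_{ij} + \epsilon_{ijk}, \quad i = 1, \ldots, 3 \text{ (processes)}; \; j = 1, \ldots, 2 \text{ (rod types)}; \; k = 1, \ldots, n_{ij} \text{ (data points for each combination)}. \]
Therefore,
\[ \bar{Y}_{ij} \sim \left( \mu_{ij}, \frac{\sigma^2}{n_{ij}} \right), \]
\[ n_{ij} = 1 \text{ for single measurements}, \qquad n_{ij} > 1 \text{ for measurements based on several tests}. \]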
Traditional Approaches
• Weighted Regression
– Estimates have good properties (such as being the best linear
unbiased estimator, BLUE) in general, not only for normal data
– But the confidence bounds are sensitive to the normality
assumption, especially when the sample sizes are small, as in
our case.
Total Chromium (g/kg)
Welding Process  Rod Type     Estimated Mean  (SE of Mean)  95% Confidence Interval
SMAW             (E308+E316)  0.589727        (0.167497)    (0.2503, 0.9291)
SMAW             E309         0.63953         (0.363675)    (-0.09735, 1.3764)
GMAW             (E308+E316)  0.629444        (0.320731)    (-0.02042, 1.27931)
GMAW             E309         4.599625        (0.4810972)   (3.62483, 5.57442)
FCAW             (E308+E316)  1.16375         (0.3401871)   (0.47447, 1.85303)
FCAW             E309         2.219625        (0.481097)    (1.2448, 3.19442)
Chromium 6 (g/kg)
Welding Process  Rod Type     Estimated Mean  (SE of Mean)  95% Confidence Interval
SMAW             (E308+E316)  0.20996         (0.0156086)   (0.1783, 0.2416)
SMAW             E309         0.092064        (0.0294975)   (0.0323, 0.15183)
GMAW             (E308+E316)  0.0175875       (0.01951076)  (-0.0219, 0.05712)
GMAW             E309         0.047525        (0.0390215)   (-0.03154, 0.12659)
FCAW             (E308+E316)  0.097933        (0.03186)     (0.03337, 0.16248)
FCAW             E309         0.0313          (0.0246793)   (-0.018705, 0.0813)
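In the weighted-regression model, a value based on n tests has variance sigma^2 / n, so within one process and rod type cell the weighted least-squares estimate is the mean weighted by number of tests. The sketch below illustrates that for a single cell. It uses the Total Chromium values listed in the slides for GMAW with rod type E316 only, so it is not expected to reproduce the tabulated (E308+E316) estimate. Estimating sigma^2 within the cell is our assumption; the slides do not say how the variance was estimated.

```python
import math

def weighted_cell_estimate(values, n_tests):
    """Weighted least-squares mean for one cell, weights = number of tests.

    Assumes Var(value_k) = sigma^2 / n_k.
    Returns (estimate, sigma2_hat, standard_error).
    """
    total_weight = sum(n_tests)
    mu_hat = sum(n * y for n, y in zip(n_tests, values)) / total_weight
    # Weighted residual variance with K - 1 degrees of freedom (K reported values)
    k = len(values)
    sigma2_hat = sum(n * (y - mu_hat) ** 2 for n, y in zip(n_tests, values)) / (k - 1)
    se = math.sqrt(sigma2_hat / total_weight)
    return mu_hat, sigma2_hat, se

# Total Chromium (g/kg), GMAW/E316: three single tests and one 3-test average
values = [0.898, 1.3, 0.899, 0.532]
n_tests = [1, 1, 1, 3]
mu_hat, sigma2_hat, se = weighted_cell_estimate(values, n_tests)
print(f"weighted mean = {mu_hat:.6f}, SE = {se:.6f}")
```

The 3-test average counts three times as much as each single measurement, which pulls the estimate below the plain mean of the three single tests.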
Traditional Approaches
• Nonparametric Approaches?
– Nonparametric approaches usually use ranks. When
only averages are reported we completely lose the
information regarding ranks. Therefore, means cannot
be incorporated into nonparametric approaches.
• Bootstrapping?
– Made popular by Bradley Efron in the 1980s
– Efron and Tibshirani (1993)
– Millard, S. P. and Neerchal, N. K. (2000)
Bootstrapping
• What is Bootstrapping?
– Resampling the observed data
– It is a simulation type of method where the observed data
(not a mathematical model) is repeatedly sampled for
generating representative data sets
– Only indispensable assumption is that “observations are
a random sample from a single population”
– There are some fixes available when the single
population assumption is violated as in our case.
– Can be implemented in quite a few software packages,
e.g. S-Plus, SAS
– Millard and Neerchal (2000) give S-Plus code
Bootstrapping - The Details
Data: $X = (X_1, X_2, X_3, \ldots, X_n)$
Statistic: $T = T(X)$
rep #1: $X^{*1} = (X^*_1, X^*_2, X^*_3, \ldots, X^*_n)$, $T^{*1} = T(X^{*1})$
rep #2: $X^{*2} = (X^*_1, X^*_2, X^*_3, \ldots, X^*_n)$, $T^{*2} = T(X^{*2})$
…
rep #B: $X^{*B} = (X^*_1, X^*_2, X^*_3, \ldots, X^*_n)$, $T^{*B} = T(X^{*B})$
Bootstrap inference is based on the distribution of the replicated values of
the statistic: $T^{*1}, T^{*2}, \ldots, T^{*B}$. For example, the bootstrap 95% upper confidence
bound based on $T$ is given by the 95th percentile of the distribution of the $T^*$ values.
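The resampling scheme above can be sketched in a few lines of Python. This is an illustrative percentile bootstrap, not the S-Plus code from Millard and Neerchal (2000). The helper names, the number of replicates (2000), and the seed are our choices. The input is the three single Total Chromium measurements listed in the slides for GMAW with rod type E316.

```python
import random
import statistics

def bootstrap_replicates(data, statistic, n_boot=2000, seed=0):
    """Return B replicated values T*1, ..., T*B of `statistic`."""
    rng = random.Random(seed)
    n = len(data)
    reps = []
    for _ in range(n_boot):
        # Draw n observations with replacement from the observed data
        resample = [data[rng.randrange(n)] for _ in range(n)]
        reps.append(statistic(resample))
    return reps

def percentile(sorted_vals, p):
    """Empirical percentile by linear interpolation, p in [0, 100]."""
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

# Total Chromium single measurements (g/kg) for GMAW/E316
x = [0.898, 1.3, 0.899]
reps = sorted(bootstrap_replicates(x, statistics.mean))
upper_95 = percentile(reps, 95)
print(f"bootstrap 95% upper confidence bound for the mean: {upper_95:.4f}")
```

Because every replicate is a mean of observed values, the bound always stays within the range of the data. That is why the bootstrap cannot produce the negative bounds seen with the normal-theory intervals.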
Bootstrapping Single Tests Data
Total Chromium (g/kg): Percentiles of the Bootstrap Distribution
Welding Process  Rod Type   2.5%       5%         95%        97.5%
SMAW             E308+E316  0.6377529  0.6620375  0.9251633  0.9589048
SMAW             E309       0.4696429  0.5008814  0.7565686  0.7693221
GMAW             E308+E316  0.898      0.8983333  1.1663333  1.3
GMAW             E309       2.146375   2.74425    6.4275     6.455
FCAW             E308+E316  1.86       1.86       3.04       3.04
FCAW             E309       1.288875   1.298875   2.83       2.84
Chromium 6 (g/kg): Percentiles of the Bootstrap Distribution
Welding Process  Rod Type   2.5%        5%          95%         97.5%
SMAW             E308+E316  0.13258208  0.13667417  0.1789025   0.18285
SMAW             E309       0.04525865  0.05157771  0.12786571  0.1341867
GMAW             E308+E316  0.0074      0.01056667  0.0361      0.0457
GMAW             E309       0.02091     0.0217375   0.0640075   0.064835
FCAW             E308+E316  0.0335      0.04346667  0.06826667  0.0707
FCAW             E309       0.02335     0.024925    0.08935     0.090625
Note: The columns highlighted in yellow on the original slide represent the 95% upper confidence bound, i.e. the 95% percentile columns.
Bootstrapping the Combined Data
• Group the data points according to the number of
tests used in reporting the average, within each
welding process and rod type combination. Then
bootstrap within each such group.
• e.g. for GMAW and E316:
Source of Data  Welding Process  RODTYPE  NTESTS  Total Chromium (g/kg)  Chromium 6 (g/kg)
NSRP 0587       GMAW             E316     1       0.898                  0.0457
NSRP 0587       GMAW             E316     1       1.3                    0.0169
NSRP 0587       GMAW             E316     1       0.899                  0.0074
CARB            GMAW             E316     3       (blank)                0.025
AP-42           GMAW             E316     3       0.532                  0.007
CARB            GMAW             E316     4       (blank)                0.0086
Note: On the original slide each color marks a separate group; the groups correspond to NTESTS = 1, 3, and 4.
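The grouping rule can be sketched as a stratified bootstrap: values are resampled with replacement only within their own NTESTS group, and the groups are then recombined into one statistic. The slides do not state which statistic is computed on the recombined data. The test-count-weighted mean used below is our assumption, chosen to match the weighted-regression weights. The function name, replicate count, and seed are also ours. The input is the Chromium 6 column for GMAW with rod type E316.

```python
import random

def stratified_bootstrap(groups, n_boot=2000, seed=0):
    """Bootstrap a test-count-weighted mean, resampling within each NTESTS group.

    `groups` maps NTESTS -> list of reported values based on that many tests.
    The weighted-mean statistic is an assumption, not taken from the slides.
    """
    rng = random.Random(seed)
    total_weight = sum(n_tests * len(vals) for n_tests, vals in groups.items())
    reps = []
    for _ in range(n_boot):
        weighted_sum = 0.0
        for n_tests, vals in groups.items():
            # Resample this group with replacement, keeping its size fixed
            resample = [rng.choice(vals) for _ in vals]
            weighted_sum += n_tests * sum(resample)
        reps.append(weighted_sum / total_weight)
    return sorted(reps)

# Chromium 6 (g/kg), GMAW/E316, grouped by number of tests
cr6_groups = {
    1: [0.0457, 0.0169, 0.0074],
    3: [0.025, 0.007],
    4: [0.0086],
}
reps = stratified_bootstrap(cr6_groups)
upper_95 = reps[int(0.95 * (len(reps) - 1))]
print(f"95% upper confidence bound (weighted mean): {upper_95:.5f}")
```

A group with a single value, such as the 4-test CARB average here, contributes the same value to every replicate, so it shifts the bootstrap distribution without adding any spread.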
Bootstrapping - Results
Total Chromium (g/kg): Percentiles of the Bootstrap Distribution
Welding Process  Rod Type   2.5%       5%         95%        97.5%
SMAW             E308+E316  0.5268929  0.539621   0.665441   0.6758741
SMAW             E309       0.47395    0.49778    0.6781325  0.6878395
GMAW             E308+E316  0.5154444  0.5155556  0.7433333  0.7434444
GMAW             E309       2.146375   2.74425    6.455      6.455
FCAW             E308+E316  0.66375    0.66375    1.66375    1.66375
FCAW             E309       1.188875   1.298875   2.83       2.84
Chromium 6 (g/kg): Percentiles of the Bootstrap Distribution
Welding Process  Rod Type   2.5%       5%         95%        97.5%
SMAW             E308+E316  0.1615378  0.16393    0.2144114  0.2177658
SMAW             E309       0.1382514  0.1435892  0.1974498  0.2029217
GMAW             E308+E316  0.0104625  0.0114438  0.0231484  0.0237313
GMAW             E309       0.02091    0.030215   0.0640075  0.0648557
FCAW             E308+E316  0.08675    0.0917333  0.1041333  0.10535
FCAW             E309       0.0181     0.01873    0.0445     0.04501
Note: The columns highlighted in yellow on the original slide represent the 95% upper confidence bound, i.e. the 95% percentile columns.
Final Remarks
• The normality assumption is not appropriate for
either the Total Chromium or the Chromium 6 data.
• A weighted regression model can incorporate
the averages into the estimates.
• Bootstrapping the data seems to be a way to
ensure that meaningful confidence bounds are
obtained.
• More work is needed to study the robustness of
the bootstrap results to some
extreme values in the data.