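Nonparametric Bootstrap Inference on the Characterization of a Response Surface
Robert Parody
Center for Quality and Applied Statistics
Rochester Institute of Technology
2009 QPRC, June 4, 2009

Presentation Outline
– Introduction
– Previous Work
– New Technique
– Example
– Simulation Study
– Conclusion and Future Research

Introduction
Response Surface Methodology (RSM)
– Identify the relationship between a set of k predictor variables $\boldsymbol{\xi} = (\xi_1, \ldots, \xi_k)'$ and the response variable $y$
– Typically, the goal of the experiment is to optimize $E(Y)$
Each natural variable $\xi_i$ is transformed into a coded variable $x_i$ by $x_i = (\xi_i - \xi_{i0})/s_i$.

The Model
A second-order model is fit to the data, represented by
$y_u = \beta_0 + \eta(\mathbf{x}_u) + \omega_u + \varepsilon_u$, with $\eta(\mathbf{x}) = \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k} \beta_{ii} x_i^2 + \sum_{i<j} \beta_{ij} x_i x_j$
where:
– $\beta_i$, $\beta_{ii}$, and $\beta_{ij}$ are unknown parameters
– $\varepsilon_u \sim F(0, \sigma^2)$, independent
– $\omega_u$ are other effects, such as block effects and covariates, which do not interact with the $x_i$'s
Equivalently, in matrix form, $\eta(\mathbf{x}) = \mathbf{x}'\boldsymbol{\beta} + \mathbf{x}'\mathbf{B}\mathbf{x}$, where
$\boldsymbol{\beta} = (\beta_1, \ldots, \beta_k)'$ and
$\mathbf{B} = \begin{pmatrix} \beta_{11} & \beta_{12}/2 & \cdots & \beta_{1k}/2 \\ & \beta_{22} & \cdots & \beta_{2k}/2 \\ & & \ddots & \vdots \\ \text{sym.} & & & \beta_{kk} \end{pmatrix}$

Background: Canonical Analysis
Rotate the axis system so that the new axes lie parallel to the principal axes of the surface.
– $\mathbf{P}$ is the matrix of eigenvectors of $\mathbf{B}$, where $\mathbf{P}'\mathbf{P} = \mathbf{P}\mathbf{P}' = \mathbf{I}$
– The rotated variables and parameters:
  – $\mathbf{w} = \mathbf{P}'\mathbf{x}$
  – $\boldsymbol{\theta} = \mathbf{P}'\boldsymbol{\beta}$
  – $\boldsymbol{\Lambda} = \mathbf{P}'\mathbf{B}\mathbf{P} = \mathrm{diag}(\lambda_i)$

Types of Surfaces
– If all $\lambda_i < 0$ ($> 0$), the stationary point is a maximizer (minimizer), and the contours are ellipsoidal.
– If the $\lambda_i$ have different signs, the stationary point is a minimax point, with complicated hyperbolic contours.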
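The canonical analysis can be illustrated with a small NumPy sketch. It builds $\mathbf{B}$ from fitted second-order coefficients, rotates it with its eigenvectors, and classifies the stationary point by the signs of the $\lambda_i$. The stationary point $\mathbf{x}_s = -\tfrac{1}{2}\mathbf{B}^{-1}\boldsymbol{\beta}$ comes from setting the gradient of $\mathbf{x}'\boldsymbol{\beta} + \mathbf{x}'\mathbf{B}\mathbf{x}$ to zero. That formula is standard, but it does not appear on the slides. The function names and the example coefficients are hypothetical, and the sketch assumes $\mathbf{B}$ is nonsingular.

```python
import numpy as np


def build_B(b_diag, b_cross):
    """Assemble the symmetric matrix B of the second-order model.

    b_diag:  pure quadratic coefficients beta_11, ..., beta_kk
    b_cross: dict mapping (i, j), i < j (0-based), to the cross-product coefficient beta_ij
    """
    B = np.diag(np.asarray(b_diag, dtype=float))
    for (i, j), b_ij in b_cross.items():
        # beta_ij is split evenly between the two symmetric off-diagonal cells
        B[i, j] = B[j, i] = b_ij / 2.0
    return B


def canonical_analysis(beta, B):
    """Eigen-decomposition of B, stationary point, and surface type."""
    lam, P = np.linalg.eigh(B)        # eigh returns ascending eigenvalues
    lam, P = lam[::-1], P[:, ::-1]    # relabel so lambda_1 is the largest
    # Solve beta + 2 B x = 0 for the stationary point (B assumed nonsingular)
    x_s = -0.5 * np.linalg.solve(B, beta)
    if np.all(lam < 0):
        kind = "maximum"
    elif np.all(lam > 0):
        kind = "minimum"
    else:
        kind = "minimax"              # mixed signs; no eigenvalue is zero if B is nonsingular
    return lam, P, x_s, kind


# Hypothetical k = 2 surface with beta_11 = -2, beta_22 = -1, beta_12 = 1
beta = np.array([1.0, 0.5])
B = build_B([-2.0, -1.0], {(0, 1): 1.0})
lam, P, x_s, kind = canonical_analysis(beta, B)
w_s = P.T @ x_s      # stationary point in the rotated coordinates w = P'x
theta = P.T @ beta   # rotated linear coefficients theta = P'beta
```

Here both eigenvalues are negative, so the surface has a maximum. In the rotated system the stationary point satisfies $\theta_i + 2\lambda_i w_{s,i} = 0$ coordinate by coordinate.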
Standard Errors for the $\lambda_i$
– Carter, Chinchilli and Campbell (1990) found standard errors and covariances for the $\lambda_i$ by way of the delta method
– Bisgaard and Ankenman (1996) simplified this with the Double Linear Regression (DLR) method

Previous Work
Edwards and Berry (1987)
– Simulated a critical point for a prespecified set of linear combinations of the parameters
– The natural pivotal quantity for constructing simultaneous intervals for these linear combinations of the parameters is
$Q = \max_{1 \le j \le r} \frac{|\mathbf{c}_j'(\hat{\boldsymbol{\gamma}} - \boldsymbol{\gamma})|}{\hat{\sigma}\,(\mathbf{c}_j'\mathbf{V}\mathbf{c}_j)^{1/2}}$

Shortcoming
The technique on the previous slide is valid only when:
– The errors are i.i.d. normal with constant variance
– The set of linear combinations of interest is prespecified

Research Goal
Employ a nonparametric bootstrap based on a pivotal quantity to extend the previously mentioned work to situations where:
1. The set of linear combinations of interest is not prespecified
2. The error distribution assumption is relaxed

Bootstrap Idea
– Resample from the original data, either directly or via a fitted model, to create replicate datasets
– Use these replicate datasets to create distributions for the parameters of interest
– Consider the nonparametric version, which uses the empirical distribution

Empirical Distribution
The empirical distribution is the one in which equal probability $1/N$ is given to each sample value $y_i$. The corresponding estimate of the cdf $F$ is the empirical distribution function (EDF) $\hat{F}$, defined as the sample proportion
$\hat{F}(y) = \frac{\#\{y_i \le y\}}{N}$

New Technique
The pivotal quantity for simultaneous inference on the $\lambda_j$:
$Q = \max_{1 \le j \le k} \frac{|\hat{\lambda}_j - \lambda_j|}{s_{\hat{\lambda}_j}}$

Bootstrap Equivalent
Replace the parameters with their estimates, and the estimates with their bootstrap estimates, to get
$Q^* = \max_{1 \le j \le k} \frac{|\hat{\lambda}_j^* - \hat{\lambda}_j|}{s_{\hat{\lambda}_j^*}}$

Bootstrap Parameter Estimation
– Find the model fits
– Resample from the modified residuals N times with replacement
– Add these values to the fits and use the results as observations
– Fit the new model and determine the bootstrap parameter estimates
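The EDF definition can be written directly in NumPy. The helper name `edf` is hypothetical. Drawing a sample from $\hat{F}$ is the same as drawing from the data with replacement, and that is the operation the nonparametric bootstrap repeats.

```python
import numpy as np


def edf(sample):
    """Return the empirical distribution function F_hat of a 1-D sample.

    F_hat(y) = #{y_i <= y} / N, i.e. each observation carries probability 1/N.
    """
    data = np.sort(np.asarray(sample, dtype=float))
    n = data.size

    def F_hat(y):
        # searchsorted with side="right" counts the observations <= y
        return np.searchsorted(data, y, side="right") / n

    return F_hat


sample = np.array([2.0, 3.5, 1.0, 3.5, 5.0])
F_hat = edf(sample)
# Sampling from F_hat: draw N values from the data with replacement
rng = np.random.default_rng(0)
resample = rng.choice(sample, size=sample.size, replace=True)
```

For this sample, `F_hat(3.5)` is 0.8, because four of the five values are at most 3.5. Tied values each keep their own $1/N$ mass.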
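The bootstrap steps above can be sketched end to end in NumPy. The sketch adds the modified residuals $d_i = e_i/\sqrt{1-h_{ii}}$ and the $(m+1)(1-\alpha)$th order-statistic rule given in the slides. It is a minimal illustration under stated assumptions, not the author's implementation:
– The function names are hypothetical.
– The model is fit by ordinary least squares, with no block or covariate terms $\omega_u$.
– The standard errors $s_{\hat\lambda_j}$ use a delta-method gradient in the spirit of Carter, Chinchilli and Campbell (1990): $p_{ij}^2$ for $\beta_{ii}$ and $p_{ij}p_{lj}$ for $\beta_{il}$.
– Bootstrap eigenvalues are matched to the original ones by their descending rank.

```python
import itertools
import numpy as np


def second_order_design(x):
    """Model matrix with columns 1, x_i, x_i^2 and x_i*x_j (i < j)."""
    n, k = x.shape
    pairs = list(itertools.combinations(range(k), 2))
    cols = [np.ones(n)]
    cols += [x[:, i] for i in range(k)]
    cols += [x[:, i] ** 2 for i in range(k)]
    cols += [x[:, i] * x[:, j] for i, j in pairs]
    return np.column_stack(cols), pairs


def ols(X, y):
    """Least-squares coefficients, residuals and estimated covariance s^2 (X'X)^-1."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, p = X.shape
    s2 = resid @ resid / (n - p)
    return coef, resid, s2 * np.linalg.inv(X.T @ X)


def eigen_with_se(coef, cov, k, pairs):
    """Eigenvalues of B in descending order with delta-method standard errors."""
    B = np.diag(coef[1 + k:1 + 2 * k])
    for c, (i, j) in zip(coef[1 + 2 * k:], pairs):
        B[i, j] = B[j, i] = c / 2.0
    lam, P = np.linalg.eigh(B)
    lam, P = lam[::-1], P[:, ::-1]
    # d(lambda_j)/d(beta_ii) = p_ij^2 and d(lambda_j)/d(beta_il) = p_ij * p_lj
    grads = np.array([
        np.concatenate([P[:, j] ** 2, [P[i, j] * P[l, j] for i, l in pairs]])
        for j in range(k)
    ])
    quad_cov = cov[1 + k:, 1 + k:]  # covariance of (beta_11..beta_kk, beta_ij)
    se = np.sqrt(np.einsum("ja,ab,jb->j", grads, quad_cov, grads))
    return lam, se


def bootstrap_intervals(x, y, alpha=0.05, m_plus_1=80000, seed=0):
    """Simultaneous intervals for the eigenvalues of B via a residual bootstrap."""
    rng = np.random.default_rng(seed)
    k = x.shape[1]
    X, pairs = second_order_design(x)
    coef, resid, cov = ols(X, y)
    lam_hat, se_hat = eigen_with_se(coef, cov, k, pairs)
    fits = X @ coef
    # Leverages h_ii: diagonal of the hat matrix X (X'X)^-1 X'
    h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    d = resid / np.sqrt(1.0 - h)  # modified residuals (not re-centred here)
    m = m_plus_1 - 1
    q_star = np.empty(m)
    for r in range(m):
        # Resample N modified residuals with replacement and add them to the fits
        y_star = fits + rng.choice(d, size=d.size, replace=True)
        coef_s, _, cov_s = ols(X, y_star)
        lam_s, se_s = eigen_with_se(coef_s, cov_s, k, pairs)
        q_star[r] = np.max(np.abs(lam_s - lam_hat) / se_s)
    # (m+1)(1-alpha)-th order statistic, counting from 1
    idx = int(round(m_plus_1 * (1 - alpha)))
    q_crit = np.sort(q_star)[idx - 1]
    return lam_hat, lam_hat - q_crit * se_hat, lam_hat + q_crit * se_hat, q_crit


# Illustrative run on simulated data (hypothetical true surface, k = 2)
rng = np.random.default_rng(1)
x = rng.uniform(-1.5, 1.5, size=(30, 2))
y = (10 + x[:, 0] + 0.5 * x[:, 1] - 2 * x[:, 0] ** 2 - x[:, 1] ** 2
     + x[:, 0] * x[:, 1] + rng.normal(scale=0.1, size=30))
lam_hat, lower, upper, q_crit = bootstrap_intervals(x, y, m_plus_1=2000)
```

Each replicate recomputes both $\hat\lambda_j^*$ and $s_{\hat\lambda_j^*}$, so the denominator of $Q^*$ changes from replicate to replicate, as the pivot requires. The run uses $(m+1) = 2000$ only to keep it short; the slides use 80000.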
An Adjustment
– We usually at least assume that the errors are i.i.d. from a distribution with mean 0 and constant variance $\sigma^2$
– The residuals, on the other hand, have mean 0 but variance $\sigma^2(1 - h_{ii})$, which changes with the leverage $h_{ii}$
– So the modified residuals become
$d_i = \frac{e_i}{\sqrt{1 - h_{ii}}}$

Critical Point Procedure
– Create nonparametric bootstrap estimates for the unknown parameters in $Q^*$
– Find $Q^*$ by maximizing over the $j$ elements
– Repeat this process for a large number $m$ of bootstrap samples and take the $(m+1)(1-\alpha)$th order statistic

Bootstrap Simulation Size
Edwards and Berry (1987) showed that, for $(m+1) = 80000$, the conditional coverage probability of 95% simulation-based bounds lies within ±0.002 of its nominal value for 99% of the generations.

Example
Chemical process experiment with $k = 5$ from Box (1954)
Goal: maximize percentage yield

Parameter Estimates
Parameter      Estimate
$\lambda_1$    -0.041
$\lambda_2$    -0.400
$\lambda_3$    -1.782
$\lambda_4$    -2.625
$\lambda_5$    -4.461

Critical Point
Using $\alpha = 0.05$ and $(m+1) = 80000$, we get $Q_{0.05} = 2.937$. The simultaneous intervals then take the form $\hat{\lambda}_j \pm Q_{0.05}\, s_{\hat{\lambda}_j}$.

Estimates and 95% Simultaneous Confidence Intervals
Parameter      LCL       Estimate   UCL
$\lambda_1$    -0.741    -0.041      0.660
$\lambda_2$    -0.840    -0.400      0.045
$\lambda_3$    -2.553    -1.782     -1.011
$\lambda_4$    -3.332    -2.625     -1.918
$\lambda_5$    -5.205    -4.461     -3.717

Relative Efficiency
– Comparison of critical points: for the example, the simulation method would need only about 88% of the sample size required by traditional simultaneous methods
– Computer time: approximately 2 minutes on an Intel Core 2 Duo computer

Simulation Study
– 10 critical points were created
– For each critical point, 10000 confidence intervals were created by bootstrapping the residuals
– This was done 100 times for each critical point

Simulation Results

Conclusions
– The new technique yields tighter bounds
– It works for linear combinations that are not prespecified
– It relaxes the normality assumption on the error terms
– The simulation study yields adequate coverage

Future Research
– Relax the model assumptions further to include nonhomogeneous error variances
– Apply the approach to other situations where the combinations cannot be prespecified, such as ridge analysis
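The Bootstrap Simulation Size figure attributed to Edwards and Berry (1987) can be checked approximately. Let $r = (m+1)(1-\alpha)$. The conditional coverage of the $r$th order statistic of $m$ simulated values follows a Beta$(r, (m+1)\alpha)$ distribution, with mean $1-\alpha$ and variance $\alpha(1-\alpha)/(m+2)$. The sketch below replaces that Beta distribution with a normal approximation. This simplification belongs to the sketch and is not necessarily the calculation Edwards and Berry used. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def coverage_half_width(m_plus_1, alpha=0.05, prob=0.99):
    """Half-width of the central `prob` range of the conditional coverage.

    Uses a normal approximation to the Beta((m+1)(1-alpha), (m+1)alpha)
    distribution, whose variance is alpha(1-alpha)/(m+2).
    """
    sd = math.sqrt(alpha * (1 - alpha) / (m_plus_1 + 1))  # m + 2 = (m+1) + 1
    z = NormalDist().inv_cdf(0.5 + prob / 2)               # about 2.576 for prob = 0.99
    return z * sd


half_width = coverage_half_width(80000)
print(f"+/- {half_width:.4f}")  # prints "+/- 0.0020"
```

With $(m+1) = 80000$ the half-width is about 0.00198, consistent with the ±0.002 bound quoted for 99% of the generations.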