Download Analysis of Regression Confidence Intervals and Bayesian

Auxiliary Material for Manuscript 2011WR011289: Analysis of Regression Confidence Intervals and Bayesian Credible Intervals for Uncertainty Quantification¹

Dan Lu², Ming Ye², Mary C. Hill³

² Department of Scientific Computing, Florida State University, Tallahassee, Florida, USA
³ U.S. Geological Survey, Boulder, Colorado, USA

March 6, 2012

¹ Lu, D., M. Ye, and M. C. Hill (2012), Analysis of regression confidence intervals and Bayesian credible intervals for uncertainty quantification, Water Resour. Res., doi:10.1029/2011WR011289.

Appendix A: Derivation of Posterior Parameter Distribution for Noninformative Prior

Consider a linear model $y = X\beta + \varepsilon$ with $n$ observations and $p$ parameters, where $\varepsilon \sim N_n(0, C)$ and $C = \sigma^2 \omega^{-1}$. Assume that $\beta$ and $\sigma$ have independent prior distributions and denote $\theta = (\beta, \sigma)$, so that $p(\theta) = p(\beta)\,p(\sigma)$. With Jeffreys' noninformative priors, i.e., $p(\beta) \propto \text{constant}$ and $p(\sigma) \propto 1/\sigma$, Bayes' theorem $p(\theta \mid y) = p(y \mid \theta)\,p(\theta)/p(y)$ with the Gaussian likelihood function

$$p(y \mid \theta) = \frac{|\omega|^{1/2}}{(2\pi\sigma^2)^{n/2}} \exp\left[-\frac{1}{2\sigma^2}(y - X\beta)^T \omega (y - X\beta)\right] \tag{A1}$$

gives

$$p(\beta, \sigma \mid y) \propto p(y \mid \theta)\,\frac{1}{\sigma} \propto \sigma^{-(n+1)} \exp\left[-\frac{1}{2\sigma^2}(y - X\beta)^T \omega (y - X\beta)\right]. \tag{A2}$$

Since $\omega$ is positive-definite, there exists an $n \times n$ nonsingular matrix $K$ such that $\omega = K^T K$; therefore,

$$\begin{aligned}
(y - X\beta)^T \omega (y - X\beta) &= (y - X\beta)^T K^T K (y - X\beta) = \|Ky - KX\beta\|^2 \\
&= \|Ky - KX\hat{b}\|^2 + \|KX\hat{b} - KX\beta\|^2 \\
&= (y - X\hat{b})^T \omega (y - X\hat{b}) + \|KX\hat{b} - KX\beta\|^2 \\
&= (n - p)s^2 + (\beta - \hat{b})^T X^T \omega X (\beta - \hat{b}),
\end{aligned} \tag{A3}$$

where $\|\cdot\|$ denotes the L2 norm of a vector, $\hat{b}$ is the least-squares estimate $\hat{b} = (X^T \omega X)^{-1} X^T \omega y$, and $s^2$ is an unbiased estimate of $\sigma^2$, $s^2 = (y - X\hat{b})^T \omega (y - X\hat{b})/(n - p)$. The split of $\|Ky - KX\beta\|^2$ into two squared norms holds because the cross term $2(y - X\hat{b})^T \omega X(\hat{b} - \beta)$ vanishes by the normal equations $X^T \omega (y - X\hat{b}) = 0$. Substituting (A3) into (A2) leads to

$$p(\beta, \sigma \mid y) \propto \sigma^{-(n+1)} \exp\left\{-\frac{1}{2\sigma^2}\left[(n - p)s^2 + (\beta - \hat{b})^T X^T \omega X (\beta - \hat{b})\right]\right\}. \tag{A4}$$

The distribution of $\beta$ can be obtained by integrating (A2) with respect to $\sigma$.
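The decomposition (A3), on which (A4) and the later marginal posterior rest, can be checked numerically. The following is a minimal sketch assuming NumPy and a synthetic problem; the design matrix, the weight matrix $\omega$, the trial vector, and the helper name `wls_fit` are hypothetical illustrations, not part of the original study.

```python
import numpy as np

def wls_fit(X, y, omega):
    """Least-squares estimate b_hat and unbiased s^2 as defined under (A3),
    for y = X beta + eps with Cov(eps) = sigma^2 * inv(omega)."""
    n, p = X.shape
    XtW = X.T @ omega
    b_hat = np.linalg.solve(XtW @ X, XtW @ y)
    r = y - X @ b_hat
    s2 = float(r @ omega @ r) / (n - p)
    return b_hat, s2

rng = np.random.default_rng(0)
n, p = 12, 3
X = rng.normal(size=(n, p))
M = rng.normal(size=(n, n))
omega = M.T @ M + n * np.eye(n)          # hypothetical positive-definite weight matrix
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

b_hat, s2 = wls_fit(X, y, omega)

# Normal equations: X^T omega (y - X b_hat) is ~0, which is why the cross term vanishes.
normal_eq = X.T @ omega @ (y - X @ b_hat)

beta = np.array([0.3, 0.1, -1.0])        # an arbitrary trial parameter vector
r = y - X @ beta
lhs = float(r @ omega @ r)               # left side of (A3)
d = beta - b_hat
rhs = (n - p) * s2 + float(d @ X.T @ omega @ X @ d)   # right side of (A3)
print(np.isclose(lhs, rhs))
```

Any other trial vector `beta` should give the same agreement, since (A3) is an algebraic identity rather than an approximation.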
By analogy with the gamma-function integral

$$\int_0^\infty x^{-(n+1)} \exp\left(-\frac{a}{x^2}\right) dx = \frac{1}{2}\,a^{-n/2}\,\Gamma\!\left(\frac{n}{2}\right), \tag{A5}$$

the posterior distribution of $\beta$ is

$$p(\beta \mid y) = \int_0^\infty p(\beta, \sigma \mid y)\, d\sigma \propto \left[(y - X\beta)^T \omega (y - X\beta)\right]^{-n/2}. \tag{A6}$$

Substituting (A3) into (A6) leads to

$$p(\beta \mid y) \propto \left[1 + \frac{(\beta - \hat{b})^T X^T \omega X (\beta - \hat{b})}{(n - p)s^2}\right]^{-n/2}. \tag{A7}$$

Equation (A7) is a special case of the $p$-dimensional multivariate t-distribution

$$p(t) = \frac{\Gamma\left(\frac{v + p}{2}\right)}{\Gamma\left(\frac{v}{2}\right)(v\pi)^{p/2}|\Sigma|^{1/2}} \left[1 + \frac{1}{v}(t - \mu)^T \Sigma^{-1} (t - \mu)\right]^{-(v + p)/2} \tag{A8}$$

with $v = n - p$, $\Sigma = s^2 (X^T \omega X)^{-1}$, and $\mu = \hat{b}$. If $\sigma$ is known, it follows from (A4) that

$$p(\beta \mid y) \propto \exp\left[-\frac{1}{2\sigma^2}(\beta - \hat{b})^T X^T \omega X (\beta - \hat{b})\right] = \exp\left[-\frac{1}{2}(\beta - \hat{b})^T X^T C^{-1} X (\beta - \hat{b})\right], \tag{A9}$$

which is a multivariate Gaussian distribution.

Appendix B: Equivalence of Credible and Confidence Intervals for Consistent Prior Information

In Bayesian analysis, for a linear model $y = X\beta + \varepsilon$ with $\varepsilon \sim N_n(0, C)$, where the variance-covariance matrix $C$ is known, the conjugate prior distribution of the parameters $\beta$ is assumed to be

$$p(\beta) \sim N_p(\beta_p, C_p), \tag{B1}$$

and the likelihood function is

$$p(y \mid \beta) = \frac{1}{(2\pi)^{n/2}|C|^{1/2}} \exp\left[-\frac{1}{2}(y - X\beta)^T C^{-1} (y - X\beta)\right]. \tag{B2}$$

Then, based on Bayes' theorem $p(\beta \mid y) = p(y \mid \beta)\,p(\beta)/p(y)$, the posterior distribution of the parameters is

$$\begin{aligned}
p(\beta \mid y) &\propto \exp\left[-\frac{1}{2}(y - X\beta)^T C^{-1} (y - X\beta) - \frac{1}{2}(\beta - \beta_p)^T C_p^{-1} (\beta - \beta_p)\right] \\
&= \exp\left\{-\frac{1}{2}\left[y^T C^{-1} y - y^T C^{-1} X\beta - \beta^T X^T C^{-1} y + \beta^T X^T C^{-1} X\beta + \beta^T C_p^{-1}\beta - \beta^T C_p^{-1}\beta_p - \beta_p^T C_p^{-1}\beta + \beta_p^T C_p^{-1}\beta_p\right]\right\} \\
&= \exp\left\{-\frac{1}{2}\left[\beta^T (X^T C^{-1} X + C_p^{-1})\beta - (y^T C^{-1} X + \beta_p^T C_p^{-1})\beta - \beta^T (X^T C^{-1} y + C_p^{-1}\beta_p) + (y^T C^{-1} y + \beta_p^T C_p^{-1}\beta_p)\right]\right\}.
\end{aligned} \tag{B3}$$

Because the covariance matrices are symmetric, the terms in the square brackets in the last line of (B3) can be rewritten by completing the square:

$$\beta^T A\beta - B^T\beta - \beta^T B + G = (\beta - A^{-1}B)^T A (\beta - A^{-1}B) + (G - B^T A^{-1} B), \tag{B4}$$

where $A = X^T C^{-1} X + C_p^{-1}$, $B = X^T C^{-1} y + C_p^{-1}\beta_p$, and $G = y^T C^{-1} y + \beta_p^T C_p^{-1}\beta_p$.
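The completing-the-square step (B4) can be verified on random inputs. This is a minimal NumPy sketch assuming synthetic diagonal covariances $C$ and $C_p$ and a hypothetical prior mean; the helper names `direct`, `bracket`, and `completed` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 8, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
C = np.diag(rng.uniform(0.5, 2.0, size=n))    # hypothetical observation-error covariance
Cp = np.diag(rng.uniform(1.0, 3.0, size=p))   # hypothetical prior covariance
beta_p = rng.normal(size=p)                   # hypothetical prior mean

Ci, Cpi = np.linalg.inv(C), np.linalg.inv(Cp)
A = X.T @ Ci @ X + Cpi                        # A in (B4)
B = X.T @ Ci @ y + Cpi @ beta_p               # B in (B4)
G = y @ Ci @ y + beta_p @ Cpi @ beta_p        # G in (B4)

def direct(beta):
    """-2 times the exponent in the first line of (B3): the two original quadratic forms."""
    r = y - X @ beta
    d = beta - beta_p
    return r @ Ci @ r + d @ Cpi @ d

def bracket(beta):
    """Left side of (B4); B^T beta and beta^T B are equal scalars, hence the factor 2."""
    return beta @ A @ beta - 2.0 * (B @ beta) + G

def completed(beta):
    """Right side of (B4): a quadratic centered at A^{-1} B plus a beta-free constant."""
    m = np.linalg.solve(A, B)
    return (beta - m) @ A @ (beta - m) + (G - B @ m)

beta = rng.normal(size=p)   # any trial parameter vector
print(np.isclose(direct(beta), bracket(beta)), np.isclose(bracket(beta), completed(beta)))
```

Because the constant $G - B^T A^{-1} B$ does not involve $\beta$, only the first term of `completed` shapes the posterior, which is the observation used next.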
Because $G - B^T A^{-1} B$ does not depend on $\beta$ and can be treated as a constant, (B3) simplifies to

$$p(\beta \mid y) \propto \exp\left[-\frac{1}{2}(\beta - \beta'_p)^T (X^T C^{-1} X + C_p^{-1})(\beta - \beta'_p)\right], \tag{B5}$$

where $\beta'_p = (X^T C^{-1} X + C_p^{-1})^{-1}(X^T C^{-1} y + C_p^{-1}\beta_p)$. Thus the posterior distribution is

$$p(\beta \mid y) \sim N_p(\beta'_p, C'_p), \tag{B6}$$

with

$$C'_p = (X^T C^{-1} X + C_p^{-1})^{-1}. \tag{B7}$$

Correspondingly, the linear prediction $g(\beta) = Z^T\beta$, where $Z$ is the $p$-vector of sensitivities of the prediction to the parameters, is distributed as

$$g(\beta) \sim N\left(g(\beta'_p),\; Z^T (X^T C^{-1} X + C_p^{-1})^{-1} Z\right), \tag{B8}$$

and its $(1 - \alpha)100\%$ credible interval is

$$g(\beta'_p) \pm z_{1-\alpha/2}\left[Z^T (X^T C^{-1} X + C_p^{-1})^{-1} Z\right]^{1/2}. \tag{B9}$$

A similar procedure can be applied to linearized nonlinear models, as shown in McLaughlin and Townley [1996].

In classical regression, suppose the observations $y$ can be simulated by a linear model $y = X\beta + \varepsilon$ with $\varepsilon \sim N_n(0, C)$, and prior information $y_\beta$ on the parameters $\beta$ is available and represented as $y_\beta = \beta + \varepsilon_\beta$ with errors $\varepsilon_\beta \sim N_{npri}(0, C_\beta)$. Combining the two kinds of information gives the augmented linear model [Schweppe, 1973, p. 104; Cooley, 1983; Hill and Tiedeman, 2007]

$$\tilde{y} = \tilde{X}\beta + \tilde{\varepsilon}, \tag{B10}$$

where $\tilde{y} = \begin{bmatrix} y \\ y_\beta \end{bmatrix}$ is a vector of $n$ observations $y$ and $npri$ items of prior information $y_\beta$; $\beta$ is a vector of $p$ unknown true model parameters; $\tilde{X} = \begin{bmatrix} X \\ I \end{bmatrix}$ is an $(n + npri) \times p$ coefficient matrix, with $X$ representing the sensitivity of the observations to the parameters and the identity matrix $I$ representing the sensitivity of the prior information to the parameters; and $\tilde{\varepsilon} = \begin{bmatrix} \varepsilon \\ \varepsilon_\beta \end{bmatrix}$ is a vector of $(n + npri)$ errors, with $\varepsilon$ representing errors of the observations and $\varepsilon_\beta$ representing errors of the prior information. Assuming that the errors in the prior information are uncorrelated with the errors in the observations, we get

$$\tilde{\varepsilon} \sim N_{n+npri}(0, \tilde{C}), \quad \tilde{C} = \begin{bmatrix} C & 0 \\ 0 & C_\beta \end{bmatrix}, \tag{B11}$$

where $E(\varepsilon\varepsilon^T) = C$, $E(\varepsilon_\beta\varepsilon_\beta^T) = C_\beta$, and $E(\varepsilon\varepsilon_\beta^T) = 0$.
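The posterior mean (B5) and covariance (B7), the credible interval (B9), and generalized least squares on the augmented model (B10)-(B11) can each be computed in a few lines. The sketch below assumes NumPy, uses Python's `statistics.NormalDist` for $z_{1-\alpha/2}$, works on synthetic data, and assumes prior information on every parameter so that the lower block of $\tilde{X}$ is a $p \times p$ identity; the function names and all numerical values are hypothetical. With $y_\beta$ set to $\beta_p$ and $C_\beta$ set to $C_p$, the two computations should agree to rounding error.

```python
import numpy as np
from statistics import NormalDist

def bayes_posterior(X, y, C, beta_p, Cp):
    """Posterior mean (B5) and covariance (B7) under the conjugate Gaussian prior."""
    Ci, Cpi = np.linalg.inv(C), np.linalg.inv(Cp)
    cov = np.linalg.inv(X.T @ Ci @ X + Cpi)
    mean = cov @ (X.T @ Ci @ y + Cpi @ beta_p)
    return mean, cov

def augmented_gls(X, y, C, y_beta, C_beta):
    """GLS estimate and covariance for the augmented model (B10)-(B11).
    Assumes prior information on every parameter, so the lower block of X~ is I."""
    n, p = X.shape
    X_aug = np.vstack([X, np.eye(p)])
    y_aug = np.concatenate([y, y_beta])
    C_aug = np.block([[C, np.zeros((n, p))], [np.zeros((p, n)), C_beta]])
    W = np.linalg.inv(C_aug)
    cov = np.linalg.inv(X_aug.T @ W @ X_aug)
    b_hat = cov @ (X_aug.T @ W @ y_aug)
    return b_hat, cov

def normal_interval(Z, mean, cov, alpha=0.05):
    """g +/- z_{1-alpha/2} (Z^T cov Z)^{1/2} for g = Z^T beta, as in (B9) and (B13)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    g = float(Z @ mean)
    half = z * float(np.sqrt(Z @ cov @ Z))
    return g - half, g + half

rng = np.random.default_rng(2)
n, p = 10, 2
X = rng.normal(size=(n, p))
y = X @ np.array([1.5, -0.7]) + rng.normal(scale=0.3, size=n)
C = 0.09 * np.eye(n)              # hypothetical observation-error covariance
beta_p = np.array([1.0, -1.0])    # hypothetical prior mean
Cp = np.diag([0.5, 0.5])          # hypothetical prior covariance
Z = np.array([2.0, 1.0])          # hypothetical prediction sensitivities

post_mean, post_cov = bayes_posterior(X, y, C, beta_p, Cp)
b_hat, b_cov = augmented_gls(X, y, C, beta_p, Cp)   # y_beta = beta_p, C_beta = Cp
credible = normal_interval(Z, post_mean, post_cov)
confidence = normal_interval(Z, b_hat, b_cov)
print(np.allclose(credible, confidence))
```

Inverting the full $(n + npri) \times (n + npri)$ covariance is done here only for transparency; because $\tilde{C}$ is block diagonal, $X^T C^{-1} X + C_\beta^{-1}$ can be formed directly in practice.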
For the linear model defined in equation (B10), linear regression parameter estimates $\hat{b}$ are obtained by minimizing the generalized least-squares objective function $S(b) = (\tilde{y} - \tilde{X}b)^T \tilde{C}^{-1} (\tilde{y} - \tilde{X}b)$ with respect to $b$, where $b$ is a general vector of model parameters, i.e.,

$$\hat{b} = (\tilde{X}^T \tilde{C}^{-1} \tilde{X})^{-1} \tilde{X}^T \tilde{C}^{-1} \tilde{y} = (X^T C^{-1} X + C_\beta^{-1})^{-1}(X^T C^{-1} y + C_\beta^{-1} y_\beta),$$

where the second form follows from substituting $\tilde{X}$, $\tilde{C}$, and $\tilde{y}$. The estimates follow a multivariate Gaussian distribution [Toutenburg, 1982, p. 52],

$$\hat{b} \sim N_p\left(\beta, (\tilde{X}^T \tilde{C}^{-1} \tilde{X})^{-1}\right), \quad p(\hat{b}) \propto \exp\left[-\frac{1}{2}(\hat{b} - \beta)^T (X^T C^{-1} X + C_\beta^{-1})(\hat{b} - \beta)\right], \tag{B12}$$

which is equivalent to the posterior distribution of $\beta$ in (B5) when the prior information $y_\beta$ equals the mean of the prior distribution $\beta_p$ and the covariance of the errors of the prior information $C_\beta$ equals the covariance of the prior distribution $C_p$. Under these conditions, the confidence intervals based on the distribution of $\hat{b}$ in (B12) are numerically identical to the credible intervals based on the posterior distribution of $\beta$ in (B5). As in (B9), the $(1 - \alpha)100\%$ confidence interval for the linear model prediction $g(\beta) = Z^T\beta$ is

$$g(\hat{b}) \pm z_{1-\alpha/2}\left[Z^T (X^T C^{-1} X + C_\beta^{-1})^{-1} Z\right]^{1/2}. \tag{B13}$$

When prior information is included in evaluating parameter uncertainty, the posterior covariance matrix $(X^T C^{-1} X + C_\beta^{-1})^{-1}$ is the inverse of the sum of two measures of information, one from the data and one from the prior, where a measure of information is the inverse of a covariance matrix and is positive-semidefinite. Consequently, regardless of how much information the data bring, the a posteriori covariance matrix is not greater than the a priori covariance matrix, in the sense that $C_\beta - (X^T C^{-1} X + C_\beta^{-1})^{-1}$ is positive-semidefinite [Box and Tiao, 1992, p. 17].

A common concern when using prior information and observations together is that they may be conceptually different, for example, due to scale issues. This calls into question the integrated use of observations and prior information in both regression and Bayesian methods. During parameter estimation, Hill and Tiedeman [2007, p.
288-289] suggest putting more emphasis on observations for two reasons: (1) experience has shown that in many systems observations can be measured more accurately than prior information, and (2) the relation between observed and simulated values is usually more direct than the relation between prior information and model parameter values. However, propagation of uncertainty can have different goals than parameter estimation, and the meaning of the prior information needs to be considered carefully. Further examination of this issue is beyond the scope of the present work, which focuses on the comparison of the two methods, not on this difficulty shared by both.

Appendix C: Derivation of Linear Credible Intervals for Linear and Nonlinear Models

According to Berger [1985], if Jeffreys' noninformative prior $p(\beta) \propto \text{constant}$ is used, the posterior density of the parameters $\beta$ is determined solely by the data $y$. Bayes' theorem $p(\beta \mid y) = p(y \mid \beta)\,p(\beta)/p(y)$ can then be written as

$$p(\beta \mid y) = \frac{c \cdot \exp[\log p(y \mid \beta)]}{\int c \cdot \exp[\log p(y \mid \beta)]\,d\beta} = \frac{\exp[\log p(y \mid \beta)]}{\int \exp[\log p(y \mid \beta)]\,d\beta}. \tag{C1}$$

By considering a Taylor series expansion of $\log p(y \mid \beta)$ about $\hat{b}$ (which maximizes $\log p(y \mid \beta)$, so that the first-order term vanishes) and retaining terms up to second order, equation (C1) is approximated by

$$p(\beta \mid y) \approx \frac{\exp\left[\log p(y \mid \hat{b}) - \frac{1}{2}(\beta - \hat{b})^T I(\hat{b})(\beta - \hat{b})\right]}{\int \exp\left[\log p(y \mid \hat{b}) - \frac{1}{2}(\beta - \hat{b})^T I(\hat{b})(\beta - \hat{b})\right] d\beta} = \frac{|I(\hat{b})|^{1/2}}{(2\pi)^{p/2}} \exp\left[-\frac{1}{2}(\beta - \hat{b})^T I(\hat{b})(\beta - \hat{b})\right], \tag{C2}$$

where $I(\hat{b}) = -\left[\frac{\partial^2 \log p(y \mid \beta)}{\partial\beta\,\partial\beta^T}\right]_{\beta = \hat{b}}$ and $p$ is the number of parameters. It follows directly that the posterior density is multivariate normal, i.e., $p(\beta \mid y) \sim N_p(\hat{b}, [I(\hat{b})]^{-1})$. For a linear model $y = X\beta + \varepsilon$ with $\varepsilon \sim N_n(0, C)$, this distribution is exact, because the derivatives of $\log p(y \mid \beta)$ of orders higher than two are zero.
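For the linear Gaussian case the approximation (C2) should be exact, which can be checked by computing $I(\hat{b})$ from central finite differences of the log-likelihood and comparing it with $X^T C^{-1} X$. This is a minimal sketch assuming NumPy and synthetic data; `log_lik`, `observed_information`, and `linear_model` are hypothetical helpers, and the step size `h` is an arbitrary choice.

```python
import numpy as np

def log_lik(beta, model, y, Ci):
    """Gaussian log-likelihood up to an additive constant: -0.5 r^T C^{-1} r, r = y - model(beta)."""
    r = y - model(beta)
    return -0.5 * float(r @ Ci @ r)

def observed_information(beta, model, y, Ci, h=1e-3):
    """I(beta) = -d^2 log p(y|beta) / d beta d beta^T, by central finite differences."""
    p = beta.size
    E = np.eye(p) * h
    I = np.zeros((p, p))
    for i in range(p):
        for j in range(p):
            I[i, j] = -(log_lik(beta + E[i] + E[j], model, y, Ci)
                        - log_lik(beta + E[i] - E[j], model, y, Ci)
                        - log_lik(beta - E[i] + E[j], model, y, Ci)
                        + log_lik(beta - E[i] - E[j], model, y, Ci)) / (4.0 * h * h)
    return I

rng = np.random.default_rng(3)
n, p = 15, 3
X = rng.normal(size=(n, p))
C = 0.25 * np.eye(n)                  # hypothetical known error covariance
Ci = np.linalg.inv(C)
y = X @ np.array([0.5, 1.0, -1.5]) + rng.normal(scale=0.5, size=n)

def linear_model(beta):
    return X @ beta                   # y = X beta + eps

b_hat = np.linalg.solve(X.T @ Ci @ X, X.T @ Ci @ y)
I_num = observed_information(b_hat, linear_model, y, Ci)
post_cov = np.linalg.inv(I_num)       # [I(b_hat)]^{-1}, the covariance in (C2)
print(np.allclose(I_num, X.T @ Ci @ X, rtol=1e-6, atol=1e-6))
```

Because the log-likelihood is exactly quadratic in $\beta$ here, the finite-difference Hessian matches $X^T C^{-1} X$ up to rounding at any $\beta$, not only at $\hat{b}$. For a nonlinear model one would pass a nonlinear `model`; the resulting $I(\hat{b})$ then generally differs from the linearized $X_{\hat{b}}^T C^{-1} X_{\hat{b}}$ by terms involving the residuals and the second derivatives of $f$, which is why the nonlinear result is only approximate.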
Given that $\hat{b} = (X^T C^{-1} X)^{-1} X^T C^{-1} y$ and $[I(\hat{b})]^{-1} = (X^T C^{-1} X)^{-1}$, the parameter distribution $p(\beta \mid y) \sim N_p(\hat{b}, (X^T C^{-1} X)^{-1})$ is obtained directly; so is the linear credible interval of a linear prediction $g(\beta)$. For a nonlinear model $y = f(\beta) + \varepsilon$ with $\varepsilon \sim N_n(0, C)$, the higher-order derivatives are not necessarily zero, so equation (C2) is only an approximation of the posterior density $p(\beta \mid y)$. Following the model linearization $f(b) \approx f(\hat{b}) + X_{\hat{b}}(b - \hat{b})$, the posterior density $p(\beta \mid y)$ is approximated by

$$\beta \sim N_p\left(\hat{b}, [I(\hat{b})]^{-1}\right), \quad \hat{b} = (X_{\hat{b}}^T C^{-1} X_{\hat{b}})^{-1} X_{\hat{b}}^T C^{-1} y, \quad [I(\hat{b})]^{-1} = (X_{\hat{b}}^T C^{-1} X_{\hat{b}})^{-1}. \tag{C3}$$

If the nonlinear prediction function $g(b)$ can also be linearized as $g(b) \approx g(\hat{b}) + Z_{\hat{b}}^T(b - \hat{b})$, the linear credible interval is the same as the linear confidence interval for nonlinear models.

Appendix D: Figure of Sensitivity and Residual Analysis for Model 3Z in the Complex Groundwater Test Case

[Figure D1 panels: (a) and (c), composite scaled sensitivity for parameters K1, K2, K3, RCH, KRB, LAKERCH, KV, and VANI; (b) and (d), weighted residual versus weighted simulated value.]

Figure D1. Composite scaled sensitivity and weighted residuals versus weighted simulated values for model 3Z calibrated using two ((a)-(b)) and eighteen ((c)-(d)) observations of streamflow gain. Calibration data include hydraulic heads (o), flows (+), lake stage (*), and measurements of net lake recharge (Δ). Residuals are mostly positive for weighted simulated values between 26 and 32, as bounded by the two vertical lines.
Appendix E: Prior Information Used for the Groundwater Model Parameters

Assuming that measurements of net lake recharge (LAKERCH) are always available in practice, prior information on LAKERCH is used in this study for all three groundwater models, and prior information on sixteen hydraulic conductivities (locations are shown in Figure 3(d) of the article) is used only for model INT, because this information about the parameters is known before the data are collected. Because vertical anisotropy (VANI) is insensitive in the calibration of all three models, convergence of model calibration is difficult. To help convergence, prior information on VANI is used for regularization. This prior information is also used for calculating confidence intervals. To make the calculation of credible intervals in MCMC consistent with that of confidence intervals, a consistent prior distribution of VANI is assumed. The details of the prior information for all parameters in the three models are listed in the following tables, where KRB represents the hydraulic conductance of the riverbed, RCH the recharge rate, KV the leakance of the confining unit, and K the hydraulic conductivity.

Table E1: Prior information of model parameters for model HO. U and N stand for uniform and normal distributions.

Parameter                   Prior distribution
Noninformative parameters
  KRB                       U(10^2, 10^5)
  RCH                       U(10^-4, 10^-2)
  KV                        U(10^-4, 1)
  K                         U(10^-2, 500)
Informative parameters
  LAKERCH                   N(0.000603, 0.0003)
  VANI                      N(2.5, 0.6)

Table E2: Prior information of model parameters for model 3Z. U and N stand for uniform and normal distributions.

Parameter                   Prior distribution
Noninformative parameters
  KRB                       U(10^2, 10^5)
  RCH                       U(10^-4, 10^-2)
  KV                        U(10^-4, 1)
  K1                        U(10^-2, 500)
  K2                        U(10^-2, 10^3)
  K3                        U(10^-2, 500)
Informative parameters
  LAKERCH                   N(0.000603, 0.0003)
  VANI                      N(2.5, 0.6)

Table E3: Prior information of model parameters for model INT. U and N stand for uniform and normal distributions.
Parameter                   Prior distribution
Noninformative parameters
  KRB1                      U(10^2, 10^5)
  KRB2                      U(10^2, 10^6)
  KRB3                      U(10^2, 10^5)
  RCH                       U(10^-4, 10^-2)
  KV                        U(10^-4, 1)
  KA                        U(10^-2, 500)
  KB                        U(10^-2, 500)
  KC                        U(10^-2, 500)
Informative parameters
  K1                        N(132, 26.4)
  K2                        N(100, 20)
  K3                        N(53, 10.6)
  K4                        N(104, 20.8)
  K10                       N(90, 18.0)
  K14                       N(42, 8.4)
  K15                       N(60, 12.0)
  K18                       N(54, 10.8)
  K19                       N(114, 22.8)
  K20                       N(82, 16.4)
  K21                       N(27, 5.4)
  K22                       N(52, 10.4)
  K23                       N(55, 11.0)
  K24                       N(67, 13.4)
  K25                       N(64, 12.8)
  K26                       N(57, 11.4)
  LAKERCH                   N(0.000603, 0.0003)
  VANI                      N(2.5, 0.6)

Take model HO as an example. Figure E1 indicates that the chosen uniform distributions for the four model parameters are flat and have no appreciable influence on the posterior distributions, suggesting that they are noninformative priors.

[Figure E1 panels: prior and posterior probability density functions (PDF) of LAKERCH and VANI (normal priors) and of KRB, RCH, KV, and K (uniform priors).]

Figure E1: The prior and posterior probability densities for the HO model parameters. The priors are plotted based on Table E1.

References

Berger, J. O. (1985), Statistical decision theory and Bayesian analysis, 2nd ed., Springer-Verlag, New York, 641 pp.

Box, G. E. P., and G. C. Tiao (1992), Bayesian inference in statistical analysis, Wiley Classics Library ed., John Wiley & Sons, New York, 588 pp.

Cooley, R. L. (1983), Incorporation of prior information on parameters into nonlinear regression groundwater flow models: 2. Applications, Water Resour. Res., 19(3), 662-676.

Hill, M. C., and C. R. Tiedeman (2007), Effective calibration of ground water models, with analysis of data, sensitivities, predictions, and uncertainty, John Wiley & Sons, New York, 480 pp.
McLaughlin, D., and L. R. Townley (1996), A reassessment of the groundwater inverse problem, Water Resour. Res., 32(5), 1131-1161.

Schweppe, F. C. (1973), Uncertain dynamic systems, Prentice-Hall, Englewood Cliffs, N.J.

Toutenburg, H. (1982), Prior information in linear models, Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, 215 pp.
