Lecture II-3: Interpolation and Variational Methods

Lecture Outline:
• The Interpolation Problem, Estimation Options
• Regression Methods
  - Linear
  - Nonlinear
• Input-oriented Bayesian Methods
  - Linear
  - Nonlinear
• Variational Solutions
• SGP97 Case Study

A Typical Interpolation Problem: Groundwater Flow

The problem is to characterize unknown heads at nodes on a discrete grid. Estimates rely on scattered head measurements and must be compatible with the groundwater flow equation.

[Figure: discrete grid showing grid nodes and well observation locations]

State Eq. (GW Flow Eq), continuous: $T\,\nabla^2 y(x) = u(x)$, with $y = 0$ on boundaries
State Eq., discretized: $A(T)\,y = B u$
Output Eq.: $w = M y$
Meas. Eq.: $z = w + \varepsilon = M y + \varepsilon$

y = vector of hydraulic heads at n grid nodes
M = matrix of coefficients used to interpolate nodal heads to measurement locations
u = vector of recharge values at n grid nodes
z = vector of measurements at the well locations (uncertain)
T = scalar transmissivity (assumed known)
ε = vector of measurement errors, one per measurement (uncertain)

How can we characterize unknown states (heads) and inputs (recharge) at all nodes?

Options for Solving the Interpolation Problem

The two most commonly used options for solving the interpolation problem emphasize point and probabilistic estimates, respectively.

1. Classical regression approach: Assume the input u is unknown and the measurement error ε is random with a zero mean and known covariance $C_\varepsilon$. Adjust nodal values of u to obtain the ‘best’ (e.g. least-squares) fit between the model output w and the measurement z. Given certain assumptions, this ‘point’ estimate may be used to derive probabilistic information about the range of likely states.

2. Bayesian estimation approach: Assume u and ε are random vectors described by known unconditional PDFs $f_u(u)$ and $f_\varepsilon(\varepsilon)$. Derive the conditional PDF of the state $f_{y|z}(y|z)$ or, when this is not feasible, identify particular properties of this PDF. Use this information to characterize the uncertain state variable.
Although these methods can lead to similar results in some cases, they are based on different assumptions and have somewhat different objectives. We will emphasize the Bayesian approach.

Classical Regression - Linear Problems

In the regression approach the “goodness of fit” between model outputs and observations is measured in terms of the weighted sum-squared error $J_{LS}$:

$$J_{LS} = [z - w]^T C_\varepsilon^{-1} [z - w]$$

When the problem is linear (as in the groundwater example), the state and output are linear functions of the input:

$$y = A^{-1}(T)\,B u = D u, \qquad w = M y = M D u = G u$$

In this case the error $J_{LS}$ is a quadratic function of u with a unique minimum which is a linear function of z:

Function to minimize:
$$J_{LS}(u) = [z - Gu]^T C_\varepsilon^{-1} [z - Gu]$$

Minimizing u:
$$\hat{u}_{LS} = [(G^T C_\varepsilon^{-1} G)^{-1} G^T C_\varepsilon^{-1}]\,z$$

$\hat{u}_{LS}$ is the classic least-squares estimate of u. The corresponding least-squares estimate of y is:

$$\hat{y}_{LS} = D\,\hat{u}_{LS}$$

Note that the matrix $[G^T C_\varepsilon^{-1} G]$ has an inverse only when the number of unknowns in u does not exceed the number of measurements in z (and the columns of G are linearly independent).

Classical Regression - Nonlinear Problems

When the state and/or measurement vectors are nonlinear functions of the input, the regression approach can be applied iteratively. Suppose that w = g(u). At each iteration the linear estimation equations are used, with the nonlinear model approximated by a first-order Taylor series about the current estimate $\hat{u}_{LS,k}$:

$$w = g(u) \approx g(\hat{u}_{LS,k}) + G_k\,(u - \hat{u}_{LS,k}) \qquad \text{on iteration } k;\; k = 1, \ldots, k_{max}$$

where:

$$G_k = \left. \frac{\partial g(u)}{\partial u} \right|_{u = \hat{u}_{LS,k}}$$

Then the least-squares estimation equations at iteration k become:

$$\hat{u}_{LS,k+1} = \hat{u}_{LS,k} + [(G_k^T C_\varepsilon^{-1} G_k)^{-1} G_k^T C_\varepsilon^{-1}]\,[z - g(\hat{u}_{LS,k})]$$

The iteration is started with a “first guess” $\hat{u}_{LS,1}$ and then continued until the sequence of estimates converges. An estimate of the state y = d(u) is obtained from the converged estimate of u:

$$\hat{y}_{LS} = d(\hat{u}_{LS})$$

In practice, $J_{LS}$ may have many local minima in the nonlinear case and convergence is not guaranteed (i.e. the estimation problem may be ill-posed).
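The linear estimate and the iterative nonlinear update above can be sketched in a few lines of NumPy. This is a minimal illustration, not the lecture's own code: the function names `linear_ls` and `gauss_newton`, the two-input model `g`, its Jacobian `jac`, and all numerical values are assumptions made for the example. The measurements are generated without noise so that the iteration should recover the input used to create them.

```python
import numpy as np

def linear_ls(G, C_eps, z):
    """Least-squares estimate (G^T C^-1 G)^-1 G^T C^-1 z for a linear model w = G u."""
    Ci_G = np.linalg.solve(C_eps, G)                  # C^-1 G
    # C_eps is symmetric, so Ci_G.T equals G^T C^-1
    return np.linalg.solve(G.T @ Ci_G, Ci_G.T @ z)

def gauss_newton(g, jac, z, C_eps, u_first, k_max=50, tol=1e-10):
    """Iterate the linearized least-squares update until the step is small."""
    u = np.asarray(u_first, dtype=float)
    for _ in range(k_max):
        # Linear estimate of the correction, using G_k = jac(u) and residual z - g(u)
        step = linear_ls(jac(u), C_eps, z - g(u))
        u = u + step
        if np.linalg.norm(step) < tol:
            break
    return u

# Hypothetical nonlinear model with 2 inputs and 3 outputs (illustration only)
def g(u):
    return np.array([u[0], u[1], u[0] * u[1]])

def jac(u):
    return np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [u[1], u[0]]])

u_true = np.array([1.0, 2.0])
C_eps = 0.01 * np.eye(3)           # assumed measurement error covariance
z = g(u_true)                      # noise-free synthetic measurements
u_hat = gauss_newton(g, jac, z, C_eps, u_first=[0.5, 1.5])
print(u_hat)
```

Using `np.linalg.solve` rather than forming explicit inverses is a common numerical choice; the result is the same estimate as the formulas above. With noisy measurements or a poor first guess, the iteration may stall or diverge, as noted for the nonlinear case.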
Bayesian Estimation - Linear Multivariate Normal Problems

Bayesian estimation focuses on the conditional PDF $f_{y|z}(y|z)$. This PDF conveys all the information about the uncertain state y contained in the measurement vector z. Derivation of $f_{y|z}(y|z)$ (which is multivariate normal) is straightforward when u and z are jointly normal. This requirement is met in our groundwater example if we assume that:

• $f_u(u)$ is multivariate normal with specified mean $\bar{u}$ and covariance $C_{uu}$
• $f_\varepsilon(\varepsilon)$ is multivariate normal with a zero mean and covariance $C_\varepsilon$
• The state and measurement equations are linear and the measurement error is additive, so $y = D u$ and $z = M y + \varepsilon = M D u + \varepsilon = G u + \varepsilon$
• u and ε are independent

In this case, $f_{y|z}(y|z)$ is completely defined by its mean and covariance, which can be derived from the general expression for a conditional multivariate normal PDF:

$$\hat{y}_B = E(y|z) = \bar{y} + C_{yz} C_{zz}^{-1} [z - \bar{z}]$$

$$C_{yy|z} = C_{yy} - C_{yz} C_{zz}^{-1} C_{yz}^T$$

These expressions are equivalent to those obtained from kriging with a known mean and optimal interpolation, when comparable assumptions are made.

Derivation of the Unconditional Mean and Covariance - Linear Multivariate Normal Problems

The groundwater model enters the Bayesian estimation equations through the unconditional means $\bar{y}$ and $\bar{z}$ and the unconditional covariances $C_{yy}$, $C_{yz}$ and $C_{zz}$. These can be derived from the linear state and measurement equations and the specified covariances $C_{uu}$ and $C_\varepsilon$:

$$\bar{y} = E[Du] = D\bar{u}$$

$$\bar{z} = E[Gu + \varepsilon] = G\bar{u}$$

$$C_{yy} = E[(y - \bar{y})(y - \bar{y})^T] = E[D(u - \bar{u})(u - \bar{u})^T D^T] = D C_{uu} D^T$$

$$C_{yz} = E[(y - \bar{y})(z - \bar{z})^T] = E[D(u - \bar{u})(Gu + \varepsilon - G\bar{u})^T] = D C_{uu} G^T$$

$$C_{zz} = E[(z - \bar{z})(z - \bar{z})^T] = E[(Gu + \varepsilon - G\bar{u})(Gu + \varepsilon - G\bar{u})^T] = G C_{uu} G^T + C_\varepsilon$$

The cross terms between u and ε vanish because u and ε are independent. The conditional mean estimate $\hat{y}_B$ obtained from these expressions can be shown to approach the least-squares estimate $\hat{y}_{LS}$ as $C_{uu} \to \infty$. An approach similar to the one outlined above can be used to derive the conditional mean and covariance of the uncertain input u.
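The conditional mean and covariance expressions above can be checked numerically on a small problem. The sketch below is an assumption-laden toy rather than the lecture's groundwater model: it uses a hypothetical one-dimensional, five-node analog with a second-difference matrix for A, an identity matrix for B, two "wells" located exactly at nodes 1 and 3 (so M simply selects those nodes), and made-up values for the covariances and measurements. The function name `conditional_mean_cov` is also an illustrative choice.

```python
import numpy as np

def conditional_mean_cov(D, M, u_bar, C_uu, C_eps, z):
    """Conditional mean and covariance of y given z for y = D u, z = M y + eps."""
    G = M @ D
    y_bar = D @ u_bar                       # unconditional mean of y
    z_bar = G @ u_bar                       # unconditional mean of z
    C_yy = D @ C_uu @ D.T
    C_yz = D @ C_uu @ G.T
    C_zz = G @ C_uu @ G.T + C_eps
    # K = C_yz C_zz^-1, computed with a solve (C_zz is symmetric)
    K = np.linalg.solve(C_zz, C_yz.T).T
    y_hat = y_bar + K @ (z - z_bar)
    C_post = C_yy - K @ C_yz.T
    return y_hat, C_post

# Hypothetical 1-D analog: T y'' = u on 5 interior nodes, y = 0 at both ends
n, T, dx = 5, 1.0, 1.0
A = (T / dx**2) * (np.diag(-2.0 * np.ones(n))
                   + np.diag(np.ones(n - 1), 1)
                   + np.diag(np.ones(n - 1), -1))
B = np.eye(n)
D = np.linalg.solve(A, B)                   # D = A^-1 B

M = np.zeros((2, n))                        # two wells, located at nodes 1 and 3
M[0, 1] = 1.0
M[1, 3] = 1.0

u_bar = np.zeros(n)                         # assumed prior mean of recharge
C_uu = np.eye(n)                            # assumed prior covariance of recharge
C_eps = 1e-4 * np.eye(2)                    # assumed measurement error covariance
z = np.array([0.5, -0.3])                   # made-up head measurements

y_hat, C_post = conditional_mean_cov(D, M, u_bar, C_uu, C_eps, z)
sigma_post = np.sqrt(np.maximum(np.diag(C_post), 0.0))
print(y_hat)
print(sigma_post)
```

In this toy setup the conditional standard deviation in `sigma_post` is smallest at the two measured nodes, which mirrors the behavior near well locations described for the groundwater example.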
Interpreting Bayesian Estimation Results

The conditional PDFs produced in the linear multivariate normal case are not particularly informative in themselves. In practice, it is more useful to examine spatial plots of scalar properties of these PDFs, such as the mean and standard deviation, or plots of the marginal conditional PDFs at particular locations.

[Figure: three panels. Contours of $\hat{y}_B = E[y|z]$; contours of the conditional standard deviation $\sigma_{yy|z}$; marginal conditional PDF $f_{y|z}(y|z)$ of y at node 14.]

The conditional mean is generally used as a point estimate of y while the conditional standard deviation provides a measure of confidence in this estimate. Note that the conditional standard deviation decreases near well locations, reflecting the local information provided by the head measurements.

Bayesian Estimation - Nonlinear Problems

When the state and/or measurement vectors are nonlinear functions of the input, the variables y and z are generally not multivariate normal, even if u and ε are normal. In this case, it is difficult to derive the conditional PDF $f_{y|z}(y|z)$ directly. An alternative is to work with $f_{u|z}(u|z)$, the conditional PDF of u. Once $f_{u|z}(u|z)$ is computed it may be possible to use it to derive $f_{y|z}(y|z)$ or some of its properties.

The PDF $f_{u|z}(u|z)$ can be obtained from Bayes Theorem:

$$f_{u|z}(u|z) = \frac{f_{z|u}(z|u)\, f_u(u)}{f_z(z)} = \frac{f_{z|u}(z|u)\, f_u(u)}{\int f_{z|u}(z|u)\, f_u(u)\, du}$$

We suppose that $f_u(u)$ and $f_\varepsilon(\varepsilon)$ are given (e.g. multivariate normal). If the measurement error is additive but the transformations y = d(u) and w = m(y) are nonlinear, then:

$$z = m(y) + \varepsilon = m[d(u)] + \varepsilon = g(u) + \varepsilon$$

and the PDF $f_{z|u}(z|u)$ is:

$$f_{z|u}(z|u) = f_\varepsilon[z - g(u)]$$

In this case, we have all the information required to apply Bayes Theorem.

Obtaining Practical Bayesian Estimates: The Conditional Mode

For problems of realistic size the conditional PDF $f_{u|z}(u|z)$ is difficult to derive in closed form and is too large to store in numerical form.
Even when this PDF can be computed, it is difficult to interpret. Usually spatial plots of scalar PDF properties provide the best characterization of the system’s inputs and states. In the nonlinear case it is difficult to derive exact expressions for the conditional mean and standard deviation or for the marginal conditional densities. However, it is possible to estimate the conditional mode (maximum) of $f_{u|z}(u|z)$.

[Figure: conditional PDF $f_{u|z}(u|z)$ of u (given z) for a scalar (single input) problem, with the mode (peak) marked.]

Deriving the Conditional Mode

The conditional mode is derived by noting that the maximum (with respect to u) of the PDF $f_{u|z}(u|z)$ is the same as the minimum of $-\ln[f_{u|z}(u|z)]$ (since $-\ln[\,\cdot\,]$ is a monotonically decreasing function of its argument). From Bayes Theorem we have (for additive measurement error):

$$J_B = -\ln[f_{u|z}(u|z)] = -\ln\left[\frac{f_{z|u}(z|u)\, f_u(u)}{f_z(z)}\right] = -\ln[f_\varepsilon(z - g(u))] - \ln[f_u(u)] + \ln[f_z(z)]$$

If ε and u are multivariate normal this expression may be written as:

$$J_B = \frac{1}{2}[z - g(u)]^T C_\varepsilon^{-1} [z - g(u)] + \frac{1}{2}[u - \bar{u}]^T C_{uu}^{-1} [u - \bar{u}] + \text{terms that do not depend on } u$$

The estimated mode of $f_{u|z}(u|z)$ is the value of u (represented by $\hat{u}_{B,mode}$) which minimizes $J_B$. Note that $J_B$ is an extended form of the least-squares error measure $J_{LS}$ used in nonlinear regression. $\hat{u}_{B,mode}$ is found with an iterative search similar to the one used to solve the nonlinear regression problem. This search usually converges better than the regression search because the second term in $J_B$, sometimes called a regularization term, tends to give a better defined minimum.

Iterative Solution of Nonlinear Bayesian Minimization Problems

In spatially distributed problems where the dimension of u is large, a gradient-based search is the preferred method for minimizing $J_B$.
The search is carried out iteratively, with the new estimate (at the end of iteration k) computed from the old estimate (at the end of iteration k-1) and the gradient of $J_B$ evaluated at the old estimate:

$$\hat{u}_k = \Phi\left(\hat{u}_{k-1},\; \frac{\partial J_{B,k-1}}{\partial u}\right)$$

where $\Phi$ stands for the update rule of the particular gradient-based search used, and:

$$\frac{\partial J_{B,k-1}}{\partial u} = \left. \frac{\partial J_B(u)}{\partial u} \right|_{u = \hat{u}_{k-1}}$$

[Figure: contours of $J_B$ in the $(u_1, u_2)$ plane for a problem with 2 uncertain inputs, with search steps from $\hat{u}_{k-1}$ to $\hat{u}_k$ shown in red.]

Conventional numerical computation of $\partial J_B/\partial u$ using, for example, a finite difference technique, is very time-consuming, requiring order n model runs per iteration, where n is the dimension of u. Variational (adjoint) methods can greatly reduce the effort needed to compute $\partial J_B/\partial u$.

Variational (Adjoint) Methods for Deriving Search Gradients - 1

Variational methods obtain the search gradient $\partial J_B/\partial u$ indirectly, from the first variation of a modified form of $J_B$. These methods treat the state equation as an equality constraint. This constraint is adjoined to $J_B$ with a Lagrange multiplier (or adjoint vector). To illustrate, consider a static interpolation problem with nonlinear state and measurement equations and an additive measurement error:

$$y = d(u), \qquad z = m(y) + \varepsilon$$

When the state equation is adjoined, the part of $J_B$ that depends on u is:

$$J_B = \frac{1}{2}[z - m(y)]^T C_\varepsilon^{-1} [z - m(y)] + \frac{1}{2}[u - \bar{u}]^T C_{uu}^{-1} [u - \bar{u}] + \lambda^T [y - d(u)]$$

where $\lambda$ is the Lagrange multiplier (or adjoint) vector. At a local minimum the first variation of $J_B$ must equal zero:

$$\delta J_B = \Big[ -[z - m(y)]^T C_\varepsilon^{-1} \frac{\partial m(y)}{\partial y} + \lambda^T \Big]\, \delta y + \Big[ [u - \bar{u}]^T C_{uu}^{-1} - \lambda^T \frac{\partial d(u)}{\partial u} \Big]\, \delta u = 0$$

If $\lambda$ is selected to ensure that the first bracketed term is zero, then the second bracketed term is the desired gradient $\partial J_B/\partial u$.
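The adjoint construction above can be illustrated by comparing the gradient it produces with a finite-difference gradient on a small problem. The sketch below is only a toy under stated assumptions: the state equation `d`, the measurement operator `m`, the random matrices `W` and `H`, the covariances, and the synthetic measurements are all hypothetical choices made for the example. The steepest-descent loop with step halving is one simple possibility for the search update, not the method prescribed in the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_u, n_y, n_z = 4, 5, 3                     # hypothetical problem sizes
W = rng.normal(size=(n_y, n_u))             # hypothetical state-model weights
H = rng.normal(size=(n_z, n_y))             # hypothetical measurement weights
C_eps = 0.1 * np.eye(n_z)                   # assumed measurement error covariance
C_uu = np.eye(n_u)                          # assumed prior input covariance
u_bar = np.zeros(n_u)                       # assumed prior input mean
z = rng.normal(size=n_z)                    # synthetic measurements

def d(u):                                   # state equation y = d(u)
    return np.tanh(W @ u)

def dd_du(u):                               # Jacobian of d, shape (n_y, n_u)
    return (1.0 - np.tanh(W @ u) ** 2)[:, None] * W

def m(y):                                   # measurement operator w = m(y)
    return H @ (y + 0.5 * y ** 2)

def dm_dy(y):                               # Jacobian of m, shape (n_z, n_y)
    return H * (1.0 + y)[None, :]

def J_B(u):
    r = z - m(d(u))
    du = u - u_bar
    return 0.5 * r @ np.linalg.solve(C_eps, r) + 0.5 * du @ np.linalg.solve(C_uu, du)

def adjoint_gradient(u):
    y = d(u)                                              # one state solve
    lam = dm_dy(y).T @ np.linalg.solve(C_eps, z - m(y))   # adjoint vector
    # Column form of [u - u_bar]^T C_uu^-1 - lam^T (dd/du)
    return np.linalg.solve(C_uu, u - u_bar) - dd_du(u).T @ lam

def fd_gradient(u, h=1e-6):
    g = np.zeros_like(u)
    for i in range(u.size):                 # 2 n evaluations of J_B
        e = np.zeros_like(u)
        e[i] = h
        g[i] = (J_B(u + e) - J_B(u - e)) / (2.0 * h)
    return g

def steepest_descent(u0, n_iter=50):
    u = np.array(u0, dtype=float)
    for _ in range(n_iter):
        g = adjoint_gradient(u)
        step = 1.0
        # Halve the step until J_B decreases (simple backtracking)
        while J_B(u - step * g) >= J_B(u) and step > 1e-12:
            step *= 0.5
        if step <= 1e-12:
            break
        u = u - step * g
    return u

u_test = rng.normal(size=n_u)
print(adjoint_gradient(u_test))
print(fd_gradient(u_test))
u_hat = steepest_descent(u_bar)
print(J_B(u_bar), J_B(u_hat))
```

The two printed gradients should agree to within finite-difference error. The adjoint version needs one evaluation of the state equation regardless of the dimension of u, while the central-difference version needs 2n evaluations of $J_B$.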
Variational (Adjoint) Methods for Deriving Search Gradients - 2

The variational approach for computing $\partial J_B/\partial u$ on iteration k of the search can be summarized as follows:

1. Compute state using input estimate $\hat{u}_{k-1}$ from iteration k-1:
$$\hat{y}_{k-1} = d(\hat{u}_{k-1})$$

2. Compute adjoint from new state:
$$\lambda_{k-1} = \left(\frac{\partial m_{k-1}}{\partial y}\right)^T C_\varepsilon^{-1} [z - m(\hat{y}_{k-1})]$$

3. Compute gradient at $\hat{u}_{k-1}$:
$$\frac{\partial J_{B,k-1}}{\partial u} = [\hat{u}_{k-1} - \bar{u}]^T C_{uu}^{-1} - \lambda_{k-1}^T \frac{\partial d_{k-1}(u)}{\partial u}$$

4. Compute new input estimate $\hat{u}_k$, where $\Phi$ stands for the update rule of the search algorithm:
$$\hat{u}_k = \Phi\left(\hat{u}_{k-1},\; \frac{\partial J_{B,k-1}}{\partial u}\right)$$

Here the subscripts k-1 on the partial derivatives $\partial m/\partial y$ and $\partial d/\partial u$ indicate that they are evaluated at $\hat{y}_{k-1}$ and $\hat{u}_{k-1}$, respectively.

There are many versions of this static variational algorithm, depending on the form used to write the state equation. All of these give the same final result. In particular, all require only one solution of the state equation per gradient evaluation, together with inversions of the covariance matrices $C_\varepsilon$ and $C_{uu}$. When these matrices are diagonal (implying uncorrelated input and measurement errors) the inversions are straightforward. When correlation is included they can be computationally demanding.

SGP97 Experiment - Soil Moisture Campaign

[Figure: case study area and aircraft microwave measurements]

Test of Variational Smoothing Algorithm - SGP97 Soil Moisture Problem

[Flowchart: Observing System Simulation Experiment (OSSE). Mean land-atmosphere boundary fluxes, soil properties and land use, and mean initial conditions, together with random input error and random initial condition error, drive the land surface model, which produces the “true” soil and canopy moisture and temperature. The radiative transfer model converts these to “true” radiobrightness, and random measurement error is added to give “measured” radiobrightness. The variational algorithm uses the “measured” radiobrightness together with soil properties and land use, mean fluxes and initial conditions, and error covariances to produce estimated radiobrightness and soil moisture. Comparing the estimates with the “true” values gives the estimation error.]

Synthetic Experiment (OSSE) based on SGP97 Field Campaign

The synthetic experiment uses real soil, landcover, and precipitation data from SGP97 (Oklahoma).
Radiobrightness measurements are generated from our land surface and radiative transfer models, with space/time correlated model error (process noise) and measurement error added.

[Figure: SGP97 study area, showing principal inputs to the data assimilation algorithm]

Effects of Smoothing Window Configuration

Position and length of the variational smoothing window affect estimation accuracy. Estimation error is less for longer windows that are reinitialized just after (rather than just before) measurement times.

[Figure: top node saturation rms error (dimensionless) versus day of year (170 to 182) for window configurations A, B, and C, with radiobrightness observation times marked; lower panel shows precipitation (mm/d) versus day of year. Legend: reference experiment (rms = 0.029); A, 3 assim. intervals (rms = 0.03); B, 12 assim. intervals (rms = 0.032); C, 12 assim. intervals (rms = 0.038).]

Effects of Precipitation Information

The variational algorithm performs well even without precipitation information. In this case, soil moisture is inferred only from microwave measurements.

[Figure: top node saturation rms error (dimensionless) versus day of year (170 to 182); lower panel shows precipitation (mm/d) versus day of year. Legend: reference experiment (rms = 0.014); est - precip. withheld (rms = 0.034); prior - precip. withheld (rms = 0.19).]

Summary

The Bayesian estimation approach outlined above is frequently used to solve static data assimilation (or interpolation) problems. It has the following notable features:

• When the state and measurement equations are linear and input and measurement errors are normally distributed, the conditional PDFs $f_{y|z}(y|z)$ and $f_{u|z}(u|z)$ are multivariate normal. In this case the Bayesian conditional mean and Bayesian conditional mode approaches give the same point estimate (i.e. the conditional mode is equal to the conditional mean).
• When the problem is nonlinear the Bayesian conditional mean and mode estimates are generally different. The Bayesian conditional mean estimate is generally not practical to compute for nonlinear problems of realistic size.
• The least-squares approach is generally less likely than the Bayesian approach to converge to a reasonable answer for nonlinear problems since it does not benefit from the “regularization” properties imparted by the second term in $J_B$.
• The variational (adjoint) approach greatly improves the computational efficiency of the Bayesian conditional mode estimation algorithm, especially for large problems.
• The input-oriented variational approach discussed here is a 3DVAR data assimilation algorithm. This name reflects the fact that 3DVAR is used for problems with variability in three spatial dimensions but not in time. 4DVAR data assimilation methods extend the concepts discussed here to time-dependent (dynamic) problems.