Lecture II-3: Interpolation and Variational Methods
Lecture Outline:
• The Interpolation Problem, Estimation Options
• Regression Methods
  – Linear
  – Nonlinear
• Input-oriented Bayesian Methods
  – Linear
  – Nonlinear
• Variational Solutions
• SGP97 Case Study
A Typical Interpolation Problem -- Groundwater Flow
Problem is to characterize unknown heads at nodes on a discrete grid. Estimates
rely on scattered head measurements and must be compatible with the
groundwater flow equation.
State Eq. (GW Flow Eq), continuous:
$$T \nabla^2 y(x) = u(x); \quad y = 0 \ \text{on boundaries}$$
State Eq., discretized:
$$A(T)\, y = B u$$
Output Eq.:
$$w = M y$$
Meas. Eq.:
$$z = w + \varepsilon = M y + \varepsilon$$
[Figure: discrete grid showing a grid node and a well observation]
y = vector of hydraulic head at n grid nodes
u = vector of recharge values at n grid nodes (uncertain)
T = scalar transmissivity (assumed known)
M = matrix of coefficients used to interpolate nodal heads to measurement locations
z = vector of measurements at the well locations
$\varepsilon$ = vector of measurement errors, one per measurement (uncertain)
How can we characterize unknown states (heads) and inputs (recharge) at all nodes?
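The state, output, and measurement equations can be made concrete with a small numerical sketch. The example below is an illustration only: it assumes a 1-D grid rather than the 2-D grid of the slide, reads the state equation as T d²y/dx² = u, and uses hypothetical values for the grid size, recharge, well positions, and noise level. The helper name `build_model` is not from the lecture.

```python
import numpy as np

def build_model(n=11, T=1.0, dx=1.0, obs_positions=(2.5, 5.0, 8.25)):
    """Build A(T), B and M for a 1-D grid with zero head at both boundary nodes."""
    A = np.zeros((n, n))
    B = np.eye(n)
    # Interior rows: central finite difference for T * d2y/dx2 = u.
    for i in range(1, n - 1):
        A[i, i - 1] = T / dx**2
        A[i, i] = -2.0 * T / dx**2
        A[i, i + 1] = T / dx**2
    # Boundary rows enforce y = 0 regardless of u.
    for i in (0, n - 1):
        A[i, i] = 1.0
        B[i, i] = 0.0
    # M linearly interpolates nodal heads to the observation positions.
    M = np.zeros((len(obs_positions), n))
    for row, x in enumerate(obs_positions):
        j = min(int(x // dx), n - 2)
        frac = x / dx - j
        M[row, j] = 1.0 - frac
        M[row, j + 1] = frac
    return A, B, M

A, B, M = build_model()
u = -0.1 * np.ones(11)                 # hypothetical uniform recharge
y = np.linalg.solve(A, B @ u)          # state equation: A(T) y = B u
w = M @ y                              # output equation: w = M y
rng = np.random.default_rng(0)
z = w + rng.normal(0.0, 0.01, size=w.shape)   # measurement equation: z = w + eps
```

In this sketch only three noisy values in `z` are available, while `y` and `u` each have eleven entries, which is the gap the estimation methods below try to close.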
Options for Solving the Interpolation Problem
The two most commonly used options for solving the interpolation problem emphasize
point and probabilistic estimates, respectively.
1. Classical regression approach: Assume the input u is unknown and the
measurement error $\varepsilon$ is random with a zero mean and known covariance $C_\varepsilon$.
Adjust nodal values of u to obtain the ‘best’ (e.g. least-squares) fit between the
model output w and the measurement z. Given certain assumptions, this ‘point’
estimate may be used to derive probabilistic information about the range of likely
states.
2. Bayesian estimation approach: Assume u and $\varepsilon$ are random vectors described by
known unconditional PDFs $f_u(u)$ and $f_\varepsilon(\varepsilon)$. Derive the conditional PDF of the
state $f_{y|z}(y|z)$ or, when this is not feasible, identify particular properties of this PDF.
Use this information to characterize the uncertain state variable.
Although these methods can lead to similar results in some cases, they are based on
different assumptions and have somewhat different objectives. We will emphasize the
Bayesian approach.
Classical Regression - Linear Problems
In the regression approach the “goodness of fit” between model outputs and
observations is measured in terms of the weighted sum-squared error $J_{LS}$:
$$J_{LS} = [z - w]^T C_\varepsilon^{-1} [z - w]$$
When the problem is linear (as in the groundwater example), the state and output are
linear functions of the input:
$$y = A^{-1}(T)\, B u = D u$$
$$w = M y = M D u = G u$$
In this case the error $J_{LS}$ is a quadratic function of u with a unique minimum which is a
linear function of z:
Function to minimize:
$$J_{LS}(u) = [z - Gu]^T C_\varepsilon^{-1} [z - Gu]$$
Minimizing u:
$$\hat{u}_{LS} = [(G^T C_\varepsilon^{-1} G)^{-1} G^T C_\varepsilon^{-1}]\, z$$
$\hat{u}_{LS}$ is the classic least-squares estimate of u. The corresponding least-squares
estimate of y is:
$$\hat{y}_{LS} = D\, \hat{u}_{LS}$$
Note that the matrix $[G^T C_\varepsilon^{-1} G]$ can have an inverse only when the number of unknowns
in u does not exceed the number of measurements in z.
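A minimal sketch of the linear least-squares estimate follows. The matrix `G`, the covariance `C_eps`, and the "true" input are hypothetical values chosen for illustration, and the function name is not from the lecture. The last lines illustrate the invertibility remark: with fewer measurements than unknowns the normal matrix is rank deficient.

```python
import numpy as np

def least_squares_estimate(G, C_eps, z):
    """Weighted least-squares estimate u_hat = (G^T C^-1 G)^-1 G^T C^-1 z."""
    Ci_G = np.linalg.solve(C_eps, G)      # C^-1 G without forming C^-1 explicitly
    normal_matrix = G.T @ Ci_G            # G^T C^-1 G
    rhs = Ci_G.T @ z                      # G^T C^-1 z (uses symmetry of C_eps)
    return np.linalg.solve(normal_matrix, rhs)

# Hypothetical problem: 2 unknowns, 4 measurements.
G = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
C_eps = 0.01 * np.eye(4)
u_true = np.array([0.5, -0.2])
z = G @ u_true                            # noise-free measurements
u_hat = least_squares_estimate(G, C_eps, z)

# With only 1 measurement the 2x2 normal matrix has rank 1 and cannot be inverted.
G_few = G[:1]
rank_few = np.linalg.matrix_rank(G_few.T @ G_few)
```

Because the synthetic measurements contain no noise, `u_hat` should reproduce `u_true` up to rounding error.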
Classical Regression - Nonlinear Problems
When the state and/or measurement vectors are nonlinear functions of the input, the
regression approach can be applied iteratively. Suppose that w = g(u). At each iteration
the linear estimation equations are used, with the nonlinear model approximated by a
first-order Taylor series:
$$w = g(u) \approx g(\hat{u}_{LS,k}) + G_k (u - \hat{u}_{LS,k})$$
on iteration k; k = 1, …, kmax, where:
$$G_k = \left.\frac{\partial g(u)}{\partial u}\right|_{u = \hat{u}_{LS,k}}$$
Then the least-squares estimation equations at iteration k become:
$$\hat{u}_{LS,k+1} = \hat{u}_{LS,k} + [(G_k^T C_\varepsilon^{-1} G_k)^{-1} G_k^T C_\varepsilon^{-1}]\,[z - g(\hat{u}_{LS,k})]$$
The iteration is started with a “first guess” $\hat{u}_{LS,1}$ and then continued until the sequence
of estimates converges. An estimate of the state y = d(u) is obtained from the
converged estimate of u:
$$\hat{y}_{LS} = d(\hat{u}_{LS})$$
In practice, JLS may have many local minima in the nonlinear case and convergence is
not guaranteed (i.e. the estimation problem may be ill-posed).
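The iterative regression update can be sketched as follows. This is an illustration under stated assumptions: the exponential model `g`, its Jacobian `jac`, the first guess, and the stopping tolerance are hypothetical choices, not part of the lecture. As noted above, convergence is not guaranteed in general; this example starts close to the solution.

```python
import numpy as np

def gauss_newton(g, jac, z, C_eps, u_first, k_max=50, tol=1e-10):
    """Iterate u_{k+1} = u_k + (G_k^T C^-1 G_k)^-1 G_k^T C^-1 [z - g(u_k)]."""
    u = np.asarray(u_first, dtype=float)
    for _ in range(k_max):
        G_k = jac(u)                              # dg/du evaluated at u_k
        Ci_G = np.linalg.solve(C_eps, G_k)        # C^-1 G_k
        step = np.linalg.solve(G_k.T @ Ci_G, Ci_G.T @ (z - g(u)))
        u = u + step
        if np.linalg.norm(step) < tol:            # stop when the update is negligible
            break
    return u

# Hypothetical nonlinear model: w_i = u_0 * exp(-u_1 * t_i).
t = np.array([0.0, 1.0, 2.0, 3.0])
g = lambda u: u[0] * np.exp(-u[1] * t)
jac = lambda u: np.column_stack([np.exp(-u[1] * t),
                                 -u[0] * t * np.exp(-u[1] * t)])
u_true = np.array([2.0, 0.5])
z = g(u_true)                                     # noise-free synthetic data
u_hat = gauss_newton(g, jac, z, 0.01 * np.eye(4), u_first=[1.5, 0.3])
```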
Bayesian Estimation - Linear Multivariate Normal Problems
Bayesian estimation focuses on the conditional PDF f y|z(y|z). This PDF conveys all the
information about the uncertain state y contained in the measurement vector z.
Derivation of $f_{y|z}(y|z)$ (which is multivariate normal) is straightforward when u and z are
jointly normal. This requirement is met in our groundwater example if we assume that:
• $f_u(u)$ is multivariate normal with specified mean $\bar{u}$ and covariance $C_{uu}$
• $f_\varepsilon(\varepsilon)$ is multivariate normal with a zero mean and covariance $C_\varepsilon$
• The state and measurement equations are linear and the measurement error
is additive, so $y = Du$ and $z = My + \varepsilon = MDu + \varepsilon = Gu + \varepsilon$
• u and $\varepsilon$ are independent
In this case, $f_{y|z}(y|z)$ is completely defined by its mean and covariance, which can be
derived from the general expression for a conditional multivariate normal PDF:
$$\hat{y}_B = E(y|z) = \bar{y} + C_{yz} C_{zz}^{-1} [z - \bar{z}]$$
$$C_{yy|z} = C_{yy} - C_{yz} C_{zz}^{-1} C_{yz}^T$$
These expressions are equivalent to those obtained from kriging with a known
mean and optimal interpolation, when comparable assumptions are made.
Derivation of the Unconditional Mean and Covariance - Linear
Multivariate Normal Problems
The groundwater model enters the Bayesian estimation equations through the
unconditional mean $\bar{y}$ and the unconditional covariances $C_{yz}$ and $C_{zz}$. These can be
derived from the linear state and measurement equations and the specified
covariances $C_{uu}$ and $C_\varepsilon$:
$$\bar{y} = E[Du] = D\bar{u}$$
$$\bar{z} = E[Gu + \varepsilon] = G\bar{u}$$
$$C_{yy} = E[(y - \bar{y})(y - \bar{y})^T] = E[D(u - \bar{u})(u - \bar{u})^T D^T] = D C_{uu} D^T$$
$$C_{yz} = E[(y - \bar{y})(z - \bar{z})^T] = E[D(u - \bar{u})(Gu - G\bar{u} + \varepsilon)^T] = D C_{uu} G^T$$
$$C_{zz} = E[(z - \bar{z})(z - \bar{z})^T] = E[(Gu - G\bar{u} + \varepsilon)(Gu - G\bar{u} + \varepsilon)^T] = G C_{uu} G^T + C_\varepsilon$$
The conditional mean estimate $\hat{y}_B$ obtained from these expressions can be shown to
approach the least-squares estimate $\hat{y}_{LS}$ when $C_{uu} \to \infty$.
An approach similar to the one outlined above can be used to derive the conditional
mean and covariance of the uncertain input u.
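The conditional mean and covariance of y can be computed directly from the unconditional moments derived above. The sketch below assumes a hypothetical two-node problem with one measurement at the first node; the function name `bayes_linear` and all numerical values are illustrative only.

```python
import numpy as np

def bayes_linear(D, G, u_mean, C_uu, C_eps, z):
    """Conditional mean and covariance of y given z for y = D u, z = G u + eps."""
    y_mean, z_mean = D @ u_mean, G @ u_mean
    C_yy = D @ C_uu @ D.T
    C_yz = D @ C_uu @ G.T
    C_zz = G @ C_uu @ G.T + C_eps
    gain = np.linalg.solve(C_zz, C_yz.T).T     # C_yz C_zz^-1 (C_zz is symmetric)
    y_hat = y_mean + gain @ (z - z_mean)       # conditional mean
    C_post = C_yy - gain @ C_yz.T              # conditional covariance
    return y_hat, C_post

# Hypothetical 2-node problem: y = u, one measurement of the first node.
D = np.eye(2)
G = np.array([[1.0, 0.0]])
C_uu = np.array([[1.0, 0.5], [0.5, 1.0]])
C_eps = np.array([[0.25]])
y_hat, C_post = bayes_linear(D, G, np.zeros(2), C_uu, C_eps, z=np.array([1.0]))
```

In this example the conditional variance drops from 1.0 to 0.2 at the measured node and only to 0.8 at the unmeasured node, which is informed solely through its prior correlation with the first.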
Interpreting Bayesian Estimation Results
The conditional PDFs produced in the linear multivariate normal case are not particularly
informative in themselves. In practice, it is more useful to examine spatial plots of
scalar properties of these PDFs, such as the mean and standard deviation, or plots of
the marginal conditional PDFs at particular locations.
[Figure: contours of $\hat{y}_B = E[y|z]$; contours of the conditional standard deviation $\sigma_{y|z}$; marginal conditional PDF $f_{y|z}(y|z)$ of y at node 14]
The conditional mean is generally used as a point estimate of y while the conditional
standard deviation provides a measure of confidence in this estimate. Note that the
conditional standard deviation decreases near well locations, reflecting the local
information provided by the head measurements.
Bayesian Estimation - Nonlinear Problems
When the state and/or measurement vectors are nonlinear functions of the input, the
variables y and z are generally not multivariate normal, even if u and $\varepsilon$ are normal. In this
case, it is difficult to derive the conditional PDF f y|z(y| z) directly.
An alternative is to work with f u|z(u| z), the conditional PDF of u. Once f u|z(u| z) is
computed it may be possible to use it to derive f y|z(y| z) or some of its properties.
The PDF $f_{u|z}(u|z)$ can be obtained from Bayes' theorem:
$$f_{u|z}(u|z) = \frac{f_{z|u}(z|u)\, f_u(u)}{f_z(z)} = \frac{f_{z|u}(z|u)\, f_u(u)}{\int f_{z|u}(z|u)\, f_u(u)\, du}$$
We suppose that $f_u(u)$ and $f_\varepsilon(\varepsilon)$ are given (e.g. multivariate normal). If the
measurement error is additive but the transformations y = d(u) and w = m(y) are
nonlinear, then:
$$z = m(y) + \varepsilon = m[d(u)] + \varepsilon = g(u) + \varepsilon$$
and the PDF $f_{z|u}(z|u)$ is:
$$f_{z|u}(z|u) = f_\varepsilon[z - g(u)]$$
In this case, we have all the information required to apply Bayes' theorem.
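For a scalar input, Bayes' theorem can be applied numerically on a grid. The sketch below is illustrative only: the square-root model `g`, the prior mean and standard deviation, the error standard deviation, and the grid limits are all hypothetical. The constant factors of the two normal PDFs are omitted because they cancel when dividing by the integral in the denominator.

```python
import numpy as np

def posterior_on_grid(u_grid, z, g, u_mean, sigma_u, sigma_eps):
    """Evaluate f(u|z) on a uniform grid for scalar u with z = g(u) + eps."""
    prior = np.exp(-0.5 * ((u_grid - u_mean) / sigma_u) ** 2)         # f_u(u), unnormalized
    likelihood = np.exp(-0.5 * ((z - g(u_grid)) / sigma_eps) ** 2)    # f_eps[z - g(u)]
    unnormalized = likelihood * prior
    du = u_grid[1] - u_grid[0]
    return unnormalized / (unnormalized.sum() * du)   # divide by f_z(z) (Riemann sum)

u_grid = np.linspace(0.0, 14.0, 1401)
g = lambda u: np.sqrt(u)                              # hypothetical nonlinear model
post = posterior_on_grid(u_grid, z=2.0, g=g, u_mean=6.0, sigma_u=3.0, sigma_eps=0.3)

du = u_grid[1] - u_grid[0]
u_mode = u_grid[np.argmax(post)]                      # conditional mode
u_cond_mean = (u_grid * post).sum() * du              # conditional mean
```

This brute-force evaluation is only practical for one or two inputs, since the grid size grows exponentially with the dimension of u.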
Obtaining Practical Bayesian Estimates -- The Conditional Mode
For problems of realistic size the conditional PDF f u|z(u| z) is difficult to derive in
closed form and is too large to store in numerical form. Even when this PDF can be
computed, it is difficult to interpret. Usually spatial plots of scalar PDF properties
provide the best characterization of the system’s inputs and states.
In the nonlinear case it is difficult to derive exact expressions for the conditional
mean and standard deviation or for the marginal conditional densities. However, it is
possible to estimate the conditional mode (maximum) of $f_{u|z}(u|z)$.
[Figure: conditional PDF $f_{u|z}(u|z)$ of u (given z) for a scalar (single input) problem, plotted against u, with the mode (peak) marked]
Deriving the Conditional Mode
The conditional mode is derived by noting that the maximum (with respect to u) of the
PDF $f_{u|z}(u|z)$ is the same as the minimum of $-\ln[f_{u|z}(u|z)]$ (since $-\ln[\,\cdot\,]$ is a
monotonically decreasing function of its argument). From Bayes' theorem we have (for
additive measurement error):
$$J_B = -\ln[f_{u|z}(u|z)] = -\ln\left[\frac{f_{z|u}(z|u)\, f_u(u)}{f_z(z)}\right] = -\ln[f_\varepsilon(z - g(u))] - \ln[f_u(u)] + \ln[f_z(z)]$$
If $\varepsilon$ and u are multivariate normal this expression may be written as:
$$J_B = \frac{1}{2}[z - g(u)]^T C_\varepsilon^{-1} [z - g(u)] + \frac{1}{2}[u - \bar{u}]^T C_{uu}^{-1} [u - \bar{u}] + \text{terms that do not depend on } u$$
The estimated mode of $f_{u|z}(u|z)$ is the value of u (represented by $\hat{u}_{B,mode}$) which
minimizes $J_B$. Note that $J_B$ is an extended form of the least-squares error measure $J_{LS}$
used in nonlinear regression.
$\hat{u}_{B,mode}$ is found with an iterative search similar to the one used to solve the nonlinear
regression problem. This search usually converges better than the regression search
because the second term in $J_B$ tends to give a better defined minimum. This second term
is sometimes called a regularization term.
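As a supplementary worked step, the normal PDFs can be written out to show where the two quadratic terms come from and why the second one helps. The dimension symbols $n_z$ and $n_u$, the symbol $\varepsilon$ for the measurement error, and the linear special case $g(u) = Gu$ are assumptions introduced here for illustration.

```latex
\begin{align*}
-\ln f_\varepsilon[z - g(u)] &= \tfrac{1}{2}[z - g(u)]^T C_\varepsilon^{-1}[z - g(u)]
    + \tfrac{1}{2}\ln\!\left[(2\pi)^{n_z}\,|C_\varepsilon|\right] \\
-\ln f_u(u) &= \tfrac{1}{2}[u - \bar{u}]^T C_{uu}^{-1}[u - \bar{u}]
    + \tfrac{1}{2}\ln\!\left[(2\pi)^{n_u}\,|C_{uu}|\right] \\
% Linear special case g(u) = G u: setting the gradient of J_B to zero gives
\left(G^T C_\varepsilon^{-1} G + C_{uu}^{-1}\right)\hat{u}_{B,\mathrm{mode}}
    &= G^T C_\varepsilon^{-1} z + C_{uu}^{-1}\bar{u}
\end{align*}
```

The logarithmic terms do not depend on u and can be dropped. In the linear case, because $C_{uu}^{-1}$ is positive definite, the matrix on the left-hand side is invertible even when $G^T C_\varepsilon^{-1} G$ alone is singular, which is one way to see the regularizing effect of the second term.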
Iterative Solution of Nonlinear Bayesian Minimization Problems
In spatially distributed problems where the dimension of u is large a gradient-based
search is the preferred method for minimizing JB. The search is carried out iteratively,
with the new estimate (at the end of iteration k) computed from the old estimate (at
the end of iteration k -1) and the gradient of JB evaluated at the old estimate:
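$$\hat{u}_k = \mathcal{F}\left(\hat{u}_{k-1},\ \frac{\partial J_{B,k-1}}{\partial u}\right)$$
where $\mathcal{F}$ denotes the update rule of the search and:
$$\frac{\partial J_{B,k-1}}{\partial u} = \left.\frac{\partial J_B(u)}{\partial u}\right|_{u = \hat{u}_{k-1}}$$
[Figure: contours of $J_B$ for a problem with 2 uncertain inputs ($u_1$, $u_2$), with search steps from $\hat{u}_{k-1}$ to $\hat{u}_k$ shown in red]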
Conventional numerical computation of $\partial J_B / \partial u$ using, for example, a finite difference
technique, is very time-consuming, requiring order n model runs per iteration, where
n is the dimension of u. Variational (adjoint) methods can greatly reduce the effort
needed to compute $\partial J_B / \partial u$.
Variational (Adjoint) Methods for Deriving Search Gradients - 1
Variational methods obtain the search gradient $\partial J_B / \partial u$ indirectly, from the first
variation of a modified form of $J_B$. These methods treat the state equation as an
equality constraint. This constraint is adjoined to $J_B$ with a Lagrange multiplier (or
adjoint vector). To illustrate, consider a static interpolation problem with nonlinear
state and measurement equations and an additive measurement error:
$$y = d(u)$$
$$z = m(y) + \varepsilon$$
When the state equation is adjoined the part of $J_B$ that depends on u is:
$$J_B = \frac{1}{2}[z - m(y)]^T C_\varepsilon^{-1} [z - m(y)] + \frac{1}{2}[u - \bar{u}]^T C_{uu}^{-1} [u - \bar{u}] + \lambda^T [y - d(u)]$$
where $\lambda$ is the Lagrange multiplier (or adjoint) vector. At a local minimum
the first variation of $J_B$ must equal zero:
$$\delta J_B = \left\{ -[z - m(y)]^T C_\varepsilon^{-1} \frac{\partial m(y)}{\partial y} + \lambda^T \right\} \delta y + \left\{ [u - \bar{u}]^T C_{uu}^{-1} - \lambda^T \frac{\partial d(u)}{\partial u} \right\} \delta u = 0$$
If $\lambda$ is selected to ensure that the first bracketed term is zero then the
second bracketed term is the desired gradient $\partial J_B / \partial u$.
Variational (Adjoint) Methods for Deriving Search Gradients - 2
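The variational approach for computing $\partial J_B / \partial u$ on iteration k of the search can be
summarized as follows:
1. Compute state using input estimate $\hat{u}_{k-1}$ from iteration k-1:
$$\hat{y}_{k-1} = d(\hat{u}_{k-1})$$
2. Compute adjoint from new state:
$$\lambda_{k-1} = \left[\frac{\partial m_{k-1}}{\partial y}\right]^T C_\varepsilon^{-1} [z - m(\hat{y}_{k-1})]$$
3. Compute gradient at $\hat{u}_{k-1}$:
$$\frac{\partial J_{B,k-1}}{\partial u} = [\hat{u}_{k-1} - \bar{u}]^T C_{uu}^{-1} - \lambda_{k-1}^T \frac{\partial d_{k-1}(u)}{\partial u}$$
4. Compute new input estimate $\hat{u}_k$ (with $\mathcal{F}$ the update rule of the search):
$$\hat{u}_k = \mathcal{F}\left(\hat{u}_{k-1},\ \frac{\partial J_{B,k-1}}{\partial u}\right)$$
Here the subscripts k-1 on the partial derivatives $\partial m / \partial y$ and $\partial d / \partial u$ indicate that
they are evaluated at $\hat{y}_{k-1}$ and $\hat{u}_{k-1}$, respectively.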
There are many versions of this static variational algorithm, depending on the form
used to write the state equation. All of these give the same final result. In particular,
all require only one solution of the state equation, together with inversions of the
covariance matrices $C_\varepsilon$ and $C_{uu}$. When these matrices are diagonal (implying
uncorrelated input and measurement errors) the inversions are straightforward. When
correlation is included they can be computationally demanding.
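The state, adjoint, and gradient steps can be sketched for a small problem and checked against a finite-difference gradient. Everything specific below is an assumption made for illustration: the models `d` and `m`, the matrix `H`, the data `z`, identity inverse covariances, and the steepest-descent update with step size 0.01 (one possible choice of search rule, not necessarily the one used in the lecture).

```python
import numpy as np

def adjoint_gradient(u, z, u_mean, Cuu_inv, Ceps_inv, d, dd_du, m, dm_dy):
    """Gradient of J_B at u from the state, adjoint and gradient steps."""
    y = d(u)                                            # 1. state
    lam = dm_dy(y).T @ Ceps_inv @ (z - m(y))            # 2. adjoint vector
    return Cuu_inv @ (u - u_mean) - dd_du(u).T @ lam    # 3. gradient (as a column)

def J_B(u, z, u_mean, Cuu_inv, Ceps_inv, d, m):
    """Part of J_B that depends on u."""
    r, p = z - m(d(u)), u - u_mean
    return 0.5 * r @ Ceps_inv @ r + 0.5 * p @ Cuu_inv @ p

# Hypothetical nonlinear state and measurement models.
H = np.array([[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]])
d = lambda u: np.exp(0.5 * u)
dd_du = lambda u: np.diag(0.5 * np.exp(0.5 * u))
m = lambda y: H @ y**2
dm_dy = lambda y: 2.0 * H * y            # entry (i, j) is 2 * H[i, j] * y[j]

z = np.array([1.5, 1.0])
u_mean = np.zeros(3)
Cuu_inv, Ceps_inv = np.eye(3), np.eye(2)
u0 = np.array([0.2, -0.1, 0.4])

grad_adj = adjoint_gradient(u0, z, u_mean, Cuu_inv, Ceps_inv, d, dd_du, m, dm_dy)

# Central finite differences for comparison (2 * n evaluations of J_B).
J = lambda u: J_B(u, z, u_mean, Cuu_inv, Ceps_inv, d, m)
grad_fd = np.array([(J(u0 + 1e-6 * e) - J(u0 - 1e-6 * e)) / 2e-6 for e in np.eye(3)])

u1 = u0 - 0.01 * grad_adj                # 4. new input estimate (steepest descent)
```

The adjoint gradient uses one evaluation of the state model and one adjoint computation regardless of the dimension of u, whereas the finite-difference check needs two evaluations of J_B per input component.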
SGP97 Experiment - Soil Moisture Campaign
[Figure: case study area and aircraft microwave measurements]
Test of Variational Smoothing Algorithm – SGP97 Soil Moisture Problem
[Figure: flowchart of the Observing System Simulation Experiment (OSSE). Mean land-atmosphere boundary fluxes, soil properties and land use, and mean initial conditions, together with random input error and random initial condition error, drive the land surface model, which produces the “true” soil and canopy moisture and temperature. The radiative transfer model converts these to “true” radiobrightness, and random measurement error is added to give the “measured” radiobrightness. The variational algorithm uses the measured radiobrightness with the soil properties and land use, mean fluxes and initial conditions, and error covariances to produce estimated radiobrightness and soil moisture, from which the estimation error is computed.]
Synthetic Experiment (OSSE) based on SGP97 Field Campaign
Synthetic experiment uses real soil, landcover, and precipitation data from SGP97
(Oklahoma). Radiobrightness measurements are generated from our land surface and
radiative transfer models, with space/time correlated model error (process noise) and
measurement error added.
[Figure: SGP97 study area, showing principal inputs to the data assimilation algorithm]
Effects of Smoothing Window Configuration
Position and length of variational smoothing window affect estimation accuracy.
Estimation error is less for longer windows that are reinitialized just after (rather than
just before) measurement times.
[Figure: top node saturation rms error [-] versus day of year (170 to 182) for the reference experiment (rms = 0.029), 3 assim. intervals A (rms = 0.03), 12 assim. intervals B (rms = 0.032), and 12 assim. intervals C (rms = 0.038), with radiobrightness observation times and window configurations A, B, and C marked; a second panel shows precipitation [mm/d] versus day of year]
Effects of Precipitation Information
Variational algorithm performs well even without precipitation information. In this
case, soil moisture is inferred only from microwave measurements.
[Figure: top node saturation rms error [-] versus day of year (170 to 182) for the reference experiment (rms = 0.014), est - precip. withheld (rms = 0.034), and prior - precip. withheld (rms = 0.19); a second panel shows precipitation [mm/d] versus day of year]
Summary
The Bayesian estimation approach outlined above is frequently used to solve static
data assimilation (or interpolation) problems. It has the following notable features:
• When the state and measurement equations are linear and the inputs and
measurement errors are normally distributed, the conditional PDFs $f_{y|z}(y|z)$
and $f_{u|z}(u|z)$ are multivariate normal. In this case the Bayesian conditional
mean and Bayesian conditional mode approaches give the same point estimate
(i.e. the conditional mode is equal to the conditional mean).
• When the problem is nonlinear the Bayesian conditional mean and mode
estimates are generally different. The Bayesian conditional mean estimate is
generally not practical to compute for nonlinear problems of realistic size.
• The least-squares approach is generally less likely than the Bayesian approach
to converge to a reasonable answer for nonlinear problems since it does not
benefit from the “regularization” properties imparted by the second term in $J_B$.
• The variational (adjoint) approach greatly improves the computational efficiency
of the Bayesian conditional mode estimation algorithm, especially for large
problems.
• The input-oriented variational approach discussed here is a 3DVAR data
assimilation algorithm. This name reflects the fact that 3DVAR is used for
problems with variability in three spatial dimensions but not in time. 4DVAR
data assimilation methods extend the concepts discussed here to time-dependent
(dynamic) problems.