Path Integral Method and Speeding up MC with Nonlocal Strategies

Juan M. Restrepo
Department of Mathematics and Physics Department, University of Arizona

Collaborators: Gregory Eyink (Johns Hopkins), Frank Alexander (LANL)

Parsing the Problem

There is no single data assimilation problem once the challenges are considered:
• spatial/temporal scales
• high or low frequency of data insertion
• whether the number of state variables is small or large

Three Estimation Problems

Given a random time series {X(t): t ≤ t0}, with X(t) ∈ R^N:
• Prediction: estimate {x(t): t > t0}
• Filtering (nudiction): estimate {x(t0)}
• Smoothing (retrodiction): estimate {x(t): t ≤ t0}

Formulating the Equations

Discrete model:
L(x(0), …, x(t-δt), x(t), x(t+δt), …, x(tf), …, B(t)q(t), B(t)q(t+δt), …, t) = 0,
with x(t) ∈ R^N and Bq ∈ R^N.

Example: take δt = 1, so x(t+1) = L(x, Bq, t); if linear,
x(t+1) = A(t) x(t) + B(t) q(t), with A ∈ R^{N×N}.

Example (heat equation):
∂_t u(z,t) = ν ∂_zz u(z,t) + f(t),
u(z,0) = u0(z), u(0,t) = g(t), u(1,t) = h(t).

Discretizing with x(t) ≡ [u1(t), u2(t), …, uN(t)]ᵀ gives
x(t+δt) = A x(t) + B q(t),
x(t) = A x(t-δt) + B q(t-δt),
and so on, which leads back to the general form
L(x(0), …, x(t-δt), x(t), x(t+δt), …, x(tf), …, Bq(t), Bq(t+δt), …, t) = 0.
(A code sketch of building A and B for this example appears below, after the enKF slide.)

Statement of the Problem

Stochastic dynamics (Langevin problem), with measurements y1, …, yM.
GOAL: find the mean of x conditioned on the measurements,
xS(t) = E[x(t) | y1, …, yM],
and the covariance matrix (uncertainty),
CS(t) = E[(x(t) - xS(t))(x(t) - xS(t))ᵀ | y1, …, yM].
The conditional mean xS(t) minimizes tr CS(t) = E[|x(t) - xS(t)|² | y1, …, yM]. It is termed the smoother estimate.

A Nonlinear Example

Stochastic dynamics (Langevin problem):
dX(t) = f(X(t)) dt + κ dW(t),
with V(x) = -2x² + x⁴, f(x) = -V'(x) = 4x(1 - x²), and κ = 0.5.

Measurements: at times mΔt, m = 1, …, M, one observes ym := X(tm) + ρ Nm, with measured values Ym, m = 1, 2, …, M. (A simulation sketch for this example follows below.)

[Figure: observations Ym at times tm, overlaid on the path y(t).]

Extended Kalman Filter

Alternative Approaches
• KSP: optimal, but impractical
• Adjoint/4D-Var (Restrepo, Leaf, Griewank, SIAM J. Sci. Comput. 1995)
• Mean-field variational method (Eyink, Restrepo, Alexander, Physica D, 2003)
• enKF (ensemble Kalman filter)
• Particle method (Kim, Eyink, Restrepo, Alexander, Johnson, Mon. Wea. Rev. 2002)
• Path integral method (Alexander, Eyink, Restrepo, J. Stat. Phys. 2005)

KSP (Kushner, Stratonovich, Pardoux)

Prediction/filter: solve
∂_t P = -∂_x [f(x) P] + (κ²/2) ∂_xx P, P(x,0) = Ps(x),
between tm < t < tm+1, m = 0, 1, …, M. At the measurement times tm,
P(x, tm⁺) = (1/𝒩) exp(Ym x/ρ² - x²/(2ρ²)) P(x, tm⁻).

Smoothing: solve the final-value problem
∂_t A = -f(x) ∂_x A - (κ²/2) ∂_xx A, A(x, t ≥ tM) = 1,
between tm+1 ≥ t ≥ tm, m = M, …, 1. At the measurement times tm,
A(x, tm⁻) = (1/𝒩) exp(Ym x/ρ² - x²/(2ρ²)) A(x, tm⁺).

Then, for each m = 1, 2, …, M and tm < t < tm+1,
P(x, t | yj = Yj, j = 1, 2, …, M) = A(x, t) P(x, t).

Kolmogorov Equation

∂_t P = -∂_x [f(x) P] + (κ²/2) ∂_xx P, with P(x,t) → Ps(x) as t → ∞.

KSP Filter Results

Why not KSP? Impractical! Consider a very tiny example: a scalar PDE in 1-d space and time, solved on a lattice of N points in space, so dim(X(t)) = N. Each component of X(t) requires the solution of 2 stiff PDEs. (A finite-difference sketch of the forward sweep appears below.)

enKF (ensemble)

Combine the forecast with the measurements: use linear interpolation via the Kalman gain.
• Between observations: run the full model P times, and use Monte Carlo to get the conditional mean and variance.
• Pros/cons:
  • conditional statistics
  • linear interpolation
  • error in the PDF ~ 1/P^{1/2}
  • need to run the model fully resolved
  • conveniently incorporates legacy code
(A sketch of the enKF analysis step closes this group of examples below.)
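The linear discrete model above can be made concrete. The following is a minimal sketch, assuming an explicit Euler time discretization of the heat equation on N interior points of [0,1] with homogeneous boundary data; the grid size, diffusivity, and time step are illustrative choices, not values from the talk.

import numpy as np

# Sketch of the linear discrete model x(t+dt) = A x(t) + B q(t)
# for u_t = nu*u_zz + f(t), explicit Euler on N interior grid points.
N, nu = 50, 1.0e-2
dz = 1.0 / (N + 1)
dt = 0.4 * dz**2 / nu            # explicit stability needs nu*dt/dz^2 <= 1/2

lap = (np.diag(-2.0 * np.ones(N)) +
       np.diag(np.ones(N - 1), 1) +
       np.diag(np.ones(N - 1), -1)) / dz**2
A = np.eye(N) + dt * nu * lap    # one-step propagator
B = dt * np.eye(N)               # forcing/noise enters additively

x = np.zeros(N)                  # stand-in for the initial data u0(z)
q = np.ones(N)                   # stand-in for the forcing f(t)
x = A @ x + B @ q                # one step of the discrete model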
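For the nonlinear double-well example, a minimal Euler-Maruyama simulation with synthetic observations might look as follows; ρ, Δt, the integration step, and the initial condition are assumed values for illustration.

import numpy as np

rng = np.random.default_rng(0)
kappa, rho = 0.5, 0.2            # rho is an assumed observation-noise level
dt, T = 1.0e-3, 10.0
nsteps = int(T / dt)

def f(x):                        # f = -V'(x) for V(x) = -2x^2 + x^4
    return 4.0 * x * (1.0 - x**2)

# Euler-Maruyama path of dX = f(X) dt + kappa dW
X = np.empty(nsteps + 1)
X[0] = 1.0
for k in range(nsteps):
    X[k + 1] = X[k] + f(X[k]) * dt + kappa * np.sqrt(dt) * rng.standard_normal()

# Observations Y_m = X(t_m) + rho*N_m at times m*Dt, m = 1..M
M, Dt = 10, 1.0
obs_idx = (np.arange(1, M + 1) * Dt / dt).astype(int)
Y = X[obs_idx] + rho * rng.standard_normal(M)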
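To see what the KSP forward (prediction/filter) sweep involves, here is a rough finite-difference sketch for the scalar example; the grid, the time step, and the starting density are assumptions, and the backward sweep for A would mirror it. Even this crude version hints at the cost that makes KSP impractical in higher dimensions.

import numpy as np

kappa, rho = 0.5, 0.2
x = np.linspace(-2.0, 2.0, 401)       # assumed computational grid
dx = x[1] - x[0]
dt = 0.2 * dx**2 / (0.5 * kappa**2)   # explicit-scheme stability bound

f = 4.0 * x * (1.0 - x**2)
P = np.exp(-2.0 * (-2.0 * x**2 + x**4) / kappa**2)  # start near P_s(x)
P /= P.sum() * dx

def fokker_planck_step(P):
    # dP/dt = -d/dx [f P] + (kappa^2/2) d2P/dx2, centered differences
    dflux = np.gradient(f * P, dx)
    d2P = np.gradient(np.gradient(P, dx), dx)
    return P + dt * (-dflux + 0.5 * kappa**2 * d2P)

def measurement_update(P, Ym):
    # Bayes' rule at t_m: multiply by the Gaussian likelihood, renormalize
    P = P * np.exp(-(x - Ym)**2 / (2.0 * rho**2))
    return P / (P.sum() * dx)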
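The enKF analysis step described above can be sketched in a few lines. This uses the common perturbed-observation variant for a scalar observation; the function name, shapes, and the choice of variant are illustrative assumptions, not specifics from the talk.

import numpy as np

def enkf_analysis(Xf, Ym, H, rho, rng):
    # Xf: N x P forecast ensemble; Ym: observed value; H: 1 x N observation operator
    P = Xf.shape[1]
    Xm = Xf.mean(axis=1, keepdims=True)
    Ap = Xf - Xm                              # ensemble anomalies
    C = Ap @ Ap.T / (P - 1)                   # ensemble covariance
    S = H @ C @ H.T + rho**2                  # innovation variance
    K = (C @ H.T) / S                         # Kalman gain, N x 1
    # perturbed observations keep the analysis spread consistent
    Yp = Ym + rho * rng.standard_normal(P)
    return Xf + K @ (Yp - H @ Xf).reshape(1, P)

The linear interpolation between forecast and data happens in the last line: the gain K, built from Monte Carlo statistics of the P ensemble members, weights the innovation.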
Path Integral Method

• Related to simulated annealing
• Can be developed as a black box
• Simple to implement
• Can handle nonlinear/non-Gaussian problems
• Calculates the mean and the uncertainty
• PROBLEM: relies on MC!

The dynamics are discretized using the explicit Euler-Maruyama scheme. The probability of the dynamics generating a given history is related to the probability that it experiences a certain noise history η(tk) = W(tk + δt) - W(tk) at the times tk, k = 0, 1, 2, …. For Gaussian, uncorrelated noise this probability is
Prob[η] ∝ exp(-½ Σk |η(tk)|²).

In general, Prob[η] ∝ exp(-H_dyn) over the lattice t = t0, t1, …, tT, where
H_dyn ≡ ¼ Σ_{k=0}^{T-1} [(x_{k+1} - x_k)/δt - f(x_k, tk)]ᵀ D⁻¹(x_k, tk) [(x_{k+1} - x_k)/δt - f(x_k, tk)].

To include the influence of the observations, use Bayes' rule. This modifies the Hamiltonian:
H_obs = Σ_{m=1}^{M} [h(x(tm)) - y(tm)]ᵀ R⁻¹ [h(x(tm)) - y(tm)].

The total Hamiltonian: H = H_dyn + H_obs. The Hamiltonian is the negative log-likelihood (up to an additive constant). There is a very close relationship between the approach we take here and 4D-Var, used extensively in the ocean/climate community: maximum-likelihood methods minimize the Hamiltonian. Instead, we are going to sample the Gibbs distribution exp(-H)/Z.

SAMPLING
• Hybrid Monte Carlo (HMC)
• Unigrid Monte Carlo (UMC)
• Generalized Hybrid Monte Carlo (GHMC)

Hybrid Monte Carlo

• Molecular dynamics: used to propose a new system configuration.
• Metropolis MC: accept/reject based on the energy.

A configuration is specified by the degrees of freedom q0, q1, …, qT, with qi ∈ R^N. The HMC algorithm works as follows. To each qi, a conjugate generalized momentum pi is assigned. The momenta pi give rise to a kinetic energy HK = Σi pi²/2, and the total Hamiltonian of the system is H_tot = H + HK. The dynamics are
dqi/dτ = pi, dpi/dτ = Fi,
where Fi = -∂H/∂qi is the force on the i-th degree of freedom.

What's going on? Write the probability as P(x) = exp(-E(x))/Z. E(x) and grad E(x) are easily evaluated, and the gradient indicates which direction one should go to find states with higher probability. Note that H_tot(q,p) = H(q) + HK(p), so
P(q,p) = exp(-H_tot(q,p))/Z = exp(-H(q)) exp(-HK(p))/Z
is separable; the marginal distribution over q is therefore exp(-H(q))/Zq.

1) A chain of states (qi', pi'), i = 0, 1, 2, …, T, is generated by evolving the discretized Hamiltonian dynamics for J steps.
2) Detailed balance is achieved if the configuration obtained after evolving J steps is accepted with probability min[1, exp(-ΔH_tot)], where ΔH_tot = H_tot(q',p') - H_tot(q,p). The Metropolis step corrects for the time-discretization errors.
3) p is refreshed after every acceptance/rejection by drawing independent Gaussians from the distribution ∝ exp(-HK).

Unigrid Monte Carlo

Updates the system by taking coherent moves on a number of "length" scales. Decompose the system into blocks of contiguous lattice points, with block sizes B = 1, 2, 4, …, 2^s; B = 1 is the standard local Metropolis. Update: to each site in a block B, a random (Gaussian) displacement is added to the local value; Metropolis accept/reject as before.

Generalized HMC

The dynamics are replaced by
dqi/dτ = (A p)i, dpi/dτ = (Aᵀ F)i,
where A is an N×N matrix; when A = I we recover HMC. Discretizing:
q' = q + δτ A p + ½ δτ² A Aᵀ F(q),
p' = p + ½ δτ Aᵀ (F(q) + F(q')).

Challenge: find an A that leads to a significant reduction of the correlation time. We used the circulant matrix A = circ(1, e^{-α}, e^{-2α}, …, e^{-Tα}). (Both samplers are sketched below.)
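The following sketch ties the pieces above together for the scalar double-well example: it evaluates H = H_dyn + H_obs on a discretized path and runs plain HMC sweeps. The lattice size, step sizes, J, the placeholder data, and the stand-ins h = identity, D = κ², R = ρ² are all assumptions for illustration.

import numpy as np

rng = np.random.default_rng(1)
dt, Tlat = 0.05, 200                    # lattice spacing and size (assumed)
kappa, rho = 0.5, 0.2
D = kappa**2                            # stand-in scalar noise covariance D
obs_idx = np.arange(20, Tlat + 1, 20)   # observation sites on the lattice
Y = rng.standard_normal(obs_idx.size)   # placeholder for measured values Y_m

def f(x):  return 4.0 * x * (1.0 - x**2)
def fp(x): return 4.0 - 12.0 * x**2

def hamiltonian(x):
    r = (x[1:] - x[:-1]) / dt - f(x[:-1])
    H_dyn = 0.25 * np.sum(r * r) / D
    H_obs = np.sum((x[obs_idx] - Y)**2) / rho**2   # h(x) = x, R = rho^2
    return H_dyn + H_obs

def grad_H(x):
    r = (x[1:] - x[:-1]) / dt - f(x[:-1])
    g = np.zeros_like(x)
    g[1:]  += 0.5 * r / (D * dt)                       # d r_k / d x_{k+1}
    g[:-1] += 0.5 * r * (-1.0 / dt - fp(x[:-1])) / D   # d r_k / d x_k
    g[obs_idx] += 2.0 * (x[obs_idx] - Y) / rho**2
    return g

def hmc_sweep(x, eps=1e-3, J=20):
    p = rng.standard_normal(x.size)     # refresh momenta from exp(-HK)
    H0 = hamiltonian(x) + 0.5 * p @ p
    xn = x.copy()
    F = -grad_H(xn)
    for _ in range(J):                  # J leapfrog (MD) steps
        p += 0.5 * eps * F
        xn += eps * p
        F = -grad_H(xn)
        p += 0.5 * eps * F
    H1 = hamiltonian(xn) + 0.5 * p @ p  # Metropolis corrects step errors
    return xn if rng.random() < np.exp(min(0.0, H0 - H1)) else x

x = np.zeros(Tlat + 1)
for _ in range(2000):
    x = hmc_sweep(x)                    # x samples the Gibbs density exp(-H)/Z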
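A corresponding GHMC update with the circulant A just described might look like this; the value of α and the lattice size are assumed, and grad_H is the path-Hamiltonian gradient from the previous sketch.

import numpy as np
from scipy.linalg import circulant

alpha, Tlat = 0.1, 200                  # assumed alpha; lattice size as above
A = circulant(np.exp(-alpha * np.arange(Tlat + 1)))

def ghmc_leapfrog(q, p, F, eps, grad_H):
    # one step of q' = q + eps*A*p + (eps^2/2)*A*A^T*F(q),
    #             p' = p + (eps/2)*A^T*(F(q) + F(q'))
    qn = q + eps * (A @ p) + 0.5 * eps**2 * (A @ (A.T @ F))
    Fn = -grad_H(qn)
    pn = p + 0.5 * eps * (A.T @ (F + Fn))
    return qn, pn, Fn

The accept/reject with min[1, exp(-ΔH_tot)] and the momentum refreshment are unchanged from HMC; the point of A is that a single step now moves all lattice sites coherently, which is what reduces the correlation time.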
RESULTS

The results are for the nonlinear double-well example above: dX(t) = f(X(t)) dt + κ dW(t), with V(x) = -2x² + x⁴, f(x) = -V'(x) = 4x(1 - x²), κ = 0.5, and observations Ym = X(tm) + ρ Nm at times mΔt, m = 1, …, M.

[Table: decorrelation time versus the number of lattice points T+1, the number J of MD steps per sweep, and the parameter α in A, with standard deviations.]

Conclusions
• GHMC effects a significant speedup.
• The GHMC results can be applied to the enKF.
• The path integral method can be used as a black-box data assimilator.

Questions
• The Euler-Maruyama error is O(δt): other time integrators?
• What is the optimal A matrix in GHMC?
• Combining the path integral method with closure techniques when N and T are large.

Further information: http://www.physics.arizona.edu/~restrepo

Education
• Not enough people are trained to do it.
• The training sequence is not yet mature.
• The problem is interdisciplinary.

Data Assimilation (Arizona)

Prerequisites: graduate numerical analysis sequence (linear algebra); probability theory.

Program, Semester 1:
Inverse problem: the deterministic problem; optimization; sensitivity analysis.