Cambridge (Newton Institute), December 2015

NO EQUATIONS, NO VARIABLES, NO PARAMETERS:
Data, and the Computational Modeling of Complex/Multiscale Dynamical Systems

I. G. Kevrekidis, W. Gear, R. Coifman
- and other good people like Carmeline Dsilva and Ronen Talmon
Department of Chemical Engineering, PACM & Mathematics, Princeton University, Princeton, NJ 08544
This semester: Hans Fischer Fellow, IAS-TUM Muenchen

Clustering and stirring in a plankton model: an "equation-free" demo
Young, Roberts and Stuhne, Nature 2001

Dynamics of the system with convection

Coarse Projective Integration (coarse projective forward Euler)

Projective Integration: from t = 2, 3, 4, 5 to t = 10

Projective Integration - a sequence of outer integration steps based on an inner simulator + estimation (stochastic inference).
Accuracy and stability of these methods: NEC/TR 2001 (w/ C. W. Gear), SIAM J. Sci. Comput. 2003, J. Comput. Phys. 2003; and coarse projective integration (inner lattice-Boltzmann), Comput. Chem. Eng. 2002.

Projective methods in time:
- perform detailed simulation for short periods, or use existing/legacy codes,
- and then extrapolate forward over large steps in time.

Patch Dynamics Computation (schematic, in space and time):
(I) Full scale - running fine-scale simulations over the full spatial and temporal domain is expensive.
(II) Coarse Projective Integration - run fine-scale simulations in only a fraction of the temporal domain.
(III) Gap-tooth scheme - run fine-scale simulations in only a fraction of the spatial domain.
(IV) Patch Dynamics - run fine-scale simulations in only a fraction of both the spatial and the temporal domain.
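A minimal sketch of the coarse projective forward Euler loop described above, assuming a black-box inner simulator: the names fine_step, restrict, lift and the step sizes are illustrative placeholders, not the authors' code.

```python
import numpy as np

def coarse_projective_euler(u0, fine_step, restrict, lift,
                            n_inner=10, dt_inner=1e-3, Dt_outer=0.1, n_outer=50):
    """Alternate short bursts of fine-scale simulation with large extrapolative steps.
    fine_step, restrict, lift are user-supplied (hypothetical) callables."""
    u = u0
    coarse_history = [restrict(u)]
    for _ in range(n_outer):
        # (1) inner integration: run the fine-scale simulator for a few small steps
        burst = []
        for _ in range(n_inner):
            u = fine_step(u, dt_inner)
            burst.append(restrict(u))
        # (2) estimate the coarse time derivative from the tail of the burst
        dUdt = (burst[-1] - burst[-2]) / dt_inner
        # (3) outer (projective) forward Euler step over the large interval Dt_outer
        U_new = burst[-1] + Dt_outer * dUdt
        # (4) lift: construct a consistent fine-scale state at the new coarse value
        u = lift(U_new)
        coarse_history.append(U_new)
    return np.array(coarse_history)
```

For example, fine_step could wrap a stochastic or lattice-Boltzmann simulator, restrict could compute low-order moments of the particle distribution, and lift could sample a fine-scale state consistent with those moments.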
So, the main point:
• You have a "detailed simulator" --- or an experiment.
• Somebody tells you what the good coarse variable(s) are.
• Then you can use the IDEA - that a coarse equation exists - to accelerate the simulation / extraction of information: Equation-Free.
• BUT: how do we know what the right coarse variables are? Data mining techniques - Diffusion Maps.

Where do good coarse variables come from?
• Systematic hierarchy (and mathematics) - Fourier modes, moments, ...
• Experience/expertise/knowledge/brilliance - human learning ("brain" data mining, observation, phase fields, ...)
• Machine learning - data mining, manifold learning (here: diffusion maps)

And now for something completely different: moving groups of individuals (T. Frewen)

Fish Schooling Models (Couzin, Krause, Franks & Levin, Nature 433, 513, 2005)
Initial state: position, direction, speed $c_i(t),\ v_i(t),\ s_i(t)$, for INFORMED and UNINFORMED individuals.
Compute the desired direction $d_i(t+\Delta t)$:
- Zone of deflection ($R_{ij}$ within the repulsion radius): $d_i(t+\Delta t) = -\sum_{j \ne i} \dfrac{c_j(t)-c_i(t)}{\lvert c_j(t)-c_i(t)\rvert}$
- Zone of attraction ($R_{ij}$ within the attraction radius): $d_i(t+\Delta t) = \sum_{j \ne i} \dfrac{c_j(t)-c_i(t)}{\lvert c_j(t)-c_i(t)\rvert} + \sum_{j} \dfrac{v_j(t)}{\lvert v_j(t)\rvert}$
Normalize to obtain $\hat d_i(t+\Delta t)$.
Update the direction for INFORMED individuals ONLY, balancing the social direction against the preferred direction $g_i$ with weight $\omega$:
$d_i'(t+\Delta t) = \dfrac{\hat d_i(t+\Delta t) + \omega\, g_i}{\lvert \hat d_i(t+\Delta t) + \omega\, g_i\rvert}$
Update positions: $c_i(t+\Delta t) = c_i(t) + v_i(t+\Delta t)\, s_i(t)\, \Delta t$.

INFORMED DIRECTION: STICK STATES
STUCK ~ typically around a reaction-coordinate value of about 0.5; the INFORMED individual is close to the front of the group, away from the centroid.

INFORMED DIRECTION: SLIP STATES
SLIP ~ a wider range of reaction-coordinate values, roughly 0 to 0.35; the INFORMED individual is close to the group centroid.

Dataset as a Weighted Graph
The data points are the vertices; the edges carry weights
$W_{ij} = \exp\!\left(-\dfrac{\lVert x_i - x_j \rVert^2}{\varepsilon^2}\right)$, with parameter $\varepsilon \in \mathbb{R}$.

Diffusion maps on N data points $\{x_i\}$:
1. Compute the "neighborhood" kernel matrix $K_{ij} = \exp\!\left(-\dfrac{\lVert x_i - x_j\rVert^2}{\varepsilon^2}\right)$; the parameter $\varepsilon$ sets the local neighborhood size.
2. Compute the diagonal normalization matrix $D_{ii} = \sum_{j=1}^{N} K_{ij}$.
3. Compute the Markov matrix $M = D^{-1}K$.
4. Compute the eigenvalues and eigenvectors of M: $M\phi_k = \lambda_k \phi_k$, with $1 = \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N$.
A few leading eigenvalue/eigenvector pairs provide meaningful information on the dataset geometry.

Eigenfunctions of the Laplacian to Parameterize a Manifold
Consider a manifold: we want a "good" parameterization of it. Instead of the manifold itself, we have data sampled from it and want to obtain a parameterization from the data. Good parameterizations are eigenfunctions of the Laplace operator, $\Delta\phi = \lambda\phi$; we therefore approximate the Laplacian on the data. The same construction gives a parameterization of a curved manifold. Diffusion maps approximate these eigenfunctions using data sampled from the manifold.

Diffusion Maps: from a dataset in (x, y, z), i.e. N data points $x_i = (x_i, y_i, z_i),\ i = 1,\dots,N$, the diffusion-map eigencomputation yields N embedded points $(\phi_{2,i}, \phi_{3,i}),\ i = 1,\dots,N$.
R. Coifman, S. Lafon, A. Lee, M. Maggioni, B. Nadler, F. Warner, and S. Zucker, Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. PNAS 102 (2005).
B. Nadler, S. Lafon, R. Coifman, and I. G. Kevrekidis, Diffusion maps, spectral clustering and reaction coordinates of dynamical systems. Appl. Comput. Harmon. Anal. 21 (2006).
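A minimal numpy sketch of the diffusion-map construction summarized above (kernel, row normalization, Markov matrix, leading eigenvectors); the kernel scale eps and the random example data are assumptions for illustration.

```python
import numpy as np

def diffusion_maps(X, eps, n_coords=2):
    """X: (N, d) array of data points; returns leading nontrivial eigenpairs of M = D^-1 K."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # ||x_i - x_j||^2
    K = np.exp(-sq_dists / eps**2)               # neighborhood / kernel matrix
    D = K.sum(axis=1)                            # D_ii = sum_j K_ij
    # eigendecompose the symmetric conjugate D^{-1/2} K D^{-1/2} (same spectrum as M)
    M_sym = K / np.sqrt(np.outer(D, D))
    evals, evecs = np.linalg.eigh(M_sym)
    order = np.argsort(-evals)                   # sort by decreasing eigenvalue
    evals, evecs = evals[order], evecs[:, order]
    phi = evecs / np.sqrt(D)[:, None]            # right eigenvectors of M = D^{-1} K
    # phi[:, 0] is the trivial constant eigenvector (lambda_1 = 1);
    # phi[:, 1:n_coords+1] are the diffusion-map coordinates (phi_2, phi_3, ...)
    return evals[1:n_coords + 1], phi[:, 1:n_coords + 1]

# toy usage: embed 300 points in 3D with the two leading nontrivial coordinates
X = np.random.default_rng(0).normal(size=(300, 3))
lam, phi = diffusion_maps(X, eps=1.0)
```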
Diffusion Map $(\phi_2, \phi_3)$ for the schooling model
- Report the absolute distance of every uninformed individual to the informed individual to the DMAP routine (ABSOLUTE coordinates), or
- report the signed distance of every uninformed individual to the informed individual to the DMAP routine (SIGNED coordinates).
[Figure: $(\phi_2,\phi_3)$ embeddings for both cases; STICK and SLIP states separate along the leading coordinates, and the machine-detected DMAP coordinate tracks the man-chosen reaction coordinate for both the ABSOLUTE and the SIGNED input.]

Effective simplicity
• Construct predictive models (deterministic, Markovian).
• Get information from them: CALCULUS, Taylor series
  - derivatives in time to jump in time,
  - derivatives in parameter space for sensitivity / optimization,
  - derivatives in phase space for contraction mappings,
  - derivatives in physical space for PDE discretizations.
In complex systems there are no derivatives at the level we need them, and sometimes no variables: no calculus.
If we know what the right variables are, we can PERFORM differential operations on the right variables - a Calculus for Complex Systems.

Nonlinear Intrinsic Variables
• The key part of diffusion maps is constructing the appropriate distance metric for the problem.
• We would like to construct a distance metric that is invariant to the observation function (under some conditions) and instead respects the underlying dynamics of the system.
A. Singer and R. R. Coifman, Applied and Computational Harmonic Analysis, 2008.

Chemical Reaction Network Example
• We simulate the dynamics of a chemical reaction network using the Gillespie Stochastic Simulation Algorithm.
• The rate constants are such that the reaction coordinates quickly approach an apparent two-dimensional manifold.

Computing the Distance Metric from Empirical Data
We have data from a trajectory or time series. For each data point, we look at the local trajectory in order to sample the local noise, and we estimate the local covariance from that local trajectory. We then consider the distance metric
$d^2(x_i, x_j) = 2\,(x_i - x_j)^T \big(C(x_i) + C(x_j)\big)^{\dagger} (x_i - x_j)$,
where C is the estimated noise covariance. This is the Mahalanobis distance, and it is invariant to the observation function (provided the observation function is invertible). We then use the Mahalanobis distance in a DMAPS computation and call the resulting DMAPS variables Nonlinear Intrinsic Variables (NIV).
C. J. Dsilva et al., The Journal of Chemical Physics, 2013.
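A minimal sketch of the Mahalanobis-distance construction just described: local covariances are estimated from short trajectory bursts around each data point, and the resulting kernel can then be row-normalized and eigendecomposed exactly as in the DMAPS sketch above to obtain NIVs. The burst format and parameter names here are assumptions, not the authors' code.

```python
import numpy as np

def local_covariances(bursts):
    """bursts: list of (n_steps, d) arrays, one short local trajectory per base point."""
    return [np.cov(b, rowvar=False) for b in bursts]

def mahalanobis_sq(xi, xj, Ci, Cj):
    """d^2(x_i, x_j) = 2 (x_i - x_j)^T (C(x_i) + C(x_j))^+ (x_i - x_j)."""
    diff = xi - xj
    return 2.0 * diff @ np.linalg.pinv(Ci + Cj) @ diff

def niv_kernel(points, covs, eps):
    """Kernel matrix built from the Mahalanobis distance; feed it into a DMAPS
    eigencomputation (as sketched earlier) to obtain the NIV coordinates."""
    N = len(points)
    K = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            K[i, j] = np.exp(-mahalanobis_sq(points[i], points[j],
                                             covs[i], covs[j]) / eps**2)
    return K
```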
Partial Observations and NIV Embeddings
• We simulate the dynamics of a chemical reaction network using the Gillespie Stochastic Simulation Algorithm.
• The rate constants are such that the reaction coordinates quickly approach an apparent two-dimensional manifold.
In one data set we observe the concentrations of E, S, S:E, and F:E*; in the other we observe the concentrations of E, S, E:S, and D:S*. We obtain a consistent NIV embedding for the two data sets.
D. T. Gillespie, The Journal of Physical Chemistry, 1977.

Reconstructing Missing Components
We would like to exploit the consistency of the embeddings to "fill in" the missing components in each data set:
1. Train a function f(x): NIV space → chemical space, using data set 1 as training data. We use a multiscale Laplacian Pyramids algorithm for training; we could also use splines, nearest neighbors, etc.
2. Compute the NIV embedding of data set 2.
3. Use f(x) to estimate the missing components in data set 2.
We can successfully predict the concentrations of the missing components in data set 2.
N. Rabin and R. R. Coifman, SDM, 2012.

Another data problem, and a small miracle
• NIVs also help solve a major multiscale data problem.
• There are two kinds of reduction:
  - one in which the value of the fast variable (its quasi-steady state) becomes slaved to the slow variable, and
  - one in which the statistics of the fast variable (its quasi-invariant measure) become slaved to the slow variable.
Two types of reduction: "ODE" and "SDE".

Manifold Learning Applied to Reducible SDEs
Consider the SDE system
$dx = a\,dt + dW_1$
$dy = -\dfrac{y}{\varepsilon}\,dt + \dfrac{1}{\sqrt{\varepsilon}}\,dW_2$
If $\varepsilon \ll 1$, then x is the only slow variable (here a = 3, ε = 1e-3).
Issue: y is still O(1), so we cannot recover only x using DMAPS.

Rescaling to Recover the Slow Variables
Consider the transformation $z = \sqrt{\varepsilon}\, y$. The SDEs become
$dx = a\,dt + dW_1$
$dz = -\dfrac{z}{\varepsilon}\,dt + dW_2$
Note: after this transformation, both SDEs have noise of unit variance. Now z is of a much smaller scale than x, and DMAPS will recover x.

NIVs to Recover the Slow Variables in the Presence of Nonlinearities
We can also recover NIVs when the data are obscured by a nonlinear measurement function, e.g. observing $f_1(x, y) = (y + 5)\cos(2x)$ and $f_2(x, y) = (y + 5)\sin(2x)$ instead of (x, y). Looking at local clouds in the observed data, we again recover the slow variable x.
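A minimal Euler-Maruyama sketch of the fast/slow SDE pair above, including the rescaling $z = \sqrt{\varepsilon}\,y$ that shrinks the fast direction so that DMAPS picks out x. The parameter values a = 3 and ε = 1e-3 follow the slide; the step size and number of steps are assumptions.

```python
import numpy as np

def simulate_fast_slow(a=3.0, eps=1e-3, dt=1e-5, n_steps=200_000, seed=0):
    """Euler-Maruyama for dx = a dt + dW1, dy = -(y/eps) dt + eps^{-1/2} dW2."""
    rng = np.random.default_rng(seed)
    x, y = 0.0, 0.0
    traj = np.empty((n_steps, 2))
    for k in range(n_steps):
        dW1, dW2 = rng.normal(0.0, np.sqrt(dt), size=2)
        x += a * dt + dW1                          # slow variable
        y += -(y / eps) * dt + dW2 / np.sqrt(eps)  # fast variable, O(1) fluctuations
        traj[k] = (x, y)
    return traj

traj = simulate_fast_slow()
x, y = traj[:, 0], traj[:, 1]
z = np.sqrt(1e-3) * y          # rescaled fast variable: O(sqrt(eps)) instead of O(1)
print(x.std(), y.std(), z.std())   # y is O(1); z is much smaller than the spread in x
```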
Reverse Projective Integration - a sequence of outer integration steps backward, based on forward steps + estimation.
Reverse integration: a little forward, a lot backward! We are studying the accuracy and stability of these methods.

Free Energy Landscape
Right-handed α-helical minimum

Just like a stable manifold computation
The instantaneous ring position $\Gamma(\alpha, t)$ consists of a set of replicas (nodes) $\Gamma_1(t), \Gamma_2(t), \dots, \Gamma_N(t)$, with periodic ring boundary conditions and $\alpha$ the arc length. Evolve the ring with normal velocity $u = (\nabla E)_{\perp}$; for reverse ring integration, integrate
$\dfrac{\partial \Gamma}{\partial t} = \nabla E + \lambda\,\hat{t}$
backward in time, where the additional term along the local tangent $\hat{t}$ controls the nodal distribution along the ring.

Protocols for Coarse MD using Reverse Ring Integration
- MD simulations run FORWARD in time.
- Initialize the ring nodes.
- Step the ring BACKWARD in "time" on the effective potential.
- Parametrization: as in the string method (E, Ren & Vanden-Eijnden).

USUAL DMAP + BOUNDARY DETECTION
Stretch points close to the edges to avoid extension issues.

An algorithm for boundary point detection
Legend: a previously detected boundary point i; its nearest neighbor j; a small neighborhood of j; the center of mass of that neighborhood. The first edge point is guessed (if a DMAP is available, take the min/max of the leading coordinate).
1) For each candidate pair (i, j), compute a boundary indicator from the displacement of the neighborhood's center of mass relative to the points i and j.
2) New boundary points: the pairs (i, j) whose indicator is close to the maximum over all candidates.
3) Algorithm parameters: 0.3; 0.15.

USE THE NEW COORDINATES FOR EXTENSION PURPOSES
Both restriction and lifting were done by local RBF interpolation (over 40 nearest neighbours).

Out-of-sample extension
[Schematic: START in the physical high-dimensional space; restrict to the reduced low-dimensional space (DMAP, PCA); extend out-of-sample there; END with the extended, on-manifold sample lifted back to the physical space.]

Sine-DMAP
The standard DMAP algorithm automatically imposes a zero-flux (Neumann) condition at the manifold boundary (without even explicitly knowing where the boundary is!). There is the possibility of imposing Dirichlet boundary conditions instead (provided we are able to detect the manifold boundary in advance).
Simple example: points on a segment of length 1 (physical coordinate 0 < x < 1).
The cosine-based (Neumann) DMAP coordinate $\phi(x) \propto \cos(\pi x)$ is one-to-one, but trying to extend a function f at the boundary fails: $d\phi/dx \to 0$ there, so $\dfrac{df}{d\phi} = \dfrac{df/dx}{d\phi/dx}$ blows up at the edge.
The sine-based (Dirichlet) DMAP coordinate has a nonvanishing derivative at the boundary, so it is OK at the boundary; edge detection is required to construct it.

Alanine dipeptide MD setup:
1) Alanine dipeptide in implicit solvent (Still algorithm), simulated with GROMACS;
2) dt = 0.001 ps, Nsteps = 200000;
3) Temperature: 300 K (v-rescale thermostat);
4) Periodic boundary conditions, with a simulation box of (2 nm)^3;
5) Cut-off for non-bonded interactions: 0.8 nm;
6) Force field: Amber03.
[Figure: initial configuration.]

Edges can be automatically detected using local PCA and by computing the Boundary Propensity Field (see Gear's notes).

A new energy well is discovered starting from one extended edge after a very short MD run (Nsteps = 20000): restart from there.
No new wells are found starting from a different extended edge after a very short MD run (Nsteps = 20000).

Effective simplicity (recap): if we know what the right variables are, we can PERFORM differential operations on the right variables - a Calculus for Complex Systems.

Coming full circle
No equations? Isn't that a little medieval? Equations = "understanding".
AGAIN: matrix-free iterative linear algebra for Ax = b; PRECONDITIONING, BAx = Bb, with B an approximate inverse of A. Use "the best equation you have" to precondition equation-free computations. With enough initialization authority: equation-free laboratory experiments.
Samaey et al., J. Comput. Phys. (LBM for streamer fronts) - physics/0604147, math.DS/0608122, q-bio.QM/0606006.

Computer-Aided Analysis of Nonlinear Problems in Transport Phenomena
Robert A. Brown, L. E. Scriven and William J. Silliman, in P. J. Holmes (ed.), New Approaches to Nonlinear Problems in Dynamics, 1980.
ABSTRACT: The nonlinear partial differential equations of mass, momentum, energy, species and charge transport ... can be solved in terms of functions of limited differentiability, no more than the physics warrants, rather than the analytic functions of classical analysis ... basis sets consisting of low-order polynomials ... systematically generating and analyzing solutions by fast computers employing modern matrix techniques ... nonlinear algebraic equations by the Newton-Raphson method ... The Newton-Raphson technique is greatly preferred because the Jacobian of the solution is a treasure trove, not only for continuation, but also for analysing stability of solutions, for detecting bifurcations of solution families, and for computing asymptotic estimates of the effects, on any solution, of small changes in parameters, boundary conditions, and boundary shape ...

In what we do, not only the analysis, but the equations themselves are obtained on the computer, from short experiments with an alternative, microscopic description.
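A minimal sketch of the matrix-free preconditioning idea on the "Coming full circle" slide, using SciPy's GMRES: the operator A is available only through matrix-vector products, and an approximate inverse B built from "the best equation you have" is supplied as the preconditioner. The toy tridiagonal operator and the diagonal approximate inverse here are assumptions for illustration only.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

n = 200
diag = 2.0 + 0.1 * np.random.default_rng(1).random(n)  # toy operator diagonal

def A_matvec(v):
    # action of A on a vector: all we need, A itself is never assembled
    out = diag * v
    out[1:] -= 0.5 * v[:-1]
    out[:-1] -= 0.5 * v[1:]
    return out

def B_matvec(v):
    # crude approximate inverse from a simplified model: invert the diagonal part
    return v / diag

A = LinearOperator((n, n), matvec=A_matvec)
B = LinearOperator((n, n), matvec=B_matvec)   # preconditioner: B approximates A^{-1}
b = np.ones(n)

x_plain, _ = gmres(A, b)        # matrix-free GMRES on Ax = b
x_prec, _ = gmres(A, b, M=B)    # preconditioned: effectively solving BAx = Bb
print(np.linalg.norm(A_matvec(x_prec) - b))   # residual of the preconditioned solve
```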