Graphical Models in Data Assimilation Problems
Alexander Ihler, UC Irvine ([email protected])
Collaborators: Sergey Kirshner, Andrew Robertson, Padhraic Smyth

Outline
• Graphical models
  – A convenient description of structure among random variables
• Use this structure to
  – Organize inference computations
    • Finding optimal (ML, etc.) estimates
    • Calculating data likelihood
    • Simulation / drawing samples
  – Suggest sub-optimal (approximate) inference computations
    • e.g. when the optimal computations are too expensive
• Some examples from data assimilation
  – Markov chains, Kalman filtering
  – Rainfall models
    • Mixtures of trees
    • Loopy graphs
  – Image analysis (de-noising, smoothing, etc.)

Graphical Models
An undirected graph G = (V, E) is defined by a set of nodes V and a set of edges E connecting pairs of nodes. Each node s is associated with a random variable x_s. Graph separation corresponds to conditional independence: if removing a set of nodes B disconnects A from C in the graph, then x_A and x_C are conditionally independent given x_B.

Graphical Models: Factorization
• Sufficient condition
  – The distribution factors into a product of "potential functions" defined on the cliques of G:
    p(x) = (1/Z) ∏_C ψ_C(x_C), where C ranges over the cliques of G
  – The condition is also necessary if the distribution is strictly positive (Hammersley–Clifford)

Graphical Models: Inference
• Many possible inference goals
  – Given a few observed RVs, compute:
    • Marginal distributions
    • Joint maximum a-posteriori (MAP) values
    • The data likelihood of the observed variables
    • Samples from the posterior
• Use the graph structure to do these computations efficiently
  – Example: compute the posterior marginal p(x2 | x5 = X5)

Finding marginals via Belief Propagation
(aka sum-product; other inference goals have similar algorithms)
Combine the observations from all nodes in the graph through a series of local message-passing operations. Notation: Γ(s) denotes the neighborhood of node s (its adjacent nodes), and m_ts(x_s) is the message sent from node t to node s (a "sufficient statistic" of t's knowledge about s).

BP Message Updates
I. Message product: multiply the incoming messages from all nodes but s with the local observation to form a distribution over x_t:
   M_t(x_t) ∝ ψ_t(x_t, y_t) ∏_{u ∈ Γ(t)\s} m_ut(x_t)
II. Message propagation: transform this distribution from node t to node s using the pairwise interaction potential, integrating over x_t to form a distribution summarizing node t's knowledge about x_s:
   m_ts(x_s) ∝ ∫ ψ_st(x_s, x_t) M_t(x_t) dx_t
(A runnable sketch of these updates appears below, after the discussion of non-trees.)

Example: sequential estimation
• A well-known example
  – Markov chain structure
  – Jointly Gaussian uncertainty
    • Gives the integrals a simple, closed form
  – Optimal inference (in many senses) is given by the Kalman filter
  – Converts one large problem over all T time steps into a collection of smaller, per-step problems
  – "Exact" non-Gaussian versions: particle & ensemble filtering and their extensions
• The same general results hold for any tree-structured graph
  – Partial elimination ordering of the nodes
  – Complexity limited by the dimension of each variable

Exact estimation in non-trees
• Often our variables aren't so well-behaved
  – May be able to convert the graph using variable augmentation
• Often the case in Bayesian parameter estimation
  – Treat the parameters as variables and include them in the graph
  – (this increases the nonlinearities!)
• But: a dimensionality problem
  – Computation increases (maybe a lot!) with the variable dimension d
    • Jointly Gaussian: O(d³)
    • Otherwise often exponential in d
  – Can trade off graph complexity against dimensionality…
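To make the message updates concrete, here is a minimal sketch of sum-product message passing on a chain of binary variables, the discrete analogue of the Kalman filter setting above. The names and values (psi_local, psi_pair, T = 5) are illustrative assumptions, not quantities from the talk.

```python
import numpy as np

# Minimal sum-product sketch on a chain x_1 - x_2 - ... - x_T of binary
# variables. psi_local[t] plays the role of the local observation potential
# psi_t(x_t, y_t); psi_pair is the pairwise potential psi(x_t, x_{t+1}).
# All sizes and values below are illustrative assumptions.
T = 5
rng = np.random.default_rng(0)
psi_local = rng.random((T, 2)) + 0.1      # local observation potentials
psi_pair = np.array([[0.8, 0.2],          # pairwise interaction potential
                     [0.2, 0.8]])

fwd = np.ones((T, 2))                     # messages m_{t-1 -> t}
bwd = np.ones((T, 2))                     # messages m_{t+1 -> t}
for t in range(1, T):
    # I. message product: local evidence times the incoming forward message
    belief = psi_local[t - 1] * fwd[t - 1]
    # II. message propagation: apply the pairwise potential, sum out x_{t-1}
    m = belief @ psi_pair
    fwd[t] = m / m.sum()                  # normalize for numerical stability
for t in range(T - 2, -1, -1):
    belief = psi_local[t + 1] * bwd[t + 1]
    m = psi_pair @ belief
    bwd[t] = m / m.sum()

# Posterior marginals: local evidence times all incoming messages.
marginals = psi_local * fwd * bwd
marginals /= marginals.sum(axis=1, keepdims=True)
print(marginals)                          # p(x_t | all observations)
```

With jointly Gaussian potentials the integrals in step II stay Gaussian and this same forward recursion becomes the Kalman filter; on a tree, the identical updates run from the leaves inward and back out.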
Example: rainfall data
• 41 stations in India
• Rainfall occurrence & amounts for ~30 years
• Some stations/days missing
• Tasks
  – Impute missing entries
  – Simulate realistic rainfall
  – Short-term predictions
  – …
• Can't deal with the joint distribution directly
  – it is too large to even manipulate
• Conditional independence structure?
  – Unlikely to be tree-structured

Example: rainfall data
• The "true" relationships are not tree-like at all
  – High tree-width
• Need some approximations
  – Approximate model, exact inference
  – Correct model, approximate inference
• Even harder:
  – May get multiple observation modalities (satellite data, etc.)
  – These have their own statistical structure & relationships to the stations

Example: rainfall data
• Consider a single time-slice
• Option 1: mixtures of trees
  – Add a "hidden" variable indicating which of several trees is active
  – (Generally) marginalize over this variable
• Option 2: use the loopy graph and ignore the loops during inference
  – Utility depends on the task:
    • Works well for filling in missing data (see the sketch after the Summary)
    • Perhaps less well for other tasks

Multi-scale models
• Another example of graph structure
• Efficient computation if tree-structured
• Again, we don't really believe any particular tree
  – Perhaps average over (use a mixture of) several
  – (see e.g. Willsky 2002)
  – (also with loops, similar to multi-grid methods)

Summary
• Explicit structure among variables
  – Prior knowledge / learned from data
  – Structure organizes computation and suggests approximations
  – Can provide computational efficiency
  – (often the naïve joint distribution is too large to even represent / estimate)
• Offers some choices
  – Where to put the complexity?
    • Simple graph structure with high-dimensional variables
    • Complex graph structure with more manageable variables
  – Approximate structure, exact computations
  – Improved structures, approximate computations
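As a closing illustration of Option 2 from the rainfall example (run BP on the loopy graph and simply ignore the loops), here is a minimal sketch that imputes missing binary rain/no-rain values on a small station grid. The grid layout, potentials, evidence strength, and data are all made-up assumptions for illustration; this is not the talk's actual rainfall model.

```python
import numpy as np

# Loopy BP sketch: binary rain/no-rain variables on a 4x4 "station" grid with
# some observations missing. All numbers here are illustrative assumptions.
rng = np.random.default_rng(1)
H, W = 4, 4
obs = rng.integers(0, 2, size=(H, W))            # 0/1 rain observations
missing = rng.random((H, W)) < 0.3               # ~30% of entries unobserved

# Symmetric pairwise potential: neighboring stations tend to agree.
psi_pair = np.array([[0.7, 0.3],
                     [0.3, 0.7]])

def local(i, j):
    """Local evidence: peaked at the observed value, uniform if missing."""
    if missing[i, j]:
        return np.array([0.5, 0.5])
    return np.array([0.1, 0.9]) if obs[i, j] == 1 else np.array([0.9, 0.1])

# Grid neighborhoods; one message per directed edge, initialized uniform.
nodes = [(i, j) for i in range(H) for j in range(W)]
nbrs = {s: [] for s in nodes}
for i in range(H):
    for j in range(W):
        if j + 1 < W:
            nbrs[(i, j)].append((i, j + 1)); nbrs[(i, j + 1)].append((i, j))
        if i + 1 < H:
            nbrs[(i, j)].append((i + 1, j)); nbrs[(i + 1, j)].append((i, j))
msgs = {(t, s): np.ones(2) / 2 for t in nodes for s in nbrs[t]}

for _ in range(20):                              # iterate, ignoring the loops
    new = {}
    for (t, s) in msgs:
        # I. message product at t, excluding the message that came from s
        prod = local(*t)
        for u in nbrs[t]:
            if u != s:
                prod = prod * msgs[(u, t)]
        # II. propagate through the (symmetric) pairwise potential, sum out x_t
        m = psi_pair @ prod
        new[(t, s)] = m / m.sum()
    msgs = new

# Approximate posterior marginals; report p(rain) at the missing stations.
for s in nodes:
    b = local(*s)
    for u in nbrs[s]:
        b = b * msgs[(u, s)]
    b = b / b.sum()
    if missing[s]:
        print(f"station {s}: p(rain) ~ {b[1]:.2f}")
```

Because the graph has loops, these beliefs are only approximate, consistent with the caveat above: this option tends to work well for filling in missing data, perhaps less well for other tasks.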