Graphical Models in Data Assimilation Problems
Alexander Ihler
UC Irvine
[email protected]
Collaborators:
Sergey Kirshner
Andrew Robertson
Padhraic Smyth
Outline
• Graphical models
– Convenient description of structure among random variables
• Use this structure to
– Organize inference computations
• Finding optimal (ML, etc.) estimates
• Calculate data likelihood
• Simulation / drawing samples
– Suggest sub-optimal (approximate) inference computations
• e.g. when optimal computations too expensive
• Some examples from data assimilation
– Markov chains, Kalman filtering
– Rainfall models
• Mixtures of trees
• Loopy graphs
– Image analysis (de-noising, smoothing, etc.)
Graphical Models
An undirected graph G = (V, E) is defined by a set of nodes V and a set of edges E connecting pairs of nodes. Nodes s ∈ V are associated with random variables x_s.

Graph separation corresponds to conditional independence: if a set of nodes B separates A from C in the graph, then x_A and x_C are conditionally independent given x_B.
Graphical Models:
Factorization
• Sufficient condition
– The distribution factors into a product of “potential functions” defined on the cliques of G (see the factorization below)
– The condition is also necessary if the distribution is strictly positive (the Hammersley–Clifford theorem)
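For concreteness, the clique factorization referred to above can be written in standard form (notation assumed here, since the original slide showed it graphically):

```latex
% Clique factorization over an undirected graph G (Hammersley–Clifford)
p(x) \;=\; \frac{1}{Z} \prod_{C \in \mathcal{C}(G)} \psi_C(x_C),
\qquad
Z \;=\; \sum_{x} \prod_{C \in \mathcal{C}(G)} \psi_C(x_C)
```

Here C(G) denotes the set of cliques of G and Z is the normalizing constant (partition function).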
• Examples
Graphical Models:
Inference
• Many possible inference goals
– Given a few observed RVs, compute:
• Marginal distributions
• Joint, maximum a-posteriori (MAP) values
• Data likelihood of observed variables
• Samples from the posterior
• Use graph structure to do computations efficiently
– Example: compute posterior marginal p(x2 | x5=X5)
Finding marginals via Belief Propagation
(aka sum-product; other goals have similar algorithms)
Combine the observations from all nodes in the graph through a series of local message-passing operations:
– Γ(s): the neighborhood of node s (its adjacent nodes)
– m_ts(x_s): the message sent from node t to node s (a “sufficient statistic” of t’s knowledge about s)
BP Message Updates
I. Message Product: multiply the incoming messages (from all nodes but s) with the local observation potential to form a distribution over x_t
II. Message Propagation: transform the distribution from node t to node s using the pairwise interaction potential; integrate over x_t to form a distribution summarizing node t’s knowledge about x_s
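Putting the two steps together gives the standard sum-product update (standard notation, following the definitions above):

```latex
% Sum-product (BP) message from node t to node s
m_{ts}(x_s) \;\propto\; \int \psi_{ts}(x_t, x_s)\,\psi_t(x_t)
\prod_{u \in \Gamma(t) \setminus s} m_{ut}(x_t)\; dx_t
```

A minimal runnable sketch of these updates on a discrete three-node chain; all potential values are illustrative inventions, not numbers from the talk:

```python
import numpy as np

# Sum-product on a binary chain x1 - x2 - x3 (illustrative potentials).
psi = {
    1: np.array([0.7, 0.3]),   # local potentials psi_s(x_s)
    2: np.array([0.5, 0.5]),
    3: np.array([0.2, 0.8]),
}
pair = np.array([[0.9, 0.1],   # pairwise potential favoring agreement
                 [0.1, 0.9]])

def message(t, s, incoming):
    """Message m_ts(x_s): product step, then propagation (sum over x_t)."""
    belief_t = psi[t].copy()
    for m in incoming:          # messages into t from neighbors other than s
        belief_t = belief_t * m
    m_ts = pair @ belief_t      # sum over x_t of psi_ts(x_t, x_s) * belief_t(x_t)
    return m_ts / m_ts.sum()    # normalize for numerical stability

# On a chain, one inward sweep from each end reaches node 2.
m_12 = message(1, 2, [])
m_32 = message(3, 2, [])

# Posterior marginal at node 2: local potential times all incoming messages.
b2 = psi[2] * m_12 * m_32
print("p(x2) estimate:", b2 / b2.sum())
```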
Example: sequential estimation
• Well-known example
– Markov Chain
– Jointly Gaussian uncertainty
• Gives integrals a simple, closed form
– Optimal inference (in many senses) given by Kalman filter
– Converts one large problem (over all T time steps) into a collection of smaller per-step problems
– “Exact” non-Gaussian analogues: particle & ensemble filtering & extensions
– Same general results hold for any tree-structured graph
• Partial elimination ordering of nodes
– Complexity limited by the dimension of each variable
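As a concrete instance of these chain-structured message updates, here is a minimal scalar Kalman filter sketch; the model, parameters, and data are illustrative assumptions, not values from the talk:

```python
import numpy as np

# Scalar Kalman filter = Gaussian BP on a Markov chain (assumed model:
# x_k = a*x_{k-1} + w_k,  y_k = x_k + v_k, with illustrative parameters).
a, q, r = 1.0, 0.1, 0.5          # dynamics, process var., observation var.
mean, var = 0.0, 1.0             # Gaussian prior on x_0

rng = np.random.default_rng(0)
ys = rng.normal(1.0, np.sqrt(r), size=5)   # toy observations

for y in ys:
    # Predict: propagate the belief through the dynamics (forward message).
    mean, var = a * mean, a * a * var + q
    # Update: multiply in the local observation likelihood.
    gain = var / (var + r)
    mean = mean + gain * (y - mean)
    var = (1.0 - gain) * var
    print(f"posterior: mean={mean:.3f}, var={var:.3f}")
```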
Exact estimation in non-trees
• Often our variables aren’t so well-behaved
– May be able to convert using variable augmentation
• Often the case in Bayesian parameter estimation
– Treat parameters as variables and include them in the graph (a standard form is sketched after this list)
– (increases nonlinearities!)
• But, dimensionality problem
– Computation increases (maybe a lot!)
• Jointly Gaussian: cost scales as d^3
• Otherwise often exponential in d
– Can trade off graph complexity with dimensionality…
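In the augmented graph, the parameters appear as just another node; a standard Bayesian form of this augmentation (notation assumed, not from the slides):

```latex
% Parameter augmentation: theta joins the graph as an ordinary variable
p(x_{0:T}, \theta \mid y_{1:T}) \;\propto\;
p(\theta)\, p(x_0) \prod_{k=1}^{T} p(x_k \mid x_{k-1}, \theta)\,
p(y_k \mid x_k, \theta)
```

The terms involving θ couple it to every time step, which is what introduces the extra nonlinearity noted above.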
Example: rainfall data
• 41 stations in India
• Rainfall occurrence &
amounts for ~30 years
• Some stations/days missing
• Tasks
– Impute missing entries
– Simulate realistic rainfall
– Short-term predictions
– …
• Can’t deal with the full joint distribution; over 41 stations it is too large to even manipulate (e.g., 2^41 possible occurrence patterns)
• Conditional independence structure?
– Unlikely to be tree-structured
Example: rainfall data
• “True” relationships
– not tree-like at all
– High tree-width
• Need some approximations
– Approximate model, exact inference
– Correct model, approximate inference
• Even harder:
– May get multiple observation modalities (satellite data, etc.)
– These have their own statistical structure & relationships to the stations
Example: rainfall data
• Consider a single time-slice
• Option 1: mixtures of trees
– Add a “hidden” variable indicating which of several trees is active
– (Generally) marginalize over this variable (see the form below)
(figure: a weighted sum of several tree-structured graphs)
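In symbols, the mixture-of-trees model takes the standard form (notation assumed, not from the slides):

```latex
% Mixture of trees: hidden selector z chooses among K tree-structured models
p(x) \;=\; \sum_{k=1}^{K} p(z = k)\; p_k(x),
\qquad \text{each } p_k \text{ tree-structured}
```

Inference then runs exact tree inference within each component and averages the results by the mixture weights.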
• Option 2: use the loopy graph and simply ignore the loops during inference (loopy belief propagation)
– Utility depends on task:
– Works well for filling in missing data
– Perhaps less well for other tasks
Multi-scale models
• Another example of graph structure
• Efficient computation if tree-structured
• Again, don’t really believe any particular tree
– Perhaps average over (use mixture of) several
• (see e.g. Willsky 2002)
• (also with loops, similar to multi-grid methods)
Summary
• Explicit structure among variables
– Prior knowledge / learned from data
– Structure organizes computation, suggests approximations
– Can provide computational efficiency
– (often the naïve joint distribution is too large to represent / estimate)
• Offers some choice
– Where to put the complexity?
– Simple graph structure with high-dimensional variables
– Complex graph structure with more manageable variables
• Approximate structure, exact computations
• Improved structures, approximate computations