Some System Identification Challenges and Approaches

“Many basic scientific problems are now routinely solved by simulation: a fancy random walk is performed on the system of interest. Averages computed from the walk give useful answers to formerly intractable problems”
Persi Diaconis, 2008
Brett Ninness
School of Electrical Engineering & Computer Science
The University of Newcastle, Australia
1
System Identification - a rich history
• 1700s: Bernoulli, Euler, Lagrange - probability concepts
• 1763: Bayes - conditional probability
• 1795: Gauss, Legendre - least squares
• 1800-1850: Gauss, Legendre, Cauchy - probability distributions
• 1879: Stokes - periodogram of time series
• 1890: Galton, Pearson - regression and correlation
• 1921: Yule - AR and MA time series
• 1922: Fisher - Maximum Likelihood (ML)
• 1933: Kolmogorov - axiomatic probability theory
• 1930s: Khinchin, Kolmogorov, Cramér - stationary processes
2
System Identification - a rich history
• 1941-1949: Wiener, Kolmogorov - prediction theory
• 1960: Kalman - Kalman Filter
• 1965: Kalman & Ho - Realisation theory
• 1965: Åström & Bohlin - ML methods for dynamic systems
• 1970: Box & Jenkins - a unified and complete presentation
• 1970s: Experiment design, PE formulation with underpinning theory, analysis of recursive methods
• 1980s: Bias & Variance quantification, tradeoff and design
• 1990s: Subspace methods, control relevant identification, robust estimation methods.
3
Recent & Current Activity
4
This talk
5
Acknowledgements
• Results here rest heavily on the work of colleagues:
  ‣ Dr. Adrian Wills (Newcastle University)
  ‣ Dr. Thomas Schön (Linköping University)
  ‣ Dr. Stuart Gibson (Nomura Bank)
  ‣ Soren Henriksen (Newcastle University)
• and on learning from experts:
  ‣ Håkan Hjalmarsson, Tomas McKelvey, Fredrik Gustafsson, Michel Gevers, Graham Goodwin.
6
Challenge 1 - General Nonlinear ID
• Effective solutions available for specific nonlinear structures
  ‣ NARX, Hammerstein-Wiener, bilinear, ...
• Extension to more general forms?
• Example:
7
Challenge 1 - General Nonlinear ID
Obstacle 1: How do we compute a cost function?
• Prediction error (PE) cost: $V_N(\theta) = \frac{1}{N}\sum_{t=1}^{N}\big(y_t - \hat y_{t|t-1}(\theta)\big)^2$
• Maximum Likelihood (ML) cost: $L_N(\theta) = \sum_{t=1}^{N}\log p_\theta(y_t \mid y_{1:t-1})$
8
Computing the required densities
• Turn to the general measurement- and time-update equations:
  Measurement update: $p(x_t \mid y_{1:t}) = \dfrac{p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})}{p(y_t \mid y_{1:t-1})}$
  Time update: $p(x_{t+1} \mid y_{1:t}) = \int p(x_{t+1} \mid x_t)\, p(x_t \mid y_{1:t})\, dx_t$
• Problem - closed-form solutions exist only for special cases:
  ‣ linear Gaussian (Kalman Filter), discrete-state HMM
• More generally:
  ‣ need to compute the solution numerically
  ‣ multi-dimensional integrals are the main challenge
9
Computing the required densities
10
Sequential Importance Resampling (SIR)
• SIR - more commonly known as “particle filtering”
• Key idea - use the strong law of large numbers (SLLN)
  ‣ Suppose a vector random number generator gives realisations $x^{(i)}$, $i = 1, \ldots, M$, from a given target density $p(x \mid y_{1:t})$
  ‣ Then by the SLLN, with probability one: $\frac{1}{M}\sum_{i=1}^{M} f\big(x^{(i)}\big) \to \mathrm{E}\big[f(x) \mid y_{1:t}\big]$ as $M \to \infty$
  ‣ This suggests the approximate quantification $\mathrm{E}\big[f(x) \mid y_{1:t}\big] \approx \frac{1}{M}\sum_{i=1}^{M} f\big(x^{(i)}\big)$
• How to build the necessary random number generator?
11
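As a concrete illustration of the SLLN approximation (a minimal sketch, not from the talk; the Gaussian target and the choice $f(x) = x^2$ are assumptions made purely for this example):

M = 1e5;              % number of realisations from the target density
x = 2 + randn(M,1);   % stand-in random number generator: x ~ N(2,1)
Ehat = mean(x.^2);    % SLLN average; converges to E[x^2] = 2^2 + 1 = 5
fprintf('Monte Carlo estimate %.3f (exact 5)\n', Ehat);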
Recursive solution (Particle filter)
(Diagram: the recursion cycles through time update, resampling and measurement update steps.)
12
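A minimal bootstrap particle filter realising this recursion (my illustration, not code from the talk; the scalar benchmark model below is an assumption chosen purely for the example):

% Assumed model:  x(t+1) = 0.5 x(t) + 25 x(t)/(1+x(t)^2) + w(t),  w ~ N(0,1)
%                 y(t)   = x(t)^2/20 + e(t),                      e ~ N(0,1)
N = 100; M = 500;
xt = 0; y = zeros(N,1);
for t = 1:N                                   % simulate a data record
    xt = 0.5*xt + 25*xt/(1+xt^2) + randn;
    y(t) = xt^2/20 + randn;
end
p = randn(M,1);                               % initial particle cloud
xhat = zeros(N,1);
for t = 1:N
    p = 0.5*p + 25*p./(1+p.^2) + randn(M,1);  % time update: propagate particles
    w = exp(-0.5*(y(t) - p.^2/20).^2);        % measurement update: weight by p(y|x)
    w = w/sum(w);
    xhat(t) = w'*p;                           % filtered mean E[x(t)|y(1:t)]
    edges = [0; cumsum(w)]; edges(end) = 1;   % resampling: multinomial draw
    [~, idx] = histc(rand(M,1), edges);
    p = p(idx);
end

Multinomial resampling via histc keeps the sketch dependency-free; randsample from the Statistics Toolbox would do the same job.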
Example
13
History
‣ Handschin & Mayne, Int’l J. Control, 1969
‣ Resampling approach: Gordon, Salmond & Smith, IEE Proc. Radar & Signal Processing, 1993. (1136 citations)
‣ Now widely used in signal processing, target tracking, computer vision, econometrics, robotics, statistics, control, ...
‣ Some applications in system identification.
  - The bulk of that work has treated parameters as state variables.
14
Back to Nonlinear System Identification
• General(ish) model structure: $x_{t+1} = f(x_t, u_t; \theta) + w_t$, $\quad y_t = h(x_t, u_t; \theta) + e_t$
• Prediction error cost: $V_N(\theta) = \frac{1}{N}\sum_{t=1}^{N}\big(y_t - \hat y_{t|t-1}(\theta)\big)^2$
• Maximum likelihood cost: $L_N(\theta) = \sum_{t=1}^{N}\log p_\theta(y_t \mid y_{1:t-1})$
15
Nonlinear System Identification
Obstacle 2: How do we compute an estimate?
• Gradient-based search is standard practice: $\hat\theta_{k+1} = \hat\theta_k - \mu_k H_k^{-1}\,\nabla_\theta V_N(\hat\theta_k)$
• How to compute the necessary gradients?
• Strategies (see the sketch below):
  ‣ Differencing to compute derivatives?
  ‣ Direct search methods: Nelder-Mead, simulated annealing?
16
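One way to realise the differencing strategy (a minimal sketch; the quadratic stand-in cost V is an assumption, standing in for a PE or ML cost):

V = @(th) sum(th.^2);                  % stand-in cost; in practice the PE/ML cost
theta = [1; -2; 0.5];                  % point at which the gradient is wanted
n = numel(theta); g = zeros(n,1);
h = 1e-6*max(1, abs(theta));           % per-coordinate step sizes
for i = 1:n
    e = zeros(n,1); e(i) = h(i);
    g(i) = (V(theta+e) - V(theta-e))/(2*h(i));   % central difference
end

Note that when the cost is itself computed by a particle filter, it is a noisy Monte Carlo estimate, and small-step differencing of a noisy cost is unreliable; this is part of the motivation for the EM alternative that follows.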
Expectation-Maximisation (EM) Algorithm
17
Expectation-Maximisation (EM) Algorithm
• Example - a linear system: $x_{t+1} = A x_t + B u_t + w_t$, $\quad y_t = C x_t + D u_t + e_t$
• If the state sequence were known, estimate $A, B, C, D$ by regression?
• Need the state - use an estimate? E.g. a Kalman smoother.
• Suggests an iteration:
  ‣ use estimates of A, B, C, D to estimate the state;
  ‣ use estimates of the state to estimate A, B, C, D;
  ‣ return and do again.
18
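To make the iteration concrete, here is a minimal runnable sketch (my illustration, not the talk's implementation), simplified by assumptions: scalar state, C = 1, no input, and known noise variances q and r, so that only a is estimated:

% Assumed model:  x(t+1) = a x(t) + w(t),  y(t) = x(t) + e(t)
N = 500; atrue = 0.9; q = 0.1; r = 0.5;
xt = 0; y = zeros(N,1);
for t = 1:N
    xt = atrue*xt + sqrt(q)*randn;
    y(t) = xt + sqrt(r)*randn;
end
a = 0.1;                                    % deliberately poor initialiser
for iter = 1:50
    xf = zeros(N,1); Pf = zeros(N,1);       % E step: Kalman filter...
    xp = 0; Pp = 1;
    for t = 1:N
        K = Pp/(Pp + r);
        xf(t) = xp + K*(y(t) - xp);
        Pf(t) = (1 - K)*Pp;
        xp = a*xf(t); Pp = a^2*Pf(t) + q;
    end
    xsm = xf; Psm = Pf; G = zeros(N,1);     % ...then RTS smoother
    for t = N-1:-1:1
        Ppred = a^2*Pf(t) + q;
        G(t) = a*Pf(t)/Ppred;
        xsm(t) = xf(t) + G(t)*(xsm(t+1) - a*xf(t));
        Psm(t) = Pf(t) + G(t)^2*(Psm(t+1) - Ppred);
    end
    M1 = G(1:N-1).*Psm(2:N) + xsm(2:N).*xsm(1:N-1);  % E[x(t+1)x(t)|Y]
    M0 = Psm(1:N-1) + xsm(1:N-1).^2;                 % E[x(t)^2|Y]
    a = sum(M1)/sum(M0);                    % M step: regress smoothed state on its past
end
fprintf('EM estimate a = %.3f (true %.3f)\n', a, atrue);

Each pass uses the current a to estimate the state (the smoother), then re-estimates a from the smoothed moments: exactly the alternation described on the slide.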
Expectation-Maximisation (EM) Algorithm
• Key idea - “complete” and “incomplete” data
  ‣ Actual (incomplete) observations: $Y = (y_1, \ldots, y_N)$
  ‣ “Wished for” (complete) data: $(X, Y)$, where $X = (x_1, \ldots, x_N)$ is the state sequence
  ‣ Form an estimate of the “wished for” log-likelihood $\log p_\theta(X, Y)$:
• E step: calculate $Q(\theta, \theta_k) = \mathrm{E}_{\theta_k}\big[\log p_\theta(X, Y) \mid Y\big]$
• M step: compute $\theta_{k+1} = \arg\max_\theta Q(\theta, \theta_k)$
19
Key EM Algorithm Property
• Bayes’ rule: $\log p_\theta(Y) = \log p_\theta(X, Y) - \log p_\theta(X \mid Y)$
• Take the conditional expectation $\mathrm{E}_{\theta_k}[\,\cdot \mid Y]$ of both sides: $\log p_\theta(Y) = Q(\theta, \theta_k) - \mathrm{E}_{\theta_k}\big[\log p_\theta(X \mid Y) \mid Y\big]$
• Increasing $Q$ implies increased likelihood: the last term is maximised at $\theta = \theta_k$ (Jensen’s inequality), so any $\theta$ with $Q(\theta, \theta_k) > Q(\theta_k, \theta_k)$ also has $\log p_\theta(Y) > \log p_{\theta_k}(Y)$.
20
Expectation-Maximisation (EM) Algorithm
21
Expectation-Maximisation (EM) Algorithm
22
Expectation-Maximisation (EM) Algorithm
• History
  ‣ Generally attributed to Baum: Ann. Math. Stat., 1970;
  ‣ generalised by Dempster et al.: JRSS B, 1977 (9858 citations)
  ‣ Widely used in image processing, statistics, radar, ...
23
Nonlinear system estimation
Example: N=100 data points, M=100 particles, 100 experiments
24
Evolution of the estimate
• Look at the b parameter only - others fixed at true values:
25
Gradient-Based Search Revisited
• Fisher’s identity: the gradient of the (intractable) log-likelihood equals the gradient of the EM quantity $Q$ at the current iterate, $\nabla_\theta \log p_\theta(Y)\big|_{\theta = \theta_k} = \nabla_\theta Q(\theta, \theta_k)\big|_{\theta = \theta_k}$, so the E-step machinery also delivers gradients for search.
26
EM vs Gradient search iterates
27
Challenge 2: Application Relevant ID
• The quality of an estimate must be quantified for it to be useful.
• “Traditional” practice - invoke asymptotic results, e.g. $\hat\theta_N \to \theta^\star$ and $\sqrt{N}\,(\hat\theta_N - \theta^\star) \to \mathcal{N}(0, P)$ as $N \to \infty$.
• Assume convergence has effectively occurred for finite $N$.
28
Assessment & Design
• Often, a function $g(\theta)$ of the parameters is of more interest.
• Again, the “classical” approach - use a linear approximation: $g(\hat\theta) \approx g(\theta^\star) + g'(\theta^\star)(\hat\theta - \theta^\star)$
• Couple this with the approximate Gaussianity of $\hat\theta$ to quantify the error in $g(\hat\theta)$.
29
One perspective
Need to combine prior knowledge, assumptions and data.
Posterior $p(\theta \mid Y)$: a measure of the evidence supporting an underlying system property - a parameter value, frequency response, achieved gain/phase margin, ...
30
Computing Posteriors
• In principle, posterior computation is straightforward via Bayes’ rule: $p(\theta \mid Y) = \dfrac{p(Y \mid \theta)\, p(\theta)}{p(Y)}$, which combines the likelihood $p(Y \mid \theta)$ with prior knowledge $p(\theta)$.
• Example: combine the likelihood of a chosen model structure with a prior over its parameters.
31
Using Posteriors
Now the difficulty - using the posterior.
• Marginal on the i’th parameter: $p(\theta_i \mid Y) = \int p(\theta \mid Y)\, d\theta_1 \cdots d\theta_{i-1}\, d\theta_{i+1} \cdots d\theta_n$
  ‣ Evaluation on an $n$-dimensional grid requires a number of evaluations of $p(\theta \mid Y)$ that grows exponentially with $n$.
  ‣ Simpson’s rule - the evaluation error decays only as a power of the per-dimension grid spacing, which deteriorates with dimension for a fixed budget of evaluations.
• Other measures? E.g. the posterior over model order.
32
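To make the cost concrete, a brute-force sketch for just two parameters (the Gaussian-shaped stand-in posterior is an assumption for the example); with n parameters the same grid would need k^n evaluations:

p = @(t1,t2) exp(-0.5*(t1.^2 + 2*t2.^2 - t1.*t2));  % stand-in unnormalised posterior
k = 200;                                  % grid points per dimension
t1 = linspace(-4,4,k); t2 = linspace(-4,4,k);
[T1,T2] = meshgrid(t1,t2);
P = p(T1,T2);                             % k^2 = 40000 posterior evaluations
m1 = trapz(t2, P, 1);                     % integrate out theta_2
m1 = m1/trapz(t1, m1);                    % normalised marginal p(theta_1|Y)
plot(t1, m1);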
Markov Chain Monte Carlo (MCMC)
33
A randomised approach
• Use the Strong Law of Large Numbers (SLLN) again.
  ‣ Build a (vector) random number generator giving realisations $\theta^{(k)} \sim p(\theta \mid Y)$
  ‣ Then by the SLLN, with probability one: $\frac{1}{K}\sum_{k=1}^{K} g\big(\theta^{(k)}\big) \to \mathrm{E}\big[g(\theta) \mid Y\big]$ as $K \to \infty$
  ‣ This suggests the approximation $\mathrm{E}\big[g(\theta) \mid Y\big] \approx \frac{1}{K}\sum_{k=1}^{K} g\big(\theta^{(k)}\big)$
• One view - numerical integration with intelligently chosen grid points.
34
The Metropolis Algorithm
The required vector random number generator:
‣ 1. Initialise: choose $\theta^{(0)}$ and set $k = 0$:
Z.y = y;              % observed output data (UNIT toolbox data object)
Z.u = u;              % observed input data
M.A = 4;              % model structure: 4th-order denominator polynomial
g1 = est(Z, M);       % initial estimate computed by the toolbox
theta = g1.theta;     % the chain is initialised at this estimate
35
The Metropolis Algorithm
‣ 2. Draw a proposal value $\xi = \theta^{(k)} + z$, with $z$ a random-walk perturbation:
xi = theta + 0.1*randn(size(theta));   % Gaussian random-walk proposal
g2 = theta2m(xi, g1);                  % model object at the proposed xi
36
The Metropolis Algorithm
3. Compute an acceptance probability $\alpha = \min\{1,\, p(\xi \mid Y)\,/\,p(\theta^{(k)} \mid Y)\}$:
cold = validate(Z, g1);                % prediction performance of current model
cnew = validate(Z, g2);                % prediction performance of proposal
prat = exp((-0.5/var)*(cold-cnew)*N);  % posterior ratio under Gaussian errors
alpha = min(1, prat);
37
The Metropolis Algorithm
4. Set $\theta^{(k+1)} = \xi$ with probability $\alpha$; otherwise $\theta^{(k+1)} = \theta^{(k)}$:
if (rand <= alpha)
    theta = xi;        % accept the proposal
end;
38
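Putting steps 1-4 together in a self-contained form (a sketch under assumptions: a first-order model y(t) = b u(t-1) - a y(t-1) + e(t) with known noise variance, a flat prior, and illustrative tuning constants; the talk's own code instead uses the UNIT toolbox routines est, theta2m and validate):

N = 200; atrue = -0.7; btrue = 1.0; sig2 = 0.1;
u = randn(N,1); y = zeros(N,1);
for t = 2:N                                % simulate the assumed system
    y(t) = btrue*u(t-1) - atrue*y(t-1) + sqrt(sig2)*randn;
end
V = @(th) mean((y(2:N) - th(2)*u(1:N-1) + th(1)*y(1:N-1)).^2);  % PE cost
K = 20000; chain = zeros(K,2);
theta = [0; 0.5]; cold = V(theta);         % step 1: initialise the chain
for k = 1:K
    xi = theta + 0.05*randn(2,1);          % step 2: random-walk proposal
    cnew = V(xi);
    alpha = min(1, exp((-0.5/sig2)*(cnew-cold)*(N-1)));  % step 3: flat prior,
                                           % Gaussian errors, N-1 residuals
    if rand <= alpha                       % step 4: accept w.p. alpha
        theta = xi; cold = cnew;
    end
    chain(k,:) = theta';                   % store the realisation
end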
“Markov Chain Monte Carlo” History
• Origins: Metropolis, Rosenbluth, Rosenbluth, Teller & Teller, Journal of Chemical Physics, 1953. (11,564 ISI citations)
• Widespread use:
  ‣ Listed #1 in “Great Algorithms of Scientific Computing”, Dongarra & Sullivan, Computing in Science & Engineering, 2000
  ‣ “The Markov Chain Monte Carlo Revolution”, Diaconis, Bull. American Mathematical Society, 2008: “Many basic scientific problems are now routinely solved by simulation: a fancy random walk is performed on the system of interest. Averages computed from the walk give useful answers to formerly intractable problems”
  ‣ Widely used in chemistry, physics, statistics, ... Emerging uses in biology, telecommunications.
39
Example
• A simple first-order situation:
• N = 20 data samples available:
• Metropolis algorithm: generate realisations $\theta^{(k)} \sim p(\theta \mid Y)$
40
Marginal posteriors via MCMC
41
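Continuing the self-contained sketch above, marginal posteriors can be read off by histogramming each coordinate of the stored chain after discarding a burn-in period (the burn-in length here is an arbitrary illustrative choice):

burn = 2000;                               % discard the transient of the chain
subplot(1,2,1); hist(chain(burn:end,1), 50); xlabel('a');  % marginal of a
subplot(1,2,2); hist(chain(burn:end,2), 50); xlabel('b');  % marginal of b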
Posterior of functions of $\theta$
• Candidate closed-loop controller:
• What are the likely achieved gain and phase margins?
• These are implicit functions of $\theta$ - direct computation of their posteriors is unclear.
42
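A hedged sketch of how such posteriors can nevertheless be computed, continuing the chain example above: push each (thinned) posterior sample through the margin computation and histogram the results. The controller Cq below is a hypothetical stand-in, since the talk's candidate controller is not recoverable from the transcript, and tf/margin assume the Control System Toolbox.

Cq = tf(0.5, [1 -0.5], 1);                 % hypothetical candidate controller
burn = 2000; ii = burn:100:size(chain,1);  % thinned post-burn-in samples
Gm = zeros(numel(ii),1); Pm = Gm;
for k = 1:numel(ii)
    th = chain(ii(k),:);                   % one posterior realisation of (a,b)
    G = tf(th(2), [1 th(1)], 1);           % sampled plant model b/(q+a)
    [Gm(k), Pm(k)] = margin(Cq*G);         % margins of the loop gain Cq*G
end
subplot(1,2,1); hist(Gm, 30); xlabel('gain margin');
subplot(1,2,2); hist(Pm, 30); xlabel('phase margin (deg)');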
Sample Histograms of the Achieved Margins
There is strong evidence that the proposed controller will achieve a gain margin > 3.8 and a phase margin > 95°.
43
Conclusions
• Many thanks for your attention;
• collective thanks to the SYSID2009 Organisation Team!
• Deep thanks to the Uni. Newcastle Signal Processing & Microelectronics group (sigpromu.org):
  ‣ Steve Weller, Chris Kellett, Tharaka Dissanayake, Peter Schreier, Sarah Johnson, Geoff Knagge, Björn Rüffer, Adrian Wills, Lawrence Ong, Dale Bates, Ian Griffiths, David Hayes, Soren Henriksen, Adam Mills, Alan Murray, who endured multiple road-test versions of this talk that were even worse than this one.
44