IOP PUBLISHING
MEASUREMENT SCIENCE AND TECHNOLOGY
Meas. Sci. Technol. 20 (2009) 052002 (22pp)
doi:10.1088/0957-0233/20/5/052002
TOPICAL REVIEW
A review of statistical modelling and
inference for electrical capacitance
tomography
D Watzenig¹ and C Fox²
¹ Institute of Electrical Measurement and Measurement Signal Processing, Graz University of Technology, Kopernikusgasse 24, A-8010 Graz, Austria
² Department of Physics, University of Otago, PO Box 56, Dunedin, New Zealand
E-mail: [email protected] and [email protected]
Received 20 November 2007, in final form 18 December 2008
Published 3 April 2009
Online at stacks.iop.org/MST/20/052002
Abstract
Bayesian inference applied to electrical capacitance tomography, or other inverse problems,
provides a framework for quantified model fitting. Estimation of unknown quantities of
interest is based on the posterior distribution over the unknown permittivity and unobserved
data, conditioned on measured data. Key components in this framework are a prior model
requiring a parametrization of the permittivity and a normalizable prior density, the likelihood
function that follows from a decomposition of measurements into deterministic and random
parts, and numerical simulation of noise-free measurements. Uncertainty in recovered
permittivities arises from measurement noise, measurement sensitivities, model inaccuracy,
discretization error and a priori uncertainty; each of these sources may be accounted for and in
some cases taken advantage of. Estimates or properties of the permittivity can be calculated as
summary statistics over the posterior distribution using Markov chain Monte Carlo sampling.
Several modified Metropolis–Hastings algorithms are available to speed up this
computationally expensive step. The bias in estimates that is induced by the representation of
unknowns may be avoided by design of a prior density. The differing purpose of applications
means that there is no single ‘Bayesian’ analysis. Further, differing solutions will use different
modelling choices, perhaps influenced by the need for computational efficiency. We solve a
reference problem of recovering the unknown shape of a constant permittivity inclusion in an
otherwise uniform background. Statistics calculated in the reference problem give accurate
estimates of inclusion area, and other properties, when using measured data. The alternatives
available for structuring inferential solutions in other applications are clarified by contrasting
them against the choice we made in our reference solution.
Keywords: statistical inversion, Bayesian inference, Markov chain Monte Carlo, electrical
capacitance tomography
1. Introduction

Electrical capacitance tomography (ECT) is an imaging modality in which one attempts to recover the spatially varying permittivity of an insulating medium from measurement of capacitance outside the boundary of the medium [1, 2]. ECT is primarily used for non-invasive imaging within inaccessible domains in applications where differing materials show up as contrasting permittivities. Electrodes are set in an insulating material at the outside of an insulating tube. By applying a predefined voltage pattern to electrodes, the capacitance between pairs of electrodes can be directly related to measured electric potentials, electric currents and electric charges. Measurements consist of all the capacitances between pairs of electrodes, making up the matrix of trans-capacitances. The interior of the tube contains the material with unknown permittivity distribution that is being imaged. It is advantageous to surround the apparatus by an electrically conducting ground shield so that measured trans-capacitances do not depend on the environment outside the apparatus. Measured capacitances depend on the unknown permittivity. The imaging problem is to 'invert' this relationship to determine the unknown permittivity.
ECT has been proposed for a variety of target applications
such as imaging dilute as well as bulky multi-phase flows in oil
refinement, in the food industry and to observe pharmaceutical
and chemical processes [3–7]. Other fields of application include the characterization of different phases in fluidized beds, mixing processes and combustion chambers [8–10].
ECT systems can be implemented with low cost, and due
to their robustness and small failure probability are suitable
for operation under harsh environmental conditions including
the presence of strong external electromagnetic fields [11].
The functional relationship ε(x, y) → C from the
permittivity distribution to the trans-capacitance matrix C
defines the forward map. The forward map can be modelled
using the physics of the problem and is written as an elliptic
partial differential equation (PDE) of the form ∇ · (ε∇u) = q
subject to boundary conditions. The permittivity being sought
is denoted by ε and appears as the spatially varying coefficient
in the PDE. The measurements made are of the boundary
values of the electrical potential u and the flux ε ∂u/∂n.
If the electric field u can be well approximated as a small change about a known field, then the linear Born
approximation may be used to simulate the measurement
process (forward map). This would be the case when the
permittivity is essentially known up to a small uncertainty. In
most applications, however, the measurement process must be
simulated by solving the PDE subject to appropriate boundary
conditions. In that case the forward map is nonlinear [12, 13].
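Concretely, with a known reference permittivity ε₀, the Born approximation replaces the forward map by its first-order expansion (a standard linearization written in our notation, anticipating the forward map A and data d of section 4):

d ≈ A(ε₀) + J(ε₀)(ε − ε₀) + n,

where J(ε₀) denotes the Jacobian (sensitivity) of the forward map at ε₀ and n is the measurement noise, so that the inverse problem becomes linear in the contrast ε − ε₀.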
When the measurement system, consisting of the region
of interest, the electrodes and the permittivity, is long in one
direction, it follows that the electric fields do not vary in that
direction and the forward and inverse problems reduce to a
two-dimensional problem for a slice through the system. This
is an approximation to the three-dimensional inverse problem
that is usually made to reduce computational complexity.
As a general imaging technique, ECT is necessarily
low resolution.
This follows from each measurement
being dependent on all of the permittivity, resulting in
measurements primarily being sensitive to average, or slowly
varying, properties. Fine-scale structure in the permittivity
has little effect on measurements. Consequently, practical
measurements that include noise do not unambiguously define
a detailed image of the spatially varying permittivity. For this
reason it is necessary to include further information in the
imaging step, such as physical constraints. In deterministic
regularization methods this takes the form of a regularizing
functional, typically a semi-norm over representations chosen
on mathematical grounds. In contrast, the wider range of representations available in the statistical approach permits the inclusion of constraints or information that more closely represent actual knowledge about the unknown permittivity, thereby allowing genuine modelling of the unknowns to inform the imaging step. Representations and image modelling are
discussed in section 5.
The majority of ECT reconstructions reported in the
literature apply deterministic approaches such as regularized
least squares to solve the inverse problem [14–17].
Deterministic inversion consists of applying a regular
approximation to the inverse of the forward map to give a
single estimate of the unknown parameters, and includes errors
only in terms of a single number, the ‘magnitude’ of noise.
Finer details of the statistical distribution of errors are not
considered.
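For later contrast with the inferential approach, the following minimal sketch (generic linearized Tikhonov regularization with a hypothetical Jacobian J; not the method of any particular reference) shows how a deterministic inversion reduces the error model to one scalar:

```python
import numpy as np

def tikhonov_inverse(J, d, alpha):
    """Regularized least-squares estimate for a linearized problem
    d = J @ theta + noise: minimizes ||J@theta - d||^2 + alpha*||theta||^2.
    The single scalar alpha stands in for the 'magnitude' of the noise;
    no finer detail of the error distribution enters the estimate."""
    m = J.shape[1]
    return np.linalg.solve(J.T @ J + alpha * np.eye(m), J.T @ d)
```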
The inclusion of the error process is critical for practical
solution of inverse problems such as ECT where the forward
map has a large range of sensitivities to features in the unknown
permittivities. Explicitly modelling the error process, and
ensuring that the error conforms to the model, is also an
invaluable tool in developing accurate instrumentation and
an accurate forward map. All too often measurements
contain artefacts that are not part of the intended measurement
modality, or are not modelled in the forward map. Interpreting measurement or discretization artefacts in terms of an idealized forward map typically leads to substantial artefacts in the reconstructed permittivities [18]. We cannot overstress the
value of the standard technique (in statistics) of examining
residuals to validate the forward map and measurement error
distributions (see section 4).
A probabilistic model for measurement error and
other uncertainties results in a probabilistic model for the
measurement process, and inversion is then a problem of
statistical inference in which the unknown permittivity is to be
estimated. The parametrization of the unknown permittivity
is an important consideration as it determines the ease, or
difficulty, of stating constraints over allowable permittivities.
A consequent requirement is the specification of a prior
distribution over parameters since this is the primary means
of ensuring that estimates of quantities of interest are not
biased by the estimation procedure. We address the issue of
representation and prior distribution in section 5. Accurately
modelling the measurement process, including the statistics of
the measurement error, and modelling unknowns via the prior
distribution are key steps in an accurate application of Bayesian
inference to inverse problems [19–21]. This approach provides
a convenient setting for defining, incorporating, controlling
and interpreting prior information and has a wide ranging
applicability in many inverse problems.
Statistical solutions to inverse problems have a long
history, with notable developments in the work of Laplace and
Jeffreys [22], though the ingredients of a modern computational approach were introduced in seminal papers by Geman and Geman in 1984 [23] and Grenander and Miller in 1994
[24], each for a problem in image restoration. The first
substantive statistical methods applied in electrical impedance
tomography (EIT) were presented by Nicholls and Fox [25]
and Kaipio et al [26] about 10 years ago.
By taking into account uncertainties, one can quantify the
range of parameters that are consistent with measured data
via the posterior probability distribution. Then solutions to
an ill-posed inverse problem are well-determined problems in
statistical inference over that distribution. Bayesian methods
encompass much more than simply reporting a posterior mode
and can be regarded as more general than regularization.
The Bayesian paradigm has many advantages over
deterministic approaches, such as robust predictive densities,
posterior error estimates, direct support for optimal decisions
and the ability to treat arbitrary forward maps and error
distributions. One non-obvious advantage is the ability to
use a wide range of representations of the unknown system
including parameter spaces that are discrete, discontinuous or even of variable dimension. Since inferential methods make
optimal use of data, the ability to reduce data to a minimal
set gives cost savings in applications where collecting data is
expensive.
The price of these advantages is presently the relatively
high computational cost of sampling algorithms for computing
estimates. Typically, the dimension of the parameter space
is between 50 and 10⁵ depending on the problem to be
solved. Integrations required to compute estimates such as
posterior means or credible intervals are intractable through
standard analytical or numerical quadrature techniques. Best
current solutions employ Markov chain Monte Carlo (MCMC)
sampling that is computationally costly. These sampling
algorithms draw samples from the posterior distribution by
simulating a Markov chain with an appropriate transition
kernel, with popular examples being the Gibbs sampler [23]
and the Metropolis–Hastings (MH) algorithm [27]. Both
discrete and continuous problems can be treated requiring only
the ratio of probability densities of two states to be calculated.
In scenarios where material distributions change with time and
real-time performance is required, Bayesian recursive filters
such as Kalman filters (KFs) [28, 29] and particle filters (PFs)
[30] can be applied.
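As an illustration of the MH algorithm in its simplest form, the following sketch (generic Python with an assumed log-posterior function log_post; not the ECT implementation discussed later) makes explicit that only the ratio of the probability densities of two states, i.e. a difference of log-densities, is needed:

```python
import numpy as np

def metropolis_hastings(log_post, theta0, n_steps, step=0.1, rng=None):
    """Random-walk Metropolis-Hastings. A Gaussian perturbation is
    proposed and accepted with probability min(1, pi(prop)/pi(theta)),
    so only the ratio of (unnormalized) posterior densities is needed."""
    rng = np.random.default_rng(rng)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    chain = []
    for _ in range(n_steps):
        prop = theta + step * rng.standard_normal(theta.shape)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject step
            theta, lp = prop, lp_prop
        chain.append(theta.copy())
    return np.asarray(chain)
```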
To overcome the computational expense of MCMC
algorithms, much current research focuses on fast alternative
algorithms in order to extend the field of applications,
including non-stationary problems. We discuss these recent
advances in more detail in section 6.3.
In this article we review the current state-of-the-art
in performing ECT from measured data using inferential
methods. We have also tried to give the review a tutorial character by surveying the many methods available in terms of the sequence of choices that need to be made to implement one solution as opposed to another.
Throughout the review we will highlight a reference
problem in ECT, of recovering the unknown shape of a
single inclusion with unknown constant permittivity in an
otherwise uniform background material, from uncertain
capacitance measurements at electrodes outside the material
[31]. A schematic of this reference problem is shown in
figure 1. We expect that contrasting the various choices
we made in solving this reference application, against the
alternatives available, will clarify both our solution and other
structured inferential solutions.
Figure 1. A schematic of the reference problem in ECT, in which a single inclusion in permittivity is sought (labels: inclusion, electrodes, u0, shielding).

The reference problem arose in an application with the goal of quantifying void fraction (water/air) in oil pipe lines. Hence the area of the (two-dimensional) inclusion
was of primary interest. We therefore chose an explicit
representation of the boundary of the inclusion so that area is
simple to calculate. That choice of representation necessarily
introduces a bias in estimates of area (as it would with
regularized inversion) and so we adjust the prior density
over inclusions to compensate (see section 5). Numerical
implementation of the forward map uses a boundary element
method (BEM) representation of the unknown permittivity,
taking advantage of the piecewise-constant representation,
coupled to a finite element method (FEM) discretization of
the region of unchanging permittivity around the electrodes
[17]. We present the posterior distribution of inclusion area
as a check on accuracy of the method. All these aspects
correspond to choices made for this problem, some of which
we would change given a different application or measurement
modality.
The review is organized as follows. In section 2 we
introduce the sensor technology, the requirements for
an imaging system in electrical tomography, typical measurements, instrumentation error, measurement uncertainties and a calibration concept. Section 3 addresses data simulation for ECT. The sensor model, modelling error and numerical implementation via finite and boundary elements are discussed. In section 4 we formulate
ECT in the Bayesian inferential framework by introducing
a probabilistic model of the measurement process along
with basic definitions for statistical inverse problems.
Representation of unknowns as low-, mid- and high-level
and the issues in prior modelling are investigated in
section 5. In section 6 we discuss the MCMC sampling
procedure, acceleration schemes and recent advances in
sampling as well as the topic of summarizing the posterior
distribution and calculation of statistics. Inversion results
for ECT are presented in section 7 for different permittivity
distributions using synthetic and measured data. Section 8
contains a brief review of the solution of non-stationary
inverse problems. Concluding remarks are given in section 9.
2. Instrumentation for ECT
ECT is a non-invasive technique to examine the permittivity
distribution of closed objects by means of measurements
of coupling capacitances in a multi-electrode assembly. A
general setup for ECT consists of the imaging domain Ω ⊂ R² or R³ containing the unknown permittivity distribution and the domain boundary ∂Ω where a number of electrodes are placed. Typical configurations use 8–16 electrodes. A measurement circuit is connected to each of the electrodes to sense the inter-electrode capacitances, providing information about the permittivity within Ω.
2.1. ECT sensors
Two measuring principles are typically applied to determine
the matrix of trans-capacitances, or coupling capacitances:
• Charge-based or displacement current-based method (ac
voltage, low-impedance measurement) [16, 32].
• Electrode potential-based method (dc voltage, high-impedance measurement) [17].
For both methods each electrode is designated as a
‘transmitting’ or as a ‘receiving’ electrode with a prescribed
voltage being applied to transmitting electrodes. In the charge-based method the receiving electrodes are held at virtual earth
with the displacement charge being measured, while in the
voltage-based approach the receiving electrodes are floating
with the potential being measured. A comparison of the two
different measurement principles is given in [33].
For the charge-based method, frequencies of the typically
sinusoidal excitation signal are about 1 MHz in order
to provide sufficient sensitivity. The different hardware
designs of the sensing electronics, their advantages and
disadvantages have been presented by Yang [34]. More
recently, Wegleiter et al compared the charge-based method
(sinusoidal excitation signal, 40 MHz) and the potential-based method for the ECT sensor front-end in terms of
circuit modelling, robustness to stray capacitances, hardware
design issues and measurement repeatability [35]. From the
ECT hardware design perspective, sensors need to meet the
following requirements and limitations:
• A selectable operation mode of each electrode
(transmit/receive).
• Accurate measurement of very small inter-electrode
capacitances in the range of 1 fF to 5 pF in the presence
of stray capacitances of the order of 150 pF [34, 36].
• High dynamic range of the amplifying circuitry to cover
a wide range of magnitudes for electrodes adjacent and
opposite to the transmitting one and to be able to consider
both low- and high-contrast problems in permittivity.
• High measurement resolution since capacitance changes
caused by permittivity variations are very small.
• Shielding for all sensor circuits to reduce cross-talk
between transmitting and receiving electrodes.
• Parallel measurement of receiving channels.
• The possibility of sensor calibration (circuitry
adjustment).
• A monotonic transfer function for the range of interest.
• An outer screen of the sensor head is compulsory, because this will shield the sensor system from the ambient systems and prevent charge disturbances on the electrodes due to external charged objects [37].

In the charge-based method, the induced displacement current is measured and converted to a voltage signal which is proportional to the inter-electrode capacitance. In comparison to the electrode potential-based strategy, this method can be considered a low-impedance measurement. Due to the narrow frequency characteristics, the method is less affected by electromagnetic disturbances. The narrow noise bandwidth implies an improved signal-to-noise ratio (SNR) and consequently a higher resolution in measured quantities. By including a tunable low-pass filter, varying stray capacitances can be compensated in a very simple and natural way [38]. Unfortunately this approach does not work for the electrode potential-based method [35]: a filter between the electrode and the operational amplifier significantly reduces the impedance of the input stage. Within the high-impedance method, the stray capacitance and the inter-electrode capacitance form a capacitive voltage divider that reduces the amplitude of the received signal. Low-impedance measurements exhibit better noise characteristics and are less sensitive to varying stray capacitances.
Due to these advantages we use a charge-based sensor built at Graz University of Technology for the different investigations and experiments presented in this work [35]. The complete sensor front-end of the charge-based ECT sensor consists of an input resonant circuit, a low-noise current-to-voltage converter, a bandpass filter, a logarithmic demodulator and a 24 bit analog-to-digital converter controlled by a microprocessor. Figure 2 illustrates the measurement configuration of the ECT sensor used for permittivity imaging. The sensor has a carrier frequency of 40 MHz and comprises two tuneable filters, adjusted by means of variable capacitances, for stray capacitance compensation. The data acquisition has a maximum sampling rate of 7.5 ksamples/s. The receiving channel offers linear characteristics from 10 dBμA to 65 dBμA. A single measurement frame consists of 16 projections, according to the 16 available transmitting electrodes. A measurement frame consequently consists of 16 × 15 = 240 entries. The first 105 displacement current values of a measured frame are shown in figure 3. In most charge-based ECT applications the following restrictions are assumed to simplify sensor modelling:
• The permittivity is independent of the electric field strength.
• The carrier frequency is constant and the wavelength is large compared to the sensor geometry, leading to an electrostatic model.
• Stray capacitances in the longitudinal direction are not considered (due to the length of the electrodes).
The configuration used in the present study is aimed at multi-phase flow monitoring, hence a ring of electrodes covering the cross-section of a process pipe is used. The 16 electrodes are evenly distributed around its boundary. This yields a set of 16 × 15/2 = 120 independent inter-electrode capacitances from which the internal permittivity distribution has to be reconstructed (the electrode-pair bookkeeping is sketched below).
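The frame structure just described can be made concrete with a few lines of bookkeeping (a minimal sketch, not the authors' code; electrode indexing is assumed):

```python
# Bookkeeping for a 16-electrode frame: each of the 16 transmitting
# electrodes is received on the 15 others (240 raw entries per frame),
# while reciprocity (C_ij = C_ji) leaves 120 independent capacitances.
n_electrodes = 16

raw_entries = [(tx, rx) for tx in range(n_electrodes)
               for rx in range(n_electrodes) if rx != tx]
independent_pairs = [(i, j) for i in range(n_electrodes)
                     for j in range(i + 1, n_electrodes)]

assert len(raw_entries) == 240        # 16 x 15 entries per measurement frame
assert len(independent_pairs) == 120  # 16 x 15 / 2 independent capacitances
```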
Figure 2. Measurement configuration of the used ECT sensor (per-electrode grounded front-ends with I/U conversion, logic and HF stages, μC-controlled acquisition, PVC tube and grounded shielding; the data feed the image reconstruction algorithm). The measurement electrodes are placed around the pipe containing the imaging plane. Every electrode features dedicated transmitting and receiving hardware.
Figure 3. The first 105 values of the acquired measurement vector for a typical material distribution of PVC in air. The crown-shaped profile results from the wide range of coupling capacitances between the transmitting and receiving electrodes.

2.2. Sensor calibration

When using measured data, special focus has to be put on the calibration of the computer model in order to successfully reconstruct parameters from the data. Calibration is performed by fitting the model to measured data for known internal permittivities, by adjusting stray capacitances. Stray capacitances are represented by two parameters—the radial distance between the tube and the outer screen, and the permittivity within this space. In cases where other aspects of the geometry are also uncertain, we include parameters to describe the possible variation. Figure 4(a) shows the relative error between model and data for an empty pipe. According to the normalized quantile plot in figure 4(b), the difference is Gaussian distributed, yielding a multivariate Gaussian error model. The remaining offset error (model-to-data mismatch) is corrected for the empty pipe (figure 5(a)). A second point of calibration can be obtained by comparing simulated and measured data for a well-defined target—a centred PVC rod in our example. The gain error between model and data is then corrected for each electrode (figure 5(b)). Note that the prescribed calibration distribution should correspond to an expected permittivity distribution in terms of permittivity values and shape. It is very helpful to calibrate the model for a reference distribution that will lead to improved SNR, which can be achieved by placing the known reference object in the centre of the pipe where there is low sensor sensitivity. Due to temperature fluctuations and deterioration effects it is advisable to repeat the calibration, in terms of offset and gain correction, during operation of the sensor.

2.3. Measurement uncertainty

The sensor front-end and the subsequent instrumentation introduce noise sources to the measurement process. The applied voltage also has error, but this error is not significant and we do not consider it here (see [39] for an analysis that does). To investigate the robustness and repeatability of data acquisition, the distribution of measured displacement currents was examined over multiple measurements with a given fixed permittivity distribution [31]. Figure 6 shows the normalized quantile plot and the histogram for 2000 measurements at one electrode. The measured electrode displacement currents exhibit noise properties that can be well modelled as additive zero-mean Gaussian with standard deviation σ ≈ 0.07 μA. The matrix of sample correlation coefficients for all electrodes is shown in figure 7. As can be seen, off-diagonal elements are plausibly zero, so the measurement error covariance matrix is modelled as Σ = σ²I, where I is the identity matrix. Accordingly, the density for measuring d given the permittivity defined by θ is the multivariate Gaussian

π(d|θ) ∝ exp{−(1/2)(qm − d)ᵀ Σ⁻¹ (qm − d)},   (1)

where qm denotes the vector of simulated displacement charges.
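As a concrete illustration, this minimal sketch (assumed names; not the authors' code) evaluates the log of the likelihood in equation (1) for a measured frame d, given simulated charges qm and the fitted noise level σ ≈ 0.07 μA:

```python
import numpy as np

SIGMA = 0.07e-6  # fitted noise standard deviation (0.07 uA), in amperes

def log_likelihood(d, q_m, sigma=SIGMA):
    """Log of the Gaussian likelihood of equation (1), up to an additive
    constant, for the diagonal covariance Sigma = sigma**2 * I."""
    r = q_m - d                       # residual: simulation minus data
    return -0.5 * np.dot(r, r) / sigma**2
```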
Figure 4. (a) Relative error (in %) between simulated and measured data for an empty pipe, i.e. for an air-filled pipe (εr = 1.0), over the 240 entries in the charge vector. (b) Normalized quantile (QQ) plot of the error between measured and simulated displacement currents against standard normal quantiles. The distribution of displacement currents meets the Gaussian assumption.
Figure 5. (a) Difference between measured and simulated data for an empty pipe, over the 240 entries in the charge vector; this difference is referred to as the offset error [A]. (b) Gain error between measured and simulated data corresponding to a centred PVC rod, after offset correction.
3. Data simulation for ECT
3.1. Mathematical model
As discussed in section 2, measurements for ECT consist of
the displacement charges at electrodes that result from voltages
applied at electrodes. The forward map is the deterministic
functional relationship from permittivity distribution to the
noise-free displacement charges. In practice it is necessary
to solve for the electric fields throughout the measurement
apparatus, from which the displacement charge at electrodes
may be evaluated.
The particular set of displacement charges measured,
and hence the forward map, actually depends on the set of
electrode voltages applied during the measurement procedure.
Here we describe the procedure employed in the ECT
instrumentation in Graz, where each electrode in sequence is made 'active' with an asserted potential of v0 while the remaining electrodes j ≠ i act as 'receivers' and are held at potential vj = 0. The arrangement is surrounded by a grounded outer screen.
We denote the measurement region by Ω, being the region bounded by the electrodes and outer shield. Let Γi, i = 1, 2, ..., Ne denote the boundary of electrode i when there are Ne electrodes, and let Γs be the inner boundary of the outer shield. Then the boundary of Ω is ∂Ω = Γ1 ∪ · · · ∪ ΓNe ∪ Γs. In the absence of internal charges, the electric potential u satisfies the generalized Laplace equation, in which the permittivity ε appears as a coefficient, along with Dirichlet boundary value conditions corresponding to the voltage asserted at electrodes.
For the case when electrode i is held at potential v0 while all others are held at virtual earth, the potential ui satisfies the Dirichlet boundary value problem (BVP)

∇ · (ε∇ui) = 0  in Ω,
ui|Γi = v0,   ui|Γj = 0 for j ≠ i,   ui|Γs = 0.   (2)

For brevity we have not written ε(r) and ui(r) showing the functional dependence on position r ∈ Ω, but take that spatial variation to be implicit. The charge at the sensing electrode j can be determined by integration of the electric displacement over the electrode boundary,

qi,j = −∫_Γj ε (∂ui/∂n) dr,   (3)

where n is the inward normal vector.
It can be seen from equations (2) and (3) that the measured displacement charges qi,j are a linear function of the voltages asserted at electrodes ui|Γj. (The linear relationship is operation by the matrix of trans-capacitances C.) Note that the measurement set we describe above corresponds to asserting the standard basis of electrode voltages, to fully characterize the linear mapping. Consequently, measurements made using any other set of electrode voltages are a function of the set we describe here.
In the presence of noise, however, measurements made using some voltage patterns contribute more information about the unknown permittivity than others. Then, for a given number of measurements, optimal resolution may be achieved using an incomplete set of voltage vectors [40].

3.2. Nature of the forward map

The forward map, defined by solving the BVP (2) and then evaluating equation (3), defines a map from Dirichlet data to Neumann data (or flux) on the boundary of the region, and hence gives a representation of the 'Dirichlet to Neumann' (DtN) map. The measured trans-capacitance matrix C is a discrete version of this map, between electrodes. We note, in passing, that the BVP being self-adjoint implies that C is symmetric, while the definiteness of the generalized Laplacian implies that C is positive definite [41].
At a typical SNR of 1000:1, a study of the linearized forward map shows that ECT data contain information about at most 10³ independent features of the permittivity [42]. This number increases linearly with geometric improvement of SNR, and hence is effectively an upper bound for practical instrumentation. It seems plausible that a similar bound holds for the nonlinear problem. Hence the use of 16 electrodes in ECT, giving 120 independent electrical measurements, is sufficient. Further measurements effectively only increase the SNR, which is achieved most efficiently by longer acquisition times rather than by more electrodes and instrumentation.

Figure 6. Distribution of 2000 measured displacement currents on one electrode. The distribution can be well modelled as a Gaussian distribution, according to the normalized quantile plot (top) and the histogram of the displacement currents (bottom).

Figure 7. Matrix of correlation coefficients ρxy for the 16 electrodes. Off-diagonal elements are almost zero.

3.3. Model error

Model error occurs when the (theoretical) data defined by the forward map differ from the noise-free data produced by the physical measurement system. For many complex inverse problems model error is the most fundamental source of uncertainty since it is not avoidable. However, it is not usually a significant issue in ECT, where precise instrumentation is represented well by the electric field equations. The most likely source of model error is when the actual and modelled geometries differ [43]. An interesting investigation of model error is given in [44], though it is important to note that in that work the correct (simulated) model lies within the range of assumed models. Much more problematic is model error outside any known range, though frameworks for that problem have been developed recently [45].

3.4. Computer model

While there are a few idealized problems in ECT for which analytic solution of the forward map is possible [46], in all practical cases accurate simulation of data requires computer evaluation of a discretized version of equations (2) and (3).
Two discretization schemes are most commonly used: the finite element method (FEM) and the boundary element method (BEM). Both approaches have been applied to ECT. For the reference problem we have used a coupled FEM/BEM scheme [17, 31], with the region being imaged using a BEM formulation coupled to a FEM discretization of the insulating pipe and the region outside the electrodes.

3.4.1. Boundary element method. The BEM uses a discrete form of the boundary integral equations [47] that express fields within regions of constant permittivity in terms of values at the boundary of the region. Hence BEM is applicable to problems where the permittivity is piecewise constant, as in our reference problem, and has been used extensively in ECT [30, 48, 49].

Figure 8. Cross-sectional view of an inclusion (Ω2, with permittivity ε2 and boundary ∂Ω2) in a background material (Ω1, with permittivity ε1) within the domain Ω with boundary ∂Ω.

Figure 8 illustrates a BEM discretization of an elliptic inclusion (ε2) in an otherwise constant background material with permittivity ε1. For simplicity, linear boundary elements are used in the reference problem, and the electric potential and its normal derivative are assumed to be constant on each of the Nb elements, though higher order schemes are possible [50]. The resulting system to be solved has the form

(K + (1/2)I) u = H q,   (4)

where u is a known vector of Dirichlet values, q is the vector of Neumann values being solved for, and the matrices K and H are dense, not symmetric, and of size Nb × Nb when there are Nb boundary elements in total.
BEM suffers from many potential numerical difficulties that make it problematic for use in solving inverse problems by iterative methods, whether statistical or deterministic. One such problem is that the matrix system to be solved becomes highly ill-conditioned when regions are thin or significantly non-convex. While these geometry-based problems have well-known solutions, they do add to the complexity of computer codes; otherwise they pose a genuine difficulty for sampling algorithms, since the state is required to explore all possible states, including those for which BEM fails. A 'fix' that we have implemented for the reference problem is to modify the prior over boundaries to exclude states that present a numerical difficulty, such as polygonal boundaries with thin 'spikes'. This pragmatic solution is unlikely to affect results in the reference application, where we can argue on physical grounds that surface tension at voids causes inclusions to be smooth.
Further, the complexity of the BEM formulation increases dramatically as the number of inclusions increases. This makes BEM unsuitable for problems where a variable number of inclusions is allowed, such as the high-level representations discussed in section 5. In three-dimensional problems the number of nonzero entries in the BEM system exceeds the number in a FEM discretization, making BEM unsuitable because of computational cost.

3.4.2. Finite element method. In FEM discretization of equations (2), the region of interest is usually discretized as the union of triangular elements, each of constant permittivity, with the potential interpolated between nodes by piecewise linear functions [16, 51]. Figure 9 shows a FEM mesh recently used for ECT [52].

Figure 9. A finite element mesh used for ECT.

The discretized area includes the insulating pipe (dark grey), the region outside the pipe with electrode inset (light grey) and the region inside the pipe containing the material of unknown permittivity being imaged. This mesh has about 6000 elements and is 'unstructured', with smaller elements around the electrode ends to give an accurate representation of rapid changes in the fields, and with larger elements towards the centre of the pipe where the decreased resolution of ECT does not warrant finer division of the permittivity [42].
FEM discretization results in a linear system to be solved of the form

Ki u = fi,   (5)

where u is the vector of nodal values over the whole mesh being solved for, fi is a forcing vector and Ki is the stiffness matrix modified for the Dirichlet conditions corresponding to nonzero voltage on electrode i. Notably, the matrix Ki is symmetric, sparse and of size Ne × Ne when there are Ne nodes in the mesh.
Data is simulated by solving the FEM equation (5) for multiple fi, one for each asserted voltage pattern. For two-dimensional problems, efficient solution is achieved by first operating by a bandwidth-reducing permutation followed by Cholesky factoring of Ki [52]. For three-dimensional problems with fine meshes, multigrid solvers are significantly faster, and also provide access to cheap solutions at coarse scales that may be utilized within the MCMC to decrease overall compute time [53].
For an efficient numerical implementation of (3), the electrode charge is calculated directly from the node potentials ui for sending electrode i and the global finite element stiffness matrix K, free of boundary conditions, without having to resort to gradient calculations,

qi,j = Σ_{nj} k_{nj}ᵀ ui,   (6)

where the sum contains the scalar products of the rows k_{nj} of the stiffness matrix, corresponding to the nodes nj of the sensing electrode, with the solution vector.
In our recent work in ECT, and EIT, we have used the FEM discretization only, for reasons of speed and generality, and for the generic structure of the FEM system of equations that allows very efficient calculation of Jacobians and local updates [39, 54]. The permittivity ε(r) defined by parameters θ is first mapped to FEM elements, and the solution is then calculated.
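To make the computational pattern concrete, the following minimal sketch (hypothetical names, using a generic scipy sparse solve; not the authors' implementation) solves equation (5) for one excitation and accumulates electrode charges as in equation (6):

```python
import numpy as np
import scipy.sparse.linalg as spla

def simulate_charges(K_global, K_i, f_i, electrode_nodes):
    """Solve the FEM system (5) for one asserted voltage pattern and
    evaluate the electrode charges as in equation (6).

    K_global        : sparse stiffness matrix free of boundary conditions
    K_i, f_i        : system matrix and load vector with the Dirichlet
                      conditions for sending electrode i applied
    electrode_nodes : dict mapping electrode index j to its node indices
    """
    # Sparse direct solve; in practice a (permuted) Cholesky factorization
    # of K_i would be computed once and reused for every excitation.
    u_i = spla.spsolve(K_i.tocsc(), f_i)

    charges = {}
    for j, nodes in electrode_nodes.items():
        # Equation (6): sum of scalar products of the stiffness rows
        # k_nj (nodes of sensing electrode j) with the solution vector.
        charges[j] = float(np.sum(K_global[nodes, :] @ u_i))
    return charges
```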
3.4.3. Discretization error. In BEM or FEM formulations, the number of elements used is a compromise between numerical accuracy and computational effort. However, good imaging results require that the discretization be made sufficiently fine that errors introduced through discretization are smaller than measurement errors. The mesh depicted in figure 9 was designed as the coarsest mesh meeting this requirement. Failing to achieve that, and not including discretization errors correctly, leads to substantially increased errors in the recovered permittivity. This important result is demonstrated explicitly in [18, 55]. Coarse numerical discretization has been used, either in conjunction with accurate solvers or by including discretization error, to speed up sample-based inference in ECT. Schemes for achieving this are covered in section 6.3.2.

3.4.4. Factorizations and derivatives. When efficient solution of systems (4) and (5) is performed by first factorizing matrices, it is feasible to directly maintain the QR factorization in BEM, and the Cholesky factorization in FEM [56], for a potential gain in computational efficiency. Both these schemes have been applied to EIT, as has direct updating of solutions using the Woodbury formula [39], though the latter is numerically unstable in the long term. Both FEM and BEM formulations also allow efficient calculation of derivatives. In a FEM formulation, operation by the Jacobian from finite element coefficients to solutions may be performed using only solutions at the current state [13], with gradients with respect to the parameter vector calculated using the chain rule. In BEM formulations, expressions for the Fréchet derivative of the forward map allow evaluation of the gradient with respect to the boundary by solving a non-homogeneous equation with the current BEM matrices [57].

4. Formulation of ECT as Bayesian inference

Application of Bayesian inference to ECT provides a framework for quantified model fitting, by explicitly forming the conditional distribution of parameters defining the permittivity, and unobserved data, given observed data. The distribution over parameters quantifies the degree to which measurements, combined with other knowledge, determine the unknown permittivity, or properties of the permittivity. The uncertainty over permittivities arises from several main sources: measurement noise, measurement sensitivities that are small compared to the noise, model inaccuracy, discretization error and a priori uncertainty in the true parameters.
Defining a distribution over allowable permittivities includes the formulation used in deterministic approaches of seeking a single solution, when the distribution is highly peaked around a single parameter value. It also allows for the more general circumstance of multi-modal densities corresponding to multiple solutions being consistent with the data. However, the real power for high-dimensional inverse problems is that robust estimates may be calculated over the density, whereas modes of the density corresponding to deterministic solutions can give results that are highly sensitive to the particular realization of noise in the measured data [12, 58].

4.1. Likelihood function

The unavoidable presence of measurement noise means that the measurement process is probabilistic, as we saw in section 2.3. The inverse problem is then naturally a problem of statistical inference. In the following we outline the inferential formulation in a general setting and relate it to the reference problem in ECT.
For the sake of definiteness, consider the case of additive noise n with probability density function πn(n). In most cases the measurement noise has a multivariate Gaussian or a Gibbs distribution [55], though it is important to note that any noise process may be treated. Then the measurement process can be written as

d = A(θ) + n,   (7)

where A(θ) denotes the forward map describing the mapping from the permittivity defined by θ to noise-free measurements, and n is a realization from the noise process.
Equation (7) represents a decomposition of measurements into deterministic (A(θ)) and random (n) parts. For the instrumentation described in section 2 the decomposition was relatively clear, largely because of careful construction of instrumentation and repeated improvement to remove stray effects. In general, however, the decomposition is somewhat arbitrary since it is possible to describe any effect as random. The random part has a minimal component consisting of thermal and shot noise in the electronics, and digitization errors, and often includes external interference, though the
latter can be modelled as deterministic through the use of 'nuisance variables'. Our experience is that the quality of imaging, and inference, is always improved by putting as much as possible into the deterministic part, though the complexity of physical modelling and computation often sets a practical limit.
The conditional probability density for measuring d given that θ is the true parameter then follows from equation (7), and is

π(d|θ) = πn(d − A(θ)),   (8)

since the change of variables has determinant 1. Making a set of measurements corresponds to drawing a sample d from π(d|θ), which is a probability distribution parametrized by the unknowns θ via the forward map A.
As a function of θ, π(d|θ) is not a probability density function; it is usually written l(θ|d) and referred to as the likelihood function. The likelihood principle is the formal statement that all information that the data d contain about the unknown parameters θ is encoded in the likelihood function, for fixed d.
The form of the likelihood function we use in the reference problem is given in equation (1), accounting for errors in measured displacement charges. In practice the voltages asserted at electrodes are also imprecisely known and the nominal values should be considered as part of the measurement set. A framework for that more complete analysis is given in [39], which augments rather than changes our development here.

4.2. Bayesian inference

Statistical inference aims at recovering parameters θ and assessing the uncertainty about these parameters based on all available knowledge of the measurement process and the measurement noise, as well as information about the unknowns prior to the measurement. In the Bayesian formulation, inference about θ is based on the posterior density

π(θ|d) = l(θ|d)π(θ) / π(d),   (9)

where π(θ) denotes the prior density, expressing the information about θ prior to the measurement of d. The topic of prior distributions is covered in section 5. The posterior density π(θ|d) denotes the probability density over θ given the prior information and the measurements. The denominator π(d) = ∫ l(θ|d)π(θ) dθ is a finite normalizing constant, since the integral of the posterior probability density function over all possible causes must equal one. In the case of a fixed forward map, such as in ECT, this probability density does not need to be calculated explicitly and we can work with the non-normalized posterior distribution determined by the likelihood function and the prior density function.
In this inferential framework, solution of the inverse problem corresponds to providing statistics that summarize the posterior distribution. How to summarize the posterior depends greatly on the application under consideration. For example, a posterior distribution peaked around a single value is well summarized by giving just that value and a measure of width, corresponding to a well-defined inverse image with uncertainty bounds. Bimodal distributions need at least two values reported, and so on. Summary statistics of some function f(θ) may be calculated as expectations over the posterior,

E[f(·)] = ∫ f(θ) π(θ|d) dθ,   (10)

with common statistics being the mean E[θ] and the variance E[(θ − E[θ])²] of θ. Since the parameter space is usually of high dimension, the required integrals cannot be performed analytically, or using deterministic numerical methods such as Gaussian quadrature. Fortunately, Monte Carlo approximations can be evaluated with tractable computation, as discussed in section 6.
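As a minimal illustration of equation (10) (assumed names; not from the paper), a posterior expectation is approximated by an average over MCMC samples:

```python
import numpy as np

def posterior_expectation(samples, f):
    """Monte Carlo approximation of equation (10): the expectation of
    f over the posterior, estimated from MCMC samples theta_1..theta_N."""
    return np.mean([f(theta) for theta in samples], axis=0)

# Example (hypothetical names): posterior mean and variance of a scalar
# property such as inclusion area, given sampled parameter vectors.
# mean_area = posterior_expectation(samples, area)
# var_area  = posterior_expectation(samples, lambda t: (area(t) - mean_area)**2)
```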
4.3. Sensor calibration

Our inferential formulation for the reference problem is actually a little more complicated than the general procedure given above, because of the sensor calibration step. The permittivity ε is decomposed (in this section only) into unknown and fixed parts, with separate domains. We write εext = ε|Ωext and εint = ε|Ωint, where Ωint is the interior of the pipe, which in the application contains material of unknown permittivity, and Ωext is the pipe and exterior region in which the electrodes are fixed.
Calibration consists of estimating the exterior permittivity εext, parametrized by θext containing a few permittivity values as well as a few parameters describing possible deviations from ideal geometry. Repeated measurements are made with simple known interior permittivities parametrized by the (known) parameter θint, allowing the simple best-fit estimate

θext = arg max_{θext} π(d|θext, θint).   (11)

We then fix, i.e. condition on, this estimate to give the likelihood function for inference about the unknown θint.

4.4. A short reading list in Bayesian inference

Our treatment here of the Bayesian formulation is necessarily cursory, and light on technical details. For more details of Bayesian formulations of inferential problems in general (not necessarily for inverse problems or ECT) we recommend [59]. Computing expectations over complex densities in parameter space necessarily uses sampling algorithms. A practical introduction is given in the first few chapters of [60]. More technical results regarding convergence of MCMC methods can be found in [61]. A kit-bag of useful ideas for speeding up MCMC can be found in [62].

5. Representation of unknowns and the prior distribution

Since the primary unknown, i.e. the permittivity, is a spatially varying function, recent developments in spatial statistics [63] and pattern theory [64] are directly applicable. They provide means of stating loose, generic and specific information about the unknown permittivity, as befits the application. A good overview of image analysis from the statistician's viewpoint
is [19]. The central components of all image models are the
representation of the unknown image, i.e. the parameters or
coordinates used to define an image, and a normalizable prior
density over the space of representations.
Representation and knowledge are inextricably linked,
and so the reason for choosing a particular representation
should be largely determined by the type of knowledge one
wants to express or calculate. For example, in the reference
ECT problem we know that the permittivity is two valued
(background and inclusion) and our primary interest is the
area of the inclusion. A polygonal representation of the
inclusion, with background and inclusion permittivity values,
is a suitable representation since it automatically states the
physical knowledge and allows straightforward calculation of
the area. In contrast, a grey-scale pixel image would require
further constraints to state the prior knowledge, while the
calculation of inclusion area is a non-trivial task requiring
identification of the inclusion boundary amongst other steps.
In many applications, the use of representations that provide
quick access to properties of interest can provide substantial
efficiencies, since then image restoration and image analysis
are not separate tasks.
It is useful to classify representations, and priors, as low-level, mid-level and high-level. Low-level representations are local and generic, and usually very high-dimensional, such as grey-scale pixel images, or the vector of element coefficients in a FEM discretization. These representations can be used for any image, but are inconvenient for stating or calculating anything other than local structural information. Mid-level models are also generic, but provide convenient ways of expressing quantities of interest such as geometric features of objects, or relationships between objects. An example is the polygonal representation used in the reference problem. High-level models capture important, possibly complex, features of the images and are useful for answering global questions about the image, such as counting the number of objects of a given type.
The formulation of a statistically sensible prior
distribution over the space of representations is a major
practical difference between regularization and Bayesian
inferential methods. Consistency of the statistical formulation
and guaranteed convergence of sampling algorithms both
require that the prior density be normalizable. We find that
the requirement of specifying a parameter space with finite
volume has the added benefit of forcing us to be explicit when
modelling the image. In the Bayesian framework it is typical to
test modelling assumptions by drawing several samples from
the prior distribution and ensuring that they look reasonable.
In contrast, typical regularization functionals would fail these
requirements.
The role of the prior distribution is typically different for
low-, mid- and high-level representations, as we will see in the
following review of representations and priors for ECT.
5.1. Low-level models

Low-level representations use grey-scale values over a pixel (voxel) lattice, or a fixed FEM discretization, and can be used for arbitrary permittivity distributions. Grey values may be restricted to two values (black/white) [13, 53, 65], to a finite set of allowable values [25] or, more typically, to any positive value [26, 44]. Low-level prior distributions usually have the role familiar in regularization of preferring smoothness, sometimes modified to implicitly allow non-smooth behaviour such as edge processes. Good overviews of low-level prior modelling in EIT are given by Kaipio et al [26, 55] and Siltanen et al [66], and for single-photon emission computed tomography by Aykroyd et al [67].
The following gives a brief discussion of low-level prior models used for ECT; a sketch of evaluating the total variation prior follows the list.
• Gaussian priors. The Gaussian white noise prior, in equation (12), is the most widely used prior model, since the diagonal covariance matrix generalizes standard Tikhonov regularization,

π(θ) ∝ exp{−(1/(2σ²)) ||θ − θ̄||²}.   (12)

The variance σ² describes the variability of the unknown parameters θ around the assumed mean value θ̄.
• Markov random field (MRF) priors. Smoothness priors are special cases of MRFs in which the conditional density over a pixel (element) depends on the remaining parameters only through its neighbours. A typical MRF prior is the total variation prior [55], given by

TV(θ) = Σ_{i=1}^{M} Σ_{j∈Ni} lij |θi − θj|,   (13)

where Ni ⊂ {1, 2, ..., M} is the set of possible neighbours of θi, with i ∉ Ni, and i, j are neighbours with a common edge of length lij. Common neighbourhood structures are induced by the pixel lattice or FEM mesh. The prior probability density is then π(θ) ∝ exp{−β TV(θ)}, where β is a smoothing parameter. We note that π(θ) is a Gibbs distribution [23]. An application of a non-standard MRF prior can be found in the recovery of resistor values in an electrical network from electrical measurements collected at the boundary [13].
• Impulse noise priors. Such priors are typically applied to low-contrast problems where small regions in an otherwise uniform background have to be recovered (e.g. bright stars in a black sky). Representative priors are the maximum entropy prior [68] and the L1 prior [69].
• Sample-based priors. In some applications it is possible to define a representative ensemble of images that may occur, through a set of sample images that define an empirical prior density. This prior has been used in the context of pixel image models [55].
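As promised above, this minimal sketch (assumed data structures; not from the paper) evaluates TV(θ) of equation (13) and the corresponding unnormalized log-prior for a mesh given as an edge list:

```python
import numpy as np

def total_variation(theta, edges, lengths):
    """Equation (13): TV(theta) = sum over neighbouring element pairs
    (i, j), sharing a common edge of length l_ij, of l_ij*|theta_i - theta_j|.

    theta   : array of element values (e.g. permittivities)
    edges   : list of (i, j) index pairs of neighbouring elements
    lengths : array of common-edge lengths l_ij, one per pair
    """
    i, j = np.transpose(edges)
    return np.sum(lengths * np.abs(theta[i] - theta[j]))

def log_tv_prior(theta, edges, lengths, beta):
    # pi(theta) propto exp(-beta * TV(theta)): a Gibbs distribution.
    return -beta * total_variation(theta, edges, lengths)
```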
5.2. Mid-level models

Mid-level representations allow access to generic structural information about the unknown permittivity, without imposing complex structure. Examples of mid-level priors used for
ECT, or EIT, include the ‘type field’ or segmented MRF
prior [25, 70, 71], coloured continuum triangulations [72] and
explicit boundaries between piecewise-constant permittivities
such as the polygonal boundary used in the reference
application (though some classify the latter as high level [19]).
Representing unknowns as piecewise constant via
boundaries has been widespread in ECT, with general
contour models based on Fourier descriptors [66, 73], splines
[74], radial basis functions [54], front points [75], Bezier
curves [76] and simple polygons [48, 49, 54].
Few-parameter representations of smooth contours [77, 78] and
smooth transitions [42] between regions of different physical
properties have also been used.
Our solution to the reference problem uses a polygonal representation of the boundary, so the permittivity is defined by the parameter θ = ((x1, y1), (x2, y2), ..., (xn, yn)) giving the vertexes of an n-gon for some fixed n, typically in the range 8–32. We also include the two permittivity values, but we will omit that consideration here for clarity. A basic prior density over this representation is to sample each vertex (xk, yk) uniformly in area from the allowable domain and restrict to simple polygons, i.e. not self-crossing. This prior density has the form

π(θ) ∝ I(θ),   (14)

where I is the indicator function for θ representing a feasible polygon. That is, the prior density is constant over allowable polygons. The change of variables relation for probability distributions shows that a uniform density in vertex position gives a density over area that scales as (area)^(−1/2). Hence for large polygons, and inclusions, where most polygons are simple, this prior puts greater weight on small areas, resulting in estimated areas that will always be smaller than the true area. The constraint that the polygon be simple complicates this picture for small-area inclusions, since a greater proportion of small polygons are self-crossing. An empirical distribution over prior area, given by sampling from the prior, is shown in figure 10 for n = 8. The overall effect is that the area of large inclusions will be underestimated while the area of small inclusions will be overestimated, with the division between 'small' and 'large' depending on the number of vertexes n. This effect necessarily occurs in regularized or least-squares fitting of contour-based models, since effectively the constant prior model is used.

Figure 10. Empirical distribution over area for the indicator function prior density.
Since we are primarily interested in area of inclusions, we
remove this bias by explicitly specifying a prior in terms of
area, given in equation (15) [79], though scaling based on the
empirical distribution to give a prior that is non-informative
with respect to area is also possible [52]. The circumference
of the inclusion c(θ ) with respect to the circumference of a
circle with an area equal to the area of the polygon (θ ) is
calculated. The variance σpr2 is chosen to be small to penalize
small and large areas
c(θ )
1
−1
π(θ ) ∝ exp − 2
I (θ ) .
(15)
√
2σpr 2 (θ )π
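To make the evaluation concrete, the following minimal Python sketch computes the log of (15) for an (n, 2) array of vertex coordinates. The simple-polygon test `is_simple` (the indicator I(θ)) and the value of `sigma_pr` are placeholders, not values taken from our implementation.

```python
import numpy as np

def area_and_circumference(vertices):
    """Shoelace area and perimeter of a closed polygon given as an (n, 2) array."""
    x, y = vertices[:, 0], vertices[:, 1]
    x_next, y_next = np.roll(x, -1), np.roll(y, -1)
    area = 0.5 * abs(np.sum(x * y_next - x_next * y))
    circumference = np.sum(np.hypot(x_next - x, y_next - y))
    return area, circumference

def log_prior(vertices, sigma_pr, is_simple):
    """Log of the area-based prior of equation (15), up to an additive constant.

    is_simple(vertices) stands in for the simple-polygon (non-self-crossing)
    test, i.e. the indicator I(theta)."""
    if not is_simple(vertices):
        return -np.inf
    area, circ = area_and_circumference(vertices)
    # ratio of the circumference to that of a circle of equal area; >= 1 always
    ratio = circ / (2.0 * np.sqrt(np.pi * area))
    return -(ratio - 1.0) / (2.0 * sigma_pr**2)
```

Since the ratio equals 1 only for a circle, small σpr concentrates the prior on near-circular, moderate-area polygons, which is the intended penalty on very small and very large areas.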
Redundancy in the polygonal representation can also lead to numerical inefficiency without contributing to quality of reconstructions. These difficulties can be circumvented by further modifying the prior distribution.

We briefly mention some other mid-level priors that have been used in ECT.

• Smoothness prior over star-shaped polygons. Aykroyd et al represented polygons in a star-shaped manner, parametrizing the centre of the star and radii r = (r1, r2, . . . , rm) at m equi-spaced angles [48]. They specified a prior intended to give smoothness of the boundary, using

π(r) ∝ exp{ −(1/(2ν²)) Σ_{i∼j} (ri − rj)² },  (16)

where i ∼ j indicates neighbouring radii. In the context of the reference problem, it is interesting to note that this prior will exhibit the bias in (large) area outlined above, as is evident from computed results.
• Structural priors.
Especially in medical imaging,
geometry and position of structures are often known
a priori. Hence, it is reasonable to include this knowledge
in the form of a prior model [80, 81], or equivalently by
fusing different sensing modalities. An example is the
use of ultrasound tomography to recover boundaries for
use in ECT [82].
Parameter spaces for mid-level models are not usually
linear spaces. Consequently, non-uniqueness or ill-posedness
results for forward maps based on the theory of linear spaces
do not apply. We have experience of industrial applications
where an inverse problem that was under-determined for a
low-level linear-space model became over-determined for a
mid-level model and required reduction of the measurement
set to avoid excessive computation.
5.3. High-level models
High-level models incorporate structural information by
modelling objects in the image. Representations are typically
of variable dimension, allowing for differing numbers of objects.
Hence, perhaps contrary to expectation, high-level models
are usually higher (infinite) dimensional models than low- or mid-level models. High-level prior densities are defined over individual objects and, importantly, also over the number of objects to provide a trade-off between model complexity
and data fit, as in an ‘information criterion’. A high-level
representation was used in [70] allowing conditioning on, and
counting of ‘blobs’ in an EIT application. It seems inevitable
that high-level deformable template models [64] developed for
medical imaging, and other applications, will find application
in ECT.
6. Summarizing the posterior distribution
In the following we write the abbreviated π(θ ) for the posterior
density π(θ |d). Exploration of the posterior distribution
is performed using Markov chain Monte Carlo (MCMC)
sampling that generates a Markov chain with equilibrium
distribution π by simulating an appropriate transition kernel
[83, 84].
The long-term output of an MCMC sampler is a set of states θi distributed according to π, and we write θi ∼ π. The empirical
distribution defined this way can be used to summarize π , or in
exploratory analyses the samples may simply be displayed to
gain understanding about the nature of permittivities consistent
with the data. In many applications a single sample from
π provides a better reconstruction than regularized inversion
[58]. A few independent samples, say 2–4, can establish a
scale and nature of ambiguity in the allowable permittivities
(see e.g. [20]), while extensive sampling allows quantitative
estimates of posterior variability in applications where that
is needed. Computed results for the reference problem,
presented in section 7, are designed to present the range of
states in the posterior by summarizing posterior area and
boundary processes.
6.1. Metropolis–Hastings algorithm

The Metropolis–Hastings (MH) algorithm generates a Markov chain with equilibrium distribution π by simulating a suitable transition kernel [27]. It uses a proposal density q(θ, θ′) to suggest a new state θ′ when at state θ, i.e. a possible move θ → θ′. The proposal is accepted or rejected according to a rule that ensures the desired ergodic behaviour. Choice of the proposal density is largely arbitrary, with convergence guaranteed when the resulting MCMC is irreducible and aperiodic. However, the choice of proposal distribution critically affects the efficiency of the resulting sampler, with design of a good proposal being something of an art.

The standard MH formalism has been extended to deal with transitions in state spaces with differing dimension [83], allowing insertion and deletion of parameters [72, 85, 86]. Even though we do not use variable-dimension states in the reference example, we prefer this 'reversible jump', or Metropolis–Hastings–Green (MHG), formalism as it greatly simplifies calculation of acceptance probabilities for the subspace moves that we employ.

The reversible jump formalism considers the composite parameter (θ, γ), where θ is the usual state vector and γ is the vector of random numbers used to compute the proposal θ′. Similarly, (θ′, γ′) is the composite parameter for the reverse proposal. Then the MCMC sampling algorithm with MH dynamics can be written as follows. Let the chain be in state θn = θ; then θn+1 is determined in the following way:

• Propose a new candidate state θ′ from θ with some proposal density q(θ, θ′).
• Calculate the MH acceptance ratio
  α(θ, θ′) = min{ 1, [π(θ′|d) q(γ′)] / [π(θ|d) q(γ)] · |∂(θ′, γ′)/∂(θ, γ)| }.  (17)
• Set θn+1 = θ′ with probability α(θ, θ′), i.e. accept the proposed state; otherwise set θn+1 = θ, i.e. reject.
• Repeat.

The last factor in equation (17) denotes the Jacobian determinant of the transformation from (θ, γ) to (θ′, γ′).
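In sketch form, the accept/reject loop above is only a few lines. In this minimal Python version (an illustration, not our implementation), `propose` is assumed to return both the candidate and the log of the combined proposal-ratio and Jacobian factor of equation (17), so the same driver serves plain MH and the deterministic subspace moves of section 7:

```python
import numpy as np

def mhg_sampler(log_posterior, propose, theta0, n_steps, rng=None):
    """Metropolis-Hastings(-Green) sampler.

    propose(theta, rng) -> (theta_new, log_factor), where log_factor is
    log[q(gamma')/q(gamma)] plus the log Jacobian determinant of eq. (17)."""
    rng = rng or np.random.default_rng()
    theta, log_p = theta0, log_posterior(theta0)
    samples, n_accept = [theta0], 0
    for _ in range(n_steps):
        theta_new, log_factor = propose(theta, rng)
        log_p_new = log_posterior(theta_new)
        # log of the acceptance ratio alpha(theta, theta') in equation (17)
        if np.log(rng.uniform()) < log_p_new - log_p + log_factor:
            theta, log_p = theta_new, log_p_new
            n_accept += 1
        samples.append(theta)
    return samples, n_accept / n_steps
```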
6.2. Monte Carlo integration
Quantitative estimates from the posterior distribution require
computing the expectations in equation (10). Given samples
{θi }i=1,2,...,N from π , the required integral may be computed
using the Monte Carlo approximation
∫ f(θ) π(θ) dθ ≈ (1/N) Σ_{i=1}^N f(θi).  (18)
According to the law of large numbers, equation (18) holds
to any desired accuracy for sufficiently large N. In addition, it
follows from the central limit theorem that the rate of convergence of the approximation error is independent of the dimensionality of the state space
and hence MCMC methods are suitable for high-dimensional
problems. The variance of the approximation error is given
by
N
1 f (θi ) −
f (θ )π(θ ) dθ
Var
N i=1
⎞
⎛
N Var(f ) ⎝
j
≈
ρ̂j ⎠
1−
(19)
1+2
N
N
j =1
with Var(f) = E[f(θ)²] − μf². The factor ρ̂j = Ĉj/Ĉ0 is the normalized autocovariance function (ACF), where Ĉj is the autocovariance at lag j, i.e. Ĉj is the covariance between the values taken by f at two states of the chain θi and θi+j. Consequently, the less correlated consecutive states of the chain are, the more accurate the estimates. When {θi}_{i=1}^N are independent, i.e. uncorrelated, samples, the estimator for the mean of f is f̄N = (1/N) Σ_{i=1}^N f(θi) and the variance of the estimator is

Var(f̄N) = Var(f)/N.  (20)

However, almost always {θi}_{i=1}^N produced by MCMC is a sequence of correlated samples. The rate τf/N at which the
variance Var(f̄N) is reduced in equation (19) is called the statistical efficiency, and we write the variance for correlated samples as

Var(f̄N) = τf Var(f)/N.  (21)
N
The quantity τf is called the integrated autocorrelation time
(IACT) and can be interpreted as the number of correlated
samples with the same variance-reducing power as one
independent sample. For a given posterior distribution the
Markov chain should be designed in a way that τf is as small as
possible so that small variance σf̄N = Var(f¯N ) of the estimate
f¯N is achieved with minimal sample size N.
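In practice τf is estimated from the sampler output itself. The following sketch evaluates the truncated sum of equations (19)–(21) for a scalar output; the cutoff `max_lag` is a pragmatic truncation choice, not part of the definitions above:

```python
import numpy as np

def estimate_iact(f_values, max_lag=None):
    """Estimate the integrated autocorrelation time tau_f of equation (19)
    from scalar outputs f(theta_i) of a stationary chain."""
    f = np.asarray(f_values, dtype=float) - np.mean(f_values)
    n = len(f)
    max_lag = max_lag if max_lag is not None else n // 10
    c0 = np.dot(f, f) / n                          # empirical C_0
    tau = 1.0
    for j in range(1, max_lag + 1):
        rho_j = np.dot(f[:-j], f[j:]) / (n * c0)   # normalized ACF at lag j
        tau += 2.0 * (1.0 - j / n) * rho_j
    return tau
```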
6.3. Acceleration schemes for Metropolis–Hastings MCMC

A major drawback of MCMC sampling is its computational expense, since the forward problem has to be solved hundreds of thousands of times to explore the posterior distribution. Hence much research effort has gone into finding ways of accelerating the basic MH algorithm. We review several such schemes, dwelling on those that have found application in impedance tomography.

6.3.1. Simulated tempering. Consider the case where the MH algorithm is used to sample from the posterior distribution π(·) using proposal distribution q(θ, θ′) and it is found that the resulting chain is evolving slowly, or worse still is getting stuck. This can happen because of multi-modality of π(·), or because of strong correlations, as is typical in inverse problems where the support of π(·) can be effectively a low-dimensional subspace of Θ. Simulated tempering (with the name and idea adapted from simulated annealing for optimization) is a general method that can overcome some of these difficulties, while using the existing proposal distribution.

The method augments the state space to Θ × {0, 1, . . . , N} and defines a set of distributions {πk(·)}_{k=0}^N where π0 = π and π1(·), π2(·), . . . , πN(·) are a sequence of distributions that are increasingly easy to sample from. The distribution over the augmented space is taken as

π(θ, k) = λk πk(θ),  (22)

where λ0, λ1, . . . , λN are pseudo-prior constants with Σk λk = 1. Transitions for a fixed k are derived from the proposal q(θ, θ′) and are interspersed with proposals that change k (perhaps by a random walk in k), with both accepted/rejected by a standard Metropolis–Hastings algorithm. The random walk then occurs in (θ, k) space. Samples that have k = 0, i.e. from the conditional density π(θ | k = 0), are samples from the desired distribution.

A simple example of such a sequence is the scheme due to Marinari and Parisi [87], who introduced simulated tempering. Define the positive numbers (inverse temperatures) 1 = β0 < β1 < · · · < βN. The sequence of distributions is then given by πk(θ) = λk π^βk(θ), which are increasingly unimodal. The opposite regime, of increasing temperature, has found greater success in impedance tomography applications where high-accuracy data leads to posterior distributions that are too narrow to easily sample [70].
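The k-update is itself a standard MH step on the augmented target. A minimal sketch, assuming a symmetric random walk between interior levels and user-chosen arrays `betas` and `log_lambdas` (as in the Marinari–Parisi scheme above):

```python
import numpy as np

def accept_level_change(theta, k, k_new, log_posterior, betas, log_lambdas, rng):
    """Accept/reject a proposed temperature index k -> k_new when the tempered
    targets are pi_k(theta) proportional to pi(theta)**betas[k], following the
    augmented-space target pi(theta, k) = lambda_k * pi_k(theta) of eq. (22)."""
    log_alpha = ((betas[k_new] - betas[k]) * log_posterior(theta)
                 + log_lambdas[k_new] - log_lambdas[k])
    return np.log(rng.uniform()) < log_alpha
```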
Parallel tempering is similar to simulated tempering except that the N chains (one for each value of k) are maintained simultaneously. An example is the Metropolis coupled MCMC in [84] that simultaneously runs chains with the spatial parameters increasingly coarsened, defining a sequence of distributions as above.

6.3.2. Using approximations to the forward map. As a means of model reduction (and of counteracting inverse crimes) Kaipio and Somersalo [55] introduced the 'enhanced error model' to correct for discretization errors introduced by coarse numerical approximations. For the case of Gaussian prior and noise distributions, they considered the accurate model d = Aθ + n and the coarse approximation d = Ãθ̃ + ñ, where θ̃ = Pθ is a coarse approximation to the unknowns θ resulting from a projection by P, and Ã is the (cheap) approximation to A on the coarse variables. Then

ñ = (A − ÃP)θ + n  (23)

defines the enhanced error model by assuming that the two terms on the right-hand side are uncorrelated. Use of the coarse approximation necessarily increases the uncertainty of recovered values, since discretization error has been introduced. However, Kaipio and Somersalo [55] give examples in which a tolerably small increase in posterior uncertainty is traded for a huge reduction in compute time without introducing bias in estimates, and demonstrate that accurate real-time inversion is possible.
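In the linear Gaussian setting the statistics of the model error (A − ÃP)θ can be pre-computed over draws from the prior and folded into the noise model. A sketch under that assumption, with `A_fine`, `A_coarse` and `P` as matrices standing in for the maps above:

```python
import numpy as np

def enhanced_error_statistics(A_fine, A_coarse, P, prior_draws):
    """Empirical mean and covariance of the model error (A - A~P) theta of
    equation (23), estimated from prior samples (one sample per row)."""
    residuals = prior_draws @ (A_fine - A_coarse @ P).T
    return residuals.mean(axis=0), np.cov(residuals, rowvar=False)
```

The resulting covariance is added to the measurement-noise covariance whenever the coarse model Ã is used in the likelihood.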
A second use of approximations was introduced by Christen and Fox [13], who considered the state-dependent approximation π*θ(·) to the posterior distribution calculated using a cheap approximation to the forward map, to give a modified Metropolis–Hastings MCMC. Once a proposal is generated from the proposal distribution q(θ, θ′), to avoid calculating π(θ′) for proposals that are rejected, they first evaluate the proposal using the approximation π*θ(θ′) to create a second proposal distribution q*(θ, θ′) that is then used in a standard Metropolis–Hastings algorithm.
Christen and Fox [13] present an example using a local
linearization, and demonstrate an order of magnitude speedup
for a problem in electrical impedance imaging. Lee used
a coarse BEM approximation to speed up inverse obstacle
scattering [57], while coarsened solutions available in a multi-level (multigrid) solver were used for a problem equivalent to
ECT in [53].
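The two-stage construction is simple to state for a symmetric proposal: screen with the approximation first, and only pay for an exact forward solve when that first stage accepts. A sketch in that spirit (not the authors' code):

```python
import numpy as np

def delayed_acceptance_step(theta, log_post, log_post_approx, propose_symmetric, rng):
    """One two-stage step in the spirit of Christen and Fox [13], assuming a
    symmetric proposal. The exact posterior is evaluated only when the cheap
    approximation accepts, yet the chain still targets pi.
    (In practice log_post(theta) is cached from the previous step.)"""
    theta_new = propose_symmetric(theta, rng)
    # Stage 1: accept/reject against the cheap approximation pi*.
    log_a1 = log_post_approx(theta_new) - log_post_approx(theta)
    if np.log(rng.uniform()) >= min(0.0, log_a1):
        return theta                      # early rejection: no expensive solve
    # Stage 2: correct with the exact posterior so detailed balance holds.
    log_a2 = (log_post(theta_new) - log_post(theta)
              + min(0.0, -log_a1) - min(0.0, log_a1))
    if np.log(rng.uniform()) < log_a2:
        return theta_new
    return theta
```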
6.4. Summary statistics
The posterior distribution is typically defined over a high-dimensional parameter space, so direct visualization is not
possible. However, in the Bayesian framework we are
able to calculate summary statistics that quantify and
examine the feasible solutions to the inverse problem.
Histograms for properties, posterior variability of parameters,
and expectations can be easily derived from the posterior
distribution. Scatter plots allow for an illustrative and
meaningful depiction of the range of feasible solutions (see
section 7).
More generic summary statistics of a posterior distribution
are the point estimates (or modes) and interval estimates,
computed via numerical optimization. The most common
point estimate is the maximum a posteriori (MAP) estimator
given by
θMAP = arg maxθ π(θ|d),  (24)

since this mode of the posterior distribution generalizes regularized inversion. For example, when the noise is additive zero-mean Gaussian with variance σ², and we use a simple Gaussian prior π(θ), then the MAP estimate becomes the classical deterministic setting known as Tikhonov regularization [12]. In the case of multi-modal posterior distributions, or if the mode of the posterior distribution lies far away from the bulk of the posterior distribution, the MAP estimate provides an unsatisfactory summary of feasible parameter values, and can be very unrepresentative of the support of the posterior distribution.

The estimate given by the mode of the likelihood function, often erroneously described as the estimate that corresponds to the set of parameters which are most likely to generate the measured data, is called the maximum likelihood (ML) estimate and is defined by

θML = arg maxθ l(θ|d).  (25)

This estimator is equivalent to solving the inverse problem without taking regularization into account [55]. Hence, in ill-posed inverse problems such as ECT the ML estimate is seldom useful.

A more robust point estimate is the conditional mean (CM) of the parameters conditioned on the measured data d,

θCM = E[θ|d] = ∫ θ π(θ|d) dθ.  (26)

A common interval estimate is the conditional covariance (CC) estimate given by

Cov(θ|d) = ∫ (θ − θCM)(θ − θCM)^T π(θ|d) dθ ∈ R^{n×n}.  (27)

The price of robustness in the CM and CC estimates is an integration over the high-dimensional parameter space, requiring extensive MCMC sampling.
7. Numerical examples
The previous sections give a complete formulation of ECT
in the Bayesian inferential framework. The uncertainties in
measured data, i.e. in measured inter-electrode capacitances,
and the range of feasible images are taken into account by
statistically modelling the measurement process according to
section 4.
Our representation of the true parameters is the set
of points {(xi , yi )}, giving vertexes defining a polygonal
boundary of a material inclusion [31, 48, 88, 89]. This
set denoted θ ∈ Θ defines a 'state' in our reconstruction
algorithm. The forward map in section 3 relates the state θ to
noise-free measurements. The omnipresence of measurement
noise implies that in practice a range of data may be measured for a given state θ. Let π(d|θ) denote the probability density function over allowable measurements d for a given true state θ. Making a set of measurements corresponds to drawing a sample d from π(d|θ).

Our objective now is to work out what we can say about the parameter θ given measurements d. Inference about θ is based on the posterior density π(θ|d) applying Bayes' theorem (9). The posterior density π(θ|d) gives the probability density over allowable states θ conditioned on measurements and prior information. Summarizing the posterior distribution corresponds to solving the inverse problem, since that gives knowledge of the allowable values of parameters with uncertainties, etc.

In section 2.3 we derived the likelihood function l(θ|d) as a function of θ for the given ECT system. Note that l(θ|d) is generally not a probability function.

Since in our ECT application the estimation of the process parameter material or void fraction is of primary interest, we have to specify an appropriate prior π(θ) that allows for an unbiased reconstruction of inclusion area. To avoid this bias we specify a prior density in terms of area directly, given in equation (15) in section 5 [79]. The circumference c(θ) of the inclusion is compared with the circumference of a circle with an area equal to the area Λ(θ) of the polygon. The variance σpr² is chosen to be small to penalize small and large areas.

Based on the accurately modelled forward map, the specified prior distribution π(θ) for the shape and permittivity of the material inclusion, and the measurement noise model, we are able to give a posterior distribution π(θ|d) for the inclusion conditioned on measured data:

π(θ|d) ∝ π(θ) l(θ|d) = exp{ −(1/(2σpr²)) ( c(θ)/(2√(πΛ(θ))) − 1 ) − ½ (qm − d)^T Γ^−1 (qm − d) }.  (28)

It is this distribution we explore to learn about the unknown permittivity, by applying MCMC sampling with Metropolis–Hastings dynamics.
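Given a forward solver, evaluating (28) up to a constant is a few lines. The sketch below reuses the `log_prior` sketched in section 5.2 and treats `forward_map` (playing the role of qm) and the inverse noise covariance `Gamma_inv` as supplied; it is an illustration, not our implementation:

```python
import numpy as np

def log_posterior(theta, d, forward_map, Gamma_inv, sigma_pr, is_simple):
    """Unnormalized log posterior of equation (28): area-based prior (15)
    plus Gaussian log-likelihood in the misfit q_m - d."""
    lp = log_prior(theta, sigma_pr, is_simple)   # sketched in section 5.2
    if not np.isfinite(lp):
        return -np.inf                           # infeasible polygon
    r = forward_map(theta) - d                   # misfit q_m - d
    return lp - 0.5 * r @ Gamma_inv @ r
```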
We use several types of update to propose a new state θ′, usually referred to as 'moves' [79]. The choice and combination of such moves is a crucial issue in MCMC sampling, in order to ensure ergodic behaviour of the chain within useful time scales and to force convergence. Multiple moves can be built into the MCMC sampler simply by defining separate reversible transition probabilities for each move [25, 83]. Let M be the number of moves, let {Pr^(i)(Xn+1 = θn+1 | Xn = θn)}_{i=1}^M represent a set of M transition probabilities which are reversible with respect to the posterior distribution, and let νi, i = 1, . . . , M, be the probability of choosing move i; then the overall transition probability is given by

Pr(Xn+1 = θn+1 | Xn = θn) = Σ_{i=1}^M νi Pr^(i)(Xn+1 = θn+1 | Xn = θn).  (29)
At least one of the M moves has to be irreducible on the state space to ensure that the equilibrium distribution of the Markov chain is independent of the initial choice θ^(0) of the parameter vector [65].

We find a combination of M = 4 moves gives a suitably efficient MCMC in this example [31]. These are translation, rotation and scaling of the polygon, and moving the position of one vertex of the polygon (figure 11).

Translation T: translate the polygon described by the parameter vector θ by a random step λT ∼ U(−ρT, ρT).

Scaling S: scale the entire polygon by a random multiplier λS ∼ U(1/ρS, ρS) with ρS = 2.

Rotation R: rotate the polygon by a random angle λR ∼ U(−ρR, ρR) with respect to the centre c of the polygon,

θ′ = c + [ cos(λR)  sin(λR); −sin(λR)  cos(λR) ] (θ − c).  (30)

Vertex move V: shift one vertex of the polygon by a random step λV ∼ U(−ρV, ρV).

Figure 11. Different moves to propose a new candidate θ′. (a) Translation T, (b) vertex move V, (c) scaling S, (d) rotation R.

The Jacobian term in the MHG algorithm for the translation, rotation and vertex moves is 1. For the scaling move the Jacobian term is λ^(−2n+1) [31].

The vertex move ensures irreducibility but, by itself, would lead to a very slow algorithm. The remaining moves are designed to give an efficient algorithm. A new candidate θ′ is proposed from θ by randomly choosing one of these four moves and using a random step size λi tuned for each move.
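A sketch of the four moves as a proposal function for the MHG driver of section 6.1. The step-size parameters ρ are illustrative placeholders (not the tuned values used for the results below), the centre c is taken as the vertex mean, and only the scaling move contributes a non-zero log Jacobian, the λ^(−2n+1) factor quoted above:

```python
import numpy as np

def propose(theta, rng, rho_T=0.005, rho_V=0.005, rho_R=0.2, rho_S=2.0):
    """Propose theta' from an (n, 2) vertex array theta by one of the four
    moves, chosen with equal probability; returns (theta', log_factor)."""
    n = len(theta)
    move = rng.integers(4)
    if move == 0:                                        # translation T
        return theta + rng.uniform(-rho_T, rho_T, size=2), 0.0
    if move == 1:                                        # vertex move V
        new = theta.copy()
        new[rng.integers(n)] += rng.uniform(-rho_V, rho_V, size=2)
        return new, 0.0
    c = theta.mean(axis=0)                               # polygon centre (vertex mean)
    if move == 2:                                        # rotation R, equation (30)
        a = rng.uniform(-rho_R, rho_R)
        R = np.array([[np.cos(a), np.sin(a)], [-np.sin(a), np.cos(a)]])
        return c + (theta - c) @ R.T, 0.0
    lam = rng.uniform(1.0 / rho_S, rho_S)                # scaling S
    # MHG Jacobian term lambda^(-2n+1) for the scaling move [31]
    return c + lam * (theta - c), (-2 * n + 1) * np.log(lam)
```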
In the following, differently shaped material inclusions are recovered from simulated and measured data using MCMC sampling. For all experiments, inclusions with different shapes and a permittivity of εr = 3.5 in an air-filled pipe (εr = 1.0) are considered. For the experiments using synthetic data, 2 000 000 samples were drawn from the posterior distribution, using a simulated data set corrupted by noise. The data was created using 10 000 boundary elements in the forward map. For the experiment using measured data, 1 000 000 samples were drawn from the posterior distribution. The noise standard deviation in both cases was σ = 6.7 × 10⁻⁴.

A burn-in period of 50 000 samples was found suitable after testing the algorithm with different initial states and different ratios of the moves T, V, S, R used to propose a new candidate state. During the burn-in period, which strongly depends on the initial state of the Markov chain, the sampling distribution is not the equilibrium distribution. Once the chain is in equilibrium only every 100th sample is stored. For the results presented, the four proposal moves were chosen with equal probability, giving an acceptance rate of about 2%. In the absence of a comprehensive test for Markov chain convergence, we analyse selected output statistics in terms of stationarity. Once the chain achieves stationarity, samples are assumed to come from the equilibrium distribution and are used for posterior inference.

7.1. Example 1—circular inclusion

Figure 12. Scatter plots. (a) Entire domain with circular inclusion. (b) Detail plot. The dashed grey contour corresponds to the true shape.

Figure 12 illustrates the resultant posterior variability in inclusion shape as a scatter plot. This plot shows points taken randomly from each state, uniformly in boundary length, and gives a graphical display of the probability density that a boundary passes through any element of area. The points are clustered around the true state, shown by the dashed grey contour, indicating that the posterior has a well-defined mode close to the true value. Hence, for this case, point estimators calculated from the posterior, such as the MAP state and the CM state, will give similar results. Histograms of reconstructed inclusion area and circumference are depicted in figure 13. Sampled area as well as sampled circumference are scattered around their true values Λtrue = 3.14 × 10⁻⁴ m² and ctrue = 6.28 × 10⁻² m. The estimated parameters and their posterior variability are summarized in table 1. The MCMC output traces of inclusion area and inclusion circumference are shown
Table 1. Posterior variability of circular inclusion.

Quantities | True values | Mean | Standard deviation | IACT
x-coordinate of centre (m) | 0.00 | −2.09 × 10⁻⁴ | 2.46 × 10⁻⁴ | 5.75 × 10³
y-coordinate of centre (m) | 2.50 × 10⁻² | 2.52 × 10⁻² | 6.76 × 10⁻⁵ | 2.24 × 10³
Area Λ (m²) | 3.14 × 10⁻⁴ | 3.17 × 10⁻⁴ | 5.03 × 10⁻⁶ | 1.98 × 10²
Circumference c (m) | 6.28 × 10⁻² | 6.24 × 10⁻² | 6.95 × 10⁻⁵ | 3.09 × 10²
Log likelihood | – | −38.76 | 0.26 | 6.48 × 10²

Figure 13. Summary statistics. (a) Histogram of reconstructed sample areas. (b) Histogram of reconstructed sample circumferences.

Figure 14. MCMC output trace (left column) and autocorrelation (right column) of inclusion area (top) and inclusion circumference (bottom) in updates.

in the left column of figure 14, with the corresponding autocorrelation in updates in the right column. The ACF discussed in section 6.2 provides a useful tool for investigating serial dependence in stationary time series data, as the presence of serial correlation is revealed by a slowly decaying ACF. The faster the autocorrelation function for a stationary time series decays to 0 with increasing lag, the less the correlation between consecutive states of the chain, with consequent reduction of variance in estimates. In general, the autocorrelation function should be—after falling off smoothly to zero—distributed with some variation about the horizontal axis. The IACT is given in the right-most column of table 1, while the bottom row gives the mean of the log likelihood. The posterior variability represented by the standard deviation of the parameters is small, implying high reliability in estimated parameters. Note that the estimated intervals contain the true values within one standard deviation.
7.2. Example 2—elliptic and fancy-shaped contour
Figure 15. Scatter plots. (a) Entire domain with elliptic-shaped inclusion. (b) Fancy-shaped contour.

Figures 15(a) and (b) illustrate the resultant posterior variability in inclusion shape for more fancy shapes. Points in the scatter plot are clustered around the true contour plotted in dashed grey. Due to the decreased sensitivity in the centre of the pipe, the margin of deviation of scattered points for the elliptic contour is greater towards the centre than in the region close to the electrodes. For the fancy contour in
figure 15(b), deviations from the true contour in regions of low sensor sensitivity are clearly visible. Furthermore, there are significant outliers in the right part of the estimation result despite this part of the contour being close to the boundary. The reason is that the distinct corner in the true boundary is not well modelled by our prior, which rejects candidate states with sharp angles between two boundary elements. Appropriately changing the prior would allow these features of the contour to be recovered. Note, however, that this mis-modelling in the prior is evident from the scatter plot in the region near the sharp corner, and indicates the significant benefit of posterior error estimates available in a Bayesian analysis.

7.3. Example 3—circular contour (measured data)

Figure 16(a) shows the posterior variability in inclusion shape and position for measured data. Samples from the posterior distribution are consistent with the circular shape of the PVC rod. Due to the lack of a reference measurement system, the true shape is not depicted. However, knowing the geometry of the rod allows validation of the reconstruction results by comparing estimates to the true area and circumference. The centred grey circle-shaped contour represents the initial state of the Markov chain. The MAP state and CM state estimates are presented in figure 16(b). In accordance with the first example (same shape and properties but synthetic data), the MAP and CM estimates almost coincide, indicating that the posterior distribution has a well-defined single mode. Table 2 summarizes the inference. Mean, standard deviation and the IACT are evaluated for the inclusion area Λ, circumference c and centre coordinates (x, y) of the circular inclusion.

Figure 16. Reconstruction results. (a) Scatter plot. Randomly chosen points of the posterior distribution are plotted. (b) Detail plot of point estimates. MAP estimate (grey) and the CM estimate (dashed black) calculated from the posterior distribution.

Table 2. Posterior variability of circular inclusion from measured data.

Quantities | True values | Mean | Standard deviation | IACT
x-coordinate of centre (m) | – | 3.71 × 10⁻² | 2.32 × 10⁻⁵ | 5.89 × 10²
y-coordinate of centre (m) | – | −1.14 × 10⁻² | 3.02 × 10⁻⁵ | 4.65 × 10²
Area Λ (m²) | 3.14 × 10⁻⁴ | 3.13 × 10⁻⁴ | 6.88 × 10⁻⁶ | 1.10 × 10³
Circumference c (m) | 6.28 × 10⁻² | 6.24 × 10⁻² | 1.57 × 10⁻⁴ | 1.88 × 10³
Log likelihood | – | −46.10 | 1.72 × 10⁻¹ | 3.99 × 10²
8. Non-stationary ECT

In several industrial applications the physical quantities of interest are time dependent and, consequently, the measured data depend on these quantities at different time steps. On the other hand, it is often impossible to wait for all the data to be collected before giving a parameter estimate. Typical dynamic imaging examples are transient combustion processes, sedimentation in hydrocyclones, and mixing or flow processes. These challenging classes of problems are referred to as non-stationary inverse problems [55].

Bayesian recursive approaches provide a powerful framework for continually updated parameter estimates as the data arrive. The most popular representative, used to extract a signal or parameters from a series of incomplete and noisy measurements, is the Kalman filter (KF). KFs convert a Gaussian prior probability to a Gaussian posterior probability when the likelihood function is also Gaussian. The present state of knowledge of the unknown parameters at time instant k is completely characterized by the use of a small set of sufficient statistics based on prior information and the measurement history. KFs provide unbiased estimates with minimum variance when both the state transition and the measurement process are linear functions, and process and measurement noise are uncorrelated and Gaussian distributed with zero mean. KF-based state estimation can be extended to nonlinear state transitions and measurement models, leading to the extended Kalman filter (EKF), which linearizes about the most recent state estimate.

Taking into account any evolution of the contour over time, the state-space representation of a contour can be defined by
θk+1 = fk(θk, vk),  (31)
dk = hk(θk, nk),  (32)
where fk (·) represents the state transition of the state θ from
time k − 1 to time k subjected to process noise which is
modelled by v. A measurement based on the current state
θk subjected to measurement noise n is modelled by hk (·).
The simplest dynamic model which is widely in use for ECT
and EIT is a constant shape with a random walk in position.
Since the (E)KF relies on first-order derivatives, a
linearization of the nonlinear measurement equation is
required. By linearizing (32) about the latest predicted state
θk|k−1 , we obtain
dk ≈ hk(θk|k−1) + Jk(θk|k−1) θk + ñk.  (33)
The Jacobian Jk (θk|k−1 ) is composed of the derivatives
of measured charges with respect to state variables, e.g.
coefficients of a contour model. The Jacobian is usually
calculated from the solution of the forward map using the
adjoint variable approach [17, 91]. Higher order terms are
considered as additional noise [90]. The stochastic part ñk
is composed of measurement and linearization error and is
typically also assumed to be zero-mean Gaussian.
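With the linearization (33) in hand, the familiar EKF measurement update follows. A minimal sketch, with `h` the forward map, `J` its Jacobian and `R` the covariance of the noise term, all assumed supplied:

```python
import numpy as np

def ekf_update(theta_pred, P_pred, d_k, h, J, R):
    """EKF measurement update for the linearized observation model (33)."""
    H = J(theta_pred)                          # Jacobian at the predicted state
    S = H @ P_pred @ H.T + R                   # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)        # Kalman gain
    theta = theta_pred + K @ (d_k - h(theta_pred))
    P = (np.eye(len(theta_pred)) - K @ H) @ P_pred
    return theta, P
```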
When considering inverse problems we have to face
the problems of multiple maxima and the possibility of
finding unrepresentative peaks in the posterior probability. By
imposing constraints on the state vector and by specifying
appropriate priors, the KF has been successfully applied
to non-stationary electrical tomography. Vauhkonen et al, who introduced the KF to EIT, augment the state vector
by artificial measurements (spatial regularization) in order
to obtain robust estimates [80]. Based on this work, many
approaches for two- and three-dimensional dynamic imaging
and parameter tracking have been presented over the last
decade including extended, constrained and unscented Kalman
filtering techniques (see, e.g. [29, 90, 91]) as well as fixed-lag
and fixed-interval smoothing approaches [92]. More recently,
focus was put on the estimation of phase boundary parameters
using different contour models (see, e.g. [28, 93]).
A less restrictive formulation of the Bayes principle
based on sequential Monte Carlo simulations and a numerical
approximation of non-Gaussian state densities is given by the
particle filter (PF) [55, 94, 95]. In the literature, the PF is also
known as bootstrap filter, condensation tracking or sampling
importance resampling.
Whereas for the KF the state is modelled using a
multi-variate Gaussian distribution, the PF numerically
approximates any potentially multi-modal and non-Gaussian
distribution over the state vector. Calculation of the Jacobian
is not necessary. The distribution is represented by a set
of ‘particles’, or states. A set of N particles θ (m) randomly
chosen from the state space and their corresponding weights
ξ (m) define the empirical distribution
fθ(θ) ≈ {θ^(m), ξ^(m)}_{m=1,...,N}.  (34)

Assuming that the underlying process is Markov, the state transition in equation (31) can be reformulated as the conditional density π(θk|θk−1). The PF keeps track of the current state estimate represented by π(θk|Dk), where Dk = {d1, . . . , dk} denotes the history of measurements acquired up to time step k. In the case of reconstructing material interfaces in ECT, the state of the model is a set of parameters that fully describe the contour at any instance (mid-level models).
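One prediction–update cycle of a sampling importance resampling filter, in sketch form: `transition` draws from π(θk|θk−1) and `log_lik` evaluates the measurement density; the effective-sample-size threshold is a common heuristic, not a value from our experiments.

```python
import numpy as np

def pf_step(particles, weights, d_k, transition, log_lik, rng):
    """Propagate particles, reweight by the likelihood of d_k, and resample
    when the effective sample size collapses."""
    particles = np.array([transition(p, rng) for p in particles])
    log_w = np.log(weights) + np.array([log_lik(d_k, p) for p in particles])
    w = np.exp(log_w - log_w.max())            # stabilized normalization
    weights = w / w.sum()
    if 1.0 / np.sum(weights**2) < 0.5 * len(particles):   # ESS test
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        particles = particles[idx]
        weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights
```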
Figure 17 shows a representative example of PF-based object tracking from measured electrical capacitance data. A phantom is moved from left to right with constant speed. Measurements are acquired at eight different positions spaced along the x-axis. In each position 10 measurement frames are collected and provided to the reconstruction algorithm. The filter uses 20 particles for sequential sampling.

Figure 17. PF tracking result of a circular phantom with a diameter of 50 mm moved from left to right. The bold dashed black contours denote the sample mean at three different time instants while the grey-shaded contours represent the posterior variability of involved particles associated with their respective weights.

In an example discussed in [30], time-varying contours are used to describe interfaces between different material properties. A comparison of particle filtering to Kalman filtering with application to ECT is presented in [74]. An application of a PF for detecting an inclusion in ultrasound reflection tomography is presented in [82].

A good overview of non-stationary inverse problems in the framework of Bayesian recursive filters, including many illustrative examples, is presented in [55].

9. Discussion and conclusion

A key difference between regularized least squares and Bayesian methods is that whereas regularization gives point estimates, typically using a data-misfit criterion, Bayesian methods can provide averages over all solutions consistent with the data. This leads to an improvement in the robustness of estimates of properties of the unknown permittivity. The improvement is not surprising once it is realized that the single 'most likely' solution, found by a regularized minimization of misfit to the measured data, is typically unrepresentative of the bulk of feasible solutions in high-dimensional nonlinear problems.

Inferential solutions to inverse problems provide other substantial advantages over deterministic methods, such as the ability to treat arbitrary forward maps and error distributions, and to use a wide range of representations of the unknown system, including parameter spaces that are discrete, discontinuous or even of variable dimension.

Markov chain Monte Carlo (MCMC) sampling has revolutionized computational Bayesian inference and is currently the best available technology for a comprehensive analysis of inverse problems, allowing quantitative estimates and exploration of high-dimensional posterior distributions without special mathematical structure. One complaint might be that inference appears 'too easy', since the ability to simulate measurements means that the posterior distribution can be sampled, effectively solving the inverse problem by giving access to summary statistics that characterize posterior variability. However, as we hope we have demonstrated throughout this paper, formulating the inverse problem in a
Bayesian inferential framework requires accurate simulation
of the measurement process by careful discretization of a
forward map that is a validated model for measurements,
a validated stochastic model for measurement noise, and a
data-independent prior distribution that is (non)informative
with respect to the primary quantity of interest. These
modelling requirements ensure that achieving quality results
using Bayesian inference will never constitute a ‘free lunch’.
A further impediment to the application of Bayesian analyses
to practical inverse problems lies in the computational cost
of MCMC sampling. In recent years several promising
algorithms and advances have been suggested that give
substantial speedup for computationally intensive problems
including capacitance tomography. There is some hope that
eventually the computational cost of sampling will not be
substantially greater than that of optimization.
We contrasted the modelling choices available against a
reference problem to clarify the role of alternatives available in
structuring inferential solutions for other applications. There
is now a wide range of well-developed tools for stochastic
modelling and Bayesian inference for inverse problems.
Notwithstanding the modelling difficulties alluded to above,
applying those tools is now a well-developed procedure, and
we anticipate that Bayesian inference for inverse problems
will move from being the quality standard in solutions to the
quality solution of standard choice.
References

[1] Scott D M and McCann H 2005 Process Imaging and Automatic Control (Boca Raton, FL: CRC Press)
[2] Beck M S, Dyakowski T and Williams R A 1998 Process tomography—the state of the art Trans. Instrum. Meas. Control 20 163–77
[3] Holder D S 2005 Electrical Impedance Tomography: Methods, History and Applications (Series in Medical Physics and Biomedical Engineering) (Bristol: Institute of Physics Publishing)
[4] York T 2001 Status of electrical tomography in industrial applications J. Electron. Imaging 10 608–19
[5] Mohamed-Saleh J and Hoyle B S 2002 Determination of multi-component flow process parameters based on electrical capacitance tomography data using artificial neural networks Meas. Sci. Technol. 13 1815–21
[6] Tapp H S, Peyton A J, Kemsley E K and Wilson R H 2003 Chemical engineering applications of electrical process tomography Sensors Actuators B 92 17–24
[7] Gamio J C, Castro J, Rivera L, Alamillia J, Garcia-Nocetti F and Aguilar L 2005 Visualisation of gas-oil two-phase flow in pressurised pipes using electrical capacitance tomography Flow Meas. Instrum. 16 129–34
[8] Makkawi Y T and Wright P C 2004 Electrical capacitance tomography for conventional fluidized bed measurements—remarks on the measuring technique J. Powder Technol. 148 142–57
[9] Sanderson J and Rhodes M 2003 Hydrodynamic similarity of solids motion and mixing in bubbling fluidized beds J. Am. Inst. Chem. Eng. 49 2317–27
[10] Waterfall R C 2000 Imaging combustion using capacitance tomography Advanced Sensors and Instrumentation Systems for Combustion Processes: IEE Seminar ed J Gardener (IEE Professional Group J1) pp 12/1–12/4
[11] Dyakowski T 2005 Application of electrical capacitance
tomography for imaging industrial processes J. Zhejiang
Univ. Sci. 12 1374–8
[12] Fox C and Nicholls G 2002 Statistical estimation of the
parameters of a PDE Can. Appl. Math. Q. 10 277–306
[13] Christen J A and Fox C 2005 MCMC using an approximation
J. Comp. Graphical Stat. 14 795–810
[14] Yang W Q and Peng L 2003 Image reconstruction algorithms
for electrical capacitance tomography Meas. Sci. Technol.
14 R1–13
[15] Brandstätter B, Holler G and Watzenig D 2003 Reconstruction
of inhomogeneities in fluids by means of capacitance
tomography J. Comp. Math. Electr. Electron. Eng.
22 508–19
[16] Soleimani M and Lionheart W R B 2005 Nonlinear image
reconstruction for electrical capacitance tomography using
experimental data Meas. Sci. Technol. 16 1987–96
[17] Kortschak B and Brandstätter B 2004 A FEM-BEM approach
using level-sets in tomography J. Comp. Math. Electr.
Electron. Eng. 24 591–605
[18] Kaipio J and Somersalo E 2007 Statistical inverse problems:
discretization, model reduction and inverse crimes J.
Comput. Appl. Math. 198 493–504
[19] Hurn M A, Husby O and Rue H 2003 Advances in Bayesian
image analysis Highly Structured Stochastic Systems
ed P J Green, N Hjort and S Richardson (Oxford: Oxford
University Press) pp 302–22
[20] McKeague I W, Nicholls G, Speer K and Herbei R 2005
Statistical inversion of south atlantic circulation in an
abyssal neutral density layer J. Marine Res.
63 683–704
[21] Higdon D and Yamamoto S 2001 Estimation of the head
sensitivity function in scanning magnetoresistance
microscopy J. Am. Stat. Assoc. 96 785–93
[22] Jeffreys H 1931 Scientific Inference (Cambridge: Cambridge
University Press)
[23] Geman S and Geman D 1984 Stochastic relaxation: Gibbs
distributions, and the Bayesian restoration of images IEEE
Trans. Pattern Anal. Mach. Intell. 6 721–41
[24] Grenander U and Miller M 1994 Representations of knowledge
in complex systems J. R. Stat. Soc. Ser. B 56 549–603
[25] Nicholls G K and Fox C 1998 Prior modelling and posterior
sampling in impedance imaging Proc. SPIE 3459 116–27
[26] Kaipio J P, Kolehmainen V, Somersalo E and Vauhkonen M
2000 Statistical inversion and Monte Carlo sampling
methods in electrical impedance tomography Inverse
Problems 16 1487–522
[27] Hastings W K 1970 Monte Carlo sampling methods using
Markov chains and their applications Biometrika 57 97–109
[28] Tossavainen O P, Vauhkonen M and Kolehmainen V 2007 A
three-dimensional shape estimation approach for tracking of
phase interfaces in sedimentation processes using electrical
impedance tomography Meas. Sci. Technol. 18 1413–24
[29] Soleimani M, Vauhkonen M, Yang W Q, Peyton A J, Kim B S
and Ma X 2007 Dynamic imaging in electrical capacitance
tomography and electromagnetic induction tomography
using a Kalman filter Meas. Sci. Technol. 18 3287–94
[30] Watzenig D 2006 Recovery of inclusion shape by statistical
inversion of non-stationary tomographic measurement data
Int. J. Inform. Syst. Sci. 2 469–83
[31] Watzenig D 2006 Bayesian inference for process tomography
from measured electrical capacitance data PhD Thesis
Institute of Electrical Measurement and Measurement
Signal Processing, Graz University of Technology
[32] Huang A M, Plaskowski A B, Xie C G and Beck M S 1988
Capacitance-based tomographic flow imaging system IEE
Electron. Lett. 24 418–9
[33] Kortschak B, Wegleiter H and Brandstätter B 2007
Formulation of cost functionals for different measurement
principles in nonlinear capacitance tomography Meas. Sci. Technol. 18 71–8
[34] Yang W Q 1996 Hardware design of electrical capacitance tomography systems Meas. Sci. Technol. 7 225–32
[35] Wegleiter H, Fuchs A, Holler G and Kortschak B 2008 Development of a displacement current based sensor for electrical capacitance tomography applications Flow Meas. Instrum. 19 241–50
[36] Yang W Q, Scott A L and Gamio J C 2003 Analysis of the effect of stray capacitance on an ac-based capacitance tomography sensor IEEE Trans. Instrum. Meas. 52 1674–81
[37] Alme K J and Mylvaganam S 2006 Electrical capacitance tomography—sensor models, design, simulations, and experimental verification IEEE Sensors J. 6 1256–66
[38] Wegleiter H, Fuchs A, Holler G and Kortschak B 2005 Analysis of hardware concepts for electrical capacitance tomography applications Proc. 4th IEEE Conf. on Sensors (Oct. 31–Nov. 3, Irvine, CA, USA) pp 688–91
[39] Fox C, Nicholls G and Palm M 2000 Efficient solution of boundary-value problems for image reconstruction via sampling J. Electron. Imaging 9 251–9
[40] Kaipio J P, Seppänen A, Somersalo E and Haario H 2004 Posterior covariance related optimal current patterns in electrical impedance tomography Inverse Problems 20 919–36
[41] Fang W and Cumberbatch E 2005 Matrix properties of data from electrical capacitance tomography J. Eng. Math. 51 127–46
[42] Fox C 1988 Conductance imaging PhD Thesis University of Cambridge
[43] Kolehmainen V, Lassas M and Ola P 2005 Inverse conductivity problem with an imperfectly known boundary SIAM J. Appl. Math. 66 365–83
[44] Nissinen A, Heikkinen L M and Kaipio J P 2008 The Bayesian approximation error approach for electrical impedance tomography—experimental results Meas. Sci. Technol. 19 015501
[45] Bayarri M J, Berger J O, Cafeo J, Garcia-Donato G, Liu F, Palomo J, Parthasarathy R J, Paulo R, Sacks J and Walsh D 2007 Computer model validation with functional output Ann. Stats. 35 1874–906
[46] Seagar A D 1983 Probing with low frequency electric currents PhD Thesis Electrical Engineering, University of Canterbury
[47] Kress R 1999 Linear Integral Equations 2nd edn (New York: Applied Mathematical Sciences, Springer)
[48] Aykroyd R G and Cattle B A 2006 A flexible statistical and efficient computational approach to object location applied to electrical tomography Stat. Comp. 16 363–75
[49] Roy D, Nicholls G and Fox C 2008 Imaging convex quadrilateral inclusions in uniform conductors from electrical boundary measurements Stat. Comput. 19 17–26
[50] Wrobel L C 2002 The Boundary Element Method (Chichester: Wiley)
[51] Watzenig D, Steiner G, Fuchs A, Zangl H and Brandstätter B 2007 Influence of the discretization error on the reconstruction accuracy in electrical capacitance tomography Int. J. Comp. Math. Electr. Electron. Eng. 26 661–76
[52] Schwarzl C 2007 Robust parameter estimation in ECT using MCMC sampling Master Thesis Graz University of Technology
[53] Moulton J D, Fox C and Svyatskiy D 2008 Multilevel approximations in sample-based inversion from the Dirichlet-to-Neumann map J. Phys.: Conf. Ser. 124 012035
[54] Schwarzl C, Watzenig D and Fox C 2008 Estimation of
contour parameter uncertainties in permittivity imaging
using MCMC sampling 5th IEEE Sensor Array and
Multichannel Signal Processing Workshop (21–23 July)
pp 446–50
[55] Kaipio J P and Somersalo E 2004 Statistical and
Computational Inverse Problems (New York: Applied
Mathematical Sciences, Springer)
[56] Golub G H and Van Loan C F 1993 Matrix Computations 2nd
edn (Baltimore: The Johns Hopkins University Press)
[57] Lee J E 2005 Sample based inference for inverse obstacle
scattering Master Thesis Department of Mathematics, The
University of Auckland
[58] Fox C 2008 Recent advances in inferential solutions to inverse
problems J. Inverse Problems Sci. Eng. 16 797–810
[59] Robert C 2001 The Bayesian Choice (New York: Springer)
[60] Gilks W R, Richardson S and Spiegelhalter D (ed) 1996
Markov Chain Monte Carlo in Practice (London: Chapman
and Hall)
[61] Robert C P and Casella G 2000 Monte Carlo Statistical
Methods. Springer Texts in Statistics 2nd edn (New York:
Springer)
[62] Liu J S 2001 Monte Carlo Strategies in Scientific Computing
(New York: Springer)
[63] Banerjee S, Carlin B P and Gelfand A E 2004 Hierarchical
Modeling and Analysis for Spatial Data (Boca Raton, FL:
CRC Press)
[64] Grenander U and Miller M 2007 Pattern Theory: From
Representation to Inference (Oxford: Oxford University
Press)
[65] Fox C and Nicholls G K 1997 Sampling conductivity images
via MCMC The Art and Science of Bayesian Image
Analysis—Leeds Annual Statistics Research Workshop vol
14 pp 91–100
[66] Siltanen S, Voutilainen A, Kolehmainen V, Järvenpää S,
Kaipio J P, Koistinen P, Lassas M, Pirttilä J and Somersalo
E 2003 Statistical inversion for medical X-ray tomography
with few radiographs: I. General theory Phys. Med. Biol.
48 1437–63
[67] Aykroyd R G and Zimeras S 1999 Inhomogeneous prior
models for image reconstruction J. Am. Stat. Assoc.
94 934–46
[68] Noumeir R, Mailloux G E and Lemieux R 1995 An
expectation maximization reconstruction algorithm for
emission tomography with non-uniform entropy prior Int. J.
Biomed. Comp. 39 299–310
[69] Kolehmainen V 2001 Novel approaches to image
reconstruction in diffusion tomography PhD Thesis Kuopio
University Publications C. Natural and Environmental
Sciences 125
[70] Palm M 1999 Monte Carlo methods in electrical conductance
imaging Master Thesis Department of Mathematics, The
University of Auckland
[71] Cui T 2005 Bayesian inference for geothermal model
calibration Master Thesis Department of Mathematics, The
University of Auckland
[72] Andersen K E, Brooks S P and Hansen M B 2003 Bayesian
inversion of geoelectrical resistivity data J. R. Stat. Soc.:
Ser. B 65 619–42
[73] Kolehmainen V, Voutilainen A and Kaipio J P 2001
Estimation of non-stationary region boundaries in EIT-state
estimation approach Inverse Problems 17 1937–56
[74] Watzenig D, Brandner M and Steiner G 2007 A particle filter
approach for tomographic imaging based on different
state-space representations Meas. Sci. Technol.
18 30–40
[75] Kim M C, Kim K Y, Kim S, Seo K H, Jeon H J, Kim J H
and Choi B Y 2005 Estimation of phase boundary by front
points method in electrical impedance tomography Proc.
Int. Conf. on Inverse Problems, Design and Optimization (IPDO 2004) pp 101–7
[76] Tossavainen O P, Vauhkonen M, Heikkinen L M and Savolainen T 2004 Estimating shapes and free surfaces with electrical impedance tomography Meas. Sci. Technol. 15 1402–11
[77] Grudzien K, Romanowski A and Williams R A 2005 Application of a Bayesian approach to the tomographic analysis of hopper flow J. Part. Part. Syst. Charact. 22 246–53
[78] West R M, Aykroyd R G, Meng S and Williams R A 2004 MCMC techniques and spatial-temporal modelling for medical EIT Physiol. Meas. 25 181–94
[79] Watzenig D and Fox C 2008 Posterior variability of inclusion shape based on tomographic measurement data J. Phys.: Conf. Ser. 135 012102
[80] Vauhkonen M, Karjalainen P A and Kaipio J P 1998 A Kalman filter approach to track fast impedance changes in electrical impedance tomography IEEE Trans. Biomed. Eng. 45 486–93
[81] Kaipio J P, Kolehmainen V, Vauhkonen M and Somersalo E 1999 Inverse problems with structural prior information Inverse Problems 15 713–29
[82] Steiner G, Soleimani M and Watzenig D 2008 A bio-electromechanical imaging technique with combined electrical impedance and ultrasound tomography Physiol. Meas. 29 63–75
[83] Green P J 1995 Reversible jump Markov chain Monte Carlo computation and Bayesian model determination Biometrika 82 711–32
[84] Higdon D, Lee H and Holloman C 2003 Markov chain Monte Carlo-based approaches for inference in computationally intensive inverse problems Bayesian Statistics 7 (Oxford: Oxford University Press)
[85] Brooks S P, Giudici P and Roberts G O 2003 Efficient construction of reversible jump MCMC proposal distributions J. R. Stat. Soc.: Ser. B 65 3–56
[86] Higdon D, Lee H and Bi Z A 2002 Bayesian approach to
characterizing uncertainty in inverse problems using coarse
and fine scale information IEEE Trans. Signal Process.
50 389–99
[87] Marinari E and Parisi G 1992 Simulated tempering: a new
Monte Carlo scheme Europhys. Lett. 19 451–8
[88] Watzenig D 2007 Statistical solutions to inverse problems statistical inversion J. Austrian Soc. Electr. Eng. (OVE) 7/8
240–7
[89] Aykroyd R G and Cattle B A 2007 A boundary-element
approach for the complete electrode model of EIT
illustrated using simulated and real data Inverse Problems
Sci. Eng. 15 441–61
[90] Kim K Y, Kang S I, Kim M C, Kim S, Lee Y J and Vauhkonen
M 2003 Dynamic image reconstruction in electrical
impedance tomography with known internal structures
IEEE Trans. Magn. 38 1301–4
[91] Watzenig D, Steiner G and Pröll C 2005 Statistical estimation
of phase boundaries and material parameters in industrial
process tomography Proc. of the IEEE Int. Conf. on Ind.
Technol. (ICIT’05) (Hong Kong, China, December 14–17)
pp 720–5
[92] Seppänen A, Vauhkonen M, Vauhkonen P J, Somersalo E
and Kaipio J P 2001 State estimation with fluid dynamical
evolution models in process tomography—an application
with impedance tomography Inverse Problems 17 467–84
[93] Kim B S, Ijaz U Z, Kim J H, Kim M C, Kim S and
Kim K Y 2007 Nonstationary phase boundary
estimation in electrical impedance tomography based on the
interacting multiple model scheme Meas. Sci. Technol.
18 62–70
[94] Doucet A, de Freitas N and Gordon N J 2001 Sequential
Monte Carlo Methods in Practice (New York: Springer)
[95] Arulampalam M S, Maskell S, Gordon N J and Clapp T 2002
A tutorial on particle filters for online
nonlinear/non-Gaussian Bayesian tracking IEEE Trans.
Signal Process. 50 174–88