IOP PUBLISHING, MEASUREMENT SCIENCE AND TECHNOLOGY
Meas. Sci. Technol. 20 (2009) 052002 (22pp) doi:10.1088/0957-0233/20/5/052002

TOPICAL REVIEW

A review of statistical modelling and inference for electrical capacitance tomography

D Watzenig (1) and C Fox (2)

(1) Institute of Electrical Measurement and Measurement Signal Processing, Graz University of Technology, Kopernikusgasse 24, A-8010 Graz, Austria
(2) Department of Physics, University of Otago, PO Box 56, Dunedin, New Zealand

E-mail: [email protected] and [email protected]

Received 20 November 2007, in final form 18 December 2008
Published 3 April 2009
Online at stacks.iop.org/MST/20/052002

Abstract

Bayesian inference applied to electrical capacitance tomography, or other inverse problems, provides a framework for quantified model fitting. Estimation of unknown quantities of interest is based on the posterior distribution over the unknown permittivity and unobserved data, conditioned on measured data. Key components in this framework are a prior model requiring a parametrization of the permittivity and a normalizable prior density, the likelihood function that follows from a decomposition of measurements into deterministic and random parts, and numerical simulation of noise-free measurements. Uncertainty in recovered permittivities arises from measurement noise, measurement sensitivities, model inaccuracy, discretization error and a priori uncertainty; each of these sources may be accounted for and in some cases taken advantage of. Estimates or properties of the permittivity can be calculated as summary statistics over the posterior distribution using Markov chain Monte Carlo sampling. Several modified Metropolis–Hastings algorithms are available to speed up this computationally expensive step. The bias in estimates that is induced by the representation of unknowns may be avoided by design of a prior density. The differing purpose of applications means that there is no single 'Bayesian' analysis. Further, differing solutions will use different modelling choices, perhaps influenced by the need for computational efficiency. We solve a reference problem of recovering the unknown shape of a constant permittivity inclusion in an otherwise uniform background. Statistics calculated in the reference problem give accurate estimates of inclusion area, and other properties, when using measured data. The alternatives available for structuring inferential solutions in other applications are clarified by contrasting them against the choice we made in our reference solution.

Keywords: statistical inversion, Bayesian inference, Markov chain Monte Carlo, electrical capacitance tomography

1. Introduction

Electrical capacitance tomography (ECT) is an imaging modality in which one attempts to recover the spatially varying permittivity of an insulating medium from measurement of capacitance outside the boundary of the medium [1, 2]. ECT is primarily used for non-invasive imaging within inaccessible domains in applications where differing materials show up as contrasting permittivities. Electrodes are set in an insulating material at the outside of an insulating tube. By applying a predefined voltage pattern to electrodes, the capacitance between pairs of electrodes can be directly related to measured electric potentials, electric currents and electric charges.
Measurements consist of all the capacitances between pairs of electrodes, making up the matrix of trans-capacitances. The interior of the tube contains the material with unknown permittivity distribution that is being imaged. It is advantageous to surround the apparatus by an electrically conducting ground shield so that measured trans-capacitances do not depend on the environment outside the apparatus. Measured capacitances depend on the unknown permittivity, and the imaging problem is to 'invert' this relationship to determine the unknown permittivity.

ECT has been proposed for a variety of target applications, such as imaging dilute as well as bulky multi-phase flows in oil refinement, in the food industry, and to observe pharmaceutical and chemical processes [3–7]. Other fields of application can be found in the characterization of different phases in fluidized beds, mixing processes and combustion chambers [8–10]. ECT systems can be implemented at low cost and, due to their robustness and small failure probability, are suitable for operation under harsh environmental conditions including the presence of strong external electromagnetic fields [11].

The functional relationship ε(x, y) → C from the permittivity distribution to the trans-capacitance matrix C defines the forward map. The forward map can be modelled using the physics of the problem and is written as an elliptic partial differential equation (PDE) of the form ∇·(ε∇u) = q, subject to boundary conditions. The permittivity being sought is denoted by ε and appears as the spatially varying coefficient in the PDE. The measurements made are of the boundary values of the electric potential u and the flux ε ∂u/∂n.

If the electric field can be well approximated as a small change about a known field, then the linear Born approximation may be used to simulate the measurement process (forward map). This would be the case when the permittivity is essentially known up to a small uncertainty. In most applications, however, the measurement process must be simulated by solving the PDE subject to appropriate boundary conditions. In that case the forward map is nonlinear [12, 13]. When the measurement system, consisting of the region of interest, the electrodes and the permittivity, is long in one direction, it follows that the electric fields do not vary in that direction and the forward and inverse problems reduce to a two-dimensional problem for a slice through the system. This approximation to the three-dimensional inverse problem is usually made to reduce computational complexity.

As a general imaging technique, ECT is necessarily low resolution. This follows from each measurement being dependent on all of the permittivity, resulting in measurements primarily being sensitive to average, or slowly varying, properties. Fine-scale structure in the permittivity has little effect on measurements. Consequently, practical measurements that include noise do not unambiguously define a detailed image of the spatially varying permittivity. For this reason it is necessary to include further information in the imaging step, such as physical constraints. In deterministic regularization methods this takes the form of a regularizing functional, typically a semi-norm over representations chosen on mathematical grounds.
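To make the forward map concrete before continuing, the following is a minimal finite-difference sketch of the generalized Laplace problem ∇·(ε∇u) = 0 on a square grid. The grid, boundary setup and flux evaluation are our own simplifications for illustration, not the FEM/BEM machinery discussed in section 3.

```python
import numpy as np

def solve_potential(eps, u, n_sweeps=2000):
    """Gauss-Seidel relaxation for div(eps * grad u) = 0 on a square grid.
    eps holds nodal permittivities; u holds the potential with Dirichlet
    values pre-set on all four edges (edge values are never updated)."""
    N = eps.shape[0]
    for _ in range(n_sweeps):
        for i in range(1, N - 1):
            for j in range(1, N - 1):
                # face permittivities by averaging adjacent nodal values
                eE = 0.5 * (eps[i, j] + eps[i, j + 1])
                eW = 0.5 * (eps[i, j] + eps[i, j - 1])
                eN = 0.5 * (eps[i, j] + eps[i - 1, j])
                eS = 0.5 * (eps[i, j] + eps[i + 1, j])
                u[i, j] = (eE * u[i, j + 1] + eW * u[i, j - 1]
                           + eN * u[i - 1, j] + eS * u[i + 1, j]) / (eE + eW + eN + eS)
    return u

# toy example: uniform background with a high-permittivity inclusion,
# 1 V asserted on the left edge, 0 V on the remaining (grounded) edges
N = 33
eps = np.ones((N, N))
eps[12:20, 12:20] = 3.0          # the inclusion
u = np.zeros((N, N))
u[:, 0] = 1.0                    # Dirichlet data on the 'active' side
u = solve_potential(eps, u)
# normal flux into the right-hand boundary, cf. the flux eps du/dn above
q = np.sum(0.5 * (eps[:, -1] + eps[:, -2]) * (u[:, -2] - u[:, -1]))
```

Real ECT solvers replace this toy with the discretizations of section 3; the point is only that simulating one measurement means solving the PDE for the current ε and then evaluating a boundary flux.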
In contrast, the wider range of representations allowed by the statistical approach allows for the inclusion of constraints or information that more closely represent actual knowledge about the unknown permittivity, thereby allowing genuine modelling of the unknowns to inform the imaging step. Representations and image modelling are discussed in section 5.

The majority of ECT reconstructions reported in the literature apply deterministic approaches such as regularized least squares to solve the inverse problem [14–17]. Deterministic inversion consists of applying a regular approximation to the inverse of the forward map to give a single estimate of the unknown parameters, and includes errors only in terms of a single number, the 'magnitude' of noise. Finer details of the statistical distribution of errors are not considered. The inclusion of the error process is critical for practical solution of inverse problems such as ECT, where the forward map has a large range of sensitivities to features in the unknown permittivities. Explicitly modelling the error process, and ensuring that the error conforms to the model, is also an invaluable tool in developing accurate instrumentation and an accurate forward map. All too often measurements contain artefacts that are not part of the intended measurement modality, or are not modelled in the forward map. Interpreting measurement or discretization artefacts in terms of an idealized forward map typically leads to substantial artefacts in the reconstructed permittivities [18]. We cannot overstress the value of the standard technique (in statistics) of examining residuals to validate the forward map and measurement error distributions (see section 4).

A probabilistic model for measurement error and other uncertainties results in a probabilistic model for the measurement process, and inversion is then a problem of statistical inference in which the unknown permittivity is to be estimated. The parametrization of the unknown permittivity is an important consideration as it determines the ease, or difficulty, of stating constraints over allowable permittivities. A consequent requirement is the specification of a prior distribution over parameters, since this is the primary means of ensuring that estimates of quantities of interest are not biased by the estimation procedure. We address the issue of representation and prior distribution in section 5. Accurately modelling the measurement process, including the statistics of the measurement error, and modelling unknowns via the prior distribution are key steps in an accurate application of Bayesian inference to inverse problems [19–21]. This approach provides a convenient setting for defining, incorporating, controlling and interpreting prior information, and has wide-ranging applicability in many inverse problems.

Statistical solutions to inverse problems have a long history, with notable developments in the work of Laplace and Jeffreys [22], though the ingredients of a modern computational approach were introduced in seminal papers by Geman and Geman in 1984 [23] and Grenander and Miller in 1994 [24], each for a problem in image restoration. The first substantive statistical methods applied in electrical impedance tomography (EIT) were presented by Nicholls and Fox [25] and Kaipio et al [26] about 10 years ago. By taking into account uncertainties, one can quantify the range of parameters that are consistent with measured data
via the posterior probability distribution. Then solutions to an ill-posed inverse problem are well-determined problems in statistical inference over that distribution. Bayesian methods encompass much more than simply reporting a posterior mode, and can be regarded as more general than regularization. The Bayesian paradigm has many advantages over deterministic approaches, such as robust predictive densities, posterior error estimates, direct support for optimal decisions and the ability to treat arbitrary forward maps and error distributions. One non-obvious advantage is the ability to use a wide range of representations of the unknown system, including parameter spaces that are discrete, discontinuous or even of variable dimension. Since inferential methods make optimal use of data, the ability to reduce data to a minimal set gives cost savings in applications where collecting data is expensive.

The price of these advantages is presently the relatively high computational cost of sampling algorithms for computing estimates. Typically, the dimension of the parameter space is between 50 and 10^5, depending on the problem to be solved. Integrations required to compute estimates such as posterior means or credible intervals are intractable through standard analytical or numerical quadrature techniques. The best current solutions employ Markov chain Monte Carlo (MCMC) sampling, which is computationally costly. These sampling algorithms draw samples from the posterior distribution by simulating a Markov chain with an appropriate transition kernel, with popular examples being the Gibbs sampler [23] and the Metropolis–Hastings (MH) algorithm [27]. Both discrete and continuous problems can be treated, requiring only the ratio of probability densities of two states to be calculated. In scenarios where material distributions change with time and real-time performance is required, Bayesian recursive filters such as Kalman filters (KFs) [28, 29] and particle filters (PFs) [30] can be applied. To overcome the computational expense of MCMC algorithms, much current research focuses on fast alternative algorithms in order to extend the field of applications, including non-stationary problems. We discuss these recent advances in more detail in section 6.3.

In this article we review the current state of the art in performing ECT from measured data using inferential methods. We have also tried to incorporate the nature of a tutorial by surveying the many methods available in terms of the sequence of choices that need to be made to implement one solution as opposed to another. Throughout the review we will highlight a reference problem in ECT, of recovering the unknown shape of a single inclusion with unknown constant permittivity in an otherwise uniform background material, from uncertain capacitance measurements at electrodes outside the material [31]. A schematic of this reference problem is shown in figure 1.

Figure 1. A schematic of the reference problem in ECT, in which a single inclusion in permittivity is sought.

We expect that contrasting the various choices we made in solving this reference application against the alternatives available will clarify both our solution and other structured inferential solutions. The reference problem arose in an application with the goal of quantifying the void fraction (water/air) in oil pipelines; hence the area of the (two-dimensional) inclusion was of primary interest.
We therefore chose an explicit representation of the boundary of the inclusion so that area is simple to calculate. That choice of representation necessarily introduces a bias in estimates of area (as it would with regularized inversion) and so we adjust the prior density over inclusions to compensate (see section 5). Numerical implementation of the forward map uses a boundary element method (BEM) representation of the unknown permittivity, taking advantage of the piecewise-constant representation, coupled to a finite element method (FEM) discretization of the region of unchanging permittivity around the electrodes [17]. We present the posterior distribution of inclusion area as a check on the accuracy of the method. All these aspects correspond to choices made for this problem, some of which we would change given a different application or measurement modality.

The review is organized as follows. In section 2 we introduce the sensor technology, the requirements for an imaging system in electrical tomography, typical measurements, instrumentation error, measurement uncertainties and a calibration concept. Section 3 addresses data simulation for ECT; the sensor model, modelling error and numerical implementation via finite and boundary elements are discussed. In section 4 we formulate ECT in the Bayesian inferential framework by introducing a probabilistic model of the measurement process along with basic definitions for statistical inverse problems. Representation of unknowns as low-, mid- and high-level, and the issues in prior modelling, are investigated in section 5. In section 6 we discuss the MCMC sampling procedure, acceleration schemes and recent advances in sampling, as well as the topic of summarizing the posterior distribution and the calculation of statistics. Inversion results for ECT are presented in section 7 for different permittivity distributions using synthetic and measured data. Section 8 contains a brief review of the solution of non-stationary inverse problems. Concluding remarks are given in section 9.
2. Instrumentation for ECT

ECT is a non-invasive technique to examine the permittivity distribution of closed objects by means of measurements of coupling capacitances in a multi-electrode assembly. A general setup for ECT consists of the imaging domain Ω ⊂ R² or R³ containing the unknown permittivity distribution and the domain boundary ∂Ω where a number of electrodes are placed. Typical configurations use 8–16 electrodes. A measurement circuit is connected to each of the electrodes to sense the inter-electrode capacitances, providing information about the permittivity within Ω.

2.1. ECT sensors

Two measuring principles are typically applied to determine the matrix of trans-capacitances, or coupling capacitances:

• Charge-based or displacement current-based method (ac voltage, low-impedance measurement) [16, 32].
• Electrode potential-based method (dc voltage, high-impedance measurement) [17].

For both methods each electrode is designated as a 'transmitting' or as a 'receiving' electrode, with a prescribed voltage being applied to transmitting electrodes. In the charge-based method the receiving electrodes are held at virtual earth with the displacement charge being measured, while in the voltage-based approach the receiving electrodes are floating with the potential being measured. A comparison of the two different measurement principles is given in [33]. For the charge-based method, frequencies of the typically sinusoidal excitation signal are about 1 MHz in order to provide sufficient sensitivity. The different hardware designs of the sensing electronics, with their advantages and disadvantages, have been presented by Yang [34]. More recently, Wegleiter et al compared the charge-based method (sinusoidal excitation signal, 40 MHz) and the potential-based method for the ECT sensor front-end in terms of circuit modelling, robustness to stray capacitances, hardware design issues and measurement repeatability [35].

From the ECT hardware design perspective, sensors need to meet the following requirements and limitations:

• A selectable operation mode of each electrode (transmit/receive).
• Accurate measurement of very small inter-electrode capacitances in the range of 1 fF to 5 pF in the presence of stray capacitances of the order of 150 pF [34, 36].
• High dynamic range of the amplifying circuitry to cover a wide range of magnitudes for electrodes adjacent and opposite to the transmitting one, and to be able to consider both low- and high-contrast problems in permittivity.
• High measurement resolution, since capacitance changes caused by permittivity variations are very small.
• Shielding for all sensor circuits to reduce cross-talk between transmitting and receiving electrodes.
• Parallel measurement of receiving channels.
• The possibility of sensor calibration (circuitry adjustment).
• A monotonic transfer function for the range of interest.
• An outer screen of the sensor head, which is compulsory because it shields the sensor system from the ambient systems and prevents charge disturbances on the electrodes due to external charged objects [37].

In the charge-based method, the induced displacement current is measured and converted to a voltage signal which is proportional to the inter-electrode capacitance. In comparison to the electrode potential-based strategy, this method can be considered a low-impedance measurement. Due to the narrow frequency characteristics, the method is less affected by electromagnetic disturbances. The narrow noise bandwidth implies an improved signal-to-noise ratio (SNR) and consequently a higher resolution in measured quantities. By including a tunable low-pass filter, varying stray capacitances can be compensated in a very simple and natural way [38]. Unfortunately this approach does not work for the electrode potential-based method [35], as a filter between the electrode and the operational amplifier significantly reduces the impedance of the input stage. Within the high-impedance method, the stray capacitance and the inter-electrode capacitance form a capacitive voltage divider that reduces the amplitude of the received signal. Low-impedance measurements exhibit better noise characteristics and are less sensitive to varying stray capacitances.

Due to these advantages we use a charge-based sensor built at Graz University of Technology for the different investigations and experiments presented in this work [35]. The complete sensor front-end of the charge-based ECT sensor consists of an input resonant circuit, a low-noise current-to-voltage converter, a bandpass filter, a logarithmic demodulator and a 24 bit analog-to-digital converter controlled by a microprocessor. Figure 2 illustrates the measurement configuration of the ECT sensor used for permittivity imaging. The sensor has a carrier frequency of 40 MHz and comprises two tuneable filters adjusted by means of variable capacitances for stray capacitance compensation. The data acquisition has a maximum sampling rate of 7.5 k samples/s. The receiving channel offers linear characteristics from 10 dBμA to 65 dBμA.

Figure 2. Measurement configuration of the used ECT sensor. The measurement electrodes are placed around the pipe containing the imaging plane. Every electrode features dedicated transmitting and receiving hardware.

A single measurement frame consists of 16 projections, according to the 16 available transmitting electrodes. A measurement frame consequently consists of 16 × 15 = 240 entries. The first 105 displacement current values of a measured frame are shown in figure 3.

Figure 3. The first 105 values of the acquired measurement vector for a typical material distribution of PVC in air. The crown-shaped profile results from the wide range of coupling capacitances between the transmitting and receiving electrodes.

In most charge-based ECT applications the following restrictions are assumed to simplify sensor modelling:

• The permittivity is independent of the electric field strength.
• The carrier frequency is constant and the wavelength is large compared to the sensor geometry, leading to an electrostatic model.
• Stray capacitances in the longitudinal direction are not considered (due to the length of electrodes).

The configuration used in the present study is aimed at multi-phase flow monitoring, hence a ring of electrodes covering the cross-section of a process pipe is used. The 16 electrodes are evenly distributed around its boundary. This yields a set of 16 × 15/2 = 120 independent inter-electrode capacitances from which the internal permittivity distribution has to be reconstructed.
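For concreteness, the counting above can be reproduced in a few lines. The (tx, rx) → index convention is our own illustration, not the instrument's actual data ordering.

```python
NE = 16  # electrodes in the sensor described above

# one frame: each electrode transmits once, the other 15 receive
frame_entries = NE * (NE - 1)       # 16 x 15 = 240 directed entries
independent = NE * (NE - 1) // 2    # 120 pairs, by reciprocity C_ij = C_ji

def pair_index(tx, rx):
    """Map a directed pair (tx, rx), 0-based, to an index in 0..119
    (upper-triangular enumeration of the unordered pair)."""
    i, j = sorted((tx, rx))
    return i * NE - i * (i + 1) // 2 + (j - i - 1)
```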
2.2. Sensor calibration

When using measured data, special focus has to be put on the calibration of the computer model in order to successfully reconstruct parameters from the data. Calibration is performed by fitting the model to measured data for known internal permittivities, by adjusting stray capacitances. Stray capacitances are represented by two parameters: the radial distance between the tube and the outer screen, and the permittivity within this space. In cases where other aspects of the geometry are also uncertain, we include parameters to describe the possible variation. Figure 4(a) shows the relative error between model and data for an empty pipe. According to the normalized quantile plot in figure 4(b), the difference is Gaussian distributed, yielding a multivariate Gaussian error model. The remaining offset error (model to data mismatch) is corrected for the empty pipe (figure 5(a)). A second point of calibration can be obtained by comparing simulated and measured data for a well-defined target, a centred PVC rod in our example. The gain error between model and data is then corrected for each electrode (figure 5(b)). Note that the prescribed calibration distribution should correspond to an expectable permittivity distribution in terms of permittivity values and shape. It is very helpful to calibrate the model for a reference distribution that will lead to improved SNR, which can be achieved by placing the known reference object in the centre of the pipe where there is low sensor sensitivity. Due to temperature fluctuations and deterioration effects it is advisable to repeat the calibration, in terms of offset and gain correction, during operation of the sensor.

Figure 4. (a) Relative error between simulated and measured data for an empty pipe, i.e. for an air-filled pipe (ε_r = 1.0). (b) Normalized quantile plot of the error between measured and simulated displacement currents. The distribution of displacement currents meets the Gaussian assumption.

Figure 5. (a) Difference between measured and simulated data for an empty pipe. This difference is referred to as offset error. (b) Gain error between measured and simulated data corresponding to a centred PVC rod after offset correction.

2.3. Measurement uncertainty

The sensor front-end and the subsequent instrumentation introduce noise sources to the measurement process. The applied voltage also has error, but this error is not significant and we do not consider it here (see [39] for an analysis that does). To investigate the robustness and repeatability of data acquisition, the distribution of measured displacement currents was examined over multiple measurements with a given fixed permittivity distribution [31]. Figure 6 shows the normalized quantile plot and the histogram for 2000 measurements at one electrode. The measured electrode displacement currents exhibit noise properties that can be well modelled as additive zero-mean Gaussian with standard deviation σ ≈ 0.07 μA. The matrix of sample correlation coefficients for all electrodes is shown in figure 7. As can be seen, off-diagonal elements are plausibly zero, so the measurement error covariance matrix is modelled as Σ = σ²I, where I is the identity matrix.

Figure 6. Distribution of 2000 measured displacement currents on one electrode. The distribution can be well modelled as Gaussian according to the normalized quantile plot (top) and the histogram of the displacement currents (bottom).

Figure 7. Matrix of correlation coefficients. Off-diagonal elements are almost zero.

Accordingly, the density for measuring d given the permittivity defined by θ is the multivariate Gaussian

π(d|θ) ∝ exp(−(1/2)(q_m − d)^T Σ^{−1} (q_m − d)),   (1)

where q_m denotes the vector of simulated displacement charges.
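With Σ = σ²I, equation (1) reduces to a sum of squared residuals. A minimal sketch, assuming the 240-entry frame described in section 2.1:

```python
import numpy as np

SIGMA = 0.07e-6  # noise standard deviation in amperes (sigma ~ 0.07 uA)

def log_likelihood(d, q_m, sigma=SIGMA):
    """Unnormalized log of equation (1) with Sigma = sigma^2 * I.

    d   : measured displacement-current vector (length 240)
    q_m : simulated (noise-free) data for the permittivity theta
    """
    r = q_m - d
    return -0.5 * np.dot(r, r) / sigma**2
```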
3. Data simulation for ECT

3.1. Mathematical model

As discussed in section 2, measurements for ECT consist of the displacement charges at electrodes that result from voltages applied at electrodes. The forward map is the deterministic functional relationship from the permittivity distribution to the noise-free displacement charges. In practice it is necessary to solve for the electric fields throughout the measurement apparatus, from which the displacement charge at electrodes may be evaluated. The particular set of displacement charges measured, and hence the forward map, actually depends on the set of electrode voltages applied during the measurement procedure. Here we describe the procedure employed in the ECT instrumentation in Graz, where each electrode i in sequence is made 'active' with an asserted potential of v_0 and the remaining electrodes j ≠ i act as 'receivers' and are held at potential v_j = 0. The arrangement is surrounded by a grounded outer screen.

We denote the measurement region by Ω, being the region bounded by the electrodes and outer shield. Let Γ_i, i = 1, 2, ..., N_e, denote the boundary of electrode i when there are N_e electrodes, and let Γ_s be the inner boundary of the outer shield. Then the boundary of Ω is ∂Ω = ∪_{i=1}^{N_e} Γ_i ∪ Γ_s. In the absence of internal charges, the electric potential u satisfies the generalized Laplace equation in which the permittivity ε appears as a coefficient, along with Dirichlet boundary value conditions corresponding to the voltage asserted at electrodes.
For the case when electrode i is held at potential v_0 while all others are held at virtual earth, the potential u_i satisfies the Dirichlet boundary value problem (BVP)

∇·(ε∇u_i) = 0 in Ω,   u_i|_{Γ_i} = v_0,   u_i|_{Γ_j} = 0 for j ≠ i,   u_i|_{Γ_s} = 0.   (2)

For brevity we have not written ε(r) and u_i(r) showing the functional dependence on position r ∈ Ω, but take that spatial variation to be implicit. The charge at the sensing electrode j can be determined by integration of the electric displacement over the electrode boundary,

q_{i,j} = −∫_{Γ_j} ε (∂u_i/∂n) dr,   (3)

where n is the inward normal vector. It can be seen from equations (2) and (3) that the measured displacement charges q_{i,j} are a linear function of the voltages asserted at electrodes. (The linear relationship is operation by the matrix of trans-capacitances C.) Note that the measurement set we describe above corresponds to asserting the standard basis of electrode voltages to fully characterize the linear mapping. Consequently, measurements made using any other set of electrode voltages are a function of the set we describe here. In the presence of noise, however, measurements made using some voltage patterns contribute more information about the unknown permittivity than others. Then, for a given number of measurements, optimal resolution may be achieved using an incomplete set of voltage vectors [40].

3.2. Nature of the forward map

The forward map, defined by solving the BVP (2) and then evaluating equation (3), defines a map from Dirichlet data to Neumann data (or flux) on the boundary of the region, and hence gives a representation of the 'Dirichlet to Neumann' (DtN) map. The measured trans-capacitance matrix C is a discrete version of this map, between electrodes. We note, in passing, that the BVP being self-adjoint implies that C is symmetric, while the definiteness of the generalized Laplacian implies that C is positive definite [41]. At a typical SNR of 1000:1, a study of the linearized forward map shows that ECT data contain information about at most 10^3 independent features of the permittivity [42]. This number increases linearly with geometric improvement of SNR, and hence is effectively an upper bound for practical instrumentation. It seems plausible that a similar bound holds for the nonlinear problem. Hence the use of 16 electrodes in ECT, giving 120 independent electrical measurements, is sufficient. Further measurements effectively only increase the SNR, which is achieved most efficiently by longer acquisition times rather than more electrodes and instrumentation.
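The symmetry and positive definiteness noted above give a cheap sanity check on any simulated or measured trans-capacitance matrix; a sketch:

```python
import numpy as np

def check_capacitance_matrix(C, tol=1e-9):
    """Sanity checks from section 3.2: the trans-capacitance matrix of
    the self-adjoint BVP should be symmetric and positive definite."""
    symmetric = np.max(np.abs(C - C.T)) <= tol * np.max(np.abs(C))
    # eigvalsh assumes symmetry, so test the symmetrized matrix
    positive_definite = np.all(np.linalg.eigvalsh(0.5 * (C + C.T)) > 0.0)
    return symmetric, positive_definite
```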
3.3. Model error

Model error occurs when (theoretical) data defined by the forward map differ from the noise-free data produced by the physical measurement system. For many complex inverse problems model error is the most fundamental source of uncertainty since it is not avoidable. However, it is not usually a significant issue in ECT, where precise instrumentation is represented well by the electric field equations. The most likely source of model error is when the actual and modelled geometries differ [43]. An interesting investigation of model error is given in [44], though it is important to note that in that work the correct (simulated) model lies within the range of assumed models. Much more problematic is model error outside any known range, though frameworks for that problem have been developed recently [45].

3.4. Computer model

While there are a few idealized problems in ECT for which analytic solution of the forward map is possible [46], in all practical cases accurate simulation of data requires computer evaluation of a discretized version of equations (2) and (3). Two discretization schemes are most commonly used: the finite element method (FEM) and the boundary element method (BEM). Both approaches have been applied to ECT. For the reference problem we have used a coupled FEM/BEM scheme [17, 31], with the region being imaged using a BEM formulation coupled to a FEM discretization of the insulating pipe and region outside the electrodes.

3.4.1. Boundary element method. The BEM uses a discrete form of the boundary integral equations [47] that express fields within regions of constant permittivity in terms of values at the boundary of the region. Hence BEM is applicable to problems where the permittivity is piecewise constant, as in our reference problem, and has been used extensively in ECT [30, 48, 49]. Figure 8 illustrates a BEM discretization of an elliptic inclusion (ε_2) in an otherwise constant background material with permittivity ε_1.

Figure 8. Cross-sectional view of an inclusion with permittivity ε_2 in a background material with permittivity ε_1.

For simplicity, linear boundary elements are used in the reference problem, and the electric potential and its normal derivative are assumed to be constant on each of the N_b elements, though higher order schemes are possible [50]. The resulting system to be solved has the form

(K + (1/2)I) u = H q,   (4)

where u is a known vector of Dirichlet values, q is the vector of Neumann values being solved for, and the matrices K and H are dense, not symmetric, and of size N_b × N_b when there are N_b boundary elements in total.

BEM suffers from many potential numerical difficulties that make it problematic for use in solving inverse problems by iterative methods, whether statistical or deterministic. One such problem is that the matrix system to be solved becomes highly ill-conditioned when regions are thin or significantly non-convex. While these geometry-based problems have well-known solutions, they do add to the complexity of computer codes; otherwise they pose a genuine difficulty for sampling algorithms, since the state is required to explore all possible states including those for which BEM fails. A 'fix' that we have implemented for the reference problem is to modify the prior over boundaries to exclude states that present a numerical difficulty, such as polygonal boundaries with thin 'spikes'. This pragmatic solution is unlikely to affect results in the reference application, where we can argue on physical grounds that surface tension at voids causes inclusions to be smooth. Further, the complexity of the BEM formulation increases dramatically as the number of inclusions increases. This makes BEM unsuitable for problems where a variable number of inclusions are allowed, such as the high-level representations discussed in section 5. In three-dimensional problems the number of nonzero entries in the BEM system exceeds the number in a FEM discretization, making BEM unsuitable because of computational cost.

3.4.2. Finite element method. In FEM discretization of equation (2), the region of interest is usually discretized as the union of triangular elements, each of constant permittivity, with the potential interpolated between nodes by piecewise linear functions [16, 51]. Figure 9 shows a FEM mesh recently used for ECT [52]. The discretized area includes the insulating pipe (dark grey), the region outside the pipe with electrode inset (light grey) and the region inside the pipe containing the material of unknown permittivity being imaged. This mesh has about 6000 elements and is 'unstructured', with smaller elements around the electrode ends to give an accurate representation of rapid changes in fields, and with larger elements towards the centre of the pipe where the decreased resolution of ECT does not warrant finer division of the permittivity [42].

Figure 9. A finite element mesh used for ECT.

FEM discretization results in a linear system to be solved of the form

K_i u = f_i,   (5)

where u is the vector of nodal values over the whole mesh, f_i is a forcing vector and K_i is the stiffness matrix modified for the Dirichlet conditions corresponding to nonzero voltage on electrode i. Notably, the matrix K_i is symmetric, sparse and has size equal to the number of nodes in the mesh.

Data are simulated by solving the FEM equation (5) for multiple f_i, one for each asserted voltage pattern. For two-dimensional problems, efficient solution is achieved by first operating by a bandwidth-reducing permutation followed by Cholesky factoring of K_i [52]. For three-dimensional problems with fine meshes, multigrid solvers are significantly faster, and also provide access to cheap solutions at coarse scales that may be utilized within the MCMC to decrease overall compute time [53]. For the efficient numerical implementation of (3), the electrode charge is directly calculated from the node potentials u_i for sending electrode i and the global finite element stiffness matrix K free of boundary conditions, without having to resort to gradient calculations:

q_{i,j} = Σ_{n_j} k_{n_j}^T u_i.   (6)

The sum contains the scalar products of the rows of the stiffness matrix corresponding to the nodes n_j of the sensing electrode and the solution vector. In our recent work in ECT, and EIT, we have used the FEM discretization only, for reasons of speed, generality and the generic structure of the FEM system of equations that allows very efficient calculation of Jacobians and local updates [39, 54]. The permittivity ε(r) defined by parameters θ is first mapped to FEM elements, and the solution is then calculated.

3.4.3. Discretization error. In BEM or FEM formulations, the number of elements used is a compromise between numerical accuracy and computational effort. However, good imaging results require that the discretization be made sufficiently fine so that errors introduced through discretization are smaller than measurement errors. The mesh depicted in figure 9 was designed as the coarsest mesh meeting this requirement. Failing to achieve that, and not including discretization errors correctly, leads to substantially increased errors in the recovered permittivity. This important result is demonstrated explicitly in [18, 55]. Coarse numerical discretization has been used, either in conjunction with accurate solvers or by including discretization error, to speed up sample-based inference in ECT. Schemes for achieving this are covered in section 6.3.2.

3.4.4. Factorizations and derivatives. When efficient solution of systems (4) and (5) is performed by first factorizing matrices, it is feasible to directly maintain the QR factorization in BEM, and the Cholesky factorization in FEM [56], for potential gain in computational efficiency. Both these schemes have been applied to EIT, as has direct updating of solutions using the Woodbury formula [39], though the latter is numerically unstable in the long term. Both FEM and BEM formulations also allow efficient calculation of derivatives. In a FEM formulation, operation by the Jacobian from finite element coefficients to solutions may be performed using only solutions at the current state [13], with gradients with respect to the parameter vector calculated using the chain rule. In BEM formulations, expressions for the Fréchet derivative of the forward map allow evaluation of the gradient with respect to the boundary by solving a non-homogeneous equation with the current BEM matrices [57].
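As a concrete illustration of the data-simulation loop of equations (5) and (6), the sketch below simulates one measurement frame with a sparse direct solver. Here apply_dirichlet is a hypothetical helper (not from the paper) standing in for whatever routine builds K_i and f_i from the unmodified stiffness matrix.

```python
import numpy as np
import scipy.sparse.linalg as spla

def simulate_frame(K, apply_dirichlet, electrode_nodes, v0=1.0):
    """Simulate charges q[i, j] for a full frame, cf. equations (5), (6).

    K               : global FEM stiffness matrix (scipy.sparse, CSR),
                      free of boundary conditions
    apply_dirichlet : hypothetical helper returning (K_i, f_i) with the
                      Dirichlet rows for electrode i (at v0) and the
                      grounded electrodes/shield eliminated
    electrode_nodes : list of node-index arrays, one per electrode
    """
    n_e = len(electrode_nodes)
    q = np.zeros((n_e, n_e))
    for i in range(n_e):
        K_i, f_i = apply_dirichlet(K, i, v0)
        u = spla.spsolve(K_i.tocsc(), f_i)      # sparse direct solve
        for j in range(n_e):
            if j != i:
                # equation (6): rows of the unmodified stiffness matrix
                # at the nodes of sensing electrode j, dotted with u
                q[i, j] = K[electrode_nodes[j], :].dot(u).sum()
    return q
```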
4. Formulation of ECT as Bayesian inference

Application of Bayesian inference to ECT provides a framework for quantified model fitting, by explicitly forming the conditional distribution over parameters defining the permittivity and unobserved data, given observed data. The distribution over parameters quantifies the degree to which measurements combined with other knowledge determine the unknown permittivity, or properties of the permittivity. The uncertainty over permittivities arises from several main sources: measurement noise, measurement sensitivities that are small compared to the noise, model inaccuracy, discretization error and a priori uncertainty in the true parameters.

Defining a distribution over allowable permittivities includes the formulation used in deterministic approaches of seeking a single solution, when the distribution is highly peaked around a single parameter value. It also allows for the more general circumstance of multi-modal densities corresponding to multiple solutions being consistent with the data. However, the real power for high-dimensional inverse problems is that robust estimates may be calculated over the density, whereas modes of the density corresponding to deterministic solutions can give results that are highly sensitive to the particular realization of noise in the measured data [12, 58].

4.1. Likelihood function

The unavoidable presence of measurement noise means that the measurement process is probabilistic, as we saw in section 2.3. The inverse problem is then naturally a problem of statistical inference. In the following we outline the inferential formulation in a general setting and relate it to the reference problem in ECT.

For the sake of definiteness, consider the case of additive noise n with probability density function π_n(n). In most cases the measurement noise has a multivariate Gaussian or a Gibbs distribution [55], though it is important to note that any noise process may be treated. Then the measurement process can be written as

d = A(θ) + n,   (7)

where A(θ) denotes the forward map describing the mapping from the permittivity defined by θ to noise-free measurements, and n is a realization from the noise process. Equation (7) represents a decomposition of measurements into deterministic (A(θ)) and random (n) parts. For the instrumentation described in section 2 the decomposition was relatively clear, largely because of careful construction of instrumentation and repeated improvement to remove stray effects. In general, however, the decomposition is somewhat arbitrary since it is possible to describe any effect as random. The random part has a minimal component consisting of thermal and shot noise in the electronics, and digitization errors, and often includes external interference, though the latter can be modelled as deterministic through the use of 'nuisance variables'. Our experience is that the quality of imaging, and inference, is always improved by putting as much as possible into the deterministic part, though the complexity of physical modelling and computation often sets a practical limit.

The conditional probability density for measuring d given that θ is the true parameter then follows from equation (7), and is

π(d|θ) = π_n(d − A(θ)),   (8)

since the change of variables has determinant 1. Making a set of measurements corresponds to drawing a sample d from π(d|θ), which is a probability distribution parametrized by the unknowns θ via the forward map A. As a function of θ, π(d|θ) is not a probability density function; it is usually written l(θ|d) and referred to as the likelihood function. The likelihood principle is the formal statement that all information that the data d contain about the unknown parameters θ is encoded in the likelihood function, for fixed d. The form of the likelihood function we use in the reference problem is given in equation (1), accounting for errors in measured displacement charges. In practice the voltages asserted at electrodes are also imprecisely known and the nominal values should be considered as part of the measurement set. A framework for that more complete analysis is given in [39], which augments rather than changes our development here.
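A sketch of the residual examination recommended earlier: compare d − A(θ) for a known or estimated θ against the additive Gaussian model, in the spirit of the quantile plots of figures 4 and 6. The diagnostics chosen here are our own minimal example.

```python
import numpy as np
from scipy import stats

def residual_check(d, A_theta):
    """Examine residuals d - A(theta) against the additive Gaussian
    noise model of equation (7); A_theta is simulated data for a known
    or estimated permittivity."""
    r = d - A_theta
    print(f"residual mean {r.mean():.3e}, std {r.std(ddof=1):.3e}")
    # normal quantile-quantile check, cf. figures 4 and 6
    (osm, osr), (slope, intercept, corr) = stats.probplot(r, dist="norm")
    print(f"QQ correlation {corr:.4f} (close to 1 => plausibly Gaussian)")
```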
4.2. Bayesian inference

Statistical inference aims at recovering the parameters θ and assessing the uncertainty about these parameters, based on all available knowledge of the measurement process and the measurement noise, as well as information about the unknowns prior to the measurement. In the Bayesian formulation, inference about θ is based on the posterior density

π(θ|d) = l(θ|d) π(θ) / π(d),   (9)

where π(θ) denotes the prior density, expressing the information about θ prior to the measurement of d. The topic of prior distributions is covered in section 5. The posterior density π(θ|d) denotes the probability density over θ given the prior information and the measurements. The denominator π(d) = ∫ l(θ|d) π(θ) dθ is a finite normalizing constant, since the integral of the posterior probability density function over all possible causes must be equal to one. In the case of a fixed forward map, such as in ECT, this normalizing constant does not need to be calculated explicitly and we can work with the unnormalized posterior distribution determined by the likelihood function and the prior density function.

In this inferential framework, the solution to the inverse problem corresponds to providing statistics that summarize the posterior distribution. How to summarize the posterior depends greatly on the application under consideration. For example, a posterior distribution peaked around a single value is well summarized by giving just that value and a measure of width, corresponding to a well-defined inverse image with uncertainty bounds. Bimodal distributions need at least two values reported, and so on. Summary statistics of some function f(θ) may be calculated as expectations over the posterior

E[f(·)] = ∫ f(θ) π(θ|d) dθ,   (10)

with common statistics being the mean E[θ] and the variance E[(θ − E[θ])²] of θ. Since the parameter space is usually of high dimension, the integrals required cannot be performed analytically, or using deterministic numerical methods such as Gaussian quadrature. Fortunately, Monte Carlo approximations can be evaluated with tractable computation, as discussed in section 6.

4.3. Sensor calibration

Our inferential formulation for the reference problem is actually a little more complicated than the general procedure given above, because of the sensor calibration step. The permittivity ε is decomposed (in this section only) into unknown and fixed parts, with separate domains. We write ε_ext = ε|_{Ω_ext} and ε_int = ε|_{Ω_int}, where Ω_int is the interior of the pipe, which in the application contains material of unknown permittivity, and Ω_ext is the pipe and exterior region in which the electrodes are fixed. Calibration consists of estimating the exterior permittivity ε_ext, parametrized by θ_ext containing a few permittivity values as well as a few parameters describing possible deviations from ideal geometry. Repeated measurements are made with simple known interior permittivities parametrized by the (known) parameter θ_int, allowing the simple best-fit estimate

θ̂_ext = arg max_{θ_ext} π(d|θ_ext, θ_int).   (11)

We then fix, i.e. condition on, this estimate to give the likelihood function for inference about the unknown θ_int.

4.4. A short reading list in Bayesian inference

Our treatment here of the Bayesian formulation is necessarily cursory, and light on technical details. For more details of Bayesian formulations of inferential problems in general (not necessarily for inverse problems or ECT) we recommend [59]. Computing expectations over complex densities in parameter space necessarily uses sampling algorithms. A practical introduction is given in the first few chapters of [60]. More technical results regarding convergence of MCMC methods can be found in [61]. A kit-bag of useful ideas for speeding up MCMC can be found in [62].
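Assembling the pieces of this section: the samplers of section 6 only ever need the unnormalized log-posterior. A sketch, with forward_map and log_prior standing in for whatever simulator (section 3) and prior model (section 5) the application chooses:

```python
import numpy as np

def log_posterior(theta, d, forward_map, log_prior, sigma=0.07e-6):
    """Unnormalized log pi(theta | d) = log l(theta | d) + log pi(theta),
    cf. equation (9), using the Gaussian likelihood of equation (1)."""
    r = forward_map(theta) - d
    return -0.5 * np.dot(r, r) / sigma**2 + log_prior(theta)
```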
5. Representation of unknowns and the prior distribution

Since the primary unknown, i.e. the permittivity, is a spatially varying function, recent developments in spatial statistics [63] and pattern theory [64] are directly applicable. They provide means of stating loose, generic and specific information about the unknown permittivity, as befits the application. A good overview of image analysis from the statistician's viewpoint is [19].

The central components of all image models are the representation of the unknown image, i.e. the parameters or coordinates used to define an image, and a normalizable prior density over the space of representations. Representation and knowledge are inextricably linked, and so the reason for choosing a particular representation should be largely determined by the type of knowledge one wants to express or calculate. For example, in the reference ECT problem we know that the permittivity is two-valued (background and inclusion) and our primary interest is the area of the inclusion. A polygonal representation of the inclusion, with background and inclusion permittivity values, is a suitable representation since it automatically states the physical knowledge and allows straightforward calculation of the area. In contrast, a grey-scale pixel image would require further constraints to state the prior knowledge, while the calculation of inclusion area is a non-trivial task requiring identification of the inclusion boundary amongst other steps. In many applications, the use of representations that provide quick access to properties of interest can provide substantial efficiencies, since then image restoration and image analysis are not separate tasks.

It is useful to classify representations, and priors, as low-level, mid-level and high-level. Low-level representations are local and generic, and usually very high-dimensional, such as grey-scale pixel images, or the vector of element coefficients in a FEM discretization. These representations can be used for any image, but are inconvenient for stating or calculating anything other than local structural information. Mid-level models are also generic, but provide convenient ways of expressing quantities of interest such as geometric features of objects, or relations between objects. An example is the polygonal representation used in the reference problem. High-level models capture important, possibly complex, features of the images and are useful for answering global questions about the image, such as counting the number of objects of a given type.

The formulation of a statistically sensible prior distribution over the space of representations is a major practical difference between regularization and Bayesian inferential methods. Consistency of the statistical formulation and guaranteed convergence of sampling algorithms both require that the prior density be normalizable. We find that the requirement of specifying a parameter space with finite volume has the added benefit of forcing us to be explicit when modelling the image. In the Bayesian framework it is typical to test modelling assumptions by drawing several samples from the prior distribution and ensuring that they look reasonable. In contrast, typical regularization functionals would fail these requirements. The role of the prior distribution is typically different for low-, mid- and high-level representations, as we will see in the following review of representations and priors for ECT.

5.1. Low-level models

Low-level representations use grey-scale values over a pixel (voxel) lattice, or a fixed FEM discretization, and can be used for arbitrary permittivity distributions. Grey values may be restricted to two values (black/white) [13, 53, 65], or to a finite set of allowable values [25], or more typically to any positive value [26, 44]. Low-level prior distributions usually have the role familiar in regularization of preferring smoothness, sometimes modified to implicitly allow non-smooth behaviour such as edge processes. Good overviews of low-level prior modelling in EIT are given by Kaipio et al [26, 55] and Siltanen et al [66], and for single-photon emission computed tomography by Aykroyd et al [67]. The following gives a brief discussion of low-level prior models used for ECT (a sketch evaluating two of them follows this list):

• Gaussian priors. The Gaussian white noise prior, in equation (12), is the most widely used prior model, since the diagonal covariance matrix generalizes standard Tikhonov regularization:

π(θ) ∝ exp(−(1/(2σ²)) ‖θ − θ̄‖²).   (12)
The variance σ² describes the variability of the unknown parameters θ around the assumed mean value θ̄.

• Markov random field (MRF) priors. Smoothness priors are special cases of MRFs, in which the conditional density over a pixel (element) depends on the remaining parameters only through its neighbours. A typical MRF prior is the total variation prior [55] given by

TV(θ) = Σ_{i=1}^{M} Σ_{j∈N_i} l_{ij} |θ_i − θ_j|,   (13)

where N_i ⊂ {1, 2, ..., M} is the set of possible neighbours of θ_i with i ∉ N_i, and i, j are neighbours with a common edge of length l_{ij}. Common neighbourhood structures are induced by the pixel lattice or FEM mesh. The prior probability density is then π(θ) ∝ exp{−β TV(θ)}, where β is a smoothing parameter. We note that π(θ) is a Gibbs distribution [23]. An application of a non-standard MRF prior can be found in the recovery of resistor values in an electrical network from electrical measurements collected at the boundary [13].

• Impulse noise priors. Such priors are typically applied to low-contrast problems where small regions in an otherwise uniform background have to be recovered (e.g. bright stars in a black sky). Representative priors are the maximum entropy prior [68] and the L1 prior [69].

• Sample-based priors. In some applications it is possible to define a representative ensemble of images that may occur, through a set of sample images that define an empirical prior density. This prior has been used in the context of pixel image models [55].
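In code, the two most common low-level priors above might be evaluated as follows. This is a sketch; edges and lengths encode whatever neighbourhood structure the mesh induces.

```python
import numpy as np

def log_gaussian_prior(theta, theta_bar, sigma):
    """Equation (12): white Gaussian prior about the mean theta_bar."""
    return -0.5 * np.sum((theta - theta_bar) ** 2) / sigma**2

def total_variation(theta, edges, lengths):
    """Equation (13): sum of l_ij * |theta_i - theta_j| over neighbours.
    edges is an (M, 2) integer array of neighbouring element indices,
    lengths the corresponding common edge lengths."""
    i, j = edges[:, 0], edges[:, 1]
    return np.sum(lengths * np.abs(theta[i] - theta[j]))

def log_tv_prior(theta, edges, lengths, beta):
    """log pi(theta) = -beta * TV(theta), a Gibbs distribution."""
    return -beta * total_variation(theta, edges, lengths)
```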
We briefly mention some other mid-level priors that have been used in ECT. (14) • Smoothness prior over star-shaped polygons. Aykroyd et al represented polygons in a star-shaped manner, parametrizing the centre of the star and radii r = (r1 , r2 , . . . , rn ) at m equi-spaced angles [48]. They specified a prior intended to give smoothness of the boundary, using ⎫ ⎧ ⎬ ⎨ 1 (ri − rj )2 , (16) π(r) ∝ exp − 2 ⎭ ⎩ 2ν where I is the indicator function for θ representing a feasible polygon. That is, the prior density is constant over allowable polygons. The change of variables relation for probability distributions shows that a uniform density in vertex position gives a density over area that scales as (area)−1/2 . Hence for large polygons, and inclusions, where most polygons are simple, this prior puts greater weight on small areas resulting in estimated areas that will always be smaller than the true area. The constraint that the polygon be simple complicates this picture for small area inclusions, since a greater proportion of small polygons are self-crossing. An empirical distribution over prior area, given by sampling from the prior, is shown in figure 10 for n = 8. The overall effect is that the area of large inclusions will be underestimated while the area of small inclusions will be overestimated, with the division between ‘small’ and ‘large’ depending on the number of vertexes n. This effect necessarily occurs in regularized or least-squares fitting of contour-based models, since effectively the constant prior model is used. Since we are primarily interested in area of inclusions, we remove this bias by explicitly specifying a prior in terms of area, given in equation (15) [79], though scaling based on the empirical distribution to give a prior that is non-informative with respect to area is also possible [52]. The circumference of the inclusion c(θ ) with respect to the circumference of a circle with an area equal to the area of the polygon (θ ) is calculated. The variance σpr2 is chosen to be small to penalize small and large areas c(θ ) 1 −1 π(θ ) ∝ exp − 2 I (θ ) . (15) √ 2σpr 2 (θ )π i∼j where i ∼ j indicates neighbouring radii. In the context of the reference problem, it is interesting to note that this prior will exhibit the bias in (large) area outlined above, as is evident from computed results. • Structural priors. Especially in medical imaging, geometry and position of structures are often known a priori. Hence, it is reasonable to include this knowledge in the form of a prior model [80, 81], or equivalently by fusing different sensing modalities. An example is the use of ultrasound tomography to recover boundaries for use in ECT [82]. Parameter spaces for mid-level models are not-usually linear spaces. Consequently, non-uniqueness or ill-posedness results for forward maps based on the theory of linear spaces do not apply. We have experience of industrial applications where an inverse problem that was under-determined for a low-level linear-space model became over-determined for a mid-level model and required reduction of the measurement set to avoid excessive computation. 5.3. High-level models High-level models incorporate structural information by modelling objects in the image. Representations are typically Redundancy in the polygonal representation can also lead to numerical inefficiency without contributing to quality of 12 Meas. Sci. Technol. 
6. Summarizing the posterior distribution

In the following we write $\pi(\theta)$ as an abbreviation for the posterior density $\pi(\theta|d)$. Exploration of the posterior distribution is performed using Markov chain Monte Carlo (MCMC) sampling, which generates a Markov chain with equilibrium distribution $\pi$ by simulating an appropriate transition kernel [83, 84]. The long-term output of an MCMC sampler is a sequence of states $\theta_i$ distributed according to $\pi$, and we write $\theta_i \sim \pi$. The empirical distribution defined this way can be used to summarize $\pi$, or in exploratory analyses the samples may simply be displayed to gain understanding of the nature of permittivities consistent with the data. In many applications a single sample from $\pi$ provides a better reconstruction than regularized inversion [58]. A few independent samples, say 2–4, can establish the scale and nature of ambiguity in the allowable permittivities (see e.g. [20]), while extensive sampling allows quantitative estimates of posterior variability in applications where that is needed. Computed results for the reference problem, presented in section 7, are designed to present the range of states in the posterior by summarizing posterior area and boundary processes.

6.1. Metropolis–Hastings algorithm

The Metropolis–Hastings (MH) algorithm generates a Markov chain with equilibrium distribution $\pi$ by simulating a suitable transition kernel [27]. It uses a proposal density $q(\theta, \theta')$ to suggest a new state $\theta'$ when at state $\theta$, i.e. a possible move $\theta \to \theta'$. The proposal is accepted or rejected according to a rule that ensures the desired ergodic behaviour. The choice of the proposal density is largely arbitrary, with convergence guaranteed when the resulting chain is irreducible and aperiodic. However, the choice of proposal distribution critically affects the efficiency of the resulting sampler, with the design of a good proposal being something of an art. The standard MH formalism has been extended to deal with transitions between state spaces of differing dimension [83], allowing insertion and deletion of parameters [72, 85, 86]. Even though we do not use variable-dimension states in the reference example, we prefer this 'reversible jump', or Metropolis–Hastings–Green (MHG), formalism as it greatly simplifies the calculation of acceptance probabilities for the subspace moves that we employ.

The reversible jump formalism considers the composite parameter $(\theta, \gamma)$, where $\theta$ is the usual state vector and $\gamma$ is the vector of random numbers used to compute the proposal $\theta'$. Similarly, $(\theta', \gamma')$ is the composite parameter for the reverse proposal. The MCMC sampling algorithm with MH dynamics can then be written as follows. Let the chain be in state $\theta_n = \theta$; then $\theta_{n+1}$ is determined in the following way:

• Propose a new candidate state $\theta'$ from $\theta$ with some proposal density $q(\theta, \theta')$.
• Calculate the MH acceptance ratio

$\alpha(\theta, \theta') = \min\left\{1, \frac{\pi(\theta'|d)\,q(\gamma')}{\pi(\theta|d)\,q(\gamma)} \left|\frac{\partial(\theta', \gamma')}{\partial(\theta, \gamma)}\right|\right\}.$  (17)

• Set $\theta_{n+1} = \theta'$ with probability $\alpha(\theta, \theta')$, i.e. accept the proposed state; otherwise set $\theta_{n+1} = \theta$, i.e. reject.
• Repeat.

The last factor in equation (17) denotes the Jacobian determinant of the transformation from $(\theta, \gamma)$ to $(\theta', \gamma')$.
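As a concrete reference, here is a minimal Python skeleton of this accept/reject loop (our sketch, not code from the paper); `log_post`, `propose` and `log_q_jac` are placeholders for the log posterior, the proposal mechanism and the log of the proposal-ratio and Jacobian factor appearing in equation (17).

```python
# Minimal Metropolis-Hastings(-Green) skeleton for equation (17).
# All three callables are assumptions the user must supply.
import math, random

def mh_chain(theta0, log_post, propose, log_q_jac, n_steps):
    theta, lp = theta0, log_post(theta0)
    samples = []
    for _ in range(n_steps):
        theta_p, gamma = propose(theta)       # candidate drawn via q(theta, .)
        lp_p = log_post(theta_p)
        # log alpha = log pi(theta'|d) - log pi(theta|d)
        #           + log [ q(gamma')/q(gamma) * |Jacobian| ]
        log_alpha = lp_p - lp + log_q_jac(theta, theta_p, gamma)
        if math.log(random.random()) < min(0.0, log_alpha):
            theta, lp = theta_p, lp_p         # accept
        samples.append(theta)                 # on rejection the state repeats
    return samples
```

For the symmetric fixed-dimension moves used in the reference problem, `log_q_jac` returns 0 for all moves except scaling, as discussed in section 7.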
6.2. Monte Carlo integration

Quantitative estimates from the posterior distribution require computing the expectations in equation (10). Given samples $\{\theta_i\}_{i=1,\ldots,N}$ from $\pi$, the required integral may be computed using the Monte Carlo approximation

$\int f(\theta)\,\pi(\theta)\,d\theta \approx \frac{1}{N}\sum_{i=1}^{N} f(\theta_i).$  (18)

By the law of large numbers, equation (18) holds to any desired accuracy for sufficiently large N. In addition, it follows from the central limit theorem that the approximation error is independent of the dimensionality of the state space, and hence MCMC methods are suitable for high-dimensional problems. The variance of the approximation error is given by

$\mathrm{Var}\left(\frac{1}{N}\sum_{i=1}^{N} f(\theta_i) - \int f(\theta)\,\pi(\theta)\,d\theta\right) \approx \frac{\mathrm{Var}(f)}{N}\left(1 + 2\sum_{j=1}^{N}\left(1 - \frac{j}{N}\right)\hat\rho_j\right),$  (19)

with $\mathrm{Var}(f) = E[f(\theta)^2] - \mu_f^2$. The factor $\hat\rho_j = \hat C_j/\hat C_0$ is the normalized autocovariance function (ACF) and $\hat C_j$ is the ACF at lag j, i.e. $\hat C_j$ is the covariance between the values taken by f at two states of the chain $\theta_i$ and $\theta_{i+j}$. Consequently, the less correlated consecutive states of the chain are, the more accurate the estimates. When $\{\theta_i\}_{i=1}^{N}$ are independent, i.e. uncorrelated, samples, the estimator for the mean of f is $\bar f_N = \frac{1}{N}\sum_{i=1}^{N} f(\theta_i)$ and the variance of the estimator is

$\mathrm{Var}(\bar f_N) = \frac{\mathrm{Var}(f)}{N}.$  (20)

However, the sequence $\{\theta_i\}_{i=1}^{N}$ produced by MCMC is almost always a sequence of correlated samples. The rate $\tau_f/N$ at which the variance $\mathrm{Var}(\bar f_N)$ decreases in equation (19) is called the statistical efficiency, and by analogy with equation (20) we write

$\mathrm{Var}(\bar f_N) = \tau_f\,\frac{\mathrm{Var}(f)}{N}.$  (21)

The quantity $\tau_f$ is called the integrated autocorrelation time (IACT) and can be interpreted as the number of correlated samples with the same variance-reducing power as one independent sample. For a given posterior distribution the Markov chain should be designed so that $\tau_f$ is as small as possible, so that a small standard deviation $\sigma_{\bar f_N} = \sqrt{\mathrm{Var}(\bar f_N)}$ of the estimate $\bar f_N$ is achieved with minimal sample size N.
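The IACT values reported for the reference problem in section 7 can be estimated directly from the chain output. The following sketch is ours; truncating the ACF sum at the first negative lag is one common convention among several.

```python
# Estimate the normalized ACF and the IACT tau_f from a scalar chain f(theta_i),
# following equations (19)-(21).
import numpy as np

def iact(f_chain):
    f = np.asarray(f_chain, dtype=float) - np.mean(f_chain)
    n = len(f)
    acov = np.correlate(f, f, mode="full")[n - 1:] / n   # hat C_j for j = 0..n-1
    rho = acov / acov[0]                                  # hat rho_j
    tau = 1.0
    for r in rho[1:]:
        if r < 0:                 # truncate once noise dominates the estimate
            break
        tau += 2.0 * r
    return tau

# Var(f_bar) ~ tau_f * Var(f) / N, so N / tau_f plays the role of an
# effective number of independent samples.
```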
6.3. Acceleration schemes for Metropolis–Hastings MCMC

A major drawback of MCMC sampling is its computational expense, since the forward problem has to be solved hundreds of thousands of times to explore the posterior distribution. Hence much research effort has gone into finding ways of accelerating the basic MH algorithm. We review several such schemes, dwelling on those that have found application in impedance tomography.

6.3.1. Simulated tempering. Consider the case where the MH algorithm is used to sample from the posterior distribution $\pi(\cdot)$ using proposal distribution $q(\theta, \theta')$, and it is found that the resulting chain is evolving slowly, or worse still is getting stuck. This can happen because of multi-modality of $\pi(\cdot)$, or because of strong correlations, as is typical in inverse problems where the support of $\pi(\cdot)$ can effectively be a low-dimensional subspace of the state space. Simulated tempering (with the name and idea adapted from simulated annealing for optimization) is a general method that can overcome some of these difficulties while using the existing proposal distribution. The method augments the state space to $\Theta \times \{0, 1, \ldots, N\}$ and defines a set of distributions $\{\pi_k(\cdot)\}_{k=0}^{N}$, where $\pi_0 = \pi$ and $\pi_1(\cdot), \pi_2(\cdot), \ldots, \pi_N(\cdot)$ are a sequence of distributions that are increasingly easy to sample from. The distribution over the augmented space is taken as

$\pi(x, k) = \lambda_k \pi_k(x),$  (22)

where $\lambda_0, \lambda_1, \ldots, \lambda_N$ are pseudo-prior constants with $\sum_{k=0}^{N} \lambda_k = 1$. Transitions for a fixed k are derived from the proposal $q(\theta, \theta')$ and are interspersed with proposals that change k (perhaps by a random walk in k), with both accepted/rejected by a standard Metropolis–Hastings algorithm. The random walk then occurs in $(\theta, k)$ space. Samples that have k = 0, i.e. from the conditional density $\pi(\theta, k|k = 0)$, are samples from the desired distribution. A simple example of such a sequence is the scheme due to Marinari and Parisi [87], who introduced simulated tempering. Define the positive numbers (inverse temperatures) $1 = \beta_0 < \beta_1 < \cdots < \beta_N$. The sequence of distributions is then given by $\pi_k(\theta) = \lambda_k \pi^{\beta_k}(\theta)$, which are increasingly unimodal.

Parallel tempering is similar to simulated tempering except that the N chains (one for each value of k) are maintained simultaneously. An example is the Metropolis-coupled MCMC in [84], which simultaneously runs chains with the spatial parameters increasingly coarsened, defining a sequence of distributions as above. The opposite regime, of increasing temperature, has found greater success in impedance tomography applications, where high-accuracy data leads to posterior distributions that are too narrow to sample easily [70].

6.3.2. Using approximations to the forward map. As a means of model reduction (and of counteracting inverse crimes) Kaipio and Somersalo [55] introduced the 'enhanced error model' to correct for discretization errors introduced by coarse numerical approximations. For the case of Gaussian prior and noise distributions, they considered the accurate model $d = A\theta + n$ and the coarse approximation $d = \tilde A\tilde\theta + \tilde n$, where $\tilde\theta = P\theta$ is a coarse approximation to the unknowns resulting from a projection by P, and $\tilde A$ is the (cheap) approximation to A on the coarse variables. Then

$\tilde n = (A - \tilde A P)\theta + n$  (23)

defines the enhanced error model, by assuming that the two terms on the right-hand side are uncorrelated. Use of the coarse approximation necessarily increases the uncertainty of recovered values, since discretization error has been introduced. However, Kaipio and Somersalo [55] give examples in which a tolerably small increase in posterior uncertainty is traded for a huge reduction in compute time without introducing bias in estimates, and demonstrate that accurate real-time inversion is possible.

A second use of approximations was introduced by Christen and Fox [13], who considered a state-dependent approximation $\pi_\theta^*(\cdot)$ to the posterior distribution, calculated using a cheap approximation to the forward map, to give a modified Metropolis–Hastings MCMC. Once a proposal is generated from the proposal distribution $q(\theta, \theta')$, to avoid calculating $\pi(\theta')$ for proposals that are rejected, they first evaluate the proposal using the approximation $\pi_\theta^*(\theta')$ to create a second proposal distribution $q^*(\theta, \theta')$ that is then used in a standard Metropolis–Hastings algorithm. Christen and Fox [13] present an example using a local linearization, and demonstrate an order of magnitude speedup for a problem in electrical impedance imaging. Lee used a coarse BEM approximation to speed up inverse obstacle scattering [57], while the coarsened solutions available in a multilevel (multigrid) solver were used for a problem equivalent to ECT in [53].
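A sketch of this two-stage ('delayed acceptance') step, in the symmetric-proposal case, is given below (our paraphrase of the construction in [13], not the authors' code); `log_post` is the expensive exact log posterior and `log_post_approx` the cheap approximation:

```python
# One delayed-acceptance MH step. Proposals failing the cheap first-stage test
# are rejected without an exact forward solve; the second stage corrects the
# approximation so the exact posterior remains the equilibrium distribution.
import math, random

def da_step(theta, lp, lps, propose, log_post, log_post_approx):
    theta_p = propose(theta)                  # symmetric proposal assumed
    lps_p = log_post_approx(theta_p)
    # Stage 1: screen with the approximation only.
    if math.log(random.random()) >= min(0.0, lps_p - lps):
        return theta, lp, lps                 # early rejection, no exact solve
    # Stage 2: exact evaluation with a correction factor.
    lp_p = log_post(theta_p)
    log_alpha = (lp_p - lp) + (lps - lps_p)
    if math.log(random.random()) < min(0.0, log_alpha):
        return theta_p, lp_p, lps_p           # accept
    return theta, lp, lps                     # reject
```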
6.4. Summary statistics

The posterior distribution is typically defined over a high-dimensional parameter space, so direct visualization is not possible. However, in the Bayesian framework we are able to calculate summary statistics that quantify and examine the feasible solutions to the inverse problem. Histograms of properties, posterior variability of parameters, and expectations can easily be derived from the posterior distribution. Scatter plots allow an illustrative and meaningful depiction of the range of feasible solutions (see section 7).

More generic summary statistics of a posterior distribution are the point estimates (or modes) and interval estimates, computed via numerical optimization. The most common point estimate is the maximum a posteriori (MAP) estimator, given by

$\theta_{MAP} = \arg\max_\theta \pi(\theta|d),$  (24)

since this mode of the posterior distribution generalizes regularized inversion. For example, when the noise is additive zero-mean Gaussian with variance $\sigma^2$, and we use a simple Gaussian prior $\pi(\theta)$, the MAP estimate reduces to the classical deterministic setting known as Tikhonov regularization [12].
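The equivalence is easy to verify numerically. The following small example is ours, with an arbitrary random linear map standing in for a linearized forward problem; it computes the MAP estimate in closed form as a Tikhonov-regularized least-squares solution.

```python
# MAP estimate for a linear forward map A, Gaussian noise N(0, sigma^2 I) and
# Gaussian prior N(0, delta^-2 I): the normal equations of Tikhonov regularization.
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(20, 10))                 # toy linear forward map
theta_true = rng.normal(size=10)
sigma, delta = 0.05, 1.0
d = A @ theta_true + sigma * rng.normal(size=20)

# argmax pi(theta|d) = argmin ||A theta - d||^2 / (2 sigma^2) + delta^2 ||theta||^2 / 2
theta_map = np.linalg.solve(A.T @ A / sigma**2 + delta**2 * np.eye(10),
                            A.T @ d / sigma**2)
print("MAP reconstruction error:", np.linalg.norm(theta_map - theta_true))
```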
In the case of multi-modal posterior distributions, or if the mode of the posterior distribution lies far from the bulk of the posterior distribution, the MAP estimate provides an unsatisfactory summary of feasible parameter values, and can be very unrepresentative of the support of the posterior distribution. The estimate given by the mode of the likelihood function, often erroneously described as the estimate corresponding to the set of parameters most likely to have generated the measured data, is called the maximum likelihood (ML) estimate and is defined by

$\theta_{ML} = \arg\max_\theta l(\theta|d).$  (25)

This estimator is equivalent to solving the inverse problem without taking regularization into account [55]. Hence, in ill-posed inverse problems such as ECT the ML estimate is seldom useful. A more robust point estimate is the conditional mean (CM) of the parameters conditioned on the measured data d,

$\theta_{CM} = E[\theta|d] = \int \theta\,\pi(\theta|d)\,d\theta.$  (26)

A common interval estimate is the conditional covariance (CC) estimate given by

$\mathrm{Cov}(\theta|d) = \int (\theta - \theta_{CM})(\theta - \theta_{CM})^T\,\pi(\theta|d)\,d\theta \in \mathbb{R}^{n\times n}.$  (27)

The price of robustness in the CM and CC estimates is an integration over the high-dimensional parameter space, requiring extensive MCMC sampling.

7. Numerical examples

The previous sections give a complete formulation of ECT in the Bayesian inferential framework. The uncertainties in measured data, i.e. in measured inter-electrode capacitances, and the range of feasible images are taken into account by statistically modelling the measurement process according to section 4. Our representation of the true parameters is the set of points $\{(x_i, y_i)\}$ giving the vertexes of a polygonal boundary of a material inclusion [31, 48, 88, 89]. This set, denoted $\theta \in \Theta$, defines a 'state' in our reconstruction algorithm. The forward map in section 3 relates the state $\theta$ to noise-free measurements. The omnipresence of measurement noise implies that in practice a range of data may be measured for a given state $\theta$. Let $\pi(d|\theta)$ denote the probability density function over allowable measurements d for a given true state $\theta$. Making a set of measurements corresponds to drawing a sample d from $\pi(d|\theta)$. Our objective now is to work out what we can say about the parameter $\theta$ given measurements d. Inference about $\theta$ is based on the posterior density $\pi(\theta|d)$, obtained by applying Bayes' theorem (9). The posterior density $\pi(\theta|d)$ gives the probability density over allowable states $\theta$ conditioned on measurements and prior information. Summarizing the posterior distribution corresponds to solving the inverse problem, since that gives knowledge of the allowable values of parameters, with uncertainties, etc.

In section 2.3 we derived the likelihood function $l(\theta|d)$ as a function of $\theta$ for the given ECT system. Note that $l(\theta|d)$ is generally not a probability function. Since in our ECT application the estimation of the process parameter material or void fraction is of primary interest, we have to specify an appropriate prior $\pi(\theta)$ that allows for an unbiased reconstruction of inclusion area. To avoid this bias we specify a prior density in terms of area directly, given in equation (15) in section 5 [79]: the circumference $c(\theta)$ of the inclusion is compared with the circumference of a circle whose area equals the area $A(\theta)$ of the polygon, and the variance $\sigma_{pr}^2$ is chosen to be small to penalize small and large areas. Based on the accurately modelled forward map, the specified prior distribution $\pi(\theta)$ for the shape and permittivity of the material inclusion, and the measurement noise model, we are able to give a posterior distribution $\pi(\theta|d)$ for the inclusion conditioned on measured data,

$\pi(\theta|d) \propto \pi(\theta)\,l(\theta|d) \propto \exp\left\{-\frac{1}{2\sigma_{pr}^2}\left(\frac{c(\theta)}{2\sqrt{\pi A(\theta)}} - 1\right)^2 - \frac{1}{2}(q_m - d)^T \Sigma^{-1} (q_m - d)\right\} I(\theta),$  (28)

where $q_m$ denotes the vector of simulated charges for state $\theta$ and $\Sigma$ the measurement noise covariance. It is this distribution we explore to learn about the unknown permittivity, by applying MCMC sampling with Metropolis–Hastings dynamics.
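For orientation, a Python sketch of this unnormalized log posterior follows (ours, not the authors' implementation). The BEM/FEM solver `forward_map` and the simplicity test `is_simple` are stand-ins for components described in sections 3 and 5, i.i.d. noise with the experimental value σ = 6.7 × 10⁻⁴ replaces a general Σ, and the value of σ_pr is a placeholder tuning parameter.

```python
# Unnormalized log posterior of equation (28) for a polygon state theta (n, 2).
import numpy as np

def is_simple(theta):
    return True   # placeholder; a real test checks for crossing edges

def area_circumference(theta):
    x, y = theta[:, 0], theta[:, 1]
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    edges = np.roll(theta, -1, axis=0) - theta
    circ = np.sum(np.hypot(edges[:, 0], edges[:, 1]))
    return area, circ

def log_posterior(theta, d, forward_map, sigma=6.7e-4, sigma_pr=0.1):
    if not is_simple(theta):
        return -np.inf                        # indicator I(theta)
    A, c = area_circumference(theta)
    log_prior = -0.5 * ((c / (2.0 * np.sqrt(np.pi * A)) - 1.0) / sigma_pr) ** 2
    r = forward_map(theta) - d                # q_m - d
    log_like = -0.5 * float(r @ r) / sigma ** 2
    return log_prior + log_like
```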
We use several types of update to propose a new state $\theta'$, usually referred to as 'moves' [79]. The choice and combination of such moves is a crucial issue in MCMC sampling, in order to ensure ergodic behaviour of the chain within useful time scales and to promote convergence. Multiple moves can be built into the MCMC sampler simply by defining separate reversible transition probabilities for each move [25, 83]. Let M be the number of moves, let $\{\mathrm{Pr}^{(i)}(X_{n+1} = \theta_{n+1} | X_n = \theta_n)\}_{i=1}^{M}$ represent a set of M transition probabilities which are reversible with respect to the posterior distribution, and let $\nu_i$, $i = 1, \ldots, M$, be the probability of choosing move i; the overall transition probability is then given by

$\mathrm{Pr}(X_{n+1} = \theta_{n+1} | X_n = \theta_n) = \sum_{i=1}^{M} \nu_i\,\mathrm{Pr}^{(i)}(X_{n+1} = \theta_{n+1} | X_n = \theta_n).$  (29)

At least one of the M moves has to be irreducible on the state space to ensure that the equilibrium distribution of the Markov chain is independent of the initial choice $\theta^{(0)}$ of the parameter vector [65]. We find that a combination of M = 4 moves gives a suitably efficient MCMC in this example [31]: translation, rotation and scaling of the polygon, and moving the position of one vertex of the polygon (figure 11).

Translation T: translate the polygon described by the parameter vector $\theta$ by a random step $\lambda_T \sim U(-\rho_T, \rho_T)$.

Vertex move V: shift one vertex of the polygon by a random step $\lambda_V \sim U(-\rho_V, \rho_V)$.

Scaling S: scale the entire polygon by a random multiplier $\lambda_S \sim U(1/\rho_S, \rho_S)$ with $\rho_S = 2$.

Rotation R: rotate the polygon by a random angle $\lambda_R \sim U(-\rho_R, \rho_R)$ about the centre c of the polygon,

$\theta' = c + \begin{pmatrix} \cos\lambda_R & \sin\lambda_R \\ -\sin\lambda_R & \cos\lambda_R \end{pmatrix} (\theta - c).$  (30)

Figure 11. Different moves to propose a new candidate $\theta'$. (a) Translation T, (b) vertex move V, (c) scaling S, (d) rotation R.

The Jacobian term in the MHG algorithm for the translation, rotation and vertex moves is 1. For the scaling move the Jacobian term is $\lambda^{-2n+1}$ [31]. The vertex move ensures irreducibility but, by itself, would lead to a very slow algorithm. The remaining moves are designed to give an efficient algorithm. A new candidate $\theta'$ is proposed from $\theta$ by randomly choosing one of these four moves and using a random step size $\lambda_i$ tuned for each move.
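A compact implementation of these four moves might look as follows (our sketch; the step sizes ρ are placeholders for the tuned values, the polygon centre is taken as the vertex mean, which is one common choice the paper does not specify, and the scaling correction uses the Jacobian factor $\lambda^{-2n+1}$ quoted above):

```python
# The four proposal moves of figure 11. Each returns the candidate polygon and
# the log correction term entering equation (17); it is 0 except for scaling.
import numpy as np

rng = np.random.default_rng(2)

def translate(theta, rho=0.005):
    return theta + rng.uniform(-rho, rho, size=2), 0.0

def vertex_move(theta, rho=0.005):
    out = theta.copy()
    out[rng.integers(len(theta))] += rng.uniform(-rho, rho, size=2)
    return out, 0.0

def rotate(theta, rho=0.2):
    a = rng.uniform(-rho, rho)
    R = np.array([[np.cos(a), np.sin(a)], [-np.sin(a), np.cos(a)]])
    c = theta.mean(axis=0)
    return c + (theta - c) @ R.T, 0.0

def scale(theta, rho=2.0):
    lam = rng.uniform(1.0 / rho, rho)
    c = theta.mean(axis=0)
    log_jac = (-2 * len(theta) + 1) * np.log(lam)   # lambda^(-2n+1), after [31]
    return c + lam * (theta - c), log_jac

moves = [translate, vertex_move, rotate, scale]   # chosen with equal probability
```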
In the following, differently shaped material inclusions are recovered from simulated and measured data using MCMC sampling. For all experiments, inclusions with different shapes and a permittivity of $\varepsilon_r = 3.5$ in an air-filled pipe ($\varepsilon_r = 1.0$) are considered. For the experiments using synthetic data, 2 000 000 samples were drawn from the posterior distribution, using a simulated data set corrupted by noise; the data were created using 10 000 boundary elements in the forward map. For the experiment using measured data, 1 000 000 samples were drawn from the posterior distribution. The noise standard deviation in both cases was $\sigma = 6.7 \times 10^{-4}$.

A burn-in period of 50 000 samples was found suitable, determined by testing the algorithm with different initial states and different ratios of the moves T, V, S and R. During the burn-in period, which strongly depends on the initial state of the Markov chain, the sampling distribution is not the equilibrium distribution. Once the chain is in equilibrium, only every 100th sample is stored. For the results presented, the four proposal moves were chosen with equal probability, giving an acceptance rate of about 2%. In the absence of a comprehensive test for Markov chain convergence, we analyse selected output statistics in terms of stationarity. Once the chain achieves stationarity, samples are assumed to come from the equilibrium distribution and are used for posterior inference.

7.1. Example 1—circular inclusion

Figure 12 illustrates the resultant posterior variability in inclusion shape as a scatter plot. This plot shows points taken randomly from each state, uniformly in boundary length, and gives a graphical display of the probability density that a boundary passes through any element of area. The points are clustered around the true state, shown by the dashed grey contour, indicating that the posterior has a well-defined mode close to the true value. Hence, for this case, point estimators calculated from the posterior, such as the MAP state and the CM state, will give similar results.

Figure 12. Scatter plots. (a) Entire domain with circular inclusion. (b) Detail plot. The dashed grey contour corresponds to the true shape.

Histograms of reconstructed inclusion area and circumference are depicted in figure 13. Sampled area as well as sampled circumference are scattered around their true values $A_{true} = 3.14 \times 10^{-4}$ m² and $c_{true} = 6.28 \times 10^{-2}$ m. The estimated parameters and their posterior variability are summarized in table 1.

Figure 13. Summary statistics. (a) Histogram of reconstructed sample areas. (b) Histogram of reconstructed sample circumferences.

Table 1. Posterior variability of the circular inclusion.

Quantity | True value | Mean | Standard deviation | IACT
x-coordinate of centre (m) | 0.00 | −2.09 × 10⁻⁴ | 2.46 × 10⁻⁴ | 5.75 × 10³
y-coordinate of centre (m) | 2.50 × 10⁻² | 2.52 × 10⁻² | 6.76 × 10⁻⁵ | 2.24 × 10³
Area A (m²) | 3.14 × 10⁻⁴ | 3.17 × 10⁻⁴ | 5.03 × 10⁻⁶ | 1.98 × 10²
Circumference c (m) | 6.28 × 10⁻² | 6.24 × 10⁻² | 6.95 × 10⁻⁵ | 3.09 × 10²
Log likelihood | – | −38.76 | 0.26 | 6.48 × 10²

The MCMC output traces of inclusion area and inclusion circumference are shown in the left column of figure 14, with the corresponding autocorrelation functions in the right column. The ACF discussed in section 6.2 provides a useful tool for investigating serial dependence in stationary time series data, as the presence of serial correlation is revealed by a slowly decaying ACF. The faster the autocorrelation function of a stationary time series decays to 0 with increasing lag, the less the correlation between consecutive states of the chain, with a consequent reduction of variance in estimates. In general, the autocorrelation function should, after falling off smoothly to zero, be distributed with some variation about the horizontal axis. The IACT is given in the right-most column of table 1, while the bottom row gives the mean of the log likelihood. The posterior variability, represented by the standard deviation of the parameters, is small, implying high reliability of the estimated parameters. Note that the estimated intervals contain the true values within one standard deviation.

Figure 14. MCMC output trace (left column) and autocorrelation (right column) of inclusion area (top) and inclusion circumference (bottom) in updates.

7.2. Example 2—elliptic and fancy-shaped contour

Figures 15(a) and (b) illustrate the resultant posterior variability in inclusion shape for more fancy shapes. Points in the scatter plot are clustered around the true contour, plotted in dashed grey. Due to the decreased sensitivity in the centre of the pipe, the margin of deviation of scattered points for the elliptic contour is greater towards the centre than in the region close to the electrodes.

Figure 15. Scatter plots. (a) Entire domain with elliptic-shaped inclusion. (b) Fancy-shaped contour.
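The scatter plots are produced by drawing points from each stored polygon uniformly in boundary length; a sketch of that step (ours) is:

```python
# Draw k points uniformly in arc length along a closed polygon boundary, so
# that overlaying points from many posterior samples visualizes the boundary
# density (figures 12 and 15).
import numpy as np

rng = np.random.default_rng(3)

def boundary_points(theta, k=10):
    edges = np.roll(theta, -1, axis=0) - theta
    lengths = np.hypot(edges[:, 0], edges[:, 1])
    cum = np.concatenate([[0.0], np.cumsum(lengths)])
    s = rng.uniform(0.0, cum[-1], size=k)            # arc-length positions
    idx = np.minimum(np.searchsorted(cum, s, side="right") - 1, len(theta) - 1)
    frac = (s - cum[idx]) / lengths[idx]
    return theta[idx] + frac[:, None] * edges[idx]

# scatter = np.vstack([boundary_points(t) for t in stored_samples])
```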
For the fancy contour in figure 15(b), deviations from the true contour in regions of low sensor sensitivity are clearly visible. Furthermore, there are significant outliers in the right part of the estimation result, despite this part of the contour being close to the boundary. The reason is that the distinct corner in the true boundary is not well modelled by our prior, which rejects candidate states with sharp angles between two boundary elements. Appropriately changing the prior would allow these features of the contour to be recovered. Note, however, that this mis-modelling in the prior is evident from the scatter plot in the region near the sharp corner, and indicates the significant benefit of the posterior error estimates available in a Bayesian analysis.

7.3. Example 3—circular contour (measured data)

Figure 16(a) shows the posterior variability in inclusion shape and position for measured data. Samples from the posterior distribution are consistent with the circular shape of the PVC rod. Due to the lack of a reference measurement system, the true shape is not depicted. However, knowing the geometry of the rod allows validation of the reconstruction results by comparing estimates with the true area and circumference. The centred grey circle-shaped contour represents the initial state of the Markov chain. MAP and CM estimates are presented in figure 16(b). In accordance with the first example (same shape and properties, but synthetic data), the MAP and CM estimates almost coincide, indicating that the posterior distribution has a well-defined single mode. Table 2 summarizes the inference: the mean, standard deviation and IACT are evaluated for the inclusion area A, circumference c and centre coordinates (x, y) of the circular inclusion.

Figure 16. Reconstruction results. (a) Scatter plot; randomly chosen points of the posterior distribution are plotted. (b) Detail plot of point estimates: MAP estimate (grey) and CM estimate (dashed black) calculated from the posterior distribution.

Table 2. Posterior variability of the circular inclusion from measured data.

Quantity | True value | Mean | Standard deviation | IACT
x-coordinate of centre (m) | – | 3.71 × 10⁻² | 2.32 × 10⁻⁵ | 5.89 × 10²
y-coordinate of centre (m) | – | −1.14 × 10⁻² | 3.02 × 10⁻⁵ | 4.65 × 10²
Area A (m²) | 3.14 × 10⁻⁴ | 3.13 × 10⁻⁴ | 6.88 × 10⁻⁶ | 1.10 × 10³
Circumference c (m) | 6.28 × 10⁻² | 6.24 × 10⁻² | 1.57 × 10⁻⁴ | 1.88 × 10³
Log likelihood | – | −46.10 | 1.72 × 10⁻¹ | 3.99 × 10²

8. Non-stationary ECT

In several industrial applications the physical quantities of interest are time dependent and, consequently, the measured data depend on these quantities at different time steps. On the other hand, it is often impossible to wait for all the data to be collected before giving a parameter estimate. Typical dynamic imaging examples are transient combustion processes, sedimentation in hydrocyclones, and mixing or flow processes. These challenging classes of problems are referred to as non-stationary inverse problems [55]. Bayesian recursive approaches provide a powerful framework for delivering continually updated parameter estimates as the data arrive.

The most popular method to extract a signal or parameters from a series of incomplete and noisy measurements is the Kalman filter (KF). The KF converts a Gaussian prior probability into a Gaussian posterior probability when the likelihood function is also Gaussian. The present state of knowledge of the unknown parameters at time instant k is completely characterized by a small set of sufficient statistics based on prior information and the measurement history. KFs provide unbiased estimates with minimum variance when both the state transition and the measurement process are linear functions, and the process and measurement noise are uncorrelated and Gaussian distributed with zero mean. KF-based state estimation can be extended to nonlinear state transitions and measurement models, leading to the extended Kalman filter (EKF), which linearizes about the most recent state estimate. Taking into account the evolution of the contour over time, the state-space representation of a contour can be defined by

$\theta_{k+1} = f_k(\theta_k, v_k),$  (31)

$d_k = h_k(\theta_k, n_k),$  (32)

where $f_k(\cdot)$ represents the state transition of the state $\theta$ from time k to time k + 1, subject to process noise modelled by $v_k$. A measurement based on the current state $\theta_k$, subject to measurement noise $n_k$, is modelled by $h_k(\cdot)$. The simplest dynamic model, which is widely used for ECT and EIT, is a constant shape with a random walk in position. Since the (E)KF relies on first-order derivatives, a linearization of the nonlinear measurement equation is required. Linearizing (32) about the latest predicted state $\theta_{k|k-1}$, we obtain

$d_k \approx h_k(\theta_{k|k-1}) + J_k(\theta_{k|k-1})\,\theta_k + \tilde n_k.$  (33)

The Jacobian $J_k(\theta_{k|k-1})$ is composed of the derivatives of the measured charges with respect to the state variables, e.g. the coefficients of a contour model. The Jacobian is usually calculated from the solution of the forward map using the adjoint variable approach [17, 91]. Higher order terms are treated as additional noise [90]. The stochastic part $\tilde n_k$ is composed of measurement and linearization error and is typically also assumed to be zero-mean Gaussian.
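A generic EKF update for this model, with the identity random-walk transition mentioned above, might be sketched as follows (ours; `h` and `jac_h` stand for the ECT forward map and its adjoint-computed Jacobian, and Q and Rn are the process and measurement noise covariances):

```python
# One predict/update cycle of an EKF for the state-space model (31)-(33).
import numpy as np

def ekf_step(theta, P, d, h, jac_h, Q, Rn):
    # Predict with a random-walk transition: f = identity, so theta_k|k-1 = theta.
    theta_pred, P_pred = theta, P + Q
    # Linearize the measurement equation about the predicted state, eq. (33).
    J = jac_h(theta_pred)
    S = J @ P_pred @ J.T + Rn                  # innovation covariance
    K = P_pred @ J.T @ np.linalg.inv(S)        # Kalman gain
    theta_new = theta_pred + K @ (d - h(theta_pred))
    P_new = (np.eye(len(theta)) - K @ J) @ P_pred
    return theta_new, P_new
```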
When considering inverse problems we have to face the problems of multiple maxima and the possibility of finding unrepresentative peaks in the posterior probability. By imposing constraints on the state vector and by specifying appropriate priors, the KF has been successfully applied to non-stationary electrical tomography. Vauhkonen et al, who introduced the KF to EIT, augment the state vector by artificial measurements (spatial regularization) in order to obtain robust estimates [80]. Based on this work, many approaches for two- and three-dimensional dynamic imaging and parameter tracking have been presented over the last decade, including extended, constrained and unscented Kalman filtering techniques (see, e.g. [29, 90, 91]) as well as fixed-lag and fixed-interval smoothing approaches [92]. More recently, focus has been put on the estimation of phase boundary parameters using different contour models (see, e.g. [28, 93]).

A less restrictive formulation of the Bayes principle, based on sequential Monte Carlo simulation and a numerical approximation of non-Gaussian state densities, is given by the particle filter (PF) [55, 94, 95]. In the literature, the PF is also known as the bootstrap filter, condensation tracking or sampling importance resampling. Whereas for the KF the state is modelled using a multivariate Gaussian distribution, the PF numerically approximates any potentially multi-modal and non-Gaussian distribution over the state vector, and calculation of the Jacobian is not necessary. The distribution is represented by a set of 'particles', or states. A set of N particles $\theta^{(m)}$ randomly chosen from the state space, together with their corresponding weights $\xi^{(m)}$, defines the empirical distribution

$f_\theta(\theta) \approx \{\theta^{(m)}, \xi^{(m)}\}_{m=1,\ldots,N}.$  (34)

Assuming that the underlying process is Markov, the state transition in equation (31) can be reformulated as the conditional density $\pi(\theta_k|\theta_{k-1})$. The PF keeps track of the current state estimate represented by $\pi(\theta_k|D_k)$, where $D_k = \{d_1, \ldots, d_k\}$ denotes the history of measurements acquired up to time step k. In the case of reconstructing material interfaces in ECT, the state of the model is a set of parameters that fully describes the contour at any instant (mid-level models).

Figure 17 shows a representative example of PF-based object tracking from measured electrical capacitance data. A phantom is moved from left to right with constant speed. Measurements are acquired at eight different positions spaced along the x-axis. In each position 10 measurement frames are collected and provided to the reconstruction algorithm. The filter uses 20 particles for sequential sampling. In an example discussed in [30], time-varying contours are used to describe interfaces between different material properties. A comparison of particle filtering with Kalman filtering, with application to ECT, is presented in [74]. An application of a PF for detecting an inclusion in ultrasound reflection tomography is presented in [82].

Figure 17. PF tracking result of a circular phantom with a diameter of 50 mm moved from left to right. The bold dashed black contours denote the sample mean at three different time instants while the grey-shaded contours represent the posterior variability of the involved particles associated with their respective weights.
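A minimal sampling importance resampling step consistent with this description is sketched below (ours; `transition` draws from $\pi(\theta_k|\theta_{k-1})$ and `log_like` evaluates the likelihood of the new frame, both user-supplied models):

```python
# One predict/weight/resample cycle of a bootstrap particle filter for
# equations (31)-(34).
import numpy as np

rng = np.random.default_rng(4)

def pf_step(particles, d, transition, log_like):
    # Predict: propagate each particle through the state transition density.
    particles = np.array([transition(p) for p in particles])
    # Weight by the likelihood of the new measurement frame.
    logw = np.array([log_like(p, d) for p in particles])
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # Resample (multinomial) to counteract weight degeneracy.
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]

# A point estimate at time k is e.g. the particle mean, particles.mean(axis=0).
```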
A good overview of non-stationary inverse problems in the framework of Bayesian recursive filters, including many illustrative examples, is presented in [55].

9. Discussion and conclusion

A key difference between regularized least squares and Bayesian methods is that whereas regularization gives point estimates, typically using a data-misfit criterion, Bayesian methods can provide averages over all solutions consistent with the data. This leads to an improvement in the robustness of estimates of properties of the unknown permittivity. The improvement is not surprising once it is realized that the single 'most likely' solution, found by a regularized minimization of misfit to the measured data, is typically unrepresentative of the bulk of feasible solutions in high-dimensional nonlinear problems. Inferential solutions to inverse problems provide other substantial advantages over deterministic methods, such as the ability to treat arbitrary forward maps and error distributions, and to use a wide range of representations of the unknown system, including parameter spaces that are discrete, discontinuous or even of variable dimension.

Markov chain Monte Carlo (MCMC) sampling has revolutionized computational Bayesian inference and is currently the best available technology for a comprehensive analysis of inverse problems, allowing quantitative estimates and exploration of high-dimensional posterior distributions without special mathematical structure. One complaint might be that inference appears 'too easy', since the ability to simulate measurements means that the posterior distribution can be sampled, effectively solving the inverse problem by giving access to summary statistics that characterize posterior variability. However, as we hope we have demonstrated throughout this paper, formulating the inverse problem in a Bayesian inferential framework requires accurate simulation of the measurement process by careful discretization of a forward map that is a validated model for measurements, a validated stochastic model for measurement noise, and a data-independent prior distribution that is (non)informative with respect to the primary quantity of interest.
These modelling requirements ensure that achieving quality results using Bayesian inference will never constitute a 'free lunch'.

A further impediment to the application of Bayesian analyses to practical inverse problems lies in the computational cost of MCMC sampling. In recent years several promising algorithms and advances have been suggested that give substantial speedup for computationally intensive problems, including capacitance tomography. There is some hope that eventually the computational cost of sampling will not be substantially greater than that of optimization.

We contrasted the modelling choices available against a reference problem to clarify the role of the alternatives available in structuring inferential solutions for other applications. There is now a wide range of well-developed tools for stochastic modelling and Bayesian inference for inverse problems. Notwithstanding the modelling difficulties alluded to above, applying those tools is now a well-developed procedure, and we anticipate that Bayesian inference for inverse problems will move from being the quality standard in solutions to the quality solution of standard choice.

References

[1] Scott D M and McCann H 2005 Process Imaging and Automatic Control (Boca Raton, FL: CRC Press)
[2] Beck M S, Dyakowski T and Williams R A 1998 Process tomography—the state of the art Trans. Inst. Meas. Control 20 163–77
[3] Holder D S 2005 Electrical Impedance Tomography: Methods, History and Applications (Series in Medical Physics and Biomedical Engineering) (Bristol: Institute of Physics Publishing)
[4] York T 2001 Status of electrical tomography in industrial applications J. Electron. Imaging 10 608–19
[5] Mohamed-Saleh J and Hoyle B S 2002 Determination of multi-component flow process parameters based on electrical capacitance tomography data using artificial neural networks Meas. Sci. Technol. 13 1815–21
[6] Tapp H S, Peyton A J, Kemsley E K and Wilson R H 2003 Chemical engineering applications of electrical process tomography Sensors Actuators B 92 17–24
[7] Gamio J C, Castro J, Rivera L, Alamillia J, Garcia-Nocetti F and Aguilar L 2005 Visualisation of gas-oil two-phase flow in pressurised pipes using electrical capacitance tomography Flow Meas. Instrum. 16 129–34
[8] Makkawi Y T and Wright P C 2004 Electrical capacitance tomography for conventional fluidized bed measurements—remarks on the measuring technique Powder Technol. 148 142–57
[9] Sanderson J and Rhodes M 2003 Hydrodynamic similarity of solids motion and mixing in bubbling fluidized beds AIChE J. 49 2317–27
[10] Waterfall R C 2000 Imaging combustion using capacitance tomography Advanced Sensors and Instrumentation Systems for Combustion Processes: IEE Seminar ed J Gardener (IEE Professional Group J1) pp 12/1–12/4
[11] Dyakowski T 2005 Application of electrical capacitance tomography for imaging industrial processes J. Zhejiang Univ. Sci. 12 1374–8
[12] Fox C and Nicholls G 2002 Statistical estimation of the parameters of a PDE Can. Appl. Math. Q. 10 277–306
[13] Christen J A and Fox C 2005 MCMC using an approximation J. Comput. Graph. Stat. 14 795–810
[14] Yang W Q and Peng L 2003 Image reconstruction algorithms for electrical capacitance tomography Meas. Sci. Technol. 14 R1–13
[15] Brandstätter B, Holler G and Watzenig D 2003 Reconstruction of inhomogeneities in fluids by means of capacitance tomography Int. J. Comput. Math. Electr. Electron. Eng. 22 508–19
[16] Soleimani M and Lionheart W R B 2005 Nonlinear image reconstruction for electrical capacitance tomography using experimental data Meas. Sci. Technol. 16 1987–96
[17] Kortschak B and Brandstätter B 2004 A FEM-BEM approach using level-sets in tomography Int. J. Comput. Math. Electr. Electron. Eng. 24 591–605
[18] Kaipio J and Somersalo E 2007 Statistical inverse problems: discretization, model reduction and inverse crimes J. Comput. Appl. Math. 198 493–504
[19] Hurn M A, Husby O and Rue H 2003 Advances in Bayesian image analysis Highly Structured Stochastic Systems ed P J Green, N Hjort and S Richardson (Oxford: Oxford University Press) pp 302–22
[20] McKeague I W, Nicholls G, Speer K and Herbei R 2005 Statistical inversion of South Atlantic circulation in an abyssal neutral density layer J. Mar. Res. 63 683–704
[21] Higdon D and Yamamoto S 2001 Estimation of the head sensitivity function in scanning magnetoresistance microscopy J. Am. Stat. Assoc. 96 785–93
[22] Jeffreys H 1931 Scientific Inference (Cambridge: Cambridge University Press)
[23] Geman S and Geman D 1984 Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images IEEE Trans. Pattern Anal. Mach. Intell. 6 721–41
[24] Grenander U and Miller M 1994 Representations of knowledge in complex systems J. R. Stat. Soc. Ser. B 56 549–603
[25] Nicholls G K and Fox C 1998 Prior modelling and posterior sampling in impedance imaging Proc. SPIE 3459 116–27
[26] Kaipio J P, Kolehmainen V, Somersalo E and Vauhkonen M 2000 Statistical inversion and Monte Carlo sampling methods in electrical impedance tomography Inverse Problems 16 1487–522
[27] Hastings W K 1970 Monte Carlo sampling methods using Markov chains and their applications Biometrika 57 97–109
[28] Tossavainen O P, Vauhkonen M and Kolehmainen V 2007 A three-dimensional shape estimation approach for tracking of phase interfaces in sedimentation processes using electrical impedance tomography Meas. Sci. Technol. 18 1413–24
[29] Soleimani M, Vauhkonen M, Yang W Q, Peyton A J, Kim B S and Ma X 2007 Dynamic imaging in electrical capacitance tomography and electromagnetic induction tomography using a Kalman filter Meas. Sci. Technol. 18 3287–94
[30] Watzenig D 2006 Recovery of inclusion shape by statistical inversion of non-stationary tomographic measurement data Int. J. Inform. Syst. Sci. 2 469–83
[31] Watzenig D 2006 Bayesian inference for process tomography from measured electrical capacitance data PhD Thesis Institute of Electrical Measurement and Measurement Signal Processing, Graz University of Technology
[32] Huang A M, Plaskowski A B, Xie C G and Beck M S 1988 Capacitance-based tomographic flow imaging system IEE Electron. Lett. 24 418–9
[33] Kortschak B, Wegleiter H and Brandstätter B 2007 Formulation of cost functionals for different measurement principles in nonlinear capacitance tomography Meas. Sci. Technol. 18 71–8
[34] Yang W Q 1996 Hardware design of electrical capacitance tomography systems Meas. Sci. Technol. 7 225–32
[35] Wegleiter H, Fuchs A, Holler G and Kortschak B 2008 Development of a displacement current based sensor for electrical capacitance tomography applications Flow Meas. Instrum. 19 241–50
[36] Yang W Q, Scott A L and Gamio J C 2003 Analysis of the effect of stray capacitance on an ac-based capacitance tomography sensor IEEE Trans. Instrum. Meas. 52 1674–81
[37] Alme K J and Mylvaganam S 2006 Electrical capacitance tomography—sensor models, design, simulations, and experimental verification IEEE Sensors J. 6 1256–66
[38] Wegleiter H, Fuchs A, Holler G and Kortschak B 2005 Analysis of hardware concepts for electrical capacitance tomography applications Proc. 4th IEEE Conf. on Sensors (Oct. 31–Nov. 3, Irvine, CA, USA) pp 688–91
[39] Fox C, Nicholls G and Palm M 2000 Efficient solution of boundary-value problems for image reconstruction via sampling J. Electron. Imaging 9 251–9
[40] Kaipio J P, Seppänen A, Somersalo E and Haario H 2004 Posterior covariance related optimal current patterns in electrical impedance tomography Inverse Problems 20 919–36
[41] Fang W and Cumberbatch E 2005 Matrix properties of data from electrical capacitance tomography J. Eng. Math. 51 127–46
[42] Fox C 1988 Conductance imaging PhD Thesis University of Cambridge
[43] Kolehmainen V, Lassas M and Ola P 2005 Inverse conductivity problem with an imperfectly known boundary SIAM J. Appl. Math. 66 365–83
[44] Nissinen A, Heikkinen L M and Kaipio J P 2008 The Bayesian approximation error approach for electrical impedance tomography—experimental results Meas. Sci. Technol. 19 015501
[45] Bayarri M J, Berger J O, Cafeo J, Garcia-Donato G, Liu F, Palomo J, Parthasarathy R J, Paulo R, Sacks J and Walsh D 2007 Computer model validation with functional output Ann. Stat. 35 1874–906
[46] Seagar A D 1983 Probing with low frequency electric currents PhD Thesis Electrical Engineering, University of Canterbury
[47] Kress R 1999 Linear Integral Equations 2nd edn (New York: Springer)
[48] Aykroyd R G and Cattle B A 2006 A flexible statistical and efficient computational approach to object location applied to electrical tomography Stat. Comput. 16 363–75
[49] Roy D, Nicholls G and Fox C 2008 Imaging convex quadrilateral inclusions in uniform conductors from electrical boundary measurements Stat. Comput. 19 17–26
[50] Wrobel L C 2002 The Boundary Element Method (Chichester: Wiley)
[51] Watzenig D, Steiner G, Fuchs A, Zangl H and Brandstätter B 2007 Influence of the discretization error on the reconstruction accuracy in electrical capacitance tomography Int. J. Comput. Math. Electr. Electron. Eng. 26 661–76
[52] Schwarzl C 2007 Robust parameter estimation in ECT using MCMC sampling Master Thesis Graz University of Technology
[53] Moulton J D, Fox C and Svyatskiy D 2008 Multilevel approximations in sample-based inversion from the Dirichlet-to-Neumann map J. Phys.: Conf. Ser. 124 012035
[54] Schwarzl C, Watzenig D and Fox C 2008 Estimation of contour parameter uncertainties in permittivity imaging using MCMC sampling Proc. 5th IEEE Sensor Array and Multichannel Signal Processing Workshop (21–23 July) pp 446–50
[55] Kaipio J P and Somersalo E 2004 Statistical and Computational Inverse Problems (New York: Springer)
[56] Golub G H and Van Loan C F 1993 Matrix Computations 2nd edn (Baltimore, MD: The Johns Hopkins University Press)
[57] Lee J E 2005 Sample based inference for inverse obstacle scattering Master Thesis Department of Mathematics, The University of Auckland
[58] Fox C 2008 Recent advances in inferential solutions to inverse problems Inverse Problems Sci. Eng. 16 797–810
[59] Robert C 2001 The Bayesian Choice (New York: Springer)
[60] Gilks W R, Richardson S and Spiegelhalter D (ed) 1996 Markov Chain Monte Carlo in Practice (London: Chapman and Hall)
[61] Robert C P and Casella G 2000 Monte Carlo Statistical Methods (Springer Texts in Statistics) 2nd edn (New York: Springer)
[62] Liu J S 2001 Monte Carlo Strategies in Scientific Computing (New York: Springer)
[63] Banerjee S, Carlin B P and Gelfand A E 2004 Hierarchical Modeling and Analysis for Spatial Data (Boca Raton, FL: CRC Press)
[64] Grenander U and Miller M 2007 Pattern Theory: From Representation to Inference (Oxford: Oxford University Press)
[65] Fox C and Nicholls G K 1997 Sampling conductivity images via MCMC The Art and Science of Bayesian Image Analysis (Leeds Annual Statistics Research Workshop vol 14) pp 91–100
[66] Siltanen S, Voutilainen A, Kolehmainen V, Järvenpää S, Kaipio J P, Koistinen P, Lassas M, Pirttilä J and Somersalo E 2003 Statistical inversion for medical X-ray tomography with few radiographs: I. General theory Phys. Med. Biol. 48 1437–63
[67] Aykroyd R G and Zimeras S 1999 Inhomogeneous prior models for image reconstruction J. Am. Stat. Assoc. 94 934–46
[68] Noumeir R, Mailloux G E and Lemieux R 1995 An expectation maximization reconstruction algorithm for emission tomography with non-uniform entropy prior Int. J. Biomed. Comput. 39 299–310
[69] Kolehmainen V 2001 Novel approaches to image reconstruction in diffusion tomography PhD Thesis Kuopio University Publications C. Natural and Environmental Sciences 125
[70] Palm M 1999 Monte Carlo methods in electrical conductance imaging Master Thesis Department of Mathematics, The University of Auckland
[71] Cui T 2005 Bayesian inference for geothermal model calibration Master Thesis Department of Mathematics, The University of Auckland
[72] Andersen K E, Brooks S P and Hansen M B 2003 Bayesian inversion of geoelectrical resistivity data J. R. Stat. Soc. Ser. B 65 619–42
[73] Kolehmainen V, Voutilainen A and Kaipio J P 2001 Estimation of non-stationary region boundaries in EIT-state estimation approach Inverse Problems 17 1937–56
[74] Watzenig D, Brandner M and Steiner G 2007 A particle filter approach for tomographic imaging based on different state-space representations Meas. Sci. Technol. 18 30–40
[75] Kim M C, Kim K Y, Kim S, Seo K H, Jeon H J, Kim J H and Choi B Y 2005 Estimation of phase boundary by front points method in electrical impedance tomography Proc. Int. Conf. on Inverse Problems, Design and Optimization (IPDO 2004) pp 101–7
[76] Tossavainen O P, Vauhkonen M, Heikkinen L M and Savolainen T 2004 Estimating shapes and free surfaces with electrical impedance tomography Meas. Sci. Technol. 15 1402–11
[77] Grudzien K, Romanowski A and Williams R A 2005 Application of a Bayesian approach to the tomographic analysis of hopper flow Part. Part. Syst. Charact. 22 246–53
[78] West R M, Aykroyd R G, Meng S and Williams R A 2004 MCMC techniques and spatial-temporal modelling for medical EIT Physiol. Meas. 25 181–94
[79] Watzenig D and Fox C 2008 Posterior variability of inclusion shape based on tomographic measurement data J. Phys.: Conf. Ser. 135 012102
[80] Vauhkonen M, Karjalainen P A and Kaipio J P 1998 A Kalman filter approach to track fast impedance changes in electrical impedance tomography IEEE Trans. Biomed. Eng. 45 486–93
[81] Kaipio J P, Kolehmainen V, Vauhkonen M and Somersalo E 1999 Inverse problems with structural prior information Inverse Problems 15 713–29
[82] Steiner G, Soleimani M and Watzenig D 2008 A bio-electromechanical imaging technique with combined electrical impedance and ultrasound tomography Physiol. Meas. 29 63–75
[83] Green P J 1995 Reversible jump Markov chain Monte Carlo computation and Bayesian model determination Biometrika 82 711–32
[84] Higdon D, Lee H and Holloman C 2003 Markov chain Monte Carlo-based approaches for inference in computationally intensive inverse problems Bayesian Statistics 7 (Oxford: Oxford University Press)
[85] Brooks S P, Giudici P and Roberts G O 2003 Efficient construction of reversible jump MCMC proposal distributions J. R. Stat. Soc. Ser. B 65 3–56
[86] Higdon D, Lee H and Bi Z 2002 A Bayesian approach to characterizing uncertainty in inverse problems using coarse and fine scale information IEEE Trans. Signal Process. 50 389–99
[87] Marinari E and Parisi G 1992 Simulated tempering: a new Monte Carlo scheme Europhys. Lett. 19 451–8
[88] Watzenig D 2007 Statistical solutions to inverse problems: statistical inversion J. Austrian Soc. Electr. Eng. (OVE) 7/8 240–7
[89] Aykroyd R G and Cattle B A 2007 A boundary-element approach for the complete electrode model of EIT illustrated using simulated and real data Inverse Problems Sci. Eng. 15 441–61
[90] Kim K Y, Kang S I, Kim M C, Kim S, Lee Y J and Vauhkonen M 2003 Dynamic image reconstruction in electrical impedance tomography with known internal structures IEEE Trans. Magn. 38 1301–4
[91] Watzenig D, Steiner G and Pröll C 2005 Statistical estimation of phase boundaries and material parameters in industrial process tomography Proc. IEEE Int. Conf. on Industrial Technology (ICIT'05) (Hong Kong, China, December 14–17) pp 720–5
[92] Seppänen A, Vauhkonen M, Vauhkonen P J, Somersalo E and Kaipio J P 2001 State estimation with fluid dynamical evolution models in process tomography—an application with impedance tomography Inverse Problems 17 467–84
[93] Kim B S, Ijaz U Z, Kim J H, Kim M C, Kim S and Kim K Y 2007 Nonstationary phase boundary estimation in electrical impedance tomography based on the interacting multiple model scheme Meas. Sci. Technol. 18 62–70
[94] Doucet A, de Freitas N and Gordon N J 2001 Sequential Monte Carlo Methods in Practice (New York: Springer)
[95] Arulampalam M S, Maskell S, Gordon N J and Clapp T 2002 A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking IEEE Trans. Signal Process. 50 174–88