Nonlinear Reconstruction Methods for Transmission Electron Microscopy
Master's thesis (Masterarbeit) submitted for the academic degree of
Master of Science
Westfälische Wilhelms-Universität Münster
Fachbereich Mathematik und Informatik
Institut für Numerische und Angewandte Mathematik
Supervision:
Dr. Christoph Brune
Prof. Dr. Martin Burger
Prof. Ozan Öktem
Submitted by:
Leonie Zeune
Münster, June 2014
Abstract
Electron tomography (ET) is a technique to recover the three-dimensional structure
of an object on a molecular level from a set of two-dimensional transmission electron
microscope (TEM) images recorded from different perspectives. These images are corrupted by Poisson as well as Gaussian noise. The resulting inverse problem is severely
ill-posed due to a combination of a very low signal-to-noise ratio and an incomplete
data problem. In this thesis we present an approach to solve this inverse problem
with variational methods. It is based on a statistical modeling of the inverse problem in terms of maximum a posteriori (MAP) estimation. In contrast to
the majority of reconstruction methods in the field of ET, we focus on modeling data
corrupted by Poisson noise. Thus, we want to minimize a nonlinear energy functional
with the Kullback-Leibler divergence as the data discrepancy term combined with a
total variation regularization. In order to solve this optimization problem, we propose
an alternating two-step iteration consisting of an expectation-maximization (EM) step
and the solution of a weighted Rudin-Osher-Fatemi (ROF) model. The algorithm is
adapted to the affine form of the forward model in ET. In order to overcome the contrast
loss typical of TV-based regularization, we extend the algorithm by iterative regularization based on Bregman distances. Finally, we illustrate the performance of our
techniques on synthetic and experimental biological data.
Declaration of Authorship (Eidesstattliche Erklärung)
I, Leonie Zeune, hereby declare that I have written this thesis independently and have used no sources or aids other than those indicated. Ideas, content, or wording taken from other sources have been marked as such by stating their origin in the text or in a note. The same applies to figures, tables, drawings and sketches that were not created by myself.
All programs included on the enclosed CD were written by myself or are marked with an indication of their origin.
Münster, 16 June 2014
Leonie Zeune
Acknowledgment
I want to take this opportunity to thank the people who supported and encouraged me
in the last months and made this thesis possible, especially
Dr. Christoph Brune for being a great supervisor and for helping and supporting me
a lot. For much valuable advice and for motivating me when things did not work out as
I hoped.
Prof. Ozan Öktem for giving me the unique opportunity to spend six months at
the Royal Institute of Technology (KTH) in Stockholm and making me feel welcome.
Especially, I want to thank him for taking a lot of time to support me and for very
helpful discussions and advice.
Prof. Dr. Martin Burger for giving me the opportunity to work on this interesting and
challenging topic and for making my stay at the KTH possible.
The people who proofread my thesis and thereby helped to improve it.
Stefan Poggensee for a lot of patience and encouragement during the last months and
for always believing in me and making me happy.
My great family - my parents, my sisters Lisa and Sophie and their partners Tomas
and Stefan - for a lot of love and encouragement and for always cheering me up.
Contents

1 Introduction
2 Electron Tomography
  2.1 Transmission Electron Microscopy
    2.1.1 Advantages of Illumination by Electrons
    2.1.2 Image Formation Process
    2.1.3 Scattering and Diffraction
  2.2 Electron Microscope
    2.2.1 Electron Source
    2.2.2 The Condenser System
    2.2.3 Holders and Stage
    2.2.4 Imaging Lenses
    2.2.5 Viewing Device
  2.3 Sample Preparation
3 The Forward Model
    3.0.1 Basic Notation
  3.1 Image Formation
    3.1.1 Modeling Phase Contrast Imaging
    3.1.2 The Forward Operator
  3.2 The Inverse Problem
  3.3 Difficulties for Solving the Inverse Problem
  3.4 Reconstruction Methods in Electron Tomography
4 Inverse Problems and Variational Methods
  4.1 Modeling
    4.1.1 Basic Concept
    4.1.2 Different Noise Models and Corresponding Data Terms
    4.1.3 Different Regularization Terms
  4.2 Existence and Uniqueness of a Minimizer
5 Numerical Approach
  5.1 Introduction to Numerical Optimization Methods
    5.1.1 Gradient Descent Methods
    5.1.2 Newton Methods
    5.1.3 Introduction to Splitting Methods
  5.2 Bregman-FB-EM-TV Algorithm
    5.2.1 EM Algorithm
    5.2.2 FB-EM-TV Algorithm
    5.2.3 Bregman-FB-EM-TV Algorithm
    5.2.4 Numerical Realization of the Weighted ROF Model
6 Programming and Realization in MATLAB and C
  6.1 Bregman-FB-EM-TV Toolbox
  6.2 TVreg Software
  6.3 Embedding of the Forward Operator via MEX files
  6.4 Difficulties
7 Results
  7.1 Simulated Data
    7.1.1 Results Balls Phantom
    7.1.2 Results RNA Phantom
  7.2 Experimental Data
    7.2.1 Results CPMV Virus
8 Conclusion and Outlook
List of Figures
List of Tables
Bibliography
1 Introduction
Microscopy, whose origins can be traced back to the 16th century, has played and still
plays a central role in life sciences and medicine. In general, there are two major goals
that provide incentives for developments in the field of microscopy. On the one hand,
there is the quest for higher magnification and resolving power. There are now a variety
of microscopic imaging techniques, e.g. electron microscopy that uses electrons instead
of visible light for imaging. Compared to visible light, electrons have a much shorter
wavelength, which enables imaging specimens at significantly higher resolution than in
light microscopy. On the other hand, there is the quest to image in three dimensions
rather than in two dimensions. This is essential for understanding the structural three-dimensional conformation of proteins and macromolecular assemblies, which is closely
related to their function within biological processes in the cell in time and space. Such
assemblies form the machinery responsible for most biological processes and are relevant
for understanding many diseases such as cancer and metabolic disorders. Knowledge
of these three-dimensional structures would provide the mechanistic descriptions for
how macromolecules act in an assembly. Moreover, it could give some indications
for developing therapeutic interventions related to diseases. Therefore, the structure
determination problem has come to play a central role in both commercial and academic
biomedical research. It is the problem of recovering the three-dimensional structure
of an individual molecule (e.g. a protein or a macromolecular assembly) at highest
possible resolution in its natural environment, which could be in-situ (i.e. in the cellular
environment) or in-vitro (i.e. in the aqueous environment).
Two established structure determination methods are X-ray crystallography and nuclear magnetic resonance. However, none of these methods can recover the three-dimensional structure of an individual molecule of a sub-cellular complex in its natural
environment. This is important in order to address many key biological issues. A particular issue is the currently unacceptably high failure rate in drug discovery, even
in late stages of the development process. It is very difficult to understand drug targets
and the disease mechanisms on a molecular level and to match the model system to
the human biology properly. This is one of the main reasons for the high failure rates
when drug candidates progress from pre-clinical studies to clinical trials.
Due to the reasons above, one of the main goals for present-day electron microscopy is
to look at the life processes within a cell at the molecular level. Traditional electron
microscopy is limited to acquire a two-dimensional image which is then interpreted
with little or no image post-processing. Thus, from a mathematical point of view,
microscopy in this context is not necessarily understood as an inverse problem. Nevertheless, there is the quest to image specimens in three dimensions. During recent years
electron microscopy has developed into the most useful tool to study macromolecules,
molecular complexes and supramolecular assemblies in three dimensions. Three main
techniques have been applied: electron crystallography reviewed in [43, 38], single particle analysis reviewed in [35], and Electron Tomography (ET), which is the topic of
this thesis. Electron crystallography permits structural analysis of macromolecules at
or close to atomic resolution (0.4 nm or better). It relies on the availability of two-dimensional crystals and has proven to be especially suited for membrane proteins.
Larger biological complexes are preferably studied by single particle analysis, which
in favorable cases allows the molecular objects to be examined at medium-resolution
(1 − 2 nm). Notably, this resolution is good enough to permit docking, which is the
process of fitting high-resolution structures of subunits or domains (usually obtained
by X-ray crystallography) into the large structure at hand. This approach may reveal
the entire complex at close to atomic resolution. Finally, ET can provide structural
information at the molecular level in the context of the cellular environment. Most
importantly, by using sample preparation techniques that preserve the specimens at
close to life conditions (called cryo electron microscopy), it is possible to study macromolecular complexes in aqueous solution or small cells, or sections through larger cells
or tissues (see e.g. [3, 49]). Presently, the resolution is limited to 4–5 nm but it seems
reasonable to reach higher resolutions in near future. The docking approach would then
be realistic, which will help to identify and characterize the observed molecular complexes. It should finally be recalled that ET examines the supramolecular assemblies
as individual objects. This is essential since within the cell these complex structures
are likely to be dynamic, i.e. they change their conformation and subunit composition.
Moreover, they interact, often transiently, with other molecular assemblies and cellular
structures. Thus, often in conjunction with other methods such as X-ray crystallography, mass spectrometry and single particle analysis (cf. [63, 78]), ET is likely to be the
most efficient tool to visualize the supramolecular structures at work. This will help
us to understand how they operate within the cell at the molecular level.
The idea in ET is to recover the three-dimensional structure of a specimen from a set
of two-dimensional Transmission Electron Microscope (TEM) images of the specimen.
Here, each image represents a view of the specimen under scrutiny from a different
direction. This technique, which was first outlined about 40 years ago, is still the only
approach that allows one to reconstruct the three-dimensional structure of individual
molecules in their natural (i.e. in-situ or in-vitro) environment. Hence, ET is more of
a structure determination method than a direct imaging method, where imaging data
is directly interpreted. To infer the three-dimensional image from the available two-dimensional imaging data one needs to use mathematics. The reconstruction problem
in ET is however an example of a limited data inverse scattering problem which is
severely ill-posed. This, in combination with very noisy data, makes it difficult to obtain
reliable reconstructions with high enough resolution, unless sophisticated mathematics
is employed. Hence, the usage of ET in mainstream structural biology has been limited
to the study of cellular sub-structures rather than individual molecules.
As already mentioned, reliable reconstruction methods play an important role for the
success of ET. This thesis presents a reconstruction method for ET based on variational methods. Mathematically speaking, we solve the resulting inverse problem by
minimizing an energy functional consisting of two different terms: a data discrepancy
term and a regularization term. The data discrepancy term is given by assumptions
about the noise in the recorded data. The regularization term accounts for a-priori
knowledge that stabilizes the inversion. This is needed since the inverse problem is
severely ill-posed. We choose the total variation as the regularization functional. This
choice leads to reconstructions with preserved or even enhanced sample contours. Unfortunately, such reconstructions suffer from a systematic loss of contrast. Hence, we
also present techniques to overcome this drawback.
This thesis is organized as follows. We start with an introduction to electron tomography in Chapter 2. In particular, we address the advantages of TEM compared to light
microscopy and describe how it operates. In Chapter 3, we outline the derivation of
a computationally feasible forward model for phase contrast TEM imaging. Moreover,
we comment on some difficulties for solving the resulting inverse problem. Chapter
4 starts with an introduction to inverse problems and variational methods before we
move on to present the minimization problem we want to solve in our reconstruction
algorithm. The latter is presented in Chapter 5, whereas the computational realization is described in Chapter 6. Afterwards, numerical results using simulated as well
as experimental data sets are presented in Chapter 7. Finally, Chapter 8 provides
conclusions and an outlook.
2 Electron Tomography
ET is a tomography technique to obtain 3D reconstructions of specimens at the nanometer scale. It is comparable to phase contrast X-ray Computed Tomography (CT)
but with electrons instead of photons. The underlying data for the reconstruction process are a series of 2D images recorded with a TEM. The individual images, henceforth
called micrographs, are recorded while the specimen is tilted along a fixed axis in the
microscope. This chapter explains the operating principles of Transmission Electron
Microscopy. This is essential for understanding the forward model presented in the
following chapter.
2.1 Transmission Electron Microscopy
Transmission Electron Microscopy is an imaging technique, which uses an electron
beam in order to illuminate the specimen. The theoretical basis for using electrons
for imaging was laid by Hans Busch in the 1920s when he designed the first working
electron lens and thereby laid the foundations for electron optics. He suggested that
magnetic fields can be used in order to shape the path of an electron beam and thus
are usable analogously to lenses in light optics. Besides Busch’s work, there have been
several discoveries about the properties of electrons that led to the invention of an
electron microscope. It was Louis de Broglie who stated that electrons, besides their
interpretation as particles, have wave-like characteristics. This concept is known as the
wave-particle duality and is a central concept of quantum mechanics. Moreover, it was
discovered that electrons behave in vacuum much like light. Based on these findings,
Ernst Ruska built the first TEM in 1931 and later in 1939 the first commercial TEM
was made available.
2.1.1 Advantages of Illumination by Electrons
Combined with his theory about the wave-particle duality of electrons, de Broglie
stated that electrons have a wavelength
$$\lambda = \frac{h}{p} = \frac{h}{m_0 v}, \qquad (2.1)$$
with $h$ denoting Planck's constant, $p$ the momentum, $m_0$ the electron mass at rest, and
v the velocity. Next, equating the kinetic and potential energies yields
$$eV = \frac{m_0 v^2}{2}. \qquad (2.2)$$
Combining (2.1) and (2.2) allows us to derive a relationship between the wavelength
of electrons and their energy. If we neglect relativistic effects, this relationship is
$$\lambda = \frac{h}{\sqrt{2 m_0 e V}}. \qquad (2.3)$$
Thus, an increase of the acceleration voltage for the electrons results in a smaller
electron wavelength. For a typical acceleration voltage of 200 kV the wavelength is
λ = 0.00274 nm and for 300 kV it is even λ = 0.00224 nm. Thus, the wavelength
of electrons is much smaller than the wavelength of visible light ranging from 390
to 700 nm. The main advantage of an electron microscope in comparison to a light
microscope is the significantly higher resolution that can be achieved theoretically.
Resolution is defined as the smallest distance δ between two points such that one can
still see them as two separated points. For classical light microscopes the relation
between the wavelength λ of the illumination source and the theoretically achievable
resolution δ is given by the Rayleigh criterion
$$\delta = \frac{0.61\,\lambda}{\mu \sin\beta}, \qquad (2.4)$$
where µ is the refractive index of the viewing medium and β the semi-angle of collection
of the magnifying lens (cf. [84]). The term µ sin β is called the numerical aperture and
can be approximated by µ sin β ≈ 1. Thus, the resolution is directly related to the
wavelength of the light. The lowest obtainable resolution for light microscopy with
conventional lenses is about 200 nm. Albeit small, it is still about 100 times larger
than the size of typical proteins which ranges from 3–10 nm and about 1000 times
larger than the diameter of an atom whose size ranges from 0.062–0.52 nm. Therefore,
light microscopy cannot be used to study life science processes at a molecular level.
Since the wavelength of electrons is much smaller, in theory resolutions at atomic level
are easily obtained. The relationship between the resolution δ of a TEM and the
electron wavelength λ can be approximated by
$$\delta = \frac{1.22\,\lambda}{\beta}, \qquad (2.5)$$
where β is the semi-angle of collection of the electromagnetic lens (cf. [84]). But in
contrast to light microscopy, where the achieved resolutions are close to the predicted
values, in electron microscopy one is far from achieving the predicted resolutions. In
materials science the best achieved resolution is about 0.05 nm (cf. [33]). In biological
applications resolutions are around 4–5 nm. This limitation is mainly caused by
technical problems and specimen properties, limiting the dose and energy of incoming
electrons. See Chapter 3 for more details.
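As a quick numerical illustration of (2.3) and (2.5), the following Python sketch evaluates the non-relativistic electron wavelength and the corresponding resolution estimate; the physical constants are standard values, while the collection semi-angle chosen below is an illustrative assumption rather than a value used in this thesis.

```python
import math

# Standard physical constants (SI units)
h = 6.62607015e-34      # Planck's constant [J s]
m0 = 9.1093837015e-31   # electron rest mass [kg]
e = 1.602176634e-19     # elementary charge [C]

def electron_wavelength_nm(V):
    """Non-relativistic de Broglie wavelength (2.3) for acceleration voltage V [V], in nm."""
    return h / math.sqrt(2.0 * m0 * e * V) * 1e9

def tem_resolution_nm(lam_nm, beta):
    """Resolution estimate (2.5) for wavelength lam_nm [nm] and collection semi-angle beta [rad]."""
    return 1.22 * lam_nm / beta

for V in (200e3, 300e3):
    lam = electron_wavelength_nm(V)
    # beta = 10 mrad is an illustrative semi-angle, not a value from the thesis
    print(f"{V / 1e3:.0f} kV: lambda = {lam:.5f} nm, delta = {tem_resolution_nm(lam, 10e-3):.4f} nm")
```

For 200 kV and 300 kV this reproduces the wavelengths λ = 0.00274 nm and λ = 0.00224 nm quoted above.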
2.1.2 Image Formation Process
The image formation process in a TEM is very similar to the one in classical light microscopes. Instead of light, a beam of electrons is used and glass lenses are replaced by
electromagnetic lenses. Electrons are emitted from an electron source and accelerated
to an energy of typically 200–300 keV. Condenser lenses form a parallel beam that
passes through a very thin specimen. The transmitted electrons of interest are collected and focused by an objective lens. Afterwards, several projector lenses project a
magnified image to the image plane, where an intensity is generated. A viewing device,
like a Charge-Coupled Device (CCD) camera, detects the intensity and converts it
into a grayscale image. It is essential that the whole image formation takes place in
vacuum and that the specimen is very thin. Electrons are nearly 2000 times lighter
and smaller than the smallest atom. Thus, if there was no vacuum, the possibility of
an interaction with air molecules would be very high. Moreover, the specimen needs
to be thin enough to allow the electrons to transmit it. The thicker the specimen is,
the more electrons are backscattered and cannot contribute to the resulting image of
a TEM. In contrast to photons that are not charged and do not affect each other,
electrons are negatively charged and repel each other if they are too close. We want
to approximate the path of an electron through the specimen by a ray. Therefore,
we have to ensure that the distance between two successive electrons is large enough
so that they do not influence each other’s path. It has been shown that their mean
separation is much larger than the specimen's thickness (cf. [56]) and thus interaction
between different electrons can be neglected.

Figure 2.1: Different kinds of electron scattering from a thin specimen. [84]
2.1.3 Scattering and Diffraction
In TEM imaging all information available results from electron scattering and diffraction. If we want to understand the different information that can be obtained from
electron scattering, it is important to always keep in mind the wave-particle duality
of electrons. If we think of electrons as particles, they can scatter either elastically
or inelastically. Elastically scattered electrons do not change their energy while inelastically scattered electrons lose some of their energy. If we think of waves, the
diffraction of the electron wave is distinguished into coherent and incoherent diffraction. Coherent waves remain in step with each other but may be phase shifted after
interacting with the specimen, while incoherent diffracted electron waves have no phase
relationship after specimen interaction. Elastic scattering is associated with coherence
and inelastic scattering with incoherence, see Chapter 3 for more details. If the electron interacts with the specimen, it can generate a range of secondary signals. In
Figure 2.1 the important signals are illustrated. Since each secondary signal can reveal different properties of the specimen, there are several different imaging techniques
using certain signals. In TEM imaging the interest lies only in unscattered electrons or
electrons scattered at low angles that pass through the specimen. Since only forward scattered
electrons, i.e. electrons that are scattered with an angle lower than 90°, pass through the
specimen, all backscattered electrons are neglected. Moreover, a restricting aperture
centered around the optical axis is used in order to select only the electrons that deviate from the axis by less than a certain angle. Thus, the signals of interest in TEM
are incoherently inelastically scattered electrons and coherently elastically scattered electrons (cf.
Figure 2.1). Incoherently inelastically scattered electrons can be used to form an amplitude contrast image. Since they have lost energy during interaction with the specimen,
their amplitude varies. Thus, the intensity, which is the squared modulus of the electron wave, varies as well and forms an image of the specimen. It is also possible to
form an image out of the information obtained from coherent elastically scattered electrons. Since the amplitude is constant, it is required to make the occurring phase shift
visible. The resulting image is then referred to as a phase contrast image. The visualization of the phase is more complicated and will be discussed in Chapter 3 together
with a mathematical model for the image formation process of phase contrast images.
2.2 Electron Microscope
In this section, we want to briefly describe the build-up of a TEM used to
acquire the images in ET. It consists
of an electron source, electromagnetic
lenses for the specimen illumination,
a specimen holder, imaging lenses and
a viewing device. Together, they are
called the electron optical column. See
Figure 2.2 for a sketch of the buildup. It is very important that the whole
system is in vacuum in order to prevent that electrons interact with air
molecules. In the following, all components of a modern TEM are explained
in more detail. The section is based
upon [33, 84].
Figure 2.2: Cross section of the column of a modern TEM. Image courtesy of FEI (www.fei.com).
2.2.1 Electron Source
The most common electron sources in Electron Microscopes can be divided into two
different kinds: thermionic and field-emission sources. Thermionic sources produce
electrons when they are heated, the most common ones are tungsten filaments and
lanthanum hexaboride crystals. In contrast, there are field-emitters, which produce
electrons when a large electric potential is applied between the source and an anode.
The electrons are then extracted from a very sharply pointed tungsten tip. The properties of an electron source can be described by brightness, coherence and stability,
whereby brightness is the most important one, since it has an influence on the resulting resolution of the microscope. It is defined as the current density per unit solid
angle of the source (cf. [84]). The electron source is incorporated into a gun assembly
in order to be able to control the beam and direct it into the illumination system.
It focusses the electrons coming from the source into one point, called the cross-over point.
Since high resolution TEM based on phase contrast imaging needs high spatial coherence, field-emission guns are the best choice for these applications. They provide a
brightness up to 1000 times greater than thermionic sources. A disadvantage is their
varying beam current (cf. [33]).
2.2.2 The Condenser System
Electromagnetic Lenses Electromagnetic lenses are the equivalent to glass lenses in
a light microscope. They control the electron path and are responsible for focussing the
beam and magnifying the image. A cross-section of an electromagnetic lens is shown in
Figure 2.3. Here, C is an electrical coil and P is the soft iron pole piece. If the current
passing through the coils varies, the power of the lens changes [33]. The positions of
the lenses are fixed; this is the main difference to glass lenses, which cannot change
their strength but whose position can be adjusted. The stronger the lens is, the lower is
its magnifying power and the higher is its demagnifying power. Apart from this, electromagnetic
lenses and glass lenses behave similarly, with similar types of aberration; the most important ones are spherical aberration, chromatic aberration and astigmatism. Spherical
aberration means that the power of the lens in the center differs from that at the edges.
It is primarily determined by the lens design and quality. Spherical aberration causes
information from one point to be spread over a disc in the image plane. Thus, the
resulting image is blurred. It is one of the major problems that limits the resolution
of a TEM. Modern TEMs use spherical aberration correctors in order to alleviate this
problem (cf. [33]).

Figure 2.3: Cross-section of an electromagnetic lens. Image courtesy of FEI (www.fei.com).

Chromatic aberration means that the power of the lens varies with
the energy of electrons passing through the lens. Therefore, the accelerating voltage
should be as stable as possible in order to decrease the effects of chromatic aberration.
Finally, astigmatism causes a circle in the specimen to become an ellipse in the
image (cf. [33]). Often, there is not only one lens but a system consisting of several
lenses and apertures with different diameters. Apertures exclude electrons that are
not required for the image formation process (cf. Figure 2.4). Besides, they control
the divergence and convergence of the electron path through the lenses; the smaller
the aperture the more parallel the resulting beam. Therefore, apertures influence the
depth of focus of a beam (cf. [84]). The focus is the image point where the rays
from one point in the object converge. It can be above or beneath or in the normal
image plane. If the image point is above the image plane, the beam is called overfocused. This is associated with a strong lens. If the image point is in the image plane,
it is focused. Accordingly, the beam is underfocussed if the ray converges beneath the
image plane. This concept corresponds to a weaker lens. In Figure 2.5 all three focus
concepts are illustrated. Apparently, the rays are more parallel in the image plane if
the image is underfocussed than if it is overfocused.
Condenser System The condenser system follows the electron gun and is responsible
for the form of the electron beam that hits the specimen. It consists of several lenses
and apertures and transfers the electron beam to the specimen. In bright-field TEM
imaging the specimen is uniformly illuminated; therefore, the electron beam should
be parallel when it hits the specimen.

Figure 2.4: Ray diagram illustrating how an aperture restricts the angular spread of electrons entering the lens. [84]

In other imaging techniques, like Scanning
Transmission Electron Microscope (STEM) imaging, the beam needs to be focussed in
a small spot on the specimen. In order to obtain a parallel illumination beam, it is
useful to operate the microscope out of focus. A simplified concept using only two lenses
is shown in Figure 2.6 (a). The first lens C1 forms an image of the gun cross-over point.
For thermionic electron sources, this C1 cross-over image is a demagnified version of
the first gun cross-over, explaining the name "condenser system". For field-emitters,
the gun cross-over often needs to be magnified, since its size is smaller than the desired
size of the illumination area. The following C2 lens produces an underfocussed image of
the C1 cross-over, thus resulting in a nearly parallel illumination hitting the specimen.
In Figure 2.6 (b) the effect of an additional C2 aperture is illustrated. The resulting beam
is more parallel if a smaller aperture is added. The disadvantage of an aperture is the
decrease of the total number of electrons hitting the specimen, reducing the quality
of the resulting image. Note that this is only a simplified model, whereas the actual
condenser system in a TEM is far more complicated.
2.2.3 Holders and Stage
In order to insert the specimen in the evacuated optical column, it is placed on a
specimen holder, which is then inserted into the TEM stage. There are two different
kinds of holders: top-entry holders and side-entry holders.

Figure 2.5: The concept of (A) overfocus, (B) focus and (C) underfocus. [84]

In TEM, side-entry holders are commonly used; therefore, we focus on these holders. For an illustration of a
side-entry holder see Figure 2.7. The specimen is placed on a copper grid, which is
then mounted near the tip of the holder. Afterwards, the holder is introduced into a
goniometer through an air lock. The air lock ensures that the increase of pressure in
the microscope is minimal when the holder is inserted into the vacuum surrounding the
optical column. The whole holder-stage system must provide for various movements
like translation in x-, y- and z-direction, rotation and tilting. Translations in x and y are
necessary in order to move the region of interest into the illuminated area. With translations in z-direction the height of the holder can be adjusted. The basic movements
like translation and single axis tilting are provided by the goniometer. It is located
close to the objective lens in order to minimize lens aberrations and maximize the
resolution (cf. [33]). The holder rod is responsible for every other desired movement,
like a second tilt axis or rotation of the specimen in the plane perpendicular to the
optical axis. There are also holders, called in situ holders, that allow the specimen to be
changed during the illumination. Examples of in situ holders are heating, cooling
and cryo-transfer holders. The latter permits the transfer of cryo-frozen samples (cf.
section 2.3) into the TEM without water vapor condensing as ice on the surface (cf.
[84]).
Figure 2.6: Parallel-beam operation in the TEM. (a) The basic principle, using only
the C1 and an underfocussed C2 lens. (b) Effect of the C2 aperture on the parallel
nature of the beam. [84]
2.2.4 Imaging Lenses
After interaction with the specimen the imaging system needs to create an image out
of the transmitted electrons and then magnify and project it onto the viewing device.
The optical system consists of an objective lens with an aperture in its focal plane
followed by several projector lenses and apertures. The objective lens is the most
important part and forms a first intermediate image of the specimen. The following
aperture has the important role to determine which electron information is used for
the final image. A smaller aperture collects only the electrons close to the optical axis.
Therefore, the influence of spherical aberration is small, but a lot of information from
outer electrons is neglected. With a wider aperture more information is included, but
the blurring effect of spherical aberration is stronger. Therefore, it is obvious that a
high quality of the objective lens is essential for a good resolution in the final image.
The following projector lenses magnify and project the image onto the viewing device.
In phase contrast imaging the imaging system also has the important task to make the
phase of the electron wave visible. See Chapter 3 for more details.
Figure 2.7: Single tilt sample holder for a TEM. (Image: Wikipedia, http://upload.wikimedia.org/wikipedia/commons/4/4d/TEM-Single-tilt.svg)
2.2.5 Viewing Device
The TEM viewing device needs to be able to perform real-time imaging as well as to
record the image. Older TEM devices use a fluorescent screen for real-time imaging
and a film camera in order to record images (cf. [33]). In modern microscopes this
is replaced by solid-state devices like CCD cameras. Above the detector plane, there
is a scintillator converting the electrons into photons, which are then transported to
the CCD element via a lens coupling or fibre optics [32]. The light creates charge in
the CCD, which is then recorded. Due to electron and photon propagation within the
scintillator, the CCD camera is responsible for some loss of resolution and efficiency.
Therefore, direct electron detectors have recently been introduced in modern TEMs (cf. [33]).
2.3 Sample Preparation
Before a specimen can be inserted into the specimen holder it needs to be small enough,
stable and very thin in order to permit the transmission of electrons. The copper specimen grid used as a specimen carrier mounted on the holder has a diameter of roughly
3 mm, restricting the size of the specimen. It is very important that the preparation
technique preserves the specimen properties and does not alter its atomic structure (cf.
[33]). A common preparation technique for biological samples starts with a chemical
treatment of the specimen in order to remove the water in the tissue. Afterwards, the
tissue is embedded in hardening resin. The hard specimen is then cut into slices of
about 0.5 µm, which can be inserted into the microscope. Another common approach
to stabilize the specimen is to freeze it. Since traditional freezing methods can damage
the specimen through the resulting ice crystals, in biological applications most of the samples
are cryo-fixated. An advantage of this technique is that damage to the sample is reduced to a minimum in comparison to conventional preparation techniques. Moreover,
the original state of the tissue is preserved to a high degree. Cryo-fixation involves
ultra-rapid freezing of the sample, called vitrification. The tissue is frozen so quickly
that water molecules have no time to crystallize. Thus, the damaging ice crystals are
avoided. Another advantage of the low temperature of vitrified samples is that damage caused by the electron beam is reduced as well. Therefore, vitrified samples can
be exposed to the electron beam for longer (cf. [33]). Cryo-fixation allows biological specimens
to be recorded in their natural environment. This can be helpful in order to better
understand the form and function of the specimen under scrutiny.
3 The Forward Model
This chapter outlines the derivation of a computationally feasible model for phase
contrast TEM imaging, based on [31, 56]. For a more detailed exposition the reader
may consult [31, 60, 42]. We will start with an introduction of concepts and the basic
notation used throughout this chapter.
3.0.1 Basic Notation
The unit sphere in the Euclidean space $\mathbb{R}^3$ is defined as
$$S^2 := \{x \in \mathbb{R}^3 : |x| = 1\}.$$
For given $\omega \in S^2$, the hyperplane in $\mathbb{R}^3$ that is orthogonal to $\omega$ is defined as
$$\omega^\perp := \{x \in \mathbb{R}^3 : x \cdot \omega = 0\}.$$
A line in $\mathbb{R}^3$ is uniquely determined by a pair $(\omega, y)$ where $\omega \in S^2$ is the direction and
$y \in \omega^\perp$ is the unique point in the hyperplane $\omega^\perp$ through which the line passes. Next,
any point in $\mathbb{R}^3$ can be written as $y + t\omega$ for some $y \in \omega^\perp$ and $t \in \mathbb{R}$. If $f$ is a real or
complex valued function defined on $\mathbb{R}^3$ that decreases sufficiently fast at infinity, then
the ray transform $\mathcal{P}(f)$ of $f$ is defined as the following line integral:
$$\mathcal{P}(f)(\omega, y) := \int_{-\infty}^{\infty} f(y + t\omega)\, dt \qquad \text{for } \omega \in S^2 \text{ and } y \in \omega^\perp.$$
Finally, $\circledast_{\omega^\perp}$ denotes the two-dimensional convolution in the hyperplane $\omega^\perp$,
$$(f \circledast_{\omega^\perp} g)(x) := \int_{\omega^\perp} f(x - \tau)\, g(\tau)\, d\tau \qquad \text{for } x \in \omega^\perp,$$
and $*$ the three-dimensional convolution in $\mathbb{R}^3$,
$$(f * g)(x) := \int_{\mathbb{R}^3} f(x - \tau)\, g(\tau)\, d\tau \qquad \text{for } x \in \mathbb{R}^3.$$
Moreover, $\mathcal{F}_{\omega^\perp}$ is the two-dimensional Fourier transform on $\omega^\perp$ defined as
$$\mathcal{F}_{\omega^\perp}(f)(\xi) := \int_{\omega^\perp} e^{-i x \cdot \xi} f(x)\, dx \qquad \text{for } \xi \in \omega^\perp.$$
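Since the ray transform P(f) is the central building block of the forward model, a simple numerical illustration may be helpful. The following Python sketch approximates P(f)(ω, y) by a Riemann sum with nearest-neighbour sampling on a voxel grid; the grid, step size and test phantom are assumptions made purely for this illustration, not the discretization used later in this thesis.

```python
import numpy as np

def ray_transform(f, origin, spacing, omega, y, t_min, t_max, n_steps=512):
    """Riemann-sum approximation of the line integral of f along t -> y + t * omega."""
    ts = np.linspace(t_min, t_max, n_steps)
    dt = ts[1] - ts[0]
    pts = y[None, :] + ts[:, None] * omega[None, :]          # sample points y + t * omega
    idx = np.rint((pts - origin) / spacing).astype(int)      # nearest-neighbour voxel indices
    inside = np.all((idx >= 0) & (idx < np.array(f.shape)), axis=1)
    vals = np.zeros(n_steps)
    vals[inside] = f[idx[inside, 0], idx[inside, 1], idx[inside, 2]]
    return vals.sum() * dt

# Test phantom: indicator function of the unit ball, sampled on a 64^3 grid over [-1.5, 1.5]^3
n = 64
xs = np.linspace(-1.5, 1.5, n)
X, Y, Z = np.meshgrid(xs, xs, xs, indexing="ij")
f = (X**2 + Y**2 + Z**2 <= 1.0).astype(float)

omega = np.array([0.0, 0.0, 1.0])      # direction of the line
y = np.array([0.0, 0.0, 0.0])          # point in the hyperplane omega-perp
# The line passes through the centre of the ball, so P(f)(omega, y) should be close to 2
print(ray_transform(f, origin=np.array([-1.5, -1.5, -1.5]), spacing=xs[1] - xs[0],
                    omega=omega, y=y, t_min=-2.0, t_max=2.0))
```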
3.1 Image Formation
Overview There are essentially three different mechanisms that give rise to contrast
(intensity variations) in a TEM image: diffraction contrast, amplitude contrast, and
phase contrast. Each of these can be described by a common quantum mechanical
model. Although elegant, this is not a computationally feasible approach. For computational feasibility, one must take advantage of various approximations that specifically
hold for each of the aforementioned three contrast mechanisms.
Diffraction contrast is generated in practice by intercepting the diffraction pattern using
an objective aperture in the back focal plane of the objective lens that only allows the
transmitted beam to form the image. A diffraction contrast image reveals variations
in the intensity of the selected electron beam as it leaves the sample. This type of
contrast is essentially only interpretable if the specimen is ordered (crystalline). Our
interest lies in imaging amorphous specimens, so modeling diffraction contrast is of less
relevance to us.
Amplitude contrast, also called thickness-mass contrast, refers to contrast in the image
that arises when one removes electrons that have scattered with an angle that is too
high. This is done by placing an aperture in the back focal plane of the objective lens.
Since lighter elements give rise to smaller scattering angles than heavier elements do,
amplitude contrast basically maps the scattering power of the elements present in the
specimen. On the other hand, such contrast is associated with low and mid range
resolution. Furthermore, when imaging weakly scattering specimens (like unstained
biological specimens), most electrons undergo scattering with very small angles and
therefore pass through the aperture. Hence, amplitude contrast cannot be used to
explain contrast in images from such specimens.
Phase contrast arises from the interference in the image plane of the scattered electron
wave with itself. Thanks to a high quality optical system that converts the phase of
the scattered electron wave at the specimen exit plane into visible contrast variation
in the image plane, the effect of this interference is only visible as intensity variations
in the image. Contrast in high resolution TEM images of thin unstained biological
specimens is almost entirely due to phase contrast. Hence, our focus will be on deriving
a computationally feasible model for phase contrast TEM imaging. This model can
also be extended to include amplitude contrast (cf. [31]).
3.1.1 Modeling Phase Contrast Imaging
We want to start this section by stating two basic assumptions. First, the specimen
forms a closed system together with the incident electron, so we disregard any interaction with the environment. We also assume that successive imaging electrons are
independent, which is called the independent electron assumption. Both assumptions
hold under normal TEM conditions. As an example, the independent electron assumption holds, since the distance between two successive imaging electrons is much larger
than the specimen thickness. As a consequence of this assumption the wave mechanical
notions, like interference and superposition, refer to the wave of a single electron, i.e.
the crests of the latter one interact with each other.
A model for phase contrast TEM imaging naturally divides into three parts:
1. the interaction between the imaging electron and the specimen (electron-specimen
interaction),
2. the influence of the optics,
3. the detector.
All three parts are coupled but can be treated separately.
1. Electron-Specimen Interaction
In this section, we want to model the interaction between the incident imaging electron
and the specimen. Essentially, this reduces to modeling the scattering of electrons
against the atoms in the specimen. To begin with, we assume perfectly coherent imaging,
which in particular implies that the incident electron is modeled as a monochromatic
plane wave and electrons only scatter elastically against the atoms in the specimen.
These assumptions will be relaxed below, by accounting for the partial incoherence due
to incoherent illumination and inelastic scattering.
The Schrödinger Equation For elastic scattering, the specimen remains in the same
quantum state. Hence, the scattering can be modeled as a one-body problem, where the
scattering properties of the specimen are fully described by its electrostatic potential
$U_{\mathrm{e.p.}} : \mathbb{R}^3 \to \mathbb{R}_+$. This derives from the fact that only inelastic scattering changes
the state of the specimen. If only elastic scattering occurs, the specimen will not
change over time, so the function describing the specimen is time-independent. In this
case, it is fully described by the spatially dependent electrostatic potential Ue.p. . The
non-relativistic time evolution of the imaging electron is then described by the scalar
Schrödinger equation [72]:
$$i\hbar \frac{\partial}{\partial t}\Psi(x, t) = \left(-\frac{\hbar^2}{2m}\Delta + V(x)\right)\Psi(x, t). \qquad (3.1)$$
Here, ~ is Planck’s constant, e is the elementary charge, m is the electron mass at rest,
Ψ : R3 × R → C is the wave function for the imaging electron, and V : R3 → R− is the
potential energy, so V (x) = −eUe.p. (x). Now, it is common to express solutions to (3.1)
as
$$\Psi(x, t) = u(x)\, f(t), \quad \text{where } f \text{ has unit modulus}. \qquad (3.2)$$
Inserting (3.2) into (3.1) results in two separate differential equations. First,
$$i\hbar f'(t) = E f(t), \qquad \text{so} \qquad f(t) = e^{-i\frac{E}{\hbar}t},$$
with $E$ being the constant energy of the elastically scattered electron, given by $E = \frac{\hbar^2 k^2}{2m}$. Here, $k = \frac{2\pi}{\lambda}$ denotes the electron wavenumber and $\lambda$ the wavelength of the electron. Second,
$$\left(-\frac{\hbar^2}{2m}\Delta + V(x)\right) u(x) = E\, u(x) \qquad (3.3)$$
is a partial differential equation of Helmholtz type, and it can be rewritten as
$$\left(\Delta + k^2\right) u(x) = -F_{\mathrm{s.p.}}(x)\, u(x),$$
with $F_{\mathrm{s.p.}} : \mathbb{R}^3 \to \mathbb{R}_+$ given by
$$F_{\mathrm{s.p.}}(x) := -\frac{2m}{\hbar^2}\, V(x).$$
The function Fs.p. is henceforth called scattering potential. According to [77] u(x) also
satisfies the Sommerfeld radiation condition as a boundary condition, so u is the unique
solution of
$$\left(\Delta + k^2\right) u(x) = -F_{\mathrm{s.p.}}(x)\, u(x) \qquad (3.4)$$
$$\lim_{r\to\infty} r\left(n_r(x)\cdot\nabla u^{\mathrm{sc}}(x) - i k\, u^{\mathrm{sc}}(x)\right) = 0 \quad \text{for } x \in \partial D^3_r, \qquad (3.5)$$
where $D^3_r$ denotes the ball with radius $r$ in $\mathbb{R}^3$, $n_r(x)$ denotes the outgoing unit normal
to $\partial D^3_r$ at $x$ and
$$u^{\mathrm{sc}} = u - u^{\mathrm{in}},$$
where uin is the monochromatic incoming wave hitting the specimen.
Coherence and Incoherence Regarding the wave nature of the electrons one can divide scattering into coherent and incoherent scattering. A coherent wave has a temporally
constant phase difference, which means that the frequency and propagation speed remain constant over time. If the scattered wave is coherent, we refer to it as coherent
scattering, if not, as incoherent scattering. The main advantage of coherent waves is
their ability to form stationary interference. The resulting wave has temporally constant
amplitude, wavelength, velocity and frequency. When imaging thin unstained biological specimens the main contrast in the TEM image is the contrast that arises from
stationary interference of the scattered wave with itself. It is therefore essential that
the scattering is coherent.
Besides the classification of scattering in coherent and incoherent, it can also be differentiated into elastic and inelastic scattering. In the case of elastic scattering there is
no transfer of energy from the incident electron to the specimen, whereas in the case of
inelastic scattering energy is transferred from the electron to the specimen. In this case
the specimen changes its state. Although every possible combination of coherent/incoherent and elastic/inelastic scattering can occur, it is common to associate inelastic
with incoherent scattering and elastic with coherent scattering. Inelastic scattering
implies that energy is transferred from the incident electron to the specimen, whereby
the transfer process is not deterministic but a quantum mechanical process. So it is
very likely that the amount of transferred energy varies and therefore also the frequency
of the resulting scattered wave, resulting in incoherence.
Inelastic Scattering and Imperfect Illumination Since inelastic scattering is typically incoherent, it does not create any interference and it blurs the phase contrast
image one would get if one only had elastic scattering. A phenomenological model to
account for this influence of inelastic scattering is to introduce an incoherent amplitude
contrast formation component by letting the scattering potential have an imaginary
part, i.e. $F_{\mathrm{s.p.}} : \mathbb{R}^3 \to \mathbb{C}$ where
$$F_{\mathrm{s.p.}}(x) := -\frac{2m}{\hbar^2}\left(V(x) + i\, V_{\mathrm{abs}}(x)\right). \qquad (3.6)$$
Here, the potential energy V accounts for elastic scattering effects and gives rise to
phase contrast, whereas the absorption potential Vabs : R3 → R− models the amplitude
contrast that originates from the decrease in flux of elastically and unscattered electrons
due to inelastic scattering. The imaginary part Vabs is called the absorption (optical)
potential.
Another source of incoherence is from the illumination, i.e. from the fact that the
incident imaging electron is not a perfect monochromatic plane wave traveling along the
TEM optical axis. This incoherence can be accounted for by modifying the convolution
kernel that models the optics (see section 3.1.1).
A Computationally Feasible Model Due to its multi-scale nature, numerically solving
(3.4) is not computationally feasible. For a rough idea why the problem is not computationally feasible we utilize the rule of thumb that for reasonable accuracy of the
solution one needs about 10 grid points per wavelength, although it is shown that in
the case of high wavenumbers this is not sufficient. If electrons are accelerated with
200 keV, their wavelength is about 0.0025 nm. Thus, a specimen thickness of 100 nm
corresponds to 40,000 wavelengths. With 10 grid points per wavelength, this results
in 400,000 grid points, only in the z-direction, whereas the specimen dimensions in x- and y-direction are even bigger. So one has to consider various approximations.
One approximation is to use geometrical optics. This is an approximate treatment
of wave propagation where the wavelength is considered to be infinitesimally small
(semi-classical approximation). The idea is to represent the highly oscillating solution
as a product of a slowly varying amplitude function and an exponential function of
a slowly varying phase multiplied by a large parameter. It allows us to express the
scattered electron as a phase shifted version of the incident electron. The phase shift
is proportional to the integral of the scattering potential along electron trajectories.
For thin specimens one can disregard the curvature of these electron trajectories, i.e.
one assumes that electrons travel along straight lines parallel to the direction ω of the
incident plane wave.
If Ψ0 (x, t) := u0 (x)f (t) is the wave function of the incident electron, then the above
results in the projection assumption that allows us to express the scattered electron
wave Ψout (x, t) = uout (x)f (t) with x = y + τ ω on the specimen exit plane ω ⊥ + τ ω as
$$u^{\mathrm{out}}(y + \tau\omega) \approx u_0(y + \tau\omega)\, \exp\!\left(i\sigma \int_{-\infty}^{\tau} F_{\mathrm{s.p.}}(y + s\omega)\, ds\right) \qquad (3.7)$$
with the constant $\sigma = \frac{me}{k\hbar^2}$.
The weak phase object approximation is to linearize the exponential in (3.7), i.e.
$$u^{\mathrm{out}}(y + \tau\omega) \approx u_0(y + \tau\omega)\left(1 + i\sigma \int_{-\infty}^{\infty} F_{\mathrm{s.p.}}(y + s\omega)\, ds\right) = u_0(y + \tau\omega) + u^{\mathrm{sc}}(y + \tau\omega) \qquad (3.8)$$
with $u^{\mathrm{sc}}(y + \tau\omega) = u_0(y + \tau\omega)\, i\sigma\, \mathcal{P}(F_{\mathrm{s.p.}})(\omega, y)$. Above, we can integrate to $\infty$, since $F_{\mathrm{s.p.}}$ is
zero beyond the specimen exit plane.
The expression in (3.8) is our model for electron scattering. It is sufficiently accurate
for a computational treatment of phase contrast TEM imaging data on unstained thin
biological specimens.
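As a small illustration of (3.7) and (3.8), the following Python sketch compares the full phase-object exponential with its linearization for a given projected scattering potential; the value of σ and the array standing in for P(F_s.p.) are purely illustrative assumptions.

```python
import numpy as np

sigma = 1e-3                        # illustrative interaction constant, not the thesis value
proj_F = np.random.rand(128, 128)   # stand-in for the ray transform P(F_s.p.) on the exit plane
u0 = 1.0                            # incident plane wave (unit modulus)

# Projection assumption (3.7): exit wave as a phase-shifted version of the incident wave
u_out_full = u0 * np.exp(1j * sigma * proj_F)

# Weak phase object approximation (3.8): linearize the exponential
u_out_wpo = u0 * (1.0 + 1j * sigma * proj_F)

# For weak scatterers the two agree up to terms of order (sigma * P(F))^2
print(np.max(np.abs(u_out_full - u_out_wpo)))
```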
2. The Optics
After interacting with the specimen, electrons pass through the TEM optics as they
migrate from the specimen exit plane to the image plane. Besides magnifying the image,
in phase contrast imaging the optics has another equally important, but more subtle,
role. It is necessary to make phase contrast visible. This is because the intensity is the
squared modulus of the electron wave, and the modulus of a phase factor equals one irrespective of the value
of the phase. This problem of losing phase information when taking intensities is
referred to as the phase problem. One consequence is that if one measures intensity data
directly on the specimen plane, then phase contrast information will be lost. So the
optics generates quantum interference between the crests of the electron wave, making
it possible to detect the phase.
We want to illustrate the above claim somewhat more precisely. For simplicity, consider
the case when the electron wave undergoes a constant phase shift of about $\Delta\theta$ as it
scatters against the specimen. The phase contrast information that an electron carries
after scattering is contained in this phase shift term, i.e. $u = u_0 \exp(i\Delta\theta)$ where
$\Delta\theta \in \mathbb{R}$ is the phase contrast from the specimen. Hence, all relevant phase contrast
information is contained in $\Delta\theta$. By taking the intensity in the specimen exit plane,
$|u|^2 = |u_0|^2$, we lose all the phase contrast information. However, if we have an optical
system that can shift the phases of the scattered electron over $\pi/2$ with respect to $u_0$,
the amplitude gets multiplied by $\exp(i\pi/2) = i$, so the phase shift term $i\Delta\theta$ becomes $-\Delta\theta$.
This is as if the scattered wave would have the form $u = u_0 \exp(-\Delta\theta)$, and taking
the image intensity now gives us $|u|^2 \approx |u_0|^2(1 - 2\Delta\theta)$. Hence, in this way we have
circumvented the phase problem. Practically, in TEM imaging such a phase shift can
be accomplished by deliberately going out-of-focus while acquiring the images.
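This argument can be checked with a few lines of Python; the value chosen for Δθ below is an arbitrary small number used only for illustration.

```python
import numpy as np

dtheta = 0.01        # illustrative (small) phase contrast from the specimen
u0 = 1.0

u_exit = u0 * np.exp(1j * dtheta)           # wave at the specimen exit plane
print(abs(u_exit) ** 2)                     # = |u0|^2: the phase information is lost

# Shift the scattered part (i * dtheta to first order) by pi/2, i.e. multiply it by i
u_shifted = u0 * (1.0 + 1j * dtheta * 1j)   # = u0 * (1 - dtheta)
print(abs(u_shifted) ** 2)                  # ~ |u0|^2 * (1 - 2 * dtheta): the phase becomes visible
```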
The Set-Up The optical system in the TEM consists of an objective lens followed
by a number of projector lenses and some apertures that are present at several places.
The most important part is the objective lens, which forms a first intermediate image
with a magnification of only 20-50 times and is followed by an aperture in its focal
plane. Although the magnification is relatively small, the objective lens has to have
the highest quality concerning spherical and chromatic aberration and astigmatism.
Aberration is worse at high angles than at low angles and the first lens has to deal
with the highest range of angles, since the range decreases with every magnification.
So all following lenses are less affected by aberration and have almost no influence on
the final image resolution but only a magnifying effect. Therefore, they can be of much
less quality than the objective lens.
In order to model phase contrast imaging, it turns out that one can model the entire
TEM optical system as a single thin lens with an aperture in its focal plane as illustrated
in Figure 3.1 (cf. [31]). The magnification of the single thin lens corresponds to the
magnification M of the entire optical system (objective and projector lenses taken
together), so
$$M = p/q \qquad \text{and} \qquad 1/f = 1/p + 1/q. \qquad (3.9)$$
Here, f is the focal length of the lens, and q, p > 0 are the distances from the lens
to the objective and image planes. The aberration properties of the single thin lens
correspond to the properties of the objective lens, since this is the only lens with an
influence on the image resolution. Note that this set-up does not correspond to a
physical optical system. Furthermore, knowledge of M and f (the latter taken as the
focal length of the objective lens) allows us to determine p and q by (3.9).
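The two relations in (3.9) determine p and q uniquely once M and f are known. A small helper function (hypothetical, for illustration only) that solves them reads:

```python
def lens_distances(M, f):
    """Solve M = p/q and 1/f = 1/p + 1/q (eq. (3.9)) for the distances p and q."""
    q = f * (1.0 + M) / M   # distance from the lens to the object (specimen exit) plane
    p = f * (1.0 + M)       # distance from the lens to the image plane
    return p, q

# Illustrative values only: overall magnification 50000x, focal length 2 mm
p, q = lens_distances(M=5e4, f=2e-3)
print(p, q)   # by construction, p / q = M and 1/p + 1/q = 1/f
```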
Figure 3.1: The optical set-up consisting of a single thin lens with an aperture in its
focal plane. [31]
Model for the Optics Here we consider the setup in Figure 3.1. First, many electron optical elements, including electron lenses, are adequately modeled within the
framework of geometrical charged-particle optics, i.e. one can consider the electron as
a point-like charged mass whose motion is governed by the laws of classical mechanics. Modeling diffraction by an aperture (which is an opaque screen with a suitable
opening) needs to be based on wave mechanics.
Now, the transforming properties of this setup are simply a suitable combination of
free-space propagation, a model for a thin lens, and a model for diffraction by an
aperture. Let Ψout (y − qω, t) = uout (y − qω)f (t) denote the electron wave on the
specimen exit plane. Then, following [31], the corresponding electron wave in a plane
immediately above the detector is given as
$$\Psi^{\mathrm{det}}(y + r\omega, t) = u^{\mathrm{det}}(y + r\omega)\, f(t),$$
where
$$u^{\mathrm{det}}(y + r\omega) := \frac{1}{M}\, \mathcal{F}_{\omega^\perp}^{-1}\Big[\mathrm{CTF}_{\mathrm{opt}} \cdot \mathcal{F}_{\omega^\perp}\big(u^{\mathrm{out}}(\,\cdot\, - q\omega)\big)\Big]\Big(\frac{y}{M} + r\omega\Big) \qquad (3.10)$$
with uout defined in (3.8). In the above, Fω⊥ is the (two-dimensional) Fourier transform
in the hyperplane ω ⊥ orthogonal to the optical axis ω and CTF is the optics Contrast
Transfer Function (CTF) that is given as
$$\mathrm{CTF}_{\mathrm{opt}}(\xi) := \chi(|\xi|)\, \exp\!\left( i\,\frac{\Delta z}{2k}\, |\xi|^2 - i\,\frac{C_s}{4k^3}\, |\xi|^4 \right). \qquad (3.11)$$
Here, $\xi$ is a variable in the reciprocal space with unit nm$^{-1}$, $\Delta z$ is the defocus ($\Delta z < 0$
for underfocus and $\Delta z > 0$ for overfocus), $C_s$ the spherical aberration, and $\chi$ is
the aperture function (also called the pupil function). The latter is the characteristic
function of the aperture in the focal plane of the primary lens.
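To give an impression of how (3.10) and (3.11) could be evaluated on a pixel grid, the following Python sketch builds the CTF on a discrete Fourier grid and applies it to an exit wave via the FFT. The grid size, pixel size, defocus, spherical aberration, aperture radius and the frequency convention are illustrative assumptions, not parameters taken from this thesis.

```python
import numpy as np

def ctf_opt(xi_norm, k, delta_z, Cs, aperture_radius):
    """Contrast transfer function (3.11) as a function of the spatial frequency |xi|."""
    chi = (xi_norm <= aperture_radius).astype(float)   # aperture (pupil) function
    return chi * np.exp(1j * delta_z / (2.0 * k) * xi_norm**2
                        - 1j * Cs / (4.0 * k**3) * xi_norm**4)

# Illustrative parameters (not from the thesis)
n, pixel_size = 256, 0.2            # number of pixels and pixel size [nm]
lam = 0.00274                       # electron wavelength at 200 kV [nm]
k = 2.0 * np.pi / lam               # wavenumber [1/nm]
delta_z = -2000.0                   # defocus [nm], negative = underfocus
Cs = 2.0e6                          # spherical aberration [nm] (2 mm)

freqs = 2.0 * np.pi * np.fft.fftfreq(n, d=pixel_size)   # angular spatial frequencies [1/nm]
XI, ETA = np.meshgrid(freqs, freqs, indexing="ij")
xi_norm = np.sqrt(XI**2 + ETA**2)
ctf = ctf_opt(xi_norm, k, delta_z, Cs, aperture_radius=0.5 * xi_norm.max())

u_out = np.ones((n, n), dtype=complex)                  # stand-in for the exit wave (3.8)
u_det = np.fft.ifft2(ctf * np.fft.fft2(u_out))          # wave above the detector, cf. (3.10) up to 1/M
```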
The Intensity Generated by One Single Electron If the electron wave above the
detector udet is given by (3.10), then the intensity generated by one single electron in
the image plane is
$$I(F_{\mathrm{s.p.}})(y, \omega) := \big|\Psi^{\mathrm{det}}(y + r\omega, t)\big|^2 = \big|u^{\mathrm{det}}(y + r\omega)\big|^2. \qquad (3.12)$$
Together with (3.8) this results in
$$I(F_{\mathrm{s.p.}})(y, \omega) = \frac{1}{M^2}\left| \mathcal{F}_{\omega^\perp}^{-1}\Big[\mathrm{CTF}_{\mathrm{opt}} \cdot \mathcal{F}_{\omega^\perp}\Big(u_0(\,\cdot\, - q\omega)\big(1 + i\sigma\, \mathcal{P}(F_{\mathrm{s.p.}})\big)\Big)\Big]\Big(\frac{y}{M} + r\omega\Big)\right|^2.$$
We assume that the wave function of the incident electron leaving the condenser is a
monochromatic plane wave traveling along the fixed direction ω, i.e. for ω ∈ S 2 and
y ∈ ω⊥ it holds that
u₀(y − qω) = e^{ik(y − qω) · ω} = e^{−ikq}.
Moreover, since biological specimens are weak scatterers, we assume that the intensity
can be linearized, i.e. we can ignore second-order terms of usc . Therefore, we can
assume that
| F⁻¹_{ω⊥}[ CTF_opt · F_{ω⊥}( u_sc( · − qω) ) ](y + rω) |² ≈ 0.
Then, the intensity generated by one single electron is given as
I(F_s.p.)(y, ω) = (1/M²) [ 1 − (2σ/(2π)²) ( PSF^re_opt ⊛_{ω⊥} P(F^im_s.p.) + PSF^im_opt ⊛_{ω⊥} P(F^re_s.p.) )( y/M + rω ) ]   (3.13)
with PSF^re_opt and PSF^im_opt denoting the real and imaginary part of F⁻¹_{ω⊥} CTF_opt, and F^re_s.p. and F^im_s.p. the real and imaginary part of the scattering potential F_s.p.. Now, a
common assumption is that
F^im_s.p.(x) = Q F^re_s.p.(x)   for x ∈ R³ and constant Q ∈ R.
This is the standard phase contrast model and the resulting intensity is
I(F_s.p.)(y, ω) = (1/M²) [ 1 − (2σ/(2π)²) ( PSF_opt ⊛_{ω⊥} P(F^re_s.p.) )( y/M + rω ) ]   (3.14)
with
PSF_opt(y + rω) := { PSF^im_opt + Q PSF^re_opt }(y + rω).
The specimen-dependent constant Q is called amplitude contrast ratio.
3. The Detector
The detector is modeled as a rectangular area in the detector plane divided into square
pixels. The process of detecting the scattered electron wave is divided into several
steps, roughly corresponding to the process that takes place in a physical detector.
The basic principle of the detector model is a Poisson counting process in which the
expected number of electrons at each pixel is proportional to the squared absolute value
of the wave function. In order to account for detector quantum efficiencies smaller than
1, the actual detector response is modeled as a probability distribution depending on
the number of counts (shot noise). Finally, the image is blurred by a detector point
spread function (detector blurring). A more detailed explanation follows.
Shot Noise When an electron wave reaches the scintillator, it is localized. A number
of such discrete sets of localizations occur during the formation of an image. The points
where collisions occur can then be described as a sum of random point masses, which
are Poisson distributed with the intensity as the expected value.
The model is somewhat simpler if we discretize the scintillator into “pixels” corresponding to the pixels of the detector. Then, letting xi,j denote the center point of the
(i, j)-th detector pixel, the response from the (i, j)-th pixel is given as a sample of the
random variable µCi,j where µ > 0 is a detector related scaling factor (depending on
the gain and quantum efficiency) and Ci,j ∼ Poisson A · D · Ii,j , where A is the pixel
area, D is the incoming dose (electrons/pixel), and Ii,j is the intensity generated by a
single electron at a suitable point yi,j ∈ ω ⊥ given by (3.13) or (3.14) respectively.
Detector Blurring When an electron collides with the scintillator, it generates a burst
of photons, which are then recorded at pixels in the detector. However, these photons
are not only detected by that pixel but also to some extent by nearby pixels. This
introduces a correlation (blurring) between the initially independent random variables
modeling the shot noise. Next, there might be further correlations introduced by other
elements of the detector, e.g. due to charge bleeding around spots that have relatively
high intensities. Besides the shot noise, there is additive read out noise generated by
the detector. This can be modeled by a Gaussian random variable acting on each pixel.
A common approach is to model all these correlations collectively and phenomenologically by introducing a convolution. Hence, the data recorded at pixel (i, j) for a
fixed direction ω ∈ S 2 , henceforth denoted by fdata (ω)(i, j), is obtained by forming
a discrete, two-dimensional convolution of the response from pixel (i, j) with a point
spread function and adding the random variable modelling the read out noise:
f_data(ω)(i, j) := Σ_{k,l} µ C_{k,l} PSF_det(x_{i,j} − x_{k,l}) + ε_{i,j}   (3.15)
with ε_{i,j} ∼ N(0, σ̂²) and k, l being the indices of every possible pixel of the detector.
The detector point spread function PSFdet is defined in terms of its Fourier transform,
the Modulation Transfer Function (MTF), which is commonly modeled as
MTF(ξ) := a/(1 + α|ξ|²) + b/(1 + β|ξ|²) + c.   (3.16)
Note that the parameters a, b, c, α and β are all independent of the specimen.
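To make the detector model concrete, the following minimal NumPy sketch generates synthetic data in the spirit of (3.15)-(3.16): Poisson shot noise with expectation A · D · I per pixel, detector blurring realized in Fourier space via the MTF, and additive Gaussian read-out noise. All parameter values and function names are illustrative placeholders, not calibrated to any real detector.

import numpy as np

def mtf(xi, a=0.7, b=0.2, c=0.1, alpha=10.0, beta=40.0):
    # Modulation transfer function of the form (3.16); xi is the spatial frequency magnitude.
    return a / (1.0 + alpha * xi**2) + b / (1.0 + beta * xi**2) + c

def detect(intensity, dose=20.0, pixel_area=1.0, mu=1.0, sigma_readout=2.0, rng=None):
    # intensity: 2D array with the single-electron intensity I at each detector pixel.
    rng = np.random.default_rng() if rng is None else rng
    # Shot noise: Poisson counts with expectation A * D * I per pixel.
    counts = rng.poisson(pixel_area * dose * intensity)
    # Detector blurring: convolution with PSF_det, realized in Fourier space via the MTF.
    ny, nx = counts.shape
    fy = np.fft.fftfreq(ny)[:, None]
    fx = np.fft.fftfreq(nx)[None, :]
    xi = np.sqrt(fx**2 + fy**2)
    blurred = np.fft.ifft2(np.fft.fft2(mu * counts) * mtf(xi)).real
    # Additive Gaussian read-out noise, cf. the last term in (3.15).
    return blurred + rng.normal(0.0, sigma_readout, size=counts.shape)

# Example: a flat single-electron intensity of 1 over a 64x64 detector.
image = detect(np.ones((64, 64)))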
3.1.2 The Forward Operator
Based on the preceding definition of fdata (ω)(i, j), the forward operator can be defined
as follows: Assume the scattering properties of the specimen are fully described by
the complex valued scattering potential Fs.p. defined in (3.6) and for a fixed direction
ω ∈ S 2 the specimen is probed by a monochromatic plane wave. Then, the resulting
data at detector pixel (i, j) is a sample of the random variable fdata (ω)(i, j) defined in
(3.15). The forward operator for phase contrast TEM imaging is then defined as the
expected value of fdata (ω)(i, j), i.e.
K(F_s.p.)(ω)_{i,j} := E[ f_data(ω)(i, j) ]
for ω ∈ S 2 and pixel (i, j). By using definition (3.15) we get
K(F_s.p.)(ω)_{i,j} = µ · A · D · Σ_{k,l} I_{k,l} PSF_det(x_{i,j} − x_{k,l})   (3.17)
with Ik,l = I(Fs.p. )(yk,l , ω) defined in (3.13) or (3.14) respectively.
3.2 The Inverse Problem
Before we state the inverse problem in ET we introduce some more notations. Assume
we have a given subset S0 ⊂ S² of m directions that defines the data collection geometry and a detector with n² pixels. Then, the data space V = (R₊)^{mn²} is the space of all
possible data. The reconstruction space U is the Banach space of all complex valued
functions defined on R3 that can act as a scattering potential. We assume that U is
contained in L¹(R³, C⁻) ∩ L²(R³, C⁻) and for every element in U the real and imaginary
part should be positive. Hence, the forward operator is a function K : U → V. The
most common data collection geometry is single axis tilting. The specimen plane rotates
about a fixed axis, called tilt axis, which is orthogonal to the optical axis. The rotation
angle is usually in a range of [−60◦ , 60◦ ] and is called tilt angle. For each direction ω we
record a micrograph fdata (ω) and then rotate the specimen plane to a new tilt angle.
The collection of all micrographs recorded while varying ω is then called tilt-series. For
a given element fdata in the data space V the inverse problem is to determine Fs.p. ∈ U
with K(Fs.p. )(ω)i,j = fdata (ω)(i, j) for every (i, j) ∈ {1, ..., n} × {1, ..., n} and ω ∈ S0 .
Hence, in an ideal situation, solving the inverse problem is equivalent to reconstructing
the scattering potential Fs.p. or alternatively the electrostatic potential Ue.p. of the
specimen in each voxel. From this, one could draw inferences about the refractive index
and therefore about the material of the specimen. However, due to the many approximations
and assumptions made during the derivation of the forward operator, it is not possible
to reconstruct the true electrostatic potential. In the best case it would be possible
to reconstruct the correct proportions between the different refractive indices in the
specimen. In the case of biological specimens, where resolution is limited by several problems described later on, one seeks to reconstruct a function F_s.p. that describes at least the correct position and shape of the specimen, but even this is difficult in most cases.
3.3 Difficulties for Solving the Inverse Problem
As already mentioned in Chapter 2, several problems limit the resolution of a TEM.
Moreover, they make it difficult to solve the inverse problem. In this section we want
to summarize some of the main problems that, from the technical point of view, limit
the quality of the recorded data or, from the mathematical point of view, make the inverse problem harder to solve.
The Dose Problem When electrons scatter inelastically during the specimen interaction, energy is transferred to the specimen. This can cause ionization or heating of
the specimen, both resulting in specimen damage. Ionization is the process by which
an atom acquires positive charge by loosing an electron. An incoming electron can
collide with an electron in the electron shell of the atom and remove it. Hence, an
electron from an outer shell will replace the lost electron, but the atom remains positively charged. This can break some chemical bonds in the specimen. Another cause
for beam damage is heating. This is a major source of damage for biological samples.
In order to prevent specimen damage as much as possible, the total number of images
that can be recorded is limited, since the tissue gets more damaged with every illumination. This problem is called the dose problem. Thus, the recorded data are very
noisy with a low signal-to-noise ratio. Mathematically this leads to severe ill-posedness
of the inverse problem (cf. Chapter 4). Since inelastic scattering events increase with
the specimen thickness, it is important that thin specimens are used.
The Limited Angle Problem For data collected with single axis tilting the range of
tilt-angles is limited. Normally the specimen is tilted from −60◦ to +60◦ around the
tilt axis. The higher the tilt angle is, the longer is the path of electrons through the
specimen. If the path is too long, electrons cannot transmit the specimen and the risk
of specimen damage increases. This problem is called the limited angle problem. Now
the question is if the recorded projections are sufficient for a stable reconstruction of
the 3D volume. According to Orlov's criterion (cf. [54, Chap. VI]) this is only fulfilled if
every great circle on S 2 has a non-zero intersection with S0 , which is not the case for
tilting in the range of [−60◦ , 60◦ ]. This leads to severe ill-posedness of the problem. In
the case of dual axis tilting the problem is less severe, although Orlov's criterion is still
not fulfilled.
Region of Interest (Local) Tomography In ET the region that is illuminated by an
electron beam is much smaller than the whole specimen. We define a region of interest
that we seek to reconstruct from the data. Thus, the support of the true scattering
potential is not fully contained in this subregion. Since information of the surrounding
region is missing, the scattering potential cannot be uniquely determined. In order
to circumvent this problem, there are two different approaches. The first one is to
preprocess the recorded data in order to minimize the contributions from surrounding
regions. The second approach is based on prior assumptions about the sample outside
the region of interest. The forward and adjoint operator are adapted so that they
compensate for contributions from outer regions. A common approach is to set the
region outside to a constant value, estimated by the recorded data. Trying to account
for this effect is called long object compensation.
The Alignment Problem In TEM imaging there are always some small unintentional movements of the specimen during data acquisition. Hence, the actual set of
tilt angles S0 at which data were recorded is unknown and differs from the predicted
one. Nevertheless, we need to determine the actual geometric relationships prior to
reconstruction, at least to a sufficient degree of accuracy. This problem is called the
alignment problem. One way to solve this problem is to use fiducial markers. These
are often gold beads that are deposited on the specimen prior to data collection. Since
they have a very high density, they are clearly visible on all micrographs and can help
to determine the actual geometric relationships.
Multicomponent Inverse Problem In 3D ET it is not only the 3D volume that needs
to be recovered but also several parameters in the model for the forward operator. Prior to the reconstruction they cannot be determined reliably, therefore they have to be reconstructed alongside the scattering potential. The problem we are dealing
with is therefore a multicomponent inverse problem. For an overview of parameters
that need to be recovered and indications how they can be determined see [56].
Estimating the Data Error The problem of estimating the data error does not influence the data quality or the ill-posedness of the inverse problem, nevertheless we
want to mention it here. For many reconstruction methods it is helpful to have an
a-priori estimate of the data error. Since the stochasticity of the data is very complex,
it is difficult to determine the data error prior to the reconstruction. Many discrepancy principles are therefore not applicable in the case of TEM data.
3.4 Reconstruction Methods in Electron Tomography
In this section we outline the most commonly used reconstruction methods in ET.
Reconstruction methods can be divided into analytical methods, where the signal
is reconstructed directly from the data in a single step, or iterative methods. The
standard analytical methods are Filtered and Weighted Back-Projection, whereas the
iterative ones are ART and SIRT. As already mentioned, the inverse problem that
they all are trying to solve is
fdata = K(Fs.p. ),
(3.18)
where the forward operator (cf. (3.17)) can be rewritten in a more compact form
K(F_s.p.)(ω, x) = C₁(ω) − C₂ { PSF(ω, · ) ⊛_{ω⊥} P(F^re_s.p.)(ω, · ) }(x),   (3.19)
with ω ∈ S0 ⊂ S², x ∈ ω⊥, P the ray transform and F^re_s.p. the real part of F_s.p. ∈ U.
Analytical Methods Filtered Back-Projection (FBP) and Weighted Back-Projection
(WBP) are the standard approaches within the ET community. The basic assumption
for these methods is that the forward operator is the ray transform, i.e.
fdata = P(Fs.p. ) for Fs.p. ∈ U.
(3.20)
Therefore, the data fdata needs to be preprocessed before, including estimations of
C1 (ω) and C2 as well as deconvolving the PSF. Now, the set MS of lines parallel to a
direction in a fixed smooth curve S ⊂ S 2 that contains the finite set S0 of directions
needs to be defined. Since every pair (ω, x) with ω ∈ S 2 and x ∈ ω ⊥ determines a line,
we define
M_S := { (ω, x) ∈ S² × R³ : ω ∈ S, x ∈ ω⊥ }.   (3.21)
In the case of single axis tilting, S is a great circle arc on S 2 . The foundation for both
techniques is now laid by the following equation: for h : M_S → R it holds that
H ∗ F_s.p. = P*_S( h ⊛_{ω⊥} P(F_s.p.) )   with   H := P*_S(h).   (3.22)
(3.22)
PS∗ is the backprojection operator restricted to S, defined by
P*_S(g)(x) := ∫_S g(ω, x − (x · ω)ω) dω   for x ∈ R³.   (3.23)
The function h is called reconstruction kernel and H the corresponding filter. Now,
the idea for Filtered Back-Projection is to choose h such that H = PS∗ (h) ≈ δ. Then,
one has
FBP(f_data) := P*_S( h ⊛_{ω⊥} f_data ) = P*_S( h ⊛_{ω⊥} P(F_s.p.) ) = H ∗ F_s.p. ≈ δ ∗ F_s.p. = F_s.p..   (3.24)
Hence, F_s.p. can be reconstructed from the data f_data. The method of Weighted Back-Projection is mathematically equivalent to FBP, but instead of trying to find h such
that H ≈ δ one takes h = δω . In this case, the method does not directly yield a
reconstruction Fs.p. , but if an expression for the Fourier transformation of H can be
derived, we get
P*_S( h ⊛_{ω⊥} f_data ) = P*_S( f_data ) = H ∗ F_s.p.   ⇒   F( P*_S f_data ) = F(H) · F(F_s.p.).   (3.25)
Hence, the WBP is defined as
WBP(f_data) := F⁻¹( F( P*_S f_data ) / F(H) ) = F_s.p..   (3.26)
Again, this yields a way to reconstruct Fs.p. from fdata . See [59] for a recent survey on
these methods and a discussion about appropriate choices of h or respectively H.
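As a rough illustration of the filtered back-projection principle (3.24), the following sketch implements FBP for a simplified 2D parallel-beam geometry with a ramp-type filter. It ignores the TEM optics and detector model and assumes the data have already been preprocessed to approximate the ray transform (3.20); all names and parameters are chosen for illustration only.

import numpy as np

def fbp(sinogram, angles_deg, n):
    # sinogram: (num_angles, num_detector_bins) array of line integrals.
    # angles_deg: projection angles in degrees; n: side length of the square output image.
    num_bins = sinogram.shape[1]
    # Ramp filter applied per projection in the Fourier domain (h chosen so that H ≈ δ).
    freqs = np.fft.fftfreq(num_bins)
    filtered = np.fft.ifft(np.fft.fft(sinogram, axis=1) * np.abs(freqs), axis=1).real
    # Back-projection: smear each filtered projection along its direction.
    xs = np.arange(n) - (n - 1) / 2.0
    X, Y = np.meshgrid(xs, xs)
    det_coords = np.arange(num_bins) - (num_bins - 1) / 2.0
    reco = np.zeros((n, n))
    for proj, theta in zip(filtered, np.deg2rad(angles_deg)):
        # Detector coordinate of each image point for this direction.
        t = X * np.cos(theta) + Y * np.sin(theta)
        reco += np.interp(t, det_coords, proj, left=0.0, right=0.0)
    return reco * np.pi / len(angles_deg)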
Iterative Methods The idea of iterative methods is to construct a sequence (Fs.p. )k ∈
U that in the limit converges to a solution F_s.p. ∈ U of (3.18). A common approach is to use iterative methods with early stopping, assuming the reconstruction scheme to be semi-convergent. That means initial iterates reconstruct large scale components of F_s.p. and
afterwards small scale components, including noise, will be recovered. Before defining
an iterative scheme, problem (3.18) is split into a finite number l of subproblems. We
start by splitting the data f_data ∈ V = (R₊)^{mn²} into l subsets
f_data = ( (f_data)₁, (f_data)₂, ..., (f_data)_l )   with   (f_data)_j ∈ R^{i_j}   (3.27)
and i₁ + ... + i_l = mn². The function τ_j : (R₊)^{mn²} → (R₊)^{i_j} is the projection onto the j-th
data component, i.e. τ_j(G) = G_j. Then, the partial forward operator K_j : U → R^{i_j} is
defined as
Kj (Fs.p. ) := τj ◦ K (Fs.p. ) for j = 1, ..., l.
Hence, the subproblems are
(fdata )j = Kj (Fs.p. ).
(3.28)
A series of projections πj : U → U in the reconstruction space can be defined by
π_j(F_s.p.) := F_s.p. + K*_j ∘ C_j⁻¹( (f_data)_j − K_j(F_s.p.) ).   (3.29)
Here, C_j : R^{i_j} → R^{i_j} is a fixed linear positive definite operator. For noisy problems it
is often helpful to introduce a relaxation parameter µ > 0 and replace (3.29) by
π_j^µ(F_s.p.) := (1 − µ) F_s.p. + µ π_j(F_s.p.).   (3.30)
Hence, the impact of noise is reduced by replacing the projection operator by a linear combination of the argument itself and its projection. Then, the reconstruction
operator is the mapping f ↦ (F_s.p.)_{k_max}, where (F_s.p.)_{k_max} is generated by
(F_s.p.)_{k,0} = (F_s.p.)_k,
(F_s.p.)_{k,j} := π_j^µ( (F_s.p.)_{k,j−1} ),   j = 1, ..., l,
(F_s.p.)_{k+1} = (F_s.p.)_{k,l},   (3.31)
with given (F_s.p.)₀, µ > 0, k_max and π_j^µ( · ) defined in (3.30) together with (3.29). Now,
any iterative algorithm of the form (3.31) is characterized by its specific choice of Cj
and the splitting scheme (3.27). The most common examples are ART and SIRT.
Algebraic Reconstruction Technique (ART) ART is defined by C_j = K_j ∘ K*_j and l = mn², i.e. f_data is split into single data points. It was introduced in [40] and is
the first iterative algebraic technique for tomographic reconstructions. The series of
projections in (3.31) can then be expressed as
(F_s.p.)_{k,j} := (F_s.p.)_{k,j−1} + µ (1/‖K_j‖²) ( (f_data)_j − K_j · (F_s.p.)_{k,j−1} ) · K_j,   j = 1, ..., l.   (3.32)
See [55] for further details on ART.
Simultaneous Iterative Reconstruction Technique (SIRT) The idea of SIRT was
first introduced for tomographic reconstructions in [37]. The approach attempts to
correct for errors in all data points simultaneously. Thus, the number of subsets l is
equal to 1 and Cj are chosen as the identity. The resulting iterative scheme, replacing
(3.31), corresponds to Landweber iterations and is defined as
(F_s.p.)_k := (F_s.p.)_{k−1} + µ K*( f_data − K( (F_s.p.)_{k−1} ) ).   (3.33)
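The following sketch illustrates both iterative schemes for a discretized linear forward operator stored as a NumPy matrix; it is a simplified stand-in for the affine ET operator and not the implementation used in this thesis. The relaxation parameter mu and the number of iterations are assumptions of the example.

import numpy as np

def sirt(K, f_data, mu, num_iter, x0=None):
    # Landweber-type SIRT iterations (3.33); mu should satisfy mu < 2 / ||K||^2.
    x = np.zeros(K.shape[1]) if x0 is None else x0.copy()
    for _ in range(num_iter):
        x = x + mu * K.T @ (f_data - K @ x)
    return x

def art_sweep(K, f_data, x, mu=1.0):
    # One Kaczmarz-type ART sweep (3.32), correcting for one data point at a time.
    for j in range(K.shape[0]):
        kj = K[j]
        x = x + mu * (f_data[j] - kj @ x) / (kj @ kj) * kj
    return x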
4 Inverse Problems and Variational Methods
In this chapter we present variational methods and reconstruction methods in the
field of ET. First, we want to give a brief introduction to inverse problems and the
problem of ill-posedness. Afterwards, we show how inverse problems can be solved by
variational methods and give some examples of different data and regularization terms.
We present data terms motivated by noise models for TEM images, in particular we
focus on Poisson noise modeling. Moreover, we introduce Bregman distances and show
their applicability for data as well as for regularization terms. This is followed by an
overview of how to verify the existence and uniqueness of solutions.
Definition of Inverse Problems
In the field of imaging with biological or medical applications one often has to deal with
so-called inverse problems. This means only the effect f can be measured, although
the interest lies in determining the cause u. The effect and the cause are related to
each other by an operator K, often describing a projection or device influences on u.
The inverse problem we want to solve is then given by
Ku = f
(4.1)
with a compact and affine operator K between Banach spaces. Note that we are
interested in solving the inverse problem in ET, where K is given by (3.17), the effect
f corresponds to fdata in Chapter 3, and the cause u corresponds to the scattering
potential Fs.p. . For reasons of simplicity we will use the more general notations u and
f from here onwards. Solving this inverse problem is often complicated by the fact
that it is considered to be ill-posed. An inverse problem is called ill-posed if it is not
well-posed, which is defined by Jacques Hadamard as:
Definition 4.0.1. Let K : U → V be a (possibly nonlinear) operator. The inverse
problem of solving Ku = f is well-posed in the Hadamard sense if:
1. A solution exists for any f in the observed data space.
2. The solution is unique.
3. The solution depends continuously on the data.
In the case of an ill-posed problem the third property often does not hold. Reasons
for ill-posedness can be measurement errors or information lost through approximations of the problem. This can result in a discontinuous or even ill-defined inverse K⁻¹ (cf. [81]). In order to determine u, even in the case of ill-posedness, we
are interested in methods for solving the inverse problems without the inversion of the
operator K. In this thesis, we will focus on an approach using variational methods.
4.1 Modeling
This section deals with variational methods as a solution technique for inverse problems
and is based on [71].
4.1.1 Basic Concept
The idea to use variational methods as a technique to solve inverse problems traces
back to an approach by Andrey Tikhonov in 1963 (cf. [80]). Since solving equation
(4.1) might not be possible, he suggested to minimize a functional consisting of two
parts: a data discrepancy term D(f, Ku) and a regularization functional R(u). The
data term D(f, Ku) measures the discrepancy between f and Ku, which is zero if u is
a solution of (4.1). The regularization functional R(u) contains a-priori information
about the solution u. Hence, R(u) should be minimal if u matches the prior information
and should increase if u does not correspond to the given information. Therefore, a
variational model has the form
J(u) = D(f, Ku) + αR(u),
(4.2)
where α is a weighting parameter in order to control the influence of the a-priori
information on the reconstruction. In most cases the measured data are a noisy version
f δ of the exact data f with δ describing the noise level, i.e.
‖f^δ − f‖ ≤ δ.   (4.3)
An approximation to the solution u can be obtained by
û = arg min_u J(u) = arg min_u { D(f^δ, Ku) + αR(u) }.   (4.4)
The main challenge is to find the right data and regularization term resulting in a
good approximation of the unknown truth u. Both functionals have to be chosen
depending on the given data; more precisely, the data term can be chosen with regard to the expected noise and the regularization term with regard to the available a-priori information.
Optimality Condition Since the functional J may not be differentiable in a classical
sense, we must rely on the concept of subdifferentials (cf. [29, Chap. 5]) in order to
obtain optimality conditions for (4.4).
Definition 4.1.1. Let J : U → R ∪ {∞} be a convex functional and U ∗ the dual space
of U. Then, the subdifferential at u ∈ U is given as
∂J(u) = { p ∈ U* | ⟨p, v − u⟩ ≤ J(v) − J(u), ∀v ∈ U },
(4.5)
where h · , · i denotes the standard duality product between U and its dual space U ∗ . An
element p ∈ ∂J(u) is called subgradient and can be identified with the slope of a plane
in U × R through (u, J(u)) that lies under the graph of J.
In case J is Fréchet-differentiable, ∂J(u) is equal to {J′(u)}, thus the subdifferential
coincides with the classical derivative. In order to obtain an optimality condition for
a problem minu∈U J(u) with J as in Definition 4.1.1, we can use the following lemma.
Lemma 4.1.2. Let J : U → R ∪ {∞} be a convex functional and û ∈ U. Then, û is a
minimum of J if and only if
0 ∈ ∂J(û).
(4.6)
If J is convex, the optimality condition (4.6) is not only necessary but also sufficient.
For problem (4.4) we can conclude that
0 ∈ ∂J(û) = ∂( D(f^δ, Kû) + αR(û) ).   (4.7)
Notation Next, we want to clarify some notations used throughout this chapter. The
image u : Ω → R with Ω ⊂ R3 is a function mapping a point (x1 , x2 , x3 ) ∈ Ω to an
intensity value u(x1 , x2 , x3 ) ∈ R and is referred to as a reconstruction. The linear space
containing all functions u : Ω → R is denoted by U and is called reconstruction space.
We call f : Ω0 → R data and f δ : Ω0 → R noisy data. The function space containing all
possible data functions is called data space and denoted by V. In our case Ω0 ⊂ R2 ×S0 ,
where the first and the second dimension refer to coordinates in the image plane and
the third dimension refers to different tilt angles. The operator K : U → V is called
forward operator and is affine in our case (cf. Chapter 3). Both U and V should be
Banach spaces.
4.1.2 Different Noise Models and Corresponding Data Terms
Discretization Mathematically, images can be described in two different ways - either
continuous or discrete. So far we used the continuous definition u : Ω → R. The advantage of the continuous definition is that mathematical concepts like variational methods
can be applied easily. Another advantage is that edges can be defined as discontinuities
of u. In this chapter we will mainly deal with this continuous representation of images.
Nevertheless, digital images are always a discrete version of the underlying continuous
truth, which means that they are represented as a matrix containing intensity values.
Thus, the images need to be discretized at least before implementation. But also for
defining different noise models, it is easier to work with a discrete version of u, f and
f δ , respectively. In order to discretize u, we subdivide the region Ω into N1 × N2 × N3
small voxels. Then, the discrete image denoted as U ∈ RN1 ×N2 ×N3 is defined by
U_{i₁,i₂,i₃} = ( ∫_{voxel(i₁,i₂,i₃)} u(x) dx ) / ( ∫_{voxel(i₁,i₂,i₃)} 1 dx ).   (4.8)
The discretization of the data f or f δ ∈ Ω0 is determined by the device, since the
measured data are already a discrete version F ∈ RM1 ×M2 ×M3 or F δ ∈ RM1 ×M2 ×M3 . Note
that discrete images are denoted by capital letters. We refer to the discrete operator
as K, too. In the discrete case applying K is equivalent to a matrix multiplication.
In general, noise can be defined as an undesired distortion in the recorded image. We
can distinguish between intensity errors and sampling errors. Intensity errors can be
seen as a realization of an independent random variable acting on each pixel separately.
Sampling errors are on the contrary influenced by surrounding pixels as well. Here, we
only concentrate on intensity errors, which can be roughly divided into three different
kinds:
1. Additive Noise Let U be the discrete image and δ = (δijk )ijk a matrix of the same
size as KU containing realizations of independent and identically distributed (i.i.d.)
random variables in each entry. If the recorded data F δ are
F δ = KU + δ,
(4.9)
then the noise is called additive noise. δ often contains realizations of Gaussian random
variables, in this case we call it additive Gaussian noise. Alternatively, δ can for
example contain realizations of Laplacian or uniformly distributed random variables.
2. Multiplicative Noise Using the same notations as before, if
F^δ = (F^δ_{ijk})_{ijk} = (KU_{ijk} δ_{ijk})_{ijk} = KU · δ,   (4.10)
then the noise is called multiplicative noise. In this case all realizations of random
variables contained in δ have to be positive. For example, δ can contain realizations of
i.i.d. Gamma distributed random variables.
3. Data-Dependent Noise If the noise is neither additive nor multiplicative but
dependent on the measured intensity, meaning
F δ = δ(KU ),
(4.11)
then the noise is data-dependent. Commonly used models are Poisson noise and Salt-and-Pepper noise. Poisson noise is often used to model the errors produced by photon
counting CCD sensors. Consider a two-dimensional sensor plane, where each sensor
(i, j) corresponds to its position xij in the sensor plane. Then, each sensor (i, j) counts
the incoming photons at position xij . The resulting number of photon counts δ(KU )ijk
can only be positive and one assumes that they are realizations of Poisson distributed
random variables with mean KUijk .
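The three intensity-error models can be simulated directly; the following short sketch uses NumPy random number generators with arbitrary example parameters (a Gamma distribution with mean one for the multiplicative case) and is only meant to make the definitions concrete.

import numpy as np

rng = np.random.default_rng(0)
KU = np.full((4, 4), 100.0)                                      # noise-free data KU

F_additive = KU + rng.normal(0.0, 5.0, KU.shape)                 # (4.9), Gaussian delta
F_multiplicative = KU * rng.gamma(50.0, 1.0 / 50.0, KU.shape)    # (4.10), Gamma delta with mean 1
F_poisson = rng.poisson(KU).astype(float)                        # (4.11), Poisson with mean KU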
MAP Estimator One idea to find a data discrepancy term is to use a-priori information about the noise model. If U and F δ are seen as random variables, we can make
use of the a-posteriori probability given by Bayes formula as
P(U | F^δ) = P(F^δ | U) P(U) / P(F^δ).   (4.12)
Here,
• P(U |F δ ) is the probability that the measured data F δ were generated from the
true image U ,
• P(F δ |U ) is the probability that the data F δ are measured for a given image U
and
• P(U ), P(F δ ) are the a-priori probabilities for U and F δ respectively.
By maximizing the a-posteriori probability we can construct an estimator for U , called
the maximum a-posteriori probability (MAP) estimator, by
Û = arg max_U P(U | F^δ) = arg max_U P(F^δ | U) P(U).   (4.13)
The second equality follows from Bayes formula (4.12).
P(F δ ) can be neglected, since it is independent of U and therefore has no influence on
Û . Now, instead of maximizing the probability P(U |F δ ) we can minimize the negative
logarithm of the probability.
Û = arg min_U { − log P(U | F^δ) } = arg min_U { − log P(F^δ | U) − log P(U) }.   (4.14)
Defining D(F δ , U ) = − log P(F δ |U ) and αR(U ) = − log (P(U )) we get a variational
model like in (4.2).
Additive Gaussian Noise Now, assume that F contains additive Gaussian noise δ
with E(δ) = 0 and Var(δ) = σ 2 , i.e.
F^δ_{ijk} = KU_{ijk} + δ_{ijk}   ⇔   F^δ_{ijk} − KU_{ijk} = δ_{ijk},   (4.15)
with a realization δijk of δ. Moreover, we assume a Gibbs model for the a-priori
probability of U . In this case, the random variables U and F δ are continuous and
therefore we replace the probability P by a probability density function ρP . Then, the
probability that the measured data F δ were generated from the true image U is equal
to
ρ_P(F^δ | U) = ∏_{ijk} ρ_P( δ_{ijk} = F^δ_{ijk} − KU_{ijk} ) = ∏_{ijk} (1/(σ√(2π))) e^{ −(F^δ_{ijk} − KU_{ijk})² / (2σ²) }.   (4.16)
Here, each factor of the last product is the probability density function of a normal distribution. Since intensity
errors act on each pixel separately, all δijk are independent and we can factorize the
probability per voxel. We assume a Gibbs distribution for the a-priori probability of
U , i.e.
ρP (U ) = c · e−βE(U ) .
(4.17)
Inserting (4.16) and (4.17) into (4.14) results in
Û = arg min_U { Σ_{ijk} ½ ( F^δ_{ijk} − KU_{ijk} )² + σ²β E(U) },   (4.18)
where we abbreviate α := σ²β.
By scaling (4.18) with the number of pixels (M1 · M2 ) and data acquisition directions
(M3 ) and assuming M1 · M2 · M3 → ∞ we get a continuous limit for the first term
(1/(M₁ · M₂ · M₃)) Σ_{ijk} ½ ( F^δ_{ijk} − KU_{ijk} )²   →   ½ ∫_{Ω0} ( f^δ − Ku )² dx.   (4.19)
Thereby we obtain an asymptotic variational model for continuous functions
û = arg min_u { ½ ∫_{Ω0} ( f^δ − Ku )² dx + αR(u) },   (4.20)
where R(u) is the asymptotic limit of E(U). Now, we can see that in the case of additive Gaussian noise a good choice for the data discrepancy term is
D(f^δ, Ku) = ½ ∫_Ω ( Ku − f^δ )² dx.   (4.21)
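As a small numerical sanity check (with arbitrary synthetic data), the following sketch verifies that the Gaussian negative log-likelihood differs from the quadratic data term in (4.18) only by a constant that is independent of U, which is why the quadratic term alone can be minimized.

import numpy as np

rng = np.random.default_rng(1)
KU = rng.random((8, 8))
sigma = 0.1
F = KU + rng.normal(0.0, sigma, KU.shape)

# Full negative log-likelihood of the Gaussian model, summed over all pixels.
neg_log_lik = np.sum((F - KU)**2 / (2 * sigma**2) + np.log(sigma * np.sqrt(2 * np.pi)))
# Quadratic data term plus the U-independent constant.
quadratic = np.sum((F - KU)**2) / (2 * sigma**2)
const = F.size * np.log(sigma * np.sqrt(2 * np.pi))
assert np.isclose(neg_log_lik, quadratic + const)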
Poisson Noise As done before for additive Gaussian noise we now want to derive
an appropriate model for the case that F δ is Poisson distributed with mean KU . In
TEM imaging the actual noise is a mixture of additive Gaussian and Poisson noise,
resulting in a rather complex model (presented hereafter). Our focus will be to use a
data discrepancy term that properly accounts for the Poisson stochasticity of the data.
Again, we assume a Gibbs model for the a-priori probability of U . Since Poisson noise
is also acting on each pixel separately, the probability of F^δ given U is
P(F^δ | U) = ∏_{ijk} P(F^δ_{ijk} | U_{ijk}) = ∏_{ijk} ( (KU_{ijk})^{F^δ_{ijk}} / (F^δ_{ijk})! ) e^{−KU_{ijk}}.   (4.22)
The a-priori probability of U is still
P(U ) = c · e−βE(U ) .
Then, the MAP estimator is again given by
Û = arg min_U { − log P(F^δ | U) − log P(U) }.   (4.23)
Using equation (4.22) we get
log P(F^δ | U) = log ∏_{ijk} ( (KU_{ijk})^{F^δ_{ijk}} / (F^δ_{ijk})! ) e^{−KU_{ijk}} = Σ_{ijk} ( F^δ_{ijk} · log(KU_{ijk}) − log( (F^δ_{ijk})! ) − KU_{ijk} ).   (4.24)
Since log( (F^δ_{ijk})! ) is independent of U, we can neglect this term when inserting (4.24) into (4.23), and the resulting estimator is
Û = arg min_U { Σ_{ijk} ( − F^δ_{ijk} · log(KU_{ijk}) + KU_{ijk} ) + αE(U) }.   (4.25)
Once again we scale (4.25) by the number of pixels and data acquisition directions and
assume them to become infinitely large, thereby obtaining an asymptotic model
û = arg min_u { ∫_{Ω0} ( Ku − f^δ · log(Ku) ) dx + αR(u) }.   (4.26)
Hence, in the case of Poisson distributed data an adequate choice for the data discrepancy term is
D(f^δ, Ku) = ∫_{Ω0} ( Ku − f^δ · log(Ku) ) dx.
In order to ensure positivity, D(f^δ, Ku) is often changed to
D_KL(f^δ, Ku) = ∫_{Ω0} ( f^δ log( f^δ / (Ku) ) − f^δ + Ku ) dx ≥ 0   (4.27)
and called Kullback-Leibler (KL) divergence or I-divergence. Since the added part is
independent of u, this does not affect the solution of (4.26).
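A discrete version of the KL data term (4.27) is easy to evaluate; the following sketch assumes strictly positive data and strictly positive Ku so that the logarithm is well defined, and only illustrates the definition and its basic properties.

import numpy as np

def kl_divergence(f_delta, Ku):
    # Discrete Kullback-Leibler data term (4.27); both arguments must be positive arrays.
    return np.sum(f_delta * np.log(f_delta / Ku) - f_delta + Ku)

# Sanity checks: the divergence is nonnegative and vanishes for Ku = f_delta.
f = np.array([3.0, 1.0, 7.0])
assert np.isclose(kl_divergence(f, f), 0.0)
assert kl_divergence(f, f + 0.5) >= 0.0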
Mixture of Poisson and Gaussian Noise Many applications in the field of imaging
are neither affected by pure Gaussian noise nor by pure Poisson noise. Often there
is a mixture of data-dependent noise sources and data-independent ones. We want to
present an approach to account for both noise models instead of approximating these
situations by either Poisson or Gaussian noise. An important application is imaging
with a CCD camera (cf. [76]), which is common in a TEM (see Chapter 2). Here we can
model the photon counting process as a Poisson distributed random variable and the
read-out noise, introduced by the detector, as a Gaussian distributed random variable
(cf. section 3.1.1). Other applications are fluorescence imaging or images acquired
with certain telescopes. Accurate models for this noise concept recently gained more
and more attention in literature, see for example [47, 52, 13, 36]. The proposed noise
model is
F^δ_{ijk} = Z_{ijk} + δ_{ijk},   (4.28)
where Zijk is a realization of a Poisson distributed random variable Z ∼ Poisson(KU )
and δijk the realization of a normal distributed random variable δ ∼ N (0, σ 2 ). Both
random variables are acting on each pixel separately. There are two different approaches to account for (4.28) in the data discrepancy term. One idea is to use an
approximation of the noise statistics. The second idea is to follow the approach presented before, i.e. to determine P(F δ |U ) and its negative logarithm. The probability
of F^δ under the assumption that U is the true image is given by
P(F^δ | U) = ∏_{ijk} P(F^δ_{ijk} | U_{ijk}) = ∏_{ijk} Σ_{n=0}^{∞} ( (KU_{ijk})^n / n! ) e^{−KU_{ijk}} · (1/(σ√(2π))) e^{ −(F^δ_{ijk} − n)² / (2σ²) }.   (4.29)
Hence, the resulting negative logarithm is
− log( P(F^δ | U) ) = Σ_{ijk} − log( Σ_{n=0}^{∞} ( (KU_{ijk})^n / n! ) e^{−KU_{ijk}} · (1/(σ√(2π))) e^{ −(F^δ_{ijk} − n)² / (2σ²) } ).   (4.30)
In [47] a primal-dual splitting algorithm is presented for solving the optimization problem (4.14) with D(f δ , Ku) given by (4.30). Approximations for (4.30) or respectively
its gradient are presented in [36] and [13].
Bregman Distances The concept of Bregman distances was originally introduced
for convex programming [17]. The generalized Bregman distance associated with a functional L( · ) is defined as
D^p_L(u, v) = L(u) − L(v) − ⟨p, u − v⟩   (4.31)
with a subgradient p ∈ ∂L (v) and the duality product h · , · i. The subgradient is a
generalization of the derivative introduced in Definition 4.1.1. The Bregman distance
is not a distance in the classical sense, since
D^p_L(u, v) ≠ D^p_L(v, u),
i.e. it is not symmetric, and the triangle inequality does not hold. Nevertheless,
D^p_L(u, v) ≥ 0   and   D^p_L(u, v) = 0 for u = v.   (4.32)
If L ( · ) is continuously differentiable in v, the Bregman distance can be interpreted
as the difference between L (u) and the first-order Taylor approximation of L ( · )
around v evaluated at u. Figure 4.1 shows two examples that clarify this
interpretation.
If L ( · ) is not continuously differentiable in v the subdifferential ∂L (v) may be multivalued. In this case the Bregman distance cannot be uniquely defined. See Figure 4.2
for an example. Here, L (x) = |x|, u = 1 and v = 0. L is not differentiable in v = 0
and the subdifferential ∂L (0) is given as
∂L (0) = {p | |u| ≥ p · u ∀u} = [−1, 1].
(4.33)
Figure 4.1: Bregman distances for single-valued subdifferentials.
Figure 4.2: Bregman distances for a multi-valued subdifferential.
Hence, p is not uniquely defined. On the left-hand side of Figure 4.2 we chose p = 1/2 and on the right-hand side p = −2/3. It is easy to see that the Bregman distance varies
strongly for different choices of p. Hence, if we are using the Bregman distance for a
functional that is not Fréchet-differentiable, we have to address the problem of choosing
the ”correct” p.
Bregman distances can be useful for data discrepancy terms as well as for regularization
terms (see paragraph Bregman-TV Regularization in section 4.1.3). Many distance
measures, like for example the squared Euclidean distance or the KL divergence (4.27)
can be associated with Bregman distances. In the following paragraph, we will give
a short overview of different distance measures and their interpretation as Bregman
distances. Note that both the data discrepancy and the regularization functional should
be convex. Therefore, it is essential that the Bregman distance is convex in the first
argument if L is convex.
Distance Measures Another way of choosing a suitable data discrepancy term is to
rely on different distance measures. This means that the term should not be chosen
according to certain assumptions about the noise and a MAP estimation. Nevertheless,
this may result in the same data discrepancy terms that we stated before. See [27] for
a nonprobabilistic argumentation why least squares or KL divergence approaches are
a good choice. Another approach is to use Bregman distances as a measure of the
goodness of the reconstruction. Then, the discrepancy term is determined by the
functional L that is chosen. In Table 4.1 we present different distance measures (cf.
[6]). All of them can be motivated by a Bregman distance and may be a suitable choice
for a data discrepancy term. The matrix A in the Mahalanobis distance is assumed to be positive definite and should be the inverse of the covariance matrix. See [6] for more examples.

  L(x)                    D^p_L(u, v)                             Divergence
  ‖x‖²                    ‖u − v‖²                                Squared norm distance
  ∫ x log(x) dµ           ∫ ( u log(u/v) − u + v ) dµ             KL divergence
  xᵀ A x                  (u − v)ᵀ A (u − v)                      Mahalanobis distance
  − ∫ log(x) dµ           ∫ ( u/v − log(u/v) − 1 ) dµ             Itakura-Saito distance

Table 4.1: Different Bregman Distances
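The entries of Table 4.1 can be checked numerically. The following sketch evaluates the Bregman distance (4.31) for the squared norm and the entropy functional on small vectors; it is only meant to illustrate the definition and is not part of any reconstruction code.

import numpy as np

def bregman_squared_norm(u, v):
    # L(x) = ||x||^2 with gradient p = 2v, giving D(u, v) = ||u - v||^2.
    return np.sum(u**2) - np.sum(v**2) - np.dot(2 * v, u - v)

def bregman_entropy(u, v):
    # L(x) = sum x log x with gradient p = log v + 1, giving the KL divergence.
    L = lambda x: np.sum(x * np.log(x))
    return L(u) - L(v) - np.dot(np.log(v) + 1.0, u - v)

u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 2.0, 1.0])
assert np.isclose(bregman_squared_norm(u, v), np.sum((u - v)**2))
assert np.isclose(bregman_entropy(u, v), np.sum(u * np.log(u / v) - u + v))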
Further Related Literature on Models Using Poisson Noise In this paragraph we
want to present some further literature on solutions for the inverse problem
f δ = Ku,
(4.34)
where f^δ is affected by Poisson noise. A review of the relevant literature on solving
(4.34) is given in [14], describing and discussing the most frequently used algorithms.
In [11] the authors investigate variational methods with a data discrepancy term based
on a Poisson likelihood functional and total variation regularization. It is verified that
the problem of computing minimizers is well-posed. Furthermore, it is proven that
the resulting minimizers converge to the minimizer of the exact likelihood function if
the errors in the data and in the forward operator tend to zero. In [10] the authors
broaden their work to a class of regularization functionals defined by differential operators of diffusion type. Again, a theoretical analysis of this approach is given. More
computational and algorithmic results are presented for example in [7, 8, 9]. In [70]
a total variation based regularization technique is proposed. The data discrepancy
term is derived via logarithmic a-posteriori probabilities and a MAP estimator and is
given by the KL divergence. In order to prevent the smoothing of the total variation
and to guarantee sharp edges, a dual approach for the numerical solution is used. The
minimization is realized by a two-step iteration based on the expectation-maximization
(EM) algorithm. A detailed explanation and analysis of this method is given in [69].
In addition, several numerical results are presented. In [73] another approach to minimize this energy functional is proposed. In this work the algorithm uses alternating
split Bregman techniques. Another approach to minimize a constrained optimization
problem based on the KL divergence and total variation (cf. (4.42)) or other edge-preserving regularization terms is given in [85]. The authors propose a particular form
of a general scaled gradient projection (SGP) method in order to solve this problem.
Furthermore, a new discrepancy principle for the choice of the regularization parameter is introduced. A criterion to select proper values for the regularization parameter
in the specific case of Poisson noise is given in [15]. An analysis of the minimization
of more general seminorms kD · k under the constraint of a bounded KL divergence
DKL (f δ , K · ) is presented in [79]. Here, D and K are linear operators often representing some discrete derivative and a linear blurring operator respectively. The focus lies
on proving relations between the parameters of KL divergences and constrained and
penalized problems. Since total variation based methods suffer from contrast reduction, an extension to the aforementioned algorithm in [69] is proposed in [19]. The
idea is to use Bregman iterations and inverse scale space methods in order to improve
imaging results. An analysis is given in [20].
4.1.3 Different Regularization Terms
Besides choosing different data terms, one can also adjust the functional defined in
(4.2) by a suitable choice of the regularization term. The regularization should be
chosen in relation to the a-priori information we have, e.g. the information that the
solution should be smooth or have sharp edges. We start this section by discussing
which function space is an appropriate choice for the regularization functional and
afterwards present two different regularizations. This section is based on [18].
Function Space for the Regularization Term We start by stating some of the properties that a reasonable function space should have. First, every reasonable image u should be contained in the function space, while noise should either not be included or be easy to separate from the signal u. Moreover, the function space should be the dual space of another function space, which is useful for proving the existence of a minimizer of (4.2) (cf. section 4.2). Hence, we start with Lebesgue
spaces Lp , which are defined as
L^p(Ω) := { u : Ω → R measurable | ∫_Ω |u|^p dµ < ∞ }   (4.35)
for every real number 1 ≤ p < ∞. For p = ∞ the Lebesgue space is defined as
L^∞(Ω) := { u : Ω → R measurable | ess sup_{x∈Ω} |u(x)| < ∞ }   (4.36)
with Ω ⊂ R3 and ess sup is the essential supremum (see for example [71, p.291] for
a definition). Every Lebesgue space Lp with p ≥ 1 is a Banach space with norm
‖u‖_{L^p} = ( ∫_Ω |u|^p dµ )^{1/p} or respectively ‖u‖_∞ = ess sup_{x∈Ω} |u(x)|. For p = 2, L^p is not only a Banach space but also a Hilbert space with scalar product
⟨u, v⟩_{L²} = ∫_Ω u · v dµ.   (4.37)
The advantages of Lebesgue spaces are that every reasonable image is contained in Lp
and for 1 < p ≤ ∞ Lp is a dual space, since
L^p(Ω) = ( L^q(Ω) )*   with   1/q + 1/p = 1.   (4.38)
Only L¹ is not a dual space. The disadvantage of Lebesgue spaces is that not only the
signal u but also noise can be contained in Lp . One example is Gaussian noise, which
is contained in L2 . Thereby we cannot distinguish between signal and noise. The idea
is to reduce the function spaces in order to exclude noise. An obvious consideration
are Sobolev spaces W 1,p (Ω) defined as
W^{1,p}(Ω) = { u ∈ L^p(Ω) | ∫_Ω |∇u|^p dµ < ∞ }.   (4.39)
Therefore, W 1,p (Ω) is a subspace of Lp (Ω) with the constraint that not only the function
u itself but also its gradient ∇u has to be p-integrable. Hence, noise, if present, leads to
much higher values of the norm, making it easier to differentiate between signal and
noise. The main drawback of Sobolev spaces is that, especially in the case of p > 1,
they are too restrictive to include every reasonable image. This can be seen in the
following two Lemmata, proven in [18, Chap. 4.2].
Lemma 4.1.3. Let u ∈ W 1,p (Ω) with p > 1 and Ω ⊂ R1 . Then, u is continuous.
In order to prove this lemma, one verifies the Hölder condition. It can be generalized
to higher dimensions using the same approach for the proof.
Lemma 4.1.4. Let D ⊂ Ω be a domain with C¹-boundary. Then, the function
u(x) = 1 for x ∈ D,   u(x) = 0 else,   (4.40)
is not in W^{1,p}(Ω) for p ≥ 1.
That means, for every p > 1 only continuous functions are permitted in W^{1,p} and therefore no discontinuities like edges are allowed. As a result, W^{1,p} for p > 1 is not a reasonable choice for the function space. But even for p = 1 no piecewise constant functions are permitted, which are essential for the discretization. Another drawback is again that W^{1,1} is not a dual space. Consequently, the reduction of L¹ to W^{1,1} was too restrictive and we need to enlarge our choice in a way that every reasonable image
u is contained in the function space and that it is a dual space. For that reason we
introduce the space of functions of bounded (total) variation BV (Ω) defined as
BV(Ω) = { u ∈ L¹(Ω) | |u|_BV < ∞ }   (4.41)
with the Total Variation (TV) given by (cf. [1])
TV(u) = |u|_BV := sup_{ϕ ∈ C₀^∞(Ω;R³), ‖ϕ‖_∞ < 1} ∫_Ω u ∇·ϕ dµ.   (4.42)
BV (Ω) is a Banach space with norm
‖u‖_{BV(Ω)} := ‖u‖_{L¹} + |u|_{BV(Ω)}.   (4.43)
For every function u ∈ W^{1,1} it holds that
|u|_{BV(Ω)} = ∫_Ω |∇u| dµ = ‖∇u‖_{L¹}.   (4.44)
Thus, it is easy to conclude that ‖u‖_{W^{1,1}} = ‖u‖_{BV} for u ∈ W^{1,1}, and therefore W^{1,1}(Ω) is
a subset of BV (Ω). But in addition piecewise constant functions, like the one presented
in Lemma 4.1.4, are included in BV (Ω) as well, since
|u|_BV = sup_{ϕ ∈ C₀^∞(Ω;R³), ‖ϕ‖_∞ < 1} ∫_Ω u ∇·ϕ dµ
       = sup_{ϕ ∈ C₀^∞(Ω;R³), ‖ϕ‖_∞ < 1} ∫_D 1 ∇·ϕ dµ
       = sup_{ϕ ∈ C₀^∞(Ω;R³), ‖ϕ‖_∞ < 1} ∫_{∂D} ϕ · n dσ
       = ∫_{∂D} 1 dσ = |∂D| < ∞
with u defined as in (4.40). Consequently, as long as ∂D has finite Hausdorff measure,
the total variation of the piecewise constant function is finite, too. In the case of a
simple curve, the Hausdorff-measure corresponds to the length of the curve. Therefore,
|u|BV with u defined as in Lemma 4.1.4 corresponds to the length of the curve as well.
Hence, reducing the total variation of u is equivalent to smoothing the curve. Another
advantage of this function space is that BV (Ω) is the dual space of another function
space (see [18, Chap 5.4] for more details). We can conclude that BV (Ω) is a reasonable
choice for the function space of the regularization functional.
TV Regularization Now, the idea is not only to seek smooth functions in the space of functions of bounded total variation but also to use the total variation as the
regularization functional. Thus, R(u) in (4.2) is
R(u) = |u|BV (Ω)
(4.45)
with |u|BV defined as in (4.42). As already mentioned before, minimizing |u|BV has
a smoothing effect. In the case of a denoising problem, i.e. K = I is the identity
operator, and Gaussian Noise the variational model is
J(u) = ½ ∫_Ω ( u − f^δ )² dx + α|u|_{BV(Ω)},   (4.46)
which is the well-known Rudin-Osher-Fatemi model, often referred to as the ROF model
(cf. [64]). In this thesis, we concentrate on the model using the KL divergence as a
Figure 4.3: Contrast loss for 1D signal recovered with TV regularization and different
regularization parameters.
data discrepancy term together with the total variation as a regularization, i.e.
J(u) = ∫_{Ω0} ( f^δ log( f^δ / (Ku) ) − f^δ + Ku ) dx + α|u|_{BV(Ω)}   →   min_{u ∈ BV(Ω), u ≥ 0}.   (4.47)
A regularization with the total variation of the reconstruction u succeeds in reconstructing sharp edges. With (4.44) it is easy to see that minimizing the total variation is
equivalent to the assumption that the gradient has a sparse representation. As a consequence piecewise constant functions are preferred. Hence, an effect of TV-regularization
is cartoon-like images, where major structures like edges are reconstructed while small
scale textures vanish. Therefore, for reconstructing blocky images, minimizing the total variation is a good choice. If the focus is on reconstructing natural images, TV
regularization is not a good choice since small textures tend to not be reconstructed.
Another deficiency is the loss of contrast in the reconstruction compared to the original
image (see Figure 4.3). We will discuss an approach to overcome this problem below.
For the existence of a unique minimizer of (4.2) it is essential that the TV functional
is convex, i.e.
|β · u + (1 − β) · v|BV (Ω) ≤ β · |u|BV (Ω) +(1 − β) · |v|BV (Ω)
(4.48)
for every u, v ∈ BV (Ω) and β ∈ [0, 1] (cf. section 4.2). Since the total variation is not
differentiable in a classical sense, we must rely on subdifferentials in order to obtain
optimality conditions for the minimization problem.
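For illustration, the following sketch minimizes a smoothed version of the ROF functional (4.46) by plain gradient descent for a denoising problem (K = I). The smoothing parameter eps circumvents the non-differentiability of the total variation, so this is only an approximation of the exact variational model; step size and iteration numbers are arbitrary choices.

import numpy as np

def denoise_rof_smoothed(f, alpha=0.1, eps=1e-3, tau=0.1, num_iter=200):
    u = f.copy()
    for _ in range(num_iter):
        # Forward differences of u (last row/column repeated as a crude Neumann boundary).
        ux = np.diff(u, axis=1, append=u[:, -1:])
        uy = np.diff(u, axis=0, append=u[-1:, :])
        norm = np.sqrt(ux**2 + uy**2 + eps**2)
        px, py = ux / norm, uy / norm
        # Discrete divergence of (px, py) via backward differences (adjoint of forward differences).
        div = np.diff(px, axis=1, prepend=0.0) + np.diff(py, axis=0, prepend=0.0)
        # Gradient of 1/2||u - f||^2 + alpha * sum sqrt(|grad u|^2 + eps^2).
        grad = (u - f) - alpha * div
        u = u - tau * grad
    return u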
Bregman-TV Regularization Since reconstructions with TV as a regularization term
suffer from contrast loss (cf. [57]), we want to present an approach to overcome this
problem. It is based on simultaneous contrast enhancement using Bregman distances.
The technique was introduced in [57] with a detailed analysis for problems with squared
norm discrepancy terms. It has been generalized to time-continuity [22], Lp -norm data
discrepancy terms [21] and nonlinear inverse problems [5]. In [19] the approach was
introduced to solve problems with Poisson noise. Finally, in [12] a combination of
Bregman updates with higher-order TV methods is presented.
The idea to overcome contrast reduction is to introduce an iterative regularization by
Bregman distances. Instead of using a regularization functional as before, information
about the solution u that we gained from a prior solution of problem (4.4) is included.
Therefore, problem (4.4) is replaced by a sequence of problems
u_l = arg min_u { D(f^δ, Ku) + α · ( R(u) − ⟨p_{l−1}, u⟩ ) }   (4.49)
with p0 = 0 and pl−1 being an element in the subdifferential of the total variation of
the prior solution ul−1 . In this thesis our focus is on R(u) = T V (u) combined with the
KL divergence, i.e.
u_l = arg min_{u ∈ BV(Ω), u ≥ 0} { ∫_{Ω0} ( f^δ log( f^δ / (Ku) ) − f^δ + Ku ) dx + α( |u|_{BV(Ω)} − ⟨p_{l−1}, u⟩ ) }   (4.50)
with p_{l−1} ∈ ∂|u_{l−1}|_{BV(Ω)}. By using the Bregman distance (4.31), problem (4.49) can be
replaced with
u_l = arg min_u { D(f^δ, Ku) + α D^{p_{l−1}}_{R( · )}(u, u_{l−1}) }
    = arg min_u { D(f^δ, Ku) + α ( R(u) − R(u_{l−1}) − ⟨p_{l−1}, u − u_{l−1}⟩ ) }.   (4.51)
An update strategy for pl can be derived via an optimality condition for the problem
(4.51). Using (4.7) we can conclude that
0 = q_l + α( p_l − p_{l−1} )   with   q_l ∈ ∂D(f^δ, Ku_l),  p_l ∈ ∂R(u_l).   (4.52)
It follows that a suitable update strategy for pl is
p_l = p_{l−1} − (1/α) · q_l.   (4.53)
Hence, according to (4.51), not only the regularization functional R(u) is minimized but
also the distance between u and ul−1 . The idea is that ul−1 is already an approximation
to the optimal solution û. Therefore, ul−1 can be included as a-priori information into
the regularization functional. The solutions of (4.49) and (4.51) correspond to each
other, since the added parts in (4.51) are independent of u and do not influence the
solution ul .
The iteration strategy can be roughly summarized as follows: We start with an overregularized solution u1 for problem (4.4). Then, subsequently, information that was
already filtered out as noise is added back to the reconstruction. Since the first image
was overregularized, this information still contains a lot of texture information. In
each iteration step more information is included but more noise as well. Since larger
scales converge faster than smaller ones, the method needs to be stopped at a suitable
time. Then, large scale features may already be incorporated into the reconstruction,
while small scale features, like noise, may be still missing. If the iteration procedure is
not stopped, it converges to the noisy image f δ and the total variation could become
unbounded. Therefore, we need a stopping criterion. If we have a reliable estimation
of the noise level δ, we can use for example a discrepancy principle, i.e. we stop the
method if ‖u_l − f^δ‖ ≤ δ. If the noise level is unknown, another idea is to stop when
ul+1 is noisier than ul . Since methods like this one can be seen as the opposite of scale
space methods, they are called inverse scale space methods (cf. [22]).
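Structurally, this iterative regularization can be written as a short loop. The sketch below does so for the quadratic data term ½‖u − f‖² (denoising, K = I), where the subgradient update (4.53) reduces to adding the residual back to the data; it reuses the smoothed ROF routine sketched earlier and is not the KL-based algorithm developed in this thesis. The number of Bregman steps stands in for a proper stopping rule.

import numpy as np

def bregman_tv(f, alpha, num_bregman=5, **rof_kwargs):
    # Assumes the function denoise_rof_smoothed from the earlier sketch is available.
    u = np.zeros_like(f)
    alpha_p = np.zeros_like(f)               # stores alpha * p_l, with p_0 = 0
    for _ in range(num_bregman):
        # arg min_u 1/2||u - f||^2 + alpha*(TV(u) - <p, u>)
        # = arg min_u 1/2||u - (f + alpha*p)||^2 + alpha*TV(u)   (completing the square)
        u = denoise_rof_smoothed(f + alpha_p, alpha=alpha, **rof_kwargs)
        # Update (4.53): p_l = p_{l-1} - q_l/alpha with q_l = u_l - f, i.e. alpha*p_l += f - u_l.
        alpha_p = alpha_p + (f - u)
        # In practice one would stop via a discrepancy principle ||u - f|| <= delta.
    return u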
If R is not continuously differentiable, it is not obvious that the Bregman distance (4.31)
can be defined for arbitrary u and v, since ∂R(v) may be empty or multivalued (see
Figure 4.2). In [57] it is proven that a multivalued version of the Bregman distance is
no problem, since the algorithm selects a unique gradient. Bregman-TV regularization
leads to contrast improvement compared to standard TV regularization (cf. [57]).
Another question is the well-definedness of the algorithm. For this, we need the existence of a minimizer u_l in (4.51) and of the subgradient p_l. The well-definedness for D(f^δ, Ku) = ½‖f^δ − Ku‖² with a linear operator K and R(u) = |u|_{BV(Ω)} is proven in
[57]. For another choice of the regularization functional R(u) the results can be easily
generalized under the assumption that R(u) is a locally bounded, convex, nonnegative
regularization functional defined on a Banach space. The generalization to other fitting
functionals D(f δ , Ku) is far more complicated. In the case where the data discrepancy
term is a KL divergence, see [20] for an analysis. Moreover, several applications demonstrate the success of the iterative regularization procedure for reconstruction problems
with Poisson noise (cf. [69],[19]). See section 4.2 for more details.
Further Related Literature on Regularization Methods in the field of ET In the
previous sections, a model to solve the inverse problem in ET based on variational
methods is presented. Apart from this, there have been several approaches to apply
variational methods to the inverse problems in ET (cf. for example [75], [65], [41]).
Nearly all of these methods assume additive Gaussian noise and therefore choose the
data discrepancy term as D(f, Ku) = kKu − f k2 or its weighted version, the Mahalanobis distance (cf. paragraph Distance Measures in section 4.1.2). The approaches
differ in particular by the choice made for the regularization functional R(u). Mainly,
three different kinds of regularization functionals have been applied: entropy regularization, TV regularization and sparsity promoting regularization.
Entropy Regularization The idea is to choose the regularization functional R(u) as
an entropy. One example for applications in ET is given in [75]. Here, the regularization
functional is the KL divergence
R_ρ(u) = ∫_Ω ( u log( u/ρ ) − u + ρ ) dx   (4.54)
with a fixed prior ρ ∈ U. Note that in our model the KL divergence is used as a data
discrepancy term, whereas it can also be used as a regularization term as presented
here. The data discrepancy term in [75] is a Mahalanobis distance. The regularization
parameter is based on an estimate of the data error, which is unreliable in the case
of highly noisy ET data. An extension and mathematical analysis of the previous
approach is given in [67]. Here, an iterated regularization scheme is used to update the
estimate of the data error and the nuisance parameters during the reconstruction.
TV Regularization In the ET community there have been different approaches for
regularization functionals based on the total variation, comparable to our model. Most
of them use the formal definition of TV presented in (4.44), i.e.
R(u) = ∫_Ω |∇u| dx.
The first approach with applications in ET is presented in [2] with an anisotropic
variant of TV. In [41] TV-regularization is applied on STEM data of specimens from
material sciences. A drawback in both approaches is that data has to be preprocessed,
since the forward operator is assumed to be the ray transform. An approach that
works with a forward operator modelling amplitude contrast as well as phase contrast
for any parallel beam geometry is presented in [65]. We will explain this approach in
more detail later on and use it as a reference method for our results (cf. Chapter 6 and
Chapter 7). Here, the proposed regularization functional is not only the total variation
but
Z
1q
Z
p1
q
p
+β
(4.55)
|u| dx .
R(u) = α
|∇u| dx
Ω
Ω
With p = 1 and β = 0 this coincides with the normal TV regularization.
Sparsity Promoting Regularization The basic assumption for sparsity promoting
regularization is that most signals are sparse in some suitable representation. Then,
the regularization functional is
R(u) = kρ(u)kL1
(4.56)
where ρ is a sparsifying map for u. One example is ρ(u) = ∇u, which coincides with
TV regularization, assuming that the gradient is sparse. Another common approach
P
cj ϕj
is to assume that there is a dictionary {ϕj }∞
j=0 and u has a representation u =
with cj = 0 for as many j’s as possible. Then, ρ(u) = {cj }j , penalizing every cj that
is not zero. This can allow for recovering signals from relatively few and highly noisy
data. An approach for applications on STEM data is presented in [16].
4.2 Existence and Uniqueness of a Minimizer
In this section we want to outline how to proof the existence and uniqueness of a
minimizer of
J(u) = D(f δ , Ku) + αR(u) → min .
u∈U
Our approach follows the idea in [18]. Besides presenting a general framework, we focus
on the specific case that R(u) = |u|BV (Ω) . For the data discrepancy term D(f δ , Ku)
we consider either the L2 -norm 21 kKu − f δ k2 or the KL divergence DKL (f δ , Ku). In
each step, we start with the simpler case of the L2 -norm before we move on to the
more challenging case of the KL divergence, which we use in our model (cf. (4.47)).
We recall the approaches in [23] and [69], respectively. A special focus will be on the
question whether the approaches need to be changed in the case of an affine forward
operator Ku = C − Lu, where L is linear. We base our analysis on the weak-∗ topology
on BV (Ω) (cf. [48, Chap. 10.3]).
4
Inverse Problems and Variational Methods
55
Definition 4.2.1 (Weak-∗ Convergence). Let U be a Banach space, U ∗ its dual space.
Then, νk converges to ν in the weak-∗ topology on U ∗ if
hνk , ui → hν, ui
∀u ∈ U.
(4.57)
We write νk *∗ ν.
For proofing the existence of a minimizer we want to make use of the fundamental
theorem of optimization:
Theorem 4.2.2 (Fundamental Theorem of Optimization). Let J : (U, τ ) → R ∪ {∞}
be a functional on a topological space U with a (locally convex) topology τ that fulfills
the following two assumptions:
1. Lower Semicontinuity: for uk → u in topology τ holds J(u) ≤ lim inf J(uk ).
k
2. Compactness of Sub-Level Sets: ∃α ∈ R, so that Sα := {u ∈ U : J(u) ≤ α} is
nonempty and compact in τ .
Then, there exists a minimum û ∈ U, i.e. J(û) ≤ J(u) ∀u ∈ U.
In order to use this theorem, we need to proof that both assumptions are fulfilled. As
already stated before we want to find a minimum in the space of functions of bounded
variation, so U = BV (Ω). It is essential that BV (Ω) is the dual space of another
Banach space, in order to use the weak-∗ convergence.
Coercivity We will start with the compactness of the sub-level sets. Therefore, we
use the following theorem (cf. [48, Chap. 12, Theorem 3]):
Theorem 4.2.3 (Theorem of Banach-Alaoglu). Let U be the dual space of some Banach
space Z. Then, each bounded set in U is precompact in the weak-∗ topology.
That means, if we can show that the sub-level sets are bounded, we can conclude
their compactness. According to the following definition the functional is then called
BV -coercive (cf. [69, Definition 4.6]).
Definition 4.2.4 (BV-Coercitivity). A functional J defined on L1 (Ω) is BV -coercive,
if the sub-level sets of J are bounded in the k · kBV (Ω) norm, i.e. for all α ∈ R≥0 the set
{u ∈ L1 (Ω) : J(u) ≤ α} is uniformly bounded in the BV norm; or equivalent
J(u) → +∞
whenever
kukBV (Ω) → +∞.
(4.58)
4
Inverse Problems and Variational Methods
56
The concepts to verify the boundedness of the sub-level sets differ depending on the
data discrepancy term.
We start with a sketch of the idea presented in [23]. Here, the minimization problem
is
1
(4.59)
J(u) = kKu − f δ k2 + α|u|BV (Ω) → min
u∈BV (Ω)
2
with α > 0. The subspace of functions with bounded variation with mean zero is
defined by
Z
(4.60)
BV0 (Ω) = u ∈ BV (Ω)| u dx = 0 .
Ω
Furthermore we assume that K1 6= 0 and hKv, K1i = 0 ∀v ∈ BV0 (Ω), where 1 is the
constant function with value 1 everywhere. Both assumptions match with the forward
operator presented in Chapter 3. Now, every minimizer û of (4.59) can be decomposed
in the form
hf δ , K1i
û = v +
1
(4.61)
kK1k2
with v ∈ BV0 (Ω). It holds that |û|BV (Ω) = |v|BV (Ω since the total variation of a constant
function is zero. Hence, problem (4.59) can be reduced to a minimization over BV0 (Ω)
by accounting for (4.61) and the specific form of the forward operator Ku = C − Lu:
1
kKu − f δ k2 + α|u|BV (Ω)
2
2
δ
hf
,
K1i
1
δ
+ α|v|BV (Ω)
Kv
+
K1
−
f
= 2
kK1k2
2
δ
1
hf
,
K1i
δ
+ α|v|BV (Ω)
= C − Lv +
K1
−
f
2
kK1k2
2
1
= Lv − f˜ + α|v|BV (Ω) → min
v∈BV0 (Ω)
2
(4.62)
δ
,K1i
δ
with f˜ = C + hfkK1k
2 K1 − f . In [23, Proposition 3.4.] the authors proof that BV0 (Ω)
is the dual space of another Banach space, i.e. the assumption in the theorem of
Banach-Alaoglu is fulfilled. Moreover, they verify that the total variation | · |BV (Ω)
is an equivalent norm on BV0 (Ω). The total variation is bounded on the sub-level
sets Sα := {u ∈ U : J(u) ≤ α}. Thus, we can conclude that the sub-level sets are
bounded in the BV -norm and that J is BV -coercive. It follows with the theorem of
Banach-Alaoglu that the sub-level sets are compact in the weak-∗ topology.
Next, we want to focus on the idea presented in [69]. Here, the minimization problem
4
Inverse Problems and Variational Methods
57
is the following:
J(u) = DKL (f δ , Ku) + α|u|BV (Ω)
→
min
(4.63)
u∈BV (Ω),
u≥0
with α > 0. DKL : L1 (Ω0 ) × L1 (Ω0 ) → R≥0 ∪ {+∞} is the KL functional given by
Z ϕ
− ϕ + ψ dν
DKL (ϕ, ψ) =
ϕ log
ψ
Ω0
(4.64)
for all ϕ, ψ ≥ 0 a.e., where ν is a measure. With the convention 0 log(0) = 0 the
integrand in (4.64) is nonnegative and vanishes if and only if ϕ = ψ. Again we assume
that K1 6= 0. Since BV (Ω) ⊂ L1 (Ω), the set of admissible solutions can be extended
from BV (Ω) to L1 (Ω) with |u|BV (Ω) = +∞ if u ∈ L1 (Ω)\BV (Ω). Hence, solutions with
bounded total variation are still preferred. Now, the idea in [69] is to derive an estimate
of the form
kukBV (Ω) = kukL1 (Ω) + |u|BV (Ω) ≤ c1 (J(u))2 + c2 J(u) + c3 ,
(4.65)
with c1 ≥ 0, c2 > 0 and c3 ≥ 0. Then, the coercivity follows directly from the positivity
of J(u) for u ≥ 0 a.e.. Once again, every u ∈ BV (Ω) can be decomposed in the form
R
u=w+v
u dx
|Ω|
Ω
with w =
1
(4.66)
and v = u − w ∈ BV0 (Ω). It holds that
α|v|BV (Ω) = α|u|BV (Ω) ≤ J(u) = DKL (f δ , Ku) +α|u|BV (Ω)
|
{z
}
≥0
and therefore
1
J(u).
(4.67)
α
According to the Poincaré-Wirtinger inequality there exists a constant C1 > 0 with
|v|BV (Ω) ≤
C1
J(u).
α
(4.68)
(C1 + 1)
J(u).
α
(4.69)
kvkL1 (Ω) ≤ C1 |v|BV (Ω) ≤
Thus, we can conclude that
kukBV (Ω) ≤ kwkL1 (Ω) +
4
Inverse Problems and Variational Methods
58
In order to derive an inequality of the form (4.65), an estimate for kwkL1 (Ω) is still
needed. Therefore, one can investigate the L1 distance between Ku and f , using that
kϕ −
ψk2L1 (Ω0 )
≤
4
2
kϕkL1 (Ω0 ) + kψkL1 (Ω0 ) DKL (ϕ, ψ)
3
3
(4.70)
for any nonnegative functions ϕ and ψ in L1 (Ω0 ) (cf. [69, Lemma 4.3]). Now, for the
forward operator Ku = C − Lu we need to adapt the estimations in [69]. We seek an
upper and lower bound for kKu − f δ − Lwk2L1 (Ω0 ) . The resulting estimate is then in
dependence of kLwkL1 (Ω0 ) , which can be reformulated using a constant C2 > 0 with
R
C2 =
Ω0
|L1| dν
|Ω|
and kLwkL1 (Ω0 ) = C2 kwkL1 (Ω) .
The remaining part of the argumentation in [69, Lemma 4.7] can be directly carried
forward even in the case of an affine operator K. It verifies the existence of an estimate
of the form (4.65). Thus, J is BV -coercive and with the theorem of Banach-Alaoglu
we can conclude the compactness of the sub-level sets.
Lower Semicontinuity Next, we want to outline the proof that the functional J is
lower semicontinuous. Since the sub-level sets are compact in the weak-∗ topology,
this topology needs to be used for the lower semicontinuity as well. The sum of two
functionals D(f δ , Ku) and R(u) is lower semicontinuous if both functionals are. Thus,
this property can be verified separately. We need to show that for u ∈ BV (Ω) the
functionals u 7→ |u|BV (Ω) , u 7→ 12 kKu − f δ k2 and u 7→ DKL (f δ , Ku) are weak-∗ lower
semicontinuous. For the first and second functional we follow the approach in [23].
We start with the TV-functional. Therefore, we use the dual definition of the total
variation. Let ϕk ∈ C0∞ (Ω, R3 ) be a sequence with kϕk k∞ ≤ 1. Then,
Z
|u|BV (Ω) = lim
k
u∇ · ϕk dx.
(4.71)
Ω
Now, let un *∗ u. We need to verify that |u|BV (Ω) ≤ lim inf |un |BV (Ω) :
n
Z
Z
u∇ · ϕk dx = lim inf
Ω
n
≤ lim inf
n
Ω
un ∇ · ϕk dx
Z
sup
un ∇ · ϕ dx = lim inf |un |BV (Ω) .
ϕ∈C0∞ (Ω;R3 ),
kϕk∞ <1
Ω
n
4
Inverse Problems and Variational Methods
59
Now, with lim follows
k
Z
|u|BV (Ω) = lim
k
u∇ · ϕk dx ≤ lim inf |un |BV (Ω)
n
Ω
and u 7→ |u|BV (Ω) is weak-∗ lower semicontinuous.
Next, we want to verify the lower semicontinuity for the data discrepancy functional
u 7→ 21 kKu − f δ k2 . We can assume that K is a bounded linear operator on BV (Ω),
since we can shift the data f according to (4.62) if K is affine. Furthermore, we assume
that the range of the adjoint operator K∗ is contained in Z with Z ∗ = BV (Ω). Since
a square preserves lower semicontinuity, it suffices to show that u 7→ 21 kKu − f δ k is
weak-∗ lower semicontinuous. Again, we use a dual characterization, namely
kKu − f δ k = sup hKu − f δ , ϕi
ϕ,kϕk=1
for any Hilbert space norm. Let un *∗ u and let ϕk be a sequence with kϕk k∞ = 1
that fulfills lim hKu − f δ , ϕk i = kKu − f δ k. Then,
k
hKu − f δ , ϕk i =hu, K∗ ϕk i − hf δ , ϕk i
| {z }
∈Z
= lim inf hun , K∗ ϕk i − hf δ , ϕk i
n
≤ lim inf sup hKun − f δ , ϕi
n
ϕ,kϕk=1
= lim inf kKun − f δ k.
n
The weak-∗ lower semicontinuity of u 7→ 12 kKu − f δ k follows again by taking the limit
over k. Now, for a problem of the form (4.59) we can conclude that J(u) is lower
semicontinuous and that Theorem 4.2.2 is applicable.
Finally, for verifying the lower semicontinuity of u 7→ DKL (f δ , Ku) we use the following
property of the KL functional (cf. [69, Lemma 4.3]).
Lemma 4.2.5. For any fixed nonnegative function f ∈ L1 (Ω0 ), the KL functional
u 7→ DKL (f, Ku) is lower semicontinuous with respect to the weak topology of L1 (Ω0 ).
A sketch of the proof based on Fatou’s Lemma is given in [4]. Thus, we can directly
follow that J(u) given by (4.63) is lower semicontinuous in the weak-∗ topology. Again,
Theorem 4.2.2 is applicable.
4
Inverse Problems and Variational Methods
60
In conclusion we have shown that (4.59) for problems with additive Gaussian noise as
well as (4.63) for problems with Poisson noise have a minimizer û. It remains to verify
that the minimizer is unique.
Uniqueness In order to verify the uniqueness of the minimizer, it suffices to show
that the functional J(u) = D(f δ , Ku) + αR(u) is strictly convex. We start with a
definition of convexity.
Definition 4.2.6 ((Strict) Convexity). A function J : U → R is called convex if for
all x1 , x2 ∈ U and λ ∈ [0, 1]
J (λx1 + (1 − λ)x2 ) ≤ λJ(x1 ) + (1 − λ)J(x2 ).
(4.72)
J is called strictly convex if for all x1 6= x2 ∈ U and λ ∈ (0, 1)
J (λx1 + (1 − λ)x2 ) < λJ(x1 ) + (1 − λ)J(x2 ).
(4.73)
A sum of two functionals is strictly convex if both functionals are convex and at least
one of them is strictly convex. Therefore, this property can be again verified separately
for u 7→ |u|BV (Ω) , u 7→ 21 kKu − f δ k2 and u 7→ DKL (f δ , Ku).
The total variation is convex, since for u, v ∈ BV (Ω) and λ ∈ [0, 1]
Z
|λu + (1 − λ)v|BV (Ω) :=
(λu + (1 − λ)v) ∇ · ϕ dx
sup
ϕ∈C0∞ (Ω;R3 ),
kϕk∞ <1
!
Z
≤ λ·
u ∇ · ϕ dx
sup
ϕ∈C0∞ (Ω;R3 ),
kϕk∞ <1
Ω
Ω
!
Z
+ (1 − λ) ·
v ∇ · ϕ dx
sup
ϕ∈C0∞ (Ω;R3 ),
kϕk∞ <1
Ω
=λ|u|BV (Ω) + (1 − λ)|v|BV (Ω) .
Note, if Bregman-TV regularization is used instead of normal TV regularization, the
regularization functional is still convex, because | · |BV (Ω) is convex. In both cases, the
regularization parameter α needs to be positive in order to maintain convexity. Hence,
for uniqueness we have to verify that D̂ : BV (Ω) → R with D̂(u) = D(f δ , Ku) is
strictly convex.
We start with D(f δ , Ku) = 21 kKu − f δ k2 . According to (4.62) we can assume that K is
linear. The quadratic fitting term is convex, but it remains to verify that it is strictly
4
Inverse Problems and Variational Methods
61
convex. The second-order derivative of D̂ in direction v is
d2 D̂(u; v) = kKvk2 .
Thus, as long as K is injective, it holds that kKvk > 0 if v 6= 0. We can conclude that
D̂ is strictly convex.
Next, we want to show that u 7→ DKL (f δ , Ku) is strictly convex, too. According
to [69] the function mapping (ϕ, ψ) 7→ DKL (ϕ, ψ) is convex. For an affine operator
Ku = C − Lu with linear L follows
DKL λϕ1 + (1 − λ)ϕ2 , K λu1 + (1 − λ)u2
= DKL λϕ1 + (1 − λ)ϕ2 , C − L λu1 + (1 − λ)u2
= DKL λϕ1 + (1 − λ)ϕ2 , λ C − Lu1 + (1 − λ) C − Lu2
≤ λ · DKL ϕ1 , C − Lu1 + (1 − λ) · DKL ϕ2 , C − Lu2 .
Hence, with ϕ1 = ϕ2 = f we can conclude that D̂ with u 7→ D̂(u) = DKL (f δ , Ku) is
convex, too. For strict convexity we assume that inf0 f δ > 0, inf Ku > 0 and that K
Ω
Ω
is injective. Then, the strict convexity of D̂ is a consequence of the strict convexity of
the negative logarithm − log( · ).
Overall we have shown that J(u) is strictly convex for both kinds of the data discrepancy term. Together with the existence of a minimizer we can conclude that problem
(4.59) as well as problem (4.63) have a unique minimizer.
Well-Definedness for Iterative Regularization We cover the well-definedness for
variational methods using iterative regularization by taking the example of BregmanTV. If we use the Bregman distance as the regularization functional, the variational
model is
o
n1
kKu − f δ k2 + α |u|BV (Ω) − hpl−1 , ui
(4.74)
ul = arg min
u∈BV (Ω) 2
or respectively
ul = arg min
u∈BV (Ω),
u≥0
nZ Ω0
δ
f log
fδ
Ku
− f δ + Ku dx
o
+ α |u|BV (Ω) − hpl−1 , ui .
(4.75)
4
Inverse Problems and Variational Methods
62
In order to verify the well-definedness of this minimizing procedure, we need to show
that a minimizer ul of (4.74) or (4.75) respectively exists and that we may find a
suitable subgradient pl . In [57, Proposition 3.1] the existence of ul and pl is verified for
problem (4.74). The proof is by induction over l. The existence of p1 and q1 (cf. (5.45))
follows with the argumentation before, that problem (4.59) has a unique minimizer. In
[20, Proposition 1] the existence is shown for a variety of data discrepancy functionals,
in particular the approach is applicable to the KL divergence (4.75). Note that it is
only proven for a linear operator K, whereas for an affine forward operator the proof
may be adapted. Here, the idea is to use convex conjugates (cf. [29, Chap. I]) and the
Fenchel duality theorem (cf. [29, Chap. III]) in order to obtain a dual representation
of problem (4.63). Next, a dual inverse scale space method is derived. Finally, a bidual
formulation of this dual inverse scale space method again yields an iterative procedure
in a primal representation, given by
ul = arg min
u∈BV (Ω),
u≥0
nZ Ku + rl−1 − f δ log Ku + rl−1
o
dx + α|u|BV (Ω) .
(4.76)
Ω0
The update strategy for the residual function is
rl = rl−1 + Kul − f δ ,
with r0 = 0.
(4.77)
Hence, the existence of a minimizer for (4.75) can be traced back to the existence of
minimizers for a shifted version of the original minimization problem (4.63), which was
investigated in the previous paragraphs.
63
5 Numerical Approach
In the previous chapter we presented a method to find an approximation to the solution
of an ill-posed inverse problem. We want to minimize a functional of the form
J(u) = D(f δ , Ku) + αR(u) → min,
u∈U
where U is an infinite-dimensional Banach space. Therefore, we need discretizations
for the continuous functions u and f δ as well as a numerical optimization method to
solve this problem. We start by presenting the most commonly used Gradient Descent
Method and Newton Methods. Afterwards we move on to Splitting Methods and present
the method we use for the implementation.
The methods we are presenting are all based upon the concept of first optimize, then
discretize. That means, first, optimality conditions based on the infinite-dimensional
optimization problem are derived, like for example Karush-Kuhn-Tucker (KKT) conditions. Then, the occurring function spaces and operators are discretized for the algorithmic realization of the minimization strategy. Note that the presented optimization
strategies can also be applied for U = Rn and thus can be used for first discretize, then
optimize methods as well.
5.1 Introduction to Numerical Optimization Methods
5.1.1 Gradient Descent Methods
Assume we want to minimize a functional J : U → R. If J is Fréchet differentiable in
u and U is a Hilbert space, which is dual to itself, a gradient flow in U can be defined
by
∂u
= −J 0 (u),
(5.1)
∂t
5
Numerical Approach
64
where J 0 (u) can be identified with the gradient of J in u. Hence,
∂u
,v
∂t
= −J 0 (u)v
∀v ∈ U.
(5.2)
It follows that
∂
∂u
∂u
(J(u)) = J 0 (u)
= −k k2 ≤ 0.
(5.3)
∂t
∂t
∂t
Thus, we can conclude that, as long as (5.2) is satisfied, the functional J decreases
∂
until ∂t
J(u) = ∂u
= 0 is satisfied. Then, according to (5.1), J 0 (u) is equal to zero as
∂t
well and therefore u is a stationary point. Using a time discretization for (5.1) we get
an iterative scheme, called Gradient Descent Method
uk+1 = uk − σ k J 0 (uk )
= uk + σ k dk (uk ),
(5.4)
where σ k is the stepsize and thus needs to be small. dk (uk ) := J 0 (uk ) is called search
direction. The method ensures that J(uk+1 ) < J(uk ), that means the value of the
objective functional decreases in every step. Nevertheless, it is not guaranteed that the
method converges to a (global) minimum of J. One way to determine the stepsize σ k
is to solve a one-dimensional optimization problem, called exact line search
σ k = arg min J(uk + σdk ).
(5.5)
σ≥0
The solution of this problem is called exact stepsize. Another way to determine σ are
inexact line search strategies (cf. [83, Chap. 3]) that try to find a σ k which guarantees a
sufficient decrease of J without solving (5.5). A drawback of gradient descent methods
are their slow convergence rates. A way to improve them is to use a Conjugate Gradient
Method (cf. [83]). Here, according to [34], the iteration scheme is defined as follows



uk+1 = uk + σ k dk






g k+1 = J 0 (uk+1 )


δ k+1 = kg k+1 k2



k+1


β k = δ δk




dk+1 = −g k+1 + β k dk .
(5.6)
Again, σk can be determined by either exact or inexact line searches. There are also
variants of the algorithm that choose β k in other ways.
5
Numerical Approach
65
5.1.2 Newton Methods
In order to improve the convergence rate of the aforementioned minimization algorithms
from order one to quadratic convergence, we proceed from gradient based methods to
Newton methods. Here, not only the first derivative of J in u is used, but also the second one. Thus, we need to assume that the objective functional J is twice continuously
Fréchet differentiable. Then, the idea of Newton Methods can be motivated as follows:
Consider the second-order Taylor approximation of J around uk in uk+1 = uk + d,
1
M k (uk + d) := J(uk ) + J 0 (uk )d + dJ 00 (uk )d ≈ J(uk+1 ).
2
(5.7)
J 0 (uk ) is the gradient and J 00 (uk ) the Hessian matrix. If J 00 (uk ) is positive definite,
then M k (uk + d) has a unique minimizer dk defined by
0 = (M k )0 (uk + dk ) = J 0 (uk ) + J 00 (uk )dk
⇔ dk = −J 00 (uk )−1 J 0 (uk ).
(5.8)
Thus, we get the following update strategy
uk+1 = uk − J 00 (uk )−1 J 0 (uk ).
(5.9)
In order to obtain the search direction dk without inverting the Hessian matrix, one
solves the linear system J 00 (uk )dk = −J 0 (uk ) instead. A shortcoming of this Newton
method is that convergence is guaranteed only for a prior guess u0 that is sufficiently
close to a minimizer u∗ (cf. [83]). A remedy to this problem can be provided by
introducing a stepsize σ k and replacing (5.9) by a damped Newton method

dk = J 00 (uk )−1 J 0 (uk )
uk+1 = uk − σ k dk .
(5.10)
A drawback of both methods is that computing the Hessian matrix and solving the
linear system may be time-consuming and result in high computational costs. An idea
to circumvent this problem is to use an invertible matrix Ak ≈ J 00 (uk ) instead of the
Hessian matrix. Methods based on this approach are called Newton-like methods.
5
Numerical Approach
66
Remark In Chapter 7, we will partly compare our results to the results of an aforementioned algorithm minimizing the functional
1
J(u) = kKu − f δ k2 + α
2
Z
p
p1
|∇u| dx
Ω
Z
q
1q
|u| dx
+β
.
(5.11)
Ω
The algorithm that is used to minimize this functional is based on the idea of conjugate
gradient methods. The line searches in this optimization algorithm are realized by
Newton methods. We refer to the associated software framework as TVreg.
Disadvantages of Gradient and Newton Methods for TV Regularization Both
gradient descent and Newton methods assume that the objective functional J is Fréchet
differentiable. In case the minimization problem is
J(u) = D(f, Ku) + αR(u) → min
u
with R(u) = |u|BV (Ω) ,
(5.12)
this assumption is violated. There are different approaches to circumvent this problem.
The most common one is to replace the TV functional by a smoothed approximation
(cf. [1])
Z p
Rβ (u) =
|∇u|2 + β dx.
(5.13)
Ω
Rβ (u) is Fréchet differentiable, convex and close to T V (u) = |u|BV (Ω) if β is small.
Thus, optimization methods based on Fréchet derivates can be used. For β → 0
the solution of the pertubated variational problem with the regularization functional
(5.13) converges to the solution of the variational problem with R(u) = |u|BV (Ω) (cf.
[1]). However, one has to take into account that for standard Newton methods the
domain of convergence is very small, especially in the case of β being close to zero.
For a primal-dual approach to solve the pertubated problem with highly improved
convergence results see [26]. Another shortcoming of this approximation is that edges
are smoothed, thus the reconstructions are not as sharp as if we use the exact total
variation as a regularizing functional. In [44] the authors present and analyze a semismooth Newton method to solve (5.12) without a smoothed approximation of the total
variation. It is based on a dual formulation of the primal optimization problem (5.12)
obtained by Fenchels duality theorem. For approaches based on a primal-dual formulation of (5.12) with the exact definition of the total variation that use penalty and
barrier methods see [53, Chap. 4.3]. The optimization method presented in section 5.2
uses the exact definition of TV as well.
5
Numerical Approach
67
5.1.3 Introduction to Splitting Methods
Assume the problem to solve is of the form
0 ∈ A(u) + B(u),
(5.14)
with maximal monotone operators A and B on U. In [50] and [58] an approach to solve
(5.14) is presented. The idea is to rearrange the problem in order to obtain a fixedpoint equation u = T (u). Then, a fixed-point iteration can be used, that converges to
a solution of (5.14) under certain assumptions. For arbitrary σ > 0 holds
0 ∈ A(u) + B(u)
⇔ (I − σA)(u) ∈ (I + σB)(u)
⇔ u ∈ (I + σB)−1 (I − σA)(u).
{z
}
|
:=T (u)
σ
:= (I +σB)−1 is called resolvent. Thus, we have a fixed-point equation
The operator RB
σ
u = RB
(I − σA)(u)
(5.15)
resulting in the following fixed-point iteration
σ
uk+1 = RB
(I − σA)(uk ).
(5.16)
Assumptions ensuring the convergence of (5.16) to a solution of (5.15) are for example
provided by the Banach fixed-point theorem. The iteration scheme can be rewritten
as a two-step iteration
 1
 uk+ 2 −uk + A(uk ) = 0
σ
(5.17)
 uk+1 −uk+ 21 + B(uk+1 ) = 0.
σ
1
This is equivalent to (5.16), since uk+ 2 = uk − σA(uk ) = (I − σA)(uk ) and
1
uk+1 = uk+ 2 − σB(uk+1 ) = (I − σA)(uk ) − σB(uk+1 )
⇔ (I + σB)(uk+1 ) = (I − σA)(uk ) ⇔ (5.16).
The two-step iteration scheme in (5.17) is called Forward-Backward-Splitting (FBS)
algorithm, since the first step is a forward step on A, whereas the second step is a
backward step on B.
5
Numerical Approach
68
Advantages of the FBS algorithm The main advantages of the FBS algorithm are
the simplicity of the first step as well as its modular structure. The first step can be
realized by simply applying operators. Due to the modular structure of the algorithm
it is much easier to generalize it to other applications where either A or B has to be
changed. Especially a change of A can be implemented without much effort. This is
important in the context of variational methods where A represents the subdifferential
of the data discrepancy term and B the subdifferential of the regularization term.
Thus, for different data discrepancy terms, only a small part of the algorithm needs to
be changed. Another advantage of the modular structure is that one part can be for
example parallelized, whereas the second part remains the same.
5.2 Bregman-FB-EM-TV Algorithm
In the following section we will present an optimization strategy to solve the minimization problem
J(u) =
Z δ
f log
Ω0
fδ
Ku
− f + Ku dx + αR(u) →
δ
min ,
(5.18)
u∈BV (Ω),
u≥0
where K is an affine operator of the form Ku = C − Lu. We start with an algorithm
for R(u) = 0, proceed with the case R(u) = |u|BV (Ω) and finally present an algorithm
for iterative regularization based on Bregman distances presented in Chapter 4, i.e.
R(u) = |u|BV (Ω) − hpl−1 , ui. All algorithms are based on the optimality condition (cf.
Lemma 4.1.2)
0 ∈ ∂J(u),
with the subdifferential defined as in Definition 4.1.1. In the previous chapter it was
already proven that J(u) has a unique minimizer characterized by the presented optimality condition. This section is based upon [69], although we adapt the algorithms
presented there in order to be suitable for an affine forward operator.
5.2.1 EM Algorithm
In the previous chapter we have shown how variational models can be motivated by
stochastic noise models. The data discrepancy term DKL (f δ , Ku) depends on the
constrained probability p(u|f δ ), whereas the regularization is motivated by an a-priori
5
Numerical Approach
69
probability p(u) that accounts for a-priori information about the unknown solution u.
If we assume that each u ∈ U has the same a-priori probability, then p(u) is constant.
Thus, the regularization functional R(u) is equal to 0 and the minimization problem
reduces to
δ
Z f
δ
δ
− f + Ku dx → min .
(5.19)
f log
u∈BV (Ω),
Ku
Ω0
u≥0
A common approach for computing maximum likelihood estimates in the case of incomplete data is the so-called Expectation Maximization (EM) algorithm, which is also
known as the Richardson-Lucy algorithm (cf. [62, 51, 28]). It is based on the firstorder optimality condition for (5.19). Since the minimization problem is constrained,
we make use of the Karush-Kuhn-Tucker (KKT) conditions (cf. [45, Theorem 2.1.4]).
They yield the existence of a Lagrange multiplier λ ≥ 0 and state that each stationary
point of (5.19) fulfills the following optimality condition

0 = −L∗ 1 + L∗ f δ − λ
Ku
(5.20)
0 = λu.
Here, 1 is again the constant function with value 1 everywhere and L∗ is the adjoint
operator of L, where the forward operator is given by Ku = C −Lu. Since DKL (f δ , Ku)
is strict convex, a solution of (5.20) is not only a stationary point but also a global
minimum of (5.19). Multiplying the first optimality condition with u yields a fixedpoint equation
L∗ 1
λu
δ+
δ .
u=u
f
f
∗
∗
L Ku
L Ku
By utilizing λu = 0 we obtain a fixed-point iteration
uk+1 = uk
L∗
L∗ 1
δ .
f
Kuk
(5.21)
If K preserves positivity and the initial guess u0 is positive, then the algorithm preserves
positivity as well. Note that the iteration scheme differs significantly from the well∼
known iteration procedure in the case of a linear operator K, given by
∼∗
uk+1 = uk
K
f
∼
Kuk
∼∗
K 1
.
(5.22)
5
Numerical Approach
70
Not only that the adjoint of the linear part of K is used, but also the numerator and
the denominator are reversed. If this is not the case, the iteration scheme will not be
converging for the affine operator K. In [74] it is shown that algorithm (5.22) is an
example of the more general EM algorithm in [28]. For the classical case of a linear
operator there have been several proofs that for noise free data f , the iteration scheme
(5.22) converges to a solution of (5.19) (see for instance [82, 55]). For noisy data f δ ,
there is a semi-convergence of the iterates, described in [61]. The distance between the
iterates and the solution initially decreases, but later iterates are impaired by noise,
resulting in an increase of the distance. Thus, one either needs appropriate stopping
rules (cf. [61]) or regularization.
For very noisy ET data we observed that the convergence of the iterates, which is
commonly known as slow, is still too fast in order to obtain reasonable results. Already
the initial iterates are impaired by a lot of noise. Therefore, we take up the approach
presented in [55] to introduce a relaxation parameter ω that influences the convergence
speed by setting
!ω
L∗ 1
k+1
k
δ .
(5.23)
u
=u
f
∗
L Ku
k
For ω > 1, an increased convergence speed can be observed, whereas for ω < 1 the
iterates converge more slowly.
5.2.2 FB-EM-TV Algorithm
Since the results of the EM algorithm are unsatisfactory especially in the case of highly
noisy data, it is helpful to include a-priori information about u. Therefore, we use the
TV regularization presented in the previous chapter in order to improve the reconstruction results. The constrained optimization problem introduced in (4.47) is
J(u) =
Z Ω0
δ
f log
fδ
Ku
− f δ + Ku dx + α|u|BV (Ω) →
min ,
u∈BV (Ω),
u≥0
with α > 0. At first we will neglect the positivity constraint and derive an iterative
procedure based on the first-order optimality condition
0 ∈ ∂J(u) = ∂ DKL (f δ , Ku) + αR(u) .
5
Numerical Approach
71
We would like to split the right-hand side into two separate subdifferentials. The
KL divergence is defined on L1 (Ω), so its subgradients are elements of the dual space
∗
(L1 (Ω)) = L∞ (Ω). In contrast, the total variation is defined on the smaller subspace
BV (Ω) but can be extended to a convex functional on L1 (Ω) by setting |u|BV (Ω) = ∞ if
u ∈ L1 (Ω)\BV (Ω), without affecting the solutions of the minimization problem (5.18).
Thus, the subgradients of | · |BV (Ω) are contained in L∞ (Ω) as well. The continuity of
DKL (f δ , Ku) together with [29, Chap. I, Proposition 5.6] yield that
0 ∈ ∂ DKL (f δ , Ku) + αR(u) ⇔ 0 ∈ ∂DKL (f δ , Ku) + α∂R(u).
(5.24)
The KL divergence is Fréchet differentiable, thus the subdifferential is a unit set. The
optimality condition is
∗
∗
0 = −L 1 + L
fδ
Ku
+ αp,
p ∈ ∂|u|BV (Ω) ,
(5.25)
where L∗ is the adjoint of the linear part of K. Again, we want to derive an iteration
scheme converging to the solution of (4.47). Therefore, we evaluate the subdifferential
of the data discrepancy part in the previous iterate uk , whereas the subdifferential of
| · |BV (Ω) is evaluated in the next iterate uk+1 , i.e.
∗
0 = −L 1 + L
∗
fδ
Kuk
+ αpk+1 ,
pk+1 ∈ ∂|uk+1 |BV (Ω) .
(5.26)
The drawback of this approach is that the new iterate does not directly appear in
(5.26). Since the subgradient pk+1 is not uniquely defined, we cannot determine uk+1
from pk+1 . For an appropriate
uk+1 . Hence,
δ iterative procedure we need to incorporate
k+1
f
we divide (5.26) by L∗ Ku
and replace the resulting 1 by uuk .
k
0=
L∗ 1
1
uk+1
δ + α δ pk+1 ,
−
k
f
f
u
L∗ Ku
L∗ Ku
k
⇔ uk+1 = uk
L∗
L∗ 1
δ −α
f
Kuk
L∗
uk
δ pk+1 ,
f
Ku
(5.27)
with pk+1 ∈ ∂|uk+1 |BV (Ω) . Note that the first part of the right-hand side corresponds
to the EM iteration (5.21).
Another way to motivate (5.27) is using KKT conditions for the constrained optimization problem (4.47) again. They provide the existence of a Lagrange multiplier λ ≥ 0
5
Numerical Approach
72
such that every stationary point u fulfills

0 ∈ −L∗ 1 + L∗ f δ + α∂|u|BV (Ω) − λ
Ku
(5.28)
0 = λu.
∗
fδ
Ku
By multiplication with u and division by L
the Lagrange multiplier λ can be
eliminated and the resulting fixed-point equation is
u=u
L∗ 1
δ −α
L∗
f
Ku
L∗
u
δ p,
p ∈ ∂|u|BV (Ω) .
f
Ku
(5.29)
This equation corresponds to (5.25) multiplied by u, thus the multiplication involves
the positivity constraint in the optimality condition. Then, (5.27) can be seen as
a semi-implicit approach to (5.29). See [69, Chap. 4.3] for a verification that the
corresponding iteration to (5.27) for linear operators actually preserves positivity if K
preserves positivity and u0 ≥ 0.
Next, we split (5.27) in a two-step iteration where the first step corresponds to the EM
algorithm (cf. (5.21))

1

uk+ 2 = uk ·
L∗
L∗ 1 δ
(EM step)
f
Kuk

uk+1 = uk+ 12 − α uk+ 12 pk+1 ,
L∗ 1
(5.30)
(TV step)
with pk+1 ∈ ∂|uk+1 |BV (Ω) . We call this splitting scheme the EM-TV algorithm or FBEM-TV algorithm, since we show later on that (5.30) can be interpreted as a forwardbackward-splitting algorithm as described in section 5.1.3. The second half step can
be realized by solving the convex variational problem
uk+1 = arg min
u∈BV (Ω)
 Z
1 2

1
L∗ 1 u − uk+ 2

2 Ω
1
uk+ 2
dx + α|u|BV (Ω)



,
(5.31)


with its first-order optimality condition given by
L∗ 1
1
0 = uk+1 − uk+ 2
+ αpk+1 ,
k+ 12
u
pk+1 ∈ |uk+1 |BV (Ω) ,
which is equivalent to the second half step in (5.30). Problem (5.31) can be interpreted
k+ 1
as a weighted version of the ROF model (cf. (4.46)) with weight uL∗ 12 . Hence, one can
5
Numerical Approach
73
use a slightly modified version of one of the standard numerical approaches for solving
a ROF model in order to compute the second half step. See section 5.2.4 for more
details.
Damped FB-EM-TV Algorithm Next, we want to present a modification of the
proposed EM-TV splitting algorithm making it possible to control the interaction of
both steps. The idea can be traced back to an adaption of the optimality condition
without affecting its solution. We want to recall the condition presented in (5.25),
which we used as the basis for the splitting algorithm, namely
∗
∗
0 = −L 1 + L
fδ
Ku
+ αp,
p ∈ ∂|u|BV (Ω) .
∗
fδ
Ku
Without affecting the solutions, we can divide this equation by −L
and multiply
it by ω ∈ (0, 1]. Now, by adding the constant function 1 on both sides, we get
1=ω
L∗ 1
δ + 1 · (1 − ω) − ω
L∗
f
Ku
L∗
αp
δ ,
f
Ku
p ∈ ∂|u|BV (Ω) .
(5.32)
Multiplying with u leads to a fixed-point equation, which we can use similarly to before
in order to obtain a fixed-point iteration
uk+1 = ωuk
L∗
L∗ 1
δ + uk · (1 − ω) − ω
f
Kuk
αpk+1
δ ,
L∗
(5.33)
f
Kuk
with pk+1 ∈ ∂|uk+1 |BV (Ω) . This iteration scheme can be realized by the two-step iteration

1
∗

uk+ 2 = uk · ∗ L f1δ (EM step)
L
Kuk
(5.34)

uk+1 = ωuk+ 12 + uk · (1 − ω) − ωα uk+ 12 pk+1 , ((damped) TV step)
L∗ 1
which we refer to as the damped FB-EM-TV algorithm. The second half step can be
computed by solving a weighted ROF problem, which is now given by
uk+1 = arg min
u∈BV (Ω)
 Z
2
1

1
L∗ 1 u − ωuk+ 2 + (1 − ω)uk

2 Ω
1
uk+ 2
dx + ωα|u|BV (Ω)





. (5.35)
5
Numerical Approach
74
For ω = 1 the algorithm coincides with the FB-EM-TV algorithm. The damping
parameter can help to obtain a monotone descent of the objective functional and to
proof the convergence of the algorithm in dependence of the damping parameter ω. See
[69, Chap. 4.3–4.4] for more details about the convergence of the damped FB-EM-TV
algorithm as well as its positivity preservation in the case of a linear forward operator.
Since we are now fitting against a convex combination of the current EM iterate with
the previous TV iterate, the iterations stay closer to the regularized solution uk . Thus,
especially in the case of a small ω, we expect smoother results for uk+1 . Nevertheless,
if the FB-EM-TV algorithm without a damping parameter converges, it converges to
the same solution as the damped FB-EM-TV algorithm, since the changes have not
affected the solutions of the optimality condition.
Interpretation as a Forward-Backward-Splitting Algorithm As already mentioned,
the two-step algorithms can be interpreted as splitting methods, like the ones presented
in section 5.1.3. According to (5.24), the optimality condition (4.6) can be seen as a
decomposition problem
∗
∗
0 ∈ −L 1 + L
|
{z
fδ
Ku
+ α ∂|u|BV (Ω) ,
|
{z
}
B(u)
}
A(u)
(5.36)
with maximal monotone operators A and B (cf. (5.14)). Thus, following the approach
in section 5.1.3, the problem can be solved by a two-step iteration
1
1
uk+ 2 − uk
uk+ 2 − uk
+ A(uk ) =
− L ∗ 1 + L∗
σ
σ
k+1
u
−u
σ
k+ 12
k+1
+ B(uk+1 ) =
u
k+ 21
−u
σ
fδ
Kuk
=0
(5.37)
+ α pk+1 = 0,
with pk+1 ∈ ∂|uk+1 |BV (Ω) . Now, by choosing the artificial stepsize to be σ =
L∗
wuk ,
δ
f
Kuk
the splitting scheme is
L∗
fδ
Kuk
1
uk+ 2 − uk
∗
−L 1+L
ωuk
δ f
k+ 12
k+1
L∗ Ku
u
−
u
k
ωuk
∗
fδ
Kuk
=0
+ α pk+1 = 0.
(5.38)
5
Numerical Approach
75
The first equation of (5.38) is equivalent to
1
uk+ 2 = ω uk
L∗
L∗ 1
δ + uk · (1 − ω)
f
Kuk
and the second equation is equivalent to
1
uk+1 = uk+ 2 − ω α
L∗
uk
δ pk+1 .
f
Kuk
Thus, (5.38) coincides with the damped FB-EM-TV algorithm (5.34) or, if ω = 1, with
the FB-EM-TV algorithm (5.30).
Stopping Rules Besides the maximum number of iterations, more stopping rules are
included in the algorithm, which are based on
1. the error in the optimality condition,
2. the convergence of the primal functions uk ,
3. the convergence of the subgradients pk ∈ ∂|uk |BV (Ω) .
Therefore, we introduce a weighted norm induced by a weighted scalar product defined
as
Z
p
hu, viw :=
u v w dλ → kuk2,w := hu, uiw .
(5.39)
Ω
Here, w is a positive weight function and λ the standard Lebesgue measure on Ω. Since
the optimality condition for the (k + 1)-th iteration is given as
∗
0 = −L 1 + L
∗
fδ
Kuk+1
+ αpk+1 ,
(5.40)
we measure the error in the optimality condition in the weighted norm for every iteration, that means
opt
k+1
∗
fδ
∗
+ αpk+1
:= −L 1 + L
Kuk+1
2
2,uk+1
.
(5.41)
5
Numerical Approach
76
In order to introduce stopping rules based on the convergence of the sequences uk and
pk , we review the second half step of the damped FB-EM-TV algorithm given in (5.34):
uk+1 = ωuk ·
L∗

⇔ uk+1 − uk = ωuk 
L∗ 1
δ + uk · (1 − ω) − ωα
L∗
f
Kuk
L∗ 1
δ −1−α
f
Kuk
L∗

L∗
uk
δ pk+1
f
Kuk
pk+1 
δ .
f
Kuk
(5.42)
If the algorithm converges, the optimality condition (5.29) should be fulfilled for every
iterate, thus we can evaluate it at uk and get
uk = uk ·
L∗

⇔ α
L∗
uk
δ pk = uk 
f
Kuk
L∗ 1
δ −α
L∗
f
Kuk
L∗

uk
δ pk
f
Kuk
L∗ 1
δ − 1 .
f
Kuk
Now, by inserting this equation into the previous one (5.42), we obtain
L
∗
fδ
Kuk
uk+1 − uk
ωuk
+ α pk+1 − pk = 0,
which can be used in order to measure the convergence of the sequences of primal
functions uk and the subgradients pk , respectively. We define
uoptk+1
δ 2
L∗ f
k+1
k u
−
u
k
Ku
:= k
ωu
2,uk+1
,
(5.43)
2
poptk+1 := α pk+1 − pk 2,uk+1 .
Since for every k the primal function uk and Kuk are positive, optk+1 as well as uoptk+1
are well-defined. Now, the algorithm is stopped if at least one of the three criteria
((5.41) and (5.43)) is smaller than a pre-defined tolerance limit.
Pseudocode for the damped FB-EM-TV Algorithm Taken together, the algorithm
for solving the optimization problem (4.47) with the KL divergence as a data term and
TV regularization is given as the following.
5
Numerical Approach
77
Algorithm 1 (Damped) FB-EM-TV Algorithm for solving (4.47)
noisy data f δ , reg. param. α ≥ 0, weight. param. ω ∈ (0, 1],
maxEM its ∈ N, tolerance limit tol > 0
Initialization: k = 0, u0 = 1
Parameters:
Iteration:
while (k < maxEM its) and (optk ≥ tol or uoptk ≥ tol or poptk ≥ tol) do
1
1. Compute uk+ 2 via EM step in (5.34).
2. Compute uk+1 via weighted ROF model (5.35).
3. Update optk , uoptk and poptk according to (5.41) and (5.43).
4. Set k = k + 1.
end while
return uk
5.2.3 Bregman-FB-EM-TV Algorithm
Next, we consider the stepwise refinement of (4.47) based on Bregman distances proposed in (4.49). Thus, we want to solve the iterative optimization problem introduced
in (4.50), i.e.
ul = arg min
u∈BV (Ω),
u≥0
Z δ
f log
Ω0
fδ
Ku
− f δ + Ku dx + α |u|BV (Ω) − hpl−1 , ui
with pl−1 ∈ |ul−1 |BV (Ω) . Similar to before, we want to obtain a two-step iteration
in order to solve problem (4.50) for a fixed Bregman step l based on its optimality
condition. Since the KL divergence as well as the dual product hpl−1 , ui are Fréchet
differentiable, the optimality condition based on subdifferentials is given as
∗
0 ∈ −L 1 + L
∗
fδ
Kul
+ α ∂|ul |BV (Ω) − pl−1 .
(5.44)
By starting with a constant solution u0 , we can assume that p0 = 0 ∈ |u0 |BV (Ω) . Besides
a strategy to solve the optimization problem (4.50) for a fixed l, an update strategy
for pl is needed as well. This can be easily motivated by the optimality condition. The
subgradient pl ∈ ∂|ul |BV (Ω) can be chosen according to
1
pl = pl−1 +
α
∗
L 1−L
∗
fδ
Kul
(5.45)
with p0 = 0. Thus, a splitting algorithm to solve (4.50) for fixed l is alternated with
updates of pl (cf. (5.45)) and l. For the two-step iteration the first EM step is analogous
5
Numerical Approach
78
to the one in the FB-EM-TV algorithm (cf. (5.30)), whereas the second step needs to
be adapted in order to account for the change in the regularization functional. The
new splitting scheme for the Bregman-FB-EM-TV algorithm is

k+ 21

u
= ukl ·

 l
L∗
L∗ 1 (EM step)
fδ
Kuk
l
(5.46)

k+ 1

uk+1 = uk+ 12 − α ul 2 pk+1 − p ,
l−1
l
l
l
L∗ 1
(TV step)
where the index k refers to the number of EM iterations and the index l to the number
of Bregman iterations. Again, the second step can be solved via a variational problem
uk+1
= arg min
l
u∈BV (Ω)
 Z
k+ 1 2

1
L ∗ 1 u − ul 2



dx + α |u|BV (Ω) − hpl−1 , ui
.


k+ 1
ul 2

2 Ω
(5.47)
In order to be able to use the same approach for solving this problem as for the
weighted ROF problem in (5.31), we want to shift the second part of the regularization
functional, i.e. −αhpl−1 , ui, to the data fidelity term. Thus, we want to solve
uk+1
= arg min
l
u∈BV (Ω)
 Z
k+ 1
k+ 21 2

∗
1
− 2αpl−1 uul 2
L 1 u − ul
k+ 12

2 Ω
dx + α |u|BV (Ω)
ul



.


With α pl−1 := L∗ 1vl−1 this can be rewritten as
uk+1
l




Z
1
= arg min
2
u∈BV (Ω) 

 Ω
∗
L1
u−
k+ 1
ul 2
2
−
k+ 1
2vl−1 uul 2
dx + α |u|BV (Ω)
k+ 1
ul 2







and the new update strategy is

vl = vl−1 + 1 −
L
∗
fδ
Kul
L∗ 1

.
(5.48)
For the final step of rewriting problem (5.47) as a weighted ROF model, we have to
adapt the data discrepancy term in a way that it has the form of a weighted L2 -norm.
5
Numerical Approach
79
It holds that
u−
k+ 1
ul 2
2
k+ 1
− 2vl−1 uul 2
(5.49)
2
k+ 21
k+ 12
k+ 12 2
k+ 12 2
= u − ul + vl−1 ul
− 2 ul
vl−1 − vl−1 ul
,
where the last two terms are independent of the solution u and thus do not influence
it. Therefore, problem (5.47) can be rewritten as the weighted ROF model
uk+1
= arg min
l
u∈BV (Ω)
 Z
2
k+ 1
k+ 1

1
L∗ 1 u − ul 2 + vl−1 ul 2
k+ 12

2 Ω
ul
dx + α |u|BV (Ω)



.
(5.50)


A method to solve this problem is given in section 5.2.4. If we want to include a
damping parameter ω into the Bregman-FB-EM-TV algorithm as well, only the TV
step, which is realized by solving the weighted ROF model (5.50), needs to be adapted.
The TV step is then given as
uk+1
= arg min
l
u∈BV (Ω)
 Z
2
k+ 12
k+ 12

∗
1
L 1 u − ũl + ωvl−1 ul
k+ 12

2 Ω
ul
dx + αω |u|BV (Ω)



(5.51)


with
k+ 21
ũl
k+ 12
= ωul
+ (1 − ω)ukl .
We refer to the resulting algorithm as the damped Bregman-FB-EM-TV algorithm.
Again, with ω = 1 the algorithm coincides with the Bregman-FB-EM-TV algorithm.
Stopping Rules Since the damped Bregman-FB-EM-TV algorithm consists of two
different iterative schemes - the outer Bregman iterations and the inner EM iterations
- we differentiate between stopping rules for each of them. The inverse scale space
method of the Bregman-FB-EM-TV algorithm starts with an overregularized solution
and incorporates more small scales in every Bregman iteration. Since noise is small
scaled as well, the iterations need to be stopped before the results are impaired by too
much noise. Thus, a suitable stopping criterion would be to stop the iterations before
the residual of the noisy data f δ and Kul reaches the noise level δ (cf. [57, 22, 69]). Since
reliable estimations of the noise level are lacking for ET, so far we stop the Bregman
iterations only by the pre-defined maximum number of iterations. An appropriate
stopping rule for the Bregman iterations still needs to be incorporated in the algorithm.
5
Numerical Approach
80
For the inner EM iterations we define stopping rules analogous to the ones in section
5.2.2. The error in the optimality condition is measured by
optk+1
l
2
δ
∗
f
k+1
∗
:=
−L
1
+
L
+
αp
−
αp
l−1
l
k+1
Kuk+1
l
2,ul
2
δ
∗
f
k+1
∗
∗
+
αp
−
L
1v
=
−L
1
+
L
l−1
l
k+1 .
Kuk+1
l
2,u
(5.52)
l
The accuracies of the sequences of primal functions ukl and subgradients pkl ∈ ∂|ukl |BV (Ω)
are respectively measured by
uoptk+1
l
δ 2
k+1
L∗ f
k u
−
u
k
l l
Kul
:= k
ωu
l
,
2,uk+1
l
(5.53)
2
poptk+1 := α pk+1
− pkl 2,uk+1 .
l
l
l
Again, the EM iterations are stopped if at least one of the three criteria ((5.52) and
(5.53)) is smaller than a pre-defined tolerance limit.
Pseudocode for the damped Bregman-FB-EM-TV Algorithm The resulting algorithm for solving the optimization problem (4.50) with the KL divergence as a data
discrepancy term and the Bregman distance associated with the total variation as a
regularization is given in Algorithm 2.
5.2.4 Numerical Realization of the Weighted ROF
In the previous sections we have seen that the denoising step in the (damped) FB-EMTV algorithm (cf. (5.31) and (5.35)) as well as in the (damped) Bregman-FB-EM-TV
algorithm (cf. (5.50) and (5.51)) can be realized as a weighted ROF model. The general
form of the problem is
1
2
Z
Ω
(u − q)2
dx + β|u|BV (Ω) → min ,
u∈BV (Ω)
h
(5.54)
where q, h and β > 0 are chosen in dependence of the particular problem. An overview
of the different choices is given in Table 5.1. In general, there have been several
5
Numerical Approach
81
Algorithm 2 (Damped) Bregman-FB-EM-TV Algorithm for solving (4.50)
noisy data f δ , reg. param. α ≥ 0, weight. param. ω ∈ (0, 1],
maxEM its ∈ N, maxBregits ∈ N, tolerance limit tol > 0
Initialization: l = 1, u01 = 1, v0 := 0
Parameters:
Iteration:
while (l < maxBregits) do
1. Set k=0.
while
(k < maxEM its) and (optk ≥ tol or uoptk ≥ tol or poptk ≥ tol) do
k+ 21
a) Compute ul
via EM step in (5.46).
b) Compute uk+1
via weighted ROF model (5.51).
l
c) Update optkl , uoptkl and poptkl according to (5.52) and (5.53).
d) Set k = k + 1.
end while
2. Compute update vl according to (5.48).
3. Set u0l+1 = ukl .
4. Set l = l + 1.
end while
return u0l
approaches in order to solve the standard ROF model, i.e. problem (5.54) with weight
function h = 1. For some examples see [24, 25, 39]. In this section we want to outline
two different iterative approaches to solve problem (5.54) presented in [69]. The first
one is based on the projected gradient descent algorithm given in [24] for the standard
ROF model. For more details see [69] and the references therein. Note that the
approach uses the exact definition of the total variation given in (4.42). Thus, any
smoothing as given in (5.13) is not necessary.
In order to derive an iterative algorithm to solve (5.54), we insert the definition of the
total variation (4.42) into (5.54) and get a saddle point problem in the primal variable
u and the dual variable ϕ
inf
sup
L(u, ϕ) :=
u∈BV (Ω) ϕ∈C ∞ (Ω;R3 ),
0
kϕk∞ <1
1
2
Z
Ω
2
(u − q)
dx + β
h
Z
u ∇ · ϕdx.
(5.55)
Ω
We can swap the infimum and supremum and derive a primal optimality condition for
problem (5.55) given by
∂
L(u, g) = 0
∂u
⇔
u = q − βh ∇ · ϕ.
(5.56)
5
Numerical Approach
82
Algorithm
q
h
FB-EM-TV Algorithm (5.31)
uk+ 2
Damped FB-EM-TV Algorithm (5.35)
ωuk+ 2 + (1 − ω)uk
1
uk+ 2
L∗ 1
1
k+ 12
ul
Damped Bregman-FB-EM-TV Algorithm (5.51)
k+ 1
ul 2
L∗ 1
k+ 21
+ vl−1 ul
k+ 12
ωul
k+ 12
+ ωvl−1 ul
α
1
uk+ 2
L∗ 1
1
Bregman-FB-EM-TV Algorithm (5.50)
β
+ (1 − ω)ukl
k+ 1
ul 2
L∗ 1
ωα
α
ωα
Table 5.1: Overview of the particular settings for q, h and β in (5.54) in dependence of
the different algorithms presented in sections 5.2.2 and 5.2.3.
By substituting (5.56) into (5.55) we obtain a purely dual constrained optimization
problem only depending on the variable ϕ. Hence, with the KKT conditions (cf. [45]),
optimality conditions for the dual problem can be obtained from which a fixed-point
equation and a resulting fixed-point iteration are derived. Under certain conditions
this iteration converges to an optimal solution ϕ̃ of the purely dual problem and the
optimal primal solution ũ of the weighted ROF model (5.54) is then given as
ũ = q − βh ∇ · ϕ̃.
Another approach to solve the weighted ROF model is presented in [68, Chap. 6.3.4].
It is based on the Split Bregman approach proposed in [39]. Here, the formal definition
of TV (cf. (4.44)) is used to rewrite (5.54) as a constrained optimization problem
1
min
u,ũ,v 2
Z
Ω
(ũ − q)2
dx + β
h
Z
|v| dx s.t. ũ = u and v = ∇u.
(5.57)
Ω
Then, following the idea of augmented Lagrangian methods (cf. e.g. [46]), one can
define the augmented Lagrangian functional in accordance with (5.57) as
1
L(u, ũ, v, λ1 , λ2 ) =
2
Z
Ω
(ũ − q)2
dx + β
h
Z
|v| dx + hλ1 , v − ∇ui
Ω
(5.58)
µ1
µ2
+
kv − ∇uk2L2 (Ω) + hλ2 , ũ − ui +
kũ − uk2L2 (Ω) + Xũ≥0 .
2
2
Here, λ1 , λ2 are Lagrange multipliers, µ1 , µ2 are positive relaxation parameters and
Xũ≥0 is an indicator function in order to ensure positivity of the solution. An alternating
minimization scheme is derived with the standard Uzawa algorithm (cf. [30]). In each
iteration step one successively determines the minimum of L regarding u, ũ and v
respectively, whereas the respective other two are fixed. See [68] for more details.
83
6 Programming and Realization in
MATLAB and C
In the previous chapter an algorithm to solve the inverse problem f = Ku by variational
methods has been presented. In this chapter we will comment on the computational
realization of this algorithm. In particular, two existing frameworks are introduced,
which we used as a basis for the implementation. The first code is implemented in
R
MATLAB
, whereas the second one is an efficient C code. Therefore, we present an
approach to combine both codes. Moreover, we will address some difficulties of the
algorithm and ways to overcome these.
6.1 Bregman-FB-EM-TV Toolbox
As a framework for the algorithm we used an existing toolbox for the Bregman-FBR
EM-TV algorithm in the case of a linear operator, which is implemented in MATLAB
(cf. [69, 20]). The algorithm is solely designed for a linear forward operator, previous
applications have been in the field of fluorescence microscopy and positron emission
tomography. The resulting reconstructions can either be two- or three-dimensional.
The underlying algorithm is presented in Algorithm 3, where the differences to the
damped Bregman-FB-EM-TV algorithm for affine forward operators (cf. Algorithm 2)
are marked in red. An advantage of the toolbox is the very modular structure. The forward operator only influences the EM step as well as the Bregman update, whereas the
TV step remains unchanged. Therefore, the algorithm can be easily extended to other
applications with data corrupted by Poisson noise, as long as the underlying operator
is linear. If this is fulfilled, one only needs to insert the forward and adjoint operator
of the inverse problem to be solved. For applications with an affine forward operator
and data corrupted by Poisson noise, inserting the new operator is not sufficient. Here,
Algorithm 2 needs to be used, which has been developed in Chapter 5. The algorithms
differ in their particular EM step as well as in the weight of the TV step. Moreover, the
6
Programming and Realization in MATLAB and C
84
Algorithm 3 (Damped) Bregman-FB-EM-TV Algorithm for a linear forward operator
noisy data f δ , reg. param. α ≥ 0, weight. param. ω ∈ (0, 1],
maxEM its ∈ N, maxBregits ∈ N, tolerance limit tol > 0
Initialization: l = 1, u01 = 1, v0 := 0
Parameters:
Iteration:
while (l < maxBregits) do
1. Set k=0.
while
(k < maxEM its) and (optk ≥ tol or uoptk ≥ tol or poptk ≥ tol) do
k+ 21
a) Compute ul
via EM step
k+ 12
ul
K∗
= ukl ·
fδ
Kuk
l
K∗ 1
.
b) Compute uk+1
via weighted ROF model
l
uk+1
= arg min
l
 Z
2
k+ 1

1
K∗ 1 u − ũl 2 + ωvl−1 ukl
u∈BV (Ω) 
2
ukl
Ω
with
k+ 12
ũl
k+ 21
= ωul
dx + αω |u|BV (Ω)


+ (1 − ω)ukl .
c) Update optkl , uoptkl and poptkl according to
optk+1
l
uoptk+1
l
:= +K∗ 1 − K∗
fδ
Kuk+1
l
!
+
αpk+1
l
− K∗ 1vl−1
2,uk+1
l
2
− pkl 2,uk+1 .
poptk+1 := α pk+1
l
l
d) Set k = k + 1.
end while
2. Compute update vl according to

vl = vl−1 − 1 −
3. Set u0l+1 = ukl .
4. Set l = l + 1.
end while
return u0l
2
2,uk+1
l
K∗ 1 uk+1 − uk 2
l l
:= k
ωul
l



K∗
fδ
Kul
K∗ 1

.
6
Programming and Realization in MATLAB and C
85
Bregman update is changed. Nevertheless, we could use the framework of the existing
toolbox, the computation of the minimum of the weighted ROF model, and the computations for basic operators like gradient and divergence. We adapted the updates for
k+ 1
and vl as well as the stopping rules in accordance with Algorithm 2.
ul 2 , uk+1
l
6.2 TVreg Software
For the algorithm we also used parts of another toolbox, called TVreg, which is implemented in C. It is a software for solving three-dimensional tomographic reconstruction
problems based on [65]. The underlying algorithm solves the optimization problem
1
J(u) = kKu − f k22 + λtv
2
Z
Z
p
|u(x)|q dx → min .
|∇u(x)| dx + λl
Ω
u
Ω
(6.1)
Thus, the approach to solve the inverse problem Ku = f is based on variational methods
as well. The operator K is assumed to be linear, for that reason data are preprocessed in
the way that the constant part (cf. C1 (ω) in (6.3)) is substituted prior to reconstruction.
Moreover, data are assumed to be corrupted by additive Gaussian noise, therefore the
data discrepancy term is the L2 -norm. For λl = 0 and p = 1 the regularization
functional is the total variation. In our implementation presented in section 6.1 we
made use of the complex forward model (cf. (3.17), resp. (6.3)) for phase contrast
TEM imaging contained in the TVreg software. This enables us to make use of an
accurate and efficient implementation in C and to allow for fair comparisons between
different inversion methods. Although we did not use any parts of the implemented
optimization strategy, we want to shortly address it. This is done with regard to the
next chapter, where some of our results are compared to the ones of this software. The
iterative method for solving (6.1) is inspired by the conjugate gradient method and
uses Newton methods for some line searches (cf. Chapter 5.1). The software assumes
that J is differentiable, which is not fulfilled if p = 1 and/or q = 1. Therefore, the
energy functional J is replaced by a smoothed variant Jβ given as
1
Jβ (u) = kKu − f k22
2
Z
2
|∇u(x)| + β
+ λtv
Ω
2
p2
Z
2
|u(x)| + β
dx + λl
Ω
2
2q
(6.2)
dx → min .
u
The constant β > 0 in the algorithm is not fixed but adapted in each iteration during
the reconstruction. TVreg constructs a sequence (un , βn )n where un should converge
6
Programming and Realization in MATLAB and C
86
to a solution of (6.1) and βn becomes stationary. As can be seen in the results in
Chapter 7, the smoothing yields edges that are not as sharp as in the case of exact TV
regularization.
Choosing the right regularization parameter λ is always a difficult task in the context
of regularization methods. Although there are a lot of approaches for finding a suitable
λ, most of them are not applicable in the case of highly noisy ET data. One reason
is that most of them are based on an estimate δ of the present noise level, which is
difficult to obtain in ET. The main novelty of the TVreg software is a method for
choosing adequate regularization parameters (cf. [65]). It is explicitly designed for
highly noisy ET data and therefore a great advantage compared to all other methods.
6.3 Embedding of the Forward Operator via MEX files
The aforementioned TVreg software contains an implementation of the forward operator and its adjoint for phase contrast TEM imaging. In Chapter 3 we present a
computationally feasible forward model. It has the form
K(F )(ω)i,j = C1 (ω) − C2
X h
PSFopt ~ω⊥
i
P (F ) (xk,l ) PSFdet (xi,j − xk,l ) (6.3)
re
k,l
with constants C1 (ω) and C2 that are dose-dependent. The constant C1 (ω) corresponds
to what would be measured if the scattering potential was zero. In an ideal situation
all parameters influencing both constants are known prior to reconstruction. Since this
is often not the case, one either needs to reconstruct them alongside with the scattering
potential or estimate them from the data. The implementation of the forward operator
in the TVreg software has the form
K(F )(ω)i,j = C1 (ω) −
X h
PSFopt ~ω⊥
i
P (F ) (xk,l ) PSFdet (xi,j − xk,l ), (6.4)
re
k,l
|
{z
∗
}
where C1 (ω) is estimated as
C1 (ω) =
X
1
·
fdata (ω, i, j).
#fdata (ω, i, j) i,j
The idea for estimating C1 (ω) as the average of the measured data fdata (ω, · , · ) in each
micrograph is the following. The second part of the forward operator, i.e. (∗) in (6.4),
6
Programming and Realization in MATLAB and C
87
represents the information obtained due to scattering effects. In the case of weakly scattering materials, which is common for biological applications, this part is very small
compared to the constant C1 (ω). Therefore, one can assume that fdata (ω, i, j) ≈ C1 (ω)
for all (i, j) and estimate C1 (ω) as the average of the measured data fdata (ω, · , · ).
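A minimal MATLAB sketch of this averaging step is given below (the data layout and all variable names are assumptions for illustration; in TVreg the estimation is part of the C implementation):

    % Estimate the dose-dependent constant C1(omega) as the average of each micrograph.
    nTilt = size(f_data, 1);               % number of tilt angles / micrographs (assumed layout)
    C1 = zeros(nTilt, 1);
    for w = 1:nTilt
        micrograph = squeeze(f_data(w, :, :));
        C1(w) = mean(micrograph(:));       % average of f_data(omega, ., .) over all pixels (i, j)
    end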
The constant C2 is missing in the implementation of TVreg. If one assumes additive
Gaussian noise and preprocesses the data as described before, i.e. one removes the constant background C1(ω) prior to reconstruction, this missing constant has no negative influence on the
reconstruction. Since the main interest in ET lies in reconstructing the correct form
and position of the specimen, it does not really matter if u or C2 · u is reconstructed.
However, in case one assumes Poisson noise, i.e. data-dependent noise, the missing
constant has a major influence on the reconstructions. As we can see below, especially
in the case of a high dose, this can result in the divergence of the algorithm. Nevertheless, we used this implementation in the (Bregman-)FB-EM-TV framework due to the
advantages mentioned before. The forward operator and its adjoint are implemented in C, whereas the software framework for the optimization algorithm is implemented in MATLAB. Therefore, we developed MEX files and used the MEX interface (see http://www.mathworks.de/de/help/matlab/call-mex-files-1.html) in order to combine both. MEX stands for MATLAB executable; MEX files provide the opportunity to invoke efficient C code from within MATLAB. They are thus a possibility either to increase the speed of certain functions by implementing them in C without losing the flexibility to call them from MATLAB, or to use existing C code within a MATLAB framework, as is done in our implementation. Technically, the files are written in C but, when compiled, act as if they were built-in MATLAB functions.
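A minimal sketch of this workflow is shown below; the file and function names are assumptions for illustration and do not correspond to the actual TVreg sources.

    % Compile the C sources once (hypothetical file names).
    mex -largeArrayDims tem_forward_mex.c
    mex -largeArrayDims tem_adjoint_mex.c

    % Afterwards the compiled MEX files can be called like ordinary MATLAB functions.
    f  = tem_forward_mex(u, params);       % forward operator applied to the current iterate
    Lg = tem_adjoint_mex(g, params);       % adjoint of the linear part applied to a data-space function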
6.4 Difficulties
Before we will present some results of the (Bregman-)FB-EM-TV algorithm in the next
chapter, we address some difficulties that may occur. Mainly, there are four different
problems that we have to deal with.
Pointwise Results As aforementioned, the dose-dependent constant C2 in (6.3) is
missing in the implementation of the forward operator. In general, a higher electron
dose results in an improved signal-to-noise ratio and thus should deliver better results.
This is not necessarily the case in our algorithm, since the influence of the missing C2
increases with a higher dose. The results may be pointwise, which means that individual values are disproportionately high and are emphasized each time the operator
is applied. This can lead to the divergence of the algorithm. Often, these pointwise
results are a consequence of too few iteration steps or an unsuitable scaling of the
data. The implementation of the weighted ROF model expects that the regularization
parameter is chosen in accordance with the scaling of the input data. That means, if
the input data are scaled from 0 to 255, the regularization parameter needs to be 255
times larger compared to the regularization parameter for data that are scaled from
0 to 1. By default, the algorithm uses the initial solution u0 = 1. If the ’correct’
solution is, for example, scaled from 0 to 80, it is complicated to choose a regularization parameter that is suitable for the scaling of the first iterate as well as for later iterates. Thus,
the influence of the regularization becomes vanishingly low in later iterates. Thereby,
the regularization cannot counteract the pointwise results produced in the EM step
and the algorithm might diverge. A remedy to this problem is to adapt the starting
value consistent with the expected scaling of the solution. Especially for data with a
higher electron dose a higher starting value may result in significantly improved reconstructions. Note that if the algorithm converges, the result is the same, independent of
the chosen starting value. If a suitable starting value does not help, the problem can
be circumvented to a certain degree by using the damping parameter ω. For a small ω the iterate in each step stays closer to the regularized solution and we
expect smoother results (cf. Chapter 5.2.2). Therefore, it can suppress the formation
of individual high values. Another idea was to scale the data prior to reconstruction
in the sense that we eliminate the influence of the dose. Unfortunately, this promoted
the formation of stripe artifacts, which are addressed below.
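The adaptation of the regularization parameter and the starting value mentioned above can be sketched as follows (all names and the assumed maximum value are illustrative, not fixed choices of our implementation):

    % Adapt regularization parameter and starting value to the expected scaling
    % of the solution instead of assuming values in [0,1].
    expectedMax = 80;                        % assumed maximum intensity of the solution
    alpha = alphaUnit * expectedMax;         % alphaUnit: parameter suitable for data scaled to [0,1]
    u0 = expectedMax * ones(sizeRecon);      % adapted starting value instead of u0 = 1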
Negative Values The linear part of the forward operator includes the PSF of the
optics defined in terms of its Fourier transformation called CTF (cf. (3.11)). In phase
contrast imaging the optics has the important role of making visible the phase shift that the wave undergoes when interacting with the specimen. Hence, the CTF might
reverse contrast, which is no problem in case one models additive Gaussian noise. If one
is modelling Poisson noise, it is still no problem when applying the forward operator.
There is only a problem when the adjoint of the linear part, i.e. L∗ , reverses contrast
so that negative values arise although it is applied to a function in data space that is
positive. In this case, there are negative values in the EM step of the algorithm leading
to the divergence of the algorithm. In our tests we discovered a strong correlation
between individual disproportionately high values as described in the paragraph above
and negative results after applying the adjoint to positive functions in the data space.
A way to prevent the negative results is either a small damping parameter w or a
suitable starting value u0 .
Stripe Artifacts In some rare cases the forward operator produces stripe artifacts
with a direction orthogonal to the tilt axis (cf. Chapter 7). One explanation for those
artifacts could be related to the limitation of angles in the measured data. Once the
artifacts arise, they are emphasized in each iteration. Unfortunately, we could not reliably figure out the source for the formation of these artifacts and will confine ourselves
to explain proper handling of these situations. A way to circumvent these stripes is
to use a strong regularization, leading to overregularized solutions with an unnatural
appearance. This goes well together with the usage of Bregman iterates. Since the iterative Bregman algorithm presented in Chapter 5.2.3 is an inverse scale space method,
the first Bregman iterates are strongly overregularized whereas smaller details are included in the later iterates. Once the stripe artifacts have been suppressed by strong
regularization in the first Bregman iterates, we could not observe new formations in
later Bregman iterates. Thus, using Bregman iterations seems to be a good way to
handle these artifacts.
Computational Time and Memory Consumption A major drawback of our algorithm is the long computational time as well as the memory consumption. The dimensions of the reconstructions for the different data sets we use range from 95 × 100 × 80
(smallest simulated data set) to 512 × 256 × 350 (largest experimental data set). Thus,
especially in the case of experimental data, the computational time is a major problem
exacerbating adequate parameter tests. If we compare the computational time needed
for the EM and TV step, the latter one is the crucial point. Therefore, the reconstruction size is critical, whereas the size of the data set has a much smaller impact. An idea
to circumvent this problem is to reconstruct smaller subregions if the focus mainly lies
on a certain part of the reconstruction. In order to give an impression of the actual
run times, we will mention some examples alongside the results in Chapter 7.
7 Results
In this chapter we present some computational results of the (Bregman-)FB-EM-TV
reconstruction algorithm. In the first part we use simulated data sets and compare
our results to the ground truth as well as to results of the TVreg software. In the
second part, we present reconstructions from an experimental data set and compare
them to results from the TVreg software as well. Moreover, we clarify some of the
aforementioned difficulties of the algorithm using suitable examples. Note that all
data sets we use are single-axis tilt-series data. Moreover, the results we present are
2D cross-sections of the 3D objects. For a better visualization, we partly replaced the
grayscale colorbar by a colorbar ranging from dark blue (low intensity values) to dark
red (high intensity values).
7.1 Simulated Data
The simulated data sets we use represent single-axis tilt-series data from a conventional
300 keV bright-field TEM. They are generated with the TEM simulation software that
is presented in [66]. The tilt angles vary from −60◦ to +60◦ with one micrograph every
second degree, i.e. in total there are 61 different micrographs for each data set. The
region of interest we want to reconstruct is a three-dimensional rectangular voxel region
with a voxel size of 0.5 nm. The detector is a two-dimensional rectangular pixel region
with a pixel size of 16 µm. Overall, the magnification is 25000. Moreover, we want to
specify the parameters that influence the forward model presented in Chapter 3. The
defocus and the spherical aberration that influence the CTF in (3.11) and thereby the
optics PSF are given as ∆z = 3 µm and Cs = 2.1 mm, respectively. The detector PSF
is defined by use of the MTF in (3.16) with a = 0.7, b = 0.2, c = 0.1, α = 10 and
β = 40. Finally, the focal length in (3.9) is f = 2.7 mm.
Figure 7.1: Simulated data set from balls phantom. a) 2D cross-section of the 3D phantom. b) Zero-tilt image of the noise-free data set. c) Zero-tilt image with noise.
Balls Data Set The first data set we use is generated with a phantom representing
40 balls of different size and contrast embedded in aqueous buffer. The phantom as
well as the simulated data without and with noise are shown in Figure 7.1. The noise-free data set is presented in Figure 7.1 b) and the data set with noise in 7.1 c). The underlying objects are hard to detect in the noisy data set, which illustrates how challenging reconstructions in the field of ET are. The region of interest that we reconstruct has the dimensions 210 × 250 × 40. The total electron dose is 6000 e−/nm² distributed over all micrographs, which corresponds to 40 e−/pixel in each micrograph.
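As a rough consistency check using the acquisition geometry given above: a detector pixel of 16 µm at a magnification of 25000 corresponds to 16 µm / 25000 = 0.64 nm at the specimen, i.e. an area of about 0.64² ≈ 0.41 nm² per pixel, so that 6000 e−/nm² over 61 micrographs gives roughly 98 e−/nm² per micrograph and 98 · 0.41 ≈ 40 e−/pixel.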
RNA Data Set The other simulated data sets are generated with a phantom representing a single RNA Polymerase II-particle. The three-dimensional region of interest
that we want to reconstruct has the dimensions 95 × 100 × 80. We use two different
doses, a low one representing a more realistic data set and a high one in order to
emphasize some of the algorithmic difficulties we have. The low total electron dose is
5000 e−/nm², corresponding to a dose of 33 e−/pixel in each micrograph. The higher total dose is 100,000 e−/nm², i.e. 671 e−/pixel in each micrograph. In Figure 7.2 we
show the phantom, the noise-free data set as well as the noisy data sets with the high
and low total electron dose, respectively.
Validation and Evaluation In the case of simulated data sets we want to evaluate
the obtained results by comparing them to the phantom. Now the question is which
validation tools are an adequate choice for our results. Since the focus in ET lies on
reconstructing the right position and shape of the specimen, we want to compare the re-
Figure 7.2: Simulated data set from RNA phantom. a) 2D cross-section of the 3D phantom. b) Zero-tilt image of the noise-free data set. c) Zero-tilt image with noise (dose 671 e−/pixel). d) Zero-tilt image with noise (dose 33 e−/pixel).
sults and the phantom independently of their different contrasts. Therefore, we segment the results prior to the comparison. We decided on a simple thresholding algorithm, although the results would probably be more accurate with a more sophisticated segmentation algorithm such as the K-means or Chan-Vese algorithm. We use the built-in MATLAB function graythresh (http://www.mathworks.de/de/help/images/ref/graythresh.html), but reduce the automatic threshold by 0.1. The results
are thresholded and thereby converted into logical arrays. Then, we calculate the Jaccard similarity index of the segmented phantom and reconstruction result, which is defined as

    JC(A, B) = |A ∩ B| / |A ∪ B|    (7.1)
for logical arrays A and B. The index ranges from 0 to 1, whereby a higher index is
preferable. Note that for some results, especially for low-dose data sets, the segmentation is not possible. Hence, we have to rely solely on visual inspection to evaluate the results in these cases.
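A minimal sketch of this evaluation step (variable names are assumptions; u_rec denotes a reconstructed 2D cross-section and u_gt the corresponding phantom slice, assumed to have a zero background):

    % Segmentation by thresholding and Jaccard similarity index, cf. (7.1).
    u_norm  = mat2gray(u_rec);                    % scale intensities to [0,1]
    level   = graythresh(u_norm) - 0.1;           % automatic threshold, reduced by 0.1
    seg_rec = u_norm > level;                     % logical array of the reconstruction
    seg_gt  = u_gt > 0;                           % logical array of the phantom
    JC = nnz(seg_rec & seg_gt) / nnz(seg_rec | seg_gt);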
Figure 7.3: Balls phantom and TVreg result prior to and after segmentation. a) Phantom. b) Phantom after segmentation. c) Result obtained with TVreg software. d) TVreg result after segmentation. The Jaccard similarity index of the ground truth segmentation b) and d) is 0.585.
To evaluate the reconstruction results of the balls data set we use a second criterion,
based on the segmented image as well. The aim is to automatically find out the number
of balls that are reconstructed correctly or falsely and the ones that are missing in the
reconstruction. By using the MATLAB function regionprops (http://www.mathworks.de/de/help/images/ref/regionprops.html) we obtain a list of the centroids and areas of each connected component that is present in the segmented image. Then, a ball is labeled as correctly reconstructed if the Euclidean distance between its centroid c_recons and the centroid c_phantom of a ball in the phantom is smaller than 2.
This value is motivated by the distance distributions shown in Figure 7.5.
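A minimal sketch of this matching step (hypothetical variable names; seg_rec and seg_gt denote the segmented reconstruction and the segmented phantom from the step above):

    % Centroids and areas of the connected components via regionprops.
    stats_rec = regionprops(bwconncomp(seg_rec), 'Centroid', 'Area');
    stats_gt  = regionprops(bwconncomp(seg_gt),  'Centroid', 'Area');
    c_rec = cat(1, stats_rec.Centroid);           % centroids of the reconstructed components
    c_gt  = cat(1, stats_gt.Centroid);            % centroids of the phantom balls

    % A component counts as correctly reconstructed if the distance to the
    % nearest phantom centroid is below 2; the remaining ones are false objects.
    correct = false(size(c_rec, 1), 1);
    for k = 1:size(c_rec, 1)
        d = sqrt(sum((c_gt - repmat(c_rec(k,:), size(c_gt,1), 1)).^2, 2));
        correct(k) = min(d) < 2;
    end
    numCorrect = nnz(correct);
    numFalse   = numel(correct) - numCorrect;

The 'Area' values of the same regionprops output could be used analogously for the volume-ratio comparison discussed further below.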
7.1.1 Results Balls Phantom
In this section, we present results of the (Bregman-)FB-EM-TV algorithm for the
balls data set. The ground truth of the data set before and after segmentation is
shown in Figure 7.3 a) and b). Figure 7.3 c) and d) show the TVreg result that
we use as a reference for our reconstructions, whereby the latter one is the result
after segmentation. The Jaccard similarity index of the ground truth segmentation in
Figure 7.3 b) and d) is 0.585. To get an impression of the influence of the regularization
parameter α we tested the FB-EM-TV algorithm for different parameter choices.
In Figure 7.4 the results for different levels of regularization are presented. A higher
regularization leads to fewer reconstructed balls; especially the balls with low contrast
are not reconstructed. Moreover, the contrast loss that is typical for TV regularization
can be seen. While for a small regularization (cf. Figure 7.4 a)) the contrast of
the ball in the lower left corner is relatively high, there is a significant decrease of
2
http://www.mathworks.de/de/help/images/ref/regionprops.html
Figure 7.4: Results of the balls data set obtained with the FB-EM-TV algorithm for different regularization parameters: a) α = 0.0025, b) α = 0.0055, c) α = 0.0095, d) α = 0.0125.
Figure 7.5: Minimal euclidean distance between the centroid of a reconstructed ball and
the centroid of a ball in the phantom. Examples for different regularization parameters.
contrast when the regularization parameter increases (cf. Figure 7.4 b)-d)). We used
these reconstruction results for different regularization parameters to get an idea of
the position of the reconstructed balls compared to the ground truth given in Figure
7.3 a). Depending on the regularization parameter, we plotted the minimal Euclidean
distance between the centroid of the reconstructed ball and the centroid of a ball in the
phantom in Figure 7.5. Here, each point represents a reconstructed ball. By comparing
the results for a smaller regularization parameter, where also the balls with low contrast
are reconstructed, to the results obtained with a larger parameter, where only the balls
with high contrast are reconstructed, we can conclude that the deviation is higher for
smaller balls or balls with low contrast. Overall, the distance is roughly in the range
from 0 to 2, therefore we used this value as the limit in our validation tool to decide
whether a ball is correctly reconstructed or not. Note that extreme outliers are not
plotted in Figure 7.5.
Figure 7.6: Result for the balls data set obtained with the FB-EM-TV algorithm prior to and after segmentation. a) Result prior to segmentation. b) Result after segmentation. c) KL divergence of f^δ and Ku_k. The Jaccard similarity of b) and Fig. 7.3 b) is 0.6703.
A result of the FB-EM-TV algorithm is presented in Figure 7.6 a). We chose the initial
solution u0 = 30, the regularization parameter α = 0.0025, and 150 EM steps. This
resulted in a computational time of approximately 72 minutes. If we compare this
result to the TVreg result in Figure 7.3 c), we see that our algorithm produces sharp
edges of the objects and a much smoother background. Note that the background
artifacts that are still visible are a consequence of the higher starting value. The balls
that are reconstructed are roughly the same in both results. A drawback of the FB-EM-TV algorithm is that the contrast loss as a result of TV regularization makes it hard
to detect the row with the lowest contrast. This influences the segmentation result
in Figure 7.6 b). The lower intensity balls on the right are only partly segmented,
especially the contrast of the largest ball in the right row is too low. Nevertheless, we
obtain an increased Jaccard similarity index of 0.6703 compared to 0.585 before. In
Figure 7.6 c) we see that the KL divergence of the noisy data f^δ and Ku_k decreases monotonically in every iteration step, which indicates that in each iteration the value of the objective functional D_KL(f^δ, Ku) + α|u|_BV(Ω) decreases.
The Bregman-FB-EM-TV algorithm is an extension of the FB-EM-TV algorithm, designed to reduce the contrast loss caused by total variation regularization. In Figure
7.7 we show a result of this extended algorithm. Here, the initial solution was u0 = 1,
the number of EM steps in each Bregman step 150 and the regularization parameter
α = 0.009. The computational time for this result was roughly 4 hours. The algorithm
is an inverse scale space method, thus, it starts with an overregularized solution and
incorporates more information in each Bregman iteration. Figure 7.7 a)-c) shows the
results after each Bregman iteration, whereby c) is the final result. The enhanced
Figure 7.7: Results for the balls data set obtained with the Bregman-FB-EM-TV algorithm prior to and after segmentation. a) - c) Results of the 1st - 3rd Bregman step. d) Result of the 3rd step after segmentation. e) KL divergence of f^δ and Ku_k. The Jaccard similarity index of d) and Fig. 7.3 b) is 0.7687.
contrast leads to a better segmentation result, making it possible to detect the balls
with the lowest contrast, too. A drawback is that the contrast of false objects may
be enhanced as well. On the right edge of Figure 7.7 d) we can see a false object,
which was not detected in the result of the FB-EM-TV algorithm (cf. Figure 7.6 b)).
Compared to the segmented phantom in Figure 7.3 b) we obtain an improved Jaccard
similarity index of 0.7687. Again, Figure 7.7 e) shows the KL divergence of f^δ and Ku_k, indicating a monotone decrease of the objective functional.
Table 7.1 shows that the number of correctly reconstructed balls differs only slightly
among the reconstruction algorithms. All three have their strengths and weaknesses.
The Bregman- and the FB-EM-TV algorithm deliver sharp edges, making it easier to
differentiate between objects and the background. A drawback is that both algorithms
tend to reconstruct false objects at the edges of the reconstruction. Therefore, an
advantage of the TVreg software is that the number of false objects is distinctly smaller
compared to the other algorithms. Unfortunately, the edges of the TVreg results are
blurred, especially in outer slices of the 3D object. This makes the segmentation more
Algorithm             # correctly recons. balls   # false objects   # missing balls
TVreg                            23                       4                17
FB-EM-TV                         22                      10                18
Bregman-FB-EM-TV                 25                       8                15

Table 7.1: Evaluation of reconstructed balls.
Figure 7.8: Volume ratio of the balls in the reconstruction compared to the phantom.
complicated and results in segmented objects that are oval-shaped rather than circular. This is illustrated in Figure 7.8. Here, for each correctly reconstructed ball we compare
the volume of this ball after segmentation to the corresponding one in the ground
truth segmentation. For the TVreg software most of the segmented balls are too large,
sometimes even twice as large as the ground truth. On average, the volume ratio
between the reconstruction after segmentation and the ground truth segmentation is
1.336. The enhanced sample contours in the (Bregman-)FB-EM-TV results facilitate
the segmentation step and prevent the volume of the segmented balls from being much larger than in the ground truth. A drawback of the FB-EM-TV algorithm is that for
some balls the contrast is too low to segment the whole ball. Therefore, the volume
tends to be too small with some outliers that are not even half as large as the ground
truth segmentation. Here, the average volume ratio is 0.808. The best results can
be obtained with the Bregman-FB-EM-TV algorithm, where the average volume ratio
is 0.924. Apart from two outliers, the volume ratios of the reconstructed balls after
segmentation compared to the ground truth segmentation are uniformly distributed
around 1.
Figure 7.9: Phantom and TVreg result for RNA Polymerase II with high-dose data prior to and after segmentation. a) Phantom. b) Phantom after segmentation. c) TVreg result. d) TVreg result after segmentation. The Jaccard similarity index between b) and d) is 0.6375.
7.1.2 Results RNA Phantom
In the following paragraphs we will compare results of two simulated data sets, where
the underlying phantom is the simulation of an RNA Polymerase II. The phantom
prior to and after segmentation is shown in Figure 7.9 a) and b), respectively.
High Dose Tilt-Series The first data set we used has a very high signal-to-noise ratio
compared to ET standards. On the basis of this data set we want to clarify what is
meant by pointwise results mentioned in Chapter 6.4 and the influence of the initial
solution on these results.
Again, we start this section with a reference result obtained with the TVreg software,
shown in Figure 7.9 c) and its segmentation in 7.9 d). The Jaccard similarity index of
Figure 7.9 b) and d) is 0.6375.
In Figure 7.10 a) a result of the FB-EM-TV algorithm is shown. Here, we chose an
initial solution of u0 = 80, 150 EM iterates and the regularization parameter α =
0.00095. For this reconstruction, the computational time needed was 26 minutes. If we
compare this result to the TVreg result in Figure 7.9 c), we can see that we obtain a
much smoother background, but the form of the reconstructed specimen is nearly the
same. This is even more obvious when we compare the results after segmentation, i.e.
Figures 7.10 b) and 7.9 d). The Jaccard similarity index is 0.6768 for the FB-EM-TV
result and thereby slightly higher than for the TVreg result.
The enhanced contrast of the Bregman-FB-EM-TV algorithm again yields an improvement, although not as significant as before. The result prior to segmentation is shown
Figure 7.10: Result for RNA Polymerase II with high-dose data obtained with the FB-EM-TV algorithm prior to and after segmentation. a) Result prior to segmentation. b) Result after segmentation. c) KL divergence of f^δ and Ku_k. The Jaccard similarity of b) and Fig. 7.9 b) is 0.6768.
Figure 7.11: Result for RNA Polymerase II with high-dose data obtained with the Bregman-FB-EM-TV algorithm prior to and after segmentation. a) Result prior to segmentation. b) Result after segmentation. c) KL divergence of f^δ and Ku_k. The Jaccard similarity of b) and Fig. 7.9 b) is 0.6895.
in 7.11 a) and after segmentation in 7.11 b), whereby the Jaccard similarity index is
0.6895. This reconstruction is obtained after three Bregman steps with 150 EM iterations in each step and with α = 0.003 and u0 = 1. In total, the computational time was
45 minutes. Note that we ran our algorithm on different computers; therefore, the computational times are not necessarily consistent with each other and are only mentioned so that the reader can get an impression of the time needed for a reconstruction.
Figure 7.11 c) shows the decrease of the objective functional in each iteration.
Next, we want to clarify the influence of the initial solution. Table 7.2 shows four
reconstruction results, where all are obtained with the same regularization parameter
α = 0.002. We see that for an initial solution of u0 = 1 and 50 EM iterations we get
a pointwise result as described before. Only sparse high values are visible, whereas
the overall form of the specimen vanishes. With more EM iterations, this problem can
be solved. But since the computational time of our algorithm is always an important
Table 7.2: Influence of the initial solution and the number of EM iterates. The four reconstructions correspond to the combinations of u0 = 1 and u0 = 80 (rows) with 50 and 150 EM iterations (columns).
issue, we may prefer another solution, especially for large data sets. Then, an adapted
starting value u0 is advisable. In the second row we see the reconstruction results for
the same number of EM iterations but with u0 = 80. In this case, already after 50 EM
iterations the specimen form is clearly reconstructed and the next 100 iterations are
not necessarily needed if we want to shorten the computational time. Thus, an adapted
starting value can lead to improved results after the same number of iterations.
Low Dose Tilt-Series The most challenging simulated data set we used is again the
RNA Polymerase II but, compared to the prior reconstructions, with a significantly
decreased electron dose and thereby a very low signal-to-noise ratio.
Again, we are interested in a reference result obtained with the TVreg software. We
tested several parameters (see Figure 7.12), but in our tests we were not able to obtain
a result that could be segmented for the postprocessing steps.
In Figure 7.13 results of the FB-EM-TV algorithm for different regularization parameters are presented. For all results, we chose 150 EM iterations and the initial solution
u0 = 10. Here, we have to weigh whether the focus is on improved contrast or on fewer false
objects. By comparing Figure 7.13 a), d) and e), we see that the background becomes
smoother with a higher value of α, but we lose the upper part of the RNA Polymerase.
Therefore, we decided in favor of the less regularized solution. The result after segmentation
is shown in Figure 7.13 b) with a Jaccard similarity index of 0.3411 when compared to
the ground truth segmentation (cf. Figure 7.9 b)).
Figure 7.12: Results for RNA Polymerase II with low-dose data obtained with the TVreg software. Examples for different regularization parameters: a) λ_tv = 500, b) λ_tv = 750, c) λ_tv = 1500.
A similar comparison for different parameter choices but for the Bregman-FB-EM-TV
algorithm is given in Figure 7.14. We used 2 Bregman steps with 150 EM iterations
each and the initial solution u0 = 1. Compared to Figure 7.13 we see a significantly
enhanced contrast, especially for a smaller regularization parameter. This facilitates
postprocessing steps based on segmentation. Unfortunately, the higher contrast leads
to more false objects in the background and thereby a smaller Jaccard similarity index
of 0.2922 when Figure 7.14 b) and Figure 7.9 b) are compared. Nevertheless, we think
that the result in Figure 7.14 a) is preferable to those in d) and e), where we again lose the upper part of the object. The KL divergence of f^δ and Ku_k is shown in Figure 7.14 c).
With regard to Figures 7.12, 7.13 and 7.14, we can conclude that the (Bregman-)FB-EM-TV algorithm, in comparison to the TVreg software, is strong in cases of a very
low signal-to-noise ratio.
Figure 7.13: Results for RNA Polymerase II with low-dose data obtained with the FB-EM-TV algorithm for different regularization parameters. a) α = 0.006, d) α = 0.0065 and e) α = 0.007: results prior to segmentation. b) Result in a) after segmentation. c) KL divergence of f^δ and Ku_k. The Jaccard similarity index of b) and Fig. 7.9 b) is 0.3411.
Figure 7.14: Results for RNA Polymerase II with low-dose data obtained with the Bregman-FB-EM-TV algorithm for different regularization parameters. a) α = 0.011, d) α = 0.0125 and e) α = 0.0135: results prior to segmentation. b) Result in a) after segmentation. c) KL divergence of f^δ and Ku_k. The Jaccard similarity index of b) and Fig. 7.9 b) is 0.2922.
Figure 7.15: a) Experimental data set. b) Result of the TVreg software. c) Crystal structure of CPMV (http://www.scripps.edu/johnson/research/crystals.html).
7.2 Experimental Data
The experimental data set we use is a single-axis tilt-series of a cryo-fixated in vitro
specimen by courtesy of FEI (www.fei.com). The specimen contains a mixture of different viruses,
including Cowpea Mosaic Viruses (CPMV), in aqueous buffer. The data set we use is a
subset of the original larger data set, mainly containing CPMV virions. It is acquired
by FEI using a conventional 300 keV bright-field TEM. Data are recorded from −62.18◦
to 58.03◦ , with 81 micrographs in total. In Figure 7.15 we present the data set as well
as a crystal structure of a cowpea mosaic virus.
7.2.1 Results CPMV Virus
In this paragraph we present results of the (Bregman-)FB-EM-TV algorithm that we
obtained with the experimental data set presented in Figure 7.15 a).
In Figure 7.16 a) a result of the FB-EM-TV algorithm is shown. Here, the initial
solution was u0 = 1 and the regularization parameter α = 0.0035. In contrast to
the reconstruction shown in the previous paragraph, we used only 20 EM iterations,
which results in a computational time of 135 minutes. A reconstruction with 150
Figure 7.16: Results of the FB-EM-TV and Bregman-FB-EM-TV algorithm for the experimental data set. a) Result of the FB-EM-TV algorithm with α = 0.035. b) Result of the Bregman-FB-EM-TV algorithm with α = 0.095.
EM iterations, as done for the simulated data sets, would take 17 hours and thereby prevent reasonable parameter tests.
The second reconstruction, presented in Figure 7.16 b), is obtained with the Bregman-FB-EM-TV algorithm after 3 Bregman steps with 10 EM iterations each. The computation of this result took about 3.5 hours. Note that we used a small damping
parameter (ω = 0.1) for both reconstructions.
Both reconstructions suffer from strong intensity variations at the edges of the object.
Moreover, with more iterations the results become pointwise and the algorithm might
diverge. The formation of sparse high intensity values can be suppressed by a strong
regularization. This motivates the usage of the Bregman-FB-EM-TV algorithm as an
inverse scale space method. Unfortunately, the results of the later Bregman steps are
again impaired by individual high intensity values. Therefore, the algorithm needs to
be stopped before these high values arise but then we end up with an overregularized
solution. A small damping parameter can result in enhanced reconstructions, although
it does not solve the problem.
Thus, based on the current status of our results, we have to admit that these difficulties prevent reconstructions from experimental data sets that are comparable to the
results of the TVreg software. The good results with respect to contrast and contour
enhancement that we achieved with the Bregman-FB-EM-TV algorithm for simulated
Figure 7.17: Stripe artifacts that can impair the reconstruction results obtained with the FB-EM-TV algorithm using a scaled version of the experimental data set. Results for different regularization parameters: a) α = 0.0007, b) α = 0.00035, c) α = 0.007.
data sets cannot be transferred to experimental data sets. The reason for these difficulties is a missing padding in the forward operator, which has a negative influence in
the case of a nonlinear data term. In the implementation of the forward operator the
region outside the given data set is assumed to be zero. If the data discrepancy term is
linear and the data can be preprocessed in the way that the background is removed prior to reconstruction, this assumption is reasonable. In the case of a nonlinear data discrepancy term an equivalent preprocessing step to remove the background is not
possible. Thus, a reasonable assumption in this case is that the region outside the
given data set is constant, whereby this constant is estimated e.g. as the average of
the data. For the simulated data sets this is less an issue since the region that contains
the information about the specimen is only a subregion of the larger data set (cf. the
noise-free data sets in Figures 7.1 and 7.2 b)). After including this assumption in the
forward and adjoint operator we expect results from the experimental data set that are
comparable to the results from simulated data sets shown before.
Stripe Artifacts In this paragraph, we want to briefly address the formation of stripe artifacts mentioned before. Nevertheless, we want to stress that these are rare
artifacts when using the KL divergence as a data discrepancy term. In the examples
shown below, we scaled the data prior to reconstruction to reduce the influence of the
missing constant, mentioned before. Then, the formation of stripe artifacts was much
more likely than without the scaling. In some tests, where we used the L2 -norm as the
data discrepancy term we observed these artifacts much more often. In those cases,
the artifacts could be circumvented by a simple line search to determine the relaxation
parameter in the Landweber step.
Figure 7.17 shows results of the FB-EM-TV algorithm for different regularization parameters.
Prior to the reconstruction, we scaled the experimental data set by 10. We can see that
in the case of a small regularization parameter (cf. Fig. 7.17 a)) the results are impaired
by stripe artifacts with a direction orthogonal to the tilt axis. In each iteration step,
these artifacts are emphasized and result in the divergence of the algorithm. A higher
regularization (cf. Fig. 7.17 b) and c)) can suppress the formation of stripe artifacts
but leads to overregularized solutions. This suggests that in the case of stripe artifacts
the Bregman-FB-EM-TV algorithm should be used. Then, the artifacts are suppressed
in the first Bregman step and will not impair the reconstruction in later iterates.
8 Conclusion and Outlook
In this thesis we presented a reconstruction algorithm for ET based on variational
methods. We studied an approach using the KL divergence as a data discrepancy term
combined with different regularization functionals, namely total variation regularization and an extension with Bregman iterations.
We started this thesis with an introduction to ET and mentioned the advantages of
electron microscopy compared to light microscopy. Moreover, we gave an overview of
the build-up of a TEM in order to facilitate the understanding of the forward model
presented thereafter. The model we used to describe the image acquisition is a computationally feasible model for phase contrast TEM imaging. Afterwards, we introduced
variational approaches for solving an inverse problem and gave an overview on different
data and regularization functionals. We decided in favor of a data discrepancy term
based on a statistical modeling in terms of a MAP likelihood estimation combined
with total variation regularization or an extension with Bregman distances. Since we
assumed that the recorded data are affected by Poisson noise, the data discrepancy
term we used is the KL divergence.
In the numerical part we introduced two different forward-backward-splitting algorithms. They are adapted for a forward operator of the form Ku = C − Lu with a
linear operator L and can easily be generalized to other applications with a forward
operator of this affine form. Both algorithms are implemented in MATLAB, whereas
the forward and adjoint operator for phase contrast TEM imaging are implemented in
C and invoked via MEX files.
The first algorithm tries to find a minimum of an energy functional consisting of the
KL divergence and the total variation. We compared the results that we obtained with
this algorithm to the ones of another toolbox, which minimizes an energy functional
consisting of the L2-norm as a data discrepancy term and the total variation as well. The
L2 -norm as a data discrepancy term can be associated with modelling additive Gaussian
noise. Thus, by means of this comparison, we wanted to investigate the added benefit of
the more accurate model of Poisson noise. Based on the current status of our results we
come to the conclusion that improved results can be obtained, but that modelling Poisson noise instead of additive Gaussian noise does not achieve the improvements we hoped for.
The most notable improvements could be obtained for data with a very low signal-to-noise ratio. The quality of the reconstructions is affected by several difficulties
introduced in section 6.4 and clarified by some examples later on. Inaccuracies in the
implementation of the forward model seem to have a much higher impact in the case of
modelling Poisson noise. This affects especially the reconstructions from experimental
data sets. At the current status, these results cannot reflect the good results obtained
with simulated data sets. We think that an adapted implementation of the forward
model would be helpful and could solve some of the algorithmic problems.
The second algorithm is an extension of the first one by means of iterative regularization
based on Bregman distances. With regard to our results we can say that in nearly all
cases we could achieve improved results. Especially in the case of a low signal-to-noise ratio, iterative regularizations based on Bregman distances are beneficial. The
enhanced contrast of the results obtained with this second algorithm can facilitate
postprocessing steps based on segmentation methods as presented in Chapter 7.
Besides an adaptation of the implementation of the forward operator, there are further
tasks in order to improve the proposed reconstruction algorithms. Right now, there are
several parameters that need to be chosen manually. We would like to incorporate parameter choice rules in the (Bregman-)FB-EM-TV algorithm to minimize the number
of parameters that must be chosen by the user. Concerning the regularization parameter, this could be a fixed choice depending on the electron dose and other given
parameters that significantly influence the quality of the recorded data. Another task
is a suitable criterion to stop the Bregman iterations. Since there is no reliable estimate
of the noise level in ET, we need to find a way to stop the iterations as soon as the new
iterate is noisier than the previous one. To enhance the applicability of our algorithm
for large experimental data sets an acceleration is indispensable. As a first measure
one could implement the whole algorithm in C, although we think that a further acceleration is needed. Therefore, GPU-accelerated computing would be advantageous.
Moreover, we are interested in a theoretical analysis of our algorithm. Especially, we
would like to find out under which conditions the convergence of the algorithm can be
proven. Once these problems are solved, we want to enlarge the number of possible
data and regularization terms. We would like to incorporate a data term based on a
noise model that accounts for a mixture of Gaussian and Poisson noise, which is present
in TEM images. Moreover, it would be interesting to test higher-order TV methods.
Acronyms

ART     Algebraic Reconstruction Technique
CCD     Charged Coupled Device
CT      Computed Tomography
CTF     Contrast Transfer Function
EM      Expectation Maximization
ET      Electron Tomography
FBP     Filtered Back-Projection
FBS     Forward-Backward-Splitting
KKT     Karush-Kuhn-Tucker
KL      Kullback-Leibler
MAP     Maximum A Posteriori
MTF     Modulation Transfer Function
PSF     Point Spread Function
ROF     Rudin-Osher-Fatemi
SIRT    Simultaneous Iterative Reconstruction Technique
STEM    Scanning Transmission Electron Microscope
TEM     Transmission Electron Microscope
TV      Total Variation
WBP     Weighted Back-Projection
List of Figures

2.1   Different kinds of electron scattering from a thin specimen.
2.2   Cross section of the column of a modern TEM.
2.3   Cross-section of an electromagnetic lens.
2.4   Ray diagram illustrating how an aperture restricts the angular spread of electrons entering the lens.
2.5   The concept of overfocus, focus and underfocus.
2.6   Parallel-beam operation in the TEM.
2.7   Single tilt sample holder for a TEM.
3.1   The optical set-up consisting of a single thin lens with an aperture in its focal plane.
4.1   Bregman distances for single-valued subdifferentials.
4.2   Bregman distances for a multi-valued subdifferential.
4.3   Contrast loss for 1D signal recovered with TV regularization and different regularization parameters.
7.1   Simulated data set from balls phantom.
7.2   Simulated data set from RNA phantom.
7.3   Balls phantom and TVreg result prior to and after segmentation.
7.4   Results of the balls data set obtained with the FB-EM-TV algorithm for different regularization parameters.
7.5   Minimal Euclidean distance between the centroid of a reconstructed ball and the centroid of a ball in the phantom.
7.6   Result for the balls data set obtained with the FB-EM-TV algorithm prior to and after segmentation.
7.7   Results for the balls data set obtained with the Bregman-FB-EM-TV algorithm prior to and after segmentation.
7.8   Volume ratio of the balls in the reconstruction compared to the phantom.
7.9   Phantom and TVreg result for RNA Polymerase II with high-dose data prior to and after segmentation.
7.10  Result for RNA Polymerase II with high-dose data obtained with the FB-EM-TV algorithm prior to and after segmentation.
7.11  Result for RNA Polymerase II with high-dose data obtained with the Bregman-FB-EM-TV algorithm prior to and after segmentation.
7.12  Results for RNA Polymerase II with low-dose data obtained with the TVreg software.
7.13  Results for RNA Polymerase II with low-dose data obtained with the FB-EM-TV algorithm.
7.14  Results for RNA Polymerase II with low-dose data obtained with the Bregman-FB-EM-TV algorithm.
7.15  Experimental data set, TVreg result and crystal structure of CPMV.
7.16  Results of the FB-EM-TV and Bregman-FB-EM-TV algorithm for the experimental data set.
7.17  Stripe artifacts that can impair the reconstruction results obtained with the FB-EM-TV algorithm using a scaled version of the experimental data set.
List of Tables

4.1   Different Bregman Distances
5.1   Overview of the particular settings for q, h and β in (5.54) in dependence of the different algorithms presented in sections 5.2.2 and 5.2.3.
7.1   Evaluation of reconstructed balls.
7.2   Influence of the initial solution and the number of EM iterates.
Bibliography
[1] R. Acar and C. R. Vogel. Analysis of bounded variation penalty methods for
ill-posed problems. Inverse problems, 10(6):1217, 1994. 48, 66
[2] I. Aganj, A. Bartesaghi, M. Borgnia, H. Y. Liao, G. Sapiro, and S. Subramaniam.
Regularization for inverting the radon transform with wedge consideration. In
Biomedical Imaging: From Nano to Macro, 2007. ISBI 2007. 4th IEEE International Symposium on, pages 217–220. IEEE, 2007. 53
[3] A. Al-Amoudi, J.-J. Chang, A. Leforestier, A. McDowall, L. M. Salamin, L. P.
Norlén, K. Richter, N. S. Blanc, D. Studer, and J. Dubochet. Cryo-electron microscopy of vitreous sections. The EMBO journal, 23(18):3583–3588, 2004. 2
[4] U. Amato and W. Hughes. Maximum entropy regularization of Fredholm integral
equations of the first kind. Inverse Problems, 7(6):793, 1991. 59
[5] M. Bachmayr and M. Burger. Iterative total variation schemes for nonlinear inverse
problems. Inverse Problems, 25(10):105004, 2009. 51
[6] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh. Clustering with Bregman
divergences. The Journal of Machine Learning Research, 6:1705–1749, 2005. 45
[7] J. Bardsley. An efficient computational method for total variation-penalized poisson likelihood estimation. Inverse Problems and Imaging, 2(2):167–185, 2008. 46
[8] J. M. Bardsley. Stopping rules for a nonnegatively constrained iterative method for
ill-posed poisson imaging problems. BIT Numerical Mathematics, 48(4):651–664,
2008. 46
[9] J. M. Bardsley and J. Goldes. Regularization parameter selection methods for illposed poisson maximum likelihood estimation. Inverse Problems, 25(9):095005,
2009. 46
[10] J. M. Bardsley and N. Laobeul. An analysis of regularization by diffusion for
ill-posed poisson likelihood estimations. Inverse Problems in Science and Engineering, 17(4):537–550, 2009. 45
[11] J. M. Bardsley and A. Luttman. Total variation-penalized poisson likelihood
estimation for ill-posed problems. Advances in Computational Mathematics, 31(13):35–59, 2009. 45
[12] M. Benning, C. Brune, M. Burger, and J. Müller. Higher-order TV methods
- enhancement via Bregman iteration. Journal of Scientific Computing, 54(23):269–310, 2013. 51
[13] F. Benvenuto, A. La Camera, C. Theys, A. Ferrari, H. Lantéri, and M. Bertero.
The study of an iterative method for the reconstruction of images corrupted by
poisson and gaussian noise. Inverse Problems, 24(3):035016, 2008. 42, 43
[14] M. Bertero, P. Boccacci, G. Desiderà, and G. Vicidomini. Image deblurring with
poisson data: from cells to galaxies. Inverse Problems, 25(12):123006, 2009. 45
[15] M. Bertero, P. Boccacci, G. Talenti, R. Zanella, and L. Zanni. A discrepancy
principle for poisson data. Inverse Problems, 26(10):105004, 2010. 46
[16] P. Binev, W. Dahmen, R. DeVore, P. Lamby, D. Savu, and R. Sharpley. Compressed sensing and electron microscopy. In T. Vogt, W. Dahmen, and P. Binev,
editors, Modeling Nanoscale Imaging in Electron Microscopy, pages 73–126.
Springer US, 2012. 54
[17] L. M. Bregman. The relaxation method of finding the common point of convex
sets and its application to the solution of problems in convex programming. USSR
computational mathematics and mathematical physics, 7(3):200–217, 1967. 43
[18] C. Brune. Variationsmethoden in der biomedizinischen Bildgebung. Lecture Notes,
2012. 46, 48, 49, 54
[19] C. Brune, A. Sawatzky, and M. Burger. Bregman-EM-TV methods with application to optical nanoscopy. In X.-C. Tai, K. Mørken, M. Lysaker, and K.-A. Lie,
editors, Scale Space and Variational Methods in Computer Vision, volume 5567
of Lecture Notes in Computer Science, pages 235–246. Springer Berlin Heidelberg,
2009. 46, 51, 52
[20] C. Brune, A. Sawatzky, and M. Burger. Primal and dual Bregman methods
with application to optical nanoscopy. International Journal of Computer Vision,
92(2):211–229, 2011. 46, 52, 62, 83
[21] M. Burger, K. Frick, S. Osher, and O. Scherzer. Inverse total variation flow.
Multiscale Modeling & Simulation, 6(2):366–395, 2007. 51
[22] M. Burger, G. Gilboa, S. Osher, J. Xu, et al. Nonlinear inverse scale space methods. Communications in Mathematical Sciences, 4(1):179–212, 2006. 51, 52, 79
[23] M. Burger and S. Osher. A guide to the TV zoo. In Level Set and PDE Based
Reconstruction Methods in Imaging, Lecture Notes in Mathematics. Springer International Publishing, 2013. 54, 56, 58
[24] A. Chambolle. An algorithm for total variation minimization and applications.
J. Math. Imaging Vision, 20(1-2):89–97, 2004. Special issue on mathematics and
image analysis. 81
[25] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vision, 40(1):120–145, 2011.
81
[26] T. F. Chan, G. H. Golub, and P. Mulet. A nonlinear primal-dual method for total
variation-based image restoration. SIAM J. Sci. Comput., 20(6):1964–1977, 1999.
66
[27] I. Csiszar. Why least squares and maximum entropy? An axiomatic approach to
inference for linear inverse problems. The annals of statistics, 19(4):2032–2066,
1991. 45
[28] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B, 39(1):1–38, 1977.
With discussion. 69, 70
[29] I. Ekeland and R. Temam. Convex analysis and variational problems. NorthHolland Publishing Co., Amsterdam-Oxford; American Elsevier Publishing Co.,
Inc., New York, 1976. Translated from the French, Studies in Mathematics and
its Applications, Vol. 1. 36, 62, 71
[30] H. C. Elman and G. H. Golub. Inexact and preconditioned Uzawa algorithms for
saddle point problems. SIAM J. Numer. Anal., 31(6):1645–1661, 1994. 82
[31] D. Fanelli and O. Öktem. Electron tomography: a short overview with an emphasis
on the absorption potential model for the forward problem. Inverse Problems,
24(1):013001, 2008. 16, 18, 23, 24
[32] A. R. Faruqi and S. Subramaniam. CCD detectors in high-resolution biological
electron microscopy. Quarterly Reviews of Biophysics, 33:1–27, 2000. 14
[33] FEI. An introduction to electron microscopy. 2010. 6, 8, 9, 10, 12, 14, 15
[34] R. Fletcher and C. M. Reeves. Function minimization by conjugate gradients.
Comput. J., 7:149–154, 1964. 64
[35] J. Frank. Single-particle reconstruction of biological macromolecules in electron
microscopy–30 years. Quarterly reviews of biophysics, 42(03):139–158, 2009. 2
[36] E. Gil-Rodrigo, J. Portilla, D. Miraut, and R. Suarez-Mesa. Efficient joint poissongauss restoration using multi-frame l2-relaxed-l0 analysis-based sparsity. In Image
Processing (ICIP), 2011 18th IEEE International Conference on, pages 1385–1388,
2011. 42, 43
[37] P. Gilbert. Iterative methods for the three-dimensional reconstruction of an object
from projections. Journal of Theoretical Biology, 36(1):105–117, 1972. 33
[38] R. Glaeser, K. Downing, D. DeRozier, W. Chu, and J. Frank. Electron crystallography of biological macromolecules. Oxford University Press, 2007. 2
[39] T. Goldstein and S. Osher. The split Bregman method for L1-regularized problems. SIAM J. Imaging Sci., 2(2):323–343, 2009. 81, 82
[40] R. Gordon, R. Bender, and G. T. Herman. Algebraic reconstruction techniques
(ART) for three-dimensional electron microscopy and x-ray photography. Journal
of theoretical Biology, 29(3):471–481, 1970. 33
[41] B. Goris, W. Van den Broek, K. Batenburg, H. Heidari Mezerji, and S. Bals. Electron tomography based on a total variation minimization reconstruction technique.
Ultramicroscopy, 113:120–130, 2012. 53
[42] P. W. Hawkes and E. Kasper. Principles of electron optics, volume 3. Access
Online via Elsevier, 1996. 16
[43] R. Henderson. Realizing the potential of electron cryo-microscopy. Quarterly
Reviews of Biophysics, 37(01):3–13, 2004. 2
[44] M. Hintermüller and K. Kunisch. Total bounded variation regularization as a
bilaterally constrained optimization problem. SIAM J. Appl. Math., 64(4):1311–
1333, 2004. 66
[45] J.-B. Hiriart-Urruty and C. Lemaréchal. Convex analysis and minimization algorithms. I, volume 305 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 1993.
Fundamentals. 69, 82
[46] K. Ito and K. Kunisch. Lagrange multiplier approach to variational problems and
applications, volume 15 of Advances in Design and Control. Society for Industrial
and Applied Mathematics (SIAM), Philadelphia, PA, 2008. 82
[47] A. Jezierska, E. Chouzenoux, J. Pesquet, and H. Talbot. A primal-dual proximal splitting approach for restoring data corrupted with poisson-gaussian noise.
In Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International
Conference on, pages 1085–1088, 2012. 42, 43
[48] P. Lax. Functional Analysis. Pure and applied mathematics. Wiley, 2002. 54, 55
[49] A. Leis, B. Rockel, L. Andrees, and W. Baumeister. Visualizing cells at the
nanoscale. Trends in biochemical sciences, 34(2):60–70, 2009. 2
[50] P.-L. Lions and B. Mercier. Splitting algorithms for the sum of two nonlinear
operators. SIAM J. Numer. Anal., 16(6):964–979, 1979. 67
[51] L. Lucy. An iterative technique for the rectification of observed distributions. The
astronomical journal, 79:745, 1974. 69
[52] F. Luisier, T. Blu, and M. Unser. Image denoising in mixed poisson–gaussian
noise. Image Processing, IEEE Transactions on, 20(3):696–708, 2011. 42
[53] J. Müller. Parallel total variation minimization. Master’s thesis, Institute for
Computational and Applied Mathematics, University of Münster, 2008. 66
[54] F. Natterer. The mathematics of computerized tomography. Springer, 1986. 29
[55] F. Natterer and F. Wübbeling. Mathematical methods in image reconstruction.
Siam, 2001. 33, 70
[56] O. Öktem. Reconstruction methods in electron tomography. In Mathematical Methods in Biomedical Imaging and Intensity-Modulated Radiation Therapy
(IMRT), pages 289–320. 2008. QC 20120131. 7, 16, 30
[57] S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin. An iterative regularization method for total variation-based image restoration. Multiscale Modeling &
Simulation, 4(2):460–489, 2005. 51, 52, 62, 79
[58] G. B. Passty. Ergodic convergence to a zero of the sum of monotone operators in
Hilbert space. J. Math. Anal. Appl., 72(2):383–390, 1979. 67
[59] P. A. Penczek. Fundamentals of three-dimensional reconstruction from projections. Methods in enzymology, 482:1–33, 2010. 32
[60] L. Reimer and H. Kohl. Transmission electron microscopy: physics of image
formation, volume 36. Springer, 2008. 16
[61] E. Resmerita, H. W. Engl, and A. N. Iusem. The expectation-maximization algorithm for ill-posed integral equations: a convergence analysis. Inverse Problems,
23(6):2575–2588, 2007. 70
[62] W. H. Richardson. Bayesian-based iterative method of image restoration. JOSA,
62(1):55–59, 1972. 69
[63] C. V. Robinson, A. Sali, and W. Baumeister. The molecular sociology of the cell.
Nature, 450(7172):973–982, 2007. 2
[64] L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal
algorithms. Physica D: Nonlinear Phenomena, 60(1):259–268, 1992. 49
[65] H. Rullgård. A new principle for choosing regularization parameter in certain
inverse problems. arXiv preprint arXiv:0803.3713, 2008. 53, 54, 85, 86
[66] H. Rullgård, L.-G. Öfverstedt, S. Masich, B. Daneholt, and O. Öktem. Simulation
of transmission electron microscope images of biological specimens. Journal of
microscopy, 243(3):234–256, 2011. 90
[67] H. Rullgård, O. Öktem, and U. Skoglund. A componentwise iterated relative
entropy regularization method with updated prior and regularization parameter.
Inverse Problems, 23(5):2121, 2007. 53
[68] A. Sawatzky. (Nonlocal) total variation in medical imaging. PhD thesis, Institute
for Computational and Applied Mathematics, University of Münster, 2011. 82
[69] A. Sawatzky, C. Brune, T. Kösters, F. Wübbeling, and M. Burger. EM-TV methods for inverse problems with poisson noise. In Level Set and PDE Based Reconstruction Methods in Imaging, pages 71–142. Springer, 2013. 46, 52, 54, 55, 56,
57, 58, 59, 61, 68, 72, 74, 79, 81, 83
[70] A. Sawatzky, C. Brune, J. Müller, and M. Burger. Total variation processing of
images with poisson statistics. In Computer Analysis of Images and Patterns,
pages 533–540, 2009. 46
[71] O. Scherzer, M. Grasmair, H. Grossauer, M. Haltmeier, and F. Lenzen. Variational
methods in imaging, volume 167. Springer, 2008. 35, 47
[72] L. I. Schiff. Quantum mechanics, chap 2. cGraw-Hill, 1968. 19
[73] S. Setzer, G. Steidl, and T. Teuber. Deblurring poissonian images by split Bregman techniques. Journal of Visual Communication and Image Representation,
21(3):193–199, 2010. 46
[74] L. A. Shepp and Y. Vardi. Maximum likelihood reconstruction for emission tomography. Medical Imaging, IEEE Transactions on, 1(2):113–122, 1982. 70
[75] U. Skoglund, L.-G. Öfverstedt, R. M. Burnett, and G. Bricogne. Maximumentropy three-dimensional reconstruction with deconvolution of the contrast transfer function: a test application with adenovirus. Journal of structural biology,
117(3):173–188, 1996. 53
[76] D. L. Snyder, A. M. Hammoud, and R. L. White. Image recovery from data
acquired with a charge-coupled-device camera. JOSA A, 10:1014–1023, 1993. 42
[77] A. Sommerfeld. Die Greensche Funktion der Schwingungsgleichung. J.-Ber. Deutsch Math.-Verein, 21:309–353, 1921. 20
[78] A. C. Steven and W. Baumeister. The future is hybrid. Journal of structural
biology, 163(3):186–195, 2008. 2
[79] T. Teuber, G. Steidl, and R. H. Chan. Minimization and parameter estimation for
seminorm regularization models with I-divergence constraints. Inverse Problems,
29(3):035007, 2013. 46
[80] A. N. Tikhonov. On the solution of incorrectly put problems and the regularisation
method. In Outlines Joint Sympos. Partial Differential Equations (Novosibirsk,
1963), pages 261–265. Acad. Sci. USSR Siberian Branch, Moscow, 1963. 35
[81] A. N. Tikhonov. Numerical methods for the solution of ill-posed problems.
Springer, 1995. 35
[82] Y. Vardi, L. Shepp, and L. Kaufman. A statistical model for positron emission
tomography. Journal of the American Statistical Association, 80(389):8–20, 1985.
70
[83] C. R. Vogel. Computational methods for inverse problems, volume 23 of Frontiers
in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM),
Philadelphia, PA, 2002. With a foreword by H. T. Banks. 64, 65
[84] D. B. Williams and C. B. Carter. The Transmission Electron Microscope. Springer,
1996. 5, 6, 7, 8, 9, 10, 11, 12, 13
[85] R. Zanella, P. Boccacci, L. Zanni, and M. Bertero. Efficient gradient projection methods for edge-preserving removal of poisson noise. Inverse Problems,
25(4):045010, 2009. 46