Nonlinear Reconstruction Methods for Transmission Electron Microscopy

Master's thesis submitted for the degree of Master of Science
Westfälische Wilhelms-Universität Münster
Fachbereich Mathematik und Informatik
Institut für Numerische und Angewandte Mathematik

Supervision: Dr. Christoph Brune, Prof. Dr. Martin Burger, Prof. Ozan Öktem
Submitted by: Leonie Zeune
Münster, June 2014

Abstract

Electron tomography (ET) is a technique to recover the three-dimensional structure of an object on a molecular level from a set of two-dimensional transmission electron microscope (TEM) images recorded from different perspectives. These images are corrupted by Poisson as well as Gaussian noise. The resulting inverse problem is severely ill-posed due to a combination of a very low signal-to-noise ratio and an incomplete data problem. In this thesis we present an approach to solve this inverse problem with variational methods. It is based on a statistical modeling of the inverse problem in terms of maximum a posteriori (MAP) estimation. In contrast to the majority of reconstruction methods in the field of ET, we focus on modeling data corrupted by Poisson noise. Thus, we minimize a nonlinear energy functional with the Kullback-Leibler divergence as the data discrepancy term, combined with total variation regularization. In order to solve this optimization problem, we propose an alternating two-step iteration consisting of an expectation-maximization (EM) step and the solution of a weighted Rudin-Osher-Fatemi (ROF) model. The algorithm is adapted to the affine form of the forward model in ET. In order to overcome the contrast loss typical for TV-based regularization, we extend the algorithm by iterative regularization based on Bregman distances. Finally, we illustrate the performance of our techniques on synthetic and experimental biological data.

Statutory Declaration

I, Leonie Zeune, hereby declare that I have written this thesis independently and have used no sources or aids other than those indicated. All content adopted from other works, in substance or in wording, has been marked by a reference to its origin in the text or in a note. The same applies to figures, tables, drawings and sketches that were not created by myself. All programs included on the enclosed CD were written by myself or are marked by a statement of their origin.

Münster, 16 June 2014
Leonie Zeune

Acknowledgment

I want to take this opportunity to thank the people who supported and encouraged me in the last months and made this thesis possible, especially

Dr. Christoph Brune for being a great supervisor and for helping and supporting me a lot, for much valuable advice, and for motivating me when things did not work out as I hoped.

Prof. Ozan Öktem for giving me the unique opportunity to spend six months at the Royal Institute of Technology (KTH) in Stockholm and making me feel welcome. Especially, I want to thank him for taking a lot of time to support me and for very helpful discussions and advice.

Prof. Dr. Martin Burger for giving me the opportunity to work on this interesting and challenging topic and for making my stay at KTH possible.

The people who proofread my thesis and thereby helped to improve it.

Stefan Poggensee for a lot of patience and encouragement during the last months, for always believing in me, and for making me happy.
My great family - my parents, my sisters Lisa and Sophie, and their partners Tomas and Stefan - for a lot of love and encouragement and for always cheering me up.

Contents

1 Introduction
2 Electron Tomography
  2.1 Transmission Electron Microscopy
    2.1.1 Advantages of Illumination by Electrons
    2.1.2 Image Formation Process
    2.1.3 Scattering and Diffraction
  2.2 Electron Microscope
    2.2.1 Electron Source
    2.2.2 The Condenser System
    2.2.3 Holders and Stage
    2.2.4 Imaging Lenses
    2.2.5 Viewing Device
  2.3 Sample Preparation
3 The Forward Model
  3.0.1 Basic Notation
  3.1 Image Formation
    3.1.1 Modeling Phase Contrast Imaging
    3.1.2 The Forward Operator
  3.2 The Inverse Problem
  3.3 Difficulties for Solving the Inverse Problem
  3.4 Reconstruction Methods in Electron Tomography
4 Inverse Problems and Variational Methods
  4.1 Modeling
    4.1.1 Basic Concept
    4.1.2 Different Noise Models and Corresponding Data Terms
    4.1.3 Different Regularization Terms
  4.2 Existence and Uniqueness of a Minimizer
5 Numerical Approach
  5.1 Introduction to Numerical Optimization Methods
    5.1.1 Gradient Descent Methods
    5.1.2 Newton Methods
    5.1.3 Introduction to Splitting Methods
  5.2 Bregman-FB-EM-TV Algorithm
    5.2.1 EM Algorithm
    5.2.2 FB-EM-TV Algorithm
    5.2.3 Bregman-FB-EM-TV Algorithm
    5.2.4 Numerical Realization of the Weighted ROF Model
6 Programming and Realization in MATLAB and C
  6.1 Bregman-FB-EM-TV Toolbox
  6.2 TVreg Software
  6.3 Embedding of the Forward Operator via MEX files
  6.4 Difficulties
7 Results
  7.1 Simulated Data
    7.1.1 Results Balls Phantom
    7.1.2 Results RNA Phantom
  7.2 Experimental Data
    7.2.1 Results CPMV Virus
8 Conclusion and Outlook
List of Figures
List of Tables
Bibliography

1 Introduction

Microscopy, whose origins can be traced back to the 16th century, has played and still plays a central role in life sciences and medicine. In general, there are two major goals that provide incentives for developments in the field of microscopy. On the one hand, there is the quest for higher magnification and resolving power. There is now a variety of microscopic imaging techniques, e.g. electron microscopy, which uses electrons instead of visible light for imaging. Compared to visible light, electrons have a much shorter wavelength, which makes it possible to image specimens at significantly higher resolution than in light microscopy. On the other hand, there is the quest to image in three dimensions rather than in two. This is essential for understanding the structural three-dimensional conformation of proteins and macromolecular assemblies, which is closely related to their function within biological processes in the cell in time and space. Such assemblies form the machinery responsible for most biological processes and are relevant for understanding many diseases such as cancer and metabolic disorders. Knowledge of these three-dimensional structures would provide mechanistic descriptions of how macromolecules act in an assembly. Moreover, it could give some indications for developing therapeutic interventions related to diseases.

Therefore, the structure determination problem has come to play a central role in both commercial and academic biomedical research. It is the problem of recovering the three-dimensional structure of an individual molecule (e.g. a protein or a macromolecular assembly) at the highest possible resolution in its natural environment, which could be in-situ (i.e. in the cellular environment) or in-vitro (i.e. in an aqueous environment). Two established structure determination methods are X-ray crystallography and nuclear magnetic resonance. However, neither of these methods can recover the three-dimensional structure of an individual molecule of a sub-cellular complex in its natural environment. This is important in order to address many key biological issues. A particular issue is the currently unacceptably high failure rate in drug discovery, even in late stages of the development process. It is very difficult to understand drug targets and disease mechanisms on a molecular level and to match the model system to human biology properly. This is one of the main reasons for the high failure rates when drug candidates progress from pre-clinical studies to clinical trials.

Due to the reasons above, one of the main goals of present-day electron microscopy is to look at the life processes within a cell at the molecular level. Traditional electron microscopy is limited to acquiring a two-dimensional image, which is then interpreted with little or no image post-processing. Thus, from a mathematical point of view, microscopy in this context is not necessarily understood as an inverse problem. Nevertheless, there is the quest to image specimens in three dimensions. During recent years electron microscopy has developed into the most useful tool to study macromolecules, molecular complexes and supramolecular assemblies in three dimensions.
Three main techniques have been applied: electron crystallography, reviewed in [43, 38], single particle analysis, reviewed in [35], and Electron Tomography (ET), which is the topic of this thesis. Electron crystallography permits structural analysis of macromolecules at or close to atomic resolution (0.4 nm or better). It relies on the availability of two-dimensional crystals and has proven to be especially suited for membrane proteins. Larger biological complexes are preferably studied by single particle analysis, which in favorable cases allows the molecular objects to be examined at medium resolution (1-2 nm). Notably, this resolution is good enough to permit docking, which is the process of fitting high-resolution structures of subunits or domains (usually obtained by X-ray crystallography) into the large structure at hand. This approach may reveal the entire complex at close to atomic resolution. Finally, ET can provide structural information at the molecular level in the context of the cellular environment. Most importantly, by using sample preparation techniques that preserve the specimens at close to life conditions (called cryo electron microscopy), it is possible to study macromolecular complexes in aqueous solution or small cells, or sections through larger cells or tissues (see e.g. [3, 49]). Presently, the resolution is limited to 4-5 nm, but it seems reasonable to expect higher resolutions in the near future. The docking approach would then be realistic, which will help to identify and characterize the observed molecular complexes.

It should finally be recalled that ET examines supramolecular assemblies as individual objects. This is essential since within the cell these complex structures are likely to be dynamic, i.e. they change their conformation and subunit composition. Moreover, they interact, often transiently, with other molecular assemblies and cellular structures. Thus, often in conjunction with other methods such as X-ray crystallography, mass spectrometry and single particle analysis (cf. [63, 78]), ET is likely to be the most efficient tool to visualize supramolecular structures at work. This will help us to understand how they operate within the cell at the molecular level.

The idea in ET is to recover the three-dimensional structure of a specimen from a set of two-dimensional Transmission Electron Microscope (TEM) images of the specimen. Here, each image represents a view of the specimen under scrutiny from a different direction. This technique, which was first outlined about 40 years ago, is still the only approach that allows one to reconstruct the three-dimensional structure of individual molecules in their natural (i.e. in-situ or in-vitro) environment. Hence, ET is more of a structure determination method than a direct imaging method, where imaging data is directly interpreted. To infer the three-dimensional image from the available two-dimensional imaging data one needs to use mathematics. The reconstruction problem in ET is, however, an example of a limited data inverse scattering problem which is severely ill-posed. This, in combination with very noisy data, makes it difficult to obtain reliable reconstructions with high enough resolution, unless sophisticated mathematics is employed. Hence, the usage of ET in mainstream structural biology has been limited to the study of cellular sub-structures rather than individual molecules. As already mentioned, reliable reconstruction methods play an important role for the success of ET.
This thesis presents a reconstruction method for ET based on variational methods. Mathematically speaking, we solve the resulting inverse problem by minimizing an energy functional consisting of two different terms: a data discrepancy term and a regularization term. The data discrepancy term is given by assumptions about the noise in the recorded data. The regularization term accounts for a-priori knowledge that stabilizes the inversion. This is needed since the inverse problem is severely ill-posed. We choose the total variation as the regularization functional. This choice leads to reconstructions with preserved or even enhanced sample contours. Unfortunately, such reconstructions suffer from a systematic loss of contrast. Hence, we also present techniques to overcome this drawback.

This thesis is organized as follows. We start with an introduction to electron tomography in Chapter 2. In particular, we address the advantages of TEM compared to light microscopy and describe how it operates. In Chapter 3, we outline the derivation of a computationally feasible forward model for phase contrast TEM imaging. Moreover, we comment on some difficulties for solving the resulting inverse problem. Chapter 4 starts with an introduction to inverse problems and variational methods before we move on to present the minimization problem we want to solve in our reconstruction algorithm. The latter is presented in Chapter 5, whereas the computational realization is described in Chapter 6. Afterwards, numerical results using simulated as well as experimental data sets are presented in Chapter 7. Finally, Chapter 8 provides conclusions and an outlook.

2 Electron Tomography

ET is a tomography technique to obtain 3D reconstructions of specimens at the nanometer scale. It is comparable to phase contrast X-ray Computed Tomography (CT), but with electrons instead of photons. The underlying data for the reconstruction process are a series of 2D images recorded with a TEM. The individual images, henceforth called micrographs, are recorded while the specimen is tilted along a fixed axis in the microscope. This chapter explains the operating principles of Transmission Electron Microscopy, which is essential for understanding the forward model presented in the following chapter.

2.1 Transmission Electron Microscopy

Transmission Electron Microscopy is an imaging technique that uses an electron beam to illuminate the specimen. The theoretical basis for using electrons for imaging was laid by Hans Busch in the 1920s when he designed the first working electron lens and thereby laid the foundations for electron optics. He suggested that magnetic fields can be used to shape the path of an electron beam and are thus usable analogously to lenses in light optics. Besides Busch's work, there have been several discoveries about the properties of electrons that led to the invention of the electron microscope. It was Louis de Broglie who stated that electrons, besides their interpretation as particles, have wave-like characteristics. This concept is known as the wave-particle duality and is a central concept of quantum mechanics. Moreover, it was discovered that electrons behave in vacuum much like light. Based on these findings, Ernst Ruska built the first TEM in 1931, and in 1939 the first commercial TEM became available.
2.1.1 Advantages of Illumination by Electrons

Combined with his theory about the wave-particle duality of electrons, de Broglie stated that electrons have a wavelength

\lambda = \frac{h}{p} = \frac{h}{m_0 v},   (2.1)

with h denoting Planck's constant, p the momentum, m_0 the electron mass at rest, and v the velocity. Next, equating the kinetic and potential energies yields

eV = \frac{m_0 v^2}{2}.   (2.2)

Combining (2.1) and (2.2) allows us to derive a relationship between the wavelength of electrons and their energy. If we neglect relativistic effects, this relationship is

\lambda = \frac{h}{\sqrt{2 m_0 e V}}.   (2.3)

Thus, an increase of the acceleration voltage for the electrons results in a smaller electron wavelength. For a typical acceleration voltage of 200 kV the wavelength is \lambda = 0.00274 nm, and for 300 kV it is even \lambda = 0.00224 nm. The wavelength of electrons is thus much smaller than the wavelength of visible light, which ranges from 390 to 700 nm.

The main advantage of an electron microscope in comparison to a light microscope is the significantly higher resolution that can theoretically be achieved. Resolution is defined as the smallest distance \delta between two points such that one can still see them as two separate points. For classical light microscopes, the relation between the wavelength \lambda of the illumination source and the theoretically achievable resolution \delta is given by the Rayleigh criterion

\delta = \frac{0.61 \lambda}{\mu \sin \beta},   (2.4)

where \mu is the refractive index of the viewing medium and \beta the semi-angle of collection of the magnifying lens (cf. [84]). The term \mu \sin \beta is called the numerical aperture and can be approximated by \mu \sin \beta \approx 1. Thus, the resolution is directly related to the wavelength of the light. The lowest obtainable resolution for light microscopy with conventional lenses is about 200 nm. Albeit small, this is still about 100 times larger than the size of typical proteins, which ranges from 3 to 10 nm, and about 1000 times larger than the diameter of an atom, which ranges from 0.062 to 0.52 nm. Therefore, light microscopy cannot be used to study life science processes at a molecular level. Since the wavelength of electrons is much smaller, resolutions at the atomic level are in theory easily obtained. The relationship between the resolution \delta of a TEM and the electron wavelength \lambda can be approximated by

\delta = \frac{1.22 \lambda}{\beta},   (2.5)

where \beta is the semi-angle of collection of the electromagnetic lens (cf. [84]). But in contrast to light microscopy, where the achieved resolutions are close to the predicted values, electron microscopy is far from achieving the predicted resolutions. In materials science the best achieved resolution is about 0.05 nm (cf. [33]). In biological applications resolutions are around 4-5 nm. This limitation is mainly caused by technical problems and specimen properties, which limit the dose and energy of the incoming electrons. See Chapter 3 for more details.
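The numbers above are easy to verify. The following minimal Python sketch evaluates (2.3) and (2.5); the physical constants are standard SI values, while the collection semi-angle beta = 10 mrad is an assumed illustrative value, not one taken from this thesis.

```python
import numpy as np

# Sanity check of (2.3) and (2.5). Constants are standard SI values.
h = 6.626e-34   # Planck's constant [J s]
m0 = 9.109e-31  # electron rest mass [kg]
e = 1.602e-19   # elementary charge [C]

def electron_wavelength(V):
    """Non-relativistic electron wavelength (2.3) for an acceleration voltage V [V]."""
    return h / np.sqrt(2.0 * m0 * e * V)

beta = 10e-3  # semi-angle of collection [rad]; an assumed illustrative value
for V in (200e3, 300e3):
    lam_nm = electron_wavelength(V) * 1e9       # wavelength in nm
    delta_nm = 1.22 * lam_nm / beta             # theoretical TEM resolution (2.5)
    print(f"{V/1e3:.0f} kV: lambda = {lam_nm:.5f} nm, delta = {delta_nm:.3f} nm")
# Prints lambda = 0.00274 nm at 200 kV and 0.00224 nm at 300 kV, as quoted above.
```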
2.1.2 Image Formation Process

The image formation process in a TEM is very similar to that in classical light microscopes. Instead of light, a beam of electrons is used, and glass lenses are replaced by electromagnetic lenses. Electrons are emitted from an electron source and accelerated to an energy of typically 200-300 keV. Condenser lenses form a parallel beam that passes through a very thin specimen. The transmitted electrons of interest are collected and focused by an objective lens. Afterwards, several projector lenses project a magnified image onto the image plane, where an intensity is generated. A viewing device, like a Charged Coupled Device (CCD) camera, detects the intensity and converts it into a grayscale image.

It is essential that the whole image formation takes place in vacuum and that the specimen is very thin. Electrons are nearly 2000 times lighter and smaller than the smallest atom. Thus, if there were no vacuum, the probability of an interaction with air molecules would be very high. Moreover, the specimen needs to be thin enough to allow the electrons to be transmitted through it. The thicker the specimen, the more electrons are backscattered and cannot contribute to the resulting image of a TEM. In contrast to photons, which are not charged and do not affect each other, electrons are negatively charged and repel each other if they are too close. We want to approximate the path of an electron through the specimen by a ray. Therefore, we have to ensure that the distance between two successive electrons is large enough so that they do not influence each other's path. It has been shown that their mean separation is much larger than the specimen's thickness (cf. [56]), and thus interactions between different electrons can be neglected.

2.1.3 Scattering and Diffraction

In TEM imaging all available information results from electron scattering and diffraction. If we want to understand the different kinds of information that can be obtained from electron scattering, it is important to always keep in mind the wave-particle duality of electrons. If we think of electrons as particles, they can scatter either elastically or inelastically. Elastically scattered electrons do not change their energy, while inelastically scattered electrons lose some of their energy. If we think of waves, the diffraction of the electron wave is divided into coherent and incoherent diffraction. Coherent waves remain in step with each other but may be phase shifted after interacting with the specimen, while incoherently diffracted electron waves have no phase relationship after specimen interaction. Elastic scattering is associated with coherence and inelastic scattering with incoherence; see Chapter 3 for more details.

If the electron interacts with the specimen, it can generate a range of secondary signals. In Figure 2.1 the important signals are illustrated.

Figure 2.1: Different kinds of electron scattering from a thin specimen. [84]

Since each secondary signal can reveal different properties of the specimen, there are several different imaging techniques using certain signals. In TEM imaging the interest lies only in unscattered electrons or electrons scattered at low angles that are transmitted through the specimen. Since only forward scattered electrons, i.e. electrons that are scattered at an angle lower than 90°, are transmitted through the specimen, all backscattered electrons are neglected. Moreover, a restricting aperture centered around the optical axis is used to select only the electrons that deviate less than a certain angle from the axis. Thus, the signals of interest in TEM are incoherently inelastically scattered electrons and coherently elastically scattered electrons (cf. Figure 2.1). Incoherently inelastically scattered electrons can be used to form an amplitude contrast image. Since they have lost energy during the interaction with the specimen, their amplitude varies. Thus, the intensity, which is the absolute value of the electron wave, varies as well and forms an image of the specimen.
It is also possible to form an image out of the information obtained from coherently elastically scattered electrons. Since the amplitude is constant, the occurring phase shift needs to be made visible. The resulting image is then referred to as a phase contrast image. The visualization of the phase is more complicated and will be discussed in Chapter 3, together with a mathematical model for the image formation process of phase contrast images.

2.2 Electron Microscope

In this section, we briefly describe the components of a TEM used to acquire the images in ET. It consists of an electron source, electromagnetic lenses for the specimen illumination, a specimen holder, imaging lenses and a viewing device. Together, they are called the electron optical column; see Figure 2.2 for a sketch. It is very important that the whole system is in vacuum in order to prevent electrons from interacting with air molecules. In the following, all components of a modern TEM are explained in more detail. The section is based upon [33, 84].

Figure 2.2: Cross section of the column of a modern TEM. Image courtesy of FEI (www.fei.com).

2.2.1 Electron Source

The most common electron sources in electron microscopes can be divided into two different kinds: thermionic and field-emission sources. Thermionic sources produce electrons when they are heated; the most common ones are tungsten filaments and lanthanum hexaboride crystals. In contrast, field-emitters produce electrons when a large electric potential is applied between the source and an anode. The electrons are then extracted from a very sharply pointed tungsten tip. The properties of an electron source can be described by brightness, coherence and stability, of which brightness is the most important, since it influences the resulting resolution of the microscope. It is defined as the current density per unit solid angle of the source (cf. [84]). The electron source is incorporated into a gun assembly in order to control the beam and direct it into the illumination system. The gun focuses the electrons coming from the source in one point, called the cross-over point. Since high resolution TEM based on phase contrast imaging needs high spatial coherence, field-emission guns are the best choice for these applications. They provide a brightness up to 1000 times greater than thermionic sources. A disadvantage is the varying beam current (cf. [33]).

2.2.2 The Condenser System

Electromagnetic Lenses

Electromagnetic lenses are the equivalent of glass lenses in a light microscope. They control the electron path and are responsible for focusing the beam and magnifying the image. A cross-section of an electromagnetic lens is shown in Figure 2.3. Here, C is an electrical coil and P is the soft iron pole piece. If the current passing through the coils varies, the power of the lens changes [33]. The positions of the lenses are fixed. This is the main difference to glass lenses, which cannot change their strength, but whose position can be adjusted. The stronger the lens, the lower its magnifying power and the higher its demagnifying power. Otherwise, electromagnetic lenses and glass lenses behave similarly, with similar types of aberration; the most important ones are spherical aberration, chromatic aberration and astigmatism. Spherical aberration means that the power of the lens in the center differs from that at the edges. It is primarily determined by the lens design and quality.
Spherical aberration causes information from one point to be spread over a disc in the image plane. Thus, the resulting image is blurred. It is one of the major problems limiting the resolution of a TEM. Modern TEMs use spherical aberration correctors in order to alleviate this problem (cf. [33]).

Figure 2.3: Cross-section of an electromagnetic lens. Image courtesy of FEI (www.fei.com).

Chromatic aberration means that the power of the lens varies with the energy of the electrons passing through it. Therefore, the accelerating voltage should be as stable as possible in order to decrease the effects of chromatic aberration. Finally, astigmatism causes a circle in the specimen to become an ellipse in the image (cf. [33]).

Often, there is not only one lens but a system consisting of several lenses and apertures with different diameters. Apertures exclude electrons that are not required for the image formation process (cf. Figure 2.4). Besides, they control the divergence and convergence of the electron path through the lenses; the smaller the aperture, the more parallel the resulting beam. Therefore, apertures influence the depth of focus of a beam (cf. [84]).

Figure 2.4: Ray diagram illustrating how an aperture restricts the angular spread of electrons entering the lens. [84]

The focus is the image point where the light rays from one point in the object converge. It can lie above, beneath, or in the normal image plane. If the image point is above the image plane, the beam is called overfocused. This is associated with a strong lens. If the image point is in the image plane, it is focused. Accordingly, the beam is underfocused if the rays converge beneath the image plane; this corresponds to a weaker lens. In Figure 2.5 all three focus concepts are illustrated. Apparently, the rays are more parallel in the image plane if the image is underfocused than if it is overfocused.

Figure 2.5: The concept of (A) overfocus, (B) focus and (C) underfocus. [84]

Condenser System

The condenser system follows the electron gun and is responsible for the form of the electron beam that hits the specimen. It consists of several lenses and apertures and transfers the electron beam to the specimen. In bright-field TEM imaging the specimen is uniformly illuminated; therefore, the electron beam should be parallel when it hits the specimen. In other imaging techniques, like Scanning Transmission Electron Microscope (STEM) imaging, the beam needs to be focused on a small spot on the specimen. In order to obtain a parallel illumination beam, it is useful to operate the microscope out of focus. A simplified concept using only two lenses is shown in Figure 2.6 (a). The first lens C1 forms an image of the gun cross-over point. For thermionic electron sources, this C1 cross-over image is a demagnified version of the first gun cross-over, explaining the name "condenser system". For field-emitters, the gun cross-over often needs to be magnified, since its size is smaller than the desired size of the illumination area. The following C2 lens produces an underfocused image of the C1 cross-over, thus resulting in a nearly parallel illumination hitting the specimen. In Figure 2.6 (b) the effect of an additional C2 aperture is clarified: the resulting beam is more parallel if a smaller aperture is added. The disadvantage of an aperture is the decrease of the total number of electrons hitting the specimen, reducing the quality of the resulting image.

Figure 2.6: Parallel-beam operation in the TEM. (a) The basic principle, using only the C1 and an underfocused C2 lens. (b) Effect of the C2 aperture on the parallel nature of the beam. [84]
Note that this is only a simplified model, whereas the actual condenser system in a TEM is far more complicated.

2.2.3 Holders and Stage

In order to insert the specimen into the evacuated optical column, it is placed on a specimen holder, which is then inserted into the TEM stage. There are two different kinds of holders: top-entry holders and side-entry holders. In TEM, side-entry holders are commonly used; therefore we focus on these. For an illustration of a side-entry holder see Figure 2.7.

Figure 2.7: Single tilt sample holder for a TEM. (Wikipedia: http://upload.wikimedia.org/wikipedia/commons/4/4d/TEM-Single-tilt.svg)

The specimen is placed on a copper grid, which is then mounted near the tip of the holder. Afterwards, the holder is introduced into a goniometer through an air lock. The air lock ensures that the increase of pressure in the microscope is minimal when the holder is inserted into the vacuum surrounding the optical column. The whole holder-stage system must provide for various movements, like translation in the x-, y- and z-directions, rotation and tilting. Translations in x and y are necessary in order to move the region of interest into the illuminated area. With translations in the z-direction the height of the holder can be adjusted. The basic movements like translation and single axis tilting are provided by the goniometer. It is located close to the objective lens in order to minimize lens aberrations and maximize the resolution (cf. [33]). The holder rod is responsible for every other desired movement, like a second tilt axis or rotation of the specimen in the plane perpendicular to the optical axis. There are also holders, called in situ holders, that allow the specimen to be manipulated during the illumination. Examples of in situ holders are heating, cooling and cryo-transfer holders. The latter permits the transfer of cryo-fixated samples (cf. Section 2.3) into the TEM without water vapor condensing as ice on the surface (cf. [84]).

2.2.4 Imaging Lenses

After the interaction with the specimen, the imaging system needs to create an image out of the transmitted electrons and then magnify and project it onto the viewing device. The optical system consists of an objective lens with an aperture in its focal plane, followed by several projector lenses and apertures. The objective lens is the most important part and forms a first intermediate image of the specimen. The following aperture has the important role of determining which electron information is used for the final image. A smaller aperture collects only the electrons close to the optical axis. Therefore, the influence of spherical aberration is small, but a lot of information from outer electrons is neglected. With a wider aperture more information is included, but the blurring effect of spherical aberration is stronger. It is thus evident that a high quality of the objective lens is essential for a good resolution in the final image. The following projector lenses magnify and project the image onto the viewing device. In phase contrast imaging the imaging system also has the important task of making the phase of the electron wave visible. See Chapter 3 for more details.
2.2.5 Viewing Device

The TEM viewing device needs to be able to perform real-time imaging as well as to record the image. Older TEM devices use a fluorescent screen for real-time imaging and a film camera to record images (cf. [33]). In modern microscopes this is replaced by solid-state devices like CCD cameras. Above the detector plane, there is a scintillator converting the electrons into photons, which are then transported to the CCD element via a lens coupling or fibre optics [32]. The light creates charge in the CCD, which is then recorded. Due to electron and photon propagation within the scintillator, the CCD camera is responsible for some loss of resolution and efficiency. Therefore, direct electron detectors have recently been introduced into modern TEMs (cf. [33]).

2.3 Sample Preparation

Before a specimen can be inserted into the specimen holder it needs to be small enough, stable and very thin in order to permit the transmission of electrons. The copper specimen grid used as a specimen carrier mounted on the holder has a diameter of roughly 3 mm, restricting the size of the specimen. It is very important that the preparation technique preserves the specimen properties and does not alter its atomic structure (cf. [33]). A common preparation technique for biological samples starts with a chemical treatment of the specimen in order to remove the water in the tissue. Afterwards, the tissue is embedded in hardening resin. The hard specimen is then cut into slices of about 0.5 µm, which can be inserted into the microscope. Another common approach to stabilize the specimen is to freeze it. Since traditional freezing methods can damage the specimen through the resulting ice crystals, in biological applications most samples are cryo-fixated. An advantage of this technique is that damage to the sample is reduced to a minimum in comparison to conventional preparation techniques. Moreover, the original state of the tissue is preserved to a high degree. Cryo-fixation involves ultra-rapid freezing of the sample, called vitrification. The tissue is frozen so quickly that water molecules have no time to crystallize. Thus, the damaging ice crystals are avoided. Another advantage of the low temperature of vitrified samples is that damage caused by the electron beam is reduced as well. Therefore, vitrified samples can be exposed to electrons for longer (cf. [33]). Cryo-fixation allows biological specimens to be recorded in their natural environment. This can be helpful in order to better understand the form and function of the specimen under scrutiny.

3 The Forward Model

This chapter outlines the derivation of a computationally feasible model for phase contrast TEM imaging, based on [31, 56]. For a more detailed exposition the reader may consult [31, 60, 42]. We start with an introduction of the concepts and basic notation used throughout this chapter.

3.0.1 Basic Notation

The unit sphere in the Euclidean space R^3 is defined as

S^2 := \{ x \in \mathbb{R}^3 : |x| = 1 \}.

For given \omega \in S^2, the hyperplane in R^3 that is orthogonal to \omega is defined as

\omega^\perp := \{ x \in \mathbb{R}^3 : x \cdot \omega = 0 \}.

A line in R^3 is uniquely determined by a pair (\omega, y), where \omega \in S^2 is the direction and y \in \omega^\perp is the unique point in the hyperplane \omega^\perp through which the line passes. Next, any point in R^3 can be written as y + t\omega for some y \in \omega^\perp and t \in \mathbb{R}. If f is a real or complex valued function defined on R^3 that decays sufficiently fast at infinity, then the ray transform P(f) of f is defined as the line integral

P(f)(\omega, y) := \int_{-\infty}^{\infty} f(y + t\omega)\, dt \quad \text{for } \omega \in S^2 \text{ and } y \in \omega^\perp.

Furthermore, \circledast_{\omega^\perp} denotes the two-dimensional convolution in the hyperplane \omega^\perp,

(f \circledast_{\omega^\perp} g)(x) := \int_{\omega^\perp} f(x - \tau)\, g(\tau)\, d\tau \quad \text{for } x \in \omega^\perp,

and * the three-dimensional convolution in R^3,

(f * g)(x) := \int_{\mathbb{R}^3} f(x - \tau)\, g(\tau)\, d\tau \quad \text{for } x \in \mathbb{R}^3.

Moreover, \mathcal{F}_{\omega^\perp} is the two-dimensional Fourier transform on \omega^\perp, defined as

\mathcal{F}_{\omega^\perp}(f)(\xi) := \int_{\omega^\perp} e^{-i x \cdot \xi} f(x)\, dx \quad \text{for } \xi \in \omega^\perp.
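To make the ray transform concrete, the following minimal Python sketch approximates the line integral P(f)(ω, y) by a Riemann sum; the Gaussian test function and all discretization parameters are illustrative choices, not part of the model above.

```python
import numpy as np

def ray_transform(f, omega, y, t_max, n_steps=2000):
    """Approximate P(f)(omega, y) = int f(y + t*omega) dt by a Riemann sum.

    f       -- callable R^3 -> R (or C), assumed negligible for |t| > t_max
    omega   -- unit vector in S^2, the direction of the line
    y       -- point in the hyperplane omega-perp, i.e. y . omega = 0
    """
    ts = np.linspace(-t_max, t_max, n_steps)
    values = np.array([f(y + t * omega) for t in ts])
    return values.sum() * (ts[1] - ts[0])

# Test on a Gaussian, where the line integral is known in closed form:
# P(f)(omega, y) = sqrt(pi) * exp(-|y|^2) for f(x) = exp(-|x|^2).
f = lambda x: np.exp(-np.dot(x, x))
omega = np.array([0.0, 0.0, 1.0])
y = np.array([0.2, 0.1, 0.0])  # lies in omega-perp since y . omega = 0
print(ray_transform(f, omega, y, t_max=6.0))
print(np.sqrt(np.pi) * np.exp(-(0.2**2 + 0.1**2)))  # the two values agree closely
```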
3.1 Image Formation

Overview

There are essentially three different mechanisms that give rise to contrast (intensity variations) in a TEM image: diffraction contrast, amplitude contrast, and phase contrast. Each of these can be described by a common quantum mechanical model. Although elegant, this is not a computationally feasible approach. For computational feasibility, one must take advantage of various approximations that specifically hold for each of the aforementioned three contrast mechanisms.

Diffraction contrast is generated in practice by intercepting the diffraction pattern using an objective aperture in the back focal plane of the objective lens that only allows the transmitted beam to form the image. A diffraction contrast image reveals variations in the intensity of the selected electron beam as it leaves the sample. This type of contrast is essentially only interpretable if the specimen is ordered (crystalline). Our interest lies in imaging amorphous specimens, so modeling diffraction contrast is of less relevance to us.

Amplitude contrast, also called thickness-mass contrast, refers to contrast in the image that arises when one removes electrons that have scattered at too high an angle. This is done by placing an aperture in the back focal plane of the objective lens. Since lighter elements give rise to smaller scattering angles than heavier elements do, amplitude contrast basically maps the scattering power of the elements present in the specimen. On the other hand, such contrast is associated with low- and mid-range resolution. Furthermore, when imaging weakly scattering specimens (like unstained biological specimens), most electrons undergo scattering with very small angles and therefore pass through the aperture. Hence, amplitude contrast cannot be used to explain contrast in images of such specimens.

Phase contrast arises from the interference in the image plane of the scattered electron wave with itself. A high quality optical system converts the phase of the scattered electron wave at the specimen exit plane into visible contrast variations in the image plane; the effect of this interference is thus visible as intensity variations in the image. Contrast in high resolution TEM images of thin unstained biological specimens is almost entirely due to phase contrast. Hence, our focus will be on deriving a computationally feasible model for phase contrast TEM imaging. This model can also be extended to include amplitude contrast (cf. [31]).

3.1.1 Modeling Phase Contrast Imaging

We start this section by stating two basic assumptions. First, the specimen forms a closed system together with the incident electron, so we disregard any interaction with the environment. Second, we assume that successive imaging electrons are independent, which is called the independent electron assumption.
Both assumptions hold under normal TEM conditions. As an example, the independent electron assumption holds since the distance between two successive imaging electrons is much larger than the specimen thickness. As a consequence of this assumption, wave mechanical notions like interference and superposition refer to the wave of a single electron, i.e. the crests of this wave interact with each other.

A model for phase contrast TEM imaging naturally divides into three parts:

1. the interaction between the imaging electron and the specimen (electron-specimen interaction),
2. the influence of the optics,
3. the detector.

All three parts are coupled but can be treated separately.

1. Electron-Specimen Interaction

In this section, we model the interaction between the incident imaging electron and the specimen. Essentially, this reduces to modeling the scattering of electrons against the atoms in the specimen. To begin with, we assume perfect coherent imaging, which in particular implies that the incident electron is modeled as a monochromatic plane wave and electrons only scatter elastically against the atoms in the specimen. These assumptions will be relaxed below by accounting for the partial incoherence due to incoherent illumination and inelastic scattering.

The Schrödinger Equation

For elastic scattering, the specimen remains in the same quantum state. Hence, the scattering can be modeled as a one-body problem, where the scattering properties of the specimen are fully described by its electrostatic potential U_{e.p.} : R^3 \to R^+. This derives from the fact that only inelastic scattering changes the state of the specimen. If only elastic scattering occurs, the specimen does not change over time, so the function describing the specimen is time-independent. In this case, it is fully described by the spatially dependent electrostatic potential U_{e.p.}. The non-relativistic time evolution of the imaging electron is then described by the scalar Schrödinger equation [72]:

i\hbar \frac{\partial}{\partial t} \Psi(x, t) = \left( -\frac{\hbar^2}{2m} \Delta + V(x) \right) \Psi(x, t).   (3.1)

Here, \hbar is the (reduced) Planck constant, e is the elementary charge, m is the electron mass at rest, \Psi : R^3 \times R \to C is the wave function of the imaging electron, and V : R^3 \to R^- is the potential energy, so V(x) = -e U_{e.p.}(x). Now, it is common to express solutions of (3.1) as

\Psi(x, t) = u(x) f(t), \quad \text{where } f \text{ has unit modulus}.   (3.2)

Inserting (3.2) into (3.1) results in two separate differential equations. First, i\hbar f'(t) = E f(t), so

f(t) = e^{-i E t / \hbar},

with E being the constant energy of the elastically scattered electron, given by E = \frac{\hbar^2 k^2}{2m}. Here, k = \frac{2\pi}{\lambda} denotes the electron wavenumber and \lambda the wavelength of the electron. Second,

\left( -\frac{\hbar^2}{2m} \Delta + V(x) \right) u(x) = E u(x)   (3.3)

is a partial differential equation of Helmholtz type, and it can be rewritten as

(\Delta + k^2)\, u(x) = -F_{s.p.}(x)\, u(x),

with F_{s.p.} : R^3 \to R^+ given by

F_{s.p.}(x) := -\frac{2m}{\hbar^2} V(x).

The function F_{s.p.} is henceforth called the scattering potential. According to [77], u(x) also satisfies the Sommerfeld radiation condition as a boundary condition, so u is the unique solution of

(\Delta + k^2)\, u(x) = -F_{s.p.}(x)\, u(x),   (3.4)

\lim_{r \to \infty} r \left( n_r(x) \cdot \nabla u_{sc}(x) - i k\, u_{sc}(x) \right) = 0 \quad \text{for } x \in \partial D_r^3,   (3.5)

where D_r^3 denotes the ball with radius r in R^3, n_r(x) denotes the outgoing unit normal to \partial D_r^3 at x, and u_{sc} = u - u_{in}, where u_{in} is the monochromatic incoming wave hitting the specimen.
Coherence and Incoherence

Regarding the wave nature of the electrons, one can divide scattering into coherent and incoherent scattering. A coherent wave has a temporally constant phase difference, which means that the frequency and propagation speed remain constant over time. If the scattered wave is coherent, we refer to the scattering as coherent scattering, otherwise as incoherent scattering. The main advantage of coherent waves is their ability to form stationary interference. The resulting wave has temporally constant amplitude, wavelength, velocity and frequency. When imaging thin unstained biological specimens, the main contrast in the TEM image arises from stationary interference of the scattered wave with itself. Coherent scattering is therefore essential.

Besides the classification of scattering into coherent and incoherent, it can also be differentiated into elastic and inelastic scattering. In the case of elastic scattering there is no transfer of energy from the incident electron to the specimen, whereas in the case of inelastic scattering energy is transferred from the electron to the specimen. In this case the specimen changes its state. Although every possible combination of coherent/incoherent and elastic/inelastic scattering can occur, it is common to associate inelastic with incoherent scattering and elastic with coherent scattering. Inelastic scattering implies that energy is transferred from the incident electron to the specimen, whereby the transfer process is not deterministic but quantum mechanical. So it is very likely that the amount of transferred energy varies, and therefore also the frequency of the resulting scattered wave, resulting in incoherence.

Inelastic Scattering and Imperfect Illumination

Since inelastic scattering is typically incoherent, it does not create any interference, and it blurs the phase contrast image one would get if one only had elastic scattering. A phenomenological model to account for this influence of inelastic scattering is to introduce an incoherent amplitude contrast formation component by letting the scattering potential have an imaginary part, i.e. F_{s.p.} : R^3 \to C where

F_{s.p.}(x) := -\frac{2m}{\hbar^2} \left( V(x) + i\, V_{abs}(x) \right).   (3.6)

Here, the potential energy V accounts for elastic scattering effects and gives rise to phase contrast, whereas the absorption potential V_{abs} : R^3 \to R^- models the amplitude contrast that originates from the decrease in flux of elastically and unscattered electrons due to inelastic scattering. The imaginary part V_{abs} is called the absorption (optical) potential. Another source of incoherence is the illumination, i.e. the fact that the incident imaging electron is not a perfect monochromatic plane wave traveling along the TEM optical axis. This incoherence can be accounted for by modifying the convolution kernel that models the optics (see Section 3.1.1).

A Computationally Feasible Model

Because of its multi-scale nature, numerically solving (3.4) is not computationally feasible. For a rough idea why, we use the rule of thumb that for reasonable accuracy of the solution one needs about 10 grid points per wavelength, although it is known that in the case of high wavenumbers even this is not sufficient. If electrons are accelerated through 200 kV, their wavelength is about 0.0025 nm. Thus, a specimen thickness of 100 nm corresponds to 40,000 wavelengths.
With 10 grid points per wavelength, this results in 400,000 grid points in the z-direction alone, whereas the specimen dimensions in the x- and y-directions are even bigger. So one has to consider various approximations.

One approximation is to use geometrical optics. This is an approximate treatment of wave propagation where the wavelength is considered to be infinitesimally small (the semi-classical approximation). The idea is to represent the highly oscillatory solution as a product of a slowly varying amplitude function and an exponential function of a slowly varying phase multiplied by a large parameter. It allows us to express the scattered electron as a phase-shifted version of the incident electron. The phase shift is proportional to the integral of the scattering potential along electron trajectories. For thin specimens one can disregard the curvature of these electron trajectories, i.e. one assumes that electrons travel along straight lines parallel to the direction \omega of the incident plane wave. If \Psi_0(x, t) := u_0(x) f(t) is the wave function of the incident electron, then the above results in the projection assumption, which allows us to express the scattered electron wave \Psi_{out}(x, t) = u_{out}(x) f(t) with x = y + \tau\omega on the specimen exit plane \omega^\perp + \tau\omega as

u_{out}(y + \tau\omega) \approx u_0(y + \tau\omega) \exp\left( i\sigma \int_{-\infty}^{\tau} F_{s.p.}(y + s\omega)\, ds \right)   (3.7)

with the constant \sigma = \frac{me}{k \hbar^2}. The weak phase object approximation is to linearize the exponential in (3.7), i.e.

u_{out}(y + \tau\omega) \approx u_0(y + \tau\omega) \left( 1 + i\sigma \int_{-\infty}^{\infty} F_{s.p.}(y + s\omega)\, ds \right) = u_0(y + \tau\omega) + u_{sc}(y + \tau\omega)   (3.8)

with u_{sc}(y + \tau\omega) = u_0(y + \tau\omega)\, i\sigma\, P(F_{s.p.})(\omega, y). Above, we can integrate to \infty, since F_{s.p.} is zero beyond the specimen exit plane. The expression in (3.8) is our model for electron scattering. It is sufficiently accurate for a computational treatment of phase contrast TEM imaging data on unstained thin biological specimens.
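A minimal numerical sketch of (3.8) is given below, assuming a toy ball-shaped phantom and an illustrative value for σ. It also hints at why the optics model in the next part is needed: the exit-plane intensity of a weak phase object is featureless to first order.

```python
import numpy as np

# Toy phantom: a homogeneous ball of radius 10 (arbitrary length units) as a
# real-valued scattering potential. Grid sizes and sigma are illustrative only.
n, L = 64, 50.0
xs = np.linspace(-L / 2, L / 2, n)
X, Y = np.meshgrid(xs, xs, indexing="ij")

def F_sp(x, y, z):
    return (x**2 + y**2 + z**2 <= 10.0**2).astype(float)

# Ray transform P(F_sp) for omega = (0, 0, 1): integrate along z per exit-plane point.
zs = np.linspace(-L / 2, L / 2, 400)
dz = zs[1] - zs[0]
P_F = sum(F_sp(X, Y, z) for z in zs) * dz

sigma = 1e-3                           # interaction constant; toy value
u0 = 1.0                               # unit-amplitude incident plane wave
u_out = u0 * (1.0 + 1j * sigma * P_F)  # linearized exit wave, cf. (3.8)

# The phase problem: to first order in sigma the exit intensity is featureless.
intensity = np.abs(u_out) ** 2
print(intensity.max() - intensity.min())  # O(sigma^2), i.e. almost no contrast
```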
2. The Optics

After interacting with the specimen, electrons pass through the TEM optics as they migrate from the specimen exit plane to the image plane. Besides magnifying the image, in phase contrast imaging the optics has another equally important, but more subtle, role: it is needed to make phase contrast visible. This is because the intensity is the absolute value of the electron wave, and the absolute value of a pure phase factor equals one irrespective of the value of the phase. This problem of losing phase information when taking intensities is referred to as the phase problem. One consequence is that if one measures intensity data directly on the specimen plane, then phase contrast information is lost. So the optics generates quantum interference between the crests of the electron wave, making it possible to detect the phase.

We want to illustrate the above claim somewhat more precisely. For simplicity, consider the case when the electron wave undergoes a constant phase shift of \Delta\theta as it scatters against the specimen. The phase contrast information that an electron carries after scattering is contained in this phase shift term, i.e. u = u_0 \exp(i \Delta\theta), where \Delta\theta \in R is the phase contrast from the specimen. Hence, all relevant phase contrast information is contained in \Delta\theta. By taking the intensity in the specimen exit plane, |u|^2 = |u_0|^2, we lose all the phase contrast information. However, if we have an optical system that can shift the phase of the scattered electron by \pi/2 with respect to u_0, the amplitude gets multiplied by \exp(i\pi/2) = i, so the phase shift i\Delta\theta becomes -\Delta\theta. This is as if the scattered wave had the form u = u_0 \exp(-\Delta\theta), and taking the image intensity now gives us |u|^2 \approx |u_0|^2 (1 - 2\Delta\theta). Hence, in this way we have circumvented the phase problem. Practically, in TEM imaging such a phase shift can be accomplished by deliberately going out of focus while acquiring the images.

The Set-Up

The optical system in the TEM consists of an objective lens followed by a number of projector lenses and some apertures that are present at several places. The most important part is the objective lens, which forms a first intermediate image with a magnification of only 20-50 times and is followed by an aperture in its focal plane. Although the magnification is relatively small, the objective lens has to have the highest quality concerning spherical and chromatic aberration and astigmatism. Aberration is worse at high angles than at low angles, and the first lens has to deal with the highest range of angles, since the range decreases with every magnification. So all following lenses are less affected by aberration and have almost no influence on the final image resolution but only a magnifying effect. Therefore, they can be of much lower quality than the objective lens.

In order to model phase contrast imaging, it turns out that one can model the entire TEM optical system as a single thin lens with an aperture in its focal plane, as illustrated in Figure 3.1 (cf. [31]).

Figure 3.1: The optical set-up consisting of a single thin lens with an aperture in its focal plane. [31]

The magnification of the single thin lens corresponds to the magnification M of the entire optical system (objective and projector lenses taken together), so

M = p/q \quad \text{and} \quad 1/f = 1/p + 1/q.   (3.9)

Here, f is the focal length of the lens, and q, p > 0 are the distances from the lens to the object and image planes. The aberration properties of the single thin lens correspond to the properties of the objective lens, since this is the only lens with an influence on the image resolution. Note that this set-up does not correspond to a physical optical system. Furthermore, knowledge of M and f (the latter taken as the focal length of the objective lens) allows us to determine p and q by (3.9).
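For concreteness, solving (3.9) explicitly for the two distances (elementary algebra, not an additional modeling assumption) gives

q = \frac{M+1}{M}\, f \quad \text{and} \quad p = (M+1)\, f,

since p = Mq and 1/f = 1/(Mq) + 1/q = (M+1)/(Mq).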
Model for the Optics

Here we consider the setup in Figure 3.1. First, many electron optical elements, including electron lenses, are adequately modeled within the framework of geometrical charged-particle optics, i.e. one can consider the electron as a point-like charged mass whose motion is governed by the laws of classical mechanics. Modeling diffraction by an aperture (an opaque screen with a suitable opening) needs to be based on wave mechanics. The transforming properties of this setup are then simply a suitable combination of free-space propagation, a model for a thin lens, and a model for diffraction by an aperture. Let \Psi_{out}(y - q\omega, t) = u_{out}(y - q\omega) f(t) denote the electron wave on the specimen exit plane. Then, following [31], the corresponding electron wave in a plane immediately above the detector is given as \Psi_{det}(y + r\omega, t) = u_{det}(y + r\omega) f(t), where

u_{det}(y + r\omega) := \frac{1}{M}\, \mathcal{F}_{\omega^\perp}^{-1}\left[ \mathrm{CTF}_{opt} \cdot \mathcal{F}_{\omega^\perp}\left( u_{out}(\,\cdot\, - q\omega) \right) \right]\left( \frac{y}{M} + r\omega \right)   (3.10)

with u_{out} defined in (3.8). In the above, \mathcal{F}_{\omega^\perp} is the (two-dimensional) Fourier transform in the hyperplane \omega^\perp orthogonal to the optical axis \omega, and \mathrm{CTF}_{opt} is the optics Contrast Transfer Function (CTF), given as

\mathrm{CTF}_{opt}(\xi) := \chi(|\xi|) \exp\left( i \frac{\Delta z}{2k} |\xi|^2 - i \frac{C_s}{4 k^3} |\xi|^4 \right).   (3.11)

Here, \xi is a variable in reciprocal space with unit nm^{-1}, \Delta z is the defocus (\Delta z < 0 for underfocus and \Delta z > 0 for overfocus), C_s is the spherical aberration, and \chi is the aperture function (also called the pupil function). The latter is the characteristic function of the aperture in the focal plane of the primary lens.

The Intensity Generated by One Single Electron

If the electron wave above the detector, u_{det}, is given by (3.10), then the intensity generated by one single electron in the image plane is

I(F_{s.p.})(y, \omega) := \left| \Psi_{det}(y + r\omega, t) \right|^2 = \left| u_{det}(y + r\omega) \right|^2.   (3.12)

Together with (3.8) this results in

I(F_{s.p.})(y, \omega) = \frac{1}{M^2} \left| \mathcal{F}_{\omega^\perp}^{-1}\left[ \mathrm{CTF}_{opt} \cdot \mathcal{F}_{\omega^\perp}\left( u_0(\,\cdot\, - q\omega)\left( 1 + i\sigma P(F_{s.p.}) \right) \right) \right]\left( \frac{y}{M} + r\omega \right) \right|^2.

We assume that the wave function of the incident electron leaving the condenser is a monochromatic plane wave traveling along the fixed direction \omega, i.e. for \omega \in S^2 and y \in \omega^\perp we have

u_0(y - q\omega) = e^{i k (y - q\omega) \cdot \omega} = e^{-i k q}.

Moreover, since biological specimens are weak scatterers, we assume that the intensity can be linearized, i.e. we can ignore second-order terms in u_{sc}. Therefore, we can assume that

\left| \mathcal{F}_{\omega^\perp}^{-1}\left[ \mathrm{CTF}_{opt} \cdot \mathcal{F}_{\omega^\perp}\left( u_{sc}(\,\cdot\, - q\omega) \right) \right](y + r\omega) \right|^2 \approx 0.

Then, the intensity generated by one single electron is given as

I(F_{s.p.})(y, \omega) = \frac{1}{M^2} \left[ 1 - \frac{2\sigma}{(2\pi)^2} \left( \mathrm{PSF}^{re}_{opt} \circledast_{\omega^\perp} P(F^{im}_{s.p.}) + \mathrm{PSF}^{im}_{opt} \circledast_{\omega^\perp} P(F^{re}_{s.p.}) \right)\left( \frac{y}{M} + r\omega \right) \right]   (3.13)

with \mathrm{PSF}^{re}_{opt} and \mathrm{PSF}^{im}_{opt} denoting the real and imaginary parts of \mathcal{F}_{\omega^\perp}^{-1}(\mathrm{CTF}_{opt}), and F^{re}_{s.p.} and F^{im}_{s.p.} the real and imaginary parts of the scattering potential F_{s.p.}. Now, a common assumption is that

F^{im}_{s.p.}(x) = Q\, F^{re}_{s.p.}(x) \quad \text{for } x \in \mathbb{R}^3 \text{ and a constant } Q \in \mathbb{R}.

This is the standard phase contrast model, and the resulting intensity is

I(F_{s.p.})(y, \omega) = \frac{1}{M^2} \left[ 1 - \frac{2\sigma}{(2\pi)^2} \left( \mathrm{PSF}_{opt} \circledast_{\omega^\perp} P(F^{re}_{s.p.}) \right)\left( \frac{y}{M} + r\omega \right) \right]   (3.14)

with

\mathrm{PSF}_{opt}(y + r\omega) := \left( \mathrm{PSF}^{im}_{opt} + Q\, \mathrm{PSF}^{re}_{opt} \right)(y + r\omega).

The specimen-dependent constant Q is called the amplitude contrast ratio.
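To get a feeling for (3.11), the following sketch evaluates the CTF on a radial frequency grid; the wavelength, defocus, spherical aberration and aperture cutoff below are illustrative assumptions, not parameter values taken from this thesis.

```python
import numpy as np

# Radial evaluation of the contrast transfer function (3.11).
lam = 0.00274                 # electron wavelength at 200 kV [nm]
k = 2.0 * np.pi / lam         # wavenumber [nm^-1]
dz = -1000.0                  # defocus Delta z [nm]; dz < 0 means underfocus
Cs = 2.0e6                    # spherical aberration [nm] (= 2 mm), assumed
xi_cut = 4.0                  # aperture cutoff frequency [nm^-1], assumed

xi = np.linspace(0.0, 5.0, 501)            # radial reciprocal-space variable
chi = (xi <= xi_cut).astype(float)         # pupil function: characteristic function
phase = dz / (2.0 * k) * xi**2 - Cs / (4.0 * k**3) * xi**4
ctf = chi * np.exp(1j * phase)

# The imaginary part of the CTF weights the phase information P(F^re) in
# (3.13)-(3.14); its sign oscillations are the familiar Thon rings.
zero_crossings = np.where(np.diff(np.sign(ctf.imag)) != 0)[0]
print(xi[zero_crossings])  # frequencies where the transferred contrast flips sign
```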
3. The Detector
The detector is modeled as a rectangular area in the detector plane divided into square pixels. The process of detecting the scattered electron wave is divided into several steps, roughly corresponding to the process that takes place in a physical detector. The basic principle of the detector model is a Poisson counting process in which the expected number of electrons at each pixel is proportional to the squared absolute value of the wave function. In order to account for detector quantum efficiencies smaller than 1, the actual detector response is modeled as a probability distribution depending on the number of counts (shot noise). Finally, the image is blurred by a detector point spread function (detector blurring). A more detailed explanation follows.

Shot Noise
When an electron wave reaches the scintillator, it is localized. A number of such discrete sets of localizations occur during the formation of an image. The points where collisions occur can then be described as a sum of random point masses, which are Poisson distributed with the intensity as the expected value. The model is somewhat simpler if we discretize the scintillator into "pixels" corresponding to the pixels of the detector. Then, letting x_{i,j} denote the center point of the (i, j)-th detector pixel, the response from the (i, j)-th pixel is given as a sample of the random variable µC_{i,j}, where µ > 0 is a detector related scaling factor (depending on the gain and quantum efficiency) and

$$C_{i,j} \sim \mathrm{Poisson}\big(A \cdot D \cdot I_{i,j}\big),$$

where A is the pixel area, D is the incoming dose (electrons/pixel), and I_{i,j} is the intensity generated by a single electron at a suitable point y_{i,j} ∈ ω⊥, given by (3.13) or (3.14), respectively.

Detector Blurring
When an electron collides with the scintillator, it generates a burst of photons, which are then recorded at pixels in the detector. However, these photons are not only detected by that pixel but to some extent also by nearby pixels. This introduces a correlation (blurring) between the initially independent random variables modeling the shot noise. Further correlations may be introduced by other elements of the detector, e.g. due to charge bleeding around spots with relatively high intensities. Besides the shot noise, there is additive read-out noise generated by the detector, which can be modeled by a Gaussian random variable acting on each pixel. A common approach is to model all these correlations collectively and phenomenologically by introducing a convolution. Hence, the data recorded at pixel (i, j) for a fixed direction ω ∈ S², henceforth denoted by f_data(ω)(i, j), is obtained by forming a discrete, two-dimensional convolution of the response from pixel (i, j) with a point spread function and adding the random variable modeling the read-out noise:

$$f_{\mathrm{data}}(\omega)(i, j) := \sum_{k,l} \mu\, C_{k,l}\, \mathrm{PSF}_{\mathrm{det}}(x_{i,j} - x_{k,l}) + \epsilon_{i,j} \qquad (3.15)$$

with ε_{i,j} ∼ N(0, σ̂²) and k, l running over all pixels of the detector. The detector point spread function PSF_det is defined in terms of its Fourier transform, the Modulation Transfer Function (MTF), which is commonly modeled as

$$\mathrm{MTF}(\xi) := \frac{a}{1 + \alpha|\xi|^2} + \frac{b}{1 + \beta|\xi|^2} + c. \qquad (3.16)$$

Note that the parameters a, b, c, α and β are all independent of the specimen.

3.1.2 The Forward Operator
Based on the preceding definition of f_data(ω)(i, j), the forward operator can be defined as follows: Assume that the scattering properties of the specimen are fully described by the complex valued scattering potential F_s.p. defined in (3.6) and that, for a fixed direction ω ∈ S², the specimen is probed by a monochromatic plane wave. Then, the resulting data at detector pixel (i, j) is a sample of the random variable f_data(ω)(i, j) defined in (3.15). The forward operator for phase contrast TEM imaging is then defined as the expected value of f_data(ω)(i, j), i.e.

$$\mathcal{K}(F_{\mathrm{s.p.}})(\omega)_{i,j} := \mathbb{E}\big[ f_{\mathrm{data}}(\omega)(i, j) \big]$$

for ω ∈ S² and pixel (i, j). By using definition (3.15) we get

$$\mathcal{K}(F_{\mathrm{s.p.}})(\omega)_{i,j} = \mu \cdot A \cdot D \sum_{k,l} I_{k,l}\, \mathrm{PSF}_{\mathrm{det}}(x_{i,j} - x_{k,l}) \qquad (3.17)$$

with I_{k,l} = I(F_s.p.)(y_{k,l}, ω) defined in (3.13) or (3.14), respectively.

3.2 The Inverse Problem
Before we state the inverse problem in ET we introduce some more notation. Assume we are given a subset S₀ ⊂ S² of m directions that defines the data collection geometry and a detector with n² pixels. Then, the data space V = (ℝ₊)^{mn²} is the space of all possible data. The reconstruction space U is the Banach space of all complex valued functions defined on ℝ³ that can act as a scattering potential. We assume that U is contained in L¹(ℝ³, ℂ₋) ∩ L²(ℝ³, ℂ₋) and that for every element in U the real and imaginary parts are positive. Hence, the forward operator is a function K : U → V.
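As an illustration of the stochastic data model (3.15)–(3.17), the following sketch simulates one noisy micrograph from a given single-electron intensity image: Poisson shot noise, blurring by a PSF defined through the MTF model (3.16), and additive Gaussian read-out noise. All parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def mtf(xi2: np.ndarray, a: float, b: float, c: float, alpha: float, beta: float) -> np.ndarray:
    """Two-component MTF model (3.16); xi2 holds the squared frequencies |xi|^2."""
    return a / (1.0 + alpha * xi2) + b / (1.0 + beta * xi2) + c

def simulate_micrograph(I: np.ndarray, mu: float, A: float, D: float, sigma_hat: float) -> np.ndarray:
    """Sample f_data according to (3.15): Poisson counts, detector PSF, read-out noise."""
    counts = rng.poisson(A * D * I)                           # C_kl ~ Poisson(A*D*I_kl)
    fx = np.fft.fftfreq(I.shape[0])                           # frequency grid of the pixel lattice
    fy = np.fft.fftfreq(I.shape[1])
    xi2 = fx[:, None] ** 2 + fy[None, :] ** 2
    H = mtf(xi2, a=0.6, b=0.3, c=0.1, alpha=8.0, beta=40.0)   # hypothetical MTF parameters
    blurred = np.fft.ifft2(np.fft.fft2(mu * counts) * H).real # convolution with PSF_det
    return blurred + rng.normal(0.0, sigma_hat, I.shape)      # eps_ij ~ N(0, sigma_hat^2)

I = np.full((64, 64), 0.01)          # toy single-electron intensity image
f_data = simulate_micrograph(I, mu=1.0, A=1.0, D=200.0, sigma_hat=0.5)
```

Defining the blur in Fourier space mirrors the model: PSF_det is specified through its Fourier transform, the MTF, so the convolution in (3.15) becomes a pointwise multiplication.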
The most common data collection geometry is single axis tilting. The specimen plane rotates about a fixed axis, called the tilt axis, which is orthogonal to the optical axis. The rotation angle is usually in a range of [−60°, 60°] and is called the tilt angle. For each direction ω we record a micrograph f_data(ω) and then rotate the specimen plane to a new tilt angle. The collection of all micrographs recorded while varying ω is called a tilt-series. For a given element f_data in the data space V, the inverse problem is to determine F_s.p. ∈ U with

$$\mathcal{K}(F_{\mathrm{s.p.}})(\omega)_{i,j} = f_{\mathrm{data}}(\omega)(i, j)$$

for every (i, j) ∈ {1, ..., n} × {1, ..., n} and ω ∈ S₀. Hence, in an ideal situation, solving the inverse problem is equivalent to reconstructing the scattering potential F_s.p., or alternatively the electrostatic potential U_e.p. of the specimen, in each voxel. From this, one could draw inferences about the refractive index and therefore about the material of the specimen. But due to the many approximations and assumptions made during the derivation of the forward operator, it is not possible to reconstruct the true electrostatic potential. In the best case it is possible to reconstruct the correct proportions between the different refractive indices in the specimen. In the case of biological specimens, where resolution is limited by the problems described later on, one seeks to reconstruct a function F_s.p. that describes at least the correct position and shape of the specimen, but even this is very difficult in most cases.

3.3 Difficulties for Solving the Inverse Problem
As already mentioned in Chapter 2, several problems limit the resolution of a TEM. Moreover, they make it difficult to solve the inverse problem. In this section we summarize some of the main problems that, from the technical point of view, limit the quality of the recorded data or, from the mathematical point of view, complicate the solution of the inverse problem.

The Dose Problem
When electrons scatter inelastically during the specimen interaction, energy is transferred to the specimen. This can cause ionization or heating of the specimen, both resulting in specimen damage. Ionization is the process by which an atom acquires positive charge by losing an electron. An incoming electron can collide with an electron in the electron shell of the atom and remove it. An electron from an outer shell will replace the lost electron, but the atom remains positively charged. This can break chemical bonds in the specimen. Another cause of beam damage is heating, which is a major source of damage for biological samples. In order to prevent specimen damage as much as possible, the total number of images that can be recorded is limited, since the tissue gets more damaged with every illumination. This is called the dose problem. As a consequence, the recorded data are very noisy with a low signal-to-noise ratio. Mathematically this leads to severe ill-posedness of the inverse problem (cf. Chapter 4). Since inelastic scattering events increase with the specimen thickness, it is important that thin specimens are used.

The Limited Angle Problem
For data collected with single axis tilting the range of tilt angles is limited. Normally the specimen is tilted from −60° to +60° around the tilt axis. The higher the tilt angle, the longer the path of the electrons through the specimen. If the path is too long, electrons cannot pass through the specimen and the risk of specimen damage increases.
This is called the limited angle problem. The question is now whether the recorded projections are sufficient for a stable reconstruction of the 3D volume. According to Orlov's criterion (cf. [54, Chap. VI]) this is only fulfilled if every great circle on S² has a non-zero intersection with S₀, which is not the case for tilting in the range of [−60°, 60°]. This leads to severe ill-posedness of the problem. In the case of dual axis tilting the problem is less severe, although Orlov's criterion is still not fulfilled.

Region of Interest (Local) Tomography
In ET the region that is illuminated by the electron beam is much smaller than the whole specimen. We define a region of interest that we seek to reconstruct from the data. Thus, the support of the true scattering potential is not fully contained in this subregion. Since information about the surrounding region is missing, the scattering potential cannot be uniquely determined. There are two different approaches to circumvent this problem. The first is to preprocess the recorded data in order to minimize the contributions from surrounding regions. The second is based on prior assumptions about the sample outside the region of interest: the forward and adjoint operators are adapted so that they compensate for contributions from outer regions. A common approach is to set the region outside to a constant value estimated from the recorded data. Trying to account for this effect is called long object compensation.

The Alignment Problem
In TEM imaging there are always some small unintentional movements of the specimen during data acquisition. Hence, the actual set of tilt angles S₀ at which data were recorded is unknown and differs from the predicted one. Nevertheless, we need to determine the actual geometric relationships prior to reconstruction, at least to a sufficient degree of accuracy. This is called the alignment problem. One way to solve it is to use fiducial markers. These are often gold beads deposited on the specimen prior to data collection. Since they have a very high density, they are clearly visible on all micrographs and can help to determine the actual geometric relationships.

Multicomponent Inverse Problem
In 3D ET it is not only the 3D volume that needs to be recovered but also several parameters in the model for the forward operator. Prior to the reconstruction they cannot be determined reliably; therefore they have to be reconstructed alongside the scattering potential. The problem we are dealing with is therefore a multicomponent inverse problem. For an overview of parameters that need to be recovered and indications of how they can be determined, see [56].

Estimating the Data Error
The problem of estimating the data error does not influence the data quality or the ill-posedness of the inverse problem; nevertheless we want to mention it here. For many reconstruction methods it is helpful to have an a-priori estimate of the data error. Since the stochasticity of the data is very complex, it is difficult to determine the data error prior to the reconstruction. Many discrepancy principles are therefore not applicable in the case of TEM data.

3.4 Reconstruction Methods in Electron Tomography
In this section we outline the most commonly used reconstruction methods in ET.
Reconstruction methods can be divided into analytical methods, where the signal is reconstructed directly from the data in a single step, and iterative methods. The standard analytical methods are Filtered and Weighted Back-Projection, whereas the standard iterative ones are ART and SIRT. As already mentioned, the inverse problem they all try to solve is

$$f_{\mathrm{data}} = \mathcal{K}(F_{\mathrm{s.p.}}), \qquad (3.18)$$

where the forward operator (cf. (3.17)) can be rewritten in the more compact form

$$\mathcal{K}(F_{\mathrm{s.p.}})(\omega, x) = C_1(\omega) - C_2\, \big\{ \mathrm{PSF}(\omega, \,\cdot\,) \circledast_{\omega^\perp} \mathcal{P}(F^{\mathrm{re}}_{\mathrm{s.p.}})(\omega, \,\cdot\,) \big\}(x) \qquad (3.19)$$

with ω ∈ S₀ ⊂ S², x ∈ ω⊥, P the ray transform and F^re_s.p. the real part of F_s.p. ∈ U.

Analytical Methods
Filtered Back-Projection (FBP) and Weighted Back-Projection (WBP) are the standard approaches within the ET community. The basic assumption for these methods is that the forward operator is the ray transform, i.e.

$$f_{\mathrm{data}} = \mathcal{P}(F_{\mathrm{s.p.}}) \quad \text{for } F_{\mathrm{s.p.}} \in \mathcal{U}. \qquad (3.20)$$

Therefore, the data f_data need to be preprocessed beforehand, including estimations of C₁(ω) and C₂ as well as deconvolving the PSF. Now, the set M_S of lines parallel to a direction in a fixed smooth curve S ⊂ S² that contains the finite set S₀ of directions needs to be defined. Since every pair (ω, x) with ω ∈ S² and x ∈ ω⊥ determines a line, we define

$$M_S := \big\{ (\omega, x) \in S^2 \times \mathbb{R}^3 : \omega \in S,\ x \in \omega^\perp \big\}. \qquad (3.21)$$

In the case of single axis tilting, S is a great circle arc on S². The foundation for both techniques is laid by the following identity: for h : M_S → ℝ we have

$$H \ast F_{\mathrm{s.p.}} = \mathcal{P}_S^\ast\big( h \circledast_{\omega^\perp} \mathcal{P}(F_{\mathrm{s.p.}}) \big) \quad \text{with } H := \mathcal{P}_S^\ast(h). \qquad (3.22)$$

P*_S is the backprojection operator restricted to S, defined by

$$\mathcal{P}_S^\ast(g)(x) := \int_S g(\omega, x - (x \cdot \omega)\omega)\, d\omega \quad \text{for } x \in \mathbb{R}^3. \qquad (3.23)$$

The function h is called the reconstruction kernel and H the corresponding filter. The idea of Filtered Back-Projection is to choose h such that H = P*_S(h) ≈ δ. Then, one has

$$\mathrm{FBP}(f_{\mathrm{data}}) := \mathcal{P}_S^\ast\big( h \circledast_{\omega^\perp} f_{\mathrm{data}} \big) = \mathcal{P}_S^\ast\big( h \circledast_{\omega^\perp} \mathcal{P}(F_{\mathrm{s.p.}}) \big) = H \ast F_{\mathrm{s.p.}} \approx \delta \ast F_{\mathrm{s.p.}} = F_{\mathrm{s.p.}}. \qquad (3.24)$$

Hence, F_s.p. can be reconstructed from the data f_data. The method of Weighted Back-Projection is mathematically equivalent to FBP, but instead of trying to find h such that H ≈ δ, one takes h = δ_ω. In this case, the method does not directly yield a reconstruction of F_s.p., but if an expression for the Fourier transform of H can be derived, we get

$$\mathcal{P}_S^\ast\big( h \circledast_{\omega^\perp} f_{\mathrm{data}} \big) = \mathcal{P}_S^\ast f_{\mathrm{data}} = H \ast F_{\mathrm{s.p.}} \ \Rightarrow\ \mathcal{F}\big( \mathcal{P}_S^\ast f_{\mathrm{data}} \big) = \mathcal{F}(H) \cdot \mathcal{F}(F_{\mathrm{s.p.}}). \qquad (3.25)$$

Hence, the WBP is defined as

$$\mathrm{WBP}(f_{\mathrm{data}}) := \mathcal{F}^{-1}\Bigg( \frac{\mathcal{F}\big( \mathcal{P}_S^\ast f_{\mathrm{data}} \big)}{\mathcal{F}(H)} \Bigg) = F_{\mathrm{s.p.}}. \qquad (3.26)$$

Again, this yields a way to reconstruct F_s.p. from f_data. See [59] for a recent survey on these methods and a discussion of appropriate choices of h and H, respectively. A small numerical illustration of FBP under the limited angle problem follows below.
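The following sketch, assuming scikit-image is available, illustrates FBP on a phantom slice with tilt angles restricted to [−60°, 60°]; the library's 2D Radon transform stands in for preprocessed data of the form (3.20), and the missing wedge produces the typical limited angle artifacts discussed in section 3.3.

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon, resize

phantom = resize(shepp_logan_phantom(), (128, 128))

# Tilt angles restricted to [-60, 60] degrees (limited angle problem).
theta = np.arange(-60.0, 61.0, 2.0)
sinogram = radon(phantom, theta=theta)          # idealized tilt-series, cf. (3.20)

# FBP: filter each projection with a ramp kernel, then backproject.
reconstruction = iradon(sinogram, theta=theta, filter_name="ramp")
print("reconstruction error:", np.linalg.norm(reconstruction - phantom))
```

Repeating the experiment with theta spanning the full range [0°, 180°) shows a markedly smaller error, which is the practical face of Orlov's criterion.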
Iterative Methods
The idea of iterative methods is to construct a sequence (F_s.p.)_k ∈ U that in the limit converges to a solution F_s.p. ∈ U of (3.18). A common approach are iterative methods with early stopping, assuming the reconstruction scheme to be semiconvergent. That means that initial iterates reconstruct large scale components of F_s.p., and afterwards small scale components, including noise, are recovered. Before defining an iterative scheme, problem (3.18) is split into a finite number l of subproblems. We start by splitting the data f_data ∈ V = (ℝ₊)^{mn²} into l subsets

$$f_{\mathrm{data}} = \big( (f_{\mathrm{data}})_1, (f_{\mathrm{data}})_2, \ldots, (f_{\mathrm{data}})_l \big) \quad \text{with } (f_{\mathrm{data}})_j \in \mathbb{R}^{i_j} \qquad (3.27)$$

and i₁ + ... + i_l = mn². The function τ_j : (ℝ₊)^{mn²} → (ℝ₊)^{i_j} is the projection onto the j-th data component, i.e. τ_j(G) = G_j. Then, the partial forward operator K_j : U → ℝ^{i_j} is defined as K_j(F_s.p.) := (τ_j ∘ K)(F_s.p.) for j = 1, ..., l. Hence, the subproblems are

$$(f_{\mathrm{data}})_j = \mathcal{K}_j(F_{\mathrm{s.p.}}). \qquad (3.28)$$

A series of projections π_j : U → U in the reconstruction space can be defined by

$$\pi_j(F_{\mathrm{s.p.}}) := F_{\mathrm{s.p.}} + \mathcal{K}_j^\ast \circ C_j^{-1}\big( (f_{\mathrm{data}})_j - \mathcal{K}_j(F_{\mathrm{s.p.}}) \big). \qquad (3.29)$$

Here, C_j : ℝ^{i_j} → ℝ^{i_j} is a fixed linear positive definite operator. For noisy problems it is often helpful to introduce a relaxation parameter µ > 0 and replace (3.29) by

$$\pi_j^\mu(F_{\mathrm{s.p.}}) := (1 - \mu) F_{\mathrm{s.p.}} + \mu\, \pi_j(F_{\mathrm{s.p.}}). \qquad (3.30)$$

Hence, the impact of noise is reduced by replacing the projection operator by a linear combination of the argument itself and its projection. Then, the reconstruction operator is the mapping f ↦ (F_s.p.)_{k_max}, where (F_s.p.)_{k_max} is generated by

$$\begin{cases} (F_{\mathrm{s.p.}})_{k,0} = (F_{\mathrm{s.p.}})_k \\ (F_{\mathrm{s.p.}})_{k,j} := \pi_j^\mu\big( (F_{\mathrm{s.p.}})_{k,j-1} \big), \quad j = 1, \ldots, l \\ (F_{\mathrm{s.p.}})_{k+1} = (F_{\mathrm{s.p.}})_{k,l} \end{cases} \qquad (3.31)$$

with given (F_s.p.)₀, µ > 0, k_max and π^µ_j(·) defined in (3.30) together with (3.29). Any iterative algorithm of the form (3.31) is characterized by its specific choice of C_j and the splitting scheme (3.27). The most common examples are ART and SIRT.

Algebraic Reconstruction Technique (ART)
ART is defined by C_j = K_j ∘ K*_j and l = mn², i.e. f_data is split into single data points. It was introduced in [40] and is the first iterative algebraic technique for tomographic reconstructions. The series of projections in (3.31) can then be expressed as

$$(F_{\mathrm{s.p.}})_{k,j} := (F_{\mathrm{s.p.}})_{k,j-1} + \mu\, \frac{(f_{\mathrm{data}})_j - \mathcal{K}_j \cdot (F_{\mathrm{s.p.}})_{k,j-1}}{\| \mathcal{K}_j \|^2}\, \mathcal{K}_j, \quad j = 1, \ldots, l. \qquad (3.32)$$

See [55] for further details on ART.

Simultaneous Iterative Reconstruction Technique (SIRT)
The idea of SIRT was first introduced for tomographic reconstructions in [37]. The approach attempts to correct for errors in all data points simultaneously. Thus, the number of subsets l is equal to 1, and C_j is chosen as the identity. The resulting iterative scheme, replacing (3.31), corresponds to Landweber iteration and is defined as

$$(F_{\mathrm{s.p.}})_k := (F_{\mathrm{s.p.}})_{k-1} + \mu\, \mathcal{K}^\ast\big( f_{\mathrm{data}} - \mathcal{K}\big( (F_{\mathrm{s.p.}})_{k-1} \big) \big). \qquad (3.33)$$
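A minimal numpy sketch of both schemes for a small linear system Kx = f, with the matrix K standing in for a discretized, preprocessed forward operator; relaxation parameters and iteration counts are hypothetical choices.

```python
import numpy as np

def art(K: np.ndarray, f: np.ndarray, mu: float = 1.0, sweeps: int = 10) -> np.ndarray:
    """Kaczmarz-type ART, cf. (3.32): one relaxed projection per data point."""
    x = np.zeros(K.shape[1])
    for _ in range(sweeps):
        for j in range(K.shape[0]):                 # l = number of data points (rows)
            kj = K[j]
            x = x + mu * (f[j] - kj @ x) / (kj @ kj) * kj
    return x

def sirt(K: np.ndarray, f: np.ndarray, iters: int = 500, mu: float | None = None) -> np.ndarray:
    """Landweber-type SIRT, cf. (3.33): simultaneous correction of all residuals."""
    if mu is None:
        mu = 1.0 / np.linalg.norm(K, 2) ** 2        # safe step size below 2/||K||^2
    x = np.zeros(K.shape[1])
    for _ in range(iters):
        x = x + mu * K.T @ (f - K @ x)
    return x

rng = np.random.default_rng(1)
K = rng.standard_normal((80, 50))
x_true = rng.standard_normal(50)
f = K @ x_true + 0.01 * rng.standard_normal(80)     # noisy data
print(np.linalg.norm(art(K, f) - x_true), np.linalg.norm(sirt(K, f) - x_true))
```

Tracking the error over the iterations exhibits the semiconvergence mentioned above: it first decreases and eventually stagnates or grows as noise is fitted, which motivates early stopping.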
4 Inverse Problems and Variational Methods

In this chapter we present variational methods and reconstruction methods in the field of ET. First, we give a brief introduction to inverse problems and the problem of ill-posedness. Afterwards, we show how inverse problems can be solved by variational methods and give some examples of different data and regularization terms. We present data terms motivated by noise models for TEM images; in particular we focus on Poisson noise modeling. Moreover, we introduce Bregman distances and show their applicability for data as well as for regularization terms. This is followed by an overview of how to verify the existence and uniqueness of solutions.

Definition of Inverse Problems
In the field of imaging with biological or medical applications one often has to deal with so-called inverse problems. This means that only the effect f can be measured, although the interest lies in determining the cause u. The effect and the cause are related to each other by an operator K, often describing a projection or device influences on u. The inverse problem we want to solve is then given by

$$\mathcal{K}u = f \qquad (4.1)$$

with a compact and affine operator K between Banach spaces. Note that we are interested in solving the inverse problem in ET, where K is given by (3.17), the effect f corresponds to f_data in Chapter 3, and the cause u corresponds to the scattering potential F_s.p.. For simplicity we will use the more general notations u and f from here onwards. Solving this inverse problem is often complicated by the fact that it is ill-posed. An inverse problem is called ill-posed if it is not well-posed, the latter being defined by Jacques Hadamard as follows:

Definition 4.0.1. Let K : U → V be a (possibly nonlinear) operator. The inverse problem of solving Ku = f is well-posed in the sense of Hadamard if:
1. A solution exists for any f in the observed data space.
2. The solution is unique.
3. The solution's behavior changes continuously with the initial conditions.

In the case of an ill-posed problem the third property often does not hold. Reasons for ill-posedness can be measurement errors or omitted information caused by approximations of the problem. This can result in a discontinuous or even ill-defined inversion K⁻¹ (cf. [81]). In order to determine u, even in the case of ill-posedness, we are interested in methods for solving the inverse problem without inverting the operator K. In this thesis, we focus on an approach using variational methods.

4.1 Modeling
This section deals with variational methods as a solution technique for inverse problems and is based on [71].

4.1.1 Basic Concept
The idea to use variational methods as a technique to solve inverse problems traces back to an approach by Andrey Tikhonov in 1963 (cf. [80]). Since solving equation (4.1) exactly might not be possible, he suggested minimizing a functional consisting of two parts: a data discrepancy term D(f, Ku) and a regularization functional R(u). The data term D(f, Ku) measures the discrepancy between f and Ku, and is zero if u is a solution of (4.1). The regularization functional R(u) contains a-priori information about the solution u. Hence, R(u) should be minimal if u matches the prior information and should increase if u does not correspond to the given information. Therefore, a variational model has the form

$$J(u) = D(f, \mathcal{K}u) + \alpha R(u), \qquad (4.2)$$

where α is a weighting parameter controlling the influence of the a-priori information on the reconstruction. In most cases the measured data are a noisy version f^δ of the exact data f, with δ describing the noise level, i.e.

$$\big\| f^\delta - f \big\| \le \delta. \qquad (4.3)$$

An approximation to the solution u can be obtained by

$$\hat{u} = \arg\min_u J(u) = \arg\min_u \big( D(f^\delta, \mathcal{K}u) + \alpha R(u) \big). \qquad (4.4)$$

The main challenge is to find the right data and regularization terms, resulting in a good approximation of the unknown truth u. Both functionals have to be chosen depending on the given data; more precisely, the data term can be chosen with regard to the expected noise and the regularization term with regard to the available a-priori information.
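As a toy illustration of why minimizing (4.4) is preferable to naively inverting K, the sketch below compares the plain inverse with a Tikhonov-type minimizer for a severely ill-conditioned smoothing matrix; here R(u) = ‖u‖² is used purely for demonstration and is not the regularizer of our model.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
t = np.linspace(0.0, 1.0, n)
# Severely ill-conditioned forward matrix: a Gaussian smoothing kernel.
K = np.exp(-80.0 * (t[:, None] - t[None, :]) ** 2)
u_true = np.sin(2 * np.pi * t)
f_delta = K @ u_true + 1e-3 * rng.standard_normal(n)   # noisy data, cf. (4.3)

u_naive = np.linalg.solve(K, f_delta)                  # naive inversion: noise explodes
alpha = 1e-4                                           # hypothetical weighting parameter
u_tikh = np.linalg.solve(K.T @ K + alpha * np.eye(n), K.T @ f_delta)  # minimizer of (4.4)
print(np.linalg.norm(u_naive - u_true), np.linalg.norm(u_tikh - u_true))
```

The naive solution violates Hadamard's third property in practice: a perturbation of size 10⁻³ in the data is amplified by several orders of magnitude, while the regularized minimizer stays close to u_true.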
Optimality Condition
Since the functional J may not be differentiable in a classical sense, we rely on the concept of subdifferentials (cf. [29, Chap. 5]) in order to obtain optimality conditions for (4.4).

Definition 4.1.1. Let J : U → ℝ ∪ {∞} be a convex functional and U* the dual space of U. Then, the subdifferential at u ∈ U is given as

$$\partial J(u) = \big\{ p \in \mathcal{U}^\ast \ \big| \ \langle p, v - u \rangle \le J(v) - J(u) \ \ \forall v \in \mathcal{U} \big\}, \qquad (4.5)$$

where ⟨·,·⟩ denotes the standard duality product between U and its dual space U*. An element p ∈ ∂J(u) is called a subgradient and can be identified with the slope of a plane in U × ℝ through (u, J(u)) that lies below the graph of J. If J is Fréchet-differentiable, ∂J(u) is equal to {J′(u)}; thus the subdifferential coincides with the classical derivative. In order to obtain an optimality condition for a problem min_{u∈U} J(u) with J as in Definition 4.1.1, we can use the following lemma.

Lemma 4.1.2. Let J : U → ℝ ∪ {∞} be a convex functional and û ∈ U. Then, û is a minimum of J if and only if

$$0 \in \partial J(\hat{u}). \qquad (4.6)$$

Since J is convex, the optimality condition (4.6) is not only necessary but also sufficient. For problem (4.4) we can conclude that

$$0 \in \partial J(\hat{u}) = \partial\big( D(f^\delta, \mathcal{K}\hat{u}) + \alpha R(\hat{u}) \big). \qquad (4.7)$$

Notation
Next, we clarify some notation used throughout this chapter. The image u : Ω → ℝ with Ω ⊂ ℝ³ is a function mapping a point (x₁, x₂, x₃) ∈ Ω to an intensity value u(x₁, x₂, x₃) ∈ ℝ and is referred to as a reconstruction. The linear space containing all functions u : Ω → ℝ is denoted by U and called the reconstruction space. We call f : Ω₀ → ℝ data and f^δ : Ω₀ → ℝ noisy data. The function space containing all possible data functions is called the data space and denoted by V. In our case Ω₀ ⊂ ℝ² × S₀, where the first and second dimensions refer to coordinates in the image plane and the third dimension refers to the different tilt angles. The operator K : U → V is called the forward operator and is affine in our case (cf. Chapter 3). Both U and V should be Banach spaces.

4.1.2 Different Noise Models and Corresponding Data Terms

Discretization
Mathematically, images can be described in two different ways, either continuous or discrete. So far we used the continuous definition u : Ω → ℝ. The advantage of the continuous definition is that mathematical concepts like variational methods can be applied easily. Another advantage is that edges can be defined as discontinuities of u. In this chapter we mainly deal with this continuous representation of images. Nevertheless, digital images are always a discrete version of the underlying continuous truth, which means that they are represented as a matrix containing intensity values. Thus, the images need to be discretized at the latest before implementation. Also for defining different noise models it is easier to work with discrete versions of u, f and f^δ, respectively. In order to discretize u, we subdivide the region Ω into N₁ × N₂ × N₃ small voxels. Then, the discrete image U ∈ ℝ^{N₁×N₂×N₃} is defined by

$$U_{i_1, i_2, i_3} = \frac{\int_{\mathrm{voxel}(i_1, i_2, i_3)} u(x)\, dx}{\int_{\mathrm{voxel}(i_1, i_2, i_3)} 1\, dx}. \qquad (4.8)$$

The discretization of the data f or f^δ on Ω₀ is determined by the device, since the measured data are already a discrete version F ∈ ℝ^{M₁×M₂×M₃} or F^δ ∈ ℝ^{M₁×M₂×M₃}. Note that discrete images are denoted by capital letters. We refer to the discrete operator as K, too. In the discrete case, applying K is equivalent to a matrix multiplication. In general, noise can be defined as an undesired distortion in the recorded image. We can distinguish between intensity errors and sampling errors. Intensity errors can be seen as realizations of independent random variables acting on each pixel separately. Sampling errors are, on the contrary, influenced by surrounding pixels as well. Here, we concentrate only on intensity errors, which can be roughly divided into three kinds:

1. Additive Noise
Let U be the discrete image and δ = (δ_{ijk})_{ijk} a matrix of the same size as KU containing realizations of independent and identically distributed (i.i.d.) random variables in each entry. If the recorded data F^δ are

$$F^\delta = \mathcal{K}U + \delta, \qquad (4.9)$$

then the noise is called additive noise.
δ often contains realizations of Gaussian random variables; in this case we speak of additive Gaussian noise. Alternatively, δ can, for example, contain realizations of Laplacian or uniformly distributed random variables.

2. Multiplicative Noise
Using the same notation as before, if

$$F^\delta = \big( F^\delta_{ijk} \big)_{ijk} = \big( \mathcal{K}U_{ijk}\, \delta_{ijk} \big)_{ijk} = \mathcal{K}U \cdot \delta, \qquad (4.10)$$

then the noise is called multiplicative noise. In this case all realizations of the random variables contained in δ have to be positive. For example, δ can contain realizations of i.i.d. Gamma distributed random variables.

3. Data-Dependent Noise
If the noise is neither additive nor multiplicative but depends on the measured intensity, meaning

$$F^\delta = \delta(\mathcal{K}U), \qquad (4.11)$$

then the noise is data-dependent. Commonly used models are Poisson noise and salt-and-pepper noise. Poisson noise is often used to model the errors produced by photon counting CCD sensors. Consider a two-dimensional sensor plane, where each sensor (i, j) corresponds to a position x_{ij} in the sensor plane. Then, each sensor (i, j) counts the incoming photons at position x_{ij}. The resulting numbers of photon counts δ(KU)_{ijk} can only be positive, and one assumes that they are realizations of Poisson distributed random variables with mean KU_{ijk}.

MAP Estimator
One idea to find a data discrepancy term is to use a-priori information about the noise model. If U and F^δ are seen as random variables, we can make use of the a-posteriori probability given by Bayes' formula as

$$\mathbb{P}(U \mid F^\delta) = \frac{\mathbb{P}(F^\delta \mid U)\, \mathbb{P}(U)}{\mathbb{P}(F^\delta)}. \qquad (4.12)$$

Here,
• P(U | F^δ) is the probability that the measured data F^δ were generated from the true image U,
• P(F^δ | U) is the probability that the data F^δ are measured for a given image U, and
• P(U), P(F^δ) are the a-priori probabilities of U and F^δ, respectively.

By maximizing the a-posteriori probability we can construct an estimator for U, called the maximum a-posteriori probability (MAP) estimator, by

$$\hat{U} = \arg\max_U \mathbb{P}(U \mid F^\delta) \overset{\text{Bayes}}{=} \arg\max_U \mathbb{P}(F^\delta \mid U)\, \mathbb{P}(U). \qquad (4.13)$$

P(F^δ) can be neglected, since it is independent of U and therefore has no influence on Û. Now, instead of maximizing the probability P(U | F^δ) we can minimize the negative logarithm of the probability:

$$\hat{U} = \arg\min_U \big( -\log \mathbb{P}(U \mid F^\delta) \big) = \arg\min_U \big( \underbrace{-\log \mathbb{P}(F^\delta \mid U)}_{D(F^\delta, U)} \ \underbrace{- \log \mathbb{P}(U)}_{\alpha R(U)} \big). \qquad (4.14)$$

Defining D(F^δ, U) = −log P(F^δ | U) and αR(U) = −log P(U), we get a variational model as in (4.2).

Additive Gaussian Noise
Now assume that F^δ contains additive Gaussian noise δ with E(δ) = 0 and Var(δ) = σ², i.e.

$$F^\delta_{ijk} = \mathcal{K}U_{ijk} + \delta_{ijk} \ \Leftrightarrow\ F^\delta_{ijk} - \mathcal{K}U_{ijk} = \delta_{ijk}, \qquad (4.15)$$

with a realization δ_{ijk} of δ. Moreover, we assume a Gibbs model for the a-priori probability of U. In this case, the random variables U and F^δ are continuous, and therefore we replace the probability P by a probability density function ρ_P. Then, the probability that the measured data F^δ were generated from the true image U is equal to

$$\rho_P(F^\delta \mid U) = \prod_{ijk} \rho_P\big( \delta_{ijk} = F^\delta_{ijk} - \mathcal{K}U_{ijk} \big) = \prod_{ijk} \underbrace{\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(F^\delta_{ijk} - \mathcal{K}U_{ijk})^2}{2\sigma^2}}}_{(\ast)}. \qquad (4.16)$$

Here, (∗) is the probability density function of a normal distribution. Since intensity errors act on each pixel separately, all δ_{ijk} are independent and we can factorize the probability per voxel. We assume a Gibbs distribution for the a-priori probability of U, i.e.

$$\rho_P(U) = c \cdot e^{-\beta E(U)}. \qquad (4.17)$$

Inserting (4.16) and (4.17) into (4.14) results in
$$\hat{U} = \arg\min_U \Big( \sum_{ijk} \tfrac{1}{2} \big( F^\delta_{ijk} - \mathcal{K}U_{ijk} \big)^2 + \underbrace{\sigma^2 \beta}_{=:\alpha}\, E(U) \Big). \qquad (4.18)$$

By scaling (4.18) with the number of pixels (M₁ · M₂) and data acquisition directions (M₃) and letting M₁ · M₂ · M₃ → ∞, we get a continuous limit for the first term:

$$\frac{1}{M_1 \cdot M_2 \cdot M_3} \sum_{ijk} \tfrac{1}{2} \big( F^\delta_{ijk} - \mathcal{K}U_{ijk} \big)^2 \ \to\ \frac{1}{2} \int_{\Omega_0} \big( f^\delta - \mathcal{K}u \big)^2. \qquad (4.19)$$

Thereby we obtain an asymptotic variational model for continuous functions,

$$\hat{u} = \arg\min_u \Big( \frac{1}{2} \int_{\Omega_0} \big( f^\delta - \mathcal{K}u \big)^2\, dx + \alpha R(u) \Big), \qquad (4.20)$$

where R(u) is the asymptotic limit of E(U). We can now see that in the case of additive Gaussian noise a good choice for the data discrepancy term is

$$D(f^\delta, \mathcal{K}u) = \frac{1}{2} \int_{\Omega_0} \big( \mathcal{K}u - f^\delta \big)^2\, dx. \qquad (4.21)$$

Poisson Noise
As done before for additive Gaussian noise, we now want to derive an appropriate model for the case that F^δ is Poisson distributed with mean KU. In TEM imaging the actual noise is a mixture of additive Gaussian and Poisson noise, resulting in a rather complex model (presented hereafter). Our focus will be on a data discrepancy term that properly accounts for the Poisson stochasticity of the data. Again, we assume a Gibbs model for the a-priori probability of U. Since Poisson noise also acts on each pixel separately, the probability of F^δ given U is

$$\mathbb{P}(F^\delta \mid U) = \prod_{ijk} \mathbb{P}\big( F^\delta_{ijk} \mid U_{ijk} \big) = \prod_{ijk} \frac{(\mathcal{K}U_{ijk})^{F^\delta_{ijk}}}{(F^\delta_{ijk})!}\, e^{-\mathcal{K}U_{ijk}}. \qquad (4.22)$$

The a-priori probability of U is still P(U) = c · e^{−βE(U)}. Then, the MAP estimator is again given by

$$\hat{U} = \arg\min_U \big( -\log \mathbb{P}(F^\delta \mid U) - \log \mathbb{P}(U) \big). \qquad (4.23)$$

Using equation (4.22) we get

$$\log \mathbb{P}(F^\delta \mid U) = \log \prod_{ijk} \frac{(\mathcal{K}U_{ijk})^{F^\delta_{ijk}}}{(F^\delta_{ijk})!}\, e^{-\mathcal{K}U_{ijk}} = \sum_{ijk} \Big( F^\delta_{ijk} \cdot \log(\mathcal{K}U_{ijk}) - \log\big( (F^\delta_{ijk})! \big) - \mathcal{K}U_{ijk} \Big). \qquad (4.24)$$

Since log((F^δ_{ijk})!) is independent of U, we can neglect this term when inserting (4.24) into (4.23), and the resulting estimator is

$$\hat{U} = \arg\min_U \Big( \sum_{ijk} \big( -F^\delta_{ijk} \cdot \log(\mathcal{K}U_{ijk}) + \mathcal{K}U_{ijk} \big) + \alpha E(U) \Big). \qquad (4.25)$$

Once again we scale (4.25) by the number of pixels and data acquisition directions and let them become infinitely large, and thereby obtain the asymptotic model

$$\hat{u} = \arg\min_u \Big( \int_{\Omega_0} \big( \mathcal{K}u - f^\delta \cdot \log(\mathcal{K}u) \big)\, dx + \alpha R(u) \Big). \qquad (4.26)$$

Hence, in the case of Poisson distributed data an adequate choice for the data discrepancy term is

$$D(f^\delta, \mathcal{K}u) = \int_{\Omega_0} \big( \mathcal{K}u - f^\delta \cdot \log(\mathcal{K}u) \big)\, dx.$$

In order to ensure positivity, D(f^δ, Ku) is often changed to

$$D_{KL}(f^\delta, \mathcal{K}u) = \int_{\Omega_0} \Big( f^\delta \log\frac{f^\delta}{\mathcal{K}u} - f^\delta + \mathcal{K}u \Big)\, dx \ \ge\ 0 \qquad (4.27)$$

and called the Kullback-Leibler (KL) divergence or I-divergence. Since the added part is independent of u, this does not affect the solution of (4.26).
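A small sketch of the discrete counterpart of the KL data term (4.27) on Poisson distributed data, using the convention 0 · log 0 = 0; the matrix K is a hypothetical nonnegative test operator.

```python
import numpy as np

def kl_divergence(f: np.ndarray, Ku: np.ndarray) -> float:
    """Discrete Kullback-Leibler data term (4.27); uses the convention 0*log(0) = 0."""
    mask = f > 0
    val = np.sum(Ku - f)                                   # the affine part: -f + Ku
    val += np.sum(f[mask] * np.log(f[mask] / Ku[mask]))    # f*log(f/Ku) where f > 0
    return float(val)

rng = np.random.default_rng(3)
K = rng.uniform(0.0, 1.0, (30, 20))
u_true = rng.uniform(0.5, 2.0, 20)
f_delta = rng.poisson(K @ u_true).astype(float)            # Poisson distributed data
print(kl_divergence(f_delta, K @ u_true))                  # small, but nonzero under noise
print(kl_divergence(f_delta, np.full(30, f_delta.mean()))) # a worse fit gives a larger value
```

Both printed values are nonnegative, as guaranteed by (4.27), and the divergence vanishes only for a perfect match between f^δ and Ku.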
Mixture of Poisson and Gaussian Noise
Many applications in the field of imaging are affected neither by pure Gaussian noise nor by pure Poisson noise. Often there is a mixture of data-dependent and data-independent noise sources. We want to present an approach that accounts for both noise models instead of approximating these situations by either Poisson or Gaussian noise. An important application is imaging with a CCD camera (cf. [76]), which is common in a TEM (see Chapter 2). Here we can model the photon counting process as a Poisson distributed random variable and the read-out noise, introduced by the detector, as a Gaussian distributed random variable (cf. section 3.1.1). Other applications are fluorescence imaging or images acquired with certain telescopes. Accurate models for this noise concept have recently gained more and more attention in the literature, see for example [47, 52, 13, 36]. The proposed noise model is

$$F^\delta_{ijk} = Z_{ijk} + \delta_{ijk}, \qquad (4.28)$$

where Z_{ijk} is a realization of a Poisson distributed random variable Z ∼ Poisson(KU) and δ_{ijk} the realization of a normally distributed random variable δ ∼ N(0, σ²). Both random variables act on each pixel separately. There are two different approaches to account for (4.28) in the data discrepancy term. One idea is to use an approximation of the noise statistics. The second is to follow the approach presented before, i.e. to determine P(F^δ | U) and its negative logarithm. The probability of F^δ under the assumption that U is the true image is given by

$$\mathbb{P}(F^\delta \mid U) = \prod_{ijk} \mathbb{P}\big( F^\delta_{ijk} \mid U_{ijk} \big) = \prod_{ijk} \Bigg( \sum_{n=0}^{\infty} \frac{(\mathcal{K}U_{ijk})^n}{n!}\, e^{-\mathcal{K}U_{ijk}}\, \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(F^\delta_{ijk} - n)^2}{2\sigma^2}} \Bigg). \qquad (4.29)$$

Hence, the resulting negative logarithm is

$$-\log\big( \mathbb{P}(F^\delta \mid U) \big) = \sum_{ijk} -\log\Bigg( \sum_{n=0}^{\infty} \frac{(\mathcal{K}U_{ijk})^n}{n!}\, e^{-\mathcal{K}U_{ijk}}\, \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(F^\delta_{ijk} - n)^2}{2\sigma^2}} \Bigg). \qquad (4.30)$$

In [47] a primal-dual splitting algorithm is presented for solving the optimization problem (4.14) with D(f^δ, Ku) given by (4.30). Approximations of (4.30), or respectively of its gradient, are presented in [36] and [13].

Bregman Distances
The concept of Bregman distances was originally introduced for convex programming [17]. The generalized Bregman distance associated with a functional L(·) is defined as

$$D_L^p(u, v) = L(u) - L(v) - \langle p, u - v \rangle \qquad (4.31)$$

with a subgradient p ∈ ∂L(v) and the duality product ⟨·,·⟩. The subgradient is the generalization of the derivative introduced in Definition 4.1.1. The Bregman distance is not a distance in the classical sense, since in general

$$D_L^p(u, v) \neq D_L^p(v, u),$$

i.e. it is not symmetric, and the triangle inequality does not hold. Nevertheless,

$$D_L^p(u, v) \ge 0 \quad \text{and} \quad D_L^p(u, v) = 0 \ \text{for } u = v. \qquad (4.32)$$

If L(·) is continuously differentiable in v, the Bregman distance can be interpreted as the difference between L(u) and the first-order Taylor approximation of L(·) around v, evaluated at u. Figure 4.1 shows two examples clarifying this interpretation. If L(·) is not continuously differentiable in v, the subdifferential ∂L(v) may be multivalued. In this case the Bregman distance cannot be uniquely defined. See Figure 4.2 for an example. There, L(x) = |x|, u = 1 and v = 0. L is not differentiable in v = 0, and the subdifferential ∂L(0) is given as

$$\partial L(0) = \{ p \mid |u| \ge p \cdot u \ \ \forall u \} = [-1, 1]. \qquad (4.33)$$

Figure 4.1: Bregman distances for single-valued subdifferentials.

Figure 4.2: Bregman distances for a multi-valued subdifferential.

Hence, p is not uniquely defined. On the left-hand side of Figure 4.2 we chose p = 1/2 and on the right-hand side p = −2/3. It is easy to see that the Bregman distance varies strongly for different choices of p. Hence, if we use the Bregman distance for a functional that is not Fréchet-differentiable, we have to address the problem of choosing the "correct" p. Bregman distances can be useful for data discrepancy terms as well as for regularization terms (see the paragraph on Bregman-TV regularization in section 4.1.3). Many distance measures, for example the squared Euclidean distance or the KL divergence (4.27), can be associated with Bregman distances. In the following paragraph we give a short overview of different distance measures and their interpretation as Bregman distances, illustrated by the sketch below. Note that both the data discrepancy and the regularization functional should be convex. Therefore, it is essential that the Bregman distance is convex in the first argument if L is convex.
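The sketch below evaluates (4.31) numerically for two smooth choices of L and recovers the squared Euclidean distance and the KL divergence listed in Table 4.1 below; for differentiable L the subgradient p is simply the gradient of L at v.

```python
import numpy as np

def bregman(L, grad_L, u: np.ndarray, v: np.ndarray) -> float:
    """Generalized Bregman distance (4.31); for differentiable L, p = grad L(v)."""
    return float(L(u) - L(v) - grad_L(v) @ (u - v))

u = np.array([1.0, 2.0, 3.0])
v = np.array([1.5, 1.0, 2.5])

# L(x) = ||x||^2  =>  D(u, v) = ||u - v||^2 (squared norm distance).
d_sq = bregman(lambda x: x @ x, lambda x: 2 * x, u, v)
print(d_sq, np.sum((u - v) ** 2))                    # identical values

# L(x) = sum x log(x)  =>  D(u, v) = sum u log(u/v) - u + v (KL divergence).
d_kl = bregman(lambda x: np.sum(x * np.log(x)), lambda x: np.log(x) + 1.0, u, v)
print(d_kl, np.sum(u * np.log(u / v) - u + v))       # identical values
```

Swapping u and v in either call changes the value, confirming the asymmetry noted above.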
Distance Measures
Another way of choosing a suitable data discrepancy term is to rely on different distance measures. In this case the term is not chosen according to certain assumptions about the noise and a MAP estimation; nevertheless, this may result in the same data discrepancy terms as stated before. See [27] for a non-probabilistic argument why least squares or KL divergence approaches are a good choice. Another approach is to use Bregman distances as a measure of the goodness of the reconstruction. Then, the discrepancy term is determined by the chosen functional L. Table 4.1 presents different distance measures (cf. [6]). All of them can be motivated by a Bregman distance and may be a suitable choice for a data discrepancy term. The matrix A in the Mahalanobis distance is assumed to be positive definite and should be the inverse of the covariance matrix. See [6] for more examples.

L(x)                  | D^p_L(u, v)                       | Divergence
‖x‖²                  | ‖u − v‖²                          | Squared norm distance
∫ x log(x) dµ         | ∫ u log(u/v) − u + v dµ           | KL divergence
xᵀAx                  | (u − v)ᵀA(u − v)                  | Mahalanobis distance
−∫ log(x) dµ          | ∫ u/v − log(u/v) − 1 dµ           | Itakura-Saito distance

Table 4.1: Different Bregman distances.

Further Related Literature on Models Using Poisson Noise
In this paragraph we present some further literature on solutions of the inverse problem

$$f^\delta = \mathcal{K}u, \qquad (4.34)$$

where f^δ is affected by Poisson noise. A review of relevant literature on solving (4.34) is given in [14], describing and discussing the most frequently used algorithms. In [11] the authors investigate variational methods with a data discrepancy term based on a Poisson likelihood functional and total variation regularization. It is verified that the problem of computing minimizers is well-posed. Furthermore, it is proven that the resulting minimizers converge to the minimizer of the exact likelihood function if the errors in the data and in the forward operator tend to zero. In [10] the authors broaden their work to a class of regularization functionals defined by differential operators of diffusion type. Again, a theoretical analysis of this approach is given. More computational and algorithmic results are presented, for example, in [7, 8, 9]. In [70] a total variation based regularization technique is proposed. The data discrepancy term is derived via logarithmic a-posteriori probabilities and a MAP estimator and is given by the KL divergence. In order to counteract the smoothing introduced by the total variation and to guarantee sharp edges, a dual approach is used for the numerical solution. The minimization is realized by a two-step iteration based on the expectation-maximization (EM) algorithm. A detailed explanation and analysis of this method is given in [69]; in addition, several numerical results are presented there. In [73] another approach to minimize this energy functional is proposed, using alternating split Bregman techniques. Yet another approach, minimizing a constrained optimization problem based on the KL divergence and total variation (cf. (4.42)) or other edge-preserving regularization terms, is given in [85]. The authors propose a particular form of a general scaled gradient projection (SGP) method to solve this problem. Furthermore, a new discrepancy principle for the choice of the regularization parameter is introduced.
A criterion to select proper values of the regularization parameter in the specific case of Poisson noise is given in [15]. An analysis of the minimization of more general seminorms ‖D · ‖ under the constraint of a bounded KL divergence D_KL(f^δ, K ·) is presented in [79]. Here, D and K are linear operators, often representing some discrete derivative and a linear blurring operator, respectively. The focus lies on proving relations between the parameters of KL divergences and constrained and penalized problems. Since total variation based methods suffer from contrast reduction, an extension of the aforementioned algorithm in [69] is proposed in [19]. The idea is to use Bregman iterations and inverse scale space methods in order to improve the imaging results. An analysis is given in [20].

4.1.3 Different Regularization Terms
Besides choosing different data terms, one can also adjust the functional defined in (4.2) by a suitable choice of the regularization term. The regularization should be chosen according to the a-priori information we have, e.g. the information that the solution should be smooth or have sharp edges. We start this section by discussing which function space is an appropriate choice for the regularization functional and afterwards present two different regularizations. This section is based on [18].

Function Space for the Regularization Term
We start by stating some of the properties that a reasonable function space should have. First, every reasonable image u should be contained in the function space, while noise should either not be included or be easy to separate from the signal u. Moreover, the function space should be the dual space of another function space, which is useful for proving the existence of a minimizer of (4.2) (cf. section 4.2). Hence, we start with the Lebesgue spaces L^p, which are defined as

$$L^p(\Omega) := \Big\{ u : \Omega \to \mathbb{R} \ \text{measurable} \ \Big| \int_\Omega |u|^p\, d\mu < \infty \Big\} \qquad (4.35)$$

for every real number 1 ≤ p < ∞. For p = ∞ the Lebesgue space is defined as

$$L^\infty(\Omega) := \Big\{ u : \Omega \to \mathbb{R} \ \text{measurable} \ \Big| \ \operatorname{ess\,sup}_{x \in \Omega} |u(x)| < \infty \Big\} \qquad (4.36)$$

with Ω ⊂ ℝ³, where ess sup is the essential supremum (see for example [71, p. 291] for a definition). Every Lebesgue space L^p with p ≥ 1 is a Banach space with norm ‖u‖_{L^p} = (∫_Ω |u|^p dµ)^{1/p}, respectively ‖u‖_∞ = ess sup_{x∈Ω} |u(x)|. For p = 2, L^p is not only a Banach space but also a Hilbert space with scalar product

$$\langle u, v \rangle_{L^2} = \int_\Omega u \cdot v\, d\mu. \qquad (4.37)$$

The advantages of Lebesgue spaces are that every reasonable image is contained in L^p and that for 1 < p ≤ ∞ the space L^p is a dual space, since

$$\big( L^p(\Omega) \big)^\ast = L^q(\Omega) \quad \text{with } \frac{1}{q} + \frac{1}{p} = 1. \qquad (4.38)$$

Only L¹ is not a dual space. The disadvantage of Lebesgue spaces is that not only the signal u but also noise can be contained in L^p. One example is Gaussian noise, which is contained in L². Thereby we cannot distinguish between signal and noise. The idea is to shrink the function space in order to exclude noise. An obvious candidate are the Sobolev spaces W^{1,p}(Ω), defined as

$$W^{1,p}(\Omega) = \Big\{ u \in L^p(\Omega) \ \Big| \int_\Omega |\nabla u|^p\, d\mu < \infty \Big\}. \qquad (4.39)$$

Thus, W^{1,p}(Ω) is the subspace of L^p(Ω) with the constraint that not only the function u itself but also its gradient ∇u has to be p-integrable. Hence, present noise leads to much higher values of the norm, making it easier to differentiate between signal and noise. The main drawback of Sobolev spaces is that, especially in the case of p > 1, they are too restrictive to include every reasonable image.
This can be seen in the following two lemmata, proven in [18, Chap. 4.2].

Lemma 4.1.3. Let u ∈ W^{1,p}(Ω) with p > 1 and Ω ⊂ ℝ¹. Then, u is continuous.

In order to prove this lemma, one verifies the Hölder condition. It can be generalized to higher dimensions using the same approach for the proof.

Lemma 4.1.4. Let D ⊂ Ω be a domain with C¹-boundary. Then, the function

$$u(x) = \begin{cases} 1 & \text{for } x \in D \\ 0 & \text{else} \end{cases} \qquad (4.40)$$

is not in W^{1,p}(Ω) for p ≥ 1.

That means that for every p > 1 only continuous functions are permitted in W^{1,p}, and therefore no discontinuities like edges are allowed. As a result, W^{1,p} for p > 1 is not a reasonable choice for the function space. But even for p = 1 no piecewise constant functions are permitted, which are essential for the discretization. Another drawback is that W^{1,1} is not a dual space. Consequently, the reduction of L¹ to W^{1,1} was too restrictive, and we need to enlarge our choice in such a way that every reasonable image u is contained in the function space and that it is a dual space. For that reason we introduce the space of functions of bounded (total) variation BV(Ω), defined as

$$BV(\Omega) = \big\{ u \in L^1(\Omega) \ \big| \ |u|_{BV} < \infty \big\} \qquad (4.41)$$

with the total variation (TV) given by (cf. [1])

$$TV(u) = |u|_{BV} := \sup_{\varphi \in C_0^\infty(\Omega; \mathbb{R}^3),\, \|\varphi\|_\infty \le 1} \int_\Omega u\, \nabla \cdot \varphi\, d\mu. \qquad (4.42)$$

BV(Ω) is a Banach space with norm

$$\|u\|_{BV(\Omega)} := \|u\|_{L^1} + |u|_{BV(\Omega)}. \qquad (4.43)$$

For every function u ∈ W^{1,1} we have

$$|u|_{BV(\Omega)} = \int_\Omega |\nabla u|\, d\mu = \|\nabla u\|_{L^1}. \qquad (4.44)$$

Thus, it is easy to conclude that ‖u‖_{W^{1,1}} = ‖u‖_{BV} for u ∈ W^{1,1}, and therefore W^{1,1}(Ω) is a subset of BV(Ω). In addition, piecewise constant functions like the one presented in Lemma 4.1.4 are included in BV(Ω) as well, since

$$|u|_{BV} = \sup_{\varphi \in C_0^\infty(\Omega; \mathbb{R}^3),\, \|\varphi\|_\infty \le 1} \int_\Omega u\, \nabla \cdot \varphi\, d\mu = \sup_{\varphi \in C_0^\infty(\Omega; \mathbb{R}^3),\, \|\varphi\|_\infty \le 1} \int_D \nabla \cdot \varphi\, d\mu = \sup_{\varphi \in C_0^\infty(\Omega; \mathbb{R}^3),\, \|\varphi\|_\infty \le 1} \int_{\partial D} \varphi \cdot n\, d\sigma = \int_{\partial D} 1\, d\sigma = |\partial D| < \infty$$

with u defined as in (4.40). Consequently, as long as ∂D has finite Hausdorff measure, the total variation of the piecewise constant function is finite, too. In the case of a simple curve, the Hausdorff measure corresponds to the length of the curve. Therefore, |u|_BV with u as in Lemma 4.1.4 corresponds to the length of the curve as well. Hence, reducing the total variation of u is equivalent to smoothing the curve. Another advantage of this function space is that BV(Ω) is the dual space of another function space (see [18, Chap. 5.4] for more details). We can conclude that BV(Ω) is a reasonable choice for the function space of the regularization functional.

TV Regularization
The idea is now not only to seek smooth functions in the space of functions of bounded total variation but also to use the total variation itself as the regularization functional. Thus, R(u) in (4.2) is

$$R(u) = |u|_{BV(\Omega)} \qquad (4.45)$$

with |u|_BV defined as in (4.42). As mentioned before, minimizing |u|_BV has a smoothing effect. In the case of a denoising problem, i.e. K = I is the identity operator, and Gaussian noise, the variational model is

$$J(u) = \frac{1}{2} \int_\Omega \big( u - f^\delta \big)^2\, dx + \alpha |u|_{BV(\Omega)}, \qquad (4.46)$$

which is the well-known Rudin-Osher-Fatemi model, often referred to as the ROF model (cf. [64]).

Figure 4.3: Contrast loss for a 1D signal recovered with TV regularization and different regularization parameters.
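As a discrete illustration of (4.42)/(4.44), the following sketch computes an anisotropic forward-difference total variation of a piecewise constant image, whose TV equals the perimeter of its jump set, and shows how strongly noise increases the value; the particular discretization is our choice for illustration.

```python
import numpy as np

def tv(u: np.ndarray) -> float:
    """Anisotropic discrete total variation via forward differences, cf. (4.44)."""
    dx = np.abs(np.diff(u, axis=0)).sum()
    dy = np.abs(np.diff(u, axis=1)).sum()
    return float(dx + dy)

rng = np.random.default_rng(4)
u = np.zeros((64, 64))
u[16:48, 16:48] = 1.0                              # piecewise constant "cartoon" image
print(tv(u))                                       # 128 = 4 * 32, the perimeter of the square
print(tv(u + 0.1 * rng.standard_normal(u.shape)))  # noise drives the TV up dramatically
```

This is exactly the mechanism that makes TV a good regularizer for blocky images: edges contribute only their length, while noise contributes at every pixel.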
In this thesis, we concentrate on the model using the KL divergence as the data discrepancy term together with the total variation as the regularization, i.e.

$$J(u) = \int_{\Omega_0} \Big( f^\delta \log\frac{f^\delta}{\mathcal{K}u} - f^\delta + \mathcal{K}u \Big)\, dx + \alpha |u|_{BV(\Omega)} \ \to \min_{u \in BV(\Omega),\, u \ge 0}. \qquad (4.47)$$

Regularization with the total variation of the reconstruction u succeeds in reconstructing sharp edges. With (4.44) it is easy to see that minimizing the total variation is equivalent to the assumption that the gradient has a sparse representation. As a consequence, piecewise constant functions are preferred. Hence, an effect of TV regularization are cartoon-like images, where major structures like edges are reconstructed while small scale textures vanish. Therefore, for reconstructing blocky images, minimizing the total variation is a good choice. If the focus is on reconstructing natural images, TV regularization is not a good choice, since small textures tend not to be reconstructed. Another deficiency is the loss of contrast in the reconstruction compared to the original image (see Figure 4.3). We will discuss an approach to overcome this problem below. For the existence of a unique minimizer of (4.2) it is essential that the TV functional is convex, i.e.

$$|\beta u + (1 - \beta) v|_{BV(\Omega)} \le \beta |u|_{BV(\Omega)} + (1 - \beta) |v|_{BV(\Omega)} \qquad (4.48)$$

for every u, v ∈ BV(Ω) and β ∈ [0, 1] (cf. section 4.2). Since the total variation is not differentiable in the classical sense, we must rely on subdifferentials in order to obtain optimality conditions for the minimization problem.

Bregman-TV Regularization
Since reconstructions with TV as the regularization term suffer from contrast loss (cf. [57]), we present an approach to overcome this problem. It is based on simultaneous contrast enhancement using Bregman distances. The technique was introduced in [57] with a detailed analysis for problems with squared norm discrepancy terms. It has been generalized to time-continuity [22], L^p-norm data discrepancy terms [21] and nonlinear inverse problems [5]. In [19] the approach was introduced to solve problems with Poisson noise. Finally, in [12] a combination of Bregman updates with higher-order TV methods is presented. The idea for overcoming the contrast reduction is to introduce an iterative regularization by Bregman distances. Instead of using a regularization functional as before, information about the solution u gained from a prior solution of problem (4.4) is included. Therefore, problem (4.4) is replaced by a sequence of problems

$$u^l = \arg\min_u \Big( D(f^\delta, \mathcal{K}u) + \alpha \cdot \big( R(u) - \langle p^{l-1}, u \rangle \big) \Big) \qquad (4.49)$$

with p⁰ = 0 and p^{l−1} an element of the subdifferential of the total variation of the prior solution u^{l−1}. In this thesis our focus is on R(u) = TV(u) combined with the KL divergence, i.e.

$$u^l = \arg\min_{u \in BV(\Omega),\, u \ge 0} \Big( \int_{\Omega_0} \Big( f^\delta \log\frac{f^\delta}{\mathcal{K}u} - f^\delta + \mathcal{K}u \Big)\, dx + \alpha \big( |u|_{BV(\Omega)} - \langle p^{l-1}, u \rangle \big) \Big) \qquad (4.50)$$

with p^{l−1} ∈ ∂|u^{l−1}|_{BV(Ω)}. By using the Bregman distance (4.31), problem (4.49) can be rewritten as

$$u^l = \arg\min_u \Big( D(f^\delta, \mathcal{K}u) + \alpha D_{R(\cdot)}^{p^{l-1}}(u, u^{l-1}) \Big) = \arg\min_u \Big( D(f^\delta, \mathcal{K}u) + \alpha \big( R(u) - R(u^{l-1}) - \langle p^{l-1}, u - u^{l-1} \rangle \big) \Big). \qquad (4.51)$$

An update strategy for p^l can be derived via an optimality condition for problem (4.51). Using (4.7) we can conclude that

$$0 = q^l + \alpha \big( p^l - p^{l-1} \big) \quad \text{with } q^l \in \partial D(f^\delta, \mathcal{K}u^l),\ p^l \in \partial R(u^l). \qquad (4.52)$$

It follows that a suitable update strategy for p^l is

$$p^l = p^{l-1} - \frac{1}{\alpha} \cdot q^l. \qquad (4.53)$$

Hence, according to (4.51), not only the regularization functional R(u) is minimized but also the distance between u and u^{l−1}. The idea is that u^{l−1} is already an approximation to the optimal solution û; therefore, u^{l−1} can be included as a-priori information in the regularization functional. A sketch of the resulting iteration follows below.
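A minimal sketch of the iteration (4.49) with update (4.53) in the simplest smooth setting, D(f^δ, Ku) = ½‖Ku − f^δ‖² with R(u) = ½‖u‖² as a stand-in regularizer (our actual model uses the KL divergence and TV, which require dedicated algorithms):

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 60, 40
K = rng.standard_normal((m, n))
u_true = rng.standard_normal(n)
f = K @ u_true + 0.05 * rng.standard_normal(m)

alpha = 50.0                                   # deliberately over-regularized start
p = np.zeros(n)                                # p^0 = 0
for l in range(8):
    # Solve (4.49) for D = 0.5*||Ku - f||^2, R(u) = 0.5*||u||^2:
    # optimality condition K^T(Ku - f) + alpha*(u - p) = 0.
    u = np.linalg.solve(K.T @ K + alpha * np.eye(n), K.T @ f + alpha * p)
    q = K.T @ (K @ u - f)                      # q^l, subgradient of the data term at u^l
    p = p - q / alpha                          # Bregman update (4.53)
    print(l, np.linalg.norm(u - u_true))       # error typically falls, then noise creeps back
```

The printed error trace exhibits the inverse scale space behavior described next: early iterates recover large scale structure, and continuing too long fits the noise, so the iteration has to be stopped at a suitable index.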
The solutions of (4.49) and (4.51) correspond to each other, since the parts added in (4.51) are independent of u and do not influence the solution u^l. The iteration strategy can be roughly summarized as follows: We start with an over-regularized solution u¹ of problem (4.4). Then, subsequently, information that was previously filtered out as noise is added back to the reconstruction. Since the first image was over-regularized, this information still contains a lot of texture information. In each iteration step more information is included, but more noise as well. Since larger scales converge faster than smaller ones, the method needs to be stopped at a suitable time. Then, large scale features may already be incorporated into the reconstruction, while small scale features, like noise, may still be missing. If the iteration procedure is not stopped, it converges to the noisy image f^δ and the total variation can become unbounded. Therefore, we need a stopping criterion. If we have a reliable estimate of the noise level δ, we can use, for example, a discrepancy principle, i.e. we stop the method as soon as ‖u^l − f^δ‖ ≤ δ. If the noise level is unknown, another idea is to stop when u^{l+1} is noisier than u^l. Since methods like this one can be seen as the opposite of scale space methods, they are called inverse scale space methods (cf. [22]). If R is not continuously differentiable, it is not obvious that the Bregman distance (4.31) can be defined for arbitrary u and v, since ∂R(v) may be empty or multivalued (see Figure 4.2). In [57] it is proven that a multivalued version of the Bregman distance poses no problem, since the algorithm selects a unique subgradient. Bregman-TV regularization leads to improved contrast compared to standard TV regularization (cf. [57]). Another question is the well-definedness of the algorithm. For this we need the existence of a minimizer u^l in (4.51) and of the subgradient p^l. The well-definedness for D(f^δ, Ku) = ½‖f^δ − Ku‖² with a linear operator K and R(u) = |u|_{BV(Ω)} is proven in [57]. For other choices of the regularization functional R(u) the results can easily be generalized under the assumption that R(u) is a locally bounded, convex, nonnegative regularization functional defined on a Banach space. The generalization to other fidelity functionals D(f^δ, Ku) is far more complicated. For the case where the data discrepancy term is a KL divergence, see [20] for an analysis. Moreover, several applications demonstrate the success of the iterative regularization procedure for reconstruction problems with Poisson noise (cf. [69], [19]). See section 4.2 for more details.

Further Related Literature on Regularization Methods in the Field of ET
In the previous sections, a model to solve the inverse problem in ET based on variational methods was presented. Apart from this, there have been several other approaches to apply variational methods to the inverse problem in ET (cf. for example [75], [65], [41]). Nearly all of these methods assume additive Gaussian noise and therefore choose the data discrepancy term as D(f, Ku) = ‖Ku − f‖² or its weighted version, the Mahalanobis distance (cf. the paragraph on distance measures in section 4.1.2). The approaches differ in particular by the choice made for the regularization functional R(u). Mainly, three different kinds of regularization functionals have been applied: entropy regularization, TV regularization and sparsity promoting regularization.
Entropy Regularization
The idea is to choose the regularization functional R(u) as an entropy. One example with applications in ET is given in [75]. Here, the regularization functional is the KL divergence

$$R_\rho(u) = \int_\Omega \Big( u \log\frac{u}{\rho} - u + \rho \Big)\, dx \qquad (4.54)$$

with a fixed prior ρ ∈ U. Note that in our model the KL divergence is used as a data discrepancy term, whereas it can also be used as a regularization term, as presented here. The data discrepancy term in [75] is a Mahalanobis distance. The regularization parameter is based on an estimate of the data error, which is unreliable in the case of highly noisy ET data. An extension and mathematical analysis of this approach is given in [67]. Here, an iterated regularization scheme is used to update the estimate of the data error and the nuisance parameters during the reconstruction.

TV Regularization
In the ET community there have been different approaches with regularization functionals based on the total variation, comparable to our model. Most of them use the formal definition of TV presented in (4.44), i.e.

$$R(u) = \int_\Omega |\nabla u|\, dx.$$

The first approach with applications in ET is presented in [2], using an anisotropic variant of TV. In [41] TV regularization is applied to STEM data of specimens from materials science. A drawback of both approaches is that the data have to be preprocessed, since the forward operator is assumed to be the ray transform. An approach that works with a forward operator modeling amplitude contrast as well as phase contrast for any parallel beam geometry is presented in [65]. We will explain this approach in more detail later on and use it as a reference method for our results (cf. Chapter 6 and Chapter 7). Here, the proposed regularization functional is not only the total variation but

$$R(u) = \alpha \Big( \int_\Omega |\nabla u|^p\, dx \Big)^{\frac{1}{p}} + \beta \Big( \int_\Omega |u|^q\, dx \Big)^{\frac{1}{q}}. \qquad (4.55)$$

With p = 1 and β = 0 this coincides with the usual TV regularization.

Sparsity Promoting Regularization
The basic assumption of sparsity promoting regularization is that most signals are sparse in some suitable representation. Then, the regularization functional is

$$R(u) = \| \rho(u) \|_{L^1}, \qquad (4.56)$$

where ρ is a sparsifying map for u. One example is ρ(u) = ∇u, which coincides with TV regularization, assuming that the gradient is sparse. Another common approach is to assume that there is a dictionary {φ_j}_{j=0}^∞ and that u has a representation u = Σ c_j φ_j with c_j = 0 for as many j's as possible. Then, ρ(u) = {c_j}_j, penalizing every c_j that is not zero. This can allow for recovering signals from relatively few and highly noisy data. An approach with applications to STEM data is presented in [16].

4.2 Existence and Uniqueness of a Minimizer
In this section we outline how to prove the existence and uniqueness of a minimizer of

$$J(u) = D(f^\delta, \mathcal{K}u) + \alpha R(u) \ \to\ \min_{u \in \mathcal{U}}.$$

Our approach follows the idea in [18]. Besides presenting a general framework, we focus on the specific case R(u) = |u|_{BV(Ω)}. For the data discrepancy term D(f^δ, Ku) we consider either the L²-norm ½‖Ku − f^δ‖² or the KL divergence D_KL(f^δ, Ku). In each step, we start with the simpler case of the L²-norm before we move on to the more challenging case of the KL divergence, which we use in our model (cf. (4.47)). We recall the approaches in [23] and [69], respectively. A special focus will be on the question whether the approaches need to be changed in the case of an affine forward operator Ku = C − Lu, where L is linear. We base our analysis on the weak-∗ topology on BV(Ω) (cf. [48, Chap. 10.3]).
Definition 4.2.1 (Weak-∗ Convergence). Let U be a Banach space and U* its dual space. Then, ν_k converges to ν in the weak-∗ topology on U* if

$$\langle \nu_k, u \rangle \to \langle \nu, u \rangle \quad \forall u \in \mathcal{U}. \qquad (4.57)$$

We write ν_k ⇀* ν.

For proving the existence of a minimizer we want to make use of the fundamental theorem of optimization:

Theorem 4.2.2 (Fundamental Theorem of Optimization). Let J : (U, τ) → ℝ ∪ {∞} be a functional on a topological space U with a (locally convex) topology τ that fulfills the following two assumptions:
1. Lower semicontinuity: for u_k → u in the topology τ it holds that J(u) ≤ lim inf_k J(u_k).
2. Compactness of sub-level sets: there exists α ∈ ℝ such that S_α := {u ∈ U : J(u) ≤ α} is nonempty and compact in τ.
Then, there exists a minimum û ∈ U, i.e. J(û) ≤ J(u) for all u ∈ U.

In order to use this theorem, we need to prove that both assumptions are fulfilled. As stated before, we want to find a minimum in the space of functions of bounded variation, so U = BV(Ω). It is essential that BV(Ω) is the dual space of another Banach space, in order to use the weak-∗ convergence.

Coercivity
We start with the compactness of the sub-level sets. For this, we use the following theorem (cf. [48, Chap. 12, Theorem 3]):

Theorem 4.2.3 (Theorem of Banach-Alaoglu). Let U be the dual space of some Banach space Z. Then, each bounded set in U is precompact in the weak-∗ topology.

That means, if we can show that the sub-level sets are bounded, we can conclude their compactness. According to the following definition, the functional is then called BV-coercive (cf. [69, Definition 4.6]).

Definition 4.2.4 (BV-Coercivity). A functional J defined on L¹(Ω) is BV-coercive if the sub-level sets of J are bounded in the ‖·‖_{BV(Ω)} norm, i.e. for all α ∈ ℝ≥0 the set {u ∈ L¹(Ω) : J(u) ≤ α} is uniformly bounded in the BV norm; or, equivalently,

$$J(u) \to +\infty \quad \text{whenever} \quad \|u\|_{BV(\Omega)} \to +\infty. \qquad (4.58)$$

The concepts used to verify the boundedness of the sub-level sets differ depending on the data discrepancy term. We start with a sketch of the idea presented in [23]. Here, the minimization problem is

$$J(u) = \frac{1}{2} \| \mathcal{K}u - f^\delta \|^2 + \alpha |u|_{BV(\Omega)} \ \to \min_{u \in BV(\Omega)} \qquad (4.59)$$

with α > 0. The subspace of functions of bounded variation with mean zero is defined by

$$BV_0(\Omega) = \Big\{ u \in BV(\Omega) \ \Big| \int_\Omega u\, dx = 0 \Big\}. \qquad (4.60)$$

Furthermore, we assume that K1 ≠ 0 and ⟨Kv, K1⟩ = 0 for all v ∈ BV₀(Ω), where 1 is the constant function with value 1 everywhere. Both assumptions match the forward operator presented in Chapter 3. Now, every minimizer û of (4.59) can be decomposed in the form

$$\hat{u} = v + \frac{\langle f^\delta, \mathcal{K}1 \rangle}{\| \mathcal{K}1 \|^2}\, 1 \qquad (4.61)$$

with v ∈ BV₀(Ω). It holds that |û|_{BV(Ω)} = |v|_{BV(Ω)}, since the total variation of a constant function is zero. Hence, problem (4.59) can be reduced to a minimization over BV₀(Ω) by accounting for (4.61) and the specific form of the forward operator Ku = C − Lu:

$$\frac{1}{2} \| \mathcal{K}u - f^\delta \|^2 + \alpha |u|_{BV(\Omega)} = \frac{1}{2} \Big\| \mathcal{K}v + \frac{\langle f^\delta, \mathcal{K}1 \rangle}{\| \mathcal{K}1 \|^2}\, \mathcal{K}1 - f^\delta \Big\|^2 + \alpha |v|_{BV(\Omega)} = \frac{1}{2} \Big\| C - \mathcal{L}v + \frac{\langle f^\delta, \mathcal{K}1 \rangle}{\| \mathcal{K}1 \|^2}\, \mathcal{K}1 - f^\delta \Big\|^2 + \alpha |v|_{BV(\Omega)} = \frac{1}{2} \big\| \mathcal{L}v - \tilde{f} \big\|^2 + \alpha |v|_{BV(\Omega)} \ \to \min_{v \in BV_0(\Omega)} \qquad (4.62)$$

with f̃ = C + (⟨f^δ, K1⟩/‖K1‖²) K1 − f^δ. In [23, Proposition 3.4] the authors prove that BV₀(Ω) is the dual space of another Banach space, i.e. the assumption of the theorem of Banach-Alaoglu is fulfilled. Moreover, they verify that the total variation |·|_{BV(Ω)} is an equivalent norm on BV₀(Ω). The total variation is bounded on the sub-level sets S_α := {u ∈ U : J(u) ≤ α}.
Thus, we can conclude that the sub-level sets are bounded in the BV-norm and that J is BV-coercive. It follows with the theorem of Banach-Alaoglu that the sub-level sets are compact in the weak-∗ topology.

Next, we want to focus on the idea presented in [69]. Here, the minimization problem is the following:

    J(u) = D_{KL}(f^\delta, Ku) + \alpha |u|_{BV(\Omega)} \to \min_{u \in BV(\Omega),\, u \ge 0}    (4.63)

with \alpha > 0. D_{KL} : L^1(\Omega') \times L^1(\Omega') \to \mathbb{R}_{\ge 0} \cup \{+\infty\} is the KL functional given by

    D_{KL}(\varphi, \psi) = \int_{\Omega'} \varphi \log\left(\frac{\varphi}{\psi}\right) - \varphi + \psi \, d\nu    (4.64)

for all \varphi, \psi \ge 0 a.e., where \nu is a measure. With the convention 0 \log(0) = 0 the integrand in (4.64) is nonnegative and vanishes if and only if \varphi = \psi. Again we assume that K1 \ne 0. Since BV(\Omega) \subset L^1(\Omega), the set of admissible solutions can be extended from BV(\Omega) to L^1(\Omega) with |u|_{BV(\Omega)} = +\infty if u \in L^1(\Omega) \setminus BV(\Omega). Hence, solutions with bounded total variation are still preferred. Now, the idea in [69] is to derive an estimate of the form

    \|u\|_{BV(\Omega)} = \|u\|_{L^1(\Omega)} + |u|_{BV(\Omega)} \le c_1 (J(u))^2 + c_2 J(u) + c_3,    (4.65)

with c_1 \ge 0, c_2 > 0 and c_3 \ge 0. Then the coercivity follows directly from the positivity of J(u) for u \ge 0 a.e. Once again, every u \in BV(\Omega) can be decomposed in the form

    u = w + v \quad \text{with} \quad w = \frac{\int_\Omega u \, dx}{|\Omega|} 1 \quad \text{and} \quad v = u - w \in BV_0(\Omega).    (4.66)

It holds that

    \alpha |v|_{BV(\Omega)} = \alpha |u|_{BV(\Omega)} \le J(u) = \underbrace{D_{KL}(f^\delta, Ku)}_{\ge 0} + \alpha |u|_{BV(\Omega)}

and therefore

    |v|_{BV(\Omega)} \le \frac{1}{\alpha} J(u).    (4.67)

According to the Poincaré-Wirtinger inequality there exists a constant C_1 > 0 with

    \|v\|_{L^1(\Omega)} \le C_1 |v|_{BV(\Omega)} \le \frac{C_1}{\alpha} J(u).    (4.68)

Thus, we can conclude that

    \|u\|_{BV(\Omega)} \le \|w\|_{L^1(\Omega)} + \frac{C_1 + 1}{\alpha} J(u).    (4.69)

In order to derive an inequality of the form (4.65), an estimate for \|w\|_{L^1(\Omega)} is still needed. For this, one can investigate the L^1 distance between Ku and f, using that

    \|\varphi - \psi\|^2_{L^1(\Omega')} \le \left( \frac{2}{3} \|\varphi\|_{L^1(\Omega')} + \frac{4}{3} \|\psi\|_{L^1(\Omega')} \right) D_{KL}(\varphi, \psi)    (4.70)

for any nonnegative functions \varphi and \psi in L^1(\Omega') (cf. [69, Lemma 4.3]). Now, for the forward operator Ku = C - Lu we need to adapt the estimates in [69]. We seek an upper and a lower bound for \|Ku - f^\delta - Lw\|^2_{L^1(\Omega')}. The resulting estimate then depends on \|Lw\|_{L^1(\Omega')}, which can be reformulated using a constant C_2 > 0 with

    C_2 = \frac{\int_{\Omega'} |L1| \, d\nu}{|\Omega|} \quad \text{and} \quad \|Lw\|_{L^1(\Omega')} = C_2 \|w\|_{L^1(\Omega)}.

The remaining part of the argumentation in [69, Lemma 4.7] carries over directly even in the case of an affine operator K. It verifies the existence of an estimate of the form (4.65). Thus, J is BV-coercive, and with the theorem of Banach-Alaoglu we can conclude the compactness of the sub-level sets.

Lower Semicontinuity
Next, we want to outline the proof that the functional J is lower semicontinuous. Since the sub-level sets are compact in the weak-∗ topology, this topology needs to be used for the lower semicontinuity as well. The sum of two functionals D(f^\delta, Ku) and R(u) is lower semicontinuous if both functionals are. Thus, this property can be verified separately. We need to show that for u \in BV(\Omega) the functionals u \mapsto |u|_{BV(\Omega)}, u \mapsto \frac{1}{2}\|Ku - f^\delta\|^2 and u \mapsto D_{KL}(f^\delta, Ku) are weak-∗ lower semicontinuous. For the first and second functional we follow the approach in [23]. We start with the TV functional and use the dual definition of the total variation. Let \varphi_k \in C_0^\infty(\Omega, \mathbb{R}^3) be a sequence with \|\varphi_k\|_\infty \le 1 and

    |u|_{BV(\Omega)} = \lim_k \int_\Omega u \, \nabla \cdot \varphi_k \, dx.    (4.71)

Now, let u_n \rightharpoonup^* u.
We need to verify that |u|_{BV(\Omega)} \le \liminf_n |u_n|_{BV(\Omega)}:

    \int_\Omega u \, \nabla \cdot \varphi_k \, dx = \liminf_n \int_\Omega u_n \, \nabla \cdot \varphi_k \, dx
    \le \liminf_n \sup_{\varphi \in C_0^\infty(\Omega; \mathbb{R}^3), \|\varphi\|_\infty \le 1} \int_\Omega u_n \, \nabla \cdot \varphi \, dx = \liminf_n |u_n|_{BV(\Omega)}.

Now, taking the limit over k yields

    |u|_{BV(\Omega)} = \lim_k \int_\Omega u \, \nabla \cdot \varphi_k \, dx \le \liminf_n |u_n|_{BV(\Omega)}

and u \mapsto |u|_{BV(\Omega)} is weak-∗ lower semicontinuous.

Next, we want to verify the lower semicontinuity of the data discrepancy functional u \mapsto \frac{1}{2}\|Ku - f^\delta\|^2. We can assume that K is a bounded linear operator on BV(\Omega), since we can shift the data f according to (4.62) if K is affine. Furthermore, we assume that the range of the adjoint operator K^* is contained in Z with Z^* = BV(\Omega). Since taking the square preserves lower semicontinuity, it suffices to show that u \mapsto \|Ku - f^\delta\| is weak-∗ lower semicontinuous. Again, we use a dual characterization, namely

    \|Ku - f^\delta\| = \sup_{\varphi, \|\varphi\| = 1} \langle Ku - f^\delta, \varphi \rangle

for any Hilbert space norm. Let u_n \rightharpoonup^* u and let \varphi_k be a sequence with \|\varphi_k\| = 1 that fulfills \lim_k \langle Ku - f^\delta, \varphi_k \rangle = \|Ku - f^\delta\|. Then

    \langle Ku - f^\delta, \varphi_k \rangle = \langle u, \underbrace{K^* \varphi_k}_{\in Z} \rangle - \langle f^\delta, \varphi_k \rangle
    = \liminf_n \langle u_n, K^* \varphi_k \rangle - \langle f^\delta, \varphi_k \rangle
    \le \liminf_n \sup_{\varphi, \|\varphi\| = 1} \langle Ku_n - f^\delta, \varphi \rangle
    = \liminf_n \|Ku_n - f^\delta\|.

The weak-∗ lower semicontinuity of u \mapsto \frac{1}{2}\|Ku - f^\delta\| follows again by taking the limit over k. Now, for a problem of the form (4.59) we can conclude that J(u) is lower semicontinuous and that Theorem 4.2.2 is applicable. Finally, for verifying the lower semicontinuity of u \mapsto D_{KL}(f^\delta, Ku) we use the following property of the KL functional (cf. [69, Lemma 4.3]).

Lemma 4.2.5. For any fixed nonnegative function f \in L^1(\Omega'), the KL functional u \mapsto D_{KL}(f, Ku) is lower semicontinuous with respect to the weak topology of L^1(\Omega').

A sketch of the proof based on Fatou's lemma is given in [4]. Thus, it follows directly that J(u) given by (4.63) is lower semicontinuous in the weak-∗ topology. Again, Theorem 4.2.2 is applicable. In conclusion, we have shown that (4.59) for problems with additive Gaussian noise as well as (4.63) for problems with Poisson noise have a minimizer \hat{u}. It remains to verify that the minimizer is unique.

Uniqueness
In order to verify the uniqueness of the minimizer, it suffices to show that the functional J(u) = D(f^\delta, Ku) + \alpha R(u) is strictly convex. We start with a definition of convexity.

Definition 4.2.6 ((Strict) Convexity). A function J : U \to \mathbb{R} is called convex if for all x_1, x_2 \in U and \lambda \in [0, 1]

    J(\lambda x_1 + (1 - \lambda) x_2) \le \lambda J(x_1) + (1 - \lambda) J(x_2).    (4.72)

J is called strictly convex if for all x_1 \ne x_2 \in U and \lambda \in (0, 1)

    J(\lambda x_1 + (1 - \lambda) x_2) < \lambda J(x_1) + (1 - \lambda) J(x_2).    (4.73)

A sum of two functionals is strictly convex if both functionals are convex and at least one of them is strictly convex. Therefore, this property can again be verified separately for u \mapsto |u|_{BV(\Omega)}, u \mapsto \frac{1}{2}\|Ku - f^\delta\|^2 and u \mapsto D_{KL}(f^\delta, Ku). The total variation is convex, since for u, v \in BV(\Omega) and \lambda \in [0, 1]

    |\lambda u + (1 - \lambda) v|_{BV(\Omega)} := \sup_{\varphi \in C_0^\infty(\Omega; \mathbb{R}^3), \|\varphi\|_\infty \le 1} \int_\Omega (\lambda u + (1 - \lambda) v) \, \nabla \cdot \varphi \, dx
    \le \lambda \left( \sup_{\varphi \in C_0^\infty(\Omega; \mathbb{R}^3), \|\varphi\|_\infty \le 1} \int_\Omega u \, \nabla \cdot \varphi \, dx \right) + (1 - \lambda) \left( \sup_{\varphi \in C_0^\infty(\Omega; \mathbb{R}^3), \|\varphi\|_\infty \le 1} \int_\Omega v \, \nabla \cdot \varphi \, dx \right)
    = \lambda |u|_{BV(\Omega)} + (1 - \lambda) |v|_{BV(\Omega)}.

Note that if Bregman-TV regularization is used instead of standard TV regularization, the regularization functional is still convex, because |\cdot|_{BV(\Omega)} is convex. In both cases, the regularization parameter \alpha needs to be positive in order to maintain convexity.
Hence, for uniqueness we have to verify that \hat{D} : BV(\Omega) \to \mathbb{R} with \hat{D}(u) = D(f^\delta, Ku) is strictly convex. We start with D(f^\delta, Ku) = \frac{1}{2}\|Ku - f^\delta\|^2. According to (4.62) we can assume that K is linear. The quadratic fitting term is convex, but it remains to verify that it is strictly convex. The second-order derivative of \hat{D} in direction v is

    d^2 \hat{D}(u; v) = \|Kv\|^2.

Thus, as long as K is injective, it holds that \|Kv\| > 0 if v \ne 0. We can conclude that \hat{D} is strictly convex. Next, we want to show that u \mapsto D_{KL}(f^\delta, Ku) is strictly convex, too. According to [69] the mapping (\varphi, \psi) \mapsto D_{KL}(\varphi, \psi) is convex. For an affine operator Ku = C - Lu with linear L it follows that

    D_{KL}(\lambda \varphi_1 + (1-\lambda)\varphi_2, K(\lambda u_1 + (1-\lambda)u_2))
    = D_{KL}(\lambda \varphi_1 + (1-\lambda)\varphi_2, C - L(\lambda u_1 + (1-\lambda)u_2))
    = D_{KL}(\lambda \varphi_1 + (1-\lambda)\varphi_2, \lambda (C - L u_1) + (1-\lambda)(C - L u_2))
    \le \lambda \, D_{KL}(\varphi_1, C - L u_1) + (1-\lambda) \, D_{KL}(\varphi_2, C - L u_2).

Hence, with \varphi_1 = \varphi_2 = f^\delta we can conclude that \hat{D} with u \mapsto \hat{D}(u) = D_{KL}(f^\delta, Ku) is convex, too. For strict convexity we assume that \inf_{\Omega'} f^\delta > 0, \inf_{\Omega'} Ku > 0 and that K is injective. Then the strict convexity of \hat{D} is a consequence of the strict convexity of the negative logarithm -\log(\cdot). Overall, we have shown that J(u) is strictly convex for both kinds of data discrepancy term. Together with the existence of a minimizer we can conclude that problem (4.59) as well as problem (4.63) have a unique minimizer.

Well-Definedness for Iterative Regularization
We cover the well-definedness of variational methods using iterative regularization by taking the example of Bregman-TV. If we use the Bregman distance as the regularization functional, the variational model is

    u_l = \arg\min_{u \in BV(\Omega)} \left\{ \frac{1}{2}\|Ku - f^\delta\|^2 + \alpha \left( |u|_{BV(\Omega)} - \langle p_{l-1}, u \rangle \right) \right\}    (4.74)

or, respectively,

    u_l = \arg\min_{u \in BV(\Omega), u \ge 0} \left\{ \int_{\Omega'} f^\delta \log\left(\frac{f^\delta}{Ku}\right) - f^\delta + Ku \, dx + \alpha \left( |u|_{BV(\Omega)} - \langle p_{l-1}, u \rangle \right) \right\}.    (4.75)

In order to verify the well-definedness of this minimization procedure, we need to show that a minimizer u_l of (4.74) or (4.75), respectively, exists and that we may find a suitable subgradient p_l. In [57, Proposition 3.1] the existence of u_l and p_l is verified for problem (4.74). The proof is by induction over l. The existence of p_1 and q_1 (cf. (5.45)) follows from the preceding argumentation that problem (4.59) has a unique minimizer. In [20, Proposition 1] the existence is shown for a variety of data discrepancy functionals; in particular, the approach is applicable to the KL divergence (4.75). Note that it is only proven for a linear operator K, whereas for an affine forward operator the proof may be adapted. Here, the idea is to use convex conjugates (cf. [29, Chap. I]) and the Fenchel duality theorem (cf. [29, Chap. III]) in order to obtain a dual representation of problem (4.63). Next, a dual inverse scale space method is derived. Finally, a bidual formulation of this dual inverse scale space method again yields an iterative procedure in a primal representation, given by

    u_l = \arg\min_{u \in BV(\Omega), u \ge 0} \left\{ \int_{\Omega'} (Ku + r_{l-1}) - f^\delta \log(Ku + r_{l-1}) \, dx + \alpha |u|_{BV(\Omega)} \right\}.    (4.76)

The update strategy for the residual function is

    r_l = r_{l-1} + Ku_l - f^\delta, \quad \text{with} \quad r_0 = 0.    (4.77)

Hence, the existence of a minimizer for (4.75) can be traced back to the existence of minimizers for a shifted version of the original minimization problem (4.63), which was investigated in the previous paragraphs.
5 Numerical Approach

In the previous chapter we presented a method to find an approximation to the solution of an ill-posed inverse problem. We want to minimize a functional of the form

    J(u) = D(f^\delta, Ku) + \alpha R(u) \to \min_{u \in U},

where U is an infinite-dimensional Banach space. Therefore, we need discretizations of the continuous functions u and f^\delta as well as a numerical optimization method to solve this problem. We start by presenting the most commonly used gradient descent methods and Newton methods. Afterwards we move on to splitting methods and present the method we use for the implementation. The methods we present are all based on the concept of first optimize, then discretize. That means that first, optimality conditions based on the infinite-dimensional optimization problem are derived, for example Karush-Kuhn-Tucker (KKT) conditions. Then the occurring function spaces and operators are discretized for the algorithmic realization of the minimization strategy. Note that the presented optimization strategies can also be applied for U = \mathbb{R}^n and thus can be used for first discretize, then optimize methods as well.

5.1 Introduction to Numerical Optimization Methods

5.1.1 Gradient Descent Methods
Assume we want to minimize a functional J : U \to \mathbb{R}. If J is Fréchet differentiable in u and U is a Hilbert space, which is dual to itself, a gradient flow in U can be defined by

    \frac{\partial u}{\partial t} = -J'(u),    (5.1)

where J'(u) can be identified with the gradient of J in u. Hence,

    \left\langle \frac{\partial u}{\partial t}, v \right\rangle = -J'(u) v \quad \forall v \in U.    (5.2)

It follows that

    \frac{\partial}{\partial t} (J(u)) = J'(u) \frac{\partial u}{\partial t} = -\left\| \frac{\partial u}{\partial t} \right\|^2 \le 0.    (5.3)

Thus, we can conclude that, as long as (5.2) is satisfied, the functional J decreases until \frac{\partial}{\partial t} J(u) = \frac{\partial u}{\partial t} = 0 is satisfied. Then, according to (5.1), J'(u) is equal to zero as well and therefore u is a stationary point. Using a time discretization of (5.1) we get an iterative scheme, called the gradient descent method,

    u^{k+1} = u^k - \sigma^k J'(u^k) = u^k + \sigma^k d^k(u^k),    (5.4)

where \sigma^k is the stepsize and thus needs to be small. d^k(u^k) := -J'(u^k) is called the search direction. The method ensures that J(u^{k+1}) < J(u^k), i.e. the value of the objective functional decreases in every step. Nevertheless, it is not guaranteed that the method converges to a (global) minimum of J. One way to determine the stepsize \sigma^k is to solve a one-dimensional optimization problem, called exact line search,

    \sigma^k = \arg\min_{\sigma \ge 0} J(u^k + \sigma d^k).    (5.5)

The solution of this problem is called the exact stepsize. Another way to determine \sigma are inexact line search strategies (cf. [83, Chap. 3]) that try to find a \sigma^k which guarantees a sufficient decrease of J without solving (5.5). A drawback of gradient descent methods is their slow convergence rate. A way to improve it is to use a conjugate gradient method (cf. [83]). Here, according to [34], the iteration scheme is defined as follows:

    u^{k+1} = u^k + \sigma^k d^k
    g^{k+1} = J'(u^{k+1})
    \delta^{k+1} = \|g^{k+1}\|^2
    \beta^k = \delta^{k+1} / \delta^k
    d^{k+1} = -g^{k+1} + \beta^k d^k.    (5.6)

Again, \sigma^k can be determined by either exact or inexact line searches. There are also variants of the algorithm that choose \beta^k in other ways.
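The basic scheme (5.4) with an inexact line search can be summarized in a few lines. The following MATLAB fragment is a minimal sketch, assuming J and gradJ are given as function handles; the Armijo backtracking rule used here is one possible instance of the inexact strategies mentioned above, and all parameter values are illustrative.

    % Minimal sketch of the gradient descent method (5.4) with an Armijo
    % backtracking line search as an example of an inexact strategy.
    function u = gradient_descent(J, gradJ, u, maxits)
    for k = 1:maxits
        d = -gradJ(u);                           % search direction d^k = -J'(u^k)
        sigma = 1;
        % backtrack until a sufficient decrease of J is achieved
        while J(u + sigma*d) > J(u) - 1e-4*sigma*(d(:)'*d(:)) && sigma > 1e-12
            sigma = sigma/2;
        end
        u = u + sigma*d;                         % update (5.4)
    end
    end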
5.1.2 Newton Methods
In order to improve the convergence rate of the aforementioned minimization algorithms from order one to quadratic convergence, we proceed from gradient based methods to Newton methods. Here, not only the first derivative of J in u is used, but also the second one. Thus, we need to assume that the objective functional J is twice continuously Fréchet differentiable. The idea of Newton methods can then be motivated as follows: Consider the second-order Taylor approximation of J around u^k in u^{k+1} = u^k + d,

    M^k(u^k + d) := J(u^k) + J'(u^k) d + \frac{1}{2} d \, J''(u^k) d \approx J(u^{k+1}).    (5.7)

J'(u^k) is the gradient and J''(u^k) the Hessian matrix. If J''(u^k) is positive definite, then M^k(u^k + d) has a unique minimizer d^k defined by

    0 = (M^k)'(u^k + d^k) = J'(u^k) + J''(u^k) d^k \quad \Leftrightarrow \quad d^k = -J''(u^k)^{-1} J'(u^k).    (5.8)

Thus, we get the following update strategy:

    u^{k+1} = u^k - J''(u^k)^{-1} J'(u^k).    (5.9)

In order to obtain the search direction d^k without inverting the Hessian matrix, one solves the linear system J''(u^k) d^k = -J'(u^k) instead. A shortcoming of this Newton method is that convergence is guaranteed only for an initial guess u^0 that is sufficiently close to a minimizer u^* (cf. [83]). A remedy for this problem can be provided by introducing a stepsize \sigma^k and replacing (5.9) by a damped Newton method

    d^k = J''(u^k)^{-1} J'(u^k)
    u^{k+1} = u^k - \sigma^k d^k.    (5.10)

A drawback of both methods is that computing the Hessian matrix and solving the linear system may be time-consuming and result in high computational costs. An idea to circumvent this problem is to use an invertible matrix A^k \approx J''(u^k) instead of the Hessian matrix. Methods based on this approach are called Newton-like methods.

Remark
In Chapter 7, we will partly compare our results to the results of an aforementioned algorithm minimizing the functional

    J(u) = \frac{1}{2}\|Ku - f^\delta\|^2 + \alpha \left( \int_\Omega |\nabla u|^p \, dx \right)^{1/p} + \beta \left( \int_\Omega |u|^q \, dx \right)^{1/q}.    (5.11)

The algorithm that is used to minimize this functional is based on the idea of conjugate gradient methods. The line searches in this optimization algorithm are realized by Newton methods. We refer to the associated software framework as TVreg.

Disadvantages of Gradient and Newton Methods for TV Regularization
Both gradient descent and Newton methods assume that the objective functional J is Fréchet differentiable. In case the minimization problem is

    J(u) = D(f, Ku) + \alpha R(u) \to \min_u \quad \text{with} \quad R(u) = |u|_{BV(\Omega)},    (5.12)

this assumption is violated. There are different approaches to circumvent this problem. The most common one is to replace the TV functional by a smoothed approximation (cf. [1])

    R_\beta(u) = \int_\Omega \sqrt{|\nabla u|^2 + \beta} \, dx.    (5.13)

R_\beta(u) is Fréchet differentiable, convex and close to TV(u) = |u|_{BV(\Omega)} if \beta is small. Thus, optimization methods based on Fréchet derivatives can be used. For \beta \to 0 the solution of the perturbed variational problem with the regularization functional (5.13) converges to the solution of the variational problem with R(u) = |u|_{BV(\Omega)} (cf. [1]). However, one has to take into account that for standard Newton methods the domain of convergence is very small, especially in the case of \beta being close to zero. For a primal-dual approach to solve the perturbed problem with highly improved convergence results see [26]. Another shortcoming of this approximation is that edges are smoothed, so the reconstructions are not as sharp as with the exact total variation as a regularizing functional. In [44] the authors present and analyze a semismooth Newton method to solve (5.12) without a smoothed approximation of the total variation. It is based on a dual formulation of the primal optimization problem (5.12) obtained by Fenchel's duality theorem.
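To make the smoothed approximation concrete, the following MATLAB fragment is a minimal sketch that evaluates R_\beta(u) from (5.13) and its gradient on a 2D grid, as needed by the gradient and Newton methods above; the discretization by forward differences with Neumann boundary conditions is our own choice here, not one prescribed by the references.

    % Minimal sketch: value and gradient of the smoothed TV functional (5.13)
    % on a 2D grid with unit spacing.
    function [Rb, gRb] = smoothed_tv(u, beta)
    ux = [diff(u,1,2), zeros(size(u,1),1)];      % forward differences of u
    uy = [diff(u,1,1); zeros(1,size(u,2))];
    s  = sqrt(ux.^2 + uy.^2 + beta);             % smoothed gradient magnitude
    Rb = sum(s(:));                              % value of R_beta(u)
    % gradient: -div( grad(u) ./ s ), discretized with backward differences
    px = ux ./ s;  py = uy ./ s;
    dx = [px(:,1), diff(px,1,2)];  dx(:,end) = -px(:,end-1);
    dy = [py(1,:); diff(py,1,1)];  dy(end,:) = -py(end-1,:);
    gRb = -(dx + dy);
    end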
For approaches based on a primal-dual formulation of (5.12) with the exact definition of the total variation that use penalty and barrier methods see [53, Chap. 4.3]. The optimization method presented in Section 5.2 uses the exact definition of TV as well.

5.1.3 Introduction to Splitting Methods
Assume the problem to solve is of the form

    0 \in A(u) + B(u),    (5.14)

with maximal monotone operators A and B on U. In [50] and [58] an approach to solve (5.14) is presented. The idea is to rearrange the problem in order to obtain a fixed-point equation u = T(u). Then a fixed-point iteration can be used, which converges to a solution of (5.14) under certain assumptions. For arbitrary \sigma > 0 it holds that

    0 \in A(u) + B(u) \quad \Leftrightarrow \quad (I - \sigma A)(u) \in (I + \sigma B)(u) \quad \Leftrightarrow \quad u \in \underbrace{(I + \sigma B)^{-1} (I - \sigma A)}_{=: T}(u).

The operator R_B^\sigma := (I + \sigma B)^{-1} is called the resolvent. Thus, we have a fixed-point equation

    u = R_B^\sigma (I - \sigma A)(u)    (5.15)

resulting in the following fixed-point iteration:

    u^{k+1} = R_B^\sigma (I - \sigma A)(u^k).    (5.16)

Assumptions ensuring the convergence of (5.16) to a solution of (5.15) are provided, for example, by the Banach fixed-point theorem. The iteration scheme can be rewritten as a two-step iteration

    \frac{u^{k+1/2} - u^k}{\sigma} + A(u^k) = 0
    \frac{u^{k+1} - u^{k+1/2}}{\sigma} + B(u^{k+1}) = 0.    (5.17)

This is equivalent to (5.16), since u^{k+1/2} = u^k - \sigma A(u^k) = (I - \sigma A)(u^k) and

    u^{k+1} = u^{k+1/2} - \sigma B(u^{k+1}) = (I - \sigma A)(u^k) - \sigma B(u^{k+1})
    \Leftrightarrow (I + \sigma B)(u^{k+1}) = (I - \sigma A)(u^k) \Leftrightarrow (5.16).

The two-step iteration scheme in (5.17) is called the Forward-Backward-Splitting (FBS) algorithm, since the first step is a forward step on A, whereas the second step is a backward step on B.

Advantages of the FBS algorithm
The main advantages of the FBS algorithm are the simplicity of the first step as well as its modular structure. The first step can be realized by simply applying operators. Due to the modular structure of the algorithm it is much easier to generalize it to other applications where either A or B has to be changed. Especially a change of A can be implemented without much effort. This is important in the context of variational methods, where A represents the subdifferential of the data discrepancy term and B the subdifferential of the regularization term. Thus, for different data discrepancy terms, only a small part of the algorithm needs to be changed. Another advantage of the modular structure is that, for example, one part can be parallelized, whereas the second part remains the same.
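As an illustration of (5.17), the following MATLAB fragment is a minimal sketch applying FBS to the toy problem 0 \in A(u) + B(u) with a quadratic data term, A(u) = K^T(Ku - f), and B = \partial(\mu \|\cdot\|_1), whose resolvent is the componentwise soft-threshold; K, f, \mu and \sigma are assumed inputs and not taken from the thesis.

    % Minimal sketch of forward-backward splitting (5.17) for a quadratic
    % data term plus L1 regularizer; sigma must satisfy 0 < sigma < 2/norm(K)^2.
    function u = fbs(K, f, mu, sigma, maxits)
    u = zeros(size(K,2), 1);
    for k = 1:maxits
        v = u - sigma * (K' * (K*u - f));            % forward step on A
        u = sign(v) .* max(abs(v) - sigma*mu, 0);    % backward step: resolvent of B
    end
    end

The second line of the loop is exactly the resolvent (I + \sigma B)^{-1} applied to the forward step, illustrating the modular structure discussed above.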
5.2 Bregman-FB-EM-TV Algorithm
In the following section we present an optimization strategy to solve the minimization problem

    J(u) = \int_{\Omega'} f^\delta \log\left(\frac{f^\delta}{Ku}\right) - f^\delta + Ku \, dx + \alpha R(u) \to \min_{u \in BV(\Omega), u \ge 0},    (5.18)

where K is an affine operator of the form Ku = C - Lu. We start with an algorithm for R(u) = 0, proceed with the case R(u) = |u|_{BV(\Omega)} and finally present an algorithm for iterative regularization based on the Bregman distances presented in Chapter 4, i.e. R(u) = |u|_{BV(\Omega)} - \langle p_{l-1}, u \rangle. All algorithms are based on the optimality condition (cf. Lemma 4.1.2)

    0 \in \partial J(u),

with the subdifferential defined as in Definition 4.1.1. In the previous chapter it was already proven that J(u) has a unique minimizer characterized by the presented optimality condition. This section is based on [69], although we adapt the algorithms presented there in order to make them suitable for an affine forward operator.

5.2.1 EM Algorithm
In the previous chapter we have shown how variational models can be motivated by stochastic noise models. The data discrepancy term D_{KL}(f^\delta, Ku) depends on the conditional probability p(u|f^\delta), whereas the regularization is motivated by an a priori probability p(u) that accounts for a priori information about the unknown solution u. If we assume that each u \in U has the same a priori probability, then p(u) is constant. Thus, the regularization functional R(u) is equal to 0 and the minimization problem reduces to

    \int_{\Omega'} f^\delta \log\left(\frac{f^\delta}{Ku}\right) - f^\delta + Ku \, dx \to \min_{u \in BV(\Omega), u \ge 0}.    (5.19)

A common approach for computing maximum likelihood estimates in the case of incomplete data is the so-called Expectation Maximization (EM) algorithm, which is also known as the Richardson-Lucy algorithm (cf. [62, 51, 28]). It is based on the first-order optimality condition for (5.19). Since the minimization problem is constrained, we make use of the Karush-Kuhn-Tucker (KKT) conditions (cf. [45, Theorem 2.1.4]). They yield the existence of a Lagrange multiplier \lambda \ge 0 and state that each stationary point of (5.19) fulfills the following optimality conditions:

    0 = -L^* 1 + L^*\left( \frac{f^\delta}{Ku} \right) - \lambda    (5.20)
    0 = \lambda u.

Here, 1 is again the constant function with value 1 everywhere and L^* is the adjoint operator of L, where the forward operator is given by Ku = C - Lu. Since D_{KL}(f^\delta, Ku) is strictly convex, a solution of (5.20) is not only a stationary point but also a global minimum of (5.19). Multiplying the first optimality condition by u yields the fixed-point equation

    u = u \frac{L^* 1}{L^*\left( \frac{f^\delta}{Ku} \right)} + \frac{\lambda u}{L^*\left( \frac{f^\delta}{Ku} \right)}.

By utilizing \lambda u = 0 we obtain a fixed-point iteration

    u^{k+1} = u^k \frac{L^* 1}{L^*\left( \frac{f^\delta}{Ku^k} \right)}.    (5.21)

If K preserves positivity and the initial guess u^0 is positive, then the algorithm preserves positivity as well. Note that the iteration scheme differs significantly from the well-known iteration procedure in the case of a linear operator \tilde{K}, given by

    u^{k+1} = \frac{u^k}{\tilde{K}^* 1} \tilde{K}^*\left( \frac{f^\delta}{\tilde{K} u^k} \right).    (5.22)

Not only is the adjoint of the linear part of K used, but the numerator and the denominator are also reversed. Otherwise, the iteration scheme will not converge for the affine operator K. In [74] it is shown that algorithm (5.22) is an example of the more general EM algorithm in [28]. For the classical case of a linear operator there have been several proofs that, for noise-free data f, the iteration scheme (5.22) converges to a solution of (5.19) (see for instance [82, 55]). For noisy data f^\delta, there is a semi-convergence of the iterates, described in [61]. The distance between the iterates and the solution initially decreases, but later iterates are impaired by noise, resulting in an increase of the distance. Thus, one either needs appropriate stopping rules (cf. [61]) or regularization. For very noisy ET data we observed that the convergence of the iterates, which is commonly known to be slow, is still too fast to obtain reasonable results. Already the initial iterates are impaired by a lot of noise. Therefore, we take up the approach presented in [55] and introduce a relaxation parameter \omega that influences the convergence speed by setting

    u^{k+1} = u^k \left( \frac{L^* 1}{L^*\left( \frac{f^\delta}{Ku^k} \right)} \right)^\omega.    (5.23)

For \omega > 1 an increased convergence speed can be observed, whereas for \omega < 1 the iterates converge more slowly.
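The relaxed EM iteration (5.23) for the affine forward operator is compact enough to state directly. The following MATLAB fragment is a minimal sketch, assuming Lop and Ladj are function handles for L and its adjoint L^*, and C is the constant part of K; these names are illustrative, not the toolbox interface.

    % Minimal sketch of the relaxed EM iteration (5.23) for Ku = C - L(u).
    function u = em_affine(Lop, Ladj, C, f, w, maxits)
    u = ones(size(Ladj(f)));                 % positive initial guess u^0 = 1
    Ladj1 = Ladj(ones(size(f)));             % L* applied to the constant 1
    for k = 1:maxits
        u = u .* (Ladj1 ./ Ladj(f ./ (C - Lop(u)))).^w;   % update (5.23)
    end
    end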
5.2.2 FB-EM-TV Algorithm
Since the results of the EM algorithm are unsatisfactory, especially in the case of highly noisy data, it is helpful to include a priori information about u. Therefore, we use the TV regularization presented in the previous chapter in order to improve the reconstruction results. The constrained optimization problem introduced in (4.47) is

    J(u) = \int_{\Omega'} f^\delta \log\left(\frac{f^\delta}{Ku}\right) - f^\delta + Ku \, dx + \alpha |u|_{BV(\Omega)} \to \min_{u \in BV(\Omega), u \ge 0}

with \alpha > 0. At first we neglect the positivity constraint and derive an iterative procedure based on the first-order optimality condition

    0 \in \partial J(u) = \partial \left( D_{KL}(f^\delta, Ku) + \alpha R(u) \right).

We would like to split the right-hand side into two separate subdifferentials. The KL divergence is defined on L^1(\Omega), so its subgradients are elements of the dual space (L^1(\Omega))^* = L^\infty(\Omega). In contrast, the total variation is defined on the smaller subspace BV(\Omega) but can be extended to a convex functional on L^1(\Omega) by setting |u|_{BV(\Omega)} = \infty if u \in L^1(\Omega) \setminus BV(\Omega), without affecting the solutions of the minimization problem (5.18). Thus, the subgradients of |\cdot|_{BV(\Omega)} are contained in L^\infty(\Omega) as well. The continuity of D_{KL}(f^\delta, Ku) together with [29, Chap. I, Proposition 5.6] yields that

    0 \in \partial \left( D_{KL}(f^\delta, Ku) + \alpha R(u) \right) \quad \Leftrightarrow \quad 0 \in \partial D_{KL}(f^\delta, Ku) + \alpha \partial R(u).    (5.24)

The KL divergence is Fréchet differentiable, thus its subdifferential is a singleton. The optimality condition is

    0 = -L^* 1 + L^*\left( \frac{f^\delta}{Ku} \right) + \alpha p, \quad p \in \partial |u|_{BV(\Omega)},    (5.25)

where L^* is the adjoint of the linear part of K. Again, we want to derive an iteration scheme converging to the solution of (4.47). Therefore, we evaluate the subdifferential of the data discrepancy part at the previous iterate u^k, whereas the subdifferential of |\cdot|_{BV(\Omega)} is evaluated at the next iterate u^{k+1}, i.e.

    0 = -L^* 1 + L^*\left( \frac{f^\delta}{Ku^k} \right) + \alpha p^{k+1}, \quad p^{k+1} \in \partial |u^{k+1}|_{BV(\Omega)}.    (5.26)

The drawback of this approach is that the new iterate does not appear directly in (5.26). Since the subgradient p^{k+1} is not uniquely defined, we cannot determine u^{k+1} from p^{k+1}. For an appropriate iterative procedure we need to incorporate u^{k+1}. Hence, we divide (5.26) by L^*(f^\delta / Ku^k) and replace the resulting 1 by u^{k+1}/u^k:

    0 = \frac{u^{k+1}}{u^k} - \frac{L^* 1}{L^*\left( \frac{f^\delta}{Ku^k} \right)} + \alpha \frac{p^{k+1}}{L^*\left( \frac{f^\delta}{Ku^k} \right)}
    \quad \Leftrightarrow \quad u^{k+1} = u^k \frac{L^* 1}{L^*\left( \frac{f^\delta}{Ku^k} \right)} - \alpha \frac{u^k}{L^*\left( \frac{f^\delta}{Ku^k} \right)} p^{k+1},    (5.27)

with p^{k+1} \in \partial |u^{k+1}|_{BV(\Omega)}. Note that the first part of the right-hand side corresponds to the EM iteration (5.21). Another way to motivate (5.27) is to use the KKT conditions for the constrained optimization problem (4.47) again. They provide the existence of a Lagrange multiplier \lambda \ge 0 such that every stationary point u fulfills

    0 \in -L^* 1 + L^*\left( \frac{f^\delta}{Ku} \right) + \alpha \, \partial |u|_{BV(\Omega)} - \lambda    (5.28)
    0 = \lambda u.

By multiplication with u and division by L^*(f^\delta / Ku) the Lagrange multiplier \lambda can be eliminated and the resulting fixed-point equation is

    u = u \frac{L^* 1}{L^*\left( \frac{f^\delta}{Ku} \right)} - \alpha \frac{u}{L^*\left( \frac{f^\delta}{Ku} \right)} p, \quad p \in \partial |u|_{BV(\Omega)}.    (5.29)

This equation corresponds to (5.25) multiplied by u; thus the multiplication incorporates the positivity constraint into the optimality condition. Then, (5.27) can be seen as a semi-implicit approach to (5.29). See [69, Chap. 4.3] for a verification that the iteration corresponding to (5.27) for linear operators actually preserves positivity if K preserves positivity and u^0 \ge 0. Next, we split (5.27) into a two-step iteration where the first step corresponds to the EM algorithm (cf. (5.21)):

    u^{k+1/2} = u^k \cdot \frac{L^* 1}{L^*\left( \frac{f^\delta}{Ku^k} \right)}    (EM step)
    u^{k+1} = u^{k+1/2} - \alpha \frac{u^{k+1/2}}{L^* 1} p^{k+1},    (TV step)    (5.30)

with p^{k+1} \in \partial |u^{k+1}|_{BV(\Omega)}. We call this splitting scheme the EM-TV algorithm or FB-EM-TV algorithm, since we show later on that (5.30) can be interpreted as a forward-backward splitting algorithm as described in Section 5.1.3.
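A minimal MATLAB sketch of the two-step iteration (5.30) reads as follows; weightedROF is an assumed helper solving the weighted ROF problem derived below (cf. (5.31) and Section 5.2.4), the operator handles Lop, Ladj and the constant C are assumptions, and the stopping rules discussed later are omitted.

    % Minimal sketch of the FB-EM-TV two-step iteration (5.30).
    u = ones(nx, ny, nz);                         % hypothetical volume size
    Ladj1 = Ladj(ones(size(f)));
    for k = 1:maxits
        u_half = u .* Ladj1 ./ Ladj(f ./ (C - Lop(u)));   % EM step
        % TV step as weighted ROF: q = u_half, h = u_half./Ladj1, beta = alpha
        u = weightedROF(u_half, u_half ./ Ladj1, alpha);
    end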
The second half step can be realized by solving the convex variational problem

    u^{k+1} = \arg\min_{u \in BV(\Omega)} \frac{1}{2} \int_\Omega \frac{L^* 1}{u^{k+1/2}} \left( u - u^{k+1/2} \right)^2 dx + \alpha |u|_{BV(\Omega)},    (5.31)

with its first-order optimality condition given by

    0 = \frac{L^* 1}{u^{k+1/2}} \left( u^{k+1} - u^{k+1/2} \right) + \alpha p^{k+1}, \quad p^{k+1} \in \partial |u^{k+1}|_{BV(\Omega)},

which is equivalent to the second half step in (5.30). Problem (5.31) can be interpreted as a weighted version of the ROF model (cf. (4.46)) with weight u^{k+1/2} / L^* 1. Hence, one can use a slightly modified version of one of the standard numerical approaches for solving a ROF model in order to compute the second half step. See Section 5.2.4 for more details.

Damped FB-EM-TV Algorithm
Next, we present a modification of the proposed EM-TV splitting algorithm that makes it possible to control the interaction of both steps. The idea can be traced back to an adaptation of the optimality condition that does not affect its solution. We recall the condition presented in (5.25), which we used as the basis for the splitting algorithm, namely

    0 = -L^* 1 + L^*\left( \frac{f^\delta}{Ku} \right) + \alpha p, \quad p \in \partial |u|_{BV(\Omega)}.

Without affecting the solutions, we can divide this equation by -L^*(f^\delta / Ku) and multiply it by \omega \in (0, 1]. Now, by adding the constant function 1 on both sides, we get

    1 = \omega \frac{L^* 1}{L^*\left( \frac{f^\delta}{Ku} \right)} + (1 - \omega) \cdot 1 - \omega \frac{\alpha p}{L^*\left( \frac{f^\delta}{Ku} \right)}, \quad p \in \partial |u|_{BV(\Omega)}.    (5.32)

Multiplying by u leads to a fixed-point equation, which we can use as before to obtain a fixed-point iteration

    u^{k+1} = \omega u^k \frac{L^* 1}{L^*\left( \frac{f^\delta}{Ku^k} \right)} + (1 - \omega) u^k - \omega \alpha \frac{u^k}{L^*\left( \frac{f^\delta}{Ku^k} \right)} p^{k+1},    (5.33)

with p^{k+1} \in \partial |u^{k+1}|_{BV(\Omega)}. This iteration scheme can be realized by the two-step iteration

    u^{k+1/2} = u^k \cdot \frac{L^* 1}{L^*\left( \frac{f^\delta}{Ku^k} \right)}    (EM step)
    u^{k+1} = \omega u^{k+1/2} + (1 - \omega) u^k - \omega \alpha \frac{u^{k+1/2}}{L^* 1} p^{k+1},    ((damped) TV step)    (5.34)

which we refer to as the damped FB-EM-TV algorithm. The second half step can be computed by solving a weighted ROF problem, which is now given by

    u^{k+1} = \arg\min_{u \in BV(\Omega)} \frac{1}{2} \int_\Omega \frac{L^* 1}{u^{k+1/2}} \left( u - \left( \omega u^{k+1/2} + (1 - \omega) u^k \right) \right)^2 dx + \omega \alpha |u|_{BV(\Omega)}.    (5.35)

For \omega = 1 the algorithm coincides with the FB-EM-TV algorithm. The damping parameter can help to obtain a monotone descent of the objective functional and to prove the convergence of the algorithm in dependence on the damping parameter \omega. See [69, Chap. 4.3-4.4] for more details about the convergence of the damped FB-EM-TV algorithm as well as its positivity preservation in the case of a linear forward operator. Since we are now fitting against a convex combination of the current EM iterate and the previous TV iterate, the iterates stay closer to the regularized solution u^k. Thus, especially in the case of a small \omega, we expect smoother results for u^{k+1}. Nevertheless, if the FB-EM-TV algorithm without a damping parameter converges, it converges to the same solution as the damped FB-EM-TV algorithm, since the changes have not affected the solutions of the optimality condition.

Interpretation as a Forward-Backward-Splitting Algorithm
As already mentioned, the two-step algorithms can be interpreted as splitting methods like the ones presented in Section 5.1.3. According to (5.24), the optimality condition (4.6) can be seen as a decomposition problem

    0 \in \underbrace{-L^* 1 + L^*\left( \frac{f^\delta}{Ku} \right)}_{A(u)} + \underbrace{\alpha \, \partial |u|_{BV(\Omega)}}_{B(u)},    (5.36)

with maximal monotone operators A and B (cf. (5.14)).
Thus, following the approach in Section 5.1.3, the problem can be solved by the two-step iteration

    \frac{u^{k+1/2} - u^k}{\sigma} + A(u^k) = \frac{u^{k+1/2} - u^k}{\sigma} - L^* 1 + L^*\left( \frac{f^\delta}{Ku^k} \right) = 0    (5.37)
    \frac{u^{k+1} - u^{k+1/2}}{\sigma} + B(u^{k+1}) = \frac{u^{k+1} - u^{k+1/2}}{\sigma} + \alpha p^{k+1} = 0,

with p^{k+1} \in \partial |u^{k+1}|_{BV(\Omega)}. Now, by choosing the artificial stepsize to be \sigma = \omega u^k / L^*(f^\delta / Ku^k), the splitting scheme is

    \frac{L^*\left( \frac{f^\delta}{Ku^k} \right)}{\omega u^k} \left( u^{k+1/2} - u^k \right) - L^* 1 + L^*\left( \frac{f^\delta}{Ku^k} \right) = 0    (5.38)
    \frac{L^*\left( \frac{f^\delta}{Ku^k} \right)}{\omega u^k} \left( u^{k+1} - u^{k+1/2} \right) + \alpha p^{k+1} = 0.

The first equation of (5.38) is equivalent to

    u^{k+1/2} = \omega u^k \frac{L^* 1}{L^*\left( \frac{f^\delta}{Ku^k} \right)} + (1 - \omega) u^k

and the second equation is equivalent to

    u^{k+1} = u^{k+1/2} - \omega \alpha \frac{u^k}{L^*\left( \frac{f^\delta}{Ku^k} \right)} p^{k+1}.

Thus, (5.38) coincides with the damped FB-EM-TV algorithm (5.34) or, if \omega = 1, with the FB-EM-TV algorithm (5.30).

Stopping Rules
Besides the maximum number of iterations, further stopping rules are included in the algorithm, which are based on
1. the error in the optimality condition,
2. the convergence of the primal functions u^k,
3. the convergence of the subgradients p^k \in \partial |u^k|_{BV(\Omega)}.
For this purpose we introduce a weighted norm induced by a weighted scalar product defined as

    \langle u, v \rangle_w := \int_\Omega u \, v \, w \, d\lambda \quad \to \quad \|u\|_{2,w} := \sqrt{\langle u, u \rangle_w}.    (5.39)

Here, w is a positive weight function and \lambda the standard Lebesgue measure on \Omega. Since the optimality condition for the (k+1)-th iterate is given as

    0 = -L^* 1 + L^*\left( \frac{f^\delta}{Ku^{k+1}} \right) + \alpha p^{k+1},    (5.40)

we measure the error in the optimality condition in the weighted norm for every iteration, i.e.

    opt^{k+1} := \left\| -L^* 1 + L^*\left( \frac{f^\delta}{Ku^{k+1}} \right) + \alpha p^{k+1} \right\|^2_{2, u^{k+1}}.    (5.41)

In order to introduce stopping rules based on the convergence of the sequences u^k and p^k, we revisit the second half step of the damped FB-EM-TV algorithm given in (5.34):

    u^{k+1} = \omega u^k \cdot \frac{L^* 1}{L^*\left( \frac{f^\delta}{Ku^k} \right)} + (1 - \omega) u^k - \omega \alpha \frac{u^k}{L^*\left( \frac{f^\delta}{Ku^k} \right)} p^{k+1}
    \quad \Leftrightarrow \quad u^{k+1} - u^k = \omega u^k \left( \frac{L^* 1}{L^*\left( \frac{f^\delta}{Ku^k} \right)} - 1 - \alpha \frac{p^{k+1}}{L^*\left( \frac{f^\delta}{Ku^k} \right)} \right).    (5.42)

If the algorithm converges, the optimality condition (5.29) should be fulfilled for every iterate; thus we can evaluate it at u^k and get

    u^k = u^k \frac{L^* 1}{L^*\left( \frac{f^\delta}{Ku^k} \right)} - \alpha \frac{u^k}{L^*\left( \frac{f^\delta}{Ku^k} \right)} p^k
    \quad \Leftrightarrow \quad \alpha \frac{u^k}{L^*\left( \frac{f^\delta}{Ku^k} \right)} p^k = u^k \left( \frac{L^* 1}{L^*\left( \frac{f^\delta}{Ku^k} \right)} - 1 \right).

Now, by inserting this equation into the previous one (5.42), we obtain

    \frac{L^*\left( \frac{f^\delta}{Ku^k} \right)}{\omega u^k} \left( u^{k+1} - u^k \right) + \alpha \left( p^{k+1} - p^k \right) = 0,

which can be used to measure the convergence of the sequences of primal functions u^k and subgradients p^k, respectively. We define

    uopt^{k+1} := \left\| \frac{L^*\left( \frac{f^\delta}{Ku^k} \right)}{\omega u^k} \left( u^{k+1} - u^k \right) \right\|^2_{2, u^{k+1}},
    popt^{k+1} := \left\| \alpha \left( p^{k+1} - p^k \right) \right\|^2_{2, u^{k+1}}.    (5.43)

Since for every k the primal function u^k and Ku^k are positive, opt^{k+1} as well as uopt^{k+1} are well-defined. Now, the algorithm is stopped if at least one of the three criteria ((5.41) and (5.43)) falls below a pre-defined tolerance limit.

Pseudocode for the damped FB-EM-TV Algorithm
Taken together, the algorithm for solving the optimization problem (4.47) with the KL divergence as a data term and TV regularization reads as follows.

Algorithm 1: (Damped) FB-EM-TV Algorithm for solving (4.47)
Parameters: noisy data f^\delta, reg. param. \alpha \ge 0, damping param. \omega \in (0, 1], maxEMits \in \mathbb{N}, tolerance limit tol > 0
Initialization: k = 0, u^0 = 1
Iteration:
while (k < maxEMits) and (opt^k \ge tol or uopt^k \ge tol or popt^k \ge tol) do
    1. Compute u^{k+1/2} via the EM step in (5.34).
    2. Compute u^{k+1} via the weighted ROF model (5.35).
    3. Update opt^k, uopt^k and popt^k according to (5.41) and (5.43).
    4. Set k = k + 1.
end while
return u^k
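The three stopping quantities can be evaluated directly from the iterates. The following MATLAB fragment is a minimal sketch of discrete versions of (5.41) and (5.43) based on the weighted norm (5.39); u0/u1 denote u^k/u^{k+1}, p0/p1 the corresponding subgradients, and the operator handles Lop, Ladj, the constant C and Ladj1 are assumptions carried over from the sketches above.

    % Minimal sketch of the stopping criteria (5.41) and (5.43).
    wnorm2 = @(v, w) sum(v(:).^2 .* w(:));                 % discrete ||v||^2_{2,w}
    Dk  = Ladj(f ./ (C - Lop(u0)));                        % L*(f^delta / K u^k)
    Dk1 = Ladj(f ./ (C - Lop(u1)));                        % L*(f^delta / K u^{k+1})
    opt  = wnorm2(-Ladj1 + Dk1 + alpha*p1, u1);            % error in (5.40)
    uopt = wnorm2(Dk .* (u1 - u0) ./ (omega*u0), u1);      % primal convergence
    popt = wnorm2(alpha*(p1 - p0), u1);                    % subgradient convergence
    stop = (opt < tol) || (uopt < tol) || (popt < tol);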
5.2.3 Bregman-FB-EM-TV Algorithm
Next, we consider the stepwise refinement of (4.47) based on Bregman distances as proposed in (4.49). Thus, we want to solve the iterative optimization problem introduced in (4.50), i.e.

    u_l = \arg\min_{u \in BV(\Omega), u \ge 0} \int_{\Omega'} f^\delta \log\left(\frac{f^\delta}{Ku}\right) - f^\delta + Ku \, dx + \alpha \left( |u|_{BV(\Omega)} - \langle p_{l-1}, u \rangle \right)

with p_{l-1} \in \partial |u_{l-1}|_{BV(\Omega)}. Similar to before, we want to obtain a two-step iteration in order to solve problem (4.50) for a fixed Bregman step l based on its optimality condition. Since the KL divergence as well as the dual product \langle p_{l-1}, u \rangle are Fréchet differentiable, the optimality condition based on subdifferentials is given as

    0 \in -L^* 1 + L^*\left( \frac{f^\delta}{Ku_l} \right) + \alpha \left( \partial |u_l|_{BV(\Omega)} - p_{l-1} \right).    (5.44)

By starting with a constant solution u_0, we can assume that p_0 = 0 \in \partial |u_0|_{BV(\Omega)}. Besides a strategy to solve the optimization problem (4.50) for fixed l, an update strategy for p_l is needed as well. This can easily be motivated by the optimality condition. The subgradient p_l \in \partial |u_l|_{BV(\Omega)} can be chosen according to

    p_l = p_{l-1} + \frac{1}{\alpha} \left( L^* 1 - L^*\left( \frac{f^\delta}{Ku_l} \right) \right)    (5.45)

with p_0 = 0. Thus, a splitting algorithm to solve (4.50) for fixed l is alternated with updates of p_l (cf. (5.45)) and l. For the two-step iteration the first EM step is analogous to the one in the FB-EM-TV algorithm (cf. (5.30)), whereas the second step needs to be adapted in order to account for the change in the regularization functional. The new splitting scheme for the Bregman-FB-EM-TV algorithm is

    u_l^{k+1/2} = u_l^k \cdot \frac{L^* 1}{L^*\left( \frac{f^\delta}{Ku_l^k} \right)}    (EM step)
    u_l^{k+1} = u_l^{k+1/2} - \alpha \frac{u_l^{k+1/2}}{L^* 1} \left( p_l^{k+1} - p_{l-1} \right),    (TV step)    (5.46)

where the index k refers to the number of EM iterations and the index l to the number of Bregman iterations. Again, the second step can be solved via a variational problem:

    u_l^{k+1} = \arg\min_{u \in BV(\Omega)} \frac{1}{2} \int_\Omega \frac{L^* 1}{u_l^{k+1/2}} \left( u - u_l^{k+1/2} \right)^2 dx + \alpha \left( |u|_{BV(\Omega)} - \langle p_{l-1}, u \rangle \right).    (5.47)

In order to be able to use the same approach for solving this problem as for the weighted ROF problem in (5.31), we want to shift the second part of the regularization functional, i.e. -\alpha \langle p_{l-1}, u \rangle, to the data fidelity term. Thus, we want to solve

    u_l^{k+1} = \arg\min_{u \in BV(\Omega)} \frac{1}{2} \int_\Omega \frac{L^* 1 \left( u - u_l^{k+1/2} \right)^2 - 2 \alpha p_{l-1} \, u \, u_l^{k+1/2}}{u_l^{k+1/2}} \, dx + \alpha |u|_{BV(\Omega)}.

With \alpha p_{l-1} := L^* 1 \, v_{l-1} this can be rewritten as

    u_l^{k+1} = \arg\min_{u \in BV(\Omega)} \frac{1}{2} \int_\Omega \frac{L^* 1}{u_l^{k+1/2}} \left( \left( u - u_l^{k+1/2} \right)^2 - 2 v_{l-1} \, u \, u_l^{k+1/2} \right) dx + \alpha |u|_{BV(\Omega)}

and the new update strategy is

    v_l = v_{l-1} + 1 - \frac{L^*\left( \frac{f^\delta}{Ku_l} \right)}{L^* 1}.    (5.48)

For the final step of rewriting problem (5.47) as a weighted ROF model, we have to adapt the data discrepancy term so that it has the form of a weighted L^2-norm. It holds that

    \left( u - u_l^{k+1/2} \right)^2 - 2 v_{l-1} \, u \, u_l^{k+1/2}
    = \left( u - \left( u_l^{k+1/2} + v_{l-1} u_l^{k+1/2} \right) \right)^2 - 2 \left( u_l^{k+1/2} \right)^2 v_{l-1} - v_{l-1}^2 \left( u_l^{k+1/2} \right)^2,    (5.49)

where the last two terms are independent of the solution u and thus do not influence it. Therefore, problem (5.47) can be rewritten as the weighted ROF model

    u_l^{k+1} = \arg\min_{u \in BV(\Omega)} \frac{1}{2} \int_\Omega \frac{L^* 1}{u_l^{k+1/2}} \left( u - \left( u_l^{k+1/2} + v_{l-1} u_l^{k+1/2} \right) \right)^2 dx + \alpha |u|_{BV(\Omega)}.    (5.50)

A method to solve this problem is given in Section 5.2.4. If we want to include a damping parameter \omega in the Bregman-FB-EM-TV algorithm as well, only the TV step, which is realized by solving the weighted ROF model (5.50), needs to be adapted. The TV step is then given as

    u_l^{k+1} = \arg\min_{u \in BV(\Omega)} \frac{1}{2} \int_\Omega \frac{L^* 1}{u_l^{k+1/2}} \left( u - \left( \tilde{u}_l^{k+1/2} + \omega v_{l-1} u_l^{k+1/2} \right) \right)^2 dx + \alpha \omega |u|_{BV(\Omega)}    (5.51)

with

    \tilde{u}_l^{k+1/2} = \omega u_l^{k+1/2} + (1 - \omega) u_l^k.

We refer to the resulting algorithm as the damped Bregman-FB-EM-TV algorithm. Again, with \omega = 1 the algorithm coincides with the Bregman-FB-EM-TV algorithm.
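Putting the pieces together, the following MATLAB fragment is a minimal sketch of the undamped Bregman-FB-EM-TV iteration: inner FB-EM-TV steps (5.46) with the shifted weighted ROF input from (5.50), alternated with the Bregman update (5.48). weightedROF and the operator handles are assumed helpers, and the inner stopping rules are omitted for brevity.

    % Minimal sketch of the Bregman-FB-EM-TV iteration (omega = 1).
    u = ones(nx, ny, nz);
    v = zeros(size(u));                        % Bregman variable, v_0 = 0
    Ladj1 = Ladj(ones(size(f)));
    for l = 1:maxBregits
        for k = 1:maxEMits
            u_half = u .* Ladj1 ./ Ladj(f ./ (C - Lop(u)));   % EM step (5.46)
            q = u_half .* (1 + v);                            % shifted input of (5.50)
            u = weightedROF(q, u_half ./ Ladj1, alpha);       % TV step
        end
        v = v + 1 - Ladj(f ./ (C - Lop(u))) ./ Ladj1;         % update (5.48)
    end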
Stopping Rules
Since the damped Bregman-FB-EM-TV algorithm consists of two different iterative schemes - the outer Bregman iterations and the inner EM iterations - we differentiate between stopping rules for each of them. The inverse scale space method of the Bregman-FB-EM-TV algorithm starts with an overregularized solution and incorporates more small scales with every Bregman iteration. Since noise is small-scaled as well, the iterations need to be stopped before the results are impaired by too much noise. Thus, a suitable stopping criterion would be to stop the iterations before the residual of the noisy data f^\delta and Ku_l reaches the noise level \delta (cf. [57, 22, 69]). Since reliable estimates of the noise level are lacking in ET, so far we stop the Bregman iterations only by the pre-defined maximum number of iterations. An appropriate stopping rule for the Bregman iterations still needs to be incorporated into the algorithm. For the inner EM iterations we define stopping rules analogous to the ones in Section 5.2.2. The error in the optimality condition is measured by

    opt_l^{k+1} := \left\| -L^* 1 + L^*\left( \frac{f^\delta}{Ku_l^{k+1}} \right) + \alpha p_l^{k+1} - \alpha p_{l-1} \right\|^2_{2, u_l^{k+1}}
    = \left\| -L^* 1 + L^*\left( \frac{f^\delta}{Ku_l^{k+1}} \right) + \alpha p_l^{k+1} - L^* 1 \, v_{l-1} \right\|^2_{2, u_l^{k+1}}.    (5.52)

The accuracies of the sequences of primal functions u_l^k and subgradients p_l^k \in \partial |u_l^k|_{BV(\Omega)} are measured by

    uopt_l^{k+1} := \left\| \frac{L^*\left( \frac{f^\delta}{Ku_l^k} \right)}{\omega u_l^k} \left( u_l^{k+1} - u_l^k \right) \right\|^2_{2, u_l^{k+1}},
    popt_l^{k+1} := \left\| \alpha \left( p_l^{k+1} - p_l^k \right) \right\|^2_{2, u_l^{k+1}}.    (5.53)

Again, the EM iterations are stopped if at least one of the three criteria ((5.52) and (5.53)) falls below a pre-defined tolerance limit.

Pseudocode for the damped Bregman-FB-EM-TV Algorithm
The resulting algorithm for solving the optimization problem (4.50) with the KL divergence as a data discrepancy term and the Bregman distance associated with the total variation as regularization is given in Algorithm 2.

Algorithm 2: (Damped) Bregman-FB-EM-TV Algorithm for solving (4.50)
Parameters: noisy data f^\delta, reg. param. \alpha \ge 0, damping param. \omega \in (0, 1], maxEMits \in \mathbb{N}, maxBregits \in \mathbb{N}, tolerance limit tol > 0
Initialization: l = 1, u_1^0 = 1, v_0 := 0
Iteration:
while (l < maxBregits) do
    1. Set k = 0.
    while (k < maxEMits) and (opt^k \ge tol or uopt^k \ge tol or popt^k \ge tol) do
        a) Compute u_l^{k+1/2} via the EM step in (5.46).
        b) Compute u_l^{k+1} via the weighted ROF model (5.51).
        c) Update opt_l^k, uopt_l^k and popt_l^k according to (5.52) and (5.53).
        d) Set k = k + 1.
    end while
    2. Compute the update v_l according to (5.48).
    3. Set u_{l+1}^0 = u_l^k.
    4. Set l = l + 1.
end while
return u_l^0

5.2.4 Numerical Realization of the Weighted ROF
In the previous sections we have seen that the denoising step in the (damped) FB-EM-TV algorithm (cf. (5.31) and (5.35)) as well as in the (damped) Bregman-FB-EM-TV algorithm (cf. (5.50) and (5.51)) can be realized as a weighted ROF model. The general form of the problem is

    \frac{1}{2} \int_\Omega \frac{(u - q)^2}{h} \, dx + \beta |u|_{BV(\Omega)} \to \min_{u \in BV(\Omega)},    (5.54)

where q, h and \beta > 0 are chosen depending on the particular problem. An overview of the different choices is given in Table 5.1. In general, there have been several approaches to solve the standard ROF model, i.e. problem (5.54) with weight function h = 1. For some examples see [24, 25, 39]. In this section we want to outline two different iterative approaches to solve problem (5.54), presented in [69]. The first one is based on the projected gradient descent algorithm given in [24] for the standard ROF model. For more details see [69] and the references therein.
Note that the approach uses the exact definition of the total variation given in (4.42). Thus, any smoothing as in (5.13) is not necessary. In order to derive an iterative algorithm to solve (5.54), we insert the definition of the total variation (4.42) into (5.54) and get a saddle point problem in the primal variable u and the dual variable \varphi:

    \inf_{u \in BV(\Omega)} \sup_{\varphi \in C_0^\infty(\Omega; \mathbb{R}^3), \|\varphi\|_\infty \le 1} L(u, \varphi) := \frac{1}{2} \int_\Omega \frac{(u - q)^2}{h} \, dx + \beta \int_\Omega u \, \nabla \cdot \varphi \, dx.    (5.55)

We can swap the infimum and supremum and derive a primal optimality condition for problem (5.55), given by

    \frac{\partial}{\partial u} L(u, \varphi) = 0 \quad \Leftrightarrow \quad u = q - \beta h \, \nabla \cdot \varphi.    (5.56)

By substituting (5.56) into (5.55) we obtain a purely dual constrained optimization problem depending only on the variable \varphi. Hence, with the KKT conditions (cf. [45]), optimality conditions for the dual problem can be obtained, from which a fixed-point equation and a resulting fixed-point iteration are derived. Under certain conditions this iteration converges to an optimal solution \tilde{\varphi} of the purely dual problem, and the optimal primal solution \tilde{u} of the weighted ROF model (5.54) is then given as \tilde{u} = q - \beta h \, \nabla \cdot \tilde{\varphi}.

Table 5.1: Overview of the particular settings for q, h and \beta in (5.54) for the different algorithms presented in Sections 5.2.2 and 5.2.3.

    Algorithm                        | q                                                                   | h                     | \beta
    FB-EM-TV (5.31)                  | u^{k+1/2}                                                           | u^{k+1/2} / L^* 1     | \alpha
    Damped FB-EM-TV (5.35)           | \omega u^{k+1/2} + (1-\omega) u^k                                   | u^{k+1/2} / L^* 1     | \omega \alpha
    Bregman-FB-EM-TV (5.50)          | u_l^{k+1/2} + v_{l-1} u_l^{k+1/2}                                   | u_l^{k+1/2} / L^* 1   | \alpha
    Damped Bregman-FB-EM-TV (5.51)   | \omega u_l^{k+1/2} + \omega v_{l-1} u_l^{k+1/2} + (1-\omega) u_l^k | u_l^{k+1/2} / L^* 1   | \omega \alpha

Another approach to solve the weighted ROF model is presented in [68, Chap. 6.3.4]. It is based on the Split Bregman approach proposed in [39]. Here, the formal definition of TV (cf. (4.44)) is used to rewrite (5.54) as a constrained optimization problem

    \min_{u, \tilde{u}, v} \frac{1}{2} \int_\Omega \frac{(\tilde{u} - q)^2}{h} \, dx + \beta \int_\Omega |v| \, dx \quad \text{s.t.} \quad \tilde{u} = u \text{ and } v = \nabla u.    (5.57)

Then, following the idea of augmented Lagrangian methods (cf. e.g. [46]), one can define the augmented Lagrangian functional in accordance with (5.57) as

    L(u, \tilde{u}, v, \lambda_1, \lambda_2) = \frac{1}{2} \int_\Omega \frac{(\tilde{u} - q)^2}{h} \, dx + \beta \int_\Omega |v| \, dx + \langle \lambda_1, v - \nabla u \rangle + \frac{\mu_1}{2} \|v - \nabla u\|^2_{L^2(\Omega)} + \langle \lambda_2, \tilde{u} - u \rangle + \frac{\mu_2}{2} \|\tilde{u} - u\|^2_{L^2(\Omega)} + \mathcal{X}_{\tilde{u} \ge 0}.    (5.58)

Here, \lambda_1, \lambda_2 are Lagrange multipliers, \mu_1, \mu_2 are positive relaxation parameters and \mathcal{X}_{\tilde{u} \ge 0} is an indicator function ensuring positivity of the solution. An alternating minimization scheme is derived with the standard Uzawa algorithm (cf. [30]). In each iteration step one successively determines the minimum of L with respect to u, \tilde{u} and v, respectively, while the respective other two are fixed. See [68] for more details.
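To illustrate the first approach, the following MATLAB function is a minimal 2D sketch of projected gradient descent on the dual of (5.54), using the primal relation (5.56); the discretization (forward differences, Neumann boundary) and the conservative step size are our own choices, not the implementation used in the thesis.

    % Minimal sketch: weighted ROF (5.54) via projected gradient on the dual.
    function u = weightedROF(q, h, beta, niter)
    [ny, nx] = size(q);
    px = zeros(ny, nx); py = zeros(ny, nx);        % dual variable phi
    tau = 1 / (8 * beta * max(h(:)));              % step size below 2/L in 2D
    for it = 1:niter
        u = q - beta .* h .* div2(px, py);         % primal update (5.56)
        ux = [diff(u, 1, 2), zeros(ny, 1)];        % forward differences of u
        uy = [diff(u, 1, 1); zeros(1, nx)];
        px = px - tau * ux;                        % gradient step on the dual
        py = py - tau * uy;
        nrm = max(1, sqrt(px.^2 + py.^2));         % project onto |phi| <= 1
        px = px ./ nrm;  py = py ./ nrm;
    end
    u = q - beta .* h .* div2(px, py);
    end

    function d = div2(px, py)
    % discrete divergence, negative adjoint of the forward difference gradient
    dx = [px(:,1), diff(px, 1, 2)];  dx(:,end) = -px(:,end-1);
    dy = [py(1,:); diff(py, 1, 1)];  dy(end,:) = -py(end-1,:);
    d = dx + dy;
    end

In the algorithms above, q, h and \beta are supplied according to Table 5.1.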
6 Programming and Realization in MATLAB and C

In the previous chapter an algorithm to solve the inverse problem f = Ku by variational methods was presented. In this chapter we comment on the computational realization of this algorithm. In particular, two existing frameworks are introduced, which we used as a basis for the implementation. The first code is implemented in MATLAB, whereas the second one is an efficient C code. Therefore, we present an approach to combine both codes. Moreover, we address some difficulties of the algorithm and ways to overcome them.

6.1 Bregman-FB-EM-TV Toolbox
As a framework for the algorithm we used an existing toolbox for the Bregman-FB-EM-TV algorithm in the case of a linear operator, which is implemented in MATLAB (cf. [69, 20]). The algorithm is solely designed for a linear forward operator; previous applications have been in the fields of fluorescence microscopy and positron emission tomography. The resulting reconstructions can be either two- or three-dimensional. The underlying algorithm is presented in Algorithm 3, where the differences to the damped Bregman-FB-EM-TV algorithm for affine forward operators (cf. Algorithm 2) are marked in red. An advantage of the toolbox is its very modular structure. The forward operator only influences the EM step as well as the Bregman update, whereas the TV step remains unchanged. Therefore, the algorithm can easily be extended to other applications with data corrupted by Poisson noise, as long as the underlying operator is linear. If this is fulfilled, one only needs to insert the forward and adjoint operator of the inverse problem to be solved. For applications with an affine forward operator and data corrupted by Poisson noise, inserting the new operator is not sufficient. Here, Algorithm 2 needs to be used, which has been developed in Chapter 5. The algorithms differ in their particular EM step as well as in the weight of the TV step. Moreover, the Bregman update is changed.

Algorithm 3: (Damped) Bregman-FB-EM-TV Algorithm for a linear forward operator
Parameters: noisy data f^\delta, reg. param. \alpha \ge 0, damping param. \omega \in (0, 1], maxEMits \in \mathbb{N}, maxBregits \in \mathbb{N}, tolerance limit tol > 0
Initialization: l = 1, u_1^0 = 1, v_0 := 0
Iteration:
while (l < maxBregits) do
    1. Set k = 0.
    while (k < maxEMits) and (opt^k \ge tol or uopt^k \ge tol or popt^k \ge tol) do
        a) Compute u_l^{k+1/2} via the EM step
            u_l^{k+1/2} = u_l^k \cdot \frac{K^*\left( \frac{f^\delta}{Ku_l^k} \right)}{K^* 1}.
        b) Compute u_l^{k+1} via the weighted ROF model
            u_l^{k+1} = \arg\min_{u \in BV(\Omega)} \frac{1}{2} \int_\Omega \frac{K^* 1}{u_l^k} \left( u - \left( \tilde{u}_l^{k+1/2} + \omega v_{l-1} u_l^k \right) \right)^2 dx + \alpha \omega |u|_{BV(\Omega)}
            with \tilde{u}_l^{k+1/2} = \omega u_l^{k+1/2} + (1 - \omega) u_l^k.
        c) Update opt_l^k, uopt_l^k and popt_l^k according to
            opt_l^{k+1} := \left\| K^* 1 - K^*\left( \frac{f^\delta}{Ku_l^{k+1}} \right) + \alpha p_l^{k+1} - K^* 1 \, v_{l-1} \right\|^2_{2, u_l^{k+1}},
            uopt_l^{k+1} := \left\| \frac{K^* 1}{\omega u_l^k} \left( u_l^{k+1} - u_l^k \right) \right\|^2_{2, u_l^{k+1}},
            popt_l^{k+1} := \left\| \alpha \left( p_l^{k+1} - p_l^k \right) \right\|^2_{2, u_l^{k+1}}.
        d) Set k = k + 1.
    end while
    2. Compute the update v_l according to
        v_l = v_{l-1} - \left( 1 - \frac{K^*\left( \frac{f^\delta}{Ku_l} \right)}{K^* 1} \right).
    3. Set u_{l+1}^0 = u_l^k.
    4. Set l = l + 1.
end while
return u_l^0

Nevertheless, we could use the framework of the existing toolbox, the computation of the minimum of the weighted ROF model, and the computations of basic operators like gradient and divergence. We adapted the updates for u_l^{k+1/2}, u_l^{k+1} and v_l as well as the stopping rules in accordance with Algorithm 2.
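For illustration, the following MATLAB fragment contrasts the two EM steps; Kfun/Kadj (linear case) and Lop/Ladj with constant part C (affine case) are assumed function handles, not the toolbox interface. Note the reversed numerator and denominator in the affine case.

    % Minimal sketch of the differing EM steps of Algorithms 3 and 2.
    u_lin = u .* Kadj(f ./ Kfun(u)) ./ Kadj(ones(size(f)));        % Algorithm 3: linear K
    u_aff = u .* Ladj(ones(size(f))) ./ Ladj(f ./ (C - Lop(u)));   % Algorithm 2: affine K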
6.2 TVreg Software
For the algorithm we also used parts of another toolbox, called TVreg, which is implemented in C. It is a software package for solving three-dimensional tomographic reconstruction problems based on [65]. The underlying algorithm solves the optimization problem

    J(u) = \frac{1}{2}\|Ku - f\|_2^2 + \lambda_{tv} \int_\Omega |\nabla u(x)|^p \, dx + \lambda_l \int_\Omega |u(x)|^q \, dx \to \min_u.    (6.1)

Thus, the approach to solve the inverse problem Ku = f is based on variational methods as well. The operator K is assumed to be linear; for that reason the data are preprocessed in the sense that the constant part (cf. C_1(\omega) in (6.3)) is subtracted prior to reconstruction. Moreover, the data are assumed to be corrupted by additive Gaussian noise; therefore the data discrepancy term is the L^2-norm. For \lambda_l = 0 and p = 1 the regularization functional is the total variation.

In our implementation presented in Section 6.1 we made use of the complex forward model (cf. (3.17), resp. (6.3)) for phase contrast TEM imaging contained in the TVreg software. This enables us to make use of an accurate and efficient implementation in C and allows for fair comparisons between different inversion methods. Although we did not use any parts of the implemented optimization strategy, we want to address it briefly. This is done with regard to the next chapter, where some of our results are compared to the ones of this software. The iterative method for solving (6.1) is inspired by the conjugate gradient method and uses Newton methods for some line searches (cf. Chapter 5.1). The software assumes that J is differentiable, which is not fulfilled if p = 1 and/or q = 1. Therefore, the energy functional J is replaced by a smoothed variant J_\beta given as

    J_\beta(u) = \frac{1}{2}\|Ku - f\|_2^2 + \lambda_{tv} \int_\Omega \left( |\nabla u(x)|^2 + \beta^2 \right)^{p/2} dx + \lambda_l \int_\Omega \left( |u(x)|^2 + \beta^2 \right)^{q/2} dx \to \min_u.    (6.2)

The constant \beta > 0 in the algorithm is not fixed but adapted in each iteration during the reconstruction. TVreg constructs a sequence (u_n, \beta_n)_n where u_n should converge to a solution of (6.1) and \beta_n becomes stationary. As can be seen in the results in Chapter 7, the smoothing yields edges that are not as sharp as in the case of exact TV regularization. Choosing the right regularization parameter \lambda is always a difficult task in the context of regularization methods. Although there are a lot of approaches for finding a suitable \lambda, most of them are not applicable in the case of highly noisy ET data. One reason is that most of them are based on an estimate \delta of the present noise level, which is difficult to obtain in ET. The main novelty of the TVreg software is a method for choosing adequate regularization parameters (cf. [65]). It is explicitly designed for highly noisy ET data and is therefore a great advantage compared to all other methods.

6.3 Embedding of the Forward Operator via MEX Files
The aforementioned TVreg software contains an implementation of the forward operator and its adjoint for phase contrast TEM imaging. In Chapter 3 we presented a computationally feasible forward model. It has the form

    K(F)(\omega)_{i,j} = C_1(\omega) - C_2 \sum_{k,l} \left[ \mathrm{PSF}_{\mathrm{opt}} \ast_{\omega^\perp} P(F) \right]_{\mathrm{re}}(x_{k,l}) \, \mathrm{PSF}_{\mathrm{det}}(x_{i,j} - x_{k,l})    (6.3)

with constants C_1(\omega) and C_2 that are dose-dependent. The constant C_1(\omega) corresponds to what would be measured if the scattering potential were zero. In an ideal situation all parameters influencing both constants are known prior to reconstruction. Since this is often not the case, one either needs to reconstruct them alongside the scattering potential or estimate them from the data. The implementation of the forward operator in the TVreg software has the form

    K(F)(\omega)_{i,j} = C_1(\omega) - \underbrace{\sum_{k,l} \left[ \mathrm{PSF}_{\mathrm{opt}} \ast_{\omega^\perp} P(F) \right]_{\mathrm{re}}(x_{k,l}) \, \mathrm{PSF}_{\mathrm{det}}(x_{i,j} - x_{k,l})}_{(*)},    (6.4)

where C_1(\omega) is estimated as

    C_1(\omega) = \frac{1}{\# f_{\mathrm{data}}(\omega, i, j)} \sum_{i,j} f_{\mathrm{data}}(\omega, i, j).

The idea behind estimating C_1(\omega) as the average of the measured data f_{\mathrm{data}}(\omega, \cdot, \cdot) in each micrograph is the following. The second part of the forward operator, i.e. (*) in (6.4), represents the information obtained due to scattering effects. In the case of weakly scattering materials, which is common for biological applications, this part is very small compared to the constant C_1(\omega). Therefore, one can assume that f_{\mathrm{data}}(\omega, i, j) \approx C_1(\omega) for all (i, j) and estimate C_1(\omega) as the average of the measured data f_{\mathrm{data}}(\omega, \cdot, \cdot). The constant C_2 is missing in the implementation of TVreg.
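In MATLAB notation this estimate is a per-tilt mean; the following one-line sketch assumes the measured intensities are stored as a hypothetical ny-by-nx-by-ntilt array fdata.

    % Minimal sketch of the C1(omega) estimate in (6.4): one constant per tilt,
    % taken as the mean of that micrograph.
    C1 = squeeze(mean(mean(fdata, 1), 2));   % C1(omega) for each tilt angle omega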
If one assumes additive Gaussian noise and preprocesses the data as described before, i.e. one subtracts C_1(\omega) prior to reconstruction, this missing constant has no negative influence on the reconstruction. Since the main interest in ET lies in reconstructing the correct form and position of the specimen, it does not really matter whether u or C_2 \cdot u is reconstructed. However, in case one assumes Poisson noise, i.e. data-dependent noise, the missing constant has a major influence on the reconstructions. As we will see below, especially in the case of a high dose, this can result in the divergence of the algorithm. Nevertheless, we used this implementation in the (Bregman-)FB-EM-TV framework due to the advantages mentioned before. The forward operator and its adjoint are implemented in C, whereas the software framework for the optimization algorithm is implemented in MATLAB. Therefore, we developed MEX files and used the MEX interface (cf. http://www.mathworks.de/de/help/matlab/call-mex-files-1.html) in order to combine both. MEX file stands for MATLAB executable file and provides the opportunity to invoke efficient C code from within MATLAB. Therefore, MEX files are a possibility either to increase the speed of certain functions by implementing them in C without losing the flexibility to use them from MATLAB or to use existing C code within a MATLAB framework, as is done in our implementation. Technically, the files are written in C but, when compiled, act as if they were built-in functions in MATLAB.

6.4 Difficulties
Before we present some results of the (Bregman-)FB-EM-TV algorithm in the next chapter, we address some difficulties that may occur. Mainly, there are four different problems that we have to deal with.

Pointwise Results
As mentioned before, the dose-dependent constant C_2 in (6.3) is missing in the implementation of the forward operator. In general, a higher electron dose results in an improved signal-to-noise ratio and thus should deliver better results. This is not necessarily the case in our algorithm, since the influence of the missing C_2 increases with a higher dose. The results may contain pointwise errors, which means that individual values are disproportionately high and are emphasized each time the operator is applied. This can lead to the divergence of the algorithm. Often, these pointwise results are a consequence of too few iteration steps or an unsuitable scaling of the data. The implementation of the weighted ROF model expects the regularization parameter to be chosen in accordance with the scaling of the input data. That means, if the input data are scaled from 0 to 255, the regularization parameter needs to be 255 times larger compared to the regularization parameter for data that are scaled from 0 to 1. By default, the algorithm uses the initial solution u^0 = 1. If the 'correct' solution is, for example, scaled from 0 to 80, it is complicated to choose a regularization parameter that is suitable for the scaling of the first but also of later iterates. Thus, the influence of the regularization becomes vanishingly low in later iterates. Thereby, the regularization cannot counteract the pointwise results produced in the EM step and the algorithm might diverge. A remedy for this problem is to adapt the starting value consistently with the expected scaling of the solution. Especially for data with a higher electron dose, a higher starting value may result in significantly improved reconstructions.
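The following MATLAB fragment summarizes these scaling considerations with hypothetical variables; it is a sketch of the heuristics described above, not part of the toolbox.

    % Minimal sketch: rescaling data by s requires rescaling alpha by s,
    % and a starting value matched to the expected solution scale.
    s = 255;
    f_scaled = s * f;              % data rescaled from [0,1] to [0,255]
    alpha_scaled = s * alpha;      % keep the relative regularization strength
    u0 = 80 * ones(nx, ny, nz);    % starting value for a solution scaled to [0,80]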
Note that if the algorithm converges, the result is the same, independent of the chosen starting value. If a suitable starting value does not help, the problem can be circumvented to a certain degree by using the damping parameter \omega. For a small \omega the iterates stay closer to the regularized solution in each iteration step and we expect smoother results (cf. Chapter 5.2.2). Therefore, it can suppress the formation of individual high values. Another idea was to scale the data prior to reconstruction in the sense that we eliminate the influence of the dose. Unfortunately, this promoted the formation of stripe artifacts, which are addressed below.

Negative Values
The linear part of the forward operator includes the PSF of the optics, defined in terms of its Fourier transform, called the CTF (cf. (3.11)). In phase contrast imaging the optics has the important role of making the phase shift that the wave undergoes when interacting with the specimen visible. Hence, the CTF might reverse contrast, which is no problem in case one models additive Gaussian noise. If one models Poisson noise, it is still no problem when applying the forward operator. A problem only arises when the adjoint of the linear part, i.e. L^*, reverses contrast so that negative values arise although it is applied to a positive function in data space. In this case, there are negative values in the EM step of the algorithm, leading to its divergence. In our tests we discovered a strong correlation between individual disproportionately high values as described in the paragraph above and negative results after applying the adjoint to positive functions in the data space. A way to prevent the negative results is either a small damping parameter \omega or a suitable starting value u^0.

Stripe Artifacts
In some rare cases the forward operator produces stripe artifacts with a direction orthogonal to the tilt axis (cf. Chapter 7). One explanation for these artifacts could be related to the limited angular range of the measured data. Once the artifacts arise, they are emphasized in each iteration. Unfortunately, we could not reliably identify the source of these artifacts and confine ourselves to explaining how to handle these situations properly. A way to circumvent these stripes is to use a strong regularization, leading to overregularized solutions with an unnatural appearance. This goes well together with the usage of Bregman iterates. Since the iterative Bregman algorithm presented in Chapter 5.2.3 is an inverse scale space method, the first Bregman iterates are strongly overregularized, whereas smaller details are included in the later iterates. Once the stripe artifacts have been suppressed by strong regularization in the first Bregman iterates, we could not observe new formations in later Bregman iterates. Thus, using Bregman iterations seems to be a good way to handle these artifacts.

Computational Time and Memory Consumption
A major drawback of our algorithm is the long computational time as well as the memory consumption. The dimensions of the reconstructions for the different data sets we use range from 95 × 100 × 80 (smallest simulated data set) to 512 × 256 × 350 (largest experimental data set). Thus, especially in the case of experimental data, the computational time is a major problem, exacerbating adequate parameter tests. If we compare the computational time needed for the EM and the TV step, the latter is the crucial point.
If a suitable starting value does not help, the problem can be circumvented to a certain degree by using the damping parameter ω. For a small ω the iterates in each iteration step stay closer to the regularized solution and we expect smoother results (cf. Chapter 5.2.2). Therefore, damping can suppress the formation of individual high values. Another idea was to scale the data prior to reconstruction such that the influence of the dose is eliminated. Unfortunately, this promoted the formation of stripe artifacts, which are addressed below.
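To make the role of ω concrete, the sketch below shows one standard way such a damping can be realized, namely as a relaxation between the previous iterate and the new EM-TV iterate. This is a plausible reading of the scheme referenced in Chapter 5.2.2, not a verbatim excerpt of our implementation; fbemtv_step is a hypothetical wrapper for one EM step followed by the weighted ROF solve.

    % Damped fixed-point iteration: omega in (0, 1] controls how far each
    % step may move away from the previous (regularized) iterate.
    for k = 1:maxIter
        u_new = fbemtv_step(u, f, alpha);         % EM step + weighted ROF solve
        u     = (1 - omega) * u + omega * u_new;  % small omega = cautious steps
    end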
Negative Values The linear part of the forward operator includes the PSF of the optics, defined in terms of its Fourier transform, the CTF (cf. (3.11)). In phase contrast imaging, the role of the optics is to make the phase shift that the wave undergoes when interacting with the specimen visible. Hence, the CTF might reverse contrast, which is no problem if one models additive Gaussian noise. If one models Poisson noise, it is still no problem when applying the forward operator. A problem only arises when the adjoint of the linear part, i.e. L∗, reverses contrast so that negative values occur although it is applied to a positive function in data space. In this case, negative values enter the EM step of the algorithm and lead to its divergence. In our tests we discovered a strong correlation between individual disproportionately high values, as described in the paragraph above, and negative results after applying the adjoint to positive functions in data space. A way to prevent the negative results is either a small damping parameter ω or a suitable starting value u0.
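Whether the adjoint actually produces negative values for a given data set can be checked with a simple diagnostic, sketched below; applyAdjoint is a hypothetical wrapper around the MEX implementation of L∗.

    % Probe the adjoint with a strictly positive test function in data space.
    g = ones(size(f));            % positive test function
    h = applyAdjoint(g);          % h = L^* g, computed by the MEX routine
    fprintf('min(L^* g) = %g\n', min(h(:)));
    if any(h(:) < 0)
        warning('Adjoint reverses contrast: negative values may enter the EM step.');
    end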
Stripe Artifacts In some rare cases the forward operator produces stripe artifacts with a direction orthogonal to the tilt axis (cf. Chapter 7). One explanation for those artifacts could be related to the limited angular range of the measured data. Once the artifacts arise, they are emphasized in each iteration. Unfortunately, we could not reliably figure out the source of these artifacts and confine ourselves to explaining how to handle these situations properly. One way to circumvent these stripes is to use a strong regularization, leading to overregularized solutions with an unnatural appearance. This goes well together with the usage of Bregman iterates. Since the iterative Bregman algorithm presented in Chapter 5.2.3 is an inverse scale space method, the first Bregman iterates are strongly overregularized, whereas smaller details are included in later iterates. Once the stripe artifacts have been suppressed by strong regularization in the first Bregman iterates, we could not observe new formations in later Bregman iterates. Thus, using Bregman iterations seems to be a good way to handle these artifacts.

Computational Time and Memory Consumption A major drawback of our algorithm is the long computational time as well as the memory consumption. The dimensions of the reconstructions for the different data sets we use range from 95 × 100 × 80 (smallest simulated data set) to 512 × 256 × 350 (largest experimental data set). Thus, especially in the case of experimental data, the computational time is a major problem exacerbating adequate parameter tests. If we compare the computational time needed for the EM and the TV step, the latter is the crucial one. Therefore, the reconstruction size is critical, whereas the size of the data set has a much smaller impact. An idea to circumvent this problem is to reconstruct smaller subregions if the focus mainly lies on a certain part of the reconstruction. In order to give an impression of the actual run times, we mention some examples alongside the results in Chapter 7.

7 Results

In this chapter we present some computational results of the (Bregman-)FB-EM-TV reconstruction algorithm. In the first part we use simulated data sets and compare our results to the ground truth as well as to results of the TVreg software. In the second part, we present reconstructions from an experimental data set and compare them to results from the TVreg software as well. Moreover, we clarify some of the aforementioned difficulties of the algorithm based on suitable examples. Note that all data sets we use are single-axis tilt-series data. Moreover, the results we present are 2D cross-sections of the 3D objects. For a better visualization, we partly replaced the grayscale colorbar by a colorbar ranging from dark blue (low intensity values) to dark red (high intensity values).

7.1 Simulated Data

The simulated data sets we use represent single-axis tilt-series data from a conventional 300 keV bright-field TEM. They are generated with the TEM simulation software presented in [66]. The tilt angles vary from −60° to +60° with one micrograph every second degree, i.e. in total there are 61 different micrographs for each data set. The region of interest we want to reconstruct is a three-dimensional rectangular voxel region with a voxel size of 0.5 nm. The detector is a two-dimensional rectangular pixel region with a pixel size of 16 µm. Overall, the magnification is 25000. Moreover, we want to specify the parameters that influence the forward model presented in Chapter 3. The defocus and the spherical aberration that influence the CTF in (3.11), and thereby the optics PSF, are given as Δz = 3 µm and Cs = 2.1 mm, respectively. The detector PSF is defined by means of the MTF in (3.16) with a = 0.7, b = 0.2, c = 0.1, α = 10 and β = 40. Finally, the focal length in (3.9) is f = 2.7 mm.
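For reference, the following sketch evaluates a detector MTF of the two-Lorentzian type commonly used with this parametrization; we assume here that (3.16) is of this form, the exact expression being given in Chapter 3.

    % Detector MTF with the parameters a, b, c, alpha, beta given above;
    % w denotes spatial frequency (assumed functional form, cf. (3.16)).
    a = 0.7; b = 0.2; c = 0.1; alpha = 10; beta = 40;
    mtf = @(w) a ./ (1 + alpha * w.^2) + b ./ (1 + beta * w.^2) + c;

Note that with these parameters mtf(0) = a + b + c = 1, i.e. the MTF is normalized at zero frequency.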
Figure 7.1: Simulated data set from balls phantom. a) 2D cross-section of the 3D phantom. b) Zero-tilt image of the noise-free data set. c) Zero-tilt image with noise.

Balls Data Set The first data set we use is generated with a phantom representing 40 balls of different size and contrast embedded in aqueous buffer. The phantom as well as the simulated data without and with noise are shown in Figure 7.1. The noise-free data set is presented in Figure 7.1 b) and the data set with noise in 7.1 c). The underlying objects are hard to detect in the noisy data set, which illustrates how challenging reconstructions in the field of ET are. The region of interest that we reconstruct has the dimensions 210 × 250 × 40. The total electron dose is 6000 e−/nm² distributed over all micrographs, which corresponds to 40 e−/pixel in each micrograph.

RNA Data Set The other simulated data sets are generated with a phantom representing a single RNA Polymerase II particle. The three-dimensional region of interest that we want to reconstruct has the dimensions 95 × 100 × 80. We use two different doses, a low one representing a more realistic data set and a high one in order to emphasize some of the algorithmic difficulties we have. The low total electron dose is 5000 e−/nm², corresponding to a dose of 33 e−/pixel in each micrograph. The higher total dose is 100,000 e−/nm², i.e. 671 e−/pixel in each micrograph. In Figure 7.2 we show the phantom, the noise-free data set as well as the noisy data sets with the high and low total electron dose, respectively.

Figure 7.2: Simulated data set from RNA phantom. a) 2D cross-section of the 3D phantom. b) Zero-tilt image of the noise-free data set. c) Zero-tilt image with noise (dose 671 e−/pixel). d) Zero-tilt image with noise (dose 33 e−/pixel).
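For completeness, the per-pixel doses quoted above can be checked from the imaging geometry: at a magnification of 25000, one 16 µm detector pixel covers 16 µm / 25000 = 0.64 nm at the specimen, i.e. an area of 0.64² nm² ≈ 0.41 nm². Distributing the total dose over the 61 micrographs then gives

    6000 e−/nm² · 0.41 nm² / 61 ≈ 40 e−/pixel,

and analogously about 33 and 671 e−/pixel for the low- and high-dose RNA data sets.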
Validation and Evaluation In the case of simulated data sets we want to evaluate the obtained results by comparing them to the phantom. The question is which validation tools are an adequate choice for our results. Since the focus in ET lies on reconstructing the correct position and shape of the specimen, we want to compare the results and the phantom independently of their different contrasts. Therefore, we segment the results prior to the comparison. We decided on a simple thresholding algorithm, although the results would probably be more accurate with a more sophisticated segmentation algorithm such as the K-means or Chan-Vese algorithms. We use the MATLAB built-in graythresh function (http://www.mathworks.de/de/help/images/ref/graythresh.html), but reduce the automatic threshold by 0.1. The results are thresholded and thereby converted into logical arrays. Then, we calculate the Jaccard similarity index of the segmented phantom and the reconstruction result, which is defined as

JC(A, B) = |A ∩ B| / |A ∪ B|    (7.1)

for logical arrays A and B. The index ranges from 0 to 1, whereby a higher index is preferable. Note that for some results, especially for low-dose data sets, the segmentation is not possible. Hence, we have to rely solely on our visual perception to evaluate the results in these cases.

Figure 7.3: Balls phantom and TVreg result prior to and after segmentation. a) Phantom. b) Phantom after segmentation. c) Result obtained with TVreg software. d) TVreg result after segmentation. The Jaccard similarity index of the ground truth segmentation b) and d) is 0.585.

To evaluate the reconstruction results of the balls data set we use a second criterion, based on the segmented image as well. The aim is to automatically determine the number of balls that are reconstructed correctly or falsely and the ones that are missing in the reconstruction. By using the MATLAB function regionprops (http://www.mathworks.de/de/help/images/ref/regionprops.html) we obtain a list of the centroids and areas of each connected component present in the segmented image. Then, a ball is labeled as correctly reconstructed if the Euclidean distance between its centroid c_recons and the centroid c_phantom of a ball in the phantom is smaller than 2. This value is motivated by the distance distributions shown in Figure 7.5.
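The validation pipeline just described is straightforward to express in MATLAB, here sketched for a 2D cross-section. Apart from the built-ins graythresh, im2bw and regionprops, all names are illustrative: u denotes the reconstruction and phantom_seg the segmented ground truth.

    % Threshold-based segmentation with the reduced automatic threshold.
    ug = mat2gray(u);                       % rescale the reconstruction to [0, 1]
    A  = im2bw(ug, graythresh(ug) - 0.1);   % logical segmentation of the result
    B  = phantom_seg;                       % logical ground truth segmentation

    % Jaccard similarity index (7.1).
    JC = nnz(A & B) / nnz(A | B);

    % Centroids of the connected components for the ball-matching criterion.
    S = regionprops(A, 'Centroid', 'Area');
    centroids = cat(1, S.Centroid);         % one row per reconstructed component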
7.1.1 Results Balls Phantom

In this section, we present results of the (Bregman-)FB-EM-TV algorithm for the balls data set. The ground truth of the data set before and after segmentation is shown in Figure 7.3 a) and b). Figure 7.3 c) and d) show the TVreg result that we use as a reference for our reconstructions, whereby the latter is the result after segmentation. The Jaccard similarity index of the ground truth segmentation in Figure 7.3 b) and d) is 0.585. To get an impression of the influence of the regularization parameter α, we tested the FB-EM-TV algorithm for different parameter choices. In Figure 7.4 the results for different levels of regularization are presented. A higher regularization leads to fewer reconstructed balls; especially the balls with a low contrast are not reconstructed. Moreover, the contrast loss that is typical for TV regularization can be seen. While for a small regularization (cf. Figure 7.4 a)) the contrast of the ball in the lower left corner is relatively bright, there is a significant decrease of contrast when the regularization parameter increases (cf. Figure 7.4 b)-d)).

Figure 7.4: Results of the balls data set obtained with the FB-EM-TV algorithm for different regularization parameters. a) α = 0.0025. b) α = 0.0055. c) α = 0.0095. d) α = 0.0125.

Figure 7.5: Minimal Euclidean distance between the centroid of a reconstructed ball and the centroid of a ball in the phantom. Examples for different regularization parameters.

We used these reconstruction results for different regularization parameters to get an idea of the position of the reconstructed balls compared to the ground truth given in Figure 7.4 a). Dependent on the regularization parameter, we plotted the minimal Euclidean distance between the centroid of a reconstructed ball and the centroid of a ball in the phantom in Figure 7.5. Here, each point represents a reconstructed ball. By comparing the results for a smaller regularization parameter, where also the balls with low contrast are reconstructed, to the results obtained with a larger parameter, where only the balls with high contrast are reconstructed, we can conclude that the deviation is higher for smaller balls or balls with low contrast. Overall, the distance is roughly in the range from 0 to 2; therefore, we used this value as the limit in our validation tool to decide whether a ball is correctly reconstructed or not. Note that extreme outliers are not plotted in Figure 7.5.

Figure 7.6: Result for the balls data set obtained with the FB-EM-TV algorithm prior to and after segmentation. a) Result prior to segmentation. b) Result after segmentation. c) KL divergence of f^δ and Ku^k. The Jaccard similarity of b) and Fig. 7.3 b) is 0.6703.

A result of the FB-EM-TV algorithm is presented in Figure 7.6 a). We chose the initial solution u0 = 30, the regularization parameter α = 0.0025, and 150 EM steps. This resulted in a computational time of approximately 72 minutes. If we compare this result to the TVreg result in Figure 7.3 c), we see that our algorithm produces sharp edges of the objects and a much smoother background. Note that the background artifacts that are still visible are a consequence of the higher starting value. The balls that are reconstructed are roughly the same in both results. A drawback of the FB-EM-TV algorithm is that the contrast loss resulting from TV regularization makes it hard to detect the row with the lowest contrast. This influences the segmentation result in Figure 7.6 b). The lower-intensity balls on the right are only partly segmented; especially the contrast of the largest ball in the right row is too low. Nevertheless, we obtain an increased Jaccard similarity index of 0.6703 compared to 0.585 before. In Figure 7.6 c) we see that the KL divergence of the noisy data f^δ and Ku^k decreases monotonically in every iteration step, which indicates that in each iteration the value of the objective functional D_KL(f^δ, Ku) + α|u|_BV(Ω) decreases.
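The monotone decrease reported above is monitored with the discrete KL divergence of the data and the current forward-projected iterate; a minimal sketch (with a small eps guarding the logarithm, and applyK a hypothetical wrapper for the forward operator) could look as follows:

    % Discrete Kullback-Leibler divergence D_KL(f, Ku) for nonnegative arrays;
    % used here purely as a convergence monitor.
    KL = @(f, Ku) sum(f(:) .* log((f(:) + eps) ./ (Ku(:) + eps)) - f(:) + Ku(:));
    kl_history(k) = KL(f_delta, applyK(u));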
The Bregman-FB-EM-TV algorithm is an extension of the FB-EM-TV algorithm, designed to reduce the contrast loss caused by total variation regularization. In Figure 7.7 we show a result of this extended algorithm. Here, the initial solution was u0 = 1, the number of EM steps in each Bregman step 150, and the regularization parameter α = 0.009. The computational time for this result was roughly 4 hours. The algorithm is an inverse scale space method; thus, it starts with an overregularized solution and incorporates more information in each Bregman iteration. Figure 7.7 a)-c) shows the results after each Bregman iteration, whereby c) is the final result.

Figure 7.7: Results for the balls data set obtained with the Bregman-FB-EM-TV algorithm prior to and after segmentation. a) - c) Results of the 1st - 3rd Bregman step. d) Result of the 3rd step after segmentation. e) KL divergence of f^δ and Ku^k. The Jaccard similarity index of d) and Fig. 7.3 b) is 0.7687.

The enhanced contrast leads to a better segmentation result, making it possible to detect the balls with the lowest contrast, too. A drawback is that the contrast of false objects may be enhanced as well. On the right edge of Figure 7.7 d) we can see a false object, which was not detected in the result of the FB-EM-TV algorithm (cf. Figure 7.6 b)). Compared to the segmented phantom in Figure 7.3 b) we obtain an improved Jaccard similarity index of 0.7687. Again, Figure 7.7 e) shows the KL divergence of f^δ and Ku^k, indicating a monotone decrease of the objective functional.

Table 7.1 shows that the number of correctly reconstructed balls differs only slightly among the reconstruction algorithms. All three have their strengths and weaknesses. The Bregman- and the FB-EM-TV algorithm deliver sharp edges, making it easier to differentiate between objects and the background. A drawback is that both algorithms tend to reconstruct false objects at the edges of the reconstruction. Therefore, an advantage of the TVreg software is that the number of false objects is distinctly smaller compared to the other algorithms. Unfortunately, the edges of the TVreg results are blurred, especially in outer slices of the 3D object. This makes the segmentation more complicated and results in segmented objects that are rather oval-shaped than circular.

Algorithm         | # correctly recons. balls | # false objects | # missing balls
TVreg             |            23             |        4        |       17
FB-EM-TV          |            22             |       10        |       18
Bregman-FB-EM-TV  |            25             |        8        |       15

Table 7.1: Evaluation of reconstructed balls.

This is clarified in Figure 7.8. Here, for each correctly reconstructed ball we compare the volume of this ball after segmentation to the corresponding one in the ground truth segmentation.

Figure 7.8: Volume ratio of the balls in the reconstruction compared to the phantom.

For the TVreg software most of the segmented balls are too large, sometimes even twice as large as the ground truth. On average, the volume ratio between the reconstruction after segmentation and the ground truth segmentation is 1.336. The enhanced sample contours in the (Bregman-)FB-EM-TV results facilitate the segmentation step and prevent the volume of the segmented balls from being much larger than in the ground truth. A drawback of the FB-EM-TV algorithm is that for some balls the contrast is too low to segment the whole ball. Therefore, the volume tends to be too small, with some outliers that are not even half as large as the ground truth segmentation. Here, the average volume ratio is 0.808. The best results can be obtained with the Bregman-FB-EM-TV algorithm, where the average volume ratio is 0.924. Apart from two outliers, the volume ratios of the reconstructed balls after segmentation compared to the ground truth segmentation are evenly distributed around 1.
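Structurally, the Bregman extension wraps the FB-EM-TV solver in an outer loop that carries a subgradient between steps. The following sketch is illustrative only: fbemtv is a hypothetical wrapper for the inner solve, and the precise update is the one derived in Chapter 5.2.3.

    % Outer Bregman loop: inverse scale space behaviour, coarse to fine.
    p = zeros(size(u0));                      % initial subgradient
    u = u0;
    for j = 1:nBregman
        [u, p] = fbemtv(f, alpha, u, nEM, p); % inner FB-EM-TV solve with Bregman term
        % early steps are overregularized; later steps add finer details
    end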
Figure 7.9: Phantom and TVreg result for RNA Polymerase II with high-dose data prior to and after segmentation. a) Phantom. b) Phantom after segmentation. c) TVreg result. d) TVreg result after segmentation. The Jaccard similarity index between b) and d) is 0.6375.

7.1.2 Results RNA Phantom

In the following paragraphs we compare results of two simulated data sets, where the underlying phantom is the simulation of an RNA Polymerase II. The phantom prior to and after segmentation is shown in Figure 7.9 a) and b), respectively.

High Dose Tilt-Series The first data set we used has a very high signal-to-noise ratio compared to ET standards. On the basis of this data set we want to clarify what is meant by the pointwise results mentioned in Chapter 6.4 and the influence of the initial solution on these results. Again, we start this section with a reference result obtained with the TVreg software, shown in Figure 7.9 c), and its segmentation in 7.9 d). The Jaccard similarity index of Figure 7.9 b) and d) is 0.6375.

In Figure 7.10 a) a result of the FB-EM-TV algorithm is shown. Here, we chose an initial solution of u0 = 80, 150 EM iterates and the regularization parameter α = 0.00095. For this reconstruction, the computational time needed was 26 minutes. If we compare this result to the TVreg result in Figure 7.9 c), we can see that we obtain a much smoother background, but the form of the reconstructed specimen is nearly the same. This is even more obvious when we compare the results after segmentation, i.e. Figures 7.10 b) and 7.9 d). The Jaccard similarity index is 0.6768 for the FB-EM-TV result and thereby slightly higher than for the TVreg result.

Figure 7.10: Result for RNA Polymerase II with high-dose data obtained with the FB-EM-TV algorithm prior to and after segmentation. a) Result prior to segmentation. b) Result after segmentation. c) KL divergence of f^δ and Ku^k. The Jaccard similarity of b) and Fig. 7.9 b) is 0.6768.

Figure 7.11: Result for RNA Polymerase II with high-dose data obtained with the Bregman-FB-EM-TV algorithm prior to and after segmentation. a) Result prior to segmentation. b) Result after segmentation. c) KL divergence of f^δ and Ku^k. The Jaccard similarity of b) and Fig. 7.9 b) is 0.6895.

The improved contrast of the Bregman-FB-EM-TV algorithm again yields an improvement, although not as significant as before. The result prior to segmentation is shown in 7.11 a) and after segmentation in 7.11 b), whereby the Jaccard similarity index is 0.6895. This reconstruction is obtained after three Bregman steps with 150 EM iterations in each step and with α = 0.003 and u0 = 1. In total, the computational time was 45 minutes. Note that we ran our algorithm on different computers; thereby the computational times are not necessarily consistent with each other and are only mentioned so that the reader can get an impression of the time needed for a reconstruction. Figure 7.11 c) shows the decrease of the objective functional in each iteration.

Next, we want to clarify the influence of the initial solution. Table 7.2 shows four reconstruction results, all obtained with the same regularization parameter α = 0.002. We see that for an initial solution of u0 = 1 and 50 EM iterations we get a pointwise result as described before. Only sparse high values are visible, whereas the overall form of the specimen vanishes. With more EM iterations, this problem can be solved. But since the computational time of our algorithm is always an important issue, we may prefer another solution, especially for large data sets. Then, an adapted starting value u0 is advisable. In the second row we see the reconstruction results for the same number of EM iterations but with u0 = 80. In this case, already after 50 EM iterations the specimen form is clearly reconstructed and the next 100 iterations are not necessarily needed if we want to shorten the computational time. Thus, an adapted starting value can lead to improved results after the same number of iterations.

Table 7.2: Influence of the initial solution and the number of EM iterates. Reconstructions for u0 = 1 (first row) and u0 = 80 (second row) after 50 and 150 EM iterations, respectively.

Low Dose Tilt-Series The most challenging simulated data set we used is again the RNA Polymerase II but, compared to the prior reconstructions, with a significantly decreased electron dose and thereby a very low signal-to-noise ratio. Again, we are interested in a reference result obtained with the TVreg software. We tested several parameters (see Figure 7.12), but in our tests we were not able to obtain a result that could be segmented for the postprocessing steps.

Figure 7.12: Results for RNA Polymerase II with low-dose data obtained with the TVreg software. Examples for different regularization parameters: a) λtv = 500. b) λtv = 750. c) λtv = 1500.

In Figure 7.13 results of the FB-EM-TV algorithm for different regularization parameters are presented. For all results, we chose 150 EM iterations and the initial solution u0 = 10. Here, we have to weigh whether the focus is on an improved contrast or on fewer false objects. By comparing Figure 7.13 a), d) and e), we see that the background becomes smoother with a higher value of α, but we lose the upper part of the RNA Polymerase. Therefore, we decided for the less regularized solution. The result after segmentation is shown in Figure 7.13 b) with a Jaccard similarity index of 0.3411 when compared to the ground truth segmentation (cf. Figure 7.9 b)).

Figure 7.13: Results for RNA Polymerase II with low-dose data obtained with the FB-EM-TV algorithm for different regularization parameters. a), d) and e) Results prior to segmentation (a) α = 0.006, d) α = 0.0065, e) α = 0.007). b) Result in a) after segmentation. c) KL divergence of f^δ and Ku^k. The Jaccard similarity index of b) and Fig. 7.9 b) is 0.3411.

A similar comparison for different parameter choices, but for the Bregman-EM-TV algorithm, is given in Figure 7.14. We used 2 Bregman steps with 150 EM iterations each and the initial solution u0 = 1. Compared to Figure 7.13 we see a significantly enhanced contrast, especially for a smaller regularization parameter. This facilitates postprocessing steps based on segmentation. Unfortunately, the higher contrast leads to more false objects in the background and thereby a smaller Jaccard similarity index of 0.2922 when Figure 7.14 b) and Figure 7.9 b) are compared. Nevertheless, we think that the result in Figure 7.14 a) is preferable over d) and e), where we again lose the upper part of the object. The KL divergence of f^δ and Ku^k is shown in Figure 7.14 c).

Figure 7.14: Results for RNA Polymerase II with low-dose data obtained with the Bregman-FB-EM-TV algorithm for different regularization parameters. a), d) and e) Results prior to segmentation (a) α = 0.011, d) α = 0.0125, e) α = 0.0135). b) Result in a) after segmentation. c) KL divergence of f^δ and Ku^k. The Jaccard similarity index of b) and Fig. 7.9 b) is 0.2922.

With regard to Figures 7.12, 7.13 and 7.14, we can conclude that the (Bregman-)FB-EM-TV algorithm, in comparison to the TVreg software, is strong in cases of a very low signal-to-noise ratio.
Figure 7.15: a) Experimental data set. b) Result of the TVreg software. c) Crystal structure of CPMV (http://www.scripps.edu/johnson/research/crystals.html).

7.2 Experimental Data

The experimental data set we use is a single-axis tilt-series of a cryo-fixated in vitro specimen, by courtesy of FEI (www.fei.com). The specimen contains a mixture of different viruses, including Cowpea Mosaic Viruses (CPMV), in aqueous buffer. The data set we use is a subset of the original larger data set, mainly containing CPMV virions. It was acquired by FEI using a conventional 300 keV bright-field TEM. Data are recorded from −62.18° to 58.03°, with 81 micrographs in total. In Figure 7.15 we present the data set as well as a crystal structure of a cowpea mosaic virus.

7.2.1 Results CPMV Virus

In this paragraph we present results of the (Bregman-)FB-EM-TV algorithm that we obtained with the experimental data set presented in Figure 7.15 a). In Figure 7.16 a) a result of the FB-EM-TV algorithm is shown. Here, the initial solution was u0 = 1 and the regularization parameter α = 0.0035. In contrast to the reconstructions shown in the previous paragraphs, we used only 20 EM iterations, which results in a computational time of 135 minutes. A reconstruction with 150 EM iterations, as done for the simulated data sets, would take 17 hours and thereby prevents reasonable parameter tests. The second reconstruction, presented in Figure 7.16 b), is obtained with the Bregman-FB-EM-TV algorithm after 3 Bregman steps with 10 EM iterations each. The computation of this result took about 3.5 hours. Note that we used a small damping parameter (ω = 0.1) for both reconstructions.

Figure 7.16: Results of the FB-EM-TV and Bregman-FB-EM-TV algorithm for the experimental data set. a) Result of the FB-EM-TV algorithm with α = 0.035. b) Result of the Bregman-FB-EM-TV algorithm with α = 0.095.

Both reconstructions suffer from strong intensity variations at the edges of the object. Moreover, with more iterations the results become pointwise and the algorithm might diverge. The formation of sparse high intensity values can be suppressed by a strong regularization. This motivates the usage of the Bregman-FB-EM-TV algorithm as an inverse scale space method. Unfortunately, the results of the later Bregman steps are again impaired by individual high intensity values. Therefore, the algorithm needs to be stopped before these high values arise, but then we end up with an overregularized solution. A small damping parameter can result in enhanced reconstructions, although it does not solve the problem.

Thus, based on the current status of our results, we have to admit that these difficulties prevent reconstructions from experimental data sets that are comparable to the results of the TVreg software. The good results with respect to contrast and contour enhancement that we achieved with the Bregman-FB-EM-TV algorithm for simulated data sets cannot be transferred to experimental data sets. The reason for these difficulties is a missing padding in the forward operator, which has a negative influence in the case of a nonlinear data term. In the implementation of the forward operator, the region outside the given data set is assumed to be zero. If the data discrepancy term is linear and the data can be preprocessed such that the background is substituted prior to reconstruction, this assumption is reasonable. In the case of a nonlinear data discrepancy term, an equivalent preprocessing step to substitute the background is not possible. Thus, a reasonable assumption in this case is that the region outside the given data set is constant, whereby this constant is estimated e.g. as the average of the data. For the simulated data sets this is less of an issue, since the region that contains the information about the specimen is only a subregion of the larger data set (cf. the noise-free data sets in Figures 7.1 and 7.2 b)). After including this assumption in the forward and adjoint operator, we expect results from the experimental data set that are comparable to the results from simulated data sets shown before.
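In code, the constant-background assumption could be realized along the following lines; this is a sketch of the proposed fix, not of the current implementation (padarray is from the Image Processing Toolbox, and the margin width pad is a free parameter).

    % Extend each micrograph by a constant margin estimated from the data,
    % instead of implicitly assuming zeros outside the recorded region.
    c  = mean(f(:));                    % constant background estimate
    fp = padarray(f, [pad, pad, 0], c); % pad the two detector dimensions only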
Besides an adaption of the implemention for the forward operator, there are further tasks in order to improve the proposed reconstruction algorithms. Right now, there are several parameters that need to be chosen manually. We would like to incorporate parameter choice rules in the (Bregman-)FB-EM-TV algorithm to minimize the number of parameters that must be chosen by the user. Concerning the regularization parameter, this could be a fixed choice in dependence of the electron dose and other given parameters that significantly influence the quality of the recorded data. Another task is a suitable criterion to stop the Bregman iterations. Since there is no reliable estimate of the noise level in ET, we need to find a way to stop the iterations as soon as the new iterate is noisier than the previous one. To enhance the applicability of our algorithm for large experimental data sets an acceleration is indispensable. As a first measure one could implement the whole algorithm in C, although we think that a further acceleration is needed. Therefore, GPU-accelerated computing would be advantageous. Moreover, we are interested in a theoretical analysis of our algorithm. Especially, we would like to find out under which conditions the convergence of the algorithm can be proven. Once these problems are solved, we want to enlarge the number of possible data and regularization terms. We would like to incorporate a data term based on a noise model that accounts for a mixture of Gaussian and Poisson noise, which is present in TEM images. Moreover, it would be interesting to test higher-order TV methods. 109 Acronyms ART CCD CT CTF EM ET FBP FBS KKT KL MAP MTF PSF ROF SIRT STEM TEM TV WBP Algebraic Reconstruction Technique Charged Coupled Device Computed Tomography Contrast Transfer Function Expectation Maximization Electron Tomography Filtered Back-Projection Forward-Backward-Splitting Karush-Kuhn-Tucker Kullback-Leibler Maximum A Posteriori Modulation Transfer Function Point Spread Function Rudin-Osher-Fatemi Simultaneous Iterative Reconstruction Technique Scanning Transmission Electron Microscope Transmission Electron Microscope Total Variation Weighted Back-Projection 110 List of Figures 2.1 2.2 2.3 2.4 2.5 2.6 2.7 3.1 4.1 4.2 4.3 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 Different kinds of electron scattering from a thin specimen. . . . . . . . Cross section of the column of a modern TEM. . . . . . . . . . . . . . Cross-section of an electromagnetic lens. . . . . . . . . . . . . . . . . . Ray diagram illustrating how an aperture restricts the angular spread of electrons entering the lens. . . . . . . . . . . . . . . . . . . . . . . . The concept of overfocus, focus and underfocus. . . . . . . . . . . . . . Parallel-beam operation in the TEM. . . . . . . . . . . . . . . . . . . . Single tilt sample holder for a TEM. . . . . . . . . . . . . . . . . . . . The optical set-up consisting of a single thin lens with an aperture in its focal plane. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Bregman distances for single-valued subdifferentials. . . . . . . . Bregman distances for a multi-valued subdifferential. . . . . . . Contrast loss for 1D signal recovered with TV regularization and ent regularization parameters. . . . . . . . . . . . . . . . . . . . . . . . . . . . differ. . . . Simulated data set from balls phantom. . . . . . . . . . . . . . . . . . . Simulated data set from RNA phantom. . . . . . . . . . . . . . . . . . Balls phantom and TVreg result prior to and after segmentation. . . . . 
Results of the balls data set obtained with the FB-EM-TV algorithm for different regularization parameters. . . . . . . . . . . . . . . . . . . Minimal euclidean distance between the centroid of a reconstructed ball and the centroid of a ball in the phantom. . . . . . . . . . . . . . . . . Result for the balls data set obtained with the FB-EM-TV algorithm prior to and after segmentation. . . . . . . . . . . . . . . . . . . . . . . Results for the balls data set obtained with the Bregman-FB-EM-TV algorithm prior to and after segmentation. . . . . . . . . . . . . . . . . Volume ratio of the balls in the reconstruction compared to the phantom. 7 8 10 11 12 13 14 24 44 44 50 91 92 93 94 94 95 96 97 List of Figures 7.9 7.10 7.11 7.12 7.13 7.14 7.15 7.16 7.17 Phantom and TVreg result for RNA Polymerase II with high-dose data prior to and after segmentation. . . . . . . . . . . . . . . . . . . . . . . Result for RNA Polymerase II with high-dose data obtained with the FB-EM-TV algorithm prior to and after segmentation. . . . . . . . . . Result for RNA Polymerase II with high-dose data obtained with the Bregman-FB-EM-TV algorithm prior to and after segmentation. . . . . Results for RNA Polymerase II with low-dose data obtained with the TVreg software. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Results for RNA Polymerase II with low-dose data obtained with the FB-EM-TV algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . Results for RNA Polymerase II with low-dose data obtained with the Bregman-FB-EM-TV algorithm. . . . . . . . . . . . . . . . . . . . . . . Experimental data set, TVreg result and crystal structure of CPMV. . Results of the FB-EM-TV and Bregman-FB-EM-TV algorithm for the experimental data set. . . . . . . . . . . . . . . . . . . . . . . . . . . . Stripe artifacts that can impair the reconstruction results obtained with the FB-EM-TV algorithm using a scaled version of the experimental data set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 98 99 99 101 102 102 103 104 105 112 List of Tables 4.1 Different Bregman Distances . . . . . . . . . . . . . . . . . . . . . . . . 45 5.1 Overview of the particular settings for q, h and β in (5.54) in dependence of the different algorithms presented in sections 5.2.2 and 5.2.3. . . . . 82 7.1 7.2 Evaluation of reconstructed balls. . . . . . . . . . . . . . . . . . . . . . 97 Influence of the initial solution and the number of EM iterates. . . . . . 100 113 Bibliography [1] R. Acar and C. R. Vogel. Analysis of bounded variation penalty methods for ill-posed problems. Inverse problems, 10(6):1217, 1994. 48, 66 [2] I. Aganj, A. Bartesaghi, M. Borgnia, H. Y. Liao, G. Sapiro, and S. Subramaniam. Regularization for inverting the radon transform with wedge consideration. In Biomedical Imaging: From Nano to Macro, 2007. ISBI 2007. 4th IEEE International Symposium on, pages 217–220. IEEE, 2007. 53 [3] A. Al-Amoudi, J.-J. Chang, A. Leforestier, A. McDowall, L. M. Salamin, L. P. Norlén, K. Richter, N. S. Blanc, D. Studer, and J. Dubochet. Cryo-electron microscopy of vitreous sections. The EMBO journal, 23(18):3583–3588, 2004. 2 [4] U. Amato and W. Hughes. Maximum entropy regularization of Fredholm integral equations of the first kind. Inverse Problems, 7(6):793, 1991. 59 [5] M. Bachmayr and M. Burger. Iterative total variation schemes for nonlinear inverse problems. Inverse Problems, 25(10):105004, 2009. 51 [6] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh. Clustering with Bregman divergences. 
[7] J. Bardsley. An efficient computational method for total variation-penalized Poisson likelihood estimation. Inverse Problems and Imaging, 2(2):167–185, 2008.
[8] J. M. Bardsley. Stopping rules for a nonnegatively constrained iterative method for ill-posed Poisson imaging problems. BIT Numerical Mathematics, 48(4):651–664, 2008.
[9] J. M. Bardsley and J. Goldes. Regularization parameter selection methods for ill-posed Poisson maximum likelihood estimation. Inverse Problems, 25(9):095005, 2009.
[10] J. M. Bardsley and N. Laobeul. An analysis of regularization by diffusion for ill-posed Poisson likelihood estimations. Inverse Problems in Science and Engineering, 17(4):537–550, 2009.
[11] J. M. Bardsley and A. Luttman. Total variation-penalized Poisson likelihood estimation for ill-posed problems. Advances in Computational Mathematics, 31(1-3):35–59, 2009.
[12] M. Benning, C. Brune, M. Burger, and J. Müller. Higher-order TV methods - enhancement via Bregman iteration. Journal of Scientific Computing, 54(2-3):269–310, 2013.
[13] F. Benvenuto, A. La Camera, C. Theys, A. Ferrari, H. Lantéri, and M. Bertero. The study of an iterative method for the reconstruction of images corrupted by Poisson and Gaussian noise. Inverse Problems, 24(3):035016, 2008.
[14] M. Bertero, P. Boccacci, G. Desiderà, and G. Vicidomini. Image deblurring with Poisson data: from cells to galaxies. Inverse Problems, 25(12):123006, 2009.
[15] M. Bertero, P. Boccacci, G. Talenti, R. Zanella, and L. Zanni. A discrepancy principle for Poisson data. Inverse Problems, 26(10):105004, 2010.
[16] P. Binev, W. Dahmen, R. DeVore, P. Lamby, D. Savu, and R. Sharpley. Compressed sensing and electron microscopy. In T. Vogt, W. Dahmen, and P. Binev, editors, Modeling Nanoscale Imaging in Electron Microscopy, pages 73–126. Springer US, 2012.
[17] L. M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7(3):200–217, 1967.
[18] C. Brune. Variationsmethoden in der biomedizinischen Bildgebung. Lecture notes, 2012.
[19] C. Brune, A. Sawatzky, and M. Burger. Bregman-EM-TV methods with application to optical nanoscopy. In X.-C. Tai, K. Mørken, M. Lysaker, and K.-A. Lie, editors, Scale Space and Variational Methods in Computer Vision, volume 5567 of Lecture Notes in Computer Science, pages 235–246. Springer Berlin Heidelberg, 2009.
[20] C. Brune, A. Sawatzky, and M. Burger. Primal and dual Bregman methods with application to optical nanoscopy. International Journal of Computer Vision, 92(2):211–229, 2011.
[21] M. Burger, K. Frick, S. Osher, and O. Scherzer. Inverse total variation flow. Multiscale Modeling & Simulation, 6(2):366–395, 2007.
[22] M. Burger, G. Gilboa, S. Osher, and J. Xu. Nonlinear inverse scale space methods. Communications in Mathematical Sciences, 4(1):179–212, 2006.
[23] M. Burger and S. Osher. A guide to the TV zoo. In Level Set and PDE Based Reconstruction Methods in Imaging, Lecture Notes in Mathematics. Springer International Publishing, 2013.
[24] A. Chambolle. An algorithm for total variation minimization and applications. J. Math. Imaging Vision, 20(1-2):89–97, 2004. Special issue on mathematics and image analysis.
[25] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vision, 40(1):120–145, 2011.
[26] T. F. Chan, G. H. Golub, and P. Mulet. A nonlinear primal-dual method for total variation-based image restoration. SIAM J. Sci. Comput., 20(6):1964–1977, 1999.
[27] I. Csiszár. Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems. The Annals of Statistics, 19(4):2032–2066, 1991.
[28] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B, 39(1):1–38, 1977. With discussion.
[29] I. Ekeland and R. Temam. Convex analysis and variational problems. North-Holland Publishing Co., Amsterdam-Oxford; American Elsevier Publishing Co., Inc., New York, 1976. Translated from the French. Studies in Mathematics and its Applications, Vol. 1.
[30] H. C. Elman and G. H. Golub. Inexact and preconditioned Uzawa algorithms for saddle point problems. SIAM J. Numer. Anal., 31(6):1645–1661, 1994.
[31] D. Fanelli and O. Öktem. Electron tomography: a short overview with an emphasis on the absorption potential model for the forward problem. Inverse Problems, 24(1):013001, 2008.
[32] A. R. Faruqi and S. Subramaniam. CCD detectors in high-resolution biological electron microscopy. Quarterly Reviews of Biophysics, 33:1–27, 2000.
[33] FEI. An introduction to electron microscopy, 2010.
[34] R. Fletcher and C. M. Reeves. Function minimization by conjugate gradients. Comput. J., 7:149–154, 1964.
[35] J. Frank. Single-particle reconstruction of biological macromolecules in electron microscopy - 30 years. Quarterly Reviews of Biophysics, 42(3):139–158, 2009.
[36] E. Gil-Rodrigo, J. Portilla, D. Miraut, and R. Suarez-Mesa. Efficient joint Poisson-Gauss restoration using multi-frame L2-relaxed-L0 analysis-based sparsity. In 2011 18th IEEE International Conference on Image Processing (ICIP), pages 1385–1388, 2011.
[37] P. Gilbert. Iterative methods for the three-dimensional reconstruction of an object from projections. Journal of Theoretical Biology, 36(1):105–117, 1972.
[38] R. Glaeser, K. Downing, D. DeRozier, W. Chu, and J. Frank. Electron crystallography of biological macromolecules. Oxford University Press, 2007.
[39] T. Goldstein and S. Osher. The split Bregman method for L1-regularized problems. SIAM J. Imaging Sci., 2(2):323–343, 2009.
[40] R. Gordon, R. Bender, and G. T. Herman. Algebraic reconstruction techniques (ART) for three-dimensional electron microscopy and X-ray photography. Journal of Theoretical Biology, 29(3):471–481, 1970.
[41] B. Goris, W. Van den Broek, K. Batenburg, H. Heidari Mezerji, and S. Bals. Electron tomography based on a total variation minimization reconstruction technique. Ultramicroscopy, 113:120–130, 2012.
[42] P. W. Hawkes and E. Kasper. Principles of electron optics, volume 3. Elsevier, 1996.
[43] R. Henderson. Realizing the potential of electron cryo-microscopy. Quarterly Reviews of Biophysics, 37(1):3–13, 2004.
[44] M. Hintermüller and K. Kunisch. Total bounded variation regularization as a bilaterally constrained optimization problem. SIAM J. Appl. Math., 64(4):1311–1333, 2004.
[45] J.-B. Hiriart-Urruty and C. Lemaréchal. Convex analysis and minimization algorithms I: Fundamentals, volume 305 of Grundlehren der Mathematischen Wissenschaften. Springer-Verlag, Berlin, 1993.
[46] K. Ito and K. Kunisch. Lagrange multiplier approach to variational problems and applications, volume 15 of Advances in Design and Control. SIAM, Philadelphia, PA, 2008.
[47] A. Jezierska, E. Chouzenoux, J. Pesquet, and H. Talbot. A primal-dual proximal splitting approach for restoring data corrupted with Poisson-Gaussian noise. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1085–1088, 2012.
[48] P. Lax. Functional Analysis. Pure and Applied Mathematics. Wiley, 2002.
[49] A. Leis, B. Rockel, L. Andrees, and W. Baumeister. Visualizing cells at the nanoscale. Trends in Biochemical Sciences, 34(2):60–70, 2009.
[50] P.-L. Lions and B. Mercier. Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal., 16(6):964–979, 1979.
[51] L. Lucy. An iterative technique for the rectification of observed distributions. The Astronomical Journal, 79:745, 1974.
[52] F. Luisier, T. Blu, and M. Unser. Image denoising in mixed Poisson-Gaussian noise. IEEE Transactions on Image Processing, 20(3):696–708, 2011.
[53] J. Müller. Parallel total variation minimization. Master's thesis, Institute for Computational and Applied Mathematics, University of Münster, 2008.
[54] F. Natterer. The mathematics of computerized tomography. Springer, 1986.
[55] F. Natterer and F. Wübbeling. Mathematical methods in image reconstruction. SIAM, 2001.
[56] O. Öktem. Reconstruction methods in electron tomography. In Mathematical Methods in Biomedical Imaging and Intensity-Modulated Radiation Therapy (IMRT), pages 289–320, 2008.
[57] S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin. An iterative regularization method for total variation-based image restoration. Multiscale Modeling & Simulation, 4(2):460–489, 2005.
[58] G. B. Passty. Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J. Math. Anal. Appl., 72(2):383–390, 1979.
[59] P. A. Penczek. Fundamentals of three-dimensional reconstruction from projections. Methods in Enzymology, 482:1–33, 2010.
[60] L. Reimer and H. Kohl. Transmission electron microscopy: physics of image formation, volume 36. Springer, 2008.
[61] E. Resmerita, H. W. Engl, and A. N. Iusem. The expectation-maximization algorithm for ill-posed integral equations: a convergence analysis. Inverse Problems, 23(6):2575–2588, 2007.
[62] W. H. Richardson. Bayesian-based iterative method of image restoration. JOSA, 62(1):55–59, 1972.
[63] C. V. Robinson, A. Sali, and W. Baumeister. The molecular sociology of the cell. Nature, 450(7172):973–982, 2007.
[64] L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60(1):259–268, 1992.
[65] H. Rullgård. A new principle for choosing regularization parameter in certain inverse problems. arXiv preprint arXiv:0803.3713, 2008.
[66] H. Rullgård, L.-G. Öfverstedt, S. Masich, B. Daneholt, and O. Öktem. Simulation of transmission electron microscope images of biological specimens. Journal of Microscopy, 243(3):234–256, 2011.
[67] H. Rullgård, O. Öktem, and U. Skoglund. A componentwise iterated relative entropy regularization method with updated prior and regularization parameter. Inverse Problems, 23(5):2121, 2007.
[68] A. Sawatzky. (Nonlocal) total variation in medical imaging. PhD thesis, Institute for Computational and Applied Mathematics, University of Münster, 2011.
[69] A. Sawatzky, C. Brune, T. Kösters, F. Wübbeling, and M. Burger. EM-TV methods for inverse problems with Poisson noise. In Level Set and PDE Based Reconstruction Methods in Imaging, pages 71–142. Springer, 2013.
[70] A. Sawatzky, C. Brune, J. Müller, and M. Burger. Total variation processing of images with Poisson statistics. In Computer Analysis of Images and Patterns, pages 533–540, 2009.
[71] O. Scherzer, M. Grasmair, H. Grossauer, M. Haltmeier, and F. Lenzen. Variational methods in imaging, volume 167. Springer, 2008.
[72] L. I. Schiff. Quantum mechanics, chapter 2. McGraw-Hill, 1968.
[73] S. Setzer, G. Steidl, and T. Teuber. Deblurring Poissonian images by split Bregman techniques. Journal of Visual Communication and Image Representation, 21(3):193–199, 2010.
[74] L. A. Shepp and Y. Vardi. Maximum likelihood reconstruction for emission tomography. IEEE Transactions on Medical Imaging, 1(2):113–122, 1982.
[75] U. Skoglund, L.-G. Öfverstedt, R. M. Burnett, and G. Bricogne. Maximum-entropy three-dimensional reconstruction with deconvolution of the contrast transfer function: a test application with adenovirus. Journal of Structural Biology, 117(3):173–188, 1996.
[76] D. L. Snyder, A. M. Hammoud, and R. L. White. Image recovery from data acquired with a charge-coupled-device camera. JOSA A, 10:1014–1023, 1993.
[77] A. Sommerfeld. Die Greensche Funktion der Schwingungsgleichung. J.-Ber. Deutsch Math.-Verein, 21:309–353, 1921.
[78] A. C. Steven and W. Baumeister. The future is hybrid. Journal of Structural Biology, 163(3):186–195, 2008.
[79] T. Teuber, G. Steidl, and R. H. Chan. Minimization and parameter estimation for seminorm regularization models with I-divergence constraints. Inverse Problems, 29(3):035007, 2013.
[80] A. N. Tikhonov. On the solution of incorrectly put problems and the regularisation method. In Outlines Joint Sympos. Partial Differential Equations (Novosibirsk, 1963), pages 261–265. Acad. Sci. USSR Siberian Branch, Moscow, 1963.
[81] A. N. Tikhonov. Numerical methods for the solution of ill-posed problems. Springer, 1995.
[82] Y. Vardi, L. Shepp, and L. Kaufman. A statistical model for positron emission tomography. Journal of the American Statistical Association, 80(389):8–20, 1985.
[83] C. R. Vogel. Computational methods for inverse problems, volume 23 of Frontiers in Applied Mathematics. SIAM, Philadelphia, PA, 2002. With a foreword by H. T. Banks.
[84] D. B. Williams and C. B. Carter. The Transmission Electron Microscope. Springer, 1996.
[85] R. Zanella, P. Boccacci, L. Zanni, and M. Bertero. Efficient gradient projection methods for edge-preserving removal of Poisson noise. Inverse Problems, 25(4):045010, 2009.