Rigid and Non-Rigid Surface Registration for Range Imaging Applications in Medicine

Starre und nicht-starre Registrierung von Oberflächen für den Einsatz der Tiefenbildgebung in der Medizin

Der Technischen Fakultät der Friedrich-Alexander-Universität Erlangen-Nürnberg zur Erlangung des Doktorgrades Dr.-Ing. vorgelegt von Sebastian Bauer aus Marktheidenfeld

Als Dissertation genehmigt von der Technischen Fakultät der Friedrich-Alexander-Universität Erlangen-Nürnberg

Tag der mündlichen Prüfung: 24. September 2014
Vorsitzende des Promotionsorgans: Prof. Dr.-Ing. habil. M. Merklein
Gutachter: Prof. Dr.-Ing. J. Hornegger, Prof. Dr. rer. nat. M. Rumpf

Abstract

The introduction of low-cost range imaging technologies that are capable of acquiring the three-dimensional geometry of an observed scene in an accurate, dense, and dynamic manner holds great potential for manifold applications in health care. Over the past few years, the use of range imaging modalities has been proposed for guidance in computer-assisted procedures, monitoring of interventional workspaces for safe robot-human interaction and workflow analysis, touch-less user interaction in sterile environments, and for application in early diagnosis and elderly care, among others.

This thesis is concerned with the application of range imaging technologies in computer-assisted and image-guided interventions, where the geometric alignment of range imaging data to a given reference shape – either also acquired with range imaging technology or extracted from tomographic planning data – poses a fundamental challenge. In particular, we propose methods for both rigid and non-rigid surface registration that are tailored to cope with the specific properties of range imaging data.

In the first part of this work, we focus on rigid surface registration problems. We introduce a point-based alignment approach based on matching customized local surface features and estimating a global transformation from the set of detected correspondences. The approach is capable of handling gross initial misalignments and the multi-modal case of aligning range imaging data to tomographic shape data. We investigate its application in image-guided open hepatic surgery and automatic patient setup in fractionated radiation therapy. For the rigid registration of surface data that exhibit only slight misalignments, such as with on-the-fly scene reconstruction using a hand-guided moving range imaging camera, we extend the classical iterative closest point algorithm to incorporate both geometric and photometric information. In particular, we investigate the use of acceleration structures for efficient nearest neighbor search to achieve real-time performance, and quantify the benefit of incorporating photometric information in endoscopic applications with a comprehensive simulation study.

The emphasis of the second part of this work is on variational methods for non-rigid surface registration. Here, we target respiratory motion management in radiation therapy. The proposed methods estimate dense surface motion fields that describe the elastic deformation of the patient's body. These motion fields can serve as a high-dimensional respiration surrogate that reflects the complexity of human respiration substantially better than conventionally used low-dimensional surrogates. We propose three methods for different range imaging sensors and thereby account for the particular strengths and limitations of the individual modalities.
For dense but noisy range imaging data, we propose a framework that solves the intertwined tasks of range image denoising and its registration with an accurate planning shape in a joint manner. For accurate but sparse range imaging data, we introduce a method that aligns sparse measurements with a dense reference shape while simultaneously reconstructing a dense displacement field describing the non-rigid deformation of the body surface. For range imaging sensors that additionally capture photometric information, we investigate the estimation of surface motion fields driven by this complementary source of information.

Kurzfassung

Kostengünstige Technologien zur Tiefenbildgebung, welche die dreidimensionale Geometrie eines Objektes präzise, engmaschig und dynamisch akquirieren können, bergen großes Potential für Anwendungen im Gesundheitswesen. Erst kürzlich wurde die Tiefenbildgebung zur Navigation bei computergestützten Interventionen, zur Kollisionsvermeidung in robotergestützten Operationssälen, zur Analyse von klinischen Arbeitsabläufen, zur berührungslosen Benutzerinteraktion in sterilen Umgebungen oder für Anwendungen in der Früherkennung und Altenpflege vorgeschlagen.

Die vorliegende Arbeit befasst sich mit der Nutzung der Tiefenbildgebung für computer- und bildgestützte Interventionen. Eine besondere Herausforderung stellt in diesem Umfeld die Registrierung von Tiefenbilddaten auf eine Referenzform dar, die entweder auch mit einer Tiefenbildkamera akquiriert oder aus tomographischen Planungsdaten extrahiert wurde. Konkret werden Methoden zur starren und nicht-starren Registrierung von Oberflächen entwickelt, die speziell auf Tiefenbilddaten zugeschnitten sind.

Der erste Teil der Arbeit behandelt starre Registrierungsprobleme. Wir präsentieren einen punktbasierten Registrierungsansatz, der auf dem Abgleich von lokalen Oberflächenmerkmalen basiert und aus den gefundenen Korrespondenzen eine globale Transformation schätzt. Er eignet sich für Problemstellungen mit großen initialen Abweichungen und für die multi-modale Registrierung von Tiefenbilddaten mit tomographischen Referenzformen. Wir untersuchen die Anwendung der Methode in der bildgestützten offenen Leberchirurgie und zur automatisierten Patientenpositionierung in der fraktionierten Strahlentherapie. Für die starre Registrierung von Oberflächen, die nur geringfügig zueinander verschoben sind, wie etwa bei der sukzessiven Rekonstruktion einer Szene mit einer handgeführten Tiefenbildkamera, erweitern wir den klassischen Algorithmus des iterativen nächsten Nachbarn auf die gemeinsame Analyse von geometrischen und photometrischen Informationen. Dabei untersuchen wir das Potential von Beschleunigungsstrukturen und quantifizieren den Vorteil dieses photo-geometrischen Ansatzes für endoskopische Anwendungen in einer Simulationsstudie.

Der Fokus des zweiten Teils der Arbeit liegt auf variationellen Methoden zur nicht-starren Oberflächenregistrierung. Dabei adressieren wir das Management von Atembewegungen in der Strahlentherapie. Die vorgestellten Methoden schätzen dichte Oberflächen-Bewegungsfelder, welche die elastische Deformation des Patientenkörpers beschreiben. Diese Bewegungsfelder dienen als hochdimensionales Atemsurrogat und stellen die Komplexität der menschlichen Atmung wesentlich besser dar als konventionelle niedrigdimensionale Surrogate.
Wir präsentieren drei Ansätze für unterschiedliche Tiefenbildgebungs-Sensoren, die auf deren spezifische Stärken und Schwächen zugeschnitten sind: Für dichte, aber verrauschte Tiefenbilder schlagen wir eine Methode vor, die das Entrauschen von Tiefendaten mit deren Registrierung auf eine präzise Planungsform in einer kombinierten Formulierung löst. Für präzise, aber dünn besetzte Tiefenbilder führen wir eine Methode ein, welche die dünn besetzten Tiefenbilder mit einer Referenzform registriert und gleichzeitig ein dichtes Oberflächen-Bewegungsfeld schätzt. Für Sensoren, die zusätzlich photometrische Informationen akquirieren, untersuchen wir die Schätzung von Bewegungsfeldern mithilfe dieser komplementären Daten.

Acknowledgments

First and foremost, let me express my sincere gratitude to Prof. Dr.-Ing. Joachim Hornegger for the opportunity to work in such an inspiring research environment. In particular, I appreciate his outstanding confidence in me, his encouragement, support and guidance over the years – not only as a scientific mentor – the freedom he allowed me regarding the contents of my work, and his efforts in setting up the collaborations that built the fundamental basis of this thesis.

I would like to thank Prof. Dr. Martin Rumpf (University of Bonn) and Prof. Dr. Benjamin Berkels (RWTH Aachen University) for the intense collaboration in joint projects throughout this work. I enjoyed the winter months in Bonn, being a guest at the lab, and deeply appreciate the valuable discussions on setting up mathematical models, variational methods, finite elements, optimization techniques, and the QuocMesh framework, which facilitates the life of an engineer.

Many thanks to my colleagues at the Pattern Recognition Lab for the pleasant and friendly atmosphere at the lab, for the ongoing knowledge sharing, and for the joyful time outside the working hours. In particular, let me thank Jakob Wasza and Sven Haase for the great time in our office, for the efforts in setting up a powerful development environment (RITK), and for endless scientific discussions that had a tremendous impact on this work. Among the students I have supervised, let me acknowledge Kerstin Müller, Dominik Neumann and Felix Lugauer for their excellent work that contributed to several publications. Let me also particularly thank Jakob Wasza for his meticulous review of this thesis.

I acknowledge support by the European Regional Development Fund (ERDF) and the Bayerisches Staatsministerium für Wirtschaft, Infrastruktur, Verkehr und Technologie (StMWIVT), in the context of the R&D program IuK Bayern under Grant No. IUK338/001, and by the Graduate School of Information Science in Health (GSISH) and the Technische Universität München Graduate School.

Thanks to our industrial partners at Siemens AG, Healthcare Sector, and Softgate GmbH: Dr. Natalia Anderl and Dr. Annemarie Bakai for the background in clinical workflows, Stefan Sattler and Stefan Schuster for the opportunity to serve new application fields beyond radiation therapy, and Dr. Florian Höpfl, Sebastian Reichert and Christiane Kupczok for the support in project management and camera calibration.
I would further like to acknowledge Prof. Dr. Gerd Häusler and Dr. Svenja Ettl (Institute of Optics, Information and Photonics, University of Erlangen-Nürnberg) for the opportunity to investigate the active triangulation sensor for medical applications, Dr. Anja Borsdorf, Dr. Holger Kunze (Siemens AG, Healthcare Sector) and Prof. Dr. Arnd Dörfler (Department of Neuroradiology, Erlangen University Clinic) for their support in data acquisition, and Dr. Elli Angelopoulou for her great support in improving our manuscripts.

Last but not least, I am deeply thankful to my wife and my family for their patience and support over the years.

Erlangen, 26.04.2014
Sebastian Bauer

Contents

Chapter 1: Introduction
  1.1 Motivation
  1.2 Contributions
  1.3 Organization of this Thesis

Chapter 2: Range Imaging and Surface Registration in Medicine
  2.1 Real-time Range Imaging Technologies
    2.1.1 Triangulation
    2.1.2 Time-of-Flight Imaging
    2.1.3 Discussion of RI Sensors investigated in this Thesis
  2.2 Range Image Processing
    2.2.1 RITK: The Range Imaging Toolkit
    2.2.2 Virtual RGB-D Camera
    2.2.3 Range Data Enhancement
  2.3 Applications in Medicine
    2.3.1 Prevention, Diagnosis and Support
    2.3.2 Monitoring for OR Safety and Workflow Analysis
    2.3.3 Touchless Interaction and Visualization
    2.3.4 Guidance in Computer-assisted Interventions
  2.4 Surface Registration
    2.4.1 Global vs. Local Surface Registration
    2.4.2 Rigid vs. Non-Rigid Surface Registration
    2.4.3 Medical Surface Registration
  2.5 Discussion and Conclusions

Part I: Rigid Surface Registration for Range Imaging Applications in Medicine

Chapter 3: Feature-based Multi-Modal Rigid Surface Registration
  3.1 Medical Background
    3.1.1 Patient Setup in Fractionated Radiation Therapy
    3.1.2 Image-Guided Open Liver Surgery
  3.2 Related Work
  3.3 Feature-based Surface Registration Framework
    3.3.1 Correspondence Search
    3.3.2 Transformation Estimation
  3.4 Shape Descriptors
    3.4.1 Spin Images
    3.4.2 Mesh Histograms of Oriented Gradients (MeshHOG)
    3.4.3 Rotation Invariant Fast Features (RIFF)
    3.4.4 Distance Metrics for Feature Matching
  3.5 Experiments and Results
    3.5.1 Multi-Modal Patient Setup in Fractionated RT
    3.5.2 Multi-Modal Data Fusion in IGLS
  3.6 Discussion and Conclusions

Chapter 4: Photo-geometric Rigid Surface Registration for Endoscopic Reconstruction
  4.1 Medical Background
    4.1.1 Operation Situs Reconstruction in Laparoscopy
    4.1.2 Towards 3-D Model Construction in Colonoscopy
  4.2 Related Work
  4.3 Photo-geometric Surface Registration Framework
    4.3.1 Photo-geometric ICP Scheme
    4.3.2 Approximative 6-D Nearest Neighbor Search using RBC
  4.4 Experiments and Results
    4.4.1 Performance Study
    4.4.2 Experiments on Operation Situs Reconstruction
    4.4.3 Experiments on Colon Shape Model Construction
  4.5 Discussion and Conclusions

Part II: Non-Rigid Surface Registration for Range Imaging Applications in Medicine

Chapter 5: Joint Range Image Denoising and Surface Registration
  5.1 Medical Background
    5.1.1 Image-Guided Radiation Therapy
    5.1.2 Respiration-Synchronized Dose Delivery
    5.1.3 Dense Deformation Tracking
  5.2 Related Work
  5.3 Non-Rigid Surface Registration Framework
    5.3.1 Geometric Configuration
    5.3.2 Definition of the Registration Energy
  5.4 A Joint Denoising and Registration Approach
    5.4.1 Definition of the Registration Energy
    5.4.2 Numerical Optimization
  5.5 Experiments and Results
    5.5.1 Materials and Methods
    5.5.2 Results
  5.6 Discussion and Conclusions

Chapter 6: Sparse-to-Dense Non-Rigid Surface Registration
  6.1 Motivation and Related Work
  6.2 Sparse-to-Dense Surface Registration Framework
    6.2.1 Geometric Configuration
    6.2.2 Definition of the Registration Energy
    6.2.3 Numerical Optimization
  6.3 Experiments and Results
    6.3.1 Materials and Methods
    6.3.2 Results
  6.4 Discussion and Conclusions

Chapter 7: Photometry-driven Non-Rigid Surface Registration
  7.1 Motivation and Related Work
  7.2 Materials and Methods
    7.2.1 Photometry-Driven Surface Registration
    7.2.2 Geometry-Driven Surface Registration
  7.3 Experiments and Results
    7.3.1 Materials and Methods
    7.3.2 Results
  7.4 Discussion and Conclusions

Chapter 8: Outlook

Chapter 9: Summary

Appendix A
  A.1 Projection Geometry
    A.1.1 Perspective Projection
    A.1.2 3-D Point Cloud Reconstruction
    A.1.3 Range Image Data Representation
  A.2 Joint Range Image Denoising and Surface Registration
    A.2.1 Approximation of the Matching Energy
    A.2.2 Derivation of the First Variations
  A.3 Sparse-to-dense Non-Rigid Surface Registration
    A.3.1 Derivation of the First Variations
    A.3.2 Improved Projection Approximation
    A.3.3 Detailed Results of the Prototype Study

List of Symbols
List of Abbreviations
List of Figures
List of Tables
Bibliography

CHAPTER 1
Introduction

1.1 Motivation
1.2 Contributions
1.3 Organization of this Thesis

Computer assistance has become increasingly important in medicine over the past decades. One of the key requirements in this context is the robust localization and dynamic tracking of the target objects involved in the specific medical procedure. Guidance and navigation concepts in computer-assisted interventions are based on establishing the spatial relationship between the patient anatomy and the medical instruments used during the intervention.
This typically involves the registration of intra-interventionally acquired data – describing the patient anatomy during the intervention – to pre-interventionally acquired patient-specific anatomical models. One of the fundamental prerequisites to perform this registration is a dynamic, accurate, and robust acquisition of the patient anatomy during the intervention. So far, this has been addressed using either optical or electromagnetic tracking technologies that require markers to be attached to the target, or by means of intra-interventional radiographic imaging. While marker-based approaches often complicate the workflow and are thus not widely accepted in clinical routine, radiographic imaging implies a substantial radiation exposure to the patient and/or the physician.

In contrast, real-time range imaging (RI) offers a marker-less and radiation-free alternative for the acquisition of intra-interventional data in computer-assisted interventions. Indeed, RI-based techniques have experienced a remarkable development in this context with the availability of dynamic, dense and low-cost technologies and have been applied in numerous applications in the clinical environment, far beyond marker-less localization.

In this chapter, we outline the motivation for this thesis and specify our scientific contributions to the field of surface registration for RI applications in medicine.

1.1 Motivation

Registration has emerged as one of the key technologies in medical image computing and is an essential component in various applications in computer-assisted diagnosis and intervention. In general, registration denotes the process of finding an optimal geometric transformation that brings a moving template dataset into congruence with a fixed reference dataset. In practice, registration problems are addressed by specifying a suitable mathematical model of transformations for the desired alignment and estimating the model parameters by optimizing a dedicated objective function.
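In the variational notation commonly used in the registration literature, such an objective function typically combines a distance measure with a regularizer; the symbols below are generic textbook notation and are not taken from this thesis:

```latex
% Generic registration energy: T = template, R = reference,
% \phi = sought transformation, \mathcal{D} = distance measure,
% \mathcal{S} = regularizer, \alpha > 0 = weighting parameter.
E[\phi] = \mathcal{D}\bigl[T \circ \phi,\, R\bigr] + \alpha\, \mathcal{S}[\phi],
\qquad
\phi^{*} = \operatorname*{arg\,min}_{\phi} E[\phi]
```

The distance term drives the alignment, while the regularizer encodes assumptions about admissible transformations (e.g. rigidity or smoothness) and renders the problem well-posed.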
Depending on the particular application, the spatial correspondence between the template and the reference dataset can be of rigid or elastic nature. Classical medical image registration tasks involve the alignment of planar images (2-D/2-D registration), the alignment of volumetric datasets (3-D/3-D registration), projective alignment techniques (2-D/3-D registration), and the alignment of shapes (2-D contours or 3-D surfaces).

First and foremost, the importance of medical image registration is driven by the growing and diverse variety of imaging modalities. This trend demands methods to combine complementary data from multiple modalities, making it easily accessible to the physician and superseding the traditional mental fusion. Typical scenarios involve the combination of morphological data, e.g. from computed tomography (CT), magnetic resonance imaging (MRI) or ultrasound (US), with functional information, e.g. from positron emission tomography (PET) or single-photon emission computed tomography (SPECT) imaging. Second, medical image registration provides the basis for intra-subject monitoring of spatio-temporal progression in longitudinal studies and inter-subject comparison with anatomical atlases and statistical shape models. Third, it can be applied in a joint manner with related medical image computing tasks such as denoising and segmentation.

Over the past two decades, a broad spectrum of approaches for rigid and non-rigid, parametric and non-parametric registration with numerous options in terms of distance measures, regularizers and optimization schemes has evolved. For a survey, let us refer to standard literature in the field [Hajn 01, Main 98, Mark 12, Mode 03a, Mode 09, Ruec 11].

In the last few years, significant advances in optics, electronics, sensor design, and computing power have rendered 3-D range imaging (RI) at dense resolutions, real-time frame rates and low manufacturing costs possible. These novel RI technologies hold benefits for a multitude of medical applications. Many of these applications involve the registration of an acquired 3-D shape with a reference model, for instance, the intra-fractional registration of the external patient body surface with tomographic planning data for patient setup and respiratory motion management in radiation therapy (RT), or the intra-operative registration of the operation situs with pre-operative planning data for augmented reality navigation and guidance. Hence, RI-based surface registration in medicine is a rapidly evolving field of research [Baue 13a]. This thesis is embedded in the context of RI-based applications in image-guided interventions – focusing on tasks that involve shape correspondence problems. We present novel concepts based on dynamic 3-D perception to improve the quality, safety and efficiency of clinical workflows and propose both rigid and non-rigid surface registration techniques that are optimized w.r.t. the strengths and limitations of different RI technologies.

1.2 Contributions

The scientific focus of this work lies on the development of novel rigid and non-rigid surface registration techniques that meet the requirements of RI-based medical applications. In addition, we investigate new clinical applications for image-guided open surgery, minimally invasive procedures and radiation therapy. Some of the proposed methods follow up on ideas that have been introduced before. Hence, let us briefly summarize the main contributions of this thesis to the progress of research in the field of medical surface registration, along with the associated scientific publications.

Contributions to the Field of Rigid Surface Registration

First, we outline our contributions to the field of rigid surface registration for range imaging applications in medicine:

• We propose a novel feature-based method for marker-less rigid alignment of intra-procedural range imaging data with pre-operative surface data extracted from tomographic imaging modalities. Regarding the challenge of multi-modal registration, we introduce customized 3-D shape descriptors that meet the following specific requirements: invariance to mesh density, mesh organization and inter-modality deviations in surface topography that result from the underlying sampling principles. Furthermore, we investigate the application of the proposed method in image-guided liver surgery and radiation therapy (a sketch of the transformation estimation step follows the references below). Methods and results are detailed in Chap. 3 and have been presented at two conferences:

[Mull 11] K. Müller, S. Bauer, J. Wasza, and J. Hornegger. Automatic Multi-modal ToF/CT Organ Surface Registration. In: Proceedings of Bildverarbeitung für die Medizin (BVM), pp. 154–158, Springer, Mar 2011.

[Baue 11a] S. Bauer, J. Wasza, S. Haase, N. Marosi, and J. Hornegger. Multi-modal Surface Registration for Markerless Initial Patient Setup in Radiation Therapy using Microsoft's Kinect Sensor. In: Proceedings of IEEE International Conference on Computer Vision (ICCV) Workshops, Workshop on Consumer Depth Cameras for Computer Vision (CDC4CV), pp. 1175–1181, IEEE, Nov 2011.
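To make the second stage of such a feature-based pipeline concrete, the following minimal Python sketch estimates the global rigid transformation from a set of detected point correspondences via the classic SVD-based least-squares solution (Kabsch/Umeyama). It is an illustrative stand-in, not the implementation used in this thesis; the function and variable names are our own.

```python
import numpy as np

def estimate_rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) mapping src onto dst.

    src, dst: (N, 3) arrays of corresponding 3-D points (N >= 3,
    not all collinear). Returns rotation R (3x3) and translation t (3,)
    minimizing sum_i || R src_i + t - dst_i ||^2.
    """
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)      # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Sign correction prevents an improper rotation (reflection).
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = c_dst - R @ c_src
    return R, t
```

In practice, such a closed-form estimate would be wrapped in a robust scheme (e.g. RANSAC-style hypothesis testing) to reject the false matches that feature-based correspondence search inevitably produces.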
• We propose a photo-geometric variant of the iterative closest point (ICP) algorithm in combination with an efficient nearest neighbor search scheme. Incorporating photometric information into the registration process is of particular interest for modern RI sensors that exhibit a low signal-to-noise ratio (SNR) in the range domain but acquire complementary high-grade photometric information. We investigate the benefits of this photo-geometric registration framework for two prospective clinical applications: optical 3-D colonoscopy and laparoscopic interventions. To overcome the traditional bottleneck in nearest neighbor search space traversal, we propose a variant of a recently published scheme by Cayton [Cayt 10, Cayt 11] that we have optimized in terms of performance (the sketch after the references below illustrates the underlying joint correspondence search). Methods and results are detailed in Chap. 4 and have been presented at a conference and published as a book chapter:

[Neum 11] D. Neumann, F. Lugauer, S. Bauer, J. Wasza, and J. Hornegger. Real-time RGB-D Mapping and 3-D Modeling on the GPU using the Random Ball Cover Data Structure. In: Proceedings of IEEE International Conference on Computer Vision (ICCV) Workshops, Workshop on Consumer Depth Cameras for Computer Vision (CDC4CV), pp. 1161–1167, IEEE, Nov 2011.

[Baue 13b] S. Bauer, J. Wasza, F. Lugauer, D. Neumann, and J. Hornegger. Real-time RGB-D Mapping and 3-D Modeling on the GPU using the Random Ball Cover. Book chapter in: Consumer Depth Cameras for Computer Vision: Research Topics and Applications, pp. 27–48, Advances in Computer Vision and Pattern Recognition, Springer, 2013.
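The following Python sketch illustrates the core idea of a photo-geometric correspondence search: each point is embedded in a 6-D space combining its 3-D position with weighted color, so that a single nearest-neighbor query trades off geometric and photometric proximity. It is a simplified illustration under our own naming; an exact k-d tree stands in here for the approximate GPU-based random ball cover (RBC) structure discussed in Chap. 4.

```python
import numpy as np
from scipy.spatial import cKDTree

def photo_geometric_matches(tpl_xyz, tpl_rgb, ref_xyz, ref_rgb, alpha=0.1):
    """One correspondence-search step of a photo-geometric ICP variant.

    tpl_*, ref_*: (N, 3) arrays of 3-D coordinates [m] and colors in [0, 1].
    alpha weights photometric against geometric distance; alpha = 0
    recovers the purely geometric correspondence search of classical ICP.
    Returns, per template point, the index of and the 6-D distance to the
    nearest reference point.
    """
    ref6 = np.hstack([ref_xyz, alpha * ref_rgb])   # 6-D embedding, reference
    tpl6 = np.hstack([tpl_xyz, alpha * tpl_rgb])   # 6-D embedding, template
    dist, idx = cKDTree(ref6).query(tpl6)          # 6-D nearest neighbors
    return idx, dist
```

The matched pairs would then feed a transformation estimation step like the one sketched above, and the two steps alternate until convergence, as in the classical ICP.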
Contributions to the Field of Non-Rigid Surface Registration

Second, we outline our contributions to the field of non-rigid surface registration for range imaging applications in medicine. Overall, we propose three novel methods for the estimation of dense 4-D surface motion fields (3-D+time), describing the elastic deformation of the patient's external body under the influence of respiration. Dense respiratory motion tracking holds great potential for motion compensation techniques in radiation therapy, as it better reflects the complexity of respiratory motion compared to conventionally used 1-D respiration surrogates [Faya 11, Yan 06]. All three approaches are optimized w.r.t. the strengths and limitations of different range imaging technologies:

• We propose a novel variational framework for joint denoising of range imaging data and its non-rigid registration to a reference surface. Our experiments show that solving both tasks of denoising and registration in a simultaneous manner is superior to a sequential approach where surface registration is performed after denoising of noisy RI measurements. This allows a robust estimation of dense 4-D surface motion fields with range imaging modalities that exhibit a low SNR. Methods and results are detailed in Chap. 5 and have been presented at a conference:

[Baue 12b] S. Bauer, B. Berkels, J. Hornegger, and M. Rumpf. Joint ToF Image Denoising and Registration with a CT Surface in Radiation Therapy. In: Proceedings of International Conference on Scale Space and Variational Methods in Computer Vision (SSVM), pp. 98–109, Springer, May 2012.

• We propose the application of a novel RI sensor that acquires sparse but highly accurate 3-D position measurements in real-time. These are registered with a dense reference surface extracted from planning data. Thereby, a dense displacement field is recovered which describes the elastic spatio-temporal deformation of the complete patient body surface. In particular, the proposed approach involves the estimation of dense 4-D surface motion fields from sparse measurements using prior shape knowledge from planning data. It yields both a reconstruction of the instantaneous patient shape and a high-dimensional respiratory surrogate for respiratory motion tracking. Methods and results are detailed in Chap. 6 and have been presented at a conference and published in a journal article:

[Baue 12a] S. Bauer, B. Berkels, S. Ettl, O. Arold, J. Hornegger, and M. Rumpf. Marker-less Reconstruction of Dense 4-D Surface Motion Fields using Active Laser Triangulation for Respiratory Motion Management. In: Proceedings of International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), pp. 414–421, LNCS 7510, Part I, Springer, Oct 2012.

[Berk 13] B. Berkels, S. Bauer, S. Ettl, O. Arold, J. Hornegger, and M. Rumpf. Joint Surface Reconstruction and 4-D Deformation Estimation from Sparse Data and Prior Knowledge for Marker-Less Respiratory Motion Tracking. In: Medical Physics, Vol. 40, No. 9, pp. 091703 1–10, Sep 2013.

• For RI sensors that provide aligned geometric and photometric information, we propose a method that performs the reconstruction of the geometric surface motion field by estimating the non-rigid transformation in the photometric image domain using a variational optical flow formulation. From this photometric 2-D displacement field and the known associated range measurements, the 3-D surface motion field is deduced (a sketch of this lifting step follows the reference below). Methods and results are detailed in Chap. 7 and have been presented at a conference:

[Baue 12d] S. Bauer, J. Wasza, and J. Hornegger. Photometric Estimation of 3-D Surface Motion Fields for Respiration Management. In: Proceedings of Bildverarbeitung für die Medizin (BVM), pp. 105–110, Springer, Mar 2012.
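As an illustration of how a 2-D photometric displacement field can be lifted to a 3-D surface motion field, consider the following minimal Python sketch. It assumes a pinhole camera with depth maps pixel-aligned to the color images on which the optical flow was computed; all names and the nearest-pixel depth lookup are our own simplifications, not the formulation of Chap. 7.

```python
import numpy as np

def lift_flow_to_3d(depth0, depth1, flow, fx, fy, cx, cy):
    """Lift a 2-D displacement field to a 3-D surface motion field.

    depth0, depth1: (H, W) range images [m] at two time steps.
    flow:           (H, W, 2) optical flow (du, dv) in pixels, estimated
                    on the aligned color images.
    fx, fy, cx, cy: pinhole intrinsics of the depth camera.
    Returns an (H, W, 3) field of 3-D displacement vectors.
    """
    h, w = depth0.shape
    u, v = np.meshgrid(np.arange(w, dtype=float), np.arange(h, dtype=float))

    def backproject(depth, uu, vv):
        # Nearest-pixel depth lookup at (possibly fractional) positions.
        vi = np.clip(np.rint(vv).astype(int), 0, h - 1)
        ui = np.clip(np.rint(uu).astype(int), 0, w - 1)
        z = depth[vi, ui]
        return np.stack([(uu - cx) / fx * z, (vv - cy) / fy * z, z], axis=-1)

    p0 = backproject(depth0, u, v)                                 # points at t0
    p1 = backproject(depth1, u + flow[..., 0], v + flow[..., 1])   # displaced
    return p1 - p0                                                 # 3-D motion
```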
In addition to the aforementioned scientific contributions, we have developed a powerful framework for high-performance and rapid-prototyping RI processing, far beyond surface registration, named the range imaging toolkit (RITK) [Wasz 11b]. RITK is released as an open source platform and is thus another contribution to the scientific community of range image processing and analysis, paving the way for accelerating the use of range imaging technologies in clinical applications. Furthermore, we have conducted a comprehensive state-of-the-art survey on the integration of modern RI technologies in health care applications, published as a book chapter:

[Baue 13a] S. Bauer, A. Seitel, H. Hofmann, T. Blum, J. Wasza, M. Balda, H.-P. Meinzer, N. Navab, J. Hornegger, and L. Maier-Hein. Real-Time Range Imaging in Health Care: A Survey. In: Time-of-Flight and Depth Imaging. Sensors, Algorithms, and Applications, pp. 228–254, LNCS 8200, Springer, 2013.

The survey identifies promising applications and algorithms, and provides an overview of recent developments in this emerging domain. We have reviewed recent methods and results and discuss open research issues and challenges that are of fundamental importance for the progression of the field. To our knowledge, this survey is the first in the literature to address the fast-growing number of research activities in the context of real-time RI in health care.

Some chapters of this thesis contain material that has been published or submitted to conference proceedings and journals. In addition to the works listed in the itemization above, this involves several publications that emerged during this thesis [Baue 11b, Baue 12c, Ettl 12a, Grim 12, Pass 08, Sout 10, Wasz 11c, Wasz 11b, Wasz 11a, Wasz 12b, Wasz 12a, Wasz 13].

1.3 Organization of this Thesis

Let us outline the structure of this thesis, cf. Fig. 1.1.

[Figure 1.1: Organization of this thesis, dividing into rigid (left, Part I) and non-rigid (right, Part II) surface registration. Part I: Feature-based Multi-Modal Rigid Surface Registration (Chap. 3), Photo-Geometric Rigid Surface Registration (Chap. 4). Part II: Joint Denoising and Non-Rigid Surface Registration (Chap. 5), Sparse-to-Dense Non-Rigid Surface Registration (Chap. 6), Photometry-Driven Non-Rigid Surface Registration (Chap. 7). For the individual chapters, the main contribution in terms of methodology and the medical applications we address are depicted.]

Chap. 2 provides a comprehensive overview of RI technologies with a focus on modalities applied in this work. We introduce our framework for range image processing and comment on range image enhancement. In addition, we present a survey of recent developments of RI applications in health care and summarize directions of research in the field of surface registration and shape correspondence. As we consider different medical applications within this thesis, the clinical background is discussed in the individual chapters.

The main body of this thesis is divided into two parts. Part I is concerned with rigid surface registration techniques. In Chap. 3, we introduce a comprehensive framework for feature-based rigid shape alignment and propose customized 3-D surface descriptors that meet the specific requirements for multi-modal surface registration. This point-based approach inherently copes with cases of partial matching and gross misalignments that occur in the applications we address. In particular, we propose the use of this automatic multi-modal surface registration framework for two clinical applications: image-guided liver surgery (IGLS) and reproducible patient setup in fractionated RT. For both applications, we present experimental results on real data from different RI modalities.

Chap. 4 is concerned with rigid surface registration in the case of slight misalignments. In this context, the ICP algorithm is an established approach. Previous work had indicated that the incorporation of complementary photometric information into the correspondence search – as opposed to the classical ICP that solely considers the geometric domain – improves alignment quality. Due to computational constraints and the lack of RI cameras that acquire both 3-D and color data, this combined photo-geometric approach has not been considered for interactive applications before.
We particularly address the on-the-fly reconstruction of tubular anatomical shapes, holding potential for 3-D colonoscopy, and the reconstruction of the operation situs for field-of-view expansion in laparoscopic interventions by consecutive alignment of RI streams acquired from a hand-guided moving camera.

The focus of Part II is on non-rigid methods for surface registration. In terms of application, Chapters 5-7 address the estimation of dense surface motion fields as a high-dimensional respiration surrogate holding potential for motion tracking and compensation in image-guided diagnosis and interventions. First and foremost, we target motion-compensated dose delivery using external-internal correlation models in radiation therapy. The clinical background is detailed in the first chapter of Part II.

In Chap. 5, we derive a variational formulation for non-rigid registration of dense surface data. Formulated as a classical shape alignment problem, the template surface is deformed to match a given reference while ensuring a smooth displacement field, thus preserving the original shape characteristics. Extending this basic formulation, we introduce a novel approach that jointly solves the denoising of dense RI data and its non-rigid registration to a reference surface extracted from planning data. Experimental results confirm that treating the two intertwined tasks of denoising and registration in a joint manner is beneficial: Incorporating prior knowledge about the reference shape helps substantially in the denoising process, and proper denoising renders the registration problem more robust.

Chap. 6 investigates the medical potential of a novel RI sensor that acquires sparse but highly accurate 3-D data in real-time. In particular, we have developed a sparse-to-dense registration approach that is capable of recovering the patient's dense 3-D body surface and estimating a 4-D (3-D+time) surface motion field from sparse sampling data and patient-specific prior shape knowledge extracted from planning data. The method is validated on a 4-D CT respiration phantom and evaluated on both synthetic and real data. The experimental results indicate that a paradigm shift in RI technology – accurate but sparse vs. dense but noisy – is a promising direction for future research.

Chap. 7 takes advantage of additional photometric information available with modern RGB-D sensors which capture both color (RGB) and depth (D) information, along the lines of Chap. 3. We propose an approach that breaks the estimation of surface motion fields down to a non-rigid image registration problem in the 2-D photometric domain. Based on this 2-D displacement field, the geometric 3-D motion field is deduced from the associated depth information. Experimental results on real data indicate that incorporating the photometric domain as a complementary source of information can help improve the quality of surface motion fields.

The thesis concludes with an outlook (Chap. 8) and a summary (Chap. 9).

CHAPTER 2
Range Imaging and Surface Registration in Medicine

2.1 Real-time Range Imaging Technologies
2.2 Range Image Processing
2.3 Applications in Medicine
2.4 Surface Registration
2.5 Discussion and Conclusions
The recent availability of dynamic, dense, and low-cost range imaging has attracted widespread interest in health care. It opens up new opportunities and has an increasing impact on both research and commercial activities. In this chapter, first, we introduce the measurement principles of different real-time range imaging modalities with a focus on RI sensors investigated in this thesis (Sect. 2.1). In Sect. 2.2, we present our development platform for range image processing and comment on range image enhancement. Sect. 2.3 comprises a state-of-the-art survey on the integration of modern RI sensors in medical applications. Last, in Sect. 2.4, we present an overview of approaches to rigid and non-rigid surface registration. Parts of this chapter have been published in [Baue 13a].

2.1 Real-time Range Imaging Technologies

The projection of the 3-D world onto a 2-D sensor domain results in general in the loss of depth information. This implies that, for a given 2-D image, the 3-D geometry of the observed scene cannot be reconstructed unambiguously. Nature has satisfied the need for depth perception by providing humans and most animals with a binocular visual system. Based on the exploitation of binocular disparity, we can extract qualitative depth information from a pair of 2-D projections. For a human being, perceiving the world in three dimensions apparently comes without any effort. In contrast, the acquisition or reconstruction of 3-D geometry with sensor technologies has turned out to be an ongoing challenge. Over the past decades, a multitude of technologies for non-contact 3-D perception has been proposed. For an overview we refer to the surveys by Jarvis [Jarv 83] and Blais [Blai 04], and the books by Jähne et al. [Jahn 99] and Pears et al. [Pear 12]. These ongoing efforts underline the intuition that the most natural and descriptive interface between the world and computer vision algorithms would be a full 3-D description of the environment, at least from a theoretical point of view.

However, back in the early days of photogrammetry, a dense and accurate reconstruction of 3-D geometry was tedious, time-consuming, and expensive. Furthermore, data acquisition was limited to static objects and scenes. In practice, the world that we perceive spans another dimension: time. In fact, for many real-world applications, knowledge about the 3-D geometry of the environment is impractical – until it comes in real-time. This lack of real-time capable depth perception technology might explain the tremendous success of 2-D cameras in the field of computer vision. Even though such a classical camera projects the 3-D geometry onto a flat plane, capturing the temporal component of our 4-D world (3-D+time) seems more essential than the third dimension in spatial perception. This is not surprising, as 2-D projections still provide a multitude of cues about the underlying 3-D geometry.

Lately, technological advances in optics, electronics, mechanical control, sensor design, and computing power have rendered metric 3-D range imaging at high resolutions (≥ 300k px) and real-time frame rates (≥ 30 Hz) possible. A significant step toward real-time and dense 3-D surface scanning was the development of Time-of-Flight (ToF) imaging (Sect. 2.1.2). However, its moderate spatial resolution, systematic errors, and the pricing of early ToF imaging prototypes led to a steady but rather slow increase of interest in this technology.
In practice, the computer vision community hardly noticed these early real-time capable RI sensors for some time. In 2010, this changed radically with the introduction of Microsoft Kinect as a natural user interface for gaming, at a mass market retail price of about $100 a unit and with more than 10 million sales within a few months. Apart from its impact on consumer electronics, computer vision researchers realized the potential behind the device – being a fully-functional real-time RI camera at a competitive pricing – for a wide range of applications far beyond gaming. The device has directed the attention of many research communities to the field of 3-D computer vision – with a strong focus on applications where real-time 3-D vision is the key. The fact that the computer vision community has established a dedicated workshop series (Consumer Depth Cameras for Computer Vision, IEEE, since 2011) underlines the significance of low-cost RI.

The first part of this chapter compares competing real-time RI technologies, with a focus on modalities used in this work. In particular, below, we restrict our discussion to triangulation and time-of-flight based approaches. Regardless of the fundamentally different underlying physical principles, both technologies are capable of acquiring dense and metric 3-D surface information at real-time frame rates, and the vast majority of real-time range imaging sensors nowadays rely on these two principles. It is worth noting that alternative principles for 3-D shape acquisition exist, such as shape-from-shading (also known as photometric stereo), interferometry, deflectometry, shape-from-texture, structure-from-motion and depth-from-focus/defocus. For a more generic overview of measurement principles for optical shape acquisition, we refer to Häusler and Ettl [Haus 11] and Stoykova et al. [Stoy 07].

Before we proceed, let us clarify that data acquired with RI devices is termed 3-D information in this thesis. We do not explicitly differentiate between 2.5-D data acquired from a single viewpoint, as is the case with today's RI cameras, and (full) 3-D data that can be obtained using reconstruction techniques in a multi-view setup or with tomographic scanners (CT/MR).

2.1.1 Triangulation

The most intensively explored principle in optical 3-D shape acquisition is triangulation. Let us differentiate between passive and active triangulation techniques.

Passive Triangulation. The class of 3-D acquisition techniques restricted to using the natural illumination of a scene is denoted passive triangulation. The most prominent example is binocular perception using stereo vision [Hart 04]. Similar to the human visual system, stereo vision uses a pair of images acquired with two cameras from different viewpoints in order to compute 3-D structure. In particular, based on the geometric principle of triangulation, the position of a point in 3-D space can be reconstructed if the positions of its projections are known in both images. More specifically, the underlying theory of epipolar geometry states that the projection rays associated with the point locations on the images (and known from camera calibration) intersect at the unknown 3-D point in space, see Fig. 2.1a. The relative difference in position of the projected point (disparity) quantifies the depth of the object in the scene. The larger the disparity, the closer the object.
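For a rectified stereo pair, this relation takes a simple closed form; the symbols below are generic textbook notation, not taken from this thesis:

```latex
% Depth from disparity in a rectified stereo setup:
% Z = depth, f = focal length [px], b = baseline, d = disparity [px].
Z = \frac{f\,b}{d},
\qquad
\sigma_Z = \left|\frac{\partial Z}{\partial d}\right| \sigma_d
         = \frac{Z^2}{f\,b}\,\sigma_d
```

The error propagation term makes explicit why the depth uncertainty of triangulation-based sensing grows quadratically with distance and shrinks with a longer baseline – which motivates the trade-offs discussed next.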
Although considerable progress has been made in stereo vision, systems require precise calibration and imply a substantial computational burden to establish dense point correspondences based on feature matching, even though the search space can be reduced using epipolar constraints. Depth accuracy scales with the triangulation angle and the length of the triangulation base. However, a larger base comes with increased occlusion effects. Furthermore, the recovery of depth inside homogeneous, texture-less image regions or in the presence of repetitive patterns is an ill-posed problem.

Active Triangulation. As opposed to passive triangulation approaches that solely rely on ambient scene illumination, active triangulation techniques for 3-D imaging typically use illumination units to project some form of light pattern onto the scene. A straightforward extension of passive stereo vision is the use of active pattern projection to simplify the correspondence problem in texture-less regions (active stereo vision). However, 3-D shape acquisition using active triangulation can also be achieved by using a projector in combination with only one camera. Here, one of the cameras of a stereo system is replaced with a light source that projects a controlled pattern onto the scene. Building upon the same measurement principle (triangulation geometry), the distance of an object can be determined based on prior knowledge of the extrinsic setup, i.e. the relative position and orientation of the light source w.r.t. the observing camera, see Fig. 2.1b. The correspondence problem is reduced to finding the known projected pattern. In the simplest case, the projected pattern may represent a single spot or a sheet of light, illuminating the scene with a single stripe. Active triangulation using the latter is commonly termed light sectioning (cf. Sect. 2.1.3), as the projected light sheet intersects with the 3-D scene geometry.

[Figure 2.1: Schematic illustration of the measurement principle of different RI technologies. From left to right: (a) Passive triangulation in stereo vision, where the position x ∈ ℝ³ is computed from its projections onto two different image planes, x_p,A, x_p,B ∈ ℝ². Note that the term triangulation stems from the triangle formed by the connection between the projection points and the associated projection rays. (b) Active triangulation using pattern projection, where one of the cameras is replaced by a projector; x_proj ∈ ℝ² is given by the projector geometry. (c) CW-based ToF imaging using intensity-modulated light, where the position x ∈ ℝ³ is deduced from the measured phase delay φ_tof.]

Spot- or stripe-based triangulation systems are limited to capturing one single point or a line profile at a time. These modalities are typically denoted 3-D scanners, as they scan the scene in a consecutive manner to recover the geometry. This facilitates the correspondence problem and produces highly accurate and reliable measurements, but involves moving mechanical or electro-mechanical parts and precludes the acquisition of dynamic scenes. From a theoretical point of view, arbitrary projection patterns can be used.
Range imaging modalities that use area patterns to capture the entire scene at a time are commonly denoted as structured light modalities. Using these technologies, no scanning is required. However, the correspondence between the projected and observed pattern is not obvious anymore and the projected pattern must be encoded. As a consequence, solving the correspondence problem induces a computational burden, even though this burden is low compared to passive systems. Along with advances in opto-electronics, various types of structured light patterns have been proposed in the literature over the years. Single-shot methods that are capable of reconstructing depth from one single static spatially-coded projection pattern use either monochrome [Garc 08] or color-coded patterns [Schm 12]. For the sake of completeness, we also refer to active triangulation sensors that are based on the projection of a time-series of patterns, such as binary temporal patterns [Sava 97] or phase shifting [Salv 04], which are unsuitable for dynamic RI.

Let us conclude that active triangulation techniques are an interesting alternative to passive triangulation. Typically, active techniques outperform passive triangulation in terms of density, reliability and acquisition speed. Nonetheless, both are based on the geometric principle of triangulation and share the same limitations regarding accuracy and occlusions.

2.1.2 Time-of-Flight Imaging

ToF imaging is an emerging active range imaging technology that provides a direct way of acquiring metric 3-D surface information by measuring the time that light travels from an illumination unit to an object and back to a sensor. Two complementary approaches to ToF imaging have been proposed in the literature [Lang 00, Kolb 09]. Pulse-based ToF imaging directly measures the period of time between emission of a pulse from an illumination unit and reception of the reflected pulse back at the sensor using a very short exposure window [Yaha 07]. The alternative, continuous wave (CW) modulation ToF imaging, measures the phase delay φ_tof between an actively emitted and the reflected optical signal [Xu 98, Oggi 04, Foix 11], see Fig. 2.1c.

CW-based ToF imaging is the most widely used approach in commercially available cameras. Let us briefly summarize the details: Active light sources attached to the camera illuminate the scene with an incoherent cosine-modulated optical signal in the non-visible spectrum of the infrared (IR) range. The light is reflected by the scene and enters the monocular camera, where each ToF sensor element performs a correlation of the local incoming optical signal with the electrical reference of the emitted signal. Based on this correlation, the phase shift φ_tof representing the propagation delay between both waves is measured by sampling the signal at equidistant points in time within a so-called integration time. The radial distance (range) r̂ from the sensor element to the object is then given as

    r̂ = c / (2 f_mod) · φ_tof / (2π),

where f_mod denotes the modulation frequency and c the speed of light. Due to the periodicity of the cosine-shaped modulation signal, the validity of this equation is limited to distances smaller than c / (2 f_mod). ToF devices are compact, portable, easy to integrate and deliver complementary grayscale intensities in real-time with a single sensor.
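The following self-contained Python sketch illustrates the CW measurement principle with the common four-phase ("four bucket") demodulation. The modulation frequency of 20 MHz is our own illustrative assumption – it yields a non-ambiguous range of c/(2 f_mod) ≈ 7.5 m, consistent with the 0.3-7.0 m measurement range of the CamCube in Table 2.1 – and the tap convention and names are ours, not a vendor API.

```python
import numpy as np

C = 299_792_458.0   # speed of light [m/s]
F_MOD = 20e6        # assumed modulation frequency [Hz] -> 7.5 m ambiguity range

def correlation_samples(distance, amplitude=1.0, offset=2.0):
    """Ideal (noise-free) correlation samples A_0..A_3 at phase taps
    0, 90, 180, 270 degrees for a target at the given distance [m]."""
    phi = 4.0 * np.pi * F_MOD * distance / C      # round-trip phase delay
    taps = np.array([0.0, 0.5, 1.0, 1.5]) * np.pi
    return offset + amplitude * np.cos(phi - taps)

def range_from_samples(a):
    """Four-phase demodulation: recover the phase, then convert to range.
    The constant offset (ambient light) cancels in both differences."""
    phi = np.arctan2(a[1] - a[3], a[0] - a[2])    # consistent with the model
    phi = np.mod(phi, 2.0 * np.pi)                # map into [0, 2*pi)
    return C / (2.0 * F_MOD) * phi / (2.0 * np.pi)

print(range_from_samples(correlation_samples(1.2)))  # recovers 1.2 m
print(range_from_samples(correlation_samples(8.0)))  # aliases to ~0.5 m
```

The second call demonstrates the periodicity limit stated above: a target at 8.0 m lies beyond the 7.5 m non-ambiguous range and is reported at roughly 0.5 m.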
2.1.3 Discussion of RI Sensors investigated in this Thesis

In this thesis, we consider three different real-time RI modalities. All three are single-shot techniques that return metric 3-D coordinates of the acquired scene. For a comparison of their specifications, see Table 2.1. Below, let us discuss the strengths and limitations of the individual modalities. First, we compare ToF imaging with dense structured light. Then, we discuss a novel RI sensor prototype based on multi-line light sectioning.

PMD CamCube. The PMD CamCube 2.0/3.0 is a CW ToF camera by PMD Technologies GmbH (http://www.pmdtec.com). It features a resolution of 204×204 px (2.0) or 200×200 px (3.0) and a full-resolution frame rate of 25 Hz (2.0) or 40 Hz (3.0), respectively. The IR illumination unit operates at a wavelength of 870 nm. The camera optics feature a field of view of 40°×40°. Until PMD Technologies dropped the CamCube camera line from its commercially available product portfolio in 2012, it was the highest resolution ToF sensor on the market.

Table 2.1: Overview of the specifications for the RI sensors investigated in this thesis. The uncertainty for the PMD CamCube 3.0 and Microsoft Kinect, respectively, is given as the standard deviation of the measured depth values over time at a distance of 1 m to a white plane. These results are approximately consistent with related work and manufacturer reports. For the MLT sensor, the mean measurement uncertainty within the measurement volume of 80×80×35 cm³ is reported. (The original table additionally depicts raw data from the three sensors, capturing a female torso phantom at a distance of 1.2 m. Image sources: PMD Technologies GmbH, Siegen, Germany; Microsoft Corporation, Redmond, USA; Chair of Optics, University Erlangen-Nürnberg, Erlangen, Germany.)

Specification      | PMD CamCube 3.0  | Microsoft Kinect   | MLT Sensor
-------------------|------------------|--------------------|--------------------
Principle          | Time-of-Flight   | Active Triang.     | Active Triang.
                   | (CW Modulation)  | (Structured Light) | (Light Sectioning)
Resolution [px]    | 200×200          | 640×480            | 1024×768 (sparse)
Frame rate [Hz]    | 40               | 30                 | 15
Meas. range [m]    | 0.3-7.0          | 0.8-4.0            | Custom
Field of view [°]  | 40×40            | 57×43              | 44×33
Uncertainty [mm]   | ±5.98            | ±0.92              | ±0.39
Price [€]          | 8000             | 100                | Prototype
Light source       | LED (870 nm)     | Laser (830 nm)     | Laser (660 nm)

The most substantial limitations of available ToF cameras are their low spatial resolution and low SNR. The former results from the fact that the complex circuitry for on-chip correlation entails substantial space requirements in terms of semiconductor area. The latter essentially results from limitations regarding the power of the emitted IR signal (a trade-off between accuracy, power consumption, frame rate, and eye safety regulations), a finite integration time, and physical constraints such as the measurement uncertainty increasing with the squared distance between the sensor and the object [Fran 09b].
Systematic errors include (1) distance-related measurement errors (aka. wiggling) that result from imperfections in the shape of the modulated IR signal, (2) amplitude-related errors due to a low strength of the reflected signal, saturation, and the dependency of depth measurements on object material and reflectivity, (3) pixel-related errors due to tolerances in the chip manufacturing process, (4) temperature-related errors due to the influence of temperature on semiconductor material properties that cause variations in the response behavior, (5) errors that result from the limited lateral resolution, such as ambiguities at depth discontinuities (aka. flying pixels), and (6) integration time-related errors that are not yet fully understood [Foix 11]. Note that the actual systematic error occurring in practice is a superposition of these individual sources of error. Non-systematic errors involve (1) motion blur in the presence of dynamic scenes, resulting from the underlying principle of reconstructing the phase from a set of time-delayed samples captured within a non-infinitesimal integration time, (2) internal light scattering effects between the camera lens and the sensor and subsurface scattering at the object, and (3) multi-path issues that result from the superposition and interference of responses received from different reflection paths at the same sensor element.

Early approaches to ToF camera calibration have particularly focused on the correction of systematic errors [Fuch 08a, Fuch 08b, Lind 10, Reyn 11, Schm 11] and put substantial effort into the theoretical and physical modeling of ToF sensors and their error sources using simulation frameworks [Kell 09, Schm 09]. Recently, we have noticed increased efforts to tackle non-systematic error sources, e.g. for the compensation of motion blur [Lee 12], light scattering [Wu 12], and multi-path issues [Fuch 10, Dorr 11].

Let us conclude that ToF imaging is a promising technology but is not yet considered mature for many practical and real-world applications. Nonetheless, it features several advantages compared to the triangulation techniques discussed below. ToF cameras do not require a baseline between the illumination unit and the sensor, allowing for compact designs and obviating the need for extrinsic calibration. ToF imaging inherently delivers aligned geometric depth and photometric intensity information. Depth data are acquired independently for each pixel and regardless of scene texture conditions, also avoiding the computationally expensive correspondence problem and, thus, enabling fast 3-D data acquisition. The non-ambiguous measurement range is highly scalable by modifying light power, integration time, or modulation frequency. Multi-camera setups are possible using distinct modulation frequencies. Furthermore, ToF imaging is robust to background illumination by on-chip filtering of the active transmitter signal from ambient light. Last, from a researcher's perspective, a considerable advantage is that most ToF manufacturers provide a comprehensive application programming interface (API) enabling software-based RI data enhancement. For instance, the application of compressed sensing techniques was recently proposed for ToF imaging [Cola 12, Tsag 12].
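As an illustration of such software-side correction, the distance-related wiggling error is commonly compensated with a per-camera look-up table obtained from calibration measurements against a known reference. The following is a minimal sketch under assumed data: the calibration arrays are stand-ins, not measurements of an actual CamCube.

```python
import numpy as np

# Hypothetical calibration data: raw distances reported by the camera and the
# systematic offsets observed against a ground-truth reference plane.
calib_raw_mm = np.linspace(500.0, 4000.0, 15)
calib_offset_mm = 8.0 * np.sin(calib_raw_mm / 300.0)  # stand-in wiggling curve

def correct_wiggling(depth_mm):
    """Subtract the interpolated systematic distance error (LUT-based)."""
    return depth_mm - np.interp(depth_mm, calib_raw_mm, calib_offset_mm)
```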
Microsoft Kinect. The Microsoft Kinect device features a conventional RGB camera (1280×1024 px, 30 Hz) that typically operates at a resolution of 640×480 px, an IR laser projector (830 nm) that projects a pseudo-random dot pattern, and a monochrome IR CMOS sensor (1280×1024 px, 30 Hz) equipped with an IR bandpass that observes the projected pattern in the scene. IR data are evaluated on a system-on-a-chip (SoC), generating range images at a maximum nominal resolution of 640×480 px at 30 Hz with 11-bit discretization [Ande 12, Catu 12, Khos 12, Smis 13]. The reconstruction of depth values from the observed dot pattern is based on correlation with known reference patterns. For a comprehensive description of the underlying reconstruction process, we refer to a series of patents from PrimeSense Ltd., Israel (http://www.primesense.com/), which originally developed the technique [Free 10b, Free 10a, Shpu 11, Zale 10].

Major advantages of the Microsoft Kinect compared to ToF cameras are its high spatial resolution, a better SNR, and fewer systematic errors. For instance, measurements are independent of the reflectivity of objects, as opposed to ToF sensors. Strong ambient illumination may reduce the contrast in the observed pattern, influencing the depth reconstruction quality. However, this is not an issue for the indoor applications considered in this thesis. Another advantage compared to ToF imaging is the independence of the measurements from the scene geometry (no multi-path issues). As the Kinect is a closed (black-box) system, let us remark that there is no insight into the internal pre-processing of depth data. Hence, it is unclear whether the higher SNR results from more reliable measurements or from pre-processing in the SoC unit. Compared to previous triangulation-based sensors, a substantial advance was solving the correspondence problem efficiently on a low-cost SoC processor, allowing for real-time depth reconstruction. Beside these technological aspects and the engineering achievement of producing the most dense real-time capable RI camera, the key factor for the success of the Microsoft Kinect was its pricing, which could be achieved using established IR sensor technology in combination with a dedicated SoC processor and mass market quantities.

Let us briefly summarize its limitations [Khos 12, Shim 12]: First, the depth resolution degrades and the measurement uncertainty grows quadratically with increasing sensor-object distance. Second, a common problem of the Microsoft Kinect is the incapability to recover depth in regions with non-diffuse highlights or translucent objects. Specular highlights typically cause total reflection and sensor saturation; translucent objects result in refraction and subsurface scattering effects preventing or distorting the reflection back to the sensor. Third, like other triangulation-based RI systems, the Microsoft Kinect suffers from partial occlusions in regions that are illuminated by the projector but not captured by the IR sensor, leading to missing areas where depth cannot be reconstructed. Note that the requirement of a baseline also limits the degree of miniaturization and entails the danger of de-calibration over time due to physical stress. Fourth, multi-camera setups suffer from interferences, impairing depth reconstruction quality. Last, the Microsoft Kinect is an off-the-shelf consumer electronics device with a preset, fixed measurement range and restricted access to internal parameterization and raw sensor data.
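The first limitation follows directly from the triangulation geometry: with the common pinhole model z = f·b/d (focal length f, baseline b, disparity d), first-order error propagation yields σ_z ≈ z²/(f·b)·σ_d. A minimal sketch, with parameter values taken as rough literature figures rather than vendor specifications:

```python
def depth_std_m(z_m, focal_px=580.0, baseline_m=0.075, disparity_std_px=0.08):
    """Triangulation error propagation: z = f*b/d  =>  sigma_z ~ z^2/(f*b) * sigma_d."""
    return z_m**2 / (focal_px * baseline_m) * disparity_std_px

for z in (1.0, 2.0, 4.0):
    print(z, depth_std_m(z))  # doubling the distance quadruples the uncertainty
```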
Multi-line Triangulation Sensor. The multi-line active triangulation (MLT) sensor used in this work was recently introduced for the interactive reconstruction of dense and accurate 3-D models, based on the so-called principle of Flying Triangulation [Ettl 12b]: A hand-guided sensor is moved around an object while continuously capturing camera images of a projected line pattern. Each camera image delivers sparse 3-D measurements, which are aligned to previously acquired data in real-time. The principle is scalable, and three sensors have been realized so far: an intra-oral teeth sensor, a face sensor, and a body sensor. The sensors are designed for minimal measurement uncertainty within their respective measurement frustum. As light sources, LEDs or lasers can be used.

In this thesis, we propose to apply the MLT sensor in a non-moving way, see Fig. 2.2: The sensor is rigidly mounted and captures the scene from one static viewpoint, continuously delivering a stream of sparse 3-D data of the observed geometry. In detail, sets of 11 horizontal and 10 vertical lines are projected onto the scene alternately, using two laser line pattern projection systems (660 nm). The patterns are observed by a synchronized CCD camera with a resolution of 1024×768 px [Ettl 12b] and a frame rate of 30 Hz. Half of the lines of the measurement grid are updated from frame to frame, see Fig. 2.2. Hence, two consecutive perpendicular sets of line profiles describe the surface topography within a time window of 1/30 second, and a fully updated set of horizontal and vertical measurements is available every 1/15 second, i.e. at an effective frame rate of 15 Hz. Thus, the sensor can acquire sparse but highly accurate 3-D data in real-time.

Figure 2.2: Schematic illustration of the proposed measurement principle for the MLT sensor. From a static viewpoint, the single-shot sensor acquires camera images of sets of horizontal and vertical lines that are projected onto the scene alternately. Based on these 2-D images, the associated 3-D measurements are reconstructed (triangulation) and merged into a stream of spatio-temporal 3-D sampling grids (times t1, t2, t3).

The MLT sensor can potentially be manufactured at low cost, is compact, and relies on the established principle of light sectioning, enabling high-precision surface sampling. The robustness of the system is based on the high-contrast laser signal, even in the presence of fabric and/or skin. Unlike light-sectioning RI systems that capture 3-D contours consecutively over time while sweeping a single laser line, the MLT sensor acquires information along multiple lines (simultaneously) and in both directions (alternately) by projecting two orthogonal line patterns. It features an optimized spatial resolution along the measurement grid, as only a direct localization of the observed lines is needed, compared to the neighborhood correlation in depth reconstruction techniques based on speckle pattern projection.
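The interleaved update scheme can be made explicit with a small sketch: pairing each horizontal-line frame with the subsequent vertical-line frame yields fully updated grids at half the camera frame rate. The data layout is illustrative.

```python
def merged_grids(frames):
    """Merge an alternating 30 Hz stream of horizontal/vertical line
    measurements into fully updated sampling grids at an effective 15 Hz.
    Each frame is assumed to be a list of sparse 3-D line measurements."""
    for horizontal, vertical in zip(frames[0::2], frames[1::2]):
        yield horizontal + vertical
```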
2.2 Range Image Processing

As indicated before, the introduction of low-cost devices for real-time 3-D perception has attracted a great deal of attention. However, the state of the art lacks a software library that explicitly addresses the real-time processing of dense range image and 3-D point cloud streams. We have developed an open source platform that addresses the particular demands of modern RI sensors (Sect. 2.2.1). In this section, we further present an integrated RI simulation environment (Sect. 2.2.2) and a brief overview of range image enhancement (Sect. 2.2.3). For a recapitulation of the concept of perspective projection and of how the inversion of this projection is employed to calculate 3-D positions from the pixel-wise distance measurements of RI sensors, we refer to the Appendix of this thesis (Sect. A.1).

2.2.1 RITK: The Range Imaging Toolkit

In the computer vision community, several open source software libraries for general purpose 2-D and 3-D image processing exist today [Iban 05, Schr 06, Brad 00, Rusu 11]. However, most of these libraries only provide basic functionality for the processing of static 3-D point clouds or surfaces, and there exists no open source framework that is explicitly dedicated to the real-time processing of range image streams. During the course of this thesis, we have developed a powerful yet intuitive software platform that facilitates the development of RI applications: the range imaging toolkit (RITK) [Wasz 11b]. It is a cross-platform and object-oriented toolkit explicitly dedicated to the processing of high-bandwidth data streams from modern RI devices. RITK puts emphasis on real-time processing, using dedicated pipeline mechanisms and user-friendly interfaces for efficient range image stream processing on modern many-core graphics processing units (GPUs). Furthermore, RITK takes advantage of the interoperability of general purpose computing on the GPU and rendering for real-time visualization of dynamic 3-D point cloud data. Being designed thoroughly and in a generic manner, the toolkit is able to cope with the broad diversity of data streams provided by available RI devices and can easily be extended by custom sensor interfaces and processing modules. The toolkit can support developers in two ways: First, it can be used as an independent library for range image stream processing within existing software. Second, it supports developers at an application level with a comprehensive software development and rapid prototyping infrastructure for the creation of application-specific RI solutions. Due to its generic design, existing modules can be reused to assemble individual RI processing pipelines at run-time. RITK is an open source project and publicly available online (http://www5.cs.fau.de/ritk). In our experience, it has proved to greatly reduce the time required to develop RI applications. Hence, we feel confident that other researchers in the rapidly growing community will also benefit from RITK.

2.2.2 Virtual RGB-D Camera

We have implemented a range image stream simulator that can generate virtual RGB-D data in the same representation as a real RI camera would acquire it. In particular, it produces range data based on the OpenGL depth buffer representation of a given 3-D scene. The simulator allows experimenting with modality-dependent sensor resolutions, noise characteristics, and artifacts that occur with different RI sensors, while providing an absolute ground truth for evaluation purposes. Among others, we use it for the quantitative evaluation of the rigid and non-rigid surface registration algorithms proposed in this thesis.
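Such a simulator has to undo the non-linear depth encoding of the graphics pipeline. A minimal sketch of this step, assuming the standard OpenGL perspective projection; the clip-plane values are illustrative and would be chosen to match the simulated sensor's measurement range:

```python
import numpy as np

def depth_buffer_to_view_depth(d, z_near=0.3, z_far=7.0):
    """Convert normalized OpenGL depth-buffer values d in [0, 1] to metric
    view-space depth via the standard perspective mapping:
        z = 2*n*f / (f + n - (2*d - 1)*(f - n)).
    Per-pixel radial range then follows by inverting the perspective
    projection (cf. Sect. A.1)."""
    z_ndc = 2.0 * d - 1.0
    return 2.0 * z_near * z_far / (z_far + z_near - z_ndc * (z_far - z_near))
```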
2.2.3 Range Data Enhancement

As detailed in Sect. 2.1.3, available RI cameras typically exhibit low SNRs and may entail invalid or unreliable measurements that result in incomplete data, due to the underlying sampling principles and physical limitations of the sensors. Consequently, the enhancement of the raw range measurements provided by RI cameras is a fundamental premise for medical applications that require a high level of accuracy and reliability in shape information while meeting real-time demands. For the applications addressed in this thesis, a typical RI pre-processing pipeline consists of two basic components: (1) restoration of invalid measurements and (2) temporal and spatial edge-preserving denoising [Wasz 11a, Wasz 11c]. Note that range data enhancement is usually performed in the 2-D sensor domain, which facilitates real-time RI data processing.

Restoration of Invalid Measurements. Prior to the denoising of RI data, the restoration of invalid measurements has to be taken into consideration. As opposed to static defect pixels, such invalid measurements occur unpredictably and can affect both isolated pixels and connected local regions. Note that in practice, some RI sensors provide a pixel-wise reliability measurement. In the literature, a variety of methods for defect pixel correction has been proposed. Given the trade-off between effectiveness and complexity, conventional approaches like normalized convolution [Knut 93, Fran 09a] provide satisfactory results [Wasz 11a] (see the sketch at the end of this section). It is worth noting that restoring missing data in a first step renders an extra conditioning of invalid data unnecessary for subsequently applied denoising algorithms.

Temporal and Spatial Denoising. The second component of a common RI data enhancement pipeline is concerned with both temporal and spatial denoising. When capturing static scenes, temporal measurement fluctuations can be reduced by averaging over a set of subsequent frames [Lind 10]. In practice, we typically applied equally-weighted averaging, or bilateral temporal averaging for dynamic scenes [Wasz 11a]. The length of the time interval considered for temporal denoising highly depends on the dynamics of the particular application. For low-frequency scenarios such as respiratory motion tracking, and given real-time RI acquisition frame rates, averaging frames within a finite time interval is typically acceptable. After temporal denoising, we apply edge-preserving spatial filtering. In the field of image processing and computer vision, a variety of spatial denoising approaches has been introduced in recent years. Edge-preserving filters that smooth homogeneous regions while preserving manifest discontinuities are of special interest for RI data enhancement. In this context, one of the most popular and established methods is the bilateral filter [Auri 95, Toma 98]. Beyond its application to a multitude of conventional imaging modalities, it is a common choice for RI denoising [Lind 10]. The filter is straightforward to implement, but exhibits poor run-time performance due to its non-stationary nature. A promising alternative is the recently introduced concept of guided filtering [He 13], with a computational complexity that is independent of the filter kernel size. At the same time, it exhibits a comparable degree of edge-preserving smoothing.

Let us conclude with a word on filter parameterization. Regarding the trade-off between data denoising and the preservation of topographical structure, we typically perform RI data pre-processing in a way that gives priority to surface smoothness. Insufficient filtering results in topographical artifacts that may have a strong influence on algorithmic performance.
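To conclude, a minimal sketch of the restoration step referenced above, using normalized convolution with a Gaussian applicability function: the depth image and its validity mask are smoothed separately and divided, so that invalid pixels are filled from their valid neighborhood.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def restore_invalid(depth, valid_mask, sigma=2.0):
    """Normalized convolution [Knut 93]: fill invalid pixels (certainty 0)
    from the weighted average of valid neighbors; valid pixels are kept."""
    certainty = valid_mask.astype(float)
    num = gaussian_filter(depth * certainty, sigma)
    den = gaussian_filter(certainty, sigma)
    filled = num / np.maximum(den, 1e-6)
    restored = np.where(den > 1e-6, filled, depth)
    return np.where(valid_mask, depth, restored)
```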
2.3 Applications in Medicine

In this section, we present a brief state-of-the-art survey on the integration of modern RI sensors in medical applications. For a comprehensive review we refer to [Baue 13a]. We focus on dynamic and interactive tasks where real-time and dense 3-D vision forms the key aspect. The survey is divided into four thematic fields: prevention and support in elderly care and rehabilitation, room supervision and workflow analysis, touch-less interaction and visualization, and guidance in computer-assisted interventions.

2.3.1 Prevention, Diagnosis and Support

Activity Assessment and Remote Support in Elderly Care. In-home activity assessment in elderly care is a rapidly evolving field. Systems focus on monitoring the health status, sharing information about presence and daily activities, and providing on-line assistance and coaching. This allows older adults to continue life in independent settings. Low-cost RI holds great potential in this context. For instance, the recognition of early indicators of functional decline, such as deviations in gait, using pose estimation [Mosh 12, Garc 12, Ston 11b, Ston 11a] can help prevent accidents; the automatic detection of abnormal events such as falls [Parr 12] can improve the response time in emergency situations; and retrospective data analysis can help in understanding the mechanisms that led to an event.

Early Diagnosis and Screening. The detection of abnormal behavior based on RI technologies also holds potential for early diagnosis and screening, for different groups of patients. Information about daily lifestyle and deviations from the normal can help in early diagnosis or progression analysis for cognitively impaired people, such as Alzheimer's [Coro 12] or Parkinson's disease patients [Mosh 12]. Low-cost RI is also considered for the large-scale screening of at-risk groups. In developmental disorders such as autism and schizophrenia, observing behavioral precursors in early childhood using 3-D perception for activity recognition [Siva 12, Walc 12] can allow for early intervention and thus improve patient outcomes. In sleep monitoring, RI is gaining interest for the non-contact measurement of sleep conditions [Yu 12] or the diagnosis of sleep apnea [Fali 08].

Therapy Support and Progress Monitoring in Rehabilitation. RI sensors have also attracted interest in the field of physical therapy. "Serious games" in rehabilitation have been shown to increase the motivation of the patient, thus improving exercise performance and rehabilitation outcomes [Smit 12]. RI-based games are of particular interest, as the embedded sensors simultaneously allow for a quantitative assessment of performance. Low-cost RI systems have lately been considered for tele-rehabilitation techniques that are beneficial for translating skills learned in therapy to everyday life. Furthermore, RI-based rehabilitation systems for physically disabled patients [Chan 11, Gama 12, Huan 11], chronic pain patients [Scho 11] and patients after neurological injuries [Chan 12, Lang 11] have been proposed.

Aids for Handicapped People. Recently, first approaches toward the use of assistive technologies to support handicapped people were proposed [Hers 08]. For instance, the integration of an RI device into an augmented blindman's cane or head-mounted systems could aid visually impaired people in navigation by identifying and describing surroundings beyond the limited sensing range of a physical cane [Gall 10, Katz 12, Ong 13].
Low-cost RI can also be used with autonomous transportation vehicles that follow handicapped people using 3-D perception.

2.3.2 Monitoring for OR Safety and Workflow Analysis

Room Supervision for Safety in Robot-assisted Interventions. Monitoring the working area of operating rooms (OR) using a multi-camera setup of conventional cameras [Ladi 08, Nava 11] or multiple 3-D cameras can help improve both medical staff and patient safety as well as the efficiency of workflows. In particular, collision avoidance is an emerging topic with the increased use of robotic workspaces, ensuring safe human-robot interaction [Monn 11, Nico 11]. We further refer to the EU projects SAFROS (http://www.safros.eu/), targeting patient safety in robotic surgery, and ACTIVE (http://www.active-fp7.eu/), involving camera-based OR monitoring to ensure safe workspace sharing between people and robotic instruments.

Monitoring, Analysis, and Modeling of Workflows. In addition to ensuring safe human-robot interaction, OR workspace monitoring is of interest for the analysis and modeling of workflows. In intensive care units, 3-D perception and automatic activity recognition hold potential for workflow supervision and documentation, and can help improve workflow efficiency [Lea 12]. Another interesting research direction addresses the development of a context-aware system for surgical interventions that is able to recognize the surgical phase within the procedure and support the surgeon with appropriate visualization [Pado 12]. We also refer to the MICCAI workshop series on Modeling and Monitoring of Computer Assisted Interventions (M2CAI).

2.3.3 Touchless Interaction and Visualization

Touchless Interaction in Sterile Environments. The advent of touchless real-time user-machine interaction that came along with the introduction of low-cost RI sensors has also evoked interest in the medical domain. In particular, gesture control holds potential for touch-less interaction in interventional radiology, where surgeons need to remain sterile but often want to navigate patient data and volumetric scans (CT/MR) [Foth 12, Ment 12, John 11, Bigd 12a, Bigd 12b, Stau 12, Sout 08, Gall 11]. Gesture-based techniques are also being considered for the fine control of surgical tools.

On-patient Visualization of Medical Data. On-patient visualization of anatomical data can be used for medical education and training, intervention planning, and further applications that require an intuitive visualization of 3-D data. Basically, using an RI camera, the system tracks the pose of the patient w.r.t. the camera's coordinate system and blends anatomical information onto an augmented reality display that can be mounted rigidly [Blum 12, Nava 12] or be portable [Maie 11]. The former addresses anatomy teaching; the latter can supersede the traditional mental transfer of medical image data visualized on a wall-mounted monitor onto the patient.

2.3.4 Guidance in Computer-assisted Interventions

3-D Endoscopy for Minimally Invasive Procedures. Knowledge about the local surface geometry during minimally invasive procedures holds great benefits compared to conventional 2-D endoscopy. Typically, 3-D data improve robustness for applications such as instrument localization, collision detection, metric measurements, and augmented reality, hence improving both the quality and the efficiency of minimally invasive procedures.
Over the past years, various competing technologies have been proposed, such as stereo vision [Stoy 12], photometric stereo [Coll 12], color-coded structured light [Schm 12], or ToF [Haas 12, Haas 13, Penn 09].

Patient Localization, Setup and Tracking. The automation of patient setup and position tracking is of particular interest for repeat treatments such as fractionated radiation therapy, where the tumor is irradiated in a set of treatment sessions and reproducible patient setup is a key component for accurate dose delivery. Proposed solutions in RT rely on active stereo imaging with interactive frame rates [Peng 10], ToF imaging [Baue 12b, Plac 12], Microsoft Kinect [Baue 11a, Wasz 12b], and light sectioning [Brah 08, Ettl 12a]. Beside RT, the estimation of patient position, orientation and pose [Grim 12, Scha 09], and the localization of accessories such as MR coils [Simo 12], also hold potential for diagnostic and interventional imaging.

Motion Management. Tracking the motion of patients by monitoring their external body surface is an essential prerequisite for motion compensation techniques. Motion compensation is of particular interest in RT for abdominal and thoracic regions, where motion induces a substantial source of error, and holds potential for improvements in tomographic image acquisition. In contrast to early strategies that were restricted to acquiring a 1-D respiration curve [Scha 08], the methods proposed in this thesis recover dense deformation fields that better reflect the complexity of respiratory motion and allow for an automatic distinction between abdominal and thoracic respiration types [Wasz 12a]. Solutions for both dense [Baue 12b, Baue 12d] and sparse RI data [Baue 12c, Baue 12a] are presented in Part II, tailored to the applied sensor technology. Vice versa, motion tracking and compensation can also be applied to improve patient setup accuracy [Wasz 12b, Wasz 13].

Guidance, Navigation and Augmented Reality. Real-time RI provides the basis for the intra-operative acquisition of the 3-D operation area and the registration of organ surfaces to pre-operative data in image-guided interventions, both for minimally invasive and open surgery. Among others, RI has been proposed for marker-free needle tracking [Wang 12] and fusion with pre-operative data for augmented reality applications [Maie 12, Seit 12, Sant 12b, Sant 12a, Mull 11, Sant 10].

2.4 Surface Registration

Along with recent advances in dense and real-time 3-D RI and the availability of 3-D databases, the analysis of geometric shapes has been assuming an increasingly important role. In particular, the identification of corresponding elements between two or more given shapes represents a fundamental prerequisite for a wide range of applications. Below, we subsume the variety of names for this task that appear in the literature (shape correspondence, geometry registration, surface matching, model alignment) under the term surface registration. In general, the process of surface registration aims at recovering a transformation that brings a given moving template shape into congruence with a fixed reference shape and involves three basic components:

1. The choice of an appropriate geometric transformation model
2. The definition of an objective function based on a suitable similarity or distance metric
3. The application of an efficient optimization strategy that estimates the optimal transformation parameters w.r.t. the objective function
The actual choices for these three components highly depend on the problem at hand. In this section, we summarize related work and differentiate between local vs. global and rigid vs. non-rigid surface registration. In addition, we provide an overview of medical applications. For a more comprehensive overview we refer to the surveys by van Kaick et al. on shape correspondence [Kaic 11], Salvi et al. on range image registration [Salv 07], Audette et al. and Sotiras et al. on medical image registration [Aude 00, Soti 12], and Heimann and Meinzer on statistical shape modeling [Heim 09]. In order to focus on the methods that are most relevant to this thesis, the review is restricted to fully-automatic approaches that do not require user input. Furthermore, we consider the case of pairwise surface registration between two given shapes, as opposed to group-wise shape registration. The review does not cover methods that cannot cope with partial matching, nor computationally highly expensive methods for recovering large-scale deformations such as approaches based on graph embeddings [Mate 08].

2.4.1 Global vs. Local Surface Registration

First, let us distinguish between global (aka. coarse) surface registration methods that explore the entire solution space and local (aka. fine) registration approaches that rely on an initial estimate and/or assume a similar pose between template and reference. For applications where the latter assumption does not hold, typically a two-stage procedure is applied, combining a pre-alignment using global matching strategies with a subsequent local registration refinement technique that is initialized with the previously computed coarse guess.

In the general global registration case, the goal is to find correspondences without pre-aligning the datasets. A popular approach is the use of feature descriptors that encode the local 3-D surface topography of the fixed and moving shape. These are then matched to establish point-to-point correspondences. Based on these correspondences and geometric consistency criteria, the transformation to align the template with the reference can be established. Salient points may be localized prior to the stages of feature description and correspondence search in order to keep the dimension of the search space manageable. In terms of an optimization problem, the objective function consists of a matching term that enforces similarity of the feature descriptors and is subject to constraints that quantify the degree of shape deformation. A common constraint for global registration techniques assesses the disparity in the distances and normal orientations between pairs of corresponding points, approximating the compatibility between pairs of assignments without first aligning the shapes. In the past decade, a variety of 3-D shape descriptors has been proposed, e.g. based on spherical point signatures [Chua 97], spin images [John 99], spherical harmonics [Kazh 03, Khai 08], shape context [Belo 00, From 04], integral descriptors [Gelf 05, Mana 06], multi-scale features [Li 05], salient geometric features [Gal 06], and diffusion geometry features such as heat diffusion signatures [Sun 09].
While traditional feature-based techniques were developed for the rigid registration case, addressing globally or piecewise rigid (articulated) motion, they have also been explored for non-rigid registration scenarios [Bron 11, Zaha 09, Sun 09], cf. Sect. 2.4.2. For a concrete application of 3-D shape descriptors in rigid surface registration we refer to Chap. 3.

Global registration approaches are typically used to guide local registration techniques that refine the initial solution. However, many scenarios involve only a slight misalignment of the fixed and moving shapes and do not require an initialization using the aforementioned techniques. For instance, assuming a similar pose is valid for the registration of data streams acquired with a hand-guided RI device (Chap. 4) or for tracking the motion of a respiring patient on a non-moving treatment table from a static viewpoint (Chap. 5).

2.4.2 Rigid vs. Non-Rigid Surface Registration

In contrast to the previous distinction between global and local registration approaches, let us now give an overview of related work w.r.t. the class of the underlying transformations mapping a moving template shape onto a fixed reference. In the rigid case, the transformation is constrained to global translations and rotations. The non-rigid case involves elastic deformations.

Rigid Surface Registration. Global surface registration in the rigid case, e.g. in the presence of gross misalignments, is typically addressed using feature-based approaches. As we already sketched the basic idea behind this class of methods, including popular 3-D shape descriptors, in Sect. 2.4.1, let us here confine the discussion to the case of rigid local surface registration. The most prominent and widely used algorithm for this task is the iterative closest point (ICP) algorithm, originally introduced by Besl and McKay [Besl 92], Chen and Medioni [Chen 92], and Zhang [Zhan 94]. In an iterative manner, the ICP algorithm alternates between finding point correspondences in a nearest neighbor relationship and estimating the rigid transformation that optimally aligns the correspondences in a least-squares sense, minimizing their distance using a closed-form solution [Horn 87], cf. Chap. 4. The underlying objective function consists of a matching term that quantifies how well the datasets align to each other. It is worth noting that ICP convergence depends on the quality of the initial pre-alignment. Over the years, numerous variants of the ICP algorithm have been proposed [Rusi 01], with a focus on robustness to noise and outliers, efficiency, and alignment accuracy, e.g. considering different distance metrics (point-to-plane vs. point-to-point) to avoid snap-to-grid effects or addressing anisotropic noise [Maie 12]. Furthermore, promising alternatives that avoid the correspondence search in the first stage have been proposed, e.g. using probabilistic [Jian 05], implicit [Zhen 10, Rouh 11] or distance field representations [Fitz 03]. A prominent practical example of rigid surface registration is the alignment of 3-D scans [Gelf 05, Paul 05, Aige 08]. Assuming rigidity of the scanning target, a classical approach to scan alignment is a two-stage scheme combining a feature-based estimation of the gross transformation with an ICP-like refinement technique. However, even though the scanning target is a rigid body, non-linearities or calibration uncertainties of the acquisition device may induce low-frequency non-rigid deformations [Brow 07].
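To make the alternation scheme concrete, below is a minimal point-to-point ICP sketch. For the closed-form alignment step it uses the SVD-based (Kabsch) solution, which is equivalent to Horn's unit quaternion method [Horn 87]; robustness measures such as outlier rejection are omitted.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(moving, fixed, n_iter=30):
    """Minimal point-to-point ICP: alternate nearest-neighbor correspondence
    search and closed-form rigid alignment. Returns cumulative (R, t) such
    that moving @ R.T + t approximates fixed."""
    R, t = np.eye(3), np.zeros(3)
    tree = cKDTree(fixed)
    for _ in range(n_iter):
        cur = moving @ R.T + t
        _, idx = tree.query(cur)                  # nearest-neighbor matches
        src, dst = cur, fixed[idx]
        sc, dc = src.mean(axis=0), dst.mean(axis=0)
        H = (src - sc).T @ (dst - dc)             # cross-covariance
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        dR = Vt.T @ D @ U.T                       # incremental rotation
        dt = dc - dR @ sc                         # incremental translation
        R, t = dR @ R, dR @ t + dt
    return R, t
```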
Non-Rigid Surface Registration. Non-rigid surface registration (aka. deformable shape matching) is essential for various applications. One particular direction of research is building statistical shape models that can be used for model morphing [Hasl 09] or as prior knowledge in related tasks, for instance guiding segmentation approaches in medical image computing [Heim 09], cf. Sect. 2.4.3.

Among the most popular approaches to non-rigid surface registration are modifications of the original ICP algorithm and its variants. Instead of assigning discrete one-to-one point correspondences based on a nearest neighbor criterion, the assignment is relaxed to a continuously valued confidence measurement. Typically, correspondences are established between all combinations of points according to some probability. Hence, these soft assignment techniques can be considered a generalization of the binary assignment in the classical ICP. An inherent advantage of such a weighted correspondence framework is its capability to handle noise and outliers. In the past decade, several soft assignment approaches have been proposed. One of the first works in this field is the robust point matching (RPM) framework by Chui and Rangarajan [Chui 03]. In combination with a thin plate spline (TPS) parameterization of the underlying elastic transformation [Book 89], it alternates between a soft assignment update using deterministic annealing and the estimation of the TPS parameters, and has become one of the most popular approaches (TPS-RPM) for non-rigid surface alignment. The work by Myronenko and Song [Myro 10] proposes a similar alternating strategy that considers the alignment as a probability density estimation problem. Basically, they interpret the template point set as Gaussian mixture model (GMM) centroids that are fit to the reference point set by maximizing the likelihood in an expectation-maximization (EM) framework [Demp 77]. To ensure a smooth motion field, implying the preservation of topographical structure, the GMM centroids are constrained to move coherently. In contrast to Chui and Rangarajan [Chui 03], Myronenko and Song model the non-rigid transformation in a nonparametric manner. Related work by Tsin and Kanade [Tsin 04] and Jian and Vemuri [Jian 05] considers the surface registration problem as an alignment between two distributions, both modeled as GMMs. In the non-rigid case, one of the models is parameterized by TPS and the transformation parameters are estimated to minimize the statistical discrepancy between the two mixtures. For a common viewpoint on these closely related methods [Chui 03, Jian 05, Myro 10] and their relation to the classical rigid ICP [Besl 92, Chen 92], we refer the interested reader to the generalized framework by Jian and Vemuri [Jian 11]. Methods proposed for non-rigid surface registration along the basic idea of ICP alignment typically combine non-probabilistic closest point measures with an additional regularization formulation to enforce shape preservation [Alle 03, Ambe 07].
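Returning to the soft assignment idea, the following sketch computes the probabilistic correspondence weights that replace ICP's binary assignment, in the spirit of the E-step of CPD [Myro 10]; the uniform outlier term and the subsequent M-step are omitted for brevity.

```python
import numpy as np

def soft_assignment(template, reference, sigma2):
    """P[m, n]: probability that template point m corresponds to reference
    point n, from an isotropic Gaussian on the pairwise distances. Rows are
    normalized, so each template point distributes one unit of confidence."""
    d2 = ((template[:, None, :] - reference[None, :, :]) ** 2).sum(axis=-1)
    P = np.exp(-d2 / (2.0 * sigma2))
    return P / np.maximum(P.sum(axis=1, keepdims=True), 1e-12)
```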
Another direction in non-rigid surface registration builds on implicit shape representations such as 3-D distance functions [Jone 06], embedding the problem of shape correspondence into a volumetric image domain. In the work of Paragios et al. [Para 03] and Huang et al. [Huan 06], both the template and the reference shape are embedded into a distance transform space. The alignment procedure itself is similar to volumetric non-rigid image registration algorithms, dividing into nonparametric [Para 03] and parametric [Huan 06] models such as free-form deformations [Ruec 99]. An interesting generalization of distance transform based registration in the volumetric domain is the encoding of shapes into vector-valued feature images, where a voxel holds a set of complementary descriptor values [Tang 08a].

2.4.3 Medical Surface Registration

In medical image computing, establishing dense correspondences between surface data is a fundamental requirement for the analysis of anatomical shapes, which can be of approximately rigid (e.g. bones) or highly elastic (e.g. organs) nature and vary across age, gender, ethnicity, and disease. Over the past decade, statistical shape modeling and atlas generation have evolved into an established subfield [Heim 09]. Medical shape modeling typically builds on volumetric imaging modalities (CT/MR/PET/SPECT), and extracted shapes are often represented as a binary volume. Hence, surface registration in medical image computing assumes a special role compared to other fields such as computer graphics, where shapes are generally represented as meshes. Medical surface registration approaches can be divided into three classes: mesh-to-mesh, mesh-to-volume and volume-to-volume shape alignment. Rigid mesh-to-mesh registration approaches using Procrustes analysis [Jens 12] or ICP variants [Albr 12] are typically applied when shapes are expected to be similar, e.g. bones [Vos 04]. For anatomical structures that exhibit a substantially higher degree of variation, non-rigid mesh-to-mesh techniques are used [Huan 06, Tang 08a, Myro 10]. Notable early works are the parametric solutions by Subsol et al. [Subs 98] and Fleute et al. [Fleu 99]; for a survey we refer to Audette et al. [Aude 00]. Different to this first class of mesh-to-mesh registration techniques, mesh-to-volume alignment is a popular option. Here, a deformable shape model represented as a morphable mesh is adapted to a volumetric segmentation, or used to actually perform the segmentation in the original intensity domain [Fleu 02]. The third option is volume-to-volume registration, either on binary segmentation data [Fran 02] or in the original intensity domain [Ruec 03]. As indicated before, the widespread application of this last class of techniques in medical shape registration is due to the fact that the original representation is a volume. Let us remark that volume-to-volume approaches can also be applied to shapes that were originally represented as meshes, by transforming them into a volumetric representation. On the downside, however, volumetric registration typically implies a substantial computational burden, since it aligns not only the target but also the background. Independent of the underlying methodology, surface registration has been applied to a variety of anatomical structures such as cardiac ventricles [Huan 06, Myro 10], cerebral structures [Rang 97, Jian 05, Tang 08a, Comb 10, Kurt 11], abdominal organs [Clem 08, Sant 10], cartilage tissue [Jens 12], and bone structures [Gran 02, Sesh 11, Gies 12], among others.

2.5 Discussion and Conclusions

In this chapter, we have presented a comprehensive overview of real-time range imaging technologies and have identified promising applications in different fields of health care.
Many of these applications involve the alignment of an acquired 3-D shape with a reference model. This underlines the demand for dedicated surface registration approaches that are tailored to cope with the specific properties of RI data. In the remainder of this thesis, we propose both rigid and non-rigid surface registration techniques that are optimized w.r.t. the strengths and limitations of different RI technologies, and describe novel concepts to improve the quality, safety and efficiency of clinical workflows.

Part I: Rigid Surface Registration for Range Imaging Applications in Medicine

Chapter 3: Feature-based Multi-Modal Rigid Surface Registration

3.1 Medical Background
3.2 Related Work
3.3 Feature-based Surface Registration Framework
3.4 Shape Descriptors
3.5 Experiments and Results
3.6 Discussion and Conclusions

The robust alignment of surface data acquired with different modalities holds great potential for medical applications but has rarely been addressed in the literature. In this chapter, we present a generic framework for multi-modal rigid surface registration. Facing large misalignments and partial matching scenarios, we chose a feature-based registration approach. It is based on matching feature descriptors that encode the local 3-D surface topography to establish point correspondences between a moving template and a fixed reference shape. Based on these correspondences, the rigid transformation that brings both shapes into congruence is estimated. We put particular emphasis on the conception of shape descriptors that are capable of coping with surface data from different modalities. This implies inconsistent mesh density, mesh organization, and inter-modality deviations in surface topography that result from the underlying sampling principles and noise characteristics. We have adapted state-of-the-art descriptors to meet these specific demands and handle the required invariances. In the experiments, we address two different clinical applications:

• RI/CT torso surface registration for automatic initial patient setup in fractionated radiation therapy [Baue 11a]
• RI/CT organ surface registration for augmented reality applications in image-guided open liver surgery (IGLS) [Mull 10, Mull 11]

The remainder of this chapter is organized as follows: The medical background is depicted in Sect. 3.1. In Sect. 3.2, we review relevant literature. The proposed rigid surface registration framework is introduced and detailed in Sects. 3.3-3.4. Experimental results for torso and organ surface registration are given in Sect. 3.5. Eventually, we discuss the results and draw conclusions in Sect. 3.6. Parts of this chapter have been published in [Baue 11a, Mull 10, Mull 11]. Parts from [Baue 11a] are reprinted with permission of IEEE, © 2011 IEEE.

3.1 Medical Background

We propose the application of the developed multi-modal rigid surface registration framework in two medical applications. In RT, prior to each treatment fraction, the patient must be aligned to tomographic planning data.
We present a novel marker-less solution that enables a fully-automatic initial coarse patient setup using multi-modal RI/CT surface registration. In image-guided liver surgery, the registration of the intra-operative organ shape with surface data extracted from pre-operative tomographic image data is conventionally performed based on manually selected anatomical landmarks. We introduce a fully automatic scheme that is able to estimate the transformation for organ registration in a multi-modal setup, aligning intra-operative RI with pre-operative CT data. Below, let us detail the clinical motivation for both applications.

3.1.1 Patient Setup in Fractionated Radiation Therapy

Precise and reproducible patient setup is a mandatory prerequisite for the success of fractionated RT, improving the balance between complications and cure and providing the fundamental basis for high-dose and small-margin irradiation. Prior to each fraction, the patient must be accurately aligned w.r.t. the target isocenter that has been localized in tomographic planning data (typically CT). Conventionally, the alignment is performed manually by clinical staff using a laser cross and skin markers. For subsequent setup verification and correction, stereoscopic X-ray imaging, radiographic portal imaging, cone-beam CT or CT-on-rails may be applied. However, these involve additional radiation exposure to the patient. Non-radiographic techniques that locate electromagnetic fiducials [Kupe 07] are an accurate alternative, but require the patient to be eligible for the invasive procedure of marker implantation.

Over the past few years, several devices for non-radiographic and non-invasive patient setup and monitoring based on RI have found their way into the clinics [Bert 05, Brah 08, Fren 09, Peng 10]. The impact of these solutions has prompted the American Association of Physicists in Medicine (AAPM) to issue a special report on quality assurance for this kind of non-radiographic system [Will 09]. Regardless of the particular RI technology, the systems provide a complete, precise and metric 3-D surface model of the patient in a marker-less manner. Typically, available systems are capable of estimating the table transformation that brings a pre-defined region of the intra-fractional patient surface into congruence with a reference. A number of studies have shown that these techniques achieve a high degree of precision in patient setup for thoracic and abdominal tumor locations [Bert 06, Gier 08, Kren 09, Scho 07]. However, existing solutions are designed with a focus on setup verification, and the automatic RI-based alignment is limited to a fine-scale position refinement. Gross misalignments cannot be resolved in an automatic manner. This entails that the initial patient alignment must be performed with conventional techniques using laser cross-hairs in combination with skin markers [Bert 06, Fren 09, Gier 08, Kren 09]. This manual initial coarse setup is both a time-consuming and tedious procedure.

Figure 3.1: Schematic illustration of the proposed automatic initial patient setup: The patient's intra-fractional surface is acquired with an RI device and registered to a reference shape extracted from tomographic planning data (depicted in gray). The estimated transformation (blue) that brings template and reference into congruence is then applied to the treatment table control.
On average, the patient setup time in fractionated RT lies in the range of several minutes. In this chapter, we propose an RI-based solution that enables a marker-less and automatic initial coarse RT patient setup, obviating the need for manual positioning using lasers and skin markers. Blending seamlessly into the clinical workflow, where the treatment table is initially moved to its lowest position to allow the patient to get on the table, the proposed approach directly acquires the patient surface in this initial position and aligns the target to the isocenter of the linear accelerator (LINAC) by registration to a given reference shape. An illustration of the concept is given in Fig. 3.1. In general, the method can be applied to reference surfaces either acquired by the RI device prior to or in the first fraction [Plac 12], or extracted from tomographic planning data [Baue 11a]. Let us stress that using RI reference data involves a two-step alignment procedure: In the first fraction, the patient needs to be manually positioned with conventional alignment techniques. The body surface in this position is then captured using RI and stored as the reference shape for the remaining treatment fractions. This two-step approach involves increased alignment uncertainties due to error propagation. Hence, in this work, we particularly focus on the direct alignment of intra-fractional RI surface data to a reference shape extracted from pre-fractional tomographic data. This strategy implies the need for multi-modal registration, but yields a substantial gain in setup accuracy compared to the aforementioned two-step workflow. The capability to handle partial matching is required for both workflows. In the considered case, for instance, depending on the positioning and orientation of the RI camera, its field of view may differ substantially from the CT scan volume.

Figure 3.2: Intra-operative navigation in image-guided open liver surgery using a marker-based optical tracking system [Bell 07]. (a) Registration of 3-D US data with pre-operative planning data. (b) Intra-operative setup with stereo localization and tracking device (left), US transducer equipped with retro-reflecting spheres (at the operation situs), and navigation screen (center). (c) Instrument tracking. Images reprinted from Beller et al. [Bell 07] by permission of John Wiley & Sons, Inc. Copyright © 2007 British Journal of Surgery Society Ltd. Published by John Wiley & Sons, Ltd.

3.1.2 Image-Guided Open Liver Surgery

In image-guided surgery, the anatomical expertise of the surgeon is augmented with a patient-specific source of information by correlating the operation situs with pre-operative tomographic image data. This allows the physician to see the position and orientation of the surgical probe in relation to anatomical structures during the procedure, see Fig. 3.2. The essential step in image-guided surgery is the determination of the mapping between the intra-operative representation of the exposed organ and the patient anatomy available from pre-operatively acquired tomographic data. Image guidance has found widespread acceptance in neurosurgery. Here, the aligning transformation is estimated based on landmarks, via bone-implanted or skin-affixed fiducial markers [Clem 08]. The use of landmark-based techniques is facilitated by the rigid anatomy surrounding the target. For image-guided open liver surgery, this assumption does not hold.
Hence, surface-based techniques have been proposed to align the intra-operative representation to pre-operative images [Herl 99a, Herl 99b]. Benefits of computer-navigated tool guidance in IGLS include (1) enhanced support for the resection of subsurface targets and the avoidance of critical structures, (2) improved outcomes due to reduced resection margins, and (3) an expansion of the spectrum of resectability [Cash 07]. To date, the conventional registration protocol for image guidance in hepatic surgery is based on a landmark-based initial alignment [Herl 99b, Clem 08]. As a prerequisite, anatomical fiducials are manually selected in the pre-operative image sets prior to surgery [Cash 07]. Then, during the procedure, the corresponding locations are digitized with a pen probe system at the operation situs. Based on the given correspondences, the aligning transformation is estimated. Eventually, if dense surface data are available, the initial landmark-based registration may be refined by conventional rigid surface registration techniques [Besl 92, Clem 08]. In clinical practice, manual fiducial selection by radiology experts is difficult, subjective and time-consuming. To overcome this elaborate task, we introduce a fully automatic scheme that estimates the transformation without any manual interaction, based on multi-modal RI/CT organ surface registration.

3.2 Related Work

For partial surface matching scenarios as addressed in this chapter, the trend is toward methods that establish point correspondences from matching local feature descriptors. In IGLS, we typically face partial matching, as RI data only cover the part of the organ that resides within the sensor's field of view (FOV), as opposed to the full shapes extracted from tomographic planning data. Matching local invariant features is a key component in a variety of computer vision tasks in the 2-D and 3-D domain, such as registration, object recognition, scene reconstruction or similarity search in databases. Owing to its relevance to this work, we focus our discussion of related work on the subfield of feature-based 3-D surface registration. Typically, the descriptors encode the surface geometry in a limited support region around a point of interest [Chua 97, From 04, John 98, John 99, Tomb 10, Zaha 09]. The point correspondences can then be used to estimate the transformation that aligns the shapes. Among the first descriptors in the field were spherical point signatures [Chua 97] and spin images [John 98, John 99]. Introduced more than a decade ago, the latter still enjoy great popularity for surface matching and have become established as one of the most commonly used methods for local 3-D surface description, even though originally designed for global shape encoding. Over the past years, a broad variety of 3-D shape descriptors has been proposed. For a comprehensive survey we refer to Sect. 2.4.1 and to the reviews by Bronstein et al. [Bron 11], Bustos et al. [Bust 05], Tangelder and Veltkamp [Tang 08b], and the shape retrieval contest benchmarks [Bron 10, Boye 11]. Let us also establish ties with the planar domain, where the majority of successful 2-D descriptors, such as histograms of oriented gradients (HOG) [Dala 05], the scale-invariant feature transform (SIFT) [Lowe 04], and rotation-invariant fast features (RIFF) [Taka 10], rely on histogram representations.
In the context of local descriptors, histograms trade off descriptive power and positional precision for robustness and repeatability by compressing geometry into bins [Tomb 10], thus being an appropriate choice for noisy data. In this work, we propose adaptations of and extensions to state-of-the-art HOG-like descriptors that enable their application to robust multi-modal surface registration. First, we exploit the fact that some of the concepts known from 2-D feature description can be generalized to the 3-D domain. For instance, inspired by the performance of HOG, Zaharescu et al. extended the descriptor to scalar fields defined on 2-D manifolds (MeshHOG) [Zaha 09]. Even though the results of Zaharescu et al. indicate insensitivity of the descriptor to non-rigid deformations, the fact that it is constructed based on k-ring neighborhoods makes it triangulation-dependent, as noted by Bronstein et al. [Bron 11]. Hence, in this work, we have substantially modified the MeshHOG descriptor to achieve invariance to mesh density and organization, and to improve its robustness to topographical deviations due to the different 3-D acquisition techniques (RI vs. CT). Second, we introduce a scheme that enables the application of 2-D descriptors to surface data in a direct manner. In particular, based on an orthographic depth representation of the local 3-D surface topography in a planar patch, we have extended the 2-D RIFF descriptor [Taka 10] to the domain of 3-D surfaces.

3.3 Feature-based Surface Registration Framework

The proposed feature-based framework for multi-modal rigid surface alignment is composed of three stages, as illustrated in Fig. 3.3: First, the local 3-D surface topography of both the moving template point set X_m (RI data) and the fixed reference point set X_f (extracted from CT data), denoted

X_m = \{ x_{m,1}, \ldots, x_{m,|X_m|} \}, \quad x_m \in \mathbb{R}^3,   (3.1)
X_f = \{ x_{f,1}, \ldots, x_{f,|X_f|} \}, \quad x_f \in \mathbb{R}^3,   (3.2)

is encoded in a discriminative manner using shape descriptors that are explicitly designed to handle surface data from different modalities (Sect. 3.4). The subscripts m and f denote moving and fixed, respectively. The number of elements of a point set X is denoted |X|. Second, point correspondences are established by descriptor matching and successively pruned using geometric consistency constraints (Sect. 3.3.1). Third, based on these point correspondences, the rigid-body transformation aligning X_m to X_f is estimated (Sect. 3.3.2). Eventually, an iterative closest point (ICP) variant may be applied to refine the alignment (Sect. 4.3.1). The global rigid transformation (R_g, t_g) is then given as the concatenation of the transformation estimated by feature-based pre-alignment (R_pre, t_pre) and the transformation estimated by ICP-based refinement (R_icp, t_icp):

R_g = R_{\mathrm{icp}} R_{\mathrm{pre}},   (3.3)
t_g = R_{\mathrm{icp}} t_{\mathrm{pre}} + t_{\mathrm{icp}},   (3.4)

where R ∈ SO(3) denotes a rotation matrix and t ∈ R³ a translation vector.

3.3.1 Correspondence Search

According to the workflow introduced before, the corresponding sets of local descriptors for both the moving and the fixed shape must be computed first. Let us denote the descriptor sets as

D_m = \{ d_{m,1}, \ldots, d_{m,|X_m|} \},   (3.5)
D_f = \{ d_{f,1}, \ldots, d_{f,|X_f|} \},   (3.6)

where d ∈ R^D denotes a feature vector of dimensionality D. Again, the subscripts m and f denote membership to the moving template and the fixed reference, respectively. Details on the proposed 3-D shape descriptors are given in Sect. 3.4.
3.3.1 Correspondence Search

According to the workflow introduced before, the corresponding sets of local descriptors for both the moving and the fixed shape must be computed first. Let us denote the descriptor sets as

\[ D_m = \{ d_{m,1}, \ldots, d_{m,|X_m|} \}, \tag{3.5} \]
\[ D_f = \{ d_{f,1}, \ldots, d_{f,|X_f|} \}, \tag{3.6} \]

where d ∈ R^D denotes a feature vector of dimensionality D. Again, the subscripts m and f denote membership to the moving template and fixed reference, respectively. Details on the proposed 3-D shape descriptors are given in Sect. 3.4. For now, let us assume the availability of Dm, Df. Then the moving data M and the fixed data F are represented as two sets of pairs of surface coordinates xi and their associated feature descriptors di:

\[ M = \{ (x_{m,1}, d_{m,1}), \ldots, (x_{m,|X_m|}, d_{m,|X_m|}) \}, \tag{3.7} \]
\[ F = \{ (x_{f,1}, d_{f,1}), \ldots, (x_{f,|X_f|}, d_{f,|X_f|}) \}. \tag{3.8} \]

Initial Correspondence Search. Given a moving point xm and its associated descriptor dm, we determine the corresponding fixed point xc ∈ Xf by searching for the best matching descriptor dc ∈ Df, using an appropriate distance metric d:

\[ d_c = c_f(d_m) = \operatorname{argmin}_{d_f \in D_f} d(d_m, d_f), \tag{3.9} \]

where cf denotes the correspondence operator, with the subscript denoting the search domain. The particular choice of the distance metric d depends on the design of the feature descriptor (Sect. 3.4). The resulting initial set of correspondences is denoted as Cinit = {(xm,1, xc,1), ..., (xm,|Xm|, xc,|Xm|)}.

Cross Validation. Then, vice versa, searching in the opposite direction, we validate whether dm,i also constitutes the best match for dc,i in Dm:

\[ d_{m,i} \overset{?}{=} c_m(d_{c,i}) = \operatorname{argmin}_{d_m \in D_m} d(d_{c,i}, d_m). \tag{3.10} \]

All correspondence pairs (xm, xc) that do not pass this cross validation check are discarded in the subsequent processing steps. The remaining set of correspondences is denoted Ccross ⊆ Cinit.

Geometric Consistency Check. For the purpose of eliminating erroneous correspondences, the set Ccross is further pruned by applying a geometric consistency check similar to Funkhouser and Shilane [Funk 06] and Gelfand et al. [Gelf 05]. It is based on the assumption of a rigid transformation, implying that the distance between two points xm,i, xm,j ∈ Xm is equal to the distance between their correspondences xc,i, xc,j ∈ Xf. Hence, for each (xm,i, xc,i) ∈ Ccross we compute a geometric consistency metric gc that considers the root mean squared pairwise distance w.r.t. all xm,j, xc,j ∈ Ccross (cf. Fig. 3.4):

\[ g_c(x_{m,i}) = \sqrt{ \frac{1}{|C_{cross}|} \sum_{j=1}^{|C_{cross}|} \big( \| x_{m,i} - x_{m,j} \|_2 - \| x_{c,i} - x_{c,j} \|_2 \big)^2 }, \tag{3.11} \]

where |Ccross| denotes the number of correspondences that passed the cross validation stage. In an iterative scheme, we successively penalize and eliminate a fixed percentage of low-grade correspondences according to gc(xm,i) until the criterion gc(xm,i) < δc is fulfilled for all remaining correspondence pairs, given a reliability threshold δc. Let us stress that for a partial matching scenario, the geometric consistency check (Eq. 3.11) must be constrained to a local neighborhood.

Figure 3.4: Graphical illustration of the geometric consistency check for a simplified example with |Xm| = 3. (a), (b), (c) Computation of gc(xm,i), gc(xm,j), gc(xm,k), comparing the Euclidean distance of point pairs (blue edges) and their correspondences (green edges).
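The correspondence search above can be summarized in a few lines. The following Python sketch is a minimal, brute-force illustration under simplifying assumptions: the plain Euclidean distance stands in for the descriptor-specific metric d of Eq. (3.9), the geometric consistency check is applied globally rather than restricted to a local neighborhood, and the threshold and elimination fraction are illustrative values:

    import numpy as np

    def match_descriptors(D_m, D_f):
        # Initial correspondence search (Eq. 3.9) with cross validation
        # (Eq. 3.10): keep only mutual best matches.
        dist = np.linalg.norm(D_m[:, None, :] - D_f[None, :, :], axis=2)
        fwd = dist.argmin(axis=1)                # best fixed match per moving point
        bwd = dist.argmin(axis=0)                # best moving match per fixed point
        keep = bwd[fwd] == np.arange(len(D_m))   # cross validation check
        return np.flatnonzero(keep), fwd[keep]   # indices into X_m and X_f

    def prune_geometric(X_m, X_c, delta_c=10.0, drop_frac=0.1):
        # Iterative pruning by the geometric consistency metric gc (Eq. 3.11):
        # eliminate a fixed fraction of the worst pairs until gc < delta_c holds.
        idx = np.arange(len(X_m))
        while True:
            dm = np.linalg.norm(X_m[idx][:, None] - X_m[idx][None, :], axis=2)
            dc = np.linalg.norm(X_c[idx][:, None] - X_c[idx][None, :], axis=2)
            gc = np.sqrt(np.mean((dm - dc) ** 2, axis=1))
            if gc.max() < delta_c or len(idx) <= 3:
                return idx                        # surviving correspondence indices
            n_drop = max(1, int(drop_frac * len(idx)))
            idx = idx[np.argsort(gc)[:-n_drop]]   # drop the worst fraction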
3.3.2 Transformation Estimation

The remaining set of reliable correspondences C ⊆ Ccross is eventually used to estimate the rigid-body transformation (Rpre, tpre) that aligns the moving template to the fixed reference, based on minimizing the squared Euclidean distance between the transformed moving points and their correspondences in the fixed point set:

\[ (R_{pre}, t_{pre}) = \operatorname{argmin}_{R,\,t} \sum_{(x_m, x_c) \in C} \| (R x_m + t) - x_c \|_2^2. \tag{3.12} \]

This optimization problem can be solved in closed form using Horn's unit quaternion optimizer [Horn 87]. For the RT patient setup experiments in Sect. 3.5.1, Rpre is restricted to a rotation about the table's vertical isocenter axis (here: x-axis), as conventional RT treatment tables are limited to four degrees of freedom (translation and rotation about one axis). In matrix notation and homogeneous coordinates, the transformation is then given as

\[ \begin{pmatrix} 1 & 0 & 0 & t_x \\ 0 & \cos\theta & -\sin\theta & t_y \\ 0 & \sin\theta & \cos\theta & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_m \\ 1 \end{pmatrix} = \begin{pmatrix} x_c \\ 1 \end{pmatrix} \quad \forall\, (x_m, x_c) \in C, \tag{3.13} \]

where θ denotes the rotation angle. The resulting non-linear optimization problem to estimate the rotation angle θ and the translation t = (tx, ty, tz)^T from a set of reliable correspondences C can be solved numerically in a least-squares sense using the Levenberg-Marquardt algorithm [Marq 63], for instance.
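The thesis solves Eq. (3.12) with Horn's unit quaternion optimizer; as an illustration, the sketch below instead uses the SVD-based closed-form solution of the same point-to-point least-squares problem, which yields an equivalent minimizer (a sketch, not the thesis implementation):

    import numpy as np

    def estimate_rigid(X_m, X_c):
        # Closed-form minimizer of Eq. (3.12); X_m, X_c: (n, 3) corresponding points.
        mu_m, mu_c = X_m.mean(axis=0), X_c.mean(axis=0)
        H = (X_m - mu_m).T @ (X_c - mu_c)                   # 3x3 cross-covariance
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])  # guard against reflections
        R = Vt.T @ D @ U.T
        t = mu_c - R @ mu_m
        return R, t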
3.4 Shape Descriptors

In this section, we introduce the 3-D shape descriptors that we applied in the experiments (Sect. 3.5). Let us emphasize that the framework introduced in Sect. 3.3 is not limited to these specific descriptors but is generic in the sense that any feature descriptor that meets the specific requirements of the application to be addressed can be used. The shape descriptors presented below encode the local surface topography in the neighborhood N of a reference point xref in a translation- and rotation-invariant manner. However, the descriptors are deliberately not invariant to scale, as in most real-world applications it is beneficial to incorporate the metric scale of the anatomical surface topography as an important characteristic. Placing great importance on matching robustness and repeatability, we have customized two HOG-like descriptors (MeshHOG, RIFF) and the well-established technique of spin images [John 99] for multi-modal application. Below, we outline the descriptors' functional principles, identify their limitations, and detail the concrete modifications we propose to meet the requirements of multi-modal surface registration. Beforehand, let us introduce our notation of a histogram H(·, ·, ·) → R^{N_H}, where the first entry denotes the neighborhood of the domain that is to be encoded, the second entry the parameter that determines the bin assignment, and the third entry the parameter that controls the bin weighting. NH denotes the number of histogram bins.

3.4.1 Spin Images

Given is an oriented surface point, i.e. a point xref ∈ R^3 with its associated normal nref ∈ R^3, ||nref||_2 = 1, cf. Fig. 3.5a. The pair (xref, nref) inherently describes an object-centered local point basis that is invariant to rigid transformations. In particular, it defines a cylindrical coordinate system that is missing the polar angle coordinate. Based on this local basis, the corresponding spin image is generated as follows [John 98, John 99]:

• First, the set of points within a cylindrical neighborhood N = { xi, ..., xj } centered around xref, also denoted support region below, is expressed in cylindrical coordinates xcyl ∈ R^2:

\[ x_{cyl,i} = \begin{pmatrix} x_{cyl,i} \\ y_{cyl,i} \end{pmatrix} := \begin{pmatrix} \sqrt{ \| x_i - x_{ref} \|_2^2 - \big( n_{ref}^\top (x_i - x_{ref}) \big)^2 } \\ n_{ref}^\top (x_i - x_{ref}) \end{pmatrix}, \tag{3.14} \]

where xcyl,i is the non-negative perpendicular radial distance to nref and ycyl,i denotes the signed elevation component with respect to the surface tangent plane defined by xref and nref, see Fig. 3.5d. The resulting set of cylindrical coordinates Ncyl = { xcyl,i, ..., xcyl,j } describes the relative positions of N with respect to the local basis (xref, nref).

• Second, the cylindrical coordinate space is quantized into discrete bins of a 2-D histogram called spin image, i.e., the histogram is accumulated according to the position of the points xcyl,i ∈ Ncyl in the local cylindrical coordinate space, cf. Fig. 3.5a. The linearized version of this 2-D histogram, concatenating the histogram rows to a single vector, yields the spin image descriptor dspin, denoted as

\[ d_{spin} = H\big( N_{cyl},\, x_{cyl},\, \chi(x_{cyl}) \big)^\top, \tag{3.15} \]

where χ denotes a characteristic function that indicates the occurrence of a point xcyl in a discrete bin.

In practice, in order to make the representation less sensitive to variations in surface sampling and noise, the association of a cylindrical coordinate xcyl,i with a discrete bin in the 2-D array is performed using bilinear interpolation. Thus, the contribution of the point is smoothed over adjacent bins. Note that the parameterization of the spin image descriptor controls the trade-off between descriptiveness, robustness and dimensionality. Eventually, let us remark that the process of generating spin images described before can be thought of as a detector matrix spinning around nref and accumulating the points ∈ N as it sweeps space. For the experiments in Sect. 3.5.1, we consider spin images as a baseline shape descriptor. In order to cope with 3-D data acquired with different modalities, and thus involving meshes with different resolutions, we propose a modification compared to the original implementation [John 99]: instead of deriving the histogram bin width from the median edge length of the surface mesh, we set a fixed metric bin width. Thus, the metric scale of the surface topography is explicitly incorporated into the descriptor computation.

Figure 3.5: (a) Graphical illustration of shape description with spin images, representing the local neighborhood in cylindrical coordinates (xcyl, ycyl), cf. subfigure (d), and encoding the coordinates in a 2-D histogram. (b) Functional principle of the MeshHOG surface descriptor, projecting a gradient vector field ∇f – sketched by the green arrows and generated according to the proposed CUSS scheme, cf. subfigure (e) – onto circular segments of the orthogonal planes of a local coordinate system spanned by xref, nref and bref. (c) For the RIFF descriptor, the surface topography is expressed as a 2-D depth image d⊥ where rotation-invariant gradients are analyzed within several annuli. Both the MeshHOG and RIFF descriptors are based on a concatenation of histograms of oriented gradients, binning a vector field according to its orientations γ(·) and magnitudes ||·||_2, cf. subfigure (f).
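To summarize the spin-image construction, the following Python sketch computes the descriptor from an oriented reference point; for brevity, it uses nearest-bin accumulation instead of the bilinear interpolation described above, and the support radius is an illustrative value:

    import numpy as np

    def spin_image(x_ref, n_ref, X, n_bins=15, r_max=100.0):
        # Cylindrical coordinates of Eq. (3.14); n_ref must be unit length.
        d = X - x_ref
        beta = d @ n_ref                                     # signed elevation y_cyl
        alpha = np.sqrt(np.maximum((d * d).sum(axis=1) - beta ** 2, 0.0))
        keep = (alpha < r_max) & (np.abs(beta) < r_max)      # cylindrical support region
        w = r_max / n_bins                                   # fixed metric bin width
        a = np.minimum((alpha[keep] / w).astype(int), n_bins - 1)
        b = np.minimum(((beta[keep] + r_max) / (2.0 * w)).astype(int), n_bins - 1)
        img = np.zeros((n_bins, n_bins))
        np.add.at(img, (b, a), 1.0)                          # 2-D histogram (Eq. 3.15)
        return img.ravel()                                   # linearized d_spin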
3.4.2 Mesh Histograms of Oriented Gradients (MeshHOG)

Both the MeshHOG descriptor and the RIFF descriptor (Sect. 3.4.3) rely on the concept of HOG. Hence, let us briefly review the basic idea in the 2-D image domain first [Dala 05]. HOG-like descriptors exploit the fact that local object appearance and shape are characterized by the distribution of intensity gradient directions. The descriptor operates on scalar image data f and evaluates local image patches. First, the gradient orientations γ(∇f) → [0, 2π[ and magnitudes ||∇f||_2 → R are computed for each pixel of the image patch. Then, in order to measure local distributions of gradient values, the window is divided into local sub-patches (cells). For each cell, the set of pixels N is discretized into a histogram according to its gradient direction, cf. Fig. 3.5f:

\[ H\big( N,\, \gamma(\nabla f),\, \| \nabla f \|_2 \big). \tag{3.16} \]

The contribution depends on the gradient magnitude at the respective pixel. Finally, the cell histograms are concatenated to form the HOG descriptor. Depending on the application, contrast normalization may be performed by scaling the feature vector to unit length.

The MeshHOG descriptor may be considered a generalization of the concept of HOG from planar image domains to non-planar 2-D manifolds [Zaha 09]. Given a scalar function f defined on the manifold, the descriptor encodes the local spatial distribution and orientation of a gradient vector field ∇f derived from both the surface topography in terms of normals n and the scalar function f. In particular, given a reference point xref and a set of points within a spherical support region N = { xi, ..., xj }, the gradient vectors ∇f(x) → R^3 are projected onto the three orthogonal planes of a unique and invariant local reference frame. Let us denote the projection onto a plane P by an operator qP. The local reference frame at xref is spanned by its normal nref and a second axis bref residing in the tangent plane Tref defined by nref and pointing into the dominant gradient direction [Baue 11b]. Then, for each plane, an orientation histogram binning is performed w.r.t. circular segments, see Fig. 3.5b. In particular, the projected vectors qP(∇f(x)) → R^2 are assigned to Nseg circular spatial segments according to their origin, and binned in orientation histograms w.r.t. the orthogonal local reference frame:

\[ H\big( N_{P,s},\, \gamma(q_P(\nabla f)),\, \| q_P(\nabla f) \|_2 \big), \tag{3.17} \]

where NP,s denotes the set of projected gradient vectors qP(∇f(x)) that reside within a segment s in plane P, γ(qP(∇f)) denotes the gradient orientation, and ||qP(∇f)||_2 the gradient magnitude. The MeshHOG feature descriptor dhog is then composed as a concatenation of 3 · Nseg gradient orientation histograms, from the three orthogonal planes and the Nseg circular segments per plane:

\[ d_{hog} = \big( H(N_{P_1,1}, \gamma(q_{P_1}(\nabla f)), \| q_{P_1}(\nabla f) \|_2),\; H(N_{P_1,2}, \gamma(q_{P_1}(\nabla f)), \| q_{P_1}(\nabla f) \|_2),\; \ldots,\; H(N_{P_3,N_{seg}}, \gamma(q_{P_3}(\nabla f)), \| q_{P_3}(\nabla f) \|_2) \big)^\top. \tag{3.18} \]
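The shared building block of Eqs. (3.16)-(3.18) is a magnitude-weighted orientation histogram over a set of 2-D vectors. A minimal sketch of this primitive (our illustration, not the thesis code):

    import numpy as np

    def orientation_histogram(grads, n_bins=8):
        # Bin 2-D (projected) gradient vectors by orientation gamma in [0, 2*pi)
        # and weight each contribution by its magnitude, cf. Eqs. (3.16)-(3.17).
        mag = np.linalg.norm(grads, axis=1)
        ang = np.arctan2(grads[:, 1], grads[:, 0]) % (2.0 * np.pi)
        bins = np.minimum((ang / (2.0 * np.pi / n_bins)).astype(int), n_bins - 1)
        hist = np.zeros(n_bins)
        np.add.at(hist, bins, mag)
        return hist

Concatenating such histograms over the 3 · Nseg plane segments yields the MeshHOG descriptor of Eq. (3.18); the RIFF descriptor (Sect. 3.4.3) reuses the same primitive over its annuli.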
The study of related work in Sect. 3.2 revealed that the design of shape descriptors typically does not account for a potential application in multi-modal scenarios. This especially holds true for the MeshHOG descriptor. Below, we outline its original limitations and introduce our modifications to enforce (1) robustness to inter-modality variations in surface topography and (2) invariance to mesh density and mesh representation (topology).

In general, an arbitrary scalar function f can be chosen [Zaha 09], including photometric information as applied in [Baue 11b]. In scenarios that involve a modality which only provides geometric information (such as RI/CT registration), the scalar function must characterize geometric surface properties. In the original method, Zaharescu et al. proposed to use a curvature measure. However, the use of second-order surface derivatives makes the descriptor susceptible to noise. Instead, we propose the signed distance of a point to the best-fitting plane of its local neighborhood as scalar function f. By doing so, we can better cope with the low SNR of RI data [Mull 10].

Next, we address the invariance to mesh density and representation. Conventionally, in a 2-D image, gradients are computed by differentiating scalar data in two orthogonal directions. Zaharescu et al. introduced a numerical gradient of a scalar function ∇f defined on a 2-D manifold using a discrete operator that relies on adjacent vertices [Zaha 09], restricting the approach to meshes that have a uniform and equally scaled triangular representation. To be able to cope with arbitrary mesh representations and densities, we propose a gradient operator that is based on a technique we call circular uniform surface sampling (CUSS), see Fig. 3.5e. First, for a given point x, we perform a circular uniform sampling of the tangent plane T defined by n. This is done by rotating a reference vector a ∈ R^3 residing in T with ||a||_2 = 1 around n by a set of discrete angles {θ1, ..., θ_{Ncuss}} with θi = i · 2π/Ncuss, yielding Rθi a. Here, Rθ denotes a rotation matrix for angle θ around n, and Ncuss controls the circular sampling density. Scaling the vectors Rθi a with a sampling radius rcuss provides a set of points Xcuss = { xcuss,1, ..., xcuss,Ncuss } with

\[ x_{cuss,i} = x + r_{cuss} R_{\theta_i} a. \tag{3.19} \]

Based on these circularly arranged points Xcuss ∈ T, the surface sampling is performed by intersecting the mesh with rays that emerge from the points xcuss and are directed parallel to n. Let us denote the mesh intersection points as mcuss, with the associated scalar field value f(mcuss) being interpolated w.r.t. adjacent vertices. Eventually, the CUSS gradient ∇f(x) at a vertex x is defined as

\[ \nabla f(x) = \frac{1}{N_{cuss}} \sum_{i=1}^{N_{cuss}} \frac{ f(m_{cuss,i}) - f(x) }{ \| m_{cuss,i} - x \|_2 }\, R_{\theta_i} a, \tag{3.20} \]

where Rθi a denotes the circular sampling direction. The numerator differentiates f in the direction of Rθi a. The denominator acts as a regularizer that penalizes the contribution of outliers, represented by intersection points mcuss,i with large distances to x.
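A sketch of the CUSS gradient operator (Eqs. 3.19-3.20). The mesh intersection and the scalar-field interpolation are abstracted into two hypothetical callables: intersect (returns the mesh point hit by a ray that starts at the given position and runs parallel to the normal) and f (evaluates the scalar field at a surface point); both names are ours, not part of the thesis implementation:

    import numpy as np

    def cuss_gradient(x, n, a, f, intersect, n_cuss=18, r_cuss=20.0):
        # a: unit reference vector in the tangent plane (n @ a == 0 assumed).
        grad = np.zeros(3)
        for i in range(1, n_cuss + 1):
            theta = 2.0 * np.pi * i / n_cuss
            # Rodrigues rotation of a around n by theta (sampling direction of Eq. 3.19)
            d_i = (a * np.cos(theta) + np.cross(n, a) * np.sin(theta)
                   + n * (n @ a) * (1.0 - np.cos(theta)))
            m_i = intersect(x + r_cuss * d_i, n)   # surface sampling point m_cuss,i
            # Distance-regularized directional difference quotient (Eq. 3.20)
            grad += (f(m_i) - f(x)) / np.linalg.norm(m_i - x) * d_i
        return grad / n_cuss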
3.4.3 Rotation Invariant Fast Features (RIFF)

In recent years, a multitude of powerful image descriptors have been proposed and quantitatively evaluated on large-scale databases [Miko 05]. Unfortunately, the transfer of feature extraction concepts that were originally developed for the 2-D image domain to 3-D shape space is typically not straightforward, cf. Sect. 3.4.2. To exploit the comprehensive body of literature on image descriptor representations, we have developed a scheme that allows any 2-D descriptor concept to be extended to the domain of 3-D surfaces. The underlying strategy relates to the basic idea of spin images, namely, representing 3-D shape in a 2-D domain first. In particular, we encode the surface topography in the neighborhood of xref in a local orthographic depth representation w.r.t. the tangent plane Tref defined by the normal nref. Imagine a virtual RI camera hovering above the considered surface coordinate xref. The virtual camera uses an orthographic projection model, and the direction of projection is defined by the surface normal nref. It yields a discrete 2-D image where each pixel xp holds the signed orthogonal depth d⊥(xp) of its position on the tangent plane w.r.t. the surface, see Fig. 3.5c. For a practical implementation we exploit the OpenGL depth buffer representation, cf. Sect. 2.2.2. Let us emphasize that this orthogonal re-sampling strategy implicitly overcomes differences in mesh density and representation that occur in multi-modal applications due to different triangulation schemes.

Now we are in the convenient position to apply any 2-D descriptor to this orthogonal depth image d⊥ generated by the virtual RI camera. For the applications addressed in this thesis, we chose RIFF [Taka 10], which also builds on the established concept of HOG [Dala 05]. As proposed by Takacs et al. [Taka 10], rotation invariance is first achieved by performing a radial gradient transform (RGT) f_rgt : R^2 → R^2 on the gradient of the depth image, ∇d⊥,

\[ f_{rgt}(\nabla d_\perp) = \big( \nabla d_\perp^\top e_1,\; \nabla d_\perp^\top e_2 \big)^\top, \tag{3.21} \]

encoding the gradient ∇d⊥ → R^2 w.r.t. a local polar reference frame. This frame is spanned by the orthogonal basis vectors e1 and e2 that depend on the image position xp ∈ R^2 and the image position of the given reference point xp,ref:

\[ e_1(x_p) = \frac{ x_p - x_{p,ref} }{ \| x_p - x_{p,ref} \|_2 }, \qquad e_2(x_p) = R_{\pi/2}\, e_1(x_p). \tag{3.22} \]

The basis vectors e1(xp) and e2(xp) are the radial and tangential directions at xp relative to xp,ref. For the proof of rotation invariance of the RGT we refer to [Taka 10]. Second, the image is subdivided into spatial cells arranged in Nriff circular equidistant annuli neighborhoods Na, see Fig. 3.5c. The subscript a denotes the annulus index. Then, given the rotation-invariant gradients f_rgt(∇d⊥), for each annulus Na we compute a gradient orientation histogram, cf. Fig. 3.5f,

\[ H\big( N_a,\, \gamma(f_{rgt}(\nabla d_\perp)),\, \| f_{rgt}(\nabla d_\perp) \|_2 \big), \tag{3.23} \]

where γ(f_rgt(∇d⊥)) denotes the orientation and ||f_rgt(∇d⊥)||_2 the magnitude of f_rgt(∇d⊥). The RIFF descriptor driff eventually concatenates the histograms from the set of annuli:

\[ d_{riff} = \big( H(N_1, \gamma(f_{rgt}(\nabla d_\perp)), \| f_{rgt}(\nabla d_\perp) \|_2),\; H(N_2, \gamma(f_{rgt}(\nabla d_\perp)), \| f_{rgt}(\nabla d_\perp) \|_2),\; \ldots,\; H(N_{N_{riff}}, \gamma(f_{rgt}(\nabla d_\perp)), \| f_{rgt}(\nabla d_\perp) \|_2) \big)^\top. \tag{3.24} \]

Compared to a single HOG over the entire circular patch, this spatial subdivision into annular cells enforces distinctiveness of the descriptor. Again, note that the use of histogram representations implies resilience w.r.t. multi-modal topography deviations, and that the choice of the number of annuli is a trade-off between distinctiveness and robustness.
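As an illustration of the first RIFF stage, the following sketch applies the radial gradient transform of Eqs. (3.21)-(3.22) to the dense gradient field of the orthographic depth image (our minimal implementation; pixel coordinates are assumed to be ordered as (x, y)):

    import numpy as np

    def radial_gradient_transform(grad_d, xp_ref):
        # grad_d: (h, w, 2) gradient of the depth image d_perp; xp_ref: reference pixel.
        h, w = grad_d.shape[:2]
        yy, xx = np.mgrid[0:h, 0:w].astype(float)
        r = np.stack([xx - xp_ref[0], yy - xp_ref[1]], axis=-1)
        e1 = r / np.maximum(np.linalg.norm(r, axis=-1, keepdims=True), 1e-9)  # radial
        e2 = np.stack([-e1[..., 1], e1[..., 0]], axis=-1)    # tangential (90 deg rotation)
        # Project each pixel's gradient onto the local polar frame (Eq. 3.21)
        return np.stack([(grad_d * e1).sum(-1), (grad_d * e2).sum(-1)], axis=-1)

The per-annulus orientation histograms of Eq. (3.23) can then be computed with the same histogram primitive sketched in Sect. 3.4.2.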
3.4.4 Distance Metrics for Feature Matching

The similarity of spin images is typically rated by a correlation-based distance metric [Pear 96, John 98]. This is also a convenient choice for the multi-modal case implying different mesh densities, as spin images of the same topography but with different vertex densities exhibit a linear relationship between corresponding entries of dspin for approximately uniformly distributed point clouds. The RIFF and MeshHOG descriptors driff, dhog are commonly compared using bin-to-bin distances such as the L1 and L2 norms. This is practicable as the RIFF and MeshHOG histogram domains are aligned prior to matching, using L2 normalization to gain invariance to mesh resolution and density.

3.5 Experiments and Results

In the experiments, we investigate the application of the proposed multi-modal rigid surface registration framework for marker-less patient setup in fractionated RT and augmented reality applications in IGLS, as motivated in Sect. 3.1.1. In order to study the generalizability of the method, we consider different RI technologies: structured light (Sect. 3.5.1) and ToF imaging (Sect. 3.5.2).

3.5.1 Multi-Modal Patient Setup in Fractionated RT

We consider the clinical scenario where the patient's body surface is captured using RI sensors and registered to a given reference shape extracted from CT planning data, potentially involving gross initial misalignments. The estimated transformation is then transferred to the control unit of the steerable treatment table. For quantitative evaluation, we have benchmarked the performance of the proposed descriptors in an experimental study on real data from Microsoft Kinect, using anthropomorphic phantoms. Thereby, we underline the benefits of the proposed framework for multi-modal partial surface matching.

Materials and Methods. We have generated a benchmark database of Microsoft Kinect RI data for two anthropomorphic phantoms (male/female), see Fig. 3.6a,b. Data were acquired in a clinical RT environment (Siemens ARTISTE, Siemens AG, Healthcare Sector, Kemnath, Germany, cf. Fig. 3.6c). For each phantom, we captured RI data for N = 20 different initial misalignments of the treatment table, including large deviations. The set of poses for the phantom benchmark is composed of all possible permutations of the transformation parameter sets θAP = {0, 5, 10, 25, 45}°, tSI = {0, 200} mm, and tML = {0, 200} mm, where the angle θAP describes the table rotation about the isocenter axis and tSI, tML denote the table translation in superior-inferior (SI) and medio-lateral (ML) direction, cf. Fig. 3.1. The translation in anterior-posterior (AP) direction was set to tAP = −600 mm, representing the initial height for the patient to get on the table and recline. The table positioning control (accuracy: ±1.0 mm, ±0.5°) was used to set up the respective ground truth transformation (RGT, tGT). The RI sensor was mounted 200 cm above the floor, at a distance of 240 cm to the LINAC isocenter and a viewing angle of 55°. RI sequences (30 fps) were acquired with a resolution of 640×480 px. In terms of RI pre-processing, we combined temporal averaging (over 150 ms) with edge-preserving bilateral filtering. Invalid measurements were restored using normalized convolution (cf. Sect. 2.2.3). The patient surface can be segmented from the background using prior information about the treatment table position.

CT data of the phantoms were acquired on a Siemens SOMATOM scanner (Department of Neuroradiology, Erlangen University Clinic, Germany). The male phantom was scanned and reconstructed with a resolution of 512 × 512 × 346 voxels and a spacing of 0.95 × 0.95 × 2.5 mm, the female one with a resolution of 512 × 512 × 325 voxels and a spacing of 0.81 × 0.81 × 2.5 mm. The phantom surface was extracted using a thresholding-based region growing approach with manual seed point placement. On the extracted binary segmentation mask, we then applied the Marching Cubes algorithm [Lore 87] followed by Laplacian mesh smoothing [Fiel 88].

Figure 3.6: Female (a) and male (b) anthropomorphic phantoms, made of glass-fiber reinforced plastic. (c) Siemens ARTISTE RT suite. Image source: Siemens AG. (d) The set of four multi-modal RI/CT markers attached to the phantoms (a,b) for registration to CT planning data.
Eventually, the mesh was decimated in order to reduce the computational complexity. The pre-processed RI (CT) meshes consist of ∼15k (∼20k) vertices. Note that CT data typically covers only a portion of the RI scene. Furthermore, let us stress that CT pre-processing, including the computation of 3-D shape descriptors, can be performed offline prior to the first fraction (cf. Fig. 3.3).

In order to evaluate the accuracy of patient setup w.r.t. the ground truth transformation given in the LINAC coordinate system (table controls), the RI measurements were transformed into the LINAC coordinate system. The RI/LINAC coordinate system transformation was determined using a checkerboard calibration pattern. CT data of the male and female phantom were aligned to the LINAC coordinate system using an RI reference acquisition of the phantoms at the isocenter position (θAP = 0°, tSI = 0 mm, tML = 0 mm, tAP = −150 mm) in combination with multi-modal RI/CT markers, see Fig. 3.6. In particular, we attached a set of four painted Beekley spots (http://www.beekley.com) to each phantom. These spots are placed such that they are visible from the RI camera. In addition, they can be easily detected in CT data. Based on the corresponding marker positions in RI and CT data, the transformation can be estimated. Thus, the CT dataset is transformed into the reference position of the LINAC coordinate system. Note that this alignment procedure allows for a general feasibility study of coarse patient setup; for a more precise analysis, including an investigation of the effect of subsequent position refinement using ICP, this procedure is considered insufficient. This calibration procedure now allows us to compare the estimated table transformation (R, t) to the ground truth table setup (RGT, tGT) given by the table control in the LINAC coordinate system. In particular, we compute the mean rotational and mean translational errors over the set of N poses:

\[ \Delta\theta_{AP} = \frac{1}{N} \sum_{i=1}^{N} \big| \theta_{AP}^{(i)} - \theta_{AP,GT}^{(i)} \big|, \tag{3.25} \]
\[ \Delta t_{AP/ML/SI} = \frac{1}{N} \sum_{i=1}^{N} \big| t_{AP/ML/SI}^{(i)} - t_{AP/ML/SI,GT}^{(i)} \big|, \tag{3.26} \]
\[ \Delta t = \frac{1}{N} \sum_{i=1}^{N} \big\| t^{(i)} - t_{GT}^{(i)} \big\|_2, \tag{3.27} \]

where θAP (θAP,GT) denotes the estimated (ground truth) rotation angle around the table axis and t (tGT) the estimated (ground truth) translation. The superscript indicates the index of the evaluated transformation. Recall that the transformation is restricted to a rotation about the table's vertical isocenter axis, as conventional RT treatment tables are limited to four degrees of freedom (Sect. 3.3.2).
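A minimal sketch of the error computation in Eqs. (3.25)-(3.27) (our illustration; the array shapes and axis ordering are assumptions):

    import numpy as np

    def setup_errors(theta, theta_gt, t, t_gt):
        # theta, theta_gt: (N,) table angles in degrees;
        # t, t_gt: (N, 3) translations ordered as (AP, ML, SI), in mm.
        d_theta = np.mean(np.abs(theta - theta_gt))          # Eq. (3.25)
        d_axes = np.mean(np.abs(t - t_gt), axis=0)           # Eq. (3.26), per axis
        d_t = np.mean(np.linalg.norm(t - t_gt, axis=1))      # Eq. (3.27)
        return d_theta, d_axes, d_t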
For a valid comparison of the proposed shape descriptors, we set an identical support region radius of rN = 100 mm. This choice is motivated by the physical scale of anatomical features on a human torso. The individual descriptor parameters were set to typical values [John 98, Taka 10, Zaha 09] if not stated otherwise. The most influential parameters were adapted empirically using parameter grid search techniques. For the spin image descriptor, the cylindrical coordinate space was discretized into 15 bins in each direction. Note that the metric bin width is rN/15 and the support region describes a cylinder. This parameterization results in a descriptor dimension of Dspin = 15 · 15 = 225. For the MeshHOG descriptor, we chose 8-bin orientation histograms and Nseg = 8 equally-spaced circular segments, resulting in Dhog = 3 · 8 · 8 = 192. The parameter study revealed that using fewer orientation bins or circular segments reduces descriptiveness, while using more orientation bins impairs robustness due to instabilities in the establishment of the local reference frame and the gradient vector field. The CUSS sampling density was set to Ncuss = 18, the sampling radius to rcuss = 20 mm. Regarding the scalar function, the neighborhood radius for estimating the best-fitting plane was set to 20 mm. For a comprehensive MeshHOG parameter study we refer to [Baue 11b, Mull 10]. For the RIFF descriptor, the virtual RI camera sampling density was set to 128 × 128 px. Using Nriff = 4 circular annuli and 8-bin orientation histograms per annulus results in Driff = 4 · 8 = 32. As distance metric for feature matching, we applied the correlation coefficient for spin images and the L1 norm for RIFF and MeshHOG.

Results. Quantitative results for the multi-modal alignment of the N = 20 phantom poses are depicted in Table 3.1.

Descriptor     Metric        (m)            (f)            (m) & (f)
Spin Images    SR            0.95           0.95           0.95
               ΔθAP [°]      0.7 ± 0.6      0.6 ± 0.5      0.7 ± 0.6
               ΔtAP [mm]     5.7 ± 3.8      3.7 ± 2.5      4.7 ± 3.3
               ΔtML [mm]     7.6 ± 5.0      5.6 ± 4.3      6.6 ± 4.7
               ΔtSI [mm]     7.1 ± 6.3      7.3 ± 4.2      7.2 ± 5.3
               Δt [mm]       13.5 ± 6.0     11.1 ± 4.0     12.3 ± 5.2
MeshHOG        SR            1.00           0.95           0.98
               ΔθAP [°]      1.0 ± 0.6      2.0 ± 1.6      1.5 ± 1.3
               ΔtAP [mm]     4.9 ± 3.6      3.9 ± 2.8      4.4 ± 3.2
               ΔtML [mm]     6.7 ± 5.1      8.6 ± 4.4      7.6 ± 4.8
               ΔtSI [mm]     7.9 ± 8.3      5.8 ± 6.2      6.9 ± 7.3
               Δt [mm]       13.5 ± 7.2     12.3 ± 5.9     12.9 ± 6.6
RIFF           SR            1.00           0.90           0.95
               ΔθAP [°]      1.4 ± 1.2      2.0 ± 1.6      1.7 ± 1.4
               ΔtAP [mm]     4.5 ± 3.5      4.7 ± 3.5      4.6 ± 3.5
               ΔtML [mm]     4.5 ± 3.7      7.2 ± 5.6      5.8 ± 4.8
               ΔtSI [mm]     6.2 ± 4.7      6.3 ± 5.9      6.3 ± 5.2
               Δt [mm]       10.5 ± 3.9     12.4 ± 6.1     11.4 ± 5.1

Table 3.1: Mean rotational and translational errors in patient setup based on multi-modal RI/CT surface registration on the male (m) and female (f) anthropomorphic phantoms. Given are mean and standard deviation. SR quotes the percentage of successful registrations, classified with heuristic thresholds (ΔθAP < 10°, Δt < 40 mm).

In total, for all three descriptors, the registration framework was able to estimate the table transformation for the vast majority of gross initial misalignments. For the application in RT patient setup, what matters most is a high percentage of successful registrations (SR) – accuracy is a secondary goal for this initial pre-alignment. Having achieved the highest percentage of successful registrations (97.5%), let us refer to the results of the MeshHOG descriptor as an overall performance indicator, yielding a mean rotational and translational error of ΔθAP = 1.5 ± 1.3° and Δt = 12.9 ± 6.6 mm, respectively. (At this point, let us remark again that the scope of the feature-based approach is restricted to a coarse initial patient setup; setup verification in terms of accurate positioning refinement is a mandatory subsequent step but not addressed here.) Note that with the original MeshHOG formulation [Zaha 09], where ∇f relies on adjacent vertices (recall Sect. 3.4.2), the registration failed for all datasets. The results for spin images are roughly on par with the HOG-like descriptors (MeshHOG, RIFF). We assume this good spin image performance to result from the fact that both meshes exhibit a rather uniform triangulation. Nonetheless, the experiments confirm the robustness of spin images, underlining their popularity. Comparing the performance of the shape descriptors on the male and the female phantom, the results seem surprising at first glance.
With respect to its distinctive topography (Fig. 3.6a), one might expect a better registration success rate and accuracy for the female mannequin. Instead, the results indicate that negative effects due to self-occlusion (which increase with large table rotations) even lead to a slightly worse performance compared to the male phantom.

A qualitative illustration of the extracted set of correspondences is shown in Fig. 3.7. We found that with increasing table rotations, the number of found correspondences decreased considerably. For instance, on average over all descriptors, the number of correspondences for table rotations of θAP = 45° decreased to 59.2% of the number found for a table rotation of θAP = 0°.

Figure 3.7: Spatial distribution of point correspondences for a multi-modal RI/CT registration on male (top row, θAP = 0°, tSI = 200 mm, tML = 200 mm, tAP = −600 mm) and female phantom data (bottom row, θAP = 45°, tSI = 200 mm, tML = 200 mm, tAP = −600 mm). From left to right, point correspondences for (a) spin images, (b) MeshHOG and (c) RIFF are given. Only a subset of the found correspondences is shown.

3.5.2 Multi-Modal Data Fusion in IGLS

Investigating the application of the proposed framework in IGLS, we consider a multi-modal ToF/CT setup (Sect. 3.1.2). The applicability of ToF imaging for intra-operative surface acquisition was first investigated by Seitel et al. [Seit 10], who achieved promising results on intra-operative 3-D shape acquisition of porcine organs.

Materials and Methods. We performed in-vitro experiments on four porcine livers that were captured both with a PMD CamCube 2.0 ToF camera and with a C-arm CT system. In contrast to the patient setup experiments (Sect. 3.5.1), the ground truth transformation that aligns ToF to CT data is not given here. Hence, for evaluation of the registration performance, we computed the residual Euclidean mesh-to-mesh distance, denoted target registration error dTRE, after having applied the estimated transformation. ToF data were acquired with a PMD CamCube 2.0 camera with an integration time of 1000 µs. For further specifications we refer to Sect. 2.1.3. The camera was mounted 60 cm above the organ in an interventional radiology development environment. Owing to the low SNR of ToF data, and with regard to the trade-off between data denoising and the preservation of topographical structure, we perform data pre-processing in a way that gives priority to the topographical reliability of the surface. In particular, we combine temporal averaging with edge-preserving median and bilateral filtering, respectively. Furthermore, the organ surface was segmented by depth thresholding. The pre-processed ToF organ meshes are depicted in Fig. 3.8. Note that some surface artifacts that result from specular highlights could not be removed using the proposed pre-processing pipeline.

Figure 3.8: CT (top row) and ToF (bottom row) porcine liver surface data. For convenience, these graphics were produced for similar poses after manual alignment to facilitate visual comparison. Note the surface artifacts in ToF surface data due to specular reflections.
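For illustration, the residual error dTRE can be approximated with a brute-force closest-point distance between the aligned vertex sets; this is a simplified, vertex-based stand-in for the mesh-to-mesh distance used in the evaluation, not the exact evaluation code:

    import numpy as np

    def mesh_to_mesh_tre(X_tof, X_ct):
        # X_tof: (n, 3) vertices of the transformed ToF surface;
        # X_ct: (k, 3) vertices of the CT surface.
        d = np.linalg.norm(X_tof[:, None, :] - X_ct[None, :, :], axis=2)
        return d.min(axis=1).mean()   # mean closest-point distance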
CT data were acquired with an Artis zeego C-arm CT system (Siemens AG, Healthcare Sector, Forchheim, Germany). The livers were scanned and reconstructed with a volumetric resolution of 512 × 512 × 348 voxels and a spacing of 0.7 × 0.7 × 0.7 mm. In analogy to Sect. 3.5.1, the organ was segmented from CT data using a region growing approach with manual seed point placement, followed by mesh generation. The CT organ surface meshes are depicted in Fig. 3.8.

Based on the descriptor performance results in Sect. 3.5.1, we confine ourselves to the MeshHOG descriptor for this multi-modal ToF/CT surface registration scenario (Sect. 3.4.2). The descriptor parameters were determined in a comprehensive parameter study [Mull 10], resulting in a support region radius of rN = 30 mm and Nseg = 4 circular segments with 4-bin orientation histograms, and thus a descriptor dimension of Dhog = 3 · 4 · 4 = 48. The CUSS sampling density was set to Ncuss = 18, the sampling radius rcuss to triple the average ToF mesh edge length. Regarding the scalar function f, the neighborhood radius for estimating the best-fitting plane was set to 30 mm. As distance metric for feature matching, we chose the L2 norm, as originally proposed [Zaha 09], to penalize outliers that are expected due to the low SNR and artifacts of ToF surface data, cf. Fig. 3.8.

Results. Quantitative ToF/CT registration results for the porcine livers L1-L4 are given in Table 3.2.

                 L1      L2      L3      L4      L1-L4
dTRE,pre [mm]    4.38    4.24    4.37    2.18    3.78 ± 1.08
dTRE,icp [mm]    2.01    2.47    1.72    1.58    1.95 ± 0.39

Table 3.2: Quantitative registration results for ToF data from four porcine livers (L1-L4). Given are the target registration errors dTRE after coarse pre-alignment (first row) and after ICP refinement (second row), respectively.

Over the four datasets, the target registration error after feature-based initial surface pre-alignment was dTRE,pre = 3.78 ± 1.08 mm, and dTRE,icp = 1.95 ± 0.39 mm after subsequent ICP refinement. Note that dTRE,pre < 5 mm and dTRE,icp < 2.5 mm for all four datasets. The comparably small error for dataset L4 is assumed to result from its more distinctive topography compared to L1-L3. The spatial distribution of the set of reliable correspondences that are used for transformation estimation is illustrated in Fig. 3.9. Note that on liver L1, fewer areas with reliable correspondences were identified compared to L2-L4, which exhibit a more salient geometry and hence more distinct feature descriptors on average.

Figure 3.9: Illustration of the spatial distribution of reliable point correspondences between ToF (right) and CT surfaces (left), for the four liver datasets L1-L4. Again, for convenience, only a subset of the found correspondences is shown.

3.6 Discussion and Conclusions

In this chapter, we have presented a feature-based rigid surface registration framework that meets the specific requirements of multi-modal medical applications. In particular, we have adapted state-of-the-art descriptors to enforce invariance w.r.t. mesh density, mesh organization and inter-modality deviations in surface topography. Let us remark that the proposed descriptors share several common properties. First, all of them project the original 3-D surface information onto a 2-D domain w.r.t. the given reference point. Second, the projected data is encoded using histogram techniques. Third, all three descriptors rely on a local object-centered reference frame. For spin images and RIFF, this local frame constitutes a 2-D basis that relies on the position of the reference point and its associated normal. In contrast, the MeshHOG descriptor builds upon a 3-D basis that is established by an additional axis.
This 3-D reference frame is advantageous in terms of descriptiveness compared to spin images and RIFF, provided it is established in a robust and repeatable manner [Petr 11].

In an experimental study, we have investigated the application of the framework to two medical applications. For marker-less initial patient setup in RT, applying our modified MeshHOG descriptor for RI/CT registration resulted in an average angular positioning error of 1.5 ± 1.3° and an average translational positioning error of 12.9 ± 6.6 mm, at a 97.5% success rate, on Microsoft Kinect RI data. In general, all three shape descriptors gave a comparable performance in the experiments, even in the presence of gross misalignments and a flat RI viewing angle (55°). The achieved setup accuracy fulfills the requirements for patient pre-alignment (min. ±50 mm [Fren 09]) and provides a reliable initialization for subsequent position refinement approaches [Baue 12b, Brah 08, Fren 09, Scho 07], thus potentially making the conventional manual initialization using lasers and skin markers redundant. Regarding the alignment of intra-operative organ surface data to pre-operative shapes extracted from tomographic data, our experimental results on porcine liver data in a ToF/CT setup suggest that a feature-based registration approach based on marker-less RI can replace the manual selection of anatomical landmarks in IGLS. Over four datasets, the average target registration error was 3.78 ± 1.08 mm. The successful application of the proposed framework to two different RI modalities and to different biological materials (skin vs. organ tissue) indicates the generalization potential of the approach to a wide range of clinical applications where an approximately rigid assumption holds true. By design, the method can handle gross initial misalignments and partial matching, and can also be applied to setups with multiple RI cameras, which typically provide more reliable surface data due to an increased and overlapping coverage of the target.

Let us briefly comment on runtime performance. First, recall that the processing of pre-operative reference data, including pre-processing, surface extraction and feature extraction, can be performed offline. During the procedure, the computation of shape descriptors on RI data and their matching to the pre-computed CT shape descriptors is required. The runtime for this intra-procedural workload highly depends on the descriptor parameterization and on the RI/CT mesh densities constituting the dimensionality of the search space. On average, for both the RT and the IGLS scenario, the runtime of our proof-of-concept implementation lies in the range of 30-60 s with a GPU-based cross-validation correspondence search. Further speed-ups could be achieved using dedicated acceleration structures, cf. Sect. 4.3.2.

Several components of the framework could be addressed to improve the performance. First, introducing a low-level feature detection stage that identifies locations with distinctive topographies [Holz 12] would reduce the computational effort for shape description and narrow down the correspondence search space. Second, modifications regarding the correspondence search strategy, e.g.
extending the geometric consistency check to angular constraints [Shan 04, Funk 06] or enforcing the second-best feature match to be substantially worse than the best one [Zaha 09], could further improve the robustness of the approach. Third and last, we expect benefits from performing feature extraction in a multi-scale scheme.

CHAPTER 4

Photo-geometric Rigid Surface Registration for Endoscopic Reconstruction

4.1 Medical Background  56
4.2 Related Work  58
4.3 Photo-geometric Surface Registration Framework  60
4.4 Experiments and Results  64
4.5 Discussion and Conclusions  75

The feature-based surface registration framework introduced in Chap. 3 relies on the topography of the shapes that are to be aligned. In this chapter, we propose a method for 3-D shape reconstruction that incorporates additional complementary photometric information from textured point cloud data to guide the underlying surface registration process. In an experimental study, we show that this photo-geometric approach is of particular interest for modern RGB-D cameras that provide low-SNR range measurements but additionally acquire high-grade photometric information. We address two particular medical applications and show that they can benefit from incorporating complementary photometric information:

• Operation situs reconstruction in 3-D laparoscopy
• Colon shape model construction in 3-D colonoscopy

Both applications involve real-time constraints for practical usage. Hence, we propose an ICP variant that (1) incorporates both geometric and photometric information in an efficient low-level scheme and (2) builds on a novel acceleration structure to overcome the traditional performance bottleneck in nearest neighbor (NN) search space traversal. Even though the acceleration structure is specifically designed for an implementation on many-core hardware [Cayt 10, Cayt 11], we have further optimized the scheme in terms of performance, trading off accuracy against runtime [Baue 13b, Neum 11].

The remainder of this chapter is organized as follows. In Sect. 4.1 and Sect. 4.2, we address the medical background and related work. In Sect. 4.3, we detail the proposed framework for photo-geometric surface registration and 3-D shape reconstruction. In Sect. 4.4, we study its performance in a classical computer vision scenario and present experimental results for operation situs reconstruction in laparoscopy and organ shape model construction in colonoscopy. Eventually, we discuss the results and draw conclusions in Sect. 4.5. Parts of this chapter have been published in [Baue 13b, Neum 11]. Parts from [Baue 13b] are reprinted with kind permission from Springer Science and Business Media, © Springer-Verlag London 2013.

4.1 Medical Background

Let us depict the medical background of the addressed applications. As an overall introduction, we first summarize the state of the art and comment on the potential of 3-D endoscopy. Then, we describe the particular applications we address in laparoscopy (Sect. 4.1.1) and colonoscopy (Sect. 4.1.2).
3-D endoscopy can help to accomplish both diagnostic and interventional minimally invasive procedures (MIP) more easily, more safely, and in a quantitative manner. For a comprehensive overview of prospective applications we refer to Sect. 2.3.4 and related literature [Moun 07, Miro 11]. Besides 3-D reconstruction from monocular endoscopic video data, there is an increasing body of work investigating the miniaturization of RI measurement principles for endoscopic application. To date, both the research community and hardware manufacturers focus on the development of rigid 3-D endoscopes. Recently, Röhl et al. presented a GPU-enhanced surface reconstruction framework for stereo endoscopy [Rohl 12], and Haase et al. [Haas 12] presented a first prototype for combined ToF/RGB endoscopy, manufactured by Richard Wolf GmbH, Knittlingen, Germany. Both systems provide photometric and geometric surface information in real-time. In addition, the literature reveals promising concepts that can potentially be translated to flexible 3-D endoscopy, e.g. photometric stereo [Coll 12], structured light [Schi 11, Schm 12], or ToF imaging [Penn 09]. More concretely, Clancy et al. presented a flexible multi-spectral structured illumination probe [Clan 11], yet limited to sparse 3-D measurements. In any case, to gain acceptance, 3-D endoscopes are required to capture complementary photometric video footage as in conventional 2-D endoscopy.

4.1.1 Operation Situs Reconstruction in Laparoscopy

Minimally invasive procedures have become a promising alternative to open surgery for an increasing number of applications. Working through small incisions holds substantial advantages compared to conventional surgery, including the reduction of post-operative patient pain and surgical trauma, a minimized risk of comorbidity, a reduced recovery period and shorter hospital stays, and improved cosmetic results. Laparoscopic surgery refers to MIP in the abdominal cavity. The procedure involves the inflation of carbon dioxide into the operation cavity to create the working volume (pneumoperitoneum), and the insertion of surgical tools into the pneumoperitoneum through small incisions that are sealed with trocars. The intervention is performed under remote video guidance through an endoscopic camera and one or more light sources.

Major limitations in endoscopic interventions include a limited field of view, distortions and inhomogeneous illumination, difficult hand-eye coordination, and the loss of tactile feedback and 3-D perception. This complicates the control of surgical tools and the assessment of pathological structures. Hence, endoscopic procedures require both skill and experience of the surgeon. In particular, this includes a high degree of dexterity, spatial awareness and orientation to navigate safely in vivo. Indeed, the success of laparoscopic procedures highly depends on precise knowledge of the patient anatomy during the intervention. Dynamic view expansion techniques are a promising option for conventional 2-D endoscopy [Totz 11, Totz 12, Warr 12]. Basically, view expansion techniques combine the previously observed endoscopic video footage into a panoramic view of the operation situs. Even though a 3-D reconstruction of the situs from monocular video data is theoretically possible, the image correspondence problem poses hurdles that are hard to overcome in practice.
Intra-operative data acquisition with 3-D endoscopes offers a much more practicable way to reconstruct the geometric topography of the operation situs: the endoscope is moved over the scene while the captured RI data stream is successively aligned. First and foremost, this provides the surgeon with an extended view of the target and surrounding anatomy. Second, the reconstructed shape can be aligned to pre-operative patient-specific reference data, cf. Chap. 3. This data fusion forms the basis for motion tracking and augmented reality applications, enhancing the surgeon's navigation beyond the observable tissue surface with pre-operative patient-specific data. Third, the geometric reconstruction of the operation situs is of particular interest in robotic laparoscopic surgery [Stoy 10].

4.1.2 Towards 3-D Model Construction in Colonoscopy

In colorectal screening, optical colonoscopy (OC) is considered the gold standard for visual diagnosis and intervention. After colon preparation using laxatives, a flexible endoscope (colonoscope) is inserted into the rectum until the tip reaches the cecum. The gastroenterologist then slowly retracts the scope while inspecting the colon for polyps and suspicious tissue. Note that the colon is inflated with gas to prevent it from collapsing. In contrast to the alternative screening techniques outlined below, OC gives the opportunity to take a biopsy or to remove pre-cancerous colonic polyps and suspicious lesions, if required.

A potential non-invasive alternative for colorectal screening is virtual colonoscopy (VC) [McFa 11], with a reported sensitivity of 96.1% for colorectal cancer [Pick 11]. Based on CT data of the lower abdomen in both prone and supine position, a virtual model of the inflated colon is generated. This model enables the gastroenterologist to perform interactive endoluminal fly-throughs (as in OC) to identify abnormalities. Compared to OC, VC does not require sedation, takes less time, and also reveals abnormalities outside the colon. However, it involves radiation exposure and a limited spatial resolution, and it precludes morphologic diagnosis and intervention. As a consequence, VC can be considered a non-invasive screening alternative for patients with an indicated risk to undergo OC. However, in the case of positive VC results, a conventional OC is necessary for intervention.

The fusion of pre-interventional VC data with interventional data acquired during OC is a promising future direction. For instance, on-the-fly registration of OC data with the colon model extracted from VC scans would allow guiding the gastroenterologist by tracking the OC position in the volumetric reference data. This would facilitate navigation and could potentially supersede the need for additional hardware such as in magnetic guidance [Hoff 07]. Usually, OC is performed without prior VC. In this case, a prospective application is the construction of a metric 3-D shape model of the colon from OC data to assist surgeons in quantitative diagnosis, inspection from different viewpoints, pre-operative planning, longitudinal monitoring of suspicious tissue, and surgical training. The extent of model construction depends on the application and ranges from building a model of a small lesion or polyp [Chen 09, Chen 10] up to reconstructing a full model of the entire colon from cecum to rectum. Ideally, the model should incorporate both 3-D shape and photometric appearance information.
To date, early approaches to model construction in colonoscopy rely on monocular sensors and exploit SfM/SLAM techniques for shape reconstruction [Kopp 07, Liu 08, Chen 09, Chen 10]. However, expecting the availability of flexible 3-D endoscopes in the medium term, we investigate their potential for shape reconstruction in 3-D OC.

4.2 Related Work

Both applications considered in this chapter require a fast technique to align RI data on-the-fly during the intervention. In contrast to Chap. 3, where we used feature-based global registration techniques to cope with gross misalignments, the alignment of successive frames from a hand-guided RI device can be addressed with a local registration approach, cf. Sect. 2.4.1. Two decades after its introduction [Besl 92, Chen 92, Zhan 94], the ICP algorithm is still the de-facto standard in rigid point cloud registration in the case of slight shape misalignments. Over the years, a multitude of ICP variants have been proposed in the literature, see Sect. 2.4.2 and the reviews by Rusinkiewicz and Levoy [Rusi 01] and Salvi et al. [Salv 07]. However, in the field of 3-D model reconstruction, only a few existing approaches have achieved interactive frame rates so far. Huhle et al. proposed a system for on-the-fly 3-D scene modeling using a low resolution ToF camera (160×120 px), achieving per-frame runtimes of >2 s [Huhl 08]. Engelhard et al. presented comparable runtimes on Microsoft Kinect data (640×480 px) for an ICP-based RGB-D SLAM framework [Enge 11], and Henry et al. [Henr 12] perform ICP registration in an average of 500 ms. Only recently, real-time frame rates were reported for geometric ICP variants. In particular, the GPU-based KinectFusion framework [Izad 11, Newc 11] has gained popularity in the field of 3-D reconstruction. Its core is based on the work of Rusinkiewicz et al. [Rusi 02], combining projective data association [Blai 95] and a point-to-plane metric [Chen 92] for transformation estimation. In projective data association, the set of moving points is typically projected onto the fixed mesh to establish the correspondences, instead of performing a nearest neighbor search over the entire fixed mesh. This projection approach involves an inferior convergence behavior [Blai 95], but can be performed very efficiently. The identification of corresponding points typically relies on comparing characteristics of the moving point to its corresponding projection point on the fixed mesh. This hard constraint can be relaxed by extending the correspondence search to the local neighborhood of the projected point on the fixed mesh. However, this involves a substantial computational burden [Dora 97, Benj 99]. More than a decade ago, Godin et al. [Godi 94], Johnson and Kang [John 97], and Weik [Weik 97] presented approaches to incorporate photometric information into the ICP framework (photo-geometric ICP) to improve its robustness.

Figure 4.1: Incorporating photometric information into ICP alignment in situations of non-salient surface geometry: (a, b) first and last frame of an RGB-D sequence capturing a colored poster stuck to a planar wall from changing perspectives. Considering only scene geometry results in an erroneous alignment (c). Instead, with the proposed method exploiting both geometric and photometric information, a proper alignment is found (d).
The idea is that photometric information can compensate for regions with non-salient topographies, whereas geometric information can guide the pose estimation for faintly textured regions, see Fig. 4.1. The rare consideration of the concept of photo-geometric ICP in the literature may result from (1) the unavailability of low-cost RGB-D sensors until recently and (2) the fact that it requires a 6-D nearest neighbor search, implying a substantial performance bottleneck. Modifications have been proposed that try to accelerate the nearest neighbor search by pruning the search space w.r.t. photometrically dissimilar points [Druo 06, Joun 09]. However, this reduction typically comes with a loss in robustness. Traditional approaches for efficient nearest neighbor search rely on space-partitioning data structures like k-D trees [Aken 02]. However, these structures are unsuitable for implementation on modern many-core hardware due to the non-parallel and recursive nature of the construction and/or traversal of the underlying data structures. Recently, space-partitioning strategies have been introduced that are specifically designed for many-core architectures. A promising approach is the random ball cover (RBC) proposed by Cayton [Cayt 10, Cayt 11]. The basic principle behind the RBC is a two-tier nearest neighbor search. Even though the nearest neighbor search itself builds on the brute force (BF) search primitive, the introduction of a two-tier search hierarchy enables considerable speedups. In this work, trading accuracy against runtime, we propose a new approximative RBC variant that is optimized in terms of runtime performance to accelerate the nearest neighbor search in photo-geometric ICP alignment. In the applications addressed in this chapter, we expect a photo-geometric ICP to be of particular interest regarding (1) flat, spherical, or tubular structures (e.g. in tracheal endoscopy, bronchoscopy or colonoscopy) that evoke ambiguities in geometry-driven surface registration, and (2) its application with modern RGB-D sensors that exhibit a low SNR in the range domain but additionally provide high-grade photometric information.

4.3 Photo-geometric Surface Registration Framework

The proposed framework for on-the-fly photo-geometric point cloud mapping and 3-D shape modeling is composed of three stages, as depicted in Fig. 4.2. The initial stage involves RGB-D data acquisition and pre-processing. In the second stage, based on a set of landmarks, the proposed photo-geometric ICP variant is applied (Sect. 4.3.1). The rigid-body transformation is estimated in a frame-to-frame manner, i.e. the pose of the instantaneous frame is estimated by registration against the previous frame. In the third stage, based on the estimated transformation, the instantaneous RI data are integrated into a global shape model.

4.3.1 Photo-geometric ICP Scheme

The ICP algorithm estimates the rigid transformation (R, t) that brings a moving template point set Xm = { xm,1, ..., xm,|Xm| } into congruence with a fixed reference point set Xf = { xf,1, ..., xf,|Xf| } [Besl 92, Chen 92]. Based on an initial guess (R0, t0), the scheme iteratively estimates the optimal transformation by minimizing an error metric assigned to repeatedly generated pairs of corresponding landmarks (xm, xc), where xm ∈ Xm and xc ∈ Xf.
A standard choice for the distance $d$ between an individual moving landmark $x_m$ and the set of reference landmarks $\mathcal{X}_f$ is the squared Euclidean distance:

$$ d(x_m, \mathcal{X}_f) = \min_{x_f \in \mathcal{X}_f} \| x_f - x_m \|_2^2 . \qquad (4.1) $$

The landmark $x_c \in \mathcal{X}_f$ yielding the minimum distance to $x_m$ is then given by:

$$ x_c = \operatorname*{argmin}_{x_f \in \mathcal{X}_f} \| x_f - x_m \|_2^2 . \qquad (4.2) $$

Now, let us consider RGB-D data, where a point holds both geometric information $x \in \mathbb{R}^3$ and photometric information $p \in \mathbb{R}^3$. In order to compensate for inconsistencies due to changes in illumination and viewpoint direction, we first transfer the photometric information to normalized RGB space [Geve 99],

$$ p = (p_r + p_g + p_b)^{-1} \, (p_r, p_g, p_b)^\top , \qquad (4.3) $$

where $p_r, p_g, p_b$ denote the measured intensities of the red, green and blue photometric channels. We denote the concatenation of both domains for the moving and fixed data as $\mathcal{M}$ and $\mathcal{F}$:

$$ \mathcal{M} = \{ (x_{m,1}, p_{m,1}), \dots, (x_{m,|\mathcal{X}_m|}, p_{m,|\mathcal{X}_m|}) \} , \qquad (4.4) $$
$$ \mathcal{F} = \{ (x_{f,1}, p_{f,1}), \dots, (x_{f,|\mathcal{X}_f|}, p_{f,|\mathcal{X}_f|}) \} . \qquad (4.5) $$

Figure 4.2: Flowchart of the proposed photo-geometric rigid surface registration and 3-D shape model reconstruction framework.

Our proposed photo-geometric ICP variant is designed to incorporate both geometric and photometric information in the correspondence search. Hence, we modify the distance metric $d$:

$$ d(x_m, \mathcal{X}_f) = \min_{(x_f, p_f) \in \mathcal{F}} \beta \| x_f - x_m \|_2^2 + (1 - \beta) \| p_f - p_m \|_2^2 , \qquad (4.6) $$

where $\beta \in [0, 1]$ is a constant weighting the relative influence of the geometric and photometric information, respectively. The landmark $x_c \in \mathcal{X}_f$ yielding the minimum distance to $x_m$ is given by:

$$ x_c = \operatorname*{argmin}_{(x_f, p_f) \in \mathcal{F}} \beta \| x_f - x_m \|_2^2 + (1 - \beta) \| p_f - p_m \|_2^2 . \qquad (4.7) $$

Assigning a nearest neighbor $x_c$ to all $x_m \in \mathcal{X}_m$ yields a set of corresponding points $\mathcal{X}_c = \{ x_{c,1}, \dots, x_{c,|\mathcal{X}_m|} \}$, and the set of landmark correspondences can be denoted as $\mathcal{C} = \{ (x_{m,1}, x_{c,1}), \dots, (x_{m,|\mathcal{X}_m|}, x_{c,|\mathcal{X}_m|}) \}$, cf. Sect. 3.3.1. Next, based on the landmark correspondences $\mathcal{C}^k$ found in the $k$-th ICP iteration, the transformation $(\hat{R}^k, \hat{t}^k)$ can be estimated by either minimizing a point-to-point error metric in a least-squares sense using a unit quaternion optimizer [Horn 87], also recall Eq. (3.12),

$$ (\hat{R}^k, \hat{t}^k) = \operatorname*{arg\,min}_{R^k, t^k} \frac{1}{|\mathcal{C}^k|} \sum_{\mathcal{C}^k} \| (R^k x_m^k + t^k) - x_c^k \|_2^2 , \qquad (4.8) $$

or by minimizing a point-to-plane distance metric using a nonlinear solver as originally proposed by Chen and Medioni [Chen 92]:

$$ (\hat{R}^k, \hat{t}^k) = \operatorname*{arg\,min}_{R^k, t^k} \frac{1}{|\mathcal{C}^k|} \sum_{\mathcal{C}^k} \left( ((R^k x_m^k + t^k) - x_c^k)^\top n_{x_c^k} \right)^2 . \qquad (4.9) $$

Here, $n_{x_c^k}$ denotes the surface normal associated with the point $x_c^k \in \mathcal{X}_f$. After each iteration, the global ICP solution $(R_{\mathrm{icp}}, t_{\mathrm{icp}})$ is accumulated:

$$ R_{\mathrm{icp}} = \hat{R}^k R_{\mathrm{icp}} , \qquad t_{\mathrm{icp}} = \hat{R}^k t_{\mathrm{icp}} + \hat{t}^k , \qquad (4.10) $$

and the elements of $\mathcal{X}_m^k$ are updated according to $x_m^k = R_{\mathrm{icp}} x_m + t_{\mathrm{icp}}$. The two alternating stages of (1) finding the set of nearest neighbors $\mathcal{X}_c^k$ and (2) estimating the optimal transformation given the correspondences $\mathcal{C}^k$ are repeated iteratively until a convergence criterion is fulfilled, see Sect. 4.4. In the following section, we describe the RBC-based 6-D nearest neighbor search framework that we use to establish the ICP landmark correspondences (Eq. 4.7) in an efficient manner. In particular, we detail how we optimized the scheme, trading off accuracy against runtime performance.
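For illustration, the following numpy sketch mirrors the chromaticity normalization of Eq. (4.3) and the photo-geometric correspondence search of Eqs. (4.6), (4.7) with a plain BF primitive. It is a minimal CPU sketch with hypothetical names; the implementation discussed later runs on the GPU and replaces the BF search by the RBC scheme of Sect. 4.3.2.

import numpy as np

def normalized_rgb(p):
    """Map measured RGB intensities to normalized RGB space (Eq. 4.3)."""
    return p / np.sum(p, axis=-1, keepdims=True)

def photo_geometric_correspondences(Xm, Pm, Xf, Pf, beta):
    """Brute-force photo-geometric correspondence search (Eq. 4.7): for
    each moving landmark, return the fixed landmark minimizing the
    beta-weighted sum of squared geometric and photometric distances
    (Eq. 4.6)."""
    d_geo = np.sum((Xm[:, None, :] - Xf[None, :, :]) ** 2, axis=2)
    d_pho = np.sum((Pm[:, None, :] - Pf[None, :, :]) ** 2, axis=2)
    idx = np.argmin(beta * d_geo + (1.0 - beta) * d_pho, axis=1)
    return Xf[idx], idx

Note that the pairwise distance matrices make the BF search scale quadratically with the number of landmarks, which is precisely the bottleneck the RBC addresses.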
4.3.2 Approximative 6-D Nearest Neighbor Search using RBC

The RBC is a novel data structure for parallelized nearest neighbor search, proposed by Cayton [Cayt 10, Cayt 11]. By design, both the construction of the RBC and dataset queries build on BF search primitives that can be performed efficiently on many-core hardware. The RBC data structure relies on randomly selected points $r \in \mathcal{F}$, called representatives. Each of them manages a local subset of $\mathcal{F}$. This indirection creates a hierarchy in the database such that a nearest neighbor query is processed by (1) searching the nearest neighbor $r$ among the set of representatives and (2) performing another search over the subset of entries managed by $r$. This two-tier approach outperforms a global BF search because each of the two successive stages explores a heavily pruned search space. In this work, we have investigated the suitability of the RBC for accelerating the 6-D nearest neighbor search of our photo-geometric ICP. In particular, we have optimized the concept in terms of runtime performance. Cayton proposed two alternative RBC search strategies [Cayt 11]: The exact search is the appropriate choice when the exact nearest neighbor is required. Otherwise, if a small error can be tolerated, the approximative one-shot search is typically faster. Originally, in order to set up the one-shot data structure, the representatives are chosen at random, and each $r$ manages its $S$ closest database elements. Depending on $S$, points typically belong to more than one representative. This implies a sorting of all database entries for each representative, hindering a high degree of parallelization or implying the need for multiple BF runs [Cayt 10]. Hence, we introduce a modified version of the one-shot approach that is further optimized in terms of performance. In particular, we simplified the RBC construction, trading off accuracy against runtime, see Fig. 4.3 (a–c). First, we select a random set of representatives $\mathcal{R} = \{ r_1, \dots, r_{|\mathcal{R}|} \}$ out of the set of fixed points $\mathcal{F}$. Second, each representative $r$ is assigned a local subset of $\mathcal{F}$. This is done in an inverse manner by computing the nearest representative $r$ for each point in $\mathcal{F}$.

Figure 4.3: Illustration of the RBC construction (a–c) and the two-tier nearest neighbor query scheme (d–f) for the simplified case of 2-D data. (a) Selection of a set of representatives $\mathcal{R}$ (labeled in dark blue) out of the set of database entries $\mathcal{F}$ (light and dark blue). (b) Nearest representative search over the set of database entries, to establish a landmark-to-representative mapping. (c) Nearest neighbor set of each representative (shaded in blue). (d) Query data (orange) and set of representatives $\mathcal{R}$ (dark blue). (e) Identification of the closest representative $r$, in a first BF search run. (f) Identification of the nearest neighbor (green) in the subset of entries managed by $r$ (shaded in blue), in a second BF search run.

The query scheme of our modified one-shot RBC variant is consistent with the original approach and can be performed efficiently using two subsequent BF runs [Cayt 11], see Fig. 4.3 (d–f). First, the closest representative is identified among $\mathcal{R}$. Second, the nearest neighbor is located within the subset of entries managed by $r$. Let us stress that this modified RBC construction scheme results in an approximative nearest neighbor search that is error-prone from a theoretical point of view. In practice, facing the trade-off between accuracy and runtime, we tolerate this approximation, cf. Sect. 4.4.1.
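A minimal CPU sketch of the simplified construction and the two-tier query, operating on 6-D photo-geometric signatures stacked into a single array, might look as follows; the helper names are hypothetical, and the thesis implementation is a parallel CUDA variant of these two BF passes.

import numpy as np

def rbc_build(F6, n_rep, rng=None):
    """Simplified one-shot RBC construction (Sect. 4.3.2): select random
    representatives, then assign every database entry to its nearest
    representative in a single BF pass (no sorting, no multiple BF runs)."""
    rng = rng or np.random.default_rng(0)
    rep_idx = rng.choice(len(F6), size=n_rep, replace=False)
    reps = F6[rep_idx]
    d = np.sum((F6[:, None, :] - reps[None, :, :]) ** 2, axis=2)
    owner = np.argmin(d, axis=1)   # landmark-to-representative mapping
    # Each representative claims at least itself, so in practice the
    # cells are non-empty.
    cells = [np.flatnonzero(owner == r) for r in range(n_rep)]
    return reps, cells

def rbc_query(q, F6, reps, cells):
    """Two-tier query: BF over the representatives, then BF within the
    subset of entries managed by the closest representative."""
    r = np.argmin(np.sum((reps - q) ** 2, axis=1))
    members = cells[r]
    return members[np.argmin(np.sum((F6[members] - q) ** 2, axis=1))]

The approximation error stems from the construction: the true nearest neighbor of a query may be owned by a representative other than the one closest to the query.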
We further remark that the scheme is not limited to 6-D data but can be applied to data of any dimension without loss of generality. For application in surface registration, this potentially allows extending the point signature for the ICP correspondence search from 6-D to higher dimensions, e.g. by appending additional complementary information or local feature descriptors to the raw geometric and photometric measurements acquired by the sensor, cf. [Henr 12].

4.4 Experiments and Results

The experiments in this chapter divide into three parts. First, we study the overall performance of the proposed framework on classical computer vision scenarios (Sect. 4.4.1). Here, we put emphasis on quantifying the benefits of using our optimized approximative RBC compared to its original formulation [Cayt 11] and a BF baseline. Second and third, we present synthetic experiments to investigate the application of the framework to operation situs reconstruction in 3-D laparoscopy (Sect. 4.4.2) and organ shape model construction in 3-D colonoscopy (Sect. 4.4.3). Beforehand, let us briefly outline the entire system for model reconstruction as illustrated in Fig. 4.2. Prior to point cloud alignment, the acquired RGB-D data are pre-processed. For RI data enhancement, we combine restoration of invalid measurements using normalized convolution with edge-preserving denoising (Sect. 2.2.3). The application of 3-D scene reconstruction using a real-time, hand-held and steadily moved RGB-D device implies that a portion of the scene that was captured in the previous fixed frame is no longer visible in the instantaneous moving data, and vice versa. Facing this issue, we discard the subset of moving points $x_m \in \mathcal{X}_m$ that correspond to range measurements within the boundary area of the 2-D sensor domain, to improve the robustness of the ICP alignment. This clipping is performed in conjunction with the extraction of the sets of ICP landmarks $\mathcal{F}, \mathcal{M}$ from the fixed reference and moving template data. The landmark extraction is performed by sub-sampling the clipped point set. The ICP transformation is estimated by minimizing a point-to-point distance metric (Eq. 4.8).

4.4.1 Performance Study

In this section, we evaluate the proposed photo-geometric ICP for on-the-fly scene and object reconstruction scenarios on real data from a hand-guided Microsoft Kinect RI sensor. First, we present qualitative results for both indoor scene mapping and object reconstruction scenarios. Second, we demonstrate the real-time capability of the framework in a thorough performance study. Third, we compare our approximative RBC variant to an exact NN search in terms of accuracy. For all experiments, the number of representatives was set to $|\mathcal{R}| = |\mathcal{F}|^{\frac{1}{2}}$, following the suggestions of Cayton [Cayt 11], if not stated otherwise. As ICP convergence criterion, we analyze the variation of the estimated transformation over subsequent iterations. In particular, we evaluate the change in translation magnitude and rotation angle w.r.t. heuristically set thresholds of 0.01 mm and 0.001°, respectively. As initialization for the ICP alignment, we incorporate the estimated global transformation $(R^0, t^0)$ from the previously aligned frame, see Fig. 4.2, assuming a smooth trajectory of the hand-guided acquisition device with a consistent direction. This speeds up convergence and thus reduces the overall runtime.
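As an illustration of this stopping criterion, the following sketch compares the estimates of two subsequent iterations against the thresholds stated above; extracting the rotation angle via the trace formula is one possible convention, and the function name is hypothetical.

import numpy as np

def icp_converged(R_prev, t_prev, R_cur, t_cur, eps_t=0.01, eps_r=0.001):
    """Check the ICP stopping criterion (Sect. 4.4.1): the change in
    translation magnitude [mm] and rotation angle [deg] between two
    subsequent iterations must fall below heuristic thresholds."""
    dt = np.linalg.norm(t_cur - t_prev)
    dR = R_cur @ R_prev.T                        # relative rotation update
    cos_angle = np.clip((np.trace(dR) - 1.0) / 2.0, -1.0, 1.0)
    return dt < eps_t and np.degrees(np.arccos(cos_angle)) < eps_r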
Regarding the robustness and accuracy of point cloud alignment, we observed a strong impact of outliers that may occur in pre-processed RGB-D data due to changes in viewpoint direction or occlusion and cannot be eliminated by denoising. To account for these outliers, we optionally reject low-grade correspondences in the transformation estimation stage. The term low-grade is quantified by comparing the distance of a corresponding pair of landmarks (cf. Eq. 4.6) against an empirical threshold $\delta_{\mathrm{lg}}$. The set of low-grade correspondences is re-computed for each ICP iteration and discarded in the subsequent transformation estimation step.
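A sketch of this rejection step is given below; applying the threshold to the squared weighted distance of Eq. (4.6), with $\delta_{\mathrm{lg}}$ squared accordingly, is an assumed convention, and the names are illustrative.

import numpy as np

def reject_low_grade(Xm, Pm, Xc, Pc, beta, delta_lg):
    """Discard low-grade correspondences: pairs whose beta-weighted
    photo-geometric distance (cf. Eq. 4.6) exceeds the empirical
    threshold delta_lg are excluded from transformation estimation."""
    d = beta * np.sum((Xc - Xm) ** 2, axis=1) \
        + (1.0 - beta) * np.sum((Pc - Pm) ** 2, axis=1)
    keep = d <= delta_lg ** 2     # assumed: threshold on squared distance
    return Xm[keep], Xc[keep]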
Figure 4.4: First row: On-the-fly 3-D reconstruction of a lounge room (526 frames). The left image depicts a bird's-eye view of the respective room layout. The images on the right provide a zoom-in for selected regions. Second row: 3-D reconstruction of a female torso model (cf. Fig. 3.6), where the hand-held acquisition device was moved around the model in a 360°-fashion to cover the entire object from different perspectives (525 frames).

Qualitative Reconstruction Results. Qualitative results for the reconstruction of indoor environments are depicted in the first row of Fig. 4.4. The RGB-D sequences were acquired from a static observer location by rotating the hand-held sensor. The alignment was performed on-the-fly. The room was reconstructed using the following pre-processing and ICP/RBC settings: edge-preserving denoising (geometric median, geometric and photometric guided image filter), $|\mathcal{F}| = |\mathcal{M}| = 2^{14}$ ICP landmarks, 10% edge clipping, $\beta = 0.0005$, no elimination of low-grade correspondences ($\delta_{\mathrm{lg}} \to \infty$). Note the scale of $\beta$, which results from the different scales of the geometric domain operating in [mm] and the photometric domain operating in the range [0, 1]. In addition to scene reconstruction, the proposed framework can also be applied to 3-D model digitalization, see the second row in Fig. 4.4. Here, the hand-held acquisition device is moved around an object to acquire RGB-D data from different perspectives while continuously merging the data into a global model using the proposed framework. For the case of 3-D object reconstruction, we apply a dedicated scheme for landmark extraction. Instead of considering the entire scene, we segment the foreground using a depth threshold. From the set of foreground pixels, we then select a subset of landmarks. Background data points that are located beyond a certain depth level are ignored within the ICP alignment procedure. The settings for object reconstruction were: edge-preserving denoising, $2^{14}$ ICP landmarks, $\beta = 1$ (invariance to illumination issues), $\delta_{\mathrm{lg}} = 3$ mm. Regarding the effectiveness of the proposed system for the reconstruction of scenes with non-salient 3-D geometry, we refer to Fig. 4.1. Facing a colored poster stuck to a planar wall, the reconstruction benefits considerably from incorporating the photometric domain as a complementary source of information.

Runtime Performance. Now let us study the potential of the proposed RBC-based ICP framework in terms of runtime performance. For that purpose, we have implemented the proposed photo-geometric framework on the GPU using CUDA [Baue 13b]. The runtime study was conducted on an off-the-shelf consumer desktop PC equipped with an NVIDIA GeForce GTX 460 GPU and a 2.8 GHz Intel Core 2 Quad Q9550 CPU. Runtimes were averaged over several successive runs.

Figure 4.5: Comparison of the average runtime for a single ICP iteration based on GPU implementations of the BF search primitive, the exact RBC and our optimized approximative RBC variant as described in Sect. 4.3.2, for an increasing number of landmarks ($2^{10}$ to $2^{14}$). Note that our modified approximative RBC approach outperforms the exact RBC by up to a factor of 3. The BF primitive scales quadratically w.r.t. the number of landmarks.

A comparison of absolute runtimes for a single ICP iteration is presented in Fig. 4.5. Our modified approximative RBC outperforms both a BF search and our reference implementation of Cayton's exact RBC. In particular, the approximative RBC variant outperforms the exact RBC implementation by up to a factor of 3. The BF search scales quadratically with the number of landmarks. Typical ICP runtimes are presented in Table 4.1. From our experiments on indoor scene mapping, we observed the ICP to converge after 10–20 iterations using the stopping criterion described in Sect. 4.4.1. Hence, as an overall performance indicator, let us refer to the runtime of 19.1 ms for 20 iterations with $2^{14}$ landmarks.

Approximative RBC. As detailed in Sect. 4.3.2, our approximative RBC construction and NN search trades exactness for runtime speedup. We quantitatively investigated the error that results from this approximation as opposed to an exact BF search, comparing the Euclidean mesh-to-mesh distance of the aligned point clouds and considering the BF-based transformation estimate as gold standard, see Fig. 4.6a. With an increasing number of representatives $|\mathcal{R}|$, the mapping error rises until dropping sharply when approaching $|\mathcal{R}| = |\mathcal{F}|$. Vice versa, for $|\mathcal{R}| \ll |\mathcal{F}|$, decreasing $|\mathcal{R}|$ with a fixed number of landmarks reduces the error. This results from our approximative RBC construction scheme, where the probability of erroneous NN assignments increases with the number of representatives. Please note that both situations of $|\mathcal{R}| = 1$ and $|\mathcal{R}| = |\mathcal{F}|$ correspond to a classical BF search, hence yielding an identical alignment and a mean error of zero. In general, increasing the number of landmarks decreases the error. We remark that using our default configuration ($2^{14}$ landmarks, $|\mathcal{R}| = 2^7$), the mapping error is less than 0.25 mm. This is an acceptable scale for the large-scale applications considered here.

Figure 4.6: (a) Evaluation of the influence of $|\mathcal{R}|$ on mapping accuracy, compared to an exact BF search, for a varying number of landmarks. The graphs show both discretized measurements and a trendline for each setting. Note the semi-log scale. (b) Runtimes of a single ICP iteration, for a varying number of landmarks/representatives. Note the log scale.

# Landmarks | $|\mathcal{R}| = |\mathcal{F}|^{1/2}$ | $t_{\mathrm{RBC,C}}$ | $t_{\mathrm{ICP}}$ | $t_{\mathrm{tot}}$ (10 its) | $t_{\mathrm{tot}}$ (20 its)
$2^{10}$   |  32 | 0.58 | 0.25 | 3.13 |  5.68
$2^{11}$   |  45 | 0.60 | 0.27 | 3.31 |  6.03
$2^{12}$   |  64 | 0.63 | 0.32 | 3.80 |  6.97
$2^{13}$   |  91 | 0.76 | 0.50 | 5.80 | 10.82
$2^{14}$   | 128 | 0.90 | 0.91 | 9.96 | 19.07

Table 4.1: Runtimes in [ms] for the construction of the RBC data structure ($t_{\mathrm{RBC,C}}$) and ICP execution for reconstructing a typical indoor scene, for a varying number of landmarks. We state both the runtime for a single ICP iteration $t_{\mathrm{ICP}}$ and typical total ICP runtimes $t_{\mathrm{tot}}$ (including RBC construction) for 10 and 20 iterations, respectively.
In addition, we have related the runtime per ICP iteration to the number of representatives, see Fig. 4.6b. Apart from the runtime minimum, which is located around $|\mathcal{R}| = 2|\mathcal{F}|^{\frac{1}{2}}$, the computational load rises when decreasing or increasing $|\mathcal{R}|$. Simultaneously, the error decreases, recall Fig. 4.6a. Hence, the application-specific demands in terms of runtime and accuracy motivate the choice of $|\mathcal{R}|$. Together, Figs. 4.6a,b nicely illustrate the trade-off between error and runtime.

4.4.2 Experiments on Operation Situs Reconstruction

Now let us consider the application of the proposed photo-geometric registration framework to operation situs reconstruction in laparoscopy. Even though promising concepts for 3-D endoscopy have been introduced lately, recall Sect. 4.1, the commercial availability of such hardware remains an open issue. Hence, in this work, we performed a comprehensive study on synthetic RGB-D data. We explicitly investigate the benefit of the proposed photo-geometric approach over a solely geometry-driven variant in the presence of low-SNR range measurements.

Materials and Methods. For the experiments, we generated a synthetic scene mimicking the scenario of a minimally invasive procedure in the abdominal cavity. We then used a virtual RGB-D camera to acquire both geometric range and photometric texture information while successively moving the virtual camera over the operation situs. Let us stress that the photometric values at a specific location on the scene vary with changing camera perspective, due to the underlying shader. The synthetic scene was generated from human organ shapes extracted from CT data. For the experiments below, we used mesh data from a publicly available database¹. It includes both anonymized medical images and manual segmentations of structures of interest performed by clinical experts. The synthetic scene was created in collaboration with a physician. In particular, we combined mesh data of the liver, stomach, and gallbladder of a female patient, and modeled the surrounding fat tissue. The organ textures were generated using a professional medical visualization texture package² and mapped onto the individual anatomical structures using texture mapping techniques, see Fig. 4.7a. Next, using our virtual RGB-D camera (Sect. 2.2.2), we manually generated a set of ten different camera paths.

Figure 4.7: (a) Synthetic scene mimicking the scenario of a minimally invasive procedure in the abdominal cavity. (b) Illustration of a camera path, showing both photometric (first rows) and geometric depth information (second rows, darkness denotes closeness, brightness remoteness to the RI camera). For convenience, the entire RGB-D sequence is depicted using eight keyframes from top left to bottom right.

Figure 4.8: Boxplots of the (a) translational and (b) rotational drift for 3-D geometric ICP registration (in blue) and 6-D photo-geometric ICP registration (in green), for the ten camera paths used for quantitative evaluation.
The paths were set up in such a way that the entire operation situs is roughly covered. For an illustration, see Fig. 4.7b. Each path is sampled with an RGB-D stream covering 200 frames. The FOV of the virtual RGB-D camera was set to 80°, a typical value for laparoscope optics [Yama 07]. For evaluation, we aligned the ten different RGB-D data streams on-the-fly and integrated the registered RI surface data from successive frames into a global shape model. The model is based on the concept of truncated signed distance functions (TSDF), along the lines of Curless and Levoy [Curl 96] and Hilton et al. [Hilt 96]. In terms of parameterization, we used 20,000 landmarks, $|\mathcal{R}| = 256$ RBC representatives, a geometric weighting of $\beta = 0.001$, 10% edge clipping, and 50 ICP iterations. We quantify the alignment error by evaluating the relative camera transformation error in a frame-to-frame manner, i.e. considering the local accuracy of the camera trajectory over subsequent frames, also known as drift. In particular, along the lines of Sturm et al. [Stur 12], we define the relative camera transformation error $E_i \in \mathbb{R}^{4 \times 4}$ for frame $i$ as:

$$ E_i := T_{\mathrm{GT},i}^{-1} \, T_i , \qquad (4.11) $$

where $T_{\mathrm{GT},i} \in \mathbb{R}^{4 \times 4}$ denotes the ground truth camera transformation at frame $i$ w.r.t. the previous frame $(i-1)$, known from the given camera path, and $T_i \in \mathbb{R}^{4 \times 4}$ the transformation that was estimated based on surface registration using the proposed framework. From these error matrices $E_i$, we eventually consider the translational and rotational components and calculate the per-frame Euclidean translation error $\|\Delta t_E\|_2$ and the mean rotational error $|\Delta\theta_E| = \frac{1}{3}(|\Delta\theta_{E,x}| + |\Delta\theta_{E,y}| + |\Delta\theta_{E,z}|)$, a suitable metric in the presence of small angles, over the entire RI sequence. Note that this per-frame drift metric is particularly useful for the evaluation of scene reconstruction scenarios. In order to investigate the resilience of the photo-geometric ICP variant to noise, we applied Gaussian noise to the virtual range image data at a rather rigorous noise level of $\sigma = 5$ mm. As would be done in practice, we applied edge-preserving denoising using guided image filtering (Sect. 2.2.3).

¹ http://www.ircad.fr/softwares/3Dircadb/3Dircadb1/index.php?lng=en
² http://www.doschdesign.com/products/textures/medical_visualization_v3.html
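For reference, the following sketch evaluates the drift metric of Eq. (4.11) for a single frame, given the homogeneous 4×4 ground truth and estimated relative transformations; the x-y-z Euler convention used for the angular decomposition is an assumption that is adequate for small angles.

import numpy as np

def frame_to_frame_drift(T_gt, T_est):
    """Relative camera transformation error E_i = T_GT,i^{-1} T_i
    (Eq. 4.11), decomposed into the Euclidean translation error
    ||dt_E||_2 and the mean absolute rotation angle over the three
    axes, |dtheta_E|."""
    E = np.linalg.inv(T_gt) @ T_est
    trans_err = np.linalg.norm(E[:3, 3])
    R = E[:3, :3]
    theta_x = np.arctan2(R[2, 1], R[2, 2])
    theta_y = np.arcsin(np.clip(-R[2, 0], -1.0, 1.0))
    theta_z = np.arctan2(R[1, 0], R[0, 0])
    rot_err = np.mean(np.abs(np.degrees([theta_x, theta_y, theta_z])))
    return trans_err, rot_err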
Figure 4.9: (a, b) Boxplots of the translational and rotational drift for 6-D photo-geometric ICP registration, on ideal (green) vs. noisy range data (red), for the ten camera paths. (c, d) Translational and rotational drift for 3-D geometric ICP registration, on ideal (blue) vs. noisy range data (red).

Results. Quantitative results comparing the performance of the conventional geometric (3-D) ICP against the proposed photo-geometric variant (6-D) in terms of translational/rotational drift are depicted in Fig. 4.8. Averaged over all ten camera paths, the mean errors were $\|\Delta t_E\|_2 = 0.264$ mm, $|\Delta\theta_E| = 0.079°$ for 3-D ICP and $\|\Delta t_E\|_2 = 0.020$ mm, $|\Delta\theta_E| = 0.013°$ for 6-D ICP. This corresponds to a substantial improvement in reconstruction accuracy by a factor of 12.9 (translation) and 5.9 (rotation), respectively.

Figure 4.10: Qualitative reconstruction results for the camera path shown in Fig. 4.7. The Euclidean mesh-to-mesh distance of the reconstructed scene to the known ground truth is color-coded. The second row shows the reconstruction results. The first row additionally depicts the ground truth data. From left to right: Reconstruction results for (a) geometric and (b) photo-geometric ICP on ideal synthetic data, and (c) for photo-geometric ICP on noisy data. Note the salient seam between the first and last frame (marked with a black ellipse) for geometric reconstruction. With photo-geometric reconstruction on noisy data, the seam is reduced substantially. For photo-geometric reconstruction on ideal data it is hardly visible.

Comparing the reconstruction results of our photo-geometric approach on ideal synthetic range data to the results on noisy range data, see Fig. 4.9a,b, we observed a decrease in accuracy. On average over all paths, the per-frame drift increased by a factor of 2.1 (translation) and 2.2 (rotation). Note that in absolute numbers, the mean translational and rotational errors with noisy data are still in an acceptable range of $\|\Delta t_E\|_2 = 0.044$ mm, $|\Delta\theta_E| = 0.029°$ and substantially outperform the geometric ICP variant on ideal range data (Fig. 4.9c,d). In addition, let us compare the influence of noise independently for 3-D ICP and 6-D ICP, see Fig. 4.9a–d. On average, the absolute increase in drift for the 3-D ICP due to noise exceeded the absolute increase in drift for the 6-D ICP by a factor of 7.0 (translation) and 3.3 (rotation). This confirms our initial assumption that the proposed photo-geometric approach is of particular interest for RGB-D cameras that provide low-SNR range measurements but additionally acquire high-grade photometric information. Qualitative results of the reconstructed operation situs based on the camera path depicted in Fig. 4.7 are illustrated in Fig. 4.10. Here, the local Euclidean mesh-to-mesh distance between the reconstructed and ground truth shape is color-coded. Note that for the geometry-driven variant (Fig. 4.10a), the non-negligible drift behavior makes the reconstruction fail with an increasing number of frames considered. The photo-geometric reconstruction provides reasonable results, even on noisy data.

Figure 4.11: Realistic renderings of the colonic lumen during virtual fly-through. The first row depicts photometric data. In the second row, the associated range data is gray-coded, where darkness denotes closeness, brightness remoteness to the RI camera. Note the symmetries of the tubular shapes in (a) and (c) and the triangular-shaped muscles seen when inflating the colon with gas.

4.4.3 Experiments on Colon Shape Model Construction

In this section, we address the application of the photo-geometric registration framework for the construction of a shape model of the colon in 3-D colonoscopy. Experiments are performed on synthetic RGB-D data generated from realistically textured colon meshes that were extracted from virtual colonoscopy CT data.
Materials and Methods. The experiments below are conducted on textured colon meshes from the work of Passenger et al. [Pass 08]³. These were generated from abdominal CT data from a VC study⁴. As detailed in Passenger et al. [Pass 08], the textured colon meshes were generated as follows: First, the colon is segmented from the CT data by edge-preserving anisotropic smoothing of the image data [Whit 01], subsequent thresholding to separate the gas-filled areas from the background, and connected component analysis to select the colon. Second, the resulting binary segmentation mask is meshed using Marching Cubes [Lore 87]. Eventually, Laplacian smoothing [Fiel 88] and decimation filters are applied to obtain a smooth surface mesh of the colonic lumen. Realistic tissue renderings for the colonic wall were obtained by computing texture coordinates for the generated surface mesh using a dedicated surface parameterization algorithm [Pass 08]. For the generation of virtual fly-throughs, we computed a centerline through the colon mesh based on tracing the shortest path between two manually labeled points (cecum, rectum), with the restriction that the paths are bound to run on the Voronoi diagram of the colon model [Anti 02]. The colon texture was created using a professional medical visualization texture package⁵. In total, the proposed framework is evaluated on eight textured colon models, see Fig. 4.16. The camera paths for virtual fly-through are generated from the computed centerlines, see Fig. 4.11. For each colon dataset, the camera path from cecum to rectum is divided into 1000 frames. The fly-throughs start at the cecum and end at the rectum, mimicking the clinical practice of colon examination while retracting the colonoscope (Sect. 4.1.2). Furthermore, we simulate a forward-viewing endoscope with wide-angle optics (100° FOV), as is typical in colonoscopy. RGB-D data is generated using the virtual camera introduced in Sect. 2.2.2. Similar to the experiments in laparoscopy (Sect. 4.4.2), in terms of parameterization we used 20,000 landmarks, $|\mathcal{R}| = 256$, $\beta = 0.001$, 30% edge clipping, and 50 ICP iterations. For quantitative evaluation, we again consider the translational and rotational components of the relative camera transformation error (Eq. 4.11).

³ Courtesy of Dr. Hans de Visser, Australian e-Health Research Centre
⁴ Courtesy of Dr. Richard Choi, Virtual Colonoscopy Center, Walter Reed Army Medical Center
⁵ http://www.doschdesign.com/products/textures/medical_visualization_v3.html

Figure 4.12: (a, b) Boxplots of the translational and rotational drift for 3-D geometric (blue) vs. 6-D photo-geometric ICP registration (green), for all eight colon cases. (c, d) Drift for 6-D photo-geometric ICP at a fine scale. Note that the scales differ by a factor of 10.
Figure 4.13: Boxplots of the (a) translational and (b) rotational drift, comparing the results of an approximative RBC-based geometric ICP registration (blue) to the results with a BF-based NN search (green).

Results. Let us first give an overview of the results, comparing the reconstruction accuracy of geometric vs. photo-geometric ICP. Quantitative results for the eight colon datasets are given in Fig. 4.12. For the geometry-driven reconstruction, the drift was $\|\Delta t_E\|_2 = 0.137$ mm and $|\Delta\theta_E| = 0.252°$. The proposed photo-geometric ICP reduced the error to $\|\Delta t_E\|_2 = 0.025$ mm and $|\Delta\theta_E| = 0.038°$. This corresponds to an improvement in reconstruction accuracy by a factor of 5.5 (translation) and 6.5 (rotation), respectively. Note that compared to operation situs reconstruction (Sect. 4.4.2), the application in colonoscopy considered here involves (1) a more complex camera path navigating through the colonic bends, (2) less diverse scene content, and (3) a rather low-contrast texture. Now let us compare the performance of our approximative RBC-based NN search to an exact BF-based NN search w.r.t. reconstruction accuracy. Experiments were performed for a geometric ICP; the results are depicted in Fig. 4.13. Over all eight cases, both approaches yield a similar drift behavior. On average, the exact BF-based NN search reduced the error by approximately 1%. This is an acceptable impairment considering the associated gain in runtime performance with our approximative RBC-based approach, recall Fig. 4.5. We further investigated the influence of the geometric weighting $\beta$ on the drift behavior, see Fig. 4.14. The boxplots indicate that the optimal weighting lies on the scale of $\beta = 0.001$. It is worth noting that apart from this minimum, the drift increases relatively slowly; stable results were achieved over a wide range of values. We also investigated the convergence behavior for geometric and photo-geometric ICP alignment. Quantitative results of the drift after different numbers of iterations are depicted in Fig. 4.15. Note that the photo-geometric variant converges substantially faster while achieving a lower residual error, cf. Fig. 4.12. For example, comparing the drift after 16 ICP iterations, the photo-geometric approach outperforms the geometric one by factors of 5.1 (translation) and 4.3 (rotation), respectively.

Figure 4.14: Investigation of the influence of the geometric weighting $\beta$, for a single case. Given are boxplots of the (a) translational and (b) rotational drift. Recall that the scale of $\beta$ results from the different scales of the geometric and photometric domain, respectively.

Eventually, let us present some qualitative results for the reconstruction of a full colon shape model from 3-D colonoscopy data, see Fig. 4.16.
Recall that the experiments are performed under a rigidity assumption that will not be fulfilled in practical application. Nonetheless, the renderings underline the benefit of incorporating additional photometric information into the registration process, particularly for tubular-shaped anatomical structures that imply ambiguities in the geometric domain. Reconstructing a colon from a sequence of 1000 frames further stresses the effect of drift on the global model over time. Note that for the geometric ICP, the degree of misalignment is low in the first section of the camera path (starting at the cecum), but adds up with an increasing number of frames. In contrast, the proposed photo-geometric variant is capable of reconstructing the global shape of the colon in a superior manner.

4.5 Discussion and Conclusions

In this chapter, we have proposed a method for on-the-fly surface registration and shape reconstruction that builds upon a photo-geometric ICP framework and exploits the RBC data structure and search scheme for efficient 6-D NN search. We have optimized the concept of the RBC regarding runtime performance on low-dimensional data, and achieved frame-to-frame registration runtimes of less than 20 ms on an off-the-shelf consumer GPU. In an experimental study on synthetic RGB-D data, we have addressed two endoscopic applications and observed that incorporating photometric appearance as a complementary cue substantially outperforms a conventional geometry-driven ICP. In contrast to approaches that combine dense geometric point associations with a sparse set of correspondences derived from local photometric features, the proposed framework evaluates both geometric and photometric information in a low-level but dense manner.

Figure 4.15: Investigation of the convergence behavior, for 3-D geometric (in blue) and 6-D photo-geometric ICP registration (in green), for a single case. Given is the per-frame translational (a) and rotational (b) drift after different numbers of ICP iterations (8 to 256).

We found that incorporating photometric appearance in such an elementary way gives a convenient compromise between registration robustness and runtime performance. For operation situs reconstruction in 3-D laparoscopy, the photo-geometric ICP reduced the drift by a factor of 12.9 (translation) and 5.9 (rotation), respectively, compared to a geometry-driven ICP registration. Furthermore, we showed that an equal increase of noise in the range measurements results in a substantially smaller increase in drift for the photo-geometric ICP variant. For colon model construction in 3-D colonoscopy, the drift could be reduced by a factor of 5.5 (translation) and 6.5 (rotation), comparing photo-geometric vs. geometric ICP registration. Overall, the results are consistent with findings of a previous study on non-medical scenarios, stating that incorporating photometric information decreased the registration error by an order of magnitude [John 97]. In conclusion, let us summarize the limitations of this study and comment on future research directions. First, we have modeled the target structure to be static.
This might be an acceptable approximation for mapping scenarios in laparoscopic procedures where accuracy requirements are less strict and motion is moderate, or when modeling local structures of a colon. When aiming at the reconstruction of the full colon, the target will be subject to a substantial amount of non-rigid organ deformation. Nonetheless, adopting a piecewise-static assumption could allow for the reconstruction of local colonic segments. Then, prior knowledge from VC data (if available) could be employed to align these local shape models. Vice versa, such local models could help in adding local texture to the VC colon model. A future evaluation on real RGB-D data must further reveal the influence of the modality-specific noise behavior and inherent artifacts that occur in clinical practice. Strategies to improve the robustness of the system include a multi-resolution scheme, smart outlier handling, the transition from frame-to-frame to frame-to-model registration [Curl 96], and dedicated techniques such as loop closure, if applicable in the particular application.

Figure 4.16: Qualitative reconstruction results for the eight colon models that are shown in the first and fourth column, respectively. Their associated centerlines are depicted in blue. Note the substantial differences in both local and global shape and the sharp bends at the transitions between the ascending, transverse, descending and sigmoid colon. The remaining columns show reconstruction results for 3-D geometric ICP registration (blue) and 6-D photo-geometric ICP registration (green).

Part II
Non-Rigid Surface Registration for Range Imaging Applications in Medicine

CHAPTER 5
Joint Range Image Denoising and Surface Registration

5.1 Medical Background
5.2 Related Work
5.3 Non-Rigid Surface Registration Framework
5.4 A Joint Denoising and Registration Approach
5.5 Experiments and Results
5.6 Discussion and Conclusions

The management of respiratory motion in diagnostic imaging, interventional imaging and therapeutic applications is a rapidly evolving field of research with many current and future issues to be addressed. In Part II of this thesis, we focus on respiratory motion tracking and management in radiation therapy. Based on the fusion of pre-fractionally acquired accurate tomographic planning data (CT/MR) and intra-fractionally acquired RI data, an improved RT treatment can be achieved. More specifically, first studies have indicated that the identification and tracking of non-rigid torso surface deformations induced by breathing holds great potential to improve the accuracy of dose delivery [Yan 06, Faya 11, Scha 12]. In this chapter, we propose a variational framework that jointly solves the denoising of low-SNR RI data and its registration to a reference shape extracted from tomographic planning data. In particular, we present a novel joint formulation for solving these two intertwined problems, assuming that tackling each task would benefit considerably from prior knowledge of the solution of the other task [Baue 12b].
Thereby, we explicitly exploit the fact that the reference shape extracted from tomographic data represents the patient's geometry in an accurate and reliable manner. The proposed method enables:

• Robust intra-fractional full torso surface acquisition for patient monitoring
• Estimation of non-rigid torso deformations, yielding a high-dimensional respiration surrogate in terms of dense displacement fields

The remainder of this chapter is organized as follows. First, we introduce the medical background in Sect. 5.1. We present a comprehensive overview here, as all methods proposed in Part II of this thesis (Chapters 5–7) address the task of respiratory motion tracking with a particular focus on RT. In Sect. 5.2, we summarize related work w.r.t. methodology. Sect. 5.3 introduces our general variational formulation for non-rigid geometric surface registration with dense RI data. Based on this initial model, we propose an extended formulation for joint registration and range image denoising in Sect. 5.4. In Sect. 5.5, we study the parameterization of the method and show experimental results on both synthetic and real ToF/CT data. Eventually, we discuss the results and draw conclusions in Sect. 5.6. Parts of this chapter have been published in [Baue 12b] and are joint work with Prof. Dr. Martin Rumpf and Prof. Dr. Benjamin Berkels.

5.1 Medical Background

First, we introduce the medical background for the methods proposed in this and the two following chapters (Chapters 6, 7). Even though the methods presented in Part II consider substantially different RI modalities and registration concepts, respectively, they share the same clinical motivation. While we restrict the discussion to respiratory motion tracking in RT here, the management of respiratory motion holds great benefits in many clinical applications beyond RT, e.g. in diagnostic and interventional procedures.

5.1.1 Image-Guided Radiation Therapy

Along with the trend toward small-margin and high-dose external beam RT, the treatment success of patients with thoracic, abdominal and pelvic tumors strongly depends on an accurate delivery of the intended radiation dose. In target locations in the thoracic cavity and the abdomen, anatomical structures are known to move considerably due to patient respiration [Bran 06, Lang 01]. The motion of the tumor and adjacent tissue during treatment has a profound impact on RT planning and delivery. Besides uncertainties in inter-fractional patient positioning [Essa 02] (cf. Sect. 3.1.1) and the planning process [Rish 11], intra-fractional respiratory motion induces a substantial geometric and dosimetric source of error [Sepp 02, Will 12]. Even though there is clinical evidence that smaller fractions and higher radiation doses can be beneficial in terms of local tumor control, in practice, uncertainties due to respiration-induced motion typically demand conservative treatment strategies. In order to account for potential targeting errors and to assure adequate dosimetric coverage of the tumor-bearing tissue, large safety margins are typically applied. However, these come at the cost of irradiating surrounding radiosensitive structures. To reduce tolerances between the planned and actually delivered dose distribution, a multitude of techniques for respiratory motion management have been developed over the past decades. For a comprehensive survey, we refer to Keall et al. [Keal 06] and Verellen et al. [Vere 10].
The introduction of image-guided radiation therapy (IGRT) has been a milestone in improving dose delivery based on instant knowledge of the target location and spatio-temporal changes in tumor volume during the course of treatment. In conventional IGRT, the location of the tumor and adjacent critical structures is determined using in-room radiographic imaging (e.g. stereoscopic X-ray imaging, in-room CT, cone-beam CT) of the target site prior to radiation delivery. This allows a verification of the tumor location in the pre-treatment position, but cannot account for intra-fractional variations in tumor position induced by respiration.

5.1.2 Respiration-Synchronized Dose Delivery

Modern respiration-synchronized IGRT aims at continuously tracking the moving target over its trajectory and dynamically re-positioning the treatment table or radiation beam to follow the tumor's changing position during therapeutic dose delivery [Diet 11, Kilb 10, McCl 06, Murp 04, Wilb 08]. This allows reducing the tumor-motion margin in the dose distribution and substantially increasing the LINAC's duty cycle compared to gated RT (20–30%), where the beam is merely activated within a short time slot around a pre-defined state in the respiration cycle [Kubo 96]. Thus, the overall treatment time can be reduced, enabling an efficient operation of the therapy facility. The detection and tracking of the tumor position is the most important and challenging task in this context. In practice, radiographic imaging of the target itself is often not feasible, as most tumors will not present a well-defined object boundary suitable for automatic image segmentation and registration, respectively. According to clinical studies, implanted fiducial markers offer the most accurate solution to determine the target position during treatment. However, the benefits in accuracy need to be weighed against the cost-intensive and invasive procedure of implanting markers and the eventuality of marker migration [Keal 06]. Both direct image-guided and fiducial-based tumor tracking potentially require continuous radiographic imaging, involving risks associated with the additional radiation dose [Murp 07]. In order to reduce additional radiation exposure, recent hybrid tumor-tracking techniques combine episodic radiographic imaging with continuous monitoring of external breathing surrogates, based on the premise that the internal tumor position can be predicted from the deformation of the external body surface in the time interval between image acquisitions. The underlying patient-specific correlation model can be established from a series of simultaneously acquired external-internal position measurements [Muac 07, Hoog 09, Erns 12], from 4-D CT [Eom 10, Vand 11, Vere 10], or from 4-D MR planning data [Miqu 13]. During treatment, this model is used to deduce the tumor position under the influence of respiratory motion using real-time external surrogates, see Fig. 5.1 for a schematic illustration of the workflow. A clinically available solution for tumor motion compensation based on a patient-specific external-internal motion correlation model is the CyberKnife system (Accuray Inc., Sunnyvale, CA, USA¹) [Kilb 10]. It uses three optical markers attached to a wearable vest as external respiration surrogate. Episodic verification of the internal target position is performed based on stereoscopic X-ray imaging and fiducial markers, or fiducial-free soft-tissue tracking.
For dynamic beam steering, the LINAC is mounted on a robotic manipulator. A similar approach is the VERO platform (Brainlab AG, Feldkirchen, Germany, and Mitsubishi Heavy Industries Ltd., Tokyo, Japan²) [Depu 11, Vere 10]. It integrates stereoscopic X-ray imaging, volumetric cone-beam CT and real-time beam adjustment for respiration-synchronized tumor tracking.

¹ http://www.accuray.com/
² http://www.vero-sbrt.com/

Figure 5.1: Workflow in RI-guided respiration-synchronized RT. The initial preparation phase involves the training of an external-internal motion correlation model and the generation of the treatment plan. Then, in the fractional treatment sessions, the learned model is applied for respiratory motion compensation during dose delivery. RI can also be used for patient setup (recall Sect. 3.1.1). Image sources (CT scanner, RT system): Siemens AG.

5.1.3 Dense Deformation Tracking

The key issue with external-internal motion correlation models is the actual level of correlation, which determines the accuracy of dose delivery in the time interval between radiographic model verifications. Clinically available solutions that are in use or potentially suitable for hybrid tumor-tracking [Ford 02, Hoog 09, Will 06] typically measure external motion using a single or few passive markers on the patient's chest as a low-dimensional (in most cases 1-D) surrogate. However, marker-based external surrogates require extensive patient preparation and reproducible marker placement, with a considerable impact on model accuracy that directly translates into the accuracy of dose delivery. Furthermore, in practice, patient respiration is rarely regular and is subject to substantial inter-cycle variability [Keal 06], and the bio-mechanical coupling of external surrogates with the internal target motion may exhibit complex relationships. Thus, those low-dimensional techniques are incapable of depicting the full complexity of respiratory motion. Experimental studies by Yan et al. [Yan 06] and Fayad et al. [Faya 11] confirmed that using multiple external surrogates at different anatomical locations is superior to the conventional approach with a single 1-D respiratory signal for external-internal correlation modeling. Both conclude that model accuracy correlates with the quantity of suitable external surrogate positions [Yan 06, Faya 09, Faya 11]. Modern IGRT solutions that enable monitoring the motion of the complete external patient body surface have the potential to help reduce correlation model uncertainties. In particular, marker-less RI technologies can acquire a dense 3-D surface model of the patient [Bert 05, Brah 08, Mose 11, Gier 08, Peng 10, Scho 07] over time (3-D+t). For an overview of non-radiographic systems for patient positioning, tumor localization and tracking, and motion compensation, we refer to Willoughby et al. [Will 12] and Meeks et al. [Meek 12]. Early strategies in RI-based motion tracking were restricted to low-dimensional respiration surrogates [Scha 08]. However, as stated before, a more reliable and accurate correlation model can potentially be established from high-dimensional respiration surrogates [Yan 06, Faya 11]. Hence, we target the estimation of a dense displacement field representing the deformation of the instantaneous torso shape w.r.t. a reference surface [Baue 12b, Baue 12a, Baue 12d, Berk 13].
The methods introduced in Part II of this thesis estimate such dense displacement fields in the presence of:

• Dense but low-SNR RI data (Chap. 5)
• Accurate but sparse RI data (Chap. 6)
• Dense RI data with complementary photometric information (Chap. 7)

Let us stress that in this thesis we explicitly focus on the aspect of respiratory motion tracking. The application of the estimated displacement fields in motion management and compensation using patient-specific motion models [Wasz 12b, Wasz 13, McCl 13], for instance, is beyond the scope of this work.

5.2 Related Work

In contrast to marker-based tracking technologies, RI cameras do not directly measure the local motion trajectory at specific surface landmarks. Thus, the dense displacement field describing the local surface point trajectories must be recovered using non-rigid surface registration techniques (Sect. 2.4.2). The idea of reconstructing dense surface motion fields for respiratory motion tracking was proposed only recently, by Schaerer et al. [Scha 12] and Bauer et al. [Baue 12a, Baue 12b, Baue 12d]. In previous work on extracting a multi-dimensional respiration surrogate from dense RI data, Fayad et al. [Faya 11] had proposed to track the 3-D points of the acquired surface according to their associated pixel indices in the sensor domain. Note that this is a poor approximation, as it assumes the individual points to move along the projection rays of the RI camera. Let us distinguish our variational model for dense non-rigid surface registration introduced below (Sect. 5.3) from alternative strategies, cf. Sect. 2.4.2. First, we particularly exploit the fact that for the considered application in RT, the reference shape is extracted from tomographic planning data. Hence, it describes the patient geometry in a highly reliable and accurate manner. This obviates the need to model the reference shape in a probabilistic manner, as done in more generic point cloud registration approaches which treat the surface registration problem as an alignment between two distributions, cf. related work by Tsin and Kanade [Tsin 04] and Jian and Vemuri [Jian 05, Jian 11]. Second, the methods proposed in this chapter and in Chap. 6 build on an implicit representation of the reference data. In particular, we encode its shape in a signed distance function (SDF) [Jone 06] that is pre-computed in a sufficiently large neighborhood and stored in a 3-D volumetric image domain [Russ 00]. This SDF can be constructed offline, prior to treatment. Regarding the template shape, embedding it into a signed distance transform space as well and solving the non-rigid registration problem between the two resulting SDFs in a volumetric image domain [Para 03, Huan 06] would imply a substantial computational burden. Even though hardware acceleration of volumetric registration techniques has progressed over the past years [Fluc 11, Sham 10], working on the 2-D range image domain is generally much more efficient. Hence, we instead propose a variational formulation that aligns the template data to the reference shape SDF in a direct manner. In particular, we exploit the bijective mapping between the intra-fractionally acquired 3-D point cloud and its underlying representation in a regular 2-D base domain, the RI sensor plane.
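As an illustration of this offline step, a signed distance volume can be approximated from a binary voxelization of the reference shape with two Euclidean distance transforms. The sketch below uses scipy and assumes the occupancy mask and voxel spacing as given; it is a simple stand-in rather than the construction of [Jone 06, Russ 00].

import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance_volume(inside, spacing):
    """Approximate the SDF of the reference shape on a voxel grid from a
    binary occupancy mask: positive outside, negative inside, zero near
    the surface. 'spacing' holds the physical voxel size per axis."""
    d_out = distance_transform_edt(~inside, sampling=spacing)
    d_in = distance_transform_edt(inside, sampling=spacing)
    return d_out - d_in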
In this chapter, given a reference shape extracted from pre-fractionally acquired tomographic planning data and intra-fractionally acquired low-SNR RI data of the patient's torso, we introduce a variational approach which combines the two intertwined tasks of RI data denoising and its non-rigid alignment to the reference shape. The idea of developing joint variational methods for intertwined problems has become quite popular and successful in imaging recently. Already a decade ago, Yezzi, Zöllei and Kapur [Kapu 01], Unal et al. [Unal 04], and Feron and Mohammad-Djafari [Fero 04] combined image segmentation and registration. Joint formulations have also been proposed for object de-blurring and motion estimation [Bar 07], denoising and anisotropy estimation [Berk 06], and joint image registration, denoising and edge detection [Han 07]. Droske and Rumpf proposed a variational scheme for image denoising and registration based on nonlinear elastic functionals [Dros 07], and Buades et al. presented image sharpening methods based on combined denoising and registration [Buad 09]. We further refer to the dissertation of Berkels on joint methods in imaging [Berk 10].

5.3 Non-Rigid Surface Registration Framework

In this section, we introduce our general framework for non-rigid surface registration with dense RI data, assuming ideal (noise-free) range measurements. We depict the geometric configuration, introduce the basic notation, and define a suitable variational formulation to solve the registration problem at hand. Then, in Sect. 5.4, we propose a novel approach for joint range image denoising and registration that is particularly designed for low-SNR RI data.

5.3.1 Geometric Configuration

Assume that we have extracted a reliable surface $\mathcal{G} \subset \mathbb{R}^3$ from tomographic planning data. Furthermore, during treatment delivery, we continuously acquire range image data $r$ that describes a surface $\mathcal{X}_r \subset \mathbb{R}^3$. For each position $\zeta \in \mathbb{R}^2$ on the image plane $\Omega$, the (assumedly) noise-free range value $r(\zeta)$ describes a position $x = x_r(\zeta) = r(\zeta)\, p(\zeta)$ on $\mathcal{X}_r$ with $x_r : \Omega \to \mathcal{X}_r$, where $p$ denotes the 2-D/3-D mapping operator, see Sect. A.1 in the Appendix of this work. Due to respiration, the intra-fractionally acquired RI surface $\mathcal{X}_r$ differs from the pre-fractionally acquired planning shape $\mathcal{G}$. More specifically, the shape of $\mathcal{X}_r$ depends on the state of the patient's respiratory motion at the RI acquisition time. Hence, we consider a deformation $\phi : \mathcal{X}_r \to \mathbb{R}^3$ matching $\mathcal{X}_r$ and $\mathcal{G}$ in the sense that $\phi(\mathcal{X}_r) \subset \mathcal{G}$. This deformation can be represented by a displacement $u : \Omega \to \mathbb{R}^3$ defined on the parameter domain $\Omega$:

$$ \phi(x_r(\zeta)) = x_r(\zeta) + u(\zeta) . \qquad (5.1) $$

Figure 5.2: Geometric sketch of the registration configuration for a torso-like shape. For better visibility, the reference shape $\mathcal{G}$ (in gray) and the RI surface $\mathcal{X}_r$ (in green) have been pulled apart. (a) Assuming noise-free RI measurements, the displacement vector $u(\zeta)$ (in blue) maps a position $x_r(\zeta) \in \mathcal{X}_r$ onto $\mathcal{G}$. (b) With low-SNR RI data, the measured position $x_{r_0}(\zeta) \in \mathcal{X}_{r_0}$ is unreliable. Hence, the proposed joint approach simultaneously estimates both a robust position $x_r(\zeta)$ and the associated displacement $u(\zeta)$ mapping $x_r(\zeta)$ onto $\mathcal{G}$.

For a graphical illustration of the geometric configuration with noise-free RI data, we refer to Fig. 5.2a.
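As a minimal sketch of the mapping $x_r(\zeta) = r(\zeta)\, p(\zeta)$, assume a pinhole camera with intrinsics K, so that $p(\zeta)$ is the unit viewing ray through pixel $\zeta$; the concrete operator is defined in Sect. A.1, and this pinhole form is an illustrative assumption.

import numpy as np

def backproject(range_img, K):
    """Map each pixel zeta of a range image to its 3-D surface point
    x_r(zeta) = r(zeta) * p(zeta), with p(zeta) the unit viewing ray."""
    H, W = range_img.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    rays = np.stack([u, v, np.ones_like(u)], axis=-1) @ np.linalg.inv(K).T
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)   # p(zeta)
    return range_img[..., None] * rays                     # (H, W, 3)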
5.3.2 Definition of the Registration Energy

Now we are in the position to develop a variational framework which allows us to estimate a suitable matching displacement u* as a minimizer of a functional E[u]:

\[ \mathcal{E}[u] := \mathcal{E}_{\mathrm{match}}[u] + \kappa\, \mathcal{E}_{\mathrm{reg}}[u] . \tag{5.2} \]

It consists of a matching term E_match and a smoothness prior E_reg for the displacement. The parameter κ is a positive constant weighting the contributions of the different energies, thus controlling the trade-off between registration accuracy and smoothness of the displacement field.

Matching Energy. The purpose of the matching functional E_match is to encode the condition φ(X_r) ⊂ G. To quantify the matching of φ(X_r) onto G, let us assume that the signed distance function d_G with respect to G is pre-computed in a sufficiently large neighborhood in R³. The SDF d_G : R³ → R is given as:

\[ d_{\mathcal{G}}(x) := \pm\,\mathrm{dist}(x, \mathcal{G}) , \tag{5.3} \]

where the sign is positive outside G and negative inside. In particular, d_G(x) = 0 for x ∈ G. Furthermore, ∇d_G(x) is the outward pointing normal vector on G for x ∈ G, and ‖∇d_G(x)‖₂ = 1. Using this SDF d_G, we can construct the projection P : R³ → G of a point x ∈ R³ in a neighborhood of G onto the closest point on G:

\[ P(x) := x - d_{\mathcal{G}}(x)\, \nabla d_{\mathcal{G}}(x) . \tag{5.4} \]

Let us emphasize that, even though P(X_r) ⊂ G holds by construction, we do not expect any biologically reasonable φ to be equal to the projection P. Using Eq. (5.4) and ‖∇d_G(x)‖₂ = 1, we construct a quantitative pointwise measure for the closeness of x = φ(x_r(ζ)) to G:

\[ \| P(\phi(x_r(\zeta))) - \phi(x_r(\zeta)) \|_2 = \| d_{\mathcal{G}}(\phi(x_r(\zeta)))\, \nabla d_{\mathcal{G}}(\phi(x_r(\zeta))) \|_2 = | d_{\mathcal{G}}(\phi(x_r(\zeta))) | . \tag{5.5} \]

Based on this closeness measure, we define the matching energy E_match:

\[ \mathcal{E}_{\mathrm{match}}[u] := \int_\Omega | d_{\mathcal{G}}(\phi(x_r(\zeta))) |^2 \, \mathrm{d}\zeta = \int_\Omega | d_{\mathcal{G}}(x_r(\zeta) + u(\zeta)) |^2 \, \mathrm{d}\zeta . \tag{5.6} \]

Note that we confine ourselves to an approximation here, as detailed in the Appendix of this thesis (Sect. A.2.1).

Smoothness Prior. As a regularization prior for the displacement u, we quadratically penalize the magnitude of the variation in the vector field:

\[ \mathcal{E}_{\mathrm{reg}}[u] := \int_\Omega \| Du(\zeta) \|_2^2 \, \mathrm{d}\zeta , \tag{5.7} \]

where Du denotes the Jacobian matrix, i.e. (Du)_{ij} = ∂_j u_i, and ‖A‖₂ the Frobenius norm, i.e. ‖A‖₂² := tr(AᵀA). This approach is known as diffusion regularization [Fisc 02, Mode 03a]. Along the lines of Horn and Schunck [Horn 81], the idea behind it is to minimize any variation of the vector field, favoring smooth deformations while preventing singularities such as cracks and foldings, and other undesired properties such as oscillations.

5.4 A Joint Denoising and Registration Approach

The basic formulation for non-rigid surface registration presented in Sect. 5.3 might be an appropriate choice for high-SNR RI data. In the presence of low-SNR data from low-cost RI sensors, we face a different situation. When setting a low weight for the regularization term in Eq. (5.2), the estimated non-rigid displacement field might fulfill the condition φ(X_r) ⊂ G. However, this comes at the cost of an impaired smoothness of the displacement field – exhibiting local spikes and thus being an implausible solution from a biological point of view. Denoising the low-SNR RI data in a pre-processing step prior to surface registration will alleviate the problem. However, we assume that tackling each task – range image data denoising and its non-rigid registration to a reference shape, respectively – would benefit substantially from prior knowledge of the solution of the other task [Baue 12b]. Hence, in this section, we propose a joint formulation to solve both tasks in a simultaneous manner.
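Before turning to the joint model, the basic energy of Sect. 5.3 can be summarized in a few lines of code. The sketch below is purely illustrative and makes simplifying assumptions not present in our FE implementation: the displacement lives directly on the pixel grid, the integrals are approximated by sums, and d_G is looked up by trilinear interpolation in a pre-computed SDF volume.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def basic_energy(u, points, d_G, h_vol, h_pix, kappa):
    """Discrete analogue of E[u] = E_match[u] + kappa * E_reg[u] (Eq. 5.2).

    u      : (H, W, 3) displacement field on the sensor domain Omega
    points : (H, W, 3) surface points x_r(zeta) in SDF volume coordinates [mm]
    d_G    : pre-computed SDF volume of the reference shape, voxel spacing h_vol
    h_pix  : pixel spacing on Omega
    """
    # Matching term: trilinear lookup of d_G at the deformed positions (Eq. 5.6).
    warped = (points + u).reshape(-1, 3) / h_vol
    d = map_coordinates(d_G, warped.T, order=1)
    e_match = np.sum(d ** 2) * h_pix ** 2

    # Diffusion regularizer: squared Frobenius norm of the Jacobian Du (Eq. 5.7).
    du_y, du_x = np.gradient(u, h_pix, axis=(0, 1))
    e_reg = np.sum(du_y ** 2 + du_x ** 2) * h_pix ** 2
    return e_match + kappa * e_reg
```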
5.4.1 Definition of the Registration Energy

Let us extend the variational registration model from Sect. 5.3.2 in a way that allows us to cope with considerably noisy range data r_0 from a low-SNR RI camera, see Fig. 5.2b. In particular, we aim at (1) restoring a reliable range function r* and (2) simultaneously extracting a suitable matching displacement u* as a minimizer of a functional:

\[ \mathcal{E}[u, r] := \mathcal{E}_{\mathrm{fid}}[r] + \kappa\, \mathcal{E}_{r,\mathrm{reg}}[r] + \lambda\, \mathcal{E}_{\mathrm{match}}[u, r] + \mu\, \mathcal{E}_{u,\mathrm{reg}}[u] . \tag{5.8} \]

Compared to the formulation for noise-free RI data in Eq. (5.2), we have added a fidelity energy E_fid for the range function r given the measured low-SNR range data r_0, and a suitable regularization prior E_{r,reg} for the estimated range function. The positive constants κ, λ, µ weight the contributions of the different energies. This functional directly combines the range function r and the displacement u and, together with the corresponding prior functionals for both r and u, substantiates the joint optimization approach of our method. In fact, an insufficient and possibly noisy range function r prevents a regular and suitable matching displacement u, and vice versa. For the matching energy E_match and the prior for the displacement E_{u,reg} we use the formulations introduced in Eqs. (5.6) and (5.7). However, note that E_match now depends on two unknowns, the range data r and the displacement field u:

\[ \mathcal{E}_{\mathrm{match}}[u, r] := \int_\Omega | d_{\mathcal{G}}(\phi(x_r(\zeta))) |^2 \, \mathrm{d}\zeta = \int_\Omega | d_{\mathcal{G}}(r(\zeta)\, p(\zeta) + u(\zeta)) |^2 \, \mathrm{d}\zeta . \tag{5.9} \]

Fidelity Energy for the Range Function. In order to enforce closeness of the restored range function r to the given input data r_0, we confine ourselves to a simple least-squares type functional and define:

\[ \mathcal{E}_{\mathrm{fid}}[r] := \int_\Omega | r(\zeta) - r_0(\zeta) |^2 \, \mathrm{d}\zeta . \tag{5.10} \]

Prior for the Range Function. RI data of a human torso acquired from a camera position above the reclined patient are characterized by steep gradients, in particular at the boundary of the projected torso surface, and by pronounced contour lines. To preserve these features properly, a total variation (TV) type regularization prior for the range function is decisive. On the other hand, we would like to avoid the well-known staircasing artifacts of a standard TV regularization. Hence, we take into account a pseudo Huber norm:

\[ \| y \|_{\delta_{\mathrm{reg}}} = \sqrt{ \| y \|_2^2 + \delta_{\mathrm{reg}}^2 } , \tag{5.11} \]

for y ∈ R² and a suitably fixed regularization parameter δ_reg > 0, and define:

\[ \mathcal{E}_{r,\mathrm{reg}}[r] := \int_\Omega \| \nabla r(\zeta) \|_{\delta_{\mathrm{reg}}} \, \mathrm{d}\zeta . \tag{5.12} \]

Decreasing this energy comes along with a strong smoothing in flat regions, which avoids staircasing, and at the same time preserves the large gradient magnitudes that occur at contour lines or boundaries.

Joint Functional. In summary, combining the individual energy terms, we obtain the following joint functional:

\[ \mathcal{E}[u, r] = \int_\Omega | r - r_0 |^2 + \kappa\, \| \nabla r \|_{\delta_{\mathrm{reg}}} + \lambda\, | d_{\mathcal{G}}(r p + u) |^2 + \mu\, \| Du \|_2^2 \; \mathrm{d}\zeta . \tag{5.13} \]

For a proof of the existence of minimizers, we refer to Bauer et al. [Baue 12b].
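The role of the pseudo Huber norm in Eq. (5.12) is easily seen in a discrete sketch: for gradient magnitudes well below δ_reg the integrand behaves quadratically (strong smoothing in flat regions), whereas for large gradients it behaves like an isotropic TV term (edges are preserved). The following lines are an illustrative finite-difference discretization, not our FE implementation.

```python
import numpy as np

def pseudo_huber_energy(r, delta_reg, h=1.0):
    """Discrete analogue of E_r,reg[r] (Eq. 5.12) with the norm of Eq. (5.11).

    r : (H, W) range image, h : pixel spacing, delta_reg : regularization parameter
    """
    ry, rx = np.gradient(r, h)  # finite-difference gradient of r
    return np.sum(np.sqrt(rx ** 2 + ry ** 2 + delta_reg ** 2)) * h ** 2
```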
5.4.2 Numerical Optimization

For the numerical minimization of the energy functional E[u, r], we consider a gradient descent method. This requires the computation of the first variations with respect to the range function r and the displacement u, respectively, given as:

\[ \langle \partial_r \mathcal{E}[u, r], \vartheta \rangle = \int_\Omega 2 (r - r_0)\, \vartheta + \kappa\, \frac{ \nabla r \cdot \nabla \vartheta }{ \sqrt{ | \nabla r |^2 + \delta_{\mathrm{reg}}^2 } } + 2 \lambda\, d_{\mathcal{G}}(r p + u)\, \nabla d_{\mathcal{G}}(r p + u) \cdot p\, \vartheta \; \mathrm{d}\zeta , \tag{5.14} \]

\[ \langle \partial_u \mathcal{E}[u, r], \varphi \rangle = \int_\Omega 2 \lambda\, d_{\mathcal{G}}(r p + u)\, \nabla d_{\mathcal{G}}(r p + u) \cdot \varphi + 2 \mu\, Du : D\varphi \; \mathrm{d}\zeta , \tag{5.15} \]

where ϑ : Ω → R is a scalar test function and φ : Ω → R³ is a vector-valued test displacement. Furthermore, A : B = tr(AᵀB). For a derivation of the first variations of the individual energy terms, we refer to the Appendix (Sect. A.2.2).

For the spatial discretization, we apply a piecewise bilinear finite element (FE) approximation on a uniform rectangular mesh covering the image domain Ω. The SDF d_G is pre-computed using a fast marching method [Russ 00] on a uniform rectangular 3-D grid covering the unit cube [0, 1]³ and stored on the nodes of this grid; d_G and ∇d_G are evaluated using trilinear interpolation of the pre-computed nodal values. In the assembly of the functional gradient we use a Gauss quadrature scheme of order 3.

The total energy E is highly non-linear due to the involved non-linear distance function d_G and the pseudo Huber norm ‖·‖_{δreg}. We take a multiscale gradient descent approach [Alva 99], solving a sequence of joint matching and denoising problems from coarse to fine scales. On each scale, a non-linear conjugate gradient method is applied on the space of discrete range maps and discrete deformations. In particular, we use a regularized gradient descent along the lines of Sundaramoorthi et al. [Sund 07] to guarantee a fast and smooth relaxation. As initial guess for the range function r we take the measured range data r_0. The displacement is initialized with the zero mapping. For step size control, the Armijo rule is used [Armi 66]. We stop iterating as soon as the energy decay is sufficiently small.

5.5 Experiments and Results

The experimental study is structured as follows. First, we investigate the performance of different denoising models. Second, the proposed model for joint range image denoising and registration is validated on real CT and synthetic ToF data from an anthropometric torso phantom. Third, we investigate its application on synthetic data from a dynamic 4-D CT respiration phantom. Thereby, we underline the benefits of the joint variational approach compared to consecutive denoising and registration. Fourth, we apply the method to real CT and real ToF data. For an application of the non-rigid surface registration framework proposed in Sect. 5.3 we refer to Wasza et al. [Wasz 12b].

5.5.1 Materials and Methods

All experiments below, except for the respiration phantom study, build on an initial shape template that was extracted from CT data of a male torso phantom as described in Sect. 3.5.1. Here, let us consider this shape template as the instantaneous patient body surface, denoted by M ⊂ R³. Based on M and a virtual ToF camera (200×200 px, cf. Sect. 2.2.2), we generated the following datasets (a minimal simulation sketch of the noise model follows the list):

• Ideal noise-free ToF data r_ideal, with associated 3-D point cloud X_{rideal}, denoted as a pair (r_ideal, X_{rideal}) below.

• Realistic ToF measurements, denoted (r_0, X_{r0}), by artificially adding noise in the range measurement domain. In particular, we approximated sensor noise on a per-pixel basis by adding an individual offset to the ideal range r_ideal, drawn from a zero-mean normal distribution with σ² = 40 mm². This variance is motivated by observations on real ToF data (PMD CamCube 2.0) at a typical clinical working distance of about 1.5 m.

• Temporally averaged realistic ToF measurements, denoted (r_{0,ta}, X_{r0,ta}). As an initial data enhancement step prior to the spatial denoising addressed in the variational formulation, we applied temporal averaging over 5 frames on (r_0, X_{r0}). This is a viable choice w.r.t. the frame rate of today's RI cameras and the considered application in respiratory motion tracking.
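The simulation protocol for the second and third dataset can be summarized as follows; this sketch reflects the values stated above (σ² = 40 mm², averaging over 5 frames) and is given for illustration only.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def simulate_tof(r_ideal, sigma2=40.0, n_frames=5):
    """Generate realistic and temporally averaged synthetic ToF range data.

    r_ideal : (H, W) ideal range image in mm
    sigma2  : per-pixel noise variance in mm^2
    Returns (r_0, r_0_ta): one noisy frame and the n_frames average.
    """
    noise = rng.normal(0.0, np.sqrt(sigma2), size=(n_frames,) + r_ideal.shape)
    frames = r_ideal[None, ...] + noise
    # Averaging n_frames i.i.d. frames reduces the noise variance by a factor n_frames.
    return frames[0], frames.mean(axis=0)
```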
The denoised range data and matching deformation estimated with the proposed joint approach are denoted by (r*, X_{r*}) and (u*, φ*), respectively.

Comparison of Denoising Models. Based on the synthetic data introduced before, we first studied the performance of different denoising models on (r_{0,ta}, X_{r0,ta}) using the variational formulation from the joint approach:

\[ \mathcal{E}[r] = \mathcal{E}_{\mathrm{fid}}[r] + \kappa\, \mathcal{E}_{r,\mathrm{reg}}[r] . \tag{5.16} \]

Let us stress that we solely investigated the denoising component of the proposed method here, in the absence of deformations. The proposed regularization using the pseudo Huber norm (Eq. 5.12) was compared to both a quadratic (Q) and an edge-preserving TV regularization of r:

\[ \mathcal{E}_{r,\mathrm{reg,Q}}[r] := \int_\Omega \| \nabla r(\zeta) \|_2^2 \, \mathrm{d}\zeta , \tag{5.17} \]

\[ \mathcal{E}_{r,\mathrm{reg,TV}}[r] := \int_\Omega \| \nabla r(\zeta) \|_1 \, \mathrm{d}\zeta . \tag{5.18} \]

Figure 5.3: Geometric sketch of the model validation setup. On the left, the generation of noisy ToF data (r_0, X_{r0}), (r_{0,ta}, X_{r0,ta}) and a planning shape G = φ_ideal(M) from a given ideal intra-fractional shape M (blue frame) is depicted (purple-shaded boxes). On the right, datasets used for validation of the proposed joint approach (left gray-shaded box) and a sequential denoising and registration scheme (right gray-shaded box) are illustrated (green frames). In addition, the metrics used for quantitative evaluation of the denoising process (top) and the registration process (bottom) are depicted in red.

As a quantitative measure of the denoising quality, we evaluated the distance of the denoised surface X_{r*} to the ideal intra-fractional shape M. This is performed by evaluating d_M(X_{r*}), where d_M represents the SDF w.r.t. M. For the experiments, the weighting factor in Eq. (5.16) was empirically set to κ = 1·10⁻⁴.

Validation of the Joint Model. The workflow of our experiments for model validation is depicted in Fig. 5.3. Recall that we consider M as the ideal intra-fractional shape of the patient and that we have generated both ideal (r_ideal, X_{rideal}) and realistic synthetic ToF data (r_0, X_{r0}), (r_{0,ta}, X_{r0,ta}) from M. In addition, we deformed M by a synthetic deformation φ_ideal and considered the deformed shape as the planning CT surface G = φ_ideal(M). As synthetic deformation, we have taken into account a non-linear transformation in the treatment table plane:

\[ u_{x,\mathrm{ideal}}(x) = \nu \left( -x\,(y - 0.5) - (1 - x)\,(x - 0.5) \right) , \tag{5.19} \]

\[ u_{y,\mathrm{ideal}}(x) = \nu \left( x\,(x - 0.5) - (1 - x)\,(y - 0.5) \right) , \tag{5.20} \]

and u_{z,ideal}(x) = 0, x ∈ M, with a comparably large deformation scale parameter ν set to 10% of the scene width.
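For reproducibility, the synthetic deformation of Eqs. (5.19)–(5.20) reads as follows in code; coordinates are assumed to be normalized to [0, 1] in the table plane, and ν is the scale parameter from above.

```python
import numpy as np

def u_ideal(x, y, nu=0.1):
    """Synthetic in-plane deformation of Eqs. (5.19)-(5.20); u_z = 0."""
    x, y = np.broadcast_arrays(x, y)
    ux = nu * (-x * (y - 0.5) - (1.0 - x) * (x - 0.5))
    uy = nu * (x * (x - 0.5) - (1.0 - x) * (y - 0.5))
    return np.stack([ux, uy, np.zeros_like(ux)], axis=-1)
```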
We consider the distance of the denoised range data to the ideal intra-fractional patient shape, d_M(X_{r*}), for quantitative evaluation of the denoising process of the joint approach. In analogy, the quality of the registration process is quantified using d_G(φ*(X_{rideal})), evaluating the distance of the ideal intra-fractional patient shape transformed with the estimated matching deformation, φ*(X_{rideal}), to the planning shape G, cf. Fig. 5.3. Hence, both components of the joint approach are evaluated in an independent manner. We investigated the case of unfiltered range data (r_0, X_{r0}) with a suitable set of model parameters κ = 4·10⁻⁴, λ = 1·10⁴, µ = 4·10⁻³, and the case of temporally averaged range data (r_{0,ta}, X_{r0,ta}) with an adapted set of parameters κ = 1·10⁻⁴, λ = 2.5·10³, µ = 1·10⁻³, to study the impact of applying temporal denoising as a pre-processing measure.

Joint vs. Non-Joint Denoising and Registration. To explicitly investigate the benefit of the proposed joint approach, we compared the quality of joint denoising and registration to a sequential (non-joint) scheme that performs denoising and registration consecutively, i.e. first denoising r_0 and then computing the deformation φ̃* matching the denoised surface X_{r̃*} to G. The optimal range and displacement functions estimated with the sequential scheme are denoted with a tilde, as (r̃*, X_{r̃*}) and (ũ*, φ̃*), as opposed to the joint estimates (r*, X_{r*}) and (u*, φ*). For direct comparability, we consider the same denoising model as with the joint approach:

\[ \mathcal{E}[\tilde r] = \int_\Omega | \tilde r - r_0 |^2 + \kappa_{\tilde r}\, \| \nabla \tilde r \|_{\delta_{\mathrm{reg}}} \; \mathrm{d}\zeta . \tag{5.21} \]

The registration of the denoised range data r̃* to G is then performed according to:

\[ \mathcal{E}[\tilde u] = \int_\Omega | d_{\mathcal{G}}(\tilde r^*\, p + \tilde u) |^2 + \kappa_{\tilde u}\, \| D\tilde u \|_2^2 \; \mathrm{d}\zeta , \tag{5.22} \]

with the regularization weights κ_{r̃} = 1·10⁻⁴ and κ_{ũ} = 4·10⁻⁷ set in analogy to the settings of the joint approach. By doing so, we ensure that the regularization of the matching displacement is at a comparable level. Again, we investigated the denoising and registration components in a separate manner. For the sequential approach, we quantified the quality of denoising and registration in analogy to the joint approach by d_M(X_{r̃*}) and d_G(φ̃*(X_{rideal})), respectively. The former evaluates the distance of the denoised surface X_{r̃*} to the ideal intra-fractional patient shape M. The latter quantifies the distance of the ideal intra-fractional patient shape transformed with the estimated matching deformation, φ̃*(X_{rideal}), to the planning shape G. For a comprehensive illustration of the evaluation setup we refer to Fig. 5.3. The experiments were performed on (r_{0,ta}, X_{r0,ta}).

Respiratory Motion Tracking. In order to quantify the performance of the proposed method in respiratory motion tracking, we used the synthetic 4-D NURBS-based CArdiac-Torso (NCAT) phantom [Sega 07]. In particular, we generated torso shape data M_p for 16 states within one respiration cycle, with p ∈ {1, ..., 16} denoting the respiration phase. For each state, the phantom's external body surface was extracted from synthetic CT data with a resolution of 256×256×191 voxels and a spacing of 3.125×3.125×3.125 mm³, using the pipeline described in Sect. 3.5.1. We generated a typical RT treatment scene by adding a plane that mimics the treatment table.
Figure 5.4: Experimental evaluation of different denoising models. From left to right, the residual distance d_M(X_{r*}) of the denoised surface X_{r*} to the ground truth shape M is color coded on one side of X_{r*} for (a) quadratic regularization, (b) TV regularization, and (c) the proposed regularization based on the pseudo Huber norm ‖∇r‖_{δreg}.

The phantom shape at full expiration (p = 1) was considered as the pre-fractional planning geometry G. For the shapes of the remaining respiration states, we generated temporally averaged realistic ToF data (r_{p,0,ta}, X_{rp,0,ta}). These datasets were then processed using the proposed joint range image denoising and registration approach. The quality of the denoising process was evaluated based on d_{Mp}(X_{r*p}), where d_{Mp} denotes the SDF w.r.t. M_p; the quality of the registration process was evaluated based on d_G(φ*_p(M_p)), in analogy to the experiments described before. φ_p denotes the deformation at respiration phase p. The model parameters were set in accordance with the model validation experiments: κ = 1·10⁻⁴, λ = 2.5·10³, µ = 1·10⁻³.

Experiments on Real ToF Data. Finally, we illustrate the feasibility of the proposed method on real CT and real ToF data from the male torso phantom. ToF data were acquired using a PMD CamCube 3.0 ToF camera with an integration time of 750 µs. Running at a frame rate of 40 Hz, we applied temporal averaging over 5 frames, as in the experiments with synthetic ToF data before. The weighting parameters were also set in accordance with the experiments on synthetic data: κ = 1·10⁻⁴, λ = 2.5·10³, µ = 1·10⁻³. ToF and CT data were roughly aligned manually. However, note that for these experiments on real data we did not have ground truth information. Hence, we were unable to quantify the quality of denoising and registration as in the synthetic experiments. Instead, for a qualitative assessment of the denoising process, we compare the denoising results of the joint and non-joint approaches to ToF data averaged over 500 frames.

5.5.2 Results

Comparison of Denoising Models. The performance of the three investigated denoising models is illustrated in Fig. 5.4. Both the over-smoothing effect of the quadratic regularization at the torso boundaries and the staircasing artifacts of the TV regularization on the flat thoracic and abdominal regions are clearly visible. Instead, using the proposed regularization based on the pseudo Huber norm ‖∇r‖_{δreg}, we observe an edge-preserving smoothing that avoids staircasing.

Figure 5.5: Validation of the joint model on male phantom data. The first two columns correspond to results for a full torso including the head, the last column to results for the trunk of the phantom. The left column depicts results on unfiltered ToF data X_{r0}, the center and right columns results on temporally averaged ToF data X_{r0,ta}. From top to bottom, the rows depict the input ToF data, the residual mismatch d_G(φ*(X_{r*})) after joint denoising and registration to G, the denoising quality d_M(X_{r*}), the registration quality d_G(φ*(X_{rideal})), and the smoothness of the estimated displacement field u*. The associated color-coding is depicted on the right.

Validation of the Joint Model. Results of the proposed algorithm for joint range image denoising and registration on torso phantom data are depicted in Fig. 5.5.
In particular, we present results for two different datasets, one representing the full torso including the head, the other representing the trunk of the phantom. For the full torso dataset, results for both unfiltered (r_0, X_{r0}) and temporally averaged realistic ToF data (r_{0,ta}, X_{r0,ta}) are given. The results on temporally averaged ToF data outperformed the results on unfiltered ToF data. Overall, regarding the results on temporally averaged ToF data, which still exhibit a rather low SNR, the residual mismatch in terms of both denoising and registration was small. We remark that this is a particularly promising result w.r.t. the strong synthetic deformation used in these validation experiments, cf. the visualization of u* in Fig. 5.5.

Joint vs. Non-Joint Denoising and Registration. Fig. 5.6 compares the proposed joint approach with sequential denoising and registration, at a comparable level of regularization of the matching displacement. The color-coding indicates the superiority of the joint approach. Over the central torso region, the absolute residual error after denoising was 0.47±0.36 mm for the sequential scheme, evaluating |d_M(X_{r̃*})|, compared to 0.22±0.15 mm when estimated within the joint framework, evaluating |d_M(X_{r*})|. This corresponds to an improvement by a factor of 2.1. Considering the registration quality, the absolute residual mismatch was 0.47±0.36 mm for the sequential scheme, where the alignment was performed after denoising, evaluating |d_G(φ̃*(X_{rideal}))|. Using the proposed joint framework, the residual mismatch decreased to 0.24±0.16 mm, evaluating |d_G(φ*(X_{rideal}))|. This corresponds to an improvement by a factor of 1.9. We conclude that incorporating prior knowledge about the target shape G helps substantially in the denoising process. On the other hand, proper denoising also renders the registration problem more robust.

Respiratory Motion Tracking. Results for respiratory motion tracking on NCAT phantom data are depicted in Fig. 5.7. Given are color-coded plots of the residual error w.r.t. denoising and registration, and the estimated displacement fields u* for different phases within the respiration cycle. The results indicate that the high quality of denoising and registration with the joint approach is only slightly affected by the magnitude of respiration (increasing from left to right). To speed up the algorithm, we used the estimated displacement field and the denoised range data from the previous phase as initial data for the next phase. Comparing this initialization scheme to an initialization of r with r_0 and u with the zero displacement, we observed a reduction of the required gradient descent steps by a factor of 3 on average, without any notable change of the resulting minimal energy, cf. [Baue 12b].

Experiments on Real ToF Data. Results on real CT and real ToF data from the male torso phantom are illustrated in Fig. 5.8. An interesting outcome was the observation that the denoising process of the joint model was able to remove topographic artifacts in X_{r0,ta} that result from systematic errors of real ToF data, cf. Sect. 2.1.3. Again, this underlines that the joint approach inherently incorporates prior shape knowledge from the reference shape G into the denoising process. Vice versa, this results in a substantially better displacement estimate.
In contrast, when performing denoising and registration in a consecutive manner, the denoising process cannot eliminate those artifacts. Hence the displacement field, even though satisfying φ̃*(X_{r̃*}) ≈ G, will exhibit local artifacts that do not reflect the actual deformation. Regarding the considered application of the displacement field u as a multi-dimensional respiration surrogate in RT motion management, this would be highly problematic.

Figure 5.6: Comparison of the proposed joint approach to a sequential scheme where registration is performed after denoising, for temporally averaged ToF data (r_{0,ta}, X_{r0,ta}). The color-coded renderings on the left depict the denoising quality, evaluating d_M(X_{r̃*}) on X_{r̃*} for the non-joint approach (upper row) and d_M(X_{r*}) on X_{r*} for the joint approach (lower row). The color-coded renderings in the center depict the registration performance, evaluating d_G(φ̃*(X_{rideal})) on φ̃*(X_{rideal}) for the non-joint and d_G(φ*(X_{rideal})) on φ*(X_{rideal}) for the joint approach. The right column illustrates the denoised surfaces X_{r̃*} and X_{r*} with a monochrome rendering, visually underlining the superior denoising performance of the joint approach.

5.6 Discussion and Conclusions

In this chapter, we have proposed a novel variational formulation for joint denoising of low-SNR RI data and its registration to a reference shape. First and foremost, the target application requires the estimation of a dense and reliable displacement field describing the torso deformation of a reclined patient and providing a multi-dimensional respiration surrogate. However, the need for a reliable displacement field implies the need for a reliable measurement of the intra-fractional patient shape. This is not a valid assumption for low-SNR RI data. Even though non-rigid surface registration techniques can cope with imperfect input data and estimate a matching displacement field, residual noise and artifacts in the input data will impair the deformation estimation.

Figure 5.7: Results on NCAT phantom data. From left to right, different phases within the respiration cycle, from exhale (left) to inhale (right), are illustrated. The first row shows the intra-fractional ToF data X_{r0,ta}. The second and third row, respectively, depict the denoising quality d_M(X_{r*}) on X_{r*} and the registration quality d_G(φ*(X_{rideal})) on φ*(X_{rideal}). The associated color-coding is depicted on the right. The fourth row illustrates the estimated displacement fields u*. Here, the color-coding indicates the amplitude of the local displacement.

In order to enhance the acquired low-SNR RI data, conventional spatial denoising may be applied. However, in the experiments, we showed that using a joint formulation that simultaneously denoises the measured RI data while registering it to an accurate reference shape is beneficial compared to a sequential approach that performs both tasks in a consecutive and independent manner. In a quantitative study on real CT and synthetic ToF data, we found that the joint formulation improves the quality of the denoising and registration processes by factors of 2.1 and 1.9, respectively.
A feasibility study on real CT and real ToF data further revealed that denoising with the joint model can compensate for surface artifacts that result from systematic ToF errors. This is possible because the proposed joint formulation exploits the reliable and accurate patient planning data as a shape prior. It likewise improves the quality of the denoising and the correctness of the estimated displacement field. The results confirm our initial assumption that tackling each task does benefit considerably from prior knowledge of the solution of the other task.

Figure 5.8: Experimental results on real ToF and real CT data. From left to right, the upper row depicts the initial mismatch between X_{r0,ta} and G rendered in a single image using alternating slices, the temporally averaged real ToF data X_{r0,ta}, and the color-coded residual mismatch d_G(φ*(X_{r*})) after joint denoising and registration. The lower row illustrates the elimination of systematic ToF measurement artifacts when using the proposed joint model. X_{r0,ta,500} denotes real ToF data averaged over 500 frames. Note that both X_{r0,ta} and X_{r0,ta,500} exhibit a local artifact above the clavicle (labeled in red). Conventional spatial denoising as applied in the sequential approach cannot eliminate this artifact, see X_{r̃*}. Instead, the denoising process of the joint model reduces the artifact substantially, see X_{r*}.

We further investigated the performance of the proposed joint model on synthetic data from a 4-D CT respiration phantom. Here, we found that the quality of both the denoising and the registration process is only slightly impaired by an increasing respiration magnitude. However, the amount of noise in the RI input data correlates with the residual mismatch w.r.t. both tasks. This motivates the pre-processing of RI data with temporal denoising techniques prior to applying the proposed joint denoising and registration algorithm.

CHAPTER 6

Sparse-to-Dense Non-Rigid Surface Registration

6.1 Motivation and Related Work
6.2 Sparse-to-Dense Surface Registration Framework
6.3 Experiments and Results
6.4 Discussion and Conclusions

As detailed previously in Sect. 5.1, the intra-procedural tracking of respiratory motion has the potential to improve image-guided diagnosis and interventions. Available solutions in IGRT are subject to several limitations. Foremost, real-time RI technologies that are capable of acquiring dense 3-D data typically exhibit a low SNR. The question we pose is: could a paradigm shift in the development of real-time RI technology, from dense but noisy toward accurate but sparse, help improve the accuracy of surface motion tracking? In this chapter, we investigate the fitness of a novel multi-line triangulation (MLT) sensor for the task of non-rigid surface motion tracking. Instead of acquiring dense but noisy RI data, the MLT sensor delivers highly accurate but sparse measurements (recall Sect. 2.1.3).
We have developed a novel sparse-to-dense registration approach that is capable of reconstructing the patient's dense external body surface and estimating a 4-D (3-D+time) surface motion field from sparse sampling data and patient-specific prior shape knowledge [Baue 12a, Berk 13]. More specifically, the sparse position measurements acquired with the MLT sensor are registered with a dense pre-fractional reference shape. Thereby, a dense displacement field is recovered which describes the spatio-temporal deformation of the patient body surface, depending on the type and state of respiration. In a joint manner, the proposed method enables:

• Dense reconstruction of the patient's instantaneous external body shape
• Estimation of non-rigid torso deformations, yielding a high-dimensional respiration surrogate

The remainder of this chapter is organized as follows. In Sect. 6.1, we identify limitations of available solutions for marker-less intra-fractional respiratory motion tracking in IGRT, as a motivation for the proposed approach. Furthermore, we summarize related work w.r.t. methodology. For details on the MLT prototype we refer to Sect. 2.1.3. In Sect. 6.2, we introduce the variational formulation of the proposed method for sparse-to-dense non-rigid surface registration. In Sect. 6.3, the method is validated on NCAT data and evaluated in a comprehensive volunteer study investigating the method's accuracy on realistic data. Finally, we discuss the results and draw conclusions in Sect. 6.4. Parts of this chapter have been published in [Baue 12a, Berk 13] and are joint work with Prof. Dr. Martin Rumpf and Prof. Dr. Benjamin Berkels.

6.1 Motivation and Related Work

Commercially available RI-based IGRT solutions for patient monitoring and respiratory motion tracking are subject to several limitations. First, they do not support dense sampling in real-time [Brah 08, Mose 11], or only at the cost of a limited field of view [Bert 05, Peng 10, Scho 07]. For instance, the Sentinel system (C-RAD AB, Uppsala, Sweden¹) and the Galaxy system (LAP GmbH, Lüneburg, Germany²) take several seconds for a complete scan of the torso [Brah 08, Mose 11], and the real-time mode of the VisionRT stereo system (VisionRT Ltd., London, UK³) is limited to interactive frame rates of 1.5–7.5 Hz, depending on the size of the surface of interest [Peng 10]. The temporal resolution of these solutions may be insufficient to characterize respiratory motion [Wald 09]. We expect the low frame rates to result from the underlying measurement technologies: (1) consecutive light sectioning, as used by the Sentinel and Galaxy systems, is constrained by the fact that a laser line must be swept mechanically and a set of camera frames must be merged over time to reconstruct an appropriately dense surface scan from a set of subsequently acquired contours; (2) dense stereo imaging, as used by VisionRT, is known to imply a substantial computational burden for establishing image correspondences for 3-D reconstruction, posing constraints on real-time acquisition. Second, besides their limitations in terms of sampling density and speed, commercially available solutions often imply high hardware costs and are subject to measurement uncertainties due to the underlying sampling principles, e.g. active stereo photogrammetry [Bert 05, Peng 10, Scho 07] or consecutive light sectioning using mechanically swept lasers [Brah 08, Mose 11].
Third and last, the general focus of these systems is on patient positioning [Will 12], and none of them features dense and non-rigid respiratory motion tracking. Instead, if available at all, motion tracking is restricted to the acquisition of low-dimensional respiration surrogates [Scha 08].

¹ http://www.c-rad.se
² http://www.lap-laser.com
³ http://www.visionrt.com

Research on dense tracking of spatio-temporal deformations of a patient's torso from marker-less RI data has emerged only recently [Baue 12b, Scha 12]. For instance, Schaerer et al. [Scha 12] have studied the application of a non-rigid extension of the ICP algorithm [Ambe 07] for this task, on stereo vision data acquired with the VisionRT system. We have presented a variational formulation for joint range image denoising and non-rigid registration with a planning surface [Baue 12b], as detailed in Chap. 5, on ToF data. However, both approaches rely on dense 3-D data that are subject to low SNRs and systematic errors, respectively. In this chapter, we propose an alternative strategy based on sparse but highly accurate RI data from a multi-line laser triangulation sensor.

Figure 6.1: Reconstruction of sparse non-rigid displacement fields from MLT data for respiratory motion tracking [Baue 12c]. (a) MLT measurements at exhale (in blue) and inhale (in green) respiration states. To support visual interpretation, the surface at exhale state (in gray), acquired with a dense RI camera, is additionally depicted, but not considered by the registration approach. (b) Estimated non-rigid displacement field (in yellow). (c) The acquisition of MLT data over time allows analyzing the spatial range of respiratory motion along the individual laser triangulation planes.

Early work on respiratory motion tracking using MLT data was restricted to the reconstruction of sparse displacement fields based on a non-rigid registration of successively acquired sparse point cloud data [Baue 12c], cf. Fig. 6.1. However, the estimated displacement fields rely on the insufficient assumption that the local surface trajectories reside in the plane of the projected laser line. Instead, below, a novel variational model is introduced to recover a dense, accurate and reliable 4-D displacement field and to reconstruct a complete patient body surface model at the instantaneous respiration state from sparse data, using prior patient shape knowledge from tomographic planning data. Estimating the dense displacement field is combined with recovering a sparse displacement field from the MLT measurements to the planning data. Thus, the approach is closely related to the field of inverse-consistent registration [Cach 00, Chri 01], where the deformations in both directions are estimated simultaneously, with a penalty term constraining each deformation to be the inverse of the other. Medical applications of the idea of inverse consistency include the symmetric matching of corresponding landmarks [John 02] and edge features [Han 07].

6.2 Sparse-to-Dense Surface Registration Framework

In this section, we describe the geometric configuration and derive the variational model for the joint reconstruction of the instantaneous patient shape and the estimation of the underlying dense and non-rigid displacement field, from sparse measurements and prior shape knowledge.
Figure 6.2: Geometric configuration for the reconstruction of the dense deformation φ, with φ(ζ, g(ζ)) = (ζ, g(ζ)) + u(ζ), from sparse sampling data Y = {y_1, ..., y_n} and prior reference shape data G ⊂ R³, and the approximate sparse inverse Ψ, with Ψ(y_i) = y_i + w_i. For better visibility, G and Y have been pulled apart. Furthermore, the projection P onto G and the orthogonal projection Q from the graph G onto the parameter domain Ω are sketched.

6.2.1 Geometric Configuration

Given is a pre-fractional reference shape G ⊂ R³ that can be (1) extracted from tomographic planning data, (2) captured with a dense RI sensor of low temporal resolution, or (3) acquired by an MLT sensor in combination with a steerable treatment table [Ettl 12a]. During therapeutic dose delivery, the instantaneous patient body surface, denoted by M ⊂ R³, is represented by sparse MLT sampling data Y ⊂ R³. In particular, the MLT sensor acquires a finite set of n measurements Y = {y_1, ..., y_n}, y_i ∈ R³, arranged in a grid-like structure (Fig. 6.2). Note that the intra-fractional grid-like sampling Y is not aligned with G and depends on the respiration state and magnitude at the time of acquisition.

Now, the goal is to estimate the unknown dense and non-rigid deformation φ : G → R³ that matches the reference shape G to the instantaneous patient body surface M. Ideally, φ should be such that M = φ(G), but since Y only contains information about a sparse subset of M, the condition on φ appropriate for our problem setting is Y ⊂ φ(G). Along the lines of inverse-consistent registration, we jointly estimate φ together with its inverse ψ. Again, due to the sparse nature of the input data, we do not try to estimate the inverse everywhere on M but only on the known sparse subset Y. In other words, instead of trying to find ψ : M → R³ with ψ(M) = G, we estimate a sparse deformation Ψ : Y → R³ such that Ψ(Y) ⊂ G. Here, dense and sparse deformations are distinguished by using lower and upper case letters, respectively. Let us underline that Ψ is fully represented by the discrete set {Ψ(y_1), ..., Ψ(y_n)} containing the deformed positions of the n points acquired by the MLT sensor. A geometric sketch that illustrates the deformations φ and Ψ is depicted in Fig. 6.2. As we will see when constructing the individual terms of our objective functional in Sect. 6.2.2, estimating Ψ allows us to establish a correspondence between the MLT measurements and the reference patient surface, whereas the dense deformation φ can be used as a high-dimensional breathing surrogate and enables the reconstruction of the complete instantaneous patient surface for intra-fractional monitoring of the patient setup.

Before we describe the variational model in Sect. 6.2.2, let us introduce some basic notation, cf. Fig. 6.2. We assume that the reference shape G is given as a graph, i.e. there is a parameter domain Ω ⊂ R², usually associated with the patient table plane, and a function g : Ω → R such that G = {(ζ, g(ζ)) ∈ R³ : ζ ∈ Ω}. Vice versa, the orthographic projection onto the parameter domain Ω is given as Q(ζ, g(ζ)) = ζ, Q ∈ R^{2×3}. Furthermore, we represent the sparse deformation Ψ by a set of displacement vectors W = {w_1, ..., w_n} ⊂ R³ via:

\[ \Psi(y_i) = y_i + w_i . \tag{6.1} \]

The deformation φ is represented by a displacement u : Ω → R³ defined on the parameter domain Ω of the graph G via:

\[ \phi(\zeta, g(\zeta)) = (\zeta, g(\zeta)) + u(\zeta) . \tag{6.2} \]
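The graph representation and the two mappings introduced so far translate directly into code. The sketch below is illustrative only; g is assumed to be given as a callable (in practice, a bilinear FE function on the 129×129 grid used later).

```python
import numpy as np

# Orthographic projection Q : R^3 -> R^2 with Q(zeta, g(zeta)) = zeta.
Q = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

def lift_to_graph(zeta, g):
    """Map parameter positions zeta in Omega to points (zeta, g(zeta)) on G."""
    return np.concatenate([zeta, g(zeta)[..., None]], axis=-1)

def psi(Y, W):
    """Sparse deformation Psi(y_i) = y_i + w_i (Eq. 6.1) on the MLT samples."""
    return Y + W
```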
To quantify the matching of Ψ(Y) to G, we apply the SDF-based closeness measure introduced in Sect. 5.3.2. Again, we emphasize that even though P(Y) ⊂ G holds by construction, we do not expect any biologically reasonable Ψ to be equal to the projection P. Indeed, the computational results discussed below underline that it is the consistency term coupling φ and Ψ, in combination with the prior for the deformation φ, which leads to general matching correspondences for a minimizer of our variational approach.

6.2.2 Definition of the Registration Energy

Now, since φ and Ψ are represented by u and W, respectively, we define a functional E on dense displacement fields u and discrete sets of displacements W whose minimizer represents a suitable matching of the planning data G and the MLT measurements Y:

\[ \mathcal{E}[u, W] := \mathcal{E}_{\mathrm{match}}[W] + \kappa\, \mathcal{E}_{\mathrm{con}}[u, W] + \lambda\, \mathcal{E}_{\mathrm{reg}}[u] , \tag{6.3} \]

where κ and λ are nonnegative constants controlling the contributions of the individual terms. E_match is a matching energy that encodes the condition Ψ(Y) ⊂ G. The consistency functional E_con is responsible for establishing the relation between both displacement fields, constraining Ψ and φ to be approximately inverse to each other on the sparse set of positions Y where Ψ is defined. Thereby, it implicitly encodes the condition Y ⊂ φ(G). Finally, E_reg ensures a regularization of the dense displacement field u. The detailed definitions of these functionals are as follows.

Matching Energy. The distance of the points Ψ(y_i) to their projection P(Ψ(y_i)) onto G is a suitable indicator for the closeness of Ψ(Y) to G. This pointwise distance can be conveniently expressed using the signed distance function d_G, recall Sect. 5.3.2. With the representation Ψ(y_i) = y_i + w_i and this pointwise closeness measure (Eq. 5.5), we can define the matching functional:

\[ \mathcal{E}_{\mathrm{match}}[W] := \frac{1}{2n} \sum_{i=1}^{n} | d_{\mathcal{G}}(y_i + w_i) |^2 . \tag{6.4} \]

By construction, this functional is minimal if and only if Ψ(Y) ⊂ G.

Consistency Energy. For a given instantaneous deformation φ of the patient surface G and the corresponding exact deformation Ψ of the MLT measurements Y, we have Ψ(Y) ⊂ G, and these deformations are inverse to each other in the sense of the identity φ(Ψ(Y)) = Y. However, for an arbitrary deformation Ψ described by some vector of displacements W, in general Ψ(Y) ⊄ G. Thus, since φ is only defined on G, we have to incorporate the projection P onto G to relate arbitrary φ and Ψ. This leads us to the identity φ(P(Ψ(Y))) = Y, which we encode with the consistency energy:

\[ \mathcal{E}_{\mathrm{con}}[u, W] := \frac{1}{2n} \sum_{i=1}^{n} | \phi(P(\Psi(y_i))) - y_i |^2 = \frac{1}{2n} \sum_{i=1}^{n} | P(y_i + w_i) + u(Q\, P(y_i + w_i)) - y_i |^2 . \tag{6.5} \]

Here, we used Eq. (6.1), Eq. (6.2) and the projection Q onto the graph parameter domain. For a geometric interpretation of Eq. (6.5) we refer to the illustration in Fig. 6.2. The functional directly combines the dense displacement field u and the sparse set of displacements W and, together with the regularizer E_reg, substantiates the inverse-consistent sparse-to-dense registration approach of the method. Thus, it allows us to compute a dense smooth displacement of the patient planning surface even though only a sparse set of measurements is available.

Smoothness Prior. To ensure smoothness of the deformation φ on G, we incorporate a thin plate spline type regularization of the corresponding displacement field u [Mode 03b] and define:

\[ \mathcal{E}_{\mathrm{reg}}[u] := \frac{1}{2} \int_\Omega | \Delta u |^2 \, \mathrm{d}\zeta . \tag{6.6} \]

Here, Δu = (Δu_1, Δu_2, Δu_3) and thus |Δu|² = Σ_{k=1}^{3} (Δu_k)². Since our input data Y only implicitly provide information for φ on a sparse set, a first order regularizer is inadequate to ensure sufficient regularity of the deformation. Let us emphasize that the interplay of E_con and E_reg implicitly provides (discrete) smoothness of the approximate inverse deformation Ψ. Thus, there is no need to introduce a separate regularization term for Ψ.
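A compact discrete sketch of the functional (6.3) is given below. It is illustrative only and makes simplifying assumptions not present in our FE implementation: the SDF d_G and its gradient are pre-computed on a voxel grid with spacing h_vol and looked up by trilinear interpolation, u is stored on a regular grid over Ω = [0, 1]² with spacing h, and boundary effects of the finite differences are ignored.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def interp(vol, x, h_vol):
    """Trilinear lookup of a voxel volume at 3-D points x of shape (n, 3)."""
    return map_coordinates(vol, (x / h_vol).T, order=1)

def project(x, d_G, grad_d_G, h_vol):
    """P(x) = x - d_G(x) * grad d_G(x), cf. Eq. (5.4)."""
    d = interp(d_G, x, h_vol)
    g = np.stack([interp(gk, x, h_vol) for gk in grad_d_G], axis=-1)
    return x - d[:, None] * g

def sparse_to_dense_energy(u, W, Y, d_G, grad_d_G, h_vol, h, kappa, lam):
    """Discrete analogue of E[u, W] (Eqs. 6.3-6.6)."""
    n = len(Y)
    # E_match: squared SDF values at Psi(y_i) = y_i + w_i (Eq. 6.4).
    e_match = np.sum(interp(d_G, Y + W, h_vol) ** 2) / (2 * n)
    # E_con: phi(P(Psi(y_i))) should map back to y_i (Eq. 6.5);
    # u is sampled at the parameter positions Q P(Psi(y_i)).
    p = project(Y + W, d_G, grad_d_G, h_vol)
    u_at = np.stack([map_coordinates(u[..., k], (p[:, :2] / h).T, order=1)
                     for k in range(3)], axis=-1)
    e_con = np.sum((p + u_at - Y) ** 2) / (2 * n)
    # E_reg: (1/2) * integral of |Laplace u|^2 (Eq. 6.6), via finite differences.
    lap = sum(np.gradient(np.gradient(u, h, axis=a), h, axis=a) for a in (0, 1))
    e_reg = 0.5 * np.sum(lap ** 2) * h ** 2
    return e_match + kappa * e_con + lam * e_reg
```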
6.2.3 Numerical Optimization

To minimize the highly non-convex objective functional E w.r.t. the unknowns u and W, we apply a multi-linear FE discretization in space and again use a regularized gradient descent to guarantee a fast and smooth relaxation [Sund 07]. For the gradient descent, derivatives of the energy have to be computed. The derivatives of E_match and E_con w.r.t. w_j are given as:

\[ \partial_{w_j} \mathcal{E}_{\mathrm{match}}[W] = \frac{1}{n}\, d_{\mathcal{G}}(y_j + w_j)\, \nabla d_{\mathcal{G}}(y_j + w_j) , \tag{6.7} \]

\[ \partial_{w_j} \mathcal{E}_{\mathrm{con}}[u, W] = \frac{1}{n} \left( P(y_j + w_j) + u(Q P(y_j + w_j)) - y_j \right)^{\top} \left( DP(y_j + w_j) + Du(Q P(y_j + w_j))\, Q\, DP(y_j + w_j) \right) , \tag{6.8} \]

where DP denotes the Jacobian of the projection P. The variations of E_con and E_reg w.r.t. u in a direction φ : Ω → R³ are given by:

\[ \langle \partial_u \mathcal{E}_{\mathrm{con}}[u, W], \varphi \rangle = \frac{1}{n} \sum_{i=1}^{n} \left( P(y_i + w_i) + u(Q P(y_i + w_i)) - y_i \right) \cdot \varphi(Q P(y_i + w_i)) , \tag{6.9} \]

\[ \langle \mathcal{E}_{\mathrm{reg}}'[u], \varphi \rangle = \sum_{k=1}^{3} \int_\Omega \Delta u_k \cdot \Delta \varphi_k \, \mathrm{d}\zeta . \tag{6.10} \]

For a derivation we refer to the Appendix (Sect. A.3.1). After an appropriate scaling of G, we choose Ω = [0, 1]² and consider a piecewise bilinear, continuous FE approximation on a uniform rectangular mesh covering the domain Ω. In all experiments below we used a 129×129 grid. The gradient descent is discretized explicitly in time; the step size is controlled with the Armijo rule [Armi 66]. We stop the gradient descent iteration as soon as the energy decay is smaller than a specified threshold value τ; for practical values we refer to Sect. 6.3.2. By default, both deformations φ and Ψ are initialized with the identity mapping, i.e. u and W are initialized with zero. In the experiments (Sect. 6.3.2), we further study the benefit of initializing φ with the estimates from the previous step and initializing Ψ with Ψ(y_j) = P(y_j) for j ∈ {1, ..., n}. The SDF d_G is pre-computed using a fast marching method [Russ 00] on a uniform rectangular 3-D grid. Further details on the numerical optimization are given in the supplementary material of Berkels et al. [Berk 13] and in the Appendix (Sect. A.3.2).

6.3 Experiments and Results

The experimental evaluation divides into two parts. In the first part, the proposed algorithm is validated on surface data from the NCAT 4-D CT respiration phantom. In the second part, we present a comprehensive study on realistic data from 16 healthy subjects. We have quantified the accuracy in 4-D deformation estimation and surface reconstruction, respectively, and have analyzed the performance of the proposed framework w.r.t. relevant system parameters.

6.3.1 Materials and Methods

All experiments below were performed with a constant parameter setting of κ = 0.8, λ = 4·10⁻⁸. These weighting factors were determined empirically. The convergence threshold was empirically set to τ = 10⁻⁴. To generate MLT sampling data from synthetic datasets, we have developed a virtual simulator that mimics the sampling principle of the MLT sensor by intersecting a given dense and triangulated surface with a set of sampling rays. These rays are arranged in a grid-like structure, and the default grid and sampling density of the simulator are set in accordance with the specifications of the actual MLT prototype used in the experiments on real data. Due to occlusion constraints in a clinical RT environment, the simulator's sampling plane and viewing angle are set 30° off from an orthogonal camera position w.r.t. the treatment table.
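The core of such a simulator can be sketched in a few lines, here using the trimesh library for the ray–mesh intersection; this is an assumption for illustration and not the implementation used in this work. The 11×11 default grid and the 30° tilt correspond to the description above, while the remaining dimensions are hypothetical placeholders.

```python
import numpy as np
import trimesh

def simulate_mlt(mesh, n_lines=11, extent=400.0, height=1200.0, tilt_deg=30.0):
    """Sample a dense triangulated surface with a grid-like bundle of rays.

    mesh : trimesh.Trimesh of the dense body surface, coordinates in mm
    Returns the sparse hit points Y = {y_1, ..., y_n} as an (n, 3) array.
    """
    s = np.linspace(-0.5 * extent, 0.5 * extent, n_lines)
    gx, gy = np.meshgrid(s, s)
    origins = np.stack([gx.ravel(), gy.ravel(),
                        np.full(gx.size, height)], axis=-1)
    # Viewing direction tilted 30 degrees off the table normal (occlusion constraints).
    t = np.deg2rad(tilt_deg)
    direction = np.array([0.0, np.sin(t), -np.cos(t)])
    directions = np.tile(direction, (len(origins), 1))
    hits, _, _ = mesh.ray.intersects_location(origins, directions,
                                              multiple_hits=False)
    return hits
```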
Validation on 4-D CT Respiration Phantom Data. For model validation, we have investigated the reconstruction of respiratory deformation fields from surface data of the NCAT 4-D CT respiration phantom [Sega 07]. For the experiments, we generated dense surface meshes M_p for eight phases within one respiration cycle, for male and female phantom data. The index p ∈ {1, ..., 8} denotes the phase. In addition, we decided to consider both scenarios of arms-up and arms-down patient posture that occur in clinical RT treatment. This results in a total number of 8 · 2 · 2 = 32 datasets, 16 male and 16 female. The NCAT parameters were set to default values for the male and female phantom. The phantom surface at the state of full expiration, M_1 (phase 1 out of 8), was considered as the planning geometry G. The remaining set of surfaces was used to generate synthetic sparse sampling data Y_2, ..., Y_8 using our MLT simulator.

The accuracy of the deformation estimation is assessed by the absolute distance of the points in φ_p(G) to M_p, representing the residual mismatch in terms of mesh-to-mesh distance between the transformed reference surface φ_p(G) and the ground truth surface M_p that is to be reconstructed from a sparse sampling Y_p and prior shape data G. Here, we exploit the SDF w.r.t. M_p to establish a correspondence between φ_p(G) and M_p by computing the distance of a point on the transformed reference surface to the closest point on the ground truth surface, i.e. computing |d_{Mp}| on φ_p(G). In order to discard boundary effects at the body-table transition, the evaluation is performed in a central volume of interest that covers the trunk of the phantom.

Prototype Study on Healthy Subjects. To demonstrate the clinical feasibility of the proposed system and to evaluate it under realistic conditions, we have conducted a comprehensive study on 16 healthy subjects, male and female. In particular, we have investigated the performance of our modified projection approximation scheme [Berk 13] compared to [Baue 12a], the impact of initializing the displacement fields with estimates of the preceding respiration phase, and the influence of the convergence threshold τ. Using the MLT simulator and the measured noise characteristics of our prototype, we performed realistic simulations to study the influence of the MLT laser grid density with regard to upcoming generations of sensor hardware. Along with qualitative and quantitative results in terms of reconstruction accuracy, we have empirically analyzed the performance of our implementation in terms of convergence and runtime, respectively. For the volunteer study, we have used an eye-safe MLT prototype as described in Sect. 2.1.3. The evaluation database is composed of 32 datasets from 16 subjects, each performing (1) abdominal and (2) thoracic breathing, respectively.
Per subject, we synchronously acquired both real MLT data and surface data using a moderately accurate but rather dense structured-light (SL) system that provides range images of 640×480 px (Microsoft Kinect, see Sect. 2.1.3). SL data were acquired in order to provide a dense ground truth surface for quantitative evaluation of our approach. Both sensors were mounted at a height of 1.2 m above the patient table, at a viewing angle of 30°. MLT and SL data were aligned using geometric calibration. SL data were pre-processed using edge-preserving bilateral filtering. From each of the 32 datasets, we extracted sparse MLT measurements Y_p and dense SL meshes M_p (re-sampled on a rectangular 129×129 grid, cf. Sect. 6.2.3) for eight phases within one respiration cycle. With 16 subjects performing abdominal and thoracic respiration, this results in a total number of 16·2·8 = 256 datasets. For the experiments, we considered the reconstruction of the displacement field φ_p from a given planning surface G and intra-fractional MLT data Y_p, p ∈ {2, ..., 8}. The subject's body surface at full expiration, M_1, acquired with the SL system, was considered as the given planning surface G. As with the 4-D CT respiration phantom study, the accuracy of the deformation estimation is assessed by the residual mismatch between the transformed reference surface φ_p(G) – reconstructed from sparse sampling data Y_p and prior shape data G – and the ground truth SL surface M_p, i.e. |d_{Mp}| on φ_p(G), in the central volume of interest.

In practice, a quantitative evaluation on synchronously acquired real MLT and SL data was infeasible, as the SL camera exhibited local sampling artifacts due to the underlying measurement principle and due to interferences between the laser grid (MLT) and speckle pattern projections (SL) of the synchronously used modalities, which caused local deviations on the scale of several millimeters⁴. Hence, the evaluation on real MLT data is restricted to qualitative results. For quantitative evaluation, we employed our simulator for the generation of realistic MLT sampling data Y_p from dense SL surfaces M_p. In order to generate MLT data as realistic as possible, the noise characteristics of our MLT sensor prototype were measured in an optics lab and applied to the synthetic sampling of the dense SL data. Let us stress here that the aforementioned interferences solely hinder the generation of the ground truth data necessary for evaluation. The practical application of the proposed method indeed requires only an MLT sensor.

⁴ We expect similar interference effects when simultaneously using our MLT sensor together with an active stereo photogrammetry system such as the VisionRT stereo pods for evaluation, which also rely on speckle pattern projectors in the infrared domain for simplifying the stereo matching problem.

Influence of Estimate Initialization and Convergence Threshold. As proposed in Sect. 6.2.3, we investigated the benefits of initializing φ with the estimate from the previous phase and Ψ with Ψ(y_j) = P(y_j) as initial data, in order to reduce the number of iterations needed for the optimization scheme to converge. In particular, we found that using the projection P onto G as initial guess for Ψ is an even better estimate than just considering the previous estimate. Note that for the first frame we initialize φ with the identity mapping. Furthermore, in order to determine a suitable value for the optimization convergence threshold τ, we studied the impact of τ on both reconstruction accuracy and runtime.

Influence of the Modified Projection Approximation. The numerical evaluation of the projection onto G is based on the SDF d_G and the term P(x) = x − d_G(x) ∇d_G(x). Thus, the variation of E_con w.r.t. Ψ involves the derivative of P, which in turn involves second derivatives of d_G.
To avoid these second derivatives, we use a projection approximation scheme. Compared to the approach used in [Baue 12a], the scheme applied here treats the distance d_G implicitly and the direction ∇d_G explicitly [Berk 13], while [Baue 12a] treated both d_G and ∇d_G explicitly. Thus, the improved projection approximation scheme reflects the underlying projective geometry much better and is substantially more efficient. Details on the projection approximation scheme are provided in the Appendix (Sect. A.3.2). In order to study the computational impact of this modification, we compared the results with the improved projection approximation [Berk 13] to previous results [Baue 12a] for τ = 10⁻⁷.

Influence of the MLT Laser Grid Density. Upcoming generations of MLT sensors are expected to feature higher laser grid densities, see Sect. 6.4. Hence, we have investigated the effect of the projected laser grid density on the registration error. The evaluation was performed on realistic MLT simulator data – 256 datasets from 16 subjects, each sampled with grids of 11×11, 22×22, 33×33 and 44×44 lines – as MLT sensors with higher grid densities than our prototype (11×10 sampling lines) are under development and do not exist yet. Hence, at this point, an experimental study on real data was infeasible.

6.3.2 Results

Validation on 4-D CT Respiration Phantom Data. Quantitative results of the proposed method on NCAT phantom data are given in Fig. 6.3. The boxplots illustrate the absolute registration error w.r.t. discrete ranges of respiration amplitude for the male and female phantom. The results for the arms-up and arms-down datasets are combined per gender. Even for instances with a large initial surface mismatch in the range of 9–12 mm, the median residual error in terms of |d_{Mp}| on φ_p(G) is smaller than 0.1 mm. It is also worth noting that the error scales directly proportional to the respiration amplitude. Qualitative results for the male and female NCAT phantoms are depicted in Fig. 6.4.

Figure 6.3: Validation of the model on a male (a) and female (b) 4-D CT respiration phantom. Given are boxplots of the absolute registration error in [mm] w.r.t. discrete ranges of respiration amplitude, in terms of |d_{Mp}| on φ_p(G). Each boxplot combines the results for the phantom postures arms-up and arms-down, illustrated above the plots.

Figure 6.4: Qualitative results of the NCAT experiments, for reconstruction of the deformation field for respiration phases p = 2 and p = 4 w.r.t. phase 1 as reference (full expiration), for male arms-up (left) and female arms-down (right) data. First row: G, Y_p (outer contour) and Y_1 ⊂ G. Note that the synthetic MLT data Y_p were generated by sampling M_p using our MLT simulator. Second row: Initial mismatch in terms of d_G on M_p, and Y_p.
Third row: residual mismatch after application of the proposed method in terms of dMp on φp(G), and Yp. Fourth row: glyph visualization of the displacement field φp on G; up is color-coded according to the color bar on the right.

With the female phantom, the MLT coverage of the breast is limited. This becomes evident from an increased local error around the outer lateral part of the female breast. However, the impact is moderate due to the incorporation of prior shape knowledge and the higher-order regularization of φ (Sect. 6.2.2). These model priors are also beneficial in cases of (self-)occlusion. For instance, due to the viewing angle of 30° w.r.t. the treatment table plane, the upper part of the female breast in Fig. 6.4 is self-occluded. Nonetheless, the occluded areas can be reconstructed in a robust manner.

Prototype Study on Healthy Subjects. Qualitative results of the study on healthy subjects are depicted in Fig. 6.5. To facilitate an anatomic interpretation of the deformation, we overlaid the color texture – additionally acquired with our SL device – onto G. Note that an analysis of the deformation φp allows for a distinct differentiation between abdominal and thoracic respiratory motion patterns, as well as between inter-subject variations in the respiration amplitude. For instance, in the case of thoracic respiration, subjects S1 and S2 exhibit a similar motion pattern in the thorax region but substantial differences in the abdominal region.

Figure 6.5: Joint deformation estimation and surface reconstruction on real MLT data. Depicted are results from four subjects S1–S4 (left to right), for abdominal (top) and thoracic (bottom) respiration, for phases p ∈ {2, 3, 4}. For each subject, the reference surface G = M1 and the respective MLT sampling data Yp are shown in the first row: Y2 inner contour (black), Y4 outer contour (red), Y3 in between (green). The following three rows illustrate the estimated displacement fields φ2, φ3, φ4 on G. For the glyph visualization of φp on G, up is color-coded in [mm] according to the color bar on the right.

An overview of the quantitative results over all subjects on realistic MLT data is given in Fig. 6.6. In particular, Fig. 6.6a depicts boxplots of the initial mismatch |dG| on Mp and the residual mismatch |dMp| on φp(G) over all 16 subjects. Here, the results for abdominal and thoracic respiration are evaluated in a common plot. While Fig. 6.6a gives an impression of the overall performance, Fig. 6.6b shows the residual mismatch on a finer scale. Figs. 6.6c,d depict the residual error for discrete respiration phases, for abdominal (c) and thoracic (d) respiration, over all subjects. The reconstruction error scales with the respiration amplitude, peaking at the state of full inhalation (phases 4/5). The boxplot whiskers indicate that more than 99% of the residual errors are below 1 mm. Over all subjects, respiration types and respiration phases, the mean reconstruction error in terms of the residual mismatch |dMp| on φp(G) was 0.21 mm for abdominal and 0.25 mm for thoracic respiration, see Table 6.1. The 95th percentile did not exceed 0.93 mm for abdominal respiration and 1.17 mm for thoracic respiration, for any subject.
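For illustration, the following is a minimal sketch of how such a residual mesh-to-mesh mismatch could be evaluated; it approximates the point-to-surface distance |dMp| by the distance to the nearest ground-truth vertex using a k-d tree, whereas the thesis evaluates a signed distance function, so the function and variable names here are illustrative assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def residual_mismatch(transformed_ref_vertices, ground_truth_vertices):
    """Approximate |d_Mp| on phi_p(G): for every vertex of the transformed
    reference surface, the distance to the closest ground-truth vertex [mm]."""
    tree = cKDTree(ground_truth_vertices)
    dist, _ = tree.query(transformed_ref_vertices)
    return {
        "mean": float(np.mean(dist)),
        "median": float(np.median(dist)),
        "p95": float(np.percentile(dist, 95)),
    }
```

For densely sampled meshes, the nearest-vertex distance is a close upper bound of the true point-to-surface distance.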
For a detailed overview of the initial and residual mismatch (95th percentile) for the individual subjects, separated into abdominal and thoracic respiration, we refer to the Appendix (Sect. A.3.3). We assume the moderately higher reconstruction error for thoracic respiration to result from the higher initial mismatch of the thoracic respiration data (mean: 6.24 mm) compared to the abdominal data (mean: 5.09 mm).

Figure 6.6: Quantitative results of the prototype study, for realistic MLT sampling data from 16 subjects. (a) Reconstruction results per individual subject, comparing the initial mismatch in terms of |dG| on Mp (dark gray bars) vs. the residual mismatch in terms of |dMp| on φp(G) (light gray bars) as boxplots over both respiration types (abdominal and thoracic) and all phases. (b) Residual mismatch in terms of |dMp| on φp(G) per subject. (c, d) Boxplots of the residual mismatch for discrete phases of the respiration cycle, for abdominal (c) and thoracic (d) respiration, over all subjects.

Influence of Estimate Initialization and Convergence Threshold. Experimental results in terms of reconstruction accuracy and runtime with and without an appropriate initialization of φ and Ψ are given in Fig. 6.7a. The experiments illustrate that initializing φ and Ψ reduces the number of iterations needed for the optimization scheme to converge, while achieving a comparable registration error. Over all datasets, estimate initialization reduced the runtime by 19.2%. The results for different convergence thresholds (τ = 10⁻⁴ and τ = 10⁻⁷) are depicted in Fig. 6.7b. The plots indicate that a reduction of τ by a factor of 10³ results in a small improvement in reconstruction accuracy at the cost of a substantial increase in solver iterations. In an empirical study, we found that the convergence threshold of τ = 10⁻⁴ used in the experiments gave a suitable tradeoff between accuracy and runtime.

Table 6.1: Results over all subjects, respiration types and phases. Given are the mean, median and 95th percentile of the initial and residual mismatch in [mm], for abdominal respiration (A), thoracic respiration (T) and the entire dataset covering both respiration types (A/T). The last row states numbers in terms of residual mesh-to-mesh mismatch from related work by Schaerer et al. [Scha 12].

                               Initial Mismatch [mm]    Residual Mismatch [mm]
                               A      T      A/T        A      T      A/T
  Mean                         5.09   6.24   5.66       0.21   0.25   0.23
  Median                       3.95   4.66   4.23       0.13   0.15   0.14
  95th Percentile              14.0   17.1   15.2       0.69   0.82   0.76
  [Scha 12], 95th Percentile   --     --     6.1        --     --     1.08

Influence of Modified Projection Approximation. The plots in Fig. 6.7c illustrate that both projection approximations result in a comparable reconstruction accuracy, while the improved approximation reduces the runtime substantially (by 48.2% over all subjects). Note that we used τ = 10⁻⁷ instead of τ = 10⁻⁴ as convergence threshold in order to separate the effect of the improved approximation from the influence of the convergence threshold.

Figure 6.7: Study of algorithmic parameters and modifications. Given are the residual mismatch (top row) and the number of iterations until convergence (bottom row), respectively. (a) Results without (dark gray) and with (light gray) initialization of φ and Ψ. (b) Impact of the convergence threshold; results for τ = 10⁻⁷ are depicted in dark gray, results for τ = 10⁻⁴ in light gray. (c) Impact of the improved projection approximation [Berk 13] (light gray) compared to our previous work [Baue 12a] (dark gray). Note that, in order to investigate the impact of different convergence thresholds and projection approximations independent of the effect of initializing φ and Ψ, the results in (b) and (c) were generated without initialization.
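Schematically, the optimization loop whose iteration counts and thresholds are studied here has the following structure; the energy/gradient callables, the fixed step size, and the relative-change stopping rule are illustrative assumptions rather than the exact scheme of Sect. 6.2.3.

```python
import numpy as np

def minimize_energy(energy, gradient, x0, step=1e-2, tau=1e-4, max_iter=500):
    """Schematic gradient descent with a relative-change convergence test.

    energy, gradient : callables E(x) and dE/dx(x) of the registration energy.
    x0               : initial estimate, e.g. the displacement field of the
                       previous respiration phase (warm start) or zeros.
    tau              : convergence threshold on the relative energy decrease.
    """
    x = x0.copy()
    e_prev = energy(x)
    for it in range(max_iter):
        x -= step * gradient(x)
        e = energy(x)
        # Stop once the relative energy decrease falls below tau.
        if abs(e_prev - e) <= tau * max(abs(e_prev), 1e-12):
            return x, it + 1
        e_prev = e
    return x, max_iter

# Warm start: reuse the estimate of the previous phase as initialization, e.g.
# phi_p, n_iter = minimize_energy(E, dE, x0=phi_prev)
```

The warm start does not change the minimizer, it only shortens the path to it, which is why the registration error in Fig. 6.7a stays comparable while the iteration count drops.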
Influence of the MLT Laser Grid Density. Qualitative and quantitative results w.r.t. the influence of the MLT grid density are depicted in Fig. 6.8 and Fig. 6.9. As intuition suggests, a higher grid density comes with a more reliable reconstruction of the deformation (Fig. 6.9). In particular, this becomes evident in remote regions that were poorly covered by a less dense laser grid; compare the local color coding for increasing grid density in Fig. 6.8. A valuable outcome of this study is the observation that doubling the grid density from 11×11 to 22×22 yields a substantial gain (49.0% on average), whereas going further does not seem to noticeably improve the results (Fig. 6.9) – probably due to the comparably smooth surface topography of the human torso.

Figure 6.9: Influence of the MLT grid density on registration accuracy, evaluated on 256 datasets from 16 subjects. Given are boxplots for laser grid resolutions of 11×11, 22×22, 33×33, 44×44 sampling lines (grouped as four adjacent entries colored from dark to light gray), for increasing ranges of respiration amplitude (from left to right).

Runtime Performance. Let us conclude the experimental evaluation with a comment on runtime performance. The total runtime per frame was 2.3 s, measured as the mean over all datasets of the volunteer study. In detail, when initializing φ with the estimate from the previous phase and Ψ with Ψ(yj) = P(yj), the optimization process took 38.2±2.1 iterations to converge, on average over all subjects, respiration types and respiration phases. With our proof-of-concept implementation, a single gradient descent step on a single core of a Xeon X5550 2.67 GHz CPU takes ≈ 60 ms. The resulting per-frame runtime of 2.3 s substantially outperforms related work on dense-to-dense surface registration [Scha 12] with runtimes on the scale of minutes (25 iterations, 11.9 s per iteration on a comparable CPU and for a surface mesh with a comparable number of vertices).

6.4 Discussion and Conclusions

In this chapter, we have introduced a novel variational approach to marker-less reconstruction of dense non-rigid 4-D surface motion fields from sparse data and prior shape knowledge from tomographic planning data. In the field of IGRT, these motion fields can be used as high-dimensional respiration surrogates for gated RT, as input for accurate external-internal motion correlation models in respiration-synchronized RT, for motion-compensated patient positioning [Wasz 12b, Wasz 13], and to reconstruct the intra-fractional body shape for patient setup monitoring during dose delivery.

We have investigated the performance of the proposed method on synthetic, realistic and real data. In a comprehensive study on 256 datasets from 16 subjects with an average initial surface mismatch of 5.66 mm, the mean residual reconstruction error was 0.23 mm w.r.t. ground truth data. The 95th percentile of the local residual mesh-to-mesh distance after registration did not exceed 1.17 mm for any subject. In the experiments, it was further shown that a proper initialization of the displacements φ and Ψ (Sect. 6.3.2)
and the improved approximation of the projection compared to our first approach [Baue 12a] (Sect. 6.3.2) reduced the runtime by 19.2% and 48.2%, respectively. Doubling the MLT laser grid density from 11×11 to 22×22 lines would yield a considerable gain in accuracy (Sect. 6.3.2).

Figure 6.8: Estimation of φ4 transforming G into M4 from realistic MLT sampling data Y4, for abdominal (left) and thoracic (right) respiration. Phase p = 4 roughly represents the respiration state of full inhalation. (a) Planning surface G = M1, MLT sampling Y4 (outer contour) and Y1 ⊂ G. (b) Glyph visualization of the estimated φ4 on G, for 11×11 sampling lines; |u4| is color-coded in [mm]. (c) Initial mismatch in terms of dG on M4. (d–f) Residual mismatch in terms of dM4 on φ4(G) for grid resolutions of 11×11, 22×22, 44×44 sampling lines. Note that the color coding between (c) and (d–f) differs by a factor of 10.

In this context, let us remark that the MLT sensor used for the experiments in this work has not yet been optimized for the task of patient respiration monitoring. First, higher frame rates are possible for both the line pattern projection system and the observing camera; the total frame rate of the MLT system is currently limited only by the applied camera technology. Second, denser laser lines can be realized by adapting the setup to the required measurement volume.

With regard to the state of the art in non-rigid surface deformation estimation, in particular in IGRT, let us compare our results to recent work by Schaerer et al. [Scha 12] on motion tracking with dense surfaces. In their study on five male subjects and dense surface acquisitions from three respiration phases⁵, the authors achieved a residual mismatch of 1.08 mm (95th percentile) in terms of mesh-to-mesh surface distance using non-rigid ICP surface registration [Ambe 07]. Note that our result of 0.76 mm for the 95th percentile residual mismatch over all subjects and both respiration types slightly outperforms these numbers, see Table 6.1⁶, though our quantitative experiments are limited to realistic data. In addition, let us remark that, compared to Schaerer's volunteers, many of the subjects in our study exhibited a considerably higher respiration amplitude and initial mismatch (95th percentile: 15.2 mm vs. 6.1 mm, cf. Table 6.1). Hence, the low residual mismatch indicates that our method can reliably recover the dense displacement field from a sparse sampling of the instantaneous patient state using prior shape knowledge, even in the presence of strong respiration.

⁵ The study by Schaerer et al. [Scha 12] was performed on data acquired with two stereo imaging units placed symmetrically w.r.t. the treatment couch, and in the static mode of the VisionRT system (AlignRT). A practical application would require the use of the dynamic mode instead, which potentially comes along with a reduced FOV and/or increased uncertainties in 3-D surface sampling.

⁶ Note that the reported accuracy depends on the RI modality and the applied pre-processing pipeline. In [Scha 12], details about this stage are not given.
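The grid-density results above rest on our MLT simulator. As a minimal illustration of that sampling step, the sketch below samples a dense range image on a regular n×n line grid and perturbs the samples with additive noise; the noise level and the grid construction are illustrative assumptions, not the calibrated noise model of the MLT prototype.

```python
import numpy as np

def simulate_mlt_sampling(dense_depth, n_lines, noise_sigma_mm=0.5, seed=0):
    """Sample a dense depth map on an n_lines x n_lines laser grid.

    dense_depth    : (H, W) array, e.g. a bilaterally filtered SL range image [mm].
    n_lines        : number of horizontal/vertical laser lines (e.g. 11, 22, 33, 44).
    noise_sigma_mm : std. dev. of additive Gaussian measurement noise (assumed).
    Returns (rows, cols, depths) of the sparse samples.
    """
    rng = np.random.default_rng(seed)
    h, w = dense_depth.shape
    rows = np.linspace(0, h - 1, n_lines).round().astype(int)
    cols = np.linspace(0, w - 1, n_lines).round().astype(int)
    # Union of all pixels lying on a horizontal or a vertical grid line.
    mask = np.zeros((h, w), dtype=bool)
    mask[rows, :] = True
    mask[:, cols] = True
    r, c = np.nonzero(mask)
    z = dense_depth[r, c] + rng.normal(0.0, noise_sigma_mm, size=r.size)
    return r, c, z
```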
In addition to the hardware-related benefits of the MLT sensor for respiratory motion tracking compared to existing RI-based IGRT solutions (cf. Sect. 6.1), the CPU implementation of our surface registration approach substantially outperforms the non-rigid ICP used by Schaerer et al. in terms of runtime performance (by two orders of magnitude). As our approach exhibits an inherently high degree of data parallelism, a GPU implementation [Bruh 12] might be considered in future work to achieve the real-time operation required for clinical applications. In addition to exploiting data parallelism, embedded hardware interpolation could be used for an efficient evaluation of the pre-computed SDF, e.g. using 3-D textures. Furthermore, independent of hardware acceleration techniques, a runtime speedup can be achieved by reducing the dimensionality of the optimization problem, i.e. by considering a subset of the MLT sampling data Y and/or reducing the fineness of the reference shape G given by the density of the grid covering Ω.

CHAPTER 7

Photometry-driven Non-Rigid Surface Registration

7.1 Motivation and Related Work
7.2 Materials and Methods
7.3 Experiments and Results
7.4 Discussion and Conclusions

In the two previous chapters, we have presented methods for non-rigid surface registration that rely on the sole geometry of the shapes to be aligned. In this chapter, we investigate the potential of using complementary photometric information, available with modern RGB-D cameras, to guide the surface registration process [Baue 12d]. Again, we consider the tracking of spatio-temporal 4-D surface motion fields in IGRT describing the elastic patient torso deformations induced by respiratory motion. In this context, we compare the proposed photometry-driven non-rigid registration approach to a geometry-driven baseline. With regard to the clinical workflow in IGRT, let us remark that the proposed photometric approach requires RGB-D reference data – a direct alignment w.r.t. tomographic planning data is infeasible, as such data do not provide photometric information. This implies the need for an initial pre-registration of the RGB-D reference data onto the tomographic planning shape, potentially introducing additional uncertainties due to error propagation, cf. Sect. 3.1.1.

The remainder of this chapter is organized as follows. In Sect. 7.1, we motivate our approach with a practical observation and review relevant literature in the field. The proposed framework for photometry-driven non-rigid surface registration is introduced in Sect. 7.2. In Sect. 7.3, we present experimental results on real data. In particular, we compare the photometry-driven approach to a conventional geometry-driven surface registration as previously introduced in Sect. 5.3. Eventually, we discuss the results and draw conclusions in Sect. 7.4. Parts of this chapter have been published in [Baue 12d] and are reprinted with kind permission from Springer Science and Business Media, © Springer-Verlag Berlin Heidelberg 2012.
Figure 7.1: RGB-D measurements of a reclined patient at different respiration states (end-exhalation/end-inhalation). In subfigure (a), the measured range data r are color-coded: blue tones denote closeness to, red tones remoteness from the RI camera. In subfigure (b), the additionally acquired photometric information frgb is mapped onto the 3-D surface Xr. The bottom row sketches our practical observation: local displacements estimated from the sole geometry (c, red arrows) differ from the motion of the external body surface w.r.t. the photometric measurements, cf. the trajectories at salient landmarks (d, green arrows).

7.1 Motivation and Related Work

Related work on estimating the torso deformation induced by respiratory motion typically relies on surface registration techniques that solely consider the 3-D topography of the external patient body surface, cf. Chapters 5 and 6 and the work by Schaerer et al. [Scha 12]. In practice, we have experienced that the displacement fields estimated by geometry-driven approaches do not necessarily match the changes that can be observed in the photometric domain; for a concrete illustration of this effect on real data we refer to Fig. 7.1. Hence, in this chapter, we propose an alternative photometry-driven surface registration approach to recover non-rigid patient torso deformations.

Years before affordable dynamic RGB-D technologies were introduced, Vedula et al. proposed the concept of scene flow as an extension of optical flow to the 3-D domain [Vedu 99]. Basically, optical flow is the projection of scene flow onto the sensor plane – or, vice versa, back-projection of the optical flow onto the measured surface geometry yields the scene flow in 3-D space. Early work in the field focused on the computation of scene flow from stereo image sequences [Zhan 00a, Vedu 05, Hugu 07] or multi-camera setups [Carc 02, Pons 07] and was often restricted to sparse and inaccurate scene flow estimates due to the multi-view correspondence problem in poorly textured regions, recall Sect. 2.1.1. Spies et al. were the first to study scene flow estimation on RI sequences with complementary photometric information, also known as range flow [Spie 02]. In analogy to the classical optical flow constraint [Horn 81, Luca 81], which assumes brightness constancy, the authors introduced an additional range flow constraint to model the motion in the depth component. Along with the widespread availability of dense and real-time RGB-D sensors, the estimation of scene flow has increasingly gained interest. Most commonly, along the lines of variational optical flow [Horn 81], scene flow is estimated by optimizing a global energy function that consists of a data term comprising photometric and/or range constraints and a regularization term enforcing smoothness of the displacement field [Spie 02, Gott 11, Herb 13, Leto 11]. Letouzey et al. combined the conventional optical flow constraint in the image domain with sparse 3-D correspondences that are established from 2-D photometric features extracted and matched in the image domain; the regularization of the 3-D displacement field is performed over the surface, exploiting the associated range measurements [Leto 11]. Gottfried et al. built on previous work by Spies et al. [Spie 02] and Sun et al. [Sun 10] to estimate scene flow on Microsoft Kinect data [Gott 11].
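For reference, the two constraints underlying most of these formulations – brightness constancy and the range flow constraint of Spies et al. – can be written as follows, with (u, v) the 2-D flow and w the velocity along the depth axis; the notation is the standard textbook form, not that of the cited works.

```latex
% Brightness constancy (optical flow) constraint [Horn 81]:
I_x\, u + I_y\, v + I_t = 0
% Range flow constraint on the depth map Z [Spie 02]:
Z_x\, u + Z_y\, v - w + Z_t = 0
```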
Gottfried et al. propose an adaptive regularization of the displacement field, with strong regularization in valid and weak regularization in invalid regions, to preserve scene flow discontinuities. Recent work by Herbst et al. also built on the basic concept of Spies et al. [Herb 13]. Beyond variational formulations, alternative approaches have been presented. For instance, Quiroga et al. proposed the computation of 3-D scene flow in a Lucas-Kanade framework [Quir 12], directly estimating the trajectories of local surface patches; essentially, the authors consider scene flow computation as a 2-D tracking problem in both photometric intensity and geometric depth data. In contrast, Hadfield and Bowden proposed a particle filter framework, modeling RI data as a collection of moving particles in 3-D space [Hadf 11]. Since the problem at hand involves a comparably small degree of deformation, we have omitted related work on scene flow reconstruction for large displacements from this review [Thor 09].

In the context of this thesis, and with regard to the practical observation depicted initially, we are particularly interested in comparing a photometry-driven surface motion field estimate with a geometry-driven estimate. In this chapter, we confine ourselves to a classical two-stage approach along the lines of Vedula et al. [Vedu 05]: based on an optical flow estimate in the 2-D photometric domain, the surface motion field is deduced directly using the associated 3-D point cloud data. To our knowledge, this is the first medical application of the concept of scene flow.

7.2 Materials and Methods

In this section, we describe the proposed method for the photometry-driven reconstruction of 3-D surface motion fields (Sect. 7.2.1). In addition, for comparison, we contrast it with a geometry-driven surface registration approach (Sect. 7.2.2), based on the scheme previously introduced in Sect. 5.3. For an illustrative comparison of both approaches we refer to Fig. 7.2.

In terms of notation, r : Ω → R denotes the geometric range and frgb : Ω → R³ the associated photometric color measurements in the 2-D sensor domain Ω. Exemplary data of a male torso under respiration are depicted in Fig. 7.1a,b. Furthermore, the RGB-D reference data are denoted by (rref, Xrref, frgb,ref), the instantaneous intra-fractional data at time t by (rt, Xrt, frgb,t). The photometry- and geometry-driven displacement fields, which describe the deformations φp : R³ → R³ and φg : R³ → R³ with φp(g)(Xrref) ≈ Xrt, are denoted by up : Ω → R³ and ug : Ω → R³, respectively.

Figure 7.2: Graphical illustration of the methods for (a) geometric and (b) photometric estimation of the surface motion fields ug, up. The geometric approach matches the given shapes directly, based on their geometry. In contrast, the proposed photometric approach comprises a two-stage scheme, first estimating a 2-D motion field ũp in the photometric image domain and then deducing the 3-D surface motion field up from the associated range measurements.

7.2.1 Photometry-Driven Surface Registration

The proposed algorithm to estimate dense 3-D surface motion fields from photometric information is based on a two-stage procedure. First, we perform a non-rigid registration in the photometric 2-D image domain. In general, any parametric or non-parametric registration method can be applied for this purpose [Zito 03]; in this work, we use an optical flow framework, see Sect. 7.3.1.
Second, we transfer the 2-D point correspondences in the photometric image domain to the 3-D point positions of the associated range measurements. Hence, the proposed scheme yields both:

• a 2-D displacement field ũp : Ω → R², describing the deformation in the photometric image domain,
• a 3-D displacement field up : Ω → R³, deduced from the former, describing the geometric deformation of the associated surface.

A graphical illustration of the approach is given in Fig. 7.2b. Note that, without loss of generality, any non-rigid image registration method that yields a dense displacement field can be applied in the first stage. Based on the estimated 2-D displacement field ũp, the 3-D surface motion field up between the reference shape Xrref and the torso shape Xrt at respiration state t can be inferred via

up(ζ) = xrt(ζ + ũp(ζ)) − xrref(ζ) = rt(ζ + ũp(ζ)) γ(ζ + ũp(ζ)) − rref(ζ) γ(ζ),   (7.1)

with γ(ζ) denoting the viewing ray through sensor position ζ, and using bilinear interpolation in the sensor domain Ω for evaluating rt(ζ + ũp(ζ)).

7.2.2 Geometry-Driven Surface Registration

As motivated in Sect. 7.1, we intend to compare the proposed photometry-driven surface registration to a geometry-driven baseline. For this purpose, we build on the variational non-rigid surface registration framework introduced in Sect. 5.3. Here, we represent the surface Xrt at time t by its corresponding signed distance function dXrt. Then, using |dXrt(φg(x))| as a pointwise closeness measure, we estimate the geometry-driven 3-D surface motion field ug by minimizing

E[ug] = Ematch[ug] + κg Ereg[ug] = ∫_Ω |dXrt(φg(xrref(ζ)))|² + κg ‖Dug(ζ)‖₂² dζ,   (7.2)

where κg denotes the regularization weight. For numerical minimization, we considered a gradient descent scheme as described in Sect. 5.4.2. Recall that both ug : Ω → R³ and up : Ω → R³ describe the 3-D torso deformation; what differs is the driver of the surface motion estimation, namely geometric information for ug and complementary photometric information for up (Sect. 7.2.1).

7.3 Experiments and Results

In the experiments, we investigate the application of the proposed photometry-driven reconstruction of 3-D surface motion fields to the tracking of elastic torso deformations induced by respiration.

7.3.1 Materials and Methods

We acquired RGB-D data with the Microsoft Kinect from four healthy subjects. Reclined on a treatment table, the subjects were asked to perform (1) normal breathing and (2) deep-inhalation thoracic breathing. For both respiration types, we extracted RGB-D data for the states of end-exhalation and end-inhalation, respectively. The task to be solved was then to find the non-rigid surface motion field aligning the former to the latter. Prior to registration, the range measurement data were pre-processed using edge-preserving denoising (Sect. 2.2.3). For estimating the non-rigid 2-D displacement field ũp with the proposed photometry-driven approach, we applied the variational optical flow framework by Liu [Liu 09]. Essentially, it builds on the combined local-global method for optical flow computation by Bruhn et al. [Bruh 05], which combines the advantages of two classical algorithms: the variational approach by Horn and Schunck [Horn 81], providing dense flow fields, and the local least-squares technique of Lucas and Kanade [Luca 81], featuring robustness with respect to noise.
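As an illustration of the two-stage scheme, the sketch below deduces the 3-D motion field from a 2-D flow field according to Eq. (7.1); the pinhole ray map γ and the use of OpenCV's Farnebäck flow as a stand-in for Liu's framework [Liu 09] are assumptions made for the sake of a self-contained example (inputs assumed float32 where interpolated).

```python
import numpy as np
import cv2

def ray_map(h, w, fx, fy, cx, cy):
    """Assumed pinhole model: gamma(zeta) maps a pixel to its viewing ray."""
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    return np.dstack([(u - cx) / fx, (v - cy) / fy, np.ones((h, w))])

def photometric_motion_field(gray_ref, gray_t, r_ref, r_t, gamma):
    """Two-stage scheme: 2-D optical flow, then back-projection via Eq. (7.1)."""
    # Stage 1: dense 2-D flow u~ (Farnebaeck as a stand-in for [Liu 09]).
    flow = cv2.calcOpticalFlowFarneback(gray_ref, gray_t, None,
                                        0.5, 4, 21, 3, 5, 1.1, 0)
    h, w = gray_ref.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float32),
                       np.arange(h, dtype=np.float32))
    map_x, map_y = u + flow[..., 0], v + flow[..., 1]
    # Stage 2: bilinear interpolation of range and ray at displaced positions.
    r_warp = cv2.remap(r_t.astype(np.float32), map_x, map_y, cv2.INTER_LINEAR)
    g_warp = cv2.remap(gamma.astype(np.float32), map_x, map_y, cv2.INTER_LINEAR)
    # u_p = r_t(zeta + u~) * gamma(zeta + u~) - r_ref(zeta) * gamma(zeta)
    return r_warp[..., None] * g_warp - r_ref[..., None] * gamma
```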
The weighting parameters of the regularizers for the photometry-driven and the geometry-driven approach, ensuring smoothness of the estimated displacement fields, were empirically set to κp = 1.5·10⁻² and κg = 10⁻⁶.

In our experimental setup using real data, the ground truth 3-D surface motion field is unknown. Furthermore, the attachment of markers is unacceptable, as it might bias the optical flow computation. Hence, for quantitative evaluation, we applied the following scheme. First, we projected the 3-D displacement field ug : Ω → R³ onto the sensor domain and applied the resulting displacement ũg : Ω → R² to the 2-D photometric data frgb,ref on Ω acquired at the reference respiration state of end-exhalation. We then compared the warped images to the known photometric data frgb,t at the respiration state of end-inhalation, over the patient's torso given by a binary mask B. In particular, as a scalar measure of the initial and residual mismatch, we computed the root mean square photometric distance of the initial and warped data w.r.t. the reference frgb,t, respectively:

e0 = √( (1/|B|) Σ_{ζ∈B} ‖ft(ζ) − fref(ζ)‖₂² ),   (7.3)

ep(g) = √( (1/|B|) Σ_{ζ∈B} ‖ft(ζ + ũp(g)(ζ)) − fref(ζ)‖₂² ),   (7.4)

where e0 denotes the initial mismatch, and ep, eg the residual mismatch after photometry-driven and geometry-driven registration, respectively.
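A compact implementation of the mismatch measures (7.3)/(7.4) could look as follows – a sketch assuming images normalized to [0, 1] and, for brevity, a nearest-neighbor lookup of the warped positions where the actual evaluation would interpolate.

```python
import numpy as np

def rms_photometric_distance(f_t, f_ref, mask, flow=None):
    """RMS color distance over the torso mask B, cf. Eqs. (7.3)/(7.4).

    f_t, f_ref : (H, W, 3) images with channels scaled to [0, 1].
    mask       : (H, W) boolean torso mask B.
    flow       : optional (H, W, 2) displacement field u~ (x/y order);
                 None yields the initial mismatch e0.
    """
    ys, xs = np.nonzero(mask)
    if flow is None:
        samples = f_t[ys, xs]                       # initial mismatch e0
    else:
        # f_t evaluated at zeta + u~(zeta); nearest neighbor for brevity.
        xw = np.clip(np.round(xs + flow[ys, xs, 0]).astype(int), 0, f_t.shape[1] - 1)
        yw = np.clip(np.round(ys + flow[ys, xs, 1]).astype(int), 0, f_t.shape[0] - 1)
        samples = f_t[yw, xw]                       # residual mismatch e_p / e_g
    diff = samples - f_ref[ys, xs]
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))
```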
7.3.2 Results

Quantitative results of the residual mismatch for the proposed photometry-driven and the opposed geometry-driven registration approach are depicted in Fig. 7.3. Note that the measurements of the RGB channels of frgb were scaled to the range [0, 1] here. For both normal breathing and deep inhalation, the photometric approach outperformed the geometric variant, by (eg − ep)/eg = 6.5% and 22.5%, respectively, on average over all subjects.

Figure 7.3: Quantitative comparison of photometry-driven and geometry-driven surface registration, for four volunteers. Shown per subject are the initial mismatch and the residual mismatch after geometric and photometric registration. For both normal breathing (a) and deep inhalation (b), the photometry-driven approach outperformed the geometry-driven alternative in all cases. As intuition suggests, the effect is more pronounced with deep inhalation. Also note the higher initial mismatch with deep inhalation compared to normal breathing.

Qualitative results of the deep inhalation study are depicted in Fig. 7.4. Here, the position of the papilla, a salient anatomical landmark in the photometric domain, was manually labeled for the respiration states of end-exhalation and end-inhalation, as well as in the warped images after geometry-driven and photometry-driven registration. Overlaying the target position at the state of end-inhalation reveals the initial mismatch with the reference state (end-exhalation). Moreover, regarding the papilla position after photometry-driven and geometry-driven registration, it stresses the superior performance of the proposed photometry-driven approach in matching the target, resulting in a negligible residual photometric mismatch¹. By trend, our experiments indicated that the geometry-driven approach underestimated the motion pattern in the superior-inferior direction for all four subjects.

¹ Note that evaluating the estimated displacement vector at a photometrically salient landmark might differ from the results in less salient regions.

Figure 7.4: Manually labeled data from the four subjects (rows) performing deep-inhalation thoracic breathing. Shown are photometric data from the thorax region, converted to grayscale in this illustration for better visibility of the colored labels. Given are data for end-exhalation (a), end-inhalation (d) and the results of geometry-driven (b) and photometry-driven (c) registration. For each subject, the position of the white crosshairs denotes the location of the papilla in the end-inhalation state. The colored crosshairs depict the position of the papilla prior to registration (blue) and after geometry-driven (red) and photometry-driven (green) registration.

The estimated 3-D surface motion fields aligning the respiration states of end-exhalation and end-inhalation are illustrated in Fig. 7.5, for both the photometry-driven and the geometry-driven surface registration approach. It can be observed that the photometry-driven surface motion field up is more pronounced in the SI direction than the geometric variant ug, particularly for the upper torso region. Again, this indicates an underestimation of the SI motion component by the geometry-driven registration.

7.4 Discussion and Conclusions

In this chapter, we have presented a method for reconstructing dense 3-D surface motion fields over non-rigidly moving surfaces using RGB-D cameras. In contrast to conventional registration approaches, which typically rely on the sole surface geometry, the registration process is driven by photometric information. In an experimental study for the application in IGRT, we have investigated the performance of the proposed photometry-driven method compared to a geometry-driven baseline. Both approaches are capable of providing dense surface motion fields for respiratory motion management. In experiments on real data from healthy volunteers, the proposed photometric method outperformed the geometry-driven surface registration by 6.5% and 22.5% for normal and deep thoracic breathing, respectively, evaluating the residual photometric mismatch. Note that this distance measure quantifies the photo-consistency of the warped images with the target and thus might introduce a bias favoring the result of the photometry-driven surface registration approach. However, comparing the mismatch in the geometric domain is impracticable here, as the photometry-driven approach yields a perfect shape match by design, cf. Eq. (7.1). In addition, we observed an underestimation of the surface motion in the SI direction for the geometry-driven registration. This coincides with the results of Schaerer et al. [Scha 12], who compared a geometry-driven registration to the local trajectories of skin markers (thus reflecting skin motion) and stated a reduced accuracy of the registration algorithm in recovering SI surface motion. Clinical studies have shown that the SI direction is the prominent direction of human breathing [Keal 06].
Hence, at first glance, one might interpret the results as an indication that the proposed photometry-driven registration variant is potentially the better choice for estimating surface motion fields [Baue 12d]. Here, let us point out a more differentiated view. In particular, we assume an interplay of three motion types in the considered scenario. From interior to exterior, these are (1) soft tissue and organ motion in the abdominal and thoracic cavities due to contraction and relaxation of the diaphragm and ribcage muscles, (2) motion of the ribcage induced by bio-mechanical coupling to (1), and (3) skin motion induced by (2). While the coupling between (1) and (2) has been investigated extensively in the physiology literature [West 08], a differentiation between (2) and (3) has not been considered yet. More specifically, while the external torso geometry and the surface motion fields deduced from it essentially reflect the ribcage movement, we assume that there is an elastic stretching component involved between skin and ribcage motion that might lead to the differences between photometry-driven and geometry-driven surface motion estimates². Another factor that might explain the differences between the estimated motion fields in our experimental evaluation is the choice of the regularization type and weighting.

² This implies that skin markers might be an inappropriate choice for evaluating geometry-driven surface registration methods, as done by Schaerer et al. [Scha 12], for instance.

Figure 7.5: Comparison of the 3-D surface motion fields up (upper row) and ug (lower row), for two subjects (a, b). The color of the displacement vectors encodes their magnitude in the SI direction, according to the color bar on the right. Even though we depict results for the case of normal breathing, the underestimation effect of the geometric approach in the SI direction is clearly visible, cf. Fig. 7.4.

In conclusion, let us stress that both motion fields up and ug are meaningful and valuable for application in respiratory motion management. However, depending on the particular application, it must be investigated which motion fields better correlate with the internal target motion; different physical signals may have stronger or weaker relationships with the respiratory motion [McCl 13]. This could be validated in a setup that acquires 4-D CT/MR data simultaneously with RGB-D data, which is, however, beyond the scope of this thesis.

CHAPTER 8

Outlook

The concepts proposed in this thesis open a number of opportunities for further research. In this chapter, we discuss directions and perspectives as well as challenges toward clinical translation.

Directions for Rigid Surface Registration. Regarding the feature-based framework for rigid surface registration that is applicable in the presence of large misalignments (Chap. 3), the following aspects should be considered. With the current approach, shape descriptors are computed for every single surface point in the template and reference shape, respectively. Instead, introducing a preceding feature detection stage that identifies salient keypoints with a low-level algorithm would help reduce the computational effort of descriptor extraction. In addition, it would narrow down the search space in the subsequent feature matching stage. The matching stage could be accelerated using the RBC data structure and search scheme (Sect. 4.3.2)
as opposed to the brute-force nearest neighbor search used in the current approach. Furthermore, benefits are expected from a multi-scale search scheme: Gaussian filtering followed by subsampling in the underlying 2-D range image domain could be applied to generate multi-scale surface data [Bona 11]. Then, from coarse to fine levels, the found correspondences would be propagated to initialize the correspondence search at the finer scale, improving both robustness and runtime performance. If RGB-D data are available for both the template and the reference shape, one could further consider the application of photo-geometric shape descriptors [Baue 11b]. These encode both the photometric texture and the geometric shape characteristics in a common representation, which might be particularly helpful for establishing correspondences in regions with non-salient topography.

The benefits of incorporating complementary photometric information for rigid surface registration were also demonstrated in Chap. 4, where we proposed a photo-geometric ICP framework using the concept of RBC for efficient NN search. For estimating the aligning transformation, we built on the classical point-to-point error minimization that can be solved in closed form. However, from a statistical point of view, this implicitly assumes that the points are observed with zero-mean, isotropic Gaussian noise [Bala 09]. In a multi-modal surface registration scenario this will generally not be the case, e.g. because of differences in mesh resolution and topology. Also, 3-D measurement errors may be highly anisotropic, as RI sensors typically have a much higher localization uncertainty in the viewing direction of the camera. Hence, future work should consider a generalized anisotropic ICP along the lines of Maier-Hein et al. [Maie 12], even though this requires an iterative optimization scheme [Bala 09]. Another improvement in terms of accuracy is expected from minimizing a point-to-plane instead of a point-to-point distance metric, as it allows the shapes to slide over each other and thus avoids snap-to-grid effects [Chen 92]; again, however, solving the corresponding optimization problem involves an iterative scheme (a sketch of the standard linearized solve is given below). In analogy to the feature-based approach, applying a multi-scale ICP scheme can improve both robustness and convergence behavior. Also, an automatic scene-dependent adaptation of the photo-geometric weight by low-level analysis of the acquired RGB-D data might be a promising direction. For the reconstruction scenario considered in Chap. 4, where an RGB-D data stream is fused on-the-fly to establish a global shape model, we recommend the transition from frame-to-frame registration (considering the current frame and the previous one) to frame-to-model registration (considering the shape of the instantaneous global model as reference). This has been shown to reduce drift effects, e.g. when using a TSDF representation [Izad 11, Newc 11], cf. Sects. 4.4.2, 4.4.3. Note that for the proposed photo-geometric registration approach such a frame-to-model reconstruction scheme implies the need for a 6-D global model incorporating both photometric and geometric information [Stei 11, Whel 12].
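To make the point-to-plane alternative concrete, here is a minimal sketch of the standard small-angle linearization [Chen 92] for one ICP iteration; the correspondence set is assumed to be given, and the function name is ours.

```python
import numpy as np

def point_to_plane_step(src, dst, normals):
    """One linearized point-to-plane ICP step.

    Minimizes sum_i (((R src_i + t) - dst_i) . n_i)^2 under the small-angle
    approximation R ~ I + skew([a, b, c]), yielding a 6-D linear system.
    src, dst : (N, 3) corresponding points; normals : (N, 3) unit normals of dst.
    Returns a 4x4 homogeneous transform.
    """
    A = np.hstack([np.cross(src, normals), normals])   # (N, 6): [p x n, n]
    b = np.einsum('ij,ij->i', normals, dst - src)      # (N,):  (q - p) . n
    x, *_ = np.linalg.lstsq(A, b, rcond=None)          # [a, b, c, tx, ty, tz]
    ax, ay, az = x[:3]
    T = np.eye(4)
    # Small-angle rotation: I + skew([a, b, c]).
    T[:3, :3] = np.array([[1, -az, ay], [az, 1, -ax], [-ay, ax, 1]])
    T[:3, 3] = x[3:]
    return T
```

In practice the resulting rotation is re-orthonormalized and the step iterated with refreshed correspondences.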
Part I of this thesis has addressed the task of rigid shape alignment. In the experiments (Sects. 3.5.1, 3.5.2), the feature-based framework has proved to meet the requirements for multi-modal application on RI/CT data. Although the approach is robust w.r.t. noise and topological differences, it was designed for rigid shape comparison and is thus, in general, sensitive to non-rigid deformations. In clinical practice, the assumption of rigidity often does not hold and must be relaxed. Hence, tackling elastic deformations is a major direction for future research in the field. In principle, a feature-based approach is capable of coping with non-rigid deformations if the rigidity assumption approximately holds within the local neighborhood that is encoded by the shape descriptor used in the correspondence search. In this case, instead of estimating a global rigid transformation, a dense non-rigid displacement field may be derived from the sparse correspondences using interpolation techniques [Amid 02]. In this context, future work should benchmark the resilience of the proposed shape descriptors w.r.t. small-scale deformations. Regarding the proposed photo-geometric ICP framework, a non-rigid extension along the lines of Amberg et al. may be a promising option [Ambe 07].

Directions for Non-Rigid Surface Registration. Part II of this thesis was concerned with the estimation of non-rigid torso deformations with prospective applications in respiratory motion management in IGRT. In Chap. 5, we have presented a variational formulation for joint RI denoising and registration to an accurate reference shape. For this first feasibility study, we confined ourselves to a simple denoising formulation that combines a least-squares-type fidelity term with a TV-like regularization. In particular, we chose a pseudo-Huber norm, enforcing strong smoothing in flat regions to avoid staircasing while preserving the discontinuities that occur at the torso boundaries (a minimal sketch of this penalty is given below). Future work might consider more advanced denoising models, such as the variational formulation by Lenzen et al. on adaptive anisotropic total variation [Lenz 11], to preserve these boundaries even better. In the current approach, we have initialized the displacement field that is to be reconstructed with the estimate from the previous phase. This can be considered a first step toward temporally coherent deformation tracking. A promising extension of the proposed variational formulation might be to incorporate a dedicated additional temporal regularization term for the displacement field, ensuring smoothness and consistency of the motion trajectories over time, cf. [Volz 11, Garg 13]. Further experimental studies should investigate the performance of the joint approach compared to subsequent denoising and registration for RI modalities beyond ToF imaging, e.g. structured light. Here, it might be beneficial to incorporate modality-specific noise characteristics into the denoising term.

In Chap. 6, we have proposed a novel method for estimating dense surface motion fields from sparse RI measurement data. The experimental study has shown that the proposed formulation yields highly accurate dense surface reconstructions. This is a promising result, in particular considering that the evaluation was performed on data with the noise characteristics of raw, unfiltered MLT measurements. We expect a further gain in accuracy from pre-processing the MLT measurements using customized denoising filters. In analogy to the concept in Chap. 5, this could be implemented by incorporating an additional term into the objective function, jointly denoising the MLT data while performing the sparse-to-dense surface registration.
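For illustration, the following sketch implements gradient descent for TV-like denoising of a range image with the pseudo-Huber penalty ψδ(s) = δ²(√(1 + (s/δ)²) − 1), as mentioned above for Chap. 5; the step size, δ and the data weight λ are illustrative assumptions, not the parameters used in the thesis.

```python
import numpy as np

def pseudo_huber_tv_denoise(r_noisy, lam=0.1, delta=1.0, step=0.2, n_iter=100):
    """Denoise a range image r by gradient descent on
    E[r] = lam/2 * ||r - r_noisy||^2 + sum_pixels psi_delta(|grad r|)."""
    r = r_noisy.astype(float).copy()
    for _ in range(n_iter):
        gx = np.diff(r, axis=1, append=r[:, -1:])   # forward differences
        gy = np.diff(r, axis=0, append=r[-1:, :])
        # psi'(s)/s = 1 / sqrt(1 + (s/delta)^2): smooth TV-like diffusivity,
        # close to 1 in flat regions, small at strong edges (preserved).
        w = 1.0 / np.sqrt(1.0 + (gx ** 2 + gy ** 2) / delta ** 2)
        # Divergence of the weighted gradient field (backward differences).
        div = (np.diff(w * gx, axis=1, prepend=np.zeros_like(r[:, :1]))
               + np.diff(w * gy, axis=0, prepend=np.zeros_like(r[:1, :])))
        r -= step * (lam * (r - r_noisy) - div)
    return r
```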
The sparse-to-dense objective function of Chap. 6 could be further refined with an adaptive regularization of the dense displacement field. In particular, we propose to adjust the regularization weight of a local displacement vector according to its spatial position w.r.t. the MLT measurement grid (see the sketch at the end of this section). The idea is to enforce strong regularization in regions with low support due to poor coverage by the MLT sensor – thus increasing the local contribution of prior shape knowledge. Vice versa, in regions where the interventional torso shape is known from nearby MLT measurements, e.g. close to a position where two projected laser lines intersect, the regularization could be relaxed.

In Chap. 7, we have presented a photometry-driven approach to surface deformation tracking. A straightforward extension might combine the two-step approach into a joint photo-geometric registration formulation: instead of first estimating a 2-D displacement field in the photometric image domain and then deriving the associated 3-D surface motion field from it, the objective function could be reformulated such that it directly estimates the 3-D motion field. For instance, one might combine an optical flow constraint in the 2-D image domain with a regularization of the corresponding 3-D displacement field over the surface [Leto 11]. Our practical experience further suggests exploiting salient anatomical landmarks that are present in the photometric domain (e.g. the papilla for torso applications) to improve the accuracy of the estimated displacement field. In particular, one might add a term to the objective function that enforces closeness of corresponding 3-D points established from 2-D photometric features. An alternative strategy might treat the matched landmarks as hard constraints in the optimization process, along the lines of Daum [Daum 11]. These landmarks would thus serve as anchor points in the motion field reconstruction process and constrain the displacement field in addition to the classical optical flow constraint.

Directions for RI-based Guidance in RT. Regarding the medical applications addressed in this thesis, the focus was on RI-guided RT. In particular, we have addressed marker-less patient positioning (Chap. 3) and respiratory motion tracking (Chaps. 5–7). Below, we give an outlook on future research from an application point of view, confining the discussion to the specific field of RT. First and foremost, the proper integration of the individual components into the existing clinical workflow in IGRT must be substantiated. This applies both to automatic coarse setup superseding the conventionally manual and marker-based initial alignment, and to marker-less, non-radiographic respiratory motion management for continuous target tracking and treatment as opposed to gated RT. It also implies the need for clinical studies investigating the robustness, reliability and accuracy of the proposed methods on patient data acquired in a clinical environment. In particular, such studies must investigate the performance of the complete system combining the individual rigid and non-rigid registration modules along the workflow. For instance, note that the accuracy of correlating external surface motion fields to the internal target motion learned from 4-D tomographic planning data highly depends on the accuracy of the patient setup w.r.t. this planning data. Consequently, the precise alignment of the interventionally acquired patient shape to the planning data is a fundamental prerequisite to minimize error propagation in downstream tasks.
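Returning to the adaptive regularization idea sketched above for the sparse-to-dense formulation: a minimal illustration could derive the local weight from the distance to the nearest MLT sample via a distance transform; the exponential falloff and its scale are our assumptions.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def adaptive_regularization_weight(mlt_mask, kappa_min=1e-3, kappa_max=1e-1,
                                   falloff_px=20.0):
    """Spatially varying regularization weight kappa(zeta) on the grid over Omega.

    mlt_mask : (H, W) boolean array, True where an MLT sample (laser line) hits.
    Regions far from any measurement get a strong weight (more prior shape),
    regions close to measurements a weak one (more data fidelity).
    """
    # Distance of every grid point to the nearest MLT measurement [px].
    dist = distance_transform_edt(~mlt_mask)
    # Smooth interpolation between kappa_min (at samples) and kappa_max (far away).
    return kappa_max - (kappa_max - kappa_min) * np.exp(-dist / falloff_px)
```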
Note that these studies must be carefully designed with regard to potential bias, e.g. when using optical markers as a reference for photometry-driven registration approaches. Another essential prerequisite for clinical studies is the availability of certified RI sensors that provide 3-D measurement data in a reliable and accurate manner. While the sensors used in this thesis (PMD CamCube, Microsoft Kinect, MLT sensor) are not approved for clinical application, some IGRT providers have such clearance for stereo vision (VisionRT Ltd., London, UK) or structured light (Catalyst, C-RAD AB, Uppsala, Sweden) based RI sensor hardware. Clinical studies should also investigate the system performance w.r.t. the camera setup, e.g. comparing a single-camera solution to a multi-camera setup. The latter is expected to provide improved coverage in dynamic environments that imply temporary or partial patient occlusion by clinical staff or hardware. With regard to a multi-camera RI acquisition setup, a smart fusion of range data from multiple devices was proposed by Kainz et al. [Kain 12] and Wasza et al. [Wasz 13]. Associated practical issues that are not covered by this thesis but must be solved for clinical acceptance include robust, accurate, and effortless system calibration and temporal synchronization w.r.t. the treatment system. Furthermore, real-time implementations of the approaches proposed in this thesis may be necessary to fulfill the demands of clinical practice.

Concerning the integration of the proposed feature-based framework for automatic coarse patient setup, a gated positioning approach could be applied: a 1-D respiration curve extracted from RI data, serving as an indicator of the current respiration state [Scha 08], can help trigger the acquisition of surface data at the particular respiration state that was chosen for the acquisition of the prior planning data. Regarding the application of the reconstructed dense non-rigid displacement fields in respiratory motion management, several approaches should be investigated. For instance, the 4-D surface motion fields might be analyzed to determine the current respiration type based on a previously learned patient-specific respiration model, e.g. allowing for an automatic separation between thoracic and abdominal breathing [Wasz 12a]. Furthermore, such learned motion models [McCl 13] could be used for motion-compensated patient positioning [Wasz 12b, Wasz 13].

The most obvious limitation of RI-based guidance systems is the fact that the observation is restricted to the external body surface – which might even be covered by sterile drapes. Hence, the movement of internal structures is invisible unless the external measurements are combined with planning data from 3-D or 4-D tomographic imaging. For application in tumor motion compensation, the essential question is: how can the motion of internal structures be inferred from the interventionally measured external torso deformations? There are two ways to approach this challenge. First, one might consider a non-rigid 3-D extension of the external body deformation onto the volumetric planning data. However, we hypothesize that a reliable extension would require tissue-specific modeling of the elasticity and deformation behavior under physical stress, e.g. using bio-mechanically or physiologically inspired FE models [Robe 07, Eom 09]. This would further require a segmentation of the planning data w.r.t. different organs and tissue types. A promising alternative strategy involves techniques from machine learning to establish an external-internal motion correlation model from 4-D CT/MR planning data, along the lines of Schaerer et al. [Scha 12] and McClelland et al. [McCl 13]; also recall Sect. 5.1.
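As a toy illustration of such a correlation model, the sketch below fits a linear, ridge-regularized mapping from a PCA-compressed surface motion field to an internal target position, in the spirit of [McCl 13]; the dimensionalities, the regularization weight, and the linearity of the model are illustrative assumptions.

```python
import numpy as np

def fit_correlation_model(surface_fields, target_positions, n_modes=5, ridge=1e-3):
    """Fit internal target position ~ linear function of the surface motion field.

    surface_fields   : (T, N*3) flattened surface motion fields from T phases
                       of the 4-D planning data.
    target_positions : (T, 3) corresponding internal target positions.
    """
    mean = surface_fields.mean(axis=0)
    # PCA compression of the high-dimensional surrogate via SVD.
    U, S, Vt = np.linalg.svd(surface_fields - mean, full_matrices=False)
    basis = Vt[:n_modes]                               # (n_modes, N*3)
    scores = (surface_fields - mean) @ basis.T         # (T, n_modes)
    X = np.hstack([scores, np.ones((scores.shape[0], 1))])
    # Ridge-regularized least squares for the linear correlation model.
    W = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]),
                        X.T @ target_positions)

    def predict(field):
        s = (field - mean) @ basis.T
        return np.hstack([s, 1.0]) @ W

    return predict
```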
However, this strategy necessitates the acquisition of 4-D tomographic planning data. While 4-D CT involves additional radiation exposure for the patient, we expect 4-D MR planning to boost the prospective clinical acceptance of this approach [Miqu 13]. Nonetheless, one open question in the context of external-internal correlation remains: which type of surface motion field correlates best with the associated motion of the internal structures? In practice, driven by the individual matching and regularization terms and a customized incorporation of prior knowledge, the methods proposed in Chapters 5–7 and presented in the literature [Scha 12] will yield different surface motion fields describing the same physical torso deformation. Hence, clinical studies must investigate which approach is suitable to achieve an optimal correlation. Potentially, advanced regularization schemes that incorporate prior knowledge about the bio-mechanics of human respiration might be required to address this issue.

Translation to Different Clinical Applications. The methods proposed in this thesis are not restricted to the medical applications discussed in the individual chapters. Indeed, the rigid and non-rigid surface registration of RI measurements to data acquired with conventional medical imaging modalities is an essential prerequisite for a wide range of clinical applications. The photo-geometric rigid registration framework (Chap. 4) can be applied for the 3-D reconstruction of various anatomical shapes that can be modeled as approximately rigid. Apart from the applications considered in this thesis, we expect great potential for the reconstruction of tubular-shaped objects with an inherently low degree of elasticity, such as the esophagus, the trachea or the lung tree in bronchoscopy, which are highly ambiguous for geometry-driven shape reconstruction methods. Future research may focus on fusing RGB-D data acquired during camera insertion and retraction, or on incorporating prior shape knowledge from planning data into the interventional shape reconstruction process. An on-the-fly registration of endoscopic RGB-D data with a reference shape extracted from tomographic planning data would further allow for augmented reality navigation and guidance during the procedure. Such hybrid systems might further help reduce reconstruction drift by registration to a global shape model, and are expected to be more robust and accurate than previous approaches based on conventional 2-D endoscopic video data [Rai 06, Higg 08].

The non-rigid surface registration approaches introduced in Part II of this thesis can be directly applied to manifold clinical applications beyond RT. For instance, dense surface motion tracking is of particular interest for improved navigation in image-guided interventions such as computer-aided open hepatic surgery [Mark 10, Oliv 11] or endoscopy-guided minimally invasive procedures. Tracking external organ deformations during IGLS might eventually allow for dynamic augmented reality with pre-operative planning data, such as a real-time update of the deforming internal hepatic vessel tree during tissue manipulation.
Dense 4-D surface motion fields could also help reduce motion artifacts in tomographic reconstruction [Baue 13a]. Gianoli et al. proposed the use of marker-based surface tracking to extract a multi-dimensional respiration surrogate for reducing artifacts in retrospective 4-D CT image sorting [Gian 11]. Their experiments revealed that using multiple surrogates reduces the uncertainties in breathing phase identification compared to conventional methods based on a one-dimensional surrogate. In addition, RI-based body surface tracking is of particular interest for motion compensation in nuclear imaging such as PET and SPECT [Bett 13]. Based on previous work on motion compensation in PET/SPECT using marker-based tracking [Alno 10, Bruy 05, McNa 09], the potential application of dense and real-time RI has lately been attracting interest in the field [Oles 10, Noon 12].

CHAPTER 9

Summary

The advent of dense and dynamic RI technologies is expected to accelerate the future demand for surface registration techniques. This thesis has addressed promising medical applications that require a mono-modal or multi-modal alignment of RI data. Depending on the particular task, the proposed methods target both rigid and non-rigid registration scenarios.

In Chap. 2, we outlined the measurement principles of different real-time capable RI modalities and detailed the technologies that were applied in this thesis (ToF, SL, MLT), with a thorough discussion of modality-specific strengths and limitations. We introduced our development platform for range image processing (RITK), the integrated RI simulation environment used for quantitative evaluation throughout this work, and our data enhancement pipeline. In addition, we summarized recent developments and promising fields of application of modern RI technologies in health care. Eventually, we reviewed the state of the art in the field of surface registration and shape correspondence, with a particular focus on medical applications.

Rigid Surface Registration. The rigid alignment of shape data is a common challenge in many medical applications. The availability of modern RI technologies that enable an intra-interventional acquisition of dense spatio-temporal surface data has accelerated the demand for a robust solution to this task, holding potential to improve existing and create new and innovative clinical workflows. In Chap. 3, we addressed two particular examples. First, we introduced a novel marker-less solution for automatic initial patient setup in RT. It is based on a direct multi-modal alignment of intra-fractional RI data of the patient's external body surface to tomographic planning data. Second, we proposed the application of the technique to IGLS, where the alignment of the target organ to pre-operative reference data based on intra-operative RI holds great potential to augment navigation. In clinical practice, both tasks are conventionally performed in a manual, marker-based manner. To overcome this, we have developed a feature-based rigid surface registration framework. More specifically, to meet the particular requirements of multi-modal shape alignment, we have introduced 3-D shape descriptors that are invariant to mesh density and organization, and resilient to inter-modality deviations in surface geometry. By design, the method can handle gross initial misalignments and cope with partial matching.
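Once descriptor correspondences are established, the global rigid transform can be estimated in closed form; the following is a generic SVD-based (Kabsch) solver, shown for illustration – the estimation stage of the thesis framework may differ in detail.

```python
import numpy as np

def rigid_transform_from_correspondences(P, Q):
    """Closed-form least-squares rigid transform (R, t) with R @ P_i + t ~ Q_i.

    P, Q : (N, 3) matched point sets (template / reference).
    """
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                 # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Reflection guard: enforce a proper rotation with det(R) = +1.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cq - R @ cp
    return R, t
```

In a feature-based pipeline, such a solve is typically wrapped in a robust estimator (e.g. RANSAC over the detected correspondences) to reject mismatches.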
For initial patient setup in RT, the proposed approach yielded an average angular positioning error of 1.5±1.3° and an average translational positioning error of 12.9±6.6 mm, at a 97.5% success rate, for aligning Microsoft Kinect data to a reference shape extracted from CT planning data. For organ registration in IGLS, the average target registration error was 3.8±1.1 mm on porcine liver data in a ToF/CT setup, stressing the potential of the proposed RI-based solution to supersede manual coarse alignment. The successful application of the proposed framework to different RI modalities and biological materials further indicates the generalization capability of the approach.

Along with an increasing interest in using RI for medical applications, we have observed increasing efforts to miniaturize RI devices toward application in endoscopy. In Chap. 4, we addressed two exemplary fields that will benefit from the availability of 3-D endoscopes. In laparoscopy, the intra-operative registration of RI data provides the opportunity to reconstruct the geometric shape of the operation situs. This provides the surgeon with an extended view of the target and knowledge about the surrounding anatomy. Aligning the interventional situs geometry to pre-operative planning data further enables augmented reality guidance. In colonoscopy, registration of intra-procedural RI data will allow the construction of metric 3-D shape models that could assist gastroenterologists in quantitative diagnosis and pre-operative planning. An alignment to pre-interventional virtual colonoscopy data would further improve navigation. To address such endoscopic shape reconstruction scenarios, we proposed an ICP-based rigid registration framework, assuming rigidity to be an acceptable approximation for mapping scenarios where accuracy requirements are less strict and motion is moderate. The approach jointly incorporates geometric shape and complementary photometric appearance information to guide the registration process. To meet real-time constraints (≥ 20 Hz), our ICP variant builds on a novel acceleration structure for efficient 6-D NN queries. In particular, we optimized the RBC search scheme in terms of performance for low-dimensional data, trading off accuracy against runtime and yielding ICP runtimes of less than 20 ms on an off-the-shelf GPU. In a study on synthetic RGB-D data, we found that incorporating photometric appearance as a complementary cue substantially outperformed a conventional geometry-driven ICP. This makes the approach of particular interest for RGB-D cameras that provide low-SNR range measurements but additional high-grade photometric data. For operation situs reconstruction in laparoscopy, the proposed photo-geometric variant reduced the drift by a factor of 12.9 (translation) and 5.9 (rotation) compared to a geometry-driven ICP. For colon model construction, the approach yielded comparable factors.
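To illustrate the principle of such a joint 6-D query, consider the following minimal sketch. It is not the thesis implementation (which relies on an optimized RBC structure on the GPU); it uses a k-d tree instead, and the scalar weighting beta between the geometric and photometric channels is our own simplifying assumption:

```python
import numpy as np
from scipy.spatial import cKDTree

def photo_geometric_nn(fixed_xyz, fixed_rgb, moving_xyz, moving_rgb, beta=0.8):
    """Sketch of a 6-D nearest-neighbor query combining geometry and photometry.
    beta in [0, 1] weights the geometric channel; the thesis uses an RBC-based
    GPU search instead of the k-d tree employed here."""
    # Stack weighted positions and colors into 6-D feature vectors.
    fixed6 = np.hstack([beta * fixed_xyz, (1.0 - beta) * fixed_rgb])
    moving6 = np.hstack([beta * moving_xyz, (1.0 - beta) * moving_rgb])
    tree = cKDTree(fixed6)            # acceleration structure (stand-in for RBC)
    dists, idx = tree.query(moving6)  # exact 6-D NN for each moving point
    return dists, idx
```

Within an ICP iteration, the returned correspondences would then feed the usual closed-form rigid transformation estimate.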
Non-Rigid Surface Registration. Dynamic RI technologies that enable real-time 3-D shape acquisition are of particular interest for medical applications, providing means to capture non-rigid shape deformations in a marker-less manner. In the second part of this thesis, we addressed applications that will benefit from the reconstruction of non-rigid surface motion fields from spatio-temporal RI data. Even though we focused on the example of respiratory motion tracking, the proposed methods can be exploited for a broad range of medical applications. In IGRT, surface motion fields can be used as a high-dimensional respiration surrogate for gating, to drive external-internal motion correlation models for respiration-synchronized treatment, and for motion-compensated patient positioning. Furthermore, beyond RT, they hold great potential for motion compensation in image-guided interventions and tomographic reconstruction. Altogether, we have presented three approaches to estimate surface motion fields that are tailored w.r.t. the individual strengths and limitations of three distinct RI technologies.

In Chap. 5, we have proposed a novel variational framework that simultaneously solves the denoising of low-SNR RI data and its registration to an accurate reference shape extracted from tomographic planning data. In the experiments, we have shown that solving these two intertwined problems of denoising and registration in a simultaneous manner is superior to a consecutive approach where the surface registration is performed after prior denoising of the RI measurements. In a quantitative study on real CT and synthetic ToF data, we found that the joint formulation improved the quality of the denoising and the registration process by a factor of 2.1 and 1.9, respectively. An additional study on real ToF data further revealed that the joint model can compensate for surface artifacts that result from systematic ToF measurement errors. In conclusion, the results indicate that incorporating prior shape knowledge into the denoising process allows for a robust estimation of dense surface motion fields with RI modalities that exhibit a low SNR. The proposed method enables both an improved intra-fractional full torso surface acquisition for patient monitoring and the tracking of non-rigid torso deformations.

Instead of overcoming the low SNR of available dense RI sensors, we also investigated the application of a novel MLT RI technology that acquires sparse but highly accurate 3-D position measurements in real-time. In combination with the novel variational sparse-to-dense registration approach introduced in Chap. 6, both the patient's dense instantaneous external body surface and the non-rigid surface motion field describing its spatio-temporal deformation can be reconstructed jointly from sparse sampling data and patient-specific prior shape knowledge. The performance of the proposed method was evaluated on synthetic, realistic and real MLT data. In a comprehensive study on 256 datasets from 16 subjects with an average initial surface mismatch of 5.66 mm, the mean residual registration error on realistic MLT data was 0.23 mm w.r.t. ground truth. The 95th percentile of the local residual mesh-to-mesh distance after registration did not exceed 1.17 mm for any subject, indicating that the proposed method can reliably recover the dense displacement field even in the presence of strong respiration. We further found that a proper initialization and an improved mathematical formulation reduced the runtime by 19.2% and 48.2%, respectively. With regard to future advances in sensor technology, simulations indicated a considerable potential gain in accuracy when doubling the MLT laser grid sampling density. At a runtime of 2.3 s per frame, the developed CPU implementation substantially outperformed related work.

The methods proposed in Chapters 5 and 6 rely solely on the geometry of the shapes to be aligned.
In Chap. 7, we presented a method for non-rigid surface registration that exploits the complementary photometric information available with modern RGB-D cameras. The underlying idea is that photometric information can compensate for regions with non-salient topographies, whereas geometric information can guide the motion estimation in faintly textured regions. The proposed framework estimates the non-rigid transformation in the photometric 2-D image domain and then deduces the surface motion field from the former and the associated 3-D position measurements. In an experimental study on real data from Microsoft Kinect, we have investigated the performance of this photometry-driven method compared to a geometry-driven baseline. Indeed, the photometry-driven approach outperformed the latter by 6.5% and 22.5% in terms of residual photometric mismatch for normal and deep thoracic breathing, respectively. The results indicate that the approach is of particular interest for RI cameras that provide low-SNR range measurements but acquire additional high-grade photometric information.

In Chap. 8, we summarized perspectives to improve the methods proposed in this thesis and discussed potential challenges toward clinical translation.

In summary, this thesis made a number of original contributions, both on a theoretical and on a practical level, to the emerging research field of surface registration for RI-based applications in medicine. It provides novel techniques for rigid and non-rigid surface registration that are applicable to a broad range of clinical procedures. Based on the methods developed, the measurements conducted and the results obtained, an optimized treatment under RI guidance could be available in the near future and permit more accurate, safe and efficient interventions.

CHAPTER A
Appendix

A.1 Projection Geometry

In this section, we present a brief recapitulation of projection geometry [Hart 04, Faug 04]. In particular, we describe the concept of perspective projection and elaborate on how the inversion of this projection is employed to calculate 3-D positions from the pixel-wise scalar distance measurements of range imaging sensors.

A.1.1 Perspective Projection

A camera maps a 3-D position in the scene space, denoted in Cartesian ($x_w$) or homogeneous world coordinates ($\tilde{x}_w$) w.r.t. an arbitrarily defined world coordinate system, or in camera coordinates w.r.t. the camera coordinate system ($x_{cc}$, $\tilde{x}_{cc}$), onto a position in the 2-D sensor domain, denoted picture coordinate ($x_p$, $\tilde{x}_p$):

$$ x_w = (x_w, y_w, z_w)^\top \in \mathbb{R}^3, \qquad \tilde{x}_w = (\tilde{x}_w, \tilde{y}_w, \tilde{z}_w, \tilde{w}_w)^\top \in \mathbb{R}^4, \qquad (A.1) $$
$$ x_{cc} = (x_{cc}, y_{cc}, z_{cc})^\top \in \mathbb{R}^3, \qquad \tilde{x}_{cc} = (\tilde{x}_{cc}, \tilde{y}_{cc}, \tilde{z}_{cc}, \tilde{w}_{cc})^\top \in \mathbb{R}^4, \qquad (A.2) $$
$$ x_p = (x_p, y_p)^\top \in \mathbb{R}^2, \qquad \tilde{x}_p = (\tilde{x}_p, \tilde{y}_p, \tilde{z}_p)^\top \in \mathbb{R}^3. \qquad (A.3) $$

Assuming a pinhole camera model, the position of the projection onto the 2-D image $x_p$ is given by the intersection of the line between the 3-D position in the camera coordinate system $x_{cc}$ and the camera's optical center with the sensor plane. This can be expressed in homogeneous coordinates and matrix notation:

$$ \tilde{x}_p = K \left[\, I \mid 0 \,\right] \tilde{x}_{cc}, \qquad (A.4) $$

where $I \in \mathbb{R}^{3 \times 3}$ denotes the identity matrix and $K \in \mathbb{R}^{3 \times 3}$ the camera calibration matrix:

$$ K = \begin{pmatrix} \alpha_x & 0 & c_x \\ 0 & \alpha_y & c_y \\ 0 & 0 & 1 \end{pmatrix}. \qquad (A.5) $$

Here, $\alpha_x, \alpha_y$ denote the focal lengths in [px] and $c_x, c_y$ the sensor's principal point in [px]. (Note that we use an approximation of the camera calibration matrix here, neglecting the skew parameter for reasons of simplicity.)
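As a minimal numerical illustration of Eqs. (A.4) and (A.5) (our sketch; the intrinsic values are made-up placeholders, not calibration results from this work):

```python
import numpy as np

# Pinhole projection of Eqs. (A.4)/(A.5) with placeholder intrinsics.
ax, ay, cx, cy = 525.0, 525.0, 319.5, 239.5      # focal lengths / principal point [px]
K = np.array([[ax, 0.0, cx],
              [0.0, ay, cy],
              [0.0, 0.0, 1.0]])                  # camera calibration matrix K (A.5)

x_cc = np.array([0.2, -0.1, 1.5])                # 3-D point in camera coordinates [m]
x_p_tilde = K @ x_cc                             # homogeneous picture coordinates (A.4)
x_p = x_p_tilde[:2] / x_p_tilde[2]               # dehomogenize to pixel coordinates
print(x_p)                                       # -> [389.5, 204.5]
```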
Generalizing the projection formulation from camera coordinates to world coordinates involves a rotation matrix $R \in SO(3)$ and a translation vector $t \in \mathbb{R}^3$, describing the relative position and orientation between the two coordinate systems:

$$ \tilde{x}_p = K \left[\, I \mid 0 \,\right] \begin{pmatrix} R & t \\ 0 & 1 \end{pmatrix} \tilde{x}_w. \qquad (A.6) $$

For a comprehensive introduction to the principles of projection geometry and camera calibration, we refer to the books by Hartley and Zisserman [Hart 04] and Faugeras et al. [Faug 04].

A.1.2 3-D Point Cloud Reconstruction

Now, based on Eq. (A.4), let us derive the reconstruction of 3-D positions from scalar range measurements by inverting the projection process. We first treat RI sensors that measure orthogonal distances (e.g. Microsoft Kinect). Second, we derive the reconstruction for sensors that provide radial distances (e.g. ToF sensors).

Orthogonal Measurements. For each position $x_p$ on the image plane, the measured orthogonal distance $r_\perp(x_p) = z_{cc}$ (orthogonal w.r.t. the sensor plane) describes a 3-D position $x_{cc} = x_{r_\perp}(x_p)$ in the camera coordinate system. In particular, rearranging Eq. (A.4) yields:

$$ x_{cc} = \alpha_x^{-1} (x_p - c_x)\, z_{cc}, \qquad (A.7) $$
$$ y_{cc} = \alpha_y^{-1} (y_p - c_y)\, z_{cc}. \qquad (A.8) $$

Hence, the 3-D position in the camera coordinate system $x_{cc}$ is given as:

$$ x_{cc} = x_{r_\perp}(x_p) = r_\perp(x_p)\, p_\perp(x_p), \qquad (A.9) $$

with $p_\perp : \mathbb{R}^2 \to \mathbb{R}^3$:

$$ p_\perp(x_p) = \begin{pmatrix} \alpha_x^{-1}(x_p - c_x) \\ \alpha_y^{-1}(y_p - c_y) \\ 1 \end{pmatrix}. \qquad (A.10) $$

Note that the intrinsic camera parameters $(\alpha_x, \alpha_y, c_x, c_y)$ are determined by camera calibration [Zhan 00b].

Radial Measurements. For RI sensors that measure radial distances $r_\angle(x_p)$, the 3-D position in the camera coordinate system $x_{cc} = x_{r_\angle}(x_p)$ is given as:

$$ x_{cc} = x_{r_\angle}(x_p) = r_\angle(x_p)\, p_\angle(x_p), \qquad (A.11) $$

where $p_\angle : \mathbb{R}^2 \to S^2$ gives the projection ray normalized to unit length, cf. Eq. (A.10):

$$ p_\angle(x_p) = \left( \alpha_x^{-2}(x_p - c_x)^2 + \alpha_y^{-2}(y_p - c_y)^2 + 1 \right)^{-\frac{1}{2}} \begin{pmatrix} \alpha_x^{-1}(x_p - c_x) \\ \alpha_y^{-1}(y_p - c_y) \\ 1 \end{pmatrix}. \qquad (A.12) $$

It is worth noting that the projection rays only depend on the intrinsic camera parameters, and thus can be pre-calculated to speed up the computations for 3-D point cloud reconstruction from 2-D range measurements.

A.1.3 Range Image Data Representation

The measurements of an RI sensor with a resolution of $w \times h$ pixels can be interpreted as a 2-D range image where each pixel holds the associated orthogonal or radial distance to the observed scene point, or likewise as a set of 3-D points,

$$ \mathcal{X} = \{x_1, \ldots, x_{|\mathcal{X}|}\}, \quad x_i \in \mathbb{R}^3, \quad |\mathcal{X}| = w \cdot h, \qquad (A.13) $$

also termed a 3-D point cloud, using Eqs. (A.9), (A.11) and the intrinsic camera parameters. As stated before, modern RI sensors may also provide complementary photometric image information (either grayscale or color). Using camera calibration, both geometric and photometric data can be aligned, eventually providing textured point clouds. Point cloud triangulation for surface mesh generation with RI modalities typically exploits the bijection between the reconstructed 3-D point cloud and its underlying topological representation in a regular 2-D image.
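The inverse mapping is straightforward to vectorize. The following minimal sketch (our illustration, not the thesis implementation; the function name and pixel-grid convention are our own assumptions) back-projects a range image with NumPy:

```python
import numpy as np

def reconstruct_point_cloud(range_img, ax, ay, cx, cy, radial=False):
    """Back-project a h x w range image into a 3-D point cloud via
    Eqs. (A.9)-(A.12). ax, ay, cx, cy are the camera intrinsics in [px]."""
    h, w = range_img.shape
    xp, yp = np.meshgrid(np.arange(w), np.arange(h))
    # Unnormalized projection rays p_perp (Eq. A.10); these depend only on the
    # intrinsics and could be pre-computed once per sensor, as noted above.
    rays = np.stack([(xp - cx) / ax, (yp - cy) / ay, np.ones((h, w))], axis=-1)
    if radial:
        # Radial sensors (e.g. ToF): normalize rays to unit length (Eq. A.12).
        rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
    # Scale each ray by the measured distance (Eqs. A.9 / A.11).
    return rays * range_img[..., None]
```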
A.2 Joint Range Image Denoising and Surface Registration

A.2.1 Approximation of the Matching Energy

For a geometrically correct formulation of the matching energy $E_{\mathrm{match}}$, in theory, we would have to consider the surface integral:

$$ E_{\mathrm{match}}[u] := \int_{\mathcal{X}_r} |d_{\mathcal{G}}(\phi(x_r(\zeta)))|^2 \, \mathrm{d}A, \qquad (A.14) $$

where $\mathrm{d}A$ denotes a surface element. Based on the generalization of integration by substitution for integrating functions of several variables, and the corresponding change of variables formula [Bron 08], Eq. (A.14) can be re-written as:

$$ E_{\mathrm{match}}[u] = \int_\Omega |d_{\mathcal{G}}(\phi(x_r(\zeta)))|^2 \sqrt{\det\!\left( (Dx_r(\zeta))^\top Dx_r(\zeta) \right)} \, \mathrm{d}\zeta. \qquad (A.15) $$

The term $\det((Dx_r(\zeta))^\top Dx_r(\zeta))$ is known as the Gram determinant of $Dx_r$. Geometrically, the Gram determinant is the square of the area of the parallelogram formed by the vectors [Kuhn 06]:

$$ \det\!\left( (Dx_r(\zeta))^\top Dx_r(\zeta) \right) = \| \partial_{\zeta_1} x_r(\zeta) \times \partial_{\zeta_2} x_r(\zeta) \|_2^2, \qquad (A.16) $$

hence $\sqrt{\det((Dx_r(\zeta))^\top Dx_r(\zeta))}$ denotes the area. Instead of using this approach, which at first glance is geometrically appealing, we considered the approximative formulation (Eq. 5.6). The reason is twofold: First, if the range function $r$ and thus $x_r$ changes during the optimization process – as it does indeed with the proposed joint denoising and registration approach (Sect. 5.4) – the evaluation of the area term and its derivatives in the first variation of $E_{\mathrm{match}}$ would induce a substantial computational burden. Second, for the joint approach, the area term with $Dx_r(\zeta) = Dr(\zeta) \otimes p(\zeta) + r(\zeta)\, Dp(\zeta)$ (where $\otimes$ denotes the Kronecker product) involves first derivatives of $r$, which can be regarded as a further first-order prior for the range function. In practice, we observed a strong bias between this local weight for the quality of the matching and the actual matching term $|d_{\mathcal{G}}(\phi(x_r(\zeta)))|^2$, leading to less accurate matching results, in particular in regions of steep gradients in $r$ corresponding to edges or the boundary contour of $\mathcal{X}_r$.

A.2.2 Derivation of the First Variations

The first variation (or Gâteaux derivative) of a functional $E[u]$ around $u$ w.r.t. a test function $\psi$ is defined as:

$$ \langle \partial_u E[u], \psi \rangle = \lim_{\varepsilon \to 0} \frac{E(u + \varepsilon\psi) - E(u)}{\varepsilon} = \left. \frac{\mathrm{d}}{\mathrm{d}\varepsilon} E(u + \varepsilon\psi) \right|_{\varepsilon = 0}. \qquad (A.17) $$

Below, we derive the first variations of the individual energies of the joint range image denoising and surface registration approach proposed in Sect. 5.4.

First variation of $E_{\mathrm{fid}}[r]$, test function $\vartheta : \Omega \to \mathbb{R}$, with $E_{\mathrm{fid}}[r] = \int_\Omega |r - r_0|^2 \, \mathrm{d}\zeta$:

$$ \langle E'_{\mathrm{fid}}[r], \vartheta \rangle = \left. \frac{\mathrm{d}}{\mathrm{d}\varepsilon} \int_\Omega |r + \varepsilon\vartheta - r_0|^2 \, \mathrm{d}\zeta \right|_{\varepsilon=0} = \left. \int_\Omega 2(r + \varepsilon\vartheta - r_0)\,\vartheta \, \mathrm{d}\zeta \right|_{\varepsilon=0} = \int_\Omega 2(r - r_0)\,\vartheta \, \mathrm{d}\zeta. \qquad (A.18) $$

First variation of $E_{r,\mathrm{reg}}[r]$, test function $\vartheta : \Omega \to \mathbb{R}$, with $E_{r,\mathrm{reg}}[r] = \int_\Omega \|\nabla r\|_{\delta_{\mathrm{reg}}} \, \mathrm{d}\zeta$:

$$ \langle E'_{r,\mathrm{reg}}[r], \vartheta \rangle = \left. \frac{\mathrm{d}}{\mathrm{d}\varepsilon} \int_\Omega \|\nabla(r + \varepsilon\vartheta)\|_{\delta_{\mathrm{reg}}} \, \mathrm{d}\zeta \right|_{\varepsilon=0} = \left. \int_\Omega \frac{\nabla(r + \varepsilon\vartheta) \cdot \nabla\vartheta}{\|\nabla(r + \varepsilon\vartheta)\|_{\delta_{\mathrm{reg}}}} \, \mathrm{d}\zeta \right|_{\varepsilon=0} = \int_\Omega \frac{\nabla r \cdot \nabla\vartheta}{\|\nabla r\|_{\delta_{\mathrm{reg}}}} \, \mathrm{d}\zeta. \qquad (A.19) $$

First variation of $E_{\mathrm{match}}[u, r]$ w.r.t. $r$, test function $\vartheta : \Omega \to \mathbb{R}$, with $E_{\mathrm{match}}[u, r] = \int_\Omega |d_{\mathcal{G}}(x_r + u)|^2 \, \mathrm{d}\zeta$:

$$ \langle \partial_r E_{\mathrm{match}}[u, r], \vartheta \rangle = \left. \frac{\mathrm{d}}{\mathrm{d}\varepsilon} \int_\Omega |d_{\mathcal{G}}((r + \varepsilon\vartheta)p + u)|^2 \, \mathrm{d}\zeta \right|_{\varepsilon=0} = \left. \int_\Omega 2\, d_{\mathcal{G}}((r + \varepsilon\vartheta)p + u)\, \nabla d_{\mathcal{G}}((r + \varepsilon\vartheta)p + u) \cdot p\,\vartheta \, \mathrm{d}\zeta \right|_{\varepsilon=0} = \int_\Omega 2\, d_{\mathcal{G}}(rp + u)\, \nabla d_{\mathcal{G}}(rp + u) \cdot p\,\vartheta \, \mathrm{d}\zeta. \qquad (A.20) $$

First variation of $E_{\mathrm{match}}[u, r]$ w.r.t. $u$, test function $\varphi : \Omega \to \mathbb{R}^3$:

$$ \langle \partial_u E_{\mathrm{match}}[u, r], \varphi \rangle = \left. \frac{\mathrm{d}}{\mathrm{d}\varepsilon} \int_\Omega |d_{\mathcal{G}}(x_r + u + \varepsilon\varphi)|^2 \, \mathrm{d}\zeta \right|_{\varepsilon=0} = \left. \int_\Omega 2\, d_{\mathcal{G}}(x_r + u + \varepsilon\varphi)\, \nabla d_{\mathcal{G}}(x_r + u + \varepsilon\varphi) \cdot \varphi \, \mathrm{d}\zeta \right|_{\varepsilon=0} = \int_\Omega 2\, d_{\mathcal{G}}(x_r + u)\, \nabla d_{\mathcal{G}}(x_r + u) \cdot \varphi \, \mathrm{d}\zeta. \qquad (A.21) $$

First variation of $E_{u,\mathrm{reg}}[u]$, test function $\varphi : \Omega \to \mathbb{R}^3$, with $E_{u,\mathrm{reg}}[u] = \int_\Omega \|Du\|_2^2 \, \mathrm{d}\zeta$:

$$ \langle E'_{u,\mathrm{reg}}[u], \varphi \rangle = \left. \frac{\mathrm{d}}{\mathrm{d}\varepsilon} \int_\Omega \sum_{k=1}^{3} \|\nabla(u_k + \varepsilon\varphi_k)\|_2^2 \, \mathrm{d}\zeta \right|_{\varepsilon=0} = \left. \int_\Omega 2 \sum_{k=1}^{3} \nabla(u_k + \varepsilon\varphi_k) \cdot \nabla\varphi_k \, \mathrm{d}\zeta \right|_{\varepsilon=0} = \int_\Omega 2 \sum_{k=1}^{3} \nabla u_k \cdot \nabla\varphi_k \, \mathrm{d}\zeta = \int_\Omega 2\, Du : D\varphi \, \mathrm{d}\zeta. \qquad (A.22) $$
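Such analytic variations can be cross-checked numerically against the difference quotient in Eq. (A.17). The sketch below is a toy verification of Eq. (A.18), not thesis code; the uniform discretization of Ω and the random test data are our own assumptions:

```python
import numpy as np

# Numerical sanity check of Eq. (A.18): the directional derivative of E_fid
# agrees with the analytic first variation.
rng = np.random.default_rng(0)
n = 64                                   # discretize Omega by n samples, d(zeta) ~ 1/n
r, r0, theta = rng.normal(size=(3, n))   # range estimate, measurement, test function

E_fid = lambda r: np.sum((r - r0) ** 2) / n

eps = 1e-6
numeric = (E_fid(r + eps * theta) - E_fid(r)) / eps   # difference quotient (A.17)
analytic = np.sum(2 * (r - r0) * theta) / n           # first variation (A.18)
assert abs(numeric - analytic) < 1e-4
```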
A.3 Sparse-to-Dense Non-Rigid Surface Registration

A.3.1 Derivation of the First Variations

First variation of $E_{\mathrm{con}}[u, W]$ w.r.t. $u$, test function $\varphi : \Omega \to \mathbb{R}^3$, with

$$ E_{\mathrm{con}}[u, W] = \frac{1}{2n} \sum_{i=1}^{n} \left| P(y_i + w_i) + u(Q P(y_i + w_i)) - y_i \right|^2: $$

$$ \langle \partial_u E_{\mathrm{con}}[u, W], \varphi \rangle = \left. \frac{\mathrm{d}}{\mathrm{d}\varepsilon} \left( \frac{1}{2n} \sum_{i=1}^{n} \left| P(y_i + w_i) + u(Q P(y_i + w_i)) + \varepsilon\varphi(Q P(y_i + w_i)) - y_i \right|^2 \right) \right|_{\varepsilon=0} = \frac{1}{n} \sum_{i=1}^{n} \left( P(y_i + w_i) + u(Q P(y_i + w_i)) - y_i \right) \cdot \varphi(Q P(y_i + w_i)). \qquad (A.23) $$

First variation of $E_{\mathrm{reg}}[u]$, test function $\varphi : \Omega \to \mathbb{R}^3$, with $E_{\mathrm{reg}}[u] = \frac{1}{2} \int_\Omega |\Delta u|^2 \, \mathrm{d}\zeta$:

$$ \langle E'_{\mathrm{reg}}[u], \varphi \rangle = \left. \frac{\mathrm{d}}{\mathrm{d}\varepsilon}\, \frac{1}{2} \int_\Omega \sum_{k=1}^{3} |\Delta(u_k + \varepsilon\varphi_k)|^2 \, \mathrm{d}\zeta \right|_{\varepsilon=0} = \left. \int_\Omega \sum_{k=1}^{3} \Delta(u_k + \varepsilon\varphi_k)\, \Delta\varphi_k \, \mathrm{d}\zeta \right|_{\varepsilon=0} = \sum_{k=1}^{3} \int_\Omega \Delta u_k \, \Delta\varphi_k \, \mathrm{d}\zeta. \qquad (A.24) $$

A.3.2 Improved Projection Approximation

To evaluate the variation $\partial_{w_j} E_{\mathrm{con}}$ one has to compute:

$$ DP(x) = I - \nabla d_{\mathcal{G}}(x) \nabla^\top d_{\mathcal{G}}(x) - d_{\mathcal{G}}(x)\, D^2 d_{\mathcal{G}}(x), \qquad (A.25) $$

which involves the Hessian $D^2 d_{\mathcal{G}}(x)$ of the SDF $d_{\mathcal{G}}$. Although it is possible to numerically approximate the second derivatives of $d_{\mathcal{G}}$, e.g. similar to how $\Delta u$ in $E_{\mathrm{reg}}$ is handled, we propose a modification that completely avoids second derivatives of $d_{\mathcal{G}}$. Using this approach, we stay clear of the additional algorithmic complexity required to handle $D^2 d_{\mathcal{G}}$. Let us point out that this is possible since our objective functional $E$ itself does not involve $D^2 d_{\mathcal{G}}$; only the descent direction does. This is not true for $\Delta u$, which is the reason why we need to evaluate $\Delta u$ numerically.

In order to avoid the second derivatives of $d_{\mathcal{G}}$, we partially linearize $P$ by replacing the projection direction in $P$ by the already computed direction from the last update. Denoting by $W^{m-1} = \{w_1^{m-1}, \ldots, w_n^{m-1}\} \subset \mathbb{R}^3$ the estimate for $W$ in the $(m-1)$-th gradient descent step, in a first formulation [Baue 12a] we considered the following approximate projection in the $m$-th step:

$$ P(y_i + w_i) = y_i + w_i - d_{\mathcal{G}}(y_i + w_i)\, \nabla d_{\mathcal{G}}(y_i + w_i) \approx y_i + w_i - d_{\mathcal{G}}(y_i + w_i^{m-1})\, \nabla d_{\mathcal{G}}(y_i + w_i^{m-1}) =: P_i^m(y_i + w_i). \qquad (A.26) $$

The linear part of the projection is evaluated at the unknown new estimate $w_i$, while the nonlinear part of $P$ is evaluated at the old estimate $w_i^{m-1}$. Thus, $E_{\mathrm{con}}$ in the $m$-th step is replaced by:

$$ E_{\mathrm{con}}^m[u, W] = \frac{1}{2n} \sum_{i=1}^{n} \left| P_i^m(y_i + w_i) + u(Q P_i^m(y_i + w_i)) - y_i \right|^2. \qquad (A.27) $$

Since $DP_i^m$ is the identity matrix $I$, the variation of $E_{\mathrm{con}}^m$ is:

$$ \partial_{w_j} E_{\mathrm{con}}^m[u, W] = \frac{1}{n} \left( P_j^m(y_j + w_j) + u(Q P_j^m(y_j + w_j)) - y_j \right)^\top \left[ I + Du(Q P_j^m(y_j + w_j))\, Q \right]. \qquad (A.28) $$

In particular, it does not include second derivatives of $d_{\mathcal{G}}$. Since in each step of the gradient descent the $W$ estimate from the preceding step is used to approximate the projection, the approximation is automatically updated after each step, leading to a fixed-point iteration. Unfortunately, this linearization does not reflect the underlying geometry properly and hence, not surprisingly, requires substantially more iterations than the approach investigated here – using a more accurate approximation of the projection. Indeed, we modified Eq. (A.26) so that the scaling term of the nonlinear part is evaluated at the new estimate $w_i$:

$$ P_i^m(y_i + w_i) = y_i + w_i - d_{\mathcal{G}}(y_i + w_i)\, \nabla d_{\mathcal{G}}(y_i + w_i^{m-1}), \qquad DP_i^m(y_i + w_i) = I - \nabla d_{\mathcal{G}}(y_i + w_i^{m-1})\, \nabla^\top d_{\mathcal{G}}(y_i + w_i). \qquad (A.29) $$

The variation $\partial_{w_j} E_{\mathrm{con}}^m$ is then:

$$ \partial_{w_j} E_{\mathrm{con}}^m[u, W] = \frac{1}{n} \left( P_j^m(y_j + w_j) + u(Q P_j^m(y_j + w_j)) - y_j \right)^\top \left[ I - \nabla d_{\mathcal{G}}(y_j + w_j^{m-1})\, \nabla^\top d_{\mathcal{G}}(y_j + w_j) + Du(Q P_j^m(y_j + w_j))\, Q \left( I - \nabla d_{\mathcal{G}}(y_j + w_j^{m-1})\, \nabla^\top d_{\mathcal{G}}(y_j + w_j) \right) \right]. \qquad (A.30) $$

For a quantitative analysis of the impact of this modification on reconstruction accuracy and the convergence speed of the algorithm, we refer to Sect. 6.3.2.
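The following toy sketch (ours; a unit-sphere SDF stands in for the discretized $d_{\mathcal{G}}$ of the thesis, and the update omits $u$ and $Q$) illustrates the partially linearized projection of Eq. (A.29) and the resulting fixed-point behavior:

```python
import numpy as np

# Partially linearized projection, Eq. (A.29): the projection direction is
# frozen at the previous estimate w_prev, while the SDF scaling d_G is
# evaluated at the current point, so no second derivatives of d_G are needed.
sdf = lambda x: np.linalg.norm(x) - 1.0        # d_G of the unit sphere (toy stand-in)
grad_sdf = lambda x: x / np.linalg.norm(x)     # its (unit-length) gradient

def P_m(y, w, w_prev):
    """Approximate projection of y + w onto the zero level set (Eq. A.29)."""
    return y + w - sdf(y + w) * grad_sdf(y + w_prev)

# Fixed-point sweep: reusing each step's estimate as w_prev refreshes the
# frozen direction after every update, as described above.
y, w = np.array([1.5, 0.0, 0.0]), np.zeros(3)
for _ in range(3):
    w = P_m(y, w, w) - y        # toy update; the real descent also involves u, Q
print(np.linalg.norm(y + w))    # -> 1.0, i.e. the point lands on the sphere
```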
A.3.3 Detailed Results of the Prototype Study

In addition to the boxplots in Fig. 6.6, we present numbers of the initial and residual mismatch (95th percentile) for the individual subjects in Table A.1. Here, we also present separate results for abdominal and thoracic respiration, respectively. The low residual reconstruction errors indicate that the approach is capable of recovering both abdominal and thoracic surface motion fields to a comparable degree of accuracy.

Table A.1: Initial surface mismatch |dG| on Mp and residual surface mismatch |dMp| on φp(G) (95th percentile) per subject, for abdominal (Abd), thoracic (Thor), and combined (Abd+Thor) respiration. The last row denotes the mean 95th percentile over all subjects (bold in the original layout).

            Initial Mismatch [mm], 95th Perc.   Residual Mismatch [mm], 95th Perc.
Subject     Abd     Thor    Abd+Thor            Abd     Thor    Abd+Thor
S1          6.4     11.5    10.9                0.52    0.65    0.58
S2          14.0    18.7    18.3                0.77    1.13    0.96
S3          4.4     5.5     5.3                 0.58    0.55    0.56
S4          10.6    20.5    20.1                0.76    0.96    0.86
S5          7.9     10.3    9.8                 0.63    0.78    0.70
S6          6.5     4.8     6.1                 0.72    0.86    0.80
S7          7.2     9.5     9.3                 0.60    0.70    0.65
S8          2.8     5.4     5.3                 0.50    0.54    0.52
S9          16.4    15.7    16.2                0.78    0.77    0.77
S10         9.7     6.3     9.3                 0.83    0.66    0.74
S11         11.7    17.1    16.7                0.83    1.03    0.93
S12         6.3     5.5     6.2                 0.57    0.55    0.56
S13         8.9     10.6    10.5                0.60    0.85    0.73
S14         13.3    13.8    13.7                0.65    0.71    0.68
S15         18.2    14.2    17.5                0.93    1.17    1.04
S16         16.1    20.5    20.5                0.79    1.05    0.93
S1-S16      14.0    17.1    15.2                0.69    0.82    0.76

List of Symbols

Chapter 2
x ∈ R3 : Position in 3-D space
φtof : Phase shift in CW ToF imaging
r∠(·) : Radial range (distance)
c : Speed of light
fmod : Modulation frequency
Chapter 3
X : Set of points
|X| : Number of elements within a set X
Xm : Moving template point set
Xf : Fixed reference point set
xm ∈ R3 : Point in moving template point set
xf ∈ R3 : Point in fixed reference point set
R ∈ SO(3) : Rotation matrix
t = (tx, ty, tz)⊤ : Translation vector
Rg, tg : Global rigid transformation
Rpre, tpre : Pre-alignment rigid transformation
Ricp, ticp : ICP refinement rigid transformation
D : Feature descriptor dimensionality
d ∈ RD : Feature descriptor
D : Set of feature descriptors
M : Moving data
F : Fixed data
xc ∈ R3 : Corresponding point
d(·) : Distance metric
cm(·), cf(·) : Correspondence operator
Cinit : Initial set of correspondences
Ccross : Cross-validated set of correspondences
gc(·) : Geometric consistency metric
δc : Correspondence reliability threshold
C : Reliable set of correspondences
θ : Rotation angle
N : Neighborhood/support region (set of pixels or points)
H(·, ·, ·) : Histogram
NH : Number of histogram bins
n ∈ R3 : Normal vector
xcyl ∈ R2 : Position in cylindrical coordinates
dspin : Spin image descriptor
χ(·) : Characteristic function
f(·) : Scalar image function
∇f : Image gradient
γ(·) : Gradient orientation operator
P : Plane
qP(·) : Projection operator w.r.t. plane P
bref : MeshHOG second reference axis
dhog : MeshHOG descriptor
Nseg : MeshHOG number of circular segments
s : MeshHOG circular segment index
T : Tangent plane
a ∈ R3 : CUSS reference vector
Ncuss : CUSS sampling density
rcuss : CUSS sampling radius
Rθ : Rotation matrix for angle θ
Xcuss : Set of CUSS sampling points
xcuss : CUSS sampling point
mcuss : CUSS mesh intersection point
d⊥(·) : Signed orthogonal depth w.r.t. surface
frgt(·) : Radial gradient transform
e : Basis vector
Nriff : RIFF number of annuli
a : RIFF annulus index
Na : RIFF annulus neighborhood
driff : RIFF descriptor
θAP : Rotation angle around AP axis
tAP : Translation along AP axis
tML : Translation along ML axis
tSI : Translation along SI axis
(RGT, tGT) : Ground truth transformation
rN : Neighborhood radius
dTRE(·) : Target registration error

Chapter 4
(R0, t0) : Initial transformation
p = (pr, pg, pb)⊤ : Photometric RGB data
β : Geometric weight
(Rk, tk) : Transformation in k-th iteration
r : Representative
R : Set of representatives
δlg : Low-grade correspondence threshold
E ∈ R4×4 : Relative transformation error
TGT ∈ R4×4 : Ground truth transformation matrix
T ∈ R4×4 : Estimated transformation matrix
σ : Standard deviation

Chapter 5
G : Planning shape
Xr : RI point set/shape
Ω : Parameter domain
ζ ∈ R2 : Position on Ω
r(·) : Range (distance)
xr(·), p(·) : 2-D/3-D mapping functions
φ(·) : Non-rigid 3-D deformation
u(·) : 3-D displacement field
E : Energy functional
Ematch : Matching energy
Ereg : Regularization energy
κ : Non-negative weighting parameter
dA(·) : SDF w.r.t. shape A
P(·) : Projection-onto-shape operator
D : Jacobian
tr(·) : Trace
Efid : Fidelity energy
Er,reg : Range regularization energy
Eu,reg : Displacement regularization energy
λ, µ : Non-negative weighting parameters
r0(·) : Measured range (distance)
δreg : Pseudo Huber regularization parameter
‖·‖δreg : Pseudo Huber norm
ϑ(·) : Scalar test function
ϕ(·) : Vector-valued test function
M : Instantaneous patient shape
(rideal, Xrideal) : Ideal (noise-free) RI data
(r0, Xr0) : Measured/realistic RI data
(r0,ta, Xr0,ta) : Temporally averaged RI data
(r*, Xr*) : Denoised RI data estimate, joint scheme
(u*, φ*) : Displacement/deformation estimate, joint scheme
Er,reg,Q : Quadratic regularization energy
Er,reg,TV : TV regularization energy
φideal(·) : Ideal synthetic 3-D deformation
ν : Deformation scale parameter
(r̃*, Xr̃*) : Denoised RI data estimate, sequential scheme
(ũ*, φ̃*) : Displacement/deformation estimate, sequential scheme
p : Respiration phase index
Mp : Instantaneous patient shape at phase p
(rp,0,ta, Xrp,0,ta) : Temporally averaged RI data at phase p
φp(·) : Non-rigid 3-D deformation at phase p

Chapter 6
Y : Sparse set of MLT measurements
y ∈ R3 : MLT measurement point
ψ(·) : Non-rigid 3-D deformation, inverse to φ
Ψ(·) : Sparse non-rigid 3-D deformation
g(·) : Graph mapping function
Q(·) : Orthographic projection operator
w ∈ R3 : Displacement vector
W : Sparse set of displacements
Econ : Consistency energy
Yp : Sparse set of MLT measurements at phase p
τ : Convergence threshold
up(·) : 3-D displacement field at phase p

Chapter 7
frgb(·) : RGB image function
(rref, Xrref, frgb,ref) : RGB-D reference data
(rt, Xrt, frgb,t) : Instantaneous RGB-D data at time t
up, φp : Non-rigid photometry-driven 3-D deformation
ug, φg : Non-rigid geometry-driven 3-D deformation
ũp : Non-rigid photometry-driven 2-D deformation
ũg : Non-rigid geometry-driven 2-D deformation
e0 : Initial mismatch
ep : Residual mismatch, photometry-driven approach
eg : Residual mismatch, geometry-driven approach

Appendix
xw ∈ R3 : Position in world coordinates
xcc ∈ R3 : Position in camera coordinates
xp ∈ R2 : Position in picture coordinates
x̃ : Position x in homogeneous coordinates
K ∈ R3×3 : Camera calibration matrix
I : Identity matrix
αx, αy : Focal length
cx, cy : Principal point
r⊥(·) : Orthogonal range (distance)
xr⊥(·), p⊥(·) : Orthogonal 2-D/3-D mapping functions
xr∠(·), p∠(·) : Radial 2-D/3-D mapping functions
w × h : Sensor/image resolution (width × height)

List of Abbreviations

AAPM : American Association of Physicists in Medicine
AP : Anterior-Posterior
API : Application Programming Interface
BF : Brute Force
CCD : Charge-Coupled Device
CMOS : Complementary Metal-Oxide-Semiconductor
CPU : Central Processing Unit
CT : Computed Tomography
CUDA : Compute Unified Device Architecture
CUSS : Circular Uniform Surface Sampling
CW : Continuous-Wave (Modulation)
EM : Expectation Maximization
FE : Finite Elements
FOV : Field of View
GMM : Gaussian Mixture Model
GPU : Graphics Processing Unit
HOG : Histogram of Oriented Gradients
ICP : Iterative Closest Point (Algorithm)
IGLS : Image-Guided Liver Surgery
IGRT : Image-Guided Radiation Therapy
IR : Infrared
LED : Light-Emitting Diode
LINAC : Linear Accelerator
MIP : Minimally Invasive Procedures
ML : Medio-Lateral
MLT : Multi-Line Triangulation
MRI : Magnetic Resonance Imaging
NCAT : NURBS-based CArdiac-Torso phantom
NN : Nearest Neighbor
OC : Optical Colonoscopy
OR : Operating Room
PET : Positron Emission Tomography
RBC : Random Ball Cover
RGB-D : RGB + Depth
RGT : Radial Gradient Transform
RI : Range Imaging
RIFF : Rotation Invariant Fast Features
RITK : Range Imaging ToolKit
RPM : Robust Point Matching
RT : Radiation Therapy
SDF : Signed Distance Function
SfM : Structure-from-Motion
SI : Superior-Inferior
SIFT : Scale-Invariant Feature Transform
SL : Structured Light
SLAM : Simultaneous Localization and Mapping
SNR : Signal-to-Noise Ratio
SoC : System on a Chip
SPECT : Single-Photon Emission Computed Tomography
ToF : Time-of-Flight
TPS : Thin Plate Spline
TSDF : Truncated Signed Distance Function
TV : Total Variation
US : Ultrasound Imaging
VC : Virtual Colonoscopy

List of Figures

1.1 Thesis organization.
2.1 Measurement principle of different RI technologies.
2.2 MLT sensor measurement principle.
3.1 Proposed automatic initial patient setup.
3.2 Intra-operative navigation in IGLS with a marker-based system.
3.3 Proposed feature-based rigid surface registration framework.
3.4 Geometric consistency check.
3.5 Shape descriptors: Spin Images, MeshHOG, RIFF.
3.6 Materials for patient setup experiments.
3.7 Distribution of point correspondences for patient setup experiments.
3.8 Porcine liver surface data for IGLS experiments.
3.9 Distribution of point correspondences for IGLS experiments.
4.1 Benefit of incorporating photometric information into ICP alignment.
4.2 Proposed photo-geometric reconstruction framework.
4.3 RBC construction and two-tier NN query scheme.
4.4 Qualitative results for the reconstruction of indoor environments.
4.5 RBC runtime comparison for a single ICP iteration.
4.6 Registration error due to approximative RBC-based NN search.
4.7 Materials for laparoscopy reconstruction experiments.
4.8 Drift with 3-D vs. 6-D ICP registration in laparoscopy.
4.9 Drift with ICP on synthetic vs. noisy data in laparoscopy.
4.10 Qualitative results for laparoscopy reconstruction experiments.
4.11 Materials for colonoscopy reconstruction experiments.
4.12 Drift with 3-D vs. 6-D ICP registration in colonoscopy.
4.13 Drift with approximative vs. exact geometric ICP.
4.14 Influence of the photo-geometric weighting parameter.
4.15 Convergence behavior for 3-D vs. 6-D ICP registration.
4.16 Qualitative results for colonoscopy reconstruction experiments.
5.1 Workflow in RI-guided respiration-synchronized RT.
5.2 Geometric configuration for dense surface registration.
5.3 Experimental setup for model validation.
5.4 Experimental evaluation of denoising models.
5.5 Validation of the joint model on male phantom data.
5.6 Comparison of the proposed joint approach to a sequential scheme.
5.7 Joint denoising and surface registration results on NCAT data.
5.8 Joint denoising and surface registration results on real ToF/CT data.
6.1 Reconstruction of sparse displacement fields from MLT data.
6.2 Geometric configuration for sparse-to-dense surface registration.
6.3 Validation of sparse-to-dense model on NCAT phantom data.
6.4 Illustration of estimated NCAT surface motion fields.
6.5 Sparse-to-dense non-rigid surface registration on real MLT data.
6.6 Quantitative results of prototype study for realistic MLT data.
6.7 Study of algorithmic parameters and modifications.
6.8 Influence of MLT grid density on residual mismatch.
6.9 Quantitative evaluation of influence of MLT grid density.
7.1 Motivation for photometry-driven surface registration.
7.2 Geometric setup of geometry- vs. photometry-driven registration.
7.3 Comparison of geometry- vs. photometry-driven registration.
7.4 Investigation of landmark matching for deep inhalation study.
7.5 Comparison of estimated surface motion fields.

List of Tables

2.1 Specifications of RI sensors investigated in this thesis.
3.1 Patient setup errors with multi-modal surface registration.
3.2 Organ alignment errors with multi-modal surface registration.
4.1 Runtimes for RBC construction and ICP execution.
6.1 Quantitative results of prototype study for realistic MLT data.
A.1 Individual results of prototype study for realistic MLT data.

Bibliography

[Aige 08] D. Aiger, N. J. Mitra, and D. Cohen-Or. “4-Points Congruent Sets for Robust Pairwise Surface Registration”. ACM Transactions on Graphics, Vol. 27, No. 3, pp. 85:1–85:10, Aug 2008.
[Aken 02] T. Akenine-Möller and E. Haines. Real-Time Rendering. A K Peters, Ltd., Natick, MA, USA, 2nd Ed., 2002.
[Albr 12] T. Albrecht and T. Vetter. “Automatic Fracture Reduction”. In: MICCAI Workshop on Mesh Processing in Medical Image Analysis, pp. 22–29, Springer, Oct 2012.
[Alle 03] B. Allen, B. Curless, and Z. Popovic. “The Space of Human Body Shapes: Reconstruction and Parameterization from Range Scans”. ACM Transactions on Graphics, Vol. 22, No. 3, pp. 587–594, Jul 2003.
[Alno 10] M. R. Alnowami, E. Lewis, M. Guy, and K. Wells. “An Observation Model for Motion Correction in Nuclear Medicine”. In: SPIE Medical Imaging, pp. 76232F–9, Feb 2010.
[Alva 99] L. Alvarez, J. Weickert, and J. Sánchez. “A Scale-Space Approach to Nonlocal Optical Flow Calculations”. In: M. Nielsen, P. Johansen, O. F. Olsen, and J. Weickert, Eds., International Conference on Scale-Space Theories in Computer Vision, pp. 235–246, Springer, Sep 1999.
[Ambe 07] B. Amberg, S. Romdhani, and T. Vetter. “Optimal Step Nonrigid ICP Algorithms for Surface Registration”. In: International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–8, IEEE, Jun 2007.
[Amid 02] I. Amidror. “Scattered Data Interpolation Methods for Electronic Imaging Systems: a Survey”. SPIE Journal of Electronic Imaging, Vol. 11, No. 2, pp. 157–176, 2002.
[Ande 12] M. Andersen, T. Jensen, P. Lisouski, A. Mortensen, M. Hansen, T. Gregersen, and P. Ahrendt. “Kinect Depth Sensor Evaluation for Computer Vision Applications”. Tech. Rep. ECE-TR-6, Aarhus University, Feb 2012.
[Anti 02] L. Antiga. Patient-Specific Modeling of Geometry and Blood Flow in Large Arteries. PhD thesis, Politecnico di Milano, 2002.
[Armi 66] L. Armijo. “Minimization of Functions having Lipschitz Continuous First Partial Derivatives”. Pacific Journal of Mathematics, Vol. 16, No. 1, pp. 1–3, 1966.
[Aude 00] M. A. Audette, F. P. Ferrie, and T. M. Peters. “An Algorithmic Overview of Surface Registration Techniques for Medical Imaging”. Medical Image Analysis, Vol. 4, No. 3, pp. 201–217, 2000.
[Auri 95] V. Aurich and J. Weule. “Non-Linear Gaussian Filters Performing Edge Preserving Diffusion”. In: German Association for Pattern Recognition (DAGM) Symposium, pp. 538–545, Springer, Sep 1995.
[Bala 09] R. Balachandran and J. M. Fitzpatrick. “Iterative Solution for Rigid-body Point-based Registration with Anisotropic Weighting”. In: SPIE Medical Imaging, p. 72613D, Feb 2009.
[Bar 07] L. Bar, B. Berkels, M. Rumpf, and G. Sapiro. “A Variational Framework for Simultaneous Motion Estimation and Restoration of Motion-Blurred Video”. In: International Conference on Computer Vision (ICCV), pp. 1–8, IEEE, Oct 2007.
[Baue 11a] S. Bauer, J. Wasza, S. Haase, N. Marosi, and J. Hornegger. “Multi-Modal Surface Registration for Markerless Initial Patient Setup in Radiation Therapy using Microsoft’s Kinect Sensor”. In: ICCV Workshop on Consumer Depth Cameras for Computer Vision (CDC4CV), pp. 1175–1181, IEEE, Nov 2011.
[Baue 11b] S. Bauer, J. Wasza, K. Müller, and J. Hornegger. “4D Photogeometric Face Recognition with Time-of-Flight Sensors”. In: Workshop on Applications of Computer Vision (WACV), pp. 196–203, IEEE, Jan 2011.
[Baue 12a] S. Bauer, B. Berkels, S. Ettl, O. Arold, J. Hornegger, and M. Rumpf. “Marker-less Reconstruction of Dense 4-D Surface Motion Fields using Active Laser Triangulation for Respiratory Motion Management”. In: N. Ayache, H. Delingette, P. Golland, and K. Mori, Eds., International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), pp. 414–421, LNCS 7510, Part I, Springer, Oct 2012.
[Baue 12b] S. Bauer, B. Berkels, J. Hornegger, and M. Rumpf. “Joint ToF Image Denoising and Registration with a CT Surface in Radiation Therapy”. In: A. Bruckstein, B. ter Haar Romeny, A. Bronstein, and M. Bronstein, Eds., International Conference on Scale Space and Variational Methods in Computer Vision (SSVM), pp. 98–109, Springer, May 2012.
[Baue 12c] S. Bauer, S. Ettl, J. Wasza, F. Willomitzer, F. Huber, J. Hornegger, and G. Häusler. “Sparse Active Triangulation Grids for Respiratory Motion Management”. In: German Branch of the European Optical Society (DGaO) Annual Meeting, p. P23, May 2012.
[Baue 12d] S. Bauer, J. Wasza, and J. Hornegger. “Photometric Estimation of 3D Surface Motion Fields for Respiration Management”. In: T. Tolxdorff, T. M. Deserno, H. Handels, and H.-P. Meinzer, Eds., Bildverarbeitung für die Medizin (BVM), Informatik aktuell, pp. 105–110, Springer, Mar 2012.
[Baue 13a] S. Bauer, A. Seitel, H. Hofmann, T. Blum, J. Wasza, M. Balda, H.-P. Meinzer, N. Navab, J. Hornegger, and L. Maier-Hein. Time-of-Flight and Depth Imaging. Sensors, Algorithms, and Applications, Chap. Real-Time Range Imaging in Health Care: A Survey, pp. 228–254. LNCS 8200, Springer, 2013.
[Baue 13b] S. Bauer, J. Wasza, F. Lugauer, D. Neumann, and J. Hornegger. Consumer Depth Cameras for Computer Vision: Research Topics and Applications, Chap. Real-time RGB-D Mapping and 3-D Modeling on the GPU using the Random Ball Cover, pp. 27–48. Advances in Computer Vision and Pattern Recognition, Springer, 2013.
[Bell 07] S. Beller, M. Hünerbein, T. Lange, S. Eulenstein, B. Gebauer, and P. M. Schlag. “Image-guided Surgery of Liver Metastases by Three-dimensional Ultrasound-based Optoelectronic Navigation”. British Journal of Surgery, Vol. 94, No. 7, pp. 866–875, John Wiley & Sons, Inc., Jul 2007.
[Belo 00] S. Belongie, J. Malik, and J. Puzicha. “Shape Context: A New Descriptor for Shape Matching and Object Recognition”. In: International Conference on Neural Information Processing Systems (NIPS), pp. 831–837, Nov 2000.
[Benj 99] R. Benjemaa and F. Schmitt. “Fast Global Registration of 3D Sampled Surfaces using a Multi-z-buffer Technique”. Image and Vision Computing, Vol. 17, No. 2, pp. 113–123, 1999.
[Berk 06] B. Berkels, M. Burger, M. Droske, O. Nemitz, and M. Rumpf. “Cartoon Extraction Based on Anisotropic Image Classification”. In: International Workshop on Vision, Modeling and Visualization (VMV), pp. 293–300, Eurographics Association, Nov 2006.
[Berk 10] B. Berkels. Joint Methods in Imaging Based on Diffuse Image Representations. PhD thesis, Rheinische Friedrich-Wilhelms-Universität Bonn, Feb 2010.
[Berk 13] B. Berkels, S. Bauer, S. Ettl, O. Arold, J. Hornegger, and M. Rumpf. “Joint Surface Reconstruction and 4-D Deformation Estimation from Sparse Data and Prior Knowledge for Marker-Less Respiratory Motion Tracking”. Medical Physics, Vol. 40, No. 9, pp. 091703 1–10, Sep 2013.
[Bert 05] C. Bert, K. G. Metheany, K. Doppke, and G. T. Y. Chen. “A Phantom Evaluation of a Stereo-vision Surface Imaging System for Radiotherapy Patient Setup”. Medical Physics, Vol. 32, No. 9, pp. 2753–2762, Sep 2005.
[Bert 06] C. Bert, K. G. Metheany, K. P. Doppke, A. G. Taghian, S. N. Powell, and G. T. Y. Chen. “Clinical Experience with a 3D Surface Patient Setup System for Alignment of Partial-breast Irradiation Patients”. International Journal of Radiation Oncology Biology Physics, Vol. 64, No. 4, pp. 1265–1274, Mar 2006.
[Besl 92] P. J. Besl and N. D. McKay. “A Method for Registration of 3-D Shapes”. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 14, No. 2, pp. 239–256, 1992.
[Bett 13] V. Bettinardi, E. D. Bernardi, L. Presotto, and M. Gilardi. “Motion-Tracking Hardware and Advanced Applications in PET and PET/CT”. PET Clinics, Vol. 8, No. 1, pp. 11–28, 2013.
[Bigd 12a] A. Bigdelou, A. Benz, L. Schwarz, and N. Navab. “Customizable Gesturing Interface for the Operating Room using Kinect”. In: CVPR Workshop on Gesture Recognition, Jun 2012.
[Bigd 12b] A. Bigdelou, R. Stauder, T. Benz, A. Okur, T. Blum, R. Ghotbi, and N. Navab. “HCI Design in the OR: A Gesturing Case-study”. In: MICCAI Workshop on Modeling and Monitoring of Computer Assisted Interventions, Oct 2012.
[Blai 04] F. Blais. “Review of 20 Years of Range Sensor Development”. Journal of Electronic Imaging, Vol. 13, No. 1, pp. 231–243, 2004.
[Blai 95] G. Blais and M. D. Levine. “Registering Multiview Range Data to Create 3-D Computer Objects”. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 17, No. 8, pp. 820–824, Aug 1995.
[Blum 12] T. Blum, V. Kleeberger, C. Bichlmeier, and N. Navab. “mirracle: An Augmented Reality Magic Mirror System for Anatomy Education”. In: Virtual Reality (VR), pp. 115–116, IEEE, Mar 2012.
[Bona 11] F. Bonarrigo, A. Signoroni, and R. Leonardi. “A Robust Pipeline for Rapid Feature-based Pre-alignment of Dense Range Scans”. In: International Conference on Computer Vision (ICCV), pp. 2260–2267, IEEE, Nov 2011.
[Book 89] F. L. Bookstein. “Principal Warps: Thin-Plate Splines and the Decomposition of Deformations”. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 11, No. 6, pp. 567–585, 1989.
[Boye 11] E. Boyer, A. M. Bronstein, M. M. Bronstein, B. Bustos, T. Darom, R. Horaud, I. Hotz, Y. Keller, J. Keustermans, A. Kovnatsky, R. Litman, J. Reininghaus, I. Sipiran, D. Smeets, P. Suetens, D. Vandermeulen, A. Zaharescu, and V. Zobel. “SHREC 2011: Robust Feature Detection and Description Benchmark”. CoRR, Vol. abs/1102.4258, Feb 2011.
[Brad 00] G. Bradski. “The OpenCV Library”. Dr. Dobb’s Journal of Software Tools, 2000.
[Brah 08] A. Brahme, P. Nyman, and B. Skatt. “4D Laser Camera for Accurate Patient Positioning, Collision Avoidance, Image Fusion and Adaptive Approaches during Diagnostic and Therapeutic Procedures”. Medical Physics, Vol. 35, No. 5, pp. 1670–1681, 2008.
[Bran 06] E. D. Brandner, A. Wu, H. Chen, D. Heron, S. Kalnicki, K. Komanduri, K. Gerszten, S. Burton, I. Ahmed, and Z. Shou. “Abdominal Organ Motion Measured using 4D CT”. International Journal of Radiation Oncology Biology Physics, Vol. 65, No. 2, pp. 554–560, 2006.
[Bron 08] I. Bronstein and K. Semendjajew. Taschenbuch der Mathematik. Harri Deutsch, 8th Ed., 2008.
[Bron 10] A. Bronstein, M. Bronstein, B. Bustos, U. Castellani, M. Crisani, B. Falcidieno, L. Guibas, I. Kokkinos, V. Murino, I. Sipiran, M. Ovsjanikov, G. Patane, M. Spagnuolo, and J. Sun. “SHREC 2010: Robust Feature Detection and Description Benchmark”. In: Workshop on 3D Object Retrieval (3DOR), pp. 79–86, Eurographics Association, May 2010.
[Bron 11] A. M. Bronstein, M. M. Bronstein, L. J. Guibas, and M. Ovsjanikov. “Shape Google: Geometric Words and Expressions for Invariant Shape Retrieval”. ACM Transactions on Graphics, Vol. 30, No. 1, pp. 1:1–1:20, 2011.
[Brow 07] B. J. Brown and S. Rusinkiewicz. “Global Non-rigid Alignment of 3-D Scans”. ACM Transactions on Graphics, Vol. 26, No. 3, pp. 21:1–21:9, Jul 2007.
[Bruh 05] A. Bruhn, J. Weickert, and C. Schnörr. “Lucas/Kanade meets Horn/Schunck: Combining Local and Global Optic Flow Methods”. International Journal of Computer Vision, Vol. 61, No. 3, pp. 211–231, 2005.
[Bruh 12] A. Bruhn, T. Pock, and X.-C. Tai. “Efficient Algorithms for Global Optimisation Methods in Computer Vision”. Dagstuhl Reports, Vol. 1, No. 11, pp. 66–90, 2012.
[Bruy 05] P. Bruyant, M. A. Gennert, G. Speckert, R. Beach, J. Morgenstern, N. Kumar, S. Nadella, and M. King. “A Robust Visual Tracking System for Patient Motion Detection in SPECT: Hardware Solutions”. IEEE Transactions on Nuclear Science, Vol. 52, No. 5, pp. 1288–1294, 2005.
[Buad 09] T. Buades, Y. Lou, J. Morel, and Z. Tang. “A Note on Multi-image Denoising”. In: International Workshop on Local and Non-Local Approximation in Image Processing (LNLA), pp. 1–15, Aug 2009.
[Bust 05] B. Bustos, D. A. Keim, D. Saupe, T. Schreck, and D. V. Vranić. “Feature-based Similarity Search in 3D Object Databases”. ACM Computing Surveys, Vol. 37, pp. 345–387, 2005.
[Cach 00] P. Cachier and D. Rey. “Symmetrization of the Non-rigid Registration Problem Using Inversion-Invariant Energies: Application to Multiple Sclerosis”. In: S. Delp, A. DiGoia, and B. Jaramaz, Eds., International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), pp. 697–708, LNCS 1935, Springer, Oct 2000.
[Carc 02] R. L. Carceroni and K. N. Kutulakos. “Multi-View Scene Capture by Surfel Sampling: From Video Streams to Non-Rigid 3D Motion, Shape and Reflectance”. International Journal of Computer Vision, Vol. 49, No. 2-3, pp. 175–214, Sep 2002.
[Cash 07] D. Cash, M. Miga, S. Glasgow, B. Dawant, L. Clements, Z. Cao, R. Galloway, and W. Chapman. “Concepts and Preliminary Data Toward the Realization of Image-guided Liver Surgery”. Journal of Gastrointestinal Surgery, Vol. 11, No. 7, pp. 844–859, Jul 2007.
[Catu 12] D. Catuhe. Programming with the Kinect for Windows Software Development Kit: Add Gesture and Posture Recognition to Your Applications. Microsoft Press Series, Microsoft Press, 2012.
In: International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures (ADMS), pp. 1– 10, Sep 2010. 162 Bibliography [Cayt 11] L. Cayton. “Accelerating Nearest Neighbor Search on Manycore Systems”. In: International Parallel and Distributed Processing Symposium (IPDPS), pp. 402–413, IEEE, May 2011. [Chan 11] Y.-J. Chang, S.-F. Chen, and J.-D. Huang. “A Kinect-based System for Physical Rehabilitation: A Pilot Study for Young Adults with Motor Disabilities”. Research in Developmental Disabilities, Vol. 32, No. 6, pp. 2566–2570, 2011. [Chan 12] C.-Y. Chang, B. Lange, M. Zhang, S. Koenig, P. Requejo, N. Somboon, A. A. Sawchuk, and A. A. Rizzo. “Towards Pervasive Physical Rehabilitation Using Microsoft Kinect”. In: International Conference on Pervasive Computing Technologies for Healthcare (PervasiveHealth), pp. 159– 162, IEEE, May 2012. [Chen 09] C.-I. Chen. Automated Model Building from Video in Computer-aided Diagnosis in Colonoscopy. PhD thesis, University of California, Santa Barbara, 2009. [Chen 10] C.-I. Chen, D. Sargent, and Y.-F. Wang. “Modeling Tumor/Polyp/ Lesion Structure in 3D for Computer-aided Diagnosis in Colonoscopy”. In: SPIE Medical Imaging, pp. 76252F–8, Mar 2010. [Chen 92] Y. Chen and G. Medioni. “Object Modelling by Registration of Multiple Range Images”. Image and Vision Computing, Vol. 10, No. 3, pp. 145–155, Apr 1992. [Chri 01] G. Christensen and H. Johnson. “Consistent Image Registration”. IEEE Transactions on Medical Imaging, Vol. 20, No. 7, pp. 568–582, 2001. [Chua 97] C. S. Chua and R. Jarvis. “Point Signatures: A New Representation for 3D Object Recognition”. International Journal of Computer Vision, Vol. 25, pp. 63–85, 1997. [Chui 03] H. Chui and A. Rangarajan. “A New Point Matching Algorithm for Non-rigid Registration”. Computer Vision and Image Understanding, Vol. 89, No. 2-3, pp. 114–141, 2003. [Clan 11] N. T. Clancy, D. Stoyanov, L. Maier-Hein, A. Groch, G.-Z. Yang, and D. S. Elson. “Spectrally Encoded Fiber-based Structured Lighting Probe for Intraoperative 3D Imaging”. Biomedical Optics Express, Vol. 2, No. 11, pp. 3119–3128, Nov 2011. [Clem 08] L. W. Clements, W. C. Chapman, B. M. Dawant, R. L. Galloway, Jr, and M. I. Miga. “Robust Surface Registration using Salient Anatomical Features for Image-guided Liver Surgery: Algorithm and Validation”. Medical Physics, Vol. 35, No. 6, pp. 2528–2540, Jun 2008. [Cola 12] A. Colaco, A. Kirmani, G. A. Howland, J. C. Howell, and V. K. Goyal. “Compressive Depth Map Acquisition using a Single Photoncounting Detector: Parametric Signal Processing meets Sparsity”. In: International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 96–102, Jun 2012. Bibliography [Coll 12] 163 T. Collins and A. Bartoli. “3D Reconstruction in Laparoscopy with Close-Range Photometric Stereo”. In: N. Ayache, H. Delingette, P. Golland, and K. Mori, Eds., International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), pp. 634–642, LNCS 7511, Springer, Oct 2012. [Comb 10] B. Combès and S. Prima. “An Efficient EM-ICP Algorithm for Symmetric Consistent Non-linear Registration of Point Sets”. In: International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), pp. 594–601, LNCS 6362, Part II, Springer, Sep 2010. [Coro 12] A. Coronato and L. Gallo. “Towards Abnormal Behavior Detection of Cognitive Impaired People”. In: International Workshop on Sensor Networks and Ambient Intelligence, pp. 859–864, IEEE, Mar 2012. [Curl 96] B. 
Curless and M. Levoy. “A Volumetric Method for Building Complex Models from Range Images”. In: SIGGRAPH Conference on Computer Graphics and Interactive Techniques, pp. 303–312, ACM, 1996. [Dala 05] N. Dalal and B. Triggs. “Histograms of Oriented Gradients for Human Detection”. In: International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 886–893, IEEE, Jun 2005. [Daum 11] V. Daum. Model-Constrained Non-Rigid Registration in Medicine. PhD thesis, Friedrich-Alexander-Universität Erlangen-Nürnberg, 2011. [Demp 77] A. P. Dempster, N. M. Laird, and D. B. Rubin. “Maximum Likelihood from Incomplete Data via the EM Algorithm”. Journal of the Royal Statistical Society. Series B (Methodological), Vol. 39, No. 1, pp. 1–38, 1977. [Depu 11] T. Depuydt, D. Verellen, O. Haas, T. Gevaert, N. Linthout, M. Duchateau, K. Tournel, T. Reynders, K. Leysen, M. Hoogeman, G. Storme, and M. D. Ridder. “Geometric Accuracy of a Novel Gimbals based Radiation Therapy Tumor Tracking System”. Radiotherapy and Oncology, Vol. 98, No. 3, pp. 365–372, 2011. [Diet 11] S. Dieterich, C. Cavedon, C. F. Chuang, A. B. Cohen, J. A. Garrett, C. L. Lee, J. R. Lowenstein, M. F. d’Souza, D. D. Taylor, X. Wu, and C. Yu. “Report of AAPM TG 135: Quality Assurance for Robotic Radiosurgery”. Medical Physics, Vol. 38, No. 6, pp. 2914–2936, Jun 2011. [Dora 97] C. Dorai, J. Weng, and A. K. Jain. “Optimal Registration of Object Views Using Range Data”. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 10, pp. 1131–1138, 1997. [Dorr 11] A. A. Dorrington, J. P. Godbaz, M. J. Cree, A. D. Payne, and L. V. Streeter. “Separating True Range Measurements from Multi-path and Scattering Interference in Commercial Range Cameras”. In: SPIE Electronic Imaging, pp. 786404–10, 2011. [Dros 07] M. Droske and M. Rumpf. “Multiscale Joint Segmentation and Registration of Image Morphology”. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 29, No. 12, pp. 2181–2194, Dec 2007. 164 Bibliography [Druo 06] S. Druon, M. Aldon, and A. Crosnier. “Color Constrained ICP for Registration of Large Unstructured 3D Color Data Sets”. In: International Conference on Information Acquisition, pp. 249–255, IEEE, Aug 2006. [Enge 11] N. Engelhard, F. Endres, J. Hess, J. Sturm, and W. Burgard. “Realtime 3D Visual SLAM with a Hand-held RGB-D Camera”. In: RGB-D Workshop on 3D Perception in Robotics, European Robotics Forum, Apr 2011. [Eom 09] J. Eom, C. Shi, X. G. Xu, and S. De. “Modeling Respiratory Motion for Cancer Radiation Therapy Based on Patient-Specific 4DCT Data”. In: International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), pp. 348–355, LNCS 5762, Part II, Springer, Sep 2009. [Eom 10] J. Eom, X. G. Xu, S. De, and C. Shi. “Predictive Modeling of Lung Motion over the entire Respiratory Cycle using Measured Pressurevolume Data, 4DCT Images, and Finite-Element Analysis”. Medical Physics, Vol. 37, No. 8, pp. 4389–4400, Aug 2010. [Erns 12] F. Ernst, R. Bruder, A. Schlaefer, and A. Schweikard. “Correlation between External and Internal Respiratory Motion: a Validation Study”. International Journal of Computer Assisted Radiology and Surgery, Vol. 7, pp. 483–492, 2012. [Essa 02] S. Essapen, C. Knowles, A. Norman, and D. Tait. “Accuracy of Set-up of Thoracic Radiotherapy: Prospective Analysis of 24 Patients Treated with Radiotherapy for Lung Cancer”. British Journal of Radiology, Vol. 75, No. 890, pp. 162–169, Feb 2002. [Ettl 12a] S. Ettl, S. Fouladi-Movahed, S. Bauer, O. Arold, F. 
Willomitzer, F. Huber, S. Rampp, H. Stefan, J. Hornegger, and G. Häusler. “Medical Applications Enabled by a Motion-robust Optical 3D Sensor”. In: German Branch of the European Optical Society (DGaO) Annual Meeting, p. P22, May 2012. [Ettl 12b] S. Ettl, O. Arold, Z. Yang, and G. Häusler. “Flying Triangulation – An Optical 3D Sensor for the Motion-robust Acquisition of Complex Objects”. Applied Optics, Vol. 51, No. 2, pp. 281–289, 2012. [Fali 08] D. Falie, M. Ichim, and L. David. “Respiratory Motion Visualization and the Sleep Apnea Diagnosis with the Time of Flight (ToF) Camera”. In: International Conference on Visualization, Imaging and Simulation (VIS), pp. 179–184, WSEAS, Nov 2008. [Faug 04] O. Faugeras, Q. Luong, and T. Papadopoulo. The Geometry of Multiple Images: The Laws That Govern the Formation of Multiple Images of a Scene and Some of Their Applications. MIT Press, 2004. [Faya 09] H. Fayad, T. Pan, C. Roux, C. Le Rest, O. Pradier, J. Clement, and D. Visvikis. “A Patient Specific Respiratory Model based on 4D CT Data and a Time of Flight Camera (TOF)”. In: Nuclear Science Symposium and Medical Imaging Conference (NSS MIC), pp. 2594–2598, IEEE, Oct 2009. Bibliography 165 [Faya 11] H. Fayad, T. Pan, J. F. Clement, and D. Visvikis. “Correlation of Respiratory Motion between External Patient Surface and Internal Anatomical Landmarks”. Medical Physics, Vol. 38, No. 6, pp. 3157–3164, 2011. [Fero 04] O. Féron and A. Mohammad-Djafari. “Image Fusion and Unsupervised Joint Segmentation using a HMM and MCMC Algorithms”. Journal of Electronic Imaging, Vol. 15, No. 02, p. 023014, May 2004. [Fiel 88] D. A. Field. “Laplacian Smoothing and Delaunay Triangulations”. Communications in Applied Numerical Methods, Vol. 4, No. 6, pp. 709– 712, 1988. [Fisc 02] B. Fischer and J. Modersitzki. Inverse Problems, Image Analysis, and Medical Imaging: AMS Special Session on Interaction of Inverse Problems and Image Analysis, Chap. Fast Diffusion Registration, pp. 117–127. Contemporary Mathematics - American Mathematical Society, AMS, 2002. [Fitz 03] A. W. Fitzgibbon. “Robust Registration of 2D and 3D Point Sets”. Image and Vision Computing, Vol. 21, No. 13-14, pp. 1145–1153, 2003. [Fleu 02] M. Fleute, S. Lavallée, and L. Desbat. “Integrated Approach for Matching Statistical Shape Models with Intra-operative 2D and 3D Data”. In: International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), pp. 364–372, LNCS 2489, Part II, Springer, Sep 2002. [Fleu 99] M. Fleute, S. Lavallée, and R. Julliard. “Incorporating a Statistically based Shape Model into a System for Computer-Assisted Anterior Cruciate Ligament Surgery”. Medical Image Analysis, Vol. 3, No. 3, pp. 209–222, Sep 1999. [Fluc 11] O. Fluck, C. Vetter, W. Wein, A. Kamen, B. Preim, and R. Westermann. “A Survey of Medical Image Registration on Graphics Hardware”. Computer Methods and Programs in Biomedicine, Vol. 104, No. 3, pp. 45– 57, 2011. [Foix 11] S. Foix, G. Alenya, and C. Torras. “Lock-in Time-of-Flight (ToF) Cameras: A Survey”. IEEE Sensors Journal, Vol. 11, No. 9, pp. 1917–1926, Sep 2011. [Ford 02] E. C. Ford, G. S. Mageras, E. Yorke, K. E. Rosenzweig, R. Wagman, and C. C. Ling. “Evaluation of Respiratory Movement during Gated Radiotherapy using Film and Electronic Portal Imaging”. International Journal of Radiation Oncology Biology Physics, Vol. 52, No. 2, pp. 522– 531, Feb 2002. [Foth 12] S. Fothergill, H. Mentis, P. Kohli, and S. Nowozin. “Instructing People for Training Gestural Interactive Systems”. 
In: SIGCHI Conference on Human Factors in Computing Systems, pp. 1737–1746, ACM, 2012. [Fran 02] A. Frangi, D. Rueckert, J. Schnabel, and W. Niessen. “Automatic Construction of Multiple-object Three-dimensional Statistical Shape Models: Application to Cardiac Modeling”. IEEE Transactions on Medical Imaging, Vol. 21, No. 9, pp. 1151–1166, Sep 2002. 166 Bibliography [Fran 09a] M. Frank, M. Plaue, and F. A. Hamprecht. “Denoising of ContinuousWave Time-Of-Flight Depth Images using Confidence Measures”. Optical Engineering, Vol. 48, No. 7, p. 077003, Jul 2009. [Fran 09b] M. Frank, M. Plaue, H. Rapp, U. Köthe, B. Jähne, and F. A. Hamprecht. “Theoretical and Experimental Error Analysis of Continuous-Wave Time-of-Flight Range Cameras”. Optical Engineering, Vol. 48, No. 1, p. 013602, 2009. [Free 10a] B. Freedman, A. Shpunt, and Y. Arieli. “Distance-Varying Illumination and Imaging Techniques for Depth Mapping”. Patent, Nov 2010. US20100290698. [Free 10b] B. Freedman, A. Shpunt, M. Machline, and Y. Arieli. “Depth Mapping using Projected Patterns”. Patent, May 2010. US20100118123. [Fren 09] T. Frenzel. “Patient Setup using a 3D Laser Surface Scanning System”. In: World Congress on Medical Physics and Biomedical Engineering, pp. 217–220, IFMBE, Springer, Sep 2009. [From 04] A. Frome, D. Huber, R. Kolluri, T. Bülow, and J. Malik. “Recognizing Objects in Range Data Using Regional Point Descriptors”. In: European Conference on Computer Vision (ECCV), pp. 224–237, Springer, May 2004. [Fuch 08a] S. Fuchs and G. Hirzinger. “Extrinsic and Depth Calibration of ToFCameras”. In: International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–6, IEEE, Jun 2008. [Fuch 08b] S. Fuchs and S. May. “Calibration and Registration for Precise Surface Reconstruction with Time-of-Flight Cameras”. International Journal of Intelligent Systems Technologies and Applications, Vol. 5, No. 3/4, pp. 274–284, Nov 2008. [Fuch 10] S. Fuchs. “Multipath Interference Compensation in Time-of-Flight Camera Images”. In: International Conference on Pattern Recognition (ICPR), pp. 3583–3586, Aug 2010. [Funk 06] T. Funkhouser and P. Shilane. “Partial Matching of 3D Shapes with Priority-driven Search”. In: Symposium on Geometry Processing, pp. 131–142, Eurographics Association, 2006. [Gal 06] R. Gal and D. Cohen-Or. “Salient Geometric Features for Partial Shape Matching and Similarity”. ACM Transactions on Graphics, Vol. 25, pp. 130–150, Jan 2006. [Gall 10] S. Gallo, D. Chapuis, L. Santos-Carreras, Y. Kim, P. Retornaz, H. Bleuler, and R. Gassert. “Augmented White Cane with Multimodal Haptic Feedback”. In: International Conference on Biomedical Robotics and Biomechatronics (BioRob), pp. 149–155, IEEE, RAS, EMBS, Sep 2010. [Gall 11] L. Gallo, A. P. Placitelli, and M. Ciampi. “Controller-Free Exploration of Medical Image Data: Experiencing the Kinect”. In: International Symposium on Computer-Based Medical Systems (CBMS), pp. 1–6, IEEE, 2011. Bibliography 167 [Gama 12] A. da Gama, T. Chaves, L. Figueiredo, and V. Teichrieb. “Improving Motor Rehabilitation Process through a Natural Interaction based System using Kinect Sensor”. In: Symposium on 3D User Interfaces (3DUI), pp. 145–146, IEEE, Mar 2012. [Garc 08] J. Garcia and Z. Zalevsky. “Range Mapping using Speckle Decorrelation”. Patent, Feb 2008. US7433024. [Garc 12] J. A. Garcia, K. F. Navarro, D. Schoene, S. T. Smith, and Y. Pisan. Health Informatics: Building a Healthcare Future Through Trusted Information, Chap. 
Exergames for the Elderly: Towards an Embedded Kinectbased Clinical Test of Falls Risk, pp. 51–57. Studies in Health Technology and Informatics, IOS, 2012. [Garg 13] R. Garg, A. Roussos, and L. Agapito. “A Variational Approach to Video Registration with Subspace Constraints”. International Journal of Computer Vision, pp. 1–29, 2013. [Gelf 05] N. Gelfand, N. Mitra, L. Guibas, and H. Pottmann. “Robust Global Registration”. In: H. P. M. Desbrun, Ed., Symposium on Geometry Processing, pp. 197–206, Eurographics Association, 2005. [Geve 99] T. Gevers and A. W. Smeulders. “Color-based Object Recognition”. Pattern Recognition, Vol. 32, No. 3, pp. 453–464, 1999. [Gian 11] C. Gianoli, M. Riboldi, M. F. Spadea, L. L. Travaini, M. Ferrari, R. Mei, R. Orecchia, and G. Baroni. “A Multiple Points Method for 4D CT Image Sorting”. Medical Physics, Vol. 38, No. 2, pp. 656–667, 2011. [Gier 08] D. P. Gierga, M. Riboldi, J. C. Turcotte, G. C. Sharp, S. B. Jiang, A. G. Taghian, and G. T. Chen. “Comparison of Target Registration Errors for Multiple Image-Guided Techniques in Accelerated Partial Breast Irradiation”. International Journal of Radiation Oncology Biology Physics, Vol. 70, No. 4, pp. 1239–1246, 2008. [Gies 12] M. van de Giessen, F. M. Vos, C. A. Grimbergen, L. J. van Vliet, and G. J. Streekstra. “An Efficient and Robust Algorithm for Parallel Groupwise Registration of Bone Surfaces”. In: International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), pp. 164–171, LNCS 7512, Part III, Springer, Oct 2012. [Godi 94] G. Godin, M. Rioux, and R. Baribeau. “Three-Dimensional Registration using Range and Intensity Information”. SPIE Videometrics, pp. 279–290, Nov 1994. [Gott 11] J.-M. Gottfried, J. Fehr, and C. S. Garbe. “Computing Range Flow from Multi-modal Kinect Data”. In: International Symposium on Visual Computing (ISVC), pp. 758–767, Springer, Jul 2011. [Gran 02] S. Granger and X. Pennec. “Multi-scale EM-ICP: A Fast and Robust Approach for Surface Registration”. In: European Conference on Computer Vision (ECCV), pp. 418–432, May 2002. 168 Bibliography [Grim 12] R. Grimm, S. Bauer, J. Sukkau, J. Hornegger, and G. Greiner. “Markerless Estimation of Patient Orientation, Posture and Pose using Range and Pressure Imaging”. International Journal of Computer Assisted Radiology and Surgery, Vol. 7, No. 6, pp. 921–929, Nov 2012. [Haas 12] S. Haase, C. Forman, T. Kilgus, R. Bammer, L. Maier-Hein, and J. Hornegger. “ToF/RGB Sensor Fusion for Augmented 3-D Endoscopy using a Fully Automatic Calibration Scheme”. In: T. Tolxdorff, T. M. Deserno, H. Handels, and H.-P. Meinzer, Eds., Bildverarbeitung für die Medizin (BVM), pp. 111–116, Springer, Mar 2012. [Haas 13] S. Haase, J. Wasza, T. Kilgus, and J. Hornegger. “Laparoscopic Instrument Localization using a 3-D Time-of-Flight/RGB Endoscope”. In: Workshop on Applications of Computer Vision (WACV), pp. 449–454, IEEE, Jan 2013. [Hadf 11] S. Hadfield and R. Bowden. “Kinecting the Dots: Particle based Scene Flow from Depth Sensors”. In: International Conference on Computer Vision (ICCV), pp. 2290–2295, IEEE, Nov 2011. [Hajn 01] J. Hajnal, D. Hawkes, and D. Hill. Medical Image Registration. Biomedical Engineering Series, Taylor & Francis Group, 2001. [Han 07] J. Han, B. Berkels, M. Droske, J. Hornegger, M. Rumpf, C. Schaller, J. Scorzin, and H. Urbach. “Mumford-Shah Model for One-to-One Edge Matching”. IEEE Transactions on Image Processing, Vol. 16, No. 11, pp. 2720–2732, 2007. [Hans 13] M. Hansard, S. Lee, O. Choi, and R. Horaud. 
Time-of-Flight Cameras: Principles, Methods and Applications. SpringerBriefs in Computer Science, Springer, 2013. [Hart 04] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2nd Ed., 2004. [Hasl 09] N. Hasler, C. Stoll, M. Sunkel, B. Rosenhahn, and H.-P. Seidel. “A Statistical Model of Human Pose and Body Shape”. In: P. Dutré and M. Stamminger, Eds., Computer Graphics Forum, pp. 337–346, Eurographics Association, Mar 2009. [Haus 11] G. Häusler and S. Ettl. “Limitations of Optical 3D Sensors”. In: R. Leach, Ed., Optical Measurement of Surface Topography, pp. 23–48, Springer, 2011. [He 13] K. He, J. Sun, and X. Tang. “Guided Image Filtering”. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 35, No. 6, pp. 1397–1409, 2013. [Heim 09] T. Heimann and H.-P. Meinzer. “Statistical Shape Models for 3D Medical Image Segmentation: A Review”. Medical Image Analysis, Vol. 13, No. 4, pp. 543–563, 2009. [Henr 12] P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox. “RGB-D Mapping: Using Kinect-style Depth Cameras for Dense 3D Modeling of Indoor Environments”. International Journal of Robotics Research, Vol. 31, No. 5, pp. 647–663, 2012. Bibliography 169 [Herb 13] E. Herbst, X. Ren, and D. Fox. “RGB-D Flow: Dense 3-D Motion Estimation Using Color and Depth”. In: International Conference on Robotics and Automation (ICRA), p. to appear, IEEE, May 2013. [Herl 99a] A. Herline et al. “Image-guided Surgery: Preliminary Feasibility Studies of Frameless Stereotactic Liver Surgery”. Archives of Surgery, Vol. 134, No. 6, pp. 644–650, 1999. [Herl 99b] A. J. Herline, J. L. Herring, J. D. Stefansic, W. C. Chapman, R. L. Galloway, and B. M. Dawant. “Surface Registration for Use in Interactive Image-Guided Liver Surgery”. In: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 892–899, LNCS 1679, Springer, Sep 1999. [Hers 08] M. Hersh and M. Johnson. Assistive Technology for Visually Impaired and Blind People. Springer, 2008. [Higg 08] W. E. Higgins, J. P. Helferty, K. Lu, S. A. Merritt, L. Rai, and K.-C. Yu. “3D CT-Video Fusion for Image-guided Bronchoscopy”. Computerized Medical Imaging and Graphics, Vol. 32, No. 3, pp. 159–173, Apr 2008. [Hilt 96] A. Hilton, A. J. Stoddart, J. Illingworth, and T. Windeatt. “Reliable Surface Reconstructiuon from Multiple Range Images”. In: European Conference on Computer Vision (ECCV), pp. 117–126, Springer, Apr 1996. [Hoff 07] G. Hoff, M. Bretthauer, S. Dahler, G. Huppertz-Hauss, J. Sauar, J. Paulsen, B. Seip, and V. Moritz. “Improvement in Caecal Intubation Rate and Pain Reduction by using 3-dimensional Magnetic Imaging for Unsedated Colonoscopy: A Randomized Trial of Patients referred for Colonoscopy”. Scandinavian Journal of Gastroenterology, Vol. 42, No. 7, pp. 885–889, 2007. [Holz 12] S. Holzer, J. Shotton, and P. Kohli. “Learning to Efficiently Detect Repeatable Interest Points in Depth Data”. In: European Conference on Computer Vision (ECCV), pp. 200–213, Springer, Oct 2012. [Hoog 09] M. Hoogeman, J.-B. Prévost, J. Nuyttens, et al. “Clinical Accuracy of the Respiratory Tumor Tracking System of the Cyberknife: Assessment by Analysis of Log Files”. International Journal of Radiation Oncology Biology Physics, Vol. 74, No. 1, pp. 297–303, 2009. [Horn 81] B. K. P. Horn and B. G. Schunck. “Determining Optical Flow”. Artificial Intelligence, Vol. 17, No. 1-3, pp. 185–203, 1981. [Horn 87] B. K. P. Horn. 
“Closed-form Solution of Absolute Orientation using Unit Quaternions”. Journal of the Optical Society of America, Vol. 4, No. 4, pp. 629–642, Apr 1987. [Huan 06] X. Huang, N. Paragios, and D. N. Metaxas. “Shape Registration in Implicit Spaces Using Information Theory and Free Form Deformations”. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 28, No. 8, pp. 1303–1318, 2006. 170 Bibliography [Huan 11] J.-D. Huang. “Kinerehab: A Kinect-based System for Physical Rehabilitation: A Pilot Study for Young Adults with Motor Disabilities”. In: International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS), pp. 319–320, 2011. [Hugu 07] F. Huguet and F. Devernay. “A Variational Method for Scene Flow Estimation from Stereo Sequences”. In: International Conference on Computer Vision (ICCV), pp. 1–7, IEEE, Oct 2007. [Huhl 08] B. Huhle, P. Jenke, and W. Strasser. “On-the-fly Scene Acquisition with a Handy Multi-sensor System”. International Journal of Intelligent Systems Technologies and Applications, Vol. 5, pp. 255–263, Nov 2008. [Iban 05] L. Ibanez, W. Schroeder, L. Ng, and J. Cates. The ITK Software Guide. Kitware, Inc., 2nd Ed., 2005. [Izad 11] S. Izadi, R. A. Newcombe, D. Kim, O. Hilliges, D. Molyneaux, S. Hodges, P. Kohli, J. Shotton, A. J. Davison, and A. W. Fitzgibbon. “KinectFusion: Real-time Dynamic 3D Surface Reconstruction and Interaction”. In: SIGGRAPH Talks, p. 23, ACM, 2011. [Jahn 99] B. Jähne, H. Haussecker, and P. Geissler. Handbook of Computer Vision and Applications: Sensors and Imaging. Handbook of Computer Vision and Applications, Academic Press, 1999. [Jarv 83] R. A. Jarvis. “A Perspective on Range Finding Techniques for Computer Vision”. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 5, No. 2, pp. 122–139, 1983. [Jens 12] R. R. Jensen, O. V. Olesen, R. R. Paulsen, M. van der Poel, and R. Larsen. “Statistical Surface Recovery: A Study on Ear Canals”. In: MICCAI Workshop on Mesh Processing in Medical Image Analysis, pp. 49– 58, Oct 2012. [Jian 05] B. Jian and B. C. Vemuri. “A Robust Algorithm for Point Set Registration Using Mixture of Gaussians”. In: International Conference on Computer Vision (ICCV), pp. 1246–1251, IEEE, Oct 2005. [Jian 11] B. Jian and B. C. Vemuri. “Robust Point Set Registration Using Gaussian Mixture Models”. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 33, No. 8, pp. 1633–1645, 2011. [John 02] H. Johnson and G. Christensen. “Consistent landmark and intensitybased image registration”. IEEE Transactions on Medical Imaging, Vol. 21, No. 5, pp. 450–461, 2002. [John 11] R. Johnson, K. O’Hara, A. Sellen, C. Cousins, and A. Criminisi. “Exploring the Potential for Touchless Interaction in Image-guided Interventional Radiology”. In: Conference on Computer-Human Interaction (CHI), pp. 3323–3332, ACM, 2011. [John 97] A. Johnson and S. B. Kang. “Registration and Integration of Textured 3-D Data”. In: International Conference on Recent Advances in 3-D Digital Imaging and Modeling, pp. 234–241, IEEE, May 1997. Bibliography 171 [John 98] A. Johnson and M. Hebert. “Surface Matching for Object Recognition in Complex Three-dimensional Scenes”. Image and Vision Computing, Vol. 16, No. 9-10, pp. 635–651, 1998. [John 99] A. E. Johnson and M. Hebert. “Using Spin Images for Efficient Object Recognition in Cluttered 3D Scenes”. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 21, pp. 433–449, May 1999. [Jone 06] M. W. Jones, J. A. Bærentzen, and M. Srámek. 
“3D Distance Fields: A Survey of Techniques and Applications”. IEEE Transactions on Visualization and Computer Graphics, Vol. 12, No. 4, pp. 581–599, 2006. [Joun 09] J. H. Joung, K. H. An, J. W. Kang, M. J. Chung, and W. Yu. “3D Environment Reconstruction using Modified Color ICP Algorithm by Fusion of a Camera and a 3D Laser Range Finder”. In: International Conference on Intelligent Robots and Systems, pp. 3082–3088, IEEE, RSJ, Oct 2009. [Kaic 11] O. van Kaick, H. Zhang, G. Hamarneh, and D. Cohen-Or. “A Survey on Shape Correspondence”. Computer Graphics Forum, Vol. 30, No. 6, pp. 1681–1707, 2011. [Kain 12] B. Kainz, S. Hauswiesner, G. Reitmayr, M. Steinberger, R. Grasset, L. Gruber, E. E. Veas, D. Kalkofen, H. Seichter, and D. Schmalstieg. “OmniKinect: Real-Time Dense Volumetric Data Acquisition and Applications”. In: Symposium on Virtual Reality Software and Technology (VRST), pp. 25–32, ACM, 2012. [Kapu 01] T. Kapur, L. Yezzi, and L. Zöllei. “A Variational Framework for Joint Segmentation and Registration”. In: Workshop on Mathematical Methods in Biomedical Image Analysis (WMMBIA), pp. 44–51, IEEE, 2001. [Katz 12] B. Katz, S. Kammoun, G. Parseihian, O. Gutierrez, A. Brilhault, M. Auvray, P. Truillet, M. Denis, S. Thorpe, and C. Jouffrais. “NAVIG: Augmented Reality Guidance System for the Visually Impaired”. Virtual Reality, Vol. 16, pp. 253–269, 2012. [Kazh 03] M. M. Kazhdan, T. A. Funkhouser, and S. Rusinkiewicz. “Rotation Invariant Spherical Harmonic Representation of 3D Shape Descriptors”. In: Symposium on Geometry Processing, pp. 156–164, ACM, 2003. [Keal 06] P. J. Keall, G. S. Mageras, J. M. Balter, R. S. Emery, K. M. Forster, S. B. Jiang, J. M. Kapatoes, D. A. Low, M. J. Murphy, B. R. Murray, C. R. Ramsey, M. B. V. Herk, S. S. Vedam, J. W. Wong, and E. Yorke. “The Management of Respiratory Motion in Radiation Oncology: Report of AAPM Task Group 76”. Medical Physics, Vol. 33, No. 10, pp. 3874–3900, Oct 2006. [Kell 09] M. Keller and A. Kolb. “Real-time Simulation of Time-of-Flight Sensors”. Simulation Modelling Practice and Theory, Vol. 17, No. 5, pp. 967– 978, 2009. [Khai 08] K. Khairy and J. Howard. “Spherical Harmonics-based Parametric Deconvolution of 3D Surface Images using Bending Energy Minimization”. Medical Image Analysis, Vol. 12, No. 2, pp. 217–227, 2008. 172 Bibliography [Khos 12] K. Khoshelham and S. O. Elberink. “Accuracy and Resolution of Kinect Depth Data for Indoor Mapping Applications”. Sensors, Vol. 12, No. 2, pp. 1437–1454, 2012. [Kilb 10] W. Kilby, J. R. Dooley, G. Kuduvalli, S. Sayeh, and C. R. Maurer. “The CyberKnife Robotic Radiosurgery System in 2010”. Technology in Cancer Research and Treatment, Vol. 9, No. 5, pp. 433–452, Oct 2010. [Knut 93] H. Knutsson and C.-F. Westin. “Normalized and Differential Convolution: Methods for Interpolation and Filtering of Incomplete and Uncertain Data”. In: International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 515–523, IEEE, Jun 1993. [Kolb 09] A. Kolb, E. Barth, R. Koch, and R. Larsen. “Time-of-Flight Sensors in Computer Graphics”. In: Eurographics State of the Art Reports, pp. 119– 134, 2009. [Kopp 07] D. Koppel, C.-I. Chen, Y.-F. Wang, H. Lee, J. Gu, A. Poirson, and R. Wolters. “Toward Automated Model Building from Video in Computer-assisted Diagnoses in Colonoscopy”. In: SPIE Medical Imaging, pp. 65091L–9, Feb 2007. [Kren 09] M. Krengli, S. Gaiano, E. Mones, A. Ballarè, D. Beldì, C. Bolchini, and G. Loi. 
“Reproducibility of Patient Setup by Surface Image Registration System in Conformal Radiotherapy of Prostate Cancer”. Radiation Oncology, Vol. 4, p. 9, 2009. [Kubo 96] H. D. Kubo and B. C. Hill. “Respiration Gated Radiotherapy Treatment: A Technical Study”. Physics in Medicine and Biology, Vol. 41, No. 1, pp. 83–91, Jan 1996. [Kuhn 06] W. Kühnel. Differential Geometry: Curves - Surfaces - Manifolds. AMS, 2006. [Kupe 07] P. Kupelian, T. Willoughby, A. Mahadevan, T. Djemil, G. Weinstein, S. Jani, C. Enke, T. Solberg, N. Flores, D. Liu, D. Beyer, and L. Levine. “Multi-institutional Clinical Experience with the Calypso System in Localization and Continuous, Real-time Monitoring of the Prostate Gland during External Radiotherapy”. International Journal of Radiation Oncology Biology Physics, Vol. 67, No. 4, pp. 1088–1098, Mar 2007. [Kurt 11] S. Kurtek, E. Klassen, Z. Ding, S. Jacobson, J. Jacobson, M. Avison, and A. Srivastava. “Parameterization-Invariant Shape Comparisons of Anatomical Surfaces”. IEEE Transactions on Medical Imaging, Vol. 30, No. 3, pp. 849–858, 2011. [Ladi 08] A. Ladikos, S. Benhimane, and N. Navab. “Real-Time 3D Reconstruction for Collision Avoidance in Interventional Environments”. In: International Conference on Medical Image Computing and ComputerAssisted Intervention (MICCAI), pp. 526–534, LNCS 5242, Part II, Springer, Sep 2008. [Lang 00] R. Lange. 3D Time-of-Flight Distance Measurement with Custom SolidState Image Sensors in CMOS/CCD-Technology. PhD thesis, Universität Siegen, 2000. Bibliography 173 [Lang 01] K. Langen and D. Jones. “Organ Motion and its Management”. International Journal of Radiation Oncology Biology Physics, Vol. 50, No. 1, pp. 265–278, 2001. [Lang 11] B. Lange, C.-Y. Chang, E. Suma, B. Newman, A. Rizzo, and M. Bolas. “Development and Evaluation of Low Cost Game-based Balance Rehabilitation Tool using the Microsoft Kinect Sensor”. In: International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 1831–1834, Aug 2011. [Lea 12] C. S. Lea, J. C. Fackler, G. D. Hager, , and R. H. Taylor. “Towards Automated Activity Recognition in an Intensive Care Unit”. In: MICCAI Workshop on Modeling and Monitoring of Computer Assisted Interventions, Oct 2012. [Lee 12] S. Lee, B. Kang, J. D. Kim, and C. Y. Kim. “Motion Blur-free Timeof-Flight Range Sensor”. In: SPIE Sensors, Cameras, and Systems for Industrial and Scientific Applications, pp. 82980U–6, 2012. [Lenz 11] F. Lenzen, H. Schäfer, and C. S. Garbe. “Denoising Time-Of-Flight Data with Adaptive Total Variation”. In: International Symposium on Visual Computing (ISVC), pp. 337–346, Springer, Jul 2011. [Leto 11] A. Letouzey, B. Petit, and E. Boyer. “Scene Flow from Depth and Color Images”. In: J. Hoey, S. McKenna, and E. Trucco, Eds., British Machine Vision Conference (BMVC), pp. 46.1–46.11, BMVA Press, Sep 2011. [Li 05] X. Li and I. Guskov. “Multi-scale Features for Approximate Alignment of Point-based Surfaces”. In: Symposium on Geometry Processing, pp. 217–226, Eurographics Association, Jul 2005. [Lind 10] M. Lindner, I. Schiller, A. Kolb, and R. Koch. “Time-of-Flight Sensor Calibration for Accurate Range Sensing”. Computer Vision and Image Understanding, Vol. 114, No. 12, pp. 1318–1328, 2010. Special Issue on Time-of-Flight Camera Based Computer Vision. [Liu 08] J. Liu, K. Subramanian, T. Yoo, and R. Van Uitert. “A Stable Optic-flow Based Method for Tracking Colonoscopy Images”. In: CVPR Workshop on Mathematical Methods in Biomedical Image Analysis (MMBIA), pp. 1– 8, IEEE, Jun 2008. 
[Liu 09] C. Liu. Beyond Pixels: Exploring New Representations and Applications for Motion Analysis. PhD thesis, MIT, May 2009.
[Lore 87] W. E. Lorensen and H. E. Cline. "Marching Cubes: A High Resolution 3D Surface Construction Algorithm". In: SIGGRAPH, pp. 163–169, ACM, Jul 1987.
[Lowe 04] D. G. Lowe. "Distinctive Image Features from Scale-Invariant Keypoints". International Journal of Computer Vision, Vol. 60, pp. 91–110, Nov 2004.
[Luca 81] B. D. Lucas and T. Kanade. "An Iterative Image Registration Technique with an Application to Stereo Vision". In: International Joint Conference on Artificial Intelligence (IJCAI), pp. 674–679, Aug 1981.
[Maie 11] L. Maier-Hein, A. M. Franz, M. Fangerau, M. Schmidt, A. Seitel, S. Mersmann, T. Kilgus, A. Groch, K. Yung, T. R. dos Santos, and H.-P. Meinzer. "Towards Mobile Augmented Reality for On-Patient Visualization of Medical Images". In: H. Handels, J. Ehrhardt, T. M. Deserno, H.-P. Meinzer, and T. Tolxdorff, Eds., Bildverarbeitung für die Medizin (BVM), pp. 389–393, Springer, Mar 2011.
[Maie 12] L. Maier-Hein, A. Franz, T. dos Santos, M. Schmidt, M. Fangerau, H.-P. Meinzer, and J. M. Fitzpatrick. "Convergent Iterative Closest-Point Algorithm to Accommodate Anisotropic and Inhomogenous Localization Error". IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 34, No. 8, pp. 1520–1532, 2012.
[Main 98] J. B. A. Maintz and M. A. Viergever. "A Survey of Medical Image Registration". Medical Image Analysis, Vol. 2, No. 1, pp. 1–36, 1998.
[Mana 06] S. Manay, D. Cremers, B.-W. Hong, A. J. Yezzi, and S. Soatto. "Integral Invariants for Shape Matching". IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 28, No. 10, pp. 1602–1618, 2006.
[Mark 10] M. Markert, A. Koschany, and T. Lueth. "Tracking of the Liver for Navigation in Open Surgery". International Journal of Computer Assisted Radiology and Surgery, Vol. 5, No. 3, pp. 229–235, May 2010.
[Mark 12] P. Markelj, D. Tomazevic, B. Likar, and F. Pernus. "A Review of 3D/2D Registration Methods for Image-guided Interventions". Medical Image Analysis, Vol. 16, No. 3, pp. 642–661, 2012.
[Marq 63] D. W. Marquardt. "An Algorithm for Least-squares Estimation of Nonlinear Parameters". Journal of the Society for Industrial & Applied Mathematics, Vol. 11, No. 2, pp. 431–441, 1963.
[Mate 08] D. Mateus, R. Horaud, D. Knossow, F. Cuzzolin, and E. Boyer. "Articulated Shape Matching using Laplacian Eigenfunctions and Unsupervised Point Registration". In: International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–8, IEEE, Jun 2008.
[McCl 06] J. McClelland, J. Blackall, S. Tarte, A. Chandler, S. Hughes, S. Ahmad, D. Landau, and D. Hawkes. "A Continuous 4D Motion Model from Multiple Respiratory Cycles for Use in Lung Radiotherapy". Medical Physics, Vol. 33, No. 9, pp. 3348–3358, 2006.
[McCl 13] J. McClelland, D. Hawkes, T. Schaeffter, and A. King. "Respiratory Motion Models: A Review". Medical Image Analysis, Vol. 17, No. 1, pp. 19–42, 2013.
[McFa 11] E. G. McFarland, K. J. Keysor, and D. J. Vining. "Virtual Colonoscopy: From Concept to Implementation". In: A. H. Dachman and A. Laghi, Eds., Atlas of Virtual Colonoscopy, pp. 3–7, Springer, 2011.
[McNa 09] J. E. McNamara, P. H. Pretorius, K. Johnson, J. M. Mukherjee, J. Dey, M. A. Gennert, and M. A. King. "A Flexible Multicamera Visual-tracking System for Detecting and Correcting Motion-induced Artifacts in Cardiac SPECT Slices". Medical Physics, Vol. 36, No. 5, pp. 1913–1923, 2009.
[Meek 12] S. L. Meeks, T. R. Willoughby, K. M. Langen, and P. A. Kupelian. Image-Guided Radiation Therapy, Chap. Optical and Remote Monitoring IGRT, pp. 1–12. Imaging in Medical Diagnosis and Therapy, CRC Press, 2012.
[Ment 12] H. M. Mentis, K. O'Hara, A. Sellen, and R. Trivedi. "Interaction Proxemics and Image Use in Neurosurgery". In: Conference on Computer-Human Interaction (CHI), pp. 927–936, ACM, May 2012.
[Miko 05] K. Mikolajczyk and C. Schmid. "A Performance Evaluation of Local Descriptors". IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 10, pp. 1615–1630, 2005.
[Miqu 13] M. Miquel, J. Blackall, S. Uribe, D. Hawkes, and T. Schaeffter. "Patient-specific Respiratory Models using Dynamic 3D MRI: Preliminary Volunteer Results". Physica Medica: European Journal of Medical Physics, Vol. 29, No. 2, pp. 214–220, Mar 2013.
[Miro 11] D. J. Mirota, M. Ishii, and G. D. Hager. "Vision-based Navigation in Image-guided Interventions". Annual Review of Biomedical Engineering, Vol. 13, pp. 297–319, Aug 2011.
[Mode 03a] J. Modersitzki. Numerical Methods for Image Registration. Numerical Mathematics and Scientific Computation, Oxford University Press, 2003.
[Mode 03b] J. Modersitzki and B. Fischer. "Curvature Based Image Registration". Journal of Mathematical Imaging and Vision, Vol. 18, No. 1, pp. 81–85, 2003.
[Mode 09] J. Modersitzki. FAIR: Flexible Algorithms for Image Registration. Fundamentals of Algorithms, Society for Industrial and Applied Mathematics, 2009.
[Monn 11] H. Mönnich, P. Nicolai, J. Raczkowsky, and H. Wörn. "A Semi-Autonomous Robotic Teleoperation Surgery Setup with Multi 3D Camera Supervision". In: International Journal of Computer Assisted Radiology and Surgery, pp. 132–133, 2011.
[Mose 11] T. Moser, S. Fleischhacker, K. Schubert, G. Sroka-Perez, and C. P. Karger. "Technical Performance of a Commercial Laser Surface Scanning System for Patient Setup Correction in Radiotherapy". Physica Medica: European Journal of Medical Physics, Vol. 27, No. 4, pp. 224–232, 2011.
[Mosh 12] M. Gabel, R. Gilad-Bachrach, E. Renshaw, and A. Schuster. "Full Body Gait Analysis with Kinect". In: International Conference of the Engineering in Medicine and Biology Society (EMBC), IEEE, Aug 2012.
[Moun 07] P. Mountney, B. Lo, S. Thiemjarus, D. Stoyanov, and G. Zhong-Yang. "A Probabilistic Framework for Tracking Deformable Soft Tissue in Minimally Invasive Surgery". In: International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), pp. 34–41, LNCS 4792, Part II, Springer, Nov 2007.
[Muac 07] A. Muacevic, C. Drexler, A. Wowra, A. Schweikard, A. Schlaefer, R. T. Hoffmann, R. Wilkowski, and H. Winter. "Technical Description, Phantom Accuracy, and Clinical Feasibility for Single-session Lung Radiosurgery Using Robotic Image-guided Real-time Respiratory Tumor Tracking". Technology in Cancer Research and Treatment, Vol. 6, No. 4, pp. 321–328, Aug 2007.
[Mull 10] K. Müller. Multi-modal Organ Surface Registration using Time-of-Flight Imaging. Master's thesis, Friedrich-Alexander-Universität Erlangen-Nürnberg, 2010.
[Mull 11] K. Müller, S. Bauer, J. Wasza, and J. Hornegger. "Automatic Multi-modal ToF/CT Organ Surface Registration". In: H. Handels, J. Ehrhardt, T. M. Deserno, H.-P. Meinzer, and T. Tolxdorff, Eds., Bildverarbeitung für die Medizin (BVM), pp. 154–158, Springer, Mar 2011.
[Murp 04] M. J. Murphy. "Tracking Moving Organs in Real Time". Seminars in Radiation Oncology, Vol. 14, No. 1, pp. 91–100, Jan 2004.
[Murp 07] M. J. Murphy, J. Balter, S. Balter, J. A. BenComo, Jr., I. J. Das, S. B. Jiang, C.-M. Ma, G. H. Olivera, R. F. Rodebaugh, K. J. Ruchala, H. Shirato, and F.-F. Yin. "The Management of Imaging Dose during Image-guided Radiotherapy, Report of the AAPM Task Group 75". Medical Physics, Vol. 34, No. 10, pp. 4041–4063, 2007.
[Myro 10] A. Myronenko and X. B. Song. "Point Set Registration: Coherent Point Drift". IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 32, No. 12, pp. 2262–2275, 2010.
[Nava 11] N. Navab and S. Holzer. "Real-time 3D Reconstruction: Applications to Collision Detection and Surgical Workflow Monitoring". In: IROS Workshop on Methods for Safer Surgical Robotics Procedures, IEEE, RSJ, Sep 2011.
[Nava 12] N. Navab, T. Blum, L. Wang, A. Okur, and T. Wendler. "First Deployments of Augmented Reality in Operating Rooms". Computer, Vol. 45, No. 7, pp. 48–55, Jul 2012.
[Neum 11] D. Neumann, F. Lugauer, S. Bauer, J. Wasza, and J. Hornegger. "Real-time RGB-D Mapping and 3-D Modeling on the GPU using the Random Ball Cover Data Structure". In: ICCV Workshop on Consumer Depth Cameras for Computer Vision (CDC4CV), pp. 1161–1167, IEEE, Nov 2011.
[Newc 11] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, S. Hodges, and A. W. Fitzgibbon. "KinectFusion: Real-time Dense Surface Mapping and Tracking". In: International Symposium on Mixed and Augmented Reality (ISMAR), pp. 127–136, IEEE, Oct 2011.
[Nico 11] P. Nicolai and J. Raczkowsky. "Operation Room Supervision for Safe Robotic Surgery with a Multi 3D-Camera Setup". In: IROS Workshop on Methods for Safer Surgical Robotics Procedures, IEEE, RSJ, Sep 2011.
[Noon 12] P. Noonan, J. Howard, D. Tout, I. Armstrong, H. Williams, T. Cootes, W. Hallett, and R. Hinz. "Accurate Markerless Respiratory Tracking for Gated Whole Body PET using the Microsoft Kinect". In: Nuclear Science Symposium (NSS) and Medical Imaging Conference (MIC), IEEE, Oct 2012.
[Oggi 04] T. Oggier, M. Lehmann, R. Kaufmann, M. Schweizer, M. Richter, P. Metzler, G. Lang, F. Lustenberger, and N. Blanc. "An All-solid-state Optical Range Camera for 3D Real-time Imaging with Sub-centimeter Depth Resolution (SwissRanger)". In: SPIE Optical Design and Engineering, pp. 534–545, Feb 2004.
[Oles 10] O. V. Olesen, M. R. Jorgensen, R. R. Paulsen, L. Hojgaard, B. Roed, and R. Larsen. "Structured Light 3D Tracking System for Measuring Motions in PET Brain Imaging". In: SPIE Medical Imaging, pp. 76250X–11, Feb 2010.
[Oliv 11] T. Oliveira-Santos, M. Peterhans, S. Hofmann, and S. Weber. "Passive Single Marker Tracking for Organ Motion and Deformation Detection in Open Liver Surgery". In: International Conference on Information Processing in Computer-Assisted Interventions (IPCAI), pp. 156–167, Springer, Jun 2011.
[Ong 13] S. K. Ong, J. Zhang, and A. Y. C. Nee. "Assistive Obstacle Detection and Navigation Devices for Vision-impaired Users". Disability and Rehabilitation: Assistive Technology, 2013.
[Pado 12] N. Padoy, T. Blum, S.-A. Ahmadi, H. Feussner, M.-O. Berger, and N. Navab. "Statistical Modeling and Recognition of Surgical Workflow". Medical Image Analysis, Vol. 16, No. 3, pp. 632–641, 2012.
[Para 03] N. Paragios, M. Rousson, and V. Ramesh. "Non-rigid Registration using Distance Functions". Computer Vision and Image Understanding, Vol. 89, No. 2-3, pp. 142–165, 2003.
[Parr 12] G. Parra-Dominguez, B. Taati, and A. Mihailidis. "3D Human Motion Analysis to Detect Abnormal Events on Stairs". In: International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), pp. 97–103, Oct 2012.
[Pass 08] J. Passenger, O. Acosta, H. de Visser, S. Bauer, C. Russ, and S. Ourselin. "Texture Coordinate Generation of Colonic Surface Meshes for Surgical Simulation". In: International Symposium on Biomedical Imaging (ISBI), pp. 640–643, IEEE, May 2008.
[Paul 05] M. Pauly, N. J. Mitra, J. Giesen, M. H. Gross, and L. J. Guibas. "Example-Based 3D Scan Completion". In: Symposium on Geometry Processing, pp. 23–32, Eurographics Association, Jul 2005.
[Pear 12] N. Pears, Y. Liu, and P. Bunting, Eds. 3D Imaging, Analysis and Applications. Springer, 2012.
[Pear 96] K. Pearson. "Mathematical Contributions to the Theory of Evolution. III. Regression, Heredity, and Panmixia". Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, Vol. 187, pp. 253–318, 1896.
[Peng 10] J. L. Peng, D. Kahler, J. G. Li, S. Samant, G. Yan, R. Amdur, and C. Liu. "Characterization of a Real-time Surface Image-guided Stereotactic Positioning System". Medical Physics, Vol. 37, No. 10, pp. 5421–5433, 2010.
[Penn 09] J. Penne, K. Höller, M. Stürmer, T. Schrauder, A. Schneider, R. Engelbrecht, H. Feußner, B. Schmauss, and J. Hornegger. "Time-of-Flight 3-D Endoscopy". In: G.-Z. Yang et al., Eds., International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), pp. 467–474, LNCS 5761, Part I, Springer, Nov 2009.
[Petr 11] A. Petrelli and L. di Stefano. "On the Repeatability of the Local Reference Frame for Partial Shape Matching". In: International Conference on Computer Vision (ICCV), pp. 2244–2251, IEEE, Nov 2011.
[Pick 11] P. J. Pickhardt, C. Hassan, S. Halligan, and R. Marmo. "Colorectal Cancer: CT Colonography and Colonoscopy for Detection–Systematic Review and Meta-analysis". Radiology, Vol. 259, No. 2, pp. 393–405, May 2011.
[Plac 12] S. Placht, J. Stancanello, C. Schaller, M. Balda, and E. Angelopoulou. "Fast Time-of-Flight Camera based Surface Registration for Radiotherapy Patient Positioning". Medical Physics, Vol. 39, No. 1, pp. 4–17, 2012.
[Pons 07] J.-P. Pons, R. Keriven, and O. Faugeras. "Multi-View Stereo Reconstruction and Scene Flow Estimation with a Global Image-Based Matching Score". International Journal of Computer Vision, Vol. 72, No. 2, pp. 179–193, Apr 2007.
[Quir 12] J. Quiroga, F. Devernay, and J. Crowley. "Scene Flow by Tracking in Intensity and Depth Data". In: CVPR Workshop on Human Activity Understanding from 3D Data, pp. 50–57, IEEE, Jun 2012.
[Rai 06] L. Rai, S. A. Merritt, and W. E. Higgins. "Real-time Image-Based Guidance Method for Lung-Cancer Assessment". In: International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2437–2444, IEEE, Jun 2006.
[Rang 97] A. Rangarajan, H. Chui, and F. L. Bookstein. "The Softassign Procrustes Matching Algorithm". In: International Conference on Information Processing in Medical Imaging (IPMI), pp. 29–42, Jun 1997.
[Reyn 11] M. Reynolds, J. Dobos, L. Peel, T. Weyrich, and G. Brostow. "Capturing Time-of-Flight Data with Confidence". In: International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 945–952, IEEE, Jun 2011.
[Rish 11] P. Risholm, J. Balter, and W. M. Wells. "Estimation of Delivered Dose in Radiotherapy: The Influence of Registration Uncertainty". In: International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), pp. 548–555, LNCS 6891, Part I, Springer, Sep 2011.
[Robe 07] J. C. Roberts, A. C. Merkle, P. J. Biermann, E. E. Ward, B. G. Carkhuff, R. P. Cain, and J. V. O'Connor. "Computational and Experimental Models of the Human Torso for Non-penetrating Ballistic Impact". Journal of Biomechanics, Vol. 40, No. 1, pp. 125–136, 2007.
[Rohl 12] S. Röhl, S. Bodenstedt, S. Suwelack, H. Kenngott, B. P. Müller-Stich, R. Dillmann, and S. Speidel. "Dense GPU-Enhanced Surface Reconstruction from Stereo Endoscopic Images for Intraoperative Registration". Medical Physics, Vol. 39, pp. 1632–1645, 2012.
[Rouh 11] M. Rouhani and A. D. Sappa. "Correspondence Free Registration through a Point-to-Model Distance Minimization". In: International Conference on Computer Vision (ICCV), pp. 2150–2157, IEEE, Nov 2011.
[Ruec 03] D. Rueckert, A. F. Frangi, and J. A. Schnabel. "Automatic Construction of 3-D Statistical Deformation Models of the Brain using Nonrigid Registration". IEEE Transactions on Medical Imaging, Vol. 22, No. 8, pp. 1014–1025, Aug 2003.
[Ruec 11] D. Rueckert and J. Schnabel. "Medical Image Registration". In: T. M. Deserno, Ed., Biomedical Image Processing, pp. 131–154, Springer, 2011.
[Ruec 99] D. Rueckert, L. Sonoda, C. Hayes, D. Hill, M. Leach, and D. Hawkes. "Nonrigid Registration using Free-form Deformations: Application to Breast MR Images". IEEE Transactions on Medical Imaging, Vol. 18, No. 8, pp. 712–721, Aug 1999.
[Rusi 01] S. Rusinkiewicz and M. Levoy. "Efficient Variants of the ICP Algorithm". In: International Conference on 3-D Digital Imaging and Modeling, pp. 145–152, May 2001.
[Rusi 02] S. Rusinkiewicz, O. Hall-Holt, and M. Levoy. "Real-time 3D Model Acquisition". ACM Transactions on Graphics, Vol. 21, No. 3, pp. 438–446, Jul 2002.
[Russ 00] G. Russo and P. Smereka. "A Remark on Computing Distance Functions". Journal of Computational Physics, Vol. 163, pp. 51–67, 2000.
[Rusu 11] R. B. Rusu and S. Cousins. "3D is Here: Point Cloud Library (PCL)". In: International Conference on Robotics and Automation (ICRA), pp. 1–4, IEEE, May 2011.
[Salv 04] J. Salvi, J. Pages, and J. Batlle. "Pattern Codification Strategies in Structured Light Systems". Pattern Recognition, Vol. 37, No. 4, pp. 827–849, 2004.
[Salv 07] J. Salvi, C. Matabosch, D. Fofi, and J. Forest. "A Review of Recent Range Image Registration Methods with Accuracy Evaluation". Image and Vision Computing, Vol. 25, No. 5, pp. 578–596, 2007.
[Sant 10] T. R. dos Santos, A. Seitel, H.-P. Meinzer, and L. Maier-Hein. "Correspondences Search for Surface-Based Intra-Operative Registration". In: International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), pp. 660–667, LNCS 6362, Part II, Springer, Sep 2010.
[Sant 12a] T. R. dos Santos. Multi-Modal Partial Surface Matching for Intraoperative Registration. PhD thesis, Ruprecht-Karls-Universität Heidelberg, 2012.
[Sant 12b] T. R. dos Santos, C. J. Goch, A. M. Franz, H.-P. Meinzer, T. Heimann, and L. Maier-Hein. "Minimally Deformed Correspondences between Surfaces for Intra-Operative Registration". In: SPIE Medical Imaging, p. 83141C, Feb 2012.
[Sava 97] C. Savage. "A Survey of Combinatorial Gray Codes". SIAM Review, Vol. 39, No. 4, pp. 605–629, Dec 1997.
[Scha 08] C. Schaller, J. Penne, and J. Hornegger. "Time-of-Flight Sensor for Respiratory Motion Gating". Medical Physics, Vol. 35, No. 7, pp. 3090–3093, 2008.
[Scha 09] C. Schaller, C. Rohkohl, J. Penne, M. Stürmer, and J. Hornegger. "Inverse C-arm Positioning for Interventional Procedures Using Real-Time Body Part Detection". In: G.-Z. Yang et al., Eds., International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), pp. 549–556, LNCS 5761, Part I, Springer, Sep 2009.
[Scha 12] J. Schaerer, A. Fassi, M. Riboldi, P. Cerveri, G. Baroni, and D. Sarrut. "Multi-dimensional Respiratory Motion Tracking from Markerless Optical Surface Imaging based on Deformable Mesh Registration". Physics in Medicine and Biology, Vol. 57, No. 2, pp. 357–373, 2012.
[Schi 11] A. Schick, F. Forster, and M. Stockmann. "3D Measuring in the Field of Endoscopy". In: SPIE Optical Measurement Systems for Industrial Inspection, pp. 808216–12, May 2011.
[Schm 09] M. Schmidt and B. Jähne. "A Physical Model of Time-of-Flight 3D Imaging Systems, Including Suppression of Ambient Light". In: A. Kolb and R. Koch, Eds., Dynamic 3D Imaging (Dyn3D), pp. 1–15, Springer, 2009.
[Schm 11] M. Schmidt. Analysis, Modeling and Dynamic Optimization of 3D Time-of-Flight Imaging Systems. PhD thesis, Ruprecht-Karls-Universität Heidelberg, 2011.
[Schm 12] C. Schmalz, F. Forster, A. Schick, and E. Angelopoulou. "An Endoscopic 3D Scanner based on Structured Light". Medical Image Analysis, Vol. 16, No. 5, pp. 1063–1072, 2012.
[Scho 07] P. J. Schöffel, W. Harms, G. Sroka-Perez, W. Schlegel, and C. P. Karger. "Accuracy of a Commercial Optical 3D Surface Imaging System for Realignment of Patients for Radiotherapy of the Thorax". Physics in Medicine and Biology, Vol. 52, No. 13, pp. 3949–3963, Jul 2007.
[Scho 11] C. Schoenauer, T. Pintaric, H. Kaufmann, S. Jansen-Kosterink, and M. Vollenbroek-Hutten. "Chronic Pain Rehabilitation with a Serious Game using Multimodal Input". In: International Conference on Virtual Rehabilitation (ICVR), pp. 1–8, IEEE, Jun 2011.
[Schr 06] W. Schroeder, K. Martin, and B. Lorensen. The Visualization Toolkit: An Object-Oriented Approach To 3D Graphics. Kitware, Inc., 2006.
[Sega 07] W. Segars, S. Mori, G. Chen, and B. Tsui. "Modeling Respiratory Motion Variations in the 4D NCAT Phantom". In: Nuclear Science Symposium (NSS) and Medical Imaging Conference (MIC), pp. 2677–2679, IEEE, Oct 2007.
[Seit 10] A. Seitel, T. R. dos Santos, S. Mersmann, J. Penne, R. Tetzlaff, H.-P. Meinzer, et al. "Time-of-Flight Kameras für die intraoperative Oberflächenerfassung". In: Bildverarbeitung für die Medizin (BVM), pp. 11–15, Springer, Mar 2010.
[Seit 12] A. Seitel. Markerless Navigation for Percutaneous Needle Insertions. PhD thesis, Ruprecht-Karls-Universität Heidelberg, 2012.
[Sepp 02] Y. Seppenwoolde, H. Shirato, K. Kitamura, S. Shimizu, M. van Herk, J. V. Lebesque, and K. Miyasaka. "Precise and Real-time Measurement of 3D Tumor Motion in Lung due to Breathing and Heartbeat, Measured during Radiotherapy". International Journal of Radiation Oncology Biology Physics, Vol. 53, No. 4, pp. 822–834, 2002.
[Sesh 11] S. Seshamani, G. Chintalapani, and R. H. Taylor. "Iterative Refinement of Point Correspondences for 3D Statistical Shape Models". In: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 417–425, LNCS 6892, Part II, Springer, Sep 2011.
[Sham 10] R. Shams, P. Sadeghi, R. Kennedy, and R. Hartley. "A Survey of Medical Image Registration on Multicore and the GPU". IEEE Signal Processing Magazine, Vol. 27, No. 2, pp. 50–60, Mar 2010.
[Shan 04] Y. Shan, B. Matei, H. S. Sawhney, R. Kumar, D. F. Huber, and M. Hebert. "Linear Model Hashing and Batch RANSAC for Rapid and Accurate Object Recognition". In: International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 121–128, IEEE, Jul 2004.
[Shim 12] H. Shim and S. Lee. "Performance Evaluation of Time-of-Flight and Structured Light Depth Sensors in Radiometric/Geometric Variations". SPIE Optical Engineering, Vol. 51, No. 9, pp. 094401-1–12, 2012.
[Shpu 11] A. Shpunt and Z. Zalevsky. "Depth-varying Light Fields for Three Dimensional Sensing". Patent, Nov 2011. US20080106746.
[Simo 12] J. Simon. MRI Workflow Optimization using Real-Time Range Imaging Sensors. Master's thesis, Friedrich-Alexander-Universität Erlangen-Nürnberg, 2012.
[Siva 12] R. Sivalingam, A. Cherian, J. Fasching, N. Walczak, N. D. Bird, V. Morellas, B. Murphy, K. Cullen, K. Lim, G. Sapiro, and N. Papanikolopoulos. "A Multi-sensor Visual Tracking System for Behavior Monitoring of At-Risk Children". In: International Conference on Robotics and Automation (ICRA), pp. 1345–1350, IEEE, May 2012.
[Smis 13] J. Smisek, M. Jancosek, and T. Pajdla. Consumer Depth Cameras for Computer Vision: Research Topics and Applications, Chap. 3D with Kinect, pp. 3–25. Advances in Computer Vision and Pattern Recognition, Springer, 2013.
[Smit 12] S. T. Smith and D. Schoene. "The Use of Exercise-based Videogames for Training and Rehabilitation of Physical Function in Older Adults: Current Practice and Guidelines for Future Research". Aging Health, Vol. 8, No. 3, pp. 243–252, 2012.
[Soti 12] A. Sotiras, C. Davatzikos, and N. Paragios. "Deformable Medical Image Registration: A Survey". Research Report RR-7919, INRIA, Mar 2012.
[Sout 08] S. Soutschek, J. Penne, J. Hornegger, and J. Kornhuber. "3-D Gesture-Based Scene Navigation in Medical Imaging Applications Using Time-Of-Flight Cameras". In: CVPR Workshop on Time of Flight Camera based Computer Vision, pp. 1–4, IEEE, Jun 2008.
[Sout 10] S. Soutschek, A. Maier, S. Bauer, P. Kugler, M. Bebenek, S. Steckmann, S. von Stengel, W. Kemmler, J. Hornegger, and J. Kornhuber. "Measurement of Angles in Time-of-Flight Data for the Automatic Supervision of Training Exercises". In: International Conference on Pervasive Computing Technologies for Healthcare (PervasiveHealth), pp. 1–4, IEEE, Mar 2010.
[Spie 02] H. Spies, B. Jähne, and J. L. Barron. "Range Flow Estimation". Computer Vision and Image Understanding, Vol. 85, No. 3, pp. 209–231, 2002.
[Stau 12] R. Stauder, V. Belagiannis, L. Schwarz, A. Bigdelou, E. Söhngen, S. Ilic, and N. Navab. "A User-Centered and Workflow-Aware Unified Display for the Operating Room". In: MICCAI Workshop on Modeling and Monitoring of Computer Assisted Interventions, Oct 2012.
[Stei 11] F. Steinbrucker, J. Sturm, and D. Cremers. "Real-time Visual Odometry from Dense RGB-D Images". In: ICCV Workshop on Live Dense Reconstruction with Moving Cameras, pp. 719–722, IEEE, Nov 2011.
[Ston 11a] E. Stone and M. Skubic. "Passive In-home Measurement of Stride-to-Stride Gait Variability Comparing Vision and Kinect Sensing". In: International Conference of the Engineering in Medicine and Biology Society (EMBC), pp. 6491–6494, IEEE, Sep 2011.
[Ston 11b] E. Stone and M. Skubic. "Evaluation of an Inexpensive Depth Camera for In-home Gait Assessment". Journal of Ambient Intelligence and Smart Environments, Vol. 3, No. 4, pp. 349–361, Dec 2011.
[Stoy 07] E. Stoykova, A. Alatan, P. Benzie, N. Grammalidis, S. Malassiotis, J. Ostermann, S. Piekh, V. Sainov, C. Theobalt, T. Thevar, and X. Zabulis. "3-D Time-Varying Scene Capture Technologies; A Survey". IEEE Transactions on Circuits and Systems for Video Technology, Vol. 17, No. 11, pp. 1568–1586, Nov 2007.
[Stoy 10] D. Stoyanov, M. Scarzanella, P. Pratt, and G.-Z. Yang. "Real-Time Stereo Reconstruction in Robotically Assisted Minimally Invasive Surgery". In: T. Jiang, N. Navab, J. Pluim, and M. Viergever, Eds., International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 275–282, LNCS 6361, Part I, Springer, Sep 2010.
[Stoy 12] D. Stoyanov. "Stereoscopic Scene Flow for Robotic Assisted Minimally Invasive Surgery". In: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 479–486, LNCS 7510, Part I, Springer, Oct 2012.
[Stur 12] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers. "A Benchmark for the Evaluation of RGB-D SLAM Systems". In: International Conference on Intelligent Robot Systems (IROS), pp. 573–580, Oct 2012.
[Subs 98] G. Subsol, J.-P. Thirion, and N. Ayache. "A Scheme for Automatically Building Three-dimensional Morphometric Anatomical Atlases: Application to a Skull Atlas". Medical Image Analysis, Vol. 2, No. 1, pp. 37–60, 1998.
[Sun 09] J. Sun, M. Ovsjanikov, and L. J. Guibas. "A Concise and Provably Informative Multi-Scale Signature Based on Heat Diffusion". Eurographics Computer Graphics Forum, Vol. 28, No. 5, pp. 1383–1392, 2009.
[Sun 10] D. Sun, S. Roth, and M. J. Black. "Secrets of Optical Flow Estimation and Their Principles". In: International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2432–2439, IEEE, Jun 2010.
[Sund 07] G. Sundaramoorthi, A. Yezzi, and A. Mennucci. "Sobolev Active Contours". International Journal of Computer Vision, Vol. 73, No. 3, pp. 345–366, 2007.
[Taka 10] G. Takacs, V. Chandrasekhar, S. Tsai, D. Chen, R. Grzeszczuk, and B. Girod. "Unified Real-Time Tracking and Recognition with Rotation-Invariant Fast Features". In: International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 934–941, IEEE, Jun 2010.
[Tang 08a] L. Tang and G. Hamarneh. "SMRFI: Shape Matching via Registration of Vector-valued Feature Images". In: International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–8, IEEE, Jun 2008.
[Tang 08b] J. W. Tangelder and R. C. Veltkamp. "A Survey of Content based 3D Shape Retrieval Methods". Multimedia Tools and Applications, Vol. 39, No. 3, pp. 441–471, Sep 2008.
[Thor 09] N. Thorstensen and R. Keriven. "Non-rigid Shape Matching Using Geometry and Photometry". In: Asian Conference on Computer Vision (ACCV), pp. 644–654, Springer, Sep 2009.
[Toma 98] C. Tomasi and R. Manduchi. "Bilateral Filtering for Gray and Color Images". In: International Conference on Computer Vision (ICCV), pp. 839–846, IEEE, Jan 1998.
[Tomb 10] F. Tombari, S. Salti, and L. Di Stefano. "Unique Signatures of Histograms for Local Surface Description". In: European Conference on Computer Vision (ECCV), pp. 356–369, Springer, Sep 2010.
[Totz 11] J. Totz, P. Mountney, D. Stoyanov, and G.-Z. Yang. "Dense Surface Reconstruction for Enhanced Navigation in MIS". In: International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), pp. 89–96, LNCS 6891, Part I, Springer, Sep 2011.
[Totz 12] J. Totz, K. Fujii, P. Mountney, and G.-Z. Yang. "Enhanced Visualisation for Minimally Invasive Surgery". International Journal of Computer Assisted Radiology and Surgery, Vol. 7, No. 3, pp. 423–432, May 2012.
[Tsag 12] G. Tsagkatakis, A. Woiselle, G. Tzagkarakis, M. Bousquet, J.-L. Starck, and P. Tsakalides. “Active Range Imaging via Random Gating”. In: SPIE Electro-Optical Remote Sensing, Photonic Technologies, and Applications, pp. 85420P–9, Sep 2012.
[Tsin 04] Y. Tsin and T. Kanade. “A Correlation-Based Approach to Robust Point Set Registration”. In: European Conference on Computer Vision (ECCV), pp. 558–569, Springer, May 2004.
[Unal 04] G. Unal, G. Slabaugh, A. Yezzi, and J. Tyan. “Joint Segmentation and Non-Rigid Registration Without Shape Priors”. Tech. Rep. SCR-04-TR-7495, Siemens Corporate Research, 2004.
[Vand 11] J. Vandemeulebroucke, S. Rit, J. Kybic, P. Clarysse, and D. Sarrut. “Spatiotemporal Motion Estimation for Respiratory-correlated Imaging of the Lungs”. Medical Physics, Vol. 38, No. 1, pp. 166–178, Jan 2011.
[Vedu 05] S. Vedula, S. Baker, P. Rander, R. T. Collins, and T. Kanade. “Three-Dimensional Scene Flow”. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 3, pp. 475–480, 2005.
[Vedu 99] S. Vedula, S. Baker, P. Rander, R. T. Collins, and T. Kanade. “Three-Dimensional Scene Flow”. In: International Conference on Computer Vision (ICCV), pp. 722–729, IEEE, Sep 1999.
[Vere 10] D. Verellen, T. Depuydt, T. Gevaert, N. Linthout, K. Tournel, M. Duchateau, T. Reynders, G. Storme, and M. D. Ridder. “Gating and Tracking, 4D in Thoracic Tumours”. Cancer/Radiothérapie, Vol. 14, No. 6–7, pp. 446–454, Oct 2010.
[Volz 11] S. Volz, A. Bruhn, L. Valgaerts, and H. Zimmer. “Modeling Temporal Coherence for Optical Flow”. In: International Conference on Computer Vision (ICCV), pp. 1116–1123, IEEE, Nov 2011.
[Vos 04] F. Vos, P. W. de Bruin, J. G. M. Aubel, G. J. Streekstra, M. Maas, L. J. van Vliet, and A. M. Vossepoel. “A Statistical Shape Model without Using Landmarks”. In: International Conference on Pattern Recognition (ICPR), pp. 714–717, IEEE, Aug 2004.
[Walc 12] N. Walczak, J. Fasching, W. D. Toczyski, R. Sivalingam, N. D. Bird, K. Cullen, V. Morellas, B. Murphy, G. Sapiro, and N. Papanikolopoulos. “A Nonintrusive System for Behavioral Analysis of Children using Multiple RGB+Depth Sensors”. In: Workshop on the Applications of Computer Vision (WACV), pp. 217–222, IEEE, Jan 2012.
[Wald 09] T. Waldron. “External Surrogate Measurement and Internal Target Motion: Photogrammetry as a Tool in IGRT”. In: ACMP Annual Meeting, p. 10059, 2009.
[Wang 12] X. L. Wang, P. J. Stolka, E. Boctor, G. Hager, and M. Choti. “The Kinect as an Interventional Tracking System”. In: SPIE Medical Imaging, pp. 83160U–6, Feb 2012.
[Warr 12] A. Warren, P. Mountney, D. Noonan, and G.-Z. Yang. “Horizon Stabilized–Dynamic View Expansion for Robotic Assisted Surgery (HS-DVE)”. International Journal of Computer Assisted Radiology and Surgery, Vol. 7, No. 2, pp. 281–288, Mar 2012.
[Wasz 11a] J. Wasza, S. Bauer, and J. Hornegger. “Real-time Preprocessing for Dense 3-D Range Imaging on the GPU: Defect Interpolation, Bilateral Temporal Averaging and Guided Filtering”. In: ICCV Workshop on Consumer Depth Cameras for Computer Vision (CDC4CV), pp. 1221–1227, IEEE, Nov 2011.
[Wasz 11b] J. Wasza, S. Bauer, S. Haase, M. Schmid, S. Reichert, and J. Hornegger. “RITK: The Range Imaging Toolkit - A Framework for 3-D Range Image Stream Processing”. In: P. Eisert, J. Hornegger, and K. Polthier, Eds., International Workshop on Vision, Modelling and Visualization (VMV), pp. 57–64, Eurographics Association, 2011.
[Wasz 11c] J. Wasza, S. Bauer, and J. Hornegger.
“High Performance GPU-Based Preprocessing for Time-of-Flight Imaging in Medical Applications”. In: H. Handels, J. Ehrhardt, T. M. Deserno, H.-P. Meinzer, and T. Tolxdorff, Eds., Bildverarbeitung für die Medizin (BVM), pp. 324–328, Springer, Mar 2011.
[Wasz 12a] J. Wasza, S. Bauer, S. Haase, and J. Hornegger. “Sparse Principal Axes Statistical Surface Deformation Models for Respiration Analysis and Classification”. In: T. Tolxdorff, T. M. Deserno, H. Handels, and H.-P. Meinzer, Eds., Bildverarbeitung für die Medizin (BVM), pp. 316–321, Springer, Mar 2012.
[Wasz 12b] J. Wasza, S. Bauer, and J. Hornegger. “Real-time Motion Compensated Patient Positioning and Non-rigid Deformation Estimation using 4-D Shape Priors”. In: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 576–583, LNCS 7511, Part II, Springer, Oct 2012.
[Wasz 13] J. Wasza, S. Bauer, and J. Hornegger. “Real-time Respiratory Motion Analysis Using Manifold Ray Casting of Volumetrically Fused Multi-View Range Imaging”. In: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 116–123, LNCS 8150, Part II, Springer, Sep 2013.
[Weik 97] S. Weik. “Registration of 3-D Partial Surface Models using Luminance and Depth Information”. In: International Conference on 3-D Digital Imaging and Modeling (3DIM), pp. 93–100, IEEE, May 1997.
[West 08] J. West. Respiratory Physiology: The Essentials. Point (Lippincott Williams and Wilkins) Series, Wolters Kluwer Health/Lippincott Williams & Wilkins, 2008.
[Whel 12] T. Whelan, H. Johannsson, M. Kaess, J. Leonard, and J. McDonald. “Robust Tracking for Real-Time Dense RGB-D Mapping with Kintinuous”. Tech. Rep. MIT-CSAIL-TR-2012-031, MIT, Sep 2012.
[Whit 01] R. Whitaker and X. Xue. “Variable-Conductance, Level-Set Curvature for Image Denoising”. In: International Conference on Image Processing (ICIP), pp. 142–145, IEEE, Oct 2001.
[Wilb 08] J. Wilbert, J. Meyer, K. Baier, M. Guckenberger, C. Herrmann, R. Hess, C. Janka, L. Ma, T. Mersebach, A. Richter, M. Roth, K. Schilling, and M. Flentje. “Tumor Tracking and Motion Compensation with an Adaptive Tumor Tracking System (ATTS): System Description and Prototype Testing”. Medical Physics, Vol. 35, pp. 3911–3921, 2008.
[Will 06] T. R. Willoughby, A. R. Forbes, D. Buchholz, K. M. Langen, T. H. Wagner, O. A. Zeidan, P. A. Kupelian, and S. L. Meeks. “Evaluation of an Infrared Camera and X-ray System using Implanted Fiducials in Patients with Lung Tumors for Gated Radiation Therapy”. International Journal of Radiation Oncology Biology Physics, Vol. 66, No. 2, pp. 568–575, 2006.
[Will 09] T. Willoughby. “Performance-Based QA for Radiotherapy: TG-147 - QA for Non-Radiographic Localization Systems”. Medical Physics, Vol. 36, No. 6, pp. 2743–2744, 2009.
[Will 12] T. Willoughby, J. Lehmann, J. A. Bencomo, S. K. Jani, L. Santanam, A. Sethi, T. D. Solberg, W. A. Tome, and T. J. Waldron. “Quality Assurance for Nonradiographic Radiotherapy Localization and Positioning Systems: Report of Task Group 147”. Medical Physics, Vol. 39, No. 4, pp. 1728–1747, Apr 2012.
[Wu 12] D. Wu, M. O’Toole, A. Velten, A. Agrawal, and R. Raskar. “Decomposing Global Light Transport using Time of Flight Imaging”. In: International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 366–373, IEEE, Jun 2012.
[Xu 98] Z. Xu, R. Schwarte, H. Heinol, B. Buxbaum, and T. Ringbeck. “Smart Pixel - Photonic Mixer Device (PMD) / New System Concept of a 3D-Imaging-on-a-Chip”.
In: International Conference on Mechatronics and Machine Vision in Practice, pp. 259–264, 1998.
[Yaha 07] G. Yahav, G. Iddan, and D. Mandelboum. “3D Imaging Camera for Gaming Application”. In: International Conference on Consumer Electronics (ICCE), Digest of Technical Papers, pp. 1–2, IEEE, Jan 2007.
[Yama 07] Y. Yamauchi. “Non-optical Expansion of Field-of-view of the Rigid Endoscope”. In: R. Magjarevic and J. Nagel, Eds., World Congress on Medical Physics and Biomedical Engineering, pp. 4184–4186, Springer, Aug 2007.
[Yan 06] H. Yan, F.-F. Yin, G.-P. Zhu, M. Ajlouni, and J. H. Kim. “The Correlation Evaluation of a Tumor Tracking System using Multiple External Markers”. Medical Physics, Vol. 33, No. 11, pp. 4073–4084, 2006.
[Yu 12] M.-C. Yu, H. Wu, J.-L. Liou, M.-S. Lee, and Y.-P. Hung. “Breath and Position Monitoring during Sleeping with a Depth Camera”. In: International Conference on Health Informatics (HEALTHINF), pp. 12–22, Feb 2012.
[Zaha 09] A. Zaharescu, E. Boyer, K. Varanasi, and R. P. Horaud. “Surface Feature Detection and Description with Applications to Mesh Matching”. In: International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 373–380, IEEE, Jun 2009.
[Zale 10] Z. Zalevsky, A. Shpunt, A. Maizels, and J. Garcia. “Method and System for Object Reconstruction”. Patent, Jul 2010. US20100177164A1.
[Zhan 00a] Y. Zhang and C. Kambhamettu. “Integrated 3D Scene Flow and Structure Recovery from Multiview Image Sequences”. In: International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 674–681, IEEE, Jun 2000.
[Zhan 00b] Z. Zhang. “A Flexible New Technique for Camera Calibration”. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 11, pp. 1330–1334, 2000.
[Zhan 94] Z. Zhang. “Iterative Point Matching for Registration of Free-form Curves and Surfaces”. International Journal of Computer Vision, Vol. 13, No. 2, pp. 119–152, Oct 1994.
[Zhen 10] B. Zheng, J. Takamatsu, and K. Ikeuchi. “An Adaptive and Stable Method for Fitting Implicit Polynomial Curves and Surfaces”. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 32, No. 3, pp. 561–568, 2010.
[Zito 03] B. Zitová and J. Flusser. “Image Registration Methods: a Survey”. Image and Vision Computing, Vol. 21, No. 11, pp. 977–1000, 2003.