Rigid and Non-Rigid Surface Registration for Range Imaging Applications in Medicine
Starre und nicht-starre Registrierung von Oberflächen für den Einsatz der Tiefenbildgebung in der Medizin
Submitted to the Faculty of Engineering
of Friedrich-Alexander-Universität Erlangen-Nürnberg
in fulfillment of the requirements
for the doctoral degree Dr.-Ing.

Submitted by
Sebastian Bauer
from Marktheidenfeld
Approved as a dissertation
by the Faculty of Engineering of
Friedrich-Alexander-Universität Erlangen-Nürnberg
Date of the doctoral examination: 24 September 2014
Chair of the doctoral committee: Prof. Dr.-Ing. habil. M. Merklein
Reviewers: Prof. Dr.-Ing. J. Hornegger, Prof. Dr. rer. nat. M. Rumpf
Abstract
The introduction of low-cost range imaging technologies that are capable of
acquiring the three-dimensional geometry of an observed scene in an accurate,
dense, and dynamic manner holds great potential for manifold applications in
health care. Over the past few years, the use of range imaging modalities has
been proposed for guidance in computer-assisted procedures, monitoring of interventional workspaces for safe robot-human interaction and workflow analysis,
touch-less user interaction in sterile environments, and for application in early
diagnosis and elderly care, among others. This thesis is concerned with the application of range imaging technologies in computer-assisted and image-guided
interventions, where the geometric alignment of range imaging data to a given
reference shape – either also acquired with range imaging technology or extracted
from tomographic planning data – poses a fundamental challenge. In particular,
we propose methods for both rigid and non-rigid surface registration that are tailored to cope with the specific properties of range imaging data.
In the first part of this work, we focus on rigid surface registration problems.
We introduce a point-based alignment approach based on matching customized
local surface features and estimating a global transformation from the set of detected correspondences. The approach is capable of handling gross initial misalignments and the multi-modal case of aligning range imaging data to tomographic shape data. We investigate its application in image-guided open hepatic
surgery and automatic patient setup in fractionated radiation therapy. For the
rigid registration of surface data that exhibit only slight misalignments, such as
with on-the-fly scene reconstruction using a hand-guided moving range imaging
camera, we extend the classical iterative closest point algorithm to incorporate
both geometric and photometric information. In particular, we investigate the use
of acceleration structures for efficient nearest neighbor search to achieve real-time
performance, and quantify the benefit of incorporating photometric information
in endoscopic applications with a comprehensive simulation study.
The emphasis of the second part of this work is on variational methods for
non-rigid surface registration. Here, we target respiratory motion management
in radiation therapy. The proposed methods estimate dense surface motion fields
that describe the elastic deformation of the patient's body. These motion fields can serve as a high-dimensional respiration surrogate that reflects the complexity of human respiration substantially better than conventionally used low-dimensional surrogates. We propose three methods for different range imaging sensors and thereby
account for the particular strengths and limitations of the individual modalities.
For dense but noisy range imaging data, we propose a framework that solves the
intertwined tasks of range image denoising and its registration with an accurate
planning shape in a joint manner. For accurate but sparse range imaging data,
we introduce a method that aligns sparse measurements with a dense reference
shape while simultaneously reconstructing a dense displacement field describing
the non-rigid deformation of the body surface. For range imaging sensors that additionally capture photometric information, we investigate the estimation of surface motion fields driven by this complementary source of information.
Kurzfassung (German Abstract)

Low-cost range imaging technologies that can acquire the three-dimensional geometry of an object accurately, densely, and dynamically hold great potential for applications in health care. Recently, range imaging has been proposed for navigation in computer-assisted interventions, for collision avoidance in robot-assisted operating rooms, for the analysis of clinical workflows, for touch-less user interaction in sterile environments, and for applications in early diagnosis and elderly care. The present work is concerned with the use of range imaging in computer-assisted and image-guided interventions. A particular challenge in this setting is the registration of range imaging data to a reference shape that was either also acquired with a range imaging camera or extracted from tomographic planning data. Specifically, methods for the rigid and non-rigid registration of surfaces are developed that are tailored to range imaging data.

The first part of the thesis addresses rigid registration problems. We present a point-based registration approach that matches local surface features and estimates a global transformation from the detected correspondences. It is suited to problems with large initial misalignments and to the multi-modal registration of range imaging data with tomographic reference shapes. We investigate the application of the method in image-guided open liver surgery and for automated patient positioning in fractionated radiation therapy. For the rigid registration of surfaces that are only slightly displaced with respect to each other, as in the successive reconstruction of a scene with a hand-guided range imaging camera, we extend the classical iterative closest point algorithm to the joint analysis of geometric and photometric information. In doing so, we investigate the potential of acceleration structures and quantify the benefit of this photo-geometric approach for endoscopic applications in a simulation study.

The focus of the second part of the thesis is on variational methods for non-rigid surface registration. Here, we address the management of respiratory motion in radiation therapy. The proposed methods estimate dense surface motion fields that describe the elastic deformation of the patient's body. These motion fields serve as a high-dimensional respiration surrogate and represent the complexity of human respiration substantially better than conventional low-dimensional surrogates. We present three approaches for different range imaging sensors, tailored to their specific strengths and weaknesses: For dense but noisy range images, we propose a method that solves the denoising of range data and its registration to an accurate planning shape in a combined formulation. For accurate but sparse range images, we introduce a method that registers the sparse measurements with a reference shape while simultaneously estimating a dense surface motion field. For sensors that additionally acquire photometric information, we investigate the estimation of motion fields by means of this complementary data.
Acknowledgments
First and foremost, let me express my sincere gratitude to Prof. Dr.-Ing. Joachim
Hornegger for the opportunity to work in such an inspiring research environment.
In particular, I appreciate his outstanding confidence in me, his encouragement,
support and guidance over the years – not only as a scientific mentor –, the freedom he allowed me regarding the contents of my work, and his efforts in setting up the collaborations that formed the basis of this thesis.
I would like to thank Prof. Dr. Martin Rumpf (University of Bonn) and Prof. Dr.
Benjamin Berkels (RWTH Aachen University) for the intense collaboration in joint
projects throughout this work. I enjoyed the winter months in Bonn, being a guest
at the lab, and deeply appreciate the valuable discussions on setting up mathematical models, variational methods, finite elements, optimization techniques and the
QuocMesh framework, which makes an engineer's life easier.
Many thanks to my colleagues at the Pattern Recognition Lab, for the pleasant
and friendly atmosphere at the lab, for the ongoing knowledge sharing, and for the
joyful time outside working hours. In particular, let me thank Jakob Wasza and
Sven Haase for the great time in our office, for the efforts in setting up a powerful
development environment (RITK), and for endless scientific discussions that had
a tremendous impact on this work. Among the students I have supervised, let
me acknowledge Kerstin Müller, Dominik Neumann and Felix Lugauer for their
excellent work that contributed to several publications. Let me also particularly
thank Jakob Wasza for his meticulous review of this thesis.
I acknowledge support by the European Regional Development Fund (ERDF)
and the Bayerisches Staatsministerium für Wirtschaft, Infrastruktur, Verkehr und
Technologie (StMWIVT), in the context of the R&D program IuK Bayern under
Grant No. IUK338/001, and by the Graduate School of Information Science in
Health (GSISH) and the Technische Universität München Graduate School.
Thanks to our industrial partners at Siemens AG, Healthcare Sector, and Softgate GmbH: Dr. Natalia Anderl and Dr. Annemarie Bakai for the background in
clinical workflows, Stefan Sattler and Stefan Schuster for the opportunity to serve
new application fields beyond radiation therapy, and Dr. Florian Höpfl, Sebastian Reichert and Christiane Kupczok for the support in project management and
camera calibration. I would further like to acknowledge Prof. Dr. Gerd Häusler
and Dr. Svenja Ettl (Institute of Optics, Information and Photonics, University
of Erlangen-Nürnberg) for the opportunity to investigate the active triangulation
sensor for medical applications, Dr. Anja Borsdorf, Dr. Holger Kunze (Siemens
AG, Healthcare Sector) and Prof. Dr. Arnd Dörfler (Department of Neuroradiology, Erlangen University Clinic) for their support in data acquisition, and Dr. Elli
Angelopoulou for her great support in improving our manuscripts.
Last but not least, I am deeply thankful to my wife and my family for their
patience and support over the years.
Erlangen, 26.04.2014
Sebastian Bauer
Contents

Chapter 1: Introduction . . . 1
  1.1 Motivation . . . 1
  1.2 Contributions . . . 2
  1.3 Organization of this Thesis . . . 6

Chapter 2: Range Imaging and Surface Registration in Medicine . . . 9
  2.1 Real-time Range Imaging Technologies . . . 9
    2.1.1 Triangulation . . . 11
    2.1.2 Time-of-Flight Imaging . . . 13
    2.1.3 Discussion of RI Sensors investigated in this Thesis . . . 13
  2.2 Range Image Processing . . . 18
    2.2.1 RITK: The Range Imaging Toolkit . . . 18
    2.2.2 Virtual RGB-D Camera . . . 19
    2.2.3 Range Data Enhancement . . . 19
  2.3 Applications in Medicine . . . 20
    2.3.1 Prevention, Diagnosis and Support . . . 20
    2.3.2 Monitoring for OR Safety and Workflow Analysis . . . 21
    2.3.3 Touchless Interaction and Visualization . . . 22
    2.3.4 Guidance in Computer-assisted Interventions . . . 22
  2.4 Surface Registration . . . 23
    2.4.1 Global vs. Local Surface Registration . . . 24
    2.4.2 Rigid vs. Non-Rigid Surface Registration . . . 25
    2.4.3 Medical Surface Registration . . . 27
  2.5 Discussion and Conclusions . . . 27

Part I: Rigid Surface Registration for Range Imaging Applications in Medicine . . . 29

Chapter 3: Feature-based Multi-Modal Rigid Surface Registration . . . 31
  3.1 Medical Background . . . 32
    3.1.1 Patient Setup in Fractionated Radiation Therapy . . . 32
    3.1.2 Image-Guided Open Liver Surgery . . . 34
  3.2 Related Work . . . 35
  3.3 Feature-based Surface Registration Framework . . . 36
    3.3.1 Correspondence Search . . . 36
    3.3.2 Transformation Estimation . . . 38
  3.4 Shape Descriptors . . . 39
    3.4.1 Spin Images . . . 39
    3.4.2 Mesh Histograms of Oriented Gradients (MeshHOG) . . . 40
    3.4.3 Rotation Invariant Fast Features (RIFF) . . . 43
    3.4.4 Distance Metrics for Feature Matching . . . 44
  3.5 Experiments and Results . . . 45
    3.5.1 Multi-Modal Patient Setup in Fractionated RT . . . 45
    3.5.2 Multi-Modal Data Fusion in IGLS . . . 49
  3.6 Discussion and Conclusions . . . 51

Chapter 4: Photo-geometric Rigid Surface Registration for Endoscopic Reconstruction . . . 55
  4.1 Medical Background . . . 56
    4.1.1 Operation Situs Reconstruction in Laparoscopy . . . 56
    4.1.2 Towards 3-D Model Construction in Colonoscopy . . . 57
  4.2 Related Work . . . 58
  4.3 Photo-geometric Surface Registration Framework . . . 60
    4.3.1 Photo-geometric ICP Scheme . . . 60
    4.3.2 Approximative 6-D Nearest Neighbor Search using RBC . . . 62
  4.4 Experiments and Results . . . 64
    4.4.1 Performance Study . . . 64
    4.4.2 Experiments on Operation Situs Reconstruction . . . 68
    4.4.3 Experiments on Colon Shape Model Construction . . . 72
  4.5 Discussion and Conclusions . . . 75

Part II: Non-Rigid Surface Registration for Range Imaging Applications in Medicine . . . 79

Chapter 5: Joint Range Image Denoising and Surface Registration . . . 81
  5.1 Medical Background . . . 82
    5.1.1 Image-Guided Radiation Therapy . . . 82
    5.1.2 Respiration-Synchronized Dose Delivery . . . 83
    5.1.3 Dense Deformation Tracking . . . 84
  5.2 Related Work . . . 85
  5.3 Non-Rigid Surface Registration Framework . . . 86
    5.3.1 Geometric Configuration . . . 86
    5.3.2 Definition of the Registration Energy . . . 87
  5.4 A Joint Denoising and Registration Approach . . . 88
    5.4.1 Definition of the Registration Energy . . . 89
    5.4.2 Numerical Optimization . . . 90
  5.5 Experiments and Results . . . 90
    5.5.1 Materials and Methods . . . 91
    5.5.2 Results . . . 94
  5.6 Discussion and Conclusions . . . 97

Chapter 6: Sparse-to-Dense Non-Rigid Surface Registration . . . 101
  6.1 Motivation and Related Work . . . 102
  6.2 Sparse-to-Dense Surface Registration Framework . . . 103
    6.2.1 Geometric Configuration . . . 104
    6.2.2 Definition of the Registration Energy . . . 105
    6.2.3 Numerical Optimization . . . 106
  6.3 Experiments and Results . . . 107
    6.3.1 Materials and Methods . . . 107
    6.3.2 Results . . . 110
  6.4 Discussion and Conclusions . . . 115

Chapter 7: Photometry-driven Non-Rigid Surface Registration . . . 119
  7.1 Motivation and Related Work . . . 120
  7.2 Materials and Methods . . . 121
    7.2.1 Photometry-Driven Surface Registration . . . 122
    7.2.2 Geometry-Driven Surface Registration . . . 123
  7.3 Experiments and Results . . . 123
    7.3.1 Materials and Methods . . . 123
    7.3.2 Results . . . 124
  7.4 Discussion and Conclusions . . . 125

Chapter 8: Outlook . . . 129

Chapter 9: Summary . . . 135

Appendix A . . . 139
  A.1 Projection Geometry . . . 139
    A.1.1 Perspective Projection . . . 139
    A.1.2 3-D Point Cloud Reconstruction . . . 140
    A.1.3 Range Image Data Representation . . . 141
  A.2 Joint Range Image Denoising and Surface Registration . . . 141
    A.2.1 Approximation of the Matching Energy . . . 141
    A.2.2 Derivation of the First Variations . . . 142
  A.3 Sparse-to-dense Non-Rigid Surface Registration . . . 144
    A.3.1 Derivation of the First Variations . . . 144
    A.3.2 Improved Projection Approximation . . . 144
    A.3.3 Detailed Results of the Prototype Study . . . 145

List of Symbols . . . 147
List of Abbreviations . . . 151
List of Figures . . . 153
List of Tables . . . 155
Bibliography . . . 157
Chapter 1
Introduction

1.1 Motivation . . . 1
1.2 Contributions . . . 2
1.3 Organization of this Thesis . . . 6
Computer assistance has become increasingly important in medicine over the past
decades. One of the key requirements in this context is a robust localization and
dynamic tracking of the target objects involved in the specific medical procedure.
Guidance and navigation concepts in computer-assisted interventions are based
on establishing the spatial relationship between the patient anatomy and the medical instruments used during the intervention. This typically involves the registration of intra-interventionally acquired data – describing the patient anatomy
during the intervention – to pre-interventionally acquired patient-specific anatomical models. One of the fundamental prerequisites to perform this registration is
a dynamic, accurate, and robust acquisition of the patient anatomy during the
intervention. So far, this has been addressed using either optical or electromagnetic tracking technologies that require markers to be attached to the target, or
by means of intra-interventional radiographic imaging. While marker-based approaches often complicate the workflow and are thus not widely accepted in clinical routine, radiographic imaging implies a substantial radiation exposure to the
patient and/or the physician.
In contrast, real-time range imaging (RI) offers a marker-less and radiation-free alternative for the acquisition of intra-interventional data in computer-assisted interventions. Indeed, with the availability of dynamic, dense, and low-cost technologies, RI-based techniques have experienced a remarkable development in this context and have found numerous applications in the clinical environment,
far beyond marker-less localization. In this chapter, we outline the motivation for
this thesis and specify our scientific contributions to the field of surface registration
for RI applications in medicine.
1.1 Motivation
Registration has emerged as one of the key technologies in medical image computing and is an essential component in various applications in computer-assisted
diagnosis and intervention. In general, registration denotes the process of finding an optimal geometric transformation that brings a moving template dataset into
congruence with a fixed reference dataset. In practice, registration problems are
addressed by specifying a suitable mathematical model of transformations for the
desired alignment and estimating the model parameters by optimizing a dedicated
objective function. Depending on the particular application, the spatial correspondence between the template and the reference dataset can be of rigid or elastic
nature. Classical medical image registration tasks involve the alignment of planar
images (2-D/2-D registration), the alignment of volumetric datasets (3-D/3-D registration), projective alignment techniques (2-D/3-D registration), and the alignment of shapes (2-D contours or 3-D surfaces). First and foremost, the importance of medical image registration is driven by the growing and diverse variety
of imaging modalities. This trend demands methods that combine complementary data from multiple modalities, make them easily accessible to the physician, and supersede the traditional mental fusion. Typical scenarios involve the combination of morphological data, e.g. from computed tomography (CT), magnetic resonance imaging (MRI) or ultrasound (US), with functional information, e.g. from
positron emission tomography (PET) or single-photon emission computed tomography (SPECT) imaging. Second, medical image registration provides the basis
for intra-subject monitoring of spatio-temporal progression in longitudinal studies
and inter-subject comparison with anatomical atlases and statistical shape models.
Third, it can be applied in a joint manner with related medical image computing
tasks such as denoising and segmentation. Over the past two decades, a broad
spectrum of approaches for rigid and non-rigid, parametric and non-parametric
registration with numerous options in terms of distance measures, regularizers
and optimization schemes has evolved. For a survey let us refer to standard literature in the field [Hajn 01, Main 98, Mark 12, Mode 03a, Mode 09, Ruec 11].
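To make this parameter-estimation view concrete, the following minimal sketch (in Python with NumPy; the toy data are hypothetical) fits a rigid transformation to two corresponding 3-D point sets by minimizing the sum of squared distances, using the classical SVD-based closed-form solution. It is a generic textbook construction, not the specific method of any later chapter.

```python
import numpy as np

def estimate_rigid_transform(P, Q):
    """Least-squares rigid alignment (Kabsch/Umeyama): find R, t that
    minimize sum_i ||R @ P[i] + t - Q[i]||^2 for N x 3 point sets P, Q
    with known one-to-one correspondences."""
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)      # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # correct for a possible reflection so that det(R) = +1
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = q_mean - R @ p_mean
    return R, t

# toy example: recover a known rotation about the z-axis and a translation
rng = np.random.default_rng(0)
P = rng.normal(size=(100, 3))
a = np.deg2rad(30.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
t_true = np.array([0.1, -0.2, 0.5])
R, t = estimate_rigid_transform(P, P @ R_true.T + t_true)
print(np.allclose(R, R_true), np.allclose(t, t_true))   # True True
```

The same closed-form step reappears inside iterative schemes such as the ICP algorithm discussed below, where the correspondences themselves are re-estimated in every iteration.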
In the last few years, significant advances in optics, electronics, sensor design,
and computing power have rendered 3-D range imaging (RI) at dense resolutions,
real-time frame rates and low manufacturing costs possible. These novel RI technologies hold benefits for a multitude of medical applications. Many of these
applications involve the registration of an acquired 3-D shape with a reference
model, for instance, the intra-fractional registration of the external patient body
surface with tomographic planning data for patient setup and respiratory motion
management in radiation therapy (RT), or the intra-operative registration of the
operation situs with pre-operative planning data for augmented reality navigation and guidance. Hence, RI-based surface registration in medicine is a rapidly
evolving field of research [Baue 13a]. This thesis is embedded in the context of RI-based applications in image-guided interventions, focusing on tasks that involve
shape correspondence problems. We present novel concepts based on dynamic
3-D perception to improve the quality, safety and efficiency of clinical workflows
and propose both rigid and non-rigid surface registration techniques that are optimized w.r.t. the strengths and limitations of different RI technologies.
1.2 Contributions

The scientific focus of this work lies on the development of novel rigid and non-rigid surface registration techniques that meet the requirements of RI-based medical applications. In addition, we investigate new clinical applications for image-guided open surgery, minimally invasive procedures, and radiation therapy. Some
of the proposed methods follow up on ideas that have been introduced before.
Hence, let us briefly summarize the main contributions of this thesis to the progress
of research in the field of medical surface registration, along with the associated
scientific publications.
Contributions to the Field of Rigid Surface Registration
First, we outline our contributions to the field of rigid surface registration for range
imaging applications in medicine:
• We propose a novel feature-based method for marker-less rigid alignment
of intra-procedural range imaging data with pre-operative surface data extracted from tomographic imaging modalities. Regarding the challenge of
multi-modal registration, we introduce customized 3-D shape descriptors
that meet the following specific requirements: invariance to mesh density,
mesh organization and inter-modality deviations in surface topography that
result from the underlying sampling principles. Furthermore, we investigate
the application of the proposed method in image-guided liver surgery and
radiation therapy. Methods and results are detailed in Chap. 3 and have
been presented at two conferences:
[Mull 11]
K. Müller, S. Bauer, J. Wasza, and J. Hornegger. Automatic
Multi-modal ToF/CT Organ Surface Registration. In: Proceedings of Bildverarbeitung für die Medizin (BVM), pp. 154–158, Springer, Mar 2011.
[Baue 11a]
S. Bauer, J. Wasza, S. Haase, N. Marosi, and J. Hornegger.
Multi-modal Surface Registration for Markerless Initial Patient
Setup in Radiation Therapy using Microsoft’s Kinect Sensor. In:
Proceedings of IEEE International Conference on Computer
Vision (ICCV) Workshops, Workshop on Consumer Depth
Cameras for Computer Vision (CDC4CV), pp. 1175–1181,
IEEE, Nov 2011.
• We propose a photo-geometric variant of the iterative closest point (ICP) algorithm in combination with an efficient nearest neighbor search scheme.
Incorporating photometric information into the registration process is of particular interest for modern RI sensors that exhibit a low signal-to-noise ratio
(SNR) in the range domain but acquire complementary high-grade photometric information. We investigate the benefits of this photo-geometric registration framework for two prospective clinical applications: optical 3-D colonoscopy and laparoscopic interventions. To overcome the traditional bottleneck in nearest neighbor search space traversal, we propose a variant of
a recently published scheme by Cayton [Cayt 10, Cayt 11] that we have optimized in terms of performance (a minimal sketch of the photo-geometric matching follows the references below). Methods and results are detailed in Chap. 4
and have been presented at a conference and published as a book chapter:
[Neum 11]
D. Neumann, F. Lugauer, S. Bauer, J. Wasza, and J. Hornegger. Real-time RGB-D Mapping and 3-D Modeling on the
GPU using the Random Ball Cover Data Structure. In:
Proceedings of IEEE International Conference on Computer Vision (ICCV) Workshops, Workshop on Consumer
Depth Cameras for Computer Vision (CDC4CV), pp. 1161–
1167, IEEE, Nov 2011.
[Baue 13b]
S. Bauer, J. Wasza, F. Lugauer, D. Neumann, and J. Hornegger. Real-time RGB-D Mapping and 3-D Modeling on the GPU using the Random Ball Cover. In: Consumer Depth Cameras for Computer Vision: Research Topics and Applications, pp. 27–48, Advances in Computer Vision and Pattern Recognition, Springer, 2013.
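As announced above, here is a minimal CPU sketch of one photo-geometric ICP iteration in a joint 6-D space. It reuses estimate_rigid_transform from the sketch in Sect. 1.1 and substitutes SciPy's k-d tree for the GPU-based random ball cover employed in this work; the color_weight parameter is a hypothetical choice that balances photometric against geometric distances.

```python
import numpy as np
from scipy.spatial import cKDTree
# estimate_rigid_transform: see the sketch in Sect. 1.1

def photo_geometric_icp_step(src_xyz, src_rgb, dst_xyz, dst_rgb,
                             color_weight=0.1):
    """One ICP iteration with photo-geometric correspondences:
    nearest neighbors are searched in the joint 6-D space
    [x, y, z, w*r, w*g, w*b], then a rigid transform is estimated
    from the resulting matches."""
    src6 = np.hstack([src_xyz, color_weight * src_rgb])
    dst6 = np.hstack([dst_xyz, color_weight * dst_rgb])
    tree = cKDTree(dst6)            # stand-in for the random ball cover
    _, idx = tree.query(src6)       # photo-geometric nearest neighbors
    R, t = estimate_rigid_transform(src_xyz, dst_xyz[idx])
    return src_xyz @ R.T + t, R, t  # updated source points and pose
```

In a full ICP loop, this step would be repeated until the pose increment falls below a threshold; weighting the color channels up or down shifts the matching between the photometric and the geometric domain.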
Contributions to the Field of Non-Rigid Surface Registration
Second, we outline our contributions to the field of non-rigid surface registration
for range imaging applications in medicine. Overall, we propose three novel methods for the estimation of dense 4-D surface motion fields (3-D+time), describing the elastic deformation of the patient’s external body under the influence of
respiration. Dense respiratory motion tracking holds great potential for motion
compensation techniques in radiation therapy, as it better reflects the complexity of respiratory motion compared to conventionally used 1-D respiration surrogates [Faya 11, Yan 06]. All three approaches are optimized w.r.t. the strengths and
limitations of different range imaging technologies:
• We propose a novel variational framework for joint denoising of range
imaging data and its non-rigid registration to a reference surface. Our
experiments show that solving both tasks of denoising and registration in
a simultaneous manner is superior to a sequential approach where surface
registration is performed after denoising of noisy RI measurements. This
allows a robust estimation of dense 4-D surface motion fields with range
imaging modalities that exhibit a low SNR. Methods and results are detailed
in Chap. 5 and have been presented at a conference (a toy sketch of the joint formulation follows the reference below):
[Baue 12b]
S. Bauer, B. Berkels, J. Hornegger, and M. Rumpf. Joint ToF
Image Denoising and Registration with a CT Surface in Radiation Therapy. In: Proceedings of International Conference
on Scale Space and Variational Methods in Computer Vision (SSVM), pp. 98–109, Springer, May 2012.
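The following 1-D toy sketch (with hypothetical signals and weights) illustrates the joint idea on a much simpler problem than the variational surface model of Chap. 5: a single energy couples a data term, a smoothness term, and a matching term to a shifted reference, and is minimized by alternating a brute-force registration step with a linear denoising step.

```python
import numpy as np

def joint_denoise_register_1d(f, g, alpha=5.0, beta=1.0, n_iter=10):
    """Toy analogue of joint denoising and registration: alternately
    minimize E(u, t) = ||u - f||^2 + alpha*||D u||^2 + beta*||u - shift(g, t)||^2
    over the denoised signal u and an integer shift t of the reference g."""
    n = len(f)
    D = np.diff(np.eye(n), axis=0)                  # forward differences
    A = (1.0 + beta) * np.eye(n) + alpha * D.T @ D  # normal-equations matrix
    u, t = f.copy(), 0
    for _ in range(n_iter):
        # registration step: brute-force search over candidate shifts
        t = min(range(-n // 4, n // 4),
                key=lambda s: np.sum((u - np.roll(g, s)) ** 2))
        # denoising step: quadratic in u, solved exactly
        u = np.linalg.solve(A, f + beta * np.roll(g, t))
    return u, t

# reference bump; the template is a shifted, noisy copy of it
x = np.linspace(0.0, 1.0, 128)
g = np.exp(-((x - 0.5) / 0.08) ** 2)
f = np.roll(g, 9) + 0.1 * np.random.default_rng(1).normal(size=x.size)
u, t = joint_denoise_register_1d(f, g)
print("estimated shift:", t)   # close to the true shift of 9
```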
• We propose the application of a novel RI sensor that acquires sparse but
highly accurate 3-D position measurements in real-time. These are registered with a dense reference surface extracted from planning data. Thereby
a dense displacement field is recovered which describes the elastic spatiotemporal deformation of the complete patient body surface. In particular,
the proposed approach involves the estimation of dense 4-D surface motion fields from sparse measurements using prior shape knowledge from
planning data. It yields both a reconstruction of the instantaneous patient
shape and a high-dimensional respiratory surrogate for respiratory motion
tracking. Methods and results are detailed in Chap. 6 and have been presented at a conference and published in a journal article (a toy sketch of the sparse-to-dense idea follows these references):
[Baue 12a]
S. Bauer, B. Berkels, S. Ettl, O. Arold, J. Hornegger, and
M. Rumpf. Marker-less Reconstruction of Dense 4-D Surface Motion Fields using Active Laser Triangulation for Respiratory Motion Management. In: Proceedings of International
Conference on Medical Image Computing and Computer
Assisted Intervention (MICCAI), pp. 414–421, LNCS 7510,
Part I, Springer, Oct 2012.
[Berk 13]
B. Berkels, S. Bauer, S. Ettl, O. Arold, J. Hornegger, and
M. Rumpf. Joint Surface Reconstruction and 4-D Deformation
Estimation from Sparse Data and Prior Knowledge for MarkerLess Respiratory Motion Tracking. In: Medical Physics,
Vol. 40, No. 9, pp. 091703 1–10, Sep 2013.
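As a toy analogue of the sparse-to-dense idea (and not the finite-element formulation of Chap. 6), the following sketch recovers a dense 1-D displacement field from a handful of point measurements by solving a Laplacian-regularized least-squares problem; the regularization weight lam is a hypothetical choice.

```python
import numpy as np

def sparse_to_dense_1d(n, idx, values, lam=10.0):
    """Recover a dense displacement field d (length n) from sparse
    measurements d[idx] ~ values by minimizing
        ||S d - values||^2 + lam * ||L d||^2,
    where S samples d at idx and L is a discrete 1-D Laplacian."""
    S = np.zeros((len(idx), n))
    S[np.arange(len(idx)), idx] = 1.0
    L = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
         + np.diag(np.ones(n - 1), -1))
    A = S.T @ S + lam * L.T @ L     # symmetric positive definite
    b = S.T @ np.asarray(values, dtype=float)
    return np.linalg.solve(A, b)

# five sparse samples of a smooth field, densified to 100 points
d = sparse_to_dense_1d(100, idx=[5, 25, 50, 75, 95],
                       values=[0.0, 0.8, 1.0, 0.6, 0.1])
```

Here the smoothness term plays the role of the prior knowledge: it propagates the sparse evidence into the unobserved regions.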
• For RI sensors that provide aligned geometric and photometric information,
we propose a method that performs the reconstruction of the geometric surface motion field by estimating the non-rigid transformation in the photometric image domain using a variational optical flow formulation. From this
photometric 2-D displacement field and the known associated range measurements, the 3-D surface motion field is deduced. Methods and results are
detailed in Chap. 7 and have been presented at a conference (a minimal sketch of the lifting step follows the reference below):
[Baue 12d]
S. Bauer, J. Wasza, and J. Hornegger. Photometric Estimation of 3-D Surface Motion Fields for Respiration Management.
In: Proceedings of Bildverarbeitung für die Medizin (BVM),
pp. 105–110, Springer, Mar 2012.
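A minimal sketch of the lifting step, assuming a pinhole camera with (hypothetical) intrinsics fx, fy, cx, cy and a precomputed 2-D flow field: each pixel is backprojected in the first frame, displaced by the flow, and looked up in the second depth map; the difference of the two 3-D points is the surface motion vector. Nearest-pixel lookup stands in for the bilinear interpolation one would use in practice.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Per-pixel 3-D points from a depth map (pinhole model)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    return np.stack([(u - cx) * depth / fx,
                     (v - cy) * depth / fy,
                     depth], axis=-1)

def lift_flow_to_3d(flow, depth0, depth1, fx, fy, cx, cy):
    """3-D surface motion field from a 2-D photometric displacement
    field (flow[..., 0] = du, flow[..., 1] = dv) and two depth maps."""
    h, w = depth0.shape
    p0 = backproject(depth0, fx, fy, cx, cy)
    p1 = backproject(depth1, fx, fy, cx, cy)
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    u1 = np.clip(np.rint(u + flow[..., 0]).astype(int), 0, w - 1)
    v1 = np.clip(np.rint(v + flow[..., 1]).astype(int), 0, h - 1)
    return p1[v1, u1] - p0          # per-pixel 3-D motion vectors
```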
In addition to the aforementioned scientific contributions, we have developed a
powerful framework for high-performance RI processing and rapid prototyping,
far beyond surface registration, named range imaging toolkit (RITK) [Wasz 11b].
RITK is released as an open-source platform and is thus another contribution to the
scientific community of range image processing and analysis, paving the way for
accelerating the use of range imaging technologies in clinical applications.
Furthermore, we have conducted a comprehensive state-of-the-art survey on
the integration of modern RI technologies in health care applications, published
as a book chapter:
[Baue 13a]
S. Bauer, A. Seitel, H. Hofmann, T. Blum, J. Wasza, M. Balda,
H.-P. Meinzer, N. Navab, J. Hornegger, and L. Maier-Hein.
Real-Time Range Imaging in Health Care: A Survey. In: Time-of-Flight and Depth Imaging. Sensors, Algorithms, and Applications, pp. 228–254, LNCS 8200, Springer, 2013.
The survey identifies promising applications and algorithms, and provides an
overview of recent developments in this emerging domain. We review recent methods and results and discuss open research issues and challenges that
are of fundamental importance for the progression of the field. To our knowledge,
this survey is the first in the literature to address the fast-growing number of research
activities in the context of real-time RI in health care.
Some chapters of this thesis contain material that has been published or submitted to conference proceedings and journals. In addition to the works listed in
the itemization above, this involves several publications that emerged during this
thesis [Baue 11b, Baue 12c, Ettl 12a, Grim 12, Pass 08, Sout 10, Wasz 11c, Wasz 11b,
Wasz 11a, Wasz 12b, Wasz 12a, Wasz 13].
1.3 Organization of this Thesis
Let us outline the structure of this thesis, cf. Fig. 1.1. Chap. 2 provides a comprehensive overview of RI technologies, with a focus on the modalities applied in
this work. We introduce our framework for range image processing and comment
on range image enhancement. In addition, we present a survey of recent developments of RI applications in health care and summarize directions of research in the
field of surface registration and shape correspondence. As we consider different
medical applications within this thesis, the clinical background is discussed in the
individual chapters.
The main body of this thesis is divided into two parts. Part I is concerned with rigid
surface registration techniques. In Chap. 3, we introduce a comprehensive framework for feature-based rigid shape alignment and propose customized 3-D surface
descriptors that meet the specific requirements for multi-modal surface registration. This point-based approach inherently copes with cases of partial matching
and gross misalignments that occur in the applications we address. In particular,
we propose the use of this automatic multi-modal surface registration framework
for two clinical applications: image-guided liver surgery (IGLS) and reproducible
patient setup in fractionated RT. For both applications, we present experimental
results on real data from different RI modalities. Chap. 4 is concerned with rigid
surface registration in the case of slight misalignments. In this context, the ICP
algorithm is an established approach. Previous work had indicated that the incorporation of complementary photometric information into the correspondence
search – as opposed to the classical ICP, which solely considers the geometric domain –
improves alignment quality. Due to computational constraints and the lack of RI
cameras that acquire both 3-D and color data, this combined photo-geometric approach has not been considered for interactive applications before. We particularly
address on-the-fly reconstruction of tubular anatomical shapes holding potential
for 3-D colonoscopy, and the reconstruction of the operation situs for field-of-view expansion in laparoscopic interventions by consecutive alignment of RI streams acquired from a hand-guided moving camera.

Figure 1.1: Organization of this thesis, divided into rigid (left, Part I) and non-rigid (right, Part II) surface registration. For the individual chapters, the main methodological contributions and the addressed medical applications are depicted.
The focus of Part II is on non-rigid methods for surface registration. In terms of
application, Chapters 5-7 address the estimation of dense surface motion fields as
a high-dimensional respiration surrogate holding potentials for motion tracking
and compensation in image-guided diagnosis and interventions. First and foremost, we target motion-compensated dose delivery using external-internal correlation models in radiation therapy. The clinical background is detailed in the
first chapter of Part II. In Chap. 5, we derive a variational formulation for nonrigid registration of dense surface data. Formulated as a classical shape alignment problem, the template surface is deformed to match a given reference while
ensuring a smooth displacement field, thus preserving the original shape characteristics. Extending this basic formulation, we introduce a novel approach that
solves the denoising of dense RI data and its non-rigid registration to a reference surface extracted from planning data in a joint manner. Experimental results confirm that treating the
two intertwined tasks of denoising and registration in a joint manner is beneficial: Incorporating prior knowledge about the reference shape helps substantially
in the denoising process, and proper denoising renders the registration problem
more robust. Chap. 6 investigates the medical potential of a novel RI sensor that
acquires sparse but highly accurate 3-D data in real-time. In particular, we have
developed a sparse-to-dense registration approach that is capable of recovering
the patient’s dense 3-D body surface and estimating a 4-D (3-D+time) surface motion field from sparse sampling data and patient-specific prior shape knowledge
extracted from planning data. The method is validated on a 4-D CT respiration
phantom and evaluated on both synthetic and real data. The experimental results
indicate that a paradigm shift in RI technology – accurate but sparse vs. dense but
noisy – is a promising direction for future research. Chap. 7 takes advantage of
additional photometric information available with modern RGB-D sensors which
capture both color (RGB) and depth (D) information, along the lines of Chap. 4.
We propose an approach that breaks the estimation of surface motion fields down
to a non-rigid image registration problem in the 2-D photometric domain. Based
on this 2-D displacement field, the geometric 3-D motion field is deduced from
the associated depth information. Experimental results on real data indicate that
incorporating the photometric domain as a complementary source of information
can help improve the quality of surface motion fields.
The thesis concludes with an outlook (Chap. 8) and a summary (Chap. 9).
Chapter 2
Range Imaging and Surface Registration in Medicine

2.1 Real-time Range Imaging Technologies . . . 9
2.2 Range Image Processing . . . 18
2.3 Applications in Medicine . . . 20
2.4 Surface Registration . . . 23
2.5 Discussion and Conclusions . . . 27
The recent availability of dynamic, dense, and low-cost range imaging has attracted widespread interest in health care. It opens up new opportunities and has an increasing impact on both research and commercial activities. In this chapter, we first introduce the measurement principles of different real-time range imaging modalities, with a focus on the RI sensors investigated in this thesis (Sect. 2.1). In Sect. 2.2, we present our development platform for range image processing and comment on range image enhancement. Sect. 2.3 comprises a state-of-the-art survey on the integration of modern RI sensors in medical applications. Last, in Sect. 2.4, we present an overview of approaches to rigid and non-rigid surface registration. Parts of this chapter have been published in [Baue 13a].
2.1 Real-time Range Imaging Technologies
The projection of the 3-D world onto a 2-D sensor domain generally results in the
loss of depth information. This implies that, for a given 2-D image, the 3-D geometry of the observed scene cannot be reconstructed unambiguously. Nature has
satisfied the need for depth perception by providing humans and most animals
with a binocular visual system. Based on the exploitation of binocular disparity,
we can extract qualitative depth information from a pair of 2-D projections. For a
human being, perceiving the world in three dimensions apparently comes without any effort. In contrast, the acquisition or reconstruction of 3-D geometry with
sensor technologies has turned out to be an ongoing challenge.
Over the past decades, a multitude of technologies for non-contact 3-D perception has been proposed. For an overview we refer to the surveys by Jarvis [Jarv 83]
and Blais [Blai 04], and the books by Jähne et al. [Jahn 99] and Pears et al. [Pear 12].
These ongoing efforts underline the intuition that the most natural and descriptive interface between the world and computer vision algorithms would be a full
3-D description of the environment, at least from a theoretical point of view. However, back in the early days of photogrammetry, a dense and accurate reconstruction of 3-D geometry was tedious, time-consuming, and expensive. Furthermore, data acquisition was limited to static objects and scenes.
In practice, the world that we perceive spans another dimension: time. In fact,
for many real-world applications, knowledge about the 3-D geometry of the environment is of limited practical value unless it is available in real time. This lack of real-time capable
depth perception technology might explain the tremendous success of 2-D cameras in the field of computer vision. Even though such a classical camera projects
the 3-D geometry onto a flat plane, capturing the temporal component of our
4-D world (3-D+time) seems more essential than the third dimension in spatial
perception. This is not surprising, as 2-D projections still provide a multitude of
cues about the underlying 3-D geometry.
Lately, technological advances in optics, electronics, mechanical control, sensor
design, and computing power have rendered metric 3-D range imaging at high
resolutions (≥ 300k px) and real-time frame rates (≥ 30 Hz) possible. A significant step toward real-time and dense 3-D surface scanning was the development
of Time-of-Flight (ToF) imaging (Sect. 2.1.2). However, its moderate spatial resolution, systematic errors, and the pricing of early ToF imaging prototypes led to
a steady but rather slow increase of interest in this technology. In practice, the
computer vision community hardly noticed these early real-time capable RI
sensors for some time. In 2010, this changed radically with the introduction of Microsoft Kinect as a natural user interface for gaming, at a mass market retail price of
about $100 a unit and with more than 10 million sales within a few months. Apart
from its impact on consumer electronics, computer vision researchers realized the
potential behind the device – being a fully-functional real-time RI camera at a competitive pricing – for a wide range of applications far beyond gaming. The device
has directed the attention of many research communities to the field of 3-D computer vision – with a strong focus on applications where real-time 3-D vision is
the key. The fact that the computer vision community has dedicated a separate
workshop series (Consumer Depth Cameras for Computer Vision, IEEE, since 2011)
underlines the significance of low-cost RI.
The first part of this chapter compares competing real-time RI technologies,
with a focus on modalities used in this work. In particular, below, we restrict
our discussion to triangulation and time-of-flight based approaches. Regardless
of the fundamentally different underlying physical principles, both technologies
are capable of acquiring dense and metric 3-D surface information at real-time
frame rates and the vast majority of real-time range imaging sensors nowadays
rely on these two principles. It is worth noting that alternative principles for
3-D shape acquisition exist, such as shape-from-shading also known as photometric stereo, interferometry, deflectometry, shape-from-texture, structure-frommotion and depth-from-focus/defocus. For a more generic overview of measurement principles for optical shape acquisition, we refer to Häusler and Ettl [Haus 11]
and Stoykova et al. [Stoy 07].
Before we proceed, let us clarify that data acquired with RI devices is termed
3-D information in this thesis. We do not explicitly differentiate between 2.5-D
data acquired from a single viewpoint, as is the case with today's RI cameras, and (full) 3-D data that can be obtained using reconstruction techniques in a multi-view setup or with tomographic scanners (CT/MR).
2.1.1 Triangulation
The most intensively explored principle in optical 3-D shape acquisition is triangulation. Let us differentiate between passive and active triangulation techniques.
Passive Triangulation. The class of 3-D acquisition techniques restricted to using the natural illumination of a scene is denoted passive triangulation. The most
prominent example is binocular perception using stereo vision [Hart 04]. Similar to
the human visual system, stereo vision uses a pair of images acquired with two
cameras from different viewpoints in order to compute 3-D structure. In particular, based on the geometric principle of triangulation, the position of a point in
3-D space can be reconstructed if the positions of its projections are known in both
images. More specifically, the underlying theory of epipolar geometry states that
the projection rays associated with the point locations on the images (and known
from camera calibration) intersect at the unknown 3-D point in space, see Fig. 2.1a.
The relative difference in position of the projected point (disparity) quantifies the
depth of the object in the scene. The larger the disparity, the closer the object.
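For a rectified stereo pair, this relation can be written as Z = f · B / d, with the focal length f in pixels, the triangulation base (baseline) B, and the disparity d. A short numerical sketch, with illustrative rather than sensor-specific values:

```python
# Depth from disparity in a rectified stereo pair: Z = f * B / d.
# The numbers below are illustrative, not taken from any sensor in this work.
f_px = 580.0        # focal length [px]
baseline_m = 0.075  # triangulation base [m]
for d_px in (10.0, 20.0, 40.0):
    z = f_px * baseline_m / d_px
    print(f"disparity {d_px:5.1f} px -> depth {z:.3f} m")
# larger disparity -> smaller depth, i.e., the object is closer
```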
Although considerable progress has been made in stereo vision, systems require precise calibration and imply a substantial computational burden to establish dense point correspondences based on feature matching, even though the
search space can be reduced using epipolar constraints. Depth accuracy scales
with the triangulation angle and the length of the triangulation base.
However, a larger base comes with increased occlusion effects. Furthermore, the
recovery of depth inside homogeneous, texture-less image regions or in the presence of repetitive patterns is an ill-posed problem.
Active Triangulation. As opposed to passive triangulation approaches that solely
rely on ambient scene illumination, active triangulation techniques for 3-D imaging typically use illumination units to project some form of light pattern onto the
scene. A straightforward extension of passive stereo vision is the use of active
pattern projection to simplify the correspondence problem in texture-less regions
(active stereo vision). However, 3-D shape acquisition using active triangulation can
also be achieved by using a projector in combination with only one camera. Here,
one of the cameras of a stereo system is replaced with a light source that projects
a controlled pattern onto the scene. Building upon the same measurement principle (triangulation geometry), the distance of an object can be determined based
on prior knowledge of the extrinsic setup, i.e. the relative position and orientation
of the light source w.r.t. the observing camera, see Fig. 2.1b. The correspondence
problem is reduced to finding the known projected pattern. In the simplest case,
the projected pattern may represent a single spot or a sheet of light, illuminating
the scene with a single stripe.

Figure 2.1: Schematic illustration of the measurement principles of different RI technologies. From left to right: (a) passive triangulation in stereo vision, where the position $x \in \mathbb{R}^3$ is computed from its projections onto two different image planes, $x_{p,A}, x_{p,B} \in \mathbb{R}^2$; the term triangulation stems from the triangle formed by the connection between the projection points and the associated projection rays. (b) Active triangulation using pattern projection, where one of the cameras is replaced by a projector; $x_{\mathrm{proj}} \in \mathbb{R}^2$ is given by the projector geometry. (c) CW-based ToF imaging using intensity-modulated light, where the position $x \in \mathbb{R}^3$ is deduced from the measured phase delay $\varphi_{\mathrm{tof}}$.

Active triangulation using the latter is commonly
termed light sectioning (cf. Sect. 2.1.3), as the projected light sheet intersects with
the 3-D scene geometry. Spot- or stripe-based triangulation systems are limited
to capturing one single point or a line profile at a time. These modalities are typically denoted 3-D scanners, as they scan the scene in a consecutive manner to
recover the geometry. This facilitates the correspondence problem and produces
highly accurate and reliable measurements, but involves moving mechanical or
electromechanical parts and precludes the acquisition of dynamic scenes.
From a theoretical point of view, arbitrary projection patterns can be used.
Range imaging modalities that use area patterns to capture the entire scene at a
time are commonly denoted as structured light modalities. Using these technologies, no scanning is required. However, the correspondence between the projected
and observed pattern is not obvious anymore and the projected pattern must be
encoded. As a consequence, solving the correspondence problem induces a computational burden, even though this burden is low compared to passive systems.
Along with advances in opto-electronics, various types of structured light patterns
have been proposed in the literature over the years. Single-shot methods that are
capable of reconstructing depth from one single static spatially-coded projection
pattern use either monochrome [Garc 08] or color-coded patterns [Schm 12]. For
the sake of completeness, we also refer to active triangulation sensors that are
based on the projection of a time-series of patterns such as binary temporal patterns [Sava 97] or phase shifting [Salv 04], but unsuitable for dynamic RI.
Let us conclude that active triangulation techniques are an interesting alternative
to passive triangulation. Typically, active techniques outperform passive triangulation in terms of density, reliability and acquisition speed. Nonetheless, both are
based on the geometric principle of triangulation and share the same limitations
regarding accuracy and occlusions.
2.1.2 Time-of-Flight Imaging
ToF imaging is an emerging active range imaging technology that provides a direct
way of acquiring metric 3-D surface information by measuring the time that light
travels from an illumination unit to an object and back to a sensor. Two complementary approaches to ToF imaging have been proposed in the literature [Lang 00,
Kolb 09]. Pulse-based ToF imaging directly measures the period of time between
emission of a pulse from an illumination unit and reception of the reflected pulse
back at the sensor using a very short exposure window [Yaha 07]. The alternative,
continuous wave (CW) modulation ToF imaging, measures the phase delay φtof between an actively emitted and the reflected optical signal [Xu 98, Oggi 04, Foix 11],
see Fig. 2.1c.
CW-based ToF imaging is the most widely used approach in commercially
available cameras. Let us briefly summarize the details: Active light sources attached to the camera illuminate the scene with an incoherent cosine-modulated
optical signal in the non-visible spectrum of the infrared (IR) range. The light is
reflected by the scene and enters the monocular camera, where each ToF sensor
element performs a correlation of the local incoming optical signal with the electrical reference of the emitted signal. Based on this correlation, the phase shift φtof
representing the propagation delay between both waves is measured by sampling
the signal at equidistant points in time within a so-called integration time. The
radial distance (range) $\hat{r}$ from the sensor element to the object is then given as

$$\hat{r} = \frac{c}{2\, f_{\mathrm{mod}}} \cdot \frac{\varphi_{\mathrm{tof}}}{2\pi},$$

where $f_{\mathrm{mod}}$ denotes the modulation frequency and $c$ the speed of light. Due to the periodicity of the cosine-shaped modulation signal, the validity of this equation is limited to distances smaller than $\frac{c}{2\, f_{\mathrm{mod}}}$. ToF devices are compact, portable, easy to integrate and deliver complementary grayscale intensities in real-time with a single sensor.
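A short numerical sketch of this relation; the modulation frequency and phase value are illustrative choices, not the parameters of a specific sensor:

```python
import numpy as np

c = 299_792_458.0      # speed of light [m/s]
f_mod = 20e6           # modulation frequency [Hz], a typical CW value
phi_tof = np.pi / 2    # measured phase delay [rad]
r = c / (2.0 * f_mod) * phi_tof / (2.0 * np.pi)
ambiguity = c / (2.0 * f_mod)   # unambiguous range, ~7.5 m at 20 MHz
print(f"range: {r:.3f} m (unambiguous up to {ambiguity:.2f} m)")
```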
2.1.3 Discussion of RI Sensors investigated in this Thesis
In this thesis, we consider three different real-time RI modalities. All three are
single-shot techniques that return metric 3-D coordinates of the acquired scene.
For a comparison of their specifications, see Table 2.1. Below, let us discuss the
strengths and limitations of the individual modalities. First, we compare ToF
imaging with dense structured light. Then, we discuss a novel RI sensor prototype based on multi-line light sectioning.
PMD CamCube. The PMD CamCube 2.0/3.0 is a CW ToF camera by PMD Technologies GmbH¹. It features a resolution of 204×204 px (2.0) or 200×200 px (3.0)
and a full-resolution frame rate of 25 Hz (2.0) or 40 Hz (3.0), respectively. The IR
illumination unit operates at a wavelength of 870 nm. The camera optics feature a
field of view of 40◦ ×40◦ . Until PMD Technologies dropped the CamCube camera
line from its commercially available product portfolio in 2012, it was the highest-resolution ToF sensor on the market.
1 http://www.pmdtec.com
Table 2.1: Overview of the specifications for the RI sensors investigated in this thesis. The uncertainty for the PMD CamCube 3.0 and Microsoft Kinect, respectively, is given as the standard deviation of the measured depth values over time at a distance of 1 m to a white plane. These results are approximately consistent with related work and manufacturer reports. For the MLT sensor, the mean measurement uncertainty within the measurement volume of 80×80×35 cm3 is reported. We also depict raw data from the three sensors capturing a female torso phantom at a distance of 1.2 m (bottom row). Image sources, from left to right: PMD Technologies GmbH, Siegen, Germany; Microsoft Corporation, Redmond, USA; Chair of Optics, University Erlangen-Nürnberg, Erlangen, Germany.
Specification      | PMD CamCube 3.0                | Microsoft Kinect                  | MLT Sensor
Principle          | Time-of-Flight (CW Modulation) | Active Triang. (Structured Light) | Active Triang. (Light Sectioning)
Resolution [px]    | 200×200                        | 640×480                           | 1024×768 (sparse)
Frame rate [Hz]    | 40                             | 30                                | 15
Meas. range [m]    | 0.3-7.0                        | 0.8-4.0                           | Custom
Field of view [◦]  | 40×40                          | 57×43                             | 44×33
Uncertainty [mm]   | ±5.98                          | ±0.92                             | ±0.39
Price [€]          | 8000                           | 100                               | Prototype
Light source       | LED (870 nm)                   | Laser (830 nm)                    | Laser (660 nm)
Example data       | (raw data of the torso phantom; images, see caption)
The most substantial limitations of available ToF cameras are their low spatial
resolution and low SNR. The former results from the fact that the complex circuitry
for on-chip correlation entails substantial space requirements in terms of semiconductor area. The latter essentially results from limitations regarding the power
of the emitted IR signal (trade-off between accuracy, power consumption, frame
rate, and eye safety regulations), a finite integration time, and physical constraints
such as the signal amplitude, and hence the measurement reliability, being inversely proportional to the squared distance between the sensor and the object [Fran 09b].
Furthermore, ToF sensors suffer from a number of systematic (related to sensor)
and non-systematic (related to scene content) errors [Kolb 09, Foix 11, Hans 13].
Systematic errors include (1) distance-related measurement errors (aka. wiggling) that result from imperfections in the shape of the modulated IR signal, (2) amplitude-related errors due to a low strength of the reflected signal, saturation, and the
dependency of depth measurements on object material and reflectivity, (3) pixel-related errors due to tolerances in the chip manufacturing process, (4) temperature-related errors due to the influence of temperature on semiconductor material
properties that cause variations in the response behavior, (5) errors that result from
the limited lateral resolution such as ambiguities at depth discontinuities (aka. flying pixels), and (6) integration time-related errors that are not yet fully understood
[Foix 11]. Note that the actual systematic error occurring in practice is a superposition of these individual sources of error. Non-systematic errors involve (1) motion
blur in the presence of dynamic scenes resulting from the underlying principle
of reconstructing the phase from a set of time-delayed samples captured within
a non-infinitesimal integration time, (2) internal light scattering effects between
the camera lens and the sensor and subsurface scattering at the object, and (3)
multi-path issues that result from the superposition and interference of responses
received from different reflection paths at the same sensor element.
Early approaches to ToF camera calibration have particularly focused on the
correction of systematic errors [Fuch 08a, Fuch 08b, Lind 10, Reyn 11, Schm 11] and
put substantial effort into the theoretical and physical modeling of ToF sensors and
their error sources using simulation frameworks [Kell 09, Schm 09]. Recently, we
have noticed increased efforts to tackle non-systematic error sources, e.g. for the
compensation of motion blur [Lee 12], light scattering [Wu 12], and multi-path issues [Fuch 10, Dorr 11].
Let us conclude that ToF imaging is a promising technology, but it is not yet considered mature for many practical, real-world applications. Nonetheless, it features several advantages compared to the triangulation techniques discussed below. ToF cameras do not require a baseline between the illumination unit and the sensor, allowing for compact designs and eliminating the need for extrinsic calibration. ToF imaging inherently delivers aligned geometric depth and photometric intensity information. Depth data is acquired independently for each pixel and
regardless of scene texture conditions, also avoiding the computationally expensive correspondence problem and, thus, enabling fast 3-D data acquisition. The
non-ambiguous measurement range is highly scalable by modifying light power,
integration time, or modulation frequency. Multi-camera setups are possible using distinct modulation frequencies. Furthermore, ToF imaging is robust to background illumination by on-chip filtering of the active transmitter signal from ambient light. Last, from a researcher’s perspective, a considerable advantage is that
most ToF manufacturers provide a comprehensive application programming interface (API) enabling software-based RI data enhancement. For instance, the application of compressed sensing techniques was proposed for ToF imaging recently
[Cola 12, Tsag 12].
Microsoft Kinect. The Microsoft Kinect device features a conventional RGB camera (1280×1024 px, 30 Hz) that typically operates at a resolution of 640×480 px,
an IR laser projector (830 nm) that projects a pseudo-random dot pattern, and a
monochrome IR CMOS sensor (1280×1024 px, 30 Hz) equipped with an IR bandpass that observes the projected pattern in the scene. IR data are evaluated on a
system-on-a-chip (SoC), generating range images at a maximum nominal resolution of 640×480 px at 30 Hz with 11-bit discretization [Ande 12, Catu 12, Khos 12, Smis 13]. The reconstruction of depth values from the observed dot pattern is
based on correlation with known reference patterns. For a comprehensive description of the underlying reconstruction process, we refer to a series of patents from
Primesense Ltd., Israel2 [Free 10b, Free 10a, Shpu 11, Zale 10], which originally developed the technique.
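The exact reconstruction pipeline is proprietary; as a rough intuition, correlation against a known reference pattern resembles classical block matching. The following toy sketch (hypothetical names, a single scanline, sum-of-squared-differences cost) merely illustrates that principle:

```python
import numpy as np

def match_row(ir_row, ref_row, patch=9, max_disp=64):
    """Toy 1-D block matching between an observed IR scanline and the
    stored reference-pattern scanline: for each pixel, find the disparity
    that minimizes the SSD over a local patch. Illustrative only; the
    actual Primesense pipeline is proprietary.
    """
    half = patch // 2
    n = len(ir_row)
    disp = np.zeros(n)
    for u in range(half, n - half):
        win = ir_row[u - half:u + half + 1]
        best_cost, best_d = np.inf, 0
        # search the disparity that best explains the observed pattern
        for d in range(0, min(max_disp, u - half) + 1):
            ref = ref_row[u - d - half:u - d + half + 1]
            cost = np.sum((win - ref) ** 2)
            if cost < best_cost:
                best_cost, best_d = cost, d
        disp[u] = best_d
    return disp
```

Depth then follows from the estimated disparity d via z = f · b / d for focal length f and baseline b, which also explains why the depth quantization of triangulation-based sensors coarsens with distance (cf. the limitations below).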
Major advantages of Microsoft Kinect compared to ToF cameras are its higher spatial resolution, a better SNR, and fewer systematic errors. For instance, measurements are independent of the reflectivity of objects, as opposed to ToF sensors. Strong ambient illumination may reduce the contrast in the observed pattern, impairing the depth reconstruction quality; however, this is not an issue for the indoor applications considered in this thesis. Another advantage compared to ToF imaging is the independence of the measurements from the scene geometry (no multi-path issues). Being a closed (black box) system, the Kinect provides no insight into its internal pre-processing of depth data. Hence, it is unclear whether
the higher SNR results from more reliable measurements or from pre-processing
in the SoC unit. Compared to previous triangulation-based sensors, a substantial advance was solving the correspondence problem efficiently on a low-cost SoC processor, allowing for real-time depth reconstruction. Besides these technological aspects and the engineering achievement of producing the densest real-time-capable RI camera, the key factor for the success of Microsoft Kinect was its pricing, which could be achieved using established IR sensor technology in combination
with a dedicated SoC processor and mass market quantities.
Let us briefly summarize its limitations [Khos 12, Shim 12]: First, the depth resolution decreases and the measurement uncertainty increases quadratically with increasing sensor-object distance. Second, a common problem of Microsoft Kinect is its incapability to recover depth in regions with non-diffuse highlights or translucent objects. Specular highlights typically cause total reflection and sensor saturation, while translucent objects result in refraction and subsurface scattering effects that prevent or distort the reflection back to the sensor. Third, like other triangulation-based RI systems, Microsoft Kinect suffers from partial occlusions in regions that are illuminated by the projector but not captured by the IR sensor, leading to missing areas where depth cannot be reconstructed. Note that the requirement of a baseline also limits the degree of miniaturization and entails the danger of de-calibration over time due to physical stress. Fourth, multi-camera setups suffer from interferences, impairing depth reconstruction quality. Last, Microsoft Kinect is an off-the-shelf consumer electronics device with a preset fixed measurement range and restricted access to internal parameterization and raw sensor data.
Multi-line Triangulation Sensor. The multi-line active triangulation (MLT) sensor used in this work was recently introduced for interactive reconstruction of
dense and accurate 3-D models, based on the so-called principle of Flying Triangulation [Ettl 12b]: A hand-guided sensor is moved around an object while continuously capturing camera images of a projected line pattern. Each camera image
delivers sparse 3-D measurements, which are aligned to previously acquired data
2 http://www.primesense.com/
Figure 2.2: Schematic illustration of the proposed measurement principle for the MLT sensor. The camera acquires sets of horizontal and vertical lines that are projected onto the
scene alternately. Based on these 2-D images, the associated 3-D measurements are reconstructed and merged into a stream of spatio-temporal 3-D sampling grids.
in real-time. The principle is scalable and three sensors have been realized so far:
an intra-oral teeth sensor, a face sensor, and a body sensor. Each sensor is designed for minimal measurement uncertainty within its respective measurement frustum. As light sources, LEDs or lasers can be used.
In this thesis, we propose to apply the MLT sensor in a non-moving way, see
Fig. 2.2: The sensor is rigidly mounted and captures the scene from one static
viewpoint. Continuously, a stream of sparse 3-D data of the observed geometry is
obtained. In detail, sets of 11 horizontal and 10 vertical lines are projected onto
the scene alternately, using two laser line pattern projection systems (660 nm).
The patterns are observed by a synchronized CCD camera with a resolution of
1024×768 px [Ettl 12b] and a frame rate of 30 Hz. Half of the lines of the measurement grid are updated from frame to frame, see Fig. 2.2. Hence, two consecutive
perpendicular sets of line profiles describe the surface topography within a time
window of 1/30 second and a fully updated set of horizontal and vertical measurements is available every 1/15 second, i.e. an effective frame rate of 15 Hz.
Thus, the sensor can acquire sparse but highly accurate 3-D data in real-time.
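The alternating update scheme can be made concrete with a small sketch (the frame representation is hypothetical, not the actual sensor software): each pair of consecutive, perpendicular line sets is merged into one sampling grid, matching the timing described above.

```python
from collections import deque

def grid_stream(frames):
    """Merge the alternating stream of sparse line measurements into
    spatio-temporal sampling grids. Each camera frame (30 Hz) carries
    one set of 3-D line profiles, alternately horizontal ('h') and
    vertical ('v'); every pair of consecutive perpendicular sets forms
    one grid, so a fully re-measured grid arrives at 15 Hz.

    Hypothetical frame format: {'orientation': 'h' | 'v', 'points': [...]}.
    """
    recent = deque(maxlen=2)  # the two most recent line sets
    for frame in frames:
        recent.append(frame)
        if len(recent) == 2 and recent[0]['orientation'] != recent[1]['orientation']:
            yield recent[0]['points'] + recent[1]['points']
```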
The MLT sensor can potentially be manufactured at low cost, is compact, and
relies on the established principle of light-sectioning, enabling high-precision surface sampling. The robustness of the system is based on the high-contrast laser signal, even in the presence of fabric and/or skin. Unlike light-sectioning RI
systems that capture 3-D contours consecutively over time while sweeping a single laser line, the MLT sensor acquires information along multiple lines (simultaneously) and in both directions (alternately) by projecting two orthogonal line
patterns. It features an optimized spatial resolution along the measurement grid
as only a direct localization of the observed lines is needed, compared to neighborhood correlation in depth reconstruction techniques based on speckle pattern
projection.
2.2 Range Image Processing
As indicated before, the introduction of low-cost devices for real-time 3-D perception has attracted a great deal of attention. However, the state-of-the-art lacks a
software library that explicitly addresses real-time processing of dense range image and 3-D point cloud streams. We have developed an open source platform that
addresses the particular demands of modern RI sensors (Sect. 2.2.1). In this section, we further present an integrated RI simulation environment (Sect. 2.2.2) and
a brief overview of range image enhancement (Sect. 2.2.3). For a recapitulation of the concept of perspective projection, and of how the inversion of this projection is employed to calculate 3-D positions from the pixel-wise distance measurements of RI sensors, we refer to the Appendix of this thesis (Sect. A.1).
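As a brief illustration of that inversion, the following simplified sketch lifts a range image to 3-D camera coordinates under a pinhole model with assumed intrinsics f_x, f_y, c_x, c_y; the full derivation is given in Sect. A.1:

```python
import numpy as np

def backproject(rng, fx, fy, cx, cy, radial=True):
    """Lift a dense range image to 3-D points in camera coordinates
    by inverting the pinhole projection (returns an (H, W, 3) array).

    ToF sensors report the radial distance r along the viewing ray;
    with radial=True it is first converted to the depth z along the
    optical axis via z = r / ||(x', y', 1)||, where (x', y') are the
    normalized image coordinates of the pixel.
    """
    h, w = rng.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    xn = (u - cx) / fx  # normalized image coordinates
    yn = (v - cy) / fy
    z = rng / np.sqrt(xn**2 + yn**2 + 1.0) if radial else rng
    return np.dstack((xn * z, yn * z, z))
```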
2.2.1 RITK: The Range Imaging Toolkit
In the computer vision community, several open source software libraries for general purpose 2-D and 3-D image processing exist today [Iban 05, Schr 06, Brad 00,
Rusu 11]. However, most of these libraries only provide some basic functionality
for the processing of static 3-D point clouds or surfaces and there exists no open
source framework that is explicitly dedicated to real-time processing of range image streams. During the course of this thesis, we have developed a powerful yet
intuitive software platform that facilitates the development of RI applications: the
range imaging toolkit (RITK) [Wasz 11b].
It is a cross-platform and object-oriented toolkit explicitly dedicated to the processing of high-bandwidth data streams from modern RI devices. RITK puts emphasis on real-time processing using dedicated pipeline mechanisms and user-friendly interfaces for efficient range image stream processing on modern many-core graphics processing units (GPUs). Furthermore, RITK takes advantage of the interoperability of general-purpose computing on the GPU and rendering for real-time visualization of dynamic 3-D point cloud data. Being designed thoroughly
and in a generic manner, the toolkit is able to cope with the broad diversity of
data streams provided by available RI devices and can easily be extended by custom sensor interfaces and processing modules. The toolkit can support developers in two ways: First, it can be used as an independent library for range image
stream processing within existing software. Second, it supports developers at an
application level with a comprehensive software development and rapid prototyping infrastructure for the creation of application-specific RI solutions. Due to
its generic design, existing modules can be reused to assemble individual RI processing pipelines at run-time.
RITK is an open source project and publicly available online3 . In our experience, it proved to greatly reduce the time required to develop RI applications.
Hence, we feel confident that other researchers in the rapidly growing community
will also benefit from RITK.
3 http://www5.cs.fau.de/ritk
2.2.2 Virtual RGB-D Camera
We have implemented a range image stream simulator that can generate virtual
RGB-D data in the same representation as a real RI camera would acquire it. In
particular, it produces range data based on the OpenGL depth buffer representation of a given 3-D scene. The simulator allows one to experiment with modality-dependent sensor resolutions, noise characteristics, and artifacts that occur with
different RI sensors, while providing an absolute ground truth for evaluation purposes. Among others, we use it for quantitative evaluation of the rigid and non-rigid surface registration algorithms proposed in this thesis.
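At the core of such a simulator lies the conversion of the non-linear OpenGL depth buffer to metric eye-space depth. A minimal sketch for a standard perspective projection, assuming the near/far clip distances of the projection are known:

```python
import numpy as np

def depth_buffer_to_metric(zbuf, z_near, z_far):
    """Convert OpenGL depth-buffer values in [0, 1] to metric eye-space
    depth for a standard perspective projection, as used when simulating
    range data from a rendered 3-D scene.
    """
    z_ndc = 2.0 * zbuf - 1.0  # window coordinates [0, 1] -> NDC [-1, 1]
    return 2.0 * z_near * z_far / (z_far + z_near - z_ndc * (z_far - z_near))
```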
2.2.3 Range Data Enhancement
As detailed in Sect. 2.1.3, available RI cameras typically exhibit low SNRs and may
entail invalid or unreliable measurements that result in incomplete data due to the
underlying sampling principles and physical limitations of the sensors. Consequently, the enhancement of the raw range measurements provided by RI cameras
is a fundamental premise for medical applications that require a high level of accuracy and reliability in shape information while meeting real-time demands. For the
applications addressed in this thesis, a typical RI pre-processing pipeline consists
of two basic components: (1) restoration of invalid measurements and (2) temporal and spatial edge-preserving denoising [Wasz 11a, Wasz 11c]. Note that range
data enhancement is usually performed in the 2-D sensor domain. This facilitates
real-time RI data processing.
Restoration of Invalid Measurements. Prior to denoising of RI data, the restoration of invalid measurements has to be taken into consideration. As opposed to
static defect pixels, such invalid measurements occur unpredictably and can affect
both isolated pixels and connected local regions. Note that in practice, some RI
sensors provide a pixel-wise reliability measurement. In the literature, a plurality
of methods for defect pixel correction have been proposed. Given the trade-off
between effectiveness and complexity, conventional approaches like normalized
convolution [Knut 93, Fran 09a] provide satisfactory results [Wasz 11a]. It is worth
noting that restoring missing data in a first step renders an extra conditioning of
invalid data unnecessary for subsequently applied denoising algorithms.
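A minimal sketch of normalized convolution for this restoration step, assuming a Gaussian applicability function and a boolean validity mask (function and parameter names are illustrative):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def restore_invalid(rng, valid, sigma=2.0):
    """Fill invalid range measurements by normalized convolution
    [Knut 93]: smooth the masked signal and divide by the smoothed
    mask, so only valid neighbors contribute to restored values.

    rng: 2-D range image; valid: boolean mask of reliable pixels.
    """
    data = np.where(valid, rng, 0.0)
    num = gaussian_filter(data, sigma)                 # mask-weighted signal
    den = gaussian_filter(valid.astype(float), sigma)  # local valid density
    filled = num / np.maximum(den, 1e-6)
    return np.where(valid, rng, filled)                # keep valid pixels as-is
```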
Temporal and Spatial Denoising. The second component of a common RI data
enhancement pipeline is concerned with both temporal and spatial denoising.
When capturing static scenes, temporal measurement fluctuations can be reduced
by averaging over a set of subsequent frames [Lind 10]. In practice, we typically
applied an equally-weighted averaging or bilateral temporal averaging for dynamic scenes [Wasz 11a]. The length of the time interval considered for temporal denoising highly depends on the dynamics of the particular application. For
low-frequency scenarios such as respiratory motion tracking, combined with real-time RI acquisition frame rates, the averaging of frames within a finite time interval is typically acceptable. After temporal denoising, we apply edge-preserving spatial filtering. In the field of image processing and computer vision, a variety of spatial
denoising approaches have been introduced in recent years. Edge-preserving filters that smooth homogeneous regions while preserving manifest discontinuities
are of special interest for RI data enhancement. In this context, one of the most
popular and established methods is the bilateral filter [Auri 95, Toma 98]. Beyond
its application for a multitude of conventional imaging modalities, it is a common
choice for RI denoising [Lind 10]. The filter is straightforward to implement, but
exhibits a poor run-time performance due to its non-stationary nature. A promising alternative is the recently introduced concept of guided filtering [He 13] with a
computational complexity that is independent of the filter kernel size. At the same
time, it exhibits a comparable degree of edge-preserving smoothing.
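As a reference point, a naive (unoptimized) bilateral filter on a range image can be sketched as follows; real-time variants use separable or constant-time approximations, and border handling as well as the parameter values are simplifications for illustration:

```python
import numpy as np

def bilateral(rng, sigma_s=3.0, sigma_r=0.01, radius=6):
    """Naive bilateral filter [Toma 98] on a range image: spatial
    Gaussian weights are attenuated by a range-difference Gaussian,
    so homogeneous regions are smoothed while depth discontinuities
    are preserved. sigma_r is in the metric unit of the range values;
    borders are handled by wrap-around for brevity.
    """
    out = np.zeros_like(rng)
    acc = np.zeros_like(rng)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(rng, (dy, dx), axis=(0, 1))
            w = (np.exp(-(dx**2 + dy**2) / (2.0 * sigma_s**2))
                 * np.exp(-((shifted - rng) ** 2) / (2.0 * sigma_r**2)))
            out += w * shifted
            acc += w
    return out / acc
```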
Let us conclude with a word on filter parameterization. Regarding the trade-off
between data denoising and preservation of topographical structure, we typically
perform RI data pre-processing in a way that gives priority to surface smoothness. Insufficient filtering results in topographical artifacts that may have a strong
influence on algorithmic performance.
2.3 Applications in Medicine
In this section, we present a brief state-of-the-art survey on the integration of
modern RI sensors in medical applications. For a comprehensive review we refer to [Baue 13a]. We focus on dynamic and interactive tasks where real-time and
dense 3-D vision forms the key aspect. The survey is divided into four thematic
fields: prevention and support in elderly care and rehabilitation, room supervision
and workflow analysis, touch-less interaction and visualization, and guidance in
computer-assisted interventions.
2.3.1 Prevention, Diagnosis and Support
Activity Assessment and Remote Support in Elderly Care. In-home activity assessment in elderly care is a rapidly evolving field. Systems focus on monitoring
the health status, sharing information about presence and daily activities, and providing on-line assistance and coaching. This allows older adults to continue life
in independent settings. Low-cost RI holds great potential in this context. For instance, recognition of early indicators of functional decline such as deviations in
gait using pose estimation [Mosh 12, Garc 12, Ston 11b, Ston 11a] can help prevent accidents, automatic detection of abnormal events such as falls [Parr 12] can improve the response time in emergency situations, and retrospective data analysis can help in understanding the mechanisms that led to an event.
Early Diagnosis and Screening. The detection of abnormal behavior based on
RI technologies also holds potential for early diagnosis and screening, for different groups of patients. Information about daily lifestyle and deviations from the
normal can help in early diagnosis or progression analysis for cognitively impaired people such as Alzheimer's [Coro 12] or Parkinson's disease patients [Mosh 12].
Low-cost RI is also considered for large-scale screening of at-risk groups. In developmental disorders such as autism and schizophrenia, observing behavioral precursors in early childhood using 3-D perception for activity recognition [Siva 12,
Walc 12] can allow for early intervention and thus improve patient outcomes. In
sleep monitoring, RI is gaining interest for non-contact measurement of sleep conditions [Yu 12] or diagnosis of sleep apnea [Fali 08].
Therapy Support and Progress Monitoring in Rehabilitation. RI sensors have
also attracted interest in the field of physical therapy. “Serious games” in rehabilitation have been shown to increase the motivation of the patient, thus improving exercise performance and rehabilitation outcomes [Smit 12]. RI-based games are of
particular interest, as the embedded sensors simultaneously allow for a quantitative assessment of performance. Low-cost RI systems have lately been considered for tele-rehabilitation techniques that are beneficial for translating skills
learned in therapy to everyday life. Furthermore, RI-based rehabilitation systems for physically disabled patients [Chan 11, Gama 12, Huan 11], chronic pain
patients [Scho 11] and patients after neurological injuries [Chan 12, Lang 11] have
been proposed.
Aids for Handicapped People. Recently, first approaches toward the use of assistive technologies to support handicapped people were proposed [Hers 08]. For
instance, integration of a RI device into an augmented blindman’s cane or headmounted systems could aid visually impaired people in navigation by identifying and describing surroundings beyond the limited sensing range of a physical
cane [Gall 10, Katz 12, Ong 13]. Low-cost RI can also be used with autonomous
transportation vehicles that follow handicapped people using 3-D perception.
2.3.2 Monitoring for OR Safety and Workflow Analysis
Room Supervision for Safety in Robot-assisted Interventions. Monitoring the
working area of operating rooms (OR) using a multi-camera setup of conventional
cameras [Ladi 08, Nava 11] or multiple 3-D cameras can help improve both medical
staff and patient safety as well as the efficiency of workflows. In particular, collision avoidance is an emerging topic with the increased use of robotic workspaces,
ensuring safe human-robot interaction [Monn 11, Nico 11]. We further refer to the
EU projects SAFROS4 targeting patient safety in robotics surgery and ACTIVE5 involving camera-based OR monitoring to ensure safe workspace sharing between
people and robotic instruments.
Monitoring, Analysis, and Modeling of Workflows. In addition to ensuring safe
human-robot interaction, OR workspace monitoring is of interest for the analysis
and modeling of workflows. In intensive care units, 3-D perception and automatic
activity recognition hold potential for workflow supervision and documentation,
and can help improve workflow efficiency [Lea 12]. Another interesting research
4 http://www.safros.eu/
5 http://www.active-fp7.eu/
direction addresses the development of a context-aware system for surgical interventions that is able to recognize the surgical phase within the procedure and support the surgeon with appropriate visualization etc. [Pado 12]. We also refer to
the MICCAI workshop series on Modeling and Monitoring of Computer Assisted
Interventions (M2CAI).
2.3.3 Touchless Interaction and Visualization
Touchless Interaction in Sterile Environments. The advent of touchless real-time user-machine interaction that came along with the introduction of low-cost
RI sensors has also evoked interest in the medical domain. In particular, gesture
control holds potential for touch-less interaction in interventional radiology where
surgeons need to remain sterile but often want to navigate patient data and volumetric scans (CT/MR) etc. [Foth 12, Ment 12, John 11, Bigd 12a, Bigd 12b, Stau 12,
Sout 08, Gall 11]. Gesture-based techniques are also being considered for fine control of surgical tools.
On-patient Visualization of Medical Data. On-patient visualization of anatomical data can be used for medical education and training, intervention planning and
further applications that require an intuitive visualization of 3-D data. Basically,
using an RI camera, the system tracks the pose of the patient w.r.t. the camera’s
coordinate system and blends anatomical information onto an augmented reality
display that can be mounted rigidly [Blum 12, Nava 12] or be portable [Maie 11].
The former addresses anatomy teaching, while the latter can replace the traditional mental transfer of medical image data, visualized on a wall-mounted monitor, onto the patient.
2.3.4 Guidance in Computer-assisted Interventions
3-D Endoscopy for Minimally Invasive Procedures. Knowledge about the local
surface geometry during minimally invasive procedures holds great benefits compared to conventional 2-D endoscopy. Typically, 3-D data improve robustness for
applications such as instrument localization, collision detection, metric measurements, and augmented reality, hence improving both quality and efficiency of minimally invasive procedures. Over the past years, various competing technologies
have been proposed such as stereo vision [Stoy 12], photometric stereo [Coll 12],
color-coded structured light [Schm 12], or ToF [Haas 12, Haas 13, Penn 09].
Patient Localization, Setup and Tracking. The automation of patient setup and
position tracking is of particular interest for repeat treatments such as in fractionated radiation therapy, where the tumor is irradiated in a set of treatment sessions and reproducible patient setup is a key component for accurate dose delivery. Proposed solutions in RT rely on active stereo imaging with interactive
frame rates [Peng 10], ToF imaging [Baue 12b, Plac 12], Microsoft Kinect [Baue 11a,
Wasz 12b], and light sectioning [Brah 08, Ettl 12a]. Beside RT, the estimation of
patient position, orientation and pose [Grim 12, Scha 09], and the localization of
accessories such as MR coils [Simo 12] also holds potential for diagnostic and interventional imaging.
Motion Management. Tracking the motion of patients by monitoring their external body surface is an essential prerequisite for motion compensation techniques.
Motion compensation is of particular interest in RT for abdominal and thoracic
regions where motion constitutes a substantial source of error, and holds potential
for improvements in tomographic image acquisition. In contrast to early strategies that were restricted to acquiring a 1-D respiration curve [Scha 08], the methods proposed in this thesis recover dense deformation fields that better reflect
the complexity of respiratory motion and allow for an automatic distinction between abdominal and thoracic respiration types [Wasz 12a]. Solutions for both
dense [Baue 12b, Baue 12d] and sparse RI data [Baue 12c, Baue 12a] are presented
in Part II, being tailored to the applied sensor technology. Vice versa, motion
tracking and compensation can also be applied to improve patient setup accuracy
[Wasz 12b, Wasz 13].
Guidance, Navigation and Augmented Reality. Real-time RI provides the basis
for intra-operative acquisition of the 3-D operation area and registration of organ
surfaces to pre-operative data in image-guided interventions, both for minimally
invasive and open surgery. Among others, RI has been proposed for marker-free
needle-tracking [Wang 12] and fusion with pre-operative data for augmented reality applications [Maie 12, Seit 12, Sant 12b, Sant 12a, Mull 11, Sant 10].
2.4 Surface Registration
Along with recent advances in dense and real-time 3-D RI and the availability of
3-D databases, the analysis of geometric shapes has been assuming an increasingly
important role. In particular, the identification of corresponding elements between
two or more given shapes represents a fundamental prerequisite for a wide range
of applications. Below, we subsume the variety of names for this task that appear in the literature (shape correspondence, geometry registration, surface matching, model alignment) under the term surface registration.
In general, the process of surface registration aims at recovering a transformation that brings a given moving template shape into congruence with a fixed reference shape. It involves three basic components (a concrete rigid instantiation follows the list):
1. The choice of an appropriate geometric transformation model
2. The definition of an objective function based on a suitable similarity or distance metric
3. The application of an efficient optimization strategy that estimates the optimal transformation parameters w.r.t. the objective function
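As a concrete instantiation of these three components, consider the rigid point-to-point case (a sketch; the specific terms vary from method to method): the transformation model is a rotation R and translation t, the objective is a sum of squared distances to assigned reference points,

(R̂, t̂) = arg min_{R ∈ SO(3), t ∈ R³} Σ_i ‖ R x_i + t − y_{c(i)} ‖²,

where y_{c(i)} denotes the reference point assigned to template point x_i, and the optimization strategy may be a closed-form least-squares solution for fixed correspondences, iterated as the correspondences are re-estimated (cf. the ICP algorithm in Sect. 2.4.2).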
The actual choices for these three components highly depend on the problem at
hand. In this section, we summarize related work and differentiate between local vs. global and rigid vs. non-rigid surface registration. In addition, we provide
an overview of medical applications. For a more comprehensive overview we refer to the surveys by Kaick et al. on shape correspondence [Kaic 11], Salvi et al. on
range image registration [Salv 07], Audette et al. and Sotiras et al. on medical image registration [Aude 00, Soti 12] and Heimann and Meinzer on statistical shape
modeling [Heim 09]. In order to focus on methods that are most relevant to this
thesis, the review is restricted to fully-automatic approaches that do not require
user input. Furthermore, we consider the case of pairwise surface registration between two given shapes as opposed to group-wise shape registration. The review
does not cover methods that cannot cope with partial matching and computationally highly expensive methods for recovering large-scale deformations such as approaches based on graph embeddings [Mate 08].
2.4.1 Global vs. Local Surface Registration
First, let us distinguish between global (aka. coarse) surface registration methods
that explore the entire solution space and local (aka. fine) registration approaches
that rely on an initial estimate and/or assume a similar pose between template
and reference. For applications where the latter assumption does not hold, typically a two-stage procedure is applied, combining a pre-alignment using global
matching strategies with a subsequent local registration refinement technique that
is initialized with the previously computed coarse initial guess.
In the general global registration case, the goal is to find correspondences without pre-aligning the datasets. A popular approach is the use of feature descriptors
that encode the local 3-D surface topography of the fixed and moving shape. These
are then matched to establish point-to-point correspondences. Based on these correspondences and geometric consistency criteria, the transformation to align the
template with the reference can be established. Salient points may be localized
prior to the stages of feature description and correspondence search in order to
keep the search space at a manageable dimension. In terms of an optimization problem, the objective function consists of a matching term that enforces similarity of the feature descriptors and is subject to constraints that quantify the degree of shape deformation. A common constraint for global registration
techniques assesses the disparity in the distances and normal orientations between
pairs of corresponding points, approximating the compatibility between pairs of
assignments without first aligning the shapes. In the past decade, a variety of
3-D shape descriptors have been proposed, e.g. based on spherical point signatures
[Chua 97], spin images [John 99], spherical harmonics [Kazh 03, Khai 08], shape
context [Belo 00, From 04], integral descriptors [Gelf 05, Mana 06], multi-scale features [Li 05], salient geometric features [Gal 06] and diffusion geometry features
such as heat diffusion signatures [Sun 09]. While traditional feature-based techniques were developed for the rigid registration case, addressing globally or piecewise rigid (articulated) motion, they have also been explored for non-rigid registration scenarios [Bron 11, Zaha 09, Sun 09], also cf. Sect. 2.4.2. For a concrete application of 3-D shape descriptors in rigid surface registration we refer to Chap. 3.
Global registration approaches are typically used to guide local registration
techniques that refine the initial solution. However, many scenarios involve only
a slight misalignment of the fixed and moving shapes and do not require an initialization using the aforementioned techniques. For instance, assuming a similar
pose is a valid assumption for the registration of data streams acquired with a
hand-guided RI device (Chap. 4) or for tracking the motion of a respiring patient
on a non-moving treatment table from a static viewpoint (Chap. 5).
2.4.2 Rigid vs. Non-Rigid Surface Registration
In contrast to the previous distinction between global and local registration approaches, let us now give an overview of related work w.r.t. the class of the
underlying transformations mapping a moving template shape onto a fixed reference. In the rigid case, the transformation is constrained to global translations and
rotations. The non-rigid case involves elastic deformations.
Rigid Surface Registration. Global surface registration in the rigid case, e.g. in
the presence of gross misalignments, is typically addressed using feature-based
approaches. As we already sketched the basic idea behind this class of methods
including popular 3-D shape descriptors in Sect. 2.4.1, let us confine ourselves here to the
case of rigid local surface registration. The most prominent and widely used algorithm for this task is the iterative closest point (ICP) algorithm originally introduced
by Besl and McKay [Besl 92], Chen and Medioni [Chen 92], and Zhang [Zhan 94]. In
an iterative manner, the ICP algorithm alternates between finding point correspondences in a nearest neighbor relationship and estimating the rigid transformation
that optimally aligns the correspondences in a least-squares sense, minimizing their distance using a closed-form solution [Horn 87], cf. Chap. 4. The underlying objective function consists of a matching term that quantifies how well the datasets
align to each other. It is worth noting that ICP convergence depends on the quality
of the initial pre-alignment.
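A minimal point-to-point ICP sketch along these lines, using a k-d tree for the nearest neighbor search and the SVD-based closed-form least-squares alignment (equivalent in result to the quaternion solution of [Horn 87]); the fixed iteration count and all names are illustrative:

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(template, reference, iters=30):
    """Point-to-point ICP: alternate nearest-neighbor correspondence
    search with the closed-form least-squares rigid alignment.

    template: (M, 3) moving points, reference: (N, 3) fixed points.
    Returns (R, t) mapping template points onto the reference.
    """
    R, t = np.eye(3), np.zeros(3)
    tree = cKDTree(reference)
    x = template.copy()
    for _ in range(iters):
        _, idx = tree.query(x)               # nearest-neighbor correspondences
        y = reference[idx]
        mx, my = x.mean(0), y.mean(0)
        H = (x - mx).T @ (y - my)            # cross-covariance of centered sets
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        Ri = Vt.T @ D @ U.T                  # optimal rotation (no reflection)
        ti = my - Ri @ mx
        x = x @ Ri.T + ti                    # apply the increment
        R, t = Ri @ R, Ri @ t + ti           # accumulate the global transform
    return R, t
```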
Over the years, various variants of the ICP algorithm have been proposed
[Rusi 01] with a focus on robustness to noise and outliers, efficiency, and alignment accuracy, e.g. considering different distance metrics (point-to-plane vs. point-to-point) to avoid snap-to-grid effects or addressing anisotropic noise [Maie 12].
Furthermore, promising alternatives to avoid the correspondence search in the
first stage have been proposed, e.g. using probabilistic [Jian 05], implicit [Zhen 10,
Rouh 11] or distance field representations [Fitz 03]. A prominent practical example for rigid surface registration is the alignment of 3-D scans [Gelf 05, Paul 05,
Aige 08]. Assuming rigidity in the scanning target, a classical approach to scan
alignment is a two-stage scheme combining a feature-based estimation of the gross
transformation with an ICP-like refinement technique. However, even though the
scanning target is a rigid body, non-linearities or calibration uncertainties of the
acquisition device may induce low-frequency non-rigid deformations [Brow 07].
Non-Rigid Surface Registration. Non-rigid surface registration (aka. deformable
shape matching) is essential for various applications. One particular direction of
research is the building of statistical shape models that can be used for model morphing [Hasl 09] or as prior knowledge in related tasks, for instance guiding segmentation approaches in medical image computing [Heim 09], cf. Sect. 2.4.3.
Among the most popular approaches to non-rigid surface registration are modifications of the original ICP algorithm and its variants. Instead of assigning
discrete one-to-one point correspondences based on a nearest neighbor criterion,
the assignment is relaxed to be a continuously valued confidence measurement.
Typically, correspondences are established between all combinations of points according to some probability. Hence, these soft assignment techniques can be considered a generalization of the binary assignment in the classical ICP. An inherent
advantage of such a weighted correspondence framework is its capability to handle noise and outliers. In the past decade, several soft assignment
approaches have been proposed. One of the first works in this field is the robust
point matching (RPM) framework by Chui and Rangarajan [Chui 03]. In combination with a thin plate spline (TPS) parameterization of the underlying elastic
transformation [Book 89], it alternates between soft assignment update using deterministic annealing and estimation of the TPS parameters, and has become one
of the most popular approaches (TPS-RPM) for non-rigid surface alignment. The
work by Myronenko and Song [Myro 10] proposes a similar alternating strategy
that considers the alignment as a probability density estimation problem. Basically, they interpret the template point set as Gaussian mixture model (GMM)
centroids that are fit to the reference point set by maximizing the likelihood in
an expectation-maximization (EM) framework [Demp 77]. To ensure the motion
field to be smooth, implying preservation of topographical structure, the GMM
centroids are constrained to move coherently. In contrast to Chui and Rangarajan [Chui 03], Myronenko and Song model the non-rigid transformation in a nonparametric manner. Related work by Tsin and Kanade [Tsin 04] and Jian and Vemuri [Jian 05] considers the surface registration problem as an alignment between
two distributions, modeled as Gaussian mixture models (GMM). In the non-rigid
case, one of the models is parameterized by TPS and the transformation parameters are estimated to minimize the statistical discrepancy between the two mixtures. For a common viewpoint of these closely related methods [Chui 03, Jian 05,
Myro 10] and the relation to the classical rigid ICP [Besl 92, Chen 92] we refer the
interested reader to the generalized framework by Jian and Vemuri [Jian 11]. Methods proposed for non-rigid surface registration along the basic idea of ICP alignment typically combine non-probabilistic closest point measures with an additional regularization formulation to enforce shape preservation [Alle 03, Ambe 07].
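The common core of these soft assignment techniques is a weighted correspondence matrix; a minimal sketch with Gaussian affinities of fixed width σ (actual methods additionally anneal σ, model outliers, and interleave the transformation update):

```python
import numpy as np

def soft_assignment(template, reference, sigma):
    """Continuous correspondence weights between two point sets, in the
    spirit of RPM/CPD-style soft assignment:
    w[i, j] ~ exp(-||y_j - x_i||^2 / (2 sigma^2)), normalized over the
    template points for each reference point.

    template: (M, 3) moving points, reference: (N, 3) fixed points.
    """
    d2 = ((template[:, None, :] - reference[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * sigma**2))  # (M, N) Gaussian affinities
    return w / np.maximum(w.sum(0, keepdims=True), 1e-12)
```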
Another direction in non-rigid surface registration builds on implicit shape representations such as 3-D distance functions [Jone 06], embedding the problem of
shape correspondence to a volumetric image domain. In the work of Paragios et
al. [Para 03] and Huang et al. [Huan 06], both the template and reference shapes
are embedded into a distance transform space. The alignment procedure itself is
similar to volumetric non-rigid image registration algorithms, dividing into nonparametric [Para 03] and parametric [Huan 06] models such as free-form deformations [Ruec 99]. An interesting generalization of distance transform based registration in the volumetric domain is the encoding of shapes into vector-valued feature
images where a voxel holds a set of complementary descriptor values [Tang 08a].
2.4.3 Medical Surface Registration
In medical image computing, establishing dense correspondences between surface
data is a fundamental requirement for the analysis of anatomical shapes that can be
of approximately rigid (e.g. bones) or highly elastic (e.g. organs) nature and vary
across age, gender, ethnicities, and diseases. Over the past decade, statistical shape
modeling and atlas generation have evolved into an established subfield [Heim 09].
Medical shape modeling typically builds on volumetric imaging modalities
(CT/MR/PET/SPECT) and extracted shapes are often represented as a binary volume. Hence, surface registration in medical image computing assumes a special
role compared to other fields such as computer graphics where shapes are generally represented as meshes. Medical surface registration approaches can be divided into three classes: mesh-to-mesh, mesh-to-volume and volume-to-volume
shape alignment. Rigid mesh-to-mesh registration approaches using Procrustes
analysis [Jens 12] or ICP variants [Albr 12] are typically applied when shapes are
expected to be similar, e.g. bones [Vos 04]. For anatomical structures that exhibit
a substantially higher degree of variation, non-rigid mesh-to-mesh techniques are
used [Huan 06, Tang 08a, Myro 10]. Notable early works are parametric solutions
by Subsol et al. [Subs 98] and Fleute et al. [Fleu 99]; for a survey we refer to Audette et al. [Aude 00]. In contrast to this first class of mesh-to-mesh registration techniques, mesh-to-volume alignment is a popular option. Here, a deformable shape
model represented as a morphable mesh is adapted to a volumetric segmentation or used to actually perform the segmentation in the original intensity domain
[Fleu 02]. The third option is a volume-to-volume registration, either on binary
segmentation data [Fran 02] or in the original intensity domain [Ruec 03]. As indicated before, the widespread application of this last class of techniques in medical
shape registration is due to the fact that the original representation is a volume. Let
us remark that volume-to-volume based approaches can also be applied to shapes
that were originally represented as meshes by transforming them into a volumetric representation. However, on the downside, volumetric registration typically
implies a substantial computational burden since it aligns not only the target but
also the background.
Independent of the underlying methodology, surface registration has been applied to a variety of anatomical structures such as cardiac ventricles [Huan 06,
Myro 10], cerebral structures [Rang 97, Jian 05, Tang 08a, Comb 10, Kurt 11], abdominal organs [Clem 08, Sant 10], cartilage tissue [Jens 12], and bone structures
[Gran 02, Sesh 11, Gies 12], among others.
2.5 Discussion and Conclusions
In this chapter, we have presented a comprehensive overview of real-time range
imaging technologies and have identified promising applications in different fields
of health care. Many of these applications involve the alignment of an acquired
3-D shape with a reference model. This underlines the demand for dedicated surface registration approaches that are tailored to cope with the specific properties of
RI data. In the remainder of this thesis, we propose both rigid and non-rigid surface registration techniques that are optimized w.r.t. the strengths and limitations
of different RI technologies and describe novel concepts to improve the quality,
safety and efficiency of clinical workflows.
Part I
Rigid Surface Registration for Range
Imaging Applications in Medicine
CHAPTER 3
Feature-based Multi-Modal Rigid Surface Registration
3.1 Medical Background
3.2 Related Work
3.3 Feature-based Surface Registration Framework
3.4 Shape Descriptors
3.5 Experiments and Results
3.6 Discussion and Conclusions
The robust alignment of surface data acquired with different modalities holds great potential for medical applications but has rarely been addressed in the literature. In this chapter, we present a generic framework for multi-modal rigid
surface registration. Facing large misalignments and partial matching scenarios,
we chose a feature-based registration approach. It is based on matching feature
descriptors that encode the local 3-D surface topography to establish point correspondences between a moving template and a fixed reference shape. Based
on these correspondences, the rigid transformation is estimated that brings both
shapes into congruence. We put particular emphasis on the conception of shape
descriptors that are capable of coping with surface data from different modalities.
This implies non-consistent mesh density, mesh organization and inter-modality
deviations in surface topography that result from the underlying sampling principles and noise characteristics. We have adapted state-of-the-art descriptors to meet
these specific demands and handle the required invariances. In the experiments,
we address two different clinical applications:
• RI/CT torso surface registration for automatic initial patient setup in fractionated radiation therapy [Baue 11a]
• RI/CT organ surface registration for augmented reality applications in image-guided open liver surgery (IGLS) [Mull 10, Mull 11]
The remainder of this chapter is organized as follows: The medical background
is depicted in Sect. 3.1. In Sect. 3.2, we review relevant literature. The proposed
rigid surface registration framework is introduced and detailed in Sects. 3.3-3.4.
Experimental results for torso and organ surface registration are given in Sect. 3.5.
Eventually, we discuss the results and draw conclusions in Sect. 3.6. Parts of this
chapter have been published in [Baue 11a, Mull 10, Mull 11]. Parts from [Baue 11a]
are reprinted with permission of IEEE, © 2011 IEEE.
3.1 Medical Background
We propose the application of the developed multi-modal rigid surface registration framework in two medical applications. In RT, prior to each treatment fraction, the patient must be aligned to tomographic planning data. We present a novel
marker-less solution that enables a fully-automatic initial coarse patient setup using multi-modal RI/CT surface registration. In image-guided liver surgery, the
registration of the intra-operative organ shape with surface data extracted from
pre-operative tomographic image data is conventionally performed based on manually selected anatomical landmarks. We introduce a fully automatic scheme that
is able to estimate the transformation for organ registration in a multi-modal setup
aligning intra-operative RI with pre-operative CT data. Below, let us detail the
clinical motivation for both applications.
3.1.1 Patient Setup in Fractionated Radiation Therapy
Precise and reproducible patient setup is a mandatory prerequisite for the success
of fractionated RT, improving the balance between complications and cure and
providing the fundamental basis for high-dose and small-margin irradiation application. Prior to each fraction, the patient must be accurately aligned w.r.t. the target isocenter that has been localized in tomographic planning data (typically CT).
Conventionally, the alignment is performed manually by clinical staff using a laser
cross and skin markers. For subsequent setup verification and correction, stereoscopic X-ray imaging, radiographic portal imaging, cone-beam CT or CT-on-rails
may be applied. However, these involve additional radiation exposure to the patient. Non-radiographic techniques that locate electromagnetic fiducials [Kupe 07]
are an accurate alternative, but require the patient to be eligible for the invasive
procedure of marker implantation.
Over the past few years, several devices for non-radiographic and non-invasive
patient setup and monitoring based on RI have found their way into the clinics [Bert 05, Brah 08, Fren 09, Peng 10]. The impact of these solutions has provoked
the American Association of Physicists in Medicine (AAPM) to issue a special report on quality assurance for these kinds of non-radiographic systems [Will 09].
Regardless of the particular RI technology, the systems provide a complete, precise and metric 3-D surface model of the patient in a marker-less manner. Typically, available systems are capable of estimating the table transformation that
brings a pre-defined region of the intra-fractional patient surface into congruence with a reference. A number of studies have shown that these techniques achieve a
high degree of precision in patient setup for thoracic and abdominal tumor locations [Bert 06, Gier 08, Kren 09, Scho 07]. However, existing solutions are designed
with a focus on setup verification and the automatic RI-based alignment is limited to a fine-scale position refinement. Gross misalignments cannot be resolved
Figure 3.1: Schematic illustration of the proposed automatic initial patient setup: The patient’s intra-fractional surface is acquired with an RI device and registered to a reference
shape extracted from tomographic planning data (depicted in gray). The estimated transformation (blue) that brings template and reference into congruence is then applied to the
treatment table control.
in an automatic manner. This entails that the initial patient alignment must be performed with conventional techniques using laser cross-hairs in combination with
skin markers [Bert 06, Fren 09, Gier 08, Kren 09]. This manual initial coarse setup
is both a time-consuming and tedious procedure. On average, patient setup time in
fractionated RT lies in the range of several minutes.
In this chapter, we propose an RI-based solution that enables a marker-less and
automatic initial coarse RT patient setup, superseding the need for manual positioning using lasers and skin markers. Blending seamlessly into the clinical workflow, where the treatment table is initially moved to its lowest position to allow
the patient to get on the table, the proposed approach directly acquires the patient
surface in this initial position and aligns the target to the isocenter of the linear
accelerator (LINAC) by registration to a given reference shape. An illustration of
the concept is given in Fig. 3.1. In general, the method can be applied to reference
surfaces either acquired by the RI device prior to or in the first fraction [Plac 12], or
extracted from tomographic planning data [Baue 11a]. Let us stress that using RI
reference data involves a two-step alignment procedure: In the first fraction, the
patient needs to be manually positioned with conventional alignment techniques.
The body surface in this position is then captured using RI and stored as the reference shape for the remaining treatment fractions. This two-step approach involves
increased alignment uncertainties due to error propagation. Hence, in this work,
we particularly focus on the direct alignment of intra-fractional RI surface data to a
reference shape extracted from pre-fractional tomographic data. This strategy implies the need for multi-modal registration, but yields a substantial gain in setup
accuracy compared to the aforementioned two-step workflow. The capability to
handle partial matching is required for both workflows. In the considered case,
for instance, depending on the positioning and orientation of the RI camera, its
field of view may differ substantially from the CT scan volume.
Figure 3.2: Intra-operative navigation in image-guided open liver surgery using a marker-based optical tracking system [Bell 07]. (a) Registration of 3-D US data with pre-operative
planning data. (b) Intra-operative setup with stereo localization and tracking device (left),
US transducer equipped with retro-reflecting spheres (at the operation situs), and navigation screen (center). (c) Instrument tracking. Images reprinted from Beller et al. [Bell 07]
by permission of John Wiley & Sons, Inc. Copyright © 2007 British Journal of Surgery
Society Ltd. Published by John Wiley & Sons, Ltd.
3.1.2 Image-Guided Open Liver Surgery
In image-guided surgery, the anatomical expertise of the surgeon is augmented
with a patient-specific source of information by correlating the operation situs with
pre-operative tomographic image data. This allows the physician to see the position and orientation of the surgical probe in relation to anatomical structures during the procedure, see Fig. 3.2. The essential step in image-guided surgery is the
determination of the mapping between the intra-operative representation of the
exposed organ and the patient anatomy available from pre-operatively acquired
tomographic data. Image guidance has found widespread acceptance in neurosurgery. Here, the aligning transformation is estimated based on landmarks, via bone-implanted or skin-affixed fiducial markers [Clem 08]. The use of landmark-based techniques is facilitated by the rigid anatomy surrounding the target.
For image-guided open liver surgery, this assumption does not hold. Hence,
surface-based techniques have been proposed to align the intra-operative representation to pre-operative images [Herl 99a, Herl 99b]. Benefits of computer navigated tool guidance in IGLS include (1) an enhanced support for the resection of
subsurface targets and avoidance of critical structures, (2) improved outcomes due
to reduced resection margins, and (3) an expansion of the spectrum of resectability [Cash 07]. To date, the conventional registration protocol for image guidance in
hepatic surgery is based on a landmark-based initial alignment [Herl 99b, Clem 08].
As a prerequisite, anatomical fiducials are manually selected in the pre-operative
image sets prior to surgery [Cash 07]. Then, during the procedure, the corresponding locations are digitized with a pen probe system at the operation situs. Based
on the given correspondences, the aligning transformation is estimated. Eventually, if dense surface data is available, the initial landmark-based registration may
be refined by conventional rigid surface registration techniques [Besl 92, Clem 08].
In clinical practice, manual fiducial selection by radiology experts is difficult,
subjective and time-consuming. To overcome this elaborate task, we introduce
a fully automatic scheme that estimates the transformation without any manual
interaction, based on multi-modal RI/CT organ surface registration.
3.2 Related Work
For partial surface matching scenarios as addressed in this chapter, the trend is
toward methods that establish point correspondences from matching local feature
descriptors. In IGLS, we typically face partial matching as RI data only cover the
part of the organ that resides within the sensor’s field of view (FOV), as opposed
to the full shapes extracted from tomographic planning data. Matching local invariant features is a key component in a variety of computer vision tasks in the
2-D and 3-D domain such as registration, object recognition, scene reconstruction
or similarity search in databases. Owing to its relevance to this work, we focus
our discussion of related work on the subfield of feature-based 3-D surface registration.
Typically, the descriptors encode the surface geometry in a limited support
region around a point of interest [Chua 97, From 04, John 98, John 99, Tomb 10,
Zaha 09]. The point correspondences can then be used to estimate the transformation that aligns the shapes. Among the first descriptors in the field were spherical
point signatures [Chua 97] and spin images [John 98, John 99]. Introduced more
than a decade ago, the latter still enjoy great popularity for surface matching and
have established themselves as one of the most commonly used methods for local 3-D surface description, even though they were originally designed for global shape encoding.
Over the past years, a broad variety of 3-D shape descriptors have been proposed.
For a comprehensive survey we refer to Sect. 2.4.1 and to the reviews by Bronstein
et al. [Bron 11], Bustos et al. [Bust 05], Tangelder and Veltkamp [Tang 08b], and the
shape retrieval contest benchmarks [Bron 10, Boye 11].
Let us also establish ties with the planar domain, where the majority of successful 2-D descriptors such as histograms of oriented gradients (HOG) [Dala 05],
scale-invariant feature transform (SIFT) [Lowe 04], and rotation-invariant fast features (RIFF) [Taka 10] rely on histogram representations. In the context of local
descriptors, histograms trade off descriptive power and positional precision for
robustness and repeatability by compressing geometry into bins [Tomb 10], thus
being an appropriate choice for noisy data. In this work, we propose adaptations
of and extensions to state-of-the-art HOG-like descriptors that enable their application for robust multi-modal surface registration. First, we exploit the fact that
some of the concepts known from 2-D feature description can be generalized to
the 3-D domain. For instance, inspired by the performance of HOG, Zaharescu
et al. extended the descriptor to scalar fields defined on 2-D manifolds (MeshHOG) [Zaha 09]. Even though the results of Zaharescu et al. indicate insensitivity
of the descriptor to non-rigid deformations, the fact that it is constructed based on
k-ring neighborhoods makes it triangulation-dependent, as noted by Bronstein et al. [Bron 11]. Hence, in this work, we have substantially modified the MeshHOG
descriptor to achieve invariance to mesh density and organization, and to improve
its robustness to topographical deviations due to the different 3-D acquisition techniques (RI vs. CT). Second, we introduce a scheme that enables the application of
2-D descriptors to surface data in a direct manner. In particular, based on an orthographic depth representation of the local 3-D surface topography in a planar patch,
we have extended the 2-D RIFF descriptor [Taka 10] to the domain of 3-D surfaces.
3.3 Feature-based Surface Registration Framework
The proposed feature-based framework for multi-modal rigid surface alignment
is composed of three stages, as illustrated in Fig. 3.3: First, the local 3-D surface
topography of both the moving template point set Xm (RI data) and the fixed reference shape point set Xf (extracted from CT data), denoted:
X_m = \{ x_{m,1}, \ldots, x_{m,|X_m|} \} , \quad x_m \in \mathbb{R}^3 ,   (3.1)
X_f = \{ x_{f,1}, \ldots, x_{f,|X_f|} \} , \quad x_f \in \mathbb{R}^3 ,   (3.2)
is encoded in a discriminative manner using shape descriptors that are explicitly
designed to handle surface data from different modalities (Sect. 3.4). The subscripts m and f denote moving and fixed, respectively. The number of elements
of a point set X is denoted |X |. Second, point correspondences are established
by descriptor matching and successively pruned using geometric consistency constraints (Sect. 3.3.1). Third, based on these point correspondences, the rigid-body
transformation aligning Xm to Xf is estimated (Sect. 3.3.2). Eventually, an iterative
closest point (ICP) variant may be applied
to refine the alignment (Sect. 4.3.1). The
global rigid transformation (R_g, t_g) is then given as the concatenation of the transformation estimated by feature-based pre-alignment (R_pre, t_pre) and the transformation estimated by ICP-based refinement (R_icp, t_icp):
R_g = R_icp R_pre ,   (3.3)
t_g = R_icp t_pre + t_icp ,   (3.4)
where R ∈ SO(3) denotes a rotation matrix and t ∈ \mathbb{R}^3 a translation vector.
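As a minimal sketch, the concatenation of Eqs. (3.3) and (3.4) reads as follows (an illustrative example, not the thesis implementation):

```python
import numpy as np

def concat_rigid(R_icp, t_icp, R_pre, t_pre):
    """Concatenate feature-based pre-alignment and ICP refinement
    (Eqs. 3.3 and 3.4): x -> R_icp (R_pre x + t_pre) + t_icp."""
    R_g = R_icp @ R_pre
    t_g = R_icp @ t_pre + t_icp
    return R_g, t_g
```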
3.3.1 Correspondence Search
According to the workflow introduced before, the corresponding sets of local descriptors for both the moving and the fixed shape must be computed first. Let us
denote the descriptor sets as:
D_m = \{ d_{m,1}, \ldots, d_{m,|X_m|} \} ,   (3.5)
D_f = \{ d_{f,1}, \ldots, d_{f,|X_f|} \} ,   (3.6)
where d ∈ \mathbb{R}^D denotes a feature vector of dimensionality D. Again, the subscripts
m and f denote membership to the moving template and fixed reference, respectively. Details on the proposed 3-D shape descriptors are given in Sect. 3.4.
Figure 3.3: Flowchart of the feature-based rigid surface registration framework for RI/CT alignment. Data propagates from left to right (pre-procedural workflow: CT and RI segmentation/triangulation and feature extraction; intra-procedural workflow: feature extraction, correspondence search, transformation estimation, ICP refinement). The darker shading indicates the parts of the workflow that can be performed prior to the first fraction to reduce the intra-procedural computational load.
For now, let us assume the availability of D_m, D_f. Then the moving data M and the fixed data F are represented as two sets of pairs of surface coordinates x_i and their associated feature descriptors d_i:

M = \{ (x_{m,1}, d_{m,1}), \ldots, (x_{m,|X_m|}, d_{m,|X_m|}) \} ,   (3.7)
F = \{ (x_{f,1}, d_{f,1}), \ldots, (x_{f,|X_f|}, d_{f,|X_f|}) \} .   (3.8)
Initial Correspondence Search. Given a moving point x_m and its associated descriptor d_m, we determine the corresponding fixed point x_c ∈ X_f by searching for the best matching descriptor d_c ∈ D_f, using an appropriate distance metric d:

d_c = c_f(d_m) = \operatorname{argmin}_{d_f \in D_f} d(d_m, d_f) ,   (3.9)

where c_f denotes the correspondence operator with the subscript denoting the search domain. The particular choice of the distance metric d depends on the design of the feature descriptor (Sect. 3.4). The resulting initial set of correspondences is denoted as C_init = \{ (x_{m,1}, x_{c,1}), \ldots, (x_{m,|X_m|}, x_{c,|X_m|}) \}.
Cross Validation. Then, vice versa, searching in the opposite direction, we validate whether d_{m,i} also constitutes the best match for d_{c,i} in D_m:

d_{m,i} \stackrel{?}{=} c_m(d_{c,i}) = \operatorname{argmin}_{d_m \in D_m} d(d_{c,i}, d_m) .   (3.10)

All correspondence pairs (x_m, x_c) that do not pass this cross validation check are discarded in the subsequent processing steps. The remaining set of correspondences is denoted C_cross ⊆ C_init.
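A minimal sketch of the cross-validated correspondence search (Eqs. 3.9 and 3.10), using a brute-force distance matrix and the Euclidean distance as a stand-in for the descriptor-specific metric d of Sect. 3.4.4:

```python
import numpy as np

def cross_validated_matches(D_m, D_f):
    """Mutual best-match search: Eq. (3.9) forward, Eq. (3.10) backward.
    D_m: (n_m, D) moving descriptors, D_f: (n_f, D) fixed descriptors.
    Returns index pairs (i, j) such that D_m[i] and D_f[j] are mutual
    nearest neighbors under the chosen metric."""
    # Brute-force distance matrix; memory grows as n_m * n_f.
    dist = np.linalg.norm(D_m[:, None, :] - D_f[None, :, :], axis=2)
    best_f = dist.argmin(axis=1)   # c_f: best fixed match per moving point
    best_m = dist.argmin(axis=0)   # c_m: best moving match per fixed point
    # Keep a pair only if it passes the cross validation check.
    return [(i, j) for i, j in enumerate(best_f) if best_m[j] == i]
```

In practice, this exhaustive search dominates the runtime, which motivates the GPU-based implementation commented on in Sect. 3.6.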
Figure 3.4: Graphical illustration of the geometric consistency check for a simplified example with |X_m| = 3. (a), (b), (c) Computation of g_c(x_{m,i}), g_c(x_{m,j}), g_c(x_{m,k}), comparing the Euclidean distance of point pairs (blue edges) and their correspondences (green edges).

Geometric Consistency Check. For the purpose of eliminating erroneous correspondences, the set of correspondences C_cross is further pruned by applying a geometric consistency check similar to Funkhouser and Shilane [Funk 06] and Gelfand et al. [Gelf 05]. It is based on the assumption of a rigid transformation, implying
that the distance between two points x_{m,i}, x_{m,j} ∈ X_m is equal to the distance between their correspondences x_{c,i}, x_{c,j} ∈ X_f. Hence, for each (x_{m,i}, x_{c,i}) ∈ C_cross we compute a geometric consistency metric g_c that considers the root mean squared pairwise distance w.r.t. all (x_{m,j}, x_{c,j}) ∈ C_cross (cf. Fig. 3.4):

g_c(x_{m,i}) = \sqrt{ \frac{1}{|C_{cross}|} \sum_{j=1}^{|C_{cross}|} \left( \| x_{m,i} - x_{m,j} \|_2 - \| x_{c,i} - x_{c,j} \|_2 \right)^2 } ,   (3.11)
where |C_cross| denotes the number of correspondences that passed the cross validation stage. In an iterative scheme, we successively penalize and eliminate a fixed percentage of low-grade correspondences according to g_c(x_{m,i}) until the criterion g_c(x_{m,i}) < δ_c is fulfilled for all remaining correspondence pairs, given a reliability threshold δ_c. Let us stress that for a partial matching scenario, the geometric consistency check (Eq. 3.11) must be constrained to a local neighborhood.
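A compact sketch of this iterative pruning scheme; the reliability threshold δ_c and the elimination percentage are hypothetical example values:

```python
import numpy as np

def prune_geometric_consistency(X_m, X_c, delta_c=10.0, drop_frac=0.1):
    """Iterative pruning with the consistency metric g_c of Eq. (3.11).
    X_m, X_c: (n, 3) arrays of corresponding moving/fixed points.
    delta_c (mm) and drop_frac are hypothetical example values."""
    keep = np.arange(len(X_m))
    while len(keep) > 0:
        # Pairwise distances within each point set, restricted to survivors.
        d_m = np.linalg.norm(X_m[keep, None] - X_m[None, keep], axis=2)
        d_c = np.linalg.norm(X_c[keep, None] - X_c[None, keep], axis=2)
        g_c = np.sqrt(np.mean((d_m - d_c) ** 2, axis=1))  # Eq. (3.11)
        if g_c.max() < delta_c:
            break
        # Eliminate a fixed percentage of the lowest-grade correspondences.
        n_drop = max(1, int(drop_frac * len(keep)))
        keep = keep[np.argsort(g_c)[:-n_drop]]
    return keep   # indices of reliable correspondences
```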
3.3.2 Transformation Estimation
The remaining set of reliable correspondences C ⊆ C_cross is eventually used to estimate the rigid body transformation (R_pre, t_pre) that aligns the moving template to the fixed reference, based on minimizing the squared Euclidean distance between the transformed moving points and their correspondences in the fixed point set:

(R_pre, t_pre) = \operatorname{argmin}_{R,t} \sum_{(x_m, x_c) \in C} \| (R x_m + t) - x_c \|_2^2 .   (3.12)
This optimization problem can be solved in a closed form using Horn’s unit quaternion optimizer [Horn 87].
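A minimal NumPy sketch of this closed-form solution: the optimal rotation is the unit quaternion given by the dominant eigenvector of a 4×4 matrix assembled from the cross-covariance of the centered point sets [Horn 87]:

```python
import numpy as np

def horn_absolute_orientation(P, Q):
    """Closed-form (R, t) minimizing Eq. (3.12) over correspondences
    (p, q), following Horn's unit quaternion method [Horn 87].
    P, Q: (n, 3) arrays of moving and fixed points, respectively."""
    p_bar, q_bar = P.mean(axis=0), Q.mean(axis=0)
    S = (P - p_bar).T @ (Q - q_bar)          # 3x3 cross-covariance matrix
    Sxx, Sxy, Sxz = S[0]
    Syx, Syy, Syz = S[1]
    Szx, Szy, Szz = S[2]
    N = np.array([
        [Sxx + Syy + Szz, Syz - Szy,        Szx - Sxz,        Sxy - Syx],
        [Syz - Szy,       Sxx - Syy - Szz,  Sxy + Syx,        Szx + Sxz],
        [Szx - Sxz,       Sxy + Syx,       -Sxx + Syy - Szz,  Syz + Szy],
        [Sxy - Syx,       Szx + Sxz,        Syz + Szy,       -Sxx - Syy + Szz]])
    # The optimal rotation is the eigenvector of N with the largest
    # eigenvalue, interpreted as a unit quaternion (w, x, y, z).
    w, x, y, z = np.linalg.eigh(N)[1][:, -1]
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]])
    return R, q_bar - R @ p_bar
```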
For the RT patient setup experiments in Sect. 3.5.1, R_pre is restricted to a rotation about the table's vertical isocenter axis (here: x-axis) as conventional RT treatment tables are limited to four degrees of freedom (translation and rotation about one axis). In matrix notation and homogeneous coordinates, the transformation is then given as:

\begin{pmatrix} 1 & 0 & 0 & t_x \\ 0 & \cos(\theta) & -\sin(\theta) & t_y \\ 0 & \sin(\theta) & \cos(\theta) & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_m \\ 1 \end{pmatrix} = \begin{pmatrix} x_c \\ 1 \end{pmatrix} \quad \forall\, (x_m, x_c) \in C ,   (3.13)
where θ denotes the rotation angle. The resulting non-linear optimization problem to estimate the rotation angle θ and the translation t = (t_x, t_y, t_z)^\top from a set of reliable correspondences C can be solved numerically in a least-squares sense using the Levenberg-Marquardt algorithm [Marq 63], for instance.
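As an illustration, the constrained 4-DoF estimation can be handed to an off-the-shelf Levenberg-Marquardt solver; the following sketch uses SciPy and is not the thesis implementation:

```python
import numpy as np
from scipy.optimize import least_squares

def estimate_table_transform(X_m, X_c):
    """4-DoF estimation of Eq. (3.13): rotation angle theta about the
    x-axis plus translation t, fitted to correspondences via
    Levenberg-Marquardt. X_m, X_c: (n, 3) arrays of corresponding points."""
    def residuals(params):
        theta, t = params[0], params[1:]
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
        return ((X_m @ R.T + t) - X_c).ravel()
    sol = least_squares(residuals, x0=np.zeros(4), method='lm')
    return sol.x[0], sol.x[1:]       # theta, (t_x, t_y, t_z)
```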
3.4 Shape Descriptors
In this section, we introduce the 3-D shape descriptors that we applied in the experiments (Sect. 3.5). Let us emphasize that the framework introduced in Sect. 3.3
is not limited to these specific descriptors but generic in a sense that any feature
descriptor that meets the specific requirements for the application to be addressed
can be used. The shape descriptors presented below encode the local surface topography in the neighborhood N of a reference point xref in a translation- and
rotation-invariant manner. However, the descriptors are deliberately not invariant to scale, as in most real-world applications it is beneficial to incorporate the metric scale of the anatomical surface topography as an important characteristic. Placing great importance on matching robustness and repeatability, we have customized two HOG-like descriptors (MeshHOG, RIFF) and the well-established technique of spin images [John 99] for multi-modal application. Below, we outline the descriptors' functional principles, identify their limitations and detail the concrete modifications we propose to meet the requirements in multi-modal surface registration.
Before, let us introduce our notation of a histogram H(·, ·, ·) → \mathbb{R}^{N_H}, where the first entry denotes the neighborhood of the domain that is to be encoded, the second entry the parameter that determines the bin assignment, and the third entry the parameter that controls the bin weighting. N_H denotes the number of histogram bins.
3.4.1 Spin Images
Given is an oriented surface point, i.e. a point x_ref ∈ \mathbb{R}^3 with its associated normal n_ref ∈ \mathbb{R}^3, \|n_ref\|_2 = 1, cf. Fig. 3.5a. Then the pair (x_ref, n_ref) inherently describes
an object-centered local point basis being invariant to rigid transformations. In
particular, it gives a cylindrical coordinate system that is missing the polar angle
coordinate. Based on this local basis, the corresponding spin image is generated
as follows [John 98, John 99]:
• First, the set of points within a cylindrical neighborhood N = \{ x_i, \ldots, x_j \} centered around x_ref, also denoted support region below, is expressed in cylindrical coordinates x_cyl ∈ \mathbb{R}^2:

x_{cyl,i} = \begin{pmatrix} x_{cyl,i} \\ y_{cyl,i} \end{pmatrix} := \begin{pmatrix} \sqrt{ \| x_i - x_{ref} \|_2^2 - ( n_{ref}^\top (x_i - x_{ref}) )^2 } \\ n_{ref}^\top (x_i - x_{ref}) \end{pmatrix} ,   (3.14)
where x_{cyl,i} is the non-negative perpendicular radial distance to n_ref and y_{cyl,i} denotes the signed elevation component with respect to the surface tangent plane defined by x_ref and n_ref, see Fig. 3.5d. The resulting set of cylindrical coordinates N_cyl = \{ x_{cyl,i}, \ldots, x_{cyl,j} \} describes the relative positions of N with respect to its local basis (x_ref, n_ref).
• Second, the cylindrical coordinate space is quantized into discrete bins of a 2-D histogram called spin image, i.e., the histogram is accumulated according to the position of the points x_{cyl,i} ∈ N_cyl in the local cylindrical coordinate space, cf. Fig. 3.5a. The linearized version of this 2-D histogram, concatenating the histogram rows to a single vector, yields the spin image descriptor d_spin denoted as:

d_{spin} = H( N_{cyl}, x_{cyl}, \chi(x_{cyl}) )^\top ,   (3.15)

where χ denotes a characteristic function that indicates the occurrence of a point x_cyl in a discrete bin.
In practice, in order to make the representation less sensitive to variations in surface sampling and noise, the association of a cylindrical coordinate xcyl,i to a discrete bin in the 2-D array is performed using bilinear interpolation. Thus, the
contribution of the point is smoothed over adjacent bins. Note that the parameterization of the spin image descriptor controls the trade-off between descriptiveness,
robustness and dimensionality. Eventually, let us remark that the process of generating spin images described before can be thought of as a detector matrix spinning
around n_ref and accumulating points in N as it sweeps space.
For the experiments in Sect. 3.5.1, we consider spin images as a baseline shape
descriptor. In order to cope with 3-D data acquired with different modalities, thus
involving meshes with different resolutions, we propose a modification compared
to the original implementation [John 99]. Instead of deriving the histogram bin
width from the median edge length of the surface mesh, we set a fixed metric bin
width. Thus, the metric scale of the surface topography is explicitly incorporated
into the descriptor computation.
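A simplified sketch of this modified spin image computation; the mapping of the signed elevation to histogram rows is an assumption made here for illustration:

```python
import numpy as np

def spin_image(x_ref, n_ref, neighbors, n_bins=15, bin_width=100.0 / 15):
    """Spin image with a fixed metric bin width (the proposed multi-modal
    modification) and bilinear bin spreading as in [John 99].
    neighbors: (n, 3) points within the cylindrical support region.
    Rows encode the signed elevation, columns the radial distance."""
    d = neighbors - x_ref
    beta = d @ n_ref                                      # signed elevation
    alpha = np.sqrt(np.maximum((d * d).sum(axis=1) - beta ** 2, 0.0))
    H = np.zeros((n_bins, n_bins))
    rows = (beta + 0.5 * n_bins * bin_width) / bin_width  # continuous bin coords
    cols = alpha / bin_width
    for i, j in zip(rows, cols):
        i0, j0 = int(np.floor(i)), int(np.floor(j))
        if 0 <= i0 < n_bins - 1 and 0 <= j0 < n_bins - 1:
            a, b = i - i0, j - j0
            H[i0, j0] += (1 - a) * (1 - b)                # bilinear spreading
            H[i0 + 1, j0] += a * (1 - b)
            H[i0, j0 + 1] += (1 - a) * b
            H[i0 + 1, j0 + 1] += a * b
    return H.ravel()                                      # d_spin, Eq. (3.15)
```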
3.4.2 Mesh Histograms of Oriented Gradients (MeshHOG)
Both the MeshHOG descriptor and the RIFF descriptor (Sect. 3.4.3) rely on the
concept of HOG. Hence, let us briefly review the basic idea in the 2-D image domain first [Dala 05]. HOG-like descriptors exploit the fact that local object appearance and shape is characterized by the distribution of intensity gradient directions.
The descriptor operates on scalar image data f and evaluates local image patches.
Figure 3.5: (a) Graphical illustration of shape description with spin images, representing the local neighborhood in cylindrical coordinates (x_cyl, y_cyl), cf. subfigure (d), and encoding the coordinates in a 2-D histogram. (b) Functional principle of the MeshHOG surface descriptor, projecting a gradient vector field ∇f – sketched by the green arrows and generated according to the proposed CUSS scheme, cf. subfigure (e) – onto circular segments of the orthogonal planes of a local coordinate system spanned by x_ref, n_ref and b_ref. For the RIFF descriptor (c), the surface topography is expressed as a 2-D depth image d_⊥ where rotation-invariant gradients are analyzed within several annuli. Both the MeshHOG and RIFF descriptors are based on a concatenation of histograms of oriented gradients, binning a vector field according to its orientations γ(·) and magnitudes \|·\|_2, cf. subfigure (f).
First, the gradient orientations γ(∇f) → [0, 2π[ and magnitudes \|∇f\|_2 → \mathbb{R} are computed for each pixel of the image patch. Then, in order to measure local distributions of gradient values, the window is divided into local sub-patches (cells). For each cell, the set of pixels N is discretized into a histogram according to its gradient direction, cf. Fig. 3.5f:

H( N, \gamma(\nabla f), \| \nabla f \|_2 ) .   (3.16)
The contribution depends on the gradient magnitude at the respective pixel. Finally, the cell histograms are concatenated to form the HOG descriptor. Depending on the application, contrast normalization may be performed by scaling the
feature vector to unit length.
The MeshHOG descriptor may be considered as a generalization of the concept
of HOG from planar image domains to non-planar 2-D manifolds [Zaha 09]. Given
a scalar function f defined on the manifold, the descriptor encodes the local spatial distribution and orientation of a gradient vector field ∇f derived from both the surface topography in terms of normals n and the scalar function f.
In particular, given a reference point xref and a set of points within a spherical
support region N = \{ x_i, \ldots, x_j \}, the gradient vectors ∇f(x) → \mathbb{R}^3 are projected onto the three orthogonal planes of a unique and invariant local reference frame. Let us denote the projection onto a plane P by an operator q_P. The local reference frame at x_ref is spanned by its normal n_ref and a second axis b_ref residing in the tangent plane T_ref defined by n_ref, and pointing into the dominant gradient direction [Baue 11b]. Then, for each plane, an orientation histogram binning is performed w.r.t. circular segments, see Fig. 3.5b. In particular, the projected vectors q_P(∇f(x)) → \mathbb{R}^2 are assigned to N_seg circular spatial segments according to their origin, and binned in orientation histograms w.r.t. the orthogonal local reference frame:

H( N_{P,s}, \gamma(q_P(\nabla f)), \| q_P(\nabla f) \|_2 ) ,   (3.17)
where N_{P,s} denotes the set of projected gradient vectors q_P(∇f(x)) that reside within a segment s in plane P, γ(q_P(∇f)) denotes its gradient orientation, and \|q_P(∇f)\|_2 its gradient magnitude. The MeshHOG feature descriptor d_hog is then composed as a concatenation of 3 · N_seg gradient orientation histograms, from the three orthogonal planes and the N_seg circular segments per plane:
d_{hog} = ( H(N_{P_1,1}, \gamma(q_{P_1}(\nabla f)), \| q_{P_1}(\nabla f) \|_2),
            H(N_{P_1,2}, \gamma(q_{P_1}(\nabla f)), \| q_{P_1}(\nabla f) \|_2),
            \ldots,
            H(N_{P_3,N_{seg}}, \gamma(q_{P_3}(\nabla f)), \| q_{P_3}(\nabla f) \|_2) )^\top .   (3.18)
The study of related work in Sect. 3.2 revealed that the design of shape descriptors
typically does not account for a potential application in multi-modal scenarios.
This especially holds true for the MeshHOG descriptor. Below, we outline its original limitations and introduce our modifications to enforce (1) robustness to inter-modality variations in surface topography and (2) invariance to mesh density and
mesh representation (topology).
In general, an arbitrary scalar function f can be chosen [Zaha 09], including
photometric information as applied in [Baue 11b]. In scenarios that involve a modality which only provides geometric information (such as for RI/CT registration),
the scalar function must characterize geometric surface properties. In the original
method, Zaharescu et al. proposed to use a curvature measure. However, the use
of second-order surface derivatives makes the descriptor susceptible to noise. Instead, we propose the signed distance of a point to the best-fitting plane of its local
neighborhood as scalar function f . By doing so, we can better cope with the low
SNR of RI data [Mull 10].
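A minimal sketch of this scalar function, fitting the local plane via PCA; note that, as an additional assumption here, the sign of the plane normal would need to be disambiguated in practice, e.g. with the surface normal at x:

```python
import numpy as np

def plane_distance_scalar(x, neighbors):
    """Scalar function f: signed distance of x to the best-fitting plane
    of its local neighborhood (points within a fixed radius, e.g. 20 mm
    in Sect. 3.5.1). neighbors: (n, 3) array."""
    centroid = neighbors.mean(axis=0)
    # PCA: the plane normal is the eigenvector of the local covariance
    # matrix associated with the smallest eigenvalue.
    centered = neighbors - centroid
    normal = np.linalg.eigh(centered.T @ centered)[1][:, 0]
    return float((x - centroid) @ normal)
```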
Next, we address the invariance to mesh density and representation. Conventionally, in a 2-D image, gradients are computed by differentiating scalar data in
two orthogonal directions. Zaharescu et al. introduced a numerical gradient of a
scalar function ∇ f defined on a 2-D manifold using a discrete operator that relies on adjacent vertices [Zaha 09], restricting the approach to meshes that have a
uniform and equally scaled triangular representation. To be able to cope with arbitrary mesh representations and densities, we propose a gradient operator that is
based on a technique we call circular uniform surface sampling (CUSS), see Fig. 3.5e.
First, for a given point x, we perform a circular uniform sampling of the tangent
plane T defined by n. This is performed by rotating a reference vector a ∈ \mathbb{R}^3 residing in T with \|a\|_2 = 1 around n by a set of discrete angles \{ θ_1, \ldots, θ_{N_cuss} \} with θ_i = i · 2π/N_cuss, yielding R_{θ_i} a. R_θ denotes a rotation matrix for angle θ around n. N_cuss controls the circular sampling density. Scaling the vectors R_{θ_i} a with a sampling radius r_cuss provides a set of points X_cuss = \{ x_{cuss,1}, \ldots, x_{cuss,N_cuss} \} with:

x_{cuss,i} = x + r_{cuss} R_{\theta_i} a .   (3.19)
Based on these circularly arranged points X_cuss ∈ T, the surface sampling is performed by intersecting the mesh with rays that emerge from the points x_cuss and are directed parallel to n. Let us denote the mesh intersection points as m_cuss, with their associated scalar field values f(m_cuss) being interpolated w.r.t. adjacent vertices. Eventually, the CUSS gradient ∇f(x) at a vertex x is defined as:

\nabla f(x) = \frac{1}{N_{cuss}} \sum_{i=1}^{N_{cuss}} \frac{f(m_{cuss,i}) - f(x)}{\| m_{cuss,i} - x \|_2} \, R_{\theta_i} a ,   (3.20)
where R_{θ_i} a denotes the circular sampling direction. The numerator differentiates f in the direction of R_{θ_i} a. The denominator acts as a regularizer that penalizes the contribution of outliers, represented by intersection points m_cuss with large distances to x.
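The following sketch outlines the CUSS gradient of Eq. (3.20); the ray-mesh intersection routine intersect and the interpolated scalar field f are hypothetical helpers assumed to be provided:

```python
import numpy as np

def cuss_gradient(x, n, f, intersect, r_cuss=20.0, n_cuss=18):
    """CUSS gradient of Eq. (3.20). intersect(origin, direction) is a
    hypothetical ray-mesh intersection helper returning the surface point
    hit by a ray parallel to the unit normal n; f(p) evaluates the
    (interpolated) scalar field. Parameter values follow Sect. 3.5.1."""
    # Choose an arbitrary unit vector a in the tangent plane defined by n.
    a = np.cross(n, [1.0, 0.0, 0.0])
    if np.linalg.norm(a) < 1e-6:
        a = np.cross(n, [0.0, 1.0, 0.0])
    a /= np.linalg.norm(a)
    grad = np.zeros(3)
    for i in range(1, n_cuss + 1):
        theta = i * 2.0 * np.pi / n_cuss
        # Rodrigues' formula: rotate a by theta about the axis n.
        d = (a * np.cos(theta) + np.cross(n, a) * np.sin(theta)
             + n * (n @ a) * (1.0 - np.cos(theta)))
        m = intersect(x + r_cuss * d, n)          # mesh sample point m_cuss
        grad += (f(m) - f(x)) / np.linalg.norm(m - x) * d
    return grad / n_cuss
```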
3.4.3 Rotation Invariant Fast Features (RIFF)
In recent years, a multitude of powerful image descriptors have been proposed
and quantitatively evaluated on large-scale databases [Miko 05]. Unfortunately,
the transfer of feature extraction concepts that were originally developed for the
2-D image domain to 3-D shape space is typically not straightforward, cf. Sect. 3.4.2.
To exploit the comprehensive body of literature on image descriptor representations, we have developed a scheme that allows us to extend any 2-D descriptor concept to the domain of 3-D surfaces. The underlying strategy relates to the basic
idea of spin images, namely, representing 3-D shape in a 2-D domain first. In particular, we encode the surface topography in the neighborhood of x_ref in a local orthographic depth representation w.r.t. the tangent plane T_ref defined by the normal n_ref. Imagine a virtual RI camera hovering above the considered surface coordinate x_ref. The virtual camera uses an orthographic projection model and the direction of projection is defined by the surface normal n_ref. It yields a discrete 2-D image where each pixel x_p holds the signed orthogonal depth d_⊥(x_p) of its position on the tangent plane w.r.t. the surface, see Fig. 3.5c. For practical implementation we
exploit the OpenGL depth buffer representation, cf. Sect. 2.2.2. Let us emphasize
that this orthogonal re-sampling strategy implicitly overcomes differences in mesh
density and representation that occur in multi-modal applications due to different
triangulation schemes.
Now we are in the convenient position to apply any 2-D descriptor on this
orthogonal depth image d_⊥ generated by the virtual RI camera. For the applications addressed in this thesis, we chose RIFF [Taka 10], which also builds on the
established concept of HOG [Dala 05]. As proposed by Takacs et al. [Taka 10], first,
44
Feature-based Multi-Modal Rigid Surface Registration
rotation invariance is achieved by performing a radial gradient transform (RGT)
f rgt : R2 → R2 on the gradient depth image ∇d⊥ ,
>
>
f rgt (∇d⊥ ) = ∇d⊥ e1 , ∇d⊥ e2 ,
(3.21)
encoding the gradient ∇d_⊥ → \mathbb{R}^2 w.r.t. a local polar reference frame. This frame is spanned by the orthogonal basis vectors e_1 and e_2 that depend on the image position x_p ∈ \mathbb{R}^2 and the image position of the given reference point x_{p,ref}:

e_1(x_p) = \frac{x_p - x_{p,ref}}{\| x_p - x_{p,ref} \|_2} , \qquad e_2(x_p) = R_{\pi/2} \, e_1(x_p) .   (3.22)
The basis vectors e_1(x_p) and e_2(x_p) are the radial and tangential directions at x_p relative to x_{p,ref}. For the proof of rotation invariance of the RGT we refer to [Taka 10]. Second, the image is subdivided into spatial cells arranged in N_riff circular equidistant annuli neighborhoods N_a, see Fig. 3.5c. The subscript a denotes the annulus index. Then, given the rotation-invariant gradients f_rgt(∇d_⊥), for each annulus N_a we compute a gradient orientation histogram, cf. Fig. 3.5f,

H( N_a, \gamma(f_{rgt}(\nabla d_\perp)), \| f_{rgt}(\nabla d_\perp) \|_2 ) ,   (3.23)
where γ(f_rgt(∇d_⊥)) denotes the orientation and \|f_rgt(∇d_⊥)\|_2 the magnitude of f_rgt(∇d_⊥). The RIFF descriptor d_riff eventually concatenates the histograms from the set of annuli:

d_{riff} = ( H(N_1, \gamma(f_{rgt}(\nabla d_\perp)), \| f_{rgt}(\nabla d_\perp) \|_2),
             H(N_2, \gamma(f_{rgt}(\nabla d_\perp)), \| f_{rgt}(\nabla d_\perp) \|_2),
             \ldots,
             H(N_{N_{riff}}, \gamma(f_{rgt}(\nabla d_\perp)), \| f_{rgt}(\nabla d_\perp) \|_2) )^\top .   (3.24)
Compared to a single HOG computed over the entire circular patch, this spatial subdivision into annular cells enforces distinctiveness of the descriptor. Again, note that the use of histogram representations implies resilience w.r.t. multi-modal topography deviations, and the choice of the number of annuli is a trade-off between distinctiveness and robustness.
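To make the RGT concrete, a sketch on a discrete orthographic depth patch d_⊥; taking the patch center as the reference pixel is an assumption for illustration:

```python
import numpy as np

def radial_gradient_transform(depth):
    """Radial gradient transform (Eq. 3.21) of an orthographic depth
    patch, expressing the gradient of d_perp in the local polar frame
    (e_1, e_2) of Eq. (3.22) around the patch center."""
    gy, gx = np.gradient(depth)                     # image-space gradient
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    # Radial unit vectors e_1 relative to the reference (patch center).
    rx, ry = xs - (w - 1) / 2.0, ys - (h - 1) / 2.0
    r = np.maximum(np.hypot(rx, ry), 1e-9)
    e1x, e1y = rx / r, ry / r
    e2x, e2y = -e1y, e1x                            # tangential: e_2 = R_{pi/2} e_1
    g_radial = gx * e1x + gy * e1y
    g_tangential = gx * e2x + gy * e2y
    return g_radial, g_tangential
```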
3.4.4 Distance Metrics for Feature Matching
The similarity of spin images is typically rated by a correlation-based distance
metric [Pear 96, John 98]. This is also a convenient choice for the multi-modal case
implying different mesh densities, as spin images of the same topography but
with different vertex densities exhibit a linear relationship between corresponding entries of d_spin for approximately uniformly distributed point clouds. The RIFF and MeshHOG descriptors d_riff, d_hog are commonly compared using bin-to-bin distances such as the L1 and L2 norm, respectively. This is practicable as the RIFF and MeshHOG histogram domains are aligned prior to matching, using L2-normalization to gain invariance to mesh resolution and density.
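As a minimal example, one possible realization of such a correlation-based distance is one minus the Pearson correlation coefficient of corresponding descriptor entries:

```python
import numpy as np

def spin_image_distance(d1, d2):
    """Correlation-based spin image distance: one minus the Pearson
    correlation coefficient between corresponding descriptor entries."""
    return 1.0 - np.corrcoef(d1, d2)[0, 1]
```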
3.5 Experiments and Results
Experiments and Results
In the experiments, we investigate the application of the proposed multi-modal
rigid surface registration framework for marker-less patient setup in fractionated
RT and augmented reality applications in IGLS, as motivated in Sect. 3.1.1. In order
to study the generalizability of the method, we consider different RI technologies:
structured light (Sect. 3.5.1) and ToF imaging (Sect. 3.5.2).
3.5.1 Multi-Modal Patient Setup in Fractionated RT
We consider the clinical scenario where the patient’s body surface is captured using RI sensors and registered to a given reference shape extracted from CT planning data, potentially involving gross initial misalignments. The estimated transformation is then transferred to the control unit of the steerable treatment table.
For quantitative evaluation, we have benchmarked the performance of the proposed descriptors in an experimental study on real data from Microsoft Kinect,
using anthropomorphic phantoms. Thereby, we underline the benefits of the proposed framework for multi-modal partial surface matching.
Materials and Methods. We have generated a benchmark database of Microsoft
Kinect RI data for two anthropomorphic phantoms (male/female), see Fig. 3.6a,b.
Data were acquired in a clinical RT environment (Siemens ARTISTE, Siemens AG,
Healthcare Sector, Kemnath, Germany, cf. Fig. 3.6c). For each phantom, we captured RI data for N = 20 different initial misalignments of the treatment table,
including large deviations. The set of poses for the phantom benchmark is composed of all possible combinations of the transformation parameter sets θAP =
{0, 5, 10, 25, 45}◦ , tSI = {0, 200} mm, and tML = {0, 200} mm, where the angle θAP
describes the table rotation about the isocenter axis and tSI , tML denote the table
translation in superior-inferior (SI) and medio-lateral (ML) direction, cf. Fig. 3.1.
The translation in anterior-posterior (AP) direction was set to tAP = −600 mm,
representing the initial height for the patient to get on the table and recline. The
table positioning control (accuracy: ± 1.0 mm, ± 0.5◦ ) was used to set up the respective ground truth transformation ( RGT , tGT ). The RI sensor was mounted 200
cm above the floor, at a distance of 240 cm to the LINAC isocenter and a viewing
angle of 55◦ . RI sequences (30 fps) were acquired with a resolution of 640×480 px.
In terms of RI pre-processing, we combined temporal averaging (over 150 ms)
with edge-preserving bilateral filtering. Invalid measurements were restored using normalized convolution (cf. Sect. 2.2.3). The patient surface can be segmented
from the background using prior information about the treatment table position.
CT data of the phantoms were acquired on a Siemens SOMATOM scanner (Department of Neuroradiology, Erlangen University Clinic, Germany). The male
phantom was scanned and reconstructed with a resolution of 512 × 512 × 346 voxels and a spacing of 0.95 × 0.95 × 2.5 mm, the female one with a resolution of
512 × 512 × 325 voxels and a spacing of 0.81 × 0.81 × 2.5 mm. The phantom surface was extracted using a thresholding-based region growing approach with manual seed point placement. On the extracted binary segmentation mask, we then applied the Marching Cubes algorithm [Lore 87] followed by Laplacian mesh smoothing [Fiel 88]. Eventually, the mesh was decimated in order to reduce the computational complexity. The pre-processed RI (CT) meshes consist of ∼15k (∼20k) vertices. Note that CT data typically covers only a portion of the RI scene. Furthermore, let us stress that CT pre-processing including the computation of 3-D shape descriptors can be performed offline prior to the first fraction (cf. Fig. 3.3).

Figure 3.6: Female (a) and male (b) anthropomorphic phantoms, made of glass-fiber reinforced plastic. (c) Siemens ARTISTE RT suite. Image source: Siemens AG. Note the set of four multi-modal RI/CT markers (d) attached to the phantoms (a, b) for registration to CT planning data.
In order to evaluate the accuracy of patient setup w.r.t. the ground truth transformation given in the LINAC coordinate system (table controls), the RI measurements were transformed into the LINAC coordinate system. The RI/LINAC coordinate system transformation was determined using a checkerboard calibration
pattern. CT data of the male and female phantom were aligned to the LINAC coordinate system using an RI reference acquisition of the phantoms at the isocenter
position (θAP = 0◦ , tSI = 0 mm, tML = 0 mm, tAP = −150 mm) in combination
with multi-modal RI/CT markers, see Fig. 3.6. In particular, we attached a set of
four painted Beekley spots¹ to each phantom. These spots are placed such that
they are visible from the RI camera. In addition, they can be easily detected in CT
data. Based on the corresponding marker positions in RI and CT data, the transformation can be estimated. Thus, the CT dataset is transformed into the reference
position of the LINAC coordinate system². This calibration procedure now allows
us to compare the estimated table transformation (R, t) to the ground truth table
setup (R_GT, t_GT) given by the table control in the LINAC coordinate system.

¹ http://www.beekley.com
² Note that this alignment procedure allows for a general feasibility study of coarse patient setup. For a more precise analysis including an investigation of the effect of subsequent position refinement using ICP, this procedure is considered insufficient.

In particular, we compute the mean rotational and mean translational errors over the set of N poses:
\Delta\theta_{AP} = \frac{1}{N} \sum_{i=1}^{N} | \theta_{AP}^{(i)} - \theta_{AP,GT}^{(i)} | ,   (3.25)

\Delta t_{AP/ML/SI} = \frac{1}{N} \sum_{i=1}^{N} | t_{AP/ML/SI}^{(i)} - t_{AP/ML/SI,GT}^{(i)} | ,   (3.26)

\Delta t = \frac{1}{N} \sum_{i=1}^{N} \| t^{(i)} - t_{GT}^{(i)} \|_2 ,   (3.27)
where θ_AP (θ_AP,GT) denotes the estimated (ground truth) rotation angle around the table axis, and t (t_GT) the translation. The superscript indicates the index of the
evaluated transformation. Recall that the transformation is restricted to a rotation
about the table’s vertical isocenter axis as conventional RT treatment tables are
limited to four degrees of freedom (Sect. 3.3.2).
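The evaluation metrics of Eqs. (3.25)-(3.27) reduce to a few array reductions; a sketch, assuming the per-pose estimates are stored as arrays:

```python
import numpy as np

def setup_errors(theta, theta_gt, t, t_gt):
    """Mean setup errors of Eqs. (3.25)-(3.27). theta, theta_gt: (N,)
    arrays of table rotation angles; t, t_gt: (N, 3) arrays of
    translations, columns ordered as (AP, ML, SI)."""
    d_theta = np.mean(np.abs(theta - theta_gt))           # Eq. (3.25)
    d_axis = np.mean(np.abs(t - t_gt), axis=0)            # Eq. (3.26), per axis
    d_t = np.mean(np.linalg.norm(t - t_gt, axis=1))       # Eq. (3.27)
    return d_theta, d_axis, d_t
```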
For a valid comparison of the proposed shape descriptors, we set an identical
support region radius of rN = 100 mm. This choice is motivated by the physical
scale of anatomical features on a human torso. The individual descriptor parameters were set to typical values [John 98, Taka 10, Zaha 09] if not stated otherwise.
The most influential parameters were adapted empirically using parameter grid
search techniques. For the spin image descriptor, the cylindrical coordinate space
was discretized into 15 bins in each direction. Note that the metric bin width is
rN /15 and the support region describes a cylinder. This parameterization results
in a descriptor dimension of Dspin = 15 · 15 = 225. For the MeshHOG descriptor,
we chose 8-bin orientation histograms and Nseg = 8 equally-spaced circular segments, resulting in Dhog = 3 · 8 · 8 = 192. The parameter study revealed that using
fewer orientation bins and circular segments, respectively, reduces descriptiveness,
while using more orientation bins affects robustness due to instabilities in the establishment of the local reference frame and the gradient vector field. The CUSS
sampling density was set to Ncuss = 18, the sampling radius to rcuss = 20 mm. Regarding the scalar function, the neighborhood radius for estimating the best-fitting
plane was set to 20 mm. For a comprehensive MeshHOG parameter study we refer to [Baue 11b, Mull 10]. For the RIFF descriptor, the virtual RI camera sampling
density was set to 128 × 128 px. Using N_riff = 4 circular annuli and 8-bin orientation histograms per annulus results in D_riff = 4 · 8 = 32. As distance metric for
feature matching, we applied the correlation coefficient for spin images and the
L1 -norm for RIFF and MeshHOG.
Results. Quantitative results for the multi-modal alignment of the N = 20 phantom poses are depicted in Table 3.1. In total, for all three descriptors, the registration framework was able to estimate the table transformation for the vast majority
of gross initial misalignments. For the application in RT patient setup, what is
most important is a high percentage of successful registrations (SR) – accuracy is
a secondary goal for this initial pre-alignment. Having achieved the highest percentage of successful registrations (97.5%), let us refer to the results of the MeshHOG descriptor as an overall performance indicator yielding a mean rotational
Descriptor      Metric          (m)            (f)            (m) & (f)
------------------------------------------------------------------------
Spin Images     SR              0.95           0.95           0.95
                Δθ_AP [°]       0.7 ± 0.6      0.6 ± 0.5      0.7 ± 0.6
                Δt_AP [mm]      5.7 ± 3.8      3.7 ± 2.5      4.7 ± 3.3
                Δt_ML [mm]      7.6 ± 5.0      5.6 ± 4.3      6.6 ± 4.7
                Δt_SI [mm]      7.1 ± 6.3      7.3 ± 4.2      7.2 ± 5.3
                Δt [mm]         13.5 ± 6.0     11.1 ± 4.0     12.3 ± 5.2
MeshHOG         SR              1.00           0.95           0.98
                Δθ_AP [°]       1.0 ± 0.6      2.0 ± 1.6      1.5 ± 1.3
                Δt_AP [mm]      4.9 ± 3.6      3.9 ± 2.8      4.4 ± 3.2
                Δt_ML [mm]      6.7 ± 5.1      8.6 ± 4.4      7.6 ± 4.8
                Δt_SI [mm]      7.9 ± 8.3      5.8 ± 6.2      6.9 ± 7.3
                Δt [mm]         13.5 ± 7.2     12.3 ± 5.9     12.9 ± 6.6
RIFF            SR              1.00           0.90           0.95
                Δθ_AP [°]       1.4 ± 1.2      2.0 ± 1.6      1.7 ± 1.4
                Δt_AP [mm]      4.5 ± 3.5      4.7 ± 3.5      4.6 ± 3.5
                Δt_ML [mm]      4.5 ± 3.7      7.2 ± 5.6      5.8 ± 4.8
                Δt_SI [mm]      6.2 ± 4.7      6.3 ± 5.9      6.3 ± 5.2
                Δt [mm]         10.5 ± 3.9     12.4 ± 6.1     11.4 ± 5.1

Table 3.1: Mean rotational and translational errors in patient setup based on multi-modal RI/CT surface registration on male (m) and female (f) anthropomorphic phantoms. Given are mean and standard deviation. SR quotes the percentage of successful registrations, classified with heuristic thresholds (Δθ_AP < 10°, Δt < 40 mm).
Having achieved the highest percentage of successful registrations (97.5%), let us refer to the results of the MeshHOG descriptor as an overall performance indicator, yielding a mean rotational and translational error of Δθ_AP = 1.5 ± 1.3° and Δt = 12.9 ± 6.6 mm, respectively³. Note that with the original MeshHOG formulation [Zaha 09] where ∇f
relies on adjacent vertices (recall Sect. 3.4.2), the registration failed for all datasets.
The results for spin images are roughly on par with the HOG-like descriptors (MeshHOG, RIFF). We assume this good spin image performance to result from the fact that both meshes exhibit a rather uniform triangulation. Nonetheless, the experiments confirm the robustness of spin images, underlining their popularity. Comparing the performance of the shape descriptors on the male and female phantom, respectively, the results seem surprising at first glance. With respect to its distinctive topography (Fig. 3.6a), one might expect a better registration success rate and accuracy for the female mannequin. Instead, the results indicate that negative effects due to self-occlusion (which increase with large table rotations) even lead to a slightly worse performance compared to the male phantom.
³ At this point, let us remark again that the scope of the feature-based approach is restricted to a coarse initial patient setup. Setup verification in terms of accurate positioning refinement is a mandatory subsequent step but not addressed here.
Figure 3.7: Spatial distribution of point correspondences for a multi-modal RI/CT registration on male (top row, θ_AP = 0°, t_SI = 200 mm, t_ML = 200 mm, t_AP = −600 mm) and female phantom data (bottom row, θ_AP = 45°, t_SI = 200 mm, t_ML = 200 mm, t_AP = −600 mm). From left to right, point correspondences for (a) spin images, (b) MeshHOG and (c) RIFF are given. Only a subset of the found correspondences is shown.
A qualitative illustration of the extracted set of correspondences is shown in
Fig. 3.7. We found that with increasing table rotations, the number of found correspondences decreased considerably. For instance, on average over all descriptors,
the number of correspondences for table rotations of θAP = 45◦ decreased to 59.2%
compared to the number of correspondences for a table rotation of θAP = 0◦ .
3.5.2 Multi-Modal Data Fusion in IGLS
Investigating the application of the proposed framework in IGLS, we consider a
multi-modal ToF/CT setup (Sect. 3.1.2). The applicability of ToF imaging for intra-operative surface acquisition was first investigated by Seitel et al. [Seit 10], achieving promising results on intra-operative 3-D shape acquisition of porcine organs.
Materials and Methods. We performed in-vitro experiments on four porcine livers that were captured both with a PMD CamCube 2.0 ToF camera and with a C-arm CT system. As opposed to the patient setup experiments (Sect. 3.5.1), the ground truth transformation that aligns ToF to CT data is not given here. Hence, for evaluation of the registration performance, we computed the residual Euclidean mesh-to-mesh distance, denoted target registration error d_TRE, after having applied the estimated transformation.

Figure 3.8: CT (top row) and ToF (bottom row) porcine liver surface data. For convenience, these graphics were produced for similar poses after manual alignment to facilitate visual comparison. Note the surface artifacts in ToF surface data due to specular reflections.
ToF data were acquired with a PMD CamCube 2.0 camera with an integration time of 1000 µs. For further specifications we refer to Sect. 2.1.3. The camera
was mounted 60 cm above the organ in an interventional radiology development
environment. Owing to the low SNR of ToF data and w.r.t. the trade-off between data denoising and preservation of topographical structure, we perform
data pre-processing in a way that gives priority to the topographical reliability of
the surface. In particular, we combine temporal averaging with edge-preserving
median and bilateral filtering, respectively. Furthermore, the organ surface was
segmented by depth thresholding. The pre-processed ToF organ meshes are depicted in Fig. 3.8. Note that some surface artifacts that result from specular highlights could not be removed using the proposed pre-processing pipeline. CT data
were acquired with an Artis zeego C-arm CT system (Siemens AG, Healthcare Sector, Forchheim, Germany). The livers were scanned and reconstructed with a volumetric resolution of 512 × 512 × 348 voxels and a spacing of 0.7 × 0.7 × 0.7 mm.
In analogy to Sect. 3.5.1, the organ was segmented from CT data using a region
growing approach with manual seed point placement followed by mesh generation. The CT organ surface meshes are depicted in Fig. 3.8.
Based on the descriptor performance results in Sect. 3.5.1, we confine to the
MeshHOG descriptor for this multi-modal ToF/CT surface registration scenario
(Sect. 3.4.2). The descriptor parameters were determined in a comprehensive parameter study [Mull 10], resulting in a support region radius of rN = 30 mm,
Nseg = 4 circular segments with 4-bin orientation histograms, and thus a descriptor dimension of Dhog = 3 · 4 · 4 = 48. The CUSS sampling density was set to
Ncuss = 18, the sampling radius rcuss to triple the average ToF mesh edge length.
                 L1      L2      L3      L4      L1-L4
d_TRE,pre [mm]   4.38    4.24    4.37    2.18    3.78 ± 1.08
d_TRE,icp [mm]   2.01    2.47    1.72    1.58    1.95 ± 0.39

Table 3.2: Quantitative registration results for ToF data from four porcine livers (L1-L4). Given are the target registration errors d_TRE after coarse pre-alignment (first row) and after ICP refinement (second row), respectively.

Regarding the scalar function f, the neighborhood radius for estimating the best-fitting plane was set to 30 mm. As distance metric for feature matching, we chose
the L2 norm as originally proposed [Zaha 09] to penalize outliers that are expected
due to the low SNR and artifacts with ToF surface data, cf. Fig. 3.8.
Results. Quantitative ToF/CT registration results for the porcine livers L1-L4 are
given in Table 3.2. Over the four datasets, the target registration error after feature-based initial surface pre-alignment was d_TRE,pre = 3.78 ± 1.08 mm and d_TRE,icp = 1.95 ± 0.39 mm after subsequent ICP refinement. Note that d_TRE,pre < 5 mm and d_TRE,icp < 2.5 mm for all four datasets. The comparatively small error for dataset L4 is assumed to result from its more distinctive topography compared to L1-L3. The spatial distribution of the set of reliable correspondences that are used for transformation estimation is illustrated in Fig. 3.9. Note that on liver L1, fewer
areas with reliable correspondences were identified compared to L2-L4 that exhibit
a more salient geometry and hence more distinct feature descriptors on average.
3.6 Discussion and Conclusions
In this chapter, we have presented a feature-based rigid surface registration framework that meets the specific requirements for multi-modal medical applications.
In particular, we have adapted state-of-the-art descriptors to enforce invariance
w.r.t. mesh density, mesh organization and inter-modality deviations in surface
topography. Let us remark that the proposed descriptors share several common
properties. First, all of them project the original 3-D surface information onto a
2-D domain w.r.t. the given reference point. Second, the projected data is encoded
using histogram techniques. Third, all three descriptors rely on a local object-centered reference frame. For spin images and RIFF, this local frame constitutes a 2-D basis that relies on the position of the reference point and its associated normal. In contrast, the MeshHOG descriptor builds upon a 3-D basis that is established by an additional axis. This 3-D reference frame is advantageous in terms of descriptiveness compared to spin images and RIFF, if established in a robust and repeatable manner [Petr 11].
In an experimental study, we have investigated the application of the framework to two medical applications. For marker-less initial patient setup in RT, applying our modified MeshHOG descriptor for RI/CT registration resulted in an
average angular positioning error of 1.5 ± 1.3◦ and an average translational positioning error of 12.9 ± 6.6 mm, at a 97.5% success rate on Microsoft Kinect RI
data.

Figure 3.9: Illustration of the spatial distribution of reliable point correspondences between ToF (right) and CT surfaces (left), for the four liver datasets L1-L4. Again, for convenience, only a subset of the found correspondences is shown.

In general, all three shape descriptors gave a comparable performance in the experiments, even in the presence of gross misalignments and a flat RI viewing angle (55°). The achieved setup accuracy fulfills the requirements for patient
pre-alignment (min. ±50 mm [Fren 09]) and provides a reliable initialization for
subsequent position refinement approaches [Baue 12b, Brah 08, Fren 09, Scho 07];
thus potentially making the conventional manual initialization using lasers and
skin markers redundant. Regarding the alignment of intra-operative organ surface
data to pre-operative shapes extracted from tomographic data, our experimental
results on porcine liver data in a ToF/CT setup suggest that a feature-based registration approach based on marker-less RI can replace the manual selection of
anatomical landmarks in IGLS. Over four datasets, the average target registration
error was 3.78 ± 1.08 mm. The successful application of the proposed framework to two different RI modalities and different biological materials (skin vs. organs) indicates the generalization potential of the approach to a wide range of clinical applications where the assumption of approximate rigidity holds true. By design, the method can handle gross initial misalignments and partial matching, and can also be applied to setups with multiple RI cameras that typically provide more reliable surface data due to an increased and overlapping coverage of the target.
Let us briefly comment on runtime performance. First, recall that the processing of pre-operative reference data including pre-processing, surface extraction
and feature extraction can be performed offline. During the procedure, the computation of shape descriptors on RI data and their matching to the pre-computed CT shape descriptors is required. The runtime for this intra-procedural workload highly depends on the descriptor parameterization and the RI/CT mesh densities constituting the dimensionality of the search space. On average, for both
the RT and the IGLS scenario, the runtime of our proof-of-concept implementation lies in the range of 30-60 s with a GPU-based cross-validation correspondence search. Further speed-ups could be achieved using dedicated acceleration
structures, cf. Sect. 4.3.2. Several components of the framework could be addressed to improve the performance. First, introducing a low-level feature detection stage that identifies locations with distinctive topographies [Holz 12] would
reduce the computational effort for shape description and narrow down the correspondence search space. Second, modifications regarding the correspondence
search strategy, e.g. extending the geometric consistency check to angular constraints [Shan 04, Funk 06] or enforcing the second-best feature match to be substantially worse than the best one [Zaha 09], could help in further improving the robustness of the approach. Third and last, we expect benefits from performing
feature extraction in a multi-scale scheme.
CHAPTER 4

Photo-geometric Rigid Surface Registration for Endoscopic Reconstruction
4.1 Medical Background
4.2 Related Work
4.3 Photo-geometric Surface Registration Framework
4.4 Experiments and Results
4.5 Discussion and Conclusions
The feature-based surface registration framework introduced in Chap. 3 relies on
the topography of the shapes that are to be aligned. In this chapter, we propose
a method for 3-D shape reconstruction that incorporates additional complementary photometric information from textured point cloud data to guide the underlying surface registration process. In an experimental study, we show that this
photo-geometric approach is of particular interest for modern RGB-D cameras that
provide low-SNR range measurements but additionally acquire high-grade photometric information. We address two particular medical applications and show
that they can benefit from incorporating complementary photometric information:
• Operation situs reconstruction in 3-D laparoscopy
• Colon shape model construction in 3-D colonoscopy
Both applications involve real-time constraints for practical usage. Hence, we
propose an ICP variant that (1) incorporates both geometric and photometric information in an efficient low-level scheme and (2) builds on a novel acceleration
structure to overcome the traditional performance bottleneck in nearest neighbor
(NN) search space traversal. Even though the acceleration structure is specifically
designed for an implementation on many-core hardware [Cayt 10, Cayt 11], we
have further optimized the scheme in terms of performance, trading off accuracy
against runtime [Baue 13b, Neum 11].
The remainder of this chapter is organized as follows. In Sect. 4.1 and Sect. 4.2,
we address the medical background and related work. In Sect. 4.3, we detail the
proposed framework for photo-geometric surface registration and 3-D shape reconstruction. In Sect. 4.4, we study its performance on a classical computer vision
scenario and present experimental results for operation situs reconstruction in laparoscopy and organ shape model construction in colonoscopy. Eventually, we
discuss the results and draw conclusions in Sect. 4.5. Parts of this chapter have
been published in [Baue 13b, Neum 11]. Parts from [Baue 13b] are reprinted with
kind permission from Springer Science and Business Media, © Springer-Verlag
London 2013.
4.1 Medical Background
Let us depict the medical background of the addressed applications. As an overall introduction, we first summarize the state of the art and comment on the potential of 3-D endoscopy. Then, we describe the particular applications we address in laparoscopy (Sect. 4.1.1) and colonoscopy (Sect. 4.1.2).

3-D endoscopy can help accomplish both diagnostic and interventional minimally invasive procedures (MIP) more easily, safely, and in a quantitative manner. For a comprehensive overview of prospective applications we refer to Sect. 2.3.4 and related
literature [Moun 07, Miro 11]. Beside 3-D reconstruction from monocular endoscopic video data, there is an increasing body of work investigating the miniaturization of RI measurement principles for endoscopic application. To date, both the
research community and hardware manufacturers focus on the development of
rigid 3-D endoscopes. Recently, Röhl et al. presented a GPU-enhanced surface reconstruction framework for stereo endoscopy [Rohl 12], and Haase et al. [Haas 12]
presented a first prototype for combined ToF/RGB endoscopy, manufactured by
Richard Wolf GmbH, Knittlingen, Germany. Both systems provide photometric
and geometric surface information in real-time. In addition, literature reveals
promising concepts that can potentially be translated to flexible 3-D endoscopy,
e.g. photometric stereo [Coll 12], structured light [Schi 11, Schm 12], or ToF imaging [Penn 09]. More concretely, Clancy et al. presented a flexible multi-spectral structured illumination probe [Clan 11], yet limited to sparse 3-D measurements. In any
case, to gain acceptance, 3-D endoscopes are required to capture complementary
photometric video footage as in conventional 2-D endoscopy.
4.1.1 Operation Situs Reconstruction in Laparoscopy
Minimally invasive procedures have become a promising alternative to open surgery
for an increasing number of applications. Working through small incisions holds
substantial advantages compared to conventional surgery, including the reduction of post-operative patient pain and surgical trauma, a minimized risk of comorbidity, a reduced recovery period and hospital stays, and improved cosmetic
results. Laparoscopic surgery refers to MIP in the abdominal cavity. The procedure involves the inflation of carbon dioxide into the operation cavity to create
the working volume (pneumoperitoneum), and the insertion of surgical tools into
the pneumoperitoneum through small incisions that are sealed with trocars. The
intervention is performed under remote video guidance through an endoscopic
camera and one or more light sources.
Major limitations in endoscopic interventions include a limited field of view,
distortions and inhomogeneous illumination, a difficult hand-eye coordination,
and the loss of tactile feedback and 3-D perception. This complicates the control
of surgical tools and the assessment of pathological structures. Hence, endoscopic
procedures require both skill and experience of the surgeon. In particular, this
includes a high degree of dexterity, spatial awareness and orientation to navigate
safely in vivo. Indeed, the success of laparoscopic procedures highly depends on
precise knowledge of the patient anatomy during the intervention.
Dynamic view expansion techniques are a promising option for conventional
2-D endoscopy [Totz 11, Totz 12, Warr 12]. Basically, view expansion techniques
combine the previously observed endoscopic video footage into a panoramic view
of the operation situs. Even though a 3-D reconstruction of the situs from monocular video data is theoretically possible, the image correspondence problem poses
hurdles that are hard to overcome in practice. Intra-operative data acquisition
with 3-D endoscopes holds a much more practicable opportunity to reconstruct
the geometric topography of the operation situs by moving the endoscope over
the scene while successively aligning the captured RI data stream. First and foremost, this provides the surgeon with an extended view of the target and surrounding anatomy. Second, the reconstructed shape can be aligned to pre-operative
patient-specific reference data, cf. Chap. 3. This data fusion forms the basis for motion tracking and augmented reality applications, enhancing the surgeon’s navigation beyond the observable tissue surface with pre-operative patient-specific data.
Third, the geometric reconstruction of the operation situs is of particular interest
in robotic laparoscopic surgery [Stoy 10].
4.1.2 Towards 3-D Model Construction in Colonoscopy
In colorectal screening, optical colonoscopy (OC) is considered the gold standard
for visual diagnosis and intervention. After colon preparation using laxatives, a
flexible endoscope (colonoscope) is inserted into the rectum until the tip reaches
the cecum. Now the gastroenterologist slowly retracts the scope while inspecting
the colon for polyps and suspicious tissue. Note that the colon is inflated with gas
to prevent it from collapsing. As opposed to alternative screening techniques outlined below,
OC gives the opportunity to take a biopsy or remove pre-cancerous colonic polyps
and suspicious lesions, if required.
A potential non-invasive alternative for colorectal screening is virtual colonoscopy (VC) [McFa 11], with a reported sensitivity of 96.1% for colorectal cancer [Pick 11]. Based on CT data of the lower abdomen in both prone and supine
position, a virtual model of the inflated colon is generated. This model enables the
gastroenterologist to perform interactive endoluminal fly-throughs (as in OC) to
identify abnormalities. Compared to OC, VC does not require sedation, takes less
time and also reveals abnormalities outside the colon. However, it involves radiation exposure, a limited spatial resolution, and the impossibility of morphologic diagnosis and intervention. As a consequence, VC can be considered a non-invasive
screening alternative for patients with an indicated risk to undergo OC. However,
in case of positive VC results, a conventional OC is necessary for intervention.
The fusion of pre-interventional VC data with interventional data acquired
during OC is a promising future direction. For instance, on-the-fly registration of
OC data with the colon model extracted from VC scans would allow guiding the
gastroenterologist by tracking the OC position in the volumetric reference data.
This would facilitate navigation and could potentially supersede the need for additional hardware such as in magnetic guidance [Hoff 07]. Usually, OC is performed without prior VC. In this case, a prospective application is the construction of a metric 3-D shape model of the colon from OC data to assist surgeons in
quantitative diagnosis, inspection from different viewpoints, pre-operative planning, longitudinal monitoring of suspicious tissue, and surgical training. The extent of model construction depends on the application and ranges from building
a model of a small lesion or polyp [Chen 09, Chen 10] up to reconstructing a full
model of the entire colon from cecum to rectum. Ideally, the model should incorporate both the 3-D shape and photometric appearance information. Today, early
approaches to model construction in colonoscopy rely on monocular sensors and
exploit SfM/SLAM techniques for shape reconstruction [Kopp 07, Liu 08, Chen 09,
Chen 10]. However, expecting the availability of flexible 3-D endoscopes in the
medium term, we investigate their potential for shape reconstruction in 3-D OC.
4.2 Related Work
Both applications considered in this chapter require a fast technique to align RI
data on-the-fly during the intervention. In contrast to Chap. 3 where we used
feature-based global registration techniques to cope with gross misalignments, the
alignment of successive frames from a hand-guided RI device can be addressed
with a local registration approach, cf. Sect. 2.4.1. Two decades after its introduction [Besl 92, Chen 92, Zhan 94], the ICP algorithm is still the de-facto standard in
rigid point cloud registration in the case of slight shape misalignments. Over the
years, a multitude of ICP variants have been proposed in literature, see Sect. 2.4.2
and the reviews by Rusinkiewicz and Levoy [Rusi 01] and Salvi et al. [Salv 07].
However, in the field of 3-D model reconstruction, only few existing approaches
have achieved interactive frame rates so far. Huhle et al. proposed a system for
on-the-fly 3-D scene modeling using a low resolution ToF camera (160×120 px),
achieving per-frame runtimes of >2 s [Huhl 08]. Engelhard et al. presented comparable runtimes on Microsoft Kinect data (640×480 px) for an ICP-based RGB-D SLAM framework [Enge 11]; Henry et al. [Henr 12] performed ICP registration in an
average of 500 ms. Only recently, real-time frame rates were reported for geometric ICP variants. In particular, the GPU-based KinectFusion framework [Izad 11,
Newc 11] has gained popularity in the field of 3-D reconstruction. Its core is based
on the work of Rusinkiewicz et al. [Rusi 02], combining projective data association [Blai 95] and a point-to-plane metric [Chen 92] for transformation estimation.
In projective data association, the set of moving points is typically projected onto
the fixed mesh to establish the correspondences, instead of performing a nearest
neighbor search over the entire fixed mesh. This projection approach exhibits inferior convergence behavior [Blai 95], but can be performed very efficiently. The
identification of corresponding points typically relies on comparing characteristics of the moving point to its corresponding projection point on the fixed mesh. This hard constraint can be relaxed by extending the correspondence search to the local neighborhood of the projected point on the fixed mesh. However, this incurs a substantial computational burden [Dora 97, Benj 99].

Figure 4.1: Incorporating photometric information into ICP alignment in situations of non-salient surface geometry: (a, b) first and last frame of an RGB-D sequence capturing a colored poster affixed to a planar wall from changing perspectives. Considering only scene geometry results in an erroneous alignment (c). Instead, with the proposed method exploiting both geometric and photometric information, a proper alignment is found (d).
More than a decade ago, Godin et al. [Godi 94], Johnson and Kang [John 97] and
Weik [Weik 97] presented approaches to incorporate photometric information into
the ICP framework (photo-geometric ICP) to improve its robustness. The idea is that
photometric information can compensate for regions with non-salient topographies, whereas geometric information can guide the pose estimation for faintly textured regions, see Fig. 4.1. The rare consideration of the concept of photo-geometric ICP in the literature may result from (1) the unavailability of low-cost RGB-D sensors until recently and (2) the fact that it requires a 6-D nearest neighbor search, implying a substantial performance bottleneck. Modifications have
been proposed that try to accelerate the nearest neighbor search by pruning the
search space w.r.t. photometrically dissimilar points [Druo 06, Joun 09]. However,
this reduction typically comes with a loss in robustness. Traditional approaches
for efficient nearest neighbor search rely on space-partitioning data structures like
k-D trees [Aken 02]. However, these structures are unsuitable for implementation
on modern many-core hardware due to the non-parallel and recursive nature of
the construction and/or traversal of the underlying data structures.
Recently, space-partitioning strategies have been introduced that are specifically designed for many-core architectures. A promising approach is the random
ball cover (RBC) proposed by Cayton [Cayt 10, Cayt 11]. The basic principle behind RBC is a two-tier nearest neighbor search. Even though the nearest neighbor
search itself builds on the brute force (BF) search primitive, the introduction of
a two-tier search hierarchy enables considerable speedups. In this work, trading
accuracy against runtime, we propose a new approximative RBC variant that is optimized in terms of runtime performance to accelerate the nearest neighbor search
in photo-geometric ICP alignment. In the applications addressed in this chapter, we expect a photo-geometric ICP to be of particular interest regarding (1) flat, spherical, or tubular structures (e.g. in tracheal endoscopy, bronchoscopy or colonoscopy) that evoke ambiguities in geometry-driven surface registration,
and (2) its application with modern RGB-D sensors that exhibit a low SNR in the
range domain but additionally provide high-grade photometric information.
4.3 Photo-geometric Surface Registration Framework
The proposed framework for on-the-fly photo-geometric point cloud mapping and
3-D shape modeling is composed of three stages, as depicted in Fig. 4.2. The initial
stage involves RGB-D data acquisition and pre-processing. In the second stage,
based on a set of landmarks, the proposed photo-geometric ICP variant is applied
(Sect. 4.3.1). The rigid body transformation is estimated in a frame-to-frame manner, i.e. the pose of the instantaneous frame is estimated by registration against
the previous frame. In the third stage, based on the estimated transformation, the
instantaneous RI data are integrated into a global shape model.
4.3.1 Photo-geometric ICP Scheme
The ICP algorithm estimates the rigid transformation (R, t) that brings a moving template point set X_m = {x_{m,1}, ..., x_{m,|X_m|}} in congruence with a fixed reference point set X_f = {x_{f,1}, ..., x_{f,|X_f|}} [Besl 92, Chen 92]. Based on an initial guess (R^0, t^0), the scheme iteratively estimates the optimal transformation by minimizing an error metric assigned to repeatedly generated pairs of corresponding landmarks (x_m, x_c) where x_m ∈ X_m and x_c ∈ X_f. A standard choice for the distance d between an individual moving landmark x_m and the set of reference landmarks X_f is the squared Euclidean distance:

$$ d(x_m, X_f) = \min_{x_f \in X_f} \| x_f - x_m \|_2^2 . \tag{4.1} $$

The landmark x_c ∈ X_f yielding the minimum distance to x_m is then given by:

$$ x_c = \operatorname*{argmin}_{x_f \in X_f} \| x_f - x_m \|_2^2 . \tag{4.2} $$

Now, let us consider RGB-D data, where a point holds both geometric x ∈ R^3 and photometric information p ∈ R^3. In order to compensate for inconsistencies due to changes in illumination and viewpoint direction, we transfer the photometric information to normalized RGB space first [Geve 99],

$$ p = (p_r + p_g + p_b)^{-1} \begin{pmatrix} p_r \\ p_g \\ p_b \end{pmatrix} , \tag{4.3} $$

where p_r, p_g, p_b denote the measured intensities of the red, green and blue photometric channels. We denote the concatenation of both domains for the moving and fixed data M, F as:

$$ \mathcal{M} = \{ (x_{m,1}, p_{m,1}), \dots, (x_{m,|X_m|}, p_{m,|X_m|}) \} , \tag{4.4} $$
$$ \mathcal{F} = \{ (x_{f,1}, p_{f,1}), \dots, (x_{f,|X_f|}, p_{f,|X_f|}) \} . \tag{4.5} $$
Figure 4.2: Flowchart of the proposed photo-geometric rigid surface registration and
3-D shape model reconstruction framework.
Our proposed photo-geometric ICP variant should incorporate both geometric and photometric information in the correspondence search. Hence, we modify the distance metric d:

$$ d(x_m, X_f) = \min_{(x_f, p_f) \in \mathcal{F}} \beta \| x_f - x_m \|_2^2 + (1 - \beta) \| p_f - p_m \|_2^2 , \tag{4.6} $$

where β ∈ [0, 1] is a non-negative constant weighting the influence of the geometric and photometric information, respectively. The landmark x_c ∈ X_f yielding the minimum distance to x_m is given by:

$$ x_c = \operatorname*{argmin}_{(x_f, p_f) \in \mathcal{F}} \beta \| x_f - x_m \|_2^2 + (1 - \beta) \| p_f - p_m \|_2^2 . \tag{4.7} $$

Assigning a nearest neighbor x_c to all x_m ∈ X_m yields a set of corresponding points X_c = {x_{c,1}, ..., x_{c,|X_m|}}, and the set of landmark correspondences can be denoted as C = {(x_{m,1}, x_{c,1}), ..., (x_{m,|X_m|}, x_{c,|X_m|})}, cf. Sect. 3.3.1.
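To make the correspondence search concrete, the following minimal NumPy sketch combines Eqs. (4.3), (4.6) and (4.7): it maps raw RGB intensities to normalized RGB space and then, for each moving landmark, finds the fixed landmark minimizing the weighted 6-D distance by brute force. The function and variable names are ours and purely illustrative; the actual framework replaces the exhaustive search with the RBC scheme of Sect. 4.3.2.

```python
import numpy as np

def normalize_rgb(p):
    """Map raw RGB intensities to normalized RGB space, Eq. (4.3)."""
    s = p.sum(axis=1, keepdims=True)
    s[s == 0] = 1.0                            # guard against division by zero
    return p / s

def correspondences(x_m, p_m, x_f, p_f, beta):
    """Photo-geometric correspondence search, Eqs. (4.6)/(4.7).

    x_m, x_f : (N,3)/(M,3) geometric coordinates (in mm)
    p_m, p_f : (N,3)/(M,3) raw RGB intensities
    beta     : geometric weight in [0, 1]
    Returns, for each moving landmark, the index of its nearest fixed landmark.
    """
    pm, pf = normalize_rgb(p_m), normalize_rgb(p_f)
    # Squared distances between all moving/fixed pairs, per domain.
    d_geo = ((x_m[:, None, :] - x_f[None, :, :]) ** 2).sum(axis=2)
    d_pho = ((pm[:, None, :] - pf[None, :, :]) ** 2).sum(axis=2)
    d = beta * d_geo + (1.0 - beta) * d_pho    # Eq. (4.6)
    return np.argmin(d, axis=1)                # Eq. (4.7)
```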
Next, based on the landmark correspondences C^k found in the k-th ICP iteration, the transformation (R̂^k, t̂^k) can be estimated by either minimizing a point-to-point error metric in a least-squares sense using a unit quaternion optimizer [Horn 87], also recall Eq. (3.12),

$$ (\hat{R}^k, \hat{t}^k) = \operatorname*{argmin}_{R^k, t^k} \frac{1}{|C^k|} \sum_{C^k} \| (R^k x_m^k + t^k) - x_c^k \|_2^2 , \tag{4.8} $$

or by minimizing a point-to-plane distance metric using a nonlinear solver as originally proposed by Chen and Medioni [Chen 92]:

$$ (\hat{R}^k, \hat{t}^k) = \operatorname*{argmin}_{R^k, t^k} \frac{1}{|C^k|} \sum_{C^k} \left( ((R^k x_m^k + t^k) - x_c^k)^\top n_{x_c^k} \right)^2 . \tag{4.9} $$

Here, n_{x_c^k} denotes the surface normal associated with the point x_c^k ∈ X_f. After each iteration, the global ICP solution (R_icp, t_icp) is accumulated:

$$ R_{\mathrm{icp}} = \hat{R}^k R_{\mathrm{icp}} , \qquad t_{\mathrm{icp}} = \hat{R}^k t_{\mathrm{icp}} + \hat{t}^k , \tag{4.10} $$

and the elements of X_m^k are updated according to x_m^k = R_icp x_m + t_icp. The two alternating stages of (1) finding the set of nearest neighbors X_c^k and (2) estimating the optimal transformation given the correspondences C^k are repeated iteratively until a convergence criterion is fulfilled, see Sect. 4.4.
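As a minimal sketch of this alternation, the following listing implements the point-to-point branch: it re-uses the `correspondences` function from above, solves Eq. (4.8) in closed form, and accumulates the global solution according to Eq. (4.10). Note one swapped-in technique: instead of Horn's unit quaternion optimizer, the sketch uses the SVD-based Kabsch solution, which attains the same least-squares optimum; the fixed iteration count stands in for the convergence criterion of Sect. 4.4.

```python
import numpy as np

def estimate_rigid(x_m, x_c):
    """Closed-form least-squares solution of Eq. (4.8) via SVD (Kabsch);
    equivalent at the optimum to Horn's unit quaternion optimizer."""
    cm, cc = x_m.mean(axis=0), x_c.mean(axis=0)
    H = (x_m - cm).T @ (x_c - cc)              # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                         # proper rotation, det(R) = +1
    t = cc - R @ cm
    return R, t

def photo_geometric_icp(x_m0, p_m, x_f, p_f, beta, iters=20):
    """Alternation of correspondence search and motion estimation, with the
    accumulation rule of Eq. (4.10)."""
    R_icp, t_icp = np.eye(3), np.zeros(3)
    x_m = x_m0.copy()
    for _ in range(iters):
        idx = correspondences(x_m, p_m, x_f, p_f, beta)
        R_k, t_k = estimate_rigid(x_m, x_f[idx])
        R_icp = R_k @ R_icp                    # Eq. (4.10)
        t_icp = R_k @ t_icp + t_k
        x_m = x_m0 @ R_icp.T + t_icp           # update the moving landmarks
    return R_icp, t_icp
```

Note that the photometric signature p_m travels with the moving points unchanged, since appearance is not affected by the rigid motion.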
In the following section, we describe the RBC-based 6-D nearest neighbor search
framework that we use to establish the ICP landmark correspondences (Eq. 4.7) in
an efficient manner. In particular, we detail how we optimized the scheme, trading
off accuracy against runtime performance.
4.3.2 Approximative 6-D Nearest Neighbor Search using RBC
The RBC is a novel data structure for parallelized nearest neighbor search, proposed by Cayton [Cayt 10, Cayt 11]. By design, both the construction of the RBC
as well as dataset queries are based on using BF search primitives that can be
performed efficiently on many-core hardware. The RBC data structure relies on
randomly selected points r ∈ F , called representatives. Each of them manages a
local subset of F . This indirection creates a hierarchy in the database such that a
nearest neighbor query is processed by (1) searching the nearest neighbor r among
the set of representatives and (2) performing another search for the subset of entries managed by r. This two-tier approach outperforms a global BF search due to the fact that each of the two successive stages explores a heavily pruned search
space. In this work, we have investigated the fitness of the RBC for acceleration of
the 6-D nearest neighbor search of our photo-geometric ICP. In particular, we have
optimized the concept in terms of runtime performance.
Cayton proposed two alternative RBC search strategies [Cayt 11]: The exact
search is the appropriate choice when the exact nearest neighbor is required. Otherwise, if a small error may be tolerated, the approximative one-shot search is typically faster. Originally, in order to set up the one-shot data structure, the representatives are chosen at random, and each r manages its S closest database elements.
Depending on S, points typically belong to more than one representative. This implies a sorting of all database entries for each representative – hindering a high degree of parallelization or implying the need for multiple BF runs [Cayt 10]. Hence,
we introduce a modified version of the one-shot approach that is even further optimized in terms of performance. In particular, we simplified the RBC construction,
trading off accuracy against runtime, see Fig. 4.3 (a–c). First, we select a random
set of representatives R = {r1 , . . . , r |R| } out of the set of fixed points F . Second,
Figure 4.3: Illustration of the RBC construction scheme (a–c) and the two-tier RBC query scheme (d–f) for the simplified case of 2-D data. (a) Selection of a set of representatives R (labeled in dark blue) out of the set of database entries F (light and dark blue). (b) Nearest representative search over the set of database entries, to establish a landmark-to-representative mapping. (c) Nearest neighbor set of each representative (shaded in blue). (d) Query data (orange) and set of representatives R (dark blue). (e) Identification of the closest representative r, in a first BF search run. (f) Identification of the nearest neighbor (green) in the subset of entries managed by r (shaded in blue), in a second BF search run.
each representative r is assigned a local subset of F. This is done in an inverse manner by computing the nearest representative r for each point in F. The query
scheme of our modified one-shot RBC variant is consistent with the original approach and can be performed efficiently using two subsequent BF runs [Cayt 11],
see Fig. 4.3 (d–f). First, the closest representative is identified among R. Second,
based on the associated subset of entries managed by r, the nearest neighbor is
located.
Let us stress that this modified RBC construction scheme results in an approximative nearest neighbor search that is error-prone from a theoretical point of view.
In practice, facing the trade-off between accuracy and runtime, we tolerate this
approximation, cf. Sect. 4.4.1. We further remark that the scheme is not limited to 6-D data but can be applied to data of any dimension without loss of generality. For application in surface registration, this potentially allows extending the point signature for ICP correspondence search from 6-D to higher dimensions, e.g. appending additional complementary information or local feature descriptors to the raw
geometric and photometric measurements acquired by the sensor, cf. [Henr 12].
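A dimension-agnostic sketch of the modified one-shot construction and the two-tier query may look as follows. This is illustrative NumPy code with our own naming; the GPU implementation batches both BF passes over all queries in parallel.

```python
import numpy as np

def build_rbc(F, n_rep, rng=None):
    """Simplified RBC construction (cf. Fig. 4.3a-c): pick random
    representatives and assign each database point to its nearest one."""
    rng = np.random.default_rng(0) if rng is None else rng
    rep_idx = rng.choice(len(F), size=n_rep, replace=False)
    # One BF pass: nearest representative for every database entry.
    d = ((F[:, None, :] - F[rep_idx][None, :, :]) ** 2).sum(axis=2)
    owner = np.argmin(d, axis=1)
    cells = [np.flatnonzero(owner == r) for r in range(n_rep)]
    return rep_idx, cells

def rbc_query(q, F, rep_idx, cells):
    """Two-tier one-shot query (cf. Fig. 4.3d-f): BF over the representatives,
    then BF over the subset owned by the winning representative."""
    reps = F[rep_idx]
    r = int(np.argmin(((reps - q) ** 2).sum(axis=1)))
    sub = cells[r]
    if sub.size == 0:                          # degenerate cell: fall back
        return int(rep_idx[r])
    return int(sub[np.argmin(((F[sub] - q) ** 2).sum(axis=1))])
```

Since each representative owns itself (its distance to itself is zero), the cells are generically non-empty; the fallback is cheap insurance for pathological ties.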
4.4 Experiments and Results
The experiments in this chapter are divided into three parts. First, we study the overall
performance of the proposed framework on classical computer vision scenarios
(Sect. 4.4.1). Here, we put emphasis on quantifying the benefits of using our optimized approximative RBC compared to its original formulation [Cayt 11] and a
BF baseline. Second and third, we present synthetic experiments to investigate the
application of the framework to operation situs reconstruction in 3-D laparoscopy
(Sect. 4.4.2) and organ shape model construction in 3-D colonoscopy (Sect. 4.4.3).
Beforehand, let us briefly describe the entire system for model reconstruction as illustrated in Fig. 4.2. Prior to point cloud alignment, the acquired RGB-D data are pre-processed. For RI data enhancement, we combine restoration of invalid measurements using normalized convolution with edge-preserving denoising (Sect. 2.2.3).
Considering the application of 3-D scene reconstruction using a real-time, handheld and steadily moved RGB-D device implies that a portion of the scene that
was captured in the previous fixed frame is no longer visible in the instantaneous
moving data and vice versa. Facing this issue, we discard the subset of moving
points xm ∈ Xm that correspond to range measurements within the boundary area
of the 2-D sensor domain to improve the robustness of ICP alignment. This clipping is performed in conjunction with the extraction of a set of ICP landmarks
F , M from the fixed reference and moving template data. The landmark extraction is performed by sub-sampling the clipped point set. The ICP transformation
is estimated by minimizing a point-to-point distance metric (Eq. 4.8).
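The clipping and landmark extraction step can be sketched as follows; the random sub-sampling and the helper names are our own assumptions, as the text does not prescribe a particular sub-sampling pattern.

```python
import numpy as np

def extract_landmarks(points, valid, clip=0.10, n=2 ** 14, rng=None):
    """Boundary clipping and landmark sub-sampling on the 2-D sensor grid.

    points : (H,W,3) back-projected range measurements
    valid  : (H,W) boolean mask of valid (restored/denoised) measurements
    clip   : fraction of the sensor border discarded on each side
    """
    rng = np.random.default_rng(0) if rng is None else rng
    H, W = valid.shape
    dy, dx = int(clip * H), int(clip * W)
    interior = np.zeros_like(valid)
    interior[dy:H - dy, dx:W - dx] = True      # discard the boundary area
    idx = np.flatnonzero(interior & valid)
    idx = rng.choice(idx, size=min(n, idx.size), replace=False)
    return points.reshape(-1, 3)[idx]
```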
4.4.1 Performance Study
In this section, we evaluate the proposed photo-geometric ICP for on-the-fly scene
and object reconstruction scenarios on real data from a hand-guided Microsoft
Kinect RI sensor. First, we present qualitative results for both indoor scene mapping and object reconstruction scenarios. Second, we demonstrate the real-time
capability of the framework in a thorough performance study. Third, we compare
our approximative RBC variant to an exact NN search in terms of accuracy.
For all experiments, the number of representatives was set to |R| = |F|^(1/2) following the suggestions by Cayton [Cayt 11], if not stated otherwise. As ICP convergence
criterion we analyze the variation of the estimated transformation over subsequent
iterations. In particular, we evaluate the change in translation magnitude and rotation angle w.r.t. heuristically set thresholds of 0.01 mm and 0.001◦ , respectively. As
initialization for the ICP alignment, we incorporate the estimated global transformation (R^0, t^0) from the previously aligned frame, see Fig. 4.2, assuming a smooth
trajectory of the hand-guided acquisition device with a consistent direction. This
speeds up convergence and thus reduces the overall runtime. Regarding the robustness and accuracy of point cloud alignment, we observed a strong impact of
outliers that may occur in pre-processed RGB-D data due to changes in viewpoint
direction or occlusion and cannot be eliminated by denoising. To account for these
outliers, we optionally reject low-grade correspondences in the transformation estimation stage. The term low-grade is quantified by comparing the distance of a
Figure 4.4: First row: On-the-fly 3-D reconstruction of a lounge room (526 frames). The left image depicts a bird's-eye view of the respective room layout. The images on the right
provide a zoom-in for selected regions. Second row: 3-D reconstruction of a female torso
model (cf. Fig. 3.6), where the hand-held acquisition device was moved around the model
in a 360◦ -fashion to cover the entire object from different perspectives (525 frames).
corresponding pair of landmarks (cf. Eq. 4.6) w.r.t. an empirical threshold δ_lg. The
set of low-grade correspondences is re-computed for each ICP iteration and discarded in the subsequent transformation estimation step.
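Both mechanisms are simple to state in code. The sketch below, with hypothetical helper names, derives the incremental rotation angle from the trace of the incremental rotation for the convergence test, and masks out correspondences whose weighted distance (Eq. 4.6) exceeds the threshold; we assume here that the distances are stored in squared form, hence the squared threshold.

```python
import numpy as np

def converged(R_k, t_k, eps_t=0.01, eps_r=0.001):
    """Convergence test of Sect. 4.4.1: stop once the incremental translation
    magnitude falls below 0.01 mm and the rotation angle below 0.001 deg."""
    cos_theta = np.clip((np.trace(R_k) - 1.0) / 2.0, -1.0, 1.0)
    angle = np.degrees(np.arccos(cos_theta))
    return np.linalg.norm(t_k) < eps_t and angle < eps_r

def keep_high_grade(d, delta_lg):
    """Boolean mask of correspondences kept for transformation estimation;
    d holds per-pair squared weighted distances according to Eq. (4.6)."""
    return d < delta_lg ** 2
```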
Qualitative Reconstruction Results. Qualitative results for the reconstruction of
indoor environments are depicted in the first row of Fig. 4.4. The RGB-D sequences
were acquired from a static observer location by rotating the hand-held sensor.
The alignment was performed on-the-fly. The room was reconstructed using the
following pre-processing and ICP/RBC settings: Edge-preserving denoising (geometric median, geometric and photometric guided image filter), |F| = |M| = 2^14 ICP landmarks, 10% edge clipping, β = 0.0005, no elimination of low-grade correspondences (δ_lg → ∞). Note the scale of β resulting from the different scales of the geometric domain operating in [mm] and the photometric domain operating in the range [0, 1]. In addition to scene reconstruction, the proposed framework can
also be applied to 3-D model digitalization, see the second row in Fig. 4.4. Here,
the hand-held acquisition device is moved around an object to acquire RGB-D
data from different perspectives while continuously merging the data into a global
model using the proposed framework. For the case of 3-D object reconstruction,
we apply a dedicated scheme for landmark extraction. Instead of considering the
entire scene, we segment the foreground using a depth threshold. From the set of
foreground pixels, we then select a subset of landmarks. Background data points
that are located beyond a certain depth level are ignored within the ICP alignment
procedure. The settings for object reconstruction were: Edge-preserving denoising, 2^14 ICP landmarks, β = 1 (invariance to illumination issues), and δ_lg = 3 mm.

Figure 4.5: Comparison of the average runtime for a single ICP iteration based on GPU implementations of the BF search primitive, the exact RBC, and our optimized approximative RBC variant as described in Sect. 4.3.2, for an increasing number of landmarks (2^10 to 2^14). Note that our modified approximative RBC approach outperforms the exact RBC by up to a factor of 3. The BF primitive scales quadratically w.r.t. the number of landmarks.

Regarding the effectiveness of the proposed system for the reconstruction of scenes
with non-salient 3-D geometry, we refer to Fig. 4.1. Facing a colored poster affixed to a planar wall, the reconstruction could benefit considerably from incorporating
the photometric domain as a complementary source of information.
Runtime Performance. Now let us study the potential of the proposed RBC-based ICP framework in terms of runtime performance. For that purpose, we
have implemented the proposed photo-geometric framework on the GPU using
CUDA [Baue 13b]. The runtime study was conducted on an off-the-shelf consumer
desktop PC equipped with an NVIDIA GeForce GTX 460 GPU and a 2.8 GHz Intel
Core 2 Quad Q9550 CPU. Runtimes were averaged over several successive runs.
A comparison of absolute runtimes for a single ICP iteration is presented in
Fig. 4.5. Our modified approximative RBC outperforms both a BF search and our
reference implementation of Cayton’s exact RBC. In particular, the approximative
RBC variant outperforms the exact RBC implementation up to a factor of 3. The BF
search scales quadratically with the number of landmarks. Typical ICP runtimes
are presented in Table 4.1. From our experiments on indoor scene mapping, we
observed the ICP to converge after 10-20 iterations using the stopping criterion
described in Sect. 4.4.1. Hence, as an overall performance indicator, let us refer to the runtime of 19.1 ms for 20 iterations with 2^14 landmarks.
Figure 4.6: (a) Mapping error vs. number of representatives: evaluation of the influence of |R| on mapping accuracy, compared to an exact BF search, for a varying number of landmarks (2^10 to 2^14). The graphs show both discretized measurements and a trendline for each setting. Note the semi-log scale. (b) ICP iteration runtime vs. number of representatives: runtimes of a single ICP iteration, for a varying number of landmarks/representatives. Note the log scale.
Approximative RBC. As detailed in Sect. 4.3.2, our approximative RBC construction and NN search trades exactness for runtime speedup. We quantitatively
investigated the error that results from this approximation opposed to an exact
BF search, comparing the Euclidean mesh-to-mesh distance of the aligned point
clouds and considering the BF-based transformation estimate as gold standard,
see Fig. 4.6a. With an increasing number of representatives |R|, the mapping error rises until dropping sharply when approaching |R| = |F|. Vice versa, for |R| ≪ |F|, decreasing |R| with a fixed number of landmarks reduces the error.
This results from our approximative RBC construction scheme, where the probability of erroneous NN assignments increases with the number of representatives.
Please note that both situations of |R| = 1 and |R| = |F | correspond to a classical BF search, hence yielding an identical alignment and a mean error of zero.
In general, increasing the number of landmarks decreases the error. We remark that using our default configuration (2^14 landmarks, |R| = 2^7), the mapping error is less than 0.25 mm. This is an acceptable scale for the large-scale applications considered here.

# Landmarks   |R| = |F|^(1/2)   t_RBC,C   t_ICP   t_tot (10 its)   t_tot (20 its)
2^10          32                0.58      0.25     3.13             5.68
2^11          45                0.60      0.27     3.31             6.03
2^12          64                0.63      0.32     3.80             6.97
2^13          91                0.76      0.50     5.80            10.82
2^14          128               0.90      0.91     9.96            19.07

Table 4.1: Runtimes in [ms] for the construction of the RBC data structure (t_RBC,C) and ICP execution for reconstructing a typical indoor scene, for a varying number of landmarks. We state both the runtime for a single ICP iteration t_ICP and typical total ICP runtimes t_tot (including RBC construction) for 10 and 20 iterations, respectively.
Figure 4.7: (a) Synthetic scene mimicking the scenario in a minimally invasive procedure in the abdominal cavity. (b) Illustration of a camera path, showing both photometric (first rows) and geometric depth information (second rows; darkness denotes closeness, brightness remoteness to the RI camera). For convenience, the entire RGB-D sequence is depicted using eight keyframes from top left to bottom right.
In addition, we have related the runtime per ICP iteration to the number of representatives, see Fig. 4.6b. Apart from the runtime minimum located around |R| = 2|F|^(1/2), the computational load rises when decreasing or increasing |R|. Simultaneously, the error decreases, recall Fig. 4.6a. Hence, the application-specific demands in terms of runtime and accuracy motivate the choice of |R|. Together, Figs. 4.6a,b nicely illustrate the trade-off between error and runtime.
4.4.2 Experiments on Operation Situs Reconstruction
Now let us consider the application of the proposed photo-geometric registration
framework to operation situs reconstruction in laparoscopy. Even though promising concepts for 3-D endoscopy have been introduced lately, recall Sect. 4.1, the
commercial availability of such hardware is an open issue. Hence, in this work, we
performed a comprehensive study on synthetic RGB-D data. We explicitly investigate the benefit of the proposed photo-geometric approach over a solely geometry-driven variant in the presence of low-SNR range measurements.
Materials and Methods. For the experiments, we generated a synthetic scene
mimicking the scenario of a minimally invasive procedure in the abdominal cavity.
We then used a virtual RGB-D camera to acquire both geometric range and photometric texture information while successively moving the virtual camera over the
operation situs. Let us stress that the photometric values at a specific location on
the scene vary with changing camera perspective, due to the underlying shader.
The synthetic scene was generated from human organ shapes extracted from CT
data. For the experiments below, we used mesh data from a publicly available
Figure 4.8: Boxplots of the (a) translational (in [mm]) and (b) rotational (in [°]) frame-to-frame drift for 3-D geometric ICP registration (in blue) and 6-D photo-geometric ICP registration (in green), for the ten camera paths (p1–p10) used for quantitative evaluation.
database1 . It includes both anonymized medical images and manual segmentations of structures of interest performed by clinical experts. The synthetic scene
was created in collaboration with a physician. In particular, we combined mesh
data of the liver, stomach, and gallbladder of a female patient, and modeled the
surrounding fat tissue. The organ textures were generated using a professional
medical visualization texture package2 and mapped onto the individual anatomical structures using texture mapping techniques, see Fig. 4.7a. Next, using our
virtual RGB-D camera (Sect. 2.2.2), we manually generated a set of ten different
camera paths. The paths were set up in a way that the entire operation situs is
roughly covered. For an illustration, see Fig. 4.7b. Each path is sampled with an RGB-D stream covering 200 frames. The FOV of the virtual RGB-D camera was set
to 80◦ , being a typical value for laparoscope optics [Yama 07].
For evaluation, we aligned the ten different RGB-D data streams on-the-fly and
integrated the registered RI surface data from successive frames into a global shape
model. The model is based on the concept of truncated signed distance functions
(TSDF) along the lines of Curless and Levoy [Curl 96] and Hilton et al. [Hilt 96]. In
terms of parameterization, we used 20,000 landmarks, |R| = 256 RBC representatives, a geometric weighting of β = 0.001, 10% edge clipping, and 50 ICP iterations.
We quantify the alignment error by evaluating the relative camera transformation error in a frame-to-frame manner, i.e. considering the local accuracy of the camera trajectory over subsequent frames (the drift). In particular, along the lines of Sturm et al. [Stur 12], we define the relative camera transformation error E_i ∈ R^{4×4} for frame i as:

$$ E_i := T_{\mathrm{GT},i}^{-1} \, T_i , \tag{4.11} $$

where T_{GT,i} ∈ R^{4×4} denotes the ground truth camera transformation at frame i w.r.t. the previous frame (i − 1), known from the given camera path, and T_i ∈ R^{4×4}

1 http://www.ircad.fr/softwares/3Dircadb/3Dircadb1/index.php?lng=en
2 http://www.doschdesign.com/products/textures/medical_visualization_v3.html
Figure 4.9: (a, b) Boxplots of the translational (in [mm]) and rotational (in [°]) drift for 6-D photo-geometric ICP registration, on ideal (green) vs. noisy range data (red), for the ten camera paths. (c, d) Translational and rotational drift for 3-D geometric ICP registration, on ideal (blue) vs. noisy range data (red).
the transformation that was estimated based on surface registration using the proposed framework. From these error matrices E_i, we eventually consider the translational and rotational components and calculate the per-frame Euclidean translation error ||Δt_E||_2 and the mean rotational error |Δθ_E| = (1/3)(|Δθ_E,x| + |Δθ_E,y| + |Δθ_E,z|), being a suitable metric in the presence of small angles, over the entire RI sequence.
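For completeness, a small sketch of how the per-frame drift can be computed from 4×4 homogeneous camera transformations. The Euler angle convention (we assume R = R_z R_y R_x here) is an implementation detail not fixed by the text, and the small-angle residuals make the choice uncritical; the helper names are ours.

```python
import numpy as np

def relative_error(T_gt, T_est):
    """Relative camera transformation error E_i, Eq. (4.11)."""
    return np.linalg.inv(T_gt) @ T_est

def drift(E):
    """Per-frame translational and rotational drift from a 4x4 error matrix."""
    dt = np.linalg.norm(E[:3, 3])              # ||dt_E||_2, in mm
    R = E[:3, :3]
    # Euler angles of the small residual rotation (R = Rz @ Ry @ Rx assumed).
    th_x = np.degrees(np.arctan2(R[2, 1], R[2, 2]))
    th_y = np.degrees(np.arcsin(np.clip(-R[2, 0], -1.0, 1.0)))
    th_z = np.degrees(np.arctan2(R[1, 0], R[0, 0]))
    return dt, (abs(th_x) + abs(th_y) + abs(th_z)) / 3.0
```

A typical usage is `dt, dtheta = drift(relative_error(T_gt, T_est))`, evaluated for every frame of the sequence.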
Note that this per-frame drift metric is particularly useful for the evaluation of scene reconstruction scenarios. In order to investigate the resilience of the photo-geometric ICP variant to noise, we applied Gaussian noise to the virtual range image data at a rather rigorous noise level of σ = 5 mm. As in practice, we applied edge-preserving denoising using guided image filtering (Sect. 2.2.3).
Results. Quantitative results comparing the performance of the conventional geometric (3-D) ICP against the proposed photo-geometric variant (6-D) in terms of translational/rotational drift are depicted in Fig. 4.8. Averaged over all ten camera paths, the mean errors were ||Δt_E||_2 = 0.264 mm, |Δθ_E| = 0.079° for 3-D ICP and ||Δt_E||_2 = 0.020 mm, |Δθ_E| = 0.013° for 6-D ICP. This corresponds to a substantial improvement in terms of reconstruction accuracy by a factor of 12.9 (translation) and 5.9 (rotation), respectively.

Figure 4.10: Qualitative reconstruction results for the camera path shown in Fig. 4.7. The Euclidean mesh-to-mesh distance of the reconstructed scene to the known ground truth is color-coded. The second row shows the reconstruction results. The first row additionally depicts the ground truth data. From left to right: reconstruction results for (a) geometric and (b) photo-geometric ICP on ideal synthetic data, and (c) for photo-geometric ICP on noisy data. Note the salient seam between the first and last frame (marked with a black ellipse) for geometric reconstruction. With photo-geometric reconstruction on noisy data, the seam is reduced substantially. For photo-geometric reconstruction on ideal data it is hardly visible.
Comparing the reconstruction results of our photo-geometric approach on ideal
synthetic range data to the results on noisy range data, see Fig. 4.9a,b, we observed
a decrease in accuracy. On average over all paths, the per-frame drift increased by a factor of 2.1 (translation) and 2.2 (rotation). Note that in absolute numbers, the mean translational and rotational errors with noisy data are still in an acceptable range of ||Δt_E||_2 = 0.044 mm, |Δθ_E| = 0.029° and substantially outperform the geometric ICP variant on ideal range data (Fig. 4.9c,d). In addition, let us compare the influence of noise independently for 3-D ICP and 6-D ICP, see Fig. 4.9a–d. On average, the absolute increase in drift for the 3-D ICP due to noise exceeded the absolute increase in drift for the 6-D ICP by a factor of 7.0 (translation) and 3.3 (rotation). This confirms our initial assumption that the proposed photo-geometric
approach is of particular interest for RGB-D cameras that provide low-SNR range
measurements but additionally acquire high-grade photometric information.
Qualitative results of the reconstructed operation situs based on the camera
path depicted in Fig. 4.7 are illustrated in Fig. 4.10. Here, the local Euclidean
mesh-to-mesh distance between the reconstructed and ground truth shape is color-coded. Note that for the geometry-driven variant (Fig. 4.10a), the non-negligible drift behavior makes the reconstruction fail with an increasing number of frames
Figure 4.11: Realistic renderings of the colonic lumen during virtual fly-through (a–d). The first row depicts photometric data. In the second row, the associated range data is gray-coded, where darkness denotes closeness and brightness remoteness to the RI camera. Note the symmetries of the tubular shapes in (a) and (c) and the triangular-shaped muscles seen when inflating the colon with gas.
considered. The photo-geometric reconstruction provides reasonable results, even
on noisy data.
4.4.3 Experiments on Colon Shape Model Construction
In this section, we address the application of the photo-geometric registration
framework for the construction of a shape model of the colon in 3-D colonoscopy.
Experiments are performed on synthetic RGB-D data generated from realistically
textured colon meshes that were extracted from virtual colonoscopy CT data.
Materials and Methods. The experiments below are conducted on textured colon
meshes from the work of Passenger et al. [Pass 08]3 . These were generated from
abdominal CT data from a VC study4 . As detailed in Passenger et al. [Pass 08], textured colon meshes were generated as follows: First, the colon is segmented from
CT data by edge-preserving anisotropic smoothing of the image data [Whit 01],
subsequent thresholding to separate the gas-filled areas from the background, and
connected component analysis to select the colon. Second, the resulting binary segmentation mask is meshed using Marching Cubes [Lore 87]. Eventually, Laplacian
smoothing [Fiel 88] and decimation filters are applied to obtain a smooth surface
mesh of the colonic lumen. Realistic tissue renderings for the colonic wall were
obtained by computing texture coordinates for the generated surface mesh using
a dedicated surface parameterization algorithm [Pass 08]. For generation of virtual
fly-throughs, we computed a centerline through the colon mesh based on tracing
the shortest path between two manually labeled points (cecum, rectum) with the
restriction that the paths are bound to run on the Voronoi diagram of the colon
3 Courtesy of Dr. Hans de Visser, Australian e-Health Research Centre
4 Courtesy of Dr. Richard Choi, Virtual Colonoscopy Center, Walter Reed Army Medical Center
Figure 4.12: (a, b) Boxplots of the translational (in [mm]) and rotational (in [°]) drift for 3-D geometric (blue) vs. 6-D photo-geometric ICP registration (green), for all eight colon cases (c1–c8). (c, d) Drift for 6-D photo-geometric ICP at a fine scale. Note that the scales differ by a factor of 10.
model [Anti 02]. The colon texture was created using a professional medical visualization texture package5 . In total, the proposed framework is evaluated on eight
textured colon models, see Fig. 4.16.
The camera paths for virtual fly-through are generated from the computed centerlines, see Fig. 4.11. For each colon dataset, the camera path from cecum to rectum is divided into 1000 frames. The fly-throughs start at the cecum and end at the
rectum, mimicking the clinical practice of colon examination while retracting the
colonoscope (Sect. 4.1.2). Furthermore, we simulate a forward-viewing endoscope with wide-angle optics (100° FOV), as is typical in colonoscopy. RGB-D data
is generated using the virtual camera introduced in Sect. 2.2.2. Similar to the experiments in laparoscopy (Sect. 4.4.2), in terms of parameterization we used 20,000
landmarks, |R| = 256, β = 0.001, 30% edge clipping, and 50 ICP iterations. For
quantitative evaluation, we consider the translational and rotational component
of the relative camera transformation error (Eq. 4.11) again.
5 http://www.doschdesign.com/products/textures/medical_visualization_v3.html
Figure 4.13: Boxplots of the (a) translational (in [mm]) and (b) rotational (in [°]) drift, comparing the results of an approximative RBC-based geometric ICP registration (blue) to the results with a BF-based NN search (green), for all eight colon cases.
Results. Let us first give an overview of the results, comparing the reconstruction
accuracy of geometric vs. photo-geometric ICP. Quantitative results for the eight
colon datasets are given in Fig. 4.12. For the geometry-driven reconstruction, the drift was ||Δt_E||_2 = 0.137 mm and |Δθ_E| = 0.252°. The proposed photo-geometric ICP reduced the error to ||Δt_E||_2 = 0.025 mm and |Δθ_E| = 0.038°. This corresponds
to an improvement in reconstruction accuracy by a factor of 5.5 (translation) and
6.5 (rotation), respectively. Note that compared to operation situs reconstruction
(Sect. 4.4.2), the application in colonoscopy considered here involves (1) a more
complex camera path navigating through the colonic bends, (2) less diverse scene
content, and (3) a rather low-contrast texture.
Now let us compare the performance of our approximative RBC-based NN
search to an exact BF-based NN search w.r.t. reconstruction accuracy. Experiments
are performed for a geometric ICP; the results are depicted in Fig. 4.13. Over all eight cases, both approaches yield a similar drift behavior. On average, the exact BF-based NN search reduced the error by approximately 1%. This is an acceptable
impairment considering the associated gain in runtime performance with our approximative RBC-based approach, recall Fig. 4.5. We further investigated the influence of the geometric weighting β on the drift behavior, see Fig. 4.14. The boxplots
indicate that the optimal weighting lies in the scale of β = 0.001. It is worth noting
that apart from this minimum, the drift increases relatively slowly – stable results
were achieved over a wide range of values. We also investigated the convergence
behavior for geometric and photo-geometric ICP alignment. Quantitative results
of the drift after different numbers of iterations are depicted in Fig. 4.15. Note
that the photo-geometric variant converges substantially faster while achieving
a lower residual error, cf. Fig. 4.12. Comparing the drift after 16 ICP iterations,
e.g., the photo-geometric approach outperforms the geometric one by factors of
5.1 (translation) and 4.3 (rotation), respectively.
Figure 4.14: Investigation of the influence of the geometric weighting β (evaluated from 1E-5 to 5E-1), for a single case. Given are boxplots of the (a) translational (in [mm]) and (b) rotational (in [°]) drift. Recall that the scale of β results from the different scales of the geometric and photometric domain, respectively.
Eventually, let us present some qualitative results for the reconstruction of a
full colon shape model from 3-D colonoscopy data, see Fig. 4.16. Recall that the
experiments are performed based on a rigidity assumption that will not be fulfilled in practical application. Nonetheless, the renderings underline the benefit
of incorporating additional photometric information into the registration process,
particularly for tubular shaped anatomic structures that imply ambiguities in the
geometric domain. Reconstructing a colon from a sequence of 1000 frames further
stresses the effect of drift on the global model over time. Note that for the geometric ICP, the degree of misalignment is low in the first section of the camera path (starting at the cecum), but accumulates with an increasing number of frames.
In contrast, the proposed photo-geometric variant is capable of reconstructing the
global shape of the colon in a superior manner.
4.5 Discussion and Conclusions
In this chapter, we have proposed a method for on-the-fly surface registration
and shape reconstruction that builds upon a photo-geometric ICP framework and
exploits the RBC data structure and search scheme for efficient 6-D NN search.
We have optimized the concept of RBC regarding runtime performance on low-dimensional data, and achieved frame-to-frame registration runtimes of less than
20 ms on an off-the-shelf consumer GPU. In an experimental study on synthetic
RGB-D data, we have addressed two endoscopic applications and observed that
incorporating photometric appearance as a complementary cue substantially outperforms a conventional geometry-driven ICP. In contrast to approaches that combine dense geometric point associations with a sparse set of correspondences derived from local photometric features, the proposed framework evaluates both geometric and photometric information in a low-level but dense manner. We found that incorporating photometric appearance in such an elementary way gives a convenient compromise between registration robustness and runtime performance.

Figure 4.15: Investigation of the convergence behavior, for 3-D geometric (in blue) and 6-D photo-geometric ICP registration (in green), for a single case. Given is the per-frame translational (a) and rotational (b) drift after different numbers of ICP iterations (8 to 256).
For operation situs reconstruction in 3-D laparoscopy, the photo-geometric ICP
reduced the drift by a factor of 12.9 (translation) and 5.9 (rotation), respectively,
compared to a geometry-driven ICP registration. Furthermore, we showed that an
equal increase in noise in the range measurements results in a substantially smaller
increase in drift for the photo-geometric ICP variant. For colon model construction
in 3-D colonoscopy, the drift could be reduced by a factor of 5.5 (translation) and
6.5 (rotation), comparing photo-geometric vs. geometric ICP registration. Overall, the results are consistent with findings in a previous study on non-medical scenarios, stating that incorporating photometric information decreased the registration
error by an order of magnitude [John 97].
In conclusion, let us summarize the limitations of this study and comment on
future research directions. First, we have modeled the target structure to be static.
This might be an acceptable approximation for mapping scenarios in laparoscopic
procedures where accuracy requirements are less strict and motion is moderate,
or when modeling local structures of a colon. When aiming at the reconstruction
of the full colon, the target will be subject to a substantial amount of non-rigid organ deformation. Nonetheless, adopting a piecewise static assumption could allow for the reconstruction of local colonic segments. Then, prior knowledge from
VC data (if available) could be employed to align these local shape models. Vice
versa, such local models could help in adding local texture to the VC colon model.
A future evaluation on real RGB-D data must further reveal the influence of the
modality-specific noise behavior and inherent artifacts that occur in clinical practice. Strategies to improve the robustness of the system include a multi-resolution
scheme, smart outlier handling, the transition from frame-to-frame to frame-to-model registration [Curl 96], and dedicated techniques such as loop closure – if
applicable in the particular application.
Figure 4.16: Qualitative reconstruction results for the eight colon models that are shown in the first and fourth column, respectively. Their associated centerlines are depicted in blue. Note the substantial differences in both local and global shape and the sharp bends at the transitions between ascending, transverse, descending and sigmoid colon. The remaining columns show reconstruction results for 3-D geometric ICP registration (blue) and 6-D photo-geometric ICP registration (green).
Part II

Non-Rigid Surface Registration for Range Imaging Applications in Medicine
CHAPTER 5

Joint Range Image Denoising and Surface Registration
5.1 Medical Background .................................... 82
5.2 Related Work .......................................... 85
5.3 Non-Rigid Surface Registration Framework .............. 86
5.4 A Joint Denoising and Registration Approach ........... 88
5.5 Experiments and Results ............................... 90
5.6 Discussion and Conclusions ............................ 97
The management of respiratory motion in diagnostic imaging, interventional imaging and therapeutic applications is a rapidly evolving field of research with many
current and future issues to be addressed. In Part II of this thesis we focus on respiratory motion tracking and management in radiation therapy. Based on the fusion
of pre-fractionally acquired accurate tomographic planning data (CT/MR) and
intra-fractionally acquired RI data, an improved RT treatment can be achieved.
More specifically, first studies have indicated that the identification and tracking
of non-rigid torso surface deformations induced by breathing holds great potential
to improve the accuracy of dose delivery [Yan 06, Faya 11, Scha 12].
In this chapter, we propose a variational framework that solves denoising of
low-SNR RI data and its registration to a reference shape extracted from tomographic planning data. In particular, we present a novel joint formulation for solving these two intertwined problems, assuming that tackling each task would benefit considerably from prior knowledge of the solution of the other task [Baue 12b].
Thereby, we explicitly exploit the fact that the reference shape extracted from tomographic data represents the patient’s geometry in an accurate and reliable manner. The proposed method enables:
• Robust intra-fractional full torso surface acquisition for patient monitoring
• Estimation of non-rigid torso deformations, yielding a high-dimensional respiration surrogate in terms of dense displacement fields
The remainder of this chapter is organized as follows. First, we introduce the
medical background in Sect. 5.1. We present a comprehensive overview here, as
all methods proposed in Part II of this thesis (Chapters 5-7) address the task of respiratory motion tracking with a particular focus on RT. In Sect. 5.2, we summarize
related work w.r.t. methodology. Sect. 5.3 introduces our general variational formulation for non-rigid geometric surface registration with dense RI data. Based
on this initial model, we propose an extended formulation for joint registration
and range image denoising in Sect. 5.4. In Sect. 5.5, we study the parameterization of the method and show experimental results on both synthetic and real
ToF/CT data. Eventually, we discuss the results and draw a conclusion in Sect. 5.6.
Parts of this chapter have been published in [Baue 12b] and are joint work with
Prof. Dr. Martin Rumpf and Prof. Dr. Benjamin Berkels.
5.1 Medical Background
First, we introduce the medical background for the methods proposed in this and
the two following chapters (Chapters 6,7). Even though the methods presented
in Part II consider substantially different RI modalities and registration concepts,
respectively, they share the clinical motivation. While we restrict the discussion
to respiratory motion tracking in RT here, the management of respiratory motion
holds great benefits in many clinical applications beyond RT, e.g. in diagnostic and
interventional procedures.
5.1.1 Image-Guided Radiation Therapy
Along with the trend toward small-margin and high-dose external beam RT, treatment success of patients with thoracic, abdominal and pelvic tumors strongly
depends on an accurate delivery of the intended radiation dose. In target locations in the thoracic cavity and the abdomen, anatomical structures are known
to move considerably due to patient respiration [Bran 06, Lang 01]. The motion
of the tumor and adjacent tissue during treatment has a profound impact on RT
planning and delivery. Besides uncertainties in inter-fractional patient positioning [Essa 02] (cf. Sect. 3.1.1) and the planning process [Rish 11], intra-fractional
respiratory motion induces a substantial geometric and dosimetric source of error [Sepp 02, Will 12]. Even though there is clinical evidence that smaller fractions
and higher radiation doses can be beneficial in terms of local tumor control, in
practice, uncertainties due to respiratory-induced motion typically demand conservative treatment strategies. In order to account for potential targeting errors and to assure adequate dosimetric coverage of the tumor-bearing tissue, large
safety margins are typically applied. However, these come at the cost of irradiating
surrounding radiosensitive structures.
To reduce tolerances between the planned and actually delivered dose distribution, a multitude of techniques for respiratory motion management have been
developed over the past decades. For a comprehensive survey we refer to Keall
et al. [Keal 06] and Verellen et al. [Vere 10]. The introduction of image-guided radiation therapy (IGRT) has been a milestone in improving dose delivery based on
the instant knowledge of the target location and spatio-temporal changes in tumor
volume during the course of treatment. In conventional IGRT, the location of the
tumor and adjacent critical structures is determined using in-room radiographic
imaging (e.g. stereoscopic X-ray imaging, in-room CT, cone-beam CT) of the target site prior to radiation delivery. This allows a verification of the tumor location
in the pre-treatment position, but cannot account for intra-fractional variations in
tumor position induced by respiration.
5.1.2 Respiration-Synchronized Dose Delivery
Modern respiration-synchronized IGRT aims at continuously tracking the moving
target over its trajectory and re-position the treatment table or radiation beam dynamically to follow the tumor's changing position during therapeutic dose delivery [Diet 11, Kilb 10, McCl 06, Murp 04, Wilb 08]. This allows reducing the tumor-motion margin in dose distribution and substantially increasing the LINAC's duty cycle compared to gated RT (20-30%), where the beam is merely activated within a
short time slot around a pre-defined state in the respiration cycle [Kubo 96]. Thus,
the overall treatment time can be reduced, enabling an efficient operation of the
therapy facility. The detection and tracking of the tumor position is the most important and challenging task in this context. In practice, radiographic imaging
of the target itself is often not feasible as most tumors will not present a well-defined object boundary suitable for automatic image segmentation and registration. According to clinical studies, implanted fiducial markers offer the most accurate solution to determine the target position during treatment.
However, the benefits of accuracy need to be weighed against the cost-intensive
and invasive procedure of implanting markers and the eventuality of marker migration [Keal 06]. Both direct image-guided and fiducial-based tumor tracking potentially require continuous radiographic imaging, involving risks associated with
the additional radiation dose [Murp 07].
In order to reduce additional radiation exposure, recent hybrid tumor-tracking
techniques combine episodic radiographic imaging with continuous monitoring
of external breathing surrogates based on the premise that the internal tumor
position can be predicted from the deformation of the external body surface in
the time interval between image acquisitions. The underlying patient-specific
correlation model can be established from a series of simultaneously acquired
external-internal position measurements [Muac 07, Hoog 09, Erns 12], from 4-D
CT [Eom 10, Vand 11, Vere 10], or 4-D MR planning data [Miqu 13]. During treatment, this model is used to deduce the tumor position under the influence of respiratory motion using real-time external surrogates, see Fig. 5.1 for a schematic
illustration of the workflow. A clinically available solution for tumor motion compensation based on a patient-specific external-internal motion correlation model
is the CyberKnife system (Accuray Inc., Sunnyvale, CA, USA1 ) [Kilb 10]. It uses
three optical markers attached to a wearable vest as external respiration surrogate. Episodic verification of the internal target position is performed based on
stereoscopic X-ray imaging and fiducial markers, or fiducial-free soft-tissue tracking. For dynamic beam steering, the LINAC is mounted on a robotic manipulator.
A similar approach is the VERO platform (Brainlab AG, Feldkirchen, Germany, and Mitsubishi Heavy Industries Ltd., Tokyo, Japan^2) [Depu 11, Vere 10]. It integrates stereoscopic X-ray imaging, volumetric cone-beam CT and real-time beam adjustment for respiration-synchronized tumor tracking.

1 http://www.accuray.com/
2 http://www.vero-sbrt.com/

Figure 5.1: Workflow in RI-guided respiration-synchronized RT. The initial preparation phase involves the training of an external-internal motion correlation model and the generation of the treatment plan. Then, in the fractional treatment sessions, the learned model is applied for respiratory motion compensation during dose delivery. RI can also be used for patient setup (recall Sect. 3.1.1). Image sources (CT scanner, RT system): Siemens AG.
5.1.3 Dense Deformation Tracking
The key issue with external-internal motion correlation models is the actual level
of correlation, accounting for the accuracy of dose delivery in the time interval between radiographic model verifications. Clinically available solutions that are
in use or potentially suitable for hybrid tumor-tracking [Ford 02, Hoog 09, Will 06]
typically measure external motion using a single or few passive markers on the patient's chest as a low-dimensional (in most cases 1-D) surrogate. However, marker-based external surrogates require extensive patient preparation and reproducible marker placement, with a considerable impact on model accuracy that directly translates into the accuracy of dose delivery. Furthermore, in practice, patient respiration is rarely regular and subject to substantial inter-cycle variability [Keal 06], and
the bio-mechanical coupling of external surrogates with the internal target motion
may exhibit complex relationships. Thus, those low-dimensional techniques are
incapable of depicting the full complexity of respiratory motion. Experimental
studies by Yan et al. [Yan 06] and Fayad et al. [Faya 11] confirmed that using multiple external surrogates at different anatomical locations is superior to the conventional approach with a single 1-D respiratory signal for external-internal correlation modeling. Both conclude that model accuracy correlates with the quantity of
suitable external surrogate positions [Yan 06, Faya 09, Faya 11].
Modern IGRT solutions that enable monitoring of the motion of the complete external patient body surface have the potential to help reduce correlation model uncertainties. In particular, marker-less RI technologies can acquire a dense 3-D surface model of the patient [Bert 05, Brah 08, Gier 08, Mose 11, Peng 10, Scho 07] over time (3-D+t). For an overview of non-radiographic systems for patient positioning, tumor localization and tracking, and motion compensation, we refer to Willoughby et al. [Will 12] and Meeks et al. [Meek 12]. Early strategies in RI-based motion tracking were restricted to low-dimensional respiration surrogates [Scha 08]. However, as stated before, a more reliable and accurate correlation model can potentially be established from high-dimensional respiration
surrogates [Yan 06, Faya 11]. Hence, we target the estimation of a dense displacement field representing the deformation of the instantaneous torso shape w.r.t. a
reference surface [Baue 12b, Baue 12a, Baue 12d, Berk 13]. The methods introduced
in Part II of this thesis estimate such dense displacement fields in the presence of:
• Dense but low-SNR RI data (Chap. 5)
• Accurate but sparse RI data (Chap. 6)
• Dense RI data with complementary photometric information (Chap. 7)
Let us stress that in this thesis we explicitly focus on the aspect of respiratory
motion tracking. The application of the estimated displacement fields in motion
management and compensation using patient-specific motion models [Wasz 12b,
Wasz 13, McCl 13], for instance, is beyond the scope of this work.
5.2 Related Work
As opposed to marker-based tracking technologies, RI cameras do not measure the local motion trajectory at specific surface landmarks directly. Thus, the dense displacement field describing the local surface point trajectories must be recovered using non-rigid surface registration techniques (Sect. 2.4.2). The idea of reconstructing dense surface motion fields for respiratory motion tracking was proposed by Schaerer et al. [Scha 12] and Bauer et al. [Baue 12a, Baue 12b, Baue 12d] only recently. In previous work on extracting a multi-dimensional respiration surrogate from dense RI data, Fayad et al. [Faya 11] had proposed to track the 3-D points of the acquired surface according to their associated pixel indices in the sensor domain. Note that this is a poor approximation, as it assumes the individual points to move along the projection rays of the RI camera.
Let us distinguish our variational model for dense non-rigid surface registration introduced below (Sect. 5.3) from alternative strategies, cf. Sect. 2.4.2. First, we particularly exploit the fact that for the considered application in RT, the reference shape is extracted from tomographic planning data. Hence, it describes the patient geometry in a highly reliable and accurate manner. This obviates the need to model the reference shape in a probabilistic manner, as done in more generic point cloud registration approaches which treat the surface registration problem as an alignment between two distributions, cf. related work by Tsin and Kanade [Tsin 04] and Jian and Vemuri [Jian 05, Jian 11]. Second, the methods
proposed in this chapter and in Chap. 6 build on an implicit representation of
the reference data. In particular, we encode its shape in a signed distance function (SDF) [Jone 06] that is pre-computed in a sufficiently large neighborhood and
stored in a 3-D volumetric image domain [Russ 00]. This SDF can be constructed
offline, prior to treatment. Regarding the template shape, embedding it into a
signed distance transform space and solving the non-rigid registration problem
between the two resulting SDFs in a volumetric image domain [Para 03, Huan 06]
would imply a substantial computational burden. Even though hardware acceleration of volumetric registration techniques has progressed over the past years
[Fluc 11, Sham 10], working on the 2-D range image domain is generally much
more efficient.
Hence, instead, we propose a variational formulation that aligns the template
data to the reference shape SDF in a direct manner. In particular, we exploit the
bijective mapping between the intra-fractionally acquired 3-D point cloud and its
underlying representation in a regular 2-D base domain – the RI sensor plane.
In this chapter, given a reference shape extracted from pre-fractionally acquired
tomographic planning data and intra-fractionally acquired low-SNR RI data of the
patient’s torso, we introduce a variational approach which combines the two intertwined tasks of RI data denoising and its non-rigid alignment to the reference
shape. The idea of developing joint variational methods for intertwined problems has recently become quite popular and successful in imaging. Already a decade ago, Yezzi, Zöllei and Kapur [Kapu 01], Unal et al. [Unal 04], and Feron and Mohammad-Djafari [Fero 04] combined image segmentation and registration. Joint formulations have also been proposed for object de-blurring and motion estimation [Bar 07], denoising and anisotropy estimation [Berk 06], and joint
image registration, denoising and edge detection [Han 07]. Droske and Rumpf
proposed a variational scheme for image denoising and registration based on nonlinear elastic functionals [Dros 07], and Buades et al. presented image sharpening
methods based on combined denoising and registration [Buad 09]. We further refer to the dissertation of Berkels on joint methods in imaging [Berk 10].
5.3 Non-Rigid Surface Registration Framework
In this section, we introduce our general framework for non-rigid surface registration with dense RI data, assuming ideal (noise-free) range measurements. We
depict the geometric configuration, introduce the basic notation and define a suitable variational formulation to solve the registration problem at hand. Then, in
Sect. 5.4, we propose a novel approach for joint range image denoising and registration that is particularly designed for low-SNR RI data.
5.3.1 Geometric Configuration
Assume that we have extracted a reliable surface G ⊂ R3 from tomographic planning data. Furthermore, during treatment delivery, we continuously acquire range image data r that describes a surface Xr ⊂ R3. For each position ζ ∈ R2 on the image plane Ω, the (assumedly) noise-free range value r(ζ) describes a position x = xr(ζ) = r(ζ)p(ζ) on Xr with xr : Ω → Xr, where p denotes the 2-D/3-D mapping operator, see Sect. A.1 in the Appendix of this work.
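To make this mapping concrete, the following Python/NumPy sketch back-projects a range image to its 3-D point cloud under a simple pinhole model; the intrinsic parameters and function names are illustrative assumptions of this sketch, not the calibration model of Sect. A.1.

import numpy as np

def backproject(r, fx, fy, cx, cy):
    """Map a range image r (H x W) to 3-D points x = r(zeta) p(zeta).

    p(zeta) is approximated by the unit vector of the pinhole viewing ray
    through pixel zeta; fx, fy, cx, cy are assumed intrinsic parameters.
    """
    H, W = r.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    rays = np.dstack(((u - cx) / fx, (v - cy) / fy, np.ones_like(r)))
    rays /= np.linalg.norm(rays, axis=2, keepdims=True)  # p(zeta), unit length
    return r[..., None] * rays                           # x_r(zeta), H x W x 3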
Figure 5.2: Geometric sketch of the registration configuration for a torso-like shape. For better visibility, the reference shape G (in gray) and the RI surface Xr (in green) have been pulled apart. (a) Assuming noise-free RI measurements, the displacement vector u(ζ) (in blue) maps a position xr(ζ) ∈ Xr onto G. (b) With low-SNR RI data, the measured position xr0(ζ) ∈ Xr0 is unreliable. Hence, the proposed joint approach simultaneously estimates both a robust position xr(ζ) and the associated displacement u(ζ) mapping xr(ζ) onto G.

Due to respiration, the intra-fractionally acquired RI surface Xr differs from the pre-fractionally acquired planning shape G. More specifically, the shape of Xr depends on the state of the patient's respiratory motion at the RI acquisition
time. Hence, we consider a deformation φ : Xr → R3 matching Xr and G in the sense that φ(Xr) ⊂ G. This deformation can be represented by a displacement u : Ω → R3 defined on the parameter domain Ω:

φ(xr(ζ)) = xr(ζ) + u(ζ) .   (5.1)
For a graphical illustration of the geometric configuration with noise-free RI data
we refer to Fig. 5.2a.
5.3.2 Definition of the Registration Energy
Now we are in the position to develop a variational framework which allows us to estimate a suitable matching displacement u∗ as a minimizer of a functional E[u]:

E[u] := Ematch[u] + κ Ereg[u] .   (5.2)

It consists of a matching term Ematch and a smoothness prior Ereg for the displacement. The parameter κ is a positive constant weighting the contributions of the different energies, thus controlling the trade-off between registration accuracy and smoothness of the displacement field.
Matching Energy. The purpose of the matching functional Ematch is to encode the condition φ(Xr) ⊂ G. To quantify the matching of φ(Xr) onto G, let us assume that the signed distance function dG with respect to G is pre-computed in a sufficiently large neighborhood in R3. The SDF dG : R3 → R is given as:

dG(x) := ±dist(x, G) ,   (5.3)

where the sign is positive outside G and negative inside. In particular, dG(x) = 0 for x ∈ G. Furthermore, ∇dG(x) is the outward pointing normal vector on G for x ∈ G and ‖∇dG(x)‖₂ = 1. Using this SDF dG, we can construct the projection of a point x ∈ R3 in a neighborhood of G onto the closest point on G, P : R3 → G:

P(x) := x − dG(x) ∇dG(x) .   (5.4)

Let us emphasize that, even though P(Xr) ⊂ G holds by construction, we do not expect any biologically reasonable φ to be equal to the projection P.

Using Eq. (5.4) and ‖∇dG(x)‖₂ = 1, we construct a quantitative pointwise measure for the closeness of x = φ(xr(ζ)) to G:

‖P(φ(xr(ζ))) − φ(xr(ζ))‖₂ = ‖dG(φ(xr(ζ))) ∇dG(φ(xr(ζ)))‖₂ = |dG(φ(xr(ζ)))| .   (5.5)

Based on this closeness measure, we define the matching energy Ematch:

Ematch[u] := ∫_Ω |dG(φ(xr(ζ)))|² dζ = ∫_Ω |dG(xr(ζ) + u(ζ))|² dζ .   (5.6)

Note that we confine ourselves to an approximation here, as detailed in the Appendix of this thesis (Sect. A.2.1).
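For illustration, the evaluation of dG, ∇dG and the projection P from nodal values stored on a uniform 3-D grid (as done in the implementation described in Sect. 5.4.2) can be sketched as follows in Python/NumPy; the helper names, the unit-cube domain and the boundary clamping are simplifying assumptions of this sketch.

import numpy as np

def sdf_lookup(sdf, x, h):
    """Trilinear interpolation of a precomputed SDF volume at a point x.

    sdf: 3-D array of nodal values of d_G on a uniform grid over [0, 1]^3,
    h: grid spacing, x: query point (clamped to the grid for simplicity).
    """
    idx = np.clip(np.asarray(x) / h, 0.0, np.array(sdf.shape) - 1.001)
    i0 = idx.astype(int)          # lower corner of the enclosing cell
    t = idx - i0                  # local cell coordinates in [0, 1)
    val = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = (t[0] if dx else 1 - t[0]) * \
                    (t[1] if dy else 1 - t[1]) * \
                    (t[2] if dz else 1 - t[2])
                val += w * sdf[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return val

def sdf_gradient(sdf, x, h):
    """Central-difference approximation of grad d_G at x."""
    g = np.zeros(3)
    for k in range(3):
        e = np.zeros(3)
        e[k] = h
        g[k] = (sdf_lookup(sdf, x + e, h) - sdf_lookup(sdf, x - e, h)) / (2 * h)
    return g

def project_onto_surface(sdf, x, h):
    """Closest-point projection P(x) = x - d_G(x) grad d_G(x), cf. Eq. (5.4)."""
    x = np.asarray(x, dtype=float)
    return x - sdf_lookup(sdf, x, h) * sdf_gradient(sdf, x, h)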
Smoothness Prior. As a regularization prior for the displacement u, we quadratically penalize the magnitude of the variation in the vector field:

Ereg[u] := ∫_Ω ‖Du(ζ)‖₂² dζ ,   (5.7)

where Du denotes the Jacobian matrix, i.e. (Du)ij = ∂j ui, and ‖A‖₂ the Frobenius norm, i.e. ‖A‖₂² := tr(AᵀA). This approach is known as diffusion regularization [Fisc 02, Mode 03a]. Along the lines of Horn and Schunck [Horn 81], the idea behind it is to minimize any variation of the vector field, favoring smooth deformations while preventing singularities such as cracks and foldings, and other undesired properties such as oscillations.
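In discrete form, on a regular grid with unit spacing, this regularizer amounts to a sum of squared finite differences; a minimal sketch, assuming a grid-sampled displacement field:

import numpy as np

def diffusion_energy(u):
    """Discrete diffusion regularizer, cf. Eq. (5.7).

    u: H x W x 3 displacement field sampled on the parameter domain Omega
    (unit grid spacing assumed). Sums squared forward differences, i.e. an
    approximation of the integral of the squared Frobenius norm of Du.
    """
    d1 = np.diff(u, axis=0)  # forward differences along the first grid axis
    d2 = np.diff(u, axis=1)  # ... and along the second grid axis
    return (d1 ** 2).sum() + (d2 ** 2).sum()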
5.4 A Joint Denoising and Registration Approach
The basic formulation for non-rigid surface registration presented in Sect. 5.3 might be an appropriate choice for high-SNR RI data. In the presence of low-SNR data from low-cost RI sensors, we face a different situation. When the regularization term in Eq. (5.2) is weighted weakly, the estimated non-rigid displacement field might fulfill the condition φ(Xr) ⊂ G. However, this comes at the cost of impaired smoothness of the displacement field, which exhibits local spikes and is thus an implausible solution from a biological point of view.

Denoising the low-SNR RI data in a pre-processing step prior to surface registration alleviates the problem. However, we assume that tackling each task, range image data denoising and its non-rigid registration to a reference shape, would benefit substantially from prior knowledge of the solution of the other task [Baue 12b]. Hence, in this section, we propose a joint formulation to solve both tasks in a simultaneous manner.
5.4.1 Definition of the Registration Energy
Let us extend the variational registration model from Sect. 5.3.2 in a way that allows us to cope with considerably noisy range data r0 from a low-SNR RI camera, see Fig. 5.2b. In particular, we aim at (1) restoring a reliable range function r∗ and (2) simultaneously extracting a suitable matching displacement u∗ as a minimizer of a functional:

E[u, r] := Efid[r] + κ Er,reg[r] + λ Ematch[u, r] + µ Eu,reg[u] .   (5.8)
Compared to the formulation for noise-free RI data in Eq. (5.2), we have added
a fidelity energy Efid for the range function r given the measured low-SNR range
data r0 and a suitable regularization prior Er,reg for the estimated range function.
The positive constants κ, λ, µ weight the contributions of the different energies.
This functional directly combines the range function r and the displacement u and
together with the corresponding prior functions both for r and u substantiates the
joint optimization approach of our method. In fact, an insufficient and possibly
noisy range function r prevents a regular and suitable matching displacement u
and vice versa.
For the matching energy Ematch and the prior for the displacement Eu,reg we use the formulations introduced in Eqs. (5.6) and (5.7). However, note that Ematch now depends on two unknowns, the range data r and the displacement field u:

Ematch[u, r] := ∫_Ω |dG(φ(xr(ζ)))|² dζ = ∫_Ω |dG(r(ζ)p(ζ) + u(ζ))|² dζ .   (5.9)
Fidelity Energy for the Range Function. In order to enforce closeness of the restored range function r to the given input data r0, we confine ourselves to a simple least-squares-type functional and define:

Efid[r] := ∫_Ω |r(ζ) − r0(ζ)|² dζ .   (5.10)
Prior for the Range Function. RI data of a human torso acquired from a camera position above the reclined patient are characterized by steep gradients, in particular at the boundary of the projected torso surface, and by pronounced contour lines. To preserve these features properly, a total variation (TV) type regularization prior for the range function is decisive. On the other hand, we would like to avoid the well-known staircasing artifacts of a standard TV regularization. Hence, we employ a pseudo Huber norm:

‖y‖_δreg = √(‖y‖₂² + δreg²) ,   (5.11)

for y ∈ R2 and a suitably fixed regularization parameter δreg > 0, and define:

Er,reg[r] := ∫_Ω ‖∇r(ζ)‖_δreg dζ .   (5.12)
Decreasing this energy comes along with a strong smoothing in flat regions which
avoids staircasing and at the same time preserves large gradient magnitudes that
occur at contour lines or boundaries.
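The interpolation behavior of this norm between the quadratic and TV penalties can be made explicit in a short sketch (the function name is hypothetical; δreg is the fixed parameter of Eq. (5.11)): for gradient magnitudes well below δreg the penalty is approximately quadratic, well above it grows linearly like TV.

import numpy as np

def pseudo_huber_penalty(grad_r, delta_reg):
    """Per-pixel pseudo Huber penalty of Eqs. (5.11)/(5.12).

    grad_r: H x W x 2 gradient of the range image. For gradient magnitudes
    well below delta_reg the penalty is approximately quadratic (strong
    smoothing in flat regions); well above it grows linearly like TV
    (edge preservation without staircasing).
    """
    return np.sqrt((grad_r ** 2).sum(axis=2) + delta_reg ** 2)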
Joint Functional. In summary, combining the individual energy terms, we obtain the following joint functional:

E[u, r] = ∫_Ω ( |r − r0|² + κ ‖∇r‖_δreg + λ |dG(rp + u)|² + µ ‖Du‖₂² ) dζ .   (5.13)
For a proof of the existence of minimizers, we refer to Bauer et al. [Baue 12b].
5.4.2 Numerical Optimization
For the numerical minimization of the energy functional E[u, r], we consider a gradient descent method. This requires the computation of the first variations with respect to the range function r and the displacement u, respectively, given as:

⟨∂r E[u, r], ϑ⟩ = ∫_Ω 2(r − r0) ϑ + κ (∇r · ∇ϑ) / √(|∇r|² + δreg²) + 2λ dG(rp + u) ∇dG(rp + u) · p ϑ dζ ,   (5.14)

⟨∂u E[u, r], ϕ⟩ = ∫_Ω 2λ dG(rp + u) ∇dG(rp + u) · ϕ + 2µ Du : Dϕ dζ ,   (5.15)

where ϑ : Ω → R is a scalar test function and ϕ : Ω → R3 is a vector-valued test displacement. Furthermore, A : B = tr(AᵀB). For a derivation of the first variations of the individual energy terms, we refer to the Appendix (Sect. A.2.2).
For the spatial discretization we apply a piecewise bilinear finite element (FE) approximation on a uniform rectangular mesh covering the image domain Ω. The SDF dG is pre-computed using a fast marching method [Russ 00] on a uniform rectangular 3-D grid covering the unit cube [0, 1]³ and stored on the nodes of this grid; dG and ∇dG are evaluated using trilinear interpolation of the pre-computed nodal values. In the assembly of the functional gradient we use a Gauss quadrature scheme of order 3. The total energy E is highly non-linear due to the involved non-linear distance function dG and the pseudo Huber norm ‖·‖_δreg. We take a multiscale gradient descent approach [Alva 99], solving a sequence of joint matching and denoising problems from coarse to fine scales. On each scale, a non-linear conjugate gradient method is applied on the space of discrete range maps and discrete deformations. In particular, we use a regularized gradient descent along the lines of Sundaramoorthi et al. [Sund 07] to guarantee a fast and smooth relaxation. As initial guess for the range function r we use the measured range data r0. The displacement is initialized with the zero mapping. The time step is controlled with the Armijo rule [Armi 66]. We stop iterating as soon as the energy decay is sufficiently small.
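As an illustration of the descent loop on a single scale, consider the following sketch with Armijo step size control and the energy-decay stopping criterion; E and grad_E stand in for the discretized functional and its assembled gradient and are assumptions of this sketch rather than the FE implementation used here.

import numpy as np

def armijo_descent(x0, E, grad_E, sigma=1e-4, tol=1e-6, max_iter=500):
    """Gradient descent with Armijo step size control on a generic energy.

    x0: initial iterate (here: the stacked discrete range map and
    displacement), E: discretized energy, grad_E: its gradient.
    Stops as soon as the energy decay falls below tol.
    """
    x, e = np.asarray(x0, dtype=float), E(x0)
    for _ in range(max_iter):
        g = grad_E(x)
        step = 1.0
        # Backtrack until the Armijo condition grants sufficient decrease.
        while E(x - step * g) > e - sigma * step * (g ** 2).sum():
            step *= 0.5
            if step < 1e-12:
                return x                  # no admissible step found
        x = x - step * g
        e_new = E(x)
        if e - e_new < tol:               # energy decay sufficiently small
            return x
        e = e_new
    return x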
5.5 Experiments and Results
The experimental study is structured as follows. First, we investigate the performance of different denoising models. Second, the proposed model for joint range
image denoising and registration is validated on real CT and synthetic ToF data
from an anthropometric torso phantom. Third, we investigate its application on
synthetic data from a dynamic 4-D CT respiration phantom. Thereby, we underline the benefits of the joint variational approach compared to consecutive denoising and registration. Fourth, we apply the method to real CT and real ToF data.
For an application of the non-rigid surface registration framework proposed in
Sect. 5.3 we refer to Wasza et al. [Wasz 12b].
5.5.1 Materials and Methods
All experiments below, except for the respiration phantom study, build on an initial shape template that was extracted from CT data of a male torso phantom as described in Sect. 3.5.1. Here, let us consider this shape template as the instantaneous patient body surface, denoted by M ⊂ R3. Based on M and a virtual ToF camera (200×200 px, cf. Sect. 2.2.2) we generated:
• Ideal noise-free ToF data rideal , with associated 3-D point cloud Xrideal , denoted as a pair (rideal , Xrideal ) below.
• Realistic ToF measurements, denoted (r0, Xr0), by artificially adding noise in the range measurement domain. In particular, we approximated sensor noise on a per-pixel basis by adding an individual offset to the ideal range rideal, drawn from a zero-mean normal distribution with σ² = 40 mm² (see the sketch after this list). This variance is motivated by observations on real ToF data (PMD CamCube 2.0) at a typical clinical working distance of about 1.5 m.
• Temporally averaged realistic ToF measurements, denoted (r0,ta, Xr0,ta). As an
initial data enhancement step prior to spatial denoising addressed in the
variational formulation, we applied temporal averaging over 5 frames on
(r0 , Xr0 ). This is a viable choice w.r.t. the frame rate of today’s RI cameras
and the considered application in respiratory motion tracking.
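A sketch of this simulated measurement process (per-pixel zero-mean Gaussian offsets with σ² = 40 mm², followed by temporal averaging over 5 frames; function and parameter names are hypothetical):

import numpy as np

def simulate_tof(r_ideal, sigma_mm=np.sqrt(40.0), n_frames=5, rng=None):
    """Generate noisy and temporally averaged range data from ideal ranges.

    r_ideal: ideal range image in mm. Each simulated frame adds an
    independent zero-mean Gaussian offset per pixel (sigma^2 = 40 mm^2);
    averaging n_frames yields the enhanced measurement r_{0,ta}.
    """
    rng = np.random.default_rng() if rng is None else rng
    frames = [r_ideal + rng.normal(0.0, sigma_mm, r_ideal.shape)
              for _ in range(n_frames)]
    r0 = frames[0]                   # a single realistic measurement
    r0_ta = np.mean(frames, axis=0)  # temporally averaged measurement
    return r0, r0_ta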
The denoised range data and matching deformation estimated with the proposed
joint approach are denoted by (r ∗ , Xr∗ ) and (u∗ , φ∗ ), respectively.
Comparison of Denoising Models. Based on the synthetic data introduced before, we first studied the performance of different denoising models on (r0,ta, Xr0,ta) using the variational formulation from the joint approach:

E[r] = Efid[r] + κ Er,reg[r] .   (5.16)
Let us stress that we solely investigated the denoising component of the proposed method here, in the absence of deformations. The proposed regularization using the pseudo Huber norm (Eq. 5.12) was compared to both a quadratic (Q) and an edge-preserving TV regularization of r:

Er,reg,Q[r] := ∫_Ω ‖∇r(ζ)‖₂² dζ ,   (5.17)

Er,reg,TV[r] := ∫_Ω ‖∇r(ζ)‖₁ dζ .   (5.18)
Figure 5.3: Geometric sketch of the model validation setup. On the left, the generation
of noisy ToF data (r0 , Xr0 ), (r0,ta , Xr0,ta ) and a planning shape G = φideal (M) from a given
ideal intra-fractional shape M (blue frame) is depicted (purple-shaded boxes). On the
right, datasets used for validation of the proposed joint approach (left gray-shaded box)
and a sequential denoising and registration scheme (right gray-shaded box) are illustrated
(green frames). In addition, the metrics used for quantitative evaluation of the denoising
process (top) and the registration process (bottom) are depicted in red.
As a quantitative measure of the denoising quality, we evaluated the distance of
the denoised surface Xr∗ to the ideal intra-fractional shape M. This is performed
by evaluating dM(Xr∗), where dM represents the SDF w.r.t. M. For the experiments, the weighting factor in Eq. (5.16) was empirically set to κ = 1 · 10⁻⁴.
Validation of the Joint Model. The workflow of our experiments for model validation is depicted in Fig. 5.3. Recall that we consider M as the ideal intra-fractional
shape of the patient and that we have generated both ideal (rideal , Xrideal ) and realistic synthetic ToF data (r0 , Xr0 ), (r0,ta , Xr0,ta ) from M. In addition, we deformed M
by a synthetic deformation φideal and considered the deformed shape as planning
CT surface G = φideal (M). As synthetic deformation we have taken into account
a non-linear transformation in the treatment table plane:

ux,ideal(x) = ν (−x(y − 0.5) − (1 − x)(x − 0.5)) ,   (5.19)
uy,ideal(x) = ν (x(x − 0.5) − (1 − x)(y − 0.5)) ,   (5.20)

and uz,ideal(x) = 0, x ∈ M, with a comparatively large deformation scale parameter ν set to 10% of the scene width.
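For reference, this synthetic deformation is straightforward to evaluate; a sketch, assuming coordinates normalized to [0, 1] in the table plane:

import numpy as np

def ideal_displacement(x, y, nu=0.1):
    """Synthetic in-plane deformation of Eqs. (5.19)-(5.20); u_z = 0.

    x, y: arrays of normalized coordinates in the treatment table plane,
    nu: deformation scale, here 10% of the scene width.
    """
    ux = nu * (-x * (y - 0.5) - (1 - x) * (x - 0.5))
    uy = nu * (x * (x - 0.5) - (1 - x) * (y - 0.5))
    return np.stack([ux, uy, np.zeros_like(ux)], axis=-1)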
We consider the distance of the denoised range data to the ideal intra-fractional
patient shape dM (Xr∗ ) for quantitative evaluation of the denoising process of the
joint approach. In analogy, the quality of the registration process is quantified
using dG (φ∗ (Xrideal )), evaluating the distance of the ideal intra-fractional patient
shape being transformed with the estimated matching deformation φ∗ (Xrideal ) to
the planning shape G , cf. Fig. 5.3. Hence, both components of the joint approach
are evaluated in an independent manner. We investigated the case of unfiltered
range data (r0, Xr0) with a suitable set of model parameters κ = 4 · 10⁻⁴, λ = 1 · 10⁴, µ = 4 · 10⁻³ and the case of temporally averaged range data (r0,ta, Xr0,ta) with an adapted set of parameters κ = 1 · 10⁻⁴, λ = 2.5 · 10³, µ = 1 · 10⁻³, to study the
impact of applying temporal denoising as a pre-processing measure.
Joint vs. Non-Joint Denoising and Registration. To explicitly investigate the
benefit of the proposed joint approach, we compared the quality of joint denoising
and registration to a sequential (non-joint) scheme that performs denoising and
registration consecutively, i.e. first denoising r0 and then computing the deformation φ̃∗ matching the denoised surface Xr̃∗ to G . The optimal range and displacement functions that are estimated based on the sequential scheme are denoted
with a tilde, as (r̃∗, Xr̃∗) and (ũ∗, φ̃∗), as opposed to the joint estimates (r∗, Xr∗) and
(u∗ , φ∗ ). For direct comparability, we consider the same denoising model as with
the joint approach:
E[r̃] = ∫_Ω ( |r̃ − r0|² + κr̃ ‖∇r̃‖_δreg ) dζ .   (5.21)

The registration of the denoised range data r̃∗ to G is then performed according to:

E[ũ] = ∫_Ω ( |dG(r̃∗p + ũ)|² + κũ ‖Dũ‖₂² ) dζ ,   (5.22)
with the regularization weights κr̃ = 1 · 10⁻⁴ and κũ = 4 · 10⁻⁷ set in analogy to
the settings with the joint approach. By doing so, we ensure that the regularization
of the matching displacement is at a comparable level. Again, we investigated the
denoising and registration components in a separate manner. For the sequential
approach, we quantified the quality of denoising and registration in analogy to the
joint approach by dM (Xr̃∗ ) and dG (φ̃∗ (Xrideal )), respectively. The former evaluates
the distance of the denoised surface Xr̃∗ to the ideal intra-fractional patient shape
M. The latter quantifies the distance of the ideal intra-fractional patient shape
being transformed with the estimated matching deformation φ̃∗ (Xrideal ) w.r.t. the
planning shape G . For a comprehensive illustration of the evaluation setup we
refer to Fig. 5.3. The experiments were performed on (r0,ta , Xr0,ta ).
Respiratory Motion Tracking. In order to quantify the performance of the proposed method in respiratory motion tracking, we used the synthetic 4-D NURBS-based CArdiac-Torso (NCAT) phantom [Sega 07]. In particular, we generated torso
shape data M p for 16 states within one respiration cycle, p ∈ {1, . . . , 16} denoting
the respiration phase. For each state, the phantom’s external body surface was extracted from synthetic CT data with a resolution of 256 × 256 × 191 voxels and a
spacing of 3.125 × 3.125 × 3.125 mm³ with the pipeline described in Sect. 3.5.1.

Figure 5.4: Experimental evaluation of different denoising models. From left to right, the residual distance dM(Xr∗) of the denoised surface Xr∗ to the ground truth shape M is color coded on one side of Xr∗ for (a) quadratic regularization, (b) TV regularization, and (c) the proposed regularization based on the pseudo Huber norm ‖∇r‖_δreg.

We generated a typical RT treatment scene by adding a plane that mimics the treatment table. The phantom shape at full expiration (p = 1) was considered as the
pre-fractional planning geometry G . For the shapes of the remaining respiration
states, we generated temporally averaged realistic ToF data (r p,0,ta , Xr p,0,ta ). These
datasets were then processed using the proposed joint range image denoising and
registration approach. The quality of the denoising process was evaluated based
on dM p (Xr∗p ), where dM p denotes the SDF w.r.t. M p , the quality of the registration
process based on dG (φ∗p (M p )), in analogy to the experiments described before. φ p
denotes the deformation at respiration phase p. The model parameters were set
in accordance with the model validation experiments: κ = 1 · 10⁻⁴, λ = 2.5 · 10³, µ = 1 · 10⁻³.
Experiments on Real ToF Data. Eventually, we illustrate the feasibility of the
proposed method on real CT and real ToF data from the male torso phantom. ToF
data were acquired using a PMD CamCube 3.0 ToF camera with an integration
time of 750 µs. Running at a frame rate of 40 Hz, we applied temporal averaging
over 5 frames as in the experiments with synthetic ToF data before. The weighting
parameters were also set in accordance with the experiments on synthetic data,
κ = 1 · 10⁻⁴, λ = 2.5 · 10³, µ = 1 · 10⁻³. ToF and CT data were roughly aligned
manually. However, note that for these experiments on real data we did not have
ground truth information. Hence, we were unable to quantify the quality of denoising and registration as in the synthetic experiments. Instead, for a qualitative
assessment of the denoising process, we compare the denoising results with the
joint and non-joint approach to ToF data that were averaged over 500 frames.
5.5.2 Results
Comparison of Denoising Models. The performance of the three investigated
denoising models is illustrated in Fig. 5.4. Both the over-smoothing effect of the
quadratic regularization at the torso boundaries and the staircasing artifacts of
the TV regularization on the flat thoracic and abdominal regions are clearly visible. Instead, using the proposed regularization based on the pseudo Huber norm ‖∇r‖_δreg, we observe an edge-preserving smoothing that avoids staircasing.

Figure 5.5: Validation of the joint model on male phantom data. The first two columns correspond to results for a full torso including the head, the last column to results for the trunk of the phantom. The left column depicts results on unfiltered ToF data Xr0, the center and right columns results on temporally averaged ToF data Xr0,ta. From top to bottom, the rows depict the input ToF data, the residual mismatch dG(φ∗(Xr∗)) after joint denoising and registration to G, the denoising quality dM(Xr∗), the registration quality dG(φ∗(Xrideal)) and the smoothness of the estimated displacement field u∗. The associated color-coding is depicted on the right.
Validation of the Joint Model. Results of the proposed algorithm for joint range
image denoising and registration on torso phantom data are depicted in Fig. 5.5.
In particular, we present results for two different datasets, one representing the
full torso including the head, the other one representing the trunk of the phantom.
For the full torso dataset, results for both unfiltered (r0 ,Xr0 ) and time-averaged
realistic ToF data (r0,ta ,Xr0,ta ) are given. The results on time-averaged ToF data
outperformed the results on unfiltered ToF data. Overall, regarding the results on
time-averaged ToF data that still exhibit a rather low SNR, the residual mismatch
in terms of both denoising and registration was small. We remark that this is a
particularly promising result w.r.t. the strong synthetic deformation used in these
validation experiments, cf. the visualization of u∗ in Fig. 5.5.
Joint vs. Non-Joint Denoising and Registration. Fig. 5.6 contrasts the proposed joint approach with sequential denoising and registration, at a comparable level of
regularization of the matching displacement. The color-coding indicates the superiority of the joint approach. Over the central torso region, the absolute residual error after denoising was 0.47±0.36 mm for the sequential scheme evaluating
|dM (Xr̃∗ )|, compared to 0.22±0.15 mm when estimated within the joint framework
evaluating |dM (Xr∗ )|. This corresponds to an improvement by a factor of 2.1. Considering the registration quality, the absolute residual mismatch was 0.47±0.36 mm
for the sequential scheme, where the alignment was performed after denoising,
and evaluating |dG (φ̃∗ (Xrideal ))|. Using the proposed joint framework, the residual mismatch decreased to 0.24±0.16 mm evaluating |dG (φ∗ (Xrideal ))|. This corresponds to an improvement by a factor of 1.9. We conclude that incorporating prior
knowledge about the target shape G helps substantially in the denoising process.
On the other hand, proper denoising also renders the registration problem more
robust.
Respiratory Motion Tracking. Results for respiratory motion tracking on NCAT
phantom data are depicted in Fig. 5.7. Given are color-coded plots of the residual
error w.r.t. denoising and registration, and the estimated displacement fields u∗
for different phases within the respiration cycle. The results indicate that the high
quality of denoising and registration with the joint approach is only slightly affected by the magnitude of respiration (increasing from left to right). To speed up
the algorithm, we have taken into account the estimated displacement field and
the denoised range data from the previous phase as initial data for the next phase.
Comparing this initialization scheme to an initialization of r with r0 and u with the
zero displacement, we observed a reduction of the required gradient descent steps
by a factor of 3 on average, without any notable change of the resulting minimal
energy, cf. [Baue 12b].
Experiments on Real ToF Data. Results on real CT and real ToF data from the
male torso phantom are illustrated in Fig. 5.8. An interesting outcome was the
observation that the denoising process of the joint model was able to remove topographic artifacts in Xr0,ta that result from systematic errors of real ToF data,
cf. Sect. 2.1.3. Again, this underlines that the joint approach inherently incorporates prior shape knowledge from the reference shape G into the denoising process. Vice versa, this results in a substantially better displacement estimate.

Figure 5.6: Comparison of the proposed joint approach to a sequential scheme where registration is performed after denoising, for temporally averaged ToF data (r0,ta, Xr0,ta). The color-coded renderings on the left depict the denoising quality, evaluating dM(Xr̃∗) on Xr̃∗ for the non-joint approach (upper row) and dM(Xr∗) on Xr∗ for the joint approach (lower row). The color-coded renderings in the center depict the registration performance, evaluating dG(φ̃∗(Xrideal)) on φ̃∗(Xrideal) for the non-joint and dG(φ∗(Xrideal)) on φ∗(Xrideal) for the joint approach. The right column illustrates the denoised surfaces Xr̃∗ and Xr∗ with a monochrome rendering, visually underlining the superior denoising performance of the joint approach.

In contrast, when performing denoising and registration in a consecutive manner, the
denoising process cannot eliminate those artifacts. Hence the displacement field,
even though satisfying φ̃∗ (Xr̃∗ ) ≈ G , will exhibit local artifacts that do not reflect
the actual deformation. Regarding the considered application of the displacement
field u as a multi-dimensional respiration surrogate in RT motion management,
this would be highly problematic.
5.6 Discussion and Conclusions
In this chapter, we have proposed a novel variational formulation for joint denoising of low-SNR RI data and its registration to a reference shape. First and foremost,
the target application requires the estimation of a dense and reliable displacement
field describing the torso deformation of a reclined patient and providing a multidimensional respiration surrogate. However, the need for a reliable displacement
field implies the need for a reliable measurement of the intra-fractional patient
shape. Such reliability, however, cannot be assumed for low-SNR RI data. Even though non-rigid
surface registration techniques can cope with imperfect input data and estimate
a matching displacement field, residual noise and artifacts in the input data will
impair the deformation estimation.
Figure 5.7: Results on NCAT phantom data. From left to right, different phases within the
respiration cycle, from exhale (left) to inhale (right) are illustrated. The first row shows
the intra-fractional ToF data Xr0,ta . The second and third row, respectively, depict the denoising quality dM (Xr∗ ) on Xr∗ and the registration quality dG (φ∗ (Xrideal )) on φ∗ (Xrideal ).
The associated color-coding is depicted on the right. The fourth row illustrates the estimated displacement fields u∗ . Here, the color-coding indicates the amplitude of the local
displacement.
In order to enhance the acquired low-SNR RI data, conventional spatial denoising may be applied. However, in the experiments, we showed that using a joint
formulation that simultaneously denoises the measured RI data while registering
it to an accurate reference shape is beneficial compared to a sequential approach
that performs both tasks in a consecutive and independent manner. In a quantitative study on real CT and synthetic ToF data, we found that the joint formulation
improves the quality of the denoising and registration processes by a factor of 2.1
and 1.9, respectively. A feasibility study on real CT and real ToF data further revealed that denoising with the joint model can compensate for surface artifacts
that result from systematic ToF errors. This is possible due to the proposed joint
formulation exploiting the reliable and accurate patient planning data as a shape
prior. It likewise improves the quality of denoising and the correctness of the estimated displacement field. The results confirm our initial assumption that tackling
each task does benefit considerably from prior knowledge of the solution of the
other task.
Figure 5.8: Experimental results on real ToF and real CT data. From left to right, the upper
row depicts the initial mismatch between Xr0,ta and G rendered in a single image using
alternating slices, the temporally averaged real ToF data Xr0,ta , and the color-coded residual
mismatch dG (φ∗ (Xr∗ )) after joint denoising and registration. The lower row illustrates the
elimination of systematic ToF measurement artifacts when using the proposed joint model.
Xr0,ta,500 denotes real ToF data averaged over 500 frames. Note that both Xr0,ta and Xr0,ta,500
exhibit a local artifact above the clavicle (labeled in red). Conventional spatial denoising
as applied in the sequential approach cannot eliminate this artifact, see Xr̃∗ . Instead, the
denoising process of the joint model reduces the artifact substantially, see Xr∗ .
We further investigated the performance of the proposed joint model on synthetic data from a 4-D CT respiration phantom. Here, we found that the quality of
both the denoising and the registration process is only slightly impaired by an increasing respiration magnitude. However, the amount of noise in the RI
input data correlates with the residual mismatch w.r.t. both tasks. This motivates
the pre-processing of RI data with temporal denoising techniques prior to applying the proposed joint denoising and registration algorithm.
CHAPTER 6

Sparse-to-Dense Non-Rigid Surface Registration
6.1 Motivation and Related Work  102
6.2 Sparse-to-Dense Surface Registration Framework  103
6.3 Experiments and Results  107
6.4 Discussion and Conclusions  115
As detailed previously in Sect. 5.1, the intra-procedural tracking of respiratory
motion has the potential to improve image-guided diagnosis and interventions.
Available solutions in IGRT are subject to several limitations. Foremost, real-time RI technologies that are capable of acquiring dense 3-D data typically exhibit a low SNR. The question we pose is: Could a paradigm shift in the development of real-time RI technology from dense but noisy toward accurate but sparse help improve the accuracy of surface motion tracking?

In this chapter, we investigate the suitability of a novel multi-line triangulation (MLT) sensor for the task of non-rigid surface motion tracking. Instead of acquiring dense but noisy RI data, the MLT sensor delivers highly accurate but sparse measurements (recall Sect. 2.1.3). We have developed a novel sparse-to-dense registration
approach that is capable of reconstructing the patient’s dense external body surface and estimating a 4-D (3-D+time) surface motion field from sparse sampling
data and patient-specific prior shape knowledge [Baue 12a, Berk 13]. More specifically, the sparse position measurements acquired with the MLT sensor are registered with a dense pre-fractional reference shape. Thereby, a dense displacement
field is recovered which describes the spatio-temporal deformation of the patient
body surface, depending on the type and state of respiration. In a joint manner,
the proposed method enables:
• Dense reconstruction of the patient’s instantaneous external body shape
• Estimation of non-rigid torso deformations, yielding a high-dimensional respiration surrogate
The remainder of this chapter is organized as follows. In Sect. 6.1, we identify limitations of available solutions for marker-less intra-fractional respiratory motion
tracking in IGRT, as a motivation for the proposed approach. Furthermore, we
summarize related work w.r.t. methodology. For details on the MLT prototype we
refer to Sect. 2.1.3. In Sect. 6.2, we introduce the variational formulation of the proposed method for sparse-to-dense non-rigid surface registration. In Sect. 6.3, the
method is validated on NCAT data and evaluated in a comprehensive volunteer
study investigating the method's accuracy on realistic data. Finally, we discuss the results and draw conclusions in Sect. 6.4. Parts of this chapter have been
published in [Baue 12a, Berk 13] and are joint work with Prof. Dr. Martin Rumpf
and Prof. Dr. Benjamin Berkels.
6.1 Motivation and Related Work
Commercially available RI-based IGRT solutions for patient monitoring and respiratory motion tracking are subject to several limitations. First, they either do not support dense sampling in real-time [Brah 08, Mose 11] or do so only at the cost of a limited field of view [Bert 05, Peng 10, Scho 07]. For instance, the Sentinel system (C-RAD AB, Uppsala, Sweden, http://www.c-rad.se) and the Galaxy system (LAP GmbH, Lüneburg, Germany, http://www.lap-laser.com) take several seconds for a complete scan of the torso [Brah 08, Mose 11], and the real-time mode of the VisionRT stereo system (VisionRT Ltd., London, UK, http://www.visionrt.com) is limited to interactive frame rates of 1.5-7.5 Hz, depending on the size of the surface of interest [Peng 10]. The temporal resolution of these solutions may be
insufficient to characterize respiratory motion [Wald 09]. We expect the low frame
rates to result from the underlying measurement technologies: (1) consecutive
light sectioning as used by the Sentinel and Galaxy systems is constrained by the
fact that a laser line must be swept mechanically and a set of camera frames must
be merged over time to reconstruct an appropriate dense surface scan from a set
of subsequently acquired contours, (2) dense stereo imaging as used by VisionRT
is known to imply a substantial computational burden for establishing image correspondences for 3-D reconstruction, posing constraints for real-time acquisition.
Second, besides their limitations in terms of sampling density and speed, commercially available solutions often imply high hardware costs and are subject to measurement uncertainties due to the underlying sampling principles, e.g. active stereo photogrammetry [Bert 05, Peng 10, Scho 07] or consecutive light-sectioning using mechanically swept lasers [Brah 08, Mose 11]. Third and
last, the general focus of these systems is on patient positioning [Will 12] and none
of them features dense and non-rigid respiratory motion tracking. Instead, if available at all, motion tracking is restricted to the acquisition of low-dimensional respiration surrogates [Scha 08].
Research on dense tracking of spatio-temporal deformations of a patient’s torso
from marker-less RI data has emerged only recently [Baue 12b, Scha 12]. For instance, Schaerer et al. [Scha 12] have studied the application of a non-rigid extension of the ICP algorithm [Ambe 07] for this task, on stereo vision data acquired with the VisionRT system. We have presented a variational formulation
for joint range image denoising and non-rigid registration with a planning surface
[Baue 12b], as detailed in Chap. 5, on ToF data. However, both approaches rely on dense 3-D data that are subject to low SNRs and systematic errors, respectively.

Figure 6.1: Reconstruction of sparse non-rigid displacement fields from MLT data for respiratory motion tracking [Baue 12c]. (a) MLT measurements at exhale (in blue) and inhale (in green) respiration states. To support visual interpretation, the surface at exhale state (in gray), acquired with a dense RI camera, is additionally depicted, but not considered by the registration approach. (b) Estimated non-rigid displacement field (in yellow). (c) The acquisition of MLT data over time allows analyzing the spatial range of respiratory motion along the individual laser triangulation planes.
In this chapter, we propose an alternative strategy based on sparse but highly
accurate RI data from a multi-line laser triangulation sensor. Early work on respiratory motion tracking using MLT data was restricted to the reconstruction of
sparse displacement fields based on a non-rigid registration of successively acquired sparse point cloud data [Baue 12c], cf. Fig. 6.1. However, the estimated
displacement fields rely on the insufficient assumption that the local surface trajectories reside in the plane of the projected laser line.
Instead, below, a novel variational model is introduced to recover a dense, accurate and reliable 4-D displacement field and to reconstruct a complete patient
body surface model at the instantaneous respiration state from sparse data, using
prior patient shape knowledge from tomographic planning data. Estimating the
dense displacement field is combined with recovering a sparse displacement field
from MLT measurements to planning data. Thus, the approach is closely related
to the field of inverse-consistent registration [Cach 00, Chri 01] where the deformations in both directions are estimated simultaneously with a penalty term constraining each deformation to be the inverse of the other one. Medical applications
of the idea of inverse consistency include the symmetric matching of corresponding landmarks [John 02] and edge features [Han 07].
6.2 Sparse-to-Dense Surface Registration Framework
In this section, we describe the geometric configuration and derive the variational
model for the joint reconstruction of the instantaneous patient shape and the estimation of the underlying dense and non-rigid displacement field, from sparse
measurements and prior shape knowledge.
Figure 6.2: Geometric configuration for the reconstruction of the dense deformation φ
with φ(ζ, g(ζ )) = (ζ, g(ζ )) + u(ζ ) from sparse sampling data Y = {y1 , . . . , yn } and prior
reference shape data G ⊂ R3 , and the approximate sparse inverse Ψ with Ψ(yi ) = yi + wi .
For a better visibility G and Y have been pulled apart. Furthermore, the projection P onto
G and the orthogonal projection Q from the graph G onto the parameter domain Ω are
sketched.
6.2.1 Geometric Configuration
Given is a pre-fractional reference shape G ⊂ R3 that can be (1) extracted from
tomographic planning data, (2) captured with a dense RI sensor of low temporal
resolution, or (3) acquired by an MLT sensor in combination with a steerable treatment table [Ettl 12a]. During therapeutic dose delivery, the instantaneous patient
body surface denoted by M ⊂ R3 is represented by sparse MLT sampling data
Y ⊂ R3 . In particular, the MLT sensor acquires a finite set of n measurements
Y = {y1 , . . . , yn }, yi ∈ R3 , arranged in a grid-like structure (Fig. 6.2). Note that
the intra-fractional grid-like sampling Y is not aligned with G and depends on the
respiration state and magnitude at the time of acquisition.
Now, the goal is to estimate the unknown dense and non-rigid deformation
φ : G → R3 that matches the reference shape G to the instantaneous patient body
surface M. Ideally, φ should be such that M = φ(G), but since Y only contains
information about a sparse subset of M, the condition on φ appropriate for our
problem setting is Y ⊂ φ(G). Along the lines of inverse-consistent registration, in
a joint manner, we estimate φ together with its inverse ψ. Again, due to the sparse
nature of the input data, we do not try to estimate the inverse everywhere on M
but only on the known sparse subset Y. In other words, instead of trying to find
ψ : M → R3 with ψ(M) = G , we estimate a sparse deformation Ψ : Y → R3 such
that Ψ(Y ) ⊂ G . Here, dense and sparse deformations are distinguished by using
lower and upper case letters respectively. Let us underline that Ψ is fully represented by the discrete set {Ψ(y1 ), . . . , Ψ(yn )} containing the deformed positions
of the n points acquired by the MLT sensor. A geometric sketch that illustrates the
deformations φ and Ψ is depicted in Fig. 6.2. As we will see when constructing the
individual terms of our objective functional in Sect. 6.2.2, estimating Ψ allows us to
establish a correspondence between the MLT measurements and the reference patient surface, whereas the dense deformation φ can be used as a high-dimensional
breathing surrogate and enables the reconstruction of the complete instantaneous
patient surface for intra-fractional monitoring of the patient setup.
Before we describe the variational model in Sect. 6.2.2, let us introduce some
basic notation, cf. Fig. 6.2. We assume that the reference shape G is given as a graph, i.e. there is a parameter domain Ω ⊂ R2, usually associated with the patient table plane, and a function g : Ω → R such that G = {(ζ, g(ζ)) ∈ R3 : ζ ∈ Ω}. Vice versa, the orthographic projection onto the parameter domain Ω is given as Q(ζ, g(ζ)) = ζ, Q ∈ R2×3. Furthermore, we represent the sparse deformation Ψ by a set of displacement vectors W = {w1, . . . , wn} ⊂ R3 via:

Ψ(yi) = yi + wi .   (6.1)
The deformation φ is represented by a displacement u : Ω → R3 defined on the parameter domain Ω of the graph G via:

φ(ζ, g(ζ)) = (ζ, g(ζ)) + u(ζ) .   (6.2)
To quantify the matching of Ψ(Y ) to G we apply the SDF-based closeness measure
introduced in Sect. 5.3.2. Again, we emphasize that even though P(Y ) ⊂ G holds
by construction, we do not expect any biologically reasonable Ψ to be equal to the
projection P. Indeed, the computational results discussed below underline that it
is the consistency term coupling φ and Ψ in combination with the prior for the
deformation φ which leads to general matching correspondences for a minimizer
of our variational approach.
6.2.2 Definition of the Registration Energy
Now, since φ and Ψ are represented by u and W respectively, we define a functional E on dense displacement fields u and discrete sets of displacements W whose minimizer represents a suitable matching of the planning data G and MLT measurements Y:

E[u, W] := Ematch[W] + κ Econ[u, W] + λ Ereg[u] ,   (6.3)
where κ and λ are nonnegative constants controlling the contributions of the individual terms. Ematch is a matching energy that encodes the condition Ψ(Y ) ⊂ G .
The consistency functional Econ is responsible for establishing the relation between
both displacement fields, constraining Ψ and φ to be approximately inverse to each
other on the sparse set of positions Y where Ψ is defined. Thereby, it implicitly encodes the condition Y ⊂ φ(G). Finally, Ereg ensures a regularization of the dense
displacement field u. The detailed definitions of these functionals are as follows.
Matching Energy. The distance of the points Ψ(yi) to their projection P(Ψ(yi)) onto G is a suitable indicator for the closeness of Ψ(Y) to G. This pointwise distance can be conveniently expressed using the signed distance function dG, recall Sect. 5.3.2. With the representation Ψ(yi) = yi + wi and this pointwise closeness measure (Eq. 5.5) we can define the matching functional:

Ematch[W] := (1/2n) Σ_{i=1}^{n} |dG(yi + wi)|² .   (6.4)

By construction, this functional is minimal if and only if Ψ(Y) ⊂ G.
Consistency Energy. For a given instantaneous deformation φ of the patient surface G and the corresponding exact deformation Ψ of the MLT measurement Y, we have Ψ(Y) ⊂ G and these deformations are inverse to each other in the sense of the identity φ(Ψ(Y)) = Y. However, for an arbitrary deformation Ψ described by some vector of displacements W, in general Ψ(Y) ⊄ G. Thus, since φ is only defined on G, we have to incorporate the projection P onto G to relate arbitrary φ and Ψ. This leads us to the identity φ(P(Ψ(Y))) = Y that we encode with the consistency energy:

Econ[u, W] := (1/2n) Σ_{i=1}^{n} |φ(P(Ψ(yi))) − yi|²
           = (1/2n) Σ_{i=1}^{n} |P(yi + wi) + u(Q P(yi + wi)) − yi|² .   (6.5)
Here, we used Eq. (6.1), Eq. (6.2) and the projection Q onto the graph parameter
domain. For a geometric interpretation of Eq. (6.5) we refer to the illustration in
Fig. 6.2. The functional directly combines the dense displacement field u and the
sparse set of displacements W and together with the regularizer Ereg substantiates
the inverse-consistent sparse-to-dense registration approach of the method. Thus,
it allows us to compute a dense smooth displacement of the patient planning surface even though only a sparse set of measurements is available.
Smoothness Prior. To ensure smoothness of the deformation φ on G we incorporate a thin plate spline type regularization of the corresponding displacement field u [Mode 03b] and define:

Ereg[u] := (1/2) ∫_Ω |∆u|² dζ .   (6.6)

Here, ∆u = (∆u1, ∆u2, ∆u3) and thus |∆u|² = Σ_{k=1}^{3} (∆uk)². Since our input data Y
only implicitly provide information for φ on a sparse set, a first order regularizer
is inadequate to ensure sufficient regularity for the deformation. Let us emphasize
that the interplay of Econ and Ereg implicitly provides (discrete) smoothness of the
approximate inverse deformation Ψ. Thus, there is no need to introduce a separate
regularization term for Ψ.
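On the uniform grid used below, Ereg can be approximated with the standard 5-point Laplacian stencil on interior nodes; a minimal sketch, with the grid spacing h as an assumed parameter:

import numpy as np

def thin_plate_energy(u, h):
    """Discrete version of E_reg, Eq. (6.6): 0.5 * integral of |Laplace u|^2.

    u: H x W x 3 displacement field on Omega = [0, 1]^2, h: grid spacing.
    The Laplacian is evaluated with the 5-point stencil on interior nodes.
    """
    lap = (u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:] + u[1:-1, :-2]
           - 4.0 * u[1:-1, 1:-1]) / h ** 2
    return 0.5 * (lap ** 2).sum() * h ** 2  # |Laplace u|^2 times cell area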
6.2.3 Numerical Optimization
To minimize the highly non-convex objective functional E w.r.t. the unknowns u
and W, we apply a multi-linear FE discretization in space and use a regularized
gradient descent again to guarantee a fast and smooth relaxation [Sund 07]. For the
gradient descent, derivatives of the energy have to be computed. The derivatives
of Ematch and Econ w.r.t. wj are given as:

∂wj Ematch[W] = (1/n) dG(yj + wj) ∇dG(yj + wj) ,   (6.7)

∂wj Econ[u, W] = (1/n) (P(yj + wj) + u(Q P(yj + wj)) − yj)ᵀ (DP(yj + wj) + Du(Q P(yj + wj)) Q DP(yj + wj)) ,   (6.8)
where DP denotes the Jacobian of the projection P. The variations of Econ and Ereg w.r.t. u in a direction ϕ : Ω → R3 are given by:

⟨∂u Econ[u, W], ϕ⟩ = (1/n) Σ_{i=1}^{n} (P(yi + wi) + u(Q P(yi + wi)) − yi) · ϕ(Q P(yi + wi)) ,   (6.9)

⟨E′reg[u], ϕ⟩ = Σ_{k=1}^{3} ∫_Ω ∆uk · ∆ϕk dζ .   (6.10)
For a derivation we refer to the Appendix (Sect. A.3.1). After an appropriate scaling of G we choose Ω = [0, 1]² and consider a piecewise bilinear, continuous FE approximation on a uniform rectangular mesh covering the domain Ω. In all experiments below we used a 129×129 grid. The gradient descent is discretized explicitly in time; the step size is controlled with the Armijo rule [Armi 66]. We stop the gradient descent iteration as soon as the energy decay is smaller than a specified threshold value τ; for practical values we refer to Sect. 6.3.2.
both deformations φ and Ψ are initialized with the identity mapping, i.e. u, W are
initialized with zero. In the experiments (Sect. 6.3.2), we further study the benefit of initializing φ with the estimates from the previous step and initializing Ψ
with Ψ(yj ) = P(yj ) for j ∈ {1, . . . , n}. The SDF dG is pre-computed using a fast
marching method [Russ 00] on a uniform rectangular 3-D grid. Further details on
the numerical optimization are given in the supplementary material of Berkels et
al. [Berk 13] and in the Appendix (Sect. A.3.2).
6.3 Experiments and Results
The experimental evaluation divides into two parts. In the first part, the proposed
algorithm is validated on surface data from the NCAT 4-D CT respiration phantom. In the second part, we present a comprehensive study on realistic data from
16 healthy subjects. We have quantified the accuracy in 4-D deformation estimation and surface reconstruction, respectively, and have analyzed the performance
of the proposed framework w.r.t. relevant system parameters.
6.3.1 Materials and Methods
All experiments below were performed with a constant parameter setting of κ = 0.8, λ = 4 · 10⁻⁸. These weighting factors, as well as the convergence threshold τ = 10⁻⁴, were determined empirically. To generate MLT sampling
data from synthetic datasets, we have developed a virtual simulator that mimics
the sampling principle of the MLT sensor by intersecting a given dense and triangulated surface with a set of sampling rays. These rays are arranged in a grid-like
structure and the default grid and sampling density of the simulator are set in
accordance with the specifications of the actual MLT prototype used in the experiments on real data. Due to occlusion constraints in a clinical RT environment, the
simulator’s sampling plane and viewing angle, respectively, is set 30◦ off from an
orthogonal camera position w.r.t. the treatment table.
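The sampling principle of the simulator can be sketched as follows: a grid-like bundle of parallel rays, tilted 30° off the table normal, is intersected with the triangulated surface. The brute-force Moeller-Trumbore intersection below is a simplified stand-in for the actual implementation; all names are hypothetical.

import numpy as np

def ray_triangle(o, d, v0, v1, v2, eps=1e-9):
    """Moeller-Trumbore ray/triangle intersection; returns hit point or None."""
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(d, e2)
    det = e1.dot(p)
    if abs(det) < eps:
        return None                     # ray (nearly) parallel to triangle
    t_vec = o - v0
    u = t_vec.dot(p) / det
    if u < 0 or u > 1:
        return None
    q = np.cross(t_vec, e1)
    v = d.dot(q) / det
    if v < 0 or u + v > 1:
        return None
    t = e2.dot(q) / det
    return o + t * d if t > 0 else None

def simulate_mlt(vertices, faces, origins, direction):
    """Sample a triangulated surface with a grid-like bundle of parallel rays.

    origins: k x 3 ray origins arranged like the MLT laser grid,
    direction: common viewing direction, tilted 30 degrees off the table
    normal. Brute force over all faces; a spatial acceleration structure
    would be used in practice.
    """
    hits = []
    for o in origins:
        pts = [ray_triangle(o, direction, *vertices[f]) for f in faces]
        pts = [p for p in pts if p is not None]
        if pts:  # keep the intersection closest to the ray origin
            hits.append(min(pts, key=lambda p: np.linalg.norm(p - o)))
    return np.array(hits)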
Validation on 4-D CT Respiration Phantom Data. For model validation, we have investigated the reconstruction of respiratory deformation fields from surface data of the NCAT 4-D CT respiration phantom [Sega 07]. For the experiments, we generated dense surface meshes M_p for eight phases within one respiration cycle, for male and female phantom data. The index p ∈ {1, . . . , 8} denotes the phase. In addition, we decided to consider both scenarios of arms-up and arms-down patient posture that occur in clinical RT treatment. This results in a total number of 8 · 2 · 2 = 32 datasets, 16 male and 16 female. The NCAT parameters were set to default values for the male and female phantom. The phantom surface at the state of full expiration M_1 (phase 1 out of 8) was considered as the planning geometry G. The remaining set of surfaces was used to generate synthetic sparse sampling data Y_2, . . . , Y_8 using our MLT simulator. The accuracy of the deformation estimation is assessed by the absolute distance of the points in φ_p(G) to M_p, representing the residual mismatch in terms of mesh-to-mesh distance between the transformed reference surface φ_p(G) and the ground truth surface M_p that is to be reconstructed from a sparse sampling Y_p and prior shape data G. Here, we exploit the SDF w.r.t. M_p to establish a correspondence between φ_p(G) and M_p by computing the distance of a point in the transformed reference surface to the closest point on the ground truth surface, i.e. computing |d_{M_p}| on φ_p(G). In order to discard boundary effects at the body-table transition, the evaluation is performed in the central volume of interest that covers the trunk of the phantom.
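For illustration, evaluating |d_{M_p}| on φ_p(G) can be sketched as follows, assuming the SDF w.r.t. M_p is available on a uniform 3-D grid (e.g. pre-computed with a fast marching method) and queried at the transformed reference points by trilinear interpolation; the helper name and the use of SciPy are illustrative choices.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def sdf_error(points, sdf, grid_axes):
    """Evaluate |d_M| at the transformed reference points phi_p(G).

    points    : (n, 3) array of points on the transformed surface
    sdf       : signed distance volume on a uniform rectangular grid
    grid_axes : tuple (xs, ys, zs) of the grid axis coordinates
    """
    interp = RegularGridInterpolator(grid_axes, sdf,
                                     bounds_error=False, fill_value=np.nan)
    d = interp(points)          # trilinear interpolation of the SDF
    d = d[np.isfinite(d)]       # discard points outside the volume
    return np.abs(d)            # per-point mesh-to-mesh distance

# Summary statistics as used in the evaluation, e.g.:
# err = sdf_error(phi_pG_vertices, sdf_Mp, (xs, ys, zs))
# print(err.mean(), np.median(err), np.percentile(err, 95))
```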
Prototype Study on Healthy Subjects. To demonstrate the clinical feasibility of
the proposed system and to evaluate it under realistic conditions, we have conducted a comprehensive study on 16 healthy subjects, male and female. In particular, we have investigated the performance of our modified projection approximation scheme [Berk 13] compared to [Baue 12a], the impact of initializing the
displacement fields with estimates of the preceding respiration phase, and the influence of the convergence threshold τ. Using the MLT simulator and the measured noise characteristics of our prototype, we performed realistic simulations to
study the influence of the MLT laser grid density with regard to upcoming generations of sensor hardware. Along with qualitative and quantitative results in terms
of reconstruction accuracy, we have empirically analyzed the performance of our
implementation in terms of convergence and runtime, respectively.
For the volunteer study, we have used an eye-safe MLT prototype as described
in Sect. 2.1.3. The evaluation database is composed of 32 datasets from 16 subjects,
each performing (1) abdominal and (2) thoracic breathing, respectively. Per subject, we synchronously acquired both real MLT data and surface data using a moderately accurate but rather dense structured-light (SL) system that provides range
images of 640×480 px (Microsoft Kinect, see Sect. 2.1.3). SL data was acquired in
order to provide a dense ground truth surface for quantitative evaluation of our
approach. Both sensors were mounted at a height of 1.2 m above the patient table,
at a viewing angle of 30°. MLT and SL data were aligned using geometric calibration. SL data were pre-processed using edge-preserving bilateral filtering. From
each of the 32 datasets, we extracted sparse MLT measurements Yp and dense SL
meshes M p (re-sampled from a rectangular 129×129 grid, cf. Sect. 6.2.3) for eight
phases within one respiration cycle. With 16 subjects performing abdominal and
thoracic respiration, this results in a total number of 16·2·8 = 256 datasets. For the
experiments, we considered the reconstruction of the displacement field φ p from
a given planning surface G and intra-fractional MLT data Yp , p ∈ {2, . . . , 8}. The
subject's body surface at full expiration M_1, acquired with the SL system, was considered as the given planning data surface G. As with the 4-D CT respiration phantom study, the accuracy of deformation estimation is assessed by the residual mismatch between the transformed reference surface φ_p(G) – reconstructed from sparse sampling data Y_p and prior shape data G – and the ground truth SL surface M_p, i.e. |d_{M_p}| on φ_p(G), in the central volume of interest.
In practice, a quantitative evaluation on synchronously acquired real MLT and SL data was unfeasible, as the SL camera exhibited local sampling artifacts due to the underlying measurement principle and interferences between the laser grid (MLT) and speckle pattern projections (SL) of the synchronously used modalities, which caused local deviations on the scale of several millimeters⁴. Hence, the evaluation on real MLT data is restricted to qualitative results. For quantitative evaluation, we employed our simulator for the generation of realistic MLT sampling data Y_p from dense SL surfaces M_p. In order to generate MLT data that are as realistic as possible, the noise characteristics of our MLT sensor prototype were measured in an optics lab and applied to the synthetic sampling of dense SL data. Let us stress here that the aforementioned interferences solely hinder the generation of the ground truth data necessary for evaluation; the practical application of the proposed method indeed only requires an MLT sensor.

⁴ We expect similar interference effects when simultaneously using our MLT sensor together with an active stereo photogrammetry system such as the VisionRT stereo pods for evaluation, which also rely on speckle pattern projectors in the infrared domain for simplifying the stereo matching problem.
Influence of Estimate Initialization and Convergence Threshold. As proposed in Sect. 6.2.3, we investigated the benefits of initializing φ with the estimate from the previous phase and Ψ with Ψ(y_j) = P(y_j) as initial data, in order to reduce the number of iterations needed for the optimization scheme to converge. In particular, we found that using the projection P onto G as initial guess for Ψ is an even better estimate than just considering the previous estimate. Note that for the first frame we initialize φ with the identity mapping. Furthermore, in order to determine a suitable value for the optimization convergence threshold τ, we studied the impact of τ on both reconstruction accuracy and runtime.
Influence of Modified Projection Approximation. The numerical evaluation of the projection onto G is based on the SDF d_G and the term P(x) = x − d_G(x)∇d_G(x). Thus, the variation of E_con w.r.t. Ψ involves the derivative of P, which in turn involves second derivatives of d_G. To avoid these second derivatives, we use a projection approximation scheme. Compared to the approach used in [Baue 12a], the scheme applied here treats the distance d_G implicitly and the direction ∇d_G explicitly [Berk 13], while [Baue 12a] treated both d_G and ∇d_G explicitly. Thus, the improved projection approximation scheme reflects the underlying projective geometry much better and is substantially more efficient. Details on the projection approximation scheme are provided in the Appendix (Sect. A.3.2). In order to study the computational impact of this modification, we compared the results with the improved projection approximation [Berk 13] to previous results [Baue 12a] for τ = 10⁻⁷.

Figure 6.3: Validation of the model on a male (a) and female (b) 4-D CT respiration phantom. Given are boxplots of the absolute registration error in [mm] w.r.t. discrete ranges of respiration amplitude in terms of |d_{M_p}| on φ_p(G). Each boxplot combines the results for the phantom postures of arms-up and arms-down, illustrated above the plots.
Influence of the MLT Laser Grid Density. Upcoming generations of MLT sensors are expected to feature higher laser grid densities, see Sect. 6.4. Hence, we
have investigated the effect of the projection laser grid density on the registration error. The evaluation was performed on realistic MLT simulator data – 256
datasets from 16 subjects, each sampled with grids of 11×11, 22×22, 33×33 and
44×44 lines – as MLT sensors with higher grid densities than our prototype (11×10
sampling lines) are under development and do not exist yet. Hence, at this point,
an experimental study on real data was unfeasible.
6.3.2 Results
Validation on 4-D CT Respiration Phantom Data. Quantitative results of the proposed method on NCAT phantom data are given in Fig. 6.3. The boxplots illustrate the absolute registration error w.r.t. discrete ranges of respiration amplitude for the male and female phantom. The results for the arms-up and arms-down datasets are combined per gender. Even for instances with a large initial surface mismatch in the range of 9-12 mm, the median residual error in terms of |d_{M_p}| on φ_p(G) is smaller than 0.1 mm. It is also worth noting that the error scales in direct proportion to the respiration amplitude. Qualitative results for the male and female NCAT phantoms are depicted in Fig. 6.4.
Figure 6.4: Qualitative results of the NCAT experiments, for reconstruction of the deformation field for respiration phases p = 2 and p = 4 w.r.t. phase 1 as reference (full expiration), for male arms-up (left) and female arms-down (right) data. First row: G, Y_p (outer contour) and Y_1 ⊂ G. Note that the synthetic MLT data Y_p were generated by sampling M_p using our MLT simulator. Second row: initial mismatch in terms of d_G on M_p, and Y_p. Third row: residual mismatch after application of the proposed method in terms of d_{M_p} on φ_p(G), and Y_p. Fourth row: glyph visualization of the displacement field φ_p on G; u_p is color-coded according to the color bar on the right.
With the female phantom, the MLT coverage of the breast is limited. This becomes evident as an increased local error around the outer lateral part of the female breast. However, the impact is moderate due to the incorporation of prior shape knowledge and the higher-order regularization of φ (Sect. 6.2.2). These model priors are also beneficial in cases of (self-)occlusion. For instance, due to the viewing angle of 30° w.r.t. the treatment table plane, the upper part of the female breast in Fig. 6.4 is self-occluded. Nonetheless, the occluded areas can be reconstructed in a robust manner.
Prototype Study on Healthy Subjects. Qualitative results of the study on healthy
subjects are depicted in Fig. 6.5. To facilitate an anatomic interpretation of the deformation, we overlaid the color texture – which was additionally acquired with our SL device – onto G. Please note that an analysis of the deformation φ_p allows for
a distinct differentiation between abdominal and thoracic respiratory motion patterns and inter-subject variations in the respiration amplitude. For instance, in
the case of thoracic respiration, subject S1 and subject S2 exhibit a similar motion
pattern in the thorax region but substantial differences in the abdominal region.
An overview of the quantitative results over all subjects on realistic MLT data
is given in Fig. 6.6. In particular, Fig. 6.6a depicts boxplots of the initial mismatch
|dG | on M p and residual mismatch |dM p | on φ p (G) over all 16 subjects. Here, the
results for abdominal and thoracic respiration are evaluated in a common plot.
While Fig. 6.6a gives an impression about the overall performance, Fig. 6.6b shows
Figure 6.5: Joint deformation estimation and surface reconstruction on real MLT data. Depicted are results from four subjects S1-S4 (left to right), for abdominal (top) and thoracic (bottom) respiration, for phases p ∈ {2, 3, 4}. For each subject, the reference surface G = M_1 and the respective MLT sampling data Y_p are shown in the first row: Y_2 inner contour (black), Y_4 outer contour (red), Y_3 in between (green). The following three rows illustrate the estimated displacement fields φ_2, φ_3, φ_4 on G. For the glyph visualization of φ_p on G, u_p is color-coded in [mm] according to the color bar on the right.
the residual mismatch on a more detailed scale. Figs. 6.6c,d depict the residual error for discrete respiration phases, for abdominal (c) and thoracic respiration (d), over all subjects. The reconstruction error scales in direct proportion to the respiration amplitude, peaking at the state of full inhalation (phase 4/5). The boxplot whiskers indicate that more than 99% of the residual error is less than 1 mm.
Over all subjects, respiration types and respiration phases, the mean reconstruction error in terms of residual mismatch |d_{M_p}| on φ_p(G) was 0.21 mm and 0.25 mm for abdominal and thoracic respiration, respectively, see Table 6.1. The 95th percentile did not exceed 0.93 mm for abdominal respiration and 1.17 mm for thoracic respiration, for any subject. For a detailed overview of the initial and residual mismatch (95th percentile) for the individual subjects, separated for abdominal and thoracic respiration, we refer to the Appendix (Sect. A.3.3).
Figure 6.6: Quantitative results of the prototype study, for realistic MLT sampling data from 16 subjects. (a) Reconstruction results per individual subject, comparing the initial mismatch in terms of |d_G| on M_p (dark gray bars) vs. the residual mismatch in terms of |d_{M_p}| on φ_p(G) (light gray bars) as boxplots over both respiration types (abdominal and thoracic) and all phases. (b) Residual mismatch in terms of |d_{M_p}| on φ_p(G) per subject. (c, d) Boxplots of the residual mismatch for discrete phases of the respiration cycle, for abdominal (c) and thoracic (d) respiration, over all subjects.
We assume the moderately higher reconstruction error for thoracic respiration to result from the higher initial mismatch of thoracic respiration data (mean: 6.24 mm) compared to abdominal data (mean: 5.09 mm).
Influence of Estimate Initialization and Convergence Threshold. Experimental results in terms of reconstruction accuracy and runtime with and without an appropriate initialization of φ and Ψ are given in Fig. 6.7a. The experiments illustrate that initializing φ and Ψ reduces the number of iterations needed for the optimization scheme to converge, while achieving a comparable registration error. Over all datasets, estimate initialization reduced the runtime by 19.2%. The results for different convergence thresholds (τ = 10⁻⁴ and τ = 10⁻⁷) are depicted in Fig. 6.7b. The plots indicate that a reduction of τ by a factor of 10³ results in a small improvement in reconstruction accuracy at the cost of a substantial increase in solver iterations.
Table 6.1: Results over all subjects, respiration types and phases. Given are the mean, median and 95th percentile of the initial and residual mismatch in [mm], for abdominal respiration (A), thoracic respiration (T) and the entire dataset covering both respiration types (A/T). The last row states numbers in terms of residual mesh-to-mesh mismatch from related work by Schaerer et al. [Scha 12].

                             |  Initial Mismatch [mm]  |  Residual Mismatch [mm]
                             |   A      T      A/T     |   A      T      A/T
  Mean                       |  5.09   6.24   5.66     |  0.21   0.25   0.23
  Median                     |  3.95   4.66   4.23     |  0.13   0.15   0.14
  95th Percentile            | 14.0   17.1   15.2      |  0.69   0.82   0.76
  [Scha 12], 95th Percentile |         6.1             |         1.08
In an empirical study, we found that the convergence threshold of τ = 10⁻⁴ used in the experiments gave a suitable tradeoff between accuracy and runtime.
Influence of Modified Projection Approximation. The plots in Fig. 6.7c illustrate that both projection approximations result in a comparable reconstruction accuracy, while the new, improved approximation reduces the runtime substantially (by 48.2% over all subjects). Note that we used τ = 10⁻⁷ instead of τ = 10⁻⁴ as convergence threshold in order to separate the effect of the improved approximation from the influence of the convergence threshold.
Influence of the MLT Laser Grid Density. Qualitative and quantitative results w.r.t. the influence of the MLT grid density are depicted in Fig. 6.8 and Fig. 6.9. As intuition suggests, a higher grid density comes with a more reliable reconstruction of the deformation (Fig. 6.9). In particular, this becomes evident in remote regions that were poorly covered by a less dense laser grid; compare the local color coding for increasing grid density in Fig. 6.8. A valuable outcome of this study is the fact that doubling the grid density from 11×11 to 22×22 gives a substantial advantage (49.0% on average), but going further does not seem to noticeably improve the results (Fig. 6.9) – probably due to the comparably smooth surface topography of the human torso.
Runtime Performance. Let us conclude the experimental evaluation with a comment on runtime performance. The total runtime per frame was 2.3 s, measured as the mean over all datasets of the volunteer study. In detail, when initializing φ with the estimates from the previous phase and Ψ with Ψ(y_j) = P(y_j), the optimization process took 38.2±2.1 iterations to converge, averaged over all subjects, respiration types and respiration phases. With our proof-of-concept implementation, a single gradient descent step on a single core of a Xeon X5550 2.67 GHz CPU takes ≈ 60 ms. The resulting per-frame runtime of 2.3 s substantially outperforms related work on dense-to-dense surface registration [Scha 12] with runtimes on the scale of minutes (25 iterations, 11.9 s per iteration on a comparable CPU and for a surface mesh with a comparable number of vertices).
Figure 6.7: Study of algorithmic parameters and modifications. Given are the residual mismatch (top row) and the number of iterations until convergence (bottom row), respectively. (a) Results without (dark gray) and with (light gray) initialization of φ and Ψ. (b) Impact of the convergence threshold; results for τ = 10⁻⁷ are depicted in dark gray, results for τ = 10⁻⁴ in light gray. (c) Impact of the improved projection approximation [Berk 13] (light gray) compared to our previous work [Baue 12a] (dark gray). Note that in order to investigate the impact of different convergence thresholds and projection approximations independently from the effect of initializing φ and Ψ, the results in (b) and (c) were generated without initialization.
6.4 Discussion and Conclusions
In this chapter, we have introduced a novel variational approach to marker-less
reconstruction of dense non-rigid 4-D surface motion fields from sparse data and
prior shape knowledge from tomographic planning data. In the field of IGRT,
these motion fields can be used as high-dimensional respiration surrogates for
gated RT, as input for accurate external-internal motion correlation models in respiration-synchronized RT, for motion compensated patient positioning [Wasz 12b,
Wasz 13], and to reconstruct the intra-fractional body shape for patient setup monitoring during dose delivery.
We have investigated the performance of the proposed method on synthetic, realistic and real data. In a comprehensive study on 256 datasets from 16 subjects with an average initial surface mismatch of 5.66 mm, the mean residual reconstruction error was 0.23 mm w.r.t. ground truth data. The 95th percentile of the local residual mesh-to-mesh distance after registration did not exceed 1.17 mm for any subject.
Figure 6.8: Estimation of φ_4 transforming G into M_4 from realistic MLT sampling data Y_4, for abdominal (left) and thoracic (right) respiration. The phase p = 4 roughly represents the respiration state of full inhalation. (a) Planning surface G = M_1, MLT sampling Y_4 (outer contour) and Y_1 ⊂ G. (b) Glyph visualization of the estimated φ_4 on G, for 11×11 sampling lines; |u_4| is color-coded in [mm]. (c) Initial mismatch in terms of d_G on M_4. (d-f) Residual mismatch in terms of d_{M_4} on φ_4(G) for grid resolutions of 11×11, 22×22, 44×44 sampling lines. Note that the color coding between (c) and (d-f) differs by a factor of 10.
In the experiments, it was further shown that a proper initialization of the displacements φ and Ψ (Sect. 6.3.2) and the improved approximation of the projection compared to our first approach [Baue 12a] (Sect. 6.3.2) reduced the runtime by 19.2% and 48.2%, respectively. Doubling the MLT laser grid density from 11×11 to 22×22 lines would yield a considerable gain in accuracy (Sect. 6.3.2). In this context, let us remark that the MLT sensor used for the experiments in this work has not yet been optimized for the task of patient respiration monitoring. First, higher frame rates are possible for both the line pattern projection system and the observing camera; the total frame rate of the MLT system is currently limited only by the applied camera technology. Second, denser laser lines can be realized by adapting the setup to the required measurement volume.
With regard to the state of the art in non-rigid surface deformation estimation, in particular in IGRT, let us compare our results to recent work by Schaerer et al. [Scha 12] on motion tracking with dense surfaces. In their study on five male subjects and dense surface acquisitions from three respiration phases⁵, the authors achieved a residual mismatch of 1.08 mm (95th percentile) in terms of mesh-to-mesh surface distance using non-rigid ICP surface registration [Ambe 07].

⁵ The study by Schaerer et al. [Scha 12] was performed on data acquired with two stereo imaging units placed symmetrically w.r.t. the treatment couch, and in the static mode of the VisionRT system (AlignRT). A practical application would require the use of the dynamic mode instead, which potentially comes along with a reduced FOV and/or increased uncertainties in 3-D surface sampling.
Figure 6.9: Influence of MLT grid density on registration accuracy, evaluated on 256 datasets from 16 subjects. Given are boxplots for laser grid resolutions of 11×11, 22×22, 33×33, 44×44 sampling lines (grouped as four adjacent entries colored from dark to light gray), for increasing ranges of respiration amplitude (from left to right).
Note that our result of 0.76 mm for the 95th percentile residual mismatch over all subjects and both respiration types slightly outperforms these numbers, see Table 6.1⁶, though our quantitative experiments are limited to realistic data. In addition, let us remark that compared to Schaerer's volunteers, many of the subjects in our study exhibited a considerably higher respiration amplitude and initial mismatch (15.2 mm vs. 6.1 mm), respectively (cf. Table 6.1). Hence, the low residual mismatch indicates that our method can reliably recover the dense displacement field from a sparse sampling of the instantaneous patient state using prior shape knowledge, even in the presence of strong respiration.
In addition to the hardware-related benefits of the MLT sensor for respiratory motion tracking compared to existing RI-based IGRT solutions (cf. Sect. 6.1), the CPU implementation of our surface registration approach substantially outperforms the non-rigid ICP used by Schaerer et al. in terms of runtime performance (by two orders of magnitude). As our approach exhibits an inherently high degree of data parallelism, a GPU implementation [Bruh 12] might be considered in future work to achieve the real-time operation required for clinical applications. In addition to exploiting data parallelism, embedded hardware interpolation could be used for an efficient evaluation of the pre-computed SDF, e.g. using 3-D textures. Furthermore, independent of hardware acceleration techniques, a runtime speedup can be achieved by reducing the dimensionality of the optimization problem, i.e. considering a subset of the MLT sampling data Y and/or reducing the fineness of the reference shape G given by the density of the grid covering Ω.
⁶ Note that the reported accuracy depends on the RI modality and the applied pre-processing pipeline. In [Scha 12], details about this stage are not given.
CHAPTER 7

Photometry-driven Non-Rigid Surface Registration
7.1 Motivation and Related Work
7.2 Materials and Methods
7.3 Experiments and Results
7.4 Discussion and Conclusions
In the two previous chapters, we have presented methods for non-rigid surface
registration that rely on the sole geometry of the shapes to be aligned. In this
chapter, we investigate the potential of using complementary photometric information, available with modern RGB-D cameras, to guide the surface registration
process [Baue 12d]. Again, we consider the tracking of spatio-temporal 4-D surface
motion fields in IGRT describing the elastic patient torso deformations induced by
respiratory motion. In this context, we compare the proposed photometry-driven
non-rigid registration approach to a geometry-driven baseline. With regard to the
clinical workflow in IGRT, let us remark that the proposed photometric approach
requires RGB-D reference data – a direct alignment w.r.t. tomographic planning
data is unfeasible as those do not provide photometric information. This implies
the need for an initial pre-registration of RGB-D reference data onto the tomographic planning shape, potentially introducing additional uncertainties due to
error propagation, cf. Sect. 3.1.1.
The remainder of this chapter is organized as follows. In Sect. 7.1, we motivate our approach with a practical observation and review relevant literature in
the field. The proposed framework for photometry-driven non-rigid surface registration is introduced in Sect. 7.2. In Sect. 7.3, we present experimental results on
real data. In particular, we compare the photometry-driven approach to a conventional geometry-driven surface registration as previously introduced in Sect. 5.3.
Eventually, we discuss the results and draw conclusions in Sect. 7.4. Parts of this
chapter have been published in [Baue 12d] and are reprinted with kind permission
from Springer Science and Business Media, © Springer-Verlag Berlin Heidelberg
2012.
Figure 7.1: RGB-D measurements of a reclined patient at different respiration states (end-exhalation/end-inhalation). In subfigure (a), the measured range data r is color-coded; blue tones denote closeness, red tones remoteness to the RI camera. In subfigure (b), the additionally acquired photometric information f_rgb is mapped onto the 3-D surface X_r. The bottom row sketches our practical observation: local displacements estimated from the sole geometry (c, red arrows) differ from the motion of the external body surface w.r.t. photometric measurements, cf. the trajectories at salient landmarks (d, green arrows).
7.1 Motivation and Related Work
Related work on estimating the torso deformation induced by respiratory motion
typically relies on surface registration techniques that solely consider the 3-D topography of the external patient body surface, cf. Chapters 5 and 6 and the work by
Schaerer et al. [Scha 12]. In practice, we have experienced that the displacement
fields estimated from geometry-driven approaches do not necessarily match the
changes that can be observed in the photometric domain. For a concrete illustration of this effect on real data we refer to Fig. 7.1. Hence, in this chapter, we propose an alternative photometry-driven surface registration approach to recover
non-rigid patient torso deformations.
Years before affordable dynamic RGB-D technologies were introduced, Vedula
et al. proposed the concept of scene flow as an extension of optical flow in the
3-D domain [Vedu 99]. Basically, optical flow is the projection of scene flow onto
the sensor plane – or vice versa, back-projection of the optical flow onto the measured surface geometry yields the scene flow in 3-D space. Early work in the field
focused on the computation of scene flow from stereo image sequences [Zhan 00a,
Vedu 05, Hugu 07] or multi-camera setups [Carc 02, Pons 07] and was often restricted to sparse and inaccurate scene flow estimates due to the multi-view correspondence problem in poorly textured regions, recall Sect. 2.1.1. Spies et al. were
the first to study scene flow estimation on RI sequences with complementary photometric information, also known as range flow [Spie 02]. In analogy to the classical optical flow constraint [Horn 81, Luca 81], assuming brightness constancy, the
authors introduced an additional range flow constraint to model the motion in
the depth component. Along with the widespread availability of dense and real-time RGB-D sensors, the estimation of scene flow has increasingly gained interest.
Most commonly, along the lines of variational optical flow [Horn 81], scene flow
is estimated by optimizing a global energy function that consists of a data term
comprising photometric and/or range constraints and a regularization term enforcing smoothness of the displacement field [Spie 02, Gott 11, Herb 13, Leto 11].
Letouzey et al. combined the conventional optical flow constraint in the image domain with sparse 3-D correspondences that are established from 2-D photometric
features extracted and matched in the image domain. The regularization of the
3-D displacement field is performed over the surface, exploiting the associated
range measurements [Leto 11]. Gottfried et al. built on previous work by Spies
et al. [Spie 02] and Sun et al. [Sun 10] to estimate scene flow on Microsoft Kinect
data [Gott 11]. They propose the use of an adaptive regularization of the displacement based on strong regularization in valid and weak regularization in invalid
regions to preserve scene flow discontinuities. Recent work by Herbst et al. also
built on the basic concept by Spies et al. [Herb 13].
Beyond variational formulations, alternative approaches have been presented.
For instance, Quiroga et al. have proposed the computation of 3-D scene flow in
a Lucas-Kanade framework [Quir 12], directly estimating the trajectories of local
surface patches. Essentially, the authors consider scene flow computation as a
2-D tracking problem in both photometric intensity and geometric depth data. In
contrast, Hadfield and Bowden proposed a particle filter framework, modeling
RI data as a collection of moving particles in 3-D space [Hadf 11]. Since the problem at hand involves a comparably small degree of deformation, we have neglected related work on scene flow reconstruction for large displacements in this review [Thor 09].
In the context of this thesis and with regard to the practical observation depicted initially, we are particularly interested in comparing a photometry-driven
surface motion field estimate with a geometry-driven estimate. In this chapter, we confine ourselves to a classical two-stage approach along the lines of Vedula et al. [Vedu 05].
Based on an optical flow estimate in the 2-D photometric domain, the surface motion field is deduced directly using the associated 3-D point cloud data. To our
knowledge, this is the first medical application of the concept of scene flow.
7.2 Materials and Methods
In this section, we describe the proposed method for photometry-driven reconstruction of 3-D surface motion fields (Sect. 7.2.1). In addition, for comparison, we contrast it with a geometry-driven surface registration approach (Sect. 7.2.2), based on the scheme previously introduced in Sect. 5.3. For an illustrative comparison of both approaches we refer to Fig. 7.2.
In terms of notation, r : Ω → R denotes the geometric range and f_rgb : Ω → R³ the associated photometric color measurements in the 2-D sensor domain Ω. Exemplary data of a male torso under respiration are depicted in Fig. 7.1a,b.
Figure 7.2: Graphical illustration of the methods for (a) geometric and (b) photometric estimation of surface motion fields u_g, u_p. The geometric approach matches the given shapes based on their geometry in a direct manner. In contrast, the proposed photometric approach comprises a two-stage scheme, first estimating a 2-D motion field ũ_p in the photometric image domain and then deducing the 3-D surface motion field u_p from the associated range measurements.
Furthermore, RGB-D reference data are denoted (r_ref, X_r_ref, f_rgb,ref), and the instantaneous intra-fractional data at time t as (r_t, X_r_t, f_rgb,t). The photometry- and geometry-driven displacement fields, which describe the deformations φ_p : R³ → R³ and φ_g : R³ → R³ with φ_p(g)(X_r_ref) ≈ X_r_t, are denoted u_p : Ω → R³ and u_g : Ω → R³, respectively.
7.2.1 Photometry-Driven Surface Registration
The proposed algorithm to estimate dense 3-D surface motion fields from photometric information is based on a two-stage procedure: First, we perform a non-rigid registration in the photometric 2-D image domain. In general, any parametric
or non-parametric registration method can be applied for this purpose [Zito 03]. In
this work, we use an optical flow framework, see Sect. 7.3.1. Second, we transfer
the 2-D point correspondences in the photometric image domain to the 3-D point
positions of the associated range measurements. Hence, the proposed scheme
yields both:
• A 2-D displacement field ũp : Ω → R2 describing the deformation in the
photometric image domain,
• A 3-D displacement field up : Ω → R3 , deduced from the former, describing
the geometric deformation of the associated surface.
A graphical illustration of the approach is given in Fig. 7.2b. Note that without
loss of generality, any arbitrary non-rigid image registration method that yields a
dense displacement field can be applied in the first stage. Based on the estimated
2-D displacement field ũ_p, the 3-D surface motion field u_p between the reference shape X_r_ref and the torso shape X_r_t at respiration state t can be inferred via:

$$u_p(\zeta) = x_{r_t}\big(\zeta + \tilde{u}_p(\zeta)\big) - x_{r_{\mathrm{ref}}}(\zeta) = r_t\big(\zeta + \tilde{u}_p(\zeta)\big)\,\gamma\big(\zeta + \tilde{u}_p(\zeta)\big) - r_{\mathrm{ref}}(\zeta)\,\gamma(\zeta)\,, \tag{7.1}$$

using bilinear interpolation in the sensor domain Ω for evaluating r_t(ζ + ũ_p(ζ)).
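A minimal sketch of this lifting step, assuming the per-pixel back-projection directions γ are given as an array and using bilinear interpolation for the warped range and direction lookups; the helper name and the SciPy-based interpolation are illustrative.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def scene_flow_from_optical_flow(r_ref, r_t, flow, gamma):
    """Lift a 2-D displacement field to a 3-D surface motion field, cf. Eq. (7.1).

    r_ref, r_t : (H, W) range images at reference / current state
    flow       : (H, W, 2) 2-D displacement field in pixel units
    gamma      : (H, W, 3) per-pixel back-projection directions
    """
    h, w = r_ref.shape
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)
    uw, vw = u + flow[..., 0], v + flow[..., 1]  # zeta + u~_p(zeta)
    # Bilinear interpolation at the warped sensor positions
    r_t_warp = map_coordinates(r_t, [vw, uw], order=1, mode='nearest')
    gamma_warp = np.stack(
        [map_coordinates(gamma[..., k], [vw, uw], order=1, mode='nearest')
         for k in range(3)], axis=-1)
    x_ref = r_ref[..., None] * gamma            # x_ref(zeta)
    x_t = r_t_warp[..., None] * gamma_warp      # x_t(zeta + u~_p(zeta))
    return x_t - x_ref                          # 3-D motion field u_p
```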
7.2.2 Geometry-Driven Surface Registration
As motivated in Sect. 7.1, we intend to compare the proposed photometry-driven surface registration to a geometry-driven baseline. For this purpose, we build on the variational non-rigid surface registration framework introduced in Sect. 5.3. Here, we represent the surface X_r_t at time t by its corresponding signed distance function d_X_r_t. Then, using |d_X_r_t(φ_g(x))| as a pointwise closeness measure, we estimate the geometry-driven 3-D surface motion field u_g by minimizing:

$$E[u_g] = E_{\mathrm{match}}[u_g] + \kappa_g E_{\mathrm{reg}}[u_g] = \int_{\Omega} \big| d_{X_{r_t}}\big(\varphi_g(x_{r_{\mathrm{ref}}}(\zeta))\big) \big|^2 + \kappa_g\, \| D u_g(\zeta) \|_2^2 \; d\zeta\,, \tag{7.2}$$
where κg denotes the regularization weight. For numerical minimization, we considered a gradient descent scheme as described in Sect. 5.4.2. Recall that both
ug : Ω → R3 and up : Ω → R3 describe the 3-D torso deformation. What differs is
the driver of the surface motion estimation, namely, geometric information for ug
and complementary photometric information for up (Sect. 7.2.1).
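For illustration, a discrete evaluation of the energy in Eq. (7.2) on the sensor grid might look as follows. The forward-difference approximation of ‖Du_g‖² and the hypothetical callable `sdf_eval` (returning d_{X_{r_t}} at query points) are illustrative simplifications, not the discretization actually used in Sect. 5.4.2.

```python
import numpy as np

def energy_geometric(u_g, x_ref, sdf_eval, kappa_g, h):
    """Discrete counterpart of Eq. (7.2).

    u_g      : (H, W, 3) displacement field on the sensor domain
    x_ref    : (H, W, 3) reference surface points x_ref(zeta)
    sdf_eval : callable returning d_{X_{r_t}} for an (n, 3) point array
    h        : grid spacing used in the finite differences
    """
    warped = (x_ref + u_g).reshape(-1, 3)
    match = np.mean(sdf_eval(warped) ** 2)   # |d(phi_g(x_ref))|^2 term
    du_v = np.diff(u_g, axis=0) / h          # forward differences
    du_u = np.diff(u_g, axis=1) / h
    reg = np.mean(du_v ** 2) + np.mean(du_u ** 2)
    return match + kappa_g * reg
```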
7.3 Experiments and Results
In the experiments, we investigate the application of the proposed photometry-driven reconstruction of 3-D surface motion fields for the tracking of elastic torso deformations induced by respiration.
7.3.1 Materials and Methods
We have acquired RGB-D data with Microsoft Kinect from four healthy subjects. Being reclined on a treatment table, the subjects were asked to perform (1) normal breathing and (2) deep inhalation thoracic breathing. For both respiration types, we extracted RGB-D data for the states of end-exhalation and end-inhalation, respectively. The task to be solved was then to find the non-rigid surface motion field aligning the former to the latter. Prior to registration, range measurement data were pre-processed using edge-preserving denoising (Sect. 2.2.3). For estimating the non-rigid 2-D displacement field ũ_p with the proposed photometry-driven approach, we have applied the variational optical flow framework by Liu [Liu 09]. Essentially, it builds on the combined local-global method for optical flow computation by Bruhn et al. [Bruh 05] that combines the advantages of two classical algorithms: the variational approach by Horn and Schunck [Horn 81], providing dense flow fields, and the local least-squares technique of Lucas and Kanade [Luca 81], featuring robustness with respect to noise. The weighting parameters of the regularizers for the photometry-driven and the geometry-driven approach, ensuring smoothness of the estimated displacement fields, were empirically set to κ_p = 1.5 · 10⁻² and κ_g = 10⁻⁶.
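As a concrete illustration of the first stage, the following sketch computes a dense 2-D displacement field with OpenCV's Farnebäck method, a readily available stand-in for the variational framework by Liu used in this work (any dense method can be substituted, as noted in Sect. 7.2.1); the parameter values are generic defaults.

```python
import cv2

def dense_flow(f_ref, f_t):
    """Dense 2-D displacement field between two 8-bit BGR color frames."""
    g_ref = cv2.cvtColor(f_ref, cv2.COLOR_BGR2GRAY)
    g_t = cv2.cvtColor(f_t, cv2.COLOR_BGR2GRAY)
    # Arguments: pyr_scale, levels, winsize, iterations, poly_n,
    # poly_sigma, flags; returns an (H, W, 2) field (du, dv) per pixel.
    return cv2.calcOpticalFlowFarneback(g_ref, g_t, None,
                                        0.5, 3, 21, 5, 7, 1.5, 0)
```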
In our experimental setup using real data, the ground truth 3-D surface motion
field is unknown. Furthermore, the attachment of markers is unacceptable as it
might accidentally bias the optical flow computation. Hence, for quantitative evaluation, we applied the following scheme: First, we projected the 3-D displacement
field ug : Ω → R3 onto the sensor domain and applied the resulting displacement
ũg : Ω → R2 to the 2-D photometric data f rgb,ref on Ω acquired at the reference
respiration state of end-exhalation. We then compared the warped images to the
known photometric data f rgb,t at the respiration state of end-inhalation, over the
patient’s torso given by a binary mask B . In particular, as a scalar measure of the
initial and residual mismatch, we computed the root mean square photometric
distance of the initial and warped data w.r.t. the reference f rgb,t , respectively:
$$e_0 = \sqrt{ \frac{1}{|\mathcal{B}|} \sum_{\zeta \in \mathcal{B}} \left\| f_t(\zeta) - f_{\mathrm{ref}}(\zeta) \right\|_2^2 } \,, \tag{7.3}$$

$$e_{p(g)} = \sqrt{ \frac{1}{|\mathcal{B}|} \sum_{\zeta \in \mathcal{B}} \left\| f_t\big(\zeta + \tilde{u}_{p(g)}(\zeta)\big) - f_{\mathrm{ref}}(\zeta) \right\|_2^2 } \,, \tag{7.4}$$

where e_0 denotes the initial mismatch, and e_p, e_g the residual mismatch after photometry-driven and geometry-driven registration, respectively.
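A minimal sketch of this evaluation metric, assuming color images scaled to [0, 1] and bilinear interpolation for the warp; passing a zero displacement field yields e_0, cf. Eq. (7.3). The helper name is illustrative.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def rms_photometric_error(f_ref, f_t, flow, mask):
    """Root mean square photometric distance over the torso mask B.

    f_ref, f_t : (H, W, 3) color images scaled to [0, 1]
    flow       : (H, W, 2) displacement field (zeros for e_0)
    mask       : (H, W) boolean torso mask
    """
    h, w = mask.shape
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)
    uw, vw = u + flow[..., 0], v + flow[..., 1]
    f_t_warp = np.stack(
        [map_coordinates(f_t[..., c], [vw, uw], order=1, mode='nearest')
         for c in range(3)], axis=-1)
    diff2 = np.sum((f_t_warp - f_ref) ** 2, axis=-1)
    return np.sqrt(diff2[mask].mean())
```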
7.3.2 Results
Quantitative results of the residual mismatch for the proposed photometry-driven and the opposed geometry-driven registration approach are depicted in Fig. 7.3. Note that the measurements of the RGB channels of f_rgb were scaled to the range of [0, 1] here. For both normal breathing and deep inhalation, the photometric approach outperformed the geometric variant by (e_g − e_p)/e_g = 6.5% and 22.5%, respectively, on average over all subjects.
Qualitative results for the deep inhalation study are depicted in Fig. 7.4. Here, the position of the papilla, being a salient anatomical landmark in the photometric domain, was manually labeled for the respiration states of end-exhalation and end-inhalation. Furthermore, it was labeled in the warped images after geometry-driven and photometry-driven registration. Overlaying the target position at the state of end-inhalation reveals the initial mismatch with the reference state (end-exhalation). In addition, regarding the papilla position after photometry-driven and geometry-driven registration, it stresses the superior performance of the proposed photometry-driven approach in matching the target, resulting in a negligible residual photometric mismatch¹.

¹ Note that evaluating the estimated displacement vector at a photometrically salient landmark might differ from the results in less salient regions.
Figure 7.3: Quantitative comparison of photometry-driven and geometry-driven surface registration, showing the initial mismatch and the residual mismatch after geometric and photometric registration per subject. Given are results for four volunteers. For both normal breathing (a) and deep inhalation (b), the photometry-driven approach outperformed the geometry-driven alternative in all cases. As intuition suggests, the effect is more pronounced with deep inhalation. Also note the higher initial mismatch with deep inhalation compared to normal breathing.
By trend, our experiments indicated that the geometry-driven approach underestimated the motion pattern in superior-inferior (SI) direction for all four subjects.
The estimated 3-D surface motion fields aligning the respiration states of end-exhalation and end-inhalation are illustrated in Fig. 7.5, for both the photometry-driven and the geometry-driven surface registration approach. It can be observed that the photometry-driven surface motion field u_p is more pronounced in SI direction than the geometric variant u_g, particularly for the upper torso region. Again, this indicates an underestimation of the SI motion component by the geometry-driven registration.
7.4 Discussion and Conclusions
In this chapter, we have presented a method for reconstructing dense 3-D surface
motion fields over non-rigidly moving surfaces using RGB-D cameras. Opposed
to conventional registration approaches that typically rely on the sole surface geometry, the registration process is driven by photometric information. In an experimental study for the application in IGRT, we have investigated the performance
of the proposed photometry-driven method compared to a geometry-driven baseline. Both approaches are capable of providing dense surface motion fields for
respiratory motion management.
In experiments on real data from healthy volunteers, the proposed photometric method outperformed the geometry-driven surface registration by 6.5% and
22.5% for normal and deep thoracic breathing, respectively, evaluating the residual photometric mismatch.
Figure 7.4: Manually labeled data from the four subjects (rows) performing deep inhalation thoracic breathing. Shown are photometric data from the thorax region, converted to grayscale in this illustration for better visibility of the colored labels. Given are data for end-exhalation (a), end-inhalation (d) and the results of geometry-driven (b) and photometry-driven (c) registration. For each subject, the position of the white crosshairs denotes the location of the papilla in the end-inhalation stage. The colored crosshairs depict the position of the papilla prior to registration (blue) and after geometry-driven (red) and photometry-driven (green) registration.
Note that the distance measure quantifies the photo-consistency of the warped images to the target and thus might potentially introduce a bias favoring the result of the photometry-driven surface registration approach. However, comparing the mismatch in the geometric domain is impracticable here, as the photometry-driven approach yields a perfect shape match by design, cf. Eq. (7.1).
In addition, we observed an underestimation of the surface motion in SI direction for the geometry-driven registration. This coincides with the results of
Schaerer et al. [Scha 12] comparing a geometry-driven registration to the local trajectories of skin markers (thus reflecting skin motion) and stating a reduced accuracy of the registration algorithm in recovering SI surface motion. Clinical studies have shown that the SI direction is the prominent direction of human breathing [Keal 06]. Hence, at first glance, one might interpret the results as an indication that the proposed photometry-driven registration variant is potentially a
better choice for estimating surface motion fields [Baue 12d]. Here, let us point out
a more differentiated view.
Figure 7.5: Comparison of the 3-D surface motion fields u_p (upper row) and u_g (lower row), for two subjects (a, b). The color of the displacement vectors encodes their magnitude in SI direction, according to the color bar on the right. Even though we depict results for the case of normal breathing, the underestimation effect of the geometric approach in SI direction is clearly visible, cf. Fig. 7.4.
In particular, we assume an interplay of three motion types being involved in the considered scenario. From interior to exterior, these
are (1) soft tissue and organ motion in the abdominal and thoracic cavities due to
contraction and relaxation of the diaphragm and ribcage muscles, (2) motion of the
ribcage induced by bio-mechanical coupling to (1), and (3) skin motion induced by
(2). While the coupling between (1) and (2) has been investigated extensively in
physiology literature [West 08], a differentiation between (2) and (3) has not been
considered yet. More specifically, while the external torso geometry and therefrom deduced surface motion fields essentially reflect the ribcage movement, we
assume that there is an elastic stretching component involved between skin and
ribcage motion that might lead to the differences between photometry-driven and
geometry-driven surface motion estimates2 . Another influencing factor that might
explain the differences of the estimated motion fields in our experimental evaluation is the choice of the regularization type and weighting.
In conclusion, let us stress that both motion fields up and ug are meaningful
and valuable for application in respiratory motion management. However, depending on the particular application, it must be investigated which motion fields
better correlate with the internal target motion. Different physical signals may
have stronger or weaker relationships with the respiratory motion [McCl 13]. This
could be validated in a setup that acquires 4-D CT/MR data simultaneously with
RGB-D data, but being beyond the scope of this thesis.
² This implies that skin markers might be an inappropriate choice to evaluate geometry-driven surface registration methods, as performed by Schaerer et al. [Scha 12], for instance.
CHAPTER 8

Outlook
The concepts proposed in this thesis open a number of opportunities for further
research. In this chapter, we discuss directions and perspectives as well as challenges toward clinical translation.
Directions for Rigid Surface Registration. Regarding the feature-based framework for rigid surface registration that is applicable in the presence of large misalignments (Chap. 3), the following aspects should be considered. With the current approach, shape descriptors are computed for every single surface point in the template and reference shape, respectively. Instead, introducing a preceding feature detection stage that identifies salient keypoints with a low-level algorithm would help reduce the computational effort for descriptor extraction. In addition, it would narrow down the search space in the subsequent feature matching stage. The matching stage could be accelerated using the RBC data structure and search scheme (Sect. 4.3.2) as opposed to the brute-force nearest neighbor search used in the current approach. Furthermore, benefits are expected from using a multi-scale search scheme. Gaussian filtering followed by subsampling in the underlying 2-D range image domain could be applied to generate multi-scale surface data [Bona 11], as sketched below. Then, from coarse to fine levels, the found correspondences would be propagated to initialize the correspondence search at a finer scale, improving both robustness and runtime performance. If RGB-D data are available for both the template and the reference shape, one could further consider the application of photo-geometric shape descriptors [Baue 11b]. These encode both the photometric texture and geometric shape characteristics in a common representation. This might be particularly helpful to establish correspondences in regions with non-salient topography.
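A minimal sketch of such a multi-scale pyramid, assuming invalid range measurements have been handled beforehand; the function name and parameters are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def range_image_pyramid(r, levels=3, sigma=1.0):
    """Multi-scale range data via Gaussian filtering and subsampling."""
    pyramid = [np.asarray(r, dtype=np.float64)]
    for _ in range(levels - 1):
        smoothed = gaussian_filter(pyramid[-1], sigma)
        pyramid.append(smoothed[::2, ::2])  # keep every other row/column
    return pyramid  # pyramid[0] finest, pyramid[-1] coarsest
```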
The benefits of incorporating complementary photometric information for rigid surface registration were also demonstrated in Chap. 4. We proposed a photo-geometric ICP framework using the concept of RBC for efficient NN search. For
estimating the aligning transformation, we built on the classical point-to-point error minimization that can be solved in closed form. However, from a statistical
point of view, this implicitly assumes that the points are observed with zero mean
and isotropic Gaussian noise [Bala 09]. In a multi-modal surface registration scenario this will generally not be the case, e.g., because of differences in mesh resolution and topology. Also, 3-D measurement errors may be highly anisotropic as
RI sensors typically have a much higher localization uncertainty in the viewing direction of the camera. Hence, future work should consider a generalized anisotropic ICP along the lines of Maier-Hein et al. [Maie 12], although this requires an iterative optimization scheme [Bala 09]. Another improvement in terms of accuracy is expected when minimizing a point-to-plane distance metric as opposed to a point-to-point metric, as it allows the shapes to slide over each other and thus avoids snap-to-grid effects [Chen 92]. Again, however, solving the corresponding optimization problem then involves an iterative scheme. In analogy to the feature-based approach,
applying a multi-scale ICP scheme can improve both robustness and convergence
behavior. Also, an automatic scene-dependent adaptation of the photo-geometric weight
by low-level analysis of the acquired RGB-D data might be a promising direction.
For the reconstruction scenario considered in Chap. 4, where an RGB-D data stream is fused on-the-fly to establish a global shape model, we recommend the transition from frame-to-frame registration (considering the current frame and the previous one) to frame-to-model registration (considering the shape of the instantaneous global model as reference). This has been shown to reduce drift effects, e.g. when using a TSDF representation [Izad 11, Newc 11], cf. Sects. 4.4.2, 4.4.3. Note that for the proposed photo-geometric registration approach such a frame-to-model reconstruction scheme implies the need for a 6-D global model incorporating both photometric and geometric information [Stei 11, Whel 12].
Part I of this thesis has addressed the task of rigid shape alignment. In the experiments (Sects. 3.5.1, 3.5.2), the feature-based framework has proved to meet the requirements for multi-modal application on RI/CT data. Although the approach is robust w.r.t. noise and topological differences, it was designed for rigid shape comparison. Thus, in general, it is sensitive to non-rigid deformations. In clinical practice, the assumption of rigidity often does not hold and must be relaxed. Hence, tackling elastic deformations is a major direction for future research in the field. In principle, a feature-based approach is capable of coping with non-rigid deformations if the rigidity assumption approximately holds within the local neighborhood that is encoded by the shape descriptor used in the correspondence search. In this case, instead of estimating a global rigid transformation, a dense non-rigid displacement field may be derived from sparse correspondences using interpolation techniques [Amid 02], as sketched below. In this context, future work should benchmark the resilience of the proposed shape descriptors w.r.t. small-scale deformations. Regarding the proposed photo-geometric ICP framework, a non-rigid extension along the lines of Amberg et al. may be a promising option [Ambe 07].
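As an illustration of the interpolation step, the following sketch derives a dense displacement field from sparse matched feature locations using thin-plate-spline RBF interpolation as one possible technique; the helper name and the smoothing value are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def dense_field_from_sparse_matches(src_pts, dst_pts, query_pts):
    """Interpolate a dense non-rigid displacement field.

    src_pts, dst_pts : (n, 3) matched feature locations
    query_pts        : (m, 3) surface points to be displaced
    """
    displacements = dst_pts - src_pts
    rbf = RBFInterpolator(src_pts, displacements,
                          kernel='thin_plate_spline', smoothing=1e-3)
    return rbf(query_pts)  # (m, 3) dense displacement estimates
```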
Directions for Non-Rigid Surface Registration. Part II of this thesis was concerned with the estimation of non-rigid torso deformations with prospective applications in respiratory motion management in IGRT.
In Chap. 5 we have presented a variational formulation for joint RI denoising and its registration to an accurate reference shape. Here, for a first feasibility study, we have confined ourselves to a simple denoising formulation that combines a least-squares-type fidelity term with a TV-like regularization. In particular, we have chosen a
pseudo Huber norm, enforcing a strong smoothing in flat regions to avoid staircasing while preserving discontinuities that occur at the torso boundaries. Future work might consider more advanced denoising models such as the variational
formulation by Lenzen et al. on adaptive anisotropic total variation [Lenz 11] for
even better preserving these boundaries. In the current approach, we have initialized the displacement field that is to be reconstructed with the estimate from the
previous phase. This can be considered a first step toward temporally coherent
deformation tracking. A promising extension to the proposed variational formulation might be to incorporate a dedicated additional temporal regularization term for
the displacement field, ensuring smoothness and consistency of the motion trajectories over time, cf. [Volz 11, Garg 13]. Further experimental studies should investigate the performance of the joint approach compared to subsequent denoising
and registration w.r.t. different RI modalities beyond ToF imaging, e.g. structured
light. It might be beneficial to incorporate modality-specific noise characteristics
into the denoising term.
In Chap. 6, we have proposed a novel method for estimating dense surface
motion fields from sparse RI measurement data. The experimental study has
shown that the proposed formulation yields highly accurate dense surface reconstructions. This is a promising result, in particular when considering the fact that
the evaluation was performed on data with the noise characteristics of raw, unfiltered MLT measurements. We expect another gain in accuracy by pre-processing the
MLT measurements using customized denoising filters. In analogy to the concept
in Chap. 5, this could be implemented by incorporating an additional term into
the objective function, jointly denoising the MLT data while performing a sparse-to-dense surface registration. The objective function could be further refined with
an adaptive regularization of the dense displacement field. In particular, we propose to
adjust the regularization weight for a local displacement vector according to its
spatial position w.r.t. the MLT measurement grid. The idea is to enforce strong
regularization in regions with low support due to poor coverage by the MLT sensor – thus increasing the local contribution of prior shape knowledge. Vice versa,
in regions where the interventional torso shape is known from nearby MLT measurements, e.g. close to the position where two projected laser lines intersect, the
regularization could be relaxed.
In Chap. 7, we have presented a photometry-driven approach to surface deformation tracking. A straightforward extension might combine the two-step approach in a joint photo-geometric registration formulation. Instead of first estimating a 2-D displacement field in the photometric image domain and then deriving
the associated 3-D surface motion field from the former, the objective function
could be reformulated such that it directly estimates the 3-D motion field. For
instance, one might combine an optical flow constraint in the 2-D image domain
with a regularization of the corresponding 3-D displacement field over the surface [Leto 11]. Our practical experience further suggests exploiting salient anatomical
landmarks that are present in the photometric domain (e.g. papilla for torso applications) to improve the accuracy of the estimated displacement field. In particular,
one might add a term to the objective function that enforces closeness of corresponding 3-D points that were established from 2-D photometric features. An
alternative strategy might treat the matched landmarks as hard constraints in the
optimization process along the lines of Daum [Daum 11]. Thus, these landmarks
132
Outlook
would serve as anchor points in the motion field reconstruction process and constrain the displacement field in addition to the classical optical flow constraint.
Directions for RI-based Guidance in RT. Regarding the medical applications addressed in this thesis, the focus was on RI-guided RT. In particular, we have addressed marker-less patient positioning (Chap. 3) and respiratory motion tracking (Chaps. 5-7). Below, we give an outlook on future research from an application point of view, confining the discussion to the specific field of RT.
First and foremost, the proper integration of the individual components into the existing clinical workflow in IGRT must be established. This applies both to automatic coarse setup, superseding the conventional manual and marker-based initial alignment, and to marker-less, non-radiographic respiratory motion management for continuous target tracking and treatment as opposed to gated RT. It also implies the need for clinical studies investigating the robustness, reliability, and accuracy of the proposed methods on patient data acquired in a clinical environment. In particular, such studies must investigate the performance of the complete system combining the individual rigid and non-rigid registration modules along the workflow. For instance, the accuracy of correlating external surface motion fields to the internal target motion learned from 4-D tomographic planning data depends strongly on the accuracy of patient setup w.r.t. this planning data. Consequently, the precise alignment of the interventionally acquired patient shape to planning data is a fundamental prerequisite to minimize error propagation in downstream tasks. Note that these studies must be carefully designed with respect to potential bias, e.g. when using optical markers as a reference for photometry-driven registration approaches.
Another essential prerequisite for clinical studies is the availability of certified RI sensors that provide 3-D measurement data in a reliable and accurate manner. While the sensors used in this thesis (PMD CamCube, Microsoft Kinect, MLT sensor) are not approved for clinical application, some IGRT providers have such clearance for RI sensor hardware based on stereo vision (VisionRT Ltd., London, UK) or structured light (Catalyst, C-RAD AB, Uppsala, Sweden). Clinical studies should investigate the system performance w.r.t. the camera setup, e.g. comparing a single-camera solution to a multi-camera setup. The latter is expected to provide improved coverage in dynamic environments that imply temporary or partial patient occlusion due to clinical staff or hardware. With regard to a multi-camera RI acquisition setup, a smart fusion of range data from multiple devices was proposed by Kainz et al. [Kain 12] and Wasza et al. [Wasz 13]. Associated practical issues that are not covered by this thesis but must be solved for clinical acceptance include robust, accurate, and effortless system calibration and temporal synchronization w.r.t. the treatment system. Furthermore, real-time implementations of the approaches proposed in this thesis may be necessary to fulfill the demands of clinical practice.
Concerning the integration of the proposed feature-based framework for automatic coarse patient setup, a gated positioning approach could be applied. A 1-D respiration curve extracted from RI data, used as an indicator of the current respiration state [Scha 08], can trigger the acquisition of surface data at the particular respiration state that was chosen for the acquisition of prior planning data. Regarding the application of the reconstructed dense non-rigid displacement fields in respiratory motion management, several approaches should be investigated. For instance, the 4-D surface motion fields might be analyzed to determine the current respiration type based on a previously learned patient-specific respiration model, e.g. allowing for an automatic separation between thoracic and abdominal breathing [Wasz 12a]. Furthermore, such learned motion models [McCl 13] could be used for motion-compensated patient positioning [Wasz 12b, Wasz 13].
The most obvious limitation of RI-based guidance systems is that the observation is restricted to the external body surface, which might even be covered by sterile drapes. Hence, the movements of internal structures remain invisible unless the external measurements are combined with planning data from 3-D or 4-D tomographic imaging. For application in tumor motion compensation, the essential question is: How can the motion of internal structures be inferred from the interventionally measured external torso deformations? There are two ways to approach this challenge. First, one might consider a non-rigid 3-D extension of the external body deformation onto volumetric planning data. However, we hypothesize that a reliable extension would require a tissue-specific modeling of elasticity and deformation behavior under physical stress, e.g. using bio-mechanically or physiologically inspired FE models [Robe 07, Eom 09]. This would further require a segmentation of the planning data w.r.t. different organs and tissue types. A promising alternative strategy involves techniques from machine learning to establish an external-internal motion correlation model from 4-D CT/MR planning data, along the lines of Schaerer et al. [Scha 12] and McClelland et al. [McCl 13], also recall Sect. 5.1. However, this necessitates the acquisition of 4-D tomographic planning data. While 4-D CT involves additional radiation exposure to the patient, we expect 4-D MR planning to boost the prospective clinical acceptance of this approach [Miqu 13]. Nonetheless, one open question in the context of external-internal correlation remains: Which type of surface motion field correlates best with the associated motion of internal structures? In practice, driven by the individual matching and regularization terms and a customized incorporation of prior knowledge, the methods proposed in Chapters 5-7 and presented in the literature [Scha 12] will yield different surface motion fields describing the same physical torso deformation. Hence, clinical studies must investigate which approach achieves an optimal correlation. Potentially, advanced regularization schemes that incorporate prior knowledge about the biomechanics of human respiration might be required to address this issue.
Translation to Different Clinical Applications. The methods proposed in this thesis are not restricted to the medical applications discussed in the individual chapters. Indeed, rigid and non-rigid surface registration of RI measurements to data acquired with conventional medical imaging modalities is an essential prerequisite for a wide range of clinical applications.

The photo-geometric rigid registration framework (Chap. 4) can be applied for 3-D reconstruction of various anatomical shapes that can be modeled as approximately rigid. Apart from the applications considered in this thesis, we expect great potential for the reconstruction of tubular shaped objects with an inherently low degree of elasticity, such as the esophagus, the trachea, or the bronchial tree in bronchoscopy, which are highly ambiguous for geometry-driven shape reconstruction methods. Future research may focus on fusing RGB-D data acquired during camera insertion and retraction, or on incorporating prior shape knowledge from planning data into the interventional shape reconstruction process. An on-the-fly registration of endoscopic RGB-D data with a reference shape extracted from tomographic planning data would further allow for augmented reality navigation and guidance during the procedure. Such hybrid systems might further help reduce reconstruction drift by registration to a global shape model, and are expected to be more robust and accurate than previous approaches based on conventional 2-D endoscopic video data [Rai 06, Higg 08].
The non-rigid surface registration approaches introduced in Part II of this thesis can be directly applied to manifold clinical applications beyond RT. For instance, dense surface motion tracking is of particular interest for improved navigation in image-guided interventions such as computer-aided open hepatic surgery [Mark 10, Oliv 11] or endoscopy-guided minimally invasive procedures. Tracking external organ deformations during IGLS might eventually allow for dynamic augmented reality with pre-operative planning data, such as a real-time update of the deforming internal hepatic vessel tree during tissue manipulation.

Dense 4-D surface motion fields could also help reduce motion artifacts in tomographic reconstruction [Baue 13a]. Gianoli et al. proposed the use of marker-based surface tracking to extract a multi-dimensional respiration surrogate for reducing artifacts in retrospective 4-D CT image sorting [Gian 11]. Their experiments revealed that using multiple surrogates reduced uncertainties in breathing phase identification compared to conventional methods based on a one-dimensional surrogate. In addition, RI-based body surface tracking is of particular interest for motion compensation in nuclear imaging such as PET and SPECT [Bett 13]. Based on previous work on motion compensation in PET/SPECT using marker-based tracking [Alno 10, Bruy 05, McNa 09], the potential application of dense and real-time RI has lately been attracting interest in the field [Oles 10, Noon 12].
Chapter 9: Summary
The advent of dense and dynamic RI technologies is expected to accelerate the future demand for surface registration techniques. This thesis has addressed promising medical applications that require a mono-modal or multi-modal alignment of RI data. Depending on the particular task, the proposed methods target both rigid and non-rigid registration scenarios.

In Chap. 2, we outlined the measurement principles of different real-time capable RI modalities and detailed the technologies that were applied in this thesis (ToF, SL, MLT), with a thorough discussion of modality-specific strengths and limitations. We introduced our development platform for range image processing (RITK), the integrated RI simulation environment used for quantitative evaluation throughout this work, and our data enhancement pipeline. In addition, we summarized recent developments and promising fields of application of modern RI technologies in health care. Finally, we reviewed the state of the art in the field of surface registration and shape correspondence, with a particular focus on medical applications.
Rigid Surface Registration. The rigid alignment of shape data is a common challenge in many medical applications. The availability of modern RI technologies that enable an intra-interventional acquisition of dense spatio-temporal surface data has accelerated the demand for a robust solution to this task, holding potential to improve existing clinical workflows and to create new and innovative ones. In Chap. 3, we addressed two particular examples. First, we introduced a novel marker-less solution for automatic initial patient setup in RT. It is based on a direct multi-modal alignment of intra-fractional RI data of the patient's external body surface to tomographic planning data. Second, we proposed the application of the technique to IGLS, where the alignment of the target organ to pre-operative reference data based on intra-operative RI holds great potential to augment navigation. Conventionally, in clinical practice, both tasks are performed manually and rely on markers. To overcome this, we have developed a feature-based rigid surface registration framework. More specifically, to meet the particular requirements of multi-modal shape alignment, we have introduced 3-D shape descriptors that are invariant to mesh density and organization, and resilient to inter-modality deviations in surface geometry. By design, the method can handle gross initial misalignments and cope with partial matching. For initial patient setup in RT, the proposed approach yielded an average angular positioning error of 1.5±1.3° and an average translational positioning error of 12.9±6.6 mm, at a 97.5% success rate, for aligning Microsoft Kinect data to a reference shape extracted from CT planning data. For organ registration in IGLS, the average target registration error was 3.8±1.1 mm on porcine liver data in a ToF/CT setup, stressing the potential of the proposed RI-based solution to supersede manual coarse alignment. Having successfully applied the proposed framework to different RI modalities and biological materials further indicates the generalization capability of the approach.
Along with the growing interest in using RI for medical applications, we have observed increasing efforts to miniaturize RI devices toward application in endoscopy. In Chap. 4, we addressed two exemplary fields that will benefit from the availability of 3-D endoscopes. In laparoscopy, the intra-operative registration of RI data provides the opportunity to reconstruct the geometric shape of the operation situs. This provides the surgeon with an extended view of the target and knowledge about the surrounding anatomy. Aligning the interventional situs geometry to pre-operative planning data further enables augmented reality guidance. In colonoscopy, registration of intra-procedural RI data will allow the construction of metric 3-D shape models that could assist gastroenterologists in quantitative diagnosis and pre-operative planning. An alignment to pre-interventional virtual colonoscopy data would further improve navigation. To address such endoscopic shape reconstruction scenarios, we proposed an ICP-based rigid registration framework, assuming rigidity as an acceptable approximation for mapping scenarios where accuracy requirements are less strict and motion is moderate. The approach incorporates geometric shape and complementary photometric appearance information in a joint manner to guide the registration process. To meet real-time constraints (≥ 20 Hz), our ICP variant builds on a novel acceleration structure for efficient 6-D NN queries. In particular, we optimized the RBC search scheme in terms of performance for low-dimensional data, trading off accuracy against runtime and yielding ICP runtimes of less than 20 ms on an off-the-shelf GPU. In a study on synthetic RGB-D data, we found that incorporating photometric appearance as a complementary cue substantially outperformed a conventional geometry-driven ICP. This makes the approach of particular interest for RGB-D cameras that provide low-SNR range measurements but additional high-grade photometric data. For operation situs reconstruction in laparoscopy, the proposed photo-geometric variant reduced the drift by a factor of 12.9 (translation) and 5.9 (rotation) compared to a geometry-driven ICP. For colon model construction, the approach yielded comparable factors.
Non-Rigid Surface Registration. Dynamic RI technologies that enable real-time 3-D shape acquisition are of particular interest for medical applications, providing means to capture non-rigid shape deformations in a marker-less manner. In the second part of this thesis, we addressed applications that will benefit from the reconstruction of non-rigid surface motion fields from spatio-temporal RI data. Even though we focused on the example of respiratory motion tracking, the proposed methods can be exploited for a broad range of medical applications. In IGRT, surface motion fields can be used as a high-dimensional respiration surrogate for gating, to drive external-internal motion correlation models for respiration-synchronized treatment, and for motion-compensated patient positioning. Furthermore, beyond RT, they hold great potential for motion compensation in image-guided interventions and tomographic reconstruction.
Altogether, we have presented three approaches to estimate surface motion fields that are tailored w.r.t. the individual strengths and limitations of three distinct RI technologies. In Chap. 5, we have proposed a novel variational framework that simultaneously solves the denoising of low-SNR RI data and its registration to an accurate reference shape extracted from tomographic planning data. In the experiments, we have shown that solving these two intertwined problems of denoising and registration in a simultaneous manner is superior to a consecutive approach where the surface registration is performed after prior denoising of RI measurements. In a quantitative study on real CT and synthetic ToF data, we found that the joint formulation improved the quality of the denoising and the registration process by factors of 2.1 and 1.9, respectively. An additional study on real ToF data further revealed that the joint model can compensate for surface artifacts that result from systematic ToF measurement errors. In conclusion, the results indicate that incorporating prior shape knowledge into the denoising process allows for a robust estimation of dense surface motion fields with RI modalities that exhibit a low SNR. The proposed method enables both an improved intra-fractional full torso surface acquisition for patient monitoring and the tracking of non-rigid torso deformations.
Instead of overcoming the low SNR of available dense RI sensors, we also investigated the application of a novel MLT RI technology that acquires sparse but highly accurate 3-D position measurements in real-time. In combination with the novel variational sparse-to-dense registration approach introduced in Chap. 6, both the patient's dense instantaneous external body surface and the non-rigid surface motion field describing its spatio-temporal deformation can be reconstructed jointly from sparse sampling data and patient-specific prior shape knowledge. The performance of the proposed method was evaluated on synthetic, realistic, and real MLT data. In a comprehensive study on 256 datasets from 16 subjects with an average initial surface mismatch of 5.66 mm, the mean residual registration error on realistic MLT data was 0.23 mm w.r.t. ground truth. The 95th percentile of the local residual mesh-to-mesh distance after registration did not exceed 1.17 mm for any subject, indicating that the proposed method can reliably recover the dense displacement field even in the presence of strong respiration. We further found that a proper initialization and an improved mathematical formulation reduced the runtime by 19.2% and 48.2%, respectively. With regard to future advances in sensor technology, simulations indicated a considerable potential gain in accuracy when doubling the MLT laser grid sampling density. At a runtime of 2.3 s per frame, the developed CPU implementation substantially outperformed related work.
The methods proposed in Chapters 5 and 6 rely solely on the geometry of the shapes to be aligned. In Chap. 7, we presented a method for non-rigid surface registration that exploits the complementary photometric information available with modern RGB-D cameras. The underlying idea is that photometric information can compensate for regions with non-salient topographies, whereas geometric information can guide the motion estimation in faintly textured regions. The proposed framework estimates the non-rigid transformation in the photometric 2-D image domain and then deduces the surface motion field from the former and the associated 3-D position measurements. In an experimental study on real data from Microsoft Kinect, we have investigated the performance of this photometry-driven method compared to a geometry-driven baseline. Indeed, the photometry-driven approach outperformed the latter by 6.5% and 22.5% in terms of residual photometric mismatch for normal and deep thoracic breathing, respectively. The results indicate that the approach is of particular interest for RI cameras that provide low-SNR range measurements but acquire additional high-grade photometric information.
In Chap. 8, we summarized perspectives to improve the methods proposed in this thesis and discussed potential challenges toward clinical translation.

In summary, this thesis made a number of original contributions, both on a theoretical and on a practical level, to the emerging research field of surface registration for RI-based applications in medicine. It provides novel techniques for rigid and non-rigid surface registration that are applicable to a broad range of clinical procedures. Based on the methods developed, the measurements conducted, and the results obtained, an optimized treatment under RI guidance could be available in the near future and permit more accurate, safe, and efficient interventions.
Appendix A

A.1 Projection Geometry
In this section, we present a brief recapitulation of projection geometry [Hart 04,
Faug 04]. In particular, we describe the concept of perspective projection and elaborate on how the inversion of this projection is employed to calculate 3-D positions
from the pixel-wise scalar distance measurements of range imaging sensors.
A.1.1 Perspective Projection
A camera maps a 3-D position in the scene space, denoted in Cartesian (xw ) or
homogeneous world coordinates (x̃w ) w.r.t. an arbitrarily defined world coordinate
system, or in camera coordinates w.r.t. the camera coordinate system (xcc , x̃cc ), onto
a position in the 2-D sensor domain, denoted picture coordinate (xp , x̃p ):
$$\mathbf{x}_w = (x_w, y_w, z_w)^\top \in \mathbb{R}^3, \qquad \tilde{\mathbf{x}}_w = (\tilde{x}_w, \tilde{y}_w, \tilde{z}_w, \tilde{w}_w)^\top \in \mathbb{R}^4, \tag{A.1}$$
$$\mathbf{x}_{cc} = (x_{cc}, y_{cc}, z_{cc})^\top \in \mathbb{R}^3, \qquad \tilde{\mathbf{x}}_{cc} = (\tilde{x}_{cc}, \tilde{y}_{cc}, \tilde{z}_{cc}, \tilde{w}_{cc})^\top \in \mathbb{R}^4, \tag{A.2}$$
$$\mathbf{x}_p = (x_p, y_p)^\top \in \mathbb{R}^2, \qquad \tilde{\mathbf{x}}_p = (\tilde{x}_p, \tilde{y}_p, \tilde{z}_p)^\top \in \mathbb{R}^3. \tag{A.3}$$
Assuming a pinhole camera model, the position of the projection onto the 2-D image xp is given by the intersection of the line between the 3-D position in the camera coordinate system xcc and the camera’s optical center with the sensor plane.
This can be expressed in homogeneous coordinates and matrix notation:
$$\tilde{\mathbf{x}}_p = \mathbf{K} \begin{pmatrix} \mathbf{I} & \mathbf{0} \end{pmatrix} \tilde{\mathbf{x}}_{cc}, \tag{A.4}$$

where I ∈ R3×3 denotes the identity matrix and K ∈ R3×3 the camera calibration matrix:

$$\mathbf{K} = \begin{pmatrix} \alpha_x & 0 & c_x \\ 0 & \alpha_y & c_y \\ 0 & 0 & 1 \end{pmatrix}. \tag{A.5}$$
Here, α_x, α_y denote the focal lengths in [px] and c_x, c_y the sensor's principal point¹ in [px]. Generalizing the projection formulation from camera coordinates to world coordinates involves a rotation matrix R ∈ SO(3) and a translation vector t ∈ R³, describing the relative position and orientation between the two coordinate systems:

$$\tilde{\mathbf{x}}_p = \mathbf{K} \begin{pmatrix} \mathbf{I} & \mathbf{0} \end{pmatrix} \begin{pmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{pmatrix} \tilde{\mathbf{x}}_w. \tag{A.6}$$

¹ Note that we use an approximation of the camera calibration matrix here, neglecting the skew parameter for reasons of simplicity.
For a comprehensive introduction to the principles of projection geometry and camera calibration, we refer to the books by Hartley and Zisserman [Hart 04] and Faugeras et al. [Faug 04].
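To make the projection pipeline of Eqs. (A.4)-(A.6) concrete, the following minimal Python/NumPy sketch chains the extrinsic and intrinsic mappings; the calibration values are illustrative placeholders of our own choosing, not parameters of any sensor used in this thesis.

```python
import numpy as np

def project(x_w, K, R, t):
    """Map a 3-D world point to 2-D picture coordinates (pinhole model)."""
    x_cc = R @ x_w + t                   # world -> camera coordinates, cf. Eq. (A.6)
    x_p_tilde = K @ x_cc                 # homogeneous picture coordinates, Eq. (A.4)
    return x_p_tilde[:2] / x_p_tilde[2]  # dehomogenize to pixel coordinates

# Illustrative intrinsics (focal lengths and principal point in [px]),
# skew neglected as in Eq. (A.5).
K = np.array([[525.0,   0.0, 319.5],
              [  0.0, 525.0, 239.5],
              [  0.0,   0.0,   1.0]])
R, t = np.eye(3), np.zeros(3)            # world and camera frames coincide here

print(project(np.array([0.1, -0.05, 1.0]), K, R, t))  # -> approx. [372.0, 213.25]
```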
A.1.2 3-D Point Cloud Reconstruction
Now, based on Eq. (A.4), let us derive the reconstruction of 3-D positions from
scalar range measurements by inverting the projection process. We treat RI sensors
that measure orthogonal distances (e.g. Microsoft Kinect) first. Second, we derive
the reconstruction for sensors that provide radial distances (e.g. ToF sensors).
Orthogonal Measurements. For each position x_p on the image plane, the measured orthogonal distance r⊥(x_p) = z_cc (orthogonal w.r.t. the sensor plane) describes a 3-D position x_cc = x_{r⊥}(x_p) in the camera coordinate system. In particular, rearranging Eq. (A.4) yields:

$$x_{cc} = \alpha_x^{-1}(x_p - c_x)\, z_{cc}, \tag{A.7}$$
$$y_{cc} = \alpha_y^{-1}(y_p - c_y)\, z_{cc}. \tag{A.8}$$
Hence, the 3-D position in the camera coordinate system x_cc is given as:

$$\mathbf{x}_{cc} = \mathbf{x}_{r_\perp}(\mathbf{x}_p) = r_\perp(\mathbf{x}_p)\, \mathbf{p}_\perp(\mathbf{x}_p), \tag{A.9}$$

with p⊥ : R² → R³:

$$\mathbf{p}_\perp(\mathbf{x}_p) = \begin{pmatrix} \alpha_x^{-1}(x_p - c_x) \\ \alpha_y^{-1}(y_p - c_y) \\ 1 \end{pmatrix}. \tag{A.10}$$
Note that the intrinsic camera parameters (α_x, α_y, c_x, c_y) are determined by camera calibration [Zhan 00b].
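As a sketch of how Eqs. (A.7)-(A.10) translate into a vectorized computation, one might back-project an entire orthogonal range image at once as follows; the function and variable names are our own, hypothetical choices.

```python
import numpy as np

def backproject_orthogonal(r_perp, alpha_x, alpha_y, c_x, c_y):
    """3-D point cloud from an orthogonal range image, Eqs. (A.7)-(A.10).
    r_perp holds z_cc per pixel; returns an (h, w, 3) array of x_cc."""
    h, w = r_perp.shape
    xp, yp = np.meshgrid(np.arange(w), np.arange(h))
    x_cc = (xp - c_x) / alpha_x * r_perp    # Eq. (A.7)
    y_cc = (yp - c_y) / alpha_y * r_perp    # Eq. (A.8)
    return np.stack((x_cc, y_cc, r_perp), axis=-1)
```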
Radial Measurements. For RI sensors that measure radial distances r∢(x_p), the 3-D position in the camera coordinate system x_cc = x_{r∢}(x_p) is given as:

$$\mathbf{x}_{cc} = \mathbf{x}_{r_\measuredangle}(\mathbf{x}_p) = r_\measuredangle(\mathbf{x}_p)\, \mathbf{p}_\measuredangle(\mathbf{x}_p), \tag{A.11}$$

where p∢ : R² → S² gives the projection ray normalized to unit length, cf. Eq. (A.10):

$$\mathbf{p}_\measuredangle(\mathbf{x}_p) = \left( \alpha_x^{-2}(x_p - c_x)^2 + \alpha_y^{-2}(y_p - c_y)^2 + 1 \right)^{-\frac{1}{2}} \begin{pmatrix} \alpha_x^{-1}(x_p - c_x) \\ \alpha_y^{-1}(y_p - c_y) \\ 1 \end{pmatrix}. \tag{A.12}$$
It is worth noting that the projection rays only depend on the intrinsic camera parameters, and thus can be pre-calculated to speed up the computations for 3-D point
cloud reconstruction from 2-D range measurements.
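This pre-calculation might be sketched as follows (our own, hypothetical naming): the unit rays of Eq. (A.12) are computed once per camera and then merely re-scaled by every incoming radial range image according to Eq. (A.11).

```python
import numpy as np

def unit_rays(w, h, alpha_x, alpha_y, c_x, c_y):
    """Unit-length projection rays of Eq. (A.12); they depend only on the
    intrinsics, so one evaluation per camera suffices."""
    xp, yp = np.meshgrid(np.arange(w), np.arange(h))
    rays = np.stack(((xp - c_x) / alpha_x,
                     (yp - c_y) / alpha_y,
                     np.ones((h, w))), axis=-1)
    return rays / np.linalg.norm(rays, axis=-1, keepdims=True)

def backproject_radial(r_radial, rays):
    """3-D positions from radial ranges, Eq. (A.11): x_cc = r * p(x_p)."""
    return r_radial[..., None] * rays
```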
A.1.3 Range Image Data Representation
The measurements of an RI sensor with a resolution of w × h pixels can be interpreted as a 2-D range image where each pixel holds the associated orthogonal or
radial distance to the observed scene point, or likewise as a set of 3-D points,
$$X = \{x_1, \dots, x_{|X|}\}, \qquad x_i \in \mathbb{R}^3, \qquad |X| = w \cdot h, \tag{A.13}$$
also termed 3-D point cloud, using Eqs. (A.9),(A.11) and the intrinsic camera parameters. As stated before, modern RI sensors may also provide complementary
photometric image information (either grayscale or color). Using camera calibration, both geometric and photometric data can be aligned, eventually providing
textured point clouds. Point cloud triangulation for surface mesh generation with
RI modalities typically exploits the bijection between the reconstructed 3-D point
cloud and its underlying topological representation in a regular 2-D image.
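A minimal sketch of such a triangulation, assuming a row-major flattening of the w × h grid; in practice, triangles containing invalid range measurements would additionally be discarded.

```python
import numpy as np

def grid_triangulation(w, h):
    """Split every pixel quad of a w x h range image into two triangles.
    Returns (2*(w-1)*(h-1), 3) index triplets into the flattened point
    set X of Eq. (A.13)."""
    idx = np.arange(w * h).reshape(h, w)
    a = idx[:-1, :-1].ravel()   # top-left corner of each quad
    b = idx[:-1, 1:].ravel()    # top-right
    c = idx[1:, :-1].ravel()    # bottom-left
    d = idx[1:, 1:].ravel()     # bottom-right
    return np.concatenate((np.stack((a, b, c), axis=1),
                           np.stack((b, d, c), axis=1)))
```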
A.2 Joint Range Image Denoising and Surface Registration

A.2.1 Approximation of the Matching Energy
For a geometrically correct formulation of the matching energy E_match, in theory, we would have to consider the surface integral:

$$E_{\mathrm{match}}[u] := \int_{X_r} |d_G(\phi(x_r(\zeta)))|^2 \, dA, \tag{A.14}$$

where dA denotes a surface element. Based on the generalization of integration by substitution for integrating functions of several variables, and the corresponding change of variables formula [Bron 08], Eq. (A.14) can be re-written as:

$$E_{\mathrm{match}}[u] = \int_\Omega |d_G(\phi(x_r(\zeta)))|^2 \sqrt{\det\!\left( (Dx_r(\zeta))^\top Dx_r(\zeta) \right)} \, d\zeta. \tag{A.15}$$

The term det((Dx_r(ζ))^⊤ Dx_r(ζ)) is known as the Gram determinant of Dx_r. Geometrically, the Gram determinant is the square of the area of the parallelogram formed by the vectors [Kuhn 06]:

$$\det\!\left( (Dx_r(\zeta))^\top Dx_r(\zeta) \right) = \| \partial_{\zeta_1} x_r(\zeta) \times \partial_{\zeta_2} x_r(\zeta) \|_2^2, \tag{A.16}$$

hence the square root of the Gram determinant denotes the area.
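For illustration, the area weight in Eq. (A.15) can be evaluated on a gridded parameterization via the cross product identity (A.16); a minimal sketch, assuming x_r is sampled on a regular (h, w) grid and approximating the partial derivatives by finite differences:

```python
import numpy as np

def area_element(x_r):
    """Discrete square root of the Gram determinant, Eq. (A.16): the norm of
    the cross product of the partial derivatives of x_r. x_r has shape
    (h, w, 3); the result has shape (h, w)."""
    d1 = np.gradient(x_r, axis=0)   # finite-difference d x_r / d zeta_1
    d2 = np.gradient(x_r, axis=1)   # finite-difference d x_r / d zeta_2
    return np.linalg.norm(np.cross(d1, d2), axis=-1)
```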
Instead of using this at first glance geometrically appealing approach, we considered the approximative formulation (Eq. 5.6). The reason is twofold: First, if the range function r and thus x_r changes during the optimization process – as it does indeed with the proposed joint denoising and registration approach (Sect. 5.4) – the evaluation of the area term and its derivatives in the first variation of E_match would induce a substantial computational burden. Second, for the joint approach, the area term with Dx_r(ζ) = Dr(ζ) ⊗ p(ζ) + r(ζ)Dp(ζ) involves first derivatives of r, which can be regarded as a further first order prior for the range function². In practice, we observed a strong bias between this local weight for the quality of the matching and the actual matching term |d_G(φ(x_r(ζ)))|², leading to less accurate matching results, in particular in regions of steep gradients in r corresponding to edges or the boundary contour of X_r.

² ⊗ denotes the Kronecker product.
A.2.2 Derivation of the First Variations
The first variation (or Gâteaux derivative) of a functional E[u] around u w.r.t. a test function ψ is defined as:

$$\langle \partial_u E[u], \psi \rangle = \lim_{\varepsilon \to 0} \frac{E(u + \varepsilon\psi) - E(u)}{\varepsilon} = \left. \frac{d}{d\varepsilon} E(u + \varepsilon\psi) \right|_{\varepsilon = 0}. \tag{A.17}$$
Below, we derive the first variations of the individual energies of the joint range
image denoising and surface registration approach proposed in Sect. 5.4:
First variation of E_fid[r], test function ϑ : Ω → R:

$$E_{\mathrm{fid}}[r] = \int_\Omega |r - r_0|^2 \, d\zeta.$$

$$\langle E'_{\mathrm{fid}}[r], \vartheta \rangle = \left. \frac{d}{d\varepsilon} \int_\Omega |r + \varepsilon\vartheta - r_0|^2 \, d\zeta \right|_{\varepsilon=0} = \left. \int_\Omega 2(r + \varepsilon\vartheta - r_0)\vartheta \, d\zeta \right|_{\varepsilon=0} = \int_\Omega 2(r - r_0)\vartheta \, d\zeta. \tag{A.18}$$

First variation of E_{r,reg}[r], test function ϑ : Ω → R:

$$E_{r,\mathrm{reg}}[r] = \int_\Omega \|\nabla r\|_{\delta_{\mathrm{reg}}} \, d\zeta.$$

$$\langle E'_{r,\mathrm{reg}}[r], \vartheta \rangle = \left. \frac{d}{d\varepsilon} \int_\Omega \|\nabla(r + \varepsilon\vartheta)\|_{\delta_{\mathrm{reg}}} \, d\zeta \right|_{\varepsilon=0} = \left. \int_\Omega \frac{\nabla(r + \varepsilon\vartheta) \cdot \nabla\vartheta}{\|\nabla(r + \varepsilon\vartheta)\|_{\delta_{\mathrm{reg}}}} \, d\zeta \right|_{\varepsilon=0} = \int_\Omega \frac{\nabla r \cdot \nabla\vartheta}{\|\nabla r\|_{\delta_{\mathrm{reg}}}} \, d\zeta. \tag{A.19}$$

First variation of E_match[u, r] w.r.t. r, test function ϑ : Ω → R:

$$E_{\mathrm{match}}[u] = \int_\Omega |d_G(x_r + u)|^2 \, d\zeta.$$

$$\langle \partial_r E_{\mathrm{match}}[u, r], \vartheta \rangle = \left. \frac{d}{d\varepsilon} \int_\Omega |d_G((r + \varepsilon\vartheta)p + u)|^2 \, d\zeta \right|_{\varepsilon=0} = \left. \int_\Omega 2\, d_G((r + \varepsilon\vartheta)p + u)\, \nabla d_G((r + \varepsilon\vartheta)p + u) \cdot p\,\vartheta \, d\zeta \right|_{\varepsilon=0} = \int_\Omega 2\, d_G(rp + u)\, \nabla d_G(rp + u) \cdot p\,\vartheta \, d\zeta. \tag{A.20}$$

First variation of E_match[u, r] w.r.t. u, test function ϕ : Ω → R³:

$$E_{\mathrm{match}}[u] = \int_\Omega |d_G(x_r + u)|^2 \, d\zeta.$$

$$\langle \partial_u E_{\mathrm{match}}[u, r], \varphi \rangle = \left. \frac{d}{d\varepsilon} \int_\Omega |d_G(x_r + u + \varepsilon\varphi)|^2 \, d\zeta \right|_{\varepsilon=0} = \left. \int_\Omega 2\, d_G(x_r + u + \varepsilon\varphi)\, \nabla d_G(x_r + u + \varepsilon\varphi) \cdot \varphi \, d\zeta \right|_{\varepsilon=0} = \int_\Omega 2\, d_G(x_r + u)\, \nabla d_G(x_r + u) \cdot \varphi \, d\zeta. \tag{A.21}$$

First variation of E_{u,reg}[u], test function ϕ : Ω → R³:

$$E_{\mathrm{reg}}[u] = \int_\Omega \|Du\|_2^2 \, d\zeta.$$

$$\langle E'_{u,\mathrm{reg}}[u], \varphi \rangle = \left. \frac{d}{d\varepsilon} \int_\Omega \sum_{k=1}^{3} \|\nabla(u_k + \varepsilon\varphi_k)\|_2^2 \, d\zeta \right|_{\varepsilon=0} = \left. \int_\Omega 2 \sum_{k=1}^{3} \nabla(u_k + \varepsilon\varphi_k) \cdot \nabla\varphi_k \, d\zeta \right|_{\varepsilon=0} = \int_\Omega 2 \sum_{k=1}^{3} \nabla u_k \cdot \nabla\varphi_k \, d\zeta = \int_\Omega 2\, Du : D\varphi \, d\zeta. \tag{A.22}$$
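These variations lend themselves to a simple numerical sanity check: on a discretized domain, the analytic first variation must agree with a central finite difference of the energy, cf. Eq. (A.17). A minimal sketch for E_fid, Eq. (A.18), with our own discretization choices:

```python
import numpy as np

rng = np.random.default_rng(0)
r, r0, theta = rng.random((3, 64, 64))  # range estimate, data, test function
dzeta = 1.0 / (64 * 64)                 # area element of the unit square

def E_fid(r):
    return np.sum((r - r0) ** 2) * dzeta

# Analytic Gateaux derivative, Eq. (A.18): integral of 2 (r - r0) theta.
analytic = np.sum(2.0 * (r - r0) * theta) * dzeta

# Central difference of the energy in direction theta, cf. Eq. (A.17).
eps = 1e-6
numeric = (E_fid(r + eps * theta) - E_fid(r - eps * theta)) / (2.0 * eps)

assert np.isclose(analytic, numeric, rtol=1e-6, atol=1e-9)
```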
A.3 Sparse-to-dense Non-Rigid Surface Registration

A.3.1 Derivation of the First Variations
First variation of E_con[u, W] w.r.t. u, test function ϕ : Ω → R³:

$$E_{\mathrm{con}}[u, W] = \frac{1}{2n} \sum_{i=1}^{n} |P(y_i + w_i) + u(Q\, P(y_i + w_i)) - y_i|^2.$$

$$\langle \partial_u E_{\mathrm{con}}[u, W], \varphi \rangle = \left. \frac{d}{d\varepsilon} \left( \frac{1}{2n} \sum_{i=1}^{n} |P(y_i + w_i) + u(Q\, P(y_i + w_i)) + \varepsilon\varphi(Q\, P(y_i + w_i)) - y_i|^2 \right) \right|_{\varepsilon=0}$$
$$= \left. \frac{1}{n} \sum_{i=1}^{n} \left( P(y_i + w_i) + u(Q\, P(y_i + w_i)) + \varepsilon\varphi(Q\, P(y_i + w_i)) - y_i \right) \cdot \varphi(Q\, P(y_i + w_i)) \right|_{\varepsilon=0}$$
$$= \frac{1}{n} \sum_{i=1}^{n} \left( P(y_i + w_i) + u(Q\, P(y_i + w_i)) - y_i \right) \cdot \varphi(Q\, P(y_i + w_i)). \tag{A.23}$$

First variation of E_reg[u], test function ϕ : Ω → R³:

$$E_{\mathrm{reg}}[u] = \frac{1}{2} \int_\Omega |\Delta u|^2 \, d\zeta.$$

$$\langle E'_{\mathrm{reg}}[u], \varphi \rangle = \left. \frac{d}{d\varepsilon} \left( \frac{1}{2} \int_\Omega \sum_{k=1}^{3} |\Delta(u_k + \varepsilon\varphi_k)|^2 \, d\zeta \right) \right|_{\varepsilon=0} = \left. \sum_{k=1}^{3} \int_\Omega \Delta(u_k + \varepsilon\varphi_k) \cdot \Delta\varphi_k \, d\zeta \right|_{\varepsilon=0} = \sum_{k=1}^{3} \int_\Omega \Delta u_k \cdot \Delta\varphi_k \, d\zeta. \tag{A.24}$$

A.3.2 Improved Projection Approximation
To evaluate the variation ∂_{w_j} E_con one has to compute:

$$DP(x) = \mathbf{I} - \nabla d_G(x) \nabla^\top d_G(x) - d_G(x)\, D^2 d_G(x), \tag{A.25}$$
which involves the Hessian D2 dG ( x) of the SDF dG . Although it is possible to numerically approximate the second derivatives of dG , e.g. similar to how ∆u in Ereg
is handled, we propose a modification that completely avoids second derivatives
of dG . Using this approach, we stay clear of the additional algorithmic complexity
required to handle D2 dG . Let us point out that this is possible since our objective
functional E itself does not involve D2 dG , only the descent direction does. This is
not true for ∆u, which is the reason why we need to evaluate ∆u numerically.
In order to avoid the second derivatives of dG , we partially linearize P by replacing the projection direction in P by the already computed direction from the
last update. Denoting by $W^{m-1} = \{w_1^{m-1}, \dots, w_n^{m-1}\} \subset \mathbb{R}^3$ the estimate for W in the (m − 1)-th gradient descent step, in a first formulation [Baue 12a] we considered the following approximate projection in the m-th step,

$$P(y_i + w_i) = y_i + w_i - d_G(y_i + w_i) \nabla d_G(y_i + w_i) \approx y_i + w_i - d_G(y_i + w_i^{m-1}) \nabla d_G(y_i + w_i^{m-1}) =: P_i^m(y_i + w_i). \tag{A.26}$$
The linear part of the projection is evaluated at the unknown new estimate w_i, while the nonlinear part of P is evaluated at the old estimate w_i^{m-1}. Thus, E_con in the m-th step is replaced by:

$$E_{\mathrm{con}}^m[u, W] = \frac{1}{2n} \sum_{i=1}^{n} |P_i^m(y_i + w_i) + u(Q\, P_i^m(y_i + w_i)) - y_i|^2. \tag{A.27}$$
Since DP_i^m is the identity matrix I, the variation of E_con^m is:

$$\partial_{w_j} E_{\mathrm{con}}^m[u, W] = \frac{1}{n} \left( P_j^m(y_j + w_j) + u(Q\, P_j^m(y_j + w_j)) - y_j \right)^\top \left[ \mathbf{I} + Du(Q\, P_j^m(y_j + w_j))\, Q \right]. \tag{A.28}$$
In particular, it does not include second derivatives of dG . Since in each step of the
gradient descent the W estimate from the preceding step is used to approximate
the projection, the approximation is automatically updated after each step, leading to a fixed-point iteration. Unfortunately, this linearization does not reflect the
underlying geometry properly and hence, not surprisingly, requires substantially
more iterations than the approach investigated here – using a more accurate approximation of the projection. Indeed, we modified Eq. (A.26) so that the scaling
term of the nonlinear part is evaluated at the new estimate wi ,
$$P_i^m(y_i + w_i) = y_i + w_i - d_G(y_i + w_i) \nabla d_G(y_i + w_i^{m-1}), \qquad DP_i^m(y_i + w_i) = \mathbf{I} - \nabla d_G(y_i + w_i^{m-1}) \nabla^\top d_G(y_i + w_i). \tag{A.29}$$
The variation ∂_{w_j} E_con^m is then:

$$\partial_{w_j} E_{\mathrm{con}}^m[u, W] = \frac{1}{n} \left( P_j^m(y_j + w_j) + u(Q\, P_j^m(y_j + w_j)) - y_j \right)^\top \Big[ \mathbf{I} - \nabla d_G(y_j + w_j^{m-1}) \nabla^\top d_G(y_j + w_j) + Du(Q\, P_j^m(y_j + w_j))\, Q \left( \mathbf{I} - \nabla d_G(y_j + w_j^{m-1}) \nabla^\top d_G(y_j + w_j) \right) \Big]. \tag{A.30}$$
For a quantitative analysis of the impact of this modification on reconstruction
accuracy and the convergence speed of the algorithm, we refer to Sect. 6.3.2.
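To illustrate the frozen-direction splitting of Eq. (A.29) in isolation, consider the following fixed-point sketch that projects a single point onto the zero level set of d_G; the callables d and grad_d (e.g. trilinear lookups into a precomputed distance volume) and all names are our own assumptions, not the thesis implementation.

```python
import numpy as np

def project_fixed_point(x0, d, grad_d, steps=5):
    """Project x0 onto {d_G = 0}. Per sweep, the scaling d_G is evaluated at
    the current iterate while the direction is frozen at the previous one,
    mirroring Eq. (A.29); no second derivatives of d_G are needed."""
    x = x0.astype(float).copy()
    g = grad_d(x)                 # direction from the previous estimate
    for _ in range(steps):
        x = x - d(x) * g          # scale at the current iterate, direction old
        g = grad_d(x)             # refresh the direction for the next sweep
    return x

# Example: the unit sphere, whose SDF is |x| - 1 with gradient x / |x|.
d = lambda x: np.linalg.norm(x) - 1.0
grad_d = lambda x: x / np.linalg.norm(x)
print(project_fixed_point(np.array([0.3, 0.2, 2.0]), d, grad_d))
```

For the sphere the iteration converges in a single sweep, since the frozen direction already points along the true projection ray; for general shapes several sweeps are required, in line with the fixed-point behavior discussed above.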
A.3.3 Detailed Results of the Prototype Study
In addition to the boxplots in Fig. 6.6, we present the numbers of the initial and residual mismatch (95th percentile) for the individual subjects in Table A.1. Here, we also present separate results for abdominal and thoracic respiration, respectively.

Table A.1: Initial surface mismatch |dG| on Mp and residual surface mismatch |dMp| on φp(G) (95th percentile) per subject, for abdominal and thoracic respiration. The row S1-S16 (bold in the original layout) denotes the mean 95th percentile over all subjects.

Subject   Initial mismatch [mm], 95th pct.   Residual mismatch [mm], 95th pct.
          Abd.   Thor.  Abd.+Thor.           Abd.   Thor.  Abd.+Thor.
S1         6.4   11.5   10.9                 0.52   0.65   0.58
S2        14.0   18.7   18.3                 0.77   1.13   0.96
S3         4.4    5.5    5.3                 0.58   0.55   0.56
S4        10.6   20.5   20.1                 0.76   0.96   0.86
S5         7.9   10.3    9.8                 0.63   0.78   0.70
S6         6.5    4.8    6.1                 0.72   0.86   0.80
S7         7.2    9.5    9.3                 0.60   0.70   0.65
S8         2.8    5.4    5.3                 0.50   0.54   0.52
S9        16.4   15.7   16.2                 0.78   0.77   0.77
S10        9.7    6.3    9.3                 0.83   0.66   0.74
S11       11.7   17.1   16.7                 0.83   1.03   0.93
S12        6.3    5.5    6.2                 0.57   0.55   0.56
S13        8.9   10.6   10.5                 0.60   0.85   0.73
S14       13.3   13.8   13.7                 0.65   0.71   0.68
S15       18.2   14.2   17.5                 0.93   1.17   1.04
S16       16.1   20.5   20.5                 0.79   1.05   0.93
S1-S16    14.0   17.1   15.2                 0.69   0.82   0.76
The low residual reconstruction errors indicate that the approach is capable of recovering both abdominal and thoracic surface motion fields with comparable accuracy.
List of Symbols
Chapter 2
x ∈ R3: Position in 3-D space (p. 12)
φtof: Phase shift in CW ToF imaging (p. 13)
r∢(·): Radial range (distance) (p. 13)
c: Speed of light (p. 13)
f_mod: Modulation frequency (p. 13)

Chapter 3
X: Set of points (p. 36)
|X|: Number of elements within a set X (p. 36)
Xm: Moving template point set (p. 36)
Xf: Fixed reference point set (p. 36)
xm ∈ R3: Point in moving template point set (p. 36)
xf ∈ R3: Point in fixed reference point set (p. 36)
R ∈ SO(3): Rotation matrix (p. 36)
t = (tx, ty, tz)^T: Translation vector (p. 36)
Rg, tg: Global rigid transformation (p. 36)
Rpre, tpre: Pre-alignment rigid transformation (p. 36)
Ricp, ticp: ICP refinement rigid transformation (p. 36)
D: Feature descriptor dimensionality (p. 36)
d ∈ RD: Feature descriptor (p. 36)
D: Set of feature descriptors (p. 36)
M: Moving data (p. 37)
F: Fixed data (p. 37)
xc ∈ R3: Corresponding point (p. 37)
d(·): Distance metric (p. 37)
cm(·), cf(·): Correspondence operator (p. 37)
Cinit: Initial set of correspondences (p. 37)
Ccross: Cross-validated set of correspondences (p. 37)
gc(·): Geometric consistency metric (p. 38)
δc: Correspondence reliability threshold (p. 38)
C: Reliable set of correspondences (p. 38)
θ: Rotation angle (p. 39)
N: Neighborhood/support region (set of pixels or points) (p. 39)
H(·,·,·): Histogram (p. 39)
NH: Number of histogram bins (p. 39)
n ∈ R3: Normal vector (p. 39)
xcyl ∈ R2: Position in cylindrical coordinates (p. 40)
dspin: Spin image descriptor (p. 40)
χ(·): Characteristic function (p. 40)
f(·): Scalar image function (p. 40)
∇f: Image gradient (p. 41)
γ(·): Gradient orientation operator (p. 41)
P: Plane (p. 42)
qP(·): Projection operator w.r.t. plane P (p. 42)
bref: MeshHOG second reference axis (p. 42)
dhog: MeshHOG descriptor (p. 42)
Nseg: MeshHOG number of circular segments (p. 42)
s: MeshHOG circular segment index (p. 42)
T: Tangent plane (p. 42)
a ∈ R3: CUSS reference vector (p. 42)
Ncuss: CUSS sampling density (p. 43)
rcuss: CUSS sampling radius (p. 43)
Rθ: Rotation matrix for angle θ (p. 43)
Xcuss: Set of CUSS sampling points (p. 43)
xcuss: CUSS sampling point (p. 43)
mcuss: CUSS mesh intersection point (p. 43)
d⊥(·): Signed orthogonal depth w.r.t. surface (p. 43)
frgt(·): Radial gradient transform (p. 44)
e: Basis vector (p. 44)
Nriff: RIFF number of annuli (p. 44)
a: RIFF annulus index (p. 44)
Na: RIFF annulus neighborhood (p. 44)
driff: RIFF descriptor (p. 44)
θAP: Rotation angle around AP axis (p. 45)
tAP: Translation along AP axis (p. 45)
tML: Translation along ML axis (p. 45)
tSI: Translation along SI axis (p. 45)
(RGT, tGT): Ground truth transformation (p. 45)
rN: Neighborhood radius (p. 47)
dTRE(·): Target registration error (p. 50)

Chapter 4
(R0, t0): Initial transformation (p. 60)
p = (pr, pg, pb)^T: Photometric RGB data (p. 60)
β: Geometric weight (p. 61)
(Rk, tk): Transformation in k-th iteration (p. 61)
r: Representative (p. 62)
R: Set of representatives (p. 62)
δlg: Low-grade correspondence threshold (p. 65)
E ∈ R4×4: Relative transformation error (p. 69)
TGT ∈ R4×4: Ground truth transformation matrix (p. 69)
T ∈ R4×4: Estimated transformation matrix (p. 69)
σ: Standard deviation (p. 70)

Chapter 5
G: Planning shape (p. 86)
Xr: RI point set/shape (p. 86)
Ω: Parameter domain (p. 86)
ζ ∈ R2: Position on Ω (p. 86)
r(·): Range (distance) (p. 86)
xr(·), p(·): 2-D/3-D mapping functions (p. 86)
φ(·): Non-rigid 3-D deformation (p. 87)
u(·): 3-D displacement field (p. 87)
E: Energy functional (p. 87)
Ematch: Matching energy (p. 87)
Ereg: Regularization energy (p. 87)
κ: Non-negative weighting parameter (p. 87)
dA(·): SDF w.r.t. shape A (p. 87)
P(·): Projection-onto-shape operator (p. 88)
D: Jacobian (p. 88)
tr(·): Trace (p. 88)
Efid: Fidelity energy (p. 89)
Er,reg: Range regularization energy (p. 89)
Eu,reg: Displacement regularization energy (p. 89)
λ, µ: Non-negative weighting parameters (p. 89)
r0(·): Measured range (distance) (p. 89)
δreg: Pseudo Huber regularization parameter (p. 89)
‖·‖δreg: Pseudo Huber norm (p. 89)
ϑ(·): Scalar test function (p. 90)
ϕ(·): Vector-valued test function (p. 90)
M: Instantaneous patient shape (p. 91)
(rideal, Xrideal): Ideal (noise-free) RI data (p. 91)
(r0, Xr0): Measured/realistic RI data (p. 91)
(r0,ta, Xr0,ta): Temporally averaged RI data (p. 91)
(r*, Xr*): Denoised RI data estimate, joint scheme (p. 91)
(u*, φ*): Displacement/deformation estimate, joint scheme (p. 91)
Er,reg,Q: Quadratic regularization energy (p. 92)
Er,reg,TV: TV regularization energy (p. 92)
φideal(·): Ideal synthetic 3-D deformation (p. 92)
ν: Deformation scale parameter (p. 92)
(r̃*, Xr̃*): Denoised RI data estimate, sequential scheme (p. 93)
(ũ*, φ̃*): Displacement/deformation estimate, sequential scheme (p. 93)
p: Respiration phase index (p. 93)
Mp: Instantaneous patient shape at phase p (p. 91)
(rp,0,ta, Xrp,0,ta): Temporally averaged RI data at phase p (p. 94)
φp(·): Non-rigid 3-D deformation at phase p (p. 94)

Chapter 6
Y: Sparse set of MLT measurements (p. 104)
y ∈ R3: MLT measurement point (p. 104)
ψ(·): Non-rigid 3-D deformation, inverse to φ (p. 104)
Ψ(·): Sparse non-rigid 3-D deformation (p. 104)
g(·): Graph mapping function (p. 105)
Q(·): Orthographic projection operator (p. 105)
w ∈ R3: Displacement vector (p. 105)
W: Sparse set of displacements (p. 105)
Econ: Consistency energy (p. 105)
Yp: Sparse set of MLT measurements at phase p (p. 108)
τ: Convergence threshold (p. 108)
up(·): 3-D displacement field at phase p (p. 112)

Chapter 7
frgb(·): RGB image function (p. 121)
(rref, Xrref, frgb,ref): RGB-D reference data (p. 122)
(rt, Xrt, frgb,t): Instantaneous RGB-D data at time t (p. 122)
up, φp: Non-rigid photometry-driven 3-D deformation (p. 122)
ug, φg: Non-rigid geometry-driven 3-D deformation (p. 122)
ũp: Non-rigid photometry-driven 2-D deformation (p. 122)
ũg: Non-rigid geometry-driven 2-D deformation (p. 124)
e0: Initial mismatch (p. 124)
ep: Residual mismatch, photometry-driven approach (p. 124)
eg: Residual mismatch, geometry-driven approach (p. 124)

Appendix
xw ∈ R3: Position in world coordinates (p. 139)
xcc ∈ R3: Position in camera coordinates (p. 139)
xp ∈ R2: Position in picture coordinates (p. 139)
x̃: Position x in homogeneous coordinates (p. 139)
K ∈ R3×3: Camera calibration matrix (p. 139)
I: Identity matrix (p. 139)
αx, αy: Focal length (p. 139)
cx, cy: Principal point (p. 139)
r⊥(·): Orthogonal range (distance) (p. 140)
xr⊥(·), p⊥(·): Orthogonal 2-D/3-D mapping functions (p. 140)
xr∢(·), p∢(·): Radial 2-D/3-D mapping functions (p. 140)
w × h: Sensor/image resolution (width × height) (p. 141)
List of Abbreviations
AAPM: American Association of Physicists in Medicine (p. 32)
AP: Anterior-Posterior (p. 45)
API: Application Programming Interface (p. 15)
BF: Brute Force (p. 59)
CCD: Charge-Coupled Device (p. 17)
CMOS: Complementary Metal-Oxide-Semiconductor (p. 15)
CPU: Central Processing Unit (p. 66)
CT: Computed Tomography (p. 2)
CUDA: Compute Unified Device Architecture (p. 66)
CUSS: Circular Uniform Surface Sampling (p. 42)
CW: Continuous-Wave (Modulation) (p. 13)
EM: Expectation Maximization (p. 26)
FE: Finite Elements (p. 90)
FOV: Field of View (p. 35)
GMM: Gaussian Mixture Model (p. 26)
GPU: Graphics Processing Unit (p. 18)
HOG: Histogram of Oriented Gradients (p. 35)
ICP: Iterative Closest Point (Algorithm) (p. 3)
IGLS: Image-Guided Liver Surgery (p. 6)
IGRT: Image-Guided Radiation Therapy (p. 82)
IR: Infrared (p. 13)
LED: Light-Emitting Diode (p. 17)
LINAC: Linear Accelerator (p. 33)
MIP: Minimally Invasive Procedures (p. 56)
ML: Medio-Lateral (p. 45)
MLT: Multi-Line Triangulation (p. 16)
MRI: Magnetic Resonance Imaging (p. 2)
NCAT: Nurbs-based CArdiac-Torso phantom (p. 93)
NN: Nearest Neighbor (p. 55)
OC: Optical Colonoscopy (p. 57)
OR: Operating Room (p. 21)
PET: Positron Emission Tomography (p. 2)
RBC: Random Ball Cover (p. 59)
RGB-D: RGB + Depth (p. 8)
RGT: Radial Gradient Transform (p. 44)
RI: Range Imaging (p. 2)
RIFF: Rotation Invariant Fast Features (p. 35)
RITK: Range Imaging ToolKit (p. 5)
RPM: Robust Point Matching (p. 26)
RT: Radiation Therapy (p. 2)
SDF: Signed Distance Function (p. 85)
SfM: Structure-from-Motion (p. 58)
SI: Superior-Inferior (p. 45)
SIFT: Scale-Invariant Feature Transform (p. 35)
SL: Structured Light (p. 108)
SLAM: Simultaneous Localization and Mapping (p. 58)
SNR: Signal-to-Noise Ratio (p. 3)
SoC: System on a Chip (p. 15)
SPECT: Single-Photon Emission Computed Tomography (p. 2)
ToF: Time-of-Flight (p. 10)
TPS: Thin Plate Spline (p. 26)
TSDF: Truncated Signed Distance Function (p. 69)
TV: Total Variation (p. 89)
US: Ultrasound Imaging (p. 2)
VC: Virtual Colonoscopy (p. 57)
List of Figures
1.1
Thesis organization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.1
2.2
Measurement principle of different RI technologies. . . . . . . . . . . 12
MLT sensor measurement principle. . . . . . . . . . . . . . . . . . . . 17
3.1
3.2
3.3
3.4
3.5
3.6
3.7
3.8
3.9
Proposed automatic initial patient setup. . . . . . . . . . . . . . . . .
Intra-operative navigation in IGLS with a marker-based system. . . .
Proposed feature-based rigid surface registration framework. . . . .
Geometric consistency check. . . . . . . . . . . . . . . . . . . . . . . .
Shape descriptors: Spin Images, MeshHOG, RIFF. . . . . . . . . . . .
Materials for patient setup experiments. . . . . . . . . . . . . . . . . .
Distribution of point correspondences for patient setup experiments.
Porcine liver surface data for IGLS experiments. . . . . . . . . . . . .
Distribution of point correspondences for IGLS experiments. . . . . .
33
34
37
38
41
46
49
50
52
4.1
4.2
4.3
4.4
4.5
4.6
4.7
4.8
4.9
4.10
4.11
4.12
4.13
4.14
4.15
4.16
Benefit of incorporating photometric information into ICP alignment.
Proposed photo-geometric reconstruction framework. . . . . . . . . .
RBC construction and two-tier NN query scheme. . . . . . . . . . . .
Qualitative results for the reconstruction of indoor environments. . .
RBC runtime comparison for a single ICP iteration. . . . . . . . . . .
Registration error due to approximative RBC-based NN search. . . .
Materials for laparoscopy reconstruction experiments. . . . . . . . . .
Drift with 3-D vs. 6-D ICP registration in laparoscopy. . . . . . . . . .
Drift with ICP on synthetic vs. noisy data in laparoscopy. . . . . . . .
Qualitative results for laparoscopy reconstruction experiments. . . .
Materials for colonoscopy reconstruction experiments. . . . . . . . .
Drift with 3-D vs. 6-D ICP registration in colonoscopy. . . . . . . . . .
Drift with approximative vs. exact geometric ICP. . . . . . . . . . . .
Influence of the photo-geometric weighting parameter. . . . . . . . .
Convergence behavior for 3-D vs. 6-D ICP registration. . . . . . . . .
Qualitative results for colonoscopy reconstruction experiments. . . .
59
61
63
65
66
67
68
69
70
71
72
73
74
75
76
77
5.1  Workflow in RI-guided respiration-synchronized RT. . . . . . . . . . .  84
5.2  Geometric configuration for dense surface registration. . . . . . . . . .  87
5.3  Experimental setup for model validation. . . . . . . . . . . . . . . . . .  92
5.4  Experimental evaluation of denoising models. . . . . . . . . . . . . . .  94
5.5  Validation of the joint model on male phantom data. . . . . . . . . . .  95
5.6  Comparison of the proposed joint approach to a sequential scheme. .  97
5.7  Joint denoising and surface registration results on NCAT data. . . . .  98
5.8  Joint denoising and surface registration results on real ToF/CT data. .  99
6.1  Reconstruction of sparse displacement fields from MLT data. . . . . . 103
6.2  Geometric configuration for sparse-to-dense surface registration. . . . 104
6.3  Validation of sparse-to-dense model on NCAT phantom data. . . . . . 110
6.4  Illustration of estimated NCAT surface motion fields. . . . . . . . . . . 111
6.5  Sparse-to-dense non-rigid surface registration on real MLT data. . . . 112
6.6  Quantitative results of prototype study for realistic MLT data. . . . . . 113
6.7  Study of algorithmic parameters and modifications. . . . . . . . . . . . 115
6.8  Influence of MLT grid density on residual mismatch. . . . . . . . . . . 116
6.9  Quantitative evaluation of influence of MLT grid density. . . . . . . . 117
7.1  Motivation for photometry-driven surface registration. . . . . . . . . . 120
7.2  Geometric setup of geometry- vs. photometry-driven registration. . . 122
7.3  Comparison of geometry- vs. photometry-driven registration. . . . . . 125
7.4  Investigation of landmark matching for deep inhalation study. . . . . 126
7.5  Comparison of estimated surface motion fields. . . . . . . . . . . . . . 127
List of Tables
2.1  Specifications of RI sensors investigated in this thesis. . . . . . . . . .  14

3.1  Patient setup errors with multi-modal surface registration. . . . . . . .  48
3.2  Organ alignment errors with multi-modal surface registration. . . . .  51

4.1  Runtimes for RBC construction and ICP execution. . . . . . . . . . . .  67

6.1  Quantitative results of prototype study for realistic MLT data. . . . . . 114

A.1  Individual results of prototype study for realistic MLT data. . . . . . . 146
Bibliography
[Aige 08]
D. Aiger, N. J. Mitra, and D. Cohen-Or. “4-Points Congruent Sets for
Robust Pairwise Surface Registration”. ACM Transactions on Graphics,
Vol. 27, No. 3, pp. 85:1–85:10, Aug 2008.
[Aken 02]
T. Akenine-Möller and E. Haines. Real-Time Rendering. A K Peters,
Ltd., Natick, MA, USA, 2nd Ed., 2002.
[Albr 12]
T. Albrecht and T. Vetter. “Automatic Fracture Reduction”. In: MICCAI Workshop on Mesh Processing in Medical Image Analysis, pp. 22–29,
Springer, Oct 2012.
[Alle 03]
B. Allen, B. Curless, and Z. Popovic. “The Space of Human Body
Shapes: Reconstruction and Parameterization from Range Scans”.
ACM Transactions on Graphics, Vol. 22, No. 3, pp. 587–594, Jul 2003.
[Alno 10]
M. R. Alnowami, E. Lewis, M. Guy, and K. Wells. “An Observation
Model for Motion Correction in Nuclear Medicine”. In: SPIE Medical
Imaging, pp. 76232F–9, Feb 2010.
[Alva 99]
L. Alvarez, J. Weickert, and J. Sánchez. “A Scale-Space Approach to
Nonlocal Optical Flow Calculations”. In: M. Nielsen, P. Johansen,
O. F. Olsen, and J. Weickert, Eds., International Conference on Scale-Space
Theories in Computer Vision, pp. 235–246, Springer, Sep 1999.
[Ambe 07] B. Amberg, S. Romdhani, and T. Vetter. “Optimal Step Nonrigid ICP
Algorithms for Surface Registration”. In: International Conference on
Computer Vision and Pattern Recognition (CVPR), pp. 1–8, IEEE, Jun
2007.
[Amid 02]
I. Amidror. “Scattered Data Interpolation Methods for Electronic
Imaging Systems: a Survey”. SPIE Journal of Electronic Imaging, Vol. 11,
No. 2, pp. 157–176, 2002.
[Ande 12]
M. Andersen, T. Jensen, P. Lisouski, A. Mortensen, M. Hansen,
T. Gregersen, and P. Ahrendt. “Kinect Depth Sensor Evaluation for
Computer Vision Applications”. Tech. Rep., Aarhus University, Feb
2012. ECE-TR-6.
[Anti 02]
L. Antiga. Patient-Specific Modeling of Geometry and Blood Flow in Large
Arteries. PhD thesis, Politecnico di Milano, 2002.
[Armi 66]
L. Armijo. “Minimization of Functions having Lipschitz Continuous
First Partial Derivatives”. Pacific Journal of Mathematics, Vol. 16, No. 1,
pp. 1–3, 1966.
[Aude 00]
M. A. Audette, F. P. Ferrie, and T. M. Peters. “An Algorithmic
Overview of Surface Registration Techniques for Medical Imaging”.
Medical Image Analysis, Vol. 4, No. 3, pp. 201–217, 2000.
[Auri 95]
V. Aurich and J. Weule. “Non-Linear Gaussian Filters Performing
Edge Preserving Diffusion”. In: German Association for Pattern Recognition (DAGM) Symposium, pp. 538–545, Springer, Sep 1995.
[Bala 09]
R. Balachandran and J. M. Fitzpatrick. “Iterative Solution for Rigid-body Point-based Registration with Anisotropic Weighting”. In: SPIE
Medical Imaging, p. 72613D, Feb 2009.
[Bar 07]
L. Bar, B. Berkels, M. Rumpf, and G. Sapiro. “A Variational Framework for Simultaneous Motion Estimation and Restoration of Motion-Blurred Video”. In: International Conference on Computer Vision (ICCV),
pp. 1–8, IEEE, Oct 2007.
[Baue 11a] S. Bauer, J. Wasza, S. Haase, N. Marosi, and J. Hornegger. “Multi-Modal Surface Registration for Markerless Initial Patient Setup in Radiation Therapy using Microsoft’s Kinect Sensor”. In: ICCV Workshop
on Consumer Depth Cameras for Computer Vision (CDC4CV), pp. 1175–
1181, IEEE, Nov 2011.
[Baue 11b] S. Bauer, J. Wasza, K. Müller, and J. Hornegger. “4D Photogeometric Face Recognition with Time-of-Flight Sensors”. In: Workshop on
Applications of Computer Vision (WACV), pp. 196–203, IEEE, Jan 2011.
[Baue 12a] S. Bauer, B. Berkels, S. Ettl, O. Arold, J. Hornegger, and M. Rumpf.
“Marker-less Reconstruction of Dense 4-D Surface Motion Fields using Active Laser Triangulation for Respiratory Motion Management”.
In: N. Ayache, H. Delingette, P. Golland, and K. Mori, Eds., International Conference on Medical Image Computing and Computer Assisted
Intervention (MICCAI), pp. 414–421, LNCS 7510, Part I, Springer, Oct
2012.
[Baue 12b] S. Bauer, B. Berkels, J. Hornegger, and M. Rumpf. “Joint ToF Image
Denoising and Registration with a CT Surface in Radiation Therapy”.
In: A. Bruckstein, B. ter Haar Romeny, A. Bronstein, and M. Bronstein,
Eds., International Conference on Scale Space and Variational Methods in
Computer Vision (SSVM), pp. 98–109, Springer, May 2012.
[Baue 12c] S. Bauer, S. Ettl, J. Wasza, F. Willomitzer, F. Huber, J. Hornegger, and
G. Häusler. “Sparse Active Triangulation Grids for Respiratory Motion Management”. In: German Branch of the European Optical Society
(DGaO) Annual Meeting, p. P23, May 2012.
[Baue 12d] S. Bauer, J. Wasza, and J. Hornegger. “Photometric Estimation of 3D
Surface Motion Fields for Respiration Management”. In: T. Tolxdorff,
T. M. Deserno, H. Handels, and H.-P. Meinzer, Eds., Bildverarbeitung
für die Medizin (BVM), Informatik aktuell, pp. 105–110, Springer, Mar
2012.
[Baue 13a] S. Bauer, A. Seitel, H. Hofmann, T. Blum, J. Wasza, M. Balda, H.-P.
Meinzer, N. Navab, J. Hornegger, and L. Maier-Hein. Time-of-Flight
and Depth Imaging. Sensors, Algorithms, and Applications, Chap. Real-Time Range Imaging in Health Care: A Survey, pp. 228–254. LNCS
8200, Springer, 2013.
[Baue 13b] S. Bauer, J. Wasza, F. Lugauer, D. Neumann, and J. Hornegger. Consumer Depth Cameras for Computer Vision: Research Topics and Applications, Chap. Real-time RGB-D Mapping and 3-D Modeling on the GPU
using the Random Ball Cover, pp. 27–48. Advances in Computer Vision
and Pattern Recognition, Springer, 2013.
[Bell 07]
S. Beller, M. Hünerbein, T. Lange, S. Eulenstein, B. Gebauer, and
P. M. Schlag. “Image-guided Surgery of Liver Metastases by Three-dimensional Ultrasound-based Optoelectronic Navigation”. British
Journal of Surgery, Vol. 94, No. 7, pp. 866–875, John Wiley & Sons, Inc.,
Jul 2007.
[Belo 00]
S. Belongie, J. Malik, and J. Puzicha. “Shape Context: A New Descriptor for Shape Matching and Object Recognition”. In: International Conference on Neural Information Processing Systems (NIPS), pp. 831–837,
Nov 2000.
[Benj 99]
R. Benjemaa and F. Schmitt. “Fast Global Registration of 3D Sampled
Surfaces using a Multi-z-buffer Technique”. Image and Vision Computing, Vol. 17, No. 2, pp. 113–123, 1999.
[Berk 06]
B. Berkels, M. Burger, M. Droske, O. Nemitz, and M. Rumpf. “Cartoon Extraction Based on Anisotropic Image Classification”. In: International Workshop on Vision, Modeling and Visualization (VMV), pp. 293–
300, Eurographics Association, Nov 2006.
[Berk 10]
B. Berkels. Joint methods in imaging based on diffuse image representations. PhD thesis, Rheinische Friedrich-Wilhelms-Universität Bonn,
Feb 2010.
[Berk 13]
B. Berkels, S. Bauer, S. Ettl, O. Arold, J. Hornegger, and M. Rumpf.
“Joint Surface Reconstruction and 4-D Deformation Estimation from
Sparse Data and Prior Knowledge for Marker-Less Respiratory Motion Tracking”. Medical Physics, Vol. 40, No. 9, pp. 091703 1–10, Sep
2013.
[Bert 05]
C. Bert, K. G. Metheany, K. Doppke, and G. T. Y. Chen. “A Phantom
Evaluation of a Stereo-vision Surface Imaging System for Radiotherapy Patient Setup”. Medical Physics, Vol. 32, No. 9, pp. 2753–2762, Sep
2005.
[Bert 06]
C. Bert, K. G. Metheany, K. P. Doppke, A. G. Taghian, S. N. Powell,
and G. T. Y. Chen. “Clinical Experience with a 3D Surface Patient
Setup System for Alignment of Partial-breast Irradiation Patients”. International Journal of Radiation Oncology Biology Physics, Vol. 64, No. 4,
pp. 1265–1274, Mar 2006.
[Besl 92]
J. Besl and N. McKay. “A Method for Registration of 3-D Shapes”.
IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 14,
No. 2, pp. 239–256, 1992.
[Bett 13]
V. Bettinardi, E. D. Bernardi, L. Presotto, and M. Gilardi. “Motion-Tracking Hardware and Advanced Applications in PET and
PET/CT”. PET Clinics, Vol. 8, No. 1, pp. 11–28, 2013.
[Bigd 12a]
A. Bigdelou, A. Benz, L. Schwarz, and N. Navab. “Customizable Gesturing Interface for the Operating Room using Kinect”. In: CVPR
Workshop on Gesture Recognition, Jun 2012.
[Bigd 12b]
A. Bigdelou, R. Stauder, T. Benz, A. Okur, T. Blum, R. Ghotbi, and
N. Navab. “HCI Design in the OR: A Gesturing Case-study”. In:
MICCAI Workshop on Modeling and Monitoring of Computer Assisted Interventions, Oct 2012.
[Blai 04]
F. Blais. “Review of 20 Years of Range Sensor Development”. Journal
of Electronic Imaging, Vol. 13, No. 1, pp. 231–243, 2004.
[Blai 95]
G. Blais and M. D. Levine. “Registering Multiview Range Data to
Create 3-D Computer Objects”. IEEE Transactions on Pattern Analysis
and Machine Intelligence, Vol. 17, No. 8, pp. 820–824, Aug 1995.
[Blum 12]
T. Blum, V. Kleeberger, C. Bichlmeier, and N. Navab. “mirracle: An
Augmented Reality Magic Mirror System for Anatomy Education”.
In: Virtual Reality (VR), pp. 115–116, IEEE, Mar 2012.
[Bona 11]
F. Bonarrigo, A. Signoroni, and R. Leonardi. “A Robust Pipeline for
Rapid Feature-based Pre-alignment of Dense Range Scans”. In: International Conference on Computer Vision (ICCV), pp. 2260–2267, IEEE,
Nov 2011.
[Book 89]
F. L. Bookstein. “Principal Warps: Thin-Plate Splines and the Decomposition of Deformations”. IEEE Transactions on Pattern Analysis and
Machine Intelligence, Vol. 11, No. 6, pp. 567–585, 1989.
[Boye 11]
E. Boyer, A. M. Bronstein, M. M. Bronstein, B. Bustos, T. Darom, R. Horaud, I. Hotz, Y. Keller, J. Keustermans, A. Kovnatsky, R. Litman,
J. Reininghaus, I. Sipiran, D. Smeets, P. Suetens, D. Vandermeulen,
A. Zaharescu, and V. Zobel. “SHREC 2011: Robust Feature Detection
and Description Benchmark”. CoRR, Vol. abs/1102.4258, Feb 2011.
[Brad 00]
G. Bradski. “The OpenCV Library”. Dr. Dobb’s Journal of Software Tools,
2000.
[Brah 08]
A. Brahme, P. Nyman, and B. Skatt. “4D Laser Camera for Accurate
Patient Positioning, Collision Avoidance, Image Fusion and Adaptive
Approaches during Diagnostic and Therapeutic Procedures”. Medical
Physics, Vol. 35, No. 5, pp. 1670–1681, 2008.
[Bran 06]
E. D. Brandner, A. Wu, H. Chen, D. Heron, S. Kalnicki, K. Komanduri,
K. Gerszten, S. Burton, I. Ahmed, and Z. Shou. “Abdominal Organ
Motion Measured using 4D CT”. International Journal of Radiation Oncology Biology Physics, Vol. 65, No. 2, pp. 554–560, 2006.
[Bron 08]
I. Bronstein and K. Semendjajew. Taschenbuch der Mathematik. Harri
Deutsch, 8th Ed., 2008.
[Bron 10]
A. Bronstein, M. Bronstein, B. Bustos, U. Castellani, M. Crisani, B. Falcidieno, L. Guibas, I. Kokkinos, V. Murino, I. Sipiran, M. Ovsjanikov,
G. Patane, M. Spagnuolo, and J. Sun. “SHREC 2010: Robust Feature
Detection and Description Benchmark”. In: Workshop on 3D Object
Retrieval (3DOR), pp. 79–86, Eurographics Association, May 2010.
[Bron 11]
A. M. Bronstein, M. M. Bronstein, L. J. Guibas, and M. Ovsjanikov.
“Shape Google: Geometric Words and Expressions for Invariant
Shape Retrieval”. ACM Transactions on Graphics, Vol. 30, No. 1, pp. 1:1–
1:20, 2011.
[Brow 07]
B. J. Brown and S. Rusinkiewicz. “Global Non-rigid Alignment of 3-D
Scans”. ACM Transactions on Graphics, Vol. 26, No. 3, pp. 21:1–21:9, Jul
2007.
[Bruh 05]
A. Bruhn, J. Weickert, and C. Schnörr. “Lucas/Kanade meets
Horn/Schunck: Combining Local and Global Optic Flow Methods”.
International Journal of Computer Vision, Vol. 61, No. 3, pp. 211–231,
2005.
[Bruh 12]
A. Bruhn, T. Pock, and X.-C. Tai. “Efficient Algorithms for Global
Optimisation Methods in Computer Vision”. Dagstuhl Reports, Vol. 1,
No. 11, pp. 66–90, 2012.
[Bruy 05]
P. Bruyant, M. A. Gennert, G. Speckert, R. Beach, J. Morgenstern,
N. Kumar, S. Nadella, and M. King. “A Robust Visual Tracking System
for Patient Motion Detection in SPECT: Hardware Solutions”. IEEE
Transactions on Nuclear Science, Vol. 52, No. 5, pp. 1288–1294, 2005.
[Buad 09]
T. Buades, Y. Lou, J. Morel, and Z. Tang. “A Note on Multi-image
Denoising”. In: International Workshop on Local and Non-Local Approximation in Image Processing (LNLA), pp. 1–15, Aug 2009.
[Bust 05]
B. Bustos, D. A. Keim, D. Saupe, T. Schreck, and D. V. Vranić. “Feature-based Similarity Search in 3D Object Databases”. ACM Computing
Surveys, Vol. 37, pp. 345–387, 2005.
[Cach 00]
P. Cachier and D. Rey. “Symmetrization of the Non-rigid Registration
Problem Using Inversion-Invariant Energies: Application to Multiple
Sclerosis”. In: S. Delp, A. DiGoia, and B. Jaramaz, Eds., International
Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), pp. 697–708, LNCS 1935, Springer, Oct 2000.
[Carc 02]
R. L. Carceroni and K. N. Kutulakos. “Multi-View Scene Capture by
Surfel Sampling: From Video Streams to Non-Rigid 3D Motion, Shape
and Reflectance”. International Journal of Computer Vision, Vol. 49,
No. 2-3, pp. 175–214, Sep 2002.
[Cash 07]
D. Cash, M. Miga, S. Glasgow, B. Dawant, L. Clements, Z. Cao, R. Galloway, and W. Chapman. “Concepts and Preliminary Data Toward the
Realization of Image-guided Liver Surgery”. Journal of Gastrointestinal
Surgery, Vol. 11, No. 7, pp. 844–859, Jul 2007.
[Catu 12]
D. Catuhe. Programming With the Kinect for Windows Software Development Kit: Add Gesture and Posture Recognition to Your Applications.
Microsoft Press Series, Microsoft Press, 2012.
[Cayt 10]
L. Cayton. “A Nearest Neighbor Data Structure for Graphics Hardware”. In: International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures (ADMS), pp. 1–
10, Sep 2010.
[Cayt 11]
L. Cayton. “Accelerating Nearest Neighbor Search on Manycore Systems”. In: International Parallel and Distributed Processing Symposium
(IPDPS), pp. 402–413, IEEE, May 2011.
[Chan 11]
Y.-J. Chang, S.-F. Chen, and J.-D. Huang. “A Kinect-based System
for Physical Rehabilitation: A Pilot Study for Young Adults with Motor Disabilities”. Research in Developmental Disabilities, Vol. 32, No. 6,
pp. 2566–2570, 2011.
[Chan 12]
C.-Y. Chang, B. Lange, M. Zhang, S. Koenig, P. Requejo, N. Somboon,
A. A. Sawchuk, and A. A. Rizzo. “Towards Pervasive Physical Rehabilitation Using Microsoft Kinect”. In: International Conference on Pervasive Computing Technologies for Healthcare (PervasiveHealth), pp. 159–
162, IEEE, May 2012.
[Chen 09]
C.-I. Chen. Automated Model Building from Video in Computer-aided Diagnosis in Colonoscopy. PhD thesis, University of California, Santa Barbara, 2009.
[Chen 10]
C.-I. Chen, D. Sargent, and Y.-F. Wang. “Modeling Tumor/Polyp/Lesion Structure in 3D for Computer-aided Diagnosis in Colonoscopy”.
In: SPIE Medical Imaging, pp. 76252F–8, Mar 2010.
[Chen 92]
Y. Chen and G. Medioni. “Object Modelling by Registration of Multiple Range Images”. Image and Vision Computing, Vol. 10, No. 3,
pp. 145–155, Apr 1992.
[Chri 01]
G. Christensen and H. Johnson. “Consistent Image Registration”.
IEEE Transactions on Medical Imaging, Vol. 20, No. 7, pp. 568–582, 2001.
[Chua 97]
C. S. Chua and R. Jarvis. “Point Signatures: A New Representation
for 3D Object Recognition”. International Journal of Computer Vision,
Vol. 25, pp. 63–85, 1997.
[Chui 03]
H. Chui and A. Rangarajan. “A New Point Matching Algorithm for
Non-rigid Registration”. Computer Vision and Image Understanding,
Vol. 89, No. 2-3, pp. 114–141, 2003.
[Clan 11]
N. T. Clancy, D. Stoyanov, L. Maier-Hein, A. Groch, G.-Z. Yang,
and D. S. Elson. “Spectrally Encoded Fiber-based Structured Lighting Probe for Intraoperative 3D Imaging”. Biomedical Optics Express,
Vol. 2, No. 11, pp. 3119–3128, Nov 2011.
[Clem 08]
L. W. Clements, W. C. Chapman, B. M. Dawant, R. L. Galloway, Jr, and
M. I. Miga. “Robust Surface Registration using Salient Anatomical
Features for Image-guided Liver Surgery: Algorithm and Validation”.
Medical Physics, Vol. 35, No. 6, pp. 2528–2540, Jun 2008.
[Cola 12]
A. Colaco, A. Kirmani, G. A. Howland, J. C. Howell, and V. K.
Goyal. “Compressive Depth Map Acquisition using a Single Photon-counting Detector: Parametric Signal Processing meets Sparsity”.
In: International Conference on Computer Vision and Pattern Recognition
(CVPR), pp. 96–102, Jun 2012.
[Coll 12]
T. Collins and A. Bartoli. “3D Reconstruction in Laparoscopy with
Close-Range Photometric Stereo”. In: N. Ayache, H. Delingette,
P. Golland, and K. Mori, Eds., International Conference on Medical Image
Computing and Computer Assisted Intervention (MICCAI), pp. 634–642,
LNCS 7511, Springer, Oct 2012.
[Comb 10] B. Combès and S. Prima. “An Efficient EM-ICP Algorithm for Symmetric Consistent Non-linear Registration of Point Sets”. In: International Conference on Medical Image Computing and Computer Assisted
Intervention (MICCAI), pp. 594–601, LNCS 6362, Part II, Springer, Sep
2010.
[Coro 12]
A. Coronato and L. Gallo. “Towards Abnormal Behavior Detection
of Cognitive Impaired People”. In: International Workshop on Sensor
Networks and Ambient Intelligence, pp. 859–864, IEEE, Mar 2012.
[Curl 96]
B. Curless and M. Levoy. “A Volumetric Method for Building Complex Models from Range Images”. In: SIGGRAPH Conference on Computer Graphics and Interactive Techniques, pp. 303–312, ACM, 1996.
[Dala 05]
N. Dalal and B. Triggs. “Histograms of Oriented Gradients for Human
Detection”. In: International Conference on Computer Vision and Pattern
Recognition (CVPR), pp. 886–893, IEEE, Jun 2005.
[Daum 11] V. Daum. Model-Constrained Non-Rigid Registration in Medicine. PhD
thesis, Friedrich-Alexander-Universität Erlangen-Nürnberg, 2011.
[Demp 77] A. P. Dempster, N. M. Laird, and D. B. Rubin. “Maximum Likelihood
from Incomplete Data via the EM Algorithm”. Journal of the Royal Statistical Society. Series B (Methodological), Vol. 39, No. 1, pp. 1–38, 1977.
[Depu 11]
T. Depuydt, D. Verellen, O. Haas, T. Gevaert, N. Linthout,
M. Duchateau, K. Tournel, T. Reynders, K. Leysen, M. Hoogeman,
G. Storme, and M. D. Ridder. “Geometric Accuracy of a Novel Gimbals based Radiation Therapy Tumor Tracking System”. Radiotherapy
and Oncology, Vol. 98, No. 3, pp. 365–372, 2011.
[Diet 11]
S. Dieterich, C. Cavedon, C. F. Chuang, A. B. Cohen, J. A. Garrett,
C. L. Lee, J. R. Lowenstein, M. F. d’Souza, D. D. Taylor, X. Wu, and
C. Yu. “Report of AAPM TG 135: Quality Assurance for Robotic Radiosurgery”. Medical Physics, Vol. 38, No. 6, pp. 2914–2936, Jun 2011.
[Dora 97]
C. Dorai, J. Weng, and A. K. Jain. “Optimal Registration of Object
Views Using Range Data”. IEEE Transactions on Pattern Analysis and
Machine Intelligence, Vol. 19, No. 10, pp. 1131–1138, 1997.
[Dorr 11]
A. A. Dorrington, J. P. Godbaz, M. J. Cree, A. D. Payne, and L. V.
Streeter. “Separating True Range Measurements from Multi-path and
Scattering Interference in Commercial Range Cameras”. In: SPIE Electronic Imaging, pp. 786404–10, 2011.
[Dros 07]
M. Droske and M. Rumpf. “Multiscale Joint Segmentation and Registration of Image Morphology”. IEEE Transactions on Pattern Analysis
and Machine Intelligence, Vol. 29, No. 12, pp. 2181–2194, Dec 2007.
[Druo 06]
S. Druon, M. Aldon, and A. Crosnier. “Color Constrained ICP for Registration of Large Unstructured 3D Color Data Sets”. In: International
Conference on Information Acquisition, pp. 249–255, IEEE, Aug 2006.
[Enge 11]
N. Engelhard, F. Endres, J. Hess, J. Sturm, and W. Burgard. “Realtime 3D Visual SLAM with a Hand-held RGB-D Camera”. In: RGB-D
Workshop on 3D Perception in Robotics, European Robotics Forum, Apr
2011.
[Eom 09]
J. Eom, C. Shi, X. G. Xu, and S. De. “Modeling Respiratory Motion for
Cancer Radiation Therapy Based on Patient-Specific 4DCT Data”. In:
International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), pp. 348–355, LNCS 5762, Part II, Springer,
Sep 2009.
[Eom 10]
J. Eom, X. G. Xu, S. De, and C. Shi. “Predictive Modeling of Lung
Motion over the entire Respiratory Cycle using Measured Pressure-volume Data, 4DCT Images, and Finite-Element Analysis”. Medical
Physics, Vol. 37, No. 8, pp. 4389–4400, Aug 2010.
[Erns 12]
F. Ernst, R. Bruder, A. Schlaefer, and A. Schweikard. “Correlation between External and Internal Respiratory Motion: a Validation Study”.
International Journal of Computer Assisted Radiology and Surgery, Vol. 7,
pp. 483–492, 2012.
[Essa 02]
S. Essapen, C. Knowles, A. Norman, and D. Tait. “Accuracy of Set-up
of Thoracic Radiotherapy: Prospective Analysis of 24 Patients Treated
with Radiotherapy for Lung Cancer”. British Journal of Radiology,
Vol. 75, No. 890, pp. 162–169, Feb 2002.
[Ettl 12a]
S. Ettl, S. Fouladi-Movahed, S. Bauer, O. Arold, F. Willomitzer, F. Huber, S. Rampp, H. Stefan, J. Hornegger, and G. Häusler. “Medical Applications Enabled by a Motion-robust Optical 3D Sensor”. In: German
Branch of the European Optical Society (DGaO) Annual Meeting, p. P22,
May 2012.
[Ettl 12b]
S. Ettl, O. Arold, Z. Yang, and G. Häusler. “Flying Triangulation –
An Optical 3D Sensor for the Motion-robust Acquisition of Complex
Objects”. Applied Optics, Vol. 51, No. 2, pp. 281–289, 2012.
[Fali 08]
D. Falie, M. Ichim, and L. David. “Respiratory Motion Visualization
and the Sleep Apnea Diagnosis with the Time of Flight (ToF) Camera”. In: International Conference on Visualization, Imaging and Simulation (VIS), pp. 179–184, WSEAS, Nov 2008.
[Faug 04]
O. Faugeras, Q. Luong, and T. Papadopoulo. The Geometry of Multiple
Images: The Laws That Govern the Formation of Multiple Images of a Scene
and Some of Their Applications. MIT Press, 2004.
[Faya 09]
H. Fayad, T. Pan, C. Roux, C. Le Rest, O. Pradier, J. Clement, and
D. Visvikis. “A Patient Specific Respiratory Model based on 4D CT
Data and a Time of Flight Camera (TOF)”. In: Nuclear Science Symposium and Medical Imaging Conference (NSS MIC), pp. 2594–2598, IEEE,
Oct 2009.
[Faya 11]
H. Fayad, T. Pan, J. F. Clement, and D. Visvikis. “Correlation of Respiratory Motion between External Patient Surface and Internal Anatomical Landmarks”. Medical Physics, Vol. 38, No. 6, pp. 3157–3164, 2011.
[Fero 04]
O. Féron and A. Mohammad-Djafari. “Image Fusion and Unsupervised Joint Segmentation using a HMM and MCMC Algorithms”.
Journal of Electronic Imaging, Vol. 15, No. 02, p. 023014, May 2004.
[Fiel 88]
D. A. Field. “Laplacian Smoothing and Delaunay Triangulations”.
Communications in Applied Numerical Methods, Vol. 4, No. 6, pp. 709–
712, 1988.
[Fisc 02]
B. Fischer and J. Modersitzki. Inverse Problems, Image Analysis, and
Medical Imaging: AMS Special Session on Interaction of Inverse Problems
and Image Analysis, Chap. Fast Diffusion Registration, pp. 117–127.
Contemporary Mathematics - American Mathematical Society, AMS, 2002.
[Fitz 03]
A. W. Fitzgibbon. “Robust Registration of 2D and 3D Point Sets”.
Image and Vision Computing, Vol. 21, No. 13-14, pp. 1145–1153, 2003.
[Fleu 02]
M. Fleute, S. Lavallée, and L. Desbat. “Integrated Approach for
Matching Statistical Shape Models with Intra-operative 2D and 3D
Data”. In: International Conference on Medical Image Computing and
Computer Assisted Intervention (MICCAI), pp. 364–372, LNCS 2489, Part
II, Springer, Sep 2002.
[Fleu 99]
M. Fleute, S. Lavallée, and R. Julliard. “Incorporating a Statistically
based Shape Model into a System for Computer-Assisted Anterior
Cruciate Ligament Surgery”. Medical Image Analysis, Vol. 3, No. 3,
pp. 209–222, Sep 1999.
[Fluc 11]
O. Fluck, C. Vetter, W. Wein, A. Kamen, B. Preim, and R. Westermann.
“A Survey of Medical Image Registration on Graphics Hardware”.
Computer Methods and Programs in Biomedicine, Vol. 104, No. 3, pp. 45–
57, 2011.
[Foix 11]
S. Foix, G. Alenya, and C. Torras. “Lock-in Time-of-Flight (ToF) Cameras: A Survey”. IEEE Sensors Journal, Vol. 11, No. 9, pp. 1917–1926,
Sep 2011.
[Ford 02]
E. C. Ford, G. S. Mageras, E. Yorke, K. E. Rosenzweig, R. Wagman,
and C. C. Ling. “Evaluation of Respiratory Movement during Gated
Radiotherapy using Film and Electronic Portal Imaging”. International
Journal of Radiation Oncology Biology Physics, Vol. 52, No. 2, pp. 522–
531, Feb 2002.
[Foth 12]
S. Fothergill, H. Mentis, P. Kohli, and S. Nowozin. “Instructing People
for Training Gestural Interactive Systems”. In: SIGCHI Conference on
Human Factors in Computing Systems, pp. 1737–1746, ACM, 2012.
[Fran 02]
A. Frangi, D. Rueckert, J. Schnabel, and W. Niessen. “Automatic Construction of Multiple-object Three-dimensional Statistical Shape Models: Application to Cardiac Modeling”. IEEE Transactions on Medical
Imaging, Vol. 21, No. 9, pp. 1151–1166, Sep 2002.
[Fran 09a]
M. Frank, M. Plaue, and F. A. Hamprecht. “Denoising of Continuous-Wave Time-Of-Flight Depth Images using Confidence Measures”. Optical Engineering, Vol. 48, No. 7, p. 077003, Jul 2009.
[Fran 09b]
M. Frank, M. Plaue, H. Rapp, U. Köthe, B. Jähne, and F. A. Hamprecht.
“Theoretical and Experimental Error Analysis of Continuous-Wave
Time-of-Flight Range Cameras”. Optical Engineering, Vol. 48, No. 1,
p. 013602, 2009.
[Free 10a]
B. Freedman, A. Shpunt, and Y. Arieli. “Distance-Varying Illumination and Imaging Techniques for Depth Mapping”. Patent, Nov 2010.
US20100290698.
[Free 10b]
B. Freedman, A. Shpunt, M. Machline, and Y. Arieli. “Depth Mapping
using Projected Patterns”. Patent, May 2010. US20100118123.
[Fren 09]
T. Frenzel. “Patient Setup using a 3D Laser Surface Scanning System”. In: World Congress on Medical Physics and Biomedical Engineering,
pp. 217–220, IFMBE, Springer, Sep 2009.
[From 04]
A. Frome, D. Huber, R. Kolluri, T. Bülow, and J. Malik. “Recognizing
Objects in Range Data Using Regional Point Descriptors”. In: European Conference on Computer Vision (ECCV), pp. 224–237, Springer,
May 2004.
[Fuch 08a] S. Fuchs and G. Hirzinger. “Extrinsic and Depth Calibration of ToF-Cameras”. In: International Conference on Computer Vision and Pattern
Recognition (CVPR), pp. 1–6, IEEE, Jun 2008.
[Fuch 08b] S. Fuchs and S. May. “Calibration and Registration for Precise Surface Reconstruction with Time-of-Flight Cameras”. International Journal of Intelligent Systems Technologies and Applications, Vol. 5, No. 3/4,
pp. 274–284, Nov 2008.
[Fuch 10]
S. Fuchs. “Multipath Interference Compensation in Time-of-Flight
Camera Images”. In: International Conference on Pattern Recognition
(ICPR), pp. 3583–3586, Aug 2010.
[Funk 06]
T. Funkhouser and P. Shilane. “Partial Matching of 3D Shapes
with Priority-driven Search”. In: Symposium on Geometry Processing,
pp. 131–142, Eurographics Association, 2006.
[Gal 06]
R. Gal and D. Cohen-Or. “Salient Geometric Features for Partial Shape
Matching and Similarity”. ACM Transactions on Graphics, Vol. 25,
pp. 130–150, Jan 2006.
[Gall 10]
S. Gallo, D. Chapuis, L. Santos-Carreras, Y. Kim, P. Retornaz,
H. Bleuler, and R. Gassert. “Augmented White Cane with Multimodal
Haptic Feedback”. In: International Conference on Biomedical Robotics
and Biomechatronics (BioRob), pp. 149–155, IEEE, RAS, EMBS, Sep 2010.
[Gall 11]
L. Gallo, A. P. Placitelli, and M. Ciampi. “Controller-Free Exploration
of Medical Image Data: Experiencing the Kinect”. In: International
Symposium on Computer-Based Medical Systems (CBMS), pp. 1–6, IEEE,
2011.
[Gama 12] A. da Gama, T. Chaves, L. Figueiredo, and V. Teichrieb. “Improving
Motor Rehabilitation Process through a Natural Interaction based System using Kinect Sensor”. In: Symposium on 3D User Interfaces (3DUI),
pp. 145–146, IEEE, Mar 2012.
[Garc 08]
J. Garcia and Z. Zalevsky. “Range Mapping using Speckle Decorrelation”. Patent, Feb 2008. US7433024.
[Garc 12]
J. A. Garcia, K. F. Navarro, D. Schoene, S. T. Smith, and Y. Pisan.
Health Informatics: Building a Healthcare Future Through Trusted Information, Chap. Exergames for the Elderly: Towards an Embedded Kinect-based Clinical Test of Falls Risk, pp. 51–57. Studies in Health Technology
and Informatics, IOS, 2012.
[Garg 13]
R. Garg, A. Roussos, and L. Agapito. “A Variational Approach to
Video Registration with Subspace Constraints”. International Journal
of Computer Vision, pp. 1–29, 2013.
[Gelf 05]
N. Gelfand, N. Mitra, L. Guibas, and H. Pottmann. “Robust Global
Registration”. In: M. Desbrun and H. Pottmann, Eds., Symposium on Geometry Processing, pp. 197–206, Eurographics Association, 2005.
[Geve 99]
T. Gevers and A. W. Smeulders. “Color-based Object Recognition”.
Pattern Recognition, Vol. 32, No. 3, pp. 453–464, 1999.
[Gian 11]
C. Gianoli, M. Riboldi, M. F. Spadea, L. L. Travaini, M. Ferrari, R. Mei,
R. Orecchia, and G. Baroni. “A Multiple Points Method for 4D CT
Image Sorting”. Medical Physics, Vol. 38, No. 2, pp. 656–667, 2011.
[Gier 08]
D. P. Gierga, M. Riboldi, J. C. Turcotte, G. C. Sharp, S. B. Jiang, A. G.
Taghian, and G. T. Chen. “Comparison of Target Registration Errors
for Multiple Image-Guided Techniques in Accelerated Partial Breast
Irradiation”. International Journal of Radiation Oncology Biology Physics,
Vol. 70, No. 4, pp. 1239–1246, 2008.
[Gies 12]
M. van de Giessen, F. M. Vos, C. A. Grimbergen, L. J. van Vliet,
and G. J. Streekstra. “An Efficient and Robust Algorithm for Parallel Groupwise Registration of Bone Surfaces”. In: International Conference on Medical Image Computing and Computer Assisted Intervention
(MICCAI), pp. 164–171, LNCS 7512, Part III, Springer, Oct 2012.
[Godi 94]
G. Godin, M. Rioux, and R. Baribeau. “Three-Dimensional Registration using Range and Intensity Information”. SPIE Videometrics,
pp. 279–290, Nov 1994.
[Gott 11]
J.-M. Gottfried, J. Fehr, and C. S. Garbe. “Computing Range Flow
from Multi-modal Kinect Data”. In: International Symposium on Visual
Computing (ISVC), pp. 758–767, Springer, Jul 2011.
[Gran 02]
S. Granger and X. Pennec. “Multi-scale EM-ICP: A Fast and Robust
Approach for Surface Registration”. In: European Conference on Computer Vision (ECCV), pp. 418–432, May 2002.
[Grim 12]
R. Grimm, S. Bauer, J. Sukkau, J. Hornegger, and G. Greiner. “Markerless Estimation of Patient Orientation, Posture and Pose using Range
and Pressure Imaging”. International Journal of Computer Assisted Radiology and Surgery, Vol. 7, No. 6, pp. 921–929, Nov 2012.
[Haas 12]
S. Haase, C. Forman, T. Kilgus, R. Bammer, L. Maier-Hein, and
J. Hornegger. “ToF/RGB Sensor Fusion for Augmented 3-D Endoscopy using a Fully Automatic Calibration Scheme”. In: T. Tolxdorff, T. M. Deserno, H. Handels, and H.-P. Meinzer, Eds., Bildverarbeitung für die Medizin (BVM), pp. 111–116, Springer, Mar 2012.
[Haas 13]
S. Haase, J. Wasza, T. Kilgus, and J. Hornegger. “Laparoscopic Instrument Localization using a 3-D Time-of-Flight/RGB Endoscope”.
In: Workshop on Applications of Computer Vision (WACV), pp. 449–454,
IEEE, Jan 2013.
[Hadf 11]
S. Hadfield and R. Bowden. “Kinecting the Dots: Particle based Scene
Flow from Depth Sensors”. In: International Conference on Computer
Vision (ICCV), pp. 2290–2295, IEEE, Nov 2011.
[Hajn 01]
J. Hajnal, D. Hawkes, and D. Hill. Medical Image Registration. Biomedical Engineering Series, Taylor & Francis Group, 2001.
[Han 07]
J. Han, B. Berkels, M. Droske, J. Hornegger, M. Rumpf, C. Schaller,
J. Scorzin, and H. Urbach. “Mumford-Shah Model for One-to-One
Edge Matching”. IEEE Transactions on Image Processing, Vol. 16, No. 11,
pp. 2720–2732, 2007.
[Hans 13]
M. Hansard, S. Lee, O. Choi, and R. Horaud. Time-of-Flight Cameras:
Principles, Methods and Applications. SpringerBriefs in Computer Science,
Springer, 2013.
[Hart 04]
R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer
Vision. Cambridge University Press, 2nd Ed., 2004.
[Hasl 09]
N. Hasler, C. Stoll, M. Sunkel, B. Rosenhahn, and H.-P. Seidel. “A
Statistical Model of Human Pose and Body Shape”. In: P. Dutré and
M. Stamminger, Eds., Computer Graphics Forum, pp. 337–346, Eurographics Association, Mar 2009.
[Haus 11]
G. Häusler and S. Ettl. “Limitations of Optical 3D Sensors”. In:
R. Leach, Ed., Optical Measurement of Surface Topography, pp. 23–48,
Springer, 2011.
[He 13]
K. He, J. Sun, and X. Tang. “Guided Image Filtering”. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 35, No. 6,
pp. 1397–1409, 2013.
[Heim 09]
T. Heimann and H.-P. Meinzer. “Statistical Shape Models for 3D Medical Image Segmentation: A Review”. Medical Image Analysis, Vol. 13,
No. 4, pp. 543–563, 2009.
[Henr 12]
P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox. “RGB-D Mapping:
Using Kinect-style Depth Cameras for Dense 3D Modeling of Indoor
Environments”. International Journal of Robotics Research, Vol. 31, No. 5,
pp. 647–663, 2012.
[Herb 13]
E. Herbst, X. Ren, and D. Fox. “RGB-D Flow: Dense 3-D Motion
Estimation Using Color and Depth”. In: International Conference on
Robotics and Automation (ICRA), p. to appear, IEEE, May 2013.
[Herl 99a]
A. Herline et al. “Image-guided Surgery: Preliminary Feasibility
Studies of Frameless Stereotactic Liver Surgery”. Archives of Surgery,
Vol. 134, No. 6, pp. 644–650, 1999.
[Herl 99b]
A. J. Herline, J. L. Herring, J. D. Stefansic, W. C. Chapman, R. L. Galloway, and B. M. Dawant. “Surface Registration for Use in Interactive Image-Guided Liver Surgery”. In: International Conference on
Medical Image Computing and Computer-Assisted Intervention (MICCAI),
pp. 892–899, LNCS 1679, Springer, Sep 1999.
[Hers 08]
M. Hersh and M. Johnson. Assistive Technology for Visually Impaired and
Blind People. Springer, 2008.
[Higg 08]
W. E. Higgins, J. P. Helferty, K. Lu, S. A. Merritt, L. Rai, and K.-C. Yu.
“3D CT-Video Fusion for Image-guided Bronchoscopy”. Computerized
Medical Imaging and Graphics, Vol. 32, No. 3, pp. 159–173, Apr 2008.
[Hilt 96]
A. Hilton, A. J. Stoddart, J. Illingworth, and T. Windeatt. “Reliable Surface Reconstruction from Multiple Range Images”. In: European Conference on Computer Vision (ECCV), pp. 117–126, Springer, Apr 1996.
[Hoff 07]
G. Hoff, M. Bretthauer, S. Dahler, G. Huppertz-Hauss, J. Sauar,
J. Paulsen, B. Seip, and V. Moritz. “Improvement in Caecal Intubation
Rate and Pain Reduction by using 3-dimensional Magnetic Imaging
for Unsedated Colonoscopy: A Randomized Trial of Patients referred
for Colonoscopy”. Scandinavian Journal of Gastroenterology, Vol. 42,
No. 7, pp. 885–889, 2007.
[Holz 12]
S. Holzer, J. Shotton, and P. Kohli. “Learning to Efficiently Detect Repeatable Interest Points in Depth Data”. In: European Conference on
Computer Vision (ECCV), pp. 200–213, Springer, Oct 2012.
[Hoog 09]
M. Hoogeman, J.-B. Prévost, J. Nuyttens, et al. “Clinical Accuracy of
the Respiratory Tumor Tracking System of the Cyberknife: Assessment by Analysis of Log Files”. International Journal of Radiation Oncology Biology Physics, Vol. 74, No. 1, pp. 297–303, 2009.
[Horn 81]
B. K. P. Horn and B. G. Schunck. “Determining Optical Flow”. Artificial Intelligence, Vol. 17, No. 1-3, pp. 185–203, 1981.
[Horn 87]
B. K. P. Horn. “Closed-form Solution of Absolute Orientation using
Unit Quaternions”. Journal of the Optical Society of America, Vol. 4,
No. 4, pp. 629–642, Apr 1987.
[Huan 06]
X. Huang, N. Paragios, and D. N. Metaxas. “Shape Registration in
Implicit Spaces Using Information Theory and Free Form Deformations”. IEEE Transactions on Pattern Analysis and Machine Intelligence,
Vol. 28, No. 8, pp. 1303–1318, 2006.
[Huan 11]
J.-D. Huang. “Kinerehab: A Kinect-based System for Physical Rehabilitation: A Pilot Study for Young Adults with Motor Disabilities”.
In: International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS), pp. 319–320, 2011.
[Hugu 07]
F. Huguet and F. Devernay. “A Variational Method for Scene Flow Estimation from Stereo Sequences”. In: International Conference on Computer Vision (ICCV), pp. 1–7, IEEE, Oct 2007.
[Huhl 08]
B. Huhle, P. Jenke, and W. Strasser. “On-the-fly Scene Acquisition
with a Handy Multi-sensor System”. International Journal of Intelligent
Systems Technologies and Applications, Vol. 5, pp. 255–263, Nov 2008.
[Iban 05]
L. Ibanez, W. Schroeder, L. Ng, and J. Cates. The ITK Software Guide.
Kitware, Inc., 2nd Ed., 2005.
[Izad 11]
S. Izadi, R. A. Newcombe, D. Kim, O. Hilliges, D. Molyneaux,
S. Hodges, P. Kohli, J. Shotton, A. J. Davison, and A. W. Fitzgibbon.
“KinectFusion: Real-time Dynamic 3D Surface Reconstruction and Interaction”. In: SIGGRAPH Talks, p. 23, ACM, 2011.
[Jahn 99]
B. Jähne, H. Haussecker, and P. Geissler. Handbook of Computer Vision
and Applications: Sensors and Imaging. Handbook of Computer Vision and
Applications, Academic Press, 1999.
[Jarv 83]
R. A. Jarvis. “A Perspective on Range Finding Techniques for Computer Vision”. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 5, No. 2, pp. 122–139, 1983.
[Jens 12]
R. R. Jensen, O. V. Olesen, R. R. Paulsen, M. van der Poel, and
R. Larsen. “Statistical Surface Recovery: A Study on Ear Canals”. In:
MICCAI Workshop on Mesh Processing in Medical Image Analysis, pp. 49–
58, Oct 2012.
[Jian 05]
B. Jian and B. C. Vemuri. “A Robust Algorithm for Point Set Registration Using Mixture of Gaussians”. In: International Conference on
Computer Vision (ICCV), pp. 1246–1251, IEEE, Oct 2005.
[Jian 11]
B. Jian and B. C. Vemuri. “Robust Point Set Registration Using Gaussian Mixture Models”. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 33, No. 8, pp. 1633–1645, 2011.
[John 02]
H. Johnson and G. Christensen. “Consistent landmark and intensity-based image registration”. IEEE Transactions on Medical Imaging,
Vol. 21, No. 5, pp. 450–461, 2002.
[John 11]
R. Johnson, K. O’Hara, A. Sellen, C. Cousins, and A. Criminisi. “Exploring the Potential for Touchless Interaction in Image-guided Interventional Radiology”. In: Conference on Computer-Human Interaction
(CHI), pp. 3323–3332, ACM, 2011.
[John 97]
A. Johnson and S. B. Kang. “Registration and Integration of Textured
3-D Data”. In: International Conference on Recent Advances in 3-D Digital
Imaging and Modeling, pp. 234–241, IEEE, May 1997.
[John 98]
A. Johnson and M. Hebert. “Surface Matching for Object Recognition
in Complex Three-dimensional Scenes”. Image and Vision Computing,
Vol. 16, No. 9-10, pp. 635–651, 1998.
[John 99]
A. E. Johnson and M. Hebert. “Using Spin Images for Efficient Object Recognition in Cluttered 3D Scenes”. IEEE Transactions on Pattern
Analysis and Machine Intelligence, Vol. 21, pp. 433–449, May 1999.
[Jone 06]
M. W. Jones, J. A. Bærentzen, and M. Srámek. “3D Distance Fields: A
Survey of Techniques and Applications”. IEEE Transactions on Visualization and Computer Graphics, Vol. 12, No. 4, pp. 581–599, 2006.
[Joun 09]
J. H. Joung, K. H. An, J. W. Kang, M. J. Chung, and W. Yu. “3D Environment Reconstruction using Modified Color ICP Algorithm by Fusion of a Camera and a 3D Laser Range Finder”. In: International
Conference on Intelligent Robots and Systems, pp. 3082–3088, IEEE, RSJ,
Oct 2009.
[Kaic 11]
O. van Kaick, H. Zhang, G. Hamarneh, and D. Cohen-Or. “A Survey
on Shape Correspondence”. Computer Graphics Forum, Vol. 30, No. 6,
pp. 1681–1707, 2011.
[Kain 12]
B. Kainz, S. Hauswiesner, G. Reitmayr, M. Steinberger, R. Grasset,
L. Gruber, E. E. Veas, D. Kalkofen, H. Seichter, and D. Schmalstieg.
“OmniKinect: Real-Time Dense Volumetric Data Acquisition and Applications”. In: Symposium on Virtual Reality Software and Technology
(VRST), pp. 25–32, ACM, 2012.
[Kapu 01]
T. Kapur, L. Yezzi, and L. Zöllei. “A Variational Framework for Joint
Segmentation and Registration”. In: Workshop on Mathematical Methods in Biomedical Image Analysis (WMMBIA), pp. 44–51, IEEE, 2001.
[Katz 12]
B. Katz, S. Kammoun, G. Parseihian, O. Gutierrez, A. Brilhault, M. Auvray, P. Truillet, M. Denis, S. Thorpe, and C. Jouffrais. “NAVIG: Augmented Reality Guidance System for the Visually Impaired”. Virtual
Reality, Vol. 16, pp. 253–269, 2012.
[Kazh 03]
M. M. Kazhdan, T. A. Funkhouser, and S. Rusinkiewicz. “Rotation Invariant Spherical Harmonic Representation of 3D Shape Descriptors”.
In: Symposium on Geometry Processing, pp. 156–164, ACM, 2003.
[Keal 06]
P. J. Keall, G. S. Mageras, J. M. Balter, R. S. Emery, K. M. Forster, S. B.
Jiang, J. M. Kapatoes, D. A. Low, M. J. Murphy, B. R. Murray, C. R.
Ramsey, M. B. V. Herk, S. S. Vedam, J. W. Wong, and E. Yorke. “The
Management of Respiratory Motion in Radiation Oncology: Report of
AAPM Task Group 76”. Medical Physics, Vol. 33, No. 10, pp. 3874–3900,
Oct 2006.
[Kell 09]
M. Keller and A. Kolb. “Real-time Simulation of Time-of-Flight Sensors”. Simulation Modelling Practice and Theory, Vol. 17, No. 5, pp. 967–
978, 2009.
[Khai 08]
K. Khairy and J. Howard. “Spherical Harmonics-based Parametric
Deconvolution of 3D Surface Images using Bending Energy Minimization”. Medical Image Analysis, Vol. 12, No. 2, pp. 217–227, 2008.
[Khos 12]
K. Khoshelham and S. O. Elberink. “Accuracy and Resolution of
Kinect Depth Data for Indoor Mapping Applications”. Sensors, Vol. 12,
No. 2, pp. 1437–1454, 2012.
[Kilb 10]
W. Kilby, J. R. Dooley, G. Kuduvalli, S. Sayeh, and C. R. Maurer. “The
CyberKnife Robotic Radiosurgery System in 2010”. Technology in Cancer Research and Treatment, Vol. 9, No. 5, pp. 433–452, Oct 2010.
[Knut 93]
H. Knutsson and C.-F. Westin. “Normalized and Differential Convolution: Methods for Interpolation and Filtering of Incomplete and
Uncertain Data”. In: International Conference on Computer Vision and
Pattern Recognition (CVPR), pp. 515–523, IEEE, Jun 1993.
[Kolb 09]
A. Kolb, E. Barth, R. Koch, and R. Larsen. “Time-of-Flight Sensors in
Computer Graphics”. In: Eurographics State of the Art Reports, pp. 119–
134, 2009.
[Kopp 07]
D. Koppel, C.-I. Chen, Y.-F. Wang, H. Lee, J. Gu, A. Poirson, and
R. Wolters. “Toward Automated Model Building from Video in
Computer-assisted Diagnoses in Colonoscopy”. In: SPIE Medical
Imaging, pp. 65091L–9, Feb 2007.
[Kren 09]
M. Krengli, S. Gaiano, E. Mones, A. Ballarè, D. Beldì, C. Bolchini, and
G. Loi. “Reproducibility of Patient Setup by Surface Image Registration System in Conformal Radiotherapy of Prostate Cancer”. Radiation
Oncology, Vol. 4, p. 9, 2009.
[Kubo 96]
H. D. Kubo and B. C. Hill. “Respiration Gated Radiotherapy Treatment: A Technical Study”. Physics in Medicine and Biology, Vol. 41,
No. 1, pp. 83–91, Jan 1996.
[Kuhn 06]
W. Kühnel. Differential Geometry: Curves - Surfaces - Manifolds. AMS,
2006.
[Kupe 07]
P. Kupelian, T. Willoughby, A. Mahadevan, T. Djemil, G. Weinstein,
S. Jani, C. Enke, T. Solberg, N. Flores, D. Liu, D. Beyer, and L. Levine.
“Multi-institutional Clinical Experience with the Calypso System in
Localization and Continuous, Real-time Monitoring of the Prostate
Gland during External Radiotherapy”. International Journal of Radiation Oncology Biology Physics, Vol. 67, No. 4, pp. 1088–1098, Mar 2007.
[Kurt 11]
S. Kurtek, E. Klassen, Z. Ding, S. Jacobson, J. Jacobson, M. Avison,
and A. Srivastava. “Parameterization-Invariant Shape Comparisons
of Anatomical Surfaces”. IEEE Transactions on Medical Imaging, Vol. 30,
No. 3, pp. 849–858, 2011.
[Ladi 08]
A. Ladikos, S. Benhimane, and N. Navab. “Real-Time 3D Reconstruction for Collision Avoidance in Interventional Environments”.
In: International Conference on Medical Image Computing and ComputerAssisted Intervention (MICCAI), pp. 526–534, LNCS 5242, Part II,
Springer, Sep 2008.
[Lang 00]
R. Lange. 3D Time-of-Flight Distance Measurement with Custom Solid-State Image Sensors in CMOS/CCD-Technology. PhD thesis, Universität
Siegen, 2000.
[Lang 01]
K. Langen and D. Jones. “Organ Motion and its Management”. International Journal of Radiation Oncology Biology Physics, Vol. 50, No. 1,
pp. 265–278, 2001.
[Lang 11]
B. Lange, C.-Y. Chang, E. Suma, B. Newman, A. Rizzo, and M. Bolas. “Development and Evaluation of Low Cost Game-based Balance
Rehabilitation Tool using the Microsoft Kinect Sensor”. In: International Conference of the IEEE Engineering in Medicine and Biology Society,
pp. 1831–1834, Aug 2011.
[Lea 12]
C. S. Lea, J. C. Fackler, G. D. Hager, and R. H. Taylor. “Towards Automated Activity Recognition in an Intensive Care Unit”. In: MICCAI
Workshop on Modeling and Monitoring of Computer Assisted Interventions,
Oct 2012.
[Lee 12]
S. Lee, B. Kang, J. D. Kim, and C. Y. Kim. “Motion Blur-free Time-of-Flight Range Sensor”. In: SPIE Sensors, Cameras, and Systems for
Industrial and Scientific Applications, pp. 82980U–6, 2012.
[Lenz 11]
F. Lenzen, H. Schäfer, and C. S. Garbe. “Denoising Time-Of-Flight
Data with Adaptive Total Variation”. In: International Symposium on
Visual Computing (ISVC), pp. 337–346, Springer, Jul 2011.
[Leto 11]
A. Letouzey, B. Petit, and E. Boyer. “Scene Flow from Depth and Color
Images”. In: J. Hoey, S. McKenna, and E. Trucco, Eds., British Machine
Vision Conference (BMVC), pp. 46.1–46.11, BMVA Press, Sep 2011.
[Li 05]
X. Li and I. Guskov. “Multi-scale Features for Approximate Alignment of Point-based Surfaces”. In: Symposium on Geometry Processing,
pp. 217–226, Eurographics Association, Jul 2005.
[Lind 10]
M. Lindner, I. Schiller, A. Kolb, and R. Koch. “Time-of-Flight Sensor
Calibration for Accurate Range Sensing”. Computer Vision and Image
Understanding, Vol. 114, No. 12, pp. 1318–1328, 2010. Special Issue on
Time-of-Flight Camera Based Computer Vision.
[Liu 08]
J. Liu, K. Subramanian, T. Yoo, and R. Van Uitert. “A Stable Optic-flow
Based Method for Tracking Colonoscopy Images”. In: CVPR Workshop
on Mathematical Methods in Biomedical Image Analysis (MMBIA), pp. 1–
8, IEEE, Jun 2008.
[Liu 09]
C. Liu. Beyond Pixels: Exploring New Representations and Applications
for Motion Analysis. PhD thesis, MIT, May 2009.
[Lore 87]
W. E. Lorensen and H. E. Cline. “Marching Cubes: A High Resolution
3D Surface Construction Algorithm”. In: SIGGRAPH, pp. 163–169,
ACM, Jul 1987.
[Lowe 04]
D. G. Lowe. “Distinctive Image Features from Scale-Invariant Keypoints”. International Journal of Computer Vision, Vol. 60, pp. 91–110,
Nov 2004.
[Luca 81]
B. D. Lucas and T. Kanade. “An Iterative Image Registration Technique with an Application to Stereo Vision”. In: International Joint
Conference on Artificial Intelligence (IJCAI), pp. 674–679, Aug 1981.
[Maie 11]
L. Maier-Hein, A. M. Franz, M. Fangerau, M. Schmidt, A. Seitel,
S. Mersmann, T. Kilgus, A. Groch, K. Yung, T. R. dos Santos, and
H.-P. Meinzer. “Towards Mobile Augmented Reality for On-Patient
Visualization of Medical Images”. In: H. Handels, J. Ehrhardt, T. M.
Deserno, H.-P. Meinzer, and T. Tolxdorff, Eds., Bildverarbeitung für die
Medizin (BVM), pp. 389–393, Springer, Mar 2011.
[Maie 12]
L. Maier-Hein, A. Franz, T. dos Santos, M. Schmidt, M. Fangerau, H.-P. Meinzer, and J. M. Fitzpatrick. “Convergent Iterative Closest-Point Algorithm to Accommodate Anisotropic and Inhomogenous Localization Error”. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 34, No. 8, pp. 1520–1532, 2012.
[Main 98]
J. B. A. Maintz and M. A. Viergever. “A Survey of Medical Image
Registration”. Medical Image Analysis, Vol. 2, No. 1, pp. 1–36, 1998.
[Mana 06]
S. Manay, D. Cremers, B.-W. Hong, A. J. Yezzi, and S. Soatto. “Integral
Invariants for Shape Matching”. IEEE Transactions on Pattern Analysis
and Machine Intelligence, Vol. 28, No. 10, pp. 1602–1618, 2006.
[Mark 10]
M. Markert, A. Koschany, and T. Lueth. “Tracking of the Liver for
Navigation in Open Surgery”. International Journal of Computer Assisted Radiology and Surgery, Vol. 5, No. 3, pp. 229–235, May 2010.
[Mark 12]
P. Markelj, D. Tomazevic, B. Likar, and F. Pernus. “A Review of 3D/2D
Registration Methods for Image-guided Interventions”. Medical Image
Analysis, Vol. 16, No. 3, pp. 642–661, 2012.
[Marq 63]
D. W. Marquardt. “An Algorithm for Least-squares Estimation of
Nonlinear Parameters”. Journal of the Society for Industrial & Applied
Mathematics, Vol. 11, No. 2, pp. 431–441, 1963.
[Mate 08]
D. Mateus, R. Horaud, D. Knossow, F. Cuzzolin, and E. Boyer. “Articulated Shape Matching using Laplacian Eigenfunctions and Unsupervised Point Registration”. In: International Conference on Computer
Vision and Pattern Recognition (CVPR), pp. 1–8, IEEE, Jun 2008.
[McCl 06]
J. McClelland, J. Blackall, S. Tarte, A. Chandler, S. Hughes, S. Ahmad,
D. Landau, and D. Hawkes. “A Continuous 4D Motion Model from
Multiple Respiratory Cycles for Use in Lung Radiotherapy”. Medical
Physics, Vol. 33, No. 9, pp. 3348–3358, 2006.
[McCl 13]
J. McClelland, D. Hawkes, T. Schaeffter, and A. King. “Respiratory
Motion Models: A Review”. Medical Image Analysis, Vol. 17, No. 1,
pp. 19–42, 2013.
[McFa 11]
E. G. McFarland, K. J. Keysor, and D. J. Vining. “Virtual Colonoscopy:
From Concept to Implementation”. In: A. H. Dachman and A. Laghi,
Eds., Atlas of Virtual Colonoscopy, pp. 3–7, Springer, 2011.
[McNa 09] J. E. McNamara, P. H. Pretorius, K. Johnson, J. M. Mukherjee, J. Dey,
M. A. Gennert, and M. A. King. “A Flexible Multicamera Visual-tracking System for Detecting and Correcting Motion-induced Artifacts in Cardiac SPECT Slices”. Medical Physics, Vol. 36, No. 5,
pp. 1913–1923, 2009.
[Meek 12]
S. L. Meeks, T. R. Willoughby, K. M. Langen, and P. A. Kupelian.
Image-Guided Radiation Therapy, Chap. Optical and Remote Monitoring IGRT, pp. 1–12. Imaging in Medical Diagnosis and Therapy, CRC
Press, 2012.
[Ment 12]
H. M. Mentis, K. O’Hara, A. Sellen, and R. Trivedi. “Interaction Proxemics and Image Use in Neurosurgery”. In: Conference on Computer-Human Interaction (CHI), pp. 927–936, ACM, May 2012.
[Miko 05]
K. Mikolajczyk and C. Schmid. “A Performance Evaluation of Local
Descriptors”. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 10, pp. 1615–1630, 2005.
[Miqu 13]
M. Miquel, J. Blackall, S. Uribe, D. Hawkes, and T. Schaeffter. “Patient-specific Respiratory Models using Dynamic 3D MRI: Preliminary Volunteer Results”. Physica Medica: European Journal of Medical Physics,
Vol. 29, No. 2, pp. 214–220, Mar 2013.
[Miro 11]
D. J. Mirota, M. Ishii, and G. D. Hager. “Vision-based Navigation in
Image-guided Interventions”. Annual Review of Biomedical Engineering,
Vol. 13, pp. 297–319, Aug 2011.
[Mode 03a] J. Modersitzki. Numerical Methods for Image Registration. Numerical
Mathematics and Scientific Computation, Oxford University Press, 2003.
[Mode 03b] J. Modersitzki and B. Fischer. “Curvature Based Image Registration”.
Journal of Mathematical Imaging and Vision, Vol. 18, No. 1, pp. 81–85,
2003.
[Mode 09]
J. Modersitzki. FAIR: Flexible Algorithms for Image Registration. Fundamentals of Algorithms, Society for Industrial and Applied Mathematics,
2009.
[Monn 11] H. Mönnich, P. Nicolai, J. Raczkowsky, and H. Wörn. “A Semi-Autonomous Robotic Teleoperation Surgery Setup with Multi 3D
Camera Supervision”. In: International Journal of Computer Assisted
Radiology and Surgery, pp. 132–133, 2011.
[Mose 11]
T. Moser, S. Fleischhacker, K. Schubert, G. Sroka-Perez, and C. P.
Karger. “Technical Performance of a Commercial Laser Surface Scanning System for Patient Setup Correction in Radiotherapy”. Physica
Medica: European Journal of Medical Physics, Vol. 27, No. 4, pp. 224–232,
2011.
[Mosh 12]
M. Gabel, R. Gilad-Bachrach, E. Renshaw, and A. Schuster. “Full Body
Gait Analysis with Kinect”. In: International Conference of the Engineering in Medicine and Biology Society (EMBC), IEEE, Aug 2012.
[Moun 07] P. Mountney, B. Lo, S. Thiemjarus, D. Stoyanov, and G. Zhong-Yang.
“A Probabilistic Framework for Tracking Deformable Soft Tissue in
Minimally Invasive Surgery”. In: International Conference on Medical
Image Computing and Computer Assisted Intervention (MICCAI), pp. 34–
41, LNCS 4792, Part II, Springer, Nov 2007.
[Muac 07]
A. Muacevic, C. Drexler, A. Wowra, A. Schweikard, A. Schlaefer, R. T.
Hoffmann, R. Wilkowski, and H. Winter. “Technical Description,
Phantom Accuracy, and Clinical Feasibility for Single-session Lung
Radiosurgery Using Robotic Image-guided Real-time Respiratory Tumor Tracking”. Technology in Cancer Research and Treatment, Vol. 6,
No. 4, pp. 321–328, Aug 2007.
[Mull 10]
K. Müller. Multi-modal Organ Surface Registration using Time-of-Flight
Imaging. Master’s thesis, Friedrich-Alexander-Universität Erlangen-Nürnberg, 2010.
[Mull 11]
K. Müller, S. Bauer, J. Wasza, and J. Hornegger.
“Automatic
Multi-modal ToF/CT Organ Surface Registration”. In: H. Handels,
J. Ehrhardt, T. M. Deserno, H.-P. Meinzer, and T. Tolxdorff, Eds., Bildverarbeitung für die Medizin (BVM), pp. 154–158, Springer, Mar 2011.
[Murp 04]
M. J. Murphy. “Tracking Moving Organs in Real Time”. Seminars in
Radiation Oncology, Vol. 14, No. 1, pp. 91–100, Jan 2004.
[Murp 07]
M. J. Murphy, J. Balter, S. Balter, J. A. BenComo, Jr., I. J. Das, S. B.
Jiang, C.-M. Ma, G. H. Olivera, R. F. Rodebaugh, K. J. Ruchala, H. Shirato, and F.-F. Yin. “The Management of Imaging Dose during Imageguided Radiotherapy, Report of the AAPM Task Group 75”. Medical
Physics, Vol. 34, No. 10, pp. 4041–4063, 2007.
[Myro 10]
A. Myronenko and X. B. Song. “Point Set Registration: Coherent Point
Drift”. IEEE Transactions on Pattern Analysis and Machine Intelligence,
Vol. 32, No. 12, pp. 2262–2275, 2010.
[Nava 11]
N. Navab and S. Holzer. “Real-time 3D Reconstruction: Applications
to Collision Detection and Surgical Workflow Monitoring”. In: IROS
Workshop on Methods for Safer Surgical Robotics Procedures, IEEE, RSJ,
Sep 2011.
[Nava 12]
N. Navab, T. Blum, L. Wang, A. Okur, and T. Wendler. “First Deployments of Augmented Reality in Operating Rooms”. Computer, Vol. 45,
No. 7, pp. 48–55, Jul 2012.
[Neum 11] D. Neumann, F. Lugauer, S. Bauer, J. Wasza, and J. Hornegger. “Real-time RGB-D Mapping and 3-D Modeling on the GPU using the Random Ball Cover Data Structure”. In: ICCV Workshop on Consumer
Depth Cameras for Computer Vision (CDC4CV), pp. 1161–1167, IEEE,
Nov 2011.
[Newc 11]
R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J.
Davison, P. Kohli, J. Shotton, S. Hodges, and A. W. Fitzgibbon.
“KinectFusion: Real-time Dense Surface Mapping and Tracking”.
In: International Symposium on Mixed and Augmented Reality (ISMAR),
pp. 127–136, IEEE, Oct 2011.
[Nico 11]
P. Nicolai and J. Raczkowsky. “Operation Room Supervision for Safe
Robotic Surgery with a Multi 3D-Camera Setup”. In: IROS Workshop
on Methods for Safer Surgical Robotics Procedures, IEEE, RSJ, Sep 2011.
[Noon 12]
P. Noonan, J. Howard, D. Tout, I. Armstrong, H. Williams, T. Cootes,
W. Hallett, and R. Hinz. “Accurate Markerless Respiratory Tracking
for Gated Whole Body PET using the Microsoft Kinect”. In: Nuclear
Science Symposium (NSS) and Medical Imaging Conference (MIC), IEEE,
Oct 2012.
[Oggi 04]
T. Oggier, M. Lehmann, R. Kaufmann, M. Schweizer, M. Richter,
P. Metzler, G. Lang, F. Lustenberger, and N. Blanc. “An All-solid-state
Optical Range Camera for 3D Real-time Imaging with Sub-centimeter
Depth Resolution (SwissRanger)”. In: SPIE Optical Design and Engineering, pp. 534–545, Feb 2004.
[Oles 10]
O. V. Olesen, M. R. Jorgensen, R. R. Paulsen, L. Hojgaard, B. Roed,
and R. Larsen. “Structured Light 3D Tracking System for Measuring
Motions in PET Brain Imaging”. In: SPIE Medical Imaging, pp. 76250X–
11, Feb 2010.
[Oliv 11]
T. Oliveira-Santos, M. Peterhans, S. Hofmann, and S. Weber. “Passive Single Marker Tracking for Organ Motion and Deformation Detection in Open Liver Surgery”. In: International Conference on Information Processing in Computer-Assisted Interventions (IPCAI), pp. 156–167,
Springer, Jun 2011.
[Ong 13]
S. K. Ong, J. Zhang, and A. Y. C. Nee. “Assistive Obstacle Detection
and Navigation Devices for Vision-impaired Users”. Disability and
Rehabilitation: Assistive Technology, 2013.
[Pado 12]
N. Padoy, T. Blum, S.-A. Ahmadi, H. Feussner, M.-O. Berger, and
N. Navab. “Statistical Modeling and Recognition of Surgical Workflow”. Medical Image Analysis, Vol. 16, No. 3, pp. 632–641, 2012.
[Para 03]
N. Paragios, M. Rousson, and V. Ramesh. “Non-rigid Registration
using Distance Functions”. Computer Vision and Image Understanding,
Vol. 89, No. 2-3, pp. 142–165, 2003.
[Parr 12]
G. Parra-Dominguez, B. Taati, and A. Mihailidis. “3D Human Motion Analysis to Detect Abnormal Events on Stairs”. In: International
Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), pp. 97–103, Oct 2012.
[Pass 08]
J. Passenger, O. Acosta, H. de Visser, S. Bauer, C. Russ, and S. Ourselin.
“Texture Coordinate Generation of Colonic Surface Meshes for Surgical Simulation”. In: International Symposium on Biomedical Imaging
(ISBI), pp. 640–643, IEEE, May 2008.
[Paul 05]
M. Pauly, N. J. Mitra, J. Giesen, M. H. Gross, and L. J. Guibas.
“Example-Based 3D Scan Completion”. In: Symposium on Geometry
Processing, pp. 23–32, Eurographics Association, Jul 2005.
[Pear 12]
N. Pears, Y. Liu, and P. Bunting, Eds. 3D Imaging, Analysis and Applications. Springer, 2012.
[Pear 96]
K. Pearson. “Mathematical Contributions to the Theory of Evolution.
III. Regression, Heredity, and Panmixia”. Philosophical Transactions of
the Royal Society of London. Series A, Containing Papers of a Mathematical
or Physical Character, Vol. 187, pp. 253–318, 1896.
[Peng 10]
J. L. Peng, D. Kahler, J. G. Li, S. Samant, G. Yan, R. Amdur, and C. Liu.
“Characterization of a Real-time Surface Image-guided Stereotactic
Positioning System”. Medical Physics, Vol. 37, No. 10, pp. 5421–5433,
2010.
[Penn 09]
J. Penne, K. Höller, M. Stürmer, T. Schrauder, A. Schneider, R. Engelbrecht, H. Feußner, B. Schmauss, and J. Hornegger. “Time-of-Flight
3-D Endoscopy”. In: G.-Z. Yang et al., Eds., International Conference on
Medical Image Computing and Computer Assisted Intervention (MICCAI),
pp. 467–474, LNCS 5761, Part I, Springer, Nov 2009.
[Petr 11]
A. Petrelli and L. di Stefano. “On the Repeatability of the Local Reference Frame for Partial Shape Matching”. In: International Conference
on Computer Vision (ICCV), pp. 2244–2251, IEEE, Nov 2011.
[Pick 11]
P. J. Pickhardt, C. Hassan, S. Halligan, and R. Marmo. “Colorectal
Cancer: CT Colonography and Colonoscopy for Detection–Systematic
Review and Meta-analysis”. Radiology, Vol. 259, No. 2, pp. 393–405,
May 2011.
[Plac 12]
S. Placht, J. Stancanello, C. Schaller, M. Balda, and E. Angelopoulou.
“Fast Time-of-Flight Camera based Surface Registration for Radiotherapy Patient Positioning”. Medical Physics, Vol. 39, No. 1, pp. 4–17,
2012.
[Pons 07]
J.-P. Pons, R. Keriven, and O. Faugeras. “Multi-View Stereo Reconstruction and Scene Flow Estimation with a Global Image-Based
Matching Score”. International Journal of Computer Vision, Vol. 72,
No. 2, pp. 179–193, Apr 2007.
[Quir 12]
J. Quiroga, F. Devernay, and J. Crowley. “Scene Flow by Tracking in
Intensity and Depth Data”. In: CVPR Workshop on Human Activity
Understanding from 3D Data, pp. 50–57, IEEE, Jun 2012.
[Rai 06]
L. Rai, S. A. Merritt, and W. E. Higgins. “Real-time Image-Based Guidance Method for Lung-Cancer Assessment”. In: International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2437–2444,
IEEE, Jun 2006.
[Rang 97]
A. Rangarajan, H. Chui, and F. L. Bookstein. “The Softassign Procrustes Matching Algorithm”. In: International Conference on Information Processing in Medical Imaging (IPMI), pp. 29–42, Jun 1997.
[Reyn 11]
M. Reynolds, J. Dobos, L. Peel, T. Weyrich, and G. Brostow. “Capturing Time-of-Flight Data with Confidence”. In: International Conference
on Computer Vision and Pattern Recognition (CVPR), pp. 945–952, IEEE,
Jun 2011.
[Rish 11]
P. Risholm, J. Balter, and W. M. Wells. “Estimation of Delivered Dose
in Radiotherapy: The Influence of Registration Uncertainty”. In: International Conference on Medical Image Computing and Computer Assisted
Intervention (MICCAI), pp. 548–555, LNCS 6891, Part I, Springer, Sep
2011.
[Robe 07]
J. C. Roberts, A. C. Merkle, P. J. Biermann, E. E. Ward, B. G. Carkhuff,
R. P. Cain, and J. V. O’Connor. “Computational and Experimental
Models of the Human Torso for Non-penetrating Ballistic Impact”.
Journal of Biomechanics, Vol. 40, No. 1, pp. 125–136, 2007.
[Rohl 12]
S. Röhl, S. Bodenstedt, S. Suwelack, H. Kenngott, B. P. Müller-Stich,
R. Dillmann, and S. Speidel. “Dense GPU-Enhanced Surface Reconstruction from Stereo Endoscopic Images for Intraoperative Registration”. Medical Physics, Vol. 39, pp. 1632–1645, 2012.
[Rouh 11]
M. Rouhani and A. D. Sappa. “Correspondence Free Registration
through a Point-to-Model Distance Minimization”. In: International
Conference on Computer Vision (ICCV), pp. 2150–2157, IEEE, Nov 2011.
[Ruec 03]
D. Rueckert, A. F. Frangi, and J. A. Schnabel. “Automatic Construction of 3-D Statistical Deformation Models of the Brain using Nonrigid
Registration”. IEEE Transactions on Medical Imaging, Vol. 22, No. 8,
pp. 1014–1025, Aug 2003.
[Ruec 11]
D. Rueckert and J. Schnabel. “Medical Image Registration”. In: T. M.
Deserno, Ed., Biomedical Image Processing, pp. 131–154, Springer, 2011.
[Ruec 99]
D. Rueckert, L. Sonoda, C. Hayes, D. Hill, M. Leach, and D. Hawkes.
“Nonrigid Registration using Free-form Deformations: Application
to Breast MR Images”. IEEE Transactions on Medical Imaging, Vol. 18,
No. 8, pp. 712–721, Aug 1999.
[Rusi 01]
S. Rusinkiewicz and M. Levoy. “Efficient Variants of the ICP Algorithm”. In: International Conference on 3-D Digital Imaging and Modeling, pp. 145–152, May 2001.
[Rusi 02]
S. Rusinkiewicz, O. Hall-Holt, and M. Levoy. “Real-time 3D Model
Acquisition”. ACM Transactions on Graphics, Vol. 21, No. 3, pp. 438–
446, Jul 2002.
[Russ 00]
G. Russo and P. Smereka. “A Remark on Computing Distance Functions”. Journal of Computational Physics, Vol. 163, pp. 51–67, 2000.
[Rusu 11]
R. B. Rusu and S. Cousins. “3D is Here: Point Cloud Library (PCL)”.
In: International Conference on Robotics and Automation (ICRA), pp. 1–4,
IEEE, May 2011.
[Salv 04]
J. Salvi, J. Pages, and J. Batlle. “Pattern Codification Strategies in Structured Light Systems”. Pattern Recognition, Vol. 37, No. 4, pp. 827–849,
2004.
[Salv 07]
J. Salvi, C. Matabosch, D. Fofi, and J. Forest. “A Review of Recent
Range Image Registration Methods with Accuracy Evaluation”. Image
and Vision Computing, Vol. 25, No. 5, pp. 578–596, 2007.
[Sant 10]
T. R. dos Santos, A. Seitel, H.-P. Meinzer, and L. Maier-Hein. “Correspondences Search for Surface-Based Intra-Operative Registration”.
In: International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), pp. 660–667, LNCS 6362, Part II,
Springer, Sep 2010.
[Sant 12a]
T. R. dos Santos. Multi-Modal Partial Surface Matching for Intraoperative
Registration. PhD thesis, Ruprecht-Karls-Universität Heidelberg, 2012.
[Sant 12b]
T. R. dos Santos, C. J. Goch, A. M. Franz, H.-P. Meinzer, T. Heimann,
and L. Maier-Hein. “Minimally Deformed Correspondences between
Surfaces for Intra-Operative Registration”. In: SPIE Medical Imaging,
p. 83141C, Feb 2012.
[Sava 97]
C. Savage. “A Survey of Combinatorial Gray Codes”. SIAM Review,
Vol. 39, No. 4, pp. 605–629, Dec 1997.
[Scha 08]
C. Schaller, J. Penne, and J. Hornegger. “Time-of-Flight Sensor for
Respiratory Motion Gating”. Medical Physics, Vol. 35, No. 7, pp. 3090–
3093, 2008.
[Scha 09]
C. Schaller, C. Rohkohl, J. Penne, M. Stürmer, and J. Hornegger. “Inverse C-arm Positioning for Interventional Procedures Using Real-Time Body Part Detection”. In: G.-Z. Yang et al., Eds., International Conference on Medical Image Computing and Computer Assisted Intervention
(MICCAI), pp. 549–556, LNCS 5761, Part I, Springer, Sep 2009.
[Scha 12]
J. Schaerer, A. Fassi, M. Riboldi, P. Cerveri, G. Baroni, and D. Sarrut. “Multi-dimensional Respiratory Motion Tracking from Markerless Optical Surface Imaging based on Deformable Mesh Registration”. Physics in Medicine and Biology, Vol. 57, No. 2, pp. 357–373, 2012.
[Schi 11]
A. Schick, F. Forster, and M. Stockmann. “3D Measuring in the Field
of Endoscopy”. In: SPIE Optical Measurement Systems for Industrial
Inspection, p. 808216, May 2011.
[Schm 09]
M. Schmidt and B. Jähne. “A Physical Model of Time-of-Flight 3D
Imaging Systems, Including Suppression of Ambient Light”. In:
A. Kolb and R. Koch, Eds., Dynamic 3D Imaging (Dyn3D), pp. 1–15,
Springer, 2009.
[Schm 11]
M. Schmidt. Analysis, Modeling and Dynamic Optimization of 3D Time-of-Flight Imaging Systems. PhD thesis, Ruprecht-Karls-Universität Heidelberg, 2011.
[Schm 12]
C. Schmalz, F. Forster, A. Schick, and E. Angelopoulou. “An Endoscopic 3D Scanner based on Structured Light”. Medical Image Analysis,
Vol. 16, No. 5, pp. 1063–1072, 2012.
[Scho 07]
P. J. Schöffel, W. Harms, G. Sroka-Perez, W. Schlegel, and C. P. Karger.
“Accuracy of a Commercial Optical 3D Surface Imaging System for
Realignment of Patients for Radiotherapy of the Thorax”. Physics in
Medicine and Biology, Vol. 52, No. 13, pp. 3949–3963, Jul 2007.
[Scho 11]
C. Schoenauer, T. Pintaric, H. Kaufmann, S. Jansen-Kosterink, and
M. Vollenbroek-Hutten. “Chronic Pain Rehabilitation with a Serious
Game using Multimodal Input”. In: International Conference on Virtual
Rehabilitation (ICVR), pp. 1–8, IEEE, Jun 2011.
[Schr 06]
W. Schroeder, K. Martin, and B. Lorensen. The Visualization Toolkit: An
Object-Oriented Approach To 3D Graphics. Kitware, Inc., 2006.
[Sega 07]
W. Segars, S. Mori, G. Chen, and B. Tsui. “Modeling Respiratory Motion Variations in the 4D NCAT Phantom”. In: Nuclear Science Symposium (NSS) and Medical Imaging Conference (MIC), pp. 2677–2679, IEEE,
Oct 2007.
[Seit 10]
A. Seitel, T. R. dos Santos, S. Mersmann, J. Penne, R. Tetzlaff, H.-P.
Meinzer, et al. “Time-of-Flight Kameras für die intraoperative
Oberflächenerfassung”. In: Bildverarbeitung für die Medizin (BVM),
pp. 11–15, Springer, Mar 2010.
[Seit 12]
A. Seitel. Markerless Navigation for Percutaneous Needle Insertions. PhD
thesis, Ruprecht-Karls-Universität Heidelberg, 2012.
[Sepp 02]
Y. Seppenwoolde, H. Shirato, K. Kitamura, S. Shimizu, M. van Herk,
J. V. Lebesque, and K. Miyasaka. “Precise and Real-time Measurement of 3D Tumor Motion in Lung due to Breathing and Heartbeat,
Measured during Radiotherapy”. International Journal of Radiation Oncology Biology Physics, Vol. 53, No. 4, pp. 822–834, 2002.
[Sesh 11]
S. Seshamani, G. Chintalapani, and R. H. Taylor. “Iterative Refinement
of Point Correspondences for 3D Statistical Shape Models”. In: International Conference on Medical Image Computing and Computer-Assisted
Intervention (MICCAI), pp. 417–425, LNCS 6892, Part II, Springer, Sep
2011.
[Sham 10]
R. Shams, P. Sadeghi, R. Kennedy, and R. Hartley. “A Survey of Medical Image Registration on Multicore and the GPU”. IEEE Signal Processing Magazine, Vol. 27, No. 2, pp. 50–60, Mar 2010.
[Shan 04]
Y. Shan, B. Matei, H. S. Sawhney, R. Kumar, D. F. Huber, and
M. Hebert. “Linear Model Hashing and Batch RANSAC for Rapid and
Accurate Object Recognition”. In: International Conference on Computer
Vision and Pattern Recognition (CVPR), pp. 121–128, IEEE, Jul 2004.
[Shim 12]
H. Shim and S. Lee. “Performance Evaluation of Time-of-Flight
and Structured Light Depth Sensors in Radiometric/Geometric Variations”. SPIE Optical Engineering, Vol. 51, No. 9, p. 094401, 2012.
[Shpu 11]
A. Shpunt and Z. Zalevsky. “Depth-varying Light Fields for Three
Dimensional Sensing”. Patent, Nov 2011. US20080106746.
[Simo 12]
J. Simon. MRI Workflow Optimization using Real-Time Range Imaging
Sensors. Master’s thesis, Friedrich-Alexander-Universität Erlangen-Nürnberg, 2012.
[Siva 12]
R. Sivalingam, A. Cherian, J. Fasching, N. Walczak, N. D. Bird,
V. Morellas, B. Murphy, K. Cullen, K. Lim, G. Sapiro, and N. Papanikolopoulos. “A Multi-sensor Visual Tracking System for Behavior Monitoring of At-Risk Children”. In: International Conference on
Robotics and Automation (ICRA), pp. 1345–1350, IEEE, May 2012.
[Smis 13]
J. Smisek, M. Jancosek, and T. Pajdla. Consumer Depth Cameras for Computer Vision: Research Topics and Applications, Chap. 3D with Kinect,
pp. 3–25. Advances in Computer Vision and Pattern Recognition, Springer,
2013.
[Smit 12]
S. T. Smith and D. Schoene. “The Use of Exercise-based Videogames
for Training and Rehabilitation of Physical Function in Older Adults:
Current Practice and Guidelines for Future Research”. Aging Health,
Vol. 8, No. 3, pp. 243–252, 2012.
[Soti 12]
A. Sotiras, C. Davatzikos, and N. Paragios. “Deformable Medical Image
Registration: A Survey”. Research Report RR-7919, INRIA, Mar 2012.
[Sout 08]
S. Soutschek, J. Penne, J. Hornegger, and J. Kornhuber. “3-D Gesture-Based Scene Navigation in Medical Imaging Applications Using
Time-Of-Flight Cameras”. In: CVPR Workshop on Time of Flight Camera
based Computer Vision, pp. 1–4, IEEE, Jun 2008.
[Sout 10]
S. Soutschek, A. Maier, S. Bauer, P. Kugler, M. Bebenek, S. Steckmann,
S. von Stengel, W. Kemmler, J. Hornegger, and J. Kornhuber. “Measurement of Angles in Time-of-Flight Data for the Automatic Supervision of Training Exercises”. In: International Conference on Pervasive
Computing Technologies for Healthcare (PervasiveHealth), pp. 1–4, IEEE,
Mar 2010.
[Spie 02]
H. Spies, B. Jähne, and J. L. Barron. “Range Flow Estimation”. Computer Vision and Image Understanding, Vol. 85, No. 3, pp. 209–231, 2002.
[Stau 12]
R. Stauder, V. Belagiannis, L. Schwarz, A. Bigdelou, E. Söhngen, S. Ilic,
and N. Navab. “A User-Centered and Workflow-Aware Unified Display for the Operating Room”. In: MICCAI Workshop on Modeling and
Monitoring of Computer Assisted Interventions, Oct 2012.
[Stei 11]
F. Steinbrucker, J. Sturm, and D. Cremers. “Real-time Visual Odometry from Dense RGB-D Images”. In: ICCV Workshop on Live Dense
Reconstruction with Moving Cameras, pp. 719–722, IEEE, Nov 2011.
[Ston 11a]
E. Stone and M. Skubic. “Passive In-home Measurement of Stride-to-Stride Gait Variability Comparing Vision and Kinect Sensing”. In:
International Conference of the Engineering in Medicine and Biology Society
(EMBC), pp. 6491–6494, IEEE, Sep 2011.
[Ston 11b]
E. Stone and M. Skubic. “Evaluation of an Inexpensive Depth Camera
for In-home Gait Assessment”. Journal of Ambient Intelligence and Smart
Environments, Vol. 3, No. 4, pp. 349–361, Dec 2011.
[Stoy 07]
E. Stoykova, A. Alatan, P. Benzie, N. Grammalidis, S. Malassiotis,
J. Ostermann, S. Piekh, V. Sainov, C. Theobalt, T. Thevar, and X. Zabulis. “3-D Time-Varying Scene Capture Technologies – A Survey”. IEEE
Transactions on Circuits and Systems for Video Technology, Vol. 17, No. 11,
pp. 1568–1586, Nov 2007.
[Stoy 10]
D. Stoyanov, M. Scarzanella, P. Pratt, and G.-Z. Yang. “Real-Time
Stereo Reconstruction in Robotically Assisted Minimally Invasive
Surgery”. In: T. Jiang, N. Navab, J. Pluim, and M. Viergever, Eds.,
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 275–282, LNCS 6361, Part I,
Springer, Sep 2010.
[Stoy 12]
D. Stoyanov. “Stereoscopic Scene Flow for Robotic Assisted Minimally Invasive Surgery”. In: International Conference on Medical Image
Computing and Computer-Assisted Intervention (MICCAI), pp. 479–486,
LNCS 7510, Part I, Springer, Oct 2012.
[Stur 12]
J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers. “A
Benchmark for the Evaluation of RGB-D SLAM Systems”. In: International Conference on Intelligent Robots and Systems (IROS), pp. 573–580, Oct
2012.
[Subs 98]
G. Subsol, J.-P. Thirion, and N. Ayache. “A Scheme for Automatically Building Three-dimensional Morphometric Anatomical Atlases:
Application to a Skull Atlas”. Medical Image Analysis, Vol. 2, No. 1,
pp. 37–60, 1998.
[Sun 09]
J. Sun, M. Ovsjanikov, and L. J. Guibas. “A Concise and Provably
Informative Multi-Scale Signature Based on Heat Diffusion”. Eurographics Computer Graphics Forum, Vol. 28, No. 5, pp. 1383–1392, 2009.
[Sun 10]
D. Sun, S. Roth, and M. J. Black. “Secrets of Optical Flow Estimation
and Their Principles”. In: International Conference on Computer Vision
and Pattern Recognition (CVPR), pp. 2432–2439, IEEE, Jun 2010.
[Sund 07]
G. Sundaramoorthi, A. Yezzi, and A. Mennucci. “Sobolev Active Contours”. International Journal of Computer Vision, Vol. 73, No. 3, pp. 345–
366, 2007.
[Taka 10]
G. Takacs, V. Chandrasekhar, S. Tsai, D. Chen, R. Grzeszczuk,
and B. Girod. “Unified Real-Time Tracking and Recognition with
Rotation-Invariant Fast Features”. In: International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 934–941, IEEE, Jun
2010.
[Tang 08a] L. Tang and G. Hamarneh. “SMRFI: Shape Matching via Registration
of Vector-valued Feature Images”. In: International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–8, IEEE, Jun 2008.
[Tang 08b] J. W. Tangelder and R. C. Veltkamp. “A Survey of Content based 3D
Shape Retrieval Methods”. Multimedia Tools and Applications, Vol. 39,
No. 3, pp. 441–471, Sep 2008.
[Thor 09]
N. Thorstensen and R. Keriven. “Non-rigid Shape Matching Using
Geometry and Photometry”. In: Asian Conference on Computer Vision
(ACCV), pp. 644–654, Springer, Sep 2009.
[Toma 98]
C. Tomasi and R. Manduchi. “Bilateral Filtering for Gray and Color
Images”. In: International Conference on Computer Vision (ICCV),
pp. 839–846, IEEE, Jan 1998.
[Tomb 10]
F. Tombari, S. Salti, and L. Di Stefano. “Unique Signatures of Histograms for Local Surface Description”. In: European Conference on
Computer Vision (ECCV), pp. 356–369, Springer, Sep 2010.
[Totz 11]
J. Totz, P. Mountney, D. Stoyanov, and G.-Z. Yang. “Dense Surface Reconstruction for Enhanced Navigation in MIS”. In: International Conference on Medical Image Computing and Computer Assisted Intervention
(MICCAI), pp. 89–96, LNCS 6891, Part I, Springer, Sep 2011.
[Totz 12]
J. Totz, K. Fujii, P. Mountney, and G.-Z. Yang. “Enhanced Visualisation for Minimally Invasive Surgery”. International Journal of Computer
Assisted Radiology and Surgery, Vol. 7, No. 3, pp. 423–432, May 2012.
[Tsag 12]
G. Tsagkatakis, A. Woiselle, G. Tzagkarakis, M. Bousquet, J.-L. Starck,
and P. Tsakalides. “Active Range Imaging via Random Gating”. In:
SPIE Electro-Optical Remote Sensing, Photonic Technologies, and Applications, p. 85420P, Sep 2012.
[Tsin 04]
Y. Tsin and T. Kanade. “A Correlation-Based Approach to Robust
Point Set Registration”. In: European Conference on Computer Vision
(ECCV), pp. 558–569, Springer, May 2004.
[Unal 04]
G. Unal, G. Slabaugh, A. Yezzi, and J. Tyan. “Joint Segmentation and
Non-Rigid Registration Without Shape Priors”. Tech. Rep. SCR-04-TR-7495, Siemens Corporate Research, 2004.
[Vand 11]
J. Vandemeulebroucke, S. Rit, J. Kybic, P. Clarysse, and D. Sarrut.
“Spatiotemporal Motion Estimation for Respiratory-correlated Imaging of the Lungs”. Medical Physics, Vol. 38, No. 1, pp. 166–178, Jan
2011.
[Vedu 05]
S. Vedula, S. Baker, P. Rander, R. T. Collins, and T. Kanade. “Three-Dimensional Scene Flow”. IEEE Transactions on Pattern Analysis and
Machine Intelligence, Vol. 27, No. 3, pp. 475–480, 2005.
[Vedu 99]
S. Vedula, S. Baker, P. Rander, R. T. Collins, and T. Kanade. “Three-Dimensional Scene Flow”. In: International Conference on Computer
Vision (ICCV), pp. 722–729, IEEE, Sep 1999.
[Vere 10]
D. Verellen, T. Depuydt, T. Gevaert, N. Linthout, K. Tournel,
M. Duchateau, T. Reynders, G. Storme, and M. D. Ridder. “Gating
and Tracking, 4D in Thoracic Tumours”. Cancer/Radiothérapie, Vol. 14,
No. 6–7, pp. 446–454, Oct 2010.
[Volz 11]
S. Volz, A. Bruhn, L. Valgaerts, and H. Zimmer. “Modeling Temporal
Coherence for Optical Flow”. In: International Conference on Computer
Vision (ICCV), pp. 1116–1123, IEEE, Nov 2011.
[Vos 04]
F. Vos, P. W. de Bruin, J. G. M. Aubel, G. J. Streekstra, M. Maas, L. J.
van Vliet, and A. M. Vossepoel. “A Statistical Shape Model without
Using Landmarks”. In: International Conference on Pattern Recognition
(ICPR), pp. 714–717, IEEE, Aug 2004.
[Walc 12]
N. Walczak, J. Fasching, W. D. Toczyski, R. Sivalingam, N. D. Bird,
K. Cullen, V. Morellas, B. Murphy, G. Sapiro, and N. Papanikolopoulos. “A Nonintrusive System for Behavioral Analysis of Children using Multiple RGB+Depth Sensors”. In: Workshop on the Applications of
Computer Vision (WACV), pp. 217–222, IEEE, Jan 2012.
[Wald 09]
T. Waldron. “External Surrogate Measurement and Internal Target
Motion: Photogrammetry as a Tool in IGRT”. In: ACMP Annual Meeting, p. 10059, 2009.
[Wang 12]
X. L. Wang, P. J. Stolka, E. Boctor, G. Hager, and M. Choti. “The
Kinect as an Interventional Tracking System”. In: SPIE Medical Imaging, p. 83160U, Feb 2012.
[Warr 12]
A. Warren, P. Mountney, D. Noonan, and G.-Z. Yang. “Horizon
Stabilized–Dynamic View Expansion for Robotic Assisted Surgery
(HS-DVE)”. International Journal of Computer Assisted Radiology and
Surgery, Vol. 7, No. 2, pp. 281–288, Mar 2012.
[Wasz 11a] J. Wasza, S. Bauer, and J. Hornegger. “Real-time Preprocessing for
Dense 3-D Range Imaging on the GPU: Defect Interpolation, Bilateral
Temporal Averaging and Guided Filtering”. In: ICCV Workshop on
Consumer Depth Cameras for Computer Vision (CDC4CV), pp. 1221–1227,
IEEE, Nov 2011.
[Wasz 11b] J. Wasza, S. Bauer, S. Haase, M. Schmid, S. Reichert, and J. Hornegger. “RITK: The Range Imaging Toolkit – A Framework for 3-D Range
Image Stream Processing”. In: P. Eisert, J. Hornegger, and K. Polthier, Eds., International Workshop on Vision, Modelling and Visualization
(VMV), pp. 57–64, Eurographics Association, 2011.
[Wasz 11c] J. Wasza, S. Bauer, and J. Hornegger. “High Performance GPU-Based Preprocessing for Time-of-Flight Imaging in Medical Applications”. In: H. Handels, J. Ehrhardt, T. M. Deserno, H.-P. Meinzer, and
T. Tolxdorff, Eds., Bildverarbeitung für die Medizin (BVM), pp. 324–328,
Springer, Mar 2011.
[Wasz 12a] J. Wasza, S. Bauer, S. Haase, and J. Hornegger. “Sparse Principal Axes
Statistical Surface Deformation Models for Respiration Analysis and
Classification”. In: T. Tolxdorff, T. M. Deserno, H. Handels, and H.-P. Meinzer, Eds., Bildverarbeitung für die Medizin (BVM), pp. 316–321,
Springer, Mar 2012.
[Wasz 12b] J. Wasza, S. Bauer, and J. Hornegger. “Real-time Motion Compensated
Patient Positioning and Non-rigid Deformation Estimation using 4-D
Shape Priors”. In: International Conference on Medical Image Computing
and Computer Assisted Intervention (MICCAI), pp. 576–583, LNCS 7511,
Part II, Springer, Oct 2012.
[Wasz 13]
J. Wasza, S. Bauer, and J. Hornegger. “Real-time Respiratory Motion
Analysis Using Manifold Ray Casting of Volumetrically Fused Multi-View Range Imaging”. In: International Conference on Medical Image
Computing and Computer Assisted Intervention (MICCAI), pp. 116–123,
LNCS 8150, Part II, Springer, Sep 2013.
[Weik 97]
S. Weik. “Registration of 3-D Partial Surface Models using Luminance
and Depth Information”. In: International Conference on 3-D Digital
Imaging and Modeling (3DIM), pp. 93–100, IEEE, May 1997.
[West 08]
J. West. Respiratory Physiology: The Essentials. Point (Lippincott Williams
and Wilkins) Series, Wolters Kluwer Health/Lippincott Williams &
Wilkins, 2008.
[Whel 12]
T. Whelan, H. Johannsson, M. Kaess, J. Leonard, and J. McDonald.
“Robust Tracking for Real-Time Dense RGB-D Mapping with Kintinuous”. Tech. Rep. MIT-CSAIL-TR-2012-031, MIT, Sep 2012.
[Whit 01]
R. Whitaker and X. Xue. “Variable-Conductance, Level-Set Curvature
for Image Denoising”. In: International Conference on Image Processing
(ICIP), pp. 142–145, IEEE, Oct 2001.
[Wilb 08]
J. Wilbert, J. Meyer, K. Baier, M. Guckenberger, C. Herrmann, R. Hess,
C. Janka, L. Ma, T. Mersebach, A. Richter, M. Roth, K. Schilling,
and M. Flentje. “Tumor Tracking and Motion Compensation with
an Adaptive Tumor Tracking System (ATTS): System Description and
Prototype Testing”. Medical Physics, Vol. 35, pp. 3911–3921, 2008.
[Will 06]
T. R. Willoughby, A. R. Forbes, D. Buchholz, K. M. Langen, T. H. Wagner, O. A. Zeidan, P. A. Kupelian, and S. L. Meeks. “Evaluation of an
Infrared Camera and X-ray System using Implanted Fiducials in Patients with Lung Tumors for Gated Radiation Therapy”. International
Journal of Radiation Oncology Biology Physics, Vol. 66, No. 2, pp. 568–
575, 2006.
[Will 09]
T. Willoughby. “Performance-Based QA for Radiotherapy: TG-147
– QA for Non-Radiographic Localization Systems”. Medical Physics,
Vol. 36, No. 6, pp. 2743–2744, 2009.
[Will 12]
T. Willoughby, J. Lehmann, J. A. Bencomo, S. K. Jani, L. Santanam,
A. Sethi, T. D. Solberg, W. A. Tome, and T. J. Waldron. “Quality Assurance for Nonradiographic Radiotherapy Localization and Positioning
Systems: Report of Task Group 147”. Medical Physics, Vol. 39, No. 4,
pp. 1728–1747, Apr 2012.
[Wu 12]
D. Wu, M. O’Toole, A. Velten, A. Agrawal, and R. Raskar. “Decomposing Global Light Transport using Time of Flight Imaging”. In: International Conference on Computer Vision and Pattern Recognition (CVPR),
pp. 366–373, IEEE, Jun 2012.
[Xu 98]
Z. Xu, R. Schwarte, H. Heinol, B. Buxbaum, and T. Ringbeck. “Smart
Pixel - Photometric Mixer Device (PMD) / New System Concept of
a 3D-Imaging-on-a-Chip”. In: International Conference on Mechatronics
and Machine Vision in Practice, pp. 259–264, 1998.
[Yaha 07]
G. Yahav, G. Iddan, and D. Mandelboum. “3D Imaging Camera for
Gaming Application”. In: International Conference on Consumer Electronics (ICCE), Digest of Technical Papers, pp. 1–2, IEEE, Jan 2007.
[Yama 07]
Y. Yamauchi. “Non-optical Expansion of Field-of-view of the Rigid
Endoscope”. In: R. Magjarevic and J. Nagel, Eds., World Congress
on Medical Physics and Biomedical Engineering, pp. 4184–4186, Springer,
Aug 2007.
[Yan 06]
H. Yan, F.-F. Yin, G.-P. Zhu, M. Ajlouni, and J. H. Kim. “The Correlation Evaluation of a Tumor Tracking System using Multiple External
Markers”. Medical Physics, Vol. 33, No. 11, pp. 4073–4084, 2006.
[Yu 12]
M.-C. Yu, H. Wu, J.-L. Liou, M.-S. Lee, and Y.-P. Hung. “Breath and
Position Monitoring during Sleeping with a Depth Camera”. In: International Conference on Health Informatics (HEALTHINF), pp. 12–22, Feb
2012.
[Zaha 09]
A. Zaharescu, E. Boyer, K. Varanasi, and R. P. Horaud. “Surface Feature Detection and Description with Applications to Mesh Matching”.
In: International Conference on Computer Vision and Pattern Recognition
(CVPR), pp. 373–380, IEEE, Jun 2009.
[Zale 10]
Z. Zalevsky, A. Shpunt, A. Maizels, and J. Garcia. “Method and System for Object Reconstruction”. Patent, Jul 2010. US20100177164A1.
[Zhan 00a] Y. Zhang and C. Kambhamettu. “Integrated 3D Scene Flow and Structure Recovery from Multiview Image Sequences”. In: International
Conference on Computer Vision and Pattern Recognition (CVPR), pp. 674–
681, IEEE, Jun 2000.
[Zhan 00b] Z. Zhang. “A Flexible New Technique for Camera Calibration”. IEEE
Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 11,
pp. 1330–1334, 2000.
[Zhan 94]
Z. Zhang. “Iterative Point Matching for Registration of Free-form
Curves and Surfaces”. International Journal of Computer Vision, Vol. 13,
No. 2, pp. 119–152, Oct 1994.
[Zhen 10]
B. Zheng, J. Takamatsu, and K. Ikeuchi. “An Adaptive and Stable
Method for Fitting Implicit Polynomial Curves and Surfaces”. IEEE
Transactions on Pattern Analysis and Machine Intelligence, Vol. 32, No. 3,
pp. 561–568, 2010.
[Zito 03]
B. Zitová and J. Flusser. “Image Registration Methods: A Survey”.
Image and Vision Computing, Vol. 21, No. 11, pp. 977–1000, 2003.