Int. J. Radiation Oncology Biol. Phys., Vol. 59, No. 1, pp. 300–312, 2004
Copyright © 2004 Elsevier Inc.
Printed in the USA. All rights reserved
0360-3016/04/$–see front matter
doi:10.1016/j.ijrobp.2004.01.026
PHYSICS CONTRIBUTION
BRAIN TUMOR TARGET VOLUME DETERMINATION FOR RADIATION
TREATMENT PLANNING THROUGH AUTOMATED MRI SEGMENTATION
GLORIA P. MAZZARA, PH.D.,* ROBERT P. VELTHUIZEN, PH.D.,† JAMES L. PEARLMAN, M.D.,‡ HARVEY M. GREENBERG, M.D.,‡ AND HENRY WAGNER, M.D.‡
*Department of Radiology and ‡Division of Radiation Oncology, Moffitt Cancer Center, Tampa, FL; †Unilever Research and Development, Edgewater, NJ
Purpose: To assess the effectiveness of two automated magnetic resonance imaging (MRI) segmentation methods
in determining the gross tumor volume (GTV) of brain tumors for use in radiation therapy treatment planning.
Methods and Materials: Two automated MRI tumor segmentation methods (supervised k-nearest neighbors
[kNN] and automatic knowledge-guided [KG]) were evaluated for their potential as “cyber colleagues.” This
required an initial determination of the accuracy and variability of radiation oncologists engaged in the manual
definition of the GTV in MRI registered with computed tomography images for 11 glioma patients. Three sets
of contours were defined for each of these patients by three radiation oncologists. These outlines were compared
directly to establish inter- and intraoperator variability among the radiation oncologists. A novel, probabilistic
measurement of accuracy was introduced to compare the level of agreement among the automated MRI
segmentations. The accuracy was determined by comparing the volumes obtained by the automated segmentation
methods with the weighted average volumes prepared by the radiation oncologists.
Results: Intra- and interoperator variability in outlining averaged 20% ± 15% and 28% ± 12%, respectively. The lowest intraoperator variability was found for the physician who spent the most time producing the contours. The average accuracy of the kNN segmentation method was 56% ± 6% for all 11 cases, whereas that of the KG method was 52% ± 7% for 7 of the 11 cases when compared with the physician contours.
For the areas of the contours where the oncologists were in substantial agreement (i.e., the center of the tumor
volume), the accuracy of kNN and KG was 75% and 72%, respectively. The automated segmentation methods
were found to be least accurate in outlining at the edges of the tumor volume.
Conclusions: The kNN method was able to segment all cases, whereas the KG method was limited to enhancing tumors and gliomas with clear enhancing edges and no cystic formation. Both methods undersegmented the tumor volume compared with the radiation oncologists, and both performed within the variability of the contouring performed by experienced radiation oncologists on the same data. © 2004 Elsevier Inc.
Key Words: Glioma, Tumor volume, Magnetic resonance imaging, Image segmentation, Brain radiation therapy.

Reprint requests to: Gloria P. Mazzara, Ph.D., 2475 Brickell Avenue #2607, Miami, FL 33129. Tel: (305) 858-0266; Fax: (305) 929-1971.
Supported by 1999 RSNA (1) seed grant, entitled "Automatic brain tumor target volume for radiation treatment planning."
Acknowledgments: We would like to thank Computerized Medical Systems and Colin Sims, M.S., product manager, for providing the software and support that made this research possible. Thanks are also due to all the personnel from Moffitt Cancer Center involved in this research: technologists from the Department of Radiology, physicists and dosimetrists from the Department of Radiation Oncology, and especially Carol Johnson for her assistance in collecting the data. We also thank the Department of Computer Science for providing access to its computer workstations; Matt Clark, Ph.D., for helping with the KG data processing; and Hans Christian Beyer for editing the manuscript.
Received Jul 22, 2003, and in revised form Dec 19, 2003. Accepted for publication Jan 19, 2004.

INTRODUCTION
Treatment protocols for malignant brain tumors known as gliomas generally call for surgical removal followed by irradiation of the tumor bed. The goal of three-dimensional (3D) conformal radiation therapy is to irradiate the tumor volume while limiting damage to the surrounding normal tissues. Achieving this goal requires accurate determination of 3D treatment volumes. Radiation oncologists traditionally model the brain treatment target through a time-intensive manual procedure: outlining the gross tumor volume (GTV) on numerous two-dimensional imaging "slices" using either computed tomography (CT) or magnetic resonance imaging (MRI) (1). Recently, the search for improvements in target volume definition methodology has concentrated on improved imaging modalities (2, 3). MRI has been shown to be more sensitive than CT in both lesion detection and margin delineation of gliomas (4–7). However, limitations remain in the delineation of tumor volumes and in the ability of different radiation oncologists to reproduce consistent results (4, 8–10).
Automated brain tumor volume determination
● G. P. MAZZARA et al.
Table 1. Patient demographics

| Case | Age | Sex | Diagnosis | Surgery (days from MRI) | MRI type | Tumor enhanced | RT start (days from MRI) |
|---|---|---|---|---|---|---|---|
| 1 | 65 | F | AO | 3 | Pre | No | 165 |
| 2 | 52 | F | O | 1 | Pre | No | 70 |
| 3 | 63 | F | GBM | 1 | Pre | Yes | 28 |
| 4 | 69 | F | GBM | 3 | Pre | Yes | 19 |
| 5 | 62 | F | GBM | –89 | Post | Yes | 16 |
| 6 | 47 | M | GBM | –24 | Post | Yes | 10 |
| 7 | 52 | M | AO | 3 | Pre | Yes | 73 |
| 8 | 62 | F | GBM | –17 | Post | Yes | 6 |
| 9 | 80 | M | GBM | 2 | Pre | Yes | 23 |
| 10 | 47 | F | GBM | –13 | Post | Yes | 7 |
| 11 | 79 | F | GBM | –4 | Post | Yes | 7 |

Abbreviations: MRI = magnetic resonance imaging; GBM = glioblastoma multiforme; AO = astrocytoma; O = oligodendroglioma; Pre = presurgery MRI; Post = postsurgery MRI.
Although the technology for conformal radiation treatment planning has developed to a high level of accuracy, the
definition of the tumor GTV is still based on time-intensive,
highly subjective manual outlining (8, 11, 12). Manual outlining is therefore an excellent candidate for automation through a computerized segmentation system. At our institution, several MRI segmentation techniques have been developed
and evaluated specifically for use with brain tumors. These
methods use the information derived from several magnetic
resonance contrasts (i.e., multispectral data).
A supervised automated segmentation method requires an
operator to select regions of interest on each slice of multispectral MRI data, which, in turn, are used to train the
automated classifier. One of these methods, the “k-nearest
neighbor” (kNN) system, has been shown to perform better
than other tested supervised methods and has been used by
many researchers for automated brain segmentation (13).
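To make the supervised kNN idea concrete, here is a minimal sketch assuming each pixel is described by its T1, proton-density, and T2 intensities and that the operator's regions of interest have already been reduced to labeled training vectors. A pixel receives the majority label of its k nearest training vectors in intensity space. The training values, tissue labels, and k = 3 below are hypothetical illustrations; this is not the implementation evaluated in this study (13, 14, 23).

```python
import math
from collections import Counter


def knn_classify(training, features, k=3):
    """Label one pixel by majority vote of its k nearest training samples.

    training: list of (feature_vector, label) pairs chosen by an operator,
              where feature_vector is e.g. (T1, PD, T2) intensities.
    features: the multispectral intensity vector of the pixel to classify.
    """
    # Sort training samples by Euclidean distance in intensity space.
    nearest = sorted(training, key=lambda s: math.dist(s[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def knn_segment(training, image, k=3):
    """Classify every pixel of a slice given as a 2D list of feature vectors."""
    return [[knn_classify(training, px, k) for px in row] for row in image]


# Hypothetical training samples: (T1, PD, T2) intensities with class labels.
training = [
    ((200, 120, 150), "tumor"), ((210, 130, 160), "tumor"), ((190, 125, 155), "tumor"),
    ((100, 110, 90), "white matter"), ((105, 115, 95), "white matter"),
    ((60, 140, 220), "csf"), ((55, 135, 230), "csf"),
]
slice_ = [[(205, 125, 152), (102, 112, 92)],
          [(58, 138, 225), (195, 128, 158)]]
labels = knn_segment(training, slice_)
```

Brute-force distance sorting is used for clarity; a practical version would use a spatial index and might normalize each MR contrast before measuring distances.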
Unsupervised techniques of MR segmentation do not require operator input for the processing of each data set. To
automate the tumor volume determination, Clark, Hall, and
Goldgof encoded knowledge of the pixel intensity and spatial relationships in the images to create a fully automated
segmentation system known as the knowledge-guided (KG)
method (16, 17). The KG expert system was initially trained to distinguish slices of MR images of the brain that contain pathology from slices that do not. The KG system's current incarnation is able to identify and measure tumor tissue from glioblastoma multiforme lesions after gadolinium enhancement in the brain (17).
Both the kNN and KG segmentation methods have been applied clinically as techniques for more accurately measuring tumor volume variation in the brain (15, 18). This
work evaluates the performance of kNN as a representative
of operator-assisted semiautomated segmentation and KG
as a promising candidate for fully automated GTV determination. Automatic segmentation of MR images offers the
potential to accurately define complex treatment volumes,
to speed the contouring process in radiation therapy treatment planning, and to provide a standardized reproducible
measurement protocol that can be employed by geographically diverse facilities and physicians in treating brain tumors.
METHODS AND MATERIALS
Subjects
Pre-existing MRI and CT data of 11 patients with primary brain cancer (glioma) were used as the basis for this study.
The study was approved under the University of South
Florida institutional review board #5253 and required no
patient informed consent because only existing data were
used and were recorded in such a manner that participants
could not be identified. The demographics of this patient
group are listed in Table 1. Patients were selected from cases of primary brain cancer (glioma) collected over a period of 1 year that had pre- and posttreatment MRI in our clinic and went on to receive radiation therapy at Moffitt Cancer Center. In conformance with the
standard clinical protocol of the treating facility, these patients had brain surgery and were imaged presurgery with
MRI and postsurgery with both MRI and CT. Depending on
the treatment protocol selected for each individual patient,
the MRI images used in connection with this study may
have been taken either before or after surgery. This factor
was included as a variable in the study. The CT was used for
3D radiation treatment planning.
CT scanning
The CT images were obtained using a Siemens CT HiQ
spiral scanner (Siemens Medical Systems, Erlangen, Germany) with 512 × 512 pixel images taken at 4-mm spacing
from the vertex through the treatment area and 8-mm slice
thickness through the thyroid. Patients were immobilized
using a customized mask together with a head rest (MedTec,
Orange City, IA). The CT treatment planning system includes MergeCom, the precursor to DICOM data communications.
I. J. Radiation Oncology
● Biology ● Physics
MRI scanning
The patients were imaged on either a 1.5 Tesla GE Signa Horizon (General Electric Co., Milwaukee, WI) or a Siemens Magnetom Symphony with fast gradient systems, using the standard multielement head coil. Both systems include DICOM data communications. The multispectral data set used for MRI segmentation consisted of 5-mm-thick axial anatomic slices (T1-weighted, proton-density-weighted, and T2-weighted images) obtained with a field of view of 220 mm and reconstructed to a 512 × 512 pixel image. The T1 scans used for this study were obtained after administration of 0.1 mmol/kg body weight of gadolinium (Gd) MRI contrast material (Gd-DTPA), using a standard spin-echo sequence with repetition time (TR)/echo time (TE) = 400/8 ms or TR/TE = 525/17 ms. The proton-density images were acquired using a fluid-attenuated inversion recovery sequence with TR/TE = 10002/147 ms or TR/TE = 9000/110 ms. The T2 images were acquired using TR/TE = 3000/104 ms or TR/TE = 4000/96 ms.
Radiation oncologists used axial postcontrast T1 images to
define GTV for cases of enhancing tumor or precontrast T2
images for cases involving no tumor enhancement.
Image registration
Both CT and MRI image sets were transferred to Hewlett
Packard workstations running Computerized Medical Systems (CMS) 3-D treatment planning FOCUS software version 2.4.0. Each image set was then transferred to a Dell
Inspiron 7000 laptop computer equipped with CMS software for image fusion (Focal Fusion, software release version 1.3) and contouring (Focal Ease, software release version 1.3.0). The laptop computer was dedicated to this
research.
The CT and MRI data were registered using the Focal
Fusion software. Registration was required to permit manual physician contouring on the MRI images. The Focal
Fusion software uses maximization of mutual information
for fully automatic registration without the need to define
fiducial or anatomic points (19). The software also incorporates a manual method for pre- or postinteractive adjustment of the registration. The program writes out a file with
a transformation matrix to convert MRI data to the CT
coordinates.
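The registration criterion can be illustrated with a toy example. Mutual information is MI(A, B) = H(A) + H(B) - H(A, B), computed here from marginal and joint intensity histograms; it is largest when one image's intensities predict the other's, even if the two modalities map tissues to different gray levels. The sketch below assumes two 1D intensity profiles of the same hypothetical tissue pattern (one with CT-like values, one with MRI-like values) and searches circular shifts for the MI maximum. Circular shifts keep the overlap constant, a simplification chosen for this example; the optimizer, interpolation, and histogram binning used by Focal Fusion are not described here and are not reproduced.

```python
import math
from collections import Counter


def entropy(counts, n):
    """Shannon entropy in bits of a histogram stored as a Counter."""
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(a, b):
    """MI(A, B) = H(A) + H(B) - H(A, B) from marginal and joint histograms."""
    n = len(a)
    h_a = entropy(Counter(a), n)
    h_b = entropy(Counter(b), n)
    h_ab = entropy(Counter(zip(a, b)), n)  # joint histogram of intensity pairs
    return h_a + h_b - h_ab


def best_circular_shift(fixed, moving):
    """Return the circular shift of `moving` that maximizes MI with `fixed`."""
    n = len(fixed)
    scores = {
        s: mutual_information(fixed, [moving[(i + s) % n] for i in range(n)])
        for s in range(n)
    }
    return max(scores, key=scores.get)


# Hypothetical tissue pattern (classes 0-3) seen by two modalities with
# different gray-level mappings; the "MRI" profile is rotated by 3 samples.
tissue = [0, 0, 0, 0, 1, 1, 2, 3]
ct_values = {0: 10, 1: 40, 2: 80, 3: 20}
mr_values = {0: 90, 1: 30, 2: 60, 3: 5}
fixed = [ct_values[t] for t in tissue]
moving = [mr_values[tissue[(i - 3) % len(tissue)]] for i in range(len(tissue))]

shift = best_circular_shift(fixed, moving)
```

Here the recovered shift is 3, the only rotation at which every MRI-like intensity maps to a single CT-like intensity.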
Quantitative accuracy of the mutual information registration algorithm has previously been validated (19, 20). The final MRI image transformation was evaluated and approved by a radiation oncologist who specializes in neuro-oncology. The transformation matrix that resulted from the image registration was also used to transform the contours generated by the segmentation methods to the CT coordinate system.
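Applying the stored transformation to contours amounts to a matrix-vector product per point. The sketch below assumes the matrix is a 4 × 4 homogeneous transform mapping MRI (x, y, z) coordinates in millimeters to CT coordinates; the actual file format written by the registration software is not described here, and the example matrix (a 90° rotation about z plus a translation) is hypothetical.

```python
def apply_transform(matrix, points):
    """Map (x, y, z) points through a 4x4 homogeneous transformation matrix."""
    mapped = []
    for x, y, z in points:
        v = (x, y, z, 1.0)  # homogeneous coordinates
        tx, ty, tz, w = (
            sum(matrix[row][col] * v[col] for col in range(4)) for row in range(4)
        )
        mapped.append((tx / w, ty / w, tz / w))
    return mapped


# Hypothetical rigid MRI-to-CT transform: 90-degree rotation about z,
# then a translation of (10, -5, 2) mm.
mri_to_ct = [
    [0, -1, 0, 10],
    [1, 0, 0, -5],
    [0, 0, 1, 2],
    [0, 0, 0, 1],
]
contour_mri = [(1.0, 2.0, 3.0), (0.0, 0.0, 0.0)]
contour_ct = apply_transform(mri_to_ct, contour_mri)
```

Reusing one matrix for both physician and computer contours is what keeps the two sets in the same coordinate frame.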
Tumor volume definition
The reconstructed and registered MR images were used
to define the GTV using the CMS Focal Ease software. The
guidelines for contouring required the definition of the GTV
(enhancing tumor) from which the clinical and planning
(PTV) target volumes would be expanded. The GTV was
defined by the Gd contrast enhancement in T1 images or
changes in the white matter (edema as defined by T2 MRI
images). Each radiation oncologist performed three separate GTV outlines on the image set of each of the 11 patients, resulting in 33 contours per physician (99 in total). The three different
outlining sessions for each physician were separated by
approximately 1 month to prevent memory bias. The laptop
computer was brought to each radiation oncologist’s location of choice. The time the radiation oncologists took for
the outlining process was measured and recorded as part of
this study. All contours, including CT data and MRI transformation files, were copied to a UNIX research network to
provide the basis for comparison with the segmentations
prepared by the automated expert systems.
The expertise of each radiation oncologist is as follows:
Physician 1 is a radiation oncologist specializing in neuro-oncology with 9 years of clinical practice in radiation
oncology and involvement with more than 200 glioma brain
tumor cases, including brain study trials (21, 22). Physicians
2 and 3 are radiation oncologists, each with more than 20
years of experience in radiation oncology.
MRI segmentation. The MRI segmentation methods were
run on UNIX workstations (Sun Microsystems, Mountain
View, CA) using a network and software environment in
which MRI, CT, the FOCUS treatment planning systems, the laptop computer, and the image processing laboratory were all integrated to allow for convenient flow of images and other data between platforms. MRI segmentation was performed on the tumor volume data by two different techniques: kNN (13, 14, 23) and the fully automatic knowledge-guided system (17, 24). The kNN method requires the user to select training data from each MRI slice. In the present study, a medical physicist selected the training data. Previous research estimated the intra- and interoperator variation arising from training data selection for kNN at 9% and 5%, respectively (14). The KG system requires no user input; therefore, its output has no operator-induced variability.
The results from the kNN segmentation included scattered tumor-labeled pixels in addition to the main body of pixels identified as "the" tumor. Consistent with previously reported studies, pixels from the kNN results that were clustered together were classified as tumor, and the scattered individual pixels were discarded.
Final results from both MRI segmentation methods were
transformed to the CT coordinate system using the transformation matrix produced by the registration software to
allow comparison with the GTV outlines prepared by the
radiation oncologists.
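The cleanup of scattered kNN pixels can be sketched as a connected-component filter. The example assumes 4-connectivity within a slice and a hypothetical minimum cluster size (`min_size`); the criteria actually used to decide which pixels were "clustered together" are not specified in this section.

```python
from collections import deque


def connected_components(pixels):
    """Group a set of (row, col) pixels into 4-connected components."""
    remaining = set(pixels)
    components = []
    while remaining:
        seed = remaining.pop()
        component = {seed}
        queue = deque([seed])
        while queue:  # breadth-first flood fill from the seed
            r, c = queue.popleft()
            for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    component.add(nb)
                    queue.append(nb)
        components.append(component)
    return components


def drop_scattered_pixels(pixels, min_size=5):
    """Keep tumor-labeled pixels that belong to clusters of >= min_size pixels."""
    kept = set()
    for component in connected_components(pixels):
        if len(component) >= min_size:
            kept |= component
    return kept


# Hypothetical kNN output: a 3x3 tumor cluster plus scattered labeled pixels.
cluster = {(r, c) for r in range(10, 13) for c in range(10, 13)}
scattered = {(0, 0), (5, 20), (30, 30), (30, 31)}
tumor = drop_scattered_pixels(cluster | scattered)
```

With `min_size=5`, only the 9-pixel cluster survives; the isolated pixels and the 2-pixel pair are discarded.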
Analysis strategy
All analysis and image data transformation was performed using programs developed with Interactive Data
Language software version 5.4 (IDL, Research Systems
Inc., Boulder, CO).
Intraoperator and interoperator variability. The intraoperator variability was calculated by overlapping the three
volumes defined on the same patient by the same radiation
oncologist at roughly 1-month intervals. The variability was then calculated as the average disagreement (the size of each volume minus the size of the intersection of the three volumes) divided by the average size of the three volumes (see Appendix A).
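A minimal sketch of the intraoperator variability ratio described above, assuming each outline is stored as a set of voxel coordinates. It follows the verbal definition in this section; the exact formulas of Appendix A are not reproduced here, and the example volumes are hypothetical.

```python
def intraoperator_variability(volumes):
    """Average disagreement of repeated outlines relative to their mean size.

    volumes: list of sets of voxel coordinates, one per outlining session.
    The disagreement of each volume is its size minus the size of the
    intersection common to all volumes.
    """
    common = set.intersection(*volumes)
    sizes = [len(v) for v in volumes]
    mean_disagreement = sum(size - len(common) for size in sizes) / len(sizes)
    mean_size = sum(sizes) / len(sizes)
    return mean_disagreement / mean_size


# Hypothetical sessions: three 10-voxel outlines shifted by one voxel each,
# so 8 voxels are common to all three.
sessions = [{(x, 0, 0) for x in range(start, start + 10)} for start in (0, 1, 2)]
variability = intraoperator_variability(sessions)
```

Each session disagrees with the common core by 2 voxels out of 10, so the example gives 0.2, i.e., 20% variability.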
The interoperator variability was calculated from the nine 3D outlines available for each of the 11 patients by computing the disagreement between each volume outlined by one physician and each volume outlined by the other two physicians for the same patient. Repeating this process for every patient yielded, for each physician and patient, the average disagreement between that physician's three contours and the six contours prepared by the other two physicians (see Appendix A).
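The interoperator comparison can be sketched the same way. Because the pairwise formula is given only in Appendix A, the sketch assumes that the intraoperator ratio is simply applied to each pair of volumes; under that assumption the pairwise measure reduces to 1 minus the Dice coefficient, (|A| + |B| - 2|A∩B|) / (|A| + |B|). Volumes are sets of voxel indices, and the physician labels and contours in the example are hypothetical.

```python
from itertools import product


def pairwise_disagreement(a, b):
    """Mean of (size - overlap) divided by mean size, for two voxel sets."""
    overlap = len(a & b)
    mean_disagreement = ((len(a) - overlap) + (len(b) - overlap)) / 2
    mean_size = (len(a) + len(b)) / 2
    return mean_disagreement / mean_size


def interoperator_variability(contours, p, q):
    """Average disagreement over all contour pairs of physicians p and q."""
    pairs = list(product(contours[p], contours[q]))
    return sum(pairwise_disagreement(a, b) for a, b in pairs) / len(pairs)


# Hypothetical contours: each physician repeats the same 10-voxel outline,
# and the two outlines share 5 voxels.
contours = {
    "physician 1": [set(range(0, 10))] * 3,
    "physician 2": [set(range(5, 15))] * 3,
}
v12 = interoperator_variability(contours, "physician 1", "physician 2")
```

Averaging over all 3 × 3 contour pairs per physician pair mirrors the per-pair rows reported in Table 2.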
Accuracy. It is customary for radiation oncologists to
delineate the GTV used in radiation therapy. The difference
between brain GTV delineations by different physician specialties (radiation oncologists, radiologists, neurosurgeons)
has been reported by Weltens et al. (10). This research
evaluates the automated segmentation methods as possible
tools for delineating GTV in connection with treatment
planning of brain tumor volumes. For purposes of this
study, a probabilistic interpretation of the volumes delineated by the radiation oncologists provided the basis for
evaluating the accuracy of both the individual physicians
and the automated segmentation systems. Specifically, the
probability that a given pixel in an image is properly classified as part of the tumor volume is determined by the
number of times that this pixel was included in the nine
outlines prepared by the three physicians. Every pixel in the
image is labeled with an integer value (0–9) corresponding
to the number of physician contours in which it was included. This pixel label provided the weight for measuring
accuracy. Final accuracy for the computer segmentation is
then defined as the ratio of the total sum of weights contained within the computer segmentation volume to the total
weights generated from the nine volumes produced by the
physicians (Appendix B). The same protocol was used to
determine the accuracy of each contoured volume produced
by each physician. This approach measures the true-positive
rate. To estimate the false-positive rate (i.e., the level of
agreement between physicians on healthy tissue that was
incorrectly characterized as constituting part of the GTV),
the study calculated the excluded volume accuracy in an
analogous manner (see Appendix B).
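The weighted accuracy measure can be sketched as follows, assuming voxels are hashable coordinates and the nine physician contours are sets. A `Counter` accumulates, for each voxel, the number of contours that include it (the 0–9 weight), and accuracy is the weight captured by a segmentation divided by the total weight. The optional `min_weight` argument is an assumption added for illustration: it restricts both sums to voxels on which at least that many physicians agreed. The excluded-volume (false-positive) measure of Appendix B is not reproduced.

```python
from collections import Counter


def weight_map(physician_contours):
    """Count how many physician contours include each voxel (weights 0-9)."""
    weights = Counter()
    for contour in physician_contours:
        weights.update(contour)  # each voxel in a contour adds 1
    return weights


def weighted_accuracy(segmentation, weights, min_weight=1):
    """Weight captured by a segmentation divided by the total weight.

    min_weight > 1 restricts both sums to voxels included by at least that
    many physician contours (an illustrative assumption).
    """
    total = sum(w for w in weights.values() if w >= min_weight)
    inside = sum(
        w for voxel, w in weights.items() if w >= min_weight and voxel in segmentation
    )
    return inside / total


# Hypothetical nine contours over voxel indices: five smaller, four larger.
contours = [set(range(0, 10))] * 5 + [set(range(0, 12))] * 4
weights = weight_map(contours)
computer = set(range(0, 8))
acc = weighted_accuracy(computer, weights)          # 72 / 98
acc_core = weighted_accuracy(computer, weights, 7)  # 72 / 90
```

Because no penalty is applied for voxels outside the physician contours, enlarging the segmentation can only raise this score, which is why the false-positive side is measured separately.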
RESULTS
Image registration
The CT and MR images were registered using the automatic registration function of the software. The radiation
oncologist specializing in neuro-oncology reviewed each
case and, if necessary, performed an additional manual
adjustment. The same transformation matrix was used for comparing contours generated by the physicians with those produced by the segmentation methods, so registration errors did not contribute to differences between the two sets of contours.
Operator time
The time each radiation oncologist took to outline all 11 patients was recorded. It ranged from 4.0 to 6.5 hours in aggregate (an average of approximately 30 min per patient). Physician 1 spent the most time outlining, averaging approximately 30 min per patient, whereas Physician 2 spent the least time, averaging approximately 20 min per patient.
The time to perform a kNN segmentation averaged 20
min per patient, with some variation based upon the number
of slices evidencing enhanced tumor and the difficulty of
selecting the training data for the kNN segmentation algorithm. For the KG system, the only time required was in the
preparation of the MRI scans for segmentation, which required approximately 1.5 h of operator time. This data preparation task could be substantially automated in the future, further reducing the human operator time required by the KG system. The automatic KG segmentation itself required approximately 30 min of computer time for all patients and no user input.
Manual outline variability
Reproducibility of the delineation of the GTV on the MRI scans by the same radiation oncologist (intraoperator variability) was assessed; the results are shown in Fig. 1 and Table 2. The intraoperator variability averaged 20% ± 16% over all 33 contour sets of the 11 patients. The reproducibility of the delineation of target volume was generally better in preoperative cases (18 sets of contours: 15%) than in postoperative cases (15 sets of contours: 27%).
The difference among GTVs identified by the three radiation oncologists (interoperator variability) was also assessed, giving a total average of 28% ± 12% (Fig. 2 and Table 2). The variability in the six preoperative cases was 24%, with a higher average obtained for the postoperative cases (32%).
From Figs. 1 and 2, a large variation can be observed for
Patient 6. This was a difficult case for both physicians and
the automated systems. The MRI used was postoperative
and the enhancement boundaries were not clear because of
cystic formation inside the resected area (Fig. 3). Note that even though there was a large variation for Patient 6, the median is close to the average and within the standard deviation (see Table 2).
Computer segmentations
For the automated segmentation methods, it should be
noted that the KG algorithm was not designed to evaluate
nonenhancing tumors such as those encountered in connection with Patients 1 and 2. For Patients 3 and 6, the KG
method identified tumors in very few of the slices that had
physician outlines for tumor. Patient 6 had cystic formation inside a partly enhanced area, so the margins were not clear (Fig. 3). Patient 3 had dentures and implants that caused artifacts in the images, making the KG automatic segmentation difficult (Fig. 4). In this case, there was also cystic formation inside the enhanced area. For both of these cases, the kNN segmentation method performed within the contours produced by the physicians.

Fig. 1. Graph of physician intraoperator variability for the 11 patients.
The KG segmentation method performed poorly for sections of the tumor that were located in the lower part of the
brain, which was the case for Patients 4 and 9, in whom the
lower axial scans were not identified properly regardless of
the enhancing property of the tumor (Fig. 5). For both of
these patients, the KG missed the last slice of the enhancing
tumor.
Accuracy
As described in the Methods and Materials section and Appendix B, the physicians' contours were used as the basis for assessing the accuracy of the radiation oncologists and of the MRI segmentation methods in determining the GTV for 3D radiation treatment planning of brain tumors. The results are tabulated in Table 3. The kNN method gave an average accuracy of 59% for preoperative scans compared with 52% for postoperative scans. For the KG method, three of the preoperative cases (Patients 1, 2, and 3) were nonenhancing tumors or had cystic formation and could not be segmented.
The average accuracy of the three physicians was 85% ± 7%, compared with 56% ± 6% for the kNN method and 52% ± 7% for the KG method, a difference from the physicians' contours of 29 and 33 percentage points for the kNN and KG methods, respectively. Comparing this difference with the average interoperator variability of 28% ± 12% for all 11 cases (with a range of variability between physicians from 17% to 60% [Fig. 2]), the automated segmentation methods are within the variability range of the physicians. It is important to note that the design of this study defined the "true" volume using the GTVs generated by the same three radiation oncologists to whom the automated systems were compared; accordingly, within this conceptual framework the accuracy of the automated systems could not have exceeded that of their human counterparts. This limitation of the study is explored further in the Future Work section.

Table 2. Intraoperator and interoperator variability*

| Intraoperator variability | Average | Median |
|---|---|---|
| Physician 1 | 13 ± 5 | 13 |
| Physician 2 | 26 ± 19 | 22 |
| Physician 3 | 22 ± 23 | 15 |
| Average | 20 ± 16 | 16 |
| Volume (cm3) | 63 ± 33 | 61 |

| Interoperator variability | Average | Median |
|---|---|---|
| Physician 1–2 | 30 ± 11 | 26 |
| Physician 1–3 | 23 ± 11 | 21 |
| Physician 2–3 | 30 ± 14 | 27 |
| Average | 28 ± 12 | 23 |
| Volume (cm3) | 63 ± 33 | 61 |

* Variability values are percentages of total volume; the Volume rows are in cm3.

Fig. 2. Graph of physician interoperator variability for the 11 patients.
The accuracy measure used for this study favors larger volumes because it includes no penalty for "false-positive" pixels. This false-positive effect was quantified as a ratio based on the volume included by the computer segmentations but excluded by the physicians' modeling volume; these results are set forth in Table 4. The false-positive rate was 8% ± 11% for the kNN method and 8% ± 8% for the KG method, whereas that of the physicians was 17% ± 11%. Thus the automated segmentation methods erred on the side of underestimating tumor volume compared with the physicians.
The two segmentation methods were assessed visually
and quantitatively to evaluate where the major volume
differences occurred between the contours delineated by the
physicians and the automated systems. This analysis should
prove useful in suggesting further studies with larger sample
sizes, which could significantly improve the accuracy of
automated contouring systems for radiation oncology. In
general, the largest variations between the contouring of the physicians as a group and that produced by the automated systems were found at the superior and inferior edges of the tumor. This effect can be seen clearly in Fig. 5, which shows enhancing tumor in the most inferior slice that the kNN method did not identify properly and the KG method failed to detect entirely. For slices in the central
sections of tumors, both segmentation methods provided
contours that were much closer to those created by human
experts.
Another example of this effect can be seen in Fig. 6, which shows the three superior slices for Patient 9 and the contours drawn by the physicians in the top slice showing tumor. Images A through C demonstrate that the enhancement on image C was included mainly because of the physicians' knowledge that there is tumor at the corresponding locations in the previous slices. The kNN and KG segmentation methods are limited to two-dimensional identification protocols (i.e., each individual slice is analyzed for tumor without considering adjacent slices). The drawing of contour edges by human experts is a very subtle and subjective activity, blending scientific training with heuristics developed through experience with a variety of tumors contoured over many years. Note the variations in the contours among the physicians in this study. Interestingly, in this case one physician drew no GTV on a slice that he marked as containing tumor the next two times he was faced with identical data.
Figure 7 shows a 3D reconstruction of the tumor volume
drawn by a physician (outer yellow volume) and the GTV
estimated by the kNN (Fig. 7a) and the KG (Fig. 7b)
systems. It can be seen clearly that the physician volume
contains the GTV produced by the segmentation methods
and that the segmentation methods fail to identify tumor
most frequently in the superior and inferior edges of a
tumor.
In addition to the differences found at the superior and
inferior edges of the tumor volume, the contours prepared
by the segmentation systems agree with the physicians more
at the center of the tumor than on the outside borders as
shown in the example of contours in Fig. 8. Similarly, the
areas where the physicians agreed most consistently (i.e.,
the regions where the pixels were included in at least seven
of the nine physician outlines) were located near the center
of the tumor. The corresponding accuracy for these areas of high physician agreement was 75% for kNN and 72% for KG.

Fig. 3. Magnetic resonance image of Patient 6 showing a gross tumor volume (GTV) contour from (a) Physician 1, (b) Physician 2, (c) Physician 3, and (d) supervised k-nearest neighbors (kNN) segmentation result. Notice the variability between all physician GTV contours and the close agreement of the kNN segmentation with Physician 1. The kNN performed well for a difficult case involving a cystic formation inside a partly enhanced area. Notice that all three physicians chose an area beyond the contrast-enhancing area. This is due to information from the previous image slice. The kNN method misses additional tumor volume because it does not use three-dimensional information (i.e., volume information from previous and following images). The magnetic resonance image shown is a T1-weighted axial scan after application of Gd contrast, field of view = 220 mm, TR/TE = 400/8 ms, flip angle = 90°.
Figure 9 shows a section of the receiver operating characteristic (ROC) curves for all three physicians compared with those for the kNN and KG systems. The ROC curve is a plot of the true-positive (TP) rate against the false-positive (FP) rate. It shows the tradeoff between sensitivity (the fraction of true tumor pixels correctly identified) and specificity (the fraction of non-tumor pixels correctly excluded): any increase in sensitivity is accompanied by a decrease in specificity. The closer a curve follows the left-hand and top borders of the ROC space, the more accurate the test. True tumor for this study is based on the number of times a pixel was included in an outline by the physicians. The curves for the segmentation methods must necessarily lie below those of the physicians, since the latter defined "truth" for purposes of this study. The automated segmentation systems tend to fall short in sensitivity but have a high degree of specificity, as evidenced by the data summarized in Tables 3 and 4 and Figs. 7 and 8.

Fig. 4. Magnetic resonance image of Patient 3 showing a small teeth artifact next to tumor containing Gd enhancement with nonenhancing cystic necrotic centers. Contours shown are gross tumor volume contours of (a) Physician 1, (b) Physician 2, (c) Physician 3, and (d) supervised k-nearest neighbors (kNN) segmentation method. The kNN method was able to segment this tumor and obtain results close to the physicians' outlines. The magnetic resonance image shown is a T1-weighted axial scan after application of Gd contrast, TR/TE = 400/8 ms, field of view = 220 mm, flip angle = 90°.
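One way to place a single binary segmentation in ROC space is to define the reference "true tumor" as the voxels included by at least k physician contours and compute the true-positive and false-positive rates for each k. This construction, the image domain passed as `domain`, and the example data are assumptions for illustration; the procedure actually used to produce Fig. 9 is not described in this section.

```python
def rates(segmentation, reference, domain):
    """Return (false-positive rate, true-positive rate) of a voxel set."""
    negatives = domain - reference
    tpr = len(segmentation & reference) / len(reference)
    fpr = len(segmentation & negatives) / len(negatives)
    return fpr, tpr


def consensus_roc_points(segmentation, weights, domain, max_weight=9):
    """(FPR, TPR) points with the reference defined as voxels included by
    at least k physician contours, for k = 1..max_weight."""
    points = []
    for k in range(1, max_weight + 1):
        reference = {voxel for voxel, w in weights.items() if w >= k}
        if reference and domain - reference:  # both classes must be non-empty
            points.append(rates(segmentation, reference, domain))
    return points


# Hypothetical 100-voxel image: voxels 0-9 included by all nine contours,
# voxels 10-11 by four; the segmentation adds one stray voxel (50).
weights = {v: 9 for v in range(10)}
weights.update({10: 4, 11: 4})
domain = set(range(100))
segmentation = set(range(8)) | {50}
points = consensus_roc_points(segmentation, weights, domain)
```

A segmentation that stays inside the high-agreement core, as the automated methods tended to, shows a low false-positive rate but loses sensitivity when the reference includes low-agreement edge voxels.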
DISCUSSION
The present study confirms published findings that variability in tumor contouring by human experts is high. Ten Haken and coworkers ran a simple test to assess the dosimetric consequences of imprecision in tumor volume definition by a team of physicians contouring on CT, MRI, and fused MRI-CT images; after two iterations of contouring, the defined tumor volumes were smaller, averaging just 75% of the average physical volumes indicated in the first set of contours (7). Yamamoto and coworkers measured inter- and intraoperator variability exceeding 10% for CT-contoured areas of brain tumors (8). Another study reported very large interobserver variations in brain tumor delineation (range 9–32%) for different physician specialties contouring on both CT alone and CT with MRI (10). This error is larger than the setup variations and organ motions that are traditionally taken into account in radiation therapy planning. All of these results demonstrate the need for a method of contouring that is more consistently reproducible.

Fig. 5. Series of reconstructed magnetic resonance axial images for Patient 4 showing contours from Physician 1, supervised k-nearest neighbors (kNN), and knowledge-guided (KG) segmentation. Notice the excellent results for the middle slices for both automated segmentation methods and the failure of the kNN and KG methods for inferior slices, where enhancement and tumor volume margins are not as clear and knowledge from previous slices is necessary to identify the correct tumor volume. Slices shown are T1-weighted axial scans after application of Gd contrast, field of view = 220 mm, flip angle = 90°. Slices shown are 4-mm spaced.
In this study, the radiation oncologist who took the most time for outlining achieved the smallest intraoperator variability (13%; the average for all of the oncologists was 20%), and the one who took the least time produced the largest intraoperator variability (26%).
The variation between different radiation oncologists, or interoperator variability, ranged from 11% to 69%, with an average of 28%. These results mirror previously published results (8–10) and show that there is significant uncertainty in target volume definition, even when such volumes are determined by a single radiation oncologist observing the same set of data on multiple occasions. The variability in GTV delineation was about 10% larger in postoperative cases than in preoperative cases. Similarly, a larger variation was found for postoperative cases with the automatic segmentation methods. In postoperative cases, the margins of residual tumor are unclear, making identification of the GTV a difficult task for both physicians and automated segmentation systems. Previous studies report similar results (8, 9).
The purpose of this study was to evaluate KG and kNN as potential “cyber colleagues” for radiation oncologists in defining tumor volume for treatment planning. We proposed a probabilistic measure of accuracy that accounts for the inherent variability in operator judgment. We found that, even without any of the system enhancements suggested herein, the automated segmentation methods could qualify as independent experts because they perform within the large range of interoperator variability found among radiation oncologists.
Several factors were identified that could improve the accuracy of the segmentation methods for use in radiation treatment planning. The greatest discrepancies between the contours produced by the automated segmentation methods and those of the physicians were found at the superior and inferior edges of the tumor volume. For sections corresponding to the middle of the tumor volume, the kNN and KG methods performed well (74% on average) compared with the physicians.
Table 3. Accuracy of physicians and segmentation methods*
Patient  MRI type  Average volume (cm3)  Physician 1  Physician 2  Physician 3  kNN method  KG method
1        Pre       28                    86           87           89           57          n/a
2        Pre       90                    92           82           94           67          n/a
3        Pre       98                    92           95           91           62          n/a
4        Pre       40                    86           92           89           58          48
5        Post      108                   86           90           91           62          50
6        Post      73                    71           63           57           52          n/a
7        Pre       100                   89           94           88           57          63
8        Post      63                    84           88           89           51          54
9        Pre       36                    74           91           78           52          43
10       Post      17                    79           87           88           48          60
11       Post      39                    91           89           86           46          48
Average            63 ± 33               84 ± 7       87 ± 9       85 ± 10      56 ± 6      52 ± 7
* Accuracy values (Physician 1–3, kNN, and KG columns) are percentages; average volumes are in cm3.
Abbreviations: MRI = magnetic resonance imaging; kNN = supervised k-nearest neighbors; KG = knowledge-guided; pre = presurgery MRI; post = postsurgery MRI.

Table 4. Excluded accuracy for physicians and segmentation methods*
Patient  MRI type  Average volume (cm3)  Physician 1  Physician 2  Physician 3  kNN method  KG method
1        Pre       28                    8            20           15           7           n/a
2        Pre       90                    13           6            19           4           n/a
3        Pre       98                    5            18           4            2           n/a
4        Pre       40                    5            25           9            3           4
5        Post      108                   7            18           15           3           5
6        Post      73                    34           58           32           41          n/a
7        Pre       100                   6            23           7            1           4
8        Post      63                    8            19           17           7           14
9        Pre       36                    5            62           11           9           3
10       Post      17                    6            26           24           12          23
11       Post      39                    14           16           8            1           1
Average            63 ± 33               10 ± 9       26 ± 17      15 ± 8       8 ± 11      8 ± 8
* Excluded-accuracy values (Physician 1–3, kNN, and KG columns) are percentages; average volumes are in cm3.
Abbreviations: MRI = magnetic resonance imaging; kNN = supervised k-nearest neighbors; KG = knowledge-guided; pre = presurgery MRI; post = postsurgery MRI.

Fig. 6. Series of magnetic resonance axial images for Patient 9. (a), (b), and (c) show the top three slices of the tumor, spaced 4 mm apart. Note that (c) shows little enhancement, but by comparing it with the previous images, some tumor volume can be seen. The tumor volume drawn the first time by each physician on the MR slice shown in (c) is shown in images (d), (e), and (f) for Physicians 1, 2, and 3, respectively. There is a large variation among the physicians’ contours. The segmentation methods did not identify tumor volume on (c). Slices shown are T1-weighted axial scans after application of Gd contrast, field of view = 220, TR/TE = 400/8 ms, flip angle = 90°.
A greater difference at the superior and inferior borders of the tumor results, at least in part, from the limited enhancement found at the edges of the tumor. Drawing contour edges is very subtle and subjective. The radiation oncologists use a 3D method of contouring, in which the previous and subsequent two-dimensional slices are used to predict the presence of tumor volume on the slice in question. This knowledge needs to be included in the segmentation methods to improve accuracy (i.e., a 3D segmentation method is needed that uses knowledge and pixel information about the tumor from adjacent two-dimensional slices).
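As one hypothetical illustration of how adjacent-slice information might enter an automated method, the Python sketch below implements a simple post-processing rule; it is not a method from this study. It assumes a per-pixel tumor likelihood from some 2D classifier (`likelihood`, hypothetical), a stack of per-slice binary segmentations, and two made-up thresholds (`strict`, `relaxed`). A pixel whose likelihood falls between the thresholds is accepted only if an adjacent slice already labels it as tumor, loosely mimicking how a physician reads a weakly enhancing edge slice against its neighbors.

```python
import numpy as np


def neighbor_support(masks):
    """Fraction of adjacent slices (z - 1 and z + 1) labeling each pixel as tumor.

    masks: boolean array of shape (slices, rows, cols), one 2D segmentation
        per axial slice; the first and last slices have a single neighbor.
    """
    m = np.asarray(masks, dtype=float)
    support = np.zeros_like(m)
    count = np.zeros((m.shape[0], 1, 1))
    support[1:] += m[:-1]   # contribution of the previous slice (z - 1)
    count[1:] += 1
    support[:-1] += m[1:]   # contribution of the next slice (z + 1)
    count[:-1] += 1
    return support / np.maximum(count, 1)


def relax_with_neighbors(likelihood, masks, strict=0.5, relaxed=0.3):
    """Hypothetical rule: keep pixels at or above `strict`, and also accept pixels
    at or above the lower `relaxed` threshold when an adjacent slice marks them."""
    support = neighbor_support(masks)
    return (likelihood >= strict) | ((likelihood >= relaxed) & (support > 0))


# Three 2x2 slices: tumor at pixel (0, 0) on slices 0 and 2, weak signal on slice 1.
masks = np.zeros((3, 2, 2), dtype=bool)
masks[0, 0, 0] = masks[2, 0, 0] = True
likelihood = np.zeros((3, 2, 2))
likelihood[[0, 2], 0, 0] = 0.9
likelihood[1, 0, 0] = 0.4   # below `strict`, above `relaxed`
print(relax_with_neighbors(likelihood, masks)[1])
```

The sketch only compares the same pixel position across slices; a practical 3D method would also have to allow for the tumor boundary shifting in-plane from one slice to the next.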
Compared with the kNN method, the KG method performed poorly for glioma cases showing Gd enhancement with nonenhancing cystic necrotic centers. The tumor margins are not clear in these cases, and even the physicians’ contours show larger intra- and interoperator variability for them. A possible solution is to make more use of the T2-weighted MRI images in the automatic segmentation. Additionally, the KG method failed to detect tumor volume located in the lower part of the brain. The KG system performs differently in different areas of the brain because it has rules describing the anatomy at the various levels through the brain; anatomic structures are simpler in superior areas of the brain and increase in complexity toward inferior sections. The kNN method gave better results in these cases because some user input is involved in selecting the initial tumor pixels and the slices from which the kNN method begins its segmentation analysis.
The two cases that showed nonenhancing tumor volumes were not segmented by the KG method. Automatic segmentation of nonenhancing brain tumors needs to be incorporated into the knowledge-guided technique. An automatic method that separates nonenhancing brain tumors from healthy tissue in MRI images has shown promising results (25).
In summary, more work is needed on the KG method to make it fully suitable for use in radiation therapy; this work would include modifications to permit contouring of partially enhancing tumors, resection cavities, and nonenhancing tumors. The supervised kNN method performed better under these special circumstances because it uses user input for the initial selection of training pixel data.
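To show why the user-selected training pixels matter so much, here is a minimal Python sketch of a generic supervised kNN pixel classifier. It is not the authors’ implementation: the feature vector (one intensity per MRI channel, e.g. hypothetical T1, T1 post-Gd, and T2 values), the integer class encoding, the Euclidean distance, and the value of k are all assumptions. Each pixel receives the majority label of its k nearest training pixels in feature space, so the labels the user supplies directly determine every decision.

```python
import numpy as np


def knn_classify(train_features, train_labels, features, k=3):
    """Give each pixel the majority label of its k nearest training pixels.

    train_features: (n_train, n_channels) intensities of user-selected pixels,
        e.g. hypothetical (T1, T1 post-Gd, T2) values.
    train_labels: (n_train,) non-negative integer labels (e.g. 0 = normal, 1 = tumor).
    features: (n_pixels, n_channels) intensities of the pixels to classify.
    """
    train_features = np.asarray(train_features, dtype=float)
    train_labels = np.asarray(train_labels, dtype=int)
    features = np.asarray(features, dtype=float)
    # Squared Euclidean distance from every pixel to every training pixel.
    d2 = ((features[:, None, :] - train_features[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]   # indices of the k nearest
    votes = train_labels[nearest]             # (n_pixels, k) neighbor labels
    n_classes = int(train_labels.max()) + 1
    counts = np.stack([(votes == c).sum(axis=1) for c in range(n_classes)], axis=1)
    return counts.argmax(axis=1)              # ties go to the lower label


# Two-channel toy data: bright "tumor" pixels and dark "normal" pixels.
train = [[100, 100], [105, 98], [97, 103], [10, 10], [12, 8], [8, 11]]
labels = [1, 1, 1, 0, 0, 0]
print(knn_classify(train, labels, [[95, 98], [12, 9]]))  # [1 0]
```

The brute-force distance matrix keeps the sketch short but grows as pixels × training pixels; a full MRI volume would call for a spatial index or batching.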
Fig. 8. Axial slice showing one contouring each by Physicians 1, 2, and 3 (outer contours in red, orange, and pink, respectively), the supervised k-nearest neighbors segmentation (inner contour in blue), and the knowledge-guided segmentation (inner contour in light blue). The figure illustrates the general pattern of decreasing agreement between the contours produced by the automatic segmentation methods and those of the physicians from the center of the tumor toward its outer borders. It also demonstrates that the computer segmentations tend to agree with the contours prepared by the radiation oncologists when the radiation oncologists agree with each other.
Fig. 7. Three-dimensional reconstructed images for Patient 11 showing the contours of Physician 1 (outer volume in yellow) together with the supervised k-nearest neighbors (kNN) segmentation (inner red volume in (a)) and the knowledge-guided (KG) segmentation (inner red volume in (b)). The kNN and KG methods agree more closely with the physician toward the center of the tumor than at the superior and inferior borders. The physician volume contains the segmented gross tumor volume (GTV). Note that the segmentation methods undersegment the GTV compared with the radiation oncologist.
Future work should concentrate on optimizing the segmentation techniques to improve the accuracy of their results, especially with respect to defining the inferior and superior borders of the tumor volume. Note that different glioma types were included in this research to provide a basis for possible future applications of brain segmentation methods. It would be of interest to analyze the variability of the segmentation methods by tumor type in more depth, selecting patients with a specific type of tumor (glioblastoma, astrocytoma, or oligodendroglioma) and studying the tumor-specific segmentation results.
Both segmentation methods considered herein should also be enhanced to identify edema and structures at risk. This would permit future versions of these automated systems to provide outlines for such structures and to assist physicians with automatic delineation of the clinical target volume and PTV. Additionally, studying the effects of creating radiation therapy treatment plans with PTVs expanded from computer-segmented GTVs would provide valuable data on possible applications of these methods in actual clinical radiation therapy.
Additional research should also analyze the results of the automated segmentation methods in a way that does not favor the physicians involved in the comparison. As noted previously, the contours from the radiation oncologists used for comparison were also used to define the “true” volume. This posed a limitation, in that the accuracy of the computer segmentation methods would always fall below that of the physicians. A recent study recommends cooperation with a radiologist or neurosurgeon to reduce the variability in tumor volume definition (10). Incorporating a second group of physicians (preferably radiation oncologists, neurosurgeons, and radiologists) working together as experts to define a “true” GTV by mutual consensus would allow subsequent studies to compare fairly the accuracy of automated segmentation systems with that of radiation oncologists evaluating the same data.
Fig. 9. Receiver operating characteristic curves for all three physicians compared with the supervised k-nearest neighbors and knowledge-guided segmentation methods. The computer segmentations fail in sensitivity but have high specificity; that is, most pixels they identify lie within the tumor volume defined by the physicians.
CONCLUSIONS
Radiation therapy treatment planning requires radiation oncologists to expend substantial time and effort contouring tumor target volumes. This study investigated the application of state-of-the-art automated tumor segmentation methods for brain MRI as a tool for tumor volume definition in radiation therapy treatment planning. Starting from the assumption that the true target volume is found through the consensus of expert radiation oncologists, the study assessed the viability of computer segmentation methods as “cyber colleagues” of the human experts by measuring the accuracy and consistency of the automated systems’ contouring. The results demonstrate that the kNN and KG methods undersegment the tumor volume compared with the radiation oncologists but fall within the variability of the contouring performed by experienced radiation oncologists on the same data.
At present, the automated systems are not sophisticated enough to perform comparably to radiation oncologists. As automated systems improve, their accuracy is likely to approach that of human experts. Even in their current form, the automated systems evaluated herein produced more consistent, though not more accurate, results than the physicians. Automatic tumor outlining has the potential to speed the contouring process in radiation treatment planning, to produce a reproducible baseline for use by multiple physicians, and to aid multicenter trials by preventing the physician and center biases that can affect trial outcomes.
REFERENCES
1. Morris DE, Bourland JD, Rosenman JG, et al. Three-dimensional conformal radiation treatment planning and delivery for low- and intermediate-grade gliomas. Semin Radiat Oncol 2001;11:124–137.
2. Henkelman RM. New imaging technologies: Prospects for target definition. Int J Radiat Oncol Biol Phys 1991;22:251–257.
3. Jansen EP, Dewit LG, Van Herk M, et al. Target volumes in radiotherapy for high-grade malignant glioma of the brain. Int J Radiat Oncol Biol Phys 2000;56:151–156.
4. Caudrelier JM, Vial S, Gibon D, et al. MRI definition of target volumes using fuzzy logic method for three-dimensional conformal radiation therapy. Int J Radiat Oncol Biol Phys 2003;55:223–233.
5. Halperin EC, Bentel G, Heinz ER, et al. Radiation therapy treatment planning in supratentorial glioblastoma multiforme: an analysis based on post mortem topographic anatomy with CT correlations. Int J Radiat Oncol Biol Phys 1989;17:1347–1350.
6. Seither RB, Jose B, Paris KJ, et al. Results of irradiation in patients with high-grade gliomas evaluated by magnetic resonance imaging. Am J Clin Oncol 1995;18:297–299.
7. TenHaken RK, Thornton AF, Sandler HM, et al. A quantitative assessment of the addition of MRI to CT-based, 3-D treatment planning of brain tumors. Radiother Oncol 1992;25:121–133.
8. Yamamoto M, Nagata Y, Okajima K. Differences in target outline from CT scans of brain tumours using different methods and different observers. Radiother Oncol 1999;50:151–156.
9. Khoo VS, Adams EJ, Saran F, et al. A comparison of clinical target volumes determined by CT and MRI for the radiotherapy planning of base of skull meningiomas. Int J Radiat Oncol Biol Phys 2000;46:1309–1317.
10. Weltens C, Menten J, Feron M, et al. Interobserver variations in gross tumor volume delineation of brain tumors on computed tomography and impact of magnetic resonance imaging. Radiother Oncol 2001;60:49–59.
11. Pitkanen MA, Holli KA, Ojala AT, et al. Quality assurance in radiotherapy of breast cancer: variability in planning target volume delineation. Acta Oncol 2001;40:50–55.
12. Van den Berge DL, De Ridder M, Storme G. Imaging in radiotherapy. Eur J Radiol 2000;34:41–48.
13. Clarke LP, Velthuizen RP, Camacho MA, et al. MRI segmentation: Methods and applications. Magn Reson Imaging 1995;13:343–368.
14. Vaidyanathan M, Clarke LP, Velthuizen RP, et al. Comparison of supervised MRI segmentation methods for tumor volume determination during therapy. Magn Reson Imaging 1995;13:719–728.
15. Vaidyanathan M, Clarke LP, Hall LO, et al. Monitoring brain tumor response to therapy using MRI segmentation. Magn Reson Imaging 1997;15:323–334.
16. Li C, Goldgof DB, Hall LO. Automatic segmentation and tissue labeling of MR images. IEEE Trans Med Imaging 1993;12:740–750.
17. Clark MC, Hall LO, Goldgof DB, et al. Automatic tumor segmentation using knowledge-based techniques. IEEE Trans Med Imaging 1998;17:187–201.
18. Velthuizen RP, Clarke LP, Phuphanich S, et al. Unsupervised measurement of brain tumor volume on MR images. J Magn Reson Imaging 1995;5:594–605.
19. Meyer CR, Boes JL, Kim B, et al. Demonstration of accuracy and clinical versatility of mutual information for automatic multimodality image fusion using affine and thin-plate spline warped geometric deformations. Med Image Anal 1997;1:195–206.
20. Maes F, Vandermeulen D, Suetens P. Comparative evaluation of multiresolution optimization strategies for multimodality image registration by maximization of mutual information. Med Image Anal 1999;3:373–386.
21. Grossman SA, O’Neill A, Grunnet M, et al. Phase III study comparing three cycles of infusional carmustine and cisplatin followed by radiation therapy with radiation therapy and concurrent carmustine in patients with newly diagnosed supratentorial glioblastoma multiforme: Eastern Cooperative Oncology Group Trial 2394. J Clin Oncol 2003;21:1485–1491.
22. Kleinberg L, Grossman SA, Carson K, et al. Survival of patients with newly diagnosed glioblastoma multiforme treated with RSR13 and radiotherapy: Results of a phase II new approaches to brain tumor therapy CNS consortium safety and efficacy study. J Clin Oncol 2002;20:3149–3155.
23. Clarke LP, Velthuizen RP, Phuphanich S, et al. MRI: Stability of three supervised segmentation techniques. Magn Reson Imaging 1993;11:95–106.
24. Clarke LP, Velthuizen RP, Clark MC, et al. MRI measurement of brain tumor response: Comparison of visual metric and automatic segmentation. Magn Reson Imaging 1998;16:271–279.
25. Fletcher-Heath LM, Hall LO, Goldgof DB, et al. Automatic segmentation of non-enhancing brain tumors in magnetic resonance image. Artif Intell Med 2001;21:43–63.
APPENDIX A
Intraoperator and interoperator variability
The intraoperator variability was calculated as the ratio of the average disagreement (the size of each volume minus the size of the intersection of the three volumes, averaged over the three volumes) to the average size of the three volumes. For example, if a radiation oncologist had identified the same target volume in all three sets of contours prepared for a single patient, the variability for that patient would have been zero. The definition of intraoperator variability is represented in Fig. A:
Vi1, Vi2, and Vi3 indicate the tumor volumes delineated by radiation oncologist i on three occasions, and the shaded area Vi(int) represents the intersection of all three volumes. This results in the following formula for the intraoperator variability:
$$\mathrm{COV}^{\mathrm{intra}}_{i} = \frac{\frac{1}{3}\sum_{j=1}^{3}\left(V_{ij} - V_{i(\mathrm{int})}\right)}{\frac{1}{3}\sum_{j=1}^{3} V_{ij}} \times 100\%$$
Fig. A.
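The intraoperator formula can be sketched in Python as follows, assuming each contour is stored as a boolean voxel mask on a common grid and each volume is measured as a voxel count (the voxel size cancels in the ratio). The function name and mask representation are illustrative, not taken from the study.

```python
import numpy as np


def intraoperator_cov(contours):
    """Intraoperator variability (%) of repeated contours by one physician.

    contours: sequence of boolean masks on the same grid, one per contouring
        session (three per physician and patient in the study). Volumes are
        voxel counts; the voxel size cancels in the ratio.
    """
    masks = np.stack([np.asarray(c, dtype=bool) for c in contours])
    n_int = np.count_nonzero(np.logical_and.reduce(masks, axis=0))  # V_i(int)
    sizes = masks.reshape(len(masks), -1).sum(axis=1)               # V_ij
    disagreement = np.mean(sizes - n_int)  # (1/3) * sum_j (V_ij - V_i(int))
    return float(100.0 * disagreement / np.mean(sizes))


# Two identical 4-voxel contours and one 6-voxel contour that contains them.
a = np.zeros((4, 4), dtype=bool)
a[1:3, 1:3] = True
b = np.zeros((4, 4), dtype=bool)
b[1:3, 1:4] = True
print(round(intraoperator_cov([a, a, b]), 1))  # 14.3
```

Here the intersection is the 4-voxel contour, so the average disagreement is 2/3 voxel and the average size is 14/3 voxels, giving 2/14 of 100%.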
The interoperator variability was calculated from the nine sets of outlines for each of the 11 patients, by computing the disagreement between each outline prepared by each physician for a patient and the corresponding outlines prepared by each of the other two physicians for that same patient. Figure B shows the disagreement of one volume of physician i (Vi1) with one volume of another physician j (Vj1):
Fig. B.
The final interoperator variability is the average of the comparisons of all volumes of one physician with all the volumes of the other physician, as shown in the following formula:
$$\mathrm{COV}^{\mathrm{inter}}_{ij} = \frac{1}{9}\sum_{m=1}^{3}\sum_{n=1}^{3}\frac{V_{im} - V_{jn}}{V_{im} + V_{jn}} \times 100\%$$
This is done for each of the three physicians, yielding the average variability between the three contours prepared by one physician for each patient and the six sets of contours prepared by the other two physicians for that same patient. The greater the difference between the contours of different physicians, the larger this ratio becomes.
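A minimal Python sketch of the interoperator calculation follows, under one stated assumption: the numerator V_im − V_jn is read as the number of voxels inside V_im but outside V_jn (for the intraoperator formula, where the intersection lies inside every volume, this set reading and a plain size difference coincide). The function names and the boolean-mask representation are hypothetical.

```python
import numpy as np


def interoperator_cov(contours_i, contours_j):
    """COV_ij^inter (%): mean over all pairs (m, n) of physician i's and
    physician j's contours of (V_im - V_jn) / (V_im + V_jn).

    Assumption: V_im - V_jn is read as the number of voxels inside V_im but
    outside V_jn; the formula's notation does not state this explicitly.
    """
    ratios = []
    for vm in contours_i:
        vm = np.asarray(vm, dtype=bool)
        for vn in contours_j:
            vn = np.asarray(vn, dtype=bool)
            outside = np.count_nonzero(vm & ~vn)                  # V_im - V_jn
            total = np.count_nonzero(vm) + np.count_nonzero(vn)  # V_im + V_jn
            ratios.append(outside / total)
    return float(100.0 * np.mean(ratios))  # 1/9 of the double sum for 3 x 3


def physician_inter_variability(all_contours, i):
    """Average COV_ij^inter of physician i against every other physician j."""
    others = [j for j in range(len(all_contours)) if j != i]
    return float(np.mean([interoperator_cov(all_contours[i], all_contours[j])
                          for j in others]))


a = np.zeros((4, 4), dtype=bool)
a[1:3, 1:3] = True   # 4 voxels
b = np.zeros((4, 4), dtype=bool)
b[1:3, 1:4] = True   # 6 voxels, contains a
print(interoperator_cov([b], [a]), interoperator_cov([a], [b]))
```

With this reading the measure is asymmetric (a larger contour compared with a smaller one scores higher than the reverse), which is one reason the per-physician averaging over both other physicians matters.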
APPENDIX B
Calculation of accuracy
Accuracy was calculated by assuming that the probability that a region is part of the gross tumor volume is reflected by the number of times that region is included in the nine outline volumes produced by the three radiation oncologists. Every pixel in the image volume is labeled with an integer value equal to the number of physician contours in which it was included (e.g., a pixel never included in any physician outline has a value of zero, whereas a pixel included in every physician outline has a value of nine). The resulting composite physician GTV comprises pixels labeled with values from zero to nine that define the probability of finding tumor volume.
The pixel label provided the weight for measuring accuracy, and the analysis was done on a pixel-by-pixel basis. Thus, failing to classify a pixel as part of the tumor volume reduces accuracy in proportion to the weight associated with that pixel. For example, failing to include a pixel labeled nine would reduce accuracy more than missing a pixel labeled one (i.e., a pixel selected only once by the physicians).
Figure C shows a single volume of either a physician (Vij) or a segmentation method (Vk) compared with three of the nine physician volumes. Areas of higher pixel-label weight are represented by levels of gray in the figure (i.e., the areas where the single volume being evaluated intersects more physician contours):
The final accuracy, or true-positive rate, is then expressed as the ratio of the total sum of pixel weights included by the physician or segmentation volume (Vij or Vk, shown as shaded areas in Fig. C) to the total sum of pixel weights of the nine volumes produced by the physicians (represented by the entire area enclosed by the three volumes Vi1, Vi2, and Vi3 in Fig. C):
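$$\mathrm{Accuracy}(V_{ij}\ \text{or}\ V_k) = \frac{\sum_{\text{image pixels}} (V_{ij}\ \text{or}\ V_k)\cdot\frac{1}{9}\sum_{i=1}^{3}\sum_{j=1}^{3} V_{ij}}{\sum_{\text{image pixels}} \frac{1}{9}\sum_{i=1}^{3}\sum_{j=1}^{3} V_{ij}} \times 100\%$$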
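As a minimal Python sketch of the weighted accuracy, assume each of the nine physician contours and the evaluated volume are boolean masks on the same pixel grid, so that each V in the formula is a 0/1 indicator per pixel. The per-pixel weight (1/9)ΣΣV_ij is then the fraction of physician contours containing that pixel; the function names are hypothetical, and the divisor is taken as the number of contours supplied (nine in the study).

```python
import numpy as np


def pixel_weights(physician_contours):
    """(1/9) * sum_i sum_j V_ij per pixel: the fraction of physician contours
    containing each pixel (the divisor is the number of contours given)."""
    stack = np.stack([np.asarray(c, dtype=bool) for c in physician_contours])
    return stack.sum(axis=0) / len(stack)


def accuracy(volume, physician_contours):
    """Weighted accuracy (%) of a physician (V_ij) or segmentation (V_k) mask."""
    w = pixel_weights(physician_contours)
    v = np.asarray(volume, dtype=bool)
    return float(100.0 * w[v].sum() / w.sum())


# 1 x 4 image: pixel 0 is in all nine contours, pixel 1 in three, others in none.
contours = [np.array([[True, k < 3, False, False]]) for k in range(9)]
print(round(accuracy(np.array([[True, False, False, False]]), contours), 1))  # 75.0
```

Missing pixel 1 (weight 1/3) costs a quarter of the total weight, whereas missing pixel 0 (weight 1) would cost three quarters, which is the proportional penalty described above.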
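Similarly, excluded accuracy (false-positive) is expressed as:
$$\mathrm{Exc.\ Accuracy}(V_{ij}\ \text{or}\ V_k) = \frac{\sum_{\text{image pixels}} (V_{ij}\ \text{or}\ V_k)\cdot\left(1 - \frac{1}{9}\sum_{i=1}^{3}\sum_{j=1}^{3} V_{ij}\right)}{\sum_{\text{image pixels}} \left(1 - \frac{1}{9}\sum_{i=1}^{3}\sum_{j=1}^{3} V_{ij}\right)} \times 100\%$$
Fig. C.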
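The excluded accuracy can be sketched the same way, again assuming boolean masks on a shared pixel grid and a hypothetical function name. Each pixel now carries a non-tumor weight of 1 minus the fraction of physician contours containing it. Because the denominator sums over every image pixel, the value depends on the extent of the grid, so all volumes being compared must be evaluated on the same grid.

```python
import numpy as np


def excluded_accuracy(volume, physician_contours):
    """Weighted false-positive measure (%) of a physician or segmentation mask.

    Each pixel's non-tumor weight is 1 - (fraction of physician contours that
    contain it); both sums run over every pixel of the supplied image grid.
    """
    stack = np.stack([np.asarray(c, dtype=bool) for c in physician_contours])
    outside = 1.0 - stack.sum(axis=0) / len(stack)  # 1 - (1/9) sum_i sum_j V_ij
    v = np.asarray(volume, dtype=bool)
    return float(100.0 * outside[v].sum() / outside.sum())


# Same 1 x 4 example: pixel 0 in all nine contours, pixel 1 in three, others none.
contours = [np.array([[True, k < 3, False, False]]) for k in range(9)]
print(round(excluded_accuracy(np.array([[True, True, True, False]]), contours), 1))  # 62.5
```

Including pixel 0 adds nothing, since every physician marked it as tumor; including pixels 1 and 2 adds non-tumor weights of 2/3 and 1 out of a total of 8/3.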