Classification of DTI Major Brainstem Fiber Bundles
Yuan Liu

Abstract—MRI diffusion imaging of the human brain provides neuron fiber trajectories that can be grouped into bundles with distinctive neurological functions. In this paper, three major fiber bundles passing through the brainstem are identified. Features for representing fiber trajectories are studied, and a 3D shape context is adopted to capture fiber shape. Kernel PCA with a histogram intersection kernel is then applied to reduce the dimensionality of the fiber histogram. Both supervised and unsupervised classification methods are investigated, including discriminant analysis, support vector machines, k-nearest neighbors, and K-means, with different kernels and distance metrics. A leave-one-out validation is performed on 10 subjects, and the results show the feasibility of the approach for fiber bundling.
Index Terms—Diffusion tensor images; fiber classification;
shape context; supervised learning
I. INTRODUCTION
Diffusion tensor imaging (DTI) has provided new insights
for researchers to study tissue microstructure and
organization of brain white matter in vivo. From a set of
diffusion weighted images, a distinct diffusion tensor is
computed and assigned to each voxel. The principal directions
of the tensor have been shown to relate to the directionality of
white matter axon bundles [1]. Various algorithms,
deterministic or probabilistic, have been proposed to find
neural pathways based on this property [2] [3]. These tractography
models yield a set of pathways, i.e., fibers, by following the
principal direction.
Once a fiber set has been obtained by means of a tractography
method, the next step to achieve clinical relevance is to find
significant pathways that form anatomically named fiber bundles. That
is to say, we would like a classification of the brain
fibers into the major bundles. This is a difficult task, not only
because of the complexity of the problem itself but also because of the
deficiencies of the fiber tracking algorithms and tensor
reconstruction methods. In recent years, much effort has been
devoted to this field, which can be divided into three
main approaches:
• User-guided approaches: these methods identify
bundles based on prior knowledge from experts. This is
usually implemented by drawing a series of regions of
interest (ROIs) and using ROI operations to extract the
desired neural pathways. Methods of this kind have been
reported in [4] [5] [6]. Their main advantage is
that the extracted fibers are more accurate and robust thanks to
the prior information. However, they are time- and
labor-intensive, require a certain degree of training, and can
suffer from inter- and intra-operator variability.
• Cluster-based approaches: these methods use
properties of the fiber sets to group fibers from the
same bundle using some similarity measure
[7] [8] [9]. The main advantage of such methods is that they
are fully automatic and effortless. However, the resulting
clusters do not necessarily correspond to the defined
fiber bundles. As clustering algorithms are unsupervised,
they do not incorporate prior knowledge, unlike the
user-guided approaches. The robustness and accuracy of
such methods still need to be improved.
• Learning-based approaches: these methods attempt to
learn features of the fiber tracts from a training data set
and apply the trained model to a testing data set for
supervised classification [10] [11]. They can be
considered semi-automatic, as they require ground
truth to be marked for the training set.
The methods proposed in this paper draw on both the second
and the third category. We want to find a robust feature
descriptor for the fibers and apply machine learning
techniques for automatic classification. We would also like to
investigate the characteristics of different classifiers.
The brainstem is a region characterized by densely packed
fibers traveling to and from the cerebrum and cerebellum [12].
Some of these fibers are of critical importance in the initiation,
control, and execution of movement [13]. In our study, we aim
to classify three important fiber bundles passing through
the brainstem, i.e., the corticospinal tract (CST), the middle
cerebellar peduncle (MCP), and the superior cerebellar
peduncle (SCP). A schematic overview of the overall system is
shown in Fig. 1.
II. METHODOLOGY
2.1 Data Pre-processing
The DTI data used in this study are from 10 subjects in the
deep brain stimulation database, with voxel size 2×2×2 mm and
image size 128×128×60. Fibers are tracked in the native space
using DtiStudio [14], which is based on the Fiber Assignment
by Continuous Tracking (FACT) algorithm and a brute-force
reconstruction approach. Each fiber is represented as a discrete
three dimensional curve. The three major tracts in the brainstem,
namely the corticospinal tract (CST), the middle cerebellar
peduncle (MCP), and the superior cerebellar peduncle (SCP), are
identified according to [5] based on ROI editing and serve as the
ground truth for each subject. Fig. 2 shows these three bundles
overlaid on the color map for one subject.
Fig. 1 Schematic overview of the classification system.
The b0 image of one volume was chosen as the atlas and the
rest were aligned to this atlas by rigid and non-rigid registration.
Fibers were projected to this common space by composing the
deformation fields. Then each fiber was resampled to N
equidistant points, and short fibers were automatically discarded as
noise. The pre-processed fiber bundles of all subjects are
shown in Fig. 3 in the atlas space. A sketch of the resampling step is
given below.
2.2 Feature Extraction
As each fiber is represented by a unique vector of 3N
spatial coordinate values, a naïve way to extract features is
to take this spatial representation directly as the input for
classification. However, as we can see from Fig. 3, the fibers
are not perfectly aligned with each other; fibers of the same bundle
from different subjects tend to suffer from spatial misalignment.
But we can also notice that the same bundles tend to have similar
shapes. A good shape representation of tracts is therefore needed to
capture the differences between classes.
2.2.1 3D Shape Context
Shape context is a feature descriptor that allows for
measuring shape similarity and recovering point
correspondence. First proposed by [15], shape context is known
as a rich, robust, and discriminative descriptor used in
object recognition. Basically, given a list of contour points, the
shape of an object is captured by the distribution of the
points relative to each point on the shape. Usually,
this is computed as a histogram in log-polar coordinates, as
shown in Fig. 4.
Fig. 4 Shape context demonstration in the 2D case, as used in [10]. (a) The log-polar bins; counts for the outermost radial bins are shown. Reading the bins in the order indicated by the red curve in (a) gives the histogram in (b).
Fig. 2 The three extracted bundles for one subject. Red: CST; purple: MCP; green: SCP.
Fig. 3 The pre-processed fiber bundles of all subjects. (a) CST; (b) MCP; (c) SCP.
We extend the 2D version to 3D and build a “context” for
each fiber at its centered origin. Bins of 10° for the azimuthal and
polar angles and 20 log-spaced radial bins are used to construct the
histogram. Fig. 5 shows a sample tract from each of the three
bundles together with its histogram representation. This helps us cope
with the misalignment issue, as the histograms capture the distribution
of the tracts, a high-level feature as opposed to voxel-level features.
A sketch of this construction is given after Fig. 5.
Fig. 5 The histogram representation of a sample tract from each of the three bundles: (a) CST; (b) MCP; (c) SCP.
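As a rough sketch of the 3D shape context, assuming the bin counts in the text (36 azimuthal × 18 polar × 20 radial = 12,960 bins); the radial limits r_min and r_max are assumptions, since the paper does not state them:

```python
import numpy as np

def shape_context_3d(points, n_az=36, n_pol=18, n_rad=20,
                     r_min=0.1, r_max=100.0):
    """Histogram of fiber points in log-spherical bins about the fiber centroid.

    points: (N, 3) resampled fiber coordinates. Points with radius outside
    [r_min, r_max] fall outside the bin edges and are silently dropped.
    """
    p = points - points.mean(axis=0)                  # center at the fiber origin
    r = np.linalg.norm(p, axis=1)
    az = np.arctan2(p[:, 1], p[:, 0])                 # azimuth in [-pi, pi]
    pol = np.arccos(np.clip(p[:, 2] / np.maximum(r, 1e-12), -1.0, 1.0))
    r_edges = np.logspace(np.log10(r_min), np.log10(r_max), n_rad + 1)
    hist, _ = np.histogramdd(
        np.column_stack([az, pol, r]),
        bins=(np.linspace(-np.pi, np.pi, n_az + 1),   # 10-degree azimuthal bins
              np.linspace(0.0, np.pi, n_pol + 1),     # 10-degree polar bins
              r_edges))                               # 20 log-spaced radial bins
    h = hist.ravel()                                  # 36 * 18 * 20 = 12,960 bins
    return h / max(h.sum(), 1)                        # normalized histogram
```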
2.2.2 Kernel PCA
The dimensionality of the features directly influences the
complexity of the classification. Higher dimensional features
produce more complex classifiers, which lead to lower
training error. But as they tend to over-fit the data, they lose
generalization ability and the testing error increases. In our case,
each feature is a histogram with 12,960 bins. Intuitively,
this histogram is a very sparse representation, as most of the bins
contain no fiber points, and it can thus be reduced to a
much lower dimensional feature space.
Principal component analysis (PCA) is a widely used
dimensionality reduction technique to select the major features
by finding the most uncorrelated variables. This can be
obtained by the eigendecomposition of the covariance matrix $C$ of
the original data set:
$$C V = V \Lambda$$
where $V$ is the matrix of eigenvectors and $\Lambda$ is the diagonal
matrix of eigenvalues. By selecting the top $L$ largest eigenvalues,
the dimensionality can be reduced to $L$.
One drawback of PCA is that it can only capture linear
structure in the data. For high dimensional non-linear
data, PCA is not powerful enough to capture the correlations.
One non-linear extension is kernel PCA, which assumes that
the data can be almost linearly separated in a higher
dimensional space reached through a non-linear mapping. The “kernel trick”
is applied to the previous equation: the dot product is
replaced with a kernel function that simulates the dot product in
the high dimensional space. Various kernel functions
can be used. Here we adopt the histogram intersection
kernel, first introduced in [16] to compare color histograms.
The histogram intersection kernel between two histograms $x$ and $y$
is defined as:
$$k(x, y) = \sum_i \min(x_i, y_i)$$
Applying kernel PCA and keeping the components for the top six
eigenvalues, we project each histogram into a six-dimensional feature
space.
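A minimal sketch of this step, using scikit-learn's KernelPCA with a precomputed kernel matrix (the choice of scikit-learn is an assumption, as the paper does not name its implementation; H_train and H_test are hypothetical histogram arrays):

```python
import numpy as np
from sklearn.decomposition import KernelPCA

def intersection_kernel(A, B):
    """Histogram intersection k(x, y) = sum_i min(x_i, y_i) between all row pairs."""
    return np.array([[np.minimum(a, b).sum() for b in B] for a in A])

def embed_histograms(H_train, H_test, n_components=6):
    """Project (n_fibers, 12960) histograms into a 6-D kernel-PCA space."""
    kpca = KernelPCA(n_components=n_components, kernel="precomputed")
    Z_train = kpca.fit_transform(intersection_kernel(H_train, H_train))
    # The test kernel is computed against the TRAINING histograms so that
    # both sets live in the same embedding.
    Z_test = kpca.transform(intersection_kernel(H_test, H_train))
    return Z_train, Z_test
```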
2.2.3 Feature Scaling
Scaling features before classification is important [17].
The main advantage is to prevent attributes in greater numeric
ranges from dominating those in smaller numeric ranges. Another
reason is to avoid numerical difficulties during the calculation.
Chih-Wei Hsu et al. [18] recommend linearly scaling each
attribute to the range [-1, 1] or [0, 1].
For supervised classification, data sets are split into a training
set and a testing set. In this case, the same rescaling must be
applied to both to maintain consistency. For example, suppose we scaled
the first attribute of the training data from [-10, 10] to [-1, 1]. If the
first attribute of the testing data lies in the range [-11, 8], it
must be scaled to [-1.1, 0.8].
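A small sketch of this convention (again assuming scikit-learn; the scaler is fitted on the training features only, and the same affine map is reused on the test features):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical 6-D feature arrays standing in for the kernel-PCA projections.
Z_train = np.random.randn(90, 6) * 10.0
Z_test = np.random.randn(10, 6) * 11.0

scaler = MinMaxScaler(feature_range=(-1, 1)).fit(Z_train)  # fit on training only
Z_train_s = scaler.transform(Z_train)
Z_test_s = scaler.transform(Z_test)  # may exceed [-1, 1], matching the example above
```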
2.3 Fiber Classification
Both unsupervised and supervised learning methods
are investigated here. The unsupervised method is
K-means, which performs grouping directly on the testing data.
The supervised learning algorithms include discriminant analysis,
support vector machines, and k-nearest neighbors. For ease of
exposition, the training set is denoted as
$\{(x_i, y_i)\}_{i=1}^{n}$, which contains both the vectors of
observations $x_i$ and the corresponding class labels $y_i$.
The testing set is denoted as $\{x_j^*\}_{j=1}^{m}$, without knowledge
of the corresponding hidden class labels $y_j^*$.
2.3.1 Discriminant Analysis
Discriminant analysis assumes that the data are normally
distributed. The idea is to transform the multivariate
observations $x$ into univariate observations $y$ such that the $y$'s
derived from different classes are separated as much as
possible. This is done by maximizing the ratio of inter-class
scatter to intra-class scatter among the competing classes.
In the case of linear discriminant analysis (LDA), the classes are
assumed to share an identical full-rank covariance matrix $\Sigma$, and the
discriminant function for class $k$ with mean $\mu_k$ and prior $\pi_k$ is:
$$\delta_k(x) = x^T \Sigma^{-1} \mu_k - \tfrac{1}{2} \mu_k^T \Sigma^{-1} \mu_k + \log \pi_k$$
For quadratic discriminant analysis (QDA), the classes do not
necessarily share an identical covariance matrix, and the
discriminant function is:
$$\delta_k(x) = -\tfrac{1}{2} \log |\Sigma_k| - \tfrac{1}{2} (x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k) + \log \pi_k$$
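As a brief illustration (scikit-learn assumed; Z_train_s and Z_test_s are the hypothetical scaled features from above, and y_train is a placeholder label vector):

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

y_train = np.random.randint(0, 3, len(Z_train_s))  # placeholder bundle labels

# LDA pools one covariance matrix across all classes; QDA fits one per class.
lda = LinearDiscriminantAnalysis().fit(Z_train_s, y_train)
qda = QuadraticDiscriminantAnalysis().fit(Z_train_s, y_train)
y_lda, y_qda = lda.predict(Z_test_s), qda.predict(Z_test_s)
```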
2.3.2 Support Vector Machine
Unlike discriminant analysis, the support vector machine (SVM)
aims at finding a maximal margin, defined as the distance
between the bounding planes of different classes, while minimizing the
classification error:
$$\min_{w, b, \xi} \ \tfrac{1}{2} \|w\|^2 + C \sum_i \xi_i \quad \text{s.t.} \quad y_i (w^T x_i + b) \geq 1 - \xi_i, \ \xi_i \geq 0$$
Kernel functions are usually applied, including:
• Linear: $k(x, y) = x^T y$
• Radial basis function (RBF): $k(x, y) = \exp(-\gamma \|x - y\|^2)$
• Polynomial: $k(x, y) = (\gamma x^T y + r)^d$
• Sigmoid: $k(x, y) = \tanh(\gamma x^T y + r)$
SVM is a binary classifier. To extend it to the multi-class case,
we adopt the one-against-one decomposition. This transforms
the multi-class problem into a series of binary subtasks that can
each be trained by a binary SVM; the multi-class rule is then
constructed by a majority voting strategy.
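In practice this decomposition comes for free in common libraries; for instance, scikit-learn's SVC (an assumed implementation choice, reusing the hypothetical arrays from above) trains the pairwise binary SVMs internally and votes:

```python
from sklearn.svm import SVC

# SVC uses the one-against-one decomposition internally for multi-class data;
# with 3 bundle classes this trains 3 pairwise binary SVMs and majority-votes.
svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(Z_train_s, y_train)
y_svm = svm.predict(Z_test_s)
```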
2.3.3 K-Nearest Neighbors
K-nearest neighbors (KNN) is a non-linear classification
algorithm that provides a simple local approximation of the
conditional density function. Every unseen point $x$ is compared
through a distance function $d(x, x_i)$ to all points $x_i$ of the
training set. The $k$ smallest distances are found, and the
majority vote over the corresponding labels $y_i$ is taken as the
resulting label for $x$.
Different distance functions can be used, for example:
• Euclidean distance.
• Correlation: one minus the sample linear correlation between observations.
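A matching sketch (k = 8 follows the evaluation section; scikit-learn's "correlation" metric, provided via SciPy, is exactly one minus the sample correlation):

```python
from sklearn.neighbors import KNeighborsClassifier

# Brute-force neighbor search is selected automatically for this metric.
knn = KNeighborsClassifier(n_neighbors=8, metric="correlation")
knn.fit(Z_train_s, y_train)
y_knn = knn.predict(Z_test_s)
```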
2.3.4 K-Means
K-means is an unsupervised learning method. Starting from k
initial cluster centers, k-means clustering aims to partition the n
observations into k sets so as to minimize the within-cluster sum
of distances.
Similar to k-nearest neighbors, different distance functions can be adopted here:
• Squared Euclidean distance.
• Correlation: one minus the sample correlation between points.
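A clustering sketch (scikit-learn's KMeans, which minimizes the within-cluster sum of squared Euclidean distances; cluster indices are arbitrary and would still need to be matched to bundle names, e.g. by majority overlap with the ground truth):

```python
from sklearn.cluster import KMeans

# k = 3 clusters, one per bundle (CST, MCP, SCP); grouping is done directly
# on the (unlabeled) testing features, as described above.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(Z_test_s)
cluster_labels = km.labels_
```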
Fig. 6 Partial results for some classifiers: (a) SVM with linear kernel; (b) SVM with sigmoid kernel; (c) KNN with Euclidean distance; (d) K-Means with squared Euclidean distance.
III. EXPERIMENTAL EVALUATION
We test 10 classifiers: LDA, QDA, SVM with a linear kernel,
SVM with an RBF kernel, SVM with a polynomial kernel, SVM with a
sigmoid kernel, KNN with Euclidean distance and k = 8, KNN with
correlation distance and k = 8, K-Means with squared Euclidean
distance, and K-Means with correlation distance. Since we only have
10 volumes, we evaluate the classifiers' performance using a
cross-validation scheme. In a k-fold leave-one-out cross-validation,
the dataset is partitioned into k subsets; each time, k-1 subsets are
used for training and 1 for testing. The process is repeated k times
until every example has been tested once. For each classifier, we
calculate the Dice coefficient for every iteration.
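A sketch of this protocol, assuming leave-one-subject-out over the 10 volumes with a per-bundle Dice coefficient on the predicted fiber labels (`make_classifier` is a hypothetical factory standing in for any of the ten models):

```python
import numpy as np

def dice(pred, truth, label):
    """Dice coefficient for one bundle between predicted and true fiber labels."""
    a, b = pred == label, truth == label
    return 2.0 * np.sum(a & b) / (np.sum(a) + np.sum(b))

def loo_dice(features, labels, make_classifier, n_classes=3):
    """features, labels: per-subject lists of (n_fibers, 6) arrays and label vectors."""
    scores = []
    for i in range(len(features)):
        # Train on all subjects except subject i, test on subject i.
        X_tr = np.vstack([f for j, f in enumerate(features) if j != i])
        y_tr = np.concatenate([l for j, l in enumerate(labels) if j != i])
        clf = make_classifier().fit(X_tr, y_tr)
        y_pred = clf.predict(features[i])
        scores.append([dice(y_pred, labels[i], c) for c in range(n_classes)])
    return np.array(scores)  # one row per held-out subject, one column per bundle
```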
Fig. 6 shows partial results from some classifiers. SVM with
linear and sigmoid kernels are shown in (a) and (b), with
the red, blue, and green dots representing the training samples, the
black circles the learned support vectors, and the magenta circles the
misclassified testing data. (c) shows the result for the KNN classifier
based on Euclidean distance, and (d) the clustering result for
K-Means.
Fig. 7 shows the Dice coefficients for all classifiers under
leave-one-out validation.
The results are very satisfactory. Almost all classifiers label
the correct bundle with more than 80% accuracy. This indicates
that the features representing individual fibers are largely
separable between classes; hence the 3D shape context captures the
shape variability of the fibers well. Comparing the different
classifiers shows that, in our case, non-linear classifiers outperform
linear ones, which can be explained by the intrinsically non-linear
properties of our features. Moreover, the supervised methods have
higher accuracy and consistency than the unsupervised methods.
Fig. 7 Dice coefficient for all classifiers. X axis from 1 to 10: LDA, QDA, SVM
with linear kernel, SVM with RBF kernel, SVM with polynomial kernel, SVM
with sigmoid kernel, KNN using Euclidean distance, KNN using correlation,
K-Means using Euclidean distance, K-Means using correlation.
IV. DISCUSSION AND CONCLUSION
This paper presents an effort to use supervised classification
methods to identify fiber bundles in the human brain. The shape
context appears to be a promising feature descriptor for delineating
each fiber; meanwhile, combining supervised classification
methods from machine learning could be explored further in the
fiber bundle classification domain.
REFERENCES
[1] Lin, C.P., Tseng, W.Y., Cheng, H.C. & Chen, J.H. Validation of diffusion tensor magnetic resonance axonal fiber imaging with registered manganese-enhanced optic tracts. NeuroImage 14, 1035-1047 (2001).
[2] Mori, S., Crain, B.J., Chacko, V.P. & Van Zijl, P.C. Three-dimensional tracking of axonal projections in the brain by magnetic resonance imaging. Annals of Neurology 45, 265-269 (1999).
[3] Basser, P.J., Pajevic, S., Pierpaoli, C., Duda, J. & Aldroubi, A. In vivo fiber tractography using DT-MRI data. Magnetic Resonance in Medicine 44, 625-632 (2000).
[4] Lee, S.-K. et al. Diffusion-tensor MR imaging and fiber tractography: a new method of describing aberrant fiber connections in developmental CNS anomalies. Radiographics 25, 53-65; discussion 66-68 (2005).
[5] Stieltjes, B. et al. Diffusion tensor imaging and axonal tracking in the human brainstem. NeuroImage 14, 723-735 (2001).
[6] Schmahmann, J.D. et al. Association fibre pathways of the brain: parallel observations from diffusion spectrum imaging and autoradiography. Brain 130, 630-653 (2007).
[7] Ding, Z., Gore, J.C. & Anderson, A.W. Classification and quantification of neuronal fiber pathways using diffusion tensor MRI. Magnetic Resonance in Medicine 49, 716-721 (2003).
[8] Brun, A., Knutsson, H., Park, H.-J., Shenton, M.E. & Westin, C.-F. Clustering fiber traces using normalized cuts. Lecture Notes in Computer Science 3216, 368-375 (2004).
[9] O'Donnell, L. & Westin, C.-F. White matter tract clustering and correspondence in populations. Medical Image Computing and Computer-Assisted Intervention 8, 140-147 (2005).
[10] Adluru, N. et al. Classification in DTI using shapes of white matter tracts. Conference Proceedings of the IEEE Engineering in Medicine and Biology Society 2009, 2719-2722 (2009).
[11] Zimmerman-Moreno, G., Mayer, A. & Greenspan, H. Classification trees for fast segmentation of DTI brain fiber tracts. 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 1-7 (2008).
[12] Carpenter, M. Human Neuroanatomy. Williams & Wilkins, Baltimore (1976).
[13] Orioli, P. & Strick, P. Cerebellar connections with the motor cortex and the arcuate premotor area: an analysis employing retrograde transneuronal transport of WGA-HRP. J. Comp. Neurol. 22, 612-626 (1989).
[14] Jiang, H., Van Zijl, P.C.M., Kim, J., Pearlson, G.D. & Mori, S. DtiStudio: resource program for diffusion tensor computation and fiber bundle tracking. Computer Methods and Programs in Biomedicine 81, 106-116 (2006).
[15] Belongie, S., Malik, J. & Puzicha, J. Shape matching and object recognition using shape contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence 24, 509-522 (2002).
[16] Swain, M.J. & Ballard, D.H. Color indexing. International Journal of Computer Vision 7, 11-32 (1991).
[17] Neural Networks FAQ, periodic posting to the Usenet newsgroup comp.ai.neural-nets (1997).
[18] Hsu, C.-W., Chang, C.-C. & Lin, C.-J. A practical guide to support vector classification. Technical report, National Taiwan University (2003).