Classification of DTI Major Brainstem Fiber Bundles

Yuan Liu

Abstract—MRI diffusion imaging of the human brain provides neuron fiber trajectories that can be grouped into different bundles with distinctive neurological functions. In this paper, three major fiber bundles passing through the brainstem are identified. Features for representing fiber trajectories are studied, and a 3D shape context is adopted to capture fiber shape. Kernel PCA with a histogram intersection kernel is then applied to reduce the dimensionality of the fiber histogram. Both supervised and unsupervised classification methods are investigated, including discriminant analysis, support vector machines, k-nearest neighbors, and k-means, with different kernels and distance metrics. A leave-one-out validation is performed on 10 subjects, and the results show the feasibility of the approach to fiber bundling.

Index Terms—Diffusion tensor images; fiber classification; shape context; supervised learning

I. INTRODUCTION

Diffusion tensor imaging (DTI) has provided new insights for researchers studying the tissue microstructure and organization of brain white matter in vivo. From a set of diffusion-weighted images, a distinct diffusion tensor is computed and assigned to each voxel. The principal directions of the tensor have been shown to follow the directionality of white matter axon bundles [1]. Various algorithms, deterministic or probabilistic, have been proposed to find neural pathways based on these directions [2][3]. These tractography models yield a set of pathways, i.e., fibers, by following the principal direction.

Once a fiber set has been obtained by means of a tractography method, the next step toward clinical relevance is to find the significant pathways that form named fiber bundles. That is, we would like a classification of the brain fibers into the major bundles. This is a difficult task, not only because of the complexity of the problem itself but also because of deficiencies in the fiber tracking algorithms and tensor reconstruction methods. In recent years, much effort has been devoted to this field, and it can be divided into three main approaches.

User-guided approaches: these methods identify bundles based on prior knowledge from experts. This is usually implemented by drawing a series of regions of interest (ROIs) and using ROI operations to extract the desired neural pathways. Methods of this kind have been reported in [4][5][6]. The main advantage is that the extracted fibers are more accurate and robust thanks to the prior information. However, the procedure is time-consuming and labor-intensive; it requires a certain degree of training and may suffer from inter- and intra-operator variability.

Cluster-based approaches: these methods use properties of the fiber set to group fibers from the same bundle according to a chosen similarity measure [7][8][9]. The main advantage of such methods is that they are fully automatic. However, the resulting clusters do not necessarily correspond to the anatomically defined fiber bundles. Because clustering algorithms are unsupervised, they do not incorporate the prior knowledge available to user-guided approaches, and their robustness and accuracy need to be improved.

Learning-based approaches: these methods attempt to learn features of fiber tracts from a training data set and apply the trained model to a testing data set for supervised classification [10][11].
This can be considered semi-automatic, since the ground truth has to be marked for the training set.

The methods proposed in this paper draw on both the second and the third categories. We want to find a robust feature descriptor for the fibers and apply machine learning techniques for automatic classification. We would also like to investigate the characteristics of different classifiers.

The brainstem is a region characterized by densely packed fibers traveling to and from the cerebrum and cerebellum [12]. Some of these fibers are of critical importance in the initiation, control, and execution of movement [13]. In our study, we want to classify three important fiber bundles passing through the brainstem: the corticospinal tract (CST), the middle cerebellar peduncle (MCP), and the superior cerebellar peduncle (SCP). A schematic overview of the overall system is shown in Fig. 1.

Fig. 1 Schematic overview of the classification system.

II. METHODOLOGY

2.1 Data Pre-processing

The DTI data used in this study are from 10 subjects in the deep brain stimulation database, with voxel size 2×2×2 mm and image size 128×128×60. Fibers are tracked in the native space using DtiStudio [14], which is based on the Fiber Assignment by Continuous Tracking (FACT) algorithm and a brute-force reconstruction approach. Each fiber is represented as a discrete three-dimensional curve. The three major tracts in the brainstem, namely the corticospinal tract (CST), the middle cerebellar peduncle (MCP), and the superior cerebellar peduncle (SCP), are identified according to [5] by ROI editing and serve as the ground truth for each subject. Fig. 2 shows an example of these three bundles overlaid on the color map for one subject.

Fig. 2 The three extracted bundles for one subject. Red: CST; purple: MCP; green: SCP.

The b0 image of one volume was chosen as the atlas, and the remaining volumes were aligned to this atlas by rigid and non-rigid registration. Fibers were projected into this common space by composing the deformation fields. Then, each fiber was resampled to N equidistant points, and short fibers were automatically deleted as noise. The pre-processed fiber bundles of all subjects are shown in Fig. 3 in the atlas space.

Fig. 3 The pre-processed fiber bundles of all subjects: (a) CST, (b) MCP, (c) SCP.

2.2 Feature Extraction

Since each fiber is represented by a unique vector of 3N spatial coordinate values, a naïve way to extract a feature is to take this spatial representation directly as the input for classification. However, as can be seen in Fig. 3, the fibers are not perfectly aligned with each other; fibers of the same bundle from different subjects tend to suffer from spatial misalignment. We can also notice that the same bundles tend to have similar shapes. A good shape representation of the tracts is therefore needed to capture the differences between classes.

2.2.1 3D Shape Context

Shape context is a feature descriptor that allows measuring shape similarity and recovering point correspondences. First proposed in [15], the shape context is known as a rich, robust, and discriminative descriptor used in object recognition. Given a list of contour points, the shape of an object is captured by the distribution of the points in the plane relative to each point on the shape. Usually, this is computed as a histogram in log-polar coordinates, as shown in Fig. 4.

Fig. 4 Shape context demonstration in the 2D case, as used in [10]. (a) The log-polar bins; counts for the outermost radial bins are shown. (b) Reading the bins in the order indicated by the red curve in (a) yields the histogram.

We extend the 2D version to 3D and build a "context" for each fiber at its centered origin. Bins of 10° for the azimuthal and polar angles and 20 bins for the log-spaced radii are used to construct the histogram, giving 36 × 18 × 20 = 12,960 bins. Fig. 5 shows a sample tract from each of the three bundles together with its histogram representation. This helps us cope with the misalignment issue, as the histogram captures the point distribution of a tract, which is a high-level feature as opposed to a voxel-level feature.

Fig. 5 The histogram representation of a sample tract from each of the three bundles: (a) CST, (b) MCP, (c) SCP.
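As a concrete illustration, a minimal 3D shape-context sketch in Python is given below. It assumes the fiber is already resampled to an N × 3 array of points, uses the fiber centroid as the histogram origin, and fixes particular bin-edge conventions; these are illustrative assumptions rather than details taken from the original implementation.

    import numpy as np

    def shape_context_3d(fiber, n_radial=20, r_min=1e-3):
        """Sketch of a 3D shape-context histogram for one resampled fiber.

        fiber : (N, 3) array of points (assumed input).
        Returns a flattened 36 x 18 x 20 = 12,960-bin histogram
        (10-degree azimuthal/polar bins, 20 log-spaced radial bins).
        """
        # Center the fiber at its centroid (assumed "centered origin").
        p = fiber - fiber.mean(axis=0)
        r = np.linalg.norm(p, axis=1)

        # Spherical coordinates of each point relative to the centroid.
        azim = np.degrees(np.arctan2(p[:, 1], p[:, 0])) % 360.0
        polar = np.degrees(np.arccos(np.clip(p[:, 2] / np.maximum(r, 1e-12), -1.0, 1.0)))

        # Bin edges: 36 azimuthal, 18 polar, and 20 log-spaced radial bins.
        # Points closer to the centroid than r_min fall outside the first radial bin.
        azim_edges = np.arange(0.0, 361.0, 10.0)
        polar_edges = np.arange(0.0, 181.0, 10.0)
        radial_edges = np.logspace(np.log10(r_min), np.log10(r.max() + 1e-12), n_radial + 1)

        hist, _ = np.histogramdd(np.column_stack([azim, polar, r]),
                                 bins=[azim_edges, polar_edges, radial_edges])
        return hist.ravel()

Each fiber thus yields a single 12,960-dimensional histogram, which is the input to the kernel PCA step described next.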
2.2.2 Kernel PCA

The dimensionality of the features directly influences the complexity of the classification. Higher-dimensional features produce more complex classifiers, which lead to lower training error; but because such classifiers tend to over-fit the data, they lose generalization ability and increase the testing error. In our case, each feature is a histogram with 12,960 bins. Intuitively, this histogram is a very sparse representation, since most of the bins contain no points, and it can therefore be reduced to a much lower-dimensional feature space.

Principal component analysis (PCA) is a widely used dimensionality reduction technique that selects the major features by finding the most uncorrelated variables. It is obtained from the eigendecomposition of the covariance matrix C of the original data set:

C V = V Λ,

where V is the matrix of eigenvectors and Λ is the diagonal matrix of eigenvalues. By keeping the components with the L largest eigenvalues, the dimensionality can be reduced to L.

One drawback of PCA is that the data have to be linearly separable; for high-dimensional non-linear data, PCA is not powerful enough to capture the correlations. A non-linear extension is kernel PCA, which assumes that the data can be almost linearly separated in a higher-dimensional space reached through a non-linear mapping. The "kernel trick" is applied to the equation above: the dot product is replaced with a kernel function that simulates the dot product in the high-dimensional space. Various kernel functions can be used. Here, we adopt the histogram intersection kernel, first introduced in [16] to compare color histograms. The histogram intersection kernel between two histograms x and y with d bins is defined as:

k(x, y) = Σᵢ min(xᵢ, yᵢ), i = 1, …, d.

Applying kernel PCA and keeping the top six eigenvalues, we project each histogram into a six-dimensional feature space.

2.2.3 Feature Scaling

Scaling the feature components before classification is important [17]. The main advantage is to prevent attributes with greater numeric ranges from dominating those with smaller ranges; another reason is to avoid numerical difficulties during the computation. Chih-Wei Hsu et al. [18] recommend linearly scaling each attribute to the range [-1, +1] or [0, 1]. For supervised classification, the data set is split into a training set and a testing set, and the same rescaling must be applied to both to maintain consistency. For example, suppose we scaled the first attribute of the training data from [-10, +10] to [-1, +1]; if the first attribute of the testing data lies in the range [-11, +8], it must be scaled to [-1.1, +0.8].
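The projection and scaling steps can be sketched together as follows, assuming scikit-learn is available and using random stand-in data in place of the real 12,960-bin fiber histograms; the variable names and sizes are illustrative, not taken from the original pipeline.

    import numpy as np
    from sklearn.decomposition import KernelPCA
    from sklearn.preprocessing import MinMaxScaler

    def intersection_kernel(A, B):
        """Histogram intersection kernel matrix: K[i, j] = sum_k min(A[i, k], B[j, k])."""
        K = np.empty((A.shape[0], B.shape[0]))
        for i, a in enumerate(A):                     # row by row to keep memory modest
            K[i] = np.minimum(a, B).sum(axis=1)
        return K

    # Stand-in data; in the paper these would be the 12,960-bin fiber histograms.
    rng = np.random.default_rng(0)
    H_train = rng.poisson(0.02, size=(200, 12960)).astype(float)
    H_test = rng.poisson(0.02, size=(50, 12960)).astype(float)

    # Kernel PCA with the precomputed intersection kernel, keeping six components.
    kpca = KernelPCA(n_components=6, kernel="precomputed")
    X_train = kpca.fit_transform(intersection_kernel(H_train, H_train))
    X_test = kpca.transform(intersection_kernel(H_test, H_train))

    # Scale each attribute to [-1, +1] using the training-set range only,
    # then apply the same scaling to the testing data.
    scaler = MinMaxScaler(feature_range=(-1.0, 1.0)).fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

Because the kernel is supplied in precomputed form, projecting the test fibers only requires evaluating the intersection kernel between the test and training histograms, mirroring the kernel trick described above.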
2.3 Fiber Classification

Both unsupervised and supervised learning methods are investigated here. The unsupervised method is k-means, which performs the grouping directly on the testing data. The supervised algorithms are discriminant analysis, the support vector machine, and k-nearest neighbors. For ease of exposition, the training set is denoted {(xᵢ, yᵢ)}, containing both the observation vectors xᵢ and the corresponding class labels yᵢ; the testing set is denoted {xⱼ}, whose class labels are hidden.

2.3.1 Discriminant Analysis

Discriminant analysis assumes that the data are normally distributed. The idea is to transform the multivariate observations x into univariate observations y such that the y's derived from different classes are separated as much as possible. This is done by maximizing the ratio of inter-class scatter to intra-class scatter among the competing classes. In linear discriminant analysis, the classes are assumed to share an identical full-rank covariance matrix Σ, and the discriminant function for class k is

δ_k(x) = xᵀ Σ⁻¹ μ_k − ½ μ_kᵀ Σ⁻¹ μ_k + log π_k,

where μ_k is the class mean and π_k the class prior. In quadratic discriminant analysis, the classes need not share the same covariance matrix Σ_k, and the discriminant function is

δ_k(x) = −½ log|Σ_k| − ½ (x − μ_k)ᵀ Σ_k⁻¹ (x − μ_k) + log π_k.

2.3.2 Support Vector Machine

Unlike discriminant analysis, the support vector machine (SVM) aims at finding the maximal margin, defined as the distance between the bounding planes of different classes, while minimizing the classification error:

min over w, b, ξ of ½‖w‖² + C Σᵢ ξᵢ, subject to yᵢ(wᵀφ(xᵢ) + b) ≥ 1 − ξᵢ, ξᵢ ≥ 0.

Kernel functions are usually applied, including:
Linear: k(xᵢ, xⱼ) = xᵢᵀxⱼ.
Radial basis function (RBF): k(xᵢ, xⱼ) = exp(−γ‖xᵢ − xⱼ‖²).
Polynomial: k(xᵢ, xⱼ) = (γ xᵢᵀxⱼ + r)^d.
Sigmoid: k(xᵢ, xⱼ) = tanh(γ xᵢᵀxⱼ + r).

The SVM is a binary classifier. To extend it to the multi-class setting, we adopt the one-against-one decomposition, which transforms the multi-class problem into a series of binary subtasks that are each trained by a binary SVM; the multi-class rule is then constructed by a majority voting strategy.

2.3.3 K-Nearest Neighbors

K-nearest neighbors is a non-linear classification algorithm that provides a simple local approximation of the conditional density function. Every unseen point x is compared, through a distance function, to all points xᵢ of the training set; the k smallest distances are found, and the majority vote over the corresponding labels is taken as the resulting label for x. Different distance functions can be used, for example:
Euclidean distance.
Correlation: one minus the sample linear correlation between observations.

2.3.4 K-Means

K-means is an unsupervised learning method. Starting from k initial cluster centers, k-means clustering partitions the n observations into k sets so as to minimize the within-cluster sum of distances. As with k-nearest neighbors, different distance functions can be adopted:
Squared Euclidean distance.
Correlation: one minus the sample correlation between points.

Fig. 6 Partial results for some classifiers: (a) SVM with linear kernel; (b) SVM with sigmoid kernel; (c) KNN with Euclidean distance; (d) k-means with squared Euclidean distance.

III. EXPERIMENTAL EVALUATION

We test 10 classifiers: LDA, QDA, SVM with a linear kernel, SVM with an RBF kernel, SVM with a polynomial kernel, SVM with a sigmoid kernel, KNN with Euclidean distance (k = 8), KNN with correlation distance (k = 8), k-means with squared Euclidean distance, and k-means with correlation distance. Since we only have 10 volumes, we evaluate the classifiers' performance using a cross-validation scheme. In a k-fold leave-one-out cross-validation, the data set is partitioned into k subsets; each time, k − 1 subsets are used for training and one for testing, and the process is iterated k times until every example has been tested once. For each classifier, we calculate the Dice coefficient at every iteration.
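This evaluation loop can be sketched as follows, with synthetic stand-in data and an illustrative subset of the classifiers; the feature values, labels, subject indices, and parameter settings here are assumptions, not the actual experimental data.

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC

    def dice(pred, truth, label):
        """Dice coefficient for one bundle label."""
        a, b = pred == label, truth == label
        return 2.0 * np.sum(a & b) / (np.sum(a) + np.sum(b))

    # Stand-in data: 6-D features, 3 bundle labels, and a subject index per fiber.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(600, 6))
    y = rng.integers(0, 3, size=600)
    subject = rng.integers(0, 10, size=600)

    classifiers = {
        "LDA": LinearDiscriminantAnalysis(),
        "SVM-linear": SVC(kernel="linear"),
        "SVM-rbf": SVC(kernel="rbf"),
        "KNN-euclidean": KNeighborsClassifier(n_neighbors=8),
    }

    # Leave-one-subject-out: train on nine subjects, test on the held-out one.
    for name, clf in classifiers.items():
        scores = []
        for s in np.unique(subject):
            train, test = subject != s, subject == s
            clf.fit(X[train], y[train])
            pred = clf.predict(X[test])
            scores.append(np.mean([dice(pred, y[test], c) for c in np.unique(y)]))
        print(name, round(float(np.mean(scores)), 3))

Here each fold holds out the fibers of one subject, matching the 10-fold leave-one-out scheme described above.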
Fig. 6 shows partial results from some of the classifiers. SVM with the linear and sigmoid kernels are shown in (a) and (b), with the red, blue, and green dots representing the training samples, the black circles the learned support vectors, and the magenta markers the misclassified testing data. (c) is the result of the KNN classifier with Euclidean distance, and (d) is the clustering result of k-means. Fig. 7 shows the Dice coefficient for all classifiers under leave-one-out validation.

The results are very satisfactory: almost all classifiers label the correct bundle with more than 80% accuracy. This indicates that the features representing individual fibers are largely separable between classes, and hence that the 3D shape context captures the shape variability of the fibers well. Comparing the classifiers, the non-linear ones outperform the linear ones in our case, which can be explained by the intrinsically non-linear properties of the features. Moreover, the supervised methods show higher accuracy and consistency than the unsupervised methods.

Fig. 7 Dice coefficients for all classifiers. X axis, 1 to 10: LDA, QDA, SVM with linear kernel, SVM with RBF kernel, SVM with polynomial kernel, SVM with sigmoid kernel, KNN with Euclidean distance, KNN with correlation, k-means with Euclidean distance, k-means with correlation.

IV. DISCUSSION AND CONCLUSION

This paper presents an effort to use supervised classification methods to identify fiber bundles in the human brain. The shape context appears to be a promising feature descriptor for delineating each fiber; meanwhile, combining it with supervised classification methods from machine learning could be explored further in the fiber bundle classification domain.

REFERENCES

[1] Lin, C.P., Tseng, W.Y., Cheng, H.C. & Chen, J.H. Validation of diffusion tensor magnetic resonance axonal fiber imaging with registered manganese-enhanced optic tracts. NeuroImage 14, 1035-1047 (2001).
[2] Mori, S., Crain, B.J., Chacko, V.P. & Van Zijl, P.C. Three-dimensional tracking of axonal projections in the brain by magnetic resonance imaging. Annals of Neurology 45, 265-269 (1999).
[3] Basser, P.J., Pajevic, S., Pierpaoli, C., Duda, J. & Aldroubi, A. In vivo fiber tractography using DT-MRI data. Magnetic Resonance in Medicine 44, 625-632 (2000).
[4] Lee, S.-K. et al. Diffusion-tensor MR imaging and fiber tractography: a new method of describing aberrant fiber connections in developmental CNS anomalies. Radiographics 25, 53-65; discussion 66-68 (2005).
[5] Stieltjes, B. et al. Diffusion tensor imaging and axonal tracking in the human brainstem. NeuroImage 14, 723-735 (2001).
[6] Schmahmann, J.D. et al. Association fibre pathways of the brain: parallel observations from diffusion spectrum imaging and autoradiography. Brain 130, 630-653 (2007).
[7] Ding, Z., Gore, J.C. & Anderson, A.W. Classification and quantification of neuronal fiber pathways using diffusion tensor MRI. Magnetic Resonance in Medicine 49, 716-721 (2003).
[8] Brun, A., Knutsson, H., Park, H.-J., Shenton, M.E. & Westin, C.-F. Clustering fiber traces using normalized cuts. Lecture Notes in Computer Science 3216, 368-375 (2004).
[9] O'Donnell, L. & Westin, C.-F. White matter tract clustering and correspondence in populations. Medical Image Computing and Computer-Assisted Intervention 8, 140-147 (2005).
[10] Adluru, N. et al. Classification in DTI using shapes of white matter tracts. Conference Proceedings of the IEEE Engineering in Medicine and Biology Society 2009, 2719-2722 (2009).
[11] Zimmerman-Moreno, G., Mayer, A. & Greenspan, H. Classification trees for fast segmentation of DTI brain fiber tracts. 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 1-7 (2008).
[12] Carpenter, M. Human Neuroanatomy. Williams & Wilkins, Baltimore (1976).
[13] Orioli, P. & Strick, P. Cerebellar connections with the motor cortex and the arcuate premotor area: an analysis employing retrograde transneuronal transport of WGA-HRP. J. Comp. Neurol. 22, 612-626 (1989).
[14] Jiang, H., Van Zijl, P.C.M., Kim, J., Pearlson, G.D. & Mori, S. DtiStudio: resource program for diffusion tensor computation and fiber bundle tracking. Computer Methods and Programs in Biomedicine 81, 106-116 (2006).
[15] Belongie, S., Malik, J. & Puzicha, J. Shape matching and object recognition using shape contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence 24, 509-522 (2002).
[16] Swain, M.J. & Ballard, D.H. Color indexing. International Journal of Computer Vision 7, 11-32 (1991).
[17] Neural Networks FAQ. Periodic posting to the Usenet newsgroup comp.ai.neural-nets (1997).