The human brain is a complex neural system composing many anatomical structures. To study the functional and structural properties of its subcortical regions, image segmentation is a critical step in quantitative brain image analysis and clinical diagnosis. However, segmenting subcortical structures is difficult because they are small and often exhibit large variations in shape. Moreover, some structural boundaries are subtle or even missing in images. Although manual annotation is a standard procedure for obtaining quality segmentation, it is time-consuming and can suffer from inter- and intraobserver inconsistencies. In recent years, researchers have been focusing on developing automatic atlas-based segmentation methods which can effectively incorporate expert prior knowledge about the relationships between local intensity profiles and tissue labels. And many softwares have become available for brain image segmentation, such as FreeSurfer [1], BrainSuite [2], BrainVoyage [3], BrainVisa [4] and so on.
This work was supported in part by the Hong Kong Research Grants Council under Grant 16203115.
S. Bao is with the Lo Kwee-Seong Medical Image Analysis Laboratory, Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong (e-mail: sbao@cse.ust.hk).
A. Chung is with the Lo Kwee-Seong Medical Image Analysis Laboratory, Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong (e-mail: achung@cse.ust.hk).
Fig. 1. Overview of the main components in atlas-based segmentation.
Atlas-based segmentation involves three main components, image registration between atlases and a target image, label propagation, and label fusion, as summarized in Fig. 1. To register images of intra-subject generated by different modalities, global transformation methods can be used, such as rigid or affine transformation. As for the registration of inter-subject or longitude analysis of intra-subject, global transformation is insufficient to estimate an accurate deformation field due to the high anatomical variabilities among these images. Local transformation, represented by non-rigid registration, has been proposed to deal with this problem. In non-rigid registration, the deformation field can be estimated using control points on the grid, with a combination of B-splines [5] or cosine basis functions [6]. To further improve the quality of anatomical or matching correspondences between two images, symmetric diffeomorphism [7] moves both images simultaneously along a geodesic path until meeting at the middle of normalization domain and then the whole path or deformation field can be obtained by uniting the two parts of geodesic paths. In the evaluation of 14 nonlinear deformation algorithms [8], ANTs based on symmetric diffeomophism is selected as one of the best methods. While with a large number of target images to be labeled, the pairwise non-rigid registration methods can suffer from the expensive time consumption.
After image registration, the label maps can be propagated from atlases to the target image and multiple tissue labels can be collected for each image position, making label fusion a crucial final aggregation step for the reliable labeling of target images. A generative model for image segmentation based on label fusion is proposed in [9] and different label fusion strategies are discussed. Majority voting is commonly used, while its accuracy can be adversely affected if the atlases are dissimilar. In voting using global weights, the similarity between each atlas and the target image is calculated and used
Fig. 2. Effects of feature sensitivity. (a) Cropped target intensity image. (b) Manually labeled Hippocampus. (c) Color image to show the effects of feature sensitivity, with RGB values standing for the feature coefficients of intensity, gradient and structural signature respectively. (d) Three selected examples to illustrate their dominate features.
as a weight during the label fusion process. Recently, more label fusion methods based on patches [10], [11], [12], [13], [14], [15] have been proposed, which were first introduced for image de-noising [16] and recently become more prevalent in medical image segmentation.
Generally, there are three stages in patch-based label fusion. First, it is necessary to determine which kind of feature to be adopted as pixel representation. The conventional way is to collect pixel values inside the surrounding patch to formulate an intensity feature vector. To better reveal image changes, gradient magnitude is another commonly used feature information. However, it is not adequate enough to obtain quality segmentation if just relying on the above two kinds of features, as they can only capture local and low-level properties. Some advanced approaches have been proposed to extract high-level features to compensate for local limitations. In [17], the contextual information, which estimates the relative relation between intensity values, is appended to form an augmented feature vector for cardiac image segmentation.
With feature representation established, the second stage is to distinguish candidate pixels or patches for voting. In [12], to label a centre pixel in the target image, all surrounding small patches from atlases are utilized for weighted voting. To avoid the adverse effects from dissimilar patches, an extension has been proposed in [10] which involves first ranking the small patches based on structure similarity, followed by combining the selected ones in the final labeling. Another patch selection method based on sparse representation was proposed by Liao et al. [11], which selects patch-based signatures with sparse logistic and the LASSO interface [18].
The third stage is to fuse the labels of candidate atlas nodes and the fusion strategies fall into two main categories: weighted voting and image patch reconstruction. It is a common way to first estimate the similarity between two patches by embedding their sum of squared difference to the Gaussian function and then to utilize the similarity value as the weight for voting [10], [12]. Besides the independent impact on target pixel, Joint Label Fusion [13] also takes the error correlation among atlas patches into consideration and tries to find the optimal weight for voting. For the second category, to reconstruct a target patch, the linear combination coefficients of atlas patches need to be optimized first and the label of the centre pixel can be then assigned to the class with a minimum reconstruction error [14].
Moreover, as shown in [10], [12], the patch-based la-
bel fusion methods do not necessarily depend on the time-consuming non-rigid registration. While given the poor contrast condition in the brain Magnetic Resonance (MR) images and similar histogram profiles among adjacent structures, the label fusion with only affine transformation as processing becomes more challenging. As such, to compensate for the quality loss caused by the affine transformation, it raises the demand to design a more elegant label fusion process for brain MR image segmentation. Under the assumption that distinct features can assist the segmentation in a complementary way, in this paper, Feature Sensitive Label Prior (FSLP) is designed to capture label priors from atlases, whose process is distinct with the conventional label fusion at every stage.
As suggested in the segmentation of cardiac MR images, embracing more features besides intensity, such as contextual information, can help improve the segmentation quality [17]. For pixel representation, besides conventional intensity and gradient features, structural signature is introduced to extract the high-level property of each subcortical structure based on the Convolutional Neural Networks. During candidate node selection, rather than exact searching within a confined scope, the random k-d tree with a spatial constraint is put forward as an efficient approximation for high dimensional data matching. In the fusion stage, feature sensitivity is taken into account for the variance and consistency among various features. As FSLP is a non-convex problem, one heuristic method is further proposed to solve it by alternately dealing with two convex problems.
The motivations to introduce FSLP are two-fold. On the one hand, the contributions of distinct features are expected to be consistent during label fusion, i.e., they can reach an agreement when labeling a pixel. On the other hand, the impact of different features can change according to image conditions. For the flat regions away from structural boundaries, intensity and gradient are supposed to be more essential. As for the complex region near tissue bounders, structural signature should play a more significant role. The experimental result with our method also justifies the initial motivation, as shown in Fig. 2. The sub-figures (a) and (b) are a cropped target intensity image and its corresponding label map of the Hippocampus. In (c), for the pixels where atlases cannot make an agreement, the optimal feature coefficients estimated in FSLP are displayed as three channels of RGB. Three representative examples are selected to explain the dominant features in each pattern in (d). The color image in (c) demonstrates that the role of structural signature is more essential around tissue protrusions. For the other relatively flat regions, intensity and gradient matter more, which phenomenon also justifies our motivation to introduce feature sensitivity for label fusion.
In addition to FSLP from atlases, anatomical priors from target image are also utilized to assist our graph-based label fusion process. Based on anatomical knowledge, to label a pixel which is deep inside or outside a subcortical structure is easier, while to label one which is located around the boundary is challenging. As such, rather than updating labels for all target pixels, those far away from structural border are selected as graph seeds and their influence can be propagated to other pixels through image lattice. Unlike the graph-based labeling constructed with both atlas and target nodes [19], we further infer an equal but more concise graph to encode FSLP and anatomical prior, which only relies on target nodes. The objective energy function on the graph is formulated with Random Walker and can be solved as a discrete Dirichlet problem. To evaluate the proposed method, experiments have been carried out on two image databases and results demonstrate that our approach can obtain better performance as compared with other state-of-the-art methods.
Note that the preliminary version of this work has been published in the 19th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2016. In this paper, 1) we extend our previous work by generating multiple features and introducing randomized k-d tree with spatial constraint for efficient high dimensional feature matching; 2) additional mathematical proofs, solutions together with illustrative examples are given in this work; 3) intensive experiments has been carried out to evaluate each component of our proposed method and comprehensive evaluations have been done with the state-of-the-art methods.
In this paper, to obtain a more discriminative representation, three kinds of features are extracted and candidate nodes are selected for each pixel, which will be explained in Section II-A and Section II-B. Given the demands of consistency and variety among distinct feature vectors during label fusion, a novel method FSLP is proposed in Section II-C to deal with this dilemma, by collecting priors from atlases with feature sensitivity. Moreover, the pixels from target image are also selected based on anatomical knowledge, acting as anatomical prior. The whole label fusion process is modeled on an undirected graph and formulated under the framework of Random Walker, which is summarized in Section II-D.
A. Feature Generation
In medical images, the conventional feature utilized to represent a pixel is intensity values or gradient magnitudes in its surrounding cube. While these features are limited to local information and susceptible to adverse impacts from similar histogram profiles among tissues. As each subcortical structure in the brain has its own shape characteristics and structural properties, this kind of high-level features can be
Fig. 3. Illustration of convolution and pooling layers. With three images from layer as input, two feature maps generated in layer l, each corresponding to one pair of W and b, as stated in Equation (1).
used to formulate a more discriminative representation. Recently it has been proven that the feature extraction ability of Convolutional Neural Networks (CNN) [20] has surpassed hand-crafted features, like SIFT [21], and CNN has brought significant improvements in image classification [22], semantic segmentation [23], acoustic analysis [24] and so on. As such, in this paper, we propose to encode the high-level property for brain MR images with a feature vector extracted automatically using CNN.
CNN is inspired from a biological visual mechanism, where neurons in the higher layer operate on a subregion of neurons in the lower layer. In CNN, there are two basic components: convolution and pooling layers, as illustrated in Fig. 3. To estimate the convolutional response or in layer l, pixels within the subregion of images in the last layer (Red Region, namely the receptive field) are chosen as input. The convolution step consists of linear operation and non-linear activation, which can be formulated as follows:
where a is the convolutional response, refers to the non-linear activation function, x is the flattened input from the receptive field, W is the weight vector and b is the bias associated with the convolutional kernel.
It is notable that each feature map in the convolution layer is assigned with a specific pair of W and b. For the Purple pixel in the first feature map, its value can be estimated with and for the Green pixel in the second feature map, its value should be . The number of feature maps in each layer can be preset during the design of the network architecture, while the parameters W and b need to be learned through training.
For the non-linear activation , the conventional way is to employ a sigmoid or tanh function. However, both can encounter the saturation problem and kill the gradients during backpropagation. Recently, non-saturated functions have become prevalent, such as Rectified Linear Unit (ReLU) [25], leaky ReLU [26] and some other variants. The experiment in [22] demonstrates that ReLU can accelerate the training speed up to 6 times faster than the tanh function. As such, in the paper, ReLU is chosen as the activation function and Equation (1) can then be rewritten as:
Fig. 4. CNN architecture. Red: 2D input patch; Blue: convolution layers; Green: output layer; Orange: feature vector to the output layer.
TABLE I DETAILED PARAMETER SETTINGS OF CNN NETWORK TO EXTRACT STRUCTURAL SIGNATURE.
In CNN, the convolution and pooling layers are usually interwoven. The feature maps generated in the convolution layer can be regarded as input for the next pooling layer. As shown in Fig. 3, the patch in the i-th feature map of layer l polls for the corresponding pixel in the i-th feature map of layer l + 1. The pooling strategy used here can be either maximum or average pooling. From the example in Fig. 3, it can also be noticed that pooling can only shrink the feature maps, while leaving their amounts unchanged.
In this paper, to capture the high-level properties of sub-cortical structures, CNN is utilized to extract the structural signature from brain MR images. Fig. 4 illustrates the architecture of the employed network. There are seven layers in the network, including six alternating convolution (C1, C2 and C3) and average pooling (omitted as dashed lines) layers, and one output layer. The input to the network is a 2D patch with the size of pixels and the two nodes in the output layer refer to the probability of each class. The feature vector to the output layer is extracted and regarded as the structural signature. Detailed parameter settings of the network can be found in Table I.
To train the above network, each database is separated into two parts randomly in the experiments, with equal number of images as training (atlas) and testing (target) data sets. For the atlas pixels within region of interest (ROI), their surrounding patches (with the size of pixels) are extracted as training data, together with their corresponding labels. With the well trained network, we can obtain the structural signature for each pixel, by using its surrounding patch as input and extracting the feature vector before the output layer.
As demonstrated in [27], the performance of mitosis detection can be further improved by combining the discriminative CNN features with conventional handcrafted features, like morphology or color information. The method [17] also suggests that embracing high-level and low-level features yields better results in label fusion for cardiac image segmentation.
TABLE II SUMMARY OF GENERATED FEATURES FOR EACH PIXEL.
Fig. 5. Illustration of Feature Matching. For a considered pixel (Gray Square) in the target image, similar pixels are searched from the atlases using each kind of feature vector. Red, Purple and Green Squares represent similar pixels found with intensity, gradient and structural signature respectively. Dashed Gray Square is the spatial constraint and those outside similar pixels will not be involved as candidate nodes.
Under the assumption that distinct features can assist the segmentation in a complementary way, in this paper, we extract multiple features from brain MR images and will consider the feature sensitivity during label fusion. For each pixel, besides the structural signature, the intensity values and gradient magnitudes in the surrounding cube are also assembled as feature vectors. In total, three kinds of feature vectors are generated in the proposed method, which is summarized in Table II.
B. Feature Matching
After feature generation, the second stage in label fusion is to select candidate nodes from atlases. For each pixel in the target image, similar pixels can be selected from atlases using each kind of feature vector. In fact, it is the nearest neighbor (NN) problem to find similar points in real d-dimensional space from N samples. As shown in Table II, the dimension d of our generated features (intensity, gradient and structural signature) has a value of 125, 125 and 18 respectively. Using the brute force approach to check each sample in a sequential order, the computation complexity can be . Given the expensive computational cost of exact searching, approximate nearest neighbor (ANN) has been introduced to accelerate the searching speed. In [28], the ()-approximation to k nearest neighbors can be obtained in O(kd log N) time. However, the performance of this algorithm degrades rapidly along with the dimension increase and it cannot be applied well to high dimensional data. The matching results can become patchy when d becomes as high as 20. To tackle this issue, several advanced ANN approaches have been proposed for the application on high dimensional data, such as randomized k-d tree [29], locality sensitive hashing [30] and so on. In [31], it demonstrates that randomized k-d tree and priority
Fig. 6. (a) FSLP illustration. y is the feature vector for the target pixel, concatenating intensity (Red), gradient (Purple) and structural signature (Green). is feature coefficient and A is a dictionary constructed with atlas feature vectors. is the vector storing reconstruction weights. (b) Feature sensitive matrix . Each diagonal sub-matrix (Red corresponds to one kind of feature vector.
search k-means tree [32] can obtain the best results through comprehensive experiments. Therefore, the feature matching component of the proposed method is carried out on the foundation of the randomized k-d tree provided by fast library for approximate nearest neighbors (FLANN) toolbox [31].
As shown in Fig. 5, similar pixels are selected from the atlases using randomized k-d tree with each kind of feature vector. Considering the poor contrast condition in MR brain images and similar histogram profiles among adjacent tissues, the atlas pixels selected with randomized k-d tree can belong to other structures and mislead the subsequent fusion procedure. As such, a spatial constraint is enforced in the proposed method to filter out the pixels which are too far away from the considered target pixel. For those atlas pixels which cannot meet the spatial constraint (i.e., outside Dashed Gray Square), they tend to be deceptive similar pixels and therefore are not involved in the pool of candidate nodes.
C. Feature Sensitive Label Prior
In this paper, a novel method named Feature Sensitive Label Prior (FSLP) is proposed to capture label prior from atlases by seeking for the optimal linear combination of atlas nodes to reconstruct the feature vector of the target pixel, as illustrated in Fig. 6(a). For each considered pixel from the target image, its three kinds of features are extracted and concatenated together to formulate one augmented vector y. Its similar pixels selected from the atlases with Feature Matching are assembled as dictionary A. Given that the confidence and significance of different features can vary considerably, the feature coefficient is introduced to balance their influences. The optimal weight to reconstruct y with dictionary A is stored in vector . The formulation of FSLP is given as follows:
is the feature sensitive matrix, with its definition illustrated in Fig. 6(b). is split into three subregions (Red , Purple and Green ), each corresponding to one kind of feature vector. The diagonal elements in the sub-matrix are defined as:
where is the length of the j-th feature vector. Through the division between the coefficient and , the normalization on various features is enforced in the feature sensitive matrix. In Equation (3), with the regularization term on the coefficient vector , it guarantees that no feature dominates the whole optimization procedure.
By solving Equation (3), optimal feature coefficient and reconstruction weight can be obtained and label prior can be then estimated with grouped reconstruction error. However, Equation (3) is one non-convex problem, which may have multiple local optima and can be difficult to solve. The details of the proof are given in the Appendix. To solve Equation (3) efficiently, we also propose one solution for it in the following.
Problem Solution
As discussed above, when optimizing and simultaneously, Equation (3) is not a convex problem. To solve this non-convex problem efficiently, one heuristic approach is proposed in this paper by seeking optimal solutions for and alternately. The first step is to fix and Equation (3) turns into one least square problem:
This optimization problem is convex and its solution is . With updated , the second step is to fix it and Equation (3) is then simplified to one quadratic programming problem:
where
The newly introduced variables and are vectors related to three kinds of features, with length of and respectively. Equation (6) is also a convex problem and can be solved efficiently. The proposed heuristic algorithm iterates the above two steps until either one of the following two conditions are met: the change of is below a threshold or iterations exceed the predefined number.
With and acquired, the reconstruction error using each class can be estimated as follows:
where F and B refers to the foreground and background respectively. and refers to the weights for the foreground and background atlas nodes respectively. With the estimated reconstruction error, FSLP is encoded as edge weight on the graph during label fusion, which will be explained in next subsection.
D. Label Fusion with Random Walker
Besides the FSLP gathered from atlases, the anatomical knowledge from target images is also encoded in the proposed label fusion method. As mentioned in the Introduction section, to label pixels which locate deep inside or outside a subcortical structure is relatively easier as compared with those around structural boundary. Thanks to the location advantage, even with a rough initial label map generated by affine transformation, the labels of these pixels (far away from the object boundary) can be treated as confident results. This kind of confidence can be propagated to less confident pixels (near boundary) through image lattice, which is regarded as anatomical prior in our method.
In this paper, label fusion is formulated on an undirected graph G = (V, E), where V refers to a set of nodes consisting of foreground seeds , background seeds and candidate nodes . As both label and anatomical priors are employed in the proposed framework, two kinds of foreground seeds are included in from atlases and from the target image, similarly for . As for , it represents the set of nodes whose labels need to be determined during label fusion and these candidate nodes are selected from the target image. is the set of edges connecting nodes and , with as edge weight.
Since the number and location of nodes are critical to the efficiency of segmentation algorithms, the strategy for node selection needs to be deployed carefully. As the prior from atlases has been encoded to FSLP, and can be represented with two terminal nodes and the consideration of node selection can be limited to the target image. As discussed in our previous work [33], segmentation errors mainly lie around structural boundaries and those pixels which are far from the border can have higher label confidences. As such, in this paper, node selection is performed based on the Signed Distance Map (SDM), as illustrated in Fig. 7. With multiple label maps provided by a set of atlases, these maps are first fused with majority voting to produce the initial label map for the target image. Then its corresponding SDM can be estimated by calculating the Euclidean distance between a pixel and its nearest neighbour on the object boundary, with positive or negative value for outside (background) or inside (foreground) respectively. Using SDM and pre-defined distance threshold , the target seeds and candidate nodes can be identified, as displayed in Fig. 7(d).
With seeds and candidate nodes settled, the graph for label fusion can be constructed with edge connections, as shown in Fig. 8(a). The Orange and Purple nodes represent the atlas seeds and . Red and Black nodes refer to the foreground and background seeds selected from the target image. The influences of target seeds can be propagated to candidate nodes through image lattice. The affinity between nodes with lattice connection is defined using classical Gaussian function as follows:
where is one candidate node, refers to its 6-nearest neighbours in 3D image, is the pixel intensity value in
Fig. 7. Node selection. (a) 2D slice of target intensity image; (b) Initial label map fused with majority voting; (c) Signed Distance Map of b; (d) Red (inner) layer: target foreground seeds ; Black (outer) layer: target background seeds ; Blue (middle) layer: candidate nodes
Fig. 8. Graph construction for label fusion. (a) Orange and Purple nodes: atlas seeds. Red and Black nodes: target foreground and background seeds. is one candidate node and is one of its neighbours, with weight. FSLP is encoded to An equal graph of a.
the target image and is one tuning parameter. The FSLP is encoded as the edge weight between and atlas seeds, with the following definition:
Given that and are all foreground seeds, the edge weights between them are supposed to be infinity. In this case, setting up an edge between and is equal to appending an edge for with any target foreground seed, as illustrated in Fig. 8. In other words, the function of the atlas seeds can be replaced and FSLP can be assigned to the edges of and instead. In this way, the graph for label fusion can be constructed only with target nodes and the graph complexity can be greatly reduced.
For graph-based image segmentation, the general energy function [34] can be defined as follows:
where stands for the probability that node belongs to the foreground, with and . The first unary term considers the data fidelity of each node independently and the second binary term takes the impact between connected nodes into account. By minimizing the above energy function, the optimal solution for x can be obtained and the label of each
Fig. 9. Overview of the proposed feature sensitive label fusion.
node can be updated accordingly: if and otherwise.
As pointed out in [35], by assigning different values to p and q, Equation (10) can be adapted to several popular image segmentation models, including Graph Cuts, Random Walker, Power Watershed, and so on. However, as Graph Cuts prefers a surface with minimum energy, it can suffer from surface shrink [36]. In brain MR images, as a result of poor contrast conditions around structural boundaries, the shrinkage problem can be more serious. With Power Watershed, due to the fact that edge weights dominate the optimization procedure (q set to infinity), the generated boundary can be rough [35]. As such, to obtain a smooth and quality segmentation result, we choose to employ Random Walker (RW), with p and q set to 2. Then the minimization problem discussed above can be reformulated as follows:
This equation can be viewed as a discrete Dirichlet problem and solved by using the Laplace equation with Dirichlet conditions through Graph Analysis Toolbox [37].
Considering RW is sensitive to seed positions [38], the foreground and background seeds need to be chosen carefully. As mentioned above, one fundamental step for node selection is the initial label map, whose quality depends on the choice of registration methods, for example, non-rigid or affine transformation. To increase the robustness of the proposed label fusion approach to registration procedure, an iterative RW scheme is introduced to update the label map and gradually improve the quality of node selection. The experimental results shown in Fig. 10 also demonstrate that the segmentation accuracy can benefit from this iterative strategy and tends to be stable after several iterations.
The overview of the proposed method is summarized in Fig. 9. With atlas intensity and label maps , affine transformation is first carried out and an initial label map for the target image is obtained with majority voting. Node selection can be performed based on the SDM of the initial label map and the graph for label fusion can be constructed with these target nodes. With intensity values, gradient magnitude and structural signature as augmented feature vector, candidate nodes are selected from atlases and the atlas prior is gathered in the form of FSLP. With the label prior from atlases and anatomical prior from the target itself, label fusion is formulated on a graph with Random Walker and the label map is updated gradually through iterations until stable.
TABLE III DESCRIPTION OF IBSR AND LPBA40 DATABASES.
A. Databases and Preprocessing
To evaluate the performance of the proposed method, experiments have been carried out on two publicly available MR brain image databases – IBSR1 and LPBA402 [39]. The IBSR v2.0 database, consisting of 18 T1-weighted images with 84 manually labeled structures, is provided by the Center for Morphometries Analysis at Massachusetts General Hospital, U.S.A.. Three kinds of voxel resolutions () are utilized in the IBSR database: and . 18 healthy subjects, including 14 males and 4 females, took part in the image acquisition, with ages ranging between 7 and 71. All 18 images have been normalized to Talairach orientation and the bias field has been corrected.
The LPBA40 database, consisting of 40 images with 56 manually labeled structures and skull-stripped, is provided by the UCLA Laboratory of Neuro Imaging, U.S.A.. 40 human volunteers, including 20 males and 20 females, took part in the image acquisition, with ages ranging between 19 and 40. The 40 skull-stripped volumes have been rigidly registered to the MNI305 atlas and the intensity inhomogeneity has been corrected. Detailed description of these two databases is presented in Table III.
Given the significance of subcortical structures in clinical diagnosis, surgical planning and therapeutic assessment, in this paper, we focus on the extraction of subcortical structures from brain MR images. There are six subcortical structures labeled in IBSR database, including Amygdala, Caudate, Hippocampus, Pallidum, Putamen and Thalamus. As for the LPBA40 database, three subcortical structures are delineated: Caudate, Hippocampus and Putamen. Each of the subcortical structure has two sub-parts, located in the left and right hemispheres respectively.
In the experiments, each database was separated into two parts randomly, with equal number of images as training (atlas) and testing (target) data sets. Considering the intensity inconsistency among images, histogram matching was first conducted with the Insight Toolkit3. Then pair-wise registrations between each target image and all atlases were performed based on affine transformation, using FLIRT [40] provided by FSL toolbox [41]. With multiple label maps generated with various atlases, majority voting was applied to generate the
Fig. 10. Segmentation results by our method at each iteration on LPBA40 database, measured with DC.
initial label map and the results were also employed as the baseline during comparison.
B. Segmentation Results
In the Methodology section, the label fusion method with Random Walker was initially designed for binary segmentation. While in the experiments, there are several remarkable subcortical structures in one brain volume and some of them can be adjacent with each other. Directly applying binary segmentation for each structure independently may cause some inconsistencies around the neighboring areas. As such, it is necessary to extend the binary segmentation to multi-class segmentation in a refined way.
Distinct with other graph-based approaches (like Graph Cuts or Markov Random Field), Random Walker produces a probability map rather than a discrete label map, indicating the probability that each pixel belongs to the foreground. After applying Random Walker to each structure, we can obtain a vector for each pixel , where K is the total number of subcortical structures. represents the probability that belongs to the j-th subcortical structure. As for the background probability, is assigned as . Then the probability distribution over the K +1 classes, including the background and multiple structures, can be estimated with the softmax function (normalized exponential function). The category with the largest probability is assigned as the final label for each pixel.
Dice Coefficient (DC) is utilized to evaluate the quality of label fusion. In the proposed framework, the iterative strategy is exploited to update target label map gradually. To test the iterative effects, experiments have been conducted on LPBA40 database with available subcortical structures and the segmentation results at each iteration are recorded and displayed in Fig. 10. It can be observed that the segmentation accuracy increases with the number of iterations and tends to remain stable after three iterations.
Another set of experiments has been carried out to check how the amount of candidate nodes can influence the segmentation quality. As discussed in Feature Matching, with each kind of feature vector, a set of similar pixels can be collected from atlases with randomized k-d tree and the pool of candidate nodes can be further determined with a spatial constraint. In Fig. 11, the horizontal direction refers to the settings of how many similar pixels need to be selected with
Fig. 11. The effects of the setting of similar pixel amount on candidate nodes and segmentation quality with LPBA40 database. (a) Blue curve, amount of candidate nodes; Green curve, the percentage of candidate nodes among similar pixels. (b) Segmentation result measured with DC.
one kind of feature. In the Upper subfigure, the Bule curve displays the count of candidate nodes with three features, which indicates that with spatial constraint, only a portion of similar pixels can be kept in the candidate nodes pool. The Green curve shows the percentage of candidate nodes among similar pixels and the percentage decreases along with the increase of similar pixel amount, which can be caused by the disturbances from adjacent tissues with similar profiles. In the Bottom subfigure, the segmentation accuracy measured with DC is displayed and the peak of the performance lies around 32 similar pixels. Fig. 11 demonstrates that segmentation quality is not proportional to the number of candidate nodes and the setting of 32 similar pixels gives the best performance.
Based on the preliminary test on LPBA40, for the experiments on IBSR, the iteration was set to 3 and the number of similar atlas nodes was set to 32. The input patch size for various features follows Table II and the spatial constraint during Feature Matching was set to . In FSLP estimation, rather than choosing a fixed value in Equation (6), it was set to be adaptive
in each iteration. The settings of the rest parameters are listed as follows: signed distance threshold and for node selection, and the tuning parameter used in Equation (8) was set to .
There are several existing softwares which support the automatic segmentation function for brain MR images, for example, BrainSuite [2] or FreeSurfer [1]. Therefore, we decided to utilize BrainSuite, one of the available softwares, to label images in the IBSR and LPBA40 databases as a reference
TABLE IV EXPERIMENTAL RESULTS ON IBSR AND LPBA DATABASES, MEASURED WITH DC. HIGHEST VALUES ARE WRITTEN IN RED.
Fig. 12. Parameters selection for compared methods, measured with DC.
during evaluation. BrainSuite first runs surface/volume registration using the high-resolution () BCI-DNI brain atlas and then warps the label map from the atlas to the target image. Besides the reference BrainSuite and the baseline majority voting (MV), the comparison with the conventional patch-based label fusion (PBL) [10] has been made for evaluation. Considering multiple features employed in the proposed method, it was also compared with the state-of-the-art method – patch-based label fusion with augmented features (PBAF) [17]. In addition to intensity information, the spatial and context features are also appended for label fusion in PBAF. The implementations of PBL and PBAF provided by [17] were used in the experiments. To keep consistent in the evaluation, the patch size for PBL and PBAF was set to and the size of search volume was set to . For PBL, the key parameter is the Gaussian kernel value and it was tested from {1, 10, 100, 1000, 10000, 100000} on
two databases, measured with DC. From Fig. 12(a), it can be observed that 10 and 10000 gives the best performance on IBSR and LPBA40 respectively. For PBAF, the parameter setting of pre-selected atlas nodes amount was tested from {2, 16, 32, 64, 128, 256, 512} on two databases. Fig. 12(b) indicates that 128 can obtain the best result and the accuracy starts to decrease a little after the peak.
The quantitative segmentation results on two databases generated with our method and compared methods are listed in Table IV, with highest DC values written in Red. For the six subcortical structures delineated in the IBSR database, the accuracy on the left and right subcortical structures are listed respectively, separated by hyphen. The segmentation quality with available subcortical structures on the LPBA40 database is also reported in this Table. Although BrainSuite utilizes a high-resolution atlas, those patch-based methods (PBL, PBAF and our method) which rely on the low-resolution atlases inside the database, obtain much better performances. When compared with the baseline MV, our approach can create the considerable increase of 16.7 % and 9.3% respectively on two databases. The proposed method can still outperform the preeminent label fusion method PBAF by 3.2% and 0.8%.
In Fig. 13, we also present some visual results of 2D slices selected from 3D brain MR image volumes. Each row shows the original intensity slice, its corresponding ground truth, the segmentation results generated with compared methods and our method. The figures in the upper 3 rows are selected from IBSR database and those in the bottom 2 rows are from LBPA40 database. The first column (a) displays the 2D intensity slices from brain MR images, with the ground truth shown in column (b) for reference. The segmentation results generated with MV, PBL, PBAF and our method are shown in column (c) to (f). It can be observed that our method can obtain better segmentation quality. As compared with MV, the shapes of labeled subcortical structures by our method are closer to the ground truth. For the labeling results of the patch-based methods PBL and PBAF, some structures have isolate segments and the structural boundaries are relatively rough as compared with those of our method.
Fig. 13. Visual segmentation results of 2D slices selected from 3D brain image volumes. Each row presents the original intensity slice, corresponding ground truth, the segmentation results generated with our method and other compared methods. The figures in the upper 3 rows are selected from IBSR database and those in the bottom 2 rows are from LBPA40 database. (a) 2D intensity slices from brain MR images; (b) Ground truth for reference; (c) Majority voting; (d) Patch-based label fusion (PBL); (e) Patch-based label fusion with augmented features (PBAF); (f) Our method.
C. Further Discussion
There is an underlying assumption for the proposed Feature Sensitive Label Prior (FSLP): distinct features can assist the segmentation in a complementary way. To test the effects of utilizing multiple features, we compare the preliminary FSLP results with the label fusion using structural signature alone, as displayed in Fig. 14. In FSLP, besides the discriminative feature – structural signature extracted by CNN, the less discriminative features – intensity and gradient are also employed during label fusion. To further evaluate our feature sensitivity strategy, we also compare with the label fusion using fixed uniform feature coefficients, i.e., , and the results with equal weights (EW) are included in Fig. 14. These results demonstrate that embracing distinct features can yield better performance and the feature sensitivity strategy can consistently improve the segmentation quality. It is noting
Fig. 14. Comparison of the segmentation results on IBSR database using structural signature (extracted by CNN) alone, multiple features with equal weights (EW) and Feature Sensitive Label Prior (FSLP), measured with DC.
that FSLP is a general method to capture label prior from multiple features and its usage is not limited to the three kinds of features. Other features, such as Local Binary Pattern (LBP) [42] or Histogram of Oriented Gradients (HoG) [43], can be also encoded in FSLP to improve the performance.
For the computation cost of the proposed method, there are four main components to be considered: feature generation, feature matching, FSLP and label fusion with Random Walker. During feature generation, three kinds of features are generated: intensity, gradient and structural signature. The complexity of intensity and gradient extraction is O(dN), where d is the feature length. As for the structural signature, it only needs one forward pass through the CNN network to obtain the feature vector. As discussed in Section II-B, the feature matching process is carried out with the efficient randomized k-d tree algorithm. The FSLP is a non-convex problem and one heuristic approach is designed for it by alternately solving two convex problems. Given that the value of objective function will decrease strictly during each iteration and this non-singular function is lower-bounded by a finite value, the heuristic approach will converge after several iterations [44]. As both two convex problems (least square and quadratic programming) can be solved efficiently, the process to estimate FSLP can be finished in a short time. The last step is the label fusion with Random Walker, which is a discrete Dirichlet problem and can be solved efficiently using the Laplacian equation with the Dirichlet conditions. In total, the running time for labeling one sub-cortical structure in one target image is around 1.5 minutes using the proposed label fusion method (on a 3.1GHz, Quad-Core CPU with 8GB RAM machine), as compared with 10 minutes using PBAF.
Besides the volume overlap measurement Dice Coefficient (DC), we also evaluate the segmentation quality on IBSR database with one distance measurement – Hausdorff Distance (HD). The segmentation results of six subcortical structures are shown in Fig. 15, measured with HD. Although PBL can obtain higher DC values than MV, its performance is a little worse when measured with HD. This phenomenon may be caused by the lack of label consistency within the subcortical structures, as many holes and outliers exist in the labeled region (as shown in Fig. 13(d)). The results measured with HD demonstrate that our method can still obtain competitive performance as compared with the state-of-the-art methods.
In the proposed method, we collect FSLP from atlases to capture the relationships between local intensity profiles and tissue labels, and utilize anatomical priors from target image to assist graph-based label fusion. To check the effects of two priors in detail, the preliminary segmentation with FSLP is estimated and compared with the result after label fusion. Based on Equation (7), the intermediate labeling result by FSLP can be generated by assigning labels to pixels with minimum reconstruction error. The comparison has been carried out on IBSR database, with quantitative results measured with DC. Fig. 16 indicates that embracing anatomical priors during label fusion can bring consistent improvements for the labeling of each subcortical structure. Some visual segmentation results for each subcortical structure on IBSR database are displayed in Fig. 17. Each row presents the 3D labeled
Fig. 15. Segmentation quality on IBSR database, measured with HD.
Fig. 16. Comparison of the labeling result generated with FSLP and the complete proposed method on IBSR database, measured with DC.
volumes provided by ground truth for reference, intermediate labeling by FSLP and the final segmentation after graph-based label fusion with anatomical priors. As shown in column (b), the labeling result by FSLP also suffers from the holes and outliers problems (Red Circles) as conventional patch-based methods. By introducing anatomical priors as graph seeds and lattice connections to enforce label consistency, although there are still some defects, the structural boundary becomes more smooth and the segmentation quality can be improved significantly as shown in column (c). The graph-based label fusion with Random Walker is an essential component in the proposed method, which can be also utilized to improve the labeling result by other conventional patch-based methods in the future.
In this paper, a novel framework for atlas-based image segmentation is proposed. It can effectively encode both the label priors from atlases and anatomical prior from the target image. Three kinds of features are employed to represent a pixel, including conventional intensity values and gradient magnitudes, together with the newly designed structural signature.
Besides the FSLP from atlases, the anatomical prior from the target itself is also employed for final label estimation. The label fusion process is formulated on a graph with Random Walker, with priors encoded as edge weights. Although
Fig. 17. Some visual segmentation results of each subcortical structure on IBSR database. Each row presents the 3D labeled volumes for one selected subcortical structure. (a) Ground truth for reference; (b) Intermediate labeling result by FSLP; (c) Final segmentation after graph-based label fusion with anatomical priors.
atlas seeds involved in graph construction, an equal but simpler graph can be inferred which just relies on nodes from target image. The iterative strategy is employed to update the target label map gradually. The proposed framework has been compared with other state-of-the-art methods for comprehensive evaluation and experimental results indicate that it can obtain better label fusion quality consistently on two publicly available databases.
Proof of Non-convexity
To demonstrate the non-convexity of Equation (3), one simple instance is first proven to be non-convex and then it can be derived that the original is also non-convex. Considering this simplified scenario – the length of each feature vector is 1, the formulation of becomes
By defining a new variable
of the initial problem turns into:
Let us consider a special case and use to represent for simplification. Equation (12) can be rewritten as follows:
Given the constraints and , the range of becomes . If the special case shown in Equation (13) is non-convex, it can be inferred that Equation (12) is also non-convex. For this special case, it is much easier to determine whether it is convex or not. A problem is convex if and only if its Hessian matrix is positive semidefinite [45]. The Hessian matrix of Equation (13) can be calculated as follows:
If H is positive semidefinite, all of its eigenvalues have to be non-negative. The determinant of is:
The eigenvalues of H are
As it is not guaranteed that , one negative eigenvalue can appear. In this case, the Hessian matrix is not positive semidefinite and the special case shown in Equation (13) is non-convex. It can be inferred that the original Equation (3) is also a non-convex problem.
[1] B. Fischl, “Freesurfer,” Neuroimage, vol. 62, no. 2, pp. 774–781, 2012.
[2] D. W. Shattuck and R. M. Leahy, “Brainsuite: an automated cortical surface identification tool,” Medical image analysis, vol. 6, no. 2, pp. 129–142, 2002.
[3] R. Goebel, F. Esposito, and E. Formisano, “Analysis of functional image analysis contest (fiac) data with brainvoyager qx: From singlesubject to cortically aligned group general linear model analysis and self-organizing group independent component analysis,” Human brain mapping, vol. 27, no. 5, pp. 392–401, 2006.
[4] D. Geffroy, D. Rivi`ere, I. Denghien, N. Souedet, S. Laguitton, and Y. Cointepas, “Brainvisa: a complete software platform for neuroimaging,” in Python in neuroscience workshop. Euroscipy, Paris, 2011, pp. 15–16.
[5] J. A. Schnabel, D. Rueckert, M. Quist, J. M. Blackall, A. D. Castellano- Smith, T. Hartkens, G. P. Penney, W. A. Hall, H. Liu, C. L. Truwit et al., “A generic framework for non-rigid registration based on non-uniform multi-level free-form deformations,” in Med. Image Comput. ComputerAssist. Intervent. Springer, 2001, pp. 573–581.
[6] J. Ashburner and K. J. Friston, “Nonlinear spatial normalization using basis functions,” Human Brain Mapping, vol. 7, no. 4, pp. 254–266, 1999.
[7] B. B. Avants, C. L. Epstein, M. Grossman, and J. C. Gee, “Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain,” Med. Image Anal., vol. 12, no. 1, pp. 26–41, 2008.
[8] A. Klein, J. Andersson, B. A. Ardekani, J. Ashburner, B. Avants, M.- C. Chiang, G. E. Christensen, D. L. Collins, J. Gee, P. Hellier et al., “Evaluation of 14 nonlinear deformation algorithms applied to human brain mri registration,” NeuroImage, vol. 46, no. 3, pp. 786–802, 2009.
[9] M. R. Sabuncu, B. T. Yeo, K. Van Leemput, B. Fischl, and P. Golland, “A generative model for image segmentation based on label fusion,” IEEE Trans. Med. Imag., vol. 29, no. 10, pp. 1714–1729, 2010.
[10] P. Coup´e, J. V. Manj´on, V. Fonov, J. Pruessner, M. Robles, and D. L. Collins, “Patch-based segmentation using expert priors: Application to hippocampus and ventricle segmentation,” NeuroImage, vol. 54, no. 2, pp. 940–954, 2011.
[11] S. Liao, Y. Gao, J. Lian, and D. Shen, “Sparse patch-based label propagation for accurate prostate localization in ct images,” IEEE Trans. Med. Imag., vol. 32, no. 2, pp. 419–434, 2013.
[12] F. Rousseau, P. A. Habas, and C. Studholme, “A supervised patch-based approach for human brain labeling,” IEEE Trans. Med. Imag., vol. 30, no. 10, pp. 1852–1862, 2011.
[13] H. Wang, J. Suh, S. Das, J. Pluta, C. Craige, and P. Yushkevich, “Multi- atlas segmentation with joint label fusion,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 3, pp. 611–623, 2013.
[14] T. Tong, R. Wolz, P. Coup´e, J. V. Hajnal, D. Rueckert, A. D. N. Initiative et al., “Segmentation of mr images via discriminative dictionary learning and sparse coding: Application to hippocampus labeling,” NeuroImage, vol. 76, pp. 11–23, 2013.
[15] V.-T. Ta, R. Giraud, D. L. Collins, and P. Coup´e, “Optimized patchmatch for near real time and accurate label fusion,” in Med. Image Comput. Computer-Assist. Intervent., ser. LNCS, P. Golland, N. Hata, C. Barillot, J. Hornegger, and R. Howe, Eds. Springer, 2014, vol. 8675, pp. 105– 112.
[16] P. Coup´e, P. Yger, S. Prima, P. Hellier, C. Kervrann, and C. Barillot, “An optimized blockwise nonlocal means denoising filter for 3-d magnetic resonance images,” IEEE Trans. Med. Imag., vol. 27, no. 4, pp. 425–441, 2008.
[17] W. Bai, W. Shi, C. Ledig, and D. Rueckert, “Multi-atlas segmentation with augmented features for cardiac mr images,” Med. Image Anal., vol. 19, no. 1, pp. 98–109, 2015.
[18] J. Liu, S. Ji, and J. Ye, “Slep: Sparse learning with efficient projections,” Arizona State University, vol. 6, 2009.
[19] L. M. Koch, M. Rajchl, T. Tong, J. Passerat-Palmbach, P. Aljabar, and D. Rueckert, “Multi-atlas segmentation as a graph labelling problem: Application to partially annotated atlas data,” in Information Processing in Medical Imaging. Springer, 2015, pp. 221–232.
[20] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[21] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International journal of computer vision, vol. 60, no. 2, pp. 91–110, 2004.
[22] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[23] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.
[24] T. Sercu, C. Puhrsch, B. Kingsbury, and Y. LeCun, “Very deep multilingual convolutional neural networks for lvcsr,” arXiv preprint arXiv:1509.08967, 2015.
[25] V. Nair and G. E. Hinton, “Rectified linear units improve restricted boltzmann machines,” in International Conference on Machine Learning, 2010, pp. 807–814.
[26] A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectifier nonlinearities improve neural network acoustic models,” in International Conference on Machine Learning, vol. 30, 2013.
[27] H. Wang, A. Cruz-Roa, A. Basavanhally, H. Gilmore, N. Shih, M. Feld- man, J. Tomaszewski, F. Gonzalez, and A. Madabhushi, “Mitosis detection in breast cancer pathology images by combining handcrafted and convolutional neural network features,” Journal of Medical Imaging, vol. 1, no. 3, pp. 034 003–034 003, 2014.
[28] S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Y. Wu, “An optimal algorithm for approximate nearest neighbor searching fixed dimensions,” Journal of the ACM, vol. 45, no. 6, pp. 891–923, 1998.
[29] C. Silpa-Anan and R. Hartley, “Optimised kd-trees for fast image descriptor matching,” in Comput. Vis. Pattern Recognit. IEEE, 2008, pp. 1–8.
[30] A. Andoni and P. Indyk, “Near-optimal hashing algorithms for approx- imate nearest neighbor in high dimensions,” in IEEE Symposium on Foundations of Computer Science. IEEE, 2006, pp. 459–468.
[31] M. Muja and D. Lowe, “Scalable nearest neighbour algorithms for high dimensional data,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 11, pp. 2227–2240, 2014.
[32] M. Muja and D. G. Lowe, “Fast approximate nearest neighbors with automatic algorithm configuration.” Proc. Int. Conf. Computer Vis. Theory Appl., vol. 2, pp. 331–340, 2009.
[33] S. Bao and A. C. Chung, “Label inference with registration and patch priors,” in Med. Image Comput. Computer-Assist. Intervent., ser. LNCS, P. Golland, N. Hata, C. Barillot, J. Hornegger, and R. Howe, Eds. Springer, 2014, vol. 8673, pp. 731–738.
[34] O. L´ezoray and L. Grady, Image Processing and Analysis with Graphs: Theory and Practice. CRC Press, 2012.
[35] C. Couprie, L. Grady, L. Najman, and H. Talbot, “Power watershed: A unifying graph-based optimization framework,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 7, pp. 1384–1399, 2011.
[36] S. Vicente, V. Kolmogorov, and C. Rother, “Graph cut based image seg- mentation with connectivity priors,” in Comput. Vis. Pattern Recognit., 2008, pp. 1–8.
[37] L. Grady and E. Schwartz, “The graph analysis toolbox: Image process- ing on arbitrary graphs,” Tech. Rep. 021, 2003.
[38] A. Sinop and L. Grady, “A seeded image segmentation framework unifying graph cuts and random walker which yields a new algorithm,” in International Conference on Computer Vision, 2007, pp. 1–8.
[39] D. W. Shattuck, M. Mirza, V. Adisetiyo, C. Hojatkashani, G. Salamon, K. L. Narr, R. A. Poldrack, R. M. Bilder, and A. W. Toga, “Construction of a 3d probabilistic atlas of human cortical structures,” NeuroImage, vol. 39, no. 3, pp. 1064–1080, 2008.
[40] M. Jenkinson, P. Bannister, M. Brady, and S. Smith, “Improved op- timization for the robust and accurate linear registration and motion correction of brain images,” NeuroImage, vol. 17, no. 2, pp. 825–841, 2002.
[41] M. Jenkinson, C. F. Beckmann, T. E. Behrens, M. W. Woolrich, and S. M. Smith, “Fsl,” NeuroImage, vol. 62, no. 2, pp. 782–790, 2012.
[42] T. Ojala, M. Pietikainen, and T. Maenpaa, “Multiresolution gray-scale and rotation invariant texture classification with local binary patterns,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp. 971–987, 2002.
[43] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Comput. Vis. Pattern Recognit., vol. 1. IEEE, 2005, pp. 886–893.
[44] H. J. Kushner and D. S. Clark, Stochastic approximation methods for constrained and unconstrained systems. Springer Science & Business Media, 2012.
[45] S. Boyd and L. Vandenberghe, Convex optimization. Cambridge university press, 2004.