Over the past several years, deep learning has outperformed many conventional computer vision techniques in areas such as image classification, segmentation and tracking [13], [6], [9], [11], [12], [18]. Convolutional Neural Networks (CNNs) are among the best-known deep learning architectures. They were introduced in 1989 [14], but their true effectiveness became apparent only once they were trained on powerful GPU-equipped machines with large amounts of training data. Krizhevsky et al. [13] trained a large CNN with 8 layers and millions of parameters on the ImageNet dataset, which contains more than one million training images. Since then, many modified and deeper CNN architectures have been proposed. They are used in medical imaging and have also been widely applied in many other domains.
Computer-vision-based medical image segmentation methods fall into two categories: conventional techniques and deep learning based methods. Widely used conventional methods include thresholding-based methods [19], [7], [1], region growing methods [8], [17], and clustering-based methods [5], [16]. Deep CNN models are mostly used for image classification. In medical image analysis, however, segmentation is equally important. For instance, it is widely used to localize cancerous or otherwise abnormal regions in MRI, CT and ultrasound images. In medical image segmentation, CNN models are trained with cross-entropy as a pixel-wise loss [4]. The most popular deep CNN architectures for medical image segmentation use an encoder-decoder design, and the most widely used models in this domain are U-Net [18] and V-Net [15]. U-Net was proposed for segmenting biological microscopy images. Annotated training sets in the medical domain are much smaller than in other computer vision areas, so Ronneberger et al. [18] trained U-Net with strong data augmentation to make the most of the available annotated images. The U-Net architecture consists of two main parts. A contracting sub-net encodes semantic and context information, and an expanding sub-net decodes this information to generate segmentation maps. The contracting sub-net is built from down-sampling CNN blocks that extract features with 3×3 convolutions. The expanding sub-net is built from up-sampling CNN blocks that use deconvolutions (up-convolutions) to increase the spatial size of the feature maps while reducing their number of channels.
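The interaction between 3×3 convolutions, 2×2 down-sampling and 2×2 up-convolutions is easiest to see by tracing feature-map sizes through the network. The sketch below assumes the configuration of the original U-Net: unpadded ("valid") 3×3 convolutions applied twice per block, 2×2 max pooling with stride 2, four resolution levels, and 64 base channels. The function name `unet_feature_sizes` is hypothetical and does not come from [18].

```python
def unet_feature_sizes(input_size: int, depth: int = 4, base_channels: int = 64):
    """Trace (stage, level, spatial size, channels) through a U-Net-style net.

    Assumes two unpadded 3x3 convolutions per block (each trims 2 pixels),
    2x2 max pooling with stride 2 on the way down, and 2x2 up-convolutions
    with stride 2 on the way up, as in the original U-Net.
    """
    size, channels = input_size, base_channels
    trace = []
    # Contracting path: two valid 3x3 convs, then 2x2 max pooling.
    for level in range(depth):
        size -= 4
        trace.append(("down", level, size, channels))
        if size % 2:
            raise ValueError(f"size {size} at level {level} cannot be pooled evenly")
        size //= 2
        channels *= 2  # the next block's convolutions double the channel count
    # Bottom of the "U": two more valid 3x3 convs at the lowest resolution.
    size -= 4
    trace.append(("bottom", depth, size, channels))
    # Expanding path: 2x2 up-conv doubles the size and halves the channels;
    # after concatenation with the cropped skip, two valid 3x3 convs follow.
    for level in reversed(range(depth)):
        size = size * 2 - 4
        channels //= 2
        trace.append(("up", level, size, channels))
    return trace


for stage in unet_feature_sizes(572):
    print(stage)
```

The last entry of `unet_feature_sizes(572)` is `("up", 0, 388, 64)`. A 572×572 input tile therefore yields a 388×388 map with 64 channels before the final 1×1 convolution, which matches the tile sizes of the original U-Net. With padded convolutions the `- 4` terms disappear and the output has the same size as the input.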
To exploit the context information encoded by the intermediate layers of the contracting sub-net, these feature maps are concatenated with the feature maps of the corresponding deconvolutional blocks of the expanding sub-net. Finally, a 1×1 convolution is applied to the feature maps of the last layer of the expanding sub-net. This produces a segmentation map in which each pixel is assigned to a semantic class. U-Net was trained on very small datasets, for example 30 annotated electron microscopy images for the ISBI 2012 EM segmentation challenge. Owing to its efficient architectural design, it also won the ISBI cell tracking challenge 2015 on transmitted light microscopy images by a large margin.
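The skip connection, the final 1×1 convolution and the pixel-wise cross-entropy loss can be sketched in a few lines of NumPy on single (channels, height, width) feature maps. The helper names (`center_crop`, `skip_concat`, `conv1x1`, `pixelwise_cross_entropy`) are hypothetical. The cropping step assumes unpadded convolutions, which make the encoder maps larger than the decoder maps they are concatenated with, as in the original U-Net.

```python
import numpy as np


def center_crop(features, height, width):
    """Crop a (C, H, W) array to (C, height, width) around its centre."""
    _, h, w = features.shape
    top, left = (h - height) // 2, (w - width) // 2
    return features[:, top:top + height, left:left + width]


def skip_concat(decoder_feats, encoder_feats):
    """Concatenate cropped encoder features with decoder features along channels."""
    _, h, w = decoder_feats.shape
    return np.concatenate([center_crop(encoder_feats, h, w), decoder_feats], axis=0)


def conv1x1(features, weight, bias):
    """1x1 convolution: the same (K x C) linear map applied at every pixel.

    features: (C, H, W), weight: (K, C), bias: (K,) -> class scores (K, H, W)
    """
    return np.einsum("kc,chw->khw", weight, features) + bias[:, None, None]


def pixelwise_cross_entropy(scores, labels):
    """Mean cross-entropy over pixels; scores (K, H, W), labels (H, W) ints."""
    shifted = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    rows, cols = np.indices(labels.shape)
    # Pick the log-probability of the true class at every pixel.
    return -log_probs[labels, rows, cols].mean()


rng = np.random.default_rng(0)
dec = rng.standard_normal((4, 8, 8))     # decoder features at some level
enc = rng.standard_normal((4, 12, 12))   # larger encoder features (valid convs)
fused = skip_concat(dec, enc)            # shape (8, 8, 8)
scores = conv1x1(fused, rng.standard_normal((2, 8)), np.zeros(2))  # (2, 8, 8)
label_map = scores.argmax(axis=0)        # (8, 8) predicted class per pixel
loss = pixelwise_cross_entropy(scores, label_map)
```

Because a 1×1 convolution has no spatial extent, it acts as a per-pixel classifier over the concatenated channels. That is why it can turn the decoder's feature maps into one class score per pixel.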
Similarly, V-Net [15] is another widely used segmentation network in medical image analysis. The main difference is that it is designed for volumetric (3D) medical image segmentation. Milletari et al. [15] proposed a loss function based on the Dice coefficient to cope with the strong imbalance between foreground and background voxels during training. V-Net is trained end-to-end on prostate MRI volumes and, using this Dice-based objective, predicts the segmentation of the whole volume at once. Imran et al. [10] proposed a fast method called Progressive Dense V-Net (PDV-Net) for segmenting the pulmonary lobes in chest CT images. The PDV-Net architecture contains three dense feature blocks that process the entire CT volume and generate the segmentation automatically. Existing medical image segmentation methods require prior information, whereas PDV-Net needs no user interaction to supply such information. Similarly, Brosch et al. [2] proposed a 3D convolutional encoder network for lesion segmentation that combines the advantages of U-Net and convolutional encoder networks (CEN) [3]. This network consists of two branches. The convolutional branch is built from convolutional and pooling layers, and the deconvolutional branch is built from deconvolutional and unpooling layers.
Fig. 1. The main idea: our CNN takes an input medical image, passes it through its intermediate layers, and produces a segmentation map using its decoder.
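The Dice-based loss mentioned for V-Net can be sketched as follows. The sketch uses the squared-denominator form of the soft Dice coefficient from [15], D = 2 Σ p_i g_i / (Σ p_i² + Σ g_i²). Minimizing 1 − D, the small smoothing constant `eps`, and the function names are assumptions made for this illustration.

```python
import numpy as np


def soft_dice_coefficient(probs, target, eps=1e-7):
    """Soft Dice coefficient, 2*sum(p*g) / (sum(p^2) + sum(g^2)).

    probs:  predicted foreground probabilities of any shape (e.g. a 3D volume)
    target: binary ground truth of the same shape
    eps:    smoothing constant (an assumption) so that empty masks give D = 1
    """
    p = np.asarray(probs, dtype=float).ravel()
    g = np.asarray(target, dtype=float).ravel()
    intersection = np.sum(p * g)
    denominator = np.sum(p ** 2) + np.sum(g ** 2)
    return (2.0 * intersection + eps) / (denominator + eps)


def dice_loss(probs, target):
    """Loss to minimise: 1 - Dice, so a perfect prediction gives 0."""
    return 1.0 - soft_dice_coefficient(probs, target)


# A 10x10x10 volume in which only 10 of 1000 voxels are foreground.
target = np.zeros((10, 10, 10))
target[0, 0, :] = 1
all_background = np.zeros_like(target)
print(dice_loss(target, target), dice_loss(all_background, target))
```

This example shows why the Dice loss counters voxel imbalance. Predicting all voxels as background is 99% correct voxel-wise (990 of 1000), yet its Dice loss is close to 1. The loss ignores the many easy background voxels and is driven by overlap with the small foreground region.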
In this paper, we present an architecture similar to the aforementioned networks. The main difference from the existing medical image segmentation techniques discussed above is that we combine explicit supervised learning at the bottleneck with the training strategy of a typical U-Net. In that strategy, the encoder is trained only indirectly (self-supervised, in our terminology) through the loss at the decoder output. We argue that explicitly providing a supervisory signal at the bottleneck layer of the U-Net encoder lets the encoder (contracting branch) learn more effective features than this indirect training approach alone.
The overall framework of our proposed technique is shown in Figure 2. The network consists of three parts: 1) an encoder, 2) a bottleneck training part, and 3) a decoder. The encoder follows the typical design of a convolutional neural network and contains convolutional blocks with 3×3 filters. Each convolutional block is followed by a rectified linear unit (ReLU) and a down-sampling layer that performs 2×2 max pooling with stride 2. Each down-sampling step halves the spatial size of the feature maps, while the number of feature channels increases along the encoder so that more useful information is encoded. The bottleneck training part consists of two fully connected layers that predict the ground-truth segmentation map through a linear transformation. This is possible because the input image and its segmentation map are registered (spatially aligned). The decoder is built from up-sampling deconvolutional blocks. We use 2×2 up-convolutions to increase the size of the feature maps in the intermediate deconvolutional layers. Following the skip-connection design of U-Net, we concatenate the feature maps of each encoder layer with those of the corresponding decoder layer. We then apply 3×3 convolutional filters followed by a ReLU to introduce non-linearity in this branch of the network.
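The bottleneck supervision can be sketched as an auxiliary prediction head and an added loss term. The text does not give the layer sizes, the loss used on the bottleneck prediction, or how that loss is weighted against the decoder loss. This sketch therefore assumes all three: hypothetical sizes (a 16×16 input, an 8×2×2 bottleneck, 64 hidden units), a binary cross-entropy on the predicted map, and a hypothetical weight `lam`. It shows only the forward pass and the loss, not a training loop. Because no activation sits between the two fully connected layers, together they form a single linear transformation, consistent with the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): 16x16 input, 8x2x2 bottleneck.
H = W = 16
C_B, H_B, W_B = 8, 2, 2
HIDDEN = 64

# Two fully connected layers with no activation in between, so their
# composition is one linear (affine) map of the flattened bottleneck.
W1 = rng.standard_normal((HIDDEN, C_B * H_B * W_B)) * 0.1
b1 = np.zeros(HIDDEN)
W2 = rng.standard_normal((H * W, HIDDEN)) * 0.1
b2 = np.zeros(H * W)


def bottleneck_head(bottleneck_feats):
    """Map (C_B, H_B, W_B) bottleneck features to (H, W) segmentation logits."""
    z = W1 @ bottleneck_feats.ravel() + b1
    return (W2 @ z + b2).reshape(H, W)


def binary_cross_entropy_with_logits(logits, target):
    """Mean binary cross-entropy computed stably from logits (0/1 target)."""
    return np.mean(np.maximum(logits, 0) - logits * target
                   + np.log1p(np.exp(-np.abs(logits))))


def total_loss(decoder_loss, bottleneck_feats, gt_mask, lam=0.5):
    """Decoder loss plus a weighted auxiliary loss at the bottleneck.

    lam is a hypothetical weighting; the paper does not specify one.
    """
    aux = binary_cross_entropy_with_logits(bottleneck_head(bottleneck_feats), gt_mask)
    return decoder_loss + lam * aux


gt_mask = np.zeros((H, W))
gt_mask[4:12, 4:12] = 1
feats = rng.standard_normal((C_B, H_B, W_B))
print(total_loss(0.3, feats, gt_mask))
```

The gradient of the auxiliary term reaches the encoder directly through the bottleneck, instead of only through the whole decoder. This is the intuition behind the supervisory signal argued for above.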
Table 1. Specificity, sensitivity and accuracy of the proposed model on MRI and CT-scan images.
Fig. 2. The overall architecture of our proposed network: an input image, such as an MRI or CT-scan image, is fed to the CNN-based network, which extracts context information in its intermediate layers. This encoding of the context information is enhanced by a bottleneck training layer. The decoder part of the network then uses the encoded information to generate the segmentation map, using skip connections from the encoder layers to the intermediate layers of the decoder.
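The proposed CNN-based method is evaluated in terms of sensitivity and specificity, defined as
$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP},$$
where TP, TN, FP and FN denote the numbers of true-positive, true-negative, false-positive and false-negative pixels, respectively.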
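Sensitivity, TP / (TP + FN), and specificity, TN / (TN + FP), can be computed directly from a predicted binary mask and a ground-truth mask, as sketched below. The sketch assumes pixel-wise counting on binary (foreground/background) masks. Table 1 also reports accuracy, and the sketch assumes its standard definition, (TP + TN) / (TP + TN + FP + FN). The function names are hypothetical.

```python
import numpy as np


def confusion_counts(pred, truth):
    """Pixel-wise (TP, TN, FP, FN) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))
    tn = int(np.sum(~pred & ~truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    return tp, tn, fp, fn


def segmentation_metrics(pred, truth):
    """Sensitivity, specificity and accuracy; NaN when a ratio is undefined."""
    tp, tn, fp, fn = confusion_counts(pred, truth)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"sensitivity": sensitivity, "specificity": specificity, "accuracy": accuracy}


truth = np.array([[0, 0, 1, 1], [0, 0, 1, 1]])
pred = np.array([[0, 1, 1, 0], [1, 0, 1, 1]])
print(segmentation_metrics(pred, truth))
```

In this toy example the counts are TP = 3, TN = 2, FP = 2 and FN = 1, giving a sensitivity of 0.75, a specificity of 0.5 and an accuracy of 0.625. Reporting sensitivity and specificity alongside accuracy matters because a large background region can inflate accuracy even when the foreground is poorly segmented.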
Table 1 shows the specificity, sensitivity and accuracy obtained by training and validating our proposed model on MRI and CT-scan images.
In this paper we have presented a U-Net-style architecture based on convolutional neural networks for medical image segmentation. The proposed network has three parts: 1) an encoder, 2) a bottleneck learning layer, and 3) a decoder. The encoder captures the context information of the input image in its intermediate layers using CNN filters followed by ReLU non-linearities. The bottleneck layer enhances the feature extraction capability of the encoder through a fully supervised linear transformation, implemented with fully connected layers that predict the ground-truth segmentation map. The decoder is based on deconvolutional blocks that increase the spatial dimensions of the feature maps and reduce their number of channels in its intermediate layers. To take full advantage of the information encoded in the intermediate layers of the encoder and to prevent loss of information, we add skip connections between the intermediate layers of the encoder and those of the decoder. Experimental results on MRI and CT scan images show that the proposed technique is promising.
1. Ali, K., Jalil, A., Gull, M.U., Fiaz, M.: Medical image segmentation using h-minima transform and region merging technique. In: 2011 Frontiers of Information Technology. pp. 127–132. IEEE (2011)
2. Brosch, T., Tang, L.Y., Yoo, Y., Li, D.K., Traboulsee, A., Tam, R.: Deep 3d convolutional encoder networks with shortcuts for multiscale feature integration applied to multiple sclerosis lesion segmentation. IEEE transactions on medical imaging 35(5), 1229–1239 (2016)
3. Brosch, T., Yoo, Y., Tang, L.Y., Li, D.K., Traboulsee, A., Tam, R.: Deep convolutional encoder networks for multiple sclerosis lesion segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 3–11. Springer (2015)
4. Chen, X., Williams, B.M., Vallabhaneni, S.R., Czanner, G., Williams, R., Zheng, Y.: Learning active contour models for medical image segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 11632–11640 (2019)
5. Fiaz, M., Ali, K., Rehman, A., Gul, M.J., Jung, S.K.: Brain mri segmentation using rule-based hybrid approach. arXiv preprint arXiv:1902.04207 (2019)
6. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 580–587 (2014)
7. Gowda, R.M., Lingaraju, G.: Texture-based watershed 3d medical image segmentation based on fuzzy region growing approach. In: Advances in Computational Intelligence, pp. 233–243. Springer (2017)
8. Haider, W., Sharif, M., Raza, M.: Achieving accuracy in early stage tumor identification systems based on image segmentation and 3d structure analysis. Computer Engineering and Intelligent Systems 2(6), 96–102 (2011)
9. Hariharan, B., Arbeláez, P., Girshick, R., Malik, J.: Hypercolumns for object segmentation and fine-grained localization. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 447–456 (2015)
10. Imran, A.A.Z., Hatamizadeh, A., Ananth, S.P., Ding, X., Terzopoulos, D., Tajbakhsh, N.: Automatic segmentation of pulmonary lobes using a progressive dense v-network. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 282–290. Springer (2018)
11. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE international conference on computer vision. pp. 1026–1034 (2015)
12. Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: Convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM international conference on Multimedia. pp. 675–678 (2014)
13. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems. pp. 1097–1105 (2012)
14. LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Backpropagation applied to handwritten zip code recognition. Neural computation 1(4), 541–551 (1989)
15. Milletari, F., Navab, N., Ahmadi, S.A.: V-net: Fully convolutional neural networks for volumetric medical image segmentation. In: 2016 Fourth International Conference on 3D Vision (3DV). pp. 565–571. IEEE (2016)
16. Ng, H., Ong, S., Foong, K., Goh, P., Nowinski, W.: Medical image segmentation using k-means clustering and improved watershed algorithm. In: 2006 IEEE southwest symposium on image analysis and interpretation. pp. 61–65. IEEE (2006)
17. Poonguzhali, S., Ravindran, G.: A complete automatic region growing method for segmentation of masses on ultrasound images. In: 2006 International Conference on Biomedical and Pharmaceutical Engineering. pp. 88–92. IEEE (2006)
18. Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical image computing and computer-assisted intervention. pp. 234–241. Springer (2015)
19. Zanaty, E., Ghoniemy, S.: Medical image segmentation techniques: an overview. International Journal of informatics and medical data processing 1(1), 16–37 (2016)