Multi-Cycle-Consistent Adversarial Networks for CT Image Denoising

2020·Arxiv

ABSTRACT

CT image denoising can be treated as an image-to-image translation task where the goal is to learn the transform between a source domain X (noisy images) and a target domain Y (clean images). Recently, the cycle-consistent adversarial denoising network (CCADN) has achieved state-of-the-art results by enforcing a cycle-consistency loss without the need for paired training data. Our detailed analysis of CCADN raises a number of interesting questions. For example, if the noise is large, leading to a significant difference between domain X and domain Y, can we bridge X and Y with an intermediate domain Z such that both the denoising process between X and Z and that between Z and Y are easier to learn? As such intermediate domains lead to multiple cycles, how do we best enforce cycle-consistency? Driven by these questions, we propose a multi-cycle-consistent adversarial network (MCCAN) that builds intermediate domains and enforces both local and global cycle-consistency. The global cycle-consistency couples all generators together to model the whole denoising process, while the local cycle-consistency imposes effective supervision on the process between adjacent domains. Experiments show that both local and global cycle-consistency are important for the success of MCCAN, which outperforms the state-of-the-art.

Index Terms— Machine learning, Image enhancement/restoration (noise and artifact reduction), Computed tomography (CT), Multi-cycle-consistency

1. INTRODUCTION

Computed tomography (CT) is one of the most widely used medical imaging modalities for showing anatomical structures [1, 2, 3, 4]. The foremost concern of CT examination is the associated exposure to radiation, which is known to increase the lifetime risk of death from cancer [5]. The radiation dose can be lowered at the cost of image quality [1], and the resulting images are denoised to enhance perceptual quality and the diagnostic confidence of radiologists.

Various deep neural network (DNN) based methods exist for CT image denoising [6, 7, 8, 9], which require paired clean and noisy images for training. Yet simulations are usually used to generate such paired data, where the synthetic noise patterns can differ from the real ones, leading to biased training results [10]. To address this issue, the cycle-consistent adversarial denoising network (CCADN) was recently proposed in [10], which formulates CT image denoising as an image-to-image translation problem without paired training data. CCADN consists of two generators: one transforms noisy CT images (domain X) to clear ones (domain Y) and the other transforms clear CT images (domain Y) to noisy ones (domain X). Both generators are trained with an adversarial loss. In addition, a cycle-consistency loss and an identity loss are used to gain better performance [11], which will be discussed in detail in Section 2. However, since CCADN only contains two domains X and Y, its efficacy degrades as the noise becomes stronger, because the larger differences between X and Y are harder to learn.

To tackle this issue, we propose to establish an intermediate domain between the original noisy image domain X and the clear image domain Y, and to decompose the denoising task into multiple coupled steps such that each step is easier to learn by DNN-based models. Specifically, we construct an additional domain Z with images of intermediate noise level between X and Y. These images can be considered as a stepping stone in the denoising process and provide additional information for the training of the denoising network. The multi-step framework particularly suits the denoising problem: while it is difficult to either find or define a good collection of images in the “half cat, half dog” domain in “cat-to-dog” type image translation problems, a domain Z of images with an intermediate level of noise exists naturally.

With the new domain Z, we further propose a multi-cycle-consistent adversarial network (MCCAN) to perform the multi-step denoising, which builds multiple cycles of different scales (global cycles and local cycles) between the domains while enforcing the corresponding cycle-consistencies. In the experiments, we find that both global cycles and local cycles are necessary for the success of MCCAN, which, with the two combined, outperforms the state-of-the-art competitor CCADN.

2. METHODOLOGY

Given training images that are either labelled as noisy (domain X) or clear (domain Y), we first construct a new domain Z which contains images with an intermediate noise level between X and Y. How to obtain Z is flexible in practice. In our experiments, it is obtained from X and Y by separating out those images with intermediate noise level.
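The separation step can be pictured with a minimal sketch. It assumes that every image already carries a scalar noise score; how such a score is computed is not specified here, and the function name `split_into_domains` and both thresholds are hypothetical choices made only for illustration.

```python
from typing import Dict, List, Tuple


def split_into_domains(
    noise_scores: Dict[str, float],
    low_threshold: float,
    high_threshold: float,
) -> Tuple[List[str], List[str], List[str]]:
    """Partition image ids into (X, Z, Y) by a scalar noise score.

    X: noisy images (score above high_threshold)
    Z: intermediate images (score between the two thresholds, inclusive)
    Y: clear images (score below low_threshold)
    The score and both thresholds are illustrative assumptions.
    """
    if low_threshold > high_threshold:
        raise ValueError("low_threshold must not exceed high_threshold")
    domain_x: List[str] = []
    domain_z: List[str] = []
    domain_y: List[str] = []
    # Sort by id so the result does not depend on dict insertion order.
    for image_id, score in sorted(noise_scores.items()):
        if score > high_threshold:
            domain_x.append(image_id)
        elif score < low_threshold:
            domain_y.append(image_id)
        else:
            domain_z.append(image_id)
    return domain_x, domain_z, domain_y


# Four made-up images with made-up noise scores.
scores = {"a": 0.9, "b": 0.5, "c": 0.1, "d": 0.45}
domain_x, domain_z, domain_y = split_into_domains(scores, 0.3, 0.6)
print(domain_x, domain_z, domain_y)
```

In practice the thresholds would be tuned so that the three domains end up with a comparable number of images.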

With CT images from three domains, the multi-step denoising architecture of MCCAN is shown in Fig. 1(a). We train four convolutional neural networks as generators and three as discriminators. Arrows in Fig. 1(a) define how images are transformed in the training stage. Specifically, the generator $G_{XZ}$ aims to transform an image from X to Z; $G_{ZY}$, $G_{YZ}$, and $G_{ZX}$ can be interpreted similarly. Discriminators $D_X$, $D_Y$, and $D_Z$ aim to distinguish the “real” images originally belonging to the domains X, Y, and Z, respectively, from the “fake” images transformed from other domains.

Fig. 1. (a) Structure of MCCAN and (b) its cycles. The arrows inside each domain denote the computation of cycle-consistency loss. The solid and dashed arrows across domains form global and local cycles, respectively. For clarity, we only show cycles from left to right. Symmetric cycles going from right to left also exist but are not shown.

As the MCCAN structure in Fig. 1(a) contains three domains, there are multiple ways in which we can construct cycles for the cycle-consistency loss, where a cycle is a path along which an image from a source domain is transformed through one (in [11]) or several (in this paper) other domains and back to the source domain. In particular, we introduce two types of cycles as shown in Fig. 1(b). In this figure, each dot represents an image, which is color-coded based on the domain. The solid dots represent the images originally in the domain (“real” ones), and the hollow dots represent those transformed from another domain (“fake” ones). The dashed arrows form the local cycles, each of which goes across only two adjacent domains. The solid arrows constitute a global cycle that starts from X and passes through Z, Y, Z, and back to X sequentially. Note that in the figure we only show half of the cycles (from left to right) for clarity; the other half, which go from right to left and are symmetric to the ones shown, also exist. We then enforce a cycle-consistency loss, which measures the difference between the original images and the final images produced at the end of the cycle, as represented by the small arrows within each domain in Fig. 1(b). Ideally, the images transformed back to the source domain should be identical to the original ones. The cycle-consistency loss is applied to every cycle, whether local or global.
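To make the two cycle types concrete, the following sketch composes generators along a cycle and measures how far the end-of-cycle image is from the original. It is only an illustration under stated assumptions: the generators are toy functions that add a constant to every pixel rather than trained networks, the dictionary keys such as `"XZ"` (the generator from X to Z) are hypothetical labels, and the mean absolute difference is assumed as the distance, as is common for cycle-consistency losses.

```python
from typing import Callable, Dict, List, Sequence

Image = List[float]
Generator = Callable[[Image], Image]


def shift(offset: float) -> Generator:
    """Toy stand-in for a trained generator: adds a constant to every pixel."""
    return lambda image: [pixel + offset for pixel in image]


def apply_path(generators: Dict[str, Generator], path: Sequence[str], image: Image) -> Image:
    """Apply the generators along a path, e.g. ("X", "Z", "Y") uses XZ then ZY."""
    for source, target in zip(path, path[1:]):
        image = generators[source + target](image)
    return image


def cycle_loss(generators: Dict[str, Generator], cycle: Sequence[str], image: Image) -> float:
    """Mean absolute difference between an image and its end-of-cycle version."""
    if cycle[0] != cycle[-1]:
        raise ValueError("a cycle must end in its source domain")
    reconstructed = apply_path(generators, cycle, image)
    return sum(abs(r - o) for r, o in zip(reconstructed, image)) / len(image)


# Hypothetical toy generators keyed by "<source><target>".
# ZX is deliberately imperfect so that cycles through it are not exact.
generators = {"XZ": shift(-1.0), "ZY": shift(-1.0), "YZ": shift(1.0), "ZX": shift(0.75)}

# Left-to-right cycles only; the symmetric right-to-left ones are built the same way.
LOCAL_CYCLES = [("X", "Z", "X"), ("Z", "Y", "Z")]
GLOBAL_CYCLE = ("X", "Z", "Y", "Z", "X")

image = [4.0, 2.0]
local_losses = [cycle_loss(generators, cycle, image) for cycle in LOCAL_CYCLES]
global_loss = cycle_loss(generators, GLOBAL_CYCLE, image)
print(local_losses, global_loss)
```

In this toy setup the local cycle between Z and Y reconstructs the image exactly, while both cycles that pass through the imperfect Z-to-X generator are penalized.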

The global cycles are important for the denoising performance for the following reason. In the inference stage, an input noisy CT image x in domain X will be transformed by the X-to-Z generator $G_{XZ}$ and the Z-to-Y generator $G_{ZY}$ sequentially, which means $G_{XZ}$ and $G_{ZY}$ are coupled by data dependency. Without global cycles, $G_{XZ}$ and $G_{ZY}$ will be trained independently. Thus, prediction errors made at intermediate steps may accumulate as processing progresses. The global cycles enable the joint training of the generators, which models the denoising path used in the inference stage for better consistency.

The local cycles are also important, as they address two issues in the training. First, the global cycles go through all four generators and have long paths for the gradient to back-propagate, which makes the end-to-end optimization difficult. The local cycles are shallow and have shorter paths for the gradient to back-propagate. Second, adversarial training only forces the generators to output “fake” images that are identically distributed to the original “real” images in the intermediate domain Z. However, these outputs do not necessarily preserve the meaningful content in the inputs, which is critical for the denoising task. The local cycle-consistency supervises each generator so that it more easily learns to transform images while preserving the meaningful content of its inputs.

In summary, our MCCAN has two major advantages over CCADN. First, it decomposes the one-step transform into multiple steps using images in a constructed intermediate domain as a stepping stone. Second, it not only incorporates global cycles that model the denoising path in the inference stage for consistency, but also uses local cycles that provide strong supervision to facilitate the more challenging training process. In the experiments we find that MCCAN outperforms CCADN.

Note that in the discussion so far, only one intermediate domain was assumed. It is also possible to include more than one intermediate domain, with more global and local cycles. However, our study suggests that additional domains beyond one do not introduce further performance gain on the dataset we explored.

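Finally, we state the training objective used in our framework. Denote $\{G\}$ and $\{D\}$ as the sets of generators and discriminators, respectively. Denote $I \in \{X, Y, Z\}$ as one domain and $D_I$ as the discriminator associated with domain $I$. We let $c_i$ be a cycle and $p_{i,j}$ be a path that forms half of $c_i$, where $i, j$ merely distinguish different cycles and paths. For example, if $c_1$ is the cycle $X \to Z \to Y \to Z \to X$, then $p_{1,1} = X \to Z \to Y$ and $p_{1,2} = Y \to Z \to X$ are both half cycles of $c_1$. $P_I$ represents the set of all the paths that end at domain $I$. We denote $s(p)$ as the source domain of a path $p$ and $G_p$ as the ordered function composition of the generators in $p$; $s(c_i)$ and $G_{c_i}$ are defined in the same way for a cycle $c_i$. Thus, the total adversarial loss is

$$\mathcal{L}_{adv}(\{G\}, \{D\}) = \sum_{I \in \{X, Y, Z\}} \sum_{p \in P_I} \mathcal{L}_{adv}^{I,p},$$

where $\mathcal{L}_{adv}^{I,p}$ is the adversarial loss associated with domain $I$ and the transform path $p$, obtained by

$$\mathcal{L}_{adv}^{I,p} = \mathbb{E}_{x \sim p_{data}(I)}\left[\log D_I(x)\right] + \mathbb{E}_{x \sim p_{data}(s(p))}\left[\log\left(1 - D_I(G_p(x))\right)\right],$$

where $p_{data}(I)$ is the distribution of “real” images in the domain $I$ and $D_I(x)$ represents the probability determined by $D_I$ that $x$ is a “real” image from domain $I$ rather than a “fake” one transformed by generators from another domain.

The cycle-consistency loss is associated with each $c_i$, defined as

$$\mathcal{L}_{cyc}(c_i) = \mathbb{E}_{x \sim p_{data}(s(c_i))}\left[\left\| G_{c_i}(x) - x \right\|_1\right].$$

The final optimization problem we solve in the training stage is:

$$\{G\}^{*} = \arg\min_{\{G\}} \max_{\{D\}} \; \mathcal{L}_{adv}(\{G\}, \{D\}) + \lambda \sum_i \mathcal{L}_{cyc}(c_i),$$

where $\lambda$ is set to 10 in our experiments.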
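As a rough numerical illustration of how the pieces of this objective combine, the sketch below estimates one adversarial term on small batches and adds weighted cycle-consistency terms. It assumes the standard log-likelihood form of the adversarial loss used for cycle-consistent networks in [11]; the function names and the constant toy discriminator are hypothetical, and real training would use neural networks and alternating gradient updates rather than plain function calls.

```python
import math
from typing import Callable, List, Sequence

Image = List[float]


def adversarial_loss(
    discriminator: Callable[[Image], float],
    real_images: Sequence[Image],
    fake_images: Sequence[Image],
) -> float:
    """Batch estimate of E[log D(real)] + E[log(1 - D(fake))].

    `fake_images` stands for generator outputs that end in the
    discriminator's domain; the discriminator returns a probability in (0, 1).
    """
    real_term = sum(math.log(discriminator(x)) for x in real_images) / len(real_images)
    fake_term = sum(math.log(1.0 - discriminator(x)) for x in fake_images) / len(fake_images)
    return real_term + fake_term


def total_objective(
    adversarial_losses: Sequence[float],
    cycle_losses: Sequence[float],
    cycle_weight: float = 10.0,
) -> float:
    """Sum of adversarial terms plus the weighted sum of cycle-consistency terms."""
    return sum(adversarial_losses) + cycle_weight * sum(cycle_losses)


def undecided(image: Image) -> float:
    """Toy discriminator that cannot tell real from fake."""
    return 0.5


adv = adversarial_loss(undecided, real_images=[[0.0, 1.0]], fake_images=[[1.0, 0.0]])
objective = total_objective([-1.0, -2.0], [0.25, 0.0])
print(adv, objective)
```

With an undecided discriminator each adversarial term equals 2 log 0.5, the value reached when the discriminator outputs 0.5 for every image. The default `cycle_weight` of 10 mirrors the weight reported in the text.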

3. EXPERIMENTS AND RESULTS

3.1. Experiments Setup

The original dataset contains 200 normal-dose 3D CT images and 200 low-dose ones from various patients for training, and a separate set of 11 images for testing. All examinations are performed with a wide detector 256-slice MDCT scanner (Brilliance iCT; Philips Healthcare) providing 8 cm of coverage. Each 2D CT image is of size 512 × 512 and is randomly cropped to 256 × 256 for data augmentation. We construct the additional domain Z with images of intermediate noise level from these clear and noisy scans, so that the number of scans in each domain is comparable. Such images exist because the noise variation cannot be controlled quantitatively: some clear scans acquired with high-dose radiation contain CT images with more noise than usual, and vice versa.
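The random cropping used for augmentation can be sketched as follows. The helper name `random_crop` and the nested-list image representation are assumptions made to keep the example dependency-free; an actual pipeline would more likely operate on array or tensor types.

```python
import random
from typing import List

Image2D = List[List[float]]


def random_crop(image: Image2D, crop_size: int, rng: random.Random) -> Image2D:
    """Return a crop_size x crop_size patch at a uniformly random position."""
    height, width = len(image), len(image[0])
    if crop_size > height or crop_size > width:
        raise ValueError("crop_size must not exceed the image dimensions")
    # randint is inclusive on both ends, so the patch always fits.
    top = rng.randint(0, height - crop_size)
    left = rng.randint(0, width - crop_size)
    return [row[left:left + crop_size] for row in image[top:top + crop_size]]


# A synthetic 512 x 512 "slice" whose pixel value encodes its position.
image = [[float(r * 512 + c) for c in range(512)] for r in range(512)]
rng = random.Random(0)  # seeded only to make the example repeatable
patch = random_crop(image, 256, rng)
print(len(patch), len(patch[0]))
```

Drawing a new patch position each time an image is visited exposes the network to different 256 × 256 views of the same slice.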

Fig. 2. Comparison of (a) CCADN, (b) MCCAN without global cycles, (c) MCCAN without local cycles, and (d) MCCAN. For clarity of presentation, we only show cycles from left to right; symmetric cycles from right to left also exist.

We compare MCCAN with a state-of-the-art CT denoising framework, CCADN [10]. To see how the local cycles and global cycles contribute to the final performance, we also implement and compare MCCAN without local cycles and MCCAN without global cycles as an ablation study. The various structures are shown in Fig. 2. We train all the networks following the setting in [11]. Our implementation will be available online. We ensure that all network sizes and the number of training epochs are the same for fair comparison.

3.2. Qualitative Evaluation

We choose three representative low-dose CT images in the test dataset, as shown in Fig. 3(a), for qualitative evaluation. The corresponding images denoised by CCADN, MCCAN without local cycles, MCCAN without global cycles, and MCCAN are shown in Fig. 3(b)-3(e), respectively. Numbered areas are homogeneous regions, while areas with edges between heterogeneous regions are zoomed for visibility in Fig. 3. From the figures we can see that CCADN can successfully reduce noise in the original images. MCCAN without local cycles completely fails to produce reasonable results. A closer examination of the images reveals that, interestingly, the background and the substances are approximately swapped compared with the original images. This is because the high-level features of the content distribution are still kept even with such a swap, and the discriminator cannot identify the generated image as “fake” because of the structural diversity in the training dataset. This aligns with our discussion on the importance of local cycles in Section 2. On the other hand, MCCAN without global cycles can successfully denoise the image and achieves similar quality to CCADN. This is expected, as MCCAN without global cycles is essentially formed by two cascaded CCADNs. Finally, with both local and global cycles, the complete MCCAN visually has the least noise.

To further illustrate the efficacy of the MCCAN structure, Fig. 4 shows how an image is transformed along a global cycle (the path X → Z → Y → Z → X). From the figure we can see that X → Z → Y is an effective two-step denoising process, while Y → Z → X incrementally adds noise back.

Table 1. Mean and SD (normalized) of the selected areas in Fig. 3(a).

Fig. 3. Noisy and denoised images for qualitative evaluation. (c) Images denoised by MCCAN without global cycles. (e) Images denoised by MCCAN.

Fig. 4. An image transformed through the X → Z → Y → Z → X cycle in Fig. 2. The noise level decreases along X → Z → Y and increases along Y → Z → X, which conforms to our design.

3.3. Quantitative Evaluation

Following existing works [7, 9, 12], we use the mean and standard deviation (SD) of pixels in homogeneous regions of interest chosen by radiologists to quantitatively judge the quality of CT images. The mean value reflects substance information: the closer it is to that of the original image the better, although it can fluctuate within a range. The standard deviation reflects the noise level and should be as low as possible; it is the more sensitive of the two measures in the denoising task.
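A minimal sketch of this measurement is given below. The rectangular region parameters, the helper names, and the use of the population standard deviation are assumptions for illustration; the text does not state which SD estimator or normalization is applied.

```python
import statistics
from typing import List, Tuple

Image2D = List[List[float]]


def roi_mean_sd(image: Image2D, top: int, left: int, height: int, width: int) -> Tuple[float, float]:
    """Mean and population SD of the pixels inside a rectangular region."""
    pixels = [p for row in image[top:top + height] for p in row[left:left + width]]
    if not pixels:
        raise ValueError("the region of interest is empty")
    return statistics.fmean(pixels), statistics.pstdev(pixels)


def sd_reduction_percent(sd_before: float, sd_after: float) -> float:
    """Relative SD reduction of a denoised region, in percent."""
    return 100.0 * (sd_before - sd_after) / sd_before


# A tiny synthetic image; the 2 x 2 block in the middle plays the role of the ROI.
image = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 3.0, 0.0],
    [0.0, 3.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
mean_value, sd_value = roi_mean_sd(image, top=1, left=1, height=2, width=2)
reduction = sd_reduction_percent(sd_before=1.0, sd_after=0.75)
print(mean_value, sd_value, reduction)
```

Computing the same two statistics on the original and the denoised image for each region gives the kind of comparison reported in Table 1.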

Five homogeneous areas chosen by radiologists are used for the quantitative evaluation; they are annotated by red rectangles in Fig. 3 and numbered from 1 to 5. The normalized quantitative results are shown in Table 1. CCADN reduces the standard deviation in the five areas by 15%, 21%, 21%, 22%, and 22% respectively, with resulting mean values close to those of the original images. Although MCCAN without local cycles achieves the smallest standard deviation in Areas 1, 3 and 4, it leads to meaningless output with a large mean deviation from the original images, which corresponds to the structure loss in Fig. 3(c). MCCAN without global cycles performs similarly to CCADN, with mean values close to the original and standard deviation reductions of 22%, 23%, 20%, 27%, and 19% respectively. Finally, the complete MCCAN performs best among all the methods: within a reasonable mean range, the standard deviations are decreased the most, by 24%, 32%, 29%, 29%, and 32% from the original CT images respectively.

4. CONCLUSIONS

In this paper, we propose a multi-cycle-consistent adversarial network (MCCAN) for CT image denoising. MCCAN builds intermediate domains and enforces both local and global cycle-consistency. The global cycle-consistency couples all generators together to model the whole denoising process, while the local cycle-consistency imposes effective supervision on the denoising process between adjacent domains. Experiments show that both local and global cycle-consistency are important for the success of MCCAN, and that it outperforms the state-of-the-art competitor.

5. REFERENCES

[1] Chenyu You, Qingsong Yang, Lars Gjesteby, Guang Li, et al., “Structurally-sensitive multi-scale deep neural network for low-dose ct denoising,” IEEE Access, vol. 6, pp. 41839–41855, 2018.

[2] Zihao Liu, Xiaowei Xu, Tao Liu, Qi Liu, Yanzhi Wang, Yiyu Shi, Wujie Wen, Meiping Huang, Haiyun Yuan, and Jian Zhuang, “Machine vision guided 3d medical image compression for efficient transmission and accurate segmentation in the clouds,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 12687–12696.

[3] Xiaowei Xu, Tianchen Wang, Dewen Zeng, Yiyu Shi, Qianjun Jia, Haiyun Yuan, Meiping Huang, and Jian Zhuang, “Accurate congenital heart disease model generation for 3d printing,” arXiv preprint arXiv:1907.05273, 2019.

[4] Xiaowei Xu, Tianchen Wang, Yiyu Shi, Haiyun Yuan, Qianjun Jia, Meiping Huang, and Jian Zhuang, “Whole heart and great vessel segmentation in congenital heart disease using deep neural networks and graph matching,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 477–485.

[5] Jason B Hobbs, Noah Goldstein, Kimberly E Lind, Deirdre Elder, Gerald D Dodd III, and James P Borgstede, “Physician knowledge of radiation exposure and risk in medical imaging,” Journal of the American College of Radiology, vol. 15, no. 1, pp. 34–43, 2018.

[6] Hongming Shan, Yi Zhang, Qingsong Yang, Uwe Kruger, Mannudeep K Kalra, Ling Sun, Wenxiang Cong, and Ge Wang, “3-d convolutional encoder-decoder network for low-dose ct via transfer learning from a 2-d trained network,” IEEE transactions on medical imaging, vol. 37, no. 6, pp. 1522–1534, 2018.

[7] Jelmer M Wolterink, Tim Leiner, Max A Viergever, and Ivana Išgum, “Generative adversarial networks for noise reduction in low-dose ct,” IEEE transactions on medical imaging, vol. 36, no. 12, pp. 2536–2545, 2017.

[8] Hu Chen, Yi Zhang, Weihua Zhang, Peixi Liao, Ke Li, Jiliu Zhou, and Ge Wang, “Low-dose ct denoising with convolutional neural network,” in Biomedical Imaging (ISBI 2017), 2017 IEEE 14th International Symposium on. IEEE, 2017, pp. 143–146.

[9] Qingsong Yang, Pingkun Yan, Yanbo Zhang, Hengyong Yu, Yongyi Shi, Xuanqin Mou, Mannudeep K Kalra, Yi Zhang, Ling Sun, and Ge Wang, “Low-dose ct image denoising using a generative adversarial network with wasserstein distance and perceptual loss,” IEEE transactions on medical imaging, vol. 37, no. 6, pp. 1348–1357, 2018.

[10] Eunhee Kang, Hyun Jung Koo, Dong Hyun Yang, Joon Bum Seo, and Jong Chul Ye, “Cycle consistent adversarial denoising network for multiphase coronary ct angiography,” arXiv preprint arXiv:1806.09748, 2018.

[11] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” arXiv preprint, 2017.

[12] I Arapakis, E Efstathopoulos, V Tsitsia, S Kordolaimi, N Economopoulos, S Argentos, A Ploussi, and E Alexopoulou, “Using idose4 iterative reconstruction algorithm in adults’ chest–abdomen–pelvis ct examinations: effect on image quality in relation to patient radiation exposure,” The British journal of radiology, vol. 87, no. 1036, 2014.