A deep learning model trained on a labeled dataset A (the source domain) may achieve high performance on A but low performance on an unlabeled dataset B (the target domain), because A and B may have different attributes. We hypothesize that the UDA method will improve the model's performance on B while maintaining high performance on A.
2.1 Dataset
We use a public mammogram dataset, the Digital Database for Screening Mammography (DDSM) [2], and a private mammogram dataset, UKY [5]. The two datasets have different attributes: DDSM contains digitized screen-film mammograms, whereas UKY contains full-field digital mammograms recently collected from a comprehensive breast imaging center. Several recent works have explored the differing attributes of these two or other similar datasets [7,4,6]. In this work, we use 1860 positive and 2781 negative images from DDSM and 1922 positive and 2330 negative images from UKY. We split each dataset into 80% for training and 20% for testing.
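As a rough illustration of the 80/20 split, the sketch below partitions a list of image identifiers per class so that the positive/negative ratio is preserved in both parts. The stratification, the random seed, and the function name `stratified_split` are assumptions made for illustration; the text does not state how the split was drawn, nor whether it was made per image or per patient.

```python
import random


def stratified_split(items, labels, train_frac=0.8, seed=0):
    """Split (item, label) pairs into train/test, class by class.

    Hypothetical helper: stratifying by label is an assumption,
    not something the paper specifies.
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        # Collect and shuffle the indices belonging to this class.
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train += [(items[i], cls) for i in idx[:cut]]
        test += [(items[i], cls) for i in idx[cut:]]
    return train, test


# Placeholder identifiers matching the DDSM class counts given above.
ddsm_labels = [1] * 1860 + [0] * 2781
ddsm_items = [f"ddsm_{i:04d}" for i in range(len(ddsm_labels))]
ddsm_train, ddsm_test = stratified_split(ddsm_items, ddsm_labels)
```

The same call can be applied to the UKY counts. If several images come from the same patient, splitting by patient rather than by image would be the safer choice to avoid leakage between the training and test sets.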
Fig. 1. Stepwise illustration of our unsupervised domain adaptation (UDA) method. Step 1) train Cycle-GAN using the unpaired, unlabeled UKY and DDSM datasets; Step 2) translate UKY to DDSM; Step 3) train deep learning models using UKY and synthesized DDSM.
2.2 Method
Figure 1 illustrates our UDA method. We first train the Cycle-GAN [8] on unpaired images without any labels. We then synthesize DDSM data from UKY data to generate training samples in the target domain. Finally, we train a deep neural network on a mixture of UKY and synthesized DDSM images. We compare our UDA method with a baseline that trains on one dataset and tests directly on the other. In addition, for two-way verification, we switch the source and target domains and train the models on labeled DDSM and synthesized UKY.
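The third step can be sketched as follows. This is a minimal illustration, assuming that each synthesized image inherits the label of the source image it was translated from, which the description implies but does not state explicitly. The function `build_mixed_training_set` and the `translate` callable (standing in for the trained Cycle-GAN generator) are hypothetical names, not part of any released code.

```python
def build_mixed_training_set(source_samples, translate):
    """Combine labeled source images with their target-style translations.

    source_samples: list of (image, label) pairs from the labeled source domain.
    translate: callable mapping a source image to the target style
               (a stand-in for the trained Cycle-GAN generator).
    """
    mixed = []
    for image, label in source_samples:
        # Keep the real source image with its original label.
        mixed.append((image, label, "real"))
        # The translated image reuses the source label (assumption).
        mixed.append((translate(image), label, "synthetic"))
    return mixed


# Toy usage with strings in place of image arrays.
uky_samples = [("uky_img_a", 1), ("uky_img_b", 0)]
mixed = build_mixed_training_set(uky_samples, lambda img: img + "_as_ddsm")
```

Because the translation is applied only to source images, no target-domain labels are needed at any point, which is what makes the approach unsupervised with respect to the target domain.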
Our results are summarized in Table 1. Two off-the-shelf architectures are used for evaluation: AlexNet [3] and ResNet [1]. When training and testing on different datasets, UDA achieves a significant improvement over the baseline. For instance, when we trained on UKY and tested on DDSM with AlexNet, the baseline achieved only 0.516 auROC, while UDA achieved 0.601 auROC. The table also shows that when training and testing on the same dataset, UDA maintains similarly high performance, which supports our hypothesis.
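For reference, auROC can be computed directly from its pairwise definition: the probability that a randomly chosen positive image receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below is a plain-Python illustration of that definition, not the evaluation code used for Table 1; the function name `auroc` is an assumption.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 ground-truth labels.
    scores: iterable of predicted scores (higher means more likely positive).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("auROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ordered correctly.
example = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

A value of 0.5 corresponds to chance-level ranking, which puts the baseline's 0.516 on the UKY-to-DDSM setting in context.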
Table 1. Testing Results of Different Methods.
Our results show that the proposed UDA method improves the generalization of deep learning models without requiring expensive manual annotations. However, there is still room for improvement. We expect that combining improved versions of Cycle-GAN with small amounts of labeled data in the target domain will help close the remaining gap.
Despite the high reported performance of deep learning models on curated training data, generalization remains a challenge because of differences between publicly available and real-world clinical datasets. Our UDA method helps train models that generalize between datasets, thereby significantly improving results and lowering the cost of using deep learning models in clinical practice.
1. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 770–778 (2016)
2. Heath, M., Bowyer, K., Kopans, D., Kegelmeyer, P., Moore, R., Chang, K., Munishkumaran, S.: Current status of the digital database for screening mammography. In: Digital mammography, pp. 457–460. Springer (1998)
3. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems. pp. 1097–1105 (2012)
4. Liang, G., Wang, X., Zhang, Y., Xing, X., Blanton, H., Salem, T., Jacobs, N.: Joint 2d-3d breast cancer classification. In: 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). pp. 692–696. IEEE (2019)
5. Wang, X., Liang, G., Zhang, Y., Blanton, H., Bessinger, Z., Jacobs, N.: Inconsistent performance of deep learning models on mammogram classification. Journal of the American College of Radiology (2020)
6. Zhang, X., Zhang, Y., Han, E.Y., Jacobs, N., Han, Q., Wang, X., Liu, J.: Classification of whole mammogram and tomosynthesis images using deep convolutional neural networks. IEEE Transactions on NanoBioscience PP, 1–1 (06 2018). https://doi.org/10.1109/TNB.2018.2845103
7. Zhang, Y., Wang, X., Blanton, H., Liang, G., Xing, X., Jacobs, N.: 2d convolutional neural networks for 3d digital breast tomosynthesis classification. In: 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). pp. 1013–1017. IEEE (2019)
8. Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE international conference on computer vision. pp. 2223–2232 (2017)