Innovation in Computer Vision and Deep Learning is currently enhancing Medical Imaging research to improve diagnosis, image segmentation, and automated detection of specific cells or tissues. However, these recent methods require huge amounts of data to train Neural Network models, and available resources in the Medical Imaging field are scarce.
On the other hand, new generative methods are today able to produce human faces with unprecedented quality and realism, giving access to unlimited datasets. The main idea relies on Generative Adversarial Networks (GAN), a dual architecture (see Subsection 1.1) which needs intensive hyperparameter tuning to reach high quality on the output images.
The following work presents a detailed comparison of three architectures to generate high quality Magnetic Resonance Images (MRIs), that can be used as input data for Neural Network training. Image quality, realism and diversity are studied with different hyperparameters along with computational efficiency. The three main methods presented here are: the original Deep Convolutional Generative Adversarial Network (DCGAN), a Super Resolution Residual Network (SRResNet) and a Progressive Generative Adversarial Network (ProGAN).
1.1 Related work
The presented architectures are inspired by three different papers that move the quality of image generation forward.
Generative Adversarial Networks
Goodfellow et al. [1] introduce the original framework of GANs and present the first promising results. The idea consists of training a first Neural Network, called the Generator, that tries to reproduce the input manifold as accurately as possible, and a second Neural Network, called the Discriminator, that tries to detect whether an input image comes from the training data or from the image collection created by the Generator.
Because the Generator and the Discriminator compete against each other, training is usually unstable and convergence is difficult to achieve. Hyperparameters need to be tuned to synchronize the Generator and the Discriminator. Radford et al. [5], who introduced DCGANs, show examples of images generated from LSUN (Large-scale Scene Understanding), Imagenet-1k, and a dataset of human faces.
Super-Resolution GAN
Since 2014 [1], improvements in Deep Neural architectures have been made, enabling deeper and better performing networks. Ledig et al. [4] drew inspiration from Residual Networks to create a Super-Resolution GAN that can upscale images with high-frequency details. The strength of the architecture is its depth (16 residual blocks), which leads to high accuracy but makes configuration challenging. Experiments on the Berkeley Segmentation Dataset (BSD100) show state-of-the-art performance, although huge computing power is necessary.
Progressive GAN
Finally, Karras et al. [2] recently released a new technique to increase stability, speed and convergence during GAN training. The idea is to progressively increase the resolution of the input and output images while adding deeper and deeper layers in the Generator and Discriminator. A minibatch standard deviation layer is also added to the Discriminator to add diversity in the output space and prevent mode collapsing of the Generator. The paper shows incredible results in producing high quality faces using the CELEBA-HQ dataset, created for the experiments.
1.2 Contribution
The presented work differs from previous papers by analyzing different GAN architectures and several configurations dedicated to generating Magnetic Resonance Images from a random latent space (noise). In particular, it goes beyond the work of Kazuhiro et al. [3] by increasing the resolution and benchmarking the convergence and quality of various methods.
The main contributions are:
• the successful generation of MRIs from noise with three different architectures: DCGAN, SRResNet and ProGAN
• the comparison of five loss functions: Original loss, LSGAN, WGAN, WGAN GP, DRAGAN
• the tuning of hyperparameters to improve convergence and quality
The different architectures and loss functions are presented in Section 2. Then, a quantitative analysis on the Open Access Series of Imaging Studies (OASIS) dataset is provided in Section 3. Finally, the paper concludes with a discussion in Section 4 and concluding remarks on possible future works in Section 5. The source code and results are available on the GitHub repository:
This Section covers the framework used to generate MRIs and the hyperparameter tuning process to increase convergence and stability.
2.1 Tested architectures
This work presents three main architectures for comparison: DCGAN, SRResNet and ProGAN.
DCGAN
The first architecture corresponds to a simple GAN using convolution layers. The detailed architecture is presented in Tables 1 and 2. In particular, the input latent vector of the Generator is drawn from the distribution U(–1, 1), and the Leaky ReLU activation functions in the Discriminator have a slope of 0.2. Moreover, tensor weights are initialized with the distribution N(0, 0.02).
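As a rough illustration of the settings quoted above, the sketch below draws a latent batch from U(–1, 1), applies a Leaky ReLU with slope 0.2, and initializes a weight tensor from N(0, 0.02). The latent size of 100, the kernel shape, and the reading of 0.02 as a standard deviation are assumptions made for the example; the actual shapes are the ones fixed by Tables 1 and 2.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 100  # assumption: the latent size is not stated in the text


def sample_latent(batch_size, dim=LATENT_DIM):
    """Draw latent vectors uniformly from [-1, 1)."""
    return rng.uniform(-1.0, 1.0, size=(batch_size, dim))


def leaky_relu(x, slope=0.2):
    """Leaky ReLU: x where x > 0, slope * x elsewhere."""
    return np.where(x > 0, x, slope * x)


def init_weights(shape, std=0.02):
    """Zero-mean Gaussian initialization; 0.02 is read as a standard deviation (assumption)."""
    return rng.normal(0.0, std, size=shape)


z = sample_latent(64)            # one batch of 64 latent vectors
w = init_weights((64, 1, 4, 4))  # hypothetical convolution kernel shape
```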
Table 1: Generator Architecture of DCGAN
Table 2: Discriminator Architecture of DCGAN
SRResNet
The second architecture is based on a Residual Neural Network with deeper layers. After a dense layer that rescales the input latent vector to the size 64 × 16 × 16, the Generator is followed by 16 residual blocks (2 convolutions with 3 × 3 kernels). At the end, 4 upsampling blocks generate the 1 × 256 × 256 output image by transferring filters into pixels, as described in Ledig et al. [4]. In the same way, the Discriminator is composed of 12 residual blocks separated by a downsampling convolution layer with 3 × 3 kernels and stride 2. The only dense layer is the one at the end of the Discriminator, reducing the second-to-last layer of size 2048 × 2 into one scalar. In addition, the Discriminator does not use any Batch Normalization in an attempt to avoid correlations within the batch. The 256-long input latent space is drawn from N(0, 1) before being normalized, and the tensor weights are initialized with the He Normal method.
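The "transferring filters into pixels" step of the upsampling blocks can be pictured as a pixel shuffle, which rearranges groups of r² channels into an r-times larger spatial grid. The sketch below is a minimal NumPy version under that assumption. The function name and the 256-channel toy input are hypothetical, and the convolutions that a real upsampling block applies between shuffles are left out.

```python
import numpy as np


def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) array into a (C, H*r, W*r) array."""
    c_in, h, w = x.shape
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    x = x.reshape(c_out, r, r, h, w)   # split channels into (c_out, r, r)
    x = x.transpose(0, 3, 1, 4, 2)     # reorder to (c_out, h, r, w, r)
    return x.reshape(c_out, h * r, w * r)


# Toy example: four successive x2 shuffles take a 16 x 16 grid to 256 x 256.
x = np.zeros((256, 16, 16))  # hypothetical feature maps, 256 = 4**4 channels
for _ in range(4):
    x = pixel_shuffle(x, 2)
out = x
```

Four doublings of a 16 × 16 grid give 16 · 2⁴ = 256, which matches the 4 upsampling blocks and the 256 × 256 output quoted above.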
ProGAN
The last architecture corresponds to a Progressive GAN as described in [2]. Both networks are trained with increasing image resolutions from 4 × 4 to 256 × 256, with smooth image resolution transitions to progressively adapt the architecture. Convolutions use 5 × 5 kernels, and the latent input is a 512-long vector drawn from N(0, 1) before being normalized. Tensor weights are dynamically scaled at each iteration with the He Normal method. In addition, pixel normalization and a mini-batch similarity layer are added to improve convergence and image diversity.
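Two of the ingredients mentioned above can be sketched in a few lines: pixel normalization, and the smooth transition between two resolutions. The code below follows the general description in Karras et al. [2] rather than the exact implementation used here. The function names, the nearest-neighbour upsampling, and the blending weight `alpha` (assumed to ramp from 0 to 1 during a transition) are assumptions.

```python
import numpy as np


def pixel_norm(x, eps=1e-8):
    """Scale each pixel's feature vector to unit RMS over the channel axis (axis 0)."""
    return x / np.sqrt(np.mean(x ** 2, axis=0, keepdims=True) + eps)


def upsample_nearest(x, factor=2):
    """Nearest-neighbour upsampling of a (C, H, W) array."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def fade_in(low_res, high_res, alpha):
    """Blend the upsampled previous-resolution image with the new block's output."""
    return (1.0 - alpha) * upsample_nearest(low_res) + alpha * high_res
```

With `alpha = 0` the output is the upsampled low-resolution image alone, and with `alpha = 1` it comes entirely from the newly added block.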
2.2 Tested loss functions
Because stability is difficult to reach when training GANs, different loss functions try to regularize training and speed up convergence. The following notation is used hereafter:
• the input noise $z \sim p_z(z)$ (uniform or normalized normal)
• the generator output $G(z) \sim p_g$
• the input data $x \sim p_{data}(x)$
• the “probability” $D(y)$, computed by the Discriminator, that $y$ comes from $p_{data}$ rather than $p_g$
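With this notation, the original loss of Goodfellow et al. [1] is usually written as the minimax objective below. It is reproduced here in its standard form from the literature, so its sign convention and layout may differ from the formulation used in this work.

```latex
\min_G \max_D \;
\mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x)\right]
+ \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```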
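LSGAN
LSGAN (Equation 2) is another loss function that tries to reduce mode collapse and vanishing gradients.
$$L_D^{LSGAN} = \mathbb{E}_{x \sim p_{data}}\left[(D(x) - 1)^2\right] + \mathbb{E}_{z \sim p_z}\left[D(G(z))^2\right] \qquad (2)$$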
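As a small numerical illustration, the discriminator loss can be evaluated on a batch of Discriminator outputs as follows. The function names are hypothetical. The generator loss shown is the usual least-squares counterpart and is an assumption, since only the discriminator loss is given above.

```python
from statistics import fmean


def lsgan_d_loss(d_real, d_fake):
    """Discriminator loss: mean of (D(x) - 1)^2 plus mean of D(G(z))^2."""
    return fmean([(d - 1.0) ** 2 for d in d_real]) + fmean([d ** 2 for d in d_fake])


def lsgan_g_loss(d_fake):
    """Assumed generator loss: mean of (D(G(z)) - 1)^2."""
    return fmean([(d - 1.0) ** 2 for d in d_fake])


# Discriminator outputs for two real and two generated images (made-up values)
d_loss = lsgan_d_loss([0.9, 0.8], [0.1, 0.3])
g_loss = lsgan_g_loss([0.1, 0.3])
```

Unlike the logarithmic loss, the squared error keeps a non-zero gradient for samples that the Discriminator classifies confidently, which is the mechanism behind the reduced vanishing gradient.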
WGAN
In order to increase stability and convergence, WGAN (Equation 3) replaces the original Jensen-Shannon divergence with the Wasserstein distance, a continuous function whose gradient is more easily computed.
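For reference, the WGAN losses are commonly written as below in the notation of this section, with D acting as a critic whose output is unbounded rather than a probability. This is the standard form from the literature, and its sign convention is an assumption rather than a transcription of the equation used in this work.

```latex
L_D^{WGAN} = \mathbb{E}_{z \sim p_z}\left[D(G(z))\right] - \mathbb{E}_{x \sim p_{data}}\left[D(x)\right],
\qquad
L_G^{WGAN} = -\,\mathbb{E}_{z \sim p_z}\left[D(G(z))\right]
```

Both losses are minimized, and the critic must additionally be kept Lipschitz-bounded for the Wasserstein interpretation to hold.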
WGAN GP
In addition to the WGAN loss, a gradient penalty can be introduced to avoid exploding gradients. A hyperparameter balances the penalty against the WGAN loss, and a random parameter in (0, 1) combines the real and fake images.
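The penalty term can be sketched as follows: a random point is taken between a real and a fake sample, and the critic's gradient norm at that point is pushed towards 1. The names `lam` and `alpha`, the default weight of 10, and the target norm of 1 follow the usual WGAN GP formulation and are assumptions here. The finite-difference gradient is only a stand-in for the automatic differentiation that a deep learning framework would provide.

```python
import numpy as np


def numerical_grad(f, x, h=1e-5):
    """Central finite-difference gradient of a scalar function f at x."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step.flat[i] = h
        grad.flat[i] = (f(x + step) - f(x - step)) / (2 * h)
    return grad


def gradient_penalty(critic, real, fake, lam=10.0, rng=None):
    """Return lam * (||grad D(x_hat)||_2 - 1)^2 at a random mix x_hat of real and fake."""
    rng = np.random.default_rng() if rng is None else rng
    alpha = rng.uniform(0.0, 1.0)                # random mixing parameter
    x_hat = alpha * real + (1.0 - alpha) * fake  # point between real and fake
    grad = numerical_grad(critic, x_hat)
    return lam * (np.linalg.norm(grad) - 1.0) ** 2
```

For a linear critic D(x) = w·x, the gradient is w everywhere, so the penalty reduces to lam · (‖w‖ − 1)² regardless of the sampled mixing parameter.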
DRAGAN
Following the same idea, DRAGAN (Equation 5) introduces a gradient regularization to avoid local minima and mode collapse. A hyperparameter weights the regularization, and a random parameter in (0, 1) combines the real image with $x_p$, a pixel-scaled random noise drawn uniformly between 0 and 0.5 times the pixel scale.
2.3 Tricks to improve GAN performance
The presented architectures are the result of extensive trial and error and a thorough literature review. During the first experiments, it was noticed that one-sided label smoothing (i.e., training both networks with D(y) = 0.9 instead of 1.0) gave better visual results. Moreover, most of the tested networks have a Discriminator that is too strong. That is why the Generator/Discriminator rate r is often greater than 1 (r = 3 for DCGAN, r = 2 for SRResNet and r = 1 for ProGAN).
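Both tricks are simple to express in code. The sketch below assumes that r counts the number of Generator updates per Discriminator update, which is one plausible reading of the rate described above. The function names are hypothetical.

```python
def real_targets(batch_size, smooth=0.9):
    """One-sided label smoothing: real images are labelled 0.9 instead of 1.0."""
    return [smooth] * batch_size


def fake_targets(batch_size):
    """Generated images keep the hard label 0.0; only the real side is smoothed."""
    return [0.0] * batch_size


def update_schedule(n_discriminator_steps, r):
    """Yield 'D' once, then 'G' r times, for every Discriminator step."""
    for _ in range(n_discriminator_steps):
        yield "D"
        for _ in range(r):
            yield "G"


# With r = 3 (the DCGAN setting), each Discriminator update is followed by
# three Generator updates.
dcgan_schedule = list(update_schedule(2, 3))
```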
The different models are trained on a Tesla K80 GPU with a batch size of 64 images. Training runs for 60 epochs, except for ProGAN, which needs 20 epochs for each change in resolution and after each block transition (20 × 13 = 260 epochs). The Adam optimizer minimizes the loss function with lr = 0.0002 and β1 = 0.5 for DCGAN and SRResNet, but with lr = 0.001 and β1 = 0 for ProGAN.
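The factor of 13 in the ProGAN epoch count can be checked by enumerating the training phases. The sketch below assumes one stable phase per resolution from 4 × 4 to 256 × 256 plus one transition phase between consecutive resolutions, which is an interpretation consistent with the arithmetic above. The phase labels are hypothetical.

```python
def progan_phases(start=4, end=256):
    """List one stable phase per resolution and one transition phase
    between each pair of consecutive resolutions."""
    phases = [("stable", start)]
    res = start
    while res < end:
        res *= 2
        phases.append(("transition", res))
        phases.append(("stable", res))
    return phases


EPOCHS_PER_PHASE = 20
phases = progan_phases()
total_epochs = EPOCHS_PER_PHASE * len(phases)
```

There are 7 resolutions (4, 8, 16, 32, 64, 128, 256) and 6 transitions between them, hence 13 phases and 260 epochs.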
3.1 Training dataset
The training dataset comes from the Open Access Series of Imaging Studies (OASIS). It is composed of 11328 brain MRIs of size 256 × 256, scaled to [–1, 1].
3.2 Evaluation measures
Two main quality measures need to be estimated: the image realism and the generated manifold diversity. To do so, a Principal Component Analysis (PCA) is performed over the data distribution to obtain 16 orthogonal vectors that represent the main variations of the input manifold (55%).
Realism is calculated by projecting N = 11328 generated images G on the 16 covariance matrix eigenvectors Ei and retrieving the mean of the cosine similarity vector norm.
Diversity is evaluated through two measures: the total variation of the generated set, and the number of covariance matrix eigenvalues that exceed a 1% threshold.
The realism measure is meant to detect unrealistic images, the total variation detects whether the images have too few variations, and the eigenvalue count tracks mode collapse.
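These measures can be prototyped with NumPy on flattened images, as in the sketch below. It makes several assumptions: images are centered on the training mean before projection, the total variation is taken as the trace of the covariance matrix, and the eigenvalue threshold is passed in by the caller because its reference value is not specified above. For full 256 × 256 images the covariance matrix would be too large to build directly, so this form only suits small or downsampled inputs.

```python
import numpy as np


def fit_pca(train, k=16):
    """Return the training mean and the top-k covariance eigenvectors (as rows)."""
    mean = train.mean(axis=0)
    cov = np.cov(train - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]
    return mean, eigvecs[:, top].T


def realism(generated, mean, components):
    """Mean norm of each image's vector of cosine similarities with the components."""
    centered = generated - mean
    cos = centered @ components.T / np.linalg.norm(centered, axis=1, keepdims=True)
    return float(np.mean(np.linalg.norm(cos, axis=1)))


def diversity(generated, threshold):
    """Total variation (covariance trace) and number of eigenvalues above threshold."""
    eigvals = np.linalg.eigvalsh(np.cov(generated, rowvar=False))
    return float(eigvals.sum()), int(np.sum(eigvals > threshold))
```

Because the eigenvectors are orthonormal, the realism score lies between 0 and 1 and measures how much of each image lies in the subspace spanned by the main variations of the training set.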
Finally, an estimation of how the model overfits by remembering all training images is performed by visualizing images generated from the interpolation between two random latent vectors.
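A minimal sketch of the interpolation step is given below, assuming a straight-line path between the two latent vectors. The interpolation scheme is not specified here, so the linear form and the function name are assumptions. Each row of the result would then be passed through the trained Generator to produce one frame.

```python
import numpy as np


def interpolate_latents(z0, z1, steps=8):
    """Return `steps` latent vectors evenly spaced on the segment from z0 to z1."""
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - t) * z0 + t * z1


rng = np.random.default_rng(0)
z_start = rng.normal(size=256)  # 256-long latent vectors, as for SRResNet
z_end = rng.normal(size=256)
path = interpolate_latents(z_start, z_end)
```

If the generated frames change smoothly along the path instead of jumping between a few fixed images, the model is less likely to have memorized the training set.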
3.3 Results
Figure 1 shows a generated image for each main architecture. All of them are close to the original distribution, even if they look slightly blurry compared to the ground truth. Image quality can be improved by increasing the number of epochs, at the cost of more computation time.
Figure 1: Generated images, from left to right: DCGAN, SRResNet, ProGAN, Original
Table 3: Realism and Diversity evaluation
Table 3 summarizes the evaluation measures for each architecture. DRAGAN and WGAN GP are the only loss functions that allow stabilized training. DCGAN performs better with DRAGAN, whereas SRResNet and ProGAN converge only with the DRAGAN and WGAN GP losses, respectively. Finally, DCGAN performs better than SRResNet or ProGAN, and SRResNet is the fastest to train (30 hours) compared to DCGAN (45 hours) and ProGAN (58 hours).
The performed experiments reinforce the evidence that GANs are difficult to train and are sensitive to small changes in the hyperparameters or architecture. However, it has been shown that using a mini-batch similarity layer in the Discriminator (used in DCGAN and ProGAN) and controlling the gradient norm (used in DRAGAN and WGAN GP) are essential to stabilize the training. On the contrary, the use of noise in the Discriminator or the choice between a uniform or a normalized Gaussian latent input does not seem to have any impact on the quality or realism of the results.
To conclude, GANs can be used to enlarge MRI datasets (data augmentation) and thus enable more advanced training for neural networks. However, huge computation time is needed to achieve high quality and realism, and to produce generated images indistinguishable from the original dataset.
Future work should focus on training new architectures (such as StyleGAN) or on 3D GANs that can generate 3D images without running out of memory.
I would like to thank Dr. Shakes Chandra for his useful insight on this project and for the helpful supervision he gave me when completing my Master's thesis.
[1] I. Goodfellow, et al. “Generative Adversarial Nets.” In Z. Ghahramani, et al., eds., Advances in Neural Information Processing Systems 27, pp. 2672–2680. Curran Associates, Inc., 2014.
[2] T. Karras, et al. “Progressive Growing of GANs for Improved Quality, Stability, and Variation.” In The International Conference on Learning Representations (ICLR), 2018.
[3] K. Kazuhiro, et al. “Generative Adversarial Networks for the Creation of Realistic Artificial Brain Magnetic Resonance Images.” Tomography, vol. 4, no. 4, pp. 159–163, 2018.
[4] C. Ledig, et al. “Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network.” In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[5] A. Radford, et al. “Unsupervised Representation Learning With Deep Convolutional Generative Adversarial Networks.” In The International Conference on Learning Representations (ICLR), 2016.
Antoine Delplace is a Master's student pursuing a double degree at the University of Queensland and at Ecole Centrale Paris in Software Engineering. His research focuses on Machine Learning and Deep Neural Networks. He will begin a PhD degree at the University of Queensland in January 2020.