Pathology relies on the examination of microscopic images to diagnose disease based on cellular and regular structures. Most cells are essentially transparent, with little intrinsic pigment. Tissue stains are therefore used to confer contrast and reveal the underlying tissue structures and components. Histopathology images are most often stained with Hematoxylin and Eosin (H&E): hematoxylin binds mostly to the nuclei (deep purple or blue), and eosin binds mostly to the cytoplasm (red). Several factors can affect the final appearance of the stained tissue, resulting in variation of color and intensity across histopathology images. Sources of this variation include differences in operator skill during sample preparation, protocols between laboratories, the tissue fixation stage, and imaging scanners.
Human color perception readily adapts to color changes in images, so pathologists can cope effectively with color variation, but the performance of CAD (Computer-aided detection) systems decreases dramatically with changes in color and intensity. Consequently, designing a valid CAD system that matches the expertise and understanding of pathologists is a challenging task. Hence, stain normalization is the first step of a CAD system and a very significant pre-processing step in automated systems. Different stain normalization strategies have been proposed to reduce the inconsistency of stained tissues in automated systems.
Fig. 1. Visual comparison of different stain normalization techniques. The goal of normalization is to make the source stain style (a) look similar to the reference stain style (b).
Stain normalization should maintain good contrast while preserving all the source information in the processed image. A potential drawback of conventional methods is that tissue structures and texture in the original image can be distorted after stain normalization. Moreover, most classic techniques normalize all images using a single user-selected reference image [1]–[6], which has a significant effect on their results, as we show later. To address these issues, we present stain-to-stain translation (STST), a pix2pix-based method that not only eliminates the need for a reference image but also achieves high visual similarity to the ground truth. We train STST using paired patches from the Hamamatsu scanner and the corresponding gray-scale patches. After training, the re-stained patch is compared to the ground truth. Our results show that working with gray-scale images instead of color images generally favors texture-based features, leading to significant improvements in preserving tissue morphology. Furthermore, because the proposed method predicts pixels from pixels, it learns the underlying structures in the tissue. STST is therefore an effective pre-processing step for reducing the impact of 'non-biological' variations on histopathology data. We compare the results with traditional image processing approaches that have been developed for normalizing histopathology images. The visual appearance obtained from the different methods can be seen in Fig. 1, which shows that images normalized with STST are very similar to the ground truth.
The rest of the paper is organized as follows. Section II first introduces Generative Adversarial Networks and then reviews previous work on stain normalization. Section III formulates the proposed method. Section IV gives an overview of the image dataset, the implementation details of the STST method, and the evaluation metrics used to compare the proposed technique with other stain normalization methods. Finally, Section V presents our conclusions.
In this section, we first introduce Generative Adversarial Networks (GANs) and then briefly describe three major categories of stain normalization algorithms that have been proposed in the past.
A. Generative Adversarial Networks
Generative Adversarial Networks (GANs) [7] are unsupervised generative models that involve two deep neural networks, a generator $G$ and a discriminator $D$, which are trained simultaneously. Training can be viewed as a two-player minimax game in which the two players (generator and discriminator) compete against each other and thus gradually progress toward their goals. The generator learns a mapping from a noise vector $z$ in the latent space to an image in the target domain, $G: z \rightarrow x$, and the discriminator learns to classify an image as a real image from the training set (output close to 1) or a fake image produced by the generator (output close to 0), $D: x \rightarrow [0, 1]$. Both the generator and the discriminator are trained with backpropagation and have their own loss functions, which we call $\mathcal{L}_G$ and $\mathcal{L}_D$, respectively. The architecture of GANs is illustrated in Fig. 2. During training, the generator learns to produce synthetic samples resembling real images that fool the discriminator, while the discriminator learns to distinguish real from fake samples. To train the networks, the loss function is formulated as follows:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \quad (1)$$
where $p_{data}(x)$ denotes the real data probability distribution defined in the data space $\mathcal{X}$, $p_z(z)$ denotes the probability distribution of the latent variable $z$ defined on the latent space $\mathcal{Z}$, and $\mathbb{E}$ represents the expectation.
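As an illustration only, the minimax value function of [7] can be estimated from batches of discriminator outputs. The following minimal sketch assumes that the discriminator outputs are already available as probabilities; the function name `gan_value` and the sample values are hypothetical and not part of the proposed method.

```python
import math

def gan_value(d_real, d_fake, eps=1e-12):
    """Monte Carlo estimate of V(D, G).

    d_real: discriminator outputs D(x) for real samples.
    d_fake: discriminator outputs D(G(z)) for generated samples.
    eps guards against log(0); it is an implementation assumption.
    """
    real_term = sum(math.log(max(p, eps)) for p in d_real) / len(d_real)
    fake_term = sum(math.log(max(1.0 - p, eps)) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that separates real from fake well yields a larger value
# than one that outputs 0.5 everywhere (hypothetical numbers).
confident = gan_value([0.9, 0.95], [0.1, 0.05])
confused = gan_value([0.5, 0.5], [0.5, 0.5])
```

The discriminator tries to increase this value, while the generator tries to decrease it by pushing $D(G(z))$ toward 1.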
B. Previous Work
Previous work on stain normalization can be broadly categorized into three classes: stain separation, template color matching, and style transfer with generative models, which are briefly explained below.
1) Stain-separation: Since different stains exhibit different features in the images, being able to separate the information of each stain is of central importance. Ruifrok et al. [5] proposed a stain-separation method in which the stain color appearance matrix is manually estimated by measuring the relative color proportions in the R, G, and B channels of histopathology slides stained with a single stain (Hematoxylin only or Eosin only). Such manual estimation of stain vectors limits the applicability of the method in extensive studies, which has motivated methods that extract the stain colors automatically.
Macenko et al. [2] provide a solution to this problem. Their method assumes that the hematoxylin and eosin stains are linearly separable in the optical density (OD) color space. It therefore finds the directions of the two largest singular values using singular value decomposition (SVD) and projects the OD pixel values onto this plane. However, this kind of method cannot always estimate the correct stain vectors when strong staining variation is present in the histopathology slides [8]. Another limitation is that its estimates may contain negative coefficients, which constitutes an invalid biological condition [4]. Alternatively, Khan et al. [3] obtained the overall stain color using a global Stain Color Descriptor (SCD). Then, to identify the locations where each stain is present, supervised color classification with a Relevance Vector Machine (RVM) is applied. However, this is a supervised method with considerably higher computational complexity. Vahadane et al. [4] developed a stain normalization approach based on sparse non-negative matrix factorization (SNMF) to preserve the structural information of the source image. Although SNMF reduces the solution space of NMF, its computational complexity is considerably higher. This method also does not preserve all the color information of the source image.
Although these solutions lead to better stain estimates, they rely only on image color information and neglect the spatial dependence between tissue structures [8].
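The stain-separation methods above operate in the optical density space. As a minimal sketch of that first step only, the conversion of an 8-bit RGB pixel to OD can be written as follows, assuming a white background intensity of 255 and clamping zero intensities to 1 to avoid an infinite OD. Both choices and the function names are assumptions made for illustration; the estimation of the stain vectors (e.g., by SVD) is not shown.

```python
import math

def rgb_to_od(rgb, background=255.0):
    """Convert one 8-bit RGB pixel to optical density: OD = -log10(I / I0)."""
    return tuple(-math.log10(max(c, 1) / background) for c in rgb)

def od_to_rgb(od, background=255.0):
    """Inverse transform: I = I0 * 10^(-OD), rounded back to integers."""
    return tuple(round(background * 10 ** (-d)) for d in od)

pixel = (120, 60, 200)          # hypothetical stained pixel
od = rgb_to_od(pixel)
restored = od_to_rgb(od)
```

In this space, stain concentrations combine approximately linearly, which is why the linear-separability assumption is stated for OD values rather than for raw RGB values.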
Fig. 2. The architecture of the GAN
2) Template color-matching: Template color-matching algorithms make use of the RGB color spectrum of the image and try to match the channel levels to those of a reference template. Reinhard et al. [1] proposed matching the statistics of the color histograms of a source and a reference image after transforming the RGB images to the Lab color space. However, when multiple stains are used, the assumption of a unimodal distribution of pixels in each channel of the Lab color space does not hold. As a result, background areas can be mapped as colored regions.
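The statistics matching at the core of this family of methods can be sketched for a single channel as follows. This is an illustrative simplification that assumes the channel has already been converted to the working color space and flattened to a list; the function name and the sample values are hypothetical.

```python
import statistics

def match_channel_stats(source, reference):
    """Shift and scale one channel of the source so that its mean and
    standard deviation match those of the reference channel."""
    mu_s, sd_s = statistics.fmean(source), statistics.pstdev(source)
    mu_r, sd_r = statistics.fmean(reference), statistics.pstdev(reference)
    scale = sd_r / sd_s if sd_s > 0 else 1.0   # avoid division by zero
    return [(v - mu_s) * scale + mu_r for v in source]

source_channel = [10.0, 20.0, 30.0, 40.0]              # hypothetical values
reference_channel = [100.0, 110.0, 120.0, 130.0, 140.0]
normalized = match_channel_stats(source_channel, reference_channel)
```

Because only two global statistics per channel are matched, the result depends strongly on the chosen reference image, and a multimodal channel distribution is not handled.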
3) Style transfer with generative models: Generative Adversarial Networks (GANs) [7], and especially cGANs [9], offer a completely different strategy for stain normalization that has been preferred in recent approaches. These generative models treat stain normalization as a style-transfer problem [10]–[14].
BenTaieb et al. [10], using the concept of style transfer [15], transferred the staining appearance of tissue images across different datasets to avoid color variations caused by batch effects. Nevertheless, stain normalization with this method does not yield the expected result. In contrast, StainGAN [11] used CycleGANs [16] in an unsupervised setting to transfer the H&E stain appearance between the Hamamatsu and Aperio scanners and achieved high visual similarity to the target domain. In [13], photorealism and structural similarity (SSIM) losses are introduced to keep the structural information unchanged. In [12], the input images are first gray-normalized, and GANs are then used to transfer a certain style; however, this technique requires retraining whenever the reference image changes. The method proposed in [17] does not require retraining when the reference image changes, but it relies entirely on the color of the reference image rather than on its mean color.
We use the concept of image-to-image translation for stain-to-stain translation in histopathology images. In the original GAN, the generator produces images only from the latent variable $z$. In an image-to-image translation task, however, the generated image must be related to the source image. To achieve this, conditional GANs (cGANs) [9] can be employed, which take additional information $y$ as input. For example, a source image is provided as additional information to both the generator and the discriminator. The loss function of cGANs is as follows:
$$\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x, y}[\log D(x|y)] + \mathbb{E}_{z, y}[\log(1 - D(G(z|y)|y))] \quad (2)$$
Our framework builds on the work of Isola et al. [18] (Pix2Pix), an extension of the cGAN that learns the mapping from an input image to an output image along with a loss function to train this mapping. In pix2pix, the L1 loss (Eq. 3) encourages the generator to produce a sample that resembles the training image $x$ paired with the conditioning variable $y$. It is the average of the absolute per-pixel differences between a training image $x$ and the generated image $G(z|y)$:
$$\mathcal{L}_{L1}(G) = \mathbb{E}_{x, y, z}[\lVert x - G(z|y) \rVert_1] \quad (3)$$
Finally, Eq. 3 is added as a regularization term to the adversarial loss of Eq. 2. The loss function in this work is as follows:
$$\mathcal{L}(G, D) = \mathcal{L}_{cGAN}(G, D) + \lambda \mathcal{L}_{L1}(G) \quad (4)$$
where $\lambda$ is a hyper-parameter that controls the relative weight of the two terms. In our case, it is set to 100. During training, $\mathcal{L}(G, D)$ is minimized to train the generator and maximized to train the discriminator. In other words, the purpose of training is to find the generator $G^*$ that solves the optimization problem:
$$G^* = \arg\min_G \max_D \mathcal{L}_{cGAN}(G, D) + \lambda \mathcal{L}_{L1}(G) \quad (5)$$
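To make the weighting of the two terms concrete, the following minimal sketch evaluates the generator objective (adversarial term plus the λ-weighted L1 term, with λ = 100) on flattened images. It assumes that discriminator outputs are probabilities and that images are flat lists of floats; the function names and sample values are hypothetical and do not reproduce our implementation.

```python
import math

def l1_loss(target, generated):
    """Mean absolute per-pixel difference between two equally sized images."""
    return sum(abs(t - g) for t, g in zip(target, generated)) / len(target)

def generator_objective(d_fake, target, generated, lam=100.0, eps=1e-12):
    """Adversarial term log(1 - D(G(z|y)|y)) averaged over the batch,
    plus the lambda-weighted L1 term; the generator minimizes this value."""
    adv = sum(math.log(max(1.0 - p, eps)) for p in d_fake) / len(d_fake)
    return adv + lam * l1_loss(target, generated)

target = [0.0, 1.0]                      # hypothetical ground-truth pixels
poor = generator_objective([0.5], target, [0.5, 0.5])
perfect = generator_objective([0.5], target, [0.0, 1.0])
```

With λ = 100, even a small per-pixel error dominates the adversarial term, which is why the re-stained output stays close to the ground truth.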
Fig. 3. Illustration of the Pix2Pix framework
The pix2pix method requires image pairs in the training phase, each consisting of an original image and the corresponding transformed image, which are usually not easy to obtain in the real world. Among the datasets available for histology, there are no suitable images of the same sample paired across various stain styles. We therefore use the gray-scale patch and the corresponding RGB patch as an image pair.
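The construction of such a training pair can be sketched as follows. The luminance weights (0.299, 0.587, 0.114) are a common convention and an assumption here, since the exact gray-scale conversion is not specified; the function names are hypothetical.

```python
def to_gray(rgb_patch):
    """Convert a patch given as rows of (R, G, B) tuples with 8-bit values
    to gray-scale, using assumed luminance weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_patch]

def make_pair(rgb_patch):
    """Return (input, target): the gray-scale patch and the original RGB patch."""
    return to_gray(rgb_patch), rgb_patch

# Tiny 1 x 3 hypothetical patch: white, black, pure red.
patch = [[(255, 255, 255), (0, 0, 0), (255, 0, 0)]]
gray_input, rgb_target = make_pair(patch)
```

Because the input is derived from the target itself, each pair is perfectly aligned at the pixel level, which the L1 term requires.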
Our architecture is similar to that of Pix2Pix: a U-net [19] as the generator and a PatchGAN [20] as the discriminator. In the U-net architecture, the encoder and decoder layers are directly connected by "skip connections." Because the skip connections can shuttle low-level information (which is commonly shared between the input and output images) across the bottleneck of the encoder-decoder network, they effectively improve the performance of stain translation. In the convolutional PatchGAN, instead of classifying the whole image at once, each image is divided into n×n patches, and each patch is predicted to be real or fake. The final classification is obtained by averaging all the responses. In other words, only the structure at the scale of patches is penalized. The Pix2Pix framework used in our work is illustrated in Fig. 3. The weights of the generator are updated via both the adversarial loss from the discriminator output and the L1 loss from the re-stained image output.
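The averaging step of the PatchGAN decision can be illustrated with a minimal sketch. It assumes that the discriminator has already produced a grid of per-patch probabilities; the function name, the threshold of 0.5, and the sample scores are hypothetical.

```python
def patchgan_decision(patch_scores, threshold=0.5):
    """Average a 2-D grid of per-patch real/fake probabilities and
    return the mean score together with the final real/fake decision."""
    flat = [score for row in patch_scores for score in row]
    mean_score = sum(flat) / len(flat)
    return mean_score, mean_score >= threshold

# Hypothetical 2 x 2 grid: three patches look real, one looks fake.
mean_score, is_real = patchgan_decision([[0.9, 0.8], [0.7, 0.2]])
```

A single implausible patch lowers the average but does not decide the outcome alone, which reflects that the discriminator penalizes structure only at the patch scale.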
Conditional variants of the GAN [7] simultaneously train a conditional generator and a discriminator. The generator is trained to generate images (in our case, H&E re-stained images) conditioned on input images (in our case, the corresponding gray-scale images). The discriminator aims to classify whether the H&E re-stained images are real or fake.
In this section, we compare STST with state-of-the-art histological stain normalization techniques, including Reinhard [1], Macenko [2], Khan [3], and Vahadane [4].
A. Dataset
We use the public Mitosis-Atypia dataset of H&E-stained breast tissue images released as part of the MITOS-ATYPIA ICPR'14 challenge [21]. The dataset consists of 16 histology slides, with three different frames per case, scanned with an Aperio Scanscope XT scanner and re-scanned with a Hamamatsu Nanozoomer 2.0-HT scanner. In total, it contains 424 frames at x20 magnification. We cropped each x20 frame from the Hamamatsu scanner into 30 patches of 256×256 pixels, which yields a total of 12720 non-overlapping patches. For the training set, we randomly extract 3000 of these patches. For quantitative evaluation, we extract 500 patches from the 9720 remaining patches (unseen during training).
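The patch counts above can be checked with a short sketch of non-overlapping cropping. The frame size used below is a hypothetical value chosen only so that a 5 × 6 grid of 256×256 patches fits; the actual frame dimensions are not stated here, and the function name is an assumption.

```python
def patch_origins(height, width, size=256):
    """Top-left corners of all non-overlapping size x size patches
    that fit entirely inside a frame of the given dimensions."""
    return [(row, col)
            for row in range(0, height - size + 1, size)
            for col in range(0, width - size + 1, size)]

FRAME_H, FRAME_W = 1485, 1663            # hypothetical frame size (assumption)
origins = patch_origins(FRAME_H, FRAME_W)

patches_per_frame = len(origins)         # 5 rows x 6 columns
total_patches = 424 * patches_per_frame  # frames x patches per frame
remaining = total_patches - 3000         # patches left after the training split
```

The 500 evaluation patches are then drawn from the remaining pool, so that no evaluation patch is seen during training.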
B. Implementation Details of Stain-to-Stain Translation
Our method does not require a reference image; for the state-of-the-art methods, however, we empirically demonstrate their sensitivity to the choice of reference image. STST learns not only the mapping from the gray-scale patch to the re-stained patch but also a loss function to train this mapping. Because the discriminator trains much faster than the generator, the discriminator loss is divided by two to slow down its training (see Fig. 4). Both the generator and the discriminator are trained with the Adam version of stochastic gradient descent with a learning rate of 0.0002 and the Adam momentum parameters $\beta_1$ and $\beta_2$. The weights of both networks were initialized from a Gaussian distribution with a mean of 0 and a standard deviation of 0.02. Every experiment is trained for 30 epochs, and the models are updated after each image, i.e., with a batch size of 1. We used an NVIDIA Tesla P100-PCIe-16GB GPU. After training, we select one of the best stored generator models according to the loss values. Using this model, we are able to translate any histopathological image to the Hamamatsu scanner style.
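Two of these details, the halved discriminator loss and the Gaussian weight initialization, can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than our training code: binary cross-entropy is assumed as the discriminator criterion, and all function names and the seed are hypothetical.

```python
import math
import random

def bce(pred, label, eps=1e-12):
    """Binary cross-entropy for a single prediction in (0, 1)."""
    pred = min(max(pred, eps), 1.0 - eps)
    return -(label * math.log(pred) + (1.0 - label) * math.log(1.0 - pred))

def discriminator_loss(d_real, d_fake):
    """Average the real and fake losses and halve the sum, which slows
    the discriminator down relative to the generator."""
    real = sum(bce(p, 1.0) for p in d_real) / len(d_real)
    fake = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return 0.5 * (real + fake)

def init_weights(n, mean=0.0, std=0.02, seed=0):
    """Draw n weights from a Gaussian with mean 0 and standard deviation 0.02."""
    rng = random.Random(seed)
    return [rng.gauss(mean, std) for _ in range(n)]

loss = discriminator_loss([0.5], [0.5])
weights = init_weights(10000)
```

Halving the loss scales the discriminator gradients by the same factor, so each update moves the discriminator less than an unscaled update would.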
Fig. 4. Discriminator and generator loss during training.
C. Evaluation Metrics and Results
Conventional quality metrics (e.g., full-reference metrics) are usually not suitable for histopathology images, since the ground truth of a histopathology image is typically not available. Indeed, the ground truth changes after the normalization process. However, because we train STST using paired patches created from the Hamamatsu scanner images and their gray-scale versions, in this particular case we can use the Hamamatsu scanner images as the ground truth.
However, for a fair and comprehensive comparison with other methods, the goal should be to normalize the patches from the Aperio scanner to the Hamamatsu scanner style and then compare them with the corresponding Hamamatsu scanner patches (ground truth). Although the Mitosis-Atypia dataset includes the same tissue sections scanned with both scanners, the images do not match exactly because of the difference between the scanners. Therefore, part of the drop in the similarity scores will be due to tiny differences between the two patches. Consequently, to demonstrate the results of STST, we examine different evaluation metrics with both matched and non-matched ground truth.
TABLE I. DIFFERENT EVALUATION METRICS ARE REPORTED FOR VARIOUS STAIN NORMALIZATION METHODS OF 500
Fig. 5. Comparison of some of the stain normalization methods on H&E stain images from different datasets.
The metrics used for comparison are the Structural Similarity Index (SSIM) [22], Multi-scale Structural Similarity Index (MS-SSIM) [23], Feature Similarity Index (FSIM) [24], Spatial Correlation Coefficient (SCC), Pearson Correlation Coefficient (PCC) [25], Mean Squared Error (MSE), Root Mean Square Error (RMSE) [26], Peak Signal-to-Noise Ratio (PSNR), Erreur Relative Globale Adimensionnelle de Synthèse (ERGAS), Relative Average Spectral Error (RASE), and Universal Quality Index (UQI) [27]. In addition, to compare how well the algorithms separate the stains and extract the correct stain vectors, we compute the Euclidean distances between the stain vectors manually determined with Ruifrok's method [5] in the ground truth and the stain vectors computed by the different methods.
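For reference, the simplest of these metrics can be sketched in a few lines. The sketch below covers only MSE, RMSE, PSNR, PCC, and the Euclidean distance between stain vectors, and it assumes images flattened to lists of 8-bit intensities; the function names and all sample values are hypothetical and are not results from our experiments.

```python
import math

def mse(a, b):
    """Mean squared error between two equally sized images."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def rmse(a, b):
    return math.sqrt(mse(a, b))

def psnr(a, b, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(a, b)
    return math.inf if err == 0 else 10.0 * math.log10(max_value ** 2 / err)

def pcc(a, b):
    """Pearson correlation coefficient between two images."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def stain_vector_distance(v1, v2):
    """Euclidean distance between two stain vectors."""
    return math.dist(v1, v2)

ground_truth = [0, 0, 0, 0]              # hypothetical pixel values
normalized = [0, 0, 0, 10]
error_db = psnr(ground_truth, normalized)
```

Lower MSE and RMSE, higher PSNR and PCC, and smaller stain-vector distances indicate closer agreement with the ground truth.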
truth: To evaluate the effectiveness of stain normalization algorithms, various metrics based on perceptual similarity and color evaluation have been proposed. However, there is still an enormous gap between these metrics and the human perception of visual similarity. Hence, we performed both qualitative and quantitative evaluations. Table I shows that our method achieves better results than all other approaches on all evaluation metrics. Furthermore, Table II shows that it provides better stain separability with respect to the ground-truth stain vectors. In addition, the processing time of our method is shorter than that of the other methods (Table III).
Most computational metrics are not designed to directly measure the perceptual similarity of the normalized image, so the evaluation results may sometimes be inconsistent with the subjective impression. Visual evaluation, however, generally makes it possible to examine the effectiveness of the different methods (Fig. 5).
TABLE II. STAINING SEPARATION COMPARISON (MEAN ± STD.)
TABLE III. PROCESSING TIME TAKEN TO
In this section, we also performed both quantitative and qualitative assessments. As seen in Fig. 6, there is a significant difference between the staining of blood cells and cytoplasmic/stromal staining. STST is able to detect these differences and stains them correctly. In addition, the metrics reported in Table IV demonstrate that the output of our method is highly similar to the ground-truth images and yields a more reasonable normalization.
Fig. 6. High capability of the STST in staining blood cells
TABLE IV. COMPARE THE RE-STAINED PATCHES WITH THE
Appearance variation in H&E images can be reduced by adopting proper stain normalization methods that enhance the image contrast. In this paper, inspired by the efficiency of cGANs, which have recently been used for stain normalization of histopathological images, we used the pix2pix architecture for stain-to-stain translation (STST). The re-staining process presented in this paper can be viewed as a normalization process in which the model learns to re-stain gray-scale patches with a similar stain style. Based on the evaluation results, we find that STST provides meaningful colorizations of gray-scale patches and achieves high perceptual similarity between the ground truth and the re-stained image. Moreover, the processing time of this method is shorter than that of all the other methods tested (Table III). We therefore conclude that GAN-based approaches outperform the classic stain normalization methods. Hence, STST could potentially be used as a pre-processing step in a histopathology image analysis pipeline.
We want to thank Dr. Babak Ehteshami Bejnordi for his guidance, time, and feedback on this paper.
[1] E. Reinhard, M. Ashikhmin, B. Gooch, and P. Shirley, “Color transfer between images,” IEEE Comput. Graph. Appl., vol. 21, no. 5, pp. 34–41, 2001.
[2] M. Macenko, M. Niethammer, J. S. Marron, D. Borland, J. T. Woosley, X. Guan, C. Schmitt, and N. E. Thomas, “A method for normalizing histology slides for quantitative analysis,” 2009 IEEE Int. Symp. Biomed. Imaging From Nano to Macro, pp. 1107–1110, 2009.
[3] A. M. Khan, N. Rajpoot, D. Treanor, and D. Magee, “A nonlinear mapping approach to stain normalization in digital histopathology images using image-specific color deconvolution,” IEEE Trans. Biomed. Eng., vol. 61, no. 6, pp. 1729–1738, 2014.
[4] A. Vahadane, T. Peng, A. Sethi, S. Albarqouni, L. Wang, M. Baust, K. Steiger, A. M. Schlitter, I. Esposito, and N. Navab, “Structure-Preserving Color Normalization and Sparse Stain Separation for Histological Images,” IEEE Trans. Med. Imaging, vol. 35, no. 8, pp. 1962–1971, 2016.
[5] A. C. Ruifrok and D. A. Johnston, “Quantification of histochemical staining by color deconvolution,” Anal. Quant. Cytol. Histol. Int. Acad. Cytol. Am. Soc. Cytol., vol. 23, no. 4, pp. 291–299, 2001.
[6] B. Ehteshami Bejnordi, N. Timofeeva, I. Otte-Höller, N. Karssemeijer, and J. A. W. M. van der Laak, “Quantitative analysis of stain variability in histology slides and an algorithm for standardization,” Med. Imaging 2014 Digit. Pathol., vol. 9041, p. 904108, 2014.
[7] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” Adv. Neural Inf. Process. Syst., pp. 2672–2680, 2014.
[8] B. E. Bejnordi, G. Litjens, N. Timofeeva, I. Otte-Höller, A. Homeyer, N. Karssemeijer, and J. A. W. M. Van Der Laak, “Stain specific standardization of whole-slide histopathological images,” IEEE Trans. Med. Imaging, vol. 35, no. 2, pp. 404–415, 2015.
[9] M. Mirza and S. Osindero, “Conditional Generative Adversarial Nets,” CoRR, vol. abs/1411.1, 2014.
[10] A. BenTaieb and G. Hamarneh, “Adversarial Stain Transfer for Histopathology Image Analysis,” IEEE Trans. Med. Imaging, vol. 37, no. 3, pp. 792–802, 2018.
[11] M. T. Shaban, C. Baur, N. Navab, and S. Albarqouni, “Staingan: Stain style transfer for digital histological images,” IEEE 16th Int. Symp. Biomed. Imaging, no. ISBI, pp. 953–956, 2019.
[12] H. Cho, S. Lim, G. Choi, and H. Min, “Neural Stain-Style Transfer Learning using GAN for Histopathological Images,” arXiv Prepr. arXiv:1710.08543, 2017.
[13] Z. Xu and C. Fern, “GAN-based Virtual Re-Staining: A Promising Solution for Whole Slide Image Analysis,” arXiv Prepr. arXiv:1901.04059, 2019.
[14] F. Ciompi, O. Geessink, B. E. Bejnordi, G. S. De Souza, A. Baidoshvili, G. Litjens, B. Van Ginneken, I. Nagtegaal, and J. Van Der Laak, “The importance of stain normalization in colorectal tissue classification with convolutional networks,” Proc. - Int. Symp. Biomed. Imaging, pp. 160–163, 2017.
[15] L. A. Gatys, A. S. Ecker, and M. Bethge, “Image Style Transfer Using Convolutional Neural Networks,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., vol. 2016, pp. 2414–2423, 2016.
[16] J. Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks,” Proc. IEEE Int. Conf. Comput. Vis., pp. 2223–2232, 2017.
[17] F. G. Zanjani, “Histopathology Stain-Color Normalization Using Deep Generative Models,” Med. Imaging with Deep Learn., no. Midl, pp. 1–11, 2018.
[18] P. Isola, J. Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” Proc. IEEE Conf. Comput. Vis. pattern Recognit., pp. 1125–1134, 2017.
[19] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” Int. Conf. Med. image Comput. Comput. Interv., pp. 234–241, 2015.
[20] C. Li and M. Wand, “Precomputed real-time texture synthesis with markovian generative adversarial networks,” Eur. Conf. Comput. Vis., pp. 702–716, 2016.
[21] L. Roux, D. Racoceanu, F. Capron, J. Calvo, E. Attieh, G. Le Naour, and A. Gloaguen, “Mitos & atypia,” Image Pervasive Access Lab, 2014.
[22] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. image Process., vol. 13, no. 4, pp. 600–612, 2004.
[23] Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multiscale structural similarity for image quality assessment,” Thirty-Seventh Asilomar Conf. Signals, Syst. Comput., vol. 2, pp. 1398–1402, 2003.
[24] L. Zhang, L. Zhang, X. Mou, and D. Zhang, “FSIM: A feature similarity index for image quality assessment,” IEEE Trans. Image Process., vol. 20, no. 8, pp. 2378–2386, 2011.
[25] P. Ahlgren, B. Jarneving, and R. Rousseau, “Requirements for a cocitation similarity measure, with special reference to Pearson’s correlation coefficient,” J. Am. Soc. Inf. Sci. Technol., vol. 54, no. 6, pp. 550–560, 2003.
[26] C. J. Willmott and K. Matsuura, “Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance,” Clim. Res., vol. 30, no. 1, pp. 79–82, 2005.
[27] Z. Wang and A. C. Bovik, “A Universal Image Quality Index,” IEEE Signal Process. Lett., vol. 9, no. 3, pp. 81–84, 2002.