Machine learning methods for image-to-image translation are widely studied and have applications in several fields. In medical imaging, the CycleGAN has found an important application in translating one modality to another, for instance in MR to CT translation (Han, 2017; Sjölund et al., 2015; Wolterink et al., 2017). Classically, such translation methods are trained in a supervised setting, which limits their applicability due to a lack of good paired data. Similar issues appear in, e.g., transferring the style of one artist to another (Gatys et al., 2015) or adding snow to sunny California streets (Liu et al., 2017). Unpaired image-to-image translation models such as CycleGAN (Zhu et al., 2017) promise to solve this issue by only enforcing a relationship on a distribution level, thus removing the need for paired data. However, given their widespread use, it is paramount to gain more understanding of their dynamics, to prevent unexpected behavior such as the hallucinated features reported by Cohen et al. (2018). As a step in that direction, we explore the solution space of the CycleGAN in the subsequent sections of this paper.
The general task of unpaired domain translation can be informally described as follows: given two probability spaces X and Y which represent our domains, we seek to learn a mapping

$$G: X \to Y \tag{1}$$

such that a sample $x \in X$ is mapped to a corresponding sample $y = G(x) \in Y$. The mapping G is typically approximated by a neural network parametrized by a vector of weights $\theta$. Without paired data, directly solving this is impossible, but on a distribution level it is easily seen that if G solves eq. (1), then the distribution of G(x), as x is sampled from X, is equal to that of Y. Mathematically, if $(X, \mathcal{X}, \mu)$ and $(Y, \mathcal{Y}, \nu)$ are probability spaces with probability measures $\mu$ and $\nu$ respectively, this can be written as

$$G_*\mu = \nu. \tag{2}$$

Or in words, the probability measure $\nu$ equals the push-forward measure $G_*\mu$. By Jensen's inequality, any fixed f-divergence satisfies $D_f(G_*\mu \,\|\, \nu) \ge 0$, and for strictly convex f it vanishes exactly when eq. (2) holds, so we can relate eq. (2) to the problem

$$\min_{G}\; D_f(G_*\mu \,\|\, \nu). \tag{3}$$

Figure 1: CycleGAN model.
While adversarial optimization techniques such as GANs can in principle solve problem eq. (3), the problem remains under-constrained and thus does not by itself give a reasonable solution to the original problem eq. (1).
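To make eq. (2), eq. (3) and the under-constraint concrete, here is a minimal sketch on finite probability spaces, where a distribution is a Python dict mapping points to probabilities. The helper names (`push_forward`, `f_divergence`, `kl_generator`), the four-point spaces and the choice f(t) = t log t (Kullback-Leibler) are our own illustrative assumptions. Two different maps push μ onto ν exactly, so the divergence alone cannot tell them apart.

```python
import math


def push_forward(mu, G):
    """Push-forward G_* mu of a finite distribution given as {point: probability}."""
    out = {}
    for x, p in mu.items():
        y = G(x)
        out[y] = out.get(y, 0.0) + p
    return out


def f_divergence(p, q, f):
    """D_f(p || q) = sum_y q(y) * f(p(y) / q(y)) for finite distributions.

    If p puts mass where q has none we return infinity; this matches KL,
    but is a simplification for generators f with finite lim f(t)/t.
    """
    total = 0.0
    for y in set(p) | set(q):
        py, qy = p.get(y, 0.0), q.get(y, 0.0)
        if qy == 0.0:
            if py > 0.0:
                return math.inf
            continue
        total += qy * f(py / qy)
    return total


def kl_generator(t):
    """f(t) = t log t generates the Kullback-Leibler divergence (with f(0) = 0)."""
    return t * math.log(t) if t > 0.0 else 0.0


mu = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}          # X: uniform on four points
nu = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}  # Y: uniform on four labels

G1 = {0: "a", 1: "b", 2: "c", 3: "d"}.get  # one distribution-matching map
G2 = {0: "b", 1: "c", 2: "d", 3: "a"}.get  # a different one (cyclic shift)

d1 = f_divergence(push_forward(mu, G1), nu, kl_generator)
d2 = f_divergence(push_forward(mu, G2), nu, kl_generator)
print(d1, d2)  # prints 0.0 0.0: both maps solve eq. (3) exactly
```

A collapsing map such as `lambda x: "a"` instead gives a divergence of log 4, so the divergence does detect distribution mismatch; it just cannot single out one matching map.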
The idea behind the cycle consistency condition from (Zhu et al., 2017) is to enforce additional constraints by introducing another function $F: Y \to X$, which is also approximated by a neural network and tries to solve the inverse task: for each $y \in Y$ find $F(y) \in X$ that would be the best translation of y to X. Similarly to the reasoning above, this condition would imply that

$$F_*\nu = \mu.$$

The goal is to enforce that $F(G(x)) \approx x$ for all $x \in X$ and, similarly, that $G(F(y)) \approx y$ for all $y \in Y$, i.e. to minimize the following cycle consistency loss

$$\mathcal{L}_{\mathrm{cyc}}(G, F) = \mathbb{E}_{x \sim \mu}\,\big\|F(G(x)) - x\big\| + \mathbb{E}_{y \sim \nu}\,\big\|G(F(y)) - y\big\|,$$

where typically the $\ell_1$ norm is chosen, but in principle any norm can be chosen. Zhu et al. (2017) also noted that an adversarial loss could in principle have been used here as well, but they did not observe any performance improvement.
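Combining these losses, we arrive at the CycleGAN loss defined as

$$\mathcal{L}_{\mathrm{CycleGAN}}(G, F) = D_f(G_*\mu \,\|\, \nu) + D_f(F_*\nu \,\|\, \mu) + \lambda_{\mathrm{cyc}}\,\mathcal{L}_{\mathrm{cyc}}(G, F),$$

where the factor $\lambda_{\mathrm{cyc}} > 0$ determines the weight of the cycle consistency term $\mathcal{L}_{\mathrm{cyc}}$. We illustrate the CycleGAN model in fig. 1.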
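The CycleGAN loss can be evaluated exactly on finite toy spaces. The sketch below assumes real-valued points (so that the absolute value plays the role of the norm), uses KL as the f-divergence and an illustrative default λ_cyc = 10; all function names and spaces are hypothetical. A pair of mutually inverse shifts has zero loss, while a collapsing generator does not.

```python
import math


def push_forward(mu, G):
    """Push-forward of a finite distribution {point: probability} along G."""
    out = {}
    for x, p in mu.items():
        out[G(x)] = out.get(G(x), 0.0) + p
    return out


def kl(p, q):
    """KL(p || q) for finite distributions; infinite if p is not dominated by q."""
    total = 0.0
    for y, py in p.items():
        if py == 0.0:
            continue
        qy = q.get(y, 0.0)
        if qy == 0.0:
            return math.inf
        total += py * math.log(py / qy)
    return total


def cycle_loss(mu, nu, G, F):
    """E_{x~mu} |F(G(x)) - x| + E_{y~nu} |G(F(y)) - y|, with |.| as the norm on R."""
    forward = sum(p * abs(F(G(x)) - x) for x, p in mu.items())
    backward = sum(q * abs(G(F(y)) - y) for y, q in nu.items())
    return forward + backward


def cyclegan_loss(mu, nu, G, F, lam_cyc=10.0):
    """D_f(G_* mu || nu) + D_f(F_* nu || mu) + lam_cyc * L_cyc(G, F), with D_f = KL."""
    adversarial = kl(push_forward(mu, G), nu) + kl(push_forward(nu, F), mu)
    return adversarial + lam_cyc * cycle_loss(mu, nu, G, F)


mu = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}      # X
nu = {10: 0.25, 11: 0.25, 12: 0.25, 13: 0.25}  # Y


def G(x):
    return x + 10


def F(y):
    return y - 10


print(cyclegan_loss(mu, nu, G, F))  # 0.0 for this exact solution
```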
Cautionary observations about generative models have been made before; for example, unpaired image-to-image translation can hallucinate features in medical images (Cohen et al., 2018). Furthermore, it was already noted in (Zhu et al., 2017) that the CycleGAN might admit multiple solutions, and that the issue of tint shift in image-to-image translation arises because, for a fixed input image $x \in X$, multiple images $y \in Y$ with different tints might be equally plausible. Adding an identity loss term was suggested in (Zhu et al., 2017) to alleviate the tint shift issue, i.e., the extended CycleGAN loss is defined as

$$\mathcal{L}_{\mathrm{ext}}(G, F) = \mathcal{L}_{\mathrm{CycleGAN}}(G, F) + \lambda_{\mathrm{id}}\Big(\mathbb{E}_{y \sim \nu}\,\big\|G(y) - y\big\| + \mathbb{E}_{x \sim \mu}\,\big\|F(x) - x\big\|\Big),$$

where the factor $\lambda_{\mathrm{id}} \ge 0$ determines the weight of the identity loss term. In general, to properly define the identity loss one needs to represent both X and Y as being supported on the same manifold, which is limiting if the distributions are substantially different.
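When X and Y live on the same space, the identity term can be computed in the same finite setting. In this minimal sketch (names and spaces hypothetical, absolute value as the norm), X = Y is uniform on four integers and we compare the identity map with a cyclic shift. Both pairs have zero cycle consistency loss, and since the shift also preserves the uniform measure, the divergence terms vanish as well; only the identity term separates them.

```python
def cycle_loss(mu, nu, G, F):
    """E_{x~mu} |F(G(x)) - x| + E_{y~nu} |G(F(y)) - y|."""
    return (sum(p * abs(F(G(x)) - x) for x, p in mu.items())
            + sum(q * abs(G(F(y)) - y) for y, q in nu.items()))


def identity_loss(mu, nu, G, F):
    """E_{y~nu} |G(y) - y| + E_{x~mu} |F(x) - x|; needs X and Y to share a space."""
    return (sum(q * abs(G(y) - y) for y, q in nu.items())
            + sum(p * abs(F(x) - x) for x, p in mu.items()))


n = 4
mu = {k: 1.0 / n for k in range(n)}
nu = dict(mu)  # X = Y as probability spaces


def shift(k):
    return (k + 1) % n


def unshift(k):
    return (k - 1) % n


def ident(k):
    return k


for name, (G, F) in {"identity": (ident, ident), "shift": (shift, unshift)}.items():
    print(name, cycle_loss(mu, nu, G, F), identity_loss(mu, nu, G, F))
```

Here the shift incurs an identity penalty of 3.0 while the identity map incurs none, which is the kind of symmetry breaking the identity term is meant to provide.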
The goal of this work is to study the kernel, or null space, of the CycleGAN loss, which is the set of solutions (G, F) that have zero ‘pure’ CycleGAN loss, and to give perturbation bounds for approximate solutions in the case of the extended CycleGAN loss. We do the theoretical analysis in section 2. We show in Proposition 2.1 that, under certain assumptions on the probability spaces X, Y, the kernel has symmetries which allow for multiple possible solutions. Furthermore, we show in Proposition 2.2 and the following remarks that the kernel admits a natural structure of a principal homogeneous space, with the automorphism group Aut(X) of X acting on the set of solutions freely and transitively. Next, we expand our analysis to the case of approximate solutions for the extended CycleGAN loss by proving perturbation bounds in Proposition 2.3 and Corollary 2.1. We discuss the existence of automorphisms in Proposition 2.4 and Proposition 2.6. We proceed in section 3 by showing that unexpected symmetries can be learned by a CycleGAN. In particular, when translating a domain to itself, CycleGAN can learn a nontrivial automorphism of the domain. In appendix A, we briefly explain the measure-theoretic language we use heavily in the paper, for readers who are more used to working with distributions, and also recall some basic notions from differential geometry which we use as well.
2.1 CYCLEGAN KERNEL AS A PRINCIPAL HOMOGENEOUS SPACE
The notions of isomorphism of probability spaces and of probability space automorphisms are central to this paper. Intuitively speaking, an isomorphism of probability spaces X and Y is a bijection $f: X \to Y$ such that the probability of an event $A \subseteq X$ equals the probability of the event $f(A) \subseteq Y$. An isomorphism of a probability space to itself is called a probability space automorphism. For example, if our probability space consists of samples from an n-dimensional spherical Gaussian distribution, then any rotation of $\mathbb{R}^n$ is a probability space automorphism. For a precise definition we refer the reader to appendix A.
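The rotation example can be sanity-checked by sampling. The sketch below assumes n = 2, an arbitrary angle of 0.7 rad and a fixed random seed (all our choices); it compares empirical second moments before and after rotation, and against a scaling map that is not measure-preserving. Matching moments is only a necessary condition, not a proof of equality in distribution.

```python
import math
import random


def rotate(point, angle):
    """Rotate a point of R^2 counterclockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    x, y = point
    return (c * x - s * y, s * x + c * y)


def second_moments(points):
    """Empirical E[x^2], E[y^2], E[xy] of a list of 2D points."""
    n = len(points)
    sxx = sum(x * x for x, _ in points) / n
    syy = sum(y * y for _, y in points) / n
    sxy = sum(x * y for x, y in points) / n
    return sxx, syy, sxy


rng = random.Random(0)
samples = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(20000)]
rotated = [rotate(p, 0.7) for p in samples]
scaled = [(2.0 * x, 2.0 * y) for x, y in samples]  # not measure-preserving

print(second_moments(samples))
print(second_moments(rotated))  # stays close to (1, 1, 0)
print(second_moments(scaled))   # close to (4, 4, 0) instead
```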
Firstly, we prove that if at least one of the probability spaces X, Y admits a nontrivial probability automorphism, then any exact solution in the kernel of CycleGAN can be altered to give a different solution.
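Proposition 2.1 (Invariance of the kernel). Let $X = (X, \mathcal{X}, \mu)$ and $Y = (Y, \mathcal{Y}, \nu)$ be probability spaces and let $T: X \to X$ be a probability space automorphism. Let $G: X \to Y$ and $F: Y \to X$ be measurable maps satisfying

$$\mathcal{L}_{\mathrm{CycleGAN}}(G, F) = 0. \tag{6}$$

Then F, G are probability space isomorphisms and

$$\mathcal{L}_{\mathrm{CycleGAN}}(G \circ T,\, T^{-1} \circ F) = 0. \tag{7}$$

If, furthermore, T essentially differs from the identity mapping $\mathrm{id}_X$ (i.e., differs from it on a set of positive measure), then

$$G \circ T \text{ essentially differs from } G \quad\text{and}\quad T^{-1} \circ F \text{ essentially differs from } F. \tag{8}$$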
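Proof. Since T is a probability space automorphism, its inverse $T^{-1}$ is an automorphism as well. In particular, it is measure-preserving since

$$(T^{-1})_*\mu = (T^{-1})_*(T_*\mu) = (T^{-1} \circ T)_*\mu = \mu.$$

We note that by eq. (2) and the positivity of the norms, eq. (6) implies that $G_*\mu = \nu$, $F_*\nu = \mu$,

$$F \circ G = \mathrm{id}_X \ \ \mu\text{-a.e.} \quad\text{and}\quad G \circ F = \mathrm{id}_Y \ \ \nu\text{-a.e.}$$

Therefore both F and G are isomorphisms. By definition of $\mathcal{L}_{\mathrm{CycleGAN}}$,

$$\begin{aligned} \mathcal{L}_{\mathrm{CycleGAN}}(G \circ T, T^{-1} \circ F) ={}& D_f\big((G \circ T)_*\mu \,\|\, \nu\big) + D_f\big((T^{-1} \circ F)_*\nu \,\|\, \mu\big) \\ &+ \lambda_{\mathrm{cyc}}\Big(\mathbb{E}_{x \sim \mu}\,\big\|T^{-1}(F(G(T(x)))) - x\big\| + \mathbb{E}_{y \sim \nu}\,\big\|G(T(T^{-1}(F(y)))) - y\big\|\Big). \end{aligned} \tag{9}$$

Since $T_*\mu = \mu$, we have $(G \circ T)_*\mu = G_*\mu = \nu$; since $T^{-1}$ is measure-preserving, we also have $(T^{-1} \circ F)_*\nu = (T^{-1})_*\mu = \mu$. Hence both divergence terms in eq. (9) vanish, which shows that

$$\mathcal{L}_{\mathrm{CycleGAN}}(G \circ T, T^{-1} \circ F) = \lambda_{\mathrm{cyc}}\,\mathcal{L}_{\mathrm{cyc}}(G \circ T, T^{-1} \circ F). \tag{10}$$

Using eq. (10) and the fact that $F \circ G = \mathrm{id}_X$, $G \circ F = \mathrm{id}_Y$ and $T^{-1} \circ T = T \circ T^{-1} = \mathrm{id}_X$ almost everywhere (composition with the measure-preserving maps T and F preserves ‘almost everywhere’ statements), we conclude that $\mathcal{L}_{\mathrm{cyc}}(G \circ T, T^{-1} \circ F) = 0$, and the proof of eq. (7) is complete. To prove eq. (8), first note that there exists a set $A \in \mathcal{X}$ with $\mu(A) > 0$ such that $T(x) \neq x$ for all $x \in A$, since we assume that T essentially differs from the identity mapping. If $G \circ T = G$ $\mu$-a.e., then $F \circ G \circ T = F \circ G$ $\mu$-a.e. as well, which implies that $T(x) = x$ for $\mu$-almost every x, which is a contradiction. In a similar way one can show that $T^{-1} \circ F$ essentially differs from F.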
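Proposition 2.1 can be checked exhaustively on finite uniform spaces, where every permutation is an automorphism. In this sketch (spaces, maps and names are hypothetical), we use exact `Fraction` probabilities and test membership in the kernel by checking that both push-forwards match exactly, which is equivalent to the divergence terms vanishing for strictly convex f, and that both cycles are exact.

```python
from fractions import Fraction

X = list(range(6))
mu = {x: Fraction(1, 6) for x in X}              # uniform measure on X
nu = {y: Fraction(1, 6) for y in range(10, 16)}  # uniform measure on Y

G = {x: x + 10 for x in X}                # an isomorphism X -> Y
F = {y: x for x, y in G.items()}          # its inverse Y -> X
T = {0: 1, 1: 2, 2: 0, 3: 3, 4: 5, 5: 4}  # a nontrivial automorphism of X
T_inv = {v: k for k, v in T.items()}


def push_forward(measure, m):
    out = {}
    for point, p in measure.items():
        out[m[point]] = out.get(m[point], 0) + p
    return out


def in_kernel(G, F):
    """Zero 'pure' CycleGAN loss on finite spaces: both push-forwards match
    (so every D_f term vanishes) and both cycles are exact (so L_cyc = 0)."""
    return (push_forward(mu, G) == nu and push_forward(nu, F) == mu
            and all(F[G[x]] == x for x in mu) and all(G[F[y]] == y for y in nu))


G_T = {x: G[T[x]] for x in X}           # G o T
T_inv_F = {y: T_inv[F[y]] for y in nu}  # T^{-1} o F

print(in_kernel(G, F), in_kernel(G_T, T_inv_F), G_T != G)  # True True True
```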
We provide the following converse to Proposition 2.1.

Proposition 2.2 (Kernel as a principal homogeneous space). Let X, Y be probability spaces. Let $G, G': X \to Y$ and $F, F': Y \to X$ be measurable maps satisfying

$$\mathcal{L}_{\mathrm{CycleGAN}}(G, F) = \mathcal{L}_{\mathrm{CycleGAN}}(G', F') = 0.$$

Then there exists a unique (up to equality almost everywhere) probability space automorphism $T: X \to X$ such that $G' = G \circ T$ and $F' = T^{-1} \circ F$ almost everywhere.

For the proof it suffices to take $T = F \circ G'$. Combined with Proposition 2.1, this allows us to say that the group Aut(X) of probability space automorphisms of X acts freely and transitively on the set of isomorphisms Iso(X, Y) when the latter set is nonempty. This amounts to saying that the space of solutions of CycleGAN is a principal homogeneous space. It can be helpful to view this result from the abstract category-theoretic point of view: if $\mathcal{C}$ is a category and $X \in \mathcal{C}$ is any fixed object, then for any object $Y \in \mathcal{C}$ the automorphism group Aut(X) acts on the set of morphisms Hom(X, Y) on the right by composition, i.e. we define

$$f \cdot T := f \circ T, \qquad f \in \mathrm{Hom}(X, Y),\ T \in \mathrm{Aut}(X).$$

This action leaves the set of isomorphisms Iso(X, Y) invariant, and this restricted action is transitive if Iso(X, Y) is nonempty and, furthermore, free, i.e.

$$f \circ T = f \implies T = \mathrm{id}_X$$

for all $f \in \mathrm{Iso}(X, Y)$ and all $T \in \mathrm{Aut}(X)$.
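For uniform measures on a finite set of size n, Aut(X) is the full permutation group and Iso(X, Y) is the set of all bijections, so the principal homogeneous space structure can be verified by brute force. This sketch (n = 4 and all names are our assumptions) checks that the right action is free and that for every pair G, G' exactly one automorphism maps G to G', namely T = F ∘ G' with F = G⁻¹ as in Proposition 2.2.

```python
from itertools import permutations

X = (0, 1, 2, 3)
Y = ("a", "b", "c", "d")

# With uniform measures, every bijection is measure-preserving, so
# Aut(X) = all permutations of X and Iso(X, Y) = all bijections X -> Y.
aut = [dict(zip(X, image)) for image in permutations(X)]
iso = [dict(zip(X, image)) for image in permutations(Y)]
identity = dict(zip(X, X))


def act(f, T):
    """Right action of Aut(X) on Iso(X, Y): f . T = f o T."""
    return {x: f[T[x]] for x in X}


def inverse(f):
    return {y: x for x, y in f.items()}


# Free: f o T = f forces T = id.
free = all(act(f, T) != f for f in iso for T in aut if T != identity)

# Transitive with a unique witness T = F o G' (F = G^{-1}), as in Proposition 2.2.
transitive = True
for G in iso:
    F = inverse(G)
    for G_prime in iso:
        T = {x: F[G_prime[x]] for x in X}
        witnesses = [S for S in aut if act(G, S) == G_prime]
        transitive = transitive and witnesses == [T]

print(len(aut), len(iso), free, transitive)  # 24 24 True True
```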
To proceed with our analysis of approximate solutions for the extended CycleGAN loss, we first formulate a useful ‘push-forward property’ for general f-divergences between distributions on $\mathbb{R}^n$. The proof is provided in appendix A.
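Lemma 2.1 (Push-forward property for f-divergences). Let p, q be distributions on $\mathbb{R}^n$ and let $T: \mathbb{R}^n \to \mathbb{R}^n$ be a diffeomorphism. Then for any f-divergence $D_f$,

$$D_f(T_*p \,\|\, T_*q) = D_f(p \,\|\, q).$$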
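Lemma 2.1 can be sanity-checked numerically in one dimension. This sketch assumes p = N(0, 1), q = N(0.5, 1.5²) and the illustrative diffeomorphism T(x) = x + sin(x)/2 (all our choices), computes push-forward densities with the 1D change of variables formula, and evaluates both sides with the trapezoidal rule for KL and squared Hellinger generators. For KL, the left-hand side can also be compared with the closed form log(1.5) + 1.25/4.5 − 1/2.

```python
import math


def log_normal_pdf(x, mean, std):
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))


def T(x):
    """Illustrative diffeomorphism of R: T'(x) = 1 + cos(x)/2 >= 1/2 > 0."""
    return x + 0.5 * math.sin(x)


def T_prime(x):
    return 1.0 + 0.5 * math.cos(x)


def T_inverse(y, iters=52):
    """Bisection: since |T(x) - x| <= 1/2, the preimage lies in [y - 1/2, y + 1/2]."""
    lo, hi = y - 0.5, y + 0.5
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if T(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pushed_log_density(log_density):
    """log (T_* p)(y) = log p(T^{-1}(y)) - log T'(T^{-1}(y)) (1D change of variables)."""
    def log_pushed(y):
        x = T_inverse(y)
        return log_density(x) - math.log(T_prime(x))
    return log_pushed


def f_divergence(log_p, log_q, f, a=-12.0, b=12.0, steps=6000):
    """Trapezoidal rule for D_f(p || q) = int q(t) f(p(t) / q(t)) dt over [a, b]."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps + 1):
        t = a + i * h
        lp, lq = log_p(t), log_q(t)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(lq) * f(math.exp(lp - lq))
    return total * h


def kl_gen(t):
    return t * math.log(t) if t > 0.0 else 0.0


def hellinger_gen(t):
    return (math.sqrt(t) - 1.0) ** 2


def log_p(x):
    return log_normal_pdf(x, 0.0, 1.0)


def log_q(x):
    return log_normal_pdf(x, 0.5, 1.5)


for name, gen in (("KL", kl_gen), ("squared Hellinger", hellinger_gen)):
    before = f_divergence(log_p, log_q, gen)
    after = f_divergence(pushed_log_density(log_p), pushed_log_density(log_q), gen)
    print(name, before, after)  # the two values agree up to quadrature error
```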
We are now ready to prove the perturbation bounds for approximate solutions.
Proposition 2.3 (Perturbation bound). Let X, Y be probability spaces with probability densities be a diffeomorphic probability space automorphism. Assume that
is
-Lipschitz, where
is some positive constant. Let
and
be measurable maps. Then the following perturbation bound holds for extended CycleGAN loss:
Firstly, since is measure-preserving,
. Using Lemma 2.1 and the fact that
is measure-preserving again, we see that
where the equality uses the fact that
is measure-preserving. As before,
almost everywhere.
Finally, since is a probability space automorphism and
-Lipschitz, we conclude that
Corollary 2.1 (Asymptotic perturbation bound). In the setting of Proposition 2.3, let and
be a sequence of measurable maps such that the ‘pure’ CycleGAN loss
and let
Corollary 2.1 has a direct practical implication. When using a CycleGAN model to translate between substantially different distributions (such as different medical imaging modalities), one would be forced to pick a small weight for the identity loss term in order for the model to produce reasonable results. Furthermore, since the distributions are substantially different, we can expect that
many nontrivial automorphisms
. Therefore, the asymptotic perturbation bound automatically implies that the approximate solution space admits a lot of symmetry, potentially leading to undesirable results.
2.2 EXISTENCE OF AUTOMORPHISMS
By Proposition 2.1 we see that if either space admits a nontrivial probability automorphism, then the CycleGAN problem has multiple solutions. However, for this to be a problem in practice, such probability automorphisms must actually exist, which we shall now show is the case. First of all, we state the following lemma, which says that we can transfer automorphisms from an isomorphic copy of X to X itself.
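Lemma 2.2. Let $f: Z \to X$ be an isomorphism of probability spaces and let $T: Z \to Z$ be an automorphism of Z. Then $S := f \circ T \circ f^{-1}$ is an automorphism of X and the diagram

$$\begin{array}{ccc} Z & \xrightarrow{\;T\;} & Z \\ {\scriptstyle f}\big\downarrow & & \big\downarrow{\scriptstyle f} \\ X & \xrightarrow{\;S\;} & X \end{array}$$

commutes, i.e. $S \circ f = f \circ T$ almost everywhere. Furthermore, if X, Z are submanifolds and f, T are diffeomorphisms, then S is a diffeomorphism as well.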
Proof. The first claim follows from invertibility of f and T. The second claim follows from the definition of a diffeomorphism between submanifolds, see appendix A.
An important notion in probability theory is that of a Lebesgue probability space. Many probability spaces which emerge in practice, such as $[0, 1]^n$ with the Lebesgue measure or $\mathbb{R}^n$ with a Gaussian probability distribution, both defined on the respective $\sigma$-algebras of Lebesgue measurable sets, are instances of Lebesgue probability spaces.
Definition 2.1. A probability space X is called a Lebesgue probability space if, for some $c \in [0, 1]$, it is isomorphic as a measure space to a disjoint union $([0, c], \lambda) \sqcup \{a_1, a_2, \ldots\}$, where $\lambda$ is the Lebesgue measure on the $\sigma$-algebra of Lebesgue measurable subsets of the interval [0, c], and $a_1, a_2, \ldots$ are at most countably many atoms of total mass $1 - c$.
Informally speaking, this definition says that Lebesgue probability spaces consist of a continuous part and at most countably many Dirac deltas (atoms). First of all, we provide an abstract result about the existence of nontrivial probability space automorphisms in Lebesgue probability spaces which are either ‘not purely atomic’ or have at least two atoms with equal mass. ‘Not purely atomic’ means that the sum of the probabilities of all atoms is strictly less than 1.
Proposition 2.4. Let X be a Lebesgue probability space such that at least one of the assumptions
1. X is not purely atomic;
2. there exist at least two atoms with equal mass
holds. Then X admits nontrivial automorphisms.
Proof. If the space X is not purely atomic, we have $X \cong ([0, c], \lambda) \sqcup \{a_1, a_2, \ldots\}$ for some c > 0, where [0, c] is the continuous part and $\{a_1, a_2, \ldots\}$ is the atomic part of the probability measure $\mu$. The interval [0, c] admits at least one nontrivial automorphism, namely the reflection $x \mapsto c - x$ (leaving the atoms fixed), hence so does X by Lemma 2.2. In fact, there are infinitely many other automorphisms, which can be obtained by exchanging nonoverlapping subintervals $I_1, I_2 \subseteq [0, c]$ of the same length. If there exist two atoms $a_i \neq a_j$ in X with equal mass, then the transformation which transposes $a_i$ and $a_j$ and keeps the rest of X fixed is a nontrivial automorphism.
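The two kinds of automorphisms of [0, c] used in the proof can be illustrated on a finite grid. In this sketch (c = 1, the subintervals [0.1, 0.3) and [0.6, 0.8), and a 1000-cell midpoint grid are our assumptions), a map that merely permutes the grid cells leaves the sorted grid unchanged, which is a discrete analogue of preserving Lebesgue measure; x ↦ x² is shown for contrast. This is an illustration on a grid, not a proof.

```python
def reflection(x, c=1.0):
    """x -> c - x on [0, c]."""
    return c - x


def interval_exchange(x, a=0.1, b=0.6, length=0.2):
    """Swap the non-overlapping subintervals [a, a + length) and [b, b + length); identity elsewhere."""
    if a <= x < a + length:
        return x + (b - a)
    if b <= x < b + length:
        return x - (b - a)
    return x


def midpoint_grid(n=1000, c=1.0):
    """Midpoints of n equal cells of [0, c]: a discrete stand-in for Lebesgue measure."""
    return [(i + 0.5) * c / n for i in range(n)]


def max_sorted_gap(points, mapped):
    """Largest gap between sorted points and sorted images (about 0 if the map permutes the grid)."""
    return max(abs(u - v) for u, v in zip(sorted(points), sorted(mapped)))


pts = midpoint_grid()
print(max_sorted_gap(pts, [reflection(x) for x in pts]))         # rounding error only
print(max_sorted_gap(pts, [interval_exchange(x) for x in pts]))  # rounding error only
print(max_sorted_gap(pts, [x * x for x in pts]))                 # about 0.25
```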
Probability spaces of images which appear in real life typically have a continuous component which would correspond to continuous variations in object sizes, lighting conditions, etc. Therefore, they admit some probability space automorphisms. However, such abstract automorphisms can be highly discontinuous, which would make it questionable whether neural networks can learn them. We would like to show that there are also automorphisms which are smooth, at least locally. For this, we first state the following technical claim. The proof is provided in appendix A.
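Proposition 2.5. Let $\mu$ be a Borel probability measure on $\mathbb{R}^n$ and let $f: \mathbb{R}^n \to \mathbb{R}^m$ be a continuous injective function. Then

$$f: (\mathbb{R}^n, \mu) \to (\mathbb{R}^m, f_*\mu)$$

is an isomorphism of probability spaces, where $f_*\mu$ denotes the push-forward of the measure $\mu$.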
Finally, we show the existence of smooth automorphisms under the assumption that our data manifold can be generated by embedding $\mathbb{R}^n$ with the standard Gaussian measure into $\mathbb{R}^m$ as a submanifold. We write $\gamma_n$ for the standard Gaussian probability measure on the space $\mathbb{R}^n$.
Proposition 2.6. Let $Z = (\mathbb{R}^n, \gamma_n)$ be an n-dimensional standard Gaussian distribution. Let $f: \mathbb{R}^n \to \mathbb{R}^m$ be a manifold embedding. Denote by X the probability space $(\mathbb{R}^m, f_*\gamma_n)$. Then the following assertions hold:
1. f is an isomorphism of probability spaces when viewed as a map $f: Z \to X$;
2. every rotation $T \in SO(n)$ is a probability space automorphism and a diffeomorphism of Z. T induces a probability space automorphism of X which is, additionally, a diffeomorphism when restricted to $f(\mathbb{R}^n)$.
Proof. The first claim follows directly from Proposition 2.5. For the second part, it is clear that rotations of $\mathbb{R}^n$ preserve the isotropic Gaussian distribution, and the rest follows from Lemma 2.2.
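A small instance of Proposition 2.6 can be simulated. The sketch below assumes n = 2, m = 3 and the hypothetical embedding f(z₁, z₂) = (z₁, z₂, z₁² + z₂²) (the graph of a smooth function, standing in for an invertible generative model); the induced map S = f ∘ T ∘ f⁻¹ from Lemma 2.2 keeps samples on the surface f(ℝ²), makes the square S ∘ f = f ∘ T commute, and leaves sample means essentially unchanged.

```python
import math
import random


def f(z):
    """Hypothetical embedding R^2 -> R^3 (graph of |z|^2), standing in for a generator."""
    z1, z2 = z
    return (z1, z2, z1 * z1 + z2 * z2)


def f_inverse(x):
    """Inverse of f on its image: drop the third coordinate."""
    return (x[0], x[1])


def rotation(angle):
    c, s = math.cos(angle), math.sin(angle)
    return lambda z: (c * z[0] - s * z[1], s * z[0] + c * z[1])


def induced_automorphism(T):
    """S = f o T o f^{-1}: the automorphism of X transferred from Z as in Lemma 2.2."""
    return lambda x: f(T(f_inverse(x)))


def coordinate_mean(points, k):
    return sum(p[k] for p in points) / len(points)


T = rotation(1.1)
S = induced_automorphism(T)

rng = random.Random(1)
zs = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(20000)]
xs = [f(z) for z in zs]  # samples from X = (R^3, f_* gamma_2)
sx = [S(x) for x in xs]  # their images under S

print([round(coordinate_mean(xs, k), 3) for k in range(3)])
print([round(coordinate_mean(sx, k), 3) for k in range(3)])  # close to the line above
```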
The connection with generative models is clear if we take f to be an invertible generative model such as RealNVP (Dinh et al., 2016) or Glow (Kingma & Dhariwal, 2018). The manifold embedding assumption in the proposition can be seen as too limiting in general, and we explain how to ‘bypass’ it in Lemma A.2 for interested readers. In conclusion, if we assume that the distributions we are working with can be represented by an invertible generative model, then there exists a rich space of automorphisms. Given the success of, e.g., Glow, this assumption seems to be reasonable for natural images.
Since we have established that the existence of automorphisms can negatively impact the results of CycleGAN, we now demonstrate how this can happen by considering a toy case with a known solution and showing that CycleGAN can, and does, learn a nontrivial automorphism. The toy experiment which we perform is translation of the MNIST dataset to itself. That is, at training time we pick two minibatches $\text{batch}_A$ and $\text{batch}_B$ from MNIST at random and use these as samples from X and Y, respectively. The generator neural network in this case is a convolutional autoencoder with residual blocks, a fully connected layer in the bottleneck and no skip connections from encoder to decoder. We also train a simple CNN for MNIST classification in order to classify CycleGAN outputs. The networks were trained using SGD. The ‘natural’ transformation in this case is, of course, the identity mapping, and we expect the classification of the inputs and outputs to stay the same. But we shall see that this is not the case.
In fig. 2a–fig. 2h we show some examples of the generated fake samples and the reconstructions on the test set. In fig. 3a–fig. 3b we provide the confusion matrices for the A2B and B2A generators, respectively. We use these matrices to understand whether, e.g., the class of the transformed image for A2B translation equals the source class, whether it is a random variable independent of the source class, or whether we can spot some deterministic permutation of classes. We have observed that in practice the identity mapping is not learned. Instead, the network leans towards producing a certain permutation of digits, rather than the identity or a random assignment of classes independent of the source label. One explanation would be as follows. Suppose that we can perfectly disentangle class and style in a latent digit representation (Makhzani et al., 2015). Then any permutation in the symmetric group $S_{10}$ of the ten digit classes, acting on the class part of the latent code, determines a probability space automorphism on the space of digits, which can be learned by a neural network. Further investigation of the confusion matrices reveals that the networks introduce short cycles, e.g., mapping 2 to 6 and vice versa.
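Reading a permutation and its short cycles off a confusion matrix can be done mechanically. The helper names below are hypothetical, and the 4-class counts are made up purely to exercise the code; they are not the MNIST results shown in fig. 3. We take, for each source class, the most frequent predicted class, check whether this assignment is a permutation, and decompose it into cycles.

```python
def normalize_rows(counts):
    """Row-normalize a confusion matrix (rows: source class, columns: predicted class)."""
    normalized = []
    for row in counts:
        total = sum(row)
        normalized.append([c / total for c in row])
    return normalized


def dominant_assignment(confusion):
    """For each source class, the predicted class with the largest share."""
    return [max(range(len(row)), key=row.__getitem__) for row in confusion]


def cycle_decomposition(assignment):
    """Cycles of the map i -> assignment[i], or None if it is not a permutation."""
    if sorted(assignment) != list(range(len(assignment))):
        return None
    seen, cycles = set(), []
    for start in range(len(assignment)):
        if start in seen:
            continue
        cycle, i = [], start
        while i not in seen:
            seen.add(i)
            cycle.append(i)
            i = assignment[i]
        cycles.append(cycle)
    return cycles


# Made-up 4-class counts for illustration only (not experimental results).
counts = [
    [5, 1, 90, 4],
    [2, 80, 10, 8],
    [85, 5, 5, 5],
    [3, 3, 4, 90],
]
assignment = dominant_assignment(normalize_rows(counts))
print(assignment)                       # [2, 1, 0, 3]
print(cycle_decomposition(assignment))  # [[0, 2], [1], [3]]
```

A 2-cycle such as [0, 2] here corresponds to the kind of short cycle described above (two classes swapped with each other).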
We provide additional experiments on the BRATS2015 dataset in appendix B, where we show that in the absence of identity loss the pure CycleGAN loss demonstrates noticeable symmetry, while the PSNR is clearly not invariant. Increasing the weight of the identity loss term reduces the symmetry, but does not necessarily result in a similar PSNR improvement.
Figure 2: Examples on MNIST2MNIST task. (a)-(d) A2A translation, first column are samples from A, second column are ’fake B’ and third column are reconstructions of original samples from A (e)-(h) same for B2B translation.
Figure 3: Normalized confusion matrices for A2B and B2A generator respectively.
We have shown theoretically that, under mild assumptions, the kernel of the CycleGAN admits nontrivial symmetries and has a natural structure of a principal homogeneous space. To show empirically that such symmetries can be learned, we have trained a CycleGAN on the task of translating a domain to itself. In particular, we show that on the MNIST2MNIST task, in contrast to the expected identity, the CycleGAN learns to permute the digits. We have therefore effectively shown that it is not the CycleGAN loss which prevents this from occurring more often, and we hypothesize that the network architecture has a major influence. We advocate against the usage of CycleGAN when translating between substantially different distributions in critical tasks such as medical imaging, given the theoretical results in Corollary 2.1, which suggest ambiguity of solutions even in the presence of the identity loss term.
We would like to point out that some work has been done recently extending the CycleGAN. For example, in Na et al. (2019) the authors argue that many image-to-image translation tasks are ‘multimodal’ in a sense that there are multiple equally plausible outputs for a single input image, therefore, one should explicitly model this uncertainty in the model. To address this issue, the authors design a network which has two ‘style’ encoders , two discriminators for each domain, two conditional encoders for each direction
and two generators for each direction
The style encoders serve to extract the ‘style’ of the image, which is present in both domains; e.g., in the case of the ‘female-to-male’ task on the CelebA dataset, the style would correspond to coarsely represented facial features. The loss term forces the mutual information between the style vector of the translated image and the input style to the conditional encoder to be maximized. This allows the network to roughly preserve the style in the translation. While we leave a full analysis of this approach for future work, we expect that such a loss would reduce ambiguity in the solution space to those isomorphisms which differ by automorphisms from the set
leaving the style fixed, since replacing with
and
with
does not change the loss value for such
. Therefore, the reduction in uncertainty of our solution depends on the capacity of the encoder
, and, ideally, should be quantified. In particular, one might still need to enforce additional problem-specific features in the encoder
to guarantee that important image style content is preserved.
V. I. Bogachev. Measure theory. Vol. I, II. Springer-Verlag, Berlin, 2007. ISBN 978-3-540-34513-8; 3-540-34513-2. doi: 10.1007/978-3-540-34514-5. URL https://doi.org/10.1007/978-3-540-34514-5.
Joseph Paul Cohen, Margaux Luck, and Sina Honari. How to Cure Cancer (in images) with Unpaired Image Translation. In Medical Imaging with Deep Learning (MIDL), volume 1, pp. 1–3, 2018.
Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using Real NVP. CoRR, abs/1605.08803, 2016. URL http://arxiv.org/abs/1605.08803.
Tanja Eisner, Bálint Farkas, Markus Haase, and Rainer Nagel. Operator theoretic aspects of ergodic theory, volume 272 of Graduate Texts in Mathematics. Springer, Cham, 2015. ISBN 978-3-319-16897-5; 978-3-319-16898-2. doi: 10.1007/978-3-319-16898-2. URL https://doi.org/10.1007/978-3-319-16898-2.
Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. A neural algorithm of artistic style. CoRR, abs/1508.06576, 2015. URL http://arxiv.org/abs/1508.06576.
Xiao Han. MR-based synthetic CT generation using a deep convolutional neural network method. Medical Physics, 44(4):1408–1419, 2017.
Alexander S. Kechris. Classical descriptive set theory, volume 156 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1995. ISBN 0-387-94374-9. doi: 10.1007/978-1-4612-4190-4. URL https://doi.org/10.1007/978-1-4612-4190-4.
Durk P. Kingma and Prafulla Dhariwal. Glow: Generative flow with invertible 1x1 convolutions. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (eds.), Advances in Neural Information Processing Systems 31, pp. 10215–10224. Curran Associates, Inc., 2018. URL http://papers.nips.cc/paper/8224-glow-generative-flow-with-invertible-1x1-convolutions.pdf.
Ming-Yu Liu, Thomas Breuel, and Jan Kautz. Unsupervised image-to-image translation networks. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (eds.), Advances in Neural Information Processing Systems 30, pp. 700–708. Curran Associates, Inc., 2017. URL http://papers.nips.cc/paper/6672-unsupervised-image-to-image-translation-networks.pdf.
Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, and Ian J. Goodfellow. Adversarial autoencoders. CoRR, abs/1511.05644, 2015. URL http://arxiv.org/abs/1511.05644.
Sanghyeon Na, Seungjoo Yoo, and Jaegul Choo. MISO: mutual information loss with stochastic style representations for multimodal image-to-image translation. CoRR, abs/1902.03938, 2019. URL http://arxiv.org/abs/1902.03938.
Jens Sjölund, Daniel Forsberg, Mats Andersson, and Hans Knutsson. Generating patient specific pseudo-CT of the head from MR using atlas-based regression. Physics in Medicine & Biology, 60(2):825, 2015.
Frank W. Warner. Foundations of differentiable manifolds and Lie groups, volume 94 of Graduate Texts in Mathematics. Springer-Verlag, New York-Berlin, 1983. ISBN 0-387-90894-3. Corrected reprint of the 1971 edition.
Jelmer M. Wolterink, Anna M. Dinkla, Mark H. F. Savenije, Peter R. Seevinck, Cornelis A. T. van den Berg, and Ivana Isgum. Deep MR to CT synthesis using unpaired data. CoRR, abs/1708.01155, 2017. URL http://arxiv.org/abs/1708.01155.
Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. CoRR, abs/1703.10593, 2017. URL http://arxiv.org/abs/1703.10593.
A BACKGROUND
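Firstly, we very briefly explain the probability theory language we use in this article, and we refer the reader to (Eisner et al., 2015; Bogachev, 2007) for more details. Formally, a measurable space $(X, \mathcal{X})$ is a pair consisting of a set $X$ and a $\sigma$-algebra $\mathcal{X}$ of subsets of $X$. Given a topological space $X$ with topology $\mathcal{U}$, there exists the smallest $\sigma$-algebra which contains all open sets in $\mathcal{U}$; this $\sigma$-algebra is called the Borel $\sigma$-algebra of $X$ and its elements are called Borel sets. A probability space $(X, \mathcal{X}, \mu)$ is a triple of a set $X$, a $\sigma$-algebra $\mathcal{X}$ of subsets of $X$ and a probability measure $\mu$ defined on $\mathcal{X}$. Given a probability space $(X, \mathcal{X}, \mu)$, a measurable set $A \in \mathcal{X}$ is called an atom if $\mu(A) > 0$ and for all measurable $B \subseteq A$ such that $\mu(B) < \mu(A)$ we have $\mu(B) = 0$.

Given measurable spaces $(X, \mathcal{X})$ and $(Y, \mathcal{Y})$, we say that a mapping $f: X \to Y$ is measurable if for any $A \in \mathcal{Y}$ we have $f^{-1}(A) \in \mathcal{X}$. If $(X, \mathcal{X}, \mu)$ and $(Y, \mathcal{Y}, \nu)$ are probability spaces and $f: X \to Y$ is a measurable map, we say that $f$ is measure-preserving if for all $A \in \mathcal{Y}$ we have $\mu(f^{-1}(A)) = \nu(A)$. An approximation argument easily shows that a measurable transformation $\varphi: X \to X$ is measure-preserving if and only if for all nonnegative measurable functions $f$ on $X$ we have

$$\int_X f \circ \varphi \, d\mu = \int_X f \, d\mu.$$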
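Given a probability space $(X, \mathcal{X}, \mu)$, a measurable space $(Y, \mathcal{Y})$ and a measurable map $f: X \to Y$, we define the push-forward measure $f_*\mu$ on $\mathcal{Y}$ by setting $(f_*\mu)(A) := \mu(f^{-1}(A))$ for all $A \in \mathcal{Y}$.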
Let $(X, \mathcal{X}, \mu)$ and $(Y, \mathcal{Y}, \nu)$ be probability spaces and let $f: X \to Y$ be a measure-preserving map. A measurable map $g: Y \to X$ is called an essential inverse of $f$ if $g(f(x)) = x$ for $\mu$-almost every $x \in X$ and $f(g(y)) = y$ for $\nu$-almost every $y \in Y$. One can show that an essential inverse is measure-preserving and uniquely defined up to equality almost everywhere. We say that $f$ is an isomorphism if it admits an essential inverse. An isomorphism $f: X \to X$ is called an automorphism.
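Lemma A.1 (Push-forward property for f-divergences). Let p, q be distributions on $\mathbb{R}^n$ and let $T: \mathbb{R}^n \to \mathbb{R}^n$ be a diffeomorphism. Then for any f-divergence $D_f$,

$$D_f(T_*p \,\|\, T_*q) = D_f(p \,\|\, q).$$

Proof. First of all, the change of variables formula for the integral implies that

$$(T_*p)(y) = p(T^{-1}(y))\,\big|\det J_{T^{-1}}(y)\big|, \qquad (T_*q)(y) = q(T^{-1}(y))\,\big|\det J_{T^{-1}}(y)\big|.$$

Therefore,

$$\begin{aligned} D_f(T_*p \,\|\, T_*q) &= \int_{\mathbb{R}^n} (T_*q)(y)\, f\!\left(\frac{(T_*p)(y)}{(T_*q)(y)}\right) dy = \int_{\mathbb{R}^n} q(T^{-1}(y))\,\big|\det J_{T^{-1}}(y)\big|\, f\!\left(\frac{p(T^{-1}(y))}{q(T^{-1}(y))}\right) dy \\ &\overset{(*)}{=} \int_{\mathbb{R}^n} q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx = D_f(p \,\|\, q), \end{aligned}$$

where the equality in $(*)$ uses the substitution $y = T(x)$ together with a general property of Jacobians of smooth invertible maps, namely that $J_{T^{-1}}(T(x)) = J_T(x)^{-1}$.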
We remind the reader that a Polish space is a separable completely metrizable topological space. A Borel probability space is a Polish space endowed with a probability measure $\mu$ on its Borel $\sigma$-algebra, and we will also say that $\mu$ is a Borel probability measure. Basic examples of Borel probability spaces are the spaces $[0, 1]^n$ with their Borel $\sigma$-algebra, endowed with the Lebesgue measure $\lambda$. The Borel $\sigma$-algebra of the space $[0, 1]^n$ endowed with the Lebesgue measure $\lambda$ can be extended to the $\sigma$-algebra of all Lebesgue-measurable sets, i.e. its completion with respect to $\lambda$.
For the proof of Proposition 2.5 we need the following theorem, see Kechris (1995), Theorem 15.1.

Theorem A.1 (Lusin-Souslin theorem). Let X, Y be Polish spaces and let $f: X \to Y$ be continuous. If $A \subseteq X$ is Borel and $f|_A$ is injective, then $f(A)$ is Borel.
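Proof of Proposition 2.5. Denote the image $f(\mathbb{R}^n)$ by $\operatorname{Im} f$. Then $\operatorname{Im} f \subseteq \mathbb{R}^m$ is a Borel subset, since $\mathbb{R}^n$ is a countable union of compact sets and f is continuous. Furthermore, from the Lusin-Souslin theorem (Theorem A.1) it follows that for every Borel subset $A \subseteq \mathbb{R}^n$ its image $f(A)$ is Borel as well. Pick a point $x_0 \in \mathbb{R}^n$ which is not an atom of $\mu$. We want to define an almost everywhere inverse $g: \mathbb{R}^m \to \mathbb{R}^n$. Define a function

$$g(y) = \begin{cases} f^{-1}(y), & y \in \operatorname{Im} f, \\ x_0, & y \notin \operatorname{Im} f. \end{cases}$$

Using the remark above it is easy to see that g is Borel measurable and that $(f_*\mu)(f(A)) = \mu(A)$ for every Borel A. It follows from the definition that $g \circ f = \mathrm{id}_{\mathbb{R}^n}$ and that $f \circ g$ is the identity on $\operatorname{Im} f$. Since $(f_*\mu)(\operatorname{Im} f) = \mu(\mathbb{R}^n) = 1$, g is an almost everywhere inverse to f. We conclude that f is a probability space isomorphism.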
Secondly, we remind the reader of a couple of notions from differential geometry which we use in the text, and we refer the reader to e.g. (Warner, 1983) for more details. Given a subset X of a manifold M and a subset Y of a manifold N, a function $f: X \to Y$ is said to be smooth if for all $p \in X$ there is a neighborhood $U \subseteq M$ of p and a smooth function $g: U \to N$ such that g extends f, i.e., the restrictions agree: $g|_{U \cap X} = f|_{U \cap X}$. The map f is said to be a diffeomorphism between X and Y if it is bijective, smooth and its inverse is smooth. Let M and N be smooth manifolds. A differentiable mapping $f: M \to N$ is said to be an immersion if the tangent map $df_p: T_pM \to T_{f(p)}N$ is injective for all $p \in M$. If, in addition, f is a homeomorphism onto its image $f(M) \subseteq N$ with the subspace topology induced from N, we say that f is an embedding. If $M \subseteq N$ and the inclusion map $M \hookrightarrow N$ is an embedding, we say that M is a submanifold of N. Thus, the domain of an embedding is diffeomorphic to its image, and the image of an embedding is a submanifold.
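We close this section with a small lemma, explaining how one can weaken the embedding assumption for generative models in Proposition 2.6.

Lemma A.2. Let $f: \mathbb{R}^n \to \mathbb{R}^m$ be an injective manifold immersion. Let $B_R \subset \mathbb{R}^n$ be an open ball of radius $R > 0$ and let $\overline{B}_R$ be its closure. Then $f|_{B_R}: B_R \to \mathbb{R}^m$ is a manifold embedding.

Proof. Since $\overline{B}_R$ is compact and f is continuous, the image of every closed subset $C \subseteq \overline{B}_R$ is compact and hence closed. This shows that the inverse $(f|_{\overline{B}_R})^{-1}: f(\overline{B}_R) \to \overline{B}_R$ is continuous, and thus $f|_{\overline{B}_R}$ is a homeomorphism onto its image. Restricting to the open ball $B_R$, we conclude that $f|_{B_R}$ is an injective immersion which is a homeomorphism onto its image, and thus a manifold embedding.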
As a consequence, for our example with a spherical Gaussian latent vector one can take a sufficiently large ball of radius R > 0 in the latent space, truncating the latent distribution to ‘sufficiently likely’ values. This ball remains invariant under rotations, thus leading to a differentiable automorphism on the submanifold of ‘sufficiently likely’ images.
B BRATS2015 EXPERIMENTS
We present some additional results on the BRATS2015 dataset. For this experiment, U-Net-based generators with residual connections were used. The number of downsampling layers was 4 for
both generators, and skip connections were preserved. We trained all models for 20 epochs with
Adam optimizer and learning rate 0.0002. We trained 4 models with .
No data augmentation was used, so as to avoid creating any additional symmetries. All images were normalized by dividing by their 95th percentile, as is common in medical imaging when working with MR data.
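The 95th-percentile normalization can be sketched in a few lines. The text does not specify the percentile convention or whether statistics are computed per slice or per volume; this sketch assumes per image (a list of rows) and linear interpolation between order statistics, which is the same convention as NumPy's default percentile method. The function names are hypothetical.

```python
def percentile(values, q):
    """q-th percentile with linear interpolation between order statistics
    (the same convention as NumPy's default method)."""
    data = sorted(values)
    if not data:
        raise ValueError("percentile of empty data")
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)  # floor, since pos >= 0
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def normalize_by_percentile(image, q=95.0):
    """Divide all intensities of an image (list of rows) by its q-th percentile."""
    flat = [v for row in image for v in row]
    scale = percentile(flat, q)
    if scale <= 0:
        raise ValueError("non-positive percentile; cannot normalize")
    return [[v / scale for v in row] for row in image]


print(normalize_by_percentile([[0.0, 50.0], [95.0, 100.0]]))
```

Dividing by a high percentile rather than the maximum makes the scale robust to a few very bright voxels, so most intensities land in [0, 1] while outliers may exceed 1.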
We hypothesize that flipping images horizontally is a distribution symmetry. We measure the final
test loss for both the network output (Loss) and its flipped version (Loss (f)), as well as the PSNR
for both translation directions without (PSNR T1-Fl, PSNR Fl-T1) and with horizontal flips (PSNR
T1-Fl (f), PSNR Fl-T1 (f)). We summarize these results in table 1.
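The following sketch illustrates why PSNR need not be invariant under the flip even if flipping is a distribution symmetry. It assumes that the ‘(f)’ metrics compare the horizontally flipped network output against the unflipped target, and uses a data range of 1.0; both are our assumptions, and the 2×3 images are hypothetical. Flipping target and output together leaves PSNR unchanged, while flipping only the output changes it sharply.

```python
import math


def hflip(image):
    """Horizontal flip of an image given as a list of rows."""
    return [list(reversed(row)) for row in image]


def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB: 10 log10(data_range^2 / MSE)."""
    diffs = [(r - e) ** 2 for ref_row, est_row in zip(reference, estimate)
             for r, e in zip(ref_row, est_row)]
    mse = sum(diffs) / len(diffs)
    return math.inf if mse == 0 else 10.0 * math.log10(data_range ** 2 / mse)


# Hypothetical 2x3 'target' and 'network output' with values in [0, 1].
target = [[0.0, 0.2, 0.4], [0.1, 0.5, 0.9]]
output = [[0.0, 0.25, 0.4], [0.1, 0.45, 0.95]]

print(psnr(target, output))                # about 29.03 dB (MSE = 0.00125)
print(psnr(hflip(target), hflip(output)))  # same value: flipping both sides changes nothing
print(psnr(target, hflip(output)))         # much lower: only the output was flipped
```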
We observe that in the absence of identity loss the pure CycleGAN loss demonstrates noticeable symmetry, while the PSNR is clearly not invariant. Increasing the weight of the identity loss term reduces the symmetry, but does not always result in a similar PSNR improvement. We present some
samples from the model with in fig. 4a, fig. 4b.
Table 1: Results on BRATS2015
Figure 4: T1-Flair and Flair-T1 translation samples.