Kernel of CycleGAN as a Principle homogeneous space
2020·arXiv
ABSTRACT

Unpaired image-to-image translation has attracted significant interest due to the invention of CycleGAN, a method which utilizes a combination of adversarial and cycle consistency losses to avoid the need for paired data. It is known that the CycleGAN problem might admit multiple solutions, and our goal in this paper is to analyze the space of exact solutions and to give perturbation bounds for approximate solutions. We show theoretically that the exact solution space is invariant with respect to automorphisms of the underlying probability spaces and, furthermore, that the group of automorphisms acts freely and transitively on the space of exact solutions. We first examine the case of zero ‘pure’ CycleGAN loss in full generality and subsequently extend our analysis to approximate solutions for the ‘extended’ CycleGAN loss, in which an identity loss term is included. In order to demonstrate that these results are applicable, we show that under mild conditions nontrivial smooth automorphisms exist. Furthermore, we provide empirical evidence that neural networks can learn these automorphisms, with unexpected and unwanted results. We conclude that finding optimal solutions to the CycleGAN loss does not necessarily lead to the envisioned result in image-to-image translation tasks and that underlying hidden symmetries can render the result utterly useless.

Machine learning methods for image-to-image translation are widely studied and have applications in several fields. In medical imaging, the CycleGAN has found an important application in translating one modality to another, for instance in MR to CT translation (Han, 2017; Sjölund et al., 2015; Wolterink et al., 2017). Classically, these methods are trained in a supervised setting, which limits their applicability due to a lack of good paired data. Similar issues appear in, e.g., transferring the style of one artist to another (Gatys et al., 2015) or adding snow to sunny California streets (Liu et al., 2017). Unpaired image-to-image translation models such as CycleGAN (Zhu et al., 2017) promise to solve this issue by only enforcing a relationship on a distribution level, thus removing the need for paired data. However, given their widespread use, it is paramount to gain more understanding of their dynamics in order to prevent unexpected behavior such as that reported in (Cohen et al., 2018). As a step in that direction, we explore the solution space of the CycleGAN in the subsequent sections of this paper.

The general task of unpaired domain translation can be informally described as follows: given two probability spaces X and Y which represent our domains, we seek to learn a mapping  G : X → Y

image

Figure 1: CycleGAN model.

such that a sample x ∈ X is mapped to a sample G(x) ∈ Y where

image

The mapping G is typically approximated by a neural network G_θ parametrized by θ. Without paired data, directly solving this is impossible, but on a distribution level it is easily seen that if G solves eq. (1), then the distribution of G(x) as x is sampled from X is equal to that of Y. Mathematically, if X = (X, 𝒳, µ) and Y = (Y, 𝒴, ν) are probability spaces with probability measures µ and ν respectively, this can be written as

image

Or in words, the probability measure ν equals the push-forward measure G∗µ. By the equality case of Jensen’s inequality, we can relate this to a fixed f-divergence D_f:

image

While adversarial optimization techniques such as GANs can in principle solve problem eq. (3), the problem remains under-constrained, so its solutions need not be reasonable solutions to the original problem eq. (1).

The idea behind the cycle consistency condition from (Zhu et al., 2017) is to enforce additional constraints by introducing another function F : Y → X, which is also approximated by a neural network and tries to solve the inverse task: for each y ∈ Y, find F(y) ∈ X that would be the best translation of y to X. Similar to the reasoning above, this condition would imply that

image

The goal is to enforce that F(G(x)) ≈ x for all x ∈ X and, similarly, that G(F(y)) ≈ y for all y ∈ Y, i.e., to minimize the following cycle consistency loss

image

where typically the L1 norm is chosen, but in principle any norm can be used. Zhu et al. (2017) also suggested that an adversarial loss could in principle have been used here as well, but they did not note any performance improvement.
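As a minimal sketch of how the cycle consistency loss described above can be estimated from samples, the following NumPy snippet averages the L1 reconstruction error over two batches. The function name `cycle_consistency_loss` and the toy scaling maps are illustrative assumptions, not part of the original work.

```python
import numpy as np

def cycle_consistency_loss(G, F, xs, ys):
    """Monte Carlo estimate of E||F(G(x)) - x||_1 + E||G(F(y)) - y||_1."""
    # L1 norm per sample (sum over features), then mean over the batch.
    forward = np.abs(F(G(xs)) - xs).sum(axis=1).mean()
    backward = np.abs(G(F(ys)) - ys).sum(axis=1).mean()
    return forward + backward

rng = np.random.default_rng(0)
xs = rng.normal(size=(256, 2))  # stand-in samples from X
ys = rng.normal(size=(256, 2))  # stand-in samples from Y

# Hypothetical toy maps: F is the exact inverse of G, so the loss vanishes.
G = lambda x: 2.0 * x
F = lambda y: 0.5 * y
loss_exact = cycle_consistency_loss(G, F, xs, ys)

# A mismatched pair: F_bad is not the inverse of G, so the loss is positive.
F_bad = lambda y: 0.4 * y
loss_bad = cycle_consistency_loss(G, F_bad, xs, ys)
```

Note that a vanishing cycle term only says that F and G invert each other on the samples; it says nothing about which invertible pair was found.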

Combining these losses, we arrive at the CycleGAN loss defined as

image

where the factor α_cyc > 0 determines the weight of the cycle consistency term. We illustrate the CycleGAN model in fig. 1.
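For orientation, the quantities introduced so far can be written out as below. This is a sketch reconstructed from the surrounding prose, not a verbatim copy of the displayed equations: writing the adversarial terms as f-divergences and the arrangement of the terms are assumptions, and no equation numbers are implied.

```latex
\begin{align*}
G_*\mu &= \nu, \qquad F_*\nu = \mu
  && \text{(distribution matching)}\\
\mathcal{L}_{\mathrm{cyc}}(G, F) &= \mathbb{E}_{x \sim X}\,\lVert F(G(x)) - x \rVert
  + \mathbb{E}_{y \sim Y}\,\lVert G(F(y)) - y \rVert
  && \text{(cycle consistency)}\\
\mathcal{L}(G, F) &= D_f(G_*\mu \,\Vert\, \nu) + D_f(F_*\nu \,\Vert\, \mu)
  + \alpha_{\mathrm{cyc}}\, \mathcal{L}_{\mathrm{cyc}}(G, F)
  && \text{(assumed form of the `pure' loss)}
\end{align*}
```

Under this reading, every term is nonnegative, so a zero loss forces both push-forward conditions as well as F ∘ G = id and G ∘ F = id almost everywhere.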

Precautions with generative models have been addressed before; for example, unpaired image-to-image translation can hallucinate features in medical images (Cohen et al., 2018). Furthermore, it was already noted in (Zhu et al., 2017) that the CycleGAN might admit multiple solutions and that the issue of tint shift in image-to-image translation arises because, for a fixed input image x ∈ X, multiple images y_1, . . . , y_n ∈ Y with different tints might be equally plausible. Adding an identity loss term was suggested in (Zhu et al., 2017) to alleviate the tint shift issue, i.e., the extended CycleGAN loss is defined as

image

where the factor α_id ≥ 0 determines the weight of the identity loss term. In general, to properly define the identity loss one needs to represent both X and Y as being supported on the same manifold, which is limiting if the distributions are substantially different.

The goal of this work is to study the kernel, or null space, of the CycleGAN loss, which is the set of solutions (G, F) which have zero ‘pure’ CycleGAN loss, and to give perturbation bounds for approximate solutions in the case of the extended CycleGAN loss. We carry out the theoretical analysis in section 2. In Proposition 2.1 we show that, under certain assumptions on the probability spaces X, Y, the kernel has symmetries which allow for multiple possible solutions. Furthermore, we show in Proposition 2.2 and the following remarks that the kernel admits a natural structure of a principal homogeneous space, with the automorphism group Aut(X) of X acting on the set of solutions freely and transitively. Next, we extend our analysis to approximate solutions for the extended CycleGAN loss by proving perturbation bounds in Proposition 2.3 and Corollary 2.1. We discuss the existence of automorphisms in Proposition 2.4 and Proposition 2.6. We proceed in section 3 by showing that unexpected symmetries can be learned by a CycleGAN. In particular, when translating a domain to itself, CycleGAN can learn a nontrivial automorphism of the domain. In appendix A, we briefly explain the measure-theoretic language we use heavily in the paper for those readers who are more used to working with distributions, and also remind the reader of some basic notions from differential geometry which we use as well.

2.1 CYCLEGAN KERNEL AS A PRINCIPAL HOMOGENEOUS SPACE

The notions of isomorphism of probability spaces and of probability space automorphisms are central to this paper. Intuitively speaking, an isomorphism f : X → Y of probability spaces X and Y is a bijection between X and Y such that the probability of an event A ⊂ Y equals the probability of the event {x : f(x) ∈ A} ⊂ X. An isomorphism of a probability space to itself is called a probability space automorphism. For example, if our probability space consists of samples from an n-dimensional spherical Gaussian distribution, then any rotation in SO(Rⁿ) is a probability space automorphism. For a precise definition we refer the reader to appendix A.
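The Gaussian example can be checked numerically. The sketch below (helper names are illustrative) verifies that the standard Gaussian log-density is unchanged by a rotation R and that |det R| = 1, which together give that the push-forward density p(R⁻¹y) · |det R⁻¹| equals p(y).

```python
import numpy as np

def std_gaussian_logpdf(x):
    """Log-density of the standard Gaussian on R^n, evaluated row-wise."""
    n = x.shape[-1]
    return -0.5 * (x ** 2).sum(axis=-1) - 0.5 * n * np.log(2.0 * np.pi)

def rotation_2d(theta):
    """Rotation matrix in SO(2) for the angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 2))
R = rotation_2d(0.7)
x_rot = x @ R.T  # apply the rotation to every sample (rows are points)

# The density depends only on ||x||, which a rotation preserves.
density_gap = np.max(np.abs(std_gaussian_logpdf(x_rot) - std_gaussian_logpdf(x)))
det_R = np.linalg.det(R)
```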

Firstly, we prove that if at least one of the probability spaces X, Y admits a nontrivial probability automorphism, then any exact solution in the kernel of CycleGAN can be altered giving a different solution.

Proposition 2.1 (Invariance of the kernel). Let X = (X, 𝒳, µ), Y = (Y, 𝒴, ν) be probability spaces and ϕ : X → X be a probability space automorphism. Let G : X → Y and F : Y → X be measurable maps satisfying

image

Then F, G are probability space isomorphisms and

image

If, furthermore, ϕ ≠ id_X, then

image

Proof. Since ϕ is a probability space automorphism, its inverse ϕ⁻¹ is an automorphism as well. In particular, it is measure-preserving since

image

We note that, by eq. (2) and the positivity of the norms, eq. (6) implies that

image

and

image

Therefore both F and G are isomorphisms. By definition of L,

image

Since (G ◦ ϕ)∗µ = G∗(ϕ∗µ) and ϕ is measure-preserving, eq. (9) implies that (G ◦ ϕ)∗µ = ν. Similarly, (ϕ⁻¹ ◦ F)∗ν = µ since ϕ⁻¹ is measure-preserving as well. This shows that

image

Using eq. (10) and the fact that ϕ⁻¹ ◦ ϕ = ϕ ◦ ϕ⁻¹ = id_X almost everywhere, we conclude that

image

and the proof of eq. (7) is complete. To prove eq. (8), first note that there exists a set A ∈ 𝒳 such that

image

since we assume that ϕ essentially differs from the identity mapping. If G ◦ ϕ = G µ-a.e., then F ◦ G ◦ ϕ = F ◦ G µ-a.e. as well. Since F ◦ G = id_X µ-a.e., this implies that ϕ(x) = x for µ-almost every x, which is a contradiction. In a similar way one can show that ϕ⁻¹ ◦ F essentially differs from F.
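The invariance statement can be illustrated on a toy case. The sketch below assumes X and Y are both the standard Gaussian on R², takes a rotation as an exact solution G with inverse F, and composes with another rotation ϕ. All of these choices are illustrative; only the cycle terms are checked numerically, while the distribution-matching terms hold because rotations preserve the standard Gaussian.

```python
import numpy as np

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def cycle_loss(G, F, xs, ys):
    """Sample estimate of E||F(G(x)) - x||_1 + E||G(F(y)) - y||_1."""
    return (np.abs(F(G(xs)) - xs).sum(axis=1).mean()
            + np.abs(G(F(ys)) - ys).sum(axis=1).mean())

rng = np.random.default_rng(1)
xs = rng.normal(size=(512, 2))  # samples from X
ys = rng.normal(size=(512, 2))  # samples from Y

A, P = rot(0.3), rot(1.1)
G = lambda x: x @ A.T        # exact solution G(x) = A x (rows are points)
F = lambda y: y @ A          # F(y) = A^{-1} y = A^T y
phi = lambda x: x @ P.T      # automorphism phi(x) = P x
phi_inv = lambda x: x @ P    # phi^{-1}(x) = P^T x

# Altered pair from the proposition: (G o phi, phi^{-1} o F).
G_new = lambda x: G(phi(x))
F_new = lambda y: phi_inv(F(y))

loss_old = cycle_loss(G, F, xs, ys)
loss_new = cycle_loss(G_new, F_new, xs, ys)
# Average distance between the two translations of the same input.
shift = np.linalg.norm(G_new(xs) - G(xs), axis=1).mean()
```

Both pairs have (numerically) zero cycle loss, yet they translate the same input to clearly different outputs, which is exactly the ambiguity the proposition describes.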

We provide the following converse to Proposition 2.1.

Proposition 2.2 (Kernel as a principal homogeneous space). Let X = (X, 𝒳, µ), Y = (Y, 𝒴, ν) be probability spaces. Let F : X → Y, G : Y → X and F′ : X → Y, G′ : Y → X be measurable maps satisfying

image

Then there exists a unique probability space automorphism  ϕ : X → X such that

image

For the proof it suffices to take ϕ := G ◦ F′. Combined with Proposition 2.1, this allows us to say that the group Aut(X) of probability space automorphisms of X acts freely and transitively on the set of isomorphisms Iso(X, Y) when the latter set is nonempty. This amounts to saying that the space of solutions of CycleGAN is a principal homogeneous space. It can be helpful to view this result from the abstract category theory point of view: if C is a category and X ∈ C is any fixed object, then for any object Y ∈ C the automorphism group Aut(X) acts on the set of homomorphisms Hom(X, Y) on the right by composition, i.e., we define

image

This action leaves the space of isomorphisms Iso(X, Y) ⊆ Hom(X, Y) invariant, and this restricted action is transitive if Iso(X, Y) is nonempty and, furthermore, free, i.e., α(φ) ≠ φ for all α ≠ id_X and all φ ∈ Iso(X, Y).
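A finite toy model makes the free and transitive action concrete. The sketch below assumes X and Y are both the uniform probability space on four points, so that isomorphisms and automorphisms are exactly the permutations; the setup and helper names are illustrative and not taken from the paper.

```python
from itertools import permutations

n = 4
points = range(n)

def compose(f, g):
    """Return f o g for maps stored as tuples (f[i] is the image of i)."""
    return tuple(f[g[i]] for i in points)

def inverse(f):
    inv = [0] * n
    for i, fi in enumerate(f):
        inv[fi] = i
    return tuple(inv)

# With the uniform measure on n points, every bijection preserves measure.
isos = list(permutations(points))  # Iso(X, Y)
auts = isos                        # Aut(X); X and Y share the same point set

F = isos[5]        # one fixed solution F : X -> Y
G = inverse(F)     # its inverse G : Y -> X

# Orbit of F under the right action phi -> F o phi.
orbit = {compose(F, phi) for phi in auts}

# For every other solution F2 there is exactly one phi with F2 = F o phi,
# and it is phi = G o F2, as in the proof of Proposition 2.2.
unique = all(
    [phi for phi in auts if compose(F, phi) == F2] == [compose(G, F2)]
    for F2 in isos
)
```

The orbit has as many elements as there are automorphisms (freeness) and covers all isomorphisms (transitivity).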

To proceed with our analysis for the case of approximate solutions for the extended CycleGAN loss, we first formulate a useful ‘push-forward property’ for general f-divergences between distributions on Rⁿ. The proof is provided in appendix A.

Lemma 2.1 (Push-forward property for f-divergences). Let p, q be distributions on Rⁿ and ϕ : Rⁿ → Rⁿ be a diffeomorphism. Then for any f-divergence D_f we have

image
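A quick numerical sanity check of this kind of invariance is possible for one particular f-divergence, the Kullback-Leibler divergence, and one particular family of diffeomorphisms, invertible affine maps. The sketch below uses the closed-form KL divergence between Gaussians; the specific matrix, the seed and the function name are assumptions made for illustration.

```python
import numpy as np

def kl_gaussians(m1, S1, m2, S2):
    """Closed-form KL( N(m1, S1) || N(m2, S2) )."""
    n = m1.shape[0]
    S2_inv = np.linalg.inv(S2)
    d = m2 - m1
    _, logdet1 = np.linalg.slogdet(S1)
    _, logdet2 = np.linalg.slogdet(S2)
    return 0.5 * (np.trace(S2_inv @ S1) + d @ S2_inv @ d - n + logdet2 - logdet1)

rng = np.random.default_rng(0)
n = 3
m_p, m_q = rng.normal(size=n), rng.normal(size=n)
B_p, B_q = rng.normal(size=(n, n)), rng.normal(size=(n, n))
S_p = B_p @ B_p.T + np.eye(n)  # symmetric positive definite covariances
S_q = B_q @ B_q.T + np.eye(n)

# Affine diffeomorphism phi(x) = A x + b with det A = 5 (invertible).
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, -1.0],
              [1.0, 0.0, 3.0]])
b = np.array([0.5, -1.0, 2.0])

kl_before = kl_gaussians(m_p, S_p, m_q, S_q)
# Push-forward of N(m, S) under phi is N(A m + b, A S A^T).
kl_after = kl_gaussians(A @ m_p + b, A @ S_p @ A.T,
                        A @ m_q + b, A @ S_q @ A.T)
```

The two values agree up to floating-point error, as expected when both distributions are pushed forward by the same diffeomorphism.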

We are now ready to prove the perturbation bounds for approximate solutions.

Proposition 2.3 (Perturbation bound). Let X, Y be probability spaces with probability densities p_X, p_Y ∈ L¹(Rⁿ) and let ϕ ∈ Aut(X) be a diffeomorphic probability space automorphism. Assume that ϕ⁻¹ is C_ϕ-Lipschitz, where C_ϕ > 0 is some positive constant. Let G : Rⁿ → Rⁿ and F : Rⁿ → Rⁿ be measurable maps. Then the following perturbation bound holds for the extended CycleGAN loss:

image

Proof. Firstly, since ϕ is measure-preserving, D_f((G ◦ ϕ)∗p_X ∥ p_Y) = D_f(G∗p_X ∥ p_Y). Using Lemma 2.1 and the fact that ϕ is measure-preserving again, we see that

image

where the equality (∗) uses the fact that ϕ is measure-preserving. As before, E_{y∼Y}∥G(ϕ(ϕ⁻¹(F(y)))) − y∥ = E_{y∼Y}∥G(F(y)) − y∥ since ϕ ◦ ϕ⁻¹ = id_X almost everywhere.

Finally, since ϕ is a probability space automorphism and ϕ⁻¹ is C_ϕ-Lipschitz, we conclude that

image

Corollary 2.1 (Asymptotic perturbation bound). In the setting of Proposition 2.3, let G_i : Rⁿ → Rⁿ and F_i : Rⁿ → Rⁿ for i ≥ 1 be a sequence of measurable maps such that the ‘pure’ CycleGAN loss

image

and let

image

Corollary 2.1 has a direct practical implication. When using a CycleGAN model for translating between substantially different distributions (such as different medical imaging modalities), one would be forced to pick a small value for α_id in order for the model to produce reasonable results. Furthermore, since the distributions are substantially different, we can expect that L_id ≫ 2 · E_{x∼X}∥ϕ(x) − x∥ for many nontrivial automorphisms ϕ. Therefore, the asymptotic perturbation bound automatically implies that the approximate solution space admits a lot of symmetry, potentially leading to undesirable results.

2.2 EXISTENCE OF AUTOMORPHISMS

By Proposition 2.1 we see that if either space admits a nontrivial probability automorphism, then the CycleGAN problem has multiple solutions. However, for this to be a problem in practice there must actually exist such probability automorphisms, which we shall now show is the case. First of all, we state the following lemma, which says that we can transfer automorphisms from an isomorphic copy of X to X itself.

Lemma 2.2. Let f : Z → X be an isomorphism of probability spaces and T : Z → Z be an automorphism of Z. Then S := f ◦ T ◦ f⁻¹ is an automorphism of X and the diagram

image

commutes. Furthermore, if Z ⊂ Rⁿ, X ⊂ Rᵐ are submanifolds and f, T are diffeomorphisms, then S is a diffeomorphism as well.

Proof. The first claim follows from invertibility of f and T. The second claim follows from the definition of a diffeomorphism between submanifolds, see appendix A.

An important notion in probability theory is that of a Lebesgue probability space. Many probability spaces which emerge in practice, such as [0, 1]ⁿ ⊂ Rⁿ with the Lebesgue measure or Rⁿ with a Gaussian probability distribution, both defined on the respective σ-algebras of Lebesgue measurable sets, are instances of Lebesgue probability spaces.

Definition 2.1. A probability space X is called a Lebesgue probability space if it is isomorphic as a measure space to a disjoint union of ([0, c], λ), where λ is the Lebesgue measure on the σ-algebra of Lebesgue measurable subsets of the interval [0, c], and at most countably many atoms of total mass 1 − c.

Informally speaking, this definition says that Lebesgue probability spaces consist of a continuous part and at most countably many Dirac deltas (=atoms). First of all, we provide an abstract result about existence of nontrivial probability space automorphisms in Lebesgue probability spaces which are either ‘not purely atomic’ or have at least two atoms with equal mass. ‘Not purely atomic’ means that the sum of the probabilities of all atoms is strictly less than 1.

Proposition 2.4. Let X be a Lebesgue probability space such that at least one of the assumptions

1. X is not purely atomic;

2. there exist at least two atoms a_j, a_k in X with equal mass

holds. Then X admits nontrivial automorphisms.

Proof. If the space X is not purely atomic, we have X ≃ [0, c] ⊔ ⨆_{i≥1} a_i for some c > 0, where [0, c] is the continuous part and ⨆_{i≥1} a_i is the atomic part of the probability measure µ. The interval [0, c] admits at least one nontrivial automorphism, namely the transformation x ↦ c − x (leaving the atoms fixed), hence so does X by Lemma 2.2. In fact, there are infinitely many other automorphisms, which can be obtained by exchanging nonoverlapping subintervals (a, a + d), (b, b + d) ⊂ [0, c] of the same length. If there exist two atoms a_j, a_k in X with equal mass, then a transformation which transposes a_j with a_k and keeps the rest of X fixed is a nontrivial automorphism.
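The two kinds of automorphisms of [0, c] used in the proof can be sketched as follows. The snippet takes c = 1, uses the midpoints of a fine regular grid as a stand-in for the Lebesgue measure, and compares histogram masses before and after each map; the interval endpoints and helper names are illustrative assumptions.

```python
import numpy as np

c = 1.0

def reflect(x):
    """The reflection x -> c - x of the interval [0, c]."""
    return c - x

def swap_intervals(x, a=0.1, b=0.6, d=0.2):
    """Exchange the intervals (a, a + d) and (b, b + d); fix all other points."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    in_first = (x > a) & (x < a + d)
    in_second = (x > b) & (x < b + d)
    out[in_first] = x[in_first] + (b - a)
    out[in_second] = x[in_second] - (b - a)
    return out

def histogram_mass(x, bins=10):
    """Fraction of points falling into each of `bins` equal cells of [0, c]."""
    counts, _ = np.histogram(x, bins=bins, range=(0.0, c))
    return counts / len(x)

# Midpoints of 1000 equal cells approximate the uniform measure on [0, c].
grid = (np.arange(1000) + 0.5) / 1000 * c

mass = histogram_mass(grid)
mass_reflect = histogram_mass(reflect(grid))
mass_swap = histogram_mass(swap_intervals(grid))
moved_by_swap = not np.allclose(swap_intervals(grid), grid)
```

Both maps leave the cell masses unchanged while moving points, so they are nontrivial and measure-preserving at the resolution of this check. The interval exchange is discontinuous, which is the kind of automorphism the following discussion treats as hard for a neural network to learn.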

Probability spaces of images which appear in real life typically have a continuous component which would correspond to continuous variations in object sizes, lighting conditions, etc. Therefore, they admit some probability space automorphisms. However, such abstract automorphisms can be highly discontinuous, which would make it questionable if neural networks can learn them. We would like to show that there are also automorphisms which are smooth, at least locally. For this, we first state the following technical claim. The proof is provided in appendix A.

Proposition 2.5. Let µ be a Borel probability measure on Rⁿ and f : Rⁿ → Rᵐ be a continuous injective function. Then f : (Rⁿ, B(Rⁿ), µ) → (Rᵐ, B(Rᵐ), f∗µ) is an isomorphism of probability spaces, where f∗µ denotes the push-forward of the measure µ to Rᵐ.

Finally, we show the existence of smooth automorphisms under the assumption that our data manifold D ⊂ Rᵐ can be generated by embedding Rⁿ with the standard Gaussian measure into Rᵐ as a submanifold. We write γ_n for the standard Gaussian probability measure on the space Rⁿ.

Proposition 2.6. Let Z := (Rⁿ, B(Rⁿ), γ_n) be an n-dimensional standard Gaussian distribution. Let f : Rⁿ → Rᵐ be a manifold embedding. Denote by X the probability space (Rᵐ, B(Rᵐ), f∗γ_n). Then the following assertions hold:

1. f is an isomorphism of probability spaces when viewed as a map  Z → X;

2. every rotation T ∈ SO(Rⁿ) is a probability space automorphism and a diffeomorphism of Z. T induces a probability space automorphism of X which is, additionally, a diffeomorphism when restricted to Im f ⊂ Rᵐ.

Proof. The first claim follows directly from Proposition 2.5. For the second part, it is clear that rotations in SO(Rⁿ) preserve the isotropic Gaussian distribution, and the rest follows from Lemma 2.2.
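The construction S = f ◦ T ◦ f⁻¹ can be tried out on a toy embedding. In the sketch below, f is the graph of a smooth function R² → R (a hypothetical choice standing in for a generative model), T is a rotation of the latent space, and we check that the diagram commutes and that S maps the data manifold Im f to itself.

```python
import numpy as np

def f(z):
    """Toy embedding R^2 -> R^3: the graph of (z1, z2) -> sin(z1) + z2^2."""
    z = np.atleast_2d(z)
    return np.column_stack([z[:, 0], z[:, 1], np.sin(z[:, 0]) + z[:, 1] ** 2])

def f_inv(x):
    """Inverse of f on Im f: drop the third coordinate."""
    return np.atleast_2d(x)[:, :2]

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

T = lambda z: np.atleast_2d(z) @ rot(0.9).T  # rotation of the latent space
S = lambda x: f(T(f_inv(x)))                 # induced map S = f o T o f^{-1}

rng = np.random.default_rng(0)
z = rng.normal(size=(500, 2))  # samples from Z (standard Gaussian)
x = f(z)                       # samples from X = f_* gamma_n

Sx = S(x)
# Commuting diagram: S o f = f o T.
diagram_gap = np.max(np.abs(Sx - f(T(z))))
# S(x) stays on the manifold: its third coordinate matches the graph equation.
on_manifold_gap = np.max(np.abs(Sx[:, 2] - (np.sin(Sx[:, 0]) + Sx[:, 1] ** 2)))
# S is not the identity on the data.
moved = np.linalg.norm(Sx - x, axis=1).mean()
```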

image

The connection with generative models is clear if we take f to be an invertible generative model such as RealNVP (Dinh et al., 2016) or Glow (Kingma & Dhariwal, 2018). The assumption of a manifold embedding in the proposition can be seen as too limiting in general, and we explain how to ‘bypass’ it in Lemma A.2 for interested readers. In conclusion, if we assume that the distributions we are working with can be represented by an invertible generative model, then there exists a rich space of automorphisms. Given the success of, e.g., Glow, this assumption seems to be valid for natural images.

Since we have established that the existence of automorphisms can negatively impact the results of CycleGAN, we now demonstrate how this can happen by considering a toy case with a known solution and demonstrating that CycleGAN can and does learn a nontrivial automorphism. The toy experiment which we perform is the translation of the MNIST dataset to itself. That is, at training time we pick two minibatches batch_A and batch_B from MNIST at random and use these as samples from X and Y respectively. The generator neural network in this case is a convolutional autoencoder with residual blocks, a fully connected layer in the bottleneck and no skip connections from encoder to decoder. We also train a simple CNN for MNIST classification in order to classify CycleGAN outputs. The networks were trained using SGD. The ‘natural’ transformation in this case is, of course, the identity mapping, and we expect the classification of the inputs and outputs to stay the same. But we shall see that this is not the case.

In fig. 2a–fig. 2h we show some examples of the generated fake samples and the reconstructions on the test set. In fig. 3a–fig. 3b we provide the confusion matrices for the A2B and B2A generators respectively. We use these matrices to understand whether, e.g., the class of the transformed image for A2B translation equals the source class, or whether it is a random variable independent of the source class, or whether we can spot some deterministic permutation of classes. We have observed that in practice the identity mapping is not learned. Instead, the network leans towards producing a certain permutation of digits, rather than the identity or a random assignment of classes independent of the source label. One explanation would be as follows. Suppose that we can perfectly disentangle class and style in the latent digit representation (Makhzani et al., 2015). Then any permutation in S_10, acting on the class part of the latent code, determines a probability space automorphism on the space of digits, which can be learned by a neural network. Further investigation of the confusion matrices reveals that the networks introduce short cycles, e.g., mapping 2 to 6 and vice versa.
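The kind of inspection described here, reading a class permutation and its cycles off a confusion matrix, can be sketched as follows. The confusion matrix in the example is synthetic and purely hypothetical (it is not the data behind fig. 3), and the helper names are illustrative.

```python
import numpy as np

def dominant_mapping(confusion):
    """For each source class (row), the most frequent predicted class."""
    return np.argmax(confusion, axis=1)

def is_permutation(mapping):
    return sorted(mapping.tolist()) == list(range(len(mapping)))

def cycles(mapping):
    """Cycle decomposition of a permutation given as an index array."""
    seen, result = set(), []
    for start in range(len(mapping)):
        if start in seen:
            continue
        cycle, k = [], start
        while k not in seen:
            seen.add(k)
            cycle.append(k)
            k = int(mapping[k])
        result.append(tuple(cycle))
    return result

# Hypothetical row-normalized confusion matrix (NOT the paper's data):
# all digits are kept except that 2 and 6 are exchanged.
perm = np.array([0, 1, 6, 3, 4, 5, 2, 7, 8, 9])
confusion = np.full((10, 10), 0.02)
confusion[np.arange(10), perm] = 0.82

mapping = dominant_mapping(confusion)
nontrivial = [cyc for cyc in cycles(mapping) if len(cyc) > 1]
```

A dominant mapping that is a permutation with nontrivial cycles is what distinguishes a learned class automorphism from both the identity and a class-independent random assignment.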

We provide additional experiments on the BRATS2015 dataset in appendix B, where we show that in the absence of identity loss the pure CycleGAN loss demonstrates noticeable symmetry, while the PSNR is clearly not invariant. Increasing the weight of the identity loss term reduces the symmetry, but does not necessarily result in a similar PSNR improvement.

image

Figure 2: Examples on the MNIST2MNIST task. (a)-(d) A2A translation: the first column shows samples from A, the second column ‘fake B’ and the third column reconstructions of the original samples from A. (e)-(h) The same for B2B translation.

image

Figure 3: Normalized confusion matrices for A2B and B2A generator respectively.

We have shown theoretically that under mild assumptions, the kernel of the CycleGAN admits nontrivial symmetries and has a natural structure of a principal homogeneous space. To show empirically that such symmetries can be learned, we have trained a CycleGAN on the task of translating a domain to itself. In particular, we show that on the MNIST2MNIST task, in contrast to the expected identity, the CycleGAN learns to permute the digits. We have therefore effectively shown that it is not the CycleGAN loss which prevents this from occurring more often, and we hypothesize that the network architecture also has a major influence. We advocate against the usage of CycleGAN when translating between substantially different distributions in critical tasks such as medical imaging, given the theoretical results in Corollary 2.1 which suggest ambiguity of solutions, even in the presence of the identity loss term.

We would like to point out that some work has been done recently on extending the CycleGAN. For example, in Na et al. (2019) the authors argue that many image-to-image translation tasks are ‘multimodal’ in the sense that there are multiple equally plausible outputs for a single input image, and that one should therefore explicitly model this uncertainty. To address this issue, the authors design a network which has two ‘style’ encoders E_X : X → Z_X, E_Y : Y → Z_Y, two discriminators for each domain, two conditional encoders for each direction E_XY : X × Z_Y → Z_XY, E_YX : Y × Z_X → Z_YX and two generators for each direction G_XY : Z_XY → Y, G_YX : Z_YX → X. The style encoders serve to extract the ‘style’ of the image, which is present in both domains; e.g., in the case of the ‘female-to-male’ task on the CelebA dataset the style would correspond to coarsely represented facial features. The loss term forces the mutual information between the style vector of the translated image and the input style to the conditional encoder to be maximized. This allows the network to roughly preserve the style in the translation. While we leave a full analysis of this approach for future work, we expect that such a loss would reduce the ambiguity in the solution space to those isomorphisms which differ by automorphisms from the set

image

leaving the style fixed, since replacing G_YX with ϕ ◦ G_YX and E_XY with E_XY ◦ ϕ⁻¹ does not change the loss value for such ϕ. Therefore, the reduction in uncertainty of our solution depends on the capacity of the encoder E_X and, ideally, should be quantified. In particular, one might still need to enforce additional problem-specific features in the encoder E_X to guarantee that important image style content is preserved.

V. I. Bogachev. Measure theory. Vol. I, II. Springer-Verlag, Berlin, 2007. ISBN 978-3-540-34513-8; 3-540-34513-2. doi: 10.1007/978-3-540-34514-5. URL https://doi.org/10.1007/978-3-540-34514-5.

Joseph Paul Cohen, Margaux Luck, and Sina Honari. How to Cure Cancer (in images) with Unpaired Image Translation. In Medical Imaging with Deep Learning (MIDL), volume 1, pp. 1–3, 2018.

Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using real NVP. CoRR, abs/1605.08803, 2016. URL http://arxiv.org/abs/1605.08803.

Tanja Eisner, Bálint Farkas, Markus Haase, and Rainer Nagel. Operator theoretic aspects of ergodic theory, volume 272 of Graduate Texts in Mathematics. Springer, Cham, 2015. ISBN 978-3-319-16897-5; 978-3-319-16898-2. doi: 10.1007/978-3-319-16898-2. URL https://doi.org/10.1007/978-3-319-16898-2.

Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. A neural algorithm of artistic style. CoRR, abs/1508.06576, 2015. URL http://arxiv.org/abs/1508.06576.

Xiao Han. MR-based synthetic CT generation using a deep convolutional neural network method. Medical Physics, 44(4):1408–1419, 2017.

Alexander S. Kechris. Classical descriptive set theory, volume 156 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1995. ISBN 0-387-94374-9. doi: 10.1007/978-1-4612-4190-4. URL https://doi.org/10.1007/978-1-4612-4190-4.

Durk P Kingma and Prafulla Dhariwal. Glow: Generative flow with invertible 1x1 convolutions. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (eds.), Advances in Neural Information Processing Systems 31, pp. 10215–10224. Curran Associates, Inc., 2018. URL http://papers.nips.cc/paper/8224-glow-generative-flow-with-invertible-1x1-convolutions.pdf.

Ming-Yu Liu, Thomas Breuel, and Jan Kautz. Unsupervised image-to-image translation networks. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (eds.), Advances in Neural Information Processing Systems 30, pp. 700–708. Curran Associates, Inc., 2017. URL http://papers.nips.cc/paper/6672-unsupervised-image-to-image-translation-networks.pdf.

Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, and Ian J. Goodfellow. Adversarial autoencoders. CoRR, abs/1511.05644, 2015. URL http://arxiv.org/abs/1511.05644.

Sanghyeon Na, Seungjoo Yoo, and Jaegul Choo. MISO: mutual information loss with stochastic style representations for multimodal image-to-image translation. CoRR, abs/1902.03938, 2019. URL http://arxiv.org/abs/1902.03938.

Jens Sjölund, Daniel Forsberg, Mats Andersson, and Hans Knutsson. Generating patient specific pseudo-CT of the head from MR using atlas-based regression. Physics in Medicine & Biology, 60(2):825, 2015.

Frank W. Warner. Foundations of differentiable manifolds and Lie groups, volume 94 of Graduate Texts in Mathematics. Springer-Verlag, New York-Berlin, 1983. ISBN 0-387-90894-3. Corrected reprint of the 1971 edition.

Jelmer M. Wolterink, Anna M. Dinkla, Mark H. F. Savenije, Peter R. Seevinck, Cornelis A. T. van den Berg, and Ivana Isgum. Deep MR to CT synthesis using unpaired data. CoRR, abs/1708.01155, 2017. URL http://arxiv.org/abs/1708.01155.

Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. CoRR, abs/1703.10593, 2017. URL http://arxiv.org/abs/1703.10593.

A BACKGROUND

Firstly, we very briefly explain the probability theory language we use in this article, and we refer the reader to (Eisner et al., 2015; Bogachev, 2007) for more details. Formally, a measurable space (X, 𝒳) is a pair of a set X and a σ-algebra 𝒳 of subsets of X. Given a topological space X with topology U, there exists the smallest σ-algebra B(X) which contains all open sets in U. This σ-algebra is called the Borel σ-algebra of X and its elements are called Borel sets. A probability space X = (X, 𝒳, µ) is a triple of a set X, a σ-algebra 𝒳 of subsets of X and a probability measure µ defined on the σ-algebra 𝒳. Given a probability space (X, 𝒳, µ), a measurable set A ∈ 𝒳 is called an atom if µ(A) > 0 and for all measurable B ⊂ A such that µ(B) < µ(A) we have µ(B) = 0. Given measurable spaces (X, 𝒳) and (Y, 𝒴), we say that a mapping φ : X → Y is measurable if for any A ∈ 𝒴 we have φ⁻¹(A) ∈ 𝒳. If X = (X, 𝒳, µ) and Y = (Y, 𝒴, ν) are probability spaces and φ : X → Y is a measurable map, we say that φ is measure-preserving if for all A ∈ 𝒴 we have µ(φ⁻¹(A)) = ν(A). An approximation argument easily shows that a measurable transformation φ : X → X is measure-preserving if and only if for all nonnegative measurable functions f on X we have

image

Given a probability space X, a measurable space (Y, 𝒴) and a measurable map φ : X → Y, we define the push-forward measure φ∗µ on Y by setting (φ∗µ)(A) := µ(φ⁻¹(A)) for all A ∈ 𝒴.
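For a measure with finitely many atoms, the push-forward definition reduces to summing the mass of all points with the same image. The following sketch (with an illustrative three-point measure and hypothetical maps) shows both a non-injective map, which merges atoms, and a bijection swapping two atoms of equal mass, which preserves the measure.

```python
from collections import defaultdict
from fractions import Fraction

def push_forward(mu, phi):
    """Push-forward of a finite measure: (phi_* mu)({y}) = mu(phi^{-1}({y}))."""
    nu = defaultdict(Fraction)
    for x, mass in mu.items():
        nu[phi(x)] += mass
    return dict(nu)

# A probability measure on three points, stored as point -> mass.
mu = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}

# A non-injective map merges the atoms 1 and 2.
collapse = lambda x: min(x, 1)
nu = push_forward(mu, collapse)

# Swapping the two atoms of equal mass leaves the measure unchanged.
swap = lambda x: {0: 0, 1: 2, 2: 1}[x]
mu_swapped = push_forward(mu, swap)
```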

Let (X, 𝒳, µ) and (Y, 𝒴, ν) be probability spaces and f : X → Y be a measure-preserving map. A measurable map g : Y → X is called an essential inverse of f if f ◦ g = id_Y for ν-almost every y ∈ Y and g ◦ f = id_X for µ-almost every x ∈ X. One can show that an essential inverse is measure-preserving and uniquely defined up to equality almost everywhere. We say that f is an isomorphism if it admits an essential inverse. An isomorphism f : X → X is called an automorphism.

Lemma A.1 (Push-forward property for f-divergences). Let p, q be distributions on Rⁿ and ϕ : Rⁿ → Rⁿ be a diffeomorphism. Then for any f-divergence D_f we have

image

Proof. First of all, change of variables formula for the integral implies that

image

Therefore,

image

where the equality in (∗) uses a general property of Jacobians of smooth invertible maps that

image

We remind the reader that a Polish space is a separable completely metrizable topological space. A Borel probability space is a Polish space endowed with a probability measure µ on its Borel σ-algebra, and we will also say that µ is a Borel probability measure. Basic examples of Borel probability spaces are, e.g., the spaces [0, 1]ⁿ ⊂ Rⁿ with the Borel σ-algebra B(Rⁿ), endowed with the Lebesgue measure λⁿ. The Borel σ-algebra of the space [0, 1]ⁿ endowed with the Lebesgue measure λⁿ can be extended by adding all λⁿ-measurable sets, leading to the σ-algebra of Lebesgue-measurable sets.

For the proof of Proposition 2.5 we need the following theorem, see Kechris (1995), Theorem 15.1.

Theorem A.1 (Lusin-Souslin theorem). Let X, Y be Polish spaces and f : X → Y be continuous. If A ⊂ X is Borel and f|_A is injective, then f(A) is Borel.

Proof of Proposition 2.5. Denote the image f(Rⁿ) ⊂ Rᵐ by Im f. Then Im f ⊂ Rᵐ is a Borel subset, since Rⁿ is a countable union of compact sets and f is continuous. Furthermore, from the Lusin-Souslin theorem (theorem A.1) it follows that for every Borel subset A ⊂ Rⁿ its image f(A) ⊂ Rᵐ is Borel as well. Pick a point x_0 ∈ Rⁿ which is not an atom of µ. We want to define an almost everywhere inverse f̃ of f. Define a function f̃ : Rᵐ → Rⁿ by

image

Using the remark above it is easy to see that f̃ is Borel measurable and that (f∗µ)(f̃⁻¹(A)) = µ(A) for every Borel A. It follows from the definition that f̃ ◦ f = id_{Rⁿ} and that

image

Since (f∗µ)(Im f) = 1, f̃ is an almost everywhere inverse to f. We conclude that f is a probability space isomorphism.

Secondly, we remind the reader of a couple of notions from differential geometry which we use in the text, and we refer the reader to, e.g., (Warner, 1983) for more details. Given a subset X of a manifold M and a subset Y of a manifold N, a function f : X → Y is said to be smooth if for all p ∈ X there is a neighborhood U ⊂ M of p and a smooth function g : U → N such that g extends f, i.e., the restrictions agree: g|_{U∩X} = f|_{U∩X}. The map f is said to be a diffeomorphism between X and Y if it is bijective, smooth and its inverse is smooth. Let M and N be smooth manifolds. A differentiable mapping f : M → N is said to be an immersion if the tangent map d_p f : T_p M → T_{f(p)} N is injective for all p ∈ M. If, in addition, f is a homeomorphism onto f(M) ⊂ N, where f(M) carries the subspace topology induced from N, we say that f is an embedding. If M ⊂ N and the inclusion map ı : M → N is an embedding, we say that M is a submanifold of N. Thus, the domain of an embedding is diffeomorphic to its image, and the image of an embedding is a submanifold.

We close this section with a small lemma, explaining how one can weaken the embedding assumption for generative models in Proposition 2.6.

Lemma A.2. Let f : Rⁿ → Rᵐ be an injective manifold immersion. Let B_R ⊂ Rⁿ be an open ball of radius R > 0 in Rⁿ and B̄_R be its closure. Then f : B_R → f(B_R) is a manifold embedding.

Proof. Since B̄_R is compact and f is continuous, the image of every closed subset A ⊆ B̄_R is compact and hence closed. This shows that f⁻¹ : f(B̄_R) → B̄_R is continuous and thus f : B̄_R → f(B̄_R) is a homeomorphism. Restricting to the open ball B_R ⊂ B̄_R, we conclude that f : B_R → f(B_R) is a homeomorphism and thus a manifold embedding.

As a consequence, for our example with a spherical Gaussian latent vector one can take a sufficiently large ball of radius R > 0 in the latent space, truncating the latent distribution to ‘sufficiently likely’ values. This ball remains invariant under rotations, thus leading to a differentiable automorphism on the submanifold of ‘sufficiently likely’ images.

B BRATS2015 EXPERIMENTS

We present some additional results on the BRATS2015 dataset. For this experiment Unet-based generators with residual connections were used. The number of downsampling layers was 4 for both generators, and skip connections were preserved. We trained all models for 20 epochs with the Adam optimizer and learning rate 0.0002. We trained 4 models with α_id ∈ {0.0, 10.0, 20.0, 40.0}. No data augmentation was used so as to avoid creating any additional symmetries. All images were normalized by dividing by the 95th percentile, as is common in medical imaging when working with MR data.

We hypothesize that flipping images horizontally is a distribution symmetry. We measure the final test loss for both the network output (Loss) and its flipped version (Loss (f)), as well as the PSNR for both translation directions without (PSNR T1-Fl, PSNR Fl-T1) and with horizontal flips (PSNR T1-Fl (f), PSNR Fl-T1 (f)). We summarize these results in table 1.
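A minimal sketch of this flipped evaluation is given below. It assumes images scaled to a peak value of 1 and uses the standard definition PSNR = 10 · log10(MAX² / MSE); the peak value, the synthetic images and the function names are assumptions and do not reproduce the evaluation code behind table 1.

```python
import numpy as np

def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB, assuming intensities in [0, max_value]."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return np.inf
    return 10.0 * np.log10(max_value ** 2 / mse)

def psnr_with_flip(pred, target, max_value=1.0):
    """PSNR of the network output and of its horizontally flipped version."""
    flipped = pred[..., ::-1]  # reverse the last (width) axis
    return psnr(pred, target, max_value), psnr(flipped, target, max_value)

rng = np.random.default_rng(0)
target = rng.uniform(size=(16, 16))  # hypothetical ground-truth image
output = target + 0.1                # hypothetical output with a constant offset

psnr_plain, psnr_flipped = psnr_with_flip(output, target)
```

For this synthetic pair the unflipped PSNR is 20 dB (the MSE is 0.01), while the flipped output scores lower: a per-pixel metric such as PSNR is sensitive to the flip even when a distribution-level loss is not.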

We observe that in the absence of identity loss the pure CycleGAN loss demonstrates noticeable symmetry, while the PSNR is clearly not invariant. Increasing the weight of the identity loss term reduces the symmetry, but does not always result in a similar PSNR improvement. We present some samples from the model with α_id = 0 in fig. 4a, fig. 4b.

Table 1: Results on BRATS2015

image

image

Figure 4: T1-Flair and Flair-T1 translation samples.

