Learned SVD: solving inverse problems via hybrid autoencoding

2019·arXiv

Abstract

REFERENCES

[1] Problems, 33 (2017), p. 124007, https://doi.org/10.1088/1361-6420/aa9581.

[2] H. K. Aggarwal, M. P. Mani, and M. Jacob, Modl: Model-based deep learning architecture for inverse problems, IEEE transactions on medical imaging, 38 (2018), pp. 394–405.

[3] S. G. Armato III, G. McLennan, L. Bidaut, M. F. McNitt-Gray, C. R. Meyer, A. P. Reeves, B. Zhao, D. R. Aberle, C. I. Henschke, E. A. Hoffman, et al., The lung image database consortium (lidc) and image database resource initiative (idri): a completed reference database of lung nodules on ct scans, Medical physics, 38 (2011), pp. 915–931.

[4] models, Acta Numerica, 28 (2019), pp. 1–174, https://doi.org/10.1017/S0962492919000059.

[5] Y. Bengio, A. Courville, and P. Vincent, Representation learning: A review and new perspectives, IEEE transactions on pattern analysis and machine intelligence, 35 (2013), pp. 1798–1828.

[6] P. Benner, S. Gugercin, and K. Willcox, A Survey of Projection-Based Model Reduction Methods for Parametric Dynamical Systems, SIAM Review, 57 (2015), pp. 483–531, https://doi.org/10.1137/130932715, http://epubs.siam.org/doi/10.1137/130932715.

[7] M. Benning and M. Burger, Modern regularization methods for inverse problems, Acta Numerica, 27 (2018), pp. 1–111.

[8] K. Bhattacharya, B. Hosseini, N. B. Kovachki, and A. M. Stuart, Model reduction and neural networks for parametric pdes, arXiv preprint arXiv:2005.03180, (2020).

[9] in Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, vol. 53, Institut Henri Poincaré, 2017, pp. 1–26.

[10] Y. E. Boink, M. Haltmeier, S. Holman, and J. Schwab, Data-consistent neural networks for solving nonlinear inverse problems, arXiv:2003.11253, (2020).

[11] Y. E. Boink, M. J. Lagerwerf, W. Steenbergen, S. A. van Gils, S. Manohar, and C. Brune, A framework for directional and higher-order reconstruction in photoacoustic tomography, Physics in Medicine & Biology, 63 (2018), p. 045018.

[12] Y. E. Boink, S. Manohar, and C. Brune, A Partially Learned Algorithm for Joint Photoacoustic Reconstruction and Segmentation, IEEE Transactions on Medical Imaging, (2019), pp. 1–11, https://doi.org/10.1109/TMI.2019.2922026.

[13] S. L. Brunton, J. L. Proctor, and J. N. Kutz, Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proceedings of the National Academy of Sciences, 113 (2016), pp. 3932–3937, https://doi.org/10.1073/pnas.1517384113.

[14] T. Bui-Thanh, K. Willcox, and O. Ghattas, Model reduction for large-scale systems with high-dimensional parametric input space, SIAM Journal on Scientific Computing, 30 (2008), pp. 3270–3288.

[15] H. Chen, Y. Zhang, Y. Chen, J. Zhang, W. Zhang, H. Sun, Y. Lv, P. Liao, J. Zhou, and IEEE Trans. Med. Imaging, 37 (2018), pp. 1333–1347, https://doi.org/10.1109/TMI.2018.2805692, https://arxiv.org/abs/1707.09636.

[16] J. Chung and M. Chung, An efficient approach for computing optimal low-rank regularized inverse matrices, Inverse Problems, 30 (2014), p. 114009.

[17] Algebra and its Applications, 468 (2015), pp. 260–269.

[18] C. Eckart and G. Young, The approximation of one matrix by another of lower rank, Psychometrika, 1 (1936), pp. 211–218.

[19] H. W. Engl, M. Hanke, and A. Neubauer, Regularization of inverse problems, vol. 375, Springer Science & Business Media, 1996.

[20] S. N. Evans and P. B. Stark, Inverse problems as statistics, Inverse Problems, 18 (2002), p. 201, https://doi.org/10.1088/0266-5611/18/4/201.

[21] P. T. Fletcher, C. Lu, S. M. Pizer, and S. Joshi, Principal geodesic analysis for the study of nonlinear statistics of shape, IEEE transactions on medical imaging, 23 (2004), pp. 995–1005.

[22] T. W. Gamelin and R. E. Greene, Introduction to topology, Dover Publications, 1999, https://www.maa.org/press/maa-reviews/introduction-to-topology.

[23] G. Golub and W. Kahan, Calculating the singular values and pseudo-inverse of a matrix, Journal of the Society for Industrial and Applied Mathematics, Series B: Numerical Analysis, 2 (1965), pp. 205–224.

[24] K. Gupta, B. Bhowmick, and A. Majumdar, Motion blur removal via coupled autoencoder, in 2017 IEEE International Conference on Image Processing (ICIP), IEEE, 2017, pp. 480–484.

[25] A. Hauptmann, B. Cox, F. Lucka, N. Huynh, M. Betcke, P. Beard, and S. Arridge, Approximate k-Space Models and Deep Learning for Fast Photoacoustic Reconstruction, vol. 11074, Springer International Publishing, 2018, https://doi.org/10.1007/978-3-030-00129-2, http://link.springer.com/10.1007/978-3-030-00129-2.

[26] G. E. Hinton, Reducing the Dimensionality of Data with Neural Networks, Science, 313 (2006), pp. 504–507, https://doi.org/10.1126/science.1127647.

[27] K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, Deep Convolutional Neural Network for Inverse Problems in Imaging, IEEE Transactions on Image Processing, 26 (2017), pp. 4509–4522, https://doi.org/10.1109/TIP.2017.2713099.

[28] J. P. Kaipio and E. Somersalo, Statistical and Computational Inverse Problems, vol. 160 of Applied Mathematical Sciences, Springer-Verlag, New York, 2005, https://doi.org/10.1007/b138659.

[29] A. C. Kak, M. Slaney, and G. Wang, Principles of computerized tomographic imaging, Medical Physics, 29 (2002), pp. 107–107, https://doi.org/10.1118/1.1455742, https://aapm.onlinelibrary.wiley.com/doi/abs/10.1118/1.1455742.

[30] E. Kobler, T. Klatzer, K. Hammernik, and T. Pock, Variational Networks: Connecting Variational Methods and Deep Learning, in German Conference on Pattern Recognition (GCPR), V. Roth and T. Vetter, eds., vol. 10496 of Lecture Notes in Computer Science, Springer International Publishing, Cham, 2017, pp. 281–293, https://doi.org/10.1007/978-3-319-66709-6_23.

[31] L. Le, A. Patterson, and M. White, Supervised autoencoders: Improving generalization performance with unsupervised regularizers, in Neural Information Processing Systems (NeurIPS), S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, eds., Curran Associates, Inc., 2018, pp. 107–117, http://papers.nips.cc/paper/7296-supervised-autoencoders-improving-generalization-performance-with-unsupervised-regularizers.pdf.

[32] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, 86 (1998), pp. 2278–2324, https://doi.org/10.1109/5.726791.

[33] The lodopab-ct dataset: A benchmark dataset for low-dose ct reconstruction methods, arXiv preprint arXiv:1910.01113, (2019).

[34] R. M. Lewitt, Multidimensional digital image representations using generalized kaiser–bessel window functions, JOSA A, 7 (1990), pp. 1834–1846.

[35] H. Li, J. Schwab, S. Antholzer, and M. Haltmeier, Nett: Solving inverse problems with deep neural networks, Inverse Problems, (2020).

[36] S. Lunz, A. Hauptmann, T. Tarvainen, C.-B. Schönlieb, and S. Arridge, On learned operator correction, arXiv:2005.07069, (2020).

[37] , in Advances in Neural Information Processing Systems, 2018, pp. 8507–8516.

[38] E. Qian, B. Kramer, A. N. Marques, and K. E. Willcox, Transform & Learn: A data-driven approach to nonlinear model reduction, in AIAA Aviation 2019 Forum, no. June, Reston, Virginia, jun 2019, American Institute of Aeronautics and Astronautics, pp. 1–11, https://doi.org/10.2514/6.2019-3707.

[39] M. Raissi, P. Perdikaris, and G. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Comp. Physics, 378 (2019), pp. 686–707, https://doi.org/10.1016/j.jcp.2018.10.045.

[40] M. Ranzato, C. Poultney, S. Chopra, and Y. L. Cun, Efficient learning of sparse representations with an energy-based model, in Adv. in neural information processing systems, 2007, pp. 1137–1144.

[41] S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio, Contractive auto-encoders: Explicit invariance during feature extraction, in Proceedings of the 28th International Conference on Machine Learning, Omnipress, 2011, pp. 833–840.

[42] L. I. Rudin, S. Osher, and E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D: nonlinear phenomena, 60 (1992), pp. 259–268.

[43] W. Rudin, Principles of Mathematical Analysis, McGraw-Hill, New York, 3rd ed., 1976.

[44] S. H. Rudy, S. L. Brunton, J. L. Proctor, and J. N. Kutz, Data-driven discovery of partial differential equations, Science Advances, 3 (2017), p. e1602614, https://doi.org/10.1126/sciadv.1602614.

[45] Nonlinear component analysis as a kernel eigenvalue problem, Neural computation, 10 (1998), pp. 1299–1319.

[46] J. Schwab, S. Antholzer, and M. Haltmeier, Deep null space learning for inverse problems: convergence analysis and rates, Inverse Problems, (2018).

[47] O. Senouf, S. Vedula, T. Weiss, A. Bronstein, O. Michailovich, and M. Zibulevsky, Self-supervised learning of inverse problem solvers in medical imaging, in Domain Adaptation and Representation Transfer and Medical Image Learning with Less Labels and Imperfect Data, Springer, 2019, pp. 111–119.

[48] B. Sim, G. Oh, S. Lim, and J. C. Ye, Optimal transport, cyclegan, and penalized ls for unsupervised learning in inverse problems, arXiv:1909.12116, (2019).

[49] autoencoders, in Advances in neural information processing systems, 2016, pp. 3738–3746.

[50] A. M. Stuart, Inverse problems: A Bayesian perspective, Acta Numerica, 19 (2010), pp. 451–559, https://doi.org/10.1017/S0962492910000061.

[51] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion, Journal of Machine Learning Research, 11 (2010), pp. 3371–3408, http://www.jmlr.org/papers/volume11/vincent10a/vincent10a.pdf.

[52] G. Yu, G. Sapiro, and S. Mallat, Solving Inverse Problems With Piecewise Linear Estimators: From Gaussian Mixture Models to Structured Sparsity, IEEE Transactions on Image Processing, 21 (2012), pp. 2481–2499, https://doi.org/10.1109/TIP.2011.2176743.

[53] K. Zeng, J. Yu, R. Wang, C. Li, and D. Tao, Coupled Deep Autoencoder for Single Image Super-Resolution, IEEE Transactions on Cybernetics, 47 (2017), pp. 27–37, https://doi.org/10.1109/TCYB.2015.2501373, http://ieeexplore.ieee.org/document/7339460/.

[54] Y. Zhang, K. Lee, and H. Lee, Augmenting Supervised Neural Networks with Unsupervised Objectives for Large-scale Image Classification, in International Conference on Machine Learning (ICML), vol. 48, New York, jun 2016, JMLR, pp. 612–621, https://arxiv.org/abs/1606.06582.

[55] B. Zhu, J. Z. Liu, S. F. Cauley, B. R. Rosen, and M. S. Rosen, Image reconstruction by domain-transform manifold learning, Nature, 555 (2018), pp. 487–492, https://doi.org/10.1038/nature25988.

[56] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, in The IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2223–2232, https://arxiv.org/abs/1703.10593.

Proof. We will first take a look at the more general case where we set (see equation (3.4) in [50]) as

B. Inequality (4.16) written out. This section shows that inequality (4.16) holds, given (4.14) and (4.15). We consider the function

For 1 this function attains its maximum at . Filling this in yields

For (the right end of the interval for this in yields

where we filled in (4.16).

C. Implementation details. This appendix provides the implementation details of the neural networks used in this paper. Training is performed in TensorFlow. The first two experiments make use of the complete MNIST training set (60 000 training samples), where each image is rescaled to 64 × 64 using bilinear interpolation. Testing is done on the first 1000 samples of the MNIST test set, using the same interpolation. For these experiments, the Radon transform is applied using the ‘scikit-image’ toolbox in Python. The third experiment makes use of the complete LoDoPaB training set. A data-augmented training set is obtained by mirroring each image and rotating it by multiples of 90°. This yields a training set with 284 672 samples: eight times the original number of samples. For this experiment, the Radon transform as described in [34] is applied.
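The eightfold augmentation described above can be sketched in NumPy as follows. This is a minimal illustration and not the training code of this paper: the function name `dihedral_augment` and the toy 4 × 4 image are assumptions made for the example. The last line only restates the arithmetic implied by the text (284 672 samples divided by eight).

```python
import numpy as np

def dihedral_augment(image):
    """Return the 8 mirrored/rotated copies of a 2D image (original included)."""
    variants = []
    for flipped in (image, np.fliplr(image)):  # the image and its mirror image
        for k in range(4):                      # rotations by 0, 90, 180, 270 degrees
            variants.append(np.rot90(flipped, k))
    return variants

# Toy image with distinct pixel values, so that all 8 variants differ.
image = np.arange(16, dtype=float).reshape(4, 4)
augmented = dihedral_augment(image)
n_variants = len(augmented)

# Size of the un-augmented training set implied by the text.
n_original = 284_672 // n_variants
```

Mirroring combined with the four rotations generates all symmetries of the square, which is why the factor is exactly eight.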

The networks in the first and second experiment only make use of fully-connected (FC) layers. The third experiment has additional convolutional layers on the image side: before the fully-connected layers in the encoder and after the fully-connected layers in the decoder. Details about the encoder and decoder are given in Table C.1. After each layer, except the last layer, a leaky ReLU (lReLU) with parameter as specified in Table C.1 is applied. All networks in the first two experiments are chosen without biases; the third experiment makes use of biases. Initial weights in all experiments are normally distributed with a standard deviation of 0.01.
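To make the fully-connected setting of the first two experiments concrete, the following NumPy sketch builds a bias-free network with leaky ReLU activations after every layer except the last, and weights drawn from a normal distribution with standard deviation 0.01. The layer sizes and the leaky ReLU parameter `alpha=0.1` are placeholders chosen for illustration; the values used in the experiments are those of Table C.1.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_weights(sizes, std=0.01):
    """One weight matrix per layer, normally distributed with the given std."""
    return [rng.normal(0.0, std, size=(m, n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def leaky_relu(x, alpha):
    """Identity for positive inputs, slope alpha for negative inputs."""
    return np.where(x > 0, x, alpha * x)

def forward(x, weights, alpha=0.1):
    """Bias-free FC network: leaky ReLU after every layer except the last."""
    for W in weights[:-1]:
        x = leaky_relu(x @ W, alpha)
    return x @ weights[-1]  # last layer stays linear

sizes = [64 * 64, 256, 64]          # hypothetical layer widths
weights = init_weights(sizes)
x = rng.normal(size=(5, 64 * 64))   # batch of 5 flattened 64 x 64 images
code = forward(x, weights)
```

Without biases, such a network maps the zero image to zero and is positively homogeneous, that is, `forward(c * x)` equals `c * forward(x)` for any `c > 0`.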

Table C.1: Architecture choices of the encoder and decoder for each of the experiments.

The fully learned L-SVD networks (experiments 1f, 2 and 3) make use of a linear scaling matrix Σ. Experiments 1d and 1e make use of a nonlinear scaling function Σ() in the form of a neural network, which consists of five fully-connected layers with biases. Initial weights are normally distributed with a standard deviation of 0.01. After each layer a leaky ReLU with parameter 1 is applied, except for the last layer:

• Experiment 1d applies a sigmoid-based function as the final nonlinearity, where sig denotes the sigmoid function. This nonlinearity ensures the bounds (see Section 4.2). In this experiment, scaling Σ(

• Experiment 1e applies the softplus function as the final nonlinearity. This immediately yields the total scaling Σ(
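The two final nonlinearities can be illustrated as follows. The softplus is standard. For the bounded case, the sketch uses a generic affine rescaling of the sigmoid to an interval (`lower`, `upper`); this form and both parameter names are assumptions made for illustration, and the bounds used in experiment 1d are those of Section 4.2.

```python
import numpy as np

def sigmoid(x):
    """Numerically stable sigmoid via tanh."""
    return 0.5 * (1.0 + np.tanh(0.5 * x))

def bounded_scaling(x, lower, upper):
    """Assumed form: sigmoid rescaled so that outputs lie in [lower, upper]."""
    return lower + (upper - lower) * sigmoid(x)

def softplus(x):
    """log(1 + exp(x)), evaluated without overflow."""
    return np.logaddexp(0.0, x)

pre_activation = np.array([-50.0, -1.0, 0.0, 1.0, 50.0])
bounded = bounded_scaling(pre_activation, lower=0.5, upper=2.0)
positive = softplus(pre_activation)
```

Both choices are monotone; the first keeps the scaling inside a prescribed interval, while the second only enforces positivity.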

For all experiments, each loss function is of ℓ2-type (mean squared error), with parameters as specified in Table C.2. We apply the ADAM optimiser using a learning rate with exponential weight decay. The number of epochs, batch sizes and the start and final learning rates are stated in Table C.2. All other optimisation parameters are the default choices of ADAM in TensorFlow. Gradient norm clipping with a value of 10 is applied for training stability. No regularisation, dropout or batch normalisation is used.
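As an illustration of the optimisation settings, the sketch below interpolates geometrically between a start and a final learning rate and clips gradients to a norm of 10. Both the exact form of the schedule and the use of a global norm over all gradient arrays are assumptions; the start and final rates in the example are placeholders, with the values used in the experiments given in Table C.2.

```python
import numpy as np

def exponential_lr(step, total_steps, lr_start, lr_final):
    """Assumed schedule: geometric interpolation from lr_start to lr_final."""
    return lr_start * (lr_final / lr_start) ** (step / total_steps)

def clip_by_global_norm(grads, max_norm=10.0):
    """Rescale all gradients jointly if their combined norm exceeds max_norm."""
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / total) if total > 0 else 1.0
    return [g * scale for g in grads], total

# Placeholder rates: 1e-3 at the start, 1e-5 at the end of training.
lrs = [exponential_lr(s, 100, 1e-3, 1e-5) for s in (0, 50, 100)]

# Two gradient arrays with combined norm sqrt(1300), which exceeds 10.
grads = [np.full((3, 3), 10.0), np.full(4, -10.0)]
clipped, norm_before = clip_by_global_norm(grads)
norm_after = np.sqrt(sum(np.sum(g ** 2) for g in clipped))
```

Clipping leaves the gradient direction unchanged and only limits the step length, which is why it helps stability without altering the descent direction.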

Table C.2: Optimisation choices for each of the described experiments.

D. Visualisation of latent space elements in experiment 1. To understand the transition from model-based to data-driven, canonical basis vectors in the latent space are decoded to the image space for both the regular SVD (Figure 7.1b-7.1e) and fully learned L-SVD with random initialisation (Figure 7.1h). Four selected elements from this ‘dictionary’ are shown in Figure D.1.

Figure D.1: Selected elements in the latent space, decoded to the image space X and the sinogram space Y. SVD only makes use of the operator, while L-SVD combines operator information with image and sinogram information. This results in more localised information in the decoded elements of the data-driven L-SVD approach.

SVD decomposes the Radon operator into elements with different geometrical scales. Moreover, it combines higher order harmonics in the image space and the sinogram space. For example, the second sinogram from the left in Figure D.1c shows an approximate 2D sinusoidal structure, while Figure D.1a provides its counterpart in the image space. For the third image from the left, it is the other way around. L-SVD shows similar behaviour for larger geometrical scales, but differences are also apparent: the sinusoids are only ‘active’ at the location of potential MNIST digits, and the sinogram space also encodes noise-like structures (first and second image in Figures D.1b and D.1d). Smaller scale elements (third image) show very localised geometrical structures in image space, while others (fourth image) only seem to capture noise.
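For the regular SVD, decoding a canonical basis vector of the latent space amounts to selecting one right singular vector (image space) and one left singular vector (sinogram space). The following NumPy sketch shows this on a small random matrix `A`, which is only a stand-in for the discretised Radon operator; the function names are chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(12, 8))  # stand-in for the forward operator (Y <- X)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

def decode_image(z):
    """Latent vector -> image space, using the right singular vectors."""
    return Vt.T @ z

def decode_data(z):
    """Latent vector -> data (sinogram) space, using the left singular vectors."""
    return U @ z

k = 2
e_k = np.zeros(len(s))
e_k[k] = 1.0  # k-th canonical basis vector of the latent space

image_k = decode_image(e_k)  # equals the k-th right singular vector
data_k = decode_data(e_k)    # equals the k-th left singular vector
```

The pair is linked through the operator by `A @ image_k == s[k] * data_k`, which is the sense in which each SVD dictionary element has a counterpart in the other space.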

E. Visualisation of all latent space elements in experiment 2. Figure E.1 provides the complete dictionaries for all methods, from which a selection was shown in Figure 7.4. For a discussion on the results we refer to Section 7.2.

Figure E.1: All 64 elements of the dictionary of all methods, from which a selection was shown in Figure 7.4.
