A Novel Generative Neural Approach for InSAR Joint Phase Filtering and Coherence Estimation

2020·Arxiv

Abstract

Phase filtering and pixel quality (coherence) estimation are critical in producing Digital Elevation Models (DEMs) from Interferometric Synthetic Aperture Radar (InSAR) images, as they remove spatial inconsistencies (residues) and immensely improve the subsequent unwrapping. The large amount of available InSAR data facilitates Wide Area Monitoring (WAM) over geographical regions. Advances in parallel computing have accelerated Convolutional Neural Networks (CNNs), giving them advantages over human performance on visual pattern recognition, which makes CNNs a good choice for WAM. Nevertheless, this research direction is largely unexplored. We thus propose “GenInSAR”, a CNN-based generative model for joint phase filtering and coherence estimation that directly learns the InSAR data distribution. GenInSAR’s unsupervised training on satellite and simulated noisy InSAR images outperforms five other related methods in total residue reduction (over 16% better on average) with less over-smoothing/artefacts around branch cuts. GenInSAR’s Phase and Coherence Root-Mean-Squared-Error and Phase Cosine Error show average improvements of 0.54, 0.07, and 0.05 respectively compared to the related methods.

Index Terms—Synthetic Aperture Radar, Neural Networks, Image Filtering, Radar Interferometry, Unsupervised Learning.

I. INTRODUCTION

InSAR, or Interferometric Synthetic Aperture Radar, is an emerging, highly successful remote sensing technique for measuring several geophysical quantities like surface deformation [1]. It is based on generating an interferogram as the complex difference of two SAR acquisitions of the same scene from slightly different view angles. The wrapped interferometric phase is then unwrapped to subsequently produce a Digital Elevation Model (DEM). However, several decorrelation factors create strong phase noise, affecting unwrapping and DEM accuracy [2]. Thus phase filtering is preferred, even when it results in some decrease in resolution and increase in spatial correlation [3], and we need filters adapted to enhance phase rather than amplitude [4]. Filtering the real and imaginary parts of the complex phase in its wrapped form can avoid blurring edges [5], [6], whereas unwrapping before filtering increases computation and potentially decreases accuracy [1]. Due to the non-stationary nature of InSAR signals, simple boxcar averaging and non-adaptive filtering methods tend to distort the phase [3], [5]. Methods that adapt their parameters based on, e.g., local phase quality (coherence) yield better results, as coherence is related to phase noise deviation [2], [6]. Early spatial methods like Lee [7], frequency-based methods like Goldstein [8], and their numerous improvements [9], [10], [11], [12], [13] adapt to the local fringe direction and/or local noise. Frequency-based methods gradually evolved into the wavelet domain [14], [15], [16] to simplify the separation of true phase from noise [2] but struggled to filter dense fringes, whereas spatial methods in general sacrificed spatial resolution [17]. The additive noise model of interferometric phase [7] inspired early filtering methods, which assumed a stationary and consistent phase over the filtering window, but real-world challenges of strong topographic changes and restrictions imposed on window size motivated more recent non-linear models [18] and per-pixel filtering [19].

Recent advances in parallel computing architectures have motivated parallelism in the InSAR processing pipeline [20], which is critical to our proposed phase filtering method (“GenInSAR”) for InSAR-based Wide Area Monitoring (WAM) across geographical regions involving petabytes of data. Thus, we use a Convolutional Neural Network (CNN) architecture, which integrates seamlessly with modern parallel architectures built on Graphics Processing Units (GPUs) and has outperformed humans on pattern recognition tasks. Use of CNNs in InSAR phase processing has been limited to volcano deformation monitoring [21] via transfer learning from optical images, without training directly on InSAR data. Recent CNN-based InSAR phase filtering and coherence estimation/classification [22], [23] trained directly on InSAR data, but with separate CNNs for filtering and coherence estimation, and with “raw” coherence generated/preprocessed using traditional methods.

In contrast, our GenInSAR filters phase and estimates coherence jointly, using a single neural network, and predicts the center pixel’s distribution given only its neighborhood. It is thus “embarrassingly parallel” [24], whereas non-local filters [25], [26] require computing patch similarity and suffer from terrace-like DEM artefacts, over-smoothing and the “rare patch” effect [27]. For similar computational concerns, we avoid strategies that are iterative [3], [4], multi-stage [27], [4], or that require optimization during inference, e.g. via sparse coding [1]. Iterative strategies can also result in loss of detailed features [3].

Manuscript submitted January 31, 2020; resubmitted April 30, 2020; revised June 9, 2020 and July 13, 2020; accepted July 15, 2020. Research supported by NSERC Discovery Grant RGPIN-2018-04367 and Department of National Defence/NSERC Discovery Grants Supplements DGDND-2018-00020.

S. Mukherjee, X. Sun and I. Cheng are with the Department of Computing Science, University of Alberta, Edmonton, AB T6G 2R3 Canada (e-mails: mukherje,xinyao1,locheng@ualberta.ca).

A. Zimmer and P. Ghuman are with 3vGeomatics, Vancouver, BC V5Y 0M6 Canada (e-mails: azimmer,pghuman@3vgeomatics.com).

We propose a novel InSAR phase filter inspired by Mixture Density Networks (MDN) [28]. Our CNN's convolutional layers operate on their input (phase) receptive field, and help predict the parameters of a bi-variate (Real, Imaginary) Gaussian distribution of the center pixel: the filtered pixel μ, having coherence γ as a function of σ (Eq. 1).

Our approach, GenInSAR, facilitates learning the real InSAR data distribution from huge datasets generated by the rapidly increasing use of SAR satellites. Being a generative model, sampling from this distribution generates new interferograms which are slight variations of the filter output, and can be utilized to improve the InSAR pipeline. GenInSAR produces state-of-the-art results surpassing NLInSAR [25], without being prohibitively slow for most production situations like WAM.

1545-598X © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

II. PROPOSED METHOD

Fig. 1. Architecture of our proposed method GenInSAR.

Fig. 1 shows GenInSAR’s architecture. Its input is a phase patch centered around the pixel to be filtered. That pixel is zeroed out in the input patch to avoid learning the identity mapping; GenInSAR thus learns to ignore it during prediction. Significantly smaller patch sizes reduce the receptive field, resulting in loss of detail in the filtered phase and increased bias in the predicted coherence. The training (fitting) and testing (prediction) steps make this clearer. During training, patches extracted from a fixed set of phase images (the training set) are input to the model. We set a 50% dropout rate [29] during training to prevent over-fitting: dropout randomly sets 50% of the activations of its preceding layer to zero. Intuitively, this forces the network to learn simpler mappings for each training example.

Convolutional layers [30] of fixed filter size but decreasing filter counts (512, 256, 128, 64, 32), each followed by an Exponential Linear Unit (ELU) activation [31] (not shown), promote fast convergence and non-linear mappings. ELU also allows negative outputs. Specifically, we use depth-wise separable 2D convolutions [32] with one filter per input channel (depth) for fast computation and convergence, where the kernel is applied to each input feature map to obtain the corresponding output feature map, followed by a 1×1 (pointwise) convolution to combine the outputs. Finally, following the MDN working principle, dense connections (weighted sums of all filter outputs) to the distribution fitting module output the Gaussian parameter values (μ, σ) for the real and imaginary channels that maximize the likelihood of the input patch’s central pixel (the training target).
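The patch preparation described above can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function name `extract_training_pairs`, the 5×5 patch size, and the two-channel (cosine, sine) phasor representation of the phase are all assumptions made for illustration.

```python
import numpy as np

def extract_training_pairs(phase, patch_size=5):
    """Build (input patch, target) pairs from one wrapped-phase image.

    patches: (N, patch_size, patch_size, 2), real/imag channels, center zeroed.
    targets: (N, 2), real/imag parts of the original noisy center pixel.
    patch_size=5 is a hypothetical value chosen only for this sketch.
    """
    assert patch_size % 2 == 1, "patch needs a well-defined center pixel"
    half = patch_size // 2
    # Assumed representation: unit phasor with channel 0 = cos, channel 1 = sin.
    phasor = np.stack([np.cos(phase), np.sin(phase)], axis=-1)
    rows, cols = phase.shape
    patches, targets = [], []
    for i in range(half, rows - half):
        for j in range(half, cols - half):
            patch = phasor[i - half:i + half + 1, j - half:j + half + 1].copy()
            targets.append(patch[half, half].copy())  # noisy center is the target
            patch[half, half] = 0.0                   # hide it from the network
            patches.append(patch)
    return np.asarray(patches), np.asarray(targets)

rng = np.random.default_rng(0)
noisy_phase = rng.uniform(-np.pi, np.pi, size=(8, 8))
patches, targets = extract_training_pairs(noisy_phase, patch_size=5)
print(patches.shape, targets.shape)
```

Because the target is the noisy center pixel itself, no clean reference image is needed, which is what makes the training unsupervised.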

Thus, our training is completely unsupervised: we learn from the input data itself, without requiring its “clean” version as the training target. The central pixel (surrounded by its neighborhood pixels) is treated as a sample drawn from the reference Gaussian distribution chosen to best encompass all n training set examples, by minimizing the loss E via gradient descent back-propagation using the Adam optimizer [33]. The network is thus trained to parameterize a Gaussian density that best encompasses the observed (noisy) data. During testing, dropout and distribution fitting (optimization) are not required.
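A likelihood-based loss of this kind can be sketched as a Gaussian negative log-likelihood, the usual single-component MDN objective. The sketch below assumes independent real and imaginary channels (a diagonal covariance) and averages over the batch; both choices, and the name `gaussian_nll`, are assumptions rather than the paper's exact definition of E.

```python
import numpy as np

def gaussian_nll(target, mu, sigma):
    """Mean negative log-likelihood of targets under per-channel Gaussians.

    target, mu, sigma: arrays of shape (n, 2) for the (real, imaginary) channels.
    Assumes the two channels are independent (diagonal covariance).
    """
    sigma = np.maximum(sigma, 1e-6)  # guard against log(0) and division by zero
    log_density = (-0.5 * np.log(2.0 * np.pi)
                   - np.log(sigma)
                   - 0.5 * ((target - mu) / sigma) ** 2)
    return -np.mean(np.sum(log_density, axis=1))

target = np.array([[1.0, 0.0], [0.0, 1.0]])
sigma = np.ones_like(target)
loss_good = gaussian_nll(target, target, sigma)   # predicted mean equals the target
loss_bad = gaussian_nll(target, -target, sigma)   # predicted mean points the wrong way
print(loss_good < loss_bad)
```

Minimizing this quantity pulls the predicted mean toward the typical center pixel value for a given neighborhood, while the predicted standard deviation grows wherever the neighborhood is a poor predictor of the center.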

GenInSAR does not train to predict γ. It is directly computed from the predicted σ. A nice property of γ is that it seems to be a better measure of filtering quality and filter output reliability, which partially depends on the spatial noise pattern (neighborhood), not just the noise underlying the center pixel. Considering two SAR acquisitions with resulting interferometric unwrapped phase having a probability density, a variance, and real and imaginary components (R, I) with predicted variances, we approximate γ as the normalized index of mutual linear predictability of the two acquisitions, thus quantifying noise in interferometric acquisitions (Eq. 1), since for our normalized input phasors the denominator of γ reduces to 1.
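For reference, the standard textbook definition of interferometric coherence between two complex acquisitions is given below, with the symbols z_1 and z_2 introduced here only for illustration. It shows why the denominator reduces to 1 for unit-magnitude phasors; it is not necessarily the exact approximation in terms of the predicted variances that GenInSAR uses.

```latex
\gamma \;=\; \frac{\left| \mathbb{E}\!\left[ z_1 z_2^{*} \right] \right|}
                  {\sqrt{\mathbb{E}\!\left[ |z_1|^{2} \right] \, \mathbb{E}\!\left[ |z_2|^{2} \right]}},
\qquad
|z_1| = |z_2| = 1
\;\Rightarrow\;
\gamma \;=\; \left| \mathbb{E}\!\left[ z_1 z_2^{*} \right] \right|
        \;=\; \left| \mathbb{E}\!\left[ e^{j\phi} \right] \right| .
```

Here φ is the interferometric phase, so a tightly concentrated phasor distribution (small variance) gives γ close to 1, and a widely spread one gives γ close to 0.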

Again, σ

GenInSAR trains on patches to best utilize the training data, but prediction on patches is slow, especially for large patch sizes. To solve this, we split the trained model into a Convolver (all Conv layers) and a Combiner (Dense layer onward). During training, the Convolver does not use padding, but we pad during prediction to ensure that the output and input image sizes match along image borders. From Fig. 1, we can infer that the Convolver outputs a 32-channel pixel for each input image pixel, which the Combiner uses to output that pixel’s (μ, σ). Since the Combiner operates on individual pixels, we use a very large batch size for the Combiner during prediction (4096, limited only by GPU memory) for high time efficiency.
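The Convolver + Combiner idea can be illustrated with a toy NumPy sketch: one padded pass over the whole image gives the same per-pixel result as running the unpadded layer on each patch separately. Everything here is a simplification assumed for illustration: a single 3×3 layer, 4 feature channels instead of 32, zero padding, random weights, and no center-pixel masking.

```python
import numpy as np

def conv_valid(image, kernel):
    """Plain 'valid' 2D cross-correlation (no padding)."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def convolver(image, kernels, pad):
    """Toy single-layer Convolver: one output channel per kernel."""
    if pad:
        image = np.pad(image, kernels.shape[1] // 2)  # zero padding (assumed)
    return np.stack([conv_valid(image, k) for k in kernels], axis=-1)

def combiner(features, weights):
    """Toy Combiner: a dense layer applied to each pixel's feature vector."""
    return features @ weights

rng = np.random.default_rng(0)
image = rng.normal(size=(6, 7))
kernels = rng.normal(size=(4, 3, 3))  # 4 channels here; the paper's Convolver has 32
weights = rng.normal(size=(4, 2))

# Slow path: extract one patch per pixel and run the unpadded model on it.
padded = np.pad(image, 1)
slow = np.empty((6, 7, 2))
for i in range(6):
    for j in range(7):
        feat = convolver(padded[i:i + 3, j:j + 3], kernels, pad=False)  # (1, 1, 4)
        slow[i, j] = combiner(feat.reshape(1, -1), weights)[0]

# Fast path: one padded pass over the image, then one large Combiner batch.
features = convolver(image, kernels, pad=True)                  # (6, 7, 4)
fast = combiner(features.reshape(-1, 4), weights).reshape(6, 7, 2)
print(np.allclose(slow, fast))
```

The fast path shares the convolution work between overlapping neighborhoods, and the Combiner step becomes a single matrix product over all pixels, which is why a very large batch size is practical there.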

We assume a Gaussian distribution for the unwrapped phase noise, to approximate the InSAR multiplicative speckle noise distribution. Generally, a Gaussian Mixture Model can approximate any distribution arbitrarily well by adding more terms, but we lower the free parameter count to achieve a lower bias. Moreover, since the number of effective samples is low, highly accurate characterization of the true underlying distribution is not necessary, since the mean for the two distributions will be equivalent and the only difference will be in the variance. Thus, the variance might be slightly underestimated, but this is a common problem with most coherence estimators.

GenInSAR’s main advantage lies in learning the distribution of real data by directly training on it, without relying on assumptions. This is very useful, as countless images are being acquired daily by an ever-increasing number of SAR satellites around the world, creating a huge archive of real training data. Although most traditional filters rely on the spatial context to estimate the phase and coherence, GenInSAR’s coherence qualitatively appears to be more of an indication of the confidence of its estimated phase. This is a useful feature, because in real-world machine learning applications, GenInSAR might encounter a feature that it was not trained on, so there might be some error in its phase estimation. In that case, it would predict a low coherence, and that pixel would be down-weighted or removed in subsequent stages of InSAR processing. Additionally, in areas that are noisy but very smooth, the predicted coherence might be biased slightly upward because GenInSAR is more confident of its estimated phase, as it made better use of the contextual information.

III. RESULTS AND DISCUSSION

GenInSAR and the CNN-based InSAR filter mentioned earlier (hereafter referred to as “CNN-InSAR” [22]) were implemented in Keras [34] (Tensorflow-GPU backend). Boxcar, Goldstein [8] (outputs only phase), NLInSAR [25], and NLSAR [26] were implemented in OpenCL 1.2. All methods were executed on an 8 GB NVIDIA 1070 GPU. We present the qualitative and quantitative results of those experiments for real and simulated images respectively. The metrics used for quantitative analysis are the Root-Mean-Square-Error (RMSE) of the InSAR phase and of the coherence, the Residue Reduction Percentage (RRP) [3], [2], [17], and the Phase Cosine Error (Eq. 2), which is computed from the ground truth pixels and the complex conjugates of the filtered pixels of an n-pixel interferogram. Residues are phase inconsistencies emphasized by computing the curl of phase differences over a reduced closed integral loop of four spatially adjacent pixels [35], [17], which is non-zero if residues are present. Most residues are caused by noise. However, a few arise from signal structure, like steep changes in topography or heavy deformations, and those residues should be preserved during filtering. Filtering should remove all other residues to facilitate phase unwrapping. Those that cannot be removed should have low values in the filter’s output coherence map; this prevents error propagation during phase difference integration by the unwrapper. Hence, filtering aims to reduce residues (high RRP) but preserve details (low Phase RMSE and Phase Cosine Error). These criteria drive our evaluations:

1) Experiments using satellite InSAR images: GenInSAR was trained for 100 epochs on 5 million patches (in batches of 64 patches) extracted from 300 pixels interferograms obtained from several different sensors at different resolutions. We tested the trained model on pixels interferograms of a mining site. Fig. 2 shows outputs for a test interferogram using GenInSAR and other methods. GenInSAR is a generative model: its filtered output per pixel (i, j) corresponds to the mean (and ) of the distribution predicted for that pixel. To show this, we randomly sample five times from the normal distribution setting , and generate pixel for five images in Fig. 3, which are slightly different outputs for the same input test interferogram of Fig. 2(a). Higher values of generate more variations in the outputs, but also tend to make them noisy. This technique can be used in InSAR machine learning for data augmentation [36], or to test any InSAR processing chain by running that chain all the way through, with slightly different interferograms to measure the variance of the outputs of the complete processing chain, and potentially, for error analysis.
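The sampling procedure described above can be sketched as follows. This is an illustrative NumPy version, not the authors' code: the function name, the `(H, W, 2)` layout of the predicted mean and standard deviation, and the `scale` multiplier (standing in for the sampling setting mentioned in the text) are all assumptions.

```python
import numpy as np

def sample_interferograms(mu, sigma, n_samples=5, scale=1.0, seed=0):
    """Draw wrapped-phase images from per-pixel Gaussians.

    mu, sigma: (H, W, 2) predicted mean and standard deviation of the
    (real, imaginary) channels. `scale` is a hypothetical multiplier on sigma.
    Returns an array of shape (n_samples, H, W) with phases in (-pi, pi].
    """
    rng = np.random.default_rng(seed)
    draws = rng.normal(loc=mu, scale=scale * sigma, size=(n_samples,) + mu.shape)
    return np.arctan2(draws[..., 1], draws[..., 0])

# Stand-in for a filter output: a smooth phase ramp with a small predicted spread.
true_phase = 0.5 * np.add.outer(np.arange(4), np.arange(6))
mu = np.stack([np.cos(true_phase), np.sin(true_phase)], axis=-1)
sigma = np.full(mu.shape, 0.05)

filtered_phase = np.arctan2(mu[..., 1], mu[..., 0])  # the filter output is the mean
samples = sample_interferograms(mu, sigma, n_samples=5)
print(samples.shape)
```

With `scale=0.0` every sample collapses to the filtered phase, while larger values give more varied but noisier interferograms, matching the trade-off noted in the text.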

TABLE I. QUANTITATIVE EVALUATION OF GENINSAR AND EXISTING METHODS, AND SCALABILITY OF GENINSAR OVER INCREASING GPU COUNTS (64, 32, 16, 8, 4, 2, 1)

2) Experiments using simulated InSAR images: Our InSAR simulator can simulate ground truth interferograms with Gaussian bubbles, roads and buildings. We followed a similar training strategy as satellite InSAR images for training our model with simulated InSAR images, by adding Gaussian noise to simulated ground truth images, and inputting patches extracted from those noisy versions. For CNN-InSAR, we obtained two sets of results: 1. using the model as-is, and 2. retraining from scratch with simulated noisy images as mentioned above. For evaluating GenInSAR and five existing methods mentioned earlier including CNN-InSAR (as-is and retrained), we used 60 pixels noisy simulated images. Fig. 4 compares all methods using a cropped region of one such test image. Corresponding full-size clean (ground truth) images facilitated quantitative evaluation (Table I) showing overall superior performance of GenInSAR against others and almost linear speedup with increasing number of GPUs.
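The residue definition used in the evaluation (the curl of wrapped phase differences around a loop of four adjacent pixels) can be sketched in NumPy as below. The RRP formula shown, the percentage drop in residue count from the noisy input to the filtered output, is the common reading of the metric and is an assumption here, as are all function names.

```python
import numpy as np

def wrap(phase):
    """Wrap phase values into [-pi, pi)."""
    return (phase + np.pi) % (2.0 * np.pi) - np.pi

def residue_map(phase):
    """Sum wrapped phase differences around every 2x2 pixel loop.

    Returns integer charges (-1, 0 or +1), one per loop.
    """
    d_top = wrap(phase[:-1, 1:] - phase[:-1, :-1])    # top-left to top-right
    d_right = wrap(phase[1:, 1:] - phase[:-1, 1:])    # top-right to bottom-right
    d_bottom = wrap(phase[1:, :-1] - phase[1:, 1:])   # bottom-right to bottom-left
    d_left = wrap(phase[:-1, :-1] - phase[1:, :-1])   # bottom-left to top-left
    loop_sum = d_top + d_right + d_bottom + d_left
    return np.rint(loop_sum / (2.0 * np.pi)).astype(int)

def count_residues(phase):
    return int(np.count_nonzero(residue_map(phase)))

def residue_reduction_percentage(noisy, filtered):
    """Assumed RRP: percentage drop in residue count after filtering."""
    before, after = count_residues(noisy), count_residues(filtered)
    return 100.0 * (before - after) / before if before else 0.0

rows, cols = np.mgrid[0:16, 0:16]
ramp = wrap(0.3 * cols + 0.2 * rows)           # smooth fringes: no residues
vortex = np.arctan2(rows - 7.5, cols - 7.5)    # one phase singularity
rng = np.random.default_rng(0)
noisy = rng.uniform(-np.pi, np.pi, size=(16, 16))

print(count_residues(ramp), count_residues(vortex))
print(residue_reduction_percentage(noisy, ramp))
```

A count-based RRP treats all residues alike, so it cannot tell whether a residue that survives filtering came from noise or from real signal structure; that distinction is left to the coherence map, as discussed in the evaluation criteria.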

GenInSAR almost totally reduces residues and produces far less over-smoothing/artefacts around branch cuts than Boxcar, because its greatest strength is (unsupervised) learning of true spatial smoothing from noisy training data. It could potentially detect real residues better if trained more on such types of features, and a yet more efficient implementation, like that of the compared methods [37], could further reduce its run time. NLInSAR handles residues well and avoids artefacts by selecting neighbors with similar phase, but produces streaking correlated with amplitude bands. NLSAR (conservatively) interpolates well only over heavy noise. A further direction for future work on GenInSAR is enforcing the network’s output pixel to lie on the unit circle, as currently γ is clipped to [0, 1], although even at present most values lie in that range.

IV. CONCLUSION

We propose an InSAR phase filter and coherence estimator that outperforms the state-of-the-art. Our generative modeling via unsupervised learning learns from petabytes of noisy InSAR data captured by an increasing number of SAR satellites, and can generate new interferograms to improve InSAR machine learning. Our GPU-based, highly scalable filter can help monitor extensive areas of the earth’s surface for potential disasters.

Fig. 2. Filtered phase and coherence outputs for satellite InSAR images from GenInSAR (total residue reduction, less over-smoothing/artefacts around branch cuts) and five existing methods. Phase and coherence are coloured from −π (blue) to +π (red), and from 0 (black: low) to 1 (white: high), respectively.

Fig. 3. GenInSAR outputs (fine, pixel-wise differences) obtained by sampling the Gaussian predicted for each pixel of the same noisy input (Fig. 2(a), red square). It is useful in data augmentation and testing InSAR processing chains / error analysis. Phase visualizations coloured from −π (blue) to +π (red).

REFERENCES

[1] C. Ojha, A. Fusco, and I. M. Pinto, “Interferometric sar phase denoising using proximity-based k-svd technique,” Sensors, vol. 19, no. 12, 2019.

[2] Y. Wang, H. Huang, Z. Dong, and M. Wu, “Modified patch-based locally optimal wiener method for interferometric sar phase filtering,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 114, pp. 10 – 23, 2016.

[3] A. Mestre-Quereda, J. M. Lopez-Sanchez, J. Selva, and P. J. Gonzalez, “An improved phase filter for differential sar interferometry based on an iterative method,” IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 8, pp. 4477–4491, Aug 2018.

[4] A. Zimmer and P. Ghuman, “Cuda optimization of non-local means extended to wrapped gaussian distributions for interferometric phase denoising,” Procedia Computer Science, vol. 80, pp. 166 – 177, 2016, International Conference on Computational Science 2016, ICCS 2016, 6-8 June 2016, San Diego, California, USA.

[5] X. Qing, J. Guowang, Z. Caiying, W. Zhengde, H. Yu, and Y. Peizhang, “The filtering and phase unwrapping of interferogram,” in Proceedings of International Society for Photogrammetry and Remote Sensing (ISPRS), Volume XXXV, Technical Commission V1/Working Group 4, 2004.

[6] X. Lin and D. Niu, “Experiments of interferometric phase filtering through weighted nuclear norm minimization,” in Proceedings of the 2nd International Conference on Big Data Technologies, ser. ICBDT2019. New York, NY, USA: ACM, 2019, pp. 278–282.

Fig. 4. Filtered phase and coherence outputs for simulated InSAR images from GenInSAR (total residue reduction, less over-smoothing/artefacts around branch cuts) and five existing methods. Phase and coherence are coloured from −π (blue) to +π (red), and from 0 (black: low) to 1 (white: high), respectively.

[7] J.-S. Lee, K. P. Papathanassiou, T. L. Ainsworth, M. R. Grunes, and A. Reigber, “A new technique for noise filtering of sar interferometric phase images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 36, no. 5, pp. 1456–1465, Sep. 1998.

[8] R. M. Goldstein and C. L. Werner, “Radar interferogram filtering for geophysical applications,” Geophysical Research Letters, vol. 25, no. 21, pp. 4035–4038, 1998.

[9] N. Wu, D.-Z. Feng, and J. Li, “A locally adaptive filter of interferometric phase images,” IEEE Geoscience and Remote Sensing Letters, vol. 3, no. 1, pp. 73–77, Jan 2006.

[10] S. Fu, X. Long, X. Yang, and Q. Yu, “Directionally adaptive filter for synthetic aperture radar interferometric phase images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 1, pp. 552–559, Jan 2013.

[11] C. Chao, K. Chen, and J. Lee, “Refined filtering of interferometric phase from insar data,” IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 12, pp. 5315–5323, Dec 2013.

[12] G. Vasile, E. Trouve, J.-S. Lee, and V. Buzuloiu, “Intensity-driven adaptive-neighborhood technique for polarimetric and interferometric sar parameters estimation,” IEEE Transactions on Geoscience and Remote Sensing, vol. 44, no. 6, pp. 1609–1621, June 2006.

[13] R. Song, H. Guo, G. Liu, Z. Perski, and J. Fan, “Improved goldstein sar interferogram filter based on empirical mode decomposition,” IEEE Geoscience and Remote Sensing Letters, vol. 11, no. 2, pp. 399–403, Feb 2014.

[14] C. Lopez-Martinez and X. Fabregas, “Modeling and reduction of sar interferometric phase noise in the wavelet domain,” IEEE Transactions on Geoscience and Remote Sensing, vol. 40, no. 12, pp. 2553–2566, Dec 2002.

[15] Y. Bian and B. Mercer, “Interferometric sar phase filtering in the wavelet domain using simultaneous detection and estimation,” IEEE Transactions on Geoscience and Remote Sensing, vol. 49, no. 4, pp. 1396–1416, April 2011.

[16] X. Zha, R. Fu, Z. Dai, and B. Liu, “Noise reduction in interferograms using the wavelet packet transform and wiener filtering,” IEEE Geoscience and Remote Sensing Letters, vol. 5, no. 3, pp. 404–408, July 2008.

[17] Y. Gao, S. Zhang, K. Zhang, and S. Li, “Frequency domain filtering SAR interferometric phase noise using the amended matrix pencil model,” Computer Modeling in Engineering & Sciences, vol. 119, no. 2, pp. 349–363, 2019.

[18] H. Huang and Q. Wang, “A method of filtering and unwrapping sar interferometric phase based on nonlinear phase model,” Progress In Electromagnetics Research, vol. 144, pp. 67–78, 2014.

[19] Z. Suo, J. Zhang, M. Li, Q. Zhang, and C. Fang, “Improved insar phase noise filter in frequency domain,” IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 2, pp. 1185–1195, Feb 2016.

[20] A. Pepe, Y. Yang, M. Manzo, and R. Lanari, “Improved emcf-sbas processing chain based on advanced techniques for the noise-filtering and selection of small baseline multi-look dinsar interferograms,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 8, pp. 4394–4417, Aug 2015.

[21] N. Anantrasirichai, J. Biggs, F. Albino, and D. Bull, “A deep learning approach to detecting volcano deformation from satellite imagery using synthetic datasets,” Remote Sensing of Environment, vol. 230, p. 111179, 2019.

[22] S. Mukherjee, A. Zimmer, N. K. Kottayil, X. Sun, P. Ghuman, and I. Cheng, “Cnn-based insar denoising and coherence metric,” in 2018 IEEE SENSORS, Oct 2018, pp. 1–4.

[23] S. Mukherjee, A. Zimmer, X. Sun, P. Ghuman, and I. Cheng, “Cnn-based insar coherence classification,” in 2018 IEEE SENSORS, Oct 2018, pp. 1–4.

[24] M. Herlihy and N. Shavit, The Art of Multiprocessor Programming, Revised Reprint. Morgan Kaufmann Publishers, 2012.

[25] C. Deledalle, L. Denis, and F. Tupin, “Nl-insar: Nonlocal interferogram estimation,” IEEE Transactions on Geoscience and Remote Sensing, vol. 49, no. 4, pp. 1441–1452, April 2011.

[26] C. Deledalle, L. Denis, F. Tupin, A. Reigber, and M. Jäger, “Nl-sar: A unified nonlocal framework for resolution-preserving (pol)(in)sar denoising,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 4, pp. 2021–2038, April 2015.

[27] G. Baier, C. Rossi, M. Lachaise, X. X. Zhu, and R. Bamler, “A nonlocal insar filter for high-resolution dem generation from tandem-x interferograms,” IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 11, pp. 6469–6483, Nov 2018.

[28] C. M. Bishop, “Mixture density networks,” Aston University, Birmingham, United Kingdom, Tech. Rep. NCRG/94/004, February 1994.

[29] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from over-fitting,” J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, Jan. 2014.

[30] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, Nov 1998.

[31] D. Clevert, T. Unterthiner, and S. Hochreiter, “Fast and accurate deep network learning by exponential linear units (elus),” in 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, 2016.

[32] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L. Chen, “Mobilenetv2: Inverted residuals and linear bottlenecks,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2018, pp. 4510–4520.

[33] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.

[34] F. Chollet et al., “Keras,” https://keras.io, 2015.

[35] C. Dănişor and A. Pepe, “Comparative study of SAR interferometric phase filtering algorithms,” in Advanced Topics in Optoelectronics, Microelectronics, and Nanotechnologies IX, M. Vladescu, R. D. Tamas, and I. Cristea, Eds., vol. 10977, International Society for Optics and Photonics. SPIE, 2018, pp. 611 – 620.

[36] C. Shorten and T. M. Khoshgoftaar, “A survey on image data augmentation for deep learning,” Journal of Big Data, vol. 6, no. 1, Jul. 2019.

[37] G. Baier. Github - gbaier/despeckcl: A c++ library for insar denoising images using opencl. [Online]. Available: https://github.com/gbaier/despeckCL