Considerable effort is currently being devoted to Earth observation and, thanks to the continuous development of satellite sensors, more and more data are available. The large volume of data, together with its daily updating, makes remote sensing images a crucial source for Earth monitoring. In recent decades, remote sensing images have been used for many applications, such as classification, detection and segmentation. The availability of constantly updated data is very important for monitoring wide areas such as forests, agricultural fields and urban areas. It is also important for detecting natural and man-made disasters, such as landslides and fires.
Several methods have been developed for these purposes, taking advantage of both optical and SAR sensors. SAR sensors are active sensors that work day and night, in any meteorological condition, which makes SAR images crucial for fast monitoring of our planet. SAR image formation is characterized by particular geometrical effects: multiple bounces, shadowing and layover all depend on the presence of an object in the scene, on its position, and on the position and viewing angle of the sensor. Moreover, SAR images are affected by a multiplicative noise called speckle (Argenti et al., 2013). Speckle is related to coherent and incoherent interference among backscattered signals: depending on the relationship between the roughness of the illuminated object and the transmitted wavelength, the backscattered signals are spread in several directions, so that backscattering from different objects interferes with each other. Bright pixels are due to constructive interference, dark pixels to destructive interference. This produces the typical alternation of bright spikes and dark pixels in the SAR image, which impairs the understanding of the scene. Therefore, despeckling is usually applied as a preprocessing step for further applications.
In recent decades several despeckling methods have been developed. The first despeckling filters proposed were the local filters, so called because they rely on the assumption that a pixel is similar to those in its own neighbourhood. These filters (Argenti et al., 2013) usually oversmooth in the presence of edges: pixels that belong to the boundary between two different areas do not have many similar pixels in their neighbourhood. To overcome this problem, Non-Local (NL) filters have been proposed in recent years (Deledalle et al., 2014). These filters largely outperform the local ones in both noise suppression and edge preservation. They look for similarity in a wider area: given a patch centred on a pixel, similar patches are searched for in a wider area and the result is a combination of the selected patches. The similarity criterion and the combination rule differentiate the various methods (Deledalle et al., 2014), (Ambrosanio et al., 2018), (Aghababaee et al., 2018). Among these filters, some make use of optical data to help the despeckling process (Vitale et al., 2019a) and others take advantage of the ratio image (Ferraioli et al., 2019), that is, the ratio between the SAR image and the filtered one, which represents the predicted speckle. The drawback of these NL filters is that they are time consuming.
In recent years, deep learning based methods have shown impressive results in many natural image processing applications, such as classification, segmentation and detection (He et al., 2017). Good results have also been achieved in several remote sensing applications, such as land classification and segmentation (Mazza, Sica, 2019), super-resolution (Vitale, 2019) and detection (Gargiulo et al., 2019). Deep learning based methods for despeckling have been proposed as well (Wang et al., 2017), (Chierchia et al., 2017), (Vitale et al., 2019b). The great amount of available data and the need to produce results rapidly match well with deep learning solutions. In this work we propose a convolutional neural network (CNN) for despeckling. Building on the results of our previous solution (Vitale et al., 2019b), we propose a new cost function in order to better preserve and handle edges.
Training a CNN for despeckling is a challenging task because of the lack of a noise-free reference. The proposed solution works with simulated data under the fully developed speckle hypothesis. In this work we inherit the architecture of KL-DNN (Vitale et al., 2019b) and apply a different cost function in order to improve edge preservation: the aim is to better filter non-homogeneous areas, such as man-made structures, where KL-DNN performs poorly.
2.1 Data Simulation
We simulated a single-look (L = 1) speckle N under the fully developed hypothesis, following the well-known Gamma distribution (Argenti et al., 2013)
$$p(N) = \frac{L^L}{\Gamma(L)}\, N^{L-1} e^{-LN}$$
This means that in the simulation we consider only the speckle that characterizes homogeneous areas. To obtain the noise-free reference X, we collected images from the optical dataset Merced Land Use (Yang, Newsam, 2010) and converted them to gray scale. We then multiplied the reference by the simulated noise to produce the simulated SAR image $Y = X \cdot N$. Finally, we tiled the dataset into patches: 30000 patches were used for training and 7000 for validation.
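The simulation described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the code used for the experiments: the function names `simulate_sar` and `tile` are hypothetical, and the patch size passed to `tile` is a free parameter, since its value is not given here. A unit-mean Gamma variable with shape L and scale 1/L reduces to an exponential variable for L = 1.

```python
import numpy as np

def simulate_sar(reference, looks=1, seed=None):
    """Multiply a noise-free intensity image by fully developed speckle.

    The speckle is Gamma distributed with shape `looks` and scale 1/`looks`,
    so that its mean is 1 (hypothetical helper, for illustration only).
    """
    rng = np.random.default_rng(seed)
    speckle = rng.gamma(shape=looks, scale=1.0 / looks, size=reference.shape)
    return reference * speckle, speckle

def tile(image, size):
    """Cut a 2-D image into non-overlapping square patches of side `size`."""
    rows, cols = image.shape
    return [
        image[r:r + size, c:c + size]
        for r in range(0, rows - size + 1, size)
        for c in range(0, cols - size + 1, size)
    ]

# Example: a flat gray-scale reference stands in for an optical image.
reference = np.full((256, 256), 100.0)
noisy, speckle = simulate_sar(reference, looks=1, seed=0)
patches = tile(noisy, size=64)  # 64 is an arbitrary patch size
```

On a homogeneous reference the sample mean of the speckle is close to 1, which is the property that the fully developed hypothesis relies on.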
2.2 KL-DNN
Figure 1. KL-DNN architecture
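In our previous work, we trained a ten-layer CNN (KL-DNN) on the simulated data (for training details refer to (Vitale et al., 2019b)). The cost function used in that work is a combination of two terms:
$$\mathcal{L} = \mathcal{L}_2 + \lambda_{KL}\,\mathcal{L}_{KL}$$
$$\mathcal{L}_2 = \|\hat{X} - X\|^2, \qquad \mathcal{L}_{KL} = D_{KL}(\hat{N}, N)$$
where $\hat{N} = Y/\hat{X}$ (the ratio between the SAR image $Y$ and the filtered image $\hat{X}$) and $N$ are the estimated noise and the theoretical one, respectively; $D_{KL}$ is the Kullback-Leibler (KL) divergence between two distributions $p$ and $q$
$$D_{KL}(p, q) = \sum_i p(i)\,\log\frac{p(i)}{q(i)}$$
In this cost function, $\mathcal{L}_2$ is responsible for spatial preservation, by comparing the filtered image $\hat{X}$ with the noise-free reference $X$. $\mathcal{L}_{KL}$ is responsible for preserving the noise statistics, by comparing the probability distribution of the estimated noise $\hat{N}$ with the theoretical one by means of the KL divergence. The aim of this cost function is to suppress the noise while taking care of its statistical properties.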
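A two-term cost of this kind, a spatial term plus a KL term on the noise statistics, can be sketched as follows. This is a simplified illustration and not the KL-DNN training code: the distributions are estimated with plain histograms, and the names `kl_divergence`, `kl_dnn_cost`, `weight_kl`, `bins` and `n_max` are assumptions introduced here. A histogram is not differentiable, so an actual training loss would need a differentiable estimate instead.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Discrete KL divergence sum_i p(i) log(p(i)/q(i)) after normalisation."""
    p = np.asarray(p, dtype=float) + eps  # eps avoids log(0) and division by 0
    q = np.asarray(q, dtype=float) + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def kl_dnn_cost(x_hat, x, y, n, weight_kl=1.0, bins=64, n_max=8.0, eps=1e-6):
    """Spatial term plus weighted KL term (illustrative sketch).

    x_hat: filtered image, x: noise-free reference,
    y: simulated SAR image, n: simulated (theoretical) speckle.
    """
    spatial = np.mean((x_hat - x) ** 2)
    n_hat = y / (x_hat + eps)  # ratio image, used as the estimated noise
    edges = np.linspace(0.0, n_max, bins + 1)
    p_hat, _ = np.histogram(n_hat, bins=edges)
    p_ref, _ = np.histogram(n, bins=edges)
    return float(spatial + weight_kl * kl_divergence(p_hat, p_ref))

# Example on a flat reference with single-look speckle.
rng = np.random.default_rng(0)
x = np.full((64, 64), 100.0)
n = rng.gamma(shape=1.0, scale=1.0, size=x.shape)
y = x * n
cost_perfect = kl_dnn_cost(x, x, y, n)     # ideal filter output
cost_unfiltered = kl_dnn_cost(y, x, y, n)  # no filtering at all
```

With the ideal output both terms are close to zero. Leaving the image unfiltered is penalised twice: the spatial term grows, and the ratio image collapses to a constant whose histogram is far from the Gamma one.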
KL-DNN provides good results on homogeneous areas, but produces artefacts in non-homogeneous ones, such as urban areas where many man-made structures are present. Man-made structures in real SAR images look totally different from those in simulated data because of the geometry of SAR image acquisition: when an object is illuminated by the SAR, effects such as layover, shadowing and multiple bounces arise. In a SAR image, a building is usually characterized by one side with strong backscattering, due to multiple reflections with the ground, while the other side is darker because of layover and shadowing. The speckle in such areas is not fully developed (Frery et al., 1997), and our simulated data do not include such effects and statistics.
Since KL-DNN works under the fully developed hypothesis, it does not know how to filter man-made structures. In general, it filters them so as to produce a fully developed speckle, and so many artefacts arise.
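To limit this problem, we include a term in the cost function to improve edge preservation. The aim is to make the network able to recognize man-made structures and to preserve their shape (usually characterized by strong edges). In this way, we want the network to filter homogeneous areas and to preserve object details. The proposed cost function is therefore composed of three terms:
$$\mathcal{L} = \mathcal{L}_2 + \lambda_{KL}\,\mathcal{L}_{KL} + \lambda_{\nabla}\,\mathcal{L}_{\nabla}$$
where $\mathcal{L}_2$ and $\mathcal{L}_{KL}$ are the spatial and statistical terms of KL-DNN, and the edge term compares the noise-free reference $X$ with the filtered image $\hat{X}$:
$$\mathcal{L}_{\nabla} = \left\|\frac{\partial X}{\partial u} - \frac{\partial \hat{X}}{\partial u}\right\|^2 + \left\|\frac{\partial X}{\partial v} - \frac{\partial \hat{X}}{\partial v}\right\|^2$$
where $\partial I/\partial u$ and $\partial I/\partial v$ are respectively the derivatives along the rows and columns of the image $I$. With this function we train the network to suppress the noise, taking care of both the statistical properties of the speckle and the edges that are present.
Figure 2. Proposed Cost Function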
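The edge-preservation term can be illustrated with finite differences. The sketch below is an assumption about one possible discretisation and not the implementation used in the paper: `image_gradients` and `edge_loss` are hypothetical names, the derivatives are approximated with first-order differences via `np.diff`, and the squared differences are averaged over the pixels.

```python
import numpy as np

def image_gradients(image):
    """First-order finite differences of a 2-D image."""
    d_along_rows = np.diff(image, axis=1)  # between horizontally adjacent pixels
    d_along_cols = np.diff(image, axis=0)  # between vertically adjacent pixels
    return d_along_rows, d_along_cols

def edge_loss(x_hat, x):
    """Mean squared difference between the gradients of x_hat and x."""
    gr_hat, gc_hat = image_gradients(x_hat)
    gr_ref, gc_ref = image_gradients(x)
    return float(np.mean((gr_ref - gr_hat) ** 2) + np.mean((gc_ref - gc_hat) ** 2))

# Example: a vertical step edge and an estimate that has lost the edge.
x = np.zeros((8, 8))
x[:, 4:] = 1.0
flat = np.full((8, 8), 0.5)

loss_exact = edge_loss(x, x)         # edge fully preserved
loss_flat = edge_loss(flat, x)       # edge smoothed away
loss_offset = edge_loss(x + 10.0, x) # constant offset, same edges
```

The last case shows why such a term is combined with a spatial term rather than used alone: a constant offset leaves all gradients unchanged, so the edge term by itself cannot detect it.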
For a fair comparison with KL-DNN, we use the same architecture. Moreover, we trained both the proposed network and KL-DNN on the same dataset, with the Adam optimizer (Kingma, Ba, 2015).
Numerical and visual assessments are carried out to validate the method. We test our solution on both simulated and real data. We compare it with our previous solution, KL-DNN, in order to show the impact of the cost function. Moreover, for the sake of completeness, we also compare it with two well-known non-local filters, FANS (Cozzolino et al., 2014) and SAR-BM3D (Parrilli et al., 2012). The simulated test data are taken from the Merced Land Use dataset and from scraped Google Maps images (Wang et al., 2017), and were never seen during training. The simulation follows the process described in Section 2.
Fig. 3 shows the results on simulated data. In all cases, it can be appreciated how the introduction of the edge-preservation term in the cost function improves edge preservation. In clip1 and clip2 (first two rows of Fig. 3), we consider an urban area: compared to KL-DNN, the filtered image is closer to the reference, and the results are sharper and cleaner. Moreover, the proposed approach shows much better edge and detail preservation, e.g. the objects on the rooftop are more visible than in KL-DNN. The same consideration holds for clip3 (last row of Fig. 3), which shows storage tanks: compared to KL-DNN, the edges are better preserved and details look sharper. In these examples, FANS and SAR-BM3D show several artefacts: FANS tends to preserve edges but oversmooths and loses many details, while SAR-BM3D preserves details better than FANS but smooths the edges. Overall, the proposed solution shows better edge and detail preservation than the other methods.
Figure 3. Results on simulated data, from top to bottom: clip1 (scraped Google Maps), clip2 (scraped Google Maps), clip3 (Merced Land Use)
These considerations are confirmed by the numerical assessment in Tables 1-3. The metrics indicate how close the filtered image is to the reference (MSE), how much the noise is suppressed (SNR) and how similar the filtered and reference images are from a perceptual point of view (SSIM). An ideal filter gives MSE = 0, SNR = ∞ and SSIM = 1. On all metrics, the proposed solution outperforms the other methods, confirming the previous considerations.
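For reference, the two simplest metrics can be computed as in the sketch below. The exact definitions used for the tables are not stated here, so the SNR formula (reference energy over error energy, in dB) is an assumption, and the names `mse` and `snr_db` are hypothetical. SSIM is omitted because it requires a windowed implementation.

```python
import numpy as np

def mse(x_hat, x):
    """Mean squared error between the filtered image and the reference."""
    return float(np.mean((x_hat - x) ** 2))

def snr_db(x_hat, x):
    """Signal-to-noise ratio in dB: reference energy over residual energy."""
    residual_energy = np.sum((x_hat - x) ** 2)
    if residual_energy == 0:
        return float("inf")  # ideal filter: no residual error
    return float(10.0 * np.log10(np.sum(x ** 2) / residual_energy))

# Example: a flat reference and an estimate with a constant error of 1.
x = np.full((4, 4), 2.0)
x_hat = x + 1.0

ideal = (mse(x, x), snr_db(x, x))
biased = (mse(x_hat, x), snr_db(x_hat, x))
```

Under these definitions the ideal output reproduces the limits quoted above, MSE = 0 and SNR = ∞.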
Table 3. Numerical Results on clip3
For the real SAR data, we consider a TerraSAR-X image of the Rosenheim area. Fig. 4 shows the results for all the methods. In general, the considerations made for simulated data still hold. The proposed solution is sharper than KL-DNN, showing better edge preservation. Moreover, it better preserves details and small objects, which are completely lost in FANS. SAR-BM3D, instead, has very good edge preservation but poor noise reduction. We focus in particular on the areas that are challenging for KL-DNN, that is, on the behaviour on man-made structures, where the speckle is not fully developed and where KL-DNN tends to smooth the image. Fig. 5 shows two details from Rosenheim. In both cases KL-DNN has difficulty filtering such areas and tends to smooth them. This is to be expected, since KL-DNN is trained under the fully developed hypothesis. In the proposed results, introducing an edge-preservation term in the cost function helps the network localize and recognize these strong backscatterers as objects to be preserved, so the smoothing effect is strongly reduced. It is clear from the two details in Fig. 5 and from the whole image in Fig. 4 that the proposed solution better preserves the sharpness of these edges without losing details in the homogeneous areas, even though the fully developed hypothesis is still assumed during training. The edge-preservation term thus helps the network overcome the limitation of the fully developed hypothesis by preserving the geometry of man-made structures.
Figure 4. Results on Real Data: Rosenheim area taken from TerraSAR-X
Figure 5. Details of real data
In this paper, a convolutional neural network for despeckling has been proposed. We define a new cost function, building on our previous solution KL-DNN, in which the network is trained under the fully developed hypothesis. This cost function aims to better preserve edges and to improve filtering in areas where man-made structures are present. The results show that, even though the network is still trained under the fully developed assumption, the introduction of a loss that takes care of the edges helps the filter in treating non-homogeneous areas.
Aghababaee, H., Ferraioli, G., Schirinzi, G., Sahebi, M. R., 2018. The Role of Nonlocal Estimation in SAR Tomographic Imaging of Volumetric Media. IEEE Geoscience and Remote Sensing Letters, 15(5), 729-733.
Ambrosanio, M., Baselice, F., Ferraioli, G., Pascazio, V., 2018. Ultrasound despeckling based on non local means. H. Eskola, O. Väisänen, J. Viik, J. Hyttinen (eds), EMBEC & NBC 2017, Springer Singapore, Singapore, 109–112.
Argenti, F., Lapini, A., Bianchi, T., Alparone, L., 2013. A Tutorial on Speckle Reduction in Synthetic Aperture Radar Images. IEEE Geoscience and Remote Sensing Magazine, 1(3), 6-35.
Chierchia, G., Cozzolino, D., Poggi, G., Verdoliva, L., 2017. SAR image despeckling through convolutional neural networks. 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 5438–5441.
Cozzolino, D., Parrilli, S., Scarpa, G., Poggi, G., Verdoliva, L., 2014. Fast Adaptive Nonlocal SAR Despeckling. IEEE Geoscience and Remote Sensing Letters, 11(2), 524-528.
Deledalle, C. A., Denis, L., Poggi, G., Tupin, F., Verdoliva, L., 2014. Exploiting Patch Similarity for SAR Image Processing: The nonlocal paradigm. IEEE Signal Processing Magazine, 31(4), 69-78.
Ferraioli, G., Pascazio, V., Schirinzi, G., 2019. Ratio-Based Nonlocal Anisotropic Despeckling Approach for SAR Images. IEEE Transactions on Geoscience and Remote Sensing, 57(10), 7785-7798.
Frery, A. C., Muller, H.-J., Yanasse, C. C. F., Sant’Anna, S. J. S., 1997. A model for extremely heterogeneous clutter. IEEE Transactions on Geoscience and Remote Sensing, 35(3), 648-659.
Gargiulo, M., Dell’Aglio, D. A. G., Iodice, A., Riccio, D., Ruello, G., 2019. A CNN-Based Super-Resolution Technique for Active Fire Detection on Sentinel-2 Data. CoRR, abs/1906.10413. http://arxiv.org/abs/1906.10413.
He, K., Gkioxari, G., Dollár, P., Girshick, R. B., 2017. Mask R-CNN. CoRR, abs/1703.06870. http://arxiv.org/abs/1703.06870.
Kingma, D. P., Ba, J., 2015. Adam: A method for stochastic optimization. 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings.
Mazza, A., Sica, F., 2019. Deep learning solutions for TanDEM-X-based forest classification. IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, 2631–2634.
Parrilli, S., Poderico, M., Angelino, C. V., Verdoliva, L., 2012. A Nonlocal SAR Image Denoising Algorithm Based on LLMMSE Wavelet Shrinkage. IEEE Transactions on Geoscience and Remote Sensing, 50(2), 606-616.
Vitale, S., 2019. A cnn-based pansharpening method with perceptual loss. IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, 3105–3108.
Vitale, S., Cozzolino, D., Scarpa, G., Verdoliva, L., Poggi, G., 2019a. Guided Patchwise Nonlocal SAR Despeckling. IEEE Transactions on Geoscience and Remote Sensing, 57(9), 6484–6498. http://dx.doi.org/10.1109/TGRS.2019.2906412.
Vitale, S., Ferraioli, G., Pascazio, V., 2019b. A new ratio image based CNN algorithm for SAR despeckling. IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, 9494–9497.
Wang, P., Zhang, H., Patel, V. M., 2017. SAR Image Despeckling Using a Convolutional Neural Network. IEEE Signal Processing Letters, 24(12), 1763-1767.
Yang, Y., Newsam, S., 2010. Bag-of-visual-words and spatial extensions for land-use classification. ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS).