Image restoration tasks [17, 28, 30, 29, 56, 47, 26, 35] have improved noticeably with the development of convolutional neural networks (CNNs). Although most image restoration methods work well on synthetically degraded images [23, 57, 9, 24], they perform insufficiently on real degradations.
Regarding the denoising methods, the networks trained with synthetic-noise (SN) do not work well for the real-world images because of the discrepancy in the distribution of SN and real-noise (RN). Specifically, CNNs [53, 54, 55] trained with Gaussian noise do not work well for the real-world images, because the CNNs are overfitted to the Gaussian distribution. The problem of overfitting can also be seen from a toy regression example in Fig. 1. As shown in Fig. 1(a), the severely overfitted regression method (‘w/o Regularizer’) shows worse performance than a regularized method (‘w/ Regularizer’) on the synthetic test data. Moreover, it can be seen in Fig. 1(b) that the generalization ability is much worse when the training and test domains are different.
To better address the problem caused by the different data distributions of the training and test sets, two kinds of approaches have been developed: (1) obtaining pairs of an RN image and the corresponding near-noise-free image [37, 41, 5, 2, 48], and (2) finding a more realistic noise model [19, 7].
The RN datasets enable the quantitative comparison of denoising performance on real-world images and also provide the training sets for learning-based methods. The CNNs trained with RN datasets robustly work on the real-world images, because domains of training and test set almost coincide. However, acquiring the pairs of RN images needs specialized knowledge, and the amount of provided datasets would not be enough for training a deeper CNN [51, 49]. Furthermore, learning-based methods can be easily overfitted to a specific camera device (dataset), which cannot cover all the devices that have different characteristics such as gamma correction, color correction, and other in-camera pipelines.
To find a more realistic noise model, CBDNet [19] synthesized near-RN images by considering realistic noise models and simulating the in-camera pipeline. It generates a sufficiently large dataset that simulates more than 200 camera response functions. CBDNet shows excellent performance on RN images even though the CNN is trained with SN. Furthermore, its authors showed that additional training with an RN dataset improves performance. Although realistic noise modeling indeed reduces the domain discrepancy between SN and RN, a domain discrepancy still remains to be handled. Moreover, a CNN can be overfitted to a certain noise model that is actually not a ‘real’ noise.
Figure 1: A toy regression example presenting the effects of regularization and transfer learning. (a) We assume that training and test data are sampled from a 5th-order polynomial, with additive white Gaussian noise (AWGN). The original regression model (without regularizer) is denoted as w/o Regularizer, which is a 10th-order polynomial model. As is well known, the higher-order model overfits the data. Assuming that a regularization method successfully degenerates the model to a 6th-order one (w/ Regularizer), overfitting is relieved. It can be seen from the mean squared error (MSE) on synthetic test data that the regularization can enhance the performance when training and test distributions are the same. (b) We assume another 5th-order polynomial that generates real data with some domain difference from the synthetic data. It can be seen from the MSE on real test data that the regularization is essential for processing other distributions. (c) The transfer learning regression method (w/ Regularizer + TF) is fine-tuned from w/ Regularizer with a few real data samples. It can be seen from the MSE on real test data that transfer learning can be trained efficiently with few real training samples.
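The toy regression of Fig. 1(a) can be reproduced in spirit with a few lines of NumPy. This is only a minimal sketch: the polynomial coefficients, sample counts, and noise level are illustrative assumptions, and the regularizer is idealized as a plain 6th-order fit, as the caption itself assumes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 5th-order ground-truth polynomial (coefficients are illustrative).
true_coeffs = np.array([1.0, -0.5, -2.0, 0.8, 1.5, 0.2])  # highest degree first

def sample(n, noise_std=0.1):
    """Draw n points on [-1, 1] from the polynomial plus AWGN."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.polyval(true_coeffs, x) + rng.normal(0.0, noise_std, size=n)
    return x, y

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on the points (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

x_train, y_train = sample(30)
x_test, y_test = sample(200)

# "w/o Regularizer": unconstrained 10th-order least-squares fit.
fit_10 = np.polyfit(x_train, y_train, deg=10)
# "w/ Regularizer": idealized here as a 6th-order least-squares fit.
fit_6 = np.polyfit(x_train, y_train, deg=6)

train_mse_10 = mse(fit_10, x_train, y_train)
train_mse_6 = mse(fit_6, x_train, y_train)
test_mse_10 = mse(fit_10, x_test, y_test)
test_mse_6 = mse(fit_6, x_test, y_test)
print(f"train MSE: deg10={train_mse_10:.4f}, deg6={train_mse_6:.4f}")
print(f"test  MSE: deg10={test_mse_10:.4f}, deg6={test_mse_6:.4f}")
```

Because the two models are nested, the 10th-order fit never has a higher training MSE than the 6th-order one; the quantity of interest for the argument is the gap on held-out data, which depends on the sampled points.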
From these observations, we propose a novel denoiser that is well generalized to the various RN from camera devices by employing an adaptive instance normalization (AIN) [45, 21, 31, 40]. In recent CNN based methods for restoring the synthetic degradations [32, 57, 24], regularization methods have not been exploited due to the small performance gain (even degrading performance). This indicates that a CNN is overfitted to the training data to get the best performance when domains of training and test set coincide [15].
On the other hand, the denoiser trained with SN needs regularization, for applying it to the RN denoising. As shown in the example of Fig. 1 (a) and (b) with ‘w/ Regularizer’, the network needs to be generalized through the regularization. In this respect, we propose a well-regularized denoiser by adopting the AIN as a regularization method. Specifically, the affine transform parameters for the normalization of features are generated from the pixel-wise noise level. Then, the transform parameters adaptively scale and shift the feature maps according to the noise characteristics, which results in the generalization of the CNN.
Furthermore, we propose a transfer learning scheme from SN to RN denoising network to reduce the domain discrepancy between the synthetic and the real. As mentioned above, the RN dataset would not be sufficient to train a CNN, which can also be easily overfitted to a certain RN dataset. Hence, we devise a transfer learning scheme that learns the general and invariant information of denoising from the SN domain and then transfer-learns the domain-specific information from the information of RN. As can be seen in Fig. 1(c), we believe that the SN denoiser can be adapted to an RN denoiser by re-transforming normalized features. Specifically, the parameters of AIN are updated using the RN dataset. The proposed scheme based on transfer learning can be applied to any dataset that has a small number of labeled data. That is, a CNN trained with the SN is easily transferred to work for the RN removal, without the need for training the whole network with the RN.
The contribution of this work can be summarized as follows:
• We propose a novel well-generalized denoiser based on the AIN, which enables the CNN to work for various noise from many camera devices.
• We introduce a transfer learning scheme for denoising, which learns the domain-invariant information from SN data and updates the affine transform parameters of AIN for the different-domain data.
• The proposed method achieves state-of-the-art performance on the SN and RN images.
Figure 2: Illustration of the proposed denoiser. The noise level estimator and the reconstruction network are U-Net based architectures, so the feature maps are down/up-sampled by average pooling and transposed convolution. We denote each scale of feature map as 1/s where s can be 1, 2, and 4. All the represented convolutions in the reconstruction network have 64s feature maps, excluding the last convolution. The feature representation of the noise level estimator is also composed of convolutions with 32 channels, and the noise level maps are obtained from convolutions having 3-channel outputs. The overall number of parameters is 13.7 M.
The statistics of RN in standard RGB (sRGB) images depend on the properties of camera sensors and in-camera pipelines. Specifically, shot noise and readout noise are generated from the sensor, and the statistics of the generated noise are changed by in-camera pipeline stages such as demosaicing, gamma correction, in-camera denoising, white balancing, and color correction [38]. There have been several works to approximate the RN model, including Gaussian-Poisson [16, 33], heteroscedastic Gaussian [20], Gaussian Mixture Model (GMM) [58], and deep learning based methods [10, 1]. Considering the camera pipeline, CBDNet [19] and Unprocessing [7] also considered realistic noise models. Specifically, they obtained near-RN images by adding heteroscedastic Gaussian noise to pseudo-raw images and feeding them to the camera pipeline. These methods can simulate more than 200 camera response functions, and thus generate noisy images having different characteristics. Moreover, CBDNet is alternately trained with RN and SN to overcome overfitting to the noise model. We think the alternate training scheme can incur training instability due to the different data distributions, and also cannot effectively learn RN that differs substantially from the training data. Thus, we introduce a new transfer learning scheme that can simply but effectively adapt an SN denoiser to other RN ones by re-transforming the normalized feature map.
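As a rough illustration of the heteroscedastic Gaussian model mentioned above, the sketch below adds signal-dependent noise whose variance grows linearly with intensity, in the form used by CBDNet, σ²(L) = L·σ_s² + σ_c². The function name, the parameter values, and the random stand-in image are assumptions for illustration; the camera pipeline simulation (camera response function, demosaicing) is deliberately left out.

```python
import numpy as np

def add_heteroscedastic_noise(irradiance, sigma_s, sigma_c, rng):
    """Add zero-mean Gaussian noise with variance L * sigma_s^2 + sigma_c^2.

    Returns the noisy image and the per-pixel noise standard deviation.
    """
    irradiance = np.asarray(irradiance, dtype=np.float64)
    variance = irradiance * sigma_s ** 2 + sigma_c ** 2
    std = np.sqrt(variance)
    noise = rng.normal(0.0, 1.0, size=irradiance.shape) * std
    return irradiance + noise, std

rng = np.random.default_rng(0)
# Stand-in for a pseudo-raw image with values in [0, 1).
clean = rng.uniform(0.0, 1.0, size=(64, 64, 3))
noisy, noise_level_map = add_heteroscedastic_noise(
    clean, sigma_s=0.1, sigma_c=0.03, rng=rng
)
```

The returned `noise_level_map` is spatially variant because it follows the image content, which is the property that a pixel-wise noise level estimator has to capture.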
We aim to train a robust RN denoiser, which reduces the discrepancy between the distributions of training and test sets, by proposing a novel denoiser and transfer learning. Precisely, we propose a denoising architecture using the AIN, which can be well generalized to RN images. Also, we introduce a transfer learning scheme to reduce the remaining data discrepancy, which consists of two stages: (1) training a denoiser with SN dataset $S = \{X_s, Y_s\}$ and (2) transfer learning with RN dataset $T = \{X_r, Y_r\}$, where $X$ and $Y$ are noise-free images and noisy images respectively, and the subscript $s$ is for SN and $r$ for RN. We use the noise model from CBDNet for generating $y_s \in Y_s$ from $x_s \in X_s$ with the noise level of $\sigma(y_s)$, where $y_s$ denotes an SN image. After training the SN denoiser with $S$, the RN denoiser is trained with $T$ (pairs of RN image $y_r$ and near noise-free image $x_r$). In the transfer learning stage, only the domain-specific parameters are updated to effectively preserve the knowledge learned from SN data.
3.1. Adaptive Instance Normalization Denoising Network
We present a novel AIN denoising network (AINDNet), where the same architecture is employed for both the SN and RN denoisers. AINDNet is composed of a noise level estimator and a reconstruction network, as presented in Fig. 2. The noise level estimator takes a noisy image $y$ as an input and generates the estimated noise level map $\hat{\sigma}(y) = F_{est}(y; \theta_{est})$, where $\theta_{est}$ denotes the training parameters of the estimator. The reconstruction network takes $\hat{\sigma}(y)$ and $y$ as input and generates the denoised image $\hat{x} = F_{rec}(y, \hat{\sigma}(y); \theta_{rec})$, where $\theta_{rec}$ denotes the training parameters of the reconstruction network. The reconstruction network is a U-Net based architecture with AIN residual blocks (AIN-ResBlocks).
Figure 3: Illustration of the proposed AIN-ResBlock with corresponding kernel size (k), feature scale (s), and number of features (n). Note that n increases linearly with s. Leaky ReLU is employed as the activation function. The Norm (red) block denotes the channel-wise spatial normalization block. Average pooling scales the size of $\hat{\sigma}(y)$ to be the same as that of h.
Noise Level Estimator Estimating the noise level is not an easy task due to the complex noise model and in-camera pipeline. In our experiments, we find that previous simple noise level estimators [19, 7], which consist of five convolutions, cannot accurately estimate the noise level. The main reason is that the previous estimators have a small receptive field, so they cannot fully capture complex noise information. From this observation, we design a new noise level estimator with a larger receptive field by employing down/up-sampling and multi-scale estimations. Specifically, the estimator produces a down-scaled estimation map $\hat{\sigma}_{4}(y) \in \mathbb{R}^{H/4 \times W/4 \times 3}$ and an original-sized estimation map $\hat{\sigma}_{1}(y) \in \mathbb{R}^{H \times W \times 3}$. Then, these two outputs are weight-averaged to feed the reconstruction network:
$$\hat{\sigma}(y) = \lambda_{ms}\, \mathcal{L}(\hat{\sigma}_{4}(y)) + (1 - \lambda_{ms})\, \hat{\sigma}_{1}(y), \quad (1)$$
where $H$, $W$, and $\mathcal{L}(\cdot)$ denote the height and width of the image, and the linear interpolation respectively. $\lambda_{ms}$ is empirically determined to be 0.8. From the weighted average of the multi-scale estimates, we obtain a region-wise smoothed $\hat{\sigma}(y)$, which follows the general characteristic of RN.
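The weighted averaging of the two noise level estimates can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the down-scaling factor of 4, the placement of the 0.8 weight on the up-sampled coarse estimate, and the corner-aligned interpolation grid are all assumptions, and `upsample_linear` and `blend_noise_maps` are hypothetical helper names.

```python
import numpy as np

def upsample_linear(m, out_h, out_w):
    """Separable linear interpolation of an (h, w, c) map to (out_h, out_w, c)."""
    h, w, c = m.shape
    # Output grid positions expressed in input-pixel coordinates (corner-aligned).
    ys = np.linspace(0.0, h - 1.0, out_h)
    xs = np.linspace(0.0, w - 1.0, out_w)
    # Interpolate along the vertical axis first.
    rows = np.empty((out_h, w, c))
    for j in range(w):
        for k in range(c):
            rows[:, j, k] = np.interp(ys, np.arange(h), m[:, j, k])
    # Then interpolate along the horizontal axis.
    out = np.empty((out_h, out_w, c))
    for i in range(out_h):
        for k in range(c):
            out[i, :, k] = np.interp(xs, np.arange(w), rows[i, :, k])
    return out

def blend_noise_maps(sigma_low, sigma_full, lam=0.8):
    """Weighted average of the up-sampled coarse map and the full-size map."""
    out_h, out_w, _ = sigma_full.shape
    return lam * upsample_linear(sigma_low, out_h, out_w) + (1.0 - lam) * sigma_full

rng = np.random.default_rng(0)
sigma_full = rng.uniform(0.0, 0.1, size=(32, 32, 3))  # stand-in full-size estimate
sigma_low = rng.uniform(0.0, 0.1, size=(8, 8, 3))     # stand-in 1/4-scale estimate
sigma_hat = blend_noise_maps(sigma_low, sigma_full)
```

Weighting the interpolated coarse map more heavily makes the blended map vary slowly over regions, while the full-size term keeps some pixel-level detail.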
Figure 4: Illustration of the proposed transfer learning scheme. Only the AIN modules, the noise level estimator, and the last convolution are updated when learning RN data. For better visualization, we omit the noise level estimator in this figure.
Adaptive Instance Normalization The proposed AIN-ResBlock plays two crucial roles in the proposed denoising scheme. One is regularizing the network so that it is not overfitted to SN images, and the other is adapting the SN denoiser to an RN denoiser. For this, we build the AIN-ResBlock with two convolutions and two AIN modules, as presented in Fig. 3. The AIN module affine-transforms the normalized feature map of a convolution, $h \in \mathbb{R}^{H_s \times W_s \times C}$, by taking a conditional input $\hat{\sigma}(y)$, where $H_s \times W_s$ denotes the spatial size of the feature map at each scale $s$, and $C$ is the number of channels. Specifically, the AIN module produces affine transform parameters, namely scale ($\gamma^*$) and shift ($\beta^*$), for each pixel. Thus, every feature map is channel-wise normalized and pixel-wise affine-transformed according to the noise level. The update process of the feature map in the AIN module at site ($i, j, c$) is formally represented as
$$h^{new}_{i,j,c} = \gamma^*_{i,j,c} \left( \frac{h_{i,j,c} - \mu_c}{\sigma_c} \right) + \beta^*_{i,j,c}, \quad (2)$$
where the variables with superscript * are generated from $\hat{\sigma}(y)$, and $\mu_c$ and $\sigma_c$ denote the mean and standard deviation of $h$ respectively, in channel $c$. Precisely,
$$\mu_c = \frac{1}{H_s W_s} \sum_{i,j} h_{i,j,c}, \qquad \sigma_c = \sqrt{\frac{1}{H_s W_s} \sum_{i,j} \left( h_{i,j,c} - \mu_c \right)^2 + \epsilon}, \quad (3)$$
where $\epsilon$ denotes the stability parameter, which prevents divide-by-zero in eq. (2) and is set to a small positive constant in our implementation. Note that $\gamma^*$ and $\beta^*$ can be generated pixel-wise, and thus the proposed method can process spatially variant noisy images adaptively. From another point of view, the AIN module acts as feature attention [11, 57, 46, 14, 25] with explicitly constrained information ($\hat{\sigma}(y)$).
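The channel-wise normalization followed by a pixel-wise affine transform can be sketched in NumPy as below. This is a minimal sketch under stated assumptions: the stability constant `eps=1e-5` is an assumed value, and `affine_params` stands in for the module that generates the scale and shift from the noise level map using a hypothetical per-pixel linear map (a 1×1 convolution), which is simpler than the convolutions of the actual AIN module.

```python
import numpy as np

def ain(h, gamma, beta, eps=1e-5):
    """Normalize h per channel over space, then scale and shift per pixel.

    h, gamma, beta: arrays of shape (H, W, C). eps is an assumed value.
    """
    mu = h.mean(axis=(0, 1), keepdims=True)                 # per-channel mean
    sigma = np.sqrt(h.var(axis=(0, 1), keepdims=True) + eps)  # per-channel std
    return gamma * (h - mu) / sigma + beta

def affine_params(noise_map, w_gamma, b_gamma, w_beta, b_beta):
    """Hypothetical 1x1 'convolutions': (H, W, 3) noise map -> (H, W, C) maps."""
    gamma = noise_map @ w_gamma + b_gamma
    beta = noise_map @ w_beta + b_beta
    return gamma, beta

rng = np.random.default_rng(0)
H, W, C = 16, 16, 8
h = rng.normal(2.0, 3.0, size=(H, W, C))           # stand-in feature map
noise_map = rng.uniform(0.0, 0.1, size=(H, W, 3))  # stand-in noise level map
gamma, beta = affine_params(
    noise_map,
    w_gamma=rng.normal(size=(3, C)), b_gamma=np.ones(C),
    w_beta=rng.normal(size=(3, C)), b_beta=np.zeros(C),
)
h_new = ain(h, gamma, beta)
```

Since `gamma` and `beta` have a value at every spatial site, two pixels with different estimated noise levels receive different modulation even though they share the same normalization statistics.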
3.2. Transfer Learning
We propose a transfer learning scheme that leverages S to accelerate the training of the RN denoiser with T, which has a limited number of elements (RN pairs). We expect that the SN denoiser learns general and invariant feature representations, and the RN denoiser learns noise characteristics that cannot be fully modeled from SN data. The proposed transfer learning scheme can achieve these two merits by adapting the SN denoiser to the RN denoiser. For this, we focus on the normalization parameters to handle different data distributions, which is inspired by style transfer and classification tasks [45, 21, 40]. In these methods, transforming the normalization parameters can transfer an image to a different style domain, and classification in different domains can be handled by switching the batch normalization parameters. From these observations, we adapt denoisers across domains by transfer-learning the normalization parameters, assuming that the data discrepancy between S and T can be adapted by re-transforming the normalized feature maps.
Specifically, the AIN parameters of the SN denoiser can be adapted pixel-wise, conditioned on the estimated noise level map. Thus, the AIN modules and the noise level estimator are transfer-learned with RN data. Although the ground-truth noise level is not available in T, so no estimation loss can be defined, the noise level estimator can still be trained with the reconstruction loss. We consider that the last convolution plays a crucial role in reconstructing feature maps into an RGB image, hence the last convolution is also updated. The overall proposed transfer learning scheme is presented in Fig. 4.
Since the proposed transfer learning scheme only updates parts of a well-generalized denoiser, it converges faster and achieves better performance with very few elements from T than training from scratch. Moreover, the proposed scheme effectively copes with multiple models, which are inevitably required due to severely different noise statistics, and saves a lot of memory by switching only the specific parameters.
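The idea of updating only the domain-specific parameters, and of switching them per noise domain, can be illustrated with plain dictionaries. All parameter names and values below are hypothetical placeholders, since the text does not give the network's naming; the sketch only shows the bookkeeping, not the training itself.

```python
# Tags that mark the parts updated during transfer learning (hypothetical names).
TRAINABLE_TAGS = {"ain", "estimator", "last_conv"}

def is_domain_specific(name):
    """True if any dot-separated part of the parameter name is a trainable tag."""
    return any(part in TRAINABLE_TAGS for part in name.split("."))

def split_params(params):
    """Split parameters into a shared (frozen) set and a domain-specific set."""
    shared = {k: v for k, v in params.items() if not is_domain_specific(k)}
    specific = {k: v for k, v in params.items() if is_domain_specific(k)}
    return shared, specific

def assemble(shared, specific):
    """Build a full model by plugging one domain-specific set into the shared set."""
    return {**shared, **specific}

# Stand-in SN denoiser parameters (scalars instead of tensors for brevity).
sn_params = {
    "estimator.conv1.weight": 0.1,
    "recon.block1.conv1.weight": 0.2,
    "recon.block1.ain.gamma.weight": 0.3,
    "recon.block1.ain.beta.weight": 0.4,
    "recon.last_conv.weight": 0.5,
}
shared, specific_sn = split_params(sn_params)

# Stand-in for fine-tuning on one camera's RN data: only the specific set changes.
specific_cam_a = {k: v + 1.0 for k, v in specific_sn.items()}
model_cam_a = assemble(shared, specific_cam_a)
```

Under this arrangement, supporting another noise domain would only require storing one more domain-specific dictionary, while the shared convolution weights are kept once.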
Training For training the SN denoiser, we exploit a multi-scale asymmetric loss as the estimation loss, where the asymmetric loss is introduced in CBDNet [19] to prevent under-estimation. The multi-scale asymmetric loss is built from element-wise operations, namely the indicator function, multiplication, and power, and its hyperparameters are empirically determined. The down-scaled ground-truth noise level map is obtained by average pooling.
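For orientation, an asymmetric estimation loss of the kind introduced in CBDNet can be sketched as below, together with one plausible multi-scale extension. This is not the authors' exact objective: the weighting |α − 1[σ̂ − σ < 0]| · (σ̂ − σ)² follows the CBDNet form, while `alpha=0.3`, the pooling factor `k=4`, the equal scale weights, and the use of a mean instead of a sum are assumptions made for illustration.

```python
import numpy as np

def asymmetric_loss(sigma_hat, sigma, alpha=0.3):
    """Squared error weighted more heavily where the noise level is under-estimated."""
    diff = sigma_hat - sigma
    under = (diff < 0).astype(np.float64)  # indicator of under-estimation
    return float(np.mean(np.abs(alpha - under) * diff ** 2))

def avg_pool(m, k):
    """Non-overlapping k x k average pooling of an (h, w, c) map."""
    h, w, c = m.shape
    cropped = m[: h - h % k, : w - w % k]
    return cropped.reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))

def multi_scale_asymmetric_loss(sigma_hat_full, sigma_hat_low, sigma,
                                k=4, weights=(0.5, 0.5), alpha=0.3):
    """Asymmetric loss at full scale plus at a pooled scale (assumed weights)."""
    sigma_low = avg_pool(sigma, k)  # down-scaled ground-truth noise level
    return (weights[0] * asymmetric_loss(sigma_hat_full, sigma, alpha)
            + weights[1] * asymmetric_loss(sigma_hat_low, sigma_low, alpha))

sigma = np.full((8, 8, 3), 0.05)
loss_under = asymmetric_loss(sigma - 0.01, sigma)  # estimate too low
loss_over = asymmetric_loss(sigma + 0.01, sigma)   # estimate too high
```

With α below 0.5, an under-estimate of a given size costs more than an over-estimate of the same size, which is the behavior the asymmetric loss is meant to encourage.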
Then, the proposed SN denoiser is jointly trained with the estimation loss and the reconstruction loss, over the SN denoiser training parameters that include the noise level estimator and the reconstruction network. The weight term of the noise level estimator is empirically determined to be 0.05.
The RN denoiser is trained only with the reconstruction loss, and its training parameters are transferred from those of the SN denoiser. Only the previously stated parameters, i.e., those of the AIN modules, the estimator, and the last convolution, are updated, and the other parameters are fixed when training the RN denoiser. We use the Adam optimizer for both the SN denoiser and the RN denoiser.
We present results on AWGN and RN images by training a Gaussian denoiser and an RN denoiser.
4.1. Experimental Setup
Training Settings For the Gaussian denoiser, the training images are obtained from DIV2K [43] and BSD400 [36], and noisy images are generated by the AWGN model. For the RN denoiser, we train a denoiser in two steps: training an SN denoiser and then training the RN denoiser by transfer learning. We obtain pairs of SN images and noise-free images from the Waterloo dataset [34] with the heteroscedastic Gaussian noise model and simulated in-camera pipelines. The RN denoiser, which is transferred from the SN denoiser, is trained with the SIDD training set [2]. All the training images are cropped into patches of a fixed size.
Test Set In the AWGN experiments, we evaluate on Set12 [53] and BSD68 [42], which are widely used for validating AWGN denoisers. Furthermore, we adopt three datasets for real-world noisy images:
• RNI15 [27] is composed of 15 real-world noisy images. Unfortunately, the ground-truth clean images are unavailable, therefore we only present qualitative results.
• DND [41] provides 50 noisy images that are captured by mirrorless cameras. Since we cannot access the near noise-free counterparts, the objective results (PSNR/SSIM) are obtained by submitting the denoised images to the DND site.
• SIDD [2] is obtained from smartphone cameras. It provides 320 pairs of noisy images and corresponding near noise-free ones for learning-based methods, where the captured scenes are mostly static. Furthermore, it provides 1280 patches for validation that have scenes similar to those of the training set. The quantitative results (PSNR/SSIM) are obtained by uploading the denoised images to the SIDD site.
4.2. Comparison with state-of-the-arts
Table 1: Average MAE and error STD for the images from Kodak24 where the inputs are corrupted by heteroscedastic Gaussian noise including the in-camera pipeline.
Table 2: Average PSNR of the denoised images, where the inputs are corrupted by AWGN at several noise levels, including 50, for the images from the Set12 and BSD68 datasets. (red: the best result, blue: the second best)
Noise Level Estimation We evaluate the accuracy of the noise level estimator on images generated with the exploited noise model. We compare the proposed noise level estimator with the fully convolutional network (FCN) estimators that are widely used [19, 7]. In order to evaluate the accuracy of the estimator itself, each estimator is trained with regression. The employed quantitative measurements are the mean absolute error (MAE) and the standard deviation (STD) of the error. We report the accuracy of each estimator in Table 1, where the input images are simultaneously corrupted with a signal-dependent noise level and a signal-independent noise level. We find that the proposed estimator gets more accurate results than the previous estimator with a similar number of parameters. Results for more noise levels will be presented in the supplementary file. Furthermore, we will present the denoising performance when combined with the reconstruction network.
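The two accuracy measurements can be computed as in the short sketch below. It reflects one reading of the text, in which the STD is taken over the signed estimation error; the function name `estimation_error_stats` is hypothetical.

```python
import numpy as np

def estimation_error_stats(sigma_hat, sigma):
    """Return (MAE, STD of the signed error) between two noise level maps."""
    err = np.asarray(sigma_hat, dtype=np.float64) - np.asarray(sigma, dtype=np.float64)
    mae = float(np.mean(np.abs(err)))
    std = float(np.std(err))
    return mae, std

rng = np.random.default_rng(0)
sigma_true = rng.uniform(0.0, 0.1, size=(32, 32, 3))               # stand-in ground truth
sigma_est = sigma_true + rng.normal(0.0, 0.01, size=sigma_true.shape)  # stand-in estimate
mae, std = estimation_error_stats(sigma_est, sigma_true)
```

A constant bias shows up in the MAE but not in the STD, so reporting both separates systematic offset from the spread of the error.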
AWGN Denoising We evaluate the proposed denoiser on noisy grayscale images that are corrupted by AWGN. For this, we train a single Gaussian denoiser network that learns noise levels in [0, 60]. The comparisons between the proposed method and other methods are presented in Table 2. We can see that the proposed denoiser achieves the best performance on Set12, whose composition is independent of the training sets. On the other hand, the proposed method gets the second best performance on BSD68, which consists of objects similar to those in BSD400 (training set). We think these results demonstrate that the proposed denoising architecture generalizes robustly beyond the training set.
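The AWGN setting and the PSNR metric reported in Table 2 can be sketched as follows. The intensity range of 0 to 255 for the noise levels, the uniform sampling of the training noise level, and the helper names are assumptions; the denoiser itself is not part of the sketch.

```python
import numpy as np

def add_awgn(img, sigma, rng):
    """Corrupt an image with additive white Gaussian noise of std sigma."""
    return img + rng.normal(0.0, sigma, size=img.shape)

def psnr(ref, est, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference and an estimate."""
    mse = np.mean((np.asarray(ref, dtype=np.float64) - np.asarray(est, dtype=np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))

rng = np.random.default_rng(0)
clean = rng.uniform(0.0, 255.0, size=(64, 64))  # stand-in grayscale image
# A single network covering [0, 60] would see a different level per training sample.
train_sigma = rng.uniform(0.0, 60.0)
noisy_train = add_awgn(clean, train_sigma, rng)
# Evaluation uses fixed levels; 50 is one of the levels named in Table 2.
noisy_test = add_awgn(clean, 50.0, rng)
print(f"PSNR of the noisy input at sigma=50: {psnr(clean, noisy_test):.2f} dB")
```

The PSNR of the noisy input gives the baseline that any denoiser output would be compared against at the same noise level.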
Real Noise Denoising We also investigate the proposed denoiser and transfer learning scheme on RN datasets. Processing RN image is considered very practical, but difficult, because the noises are signal dependent, spatially variant, and visualized diversely according to different in-camera pipelines. Thus, we think RN denoising is an appropriate task for showing the generalization ability of the proposed denoiser and the effects of the proposed transfer learning.
For the precise comparison, we train four different denoisers according to training sets and learning methods:
• AINDNet(S): AINDNet is trained with SN images, which is the proposed SN denoiser.
• AINDNet(R): AINDNet is trained with RN images.
• AINDNet+RT: All the parameters from AINDNet(S) are re-trained with RN images, which is a common transfer learning scheme.
• AINDNet+TF: Specified parameters from AINDNet(S) are updated with RN images, which is the proposed RN denoiser.
Moreover, we present the geometric self-ensemble [44] results, marked with a superscript, in order to maximize the potential performance of the proposed methods.
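Geometric self-ensemble is commonly implemented by running the denoiser on the eight flip and rotation variants of the input, undoing each transform on the output, and averaging. The sketch below follows that common recipe; the `denoise` argument is any callable mapping an (H, W, C) array to an array of the same shape, and the stand-in denoiser used in the example is purely illustrative.

```python
import numpy as np

def self_ensemble(denoise, img):
    """Average denoiser outputs over the 8 flip/rotation variants of img."""
    outputs = []
    for flip in (False, True):
        for k in range(4):
            # Forward transform: rotate by k * 90 degrees, then optionally flip.
            t = np.rot90(img, k, axes=(0, 1))
            if flip:
                t = np.flip(t, axis=1)
            out = denoise(t)
            # Inverse transform in reverse order: undo the flip, then the rotation.
            if flip:
                out = np.flip(out, axis=1)
            out = np.rot90(out, -k, axes=(0, 1))
            outputs.append(out)
    return np.mean(outputs, axis=0)

rng = np.random.default_rng(0)
noisy = rng.uniform(0.0, 1.0, size=(32, 48, 3))

def toy_denoiser(x):
    """Stand-in denoiser: horizontal 3-tap average (not rotation-invariant)."""
    return (np.roll(x, 1, axis=1) + x + np.roll(x, -1, axis=1)) / 3.0

ensembled = self_ensemble(toy_denoiser, noisy)
```

The ensemble costs eight forward passes per image, which is why such results are usually reported separately from the single-pass numbers.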
Meanwhile, there has been a challenge on real image denoising [3] where the SIDD is used. Our method shows lower performance than the top-ranked ones in the challenge, but it should be noted that the number of parameters of our network is much smaller than those in the challenge. For example, DHDN [39] and DIDN [52], which appeared in the challenge, require about 160 M and 190 M training parameters respectively, which are about 12 to 15 times larger than ours. Moreover, the challenge methods appear to be slightly overfitted to SIDD: the winning denoiser [22] gets comparably lower performance (38.78 dB) on DND than our method. Therefore, we do not directly compare the proposed method with the challenge methods.
The comparisons, including internal comparisons, are presented in Tables 3 and 4. We can find that the proposed methods get the best performance on the DND and SIDD benchmarks. Specifically, the proposed AINDNet(S) achieves the best performance on the DND benchmark, which is impressive in that it outperforms RN-trained denoisers. Moreover, AINDNet(S) gets 1.5 dB and 2.4 dB gains over CBDNet on DND and SIDD respectively, where the employed noise models are the same. These results indicate that the proposed denoiser is not overfitted to the noise model and can be well generalized to RN images. However, AINDNet(S) is inferior to AINDNet(R) on SIDD by a large margin. The main reason is that AINDNet(R) is solely trained with SIDD training images, and the test set consists of scenes and objects similar to those in the training set. In other words, AINDNet(R) can be slightly overfitted to the SIDD benchmark, and this phenomenon can be seen from its insufficient performance on DND.
Table 3: Average PSNR of the denoised images on the DND benchmark. We denote the environment of training, i.e., training with SN data only, RN data only, and both. A superscript mark denotes the geometric self-ensemble [44] result. (red: the best result, blue: the second best)
In contrast, AINDNet+RT and AINDNet+TF get satisfying performance on both DND and SIDD. Concretely, AINDNet+RT and AINDNet+TF perform better than the others, including AINDNet(R), on SIDD, which indicates that pre-training with SN images results in better performance. AINDNet+TF is more likely than AINDNet+RT to preserve the knowledge previously learned from SN data, so AINDNet+TF achieves the best overall performance among the compared methods.
We present visualized comparisons on SIDD and RNI15 in Figs. 5 and 6, which show that the proposed methods remove noise robustly while preserving the edges. Thus, characters in the output images are more apparent than in the results of other methods. Furthermore, we also present the visual enhancement in Fig. 7 when the proposed transfer learning scheme is applied. Since the RN denoiser transfer-learns the characteristics of RN, AINDNet+TF successfully removes unusual noise that cannot be removed with AINDNet(S). Moreover, the RN denoiser learns the properties of JPEG compression artifacts, which are not learned beforehand by the SN denoiser, so it can also successfully reduce compression artifacts. We will also present other visualized comparisons in the supplementary file.
Table 4: Average PSNR of the denoised images on the SIDD benchmark. We denote the environment of training, i.e., training with SN data only, RN data only, and both. A superscript mark denotes the geometric self-ensemble [44] result. (red: the best result, blue: the second best)
4.3. Discussions
Effect of Transfer Learning with Limited RN Pairs We investigate the relation between denoising performance and the number of RN image pairs in T, because we consider that preparing T is quite difficult and the number of elements can also be limited. For this, we train each network with a constrained number of image pairs, from one to all (320), from SIDD [2]. The average PSNR of each denoiser is presented in Table 5. It can be seen that the transfer learning schemes achieve strong performance with a small number of real training images. It is notable that AINDNet+TF trained with 32 pairs of real data achieves better performance than RIDNet, which exploits all pairs. Thus, we can conclude that transfer learning from the SN denoiser dramatically improves performance when only a small number of labeled data from another domain is available.
Architecture of Denoiser We demonstrate the effectiveness of the reconstruction network for training with S. For this, AINDNet(S) is compared with a baseline (IN + Concat), which replaces the AIN module with IN and takes the concatenation of the noisy image and the noise level map as input [54, 55]. Furthermore, we compare with an adaptive Gaussian denoiser [24] that can process a spatially variant noise map by feeding it to a gated-residual block (Gated-ResBlock). Since its performance on RN datasets has not been reported, we train an SN denoiser by replacing the AIN-ResBlock with the Gated-ResBlock, where the other settings are the same as for AINDNet. Table 6 shows that the proposed AIN-ResBlock achieves the best performance on RN datasets. Thus, we believe that the AIN-ResBlock is an appropriate architecture for generalization. We will present an ablation study on the variables updated for transfer learning in the supplementary file.
Table 5: Investigation of RN denoising performance according to the amount of RN data. The quantitative results (in average PSNR (dB)) are reported on the SIDD validation dataset.
Figure 5: The real noisy image from SIDD, and the comparison of the results.
Figure 6: The real noisy image from RNI15, and the comparison of the results.
In this paper, we have presented a novel denoiser and a transfer learning scheme for RN denoising. The proposed denoiser employs an AIN to regularize the network and also to prevent the network from overfitting to SN. The transfer learning mainly updates the AIN module using RN data to adjust to the data distribution. From the experimental results, we find that the proposed denoising scheme can be well generalized to RN even if it is trained with SN. Moreover, the transfer learning scheme can effectively adapt an SN denoiser to an RN denoiser, with very little additional training with real-noise pairs. We will make our codes publicly available at https://github.com/terryoo/AINDNet for further research and comparison.
Figure 7: The real noisy image from RNI15, and the comparison of the results showing the effectiveness of the proposed transfer learning scheme.
Table 6: Investigation of the proposed reconstruction network when denoisers are trained with SN data. The quantitative results (in average PSNR (dB)) are reported on the DND test dataset and the SIDD validation dataset.
Acknowledgments This work was supported in part by the Institute for Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. NI190004, Development of AI based Robot Technologies for Understanding Assembly Instruction and Automatic Assembly Task Planning), and in part by Samsung Electronics Co., Ltd.
[1] Abdelrahman Abdelhamed, Marcus A Brubaker, and Michael S Brown. Noise flow: Noise modeling with conditional normalizing flows. In Proceedings of the IEEE International Conference on Computer Vision, pages 3165–3173, 2019. 3
[2] Abdelrahman Abdelhamed, Stephen Lin, and Michael S Brown. A high-quality denoising dataset for smartphone cameras. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018. 1, 5, 7
[3] Abdelrahman Abdelhamed, Radu Timofte, and Michael S Brown. Ntire 2019 challenge on real image denoising: Methods and results. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 0–0, 2019. 6
[4] Michal Aharon, Michael Elad, and Alfred Bruckstein. K-svd: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on signal processing, 54(11):4311–4322, 2006. 7
[5] Josue Anaya and Adrian Barbu. Renoir–a dataset for real low-light image noise reduction. Journal of Visual Communication and Image Representation, 51:144–154, 2018. 1
[6] Saeed Anwar and Nick Barnes. Real image denoising with feature attention. In The IEEE International Conference on Computer Vision (ICCV), October 2019. 6, 7, 11
[7] Tim Brooks, Ben Mildenhall, Tianfan Xue, Jiawen Chen, Dillon Sharlet, and Jonathan T Barron. Unprocessing images for learned raw denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 1, 3, 5
[8] Harold C Burger, Christian J Schuler, and Stefan Harmeling. Image denoising: Can plain neural networks compete with bm3d? In 2012 IEEE conference on computer vision and pattern recognition, 2012. 7
[9] Jianrui Cai, Hui Zeng, Hongwei Yong, Zisheng Cao, and Lei Zhang. Toward real-world single image super-resolution: A new benchmark and a new model. arXiv preprint arXiv:1904.00523, 2019. 1
[10] Jingwen Chen, Jiawei Chen, Hongyang Chao, and Ming Yang. Image blind denoising with generative adversarial network based noise modeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018. 3, 7
[11] Long Chen, Hanwang Zhang, Jun Xiao, Liqiang Nie, Jian Shao, Wei Liu, and Tat-Seng Chua. Sca-cnn: Spatial and channel-wise attention in convolutional networks for image captioning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5659–5667, 2017. 4
[12] Yunjin Chen and Thomas Pock. Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration. IEEE transactions on pattern analysis and machine intelligence, 39(6):1256–1272, 2016. 6, 7
[13] Kostadin Dabov, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. Color image denoising via sparse 3d collaborative filtering with grouping constraint in luminance-chrominance space. In 2007 IEEE International Conference on Image Processing, 2007. 6, 7
[14] Tao Dai, Jianrui Cai, Yongbing Zhang, Shu-Tao Xia, and Lei Zhang. Second-order attention network for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 11065–11074, 2019. 4
[15] Ruicheng Feng, Jinjin Gu, Yu Qiao, and Chao Dong. Suppressing model overfitting for image super-resolution networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019. 2
[16] Alessandro Foi, Mejdi Trimeche, Vladimir Katkovnik, and Karen Egiazarian. Practical poissonian-gaussian noise modeling and fitting for single-image raw-data. IEEE Transactions on Image Processing, 17(10):1737–1754, 2008. 3
[17] Michaël Gharbi, Gaurav Chaurasia, Sylvain Paris, and Frédo Durand. Deep joint demosaicking and denoising. ACM Transactions on Graphics (TOG), 35(6):191, 2016. 1
[18] Shuhang Gu, Lei Zhang, Wangmeng Zuo, and Xiangchu Feng. Weighted nuclear norm minimization with application to image denoising. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2014. 7
[19] Shi Guo, Zifei Yan, Kai Zhang, Wangmeng Zuo, and Lei Zhang. Toward convolutional blind denoising of real photographs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. 1, 3, 5, 7, 11
[20] Samuel W Hasinoff, Frédo Durand, and William T Freeman. Noise-optimal capture for high dynamic range photography. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 553–560. IEEE, 2010. 3
[21] Xun Huang and Serge Belongie. Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision, pages 1501–1510, 2017. 2, 4
[22] Dong-Wook Kim, Jae Ryun Chung, and Seung-Won Jung. Grdn: Grouped residual dense network for real image denoising and gan-based real-world noise modeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019. 6
[23] Jiwon Kim, Jung Kwon Lee, and Kyoung Mu Lee. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1646–1654, 2016. 1
[24] Yoonsik Kim, Jae Woong Soh, and Nam Ik Cho. Adaptively tuning a convolutional neural network by gate process for image denoising. IEEE Access, 7:63447–63456, 2019. 1, 2, 7, 8
[25] Yoonsik Kim, Jae Woong Soh, and Nam Ik Cho. Agarnet: Adaptively gated jpeg compression artifacts removal network for a wide range quality factor. IEEE Access, 8:20160–20170, 2020. 4
[26] Filippos Kokkinos and Stamatis Lefkimmiatis. Iterative residual cnns for burst photography applications. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5929–5938, 2019. 1
[27] Marc Lebrun, Miguel Colom, and Jean-Michel Morel. The noise clinic: a blind image denoising algorithm. Image Processing On Line, 5:1–54, 2015. 5
[28] Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photorealistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4681–4690, 2017. 1
[29] Stamatios Lefkimmiatis. Universal denoising networks: a novel cnn architecture for image denoising. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3204–3213, 2018. 1, 6
[30] Jaakko Lehtinen, Jacob Munkberg, Jon Hasselgren, Samuli Laine, Tero Karras, Miika Aittala, and Timo Aila. Noise2noise: Learning image restoration without clean data. In International Conference on Machine Learning, pages 2971–2980, 2018. 1
[31] Yijun Li, Chen Fang, Jimei Yang, Zhaowen Wang, Xin Lu, and Ming-Hsuan Yang. Universal style transfer via feature transforms. In Advances in neural information processing systems, pages 386–396, 2017. 2
[32] Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017. 2
[33] Xinhao Liu, Masayuki Tanaka, and Masatoshi Okutomi. Practical signal-dependent noise parameter estimation from a single noisy image. IEEE Transactions on Image Processing, 23(10):4361–4371, 2014. 3
[34] Kede Ma, Zhengfang Duanmu, Qingbo Wu, Zhou Wang, Hongwei Yong, Hongliang Li, and Lei Zhang. Waterloo exploration database: New challenges for image quality assessment models. IEEE Transactions on Image Processing, 26(2):1004–1016, 2016. 5
[35] Ilja Manakov, Markus Rohm, Christoph Kern, Benedikt Schworm, Karsten Kortuem, and Volker Tresp. Noise as domain shift: Denoising medical images by unpaired image translation. In Domain Adaptation and Representation Transfer and Medical Image Learning with Less Labels and Imperfect Data, pages 3–10. Springer, 2019. 1
[36] David Martin, Charless Fowlkes, Doron Tal, Jitendra Malik, et al. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In The IEEE International Conference on Computer Vision (ICCV), 2001. 5
[37] Seonghyeon Nam, Youngbae Hwang, Yasuyuki Matsushita, and Seon Joo Kim. A holistic approach to cross-channel image noise modeling and its application to image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1683–1691, 2016. 1
[38] Alberto Ortiz and Gabriel Oliver. Radiometric calibration of ccd sensors: Dark current and fixed pattern noise estimation. In IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA’04. 2004, volume 5, pages 4730–4735. IEEE, 2004. 3
[39] Bumjun Park, Songhyun Yu, and Jechang Jeong. Densely connected hierarchical network for image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019. 6
[40] Taesung Park, Ming-Yu Liu, Ting-Chun Wang, and Jun-Yan Zhu. Semantic image synthesis with spatially-adaptive normalization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2337–2346, 2019. 2, 4
[41] Tobias Plötz and Stefan Roth. Benchmarking denoising algorithms with real photographs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017. 1, 5
[42] Stefan Roth and Michael J Black. Fields of experts. International Journal of Computer Vision, 82(2):205, 2009. 5
[43] Radu Timofte, Eirikur Agustsson, Luc Van Gool, Ming-Hsuan Yang, and Lei Zhang. Ntire 2017 challenge on single image super-resolution: Methods and results. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 114–125, 2017. 5
[44] Radu Timofte, Rasmus Rothe, and Luc Van Gool. Seven ways to improve example-based single image super resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1865–1873, 2016. 6, 7
[45] Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. Instance normalization: The missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022, 2016. 2, 4
[46] Sanghyun Woo, Jongchan Park, Joon-Young Lee, and In So Kweon. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), pages 3–19, 2018. 4
[47] Lei Xiao, Felix Heide, Wolfgang Heidrich, Bernhard Schölkopf, and Michael Hirsch. Discriminative transfer learning for general image restoration. IEEE Transactions on Image Processing, 27(8):4091–4104, 2018. 1
[48] Jun Xu, Hui Li, Zhetong Liang, David Zhang, and Lei Zhang. Real-world noisy image denoising: A new benchmark. arXiv preprint arXiv:1804.02603, 2018. 1
[49] Jun Xu, Lei Zhang, and David Zhang. External prior guided internal prior learning for real-world noisy image denoising. IEEE Transactions on Image Processing, 27(6):2996–3010, 2018. 1
[50] Jun Xu, Lei Zhang, and David Zhang. A trilateral weighted sparse coding scheme for real-world image denoising. In Proceedings of the European Conference on Computer Vision (ECCV), 2018. 7
[51] Jun Xu, Lei Zhang, David Zhang, and Xiangchu Feng. Multi-channel weighted nuclear norm minimization for real color image denoising. In Proceedings of the IEEE International Conference on Computer Vision, pages 1096–1104, 2017. 1
[52] Songhyun Yu, Bumjun Park, and Jechang Jeong. Deep iterative down-up cnn for image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019. 6
[53] Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, and Lei Zhang. Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE Transactions on Image Processing, 26(7):3142–3155, 2017. 1, 5, 6, 7
[54] Kai Zhang, Wangmeng Zuo, Shuhang Gu, and Lei Zhang. Learning deep cnn denoiser prior for image restoration. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3929–3938, 2017. 1, 7
[55] Kai Zhang, Wangmeng Zuo, and Lei Zhang. Ffdnet: Toward a fast and flexible solution for cnn-based image denoising. IEEE Transactions on Image Processing, 27(9):4608–4622, 2018. 1, 6, 7
[56] Kai Zhang, Wangmeng Zuo, and Lei Zhang. Learning a single convolutional super-resolution network for multiple degradations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3262–3271, 2018. 1
[57] Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, and Yun Fu. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), 2018. 1, 2, 4
[58] Fengyuan Zhu, Guangyong Chen, and Pheng-Ann Heng. From noise modeling to blind image denoising. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 420–429, 2016. 3
We present the results of the transfer-learned denoiser, where AINDNet is pre-trained with AWGN and then adapted to real noise (RN). For a precise comparison, Table 7 reports the performance of three denoisers that differ in training set and learning method:
• AINDNet(AWGN): AINDNet is trained with AWGN images.
• AINDNet(AWGN)+TF: AINDNet(AWGN) is transfer-learned with a single real noisy image.
• AINDNet(AWGN)+TF: AINDNet(AWGN) is transfer-learned with the full set of real noisy images (320 images).
It can be seen that the proposed transfer learning scheme significantly improves the performance of synthetic-noise (SN) denoisers, including the AWGN denoiser, when the number of real noisy training images is limited.
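Since the comparison in Table 7 is made in terms of average PSNR, the following minimal sketch shows how such a score can be computed. It assumes pixel values scaled to [0, 1] and images given as flat lists of floats; the helper names `psnr` and `average_psnr` are hypothetical and are not taken from the authors' code.

```python
import math


def psnr(reference, estimate, peak=1.0):
    """PSNR in dB between two equally sized flat sequences of pixel values."""
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def average_psnr(pairs, peak=1.0):
    """Mean of the per-image PSNR values over (reference, estimate) pairs."""
    scores = [psnr(ref, est, peak) for ref, est in pairs]
    return sum(scores) / len(scores)
```

Averaging per-image PSNR values, rather than computing one PSNR from a pooled MSE, is an assumption here; the two conventions give different numbers.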
We evaluate the accuracy of the proposed noise level estimator on input images that are simultaneously corrupted with more diverse signal-dependent and signal-independent noise levels. As presented in Table 8, the proposed noise level estimator achieves better accuracy, with lower standard deviations of the errors, in most cases. Furthermore, it predicts quite accurate estimates when the images are corrupted with high signal-dependent and signal-independent noise levels.
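The two statistics reported for the estimator, the mean absolute error (MAE) and the standard deviation of the errors, can be sketched as below. This is an illustration under stated assumptions: the errors are taken as absolute differences between predicted and true noise levels, the population standard deviation is used, and the function name `estimation_errors` is hypothetical. The paper does not specify these details in this section.

```python
import statistics


def estimation_errors(true_levels, predicted_levels):
    """Return (MAE, error STD) for paired true and predicted noise levels."""
    if len(true_levels) != len(predicted_levels):
        raise ValueError("inputs must have the same length")
    # Absolute error of each prediction (assumed error definition).
    errors = [abs(p - t) for t, p in zip(true_levels, predicted_levels)]
    mae = statistics.fmean(errors)
    std = statistics.pstdev(errors)  # population STD of the errors
    return mae, std
```

A lower MAE indicates estimates that are closer on average, while a lower error STD indicates that the estimator is more consistent across images.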
Table 7: Average PSNR of the denoised images on the SIDD validation set. The marked variant is transfer-learned with a single real noisy training image.
Table 8: Average MAE and error STD for the images from Kodak24, where the inputs are corrupted by heteroscedastic Gaussian noise, including the in-camera pipeline.
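To make the heteroscedastic Gaussian corruption concrete, the sketch below draws noise whose variance has a signal-dependent and a signal-independent part. The variance form sigma_s^2 * x + sigma_c^2, the symbol names `sigma_s` and `sigma_c`, and the function name are assumptions based on the common Poissonian-Gaussian approximation; the in-camera pipeline mentioned in the caption is not modeled here.

```python
import math
import random


def add_heteroscedastic_noise(pixels, sigma_s, sigma_c, rng=None):
    """Add zero-mean Gaussian noise with variance sigma_s^2 * x + sigma_c^2.

    pixels  : flat sequence of non-negative intensities (e.g. in [0, 1])
    sigma_s : signal-dependent noise level (assumed symbol)
    sigma_c : signal-independent noise level (assumed symbol)
    """
    if rng is None:
        rng = random.Random()
    noisy = []
    for x in pixels:
        if x < 0:
            raise ValueError("pixel intensities must be non-negative")
        # Brighter pixels receive a larger noise variance.
        std = math.sqrt(sigma_s ** 2 * x + sigma_c ** 2)
        noisy.append(x + rng.gauss(0.0, std))
    return noisy
```

With `sigma_s = 0` the model reduces to ordinary AWGN with standard deviation `sigma_c`, which is why an AWGN-trained denoiser covers only a special case of this noise.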
We demonstrate the effectiveness of the noise level estimator for training with SN data. Table 9 presents the performance of noise level estimators combined with the reconstruction network under different objective functions. Remember that can generate smoothed outputs, so is excluded when using . We find that the state-of-the-art training scheme (FCN + ) yields inferior performance to the proposed training scheme (Ours + ). Moreover, the proposed training scheme also surpasses internal variation (Ours + ).
Table 9: Investigation of noise level estimator and estimation loss when denoisers are trained with SN data. The quantitative results (in average PSNR (dB)) are reported on the DND test dataset and the SIDD validation dataset.
We further investigate the relation between update parameters and performance in the transfer learning phase. For a precise comparison, Table 10 compares three variants, each of which freezes one set of update parameters:
• Ours-AIN: The AIN module is not updated in the transfer learning stage.
• Ours-Estimator: The noise level estimator is not updated in the transfer learning stage.
• Ours-LastConv: The last convolution is not updated in the transfer learning stage.
It can be seen that updating the noise level estimator and updating the last convolution each contribute a performance gain of 0.1–0.2 dB. Keeping the AIN module parameters fixed yields even worse performance than the SN denoiser.
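The ablation variants can be viewed as different selections of trainable parameters. The sketch below is illustrative only: the name prefixes `ain`, `estimator`, `last_conv`, and `backbone` are hypothetical, and it assumes that all parameters outside the three update groups stay frozen during transfer learning, which this section does not state explicitly.

```python
# Hypothetical name prefixes for the three parameter groups that are
# updated during transfer learning.
UPDATE_GROUPS = ("ain", "estimator", "last_conv")

# Each ablation variant freezes exactly one of the update groups.
VARIANTS = {
    "Ours": None,
    "Ours-AIN": "ain",
    "Ours-Estimator": "estimator",
    "Ours-LastConv": "last_conv",
}


def trainable_names(param_names, frozen_group=None):
    """Return the parameter names that are updated for a given variant."""
    if frozen_group is not None and frozen_group not in UPDATE_GROUPS:
        raise ValueError(f"unknown group: {frozen_group}")
    active = [g for g in UPDATE_GROUPS if g != frozen_group]
    # A parameter belongs to a group if its dotted name starts with the prefix.
    return [n for n in param_names if n.split(".")[0] in active]
```

For example, `trainable_names(names, VARIANTS["Ours-AIN"])` keeps only the estimator and last-convolution parameters, which corresponds to the Ours-AIN row of the comparison.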
Table 10: Investigation of update parameters when denoisers are transfer-learned with RN data. The quantitative results (in average PSNR (dB)) are reported on the SIDD validation dataset.