Colored Noise Injection for Training Adversarially Robust Neural Networks

2020 · arXiv

Abstract

Even though deep learning has shown unmatched performance on various tasks, neural networks have been shown to be vulnerable to small adversarial perturbations of the input that lead to significant performance degradation. In this work we extend the idea of adding white Gaussian noise to the network weights and activations during adversarial training (PNI [7]) to the injection of colored noise for defense against common white-box and black-box attacks. We show that our approach outperforms PNI and various previous approaches in terms of adversarial accuracy on the CIFAR-10 and CIFAR-100 datasets. In addition, we provide an extensive ablation study of the proposed method justifying the chosen configurations.

1. Introduction

Deep Neural Networks (DNNs) have achieved tremendous success in a variety of applications, including image classification and generation, text recognition, machine translation, and game playing. Despite their notable performance on numerous tasks, DNNs are sensitive to small perturbations of their inputs. Szegedy et al. [20] showed that this sensitivity can be exploited to create adversarial examples: inputs that are visually indistinguishable from the originals but are classified differently. Subsequent studies proposed a variety of adversarial attacks, i.e., techniques for creating adversarial examples.

One of the first practical attacks is FGSM [6], which perturbs the input by a single step along the sign of the gradient of the attacked network's loss with respect to the input, scaled to the attack radius. PGD [15], one of the strongest attacks to date, improved on FGSM by repeating the gradient step iteratively, i.e., performing projected gradient ascent in a neighbourhood of the input. C&W [3] used a loss term penalizing large distances from the original input instead of imposing a hard constraint on it. The resulting attack is therefore unbounded: it tries to find a minimum-norm adversarial example rather than searching for one in a predefined region. DDN [17] significantly improved the runtime and performance of C&W by decoupling the optimization of the direction and the norm of the perturbation.
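The example below makes the difference between a single-step and an iterative attack concrete. It is a minimal sketch of FGSM and L∞ PGD, and it assumes a toy logistic-regression model (weights `w`, bias `b`, label `y` in {0, 1}) whose input gradient has a closed form. The function names and step sizes are illustrative and are not taken from [6] or [15]. A real image attack would also clip to the valid pixel range.

```python
import numpy as np

def loss_and_grad(x, w, b, y):
    """Binary cross-entropy of a toy logistic-regression model and its gradient w.r.t. the input x."""
    z = float(w @ x + b)
    loss = np.logaddexp(0.0, z) - y * z       # numerically stable BCE for label y in {0, 1}
    p = 1.0 / (1.0 + np.exp(-z))              # predicted probability of class 1
    grad_x = (p - y) * w                       # d loss / d x
    return loss, grad_x

def fgsm(x, w, b, y, eps):
    """Single step of size eps along the sign of the input gradient."""
    _, g = loss_and_grad(x, w, b, y)
    return x + eps * np.sign(g)

def pgd(x, w, b, y, eps, alpha, k, rng=None):
    """k steps of signed gradient ascent, each followed by projection onto the L_inf ball of radius eps."""
    x_adv = x.copy()
    if rng is not None:                        # optional random start inside the ball
        x_adv = x_adv + rng.uniform(-eps, eps, size=x.shape)
    for _ in range(k):
        _, g = loss_and_grad(x_adv, w, b, y)
        x_adv = x_adv + alpha * np.sign(g)
        x_adv = np.clip(x_adv, x - eps, x + eps)  # projection step
    return x_adv
```

For this linear toy model both attacks end at the same corner of the ε-ball. On a deep network the iterative projection is what lets PGD follow a curved loss surface.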

It was also noted that adversarial examples can be created without access to the internals of the model, in particular its gradients, i.e., treating the model as a black box (as opposed to the white-box attacks mentioned above). Black-box attacks can be roughly divided into two classes. The first class trains a different model with known gradients to generate adversarial examples and then transfers them to the victim model [14, 16]. The second class estimates the gradients of the model numerically, based solely on its inputs and outputs [4, 11, 18, 21].
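A natural-evolution-strategies style estimator in the spirit of [21] illustrates the second class. It queries the function at randomly perturbed inputs and averages the finite differences. This is a minimal sketch that assumes a scalar-valued query function `f` and antithetic Gaussian perturbations with scale `sigma`. Both choices are illustrative: NAttack [11] and the other cited attacks use more elaborate schemes.

```python
import numpy as np

def nes_gradient(f, x, sigma=1e-3, n_samples=50, rng=None):
    """Estimate grad f(x) from function evaluations only (antithetic Gaussian sampling)."""
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)                       # random search direction
        grad += (f(x + sigma * u) - f(x - sigma * u)) * u      # two queries per direction
    # each term is approximately 2 * sigma * (u . grad f) * u, whose mean is 2 * sigma * grad f
    return grad / (2 * sigma * n_samples)
```

The attacker never touches the model's internals here: only `f` is called. This is why such attacks can target models that hide their gradients, at the cost of many queries.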

To defend against adversarial attacks, it was suggested to add adversarial examples to the training process, balancing them against the original images [15, 20]. Many subsequent works have tried to increase the strength of training-time attacks to improve robustness [2, 8, 9, 12, 22]. A different approach is to add randomization to the neural network [25, 26], making it harder for the attacker to evaluate the gradients and thus to exploit the vulnerability of the network. Recently, He et al. [7] proposed adding Gaussian noise to the weights and activations of the network and showed improvement over “vanilla” adversarial training under various attacks.
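One common way to write this balanced objective is shown below. It assumes a weighting $\lambda \in [0, 1]$ and an $L_\infty$ budget $\epsilon$, both introduced here for illustration. Setting $\lambda = 0$ gives the purely adversarial min-max formulation of Madry et al. [15]. In practice the inner maximization is approximated by PGD.

```latex
\min_{\theta}\; \mathbb{E}_{(x,y)}\Big[\, \lambda\, \mathcal{L}\big(f_\theta(x), y\big)
  + (1-\lambda) \max_{\|\delta\|_\infty \le \epsilon} \mathcal{L}\big(f_\theta(x+\delta), y\big) \Big]
```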

In this paper, we propose a generalization of parametric noise injection (PNI) [7] which we henceforth term parametric colored noise injection (CNI). The main idea is to replace the independent noise with low-rank multivariate Gaussian noise. We show that this modification provides consistent accuracy improvement under various attacks on a number of datasets.

2. Method

In this section we introduce the proposed method of colored noise injection (CNI) for adversarial defence. The previously proposed PNI [7] has much in common with uncorrelated variational dropout [10], a powerful regularization technique. In both methods, the noise added to the weights (or activations) is distributed as

$$\epsilon \sim \mathcal{N}(0, D)$$

for a diagonal matrix $D$. Both methods optimize the parameters of $D$ during training. The difference between the two methods lies in their objective: while variational dropout attempts to infer the Bayesian posterior, PNI uses adversarial training to optimize the trade-off between clean (unperturbed) and adversarial accuracy. In the adversarial training scheme, lowering the noise strength minimizes the clean loss, while increasing it provides a defense against adversarial attacks, thereby minimizing the adversarial loss.
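Because the noise is resampled on every forward pass, the model's output is itself random. The example below is a minimal forward-only sketch of a linear layer with PNI-style diagonal weight noise. It stores the diagonal of $D$ as a per-weight log-variance `log_d`, a hypothetical parameterization chosen here only to keep the variance non-negative and not necessarily how [7] parameterizes it. In actual training, `log_d` would be learned by backpropagation.

```python
import numpy as np

class DiagonalNoisyLinear:
    """Linear layer whose weights receive fresh N(0, diag(d)) noise on every forward pass."""

    def __init__(self, weight, log_d, rng=None):
        self.weight = weight      # clean weights, shape (out_features, in_features)
        self.log_d = log_d        # hypothetical log-variance per weight, same shape as weight
        self.rng = np.random.default_rng() if rng is None else rng

    def __call__(self, x):
        std = np.exp(0.5 * self.log_d)                               # per-weight standard deviation
        noise = std * self.rng.standard_normal(self.weight.shape)    # independent (white) noise
        return (self.weight + noise) @ x
```

With a very negative `log_d` the layer is effectively deterministic. With larger variances, two calls on the same input give different outputs, which is the randomness an attacker must cope with.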

Kingma et al. [10] studied the addition of both correlated and uncorrelated random noise to the weights, noting that dropout [19] is a particular case of such additive noise, and demonstrated the advantage of correlated noise over uncorrelated noise. He et al. [7], however, considered only uncorrelated noise. We therefore consider a generalization of PNI based on colored (correlated) noise, which we model using a multivariate normal distribution with a low-rank covariance term. For an $N$-dimensional noise vector, noise with a rank-$M$ covariance term is distributed as

$$\epsilon \sim \mathcal{N}(0, \Sigma), \qquad \Sigma = D + VV^\top,$$

where $D$ is an $N \times N$ non-negative diagonal matrix and $V$ is an $N \times M$ matrix. Note that PNI is a particular case of CNI with $M = 0$. The low-rank part of the covariance matrix, $VV^\top$, is a general positive semi-definite symmetric matrix whose rank is upper-bounded by $M$; it represents a low-dimensional interaction between different parameters.
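The example below is a short numerical check of these properties. It assumes the diagonal of $D$ is passed as a vector `d` and uses a small random $V$; the helper name `low_rank_covariance` is illustrative.

```python
import numpy as np

def low_rank_covariance(d, V):
    """Sigma = diag(d) + V V^T for non-negative d of shape (N,) and V of shape (N, M)."""
    return np.diag(d) + V @ V.T

N, M = 6, 2
rng = np.random.default_rng(0)
d = rng.uniform(0.1, 1.0, size=N)   # diagonal (white) part
V = rng.normal(size=(N, M))         # low-rank (colored) factor
Sigma = low_rank_covariance(d, V)
```

Storing `d` and `V` takes $N(M+1)$ numbers, versus $N(N+1)/2$ for a full covariance. This difference keeps the approach tractable for layers with many parameters.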

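Sampling low-rank multivariate normal noise. To sample the noise, we make use of the decomposition of the covariance matrix $\Sigma = D + VV^\top$. We sample two independent standard normal vectors,

$$z_1 \sim \mathcal{N}(0, I_N), \qquad z_2 \sim \mathcal{N}(0, I_M),$$

and let

$$\epsilon = D^{1/2} z_1 + V z_2.$$

Since $z_1$ and $z_2$ are independent, $\operatorname{Cov}(\epsilon) = D^{1/2} I_N D^{1/2} + V I_M V^\top = D + VV^\top = \Sigma$, so the sample has the required distribution without forming or factorizing the full $N \times N$ covariance matrix.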
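The sampling rule translates directly into code. The sketch below draws one noise vector per row in a batch; the function name `sample_low_rank_noise` is hypothetical.

```python
import numpy as np

def sample_low_rank_noise(d, V, n_samples, rng):
    """Draw eps = D^{1/2} z1 + V z2 with z1 ~ N(0, I_N), z2 ~ N(0, I_M); one sample per row."""
    N, M = V.shape
    z1 = rng.standard_normal((n_samples, N))
    z2 = rng.standard_normal((n_samples, M))
    # row-wise form: sqrt(d) * z1 is D^{1/2} z1, and z2 @ V.T is V z2
    return np.sqrt(d) * z1 + z2 @ V.T
```

Passing an $N \times 0$ matrix `V` (that is, $M = 0$) removes the correlated term and recovers the diagonal PNI case. The cost per sample is $O(NM)$.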

Table 1: Comparison of our method (CNI) to PNI [7] using various configurations on CIFAR-10 with ResNet-20 under PGD attack with k = 7 iterations. Mean and standard deviation are calculated over 10 runs in our experiments (upper half), and over 5 runs in the experiments by He et al. [7] (lower half). Noise is injected either into the weights (“W”) or into the output activations (“A-a”). Best results for PNI and CNI are set in bold.

Table 2: Comparison of our method to prior art against black-box attacks on CIFAR-10 with ResNet-20 under the transferable PGD attack and NAttack [11]. Marked entries denote our evaluation of the code provided by the authors or our re-implementation thereof.

Weight decay. We observed that for WideResNet the noise strength grows significantly larger than for ResNet. This leads to very slow convergence and lower performance of the resulting model. To overcome this, we added an additional weight decay term on the elements of V. While this approach leads to faster convergence and competitive results on both clean and adversarial inputs, it introduces an additional hyperparameter that requires some tuning.
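The sketch below assumes the common L2 convention, in which a decay coefficient $\lambda_V$ adds $\lambda_V V$ to the gradient (equivalently, a penalty $(\lambda_V/2)\|V\|_F^2$). It adds that extra term on $V$ to a plain SGD step. `lam_v` and both helper names are illustrative, and the optimizer actually used for WideResNet (Ranger) may apply decay differently.

```python
import numpy as np

def v_weight_decay_penalty(V, lam_v):
    """Extra L2 penalty on the low-rank factor V: (lam_v / 2) * ||V||_F^2."""
    return 0.5 * lam_v * np.sum(V ** 2)

def sgd_step_on_v(V, task_grad, lr, lam_v):
    """One plain SGD step on V, with the decay term lam_v * V added to the task gradient."""
    return V - lr * (task_grad + lam_v * V)
```

Because the correlated part of the covariance is $VV^\top$, shrinking $V$ by a factor $c$ shrinks that term by $c^2$. A decay on $V$ therefore directly limits the strength of the correlated noise.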

3. Experiments

Experimental settings. We trained ResNet-20 defended with CNI for 400 epochs on CIFAR-10 using SGD with learning rate 0.1, divided by 10 at epochs 200 and 300, and weight decay . The number of PGD iterations used for adversarial training was set to k = 7. In our experiments we chose rank M = 5 for the colored noise; Section 3.1 shows the effect of different values of M on the final accuracy. We have not studied distributions other than the multivariate normal.

Table 3: Comparison of our method to prior art on CIFAR-10 with WideResNet-28-4 under PGD attack with k = 10. Mean and standard deviation are calculated over 10 runs in our experiments, and over 2 runs in MMA [5]. Marked entries denote either our evaluation of the code provided by the authors (or our re-implementation thereof) or our evaluation based on the checkpoint provided by the authors, distinguished by different markers. We also provide results for a larger network (WideResNet-34-10) in the lower part of the table.

Table 4: Comparison of our method to prior art on CIFAR-100 with WideResNet-28-4 under PGD attack with k = 10. Mean and standard deviation are calculated over 10 runs. Marked entries denote either our evaluation of the code provided by the authors (or our re-implementation thereof) or our evaluation based on the checkpoint provided by the authors, distinguished by different markers. We also provide results for a larger network (WideResNet-34-10) in the lower part of the table.

For WideResNet-28-4, we used the Ranger optimizer (RAdam [13] with lookahead [24]) for 100 epochs, with learning rate 0.1 divided by 10 at epochs 75 and 90, weight decay and additional weight decay of 10 and for CIFAR-100 for the elements of V. The number of PGD iterations used for adversarial training was set to k = 10. In all cases, the checkpoint with the highest accuracy on a clean validation set was selected for evaluation.
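Both learning-rate schedules above are step schedules. The helper below is a minimal, hypothetical sketch that assumes the rate is divided by 10 once each milestone epoch is reached. This mirrors the behavior of PyTorch's `torch.optim.lr_scheduler.MultiStepLR` with `gamma=0.1`.

```python
def multistep_lr(epoch, base_lr=0.1, milestones=(200, 300), factor=0.1):
    """Learning rate at a given epoch: base_lr times `factor` for every milestone already reached."""
    n_passed = sum(epoch >= m for m in milestones)
    return base_lr * factor ** n_passed
```

With the defaults this reproduces the ResNet-20 schedule. `multistep_lr(epoch, milestones=(75, 90))` gives the WideResNet-28-4 schedule.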

White-box attacks. We evaluate our defense against the PGD attack [15] with the same number of iterations as used in adversarial training (k = 7 for ResNet-20 and k = 10 for WideResNet-28-4). The results are reported in Tables 1, 3 and 4. Among methods using WideResNet-28-4, CNI outperforms the others, and it achieves comparable results even when compared with methods that use larger networks.

Figure 1: Accuracy of the CNI-W model under PGD attack with different noise covariance ranks M. The shaded region shows the standard deviation of the results calculated over 50 runs.

Black-box attacks. We evaluated the proposed method against two common black-box attacks, namely the transferable attack [14] and NAttack [11]. For the transferable attack, we trained another instance of the CNI-W model and used it as the source model in two configurations: PGD with and without smoothing. The results are reported in Table 2. Under the transferable attack, our method achieves results comparable to prior art, and under NAttack it outperforms prior methods.
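The transfer setting can be illustrated with a toy example: adversarial examples are crafted using only the gradients of a source model, and the target is queried only for its predictions. The sketch below assumes linear classifiers with labels in {-1, +1}, single-step FGSM on the source, and a source whose weights are a perturbed copy of the target's (standing in for an independently trained model). The experiments above instead use PGD on a separately trained CNI-W model. The helper names are hypothetical.

```python
import numpy as np

def fgsm_linear(X, y, w, eps):
    """FGSM against a linear score s = X @ w with labels y in {-1, +1}: lowers the margin y * s."""
    return X - eps * y[:, None] * np.sign(w)[None, :]

def transfer_attack_accuracy(X, y, w_source, w_target, eps):
    """Craft with the source model's gradients only; query the target only for its predictions."""
    X_adv = fgsm_linear(X, y, w_source, eps)
    preds = np.sign(X_adv @ w_target)
    return float(np.mean(preds == y))
```

The attack transfers here because the two weight vectors are strongly aligned. How well real attacks transfer likewise depends on how similar the source and target models are.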

3.1. Ablation study

We study the dependence of the network performance on the noise rank M. The results are shown in Fig. 1. Coloring the noise (M > 0) significantly improves adversarial accuracy, while too high a rank reduces the accuracy, probably due to overparametrization.

We also study the adversarial accuracy as a function of the attack radius ε and the number of iterations k (Figs. 2 and 3). Fig. 3 shows that CNI has relatively low variance of the results for a small number of iterations, and that its accuracy converges to approximately 33% for large k. As expected, a larger attack radius breaks the defence, and for sufficiently large ε the network performs worse than random guessing. These results are consistent with the experiments in He et al. [7]. Gradient obfuscation typically manifests as accuracy that stops decreasing when the attack is given more iterations or a larger budget [1]. The observed behavior therefore indicates that noise injection genuinely increases the robustness of the network rather than merely obfuscating its gradients.

Figure 2: Accuracy of the CNI-W model under PGD attack with different attack radii ε (on the 0-255 pixel scale). The shaded region shows the standard deviation of the results calculated over 5 runs.

Figure 3: Accuracy of the CNI-W model under PGD attack with different numbers of iterations k. The shaded region shows the standard deviation of the results calculated over 5 runs.

4. Conclusions

In this paper we proposed to inject low-rank colored multivariate Gaussian noise into the parameters of a CNN during adversarial training. We showed that adding covariance terms to the injected noise improves over independent noise [7] under both white-box and black-box attacks. Moreover, even though we used a much smaller architecture (WideResNet-28-4), we achieved results comparable to state-of-the-art adversarial defences that used WideResNet-34-10. We also performed an ablation study of the method's hyperparameter (noise rank) as well as of the attack strength (attack radius ε and number of iterations k).

References

[1] Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 274–283, Stockholmsmässan, Stockholm, Sweden, 10–15 Jul 2018. PMLR.

[2] Yogesh Balaji, Tom Goldstein, and Judy Hoffman. Instance adaptive adversarial training: Improved accuracy tradeoffs in neural nets. arXiv preprint arXiv:1910.08051, 2019.

[3] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57, May 2017.

[4] Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, AISec ’17, pages 15–26, New York, NY, USA, 2017. ACM.

[5] Gavin Weiguang Ding, Yash Sharma, Kry Yik Chau Lui, and Ruitong Huang. MMA training: Direct input space margin maximization through adversarial training. In International Conference on Learning Representations, 2020.

[6] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

[7] Zhezhi He, Adnan Siraj Rakin, and Deliang Fan. Parametric noise injection: Trainable randomness to improve deep neural network robustness against adversarial attack. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.

[8] Haoming Jiang, Zhehui Chen, Yuyang Shi, Bo Dai, and Tuo Zhao. Learning to defense by learning to attack. arXiv preprint arXiv:1811.01213, 2018.

[9] Marc Khoury and Dylan Hadfield-Menell. Adversarial training with Voronoi constraints. arXiv preprint arXiv:1905.01019, 2019.

[10] Durk P. Kingma, Tim Salimans, and Max Welling. Variational dropout and the local reparameterization trick. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 2575–2583. Curran Associates, Inc., 2015.

[11] Yandong Li, Lijun Li, Liqiang Wang, Tong Zhang, and Boqing Gong. NATTACK: Learning the distributions of adversarial examples for an improved black-box attack on deep neural networks. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 3866–3876, Long Beach, California, USA, 09–15 Jun 2019. PMLR.

[12] Aishan Liu, Xianglong Liu, Chongzhi Zhang, Hang Yu, Qiang Liu, and Junfeng He. Training robust deep neural networks via adversarial noise propagation. arXiv preprint arXiv:1909.09034, 2019.

[13] Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, and Jiawei Han. On the variance of the adaptive learning rate and beyond. In International Conference on Learning Representations, 2020.

[14] Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770, 2016.

[15] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.

[16] Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, ASIA CCS ’17, pages 506–519, New York, NY, USA, 2017. ACM.

[17] Jérôme Rony, Luiz G. Hafemann, Luiz S. Oliveira, Ismail Ben Ayed, Robert Sabourin, and Eric Granger. Decoupling direction and norm for efficient gradient-based L2 adversarial attacks and defenses. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.

[18] Binxin Ru, Adam Cobb, Arno Blaas, and Yarin Gal. BayesOpt adversarial attack. In International Conference on Learning Representations, 2020.

[19] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15:1929–1958, 2014.

[20] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.

[21] Daan Wierstra, Tom Schaul, Jan Peters, and Jürgen Schmidhuber. Natural evolution strategies. In 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence), pages 3381–3387, June 2008.

[22] Valentina Zantedeschi, Maria-Irina Nicolae, and Ambrish Rawat. Efficient defenses against adversarial attacks. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, AISec ’17, pages 39–49, New York, NY, USA, 2017. ACM.

[23] Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, and Michael I. Jordan. Theoretically principled trade-off between robustness and accuracy. In ICML, pages 7472–7482, 2019.

[24] Michael Zhang, James Lucas, Jimmy Ba, and Geoffrey E. Hinton. Lookahead optimizer: k steps forward, 1 step back. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 9597–9608. Curran Associates, Inc., 2019.

[25] Yuchen Zhang and Percy Liang. Defending against whitebox adversarial attacks via randomized discretization. In Kamalika Chaudhuri and Masashi Sugiyama, editors, Proceedings of Machine Learning Research, volume 89 of Proceedings of Machine Learning Research, pages 684–693. PMLR, 16–18 Apr 2019.

[26] Stephan Zheng, Yang Song, Thomas Leung, and Ian Goodfellow. Improving the robustness of deep neural networks via stability training. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4480–4488, June 2016.