Can we have it all? On the Trade-off between Spatial and Adversarial Robustness of Neural Networks
2020·arXiv
Abstract

(Non-)robustness of neural networks to small, adversarial pixel-wise perturbations, and as more recently shown, to even random spatial transformations (e.g., translations, rotations) entreats both theoretical and empirical understanding. Spatial robustness to random translations and rotations is commonly attained via equivariant models (e.g., StdCNNs, GCNNs) and training augmentation, whereas adversarial robustness is typically achieved by adversarial training. In this paper, we prove a quantitative trade-off between spatial and adversarial robustness in a simple statistical setting. We complement this empirically by showing that: (a) as the spatial robustness of equivariant models improves by training augmentation with progressively larger transformations, their adversarial robustness worsens progressively, and (b) as the state-of-the-art robust models are adversarially trained with progressively larger pixel-wise perturbations, their spatial robustness drops progressively. Towards achieving Pareto-optimality in this trade-off, we propose a method based on curriculum learning that trains gradually on more difficult perturbations (both spatial and adversarial) to improve spatial and adversarial robustness simultaneously.

Neural network (NN) models have achieved state-of-the-art performance on several image tasks over the last few years. However, they have been known to be vulnerable to adversarial attacks that make small, imperceptible pixel-wise perturbations to images [39, 5]. Most pixel-wise adversarial perturbations studied in previous work are carefully constructed, model-dependent and, in most cases, also input-dependent perturbations of small ℓ∞ norm (and sometimes ℓ1 or ℓ2 norm). To achieve adversarial robustness, a popular, principled, and effective defense against ℓ∞ adversarial attacks known at present is adversarial training [17, 29, 37, 49].

From another perspective, recent work has also shown natural, physical-world attacks on neural network models using 2D and 3D spatial transformations such as translations, rotations and pose [47, 14, 42, 1]. Even random spatial transformations such as random translations and random rotations degrade the accuracy of neural network models considerably [12]. Recent benchmarks also measure the robustness of NN models to average-case, model-agnostic perturbations (e.g., noise, blur, brightness, contrast) or natural distribution shifts, and not only to worst-case adversarial perturbations [21, 40]. Spatial robustness, or robustness to spatial transformations (e.g., translations, rotations), has long been studied as a problem of model invariance, and is generally addressed using equivariant models or training augmentation. While standard Convolutional Neural Networks (StdCNNs) are translation-equivariant, recent efforts have resulted in equivariant NN models for other transformations such as rotation and flip [16, 8, 10, 45, 31, 26, 27, 13], and scaling [30, 46, 38]. Such group-equivariant NN models (GCNNs) are designed to be invariant to a specific group of transformations, so they need training augmentation to attain spatial robustness to random or average-case transformations. However, aided by clever weight-sharing, these recent works have achieved good spatial robustness performance with less training augmentation than their non-equivariant counterparts.

In this paper, we address an inevitable trade-off between the average-case spatial robustness and the worst-case ℓ∞ adversarial robustness in neural network models, and show results theoretically as well as empirically. On the one hand, as we attain better spatial robustness via equivariant models and larger training augmentation, the adversarial robustness worsens. On the other hand, as we attain better adversarial robustness via adversarial training against larger pixel-wise perturbations, the spatial robustness worsens. We then provide a curriculum learning-based approach to mitigate this issue and achieve a certain degree of Pareto-optimality in this trade-off.

Our work extends recent progress on two important questions, namely, the robustness to multiple attacks and the robustness vs accuracy trade-off. The robustness to translations (up to ±3 pixels) and rotations (up to ±30°) has been shown to have a fundamentally different loss landscape than the robustness to adversarial ℓp perturbations [12]. A trade-off between robustness to adversarial ℓ∞ perturbations and adversarial translations and rotations up to ±3 pixels and ±40°, respectively, was also recently noted [42]. Adversarial attacks on ℓp adversarially robust models by exploiting invariance at intermediate layers have also been of interest in recent work [23, 41]. The key differences between our work and these efforts are: (i) we study spatial robustness to random translations and rotations (which is more closely related to practice when training NN models and has not been studied in these earlier efforts), instead of adversarial translations or rotations; (ii) we study spatial robustness w.r.t. transformations over their entire range, and do not restrict ourselves to a ±30° to ±40° range, for instance; and (iii) we consider the use of group-equivariant models with training augmentation (which are designed for spatial robustness) in our studies, beyond just StdCNN models. Our results show a progressive decline in spatial robustness as adversarial robustness improves, and vice versa, a trend that was not explicitly noted in previous results. Our result can also be interpreted as a progressive trade-off between adversarial robustness and generalization accuracy itself [43, 49, 48], where generalization performance measures accuracy on a test distribution that includes natural data augmentation such as random translations, rotations, etc.

Given the nature of this trade-off, we then ask the question of how one can obtain a model close to the Pareto-optimal frontier for adversarial and random spatial robustness. Are there training strategies that can help achieve a balance between both? We observe that the obvious solutions of training in two stages (first adversarial training and then training augmentation by spatial transformations, or vice versa) do not help address this issue. They result in a version of catastrophic forgetting, where, as the model learns robustness on one of these fronts, it loses robustness on the other. We instead show that a curriculum learning-based strategy, where the model is trained by applying an adversarial perturbation to a randomly transformed input with gradually increased difficulty levels, gives solutions consistently closer to the Pareto-optimal frontier across well-known datasets and models. This strategy reflects the progressive nature of the trade-off noted above, and highlights the importance of observing this trend.

Our key contributions can be summarized as follows:

• We study a trade-off between adversarial robustness and spatial robustness (with a focus on random spatial transformations) both theoretically and empirically, on a setting closer to natural neural network training, which to the best of our knowledge, has not been considered before. We explain the trade-off between spatial and adversarial robustness with a well-motivated theoretical construction.

• We conduct a comprehensive suite of experiments across popularly used models and datasets, and find that models trained for spatial robustness (StdCNNs and GCNNs) progressively improve spatial robustness when their training is augmented with larger random spatial transformations but while doing so, their ℓ∞ adversarial robustness drops progressively. Similarly, models adversarially trained using progressively larger ℓ∞-norm attacks improve their adversarial robustness but while doing so, their spatial robustness drops progressively.

• We provide a simple yet effective approach based on curriculum learning that gives better-balanced, Pareto-optimal spatial and adversarial robustness compared to several natural baselines derived from combining adversarial training and training augmentation by random spatial transformations.

We begin by describing our problem formally. Let (X, Y) denote a random input-label pair from a given data distribution over the input-label pairs (x, y) ∈ X × Y. Let f be a neural network classifier. The accuracy of f is given by Pr(f(X) = Y), the fraction of inputs where the prediction matches the true label. For ϵ > 0, an ℓ∞ adversarial attack A maps each input x to x + A(x) such that ∥A(x)∥∞ ≤ ϵ, for all x ∈ X. The fooling rate of A is given by Pr(f(X + A(X)) ≠ f(X)), the fraction of inputs where the prediction changes after adversarial perturbation. For a given spatial transformation T, the classifier f is said to be T-invariant if the predicted label remains unchanged after the transformation T for all inputs; in other words, f(Tx) = f(x), for all x ∈ X. Spatial transformations such as translation, rotation and flip preserve the ℓ∞ norm, so TA(x) is an adversarial perturbation of small ℓ∞ norm for the input Tx for a T-invariant classifier. Therefore, for a given perturbation bound ϵ > 0, the maximum fooling rate for any T-invariant classifier f on the transformed data {Tx : x ∈ X} must be equal to the maximum fooling rate for f on the original data X. It is a bit more subtle when f is not truly invariant, that is, f(Tx) = f(x) for most inputs x but not all x. We define the rate of invariance of a classifier f to a transformation T as Pr(f(TX) = f(X)), or the fraction of test images whose predicted labels remain unchanged under the transformation T. For a class of transformations, e.g., random rotation r from the range [−θ°, +θ°], we define the rate of invariance as the average rate of invariance over transformations T in this class, i.e., Pr(f(r(X)) = f(X)), where the probability is over the random input X as well as the random transformation r. The rate of invariance is 100% if the model f is truly invariant.
When f is not truly invariant, the interplay between the invariance under transformations and robustness under adversarial perturbations of small  ℓ∞-norm is subtle. This interplay is exactly what we investigate.
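The two quantities defined above, the fooling rate Pr(f(X + A(X)) ≠ f(X)) and the rate of invariance Pr(f(r(X)) = f(X)), can be estimated on a finite test set by simple counting. The following is a minimal sketch under stated assumptions: the function names, the toy sign classifier, the toy attack, and the cyclic-shift transformation are all hypothetical stand-ins chosen for illustration, not the models or attacks used in the paper.

```python
import random

def fooling_rate(f, attack, inputs):
    """Fraction of inputs whose predicted label changes under x -> x + A(x)."""
    changed = 0
    for x in inputs:
        delta = attack(x)
        x_adv = [xi + di for xi, di in zip(x, delta)]
        changed += f(x_adv) != f(x)
    return changed / len(inputs)

def rate_of_invariance(f, sample_transform, inputs, rng):
    """Fraction of inputs whose predicted label is unchanged under a random T."""
    same = 0
    for x in inputs:
        T = sample_transform(rng)  # draw one random transformation per input
        same += f(T(x)) == f(x)
    return same / len(inputs)

# Toy classifier (hypothetical): label is the sign of the first coordinate.
def f(x):
    return 1 if x[0] >= 0 else -1

# Toy attack (hypothetical): push the first coordinate toward the boundary,
# so that the l_inf norm of the perturbation is exactly eps.
def attack(x, eps=0.5):
    return [-eps if x[0] >= 0 else eps] + [0.0] * (len(x) - 1)

# Toy transformation family (hypothetical): random cyclic shift of coordinates.
def sample_shift(rng):
    k = rng.randrange(4)
    return lambda x: x[k:] + x[:k]

inputs = [
    [0.2, 1.0, 1.0, 1.0],
    [2.0, -1.0, -1.0, -1.0],
    [-0.3, -2.0, -2.0, -2.0],
    [-3.0, 1.0, 1.0, 1.0],
]

print("fooling rate:", fooling_rate(f, attack, inputs))
print("rate of invariance:", rate_of_invariance(f, sample_shift, inputs, random.Random(0)))
```

In this toy example only the two inputs whose first coordinate lies within 0.5 of zero are fooled, and only the two inputs whose coordinates all share a sign are invariant to every shift, which illustrates that the two rates measure different properties of the same classifier.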

In this paper, we study neural network models and the simultaneous interplay between their spatial robustness, viz., rate of invariance for spatial transformations (such as random rotations in [−θ°, +θ°] for θ varying in the range [0, 180]), and their adversarial robustness to pixel-wise perturbations of ℓ∞ norm at most ϵ. Measuring the robustness of a model to adversarial perturbations of ℓp norm at most ϵ is NP-hard [24, 37]. Previous work on the comparison of different adversarial attacks has shown the Projected Gradient Descent (PGD) attack to be among the strongest [2]. Hence, we use PGD-based adversarial training as the methodology for adversarial robustness in a model, as commonly done. We show that our conclusions also hold for other forms of adversarial training such as TRADES [49], which gives a trade-off between adversarial robustness and natural accuracy (on the original unperturbed data).

Theoretically speaking, we prove a quantitative trade-off between spatial and adversarial robustness in a simple, natural statistical setting proposed in previous work [43]. Our theoretical results (described in Sec 3) show that there is a trade-off between spatial and adversarial robustness, and that it is not always possible to have both in a given model. Empirically, we study the following: (a) change in ℓ∞ adversarial robustness as we improve only the rate of spatial robustness using training augmentation with progressively larger transformations; (b) change in spatial invariance as we improve only adversarial robustness using PGD adversarial training with progressively larger ℓ∞-norm of pixel-wise perturbations. We study StdCNNs, group-equivariant GCNNs, as well as popular architectures used to assess adversarial robustness [29, 49]. Our empirical studies are conducted on MNIST, CIFAR10, CIFAR100 as well as Tiny ImageNet datasets, thus showing the general nature of these results (described in Sec 4). Importantly, as stated earlier, we consider random spatial transformations that are more commonplace than adversarial rotations in previous work [42]. We look at the entire possible range for spatial transformations. Similarly, we normalize the underlying dataset, and compute the accuracy of a given model on test inputs adversarially perturbed using PGD attack of ℓ∞ norm at most ϵ, for ϵ varying over [0, 1]. In contrast with the certifiable lower bounds on robustness [3, 15, 35, 28, 32], we study upper bounds on spatial and adversarial robustness, respectively.

In this section, we prove the trade-off between spatial and adversarial robustness theoretically, and support this result with experiments in Sec 4. We use A(x) to denote an adversarial ℓ∞ perturbation and r(x) to denote a random spatial transformation. Equivariant model constructions often consider a group of transformations and construct a neural network model invariant to this group. For simplicity, we consider a cyclic group that can model a rotation group (e.g., integer multiples of 30°), or horizontal/vertical translations (e.g., horizontal translations by multiples of, say, ±4 pixels).

We take a construction proposed in previous work for a robustness vs accuracy trade-off [43], and infuse it with a simple idea from the theory of error-correcting codes [34]. Consider a binary cyclic code of length d, where each codeword is a binary string in {−1, 1}^d and all the codewords can be obtained by applying successive cyclic shifts of coordinates to a generator codeword; see Chapter 8 in [34]. The cyclic code is said to have relative distance δ if any two codewords differ in at least δd coordinates. Let c = (c_1, c_2, ..., c_d) be the generator codeword. Consider a random input-label pair (X, Y), with X = (X_0, X_1, ..., X_d) taking values in R^{d+1} and Y taking values in {−1, 1}, generated as follows. The class label Y takes value ±1 with probability 1/2 each. X_0 | Y = y takes value y with probability p and −y with probability 1 − p, for some p ≥ 1/2. The remaining coordinates are independent and normally distributed, with X_t | Y = y distributed as N(2 c_t y/√d, 1), for 1 ≤ t ≤ d. Let m be an integer divisor of d, and let r_j(x) denote the cyclic shift of mj coordinates applied to (x_1, x_2, ..., x_d) while keeping x_0 unchanged, i.e., r_j(x) = (x_0, x_{mj+1}, x_{mj+2}, ..., x_d, x_1, ..., x_{mj}). For example, a 90° rotation can be considered as a cyclic permutation of pixels with m = d/4 and the center pixel x_0 being fixed. Now let r(x) denote a random permutation that takes value r_j(x), uniformly at random over j ∈ {1, 2, ..., d/m}. For example, using m = d/4 as above, r(x) can model a rotation by a random multiple of 90°.
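To make the construction concrete, the sketch below samples (X, Y) from the distribution described above and applies the cyclic shift r_j and the random shift r. It is only an illustration: the function names are hypothetical, and the short codeword used in the example is an arbitrary ±1 vector, not a generator of a cyclic code with any particular relative distance.

```python
import math
import random

def cyclic_shift(x, m, j):
    """r_j: shift (x_1, ..., x_d) left by m*j positions while keeping x_0 fixed."""
    x0, rest = x[0], x[1:]
    s = (m * j) % len(rest)
    return [x0] + rest[s:] + rest[:s]

def sample_pair(c, p, rng):
    """Draw (X, Y): Y is +/-1 with probability 1/2 each, X_0 = Y with
    probability p (else -Y), and X_t ~ N(2 * c_t * Y / sqrt(d), 1) for t >= 1."""
    d = len(c)
    y = rng.choice([-1, 1])
    x0 = y if rng.random() < p else -y
    rest = [rng.gauss(2 * ct * y / math.sqrt(d), 1.0) for ct in c]
    return [x0] + rest, y

def random_shift(x, m, rng):
    """r: apply r_j for j drawn uniformly from {1, ..., d/m}."""
    d = len(x) - 1
    j = rng.randint(1, d // m)
    return cyclic_shift(x, m, j)

rng = random.Random(0)
c = [1, -1, 1, 1, -1, 1, -1, -1]      # illustrative +/-1 vector with d = 8
x, y = sample_pair(c, p=0.9, rng=rng)
x_rot = random_shift(x, m=2, rng=rng)  # m = d/4 models a multiple of 90 degrees
print(y, x[0], x_rot[0])
```

Note that j = d/m gives a full cycle, so the identity is one of the d/m equally likely shifts.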

First, we show that achieving a high degree of spatial robustness on the above distribution is nontrivial even if we have high accuracy on the original data without any transformations. So our subsequent trade-off between spatial and adversarial robustness cannot be derived from previously known trade-offs between accuracy and adversarial robustness [43, 49].

Proposition 1. There exists p ≥ 1/2 and a cyclic code with relative distance δ ≥ 3/8 such that, given the input distribution defined as above, the classifier of maximum accuracy on input (X, Y) has accuracy at least 97%. Similarly, the classifier of maximum accuracy on the transformed input (r_j(X), Y) also has accuracy at least 97%. However, when the classifier of maximum accuracy on (X, Y) is applied to (r_j(X), Y), for any j, it has accuracy at most 85%.

Next we show that if we have a classifier of high adversarial robustness on the above data distribution, then it must have low spatial robustness and vice versa. The proof of this follows from a previously known accuracy vs robustness trade-off by Tsipras et al. [43]. However, their distribution is invariant to any permutation of the coordinates x_1, x_2, ..., x_d, so the accuracy and the spatial robustness are equal for their distribution.

Theorem 2. Given the input distribution defined as above, any η > 0 and any classifier f : R^{d+1} → {−1, 1}, if the adversarial accuracy of f is at least 1 − η, then the spatial accuracy of f is at most ηp/(1 − p). Similarly, if the spatial accuracy of f is at least 1 − η, then the adversarial accuracy of f is at most 1 − (1 − p)(1 − η)/p.
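As a consistency check on the two bounds in Theorem 2, read here as ηp/(1 − p) and 1 − (1 − p)(1 − η)/p, the second statement can be recovered from the first by taking a contrapositive. The short derivation below is a sketch under that reading, assuming p < 1 and η < 1; the shorthand Adv(f) and Sp(f) for adversarial and spatial accuracy is introduced only for this illustration.

```latex
% Shorthand (not from the paper): Adv(f) = adversarial accuracy, Sp(f) = spatial accuracy.
\begin{align*}
\text{(first claim, with } \eta' \text{ in place of } \eta\text{)}\quad
  & \mathrm{Adv}(f) \ge 1 - \eta' \;\Longrightarrow\; \mathrm{Sp}(f) \le \frac{p}{1-p}\,\eta' \\
\text{(contrapositive)}\quad
  & \mathrm{Sp}(f) > \frac{p}{1-p}\,\eta' \;\Longrightarrow\; \mathrm{Adv}(f) < 1 - \eta' \\
\text{(if } \mathrm{Sp}(f) \ge 1 - \eta\text{, the premise holds whenever)}\quad
  & \frac{p}{1-p}\,\eta' < 1 - \eta \;\Longleftrightarrow\; \eta' < \frac{(1-p)(1-\eta)}{p} \\
\text{(letting } \eta' \uparrow \tfrac{(1-p)(1-\eta)}{p}\text{)}\quad
  & \mathrm{Adv}(f) \le 1 - \frac{(1-p)(1-\eta)}{p}
\end{align*}
```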

The above results show that adversarial and spatial robustness cannot both be high simultaneously. We do not explicitly model the effect of adversarial training with larger ℓ∞ perturbations but, as a reasonable proxy, consider the increase in the adversarial accuracy for a fixed perturbation bound. A tighter analysis of Theorem 2 towards a gradual trade-off between spatial and adversarial robustness is presented in the supplementary.

Experimental Setup. In order to study the effect of spatial transformations (e.g., rotations, translations) and adversarial perturbations (e.g., Projected Gradient Descent or PGD attacks), we use well-known, widely used NN architectures, state-of-the-art networks known to have strong invariance [8], and networks known to have strong robustness [29, 49]. We study the spatial vs adversarial robustness trade-off using these architectures (described below) on MNIST, CIFAR10, CIFAR100 and Tiny ImageNet datasets. Our code is made available for reproducibility.

Spatially Robust Model Architectures: StdCNNs are known to be translation-equivariant by design, and GCNNs [8] are rotation-equivariant by design through clever weight sharing [25]. Equivariant models, especially GCNNs, when trained with random rotation augmentations have been observed to come very close to being truly rotation-invariant [8, 9, 7] (or spatially robust in our context). We hence use both StdCNNs and equivalent GCNNs trained with suitable data augmentations for our studies with spatially robust architectures. In particular, for each StdCNN we use, the corresponding GCNN architecture is obtained by replacing the layer operations with equivalent GCNN operations as in [8]. For the StdCNN, we use a Conv-Conv-Pool-Conv-Conv-Pool-FC-FC architecture for MNIST (more details in the supplementary); VGG16 [36] and ResNet18 [19] for CIFAR10 and CIFAR100; and ResNet18 for the Tiny ImageNet dataset.

Adversarially Robust Model Architectures: For adversarial training, we use a LeNet-based architecture for MNIST and a ResNet-based architecture for CIFAR10. Both these models are exactly as given in [29]. For CIFAR100, we use the popularly used WideResNet-34 architecture also used in [49]. We use ResNet18 [19] for the Tiny ImageNet dataset.

Training Data Augmentation: Spatial: (a) Aug - R: Data is augmented with random rotations in the range ±θ° given θ, along with random crops and random horizontal flips (for MNIST alone, we do not apply crops and horizontal flips); (b) Aug - T: Data is augmented with random translations within the [−i, +i] range of pixels in the image, given i (e.g., for CIFAR10, i = 0.1 gives 32 × 0.1 ≈ ±3 px), in both horizontal and vertical directions; (c) Aug - RT: Data is augmented with random rotations in ±i × 180° and random translations within the [−i, +i] range of pixels in the image (e.g., for CIFAR10, i = 0.1 gives 0.1 × 180° = ±18° rotation and 32 × 0.1 ≈ ±3 px translation); here no cropping and no horizontal flip is used. We use nearest neighbour interpolation and black filling to obtain the transformed image. Adversarial: Adv - PGD: Adversarial training using PGD-perturbed adversarial samples with a given ϵ-budget. Our experiments with PGD use a random start, 40 iterations, and step size 0.01 on MNIST, and a random start, 10 iterations, and step size 2/255 on CIFAR10, CIFAR100 and Tiny ImageNet. While we study the trade-off under a fixed PGD setting, our trend holds even with different PGD hyperparameter settings. Refer to supplementary J for details. Our results, presented in the next section, are best understood by noting the augmentation method mentioned in the figure caption. For example, in Fig 1(a), the augmentation scheme used is Aug-R. The red line (annotated as 60 in the legend) corresponds to the model trained with random rotation augmentations in the range ±60°.
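The arithmetic behind the Aug - RT level i can be sketched as follows: the level is mapped to a maximum rotation of i × 180° and a maximum translation of i times the image size, and one transformation is then sampled from those ranges. The helper names are hypothetical, and rounding to whole pixels, integer-valued shifts, and independent horizontal and vertical draws are assumptions of this sketch rather than details stated in the paper.

```python
import random

def aug_rt_ranges(i, image_size):
    """Map the augmentation level i in [0, 1] to
    (maximum rotation in degrees, maximum translation in pixels)."""
    max_deg = i * 180.0
    max_px = round(i * image_size)  # assumption: round to whole pixels
    return max_deg, max_px

def sample_aug_rt(i, image_size, rng):
    """Sample one (angle, dx, dy) triple for an Aug - RT style augmentation."""
    max_deg, max_px = aug_rt_ranges(i, image_size)
    angle = rng.uniform(-max_deg, max_deg)
    dx = rng.randint(-max_px, max_px)  # horizontal shift in pixels
    dy = rng.randint(-max_px, max_px)  # vertical shift, drawn independently
    return angle, dx, dy

# CIFAR10 example from the text: 32x32 images with i = 0.1
print(aug_rt_ranges(0.1, 32))                    # about (18.0, 3)
print(sample_aug_rt(0.1, 32, random.Random(0)))
```

The sampled triple would then be handed to whichever image-transformation routine is in use, with nearest neighbour interpolation and black filling as described above.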

Hardware Configuration: We used a server with 4 Nvidia GeForce GTX 1080i GPUs to run all the experiments in the paper.

Evaluation Metrics: We quantify performance using a spatial invariance profile and an adversarial robustness profile.

Spatial invariance: We quantify rotation invariance by measuring the rate of invariance, or the fraction of test images whose predicted label remains the same after rotation by a random angle in [−θ°, +θ°]. As θ varies from 0 to 180, we plot the rate of invariance. We call this the rotation invariance profile of a given model. Similarly, by training a model with other augmentation schemes given above, we obtain a translation invariance profile and a rotation-translation invariance profile for a given model.

Adversarial robustness: Similarly, we quantify the ℓ∞ adversarial robustness of a given model to a fixed adversarial attack (e.g., PGD) and a fixed ℓ∞ norm ϵ ∈ [0, 1] by (1 - fooling rate), i.e., the fraction of test inputs for which their predicted label does not change after adversarial perturbation. We plot this for ϵ varying from 0 to 1. We call this the robustness profile of a given model. (One can also choose accuracy instead of 1 - fooling rate and we observe similar trends, as shown in the supplementary. We use 1 - fooling rate for clearer visual presentation of our main results.)
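A robustness profile of this kind can be sketched end to end on a toy model. The example below runs an ℓ∞ PGD attack with a random start against a linear classifier sign(w·x + b) and reports 1 - fooling rate over a grid of ϵ values. Everything here is an illustrative assumption: the linear model and margin loss stand in for a neural network and its training loss, the step size is set to ϵ/4 for convenience (rather than the fixed step sizes reported above), no clipping to a valid pixel range is applied, and the function names are hypothetical.

```python
import random

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

def sign(v):
    return (v > 0) - (v < 0)

def pgd_linf(w, b, x, eps, step, iters, rng):
    """l_inf PGD with random start against the linear classifier sign(w.x + b)."""
    y = predict(w, b, x)
    # Random start inside the eps-ball around the clean input.
    x_adv = [xi + rng.uniform(-eps, eps) for xi in x]
    for _ in range(iters):
        # Gradient of the margin loss -y * (w.x + b) with respect to x is -y * w.
        grad = [-y * wi for wi in w]
        # Signed-gradient ascent step.
        x_adv = [xa + step * sign(g) for xa, g in zip(x_adv, grad)]
        # Project back onto the eps-ball around the clean input.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

def robustness_profile(w, b, inputs, eps_grid, iters=10, seed=0):
    """Return 1 - fooling rate for each eps in eps_grid."""
    rng = random.Random(seed)
    profile = []
    for eps in eps_grid:
        fooled = 0
        for x in inputs:
            x_adv = pgd_linf(w, b, x, eps, step=eps / 4, iters=iters, rng=rng)
            fooled += predict(w, b, x_adv) != predict(w, b, x)
        profile.append(1 - fooled / len(inputs))
    return profile

w, b = [1.0, -1.0], 0.0
inputs = [[0.3, 0.0], [1.0, 0.0], [-0.5, 0.5], [2.0, -2.0]]
print(robustness_profile(w, b, inputs, eps_grid=[0.0, 0.25, 0.75]))
```

For a linear model the attack simply walks to a corner of the ϵ-ball, so an input is fooled once ϵ times the ℓ1 norm of w exceeds its margin; the profile therefore decreases as ϵ grows, which is the shape the robustness profiles plot.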

Results. We now present our results on spatially robust model architectures followed by adversarially robust ones.

Results on Spatially Robust Model Architectures: Fig 1 presents our results on the trade-off on adversarial robustness for VGG16, trained with spatial augmentations, on CIFAR10. (More results including ResNet18 on CIFAR10, VGG16/ResNet18 on CIFAR100 and the StdCNN/GCNN-based model on MNIST are deferred to the supplementary owing to space constraints.)

Figure 1: On CIFAR10, for the VGG16 model: (a-b) Aug - R StdCNN, (c-d) Aug - R GCNN, (e-f) Aug - T StdCNN, (g-h) Aug - T GCNN, (i-l) Aug - RT StdCNN, (m-p) Aug - RT GCNN; invariance profiles of StdCNN/GCNN models and corresponding robustness profiles.

For any fixed θ ∈ [0, 180], we take an equivariant model, namely, StdCNN or GCNN, and augment its training data by random rotations from [−θ°, +θ°]. For example, Fig 1(b) shows how the robustness profile of StdCNN changes on CIFAR10 as we increase the degree θ used in training augmentation of the model. We use the PGD attack provided in versions of AdverTorch [11] to obtain the robustness profiles. Similarly, Fig 1(a) shows the rotation invariance profile of the same models on CIFAR10. The black line in Fig 1(b) shows that the adversarial robustness of a StdCNN trained to handle rotations up to ±180 degrees on CIFAR10 drops by more than 50%, even when the ϵ budget for the PGD attack is only 2/255. Similarly, the black line in Fig 1(a) shows this model's rotation invariance profile: this model is invariant to larger rotations on test data. This can be contrasted with the model depicted by the red line, a StdCNN trained to handle rotations up to 60 degrees. The rotation invariance profile of this model is below that of the model depicted by the black line, and hence it is less invariant to large rotations. However, this model can handle adversarial ℓ∞-perturbations up to 2/255 on unrotated data, with an accuracy of more than 10%, as seen from the red line in Fig 1(b). The remaining plots in Fig 1 show similar trends for translation, as well as a combination of rotation and translation (e.g., plots (e-p) in Fig 1 for CIFAR10) for both StdCNN and GCNN models (e.g., Figs 1(c-d)).

The above results show that the spatial robustness of these models improves by training augmentation but at the cost of their adversarial robustness, indicating a trade-off between spatial and adversarial robustness. This trade-off exists in both StdCNN and GCNN models.

Results on Adversarially Robust Model Architectures: Fig 2 presents our results on the trade-off on spatial robustness for models trained adversarially using PGD [29] on MNIST, CIFAR10 and CIFAR100 datasets. Each row corresponds to experiments for a single dataset with a plot of robustness profile (for which they are trained) followed by the spatial invariance profiles. Each colored line in the plots corresponds to a model adversarially trained with a different value of the ϵ-budget.

On MNIST, adversarial training with PGD with larger ϵ results in a drop in the invariance profile of the LeNet-based model; in Fig 2(b-d), the red line (PGD with ϵ = 0.3) is below the light blue line (PGD with ϵ = 0.1). Similar observations hold for the ResNet model on CIFAR10 (see Fig 2(f-h)), as well as for WideResNet-34 on CIFAR100 (see Fig 2(j-l)). To complete this picture, the robustness profile curves confirm that as these models are trained with PGD using a larger budget, their adversarial robustness increases. The robustness profile curve of the LeNet model trained with a larger PGD budget dominates the robustness profile curve of the same model trained with a smaller PGD budget; the red line in Fig 2(a) dominates the light blue line. This is true of the ResNet model too, as in Figs 2(e) and 2(i).

In other words, adversarial training with progressively larger ϵ leads to a drop in the rate of spatial invariance on test data. Fig 3 shows an additional result with StdCNN (VGG-16) with PGD-based adversarial training on CIFAR10. These results too support the same findings.

Figure 2: Adv - PGD, (1) Robustness, (2) Rotation, (3) Translation, (4) Rotation-Translation invariance profiles for PGD adversarially trained models. (a-d) LeNet based model from [29] on MNIST (e-h) ResNet based model from [29] on CIFAR10, (i-l) WideResNet34 model on CIFAR100. Different colored lines represent models adversarially trained with different  ℓ∞ budgets ϵ ∈ [0, 1].

Figure 3: On CIFAR10, Adv - PGD, For StdCNN/VGG16 model (a) Robustness profile, (b) Rotation invariance profile, (c) Translation invariance profile (d) Rotation-Translation invariance profile. Different colored lines represent models adversarially trained with different  ℓ∞ budgets ϵ ∈ [0, 1].

Results on Tiny ImageNet: Appendix I studies both adversarial robustness of spatially robust models, as well as spatial invariance of adversarially robust models on the Tiny ImageNet dataset. Our observations mentioned so far hold here too, corroborating our claim that this trade-off is not merely theoretical but is observed in real-world datasets too.

Going beyond PGD-based Adversarial Training: While we have used the popular PGD-based adversarial training for the study of robust models in all the results so far, we also conduct experimental studies with another recent popular adversarial method, TRADES [49]. Fig 4 shows the results of these studies on CIFAR10, where we observe the same qualitative behavior of the models.

Additional Empirical Evidence (Average Perturbation Distance to Boundary): While more results and implementation details are provided in the supplementary section, we briefly provide interesting evidence for our claims based on the average distance to the decision boundary (details in Appendix L). We notice that there is a distinct reduction in the distance to the decision boundary as the model achieves better spatial invariance (Figure 5). This reduction in distance can be linked to the drop in adversarial robustness. Such a study also has the potential to be used to study trade-offs in the presence of other common corruptions [21, 22, 20].

Figure 4: On CIFAR10, Adv - TRADES, for StdCNN/WideResNet34 model: (a) Robustness, (b) Rotation invariance, (c) Translation invariance, (d) Rotation-Translation invariance profiles. Different colored lines represent models adversarially trained with different ℓ∞ budgets ϵ ∈ [0, 1].

Figure 5: Average perturbation distance of StdCNN/GCNN on (a-b) MNIST (based on appendix Table 2), (c-d) CIFAR10 (VGG16) w.r.t. unrotated test data.

The progressive nature of the trade-off between spatial invariance and adversarial robustness elicits the need to study training methods that can simultaneously achieve robustness on both of these fronts. In this section, we propose a curriculum learning-based training strategy that consistently provides models close to the Pareto-optimal frontier of the trade-off. An obvious strategy to obtain both kinds of robustness would be to sequentially combine adversarial training and augmentation by spatial transformations, i.e., take an adversarially trained network and re-train it with spatial (e.g., rotation) augmentation, or vice versa (Appendix K). We however observe that such a strategy presents a version of catastrophic forgetting where the robustness learned earlier is forgotten when the other robustness is enforced. A model thus trained has either good spatial robustness or good adversarial robustness but not both. We hence explore a curriculum learning strategy to this end.

Curriculum learning is inspired by how humans learn: first learning simple tasks, followed by gradually more difficult tasks [4]. Previous work has demonstrated its effectiveness in training DNN models [44, 18]. Curriculum-based adversarial training has also been proposed [6], which, however, focuses only on adversarial robustness. For simultaneous spatial and adversarial robustness, we propose CuSP (Curriculum based Spatial-Adversarial Robustness training for Pareto-Optimality), where the difficulty of spatial augmentation and adversarial strength is gradually increased each time the learning rate is updated as part of a schedule. In particular, each time the learning rate is updated, we increase the spatial augmentation range [−θ°, +θ°] as well as the adversarial perturbation bound ϵ, and adversarially train on an input randomly transformed within that range. This strategy reflects an optimization procedure where the difficulty of robustness handled by the model is increased when the model treads more slowly (lower learning rate) along areas of the loss surface. The outline of the proposed CuSP approach is given in Algorithm 1.
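The curriculum described above can be sketched as a schedule that advances the difficulty pair (θ, ϵ) by one stage at every learning-rate update. This is a rough illustration, not the paper's Algorithm 1: the function names are hypothetical, the default milestones follow the 75-90-100 schedule reported in the Results paragraph, starting at the first stage from epoch 0 is an assumption, and the per-stage ϵ values in the usage example are illustrative choices. `rotate` and `attack` are placeholder callables that the caller would supply.

```python
def lr_at(epoch, base_lr=0.1, milestones=(75, 90, 100), gamma=0.1):
    """Step learning-rate schedule: multiply by gamma at each milestone epoch."""
    passed = sum(epoch >= m for m in milestones)
    return base_lr * gamma ** passed

def cusp_stage(epoch, thetas, epsilons, milestones=(75, 90, 100)):
    """Difficulty (theta in degrees, eps) in force at `epoch`.
    The stage advances by one each time the learning rate is updated."""
    stage = sum(epoch >= m for m in milestones)
    stage = min(stage, len(thetas) - 1)  # stay at the hardest stage once reached
    return thetas[stage], epsilons[stage]

def cusp_perturb(x, theta, eps, rotate, attack, rng):
    """One CuSP-style perturbation: random spatial transformation first,
    then an adversarial perturbation of the transformed input."""
    angle = rng.uniform(-theta, theta)
    x_rot = rotate(x, angle)   # hypothetical rotation routine
    return attack(x_rot, eps)  # hypothetical attack, e.g. a PGD implementation

# Usage: theta list from the text; the eps list is an illustrative assumption.
thetas = (120, 150, 180)
epsilons = (2 / 255, 4 / 255, 8 / 255)
for epoch in (0, 74, 75, 90, 99):
    theta, eps = cusp_stage(epoch, thetas, epsilons)
    print(epoch, lr_at(epoch), theta, round(eps * 255))
```

Tying the stage index to the learning-rate milestones keeps a single source of truth for when the difficulty increases, which mirrors the description that harder perturbations are introduced as the model moves more slowly over the loss surface.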

To study the goodness of the proposed strategy, we look at three metrics: (a) spatial accuracy on large random transformations, Pr(f(T_180(X)) = Y), where T_180 denotes a random rotation in the range [−180°, 180°]; (b) adversarial accuracy at large perturbations, Pr(f(X + A_{8/255}(X)) = Y), where A_{8/255} denotes an adversarial attack (PGD or TRADES) of ℓ∞-norm bounded by ϵ = 8/255; and finally, (c) natural accuracy on the unperturbed data, Pr(f(X) = Y). The last one is included for completeness because we want better spatial and adversarial robustness simultaneously without losing natural accuracy in the bargain [43, 49].

Results: We studied CuSP across models and datasets in our experiments, and report the results of VGG16 on CIFAR10 in Table 1. The remainder of these results are presented in Appendix M. We follow the learning rate schedule 75-90-100 as used in TRADES [49], where the learning rate, starting at 0.1, decays by a factor of 0.1 successively at the 75-th, 90-th and 100-th epochs. Our algorithm works with given sets of values for θ and ϵ and does not optimize for the choice of these values. We empirically studied various combinations of (θ, ϵ) values as θ varies over {0°, 30°, 60°, ..., 180°} and ϵ varies over {0, 2/255, 4/255, 8/255}. While each of these options improves performance over the baselines, we found the list θ = {120°, 150°, 180°} to give the best results in this particular setting. We also include results for CuSP training after a warm start, viz., PGD with no augmentation and natural training with [−180°, +180°] augmentation. Figure 6 compares the performance of CuSP against other baseline training strategies, visualized in the context of the Pareto frontier for PGD accuracy and spatial accuracy.

Table 1: Comparison of performance of proposed CuSP for CIFAR10 on StdCNN/VGG16 with other baseline strategies. (Aug θ) denotes training data augmented with random rotations in the range [−θ, +θ]; a → b denotes a sequentially followed by b during training. The top 7 values in each column are in boldface.

image

Figure 6: Visualization of the performance of CuSP based on PGD against other baseline strategies on CIFAR10 for the StdCNN/VGG16 model (each index corresponds to a row in Table 1). On the left, we compare CuSP against various baselines. On the right, we zoom in to compare different variants of CuSP.

In this work, we present a comprehensive discussion of the trade-off between spatial and adversarial robustness in neural network models. While a few recent efforts have studied similar trade-offs, we study random spatial transformations in this work, which are more widely used in practice than adversarial spatial transformations. We study the trade-off both theoretically and empirically, and prove a quantitative trade-off between spatial and adversarial robustness in a simple statistical setting used in earlier work. Our experiments show that as equivariant models (StdCNNs and GCNNs) are trained with progressively larger spatial transformations, their spatial invariance improves, but at the cost of their adversarial robustness. Similarly, adversarial training with perturbations of progressively increasing norms improves the robustness of well-known neural network models, but with a resulting drop in their spatial invariance. We also provide a curriculum learning-based training method to obtain a solution that is close to Pareto-optimal in comparison to other baseline adversarial training methods. While CuSP gradually increases both the range of spatial transformation θ in the augmentation and the adversarial perturbation bound ϵ, we conjecture that tuning these further together with the learning rate schedule, along with a warm start for the model parameters, could give further improvements. Recent work [21, 22, 20] has shown that other common corruptions are important in the study of state-of-the-art NN models. A similar study of the trade-off between spatial invariance and such common corruptions is an interesting direction for future work.

Broader Impact. Adversarial robustness is one of the most important problems in better understanding the functioning of neural networks. Another important problem is how networks can efficiently handle various spatial transformations such as rotation and translation. The trade-off studied in this paper, and the algorithm proposed to find a solution closer to Pareto-optimal, aim to understand the interaction between the two, which is a natural question. Hence, we believe this study contributes to a better understanding of the networks' learning capability. There is no known detrimental social impact of our work.

Acknowledgements and Funding Transparency Statement. Sandesh Kamath would like to thank Microsoft Research India for funding a part of this work through his postdoctoral research fellowship at IIT Hyderabad.

[1] Michael A. Alcorn, Qi Li, Zhitao Gong, Chengfei Wang, Long Mai, Wei-Shinn Ku, and Anh Nguyen. Strike (with) a pose: Neural networks are easily fooled by strange poses of familiar objects. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

[2] Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In Proceedings of the International Conference on Machine Learning (ICML), 2018.

[3] Mislav Balunovic, Maximilian Baader, Gagandeep Singh, Timon Gehr, and Martin Vechev. Certifying geometric robustness of neural networks. In Advances in Neural Information Processing Systems (NeurIPS), 2019.

[4] Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In Proceedings of the International Conference on Machine Learning (ICML), 2009.

[5] Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Šrndić, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at test time. Lecture Notes in Computer Science, 2013.

[6] Qi-Zhi Cai, Chang Liu, and Dawn Song. Curriculum adversarial training. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pages 3740–3747. International Joint Conferences on Artificial Intelligence Organization, 7 2018.

[7] Taco S. Cohen, Mario Geiger, and Maurice Weiler. A general theory of equivariant CNNs on homogeneous spaces. arXiv preprint arXiv:1811.02017, 2018.

[8] Taco S. Cohen and Max Welling. Group equivariant convolutional networks. In Proceedings of the International Conference on Machine Learning (ICML), 2016.

[9] Taco S. Cohen and Max Welling. Steerable CNNs. In International Conference on Learning Representations (ICLR), 2017.

[10] Sander Dieleman, Jeffrey De Fauw, and Koray Kavukcuoglu. Exploiting cyclic symmetry in convolutional neural networks. In Proceedings of the International Conference on Machine Learning (ICML), 2016.

[11] Gavin Weiguang Ding, Luyu Wang, and Xiaomeng Jin. AdverTorch v0.1: An adversarial robustness toolbox based on pytorch. arXiv preprint arXiv:1902.07623, 2019.

[12] Logan Engstrom, Brandon Tran, Dimitris Tsipras, Ludwig Schmidt, and Aleksander Madry. Exploring the landscape of spatial robustness. In Proceedings of the International Conference on Machine Learning (ICML), 2019.

[13] Carlos Esteves, Christine Allen-Blanchette, Xiaowei Zhou, and Kostas Daniilidis. Polar transformer networks. In International Conference on Learning Representations (ICLR), 2018.

[14] Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. Robust physical-world attacks on deep learning visual classification. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

[15] Marc Fischer, Maximilian Baader, and Martin Vechev. Certified defense to image transformations via randomized smoothing. In Advances in Neural Information Processing Systems (NeurIPS), 2020.

[16] Robert Gens and Pedro M Domingos. Deep symmetry networks. In Advances in Neural Information Processing Systems (NeurIPS), 2014.

[17] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR), 2015.

[18] Guy Hacohen and Daphna Weinshall. On the power of curriculum learning in training deep networks. In Proceedings of the International Conference on Machine Learning (ICML), 2019.

[19] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

[20] Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, Dawn Song, Jacob Steinhardt, and Justin Gilmer. The many faces of robustness: A critical analysis of out-of-distribution generalization. arXiv preprint arXiv:2006.16241, 2020.

[21] Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. In International Conference on Learning Representations (ICLR), 2019.

[22] Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Steinhardt, and Dawn Song. Natural adversarial examples. arXiv preprint arXiv:1907.07174, 2019.

[23] Joern-Henrik Jacobsen, Jens Behrmann, Richard Zemel, and Matthias Bethge. Excessive invariance causes adversarial vulnerability. In International Conference on Learning Representations (ICLR), 2019.

[24] Guy Katz, Clark W. Barrett, David L. Dill, Kyle Julian, and Mykel J. Kochenderfer. Reluplex: An efficient smt solver for verifying deep neural networks. In CAV, 2017.

[25] Risi Kondor and Shubhendu Trivedi. On the generalization of equivariance and convolution in neural networks to the action of compact groups. In Proceedings of the International Conference on Machine Learning (ICML), 2018.

[26] Dmitry Laptev, Nikolay Savinov, Joachim M. Buhmann, and Marc Pollefeys. TI-pooling: transformation-invariant pooling for feature learning in convolutional neural networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

[27] Junying Li, Zichen Yang, Haifeng Liu, and Deng Cai. Deep rotation equivariant network. arXiv preprint arXiv:1705.08623, 2017.

[28] Linyi Li, Maurice Weber, Xiaojun Xu, Luka Rimanic, Tao Xie, Ce Zhang, and Bo Li. Provable robust learning based on transformation-specific smoothing, 2020.

[29] Aleksander Madry, Aleksandar A Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations (ICLR), 2018.

[30] Diego Marcos, Benjamin Kellenberger, Sylvain Lobry, and Devis Tuia. Scale equivariance in cnns with vector fields. arXiv preprint arXiv:1807.11783, 2018.

[31] Diego Marcos, Michele Volpi, Nikos Komodakis, and Devis Tuia. Rotation equivariant vector field networks. In International Conference on Computer Vision (ICCV), 2017.

[32] Jeet Mohapatra, Tsui-Wei Weng, Pin-Yu Chen, Sijia Liu, and Luca Daniel. Towards verifying robustness of neural networks against a family of semantic perturbations. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.

[33] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.

[34] Ron Roth. Introduction to Coding Theory. Cambridge University Press, 2006.

[35] Anian Ruoss, Maximilian Baader, Mislav Balunović, and Martin Vechev. Efficient certification of spatial robustness, 2021.

[36] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[37] Aman Sinha, Hongseok Namkoong, and John C. Duchi. Certifying some distributional robustness with principled adversarial training. In International Conference on Learning Representations (ICLR), 2018.

[38] Ivan Sosnovik, Michał Szmaja, and Arnold Smeulders. Scale-equivariant steerable networks. In International Conference on Learning Representations (ICLR), 2020.

[39] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.

[40] Rohan Taori, Achal Dave, Vaishaal Shankar, Nicholas Carlini, Benjamin Recht, and Ludwig Schmidt. Measuring robustness to natural distribution shifts in image classification. In Advances in Neural Information Processing Systems (NeurIPS), 2020.

[41] Florian Tramèr, Jens Behrmann, Nicholas Carlini, Nicolas Papernot, and Jörn-Henrik Jacobsen. Fundamental tradeoffs between invariance and sensitivity to adversarial perturbations. In Proceedings of the International Conference on Machine Learning (ICML), 2020.

[42] Florian Tramèr and Dan Boneh. Adversarial training and robustness for multiple perturbations. In Advances in Neural Information Processing Systems (NeurIPS), 2019.

[43] Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, and Aleksander Madry. Robustness may be at odds with accuracy. In International Conference on Learning Representations (ICLR), 2019.

[44] Daphna Weinshall, Gad Cohen, and Dan Amir. Curriculum learning by transfer learning: Theory and experiments with deep networks. In Proceedings of the International Conference on Machine Learning (ICML), 2018.

[45] D. E. Worrall, S. J. Garbin, D. Turmukhambetov, and G. J. Brostow. Harmonic networks: Deep translation and rotation equivariance. arXiv preprint arXiv:1612.04642, 2016.

[46] Daniel Worrall and Max Welling. Deep scale-spaces: Equivariance over scale. In Advances in Neural Information Processing Systems, 2019.

[47] Chaowei Xiao, Jun-Yan Zhu, Bo Li, Warren He, Mingyan Liu, and Dawn Song. Spatially transformed adversarial examples. In International Conference on Learning Representations (ICLR), 2018.

[48] Yao-Yuan Yang, Cyrus Rashtchian, Hongyang Zhang, Russ R. Salakhutdinov, and Kamalika Chaudhuri. A closer look at accuracy vs. robustness. In Advances in Neural Information Processing Systems (NeurIPS), 2020.

[49] Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P. Xing, Laurent El Ghaoui, and Michael I. Jordan. Theoretically principled trade-off between robustness and accuracy. In Proceedings of the International Conference on Machine Learning (ICML), 2019.

The supplementary material contains proofs, additional experiments to show that our results continue to hold across different datasets and different models, and additional empirical evidence for the spatial vs adversarial robustness trade-off that we could not fully include in the main paper owing to space constraints.

This section provides the proof for Proposition 1. The parameter δ below is mostly for illustration and is not optimized. We consider d to be large, as the small adversarial perturbation we consider has ℓ∞ norm at most 4/√d.

Proposition 1. There exists p ≥ 1/2 and a cyclic code with relative distance δ ≥ 3/8 such that, given the input distribution defined as above, the classifier of maximum accuracy on input (X, Y) has accuracy at least 97%. Similarly, the classifier of maximum accuracy on the transformed input (r_j(X), Y) also has accuracy at least 97%. However, when the classifier of maximum accuracy on (X, Y) is applied to (r_j(X), Y), for any j, it has accuracy at most 85%.

Proof. Let ρ(x_0) be equal to p/(1 − p) if x_0 = 1, and (1 − p)/p if x_0 = −1. The Bayes classifier f* that maximizes the accuracy is:

image

because the quadratic terms and the constants cancel out. Note that the above classifier of maximum accuracy can be represented as a single decision tree node at x_0 followed by a linear classifier in the remaining coordinates x_1, x_2, ..., x_d. Moreover, c_t = ±1 and X_t | Y = y is normally distributed as N(2 c_t y/√d, 1), so Σ_{t=1}^{d} c_t X_t | Y = y is normally distributed as Z_y = N(2y√d, d). As p → 1/2, we have log ρ(x_0) → 0. It is known that Pr(Z ≥ µ − 2σ) = Pr(Z ≤ µ + 2σ) ≈ 0.977 for any normally distributed Z with mean µ and standard deviation σ. Here Z_y has mean 2y√d and standard deviation √d, so its mean is two standard deviations away from zero. Hence sign(Z_y) matches y with probability at least 97%, and the accuracy of f* is at least 97%.
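For readers who want the cancellation spelled out, here is a sketch of the log-likelihood ratio computation. It assumes, as the definitions of ρ and of X_t | Y = y in this proof suggest, that Y is uniform on {−1, +1}, that X_0 = Y with probability p, and that the coordinates are conditionally independent given Y. These assumptions are inferred from the proof text rather than quoted from the definition of the input distribution.

```latex
\begin{align*}
\log\frac{\Pr(x \mid Y = 1)}{\Pr(x \mid Y = -1)}
  &= \log\rho(x_0)
   + \sum_{t=1}^{d}\left[
       -\tfrac{1}{2}\Bigl(x_t - \tfrac{2c_t}{\sqrt{d}}\Bigr)^{2}
       +\tfrac{1}{2}\Bigl(x_t + \tfrac{2c_t}{\sqrt{d}}\Bigr)^{2}
     \right] \\
  &= \log\rho(x_0) + \frac{4}{\sqrt{d}}\sum_{t=1}^{d} c_t x_t .
\end{align*}
```

Under these assumptions the maximum-accuracy rule predicts +1 exactly when this quantity is positive, which is a threshold on x_0 combined with a linear function of x_1, ..., x_d, in line with the decision-node-plus-linear-classifier description.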

Similarly, the classifier f*_j that maximizes the accuracy on the transformed data (X̃, Y) = (r_j(X), Y) is given by

image

and the same proof, mutatis mutandis, shows that its accuracy on (X̃, Y) is at least 97%.

Let S_j be the subset of coordinates t with (r_j(c))_t = c_t. Since our code is binary, we get (r_j(c))_t = −c_t for t ∉ S_j. Therefore,

image

Since c_t = ±1 and X_t | Y = y is normally distributed as N(2 c_t y/√d, 1), we have c_t X_t | Y = y normally distributed as N(2y/√d, 1). Moreover, we get that Σ_{t∈S_j} c_t X_t − Σ_{t∉S_j} c_t X_t | Y = y is normally distributed as N(2y(2|S_j| − d)/√d, 2|S_j| − d). Note that |S_j| ≤ (1 − δ)d. Thus, for

image

For the existence of a cyclic code with relative distance δ ≥ 3/8, see Chapter 8 of [34]. It is known that Pr(Z ≥ µ − σ) = Pr(Z ≤ µ + σ) ≈ 0.841 for any normally distributed Z with mean µ and standard deviation σ. Using p sufficiently close to 1/2, we get that Pr(f*_j(X) = Y) is at most 85%. Since r(x) is a random transformation r_j(x), where j is picked uniformly at random from {1, 2, ..., d/m}, the same bound holds for the data transformed using r(x).
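The two Gaussian facts used in this proof, and the 97% figure, can be checked numerically. The sketch below is illustrative only: it ignores the coordinate x_0 (the p → 1/2 limit), uses a random ±1 vector as a stand-in for the cyclic code word c, and the function names are hypothetical.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate_linear_accuracy(d=64, n=2000, seed=0):
    """Monte Carlo accuracy of sign(sum_t c_t x_t) when X_t | Y=y ~ N(2 c_t y / sqrt(d), 1)."""
    rng = random.Random(seed)
    # Stand-in for the code word: any +/-1 vector gives the same distribution of the sum.
    c = [rng.choice((-1, 1)) for _ in range(d)]
    correct = 0
    for _ in range(n):
        y = rng.choice((-1, 1))
        z = sum(c_t * rng.gauss(2.0 * c_t * y / math.sqrt(d), 1.0) for c_t in c)
        prediction = 1 if z >= 0 else -1
        correct += int(prediction == y)
    return correct / n


two_sigma = normal_cdf(2.0)   # Pr(Z >= mu - 2 sigma), about 0.977
one_sigma = normal_cdf(1.0)   # Pr(Z >= mu - sigma), about 0.841
accuracy = simulate_linear_accuracy()
```

The simulated accuracy should land close to `two_sigma`, since the sum has mean 2y√d and standard deviation √d.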

We now prove Theorem 2 on the trade-off between spatial and adversarial robustness.

Theorem 2. Given the input distribution defined as above, any η > 0 and a classifier f : R^{d+1} → {−1, 1}, if the adversarial accuracy is at least 1 − η, then the spatial accuracy is at most η p/(1 − p). Similarly, if the spatial accuracy is at least 1 − η, then the adversarial accuracy is at most 1 −

image

Proof. Observe that r_j(X) only permutes the coordinates of X. Hence,

image

The known adversarial robustness vs. standard accuracy trade-off proof [43] implies, for all j,

image

(For completeness, we can add a proof similar to lines 564-565 in the supplementary.) By averaging the above over all j, we get

image

Now if the adversarial robustness (i.e., the LHS above) is at least 1 − η, then the spatial robustness can be upper bounded as

image

The other side of the trade-off can be proved similarly. If the spatial robustness Pr(f(r(X)) = Y) is at least 1 − η, then there exists at least one index j such that Pr(f(r_j(X)) = Y) ≥ 1 − η. So, again using the above adversarial robustness vs. standard accuracy trade-off for that particular index j, we get

image

Hence, the adversarial robustness can be upper bounded as follows, using the observation that r_j(X) only permutes the coordinates of X.

image

We now give a tighter analysis and prove a modified version of Theorem 2 towards exhibiting a trade-off between spatial and adversarial robustness. It would be interesting to close the gap between our theoretical trade-off and the progressive or gradual trade-off observed in our experiments.

Theorem 3. (Theorem 2, modified) Given the input distribution defined as above and a classifier f : R^{d+1} → {−1, 1}, suppose Pr(f(X + A(X)) = Y) ≥ 1 − η for all adversarial attacks A of ℓ∞ norm bounded by 4/√d. Then

image

Note that as p → 1/2 and η → 0, both the bounds approach 1 − m/d, which decreases as m increases. For m = d/2, this accuracy is as bad as that of a random classifier.

Proof. Let A(x) denote an adversarial perturbation for (x, y) given by (A(x))_0 = 0 and (A(x))_t = −4 c_t y/√d for 1 ≤ t ≤ d. Note that ∥A(x)∥∞ = 4/√d for all x ∈ X. Let G_y denote (X_1, ..., X_d) | Y = y, the last d coordinates conditioned on Y = y. We observe that X + A(X) | Y = y has its last d coordinates distributed as G_{−y}, i.e., the adversarial attack that takes X to X + A(X) turns G_y into G_{−y}.
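The effect of this perturbation on the class-conditional means can be checked directly. The following sketch assumes only what the proof states (mean 2 c_t y/√d for coordinate t, unit variance); the random ±1 vector standing in for the code word and the helper name `adversarial_shift` are illustrative.

```python
import math
import random


def adversarial_shift(c, y):
    """Coordinates 1..d of A(x): (A(x))_t = -4 * c_t * y / sqrt(d)."""
    d = len(c)
    return [-4.0 * c_t * y / math.sqrt(d) for c_t in c]


d = 16
rng = random.Random(0)
c = [rng.choice((-1, 1)) for _ in range(d)]  # placeholder for the code word
y = 1

mean_y = [2.0 * c_t * y / math.sqrt(d) for c_t in c]         # mean of G_y
mean_neg_y = [2.0 * c_t * (-y) / math.sqrt(d) for c_t in c]  # mean of G_{-y}

shift = adversarial_shift(c, y)
shifted_mean = [m + a for m, a in zip(mean_y, shift)]  # mean of X + A(X) given Y = y
linf_norm = max(abs(a) for a in shift)                 # equals 4 / sqrt(d)
```

Because the shift is deterministic given y, it moves the mean from that of G_y to that of G_{−y} and leaves the variance unchanged, which is why the perturbed coordinates are distributed exactly as G_{−y}.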

image

From the above statement, we get that if the adversarial accuracy Pr(f(X + A(X)) = Y) ≥ 1 − η, then the standard accuracy Pr(f(X) = Y) ≤ η p/(1 − p). Using this we get:

image

Two important things to note are as follows. First, the above upper bound on the accuracy after the random transformation r decreases as m increases. Note that larger rotations mean larger m, e.g., a random rotation from {0°, 90°, 180°, 270°} can be modeled by m = d/4. Second, for m = d/2 and p → 1/2, this bound tends to 1/2 + η/2. When η is small, this is only slightly better than a random binary classifier.

Observe that the transformation r_j(x) can also be achieved as an adversarial perturbation x + A_j(x), where (A_j(x))_0 = 0 and (A_j(x))_t = (r_j(x))_t − x_t ∈ {−4y/√d, 0, 4y/√d} for 1 ≤ t ≤ d. Using this observation, we get

image

Note that for m = d/2, this bound becomes (1 − η)/2. When η is small, this is only slightly worse than a random binary classifier. In other words, for m = d/2, high adversarial robustness implies that spatial robustness must be as bad as that of a random classifier.

In the main paper, our theoretical results are about the adversarial accuracy Pr(f(X + A(X)) = Y) and the spatial accuracy Pr(f(r(X)) = Y). However, the experiments show the fooling rate Pr(f(X + A(X)) ≠ f(X)) and the rate of invariance Pr(f(r(X)) = f(X)), respectively. In fact, (1 - fooling rate) and the rate of invariance are reasonable proxies for the adversarial and the spatial accuracy, respectively, when the standard accuracy Pr(f(X) = Y) is high. In Fig 7, we plot the adversarial and the spatial accuracy in (a) and (c), alongside (1 - fooling rate) and the rate of invariance in (b) and (d). As is evident, our results and claims on the trade-off continue to hold in either case. In the main paper, we present our results using the fooling rate and the rate of invariance because that ensures the same starting point for all the curves, and provides a better visual representation to compare the adversarial and spatial robustness of different models.
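The difference between the label-based and the prediction-based quantities can be seen in a small sketch. It takes lists of predictions that are assumed to be computed elsewhere; the function names and the toy prediction lists are hypothetical.

```python
def agreement_rate(predictions, reference):
    """Fraction of positions where two equally long sequences agree."""
    pairs = list(zip(predictions, reference))
    return sum(1 for a, b in pairs if a == b) / len(pairs)


def robustness_metrics(labels, clean, attacked, transformed):
    """Label-based accuracies next to their prediction-based proxies.

    clean, attacked and transformed hold the model's predictions on X,
    X + A(X) and r(X) respectively.
    """
    return {
        "standard_accuracy": agreement_rate(clean, labels),
        "adversarial_accuracy": agreement_rate(attacked, labels),
        "spatial_accuracy": agreement_rate(transformed, labels),
        # Fraction of inputs whose prediction the attack does not change.
        "one_minus_fooling_rate": agreement_rate(attacked, clean),
        # Fraction of inputs whose prediction the transformation does not change.
        "rate_of_invariance": agreement_rate(transformed, clean),
    }


# Toy predictions for five inputs (illustration only).
labels = [0, 1, 1, 0, 1]
clean = [0, 1, 1, 0, 0]
attacked = [0, 1, 0, 0, 0]
transformed = [0, 0, 1, 0, 0]
metrics = robustness_metrics(labels, clean, attacked, transformed)
```

With no attack or transformation applied, the prediction-based quantities equal 1 for every model, whereas the label-based ones start at the model's standard accuracy; this is the "same starting point" property mentioned above.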

image

Figure 7: (a-b) Plots of accuracy and (1-fooling rate) for StdCNN/VGG16 trained on CIFAR10; (c-d) Plots of accuracy and rate of invariance for ResNet-based model from [29] adversarially trained on CIFAR10. Note the same starting points on (b) & (d), which allow for better comparison.

We describe in detail the settings for all our experiments.

Datasets: We use the standard, publicly available benchmark train-validation-test splits of all the datasets used in this work. The MNIST dataset consists of 70,000 images of size 28 × 28, divided into 10 classes: 55,000 used for training, 5,000 for validation and 10,000 for testing. The CIFAR10 dataset consists of 60,000 images of size 32 × 32, divided into 10 classes: 40,000 used for training, 10,000 for validation and 10,000 for testing. The CIFAR100 dataset consists of 60,000 images of size 32 × 32, divided into 100 classes: 40,000 used for training, 10,000 for validation and 10,000 for testing. The Tiny-ImageNet dataset consists of 100,000 images of size 64 × 64, divided into 200 classes: 80,000 used for training, 10,000 for validation and 10,000 for testing.

Spatially Robust Model Architectures: StdCNNs are known to be translation-equivariant by design, and GCNNs [8] are rotation-equivariant by design through clever weight sharing [25]. Equivariant models, especially GCNNs, when trained with random rotation augmentations have been observed to come very close to being truly rotation-invariant [8, 9, 7] (or spatially robust in our context). We hence use both StdCNNs and equivalent GCNNs trained with suitable data augmentations for our studies with spatially robust architectures. In particular, for each StdCNN we use, the corresponding GCNN architecture is obtained by replacing the layer operations with equivalent GCNN operations as in [8]. For the StdCNN, we use the architecture given in Table 2 for MNIST; VGG16 [36] and ResNet18 [19] for CIFAR10 and CIFAR100; and ResNet18 for the Tiny ImageNet dataset.

Adversarially Robust Model Architectures: For adversarial training, we use a LeNet-based architecture for MNIST and a ResNet-based architecture for CIFAR10. Both these models are exactly as given in [29]. For CIFAR100, we use the popular WideResNet-34 architecture also used in [49]. We use ResNet18 [19] for the Tiny ImageNet dataset.

Training Data Augmentation: Spatial: (a) Aug - R: Data is augmented with random rotations in the range ±θ° given θ, along with random crops and random horizontal flips (for MNIST alone, we do not apply crops and horizontal flips); (b) Aug - T: Data is augmented with random translations within the [−i, +i] range of pixels in the image, given i (e.g., for CIFAR10, i = 0.1 corresponds to 32 × 0.1 ≈ ±3 px), in both horizontal and vertical directions; (c) Aug - RT: Data is augmented with random rotations in ±i × 180° and random translations within the [−i, +i] range of pixels in the image (e.g., for CIFAR10, i = 0.1 corresponds to a 0.1 × 180° = ±18° rotation and a 32 × 0.1 ≈ ±3 px translation); here no cropping and no horizontal flip is used. Each random transformation is picked uniformly at random in the given transformation range, e.g., given θ, rotations are picked uniformly at random from [−θ°, +θ°] for augmentation. We use nearest-neighbour interpolation and black filling to obtain the transformed image.

Adversarial: Adv - PGD: Adversarial training using PGD-perturbed adversarial samples with a given ϵ budget. Our experiments with PGD use a random start, 40 iterations, and step size 0.01 on MNIST, and a random start, 10 iterations, and step size 2/255 on CIFAR10, CIFAR100 and Tiny ImageNet.

Our results are best understood by noting the augmentation method mentioned in the figure caption. For example, in Fig 1(a), the augmentation scheme used is Aug - R. The red line (annotated as 60 in the legend) corresponds to the model trained with random rotation augmentations in the range ±60°.
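As an illustration of how the single range parameter i of Aug - RT maps to concrete transformation parameters, here is a minimal sampling sketch. The function name `sample_aug_rt` is hypothetical, the image transformation itself (nearest-neighbour interpolation, black filling) is not implemented, and whether shifts are rounded to whole pixels is not stated in the text, so this sketch keeps them continuous.

```python
import random


def sample_aug_rt(i, image_size=32, rng=random):
    """Sample one (angle, dx, dy) for Aug - RT with range parameter i in [0, 1].

    Rotations are uniform in +/- i * 180 degrees; horizontal and vertical
    shifts are uniform in +/- i * image_size pixels.
    """
    max_angle = i * 180.0
    max_shift = i * image_size
    angle = rng.uniform(-max_angle, max_angle)
    dx = rng.uniform(-max_shift, max_shift)
    dy = rng.uniform(-max_shift, max_shift)
    return angle, dx, dy


# CIFAR10-sized images with i = 0.1: rotations within +/-18 degrees,
# shifts within about +/-3 pixels.
rng = random.Random(0)
samples = [sample_aug_rt(0.1, image_size=32, rng=rng) for _ in range(100)]
```

Aug - R and Aug - T can be read as the two special cases in which only the angle, or only the shifts, are sampled.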

Hyperparameter Settings for CuSP algorithm: The empirical evaluations of the CuSP algorithm are based on settings used in the TRADES method. For CIFAR10, we train the model for 120 epochs with the learning rate schedule 75-90-100, where the learning rate starts at 0.1 and decays by a factor of 0.1 successively at the 75th, 90th and 100th epochs. TRADES is used with 1/λ = 5.0. PGD-based perturbation is used with step size 2/255 or 0.007 and 10 steps, with the appropriate ϵ budget.
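The 75-90-100 schedule can be written as a small step function. This is a sketch under one assumption that the text does not settle: the decay is taken to apply from the milestone epoch itself onward. The function name `learning_rate` is hypothetical.

```python
def learning_rate(epoch, base_lr=0.1, milestones=(75, 90, 100), gamma=0.1):
    """Step schedule: multiply base_lr by gamma once for every milestone reached."""
    reached = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** reached


# Learning rates over a 120-epoch run (epochs numbered from 1).
schedule = [learning_rate(epoch) for epoch in range(1, 121)]
```

Under this reading the learning rate is 0.1 up to epoch 74, 0.01 from epoch 75, 0.001 from epoch 90, and 0.0001 from epoch 100 until the end of training.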

Hardware Configuration: We used a computing server with 4 Nvidia GeForce GTX 1080i GPUs to run all experiments in the paper.

Tool License: AdverTorch [11] provides the tool under GNU LGPL. TRADES and the code for [29] are under the MIT License.

Table 2 shows the details of the model architecture used for our experiments on the MNIST dataset. Fig 8 contains the invariance and robustness profiles of StdCNN and GCNN models on MNIST. We observe the same trends noted in the main paper in these results too.

image

Table 2: Architectures used for MNIST experiments

image

Figure 8: On MNIST, (a-b) Aug - R StdCNN, (c-d) Aug - R GCNN, (e-f) Aug - T StdCNN, (g-h) Aug - T GCNN, (i-l) Aug - RT StdCNN, (m-p) Aug - RT GCNN, invariance profiles of StdCNN/GCNN models and corresponding robustness profiles.

In the main paper (Fig 1), we presented results for experiments carried out using VGG16 on CIFAR10. Please see Fig 9 for additional experiments on CIFAR10 using the ResNet18 model. These complement the experiments in the main paper and show that our results continue to hold across different models.

In the main paper, we presented results for experiments carried out using VGG16 on CIFAR10. Please see Figs 10 and 11 for additional experiments on CIFAR100 using the VGG16 and ResNet18 models, respectively. These complement the experiments in the main paper and show that our results continue to hold across different models and datasets.

In the main paper, we presented results for experiments carried out using VGG16 on CIFAR10. Please see Figs 12 and 13 for additional experiments on Tiny-ImageNet using the ResNet18 and Densenet121 models, respectively. These complement the experiments in the main paper and show that our results continue to hold across different models and datasets.

The results in the main paper using the PGD attack were obtained with the well-known setting of k = 10 steps and step size 2/255. We checked the adversarial accuracy for the following 4 variants of the PGD attack with ϵ = 8/255 but different numbers of steps k and step sizes:

image

image

Table 3: Results similar to Table 1 row indexed 3 (PGD (Aug 0)) and row indexed 4 (PGD (Aug 180)) with different PGD hyperparameter settings on StdCNN/VGG16 with CIFAR10.

In Table 3, we note the results based on the above settings of the PGD attack on the adversarially trained StdCNN/VGG16 network on CIFAR10 with 0° and ±180° augmented data. We observe that our conclusions about the spatial and adversarial robustness trade-off continue to hold for (a)-(d) above.
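To show where the number of steps k and the step size enter the attack, here is a self-contained sketch of ℓ∞ PGD with a random start. It attacks a toy linear logistic model instead of a deep network so that the input gradient has a closed form; the model, the function names and the example values are illustrative and are not the setup used for Table 3.

```python
import math
import random


def sign(v):
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def margin(x, y, w, b):
    """y * (w.x + b) for a label y in {-1, +1}; positive means correctly classified."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


def pgd_linf(x, y, w, b, eps, step_size, steps, rng):
    """PGD in the l-infinity ball of radius eps around x, maximizing the logistic loss."""
    # Random start inside the eps-ball, clipped to the pixel range [0, 1].
    x_adv = [min(1.0, max(0.0, xi + rng.uniform(-eps, eps))) for xi in x]
    for _ in range(steps):
        # Gradient of log(1 + exp(-margin)) with respect to the input.
        coeff = -y / (1.0 + math.exp(margin(x_adv, y, w, b)))
        grad = [coeff * wi for wi in w]
        stepped = [xi + step_size * sign(g) for xi, g in zip(x_adv, grad)]
        # Project back onto the eps-ball around x and onto the pixel range.
        x_adv = [min(1.0, max(0.0, min(x0 + eps, max(x0 - eps, s))))
                 for s, x0 in zip(stepped, x)]
    return x_adv


x = [0.5, 0.5, 0.5, 0.5]
w = [1.0, -2.0, 0.5, -0.1]
x_adv = pgd_linf(x, y=-1, w=w, b=0.0, eps=8 / 255, step_size=2 / 255,
                 steps=10, rng=random.Random(0))
```

Changing `steps` and `step_size` while keeping `eps` fixed gives variants of the same attack in the spirit of the comparison above.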

image

Figure 9: (Best viewed in color, zoomed in) On CIFAR10, for the ResNet18 model: (a-b) Aug - R StdCNN, (c-d) Aug - R GCNN, (e-f) Aug - T StdCNN, (g-h) Aug - T GCNN, (i-l) Aug - RT StdCNN, (m-p) Aug - RT, invariance profiles of StdCNN/GCNN models and corresponding robustness profiles.

We additionally performed experiments to study training that uses spatial augmentations as well as adversarial training. The spatial vs. adversarial robustness trade-off continues to hold, e.g., the improvement in spatial robustness that we get from spatial transformation augmentation during training is lost when we do adversarial training to improve adversarial robustness. Fig 14 shows translation invariance and ℓ∞ robustness for StdCNN/VGG16 models that are first trained with progressively larger ranges of translation augmentation on CIFAR10, and then adversarially trained with a fixed ℓ∞ budget of ϵ = 8/255. The results show the same trends as before despite the combined training.

image

Figure 10: (Best viewed in color, zoomed in) On CIFAR100, for the VGG16 model: (a-b) Aug - R StdCNN, (c-d) Aug - R GCNN, (e-f) Aug - T StdCNN, (g-h) Aug - T GCNN, (i-l) Aug - RT StdCNN, (m-p) Aug - RT, invariance profiles of StdCNN/GCNN models and corresponding robustness profiles.

image

Figure 11: (Best viewed in color, zoomed in) On CIFAR100, for the ResNet18 model: (a-b) Aug - R StdCNN, (c-d) Aug - R GCNN, (e-f) Aug - T StdCNN, (g-h) Aug - T GCNN, (i-l) Aug - RT StdCNN, (m-p) Aug - RT, invariance profiles of StdCNN/GCNN models and corresponding robustness profiles.

image

Figure 12: (Best viewed in color, zoomed in) On Tiny-ImageNet, for the ResNet18 model: (a-b) Aug - R StdCNN, (c-d) Aug - R GCNN, (e-f) Aug - T StdCNN, (g-j) Aug - RT StdCNN, (k-n) Adv - PGD StdCNN, invariance profiles of StdCNN/GCNN models and corresponding robustness profiles.

image

Figure 13: (Best viewed in color, zoomed in) On Tiny-ImageNet, for the Densenet121 model: (a-b) Aug - R StdCNN, (c-d) Aug - T StdCNN, (e-h) Aug - RT StdCNN, (i-k) Adv - PGD StdCNN, invariance profiles of StdCNN models and corresponding robustness profiles.

image

Figure 14: On CIFAR10, Aug - T + Adv - PGD (ϵ = 8/255); (Left) translation invariance profiles and (Right) robustness profiles of StdCNN/VGG16 models. Different colored lines represent models initially trained with different translation augmentation to make them translation invariant first. Each model made invariant to different translation ranges was further adversarially trained with ℓ∞ budget ϵ = 8/255.

image

Figure 15: The distance of a point x_0 to the decision boundary D_180, obtained by augmenting the training set with random rotations in the range [−180°, 180°], is different from its distance to the decision boundary D_0, obtained with no training augmentation. (This figure is only for illustrative purposes, and does not reflect actual measurements.)

For each test image, adversarial attacks find perturbations of the test point with small ℓ∞ norm that would change the prediction of the given model. Most adversarial attacks do so by finding the directions in which the loss function of the model changes the most. To explain why these networks become vulnerable to pixel-wise attacks as they learn more rotations, we examine how the distance of the test points to the decision boundary changes as the networks learn larger rotations. This is illustrated in Fig 15, where we show the distance of a test point x_0 to the boundary D_0 (resp. D_180) when the model is trained with zero (resp. ±180°) rotations. We use the L2 attack vectors obtained by DeepFool [33] for the datapoints under attack. We take the norm of this attack vector as an approximation to the shortest distance of the test point to the decision boundary. For each of the test points we collect the perturbation vectors given by the DeepFool attack and report the average perturbation distance. We plot this average distance as the datasets are augmented with larger rotations. Our experiments show that as the networks learn larger rotations with augmentation, the average perturbation distance falls. So as (symmetric) networks become invariant to rotations, they become more vulnerable to pixel-wise attacks. The plots in Fig 16 show this for StdCNNs and GCNNs on MNIST, CIFAR10 and CIFAR100. We plot the accuracy of these models alongside the average perturbation distance of the test points, e.g., Fig 16(a) shows the accuracy and Fig 16(b) shows the corresponding average perturbation distance. The blue line in Fig 16(a) shows the accuracy of a StdCNN on MNIST when the training data is augmented with θ ranging from 0° to 180° and the test data has no augmentation. The red line shows the accuracy when the test data is not augmented with rotations but is PGD-perturbed with ℓ∞ norm 0.3. The red line in Fig 16(b) shows the average perturbation distance of the unrotated test points when the network is trained with rotations up to θ; this is about 5 when θ is 0° (the point on the y-axis where the curves begin). As the network is trained with random rotations up to 180°, the average perturbation distance of the augmented test drops below 4.0. Fig 16(a) shows that the PGD accuracy has dropped from around 65% for the network at 0° to 20% at 180°.
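The quantity being averaged is easiest to see in the one case where it has a closed form. For a linear binary classifier f(x) = sign(w·x + b), the smallest L2 perturbation that reaches the decision boundary has norm |w·x + b| / ∥w∥₂, and this affine case is the building block that DeepFool iterates on for deep networks. The sketch below computes the average of this distance over a few points; the classifier and the points are made up for illustration and the function names are hypothetical.

```python
import math


def l2_distance_to_boundary(x, w, b):
    """Distance from x to the hyperplane w.x + b = 0, i.e. |w.x + b| / ||w||_2."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return abs(score) / math.sqrt(sum(wi * wi for wi in w))


def average_perturbation_distance(points, w, b):
    """Mean boundary distance over a set of test points."""
    return sum(l2_distance_to_boundary(x, w, b) for x in points) / len(points)


# Toy classifier and test points (illustration only).
w, b = [3.0, 4.0], 0.0
points = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
avg_distance = average_perturbation_distance(points, w, b)
```

For a deep network the boundary is not a hyperplane, so the norm of the DeepFool perturbation is only an approximation to the true shortest distance, as noted above.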

In the main paper, we presented results of CuSP based on PGD for the CIFAR10 dataset trained with StdCNN/VGG16 architecture. In this section, we provide additional experimental results and visualizations based on CuSP with PGD and TRADES on StdCNN/ResNet18 for the CIFAR10 and CIFAR100 datasets. We also include additional results using TRADES on StdCNN/VGG16 for the CIFAR10 dataset. Tables 4 and 5 contain the results for StdCNN/VGG16 and StdCNN/ResNet18 on CIFAR10, respectively. (Note that Table 4 is a more detailed version of Table 1 in the main paper). Table 6 presents the results for StdCNN/ResNet18 on CIFAR100. Similar to Fig 6 in the main paper, Figs 17, 18, 19, 20, 21 and 22 present a visualization of the results to understand their relevance to the Pareto frontier.

Observations: In Tables 4 and 5, the rows indexed 5 and 6 indicate the training strategy of sequentially combining adversarial training and augmentation by spatial transformations (as in Table 1 of the main paper), i.e., take an adversarially trained network and re-train it with spatial (e.g., rotation) augmentation, or vice versa. It is clear from the visualizations in Figs 17, 18, 19 and 20 that they

image

Figure 16: (Best viewed in color, zoomed in) Accuracy of StdCNN/GCNN on MNIST (Table 2)/CIFAR10 (VGG16)/CIFAR100 (ResNet18) with/without PGD (ϵ = 0.3, ϵ = 8/255, ϵ = 2/255), on the unrotated test set. Train/test sets, if augmented, use random rotations in [−θ°, θ°]. (a-d) MNIST, (e-h) CIFAR10, (i-l) CIFAR100; in each pair of plots, the first (e.g., (a)) shows the accuracy and the second (e.g., (b)) shows the corresponding average perturbation distance.

produce models which lie either at the bottom-right of the plot (high spatial accuracy but low adversarial robustness) or at the top-left of the plot (high adversarial robustness but low spatial accuracy). These are examples of the catastrophic forgetting exhibited by these networks, as discussed earlier. Another viable baseline for our study is adversarial training with data augmented with rotations in the range [−180°, +180°]. We observe from these results that CuSP with various configurations produces a model with both spatial invariance and adversarial robustness, in comparison to the baselines. While the rows indexed 4 and 6 across the tables seem to show promise, PGD (Aug 180) expectedly suffers from relatively poorer adversarial accuracy, while Aug 180 → PGD (Aug 0) expectedly suffers from poor spatial accuracy. The latter is clearly a case of the aforementioned catastrophic forgetting issue, as it behaves similarly to the row indexed 3, PGD (Aug 0). The proposed method, CuSP, consistently outperforms these methods across the datasets and models, supporting the role of curriculum learning for simultaneous robustness on multiple fronts.
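To make the comparison against the Pareto frontier concrete, the sketch below filters a set of (spatial accuracy, adversarial accuracy) pairs down to the non-dominated ones. The function name `pareto_front` is hypothetical, and the numbers in the usage note are made up for illustration; they are not values from Tables 4 to 6.

```python
def pareto_front(points):
    """Return indices of points that no other point dominates.

    points: list of (spatial_acc, adversarial_acc); higher is better on both.
    A point is dominated if another point is at least as good on both
    axes and strictly better on at least one.
    """
    front = []
    for i, (s_i, a_i) in enumerate(points):
        dominated = any(
            s_j >= s_i and a_j >= a_i and (s_j > s_i or a_j > a_i)
            for j, (s_j, a_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front
```

For example, with the made-up points `[(90, 10), (20, 80), (70, 60), (60, 50)]`, the last point is dominated by `(70, 60)` and the first three remain on the front. A model at the bottom-right or top-left can still be non-dominated; the goal described above is a point that is strong on both axes at once.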

Other Settings and Extensions: While the main set of experiments was based on the TRADES learning rate schedule 75-90-100, we also tried other schedules such as 40-80 (learning rate changes at the 40th and 80th epochs) and 50-100. Table 7 shows that these also obtain competitive results, indicating that CuSP can be further fine-tuned to obtain better Pareto-optimal solutions, which is a promising direction for future work.
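A schedule written as 75-90-100 or 40-80 can be read as a list of epochs at which the learning rate changes. The sketch below assumes a step decay by a constant factor at each listed epoch. The base rate of 0.1, the factor of 0.1 and the name `step_lr` are assumptions for illustration, since the text only gives the epochs.

```python
def step_lr(epoch, base_lr=0.1, milestones=(75, 90, 100), gamma=0.1):
    """Learning rate at a given epoch under a step-decay schedule.

    The rate is multiplied by `gamma` once for every milestone that has
    been reached. `base_lr` and `gamma` are assumed values.
    """
    n_drops = sum(epoch >= m for m in milestones)
    return base_lr * gamma ** n_drops
```

Calling `step_lr(epoch, milestones=(40, 80))` gives the 40-80 variant under the same assumptions.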

image

Figure 17: Visualization of performance of CuSP based on PGD against other baseline strategies on CIFAR10 for StdCNN/VGG16 model (each index corresponds to a row in Table 4). On the left, we compare CuSP against various baselines. On the right, we zoom in to compare different variants of CuSP.

image

Table 4: Comparison of performance of proposed CuSP using StdCNN/VGG16 with other baseline strategies on CIFAR10. (Aug θ) denotes training data augmented with random rotations in the range [−θ, +θ]; a → b denotes a sequentially followed by b during training.

image

Figure 18: Visualization of performance of CuSP based on TRADES against other baseline strategies on CIFAR10 for StdCNN/VGG16 model (each index corresponds to a row in Table 4). On the left, we compare CuSP against various baselines. On the right, we zoom in to compare different variants of CuSP.

image

Table 5: Comparison of performance of proposed CuSP using StdCNN/ResNet18 with other baseline strategies on CIFAR10. (Aug θ) denotes training data augmented with random rotations in the range [−θ, +θ]; a → b denotes a sequentially followed by b during training.

image

Figure 19: Visualization of performance of CuSP based on PGD against other baseline strategies on CIFAR10 for StdCNN/ResNet18 model (each index corresponds to a row in Table 5). On the left, we compare CuSP against various baselines. On the right, we zoom in to compare different variants of CuSP.

image

Table 6: Comparison of performance of proposed CuSP using StdCNN/ResNet18 with other baseline strategies on CIFAR100. (Aug θ) denotes training data augmented with random rotations in the range [−θ, +θ]; a → b denotes a sequentially followed by b during training.

image

Figure 20: Visualization of performance of CuSP based on TRADES against other baseline strategies on CIFAR10 for StdCNN/ResNet18 model (each index corresponds to a row in Table 5). On the left, we compare CuSP against various baselines. On the right, we zoom in to compare different variants of CuSP.

image

Figure 21: Visualization of performance of CuSP based on PGD against other baseline strategies on CIFAR100 for StdCNN/ResNet18 model (each index corresponds to a row in Table 6). On the left, we compare CuSP against various baselines. On the right, we zoom in to compare different variants of CuSP.

image

Figure 22: Visualization of performance of CuSP based on TRADES against other baseline strategies on CIFAR100 for StdCNN/ResNet18 model (each index corresponds to a row in Table 6). On the left, we compare CuSP against various baselines. On the right, we zoom in to compare different variants of CuSP.

image

Table 7: Performance of proposed CuSP vs PGD (Aug 180) for CIFAR10 on StdCNN/VGG16 with various learning rate schedules. (Aug θ) denotes training data augmented with random rotations in the range [−θ, +θ]; a → b denotes a sequentially followed by b during training; [E1 − E2 − ... − Eh] denotes that the learning rate changes at epochs Ei.

image

Table 8: Trade-off (PGD(Aug 0) vs PGD(Aug 180)) on simpler architectures like LeNet and MLP on CIFAR10 dataset.

To study the trade-off across a wider variety of architectures, we ran additional experiments on CIFAR10 using LeNet (2 Conv + 2 FullyConnected) and two MLP (4 layers) architectures. The results in Table 8 correspond to row 3 (PGD (Aug 0)) and row 4 (PGD (Aug 180)) of Table 1. Note that the trade-off exists here too, supporting our observations across a gradation of architectures (MLP, LeNet, VGG16, ResNet18, WideResNet34, GCNN).

