With the development of convolutional neural networks (CNNs), computers can now handle many tasks, such as target classification [25-30], face recognition, and license plate recognition. In some tasks, computers perform even better than humans, so more and more work is done by computers rather than by people. However, just as hackers attack computer systems and threaten their security, there are always unscrupulous people who try to profit from security holes in security-sensitive fields, which poses a security threat to CNN-based systems.
So how secure are neural networks? Unfortunately, for almost every classification network, many adversarial examples [2] can be generated that mislead the classification result merely by adding small perturbations to the original images [3]. Such adversarial examples are potential threats to a wide range of applications (e.g., a "No passing" sign may be detected as a "No parking" sign by a self-driving car because of small perturbations that humans do not notice [4]). Therefore, finding a robust defense against adversarial attacks is important.
The existing defense methods can be roughly divided into four categories: (1) Hiding the information of the target model to increase the difficulty of generating adversarial examples, e.g., defensive distillation [4],[10]; (2) Training the classifier with adversarial examples to improve its precision [3]; (3) Removing the adversarial perturbations by training a denoising autoencoder [5],[8]; (4) Training a classifier to distinguish between real images and adversarial examples [6],[7].
However, all of these methods have disadvantages. For the first category, [20] showed that defensive distillation does not significantly increase the robustness of neural networks. Categories 2 and 3 need adversarial examples to train the defense, so these defenses are only effective against the process that generated those adversarial examples. For the last category, Carlini and Wagner [9] showed that these detection methods cannot defend against their C&W attack after slight changes to the loss function.
Even powerful defense methods such as MagNet [7] and HGD [5] were shown to be ineffective several days after they were published [22]. In view of these challenges, we instead build a defense system with a smaller scope so that it is not easily broken.
Contributions: (1) We propose an efficient and effective defense method against adversarial examples. Our method is independent of the generation process of adversarial examples, as it requires only real images for training. (2) We explain the working mechanism of our "two-stream" method, which also explains why it is difficult to attack.
Fig.1: Features visualization of the "Incv-3" network by Lucid [15].
during the transmission of information when the input image is an adversarial example, which causes the final misclassification. The images from left to right in Fig.1 are the input images and the visualization results of each neuron in the 4A, 4D and 5A layers of GoogleNet. Fig.1 shows that, when classifying a real image, the neurons in the "high-resolution" network can accurately classify local areas according to the texture information within their receptive fields, and these correct features are passed on layer by layer, which leads to the correct classification result. However, as the second row shows, adversarial perturbations prevent the low-level neurons from accurately extracting local features, which in turn affects the final classification result. This is how adversarial examples affect "high-resolution" neural networks.
mistakes from a human perspective in Fig.2. Fig.2 is obtained by dividing each of the two images in Fig.1 into 10*10 small squares and then shuffling the arrangement of the squares. The size of each small square in Fig.2 is approximately equal to the receptive field of the 4A layer in GoogleNet. In other words, all the information that each neuron in the 4A layer can obtain is contained in one small square in Fig.2. The shuffling makes it impossible for us humans to judge the category from the context of each small square, so we can only look at each square independently, which simulates the view of each neuron in the 4A layer. Looking carefully at Fig.2, we can see that in the picture on the left, we can classify most of the small squares as "dog" without the outline information. In the picture on the right, however, we are unable to accurately classify the small squares as "dog". The reason is that the perturbations destroy the texture features, so we cannot classify the squares in the right picture by texture as we do in the left picture.
The small squares in Fig.2 are sized according to the receptive field of the 4A layer in GoogleNet, so the neural network encounters the same problem when facing adversarial examples. The change in texture features destroys the feature representation of each neuron, and these errors finally lead to a wrong classification result after being propagated through the layers. As for the shuffled real image, the image on the left in Fig.2 is still classified as "dog" with more than 90% confidence by GoogleNet, which supports the view that the classification logic of GoogleNet, a high-resolution network, differs from that of humans and relies more on local features.
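The patch shuffling used to build Fig.2 can be sketched as follows. This is a minimal NumPy illustration, not the script used for the figure: the function name `shuffle_patches`, the `seed` parameter, and the choice to crop the image so that it divides evenly into the grid are all assumptions made for the example.

```python
import numpy as np

def shuffle_patches(image, grid=10, seed=0):
    """Split an H x W x C image into grid x grid squares and permute them."""
    h, w = image.shape[:2]
    ph, pw = h // grid, w // grid          # patch height and width
    # Assumption: crop so the image divides evenly (e.g. 299 -> 290 for 10 x 10).
    cropped = image[:ph * grid, :pw * grid]
    # Reshape to (grid, ph, grid, pw, C), then bring the two grid axes together.
    patches = cropped.reshape(grid, ph, grid, pw, -1).swapaxes(1, 2)
    patches = patches.reshape(grid * grid, ph, pw, -1)
    # Randomly reorder the squares so that no context information survives.
    rng = np.random.default_rng(seed)
    patches = patches[rng.permutation(grid * grid)]
    # Reassemble the permuted squares into a single image.
    out = patches.reshape(grid, grid, ph, pw, -1).swapaxes(1, 2)
    return out.reshape(ph * grid, pw * grid, -1)
```

Every pixel of the cropped image appears exactly once in the output, so only the spatial arrangement of the squares changes while the local texture inside each square is preserved.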
Fig.2: The real image and the adversarial example with shuffled squares
Baker et al. [24] elaborated on the idea that neural networks do not rely on contour information in an article entitled "Deep convolutional networks do not classify based on global object shape". In that article, the authors combine the texture of object A with the contour of object B to test which feature the neural network depends on more. It provides useful background for this paper.
2.1 What kind of problem has led to the failures of the defending methods?
As mentioned above, many defense methods tend to fail against adversarial examples. It is difficult for these methods even to detect adversarial examples [9], let alone classify them correctly. What is the reason behind this? In this paper, we hypothesize that the cause is an insufficient amount of data. From the perspective of information theory, every classification problem requires a certain amount of information to support its results, and this information comes from the training data. Adversarial perturbations increase the entropy of the pictures, so the amount of information the pictures contain is reduced, which leaves less information for correct classification. It is this loss of information that defeats the defense methods. Adversarial perturbations degrade the textures of the images, so if we want to classify such images correctly, the defense should not depend on texture features. A simple way to achieve this is to enlarge the receptive field of the CNN, which is very similar to resizing the image to a smaller size. In an experiment on ImageNet, we resized the training images to 32*32 to avoid interference from adversarial perturbations. The resulting test accuracy was less than 10%, which indicates that the amount of information in ImageNet is not sufficient to support 1000-category classification without texture features. Just as a coding algorithm with insufficient information is bound to fail, a classification task with insufficient information is like trying to make a dress out of a handkerchief: it is destined to have many loopholes.
Under the assumption that we cannot obtain more data to offset the lack of information, we instead build the defense system with a smaller scope to avoid being broken. We therefore propose a defense method that is extremely difficult to break under the following constraints: (1) The input images should be 299*299, which is the input size of GoogleNet. (2) The input images should belong to the 10 categories of CIFAR-10.
2.2 Why choose "two-stream"?
The idea of "two-stream" has been widely used in security-sensitive fields. For example, in communication protocols a checksum is transmitted along with the body to detect errors during transmission; a safe deposit box needs both the banker's key and the customer's key to open; and important experiments need to be successfully replicated in different laboratories to be recognized.
Moreover, during our research we found that adversarial examples transfer well when the target classifier is GoogleNet, Incv3, Incv4, ResNet or a network derived from them. However, the fooling ratio is much lower when the target classifier is CapsNet [23]. We believe the reason is that the extraction of low-level features is strongly affected by the receptive field size of the low-layer neurons. The low-layer neurons of state-of-the-art classification networks have similar receptive fields, which leads to similar low-level features and hence to the transferability of adversarial examples among these networks. The low-layer neurons in CapsNet, however, have a much larger receptive field, which makes it more robust to adversarial perturbations generated on other networks.
In our "two-stream" architecture, the "low-resolution" network can be treated as a network whose low-layer neurons have a large receptive field when dealing with high-resolution images. Adversarial examples therefore transfer poorly between the "high-resolution" and "low-resolution" networks, which is why our method is effective.
Fig.3: The framework of our two-stream network
Similar to SafetyNet [19] and MagNet [7], the workflow of our "two-stream" architecture consists of two steps: (1) a detector that rejects adversarial examples and (2) a classifier that assigns the remaining images to the right label. As shown in Fig.3, the classification results of the "high-resolution" and "low-resolution" networks are fed to the comparison algorithm, which acts as both detector and classifier. The comparison algorithm is given in Algorithm 1. The Mapping Table maps the labels in Imagenet to the labels of Cifar-10, e.g., n02123045, n02124075,... → "Cat"; n02110063, n02110806,... → "Dog", etc.
It is worth noting that this is just a generic framework, and the networks used in it can be replaced by other backbones (e.g., the Incv3 in the "high-resolution" network can be replaced by VGG16, ResNet-152 or other networks trained on ImageNet, and the ResNet-32 in the "low-resolution" network can be replaced by NiN, AllConv or other networks trained on Cifar-10). This greatly increases the flexibility of the framework and makes it difficult for attackers to mount white-box attacks.
Algorithm 1 is the comparison algorithm, where p_1 and p_2 are threshold hyperparameters, set to 10% and 20% in our experiments, respectively. Y denotes the labels and P denotes the probabilities of these labels. Yhigh and Phigh are the labels and corresponding probabilities of the Top-5 classification results of the "High-resolution" network; Ylow and Plow are the same quantities for the "Low-resolution" network.
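To make the roles of the Mapping Table and the two thresholds concrete, the following Python sketch shows one way such a comparison could be organised. It is an illustration under stated assumptions and does not reproduce Algorithm 1: the decision rule in `compare`, the assignment of p_1 to the high-resolution stream and p_2 to the low-resolution stream, and the function names are all hypothetical. Only the synset IDs quoted in the text are listed in the table.

```python
# Partial mapping table: only the synset IDs quoted in the text are included.
MAPPING_TABLE = {
    "n02123045": "cat", "n02124075": "cat",
    "n02110063": "dog", "n02110806": "dog",
    "n01582220": "bird", "n01601694": "bird",
    "n01644373": "frog", "n01644900": "frog",
}

def to_coarse(y_high, p_high):
    """Map Top-5 Imagenet labels to Cifar-10 labels and sum their probabilities."""
    coarse = {}
    for label, prob in zip(y_high, p_high):
        name = MAPPING_TABLE.get(label, "other")   # unmapped classes become "other"
        coarse[name] = coarse.get(name, 0.0) + prob
    return coarse

def compare(y_high, p_high, y_low, p_low, p1=0.10, p2=0.20):
    """Hypothetical rule: accept a label only if both streams support it."""
    coarse = to_coarse(y_high, p_high)
    for label, prob in zip(y_low, p_low):          # low-resolution candidates, best first
        # Assumption: p2 gates the low-resolution stream, p1 the high-resolution one.
        if prob >= p2 and coarse.get(label, 0.0) >= p1:
            return label
    return "reject"                                # the streams disagree: treat as adversarial
```

Under this rule an input is rejected whenever no label is supported by both networks, so a single function plays the detector and classifier roles described above.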
To verify the practicality of the proposed method, we built a network of 10000 user nodes and 1 server node to simulate a real network environment. The user nodes consist of 9000 normal user nodes and 1000 adversarial user nodes. Each normal user node periodically sends a real picture to the server to request a classification result, and each adversarial user node periodically sends an adversarial example. The server node must find these adversarial user nodes and add them to a blacklist to prevent them from accessing the server, while returning the correct classification results to the normal user nodes. To achieve this, we run Algorithm 2 on the server node to decide whether a user node should be blacklisted. The sources of the images are recorded in ImgnIP; the confidence coefficient of each IP is CC[IP], with CC[IP] ∈ [0, 15]; the blacklist is Bl[]; and the detection result of our "two-stream" network is recorded in ImgnRoF, where "1" means a real image and "0" means not.
In this paper, we divide the attack methods into two categories, type I attacks and type II attacks. Type I attacks aim to fool the high-resolution network, and type II attacks aim at the low-resolution network. We evaluated our defense against four popular attacks: Universal Adversarial Perturbations is a type I attack, One Pixel Attack and Carlini Attack are type II attacks, and FGSM can serve as both. We now explain these attacks one by one.
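The bookkeeping behind such a blacklist can be sketched as below. This is not Algorithm 2: the text only fixes the quantities ImgnIP, ImgnRoF, CC[IP] in [0, 15] and Bl[], so the starting confidence, the reward and penalty steps, and the rule of blacklisting at the lower bound are all assumptions chosen for illustration.

```python
CC_MIN, CC_MAX = 0, 15      # range of the confidence coefficient, taken from the text
CC_START = 8                # assumed initial confidence for a new IP
REWARD, PENALTY = 1, 3      # assumed update steps, not taken from the paper

def update_blacklist(img_ip, img_rof, cc=None, bl=None):
    """Update per-IP confidence from detection results and blacklist low scorers."""
    cc = {} if cc is None else cc
    bl = set() if bl is None else bl
    for ip, is_real in zip(img_ip, img_rof):
        if ip in bl:
            continue        # blacklisted nodes are no longer served
        score = cc.get(ip, CC_START)
        # Detected real images raise the confidence, rejected images lower it.
        score += REWARD if is_real == 1 else -PENALTY
        cc[ip] = max(CC_MIN, min(CC_MAX, score))
        if cc[ip] == CC_MIN:
            bl.add(ip)      # assumed rule: blacklist once confidence hits the floor
    return cc, bl
```

With an asymmetric penalty of this kind, a node that mostly sends rejected images is blocked after a few requests, while an occasional false rejection of a normal node is absorbed by later accepted images.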
Fast Gradient Sign Method (FGSM): Goodfellow et al. [3] introduced this adversarial attack algorithm. It generates an adversarial example as $x' = x + \epsilon \cdot \mathrm{sign}(\nabla_x \mathrm{Loss}(x, l_x))$, where $l_x$ denotes the label of $x$ and $\epsilon$ controls the size of the perturbation. This attack is simple but effective. Kurakin et al. [17] described an iterative version of FGSM. In each iteration, the attack applies FGSM with a small step size α and clips the result so that the updated image stays in the ε-neighborhood of the original image. However, this iterative attack can hardly fool a black-box model. To address this issue, Dong et al. [18] proposed the momentum iterative fast gradient sign method (MI-FGSM) to boost adversarial attacks.
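The single-step and iterative updates described above can be sketched in NumPy. To keep the example self-contained, it assumes a toy logistic-regression classifier with weights `w` and bias `b`, whose loss gradient has a closed form; the function names are hypothetical, and a real attack would instead obtain the gradient from the target network by automatic differentiation.

```python
import numpy as np

def loss_gradient(x, label, w, b):
    """Gradient of the binary cross-entropy loss of a logistic model w.r.t. x."""
    prob = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    return (prob - label) * w

def fgsm(x, label, w, b, eps):
    """Single-step FGSM: x' = x + eps * sign(grad_x Loss(x, l_x))."""
    return x + eps * np.sign(loss_gradient(x, label, w, b))

def iterative_fgsm(x, label, w, b, eps, alpha, steps):
    """Iterative variant: steps of size alpha, clipped to the eps-neighborhood of x."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = fgsm(x_adv, label, w, b, alpha)
        # Keep the accumulated perturbation within eps of the original input.
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv
```

For images one would typically also clip the result to the valid pixel range after each step, which is omitted here for brevity.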
Universal Adversarial Perturbations: Following their previous work on DeepFool [16], Moosavi-Dezfooli et al. [1] proposed this universal adversarial attack. Unlike other methods that compute a perturbation to fool a network on a single image, this method computes one perturbation that fools a network on most images. Moreover, the universal perturbations were shown to generalize well across different neural networks.
One Pixel Attack: Su et al. [14] introduced this adversarial attack algorithm, which generates adversarial examples by modifying only one pixel. They reported successfully fooling three common deep neural networks on about 70% of the tested images. It is worth noting that this attack generates adversarial examples without any information about the parameter values or gradients of the network. We use the "one-pixel" and "three-pixel" versions to test our method in our experiments.
Carlini Attack: Carlini and Wagner [21] introduced an attack method for Cifar-10 and MNIST. It is the most powerful type II attack we found.
We evaluate the properties of our "two-stream" architecture on three datasets: car196 [12], fgvc-aircraft [13] and ImageNet [11]. Car196 [12] and fgvc-aircraft [13] are fine-grained datasets, which contain 16185 images of 196 classes of cars and 10200 images of 102 kinds of aircraft, respectively. In this paper, these two datasets are used to test the defensive performance of our architecture for the "automobile", "truck" and "airplane" categories. The Imagenet [11] subset used in this paper is composed of the Cifar-10 related categories selected from the original Imagenet database, e.g., n01582220, n01601694 → bird; n01644373, n01644900 → frog. In total, 217 of the 1000 categories in Imagenet can be mapped to the 10 categories of Cifar-10, and the other 783 are labeled "other".
The classification results of the "High-resolution" and "Low-resolution" networks are used directly to determine whether an image is an adversarial example, so an attack that is effective on both networks would be disastrous for our framework. We therefore ran an experiment to test the performance of state-of-the-art attack algorithms on both networks. The experimental results are shown in Table 1, where "Non Attack Data" is the control group.
In Table 1, H-Net denotes the Top-5 accuracy of the "High-resolution" network and L-Net denotes the Top-1 accuracy of the "Low-resolution" network. For the type I attacks on Cifar-10, we resized the images from 32*32 to 299*299 so that they can be attacked by type I attacks in the same way as the high-resolution datasets. The process shown in Fig.4 is used to apply the type II attacks to the high-resolution datasets.
Table 1: Classification accuracy of "High-resolution" and "Low-resolution" networks on adversarial examples generated by
As can be seen from Table 1, type I attacks only affect the classification result of "H-Net", and type II attacks only affect the classification result of "L-Net". In other words, neither type of attack is effective on both networks, which suggests that their misclassifications are largely independent, so it is feasible to determine whether an input image is an adversarial example by comparing the two classification results. In addition, the accuracy drop when attacking Cifar-10 with type I attacks is not caused by the attacks themselves. In a comparative experiment, we simply resized the Cifar-10 images to 299*299 and then back to 32*32; the classification accuracy on these images is 88.9%, which means that it is the resizing rather than the attack that reduces the accuracy.
Fig.4: The flowchart for implementing the Type II Attack in high-resolution images
For the Type II Attack on high-resolution images, we proceed as follows: the image is first resized to 32*32; the resized image is attacked by the Type II Attack; the difference between the resulting adversarial example and the 32*32 original image is calculated; finally, the difference map is enlarged and overlaid on the original image.
Table 2 shows the detection and classification results of our "two-stream" architecture. "Reject" is the rate at which images are detected as adversarial examples and rejected by our "two-stream" architecture. "Right" is the rate at which images are not rejected and are classified with the right label; "Wrong" is the rate at which they are not rejected and are classified with a wrong label. As can be seen from Table 2, almost all of the images are either detected as adversarial examples or classified with the right label. In other words, it is hard to produce an example that (a) is mislabeled and (b) is not detected as an adversarial example by the "two-stream" architecture. This is the standard that Lu et al. [19] proposed for evaluating the quality of a defense method.
The experimental results of simulating a real-world network environment are shown in Fig.5, where each polyline represents the proportion of one class of user nodes that are blacklisted. The horizontal axis is the number of images sent by the user node and the vertical axis is the proportion of blacklisted nodes.
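The procedure of Fig.4 can be sketched in NumPy as follows. The example assumes a square image stored as floats in [0, 1], uses nearest-neighbour resizing in place of whatever interpolation was actually used, and takes the low-resolution attack as a caller-supplied function `attack_lowres`; these choices and all function names are assumptions for illustration.

```python
import numpy as np

def resize_nearest(image, size):
    """Nearest-neighbour resize of an H x W x C array to size x size."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size     # source row for each output row
    cols = np.arange(size) * w // size     # source column for each output column
    return image[rows][:, cols]

def type2_attack_highres(image, attack_lowres, low=32):
    """Apply a low-resolution (Type II) attack to a high-resolution square image."""
    small = resize_nearest(image, low)                 # shrink to 32 x 32
    adv_small = attack_lowres(small)                   # run the Type II attack
    diff = adv_small - small                           # perturbation at low resolution
    diff_big = resize_nearest(diff, image.shape[0])    # enlarge the difference map
    return np.clip(image + diff_big, 0.0, 1.0)         # overlay it on the original
```

Because only the difference map is enlarged, the high-resolution details of the original image are kept and each modified low-resolution pixel becomes a small block in the 299*299 image.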
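The three rates defined above can be computed from a list of defense outputs as in the short sketch below. The function name `summarize` and the convention that a rejected image is reported with the string "reject" are assumptions made for this example.

```python
def summarize(predictions, truths):
    """Return the Reject, Right and Wrong rates for a list of defense outputs."""
    n = len(predictions)
    # Images flagged as adversarial examples by the detector.
    reject = sum(p == "reject" for p in predictions)
    # Images that were not rejected and received the correct label.
    right = sum(p == t for p, t in zip(predictions, truths))
    # Everything else was accepted but mislabeled.
    wrong = n - reject - right
    return {"Reject": reject / n, "Right": right / n, "Wrong": wrong / n}
```

The three rates always sum to 1, and the "Wrong" rate is the quantity that the criterion of [19] asks a defense to keep small.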
Table 2: Summary of the reaction of our "two-stream" architecture on various attacks.
It can be seen that the adversarial user nodes with strong perturbations (Incv3, Universal) are rapidly blacklisted, and the nodes with weak perturbations also have a high probability of being blacklisted. Meanwhile, as Table 2 shows, the classification results returned to the nodes that are not blocked are usually correct. Therefore, it is not that our defense algorithm is too weak, but that these attack algorithms (DeepFool, Three-pixel) are too weak to need blocking. For the normal user nodes, the blacklisted proportion is almost 0: across our 50 experiments in total, only 17 normal user nodes were added to the blacklist. Our defense algorithm therefore performs well both on single images and in the simulated real-world network environment.
Fig.5: The proportion of user nodes being blacklisted by the server node
In this paper, we propose a new "two-stream" architecture to defend against adversarial examples. By comparing the classification results of the "high-resolution" and "low-resolution" networks, our "two-stream" framework is able to detect adversarial examples without requiring either adversarial examples or knowledge of their generation process. The framework compares two kinds of networks rather than two specific networks, which means that (1) it can be further enhanced by new datasets and new backbones in the future, and (2) it is difficult for an attacker to mount a white-box attack. Experiments show that it is hard to produce an example that (a) is mislabeled and (b) is not detected as an adversarial example by the "two-stream" architecture. Moreover, we sketched one possible reason why "two-stream" works by analyzing the impact of adversarial perturbations on neural networks.
[1] Moosavi-Dezfooli S M, Fawzi A, Fawzi O, et al. Universal adversarial perturbations[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 1765-1773.
[2] Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks[J]. arXiv preprint arXiv:1312.6199, 2013. Available: https://arxiv.org/pdf/1312.6199.pdf
[3] Goodfellow I J, Shlens J, Szegedy C. Explaining and harnessing adversarial examples[J]. stat, 2015, 1050: 20.
[4] Papernot N, McDaniel P, Goodfellow I, et al. Practical black-box attacks against deep learning systems using adversarial examples[J]. arXiv preprint, 2016. Available: https://arxiv.org/pdf/1602.02697v3.pdf
[5] Liao F, Liang M, Dong Y, et al. Defense against adversarial attacks using high-level representation guided denoiser[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 1778-1787.
[6] Metzen J H, Genewein T, Fischer V, et al. On detecting adversarial perturbations[J]. arXiv preprint arXiv:1702.04267, 2017. Available: https://arxiv.org/pdf/1702.04267.pdf
[7] Meng D, Chen H. Magnet: a two-pronged defense against adversarial examples[C]//Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2017: 135-147.
[8] Ross A S, Doshi-Velez F. Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients[C]//Thirty-second AAAI conference on artificial intelligence. 2018.
[9] Carlini N, Wagner D. Adversarial examples are not easily detected: Bypassing ten detection methods[C]//Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security. ACM, 2017: 3-14.
[10] Nguyen L, Wang S, Sinha A. A learning and masking approach to secure learning[C]//International Conference on Decision and Game Theory for Security. Springer, Cham, 2018: 453-464.
[11] Russakovsky O, Deng J, Su H, et al. Imagenet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252.
[12] Krause J, Stark M, Deng J, et al. 3d object representations for fine-grained categorization[C]//Proceedings of the IEEE International Conference on Computer Vision Workshops. 2013: 554-561.
[13] Maji S, Rahtu E, Kannala J, et al. Fine-grained visual classification of aircraft[J]. arXiv preprint arXiv:1306.5151, 2013. Available: https://arxiv.org/pdf/1306.5151.pdf.
[14] Su J, Vargas D V, Sakurai K. One pixel attack for fooling deep neural networks[J]. IEEE Transactions on Evolutionary Computation, 2019.
[15] Olah C, Satyanarayan A, Johnson I, et al. The building blocks of interpretability[J]. Distill, 2018, 3(3): e10.
[16] Moosavi-Dezfooli S M, Fawzi A, Frossard P. Deepfool: a simple and accurate method to fool deep neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2574-2582.
[17] Kurakin A, Goodfellow I, Bengio S. Adversarial examples in the physical world[J]. arXiv preprint arXiv:1607.02533, 2016. Available: https://arxiv.org/pdf/1607.02533.pdf
[18] Dong Y, Liao F, Pang T, et al. Boosting adversarial attacks with momentum[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 9185-9193.
[19] Lu J, Issaranon T, Forsyth D. SafetyNet: Detecting and rejecting adversarial examples robustly[C]//IEEE International Conference on Computer Vision. IEEE Computer Society, 2017.
[20] Carlini N, Wagner D. Towards evaluating the robustness of neural networks[C]//2017 IEEE Symposium on Security and Privacy (SP). IEEE, 2017: 39-57.
[21] Carlini N, Wagner D. Towards evaluating the robustness of neural networks[C]//2017 IEEE Symposium on Security and Privacy (SP). IEEE, 2017: 39-57.
[22] Athalye A, Carlini N. On the Robustness of the CVPR 2018 White-Box Adversarial Example Defenses[J]. arXiv preprint arXiv:1804.03286, 2018. Available: https://arxiv.org/pdf/1804.03286.pdf
[23] Sabour S, Frosst N, Hinton G E. Dynamic routing between capsules[C]//Advances in Neural Information Processing Systems. 2017: 3856-3866.
[24] Baker N, Lu H, Erlikhman G, et al. Deep convolutional networks do not classify based on global object shape[J]. PLoS computational biology, 2018, 14(12): e1006613.
[25] Tu X, Xie M, Gao J, et al. Automatic categorization and scoring of solid, part-solid and non-solid pulmonary nodules in CT images with convolutional neural network[J]. Scientific reports, 2017, 7(1): 8533.
[26] Tu X, Zhao J, Jiang Z, et al. Joint 3D Face Reconstruction and Dense Face Alignment from A Single Image with 2D-Assisted Self-Supervised Learning[J]. arXiv preprint arXiv:1903.09359, 2019.
[27] Tu X, Zhao J, Xie M, et al. Learning Generalizable and Identity-Discriminative Representations for Face Anti-Spoofing[J]. arXiv preprint arXiv:1901.05602, 2019.
[28] Tu X, Gao J, Zhu C, et al. MR image segmentation and bias field estimation based on coherent local intensity clustering with total variation regularization[J]. Medical & biological engineering & computing, 2016, 54(12): 1807-1818.
[29] Tu X, Zhang H, Xie M, et al. Enhance the Motion Cues for Face Anti-Spoofing using CNN-LSTM Architecture[J]. arXiv preprint arXiv:1901.05635, 2019.
[30] Tu X, Zhang H, Xie M, et al. Deep Transfer Across Domains for Face Anti-spoofing[J]. arXiv preprint arXiv:1901.05633, 2019.