A novel class of extreme link-flooding DDoS (Distributed Denial of Service) attacks [1] is the Crossfire attack, which is designed to cut off entire geographical areas such as cities and even countries from the Internet by simultaneously targeting a selected set of network links [2], [3]. The most intriguing property of this target-area link-flooding attack is its use of legitimate traffic flows to achieve its devastating impact, which makes the attack particularly difficult to detect and, consequently, to mitigate [4].
The Crossfire attack uses a complex and massively large-scale botnet for attack execution [4]. A botnet is a network of malware-infected computers (bots) that can be controlled remotely. A command-and-control unit updates the bots by relaying the commands of the botmaster, who orchestrates the attack by executing the attack procedure. The bots direct their low-intensity flows to a large number of servers in such a way that the targeted geographical region is essentially cut off from the Internet.
The success of the attack depends highly on the network structure and on how the attacker plans and initiates the attack sequence [5], [6]. The attacker aims to find a set of target links that connect to the decoy servers such that, if the target links are flooded, traffic destined for the target area is prevented from reaching its destination. Reciprocally, access from the target area to Internet services outside the target area is cut off. To achieve its goal, the adversary chooses public servers either inside or near the target area, which are easy to find because they are publicly accessible. The quality of the attack depends on the specific selection of servers and the resulting links to be targeted, but also on the overall network topology [7].
Figure 1. The Crossfire attack traffic flows congest a small set of selected network links using benign low-rate flows from bots to publicly accessible servers, while degrading connectivity to the target area.
The Crossfire attack: The Crossfire attack consists of three phases: (a) the construction of the link map, (b) the selection of target links, and (c) the coordination of the botnet. While phases (a) and (b) are executed sequentially and only once at the beginning, phase (c), once triggered, is executed periodically. Fig. 1 illustrates the dynamics of the Crossfire attack.
Link map construction: The initial step of the Crossfire attack is the construction of the link map. The attacker creates a map of the network along the paths from the attacker’s bots to the servers using traceroute. The result of traceroute inevitably consists of a record of different routes between the same pairs of nodes, because network-inherent elements influence the effective route chosen (e.g., ISP and load balancing). Subsequently, a link map is gradually constructed, which exposes the network structure and the traffic flow behavior around the target area.
Target link selection: After the construction of the link map, the adversary evaluates the data for more stable and reliable routes to decide on its selection of the target links. The adversary prefers disjoint routes with mostly independent target links so that the attack has the biggest impact.
Bot coordination: In the final phase of the attack, the adversary coordinates the bots to generate low-intensity traffic and to send it to the corresponding decoy servers. The targeted aggregation of multiple low-intensity traffic flows on the target link ideally exhausts its capacity and hence congests the link.
Because the Crossfire attack aims to congest the target links with low-rate benign traffic, neither signature-based Intrusion Detection Systems (IDS) nor alternative traffic anomaly detection schemes are capable of detecting malicious behavior in individual flows. The Crossfire attack’s detectability can be reduced even further by integrating any of the following features into the attack: the attacker (a) gradually increases bot traffic intensity, (b) estimates the decoy servers’ bandwidth to avoid exceeding it, (c) evenly distributes the traffic over the decoy servers, (d) alternates the set of bots flooding a target link, and (e) alternates the set of target links [4].
Although these techniques make the attack more sophisticated, the research described in this paper focuses on the effort the adversary has to invest for successful attack preparation and execution. As it turns out, the inherent complexities of the attack also create substantial execution obstacles, which expose the attack to detection.
In this paper, we describe how the Crossfire attack has been replicated in a realistic test bed emulation. The traffic has been measured during the topology construction phase and the attack phase and analyzed for patterns and vulnerabilities of the Crossfire attack.
The results indicate that characteristic traffic anomalies emerge in the attack region. In particular, we found a correlation between the coordination of the botnet traffic and the quality of the attack, and a correlation between the attack distribution and the detectability of the attack. Additionally, we show that, due to the bot synchronization, there is a warm-up period after the attack is launched and before the target links are overwhelmed. Because of this warm-up period and the distinguishing patterns in the topology construction phase, the obtained results pave the way for novel detection methods in the early stage of the attack, when the attack traffic is formed [8]. Consequently, based on this intrinsic property of the attack traffic distribution, we propose a new approach that monitors the traffic volume (or intensity) in specific network regions for any sudden, subtle changes on some of the links. Depending on the resolution of the monitoring scheme, this leads to an early detection of the attack, which we illustrate in this paper.
This paper also provides a functional analysis of how to assess the impact of the Crossfire attack on the affected area more realistically, instead of over-estimating the resources needed for attack detection and mitigation. We analyze the challenges in the preparation and execution of the Crossfire attack and exploit them for attack detection. Hence, we describe a prototypical Crossfire attack detector that exploits these vulnerabilities. For this, we utilize two supervised machine learning approaches, Support Vector Machine (SVM) and Random Forest (RF), to classify network traffic into normal and abnormal traffic, i.e., attack traffic. To show the feasibility of detection, we report on the trained scenarios using the link volume as the main feature set. Finally, results of the attack detector are reported along with some future directions to improve the detector.
A. Monitoring points
Considering the described Crossfire attack execution sequence, there are potentially four ways to detect the attack: (a) detection at the origin of the traffic flows, i.e., at the bots, (b) detection at the target area, (c) detection at the target link, and (d) detection at the decoy servers. In the following, we address the advantages and disadvantages of each of the four options to justify our choice of traffic monitoring location.
• Detection at the origin can be the fastest way to stop an attack, even before it is initiated. However, the versatility and spatial distribution of the bots (the source of the attack traffic) make it the most challenging option.
• Detection at the target area is the most reasonable approach, as any target area should be equipped for self-defense. However, assuming that not all decoy servers are inside the target area, early detection is impossible [9].
• Detection at the target link might be the simplest form of detection, since a simple threshold-based detection system could detect the trend of the incoming traffic.
• Detection at the decoy servers can be the best approach to detect the Crossfire attack. Assuming the target area is not far from the decoy servers (3 to 4 hops [4]), detecting at the decoy servers might reduce the impact of the attack.
Our approach is based on detection at the decoy servers, because it is the only area where the defender can detect the attack while actively responding to it. To emphasize the effectiveness of our detection approach at the decoy servers, we address the question of where the best location to probe the network is. At a finer resolution, this probing can be placed at the target link, before the target link, or after the target link. Monitoring a single link as a target link is not considered a solution for two reasons:
• Any link can be targeted for an attack. Therefore, there would have to be a one-to-one detector for every link in the network, whereas in our proposal there is only one detector but many probing points.
• Monitoring and detecting based on a single link will fail to distinguish between a link attack and a flash crowd.
The main goal is to detect the Crossfire attack without needing information about the target link. To find the best monitoring domain, we assume for now that the location of the target link is known. The question is which side of the target link provides more information for detection. Assuming the number of ports of a switch/router is limited, considering only the immediate links before or after the target link might not help to choose a side. However, farther away from the target link, the distribution of the traffic intensity on the links might be a function of the distribution of the end points. We will show in Section IV-B that more distributed attack traffic is more difficult to detect.
Figure 2. A four sub-tree topology of 10 bots, 10 normal clients, and 40 decoy servers. Each group of 10 decoy servers connected to a switch is called a sub-tree.
Depending on the budget of the adversary, the number of bots purchased for an attack can range from thousands to millions. If the sources of the attack traffic, i.e., the bots, are geographically spread out, the variation of the traffic volume on most of the links is very small (for many routes there might be only one or a few attack flows before they are aggregated at the target link). That leaves only a few links close to a target link worth examining. However, the chosen decoy servers should not be very far away from the target area (if they are not inside the target area). Since there are fewer destinations for the attack traffic than sources generating it, it can be assumed that the variation of the traffic volume caused by the attack traffic is higher on the links after the target link than on the links before it. Therefore, we suggest that monitoring links around servers or data centers results in better detection than monitoring links around clients.
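The asymmetry argued above can be made concrete with a back-of-the-envelope calculation. The following Python sketch is illustrative only: the flow count, the per-flow rate, and the assumption that flows spread evenly over the access links are hypothetical and are not taken from our measurements.

```python
def added_load_per_link(total_flows, flow_rate_bps, num_links):
    """Average extra load (bps) per link if the flows spread evenly over num_links."""
    if num_links <= 0:
        raise ValueError("num_links must be positive")
    return total_flows * flow_rate_bps / num_links

# Hypothetical numbers: 10,000 bot flows of 4 kbps each.
TOTAL_FLOWS = 10_000
FLOW_RATE_BPS = 4_000

# Source side: one access link per bot, so each link carries a single attack flow.
source_side = added_load_per_link(TOTAL_FLOWS, FLOW_RATE_BPS, num_links=10_000)
# Server side: the same flows converge on far fewer decoy-server links.
server_side = added_load_per_link(TOTAL_FLOWS, FLOW_RATE_BPS, num_links=80)

print(f"extra load per source-side link: {source_side:.0f} bps")
print(f"extra load per server-side link: {server_side:.0f} bps")
print(f"ratio: {server_side / source_side:.0f}x")
```

Under these assumed numbers, the per-link change on the server side is two orders of magnitude larger than on the source side, which is the intuition behind monitoring near the servers.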
The approach of evenly distributing the traffic over the decoy servers [4] might even support the above reasoning and make it simpler to detect some variation of the traffic volume on several links. The important element in this method is the ability to monitor the traffic at several links and send the information to a detector for decision making.
In order to substantiate our discussion from Section IV, we emulate the Crossfire attack in a realistic test bed environment. The test bed is implemented in Mininet and the following setup has been chosen for the emulation environment:
• SDN network created in Mininet.
• SDN switches with POX controller.
• D-ITG traffic generator [10].
• Tree topology (cf. Fig. 2).
• Link bandwidth is set to 2 Mbps with 10 ms delay.
• POX controller gets link status every 5 sec from switches.
• Bots generate both normal and bot traffic.
• Some background traffic from clients.
• Some background traffic at the leaf switches (to decoy servers) to level up the traffic at the edge links.
Table I DIFFERENT VARIATIONS OF THE NETWORK TOPOLOGY.
One focus of this paper is the effect of the traffic distribution on the detectability of the Crossfire attack. We hence used a tree structure as the topology of the network. This permits us to intuitively expand the topology of the network, i.e., the tree structure, in order to widen the traffic distribution. Fig. 2 illustrates a base network for our emulation in which there exist several sub-trees, each of which includes 10 decoy servers. To investigate different traffic distributions on the network, we design three variations of this topology as shown in Table I. Fig. 2 depicts the network topology of 4ST, which includes 4 sub-trees, 11 switches, and 40 decoy servers.
From a practical perspective, Mininet with the D-ITG traffic generator limits the size of the network in the emulation. This is attributed to the fact that we need to reduce CPU utilization in SDN networks. Hence, the bandwidth of all links in our emulations is set to 2 Mbps in order to saturate the target link with fewer bots and fewer traffic generators. Moreover, the numbers of clients and bots are set to a small value of 10 each, in exchange for a larger number of decoy servers. Nevertheless, bots and clients can generate traffic at a higher rate to compensate.
All switches are SDN switches connected to a POX controller. We modified the POX module flow_stats.py provided on GitHub [11], which gives the controller the ability to collect some port- and flow-based statistics from the switches. Using this code, the controller sends a stat request to all of the switches connected to it every five seconds. The response from the switches is the number of packets in the buffer of each port and the number of flows at each link. There are 20 clients in this network, including 10 bots (connected to switches 1 and 2) and 10 normal clients (connected to switches 3 and 4).
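As a rough illustration of how periodic controller statistics can be turned into a link-volume feature, the sketch below converts cumulative per-port byte counters into utilization values. It assumes cumulative byte counters of the kind that OpenFlow port statistics expose; the function name and the sample values are hypothetical, and this is not the modified flow_stats.py module itself.

```python
LINK_CAPACITY_BPS = 2_000_000  # 2 Mbps links, as in the emulation setup
POLL_INTERVAL_S = 5            # controller polling period in seconds

def utilization_series(byte_counters,
                       capacity_bps=LINK_CAPACITY_BPS,
                       interval_s=POLL_INTERVAL_S):
    """Convert cumulative byte counters, sampled every interval_s seconds,
    into per-interval link utilization values in [0, 1]."""
    series = []
    for prev, curr in zip(byte_counters, byte_counters[1:]):
        delta_bits = max(curr - prev, 0) * 8  # guard against counter resets
        series.append(min(delta_bits / (capacity_bps * interval_s), 1.0))
    return series

# Hypothetical counter readings from four consecutive polls of one port.
samples = [0, 250_000, 750_000, 2_000_000]
print(utilization_series(samples))
```

Each pair of consecutive polls yields one utilization sample, so a poll every five seconds gives twelve samples per minute and per link.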
Clients can be considered super clients, which can generate traffic at a higher rate than a normal client (or bot) can. Bots generate two types of traffic: normal traffic from the beginning to the end of the experiments, and bot traffic, which starts after d seconds and lasts for another d seconds. For experiments in Section VII, d is set to 5 and to 30 minutes to have enough samples for the detector. There is a limited number of traffic types for both normal and bot traffic. Table II presents all traffic types used in the experiments. Background traffic (normal traffic) consists of five application traffic types that D-ITG allows us to generate [10]: Telnet, DNS, CSa (Counter Strike active player), VoIP, and Quake3. Both normal clients and bots use these traffic types to generate background traffic. However, since we could specify neither the inter-departure time nor the packet size for these applications, we use simple TCP requests to generate attack traffic. To make the two types of traffic indistinguishable, we add the same TCP traffic to the set of background traffic as well. This reflects the requirement of the Crossfire attack that background traffic is indistinguishable from the attack traffic.
Table II PARAMETERS OF THE TRAFFIC.
Although the type of the normal and abnormal traffic should be the same, the rate of the two types of traffic can be different. In reality, the rate of the bots’ traffic must be engineered by the attacker. Here we set the rate based on the remaining bandwidth of the targeted link after receiving the normal traffic. The details of the traffic types and their parameters for the setup in Fig. 2 are given in Table II.
In addition to the traffic generated by the clients and bots, there are extra traffic generators attached to some of the switches (mostly leaf switches) to increase the level of the background traffic on the links. This can be considered traffic coming from another part of the network that is not shown in Fig. 2. Since the number of clients is much smaller than the number of servers, the extra traffic generators help to boost the level of traffic on the edge links connected to the servers.
As noted before, the contribution of this paper is to expose the difficulties of the Crossfire attack and use them for an early detection method. We specifically focus on the effect of bot traffic synchronization on the quality of the Crossfire attack, and on the effect of the distribution of the attack on its detectability, which we describe in the following two subsections. Since our focus is on the detection of the attack, we ignore the first few steps of the Crossfire attack, such as link map construction, finding link persistence, or target link selection. We assume that all attack preparations have been made and the attacker is ready to attack.
A. Bot traffic synchronization
The topology we use is presented in Fig. 2. At this stage, to bring down the target link, the adversary only needs to start the bot traffic and direct it to the decoy servers. Thus, the botmaster initiates the attack by sending the attack order to the Command and Control (C&C) server or to some selected peers, depending on the structure of the botnet. Bots usually update each other via a polling or pushing mechanism. The question of interest, however, is what happens if the bots receive the attack order at different times.
When designing Crossfire detection mechanisms, an often ignored part of the Crossfire attack is the phase between the attack initiation and the successful impact of the attack [4]. This phase, which we call the warm-up period, is the time difference between the moment the first bot flow of the attack reaches the target link and the moment the target link is down. By definition, the attack actually happens at the end of the warm-up period, when the target links are down. Since reaching a zero warm-up period is hard, this period can be used for early detection before the attack successfully takes place.
In fact, reaching a zero warm-up time is hard for several reasons. One reason could be the dynamic delay of packet arrival at the target link, caused by variations in the hop distance from the bots to the target link or by the delay in receiving the attack order from the adversary. Any sudden significant change in traffic volume can be detected by firewalls and IDSs. Therefore, adversaries gradually increase the attack traffic volume to avoid being detected. To achieve a perfect link failure, the volume of the traffic arriving at the target link should be slightly higher than the bandwidth of the target link itself. However, this might not happen immediately. There are three main reasons for gradual traffic growth:
1) Bot traffic can originate from any geographical location in the world, and the flows might arrive at the target link with different delays (dynamic delay).
2) Since the source of the attack is a botnet, it is reasonable to assume that there is some time slack between the moments at which the individual bots start sending bot traffic. This time slack can be caused by how bots receive updates from their C&C center or from other peers in an advanced P2P botnet, but also by the malware itself [12], [13].
3) Bots might gradually increase their traffic intensity to prevent detection [4]. This can be considered the main reason for the gradual increase of the attack traffic volume.
To illustrate the effect of bot synchronization (BS) on the traffic volume of the target link, the result of an emulated attack is presented in Fig. 4. A two sub-tree version of Fig. 2 is used to generate these results. At this stage, to bring down the target link, the adversary only needs to start the bot traffic to the decoy servers; that is, we are only running the last part of the Crossfire attack. Fig. 4 illustrates the utilization of the target link before and after the attack. The curves in different colors represent different BS times for the bots to generate the attack flow. In Fig. 4, the red curve is the baseline, showing that a perfect attack happens when all the bot traffic arrives simultaneously at the target link with maximum intensity. The BS time interval used in these experiments is in the range of 1 to 5 minutes. The reason is that in a P2P platform (the most recent platform to synchronize botnets), peers usually contact each other within a few minutes [12], [13]. For instance, Skype peers update only closer peers every 60 seconds [12].
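The relationship between the spread of bot start times and the warm-up period can be sketched with a simple analytical model. The Python snippet below is a minimal sketch under stated assumptions, not the emulation itself: each bot is assumed to send at its full rate as soon as its flow arrives (no gradual ramp-up), the background load is assumed constant, and all names and numbers are hypothetical.

```python
def warmup_period(arrival_times_s, per_bot_rate_bps, capacity_bps,
                  background_bps=0.0):
    """Seconds between the first attack flow reaching the target link and the
    moment the offered load reaches the link capacity.
    Returns None if the link never saturates."""
    arrivals = sorted(arrival_times_s)
    attack_rate = 0.0
    for t in arrivals:
        attack_rate += per_bot_rate_bps  # one more bot flow is now active
        if background_bps + attack_rate >= capacity_bps:
            return t - arrivals[0]
    return None

# Hypothetical setup: 10 bots at 100 kbps each, a 2 Mbps target link,
# and 1 Mbps of background traffic already on the link.
CAPACITY = 2_000_000
BACKGROUND = 1_000_000
RATE = 100_000

synchronized = [0] * 10                  # all flows arrive at the same time
staggered = [i * 20 for i in range(10)]  # arrivals spread over 180 seconds

print(warmup_period(synchronized, RATE, CAPACITY, BACKGROUND))
print(warmup_period(staggered, RATE, CAPACITY, BACKGROUND))
```

In this model the warm-up period equals the arrival time of the flow that tips the link into saturation, so any spread in bot start times translates directly into time available for detection.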
Figure 3. Bot traffic with various starting points and traffic flow duration.
In other studies such as [14], the time synchronization between bots is reported to be in the range of a few milliseconds. However, there are a few steps (a three-state machine) before the bots can reach that accuracy, and those states take sufficiently long (i.e., a few minutes). Therefore, we can still assume that there is enough time, in the range of a few minutes, before the real attack takes place. We are looking here at the time difference between the arrivals of the first packet of each bot at the target link. The time difference between the arrivals of the individual packets of any bot traffic could be in the range of milliseconds, which is not our concern here.
Since we are now aware of this possible early detection, we discuss how to detect an attack that is formed by low-intensity non-malicious traffic. The main idea is to detect the variation of the traffic volume at several links. Since one cannot gain any information from per-flow traffic monitoring (the attack traffic is benign traffic), and the attack type is a flooding attack, probing the volume of the traffic at several links might be effective. Although a single bot flow is very small and can be detected neither at the IDS nor at the server, the aggregation of the flows is no longer small. All these small traffic flows must be aggregated at a certain time and place to be able to overwhelm the target link(s).
Another important parameter in generating the bot traffic is the duration of the attack. Usually, botmasters (adversaries) tend to reduce the duration of the attack to avoid being detected. The combination of the dynamic delay and the attack duration is difficult to determine. The attack duration parameter, which is the time difference between the end of the warm-up period and the end of the attack, is named Dur in our experiments.
In the case of the Crossfire attack, a rolling mechanism is introduced to keep the attack at the data plane (to evade activating the control plane, which redirects the traffic) [4]. In the rolling scheme, a set of target links is only used for a specific period of time before the attack switches to another set of target links. The duration used in the rolling scheme is 3 minutes, which is the keep-alive message time interval for BGP. This duration might be insufficient when the attack traffic builds up gradually for any of the above-mentioned reasons. This limitation of a short attack duration forces the adversary to introduce another delay in forming the attack, which causes a longer warm-up period. The effect of the length of the attack duration (including the warm-up period) with various BS and Dur parameters is depicted in Fig. 3. For comparison purposes, the baseline is the case where bots simultaneously generate traffic (the warm-up period is zero) for an unlimited duration. For the other curves, there are warm-up periods of length BS and attack durations of length Dur.
Figure 4. The effect of bot-traffic synchronization on the warm-up period. 2 Sub Trees with different warm-up periods.
Figure 3 shows that, with less synchronized attack traffic, a successful attack requires either that the duration of the attack be prolonged enough to outlast the warm-up period, or that the adversary delay the attack and let the warm-up period pass before initiating it.
Although the parameters in our experiments are set to small values (a few minutes of warm-up period), the results can generally be extended to longer periods. The main reason for keeping the simulation time short is the limitation of resources in our setup. Increasing the size and the duration of the experiments reduces the accuracy of the traffic generator [10].
B. Distribution
We discussed the traffic synchronization problem in forming the attack. The resulting warm-up period can be used for early detection of the Crossfire attack, which is the topic of the next section. In this section, we introduce a hypothesis about the link traffic intensity variation caused by the Crossfire attack and suggest using it for detection. We hypothesize that even if the Crossfire attack is successfully formed by generating very low-intensity attack traffic, there will unavoidably be a sudden jump in the traffic on (backbone) links, and this jump will be characteristic of a Crossfire attack.
Figure 5. 2 Sub-Trees where jump can be seen at all levels.
Figure 6. 4 Sub-Trees and jump still visible.
The main objective of the Crossfire attack is to bring down a set of target links to affect the connectivity of a target area. Depending on the power of the attacker, the target area could be cut off completely from the Internet, or the quality of the connection to the Internet could be degraded. Either way, to bring down the target link, the link utilization must be increased to its maximum capacity. The extra unwanted traffic at the target link must go through the downstream links and affects their utilization. For instance, the attack traffic at the target link between switch 5 and switch 7 in Fig. 2 must pass through the four subsequent links between switch 7 and the downstream switches 8, 9, 10, or 11. This sudden extra change in traffic has a large impact on these downstream links, and the effect may propagate further to other links as well. We examine this impact through emulation and report the results for most of the links below the target link. Then, by expanding the size of the network, we try to hide this impact by distributing the jump on the target link over more links. In these experiments, we do not consider the gradual traffic intensity increase at the bots. All bots send traffic at the maximum predefined level. The warm-up period is 3 min.
Figure 7. 8 Sub-Trees where jump at the edge links are not visible.
The first scenario is a two sub-tree version of the network in Fig. 2. In this scenario, the traffic at the target link can only go through two other paths. Fig. 5 shows that during the attack, the target link is completely utilized and the two links beneath the target link are affected by the sudden traffic change. The green line represents the traffic at the edge of the network, where switch 8 is connected to Decoy Server 1. The results of running similar experiments with larger networks are illustrated in Fig. 6 and Fig. 7.
Comparing the link utilization in Fig. 5, Fig. 6, and Fig. 7 shows that increasing the number of branches in the network reduces the visible jump on the downstream links. In Fig. 7 in particular, it is very difficult to distinguish the attack period by looking only at the utilization of the edge link (the green line).
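The dilution of the jump with a growing number of sub-trees can be approximated with simple arithmetic. The sketch below assumes, hypothetically, that the extra traffic on the target link splits evenly over the sub-tree links and then evenly over the 10 edge links of each sub-tree; the 1 Mbps jump is an illustrative value, not a measured one.

```python
LINK_CAPACITY_BPS = 2_000_000  # 2 Mbps links, as in the emulation setup
SERVERS_PER_SUBTREE = 10       # each sub-tree holds 10 decoy servers

def relative_jump(jump_bps, num_subtrees):
    """Return the jump on one sub-tree link and on one edge link, each as a
    fraction of link capacity, assuming an even split at every level."""
    subtree_jump = jump_bps / num_subtrees
    edge_jump = subtree_jump / SERVERS_PER_SUBTREE
    return subtree_jump / LINK_CAPACITY_BPS, edge_jump / LINK_CAPACITY_BPS

# Hypothetical jump of 1 Mbps on the target link.
for k in (2, 4, 8):
    subtree, edge = relative_jump(1_000_000, k)
    print(f"{k} sub-trees: sub-tree link {subtree:.2%}, edge link {edge:.3%}")
```

Under these assumptions, the jump on an edge link falls from 2.5% of capacity with 2 sub-trees to about 0.6% with 8 sub-trees, which is consistent with the jump becoming hard to see on the edge links of the largest topology.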
The main point of these experiments is to show that sensing a similar variation in traffic intensity on multiple links could be a good indication of a Crossfire attack. Since expanding the network reduces the jump in the link utilization, the detector must be accurate enough to detect very small variations of the traffic intensity that cannot be detected by the naked eye.
A. Warm-up phase
As mentioned, the warm-up period is the phase between the attack initiation and the successful impact of the attack [4]. Early detection means detecting the attack during this phase, when the attack traffic has reached the decoy servers but the network is still operational. We show that once the attack starts, the correlation among the links to the decoy servers gradually increases during the warm-up period, potentially providing sufficient time and data to detect the attack.
For effective and early detection, we propose to monitor the traffic volume and intensity on several links of the network for sudden characteristic changes that occur simultaneously on some of these links. Based on the awareness of this possible early detection, we discuss how to detect an attack that is formed by low-intensity non-malicious traffic.
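The idea of watching several links for a simultaneous change can be illustrated with a deliberately simple rule. The following sketch is not the detector evaluated in this paper: it swaps in a basic mean-plus-k-standard-deviations test per link and raises an alarm when a minimum fraction of links jump at the same time. The link names, the thresholds k and min_fraction, and the sample values are all hypothetical.

```python
from statistics import mean, pstdev

def links_with_jump(history, latest, k=3.0, min_std=1e-6):
    """Return the links whose latest utilization exceeds their own
    baseline mean by more than k standard deviations."""
    flagged = []
    for link, samples in history.items():
        mu = mean(samples)
        sigma = max(pstdev(samples), min_std)  # avoid a zero-width band
        if latest[link] > mu + k * sigma:
            flagged.append(link)
    return flagged

def simultaneous_change(history, latest, min_fraction=0.5, k=3.0):
    """Raise an alarm if at least min_fraction of the monitored links
    show a jump in the same sampling interval."""
    flagged = links_with_jump(history, latest, k)
    return len(flagged) / len(history) >= min_fraction, flagged

# Hypothetical utilization history (fractions of capacity) for three edge links.
history = {
    "s8-d1": [0.40, 0.42, 0.41, 0.39],
    "s8-d2": [0.30, 0.31, 0.29, 0.30],
    "s9-d11": [0.50, 0.52, 0.49, 0.51],
}
latest = {"s8-d1": 0.47, "s8-d2": 0.36, "s9-d11": 0.505}

alarm, flagged = simultaneous_change(history, latest)
print(alarm, flagged)
```

A single link crossing its threshold is not sufficient for an alarm here, which reflects the argument that monitoring one link alone cannot separate an attack from a flash crowd.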
Figure 8. Link utilization of one link with different attack intensity.
Figure 9. Correlation among links when there are attacks.
Figure 10. Correlation among links when the attack intensity is reduced.
The main idea is to detect the variation of the traffic volume at several links. Since one cannot gain any information from per-flow traffic monitoring (the attack traffic is benign traffic), and the attack type is a flooding attack, probing the volume of the traffic at several links might turn out to be effective. Although a single bot flow is very small and can be detected neither at the IDS (Intrusion Detection System) nor at the server, the aggregation of the flows is no longer small. All of these small traffic flows must be aggregated at a certain time and place to be able to overwhelm the target link(s). This variation of the traffic volume at several links increases the correlation among them, and this is where the attack can be detected.
Figure 11. Correlation among links when there is no attack.
B. Experimental results
In this section, we present experimental results to support the hypothesis that the Crossfire attack can be detected early based on the correlation among the links to the decoy servers.
The experiments in this section differ from the previous ones in that they are designed to study the correlation among the links with and without the attack. Thus, the attack traffic is not designed to overwhelm any target link. The objective of the attack traffic is to add extra scheduled traffic at all decoy servers.
The same tree with 8 sub-trees (80 decoy servers) and the same traffic types are used. In both experiments, there is a warm-up period of 30 samples. The attack intensity gradually increases at every time sample during this period. The extra attack traffic increases from 300 bps to 600 bps in the first experiment (experiment-1) and from 60 bps to 150 bps in the second experiment (experiment-2).
The normalized (l1-norm) data of the link utilization of one link for both experiments is illustrated in Fig. 8. The figure shows that the attack intensity for the first experiment (green curve) is higher than that of the second experiment (blue curve). The higher link utilization of the second experiment might hide the small variation of the attack traffic. The warm-up period for each experiment is highlighted with two parallel lines.
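For readers unfamiliar with the normalization, the snippet below shows one common reading of l1 normalization, in which each sample is divided by the sum of the absolute values of the series. This interpretation and the sample values are assumptions made for illustration.

```python
def l1_normalize(samples):
    """Scale a series so that the sum of absolute values equals 1."""
    total = sum(abs(x) for x in samples)
    if total == 0:
        return [0.0 for _ in samples]  # an all-zero series stays all-zero
    return [x / total for x in samples]

# Hypothetical link utilization samples in bps.
print(l1_normalize([100, 100, 200]))
```

After this scaling, series from links with very different absolute loads can be plotted on a common axis.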
Pearson-R is used to measure the correlation among the links. The correlation is computed for every possible combination of two links. Since there are 80 decoy servers in our experiments, there are C(80, 2) = 3160 combinations of two links. Pearson-R returns a single value for two sets of data, representing how tightly (or loosely) the two sets are correlated. However, we are interested in observing how the correlation of two links evolves over the duration of the warm-up period. Therefore, Pearson-R is calculated over a window of 30 points (the same length as the warm-up period). To calculate the first value of Pearson-R, the set contains 29 sample points before the attack and one sample of the attack. Then, the window is moved by one sample to calculate the second value with 2 attack samples and 28 samples before the attack. Finally, when the window reaches the end of the warm-up period, all 30 samples used to calculate Pearson-R include the attack traffic.
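The sliding-window computation described above can be sketched in plain Python as follows. This is a minimal illustration rather than the analysis code used for the figures: the function names are hypothetical, both series are assumed to have equal length, and a window with constant values is treated as uncorrelated (returning 0.0), which is an assumption since Pearson-R is undefined in that case.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: a constant window counts as uncorrelated
    return cov / sqrt(vx * vy)

def sliding_pearson(x, y, window=30):
    """Pearson-R over a window that advances one sample at a time."""
    return [pearson_r(x[i:i + window], y[i:i + window])
            for i in range(len(x) - window + 1)]

def mean_pairwise_correlation(links, window=30):
    """Average the sliding Pearson-R curves over all pairs of links.
    `links` maps a link name to its utilization series."""
    pairs = list(combinations(sorted(links), 2))
    curves = [sliding_pearson(links[a], links[b], window) for a, b in pairs]
    return [sum(vals) / len(vals) for vals in zip(*curves)]

# With 80 edge links there are 3160 distinct pairs.
print(len(list(combinations(range(80), 2))))
```

The averaged curve returned by mean_pairwise_correlation corresponds to the kind of network-wide trend that is compared between the attack and no-attack cases.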
The result of experiment-1 is reported in Fig. 9. This figure shows that the correlation constantly increases even for links that are not correlated before the attack (sample-2, the green curve). The average correlation among all links (the average over all 3160 pairs of links) is also presented and confirms the positive effect of the attack traffic in increasing the correlation among all the links. Figure 10 illustrates the result of experiment-2, in which the attack intensity is reduced. This attack traffic is not strong enough to affect the correlation on all links. For instance, the Pearson-R value of the two links in the sample-3 curve of Fig. 10 changes based on the background traffic on the links (they are not influenced by the attack traffic). However, there are some combinations of links that are influenced by the attack traffic. The sample-1 curve in the same figure is one such example. We observed that the effect of the attack traffic on the correlation of links is a function of the intensity of the background traffic. A smaller volume of attack traffic does not affect the correlation when there is a large amount of background traffic passing through the link.
The results of the link correlation when there is no attack traffic involved, is reported in Fig.11. The figure shows that in average the correlation among the links are zero. Although, there might be some positive correlation among some links (like sample-2), this is not a general trend in the network.
The Crossfire attack poses great challenges for security researchers and analysts in both detection and mitigation, as the packets streaming from bots in the network are seemingly legitimate. While the objective of the Crossfire attack is to deplete the bandwidth of specific network links, each individual bot-to-server flow is usually of very low intensity and consumes only limited bandwidth at each link. Thus, a single flow (or a small number of flows) at a link is hard to detect and filter. On the defender’s side, Traffic Engineering (TE) is the network process that reacts to link-flooding events, regardless of their cause [15]. The attacker therefore wants to hide the variation in traffic bandwidth from the TE module as much as possible.
This section shows that if the attacker distributes the benign traffic effectively enough, the defender has great difficulty distinguishing the attack traffic from normal traffic when attempting to detect the attack far away from the target link.
Indeed, the analysis in the previous sections has already attested to the importance of traffic distribution. Following this direction, we leverage state-of-the-art approaches in machine learning to investigate the effect of traffic distribution on concealing and detecting the Crossfire attack from available traffic data. To do so, we utilize supervised learning to classify network traffic as normal or abnormal, i.e., attack traffic.
Figure 12. Classification result for SVM with different distribution.
Figure 13. Classification result for RF with different distribution.
A. Learning models
In this paper, we attempt to construct a model from a large volume of data collected from the network. We utilize two supervised learning approaches, Support Vector Machine (SVM) and Random Forest (RF), as they are commonly used machine learning approaches that have demonstrated effective performance on different datasets and problems.
SVM: A widely used non-probabilistic binary classifier that attempts to separate the two classes of data with a hyperplane in a multidimensional feature space. We utilize a linear SVM due to its scalability.
RF: Inspired by ensemble learning and bootstrapping, RF leverages multiple decision trees, where each tree is built from a randomly selected portion of the training set. After computing the outputs of the individual trees, the final decision is made by aggregating the outputs via majority voting. The Random Forest algorithm was chosen because Crossfire detection requires high prediction accuracy, the ability to handle diverse bots, and the ability to handle data characterized by a very large number of descriptors of diverse types.
Figure 14. The effect of number of features on attack detection for 4-sub-tree.
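The bootstrapping and majority-voting steps of the RF description in Section A can be illustrated with a deliberately simplified sketch. It bags one-split decision stumps instead of full decision trees, so it is a stand-in for a random forest rather than the classifier used in the experiments; all function names, the number of trees, and the seed are assumptions for the example.

```python
import random
from collections import Counter


def train_stump(samples, labels, rng):
    """Fit a one-split tree (stump) on one randomly chosen feature."""
    feature = rng.randrange(len(samples[0]))
    best = None
    # Candidate thresholds are the feature values seen in the training data.
    for threshold in sorted({s[feature] for s in samples}):
        left = [l for s, l in zip(samples, labels) if s[feature] <= threshold]
        right = [l for s, l in zip(samples, labels) if s[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(l != left_label for l in left)
                  + sum(l != right_label for l in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label


def train_forest(samples, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(samples)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(train_stump([samples[i] for i in idx],
                                  [labels[i] for i in idx], rng))
    return forest


def predict(forest, sample):
    """Aggregate the individual tree outputs by majority vote."""
    votes = Counter(
        left if sample[feature] <= threshold else right
        for feature, threshold, left, right in forest
    )
    return votes.most_common(1)[0][0]
```

For example, with link-volume vectors labeled 0 (normal) and 1 (attack), `predict(train_forest(samples, labels), new_sample)` returns the label chosen by most stumps. An odd number of trees avoids ties in the binary case.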
B. Dataset and feature extraction
We utilized an emulated dataset collected from the experiments discussed in Section III. To generate the attack, we used the topology designed there and collected the data from distinct switches. As mentioned above, the objective of this section is to study the subtle variations in the traffic data of the network in order to design an effective detection approach for Crossfire traffic. Therefore, we employ the volume of traffic on different links of the network to construct feature vectors.
We evaluate the performance of the learning approaches via the area under the receiver operating characteristic curve (AUC) [16]; the curve plots the true positive rate, i.e., sensitivity, as a function of the false positive rate, i.e., fall-out.
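As a concrete reading of this metric, the sketch below computes one ROC point and the AUC from classifier scores. It uses the standard rank-based (Mann-Whitney) formulation of AUC; the function names and the convention that label 1 denotes attack traffic are assumptions for the example.

```python
def roc_point(labels, scores, threshold):
    """Return (false positive rate, true positive rate) at one threshold.

    A sample is predicted as attack (label 1) when its score >= threshold.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    tpr = sum(s >= threshold for s in pos) / len(pos)  # sensitivity
    fpr = sum(s >= threshold for s in neg) / len(neg)  # fall-out
    return fpr, tpr


def roc_auc(labels, scores):
    """Area under the ROC curve.

    Equals the probability that a randomly chosen attack sample scores
    higher than a randomly chosen normal sample, counting ties as one half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 1.0 means the scores separate attack from normal samples perfectly, while 0.5 corresponds to a classifier that ranks them no better than chance.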
C. Experimental Results
In this section, we design and analyze experiments to answer the following questions:
1) What is the impact of the distribution of bot-to-server traffic on the performance of the classification algorithms?
2) What is the impact of the extracted features on the performance of the classification algorithms?
3) What is the impact of the level of the links (in a tree structure) used for feature extraction?
1) The effect of traffic distribution: To examine the impact of traffic distribution on the detection of the attack, we conducted experiments on the three different topologies designed in Section III: 2ST, 4ST and 8ST, where the traffic becomes more distributed as the number of sub-trees in the network topology increases. Fig. 12 shows the classification results of SVM in the different settings. As can be seen, the effectiveness of classification in 8ST is significantly lower than in 2ST and 4ST, which is attributed to the fact that the 8ST setting uses lower-volume flows than the other settings during the attack scenarios. This low amount of traffic, compared with the normal state of the network, conceals the attack from the detection approach. Further, the AUC values for 2ST and 4ST are close to each other, with a small improvement in 4ST. This is attributed to the fact that 4ST benefits from more features than 2ST, i.e., 40 features against 20 features. Fig. 13 depicts the classification results of the RF model for the different traffic distributions. As can be seen from the figure, RF behaves similarly to SVM.
Figure 15. The effect of number of features on attack detection for 8-sub-tree.
2) The effect of features: Prior studies in data mining have demonstrated that the performance of classification models depends strongly on the features selected with respect to the classes. Further, a huge amount of data has to be processed continuously, as the network is an inherently streaming and dynamic environment, which makes feature selection important for reducing computational complexity. We hence vary the number of extracted features and evaluate the performance of the classification algorithms in terms of AUC. Fig. 14 and Fig. 15 show the classification performance for 4ST and 8ST, respectively. From Fig. 14, we can see that the classification performance first increases with the number of features and then saturates beyond an optimal value, i.e., 30 features. This is an interesting result: with too small a feature dimension, we fail to achieve the optimal performance, yet a limited number of features already yields reasonable performance. This is important because, in a network environment, we may have access to only a limited number of links for feature extraction. Fig. 15 depicts the classification performance for the 8ST setting. In contrast to 4ST, the classification performance in the 8ST setting is much lower. This indicates the importance of the distribution of the Crossfire attack: with a sufficiently distributed attack, standard machine learning approaches would fail to distinguish Crossfire attack traffic from background traffic.
3) The effect of network visibility: From the network perspective, an important factor for attack detection is the level of information we can gather about the traffic data of the network. To examine how features from different levels of the network affect the performance of traffic classification, we added the volume of one link from the upper level to the feature vector. More specifically, in the 8ST setting, the baseline features are the traffic volumes of the 80 decoy servers. We also add the volume of a random link from one level up to construct 81-dimensional and 21-dimensional feature vectors. Fig. 16 and Fig. 17 show the classification performance on traffic data in the 8ST setting for SVM and RF. Adding only one feature from the upper level, even when there are fewer features from the lower level, improves the performance significantly, which highlights the importance of extracting features from different parts of the network.
Figure 16. The effect of higher level features for SVM.
Figure 17. The effect of higher level features for RF.
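The construction of the augmented feature vectors in the network-visibility experiment can be sketched as below. The text does not state which lower-level links are kept in the 21-dimensional case, so taking the first `n_lower` links is an assumption, as are the function name, the data layout, and the seed.

```python
import random


def build_feature_vectors(lower_volumes, upper_volumes, n_lower, seed=0):
    """Append the volume of one random upper-level link to each sample.

    lower_volumes[t][i] is the traffic volume of lower-level link i at
    time sample t; upper_volumes[t][j] is the same for upper-level link j.
    Each returned vector holds n_lower lower-level volumes plus one
    upper-level volume, i.e. n_lower + 1 dimensions.
    """
    rng = random.Random(seed)
    # The same upper-level link is used for every time sample.
    upper_idx = rng.randrange(len(upper_volumes[0]))
    return [
        list(low[:n_lower]) + [up[upper_idx]]  # assumption: keep the first n_lower links
        for low, up in zip(lower_volumes, upper_volumes)
    ]
```

With 80 lower-level volumes per sample, `n_lower=80` gives the 81-dimensional vectors and `n_lower=20` gives the 21-dimensional ones.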
The Crossfire attack is considered to be one of the most difficult target-area link-flooding attacks to detect. The attack uses a massively distributed large-scale botnet to generate multiple low-rate benign traffic flows that congest selected network links, with the ultimate goal of disconnecting the target area from the Internet. Although the Crossfire attack is a tremendous threat to any network, by analyzing the obtained data we show that the adversary also faces substantial obstacles in executing the attack successfully. This paper exposes detection vulnerabilities of the Crossfire attack by showing a correlation between the coordination of the botnet traffic and the quality of the attack, and a correlation between the attack distribution and the detectability of the attack.
We also show that, due to bot synchronization, there is a warm-up period after the attack is launched and before the target links are overwhelmed. Our results show that this period can be used for early attack detection. In this paper, a prototypical Crossfire attack detector that exploits these vulnerabilities is described. For this, we utilize two supervised machine learning approaches, Support Vector Machine (SVM) and Random Forest (RF), to classify network traffic as normal or abnormal, i.e., attack traffic. In particular, to show the feasibility of detection, we report on the trained scenarios using the link volume as the main feature set. Finally, results of the attack detector are reported along with some future directions to improve the detector.
This work is partially funded by the joint research programme UL/SnT-ILNAS on Digital Trust for Smart-ICT.
[1] L. Xue, X. Luo, E. W. Chan, and X. Zhan, “Towards detecting target link flooding attack,” in LISA, pp. 81–96, 2014.
[2] D. Gkounis, V. Kotronis, C. Liaskos, and X. A. Dimitropoulos, “On the interplay of link-flooding attacks and traffic engineering,” Computer Communication Review, vol. 46, pp. 5–11, 2016.
[3] D. Gkounis, V. Kotronis, and X. Dimitropoulos, “Towards defeating the crossfire attack using sdn,” arXiv preprint arXiv:1412.2013, 2014.
[4] M. S. Kang, S. B. Lee, and V. D. Gligor, “The crossfire attack,” in 2013 IEEE Symposium on Security and Privacy (SP), pp. 127–141, May 2013.
[5] S. T. Zargar, J. Joshi, and D. Tipper, “A survey of defense mechanisms against distributed denial of service (ddos) flooding attacks,” IEEE Communications Surveys Tutorials, vol. 15, no. 4, pp. 2046–2069, 2013.
[6] S. Ramazani, J. Kanno, R. R. Selmic, and M. R. Brust, “Topological and combinatorial coverage hole detection in coordinate-free wireless sensor networks,” Inter. Journal of Sensor Networks, vol. 21, no. 1, 2016.
[7] M. R. Brust, D. Turgut, C. H. Ribeiro, and M. Kaiser, “Is the clustering coefficient a measure for fault tolerance in wireless sensor networks?,” in 2012 IEEE Inter. Conf. on Communications, pp. 183–187, IEEE, 2012.
[8] S. Misra, M. Tan, M. Rezazad, M. R. Brust, and N.-M. Cheung, “Early detection of crossfire attacks using deep learning,” arXiv preprint arXiv:1801.00235v3 [cs.CR], 2017.
[9] L. Xue, X. Luo, E. W. W. Chan, and X. Zhan, “Towards detecting target link flooding attack,” in 28th Large Installation System Administration Conference (LISA14), (Seattle, WA), pp. 90–105, 2014.
[10] A. Botta, A. Dainotti, and A. Pescapè, “A tool for the generation of realistic network workload for emerging networking scenarios,” Computer Networks, vol. 56, no. 15, pp. 3531–3547, 2012.
[11] W. Yu, “Pox flow statistics.” https://github.com/hip2b2/poxstuff, 2012.
[12] C.-C. Wu, K.-T. Chen, Y.-C. Chang, and C.-L. Lei, “Peer-to-peer application recognition based on signaling activity,” in Proceedings of the 2009 IEEE International Conference on Communications, ICC’09, (Piscataway, NJ, USA), pp. 2174–2178, IEEE Press, 2009.
[13] C.-C. Wu, K.-T. Chen, Y.-C. Chang, and C.-L. Lei, “Detecting peer-to-peer activity by signaling packet counting,” 2008.
[14] Y.-M. Ke, C.-W. Chen, H.-C. Hsiao, A. Perrig, and V. Sekar, “Cicadas: Congesting the internet with coordinated and decentralized pulsating attacks,” in Proc. of the ACM Asia Conf. on Computer and Communications Security, (New York, NY, USA), pp. 699–710, ACM, 2016.
[15] C. Liaskos, V. Kotronis, and X. Dimitropoulos, “A novel framework for modeling and mitigating distributed link flooding attacks,” in IEEE INFOCOM 2016, pp. 1–9, IEEE, 2016.
[16] D. M. Powers, “Evaluation: from precision, recall and f-measure to roc, informedness, markedness and correlation,” 2011.