SUBMITTED TO IEEE TRANSACTIONS ON SYSTEM, MAN, AND CYBERNETICS

A Gateway-based Defense System for Distributed Denial-of-Service Attacks in High-Speed Networks

Dong Xuan, Shengquan Wang, Ye Zhu, Riccardo Bettati, and Wei Zhao

Abstract: We describe a defense system to contain Distributed Denial-of-Service (DDoS) flooding attacks in high-speed networks. We aim at protecting TCP-friendly traffic, which forms a large portion of Internet traffic. DDoS flooding attacks tend to establish large numbers of malicious traffic flows to congest the network. These flows are marked as TCP flows, and use spoofed source identifiers to hide their identities. Current network equipment lacks countermeasures against this kind of DDoS attack. We describe a gateway-based countermeasure approach. A gateway is a device that is inserted at some point in the network. We envision that the gateway devices deployed in the network collaboratively perform the desired countermeasure functions, including detection of DDoS flooding attacks and access control of network traffic. Given the nature of DDoS attacks in high-speed networks and the limitation of defense resources, it is impossible for the gateway to work at the level of individual on-going traffic flows. We use a group-based strategy where we partition the network under DDoS attack into several subnetworks, and handle the traffic from the same subnetwork as an aggregate. This approach is applied both in attack detection and in access control. With this strategy, the system is freed from the overhead of handling individual flows, and can focus on groups of traffic flows.

I. INTRODUCTION

Recent events have shown how scripts and other forms of automation can be used to harness large numbers of largely unprotected resources on the Internet to mount security attacks on very large scales. The most prominent form of such attacks is the distributed denial of service (DDoS) attack.
Given the large amounts of resources available to the attacker, critical components of a victim can be easily overwhelmed, and so the service provided by the victim is effectively disrupted. These attacks typically exhaust link bandwidth, router processing capacity, and/or network stack resources, to achieve their objective of breaking network connectivity to the victims.

(Author affiliations: Dong Xuan is with the Department of Computer and Information Science, The Ohio State University, Columbus, OH 43210. E-mail: [email protected]. Shengquan Wang, Ye Zhu, Riccardo Bettati, and Wei Zhao are with the Department of Computer Science, Texas A&M University, College Station, TX 77843. E-mail: {swang, bettati, zhao}@cs.tamu.edu, [email protected].)

Very little has been done to date in terms of early detection and containment of this form of DDoS attack. This is largely caused by the difficulties encountered in designing such systems. Difficulties arise from three aspects:

First, it is difficult to maintain high friendly TCP traffic throughput under a DDoS attack. Current DDoS defense strategies based on packet dropping cannot avoid dropping significant numbers of TCP-friendly packets, because it is difficult to separate TCP-friendly traffic from malicious traffic in high-speed networks. Since TCP traffic is inherently responsive, additional dropping of TCP traffic significantly amplifies the effect caused by the DDoS attack.

Second, DDoS flooding attacks are inherently difficult to detect. The attack flows hide their identities by using spoofed source identifiers and by marking themselves as TCP flows, although their dynamics are completely unresponsive. An individual attack flow looks like a friendly flow in terms of network bandwidth consumption. Millions of such mini-flows (generated by using spoofed sources) nevertheless congest the network. Naturally, the attack flows aggregate together with the friendly TCP flows (they may even come from the same sources).
Finally, the large number of flows involved in massive DDoS attacks requires large amounts of resources to be devoted to classifying, monitoring, and countering malicious flows. The limited system resources, such as CPU processing capacity and buffers, are easily exhausted when trying to distinguish millions of such attack flows from the friendly TCP flows. Individual malicious flows can operate significantly below the detection level of current monitoring technology.

In this study, we aim at designing a defense system that contains DDoS flooding attacks in high-speed networks. The objectives are to (a) maximize friendly traffic throughput while reducing attack traffic as much as possible, (b) minimize the disturbance of the defense system on the delay performance of friendly traffic, and (c) achieve high compatibility with existing systems. We adopt the following two main strategies to achieve these objectives:
• A gateway-based defense strategy: We adopt a gateway-based approach. In this context, a gateway is a device that is inserted at some point in the network. We envision that the gateway devices deployed in the network collaboratively perform the desired countermeasure functions, including detection of DDoS flooding attacks and access control of network traffic.
• A group-based defense strategy: Given the nature of DDoS attacks in high-speed networks and the limitation of defense resources, it is impossible for the gateway to work at the level of individual on-going traffic flows. In this study, we adopt a group-based strategy. The basic idea is that we partition the network under DDoS attack into several subnetworks, and apply the same treatment to all traffic from the same subnetwork. The idea is applied both in attack detection and in access control. With this strategy, the system is freed from the overhead of handling individual flows, and can focus on groups of traffic flows.
Besides the above main defense strategies, we adopt efficient defense approaches at each stage of DDoS defense. At the stage of attack detection, we design TCP-ACK based attack detection, and use statistical sampling to efficiently obtain knowledge of the traffic under DDoS attack. At the stage of access control, we classify the traffic into different classes according to their geometry similarity and the degree of damage caused by the DDoS attack. We design a multi-class RED control block with a Class-based Queueing (CBQ) scheduler to control the consumption of bandwidth, aiming to achieve the maximum possible throughput.

The rest of the paper is organized as follows: Previous work related to our study is discussed in Section II. In Section III, we introduce the network model used in this paper. We also categorize DDoS flooding attacks, and identify the one we will address in this study. The defense system and the gateway architecture are studied in Section IV and Section V respectively. The attack detection strategy and the access control strategy are described in Section VI and Section VII respectively. We describe gateway cooperation in Section VIII. In Section IX, we discuss extensions of the proposed system. We summarize the paper in Section X.

II. PREVIOUS WORK

Recent work on DoS can be categorized into one of the following three classes: Network Infrastructure Protection, DoS Detection, and DoS Response. We elaborate on each of them briefly, and describe at least one example for each category.
• Network Infrastructure Protection: This line of work focuses on attack prevention and defense through a robust infrastructure. Work at the University of Michigan, for example, starts from an analysis of Internet routing instabilities [3], and studies methods to prevent attackers from obtaining network information, and methods to automate back-tracing of DDoS attacks. BBN's Secure Border Gateway Protocol (S-BGP) [2] architecture employs three security mechanisms to render BGP robust against attacks: A Public Key Infrastructure is used to support the authentication of Autonomous Systems and BGP routers, and of various authorizations. A BGP transitive path attribute is employed to carry digital signatures (in "attestations") covering the routing information in BGP UPDATEs. IPsec is used to provide data and partial sequence integrity, and to enable BGP routers to authenticate each other for exchanges of BGP control traffic. Anderson et al. [7] address methods to render protocols enforceable. In such cases, behavioral properties must be checked in a no-trust relation. This can be done by having appropriate countermeasures to respond to misbehavior or by modifying protocols to carry enough state to verify correct operation. This work focuses mostly on TCP.
• DoS Detection: BBN's Source Path Isolation Engine (SPIE) attempts to locate the source of attacks by tracing back the path of packets. The result is a graph back to a set of origins, some of which may be the attackers. Work at Network Associates attempts to enhance the attack detection and response capacity via active network (AN) technology. The work is based on CITRA (Cooperative Intrusion Trace-back and Response Architecture) and the Intruder Detection and Isolation Protocol architecture [8]. Because packets belonging to DDoS attacks do not have readily identifiable flow signatures, some researchers developed the concept of Aggregate-based Congestion Control, which can be used to counter some forms of DDoS flooding attacks [5].
• DoS Response: Mechanisms for response use a combination of (a) restricting the access of the attacker by limiting access to resources, (b) re-routing to isolate critical components, and (c) back-tracing and offensive attack suppression.

III. MODELS

A. Network Model

In this work, we restrict ourselves to a single domain, and we focus our attention on domains that are not transit domains. This means that either the sources or the destinations of traffic flows belong to the domain. We also assume that the domain is fully within our jurisdiction. This means, for example, that we can deploy our gateways anywhere in the domain, and that gateways know the exact topology of the domain.

B. DDoS Flooding Attack Model

In this paper, we propose a defense system against DDoS flooding attacks. This form of DDoS attack is caused by the attacker(s) breaking into a large number of geographically dispersed machines, and harnessing their computing and communication resources for large-scale, coordinated attacks on victim sites. These attacks typically exhaust link bandwidth, router processing capacity, and/or network stack resources, to achieve their objective of breaking network connectivity to the victims.

Network resources can be consumed by DDoS flooding attacks in two forms:
• A1: When the attack originates from only a small number of hosts, the individual attack flows must consume bandwidth very aggressively, and the attack flows can easily be identified by their bandwidth consumption behavior.
• A2: If more hosts are involved as sources of the attack, individual flows can be made to behave in a much more compliant fashion, and so can behave similarly to the TCP or UDP flows expected in the system. The attacker can achieve this by frequently changing sources (i.e., using spoofed sources) to hide flow identities. If such flows are multiplexed with friendly traffic, it becomes very difficult to detect and drop the attack traffic without also losing many friendly packets, which TCP traffic does not tolerate well.

In addition, flows in a DDoS flooding attack tend to be non-responsive, i.e., they use UDP-style dynamics to congest the network. However, the attack traffic may be marked in the IP packet header as:
• B1: TCP
• B2: UDP
Some DDoS flooding attacks may spoof their sources, others may not; hence we can categorize the attacks as
• C1: spoofed-source attacks
• C2: non-spoofed-source attacks

As mentioned above, we work in a network with a single domain. There are two possibilities for the distribution of the attack sources:
• D1: all attack sources are outside the network.
• D2: there may be some attack sources inside the network.

We use a 4-tuple to represent the different cases of DDoS flooding attacks. For instance, ⟨A2, B1, C1, D1⟩ represents the case in which the attack uses an extraordinarily large number of attack flows with TCP headers and spoofed sources to congest the network, and the real attack sources are outside the attacked network. Obviously, there may be mixed cases. For example, attacks may use both TCP-marked and UDP-marked flows to congest the network. In this study, we work on Case ⟨A2, B1, C1, D1⟩, where many hosts have been harnessed into flooding the victim with non-responsive traffic marked as TCP, where the source addresses are spoofed, and where all sources are outside the considered network. We believe that this is one of the most typical and challenging cases. In Section IX, we will discuss how to extend our work to the other cases.

IV. SYSTEM OVERVIEW

In this section, we give an overview of the whole defense system against DDoS attacks.

[Fig. 1. A Part of a Network with Gateways]

The defense system centers around the gateway. A gateway is a device that is inserted at some point in the network. It is external to the existing network equipment. With this strategy, no change is needed to the current network equipment or network protocols, and high compatibility can be achieved. Figure 1 illustrates a part of a network with several gateways deployed. The basic functions of the gateways are attack detection and access control.
Gateways in the network cooperate with each other to achieve high defense efficiency. They may share attack detection results, and work on different portions of the on-going traffic. Given the nature of DDoS attacks in high-speed networks and the limitation of defense resources, it is impossible for the gateway to work at the level of individual on-going traffic flows. In this study, we adopt a group-based strategy. The basic idea is that we partition the network under DDoS attack into several subnetworks, and handle the traffic from the same subnetwork as an aggregate. Our idea of grouping can be applied to all stages of defense, including attack detection and access control. With this strategy, the system is freed from the overhead of handling individual flows, and can focus on the grouped traffic.

In the following sections, we will first describe the architecture of the basic unit in our defense system, then introduce the two main defense functions: attack detection and access control.

V. GATEWAY ARCHITECTURE

As mentioned above, the gateway is the basic unit in our proposed defense system. The basic functionality of the gateway is to perform attack detection and traffic access control. Figure 2 shows the basic architecture of one gateway.

[Fig. 2. The Gateway Architecture. Labels in the figure: Access Control Module (Classifier, RED queues, Scheduler), Attack Detection Module (Traffic Sampling, Checking, Sampling Rules), Signaling Module, Network Traffic, DB.]

There are three modules within a gateway:
• Attack Detection Module (AD Module, in short): This module is responsible for obtaining knowledge of the network traffic suffering from the DDoS attack. As mentioned above, given the limited resources, it is impossible for a gateway to obtain individual flow information. We propose a way to obtain overall knowledge of the traffic under DDoS attack, such as the percentage of the traffic that is bad traffic.
This knowledge will be used by the Traffic Access Control module. The AD module selects a portion of the on-going traffic on which to perform attack detection. Accordingly, this module can be further divided into the traffic sampling sub-module and the checking sub-module. The traffic is sampled and selected by the traffic sampling sub-module, and queued for checking in the buffer between the two sub-modules. The traffic handled by this module is copied from the on-going traffic, hence this module introduces little disturbance to the on-going friendly traffic.
• Traffic Access Control Module (TAC Module, in short): This module takes response actions on the on-going traffic based on the knowledge obtained by the AD module. The response actions can be packet dropping and forwarding. Recall that we aim to protect TCP traffic. We reserve a limited amount of bandwidth for UDP traffic overall, and apply comprehensive control to TCP traffic. The control is based on the equations of RED and TCP. The results of detection, i.e., the overall degree of damage caused by the DDoS attack in each group, are used in bandwidth assignment to maximize the total friendly TCP traffic throughput.
• Signaling Module (SIG Module, in short): This module provides communication channels among gateways. Gateways cooperate with each other via these channels by exchanging networking information and coordination rules.

It is not necessary for a gateway to have all three of the above modules. Some gateways may just have the Attack Detection Module and the Signaling Module, and focus on attack detection. Some may just have the Access Control and Signaling Modules, and focus on access control. Cooperation among gateways is introduced to make sure that gateways work on different and proper portions of the on-going traffic. Ultimately, the countermeasure behavior of the individual gateways, together with their cooperation, constitutes the operation of the whole defense system.

VI. ATTACK DETECTION

The main purpose of attack detection is to obtain knowledge of the traffic that may be under DDoS attack. In the following, we will first introduce the basic strategy for attack detection, and then discuss how to adopt this technology in high-speed networks.

A. TCP-ACK based Attack Detection

As mentioned earlier, we aim at protecting friendly TCP flows. Once under attack, there may be millions of low-bandwidth, unresponsive flows marked as TCP traffic present. Thus, a successful classification mechanism has to be in place. In this study, we decide to keep track of the TCP-friendly flows rather than the attack flows. We identify the friendly TCP flows based on the TCP semantics. There are two special characteristics of the TCP semantics which differ from UDP.

One is that a TCP flow (connection) goes through a three-way handshake during flow (connection) establishment. An unresponsive attack flow with a spoofed source, although marked as a TCP flow, cannot establish a real TCP flow (connection). The reason is that the real source is unlikely to receive the SYN-ACK packet from the receiver, since that packet is destined to the spoofed source rather than the real one. Unfortunately, it is very difficult for the gateway to monitor the three-way connection establishment for individual flows.

The other special characteristic of the TCP semantics is that within an established TCP flow (connection), the sender and receiver keep exchanging ACK packets (possibly piggybacked on data packets) to confirm the success of transmission. The matching degree of ACK packets between the sender and the receiver of a flow can be used to decide whether the flow is a friendly one or not. In this study, we rely on detecting the matching degree of ACK packets to identify the friendly traffic. We call this approach TCP-ACK based attack detection.

B. Discussion

The TCP-ACK based approach can be applied to identify whether an individual flow is a friendly TCP flow or not. Ideally, the gateway can keep track of all the individual flows, and do attack detection flow by flow. However, in high-speed networks, there are thousands of flows passing through a gateway. The approach may not be feasible for the following reasons:
• The overhead for the gateway to keep and manage per-flow information is very large. The gateway has to use a very large table to keep per-flow information, and spend significant processing power on table management (i.e., lookup, add, and delete).
• The overhead for the gateway to read and examine each packet header is significant, given the large number of packets passing through the gateway.

To reduce both the storage and the processing overheads mentioned above, we adopt the following schemes:
• Flow aggregation (or grouping): Instead of working on individual flows, our scheme works on groups of flows. According to the TCP semantics, an individual TCP flow has a profile of its ACK amount. For a group of TCP flows, we can also obtain a profile of their ACK amount. We can use this profile to estimate the overall damage to a group of flows that may be under DDoS attack. With this scheme, the gateway only needs a table of limited size to keep flow information, and the overhead of table management is also reduced. The problem is that the precision of the estimate of the overall damage to a group of flows decreases as the number of flows (i.e., the population) in the group increases. An interesting issue is, given the group number (see footnote 1), how to group the traffic to achieve the maximum fairness among groups in terms of the precision of the estimate. We design a heuristic grouping algorithm. The basic idea of the algorithm is to let each group have a similar amount of traffic, i.e., a similar traffic population (see footnote 2).
For one gateway, the routes of the on-going traffic form a tree rooted at the gateway itself. Our algorithm assigns group numbers recursively to subtrees, driven by the total traffic population of the subtrees. The larger the population of a subtree, the larger the group number the subtree can be assigned (see footnote 3).
• Traffic sampling: We can use statistical sampling to examine a randomly selected subset of packets, rather than every packet in the traffic. With this scheme, the overhead of reading packet headers can be reduced. As long as the sample size is sufficiently large, a desired degree of confidence can be maintained.

Footnote 1: The group number means the number of groups into which the traffic can be split. Generally speaking, the group number is much smaller than the number of flows in the network. It may be determined by the processing power and storage of the gateway that can be used for attack detection.
Footnote 2: In this study, we use traffic population to represent the amount of traffic in some time unit, say, a second; it is equivalent to the traffic arrival rate.
Footnote 3: In our algorithm, the information about the traffic population of the networks at a certain degree of granularity is assumed to be available. We believe that this information can be obtained with much less overhead than the information needed in attack detection.

VII. ACCESS CONTROL

Attack detection itself is not the final goal of the defense system. Once the detection is done, the system should take action based on the detection results. As mentioned above, the group-based approach is also applied here. We classify the traffic into different groups (or classes) and assign them different bandwidths to achieve the overall maximum TCP throughput. In the following, we will first introduce how to classify traffic, and then concentrate on how to control traffic by using RED and CBQ technologies.

A. Classification

The goal of traffic classification is to put together traffic that shares a certain degree of similarity. Since traffic within the same class will be treated uniformly, it is very important to make sure that the traffic in the same class is sufficiently similar. The similarities include:
• Damaging similarity: The degree of damage caused by the DDoS attack should be similar for the traffic in the same class. It is unfair to group traffic with a very low degree of damage together with traffic seriously damaged by the DDoS attack.
• Geometry similarity: We will use a TCP-equation based solution to control the bandwidth consumption of the traffic. The solution requires that the traffic share some geometry similarities. In this way, the delay and other behaviors can be approximated as being the same.

In this study, we use the notion of the overall variance in (1) to describe the damaging similarity. The smaller the variance, the higher the degree of damaging similarity of a group. In contrast to damaging similarity, geometry similarity is difficult to quantify. In this study, we require that only traffic from sibling nodes (brother-nodes) can be grouped into one class.

In reality, the number of classes into which the traffic can be grouped is fixed. It may be determined by the number of RED queues at the output link of the gateway. In this case, with the above consideration of similarity, the problem of classification can be defined as follows:

Given the number of classes $|G|$, and the bad traffic ratio $e_j$ for traffic group $j$, classify the traffic into different classes $G_i$ to minimize the variance of traffic:

$$\sigma^2 = \frac{1}{P} \sum_{i \in G} \sum_{j \in G_i} (e_i - e_j)^2 P_j, \tag{1}$$

where $G$ is the set of class IDs, $G_i$ is the set of traffic groups for class $i$, $P$ is the total population, and $e_i$ is the average bad traffic ratio of class $i$, i.e.,

$$P = \sum_{i \in G} \sum_{j \in G_i} P_j, \tag{2}$$

$$e_i = \frac{\sum_{j \in G_i} P_j e_j}{\sum_{j \in G_i} P_j}. \tag{3}$$

Note that the bad traffic ratio $e_j$ and the population $P_j$ for traffic group $j$ are obtained in the Attack Detection module of the gateway. The classifier in the Access Control Module uses this information to classify the traffic into different classes.

The problem is NP-hard. We design a heuristic classification algorithm, which is polynomial. The basic idea of this algorithm is that we sort the child-nodes of each node in increasing order of the bad traffic ratio, and then recursively assign the class numbers to each node to get the minimum variance. Since we have sorted the child-nodes of each node, it is easy to prove that our algorithm is polynomial. The details of the algorithm are in Appendix A.

The performance measure of the classification algorithm is the variance of traffic (the variance, in short), which is defined in (1). For the purpose of comparison, we introduce the low-bound and the up-bound for the variance. They are obtained by randomly generating a large number (see footnote 4) of classification plans. The low-bound is the minimum variance among those resulting from these randomly generated plans. The up-bound is the average value of the variances resulting from these plans. The variance resulting from our classification algorithm should be smaller than the up-bound, and close to the low-bound.

TABLE I. GROUPS GENERATED BY THE GROUPING ALGORITHM (one group per row)
67, 68, 69, 70
71, 74
72
73
75, 76, 78
77
79, 80, 81, 82
83
84, 85, 86
47
48
49, 50
51, 52, 53, 54
55, 56, 57
58
59, 60, 61, 62
63, 65
64
66
23, 26
24
25
27, 29, 30
28
31, 32, 33, 34
35, 36, 37, 38
39, 40, 41
42
43, 44, 45, 46

[Fig. 3. Network Topology for Simulation]

[Fig. 4. Classification Results. Vertical axis: variance (0 to 0.04); horizontal axis: the number of classes (2 to 16); curves: Classification Alg, Low-bound, Up-bound.]

In the following, we will evaluate the performance of the classification algorithm.
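To make the objective concrete before the evaluation, the following is a minimal Python sketch of the variance in (1) to (3) and of the random-plan baseline used for the low-bound and up-bound. The function names, the dictionary-based data layout, and the fixed random seed are illustrative assumptions, not part of the paper. The random plans here also ignore the sibling (brother-node) constraint, so this is only an approximation of the comparison described above, and it is not the heuristic of Appendix A.

```python
import random

def class_variance(classes, ratio, population):
    """Population-weighted variance of bad traffic ratios, following (1)-(3).

    classes    -- list of lists of group IDs (one inner list per class G_i)
    ratio      -- dict: group ID -> bad traffic ratio e_j
    population -- dict: group ID -> traffic population P_j (assumed > 0)
    """
    total = sum(population[j] for members in classes for j in members)  # (2)
    weighted_sq = 0.0
    for members in classes:
        if not members:          # an empty class contributes nothing
            continue
        class_pop = sum(population[j] for j in members)
        class_avg = sum(population[j] * ratio[j] for j in members) / class_pop  # (3)
        weighted_sq += sum((class_avg - ratio[j]) ** 2 * population[j]
                           for j in members)
    return weighted_sq / total   # (1)

def random_plan_bounds(groups, ratio, population, n_classes,
                       n_plans=1000, seed=0):
    """Low-bound (minimum) and up-bound (mean) variance over random plans.

    Assumption: each group is assigned to a class uniformly at random,
    without enforcing the sibling-node constraint.
    """
    rng = random.Random(seed)
    variances = []
    for _ in range(n_plans):
        classes = [[] for _ in range(n_classes)]
        for g in groups:
            classes[rng.randrange(n_classes)].append(g)
        variances.append(class_variance(classes, ratio, population))
    return min(variances), sum(variances) / len(variances)
```

A plan that puts groups with equal bad traffic ratios into the same class yields zero variance, which is the best case any classification algorithm can reach.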
For the purpose of evaluation, we generate the 4-ary tree shown in Figure 3. The traffic from each leaf has a population randomly generated between 1,600 kbps and 32,000 kbps. With the grouping algorithm, the tree is grouped as shown in Table I. In Table I, all traffic from sources in the same row forms one group. For example, traffic from sources 23 and 26 forms one group. In the following, we use this grouped tree as the input of the classification algorithm.

Figure 4 shows the evaluation data of our algorithm together with the two bounds. As expected, the variance resulting from our algorithm is very close to the low-bound, and much smaller than the up-bound. Note that our algorithm's performance improves as the number of classes increases. This can be explained by the fact that as the class number increases, the algorithm has more freedom to classify the traffic to achieve its objective.

Footnote 4: In our simulation, the number is 1000.

B. TCP-Equation Based Access Control

The overall goal of access control is to achieve maximum TCP throughput under DDoS attack. The problem this step faces is how to drop traffic smartly to achieve that goal. In this study, we design a multi-class RED control block attached to a CBQ scheduler (see the Access Control module of the gateway in Figure 2). The block is composed of several RED queues. Different queues have different bandwidth assignments. The bandwidth assignment determines the drop probability of the traffic. The total TCP throughput is the sum of the throughput of all the classes of traffic. We can express it as follows:

$$T_{total} = p_1 (1 - \delta_1)(1 - e_1) + p_2 (1 - \delta_2)(1 - e_2) + \ldots + p_n (1 - \delta_n)(1 - e_n). \tag{4}$$

$\bar{p}$ is defined as the vector of arriving rates of the different classes of traffic,

$$\bar{p} = (p_1, p_2, \ldots, p_n)^T. \tag{5}$$

$\bar{e}$ is defined as the vector of arriving probabilities of the bad traffic in the different classes of traffic,

$$\bar{e} = (e_1, e_2, \ldots, e_n)^T. \tag{6}$$

$\bar{\delta}$ is defined as the optimal vector of drop probabilities of the different classes of traffic,

$$\bar{\delta} = (\delta_1, \delta_2, \ldots, \delta_n)^T. \tag{7}$$

Among the above three vectors, $\bar{p}$ and $\bar{e}$ are the results of classification. $\bar{\delta}$ is what we want to determine at this step. An intuitive way to determine the drop probability is to give traffic with small $e_i$ a small drop probability. While this is easy, it may not achieve high overall TCP throughput. The reason is that TCP traffic is responsive. The dynamic behavior of TCP traffic in response to packet loss should be considered.

Ideally, the maximum throughput should be achieved at the stable point of the system. If the system is stable, a long-term (in other words, stable) maximum throughput can be achieved, and the overall delay performance of the TCP traffic will also be less disturbed by the packet dropping performed by the gateway. Both of these are main objectives of our work.

To gain the advantages of stability, we have to derive the stability conditions of the system. The basic idea of deriving these conditions is first to describe the application traffic behavior by differential equations, then to obtain the transfer functions of the system, and finally to use control theory to analyze the whole system. The differential equations that describe TCP behavior have been derived in [6]. The equations are listed as follows:

$$\frac{dW_i(t)}{dt} = \frac{1}{R_i(t)} - \frac{W_i(t)\, W_i(t - R_i(t))}{2 R_i(t)}\, \delta_i(t - R_i(t)), \tag{8}$$

$$\frac{dq_i(t)}{dt} = -C_i^r + \sum_{k=1}^{N_i} \frac{W_i(t)}{R_i(t)}, \tag{9}$$

where, for class $i$ traffic, $W_i(t)$ is the TCP window function, $R_i(t)$ is the TCP round trip time function, $\delta_i(t)$ is the RED drop probability function, $q_i(t)$ is the RED queue length function, the sum in (9) runs over the friendly TCP flows of the class, and $C_i^r$ is the bandwidth for the friendly TCP traffic of the class (see footnote 5). (8) describes the additive increase and multiplicative decrease of the TCP congestion control mechanism. (9) describes the change of the RED queue.

Footnote 5: In fact, $C_i^r = C_i - p_i e_i$, where $C_i$ is the bandwidth assigned to the $i$-th class of traffic.

Based on the above two equations, we can get the stable conditions and stable points as follows:
• Stable conditions: According to [1], the stable conditions are as follows:

$$\frac{L^i_{red\text{-}tcp}\, (R_i^+ C_i^r)^3}{(2 N_i^-)^2} \le \sqrt{\frac{w_g^2}{K^2} + 1}, \tag{10}$$

where

$$w_g = 0.1 \min\left\{ \frac{2 N_i^-}{(R_i^+)^2 C_i^r},\ \frac{1}{R_i^+} \right\}, \tag{11}$$

$L^i_{red\text{-}tcp}$ is the RED curve slope, one of the RED parameters, $R_i^+$ is the upper bound of the round trip time, $N_i^-$ is the lower bound of the number of flows, $K = \frac{\log(1 - \alpha)}{\Delta}$, $\alpha$ is the averaging factor used in calculating the average queue length, and $\Delta$ is the sample time.
• Stable points: The stable points can be obtained from the differential equations (8) and (9) [1], as long as the bad traffic can be regarded as stable:

$$W_i^2 \delta_i = 2, \tag{12}$$

$$W_i = \frac{R_i C_i^r}{N_i}, \tag{13}$$

where $W_i$ is the window size and $R_i$ is the round trip time when the system is stable. $N_i$ is the number of TCP-friendly flows in the class (see footnote 6).

Observing the above equations, we find that the drop probabilities $\delta_i$ have a very clear relationship with the assigned bandwidth $C_i$. Naturally, the more bandwidth is assigned to a class of traffic, the lower the drop probability of that class.

Having derived the stability conditions of the system, we now consider some system constraints:
• Apparently, the sum of the bandwidth assigned to all the classes of traffic should be no greater than the total bandwidth available. Hence, we have

$$\sum_{i=0}^{n} C_i \le \beta C, \tag{14}$$

where $C_i$ is the bandwidth assigned to the $i$-th class of traffic, $C$ is the link bandwidth, and $\beta$ is a parameter (see footnote 7).
• Since $\delta_i$ is the traffic loss rate or drop probability for the $i$-th class of traffic, we have

$$0 \le \delta_i \le 1, \tag{15}$$

for $i = 0, 1, \ldots, n$.

Now our problem becomes finding the optimal traffic drop vector $\bar{\delta}$ that maximizes the total TCP throughput expressed in (4) under the constraints of (10)-(15).
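As an illustration of how the stable-point relations tie the drop probability to the assigned bandwidth, the following Python sketch evaluates (12) and (13) for one class, the objective (4), and the constraints (14) and (15) for a candidate bandwidth allocation. It is a sketch under stated assumptions: the function names are hypothetical, rates are assumed to be in packets per second and round trip times in seconds, and the stability condition (10) is not checked. It evaluates a given allocation only and does not solve the optimization problem.

```python
def stable_drop_probability(C, p, e, R, N):
    """Drop probability delta_i at the stable point for one class.

    C -- bandwidth assigned to the class (packets/s, assumed unit)
    p -- arrival rate of the class, e -- its bad traffic ratio
    R -- round trip time in seconds, N -- number of friendly TCP flows
    """
    C_r = C - p * e                 # bandwidth left for friendly TCP traffic
    if C_r <= 0:
        raise ValueError("no bandwidth left for friendly TCP traffic")
    W = R * C_r / N                 # stable window size, from (13)
    return 2.0 / (W * W)            # from (12): W^2 * delta = 2

def total_throughput(p, e, delta):
    """Total friendly TCP throughput as in (4)."""
    return sum(pi * (1 - di) * (1 - ei) for pi, ei, di in zip(p, e, delta))

def evaluate_allocation(C, p, e, R, N, link_bandwidth, beta):
    """Check (14) and (15) for an allocation and return (delta, throughput)."""
    if sum(C) > beta * link_bandwidth:                       # constraint (14)
        raise ValueError("allocation exceeds beta * link bandwidth")
    delta = [stable_drop_probability(Ci, pi, ei, Ri, Ni)
             for Ci, pi, ei, Ri, Ni in zip(C, p, e, R, N)]
    if any(d < 0 or d > 1 for d in delta):                   # constraint (15)
        raise ValueError("drop probability outside [0, 1]")
    return delta, total_throughput(p, e, delta)
```

For example, with one class where C = 1000, p = 1200, e = 0.5, R = 0.1, and N = 10, the friendly bandwidth is 400, the stable window is 4, and the drop probability is 2/16 = 0.125.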
The problem of choosing $\bar{\delta}$ is a constrained nonlinear programming (NLP) problem. We can use the Lagrange multiplier method and the non-negative Kuhn-Tucker conditions to solve it. By solving this NLP problem, we obtain the optimal drop probability for each class of traffic and the link bandwidth assignment to each class of traffic. The RED and CBQ scheduler can work based on these parameters to achieve our objectives. Note that the traffic is dynamic, so the vectors $\bar{p}$ and $\bar{e}$ may change; accordingly, the drop probabilities and the bandwidth assignment need to be adjusted.

Footnote 6: $N_i$ can be estimated at the stage of attack detection.

Footnote 7: $\beta$ can be a value between 0 and 1. It is related to the overall percentage of the bad traffic in the whole traffic passing through this gateway.

VIII. GATEWAY COOPERATION

As mentioned above, due to the limitation of the gateway capacity, it is necessary for gateways to cooperate with each other to achieve high defense performance. Cooperation is needed among gateways to achieve the following goals:

• Reducing duplication in processing the on-going traffic among gateways.
• Selecting the proper portion of the on-going traffic to process.
• Sharing the detection results among gateways.

There are two schemes to reduce duplication. One scheme is to explicitly mark the IP header once a packet is selected to be further checked. The successive gateways need not select the marked packet, and duplication can be avoided. While this scheme is effective in terms of duplication reduction (in fact, it can avoid duplication), it is not compatible with the existing IP protocols. Furthermore, the overhead of writing packets is significant. We prefer the second scheme, in which the explicit coordination approach is used. With this scheme, carefully designed rules in the attack detection module (i.e., the sampling rules in Figure 2) coordinate different gateways to select different portions of the on-going traffic.
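A static rule set of this kind can be sketched as follows. This minimal Python sketch assumes, purely for illustration, that the cooperating gateways are numbered 0 to m−1 and that each one selects the packets whose source address, read as an integer, falls in its residue class modulo m. The function name and the numbering are hypothetical and are not taken from the sampling rules in Figure 2.

```python
import ipaddress


def gateway_selects(gateway_index, gateway_count, source_address):
    """Return True if this gateway should further check the packet.

    Hypothetical rule: gateway k of m handles the source addresses whose
    integer value is congruent to k modulo m.
    """
    if not 0 <= gateway_index < gateway_count:
        raise ValueError("gateway_index must be in [0, gateway_count)")
    address_value = int(ipaddress.ip_address(source_address))
    return address_value % gateway_count == gateway_index


# Each sample source address is claimed by exactly one of three gateways.
for source in ("10.0.0.7", "192.168.1.20", "172.16.5.9"):
    chosen = [k for k in range(3) if gateway_selects(k, 3, source)]
    print(source, "-> gateway", chosen[0])
```

Because the residue classes are disjoint, no packet is selected by two gateways, and no IP header needs to be rewritten.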
The sampling rules also direct the classifier to classify the undetected portion of traffic into one specific class. The access control module leaves a certain amount of bandwidth for this portion of traffic. The following example shows how the sampling rules can reduce duplication among gateways. The rules guarantee that Gateways I and J select different portions of the traffic based on the source address information.

• At Gateway I: If the last digit of an incoming packet's source address is X, the packet will be selected.
• At Gateway J: If the last digit of an incoming packet's source address is Y, the packet will be selected.

While cooperation among gateways can reduce duplication, it can also help individual gateways to make a smart selection of the portion of the on-going traffic to process. For example, a gateway can ask its neighboring gateways to select the traffic that it does not have enough capacity to handle. Cooperation in this example belongs to the dynamic explicit coordination approach. With this approach, the defense load can be distributed dynamically among gateways depending on the dynamic network situation.

Cooperation at this point can also be static. The following distance-based traffic selection falls into this category. Reducing the bandwidth consumption of attack traffic is our basic approach to DDoS flooding attacks. Different attack traffic may have different targets reached over different paths and, accordingly, may cause different amounts of bandwidth consumption damage. Generally speaking, attack traffic with a longer path consumes more bandwidth and causes more damage than traffic with a shorter path. Hence, it is beneficial for the gateway to process more of the packets in the on-going traffic that have longer remaining paths from the gateway to their destinations, as opposed to the ones with shorter remaining paths.

Gateways can also exchange the detected traffic information to complement the locally obtained database.
This exchange is particularly useful among gateways that are on the same path of the attack traffic. Recall that some gateways in our system may not have the attack detection module. Sharing detection information is particularly useful for this type of gateway.

IX. EXTENSIONS

In this study, we do not discuss the issue of gateway deployment. Interested readers can refer to the work reported in [4].

Recall that in Section III, we used a 4-tuple to model DDoS flooding attacks. In this study, we focus on containing the attack in case $\langle A2, B1, C1, D1 \rangle$, where attacks use an extraordinarily large amount of attack traffic with TCP headers and spoofed sources to congest the network, and the real attack sources are outside the attacked network. We believe that this is one of the most typical and challenging cases. In this section, because of space limitations, we briefly discuss how to extend our work to DDoS flooding attacks that use UDP or mixed traffic, i.e., UDP and TCP traffic, to congest the network.

In cases where attackers use pure UDP traffic, that is, traffic marked as UDP, to congest the network, at least 80 percent of the TCP-friendly traffic can be easily separated from the attack traffic. Also, since UDP flows are not responsive flows, the friendly UDP flows are tolerant of some degree of packet loss. Hence, packet dropping is relatively easy to perform on the UDP flows (both attack and friendly flows) to control their bandwidth consumption. In addition, the defense system can monitor the bandwidth usage of individual UDP flows to identify the attack traffic.

In cases where attackers use mixed traffic, i.e., UDP and TCP traffic, to congest the network, we have to handle both the TCP and UDP attacks.
Due to resource limitations, the defense system may have to spend most of its capacity on protecting TCP flows by: (1) discriminating between TCP and UDP traffic by strictly limiting the bandwidth usage of UDP traffic, for example by allowing UDP traffic to use at most 10 percent of the bandwidth in all cases, so that the bandwidth for TCP traffic is guaranteed; and (2) adopting the approaches proposed in this study to protect friendly TCP traffic.

X. CONCLUSION

We have proposed a defense system for DDoS flooding attacks. The countermeasure behavior of the individual gateways, together with their cooperation, constitutes the working scenario of the whole defense system. Our system is compatible in the sense that it adopts the gateway-based approach, and no changes are needed to the existing systems and network protocols. The system is efficient and feasible in high-speed networks in the sense that it adopts the group-based approach and is free from the overhead of handling individual flows.

In this study we propose several efficient defense approaches for each stage of DDoS defense. At the stage of attack detection, we design TCP-ACK based attack detection and use statistical sampling to efficiently obtain knowledge of the traffic under DDoS attack. At the stage of access control, we classify the traffic into different classes according to their geometric similarity and the degree to which they are damaged by the DDoS attack. We design a multi-class RED control block with a Class-based Queuing (CBQ) scheduler to control the consumption of bandwidth, aiming to achieve the maximum possible throughput.

Currently, we are implementing the prototype in the Linux environment. We are also investigating how to integrate existing detection technologies, such as spoofed-source filtering schemes, into our defense system.

REFERENCES

[1] C. V. Hollot, V. Misra, D. Towsley and W.-B. Gong, A Control Theoretic Analysis of RED, in Proceedings of IEEE Infocom, 2001.
[2] S. Kent, C. Lynn, J.
Mikkelson, and K. Seo, Secure Border Gateway Protocol (S-BGP): Real World Performance and Deployment Issues, in Proceedings of the Network and Distributed System Security Symposium (NDSS 2000), Feb. 2000.
[3] G. Labovitz, G. R. Malan and F. Jahanian, Origins of Internet Routing Instability, in Proceedings of IEEE Infocom'99.
[4] B. Li, M. J. Golin, G. F. Italiano and X. Deng, On the Optimal Placement of Web Proxies in the Internet, in Proceedings of IEEE Infocom'99.
[5] R. Mahajan, S. Bellovin, S. Floyd, J. Ioannidis, V. Paxson and S. Shenker, Controlling High Bandwidth Aggregates in the Network, submitted to ACM SIGCOMM 2001.
[6] V. Misra, W. Gong and D. Towsley, Fluid-based Analysis of a Network of AQM Routers Supporting TCP Flows with an Application to RED, in Proceedings of ACM SIGCOMM, 2000.
[7] S. Savage, N. Cardwell, D. Wetherall and T. Anderson, TCP Congestion Control with a Misbehaving Receiver, ACM Computer Communications Review, vol. 29, no. 5, October 1999.
[8] D. Schnackenberg, K. Djahandari and D. Sterne, Infrastructure for Intrusion Detection and Response, in Proceedings of the DARPA Information Survivability Conference and Exposition (DISCEX), 2000.
[9] Defense Information Systems Agency, Network Warfare Simulation, URL: http://www.disa.mil/D8/netwars

APPENDIX A: THE ALGORITHM OF CLASSIFICATION

Input: tree $T$ with root $ROOT$, traffic population $P_i$ and bad traffic ratio $e_i$ going through node $i$, and class number $CN$.
Output: classified tree $CT$ and variance $VAR$.
1. sort tree $T$ such that, for each father node, the $e_i$'s of all its children are in increasing order;
2. initialize each node with class number 0;
3. call $Classification(ROOT, CN, CT, VAR)$;
4. return $CT$ and $VAR$.

Fig. 5. The Algorithm of Classification

$Classification(i, CN_i, CT, VAR)$
1. if $i$ is not a leaf
    1.1. assign each child $j$ a class number $CN_j$ such that $\sum_{j \in C_i} CN_j = CN_i$ and only brother nodes can be grouped together ($C_i$ is the set of children of node $i$);
    1.2. for each class number assignment
        1.2.1. for each child $j$
            1.2.1.1. $RET = Classification(j, CN_j, CT, VAR)$;
            1.2.1.2. if $RET = FALSE$ goto 1.2; else continue;
        1.2.2. compute the current variance $cur\_VAR$, set $VAR = \min\{cur\_VAR, VAR\}$ and update its corresponding $CT$;
2. if $i$ is a leaf
    2.1. if $CN_i > 1$ return $FALSE$; else return $TRUE$;

Fig. 6. Procedure Classification
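
The search in Fig. 6 can be illustrated for the simplest case of a single father node whose children are all leaves. The following minimal Python sketch is a simplification under stated assumptions, not the procedure of Fig. 6 itself: it sorts the children by $e_i$, tries every split of the sorted list into $CN$ contiguous groups, and keeps the split with the smallest variance. The variance measure is not spelled out in this appendix, so a population-weighted within-class variance of $e_i$, summed over the classes, is assumed here. All function names are hypothetical.

```python
from itertools import combinations


def weighted_variance(group):
    """Population-weighted variance of the bad traffic ratios in one group.

    group is a list of (population, bad_ratio) pairs.
    """
    total = sum(p for p, _ in group)
    mean = sum(p * e for p, e in group) / total
    return sum(p * (e - mean) ** 2 for p, e in group) / total


def classify_siblings(children, class_number):
    """Split sibling nodes into class_number contiguous groups.

    children is a list of (population, bad_ratio) pairs for the leaves
    under one father node. Returns (groups, variance) for the best split.
    """
    ordered = sorted(children, key=lambda child: child[1])  # sort by e_i
    n = len(ordered)
    if not 1 <= class_number <= n:
        raise ValueError("class_number must be between 1 and len(children)")
    best_var, best_groups = float("inf"), None
    # Enumerate every choice of cut points between neighbouring children.
    for cuts in combinations(range(1, n), class_number - 1):
        bounds = (0, *cuts, n)
        groups = [ordered[a:b] for a, b in zip(bounds, bounds[1:])]
        cur_var = sum(weighted_variance(g) for g in groups)
        if cur_var < best_var:
            best_var, best_groups = cur_var, groups
    return best_groups, best_var


# Hypothetical subnetworks: two lightly and two heavily attacked.
children = [(100, 0.05), (100, 0.9), (100, 0.1), (100, 0.85)]
groups, variance = classify_siblings(children, class_number=2)
print(groups, variance)
```

The exhaustive enumeration mirrors the "for each class number assignment" loop of Fig. 6 and is only practical for a small number of children per father node.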