Intelligence Based Intrusion Detection System (IBIDS)
Senior Project, Session 2004-2005
February 18th, 2005

Submitted by:
Mobeen Faiq (2005-02-0090)
Sayyed Sharjeel Musa Hussain (2005-02-0260)
Zahra Nadeem (2005-02-0208)
Abdul Moeed (2003-02-0005)

Department of Computer Science
Lahore University of Management Sciences

A report submitted to the Department of Computer Science in partial fulfillment of the requirements for the degree of BSc (Honours) in Computer Science.

Acknowledgements

First of all, we would like to pay our respects and thanks to Allah Almighty for making us capable of completing a successful project and for showering His blessings upon us throughout the project, especially at times of need. The cooperation, help and support provided by Dr. Asim Karim were a great asset during our moments of confusion, and we truly acknowledge that. He has been a constant source of guidance throughout the course of this project, and a great advisor. We would like to thank Dr. Tariq Jadoon for his help and guidance in understanding the different aspects of networks, which we truly needed in the beginning; without his help, we might not have gotten halfway through our project. We would also like to thank M.M. Awais for helping us understand the various aspects of artificial intelligence, and for helping us out wherever we needed him. The project's completion would not have been possible without the constant follow-up of Dr. Asim Loan and Mr. Bilal Afzal, who gave us a boost at times when we were lagging behind. We would also like to thank our families, whose love and care enabled us to ease our minds and souls.
Statement of Submission

______________________
Sayyed Sharjeel Musa Hussain (2005-02-0260)

______________________
Mobeen Faiq (2005-02-0090)

______________________
Zahra Nadeem (2005-02-0208)

______________________
Abdul Moeed (2003-02-0005)

Date: February 18th, 2005

Executive Summary

In the current age of technology, the means of communication and of establishing networks across the globe have changed from a human medium to a digital medium. It is now mostly through computers, and especially the Internet, that global communication takes place. This communication does not consist only of ordinary conversations and publicly available data; it also includes transactions and transfers of private and confidential data kept at sites on the network. Everyone who uses the network has some private data, whether on the Internet (even if only a simple Hotmail account password) or on his own computer. This information needs to be secured and kept safe from all kinds of intruders so that the safety and privacy of the individual are maintained.

Intrusion Detection Systems are hardware- or software-based devices that detect different kinds of intrusions. It is the job of these systems to detect and warn of any kind of attack taking place, so that the user is aware of the situation and can take control of it, even if that means restarting the computer. The problem with most of these systems is that they use pre-defined rules that are easy to defeat, because hackers usually find workarounds for those rules. Our project is a software-based Intrusion Detection System for the Linux operating system that identifies malicious activities using the pattern matching found in most commercial Intrusion Detection Systems. The difference between our system and the rest is that ours is a hybrid of anomaly and misuse detection, with an additional domain-knowledge rule base that has been extracted through data mining on the network.
All this serves the purpose of not just detecting existing attacks, but also of detecting new and unknown ones.

Table of Contents

Statement of Submission
Acknowledgements
Executive Summary
Table of Contents

Chapter 1: Introduction
1.1 Scope of Project
1.2 Currently Available Solutions
1.2.1 Stateless Firewalls
1.2.2 Stateful Firewalls
1.2.3 Proxy Servers
1.2.4 Signature Matching IDS
1.2.5 Anomaly Detection IDS
1.3 Problems with Current Solutions
1.3.1 Problems with Stateless Firewalls
1.3.2 Problems with Stateful Firewalls
1.3.3 Problems with Signature Matching IDS
1.3.4 Problems with Anomaly Based IDS
1.4 Development Steps

Chapter 2: Research Work
2.1 Honeyd
2.2 Anomaly Detection
2.2.1 Clustering: A Data Mining Technique
2.2.2 Types of Clustering Algorithms
2.2.3 Partitioning Algorithms
2.2.4 Hierarchical Clustering Algorithms
2.2.5 Density Based Clustering Algorithms
2.3 Misuse Detection
2.3.1 Boyer-Moore Algorithm
2.3.2 ExB and E2xB Algorithms
2.3.3 Wu and Manber's Algorithm
2.3.4 Kim's Algorithm
2.4 Derived Attributes
2.5 Packet Header
2.6 Intrusion Types and their Characteristics
2.6.1 DoS Attacks
2.6.2 R2L Attacks
2.6.3 U2R Attacks
2.6.4 Probes and Scans

Chapter 3: Design
3.1 Design Overview
3.2 Details of Modules
3.2.1 Blocked Ports/IPs
3.2.2 Pre Anomaly
3.2.3 Anomaly Detection
3.2.4 Signature Matching
3.2.5 Attack Clustering
3.3 Performance Tweaking

Chapter 4: Prototype Results
4.1 Coverage
4.1.1 The Pre-Filtering Stage
4.1.2 Layer One: Anomaly Detection Stage
4.1.3 Layer Two: Payload Signature Matching and Payload Attributes Check Stage
4.2 False Positives
4.3 False Negatives
4.4 Detection Probability
4.5 Handling Higher Bandwidth Traffic
4.6 Ability to Detect New Attacks
4.7 Limitations of the Model

Chapter 5: Future Enhancements
Chapter 6: Conclusion
References

Chapter One
Introduction

1.1 Scope of Project

Typically, network administrators use firewalls, proxy servers, and Network Intrusion Detection and Response mechanisms to protect their networks against any possible threat from an external unauthorized source. However, the problem with currently available NIDS lies in their detection mechanism. Current NIDS use pattern recognition and signature matching to detect the flow of malicious packets over the network. These techniques fail, however, in the presence of new and unknown attacks, because signatures and patterns for new attacks are absent from the database. This creates a loophole in the security of the network, making it vulnerable to potential threats.

The scope of this project was to come up with a new design for a Network Intrusion Detection System whose ability to detect unauthorized attacks does not depend on having a pattern and signature database to match network flow against. The limited time for our senior project forced us to restrict our focus to the TCP protocol; in other words, the solution proposed in this report currently works only on TCP packets. However, this in no way means that the solution cannot be modified and applied to other protocols. Essentially, this report puts forward a generic design for a completely new real-time intrusion detection system, a design that can easily be replicated for protocols other than TCP. By stating that our design works on the TCP protocol, we mean that the prototype developed for this design was tested only on the TCP protocol. Testing of this design for ICMP, UDP, and other protocols was not carried out, so its viability for protocols other than TCP is not known.

1.2 Currently Available Solutions

In this section we provide a brief overview of the currently available solutions for monitoring intrusions within a network.
1.2.1 Stateless Firewalls

Firewalls, in their basic form, are intended to prevent people with harmful intentions from penetrating and gaining access to a network. There are several different technologies in use today that a firewall can employ to accomplish this. For the purposes of this discussion, we concentrate on stateless packet filters, or "stateless firewalls". Stateless packet filters work by examining individual packets as they are transmitted between the data link layer and the network layer of the receiving computer. Based on how the filter is configured, the incoming packet's protocol header is examined and compared with criteria established by the network administrator. Some of the more useful data fields in the protocol header are:

- Protocol type
- IP address
- TCP/UDP port
- Fragment number
- Source routing information

Filtering based on the IP address is more effective than protocol filtering, depending on how the filter is configured. It is a two-way filter as well, meaning that incoming as well as outgoing packets can be dropped. If the filter is configured so that all IP addresses are permitted through with the exception of a few, this represents a significant risk of the network being attacked, because a hacker then has a good chance of finding an IP address that is permitted by the filter. A better approach is to deny all IP addresses by default and grant access to a limited number of them; a potential hacker will have a much harder time finding an IP address that is granted access by the filter.

Filtering on the basis of TCP/UDP port numbers is another way to filter packets. Port numbers, which represent access points to a network, are a common entry point for a hacker trying to gain access to a system. The filter examines the port number associated with the incoming packet and compares it with a list established by the network administrator.
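To make the deny-by-default approach above concrete, the following minimal sketch shows a stateless filter check. The field names, the allow-list, and the blocked ports are purely illustrative assumptions, not drawn from any particular firewall product:

```python
# Sketch of a deny-by-default stateless packet filter.
# Field names (src_ip, dst_port) are illustrative, not from a real firewall API.

ALLOWED_IPS = {"192.168.1.10", "192.168.1.11"}   # explicit allow-list
BLOCKED_PORTS = {23, 110, 139}                   # Telnet, POP3, NetBIOS session

def permit(packet: dict) -> bool:
    """Return True only if the packet's header passes every criterion."""
    if packet["src_ip"] not in ALLOWED_IPS:      # deny by default on IP
        return False
    if packet["dst_port"] in BLOCKED_PORTS:      # drop risky service ports
        return False
    return True

print(permit({"src_ip": "192.168.1.10", "dst_port": 80}))
print(permit({"src_ip": "10.0.0.5", "dst_port": 80}))
```

Note that the check consults only header fields: as discussed in Section 1.3.1, such a filter says nothing about the payload of the packet.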
Some of the more important ports to block include Telnet ports, NetBIOS session ports and POP ports. Most hackers try to exploit the ports associated with these protocols because they give them enormous capability once they have broken in.

1.2.2 Stateful Firewalls

A stateful firewall remembers the context of connections and continuously updates this state information in dynamic connection tables. As an example of the benefits of a stateful firewall, a hacker trying to gain access has less chance of forging entry as part of a valid series of connections, because the context will show that the additional connection does not make sense for a legitimate user.

What is state, and how does a firewall determine the state of a communication between a source and a destination host? State can be loosely defined as the "condition or status of a connection between two communicating hosts". States might be defined as beginning, middle, and end; or beginning and end; or sent and received; or none of the above (as seen with "stateless" protocols). The first rule about communication states is that they vary with the protocols used. Regardless of the protocol and how it manages its state of communication, a firewall needs to keep track of the communication status between a source and a destination host. This information is stored in what is called a "state table". Various types of information are stored in a state table, and the information varies with the protocol used by the communicating hosts. Examples of information kept in a state table include:

- Source and destination IP addresses
- Source and destination ports
- Protocol, flags, sequence and acknowledgement numbers
- ICMP Code and Type numbers
- Secondary connection information communicated in application layer headers
- Application layer specific command sequences (GET, PUT, OPTIONS, etc.)
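The state-table mechanism described above can be sketched as follows. This is a simplified model, not any real firewall's data structure: it keys connections on the 5-tuple only and ignores flags, sequence numbers, and timeouts.

```python
# Illustrative state table keyed on the connection 5-tuple.
state_table = {}

def record_outbound(src_ip, src_port, dst_ip, dst_port, proto="TCP"):
    """An internal client opened a connection; remember it."""
    state_table[(src_ip, src_port, dst_ip, dst_port, proto)] = {"status": "sent"}

def permit_inbound(src_ip, src_port, dst_ip, dst_port, proto="TCP"):
    """Allow an inbound packet only if it answers a recorded outbound
    connection (the key is reversed: the remote host is now the source)."""
    return (dst_ip, dst_port, src_ip, src_port, proto) in state_table

record_outbound("192.168.1.5", 34567, "203.0.113.7", 80)
print(permit_inbound("203.0.113.7", 80, "192.168.1.5", 34567))  # solicited reply
print(permit_inbound("203.0.113.9", 80, "192.168.1.5", 34567))  # unsolicited, dropped
```

A real stateful firewall would additionally check flags, sequence and acknowledgement numbers, and expire entries on a timeout, as the surrounding text describes.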
For example, one of the main jobs a firewall performs is to block all unsolicited inbound connections while allowing responses from servers to which internal network clients have made outbound connections. The firewall can do this by keeping track of the outbound connections in its state table. When an internal network client makes an outbound connection, the firewall might enter the source and destination IP address and port number in the state table (it might also enter flag, sequence number, and acknowledgement number information). When the firewall receives the server's response, it checks the state table to see if anyone made an outbound request to that server. If so, and if the flags, sequence, and acknowledgement numbers are appropriate (for TCP communications), the firewall passes the response to the internal network client that made the outbound request.

1.2.3 Proxy Servers

A proxy server is a web server that resides between a client computer and the Internet; it usually runs on the same server as the firewall. The proxy server acts as a block between the client computers and the live Internet, and is able to monitor all requests, inbound and outbound, that pass through the server. Proxy servers are a key component of most corporate networks. They are used to:

- Filter requests and control access
- Provide access to clients that are behind a firewall
- Improve the performance of the network
- Share Internet connections among several computers

In order for the proxy server to filter and control access to the Internet, it must be set up on the network as the gateway to the Internet. For this to be possible, all computers that require filtering and access control must be inside the firewall. These computers then request an Internet service (FTP, HTTP, etc.).
The proxy server receives the request and uses access control lists to determine whether the request is acceptable. If it is not, the user receives an error saying that the page is forbidden or inaccessible. If the request passes the filtering requirements, the proxy server checks whether the page is cached locally; if it is, it is passed to the user. If not, the proxy server uses one of its own IP addresses and the firewall software to connect to the Internet on behalf of the requesting user, and the request is then made on the Internet. When the response is returned, the proxy server reads it and makes sure it is acceptable before forwarding it to the user. This process is virtually invisible to users and is extremely fast, making users believe they are getting the response directly from the remote server.

1.2.4 Signature Matching IDS

The idea behind a signature matching IDS lies in the ability of a system to detect precompiled attack patterns within a packet. Every attack produces some sort of signature or pattern. These signatures or patterns are essentially packet attributes which, when arranged in a certain manner, can be used to intrude upon networks. These attributes include:

- Flag bit settings
- Time to live
- Payload content
- Acknowledgement values

Based on the values of these attributes found within attack data, signatures and patterns are compiled. These signatures and patterns are then used as benchmarks against which incoming network packets are compared. A packet that contains any one of these signatures is classified as malicious and is not allowed to enter the network. For the purpose of this project, these rules have been taken from the Snort website. Snort is an open source intrusion detection system which runs on Linux-based server machines. Its precompiled signatures and patterns have been made available for distribution on the website (URL).

1.2.5 Anomaly Detection IDS

Anomaly detection systems are another form of intrusion detection system.
While a signature matching based intrusion detection system compares arriving packets with available signatures, an anomaly detection IDS defines a normal state for the network. By a normal state we mean that an anomaly based system observes the normal flow of data over the intended network for a pre-specified period of time. Using this data, the system defines an average (normal) packet for the network; this average packet is different for different protocols. After the normal packet has been defined, the system compares all incoming packets with it. Any packet found to deviate considerably from this normal state is classified as malicious for the intended network.

1.3 Problems with Current Solutions

1.3.1 Problems with Stateless Firewalls

Stateless firewalls suffer from several significant drawbacks that make them insufficient to safeguard networks by themselves. The major drawbacks are:

- They cannot check the data (payload) that packets contain.
- They do not retain the state of connections.
- TCP can only be filtered in the 0th fragments.
- Public services must be forwarded through the filter.
- Trojan horses can defeat packet filters using NAT.
- Low pass blocking filters do not catch high port connections.

Each of these is explained below. The first drawback pertains to what the packet filters check before either dropping a packet or permitting it access to the network. Packet filters apply criteria to a packet's protocol header, which says nothing about the data portion of the packet. For example, an HTTP packet flowing into a network could contain Trojan horses embedded in ActiveX controls; the packet filter cannot detect this because it is not part of the packet's protocol header. The second limitation has to do with the fact that stateless firewalls do not retain a memory of connections between host computers. As such, a hacker can send a packet and claim that it belongs to a connection.
A stateless firewall has only minimal ability (by checking the packet's SYN flag, which a hacker can set) to determine that it does not. As hacker sophistication and the use of the Internet have grown, hackers are now able to gain access to networks through e-mail and web servers that are open to the public. If these servers are part of a larger network, a hacker can gain access to the larger network through them.

1.3.2 Problems with Stateful Firewalls

Despite the fact that many stateful firewalls can by definition examine application layer traffic, holes in their implementation prevent stateful firewalls from being a replacement for proxy firewalls in environments that need the utmost in application-level control. The main problems with the stateful examination of application-level traffic involve the abbreviated examination of that traffic and the lack of thoroughness of this examination, including the firewall's inability to track the content of the application flow. To provide better performance, many stateful firewalls abbreviate examinations by performing an application-level examination only of the packet that initiates a communication session, which means that all subsequent packets are tracked through the state table using Layer 4 information and lower. This is an efficient way to track communications, but it lacks the ability to consider the full application dialog of a session. In turn, any deviant application-level behavior after the initial packet might be missed, and there are no checks to verify that proper application commands are being used throughout the communication session. However, because the state table entry will record at least the source and destination IP addresses and port information, whatever exploit was applied would have to involve those two communicating parties and transpire over the same port numbers.
Also, the connection that established the state table entry would have to avoid being properly terminated, or the entry would be instantly cleared. Finally, whatever activity transpired would have to take place in the time left on the timeout of the state table entry in question. Making such an exploit work would take a determined attacker or involve an accomplice on the inside.

Another issue with the way stateful inspection firewalls handle application-level traffic is that they typically watch traffic more for triggers than for a full understanding of the communication dialog; therefore, they lack full application support. As an example, a stateful device might be monitoring an FTP session for the port command, but it might let other non-FTP traffic pass through the FTP port as normal. Such is the nature of a stateful firewall: it is most often reactive rather than proactive. A stateful firewall simply filters on one particular command type on which it must act, rather than considering each command that might pass in a communication flow. Such behavior, although efficient, can leave openings for unwanted communication types, such as those used by covert channels or by outbound devious application traffic. In the previous example, we considered that the stateful firewall watches diligently for the FTP port command while letting non-FTP traffic traverse without issue. For this reason, in most standard stateful firewall implementations it would be possible to pass traffic of one protocol through a port that was being monitored at the application level for a different protocol. For example, if you are only allowing HTTP traffic out on TCP port 80 through your stateful firewall, an inside user could run a communication channel of some sort (using a protocol other than HTTP) to an outside server listening for such communications on port 80. Another potential issue with a stateful firewall is its inability to monitor the content of allowed traffic.
For example, because you allow HTTP and HTTPS out through your firewall, it would be possible for an inside user to contact an outside website service such as www.gotomypc.com. This website offers users the ability to access their PC from anywhere via the web. The firewall will not prevent this access, because the user's desktop initiates a connection to the outside Gotomypc.com server via TCP port 443 using HTTPS, which is allowed by the firewall policy. The user can then contact the Gotomypc.com server from the outside, and it will "proxy" the user's access back to his desktop via the same TCP port 443 data flow. The whole communication transpires over HTTPS. The firewall cannot prevent this obvious security breach because the application inspection portion of most stateful firewalls is not really meant to consider content: it looks for certain trigger application behaviors, but most often (with some exceptions) not for the lack thereof.

1.3.3 Problems with Signature Matching IDS

Essentially, all signature matching IDS lack the ability to detect new attacks. Since signature matching is based on a database of signatures, new attacks, or variations of old attacks, are not logged in the database and henceforth go undetected by the IDS. This creates a major loophole in network security. Furthermore, the number of known attacks and their different variations runs into the thousands. The database maintains signatures for all of these attacks, making it enormous, and every packet that arrives on the network is matched against this entire database. This makes signature matching based IDS very slow. As time progresses, this database will continue to grow, and so will the bandwidth of the Internet: packets will arrive at the network at a much faster rate, and the signature database will be bigger as well.
Under such circumstances, it will become virtually impossible for a signature matching based IDS to perform detection in real time, creating a further security loophole for hackers.

1.3.4 Problems with Anomaly Based IDS

Anomaly detection systems suffer from an inherent drawback: they generate too many false positives. A false positive is a situation where the IDS declares a packet to be an attack when in reality that packet is non-malicious. Since every declaration of an attack by the IDS needs to be sent to the response system, excessive false positives will overwhelm the response system.

1.4 Development Steps

Initially we started off in one direction; however, the final result we accomplished was drastically different from our initial thinking. We initially based our work on the assumption that there must be some common features among all attacks, and on that basis we started collecting attack data. As time progressed, however, we realized that no such commonality existed. From that point onwards we started experimenting with the currently available technologies, understanding them and attempting to combine them in a manner that would bring about a new hybrid system covering the shortcomings of all the currently available solutions for network security. The following sections give complete details of the work undertaken in the initial phase of the project as well as the work done later to develop the hybrid system.

Chapter Two

Research Work

2.1 Honeyd

What Honeyd essentially does is simulate a virtual network consisting of routers, print servers, web servers, and the like. This virtual network is then put on the web with the intention of allowing hackers to attack it. Whenever a hacker attacks this network, an invisible logger logs all the activities that the hacker undertakes, without his or her knowledge.
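For context, Honeyd defines each simulated host in a small configuration file. The fragment below is a hedged sketch of what such a definition can look like; the personality string, script path, and IP address are illustrative assumptions, not taken from this project's actual setup.

```
create windows
set windows personality "Microsoft Windows XP Professional SP1"
set windows default tcp action reset
add windows tcp port 80 "sh scripts/web.sh"
bind 192.168.1.201 windows
```

With such a configuration, Honeyd answers probes to 192.168.1.201 as if a Windows host were serving port 80, while every connection attempt can be logged for later analysis.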
The basic purpose behind deploying Honeyd was to collect attack data for our initial assumption, i.e. to derive a common feature among all attacks. Honeyd was supposed to provide us with the data and statistics that we required to carry out our work. During the course of our research into the use and deployment of Honeyd, the following technical papers were read:

Article: Usage of Honeypots for Detection and Analysis of Unknown Security Attacks
Author: Prof. Dr.-Ing. A. Wolisz (University of Berlin)
Abstract: This article, originally written as a PhD thesis, discusses in detail the design of a virtual Honeyd network. It talks about the various issues in setting up a Honeyd network and how Honeyd operates, and surveys the different forms of signatures and hack attacks currently available. However, further research was still required in this area, because the article only gave an overview of different forms of intrusions and left out the details most relevant to our project.

Article: Honeypots: Weighing up the Costs and Benefits
Author: Andrew Evans
Abstract: This article was more of a side reading to determine the feasibility of using honeypots for our senior project. No design or implementation issues were discussed. The article contained a critical analysis of the use of honeypots as against the currently available solutions for intrusion detection, primarily intrusion detection systems.

Article: To Build a Honeypot
Author: Lance Spitzner
Abstract: This again was one of those readings which aided in gaining a better idea of the working of a honeypot. What was essentially different in this article from the first one was that it concentrated less on the design of the honeypot and more on how to use it and on how different services can be simulated on a Honeyd network.

2.2 Anomaly Detection

The anomaly detection technique works on headers, not payload.
This makes the technique very fast and a good candidate for real-time detection, as no lengthy payload string comparisons need to be done. One disadvantage is that it produces a lot of false positives, because it only makes an intelligent guess. But this disadvantage can also become an advantage: because the system is making an intelligent guess, any new attack that has not been seen before may be detected. With progress in developing software and methods to counter network attacks, hackers and attackers are getting cleverer and deploying ever newer techniques of attack. So you are not safe from attacks just by guarding yourself against known ones; you have to anticipate the methods and intentions of attackers and come up with a means of counteracting unknown attacks to some degree. Anomaly detection techniques can prevent new attacks that have not been seen before: because incoming packets are compared with normal packets, any abnormal packet, seen or unseen, would be detected. But these are only the attacks that can be identified from headers. Any new virus or worm would not be detected by such a system, because viruses and worms reside in the payloads of the packets. Since not all normal packets can be incorporated in the clusters, false positives are produced. Here there is a tradeoff: the more leverage you give the system, the more false positives are produced, but at the same time more novel attacks are detected and fewer false negatives are produced, and vice versa. So one has to decide how much leverage should be given to the system.

Anomaly detection seemed a good choice for fast real-time detection of attacks using packet headers. We reached this decision after reading various papers on intrusion detection in real time. It was seen that if fast attack detection is needed, then the payload should not be touched, as payload comparisons are very costly; detection using header information was seen to be very fast.
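The header-based deviation idea, and the leverage tradeoff above, can be sketched as follows. The feature names, the "normal" profile values, and the threshold here are invented for illustration; they are not the profile actually learned by the project's system.

```python
import math

# Hypothetical header profile: a "normal" packet averaged from observed traffic.
NORMAL = {"ttl": 64, "window": 5840, "header_len": 20}

# The leverage knob: a larger threshold means fewer false positives
# but more missed (novel) attacks, and vice versa.
THRESHOLD = 50.0

def deviation(packet: dict) -> float:
    """Euclidean distance between a packet's header features and the profile."""
    return math.sqrt(sum((packet[k] - NORMAL[k]) ** 2 for k in NORMAL))

def is_anomalous(packet: dict) -> bool:
    return deviation(packet) > THRESHOLD

print(is_anomalous({"ttl": 64, "window": 5840, "header_len": 20}))  # matches profile
print(is_anomalous({"ttl": 255, "window": 0, "header_len": 60}))    # far from profile
```

Because only a handful of numeric header fields are compared, this check is cheap enough to run per packet, which is exactly why the payload is left untouched.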
One such paper that really caught our attention was NATE (Proceedings of the 2001 Workshop on New Security Paradigms, ACM Press), which measures TCP header information and defines clusters for it. It then detects deviations of new TCP header data from the existing clusters to decide whether the header represents a normal or an attack packet. It gave us an idea of how to use headers in clustering. Another paper, Anomaly Detection using TCP Header Information by Weijie Cai and Li, helped us further, as it gave detailed information about the usage of TCP headers. It is based on the idea of NATE, and lists all the TCP flags needed and how they should be checked.

2.2.1 Clustering: A Data Mining Technique

It is common practice to employ data mining techniques for anomaly detection, and a common such technique is clustering. Clusters are groups of objects that are similar to each other; similar objects end up in one cluster. Clusters have low inter-cluster similarity and high intra-cluster similarity, meaning that the similarity between any two objects of two DIFFERENT clusters is less than the similarity between any two objects of the SAME cluster.

Clustering has various applications. One field is marketing, where it helps marketers discover groups of customers and use them to develop targeted marketing programs. Another area is city planning, where planners can identify groups of houses according to house type, value and geographical location. The World Wide Web also benefits, as documents can be classified and web log data can be clustered to discover groups of similar access patterns. The objects to be categorized into clusters can be numerical or non-numerical, and depending on the type of the objects there are various similarity measures and algorithms available for clustering. Similarity is expressed in terms of a distance function, which is typically a metric.
Some popular distance measures for numerical data are the Minkowski distance, the Manhattan distance, and the Euclidean distance. We used the Euclidean distance in our project.

2.2.2 Types of Clustering Algorithms

There are around five basic clustering approaches. Partitioning algorithms construct various partitions and then evaluate them by some criterion. Hierarchical algorithms create a hierarchical decomposition of the set of data using some criterion. Density based clustering algorithms produce clusters based on connectivity and density functions. Grid based clustering is based on a multiple-level granularity structure. And in model based clustering, a model is created for each of the clusters, and the idea is to find the best fit of that model to the data. These are the basic approaches; the various available algorithms follow either one of these approaches or a mixture of them.

Different algorithms were reviewed in order to decide which should be used in our project to cluster the header information. The information needed to differentiate attack packets from normal packets is basically numeric, so we looked at algorithms that work with numeric data. We needed an algorithm that was simple yet efficient. The following are some of the algorithms that were narrowed down for implementation, from which one was selected.

2.2.3 Partitioning Algorithms

The K-Means algorithm constructs a partition of a database D of n objects into k clusters. Each cluster is represented by the mean of its members, and a point is placed in the cluster whose mean is at the least distance from the point. Its strength is that it is relatively efficient and simple: it takes O(tkn) time, where t is the number of iterations, k the number of clusters, and n the number of points. Its weakness is that the number of clusters must be specified in advance.
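A minimal sketch of K-Means with the Euclidean distance follows. The sample points are invented for illustration; a real run would use the numeric header features described above, and a production system would handle initialization and empty clusters more carefully.

```python
import math
import random

def euclidean(p, q):
    """Euclidean distance between two numeric points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def k_means(points, k, iterations=20):
    """Plain k-means: assign each point to the nearest mean, recompute means."""
    random.seed(0)                       # fixed seed for a repeatable sketch
    means = random.sample(points, k)     # naive initialization from the data
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, means[i]))
            clusters[nearest].append(p)
        # recompute each mean; keep the old mean if a cluster went empty
        means = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else means[i]
            for i, cluster in enumerate(clusters)
        ]
    return means, clusters

points = [(1.0, 1.0), (1.2, 0.8), (8.0, 8.0), (8.2, 7.9)]
means, clusters = k_means(points, k=2)
print(sorted(len(c) for c in clusters))  # the two tight groups are recovered
```

Note that t, k, and n from the O(tkn) bound appear directly as the loop dimensions: iterations, the number of means, and the number of points.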
There are some variations of the K-Means algorithm, such as the K-Modes algorithm, which uses modes instead of means, and the K-Medoid algorithm, which uses medoids instead of means. The K-Medoid algorithm, also called PAM, works for small data sets but not for large ones, so PAM cannot be used here. CLARA builds on PAM and improves its performance by using sampling, so it can deal with larger data sets. CLARANS is even more efficient, but these algorithms are complex.

2.2.4 Hierarchical Clustering Algorithms

AGNES and DIANA are two hierarchical clustering algorithms; AGNES is agglomerative and DIANA is divisive. They do not require the number of clusters k as input, but they need a terminating condition. AGNES proceeds in a non-descending fashion: it merges the nodes with the least dissimilarity and keeps merging until all nodes belong to a single cluster, which is why a terminating condition is needed. DIANA works in the inverse order of AGNES. These algorithms do not scale well; agglomerative methods have a time complexity of at least O(n²). We also looked at BIRCH, which builds an in-memory tree of the data and then applies a clustering algorithm to it, but it is sensitive to the order of the data records.

2.2.5 Density Based Clustering Algorithms

Several density-based algorithms exist. One is DBSCAN, a long and complex algorithm. OPTICS is based on it: it creates an augmented ordering of the data set with respect to its density-based clustering structure, from which clusters of all densities can be derived. DENCLUE is faster than OPTICS and DBSCAN but needs a large number of parameters; it is good for data sets with large amounts of noise.

2.3 Misuse Detection

Misuse detection essentially checks for "activity that's bad" by comparison with abstracted descriptions of undesired activity.
This approach attempts to draft rules describing known undesired usage (based on past penetrations or on theorized activity that would exploit known weaknesses) rather than describing historical "normal" usage. Rules may be written to recognize a single auditable event that in and of itself represents a threat to system security, or a sequence of events that represents a prolonged penetration scenario. The effectiveness of the provided misuse detection rules depends on how knowledgeable the developers are about vulnerabilities. Misuse detection may be implemented with expert system rules, model-based reasoning, state transition analysis, or neural networks. Expert systems may be used to code misuse signatures as if-then implication rules. Signature analysis focuses on defining specific descriptions and instances of attack-type behavior to flag. Signatures describe an attribute of an attack or class of attacks, and may require the recognition of sequences of events. A misuse information database provides a quick-and-dirty capability to address newly identified attacks before the vulnerability on the target system has been overcome. Typically, misuse rules tend to be specific to the target machine and thus not very portable. Model-based reasoning attempts to combine models of misuse with evidential reasoning to support conclusions about the occurrence of a misuse. This technique seeks to model intrusions at a higher level of abstraction than the audit records: developers write intrusion descriptions at a high, intuitive level, in terms of the sequences of events that define the intrusion. It may be useful for identifying intrusions that are closely related but whose audit trail patterns differ, and it permits selective narrowing of the focus onto the relevant data, so a smaller part of the collected data needs to be examined.
As a rule-based approach, however, it still depends on being able to define and monitor known intrusions, whereas new and unknown vulnerabilities and attacks are the greatest threats. State transition analysis creates a state transition model of known penetrations. In the initial state the intruder has some prerequisite access to the system; the intruder then executes a series of actions that take the target system through intermediate states and may eventually result in a compromised state. The model specifies state variables and intruder actions, and defines the meaning of a compromised state. Evidence is preselected from the audit trail to assess the possibility that current system activity matches a modeled sequence of intruder penetration activity (i.e., described state transitions leading to a compromised state). Based on an ongoing set of partial matches, specific audit data may be sought for confirmation. The higher-level representation of intrusions allows this technique to recognize variations of scenarios missed by lower-level approaches. Neural networks offer an alternative means of maintaining a model of expected normal user behavior. They may offer a more efficient, less complex, and better-performing model than mean-and-standard-deviation, time-decayed models of system and user behavior. Neural network techniques are still at the research stage and their utility has yet to be proven. They may turn out to be more efficient and less computationally intensive than conventional rule-based systems; however, a lengthy, careful training phase with skilled monitoring is required. This was followed by a review of expert system techniques. Expert systems mainly use signature matching to detect known attacks. The greatest problem with these systems, however, is that the signature database is quite large.
To gain a better understanding of how a very large signature database can create problems, imagine a database containing around 1000 signatures, and assume that on average 100 packets arrive per second on the network (an unrealistically low figure; in reality it runs into thousands or even millions depending on the bandwidth and capacity of the link). Since the signature matching mechanism requires that every arriving packet be compared against the signature database, this leads to some 100,000 comparisons every second. Imagine a gigabit network, or even a 100 Mbps network, and you will realize how quickly the number of comparisons grows. The problem is heightened further by substring matching: the majority of signatures and patterns refer to the payload, and such payload-based signatures need to be found within the packet payloads, giving rise to the issue of substring matching, which is a computationally expensive process. Computer scientists around the world have worked on this problem and have come up with a number of effective solutions for locating signatures within packets efficiently and reliably. The algorithms proposed for signature matching fall into two broad categories.

Single Pattern Matching Algorithms

Algorithms in this category share one common feature: they compare one signature at a time. Any of these algorithms will pick one signature from the signature database and, only after it has determined whether that signature is located within the packet, move on to the next signature. Descriptions of the algorithms in this category are given below.

2.3.1 Boyer-Moore Algorithm

The most well-known algorithm for matching a single pattern against an input was proposed by Boyer and Moore.
The Boyer-Moore algorithm compares the search string with the input starting from the rightmost character of the search string. This allows the use of two heuristics that may reduce the number of comparisons needed for string matching compared to the naive algorithm; both are triggered on a mismatch. The first, called the bad character heuristic, works as follows: if the mismatching character appears in the search string, the search string is shifted so that the mismatching character is aligned with the rightmost position at which it appears in the search string; if the mismatching character does not appear in the search string, the search string is shifted so that its first character is one position past the mismatching character in the input. The second, called the good suffixes heuristic, is also triggered on a mismatch: if the mismatch occurs in the middle of the search string, there is a non-empty suffix that matches, and the heuristic shifts the search string up to the next occurrence of that suffix in the string.

2.3.2 ExB and E2xB

E. P. Markatos and K. G. Anagnostakis proposed the exclusion-based string matching algorithms in 2002 and 2003. The idea is that for a pattern P, if any character pi does not show up in the text T, we will not be able to find P in T. The algorithm is mainly designed for NIDS and makes two assumptions. The first is that most traffic will not trigger patterns, so by using this algorithm we do not need to invoke an expensive algorithm such as Boyer-Moore to verify that the traffic is safe. The second is that the text T is not too big; otherwise the effectiveness of the idea decreases dramatically, because the chance that T contains all the characters of P increases as T gets big. This forces the NIDS not to accumulate too much data before doing pattern matching. Given the trend that NIDS are more and more data stream (e.g.
TCP data stream) oriented, this limitation may make the algorithm less valuable. To gain a better understanding of ExB, we start with its pseudo-code and then explain it.

    bool exists[256];

    void pre_process(char *text_T, int len_of_T) {
        memset(exists, 0, sizeof(exists));          /* clear array */
        for (int idx = 0; idx < len_of_T; idx++)
            exists[(unsigned char) text_T[idx]] = 1;
    }

    int search(char *pattern_P, char *text_T, int len_of_P, int len_of_T) {
        for (int idx = 0; idx < len_of_P; idx++)
            if (exists[(unsigned char) pattern_P[idx]] == 0)
                return DOES_NOT_EXIST;
        return boyer_moore(pattern_P, len_of_P, text_T, len_of_T);
    }

The pseudo-code above excludes patterns that cannot match the packet in two steps. In the first step, pre_process() records the existence of every character of the packet (text T) in the array exists[256]. In the second step, search() checks whether the signature pattern P can occur in the text T at all: if any character of P does not show up in exists[256], we can conclude that P will not match T. If there are multiple patterns, the second step is called repeatedly, but the first step is called only once for all patterns. To decrease false positives, the idea can be generalized to use more bits of the characters when building exists[]. For common packets with nearly 1500 bytes of data, the chance that all characters show up in a packet is high, so the 8-bit (256-element) array has a high false positive rate. The major enhancement of E2xB over ExB is to make exists[256] an array of integers, because clearing exists[256] for every search (every packet in a NIDS) is too expensive. With integers, pre_process() assigns the ID of the packet to the entries of exists[256], and search() checks whether exists[x] is equal to the ID of the packet.
If exists[x] is not equal to the ID of the packet, it means that character x of pattern P does not show up in this packet (text T).

Multi Pattern Matching Algorithms

Unlike single pattern matching algorithms, which take up one signature at a time and try to locate it within the arriving packet, multi-pattern algorithms match a single packet against all available signatures simultaneously. How is this done? For a better explanation we will look at some currently available multi-pattern matching algorithms.

2.3.3 Wu and Manber's Algorithm

Sun Wu and Udi Manber proposed their multi-pattern matching algorithm in 1994. It mainly uses the bad character heuristic of the Boyer-Moore algorithm, but since a large number of patterns decreases the chance of getting a big shift, this algorithm uses a block of characters, say 2 or 3 characters, to find a shift offset. Let us look at an example of how it works. The figure below shows that we are going to find four patterns, P1, P2, P3 and P4, in the text T. The comparison region covers 2 characters; it is both the suffix region and the character block that we use to find patterns in the text.

Figure 1: Illustration of Wu and Manber's algorithm

In step 1, "12" of text T is in the character block. Using the precomputed shift table, we know that "12" matches P2's substring "12", so the text is shifted left by 4. The precomputed shift table is built using the same idea as the Boyer-Moore algorithm. In step 2, "56" of text T is in the character block. Since "56" of text T matches the substring "56" of P1, P2, and P3, the characters are "good" and the bad character heuristic cannot tell us how far to shift. The algorithm therefore pre-populates a hash table with the character blocks of all patterns in the suffix region, from which it can quickly retrieve all potential patterns; in this case, the patterns are P1, P2, and P3.
To eliminate patterns that will not match the text, the algorithm discards the patterns that do not match the text in the prefix region. In this case, pattern P1 is eliminated quickly without checking all of its characters, and only P2 and P3 are left. The algorithm then checks all remaining patterns against the text T. If no pattern matches, the text is shifted left by one and the process restarts from step 1. The key point of this algorithm is the assumption that its larger character block (2 or 3 characters instead of 1) lets the bad character heuristic shift the text quickly. If the substring in the character block of the text matches some patterns, the bad character heuristic no longer works; the solution is to use the hash to find the potential patterns and the prefix to eliminate ineligible ones. Hopefully the number of remaining patterns is small, and a naive comparison can then verify whether they occur in the text.

2.3.4 Kim's Algorithm

Sun Kim and Yanggon Kim proposed their encoding- and hashing-based multi-pattern matching algorithm in 1999. It rests on two basic ideas. The first is that patterns may contain only a few distinct characters, so the characters can be encoded with fewer bits, as in compression algorithms; with fewer bits to compare, fewer comparisons are needed. The second is that in each comparison we need to compare only some patterns with the text instead of all of them, and the way to find only the potential patterns is hashing. The two ideas are in fact independent; either can work by itself. Let us look at an example of how the encoding scheme works. In the following example, the pattern string has only 3 distinct characters, a, b, and c, so a 2-bit encoding is enough to represent the pattern. For the text string, all characters (e.g.
d, e, f, g) other than the characters (a, b, c) in the pattern need only one value to represent them. Even if we encode all characters that are not in the patterns to a single value, the pattern matching will still be correct. In the example below, "abc" of text T and pattern P match after encoding, but "de" of text T does not match "ac" of pattern P, because the encoded value of "de" is not equal to the encoded value of "ac". This encoding scheme resembles the ExB algorithm: the special encoded value for characters not in the pattern serves the role of exclusion.

    Pattern string:         abcac
    Text string:            abcdefg
    Encoding scheme:        a: 00, b: 01, c: 10, all other characters: 11
    Encoded pattern string: 00 01 10 00 10
    Encoded text string:    00 01 10 11 11 11 11

To search for a single pattern in the text, the algorithm proposed in this paper is the naive one: character-by-character comparison. In our example, if a comparison fails, the encoded pattern string moves to the right by 2 bits (corresponding to one character of the original strings). But because the encoded pattern string is shorter, we may be able to compare all characters of the encoded pattern with the text in one machine instruction. For instance, a 32-bit computer needs two instructions to compare the original 5-character pattern with the text, while the encoded pattern needs only one instruction because it is only 10 bits long. To search for multiple patterns, the authors propose a hash to decrease the number of patterns that need to be compared against the text. All patterns are hashed into a hash table, keyed on the first j characters of each pattern. For instance, suppose we add two more patterns to the example above: acbcaaabb and cccaaabbbccc. The value of j is chosen as the minimum pattern length, so in this case j is 5, the length of abcac. The hash table is pre-populated with the patterns.
In each comparison, we use the current j characters of the text as the key to search the hash table; only the potential patterns are found. Patterns whose hash value differs from the hash value of those j characters of the text cannot match it, so it is safe to use this hash scheme to find the potential patterns.

2.4 Derived Attributes

As mentioned before, the attacker's medium of attack lies in only two places: the packet payload and the packet header. Our research on how to detect various kinds of attacks through these mediums led us to understand that the current intrusion detection products on the market use only a basic set of options, which may consist of rule-based signature matching, pattern recognition, and state-based deviation detection. Further research on the types of attacks made it obvious that a different set of rules, not part of this basic set, was being used. These rules are defined as derived attributes. Derived attributes use domain knowledge to construct attributes that cover a broader range of attacks, and were therefore of more interest to us. Our research on these derived attributes drew primarily on the 1999 KDD Cup data, the 1998 DARPA intrusion detection evaluation program, and research papers such as "Cost-based Modeling and Evaluation for Data Mining with Application to Fraud and Intrusion Detection: Results from the JAM Project" by Salvatore J. Stolfo, Wei Fan, Wenke Lee, Andreas Prodromidis, and Philip K. Chan. The objective of the DARPA evaluation program, arranged by MIT Lincoln Labs, was to survey and evaluate research in intrusion detection. They divided attacks into four main categories: DOS: denial of service, e.g. syn flood; R2L: unauthorized access from a remote machine, e.g.
guessing a password; U2R: unauthorized access to local superuser (root) privileges, e.g. various ``buffer overflow'' attacks; Probing: surveillance and other probing, e.g. port scanning. For detecting these attacks, just checking the payload for signatures and checking the headers is not enough, as that will not detect new kinds of attacks. Some higher-level features were required, which we found on the KDD Cup site. These derived features are as follows:

    feature name     description                                        type
    duration         length (number of seconds) of the connection       continuous
    protocol_type    type of the protocol, e.g. tcp, udp, etc.          discrete
    service          network service on the destination,                discrete
                     e.g. http, telnet, etc.
    src_bytes        number of data bytes from source to destination    continuous
    dst_bytes        number of data bytes from destination to source    continuous
    flag             normal or error status of the connection           discrete
    land             1 if connection is from/to the same host/port;     discrete
                     0 otherwise
    wrong_fragment   number of ``wrong'' fragments                      continuous
    urgent           number of urgent packets                           continuous

Table 1: Basic features of individual TCP connections.

The features described in Table 1 are the basic features of TCP connections and will nearly always be used. All of these features can be extracted from the TCP header, and they are used in nearly all the intrusion detection systems on the market. The following is a table of the derived content features.
    feature name         description                                       type
    hot                  number of ``hot'' indicators                      continuous
    num_failed_logins    number of failed login attempts                   continuous
    logged_in            1 if successfully logged in; 0 otherwise          discrete
    num_compromised      number of ``compromised'' conditions              continuous
    root_shell           1 if root shell is obtained; 0 otherwise          discrete
    su_attempted         1 if ``su root'' command attempted; 0 otherwise   discrete
    num_root             number of ``root'' accesses                       continuous
    num_file_creations   number of file creation operations                continuous
    num_shells           number of shell prompts                           continuous
    num_access_files     number of operations on access control files      continuous
    num_outbound_cmds    number of outbound commands in an ftp session     continuous
    is_hot_login         1 if the login belongs to the ``hot'' list;       discrete
                         0 otherwise
    is_guest_login       1 if the login is a ``guest'' login; 0 otherwise  discrete

Table 2: Content features within a connection suggested by domain knowledge.

Table 2 lists the content features suggested by domain knowledge. These features are part and parcel of the payload of the TCP packet and must be extracted from there; nevertheless, they give precious information about the usage of the system by an external source, and they cover most of the R2L and U2R attacks. The hot feature describes events such as the transfer of a file that can carry malicious code, access to system directories, and the creation and execution of programs and directories. The compromised condition covers wrong path traversal errors such as path-not-found and file-not-found errors. This is one of the most important tables, and its features are necessary to detect login hacks. Table 3 gives a time-based analysis of the traffic to find an attack.

    feature name   description                                            type
    count          number of connections to the same host as the          continuous
                   current connection in the past two seconds
    Note: The following features refer to these same-host connections.
    serror_rate        % of connections that have ``SYN'' errors          continuous
    rerror_rate        % of connections that have ``REJ'' errors          continuous
    same_srv_rate      % of connections to the same service               continuous
    diff_srv_rate      % of connections to different services             continuous
    srv_count          number of connections to the same service as       continuous
                       the current connection in the past two seconds
    Note: The following features refer to these same-service connections.
    srv_serror_rate    % of connections that have ``SYN'' errors          continuous
    srv_rerror_rate    % of connections that have ``REJ'' errors          continuous
    srv_diff_host_rate % of connections to different hosts                continuous

Table 3: Traffic features computed using a two-second time window.

As is obvious from Table 3, this table tries to gather time-based information from the traffic to determine whether an attack has taken place, so it is a good derivation for detecting floods and scans. Although this table was of much importance, we set it aside because most of the detection it performed was also being done by other modules we had planned.

2.5 Packet Header

To implement a good intrusion detection system, one needs a good understanding of how network packets operate and what their components are. We therefore had to get in-depth knowledge of the kinds of packets there are; the most common include TCP, UDP, ICMP, IP, and IPv6. Most packets transported over the network are carried over the IP protocol, whose header is structured as follows:

    Ver. | IHL | Type of service | Total length
    Identification | Flags | Fragment offset
    Time to live | Protocol | Header checksum
    Source address
    Destination address
    Options + Padding

Figure 2: IP header structure (32 bits per row)

The IP header is responsible for directing packets to and from IP sources. Some protocols are built on top of IP, e.g. the TCP protocol, which is responsible for a reliable end-to-end service.
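The IP header fields shown in Figure 2 can be illustrated by unpacking a raw header with Python's struct module. This is a sketch only; the 20 header bytes below are hand-built for the example (no options, made-up addresses), not captured traffic:

```python
import struct

# A hypothetical 20-byte IPv4 header, big-endian as on the wire:
# version 4, IHL 5 (20 bytes), ToS 0, total length 40, ID 0x1c46,
# flags/fragment 0x4000 (don't-fragment), TTL 64, protocol 6 (TCP),
# checksum 0 (not computed here), src 10.0.0.1, dst 10.0.0.2
raw = struct.pack("!BBHHHBBH4s4s",
                  (4 << 4) | 5, 0, 40, 0x1c46, 0x4000,
                  64, 6, 0,
                  bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))

ver_ihl, tos, total_len, ident, flags_frag, ttl, proto, cksum, src, dst = \
    struct.unpack("!BBHHHBBH4s4s", raw)

version = ver_ihl >> 4          # 4 for IPv4
ihl = (ver_ihl & 0x0F) * 4      # header length in bytes
flags = flags_frag >> 13        # the 3 flag bits
frag_offset = flags_frag & 0x1FFF
```

A packet sniffer performs essentially this decoding on every captured frame before the header fields can be checked by later modules.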
The TCP header structure is as follows:

    Source port | Destination port
    Sequence number
    Acknowledgement number
    Offset | Resrvd | U A P R S F | Window
    Checksum | Urgent pointer
    Options + Padding

Figure 3: TCP header structure (32 bits per row)

Most of the anomalies and attacks used in scans and floods manipulate the TCP header by setting certain combinations of flag values, which can for example establish a connection to identify the victim's operating system and then end the connection. The most famous techniques for getting the victim's information use the PING and FINGER commands. The FINGER command operates on the FINGER protocol, while PING operates on the ICMP protocol. Another common transport protocol is UDP, whose header is as follows:

    Source port | Destination port
    Length | Checksum

Figure 4: UDP header structure (32 bits per row)

The above header structures were mentioned because of their popularity. If the need arises to check the header structure of other protocols, they can be found at www.protocols.com.

2.6 Intrusion Types and their Characteristics

Our research led us to understand that attacks fall into four main categories:

2.6.1 DOS attacks

DOS stands for denial of service. This attack does exactly that: it denies a service to the victim. The service is usually network communication and the Internet, but the attack can also be more specific, targeting for instance the HTTP protocol and the web service, or the TELNET service. Unlike a privacy attack, where an adversary tries to access resources it is not authorized for, the goal of a DOS attack is to keep authorized users from accessing resources. The affected computers may crash or be disconnected from the Internet. In some cases the damage is limited, because once you restart the crashed computer everything is back on track; in other cases it can be a disaster, especially when you run a corporate network or ISP.
These attacks either exploit a bug in the operating system to crash the victim's computer, or they flood the victim's service with a large amount of data packets so that the service becomes of limited use. Some famous DOS attacks include nukes, syn floods and http floods. DOS attacks can also be extended to DDOS, distributed denial of service attacks, which use multiple computers to launch a DOS attack against one victim. Characteristics of DOS attacks include a large surge of packets to a certain service or to the entire network service itself.

2.6.2 R2L attacks

R2L stands for remote-to-local attacks. These attacks range from guessing passwords to using scripts to crack them; the goal is to infiltrate the system by gaining unauthorized access from a remote machine. Such attacks are hard to detect, but a major characteristic is a large number of password-incorrect errors sent back to the intruder. In the scenario where the attacker knows the password beforehand, the attack becomes very hard to detect, unless an anomaly detection scheme is installed that compares the current usage of the computer with its normal usage and treats a certain deviation as an attack.

2.6.3 U2R attacks

U2R stands for user-to-root attacks: unauthorized access to root privileges. These attacks exploit user passwords and certain bugs and characteristics of the operating system and its memory handling to gain root access. The most famous kinds of U2R attacks are buffer overflows. The characteristics of such attacks include access to memory outside the set bounds.

2.6.4 Probes and scans

Probes and scans are not necessarily attacks, but they help the attacker in a variety of ways to construct and initiate an attack.
Probes and scans are used to gather information about the victim computer: to check whether certain ports, on which an attack can be initiated, are open, and to identify the operating system used. The operating system information is then used to exploit certain bugs in the OS or to initiate OS-specific attacks. Characteristics of such activity include packets being sent to multiple ports from a single computer, or packets that request a reply identifying the OS, such as the finger or ping commands.

Chapter Three
Design

This section provides architectural details of the proposed hybrid design.

3.1 Design Overview

After reviewing a number of topics in network security, our team concluded that in order to resolve the issues present in currently available IDSs, one needed to develop a hybrid design incorporating the features of currently available technologies. The figure below represents the final design of our prototype.

Figure 5: Modular view of the proposed NIDS (modules: Packet Sniffer, Blocked Ports/IP, Pre Anomaly, Anomaly Detection, Signature Matching and Attack Clustering on the first path; Pseudo Random, Signature Matching and Check Attributes on the second)

The individual modules of this design are explained later in this text. For now let us examine the flow of data through the design. Any packet arriving on the network can take one of two paths through the IDS:

Path One
(a) The packet arrives at the Sniffer.
(b) The packet goes to the Blocked IP/Ports module. If the packet is coming from a source IP that has been registered with the IDS as blocked, the network administrator is alerted; the same holds for blocked ports. If the packet is not from a blocked IP or port, it is forwarded to the Pre Anomaly module.
(c) This module checks for various malicious settings of the header flags. Details of these checks can be found in the relevant section.
If the attributes of the packet header match any of these predefined malicious header attributes, the administrator is alerted; otherwise the packet is passed on to the Anomaly Detection module.
(d) In the Anomaly Detection module the arriving packet is checked against the mean (average/normal) packet that has been defined for the network. How this mean packet is calculated, and what criteria are used to accept or reject a packet as malicious, are given in the relevant section. It suffices to say that if the anomaly detection phase classifies the packet as malicious, it is stored in a database from which the next module will retrieve it for further processing; if not, no action is taken.
(e) After the packet has been stored in the database, the signature matching layer, which works offline, takes up this data and attempts to classify it as an attack based on the currently available signatures. If no signature is found for a given packet, it is not discarded but stored in a different database to be taken up for further processing by other modules.
(f) The unclassified malicious packet is then taken from this database and passed on to the Attack Clustering module, which is based on clustering attack data. It checks the arriving malicious packet against precomputed attack clusters. If the packet is within a certain threshold of an attack cluster, the system administrator is warned that, although the packet could not be definitely classified as an attack, its characteristics resemble those of one. If the packet is outside these thresholds, no action is taken.

Path Two
(a) The packet arrives at the Sniffer.
(b) Some packets are randomly sent to the Pseudo Random module instead of being sent through path one.
(c) In the Pseudo Random module, the IDS first performs signature matching on the packet against the provided attack signatures.
If a match is found, the administrator is alerted; otherwise the packet is passed on to the Check Attributes module.
(d) Details of how the Check Attributes module works are left for later sections. Here it suffices to say that Check Attributes works on domain-specific knowledge. Based on this knowledge it generates derived attributes for the packets, performs if-then analysis on these derived attributes, and on that basis declares the packet either malicious or clean.

3.2 Details of Modules

3.2.1 Blocked Ports/IP

This module is by far the simplest in our entire design. It maintains a list of all the IPs and ports that have been blocked by the network administrator. Whenever a packet arrives through the Sniffer, this module checks the IP and TCP headers of that packet against the list of blocked ports and IPs. If the packet is found to be coming from a blocked source or going to a blocked port, the administrator is alerted.

3.2.2 Pre Anomaly

This module is invoked immediately after the packets have been filtered for blocked ports and IPs. Its functions are the following:

Matching known flag anomalies
The flags of the incoming packet are matched against the flags of popular attacks, such as the SYN-FIN flag combination attack. The module also handles values that should not be set, e.g. out-of-range or extremely high or low values, such as a very low packet size combined with the more-fragments bit.

Matching popular values
This module also matches incoming packets against certain values popular amongst hackers, e.g. IP ID 31337.

3.2.3 Anomaly Detection

K-Means Algorithm: The Selected Algorithm
AGNES and DIANA took a lot of time. Moreover, they performed a single pass and could not undo what had already been done, so there is a possibility that the clusters created were not very good. BIRCH was sensitive to the order of the data.
It may need fewer passes than the partitioning algorithms, but this affects the results. Density-based algorithms are complex; moreover, we simply need a clustering, not an augmented ordering from which we derive clusters of various densities by supplying different parameters. DENCLUE is for data sets with large amounts of noise; as we are going to deal only with normal traffic data, there will be no noise, and all the data needs to be placed in clusters. So we are left with the partitioning algorithms. PAM reduces the number of scans done by the K-Means algorithm and its variants, but this affects its results: it works on small data sets but not on large ones. This also excludes CLARA and CLARANS, which are based on PAM. The K-Means algorithm is relatively simple and efficient, with complexity O(ktn) for k clusters, t iterations, and n objects. The number of clusters must be specified at the start, but we can work around that, as described below.

Algorithm Implementation

It is implemented in four steps:
1. Partition the objects into k non-empty clusters.
2. Compute seed points as the centroids of the clusters. Centroids are the centers (the means) of the clusters.
3. Assign each object to the cluster with the nearest seed point.
4. Go back to step 2 (start the next iteration); stop when the clusters remain unchanged.

Issues with the K-Means Algorithm
1. It is difficult to estimate the optimum number of clusters for a particular dataset.
2. A new point that should belong to a new (k+1)th cluster is forcibly placed in one of the old (<= k) clusters.
3. The density of the k clusters will be low if the data is scattered.

K-Means Modification
Instead of fixing the number of clusters k at the start, we provide a radius as the parameter, so the number of clusters need not be specified in advance. If a point is too far from every cluster, i.e. does not fall within the specified radius of any of them, it is placed in a new cluster.
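The radius-based variant just described can be sketched as a minimal Python illustration. The function name, the scalar-feature simplification, and the use of a running mean are our assumptions, not the project's code:

```python
def cluster_by_radius(points, radius):
    """One pass of the radius-based variant: a point joins the first
    cluster whose mean lies within `radius` of it; otherwise it starts
    a new cluster, so k need not be fixed in advance."""
    clusters = []  # each cluster is a list of scalar feature values
    for p in points:
        for c in clusters:
            mean = sum(c) / len(c)
            if abs(p - mean) <= radius:
                c.append(p)
                break
        else:
            # no cluster within the radius: start a new one
            clusters.append([p])
    return clusters
```

For instance, with radius 0.5 the points 1.0 and 1.1 fall into one cluster while 5.0 starts a second one.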
The density will improve, but since the radius is the same for all clusters, some clusters will still have low density compared to others. If we decrease the radius, the low-density clusters will be replaced by high-density clusters, but high-density clusters might split into overlapping clusters. So the radius is a very rigid way of defining cluster boundaries. Instead of using a radius, we could use the standard deviation.

Standard Deviation and its Impact on Clusters

To overcome this density problem, we use the standard deviation instead of a radius. The standard deviation of each cluster is calculated; as the clusters contain different data points, the standard deviation differs from cluster to cluster. For normally distributed data, roughly 98% of the points lie within 2.32 standard deviations of the mean. If a new point lies within 2.32 times the standard deviation of a cluster, it is added to that cluster. This holds for one iteration; the standard deviation is then recalculated before the next iteration. Since we only add points whose distance from the cluster mean is within 2.32 times that cluster's standard deviation, and most such points will be close to the mean, this will usually reduce the overall standard deviation of the cluster. Thus, in the next iteration, a new point will have a much stricter standard deviation to compare against. This ensures that the density of the resulting clusters will be high.

It is possible, however, that the points accumulate near the edge of the cluster. Since more points will then be farther from the mean than when the iteration started, the new standard deviation will be larger than the previous one. We have therefore kept a bound on the standard deviation.
If the standard deviation goes beyond that bound, the points responsible for the increase are not added to the cluster; in the next iteration, those points are placed in a new cluster. Both clusters will then be of relatively high density. The benefit of the standard deviation is that it is not rigid like a radius: each cluster is represented by its own mean along with its own standard deviation. The deviations will grow and shrink, but the density is kept from becoming too low. Furthermore, only those clusters will have large standard deviations whose points are spread throughout the cluster (except that they have a relatively higher density in an area farther from the mean). Any cluster whose points do not fill the space but accumulate at specific spots far from the mean will be divided, since its standard deviation will exceed the specified bound.

Network Traffic Header Features

The following features are extracted and then used in clustering and comparisons.
1. Connection: This includes the source and destination IPs and the destination port. It is claimed in the paper "Anomaly Detection Using TCP Header Information" that this combination is sufficient to define a connection and to detect anomalies. It is used as the ID of a connection and thus also as the primary key in searches.
2. FSR: The sum of the FIN, SYN and RST flags divided by the total number of packets in a connection. The paper claims that a high value is a good indication of an attack. The FIN and SYN contribution can be at most three, and RST is usually zero. Keeping in mind that RST can sometimes be nonzero, we came up with a value around which the normal FSR revolved; this was used to decide the initial standard deviation in the Modified K-Means algorithm.
3.
PSH: The sum of the PSH flags divided by the total number of packets in a connection.
4. ACK: The sum of the ACK flags divided by the total number of packets in a connection.
5. Total Packets: The number of packets sent in the connection so far.
6. Bytes per Packet: The total bytes sent in a connection divided by the total packets in the connection.
7. Port: The port number with which the connection is being established. This helps in checking for port scans: when a port is not present in the normal clusters (indicating that no service is running on it) and a connection is attempted to it, an anomaly is generated reporting the port scan.

Modified K-Means Algorithm, the Training Data and Network Sampling

Before the NIDS is put into action to detect attacks, the algorithm is executed to cluster the normal data. Normal traffic data is provided to the algorithm as the training data, and the clusters formed represent normal traffic. When the NIDS starts to work, it simply compares incoming packet data with the clusters and produces an alert on any deviation. The training data is important because the algorithm trains itself on it: in the anomaly detection phase, comparisons must be made against normal traffic only, so the training data must represent the normal network traffic. To keep our clusters representative, we need to take a large amount of normal network traffic data so that it encompasses as much detail about the traffic as possible. This is a sampling problem: we need to sample in such a way that the sample contains as much information about the network as possible, because the training data is to represent the whole network. For this, a huge amount of data needs to be collected over weeks.
As it was not possible for us to collect such a huge amount of data ourselves, and that too of only normal network traffic, we obtained the data from the DARPA website.

Network Traffic Header Features and Training Data

First, all the connections are extracted from the normal data collected for training. Features are then extracted for each connection, and the connection IDs along with their features are placed in a file that becomes the training data file for the Modified K-Means algorithm. Before the algorithm is executed, another function separates the individual features of each connection and places like features in the same file, because the algorithm works on individual features. The individual feature training files are then sent to the Modified K-Means algorithm one by one, which produces a clustering for each feature separately. As the algorithm is run per feature, clusters are created per feature. Thus, when the network traffic is being checked, the features of the connections present are extracted and each feature is individually compared with its corresponding clusters. If an anomaly is detected in any feature, the connection is flagged as malicious. There may be more than one cluster for a single feature.

Modified K-Means Algorithm Pseudo-code

structure Cluster
    float mean;
    float st_dev;
    int no_of_pts;
    int used;              // used in the algorithm
    float updated_mean;
    float updated_st_dev;
    /* Initially updated_mean and updated_st_dev are equal to mean and
       st_dev. As new points enter the cluster, updated_mean and
       updated_st_dev change but the original mean and st_dev do not,
       until the end of the iteration.
   After that, mean and st_dev are set equal to updated_mean and
       updated_st_dev. */

Procedure KMeanVariant(string filename, float stdev): returns ClusterArray
    current is an array of clusters
    previous is an array of clusters
    do
        for all elements in the current array
            set no_of_pts = 0
            set used = NOTUSED
        copyArray(previous, current)
        set pointer to beginning of file
        while end of file is not reached
            input = read a number from file
            if current is empty
                current[0] = addCluster(input, stdev)
            else
                call updateClusterArray(current, input, stdev)
        delete those clusters from the current array which had no update
        update mean and st_dev of the clusters in current
    while (current and previous are not the same)
    return current

Procedure updateClusterArray(ClusterArray a, float m, float stdev)
    for (i = 0; i < size of array a; i++)
        if (a[i] == NULL)
            a[i] = addCluster(m, stdev)
        else
            mean_diff = absolute(a[i]->mean - m)
            if (mean_diff <= 2.32 * a[i]->st_dev)
                add point m to cluster a[i] by updating updated_mean,
                updated_st_dev and no_of_pts

Modified K-Means Algorithm Preview

This algorithm is executed before the real-time analysis of the network traffic begins; it provides the clusters for the anomaly detection of the network data. We have already defined the features of a connection. The algorithm is run for each feature separately, except for the Connection feature, which is an ID, so clusters are created for each feature separately. As before, when the network traffic is checked, each extracted feature is compared with its own clusters, an anomaly in any feature flags the connection as malicious, and a single feature may have more than one cluster. When we start the algorithm for a feature, we supply a Starting Standard Deviation for that feature.
This serves as the initial standard deviation against which points are compared in order to decide whether they lie in an existing cluster or should be placed in a new one.

Structure "Cluster"

The structure Cluster has six fields:
mean: the mean of the cluster
st_dev: the standard deviation of the cluster
no_of_pts: the number of points in the cluster
used: tells us whether the cluster contains any points
updated_mean
updated_st_dev

Initially, updated_mean and updated_st_dev are equal to mean and st_dev. As new points enter the cluster, updated_mean and updated_st_dev change, but the original mean and st_dev do not, until the end of the clustering iteration. After that, mean and st_dev are set equal to updated_mean and updated_st_dev.

The used variable comes into play when re-clustering is done. Before re-clustering, we have clusters that each contain at least one point. During re-clustering in the next iteration, it is possible that, once all the points have been placed, some previously existing cluster (mean) no longer contains any point. Such a cluster has used equal to NOTUSED and is deleted at the end of the clustering iteration. used is set to USED whenever a point is added to a cluster, and also when a new cluster is made.

The updateClusterArray Function

This function takes a cluster array "a", a point "m" and a starting standard deviation "std". A loop goes through the whole array, searching for a cluster in which the point m falls. If the value lies within 2.32 * st_dev of a cluster, the point is added to that cluster. If the end of the array is reached and the point m does not fall in any cluster, a new cluster is formed and the point is placed in it.
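Under these conventions, the update step can be sketched in Python. This is a simplified illustration, not the project's actual code: the dictionary fields mirror the pseudo-code structure, member points are kept so the statistics can be recomputed, and the 2.32 multiplier follows the text:

```python
import math

def make_cluster(point, start_st_dev):
    # mirrors the pseudo-code structure "Cluster"; initially the
    # "updated" statistics equal the originals
    return {"mean": point, "st_dev": start_st_dev,
            "updated_mean": point, "updated_st_dev": start_st_dev,
            "no_of_pts": 1, "used": True, "points": [point]}

def update_cluster_array(clusters, m, start_st_dev):
    """Place point m in the first cluster whose (original) mean is
    within 2.32 * st_dev of it; otherwise append a new cluster
    seeded with m and the starting standard deviation."""
    for c in clusters:
        if abs(c["mean"] - m) <= 2.32 * c["st_dev"]:
            c["points"].append(m)
            c["no_of_pts"] += 1
            c["used"] = True
            # recompute the "updated" statistics from the member points;
            # mean and st_dev themselves stay fixed until the iteration ends
            n = len(c["points"])
            c["updated_mean"] = sum(c["points"]) / n
            var = sum((p - c["updated_mean"]) ** 2 for p in c["points"]) / n
            c["updated_st_dev"] = math.sqrt(var)
            return
    clusters.append(make_cluster(m, start_st_dev))
```

Note that membership is tested against the original mean and st_dev, which matches the rule that these fields only change at the end of an iteration.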
When a point does not fall in any cluster, a new cluster consisting of that point is made. Here we need the starting standard deviation, std: the new cluster's st_dev and updated_st_dev are set to the starting standard deviation, its mean and updated_mean are set to the value of the point, and its no_of_pts is set to 1. When a point falls in an existing cluster, updated_mean and updated_st_dev are updated; when the clustering of all the points is done at the end of an iteration, mean is set equal to updated_mean and st_dev to updated_st_dev. Updating updated_mean and updated_st_dev is a straightforward calculation using the standard formulae, and the number of points in the cluster is incremented.

The KMeanVariant Function

This is the function that clusters the points in the dataset. It takes the name of the file containing the values of one feature for all the connections in the training data set (another function extracts the features from the training data file that holds all the connections with their features, and passes these files to KMeanVariant one by one). It also takes the starting standard deviation std, and returns a ClusterArray. At the beginning, two cluster arrays are declared, current and previous. Two arrays are kept so that the ending condition can be checked: when, after re-clustering, the previous clustering differs from the new (current) clustering by only a small fraction, the algorithm stops. This fraction is st_dev/30. Then the loop starts that performs the clustering until the ending condition is met. First, if there are any elements in the current array, we set their no_of_pts to 0 and used to NOTUSED.
Then we copy the current array into the previous array, because the re-clustering will be done on the current array and the two arrays must afterwards be compared for the ending condition. We also reset the file pointer to the beginning of the file so that reading starts from the top in each iteration. Then, until the end of the file is reached, we do the following: we read a point (value) from the file. If the current array is empty, i.e. there are no clusters in it (this is the first iteration), a new cluster is simply added at the first location, with its mean equal to the input value, its st_dev equal to the starting standard deviation, and its no_of_pts set to 1. If the current array is not empty, we call the updateClusterArray function to place the value in its proper location in the cluster array. After this has been done for all the points in the file, we check whether any cluster in the array was not used, i.e. has no points in it; any such cluster is deleted from the current array. The last step is to update the mean and st_dev of the clusters so that they can be used in the next iteration of re-clustering. This is done simply by setting mean equal to updated_mean and st_dev equal to updated_st_dev; as described, updated_mean and updated_st_dev have been updated as new points were added to the cluster. Lastly, we check the ending condition: if it is fulfilled, we return the current array; otherwise, we go into the next iteration.

Creation and Real-Time Update of Clusters

The clusters are created using the Modified K-Means algorithm right before the NIDS starts working. Normal network traffic data is provided as the training data to the algorithm to create the clusters.
Certain features of the headers of normal network traffic packets are extracted and provided to the algorithm as training data to create clusters. The clusters are then updated in real time. Once the NIDS is running, when the features of a connection are compared with the clusters to check for anomalies and found to be normal, they are added to the normal clusters by updating those clusters, so that the clusters keep tracking the network. This simply means recalculating the mean and standard deviation of the clusters that changed. Without this, the clusters would not evolve, and a packet that is not malicious might be classified as one. So we evolve the clusters in the least costly way in real time. Over time, however, the clusters are bound to drift farther and farther from optimal: the real-time update of clusters led to overlapping and malformed clusters. To overcome this, at the end of the day we perform a costly re-clustering on all the normal data recorded that day to optimize the clusters, which improves the functioning of the Anomaly Detection layer. As the previous clusters are kept and re-clustering is done upon them, history is maintained. Moreover, day by day, as re-clustering is done using that network's data, the NIDS becomes accustomed to that network.

Feature Extraction from the Window

Since our clustering criteria are based on certain features of a TCP connection, a major part of using these features is extracting them from the incoming packets and storing them by connection; this module performs that task. Upon execution of the IDS, the first thing done is to initialize a window of packets which retains information about the TCP connections made within that window. This module therefore comes into play after the window has been initialized, at which point the entire contents of the window are sent to it.
The module takes each packet from the window list and performs the following tasks on it:
1. Feature Extraction: The packet's header is parsed and the features required by our IDS are extracted.
2. Insert into Tree: The connection the packet belongs to is searched for in the tree we maintain. If a match is found, the extracted features are added to the existing connection. If the connection does not exist in the tree, a new connection is created with the features set to the extracted values and inserted into the tree.

After all the packets in the window have had their features extracted and inserted into the tree, a function is called that runs anomaly detection on each connection in the tree, to check whether any of the connections filling the window are anomalous. All anomalous connections in the window are displayed at this point.

Anomaly Detection

The anomaly detection module operates on a single connection, checking whether it is anomalous by testing whether it lies within the clusters we have formed. It performs the following tasks:
1. Cluster Check: It checks all the clusters separately; if the connection's features do not fall inside any of the clusters set up during the clustering phase, the connection is classified as an anomaly.
2. Update Cluster: If the features lie within the clusters, each cluster number is saved in a temporary location before the corresponding clusters are updated.

Update Connection Features

After the initialization of the window and the initial tree, each successive arriving packet is used to update the features in the tree while maintaining the window size. The following tasks are performed in this module:
1. Extracting the packets: Two packets are involved in updating the tree: the packet that has just arrived and the oldest packet in the window.
The oldest packet is extracted from the window queue that is being maintained.
2. Feature Extraction: The features are extracted for both packets, the one that has just arrived and the one that is to be removed from the tree.
3. Updating the tree: After the features have been extracted, the contribution of the oldest packet is removed and the contribution of the arriving one added. This is done by first searching for the connection of the oldest packet and subtracting its feature values from that connection; if, after subtraction, the values for the connection reach zero, the connection is deleted from the tree. The contribution of the arriving packet is then added by searching for its connection in the tree: if the connection exists, the feature values are simply added; otherwise a new connection is made with the extracted features and inserted into the tree.
4. Check for Anomaly: After the tree has been updated, the connection just updated by the new packet is sent to the anomaly detection module to check whether it contains an anomaly.

3.2.4 Signature Matching

This module checks packet payloads for the presence of attack signatures. Essentially, these signatures are various flag settings combined with particular payload content, which can range from executable scripts to viruses and worms. As discussed in section …, many different kinds of algorithms exist for matching attack signatures against incoming packets. The algorithm used in this prototype is the Boyer-Moore algorithm, discussed previously; it was chosen for its simplicity and efficiency. The signature matching module maintains a list of all known signatures, which are available from Snort, an open-source Linux-based intrusion detection system. For information on how the Boyer-Moore algorithm works, please consult the above documentation.
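As an illustration only, the bad-character heuristic that gives Boyer-Moore its speed can be sketched in its Horspool simplification (full Boyer-Moore also uses a good-suffix rule). This is our own sketch, not the prototype's implementation, and the example payload and signature are hypothetical:

```python
def horspool_search(payload, signature):
    """Return the index of the first occurrence of `signature` in
    `payload`, or -1 if absent. Uses the bad-character shift table
    that lets the search skip ahead by up to len(signature) bytes."""
    m, n = len(signature), len(payload)
    if m == 0 or m > n:
        return -1 if m > n else 0
    # shift for each byte: distance from its last occurrence
    # (excluding the final signature byte) to the end of the signature
    shift = {}
    for i, ch in enumerate(signature[:-1]):
        shift[ch] = m - 1 - i
    pos = 0
    while pos <= n - m:
        if payload[pos:pos + m] == signature:
            return pos
        # skip based on the payload byte aligned with the signature's end
        pos += shift.get(payload[pos + m - 1], m)
    return -1
```

For example, searching a hypothetical HTTP payload for the byte signature b"root.exe" either returns the match offset or -1, at which point the real module would fall through to the next processing stage.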
3.2.5 Attack Clustering

Basic Idea
Layer 3 was designed to reduce the false positives generated by the Anomaly Detection phase. It rests on the fundamental premise behind the misuse model: attacks follow a pattern. The pattern of an attack is usually formulated to exploit known weaknesses in the system. If an attack follows a pattern, then there is similarity between attacks, and it is thus possible to cluster them.

Attack Clustering and Modified K-Means

As already mentioned, a clustering algorithm well suited to this system has been developed; the same algorithm is used for clustering attack data. As in the case of normal clustering, the attack clusters are created before the NIDS starts checking traffic for attacks, because clusters are needed for the comparisons. Training data consisting of the header features of anomalous connections is provided to the algorithm, which works on individual features and creates clusters for each feature separately. Attack clustering is the same as normal clustering except for the training data provided and the clusters produced: the first produces clusters that represent attacks, the second clusters that represent normal traffic.

Attack Clustering and the Detection of Attacks

Anomalous packets from Anomaly Detection go to the signature matching phase and then come to the attack clustering phase. This whole path is traversed in order to reduce the false positives generated by the anomaly detection phase and to further confirm that a detected anomaly is actually an attack. Along with the anomalous connection, the anomaly detection phase sends a distance measure comprising, for each feature, the least distance from its corresponding cluster means; these distances are used here in the attack clustering phase. The connection information is compared with the attack clusters.
Three cases arise here:
1. If any feature of the connection lies within an attack cluster, the connection is labeled as an attack.
2. We calculate the least distance of each feature from its corresponding attack cluster means and compare it with the distance from the normal cluster means. If any feature's attack-cluster distance is less than its normal-cluster distance, an alert is generated indicating that this could potentially be a new kind of attack. Otherwise, no alert is generated.
3. There is a third possibility, checked before the alert for the 2nd possibility is generated. If the distance from the attack cluster is greater than a certain radius surrounding that cluster, an alert is generated indicating that even though the connection is closer to attack data than to normal data, it deviates too much from known attacks to be classified as one; it is still reported as an anomaly. This radius is larger than the standard deviation of that cluster.

Appropriate information about the connection is also generated whenever it is not deemed normal, as in the 2nd possibility.

Update of Attack and Normal Clusters

After the connection information has been checked against the attack clusters and one of the above possibilities has been followed, the clusters need to be updated so that the clusters, and the system as a whole, can evolve. If the first possibility was followed, the cluster in which the feature fell (the feature that caused the packet to be labeled an attack) is simply updated. If the second possibility was followed and the connection was labeled a possible attack, a new attack cluster is made for that feature; if it was deemed normal, the corresponding normal cluster is updated. If the third possibility was followed, no change is made. The attack cluster update is done in the same way as the normal cluster update, and to optimize the clusters, re-clustering of the attack clusters is done at the end of the day, just as for the normal clusters.
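The three-way decision described above can be sketched as the following routine. This is an illustrative Python sketch; the function and argument names are ours, and the distances stand for the per-feature least distances to the cluster means described in the text:

```python
def classify_anomaly(in_attack_cluster, attack_dist, normal_dist, attack_radius):
    """Decide the fate of an anomalous feature in the attack
    clustering phase (cases 1-3 above).

    in_attack_cluster: the feature fell inside an attack cluster
    attack_dist:   least distance from the attack cluster means
    normal_dist:   least distance from the normal cluster means
    attack_radius: radius around the attack cluster (larger than
                   that cluster's standard deviation)
    """
    if in_attack_cluster:
        return "attack"                   # case 1
    if attack_dist < normal_dist:
        # case 3 is checked before the case 2 alert is generated
        if attack_dist > attack_radius:
            return "anomaly"              # closer to attacks, but too deviant
        return "possible new attack"      # case 2: alert for a new attack kind
    return "normal"                       # no alert
```

The return value then drives the cluster update: "attack" updates the matching attack cluster, "possible new attack" seeds a new attack cluster, "anomaly" leaves the clusters unchanged, and "normal" updates the normal cluster.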
3.3 Performance Tweaking

For the tweaking part of our project, the following variables were tuned:

Cluster Standard Deviations
a. FSR deviation: The final FSR deviation we settled on was 0.1. This makes sense, as FSR ranges from 0 to 3.
b. PSH deviation: The final PSH deviation was 0.2, since the range of PSH lies between 0 and 1.
c. ACK deviation: The final ACK deviation was 0.1; the range of ACK is 0 to 1.
d. Port deviation: The final Port deviation was 10, which was found to be more accurate.

All the cluster deviation values were determined by adjusting the deviations in both directions. In all cases, increasing a deviation resulted in fewer clusters and decreasing it in more clusters. In many cases, decreasing a deviation led to more false positives, while increasing it led to false negatives at higher values. The values were chosen by trial and error, starting with an educated guess and then checking which values minimized the error.

Window Size
The window size was set at 500, as that gave a good representation of the traffic flow while not being too hard on disk space and the initial window initialization.

Real-Time Optimizations
1. AVL trees are used to make searches more efficient.
2. A least-costly real-time update of the normal and attack clusters allows them to evolve and incorporate the network's normal data as it arrives.
3. New attack clusters representing new, unknown attacks are added so that the system can evolve.
4. Clusters are re-clustered at the end of the day. Not only does this keep a trace of history, it also makes the system accustomed to the network.
5. The Modified K-Means algorithm and the use of the standard deviation.
6. Clusters are kept separate for each feature.
This may produce more false positives, but it makes the architecture better at attack detection, because every feature must lie within a cluster to be classified as normal; if any feature fails to lie within a cluster for that feature, the connection is classified as an attack.

Payload Tree Size
The payload tree size was set at 100. This was an arbitrary value; we did not get much time to test different sizes.

Payload Attributes

Timed Attributes (for a 5-second time window)
i. Login attempts threshold: Set at 3. A normal user rarely attempts 3 logins in 5 seconds.
ii. HTTP requests threshold: Set at 10. Not tested.
iii. Directory creation attempts threshold: Set at 3. Normal directory creation attempts are also fewer than 3.

Derived Attributes
iv. Hot threshold: Set at 3, as it is an important threshold and higher values here are a good sign of anomalous activity.
v. Hidden directory threshold: Set at 3, for the same reason.
vi. Total failed logins threshold: Set at 15, because a normal user does not enter a wrong password 15 times in a session.
vii. Total directories created threshold: Set at 60, for the same reason as above.
viii. Compromised condition threshold: Set at 3, as it is an important threshold and higher values here are a good sign of anomalous activity.

Chapter Four
Prototype Results

4.1 Coverage

This measurement determines which attacks an IDS can detect under ideal conditions. As our IDS contains three layers, which deal with different types of data, we treat them separately.

4.1.1 The Pre-filtering Stage

The pre-filtering stage is the first stage that incoming network packets encounter. The first thing this layer does is check whether the packet is coming from a blocked IP; if it is, the network administrator is informed. Second, it checks whether a blocked port is being accessed; if so, appropriate information is displayed.
The information about blocked IPs and ports is provided by the user. Next, this stage performs checks on packet header attributes. These are well-known attributes that could not be used in clustering, because they are malicious only in certain combinations and not individually. They were taken from the Snort website, research papers, and related sites, and correspond to those signatures that only look at the header, not the payload. An anomaly is detected and the appropriate information displayed when:

1. The FIN flag is set together with any flag other than ACK. This detects scans such as SCAN nmap fingerprint attempt, SCAN synscan portscan, SCAN SYN FIN, and SCAN XMAS.
2. TTL > 220, SYN is set, and the acknowledgement number is 0. This detects SCAN myscan.
3. Reserved bits 1 and 2 of the TCP flags are set (this may lead to false positives, as we will see later). This detects scans such as SCAN cybercop os and SCAN synscan portscan, DOS attacks such as DDOS shaft synflood and DDOS mstream client to handler, and the BACKDOOR ACKcmdC trojan scan. It can also detect new attacks with these flags set.
4. The IP header fragment ID field value is 31337 (a value very popular with some hackers).
5. Acknowledgement = 0, flags = 0, and sequence = 0. This detects SCAN NULL.
6. ACK is set and the acknowledgement number is 0. This detects the SCAN NMAP TCP ping.
7. The More Fragments bit is set and the IP length is less than 256 bytes. This detects attacks that use incorrect fragmentation.

4.1.2 Layer 1: Anomaly Detection Stage

This is the stage that uses clustering. Clustering is performed on essentially two types of attributes:

Ports: Clustering is built around the ports that are normally used (those with services running on them), so when a port that is not in the clusters is accessed, it is flagged as an anomaly. This counteracts the SYN scan. Since most Trojans and viruses communicate back on an ephemeral port, or at least listen on one, their activity is also detected.
Flags: Clustering on flags detects flood attacks, such as SYN floods, floods of ACK packets, and FIN floods. In an attack, the counts of these individual flags in a connection are much larger than in a normal connection, which helps identify it as an attack. The advantage of this feature is that any new flood attack using header fields will also be detected by this scheme.

4.1.3 Layer 2: Payload Signature Matching and Payload Attributes Check Stage

Signature matching is not performed on every packet, because it is costly and cannot be done in real time. When a packet has cleared stage 2, the system checks whether stage 3 is free (i.e., not processing any packet). If it is free, the packet is sent for payload signature matching; otherwise, the packet is simply let through. The signature matching stage applies the signatures that require a payload check, detecting attacks that involve payloads, such as viruses and Trojans. The signatures were taken from the Snort website, and stage 3 operates on them; thus the number of attacks detected by signature matching depends on the rule base of attacks.

There is also the functionality of derived attributes. Attributes are derived from payload information, and the results indicate whether an intrusion is in progress. This detects failed logins, attempts to fill up disk space (for example, by creating many directories), system compromise attempts such as someone trying to access the root directory, incorrect file traversals, and HTTP floods.

4.2 False Positives

One area where false positives can occur is the port check. When a new application is installed, a new port comes into use, and when a sender tries to access it, an anomaly will be generated. However, this is a one-time issue: the user can update the training data to incorporate that port.
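The port check just described can be sketched as a simple set-membership test. This is illustrative only: the port numbers and the plain set standing in for the learned port clusters are assumptions, not the prototype's actual cluster structure.

```python
# Illustrative sketch: a plain set stands in for the learned port clusters.
NORMAL_PORTS = {22, 25, 80, 443}  # assumed ports with services running

def port_anomaly(dst_port):
    """Flag any access to a port outside the learned normal set; this is how
    new probes and Trojans listening on ephemeral ports are caught."""
    return dst_port not in NORMAL_PORTS

web_access = port_anomaly(80)    # known service: normal
probe = port_anomaly(31337)      # no service on this port: anomaly
```

Updating the training data for a newly installed application then corresponds to adding its port to the normal set, which removes the one-time false positive.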
In the flags check of the anomaly detection stage, there is also a possibility of false positives: the flag counts for a connection may come out large, indicating an anomaly, when in reality there is none. This is a tradeoff we are willing to make in order to detect attacks in real time and to detect new attacks.

One particular source of false positives is the check on the two TCP reserved bits, which involves ECN (Explicit Congestion Notification). ECN is a standard proposed by the IETF to cut down on network congestion and on routers dropping packets. RFC 2481 states that, to accomplish this, ECN uses four previously unused bits: two in the IP header and two in the TCP header. In the TCP header, these are the reserved bits, which should normally be set to zero. The problem is that scans and attacks also set these bits. Here is the tradeoff: if we consider the setting of these bits as normal, false negatives increase; if we consider it abnormal, false positives increase. We treat it as abnormal behavior, because it is better to receive false positives than false negatives; otherwise, we would compromise the security of our network.

Secondly, ECN uses the three-way handshake to determine whether a sender and receiver are ECN-compatible; only if they are compatible is ECN used, and otherwise communication proceeds in the normal way. Since the ECN concept is new and not many senders are compatible, most packets with their reserved bits set will be malicious rather than normal. We therefore do not treat setting the reserved bits as normal behavior and give more weight to detecting attacks. This increases our detection rate and reduces our false negative rate without increasing the false positive rate too much (because ECN adoption is still low), and is thus beneficial overall.
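The header-only conditions from the pre-filtering stage (Section 4.1.1), including the reserved-bits check discussed here, can be sketched as below. The dictionary field names and the flag bit encoding are assumptions for illustration; the actual prototype operates on raw captured packets.

```python
# Standard TCP flag byte layout; RES1/RES2 are the pre-ECN reserved bits.
FIN, SYN, RST, PSH, ACK, URG, RES1, RES2 = (1 << i for i in range(8))

def prefilter_anomalous(pkt):
    """Return True if any of the seven header-only conditions fires.
    `pkt` is an assumed dict of already-parsed header fields."""
    flags = pkt["tcp_flags"]
    checks = [
        # 1. FIN set together with a flag other than ACK (nmap/XMAS-style scans).
        flags & FIN and flags & ~(FIN | ACK),
        # 2. TTL > 220 with SYN set and acknowledgement number 0 (SCAN myscan).
        pkt["ttl"] > 220 and flags & SYN and pkt["ack_num"] == 0,
        # 3. Either TCP reserved bit set (cybercop/synscan, some DDOS/backdoor tools).
        flags & (RES1 | RES2),
        # 4. IP fragment ID of 31337, a value popular with some attack tools.
        pkt["ip_id"] == 31337,
        # 5. No flags, acknowledgement 0, sequence 0 (SCAN NULL).
        flags == 0 and pkt["ack_num"] == 0 and pkt["seq_num"] == 0,
        # 6. ACK set but acknowledgement number 0 (SCAN NMAP TCP ping).
        flags & ACK and pkt["ack_num"] == 0,
        # 7. More Fragments bit set with IP length under 256 bytes (bad fragmentation).
        pkt["more_fragments"] and pkt["ip_len"] < 256,
    ]
    return any(checks)
```

Note that with this encoding an ordinary initial SYN (ACK flag clear, acknowledgement 0) fires none of the checks, while a NULL scan fires condition 5.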
4.3 False Negatives

One area where false negatives can be generated is the pseudo-random phase. This phase does not apply to all packets going through the IDS; it selects packets in pseudo-real time and works on them. So even though a signature is present for an attack, it is possible that the attack will not be detected, because the packet was not picked up by the pseudo-real-time module.

In the anomaly detection stage, when comparing the clusters with incoming traffic, we use a bound of 2.32 times the standard deviation. If the information of a connection lies within this bound, it is considered normal; otherwise, it is considered malicious. Around 99% of the points in any cluster lie within 2.32 standard deviations of that cluster. If the information of a connection does not lie within this bound, it may be part of the remaining 1% that belongs to the cluster but falls outside 2.32 standard deviations; it will then not be detected, even though it should be.

4.4 Detection Probability

For the pseudo-random signature matching phase, detection is one hundred percent if the attack information (the attack is in the payload) is present in the rule base and the packet is picked for signature matching. If it is not picked, the packet enters the network. This depends on how busy the network is: the busier the network, the more packets are dropped by the pseudo-random phase. The anomaly detection phase detects only 99% of the anomalies that were present in the training data, as we saw in the false negatives section; the remaining 1% is not detected.

4.5 Handling High Bandwidth Traffic

The system we designed can be implemented in a distributed fashion. If deployed this way, then given the performance it currently achieves, it should work very efficiently on high-bandwidth traffic.
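The pseudo-random handoff described in Sections 4.3 and 4.4 can be sketched with a bounded queue standing in for the "stage 3 is free" check. This is a sketch under assumed names, not the prototype's actual mechanism.

```python
# Sketch of the pseudo-real-time handoff: a packet goes to the payload
# signature matcher only when that stage is idle; otherwise it is let
# through unchecked, which is the source of the false negatives above.
import queue

signature_queue = queue.Queue(maxsize=1)  # stage 3 handles one packet at a time

def dispatch(packet):
    """Return True if the packet was handed to the signature matcher,
    False if it was let through because stage 3 was busy."""
    try:
        signature_queue.put_nowait(packet)
        return True
    except queue.Full:
        return False

first = dispatch(b"payload-1")   # stage 3 idle: accepted for matching
second = dispatch(b"payload-2")  # stage 3 still busy: let through unchecked
```

The busier the network, the more often the queue is full, so the fraction of unchecked packets grows with traffic load, exactly as described above.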
The pre-filter part is computationally efficient (very lightweight, as only seven simple comparisons are performed). The anomaly detection phase likewise simply compares the connection information with the already-defined clusters; since the information is numeric, these are plain greater-than-or-not comparisons. This is very fast and, according to research in this field, is among the best and latest techniques for real-time intrusion detection based on headers. Signature matching is comparatively slow and is therefore performed in pseudo-real time: if the traffic is fast, it will check fewer packets than when the traffic is slower.

4.6 Ability to Detect New Attacks

New attacks can be detected in the anomaly detection stage. Almost all kinds of floods using the TCP header can be detected. Clustering on flags detects flood attacks: in an attack, the counts of these individual flags in a connection are quite large compared with a normal connection, which identifies it as a flood attack. Thus new flood attacks can be detected.

Where payload-derived attributes are checked, new attacks can be detected because the attributes are extracted from domain knowledge developed by domain experts, who claim it is capable of detecting many new attacks.

As the clustering is built around the ports that are normally used (those with services running on them), a port that is not in the clusters being accessed is considered an anomaly. So when someone tries to access a port with no service running on it, an alert is generated; thus any type of new probe that tries to access a new port on our machine is detected.

Another area where new attacks can be detected is the pre-filtering stage, where new attacks matching the seven tested conditions can be detected.
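The cluster-membership bound from Section 4.3 that underlies this anomaly detection can be sketched as follows. The per-feature (mean, standard deviation) representation and the sample numbers are assumptions for illustration, not the prototype's data structures.

```python
# Sketch of the "2.32 times the standard deviation" membership test: a
# connection matches a cluster only if every feature lies within 2.32
# standard deviations of that cluster's mean for the feature.
Z_BOUND = 2.32  # roughly 99% of a cluster's points fall inside this bound

def within_cluster(features, cluster):
    """`cluster` maps each feature name to an assumed (mean, stddev) pair."""
    return all(
        abs(features[name] - mean) <= Z_BOUND * stddev
        for name, (mean, stddev) in cluster.items()
    )

normal_cluster = {"fsr": (0.5, 0.1), "psh": (0.3, 0.2), "ack": (0.6, 0.1)}
ok = within_cluster({"fsr": 0.55, "psh": 0.25, "ack": 0.62}, normal_cluster)
bad = within_cluster({"fsr": 1.5, "psh": 0.25, "ack": 0.62}, normal_cluster)
```

Because the comparisons are purely numeric, each test is a handful of subtractions and comparisons, which is what keeps this stage fast enough for real time.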
4.7 Limitations of the Model

The limitations of the model include:

Coverage: Our model incorporates only a subset of the current signatures and of the domain-knowledge attributes that we were able to obtain from our research. This limitation can be overcome by incorporating the remainder into the current model.

Pseudo-real time: The pseudo-real time incorporated in our model is a significant limitation, as it means that a certain number of packets are left unchecked by the signature matcher. This limitation can be greatly reduced by adopting a distributed model, with layer one running in real time and layer two on another computer. This would distribute the resources, give each computer more time to capture packets, and bring the pseudo-real-time model close to a true real-time model.

Attacks not catered for: Although we have tried to incorporate as much as we could in the given time, for the reasons mentioned above and others, some attacks are not detected. One class consists of attacks we have simply not yet had the chance to incorporate, for which we have an answer. The second class consists of attacks that our model does not cater for at all and cannot improve upon; these mainly lie in the U2R and R2L categories. Such attacks require an anomaly detector for each user of the computer, which our model does not provide, as we use an anomaly detector for network data rather than personal usage data.

No response system: At present our model does not include a response system for incoming attacks, meaning that we rely solely on the administrator to decide what to do when an attack is found and reported. A response system could be introduced, which would stand a better chance of responding to attacks because of its automated nature.
Such a system could respond to attacks directly rather than letting them through, whereas at present stopping an attack depends entirely on the administrator's response.

Chapter Five
Future Enhancements

In the signature matching part, not all of the signatures could be incorporated, so those that are absent are not detected by the system; however, they can be added to our rule base. The DDOS attack that spoofs source addresses is not detected by the system, because it must be handled differently from the usual DOS attacks; the system can be extended to incorporate it. Our design can be implemented in a distributed fashion, which would improve its efficiency on high-bandwidth networks. An anomaly detection mechanism could be set up for all important personnel, so that if a U2R or R2L attack is launched against them and succeeds, the anomaly detector could detect the deviated behavior in the usage of that system or account. The clustering algorithm can be further optimized to re-cluster itself when the clusters become inefficient.

Chapter Six
Conclusion

IBIDS is a hybrid of anomaly detection, misuse detection, and domain-knowledge-based attribute extraction. The prototype we developed is capable not only of detecting existing attacks but also of detecting new ones. Although there is much room for future enhancements, a number of methods have been incorporated into this system to improve detection. In short, IBIDS is:

• Efficient: saves time and space.
• Able to process headers in real time.
• Able to process payloads in pseudo-real time.
• Able to detect anomalous behavior and new attacks.
• Open to future enhancements.

References

[1] http://www.ll.mit.edu/IST/ideval/data/data_index.html
[2] http://kdd.ics.uci.edu/databases/kddcup99/task.html
[3] Weijie Cai and Li Li, "Anomaly Detection Using TCP Header Information"
[4] S. Terry Brugger, "Data Mining Methods for Network Intrusion Detection", June 2004.
[5] Salvatore J. Stolfo, "Cost-based Modeling for Fraud and Intrusion Detection: Results from the JAM Project"
[6] Wenke Lee and Salvatore J. Stolfo, "Real Time Data Mining-based Intrusion Detection"
[7] Wenke Lee, "A Data Mining Framework for Constructing Features and Models for Intrusion Detection Systems"
[8] Binh Viet Nguyen, "An Application of Support Vector Machines to Anomaly Detection"
[9] Wei Lu and Issa Traore, "Detecting New Forms of Network Intrusion Using Genetic Programming"
[10] Aleksandar Lazarevic, "A Comparative Study of Anomaly Detection Schemes in Network Intrusion Detection"
[11] Wenke Lee and Salvatore J. Stolfo, "A Data Mining Framework for Building Intrusion Detection Models"
[12] Paul Dokas and Levent Ertoz, "Data Mining for Network Intrusion Detection"
[13] Matthew V. Mahoney and Philip K. Chan, "PHAD: Packet Header Anomaly Detection for Identifying Hostile Network Traffic"
[14] Luca Deri, Gaia Maselli, and Stefano Suin, "Design and Implementation of an Anomaly Detection System"
[15] Maheshkumar Sabhnani and Gursel Serpen, "KDD Feature Set Compliant Heuristic Rules for R2L Attack Detection"
[16] Alexandr Seleznyov and Seppo Puuronen, "Anomaly Intrusion Detection Systems: Handling Temporal Relations between Events"
[17] Vasilios A. Siris and Fotini Papagalou, "Application of Anomaly Detection Algorithms for Detecting SYN Flooding Attacks"
[18] Marina Bykova, Shawn Ostermann, and Brett Tjaden, "Detecting Network Intrusions via a Statistical Analysis of Network Packet Characteristics"
[19] Eleazar Eskin and Andrew Arnold, "A Geometric Framework for Unsupervised Anomaly Detection: Detecting Intrusions in Unlabeled Data"
[20] http://www.protocols.com/pbook/tcpip2.htm