The Second International Conference on Innovations in Information Technology (IIT’05)

HoneyAnalyzer – Analysis and Extraction of Intrusion Detection Patterns & Signatures Using Honeypot

Urjita Thakar
Reader, Department of Computer Engineering, Shri G.S. Institute of Technology and Science
23, Park Road, Indore (MP) 452003 INDIA

Sudarshan Varma
Department of Computer Engineering, Shri G.S. Institute of Technology and Science
23, Park Road, Indore (MP) 452003 INDIA

A.K. Ramani
Professor, School of Computer Science, DAVV
Khandwa Road, Indore (MP) 452001 INDIA

Correspondence Email: [email protected], [email protected]

ABSTRACT

A honeypot is a security resource that is intended to be attacked and compromised in order to gain more information about the attacker and his attack techniques. A honeypot can also indicate how to perform forensics. The information gathered by watching a honeypot being probed is invaluable: it reveals attacks and attack patterns. Currently, the creation of intrusion detection signatures is a tedious process that requires detailed knowledge of the traffic characteristics of the phenomenon to be detected. In this paper we address these issues. We propose HoneyAnalyzer, a tool for analyzing honeyd logs in an RDBMS with a web-based monitoring interface. The data collected from the honeypot is analyzed for possible attacks, scans, and viruses. The system displays the honeyd logs as well as traffic analyzer (e.g. Tcpdump) logs in a well-defined graphical manner so that a security administrator can filter the honeypot's log data. We also propose the use of a signature extraction algorithm such as LCS (Longest Common Substring) on the data filtered out by the administrator. The security administrator thus has the flexibility to apply the signature extraction algorithm to the data of his choice, resulting in more precise attack signature extraction.

Keywords: Honeypot, Intrusion Detection, Attack Signatures, Security.

1. INTRODUCTION

Intrusion detection systems (IDS) have become an important component in the security administrator's toolbox. More specifically, IDS tools aim to detect computer attacks and/or computer misuse, and to alert the proper individuals upon detection. Intrusion detection systems serve three essential security functions: they monitor, detect and respond to unauthorized activity of organization insiders and outsiders [1]. They use policies to define certain events that, if detected, trigger an alert in the form of a sound or email. Intrusion detection systems are an integral and necessary element of a complete information security infrastructure, functioning as "the logical complement to network firewalls". IDS tools allow for complete supervision of networks, regardless of the action being taken, such that information will always exist to determine the nature of the security incident and its source.

A honeypot is a highly flexible security tool with differing applications for security [2]. Honeypots do not fix any problem; instead they are useful for intrusion prevention, detection and information gathering. A honeypot is a security resource that does not have any production or authorized activity. This makes it very simple to use. A honeypot's greatest value lies in its simplicity, because it is a device that is intended to be compromised [3]. This means that there is little or no production traffic going to or from it. Any time a connection is sent to the honeypot, it is most likely a probe, a scan, or even an attack. A honeypot collects very little data, and what it collects is normally of high value. This information can be used in the extraction of intrusion detection signatures.

There are two basic techniques to detect intruders: anomaly detection and misuse detection (signature detection).
Anomaly detection is designed to uncover abnormal patterns of behavior: the IDS establishes a baseline of normal usage patterns, and anything that deviates widely from it is flagged as a possible intrusion [5]. Data mining techniques are generally applied to this category of intrusion detection. Misuse detection, commonly called signature detection, uses specifically known patterns of unauthorized behavior to predict and detect subsequent similar attempts. These specific patterns are called signatures. In misuse detection, therefore, the attack signature is at the heart of the IDS.

Various experiments on data mining based intrusion detection systems [14] have demonstrated the effectiveness of classification models in detecting anomalies, but the accuracy of the detection models depends on sufficient training data and the right feature set. The data mining method is not suitable for signature extraction in combination with honeypots, as they provide very little useful data. Signatures can be generated through approaches such as network grepping / pattern matching, protocol decode/analysis, heuristics, and honeypots.

Current intrusion detection systems often work as misuse detectors, where the packets in the monitored network are compared against a repository of signatures that define characteristics of an intrusion. A successful match causes an alert to be fired. A signature often consists of one or more specific binary patterns found in a given network packet, and can be described as a Boolean relation called a rule [6]. An intrusion detection system is able to recognize an attack only when it knows a signature for that attack, and thus requires continuous updates of its signature database. Continuous research to analyze new attacks and find their signatures is also a must.
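
As an illustration of a signature expressed as a Boolean rule, the following minimal Python sketch treats a rule as a conjunction of predicates over a packet. The Packet and Signature classes, their field names, and the example pattern are hypothetical and chosen only for illustration; they do not correspond to the rule language of any particular IDS.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified view of a captured packet."""
    protocol: str   # e.g. "tcp" or "udp"
    dst_port: int
    payload: bytes


@dataclass(frozen=True)
class Signature:
    """Hypothetical rule: protocol AND destination port AND byte pattern."""
    name: str
    protocol: str
    dst_port: int
    pattern: bytes  # binary pattern expected somewhere in the payload

    def matches(self, packet: Packet) -> bool:
        # The rule is true only if every predicate holds for the packet.
        return (
            packet.protocol == self.protocol
            and packet.dst_port == self.dst_port
            and self.pattern in packet.payload
        )


def alerts(packet: Packet, signatures: Iterable[Signature]) -> List[str]:
    """Return the names of all signatures that the packet matches."""
    return [s.name for s in signatures if s.matches(packet)]


if __name__ == "__main__":
    # Illustrative values only; the pattern is not a real published signature.
    rules = [Signature("demo-rule", "tcp", 80, b"cmd.exe")]
    print(alerts(Packet("tcp", 80, b"GET /scripts/cmd.exe?/c+dir HTTP/1.0"), rules))
    print(alerts(Packet("tcp", 80, b"GET /scripts/CMD.EXE?/c+dir HTTP/1.0"), rules))
```

In this sketch a payload that differs only in letter case no longer matches, which shows how brittle an exact byte pattern can be.
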
Moreover, a slight change in the attack scenario may be enough to alter the attack signature and thus fool a signature filter. Signature-based systems are consequently vulnerable to polymorphic attacks and other evasion techniques, which are expected to grow in the near future.

At present, the creation of these signatures is a tedious process that requires detailed knowledge of each software exploit and analysis of a large pool of ASCII log data. Automated signature extraction, e.g. applying the longest common substring (LCS) algorithm to the database of attack log data as presented in [5], extracts the binary pattern blindly, resulting in more false positives. Thus there is a need to generate more precise attack signatures. Simplistic signatures tend to generate large numbers of false positives; overly specific ones cause false negatives.

To address these issues, this paper presents HoneyAnalyzer, a tool that helps the security administrator generate precise signatures of malicious network traffic. The proposed system uses honeyd [4], a popular low-interaction open-source honeypot, to collect the intruder's log. Honeyd simulates hosts with individual networking personalities. It intercepts traffic sent to nonexistent hosts and uses the simulated systems to respond to this traffic. Each host's personality can be individually configured in terms of OS type and running network services.

This paper focuses on graphical visualization of the attacks and accesses made on various ports of the different simulated honeypot machines. The proposed system gives the security administrator flexibility by providing a good graphical interface for filtering the data. On this filtered data he can apply attack signature algorithms and obtain a balanced attack signature that will not give too many false positives or negatives. A security administrator can apply the LCS algorithm for signature extraction on the data of his choice.
Therefore, this manual intervention will give more precise signatures.

2. BACKGROUND

2.1 Intrusion Detection Signatures

The purpose of attack signatures is to describe the characteristic elements of attacks. A signature can be a portion of code, a pattern of behavior, a sequence of system calls, etc. There is currently no common standard for defining these signatures. As a consequence, different systems provide signature languages of varying expressiveness. A good signature must be narrow enough to capture precisely the characteristic aspects of the exploit it attempts to address; at the same time, it should be flexible enough to capture variations of the attack. Failure to generate good signatures leads to large numbers of either false positives or false negatives.

Content-based signature generation [10] is the process of extracting attack signatures by selecting the most frequently occurring byte sequences across the flows in the suspicious flow pool. To do so, algorithms such as LCS are applied to extract the common patterns, because a malicious payload appears with increasing frequency as the malicious activity spreads.

2.2 Honeypots

The honeypot has emerged as an effective tool for observing and understanding intruders' toolkits, tactics, and motivations [7]. A honeypot suspects every packet transmitted to or from it, giving it the ability to collect highly concentrated and less noisy datasets for network attack analysis. Honeypots are decoy computer resources set up for the purpose of monitoring and logging the activities of entities that probe, attack or compromise them [8]. Activities on honeypots can be considered suspicious by definition, as there is no reason for benign users to interact with these systems.
Honeypots come in many shapes and sizes; examples include dummy items in a database, low-interaction network components like preconfigured traffic sinks, or full-interaction hosts with real operating systems and services [9]. Honeypots excel at detection, addressing many of the problems of traditional detection. They reduce false positives by capturing small data sets of high value, capture unknown attacks such as new exploits or polymorphic shellcode, and work in encrypted and IPv6 environments [6]. In general, low-interaction honeypots make the best solutions for detection, as they are easier to deploy and maintain.

3. THE PROPOSED METHOD

The proposed signature extraction system consists of three major parts:

i) Data Capture, i.e. traffic logging components: this part includes Honeyd and Tcpdump for data collection.
ii) Data Analysis, i.e. analysis and extraction components: this part contains the data analysis part of the signature extraction mechanism for extracting precise attack signatures.
iii) Signature Extraction, i.e. the steps to extract good quality attack signatures.

3.1 Data Capture

The purpose of data capture is to log all the activities of an attacker. The honeypot does exactly this, i.e. it collects information. The HoneyAnalyzer system has two sources of data: the honeypot log and the network traffic log from Tcpdump. The Honeyd framework supports several ways of logging network activity. It can create connection logs that report attempted and completed connections for all protocols. But to analyze the complete attack scenario, the system needs the full payload of the packets entering and leaving the honeypot. This task is performed by the second element, Tcpdump, which captures every packet's full payload. Tcpdump is a tool for network monitoring and one of the best-known sniffers for Linux. Built on the libpcap (packet capture library) interface, it collects information from packets on the network, including those intended for other host machines.
It does this through a network interface card's ability to enter promiscuous mode. It then dumps packet header information into the log file.

3.2 Data Analysis

In order to extract precise attack signatures, a data analyzer has been developed, as shown in Figure 1. The web interface gives a graphical output with which the security administrator can easily find the most attacked port and the most attacked IP address in the form of pie charts, as shown in Figures 2 and 3. The proposed methodology for realizing HoneyAnalyzer to extract more precise attack signatures is described below:

i) Configure honeyd to simulate a network.
ii) Run Tcpdump for traffic analysis.
iii) Invoke the auto-run shell script, which runs at a fixed time interval and executes the parser utility that parses the data from the honeyd log file and inserts it into the database, as shown in Figure 1. The parser utility can be realized in any language that has strong string tokenization capability, such as Java.
iv) Execute the auto-run shell script to push the honeyd log data into the database. This is invoked by cron.
v) Log in to the web interface to view the attack patterns and analyze the data for extraction of good quality signatures.

Figure 1: HoneyAnalyzer's architecture, illustrating honeyd simulating a number of different machines, each running a number of pre-configured services. HoneyAnalyzer hooks into the wire to see incoming and outgoing connections and provides the web interface.

To enable the security administrator to select the suspicious data, the web GUI has the following features:

i) Ability to display packet information from the database.
ii) Ability to display real-time network traffic from data stored in the database, as well as historical traffic statistics.
iii) Display of the ports that were attacked within a certain time range, using pie charts.
iv) A timeline-based hit statistic showing how many hits per second the honeypot received in a certain time range.
v) Pie charts showing which remote IP addresses "visited" the honeypot in a certain time range. Here it is possible to specify a port number to show activity on a specific port.
vi) A textual hit statistic over a certain time range. By specifying an IP address or a port number it is possible to focus on specific events.

Figure 2: A quick summary of hits on a particular port (here port number 137) by various IP addresses.

Figure 3: A quick summary of hits by a particular IP address, e.g. hits by the machine 192.168.0.39 on various ports.

In the proposed method, the database module is useful mainly for two reasons. First, it is easier to search for a particular packet or range of packets using a database; all one has to do is construct the correct query. Second, the database facilitates different representations of the generated data. The database records all the packets (IP, TCP, and UDP) that are received by the honeypot and Tcpdump. The graphical interface can be run independently of the honeypot and without any type of honeypot configuration. This independence comes from the database module described above. Since past events are all recorded in a database, the web GUI can analyze events without interfering with the normal operation of the honeypot. In this way the proposed system allows for a good selection of data for extracting attack signatures, in contrast to existing methods, which blindly apply the content-based signature extraction algorithm to all the data captured by the honeypot.
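
The parser and query steps described in this section can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: it assumes a simplified whitespace-separated record (timestamp, protocol, source IP, source port, destination IP, destination port) rather than the exact honeyd log format, uses SQLite in place of the unspecified RDBMS, and the table name hits and all function names are hypothetical.

```python
import sqlite3


def parse_line(line):
    """Tokenize one simplified log record; return None if it is malformed."""
    fields = line.split()
    if len(fields) < 6:
        return None
    ts, proto, src_ip, src_port, dst_ip, dst_port = fields[:6]
    try:
        return (ts, proto, src_ip, int(src_port), dst_ip, int(dst_port))
    except ValueError:
        return None


def load(conn, lines):
    """Insert all well-formed records into the database; return their count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits ("
        "ts TEXT, proto TEXT, src_ip TEXT, src_port INTEGER, "
        "dst_ip TEXT, dst_port INTEGER)"
    )
    rows = [row for row in map(parse_line, lines) if row is not None]
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def hits_per_port(conn, start, end):
    """Hit count per destination port in [start, end], e.g. for a pie chart."""
    cur = conn.execute(
        "SELECT dst_port, COUNT(*) AS n FROM hits "
        "WHERE ts BETWEEN ? AND ? "
        "GROUP BY dst_port ORDER BY n DESC, dst_port",
        (start, end),
    )
    return cur.fetchall()


def hits_by_source(conn, dst_port, start, end):
    """Hit count per remote IP address on one port in [start, end]."""
    cur = conn.execute(
        "SELECT src_ip, COUNT(*) AS n FROM hits "
        "WHERE dst_port = ? AND ts BETWEEN ? AND ? "
        "GROUP BY src_ip ORDER BY n DESC, src_ip",
        (dst_port, start, end),
    )
    return cur.fetchall()


if __name__ == "__main__":
    # Illustrative records; timestamps are ISO-like so that string
    # comparison in SQL matches chronological order (an assumption).
    sample = [
        "2005-09-26T10:00:01 udp 192.168.0.39 1027 10.0.0.5 137",
        "2005-09-26T10:00:02 udp 192.168.0.40 1030 10.0.0.5 137",
        "2005-09-26T10:00:03 tcp 192.168.0.39 2201 10.0.0.6 445",
    ]
    db = sqlite3.connect(":memory:")
    load(db, sample)
    print(hits_per_port(db, "2005-09-26T10:00:00", "2005-09-26T10:00:59"))
```

Keeping the aggregation in SQL means the web interface only needs to render the returned (label, count) pairs, which is one way to keep the GUI independent of the honeypot itself.
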
3.3 Signature Extraction

The graphical interface supports applying the LCS algorithm to the data of interest, whereas present systems apply the LCS algorithm to the whole data. The process of finding attack signatures is not fully automated; rather, it also depends on the security administrator's (SA) wisdom and experience. The SA can choose the traffic to which the LCS algorithm is applied. The resulting precise signature will give fewer false positives and false negatives. The steps followed for finding a good quality attack signature are as follows:

i) Identify data of interest (i.e. of significance) from the database by looking at the web GUI.
ii) Analyze combined data from the different data sources, i.e. honeypot and Tcpdump. For each received packet, initiate the following sequence of activities:
   a) If there is an existing connection state for the new packet, that state is updated; otherwise a new state is created.
   b) If the packet is outbound, do not process the packet.
   c) Perform protocol analysis [6] at the network and transport layers.
   d) For each stored connection, perform header comparison in order to detect matching IP networks, initial TCP sequence numbers, etc.
iii) Apply the content-based string matching algorithm to the payload of interest through the following sequence of activities:
   a) If the connections have the same destination port, perform pattern detection on the exchanged messages with the help of the Longest Common Substring algorithm. A description of string-based pattern detection is given in [10].
   b) If a new signature is created in the process, use it to augment the signature pool; otherwise stop the process.

DISCUSSION & CONCLUSIONS

The HoneyAnalyzer presented in this paper should be useful in extracting good quality signatures from the data obtained from the logs of the honeypot and the traffic analyzer.
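
The string matching step of Section 3.3 (step iii) can be sketched as below. This is a minimal Python illustration under stated assumptions: connections are given as (destination port, payload) pairs already selected by the administrator, LCS is computed pairwise with a simple dynamic programming table, and the min_length threshold and all function names are hypothetical additions rather than part of the described system.

```python
from collections import defaultdict
from itertools import combinations


def longest_common_substring(a: bytes, b: bytes) -> bytes:
    """Return one longest contiguous byte string that occurs in both inputs."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def candidate_signatures(connections, min_length=4):
    """Pairwise LCS over payloads that share a destination port.

    connections: iterable of (dst_port, payload) pairs.
    min_length: hypothetical cutoff that discards very short matches.
    Returns a dict mapping each port to a sorted list of candidates.
    """
    by_port = defaultdict(list)
    for port, payload in connections:
        by_port[port].append(payload)

    found = defaultdict(set)
    for port, payloads in by_port.items():
        for first, second in combinations(payloads, 2):
            common = longest_common_substring(first, second)
            if len(common) >= min_length:
                found[port].add(common)
    return {port: sorted(sigs) for port, sigs in found.items()}


if __name__ == "__main__":
    # Illustrative payloads only, not captured traffic.
    selected = [
        (80, b"GET /a?x=EVILPAYLOAD1"),
        (80, b"POST /b EVILPAYLOAD2"),
        (137, b"hello"),
    ]
    print(candidate_signatures(selected))
```

The quadratic table is adequate for short payloads; for large captures a suffix-tree or suffix-automaton based LCS would likely be preferable.
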
It has been observed that traditional methods generate a larger number of signatures than HoneyAnalyzer, i.e. lack of knowledge of protocol semantics and of the local network produces more signatures containing benign substrings. Honeycomb was one of the first efforts to address the problem of automatic signature generation from honeypot traces [5]. An evaluation of Honeycomb performed in [13] shows that while there were several perfectly functional signatures, there were also a surprisingly large number of benign strings identified by the LCS algorithm. Some of these were short strings such as “GET” or “HTTP” that are clearly impractical and just happened to be the longest common substring between unrelated sessions. These were part of normal operation and were suppressed by white-listing signatures smaller than a certain length [13]. There were also much longer strings in the signature set, such as proxy headers, that likewise do not represent real attack signatures. Thus, the only way to avoid these kinds of problems is manual grooming of signatures by an expert with protocol knowledge.

A comparison of HoneyAnalyzer and Honeycomb is as follows:

i) Pairwise LCS as employed by Honeycomb often leads to redundant (non-identical) signatures, which would generate multiple alarms for the same attack. HoneyAnalyzer, in contrast, generalizes the approach such that a security administrator who is aware of protocol semantics can groom the signatures, making the process far less prone to redundant signature production.
ii) Honeycomb's lack of semantics awareness leads to signatures consisting of benign substrings. These lead to false positives; thus Honeycomb is unable to produce precise signatures for protocols such as NetBIOS and MS-SQL, or for HTTP attacks such as Nimda, where the exploit content is a small portion of the entire attack string.
In the case of HoneyAnalyzer, semantics awareness is the responsibility of the security administrator, who can better understand the benign substrings of the local network and can filter out redundant and useless strings.

Thus the signatures obtained through HoneyAnalyzer are of high quality and result in more precise intrusion detection, without too many false positives or negatives. HoneyAnalyzer can also act as an intrusion indicator, i.e. show how, when and from where different intrusion attempts are taking place. This can be shown through the graphical interface. Honeypots are increasingly deployed in networks; however, they are mostly used passively, with administrators merely watching what happens. The proposed system gives the security administrator better control over the intrusion detection process for extracting good quality attack signatures.

Future Work

In the future, an attempt can be made to add implementations of further algorithms and techniques, such as connection tracking, protocol analysis, and pattern detection in flow content, on the basis of which the security administrator can perform the analysis and extract signatures with even greater precision. To make HoneyAnalyzer more flexible, additional parameters can also be added, such as negative interpretation of input, e.g. Port != 445, which would show activity on all ports except 445. A quantitative comparison between the existing method and the proposed method also needs to be done to illustrate the advantages of the proposed system over the existing one.

REFERENCES

[1] Paul Innella and Oba McMillan, "An Introduction to Intrusion Detection Systems", http://www.securityfocus.com/infocus/1520.
[2] Christian Plattner, Reto Baumann, "White Paper: Honeypots", http://www.inf.ethz.ch/personal/plattner/pdf/whitepaper.pdf.
[3] Lance Spitzner, "The Value of Honeypots, Part One: Definitions and Values of Honeypots", http://www.securityfocus.com/infocus/1492.
[4] Niels Provos, "A Virtual Honeypot Framework", in Proceedings of the 13th USENIX Security Symposium (Security 2004), San Diego, CA, August 2004, pp. 1-14.
[5] Christian Kreibich, Jon Crowcroft, "Honeycomb – Creating Intrusion Detection Signatures Using Honeypots", ACM SIGCOMM Computer Communication Review, Volume 34, Issue 1, January 2004, pp. 51-56.
[6] Erwan Lemonnier, Defcom, "Protocol Anomaly Detection in Network-based IDSs", http://erwan.lemonnier.free.fr/.
[7] Lance Spitzner, "Honeypots: Simple, Cost-Effective Detection", http://www.securityfocus.com/infocus/1690.
[8] Martin Roesch, "Snort – Lightweight Intrusion Detection for Networks", Proceedings of USENIX 13th System Administration Conference, November 1999, pp. 229-238.
[9] Yuqing Mai, Radhika Upadrashta and Xiao Su, "J-Honeypot: A Java-Based Network Deception Tool with Monitoring and Intrusion Detection", Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC'04), Volume 1, April 05-07, 2004, pp. 804-808.
[10] Hyang-Ah Kim, Brad Karp, "Autograph: Toward Automated, Distributed Worm Signature Detection", in Proceedings of the 13th USENIX Security Symposium, San Diego, CA, August 2004, pp. 271-286.
[11] Peng Ning, Dingbang Xu, "Learning Attack Strategies from Intrusion Alerts", in Proceedings of the 10th ACM Conference on Computer and Communications Security, October 2003, pp. 200-209.
[12] Peng Ning, Yun Cui, Douglas Reeves, and Dingbang Xu, "Tools and Techniques for Analyzing Intrusion Alerts", ACM Transactions on Information and System Security, Vol. 7, No. 2, May 2004, pp. 273-318.
[13] Vinod Yegneswaran, Jonathon T. Giffin, Paul Barford, and Somesh Jha, "An Architecture for Generating Semantics-Aware Signatures", in 14th USENIX Security Symposium, Baltimore, Maryland, August 2005. To appear.
[14] V.V. Patriciu, I. Priescu, "Using Data Mining Techniques for increasing Security in E-mail System Internet-based", in 11th Conference CAIM, Oradea, 2003.