The Second International Conference on Innovations in Information Technology (IIT’05)
HoneyAnalyzer – Analysis and Extraction of Intrusion Detection Patterns
& Signatures Using Honeypot
Urjita Thakar
Reader, Department of Computer Engineering, Shri G.S. Institute of Technology and Science
23, Park Road, Indore (MP) 452003 INDIA
Sudarshan Varma
Department of Computer Engineering, Shri G.S. Institute of Technology and Science
23, Park Road, Indore (MP) 452003 INDIA
A.K. Ramani
Professor, School of Computer Science, DAVV
Khandwa Road, Indore (MP) 452001 INDIA
Correspondence Email: [email protected], [email protected]
ABSTRACT
A honeypot is a security resource that is intended to be attacked and compromised in order to gain more information
about the attacker and his attack techniques. A honeypot can also indicate how to perform forensics. The
information gathered by watching a honeypot being probed is invaluable: it reveals attacks and
attack patterns. Currently, the creation of intrusion detection signatures is a tedious process that requires
detailed knowledge of the traffic characteristics of the phenomenon to be detected. In this paper we address
these issues. We propose HoneyAnalyzer, a tool for analyzing honeyd logs in an RDBMS with a web-based
monitoring interface. The data collected from the honeypot is analyzed for possible attacks, scans, and
viruses. The system displays the honeyd logs as well as traffic analyzer (e.g. Tcpdump) logs in a well-defined
graphical manner so that a security administrator can filter the honeypot's log data. We also propose the use
of a signature extraction algorithm such as LCS (Longest Common Substring) on the data filtered out by the
administrator. The security administrator thus has the flexibility to apply the signature extraction algorithm to
the data of his choice, resulting in more precise attack signature extraction.
Keywords: Honeypot, Intrusion Detection, Attack Signatures, Security.
1. INTRODUCTION
Intrusion detection systems (IDS) have become an important component in the Security
Administrator's toolbox. More specifically, IDS tools aim to detect computer attacks and/or computer
misuse, and to alert the proper individuals upon detection. Intrusion detection systems serve three
essential security functions: they monitor, detect and respond to unauthorized activity of organization
insiders and outsiders [1]. Intrusion detection systems use policies to define certain events that, if
detected, trigger an alert in the form of a sound or email. Intrusion detection systems are an integral
and necessary element of a complete information security infrastructure, functioning as "the logical
complement to network firewalls". IDS tools allow for complete supervision of networks, regardless
of the action being taken, such that information will always exist to determine the nature of the
security incident and its source. A honeypot is a highly flexible security tool with differing applications
for security [2]. Honeypots do not fix any problem; instead, they are useful for intrusion
prevention, detection and information gathering. A honeypot is a security resource that does not have
any production or authorized activity. This makes it very simple to use. A honeypot's greatest value
lies in its simplicity, because it is a device that is intended to be compromised [3]. This means that
there is little or no production traffic going to or from it. Any connection made to the
honeypot is most likely a probe, a scan, or even an attack. A honeypot collects very little data,
and what it collects is normally of high value. This information can be used to extract intrusion
detection signatures.
There are two basic techniques to detect intruders: anomaly detection and misuse detection (signature
detection). Anomaly detection is designed to uncover abnormal patterns of behavior: the IDS
establishes a baseline of normal usage patterns, and anything that widely deviates from it is flagged
as a possible intrusion [5]. Data mining techniques are generally applied to this category of
intrusion detection, i.e. anomaly detection. Misuse detection, commonly called signature detection,
uses specifically known patterns of unauthorized behavior to predict and detect subsequent similar
attempts. These specific patterns are called signatures. In misuse detection, therefore, the attack
signature is at the heart of the IDS. Various experiments on data mining based intrusion detection
systems [14] have demonstrated the effectiveness of classification models in detecting anomalies, but
the accuracy of the detection models depends on sufficient training data and the right feature set. Data
mining methods are therefore not suitable for signature extraction in combination with honeypots, which
by design collect only a small amount of data. Signatures can be generated through approaches such as
network grepping / pattern matching, protocol decode/analysis, heuristics and honeypots.
Current intrusion detection systems often work as misuse detectors, where the packets in the
monitored network are compared against a repository of signatures that define characteristics of an
intrusion. Successful matching causes alerts to be fired. The signature often consists of one or more
specific binary patterns found in a given network packet. The signature can be described as a Boolean
relation called a rule [6]. An intrusion detection system is able to recognize an attack only when it
knows a signature for this attack, and thus requires continuous updates of its signature database.
Continuous research to analyze new attacks and find their signatures is also a must. Moreover, a
slight change in the attack scenario may be enough to alter the attack signature and thus fool a
signature filter. Such systems are consequently vulnerable to polymorphic attacks and other evasion
techniques, which are expected to grow in the near future. At present, the creation of these signatures
is a tedious process that requires detailed knowledge of each software exploit and analysis of a large
pool of ASCII log data. Automated extraction of signatures, e.g. application of the longest
common substring (LCS) algorithm to the database of attack log data as presented in [5], extracts the
binary pattern blindly, resulting in more false positives. Thus there is a need to generate more
precise attack signatures: overly simplistic signatures tend to generate large numbers of false
positives, while overly specific ones cause false negatives.
To address these issues, this paper presents HoneyAnalyzer, a tool that helps the security
administrator in generating precise signatures of malicious network traffic. The proposed system uses
honeyd [4], a popular low-interaction open-source honeypot for collecting intruder’s log. Honeyd
simulates hosts with individual networking personalities. It intercepts traffic sent to nonexistent hosts
and uses the simulated systems to respond to this traffic. Each host's personality can be individually
configured in terms of OS type and running network services. This paper focuses on graphical
visualization of the attacks/accesses made on various ports of the different simulated honeypot machines.
The proposed system gives the security administrator a graphical
interface with which to filter the data. On this filtered data he can apply attack signature algorithms and
obtain a balanced attack signature that will not give too many false positives or negatives. A security
administrator can apply the LCS algorithm for signature extraction to the data of his choice. This
manual intervention should therefore give more precise signatures.
2. BACKGROUND
2.1 Intrusion Detection Signatures
The purpose of attack signatures is to describe the characteristic elements of attacks. A signature can
be a portion of code, a pattern of behavior, a sequence of system calls, etc. There is currently no
common standard for defining these signatures. As a consequence, different systems provide
signature languages of varying expressiveness. A good signature must be narrow enough to capture
precisely the characteristic aspects of the exploit it attempts to address; at the same time, it should be
flexible enough to capture variations of the attack. Failure to generate good signatures leads to
large numbers of either false positives or false negatives.
Content-based signature generation [10] is the process of extracting attack signatures by
selecting the most frequently occurring byte sequences across the flows in the suspicious flow
pool. To do so various algorithms like LCS are applied to extract the common patterns in it because
malicious payload appears with increasing frequency as the malicious activity spreads.
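As an illustration of the kind of algorithm referred to above, the following is a minimal Python sketch of a longest common substring computation over two payloads, using dynamic programming. It is not taken from any of the cited systems; the function name and the sample payloads are hypothetical and serve only to show the idea.

```python
def longest_common_substring(a: bytes, b: bytes) -> bytes:
    """Return the longest contiguous byte sequence present in both inputs.

    Classic O(len(a) * len(b)) dynamic programme that keeps only one
    previous row of the table. On ties the first match found in `a` wins.
    """
    best_len = 0
    best_end = 0  # exclusive end index of the best match inside `a`
    # prev[j] holds the length of the common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


if __name__ == "__main__":
    # Hypothetical payloads of two flows in a suspicious flow pool.
    p1 = b"GET /scripts/..%255c../winnt/system32/cmd.exe?/c+dir HTTP/1.0"
    p2 = b"GET /msadc/..%255c../winnt/system32/cmd.exe?/c+tftp HTTP/1.0"
    print(longest_common_substring(p1, p2))
```

For two flows that share an exploit string but differ elsewhere, the returned substring is a candidate signature; whether it is a useful one still has to be judged against benign traffic.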
2.2 Honeypots
The honeypot has emerged as an effective tool for observing and understanding intruder’s toolkits,
tactics, and motivations [7]. A honeypot suspects every packet transmitted to/from it, giving it the
ability to collect highly concentrated and less noisy datasets for network attack analysis.
Honeypots are decoy computer resources set up for the purpose of monitoring and logging the
activities of entities that probe, attack or compromise them [8]. Activities on honeypots can be
considered suspicious by definition, as there is no point for benign users to interact with these
systems. Honeypots come in many shapes and sizes; examples include dummy items in a database,
low-interaction network components like preconfigured traffic sinks, or full-interaction hosts with
real operating systems and services [9].
Honeypots excel at detection, addressing many of the problems of traditional detection. Honeypots
reduce false positives by capturing small data sets of high value, capture unknown attacks such as
new exploits or polymorphic shell-code, and work in encrypted and IPv6 environments [6]. In
general, low-interaction honeypots make the best solutions for detection. They are easier to deploy
and maintain.
3. THE PROPOSED METHOD
The proposed signature extraction system consists of three major parts –
i) Data Capture i.e. traffic logging components: this part includes Honeyd and Tcpdump for data
collection.
ii) Data Analysis i.e. analysis and extraction components: this part contains data analysis part of
signature extraction mechanism for extracting precise attack signature.
iii) Signature Extraction i.e. steps to extract out good quality attack signatures.
3.1 Data Capture
The purpose of Data Capture is to log all the activities of an attacker. The Honeypot does exactly this
i.e. it collects information. The HoneyAnalyzer system has two sources of Data: Honeypot log and
network traffic log from Tcpdump. The Honeyd framework supports several ways of logging
network activity. It can create connection logs that report attempted and completed connections for
all protocols. But to analyze the complete attack scenario, the system needs the full payload of the
packets entering and leaving the honeypot. This task is performed by the second element,
Tcpdump, which captures every packet’s full payload. Tcpdump is a tool for network monitoring and
one of the most well known sniffers for Linux. Built with the libpcap (packet capture library)
interface, it collects information from packets on the network including those intended for other host
machines. It does this through a network interface card's ability to enter into promiscuous mode. It
then dumps packet header information in the log file.
3.2 Data Analysis
In order to extract precise attack signatures, a data analyzer has been developed, as shown in Figure
1. The web interface gives a graphical output from which the security administrator can easily find the
most attacked port and the most attacked IP address, presented as pie charts as shown in Figures 2 and 3. The
proposed methodology for realizing HoneyAnalyzer to extract more precise attack
signatures is described below:
i) Configure honeyd to simulate network.
ii) Run Tcpdump for traffic analysis.
iii) Invoke the auto-run shell script, which runs at a fixed time interval and executes the parser
utility that parses the data from the honeyd log file and inserts it into the database, as shown in
Figure 1. The parser utility can be implemented in any language with strong string
tokenization capability, such as Java.
iv) Execute the auto-run shell script to push the honeyd log data into the database. This is
invoked by cron.
v) Log in to the web interface to view the attack patterns and analyze the data for extraction of good
quality signatures.
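The parsing and loading performed in steps iii) and iv) can be sketched as follows. This is a minimal illustration in Python rather than the Java mentioned above, and it rests on several assumptions that are not stated in the paper: that each honeyd connection log line is whitespace-separated with the fields timestamp, protocol, connection flag, source IP, source port, destination IP and destination port in that order; that only TCP and UDP lines are of interest; and that the data goes into a hypothetical table named hits. SQLite (in memory) stands in for the RDBMS.

```python
import sqlite3
from typing import Iterable, Optional, Tuple

# (timestamp, protocol, flag, src_ip, src_port, dst_ip, dst_port)
Record = Tuple[str, str, str, str, int, str, int]


def parse_line(line: str) -> Optional[Record]:
    """Tokenize one log line; return None if it does not fit the assumed layout."""
    fields = line.split()
    if len(fields) < 7:
        return None
    ts, proto, flag, src_ip, src_port, dst_ip, dst_port = fields[:7]
    proto = proto.split("(")[0]          # "tcp(6)" -> "tcp"
    if proto not in ("tcp", "udp"):
        return None                      # lines without ports are skipped
    dst_port = dst_port.rstrip(":")      # tolerate a trailing colon
    if not (src_port.isdigit() and dst_port.isdigit()):
        return None
    return (ts, proto, flag, src_ip, int(src_port), dst_ip, int(dst_port))


def load_log(conn: sqlite3.Connection, lines: Iterable[str]) -> int:
    """Insert all parseable lines into the hypothetical `hits` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits ("
        "ts TEXT, proto TEXT, flag TEXT, "
        "src_ip TEXT, src_port INTEGER, dst_ip TEXT, dst_port INTEGER)"
    )
    rows = [rec for rec in map(parse_line, lines) if rec is not None]
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    # Hypothetical sample lines in the assumed layout.
    sample = [
        "2005-03-01-10:15:22.0456 tcp(6) S 192.168.0.39 1045 10.0.0.5 137",
        "2005-03-01-10:15:23.0100 udp(17) - 192.168.0.40 1046 10.0.0.5 137: 78",
        "malformed line",
    ]
    conn = sqlite3.connect(":memory:")
    print(load_log(conn, sample), "rows inserted")
```

In a deployment along the lines described above, a script of this kind would be the one invoked periodically by cron; tracking which lines have already been loaded is left out of the sketch.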
Figure 1: HoneyAnalyzer’s architecture, illustrating honeyd simulating a number of different
machines, each running a number of pre-configured services. HoneyAnalyzer hooks itself
into the wire to see incoming and outgoing connections and provides the web interface.
To enable the Security Administrator to select the suspicious data, the web GUI has the following
features:
i) Ability to display packet information from the database.
ii) Ability to display real time network traffic from data stored in database, as well as historical traffic
statistics.
iii) Display the ports, which were attacked within a certain time range using pie charts.
iv) A timeline-based hit statistic showing how many hits per second the honeypot received in a certain time
range.
v) Show, using pie charts, which remote IP addresses "visited" the honeypot in a certain time
range. Here it is possible to specify a port number to show activity on a specific port.
vi) A textual hit statistic over a certain time range. By specifying an IP or a port number it is possible
to focus on specific events.
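Several of the features listed above reduce to aggregate queries over the stored packets. The following Python sketch shows two such queries under assumed conditions: a simplified, hypothetical hits table with a sortable text timestamp, and SQLite in place of the RDBMS. The function names and sample rows are illustrative only.

```python
import sqlite3
from typing import List, Optional, Tuple


def make_demo_db() -> sqlite3.Connection:
    """Build a small in-memory table with hypothetical sample hits."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (ts TEXT, src_ip TEXT, dst_port INTEGER)")
    conn.executemany(
        "INSERT INTO hits VALUES (?, ?, ?)",
        [
            ("2005-03-01 10:15:22.045", "192.168.0.39", 137),
            ("2005-03-01 10:15:22.900", "192.168.0.40", 137),
            ("2005-03-01 10:15:23.100", "192.168.0.39", 445),
            ("2005-03-01 11:00:00.000", "192.168.0.41", 80),
        ],
    )
    return conn


def port_hit_counts(conn: sqlite3.Connection, start: str, end: str,
                    src_ip: Optional[str] = None) -> List[Tuple[int, int]]:
    """Hits per destination port in a time range (the data behind a pie chart).

    Optionally restricted to one source IP address.
    """
    sql = "SELECT dst_port, COUNT(*) AS n FROM hits WHERE ts BETWEEN ? AND ?"
    params = [start, end]
    if src_ip is not None:
        sql += " AND src_ip = ?"
        params.append(src_ip)
    sql += " GROUP BY dst_port ORDER BY n DESC, dst_port ASC"
    return conn.execute(sql, params).fetchall()


def hits_per_second(conn: sqlite3.Connection, start: str,
                    end: str) -> List[Tuple[str, int]]:
    """Timeline statistic: number of hits in each one-second bucket."""
    sql = (
        "SELECT substr(ts, 1, 19) AS bucket, COUNT(*) FROM hits "
        "WHERE ts BETWEEN ? AND ? GROUP BY bucket ORDER BY bucket"
    )
    return conn.execute(sql, (start, end)).fetchall()


if __name__ == "__main__":
    db = make_demo_db()
    window = ("2005-03-01 10:15:00", "2005-03-01 10:16:00")
    print(port_hit_counts(db, *window))
    print(hits_per_second(db, *window))
```

Because the queries read only from the database, a front end built on them does not need to touch the running honeypot.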
Figure 2: A quick summary of hits on a particular port, in this case port number 137,
by various IP addresses.
Figure 3: A quick summary of hits by a particular IP address, e.g. hits by the machine
192.168.0.39 on various ports.
In the proposed method, the database module is useful mainly for two reasons. First, it is easier to
search for a particular packet or range of packets using a database; all one has to do is construct
the correct query. Second, the database facilitates different representations of the generated data.
The database records all the packets (IP, TCP, and UDP) that are received by the honeypot and
Tcpdump. The graphical interface can be run independently of the honeypot and without any
honeypot configuration. This independence comes from the database module described
above. Since past events are all recorded in the database, the web GUI can analyze events without
interfering with the normal operation of the honeypot. In this way the proposed system allows
for a good selection of data for extracting attack signatures, in contrast to existing methods, which
blindly apply the content-based signature extraction algorithm to all the data captured by the
honeypot.
3.3 Signature Extraction
The graphical interface supports applying the LCS algorithm to the data of interest, whereas
present systems apply the LCS algorithm to all the data. The process of finding attack signatures is not
fully automated; rather, it also depends upon the Security Administrator’s (SA) wisdom and experience.
The SA can choose the traffic to which the LCS algorithm is applied. The resulting more precise
signature should produce fewer false positives and false negatives. The steps followed for finding
good quality attack signatures are as follows:
i) Identify data of interest (i.e. of significance) from the database by looking at the web GUI.
ii) Analyze the combined data from the different data sources, i.e. honeypot and Tcpdump. For each received
packet, initiate the following sequence of activities:
a) If there is an existing connection state for the new packet, update that state; otherwise
create a new state.
b) If the packet is outbound, do not process the packet.
c) Perform protocol analysis [6] at the network and transport layers.
d) For each stored connection, perform header comparison in order to detect matching IP
networks, initial TCP sequence numbers, etc.
iii) Apply a content-based string matching algorithm to the payload of interest through the following
sequence of activities:
a) If the connections have the same destination port, perform pattern detection on the
exchanged messages with the help of the Longest Common Substring algorithm. A description
of string-based pattern detection is given in [10].
b) If a new signature is created in the process, use the signature to augment the signature pool;
otherwise stop the process.
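Steps ii) b) and iii) above can be sketched in Python as follows. This is a simplified illustration, not the HoneyAnalyzer implementation: connection-state tracking, protocol analysis and header comparison are omitted, the Packet type and function names are hypothetical, and the min_len threshold is an added assumption used to discard very short candidates.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, NamedTuple, Set


class Packet(NamedTuple):
    """Hypothetical record for one packet selected by the administrator."""
    dst_port: int
    inbound: bool
    payload: bytes


def lcs(a: bytes, b: bytes) -> bytes:
    """Longest common substring of two byte strings (dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def extract_signatures(packets: Iterable[Packet], pool: Set[bytes],
                       min_len: int = 4) -> List[bytes]:
    """Run pairwise LCS over inbound payloads that share a destination port.

    New candidates are added to `pool`; the list of additions is returned.
    """
    by_port: Dict[int, List[bytes]] = defaultdict(list)
    for pkt in packets:
        if pkt.inbound and pkt.payload:      # step ii) b): skip outbound packets
            by_port[pkt.dst_port].append(pkt.payload)

    added: List[bytes] = []
    for payloads in by_port.values():
        for first, second in combinations(payloads, 2):
            candidate = lcs(first, second)
            if len(candidate) >= min_len and candidate not in pool:
                pool.add(candidate)          # step iii) b): augment the pool
                added.append(candidate)
    return added


if __name__ == "__main__":
    # Hypothetical traffic selected through the web GUI.
    selected = [
        Packet(80, True, b"GET /a/cmd.exe?/c+dir"),
        Packet(80, True, b"GET /b/cmd.exe?/c+tftp"),
        Packet(80, False, b"HTTP/1.0 200 OK"),
        Packet(137, True, b"cmd.exe"),
    ]
    signature_pool: Set[bytes] = set()
    print(extract_signatures(selected, signature_pool))
```

Restricting the input to the packets chosen by the administrator is what distinguishes this usage from running LCS over the entire capture.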
DISCUSSION & CONCLUSIONS
The HoneyAnalyzer presented in this paper should be useful in extracting good quality signatures from
the data obtained from the logs of the honeypot and the traffic analyzer. It has been observed that
traditional methods generate a larger number of signatures than
HoneyAnalyzer, i.e. lack of knowledge of protocol semantics and of the local network produces more
signatures containing benign substrings. Honeycomb was one of the first efforts to address the
problem of automatic signature generation from honeypot traces [5]. An evaluation of Honeycomb
performed in [13] shows that, while there were several perfectly functional signatures, there were
also a surprisingly large number of benign strings that were identified by the LCS algorithm. Some of
these were small strings such as “GET” or “HTTP” that are clearly impractical and just happened to
be the longest common substring between unrelated sessions. These were part of normal operation
and were suppressed by white-listing signatures smaller than a certain length [13]. There were also
much longer strings in the signature set, such as proxy-headers that also do not represent real attack
signatures. Thus, the only way to avoid these kinds of problems is through manual grooming of
signatures by an expert with protocol knowledge.
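The kind of grooming discussed above can be supported by simple tooling, although the judgment about what counts as benign remains with the expert. The following Python sketch assumes a hypothetical list of benign strings curated by the administrator and a minimum-length threshold; both the names and the default value are illustrative and not part of the evaluated systems.

```python
from typing import Iterable, List


def groom_signatures(candidates: Iterable[bytes],
                     benign: Iterable[bytes],
                     min_len: int = 8) -> List[bytes]:
    """Drop candidates that are too short or occur inside known benign strings."""
    benign_list = list(benign)
    kept: List[bytes] = []
    for sig in candidates:
        if len(sig) < min_len:
            continue                      # e.g. "GET" or "HTTP"
        if any(sig in known for known in benign_list):
            continue                      # e.g. part of an ordinary proxy header
        kept.append(sig)
    return kept


if __name__ == "__main__":
    # Hypothetical LCS output and administrator-curated benign strings.
    found = [
        b"GET",
        b"HTTP",
        b"Proxy-Connection: keep-alive",
        b"/winnt/system32/cmd.exe",
    ]
    known_benign = [b"Proxy-Connection: keep-alive\r\n"]
    print(groom_signatures(found, known_benign))
```

A length threshold alone removes the short strings but not the long benign ones, which is why the benign list has to come from someone who knows the protocols and the local network.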
A comparison of HoneyAnalyzer and Honeycomb is as follows:
i) Pairwise LCS as employed by Honeycomb often leads to redundant (non-identical) signatures, which
would generate multiple alarms for the same attack. HoneyAnalyzer, in contrast, generalizes the approach
such that a security administrator who is aware of protocol semantics can groom the signatures,
making the system far less prone to redundant signature production.
ii) Honeycomb’s lack of semantic awareness leads to signatures consisting of benign substrings.
These lead to false positives; thus Honeycomb is unable to produce precise signatures for attacks on protocols
such as NetBIOS, MS-SQL and HTTP (e.g. Nimda), where the exploit content is a small
portion of the entire attack string. In HoneyAnalyzer, semantic awareness is the responsibility
of the security administrator, who can better understand the benign substrings of the local network and can
filter out redundant and useless strings.
Thus the signatures obtained through HoneyAnalyzer are of high quality and result in more precise
intrusion detection, not giving too many false positives or negatives. HoneyAnalyzer can also act as
an intrusion indicator i.e. how, when and from where different intrusion attempts are taking place.
This can be shown through the graphical interface. Honeypots are increasingly deployed in networks;
however, they are mostly used passively, with administrators merely watching what happens. The
proposed system gives the security administrator better control over the intrusion detection process for
extracting good quality attack signatures.
Future Work
In the future, an attempt can be made to add implementations of further algorithms and techniques,
such as connection tracking, protocol analysis, and pattern detection in flow content, based on which the
security administrator can perform the analysis and extract signatures with even greater precision.
To make HoneyAnalyzer more flexible, further parameters can be added, such as a negative
interpretation of input: for example, Port != 445 would show activity on all ports except 445.
A quantitative comparison also needs to be done between the existing method and the proposed
method to illustrate the advantages of the proposed system over the existing system.
REFERENCES
[1] Paul Innella and Oba McMillan, "An Introduction to Intrusion Detection Systems",
http://www.securityfocus.com/infocus/1520.
[2] Christian Plattner, Reto Baumann, “White Paper: Honeypots”,
http://www.inf.ethz.ch/personal/plattner/pdf/whitepaper.pdf.
[3] Lance Spitzner, “The Value of Honeypots, Part One: Definitions and Values of Honeypots”,
http://www.securityfocus.com/infocus/1492.
[4] Niels Provos, “A Virtual Honeypot Framework”, In Proceedings of the 13th Usenix Security
Symposium (Security 2004), San Diego, CA, August 2004, Pp. 1–14.
[5] Christian Kreibich, Jon Crowcroft, “Honeycomb – Creating Intrusion Detection Signatures Using
Honeypots”, ACM SIGCOMM Computer Communication Review, Volume 34, Issue 1, January
2004, Pp. 51–56.
[6] Erwan Lemonnier, Defcom, “Protocol Anomaly Detection in Network-based IDSs”,
http://erwan.lemonnier.free.fr/.
[7] Lance Spitzner, “Honeypots: Simple, Cost-Effective Detection”,
http://www.securityfocus.com/infocus/1690.
[8] Martin Roesch, “Snort – Lightweight Intrusion Detection for Networks”, Proceedings of USENIX
13th System Administration Conference, November 1999, pp.229-238.
[9] Yuqing Mai, Radhika Upadrashta and Xiao Su, J-Honeypot: A Java-Based Network Deception Tool
with Monitoring and Intrusion Detection, Proceedings of the International Conference on Information
Technology: Coding and Computing (ITCC'04) Volume 1 April 05 - 07,2004, Pp. 804-808.
[10] Hyang-Ah Kim, Brad Karp, “Autograph: Toward Automated, Distributed Worm Signature Detection,”
In Proceedings of the 13th Usenix Security Symposium, San Diego, CA, August 2004. Pp. 271–286.
[11] Peng Ning, Dingbang Xu, "Learning Attack Strategies from Intrusion Alerts," in Proceedings of the
10th ACM Conference on Computer and Communications Security, October 2003, Pp 200-209.
[12] Peng Ning, Yun Cui, Douglas Reeves, and Dingbang Xu, "Tools and Techniques for Analyzing
Intrusion Alerts," in ACM Transactions on Information and System Security, Vol. 7, No. 2, May 2004,
Pp 273-318.
[13] Vinod Yegneswaran, Jonathon T. Giffin, Paul Barford, and Somesh Jha. An Architecture for
Generating Semantics-Aware Signatures. In 14th USENIX Security Symposium, Baltimore,
Maryland, August 2005. To appear.
[14] V.V. Patriciu, I. Priescu, Using Data Mining Techniques for increasing Security in E-mail System
Internet-based, in 11th Conference CAIM, Oradea, 2003.