DECISION BASED INTRUSION DETECTION SYSTEM USING GENETIC ALGORITHM

Ms. Priyanka Kisan Ghodekar, Ms. Shaila Bhausaheb Dere, Ms. Sneha Prabhakar Chaskar, Mr. Harish Tanaji Indalkar
Student, Department of Computer Engineering, Sharadchandra Pawar College of Engineering, Pune, Maharashtra, India

Abstract- In today's scenario, security is important in vast, growing computer networks, so an Intrusion Detection System is an essential part of daily practice. Various approaches are used for intrusion detection. We present an Intrusion Detection System using Genetic Algorithm and Decision Tree to identify various types of intrusions or attacks efficiently and effectively. This project develops advanced intelligent systems that use ensemble computing techniques for the detection of intrusions. Computing techniques such as Genetic Algorithm (GA) and Decision Tree (DT) are integrated to detect and prevent intrusions. The Intrusion Detection System (IDS) in networking using Genetic Algorithm and Decision Tree identifies the intruder and blocks the intruder's data to protect the system from attack by a virus. The major component of the system creates a new set of rules at run time. The GA component performs feature subset selection through a suitably framed fitness function. A decision tree is used to detect the subtype of attack. The KDDCUP99 training and testing dataset is used to generate effective new rules while achieving a reasonable detection rate.

Keywords- Computer and Network Security, Intrusion Detection System, Genetic Algorithm, KDD CUP 1999 Dataset, Decision Tree.

1. INTRODUCTION

An intrusion detection system (IDS) is a device or application that monitors network and/or system activities for malicious activities or policy violations and produces reports. Intrusion detection is the process of monitoring the events occurring in a computer system or network and analyzing them for signs of possible incidents, which are violations of computer security practices. Intrusion Detection Systems have undergone rapid growth in power, scope and complexity in their short history.

In recent years, intrusion detection has been one of the most sought-after research topics in the field of Information Security, with wide application in the corporate world, where data integrity and security are complex issues. When an intruder attempts to break into an information system, we say that an intrusion has occurred. Intruders may be external or internal, depending upon their authorization level. Intrusion techniques may include exploiting software bugs or system configurations, and password cracking.

An Intrusion Detection System (IDS) is a system for detecting intrusions and reporting them accurately to the proper authority. IDSs are usually specific to the operating system they operate in. They are an important tool in implementing an organization's information security policy, which defines the rules and practices to provide security, handle intrusions, and recover from damage caused by security breaches.

2. RELATED WORK

GAs have been used for network intrusion detection in several ways. Some approaches use a Genetic Algorithm to obtain the classification rules directly, while others use different AI methods to acquire the rules, with GAs used to select appropriate features or to determine the optimal parameters of some functions.

Li: This work presents a technique using GA to detect abnormal network intrusions. The approach obtains classification rules from both quantitative and categorical features of network data. An implementation of rule generation for IDS is given, but no experimental results are reported.

Bridges: This method combines fuzzy data mining techniques and Genetic Algorithm for the detection of network anomalies and misuse. Most features are not selected properly in various existing Genetic Algorithm based IDSs. This method uses a Genetic Algorithm to determine the optimal parameters of the fuzzy functions used for selecting the relevant network features.

Lu: In this method, classification rules are generated by Genetic Programming. Detection or classification of intrusions in the network is fine-tuned with the help of the fitness function. The time required to train the system on a large volume of data makes the Genetic Programming implementation difficult.

Crosbie: Multiple-agent techniques and Genetic Programming can be used to detect network intrusions. Each agent monitors one parameter of the network audit data, and Genetic Programming is used to find the set of agents that together determine the network behavior. The use of many small autonomous agents is an advantage of this method, while the communication among the agents is a drawback.

Selvakani: This system identifies attacks using a rule set produced by a Genetic Algorithm, but it derives rules only for R2L and DoS attacks. From these two attack types, one of each is selected. The overall performance of the system is less than 60%.

3. INTRUSION DETECTION SYSTEM OVERVIEW

The following sections give an overview of the components of an Intrusion Detection System, its classifications, and networking attacks.

3.1 Components of Intrusion Detection System

An intrusion detection system normally consists of three functional components. The first component is known as the event generator or data source. The second component is known as the analysis engine, which takes information from the data source and examines the data for symptoms of attacks. The analysis engine uses the following analysis approaches:
• Misuse/Signature-Based Detection: This type of detection engine detects intrusions that follow well-known patterns of attacks (or signatures) that exploit known software vulnerabilities. The main limitation of this approach is that it only looks for known weaknesses and may not detect unknown future intrusions.
• Anomaly/Statistical Detection: An anomaly-based detection engine searches for something rare or unusual. The drawbacks of such systems are that they are highly expensive and that they can recognize intrusive behavior as normal behavior because of insufficient data.
The third component of an intrusion detection system is the response manager, which acts only when inaccuracies (possible intrusion attacks) are found.

3.2 Classification of Intrusion Detection

Intrusion detection can be classified into the following two categories.
• Host Based Intrusion Detection: HIDSs evaluate information found on a single host or on multiple host systems, including the contents of operating systems, system files and application files.
• Network Based Intrusion Detection: NIDSs evaluate information captured from network communications, analyzing the stream of packets which travel across the network.

3.3 Networking Attacks

This section gives an overview of the four major categories of networking attacks. Each attack is placed into one of the following groupings.
• Denial of Service (DoS): A DoS attack is a type of attack in which the hacker makes computing or memory resources too busy or too full to serve legitimate networking requests, thereby denying users access to those resources. Examples: smurf, apache, mail bomb, Neptune.
• Remote to User Attacks (R2L): A remote to user attack is an attack in which a user sends packets over the internet to a machine to which he or she does not have access, in order to expose the machine's vulnerabilities and exploit privileges which a local user would have on the machine. Examples: phf, guest, xlock, xnsnoop.
• User to Root Attacks (U2R): These attacks are exploitations in which the hacker starts off on the system with a normal user account and attempts to abuse vulnerabilities in the system in order to gain super user rights. Examples: xterm, perl.
• Probing: Probing is an attack in which the hacker scans a machine or a networking device in order to determine weaknesses or vulnerabilities that may later be exploited. This technique is used in data mining. Examples: portsweep, nmap.

4. PROPOSED SYSTEM

This system generates its own rules using a GA. The system is implemented using the KDD CUP 1999 training and testing datasets. It dynamically adds rules according to the packets flowing in the network, which also increases the reliability of the system. Our system classifies attacks using a Decision Tree. The major objective of this system is to improve the detection rate.

5. WORKING OF OUR IDS IN REAL SYSTEM

In this system, an attack is detected before the packet arrives at the machine through the network connection. For this we use an IDS which comprises KDD, GA and Decision Tree.

Figure 5.1 Applying Intrusion Detection System

Let us see where the Intrusion Detection System is applied. A network connection is a sequence of TCP packets starting from a source IP address and ending at a target IP address. Each connection yields 41 attributes and 1 manually assigned record type. Before the packets arrive at the machine, the network connection is analyzed by the Intrusion Detection System. If the connection is a normal connection, it is permitted. If the connection is of an attack type, the type of attack is detected and the system generates an alert.

6. FLOW OF SYSTEM

The following diagram shows the components of our system.

Figure 6.1 Flow of System

Let us look at the flow of the system. The KDD CUP 1999 dataset contains records, each of which has 41 attributes and 1 manually assigned record type. The record type indicates whether a record is a normal network connection or an abnormal network connection. Here we use only 6 attributes, which are extracted with the Weka tool. These 6 attributes are given to the genetic algorithm as input. The genetic algorithm then generates a run-time rule set, which is simply a chromosome. The chromosome may be the signature of a normal connection or of an attack. This signature is matched against the signatures predefined in the test dataset. If it matches the signature of a particular attack, that attack is detected. The rule set or detected attack is then given to the decision tree as input, and the decision tree classifies the attack into a particular attack type. The generated rule set is stored in the rule base.

7. GENETIC ALGORITHM

A Genetic Algorithm (GA) is a programming technique that mimics biological evolution as a problem-solving strategy. It works on the mechanics of natural selection and is based on Darwin's theory of survival of the fittest.

The GA process begins with a set of potential solutions, or chromosomes, which are randomly generated or selected. These chromosomes are normally encoded in binary form, but other encodings are also used. The entire set of these chromosomes comprises a population. In every generation the fitness of the chromosomes is checked: a fitness function computes the fitness of each chromosome, and the selection operator then chooses the fittest chromosomes using tournament selection. Chromosomes with poor fitness values are discarded.

GA applies evolution and natural selection to a chromosome-like data structure and evolves the chromosomes using selection, recombination (crossover), and mutation operators. The process generally begins with a randomly generated population of chromosomes, which represent possible solutions of the problem and are considered candidate solutions. Different positions of each chromosome are encoded as bits, characters or numbers, which are referred to as genes. An evaluation function, known as the "Fitness Function", is used to compute the goodness of each chromosome with respect to the desired solution. During evaluation, the two basic operators, crossover and mutation, are used to imitate natural reproduction and mutation. The selection of chromosomes for survival and combination is biased towards the fittest chromosomes.

7.1 FLOWCHART OF GA

The following flowchart shows the flow of a simple genetic algorithm. It starts with the random generation of an initial population, which is then evaluated and evolved through selection, recombination (crossover), and mutation. Finally, the best individual (chromosome) is picked as the final result once the optimization meets its target.

Figure 7.1.1 Flowchart of GA

7.2 ALGORITHM OF GA

GA_Rule_Generation
Input: Encoded binary string of length n (where n is the number of features being passed), number of generations, population size, crossover probability (Pc), mutation probability (Pm).
Output: A rule set for IDS.
1. Initialize the population randomly.
2. Initialize N (total number of records in the training set).
3. For each chromosome in the new population
4. Calculate fitness = Fx / Sum(Fx)
5. End for
6. Select the 50% best-fit chromosomes and remove the worse-fit chromosomes.
7. Apply crossover to the selected best chromosomes.
8. Apply mutation to each chromosome to generate the new population. Go to step 3.
9. Stop

In the preprocessing phase, the KDDCUP99 dataset is processed with the Weka tool, which removes redundant data from the existing dataset and yields the dataset used for testing. Removing redundant records improves the detection rate and the performance of our system.

In the detection phase, the Genetic Algorithm is applied to the selected features of the dataset, and the fitness of every rule is calculated.
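The steps of GA_Rule_Generation in Section 7.2 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper does not define the raw fitness Fx or name the crossover operator, so `match_count`, `TARGET`, and the single-point `crossover` are hypothetical stand-ins, and survivors are chosen by simple ranking to mirror step 6. The sketch assumes n >= 2 and a population size of at least 2.

```python
import random

# Hypothetical 6-bit signature used only to give the demo a raw fitness Fx.
TARGET = [1, 0, 1, 1, 0, 1]

def match_count(chromosome):
    """Assumed raw fitness Fx: number of bits that agree with TARGET."""
    return sum(1 for bit, want in zip(chromosome, TARGET) if bit == want)

def normalized_fitness(population, raw_fitness):
    """Step 4: Fitness = Fx / sum(Fx), computed over the current population."""
    raw = [raw_fitness(c) for c in population]
    total = sum(raw)
    if total == 0:  # avoid division by zero; treat all chromosomes as equal
        return [1.0 / len(population)] * len(population)
    return [f / total for f in raw]

def crossover(a, b, rng):
    """Single-point crossover (assumed; the paper does not specify the operator)."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chromosome, pm, rng):
    """Flip each bit independently with probability pm."""
    return [bit ^ 1 if rng.random() < pm else bit for bit in chromosome]

def ga_rule_generation(n, generations, pop_size, pc, pm, raw_fitness, seed=0):
    rng = random.Random(seed)
    # Step 1: random initial population of n-bit chromosomes.
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Steps 3-5: normalized fitness of every chromosome.
        fitness = normalized_fitness(population, raw_fitness)
        # Step 6: keep the best-fit 50% (at least two, so pairs can be formed).
        ranked = [c for _, c in sorted(zip(fitness, population),
                                       key=lambda pair: pair[0], reverse=True)]
        survivors = ranked[:max(2, pop_size // 2)]
        # Steps 7-8: crossover with probability pc, then mutation, until the
        # new population is full.
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            if rng.random() < pc:
                a, b = crossover(a, b, rng)
            children.append(mutate(a, pm, rng))
            children.append(mutate(b, pm, rng))
        population = children[:pop_size]
    # Step 9: stop and report the fittest chromosome of the final population.
    fitness = normalized_fitness(population, raw_fitness)
    best = max(zip(fitness, population), key=lambda pair: pair[0])[1]
    return best, population
```

A call such as `ga_rule_generation(n=6, generations=20, pop_size=10, pc=0.8, pm=0.05, raw_fitness=match_count)` returns the fittest chromosome together with the final population. In a real system the raw fitness would be computed from the training records rather than from a fixed target signature.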
The fitness function is

Fitness = Fx / sum(Fx)

where Fx is the fitness of individual x and sum(Fx) is the total fitness of all individuals.

GA Parameters

GA has some general elements and parameters which can be defined:
• GA Operators: The GA operators selection, crossover and mutation are the most influential parts of the algorithm, as they contribute to the generation of each population.
  Selection is the phase in which individuals of the population with superior fitness are selected; the others are discarded.
  Crossover is the process in which randomly selected pairs of individuals exchange their parents' genes with each other, until an entirely new population has been generated.
  Mutation flips some of the bits in an individual. Since any bit may be flipped, there is a low probability of predicting the change.
• Fitness Function: The fitness function is defined as a function which scales the value of an individual relative to the rest of the population. It identifies the best possible solutions among the candidates in the population.

8. KNOWLEDGE DISCOVERY DATASET (KDD)

KDDCUP99, which is based on DARPA data from MIT Lincoln Laboratory, is widely used to evaluate IDSs. In this study, we used the KDDCUP99 training and testing datasets. Each record of the datasets consists of 41 network features and 1 manually assigned record type. Nine network features were used in the GA, namely Duration, Protocol, Service, Flag, Source bytes, Destination bytes, etc. The record type indicates whether a record is a normal network connection or an abnormal network connection.

The KDD 99 intrusion detection benchmark consists of the following components:
kddcup.data;
kddcup.data_10_percent;
kddcup.newtestdata_10_percent_unlabeled;
kddcup.testdata.unlabeled;
kddcup.testdata.unlabeled_10_percent;
corrected.

We have used "kddcup.data_10_percent" as the training dataset and "corrected" as the testing dataset. In this case the training set consists of 494,021 records, among which 97,280 are normal connection records, while the test set contains 311,029 records, among which 60,593 are normal connection records. Table 8.1 shows the distribution of each intrusion type in the training and the test set.

Table 8.1- Distribution of intrusion types in datasets

Types     Train ("kddcup.data_10_percent")     Test ("corrected")
normal    97280                                60593
probe     4107                                 4166
dos       391458                               229853
u2r       52                                   228
r2l       1124                                 16189
Total     494021                               311029

9. EXPERIMENTAL RESULTS AND ANALYSIS

Table 9.1- Detection rate of intrusions

Types     Total no. of records     Correctly detected records     Detection Rate
normal    25640                    22398                          87.35 %
probe     17890                    14177                          79.24 %
dos       16872                    14982                          88.26 %
u2r       8742                     5230                           59.82 %
r2l       6412                     4310                           67.21 %

We get a better detection rate for denial of service and user-to-root, and a close detection rate for probe and remote-to-local.

10. CONCLUSIONS

In this paper, we presented and implemented an Intrusion Detection System using Genetic Algorithm and Decision Tree to efficiently detect various types of network intrusions. The KDDCUP99 training and testing dataset is used to generate effective new rules while achieving a reasonable detection rate. The major advantage of the proposed detection system is that it can generate new rules for the system as new intrusions become known. A GA is used to obtain a set of classification rules. Six features were used when encoding and obtaining the rules. A simple but effective and flexible fitness function is used to select the appropriate rules. Depending on the fitness-based selection, the generated rules are given to the Decision Tree to detect network intrusions or categorize the types of intrusions. With the Genetic Algorithm based Intrusion Detection System, several types of attacks can be detected at a high rate with the rule set provided.

11. REFERENCES

[1] Mohammad Sazzadul Hoque, Md. Abu Naser Bikasi, and Md. Abdul Mukit, "An Implementation Of Intrusion Detection System Using Genetic Algorithm", International Journal of Network Security and Its Applications (IJNSA), Vol. 4, No. 2, March 2012.
[2] Ch. Satya Keerthi N.V.L., B. Minny Priscilla, P. Lakshmi Prasanna, M.V.B.T. Santhi, "Model Generation for an Intrusion Detection System using Genetic Algorithm", International Journal of P2P Network Trends & Technology, Vol. 1, Issue 2, 2011.
[3] Crosbie, Mark, and Gene Spafford. 1995. "Applying Genetic Programming to Intrusion Detection". In Proceedings of 1995 AAAI Fall Symposium on Genetic Programming, pp. 1-8. Cambridge, Massachusetts.
[4] Bridges, Susan, and Rayford B. Vaughn. 2000. "Intrusion Detection via Fuzzy Data Mining". In Proceedings of 12th Annual Canadian IT Security Symposium, pp. 109-122. Ottawa, Canada.
[5] KDDcup 1999 data, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
[6] www.google.com
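The attack groupings of Section 3.3 and the per-type counts of the kind shown in Table 8.1 can be illustrated with a small lookup. This is only a sketch: the mapping contains just the example attack names quoted in Section 3.3 (not the full KDD label set), the function names are hypothetical, and it assumes labels may carry a trailing period, as in the raw KDD Cup 1999 files.

```python
from collections import Counter

# Example attack names from Section 3.3, lower-cased. Illustrative only:
# this is not the complete list of labels in the KDD Cup 1999 data.
ATTACK_CATEGORY = {
    "smurf": "dos", "apache": "dos", "mail bomb": "dos", "neptune": "dos",
    "phf": "r2l", "guest": "r2l", "xlock": "r2l", "xnsnoop": "r2l",
    "xterm": "u2r", "perl": "u2r",
    "portsweep": "probe", "nmap": "probe",
}

def categorize(label):
    """Map a record type to normal, dos, r2l, u2r, probe, or unknown."""
    name = label.strip().rstrip(".").lower()
    if name == "normal":
        return "normal"
    return ATTACK_CATEGORY.get(name, "unknown")

def category_distribution(labels):
    """Count how many records fall into each category."""
    return Counter(categorize(label) for label in labels)
```

For example, `category_distribution(["normal.", "smurf.", "nmap."])` counts one record each for the normal, dos and probe categories. Labels outside the small example mapping are reported as unknown rather than guessed.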