International Conference on Signal Processing, Communication, Power and Embedded System (SCOPES)-2016

Data Analysis of File Forensic Investigation

Ms. Priyanka Salunkhe
Student, Department of Computer Engineering, Ramrao Adik Institute of Technology, Nerul, Navi Mumbai
[email protected]

Mrs. Smita Bharne, Mrs. Puja Padiya
Assistant Professor, Department of Computer Engineering, Ramrao Adik Institute of Technology, Nerul, Navi Mumbai
[email protected], [email protected]

Abstract- Rapidly growing Internet technology gives attackers more opportunities to commit cybercrimes, and many types of digital devices are used to carry out attacks. To detect such criminal activity, a forensic investigator has to use various data recovery methods and a practical framework. Various forensic toolkits (FTK), freeware programs, techniques, and tools are available for file forensic investigation. A Decision Tree (DT) is one technique that can help with file forensic investigation. A system can use a Decision Tree to generate, store, and analyse data retrieved from log files, which serve as evidence in file forensic analysis. This paper focuses on how a Decision Tree allows a system to analyse log data available in various file formats quickly, easily, and inexpensively for file forensic analysis.

Keywords- Digital forensic lifecycle, log file collection, file forensic analysis, k-means clustering technique, decision tree classifier.

I. INTRODUCTION

Digital forensics is the process of collecting information or evidence from digital devices [1]. The collected evidence is analysed to detect whether any illegal activity is present; this illegal activity may be a cyber-attack. Various techniques and tools are available to recognize such cyber-attacks. A cyber-crime investigation involves log files, which belong to file forensics, and the IP address of the attacker's system, which belongs to network forensics.
Various types of attack are possible on a digital device, including IP spoofing, salami attacks, DoS attacks, DDoS attacks, and buffer overflow attacks. Detecting such attacks and storing the evidence about them takes a great deal of time. Digital forensics contributes to detecting cyber-crime. Formerly only government agencies had the authority to perform digital forensics; nowadays many commercial organizations have it as well. Forensic analysis is essentially the recovery of data from digital devices. Various tools and applications have been built for digital forensics, but they have certain limitations. To address these technical challenges, small-scale data mining can be applied, in which a decision tree (DT) supports fast and efficient classification of data. The general methodology used for digital forensic analysis is as follows [2]:

Figure 1: General methodology for digital forensic analysis [1].

(1) Preparation: In digital forensic analysis, data preparation is an important task. Data can be collected from log files or web browser history. This log data contains information about the user, such as the user's session id, session time, source_id, destination_id, and the type of file format the user accessed.

(2) Detection: Once the log files have been gathered, the next step is to recognize whether an attack is present. If any suspicious activity is found, the type of attack has to be detected.

(3) Generation: A log file gives detailed information about a particular file, so a large amount of data may be generated. Maintaining such a large volume of data is difficult for the forensic investigator.

(4) Examination: Once data has been collected from the different nodes, it is integrated into one large dataset. This dataset is then checked again for any exceptions.

(5) Analysis: This step is very important because it leads to the final result. Analysing the collected data helps recognize the pattern of an attack.
(6) Investigation: The main aim is to find the victim nodes by analysing the collected data. Once the victim system has been identified, the next job is to understand the attacker's intention behind the criminal activity.

(7) Presentation: After the investigation, the final step is to present the findings in an understandable format. The output of the analysis step should be presented in graphical or tabular form so that the investigator can easily understand the problem and make a decision.

Figure 1 shows the general methodology for arriving at the victim system. Among all the steps, analysis is the important one: it helps detect criminal activity and supports the decision taken in response.

II. LITERATURE SURVEY

In digital forensics, evidence is analysed at the victim system. The evidence gives the investigator clues that help recognize the pattern of an attack. Various tools and methods are available to detect criminal activity, and data mining techniques also work well for identifying crime records. The simple K-means algorithm is used to cluster similar records using Euclidean distance. Finding the victim system or recognizing an attack pattern is a very difficult task for a forensic investigator. Various forensic toolkits are available to detect criminal activity, but these tools work within certain limitations and are expensive.

III. RELATED WORK

Veena Bhat [3] suggests a way to detect criminal activity using the classic extract-transform-load steps. The paper covers gathering, storing, and analysing data from digital devices, with a flash drive used as the digital device. Data is extracted from the flash drive and stored as evidence. Recuva was used to extract information from the flash drive, and Oracle Express Edition (10g) was used to load the data for further pre-processing. However, these programs have certain limitations; for example, Recuva is unable to recover or extract all files from the flash drive.
Hence the extracted data, from which the criminal activity is to be recognized, is incomplete.

Prashant Khobragade [4] introduced a way to analyse browser history data to detect where an attack has happened. Data is collected from the browser history and stored in a database as evidence. The stored data is then retrieved from the database, and the K-means algorithm is used to cluster normal users and attackers. A forensic toolkit is applied to the data for network and remote system forensics.

Latesh Malik [5] introduced various network forensic analysis tools. These tools help monitor network traffic and collect all details about the victim's source and destination ids. Network forensics does not provide system security; it only traces the victim's behaviour. Most attacks happen over network transmission. Many events occur, and because each event is stored, a huge amount of data is created. Handling such a volume of data for network forensics is difficult, so these tools help monitor the network to detect the attacker's behaviour.

K. K. Sindhu [6] introduced a tool that combines digital forensic investigation and crime data mining. The proposed system is designed to find attack patterns using data mining techniques and to count the attack types that occurred over time. Both network and file forensic investigation can be achieved through this system. To arrive at the final result, the system implements a crime data mining algorithm in which association rules are applied.

Chrysoula Tsochataridou [7] suggests how a forensic investigator can handle large or complex data during the analysis stage. The forensic investigation process can be improved by using data mining techniques. Collected evidence should be cross-checked against other evidence, which may give the investigator clues. Different data mining strategies can be used for forensic analysis. The K-means and Apriori algorithms work well for grouping similar kinds of data.
The proposed framework analyses a large amount of textual data using Weka, which provides various machine learning methods for analysing the collected data.

Daniel Compton [8] proposed a framework based on text mining. With this technique it becomes easier for a forensic investigator to establish a particular person's identity with the help of social networking sites. The technique mines data about that person from two or more different social networking sites.

Sonal Honale [9] proposed a system that recovers data for analysis and stores it as evidence in a database. The system can recover data as evidence from deleted and hidden spaces. The K-means algorithm is used to detect the signature of an attack. To achieve this, the system uses the wincap, jcap, and wmic software. The system recognizes DoS and SQL injection attack signatures.

From the literature survey we conclude that there are various ways to deal with different types of attack. Different tools and methods are available to recognize the pattern of an attack. Handling complex and large data for forensic investigation is a difficult task. An investigation of criminal activity involves memory forensics and network forensics as well as file forensics, so the investigator must consider all of these during forensic analysis.

IV. CONVENTIONAL SYSTEM [2]

Because of advanced Internet features, cybercrimes are committed at a high rate. To detect an attack, the conventional system uses various software programs and a forensic toolkit. An unsupervised clustering algorithm, K-means, is used to cluster the users into normal users and attackers. The forensic toolkit is then applied to the attackers' cluster to find the victim system. Next, the system uses a classification algorithm (Decision Tree) to verify the pattern of the data instances and detect the type of attack. The pattern of the attacker's data is matched against the patterns in the training dataset.
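As an illustration of this classification step, the sketch below trains a small ID3-style decision tree on labelled records and uses it to name the attack type of a new record. It is a minimal sketch in plain Python under stated assumptions, not the system described in [2]: the attribute names (`request_rate`, `sql_keywords`), the class labels, and the training rows are hypothetical and chosen only for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def build_tree(rows, labels, attributes):
    """Grow an ID3-style tree: leaves are class labels, inner nodes are dicts."""
    if len(set(labels)) == 1:
        return labels[0]  # pure node: every record has the same class
    majority = Counter(labels).most_common(1)[0][0]
    if not attributes:
        return majority  # nothing left to split on: majority vote

    def split_entropy(attr):
        # Weighted entropy of the labels after splitting on `attr`.
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())

    # Lowest remaining entropy is equivalent to highest information gain.
    best = min(attributes, key=split_entropy)
    node = {"attribute": best, "branches": {}, "default": majority}
    remaining = [a for a in attributes if a != best]
    for value in {row[best] for row in rows}:
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*subset)
        node["branches"][value] = build_tree(list(sub_rows), list(sub_labels), remaining)
    return node


def classify(tree, row):
    """Walk the tree until a leaf (a class label) is reached."""
    while isinstance(tree, dict):
        # Unseen attribute values fall back to the node's majority class.
        tree = tree["branches"].get(row[tree["attribute"]], tree["default"])
    return tree


# Hypothetical training dataset: one dict per log record, one label per record.
training_rows = [
    {"request_rate": "low", "sql_keywords": "no"},
    {"request_rate": "low", "sql_keywords": "no"},
    {"request_rate": "high", "sql_keywords": "no"},
    {"request_rate": "high", "sql_keywords": "no"},
    {"request_rate": "low", "sql_keywords": "yes"},
    {"request_rate": "low", "sql_keywords": "yes"},
]
training_labels = ["normal", "normal", "dos", "dos", "sql_injection", "sql_injection"]

tree = build_tree(training_rows, training_labels, ["request_rate", "sql_keywords"])
predicted = classify(tree, {"request_rate": "high", "sql_keywords": "no"})
```

In this sketch the training dataset plays the role described above: the tree encodes the patterns of known attack types, and `classify` matches a new record against them. Real log data would first have to be converted into discrete attribute values of this kind.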
This training data helps recognize the type of attack.

A. Disadvantages of conventional system

1. In a network, a large amount of data is generated by every event, so analysing this data is a time-consuming and difficult task for the investigator.
2. The forensic toolkit does not support AVI files.
3. The K-means clustering technique is slow because the classification time is directly related to the amount of data. If a large amount of data is available, it takes more time to classify that data.

V. PROPOSED SYSTEM

The proposed system implements a file forensic analysis strategy that helps the forensic investigator detect criminal activity. The system collects log files as input and analyses the collected data using data mining techniques. When a crime occurs, the system investigates it and stores the evidence in the database. Using the crime data mining system, the nature of the attack is identified and the administrator is alerted about similar attacks. The proposed system also helps increase the security of the organization. Figure 2 shows the block diagram of the system.

Figure 2: Architecture of proposed system

Figure 3: Workflow diagram of system

1. Graphical User Interface (GUI): The GUI provides the interface between the investigator and the system. Through it, the investigator supplies the input for a desired output.
2. Log file collection: Log file collection here means browser history collection. From this data the investigator obtains evidence related to the crime. The system captures the browser history as input and saves it in the database.
3. Database: The system uses the database to save the browser history as evidence. The database makes it easy to store data and present it in graphical format.
4. Data mining technique: Data mining techniques are used to analyse the collected data. They consist of various modules such as association, characterization, cluster analysis, and classification. The first step is to run
5. Report: Finally, the system generates a report that shows the type of attack that has occurred. The generated report shows the result of the analysis stage.

VI. FILE FORENSIC ANALYSIS MODULE AND NETWORK INVESTIGATION ANALYSIS MODULE

This system implements file forensic analysis, in which the system captures the log files. These log files are the browser history, from which the system collects files in various formats. File forensic analysis consists of gathering files in various file formats, storing them in the database as evidence, and analysing that evidence to detect cyber-crime activity.

The proposed system also implements the network investigation analysis module. Network investigation means capturing, storing, and investigating every network action to discover the source of cyber-crime activity [10], [11]. Network investigation is used to find out the attackers' behaviour and track them by gathering and analysing log file information and monitoring network traffic. This system captures the user's source_id, destination_id, and port from the log files.

VII. DATA ANALYSIS TO DETECT AN ATTACK BY USING DATA MINING TECHNIQUE

Data instances are grouped together based on a similarity scheme. Clustering is therefore the data mining technique that groups data instances into clusters of significant interest, which helps evaluate the performance of the system and detect outliers [12]. The proposed system clusters the data into normal users and attackers using the simple k-means algorithm. Simple k-means takes k, the number of clusters to be determined, as an input parameter and partitions the given set of n objects into k clusters so that the resulting intra-cluster similarity is high while the inter-cluster similarity is low. The Euclidean distance measure is used to assign instances to clusters.
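The clustering step can be sketched as follows. This is a minimal illustration of simple k-means in plain Python, not the authors' implementation; the two numeric features per session (requests per minute and failed logins), the sample values, and the function name `k_means` are assumptions made for the example.

```python
import math
import random


def k_means(points, k, iterations=100, seed=0):
    """Partition `points` (tuples of floats) into k clusters."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centres: k randomly chosen input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster whose centre is
        # nearest in Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centre moves to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep the centre of an empty cluster
        if new_centroids == centroids:
            break  # no centre moved: the partition is stable
        centroids = new_centroids
    return labels, centroids


# Hypothetical features per session: (requests per minute, failed logins).
sessions = [
    (2.0, 0.0), (3.0, 1.0), (2.5, 0.0),       # low activity
    (40.0, 9.0), (38.0, 11.0), (42.0, 10.0),  # very high activity
]
labels, centroids = k_means(sessions, k=2)
```

Here the three low-activity sessions end up in one cluster and the three high-activity sessions in the other. K-means only separates the groups; deciding which cluster represents attackers still requires the investigator's judgement or a subsequent classification step.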
The objective of the k-means algorithm is to minimize the mean squared Euclidean distance of the data points from the centres of their clusters.

In the system, the decision tree works as a powerful classifier. A DT can classify data regardless of how large the dataset is [13]. The size of the tree depends on the size of the data. Each node of a decision tree represents a specific set of association rules that help classify the dataset.

VIII. CONCLUSION

Cyber-crime data is collected using the file forensic analyser of the system methodology; the log file is captured and stored in the database as evidence. Digital forensics is used to find out the attackers' behaviour and track them by gathering and analysing log information in the network; digital forensics also analyses network traffic data. To implement the file forensic analysis and network investigation analyser efficiently, the system applies data mining techniques that help the forensic investigator detect criminal activity. In the system, the Decision Tree works as a classifier used to analyse where an attack has happened and what type of attack it is.

References

[1] Vimal Shetty, Ramesh Vardhan, "Digital Forensics using Data Mining", International Journal of Computer Trends and Technology (IJCTT), Volume 22, Number 3, April 2015.
[2] Mohd Taufik Abdullah, Ramlan Mahmod, Abdul A. A. Ghani, Mohd A Zain and AbuBakar Md S, "Review on Advanced features of Digital Forensics", International Journal of Computer Science and Network Security, Vol. 8, No. 2, February 2008.
[3] Veena H Bhat, Prasanth Rao, Abhilash R V, Deepa Shenoy, Venugopal K R and L M Patnaik, "A Data Mining Approach for Data Generation and Analysis for Digital Forensic Application", IACSIT International Journal of Engineering and Technology, Vol. 2, No. 3, June 2010.
[4] Prashant K. Khobragade, Latesh G. Malik, "Data Generation and Analysis for Digital Forensic Application using Data Mining", Fourth International Conference on Communication Systems and Network Technologies, June 2015.
[5] Latesh G. Malik, "A Review on Data Generation for Digital Forensic Investigation using Datamining", IJCAT International Journal of Computing and Technology, Volume 1, Issue 3, April 2014.
[6] K. K. Sindhu, B. B. Meshram, "Digital Forensics and Cyber Crime Data mining", Journal of Information Security, May 2012.
[7] Chrysoula Tsochataridou, Avi Arampatzis, Vasilios Katos, "Improving Digital Forensics Through Data Mining", IMMM 2014, The Fourth International Conference on Advances in Information Mining and Management, September 2016.
[8] Daniel Compton, J.A. Hamilton, "An Examination of the Techniques and Implications of the Crowd-sourced Collection of Forensic Data", IEEE International Conference on Privacy, Security, Risk, and Trust, and IEEE International Conference on Social Computing, April 2014.
[9] Sonal Honale, Jayshree Borkar, "Framework for Live Digital Forensics using Data Mining", International Journal of Computer Trends and Technology (IJCTT), Volume 22, Number 3, April 2015.
[10] Seung-hoon Kang, Juho Kim, "Network Forensic Analysis Using Visualization Effect", International Conference on Convergence and Hybrid Information Technology, 2008, International Journal of Security, December 2015.
[11] Natarajan Meghanathan, Sumanth Reddy Allam and Loretta, "Tools And Techniques For Network Forensics", USA, International Journal of Network Security and Its Applications (IJNSA), Vol. 1, No. 1, April 2014.
[12] Sachin S. Patil, Deepak Kapgate, P.S. Prasad, "A Review on Detection of Attacks using Data Mining Techniques", International Journal of Advanced Research in Computer Science and Software Engineering, Volume 3, December 2013.
[13] Iman Paryudi, Ahmad Ashari, "Performance Comparison between Naive Bayes, Decision Tree and k-Nearest Neighbor in Searching Alternative Design in an Energy Simulation Tool", (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 4, No. 11, 2013.