International conference on Signal Processing, Communication, Power and Embedded System (SCOPES)-2016
Data Analysis of File Forensic Investigation
Ms. Priyanka Salunkhe
Student
Department of Computer Engineering
Ramrao Adik Institute of Technology
Nerul, Navi Mumbai
[email protected]
Mrs. Smita Bharne, Mrs. Puja Padiya
Assistant Professor
Department of Computer Engineering
Ramrao Adik Institute of Technology
Nerul, Navi Mumbai
[email protected], [email protected]
Abstract- Rapidly growing Internet technology may give rise to
cybercrimes committed by attackers. Different types of digital
devices are used to commit attacks. To detect such criminal
activity, a forensic investigator has to use various data
recovery methods and practical frameworks. Various forensic
toolkits (FTK), freeware software, techniques and tools are
available for file forensic investigation. The Decision Tree (DT)
is one technique that can help with file forensic investigation.
A system can therefore use a Decision Tree for generating,
storing and analysing data retrieved from log files, which serve
as evidence in file forensic analysis. This paper focuses on how
a Decision Tree can allow a system to analyse log data available
in various file formats quickly, easily and inexpensively for
file forensic analysis.
Keywords- Digital forensic lifecycle, log file collection, file forensic
analysis, k-means clustering technique, Decision Tree classifier.
I. INTRODUCTION
Digital forensics is the process of collecting information or
evidence from digital devices [1]. The collected evidence is
analysed to detect whether any illegal activity is present.
This illegal activity can also be a cyber-attack. Various
techniques and tools are available to recognize such
cyber-attacks. Cyber-crime investigation involves log files,
which belong to file forensics, and the IP address of the
attacker's system, which belongs to network forensics. Various
types of attack are possible on a digital device, such as IP
spoofing, salami attack, DoS attack, DDoS attack and buffer
overflow attack. Detecting such attacks and storing evidence
about them takes plenty of time. Digital forensics contributes
to detecting cyber-crime. Earlier, only government agencies had
the authority to perform digital forensics; nowadays many
commercial organizations also have this authority.
Forensic analysis is the recovery of data from digital
devices. Various tools and applications have been built for
digital forensics, but they have certain limitations. To address
these technical challenges, data mining can be applied on a
small scale, in which a Decision Tree (DT) can support fast
and efficient classification of data.
A general methodology is used for digital forensic
analysis, as follows [2]:
Figure 1: General methodology for digital forensic
analysis [1].
(1) Preparation: In digital forensic analysis, data
preparation is an important task. Data can be collected
from log files or web browser history. This log data
contains information about the user, such as the session
id, session time, source_id, destination_id and the type
of file format accessed.
(2) Detection: Once the log files are gathered, the next step
is to recognize whether an attack is present. If any
suspicious activity is found, the type of attack has to
be detected.
(3) Generation: A log file gives detailed information about a
particular file, so a large amount of data may be
generated. Maintaining such huge data becomes difficult
for the forensic investigator.
(4) Examine: Once data is collected from different nodes, it
is integrated into a large dataset. This dataset is then
checked again to see whether any exception is present.
(5) Analysis: This step is very important, as it leads to the
final result. Analysis of the collected data helps to
recognize the pattern of an attack.
(6) Investigation: The main aim is to find the victim nodes
by analysing the collected data. Once the victim system
is recognized, the next job is to understand the
attacker's intention behind the crime.
(7) Presentation: After investigation, the final step is to
present the findings in an understandable format. The
output of the analysis step should be presented in
graphical or tabular format, so that the investigator can
easily understand the problem and make a decision on it.
Figure 1 shows the general methodology to arrive at the
victim system. Among all the steps, analysis is an important
one, as it helps to detect criminal activity and to take a
decision accordingly.
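As a minimal sketch of the preparation step, the log fields listed above could be read into typed records as follows. The CSV layout, the column names and the sample rows are assumptions made for illustration; the paper does not specify a log format.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class LogRecord:
    session_id: str
    session_time: float      # assumed to be seconds
    source_id: str
    destination_id: str
    file_format: str


def parse_log(text):
    """Parse CSV log text into LogRecord objects, skipping malformed rows."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            records.append(LogRecord(
                session_id=row["session_id"],
                session_time=float(row["session_time"]),
                source_id=row["source_id"],
                destination_id=row["destination_id"],
                file_format=row["file_format"].lower(),
            ))
        except (KeyError, TypeError, ValueError, AttributeError):
            # Missing column, missing value or non-numeric time: drop the row.
            continue
    return records


# Hypothetical sample data; the second row has an invalid session time.
SAMPLE_LOG = """session_id,session_time,source_id,destination_id,file_format
s1,12.5,10.0.0.5,10.0.0.9,PDF
s2,not-a-number,10.0.0.6,10.0.0.9,avi
s3,301.0,10.0.0.7,10.0.0.9,DOCX
"""

records = parse_log(SAMPLE_LOG)
for record in records:
    print(record)
```

Rows that cannot be parsed are dropped here for brevity; in an actual investigation they would more likely be kept aside, since malformed entries can themselves be evidence.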
II. LITERATURE SURVEY
In digital forensics, evidence is analysed at the victim
system. The evidence also gives the investigator clues that
help to recognize the pattern of an attack. Various tools and
methods are available to detect criminal activity. Data mining
techniques also work well for identifying crime records. The
simple K-means algorithm is used to cluster similar kinds of
records using Euclidean distance.
Finding the victim system or recognizing an attack pattern is a
very difficult task for a forensic investigator. Various forensic
toolkits are available to detect criminal activity, but such
tools work within certain limitations and are expensive.
III. RELATED WORK
Bhat et al. [3] suggest a way to detect criminal activity using
the classic extract-transform-load steps. The paper covers
gathering, storing and analysing data from digital devices, with
a flash drive used as the digital device. Data is extracted from
the flash drive and stored as evidence. Recuva software is used
to extract information from the flash drive, and Oracle Express
Edition (10g) is used to load the data for further
pre-processing. However, these tools have limitations: Recuva,
for example, cannot recover or extract all files from a flash
drive. Hence the extracted data, from which the criminal
activity is to be recognized, is incomplete.
Khobragade and Malik [4] introduced a way to analyse browser
history data to detect where an attack happened. Data is
collected from the browser history and stored in a database as
evidence. The stored data is then retrieved from the database,
and the K-means algorithm is used to cluster normal users and
attackers. A forensic toolkit is applied to the data for network
and remote system forensics.
Malik [5] introduced various network forensic analysis tools.
These tools help to monitor network traffic and collect the
details of the victim's source and destination id. Network
forensics does not provide system security; it only traces the
victim's behaviour. Most attacks happen over network
transmission. Many events occur, and storing each of them
creates a huge amount of data. Handling such data for network
forensics is a difficult task, so these tools help to monitor
the network and detect an attacker's behaviour.
Sindhu and Meshram [6] introduced a tool which combines digital
forensic investigation and crime data mining. In that paper, the
proposed system is designed to find attack patterns using data
mining techniques and to count the attack types that occurred
over time. Both network and file forensic investigation can be
carried out through the proposed system. To arrive at the final
result, the system implements a crime data mining algorithm in
which association rules are applied.
Tsochataridou et al. [7] suggest how a forensic investigator can
handle huge or complex data during the analysis stage. The
forensic investigation process can be improved by using data
mining techniques. Collected pieces of evidence should be
cross-checked against each other, which may give the
investigator some clues. Different data mining strategies can be
used for forensic analysis. The K-means and Apriori algorithms
work well for grouping similar kinds of data. The proposed
framework analyses a huge amount of textual data using Weka,
which provides various machine learning algorithms for analysing
the collected data.
Compton and Hamilton [8] proposed a framework using a text
mining technique. With text mining it becomes easy for a
forensic investigator to find out a particular person's identity
with the help of social networking sites. The technique mines
that person's data across two or more social networking sites.
Honale and Borkar [9] proposed a system which recovers data for
analysis and stores it as evidence in a database. The system can
recover data as evidence from deleted and hidden spaces. The
K-means algorithm is used to detect the signature of an attack.
To achieve this, the system uses the wincap, jcap and wmic
software. The system recognizes DoS and SQL injection attack
signatures.
From the literature survey, we find that there are various ways
to deal with different types of attack. Different tools and
methods are available to recognize the pattern of an attack.
Handling huge and complex data for forensic investigation is a
difficult task. Investigating criminal activity involves memory
forensics, network forensics and file forensics, so the
investigator must consider all of these during forensic
investigation and analysis.
IV. CONVENTIONAL SYSTEM [2]
With the advanced features of the Internet, cybercrimes are
committed frequently. To detect an attack, the conventional
system uses various software and forensic toolkits. The K-means
clustering technique is used to cluster normal users and
attackers. A forensic toolkit is then applied to the attackers'
cluster to find the victim system.
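The clustering step can be sketched in a few lines of Python. This is a minimal, illustrative k-means using Euclidean distance, not the implementation of the conventional system; the two features per session (requests per minute, failed logins) and the sample values are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, max_iter=100, seed=0):
    """Return (assignments, centroids) for a simple k-means run."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)       # initial centres: k of the points
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break                           # no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return assignments, centroids


# Hypothetical features per session: (requests per minute, failed logins).
sessions = [(2, 0), (3, 1), (2, 1), (4, 0), (90, 14), (85, 12), (95, 15)]
assignments, centroids = kmeans(sessions, k=2)

# Assumption: the cluster with the higher mean request rate is the suspect one.
attacker_cluster = max(range(2), key=lambda c: centroids[c][0])
suspects = [s for s, a in zip(sessions, assignments) if a == attacker_cluster]
print(suspects)
```

K-means only separates the sessions into groups; deciding which group represents attackers is a separate judgement, made here by the assumed rule on request rate.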
A. Disadvantages of conventional system
1. In a network, a lot of data is generated by every event,
so analysing this huge amount of data is a time-consuming
and difficult task for the investigator.
2. The forensic toolkit does not support AVI files.
3. The K-means clustering technique is slow because its
running time is directly related to the amount of data:
the more data is available, the longer it takes to
cluster it.
V. PROPOSED SYSTEM
The proposed system implements a file forensic analysis
strategy which will help the forensic investigator to detect
criminal activity. The system collects log files as input and
analyses the collected data using data mining techniques. When
a crime occurs, the system investigates it and stores the
evidence in the database. Using the crime data mining system,
the nature of the attack is identified and the administrator is
alerted about similar attacks. The proposed system will also
help to increase the security of the organization. Figure 2
shows the block diagram of the system.
Figure 2: Architecture of proposed system
Figure 3: Workflow diagram of system
1. Graphical User Interface (GUI): The GUI provides the
interface between the investigator and the system. Through
it, the investigator gives the input for a desired output.
2. Log file collection: Log file collection here means
browser history collection. From this data the
investigator gets the evidence related to the crime. The
system captures the browser history as input and saves it
in the database.
3. Database: The system uses the database to save browser
history as evidence. The database makes it easy to store
data and to present it in graphical format.
4. Data mining technique: Data mining techniques are used to
analyse the collected data. They consist of various
modules such as association, characterization, cluster
analysis and classification. The first step is to run an
unsupervised clustering algorithm using K-Means, which
clusters the data into normal users and attackers. The
system then uses a classification algorithm (Decision
Tree) to verify the pattern of the data instances and
detect the type of attack. The pattern of the attackers'
data is matched against the patterns in the training
dataset, which helps to recognize the type of attack.
5. Report: Finally, the system generates a report which
shows the type of attack that has occurred. The generated
report shows the result of the analysis stage.
VI. FILE FORENSIC ANALYSIS MODULE AND NETWORK
INVESTIGATION ANALYSIS MODULE
This system implements file forensic analysis, in which the
system captures the log files. These log files are the browser
history, from which the system collects files in various
formats. File forensic analysis consists of gathering files in
various file formats, storing them in the database as evidence,
and analysing that evidence to detect cyber-crime activity. The
proposed system also implements the network investigation
analysis module. Network investigation means capturing, storing
and investigating each and every network action to discover the
source of cyber-crime activity [10], [11]. It is used to find
out attackers' behaviour and to track them by gathering and
analysing log file information and monitoring network traffic.
The system captures the user's source_id, destination_id and
port through the log files.
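Storing captured log entries as evidence might look like the following sketch. SQLite is our choice for illustration only, since the paper does not name the database used by the proposed system; the table name, the column types and the sample rows are likewise assumptions.

```python
import sqlite3


def open_evidence_db(path=":memory:"):
    """Open (or create) a database with a hypothetical evidence table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS evidence (
               id INTEGER PRIMARY KEY,
               source_id TEXT NOT NULL,
               destination_id TEXT NOT NULL,
               port INTEGER NOT NULL,
               file_format TEXT NOT NULL
           )"""
    )
    return conn


def store_entries(conn, entries):
    """Insert (source_id, destination_id, port, file_format) tuples."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO evidence (source_id, destination_id, port, file_format) "
            "VALUES (?, ?, ?, ?)",
            entries,
        )


def connections_per_source(conn):
    """Count stored entries per source, busiest source first."""
    cursor = conn.execute(
        "SELECT source_id, COUNT(*) FROM evidence "
        "GROUP BY source_id ORDER BY COUNT(*) DESC, source_id"
    )
    return cursor.fetchall()


conn = open_evidence_db()
store_entries(conn, [
    ("10.0.0.5", "10.0.0.9", 80, "pdf"),
    ("10.0.0.7", "10.0.0.9", 22, "docx"),
    ("10.0.0.7", "10.0.0.9", 23, "avi"),
    ("10.0.0.7", "10.0.0.9", 3389, "avi"),
])
summary = connections_per_source(conn)
print(summary)
```

Parameterised inserts are used so that log content is never interpreted as SQL, which matters when the data being stored may itself come from an attacker.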
VII. DATA ANALYSIS TO DETECT AN ATTACK BY USING DATA
MINING TECHNIQUE
Data instances are grouped together based on a similarity
scheme. Clustering is the data mining technique which groups
data instances into clusters of significant interest, helps to
evaluate the performance of the system, and detects outliers
[12]. The proposed system clusters the data into normal users
and attackers using the simple k-means algorithm. Simple k-means
takes k, the number of clusters to be determined, as an input
parameter and partitions the given set of n objects into k
clusters so that the resulting intra-cluster similarity is high
while the inter-cluster similarity is low. The Euclidean
distance measure is used to assign instances to clusters. The
objective of the algorithm is to minimize the mean squared
Euclidean distance of the data from the centres of their
clusters.
In the system, the decision tree works as a powerful
classifier. DT is able to classify even very large amounts of
data [13], although the size of the tree depends on the size of
the data. The decision tree represents a specific set of
association rules at each node, which help to classify the
dataset.
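The classification stage can be illustrated with a small decision tree learner. The paper does not state which tree-induction algorithm is used, so the sketch below assumes CART-style binary splits chosen by Gini impurity; the features (requests per minute, largest request size in bytes), the labels and the training rows are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, depth=0, max_depth=3):
    """Return a leaf label (str) or a node (feature, threshold, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:
        return majority
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            if not left or not right:
                continue
            # Weighted impurity of the two children; lower is better.
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return majority
    _, f, t, left, right = best
    return (f, t,
            build_tree([rows[i] for i in left], [labels[i] for i in left],
                       depth + 1, max_depth),
            build_tree([rows[i] for i in right], [labels[i] for i in right],
                       depth + 1, max_depth))


def classify(tree, row):
    """Walk from the root to a leaf and return the leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def rules(tree, path=()):
    """Flatten the tree into one readable rule per leaf."""
    if not isinstance(tree, tuple):
        return [" and ".join(path) + " -> " + tree]
    f, t, left, right = tree
    return (rules(left, path + (f"x[{f}] <= {t}",))
            + rules(right, path + (f"x[{f}] > {t}",)))


# Hypothetical training data for sessions already flagged as attackers.
train_rows = [(95, 200), (90, 180), (85, 220), (3, 9000), (2, 8500), (4, 9500)]
train_labels = ["dos", "dos", "dos",
                "buffer_overflow", "buffer_overflow", "buffer_overflow"]

tree = build_tree(train_rows, train_labels)
print(rules(tree))
print(classify(tree, (88, 210)))
```

The `rules` helper makes the rule set carried by the tree explicit: each path from the root to a leaf reads as one if-then rule, which is the form an investigator would inspect.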
VIII. CONCLUSION
Cyber-crime data is collected using the file forensic analyser
of the system: the log file is captured and stored in the
database as evidence. Digital forensics is used to find out
attackers' behaviour and to track them by gathering and
analysing log information in the network; it also analyses
network traffic data. To implement the file forensic analysis
and network investigation analysers efficiently, the system
applies data mining techniques, which help the forensic
investigator to detect criminal activity. In the system, the
Decision Tree works as a classifier used to determine where an
attack happened and also the type of attack.
References
[1] Vimal Shetty, Ramesh Vardhan, “Digital Forensics using Data
Mining”, International Journal of Computer Trends and Technology (IJCTT),
vol. 22, no. 3, April 2015.
[2] Mohd Taufik Abdullah, Ramlan Mahmod, Abdul A. A. Ghani, Mohd A
Zain and AbuBakar Md S, “Review on Advanced features of Digital
Forensics”, International Journal of Computer Science and Network Security,
vol. 8, no. 2, February 2008.
[3] Veena H Bhat, Prasanth Rao, Abhilash R V, Deepa Shenoy, Venugopal
K R and L M Patnaik, “A Data Mining Approach for Data Generation and
Analysis for Digital Forensic Application”, IACSIT International Journal of
Engineering and Technology, vol. 2, no. 3, June 2010.
[4] Prashant K. Khobragade, Latesh G. Malik, “Data Generation and Analysis
for Digital Forensic Application using Data Mining”, Fourth International
Conference on Communication Systems and Network Technologies, June
2015.
[5] Latesh G. Malik, “A Review on Data Generation for Digital Forensic
Investigation using Datamining”, IJCAT International Journal of Computing
and Technology, vol. 1, no. 3, April 2014.
[6] K. K. Sindhu, B. B. Meshram, “Digital Forensics and Cyber Crime Data
mining”, Journal of Information Security, May 2012.
[7] Chrysoula Tsochataridou, Avi Arampatzis, Vasilios Katos, “Improving
Digital Forensics Through Data Mining”, IMMM 2014, The Fourth
International Conference on Advances in Information Mining and
Management, September 2016.
[8] Daniel Compton, J.A. Hamilton, “An Examination of the Techniques and
Implications of the Crowd-sourced Collection of Forensic Data”, IEEE
International Conference on Privacy, Security, Risk, and Trust, and IEEE
International Conference on Social Computing, April 2014.
[9] Sonal Honale, Jayshree Borkar, “Framework for Live Digital Forensics
using Data Mining”, International Journal of Computer Trends and
Technology (IJCTT), vol. 22, no. 3, April 2015.
[10] Seung-hoon Kang, Juho Kim, “Network Forensic Analysis Using
Visualization Effect”, International Conference on Convergence and Hybrid
Information Technology, 2008, International Journal of Security, December
2015.
[11] Natarajan Meghanathan, Sumanth Reddy Allam and Loretta, “Tools And
Techniques For Network Forensics”, USA International Journal of Network
Security and Its Applications (IJNSA), vol. 1, no. 1, April 2014.
[12] Sachin S. Patil, Deepak Kapgate, P.S. Prasad, “A Review on Detection of
Attacks using Data Mining Techniques”, International Journal of Advanced
Research in Computer Science and Software Engineering, vol. 3, December
2013.
[13] Iman Paryudi, Ahmad Ashari, “Performance Comparison between Naive
Bayes, Decision Tree and k-Nearest Neighbor in Searching Alternative
Design in an Energy Simulation Tool”, (IJACSA) International Journal of
Advanced Computer Science and Applications, vol. 4, no. 11, 2013.