Anomaly Detection via Online Oversampling Principal Component Analysis
ABSTRACT
Anomaly detection has been an important research topic in data mining and machine
learning. Many real-world applications such as intrusion or credit card fraud detection require an
effective and efficient framework to identify deviated data instances. However, most anomaly
detection methods are typically implemented in batch mode, and thus cannot be easily extended
to large-scale problems without sacrificing computation and memory requirements. In this paper,
we propose an online over-sampling principal component analysis (osPCA) algorithm to address
this problem, and we aim at detecting the presence of outliers from a large amount of data via an
online updating technique. Unlike prior PCA based approaches, we do not store the entire data
matrix or covariance matrix, and thus our approach is especially of interest in online or large-scale problems. By over-sampling the target instance and extracting the principal direction of the
data, the proposed osPCA allows us to determine the anomaly of the target instance according to
the variation of the resulting dominant eigenvector. Since our osPCA need not perform eigen
analysis explicitly, the proposed framework is favored for online applications which have
computation or memory limitations. Compared with the well-known power method for PCA and
other popular anomaly detection algorithms, our experimental results verify the feasibility of our
proposed method in terms of both accuracy and efficiency.
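The over-sampling idea can be illustrated with a small batch sketch. It is not the online algorithm itself: it forms the covariance matrix explicitly and uses power iteration, purely to show how duplicating a target instance changes the dominant direction. The function names, the number of copies and the score 1 - |cos| between the two directions are assumptions made for illustration.

```python
import math

def principal_direction(rows, iters=500):
    """Unit dominant eigenvector of the sample covariance of rows (power iteration)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]
    # Assumption: the start vector is not orthogonal to the dominant direction.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def oversampling_score(data, target, copies=None):
    """Score in [0, 1]: how far the dominant direction moves when the
    target is duplicated `copies` times and appended to the data."""
    if copies is None:
        copies = max(1, len(data) // 10)  # illustrative over-sampling ratio
    u = principal_direction(data)
    u_tilde = principal_direction(list(data) + [target] * copies)
    cos = abs(sum(a * b for a, b in zip(u, u_tilde)))  # sign of u is arbitrary
    return 1.0 - min(1.0, cos)
```

A target that lies along the existing trend leaves the direction almost unchanged and scores near 0, while a target far off the trend rotates the direction and scores closer to 1.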
Existing System
Statistical approaches assume that the data follows some standard or
predetermined distributions, and this type of approach aims to find the outliers which deviate
from such distributions. For distance-based methods, the distances between each data point of
interest and its neighbors are calculated. If the result is above some predetermined threshold, the
target instance will be considered as an outlier. One of the representatives of this type of
approach is to use a density based local outlier factor (LOF) to measure the outlierness of each
data instance. Based on the local density of each data instance, the LOF determines the degree of
outlierness, which provides suspicious ranking scores for all samples. The most important
property of the LOF is the ability to estimate local data structure via density estimation. This
allows users to identify outliers which are sheltered under a global data structure.
Problems with the existing system:
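A distance-based check of the kind described above can be sketched as follows. This is a plain k-nearest-neighbour distance score rather than the LOF itself, and the function names, `k` and the threshold are hypothetical choices for illustration.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_distance_scores(points, k=3):
    """Mean distance from each point to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(euclidean(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

def flag_outliers(points, threshold, k=3):
    """True for every point whose score exceeds the predetermined threshold."""
    return [s > threshold for s in knn_distance_scores(points, k)]
```

Every call compares each point with all the others, which hints at why such batch methods become costly on large data sets.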
Most distribution models are assumed univariate, and thus the lack of robustness for
multidimensional data is a concern. Moreover, since these methods are typically implemented in
the original data space directly, their solution models might suffer from the noise present in the
data.
Proposed System
PCA is a well-known unsupervised dimension reduction method, which determines the principal
directions of the data distribution. Recomputing these directions by explicit eigen-analysis for
every target instance, however, would prohibit the use of our proposed framework for real-world
large-scale applications. Although the well-known power method is able to produce
approximated PCA solutions, it requires the storage of the covariance matrix and cannot be
easily extended to applications with streaming data or online settings. Therefore, we present an
online updating technique for our osPCA. This updating technique allows us to efficiently
calculate the approximated values without performing eigen-analysis or storing the data covariance
matrix.
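To give a feel for updating a principal direction without a covariance matrix, here is a minimal sketch using Oja's rule, a classic streaming estimator. It is a swapped-in technique chosen for illustration and is not claimed to be the update used by osPCA. The sketch assumes zero-mean inputs, and the names `oja_update` and `streaming_direction` and the learning rate are hypothetical.

```python
import math

def oja_update(w, x, lr=0.01):
    """One Oja's-rule step: move the unit vector w toward the dominant
    principal direction using only the current sample x."""
    y = sum(wi * xi for wi, xi in zip(w, x))  # projection of x onto w
    w = [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]  # renormalise to unit length

def streaming_direction(stream, dim, lr=0.01):
    """Consume samples one at a time; memory use is a single dim-vector."""
    # Assumption: the initial guess is not orthogonal to the true direction.
    w = [1.0] + [0.0] * (dim - 1)
    for x in stream:
        w = oja_update(w, x, lr)
    return w
```

Only the current estimate `w` is kept between samples, so storage grows with the dimension of the data rather than with the number of instances.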
ADVANTAGES:
1. The required computational costs and memory requirements are significantly reduced.
2. Our method is especially preferable in online, streaming data, or large scale problems.
IMPLEMENTATION
Implementation is the stage of the project when the theoretical design is turned
into a working system. Thus it can be considered to be the most critical stage in achieving a
successful new system and in giving the user confidence that the new system will work and
be effective.
The implementation stage involves careful planning, investigation of the existing
system and its constraints on implementation, design of methods to achieve changeover,
and evaluation of changeover methods.
Main Modules:
1. User Module:
In this module, users are authenticated before they can access the details
presented in the ontology system. Before accessing or searching the details, a user must have
an account in the system; otherwise they must register first.
2. Computation or memory limitations:
Here, memory is not needed for storing all the details in the database. The work is based
on an interface that just passes the information from client to server, so no memory
space is required for storing the data.
3. Clustering:
The training data are selected only by our assumption. So there is a possibility that,
in the previous method, some outlier data are considered normal data because of the training
data chosen. The clustering method is used to solve this problem. Clusters are formed from the
input data instances, and the outlier calculation is then applied to each cluster to find the
outliers exactly.
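The per-cluster step can be sketched as below. The source does not name a clustering algorithm, so the cluster labels are assumed to be given, and distance to the cluster centroid stands in for the outlier calculation. All names are hypothetical.

```python
import math
from collections import defaultdict

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]

def per_cluster_scores(points, labels):
    """Score each point against its own cluster only, so that a point
    hidden by the global structure can still stand out locally."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    centers = {lab: centroid(ps) for lab, ps in groups.items()}
    return [
        math.sqrt(sum((a - c) ** 2 for a, c in zip(p, centers[lab])))
        for p, lab in zip(points, labels)
    ]
```

Any other outlier score could replace the centroid distance. The point of the sketch is only that each cluster is scored separately.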
4. Anomaly Detection:
This module detects the outlierness of the user input. When the user gives an input to the
system, the system calculates the St value for the new input and then compares that new St value
with the threshold value calculated earlier.
If the St value of the new data instance is above the threshold value, then that input data is
identified as an outlier and is discarded by the system. Otherwise it is considered
a normal data instance, and the PCA value of that particular data instance is updated
accordingly.
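The decision rule described above can be sketched as follows. A running mean stands in for the PCA model and distance from that mean stands in for the St value. Both are assumptions made to keep the example short, and the names `RunningMeanModel` and `submit` are hypothetical.

```python
import math

class RunningMeanModel:
    """Stand-in model: keeps only a running mean, never the full data set."""

    def __init__(self, first):
        self.n = 1
        self.mean = [float(v) for v in first]

    def score(self, x):
        # Hypothetical stand-in for the St value: distance from the running mean.
        return math.sqrt(sum((xi - m) ** 2 for xi, m in zip(x, self.mean)))

    def update(self, x):
        # Incremental mean update; earlier instances are not stored.
        self.n += 1
        self.mean = [m + (xi - m) / self.n for xi, m in zip(x, self.mean)]

def submit(model, x, threshold):
    """Return True if x is flagged as an outlier; otherwise update the model."""
    if model.score(x) > threshold:
        return True       # outlier: discarded, model left untouched
    model.update(x)       # normal instance: fold it into the model
    return False
```

Discarding flagged inputs before the update keeps outliers from influencing the model that later inputs are compared against.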
System Configuration:
H/W System Configuration:
Processor    - Pentium III
Speed        - 1.1 GHz
RAM          - 256 MB (min)
Hard Disk    - 20 GB
Floppy Drive - 1.44 MB
Key Board    - Standard Windows Keyboard
Mouse        - Two or Three Button Mouse
Monitor      - SVGA
S/W System Configuration:
Operating System      : Windows 95/98/2000/XP
Application Server    : Tomcat 5.0/6.X
Front End             : HTML, Java, JSP
Scripts               : JavaScript
Server side Script    : Java Server Pages
Database              : MySQL 5.0
Database Connectivity : JDBC