Download IOSR Journal of Computer Engineering (IOSR-JCE)

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Nonlinear dimensionality reduction wikipedia , lookup

Transcript
IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 1, Ver. II (Jan – Feb. 2016), PP 71-76
www.iosrjournals.org
Classifying Evaluation Secure Patterns under Attacks
Mitnasala Mamatha1, Hussain Syed2
#1Student of M. Tech(CSE) and #2 Asst.Prof, Department of Computer Science and Engineering, QIS Institute
of technology, Ongole.
Abstract: Pattern Classification is one division for machine discovering that spotlights on acknowledgment of
examples and regularities in information. In antagonistic applications such as biometric verification, spam
sifting, system interruption identification the example grouping frameworks are utilized. Design arrangement
frameworks might display vulnerabilities if antagonistic situation is not considered. Multimodal biometric
frameworks are heartier to parodying assaults, as they consolidate data originating from various biometric
characteristics. Assess the security of example classifiers that formalizes and sums up the fundamental thoughts
proposed in the writing and give samples of its utilization in three genuine applications. We propose a structure
for assessment of example security, model of foe for characterizing any assault situation. Reported results
demonstrate that security assessment can give a more finish comprehension of the classifier's conduct in
antagonistic situations, and lead to better plan decisions.
Index Terms: Data mining, Pattern characterization, Model of Adversary.
I.
Introduction:
In example request structures machine learning estimations are used to perform security-related
applications like biometric approval, framework intrusion area, and spam filtering, to perceive a "real" and a
"noxious" sample class. The data can be purposely controlled by an adversary to make classifiers to convey
false negative. Regardless of standard ones, these Applications have a characteristic opposing nature since the
information data can be purposefully controlled by a keen and adaptable adversary to undermine classifier
operation. This often offers climb to a defenses challenge between the enemy and the classifier organizer.
Doubtlessly comprehended specimens of strikes against case classifiers are: exhibiting a fake biometric
trademark to a biometric affirmation structure (deriding ambush) .Well known instances of attacks are: Spoofing
strikes where one individual or program deliberately distorting data and accordingly getting an illegitimate
purpose of inclination altering framework groups fitting in with intrusive development controlling substance of
messages adjusting framework packages having a spot with meddling movement. Not well arranged machine
learning is an examination field that lies at the meeting of machine learning and PC security. It hopes to engage
the protected choice of machine learning systems in will-arranged settings like spam filtering, malware
recognizable proof and biometric affirmation. Tests include: strikes in spam isolating, where spam messages are
waded through erroneous spelling of dreadful words or insertion of good words; ambushes in PC security, e.g.,
to disorder malware code within framework packages or beguile signature acknowledgment; attacks in
biometric affirmation, where fake biometric attributes might be mishandled to copy a bona fide customer
(biometric mocking) or to exchange off customers' organization shows that are adaptively updated over
time.[16] To fathom the security properties of learning computations in opposing settings, one should address
the going with major issues:
i. recognizing potential vulnerabilities of machine learning computations in the midst of learning and request;
ii. Figuring legitimate attacks that identify with the recognized perils and evaluating their impact on the
concentrated on structure;
iii. Proposing countermeasures to improve the security of machine learning estimations against the considered
attacks.
Fig. 1 Email Types
DOI: 10.9790/0661-18127176
www.iosrjournals.org
71 | Page
Classifying Evaluation Secure Patterns under Attacks
II.
Related Work:
Biometric frameworks have been observed to be helpful devices for individual ID and confirmation. A
biometric trademark is any physiological of behavioral characteristic of a man that can be utilized to recognize
that individual from other individuals. A couple key parts of a human physiological or behavioral attribute that
make for a solid biometric for acknowledgment are all-inclusiveness, peculiarity perpetual quality and
collectability. Era of preparing and test information sets from accumulated information is an imperative
undertaking in adding to a classifier with high era capacity. Reassembling strategies are utilized as a part of
factual investigation, are utilized for model choice by evaluating the characterization execution of classifiers.
Reassembling methods are utilized for evaluating insights, for example, the mean and the middle by arbitrarily
selecting information from the given information set, figuring measurements on that information and rehashing
above methodology ordinarily. Parody assaults comprise in submitting fake biometric qualities to biometric
frameworks, and this is a noteworthy risk in security. Multi-modular biometric frameworks are usually utilized
as a part of satire assaults. Multimodal biometric frameworks for individual character acknowledgment are
exceptionally valuable from recent years. It has been demonstrated that consolidating data originating from
various biometric attributes can conquer the cutoff points and the shortcomings natural in each individual
biometric, bringing about a higher precision. Interruption location frameworks investigate system movement to
avert and identify malevolent exercises like interruption endeavors, port outputs, and dissent of-administration
assaults. At the point when suspected noxious activity is identified, a caution is raised by the IDS and thusly
taken care of by the framework manager. Two primary sorts of IDSs exist: abuse identifiers and inconsistency
based ones. These guarantee that the characteristic is accessible from all individuals, is enough variable among
all individuals, does not change fundamentally after some time, and is sensibly ready to be measured. The issue
with any human quality that meets these criteria is in the execution, worthiness, and circumvention of the
biometric highlight. Execution is an issue coming about fundamentally from the blend of absence of variability
in the biometric attribute, commotion in the sensor information because of ecological components, and
heartiness of the coordinating calculation. Worthiness shows how willing the customer pool will be to utilize the
biometric identifier consistently. Circumvention is the likelihood of a non-customer (impostor) moving beyond
the framework utilizing misleading strategies. The way to making a safe multimodal biometric framework is in
how the data from the distinctive modalities is melded to settle on an official conclusion. There are two distinct
classifications of combination plans for various classifiers; standard based and directed based. Regulated
systems, then again, require preparing however can frequently give preferred results over the principle based
techniques. For instance, a combination system utilizing a bolster vector machine (SVM) could out-perform a
combination calculation utilizing the entirety principle. Bringing a quality measure into a combination
calculation is one strategy that has been utilized to help execution in multi biometric frameworks. On the off
chance that for occurrence, a more secure biometric of fantastic gives a low match score and a less secure
biometric gives a high match score, then there is a high probability of a satire assault. It is generally
comprehended that one of the qualities of a multimodal framework is in its capacity to oblige for loud sensor
information in an individual methodology. Conversely, a more secure calculation, to address the issue of a
parody assault on a fractional subset of the biometric modalities, must require satisfactory execution in all
modalities. This kind of calculation would perpetually discredit, to some degree, the commitment of a
multimodal framework to execution in the vicinity of boisterous sensor information. A multimodal framework
enhances the execution angle however expands the security just somewhat since it is still defenseless against
incomplete farce assaults. Upgraded combination routines which use ways to deal with enhance security will
again endure diminished execution when given loud information. The bolster vector machine (SVM) is an
activity strategy for information association and inversion rubrics after measurements, for occasion the SVM
can be reused to study polynomial, round establishment reason (RBF) then multi-layer observation (MLP)
classifiers SVMs stayed boss discretionary by Vapnik in the 1960s for association to build up a piece of
infiltrate in Investigate on owed to development in the systems in addition to rationality joined with deferments
to inversion and thickness guess. SVM's rose after arithmetical learning rationality the objective presence to
determine separate the risky of consideration denied of determining extra hazardous as a center stage. SVM's are
established on the physical risk minimization code, painstakingly joined with general inaction rationality. This
conviction joins volume switch to stop over-fitting and accordingly is aim finished reaction to the inclination
difference exchange off problem.
III. Spam Filtering Overview:
Over the past few years, spam filtering software has gained popularity due to its relative accuracy and
ease of deployment. With its roots in text classification research, spam filtering software seeks to answer the
question “Whether the message x is spam or not?” The means by which this question is addressed varies upon
the type of classification algorithm in place. While the categorization method differs between statistical filters,
their basic functionality is similar. The basic model is often known as the bag of words (multinomial) or
DOI: 10.9790/0661-18127176
www.iosrjournals.org
72 | Page
Classifying Evaluation Secure Patterns under Attacks
multivariate model. Essentially, a document is distilled into a set of features such as words, phrases, meta-data,
etc. This set of features can then be represented as a vector whose components are Boolean (multivariate) or real
values (multinomial). One should note that with this model the ordering of features is ignored. Classification
algorithm uses the feature vector as a basis upon which the document is judged. The usage of the feature vector
varies between classification methods. As the name implies, rule based methods classify documents based on
whether or not they meet a particular set of criteria. Machine learning algorithms are primarily driven by the
statistics (e.g. word frequency) that can be derived from the feature vectors. One of the widely used methods,
Bayesian classification, attempts to calculate the probability that a message is spam based upon previous feature
frequencies in spam and legitimate e-mail.
ALGORITHM
Construction of Training (TR) or Testing Set (TS) Generation
This algorithm is used to construction of training and testing of any desired size from the distribution
D. It follows a step by step procedure.This algorithm is based on classical resamplingtechniques such as crossvalidation and bootstrapping. Which consists of discriminating between legitimate (L) and malicious (M)
samples.
X denotes a d-dimensional feature vector
Properties to those exhibited by classical performance evaluation
Methods based on the same techniques. The step by step procedure as follows.
i.
Consider there are „n‟ number of labeled sample.
ii.
The class label „y‟ belongs to legitimate(L) or malicious(M) and „a‟ belongs to true (T) or false(F).
iii.
Initially the sample set is empty.
iv.
For the distribution I from 1 to n.
v.
Take a sample y from probability distribution of L,M.
vi.
The probability of a by y is equal to y then take the sample „a‟.
vii.
Draw the sample „x‟ which is the combination of y and a, if analytically defined otherwise draw a
sample with replacement from D(y, a).
viii.
Now the sample S have the distribution of x, y.
ix.
End for
x.
Return to the sample set.
IV.
Spam And Online Svms:
The support vector machine (SVM)is a exercise procedure for knowledge organization and reversion
rubrics after statistics, for instance the SVM can be recycled to study polynomial, circular foundation purpose
(RBF) then multi-layer perception (MLP) classifiers SVMs remained chief optional by Vapnik in the 1960s for
organization beside smustlately develop an part of penetrate in investigate on owed to growths in the methods
plus philosophy joined with postponements to reversion and thicknessapproximation.SVM‟s ascendedafter
arithmeticalknowledgephilosophy the goal existence to resolve separate the problematic of attention deprived of
resolving additional problematic as a middle stage. SVM‟s are founded on the physical threat minimisation
code, carefully connected to regular inaction philosophy. This belief joins volume switch to stop over-fitting and
therefore is ain complete response to the bias-variance trade-off quandary. Binary key rudiments in the
application of SVM are the methods of precise software design and seed purposes. The limits are originated by
resolving a quadratic software design problematic with direct parity and disparity restraints; slightly than by
resolving a non-convex, unimpeded optimisation problem. The suppleness of seed purposes lets the SVM to
exploration a extensive diversity of theory places. The geometrical clarification of support vector classification
(SVC) is that the procedure pursuits for the best unravelling superficial, i.e. the hyper plane that is, in a
intelligence, intermediate after the binary courses. This best unscrambling per plane has several agree able
arithmetical possessions. SVC is drawn chief aimed at the linearly divisible circumstance. Kernel purposes are
then presented in instruction to concept non-linear choice exteriors. In conclusion, for noisy data, when whole
parting of the binary courses might not be desirable, relaxed variables are presented to permit for exercise faults.
V.
Problem Statement
A systematic and unified dealing of this issue is thus needed to allow the trusted taking on of pattern
classifiers in adversarial environments, starting from the theoretical foundations up to novel design methods,
extending the classical design cycle.
Pattern classification systems base on classical theory and design methods do not take into account
adversarial settings, they exhibit vulnerabilities to some potential attacks, allowing adversaries to undermine
their usefulness.
DOI: 10.9790/0661-18127176
www.iosrjournals.org
73 | Page
Classifying Evaluation Secure Patterns under Attacks
Three main open issues can be identified: Analyzing the vulnerabilities of classification algorithms, and the
corresponding attacks.
Developing novel methods to assess classifier security against these attacks, which is not possible using
classical performance evaluation methods.
Developing novel design methods to promise classifier security in adversarial environments.
VI. Pattern Recognition:
Pattern recognition is a branch of machine learning that focuses on the recognition of patterns and
regularities in data, although it is in some cases considered to be nearly synonymous with machine learning.
Pattern recognition systems are in many cases trained from labelled "training" data (supervised learning), but
when no labelled data are available other algorithms can be used to discover previously unknown patterns
(unsupervised learning). The terms pattern recognition, machine learning, data mining and knowledge discovery
in databases (KDD) are hard to separate, as they largely overlap in their scope. Machine learning is the common
term for supervised learning methods and originates from artificial intelligence, whereas KDD and data mining
have a larger focus on unsupervised methods and stronger connection to business use. Pattern recognition has its
origins in engineering, and the term is popular in the context of computer vision: a leading computer vision
conference is named Conference on Computer Vision and Pattern Recognition. In pattern recognition, there may
be a higher interest to formalize, explain and visualize the pattern; whereas machine learning traditionally
focuses on maximizing the recognition rates. Yet, all of these domains have evolved substantially from their
roots in artificial intelligence, engineering and statistics; and have become increasingly similar by integrating
developments and ideas from each other. In machine learning, pattern recognition is the assignment of a label to
a given input value. In statistics, discriminate analysis was introduced for this same purpose in 1936. An
example of pattern recognition is classification, which attempts to assign each input value to one of a given set
of classes (for example, determine whether a given email is "spam" or "non-spam"). However, pattern
recognition is a more general problem that encompasses other types of output as well. Other examples are
regression, which assigns a real-valued output to each input; sequence labelling, which assigns a class to each
member of a sequence of values (for example, part of speech tagging, which assigns a part of speech to each
word in an input sentence); and parsing, which assigns a parse tree to an input sentence, describing the syntactic
structure of the sentence.
VII.
Contributions, Limitations And Open Issues
In this paper we focused on empirical security evaluation of pattern classifiers that have to be deployed
in adversarial environments, and proposed how to revise the classical performance evaluation design step, which
is not suitable for this purpose. Our main contribution is a framework for empirical security evaluation that
formalizes and generalizes ideas from previous work, and can be applied to different classifiers, learning
algorithms, and classification tasks. It is grounded on a formal model of the adversary that enables security
evaluation; and can accommodate application-specific techniques for attack simulation. This is a clear
advancement with respect to previous work, since without a general framework most of the proposed techniques
(often tailored to a given classifier model, attack, and application) could not be directly applied to other
problems. An intrinsic limitation of our work is that security evaluation is carried out empirically, and it is thus
data dependent; on the other hand, model-driven analyses require a full analytical model of the problem and of
the adversary‟s behaviour that may be very difficult to develop for real-world applications. Another intrinsic
limitation is due to fact that our method is not application-specific, and, therefore, provides only high-level
guidelines for simulating attacks. Indeed, detailed guidelines require one to take into account application
specific constraints and adversary models. Our future work will be devoted to develop techniques for simulating
attacks for different applications. Although the design of secure classifiers is a distinct problem than security
evaluation, our framework could be also exploited to this end.
VIII. Experimental Results
Table1.0classificationofpatternclassifierpotential
Attacks
1
DOI: 10.9790/0661-18127176
pattern
classifier
Potential
0.0992
2
6
10
0.0995
5
5
20
0.0996
5
5
30
0.0997
7
8
50
5
10
60
www.iosrjournals.org
74 | Page
Classifying Evaluation Secure Patterns under Attacks
70
60
Attacks
50
patternc
40
1
2
3
4
30
5
lassifierP
otential
Fig2. Function of classifier values
Each model decreases that is20it drops to zero for values between 3and 5 (depending on the classifier). This
means that all testing spam emails gotmis-classified as legitimate, after adding or obfuscating from3 to
5words.The pattern and attack10classifiers perform very similarly when they are not under attack, regardless of
the feature set size; therefore, according to the viewpoint of classical performance evaluation, the designer could
choose any of the eight models.0However, security evaluation
IX. Conclusion:
In this paper we focused on empirical security evaluation of pattern classifiers that have to be deployed
in adversarial environments, and proposed how to revise the classical performance evaluation design step, which
is not suitable forth is purpose. Our main contribution is a framework for empirical security evaluation that
formalizes and generalizes ideas from previous work, and can be applied to different classifiers, learning
algorithms, and classification tasks. It is grounded on a formal model of the adversary, and on a model of data
distribution that can represent all the attacks considered in previous work; provides a systematic method for the
generation of training and testing sets that enables security evaluation; and can accommodate applicationspecific techniques for attack simulation. An intrinsic limitation of our work is that security evaluation is carried
out empirically, and it is thus data dependent; on the other hand, model-driven analyses require a full analytical
model of the problem and of the adversary‟s behaviour that may be very difficult to develop for real-world
applications. Another intrinsic limitation is due to fact that our method is not application-specific, and, therefore,
provides only high-level guidelines for simulating attacks. Indeed, detailed guidelines require one to take into
account application-specific constraints and adversary models.
References:
[1].
[2].
[3].
[4].
[5].
[6].
[7].
[8].
[9].
[10].
[11].
[12].
[13].
[14].
[15].
[16].
R.N. Rodrigues, L.L. Ling, and V. Govindaraju, “Robustness ofMultimodal Biometric Fusion Methods against Spoof Attacks,”J.
Visual Languages and Computing, vol. 20, no. 3, pp. 169-179, 2009.
P. Johnson, B. Tan, and S. Schuckers, “Multimodal Fusion Vulnerabilityto Non-Zero Effort (Spoof) Imposters,” Proc. IEEE
Int‟lWorkshop Information Forensics and Security, pp. 1-5, 2010.
P. Fogla, M. Sharif, R. Perdisci, O. Kolesnikov, and W. Lee,“Polymorphic Blending Attacks,” Proc. 15th Conf. USENIX
SecuritySymp., 2006.
G.L. Wittel and S.F. Wu, “On Attacking Statistical Spam Filters,”Proc.First Conf. Email and Anti-Spam, 2004.
D. Lowd and C. Meek, “Good Word Attacks on Statistical SpamFilters,” Proc. Second Conf. Email and Anti-Spam, 2005.
A. Kolcz and C.H. Teo, “Feature Weighting for Improved ClassifierRobustness,” Proc. Sixth Conf. Email and Anti-Spam, 2009.
D.B. Skillicorn, “Adversarial Knowledge Discovery,” IEEE IntelligentSystems, vol. 24, no. 6, Nov./Dec. 2009.
D. Fetterly, “Adversarial Information Retrieval: The Manipulationof Web Content,” ACM Computing Rev., 2007.
R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification.Wiley-Interscience Publication, 2000.
N. Dalvi, P. Domingos, Mausam, S. Sanghai, and D. Verma,“Adversarial Classification,” Proc. 10th ACM SIGKDD Int‟l
Conf.Knowledge Discovery and Data Mining, pp. 99-108, 2004.
M. Barreno, B. Nelson, R. Sears, A.D. Joseph, and J.D. Tygar, “Can Machine Learning be Secure?” Proc. ACM Symp.
Information, Computer and Comm. Security (ASIACCS), pp. 16-25, 2006.
A.A. C_ardenas and J.S. Baras, “Evaluation of Classifiers: Practical Considerations for Security Applications,” Proc. AAAI
Workshop Evaluation Methods for Machine Learning, 2006.
P. Laskov and R. Lippmann, “Machine Learning in Adversarial Environments,” Machine Learning, vol. 81, pp. 115-119, 2010.
L. Huang, A.D. Joseph, B. Nelson, B. Rubinstein, and J.D. Tygar, “Adversarial Machine Learning,” Proc. Fourth ACM Workshop
Artificial Intelligence and Security, pp. 43-57, 2011.
M. Barreno, B. Nelson, A. Joseph, and J. Tygar, “The Security of Machine Learning,” Machine Learning, vol. 81, pp. 121-148,
2010.
D. Lowd and C. Meek, “Adversarial Learning,” Proc. 11th ACM SIGKDD Int‟l Conf. Knowledge Discovery and Data Mining,
pp. 641- 647, 2005.
DOI: 10.9790/0661-18127176
www.iosrjournals.org
75 | Page
Classifying Evaluation Secure Patterns under Attacks
AUTHORS:
MITNASALA MAMATHA isPursuingM.Tech (Computer Science and Engineering), in
QIS Institute of Technology, PrakasamDist, Andhra Pradesh,India.
HUSSAIN SYED currently working as Asst.Professor in QIS Institute of Technology, in
the Department of Computer Science and Engineering, Ongole, Prakasam Dist, Andhra
Pradesh, India
DOI: 10.9790/0661-18127176
www.iosrjournals.org
76 | Page