Detecting Web Attacks Using Multi-Stage Log Analysis
Presented by: Akhil Katpally
Authors: Melody Moh, Santhosh Pininti, Sindhusha Doddapaneni, Teng-sheng Moh
Agenda
• Goal of the paper
• Introduction
• Background and Related Studies
• System Design and Implementation
• Experiments and Results
• Conclusion
Goal of the paper
• The authors propose a new Multi-Stage Log Analysis system that
combines Pattern Matching and supervised Machine Learning methods.
• The system can effectively detect new SQL-injection attacks.
• The authors implemented a proof of concept of the proposed system
on Amazon AWS, using Kibana for Pattern Matching and Bayes Net for
Machine Learning.
Introduction
• With so much of daily life depending on the web, its security has
become extremely important.
• One of the major security issues of web applications is SQL injection.
• SQL injection ranks at the top of the OWASP Top 10 list of security issues.
• Because even emerging cloud technology is accessed through web
interfaces, web security is a top priority for Internet and Cloud Service
Providers (ISP/CSP).
Introduction….Continued
• Real-time Log Analysis is one major procedure for detecting and
preventing SQL-injection attacks. It uses Pattern Matching and Machine
Learning techniques.
• With Pattern Matching, only known injection patterns are recognized;
patterns with even small changes are not recognized.
• Existing Log Analysis methods for SQL injection detection are based on
either Pattern Matching or Machine Learning.
• Proposed system uses both Pattern Matching and Machine Learning.
Contributions
• Proposed a multi-stage architecture for detecting SQL injection attacks.
• Implemented a prototype based on proposed architecture, using Bayes
Net and Kibana.
• Compared the pros and cons of Pattern Matching (Kibana) and Machine
Learning (Bayes Net).
• Evaluated the 2-stage system through a series of experiments.
Background and Related Studies
• SQL injection: Attacker inputs an SQL query, which modifies or damages
the database that is connected to the target web application.
• Types of SQL injection: Order Wise, Blind, and Against Database.
• Log Analysis (Log4j): understanding logs and extracting useful
information.
• Pattern Matching: checks whether a set of words is present in the given
text.
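The pattern-matching idea can be illustrated with a minimal Python sketch. The signature list and the sample log lines are hypothetical, chosen only for illustration; they are not the queries used by the authors.

```python
import re

# Hypothetical SQL-injection signatures (illustrative only, not from the paper).
SIGNATURES = [
    r"\bunion\b.+\bselect\b",   # UNION-based injection
    r"\bor\b\s+1\s*=\s*1",      # tautology such as OR 1=1
    r"--",                      # SQL comment used to truncate a query
]
PATTERNS = [re.compile(s, re.IGNORECASE) for s in SIGNATURES]


def matches_known_pattern(log_line: str) -> bool:
    """Return True if any known signature occurs in the log line."""
    return any(p.search(log_line) for p in PATTERNS)


if __name__ == "__main__":
    samples = [
        "GET /login?user=admin' OR 1=1 --",
        "GET /products?id=42",
        "GET /login?user=x' OR 2=2",
    ]
    for line in samples:
        print(matches_known_pattern(line), line)
```

The third sample, a small variation on the tautology, is not matched by any of these signatures, which shows why a fixed pattern list detects only the patterns it specifies.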
Background and Related Studies….Continued
• Logstash: A data pipeline tool that connects to a variety of sources
and receives different types of logs (system, web server, error, and
application logs).
• Elasticsearch: Search and data analysis software that gives deep
insight into streaming data. Built on Apache Lucene.
• Kibana: Data visualization interface for real-time summarizing and
charting of streaming data.
Background and Related Studies….Continued
• Machine Learning: A way of making a computer learn and act without
being explicitly programmed.
• Naïve Bayes Classification: A simple probabilistic classifier built on
Bayes' theorem, which gives the probability of an event occurring based
on given conditions that are related to the event.
• Bayes Networks are often used to relax the independent-attributes
assumption of Naïve Bayes Classification, which can improve
performance.
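For reference, the rules behind both classifiers can be written out. The symbols are notation assumed here for illustration, not taken from the slides: $C$ is the class (for example, SQL-injection log or regular log), $x_1,\dots,x_n$ are the log attributes, and $\mathrm{pa}(X_j)$ denotes the parents of node $X_j$ in the network.

```latex
% Bayes' theorem for class C given attributes x_1..x_n
P(C \mid x_1,\dots,x_n) = \frac{P(C)\,P(x_1,\dots,x_n \mid C)}{P(x_1,\dots,x_n)}

% Naive Bayes: attributes are assumed conditionally independent given C
P(C \mid x_1,\dots,x_n) \propto P(C)\prod_{i=1}^{n} P(x_i \mid C)

% Bayes network: the joint distribution factorizes over each node's parents,
% so an attribute may depend on other attributes as well as on the class
P(X_1,\dots,X_m) = \prod_{j=1}^{m} P\bigl(X_j \mid \mathrm{pa}(X_j)\bigr)
```

Naïve Bayes is the special case of the network factorization in which the class is the only parent of every attribute.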
System Design and Implementation
• Single-Stage Architecture
• Application logs are generated using the log4j library.
• Either Machine Learning method (Bayes Net) or Pattern Matching method
(ELK system) is used for SQL injection detection.
[Figure: single-stage architecture. Components: web application users,
web application logic (log4j), log files, Logstash, Elasticsearch, Kibana,
preprocessing for WEKA, Bayes Net model, Bayes Net rank, analyst.]
System Design and Implementation….Continued
• Multi-Stage Architecture
• The proposed method combines both machine learning and pattern
matching.
• WEKA is the Machine Learning tool used; a model is first trained on
the training data.
• The generated Bayes Net model is tested using 5-fold cross-validation
and has an accuracy of 78.8%.
[Figure: multi-stage architecture. Components: web application users,
web application logic (log4j), log files, preprocessing for Kibana,
Elasticsearch, Kibana, preprocessing for WEKA, Bayes Net model,
Bayes Net rank, analyst.]
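The multi-stage idea can be sketched in Python as two detectors run in sequence. This is a minimal sketch under one assumption: a log that the first stage does not flag is handed to the second stage. The two stage functions are hypothetical stand-ins, not the authors' Kibana queries or WEKA Bayes Net model.

```python
from typing import Callable, Iterable, List

Detector = Callable[[str], bool]


def two_stage_detect(logs: Iterable[str], first: Detector, second: Detector) -> List[str]:
    """Flag a log if the first stage detects it; otherwise ask the second stage."""
    flagged = []
    for line in logs:
        # `or` short-circuits, so the second stage runs only when the first misses.
        if first(line) or second(line):
            flagged.append(line)
    return flagged


# Hypothetical stand-in for a pattern-matching stage.
def pattern_stage(line: str) -> bool:
    return "union select" in line.lower()


# Hypothetical stand-in for a trained classifier's prediction.
def learned_stage(line: str) -> bool:
    return line.count("'") >= 2


if __name__ == "__main__":
    logs = [
        "id=1 UNION SELECT password FROM users",
        "name=' OR 'a'='a",
        "id=42",
    ]
    print(two_stage_detect(logs, pattern_stage, learned_stage))
```

Swapping the `first` and `second` arguments gives the other ordering. In this simplified sketch both orderings flag the same logs, so it does not reproduce the accuracy difference between orderings reported in the experiments.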
System Design and Implementation….Continued
• Log Generation: We can either use parsers and filters to remove
unnecessary information from the logs, or use a logging library (log4j)
to create custom logs.
• Preprocessing for WEKA (single stage): attributes in the test set must
match those in the training set.
• Preprocessing for WEKA (multi stage): logs not detected by Kibana are
input to WEKA. A Unix script converts the CSV file to an ARFF file for
WEKA input.
• Preprocessing for Kibana (multi stage): the output of WEKA is used as
input to Kibana. A Unix script converts the ARFF file to a text file.
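The CSV-to-ARFF step can be sketched as follows. The slides mention a Unix script; this Python version is a substitute for illustration only. The column names (`ip`, `request`, `label`), the relation name, and the class values are hypothetical, and non-class columns are declared as ARFF `string` attributes for simplicity.

```python
import csv
import io
from typing import List


def csv_to_arff(csv_text: str, relation: str, class_values: List[str]) -> str:
    """Convert CSV text (header row first, class label in the last column) to ARFF text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = [f"@relation {relation}", ""]
    # Assumption: every non-class column is declared as a string attribute.
    for name in header[:-1]:
        lines.append(f"@attribute {name} string")
    # The class column is nominal, listing its allowed values in braces.
    lines.append(f"@attribute {header[-1]} {{{','.join(class_values)}}}")
    lines += ["", "@data"]
    for row in data:
        # Quote string values and escape backslashes and single quotes.
        values = ["'" + v.replace("\\", "\\\\").replace("'", "\\'") + "'" for v in row[:-1]]
        lines.append(",".join(values + [row[-1]]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "ip,request,label\n10.0.0.1,id=42,regular\n10.0.0.2,id=1' OR 1=1,sql\n"
    print(csv_to_arff(sample, "weblogs", ["sql", "regular"]))
```

Keeping the `@attribute` declarations identical for the training and test files is what satisfies the requirement that test-set attributes match the training set.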
Experiment Setup
• Dataset: web application logs generated using the Log4j framework.
Data            Total Logs    SQL Logs    Regular Logs
Training Set    2000          547         1453
Testing Set     10000         2812        7188
• Web Application: developed using Java, Bootstrap, HTML, CSS,
JavaScript, and MySQL, and hosted on an Amazon AWS Linux instance.
• Kibana and Bayes Net methods are used.
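As a quick sanity check on the dataset counts, the class balance of each split can be computed. The counts below are copied from the dataset table; the dictionary layout and function name are illustrative choices.

```python
# Log counts copied from the dataset table in this slide.
DATASET = {
    "Training Set": {"total": 2000, "sql": 547, "regular": 1453},
    "Testing Set": {"total": 10000, "sql": 2812, "regular": 7188},
}


def sql_share(counts: dict) -> float:
    """Percentage of logs in a split that are SQL-injection logs."""
    # The two classes must add up to the reported total.
    assert counts["sql"] + counts["regular"] == counts["total"]
    return round(100 * counts["sql"] / counts["total"], 2)


if __name__ == "__main__":
    for split, counts in DATASET.items():
        print(split, sql_share(counts))
```

Both splits contain a little over a quarter SQL-injection logs, so the training and testing sets have a similar class balance.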
Kibana vs Bayes Net
Purpose
• Kibana: used for detecting SQL injections and visualizing data.
• Bayes Net: used for classifying logs into SQL-injection logs and other logs.
Mechanism
• Kibana: uses Pattern Matching techniques for detection.
• Bayes Net: uses supervised Machine Learning to learn and detect attacks.
Overhead
• Kibana: no file conversion is required; input is read directly from a text file.
• Bayes Net: loads only ARFF files, so log files must be converted to ARFF.
• Kibana: no preprocessing is required; filters can be used to extract the required data.
• Bayes Net: preprocessing is required before logs are passed to the model for classification.
• Kibana: no training is required; queries are written for detection.
• Bayes Net: training is required, which involves manual classification.
Pros and Cons
• Kibana: a real-time system where new queries may be issued.
• Bayes Net: not a real-time system, as it involves offline training.
• Kibana: can detect only specified patterns; cannot detect new types of SQL injection.
• Bayes Net: can detect new patterns, since it considers attributes like IP address
while classifying.
• Kibana: results are in visualized form and easy to analyze.
• Bayes Net: results are in text form and difficult to analyze.
Experiment Results
Method                            Accuracy for SQL Detection (%)
Machine Learning: Naïve Bayes     61.7
Machine Learning: Bayes Net       80.0
Pattern Matching: Kibana          85.3
Kibana followed by Bayes Net      94.7
Bayes Net followed by Kibana      95.4
Conclusion
• A multi-stage log analysis architecture has been proposed, which uses
both machine learning and pattern matching.
• Experimental results show that the two-stage architecture is more
accurate, particularly when the Bayes Net model precedes Kibana. Kibana
can also provide the final output with visualization.
• Further improvements can be made to the Kibana queries, and
unsupervised machine learning methods could be used, which may lead to
real-time log analysis.