Intelligent Detection of Malicious Script Code
CS194, 2007-08
Benson Luk, Eyal Reuveni, Kamron Farrokh
Advisor: Adnan Darwiche

Introduction
• Three-quarter project
• Sponsored by Symantec
• Main focuses:
  • Web programming
  • Database development
  • Data mining
  • Artificial intelligence

Overview
• Current security software catches known malicious attacks based on a list of signatures
• The problem: new attacks are created every day
  • Developers must write new signatures for each attack
  • Until those signatures are made, users are vulnerable to these attacks

Overview (cont.)
• Our objective is to build a system that can effectively detect malicious activity without relying on signature lists
• The goal of our research is to see whether, and how, artificial intelligence can discern malicious code from non-malicious code

Data Gathering
• Gather data using a web crawler (probably a modified crawler based on the Heritrix software)
• The crawler scours a list of known "safe" websites
• If necessary, it will also branch out into websites linked from those sites for additional data
• While this is performed, we will gather key information on the scripts (function calls, parameter values, return values, etc.)
• This will be done in Internet Explorer

Data Storage
• When data is gathered, it will need to be stored for the analysis that takes place later
• We need to develop a database that can efficiently store the script activity of tens of thousands (possibly millions) of websites

Data Analysis
• Using information from the database, deduce normal behavior
• Find a robust algorithm for generating a heuristic for acceptable behavior
• The goal is to later weigh this heuristic against scripts to determine abnormal (and thus potentially malicious) behavior

Challenges
• Gathering
  • How do we grab relevant information from scripts?
  • How deep do we search? Good websites may inadvertently link to malicious ones, and the link graph is effectively unbounded
• Storage
  • In what form should the data be stored?
  • We need an efficient way to store the data without oversimplifying it; for example, a simple laundry list of function calls does not take call sequence into account
• Analysis
  • What analysis algorithm can handle all of this data?
  • How can we ensure that the normality heuristic it generates minimizes false positives and maximizes true positives?

Milestones
• Phase I: Setup
  • Set up equipment for research; ensure the whitelist is clean
• Phase II: Crawler
  • Modify the crawler to grab and output the necessary data so it can later be stored, and begin crawling for sample information
• Phase III: Database
  • Research and develop an effective structure for storing the data, and link it to the web crawler
• Phase IV: Analysis
  • Research and develop an effective algorithm for learning from massive amounts of data
• Phase V: Verification
  • Using the web crawler, visit a large volume of websites to verify that the heuristic generated in Phase IV is accurate
• Certain milestones may need to be revisited depending on the results of each phase
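The storage challenge noted above, keeping call sequence rather than a flat laundry list of calls, could be handled by recording each call with its position in the trace. A minimal sketch using SQLite; the table layout, column names, and sample URLs are illustrative assumptions, not part of the project plan:

```python
import sqlite3

# Sketch of a trace store that preserves call order.
# Schema and names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE script_calls (
        url      TEXT NOT NULL,     -- page the script ran on
        seq      INTEGER NOT NULL,  -- position of the call within the trace
        function TEXT NOT NULL,     -- e.g. 'document.write'
        params   TEXT,              -- serialized parameter values
        retval   TEXT,              -- serialized return value
        PRIMARY KEY (url, seq)      -- keeps each trace ordered
    )
""")

# A hypothetical trace captured while crawling one page.
trace = [
    ("document.write", "'<div>...</div>'", "undefined"),
    ("eval", "'payload'", "undefined"),
]
conn.executemany(
    "INSERT INTO script_calls VALUES ('http://example.com', ?, ?, ?, ?)",
    [(i, fn, args, ret) for i, (fn, args, ret) in enumerate(trace)],
)

# Reading back ORDER BY seq reconstructs the original call sequence.
rows = conn.execute(
    "SELECT function FROM script_calls "
    "WHERE url = 'http://example.com' ORDER BY seq"
).fetchall()
print([r[0] for r in rows])  # ['document.write', 'eval']
```

The `seq` column is the point: unlike an unordered list of calls, the stored trace can be replayed in order for sequence-aware analysis.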
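The analysis goal, a normality heuristic that respects call sequence and can flag abnormal behavior, might be prototyped as an n-gram model over function-call traces. A sketch under our own assumptions: the bigram choice, the scoring rule, and the toy function names are illustrative, not the project's chosen algorithm:

```python
from collections import Counter

def bigrams(trace):
    """Adjacent function-call pairs, so call order matters."""
    return list(zip(trace, trace[1:]))

def train(traces):
    """Count bigrams observed across known-good script traces."""
    model = Counter()
    for t in traces:
        model.update(bigrams(t))
    return model

def anomaly_score(model, trace):
    """Fraction of a trace's bigrams never seen in normal traffic.
    Higher scores suggest behavior worth flagging for review."""
    bg = bigrams(trace)
    if not bg:
        return 0.0
    unseen = sum(1 for b in bg if model[b] == 0)
    return unseen / len(bg)

# Toy example: traces from whitelisted pages vs. an unusual eval chain.
benign = [
    ["getElementById", "setAttribute", "appendChild"],
    ["getElementById", "appendChild"],
]
model = train(benign)
print(anomaly_score(model, ["getElementById", "appendChild"]))       # 0.0
print(anomaly_score(model, ["unescape", "eval", "document.write"]))  # 1.0
```

Thresholding such a score is one possible way to "weigh the heuristic against scripts"; tuning that threshold is exactly the false-positive/true-positive trade-off raised in the Challenges section.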