Using Data Mining to Develop Profiles to Anticipate Attacks
Systems and Software Technology Conference (SSTC 2008), May 1, 2008
Dr. Michael L. Martin, MITRE
Uma Marques, MITRE

MITRE Standard Disclaimer
• The author's affiliation with The MITRE Corporation is provided for identification purposes only, and is not intended to convey or imply MITRE's concurrence with, or support for, the positions, opinions or viewpoints expressed by the author.

Where Is the Threat?
• Most computer security money is spent on prevention (a bastion mentality).
• Most of the loss is from insider activity (82%).
• Intrusion detection is the art of detecting and responding to computer misuse.

Intrusion Detection (ID)
• Deterrence (we will find out what you did and catch you)
• Detection
  - Misuse detection, based on known patterns of attack (signatures)
  - Anomaly detection, based on a profile of expected behavior
  - Patterns of acceptable behavior
  - Patterns of known misbehavior

Intrusion Detection (continued)
• Response
• Damage assessment (need to assess in dollar terms)
• Attack anticipation
  - When (time of year or day; significant dates)
  - Type
• Prosecution support (forensics)

Comparison of Network- and Host-Based Intrusion Detection
• Host-based
  - Patterns of file access
  - Patterns of application execution
• Network-based
  - Analysis of packets and other network activity

Sensor Placement & Firewalls
• Sensors placed outside the firewall (sometimes called the DMZ, or demilitarized zone) are useful for detecting the source addresses of attack attempts and for attack anticipation.
• Sensors placed inside the firewall are useful for detecting attacks that get through the firewall and for detecting unauthorized outgoing traffic.

Threats to Normal Network Traffic
(Diagram: A is the sender, B is the receiver, C is a third party.)
• Normal transmission: A sends data to B.
• Interception: C obtains a copy of data intended for B only.
• Modification: C intercepts and changes data intended for B.
• Spoofing: C sends a message to B, which B assumes is from A.
• Masquerade: C masquerades as B, so A thinks B received the message.
• Monitoring: C learns about A and B by analyzing traffic.

Table 10-1. Hierarchy of Attacker Skills
• Clueless
  - Ability: Virtually no skills.
  - Evidence: All activities are readily apparent.
• Script Kiddie
  - Ability: Able to find ready-made exploit scripts on the Internet and run them following rote instructions. This code may give them root access, activity-hiding capabilities, and back doors for return visits. Unable to deal with non-standard UNIX configurations.
  - Evidence: May attempt to cover tracks, but with limited success. Activities can be detected with minimal effort.
• Guru
  - Ability: Equivalent to an experienced systems administrator. Able to manipulate UNIX systems that are not configured in the standard way. Able to program in C, Perl, and shell script. Checks for the existence of security programs and of logging performed off-system.
  - Evidence: Carefully clears out log files to remove evidence of the original compromise. Leaves no obvious traces associated with the account used to access the system. May leave Trojan horses behind for future access.
• Wizard
  - Ability: Intimate knowledge of UNIX internals. Capable of programming in assembly language. Can manipulate hardware and software. Very rare!
  - Evidence: Leaves virtually no useful evidence on the attacked host.

Attacker Chain of Attack
• Attacker: no attack code running on the original workstation.
• Dial-in: dial-up to a stolen ISP account; might use a hacked phone switch to confuse the trail.
• Source: the attack originates from a stolen user account; no root access is necessary. The user directory may contain executables and, especially, data files related to system attacks.
• Proxy: one or more intermediary hosts used as cutouts to confuse the trail. The attacker can telnet in and telnet back out of a stolen account without root access, and may use netcat or another introduced executable to set up a convenient proxy.
• Attack target: automated attack software normally requires root privileges to access the network.
  Manual attacks may not need root access.
  - Attack goals can include: disabling a host; gaining logon access; manipulating a chat room; special powers on a game server.
  - Hostile code can include: sniffer; IRC bot; game bot; denial-of-service zombie.

Table 10-2. Typical Attack Host Exploits
• Sniffer
  - Target: any host with logon sessions visible on the same LAN segment; outgoing sessions to hosts on other networks are desirable to hackers.
  - Characteristic evidence: unauthorized daemon running; unauthorized binary; source code for the unauthorized binary; network adapter in promiscuous mode; log file containing hostnames, usernames, and passwords.
• IRC bot
  - Target: Internet Relay Chat (IRC) hosts.
  - Characteristic evidence: unauthorized daemon running; unauthorized binary; source code for the unauthorized binary; log or config files.
• Game bot
  - Target: game servers.
  - Characteristic evidence: unauthorized daemon running; unauthorized binary; source code for the unauthorized binary; log or config files.
• Distributed denial of service (DDoS)
  - Target: prominent Internet web servers.
  - Characteristic evidence: zombie executable (unauthorized daemon); source code for the unauthorized binary.
• Manual
  - Target: any hosts, local or remote.
  - Characteristic evidence: source or binary code for attacks; unusual outgoing connections; lists of hostnames or IP addresses (victims); password files or lists of account/password pairs.

Proactive Intrusion Detection
• Security violations evolve in multiple stages.
• Preliminary stages are often not destructive; they are merely preparatory steps in the attack scenario.
• Goal: detect attack precursors and take immediate action, preventing the resulting attack (temporal data mining).

Phases of Attack: Target Identification
• Identify potential victim(s).
• Experienced crackers keep long lists of potential victims and are sometimes willing to share them.

Phases of Attack: Intelligence Gathering
• Probe systems to garner the operating system and version and the list of network services provided.
• Use password sniffing and guessing, and well-known compromises for buggy network services.
• Most vulnerability scans are heavy-handed (immediately visible to virtually any network intrusion detection system, IDS).
• Patient and skillful attackers can circumvent an IDS.

Phases of Attack: Initial Compromise
• Often very messy; evidence is easy to find at this stage:
  - Unusual number of failed logons
  - Log records of buffer overflows and undocumented system features
  - Core dumps
  - Daemons restarting

Phases of Attack: Privilege Escalation
• Exploit code to compromise the system.
• Exploit well-known vulnerabilities to gain root.
• A last flurry of incriminating error messages.

Phases of Attack: Reconnaissance
• First: who else is logged on?
• Second: where are the logs stored? (Logs kept in permanent form, such as printer output or CD-R, are best.)
• Forwarding address of root's email.
• Check the administrators' home directories to see what they have been up to.

Phases of Attack: Reconnaissance (continued)
• Looks for security programs.
• Looks for open files: what are currently running programs doing?
• Looks for file integrity programs (Tripwire, etc.).
• Identifies the systems administrators (name, system, etc.).

Phases of Attack: Covering Tracks
• Deleting logs (a red flag: "I've been attacked").
• Editing logs (removing only their own tracks).
• Log editors for binary data (necessary for editing binary log data; the equivalent of burglar tools). utmp and wtmp are binary log files.

Phases of Attack: Covering Tracks (continued)
• Back door: a hidden copy of the command shell that is SUID (set user ID) root (the file is owned by root, the SUID bit is set, and the intruder has execute permission).
• Hacked binaries: modification or replacement of standard system executables, also called altered binaries, Trojan horses, hostile changelings, and trojanizing.

Conceptual Views of Misuse
• An unauthorized individual accesses data.
• An unauthorized individual modifies data.
• Denial of service.

Acceptable versus Unacceptable
• If you had a perfect model of acceptable behavior OR a perfect model of unacceptable behavior, detection would be easy.
• If you have defined all acceptable behavior, anything else is unacceptable (misuse).
• If you have defined all unacceptable behavior, anything else is acceptable.

Acceptable Behavior Models
• Usually based on historical data on "acceptable behavior"; the system is "trained" on that historical data.
• If the training data contains unacceptable behavior that was missed, that behavior is allowed (false negative).
• If the training data is missing examples of acceptable behavior, that behavior is flagged as misuse (false positive).

Unacceptable Behavior Models
• Define "all" unacceptable behavior with a priori rules from experts.
• Catches most significant misuse.
• Misses much unacceptable behavior (it is hard to define all unacceptable behavior with certainty).

Detecting Hackers (Outsider Misuse)
• Attempts to gain ACCESS:
  - Reading an object (or file)
  - Writing an object (or file)
  - Planting a Trojan horse
  - Altering the system configuration
  - Achieving a FULLY interactive login
• Denial of service (DoS):
  - Deleting an object (a necessary part of the system)
  - Slowing down a network (flooding)
  - Stopping a program (a necessary part of the system)
  - Filling storage space (no work/file space left)
  - Shutting down a critical server

Weapons of Choice: Network Intrusion Detection
• Attack patterns differ significantly from normal access.
• Attack patterns are pronounced and readily identifiable because they exploit known vulnerabilities, and known vulnerabilities have known signatures.

Weapons of Choice: Information Assurance Vulnerability Alerts (IAVA)
• Know the vulnerabilities and harden systems against them.
• Use web sites for information on vulnerabilities (see the SANS Top 20: http://www.sans.org/top20/).

Misuse Examples: Anomalous Outbound Traffic
• Outbound information that was not requested.
• An imbalance between what was requested and what was provided.
• A sign that someone has gotten into your system and is attacking from it (distributed DoS) or stealing information.

Misuse Examples: Site Being Swept
• A range of attacks AND a range of IP addresses.
• Done to map your site and to probe for vulnerabilities.
• Solution: proper patches and configuration.

Misuse Examples: Site Being Swept (continued)
• Information flood: unusually large traffic volume
  - from a "single" class of service,
  - from a "single" IP address, or
  - from many IP addresses.
• Solution: block the IP address or class of service.
• Problem: this might block legitimate connections.

Misuse Examples: Unauthorized Access to Mission-Critical Data
• Unauthorized release (privacy violation) or unauthorized alteration of sensitive medical, employee, or customer information.
• Theft of appraisals, safety reports, work reports, or customer records.
• Solution: identify mission-critical data and define authorized use.

Behavioral Data Forensics in Intrusion Detection
• Data mining to identify trends AND specific activities that indicate misuse.
• Decision support capabilities for intrusion detection.
• Find out what happened in a network of live computers.
• Error detection and eradication.

Behavioral Data Forensics: Benefits
• Detect insiders.
• Detect outsiders (hackers).
• Identify trends: misuse and suspicious activity.
• Identify attack trends to harden networks.
• Improve policy fit: observed versus predicted behavior.
• Identify bad or missing policy.

Data Mining
• A means to extract unknown, actionable data from, among other things, data warehouses.
• Nontrivial extraction of implicit, previously unknown, and potentially useful information from data.
• The process of discovering new correlations, patterns, anomalies, and trends by sifting through large amounts of data.

Data Mining (continued)
• Pattern recognition technologies plus statistical and mathematical techniques.
• Tools are often based on artificial intelligence techniques.
• Processing large quantities of data at a central location, looking for "patterns of interest".

Purpose of Data Mining
• Complements predefined and ad hoc access by enabling users to discover new relationships.
• An improvement over a user's "gut feeling".
• Bottom-up discovery data analysis, also known as "knowledge discovery".

Data Mining & Intrusion Detection
• MADAM ID
  - Constructs intrusion detection signatures in a systematic and automated manner.
  - Learns classifiers that distinguish between intrusions and normal activities.
• ADAM
  - Learns normal network behavior from attack-free training data.
  - Connection records from the last delta seconds are continuously mined for new association rules.

Data Mining & Intrusion Detection (continued)
• Clustering unlabeled ID data: normal elements cluster together and intrusive elements cluster together; the biggest clusters are normal, the smallest are intrusive.
• Mining the alarm stream: modeling normal and abnormal alarm streams.

Forms and Formats (Data Types)
• Raw TCP/IP data (network event capture)
• Raw binary data (operating system data)
• ASCII application data (e.g., syslog)
• Detected signatures (stored in an RDBMS)
• Behavioral statistics (stored in an RDBMS)

User-Centric versus Target-Centric
• Target-centric database: optimized to provide target data. Example: all logins on a set of target machines.
• User-centric database: optimized to provide user data. Example: all logins by user X on any target.

Examples of Behavioral Data Forensics: Security
• Unauthorized changes to data (price lists).
• Tracking consultant activities (trust).
• Administrators browsing personal folders (abuse of privilege).
• Unauthorized user logging into a backup account (if they encrypt your backup, you're toast).

Examples of Behavioral Data Forensics: Security Policy (Monitoring for Compliance)
• Policy ignored (locking screensavers: timeout too short).
• Users applying the wrong profile.
• Administrators not using backup accounts for backups (they used an admin account instead).

Data Mining Techniques: Data Presentation Refinement (Change the View and Tune Parameters)
• Tune parameters until interesting features stand out.
• Eliminate common occurrences to zone in on rarer, interesting occurrences (a needle in a haystack).

Data Mining Techniques: Contextual Interpretation (Visualization, Clustering, Pattern Matching)
• Have a detection requirement in mind (predetermined interesting events).
• Assign context to observed trends (knowledge discovery).

Data Mining Techniques: Drill Down
• Get to the root cause: the underlying data causing the anomaly.
• Focus on:
  - individual time frames (odd hours, surge times),
  - specific users (most active, unusual hours, many privileges),
  - specific actions (logons, updates, large transactions, long transactions), or
  - targets (data servers, main servers, mission-critical servers).

Data Mining Techniques: Combining Heterogeneous Data Sources
• UNIX, Windows NT/2000, mainframe.
• Incorporating out-of-band data sources: interviews, physical logs, coworkers.

Data Mining Examples: Target Browsing
• A user accesses multiple objects in a short time frame.
• Critical file browsing.
• Users directory hopping.
• High activity.

Data Mining Examples
• Attack anticipation (tip-off): a user accessing critical files at odd times (a teller when the bank is closed).
• Target overload (e.g., server overload): a load-balancing problem causes a crash.
• Damage assessment: find the loss and document it.
• Surveillance: an employee makes threats.
• Policy compliance: night logout.

Summary
• Behavioral data forensics
  - Studies past behavior in event records.
  - Provides decision support capabilities.
  - Detects hackers and insider misuse.
  - Supports damage assessment AND attack anticipation.
• Behavioral data forensics facilitates business process reengineering.
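The "Anomalous Outbound Traffic" misuse example rests on an imbalance between what was requested and what was provided. The Python sketch below is one minimal way to check for it. It assumes flow records have already been reduced to hypothetical (host, direction, nbytes) tuples, and it uses total bytes in versus bytes out as a stand-in for "requested versus provided". The ratio and min_bytes thresholds are illustrative assumptions, not values from the presentation.

```python
from collections import defaultdict


def outbound_imbalance(flows, ratio=10.0, min_bytes=1_000_000):
    """Flag hosts whose outbound bytes exceed `ratio` times their inbound bytes.

    `flows` is an iterable of hypothetical (host, direction, nbytes) tuples,
    where direction is "in" or "out". Hosts sending fewer than `min_bytes`
    in total are ignored so that tiny, chatty devices do not dominate.
    """
    inbound = defaultdict(int)
    outbound = defaultdict(int)
    for host, direction, nbytes in flows:
        if direction == "out":
            outbound[host] += nbytes
        elif direction == "in":
            inbound[host] += nbytes
        else:
            raise ValueError(f"unknown direction: {direction!r}")

    flagged = []
    for host, sent in outbound.items():
        received = inbound.get(host, 0)
        # max(received, 1) avoids division by zero for hosts that only send
        if sent >= min_bytes and sent / max(received, 1) > ratio:
            flagged.append(host)
    return sorted(flagged)


flows = [
    ("web01", "in", 5_000_000),
    ("web01", "out", 6_000_000),
    ("ws17", "in", 20_000),
    ("ws17", "out", 50_000_000),
    ("printer", "out", 500),
]
print(outbound_imbalance(flows))  # ['ws17']
```

A workstation that receives little but sends a great deal is exactly the pattern the slide associates with a compromised host being used for attacks or data theft. A flagged host is a lead to drill down on, not proof of misuse.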
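The clustering bullet under "Data Mining & Intrusion Detection" assumes that, in unlabeled data, the biggest clusters are normal and the smallest are intrusive. The sketch below applies that heuristic with single-pass fixed-width clustering. This is one simple clustering choice among many, not a method the presentation prescribes. The cluster width and the min_fraction size cutoff are hypothetical parameters, and the feature vectors are assumed to be numeric and already scaled.

```python
import math


def fixed_width_clusters(points, width):
    """Single-pass fixed-width clustering.

    Each point joins the first existing cluster whose leader (its first member)
    lies within `width`; otherwise the point becomes the leader of a new cluster.
    Returns a list of clusters, each a list of point indices.
    """
    leaders = []
    clusters = []
    for i, p in enumerate(points):
        for c, leader in enumerate(leaders):
            if math.dist(p, leader) <= width:
                clusters[c].append(i)
                break
        else:
            leaders.append(p)
            clusters.append([i])
    return clusters


def suspicious_points(points, width, min_fraction=0.1):
    """Flag points in clusters holding fewer than min_fraction of all points."""
    cutoff = min_fraction * len(points)
    flagged = [i for cluster in fixed_width_clusters(points, width)
               if len(cluster) < cutoff
               for i in cluster]
    return sorted(flagged)


# Nine similar records near the origin and one outlier (hypothetical features).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.2, 0.1), (0.1, 0.2),
          (0.3, 0.0), (0.0, 0.3), (0.2, 0.2), (0.1, 0.1), (10.0, 10.0)]
print(suspicious_points(points, width=1.0, min_fraction=0.2))  # [9]
```

Fixed-width clustering needs only one pass and no preset number of clusters, which suits the "small clusters are intrusive" heuristic. Its results depend on the input order and on the chosen width.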
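Two of the data mining techniques listed above can be combined on access records: eliminating common occurrences, and drilling down on odd hours (the "teller when the bank is closed" tip-off). The following is a minimal sketch of that combination. It assumes a hypothetical Event record with user, target, and hour fields, a hypothetical set of critical targets, and business hours of 07:00 to 19:00. None of these details come from the presentation.

```python
from collections import Counter
from typing import NamedTuple


class Event(NamedTuple):
    """One access record (hypothetical fields, not a real log schema)."""
    user: str
    target: str
    hour: int  # hour of day, 0-23


def rare_pairs(events, max_count=1):
    """Eliminate common (user, target) pairs; keep those seen at most max_count times."""
    counts = Counter((e.user, e.target) for e in events)
    return {pair for pair, n in counts.items() if n <= max_count}


def odd_hour_access(events, critical, start=7, end=19):
    """Drill down: events against critical targets outside the [start, end) hours."""
    return [e for e in events
            if e.target in critical and not (start <= e.hour < end)]


events = [
    Event("alice", "fileserver", 9),
    Event("alice", "fileserver", 10),
    Event("alice", "fileserver", 14),
    Event("bob", "mailserver", 11),
    Event("bob", "mailserver", 15),
    Event("bob", "payroll", 2),
]
print(rare_pairs(events))                    # {('bob', 'payroll')}
print(odd_hour_access(events, {"payroll"}))  # [Event(user='bob', target='payroll', hour=2)]
```

Frequency filtering narrows the haystack, and the time-of-day filter then supplies the context that turns a rare event into a tip-off worth investigating.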
Contact Information
• Dr. Martin may be reached at: Voice: 703-983-1093
• Uma Marques may be reached at: Voice: 703-983-3783

Sources
Kruse II, W. G., & Heiser, J. G. (2002). Computer Forensics: Incident Response Essentials. New York: Addison-Wesley.
Proctor, P. E. (2001). The Practical Intrusion Detection Handbook. Upper Saddle River, NJ: Prentice Hall.
McClure, S., Scambray, J., & Kurtz, G. (2001). Hacking Exposed: Network Security Secrets & Solutions (3rd ed.). New York: Osborne/McGraw-Hill.
Barbará, D., & Jajodia, S. (2002). Applications of Data Mining in Computer Security. Boston: Kluwer Academic Publishers.