Analyzing and Securing Social Media
Attacks on Social Media
Dr. Bhavani Thuraisingham
October 9, 2015

Outline
- Malware
- Attacks on Social Media
- Data mining solutions

Malware
- Malware includes viruses, worms, Trojan horses, time and logic bombs, botnets, and spyware.
- Researchers have devised a number of techniques to counter these attacks; however, the more successful researchers become at detecting and preventing attacks, the more sophisticated the malicious code that appears in the wild.
- Thus, the arms race between malware authors and malware defenders continues to escalate.

Malware: Virus
- Computer viruses are malware that piggyback onto other executables and are capable of replicating.
- Viruses can exhibit a wide range of malicious behaviors, from simple annoyance (such as displaying messages) to widespread destruction, such as wiping all the data on the hard drive (e.g., the CIH virus).
- Viruses are not independent programs. Rather, they are code fragments that exist in other binary files.
- A virus can infect a host machine by replicating itself when it is brought into contact with that machine, such as via a shared network drive, removable media, or an email attachment.
- Replication occurs when the virus code is executed and is permitted to write to memory.

Malware
- There are two types of viruses based on their replication strategy: nonresident and resident.
- A nonresident virus does not store itself on the hard drive of the infected computer. It is only attached to an executable file that infects a computer. The virus is activated each time the infected executable is accessed and run. When activated, the virus looks for other victims (e.g., other executables) and infects them.
- In contrast, resident viruses allocate memory in the computer hard drive, such as the boot sector. These viruses become active every time the infected machine starts.
Malware: Worms
- Computer worms are malware, but unlike viruses they need not attach themselves to other binaries.
- Worms are capable of propagating themselves to other hosts through network connections.
- Worms also exhibit a wide range of malicious behavior, such as spamming, phishing, harvesting sensitive information and sending it to the worm writer, jamming or slowing down network connections, deleting data from the hard drive, and so on.
- Worms are independent programs and reside in the infected machine by camouflage.
- Some worms open a backdoor in the infected machine, allowing the worm writer to control the machine and make it a zombie (or bot) for his or her malicious activities.

Malware: Trojan Horse
- Trojan horses have been studied within the context of multilevel databases. They covertly pass information from a high-level process to a low-level process.
- A good example of a Trojan horse is the manipulation of file locks.
- A Secret process cannot directly send data to an unclassified process, as this would constitute a write-down. However, a malicious Secret process can covertly pass data to an unclassified process by manipulating file locks as follows.
- Suppose both processes want to access, say, an unclassified file. The Secret process wants to read from the file, while the unclassified process can write into the file.

Malware
- However, both processes cannot hold the read and write locks at the same time.
- At time T1, assume that the Secret process has the read lock while the unclassified process attempts to get a write lock. The unclassified process cannot obtain this lock. This means that one bit of information, say 0, is passed to the unclassified process.
- At time T2, assume the situation does not change. This means that another bit of information, 0, is passed.
- At time T3, however, assume the Secret process does not have the read lock, in which case the unclassified process can obtain the write lock. This time one bit of information, 1, is passed.
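The lock-based signalling at times T1 to T3 can be illustrated with a small simulation. This is only a sketch under stated assumptions: no real file locks are taken, each list entry stands for one time slot, and the function names `send_bits` and `receive_bits` are hypothetical names introduced for this illustration.

```python
def send_bits(secret_bits):
    """Secret process: decide, per time slot, whether to hold the read lock.

    Holding the lock signals 0; releasing it signals 1.
    Returns a list of booleans (True = read lock held in that slot).
    """
    return [bit == "0" for bit in secret_bits]


def receive_bits(read_lock_held_per_slot):
    """Unclassified process: try to get the write lock in every slot.

    A failed attempt is read as 0, a successful attempt as 1.
    """
    received = []
    for read_lock_held in read_lock_held_per_slot:
        # The write lock is granted only when no read lock is held.
        write_lock_granted = not read_lock_held
        received.append("1" if write_lock_granted else "0")
    return "".join(received)


# T1 and T2: read lock held (0, 0); T3: read lock released (1).
schedule = send_bits("001")
leaked = receive_bits(schedule)
print(leaked)
```

No data is ever written by the Secret process in this sketch; the only thing the unclassified process observes is whether its lock request succeeds, which is what makes the channel covert.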
- Over time, a classified string such as 0011000011101 could be passed from the Secret process to the unclassified process.

Malware
- In the software paradigm, a time bomb is a computer program that stops functioning after a pre-specified time or date has been reached.
- This is usually imposed by software companies in beta versions of software so that the software stops functioning after a certain date. An example is Windows Vista Beta 2, which stopped functioning on May 31, 2007.
- A logic bomb is a computer program that is intended to perform a malicious action when certain predefined conditions are met.
- This technique is sometimes injected into viruses or worms to increase their chances of surviving and spreading before getting caught.

Malware: Botnet
- A botnet is a network of compromised hosts, or bots, under the control of a human attacker known as the botmaster.
- The botmaster can issue commands to the bots to perform malicious actions, such as recruiting new bots, launching coordinated DDoS attacks against some hosts, stealing sensitive information from the bot machine, sending mass spam emails, and so on.
- Thus, botnets have emerged as an enormous threat to the internet community.

Malware
- Spyware is a type of malware that can be installed on computers and that collects information about users without their knowledge.
- For example, spyware observes the websites visited by the user, the emails sent by the user and, in general, the activities carried out by the user on his or her computer.
- Spyware is usually hidden from the user. However, employers sometimes install spyware to monitor the computer activities of their employees.
- An example of spyware is keylogger (also called keystroke logging) software.

Attacks on Social Media
- There are three types of attacks.
- The first is to attack the social media itself.
- The second is to attack computer systems, networks, and infrastructures through social media.
- The third group consists of attacks specially formulated for social media systems.
Attacks on Social Media
- De-Anonymization Attacks: In this attack, hackers exploit the group membership information of the members of a network and subsequently identify those members. Group information is available on social networking sites.
- Specifically, the authors used web browser attacks to obtain the group membership information.
- When a user who belongs to a group on the social network visits a malicious website, the website carries out the de-anonymization attack formulated by the hacker.
- Source: “A Practical Attack to De-Anonymize Social Network Users”, Wondracek et al.

Attacks on Social Media
- Source: Seven Deadly Attacks, Timm and Perez.
- The authors describe seven attacks that could occur, including malware attacks, phishing attacks, and identity theft.
- For malware attacks, for example, they state that there are two ways in which malware can compromise the network. One is a virus that infects the system; the other is malware such as a Trojan horse that can conceal information.
- They also explain the cross-site scripting (XSS) attack, in which the malware causes the user’s browser to execute the attacker’s code and thereby compromises the network.

Attacks on Social Media
- COMBOFIX List of Attacks: The COMBOFIX website lists several attacks on social media.
- The Bad SEO attack attracts the user to a website that contains the malware. Users are also lured to fake websites.
- The Pornspace malware is a worm that utilized a flaw in the security mailing list of MySpace, stole the profiles of the users, and then sent porn-based spam.
- In the Over the Rainbow malware attack, hackers embedded JavaScript code into Twitter messages that can retweet. The user, as well as the members of his or her network, could be directed to porn sites.
- In the Dislike Scam attack, which affected Facebook, users were given bogus surveys, and once they filled in the surveys they were attacked by malware.
Attacks on Social Media
- Top Ten Attacks in Social Media: At the RSA conference in 2014, Gary Bahadur, the CEO of KRAA Security, described various attacks on Facebook, Twitter, and LinkedIn, as well as some other social media attacks.
  - For example, he explains how an Android malware attack spread through Facebook.
  - This attack shows that the gadgets we use to connect to a social network site can cause a serious attack on the site.
- Top Nine Social Media Threats of 2015: The Zerofox website published the top nine social media threats, including executive impersonations, corporate impersonations, account takeover, customer scams, and phishing attacks.
  - An account takeover attack in 2015 was especially sinister, as it affected the United States Central Command (CENTCOM).

Attacks on Social Media
- Financial Times Report: On July 30, 2015, the Financial Times reported that hackers are using Twitter to conceal intrusions.
  - For example, the hackers used Twitter images to conceal malware and from there attacked the computers they wanted to compromise.
  - This attack appears to be similar to a steganographic attack, where suspicious messages are embedded into media such as images and video.
- Link Privacy Attacks: In their article on link privacy, Effendy et al. discuss a version of the link privacy attack.
  - It essentially consists of bribing or compromising some of the members (usually a small number) of a social network and using them to obtain the link details (that is, who their friends are) of the members who are not compromised.

Attacks on Social Media
- An Evil Twin Attack involves perpetrators pretending to be legitimate users in order to gain something they are not entitled to.
  - Evil Twin Attacks on social networking sites occur when perpetrators impersonate companies to get access to the social network.
- Identity Theft is common in social media. Here the attacker hacks into the social media site, obtains the identity of a legitimate user, and starts posting information on the site.
- Cyber Bullying

Data Mining for Malware Detection
- Data mining overview
- Intrusion detection, Malicious code detection, Buffer overflow detection, Email worm detection (worms and virus)
- Novel Class Detection for polymorphic malware
- Reference:
  - Data Mining Tools for Malware Detection, Masud, Khan and Thuraisingham, CRC Press/Taylor and Francis, 2011

What is Data Mining?
- Also known as: Information Harvesting, Knowledge Mining, Data Mining, Knowledge Discovery in Databases, Data Dredging, Data Archaeology, Data Pattern Processing, Database Mining, Knowledge Extraction, Siftware
- The process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of data, often previously unknown, using pattern recognition technologies and statistical and mathematical techniques (Thuraisingham, Data Mining, CRC Press 1998)

What’s going on in data mining?
- What are the technologies for data mining?
  - Database management, data warehousing, machine learning, statistics, pattern recognition, visualization, parallel processing
- What can data mining do for you?
  - Data mining outcomes: Classification, Clustering, Association, Anomaly detection, Prediction, Estimation, . . .
- How do you carry out data mining?
  - Data mining techniques: Decision trees, Neural networks, Market-basket analysis, Link analysis, Genetic algorithms, . . .
- What is the current status?
  - Many commercial products mine relational databases
- What are some of the challenges?
  - Mining unstructured data, extracting useful patterns, web mining, Data mining, security and privacy

Data Mining for Intrusion Detection: Problem
- An intrusion can be defined as “any set of actions that attempt to compromise the integrity, confidentiality, or availability of a resource”.
- Attacks are:
  - Host-based attacks
  - Network-based attacks
- Intrusion detection systems are split into two groups:
  - Anomaly detection systems
  - Misuse detection systems
- Use audit logs
  - Capture all activities in network and hosts.
  - But the amount of data is huge!
Misuse Detection
[Figure: Misuse Detection]

Problem: Anomaly Detection
[Figure: Anomaly Detection]

Our Approach: Overview
[Figure labels: Training Data, Hierarchical Clustering (DGSOT), SVM Class Training, Testing, Testing Data]
- DGSOT: Dynamically Growing Self-Organizing Tree

Our Approach: Hierarchical Clustering
[Figure: Hierarchical clustering with SVM flow chart]

Results
Training Time, FP and FN Rates of Various Methods

Methods                 Average Accuracy   Total Training Time   Average FP Rate (%)   Average FN Rate (%)
Random Selection        52%                0.44 hours            40                    47
Pure SVM                57.6%              17.34 hours           35.5                  42
SVM + Rocchio Bundling  51.6%              26.7 hours            44.2                  48
SVM + DGSOT             69.8%              13.18 hours           37.8                  29.8

Introduction: Detecting Malicious Executables using Data Mining
- What are malicious executables?
  - Harm computer systems
  - Virus, Exploit, Denial of Service (DoS), Flooder, Sniffer, Spoofer, Trojan, etc.
  - Exploit software vulnerabilities on a victim
  - May remotely infect other victims
  - Incur great loss. Example: the Code Red epidemic cost $2.6 billion
- Malicious code detection: traditional approach
  - Signature based
  - Requires signatures to be generated by human experts
  - So, not effective against “zero day” attacks

Feature Extraction and Hybrid Model
- Our Approach
  - Analyze Binary Code and Assembly Code (Hybrid Model)
- Features
  - Binary n-gram features: sequence of n consecutive bytes of the binary executable
  - Assembly n-gram features: sequence of n consecutive assembly instructions
  - System API call features
- Process
  - Collect training samples of normal and malicious executables.
  - Extract features
  - Train a classifier and build a model
  - Test the model against test samples

Hybrid Feature Retrieval (HFR): Training and Testing
[Figure: Hybrid Feature Retrieval (HFR) training and testing]

Feature Extraction
- Binary n-gram features
  - Features are extracted from the byte codes in the form of n-grams, where n = 2, 4, 6, 8, 10 and so on.
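A minimal sketch of binary n-gram extraction, assuming a sliding window that advances one byte at a time. The function name `byte_ngrams` is a hypothetical name introduced for this illustration, and the sample input is the 11-byte sequence used in the slide's own example.

```python
def byte_ngrams(data, n):
    """Return every run of n consecutive bytes in data as a hex string."""
    return [data[i:i + n].hex() for i in range(len(data) - n + 1)]


# The 11-byte sequence from the slide's example, written as hex digits.
sample = bytes.fromhex("0123456789abcdef012345")

two_grams = byte_ngrams(sample, 2)
four_grams = byte_ngrams(sample, 4)

print(two_grams)
print(four_grams)

# Repeated n-grams collapse to a single feature when collected in a set.
distinct_two_grams = set(two_grams)
print(len(two_grams), len(distinct_two_grams))
```

Collecting the n-grams of every executable into one set is what produces the very large feature space that the slides go on to discuss.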
- Example: Given an 11-byte sequence: 0123456789abcdef012345,
  - The 2-grams (2-byte sequences) are: 0123, 2345, 4567, 6789, 89ab, abcd, cdef, ef01, 0123, 2345
  - The 4-grams (4-byte sequences) are: 01234567, 23456789, 456789ab, ..., ef012345, and so on.
- Problem:
  - Large dataset. Too many features (millions!).
- Solution:
  - Use secondary memory, efficient data structures
  - Apply feature selection

Feature Extraction
- Assembly n-gram features
  - Features are extracted from the assembly programs in the form of n-grams, where n = 2, 4, 6, 8, 10 and so on.
- Example: three instructions
  - “push eax”; “mov eax, dword[0f34]”; “add ecx, eax”;
  - 2-grams:
    (1) “push eax”; “mov eax, dword[0f34]”;
    (2) “mov eax, dword[0f34]”; “add ecx, eax”;
- Problem:
  - Same problem as binary
- Solution:
  - Same solution

Feature Selection
- Select Best K features
- Selection Criteria: Information Gain
- The gain of an attribute A on a collection of examples S is given by

  $Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \, Entropy(S_v)$

Experiments
- Dataset
  - Dataset1: 838 Malicious and 597 Benign executables
  - Dataset2: 1082 Malicious and 1370 Benign executables
  - Collected malicious code from VX Heavens (http://vx.netlux.org)
- Disassembly
  - Pedisassem ( http://www.geocities.com/~sangcho/index.html )
- Training, Testing
  - Support Vector Machine (SVM)
  - C-Support Vector Classifiers with an RBF kernel

Results
[Figures: three results slides, each with the legend below]
- HFS = Hybrid Feature Set
- BFS = Binary Feature Set
- AFS = Assembly Feature Set

Directions
- Malware is evolving continuously
- Malware is attacking social networks
- Data mining is one approach to handling the problem.
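As a rough illustration of the information gain criterion on the Feature Selection slide, the sketch below ranks binary features (n-gram present or absent) and keeps the best K. The toy feature table, the labels, and the function names `entropy`, `information_gain`, and `select_best_k` are assumptions made for this example; they are not taken from the slides' experiments.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((c / total) * math.log2(c / total) for c in counts)


def information_gain(feature_values, labels):
    """Entropy(S) minus the weighted entropy of each subset S_v."""
    total = len(labels)
    gain = entropy(labels)
    for value in set(feature_values):
        subset = [lab for f, lab in zip(feature_values, labels) if f == value]
        gain -= (len(subset) / total) * entropy(subset)
    return gain


def select_best_k(feature_table, labels, k):
    """Return the names of the k features with the highest gain."""
    ranked = sorted(
        feature_table,
        key=lambda name: information_gain(feature_table[name], labels),
        reverse=True,
    )
    return ranked[:k]


# Toy data: 1 means the n-gram occurs in the executable, 0 means it does not.
labels = ["malicious", "malicious", "benign", "benign"]
feature_table = {
    "0123": [1, 0, 1, 0],  # uninformative: split evenly across classes
    "abcd": [1, 1, 0, 0],  # perfectly separates the two classes
    "ef01": [1, 1, 1, 0],  # partially informative
}

best = select_best_k(feature_table, labels, 2)
print(best)
```

With millions of candidate n-grams, a real pipeline would compute the gains in a streaming or disk-backed fashion rather than holding a full table in memory, in line with the slides' suggestion to use secondary memory and efficient data structures.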