Analyzing and Securing Social Media
Attacks on Social Media
Dr. Bhavani Thuraisingham
October 9, 2015
 Malware
 Attacks on Social Media
 Data mining solutions
 Malware includes viruses, worms, Trojan horses, time and
logic bombs, botnets, and spyware.
 A number of techniques have been devised by researchers to
counter these attacks; however, the more successful the
researchers become in detecting and preventing the attacks,
the more sophisticated malicious code appears in the wild.
 Thus, the arms race between malware authors and malware
defenders continues to escalate.
 Virus: Computer viruses are malware that piggyback onto
other executables and are capable of replicating.
 Viruses can exhibit a wide range of malicious behaviors,
ranging from simple annoyance (such as displaying
messages) to widespread destruction, such as wiping all the
data on the hard drive (e.g., the CIH virus).
 Viruses are not independent programs. Rather, they are code
fragments that exist on other binary files.
 A virus can infect a host machine by replicating itself when it
is brought in contact with that machine, such as via a shared
network drive, removable media, or email attachment.
 Replication occurs when the virus code is executed and is
permitted to write to memory.
 There are two types of viruses based on their replication
strategy: nonresident and resident.
 The nonresident virus does not store itself on the hard drive
of the infected computer.
 It is only attached to an executable file that infects a computer.
 The virus is activated each time the infected executable is
accessed and run.
 When activated, the virus looks for other victims (e.g., other
executables) and infects them.
 On the contrary, resident viruses allocate memory in the
computer hard drive, such as the boot sector.
 These viruses become active every time the infected machine starts.
 Worms: Computer worms are malware but, unlike viruses, they
need not attach themselves to other binaries.
 Worms are capable of propagating themselves to other hosts
through network connections.
 Worms also exhibit a wide range of malicious behavior, such
as spamming, phishing, harvesting and sending sensitive
information to the worm writer, jamming or slowing down
network connections, deleting data from the hard drive, and so
on.
 Worms are independent programs, and reside in the infected
machine by camouflage.
 Some worms open a backdoor in the infected machine,
allowing the worm writer to control the machine and turn it into
a zombie (or bot) for malicious activities.
 Trojan Horse: Trojan horses have been studied within the
context of multilevel databases.
 They covertly pass information from a high-level process to a
low-level process.
 A good example of a Trojan horse is the manipulation of file
locks.
 A Secret process cannot directly send data to an unclassified
process, as this would constitute a write-down.
 However, a malicious Secret process can covertly pass data
to an unclassified process by manipulating the file locks as
follows.
 Suppose both processes want to access, say, an unclassified
file.
 The Secret process wants to read from the file while the
unclassified process can write into the file.
 However, the two processes cannot hold the read and write locks at
the same time.
 Therefore, at time T1, let’s assume that the Secret process has the
read lock while the unclassified process attempts to get a write lock.
 The unclassified process cannot obtain this lock. This means one
bit of information, say 0, is passed to the unclassified process.
 At time T2, let’s assume the situation does not change. This means
another bit of information, 0, is passed.
 However, at time T3, let’s assume the Secret process does not have
the read lock, in which case the unclassified process can obtain the
write lock.
 This time one bit of information, 1, is passed.
 Over time, a classified string such as 0011000011101 could be passed from
the Secret process to the unclassified process.
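The lock-based channel described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the `FileLock` class and the `leak` function are hypothetical names introduced for illustration, both processes are simulated in one loop with one bit per time slot, and no real operating system locks are involved.

```python
class FileLock:
    """Toy lock on one shared file: a write lock is refused while a read lock is held."""

    def __init__(self):
        self.read_locked = False

    def acquire_read(self):
        self.read_locked = True

    def release_read(self):
        self.read_locked = False

    def try_acquire_write(self):
        # Succeeds only when no reader currently holds the lock.
        return not self.read_locked


def leak(secret_bits):
    """Simulate the channel, one bit per time slot, and return the bit
    string inferred by the unclassified process."""
    lock = FileLock()
    observed = []
    for bit in secret_bits:
        # Secret process: hold the read lock to signal 0, release it to signal 1.
        if bit == "0":
            lock.acquire_read()
        else:
            lock.release_read()
        # Unclassified process: a refused write lock is read as 0,
        # a granted write lock is read as 1.
        observed.append("1" if lock.try_acquire_write() else "0")
    return "".join(observed)


received = leak("0011000011101")
print(received)
```

No data is ever written by the Secret process to the unclassified file; the information flows only through whether the lock request succeeds, which is why such channels are hard to block with access control alone.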
 In the software paradigm, a time bomb refers to a computer program
that stops functioning after a pre-specified time/date has been reached.
 This is usually imposed by software companies in beta versions of
software so that the software stops functioning after a certain date.
An example is Windows Vista Beta 2, which stopped functioning on
May 31, 2007.
 A logic bomb is a computer program that is intended to perform
 This technique is sometimes injected into viruses or worms to
increase the chances of survival and spreading before getting
 Botnet: A botnet is a network of compromised hosts, or bots, under the
control of a human attacker known as the botmaster.
 The botmaster can issue commands to the bots to perform malicious
actions, such as recruiting new bots, launching coordinated DDoS
attacks against some hosts, stealing sensitive information from the
bot machine, sending mass spam emails and so on.
 Thus, botnets have emerged as an enormous threat to the internet
 Spyware is a type of malware that can be installed on computers,
which collects information about users without their knowledge.
 For example, spyware observes the websites visited by the user, the
emails sent by the user and in general the activities carried out by
the user in his/her computer.
 Spyware is usually hidden from the user.
 However, sometimes employers can install spyware to find out the
computer activities of the employees.
 An example of spyware is a keylogger, which records the user's keystrokes (keystroke logging).
Attacks on Social Media
 There are three types of attacks.
 One is to attack the social media itself.
 The second is to attack computer systems, networks and
infrastructures through social media.
 The third group consists of attacks specially formulated for social
media systems.
Attacks on Social Media
 De-Anonymization Attacks: In this attack, hackers exploit the
group membership information of the members of a network
and subsequently identify the members.
 “Group information is available on social networking sites”.
 Specifically, the authors used web browser attacks to obtain the group
membership information.
 When a user who is a member of a group on the social network visits a malicious
website, the website carries out the de-anonymization
attack formulated by the hacker.
 Source: “A Practical Attack to De-Anonymize Social Network
Users”, Wondracek et al.
Attacks on Social Media
 Source: Seven Deadly Attacks; Timm and Perez
 The authors describe seven attacks that could occur, including malware attacks, phishing
attacks, and identity theft.
 For example, for malware attacks they state that there are two ways
malware can compromise the network.
 One is a virus that infects the system, and the other is malware
such as a Trojan horse that could conceal information.
 They also explain the cross-site scripting (XSS) attack, where
malware causes the user’s browser to execute the attacker’s
code, compromising the network.
Attacks on Social Media
 COMBOFIX List of Attacks: The COMBOFIX website lists several
attacks on social media.
 The Bad SEO attack attracts the user to a website that contains the
malware. The users are also lured to fake websites.
 The Pornspace malware is a worm that utilized a flaw in the security
mailing list of MySpace and stole the profiles of the users and then
sent porn-based spam.
 In the Over the Rainbow malware attack, hackers embedded
JavaScript code into Twitter messages that can retweet.
 The user as well as the members of his/her network could be
directed to porn sites.
 In the Dislike Scam attack, which affected Facebook,
users were given bogus surveys, and once they filled in the surveys
they were attacked by malware.
Attacks on Social Media
 Top Ten Attacks in Social Media: At the RSA conference in 2014,
Gary Bahadur, the CEO of KRAA Security, described various attacks
on Facebook, Twitter, LinkedIn as well as some other social media
- For example, he explained how an Android malware attack spread
through Facebook.
- This attack shows that the gadgets we use to connect to a social
network site can cause a serious attack on the site.
 Top Nine Social Media Threats of 2015: The Zerofox website
published the top nine social media threats including executive
impersonations, corporate impersonations, account takeover,
customer scams and phishing attacks.
- An account takeover attack in 2015 was especially sinister as it
affected the United States Central Command (CENTCOM).
Attacks on Social Media
 Financial Times Report: On July 30, 2015, the Financial Times
reported that hackers are using Twitter to conceal intrusions.
- For example, the hackers used Twitter images to conceal
malware and from there attacked the computers they wanted to
- This attack appears to be similar to a steganographic attack, where
suspicious messages are embedded into media such as
images and video.
 Link Privacy Attacks: In their article on link privacy, Effendy et al.
discuss a version of the link privacy attack.
- It essentially involves bribing or compromising some of the members
(usually a small number) in a social network and using this to
obtain the link details (that is, who their friends are) of those
members who are not compromised.
Attacks on Social Media
 Evil Twin Attack involves perpetrators pretending to be legitimate
users in order to gain something they are not entitled to.
- Evil Twin Attacks on social networking sites occur when
perpetrators impersonate companies to get access to the social
 Identity Theft is common in social media
 Here the attacker hacks into the social media site, gets the
identity of the legitimate user and starts posting information on
the site.
 Cyber Bullying
Data Mining for Malware Detection
 Data mining overview
 Intrusion detection, Malicious code detection, Buffer
overflow detection, Email worm detection (worms and
 Novel Class Detection for polymorphic malware
 Reference:
Data Mining Tools for Malware Detection
Masud, Khan and Thuraisingham
CRC Press/Taylor and Francis, 2011
What is Data Mining?
Information Harvesting
Knowledge Mining
Data Mining
Knowledge Discovery in Databases
Data Dredging
Data Archaeology
Data Pattern Processing
Database Mining
Knowledge Extraction
The process of discovering meaningful new correlations, patterns, and trends by
sifting through large amounts of data, often previously unknown, using pattern
recognition technologies and statistical and mathematical techniques
(Thuraisingham, Data Mining, CRC Press 1998)
What’s going on in data mining?
 What are the technologies for data mining?
- Database management, data warehousing, machine learning,
statistics, pattern recognition, visualization, parallel processing
 What can data mining do for you?
- Data mining outcomes: Classification, Clustering, Association,
Anomaly detection, Prediction, Estimation, . . .
 How do you carry out data mining?
- Data mining techniques: Decision trees, Neural networks,
Market-basket analysis, Link analysis, Genetic algorithms, . . .
 What is the current status?
- Many commercial products mine relational databases
 What are some of the challenges?
- Mining unstructured data, extracting useful patterns, web
mining, Data mining, security and privacy
Data Mining for Intrusion Detection: Problem
An intrusion can be defined as “any set of actions that attempt to
compromise the integrity, confidentiality, or availability of a resource”.
Attacks are:
- Host-based attacks
- Network-based attacks
Intrusion detection systems are split into two groups:
- Anomaly detection systems
- Misuse detection systems
Use audit logs
- Capture all activities in network and hosts.
- But the amount of data is huge!
Misuse Detection
[Figure: Misuse Detection]
Problem: Anomaly Detection
[Figure: Anomaly Detection]
Our Approach: Overview
[Figure labels: Clustering (DGSOT), SVM Class Training, Testing Data. DGSOT: Dynamically Growing Self-Organizing Tree]
Our Approach: Hierarchical Clustering
[Figure: Hierarchical clustering with SVM flow chart]
Training Time, FP and FN Rates of Various Methods
[Table: only partial values survive. Method listed: Pure SVM. Training times listed: 0.44 hours, 17.34 hours, 26.7 hours, 13.18 hours.]
Introduction: Detecting Malicious Executables using Data Mining
What are malicious executables?
Harm computer systems
- Virus, Exploit, Denial of Service (DoS), Flooder, Sniffer, Spoofer,
Trojan etc.
- Exploits software vulnerability on a victim
- May remotely infect other victims
Incurs great loss. Example: Code Red epidemic cost $2.6
Malicious code detection: Traditional approach
Signature based
- Requires signatures to be generated by human experts
- So, not effective against “zero day” attacks
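A minimal sketch of why signature matching misses unseen variants is given below. The signature names and byte patterns are made up for illustration and do not correspond to any real malware or antivirus database; `scan` is a hypothetical helper, not a real product's API.

```python
# Hypothetical signature database: name -> byte pattern (illustrative values only).
SIGNATURES = {
    "demo-virus-a": bytes.fromhex("deadbeef"),
    "demo-worm-b": bytes.fromhex("0badf00d0102"),
}


def scan(data: bytes, signatures=SIGNATURES):
    """Return the sorted names of all signatures whose byte pattern occurs in data."""
    return sorted(name for name, pattern in signatures.items() if pattern in data)


# A sample that contains a known pattern is flagged.
known_sample = b"\x00\x01" + bytes.fromhex("deadbeef") + b"\x02"
# A variant with a single changed byte no longer matches any signature.
new_variant = b"\x00\x01" + bytes.fromhex("deadbeee") + b"\x02"

print(scan(known_sample))
print(scan(new_variant))
```

Until a human expert adds a pattern for the new variant, the scanner reports nothing for it, which is the gap that learned models over extracted features aim to narrow.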
Feature Extraction and Hybrid Model
 Our Approach
 Analyze Binary Code and Assembly Code (Hybrid Model)
 Binary n-gram features
- Sequence of n consecutive bytes of binary executable
 Assembly n-gram features
- Sequence of n consecutive assembly instructions
 System API call features
 Collect training samples of normal and malicious executables.
 Extract features
 Train a Classifier and build a model
 Test the model against test samples
Hybrid Feature Retrieval (HFR): Training and
Feature Extraction
Binary n-gram features
- Features are extracted from the byte codes in the form of n-grams, where n = 2,4,6,8,10 and so on.
Given an 11-byte sequence: 0123456789abcdef012345,
the 2-grams (2-byte sequences) are: 0123, 2345, 4567, 6789,
89ab, abcd, cdef, ef01, 0123, 2345
The 4-grams (4-byte sequences) are: 01234567, 23456789,
456789ab,...,ef012345 and so on.
- Large dataset. Too many features (millions!).
- Use secondary memory, efficient data structures
- Apply feature selection
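The binary n-gram extraction above can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the function name `byte_ngrams` is hypothetical, and the in-memory `Counter` stands in for the secondary-memory data structures the slide mentions for large datasets.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int):
    """Return every n-byte window of data as a hex string, sliding one byte at a time."""
    return [data[i:i + n].hex() for i in range(len(data) - n + 1)]


# The 11-byte sequence used in the example above.
sample = bytes.fromhex("0123456789abcdef012345")

two_grams = byte_ngrams(sample, 2)
four_grams = byte_ngrams(sample, 4)

# Counting collapses repeated n-grams into distinct candidate features.
counts = Counter(two_grams)

print(two_grams)
print(len(four_grams))
print(counts["0123"])
```

An 11-byte input yields ten 2-grams and eight 4-grams, so the number of distinct features across a corpus of executables grows very quickly, which is why feature selection is needed.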
Feature Extraction
Assembly n-gram features
- Features are extracted from the assembly programs in the form
of n-grams, where n = 2,4,6,8,10 and so on.
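Given three instructions
“push eax”; “mov eax, dword[0f34]” ; “add ecx, eax”;
the 2-grams are
(1) “push eax”; “mov eax, dword[0f34]”;
(2) “mov eax, dword[0f34]”; “add ecx, eax”;
- Same problem as binary n-grams: too many features
- Same solution: secondary memory, efficient data structures, and feature selection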
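The same sliding-window idea applies to assembly instructions, with whole instructions as the units instead of bytes. The sketch below assumes the executable has already been disassembled into a list of instruction strings; `instruction_ngrams` is a hypothetical helper name.

```python
def instruction_ngrams(instructions, n):
    """Return every run of n consecutive instructions as a tuple."""
    return [tuple(instructions[i:i + n]) for i in range(len(instructions) - n + 1)]


# The three-instruction example from the slide.
program = ["push eax", "mov eax, dword[0f34]", "add ecx, eax"]

pairs = instruction_ngrams(program, 2)
for gram in pairs:
    print(gram)
```

Tuples are used so that each n-gram is hashable and can serve directly as a dictionary key when counting features.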
Feature Selection
Select Best K features
Selection Criteria: Information Gain
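Gain of an attribute A on a collection of examples S is given by
$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$
where S_v is the subset of S for which attribute A has value v.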
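As a worked illustration of selecting the best K features by information gain, the sketch below computes entropy and gain for features over labeled samples. The function names, the toy labels, and the two example features are assumptions made for illustration, not data from the lecture.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy of each subset
    obtained by splitting the samples on the feature's value."""
    total = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


def select_top_k(features, labels, k):
    """Return the names of the k features with the highest information gain."""
    ranked = sorted(features, key=lambda name: information_gain(features[name], labels),
                    reverse=True)
    return ranked[:k]


# Toy data: 1 means the n-gram occurs in the sample, 0 means it does not.
labels = ["malicious", "malicious", "benign", "benign"]
features = {
    "ngram_a": [1, 1, 0, 0],  # present exactly in the malicious samples
    "ngram_b": [1, 0, 1, 0],  # unrelated to the class
}

print(information_gain(features["ngram_a"], labels))
print(information_gain(features["ngram_b"], labels))
print(select_top_k(features, labels, 1))
```

A feature that splits the samples perfectly by class removes all uncertainty and gets the full gain, while a feature that is independent of the class gets zero gain and is ranked last.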
Dataset1: 838 Malicious and 597 Benign executables
Dataset2: 1082 Malicious and 1370 Benign executables
Collected malicious code from VX Heavens
Pedisassem
Training, Testing
Support Vector Machine (SVM)
C-Support Vector Classifiers with an RBF kernel
HFS = Hybrid Feature Set
BFS = Binary Feature Set
AFS = Assembly Feature Set
 Malware is evolving continuously
 Malware attacking social networks
 Data mining solution is one approach to handle the