Conference Session A11
Paper Number 185
Disclaimer—This paper partially fulfills a writing requirement for first year (freshman) engineering students at the University
of Pittsburgh Swanson School of Engineering. This paper is a student, not a professional, paper. This paper is based on publicly
available information and may not provide complete analyses of all relevant data. If this paper is used for any purpose other than
these authors’ partial fulfillment of a writing requirement for first year (freshman) engineering students at the University Of
Pittsburgh Swanson School Of Engineering, the user does so at his or her own risk.
USING CLUSTERING AND MACHINE LEARNING FOR ANOMALOUS
BREACH DETECTION
Jarod Vickers, Mahboobin 4:00, Andrew Tran, Mahboobin 4:00
Abstract — In the current digital world where almost everyone
has constant access to the internet in some way, users are
constantly in danger of hackers who can remotely access and
control devices as well as view personal information all while
remaining undetected. Traditional cyber security systems pair a
simple firewall with basic statistical analysis that notifies
human analysts of suspicious activity that may be a sign of
remote access. However, these systems often overwhelm security
analysts with red flags that are mostly false positives, and the
firewalls are fairly simple for advanced hackers to breach.
A solution to this problem is to incorporate clustering, an
analysis model that incorporates both data mining and
machine learning algorithms, into the threat detection process.
This method is known as user and entity behavior analytics
(UEBA). There are already security systems that implement
UEBA, but they are not widely used due to many factors such
as high cost and ethical concerns. UEBA can take full
advantage of clustering paired with machine learning, as it
groups data into two overall categories: normal behavior and
anomalous behavior. This grouping happens quickly and
automatically, making UEBA the most efficient form of breach
detection available as well as the most sustainable option.
The implementation of UEBA will have a profound effect
on the future of cyber security by significantly decreasing the
number of successful cyber-attacks, especially when its flaws
are overcome.
Key Words—Sustainability, Computers, Cyber security,
Breach detection, Machine learning, Data mining, Clustering
data is stored on a personal computer (PC), a user must log in
to access their information, and if it is stored on the cloud,
such as email, a password must be used. Without the correct
credentials, such as a password, the data stays encrypted,
making it unreadable, but there are ways to access this
information without knowing a user’s password.
Every day, people send personal information to make
accounts, apply for loans, make online purchases, etc.
Encryption ensures that this personal information cannot be
seen by anyone without the user’s permission. However,
encryption is not perfect. Similar to how doors can be unlocked
without keys by lock picking, encrypted data can be unlocked
without passwords by hacking. If they are able to break the
encryption, hackers can access personal information remotely,
and they sometimes cannot be stopped using current
methods of cyber security.
Cyber security is a major issue as computers and big data
become more and more ubiquitous because people’s personal
information is being given out to companies and organizations
who are trusted to keep this information private. The current
cyber security methods are not enough to stop all cyberattacks, however. Current cyber security methods have one
main flaw: they are computer programs. Intelligent and clever
hackers discover new ways of bypassing security that the
current methods are not programmed to deal with or even
detect. When a hacker can figure out how that program works,
they can find a way to get around it. For this reason, these
methods, like many other computer programs, are inherently
unsustainable. A sustainable computer program can best be
described as a program that can be maintained and updated to
continuously perform at a high level. An example of a
sustainable computer program is Microsoft Office. Microsoft
Office was released in 1990, and still continues to be updated
and sold worldwide [1]. A solution to this sustainability
problem is to allow the program to change, adapt, and
essentially learn. This technology already exists, and it is
called user and entity behavior analytics (UEBA).
CYBERSECURITY: DETECTION OVER
PROTECTION
The rapid advancement of computing systems has led to
more convenient lives, but has also left personal information
vulnerable to anyone smart enough to access it. Information in
recent years has all been stored in computer hard drives and
can even be sent and stored in large data storage centers,
commonly referred to as the cloud, if there is access to the
internet. This information can include passwords to a personal
email account, credit card information, and financial records.
Not just anyone can access this information of course. If the
Evolution of Cyber Security and Cyber Attacks
Ever since the dawn of the World Wide Web in 1990,
computer viruses have been a serious issue in the technological
world. Viruses started off simple, as basic worms, with the
University of Pittsburgh, Swanson School of Engineering
3.31.2017
first being created by Robert Morris in 1988 [2]. Worms
initially only served as Denial of Service (DoS) attacks which
simply flooded users’ networks with superfluous requests and
disrupted internet connection. Due to the infancy of the
internet, DoS attacks had no profound effect. These basic
worms eventually evolved into modern day viruses, which
have more impact, as they are able to access and retrieve
information from PCs. These viruses led to the creation of
antivirus programs.
Antivirus programs were initially developed in the early
1990s, and they grew in importance following the widespread
attacks of the Melissa (1999) and ILOVEYOU (2000) viruses.
Both of these viruses were sent through email and infected
personal computers, with the ability to extract private
information [2]. Antivirus programs
were designed to recognize the signature of computer viruses
and prevent them from executing. This task was accomplished
by what is known as a firewall. Firewalls, at the most basic
level, function as a routing device; they act as a gateway and
only allow certain addresses to have access to whatever
information the firewall is protecting [3]. In the early days of
the internet, a firewall was sufficient, as it was able to stop
most basic attacks. As time and technology progressed,
hackers became smarter, and began to develop new ways to
breach security systems. Now, rather than creating viruses that
infect computer systems, computer hackers manually breach
firewalls, and do the dirty work themselves.
Modern hackers are smart and advanced enough to
breach practically any firewall. In this day and age, firewalls
simply exist to slow these hackers down. As evidenced by the
2013 breach of Target, a massive corporation, hackers have
the ability to breach almost anything. As a result, the focus
of cyber security has shifted from prevention to
containment. However, most cyber breaches go over 100 days
unnoticed, and thus millions of bits of data can be
compromised. In the Target breach alone, over 40 million
credit and debit card numbers were compromised [2]. UEBA
is the future in containing these threats, as it has the ability to
detect cyber breaches and notify professionals, in a fraction of
the time.
anomalies and deviations from the normal behavior. This
method can make a unique model for every system and can
even detect “day-zero” attacks. However, a major problem
with this method is its high false alarm rate. Any behavior
that deviates from the normal model will be flagged, even if it
is completely legitimate [4].
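The anomaly-based idea described above can be sketched in a few lines. This is only a minimal illustration, assuming that "normal behavior" is summarized by the mean and standard deviation of a single measurement; the login counts, the function names, and the threshold of three standard deviations are hypothetical choices, not values from the cited sources.

```python
from statistics import mean, stdev

def build_baseline(samples):
    """Summarize normal behavior as a (mean, standard deviation) pair."""
    return mean(samples), stdev(samples)

def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical logins per hour recorded during normal operation.
normal_logins = [4, 5, 6, 5, 4, 6, 5, 5]
baseline = build_baseline(normal_logins)

typical = is_anomalous(5, baseline)    # ordinary activity, not flagged
burst = is_anomalous(40, baseline)     # large spike, flagged
busy_day = is_anomalous(9, baseline)   # legitimate but unusual, still flagged
print(typical, burst, busy_day)
```

The last call shows the false alarm problem: a busy but legitimate hour is flagged simply because it deviates from the model.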
One of the flaws of the current methods of cyber security
is the heavy dependence on developers. Programs are only
capable of doing what they are programmed to do. If a new
type of virus is created, the program will be useless until the
virus is discovered by a human and the program is updated to
protect against it. To solve this problem, these cyber security
programs must have the ability to learn and adapt
automatically.
ALGORITHMS USED IN BREACH
DETECTION
The ability to learn is made possible through two key
algorithms: machine learning and data mining. An algorithm
is essentially a sequence of instructions to convert an input to
an output. For example, algorithms can take a set of numbers
and output a sorted list from least to greatest or take a list of
names and output the list in alphabetical order. Simple
algorithms for tasks such as sorting have simple known
instructions and rules such as, one is less than two, and A
comes before B in the alphabet. This is the basis behind the
signature-based method of cyber security. If attacks, such as
viruses, are detected, they can be recorded and made into a new
“rule.” This method works, but not against novel “day-zero”
attacks.
For certain types of data, the rules are too abstract to write
out by hand as an algorithm. For example, the rules for what
is considered a spam email differ from person to person.
The solution to this is to allow computers to learn and
automatically create algorithms tailored to each user. This
solution will not only increase the effectiveness of cyber
security, but also increase sustainability. Security systems will
be able to change and adapt even when day-zero attacks occur,
possibly preventing any future attacks that may occur. One of
the algorithms that allows a system to adapt is machine
learning.
Current Methods of Cyber Security
Security systems, such as anti-virus software, that are
currently being implemented use two methods: misuse-based
analytics, also known as signature-based analytics, and
anomaly-based analytics. Misuse-based analytics can detect
viruses by comparing them to previously known viruses. As a
benefit, this method does not raise many false alarms, but it
requires frequent updates as new types of viruses are
discovered. This method is effective only if new viruses are
similar to previously detected viruses, which makes it useless
the very first time a new type of virus is used, known as a
“day-zero” attack.
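A rough sketch of misuse-based detection follows. It assumes, purely for illustration, that a "signature" is the SHA-256 hash of a whole payload; real products use richer signatures, and the sample payloads and function names here are hypothetical.

```python
import hashlib

def signature(payload: bytes) -> str:
    """Illustrative signature: the SHA-256 digest of the payload."""
    return hashlib.sha256(payload).hexdigest()

def is_known_threat(payload: bytes, signatures: set) -> bool:
    """Report a threat only if its signature has been recorded before."""
    return signature(payload) in signatures

# Hypothetical database built from previously analyzed malware samples.
known_signatures = {signature(b"malicious-sample-A"),
                    signature(b"malicious-sample-B")}

seen_before = is_known_threat(b"malicious-sample-A", known_signatures)
day_zero = is_known_threat(b"malicious-sample-C", known_signatures)

# Only after a human records the new sample does the detector catch it.
known_signatures.add(signature(b"malicious-sample-C"))
after_update = is_known_threat(b"malicious-sample-C", known_signatures)
print(seen_before, day_zero, after_update)
```

The middle result is the weakness noted above: a sample that has never been recorded passes unnoticed until the signature set is updated.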
The other method is anomaly-based analytics, which is a
step towards more intelligent analysis. Anomaly-based
techniques model a system’s normal behavior and detect
Machine Learning
Machine learning is a type of artificial intelligence that
allows a system to learn without explicitly being programmed.
Learning, as it pertains to machines, is the ability of a system
to change and update its algorithms automatically. There are
two main types of machine learning: supervised and
unsupervised. Supervised machine learning allows a computer
to learn from previously collected inputs and their
corresponding outputs, which are provided by a supervisor.
techniques are able to extract data, and create a set of patterns
and rules to explain the data. This can be accomplished
through a wide variety of different methods, with clustering
being one of these methods. Clustering is a set of techniques
for finding patterns in high-dimensional unlabeled data. It is a
data mining method in which similar data is grouped together
by focusing on dividing separate instances into natural groups,
rather than into predicted groups [6]. Natural groups are
groups created from data patterns (unsupervised learning),
whereas predicted groups are groups created by a user
(supervised learning). There are two primary types of
clustering techniques, distance-based and density-based. In
cases involving anomaly detection and cybersecurity, the
distance-based clustering style is preferred, as it produces
more results overall than density-based clustering [7].
The basic technique of distance-based clustering is
derived from k-means clustering, as the method relies on
distance from means, or averages. The first step in this process
is establishing a cluster set, C = {C1, …, Cj}, where j is the
number of clusters. Following this step, each data point xi in a
data set X = {x1, …, xk} is assigned to the closest cluster,
according to the Euclidean distance (see Figure 2) between the
data point and the cluster’s center. The Euclidean distance
measures how far a data point’s value is from the
mean of a cluster. If the distance between any data point and
the center of the cluster is considered to be too great, a new
cluster is created to accommodate this data point and future
data points [7]. Once all data points are assigned to a certain
cluster, the average of each cluster, also known as a centroid,
is calculated. These centroids are then used as the new centers
for the clusters, and the entire process stated above is repeated.
Once the centroids have stabilized and are no longer
fluctuating, the data is considered completely clustered [8].
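The steps above can be sketched as follows. This is a simplified reading of the procedure, not the exact algorithm from [7] or [8]: the `max_distance` cutoff that triggers a new cluster, the starting centers, and the sample points are all assumptions made for the example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def cluster(points, initial_centers, max_distance, max_iterations=100):
    """Assign points to the nearest center, open a new cluster for points
    that are too far from every center, then recompute centroids until
    they stop changing."""
    centers = [tuple(c) for c in initial_centers]
    groups = []
    for _ in range(max_iterations):
        groups = [[] for _ in centers]
        for p in points:
            distances = [euclidean(p, c) for c in centers]
            nearest = distances.index(min(distances))
            if distances[nearest] > max_distance:
                centers.append(tuple(p))   # too far away: start a new cluster
                groups.append([p])
            else:
                groups[nearest].append(p)
        # Centroid = per-axis average of the members (empty clusters keep their center).
        new_centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for c, g in zip(centers, groups)
        ]
        if new_centers == centers:         # centroids have stabilized
            break
        centers = new_centers
    return centers, groups

# Hypothetical two-feature behavior measurements.
points = [(1, 1), (1.2, 0.8), (0.8, 1.2), (5, 5), (5.2, 4.8), (20, 20)]
centers, groups = cluster(points, [(0, 0), (6, 6)], max_distance=5)
print(len(centers), [len(g) for g in groups])
```

In this toy data set the point (20, 20) is far from both starting centers, so it ends up alone in a third cluster; small, isolated clusters like this are the natural candidates for review.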
FIGURE 1 [5]
A hypothetical engine power versus price plot of various
cars.
These data points are then categorized manually into
positive and negative examples, which is known as the training
process. For example, in Figure 1, various cars are plotted
based on engine power and cost. The positive examples,
represented as “+” marks, are considered by a supervisor to be
family cars, while negative examples, represented by “-” marks,
are not family cars.
Based on the data, a computer can draw bounds around the
positive examples, as tightly or as loosely as the supervisor
wants. The family of candidate bounds is called the hypothesis
class, represented by the shaded rectangles in Figure 1. In
real-life scenarios, there are often more than just two
variables to consider, and hypothesis classes do not have
simple rectangular shapes. There are more factors that go into
categorizing a car as a family car than just engine power and
price, such as passenger limit, efficiency, etc. With machine
learning, a computer is able to adjust its own hypothesis as
new data is provided. New data can later be categorized by
comparing its characteristics with the characteristics of data
previously plotted.
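As a small illustration of the family car example, the sketch below fits the tightest rectangle around the positive examples and uses it to categorize new cars. The training values are invented for the example, and choosing the tightest rectangle is only one possible hypothesis; a looser rectangle that still excludes the negative examples would be equally consistent with the data.

```python
def fit_rectangle(examples):
    """Return the tightest (power range, price range) around the positive examples."""
    positives = [(power, price) for power, price, is_family in examples if is_family]
    powers = [power for power, _ in positives]
    prices = [price for _, price in positives]
    return (min(powers), max(powers)), (min(prices), max(prices))

def predict(rectangle, power, price):
    """A car is predicted to be a family car if it falls inside the rectangle."""
    (power_lo, power_hi), (price_lo, price_hi) = rectangle
    return power_lo <= power <= power_hi and price_lo <= price <= price_hi

# Hypothetical training data: (engine power, price, labeled as family car?).
training = [
    (110, 18000, True), (130, 22000, True), (150, 25000, True),
    (60, 9000, False), (300, 70000, False),
]
rectangle = fit_rectangle(training)
family_guess = predict(rectangle, 125, 20000)   # inside the rectangle
sports_guess = predict(rectangle, 280, 65000)   # outside the rectangle

# When the supervisor labels a new example, the hypothesis is adjusted.
updated = fit_rectangle(training + [(160, 27000, True)])
print(rectangle, family_guess, sports_guess, updated)
```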
Unsupervised learning works similarly to supervised
learning, but requires no supervisor during the training process
which only includes inputs [5]. Without predetermined outputs
to determine positive and negative examples, unsupervised
machine learning analyzes the data given during the training
period to automatically determine hypothesis classes. This is
done by comparing the characteristics of each given data point
and grouping them with other similar data points. When new
data points are analyzed, they will be assigned to existing
groups or a new group will be made if a data point deviates
significantly from the others. Unsupervised learning is
completely autonomous which is beneficial when new and
unknown data is received as it does not need a human to tell
the machine what to do with it. This method is called clustering
when it is used in tandem with another algorithm: data mining.
FIGURE 2 [6]
The Euclidean Distance formula. This formula calculates
the distance between any 2 given data points.
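The figure itself is not reproduced in this text. For reference, the standard Euclidean distance between a data point x and a cluster center c is written below; the symbol n for the number of features and the subscript i are assumed notation, not taken from the figure.

```latex
d(x, c) = \sqrt{\sum_{i=1}^{n} \left( x_i - c_i \right)^2}
```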
When applying clustering to network behavior, it can be
difficult to tell what is considered ‘normal’ behavior and what
is ‘anomalous’ behavior. In most cases, it can be safely
assumed that ‘normal’ data will make up a much larger
percentage of the data set than ‘anomalous’ data, and thus the
larger groups are labeled as normal [7]. Unfortunately, with
this basic method,
data can sometimes fall under more than one category, and it
is not completely clear which group the data belongs to, which
presents a major issue. However, advanced algorithms of
Data Mining: Clustering
Data mining is the process of extracting knowledge from
a large amount of data. Through machine learning, data mining
clustering deal with this issue, namely probability-based
clustering.
clustering are still in their infancy. As time passes, the
algorithms will be optimized, and become more effective and
efficient. Therefore, the incorporation of these flexible
methods in cybersecurity programs, such as UEBA, vastly
increases the ability to maintain and update a program, leading
to an overall sustainable program.
Probabilistic Clustering: An Advanced Method
Probability-based clustering works very similarly to k-means
clustering. The base algorithm is the same; data
is organized into clusters based on their distance from the
centroids of the clusters. When a data point falls
between two or more clusters, a statistical approach is taken
[6]. Probability-based clustering defines clusters in terms of a
mean and standard deviation. Using these values, the
likelihood that a data point belongs to each cluster can be
described as a function of x (the data point), the mean of that
cluster, and the standard deviation of that cluster. Each data
point is then assigned to the cluster it most probably belongs
to. This minimizes the aforementioned issue of data points
that are located somewhere in between two clusters.
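A minimal sketch of this statistical assignment is shown below. It assumes each cluster is modeled by a one-dimensional normal distribution and that every cluster carries a prior weight; the cluster names, parameters, and priors are hypothetical, and a full probability-based clusterer would also re-estimate the means and standard deviations from the data.

```python
import math

def normal_density(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def membership_probabilities(x, clusters):
    """Probability that x belongs to each cluster.

    `clusters` maps a name to a (mean, standard deviation, prior weight) tuple.
    """
    weighted = {name: prior * normal_density(x, mu, sigma)
                for name, (mu, sigma, prior) in clusters.items()}
    total = sum(weighted.values())
    return {name: value / total for name, value in weighted.items()}

# Two hypothetical clusters: A is tight, B is spread out.
clusters = {"A": (10.0, 2.0, 0.5), "B": (20.0, 4.0, 0.5)}
probabilities = membership_probabilities(14.0, clusters)
most_likely = max(probabilities, key=probabilities.get)
print(probabilities, most_likely)
```

The point 14 is closer to the mean of A (distance 4) than to the mean of B (distance 6), yet B receives the larger probability because its standard deviation is wider. This is the kind of in-between case where a purely distance-based assignment and a statistical assignment can disagree.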
NETWORK ANOMALY DETECTION
In the modern age of technology, network attacks have
increased dramatically in both number and sophistication. In
an effort to combat the ever-evolving hackers
of the world, significant research has been conducted on
network intrusion detection. Network anomaly detection can
be described as “finding exceptional patterns in network traffic
that do not conform to expected normal behavior” [9]. These
non-normal patterns are often referred to as anomalies, and can
signify serious breaches in a network. Anomaly detection can
be applied to a plethora of different scenarios, ranging
from fraud detection to intrusion detection and military
surveillance [9]. With the recent developments in machine
learning, and its application to data mining, anomaly detection
has improved greatly in recent years.
Machine learning, and clustering in particular, has made
network anomaly detection a reality. The
overarching idea of clustering is to sort collected data into
different clusters. When applied to network anomaly detection,
this primarily involves the analysis of user and
network behavior. The data being analyzed is any action made
by a user or the network. The cluster that each action is
assigned to depends on the characteristics of that action such
as when the action took place, where it originated from, and
what files or programs that action affects.
Behavior, although a categorical value, can be
transformed into a numeric value for purposes of clustering.
This numeric data is then run through a probability-based
clustering algorithm. All the data is statistically analyzed as a
function of x, μ, and σ, and is then sorted into three general
clusters: intrusion attack data, denial of service data, and
normal data. Intrusion attack behavior is generally considered
the more dangerous of the two types of attacks [9].
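One common way to turn categorical behavior into numbers is one-hot encoding, sketched below. The feature names, the category lists, and the scaling of the hour are assumptions chosen for illustration; the cited sources do not prescribe a particular encoding.

```python
# Hypothetical categories for where an action came from and what it touched.
ORIGINS = ["internal", "external"]
RESOURCES = ["email", "file_share", "payroll_db"]

def one_hot(value, categories):
    """Encode a category as a list with 1.0 in its position and 0.0 elsewhere."""
    return [1.0 if value == category else 0.0 for category in categories]

def encode_action(action):
    """Turn one recorded action into a numeric vector suitable for clustering."""
    hour_scaled = action["hour"] / 23.0   # scale hour of day (0-23) into the range 0..1
    return ([hour_scaled]
            + one_hot(action["origin"], ORIGINS)
            + one_hot(action["resource"], RESOURCES))

late_night_access = {"hour": 23, "origin": "external", "resource": "payroll_db"}
vector = encode_action(late_night_access)
print(vector)
```

Each action becomes a point in a six-dimensional space, so the distance and probability calculations used by the clustering algorithms can be applied to it directly.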
Intrusion attacks are defined as actions “aimed to
compromise the security of computer and network components
in terms of confidentiality, integrity, and availability”
[9]. Intrusions can be performed by internal sources
(individuals who have permission to access the network) or
external sources (those who do not have permission) [9]. One
example of these attacks is the Target hack of 2013, in which
upwards of 40 million credit and debit card numbers were
compromised [2]. Clustering is the superior
method of data mining when it comes to detecting these
intrusion attacks. In a study performed by Blower and
Williams, a clustering method was used to group normal
versus anomalous network data [4]. In the study, the clustering
method had a reported performance of ninety-eight percent in
determining whether there was an attack or not. This value
FIGURE 3 [6]
2 distributions of clusters, each with their own mean and
standard deviation.
FIGURE 4 [6]
The statistical formula f(x; μ, σ), where x is the value, μ is
the mean, and σ is the standard deviation.
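The figure itself is not reproduced in this text. Given the symbols in the caption, the formula is presumably the normal (Gaussian) density, which in its standard form reads:

```latex
f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```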
Clustering is a powerful tool that harnesses machine
learning algorithms to analyze and sort data. This ability to
sort data into categories with relative ease can be very
useful, especially for network anomaly detection. Clustering in
this sense is used to analyze user behavior and categorize it as
either ‘normal’ or ‘anomalous’ behavior. The ability to
correctly categorize different behaviors comes from machine
learning which allows the security system to “learn and make
judgements without being programmed explicitly for every
scenario” [10]. These algorithms behind machine learning and
was significantly higher than that of other data mining
methods, such as Bayesian networks and fuzzy rules, in
revealing anomalous behavior [4].
There are several different programs on the market that
perform network anomaly detection using clustering
algorithms. These programs include host-based intrusion
detection systems and network-based intrusion detection
systems. One of the more modern programs is known as User
and Entity Behavior Analytics (UEBA), which combines these
2 forms of intrusion detection systems.
for the foreseeable future, and is a sustainable program.
Niara’s product and other forms of UEBA are currently the
most effective versions of anomalous breach detection, as they
discover threats in a fraction of the time that older methods do.
UEBA Compared to Other Methods
In the early days of the internet, firewalls and basic
antivirus systems were sufficient in dealing with cyber security
threats. As time has passed, and hackers have become much
more intelligent, these basic systems are simply not powerful
enough to detect and eliminate cybersecurity threats. Therefore,
anomalous breach detection systems were created to actively
detect these threats by analyzing network activity, leading to
their elimination. UEBA is currently the most effective of
these anomalous breach detection systems, as it analyzes both
interior users and exterior entities, unlike most cybersecurity
systems.
Older versions, known simply as UBA, only monitored and
analyzed devices and users within the network. This, however,
does not provide extensive enough coverage, as is evident in
the Target hack, in which hackers accessed the network via a
third-party company. UEBA addresses this problem, as it
analyzes any and all devices that are somehow involved with
a network [11]. Although UEBA provides an overall superior
security blanket in terms of detecting security threats, the
product is not perfect.
USER AND ENTITY BEHAVIOR ANALYTICS
UEBA is a relatively new intrusion detection system that
builds upon the foundation of past systems, known as User
Behavior Analytics (UBA). As stated previously, UEBA
combines both host-based intrusion detection and
network-based intrusion detection [11]. Host-based intrusion detection
systems (HIDS) focus on the software on an actual
device. HIDS monitor internal activity, such as what programs
are running, and the processes they are carrying out
[9]. Network-based intrusion detection systems monitor
network systems to detect any potential compromises
[9]. These primarily focus on the transfer of data throughout
the network. By implementing both of these methods, UEBA
is able to easily detect cyber-criminals attempting to hack into
systems.
Cyber-criminals typically intrude into network systems
through a method called spearphishing. Spearphishing is the
process of posing as an employee or trusted individual so as
not to raise any alarms within a company [10]. For example, in
the Target hack, the perpetrators accessed the network through
a ventilation and heating supplier to Target [2]. For this reason,
many cyber-attacks go unnoticed for long periods of time;
without UEBA, the average intrusion goes unnoticed for 265
days and takes 69 days to contain. UEBA cuts the time
required to detect a security breach by over 50
percent, according to Niara, a company that produces and
distributes UEBA technology [10].
Niara’s version of UEBA uses all of the processes
described above. Niara incorporates machine learning and
clustering algorithms into its product to analyze user
behavior and external device behavior. The distinguishing
factor of Niara’s UEBA from other UEBA programs is its
contextual risk scoring. The program performs probabilistic
clustering on every piece of data, and based on the probability
that any given piece of data is categorized as an intrusion
attack, a risk score is assigned to the data [10]. This creates a
user-friendly interface that allows professionals to easily assess
whether a specific individual needs to be investigated or
not. Because of this statistical analysis, hackers must
almost perfectly impersonate an
entity’s behavior to remain undetected. This results in a
program that is fairly unlikely to become obsolete, unless
hackers find a method to perfectly replicate someone’s
behaviors. Niara’s UEBA thus will perform at an optimal level
Limitations of UEBA
The primary issue with UEBA is that the program cannot
actually deal with threats. UEBA simply analyzes behavior and
discovers anomalies that may potentially be threats. It does
nothing to prevent these threats or to respond to successful
intrusions.
Therefore, UEBA must be used in tandem with other programs
to deal with these threats. Preventing threats before they occur
is primarily the job of a firewall. Network threats are slowed
down greatly by firewalls, and if they do breach the firewall,
UEBA algorithms can detect the breach, creating a very
effective cybersecurity system [9]. Also, UEBA programs do
nothing to contain a breach once it has occurred.
Unfortunately, in our current day and age, the only
thing that can deal with intrusions once they occur is a
professional. Therefore, a cybersecurity professional must
always be readily available to contain an intrusion, and
terminate it [9].
Although it is an issue that UEBA cannot deal with
specific threats, there are potential advancements that could
help fix the issue. These, however, involve technology that is
beyond our current reach. Every attack is different, and
requires different methods to respond to and
contain. Computers in their current stage are unable to make
difficult decisions and respond properly to all
situations. Further advancements in machine learning and
artificial intelligence could eventually allow for cybersecurity
programs to eliminate threats, but not in the foreseeable future.
UEBA could also be incorporated more into a network’s
infrastructure, and have the capabilities to change network
traffic flow temporarily [11].
Although potential advancements are possible, it is still
plausible that hackers could create a method to spearphish
perfectly without leaving traces. This unfortunately would
make UEBA and other behavior-based network breach
detection systems obsolete, despite the fact that UEBA has the
potential to be improved upon. Even so, UEBA, when
partnered with other sources of cybersecurity, is still one
of the most powerful tools in dealing with cyber threats.
UEBA might not be perfect and may raise false
positives, but the clustering method behind it was still reliable
enough to report a ninety-eight percent success rate in the
study discussed earlier [4].
CURRENT STATE OF UEBA
UEBA is the next step in perfecting cyber security. It
solves many of the problems faced by older methods such as
over-reliance on the developer and it also has the ability to
detect threats from external sources. All of these problems
were solved through the implementation of two key
algorithms: machine learning and clustering. These two
algorithms allow security systems to categorize any behavior
and action that affects a network as anomalous or safe before
alerting human analysts. UEBA has proven to be the most
effective cyber security system to date and is the most
advanced. However, there are still many flaws and ethical
issues that need to be resolved before UEBA becomes more
widespread.
Currently, there is no autonomous way to actually deal
with an anomalous threat. UEBA can only detect threats and
warn experts. This limitation cannot be resolved until
advancements in machine learning and artificial intelligence
are made. Even though there are some issues, UEBA is still
being used currently, and until more technological
advancements come along, it will be the best breach detection
option. It will also dramatically increase the sustainability of
cyber security software. New methods of breach detection may
not need to be developed for a long time as UEBA will be able
to change and adapt using machine learning.
Unpredictability
Another issue with UEBA is its unpredictability which is
a result of using artificial intelligence (AI). The predictability
of any technology is important for many reasons such as safety
and accountability. Every piece of technology is created to
complete a certain task, and causes an expected result.
However, the technology can sometimes cause unwanted
results. Having a predictable product allows engineers to
develop fail safes ahead of time in case of a malfunction [12].
AI does not have the same level of predictability, meaning
that it becomes difficult to predict when an unexpected result
will occur.
In the case of cyber security and UEBA, an unexpected
result would be a false positive, or when the system categorizes
a user’s normal behavior as anomalous. Unfortunately, there is
no way to know or predict what conditions would result in a
false positive, or even how often it will happen.
The reason the unpredictability of AI can be an issue can be
explained by considering a classic example of AI: programs
made to play chess against humans, such as Deep
Blue, the first machine to beat a reigning world chess
champion, Garry Kasparov. If Deep Blue were predictable, its
developers would have been able to predict every
move the machine was going to make before it was made. The
developers would also need to be able to recognize every time
an unintended result occurs, in this case the machine making a
bad move. In other words, the developers would have needed
to be better at chess than the machine and could have beaten
Kasparov without it [12].
Similar to the chess example, the unpredictability of
UEBA means that developers must know when a false positive
occurs. With the multitude of variables that are considered
by UEBA, however, a developer would need to perform a
complete investigation of each alert just to make sure it is not
a false positive, which undermines the function of the security
system itself.
A false positive is not necessarily harmful, however,
because a human expert makes the final decision on how to act
against the behavior, but having a technology that is
unpredictable could cause problems in the future. Some may
find it hard to trust an alert from an unpredictable technology.
Ethical Concerns of Machine Learning
While UEBA may be beneficial to cyber security, there
are important ethical concerns that must be considered. The
main concern with UEBA comes from the method of data
collection. Systems that implement UEBA must collect data
from users such as where they send and receive data, as well
as when files and programs are accessed. Essentially, all of
their activities while using the computer are monitored by the
security system and recorded to determine their normal
behavior.
This can be considered a violation of the user’s right to
privacy. Privacy can be defined as the state of being free from
being observed. However, as mentioned before, this system is
essentially observing every action that is made and file that is
opened. There are many questions about the extent of this
observation that need to be answered. For example, how far
can this monitoring go? Do the developers of the system have
access to the information? Is it possible for the program to
actually access these data files? If so, how can this be
prevented so no one exploits these permissions?
With the sensitivity of the personal information, it is
important that these questions are answered before UEBA
becomes a widely used method.
SOURCES
[1] “A Brief History of Microsoft Office.” Microsoft. 9.10.2015. Accessed 3.25.2017. https://enterprise.microsoft.com/en-gb/articles/roles/itleader/brief-history-microsoft-office/
[2] T. Julian. “Defining Moments in the History of Cyber security and the Rise of Incident Response.” Infosecurity. 2015. Accessed 2.20.2017. https://www.infosecuritymagazine.com/opinions/the-history-of-cybersecurity/
[3] K. Scarfone, P. Hoffman. “Guidelines on Firewalls and Firewall Policy.” National Institute of Standards and Technology. 2009. Accessed 2.1.2017. http://csrc.nist.gov/publications/nistpubs/800-41-Rev1/sp80041-rev1.pdf
[4] A. L. Buczak, E. Guven. “A Survey of Data Mining and Machine Learning Methods for Cyber Security Intrusion Detection.” IEEE. 10.26.2015. Accessed 2.20.2017. http://ieeexplore.ieee.org.pitt.idm.oclc.org/document/7307098/
[5] E. Alpaydin. “Introduction to Machine Learning.” Massachusetts Institute of Technology. 2014. Accessed 2.15.2017.
[6] I. H. Witten, E. Frank, M. A. Hall. “Data Mining: Practical Machine Learning Tools and Techniques.” Elsevier. 2011. Accessed 2.24.2017.
[7] S. Dua, X. Du. “Machine Learning in Cybersecurity.” CRC. 2011. Accessed 2.24.2017.
[8] Y. Zhao. “R and Data Mining: Examples and Case Studies.” Elsevier. 10.20.2015. Accessed 2.21.2017. https://78462f86-a-e2d7344e-ssites.googlegroups.com/a/rdatamining.com/www/docs/RDataMining-book.pdf?attachauth=ANoY7cp85M9jpNUCtACrnD6Bd0bqQlSUBLIOv026Fj4DHSpb2PskMx7krHMuf5qGBGs5YtlCTpK_BmsEureCQnAp_i6Xlk_o77f1I3O4Kea_BqeKKMgl8rDuvEs7UAEjiSxcafLmMjAChNhDLcOEJffQqVaq63FFha8tIn1U9idild47U4Ho7q4j_AESkoUq4NHMUbAtqBBda27vBp6_05ezIsIJttsA%3D%3D&attredirects=0.
[9] M. H. Bhuyan, D. K. Bhattacharyya, J. K. Kalita. “Network Anomaly Detection: Methods, Systems, and Tools.” IEEE. 2014. Accessed 2.23.2017. http://www.nr2.ufpr.br/~jefferson/pdf/Network_Anomaly_Detection-Methods,_Systems_and_Tools.pdf
[10] “How UEBA and Machine Learning Detect Attacks.” Niara. 2016. Accessed 2.09.2017. http://info.niara.com/hubfs/PDFs/Guides/Security_Analysts_Guide_How_UEBA_and_Machine_Learning_Detect_Attacks.pdf
[11] D. Shackleford. “Active Breach Detection: The Next-Generation Security Technology?” SANS. 2.1.2016. Accessed 2.09.2017. https://www.sans.org/readingroom/whitepapers/analyst/active-breach-detection-nextgeneration-security-technology-36812
[12] N. Bostrom, E. Yudkowsky. “The Ethics of Artificial Intelligence.” Machine Intelligence Research Institute. 6.12.2014. Accessed 2.10.2017. https://intelligence.org/files/EthicsofAI.pdf
ACKNOWLEDGEMENTS
We’d like to thank our friends and the wonderful card
game of bridge for keeping us sane throughout this entire
process.