Download Dangerous Minds: The Art of Guerrilla Data Mining

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Nonlinear dimensionality reduction wikipedia , lookup

Transcript
Dangerous Minds: The Art of
Guerrilla Data Mining
Mark Ryan del Moral Talabis
Background
 “Security Analytics”:
 Concept of using data mining and AI in
security
 Presented techniques and theories that
we could use
 This Talk:
 Move from theory to practical
applications
 Provide scenarios and examples to
leverage these techniques for your
research
2
DEFCON 2009
Introduction
 Traditional warfare vs. Information
Security
 Very similar
 Reconnaissance, information gathering,
and espionage play an important part in
battle tactics
3
DEFCON 2009
This is Sparta!
 Anyone watched
300?
 Spartans: they knew
and understood the
terrain
 Persians: They did
not win because of
overwhelming
numbers, they
actually won because
someone told them
about a hidden pass.
4
DEFCON 2009
Information Security
 In information security:
 Not only in “hacking” systems
 The more information you have, you’ll
have a better chance to protect you
organization
 Drafting good policies and procedures
as well as picking the correct tools and
techniques based on the information
that you have.
5
DEFCON 2009
Words of Wisdom
 It is said that if you know your
enemies and know yourself, you
will not be imperiled in a hundred
battles - Sun Tzu
6
DEFCON 2009
Information Warfare
 Information Warfare
 The use and management of
information in pursuit of a competitive
advantage over an opponent
 Information are just 1’s and 0’s if not
used properly
 Analysis makes information
meaningful - INTELLIGENCE
7
DEFCON 2009
The Business of Information
Warfare
 People who are into the
Information Warfare Business:
 CIA
 FBI
 NSA
 Information Awareness
Office
 Foreign Governments
8
DEFCON 2009
Projects
 Government Projects:
 ECHELON
 TALON
 ADVISE
 MATRIX
 Able Danger
 Large endeavors!
9
DEFCON 2009
Challenges
 Amount of data: there’s just too
much
 Resources: way too little
Data
Data
Data
Challenge
10
Intelligence
DEFCON 2009
The Veritas Project
 Veritas is latin for “Truth”
 The Veritas Project
 Modeled in the same general threat
intelligence premise
 Primarily based on community sharing
approach and using tools,
technologies, and techniques that are
freely available.
 Hawaii Honeynet Project and SecureDNA
11
DEFCON 2009
An Analogy
Field
Agents
Decision
Makers
Information
Warfare
HQ
Analysts
12
DEFCON 2009
Framework
Data
Collection
Decision
Making
Framework
Data
Storage
Data
Analysis
13
DEFCON 2009
Data Collection
 Sources of Data
 Depends on what you want to research
 Forums
 Bulletins
 Chat logs
 News
 Articles
 Blogs
 Word documents
 The more you can gather, the better
results
 It’s not as easy, unless you’re Google
14
DEFCON 2009
Data Storage
 Information can be stored in:
 Relational databases
 Flat files
 Possibly the easiest part of all this
15
DEFCON 2009
Analysis
 Possible the most important aspect of the
framework
 Crunching large amount of data.
 Making data and information meaningful
 Some Data Mining and Artificial
Intelligence Concepts




K-Means
Neural Networks
SVM
A lot more
 Not too easy but there are a lot of tools
out there
16
DEFCON 2009
Data Analysis Tools
 Some very useful tools that are free
 Text Garden
 Ontogen
 Weka
 Rapid Miner
 Tanagra
 Orange
 MEAD
17
DEFCON 2009
Human Factor
 Why do we need
humans?
 Interpretation of
Results and Analysis
= Intelligence
18
DEFCON 2009
Demo Scenarios
 Let’s look at the scenarios that you
can use as templates for your own
research
19
DEFCON 2009
Applications
 Examples
 Trends Research
 Malware Taxonomy
 Monitoring – Persons of Interest
 Corporate Intelligence - Strategy
 Opinion Polls – What people are
thinking about
20
DEFCON 2009
Trends Research
 Track increases in chatter across time
 Gives researchers focus
 Find relationships between topics
 Framework
 Data Collection
 Crawlers (News articles, Forums)
 Data Storage
 MySQL
 Analysis
 Text Garden (html2txt, txt2bow, bowkmeans)
 Decision Making
 Me!
21
DEFCON 2009
China Activity
22
DEFCON 2009
China Computer Assisted
23
DEFCON 2009
China Nuclear / Power
24
DEFCON 2009
Thought Cloud
25
DEFCON 2009
Defcon
26
DEFCON 2009
Defcon Sumbissions
27
DEFCON 2009
Defcon Crime
28
DEFCON 2009
Conclusion
 Defcon = Crime
29
DEFCON 2009
iRobot!
30
DEFCON 2009
Skynet?
31
DEFCON 2009
Malware Taxonomy
 Grouping similar malwares
together
 Framework
 Data Collection
 Notes from Malware Analysts
 Data Storage
 Flat files
 Analysis
 Ontogen
 Decision Making
 Depends
32
DEFCON 2009
Thousands of malware
descriptions
33
DEFCON 2009
“Unsupervised Learning”
34
DEFCON 2009
Monitoring – Persons of
Interest
 Monitoring of chat logs and finding
“persons of interest” and who they
talk to. “Cells”.
35
DEFCON 2009
Monitoring – Persons of
Interest
36
DEFCON 2009
Monitoring – Persons of
Interest
37
DEFCON 2009
Corporate Intelligence
 Using data mining to profile
companies to determine strategy
38
DEFCON 2009
What the Public is Thinking
About
 Obama Town Hall Meeting
 Data mining of over 100,000
questions to get a “pulse” of what
people are concerned about
 Healthcare
39
DEFCON 2009
Our Love Affair with Marijuana
40
DEFCON 2009
Future
 Contributors
 Sentiment Analysis
 Good or Bad?
 We need more data!
41
DEFCON 2009
Acknowledgements
 Howard Van de Vaarst
 Chris Potter
 Secure DNA management
 University of Santo Tomas
(Philippines)
 Blaz Fortuna (Ontogen)
 Jozef Stefan Institute, Slovenia
(Text Garden)
42
DEFCON 2009
Mahalo
 The Veritas Project
43
DEFCON 2009