Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Dangerous Minds: The Art of Guerrilla Data Mining Mark Ryan del Moral Talabis Background “Security Analytics”: Concept of using data mining and AI in security Presented techniques and theories that we could use This Talk: Move from theory to practical applications Provide scenarios and examples to leverage these techniques for your research 2 DEFCON 2009 Introduction Traditional warfare vs. Information Security Very similar Reconnaissance, information gathering, and espionage play an important part in battle tactics 3 DEFCON 2009 This is Sparta! Anyone watched 300? Spartans: they knew and understood the terrain Persians: They did not win because of overwhelming numbers, they actually won because someone told them about a hidden pass. 4 DEFCON 2009 Information Security In information security: Not only in “hacking” systems The more information you have, you’ll have a better chance to protect you organization Drafting good policies and procedures as well as picking the correct tools and techniques based on the information that you have. 5 DEFCON 2009 Words of Wisdom It is said that if you know your enemies and know yourself, you will not be imperiled in a hundred battles - Sun Tzu 6 DEFCON 2009 Information Warfare Information Warfare The use and management of information in pursuit of a competitive advantage over an opponent Information are just 1’s and 0’s if not used properly Analysis makes information meaningful - INTELLIGENCE 7 DEFCON 2009 The Business of Information Warfare People who are into the Information Warfare Business: CIA FBI NSA Information Awareness Office Foreign Governments 8 DEFCON 2009 Projects Government Projects: ECHELON TALON ADVISE MATRIX Able Danger Large endeavors! 9 DEFCON 2009 Challenges Amount of data: there’s just too much Resources: way too little Data Data Data Challenge 10 Intelligence DEFCON 2009 The Veritas Project Veritas is latin for “Truth” The Veritas Project Modeled in the same general threat intelligence premise Primarily based on community sharing approach and using tools, technologies, and techniques that are freely available. Hawaii Honeynet Project and SecureDNA 11 DEFCON 2009 An Analogy Field Agents Decision Makers Information Warfare HQ Analysts 12 DEFCON 2009 Framework Data Collection Decision Making Framework Data Storage Data Analysis 13 DEFCON 2009 Data Collection Sources of Data Depends on what you want to research Forums Bulletins Chat logs News Articles Blogs Word documents The more you can gather, the better results It’s not as easy, unless you’re Google 14 DEFCON 2009 Data Storage Information can be stored in: Relational databases Flat files Possibly the easiest part of all this 15 DEFCON 2009 Analysis Possible the most important aspect of the framework Crunching large amount of data. Making data and information meaningful Some Data Mining and Artificial Intelligence Concepts K-Means Neural Networks SVM A lot more Not too easy but there are a lot of tools out there 16 DEFCON 2009 Data Analysis Tools Some very useful tools that are free Text Garden Ontogen Weka Rapid Miner Tanagra Orange MEAD 17 DEFCON 2009 Human Factor Why do we need humans? Interpretation of Results and Analysis = Intelligence 18 DEFCON 2009 Demo Scenarios Let’s look at the scenarios that you can use as templates for your own research 19 DEFCON 2009 Applications Examples Trends Research Malware Taxonomy Monitoring – Persons of Interest Corporate Intelligence - Strategy Opinion Polls – What people are thinking about 20 DEFCON 2009 Trends Research Track increases in chatter across time Gives researchers focus Find relationships between topics Framework Data Collection Crawlers (News articles, Forums) Data Storage MySQL Analysis Text Garden (html2txt, txt2bow, bowkmeans) Decision Making Me! 21 DEFCON 2009 China Activity 22 DEFCON 2009 China Computer Assisted 23 DEFCON 2009 China Nuclear / Power 24 DEFCON 2009 Thought Cloud 25 DEFCON 2009 Defcon 26 DEFCON 2009 Defcon Sumbissions 27 DEFCON 2009 Defcon Crime 28 DEFCON 2009 Conclusion Defcon = Crime 29 DEFCON 2009 iRobot! 30 DEFCON 2009 Skynet? 31 DEFCON 2009 Malware Taxonomy Grouping similar malwares together Framework Data Collection Notes from Malware Analysts Data Storage Flat files Analysis Ontogen Decision Making Depends 32 DEFCON 2009 Thousands of malware descriptions 33 DEFCON 2009 “Unsupervised Learning” 34 DEFCON 2009 Monitoring – Persons of Interest Monitoring of chat logs and finding “persons of interest” and who they talk to. “Cells”. 35 DEFCON 2009 Monitoring – Persons of Interest 36 DEFCON 2009 Monitoring – Persons of Interest 37 DEFCON 2009 Corporate Intelligence Using data mining to profile companies to determine strategy 38 DEFCON 2009 What the Public is Thinking About Obama Town Hall Meeting Data mining of over 100,000 questions to get a “pulse” of what people are concerned about Healthcare 39 DEFCON 2009 Our Love Affair with Marijuana 40 DEFCON 2009 Future Contributors Sentiment Analysis Good or Bad? We need more data! 41 DEFCON 2009 Acknowledgements Howard Van de Vaarst Chris Potter Secure DNA management University of Santo Tomas (Philippines) Blaz Fortuna (Ontogen) Jozef Stefan Institute, Slovenia (Text Garden) 42 DEFCON 2009 Mahalo The Veritas Project 43 DEFCON 2009