Cyber Security Research at the University of Texas at Dallas
Sample Projects
Prof. Bhavani Thuraisingham, PhD, CISSP
Prof. Latifur Khan, PhD
Prof. Murat Kantarcioglu, PhD
Prof. Kevin Hamlen, PhD
Prof. Edwin Sha, PhD
August 2010

Data Mining for Malicious Traffic
Dr. Latifur Khan (NASA, AFOSR)

Motivation
• Network traffic is a continuous flow of data that evolves over time.
• How can we detect intrusions by mining network traffic when
  – the intrusions themselves evolve?
  – only a small fraction of the traffic is analyzed and labeled by human experts?
  – new kinds of intrusion appear?

Technical Approach
• Idea: build a classification model from past data and predict intrusions using the model. The model must be able to
  – keep itself up to date, so that it can detect intrusions even if their characteristics change over time;
  – use the limited amount of labeled data to update itself efficiently;
  – detect new kinds of intrusion in the traffic.
• Strategy:
  – semi-supervised learning to compensate for the shortage of labeled training data;
  – ensemble classification to cope with changes in the traffic;
  – novel class detection to detect new kinds of intrusion in the traffic.

System Architecture
[Figure: system architecture, with labels: network traffic (older chunks, newer chunks), last partially labeled chunk, last unlabeled chunk, training, new model, refinement, update, ensemble of models, classification, "Intrusion?"]

FEARLESS engineering

Reactively Adaptive Malware
Dr. Kevin W. Hamlen and Dr. Latifur Khan (AFOSR)

Motivation
• Design and study malware that is immune to conventional antivirus technologies.
• Important for the AF active defense project.
• Important for developing adequate defenses in anticipation of next-generation attacks.

Technical Approach
• Data mining
  – use machine learning to discover signatures dynamically
  – adapt to new malware in the field
  – share learned signatures among mutually trusting attackers
• Reactively adaptive malware
  – discover false negatives in the protection system
  – self-obfuscate to defeat defenses
[Figure: components: malware binary, signature query interface, antivirus signature database, signature inference engine, signature approximation model, obfuscation generation, obfuscation function, obfuscated binary.]

AFOSR: Assured Information Sharing: 2005-2008 (Dr. Bhavani Thuraisingham)
[Figure: component data/policy for Agencies A, B and C, each exported into the data/policy for the coalition.]
• Trustworthy partners: integrate the Medicaid claims data and mine the data; next, enforce policies and determine how much information has been lost; prototype system; application of Semantic Web technologies.
• Semi-trustworthy partners: apply game theory and probing to extract information from semi-trustworthy partners.
• Untrustworthy partners: conduct active defence and determine the actions of an untrustworthy partner; defend ourselves from our partners using data mining techniques; conduct active defence, i.e. find out what our partners are doing by monitoring them, so that we can defend ourselves in dynamic situations.
• Trust for peer-to-peer networks (infrastructure security).

Incentive Issues in Assured Information Sharing
Dr. Murat Kantarcioglu (DoD MURI Project 2008-2013, AFOSR)

Motivation
• Misaligned incentives could be a significant problem in information security.
  – Software bugs vs. software companies’ incentives
• Incentive issues in information sharing have been explored to some extent.
  – Incentive issues in file-sharing P2P networks
• Assured information sharing creates new challenges.
  – Security considerations vs. utility

Technical Approach
• Verify that the other participants do not lie about their data.
• If the data is revealed as it is:
  – trust but verify (our initial results: DKE ’08 paper)
• If the data is not revealed (e.g., secure multiparty computation (SMC) techniques are used):
  – non-cooperative computing
  – mechanism design
  – SMC with rational adversaries

Scalable Social Network Mining
Dr. Murat Kantarcioglu (NSF)

Motivation
• Mining social network data could provide important insights.
• Many different data mining techniques have recently been proposed for mining social network data.
• These techniques require many iterations (e.g., collective inference techniques) and expensive computations (e.g., maximum likelihood methods) over large social networks.

Technical Approach
• Our goal is to scale existing social network mining techniques to very large social network data by using cloud computing.
• To achieve this goal, we are exploring
  – intelligent data partitioning techniques based on social network concepts
  – caching of some important queries
  – efficient updates of cached query results using cloud computing

Initial Results
• Partitioning techniques based on various social network centrality metrics have been implemented:
  – degree centrality (DC)
  – clustering coefficient (CC)
  – closeness centrality (CloC)
  – betweenness centrality (BC)
  – random partitioning
  – domain specific
• Our initial results indicate that intelligent partitioning can increase accuracy and reduce running time.

Language-based Security
Dr. Kevin W. Hamlen (AFOSR)

Motivation
• Mobile code security (web scripts, patches, etc.)
• How can application-specific security policies be enforced over these untrusted software extensions?
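Application-specific policies such as "no writes to .exe files" and "no network access after reading a confidential file" (Policies #1 and #2 listed for this project) are commonly modeled as security automata. The Python sketch below simulates such a monitor over a recorded trace of events. It is not the project's rewriter: the event names (`open_write`, `read_file`, `net_send`), the confidential-path set and the stop-at-first-violation behavior are hypothetical choices for illustration.

```python
from enum import Enum, auto


class State(Enum):
    CLEAN = auto()    # no confidential file has been read yet
    TAINTED = auto()  # a confidential file has been read


class PolicyViolation(Exception):
    """Raised when an event would violate one of the monitored policies."""


class PolicyMonitor:
    """Security automaton for two example policies (hypothetical event model).

    Policy #1: no file whose name ends in ".exe" may be created or modified.
    Policy #2: no network access after a confidential file has been read.
    """

    def __init__(self, confidential_paths):
        self.confidential = set(confidential_paths)
        self.state = State.CLEAN

    def check(self, event, arg):
        """Guard called before each security-relevant event; raises on violation."""
        if event == "open_write" and arg.endswith(".exe"):
            raise PolicyViolation(f"write to executable {arg!r}")
        if event == "read_file" and arg in self.confidential:
            self.state = State.TAINTED
        elif event == "net_send" and self.state is State.TAINTED:
            raise PolicyViolation(f"network access to {arg!r} after confidential read")
        # All other events are permitted and leave the state unchanged.


def run(trace, monitor):
    """Replay (event, arg) pairs; stop at the first violation, like abort()."""
    executed = []
    for event, arg in trace:
        try:
            monitor.check(event, arg)
        except PolicyViolation:
            break  # the untrusted program is halted before the bad event
        executed.append((event, arg))
    return executed
```

For example, a trace that sends to the network, reads `/secret/plan.txt`, and then sends again is cut off before the second send, because the read moves the automaton into the `TAINTED` state. A rewriter in the style described on this slide would place the equivalent of `check` as inline guards rather than replaying a trace.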
Example policies (untrusted code running on top of a trusted computing base):
• Policy #1: Untrusted code must not create or modify any file whose name ends in “.exe”.
• Policy #2: Untrusted code must not access the network after reading a confidential file.
• Policy #3: Untrusted code must relinquish the thread after at most 1000 instruction cycles.

Technical Approach
• Idea: automatically rewrite the code prior to execution.
• Two constraints on rewritten code:
  – the rewritten code must satisfy the security policy;
  – the rewritten code must behave exactly like the original (except with regard to policy violations).
• One simple rewriting strategy:
  – insert guard instructions before every potentially dangerous instruction;
  – use compiler optimizations to eliminate or streamline unnecessary guards.

System Architecture
[Figure: untrusted code and a security policy enter the rewriter, which emits self-monitoring code plus a proof; a verifier then accepts or rejects the result.]

Example code (the inserted guard is the “if” line):
  …
  eax := “filename.exe”
  if (eax == “*.exe”) abort();
  call System.open(eax, “w”);
  …

Privacy-preserving Distributed Data Mining
Dr. Murat Kantarcioglu (NSF)

Motivation
• Privacy-sensitive data that is needed for many critical tasks is distributed among different organizations.
  – Example: statistical analysis of hospital discharge data for detecting biological weapons attacks.
• Privacy concerns may hinder sharing such data for legitimate purposes.
• Our goal is to develop techniques that enable distributed data mining without sacrificing individual privacy.

Technical Approach
• Idea: combine sanitization and cryptographic techniques to enable efficient and accurate privacy-preserving distributed data mining.
  – Each data source sanitizes its own data.
  – Sanitized data is shared directly.
  – Cryptographic algorithms use the sanitized data along with the original data to obtain the data mining results.
• Our initial results indicate that this approach is more efficient than purely cryptographic approaches and more accurate than purely sanitization-based approaches.
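The hybrid idea can be illustrated with a toy Python sketch. It assumes a concrete sanitization (each count rounded down to a bucket of 10), a concrete cryptographic primitive (a secure sum based on additive secret sharing, simulated in one process under a semi-honest model), and a hypothetical task in the spirit of the hospital-discharge example: flag regions whose combined count across sources reaches a threshold. The public sanitized counts prune regions that cannot reach the threshold, so the more expensive secure computation runs only where it is needed. This illustrates the general idea and is not the project's protocol.

```python
import random

PRIME = 2_147_483_647  # modulus for additive secret sharing (2**31 - 1)
BUCKET = 10            # hypothetical sanitization granularity


def share(value, n_parties, rng):
    """Split value into n_parties additive shares that sum to value mod PRIME."""
    shares = [rng.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(private_values, rng):
    """Simulate a semi-honest secure-sum protocol with additive sharing.

    Party i sends its j-th share to party j; each party publishes only the
    sum of the shares it received, so no single raw value is revealed.
    """
    n = len(private_values)
    all_shares = [share(v, n, rng) for v in private_values]
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    return sum(partial_sums) % PRIME


def sanitize(count):
    """Hypothetical sanitization: publish the count rounded down to BUCKET."""
    return (count // BUCKET) * BUCKET


def hybrid_alerts(counts_by_region, threshold, rng):
    """Return (regions whose total reaches threshold, number of secure-sum runs).

    counts_by_region maps a region to the list of private per-source counts.
    Each true count is below sanitize(c) + BUCKET, so the public sanitized
    data gives an upper bound on the true total.
    """
    alerts = []
    secure_runs = 0
    for region, counts in counts_by_region.items():
        upper_bound = sum(sanitize(c) + BUCKET - 1 for c in counts)
        if upper_bound < threshold:
            continue  # sanitized data alone rules this region out
        secure_runs += 1
        if secure_sum(counts, rng) >= threshold:
            alerts.append(region)
    return alerts, secure_runs
```

With `{"north": [3, 4, 2], "south": [40, 35, 12], "east": [15, 14, 16]}` and a threshold of 50, "north" is discarded from sanitized data alone, while "south" and "east" need the secure sum; only "south" (total 87) is flagged. The pruning step is what makes such a hybrid cheaper than running the cryptographic protocol everywhere.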
[Figure: each private source (Source Data 1, Source Data 2) goes through data sanitization to produce public sanitized data (Sanitized Data 1, Sanitized Data 2); sanitized data processing and cryptographic protocols combine to produce the result.]

The WWW as a source of geo-information

Problems
• Geographic context is embedded in natural language descriptions.
• Place names are ambiguous and confused with names of organisations, people, buildings and streets.
• Text information retrieval: web queries depend on exact matching of text terms.

Applications
• Location-based services
• Locally targeted web advertising
• Mining geographic properties
• Market research

Geo-Tagging = Geo-parsing + Geo-coding
• Geo-parsing: recognising geographic references (ignoring non-geographic uses of place terminology).
• Geo-coding: attaching a unique quantitative location (footprint) to each geographic reference.
[Figure: pipeline labels: webpage, part-of-speech tags (NNP; NN, NNS, NNP, NNPS), gazetteer, update, ranking-based disambiguation, Geo-Information Web services.]

Examples
• Geo-geo ambiguity: {City} Columbia / {S_C} California / U.S. vs. {City} Columbia / {S_C} Pennsylvania / U.S.
• Geo-non-geo ambiguity: in “Samuel Lancaster”, Lancaster is a last name, not {City} Lancaster / Texas / U.S.

Other Projects
• Secure Cloud Computing (http://www.wpafb.af.mil/news/story.asp?id=123209377)
• Secure Social and Private Networks
• Security and Privacy-Preserving Ontology Alignment
• Secure Peer-to-Peer Data Management
• Risk Modeling and Analysis of Botnets
• Policy Interoperability of Geospatial Data
• Data Provenance and Attribution of Attacks
• Accountability of Secure Systems
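The Geo-Geo and Geo-non-Geo ambiguity examples from the geo-information slide can be illustrated with a toy geo-tagger. The Python sketch assumes a tiny hypothetical gazetteer built only from those examples, a hypothetical list of given names for spotting person names, and a simple ranking heuristic (prefer the candidate whose state is mentioned in the text). It stands in for, and is not, the project's ranking-based disambiguation.

```python
import re

# Hypothetical toy gazetteer: place name -> candidate (city, state, country)
# footprints, taken from the slide's examples. A real system would use a
# full gazetteer with coordinates.
GAZETTEER = {
    "Columbia": [("Columbia", "California", "U.S."),
                 ("Columbia", "Pennsylvania", "U.S.")],
    "Lancaster": [("Lancaster", "Texas", "U.S.")],
}

# Hypothetical list of given names used to detect geo/non-geo ambiguity.
GIVEN_NAMES = {"Samuel", "Mary", "John"}


def geo_tag(text):
    """Return {place name: chosen footprint, or None if non-geographic}.

    Geo-parsing: a gazetteer name directly preceded by a known given name is
    treated as part of a person's name (a non-geographic use).
    Geo-coding: the remaining candidates are ranked by whether their state
    also appears in the text; ties keep gazetteer order.
    """
    tokens = re.findall(r"[A-Za-z]+", text)
    context = set(tokens)
    results = {}
    for i, tok in enumerate(tokens):
        if tok not in GAZETTEER:
            continue
        if i > 0 and tokens[i - 1] in GIVEN_NAMES:
            results[tok] = None  # e.g. "Samuel Lancaster" is a person
            continue
        candidates = GAZETTEER[tok]
        # max() returns the first highest-scoring candidate, so ties favor
        # the earlier gazetteer entry.
        results[tok] = max(candidates, key=lambda c: c[1] in context)
    return results
```

For instance, "He moved to Columbia, Pennsylvania" resolves to the Pennsylvania footprint, while "Samuel Lancaster wrote the report" yields no location for Lancaster. A ranking-based approach would replace the single boolean score here with richer evidence.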