Assured Cloud Computing for Assured Information Sharing
Dr. Bhavani Thuraisingham
The University of Texas at Dallas (UTD)
November 2012

Team Members
• Sponsor: Air Force Office of Scientific Research
• The University of Texas at Dallas
  – Dr. Murat Kantarcioglu; Dr. Latifur Khan; Dr. Kevin Hamlen; Dr. Zhiqiang Lin; Dr. Kamil Sarac
• Sub-contractors
  – Prof. Elisa Bertino (Purdue)
  – Ms. Anita Miller, Dr. Bob Johnson (North Texas Fusion Center)
• Collaborators
  – Late Dr. Steve Barker, King's College London (EOARD)
  – Dr. Barbara Carminati; Dr. Elena Ferrari, U of Insubria (EOARD)

Outline
• Objectives
• Assured Information Sharing
• Layered Framework
• Our Research
• Education
• Acknowledgement:
  – Research funded by the Air Force Office of Scientific Research
  – Education funded by the National Science Foundation

Objectives
• Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet. Users need not have knowledge of, expertise in, or control over the technology infrastructure in the "cloud" that supports them.
• Our research on cloud computing is based on Hadoop, MapReduce, and Xen.
• Apache Hadoop is a Java software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers. (A minimal MapReduce sketch follows this slide.)
• Xen is a Virtual Machine Monitor developed at the University of Cambridge, England.
• Our goal is to build a secure cloud infrastructure for assured information sharing applications.
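As background for the Hadoop bullet above, here is a minimal Hadoop MapReduce job in Java (the classic word count). It is an illustrative sketch only, not part of the project's code; the input and output paths are assumed to arrive as command-line arguments.

```java
// A minimal Hadoop MapReduce job (word count), illustrating the programming
// model described above. Not part of the project's code; args[0]/args[1]
// are assumed HDFS input/output paths.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in the input split.
  public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        if (token.isEmpty()) continue;
        word.set(token);
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts for each word across all mappers.
  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) sum += v.get();
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenMapper.class);
    job.setCombinerClass(SumReducer.class);  // local aggregation before the shuffle
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Hadoop runs many such map tasks in parallel across the cluster, which is what makes it attractive for the policy-enforcement and feature-extraction workloads discussed later in the deck.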
Information Operations Across Infospheres: Assured Information Sharing
• Objectives
  – Develop a framework for secure and timely data sharing across infospheres
  – Investigate access control and usage control policies for secure data sharing
  – Develop innovative techniques for extracting information from trustworthy, semi-trustworthy and untrustworthy partners
• Budget
  – FY06-8: AFOSR $300K, State Match $150K
• Scientific/Technical Approach
  – Conduct experiments as to how much information is lost as a result of enforcing security policies in the case of trustworthy partners
  – Develop more sophisticated policies based on role-based and usage-control-based access control models
  – Develop techniques based on game-theoretic strategies to handle partners who are semi-trustworthy
  – Develop data mining techniques to carry out defensive and offensive information operations
[Figure: coalition data/policy store; component data/policies for Agencies A, B, and C are published to the coalition]
• Accomplishments
  – Developed an experimental system for determining information loss due to security policy enforcement
  – Developed a strategy for applying game theory to semi-trustworthy partners; simulation results
  – Developed data mining techniques for conducting defensive operations against untrustworthy partners
• Challenges
  – Handling dynamically changing trust levels; scalability

Architecture: 2005-2008
[Figure: coalition data/policy store; each agency (A, B, C) exports its component data/policy to the coalition; partners are grouped as trustworthy, semi-trustworthy, and untrustworthy]

Our Approach
• Integrate the Medicaid claims data and mine the data; then enforce policies and determine how much information has been lost (trustworthy partners); prototype system; application of semantic web technologies
• Apply game theory and probing to extract information from semi-trustworthy partners
• Conduct active defence and determine the actions of an untrustworthy partner
  – Defend ourselves from our partners using data mining techniques
  – Conduct active defence: find out what our partners are doing by monitoring them, so that we can defend ourselves in dynamic situations
• Trust for peer-to-peer networks (infrastructure security)

Policy Enforcement Prototype
[Figure: architecture of the coalition policy enforcement prototype, developed by Dr. Mamoun Awad (postdoc) and students]

Game Theory for Assured Information Sharing
• Studies sharing interactions through mathematical representations of gain
  – Each party is considered a player
  – The information players gain from each other is considered a payoff
  – The scenario is modeled as a finitely repeated game
    • Information is exchanged in discrete 'chunks' each round
    • The situation terminates at a finite yet unforeseeable point in the future
  – Actions within the game are to either lie or tell the truth
• Our goal: all players draw the conclusion that telling the truth is the best option (a sketch of the underlying condition follows this slide)
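As a sketch of why truth-telling can emerge as the best option, consider the standard repeated-game argument. The payoff symbols below are illustrative assumptions, not the project's model: let u_T be a player's per-round payoff when both partners share truthfully, u_L > u_T the one-round payoff from lying against a truthful partner, u_P < u_T the per-round payoff once partners retaliate by withholding truthful data, and δ the probability that the exchange continues for another round (the "finite yet unforeseeable" endpoint).

```latex
% Expected payoff from always telling the truth vs. deviating once and
% facing retaliation thereafter:
V_{\text{truth}} = u_T + \frac{\delta}{1-\delta}\,u_T,
\qquad
V_{\text{lie}} = u_L + \frac{\delta}{1-\delta}\,u_P .
% Truth-telling is the better option whenever
V_{\text{truth}} \;\ge\; V_{\text{lie}}
\;\Longleftrightarrow\;
\delta \;\ge\; \frac{u_L - u_T}{u_L - u_P}.
```

When partners expect to keep interacting (δ close to 1), the future value of continued truthful exchange outweighs any one-shot gain from lying, which is exactly the conclusion the project wants all players to reach.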
Incentive Issues in Assured Information Sharing
DoD MURI Project 2008-2013, AFOSR
• Motivation
  – Misaligned incentives can be a significant problem in information security (e.g., software bugs vs. software companies' incentives)
  – Incentive issues in information sharing have been explored to some extent (e.g., incentive issues in file-sharing p2p networks)
  – Assured information sharing creates new challenges: security considerations vs. utility
• Technical Approach: verify that the other participants do not lie about their data
  – If the data is revealed as it is: trust but verify (our initial results: DKE '08 paper)
  – If the data is not revealed (e.g., SMC techniques are used): non-cooperative computing; mechanism design; SMC with rational adversaries

Layered Framework for Assured Cloud Computing
[Figure 2: layered framework for assured cloud computing. Layers, top to bottom: User Interface; HIVE/SPARQL/Query; Hadoop/MapReduce/Storage; XEN/Linux/VMM; Secure Virtual Network Monitor. Cross-cutting concerns: Policies (XACML), QoS, Resource Allocation, Risks/Costs, Cloud Monitors]

Secure Query Processing with Hadoop/MapReduce
• We have studied clouds based on Hadoop
• Query rewriting and optimization techniques designed and implemented for two types of data:
  – (i) Relational data: secure query processing with HIVE
  – (ii) RDF data: secure query processing with SPARQL
• Demonstrated with XACML policies
• Joint demonstration with King's College and University of Insubria
  – First demo (2011): each party submits its data and policies; our cloud manages the data and policies
  – Second demo (2012): multiple clouds

Fine-grained Access Control with Hive
• System architecture: table/view definition and loading
• Users can create tables as well as load data into tables; they can also upload XACML policies for the tables they are creating
• Users can also create XACML policies for tables/views
• Users can define views only if they have permissions for all tables specified in the query used to create the view; they can also either specify or create XACML policies for the views they are defining
• CollaborateCom 2010

SPARQL Query Optimizer for Secure RDF Data Processing
[Figure: system architecture. Web interface; data preprocessor (N-Triples converter, prefix generator, predicate-based splitter, predicate-object-based splitter); MapReduce framework (parser, query validator & rewriter, XACML PDP, query rewriter by policy, plan generator, plan executor); server backend]
• Goals: build an efficient storage mechanism using Hadoop for large amounts of data (e.g., a billion triples); build an efficient query mechanism for data stored in Hadoop; integrate with Jena
• Developed a query optimizer and query rewriting techniques for RDF data with XACML policies, implemented on top of Jena (a minimal illustration follows this slide)
• IEEE Transactions on Knowledge and Data Engineering, 2011
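The sketch below conveys the flavor of policy-driven SPARQL rewriting on top of Jena; it is not the project's optimizer. The denied predicate URI, the input file name, and the naive string-based rewrite are all assumptions for illustration (a real rewriter would operate on the query's algebra, guided by an XACML PDP decision).

```java
// A minimal sketch (not the project's optimizer): rewrite a SPARQL query so
// that a policy-restricted predicate is filtered out before execution on a
// Jena model. All names here are illustrative.
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class PolicyRewriteDemo {
  // Hypothetical policy decision: requesters may not see the :salary predicate.
  private static final String DENIED_PREDICATE = "http://example.org/salary";

  public static void main(String[] args) {
    Model model = ModelFactory.createDefaultModel();
    model.read("data.nt", "N-TRIPLES");  // assumed local N-Triples file

    String userQuery = "SELECT ?s ?p ?o WHERE { ?s ?p ?o }";
    // Naive rewrite for this one query shape: inject a FILTER that removes
    // the denied predicate before the query ever reaches the data.
    String rewritten = userQuery.replace(
        "}", "FILTER (?p != <" + DENIED_PREDICATE + ">) }");

    Query query = QueryFactory.create(rewritten);
    try (QueryExecution qe = QueryExecutionFactory.create(query, model)) {
      ResultSet results = qe.execSelect();
      while (results.hasNext()) {
        QuerySolution row = results.next();
        System.out.println(row.get("s") + " " + row.get("p") + " " + row.get("o"));
      }
    }
  }
}
```

The key property, mirrored from the slide above, is that enforcement happens by rewriting the query itself, so the underlying Hadoop/Jena storage never has to return triples the policy forbids.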
Demonstration: Concept of Operation
[Figure: agencies 1 through n connect through a user interface layer; relational data is handled by fine-grained access control with Hive, and RDF data by the SPARQL query optimizer for secure RDF data processing]

RDF-Based Policy Engine
[Figure: technology by UT Dallas. An interface to the semantic web sits above an inference engine/rules processor (e.g., Pellet); policies, ontologies, and rules in RDF are processed over the Jena RDF engine and RDF documents]

RDF-Based Policy Engine on the Cloud
• Determines how access is granted to a resource as well as how a document is shared
• Users specify policies, e.g., access control, redaction, and release policies
• Parses a high-level policy to a low-level representation
• Supports graph operations and visualization; policies are executed as graph operations
• Executes policies as SPARQL queries over large RDF graphs on Hadoop
• Supports policies over traditional data and its provenance
• IFIP Data and Applications Security, 2010; ACM SACMAT 2011

[Figure: policy engine architecture. User interface layer (high-level specification); policy parser layer (policy translator); policy transformation layer (access control/redaction policies via traditional mechanisms, policy/graph transformation rules, regular expression-query translator); provenance controller and data controller over XML, RDF, and other back-end databases]
The engine serves as a testbed for evaluating different policy sets over different data representations, supporting provenance as a directed graph and viewing policy outcomes graphically.

Integration with Assured Information Sharing
[Figure: agencies 1 through n submit SPARQL queries and RDF data and policies through the user interface layer to the policy translation and transformation layer; an RDF data preprocessor and a MapReduce framework for query processing run over Hadoop HDFS and return the result]

Architecture
[Figure: agencies 1 through n access the policy engine through the user interface layer; policy requests pass through access control, redaction, and combined policies (policies n-2, n-1, n) over an RDF graph model; a connection interface links RDBMS, cloud-based, and local text stores; provenance is kept in a cloud-based store queried via SPARQL]

Key Feature 1: Policy Reciprocity
• Agency 1 wishes to share its resources only if Agency 2 also shares its resources with it
• Use our combined policies
• Allow agents to define policies based on reciprocity and mutual interest among cooperating agencies
• SPARQL query (schematic): SELECT B FROM NAMED uri1 FROM NAMED uri2 WHERE P

Key Feature 2: Develop and Scale Policies
• Agency 1 wishes to extend its existing policies with support for constructing policies at a finer granularity
• The policy engine provides:
  – A policy interface that should be implemented by all policies (see the sketch after this slide)
  – The ability to add newer types of policies as needed
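The following Java sketch illustrates the "policy interface implemented by all policies" idea from Key Feature 2, with a combined policy in the spirit of Key Feature 1's reciprocity. All names are illustrative assumptions, not the engine's actual API.

```java
// Illustrative sketch of an extensible policy interface; not the engine's API.
import java.util.List;

interface Policy {
  // Decide whether the requesting agency may access the given resource.
  boolean permits(String requestingAgency, String resourceUri);
}

// Owner-only access control, kept deliberately simple for illustration.
class AccessControlPolicy implements Policy {
  private final String owner;
  AccessControlPolicy(String owner) { this.owner = owner; }
  public boolean permits(String agency, String resource) {
    return agency.equals(owner);
  }
}

// A real redaction policy would transform the shared RDF graph rather than
// grant or deny outright; it is modeled here as a simple predicate.
class RedactionPolicy implements Policy {
  public boolean permits(String agency, String resource) {
    return !resource.contains("provenance");
  }
}

// A combined policy grants access only when all of its parts do, which is
// one way to compose reciprocity-style agreements among agencies.
class CombinedPolicy implements Policy {
  private final List<Policy> parts;
  CombinedPolicy(List<Policy> parts) { this.parts = parts; }
  public boolean permits(String agency, String resource) {
    return parts.stream().allMatch(p -> p.permits(agency, resource));
  }
}

public class PolicyDemo {
  public static void main(String[] args) {
    Policy policy = new CombinedPolicy(List.of(
        new AccessControlPolicy("Agency1"), new RedactionPolicy()));
    System.out.println(policy.permits("Agency1", "http://example.org/R2")); // true
  }
}
```

New policy types are added by implementing the same interface, which is how an engine of this shape can grow toward finer-grained policies without changing its callers.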
Key Feature 3: Justification of Resources
• Agency 1 asks Agency 2 for a justification of resource R2
• Policy engine
  – Allows agents to define policies over provenance
  – Agency 2 can provide the provenance to Agency 1, but protect it by using access control or redaction policies

Key Feature 4: Development Testbed
• The policy framework provides three configurations:
  – A standalone version for development and testing
  – A version backed by a relational database
  – A cloud-based version, which achieves high availability and scalability while maintaining low setup and operation costs

Secure Storage and Query Processing in a Hybrid Cloud
• The use of hybrid clouds is an emerging trend in cloud computing
  – Ability to exploit public resources for high throughput
  – Yet better able to control costs and data privacy
• Several key challenges
  – Data design: how to store data in a hybrid cloud? The solution must account for the data representation used (unencrypted/encrypted), public cloud monetary costs, and query workload characteristics
  – Query processing: how to execute a query over a hybrid cloud? The solution must provide query rewrite rules that ensure the correctness of a generated query plan over the hybrid cloud

Hypervisor Integrity and Forensics in the Cloud
[Figure: applications and guest OSes (Linux, Solaris, XP, MacOS) run above a virtualization layer (Xen, vSphere) on the hardware layer]
• OS and hypervisor integrity: secure control flow of hypervisor code; integrity via in-lined reference monitors
• Cloud integrity & forensics: forensics data extraction in the cloud; de-mapping (isolating) each VM's memory from physical memory across multiple VMs

Cloud-based Malware Detection (Dr. Mehedy)
[Figure: a stream of known malware or benign executables feeds a buffer; features are extracted and selected using the cloud; training and model updates maintain an ensemble of classification models that classify an unknown executable as malware (remove) or benign (keep)]

Cloud-based Malware Detection
• ACM Transactions on Management Information Systems
• Binary feature extraction involves
  – Enumerating binary n-grams from the binaries and selecting the best n-grams based on information gain
  – For training data with 3,500 executables, the number of distinct 6-grams can exceed 200 million
  – On a single machine this may take hours, depending on available computing resources, which is not acceptable for training from a stream of binaries
  – We use the cloud to overcome this bottleneck
• A cloud MapReduce framework is used
  – To extract and select features from each chunk
  – A 10-node cloud cluster is 10 times faster than a single node
  – Very effective in a dynamic framework, where malware characteristics change rapidly

Identity Management Considerations in a Cloud
• A trust model that handles
  – (i) various trust relationships, (ii) access control policies based on roles and attributes, (iii) real-time provisioning, (iv) authorization, and (v) auditing and accountability
• Several technologies have to be examined to develop the trust model
  – Service-oriented technologies; standards such as SAML and XACML; and identity management technologies such as OpenID
• Does one size fit all?
  – Can we develop a trust model that is applicable to all types of clouds, such as private, public, and hybrid clouds?
• The identity architecture has to be integrated into the cloud architecture

Education
• NSF Capacity Building Grant on Assured Cloud Computing
  – Introduce cloud computing into several cyber security courses
• Completed courses
  – Data and Applications Security
  – Data Storage
  – Digital Forensics
  – Secure Web Services
  – Computer and Information Security (Capstone Course)
• One course that covers all aspects of assured cloud computing
  – Week-long course to be given at Texas Southern University

Directions
• Secure VMM (Virtual Machine Monitor) and VNM (Virtual Network Monitor)
  – Exploring the Xen VMM and examining security issues
  – Developing automated techniques for VMM introspection
  – Will examine VNM issues in January 2012
• Integrate secure storage algorithms into Hadoop (FY 2012)
• Identity Management (FY 2012)
• Technology transfer through Knowledge and Security Analytics, LLC