Data Security and Integrity: Developments and Directions
Dr. Bhavani Thuraisingham, The University of Texas at Dallas, June 2009

Outline
- Data security and integrity: multilevel data management, data and applications security, data integrity and provenance
- Policy management: confidentiality, privacy, trust
- Privacy and data mining
- Secure web services and the semantic web
- Emerging directions

Developments in Data and Applications Security: 1975 - Present
- Access control for System R and Ingres (mid 1970s)
- Multilevel secure database systems (1980 - present): relational database systems (research prototypes and products); distributed database systems (research prototypes and some operational systems); object data systems; the inference problem and deductive database systems; transactions
- Recent developments in secure data management (1996 - present): secure data warehousing; role-based access control (RBAC); e-commerce; XML security and the secure semantic web; data mining for intrusion detection and national security; privacy; dependable data management; secure knowledge management and collaboration

Developments in Data and Applications Security: Multilevel Secure Databases - I
- Air Force Summer Study in 1982
- Early systems based on the Integrity Lock approach
- Systems in the mid to late 1980s and early 1990s, e.g., SeaView by SRI, Lock Data Views by Honeywell, ASD and ASD_Views by TRW; both prototypes and commercial products; Trusted Database Interpretation and evaluation of commercial products
- Secure distributed databases (late 1980s to mid 1990s): architectures; algorithms and a prototype for distributed query processing; simulation of distributed transaction management and concurrency control algorithms; secure federated data management

Developments in Data and Applications Security: Multilevel Secure Databases - II
- Inference problem (mid 1980s to mid 1990s): unsolvability of the inference problem; security constraint processing during query, update, and database design operations; semantic models and conceptual structures
- Secure object databases and systems (late 1980s to mid 1990s): secure object models; distributed object systems security; object modeling for designing secure applications; secure multimedia data management
- Secure transactions (1990s): single-level/multilevel transactions; secure recovery and commit protocols

Directions in Data and Applications Security - I
- Secure semantic web: security models
- Secure information integration: how do you securely integrate numerous, heterogeneous data sources, on the web and elsewhere?
- Secure sensor information management: fusing and managing data/information from distributed, autonomous sensors
- Secure dependable information management: integrating security, real-time processing, and fault tolerance
- Data sharing vs. privacy: federated database architectures?

Directions in Data and Applications Security - II
- Data mining and knowledge discovery for intrusion detection: need realistic models and real-time data mining
- Secure knowledge management: protect the assets and intellectual property rights of an organization
- Information assurance, infrastructure protection, access control: insider cyber-threat analysis; protecting national databases; role-based access control for emerging applications
- Security for emerging applications: geospatial, biomedical, e-commerce, etc.
- Other areas: trust and economics; trust management/negotiation; secure peer-to-peer computing

Data Integrity and Quality
- Data integrity maintains the accuracy of the data
  - E.g., when multiple transactions access the data, the action of one transaction cannot invalidate that of another; solution: a locking mechanism
  - Integrity also includes preventing unauthorized modifications to the data
- Data quality provides a measure for determining the accuracy of the data
  - Is the data current? Can we trust the source?
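The data integrity point above — a locking mechanism so that one transaction's action cannot invalidate another's — can be sketched as follows. This is a minimal illustration, not from the slides; the account record and amounts are invented:

```python
import threading

class Account:
    """A record whose integrity is protected by a lock: concurrent
    transactions are serialized so one cannot invalidate another."""
    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def transfer_in(self, amount):
        # The lock makes the read-modify-write atomic; without it, two
        # concurrent transfers could each read the same old balance and
        # one update would be lost (an integrity violation).
        with self._lock:
            self.balance += amount

acct = Account(100)
threads = [threading.Thread(target=acct.transfer_in, args=(1,)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(acct.balance)  # 150: no lost updates
```

A real DBMS generalizes this idea with two-phase locking over many records, but the principle is the same.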
  - Tools for data cleansing and for handling incompleteness
  - Data quality parameters can be passed from source to source, e.g., trust source A at 50% and source B at 30%
- Data quality can be specified as part of the annotation to the data
  - Develop an annotation management system

Data Provenance
- Keeping track of the entire history of the data: who created the data, who modified the data, who read the data
- Do we trust the data source? Do we trust the people who handled the data?
- The organizations the data has traveled through
- Data annotations for data provenance: what is the model? design of the annotation management system
- Using data analysis techniques, unauthorized modification and access can be detected and misuse detection activities can be carried out

Coalition Data and Policy Sharing
- Each agency (A, B, and C) maintains its own component data and policies
- Each component exports its data/policies to the shared data/policy store of the federation

Need to Know to Need to Share
- Need-to-know policies during the Cold War: even if the user has access, does the user have a need to know?
- Post-9/11 the emphasis is on need to share: a user may not have access, but needs the data
- Do we give the data to the user and then analyze the consequences? Do we analyze the consequences first and then determine the actions to take? Or do we simply not give the data to the user?
- What are the risks involved?

CPT: Confidentiality, Privacy and Trust
- Before I, as a user of organization A, send data about me to organization B, I read the privacy policies enforced by organization B
  - If I agree to the privacy policies of organization B, then I will send data about me to organization B
  - If I do not agree with the policies of organization B, then I can negotiate with organization B
- Even if the web site states that it will not share private information with others, do I trust the web site?
- Note: while confidentiality is enforced by the organization, privacy is determined by the user.
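The agree-or-negotiate step just described can be sketched as a user agent comparing an organization's stated practices against the user's preferences. All policy fields and values below are invented for illustration:

```python
def acceptable(site_policy, user_prefs):
    """Compare each stated practice of the site against the user's
    preferences; return (ok, objections), where objections lists the
    practices the user does not accept and so should negotiate."""
    objections = [practice for practice, value in site_policy.items()
                  if value not in user_prefs.get(practice, {value})]
    return (not objections, objections)

# Hypothetical practices for organization B and preferences for the user.
org_b_policy = {"retention": "indefinite", "sharing": "none", "purpose": "billing"}
my_prefs = {"retention": {"1-year"},
            "sharing": {"none"},
            "purpose": {"billing", "service"}}

ok, objections = acceptable(org_b_policy, my_prefs)
print(ok, objections)  # False ['retention'] -> negotiate retention before sending data
```

A practice the user has no stated preference for is accepted by default; this is the same policy-matching pattern that P3P user agents automate, discussed later in the deck.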
Therefore, for confidentiality, the organization will determine whether a user can have the data; if so, the organization can further determine whether the user can be trusted.

RBAC
- Access to information sources, including structured and unstructured data, both within the organization and external to it
- Access is based on roles
- Hierarchy of roles: handling conflicts
- Controlled dissemination and sharing of the data

UCON
- The RBAC model is incorporated into UCON and is useful for various applications (authorization component)
- Obligations: actions required to be performed before an access is permitted; obligations can be used to determine whether an expensive knowledge search is required
- Attribute mutability: used to control the scope of the knowledge search
- Conditions: can be used to relax or tighten resource-usage policies

Dissemination Policies
- Release policies determine to whom to release the data
  - What is the connection to access control? Is access control sufficient?
  - Once the data is retrieved from the information source (e.g., a database), should it be released to the user?
- Once the data is released, dissemination policies determine to whom the data can be given (electronic music, etc.)

Risk-Based Data Sharing/Access Control
- What are the risks involved in releasing/disseminating the data?
- Risk modeling should be integrated with the access control model
- Simple method: assign risk values; the higher the risk, the lower the sharing
- What is the cost of releasing the data?
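The RBAC ideas above — access based on roles rather than identities, with a hierarchy in which a senior role inherits from junior roles — can be sketched as follows. The roles, users, and permissions are invented for illustration:

```python
class RBAC:
    """Toy role-based access control with a role hierarchy:
    a senior role inherits the permissions of its junior roles."""
    def __init__(self):
        self.permissions = {}   # role -> set of permissions
        self.parents = {}       # role -> junior roles it inherits from
        self.user_roles = {}    # user -> set of roles

    def grant(self, role, perm):
        self.permissions.setdefault(role, set()).add(perm)

    def inherit(self, senior, junior):
        self.parents.setdefault(senior, set()).add(junior)

    def assign(self, user, role):
        self.user_roles.setdefault(user, set()).add(role)

    def _role_perms(self, role, seen=None):
        seen = seen or set()
        if role in seen:
            return set()   # guard against cycles, one kind of conflict to handle
        seen.add(role)
        perms = set(self.permissions.get(role, ()))
        for junior in self.parents.get(role, ()):
            perms |= self._role_perms(junior, seen)
        return perms

    def check(self, user, perm):
        return any(perm in self._role_perms(r)
                   for r in self.user_roles.get(user, ()))

rbac = RBAC()
rbac.grant("analyst", "read:reports")
rbac.grant("manager", "release:reports")
rbac.inherit("manager", "analyst")   # manager inherits analyst permissions
rbac.assign("alice", "manager")
print(rbac.check("alice", "read:reports"))    # True, via the hierarchy
print(rbac.check("alice", "delete:reports"))  # False
```

UCON layers obligations, attribute mutability, and conditions on top of such an authorization check; those are omitted here.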
- Cost, risk, and security are closely related

Trust Management
- Trust services: identity services, authorization services, reputation services
- Trust negotiation (TN): digital credentials, disclosure policies
- TN requirements
  - Language requirements: semantics, constraints, policies
  - System requirements: credential ownership, validity, alternative negotiation strategies, privacy
- Example TN systems: KeyNote, Trust-X (University of Milan), TrustBuilder (UIUC)

Credentials and Disclosure
- Credentials can be expressed in the Security Assertion Markup Language (SAML)
- SAML allows a party to express security statements about a given subject: authentication statements, attribute statements, authorization decision statements
- Disclosure policies govern access to protected resources, access to sensitive information, and disclosure of sensitive credentials
- Disclosure policies express trust requirements by means of credential combinations that must be disclosed to obtain authorization

What is Privacy?
- Medical community: privacy is about a patient determining what patient/medical information the doctor should release about him/her
- Financial community: a bank customer determines what financial information the bank should release about him/her
- Government community: the FBI collects information about US citizens; however, the FBI determines what information about a US citizen it can release to, say, the CIA

Data Mining as a Threat to Privacy
- Data mining gives us "facts" that are not obvious to human analysts of the data
- Can general trends across individuals be determined without revealing information about individuals?
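The disclosure policies described earlier — trust requirements expressed as credential combinations that must be disclosed to obtain authorization — can be sketched as a simple check. The policy contents and credential names are invented:

```python
def authorized(policy, disclosed):
    """A disclosure policy is a list of alternative credential
    combinations; authorization succeeds when the credentials
    disclosed so far cover at least one combination."""
    return any(combo <= disclosed for combo in policy)

# Hypothetical policy: release patient records to a party that shows
# either (a doctor's license AND a hospital badge) or a regulator credential.
policy = [
    {"doctor_license", "hospital_badge"},
    {"health_regulator_id"},
]
print(authorized(policy, {"doctor_license"}))                    # False
print(authorized(policy, {"doctor_license", "hospital_badge"}))  # True
```

A trust negotiation system such as TrustBuilder iterates this check: each party discloses credentials step by step until some combination in the other party's policy is satisfied or negotiation fails.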
- Possible threats: combining collections of data and inferring information that is private, e.g., disease information from prescription data, or military action from pizza deliveries to the Pentagon
- Need to protect the associations and correlations between the data that are sensitive or private

Some Privacy Problems and Potential Solutions
- Problem: privacy violations that result from data mining; potential solution: privacy-preserving data mining
- Problem: privacy violations that result from the inference problem, i.e., deducing sensitive information from the legitimate responses received to user queries; potential solution: privacy constraint processing
- Problem: privacy violations due to unencrypted data; potential solution: encryption at different levels
- Problem: privacy violations due to poor system design; potential solution: develop a methodology for designing privacy-enhanced systems

Privacy Constraint Processing
- Based on prior research in security constraint processing
- Simple constraint: an attribute of a document is private
- Content-based constraint: if a document contains information about X, then it is private
- Association-based constraint: two or more documents taken together are private; individually each document is public
- Release constraint: after X is released, Y becomes private
- Augment a database system with a privacy controller for constraint processing

Architecture for Privacy Constraint Processing
- User interface manager; constraint manager holding the privacy constraints
- Query processor: applies constraints during query and release operations
- Database design tool: applies constraints during database design operations
- Update processor: applies constraints during update operations
- All operating over the DBMS and the database

Semantic Model for Privacy Control
- Example semantic model: patient John has an address (John's address, in England), has diseases (cancer, influenza), and travels frequently; in the original figure, dark lines/boxes contain the private information

Privacy-Preserving Data Mining
- Prevent useful results from mining: introduce "cover stories" to give "false" results; make only a sample of the data available so that an adversary is unable to come up with useful rules and predictive functions
- Randomization: introduce random values into the data and/or results; the challenge is to introduce random values without significantly affecting the data mining results; give ranges of values for results instead of exact values
- Secure multi-party computation: each party knows only its own inputs; encryption techniques are used to compute the final results

Platform for Privacy Preferences (P3P): What is it?
- P3P is an emerging industry standard that enables web sites to express their privacy practices in a standard format
- Policies in this format can be automatically retrieved and understood by user agents
- P3P is a product of the W3C (World Wide Web Consortium, www.w3c.org)
- When a user enters a web site, the privacy policies of the site are conveyed to the user; if they differ from the user's preferences, the user is notified and can then decide how to proceed
- Several major corporations are working on P3P standards

Data Mining and Privacy: Friends or Foes?
- They are neither friends nor foes
- Need advances in both data mining and privacy
- Need to design flexible systems: for some applications one may have to focus entirely on "pure" data mining, while for others there may be a need for privacy-preserving data mining; need flexible data mining techniques that can adapt to changing environments
- Technologists, legal specialists, social scientists, policy makers, and privacy advocates MUST work together

WS-* Security Standards Framework
- Security management: XKMS
- Identity management: WS-Trust, WS-Federation, Liberty, SAML
- Message security: WS-Security, WS-SecureConversation
- Policy and access control: WS-Policy, XACML, SAML
- Reliable messaging: WS-ReliableMessaging
- SOAP foundation
- XML security: XML Encryption, XML Signature
- Transport-level security: SSL/TLS
- Network-level security: IPSec

Inference/Privacy Control with Semantic Web Technologies
- Technology by UTD: an interface to the semantic web; an inference engine/rules processor driven by policies, ontologies, and rules; an RDF database holding RDF documents derived from web pages and databases

Emerging Directions
- Digital identity management
- Identity theft management
- Digital forensics
- Digital watermarking
- Risk analysis
- Economic analysis
- Secure electronic voting machines
- Biometrics
- Social network security
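The four privacy constraint types described earlier (simple, content-based, association-based, and release constraints) can be sketched as checks a privacy controller applies before releasing query results. All document names and constraints below are invented for illustration:

```python
class PrivacyController:
    """Toy constraint processor: decides which documents in a query
    result may be released, per the four privacy constraint types."""
    def __init__(self):
        self.private_docs = set()   # simple constraints: doc is private
        self.private_terms = set()  # content-based: private if it mentions X
        self.associations = []      # sets that are private taken together
        self.release_rules = {}     # releasing X makes Y private
        self.released = set()

    def release(self, docs, contents):
        out = []
        for d in docs:
            if d in self.private_docs:
                continue                                  # simple constraint
            if self.private_terms & set(contents.get(d, ())):
                continue                                  # content-based
            if any((assoc - {d}) <= (self.released | set(out)) and d in assoc
                   for assoc in self.associations):
                continue                                  # association-based
            out.append(d)
            self.released.add(d)
            for y in self.release_rules.get(d, ()):
                self.private_docs.add(y)                  # release constraint
        return out

pc = PrivacyController()
pc.private_docs = {"doc_salaries"}
pc.private_terms = {"diagnosis"}
pc.associations = [{"doc_flights", "doc_cargo"}]  # together they reveal a mission
pc.release_rules = {"doc_budget": ["doc_vendors"]}

contents = {"doc_memo": ["schedule"], "doc_health": ["diagnosis"]}
print(pc.release(["doc_memo", "doc_health", "doc_salaries"], contents))
# ['doc_memo']
print(pc.release(["doc_flights", "doc_cargo"], {}))
# ['doc_flights'] - only one of the associated pair is released
print(pc.release(["doc_budget"], {}))
# ['doc_budget'] - but its release makes doc_vendors private
print(pc.release(["doc_vendors"], {}))
# []
```

In the architecture described in the deck, such checks would sit in the query processor between the DBMS and the user, with the constraint manager supplying the constraint sets.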