Data Security and Integrity
Developments and Directions
Dr. Bhavani Thuraisingham
The University of Texas at Dallas
June 2009
Outline
 Data Security and Integrity
- Multilevel Data Management, Data and Applications Security, Data Integrity and Provenance
 Policy Management
- Confidentiality, Privacy, Trust
 Privacy and Data Mining
 Secure Web Services and Semantic Web
 Emerging Directions
Developments in Data and Applications
Security: 1975 - Present
 Access Control for System R and Ingres (mid 1970s)
 Multilevel secure database systems (1980 – present)
- Relational database systems: research prototypes and products; Distributed database systems: research prototypes and some operational systems; Object data systems; Inference problem and deductive database systems; Transactions
 Recent developments in Secure Data Management (1996 – Present)
- Secure data warehousing; Role-based access control (RBAC); E-commerce; XML security and Secure Semantic Web; Data mining for intrusion detection and national security; Privacy; Dependable data management; Secure knowledge management and collaboration
Developments in Data and Applications
Security: Multilevel Secure Databases - I
 Air Force Summer Study in 1982
 Early systems based on Integrity Lock approach
 Systems in the mid to late 1980s, early 90s
- E.g., Seaview by SRI, Lock Data Views by Honeywell, ASD and
ASD Views by TRW
- Prototypes and commercial products
- Trusted Database Interpretation and Evaluation of Commercial
Products
 Secure Distributed Databases (late 80s to mid 90s)
- Architectures; Algorithms and Prototype for distributed query
processing; Simulation of distributed transaction management
and concurrency control algorithms; Secure federated data
management
Developments in Data and Applications
Security: Multilevel Secure Databases - II
 Inference Problem (mid 80s to mid 90s)
- Unsolvability of the inference problem; Security constraint
processing during query, update and database design
operations; Semantic models and conceptual structures
 Secure Object Databases and Systems (late 80s to mid 90s)
- Secure object models; Distributed object systems security;
Object modeling for designing secure applications; Secure
multimedia data management
 Secure Transactions (1990s)
- Single Level/ Multilevel Transactions; Secure recovery and
commit protocols
Directions in Data and Applications Security - I
 Secure semantic web
- Security models
 Secure Information Integration
- How do you securely integrate numerous and heterogeneous data sources on the web and otherwise?
 Secure Sensor Information Management
- Fusing and managing data/information from distributed
and autonomous sensors
 Secure Dependable Information Management
- Integrating Security, Real-time Processing and Fault
Tolerance
 Data Sharing vs. Privacy
- Federated database architectures?
Directions in Data and Applications Security - II
 Data mining and knowledge discovery for intrusion detection
- Need realistic models; real-time data mining
 Secure knowledge management
- Protect the assets and intellectual rights of an organization
 Information assurance, Infrastructure protection, Access
Control
- Insider cyber-threat analysis, Protecting national databases,
Role-based access control for emerging applications
 Security for emerging applications
- Geospatial, Biomedical, E-Commerce, etc.
 Other Areas
- Trust and Economics, Trust Management/Negotiation, Secure Peer-to-peer computing
Data Integrity and Quality
 Data Integrity maintains the accuracy of the data
- E.g., When multiple transactions access the data, the action of
one transaction cannot invalidate that of another
- Solutions: Locking mechanism
- Integrity also includes preventing unauthorized modifications
to the data
 Data quality provides some measure for determining the accuracy
of the data
- Is the data current? Can we trust the source?
- Tools for data cleansing and handling incompleteness
- Data quality parameters can be passed from source to source
 E.g., Trust A 50% and Trust B 30%
 Data quality can be specified as part of the annotation to the data (see the sketch below)
- Develop an annotation management system
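Below is a minimal sketch, in Python, of how a trust-valued quality annotation might be passed from source to source as described above. The class, the field names, and the multiplicative combination rule are assumptions made for illustration, not part of the original design.

```python
# Illustrative sketch: propagating a trust-valued quality annotation along a
# chain of sources. The multiplicative combination rule is an assumption for
# illustration; any other combination function could be used.

from dataclasses import dataclass, field


@dataclass
class Annotation:
    source: str        # where this copy of the data came from
    trust: float       # trust in that source, between 0.0 and 1.0
    history: list = field(default_factory=list)   # earlier sources in the chain


def propagate(annotation: Annotation, next_source: str, next_trust: float) -> Annotation:
    """Derive the annotation for data passed on to the next source."""
    return Annotation(
        source=next_source,
        trust=annotation.trust * next_trust,          # assumed combination rule
        history=annotation.history + [annotation.source],
    )


if __name__ == "__main__":
    a = Annotation(source="Source A", trust=0.5)      # Trust A 50%
    b = propagate(a, "Source B", 0.3)                 # Trust B 30%
    print(b.source, round(b.trust, 2), b.history)     # Source B 0.15 ['Source A']
```

Any other combination function (for example, taking the minimum of the trust values) could be substituted without changing the structure of the annotation.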
Data Provenance
 Keeping track of the entire history of the data
- Who created the data
- Who modified the data
- Who read the data
- Do we trust the data source?
- Do we trust the person who handled the data?
- The organizations through which the data has traveled
 Data annotations for data provenance
- What is the model?
- Design of the annotation management system
 Using data analysis techniques, unauthorized modifications and accesses can be detected and misuse detection activities carried out (see the sketch below)
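A minimal sketch of a provenance record covering the points above: who created, modified, and read the data, and the organizations through which it has traveled. The class and field names are illustrative assumptions.

```python
# Illustrative sketch of a provenance record that tracks who created, modified,
# and read a data item, and which organizations it passed through.
# Class and field names are assumptions for illustration only.

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceRecord:
    item_id: str
    events: list = field(default_factory=list)   # ordered history of the data item

    def record(self, action: str, principal: str, organization: str) -> None:
        """Append one event (e.g. 'created', 'modified', 'read') to the history."""
        self.events.append({
            "action": action,
            "principal": principal,
            "organization": organization,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })

    def organizations(self) -> list:
        """Organizations through which the data has traveled, in order."""
        seen = []
        for e in self.events:
            if e["organization"] not in seen:
                seen.append(e["organization"])
        return seen


if __name__ == "__main__":
    rec = ProvenanceRecord("patient-123")
    rec.record("created", "alice", "Agency A")
    rec.record("modified", "bob", "Agency B")
    rec.record("read", "carol", "Agency C")
    print(rec.organizations())   # ['Agency A', 'Agency B', 'Agency C']
```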
Coalition Data and Policy Sharing
[Figure: federated architecture in which the component data/policies of Agency A, Agency B, and Agency C are each exported to form the data/policy for the federation]
Need to Know to Need to Share
 Need-to-know policies during the Cold War; even if the user has access, does the user have a need to know?
 Post 9/11, the emphasis is on need to share
- User may not have access, but needs the data
 Do we give the data to the user and then analyze the consequences?
 Do we analyze the consequences and then determine the actions to take?
 Do we simply not give the data to the user?
 What are the risks involved?
CPT: Confidentiality, Privacy and Trust
 Before I as a user of Organization A send data about me to
organization B, I read the privacy policies enforced by
organization B
- If I agree to the privacy policies of organization B, then I
will send data about me to organization B
- If I do not agree with the policies of organization B, then I
can negotiate with organization B
 Even if the web site states that it will not share private information with others, do I trust the web site?
 Note: while confidentiality is enforced by the organization, privacy is determined by the user. Therefore, for confidentiality, the organization will determine whether a user can have the data; if so, the organization can further determine whether the user can be trusted
RBAC
 Access to information sources including structured and
unstructured data both within the organization and external to the
organization
 Access based on roles
 Hierarchy of roles: handling conflicts
 Controlled dissemination and sharing of the data
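As a concrete illustration of the points above, here is a minimal role-based access check with a role hierarchy, where a senior role inherits the permissions of its junior roles. The roles, permissions, and resources are assumptions made for this sketch.

```python
# Minimal sketch of role-based access control (RBAC) with a role hierarchy.
# Role names, permissions, and the inheritance rule are illustrative assumptions.

ROLE_PARENTS = {
    "analyst": ["employee"],        # analyst inherits the permissions of employee
    "manager": ["analyst"],         # manager inherits analyst (and hence employee)
    "employee": [],
}

ROLE_PERMISSIONS = {
    "employee": {("read", "public-report")},
    "analyst": {("read", "sensor-feed")},
    "manager": {("share", "sensor-feed")},
}


def effective_permissions(role: str) -> set:
    """All permissions of a role, including those inherited from junior roles."""
    perms = set(ROLE_PERMISSIONS.get(role, set()))
    for parent in ROLE_PARENTS.get(role, []):
        perms |= effective_permissions(parent)
    return perms


def check_access(role: str, action: str, resource: str) -> bool:
    return (action, resource) in effective_permissions(role)


if __name__ == "__main__":
    print(check_access("manager", "read", "sensor-feed"))    # True (inherited)
    print(check_access("employee", "share", "sensor-feed"))  # False
```

Conflict handling in the hierarchy (for example, separation-of-duty constraints) would sit on top of this basic inheritance rule.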
UCON
 The RBAC model is incorporated into UCON and is useful for various applications
- Authorization component
 Obligations
- Obligations are actions required to be performed before an access is permitted
- Obligations can be used to determine whether an expensive knowledge search is required
 Attribute Mutability
- Used to control the scope of the knowledge search
 Conditions
- Can be used for resource usage policies to be relaxed or tightened
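A minimal sketch of a UCON-style usage decision combining the components above: an authorization check (the RBAC part), an obligation that must be performed before access, and a condition on the environment. Attribute mutability is only noted in a comment. All names and thresholds are assumptions for illustration.

```python
# Illustrative UCON-style usage decision: authorization + obligation + condition.
# Attribute mutability (updating subject attributes as a side effect of usage,
# e.g. to limit the scope of a knowledge search) is omitted from this sketch.
# All names (roles, the accepted_terms obligation, the load condition) are
# assumptions made for illustration.

def authorized(subject: dict, obj: dict) -> bool:
    # Authorization component: a simple role check (RBAC incorporated into UCON)
    return obj["required_role"] in subject["roles"]


def obligation_fulfilled(subject: dict) -> bool:
    # Obligation: an action the subject must perform before access is permitted,
    # e.g. accepting the usage terms for this data set
    return subject.get("accepted_terms", False)


def condition_holds(environment: dict) -> bool:
    # Condition: environmental state under which the policy is relaxed or tightened,
    # e.g. only allow an expensive knowledge search when system load is low
    return environment.get("system_load", 1.0) < 0.8


def usage_decision(subject: dict, obj: dict, environment: dict) -> bool:
    return (authorized(subject, obj)
            and obligation_fulfilled(subject)
            and condition_holds(environment))


if __name__ == "__main__":
    subject = {"roles": {"analyst"}, "accepted_terms": True}
    obj = {"required_role": "analyst"}
    env = {"system_load": 0.4}
    print(usage_decision(subject, obj, env))   # True
```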
Dissemination Policies
 Release policies will determine to whom to release the data
- What is the connection to access control?
- Is access control sufficient?
- Once the data is retrieved from the information source (e.g., a database), should it be released to the user?
 Once the data is released, dissemination policies will determine to whom the data can be given
- Electronic music, etc.
Risk Based Data Sharing/Access Control
 What are the risks involved in releasing/disseminating the data?
 Risk modeling should be integrated with the access control model
 Simple method: assign risk values (see the sketch below)
 The higher the risk, the lower the sharing
 What is the cost of releasing the data?
 Cost/Risk/Security are closely related
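The sketch below illustrates the simple method above: each data item is assigned a risk value, each recipient role a risk tolerance, and the data is shared only when the risk does not exceed the tolerance. The roles, tolerances, and values are assumptions.

```python
# Illustrative risk-based release decision: each item carries a risk value,
# each recipient a risk tolerance; the higher the risk, the lower the sharing.
# All thresholds and values below are assumptions for this sketch.

RISK_TOLERANCE = {
    "coalition-partner": 0.3,   # external partner: share only low-risk data
    "analyst": 0.6,
    "security-officer": 0.9,
}


def may_release(item_risk: float, recipient_role: str) -> bool:
    """Release only if the item's risk does not exceed the recipient's tolerance."""
    return item_risk <= RISK_TOLERANCE.get(recipient_role, 0.0)


if __name__ == "__main__":
    print(may_release(0.2, "coalition-partner"))   # True
    print(may_release(0.7, "analyst"))             # False
    print(may_release(0.7, "security-officer"))    # True
```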
Trust Management
 Trust Services
- Identity services, authorization services, reputation services
 Trust negotiation (TN)
- Digital credentials, Disclosure policies
 TN Requirements
- Language requirements
 Semantics, constraints, policies
- System requirements
 Credential ownership, validity, alternative negotiation strategies, privacy
 Example TN systems
- KeyNote and Trust-X (U of Milan), TrustBuilder (UIUC)
Credentials and Disclosure
 Credentials can be expressed through the Security Assertion Markup Language (SAML)
 SAML allows a party to express security statements about a given subject
- Authentication statements
- Attribute statements
- Authorization decision statements
 Disclosure policies govern:
- Access to protected resources
- Access to sensitive information
- Disclosure of sensitive credentials
 Disclosure policies express trust requirements by means of credential combinations that must be disclosed to obtain authorization
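A minimal sketch of the last point: a disclosure policy written as alternative credential combinations, with access granted when at least one combination has been fully disclosed. The credential and resource names are assumptions for illustration.

```python
# Illustrative disclosure policy: trust requirements expressed as credential
# combinations; the resource is released if at least one combination has been
# fully disclosed. Credential and resource names are assumptions for this sketch.

DISCLOSURE_POLICY = {
    "patient-record": [
        {"hospital-employee-id", "physician-license"},    # combination 1
        {"insurance-auditor-cert", "patient-consent"},     # combination 2
    ],
}


def may_disclose(resource: str, disclosed_credentials: set) -> bool:
    """Grant access if any required credential combination is satisfied."""
    combinations = DISCLOSURE_POLICY.get(resource, [])
    return any(combo <= disclosed_credentials for combo in combinations)


if __name__ == "__main__":
    print(may_disclose("patient-record", {"hospital-employee-id", "physician-license"}))  # True
    print(may_disclose("patient-record", {"patient-consent"}))                            # False
```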
What is Privacy
 Medical community
- Privacy is about a patient determining what patient/medical information the doctor should release about him/her
 Financial community
- A bank customer determines what financial information the bank should release about him/her
 Government community
- The FBI collects information about US citizens; however, the FBI determines what information about a US citizen it can release to, say, the CIA
Data Mining as a Threat to Privacy
 Data mining gives us “facts” that are not obvious to human analysts
of the data
 Can general trends across individuals be determined without
revealing information about individuals?
 Possible threats:
- Combining collections of data and inferring information that is private
 Disease information from prescription data
 Military action inferred from pizza deliveries to the Pentagon
 Need to protect the associations and correlations between the data that are sensitive or private
Some Privacy Problems and Potential Solutions
 Problem: Privacy violations that result from data mining
- Potential solution: Privacy-preserving data mining
 Problem: Privacy violations that result from the inference problem
- Inference is the process of deducing sensitive information from the legitimate responses received to user queries
- Potential solution: Privacy Constraint Processing
 Problem: Privacy violations due to un-encrypted data
- Potential solution: Encryption at different levels
 Problem: Privacy violations due to poor system design
- Potential solution: Develop a methodology for designing privacy-enhanced systems
Privacy Constraint Processing
 Privacy constraint processing
- Based on prior research in security constraint processing
- Simple Constraint: an attribute of a document is private
- Content-based Constraint: if a document contains information about X, then it is private
- Association-based Constraint: two or more documents taken together are private; individually each document is public
- Release Constraint: after X is released, Y becomes private
 Augment a database system with a privacy controller for constraint
processing
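A minimal sketch of a privacy controller applying the four constraint types above when deciding whether to release a document. The document identifiers, keywords, and matching logic are assumptions made for illustration.

```python
# Illustrative privacy-constraint processing during release: the controller
# checks simple, content-based, association-based, and release constraints.
# Document names, keywords, and the matching logic are assumptions for this sketch.

released_so_far = set()   # documents (or attributes) already released

SIMPLE_PRIVATE = {"salary"}                       # simple constraint: item is private outright
CONTENT_KEYWORDS = {"cancer"}                     # content-based: private if it mentions X
ASSOCIATIONS = [{"itinerary", "passenger-list"}]  # association-based: private taken together
RELEASE_RULES = {"doc-X": {"doc-Y"}}              # release constraint: after doc-X, doc-Y is private


def may_release(doc_id: str, content: str) -> bool:
    if doc_id in SIMPLE_PRIVATE:
        return False
    if any(word in content.lower() for word in CONTENT_KEYWORDS):
        return False
    for group in ASSOCIATIONS:                    # releasing doc_id must not complete a group
        if doc_id in group and group <= released_so_far | {doc_id}:
            return False
    for released, now_private in RELEASE_RULES.items():
        if released in released_so_far and doc_id in now_private:
            return False
    released_so_far.add(doc_id)
    return True


if __name__ == "__main__":
    print(may_release("itinerary", "flight details"))             # True
    print(may_release("passenger-list", "names of travellers"))   # False (association)
```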
Architecture for Privacy
Constraint Processing
[Figure: architecture in which a User Interface Manager passes requests to a Constraint Manager that holds the privacy constraints; a Query Processor applies constraints during query and release operations, an Update Processor applies constraints during update operations, and a Database Design Tool applies constraints during database design; all operate against the DBMS and the underlying Database]
Semantic Model for Privacy Control
[Figure: semantic model for Patient John, with nodes for Cancer, Influenza, John's address, and England, linked by "has disease", "address", and "travels frequently" relationships; dark lines/boxes contain the private information]
Privacy Preserving Data Mining
 Prevent useful results from mining
- Introduce “cover stories” to give “false” results
- Only make a sample of data available so that an adversary is
unable to come up with useful rules and predictive functions
 Randomization
- Introduce random values into the data and/or results
- Challenge is to introduce random values without significantly
affecting the data mining results
- Give range of values for results instead of exact values
 Secure Multi-party Computation
- Each party knows its own inputs; encryption techniques used to
compute final results
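A minimal sketch of the randomization approach above: random noise is added to individual values so that each record is perturbed, while aggregate results used by the miner remain approximately correct. The noise scale is an assumed parameter.

```python
# Illustrative randomization for privacy-preserving data mining: add random
# noise to individual values so that aggregates (used by the miner) remain
# approximately correct. The noise scale (sigma) is an assumption for this sketch.

import random


def randomize(values, sigma=5.0):
    """Return a perturbed copy of the data; individual values are hidden."""
    return [v + random.gauss(0.0, sigma) for v in values]


if __name__ == "__main__":
    ages = [23, 35, 41, 52, 29, 60, 44, 38]          # original (private) values
    noisy = randomize(ages)
    true_mean = sum(ages) / len(ages)
    noisy_mean = sum(noisy) / len(noisy)
    # The challenge noted on the slide: the perturbation should not significantly
    # affect the mining result (here, the mean age).
    print(round(true_mean, 1), round(noisy_mean, 1))
```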
Platform for Privacy Preferences (P3P):
What is it?
 P3P is an emerging industry standard that enables web sites
to express their privacy practices in a standard format
 The format of the policies can be automatically retrieved and
understood by user agents
 It is a product of the W3C (World Wide Web Consortium), www.w3c.org
 When a user enters a web site, the privacy policies of the web site are conveyed to the user; if the privacy policies are different from the user's preferences, the user is notified; the user can then decide how to proceed
 Several major corporations are working on P3P standards
including
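A minimal sketch of the user-agent behaviour described above: the site's declared privacy policy is compared with the user's preferences and the user is notified of any mismatches. The policy fields and values are assumptions for illustration, not the actual P3P vocabulary.

```python
# Illustrative P3P-style check: compare a site's declared privacy policy with
# the user's preferences and report mismatches. The field names and values are
# assumptions for this sketch, not the real P3P vocabulary.

SITE_POLICY = {
    "purpose": {"service-delivery", "marketing"},
    "retention": "indefinite",
    "shared-with-third-parties": True,
}

USER_PREFERENCES = {
    "purpose": {"service-delivery"},      # only accept use for service delivery
    "retention": "limited",
    "shared-with-third-parties": False,
}


def mismatches(site: dict, prefs: dict) -> list:
    problems = []
    if not site["purpose"] <= prefs["purpose"]:
        problems.append("site uses data for purposes the user did not accept")
    if site["retention"] != prefs["retention"]:
        problems.append("retention period differs from the user's preference")
    if site["shared-with-third-parties"] and not prefs["shared-with-third-parties"]:
        problems.append("site shares data with third parties")
    return problems


if __name__ == "__main__":
    for problem in mismatches(SITE_POLICY, USER_PREFERENCES):
        print("Notify user:", problem)    # the user can then decide how to proceed
```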
Data Mining and Privacy: Friends or Foes?
 They are neither friends nor foes
 Need advances in both data mining and privacy
 Need to design flexible systems
- For some applications one may have to focus entirely on
“pure” data mining while for some others there may be a
need for “privacy-preserving” data mining
- Need flexible data mining techniques that can adapt to the
changing environments
 Technologists, legal specialists, social scientists, policy
makers and privacy advocates MUST work together
WS-* Security Standards Framework
[Figure: layered framework of WS-* security standards]
- Security management: XKMS
- Identity management: WS-Trust, WS-Federation, Liberty, SAML
- Message security: WS-Security, WS-SecureConversation
- Policy and access control: WS-Policy, XACML, SAML
- Reliable messaging: WS-ReliableMessaging
- SOAP foundation
- XML security: XML Encryption, XML Signature
- Transport level security: SSL/TLS
- Network level security: IPSec
Inference/Privacy Control with Semantic Web
Technologies
[Figure: layered architecture of the technology by UTD: an interface to the Semantic Web sits on top of an inference engine/rules processor driven by policies, ontologies, and rules; the engine operates over an RDF database populated from RDF documents, web pages, and databases]
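As a small illustration of the architecture above, the sketch below uses the rdflib library to load RDF triples and apply one privacy rule over them. The vocabulary, the example triples, and the rule itself are assumptions made for illustration.

```python
# Illustrative sketch of rule-based inference over RDF data, in the spirit of
# the architecture above (an inference engine/rules processor over an RDF database).
# The vocabulary, the example triples, and the single rule are assumptions.

from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")

TURTLE = """
@prefix ex: <http://example.org/> .
ex:John ex:hasDisease ex:Influenza .
ex:John ex:livesAt ex:Address1 .
"""

g = Graph()
g.parse(data=TURTLE, format="turtle")

# An assumed privacy rule: any disease of a patient, together with that
# patient's address, is treated as private.
private = set()
for patient, _, disease in g.triples((None, EX.hasDisease, None)):
    private.add(disease)
    for _, _, address in g.triples((patient, EX.livesAt, None)):
        private.add(address)

print(sorted(str(x) for x in private))
# ['http://example.org/Address1', 'http://example.org/Influenza']
```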
Emerging Directions
 Digital Identity Management
 Identity Theft Management
 Digital Forensics
 Digital Watermarking
 Risk Analysis
 Economic Analysis
 Secure Electronic Voting Machines
 Biometrics
 Social network security