Data Security and Integrity
Developments and Directions
Dr. Bhavani Thuraisingham
The University of Texas at Dallas
June 2009
Outline
 Data Security and Integrity
- Multilevel Data Management, Data and Applications Security, Data Integrity and Provenance
 Policy Management
- Confidentiality, Privacy, Trust
 Privacy and Data Mining
 Secure Web Services and Semantic Web
 Emerging Directions
Developments in Data and Applications
Security: 1975 - Present
 Access Control for System R and Ingres (mid 1970s)
 Multilevel secure database systems (1980 – present)
- Relational database systems: research prototypes and products; Distributed database systems: research prototypes and some operational systems; Object data systems; Inference problem and deductive database systems; Transactions
 Recent developments in Secure Data Management (1996 – Present)
- Secure data warehousing; Role-based access control (RBAC); E-commerce; XML security and Secure Semantic Web; Data mining for intrusion detection and national security; Privacy; Dependable data management; Secure knowledge management and collaboration
Developments in Data and Applications
Security: Multilevel Secure Databases - I
 Air Force Summer Study in 1982
 Early systems based on Integrity Lock approach
 Systems in the mid to late 1980s, early 90s
- E.g., Seaview by SRI, Lock Data Views by Honeywell, ASD and
ASD Views by TRW
- Prototypes and commercial products
- Trusted Database Interpretation and Evaluation of Commercial
Products
 Secure Distributed Databases (late 80s to mid 90s)
- Architectures; Algorithms and Prototype for distributed query
processing; Simulation of distributed transaction management
and concurrency control algorithms; Secure federated data
management
Developments in Data and Applications
Security: Multilevel Secure Databases - II
 Inference Problem (mid 80s to mid 90s)
- Unsolvability of the inference problem; Security constraint
processing during query, update and database design
operations; Semantic models and conceptual structures
 Secure Object Databases and Systems (late 80s to mid 90s)
- Secure object models; Distributed object systems security;
Object modeling for designing secure applications; Secure
multimedia data management
 Secure Transactions (1990s)
- Single Level/ Multilevel Transactions; Secure recovery and
commit protocols
Directions in Data and Applications Security - I
 Secure semantic web
- Security models
 Secure Information Integration
- How do you securely integrate numerous and heterogeneous data sources on the web and otherwise?
 Secure Sensor Information Management
- Fusing and managing data/information from distributed
and autonomous sensors
 Secure Dependable Information Management
- Integrating Security, Real-time Processing and Fault
Tolerance
 Data Sharing vs. Privacy
- Federated database architectures?
Directions in Data and Applications Security - II
 Data mining and knowledge discovery for intrusion detection
- Need realistic models; real-time data mining
 Secure knowledge management
- Protect the assets and intellectual rights of an organization
 Information assurance, Infrastructure protection, Access
Control
- Insider cyber-threat analysis, Protecting national databases,
Role-based access control for emerging applications
 Security for emerging applications
- Geospatial, Biomedical, E-Commerce, etc.
 Other Areas
- Trust and Economics, Trust Management/Negotiation, Secure Peer-to-peer computing
Data Integrity and Quality
 Data Integrity maintains the accuracy of the data
- E.g., When multiple transactions access the data, the action of
one transaction cannot invalidate that of another
- Solutions: Locking mechanism
- Integrity also includes preventing unauthorized modifications
to the data
 Data quality provides some measure for determining the accuracy
of the data
- Is the data current? Can we trust the source?
- Tools for data cleansing and handling incompleteness
- Data quality parameters can be passed from source to source
 E.g., Trust A 50% and Trust B 30%
 Data quality can be specified as part of the annotation to the data (see the sketch below)
- Develop an annotation management system
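Below is a minimal sketch, in Python, of how a trust-valued quality annotation might be passed from source to source as described above. The class, the field names, and the multiplicative combination rule are assumptions made for illustration, not part of the original design.

```python
# Illustrative sketch: propagating a trust-valued quality annotation along a
# chain of sources. The multiplicative combination rule is an assumption for
# illustration; any other combination function could be used.

from dataclasses import dataclass, field


@dataclass
class Annotation:
    source: str        # where this copy of the data came from
    trust: float       # trust in that source, between 0.0 and 1.0
    history: list = field(default_factory=list)   # earlier sources in the chain


def propagate(annotation: Annotation, next_source: str, next_trust: float) -> Annotation:
    """Derive the annotation for data passed on to the next source."""
    return Annotation(
        source=next_source,
        trust=annotation.trust * next_trust,          # assumed combination rule
        history=annotation.history + [annotation.source],
    )


if __name__ == "__main__":
    a = Annotation(source="Source A", trust=0.5)      # Trust A 50%
    b = propagate(a, "Source B", 0.3)                 # Trust B 30%
    print(b.source, round(b.trust, 2), b.history)     # Source B 0.15 ['Source A']
```

Any other combination function (for example, taking the minimum of the trust values) could be substituted without changing the structure of the annotation.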
Data Provenance
 Keeping track of the entire history of the data
- Who created the data
- Who modified the data
- Who read the data
- Do we trust the data source?
- Do we trust the person who handled the data?
- The organizations through which the data has traveled
 Data annotations for data provenance
- What is the model?
- Design of the annotation management system
 Using data analysis techniques, unauthorized modifications and accesses can be detected and misuse detection activities carried out (see the sketch below)
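A minimal sketch of a provenance record covering the points above: who created, modified, and read the data, and the organizations through which it has traveled. The class and field names are illustrative assumptions.

```python
# Illustrative sketch of a provenance record that tracks who created, modified,
# and read a data item, and which organizations it passed through.
# Class and field names are assumptions for illustration only.

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceRecord:
    item_id: str
    events: list = field(default_factory=list)   # ordered history of the data item

    def record(self, action: str, principal: str, organization: str) -> None:
        """Append one event (e.g. 'created', 'modified', 'read') to the history."""
        self.events.append({
            "action": action,
            "principal": principal,
            "organization": organization,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })

    def organizations(self) -> list:
        """Organizations through which the data has traveled, in order."""
        seen = []
        for e in self.events:
            if e["organization"] not in seen:
                seen.append(e["organization"])
        return seen


if __name__ == "__main__":
    rec = ProvenanceRecord("patient-123")
    rec.record("created", "alice", "Agency A")
    rec.record("modified", "bob", "Agency B")
    rec.record("read", "carol", "Agency C")
    print(rec.organizations())   # ['Agency A', 'Agency B', 'Agency C']
```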
Coalition Data and Policy Sharing
[Figure: federated architecture in which the component data/policies of Agency A, Agency B, and Agency C are each exported to form the data/policy for the federation]
Need to Know to Need to Share
 Need-to-know policies during the Cold War; even if the user has access, does the user have a need to know?
 Post 9/11, the emphasis is on need to share
- User may not have access, but needs the data
 Do we give the data to the user and then analyze the consequences?
 Do we analyze the consequences and then determine the actions to take?
 Do we simply not give the data to the user?
 What are the risks involved?
CPT: Confidentiality, Privacy and Trust
 Before I as a user of Organization A send data about me to
organization B, I read the privacy policies enforced by
organization B
- If I agree to the privacy policies of organization B, then I
will send data about me to organization B
- If I do not agree with the policies of organization B, then I
can negotiate with organization B
 Even if the web site states that it will not share private information with others, do I trust the web site?
 Note: while confidentiality is enforced by the organization, privacy is determined by the user. Therefore, for confidentiality, the organization will determine whether a user can have the data; if so, the organization can further determine whether the user can be trusted
RBAC
 Access to information sources including structured and
unstructured data both within the organization and external to the
organization
 Access based on roles
 Hierarchy of roles: handling conflicts
 Controlled dissemination and sharing of the data
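As a concrete illustration of the points above, here is a minimal role-based access check with a role hierarchy, where a senior role inherits the permissions of its junior roles. The roles, permissions, and resources are assumptions made for this sketch.

```python
# Minimal sketch of role-based access control (RBAC) with a role hierarchy.
# Role names, permissions, and the inheritance rule are illustrative assumptions.

ROLE_PARENTS = {
    "analyst": ["employee"],        # analyst inherits the permissions of employee
    "manager": ["analyst"],         # manager inherits analyst (and hence employee)
    "employee": [],
}

ROLE_PERMISSIONS = {
    "employee": {("read", "public-report")},
    "analyst": {("read", "sensor-feed")},
    "manager": {("share", "sensor-feed")},
}


def effective_permissions(role: str) -> set:
    """All permissions of a role, including those inherited from junior roles."""
    perms = set(ROLE_PERMISSIONS.get(role, set()))
    for parent in ROLE_PARENTS.get(role, []):
        perms |= effective_permissions(parent)
    return perms


def check_access(role: str, action: str, resource: str) -> bool:
    return (action, resource) in effective_permissions(role)


if __name__ == "__main__":
    print(check_access("manager", "read", "sensor-feed"))    # True (inherited)
    print(check_access("employee", "share", "sensor-feed"))  # False
```

Conflict handling in the hierarchy (for example, separation-of-duty constraints) would sit on top of this basic inheritance rule.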
UCON
 The RBAC model is incorporated into UCON and is useful for various applications
- Authorization component
 Obligations
- Obligations are actions required to be performed before an access is permitted
- Obligations can be used to determine whether an expensive knowledge search is required
 Attribute Mutability
- Used to control the scope of the knowledge search
 Conditions
- Can be used for resource usage policies to be relaxed or tightened
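A minimal sketch of a UCON-style usage decision combining the components above: an authorization check (the RBAC part), an obligation that must be performed before access, and a condition on the environment. Attribute mutability is only noted in a comment. All names and thresholds are assumptions for illustration.

```python
# Illustrative UCON-style usage decision: authorization + obligation + condition.
# Attribute mutability (updating subject attributes as a side effect of usage,
# e.g. to limit the scope of a knowledge search) is omitted from this sketch.
# All names (roles, the accepted_terms obligation, the load condition) are
# assumptions made for illustration.

def authorized(subject: dict, obj: dict) -> bool:
    # Authorization component: a simple role check (RBAC incorporated into UCON)
    return obj["required_role"] in subject["roles"]


def obligation_fulfilled(subject: dict) -> bool:
    # Obligation: an action the subject must perform before access is permitted,
    # e.g. accepting the usage terms for this data set
    return subject.get("accepted_terms", False)


def condition_holds(environment: dict) -> bool:
    # Condition: environmental state under which the policy is relaxed or tightened,
    # e.g. only allow an expensive knowledge search when system load is low
    return environment.get("system_load", 1.0) < 0.8


def usage_decision(subject: dict, obj: dict, environment: dict) -> bool:
    return (authorized(subject, obj)
            and obligation_fulfilled(subject)
            and condition_holds(environment))


if __name__ == "__main__":
    subject = {"roles": {"analyst"}, "accepted_terms": True}
    obj = {"required_role": "analyst"}
    env = {"system_load": 0.4}
    print(usage_decision(subject, obj, env))   # True
```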
Dissemination Policies
 Release policies will determine to whom to release the data
- What is the connection to access control?
- Is access control sufficient?
- Once the data is retrieved from the information source (e.g., a database), should it be released to the user?
 Once the data is released, dissemination policies will determine to whom the data can be given
- Electronic music, etc.
Risk Based Data Sharing/Access Control
 What are the risks involved in releasing/disseminating the data?
 Risk modeling should be integrated with the access control model
 Simple method: assign risk values (see the sketch below)
 The higher the risk, the lower the sharing
 What is the cost of releasing the data?
 Cost/Risk/Security are closely related
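The sketch below illustrates the simple method above: each data item is assigned a risk value, each recipient role a risk tolerance, and the data is shared only when the risk does not exceed the tolerance. The roles, tolerances, and values are assumptions.

```python
# Illustrative risk-based release decision: each item carries a risk value,
# each recipient a risk tolerance; the higher the risk, the lower the sharing.
# All thresholds and values below are assumptions for this sketch.

RISK_TOLERANCE = {
    "coalition-partner": 0.3,   # external partner: share only low-risk data
    "analyst": 0.6,
    "security-officer": 0.9,
}


def may_release(item_risk: float, recipient_role: str) -> bool:
    """Release only if the item's risk does not exceed the recipient's tolerance."""
    return item_risk <= RISK_TOLERANCE.get(recipient_role, 0.0)


if __name__ == "__main__":
    print(may_release(0.2, "coalition-partner"))   # True
    print(may_release(0.7, "analyst"))             # False
    print(may_release(0.7, "security-officer"))    # True
```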
Trust Management
 Trust Services
- Identity services, authorization services, reputation services
 Trust negotiation (TN)
- Digital credentials, Disclosure policies
 TN Requirements
- Language requirements
 Semantics, constraints, policies
- System requirements
 Credential ownership, validity, alternative negotiation strategies, privacy
 Example TN systems
- KeyNote and Trust-X (U of Milan), TrustBuilder (UIUC)
Credentials and Disclosure
 Credentials can be expressed through the Security Assertion Markup Language (SAML)
 SAML allows a party to express security statements about a given subject
- Authentication statements
- Attribute statements
- Authorization decision statements
 Disclosure policies govern:
- Access to protected resources
- Access to sensitive information
- Disclosure of sensitive credentials
 Disclosure policies express trust requirements by means of credential combinations that must be disclosed to obtain authorization
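A minimal sketch of the last point: a disclosure policy written as alternative credential combinations, with access granted when at least one combination has been fully disclosed. The credential and resource names are assumptions for illustration.

```python
# Illustrative disclosure policy: trust requirements expressed as credential
# combinations; the resource is released if at least one combination has been
# fully disclosed. Credential and resource names are assumptions for this sketch.

DISCLOSURE_POLICY = {
    "patient-record": [
        {"hospital-employee-id", "physician-license"},    # combination 1
        {"insurance-auditor-cert", "patient-consent"},     # combination 2
    ],
}


def may_disclose(resource: str, disclosed_credentials: set) -> bool:
    """Grant access if any required credential combination is satisfied."""
    combinations = DISCLOSURE_POLICY.get(resource, [])
    return any(combo <= disclosed_credentials for combo in combinations)


if __name__ == "__main__":
    print(may_disclose("patient-record", {"hospital-employee-id", "physician-license"}))  # True
    print(may_disclose("patient-record", {"patient-consent"}))                            # False
```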
What is Privacy
 Medical community
- Privacy is about a patient determining what patient/medical information the doctor should release about him/her
 Financial community
- A bank customer determines what financial information the bank should release about him/her
 Government community
- The FBI collects information about US citizens; however, the FBI determines what information about a US citizen it can release to, say, the CIA
Data Mining as a Threat to Privacy
 Data mining gives us “facts” that are not obvious to human analysts
of the data
 Can general trends across individuals be determined without
revealing information about individuals?
 Possible threats:
- Combining collections of data and inferring information that is private
 Disease information from prescription data
 Military action inferred from pizza deliveries to the Pentagon
 Need to protect the associations and correlations between the data that are sensitive or private
Some Privacy Problems and Potential Solutions
 Problem: Privacy violations that result from data mining
- Potential solution: Privacy-preserving data mining
 Problem: Privacy violations that result from the inference problem
- Inference is the process of deducing sensitive information from the legitimate responses received to user queries
- Potential solution: Privacy Constraint Processing
 Problem: Privacy violations due to un-encrypted data
- Potential solution: Encryption at different levels
 Problem: Privacy violations due to poor system design
- Potential solution: Develop a methodology for designing privacy-enhanced systems
Privacy Constraint Processing
 Privacy constraint processing
- Based on prior research in security constraint processing
- Simple Constraint: an attribute of a document is private
- Content-based Constraint: if a document contains information about X, then it is private
- Association-based Constraint: two or more documents taken together are private; individually each document is public
- Release Constraint: after X is released, Y becomes private
 Augment a database system with a privacy controller for constraint
processing
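A minimal sketch of a privacy controller applying the four constraint types above when deciding whether to release a document. The document identifiers, keywords, and matching logic are assumptions made for illustration.

```python
# Illustrative privacy-constraint processing during release: the controller
# checks simple, content-based, association-based, and release constraints.
# Document names, keywords, and the matching logic are assumptions for this sketch.

released_so_far = set()   # documents (or attributes) already released

SIMPLE_PRIVATE = {"salary"}                       # simple constraint: item is private outright
CONTENT_KEYWORDS = {"cancer"}                     # content-based: private if it mentions X
ASSOCIATIONS = [{"itinerary", "passenger-list"}]  # association-based: private taken together
RELEASE_RULES = {"doc-X": {"doc-Y"}}              # release constraint: after doc-X, doc-Y is private


def may_release(doc_id: str, content: str) -> bool:
    if doc_id in SIMPLE_PRIVATE:
        return False
    if any(word in content.lower() for word in CONTENT_KEYWORDS):
        return False
    for group in ASSOCIATIONS:                    # releasing doc_id must not complete a group
        if doc_id in group and group <= released_so_far | {doc_id}:
            return False
    for released, now_private in RELEASE_RULES.items():
        if released in released_so_far and doc_id in now_private:
            return False
    released_so_far.add(doc_id)
    return True


if __name__ == "__main__":
    print(may_release("itinerary", "flight details"))             # True
    print(may_release("passenger-list", "names of travellers"))   # False (association)
```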
Architecture for Privacy
Constraint Processing
[Figure: architecture in which a User Interface Manager passes requests to a Constraint Manager that holds the privacy constraints; a Query Processor applies constraints during query and release operations, an Update Processor applies constraints during update operations, and a Database Design Tool applies constraints during database design; all operate against the DBMS and the underlying Database]
Semantic Model for Privacy Control
[Figure: semantic model for Patient John, with nodes for Cancer, Influenza, John's address, and England, linked by "has disease", "address", and "travels frequently" relationships; dark lines/boxes contain the private information]
Privacy Preserving Data Mining
 Prevent useful results from mining
- Introduce “cover stories” to give “false” results
- Only make a sample of data available so that an adversary is
unable to come up with useful rules and predictive functions
 Randomization
- Introduce random values into the data and/or results
- Challenge is to introduce random values without significantly
affecting the data mining results
- Give range of values for results instead of exact values
 Secure Multi-party Computation
- Each party knows its own inputs; encryption techniques used to
compute final results
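A minimal sketch of the randomization approach above: random noise is added to individual values so that each record is perturbed, while aggregate results used by the miner remain approximately correct. The noise scale is an assumed parameter.

```python
# Illustrative randomization for privacy-preserving data mining: add random
# noise to individual values so that aggregates (used by the miner) remain
# approximately correct. The noise scale (sigma) is an assumption for this sketch.

import random


def randomize(values, sigma=5.0):
    """Return a perturbed copy of the data; individual values are hidden."""
    return [v + random.gauss(0.0, sigma) for v in values]


if __name__ == "__main__":
    ages = [23, 35, 41, 52, 29, 60, 44, 38]          # original (private) values
    noisy = randomize(ages)
    true_mean = sum(ages) / len(ages)
    noisy_mean = sum(noisy) / len(noisy)
    # The challenge noted on the slide: the perturbation should not significantly
    # affect the mining result (here, the mean age).
    print(round(true_mean, 1), round(noisy_mean, 1))
```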
Platform for Privacy Preferences (P3P):
What is it?
 P3P is an emerging industry standard that enables web sites
to express their privacy practices in a standard format
 The format of the policies can be automatically retrieved and
understood by user agents
 It is a product of the W3C (World Wide Web Consortium), www.w3c.org
 When a user enters a web site, the privacy policies of the web site are conveyed to the user; if the privacy policies are different from the user's preferences, the user is notified; the user can then decide how to proceed
 Several major corporations are working on P3P standards
including
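A minimal sketch of the user-agent behaviour described above: the site's declared privacy policy is compared with the user's preferences and the user is notified of any mismatches. The policy fields and values are assumptions for illustration, not the actual P3P vocabulary.

```python
# Illustrative P3P-style check: compare a site's declared privacy policy with
# the user's preferences and report mismatches. The field names and values are
# assumptions for this sketch, not the real P3P vocabulary.

SITE_POLICY = {
    "purpose": {"service-delivery", "marketing"},
    "retention": "indefinite",
    "shared-with-third-parties": True,
}

USER_PREFERENCES = {
    "purpose": {"service-delivery"},      # only accept use for service delivery
    "retention": "limited",
    "shared-with-third-parties": False,
}


def mismatches(site: dict, prefs: dict) -> list:
    problems = []
    if not site["purpose"] <= prefs["purpose"]:
        problems.append("site uses data for purposes the user did not accept")
    if site["retention"] != prefs["retention"]:
        problems.append("retention period differs from the user's preference")
    if site["shared-with-third-parties"] and not prefs["shared-with-third-parties"]:
        problems.append("site shares data with third parties")
    return problems


if __name__ == "__main__":
    for problem in mismatches(SITE_POLICY, USER_PREFERENCES):
        print("Notify user:", problem)    # the user can then decide how to proceed
```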
Data Mining and Privacy: Friends or Foes?
 They are neither friends nor foes
 Need advances in both data mining and privacy
 Need to design flexible systems
- For some applications one may have to focus entirely on
“pure” data mining while for some others there may be a
need for “privacy-preserving” data mining
- Need flexible data mining techniques that can adapt to the
changing environments
 Technologists, legal specialists, social scientists, policy
makers and privacy advocates MUST work together
WS-* Security Standards Framework
[Figure: layered framework of WS-* security standards]
- Security management: XKMS
- Identity management: WS-Trust, WS-Federation, Liberty, SAML
- Message security: WS-Security, WS-SecureConversation
- Policy and access control: WS-Policy, XACML, SAML
- Reliable messaging: WS-ReliableMessaging
- SOAP foundation
- XML security: XML Encryption, XML Signature
- Transport level security: SSL/TLS
- Network level security: IPSec
Inference/Privacy Control with Semantic Web
Technologies
[Figure: layered architecture of the technology by UTD: an interface to the Semantic Web sits on top of an inference engine/rules processor driven by policies, ontologies, and rules; the engine operates over an RDF database populated from RDF documents, web pages, and databases]
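As a small illustration of the architecture above, the sketch below uses the rdflib library to load RDF triples and apply one privacy rule over them. The vocabulary, the example triples, and the rule itself are assumptions made for illustration.

```python
# Illustrative sketch of rule-based inference over RDF data, in the spirit of
# the architecture above (an inference engine/rules processor over an RDF database).
# The vocabulary, the example triples, and the single rule are assumptions.

from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")

TURTLE = """
@prefix ex: <http://example.org/> .
ex:John ex:hasDisease ex:Influenza .
ex:John ex:livesAt ex:Address1 .
"""

g = Graph()
g.parse(data=TURTLE, format="turtle")

# An assumed privacy rule: any disease of a patient, together with that
# patient's address, is treated as private.
private = set()
for patient, _, disease in g.triples((None, EX.hasDisease, None)):
    private.add(disease)
    for _, _, address in g.triples((patient, EX.livesAt, None)):
        private.add(address)

print(sorted(str(x) for x in private))
# ['http://example.org/Address1', 'http://example.org/Influenza']
```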
Emerging Directions
 Digital Identity Management
 Identity Theft Management
 Digital Forensics
 Digital Watermarking
 Risk Analysis
 Economic Analysis
 Secure Electronic Voting Machines
 Biometrics
 Social network security