Download slides

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Inference Problem
Privacy Preserving
Data Mining
Readings and Assignments
Required:
• Pfleeger: Chapter 7
Interesting reading:
• I. Moskowitz, M. H. Kang: Covert Channels – Here to
Stay?
http://citeseer.nj.nec.com/cache/papers/cs/1340/http:zSzzSzwww.it
d.nrl.navy.milzSzITDzSz5540zSzpublicationszSzCHACSzSz1994
zSz1994moskowitz-compass.pdf/moskowitz94covert.pdf
•
Jajodia, Meadows: Inference Problems in Multilevel
Secure Database Management Systems
http://www.acsac.org/secshelf/book001/book001.html, essay 24
Lecture 19
CSCE 522 - Farkas
2
Indirect Information Flow Channels
Covert channels
 Inference channels

Lecture 19
CSCE 522 - Farkas
3
Communication Channels


Overt Channel: designed into a system and
documented in the user's manual
Covert Channel: not documented. Covert
channels may be deliberately inserted into a
system, but most such channels are accidents
of the system design.
Lecture 19
CSCE 522 - Farkas
4
Covert Channel
Timing Channel: based on system times
 Storage channels: not time related
communication
 Can be turned into each other

Lecture 19
CSCE 522 - Farkas
5
Inference Channels
Non-sensitive
information
Lecture 19
+
Meta-data
CSCE 522 - Farkas
=
Sensitive
Information
6
Inference Channels
Statistical Database Inferences
 General Purpose Database Inferences

Lecture 19
CSCE 522 - Farkas
7
Statistical Databases

Goal: provide aggregate information about groups of
individuals
 E.g.,

Security risk: specific information about a particular
individual
 E.g.,

average grade point of students
grade point of student John Smith
Meta-data:
 Working
knowledge about the attributes
 Supplementary knowledge (not stored in database)
Lecture 19
CSCE 522 - Farkas
8
Types of Statistics


Macro-statistics: collections of related statistics presented in 2dimensional tables
Sex\Year
1997
1998
Sum
Female
4
1
5
Male
6
13
19
Sum
10
14
24
Micro-statistics: Individual data records used for statistics after
identifying information is removed
Lecture 19
Sex
Course
GPA
Year
F
CSCE 590
3.5
2000
M
CSCE 590
3.0
2000
F
CSCE 790
4.0
2001
CSCE 522 - Farkas
9
Statistical Compromise
Exact compromise: find exact value of an
attribute of an individual (e.g., John Smith’s
GPA is 3.8)
 Partial compromise: find an estimate of an
attribute value corresponding to an individual
(e.g., John Smith’s GPA is between 3.5 and
4.0)

Lecture 19
CSCE 522 - Farkas
10
Methods of Attacks and Protection

Small/Large Query Set Attack
 C:
characteristic formula that identifies groups of
individuals
If C identifies a single individual I, e.g., count(C) = 1
 Find out existence of property


If count(C and D)=1 means I has property D
If count(C and D)=0 means I does not have D
OR
 Find value of property

Lecture 19
Sum(C, D), gives value of D
CSCE 522 - Farkas
11
Small/Large Query Set Attack cont.
Protection from small/large query set attack:
query-set-size control
 A query q(C) is permitted only if
N-n  |C|  n , where n  0 is a parameter of
the database and N is all the records in the
database

Lecture 19
CSCE 522 - Farkas
12
Tracker attack
q(C) is disallowed
C=C1 and C2
T=C1 and ~C2
Tracker
C
C2
C1
q(C)=q(C1) – q(T)
Lecture 19
CSCE 522 - Farkas
13
Tracker attack
q(C and D) is disallowed
C=C1 and C2
T=C1 and ~C2
C
Tracker
C2
C1
C and D
q(C and D)=
q(T or C and D) – q(T)
Lecture 19
D
CSCE 522 - Farkas
14
Query overlap attack
Q(John)=q(C1)-q(C2)
C1
C2
Kathy
John
Paul
Eve
Max
Fred
Lecture 19
Mitch
CSCE 522 - Farkas
Protection:
query-overlap control
15
Insertion/Deletion Attack

Observing changes overtime
 q1=q(C)
 insert(i)
 q2=q(C)
 q(i)=q2-q1

Protection: insertion/deletion performed as
pairs
Lecture 19
CSCE 522 - Farkas
16
Statistical Inference Theory

Give unlimited number of statistics and correct
statistical answers, all statistical databases can
be compromised (Ullman)
Lecture 19
CSCE 522 - Farkas
17
Inferences in General-Purpose
Databases
Queries based on sensitive data
 Inference via database constraints
 Inferences via updates

Lecture 19
CSCE 522 - Farkas
18
Queries based on sensitive data
Sensitive information is used in selection
condition but not returned to the user.
 Example: Salary: secret, Name: public

NameSalary=$25,000

Protection: apply query of database views at
different security levels
Lecture 19
CSCE 522 - Farkas
19
Database Constraints
Integrity constraints
 Database dependencies
 Key integrity

Lecture 19
CSCE 522 - Farkas
20
Integrity Constraints
C=A+B
 A=public, C=public, and B=secret
 B can be calculated from A and C, i.e., secret
information can be calculated from public data

Lecture 19
CSCE 522 - Farkas
21
Database Dependencies
Metadata:
 Functional dependencies
 Multi-valued dependencies
 Join dependencies
 etc.
Lecture 19
CSCE 522 - Farkas
22
Functional Dependency


FD: A  B, that is for any two tuples in the relation,
if they have the same value for A, they must have the
same value for B.
Example: FD: Rank  Salary
Secret information: Name and Salary together
 Query1:
Name and Rank
 Query2: Rank and Salary
 Combine answers for query1 and 2 to reveal Name and
Salary together
Lecture 19
CSCE 522 - Farkas
23
Key integrity
Every tuple in the relation have a unique key
 Users at different levels, see different versions
of the database
 Users might attempt to update data that is not
visible for them

Lecture 19
CSCE 522 - Farkas
24
Example
Secret View
Name (key)
Black P
Red S
Salary
38,000 P
42,000 S
Address
Columbia S
Irmo S
Name (key)
Salary
Address
Black P
38,000 P
Null P
Public View
Lecture 19
CSCE 522 - Farkas
25
Updates
Public User:
Name (key)
Black P
Salary
38,000 P
Address
Null P
1. Update Black’s address to Orlando
2. Add new tuple: (Red, 22,000, Manassas)
If
Refuse update: covert channel
Allow update:
• Overwrite high data – may be incorrect
• Create new tuple – which data it correct
(polyinstantiation) – violate key constraints
Lecture 19
CSCE 522 - Farkas
26
Updates
Secret user:
Name (key)
Salary
Address
Black P
38,000 P
Columbia S
Red S
42,000 S
Irmo S
1. Update Black’s salary to 45,000
If
Refuse update: denial of service
Allow update:
• Overwrite low data – covert channel
• Create new tuple
– which data it correct
(polyinstantiation) – violate key constraints
Lecture 19
CSCE 522 - Farkas
27
Inference Problem
No general technique is available to solve the
problem
 Need assurance of protection
 Hard to incorporate outside knowledge

Lecture 19
CSCE 522 - Farkas
28
The Inference Problem
General Purpose Database:
Non-confidential data + Metadata 
Undesired Inferences
Web Enabled Data:
Non-confidential data + Metadata (data and
application semantics) + Computational Power +
Connectivity  Undesired Inferences
29
Correlated Inference
base
Base
fort
Public
place
address
Place
Public
basin
Water source
district
Water Source
Object[].
waterSource :: Object
basin :: waterSource
place :: Object
district :: place
address :: place
base :: Object
fort :: base
Confidential
Base
Water source
30
Inference Control
Public
Access Control
Confidentia
l
X
Misinfo
Organizational
Data
Attacker
X
Ontology
Web Data
Data Integration
and
Inferences
Inference Control
Public
Misinfo
Confidentia
l
Organizational
Data
ACCESS and INFERENCE CONTROL POLICY
• Logic-based inference detection
• Exact and partial disclosure
• Data and metadata protection
• Heterogeneous data manipulation
• Metadata discovery
Data Mining and Privacy

Statistical inference:
 K-anonymity
 Correlation

General inference:
 metadata
 Biased learning
 Pattern
Lecture 19
CSCE 522 - Farkas
33
Next Class

Midterm exam review
Lecture 19
CSCE 522 - Farkas
34