Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Inference Problem Privacy Preserving Data Mining Readings and Assignments Required: • Pfleeger: Chapter 7 Interesting reading: • I. Moskowitz, M. H. Kang: Covert Channels – Here to Stay? http://citeseer.nj.nec.com/cache/papers/cs/1340/http:zSzzSzwww.it d.nrl.navy.milzSzITDzSz5540zSzpublicationszSzCHACSzSz1994 zSz1994moskowitz-compass.pdf/moskowitz94covert.pdf • Jajodia, Meadows: Inference Problems in Multilevel Secure Database Management Systems http://www.acsac.org/secshelf/book001/book001.html, essay 24 Lecture 19 CSCE 522 - Farkas 2 Indirect Information Flow Channels Covert channels Inference channels Lecture 19 CSCE 522 - Farkas 3 Communication Channels Overt Channel: designed into a system and documented in the user's manual Covert Channel: not documented. Covert channels may be deliberately inserted into a system, but most such channels are accidents of the system design. Lecture 19 CSCE 522 - Farkas 4 Covert Channel Timing Channel: based on system times Storage channels: not time related communication Can be turned into each other Lecture 19 CSCE 522 - Farkas 5 Inference Channels Non-sensitive information Lecture 19 + Meta-data CSCE 522 - Farkas = Sensitive Information 6 Inference Channels Statistical Database Inferences General Purpose Database Inferences Lecture 19 CSCE 522 - Farkas 7 Statistical Databases Goal: provide aggregate information about groups of individuals E.g., Security risk: specific information about a particular individual E.g., average grade point of students grade point of student John Smith Meta-data: Working knowledge about the attributes Supplementary knowledge (not stored in database) Lecture 19 CSCE 522 - Farkas 8 Types of Statistics Macro-statistics: collections of related statistics presented in 2dimensional tables Sex\Year 1997 1998 Sum Female 4 1 5 Male 6 13 19 Sum 10 14 24 Micro-statistics: Individual data records used for statistics after identifying information is removed Lecture 19 Sex Course GPA Year F CSCE 590 3.5 2000 M CSCE 590 3.0 2000 F CSCE 790 4.0 2001 CSCE 522 - Farkas 9 Statistical Compromise Exact compromise: find exact value of an attribute of an individual (e.g., John Smith’s GPA is 3.8) Partial compromise: find an estimate of an attribute value corresponding to an individual (e.g., John Smith’s GPA is between 3.5 and 4.0) Lecture 19 CSCE 522 - Farkas 10 Methods of Attacks and Protection Small/Large Query Set Attack C: characteristic formula that identifies groups of individuals If C identifies a single individual I, e.g., count(C) = 1 Find out existence of property If count(C and D)=1 means I has property D If count(C and D)=0 means I does not have D OR Find value of property Lecture 19 Sum(C, D), gives value of D CSCE 522 - Farkas 11 Small/Large Query Set Attack cont. Protection from small/large query set attack: query-set-size control A query q(C) is permitted only if N-n |C| n , where n 0 is a parameter of the database and N is all the records in the database Lecture 19 CSCE 522 - Farkas 12 Tracker attack q(C) is disallowed C=C1 and C2 T=C1 and ~C2 Tracker C C2 C1 q(C)=q(C1) – q(T) Lecture 19 CSCE 522 - Farkas 13 Tracker attack q(C and D) is disallowed C=C1 and C2 T=C1 and ~C2 C Tracker C2 C1 C and D q(C and D)= q(T or C and D) – q(T) Lecture 19 D CSCE 522 - Farkas 14 Query overlap attack Q(John)=q(C1)-q(C2) C1 C2 Kathy John Paul Eve Max Fred Lecture 19 Mitch CSCE 522 - Farkas Protection: query-overlap control 15 Insertion/Deletion Attack Observing changes overtime q1=q(C) insert(i) q2=q(C) q(i)=q2-q1 Protection: insertion/deletion performed as pairs Lecture 19 CSCE 522 - Farkas 16 Statistical Inference Theory Give unlimited number of statistics and correct statistical answers, all statistical databases can be compromised (Ullman) Lecture 19 CSCE 522 - Farkas 17 Inferences in General-Purpose Databases Queries based on sensitive data Inference via database constraints Inferences via updates Lecture 19 CSCE 522 - Farkas 18 Queries based on sensitive data Sensitive information is used in selection condition but not returned to the user. Example: Salary: secret, Name: public NameSalary=$25,000 Protection: apply query of database views at different security levels Lecture 19 CSCE 522 - Farkas 19 Database Constraints Integrity constraints Database dependencies Key integrity Lecture 19 CSCE 522 - Farkas 20 Integrity Constraints C=A+B A=public, C=public, and B=secret B can be calculated from A and C, i.e., secret information can be calculated from public data Lecture 19 CSCE 522 - Farkas 21 Database Dependencies Metadata: Functional dependencies Multi-valued dependencies Join dependencies etc. Lecture 19 CSCE 522 - Farkas 22 Functional Dependency FD: A B, that is for any two tuples in the relation, if they have the same value for A, they must have the same value for B. Example: FD: Rank Salary Secret information: Name and Salary together Query1: Name and Rank Query2: Rank and Salary Combine answers for query1 and 2 to reveal Name and Salary together Lecture 19 CSCE 522 - Farkas 23 Key integrity Every tuple in the relation have a unique key Users at different levels, see different versions of the database Users might attempt to update data that is not visible for them Lecture 19 CSCE 522 - Farkas 24 Example Secret View Name (key) Black P Red S Salary 38,000 P 42,000 S Address Columbia S Irmo S Name (key) Salary Address Black P 38,000 P Null P Public View Lecture 19 CSCE 522 - Farkas 25 Updates Public User: Name (key) Black P Salary 38,000 P Address Null P 1. Update Black’s address to Orlando 2. Add new tuple: (Red, 22,000, Manassas) If Refuse update: covert channel Allow update: • Overwrite high data – may be incorrect • Create new tuple – which data it correct (polyinstantiation) – violate key constraints Lecture 19 CSCE 522 - Farkas 26 Updates Secret user: Name (key) Salary Address Black P 38,000 P Columbia S Red S 42,000 S Irmo S 1. Update Black’s salary to 45,000 If Refuse update: denial of service Allow update: • Overwrite low data – covert channel • Create new tuple – which data it correct (polyinstantiation) – violate key constraints Lecture 19 CSCE 522 - Farkas 27 Inference Problem No general technique is available to solve the problem Need assurance of protection Hard to incorporate outside knowledge Lecture 19 CSCE 522 - Farkas 28 The Inference Problem General Purpose Database: Non-confidential data + Metadata Undesired Inferences Web Enabled Data: Non-confidential data + Metadata (data and application semantics) + Computational Power + Connectivity Undesired Inferences 29 Correlated Inference base Base fort Public place address Place Public basin Water source district Water Source Object[]. waterSource :: Object basin :: waterSource place :: Object district :: place address :: place base :: Object fort :: base Confidential Base Water source 30 Inference Control Public Access Control Confidentia l X Misinfo Organizational Data Attacker X Ontology Web Data Data Integration and Inferences Inference Control Public Misinfo Confidentia l Organizational Data ACCESS and INFERENCE CONTROL POLICY • Logic-based inference detection • Exact and partial disclosure • Data and metadata protection • Heterogeneous data manipulation • Metadata discovery Data Mining and Privacy Statistical inference: K-anonymity Correlation General inference: metadata Biased learning Pattern Lecture 19 CSCE 522 - Farkas 33 Next Class Midterm exam review Lecture 19 CSCE 522 - Farkas 34