Inference Problem
Privacy Preserving Data Mining

Lecture 19, CSCE 522 - Farkas

Readings and Assignments
• I. Moskowitz, M. H. Kang: Covert Channels – Here to Stay?
  http://citeseer.nj.nec.com/cache/papers/cs/1340/http:zSzzSzwww.itd.nrl.navy.milzSzITDzSz5540zSzpublicationszSzCHACSzSz1994zSz1994moskowitz-compass.pdf/moskowitz94covert.pdf
• Jajodia, Meadows: Inference Problems in Multilevel Secure Database Management Systems
  http://www.acsac.org/secshelf/book001/book001.html, essay 24

Indirect Information Flow Channels
• Covert channels
• Inference channels

Communication Channels
• Overt channel: designed into a system and documented in the user's manual
• Covert channel: not documented. Covert channels may be deliberately inserted into a system, but most such channels are accidents of the system design.

Covert Channel
• Timing channel: based on system times
• Storage channel: communication that is not time related
• Each kind can be turned into the other

Inference Channels
Non-sensitive information + Meta-data = Sensitive information

Inference Channels
• Statistical database inferences
• General-purpose database inferences

Statistical Databases
• Goal: provide aggregate information about groups of individuals
  E.g., average grade point of students
• Security risk: specific information about a particular individual
  E.g., grade point of student John Smith
• Meta-data:
  - Working knowledge about the attributes
  - Supplementary knowledge (not stored in database)

Types of Statistics
• Macro-statistics: collections of related statistics presented in 2-dimensional tables

  Sex\Year   1997   1998   Sum
  Female        4      1     5
  Male          6     13    19
  Sum          10     14    24

• Micro-statistics: individual data records used for statistics after identifying information is removed

  Sex   Course     GPA   Year
  F     CSCE 590   3.5   2000
  M     CSCE 590   3.0   2000
  F     CSCE 790   4.0   2001
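
Worked example (not part of the original slides): a minimal Python sketch of the two kinds of statistics above, using the three micro-statistics records from the table. The function names and the attacker's supplementary knowledge are assumptions made for illustration only.

```python
from collections import Counter
from statistics import mean

# Micro-statistics records copied from the table above
# (identifying information already removed).
records = [
    {"sex": "F", "course": "CSCE 590", "gpa": 3.5, "year": 2000},
    {"sex": "M", "course": "CSCE 590", "gpa": 3.0, "year": 2000},
    {"sex": "F", "course": "CSCE 790", "gpa": 4.0, "year": 2001},
]


def macro_counts(rows, row_attr, col_attr):
    """Count records per (row_attr, col_attr) cell, as in a 2-dimensional table."""
    return Counter((r[row_attr], r[col_attr]) for r in rows)


def average_gpa(rows):
    """An aggregate of the kind a statistical database is meant to release."""
    return mean(r["gpa"] for r in rows)


counts = macro_counts(records, "sex", "year")
overall = average_gpa(records)

# Hypothetical supplementary knowledge (not stored in the database):
# the attacker knows the target is the only female student in CSCE 790.
matches = [r for r in records if r["sex"] == "F" and r["course"] == "CSCE 790"]

print(dict(counts))
print(overall)
print(matches)
```

Even with names removed, a combination of attributes that matches a single record exposes that individual's GPA, which is the security risk the Statistical Databases slide describes.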

Statistical Compromise
• Exact compromise: find the exact value of an attribute of an individual (e.g., John Smith’s GPA is 3.8)
• Partial compromise: find an estimate of an attribute value corresponding to an individual (e.g., John Smith’s GPA is between 3.5 and 4.0)

Methods of Attacks and Protection
Small/Large Query Set Attack
• C: characteristic formula that identifies groups of individuals
• If C identifies a single individual I, e.g., count(C) = 1:
  - Find out existence of a property:
    if count(C and D) = 1, I has property D;
    if count(C and D) = 0, I does not have D
  OR
  - Find value of a property: sum(C, D) gives the value of D

Small/Large Query Set Attack cont.
• Protection from small/large query set attack: query-set-size control
• A query q(C) is permitted only if N - n ≥ |C| ≥ n, where n ≥ 0 is a parameter of the database and N is the number of records in the database

Tracker attack
• q(C) is disallowed
• C = C1 and C2
• T = C1 and ~C2 (the tracker)
• q(C) = q(C1) - q(T)
[Figure: diagram of the sets C1, C2, C, and the tracker T]

Tracker attack
• q(C and D) is disallowed
• C = C1 and C2
• T = C1 and ~C2 (the tracker)
• q(C and D) = q(T or C and D) - q(T)
[Figure: diagram of the sets C1, C2, C, D, C and D, and the tracker T]

Query overlap attack
• q(John) = q(C1) - q(C2)
[Figure: query sets C1 and C2 over the individuals Kathy, John, Paul, Eve, Max, Fred, Mitch]
• Protection: query-overlap control

Insertion/Deletion Attack
• Observing changes over time:
  - q1 = q(C)
  - insert(i)
  - q2 = q(C)
  - q(i) = q2 - q1
• Protection: insertions/deletions performed as pairs

Statistical Inference Theory
• Given an unlimited number of statistics and correct statistical answers, all statistical databases can be compromised (Ullman)

Inferences in General-Purpose Databases
• Queries based on sensitive data
• Inference via database constraints
• Inferences via updates

Queries based on sensitive data
• Sensitive information is used in the selection condition but not returned to
the user.
• Example: Salary: secret, Name: public
  Π Name σ Salary=$25,000 (project Name, select on Salary = $25,000)
• Protection: apply the query to database views at different security levels

Database Constraints
• Integrity constraints
• Database dependencies
• Key integrity

Integrity Constraints
• C = A + B
• A = public, C = public, and B = secret
• B can be calculated from A and C, i.e., secret information can be calculated from public data

Database Dependencies
• Metadata:
  - Functional dependencies
  - Multi-valued dependencies
  - Join dependencies
  - etc.

Functional Dependency
• FD: A → B, that is, for any two tuples in the relation, if they have the same value for A, they must have the same value for B.
• Example: FD: Rank → Salary
• Secret information: Name and Salary together
  - Query1: Name and Rank
  - Query2: Rank and Salary
  - Combine the answers to Query1 and Query2 to reveal Name and Salary together

Key integrity
• Every tuple in the relation has a unique key
• Users at different levels see different versions of the database
• Users might attempt to update data that is not visible to them

Example
Secret View

  Name (key)   Salary      Address
  Black  P     38,000  P   Columbia  S
  Red    S     42,000  S   Irmo      S

Public View

  Name (key)   Salary      Address
  Black  P     38,000  P   Null  P

Updates
Public user:

  Name (key)   Salary      Address
  Black  P     38,000  P   Null  P

1. Update Black’s address to Orlando
2. Add new tuple: (Red, 22,000, Manassas)
If
• Refuse update: covert channel
• Allow update:
  - Overwrite high data (may be incorrect)
  - Create new tuple: which data is correct? (polyinstantiation); violates key constraints

Updates
Secret user:

  Name (key)   Salary      Address
  Black  P     38,000  P   Columbia  S
  Red    S     42,000  S   Irmo      S

1.
Update Black’s salary to 45,000
If
• Refuse update: denial of service
• Allow update:
  - Overwrite low data (covert channel)
  - Create new tuple: which data is correct? (polyinstantiation); violates key constraints

Inference Problem
• No general technique is available to solve the problem
• Need assurance of protection
• Hard to incorporate outside knowledge

Web Evolution
• Past: human usage
  - Static Web pages (HTML, XML)
• Present: human & automated usage
  - Semantic Web, WS, SOA
• Future: mobile computing

Web Data Security
• Access control models
• Heterogeneous data: XML, Stream, Text
• Limitations:
  - Syntax-based
  - No association protection
  - Limited handling of updates
  - No data or application semantics
  - No inference control

Secure XML Views - Example
<medicalFiles>                          UC
  <countyRec>                           S
    <patient>                           S
      <name>John Smith </name>          UC
      <phone>111-2222</phone>           S
    </patient>
    <physician>Jim Dale </physician>    UC
  </countyRec>
  <milBaseRec>                          TS
    <patient>                           S
      <name>Harry Green</name>          UC
      <phone>333-4444</phone>           S
    </patient>
    <physician>Joe White </physician>   UC
    <milTag>MT78</milTag>               TS
  </milBaseRec>
</medicalFiles>
[Figure: tree representation of the document above]
View over UC data

Secure XML Views - Example
<medicalFiles>
  <countyRec>
    <patient>
      <name>John Smith</name>
    </patient>
    <physician>Jim Dale</physician>
  </countyRec>
  <milBaseRec>
    <patient>
      <name>Harry Green</name>
    </patient>
    <physician>Joe White</physician>
  </milBaseRec>
</medicalFiles>
[Figure: tree representation of the document above]
View over UC data

Secure XML Views - Example
<medicalFiles>
  <tag01>
    <tag02>
      <name>John Smith</name>
    </tag02>
    <physician>Jim Dale</physician>
  </tag01>
  <tag03>
    <tag02>
      <name>Harry Green</name>
    </tag02>
    <physician>Joe
White</physician>
  </tag03>
</medicalFiles>
[Figure: tree representation of the document above]
View over UC data

Secure XML Views - Example
<medicalFiles>                          UC
  <countyRec>                           S
    <patient>                           S
      <name>John Smith</name>           UC
    </patient>
    <physician>Jim Dale</physician>     UC
  </countyRec>
  <milBaseRec>                          TS
    <patient>                           S
      <name>Harry Green</name>          UC
    </patient>
    <physician>Joe White</physician>    UC
  </milBaseRec>
</medicalFiles>
[Figure: tree representation of the document above]
View over UC data

Secure XML Views - Example
<medicalFiles>
  <name>John Smith</name>
  <physician>Jim Dale</physician>
  <name>Harry Green</name>
  <physician>Joe White</physician>
</medicalFiles>
[Figure: tree representation of the document above]
View over UC data

The Inference Problem
• General Purpose Database:
  Non-confidential data + Metadata → Undesired inferences
• Semantic Web:
  Non-confidential data + Metadata (data and application semantics) + Computational power + Connectivity → Undesired inferences

Correlated Inference
Ontology:
  Object[].
  waterSource :: Object
  basin :: waterSource
  place :: Object
  district :: place
  address :: place
  base :: Object
  fort :: base
[Figure: public data relating a fort (Base) to an address (Place) and a basin (Water source) to a district (Place); confidential data relating Base to Water source]

Inference Control
[Figure: labels include Public, Confidential, Access Control, Misinfo, Organizational Data, Attacker, Ontology, Web Data, Data Integration and Inferences]

Inference Control
[Figure: labels include Public, Misinfo, Confidential, Organizational Data]
ACCESS and INFERENCE CONTROL POLICY
• Logic-based inference detection
• Exact and partial disclosure
• Data and metadata protection
• Heterogeneous data manipulation
• Metadata discovery

Data Mining and Privacy
• Statistical inference:
  - K-anonymity
  - Correlation
• General inference:
  - Metadata
  - Biased learning
  - Pattern

Future

Next Class
• Midterm exam
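
Worked example (not part of the original slides): a minimal Python sketch of the tracker attack against query-set-size control, following q(C) = q(C1) - q(T) with T = C1 and ~C2. The records, attribute names, salary values, and the choice n = 2 are all hypothetical; only the individuals' names are borrowed from the query overlap slide.

```python
# Hypothetical records; departments, sexes, and salaries (in thousands)
# are invented for illustration.
RECORDS = [
    {"name": "Kathy", "sex": "F", "dept": "CS", "salary": 52},
    {"name": "John", "sex": "M", "dept": "CS", "salary": 48},
    {"name": "Paul", "sex": "M", "dept": "CS", "salary": 45},
    {"name": "Eve", "sex": "F", "dept": "EE", "salary": 61},
    {"name": "Max", "sex": "M", "dept": "CS", "salary": 50},
    {"name": "Fred", "sex": "M", "dept": "EE", "salary": 47},
    {"name": "Mitch", "sex": "M", "dept": "EE", "salary": 43},
]
N_MIN = 2  # assumed value of the database parameter n


class QueryRefused(Exception):
    """Raised when query-set-size control rejects a query."""


def q_sum(formula, attr, n=N_MIN):
    """Return sum(formula, attr) only if n <= |query set| <= N - n."""
    rows = [r for r in RECORDS if formula(r)]
    if not (n <= len(rows) <= len(RECORDS) - n):
        raise QueryRefused(f"query set size {len(rows)} is not permitted")
    return sum(r[attr] for r in rows)


def c1(r):
    return r["dept"] == "CS"


def c2(r):
    return r["sex"] == "F"


def c(r):
    # C = C1 and C2 identifies a single individual (Kathy).
    return c1(r) and c2(r)


def tracker(r):
    # T = C1 and ~C2
    return c1(r) and not c2(r)


# The direct query q(C) has a query set of size 1, so it is refused.
try:
    q_sum(c, "salary")
    direct_allowed = True
except QueryRefused:
    direct_allowed = False

# Both q(C1) and q(T) have permitted sizes, so their difference leaks q(C).
inferred = q_sum(c1, "salary") - q_sum(tracker, "salary")

print(direct_allowed, inferred)
```

In this sketch the size check alone does not stop the attack, because each individual query is permitted; only the combination of the two answers is sensitive.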