CPT-S 580-06 Advanced Databases
Yinghui Wu
EME 49
ADB (ln29)

CPT-S 580-08 Advanced Databases
DBMS: privacy and security in the Cloud
• Data security and privacy
• Security and privacy in cloud
• Data confidentiality
• Research Challenges
Adapted from “Secure and Privacy-preserving Database Services in the Cloud”, Divy Agrawal et al., ICDE 2013 tutorial

Database systems: security & privacy issues

Access Control [Bertino et al. TDSC’05]
Problem statement: authorizing data access scopes (relations, attributes, tuples) to users of a DBMS
• Discretionary access control
  – Authorization administration policies, i.e., granting and revoking authorization (centralized, ownership, etc.)
  – Content-based, using views and query rewriting for fine-grained access control
  – Role-based access control: a role is a function with a set of actions, and has users as its members
• Mandatory access control
  – Object and subject classification (e.g., top secret, secret, unclassified)

Data Anonymization
Problem: protecting Personally Identifiable Information (PII) and the associated sensitive attributes

Quasi-identifier: DOB, Gender, Zipcode. Sensitive: Disease.

DOB       Gender   Zipcode   Disease
1/21/76   Male     53715     Heart Disease
4/13/86   Female   53715     Hepatitis
2/28/76   Male     53703     Bronchitis
1/21/76   Male     53703     Broken Arm
4/13/86   Female   53706     Flu
2/28/76   Female   53706     Hang Nail

• Quasi-identifiers are sets of attributes that can be linked with external data to uniquely identify an individual
• Quasi-identifiers need to be generalized or suppressed

Solution: k-Anonymity [Samarati et al. TR’98]
• Quasi-identifiers are indistinguishable among k individuals
• Implemented by building a generalization hierarchy or by partitioning the multi-dimensional data space
• Equivalence class: the records that share the same QI values
• Vulnerable to: homogeneity attack, background knowledge attack

Enhanced Solution: l-Diversity [Machanavajjhala et al. ICDE’06]
• At least l values for sensitive attributes in each equivalence class
• Vulnerable to: similarity attack, skewness attack

A 3-diverse patient table

Zipcode   Age   Salary   Disease
476**     2*    20K      Gastric Ulcer
476**     2*    25K      Gastritis
476**     2*    30K      Stomach Cancer
4790*     ≥40   50K      Gastritis
4790*     ≥40   100K     Flu
4790*     ≥40   70K      Bronchitis
476**     3*    60K      Bronchitis
476**     3*    80K      Pneumonia
476**     3*    90K      Stomach Cancer

Enhanced Solution: t-Closeness [Li et al. ICDE’07]
• The distance between the overall distribution of sensitive attribute values and the distribution of sensitive attribute values in an equivalence class is bounded by t

Differential Privacy for Statistical Data [Dwork ICALP’06]
Strong privacy guarantees while querying a database
[Figure: the same query over A and over A’, each answered through perturbation, yields results P(A) and P(A’) that are indistinguishable.]
A randomized function $K$ gives $\varepsilon$-differential privacy iff, for all datasets $D_1$ and $D_2$ differing on at most one element, and all $S \subseteq \mathrm{Range}(K)$:
\[ \ln \frac{\Pr[K(D_1) \in S]}{\Pr[K(D_2) \in S]} \le \varepsilon \]

Secure Devices for Privacy [Anciaux et al. SIGMOD’07]
Problem: protecting private data during queries that involve both private (hidden) and public (visible) data
Solution: carry the private data in a secure USB key, ensure that private data never leaves the USB key, and let only public data flow to the key
• Query optimization for a USB key with small RAM
4/11/2013 ICDE 2013 Tutorial

Database security & privacy in the cloud

Cloud – A Tempting Attack Target
• Why the cloud?
  – Ubiquitous access to consolidated data
  – Shared infrastructure, economies of scale
  – A lot of small and medium businesses
• Why attack?
  – Target one service provider, attack multiple companies
  – Financial gain from trading sensitive information

Cloud Provides Novel Attack Opportunities
• Co-residence attack [Ristenpart et al. CCS’09]
  – Adversary: non-provider-affiliated malicious parties
  – Map and identify the location of the target VM
  – Place the attacker VM co-resident with the target VM
  – Cross-VM side-channel attacks (due to sharing of physical resources), e.g., estimating the number of visitors to a page, or keystroke attacks for password retrieval
• Signature wrapping attack [Somorovsky et al. CCSW’11]
  – Compromise the control interface by capturing a SOAP message
  – Manipulate the SOAP message with arbitrary XML fragments
  – Use an XML signature vulnerability to pass authentication
  – Take control of a victim’s account

A Barrier to Conquer
• Security and privacy – a barrier to cloud adoption
• Data (sensitive data) – a key concern
• Need to solve data security and privacy problems in the cloud

Problems Amplified by the Cloud
• Data confidentiality
  – Attacks: unauthorized accesses, side-channel attacks
  – Solutions: encryption and querying encrypted data, trusted computing
• Access privacy
  – Attacks: inferences on access patterns or query results
  – Solutions: private information retrieval, query obfuscation
[Figure: a user sends a query to the cloud servers that hold the data and receives an answer.]

Challenges: Conflicting Goals
[Figure: plot of functionality/performance against confidentiality/privacy, each axis running from low to high, with regions labeled Existing Services, Many Crypto Systems/Protocols, and Ideal State.]

Data confidentiality

Database as a Service [Hacigümüs et al. ICDE’02]
• Protects stored data from theft, but plaintext data can still be seen on the server
• Write – encrypt before storing
    insert into lineitem (discount) values (encrypt(10, key))
• Read – decrypt before access
    select decrypt(discount, key) from lineitem where custid = 300
• Encryption alternatives
  – Software-level vs. hardware-level (cryptographic coprocessor) encryption
  – Granularity: field, row, page

Partition and Identification Index [Hacigümüs et al. SIGMOD’02]
• E(tuple): encrypted-tuple, {attribute-index}
• Attribute-index: partition ids of the attribute values
[Figure: the attribute domain 0 to 1000 is split at 200, 400, 600, and 800 into five partitions with ids 2, 7, 5, 1, 4.]

Partition and Identification Index
• The client knows a map function, Map(val) = id of the partition containing val
• Random mapping: the partitions 0-200, 200-400, 400-600, 600-800, 800-1000 get ids 2, 7, 5, 1, 4
• Order-preserving mapping: the same partitions get ids 1, 2, 4, 5, 7

Mapping Predicate Conditions
• Map(< val): ids of the partitions that could contain values < val
  – E.g., Map(eid < 280) = {2, 7} for the random mapping
• Map(> val): ids of the partitions that could contain values > val
• Map(Ai = Aj): pairs of ids of the partitions that could have equal Ai and Aj values
• Decryption and final processing happen on the client

Mapping Predicate Conditions
[Figure: mapping of the join condition emp.did = mrg.did.]

Partition / Bucketization Review
• Pros
  – Efficient computation on the server
• Cons
  – Data update is hard (may need re-distribution)
  – Filtering the superset of answers on the client could be time consuming, depending on the partition sizes
  – Relative changes in partitions during dynamic data updates might reveal the value distribution

CryptDB [Popa et al. SOSP’11]
• Supports a wide range of SQL queries over encrypted data
• The server fully evaluates queries on encrypted data, and the client does not perform query processing
• SQL-aware encryption – leverages provable, practical techniques for different SQL operators over encrypted data
• Adjustable query-based encryption – dynamically adjusts the encryption level of data items according to the user’s queries
• Onion of encryptions – from weaker forms of encryption that allow certain computation to stronger forms of encryption that reveal no information

SQL-Aware Onion Encryption
• Onion over any value, outermost layer first:
  – RND: no functionality
  – DET: equality selection
  – JOIN: equality join
• Onion over any value, outermost layer first:
  – RND: no functionality
  – OPE: comparison
  – OPE-JOIN: inequality join
• SEARCH: word selection (only for text fields)
• HOM: sum (int value)

CryptDB System
[Figure: CryptDB system architecture, annotated with the component for sending certain onion layer keys and the component for performing cryptographic operations.]

Open problems

Open Research Problems
• Encryption for processing range/join database queries on encrypted data
• Improve the performance of querying encrypted data for use in practical OLTP applications
  – Pre-computation
  – Parallel calculation
• End-to-end security in the cloud
  – Need information flow control and auditing in addition to cryptography or trusted-computing-based approaches

Concluding Remarks
• Cloud security and privacy is not a completely new problem; some issues are amplified by the cloud
• Protecting data confidentiality and access privacy
• Maintaining practical functionality and performance while achieving security and privacy

References
[Bertino et al. TDSC’05] E. Bertino et al. Database security - concepts, approaches, and challenges. In IEEE TDSC, 2(1), 2005.
[Samarati et al. TR’98] P. Samarati et al. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. TR 1998.
[Machanavajjhala et al. ICDE’06] A. Machanavajjhala et al. l-diversity: privacy beyond k-anonymity. In ICDE 2006.
[Li et al. ICDE’07] N. Li et al. t-closeness: privacy beyond k-anonymity and l-diversity. In ICDE 2007.
[Dwork ICALP’06] C. Dwork. Differential privacy. In ICALP(2) 2006.
[Verykios et al. SIGMOD’04] V. S. Verykios et al. State-of-the-art in privacy preserving data mining. In SIGMOD 2004.
[Agrawal et al. SIGMOD’00] R. Agrawal et al. Privacy-preserving data mining. In SIGMOD 2000.
[Clifton et al. KDD’02] C. Clifton et al. Tools for privacy preserving distributed data mining. In KDD 2002.
[Anciaux et al. SIGMOD’07] N. Anciaux et al. GhostDB: querying visible and hidden data without leaks. In SIGMOD 2007.
[Chaudhuri et al. CIDR’11] S. Chaudhuri et al. Database access control & privacy: is there a common ground? In CIDR 2011.
[Ristenpart et al. CCS’09] T. Ristenpart et al. Hey, you, get off of my cloud: exploring information leakage in third-party compute clouds. In CCS 2009.
[Somorovsky et al. CCSW’11] J. Somorovsky et al. All your clouds are belong to us: security analysis of cloud management interfaces. In CCSW 2011.
[Hacigümüs et al. ICDE’02] H. Hacigümüs et al. Providing database as a service. In ICDE 2002.
[Song et al. S&P’00] D. Song et al. Practical techniques for searches on encrypted data. In S&P 2000.
[Hacigümüs et al. SIGMOD’02] H. Hacigümüs et al. Executing SQL over encrypted data in the database-service-provider model. In SIGMOD 2002.
[Hore et al. VLDB’04] B. Hore et al. A privacy-preserving index for range queries. In VLDB 2004.
[Agrawal et al. SIGMOD’04] R. Agrawal et al. Order preserving encryption for numeric data. In SIGMOD 2004.
[Popa et al. SOSP’11] R. A. Popa et al. CryptDB: protecting confidentiality with encrypted query processing. In SOSP 2011.
[Damiani et al. CCS’03] E. Damiani et al. Balancing confidentiality and efficiency in untrusted relational DBMSs. In CCS 2003.
[Wang et al. SDM’11] S. Wang et al. A comprehensive framework for secure query processing on relational data in the cloud. In SDM 2011.
[Aggarwal et al. CIDR’05] G. Aggarwal et al. Two can keep a secret: a distributed architecture for secure database services. In CIDR 2005.
[Emekci et al. ICDE’06] F. Emekci et al. Privacy preserving query processing using third parties. In ICDE 2006.
[Agrawal et al. SRDS’88] D. Agrawal et al. Quorum consensus algorithms for secure and reliable data. In SRDS 1988.
[Bajaj et al. SIGMOD’11] S. Bajaj et al. TrustedDB: a trusted hardware based database with privacy and data confidentiality. In SIGMOD 2011.
[Song et al. IEEE’12] D. Song et al. Cloud data protection for the masses. In IEEE Computer, 45(1), 2012.
[Chor et al. JACM’98] B. Chor et al. Private information retrieval. In J. ACM, 45(6), 1998.
[Kushilevitz et al. FOCS’97] E. Kushilevitz et al. Replication is not needed: single database, computationally private information retrieval. In FOCS 1997.
[Sion et al. NDSS’07] R. Sion et al. On the computational practicality of private information retrieval. In NDSS 2007.
[Olumofin et al. FC’11] F. G. Olumofin et al. Revisiting the computational practicality of private information retrieval. In FC 2011.
[Williams et al. NDSS’08] P. Williams et al. Usable private information retrieval. In NDSS 2008.
[Wang et al. DBSEC’10] S. Wang et al. Generalizing PIR for practical private retrieval of public data. In DBSec 2010.
[Wang et al. DAPD’13] S. Wang et al. Towards practical private processing of database queries over public data. In DAPD 2013.
[Vimercati et al. ICDCS’11] S. D. C. Vimercati et al. Efficient and private access to outsourced data. In ICDCS 2011.
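
Supplementary sketch: partition ids and predicate mapping
The Partition and Identification Index slides describe a client-side function Map(val) and its extensions Map(< val) and Map(> val). The sketch below is one possible way to write them for the random mapping shown on the slides (boundaries 0, 200, 400, 600, 800, 1000 with ids 2, 7, 5, 1, 4). The function names, and the choice to treat each partition as including its lower bound and excluding its upper bound, are assumptions made for illustration; the slides do not specify them.

```python
from bisect import bisect_right

# Partition boundaries and ids as read from the random-mapping slide.
# Assumption: each partition includes its lower bound and excludes its
# upper bound, except the last one, which also includes 1000.
BOUNDS = [0, 200, 400, 600, 800, 1000]
RANDOM_IDS = [2, 7, 5, 1, 4]


def partition_index(val):
    """Return the position (0-based) of the partition that holds val."""
    if not BOUNDS[0] <= val <= BOUNDS[-1]:
        raise ValueError("value outside the attribute domain")
    return min(bisect_right(BOUNDS, val) - 1, len(RANDOM_IDS) - 1)


def map_eq(val):
    """Map(val): id of the partition containing val."""
    return RANDOM_IDS[partition_index(val)]


def map_lt(val):
    """Map(< val): ids of partitions that could contain values < val."""
    idx = partition_index(val)
    # If val sits exactly on the lower bound of its partition, that
    # partition holds no value smaller than val.
    last = idx if val > BOUNDS[idx] else idx - 1
    return set(RANDOM_IDS[: last + 1])


def map_gt(val):
    """Map(> val): ids of partitions that could contain values > val.

    This is a conservative superset: the partition of val is always kept.
    """
    return set(RANDOM_IDS[partition_index(val):])


if __name__ == "__main__":
    print(map_lt(280))  # the slide's example, Map(eid < 280)
```

The server would only compare the stored partition ids against the returned set; the client still has to decrypt the returned tuples and discard the false positives, which is the filtering cost listed under the cons of bucketization.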
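
Supplementary sketch: checking k-anonymity and l-diversity
The k-Anonymity and l-Diversity slides define both properties over equivalence classes. The sketch below groups the rows of the 3-diverse patient table by their quasi-identifier values and reports the smallest class size (k) and the smallest number of distinct sensitive values per class (l). It assumes that Zipcode and Age are the quasi-identifiers and Disease is the sensitive attribute, and it uses the simplest "distinct values" reading of l-diversity; the helper names are hypothetical.

```python
from collections import defaultdict

# Rows of the 3-diverse patient table: (zipcode, age, salary, disease).
ROWS = [
    ("476**", "2*", "20K", "Gastric Ulcer"),
    ("476**", "2*", "25K", "Gastritis"),
    ("476**", "2*", "30K", "Stomach Cancer"),
    ("4790*", ">=40", "50K", "Gastritis"),
    ("4790*", ">=40", "100K", "Flu"),
    ("4790*", ">=40", "70K", "Bronchitis"),
    ("476**", "3*", "60K", "Bronchitis"),
    ("476**", "3*", "80K", "Pneumonia"),
    ("476**", "3*", "90K", "Stomach Cancer"),
]


def equivalence_classes(rows, qi_idx):
    """Group rows that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[i] for i in qi_idx)].append(row)
    return groups


def k_of(rows, qi_idx):
    """Largest k for which the table is k-anonymous."""
    return min(len(g) for g in equivalence_classes(rows, qi_idx).values())


def l_of(rows, qi_idx, sensitive_idx):
    """Largest l under distinct l-diversity (distinct values per class)."""
    classes = equivalence_classes(rows, qi_idx).values()
    return min(len({row[sensitive_idx] for row in g}) for g in classes)


if __name__ == "__main__":
    qi = (0, 1)  # assumption: zipcode and age are the quasi-identifiers
    print("k =", k_of(ROWS, qi))
    print("l =", l_of(ROWS, qi, sensitive_idx=3))
```

A check of this kind only counts distinct values, so it does not detect the similarity and skewness attacks that motivate t-closeness.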
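
Supplementary sketch: a perturbed counting query
The Differential Privacy slide gives only the definition and a picture of perturbation. One standard way to meet that definition for a counting query is the Laplace mechanism, which is not named on the slides and is brought in here purely as an illustration. The sketch assumes a count changes by at most 1 when one record changes, so noise with scale 1/ε is added; all function names are hypothetical.

```python
import math
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that satisfy predicate.

    Assumption: changing one record changes the true count by at most 1,
    so the noise scale is 1 / epsilon.
    """
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def laplace_log_density(x, center, scale):
    """Log of the Laplace density with the given center and scale at x."""
    return -abs(x - center) / scale - math.log(2.0 * scale)


if __name__ == "__main__":
    diseases = ["Heart Disease", "Hepatitis", "Bronchitis",
                "Broken Arm", "Flu", "Hang Nail"]
    print(private_count(diseases, lambda d: d == "Flu", epsilon=0.5))
```

The helper laplace_log_density makes the link to the definition visible: for two true counts that differ by 1, the difference of the log densities at any output is at most ε in absolute value, which is the bound on the log ratio stated on the slide.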