Research Statement
Xianrui Meng

In today’s internet, with the advent of cloud computing, there is a natural desire for enterprises, organizations, and end users to outsource increasingly large amounts of data to a cloud provider. Ensuring security and privacy is therefore becoming a significant challenge for cloud computing, especially for users with sensitive and valuable data. In addition, the benefits of big data, including advances in machine learning, e-commerce, social sciences, and marketing, are well publicized, but the privacy and security problems it presents have received less attention from the public at large.

My work as a security and data science researcher focuses on the development of privacy-enhancing technologies that minimize the amount of data revealed when outsourcing massive datasets in cloud-based environments. In particular, my primary research objective is to design provably secure and scalable schemes for encrypting large-scale databases without losing the ability to query them. In my research, I have been collaborating closely with database and data mining researchers to design cryptographic systems that are not only secure but also efficient and easy to deploy in practice.

In the remainder of this statement, I briefly discuss specific problems that I have studied and some of my ongoing work. I then offer a few concluding remarks about my future research agenda.

Research Overview

Graph Encryption Schemes. Graph databases that store, query, and manage large graphs have received increasing interest recently, because many large-scale database applications can be modeled as graph problems. Example applications include storing and querying large Web graphs, online social networks, biological networks, RDF datasets, and communication networks. Graph encryption encrypts graph data in such a way that it can still be queried privately.
Ideally, a graph encryption scheme should encrypt a graph while supporting various graph queries, such as nearest neighbor queries, shortest distance queries, and node similarity queries. Given the ubiquity and importance of graph data, it comes as no surprise that such graph encryption schemes would have numerous potential applications.

In recent work with Kamara, Nissim, and Kollios [MKNK15], I introduce and formalize graph encryption schemes based on the concepts of searchable encryption [CGKO11] and structured encryption [CK10]. In addition, I investigate shortest distance queries on encrypted graphs. Shortest distance queries are arguably one of the most fundamental graph operations and have a wide range of applications. They can also be easily applied to other interesting graph query problems, such as centrality and nearest neighbor queries. Despite the emergence of many privacy-preserving encryption schemes and cryptographic technologies, designing an efficient graph encryption scheme that supports shortest distance queries remained an open problem. Therefore, I aimed to answer the following question: can we design a practical and provably secure system that supports shortest distance queries on encrypted graphs?

To resolve the problem, I proposed GRECS, a graph encryption framework that consists of three different schemes and supports approximate shortest distance queries. By leveraging distance oracle structures, I present several encryption schemes with different trade-offs. These encryption schemes do not affect the distance approximation of the underlying distance oracles and, at the same time, provide strong security guarantees under the formal security definition of graph encryption. The first scheme is computationally efficient, while the second is communication-efficient because it adopts specialized homomorphic encryption schemes. I also propose a third scheme that is both computationally and communication-efficient but leaks a small, controlled amount of information.
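
The distance-oracle idea mentioned above can be illustrated on plaintext data. The following is a minimal sketch, assuming a simple landmark-based oracle on an unweighted graph; it is not the GRECS construction and performs no encryption. The function names (`bfs_distances`, `build_sketches`, `approx_distance`) and the random landmark sampling are illustrative assumptions, not taken from [MKNK15].

```python
import random
from collections import deque


def bfs_distances(graph, source):
    """Exact unweighted distances from `source` via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def build_sketches(graph, num_landmarks, seed=0):
    """Give every node a sketch {landmark: distance to that landmark}.

    Landmarks are sampled uniformly at random (an assumption made
    for illustration only).
    """
    rng = random.Random(seed)
    landmarks = rng.sample(sorted(graph), num_landmarks)
    sketches = {v: {} for v in graph}
    for w in landmarks:
        for v, d in bfs_distances(graph, w).items():
            sketches[v][w] = d
    return sketches


def approx_distance(sketches, u, v):
    """Estimate d(u, v) as the minimum of d(u, w) + d(w, v) over shared landmarks w.

    Returns None when the two sketches share no landmark.
    """
    common = sketches[u].keys() & sketches[v].keys()
    if not common:
        return None
    return min(sketches[u][w] + sketches[v][w] for w in common)


# Usage: a path graph 0 - 1 - 2 - 3 - 4 stored as an adjacency list.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
sketches = build_sketches(graph, num_landmarks=2)
estimate = approx_distance(sketches, 0, 4)
```

By the triangle inequality, the estimate never falls below the true distance, and it is exact whenever a shared landmark lies on a shortest path between the two query nodes. A query touches only the two sketches, which is the kind of compact per-node structure that an encryption scheme can be layered on.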
Experimental results on many real-world graph datasets are very promising. They demonstrate that the constructions are extremely efficient and scalable compared to state-of-the-art solutions. In fact, in most cases, GRECS reports better approximations than the original distance oracle, so the returned shortest distances are quite accurate. For example, for 10,000 randomly generated queries, roughly 50% of the distances returned are the true shortest distances.

In my ongoing work, I am continuing to study graph encryption schemes for more complex graph queries. Our techniques can be easily applied to node similarity queries, a graph query type that indicates how similar two nodes are based on their distance information in the graph. Moreover, in my current work [MMK15], I also investigate subgraph mining queries on encrypted graphs, which can be very useful for many graph analytics tasks. Furthermore, I plan to explore verifiable graph encryption schemes that provide both privacy and integrity.

Secure Top-k Database Queries. In a different project, I study top-k queries on encrypted databases. Many techniques have been proposed for solving k-nearest-neighbor queries, range queries, or keyword queries on encrypted relational databases. However, although top-k queries are very important in many database applications, no existing work answers top-k queries both securely and efficiently. In [MZK15], I construct a secure top-k query processing protocol on encrypted databases under the non-colluding semi-honest clouds model. This is the first ranking query processing protocol on encrypted relational databases that satisfies Indistinguishability Chosen Query Attack security, and it therefore offers a strong privacy guarantee.
I also formulate and construct several novel secure sub-protocols, such as secure best/worst score and secure de-duplication, which can be adapted as stand-alone building blocks for many other applications. I implemented the proposed protocols and ran a set of experiments on a number of real-world databases. The results show that the protocol is extremely efficient and has very low computational overhead.

I further investigate and extend my methods to support secure top-k join queries over multiple encrypted databases. Furthermore, I show that my techniques can be adopted for handling general secure join queries. The proposed scheme also improves on the security of existing works that rely on less secure property-preserving encryption. For example, existing work on table join queries on encrypted databases still relies on deterministic encryption. In [MZK15], I show some preliminary results on constructing secure joins by adapting a semantically secure homomorphic encryption scheme, with much less leakage than before.

Other Research and Ongoing Work. In my joint work with Zhu and Kollios [ZMK14], we propose a secure two-party protocol for computing the similarity between two different time series. The protocol is practical and lets a client and a server compute the similarity between their respective time series without revealing either series to the other party. In an ongoing project [MO15] in collaboration with Dr. Alina Oprea from RSA Labs, I aim to design a secure multi-party protocol for privacy-preserving clustering problems. In particular, I seek solutions that let two or more parties jointly cluster their individual datasets without leaking them to other untrusted parties. Currently, I am working towards a generic privacy-preserving two-party protocol for hierarchical clustering. I also maintain interests in theoretical cryptography.
In another joint work with Fuller and Reyzin [FMR13], we introduce the first computational fuzzy extractor, which is used to derive a stable and strong cryptographic key from entropic but noisy physical sources. In particular, based on a computational hardness assumption for lattices, we design a fuzzy extractor in the computational setting.

Future Agenda

My research so far has taken preliminary steps toward addressing some of the security and privacy issues that arise in databases and data mining. Nevertheless, there are a number of fascinating open problems in the area that need to be addressed. In the future, I plan to broaden my understanding of how to provide large-scale data security and privacy in cloud computing. Below I present a short list of open problems that I plan to explore.

In the long term, my goal is to design a practical privacy-preserving machine learning system that enables machine learning algorithms to run over encrypted data. Recently, a number of applied security and privacy research problems have appeared in data mining and machine learning. Consider a user with sensitive data who wants to make an inference using a machine learning predictive model that is held by the cloud, without compromising the user’s private information. To protect data confidentiality, the user encrypts the data and sends the ciphertexts to the cloud, which runs the machine learning algorithm over the encrypted data. Existing solutions focus mainly on specific machine learning models and rely on high-degree homomorphic encryption schemes with extremely high performance overheads. However, by taking advantage of the structure of the original dataset, one can obtain a more efficient structured encryption scheme, as I have shown in my recent work. Having many such encryption schemes as building blocks can allow many different algorithms to become more secure and efficient.
The goal would be to have a generic and modular approach that can combine those structured encryption schemes to implement various machine learning tasks.

I would also like to explore the possibility of bringing verifiable computation to both structured and graph encryption. Cryptographic researchers have shown some preliminary results on designing verifiable computation for particular homomorphic encryption schemes. However, these theoretical constructions are very inefficient and are very hard to apply to database and data mining applications. I would like to combine the techniques of searchable and graph encryption with the techniques used in verifiable computation. A long-term goal would be to have a practical verifiable encryption scheme for massive datasets that provides both data privacy and integrity, without losing the capability to query the datasets.

Finally, I am interested in exploring techniques to further reduce leakage in privacy-preserving database systems and to mitigate inference attacks on those systems. In addition, I plan to seek better ways of expressing the semantic meaning of leakage and of quantifying the amount of leakage when querying encrypted databases. Many systems have been proposed that support rich SQL queries on encrypted databases by leveraging property-preserving encryption. However, those approaches offer much weaker security guarantees. Therefore, to address these problems, I am determined to further study how to design and construct scalable schemes with stronger semantic security guarantees.

References

[CGKO11] Reza Curtmola, Juan A. Garay, Seny Kamara, and Rafail Ostrovsky. Searchable symmetric encryption: Improved definitions and efficient constructions. Journal of Computer Security, 19(5):895–934, 2011.
[CK10] Melissa Chase and Seny Kamara. Structured encryption and controlled disclosure.
In Advances in Cryptology - ASIACRYPT 2010 - 16th International Conference on the Theory and Application of Cryptology and Information Security, Singapore, December 5-9, 2010, Proceedings, pages 577–594, 2010.
[FMR13] Benjamin Fuller, Xianrui Meng, and Leonid Reyzin. Computational fuzzy extractors. In Advances in Cryptology - ASIACRYPT 2013 - 19th International Conference on the Theory and Application of Cryptology and Information Security, Bengaluru, India, December 1-5, 2013, Proceedings, Part I, pages 174–193, 2013.
[MKNK15] Xianrui Meng, Seny Kamara, Kobbi Nissim, and George Kollios. GRECS: Graph encryption for approximate shortest distance queries. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, CO, USA, October 12-16, 2015, pages 504–517, 2015.
[MMK15] Xianrui Meng, Tarik Moataz, and Seny Kamara. Subgraph mining for encrypted graphs. Working Paper, 2015.
[MO15] Xianrui Meng and Alina Oprea. Privacy preserving clustering. Working Paper, 2015.
[MZK15] Xianrui Meng, Haohan Zhu, and George Kollios. Secure top-k query processing on encrypted databases. CoRR, abs/1510.05175, 2015.
[ZMK14] Haohan Zhu, Xianrui Meng, and George Kollios. Privacy preserving similarity evaluation of time series data. In Proceedings of the 17th International Conference on Extending Database Technology, EDBT 2014, Athens, Greece, March 24-28, 2014, pages 499–510, 2014.