Research Statement
Xianrui Meng
In today’s internet, with the advent of cloud computing, there is a natural desire for enterprises,
organizations, and end users to outsource increasingly large amounts of data to a cloud provider.
Therefore, ensuring security and privacy is becoming a significant challenge for cloud computing,
especially for users with sensitive and valuable data. In addition, the benefits of big data - including
advances in machine learning, e-commerce, social sciences, and marketing - are well-publicized, but the
various privacy and security problems it presents have received less attention from the public at large.
My work as a security and data science researcher focuses on the development of privacy-enhancing
technologies that minimize the amount of data being revealed when outsourcing massive datasets in
cloud-based environments. In particular:
My primary research objective is to design provably secure and scalable schemes for
encrypting large-scale databases without losing the ability to query them.
In my research, I have been closely collaborating with database and data mining researchers to
design cryptographic systems that are not only secure but also efficient and can be easily deployed
in practice. In the remainder of this statement, I briefly discuss specific problems that I studied and
some of my ongoing work. I then offer a few concluding remarks about my future research agenda.
Research Overview
Graph Encryption Schemes. Graph databases that store, query, and manage large graphs have
received increasing interest recently due to many large-scale database applications that can be modeled
as graph problems. Example applications include storing and querying large Web graphs, online social
networks, biological networks, RDF datasets, and communication networks. Graph encryption encrypts
graph data in such a way that it can be privately queried. Ideally, a graph encryption
scheme should encrypt a graph with support for various graph queries, such as nearest neighbor queries,
shortest distance queries, and node similarity queries. Given the ubiquity and importance of graph
data, it comes as no surprise that such a graph encryption scheme would have numerous potential
applications.
In recent work with Kamara, Nissim, and Kollios [MKNK15], I introduce and formalize graph
encryption schemes based on the concepts of searchable [CGKO11] and structured encryption [CK10].
In addition, I investigate shortest distance queries on encrypted graphs. Shortest distance queries are
arguably one of the most fundamental graph operations and have a wide range of applications. They
can also be easily applied to other interesting graph query problems, such as centrality and nearest
neighbor queries. Despite the emergence of many privacy-preserving encryption schemes and cryptographic
technologies, an efficient graph encryption scheme that supports shortest distance queries remained an open
problem. Therefore, I aimed to answer the following question: can we design a practical and provably
secure system that supports shortest distance queries on encrypted graphs?
To resolve the problem, I proposed GRECS, a graph encryption framework that consists of three
different schemes and supports approximate shortest distance queries. By leveraging distance
oracle structures, I present several encryption schemes with different trade-offs. These encryption
schemes do not affect the distance approximation of the underlying distance oracles and, at the same
time, provide strong security guarantees under the formal security definition for graph encryption. The first scheme is computationally efficient, while the second is communication-efficient,
adopting a specialized homomorphic encryption scheme. I also propose a third scheme that is both
computationally and communication-efficient but leaks a small amount of controlled information.
The experimental results of running the encryption schemes on many real-world graph datasets are very
promising. They demonstrate that the constructions are extremely efficient and scalable compared to
state-of-the-art solutions. In fact, in most cases, GRECS reports better approximations than the
original distance oracle, which makes the reported shortest distances quite accurate. For example,
for 10,000 randomly generated queries, roughly 50% of the distances returned are the true shortest
distances.
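As a rough plaintext illustration of the kind of distance oracle mentioned above, the following sketch builds per-node sketches of (seed, distance) pairs and answers a query from two sketches alone. It is a minimal sketch under stated assumptions, not the GRECS construction: the graph is unweighted and undirected, nothing is encrypted, and the names `bfs_nearest_seed`, `build_sketches`, and `approx_distance` are hypothetical.

```python
from collections import deque


def bfs_nearest_seed(graph, seeds):
    """Multi-source BFS: map every reachable node to (nearest seed, hop distance)."""
    nearest = {s: (s, 0) for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        seed, dist = nearest[node]
        for nbr in graph[node]:
            if nbr not in nearest:
                nearest[nbr] = (seed, dist + 1)
                queue.append(nbr)
    return nearest


def build_sketches(graph, seed_sets):
    """For each node, keep the distance to its nearest seed in every seed set."""
    sketches = {v: {} for v in graph}
    for seeds in seed_sets:
        for node, (seed, dist) in bfs_nearest_seed(graph, seeds).items():
            old = sketches[node].get(seed)
            if old is None or dist < old:
                sketches[node][seed] = dist
    return sketches


def approx_distance(sketches, u, v):
    """Upper bound on d(u, v): best detour through a seed both sketches share."""
    common = sketches[u].keys() & sketches[v].keys()
    if not common:
        return None  # the two sketches share no seed, so no estimate is available
    return min(sketches[u][w] + sketches[v][w] for w in common)


# Path graph 0 - 1 - 2 - 3 - 4 (assumed example data).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
one_seed = build_sketches(path, [[2]])
two_seeds = build_sketches(path, [[2], [0]])
```

With the single seed set `[2]`, the query (0, 4) returns the true distance 4, while (0, 1) returns 3 instead of 1; adding the seed set `[0]` tightens the latter to 1. The estimate never falls below the true distance, since every candidate is the length of a real walk through a seed.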
In my ongoing work, I am continuing to study graph encryption schemes for more complex graph
queries. Our techniques can be easily applied to node similarity queries, a graph query type
that indicates how similar two nodes are, based on their distance information in the graph. Moreover,
in my current work [MMK15], I also investigate subgraph mining queries on encrypted graphs, which
can be very useful for many graph analytics tasks. Furthermore, I plan to explore verifiable graph
encryption schemes that provide both privacy and integrity.
Secure Top-k Database Queries. In a different project, I study top-k queries on encrypted
databases. Many previous techniques have been proposed for solving k nearest neighbor queries,
range queries, or keyword queries on encrypted relational databases. However, although top-k queries
are very important in many database applications, no existing work solves top-k
queries both securely and efficiently.
In [MZK15], I construct a secure top-k query processing protocol on encrypted
databases under the non-colluding semi-honest clouds model. This is the first ranking query processing protocol on encrypted relational databases that satisfies Indistinguishability Chosen Query Attack
security, and it therefore offers a strong privacy guarantee. I also formulate and construct several novel
secure sub-protocols, such as secure best/worst score and secure de-duplication, which can be adapted
as stand-alone building blocks for many other applications. I implemented the proposed protocols and
ran a set of experiments on a number of real-world databases. The results show that the protocol is
extremely efficient and has very low computational overhead.
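To give a sense of what best and worst scores mean in top-k processing, here is a plaintext sketch in the style of the classical No-Random-Access (NRA) algorithm. Everything in it is an assumption made for illustration: the statement does not say which top-k algorithm the protocol follows, the aggregation is taken to be a sum, every object is assumed to appear in every list, and `score_bounds` and `nra_top_k` are hypothetical names. No encryption is involved.

```python
def score_bounds(partial, frontier):
    """Bounds on an object's total score.

    partial  : {list_index: score} for the lists where the object was seen
    frontier : last score read from each list (an upper bound on unseen scores)
    """
    worst = sum(partial.values())
    best = worst + sum(f for i, f in enumerate(frontier) if i not in partial)
    return worst, best


def nra_top_k(lists, k):
    """lists: per-attribute lists of (object, score), each sorted by score descending."""
    seen = {}
    ranked = []
    for depth in range(len(lists[0])):
        frontier = []
        for i, lst in enumerate(lists):
            obj, score = lst[depth]
            seen.setdefault(obj, {})[i] = score
            frontier.append(score)
        bounds = {obj: score_bounds(p, frontier) for obj, p in seen.items()}
        ranked = sorted(bounds, key=lambda o: -bounds[o][0])
        if len(ranked) < k:
            continue
        top, rest = ranked[:k], ranked[k:]
        threshold = bounds[top[-1]][0]  # k-th largest worst score
        # Stop once no other object, seen or unseen, can still beat the k-th one.
        if all(bounds[o][1] <= threshold for o in rest) and sum(frontier) <= threshold:
            return top
    return ranked[:k]


# Assumed example data: two attribute lists over objects a, b, c.
by_attr_1 = [("a", 9), ("b", 8), ("c", 1)]
by_attr_2 = [("b", 9), ("a", 7), ("c", 2)]
top_1 = nra_top_k([by_attr_1, by_attr_2], 1)
```

Here the scan stops after reading two entries per list: object `b` has a worst score of 17, which no other object's best score can exceed, so `c` is never examined in full.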
I further investigate and extend my methods to support secure top-k join queries over multiple
encrypted databases. Furthermore, I show that my techniques can be adopted for handling general
secure join queries. The proposed scheme also improves on the security of existing works that adopt
less secure property-preserving encryption schemes. For example, existing work on table join queries on
encrypted databases still relies on deterministic encryption. In [MZK15], I show some preliminary
results of constructing secure joins by adapting a semantically-secure homomorphic encryption scheme
with much less leakage than before.
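The trade-off behind deterministic encryption can be seen in a few lines. The sketch below is an illustration only: it uses a keyed HMAC tag as a stand-in for a deterministic ciphertext of the join column (an assumption, since an HMAC is not decryptable), and the names `det_tag` and `join_on_tags` are hypothetical. Equal plaintexts produce equal tags, which is exactly what lets the server join, and exactly what it learns.

```python
import hashlib
import hmac
from collections import Counter


def det_tag(key, value):
    """Deterministic keyed tag: the same value always maps to the same tag."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def join_on_tags(left, right):
    """Server-side equi-join over (tag, payload) rows, without knowing the key."""
    index = {}
    for tag, payload in left:
        index.setdefault(tag, []).append(payload)
    return [(l, r) for tag, r in right for l in index.get(tag, [])]


key = b"client-secret-key"  # assumed example key, held only by the client
left = [(det_tag(key, "alice"), "L1"), (det_tag(key, "bob"), "L2")]
right = [(det_tag(key, "alice"), "R1"), (det_tag(key, "alice"), "R2")]

joined = join_on_tags(left, right)

# What the server sees even before any join is issued: value frequencies.
frequencies = Counter(tag for tag, _ in right)
```

The join succeeds without the key, but `frequencies` shows that one value occurs twice in the right table. This repetition pattern is the kind of leakage that a semantically secure scheme avoids, because encrypting the same value twice yields unrelated ciphertexts.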
Other Research and Ongoing Work. In my joint work with Zhu and Kollios [ZMK14], we
propose a secure two-party protocol for computing the similarity between two different time series.
Our protocol is practical and can be used to compute the similarity between two time series, one from
the client and the other from the server, without revealing the actual time series to the other party.
In an ongoing project [MO15] in collaboration with Dr. Alina Oprea from RSA Labs, I aim to design a
secure multi-party protocol for privacy-preserving clustering problems. In particular, I seek solutions
that let two or more parties jointly cluster their individual datasets without leaking them to other
untrusted parties. Currently, I am working towards a generic privacy-preserving two-party protocol
for hierarchical clustering.
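For the time-series similarity computation mentioned above, the function that the two parties jointly evaluate can be pictured in plaintext. The statement does not name the similarity measure, so the sketch below assumes dynamic time warping (DTW) with absolute-difference cost purely as an example; `dtw_distance` is a hypothetical name. In a secure two-party protocol, each party would hold only one of the two inputs.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = DTW distance between the first i points of a and first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats its point
                cost[i][j - 1],      # b advances, a repeats its point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


client_series = [1, 2, 3]     # assumed example data held by the client
server_series = [1, 2, 2, 3]  # assumed example data held by the server
similarity = dtw_distance(client_series, server_series)
```

The two example series differ only by a repeated sample, so their DTW distance is 0 even though their lengths differ.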
I also maintain interests in theoretical cryptography. In another joint work with Fuller and Reyzin
[FMR13], we introduce the first computational fuzzy extractor, which is used to derive a stable
and strong cryptographic key from entropic but noisy physical sources. In particular, based on
computational hardness assumptions for lattices, we design a fuzzy extractor in the computational setting.
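The goal of recovering a stable key from a noisy reading can be illustrated with the classical code-offset idea and a repetition code. This is a toy sketch under stated assumptions and is not the lattice-based construction of [FMR13]: readings are lists of bits, the key is supplied by the caller, there is no randomness extraction step, and the names `generate` and `reproduce` are hypothetical.

```python
def xor_bits(a, b):
    return [x ^ y for x, y in zip(a, b)]


def encode(key_bits, rep):
    """Repetition code: repeat every key bit `rep` times."""
    return [bit for bit in key_bits for _ in range(rep)]


def decode(code_bits, rep):
    """Majority vote inside each block of `rep` bits."""
    return [
        int(2 * sum(code_bits[i:i + rep]) > rep)
        for i in range(0, len(code_bits), rep)
    ]


def generate(reading, key_bits, rep):
    """Enrollment: publish helper data that masks the codeword with the reading."""
    return xor_bits(reading, encode(key_bits, rep))


def reproduce(noisy_reading, helper, rep):
    """Recovery: unmask with a nearby reading, then correct errors by majority."""
    return decode(xor_bits(noisy_reading, helper), rep)


# Assumed example data: a 15-bit source reading and a 3-bit key, 5 repeats per bit.
reading = [1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0]
key_bits = [1, 0, 1]
helper = generate(reading, key_bits, rep=5)

noisy = list(reading)
for position in (0, 3, 7, 14):  # flip a few bits to simulate measurement noise
    noisy[position] ^= 1

recovered = reproduce(noisy, helper, rep=5)
```

As long as fewer than half the bits in each block flip, `recovered` equals the enrolled key. How much the helper data reveals about the source is the central security question for such schemes, and this toy makes no attempt to bound it.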
Future Agenda
My research so far has taken preliminary steps toward addressing some security and privacy issues that
arise in databases and data mining. Nevertheless, there are a number of fascinating open problems in
the area that need to be addressed.
In the future, I plan to broaden my understanding of how to provide large-scale data security and
privacy in cloud computing. Below I present a short list of open problems that I plan to explore in
the future.
In the long term, my goal is to design a practical privacy-preserving machine learning system
that enables machine learning algorithms to run over encrypted data. Recently, a number of applied
security and privacy research problems have appeared in data mining and machine learning. Consider
a user with sensitive data who wants to make an inference using a machine learning predictive model
that is held by the cloud, without compromising the user’s private information. To protect data
confidentiality, the user encrypts the data and sends the ciphertexts to the cloud, which runs the machine
learning algorithm over the encrypted data. Existing solutions are mainly focused on specific machine
learning models and rely on some high-degree homomorphic encryption schemes with extremely high
performance overheads. However, by taking advantage of the structure of the original dataset, one can
have a more efficient structured encryption scheme, as I have shown in my recent work. Having many
such encryption schemes as building blocks can allow for many different algorithms to become more
secure and efficient. The goal would be to have a generic and modular approach that can combine
those structured encryption schemes to implement various machine learning tasks.
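The outsourced-inference setting described above can be made concrete with a toy example: a client encrypts its features, and the cloud evaluates a linear model on the ciphertexts without decrypting them. The sketch uses textbook Paillier encryption, which is only additively homomorphic, with tiny hard-coded primes. It is an illustration under stated assumptions, not a secure implementation and not the structured-encryption approach proposed in this statement: features, weights, and bias are small non-negative integers, and all function names are hypothetical.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier keys from tiny fixed primes (insecure, for illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse, requires Python 3.8+
    return n, (lam, mu)


def encrypt(n, m):
    """Enc(m) = (1 + n)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) % n2 * pow(r, n, n2) % n2


def decrypt(n, private_key, c):
    lam, mu = private_key
    return (pow(c, lam, n * n) - 1) // n * mu % n


def encrypted_linear_score(n, enc_features, weights, bias):
    """Cloud side: compute Enc(sum(w_i * x_i) + bias) from Enc(x_i) alone."""
    n2 = n * n
    acc = (1 + bias * n) % n2  # encoding of the bias with trivial randomness
    for c, w in zip(enc_features, weights):
        acc = acc * pow(c, w, n2) % n2  # Enc(x)^w = Enc(w * x); products add plaintexts
    return acc


n, private_key = keygen()
features = [4, 0, 7]          # client's private input (assumed example data)
weights, bias = [3, 1, 2], 5  # model held by the cloud (assumed example data)

enc_features = [encrypt(n, x) for x in features]
enc_score = encrypted_linear_score(n, enc_features, weights, bias)
score = decrypt(n, private_key, enc_score)
```

Only the client, who holds `private_key`, can read the result, which equals 3*4 + 1*0 + 2*7 + 5 = 31. Anything beyond such additive operations needs a more expressive and more expensive homomorphic scheme, which is the overhead the paragraph above refers to.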
I would also like to explore the possibility of bringing verifiable computation to both structured and
graph encryption. Cryptographic researchers have shown some preliminary results on designing verifiable computation for particular homomorphic encryption schemes. However, these theoretical
results are very inefficient and very hard to apply to database and data mining applications.
I would like to combine the techniques of searchable and graph encryption with the techniques used
in verifiable computation. A long-term goal would be to have a practical verifiable encryption scheme
for massive datasets that provides both data privacy and integrity, without losing the capability to
query the datasets.
Finally, I am interested in exploring techniques to further reduce leakage in privacy-preserving
database systems and mitigate inference attacks on those systems. In addition, I plan to seek better
ways of expressing the semantic meanings of leakage and quantifying the amount of leakage when
querying encrypted databases. Many works have proposed supporting rich SQL queries on
encrypted databases by leveraging property-preserving encryption. However, those approaches have
much weaker security guarantees. Therefore, to address these problems, I am determined to further
study how we can design and construct more semantically secure and scalable schemes.
References
[CGKO11] Reza Curtmola, Juan A. Garay, Seny Kamara, and Rafail Ostrovsky. Searchable symmetric encryption: Improved definitions and efficient constructions. Journal of Computer Security, 19(5):895–934, 2011.
[CK10] Melissa Chase and Seny Kamara. Structured encryption and controlled disclosure. In Advances in Cryptology - ASIACRYPT 2010 - 16th International Conference on the Theory and Application of Cryptology and Information Security, Singapore, December 5-9, 2010. Proceedings, pages 577–594, 2010.
[FMR13] Benjamin Fuller, Xianrui Meng, and Leonid Reyzin. Computational fuzzy extractors. In Advances in Cryptology - ASIACRYPT 2013 - 19th International Conference on the Theory and Application of Cryptology and Information Security, Bengaluru, India, December 1-5, 2013, Proceedings, Part I, pages 174–193, 2013.
[MKNK15] Xianrui Meng, Seny Kamara, Kobbi Nissim, and George Kollios. GRECS: graph encryption for approximate shortest distance queries. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, CO, USA, October 12-16, 2015, pages 504–517, 2015.
[MMK15] Xianrui Meng, Tarik Moataz, and Seny Kamara. Subgraph mining for encrypted graphs. Working Paper, 2015.
[MO15] Xianrui Meng and Alina Oprea. Privacy preserving clustering. Working Paper, 2015.
[MZK15] Xianrui Meng, Haohan Zhu, and George Kollios. Secure top-k query processing on encrypted databases. CoRR, abs/1510.05175, 2015.
[ZMK14] Haohan Zhu, Xianrui Meng, and George Kollios. Privacy preserving similarity evaluation of time series data. In Proceedings of the 17th International Conference on Extending Database Technology, EDBT 2014, Athens, Greece, March 24-28, 2014, pages 499–510, 2014.