MURI: Info. Management Group
May 22, 2017
Group Co-Leaders:
• Jiawei Han (UIUC)
• Chris Clifton (Purdue)
• Hillol Kargupta (UMBC)
Collaborators:
• Murat Kantarcioglu (UT-Dallas)
• Shouhuai Xu (UT-San Antonio)
• Ninghui Li (Purdue)
Core Contributors:
• Latifur Khan (UT-Dallas)
• Chengxiang Zhai (UIUC)
Liaisons:
• Ravi Sandhu (UT-San Antonio)
• Anupam Joshi (UMBC)

Core Contributors & Current Ph.D. Students
• Jiawei Han (UIUC): Lu An Tang, Zhijun Yin
• Chengxiang Zhai (UIUC): Yuanhua Lv, Hyun Duk Kim
• Latifur Khan (UTD): Mehedy Masud
• Chris Clifton (Purdue): Mummoorthy Murugesan
• Hillol Kargupta (UMBC): Kamalika Das

General Project Goals
• Provide information management and analysis support for the project
• Major research themes
– Knowledge discovery
– Data integration and fusion
– Measuring and maintaining information quality
– Provenance tracking
– Confidentiality in information management and analysis

Posters Reported in the Kick-Off Meeting
• Plausibly Deniable Search
– Mummoorthy Murugesan and Chris Clifton
• Conforming to Truth with Multiple Conflicting Information Providers on the Web
– Jiawei Han, Xiaoxin Yin, and Philip S. Yu
• User-Centered Adaptive Information Retrieval
– Xuehua Shen, Bin Tan, and ChengXiang Zhai
• Privacy Preserving Distributed Data Mining: A Game-Theoretic Approach
– Kamalika Das and Hillol Kargupta
• Privacy-preserving Data Mining within Anonymous Credential Systems
– Shouhuai Xu
• Novel Class Detection in Concept-Drifting Data Streams in a Shared Environment
– Mohammad M. Masud, Jing Gao, Latifur Khan, Jiawei Han, and Bhavani Thuraisingham

On-Going Research Projects
• Novel Class Detection in Concept-Drifting Data Streams in a Shared Environment
– Mohammad M. Masud, Jing Gao, Latifur Khan, Jiawei Han, and Bhavani Thuraisingham (UTD/UIUC)
• Confidentiality Preserving Data Cubes
– Jiawei Han, Lu An Tang, and Bolin Ding (UIUC)
• Scalable Distributed Privacy-Preserving Local Algorithms for Large Peer-to-Peer Data Mining: A Game Theoretic Approach
– Hillol Kargupta and Kamalika Das (UMBC)
• Confidential peer-to-peer extension to personalized search
– Chengxiang Zhai, Chris Clifton, and Mummoorthy Murugesan (UIUC/Purdue)
• Information quality: Understanding and identifying provenance
– ChengXiang Zhai and Jiawei Han (UIUC)
• SPDU: A Secure Provenance Management Framework
– Shouhuai Xu and Ravi Sandhu (UTSA)

Discovery in Data Streams for Security Protection
Novel Class Detection in Concept-Drifting Data Streams in a Shared Environment
• Novelty/anomaly detection: a major issue in many applications, especially in a streaming environment
• Goal: detect new classes in data streams
• Approach: efficiently handle the novel class detection task in the presence of concept drift and multiple classes
• The approach is non-parametric: it does not assume any underlying distribution of the data
• Comparisons with state-of-the-art stream classification techniques demonstrate the superiority of our approach
• The technique can be extended to a distributed environment with multiple sources

Confidentiality-Preserving Data Cubes
• Confidentiality-/privacy-/sensitivity-preserving data cubes
• Researchers have been studying confidentiality-preserving database systems (for query processing) and confidentiality-preserving data mining systems
• We propose to investigate confidentiality-preserving data cubes for multidimensional analysis of data warehouses
• Goal: work out mechanisms that let one access maximal information in data cubes for information understanding while losing minimal private information, even under different combinations of OLAP queries
• Extensions: how knowledge discovery can help preserve confidentiality

Data and Information Integration for Security Protection
• Data fusion: merge/integrate the same objects with different names or identities
• Data distinction: distinguish different objects with identical names
• Information integration by information network analysis
– E.g., medical records, patients, medical treatments
• Veracity analysis to conform to the truth given conflicting information provided by multiple websites or other information providers
• Correlation analysis to reduce redundancy and control information disclosure

Data and Information Access and Management for Security Protection
• Data separation vs. data integration and their role in sensitive information disclosure and correlation discovery
• Privacy-aware indexing to support fast, efficient data access
• Sensitivity-aware query processing and data publishing
• Any other data/information management and analysis issues needed by other groups in the project

Scalable Distributed Local Algorithms for Peer-to-Peer Knowledge Discovery from Sensitive Data
Hillol Kargupta
University of Maryland, Baltimore County
www.cs.umbc.edu/~hillol
www.agnik.com
Acknowledgement: Chengxiang Zhai, Kamalika Das, Kanishka Bhaduri, Kun Liu

Scalable Privacy-Preserving Information Assurance
• Challenges in scalable knowledge discovery
– Scaling in large asynchronous distributed environments
– Confidentiality/privacy-preserving data analysis
– Heterogeneous policies and strategies
• Applications
– Distributed collaboration
– Distributed search and information retrieval

Motivation: Secure Multi-Party Sum Computation
• Each party has a number
• Compute the sum without divulging the numbers
• R is uniformly distributed in [0, N-1]
[Figure: three parties holding v1, v2, and v3 pass along masked partial sums: z1 = (R + v1) mod N, z2 = (z1 + v2) mod N, z3 = (z2 + v3) mod N]
• Consider a sequence of secure sum operations.
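The secure sum slide can be illustrated with a minimal single-process sketch. It follows the slide's equations z1 = (R + v1) mod N, z2 = (z1 + v2) mod N, z3 = (z2 + v3) mod N. The final step, in which the initiating party removes R from the last message, is not shown on the slide and is assumed here as the usual way to finish the protocol. The function name `secure_sum` and the returned message list are hypothetical, chosen for illustration; a real deployment would pass each message over the network to the next party.

```python
import secrets


def secure_sum(values, modulus):
    """Simulate a secure sum in one process (illustrative sketch only).

    values  -- the private numbers v1, v2, ... held by the parties
    modulus -- N; the true sum must lie in [0, N-1]
    Returns (total, messages), where messages[k] is the masked partial
    sum z_(k+1) that party k+1 forwards to the next party.
    """
    # Demo-only sanity check: a real protocol could not inspect the values.
    if any(v < 0 for v in values) or sum(values) >= modulus:
        raise ValueError("the true sum must lie in [0, modulus - 1]")

    # The initiating party draws R uniformly from [0, N-1].
    r = secrets.randbelow(modulus)

    running = r
    messages = []
    for v in values:
        # Each party adds its own number to the masked value it received.
        running = (running + v) % modulus
        messages.append(running)

    # Assumed final step: the initiator subtracts R to unmask the sum.
    total = (running - r) % modulus
    return total, messages


if __name__ == "__main__":
    total, messages = secure_sum([5, 12, 7], modulus=100)
    print("sum:", total)
    print("masked messages seen on the wire:", messages)
```

Because R is uniform, each forwarded value is itself uniformly distributed in [0, N-1], so a single party learns nothing from the message it receives. Two colluding neighbors can still recover the number of the party between them.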

Locality Sensitive Distributed Algorithms
• Global algorithms: communicate with the entire network
– Every node needs to maintain information about the entire network
– Maintaining this information is resource intensive for large networks
• Local algorithms: communicate only with the local neighborhood
– Bounded-communication local algorithms

Distributed Sum Computation: A Local Approach
• Each node has a number $x_i[0]$
• Compute the sum
• Update $x_i[t]$ using the following rule, where $\Gamma_i$ is the set of neighbors of node $i$:
$x_i[t] = x_i[t-1] + \sum_{j \in \Gamma_i} \left( x_j[t-1] - x_i[t-1] \right)$
• Asymptotically converges to the global sum

Optimization, Games, and Privacy-Preserving Knowledge Discovery
• Multi-party privacy preservation as an optimization problem
• Multi-party, multi-objective optimization
• Blending game theory and mechanism design
• Asynchronous algorithms for achieving equilibrium states

Privacy/Confidentiality Preservation: An Optimization Perspective
• Multi-objective optimization perspective
– Policies
– Strategies
– Performance
• Distributed games for optimizing utility functions

Summary of the Approach
• Local asynchronous distributed knowledge discovery algorithms that preserve privacy/confidentiality
• Distributed search and information retrieval algorithms
• Multi-party optimization perspective of privacy/confidentiality preservation and design of distributed game-theoretic mechanisms

Example: Cross-Domain Network Threat Detection
• Correlating threats from different network domains
Copyright, Agnik

Motivation: P2P Search Engine
• What is the most visited news page in the network today?
• Has anybody found a cheap store to buy a digital camera?
• What is the best search key to search for “Child Care”?
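Returning to the "Distributed Sum Computation: A Local Approach" slide, the neighbor-only update rule can be sketched as follows. Several things here are assumptions rather than content from the slides: the step size `rho` (added so the iteration is stable on this example), the synchronous rounds, the ring topology, and the function names. In this sketch each node's value converges to the network average, and the sum is recovered by multiplying by the node count, which is assumed to be known. How the slides obtain the sum directly is not stated.

```python
def local_update(x, neighbors, rho):
    """One synchronous round: every node moves toward its neighbors' values.

    x         -- list of current node values x_i[t-1]
    neighbors -- dict mapping node index i to the list of its neighbors
    rho       -- step size (an assumption of this sketch)
    """
    return [
        xi + rho * sum(x[j] - xi for j in neighbors[i])
        for i, xi in enumerate(x)
    ]


def estimate_sum(x0, neighbors, rho=0.25, rounds=200):
    """Run the local rule and return each node's estimate of the global sum."""
    x = list(x0)
    for _ in range(rounds):
        x = local_update(x, neighbors, rho)
    n = len(x0)  # assumed known to every node in this sketch
    return [n * xi for xi in x]


if __name__ == "__main__":
    # Hypothetical 4-node ring: each node talks only to its two neighbors.
    ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    estimates = estimate_sum([1.0, 2.0, 3.0, 4.0], ring)
    print(estimates)  # every entry should be close to the true sum, 10
```

On a symmetric graph the update preserves the total of all node values, which is why the values settle at the average. The step size must be small relative to the node degrees, otherwise the values can oscillate instead of converging.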

Useful Browser Data
• Web-browser history
• Browser cache
• Click-stream data stored at the browser (browsing pattern)
• Search queries typed into the search engine
• User profile
• Bookmarks

Challenges
• Indexing, clustering, and data analysis in a decentralized, asynchronous manner
• Scalability
• Privacy

User-Centered Adaptive Information Retrieval
[Figure: a personalized search agent sits between the user and several search engines, drawing on the Web, viewed Web pages, query history, desktop files, and email; the example query is “java”.]

User-Centered Adaptive IR
• A novel retrieval strategy emphasizing
– user modeling (“user-centered”)
– search context modeling (“adaptive”)
– interactive retrieval
• Implemented as a personalized search agent that
– sits on the client side (owned by the user)
– integrates information around a user (1 user vs. N sources as opposed to 1 source vs. N users)
– collaborates with other agents
– goes beyond search toward task support

Reranking of Search Results with UCAIR Toolbar

Research Agenda
• Develop a scalable methodology for knowledge discovery from multi-party data
• Design local asynchronous algorithms with bounded communication
• Multi-objective distributed optimization, mechanism design, and local algorithms
• Designing the next generation of privacy-preserving distributed knowledge discovery algorithms

Research Agenda
• Privacy-preserving user modeling: P2P information recommendation
– How can we model a user’s information need while preserving privacy?
– How can we aggregate user models and information needs to control privacy?
• P2P architecture: flexible information sharing
– What is the right protocol for information recommendation?
– How can collaborative filtering algorithms be extended to protect user privacy?
• Collaborative search
– How can we match information needs with information content at different levels of representation?

From Collaborative Query/Filtering to Information Push
Chengxiang Zhai and Chris Clifton (UIUC/Purdue)
• Personalized search: profile of information needs
– Profile based on prior searches, without requiring an explicit definition of the profile
– Assists information sources in identifying the need to share
• Challenge: the profile or search may be sensitive
– May not be possible to reveal it to the information source (unless the source has the needed information?)
• Research thrusts:
– Turning personalized search into profiles
– Matching information to profiles without disclosing either

SPDU: A Secure Provenance Management Framework
Shouhuai Xu and Ravi Sandhu (UTSA)
• Security of provenance management is critical to many applications, including assured information sharing
• Currently, little is known about the security aspects of provenance management
• We propose investigating a comprehensive framework for secure provenance management, as well as supporting architectures and mechanisms for realizing the framework

SPDU
Shouhuai Xu and Ravi Sandhu
• A comprehensive framework for securing provenance and the corresponding information
– We cannot talk about provenance without touching on what the provenance is for (i.e., both the data and their provenance are targets of protection)
• Supporting architectures and mechanisms for realizing the framework

SPDU framework
• The above challenges call for a novel framework for secure provenance management.
• We propose an SPDU framework for this purpose.
– S stands for Source trustworthiness management
– P stands for Processing trustworthiness management (S and P together: information trustworthiness management)
– D stands for Dissemination management
– U stands for Usage management
• SPDU is application-neutral: it allows plug-and-play application-specific modules (e.g., semantic similarity between two documents)
• SPDU covers the whole lifecycle of information sharing
[Figure: lifecycle stages Source, Processing (recursive), Dissemination, Usage]

Eight facets of SPDU
[Figure: secure provenance management surrounded by eight facets]
• Source accountability
• Processing accountability
• Dissemination accountability
• Usage accountability
• Source privacy
• Processing privacy
• Dissemination privacy
• Usage privacy

Information Quality: Understanding and Identifying Provenance
ChengXiang Zhai and Jiawei Han (UIUC)
• Credibility of information, particularly information presumed to be from multiple sources, is a challenging issue
– Are multiple reports independent confirmation of the same event?
– Based on a common report?
– Reports of different events?
• Propose to use data mining techniques to identify similarities and differences in information that is apparently from different sources, in order to estimate the likelihood that the data come from a single source or from independent sources, and concern the same event or multiple events
• Propose to develop novel text mining algorithms to analyze "information genealogy" in large amounts of text data from multiple sources and to summarize contradictory opinions on a topic

Summarizing Contradictory Information
• Given a set of text articles from different sources with contradictory information, how can we help analysts digest the information?
– Problem 1: Semantic integration of information from multiple sources
– Problem 2: Detection of contradictory information
– Problem 3: Summarization of contradictory information
• Techniques to explore:
– text mining with probabilistic models
– information extraction (e.g., entity/relation extraction)

Questions for YOU!
• Other data analysis / global statistical model needs?
– Data quality? Lifecycle?
• What sort of global statistical models would be of interest to intelligence analysts?
– Models that transcend data silos
• Scenarios for testing
– Sample/surrogate data to support scenarios

Thanks and Questions