Open Sources -- Intelligence: The Good, The Bad, The Ugly Challenges
CSCE 727 - Farkas

Reading
• The Good: Rasmus Rosenqvist Petersen and Uffe Kock Wiil. 2011. Hypertext structures for investigative teams. In Proceedings of the 22nd ACM conference on Hypertext and hypermedia (HT '11). ACM, New York, NY, USA, 123-132.
• The Bad: C. Farkas and A. Stoica, “Correlated Data Inference in Ontology Guided XML Security Engine,” Proc. of IFIP 17th WG 11.3 working conference on Data and Application Security, 2003.
• The Challenges: Joseph V. Treglia and Joon S. Park. 2009. Towards trusted intelligence information sharing. In Proceedings of the ACM SIGKDD Workshop on CyberSecurity and Intelligence Informatics (CSI-KDD '09)

The Good: Support for Data Integration and Analysis

Intelligence Analysis
• Law Enforcement Investigations
• Investigative teams:
  – Collect, process, analyze information related to a specific target
  – Disseminate findings
• Need an automated tool to support these activities

Application Areas
• Policing
  – Reactive nature
  – Incident-driven response
• Counterterrorism
  – National security
  – Proactive
  – Covert operations
• Investigative journalism
  – “Wrongdoing” of organizations, influential individuals, etc.

Knowledge Management
• Acquisition: collect and process data
  – Traditional methods
  – Artificial Intelligence: Machine Learning
• Synthesis: create model of the target
• Sense making: extract useful information
• Disseminate findings: appropriate representation for the appropriate audience

Case Study
• Kidnapping of Daniel Pearl, Wall Street Journal bureau chief, in 2002
• The Pearl Project, Georgetown University, http://pearlproject.georgetown.edu/press_pearlrelease.html
• Complex mapping of people, situations, locations, etc. to find kidnappers

Technology Involved in Visualizing Information
• White Board – link chart
  – Useful to model entities, their attributes, and relationships
  – Complex data types (text, images, symbols, etc.)
  – Becomes complex
  – Difficult to share

Computer Support – Functionality
• Acquisition: import, drag-drop, cut/paste
• Synthesis: add/modify/delete entities, relations, restructure, group, collapse/expand, brainstorming
• Sense-making: retracing, creating hypotheses and alternative interpretations, prediction, exploring perspectives, decision making
• Dissemination: storytelling, report generation

Hypertext Structuring Mechanism
• Associative structures – extended to handle composites (supports synthesis)
• Spatial structures – handle emerging and dynamic structures over time
• Taxonomy structures – support classification tasks
• Issue-based structures – support argumentation and reasoning
• Annotation and metadata structures – add semantics

Past
[Figure] Copyright: Rasmus Rosenqvist Petersen and Uffe Kock Wiil. 2011. Hypertext structures for investigative teams. In Proceedings of the 22nd ACM conference on Hypertext and hypermedia (HT '11). ACM, New York, NY, USA, 123-132.

With the CrimeFighter Investigator
[Figure] Copyright: Rasmus Rosenqvist Petersen and Uffe Kock Wiil. 2011. Hypertext structures for investigative teams. In Proceedings of the 22nd ACM conference on Hypertext and hypermedia (HT '11). ACM, New York, NY, USA, 123-132.

What Would be Better?
• Automated Data Collection
• Semantic-based Data Integration
• Intelligent Data Analysis
• Assurance of results

The Bad: Unauthorized Disclosure

The Bad: A. Stoica and C. Farkas, “Ontology guided Security Engine,” Journal of Intelligent Information Systems, 23(3): 209-223, 2004.
(http://www.cse.sc.edu/~farkas/publications/j5.pdf)
Computer Science and Engineering

Semantic Web
• Open, dynamic environment
• Large number of users, agents, resources
• Semantic tools
• Autonomous agents
• Machine-understandable data semantics
• Computers exchange information transparently on behalf of the user

DOES INFERENCING ON THE SEMANTIC WEB CREATE A SECURITY PROBLEM?

Motivation 1: Simulation Exploitation Using Open Source Information
• Objective: US Government would like to share limited simulation software with friendly countries.
  – Can this software be used to explore the capabilities of US weaponry?
  – Can sufficient information be found from public sources to create such a simulation?
• Findings:
  – Most of the information needed for the simulation was available on the Internet.
  – Human aid was needed to combine the available information

Motivation 2: Homeland Security
• Objective: Hide the location of water reservoirs supplying military bases to limit terrorist activities.
  – Can the location of a reservoir of a military base be found from public data on the Internet?
• Findings:
  – The location of a military base and the water reservoirs of that region are available on the Web.
  – Human aid was needed to combine the available information

The Inference Problem
• General Purpose Database: Non-confidential data + Metadata → Undesired Inferences
• Semantic Web: Non-confidential data + Metadata + Computational Power + Connectivity → Undesired Inferences

The Inference Problem
• Given
  • a set of confidential information,
  • a large amount of public data, and
  • semantic relationships of the public data.
• Is it possible to deduce the confidential information from the semantically enhanced public data?
• Security violation = disallowed data can be deduced from public data

Ontology Guided XML Security Engine (Oxegin)
[Diagram labels: Public Organizational Data; Public User; Ontology; Web Data; Confidential; Replicated Data Inf.; Correlated Data Inf.; Oxegin]

Correlated Data Inference
• Finds confidential information from public data (sensitive associations)
• Inference guidance:
  – Ontology concept hierarchy
  – Structural similarity of public data
• Features of similarity
  – Levels of abstraction for each node
  – Distance of associated nodes from association root
    • Similarity of the distances
    • Length of the distance
  – Similarity of sub-trees originating from correlated nodes

Associated Nodes
• Association similarity
  – Distance of each node from the association root
  – Difference of the distance of the nodes from the association root
  – Similarity of the sub-trees originating at nodes
• Example:
  [Diagram labels: XML document: Air show (Public), with nodes fort and address; Inference Association Graph: Public, AC, with nodes fort and address]

Concept Generalization
• Ontology concept hierarchy
  – Normalized weight of concepts (more specific concept, higher weight)
  – Concept abstraction level
  – Range of allowed abstractions
• Example:

  Concept                  Abstraction level   Weight    Normalized weight
  Object[].                OAL=0               WGT=1     OP=1/50
  waterSource :: Object    OAL=1               WGT=15    OP=15/50
  basin :: waterSource     OAL=2               WGT=1     OP=1/50
  place :: Object          OAL=1               WGT=15    OP=15/50
  district :: place        OAL=2               WGT=1     OP=1/50
  address :: place         OAL=2               WGT=1     OP=1/50
  base :: Object           OAL=1               WGT=15    OP=15/50
  fort :: base             OAL=2               WGT=1     OP=1/50

Correlated Inference
• Public: fort, address
• Public: basin, district
• Ontology: Object[]; waterSource :: Object; basin :: waterSource; place :: Object; district :: place; address :: place; base :: Object; fort :: base
• Confidential: base, water source
• Can the confidential association be inferred from the public ones?

Correlated Inference (cont.)
• Public: fort, address
• Public: basin, district
• Ontology: Object[]; waterSource :: Object; basin :: waterSource; place :: Object; district :: place; address :: place; base :: Object; fort :: base

Correlated Inference (cont.)
• Public: fort, address (generalized to base, place)
• Public: basin, district (basin generalized to water source)
• Ontology: Object[]; waterSource :: Object; basin :: waterSource; place :: Object; district :: place; address :: place; base :: Object; fort :: base
• Confidential: base, water source

Inference Removal
• Relational databases
  – Database design time: redesign database
  – Query processing time: modify/refuse answer
• Web inferences
  – Problems:
    • Cannot redesign public data outside of the protection domain
    • Cannot modify/refuse an answer for an already published web page
  – Possible solutions:
    • Withhold data: do not publish any public data that may lead to inferences.
    • Publish confusing data: publish data that creates confusion in contrast with existing publicly available data

The Ugly Challenges
Technology Support

Technical Influences – Interoperability
• Heterogeneous data
  – Unstructured data, semi-structured data, structured data
• Representation of data semantics
  – Schema languages, taxonomies, ontologies
• Policy compliance
  – Policy languages, expressive power, implementation

Technical Influences – Availability
• Survivability
  – Response
  – Critical environments
• Open vs. protected
• Redundancy

Technical Influences – Control
• Control, monitor, and manage all usage
• Track dissemination of information
• Workflow management
  – Policy, trust, efficiency, local vs. global properties

Social Influences
• Trust
  – People and agencies
• Shadow network
  – Conflict of interest
  – Self-interest
• Criticality
  – The greater the threat, the greater the likelihood of information sharing

Legal Influences
• Policy Conflict and Competition
  – Agency policy
  – Agencies may compete for the same resources – need to maintain advantage
• Governance
  – No universal policy on information sharing (federal, state, local, tribal, etc.)
  – International law
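
Worked example: concept generalization and correlated inference
The Concept Generalization and Correlated Inference slides can be illustrated with a minimal Python sketch. It is not the Oxegin implementation: the names ONTOLOGY, normalized_weights, generalize, and infer_association are hypothetical, and the inference rule (generalize each node by one level, then link the two concepts left over when the pairs share exactly one generalized concept) is a simplifying assumption that ignores the structural-similarity features listed earlier. The concept table is copied from the Concept Generalization example.

```python
from fractions import Fraction

# (concept, parent, abstraction level OAL, weight WGT), copied from the slide table.
ONTOLOGY = [
    ("Object", None, 0, 1),
    ("waterSource", "Object", 1, 15),
    ("basin", "waterSource", 2, 1),
    ("place", "Object", 1, 15),
    ("district", "place", 2, 1),
    ("address", "place", 2, 1),
    ("base", "Object", 1, 15),
    ("fort", "base", 2, 1),
]


def normalized_weights(ontology):
    """Divide each concept weight by the sum of all weights (50 for this table)."""
    total = sum(weight for _, _, _, weight in ontology)
    return {name: Fraction(weight, total) for name, _, _, weight in ontology}


def generalize(concept, ontology, steps=1):
    """Climb the concept hierarchy by up to `steps` levels, stopping at the root."""
    parent = {name: par for name, par, _, _ in ontology}
    for _ in range(steps):
        if parent[concept] is None:
            break
        concept = parent[concept]
    return concept


def infer_association(pair_a, pair_b, ontology):
    """Assumed rule: if the generalized pairs share exactly one concept,
    return the two remaining concepts as a candidate association."""
    gen_a = {generalize(c, ontology) for c in pair_a}
    gen_b = {generalize(c, ontology) for c in pair_b}
    shared = gen_a & gen_b
    if len(shared) != 1:
        return None
    linked = tuple(sorted((gen_a | gen_b) - shared))
    return linked, next(iter(shared))


weights = normalized_weights(ONTOLOGY)
# Public associations from the Correlated Inference slides.
result = infer_association(("fort", "address"), ("basin", "district"), ONTOLOGY)
print(result)
```

Under these assumptions, the two public associations (fort, address) and (basin, district) both generalize to a pair containing place, so the sketch links base with waterSource, which is the confidential association shown on the slides.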