Data Fusion and Semantic Web: Meta-Models of Distributed Data and Decision Fusion. Project Report

Vladimir Gorodetski, Oleg Karsaev, Vladimir Samoilov
Intelligent System Laboratory of the St. Petersburg Institute for Informatics and Automation
E-mail: {gor, ok, samovl}@mail.iias.spb.su
http://space.iias.spb.su/ai/english/gorodetski.htm
2nd Semantic Web Mining Workshop at ECML/PKDD-2002, August 2002, Helsinki, Finland

Title of the Project
"Autonomous Information Collection, Knowledge Discovery Techniques and Software Tool Prototype for Knowledge-Based Data Fusion"
Project from the European Office of Aerospace Research and Development (EOARD) – AFRL/IF (USA) (December 2000 - December 2003)

Outline of the Project Presentation
1. Outline of the Data and Information Fusion problems
2. Project research objectives
3. Examples of case studies and applications used
4. Ontology-centered meta-model of data sources
5. Meta-model of decision fusion
6. Multi-agent architecture
7. Conclusion

Tasks and Applications of Data and Information Fusion

Application Fields
Critical areas of human society: security, life support, security of critical state infrastructures, large-scale logistics, natural and man-made disasters, etc.

Examples of Applications
• Assessment and prediction of situations;
• Resource management and rescue operation planning in large-scale natural and man-made disasters;
• Decision making and planning of rescue operations in systems like US 911;
• Situational awareness and prediction for terrorist intents and anti-terrorist activity planning;
• Military situation assessment;
• Safeguarding of critical plants such as nuclear power stations, electrical power grids, etc.

Information Fusion: Definition
"…data fusion is a formal framework in which means and tools for the alliance of data originating from different sources are expressed. It aims at obtaining information of greater quality; the exact definition of "greater quality" will depend on the application."

[Figure: JDL (Joint Directors of Laboratories) model, USAF. Blocks shown: distributed data sources (Sensor 1, Sensor 2, …, Sensor N); distributed information sources; Data Base Management System (Support DB, Fusion DB); Human-Computer interface; sensor management, resource management; and the fusion levels:
Level 0 - Pre-processing of sensor data
Level 1 - Object assessment
Level 2 - Situation assessment
Level 3 - Impact assessment
Level 4 - Process refinement
Level 5 - User refinement
Areas of current and future research projects are highlighted in yellow. Source: Erik Blasch, Fusion-2002, July 2002, Annapolis, USA.]

Project Research Objectives
Development of a DF software tool that supports the design (first of all, learning) and implementation of a broad spectrum of DF applications. In particular, the project covers:
• Development of ontology-based meta-models of data sources, a meta-model of decision fusion, and a conceptual model of the DF software tool;
• Development of a multi-agent architecture;
• Design and implementation of a broad spectrum of applications.

Examples of Case Studies and Applications Used in the Project
Case studies
- KDD Cup99 dataset: preprocessed relational data specifying an intrusion detection task
  http://kdd.ics.uci.edu/databases/kddcup99.html
- Landsat Multi-Spectral Scanner image dataset
  http://www.dfc-grss.org/data/grss_dfc_0010.zip
- STULONG dataset: Longitudinal Study of Atherosclerosis Risk Factors
  http://euromise.vse.cz/challenge/en/projekt/index.php
Application to be used in debugging and validation of MAS DK-DF
- Intrusion detection learning system (project also funded by EOARD/AFRL)

Subtasks of the Project Matching the Semantic Web Mining Area
1. Design and implementation of a meta-model of data sources, required because the data to be fused are heterogeneous and distributed.
2. Design and implementation of a meta-model of distributed learning.

Multiplicity of Data Sources Presenting User's Activity in Intrusion Detection System
[Figure: data sources of an intrusion detection system. Host-based sources: OS audit trail (auditing subsystem of OS, filtered OS audit trail); log of commands run by users plus resource; log of all user logins/logouts and system startups and shutdowns; log of all login failures; System programs 1-3; DNS, HTTP, Telnet, FTP and Mail services with their logs. Each log is processed by an SPP (statistical processing program) that produces statistical data (statistical data sets 1-3; OS audit trail, DNS, HTTP, Telnet, FTP and Mail statistical data). Network-based sources: network traffic captured by Tcpdump (TCPDUMP/WINDUMP) as network packets with IP, ICMP and UDP/TCP headers carrying DNS, HTTP, SMTP, TELNET and FTP data, processed by an SPP into Tcpdump statistical data.]

Interrelation of Semantic Web and Ontology-oriented Research within the Project
• Semantic Web research considers the development and standardization of ontology specification languages (XML, RDF, DAML+OIL), ontology-based query languages, ontology editors, etc.
• Semantic Web Mining considers specific problems of ontology design technology for (Web-based) Data Mining systems.
• Any DF system technology presupposes (Web-based) distributed Data Mining and KDD, which is why it is a subarea of Semantic Web Mining.
• Ontology-based Data and Information Fusion system design poses a number of specific technological problems. The most important of them is a technology for distributed design of a distributed ontology.

What is distributed design of distributed ontology?
[Figure: Data Sources Meta-model. Each sensor and data source is served by a data source management agent and a Data Source Manager; the meta-level holds the Meta-data manager ("KDD Master" Agent) and the ontology-based meta-model of data sources.]
Meta-model = Ontology + Data source models at meta-level supporting a unified view of data of particular sources

DF system ontology
[Figure: Tower of DF application ontology components. Layers shown: DF problem ontology; shared component of application ontology; private components of application ontology of data source 1, data source 2, …, data source k.]

Distributed Ontology and Protocols for Distributed Ontology Design
[Figure: for each data source i (i = 1, …, k) the diagram shows Data Source i, the DS-i management agent, the KDD agent of source i, and Agent i, which holds the shared component of application ontology, the private component of application ontology-i, protocols and functions. The meta-level KDD agent ("KDD Master" Agent) holds the problem and shared components of application ontology, protocols and functions.]

Particular Tasks to Be Solved on the Basis of Meta-model of Data Sources
• Providing unambiguous (monosemantic) understanding of the terminology used in data specification by distributed analysts;
• Solution of the entity identification problem;
• Providing consistency of data representation (in case the same attributes are represented differently in different data sources);
• Providing a gateway that makes interaction between the ontology and the distributed databases possible;
and several other tasks.

Meta-model of Data Sources: Ontology + Protocols => Monosemantic understanding of terminology
Monosemantic understanding of terminology among DF system components is provided by the shared vocabulary that the distributed entities of the DF system use for communication. This excludes different naming of the same entities and their properties in different sources, as well as identical naming of different entities in different data sources, thus providing integrity and consistency of the shared vocabulary.
Protocols
Support distributed collaborative design of a coherent ontology by distributed analysts.
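
As an illustration of the shared-vocabulary idea above, here is a minimal Python sketch of how naming conflicts between sources could be detected. It is not the project's implementation: the source names, local attribute names and shared terms are all hypothetical, and the mapping tables are assumed to be written by the local analysts.

```python
from collections import defaultdict

# Hypothetical local-to-shared term mappings, one dict per data source.
# All names below are illustrative and are not taken from the project.
SOURCE_MAPPINGS = {
    "tcpdump": {"src_ip": "SourceAddress", "dst_ip": "DestinationAddress", "svc": "Service"},
    "os_audit": {"remote_host": "SourceAddress", "user": "UserName", "svc": "ProcessName"},
}

# Hypothetical shared vocabulary agreed on by the analysts.
SHARED_VOCABULARY = {"SourceAddress", "DestinationAddress", "Service", "UserName"}


def check_vocabulary(shared, mappings):
    """Return (unknown, synonyms, homonyms) found in the source mappings."""
    unknown = []                   # mappings that target a term outside the shared vocabulary
    by_shared = defaultdict(set)   # shared term -> local names used for it
    by_local = defaultdict(set)    # local name -> shared terms it denotes
    for source, mapping in mappings.items():
        for local, term in mapping.items():
            if term not in shared:
                unknown.append((source, local, term))
            by_shared[term].add(local)
            by_local[local].add(term)
    # Same entity named differently in different sources.
    synonyms = {t: names for t, names in by_shared.items() if len(names) > 1}
    # Same local name denoting different entities in different sources.
    homonyms = {n: terms for n, terms in by_local.items() if len(terms) > 1}
    return unknown, synonyms, homonyms


def to_shared(source, record, mappings):
    """Rename the keys of a local record to shared-vocabulary terms."""
    mapping = mappings[source]
    return {mapping[key]: value for key, value in record.items()}
```

Calling `check_vocabulary(SHARED_VOCABULARY, SOURCE_MAPPINGS)` reports `svc` as a local name used for two different notions and `ProcessName` as a term missing from the shared vocabulary. These are the kinds of conflicts that the analysts would then resolve through the collaborative design protocols.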

Example of Application Ontology: High-level Part of Intrusion Detection Domain Ontology
[Figure: ontology graph rooted at "Network attack", with nodes connected by "Part of" and "Subclass of" relationships and marked as notions of the micro-layer or notions of lower levels. Notions shown include: Reconnaissance; Implantation and threat realization; Collection of Information; Identification of hosts; Identification of services; Identification of OS; Applications and Banners Enumeration; Users and Groups Enumeration; Resource Enumeration; Network Ping Sweeps; Port Scanning; Proxy scanning; Dumb host scan; TCP connect scan; TCP SYN scan; TCP FIN scan; TCP Null scan; TCP Xmas Tree scan; 'FTP Bounce' scanning; Half scan; UDP scan; Getting Access to Resources; Escalating Privilege; Gaining Additional Data; Threat Realization; Covering Tracks; Creating Back Doors; Denial of Service; Confidentiality destruction; Integrity destruction.]

The Simplest ("top-down") Meta-protocol for Collaborative Ontology Design
Participants: Source 1 local source expert; Source 1 data preparation agent; …; Source N local source expert; Source N data preparation agent; Meta-data description agent; Application domain expert.
Steps (steps 2-5 are shown for each of Sources 1, …, N):
1. Forming the basic variant of ontology
2. Sending the basic variant
3. Analysis of the suggested basic variant
4. Modifying and expanding the ontology
5. Synchronization of modifications by the basic protocol

Ontology Synchronization Protocol Represented in Terms of UML-sequence Diagram
Legend (participants):
1. Local source expert
2. Local source data managing agent
3. Local source ontology
4. Local source: buffer of temporary changes
5. KDD master (Metadata description agent)
6. Shared ontology
7. Meta-level agent: buffer of temporary changes
8. Application expert (meta-level)
9. Local source determining the modified ontology part
Messages shown in the diagram: current state reading; request for required ontology descriptions; unconfirmed changes buffer query; forming the current representation of ontology; representation of current state of ontology; changes of ontology; recording the changes; sending current changes to the shared ontology; periodic request for suggested changes; verification of changes; confirmation/rejection of suggested changes; introducing changes; adding changes to ontology; deletion of verified changes.

Meta-model of Data Sources: Entity Identification Problem
Explanation of Entity Identification Problem
[Figure: three tables, one per data source (Data Source 1, Data Source 2, Data Source 3), each with the columns "# of case" and "Attributes of Data source"; the sources contain different, partially overlapping sets of case numbers.]

Demonstration of Entity Identification Problem: Intrusion Detection Application
[Figure: the host-based sources (OS audit trail, command log, login/logout log, Mail log, FTP log) and the network-based source (Tcpdump) each yield, through an SPP, statistical data on Connection 1, …, Connection N, which form Case 1, …, Case N. Each connection is a sequence of network packets: TCP (SYN), TCP (ACK), packets carrying SMTP or FTP data, TCP (FIN).]

A Technique for Entity Identification Problem
In the DF problem ontology, for each instance of an object to be classified, the notion of entity identifier ("ID entity") is introduced.
This entity identifier plays the role of the primary key of the instance (by analogy with the primary key of a table). For each such identifier, a rule is defined as a component of the shared part of the application ontology; this rule is used to calculate the value of the instance key. A rule is a function whose arguments are chosen from the set of attributes of the entity. In addition, a rule is defined for each local data source to uniquely connect the entity identifier and the local primary key in this source. This rule specifies:
• how to derive the local primary key of an instance from the entity identifier value;
• how to derive the entity identifier value from the value of the local primary key of an instance of the source.

Meta-model of Data Sources: Diversity of Measurement Scales of the Same Attributes in Different Data Sources
Let X be an attribute in the application ontology that is measured differently in different sources.
1. In the shared component of the application ontology, the type and the measurement unit of attribute X are determined. The specification of X within the shared part of the application ontology is selected by experts during negotiations, according to a synchronization protocol.
2. In every source where X is present, an expression is defined for this attribute through which its values can be converted to the same scale in all the sources.
This allows using attribute values at the meta-level regardless of the data source from which they originated.

Meta-model of Data Sources: Interaction of Ontology and Databases of Sources
The task arises because application ontology entities are specified in terms of ontology notions, while their instances are represented in terms of database language.
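
The entity identifier rules and the per-source scale conversion described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the choice of connection attributes for the identifier, the `addr|port|start` local key format, the `SourceModel` structure and the millisecond-to-second conversion are all hypothetical and are not taken from the project.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical entity identifier for a network connection: a tuple built
# from attributes of the entity (the attribute choice is an assumption).
EntityId = Tuple[str, int, str]  # (source address, source port, start time)


def entity_id(src_addr: str, src_port: int, start: str) -> EntityId:
    """Shared-ontology rule: compute the entity identifier from attributes."""
    return (src_addr, src_port, start)


@dataclass
class SourceModel:
    """Per-source rules linking local keys and scales to the shared ones."""
    to_local_key: Callable[[EntityId], str]      # entity identifier -> local primary key
    from_local_key: Callable[[str], EntityId]    # local primary key -> entity identifier
    to_shared_scale: Dict[str, Callable[[float], float]]  # attribute -> conversion


def _parse_key(key: str) -> EntityId:
    """Inverse of the hypothetical 'addr|port|start' key format."""
    addr, port, start = key.split("|")
    return (addr, int(port), start)


# Hypothetical source that keys rows by "addr|port|start" and stores the
# connection duration in milliseconds; the shared scale is seconds.
TCPDUMP = SourceModel(
    to_local_key=lambda eid: f"{eid[0]}|{eid[1]}|{eid[2]}",
    from_local_key=_parse_key,
    to_shared_scale={"duration": lambda ms: ms / 1000.0},
)


def to_meta_level(model: SourceModel, local_key: str, row: Dict[str, float]):
    """Return (entity identifier, attributes converted to the shared scale)."""
    converted = {
        # Attributes without a conversion rule are passed through unchanged.
        name: model.to_shared_scale.get(name, lambda v: v)(value)
        for name, value in row.items()
    }
    return model.from_local_key(local_key), converted
```

With one `SourceModel` per source, rows that describe the same connection in different sources map to the same entity identifier and to the same measurement units, so they can be joined at the meta-level.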
To provide interaction between the ontology and the databases of sources (accessibility of data requested in ontology terms), a special gateway is developed.
[Figure: Three-level hierarchy of access to the database objects. Blocks shown: Application; DF problem ontology; DF application ontology; client-gateway; local source data ontology properties; access via VIEW objects; database objects; local data source.]

Meta-model of Distributed Learning
Components of the meta-model of distributed learning:
• Meta-model of decision making and combining decisions of multiple base-level classifiers;
• Model of distributed data management (allocation of training and testing data sets for learning particular classifiers; management of the computation of metadata for upper-level example-based learning, etc.);
• Approaches and formal techniques used for combining decisions.

Conclusion: Future Work
1. Development of a sophisticated ontology editor supporting distributed design of a distributed ontology.
2. Further design and implementation of the Data Fusion System software tool for development and implementation of particular distributed applications in the Data Fusion area.

Thank you!
For more information and related publications please contact
E-mail: [email protected]
http://space.iias.spb.su/ai/english/gorodetski.htm

Acknowledgement
This research is funded by AFRL/IF (EOARD), 1999-2003