Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
인공지능 특강 프로젝트 - Development of Decision Tree Algorithm for Semantic Web data - 2010313148 전동규 Agenda 1. Project Purpose 2. Motivation 3. Related Work 4. Algorithm 5. Description of Problem 2 1. Project Purpose Relational Data - Semantic Web Data (Linked data) - Semantic Decision Tree Decision Tree Algorithm - C4.5 Algorithm - 3 1. Project Purpose • Goal of project Development of new kind of Decision Tree algorithm which supports decision making based on Semantic Web environmental information Solve the several problems which is already solved by other related researches • Data : Linked Data(Semantic Web ontology) 4 1. Project Purpose • Ontology Class : Definition of Set Property : Relations between instances Instance : Individuals which are belonged in classes String Int Boolean String name Rich age name Workplace Location String Hospital Person School Doctor Person Class Teacher Student Range type Person Datatype Property Object Property Schema of Example Ontology 5 2. Motivation What is Semantic Web ? • Definition of Semantic Web The Semantic Web is an evolving development of the World Wide Web in which the meaning (semantics) of information and services on the web is defined, making it possible for the web to understand and satisfy the requests of people and machines to use the web content Semantic Web is based on ontologies which corresponds to Semantic Web data • Linked Data The term Linked Data is used to describe a method of exposing, sharing, and connecting data on the Web [1] Berners-Lee, T. (2001), “ The Semantic Web,” Scientific American, Vol. 501. 6 2. Motivation Development status of Semantic Web • Increase of Semantic Web Data Appearance of Semantic Web Document Search Engines Falcons : Twenty millions over RDF/XML Documents Swoogle : Three millions over Semantic Web Documents Date RDF/XML Quadruple 2009-09-02 21,639,337 2,936,868,638 2009-05-29 19,919,364 2,177,084,709 Date Semantic Web Document Triple 2009-10-17 3,109,616 1,065,799,526 Falcons Swoogle Open data in Semantic Web LINKINGOPENDATA : The goal of the W3C SWEO Linking Open Data community project is to extend the Web with a data commons by publishing various open data sets as RDF on the Web and by setting RDF links between data items from different data sources The data sets consist of over 4.7 billion RDF triples, which are interlinked by around 142 million RDF links (May 2009). 7 2. Motivation Development status of Semantic Web • Increase of Semantic Web Data Semantic Web based Portal Site Twine : Twine is a Semantic Web Portal that making networks of information based on user’s posts which consist of their own information and favorite things. Every information composing Twine is written in RDF and OWL format. Twine have millions of visitors in a month, and they have over millions of relationships between 3 millions of semantic tags (March 2009) The necessity of mining useful knowledge from huge size ontology is highly expected. Therefore, Data Mining methodology for Semantic Web should be ready for this necessity. 8 2. Motivation Decision Tree in Semantic Web • Traditional Decision Tree algorithm is impossible to apply in Semantic Web Semantic Web based Ontology has special characteristics for mining Since Semantic Web document has network structure, multi-value issue is occurred Traditional Decision Tree just uses value of variables. Therefore, additional information of Semantic Web are can not be applied Converting Semantic Web data into single table style that used to use in traditional decision tree algorithm is impossible 9 3. Related Work • Arno J.Knobbe[2] developed decision tree algorithm for Multi-relational database Selection Graph is suggested to do decision tree on RDB Selection Graph is composed of Node, Edge, and condition and it can be expressed in SQL syntax This research suggested partial solution about multi-value issue which also happened in Semantic Web ontology. However, this methodology can not be applied to Semantic Web which contains a lot of information than RDB • David Jensen[3] suggested methodology that converting social network data to single table data which can be applied to Traditional Decision Tree algorithm ‘QGraph' that kind of query language to get the local network from entire social network is suggested QGraph is composed of Node, Edge, and condition and it can query many objects at once Since ontology information are manually converted to single table form, missing information will be occurred a lot [2] Arno J. Knobbe., Arno Siebes., Danil Van Der Wallen., Syllogic B. V. (1999). “Multi-Relational Decision Tree Induction,” In Proceedings of PKDD’99, [3] D. Jensen., and J. Neville.(to appear) (2002). “Data mining in networks,” Papers of the Symposium on Dynamic Social Network Modeling and Analysis. 10 4. Algorithm • Search procedure of algorithm follows C4.5 algorithm • New methodologies are required to learn concepts in ontology • ‘Constructor’ can be used as similar as attributes in traditional Decision tree • Related works used the terms ‘Refinement’ as an attribute in Decision Tree 11 4. Algorithm Refinements • What is a Refinement? Refinement is a condition for split branches in decision tree. In this algorithm, property and class from ontology are used as a refinement. When define a refinement, Role Constructors from Description Logic are applied to make the best use of information in Semantic Web • Type of Refinements Concept Constructor Refinement : Applying type information of instances Cardinality restriction Refinement : Applying cardinality information on object property Domain restriction Refinement : Applying value of datatype property Qualification Refinement : Applying information of quantification restrictions and range class of object property 12 4. Algorithm Refinements • Developed Refinements Example Concept Constructor Refinement Hospital not Human Domain restriction Refinement Age.(≥ 21) Cardinality restriction Refinement ≥ 3 hasChild Qualification Refinement hasChild.Blond 13 4. Algorithm • The list of syntax information which can be expressed in ontology Language Syntax RDF rdf:type owl:equivalentProperty rdfs:domain owl:FunctionalProperty rdfs:range owl:hasValue rdfs:subClassOf owl:intersectionOf rdfs:subPropertyOf owl:InverseFunctionalProperty owl:AllDifferent owl:inverseOf owl:allValuesFrom owl:masCardinality RDFS owl:cardinality OWL Language OWL Syntax owl:minCardinality owl:Class owl:ObjectProperty owl:complementOf owl:oneOf owl:DatatypeProperty owl:sameAs owl:DataRange owl:someValuesFrom owl:differentFrom owl:SymmetricProperty owl:disjointWith owl:TransitiveProperty owl:equivalentClass owl:unionOf 14 5. Description of Problem Train problem Ten trains • After learning, found definition of eastbound train is as follows 15 5. Description of Problem Train problem • Artificial task of learning to predict whether a train is headed east or west • Data is consist of relation tuples • Relations eastbound(T) : train T is eastbound has-car(T,C) : C is a car of T infront(C,D) : car C is in front of D long(C) : car C is long open-rectangle(C) : car C is shaped as an open rectangle similar relations for five other shapes jagged-top(C) : C has a jagged top sloping-top(C) : C has a sloping top open-top(C) : C is open contains-load(C,L) : C contains load L 1-item(C) : C has one load item similar relations for two and three load items 2-wheels(C) : C has two wheels 3-wheels(C) : C has three wheels 16