Ontology Generation and Applications

Dr. A.C.M. Fong, CEng
Professor of Computer Engineering
School of Computing and Mathematical Sciences
Faculty of Design and Creative Technologies
Auckland University of Technology
[email protected]

Contents
1. Introduction – Semantic Web and Ontology
2. Related Work – Ontology Generation
3. Toward Automated Ontology Generation
4. Fuzzy Ontology Generation Framework
5. Application 1 – Scholarly Info
6. Application 2 – Service Helpdesk

1. Introduction

Semantic Web
The Semantic Web is founded on its ability to represent real-life domains accurately, so that programs can fully understand the environment in which they operate. In summary, the Semantic Web provides the following benefits:
- It offers an expressive metadata model to represent data, so that data can be managed effectively.
- Programs can understand the semantic concepts described in the metadata used on the Semantic Web. Hence, knowledge carried on the Semantic Web can be shared and reused among different programs.
- Users can interact with programs using a semantic query language to specify their requests, thereby improving retrieval performance.
- The deductive mechanism used to derive new information from existing information can be described clearly, so that knowledge can be reasoned with efficiently.

Semantic Web Architecture
[Figure: Semantic Web architecture]

Semantic Web Architecture - Layers
- Foundation Layer. The Semantic Web uses Uniform Resource Identifiers (URIs) to identify resources and Unicode to encode documents.
- Schema Layer. This layer comprises the XML + NS (Namespace) + xmlschema layer and the RDF + rdfschema layer. It defines objects and classes, together with their relations and constraints. XML Schema (XMLS) and RDF Schema (RDFS), which are based on XML and RDF respectively, are used for these layers. RDFS has been widely used to describe classes at the Schema Layer.
- Ontology Layer. This layer provides constructs that use meta-information to represent domain knowledge. In this layer, information is represented as ontology, which the Semantic Web adopts to define knowledge.
- Logic Layer. This layer infers more knowledge from the existing knowledge. It can be integrated with the Ontology Layer. In this layer, concepts and relationships defined in lower layers are converted into Turing-complete logic languages in order to generate new knowledge.
- Proof Layer. This layer provides a mechanism to check whether a statement is true or not.
- Trust Layer. This layer provides a mechanism that resolves conflicts between knowledge carried by the Semantic Web to form the "Web of Trust".
- Digital Signature Layer. This layer uses public key cryptography to secure documents.

Ontology – Definition
Ontology has different definitions. A commonly cited definition describes an ontology as a formal, explicit specification of a shared conceptualization.
- Conceptualization: an abstract model of phenomena in the world, obtained by identifying the relevant concepts of those phenomena.
- Explicit: the types of concepts used, and the constraints on their use, are explicitly defined.
- Formal: the ontology should be machine readable.
- Shared: the ontology should capture consensual knowledge accepted by the communities.

Ontology Research
- Ontology is regarded as a standard conceptual model for knowledge representation, especially on the Semantic Web.
- The term ontology engineering has been proposed to denote ontology-related research in computer science.
- Current issues of interest in ontology engineering include ontology generation, ontology mapping, ontology integration and ontology versioning.
- This presentation focuses on ontology generation.

Ontology Description Languages
An ontology is described using an ontology description language. Ontology description languages are based on Web metadata description languages, which can be classified into the following three groups:
- HTML-based
- XML-based
- RDF-based

HTML-based Ontology Description Languages
- The tags supported by the traditional Web are sufficient to represent some semantic knowledge.
- Simple HTML Ontology Extensions (SHOE) and Ontobroker embed additional tags into HTML to represent knowledge.
- However, HTML does not support self-defined tags. Therefore, it is difficult to define ontology classes with an HTML-based approach.
- Hence, XML-based ontology description languages have been proposed to overcome this limitation.

XML-based Ontology Description Languages
- These languages are usually based on XML Schema (XMLS) or Document Type Definition (DTD).
- DTD allows users to define new markup types to describe information. Therefore, users can define ontology classes using DTD.
- Moreover, XMLS supports the definition of relations between classes. Thus, XMLS and DTD can be used directly to embed semantic information.
- However, since XML provides only syntactic support for knowledge representation, XML-based ontology description languages face the following problems when representing knowledge:
- XML lacks a mechanism to define relationships that are usually central in ontologies, such as is-a or element-of relationships.
- XML does not support any notion of inheritance, which is an important attribute in ontologies.
- In XML, concepts are defined through tags, which can be either a string or a combination of other nested tags. Such a mechanism may not be sufficient for defining concepts in an ontology, which may require richer data structures.
- In XML, the order of tags appearing in a document must be defined in advance. In contrast, the ordering of attribute descriptions does not matter in an ontology.

RDF-based Ontology Description Languages
- RDF extends XML to become a standard for knowledge representation. In addition, RDF Schema (RDFS) can be used to define classes and class hierarchies in a domain.
- The standardization supported by RDF provides two important contributions:
  - A standard set of modeling primitives (e.g. class, instance, etc.) and their relationships (e.g. subclass) is provided.
  - A standardized syntax for writing ontologies is supported.
- Popular RDF-based ontology description languages include DARPA Agent Markup Language (DAML), Ontology Inference Layer (OIL), DAML+OIL and Web Ontology Language (OWL).

DARPA Agent Markup Language
- DAML, or DAML-ONT, extends RDFS to represent ontology using the object-oriented approach. It embeds some object-oriented concepts to represent classes. Thus, the class representation of DAML-ONT is better than that of RDF.
- Example: DAML-ONT representation of the class "Journal", which is a subclass of the class "Publication Medium" but is disjoint with the classes "Conference" and "Workshop" (i.e. an object that belongs to class "Journal" cannot belong to classes "Conference" or "Workshop"):

    <Class ID="Journal">
      <subClassOf resource="#Publication Medium"/>
      <disjointFrom resource="#Conference"/>
      <disjointFrom resource="#Workshop"/>
    </Class>

Ontology Inference Layer
OIL extends RDFS to represent ontology. It is designed based on three criteria:
- Frame-based. It supports frames to define classes and properties of classes. Thus, class contents can be described more informatively (e.g. constraints can be used for class properties).
- Description Logic. It describes knowledge using logic rules. Thus, knowledge is represented mathematically and can be processed by programs.
- Uses Web Standards. It is based on XML and RDFS.

Ontology Inference Layer: Example

    <rdfs:Class rdf:ID="animal"/>
    <rdfs:Class rdf:ID="plant">
      <rdfs:subClassOf>
        <oil:NOT>
          <oil:hasOperand rdf:resource="#animal"/>
        </oil:NOT>
      </rdfs:subClassOf>
    </rdfs:Class>
    <rdfs:Class rdf:ID="tree">
      <rdfs:subClassOf rdf:resource="#plant"/>
    </rdfs:Class>

Class "animal" is defined first, followed by class "plant", which uses the operator "NOT" to state that it is disjoint from class "animal" (i.e. objects that belong to class "animal" cannot belong to class "plant" and vice versa). Finally, class "tree" is defined as a subclass of "plant".

DAML vs. OIL
- Compared with DAML, OIL can represent class properties better, but DAML can represent class relationships more clearly. Hence, they can be combined to form a better ontology description language, DAML+OIL.
- DAML+OIL defines class relationships based on DAML.
- Class properties are defined in a similar way to OIL.
- Hence, DAML+OIL combines the advantages of both DAML and OIL.

Web Ontology Language
- OWL extends DAML+OIL to allow users to define various types of relationships between classes.
- Properties can also be defined using additional constructs in OWL.
- OWL has three sublanguages: OWL Lite, OWL DL and OWL Full.
- Although these sublanguages share the same OWL syntax, they differ slightly in design and are aimed at different communities of implementers and users:
  - OWL Lite primarily supports classification hierarchies and simple constraints when designing classes.
  - OWL DL includes all OWL language constructs, but they can be used only under certain restrictions (e.g. a class cannot be an instance of another class).
  - OWL Full allows all OWL language constructs to be used without any restriction.

Web Ontology Language: Example

    <rdf:RDF
        xmlns:owl="http://www.w3.org/2002/07/owl#"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xmlns:xsd="http://www.w3.org/2000/10/XMLSchema#"
        xmlns:daml="http://www.w3.org/2001/10/daml+oil#">
      <owl:Ontology rdf:about="Scholarly Information">
        <owl:versionInfo>v 1.0 2009-12-07 19:06:40</owl:versionInfo>
      </owl:Ontology>
      <owl:Class rdf:ID="Concept1">
        <rdfs:label>Data Mining</rdfs:label>
      </owl:Class>
      <owl:Class rdf:ID="Concept2">
        <rdfs:label>Fuzzy Logic</rdfs:label>
      </owl:Class>
      <owl:Class rdf:ID="Concept3">
        <rdfs:label>Data Mining, Fuzzy Logic</rdfs:label>
        <rdfs:subClassOf rdf:resource="#Concept1"/>
        <rdfs:subClassOf rdf:resource="#Concept2"/>
      </owl:Class>
    </rdf:RDF>

- Header info: the namespace declarations on rdf:RDF.
- Ontology name and version: the owl:Ontology element.
- 3 classes: Concept1 (labelled Data Mining), Concept2 (labelled Fuzzy Logic) and Concept3. Concept3 is a subclass of both Concept1 and Concept2.

2. Related Work

Ontology Generation
- Ontology uses classes, which contain attributes, to represent concepts.
- Ontology also supports taxonomy and non-taxonomy relations between classes.
- Although editing tools such as Protégé [1] and OilEd [2] have been developed to help users create and edit ontologies, deriving an ontology from data manually is a tedious task.

Ontology Generation – Approaches
- Ontology can be generated from various types of data, mostly textual.
- Large corpora [3,4] are considered good sources for mining knowledge to construct ontology, since the information in a corpus is usually well annotated and can therefore be processed easily by other programs.
- Ontology can also be generated from a knowledge base of rules [5], which is represented as a tree with rules residing at the tree nodes.
- Statistical approaches have been used to estimate the existence of relationships between the entities involved in rules [6].
- When knowledge is represented in semi-structured schemata such as XML and RDF, its contents can easily be parsed by programs. Techniques have been proposed to generate ontology from semi-structured schemata based on graph theory [7] and statistical approaches [8].
- Learning Source Description (LSD) [9] was proposed to generate ontology from arbitrary formalisms of semi-structured schemata.
- The Entity-Relationship model used in database schemas has also been adopted as an information source for generating ontology [10,11].

Ontology Generation – Textual Data
- For textual data, ontology concepts can be extracted efficiently using Natural Language Processing (NLP) techniques [12,13].
- NLP is used to preprocess the textual data in order to extract significant keywords.
- WordNet [14] can be used to improve the accuracy of ontology generated by NLP-based techniques.
- However, NLP techniques have difficulty finding semantic relationships among the keywords.
- Data mining techniques can be combined with NLP to improve the efficiency of ontology generation. In Text-to-Onto [15], association rules are used to find associative relations between keywords, which are then used to construct non-taxonomy relations for the ontology.
- Keyword frequencies are often used in statistical approaches [16,17] to identify significant keywords that can represent a certain concept.
- Clustering techniques have also been applied to generate ontology from textual data [18]. Using significant keywords extracted from textual data, clustering techniques can cluster documents and interpret topics from the generated clusters.
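The idea of mining associative relations between keywords can be illustrated with a small support/confidence computation over keyword sets. This is only a minimal sketch of generic association-rule counting, not the actual Text-to-Onto algorithm; the function name, the thresholds and the toy documents are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_association_rules(docs, min_support=0.5, min_confidence=0.6):
    """Return sorted (antecedent, consequent, support, confidence) tuples.

    docs is a list of keyword lists, one per document (assumed to be the
    output of some earlier NLP keyword-extraction step).
    """
    n = len(docs)
    keyword_sets = [set(d) for d in docs]
    # Number of documents containing each keyword / each keyword pair.
    single = Counter(k for ks in keyword_sets for k in ks)
    pair = Counter(
        frozenset(p) for ks in keyword_sets for p in combinations(sorted(ks), 2)
    )
    rules = []
    for p, count in pair.items():
        support = count / n
        if support < min_support:
            continue
        a, b = sorted(p)
        # Test the rule in both directions: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = count / single[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


# Hypothetical keyword sets extracted from four documents.
docs = [
    ["ontology", "clustering", "semantic web"],
    ["ontology", "semantic web"],
    ["clustering", "fuzzy logic"],
    ["ontology", "semantic web", "fuzzy logic"],
]
rules = keyword_association_rules(docs)
```

Rules that pass both thresholds are candidates for non-taxonomy relations between the corresponding concepts; how such candidates are labelled and filtered is left open here.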

Ontology Generation – Clustering
- Clustering can be used to mine hidden knowledge from data to construct an ontology. It can also be used to enrich an existing ontology.
- Traditional clustering techniques are useful for generating non-taxonomy relations for ontology.
- In particular, conceptual clustering techniques can conceptualize clusters and construct a concept hierarchy of clusters, which is useful for generating taxonomy relations for ontology.
- For example, an approach based on COBWEB [18] can generate taxonomy relations among the concepts of a domain for ontology generation.
- Mo'K [19] is a system that can obtain taxonomy relations from tagged text using conceptual clustering.

Ontology Applications – Scholarly Info
- In the E-Scholar Knowledge Inference MOdel (ESKIMO) [20], knowledge on scholarly publications is represented as a simple ontology, known as OntoPortal, which is manually developed and maintained.
- OntoPortal describes and provides links to other external research pages on the Web. Hypertext links between the web pages are also described in the OntoPortal ontology.
- ESKIMO allows users to retrieve scholarly information from the constructed ontology by using queries represented as Prolog-like rules.
- In the Scholarly Ontology Project [21], a digital library Web server is constructed using Semantic Web technologies in order to support scholarly retrieval.
- It is developed using a collaborative approach in which researchers submit their documents in a specifically structured format.
- As such, the contents of the submitted documents can be further processed in the system and converted into a scholarly ontology accordingly.
- In the Research in Semantic Scholarly Publishing (RSSP) project, scientific publications are collected from online archives such as the Open Archives Initiative (OAI) [22].
- Information about the documents (e.g. their authors, titles, citations, publishers, etc.) is extracted, indexed and converted into an ontology formalism.
- DAML+OIL is used to annotate the ontology as Semantic Web pages to support scholarly retrieval.

Summary
- Many techniques exist to construct ontology from various data types and sources, mainly textual data.
- Traditionally, NLP techniques are used to analyze textual data.
- Recently, data mining techniques have been incorporated into NLP to further discover hidden knowledge from textual data.
- Conceptual clustering is an advanced data mining technique that can organize data in a hierarchical conceptual structure.
- Thus, conceptual clustering is a useful technique to discover knowledge for generating ontology from textual data.

3. Toward Automated Ontology Generation

Basics
- The initial focus is on scholarly information.
- Scholarly ontology is generated directly from explicit information on scientific publications (e.g. their titles, authors, citations, etc.).
- Other advanced scholarly knowledge, such as research experts and areas, is usually inferred manually by human experts.
- To construct scholarly ontology from a citation database, we use data mining techniques to discover hidden knowledge in the database.
- The data mining techniques include Context-based Cluster Analysis (CCA) and Fuzzy Concept Hierarchy Generation (FCHG).
- The discovered knowledge is then converted and integrated into the ontology formalism.
- As such, apart from the explicit information available on scientific publications, Scholarly Ontology can also support other useful scholarly retrieval functions such as finding research experts and detecting trends.

Context-based Cluster Analysis
- CCA is based on the Formal Concept Analysis (FCA) [23] technique.
- FCA provides a formal model, known as a formal context, to represent relations between objects and attributes in a data set.
- We use formal contexts to represent multiple sets of clustering results.
- Then, relations between the formal contexts are analyzed to find the relations between the corresponding clustering results.

Fuzzy Concept Hierarchy Generation
- A concept hierarchy is a data structure useful for knowledge representation. It is widely used in data mining applications.
- A concept hierarchy may need to be large to reflect the knowledge in a domain precisely.
- Manual construction may be difficult and tedious.
- Conceptual clustering is needed.
- Many conceptual clustering techniques organize knowledge as a concept hierarchy, which may not be sufficient for representing information in a real domain.
- FCA, which is a data exploration technique, supports the concept lattice, which provides a more informative conceptual model for representing knowledge.
- FCA-based conceptual clustering techniques are potentially useful for constructing the taxonomy knowledge of an ontology.
- However, typical FCA-based conceptual clustering techniques do not support uncertainty information.
- Traditional FCA-based conceptual clustering approaches cannot represent vague information, so fuzziness is needed.
- L-fuzzy context uses linguistic variables to represent uncertainty in the context.
- But it needs human interpretation to define the linguistic variables.
- The fuzzy concept lattice generated from an L-fuzzy context usually causes a combinatorial explosion of concepts (compared to the traditional concept lattice).
- We combine fuzzy logic and FCA as Fuzzy Formal Concept Analysis (FFCA).
- In FFCA, uncertainty information is represented directly by a real-valued membership value in the range [0,1]. Linguistic variables are no longer needed.
- Compared to the fuzzy concept lattice generated from an L-fuzzy context, the fuzzy concept lattice generated using FFCA is simpler in terms of the number of formal concepts. It also supports a formal mechanism for calculating concept similarities.
- Based on FFCA, we propose the Fuzzy Conceptual Clustering technique in FCHG to generate a fuzzy concept hierarchy.

4. Fuzzy Ontology Generation Framework

Fuzzy Ontology
- Applying fuzzy logic offers a possible solution for dealing with uncertainty information.
- Fuzzy ontology is generated and used in text retrieval and search engines, where membership values are used to evaluate the similarities between the concepts in a concept hierarchy.
- Manual generation of fuzzy ontology from a predefined concept hierarchy is a difficult and tedious task that often requires expert interpretation.

Introduction
- An efficient method for generating concept hierarchies and fuzzy ontology is highly desirable.
- We propose a Fuzzy Ontology Generation Framework (FOGF) that can automate fuzzy ontology generation from uncertainty data based on Formal Concept Analysis (FCA) theory.
- The generated fuzzy ontology is mapped to a semantic representation in OWL.

Overview
[Figure: FOGF pipeline. Uncertainty Information -> Fuzzy Formal Concept Analysis -> Fuzzy Concept Lattice -> Concept Hierarchy Generation -> Concept Hierarchy -> Fuzzy Ontology Generation -> Fuzzy Ontology -> Semantic Representation Conversion -> Semantic Web]
- Fuzzy Formal Concept Analysis incorporates fuzzy logic into Formal Concept Analysis to represent vague information.
- Concept Hierarchy Generation clusters the fuzzy concept lattice generated by FFCA to construct a concept hierarchy in two steps: Fuzzy Conceptual Clustering and Hierarchical Relation Generation.
- Fuzzy Ontology Generation constructs fuzzy ontology from a fuzzy context using the concept hierarchy created by fuzzy conceptual clustering.
- Semantic Representation Conversion makes the knowledge accessible and sharable in the Web environment, using OWL.

Step 1: Fuzzy Formal Concept Analysis
Definition (Fuzzy Formal Context). A fuzzy formal context is a triple $K = (G, M, I = \varphi(G \times M))$ where $G$ is a set of objects, $M$ is a set of attributes, and $I$ is a fuzzy set on domain $G \times M$. Each relation $(g, m) \in I$ has a membership value $\mu(g, m)$ in $[0, 1]$.

A fuzzy formal context can be represented as a cross-table (Table 1).

Table 1. Fuzzy formal context.
      Data Mining   Clustering   Fuzzy Logic
D1    0.8           0.12         0.61
D2    0.9           0.85         0.13
D3    0.1           0.14         0.87

An α-cut can be set to eliminate relations with low membership values, e.g. α = 0.5 (Table 2).

Table 2. Fuzzy formal context after the α-cut (α = 0.5).
      Data Mining   Clustering   Fuzzy Logic
D1    0.8           -            0.61
D2    0.9           0.85         -
D3    -             -            0.87

The context has 3 objects representing 3 documents: D1, D2 and D3. It also has 3 attributes, “Data Mining”, “Clustering” and “Fuzzy Logic”, representing 3 research topics. The relationship between an object and an attribute is represented by a membership value in [0, 1].
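The cross-table and the α-cut can be sketched in a few lines of Python. This is a minimal illustration under two assumptions that the slides do not state: the context is stored as a nested dictionary, and the cut keeps memberships greater than or equal to α. The names `CONTEXT` and `alpha_cut` are hypothetical.

```python
# Fuzzy formal context from Table 1: object -> {attribute: membership value}.
CONTEXT = {
    "D1": {"Data Mining": 0.8, "Clustering": 0.12, "Fuzzy Logic": 0.61},
    "D2": {"Data Mining": 0.9, "Clustering": 0.85, "Fuzzy Logic": 0.13},
    "D3": {"Data Mining": 0.1, "Clustering": 0.14, "Fuzzy Logic": 0.87},
}


def alpha_cut(context, alpha):
    """Drop every (object, attribute) relation whose membership is below alpha.

    Assumption: relations with membership exactly equal to alpha are kept.
    """
    return {
        g: {m: mu for m, mu in row.items() if mu >= alpha}
        for g, row in context.items()
    }


# With alpha = 0.5 the remaining relations are those listed in Table 2.
cut = alpha_cut(CONTEXT, 0.5)
```

A sparse dictionary is used for the result so that eliminated relations (the "-" cells of Table 2) are simply absent.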

Definition (Fuzzy Representation of Object). Each object $O$ in a fuzzy formal context $K$ can be represented by a fuzzy set $\Phi(O)$ as $\Phi(O) = \{A_1(\mu_1), A_2(\mu_2), \ldots, A_m(\mu_m)\}$, where $\{A_1, A_2, \ldots, A_m\}$ is the set of attributes in $K$ and $\mu_i$ is the membership of $O$ with attribute $A_i$ in $K$. $\Phi(O)$ is called the fuzzy representation of $O$.

- Generally, we can consider the attributes of a formal concept as the description of the concept. Thus, the relationship between an object and the concept should be the intersection of the relationships between the object and the attributes of the concept.
- Since each relationship between the object and an attribute is represented as a membership value in the fuzzy formal context, the intersection of these membership values is taken as their minimum. Hence the following definition.

Definition (Fuzzy Formal Concept). Given a fuzzy formal context $K = (G, M, I)$ and a confidence threshold $T$, we define $A^{*} = \{m \in M \mid \forall g \in A: \mu(g, m) \ge T\}$ for $A \subseteq G$ and $B^{*} = \{g \in G \mid \forall m \in B: \mu(g, m) \ge T\}$ for $B \subseteq M$. A fuzzy formal concept (or fuzzy concept) of a fuzzy formal context $(G, M, I)$ with a confidence threshold $T$ is a pair $(A_f = \varphi(A), B)$ where $A \subseteq G$, $B \subseteq M$, $A^{*} = B$ and $B^{*} = A$. Each object $g \in \varphi(A)$ has a membership $\mu_g$ defined as $\mu_g = \min_{m \in B} \mu(g, m)$, where $\mu(g, m)$ is the membership value between object $g$ and attribute $m$ defined in $I$. If $B = \{\}$ then $\mu_g = 1$ for every $g$. $A$ and $B$ are the extent and intent of the formal concept $(\varphi(A), B)$ respectively.

The version of FFCA presented in these definitions preserves the continuous membership values of objects, which is crucial for calculating concept similarities. In a formal context, a concept can have many superconcepts and subconcepts.
However, the similarity of a concept to each of its superconcepts and subconcepts differs. With the fuzzy concept lattice, we can use fuzzy set theory to calculate the similarities between a concept and its subconcepts.

Definition (Fuzzy Formal Concept Cardinality). Since the fuzziness of a fuzzy formal concept is represented by the membership values of the objects of the concept, the cardinality of a fuzzy formal concept $K_f = (\varphi(A), B)$ is defined as $|K_f| = |\varphi(A)|$.

Definition (Fuzzy Formal Concept Similarity). The similarity of a fuzzy formal concept $K_{f1} = (\varphi(A_1), B_1)$ and its subconcept $K_{f2} = (\varphi(A_2), B_2)$ is defined as $E(K_{f1}, K_{f2}) = E(\varphi(A_1), \varphi(A_2))$.

[Fig. 2. Traditional concept lattice generated from Table 1 without membership values. Concepts (intent: extent): {}: {D1, D2, D3}; {“Data Mining”}: {D1, D2}; {“Fuzzy Logic”}: {D1, D3}; {“Data Mining”, “Clustering”}: {D2}; {“Data Mining”, “Fuzzy Logic”}: {D1}; {“Data Mining”, “Clustering”, “Fuzzy Logic”}: {}.]

[Fig. 3. Fuzzy concept lattice generated from the fuzzy formal context in Table 2, with similarities between concepts shown. Concepts (intent: fuzzy extent): {}: {D1, D2, D3}; {“Data Mining”}: {D1(0.8), D2(0.9)}; {“Fuzzy Logic”}: {D1(0.61), D3(0.87)}; {“Data Mining”, “Clustering”}: {D2(0.85)}; {“Data Mining”, “Fuzzy Logic”}: {D1(0.61)}; {“Data Mining”, “Clustering”, “Fuzzy Logic”}: {}. Edge similarities: {“Data Mining”} to {“Data Mining”, “Clustering”}: 0.5; {“Data Mining”} to {“Data Mining”, “Fuzzy Logic”}: 0.35; {“Fuzzy Logic”} to {“Data Mining”, “Fuzzy Logic”}: 0.41; edges to the top and bottom concepts: 0.00.]
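The Step 1 definitions can be sketched directly from the Table 1 data. The derivation operators and the min-based membership follow the definitions above; the similarity measure E on fuzzy sets is not spelled out in these slides, so the sketch assumes the common ratio |A ∩ B| / |A ∪ B| (min for intersection, max for union, cardinality as the sum of memberships). That assumption gives 0.5 and about 0.41 for the two largest edge values in Fig. 3, and about 0.36 where the figure shows 0.35; it does not reproduce the 0.00 values on edges to the top concept, which may follow a convention not stated here. All function names are hypothetical.

```python
# Fuzzy formal context from Table 1: object -> {attribute: membership value}.
CONTEXT = {
    "D1": {"Data Mining": 0.8, "Clustering": 0.12, "Fuzzy Logic": 0.61},
    "D2": {"Data Mining": 0.9, "Clustering": 0.85, "Fuzzy Logic": 0.13},
    "D3": {"Data Mining": 0.1, "Clustering": 0.14, "Fuzzy Logic": 0.87},
}
ATTRIBUTES = ["Data Mining", "Clustering", "Fuzzy Logic"]


def intent_of(objects, context, threshold):
    """A*: attributes whose membership is >= threshold for every object in A."""
    return {m for m in ATTRIBUTES
            if all(context[g][m] >= threshold for g in objects)}


def extent_of(attributes, context, threshold):
    """B*: objects whose membership is >= threshold for every attribute in B."""
    return {g for g in context
            if all(context[g][m] >= threshold for m in attributes)}


def fuzzy_extent(attributes, context, threshold):
    """Objects of B* with membership min over B (1.0 when B is empty)."""
    return {g: min((context[g][m] for m in attributes), default=1.0)
            for g in extent_of(attributes, context, threshold)}


def cardinality(fuzzy_set):
    """Assumed fuzzy cardinality: the sum of the membership values."""
    return sum(fuzzy_set.values())


def similarity(fs1, fs2):
    """Assumed E: |fs1 ∩ fs2| / |fs1 ∪ fs2| using min and max."""
    keys = set(fs1) | set(fs2)
    inter = sum(min(fs1.get(k, 0.0), fs2.get(k, 0.0)) for k in keys)
    union = sum(max(fs1.get(k, 0.0), fs2.get(k, 0.0)) for k in keys)
    return inter / union if union else 0.0


T = 0.5
dm = fuzzy_extent({"Data Mining"}, CONTEXT, T)
dm_cl = fuzzy_extent({"Data Mining", "Clustering"}, CONTEXT, T)
fl = fuzzy_extent({"Fuzzy Logic"}, CONTEXT, T)
dm_fl = fuzzy_extent({"Data Mining", "Fuzzy Logic"}, CONTEXT, T)

# ({"Data Mining"}) is a formal concept: deriving twice returns the same intent.
closed = intent_of(extent_of({"Data Mining"}, CONTEXT, T), CONTEXT, T)
```

Here `similarity(dm, dm_cl)` and `similarity(fl, dm_fl)` correspond to the edges labelled 0.5 and 0.41 in Fig. 3.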

Step 2: Concept Hierarchy Generation
Concept Hierarchy Generation clusters the fuzzy concept lattice generated by FFCA to construct a concept hierarchy in two steps: Fuzzy Conceptual Clustering and Hierarchical Relation Generation.

Step 2 a) Fuzzy Conceptual Clustering
Compared to traditional clusters, the conceptual clusters generated have the following properties:
- Each conceptual cluster is considered a human-interpretable concept in the domain of the fuzzy concept lattice.
- Each conceptual cluster is a sublattice extracted from the fuzzy concept lattice.
- A formal concept must belong to at least one conceptual cluster and may belong to several, e.g. a scientific document can belong to more than one research area.

Conceptual clusters are generated based on the idea that if a formal concept A belongs to a conceptual cluster R, then its subconcept B also belongs to R if B is similar to A. We can use a similarity confidence threshold $T_s$ to determine whether two concepts are similar or not.

Definition (Conceptual Cluster). A conceptual cluster of a concept lattice $K$ with a similarity confidence threshold $T_s$ is a sublattice $S_K$ of $K$ which has the following properties:
- $S_K$ has a supremum concept $C_S$ that is not similar to any of its superconcepts.
- Any concept $C \ne C_S$ in $S_K$ must have at least one superconcept $C' \in S_K$ so that $E(C, C') > T_s$.

Fig. 5 shows the conceptual clusters generated from the fuzzy concept lattice given in Fig. 3 with similarity confidence threshold $T_s = 0.5$.

[Fig. 5. Conceptual clusters CK1, CK2 and CK3 drawn over the fuzzy concept lattice of Fig. 3.]

Step 2 b) Hierarchical Relation Generation
- Fuzzy conceptual clustering generates a set of conceptual clusters $S_C$.
- To construct a concept hierarchy from the conceptual clusters, we need to find the hierarchical relations among the clusters.
- We first define a concept hierarchy.

Definition (Concept Hierarchy). A concept hierarchy is a poset (partially ordered set) $(H, \prec)$ where $H$ is a finite set of concepts and $\prec$ is a partial order on $H$.

The definition of superconcept and subconcept relations on conceptual clusters ensures that each conceptual cluster has at least one superconcept, unless it corresponds to the root node of the generated concept hierarchy. However, we must prove that the relation is a partial order.

Definition (Subconcept and Superconcept on a Concept Hierarchy). Let $C_1$ and $C_2$ be two conceptual clusters corresponding to two sublattices $L_1$ and $L_2$ of a fuzzy concept lattice $F(K)$. Let the fuzzy formal concept $I$ be the supremum of $L_1$, i.e. $I = \sup(L_1)$. $C_1$ is the subconcept of $C_2$, denoted as $C_1 \prec C_2$, if $I$ is the subconcept of any concept $C' \in L_2$, i.e. $I \le C'$ where $\le$ is the partial order defined on $F(K)$. Equivalently, $C_2$ is the superconcept of $C_1$.

Figure 8(b) illustrates the hierarchical relations constructed from the conceptual clusters given in Figure 8(a). Each concept in the concept hierarchy is represented by a set of its attributes. The supremum and infimum of the lattice are considered as the “Thing” and “Nothing” concepts, respectively.
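The two parts of Step 2 can be sketched as a top-down pass over the lattice followed by a comparison of cluster suprema. This is only an illustrative sketch, not the authors' algorithm: concept names such as "DM+CL" are hypothetical shorthand for the intents in Fig. 3, the edge similarities are the values listed for that figure, and the threshold is set to 0.4 rather than the 0.5 used for Fig. 5, because the strict test E > Ts from the definition would not merge an edge at exactly 0.5 and the slides do not say how that boundary case is treated.

```python
# Lattice of Fig. 3: concept -> {direct superconcept: similarity E}.
# Names are hypothetical shorthand for the intents shown in the figure.
PARENTS = {
    "top": {},
    "DM": {"top": 0.0},
    "FL": {"top": 0.0},
    "DM+CL": {"DM": 0.5},
    "DM+FL": {"DM": 0.35, "FL": 0.41},
    "bottom": {"DM+CL": 0.0, "DM+FL": 0.0},
}
# Top-down order: every concept appears after all of its superconcepts.
ORDER = ["top", "DM", "FL", "DM+CL", "DM+FL", "bottom"]


def conceptual_clusters(order, parents, ts):
    """Group concepts into clusters; each cluster's first element is its supremum."""
    clusters = []
    member_of = {}  # concept -> set of cluster indices it belongs to
    for c in order:
        similar = [p for p, e in parents.get(c, {}).items() if e > ts]
        if not similar:
            # Not similar to any superconcept: c starts a new cluster.
            clusters.append([c])
            member_of[c] = {len(clusters) - 1}
        else:
            # c joins every cluster that contains a similar superconcept.
            idx = set().union(*(member_of[p] for p in similar))
            for i in sorted(idx):
                clusters[i].append(c)
            member_of[c] = idx
    return clusters


def superconcepts(c, parents):
    """All strict superconcepts of c (transitive closure of the lattice order)."""
    seen, stack = set(), list(parents.get(c, {}))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, {}))
    return seen


def cluster_order(clusters, parents):
    """Pairs (i, j) where cluster i is a subconcept of cluster j."""
    relation = set()
    for i, ci in enumerate(clusters):
        above = superconcepts(ci[0], parents)  # ci[0] is sup(L_i)
        for j, cj in enumerate(clusters):
            if i != j and above & set(cj):
                relation.add((i, j))
    return relation


clusters = conceptual_clusters(ORDER, PARENTS, ts=0.4)
hierarchy = cluster_order(clusters, PARENTS)
```

The relation returned by `cluster_order` is the full order between clusters rather than only its covering pairs; a concept hierarchy diagram would normally draw just the covering pairs.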

[Figure 8(a). Conceptual clusters: CK1, CK2 and CK3 drawn over the fuzzy concept lattice.]
[Figure 8(b). Concept hierarchy with nodes Thing, {“Fuzzy Logic”}, {“Data Mining”, “Clustering”}, {“Data Mining”, “Fuzzy Logic”} and Nothing.]

Step 3: Fuzzy Ontology Generation
- This step constructs fuzzy ontology from a fuzzy context using the concept hierarchy created by fuzzy conceptual clustering.
- This is possible because both FCA and ontology support formal definitions of concepts.
- However, a concept defined in FCA has both extensional and intensional information in a balanced manner, whereas a concept in ontology emphasizes its intensional aspect.
- To construct the fuzzy ontology, we need to convert both the intensional and the extensional information of FCA concepts into the corresponding classes and relations of the ontology.
- Thus, we define the fuzzy ontology as follows.

Definition (Fuzzy Ontology). A fuzzy ontology $F_O$ consists of 4 elements $(C, A^C, R, X)$, where:
- $C$ is a set of concepts;
- $A^C$ represents a collection of attribute sets, one for each concept;
- $R = (R_T, R_N)$ represents a set of relationships, which consists of 2 elements: $R_N$ is a set of non-taxonomy relationships and $R_T$ is a set of taxonomy relationships.
- Each concept $c_i$ in $C$ represents a set of objects, or instances, of the same kind.
- Each object $o_{ij}$ of a concept $c_i$ can be described by a set of attribute values denoted by $A^C(c_i)$.
Each relationship ri(cp, cq) in R represents a binary association between concepts cp and cq, and the instances of such a relationship are pairs of (cp, cq) concept objects. Each attribute value of an object or relationship instance is associated with a fuzzy membership value in [0, 1] that indicates the degree of uncertainty of this attribute value or relationship. X is a set of axioms. Each axiom in X is a constraint on the attribute values of concepts and relationships, or a constraint on the relationships between concept objects.

Example (Fuzzy Ontology). The Scholarly Ontology OS = (C, AC, R, X) is a fuzzy ontology whose components are as follows.

C = {“Document”, “Research Area”}
AC(“Document”) = {“Name”, “Author”, “Title”, “Keywords”, “Abstract”, “Body”, “Publisher”, “Publication Date”}
AC(“Research Area”) = {“Name”, “Keyword”}
RN = {belong-to(“Document”, “Research Area”), consist-of(“Research Area”, “Document”)}
RT = {superarea-of(“Research Area”, “Research Area”), subarea-of(“Research Area”, “Research Area”)}
X = {
Implies(Antecedent(consist-of(I-variable(x1) I-variable(x2))) Consequent(belong-to(I-variable(x2) I-variable(x1))))
Implies(Antecedent(belong-to(I-variable(x1) I-variable(x2))) Consequent(consist-of(I-variable(x2) I-variable(x1))))
Implies(Antecedent(superarea-of(I-variable(x1) I-variable(x2))) Consequent(subarea-of(I-variable(x2) I-variable(x1))))
Implies(Antecedent(subarea-of(I-variable(x1) I-variable(x2))) Consequent(superarea-of(I-variable(x2) I-variable(x1))))
}

[Figure 9. Fuzzy ontology generation process: starting from the fuzzy context and the concept hierarchy, Class Mapping produces the ontology extent and intent classes, Taxonomy Relation Generation produces the ontology hierarchical classes, Non-Taxonomy Relation Generation produces the ontology relation classes, and Instances Generation produces the fuzzy ontology.]
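The four-element definition and the Scholarly Ontology example can be written down as a small data structure. This is a minimal sketch, not the authors' implementation: the class `FuzzyOntology`, its field names and the rule that an inverse fact inherits the membership value of the original fact are all assumptions made for illustration, and only axioms of the "r(x1, x2) implies r'(x2, x1)" form are modelled. The instance name and the value 0.87 are sample data.

```python
from dataclasses import dataclass, field


@dataclass
class FuzzyOntology:
    concepts: set        # C
    attributes: dict     # AC: concept -> list of attribute names
    taxonomy: set        # RT: names of taxonomy relationships
    non_taxonomy: set    # RN: names of non-taxonomy relationships
    inverse_of: dict     # X, restricted to inverse-relation axioms
    # (relation, subject, object) -> fuzzy membership value in [0, 1]
    facts: dict = field(default_factory=dict)

    def assert_fact(self, relation, subj, obj, membership):
        if not 0.0 <= membership <= 1.0:
            raise ValueError("membership must lie in [0, 1]")
        self.facts[(relation, subj, obj)] = membership

    def apply_axioms(self):
        """Add the inverse of every stated fact.

        Assumption: the derived fact keeps the membership of the original.
        """
        for (relation, subj, obj), mu in list(self.facts.items()):
            inverse = self.inverse_of.get(relation)
            if inverse is not None:
                self.facts.setdefault((inverse, obj, subj), mu)


scholarly = FuzzyOntology(
    concepts={"Document", "Research Area"},
    attributes={
        "Document": ["Name", "Author", "Title", "Keywords", "Abstract",
                     "Body", "Publisher", "Publication Date"],
        "Research Area": ["Name", "Keyword"],
    },
    taxonomy={"superarea-of", "subarea-of"},
    non_taxonomy={"belong-to", "consist-of"},
    inverse_of={
        "consist-of": "belong-to",
        "belong-to": "consist-of",
        "superarea-of": "subarea-of",
        "subarea-of": "superarea-of",
    },
)

# Illustrative instance data only.
scholarly.assert_fact("belong-to", "Document2", "Fuzzy Logic", 0.87)
scholarly.apply_axioms()
print(scholarly.facts[("consist-of", "Fuzzy Logic", "Document2")])
```

Storing the membership value on each relationship instance follows the definition's statement that every attribute value or relationship instance carries a value in [0, 1].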
Class Mapping furnishes C = {E, I}, in which E and I are the classes corresponding to the extent and intent of the fuzzy context. For example, the extent class mapped from the extent of the fuzzy context given in Table 1(b) can be labeled manually as Document. We can use appropriate names to represent keyword attributes and use them to label the intent class names as well. For example, the class Research Area can be used to label the initial intent class.

Taxonomy Relation Generation furnishes RT = {superclass(I,I), subclass(I,I)}. Thus, the hierarchical relations between instances of intent classes are defined. Two rules are also added to X accordingly:
superclass(X,Y) :- subclass(Y,X).
subclass(X,Y) :- superclass(Y,X).

Non-taxonomy Relation Generation furnishes RN = {RIE(I,E), REI(E,I)}, in which REI is the relation between the extent class and the intent class, and RIE is the inverse relation of REI. However, we still need to label the non-taxonomy relations. For example, the relation between class Document and class Research Area can be labeled as belong-to, which implies that a document can belong to one or more research areas. Two rules are also added to X accordingly:
REI(X,Y) :- RIE(Y,X).
RIE(X,Y) :- REI(Y,X).

Instances Generation generates the instance set I = {II, IE}, where II and IE are the instances of the intent and extent classes. It then furnishes membership values for the instances' attributes and relationships.
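The four sub-steps above (class mapping, taxonomy relations, non-taxonomy relations, instances) can be strung together as follows. This is a rough sketch under stated assumptions, not the procedure used in the slides: the function name `generate_ontology`, the dictionary layout, the sample documents D1 and D2 with their membership values, and in particular the use of the minimum over keyword memberships as a document's membership in a multi-keyword research area are all hypothetical choices.

```python
def generate_ontology(context, hierarchy, extent_label="Document",
                      intent_label="Research Area",
                      relation="belong-to", inverse="consist-of"):
    """context: document -> {keyword: membership};
    hierarchy: list of (sub_area, super_area) pairs, each area being a
    non-empty frozenset of keywords."""
    # Class mapping: one class for the extent, one for the intent.
    classes = {"E": extent_label, "I": intent_label}

    # Instances of the extent class (documents) and intent class (areas).
    areas = {area for pair in hierarchy for area in pair}
    instances = {"IE": sorted(context), "II": sorted(areas, key=sorted)}

    # Taxonomy relations, stored in both directions as the two rules require.
    taxonomy = set()
    for sub, sup in hierarchy:
        taxonomy.add(("subclass", sub, sup))
        taxonomy.add(("superclass", sup, sub))

    # Non-taxonomy relations with membership values.
    # Assumption: membership in an area is the minimum over its keywords.
    non_taxonomy = {}
    for doc, memberships in context.items():
        for area in areas:
            mu = min(memberships.get(k, 0.0) for k in area)
            if mu > 0.0:
                non_taxonomy[(relation, doc, area)] = mu
                non_taxonomy[(inverse, area, doc)] = mu

    return {"C": classes, "instances": instances,
            "RT": taxonomy, "RN": non_taxonomy}


# Hypothetical fuzzy context and hierarchy, for illustration only.
dm = frozenset({"Data Mining"})
dm_cl = frozenset({"Data Mining", "Clustering"})
context = {
    "D1": {"Data Mining": 0.8, "Clustering": 0.6},
    "D2": {"Data Mining": 0.5},
}
ontology = generate_ontology(context, [(dm_cl, dm)])
print(ontology["RN"][("belong-to", "D1", dm_cl)])
```

Labels such as Document, Research Area and belong-to are passed in as arguments because the slides note that these names are assigned manually.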
Step 4 Semantic Representation Conversion

The generated fuzzy ontology provides a conceptual model of knowledge in the corresponding domain. However, to make such knowledge accessible and sharable, we must convert it into a semantic representation that can be embedded into the contents of Web pages. In the Semantic Web, an ontology description language such as OWL can be used to annotate the ontology. Therefore, the generated fuzzy ontology can be automatically converted into the corresponding semantic representation in OWL, in which each class and instance is annotated as shown below.

Ontology for the concept hierarchy represented in OWL:

<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns="http://www.owlontologies.com/unnamed.owl#"
    xml:base="http://www.owlontologies.com/unnamed.owl">
  <owl:Ontology rdf:about=""/>
  <owl:Class rdf:ID="Concept_2"/>
  <owl:Class rdf:ID="Concept_1"/>
  <owl:Class rdf:ID="Concept_3">
    <rdfs:subClassOf rdf:resource="#Concept_1"/>
    <rdfs:subClassOf rdf:resource="#Concept_2"/>
  </owl:Class>
  <owl:DatatypeProperty rdf:ID="Data_Mining"/>
  <owl:DatatypeProperty rdf:ID="DataMining">
    <rdfs:domain rdf:resource="#Concept_1"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#float"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="FuzzyLogic">
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#float"/>
    <rdfs:domain rdf:resource="#Concept_2"/>
  </owl:DatatypeProperty>
  <Concept_2 rdf:ID="Document2">
    <FuzzyLogic rdf:datatype="http://www.w3.org/2001/XMLSchema#float">0.87</FuzzyLogic>
  </Concept_2>
</rdf:RDF>

5. Scholarly Ontology

Ontology Generation

Scientific documents on the research area “Information Retrieval” published in 1987-1997 were collected from ISI.
The downloaded documents are preprocessed to extract related information such as the title, authors, citation keywords, and other citation information.
The extracted information is then stored in a citation database.

First, we construct a fuzzy formal context Kf = {G, M, I}, with G as the set of documents and M as the set of citation keywords. The membership value of a document D on a citation keyword CK in Kf is computed as

$\mu(D, CK) = \frac{n_1}{n_2}$

where n1 is the number of documents that cite D and contain CK, and n2 is the number of documents that cite D. This formula is based on the premise that the more frequently a keyword occurs in the citing papers, the more important the keyword is in the cited paper.

Then, conceptual clustering is performed on the fuzzy formal context. Each generated conceptual cluster represents a research area. The generated conceptual clusters form a hierarchy of research areas of the documents in the citation database, called the Research Area Hierarchy.
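The membership computation described above (the number of citing documents that contain the keyword, n1, divided by the number of citing documents, n2) can be sketched as follows. The function names, the input layout and the sample citation data are assumptions for illustration; returning 0 for a document with no citing documents is also an assumption, since the slides do not cover that case.

```python
def membership(citing_keyword_sets, keyword):
    """n1 / n2 for one cited document.

    citing_keyword_sets: list of keyword sets, one per citing document.
    """
    n2 = len(citing_keyword_sets)
    if n2 == 0:
        return 0.0  # assumption: an uncited document gets membership 0
    n1 = sum(1 for keywords in citing_keyword_sets if keyword in keywords)
    return n1 / n2


def fuzzy_context(citations, keywords_of):
    """Build document -> {keyword: membership}.

    citations: cited document -> list of citing document ids;
    keywords_of: document id -> set of citation keywords.
    """
    context = {}
    for doc, citers in citations.items():
        citer_keywords = [keywords_of[c] for c in citers]
        vocabulary = set().union(*citer_keywords)
        context[doc] = {
            k: membership(citer_keywords, k) for k in sorted(vocabulary)
        }
    return context


# Hypothetical citation data: document D is cited by P1 to P4.
keywords_of = {
    "P1": {"Data Mining", "Clustering"},
    "P2": {"Data Mining"},
    "P3": {"Data Mining", "Fuzzy Logic"},
    "P4": {"Fuzzy Logic"},
}
kf = fuzzy_context({"D": ["P1", "P2", "P3", "P4"]}, keywords_of)
print(kf["D"])
```

In this sample, three of the four citing documents contain "Data Mining", so D receives a membership of 3/4 on that keyword.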
Example of concept hierarchy generated

[Figure 11. Concept hierarchy of research areas. Node labels: {“Information Retrieval”, “Query Processing”, “Searching”}; {“User Interface”, “Browsing”}; {“Retrieval Evaluation”, “System Training”}; {“Online Search”, “Information Filtering”}; {“User Satisfaction”, “User Training”, “User Study”}; {“Data Mining”}; {“Data Indexing”}; {“Semantic Similarity”, “Knowledge Representation”}; {“Clustering”, “Neural Network”}; {“Expert System”}; {“Recall”, “Precision”}; {“Text Retrieval”}.]

Each research area is represented by the set of most frequent keywords occurring in the documents that belong to that research area. In FFCA, sub-areas inherit keywords from their super-areas. Note that the inherited keywords are not shown in Figure 11 when labeling the concepts. Only keywords specific to the concepts are used for labeling.

Ontology Generation

The generated ontology contains scholarly information as a hierarchy of research areas, as well as the research areas for each document. Taking advantage of the Semantic Web, such knowledge can be easily shared and reused by other systems for browsing or retrieval. For example, we can use Protégé-2000 for browsing the scholarly ontology.

Part of the generated concept hierarchy of research areas

[Fig. 12. Part of the generated concept hierarchy of research areas.]

We use the keyword that has the highest membership value to label the research area. Nevertheless, users can browse more information about each research area.

Performance Evaluation

The performance of the ontology generation is evaluated based on the generated Research Area Hierarchy. Firstly, we measure the typical recall, precision and F-measure to evaluate the clustering results. Secondly, we use the relaxation error and the corresponding cluster goodness measure to evaluate the goodness of the conceptual clusters generated.
We also show whether the use of fuzzy membership values instead of crisp values can help improve cluster goodness. Finally, we use the Average Uninterpolated Precision (AUP), which is a typical measure for evaluating a hierarchical construct, to evaluate the goodness of the generated concept hierarchy.

Keyword attributes are the descriptors for the generated clusters. If more keywords are extracted and used, do the cluster descriptors become more meaningful? To verify this, we vary the number of keywords N extracted from the documents from 2 to 10, and the similarity threshold Ts from 0.2 to 0.9, when performing conceptual clustering. We have classified the documents downloaded from ISI into classes based on their research themes. These classes are used as a benchmark to evaluate the clustering results in terms of recall, precision and F-measure.

Performance Evaluation - Precision

Precision indicates the accuracy of the clustering results. Table 6 shows that when N is small, the precision is poor, which implies that the clusters contain “noisy” data.

Table 6. Performance results using precision measurement.

        Ts=0.2  Ts=0.3  Ts=0.4  Ts=0.5  Ts=0.6  Ts=0.7  Ts=0.8  Ts=0.9
N=2     0.64    0.64    0.64    0.64    0.63    0.62    0.62    0.62
N=3     0.66    0.66    0.66    0.66    0.64    0.62    0.62    0.62
N=4     0.73    0.77    0.78    0.79    0.74    0.69    0.68    0.68
N=5     0.8     0.84    0.84    0.85    0.81    0.75    0.75    0.75
N=6     0.9     0.9     0.9     0.9     0.86    0.8     0.79    0.8
N=7     0.96    0.94    0.93    0.93    0.9     0.86    0.84    0.84
N=8     0.95    0.94    0.92    0.93    0.9     0.86    0.83    0.83
N=9     0.94    0.93    0.92    0.92    0.89    0.86    0.83    0.83
N=10    0.93    0.92    0.91    0.91    0.89    0.85    0.83    0.83

The precision improves when the number of extracted keywords is increased. However, this also causes the recall to decrease, as shown in Table 7.
Performance Evaluation - Recall

As the number of clusters gradually increases, the efficiency of the clustering results gradually decreases.

Table 7. Performance results using recall measurement.

        Ts=0.2  Ts=0.3  Ts=0.4  Ts=0.5  Ts=0.6  Ts=0.7  Ts=0.8  Ts=0.9
N=2     0.99    0.99    0.99    0.99    0.99    0.98    0.98    0.98
N=3     0.99    0.99    0.99    0.99    0.98    0.98    0.97    0.97
N=4     0.98    0.98    0.97    0.97    0.94    0.95    0.94    0.94
N=5     0.89    0.87    0.87    0.88    0.87    0.89    0.89    0.89
N=6     0.8     0.81    0.83    0.83    0.83    0.85    0.85    0.85
N=7     0.81    0.8     0.82    0.82    0.83    0.84    0.86    0.86
N=8     0.79    0.79    0.81    0.82    0.82    0.84    0.85    0.85
N=9     0.76    0.77    0.8     0.8     0.81    0.83    0.84    0.84
N=10    0.73    0.75    0.78    0.78    0.79    0.81    0.83    0.83

Performance Evaluation - F-measure

When N is low, the F-measure is quite poor. Nevertheless, the F-measure is stable and good when a sufficient number of keywords are extracted. The results also show that the F-measure tends to be best when Ts = 0.5.

Table 8. Performance results using the F-measure.

        Ts=0.2  Ts=0.3  Ts=0.4  Ts=0.5  Ts=0.6  Ts=0.7  Ts=0.8  Ts=0.9
N=2     0.78    0.78    0.78    0.78    0.77    0.76    0.76    0.76
N=3     0.79    0.79    0.79    0.79    0.77    0.76    0.76    0.76
N=4     0.83    0.86    0.86    0.87    0.82    0.79    0.78    0.78
N=5     0.84    0.85    0.85    0.86    0.83    0.81    0.81    0.81
N=6     0.85    0.85    0.86    0.86    0.84    0.82    0.82    0.82
N=7     0.88    0.86    0.87    0.87    0.86    0.85    0.85    0.85
N=8     0.86    0.86    0.86    0.87    0.85    0.85    0.84    0.84
N=9     0.84    0.84    0.86    0.86    0.85    0.84    0.83    0.83
N=10    0.81    0.82    0.84    0.84    0.83    0.83    0.83    0.83
Average 0.83    0.83    0.83    0.84    0.82    0.81    0.8     0.8

Performance Evaluation - Relaxation Error

Relaxation error reflects the dissimilarity of items in a cluster based on their attribute values. Since conceptual clustering techniques typically use a set of attributes for concept generation, relaxation error is quite commonly used for evaluating the goodness of conceptual clusters.
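As a rough illustration of this measure, the sketch below computes the relaxation error as the sum, over attributes and over all pairs of items, of P(xi) P(xj) da(xi, xj), and the cluster goodness as G(C) = 1 - RE(C), following the definition given on the relaxation error slides. Two choices are assumptions not stated there: every item is taken to be equally likely, P(xi) = 1/n, and the per-attribute distance is the absolute difference of fuzzy membership values. The function names and sample clusters are hypothetical.

```python
def relaxation_error(cluster, attributes):
    """Sum over attributes a and item pairs (i, j) of P(xi) P(xj) d_a(xi, xj).

    cluster: list of items, each a dict attribute -> membership value.
    """
    n = len(cluster)
    if n == 0:
        raise ValueError("cluster must not be empty")
    p = 1.0 / n  # assumption: uniform probability for every item
    total = 0.0
    for a in attributes:
        for xi in cluster:
            for xj in cluster:
                # assumption: distance = absolute membership difference
                total += p * p * abs(xi.get(a, 0.0) - xj.get(a, 0.0))
    return total


def goodness(cluster, attributes):
    """Cluster goodness G(C) = 1 - RE(C)."""
    return 1.0 - relaxation_error(cluster, attributes)


# Hypothetical clusters over a single keyword attribute.
tight = [{"Data Mining": 0.9}, {"Data Mining": 0.9}]
loose = [{"Data Mining": 1.0}, {"Data Mining": 0.0}]

print(goodness(tight, ["Data Mining"]))
print(goodness(loose, ["Data Mining"]))
```

Because the sum runs over all attributes without normalization, the value of RE can exceed 1 when many attributes differ; whether the original evaluation normalizes by the number of attributes is not stated in the slides.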
The relaxation error RE of a cluster C is defined as

$RE(C) = \sum_{a \in A} \sum_{i=1}^{n} \sum_{j=1}^{n} P(x_i) P(x_j) d_a(x_i, x_j)$

where A is the set of attributes of the items in C, P(xi) is the probability of item xi occurring in C, and da(xi, xj) is the distance between xi and xj on attribute a. The cluster goodness G of cluster C is defined as G(C) = 1 - RE(C).

Comparison of FFCA and COBWEB while the number of extracted keywords is varied from 2 to 10

We vary the number of keywords extracted to observe the effect of the number of keywords on cluster goodness. In addition, since COBWEB is considered one of the most popular techniques for conceptual clustering, we also apply COBWEB to the citation database to compare the performance. The results show that FFCA achieves better cluster goodness than COBWEB.

Performance Evaluation - AUP

Average Uninterpolated Precision (AUP) is defined as the sum of the precision values at each point (or node) in a hierarchical structure where a relevant item appears, divided by the total number of relevant items. Typically, AUP indicates the goodness of a hierarchical concept structure. For evaluating AUP, we have manually classified the downloaded documents into classes based on their research themes. For each class, we extract the 5 most frequent keywords from the documents in the class. Then, we use these keywords as inputs to form retrieval queries and evaluate the retrieval performance using AUP.

There are two ways to generate document keywords. The first is to use the set of keywords, known as attribute keywords, from each conceptual cluster as the document keywords. The second is to use the keywords from each document as the document keywords.
Then, we vectorize the document keywords and the input query, and calculate the distance between the vectors to measure the retrieval performance.

Two methods:
1. AUP measured using attribute keywords: Hierarchical Average Uninterpolated Precision (AUP(H)), as each concept inherits attribute keywords from its superconcepts.
2. AUP measured using keywords from documents: Unconnected Average Uninterpolated Precision (AUP(U)).

Fig. 14 shows the results for AUP(H) and AUP(U) using different numbers of extracted keywords N. It shows that as N gets larger, the performance on AUP(H) and AUP(U) improves. In addition, performance on AUP(H) is generally better than on AUP(U). This means that the attribute keywords generated for the conceptual clusters are appropriate.

[Fig. 14. AUP(H) and AUP(U) for different numbers of extracted keywords N.]

6. Semantic Helpdesk Application

Introduction

Developed in collaboration with a multinational company, the Semantic Help-Desk Environment comprises the Web Service Requester, the Matchmaking Agent and the Web Service Provider. The focus is on the fuzzy ontology generation process that generates the Machine Service Ontology from a customer service database. This approach enables individual machine service knowledge to be shared over the Semantic Web. Thus, machine service knowledge from different machines or models provided by different manufacturers can be shared and integrated. This is important because many customers have different types of machines and models from different manufacturers.

Introduction - Web Service Requester

A kind of Web Service that enables access to customer support for machine services. Instances of the Web Service Requester can be created from a Web Requester Server whose address is accessible to all users through the Web.
When encountering a problem, a user can use the Web to connect to the Web Requester Server in order to create an instance of the Web Service Requester. The created instance runs as a web-based program. That is, it can use the Web to interact with the user and other programs.

Through the Web, the Web Service Requester instance provides an interface for users to enter the problem they wish to report. Through the interface, the user can specify the encountered fault as a textual string. The user is also required to enter the code of the machine model. The given information is used to form a profile for the Web Service Requester. The profile is then sent as a request to the Matchmaking Agent to seek a potential Web Service Provider for solving the problem.

Introduction - Web Service Provider

It offers its machine service support as a Web Service extended with ontology capabilities. There are probably many instances of a Web Service Provider existing concurrently on the Internet. An instance of the Web Service Provider can be considered as a program that can access the Machine Service Ontology to retrieve machine service knowledge for a given reported problem. An instance of the Web Service Provider can interact with other programs. That is, it can be called by other programs and return its outputs to the calling programs. Instances of the Web Service Provider must be registered with a specific agent known as the Matchmaking Agent, which serves as a registry and look-up service.

Each instance of the Web Service Provider also provides a profile file that describes its parameters and capabilities. XML is used in most Web Services to represent the information contained in the profiles.
However, traditional XML lacks the capability to represent semantic information. To overcome this problem, the Web Service Provider uses the ontology-based service description language OWL-S (formerly DAML-S) to describe the information in its profile. Hence, we describe the service as an OWL ontology, and its intentional information can be fully understood by other programs.

Introduction - Matchmaking Agent

When the Matchmaking Agent receives machine service requests from the Web Service Requester, it locates the appropriate Web Services that can fulfill the request.

Overview

[Diagram: each manufacturer maintains Customer Service Databases and Machine Service Ontologies behind a Web Service Provider; customers use a client Web browser and the Web Service Requester; requesters and providers are connected over the Internet through the Matchmaking Agent.]

Customer Service Database

The customer service database contains 9000 service records. Each record consists of fault-condition and checkpoint information.
The fault-condition contains the service engineer's description of the machine fault.
The checkpoint information indicates the suggested actions to be carried out to repair the machine, based on the fault-condition reported by the customer.

Example record:

Fault-condition: 3008 PCB CARRY MISS ERROR. PCB WAS NOT TRANSFERRED BY THE CARRIER DURING LOADING BUT STAYED AT THE DETECTION POSITION OF PCB DETECTION SENSOR 2.
Checkpoint group: AVF_CHK003

Priority  Checkpoint description                                                   Help file
1         CONFIRM WHETHER THE CARRY GUIDE PINS ARE IN LINE WITH PCB.               AVF_CHK 007-1.GIF
2         CONFIRM WHETHER THE PCB IS IN CORRECT DIRECTION.                         AVF_CHK 007-2.GIF
3         CONFIRM THE POSITION OF THE GUIDE LOWER LIMIT SENSOR. (I/O 0165)         AVF_CHK 007-3.GIF
4         CONFIRM THE TIMING FOR PCB 2 DETECT SENSOR.                              AVF_CHK 007-4.GIF

Machine Service Ontology Generation

Apply FOGF to obtain Fuzzy Fault Concept Lattice → Fault Concept Hierarchy → Machine Service Ontology.

[Figure: Part of the Fault Concept Hierarchy of the machine model AV_2011. Root: Any fault. Node labels: {“Anvil”}; {“Drive”}; {“Cutter”}; {“Component”}; {“Anvil”, “Joint”, “Cannot Engage”}; {“Anvil”, “Shaky”, “Unit”}; {“Anvil”, “Drive”, “Cannot Open”, “Pitch”}; {“Cutter”, “Drive”, “Cannot Open”, “Axis”}; {“Cutter”, “Component”, “Cut”, “Insertion”}; {“Component”, “Float”, “PCB”}.]

The generation process creates classes, relations and instances for the service ontology. The machine fault service knowledge stored in the Customer Service Database is known as non-taxonomy knowledge, whereas the machine fault hierarchy knowledge from the Fault Concept Hierarchy is called taxonomy knowledge. These two types of knowledge are combined to form the Machine Service Ontology.
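The taxonomy part of such an ontology can be serialized to OWL mechanically. The slides do not say how the conversion is implemented, so the following is only a minimal sketch using Python's standard `xml.etree.ElementTree`; the function name `hierarchy_to_owl` and the input layout are assumptions, and the class identifiers are borrowed from the Machine Service Ontology listing in the slides. Labels, properties and instances are left out.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Use the conventional prefixes in the serialized output.
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


def hierarchy_to_owl(parents):
    """Serialize a class hierarchy as RDF/XML.

    parents: class id -> list of parent class ids (possibly empty).
    """
    root = ET.Element(f"{{{RDF}}}RDF")
    for class_id, supers in parents.items():
        cls = ET.SubElement(root, f"{{{OWL}}}Class",
                            {f"{{{RDF}}}ID": class_id})
        for sup in supers:
            ET.SubElement(cls, f"{{{RDFS}}}subClassOf",
                          {f"{{{RDF}}}resource": f"#{sup}"})
    return ET.tostring(root, encoding="unicode")


# Fault clusters for "Anvil", "Cutter" and their combination.
fault_clusters = {
    "Machine_Fault_Cluster": [],
    "Machine_Fault_Cluster_1": ["Machine_Fault_Cluster"],
    "Machine_Fault_Cluster_2": ["Machine_Fault_Cluster"],
    "Machine_Fault_Cluster_3": ["Machine_Fault_Cluster_1",
                                "Machine_Fault_Cluster_2"],
}
print(hierarchy_to_owl(fault_clusters))
```

Expressing each hierarchy edge as `rdfs:subClassOf` mirrors the way the earlier concept hierarchy example in OWL links Concept_3 to Concept_1 and Concept_2.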
Machine Service Ontology in OWL

<rdf:RDF
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:xsd="http://www.w3.org/2000/10/XMLSchema#"
    xmlns:daml="http://www.w3.org/2001/10/daml+oil#">
  <owl:Ontology rdf:about="">
    <owl:versionInfo>v 1.0 2004-12-07 19:06:40</owl:versionInfo>
    <rdfs:label>Machine Service Ontology</rdfs:label>
  </owl:Ontology>
  <owl:Class rdf:ID="Machine"/>
  <owl:Class rdf:ID="Checkpoint"/>
  <owl:Class rdf:ID="Machine_Fault_Cluster"/>
  …
  <owl:Class rdf:ID="Machine_Fault_Cluster_1">
    <rdfs:label>Anvil</rdfs:label>
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#Machine_Fault_Cluster"/>
      </owl:Restriction>
    </rdfs:subClassOf>
    <owl:ObjectProperty rdf:ID="Anvil">
      <rdfs:range rdf:resource="&xsd;Float"/>
    </owl:ObjectProperty>
  </owl:Class>
  <owl:Class rdf:ID="Machine_Fault_Cluster_2">
    <rdfs:label>Cutter</rdfs:label>
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#Machine_Fault_Cluster"/>
      </owl:Restriction>
    </rdfs:subClassOf>
    <owl:ObjectProperty rdf:ID="Cutter">
      <rdfs:range rdf:resource="&xsd;Float"/>
    </owl:ObjectProperty>
  </owl:Class>
  <owl:Class rdf:ID="Machine_Fault_Cluster_3">
    <rdfs:label>Anvil_Cutter</rdfs:label>
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#Machine_Fault_Cluster_1"/>
        <owl:onProperty rdf:resource="#Machine_Fault_Cluster_2"/>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
  …
  <owl:Class rdf:ID="Machine_Fault">
    <owl:ObjectProperty rdf:ID="occur_on">
      <rdfs:domain rdf:resource="#Machine"/>
    </owl:ObjectProperty>
    <owl:ObjectProperty rdf:ID="inspect_to">
      <rdfs:domain rdf:resource="#Checkpoint"/>
    </owl:ObjectProperty>
    <owl:ObjectProperty rdf:ID="belong_to">
      <rdfs:domain rdf:resource="#Machine_Fault_Cluster"/>
    </owl:ObjectProperty>
  </owl:Class>
</rdf:RDF>

Experiments

Data stored in the database was divided into 10 subsets.
Each subset was used in turn as the testing set while the others were used for generating the conceptual clusters. Keywords in the fault conditions of each testing set were extracted and fuzzified as testing fuzzy queries. To verify whether fuzzy queries can improve the retrieval performance, the extracted keywords are also used for retrieval without membership values, as crisp queries, for comparison.

The faults in each machine model were manually classified into groups based on the machine components in which the fault occurred. Retrieval accuracy is evaluated based on the number of retrieved faults that are in the same classified group as the query.

Performance Measures

Recall, Precision and F-measure:

$\text{recall} = \frac{\text{number of fault conditions retrieved and correct}}{\text{total number of fault conditions correct}}$

$\text{precision} = \frac{\text{number of fault conditions retrieved and correct}}{\text{total number of fault conditions retrieved}}$

$\text{F-measure} = \frac{2 \times \text{recall} \times \text{precision}}{\text{recall} + \text{precision}}$

Retrieval Performance

[Three plots: Recall, Precision and F-measure (vertical axes, 0 to 1) against Confidence Threshold (horizontal axes, 0 to 0.9), each comparing Crisp Query and Fuzzy Query.]

Performance Comparison

Retrieval accuracy is compared with four other techniques:
Two variations of the k-nearest neighbor (kNN) technique. The first variation (kNN1) uses the normalized Euclidean distance between vectors to perform the retrieval. The second (kNN2) makes use of the fuzzy-trigram technique to do so.
Two kinds of artificial neural networks (ANN): the supervised learning vector quantization (LVQ3) neural network and the unsupervised Self-Organizing Map (SOM).

Retrieval Technique       Retrieval Accuracy
kNN1                      81.4%
kNN2                      77.6%
LVQ3                      93.2%
SOM                       90.3%
FFCA with Crisp Query     84.6%
FFCA with Fuzzy Query     93.0% (Confidence Threshold = 0.2)

• FFCA with fuzzy query outperformed kNN.
• LVQ3 performed marginally better, but it requires prior expert knowledge for training, which would be a problem when dealing with large amounts of uncertainty information.
• The proposed technique can generate a concept hierarchy from the clusters, which is important information for generating a corresponding meaningful ontology.

7. Summary

Proposed a framework for fuzzy ontology generation (FOGF) with uncertainty information.
FOGF consists of the following steps:
Fuzzy Formal Concept Analysis
Fuzzy Conceptual Clustering
Fuzzy Ontology Generation
Semantic Representation Conversion

FOGF can represent uncertainty information and construct a concept hierarchy from the uncertainty information.
Apart from constructing the scholarly ontology from a citation database, FOGF has also been used to generate the Machine Service Ontology for the Semantic Help-desk and the Reuters News Topic Themes Ontology.
The scholarly ontology has also been partially used to construct a Scholarly Semantic Web, a Semantic Web-based information retrieval system that supports scholarly activities in the Semantic Web environment.

References (Not intended to be Exhaustive)

Ontology Editors
[1] http://protege.stanford.edu/
[2] S. Bechhofer, I. Horrocks, P. Patel-Schneider, and S. Tessaris, “A proposal for a description logic interface,” in Proceedings of the International Workshop on Description Logics, pp. 33-36, 1999.
Large corpora
[3] E. Morin, “Automatic acquisition of semantic relations between terms from technical corpora,” in Proceedings of the Fifth International Congress on Terminology and Knowledge Engineering (TKE-99), (Vienna, Austria), 1999.
[4] M. Hearst, “Automatic acquisition of hyponyms from large text corpora,” in Proceedings of the Fourteenth International Conference on Computational Linguistics, (France), 1992.
Knowledge base of rules
[5] P. Compton and A. Jansen, Knowledge Acquisition, ch. A Philosophical Basis for Knowledge Acquisition, pp. 241-257.
Statistical approaches
[6] H. Suryanto and P. Compton, “Discovery of ontologies from knowledge bases,” in Proceedings of the 5th International Conference on Knowledge Capture (Y. Gil, M. Musen, and J. Shavlik, eds.), (Victoria, Canada), pp. 171-178, 2001.
Semi-structured schemata based on Graphs
[7] A. Deitel, C. Faron, and R. Dieng, “Learning ontologies from RDF annotations,” in Proceedings of the IJCAI Workshop in Ontology Learning, (Seattle, USA), 2001.
Semi-structured schemata based on Statistics
[8] C. Papatheodorou, A. Vassiliou, and B. Simon, “Discovery of ontologies for learning resources using word-based clustering,” in Proceedings of ED-MEDIA 2002, (Denver, USA), 2002.
LSD
[9] A. Doan, P. Domingos, and A. Levy, “Learning source descriptions for data integration,” in Proceedings of the Third International Workshop on the Web and Databases, pp. 81-86, 2000.
Database schema
[10] P. Johannesson, “A method for transforming relational schemas into conceptual schemas,” in Proceedings of the 10th International Conference on Data Engineering (M. Rusinkiewicz, ed.), (Houston, USA), pp. 115-122, IEEE Press, 1994.
[11] D. Rubin, M. Hewett, D. Oliver, T. Klein, and R. Altman, “Automatic data acquisition into ontologies from pharmacogenetics relational data sources using declarative object definitions and XML,” in Proceedings of the Pacific Symposium on Biology (R. B. Altman, A. Dunker, L. Hunter, K. Lauderdale, and T. Klein, eds.), (Lihue, HI), 2002.
NLP
[12] D. Lonsdale, Y. Ding, D. Embley, and A. Melby, “Peppering knowledge sources with SALT; boosting conceptual content for ontology generation,” in Proceedings of the AAAI Workshop on Semantic Web Meets Language Resources, 2002.
[13] D. I. Moldovan and R. C. Girju, “An interactive tool for the rapid development of knowledge bases,” International Journal on Artificial Intelligence Tools (IJAIT), vol. 10, no. 1-2, 2001.
Wordnet
[14] http://wordnet.princeton.edu/wordnet/download/
Text-to-Onto
[15] A. Maedche and S. Staab, “Ontology learning for the Semantic Web,” IEEE Intelligent Systems, Special Issue on the Semantic Web, vol. 16, no. 2, 2001.
Keyword frequencies
[16] A. Faatz and R. Steinmetz, “Ontology enrichment with texts from the WWW,” in Proceedings of Semantic Web Mining 2nd Workshop at ECML/PKDD-2002, (Helsinki, Finland), 2002.
[17] R. Navigli, P. Velardi, and A. Gangemi, “Ontology learning and its application to automated terminology translation,” IEEE Intelligent Systems, vol. 18, no. 1, 2003.
Clustering / COBWEB
[18] P. Clerkin, P. Cunningham, and C. Hayes, “Ontology discovery for the Semantic Web using hierarchical clustering,” in Proceedings of Workshop at ECML/PKDD-2001, (Germany), 2001.
Mo'K
[19] G. Bisson and C. Nedellec, “Designing clustering methods for ontology building: The Mo'K workbench,” in Proceedings of the Workshop on Ontology Learning, 14th European Conference on Artificial Intelligence, ECAI'00 (S. Staab, A. Maedche, C. Nedellec, and P. Wiemer-Hastings, eds.), (Germany), 2000.
ESKIMO
[20] S. Kampa, T. Miles-Board, and L. Carr, “Hypertext in the Semantic Web,” The ACM Conference on Hypertext and Hypermedia, pp. 237-238, 2001.
Scholarly Ontology Project
[21] V. Uren, S. Shum, C. Mancini, and G. Li, “Modelling naturalistic argumentation in research literatures,” in Proceedings of the 4th Workshop on Computational Models of Natural Argument, (Valencia, Spain), 2004.
OAI
[22] http://www.openarchives.org/
FCA
[23] B. Ganter and R. Wille, Formal Concept Analysis: Mathematical Foundations.