Ontology Generation and
Applications
Dr. A.C.M. Fong, CEng
Professor of Computer Engineering
School of Computing and Mathematical Sciences
Faculty of Design and Creative Technologies
Auckland University of Technology
[email protected]
Contents
1. Introduction – Semantic Web and Ontology
2. Related Work – Ontology Generation
3. Toward Automated Ontology Generation
4. Fuzzy Ontology Generation Framework
5. Application 1 – Scholarly Info
6. Application 2 – Service Helpdesk
1. Introduction
Semantic Web
The Semantic Web rests on its ability to represent
real-life domains accurately, so that programs can
understand the environment in which they
operate.
In summary, the Semantic Web provides the following benefits:
• It offers an expressive metadata model to represent
data, so that data can be managed effectively.
• Programs can understand the semantic concepts described
in the metadata used on the Semantic Web. Hence, knowledge
carried on the Semantic Web can be shared and reused
among different programs.
• Users can interact with programs using a semantic query
language to specify their requests, thereby improving
retrieval performance.
• The deductive mechanisms used to derive new
information from existing information can be described
clearly, so that knowledge can be reasoned with efficiently.
1. Introduction
Semantic Web Architecture
1. Introduction
Semantic Web Architecture - Layers
• Foundation Layer. The Semantic Web uses Uniform
Resource Identifiers (URIs) to identify resources and
Unicode to encode documents.
• Schema Layer. This layer comprises the XML + NS
(Namespace) + XML Schema layer and the RDF +
RDF Schema layer.
It defines objects and classes, together with their
relations and constraints. XML Schema (XMLS)
and RDF Schema (RDFS), which are based on
XML and RDF respectively, are used for these
layers.
RDFS has been widely used to describe classes at
the Schema Layer.
1. Introduction
Semantic Web Architecture - Layers
• Ontology Layer. This layer provides constructs for
using meta-information to represent domain
knowledge.
In this layer, information is represented as an ontology,
which the Semantic Web adopts to define
knowledge.
• Logic Layer. This layer infers further knowledge
from existing knowledge. It can be integrated
with the Ontology Layer.
In this layer, concepts and relationships defined in
lower layers are converted into Turing-complete
logic languages in order to generate new
knowledge.
1. Introduction
Semantic Web Architecture - Layers



• Proof Layer. This layer provides a mechanism to
check whether a statement is true or not.
• Trust Layer. This layer provides a mechanism
that resolves conflicts between pieces of knowledge
carried by the Semantic Web, forming the "Web of
Trust".
• Digital Signature Layer. This layer uses public-key
cryptography to secure documents.
1. Introduction
Ontology – Definition





Ontology has different definitions. A commonly cited
definition defines ontology as a formal, explicit
specification of a shared conceptualization.
Conceptualization refers to an abstract model of
phenomena in the world by having identified the
relevant concepts of those phenomena.
Explicit means that the type of concepts used, and the
constraints on their use are explicitly defined.
Formal: should be machine readable.
Shared: should capture consensual knowledge
accepted by the communities.
1. Introduction
Ontology Research




Ontology is regarded as a standard conceptual
model for knowledge representation, especially
on the Semantic Web.
The term ontology engineering has been
proposed to denote ontology-related research in
computer science.
Current research issues in ontology
engineering include ontology generation,
ontology mapping, ontology integration and
ontology versioning.
This presentation focuses on ontology generation.
1. Introduction
Ontology Description Languages
An ontology is described using an ontology
description language.
Ontology description languages are based on Web
metadata description languages, which can be
classified into the following three groups:
• HTML-based
• XML-based
• RDF-based
1. Introduction
HTML-based Ontology Description Languages




The tags supported by the traditional Web are
sufficient to represent some semantic knowledge.
Simple HTML Ontology Extensions (SHOE) and Ontobroker
embed additional tags into HTML to
represent knowledge.
However, HTML does not support user-defined
tags. Therefore, it is difficult to define ontology
classes with an HTML-based approach.
Hence, XML-based ontology description
languages have been proposed to overcome this
limitation.
1. Introduction
XML-based Ontology Description Languages





These languages are usually based on XML Schema
(XMLS) or Document Type Definition (DTD).
DTD allows users to define new markup types to
describe information. Therefore, users can define
ontology classes using DTD.
Moreover, XMLS supports the definition of relations
between classes.
Thus, XMLS and DTD can be used directly to
embed semantic information.
However, since XML only provides syntactic
support for knowledge representation, XML-based
ontology description languages face the following
problems when representing knowledge:
1. Introduction
XML-based Ontology Description Languages




A mechanism to define some relationships that are
usually central in ontologies such as is-a or
element-of relationships is lacking in XML.
XML does not support any notion of inheritance,
which is an important attribute in ontologies.
In XML, concepts are defined through tags, which
can be either a string or a combination of other
nested tags. Such a mechanism may not be
sufficient for defining concepts in an ontology, which
may require richer data structures to be
represented.
In XML, the order in which tags appear in a document
must be defined in advance. In contrast, the
ordering of attribute descriptions does not matter
in an ontology.
1. Introduction
RDF-based Ontology Description Languages
RDF extends XML to become a standard for
knowledge representation.
• In addition, RDF Schema (RDFS) can be used to
define classes and class hierarchies in a domain.
The standardization supported by RDF provides two
important contributions:
• A standard set of modeling primitives (e.g. class,
instance) and their relationships (e.g. subclass)
is provided.
• A standardized syntax for writing ontologies is
supported.
• Popular RDF-based ontology description languages
include the DARPA Agent Markup Language (DAML), the
Ontology Inference Layer (OIL), DAML+OIL and the
Web Ontology Language (OWL).
1. Introduction
DARPA Agent Markup Language
DAML, or DAML-ONT, extends RDFS to represent ontology
using an object-oriented approach.
• It embeds some object-oriented concepts to represent
classes. Thus, class representation in DAML-ONT is
better than in RDF.
• Example of DAML-ONT representing the class "Journal",
which is a subclass of the class "Publication Medium" but is
disjoint with the classes "Conference" and "Workshop" (i.e. an
object that belongs to class "Journal" cannot belong to
classes "Conference" or "Workshop"):
<Class ID="Journal">
  <subClassOf resource="#Publication Medium"/>
  <disjointFrom resource="#Conference"/>
  <disjointFrom resource="#Workshop"/>
</Class>
1. Introduction
Ontology Inference Layer
OIL extends RDFS to represent ontology.
Its design is based on three criteria:
• Frame-based. It supports frames to define
classes and their properties. Thus, class
contents can be described more informatively
(e.g. constraints can be placed on class properties).
• Description Logic. It describes knowledge using
logic rules. Thus, knowledge is represented
mathematically and can be processed by
programs.
• Web standards. It is based on XML and RDFS.
1. Introduction
Ontology Inference Layer
<rdfs:Class rdf:ID="animal"/>
<rdfs:Class rdf:ID="plant">
  <rdfs:subClassOf>
    <oil:NOT>
      <oil:hasOperand rdf:resource="#animal"/>
    </oil:NOT>
  </rdfs:subClassOf>
</rdfs:Class>
<rdfs:Class rdf:ID="tree">
  <rdfs:subClassOf rdf:resource="#plant"/>
</rdfs:Class>
• Class "animal" is defined first, followed by class "plant", which
uses the operator "NOT" to state that it is disjoint
from class "animal" (i.e. objects that belong to class
"animal" cannot belong to class "plant" and vice versa).
• Finally, class "tree" is defined as a subclass of "plant".
1. Introduction
DAML vs. OIL
Compared with DAML, OIL can represent class
properties better, but DAML can represent class
relationships more clearly.
• Hence, they can be combined to form a better
ontology description language.
DAML+OIL
• It defines class relationships based on DAML.
• Class properties are defined in a similar way to
OIL.
• Hence, DAML+OIL combines the advantages of both
DAML and OIL.
1. Introduction
Web Ontology Language
OWL extends DAML+OIL to allow users
to define various types of relationships between
classes.
• Properties can also be defined using additional
constructs in OWL.
OWL has three sublanguages:
• OWL Lite
• OWL DL
• OWL Full
1. Introduction
Web Ontology Language
Although these sublanguages share the same OWL
syntax, they differ slightly in design and are
aimed at different
communities of implementers and users:
• OWL Lite primarily supports a classification
hierarchy and simple constraints when designing
classes.
• OWL DL includes all OWL language constructs, but
they can be used only under certain restrictions
(e.g. a class cannot be an instance of another
class).
• OWL Full allows all OWL language constructs to
be used without any restriction.
1. Introduction
Web Ontology Language
<rdf:RDF
  xmlns:owl ="http://www.w3.org/2002/07/owl#"
  xmlns:rdf ="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:xsd ="http://www.w3.org/2000/10/XMLSchema#"
  xmlns:daml="http://www.w3.org/2001/10/daml+oil#">
  <!-- Header info: the namespace declarations above -->
  <!-- Ontology name and version -->
  <owl:Ontology rdf:about="Scholarly Information">
    <owl:versionInfo>v 1.0 2009-12-07 19:06:40</owl:versionInfo>
  </owl:Ontology>
  <!-- 3 classes: Concept1 (labelled Data Mining), Concept2
       (labelled Fuzzy Logic) and Concept3 -->
  <owl:Class rdf:ID="Concept1">
    <rdfs:label>Data Mining</rdfs:label>
  </owl:Class>
  <owl:Class rdf:ID="Concept2">
    <rdfs:label>Fuzzy Logic</rdfs:label>
  </owl:Class>
  <!-- Concept3 is a subclass of both Concept1 and Concept2 -->
  <owl:Class rdf:ID="Concept3">
    <rdfs:label>Data Mining, Fuzzy Logic</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Concept1"/>
    <rdfs:subClassOf rdf:resource="#Concept2"/>
  </owl:Class>
</rdf:RDF>
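To see what a program can read out of such an ontology, the sketch below parses a well-formed variant of the example with Python's standard XML parser and lists each class with its label and superclasses. The use of rdfs:label and rdfs:subClassOf is an assumption about what the slide intends, and `read_classes` and `OWL_DOC` are illustrative names only. A real application would more likely use a dedicated RDF library.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Well-formed variant of the slide's example (assumed encoding of labels and subclasses).
OWL_DOC = f"""<rdf:RDF xmlns:owl="{OWL}" xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}">
  <owl:Ontology rdf:about="Scholarly Information">
    <owl:versionInfo>v 1.0 2009-12-07 19:06:40</owl:versionInfo>
  </owl:Ontology>
  <owl:Class rdf:ID="Concept1"><rdfs:label>Data Mining</rdfs:label></owl:Class>
  <owl:Class rdf:ID="Concept2"><rdfs:label>Fuzzy Logic</rdfs:label></owl:Class>
  <owl:Class rdf:ID="Concept3">
    <rdfs:label>Data Mining, Fuzzy Logic</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Concept1"/>
    <rdfs:subClassOf rdf:resource="#Concept2"/>
  </owl:Class>
</rdf:RDF>"""


def read_classes(xml_text):
    """Return {class id: (label, [superclass ids])} for every owl:Class element."""
    root = ET.fromstring(xml_text)
    classes = {}
    for cls in root.findall(f"{{{OWL}}}Class"):
        class_id = cls.get(f"{{{RDF}}}ID")
        label = cls.findtext(f"{{{RDFS}}}label")
        parents = [
            sub.get(f"{{{RDF}}}resource").lstrip("#")
            for sub in cls.findall(f"{{{RDFS}}}subClassOf")
        ]
        classes[class_id] = (label, parents)
    return classes


if __name__ == "__main__":
    for class_id, (label, parents) in read_classes(OWL_DOC).items():
        print(class_id, label, parents)
```

The point of the sketch is only that class names, labels and subclass links are machine-readable once the markup is well-formed.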
2. Related Work
Ontology Generation



Ontology uses classes, which contain attributes,
to represent concepts.
Ontology also supports taxonomy and non-taxonomy relations between classes.
Although editing tools such as Protege [1] and
OilEd [2] have been developed to help users to
create and edit ontology, it is a tedious task to
manually derive ontology from data.
2. Related Work
Ontology Generation – Approaches



Ontology can be generated from various types of
data, mostly textual.
Large corpora [3,4] are considered as good
sources for mining knowledge for constructing
ontology, since the information in the corpus is
usually well annotated. Therefore, it can be easily
processed by other programs.
Ontology can also be generated from a
knowledge base of rules [5], which is represented
as a tree with rules residing at tree nodes.
Statistical approaches have been used to
estimate the existence of relationships between
entities involved in rules [6].
2. Related Work
Ontology Generation – Approaches



When knowledge is represented in semi-structured schemata such as XML and RDF, its
contents can easily be parsed by programs;
techniques have been proposed to generate
ontology from semi-structured schemata based
on graph theory [7] and statistical approaches
[8].
Learning Source Description (LSD) [9] has been proposed
to generate ontology from arbitrary
formalisms of semi-structured schemata.
The Entity-Relationship model used in database
schemas has also been adopted as an information
source for generating ontology [10,11].
2. Related Work
Ontology Generation –Textual Data





For textual data, ontology concepts can be extracted
efficiently using Natural Language Processing (NLP)
techniques [12,13].
NLP is used to preprocess the textual data in order to
extract significant keywords.
WordNet [14] can be used to improve the accuracy of
ontology generated by NLP-based techniques.
However, NLP techniques have difficulty finding
semantic relationships among the keywords.
Data mining techniques can be combined with NLP to
improve the efficiency of ontology generation. In
Text-to-Onto [15], association rules are used to find
associative relations between keywords, which are
used to construct non-taxonomy relations for the
ontology.
2. Related Work
Ontology Generation –Textual Data



Keywords' frequencies are often used in
statistical approaches [16,17] to identify
significant keywords that can be used to
represent a certain concept.
Clustering techniques have also been applied to
generate ontology from textual data [18].
Using significant keywords extracted from textual
data, clustering techniques can cluster
documents and interpret topics from the
generated clusters.
2. Related Work
Ontology Generation –Clustering





Clustering can be used to mine hidden knowledge
from data to construct an ontology. It can also be
used to enrich existing ontology.
Traditional clustering techniques are useful for
generating non-taxonomy relations for ontology.
In particular, conceptual clustering techniques are
powerful clustering techniques that can conceptualize
clusters and construct a concept hierarchy of clusters
useful for generating taxonomy relations for ontology.
For example, an approach based on COBWEB [18] can
generate taxonomy relations among concepts in a
domain for ontology generation.
Mo'K [19] is a system that can obtain taxonomy
relations from tagged text using conceptual clustering.
2. Related Work
Ontology Applications – Scholarly Info



In E-Scholar Knowledge Inference MOdel
(ESKIMO) [20], knowledge on scholarly
publications is represented as a simple ontology,
known as OntoPortal, which is manually
developed and maintained.
OntoPortal describes and provides links to other
external research pages on the Web. Hypertext
links between the web pages are also described
in the OntoPortal ontology.
ESKIMO allows users to retrieve scholarly
information from the constructed ontology by
using queries represented as Prolog-like rules.
2. Related Work
Ontology Applications – Scholarly Info



In the Scholarly Ontology Project [21], a digital
library Web server is constructed using Semantic
Web technologies in order to support scholarly
retrieval.
It is developed using a collaborative approach in
which researchers submit their documents in
a specifically structured format.
As such, the contents of the submitted
documents can be further processed in the
system and converted into scholarly ontology
accordingly.
2. Related Work
Ontology Applications – Scholarly Info



In the Research in Semantic Scholarly Publishing
(RSSP) project, scientific publications are
collected from online archives such as the Open
Archives Initiative (OAI) [22].
Information about the documents (e.g. their authors,
titles, citations, publishers, etc.) is extracted,
indexed and converted into ontology formalism.
DAML+OIL is used to annotate the ontology as
Semantic Web pages to support scholarly
retrieval.
2. Related Work
Summary





Many techniques exist to construct ontology from
various data types and sources, mainly textual data.
Traditionally, NLP techniques are used to analyze
textual data.
Recently, data mining techniques have been
incorporated into NLP to further discover hidden
knowledge from textual data.
Conceptual clustering is an advanced data mining
technique that can organize data in a hierarchical
conceptual structure.
Thus, conceptual clustering is a useful technique
to discover knowledge for generating ontology
from textual data.
3. Toward Automated Ontology Generation
Basics



The initial focus is on scholarly information.
Scholarly ontology is generated directly from
explicit information on scientific publications (e.g.
their titles, authors, citations, etc.).
Other advanced scholarly knowledge, such as
research experts and areas, is usually inferred
manually by human experts.
3. Toward Automated Ontology Generation
Basics




To construct scholarly ontology from a citation
database, we use data mining techniques to
discover hidden knowledge in the database.
The data mining techniques include Context-based
Cluster Analysis (CCA) and Fuzzy Concept
Hierarchy Generation (FCHG).
The discovered knowledge is then converted and
integrated into the ontology formalism.
As such, apart from the explicit information
available on scientific publications, the Scholarly
Ontology can also support other useful scholarly
retrieval functions such as finding research experts
and detecting trends.
3. Toward Automated Ontology Generation
Context-based Cluster Analysis




CCA is based on the Formal Concept Analysis (FCA)
[23] technique.
FCA provides a formal model, known as a formal
context, to represent relations between objects
and attributes in a data set.
We use formal contexts to represent the results of
multiple clusterings.
Then, relations between the formal contexts are
analyzed to find the relations between the
corresponding clustering results.
3. Toward Automated Ontology Generation
Fuzzy Concept Hierarchy Generation





A concept hierarchy is a data structure useful for
knowledge representation.
It is widely used in data mining applications.
A concept hierarchy may need to be large to
reflect the knowledge in a domain precisely.
Manual construction may be difficult and tedious.
Conceptual clustering is therefore needed.
3. Toward Automated Ontology Generation
Fuzzy Concept Hierarchy Generation




Many conceptual clustering techniques organize
knowledge as a concept hierarchy, which may not be
sufficient for representing information in a real
domain.
FCA, an exploratory data analysis technique,
supports the concept lattice, which provides a more
informative conceptual model for representing
knowledge.
FCA-based conceptual clustering techniques are
potentially useful for constructing the taxonomy
knowledge of an ontology.
However, typical FCA-based conceptual
clustering techniques do not support uncertainty
information.
3. Toward Automated Ontology Generation
Fuzzy Concept Hierarchy Generation




Traditional FCA-based conceptual clustering
approaches can’t represent vague information…
Need fuzziness
L-Fuzzy context uses linguistic variables to
represent uncertainty in the context.
But needs human interpretation to define
linguistic variables.
Fuzzy concept lattice generated from L-fuzzy
context usually causes a combinatorial explosion
of concepts (compared to traditional concept
lattice)
3. Toward Automated Ontology Generation
Fuzzy Concept Hierarchy Generation






We combine fuzzy logic and FCA as Fuzzy Formal
Concept Analysis (FFCA).
In FFCA, uncertainty information is directly
represented by a real-valued membership
value in the range [0,1].
Linguistic variables are no longer needed.
Compared to the fuzzy concept lattice generated
from an L-fuzzy context, the fuzzy concept lattice
generated using FFCA is simpler in terms of
the number of formal concepts.
It also supports a formal mechanism for
calculating concept similarities.
Based on FFCA, we propose the Fuzzy Conceptual
Clustering technique in FCHG to generate a fuzzy
concept hierarchy.
4. Fuzzy Ontology Generation Framework
Fuzzy Ontology



Application of fuzzy logic offers a possible
solution for dealing with uncertainty information
Fuzzy ontology is generated and used in text
retrieval and search engines, where membership
values are used to evaluate the similarities
between the concepts in a concept hierarchy
Manual generation of fuzzy ontology from a
predefined concept hierarchy is a difficult and
tedious task that often requires expert
interpretation.
4. Fuzzy Ontology Generation Framework
Introduction



Efficient method for generation of concept
hierarchy and fuzzy ontology is highly desirable
We propose a Fuzzy Ontology Generation
Framework (FOGF) that can automate fuzzy
ontology generation from uncertainty data based
on Formal Concept Analysis (FCA) theory
Generated fuzzy ontology is mapped to a
semantic representation in OWL
4. Fuzzy Ontology Generation Framework
Overview
Pipeline (figure): Uncertainty Information → Fuzzy Formal Concept Analysis → Fuzzy Concept Lattice → Concept Hierarchy Generation → Concept Hierarchy → Fuzzy Ontology Generation → Fuzzy Ontology → Semantic Representation Conversion → Semantic Web
• Fuzzy Formal Concept Analysis incorporates fuzzy logic into Formal
Concept Analysis to represent vague information.
• Concept Hierarchy Generation clusters the fuzzy concept lattice
generated by FFCA to construct a concept hierarchy in two steps:
Fuzzy Conceptual Clustering and Hierarchical Relation Generation.
• Fuzzy Ontology Generation constructs a fuzzy ontology from a fuzzy
context using the concept hierarchy created by fuzzy conceptual
clustering.
• Semantic Representation Conversion makes the knowledge accessible
and sharable on the Web, using OWL.
4. Fuzzy Ontology Generation Framework
Step 1 Fuzzy Formal Concept Analysis
Definition (Fuzzy Formal Context)
A fuzzy formal context is a triple
$K = (G, M, I = \varphi(G \times M))$,
where $G$ is a set of objects, $M$ is a set of attributes,
and $I$ is a fuzzy set on the domain $G \times M$.
Each relation $(g, m) \in I$ has a membership value
$\mu(g, m)$ in $[0, 1]$.
4. Fuzzy Ontology Generation Framework
Step 1 Fuzzy Formal Concept Analysis



A fuzzy formal context can be represented as a cross-table
(Table 1).

Table 1
        Data Mining   Clustering   Fuzzy Logic
D1      0.8           0.12         0.61
D2      0.9           0.85         0.13
D3      0.1           0.14         0.87

An α-cut can be set to eliminate relations with low
membership values, e.g. α = 0.5 (Table 2).

Table 2
        Data Mining   Clustering   Fuzzy Logic
D1      0.8           -            0.61
D2      0.9           0.85         -
D3      -             -            0.87

The context has 3 objects representing 3 documents, D1,
D2 and D3. It also has 3 attributes, “Data Mining”,
“Clustering” and “Fuzzy Logic”, representing 3 research
topics. The relationship between an object and an
attribute is represented by a membership value in [0, 1].
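As a small illustration of the α-cut, the sketch below stores Table 1 as a nested dictionary and drops every relation whose membership falls below α. The names `TABLE_1` and `alpha_cut` are illustrative, and keeping values equal to α (a `>=` comparison) is an assumption, since the slide does not say how the boundary case is treated.

```python
# Table 1 from the slides: membership of each document in each research topic.
TABLE_1 = {
    "D1": {"Data Mining": 0.8, "Clustering": 0.12, "Fuzzy Logic": 0.61},
    "D2": {"Data Mining": 0.9, "Clustering": 0.85, "Fuzzy Logic": 0.13},
    "D3": {"Data Mining": 0.1, "Clustering": 0.14, "Fuzzy Logic": 0.87},
}


def alpha_cut(context, alpha):
    """Keep only the (object, attribute) relations with membership >= alpha."""
    return {
        obj: {attr: mu for attr, mu in row.items() if mu >= alpha}
        for obj, row in context.items()
    }


# With alpha = 0.5 the surviving relations are those listed in Table 2.
TABLE_2 = alpha_cut(TABLE_1, 0.5)

if __name__ == "__main__":
    for obj, row in TABLE_2.items():
        print(obj, row)
```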
4. Fuzzy Ontology Generation Framework
Step 1 Fuzzy Formal Concept Analysis
Definition (Fuzzy Representation of
Object)
Each object $O$ in a fuzzy formal context $K$
can be represented by a fuzzy set $\Phi(O)$ as
$\Phi(O) = \{A_1(\mu_1), A_2(\mu_2), \ldots, A_m(\mu_m)\}$,
where $\{A_1, A_2, \ldots, A_m\}$ is the set of
attributes in $K$ and $\mu_i$ is the membership
of $O$ with attribute $A_i$ in $K$. $\Phi(O)$ is called
the fuzzy representation of $O$.
4. Fuzzy Ontology Generation Framework
Step 1 Fuzzy Formal Concept Analysis



Generally, we can consider the attributes of a
formal concept as the description of the concept.
Thus, the relationships between the object and
the concept should be the intersection of the
relationships between the objects and the
attributes of the concept
Since each relationship between the object and
an attribute is represented as a membership
value in fuzzy formal context, the intersection of
these membership values should be the minimum
of these membership values, hence…
4. Fuzzy Ontology Generation Framework
Step 1 Fuzzy Formal Concept Analysis
Definition (Fuzzy Formal Concept)
Given a fuzzy formal context $K = (G, M, I)$ and a confidence
threshold $T$, we define $A^* = \{m \in M \mid \forall g \in A: \mu(g, m) \geq T\}$
for $A \subseteq G$ and $B^* = \{g \in G \mid \forall m \in B: \mu(g, m) \geq T\}$ for $B \subseteq M$.
A fuzzy formal concept (or fuzzy concept) of a fuzzy formal
context $(G, M, I)$ with a confidence threshold $T$ is a pair
$(A_f = \varphi(A), B)$ where $A \subseteq G$, $B \subseteq M$, $A^* = B$ and $B^* = A$. Each
object $g \in \varphi(A)$ has a membership $\mu_g$ defined as
$\mu_g = \min_{m \in B} \mu(g, m)$,
where $\mu(g, m)$ is the membership value between object $g$ and
attribute $m$ defined in $I$. If $B = \{\}$ then $\mu_g = 1$ for every $g$.
$A$ and $B$ are the extent and intent of the formal concept
$(\varphi(A), B)$ respectively.
4. Fuzzy Ontology Generation Framework
Step 1 Fuzzy Formal Concept Analysis



This version of FFCA, as presented in these
definitions, preserves the continuous membership
values of objects, which are crucial for
calculating similarities between concepts.
In a formal context, a concept can have many
superconcepts and subconcepts. However, the
similarities of a concept to its superconcepts and
subconcepts are different.
With fuzzy concept lattice, we can make use of
the fuzzy set theory to calculate the similarities
between a concept and its subconcepts.
4. Fuzzy Ontology Generation Framework
Step 1 Fuzzy Formal Concept Analysis
Definition (Fuzzy Formal Concept
Cardinality)
Since the fuzziness of a fuzzy formal
concept is represented by the membership
values of the objects of the concept, the
cardinality of a fuzzy formal concept $K_f = (\varphi(A), B)$
is defined as $|K_f| = |\varphi(A)|$.
4. Fuzzy Ontology Generation Framework
Step 1 Fuzzy Formal Concept Analysis
Definition (Fuzzy Formal Concept
Similarity)
The similarity of a fuzzy formal concept $K_{f1} = (\varphi(A_1), B_1)$
and its subconcept $K_{f2} = (\varphi(A_2), B_2)$
is defined as $E(K_{f1}, K_{f2}) = E(\varphi(A_1), \varphi(A_2))$.
4. Fuzzy Ontology Generation Framework
Step 1 Fuzzy Formal Concept Analysis
[Fig. 2. Traditional concept lattice generated from Table 1 without membership values.]
[Fig. 3. Fuzzy concept lattice generated from the fuzzy formal context in Table 2, with similarities between concepts shown on the edges.]
Concepts in Fig. 3, given as intent and fuzzy extent:
  Top: {}, {D1, D2, D3}
  C1: {“Data Mining”}, {D1(0.8), D2(0.9)}
  C2: {“Fuzzy Logic”}, {D1(0.61), D3(0.87)}
  C3: {“Data Mining”, “Clustering”}, {D2(0.85)}
  C4: {“Data Mining”, “Fuzzy Logic”}, {D1(0.61)}
  Bottom: {“Data Mining”, “Clustering”, “Fuzzy Logic”}, {}
Similarities shown: E(C1, C3) = 0.5, E(C1, C4) = 0.35, E(C2, C4) = 0.41; all edges to the top and bottom concepts are labelled 0.00.
4. Fuzzy Ontology Generation Framework
Step 2 Concept Hierarchy Generation

Concept Hierarchy Generation clusters the
fuzzy concept lattice generated by FFCA to
construct a concept hierarchy in two steps:
Fuzzy Conceptual Clustering and
Hierarchical Relation Generation
4. Fuzzy Ontology Generation Framework
Step 2 a)Fuzzy Conceptual Clustering
Compared to traditional clusters, the conceptual
clusters generated have the following properties:
• Each conceptual cluster is considered a
human-interpretable concept in the domain of the fuzzy
concept lattice.
• Each conceptual cluster is a sublattice extracted
from the fuzzy concept lattice.
• A formal concept must belong to at least one
conceptual cluster and may belong to several, e.g. a scientific
document can belong to more than one research area.
4. Fuzzy Ontology Generation Framework
Step 2 a)Fuzzy Conceptual Clustering

Conceptual clusters are generated based
on the idea that if a formal concept A
belongs to a conceptual cluster R, then its
subconcept B also belongs to R if B is
similar to A. We can use a similarity
confidence threshold Ts to determine
whether two concepts are similar or not.
4. Fuzzy Ontology Generation Framework
Step 2 a)Fuzzy Conceptual Clustering
Definition (Conceptual Cluster).
A conceptual cluster of a concept lattice $K$
with a similarity confidence threshold $T_s$ is
a sublattice $S_K$ of $K$ which has the
following properties:
• $S_K$ has a supremum concept $C_S$ that is
not similar to any of its superconcepts.
• Any concept $C \neq C_S$ in $S_K$ must have at
least one superconcept $C' \in S_K$ such that
$E(C, C') > T_s$.
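One way to read this definition operationally is sketched below: a concept with no sufficiently similar superconcept starts a cluster, and every other concept joins the cluster of any superconcept it is similar to. This is a simplified sketch, not the authors' algorithm; it ignores the top and bottom concepts, does not verify the sublattice property, and uses a hypothetical threshold of 0.4 so that no similarity sits exactly on the boundary. Edge similarities are copied from Fig. 3.

```python
# Similarity on each (subconcept, superconcept) edge, copied from Fig. 3.
# Edges to the top and bottom concepts (all labelled 0.00) are left out.
EDGES = {
    ("C3", "C1"): 0.5,
    ("C4", "C1"): 0.35,
    ("C4", "C2"): 0.41,
}
CONCEPTS = ["C1", "C2", "C3", "C4"]


def conceptual_clusters(concepts, edges, ts):
    """Group concepts into clusters keyed by their supremum concept."""
    # Superconcepts that each concept is similar to, using the strict test E > Ts.
    similar_parents = {
        c: [parent for (child, parent), e in edges.items() if child == c and e > ts]
        for c in concepts
    }
    # A supremum is a concept that is not similar to any of its superconcepts.
    roots = [c for c in concepts if not similar_parents[c]]
    clusters = {}
    for root in roots:
        members = {root}
        changed = True
        while changed:  # grow downwards until no more concepts can join
            changed = False
            for c in concepts:
                if c not in members and any(p in members for p in similar_parents[c]):
                    members.add(c)
                    changed = True
        clusters[root] = members
    return clusters


if __name__ == "__main__":
    print(conceptual_clusters(CONCEPTS, EDGES, ts=0.4))
```

With the strict inequality of the definition, a threshold of exactly 0.5 leaves C3 outside the cluster of C1 in this sketch, because E(C3, C1) = 0.5 is not greater than Ts.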
4. Fuzzy Ontology Generation Framework
Step 2 a)Fuzzy Conceptual Clustering

Fig. 5 shows the conceptual clusters generated from the
fuzzy concept lattice given in Fig. 3 with similarity
confidence threshold Ts = 0.5
[Fig. 5. Conceptual clusters CK1, CK2 and CK3 marked on the fuzzy concept lattice of Fig. 3.]
4. Fuzzy Ontology Generation Framework
Step 2 b)Hierarchical Relation Generation

Fuzzy conceptual clustering generates a set of
conceptual clusters SC. To construct a concept
hierarchy from the conceptual clusters, we need
to find the hierarchy relations from the clusters.

We first define a concept hierarchy.
Definition (Concept Hierarchy)
A concept hierarchy is a poset (partially
ordered set) $(H, \preceq)$, where $H$ is a finite set
of concepts and $\preceq$ is a partial order on $H$.
4. Fuzzy Ontology Generation Framework
Step 2 b)Hierarchical Relation Generation

The definition of superconcept and subconcept relations on
conceptual clusters ensures that each conceptual cluster
has at least one superconcept, unless it corresponds to the
root node of the generated concept hierarchy. However, we
must still prove that the $\preceq$ relation is a partial order.
Definition (Subconcept and Superconcept on a
Concept Hierarchy)
Let $C_1$ and $C_2$ be two conceptual clusters corresponding
to two sublattices $L_1$ and $L_2$ of a fuzzy concept lattice
$\mathcal{F}(K)$. Let the fuzzy formal concept $I$ be the
supremum of $L_1$, i.e. $I = \sup(L_1)$. $C_1$ is the
subconcept of $C_2$, denoted as $C_1 \preceq C_2$, if $I$ is the
subconcept of any concept $C' \in L_2$, or $I \leq C'$, where $\leq$ is
the partial order defined on $\mathcal{F}(K)$. Equivalently, $C_2$ is
the superconcept of $C_1$.
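The relation between clusters can be sketched as follows. The lattice order is taken to be intent containment (a subconcept carries every attribute of its superconcept), and "any concept" in the definition is read as "at least one concept"; both are assumptions. The grouping into K1, K2 and K3 is hypothetical and chosen only for illustration; it is not read from Figure 8.

```python
# Intents of the four named concepts of the fuzzy concept lattice in the slides.
INTENT = {
    "C1": {"Data Mining"},
    "C2": {"Fuzzy Logic"},
    "C3": {"Data Mining", "Clustering"},
    "C4": {"Data Mining", "Fuzzy Logic"},
}

# Hypothetical clusters, used only to exercise the definition.
CLUSTERS = {"K1": {"C1", "C3"}, "K2": {"C2"}, "K3": {"C4"}}


def is_subconcept(lower, upper):
    """Lattice order: lower <= upper when lower's intent contains upper's intent."""
    return INTENT[lower] >= INTENT[upper]


def supremum(cluster):
    """The member of the cluster that every other member is a subconcept of."""
    for candidate in cluster:
        if all(is_subconcept(member, candidate) for member in cluster):
            return candidate
    raise ValueError("cluster has no supremum among its members")


def cluster_is_subconcept(c1, c2):
    """C1 precedes C2 if sup(C1) is a subconcept of at least one concept in C2."""
    top = supremum(c1)
    return any(is_subconcept(top, other) for other in c2)


def hierarchy(clusters):
    """All (subcluster, supercluster) pairs between distinct clusters."""
    return {
        (a, b)
        for a in clusters
        for b in clusters
        if a != b and cluster_is_subconcept(clusters[a], clusters[b])
    }


if __name__ == "__main__":
    print(sorted(hierarchy(CLUSTERS)))
```

In this example K3 comes out as a subconcept of both K1 and K2, because its supremum C4 carries the attributes of C1 and of C2.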
4. Fuzzy Ontology Generation Framework
Step 2 b)Hierarchical Relation Generation

Figure 8(b) illustrates the hierarchical relations constructed from
the conceptual clusters given in Figure 8(a). Each concept in the
concept hierarchy is represented by a set of its attributes. The
supremum and infimum of the lattice are considered as “Thing”
and “Nothing” concepts, respectively.
[Figure 8(a). Conceptual clusters.]
[Figure 8(b). Concept hierarchy.]
4. Fuzzy Ontology Generation Framework
Step 3 Fuzzy Ontology Generation





This step constructs a fuzzy ontology from a fuzzy
context using the concept hierarchy created by fuzzy
conceptual clustering.
This is possible because both FCA and ontologies
support formal definitions of concepts.
However, a concept defined in FCA carries both
extensional and intensional information in a balanced
manner, whereas a concept in an ontology emphasizes
its intensional aspect.
To construct the fuzzy ontology, we need to convert
both the intensional and extensional information of FCA
concepts into the corresponding classes and relations
of the ontology.
Thus, we define the fuzzy ontology as follows…
4. Fuzzy Ontology Generation Framework
Step 3 Fuzzy Ontology Generation
Definition (Fuzzy Ontology).
A fuzzy ontology FO consists of 4 elements (C, AC, R, X), where C is a
set of concepts; AC is a collection of attribute sets, one
for each concept; R = (RT, RN) is a set of relationships
consisting of 2 elements: RN is a set of non-taxonomy
relationships and RT is a set of taxonomy relationships. Each
concept ci in C represents a set of objects, or instances, of the
same kind. Each object oij of a concept ci can be described by a
set of attribute values denoted by AC(ci). Each relationship
ri(cp,cq) in R represents a binary association between concepts cp
and cq, and the instances of such a relationship are pairs of (cp,cq)
concept objects. Each attribute value of an object or relationship
instance is associated with a fuzzy membership value in
[0,1] expressing the degree of uncertainty of this attribute value or
relationship. X is a set of axioms. Each axiom in X is a constraint
on the attribute values of concepts and relationships, or a constraint
on the relationships between concept objects.
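As a rough illustration of the 4-tuple (C, AC, R, X), the structure could be held in a small Python data class. This is a sketch under assumptions: the class name `FuzzyOntology`, its field names and the `relate` helper are hypothetical, and the instance values are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Relation = Tuple[str, str, str]  # (relation name, c_p, c_q)


@dataclass
class FuzzyOntology:
    concepts: Set[str]                                # C
    attributes: Dict[str, Set[str]]                   # A^C, one attribute set per concept
    taxonomy: Set[Relation]                           # R_T
    non_taxonomy: Set[Relation]                       # R_N
    axioms: List[str] = field(default_factory=list)   # X, kept as plain strings here
    # Relationship instances between objects, each with a membership in [0, 1].
    memberships: Dict[Relation, float] = field(default_factory=dict)

    def relate(self, relation: str, obj_p: str, obj_q: str, membership: float) -> None:
        """Record a relationship instance together with its fuzzy membership value."""
        if not 0.0 <= membership <= 1.0:
            raise ValueError("membership must lie in [0, 1]")
        self.memberships[(relation, obj_p, obj_q)] = membership


scholarly = FuzzyOntology(
    concepts={"Document", "Research Area"},
    attributes={"Document": {"Name", "Author", "Title"},
                "Research Area": {"Name", "Keyword"}},
    taxonomy={("superarea-of", "Research Area", "Research Area")},
    non_taxonomy={("belong-to", "Document", "Research Area")},
)
scholarly.relate("belong-to", "Document2", "Fuzzy Logic", 0.87)
```

Keeping the membership on the relationship instance, rather than on the concept, mirrors the statement that each attribute value or relationship instance carries its own fuzzy membership value.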

4. Fuzzy Ontology Generation Framework
Step 3 Fuzzy Ontology Generation
Example (Fuzzy Ontology).
The Scholarly Ontology OS = (C, AC, R, X) is a fuzzy ontology whose
components are as follows.
C = {“Document”, “Research Area”}
AC(“Document”) = {“Name”, “Author”, “Title”, “Keywords”, “Abstract”,
“Body”, “Publisher”, “Publication Date”}
AC(“Research Area”) = {“Name”, “Keyword”}
RN = {belong-to(“Document”, “Research Area”), consist-of(“Research
Area”, “Document”)}
RT = {superarea-of(“Research Area”, “Research Area”), subarea-of(“Research Area”, “Research Area”)}
X = {Implies(Antecedent(consist-of(I-variable(x1) I-variable(x2)))
Consequent(belong-to(I-variable(x2) I-variable(x1))))
Implies(Antecedent(belong-to(I-variable(x1) I-variable(x2)))
Consequent(consist-of(I-variable(x2) I-variable(x1))))
Implies(Antecedent(superarea-of(I-variable(x1) I-variable(x2)))
Consequent(subarea-of(I-variable(x2) I-variable(x1))))
Implies(Antecedent(subarea-of(I-variable(x1) I-variable(x2)))
Consequent(superarea-of(I-variable(x2) I-variable(x1))))}

4. Fuzzy Ontology Generation Framework
Step 3 Fuzzy Ontology Generation
[Figure 9. Fuzzy ontology generation process: Class Mapping, Taxonomy Relation Generation, Non-Taxonomy Relation Generation and Instances Generation, applied to the Fuzzy Context and the Concept Hierarchy.]
4. Fuzzy Ontology Generation Framework
Step 3 Fuzzy Ontology Generation
Class Mapping furnishes C = {E, I} in which E and I
are classes corresponding to extent and intent of
the fuzzy context. For example, the extent class
mapped from the extent of the fuzzy context
given in Table 1(b) can be labeled manually as
Document. We can use appropriate names to
represent keyword attributes and use them to
label the intent class names as well. For example,
the class Research Area can be used to label the
initial intent class.
4. Fuzzy Ontology Generation Framework
Step 3 Fuzzy Ontology Generation



Taxonomy Relation Generation furnishes RT =
{superclass(I,I), subclass(I,I)}. Thus, the
hierarchical relations between instances of intent
classes are defined. Also, two rules are added to
X accordingly:
superclass(X,Y):-subclass(Y,X).
subclass(X,Y):-superclass(Y,X).
4. Fuzzy Ontology Generation Framework
Step 3 Fuzzy Ontology Generation



Non-taxonomy Relation Generation furnishes RN
= {RIE(I,E), REI(E,I)}, in which REI is the
relation between the extent class and intent class.
RIE is the reversed relation of REI. However, we
still need to label the non-taxonomy relation. For
example, the relation between class Document
and class Research Area can be labeled as
belong-to, which implies that a document can
belong to one or more research areas. Also, two
rules are added to X accordingly:
REI(X,Y):- RIE(Y,X).
RIE (X,Y):- REI (Y,X).
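The effect of such inverse rules can be sketched in Python: every fact of one relation yields the mirrored fact of its inverse. This is an illustration only. The function name `close_under_inverse` is hypothetical, and carrying the membership value over unchanged to the derived fact is an assumption that the slides do not state.

```python
from typing import Dict, Tuple

Fact = Tuple[str, str, str]  # (relation, x, y)


def close_under_inverse(facts: Dict[Fact, float],
                        inverse_of: Dict[str, str]) -> Dict[Fact, float]:
    """Apply rules of the form inv(Y, X) :- rel(X, Y) to a set of fuzzy facts."""
    closed = dict(facts)
    for (rel, x, y), mu in facts.items():
        inv = inverse_of.get(rel)
        if inv is not None:
            # Assumption: the derived fact keeps the membership of its source.
            closed.setdefault((inv, y, x), mu)
    return closed


inverse_of = {"belong-to": "consist-of", "consist-of": "belong-to",
              "subclass": "superclass", "superclass": "subclass"}
facts = {("belong-to", "Document2", "Fuzzy Logic"): 0.87}
closed = close_under_inverse(facts, inverse_of)
```

A single pass is enough because applying the inverse rule to a derived fact only reproduces the original fact.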
4. Fuzzy Ontology Generation Framework
Step 3 Fuzzy Ontology Generation


Instances Generation generates instances set I =
{II, IE} where II and IE are instances of the
intent and extent class.
Then, it furnishes membership values for the
instances’ attributes and relationships
4. Fuzzy Ontology Generation Framework
Step 4 Semantic Representation Conversion




The generated fuzzy ontology provides a
conceptual model of knowledge in the
corresponding domain.
However, to make such knowledge accessible and
sharable, we must convert it into a semantic
representation that can be embedded into the
contents of Web pages.
In the Semantic Web, an ontology description language
such as OWL can be used to annotate the ontology.
Therefore, the generated fuzzy ontology can be
automatically converted into the corresponding
semantic representation in OWL, in which each
class and instance is annotated as shown on the
next slide…
4. Fuzzy Ontology Generation Framework
Step 4 Semantic Representation Conversion

Ontology for the concept hierarchy represented by OWL
<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns="http://www.owlontologies.com/unnamed.owl#"
    xml:base="http://www.owlontologies.com/unnamed.owl">
  <owl:Ontology rdf:about=""/>
  <owl:Class rdf:ID="Concept_2"/>
  <owl:Class rdf:ID="Concept_1"/>
  <owl:Class rdf:ID="Concept_3">
    <rdfs:subClassOf rdf:resource="#Concept_1"/>
    <rdfs:subClassOf rdf:resource="#Concept_2"/>
  </owl:Class>
  <owl:DatatypeProperty rdf:ID="Data_Mining"/>
  <owl:DatatypeProperty rdf:ID="DataMining">
    <rdfs:domain rdf:resource="#Concept_1"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#float"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="FuzzyLogic">
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#float"/>
    <rdfs:domain rdf:resource="#Concept_2"/>
  </owl:DatatypeProperty>
  <Concept_2 rdf:ID="Document2">
    <FuzzyLogic rdf:datatype="http://www.w3.org/2001/XMLSchema#float">0.87</FuzzyLogic>
  </Concept_2>
</rdf:RDF>
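The class part of such a conversion could be sketched with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch, not the converter used in the project: the function name `hierarchy_to_owl` and the input format (a mapping from each concept to its direct superconcepts) are assumptions, and only `owl:Class` and `rdfs:subClassOf` elements are produced.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"


def hierarchy_to_owl(superconcepts: Dict[str, List[str]]) -> str:
    """Serialize a concept hierarchy as OWL classes in RDF/XML."""
    for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
        ET.register_namespace(prefix, uri)
    root = ET.Element(f"{{{RDF}}}RDF")
    for concept, parents in superconcepts.items():
        cls = ET.SubElement(root, f"{{{OWL}}}Class", {f"{{{RDF}}}ID": concept})
        for parent in parents:
            # One rdfs:subClassOf element per direct superconcept.
            ET.SubElement(cls, f"{{{RDFS}}}subClassOf",
                          {f"{{{RDF}}}resource": f"#{parent}"})
    return ET.tostring(root, encoding="unicode")


owl_xml = hierarchy_to_owl({
    "Concept_1": [],
    "Concept_2": [],
    "Concept_3": ["Concept_1", "Concept_2"],
})
print(owl_xml)
```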
5. Scholarly Ontology
Ontology Generation



Collected scientific documents on the research area
“Information Retrieval” published in 1987-1997
from ISI
Downloaded documents are preprocessed to
extract related information such as the title,
authors, citation keywords, and other citation
information
Extracted information is then stored in a citation
database
5. Scholarly Ontology
Ontology Generation

First, we construct a fuzzy formal context Kf =
{G,M,I}, with G as the set of documents and M as
the set of citation keywords. The membership
value of a document D on a citation keyword CK
in Kf is computed as
$$\mu(D, CK) = \frac{n_1}{n_2}$$
where n1 is the number of documents that cite D and contain
CK, and n2 is the number of documents that cite D.
This formula is based on the premise that the
more frequently a keyword occurs in the citing
papers, the more important the keyword is in the
cited paper.
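The membership formula can be written as a short Python function. This is a sketch under assumptions: the function name `membership` and the two input mappings are hypothetical, and returning 0.0 for a document that nobody cites (n2 = 0) is a choice made here, not something the slides specify.

```python
from typing import Dict, Set


def membership(cited_doc: str, keyword: str,
               citations: Dict[str, Set[str]],
               keywords: Dict[str, Set[str]]) -> float:
    """mu(D, CK) = n1 / n2 for the cited document D and citation keyword CK."""
    # n2: the documents that cite D.
    citing = [doc for doc, cited in citations.items() if cited_doc in cited]
    if not citing:
        return 0.0  # assumption: an uncited document has membership 0
    # n1: the citing documents that also contain CK.
    n1 = sum(1 for doc in citing if keyword in keywords.get(doc, set()))
    return n1 / len(citing)


# Illustrative data: A, B and C cite D; A and B contain "Clustering".
citations = {"A": {"D"}, "B": {"D"}, "C": {"D"}, "E": {"X"}}
keywords = {"A": {"Clustering"}, "B": {"Clustering", "Retrieval"}, "C": {"Retrieval"}}
print(membership("D", "Clustering", citations, keywords))
```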
5. Scholarly Ontology
Ontology Generation



Then, conceptual clustering is performed from the
fuzzy formal context
Each generated conceptual cluster represents a
research area
The generated conceptual clusters form a hierarchy
of research areas of documents in the Citation
Database, or Research Area Hierarchy
5. Scholarly Ontology
Example of concept hierarchy generated
[Figure 11. Concept hierarchy of research areas. Node labels:
{"Information Retrieval", "Query Processing", "Searching"}
{"User Interface", "Browsing"}
{"Retrieval Evaluation", "System Training"}
{"Online Search", "Information Filtering"}
{"User Satisfaction", "User Training", "User Study"}
{"Data Mining"}
{"Data Indexing"}
{"Semantic Similarity", "Knowledge Representation"}
{"Clustering", "Neural Network"}
{"Expert System"}
{"Recall", "Precision"}
{"Text Retrieval"}]
Each research area is represented by a set of most frequent keywords
occurring in the documents that belong to that research area. In FFCA,
sub-areas inherit keywords from their super-areas. Note that the
inherited keywords are not shown in Figure 11 when labeling the concepts.
Only keywords specific to the concepts are used for labeling.
5. Scholarly Ontology
Ontology Generation
The generated ontology contains scholarly
information as a hierarchy of research
areas, together with the research areas of each
document.
Taking advantage of the Semantic Web,
such knowledge can be easily shared and
reused by other systems for browsing or
retrieval.
For example, we can use Protégé-2000 for
browsing the scholarly ontology.
5. Scholarly Ontology
[Fig. 12. Part of the generated concept hierarchy of research areas.]
We use the keyword that has the highest membership value to label each research
area. Nevertheless, users can browse more information about each research area.
5. Scholarly Ontology
Performance Evaluation




Performance of the ontology generation is evaluated
based on the generated Research Area Hierarchy.
Firstly, we measure the typical recall, precision and F-measure to evaluate the clustering results.
Secondly, we use the relaxation error and the
corresponding cluster goodness measure to evaluate
the goodness of the conceptual clusters generated.
We also show whether the use of fuzzy membership
instead of crisp value can help improve cluster
goodness.
Finally, we use the Average Uninterpolated Precision
(AUP), which is a typical measure for evaluating a
hierarchical construct, to evaluate the goodness of the
generated concept hierarchy.
5. Scholarly Ontology
Performance Evaluation



Keyword attributes are descriptors for the
generated clusters. If more keywords are extracted
and used, are the resulting cluster descriptors
more meaningful?
To verify this, we vary the number of keywords N
extracted from documents from 2 to 10, and the
similarity threshold Ts from 0.2 to 0.9 when
performing conceptual clustering
We have classified the documents downloaded from
ISI into classes based on their research themes.
These classes are used as a benchmark to evaluate
the clustering results in terms of recall, precision
and F-measure.
5. Scholarly Ontology
Performance Evaluation - Precision
Precision indicates the accuracy of the clustering results. Table 6 shows that when
N is small, the precision is poor. This implies that the clusters contain “noisy” data.
Table 6. Performance results using precision measurement.
        Ts=0.2  Ts=0.3  Ts=0.4  Ts=0.5  Ts=0.6  Ts=0.7  Ts=0.8  Ts=0.9
N=2     0.64    0.64    0.64    0.64    0.63    0.62    0.62    0.62
N=3     0.66    0.66    0.66    0.66    0.64    0.62    0.62    0.62
N=4     0.73    0.77    0.78    0.79    0.74    0.69    0.68    0.68
N=5     0.80    0.84    0.84    0.85    0.81    0.75    0.75    0.75
N=6     0.90    0.90    0.90    0.90    0.86    0.80    0.79    0.80
N=7     0.96    0.94    0.93    0.93    0.90    0.86    0.84    0.84
N=8     0.95    0.94    0.92    0.93    0.90    0.86    0.83    0.83
N=9     0.94    0.93    0.92    0.92    0.89    0.86    0.83    0.83
N=10    0.93    0.92    0.91    0.91    0.89    0.85    0.83    0.83
The precision improves as the number of extracted keywords
increases. However, this also causes the recall to decrease, as shown in
Table 7.
5. Scholarly Ontology
Performance Evaluation - Recall
As the number of clusters gradually increases, the efficiency
of the clustering results gradually decreases.
Table 7. Performance results using recall measurement.
        Ts=0.2  Ts=0.3  Ts=0.4  Ts=0.5  Ts=0.6  Ts=0.7  Ts=0.8  Ts=0.9
N=2     0.99    0.99    0.99    0.99    0.99    0.98    0.98    0.98
N=3     0.99    0.99    0.99    0.99    0.98    0.98    0.97    0.97
N=4     0.98    0.98    0.97    0.97    0.94    0.95    0.94    0.94
N=5     0.89    0.87    0.87    0.88    0.87    0.89    0.89    0.89
N=6     0.80    0.81    0.83    0.83    0.83    0.85    0.85    0.85
N=7     0.81    0.80    0.82    0.82    0.83    0.84    0.86    0.86
N=8     0.79    0.79    0.81    0.82    0.82    0.84    0.85    0.85
N=9     0.76    0.77    0.80    0.80    0.81    0.83    0.84    0.84
N=10    0.73    0.75    0.78    0.78    0.79    0.81    0.83    0.83
5. Scholarly Ontology
Performance Evaluation - F-measure
When N is low, the F-measure is quite poor. Nevertheless, the F-measure
is stable and good when a sufficient number of keywords are extracted.
The results also show that the F-measure tends to have the best
performance when Ts = 0.5.
Table 8. Performance results using F-measure measurement.
        Ts=0.2  Ts=0.3  Ts=0.4  Ts=0.5  Ts=0.6  Ts=0.7  Ts=0.8  Ts=0.9
N=2     0.78    0.78    0.78    0.78    0.77    0.76    0.76    0.76
N=3     0.79    0.79    0.79    0.79    0.77    0.76    0.76    0.76
N=4     0.83    0.86    0.86    0.87    0.82    0.79    0.78    0.78
N=5     0.84    0.85    0.85    0.86    0.83    0.81    0.81    0.81
N=6     0.85    0.85    0.86    0.86    0.84    0.82    0.82    0.82
N=7     0.88    0.86    0.87    0.87    0.86    0.85    0.85    0.85
N=8     0.86    0.86    0.86    0.87    0.85    0.85    0.84    0.84
N=9     0.84    0.84    0.86    0.86    0.85    0.84    0.83    0.83
N=10    0.81    0.82    0.84    0.84    0.83    0.83    0.83    0.83
Average 0.83    0.83    0.83    0.84    0.82    0.81    0.80    0.80
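The F-measure is the harmonic mean of precision and recall, so individual cells of Table 8 can be recomputed from the matching cells of Tables 6 and 7. The snippet below is a small consistency check on a few cells. The function name `f_measure` and the choice of cells are illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (N, Ts) -> (precision from Table 6, recall from Table 7)
cells = {
    (2, 0.2): (0.64, 0.99),
    (4, 0.5): (0.79, 0.97),
    (7, 0.2): (0.96, 0.81),
    (10, 0.9): (0.83, 0.83),
}
for (n, ts), (p, r) in cells.items():
    print(f"N={n}, Ts={ts}: F = {f_measure(p, r):.2f}")
```

For these four cells the recomputed values round to the entries reported in Table 8 (0.78, 0.87, 0.88 and 0.83).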
5. Scholarly Ontology
Performance Evaluation – Relaxation Error
Relaxation error measures the dissimilarity of
items in a cluster based on their attribute
values.
Since conceptual clustering techniques
typically use a set of attributes for concept
generation, relaxation error is commonly
used for evaluating the
goodness of conceptual clusters.
5. Scholarly Ontology
Performance Evaluation – Relaxation Error

The relaxation error RE of a cluster C is defined
as
$$RE(C) = \sum_{a \in A} \sum_{i=1}^{n} \sum_{j=1}^{n} P(x_i)\, P(x_j)\, d_a(x_i, x_j)$$
where A is the set of the attributes of items in C,
P(xi) is the probability of item xi occurring in C
and da(xi,xj) is the distance of xi and xj on
attribute a.
The cluster goodness G of cluster C is defined as
G(C) = 1 - RE(C).
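The two definitions can be sketched directly in Python. The sketch makes two assumptions that the slides do not state: items are equally likely (P(xi) = 1/n) unless probabilities are supplied, and the per-attribute distance is the absolute difference of membership values. The names `relaxation_error` and `cluster_goodness` are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Item = Dict[str, float]  # attribute -> value (here, a fuzzy membership)


def relaxation_error(items: List[Item],
                     distance: Callable[[str, float, float], float] = lambda a, u, v: abs(u - v),
                     probs: Optional[List[float]] = None) -> float:
    """RE(C): sum over attributes a and item pairs (i, j) of P(xi) P(xj) d_a(xi, xj)."""
    n = len(items)
    if n == 0:
        return 0.0
    if probs is None:
        probs = [1.0 / n] * n  # assumption: uniform item probabilities
    attributes = sorted({a for item in items for a in item})
    total = 0.0
    for a in attributes:
        for i, xi in enumerate(items):
            for j, xj in enumerate(items):
                total += probs[i] * probs[j] * distance(a, xi.get(a, 0.0), xj.get(a, 0.0))
    return total


def cluster_goodness(items: List[Item]) -> float:
    """G(C) = 1 - RE(C)."""
    return 1.0 - relaxation_error(items)


cluster = [{"Data Mining": 0.2}, {"Data Mining": 0.8}]
print(relaxation_error(cluster), cluster_goodness(cluster))
```

For the two-item cluster above, the only non-zero terms are the pairs (1, 2) and (2, 1), each contributing 0.5 × 0.5 × 0.6, so RE is about 0.3 and G about 0.7.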
5. Scholarly Ontology
Performance Evaluation – Relaxation Error

[Figure: Comparison of FFCA and COBWEB while the number of extracted
keywords is varied from 2 to 10.]
We vary the number of extracted keywords to observe its effect
on cluster goodness. In addition, since COBWEB is
considered one of the most popular techniques for conceptual
clustering, we also apply COBWEB to the citation database to compare
performance. The results show that FFCA achieves better cluster goodness
than COBWEB.
5. Scholarly Ontology
Performance Evaluation – AUP




Average Uninterpolated Precision (AUP) is defined as
the sum of the precision value at each point (or node)
in a hierarchical structure where a relevant item
appears, divided by the total number of relevant items.
Typically, AUP indicates the goodness of a concept
hierarchical structure.
For evaluating AUP, we have manually classified the
downloaded documents into classes based on their
research themes.
For each class, we extract the 5 most frequent keywords
from the documents in the class. Then, we use these
keywords as inputs to form retrieval queries and
evaluate the retrieval performance using AUP.
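The AUP definition can be illustrated with a short Python function. The sketch assumes that the points of the hierarchy have already been linearized into a visiting order, so that each position holds one retrieved item flagged as relevant or not. That linearization, and the name `average_uninterpolated_precision`, are assumptions made here.

```python
from typing import Optional, Sequence


def average_uninterpolated_precision(relevance: Sequence[bool],
                                     total_relevant: Optional[int] = None) -> float:
    """Sum of precision at each relevant position, divided by the number of relevant items."""
    if total_relevant is None:
        total_relevant = sum(1 for r in relevance if r)
    if total_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for position, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / position  # precision at this point
    return precision_sum / total_relevant


# Relevant items appear at positions 1 and 3: (1/1 + 2/3) / 2
print(average_uninterpolated_precision([True, False, True]))
```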
5. Scholarly Ontology
Performance Evaluation – AUP

There are two ways to generate
document keywords. The first is to use
the set of keywords, known as attribute
keywords, from each conceptual cluster
as the document keywords. The second
is to use the keywords from each
document as the document keywords.
Then, we vectorize the document
keywords and the input query, and
calculate the vectors’ distance for
measuring the retrieval performance.
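A minimal sketch of this vectorization step follows. The slides do not say which distance is used, so cosine distance is an assumption here, as are the function names `vectorize` and `cosine_distance` and the example weights.

```python
import math
from typing import Dict, List, Sequence


def vectorize(keywords: Dict[str, float], vocabulary: Sequence[str]) -> List[float]:
    """Map keyword weights onto a fixed vocabulary order (missing keywords get 0)."""
    return [keywords.get(term, 0.0) for term in vocabulary]


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; 1.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


vocabulary = ["Clustering", "Data Mining", "Fuzzy Logic"]
document = vectorize({"Data Mining": 0.8, "Clustering": 0.6}, vocabulary)
query = vectorize({"Data Mining": 1.0}, vocabulary)
print(cosine_distance(document, query))
```

The same two functions serve both ways of generating document keywords: only the keyword dictionary passed to `vectorize` changes (cluster attribute keywords or the document's own keywords).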
5. Scholarly Ontology
Performance Evaluation – AUP

Two methods
1. AUP measured using attribute keywords:
Hierarchical Average Uninterpolated
Precision (AUP(H)), as each concept
inherits attribute keywords from its
superconcepts.
2. AUP measured using keywords from
documents: Unconnected Average
Uninterpolated Precision (AUP(U)).
5. Scholarly Ontology
Performance Evaluation – AUP

Fig. 14 shows the results for AUP(H) and AUP(U) using different numbers
of extracted keywords N.
As N gets larger, performance on both AUP(H)
and AUP(U) improves. In addition,
performance on AUP(H) is generally better than
on AUP(U). This indicates that the attribute keywords
generated for the conceptual clusters are appropriate.
6. Semantic Helpdesk Application
Introduction



Developed in collaboration with a multinational company,
the Semantic Help-Desk Environment comprises the Web
Service Requester, Matchmaking Agent and Web Service
Provider.
The focus is on the fuzzy ontology generation process that
generates Machine Service Ontology from a customer
service database.
This approach enables individual machine service
knowledge to be shared over the Semantic Web. Thus,
machine service knowledge from different machines or
models provided by different manufacturers can be shared
and integrated. This is important as many customers may
have different types of machines and models from different
manufacturers.
6. Semantic Helpdesk Application
Introduction - Web Service Requester




A kind of Web Service that enables access to
customer support for machine services.
Instances of the Web Service Requester can be
created from a Web Requester Server where its
address is accessible for all users through the Web.
When encountering a problem, a user can use the
Web to connect to the Web Requester Server in order
to create an instance of the Web Service Requester.
The created instance runs as a web-based program.
That is, it can use the Web to interact with the
user and other programs.
6. Semantic Helpdesk Application
Introduction - Web Service Requester



Through the Web, the Web Service Requester instance
provides an interface for the user to enter their reported
problem.
Through the interface, the user can specify the encountered
fault as a textual string. The user is also required to enter the
code of the machine model. The given information is used to
form a profile for the Web Service Requester.
The profile is then sent as a request to the Matchmaking
Agent to seek a potential Web Service Provider for solving
the problem
6. Semantic Helpdesk Application
Introduction - Web Service Provider





It offers its machine service support as a Web Service
extended with ontology capabilities.
There are probably many instances of a Web Service
Provider existing concurrently on the Internet.
An instance of the Web Service Provider can be considered
as a program that can access the Machine Service Ontology
to retrieve machine service knowledge for a given reported
problem.
An instance of the Web Service Provider can interact with
other programs. That is, it can be called by other programs
and return the outputs to the calling programs.
Instances of the Web Service Provider must be registered
with a specific agent known as the Matchmaking Agent that
serves as a registry and look-up service.
6. Semantic Helpdesk Application
Introduction - Web Service Provider



Each instance of the Web Service Provider also provides a
profile file that describes its parameters and capabilities.
XML is used in most Web Services to represent the
information contained in the profiles.
However, traditional XML lacks the capabilities of
representing semantic information.
To overcome this problem, the Web Service Provider uses
the ontology-based service description language OWL-S
(formerly DAML-S) to describe information in its profile.
Hence, the service is described as an OWL ontology, and its
intentional information can be fully understood by other
programs.
6. Semantic Helpdesk Application
Introduction - Matchmaking Agent

When the Matchmaking Agent receives
machine service requests from the Web
Service Requester, it locates the
appropriate Web Services that can fulfill
the request
6. Semantic Helpdesk Application
Overview
[Figure: Semantic Help-Desk Environment. Components shown: Manufacturers, Customer Service Databases, Machine Service Ontologies, Web Service Providers, Internet, Matchmaking Agent, Web Service Requester, and Customers using Client Web Browsers.]
6. Semantic Helpdesk Application
Customer Service Database


The customer service database contains 9000
service records. Each record consists of fault-condition
and checkpoint information.
Fault-condition contains the service engineer’s
description of the machine fault. Checkpoint
information indicates the suggested actions to
be carried out to repair the machine based on
the fault-condition reported by the
customer.
6. Semantic Helpdesk Application
Customer Service Database
Fault-condition: 3008 PCB CARRY MISS ERROR. PCB WAS NOT TRANSFERRED BY THE
CARRIER DURING LOADING BUT STAYED AT THE DETECTION POSITION OF
PCB DETECTION SENSOR 2.
Checkpoint group: AVF_CHK003
Priority  Checkpoint description                                              Help file
1         CONFIRM WHETHER THE CARRY GUIDE PINS ARE IN LINE WITH PCB.          AVF_CHK007-1.GIF
2         CONFIRM WHETHER THE PCB IS IN CORRECT DIRECTION.                    AVF_CHK007-2.GIF
3         CONFIRM THE POSITION OF THE GUIDE LOWER LIMIT SENSOR. (I/O 0165)    AVF_CHK007-3.GIF
4         CONFIRM THE TIMING FOR PCB 2 DETECT SENSOR.                         AVF_CHK007-4.GIF
6. Semantic Helpdesk Application
Machine Service Ontology Generation

Apply FOGF to obtain Fuzzy Fault Concept
Lattice → Fault Concept Hierarchy →
Machine Service Ontology
[Figure: Part of the Fault Concept Hierarchy of the machine model AV_2011. Root: Any fault. Node labels:
{“Anvil”}
{“Drive”}
{“Cutter”}
{“Component”}
{“Anvil”, “Joint”, “Cannot Engage”}
{“Anvil”, “Shaky”, “Unit”}
{“Anvil”, “Drive”, “Cannot Open”, “Pitch”}
{“Cutter”, “Drive”, “Cannot Open”, “Axis”}
{“Cutter”, “Component”, “Cut”, “Insertion”}
{“Component”, “Float”, “PCB”}]
6. Semantic Helpdesk Application
Machine Service Ontology Generation


The generation process creates classes, relations
and instances for the service ontology.
The machine fault service knowledge stored in
the Customer Service Database is known as non-taxonomy
knowledge, whereas the machine fault
hierarchy knowledge from the Fault Concept
Hierarchy is called taxonomy knowledge. These
two types of knowledge are combined to form the
Machine Service Ontology.
6. Semantic Helpdesk Application
Machine Service Ontology in OWL
<rdf:RDF
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
  <owl:Ontology rdf:about="">
    <owl:versionInfo>v 1.0 2004-12-07 19:06:40</owl:versionInfo>
    <rdfs:label>Machine Service Ontology</rdfs:label>
  </owl:Ontology>
  <owl:Class rdf:ID="Machine"/>
  <owl:Class rdf:ID="Checkpoint"/>
  <owl:Class rdf:ID="Machine_Fault_Cluster"/>
  …
  <!-- Fault cluster labelled "Anvil", with a float-valued property of the same name -->
  <owl:Class rdf:ID="Machine_Fault_Cluster_1">
    <rdfs:label>Anvil</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Machine_Fault_Cluster"/>
  </owl:Class>
  <owl:DatatypeProperty rdf:ID="Anvil">
    <rdfs:domain rdf:resource="#Machine_Fault_Cluster_1"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#float"/>
  </owl:DatatypeProperty>
  <!-- Fault cluster labelled "Cutter" -->
  <owl:Class rdf:ID="Machine_Fault_Cluster_2">
    <rdfs:label>Cutter</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Machine_Fault_Cluster"/>
  </owl:Class>
  <owl:DatatypeProperty rdf:ID="Cutter">
    <rdfs:domain rdf:resource="#Machine_Fault_Cluster_2"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#float"/>
  </owl:DatatypeProperty>
  <!-- Cluster 3 is a subconcept of both cluster 1 and cluster 2 -->
  <owl:Class rdf:ID="Machine_Fault_Cluster_3">
    <rdfs:label>Anvil_Cutter</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Machine_Fault_Cluster_1"/>
    <rdfs:subClassOf rdf:resource="#Machine_Fault_Cluster_2"/>
  </owl:Class>
  …
  <!-- A machine fault and its relations to machines, checkpoints and fault clusters -->
  <owl:Class rdf:ID="Machine_Fault"/>
  <owl:ObjectProperty rdf:ID="occur_on">
    <rdfs:domain rdf:resource="#Machine_Fault"/>
    <rdfs:range rdf:resource="#Machine"/>
  </owl:ObjectProperty>
  <owl:ObjectProperty rdf:ID="inspect_to">
    <rdfs:domain rdf:resource="#Machine_Fault"/>
    <rdfs:range rdf:resource="#Checkpoint"/>
  </owl:ObjectProperty>
  <owl:ObjectProperty rdf:ID="belong_to">
    <rdfs:domain rdf:resource="#Machine_Fault"/>
    <rdfs:range rdf:resource="#Machine_Fault_Cluster"/>
  </owl:ObjectProperty>
</rdf:RDF>
6. Semantic Helpdesk Application
Experiments



Data stored in the database was divided into 10
subsets. Each subset was used in turn as the
testing set while the others were used for generating
the conceptual clusters.
Keywords in fault conditions in each testing set
were extracted and fuzzified as testing fuzzy
queries.
To verify whether fuzzy queries can improve the
retrieval performance, the keywords extracted
are also used for retrieving without membership
as crisp queries for comparison.
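The rotation of testing and training subsets described above is a 10-fold cross-validation, which can be sketched as follows. How the records were actually assigned to subsets is not stated, so the round-robin assignment and the function name `k_fold_splits` are assumptions.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(records: Sequence[T], k: int = 10) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (training, testing) pairs; each of the k subsets is the testing set once."""
    # Assumption: records are dealt round-robin into the k subsets.
    folds = [list(records[i::k]) for i in range(k)]
    for i in range(k):
        testing = folds[i]
        training = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield training, testing


records = [f"fault_{n}" for n in range(20)]  # placeholder service records
splits = list(k_fold_splits(records))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```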
6. Semantic Helpdesk Application
Experiments
Faults in each machine model were manually
classified into groups based on the machine
components in which the fault occurred.
Retrieval accuracy is evaluated based on
the number of retrieved faults that are
in the same classified group as the
query.
6. Semantic Helpdesk Application
Performance Measures

Recall, Precision and F-measure
$$\text{recall} = \frac{\text{number of fault conditions retrieved and correct}}{\text{total number of fault conditions correct}}$$
$$\text{precision} = \frac{\text{number of fault conditions retrieved and correct}}{\text{total number of fault conditions retrieved}}$$
$$\text{F-measure} = \frac{2 \times \text{recall} \times \text{precision}}{\text{recall} + \text{precision}}$$
6. Semantic Helpdesk Application
Retrieval Performance
[Figures: Recall, Precision and F-measure plotted against Confidence Threshold (0 to 0.9) for Crisp Query and Fuzzy Query.]
6. Semantic Helpdesk Application
Performance Comparison



Retrieval accuracy is compared with four other
techniques:
Two variations of the k-nearest neighbor (kNN)
technique. The first variation (kNN1) performs
the retrieval based on the normalized Euclidean
distance between vectors. The second (kNN2) uses
the fuzzy-trigram technique.
Two kinds of artificial neural networks (ANN): the
supervised learning vector quantization (LVQ3)
neural network and the unsupervised Self-Organizing Maps (SOM).
6. Semantic Helpdesk Application
Performance Comparison
Retrieval Technique       Retrieval Accuracy
kNN1                      81.4%
kNN2                      77.6%
LVQ3                      93.2%
SOM                       90.3%
FFCA with Crisp Query     84.6%
FFCA with Fuzzy Query     93.0% (Confidence Threshold = 0.2)
• FFCA with fuzzy query outperformed kNN.
• LVQ3 performed marginally better, but requires prior expert knowledge
for training, which would be a problem when dealing with large amounts
of uncertainty information.
• The proposed technique can generate a concept hierarchy from the
clusters, which is important information for generating a corresponding
meaningful ontology.
7. Summary
Proposed a framework for fuzzy ontology
generation with uncertainty information
 FOGF consists of the following steps:
 Fuzzy Formal Concept Analysis
 Fuzzy Conceptual Clustering
 Fuzzy Ontology Generation
 Semantic Representation Conversion

7. Summary



FOGF can represent uncertainty information and
construct a concept hierarchy from the
uncertainty information
Apart from constructing scholarly ontology from
citation database, FOGF has also been used to
generate Machine Service Ontology for Semantic
Help-desk and Reuters News Topic Themes
Ontology
Also, the scholarly ontology has been partially
used to construct a Scholarly Semantic Web, a
Semantic Web-based information retrieval
system to support scholarly activities in the
Semantic Web environment
[email protected]
109
References
(Not intended to be Exhaustive)
Ontology Editors
[1] http://protege.stanford.edu/
[2] S. Bechhofer, I. Horrocks, P. Patel-Schneider, and S. Tessaris, "A proposal for a description
logic interface," in Proceedings of the International Workshop on Description Logics, pp. 3336, 1999.

Large corpora
[3] E. Morin, “Automatic acquisition of semantic relations between terms from technical
corpora," in Proceedings of the Fifth International Congress on Terminology and Knowledge
Engineering (TKE-99), (Vienna, Austria), 1999.
[4] M. Hearst, “Automatic acquisition of hyponyms from large text corpora," in Proceedings of
the Fourteenth International Conference on Computational Linguistic, (France), 1992.

Knowledge base of rules
[5] P. Compton and A. Jansen, Knowledge Acquisition, ch. A Philosophical Basis for Knowledge
Acquisition, pp. 241-257.

Statistical approaches
[6] H. Suryanto and P. Compton, “Discovery of ontologies from knowledge bases," in
Proceedings of The 5th International Conference on Knowledge Capture (Y. Gil, M. Musen, J.
Shavlik, and Victoria(, eds.), (Canada), pp. 171-178, 2001.

Semi-structured schemata based on Graphs
[7] A. Deitel, C. Faron, and R. Dieng, “Learning ontologies from RDF annotations,“ in
Proceedings of the IJCAI Workshop in Ontology Learning, (Seattle,USA), 2001.

[email protected]
110
References
(Not intended to be Exhaustive)
Semi-structured schemata based on Statistics
[8] C. Papatheodorou, A. Vassiliou, and B. Simon, “Discovery of ontologies for learning
resources using word-based clustering," in Proceedings of ED-MEDIA 2002, (Denver,USA),
2002.

LSD
[9] A. Doan, P. Domingos, and A. Levy, “Learning source descriptions for data integration," in
Proceedings of the Third International Workshop on the Web and Databases, pp. 81-86,
2000.

Database schema
[10] P. Johannesson, “A method for transforming relational schemas into conceptual schemas,"
in Proceedings of the 10th International Conference on Data Engineering (M. Rusinkiewicz,
ed.), (Houston, USA), pp. 115-122, IEEE Press, 1994.
[11] D. Rubin, M. Hewett, D. Oliver, T. Klein, and R. Altman, “Automatic data acquisition into
ontologies from pharmacogenetics relational data sources using declarative object
de¯nitions and XML," in Proceedings of the Paci¯c Symposium on Biology (R.B.Altman, A.
Dunker, L. Hunter, K. Lauderdale, and T. Klein, eds.), (Lihue, HI), 2002.

NLP
[12] D. Lonsdale, Y. Ding, D. Embley, and A. Melby, “Peppering knowledge sources with SALT;
boosting conceptual content for ontology generation," in Proceedings of the AAAI Workshop
on Semantic Web Meets Language Resources, 2002.
[13] D. I. Moldovan and R. C. Girju, \An interactive tool for the rapid development of
knowledge bases," International Journal on Arti¯cial Intelligence Tools (IJAIT), vol. 10, no.
1-2, 2001.

[email protected]
111
References
(Not intended to be Exhaustive)
Wordnet
[14] http://wordnet.princeton.edu/wordnet/download/

Text-to-Onto
[15] A. Maedche and S. Staab, “Ontology learning for the Semantic Web," IEEE
Intelligent Systems, Special Issue on the Semantic Web, vol. 16, no. 2, 2001.

Keyword frequencies
[16] A. Faatz and R. Steinmetz, “Ontology enrichment with texts from the WWW,“ in In
Proceedings of Semantic Web Mining 2nd Workshop at ECML/PKDD-2002, (Helsinki,
Finland), 2002.
[17] R. Navigli, P. Velardi, and A. Gangemi, “Ontology learning and its application to
automated terminology translation," IEEE Intelligent Systems, vol. 18, no. 1, 2003.

Clustering / COBWEB
[18] P. Clerkin, P. Cunningham, and C. Hayes, \Ontology discovery for the Semantic Web
using hierarchical clustering," in Proceedings of Workshop at ECML/PKDD-2001,
(Germany), 2001.

Mo'K
[19] G. Bisson and C. Nedellec, \Designing clustering methods for ontology building: The
Mo'K workbench," in Proceedings of the Workshop on Ontology Learning, 14th
European Conference on Arti¯cial Intelligence, ECAI'00 (S. Staab, A. Maedche, C.
Nedellec, and P. WiemerHasting, eds.), (Germany), 2000.

[email protected]
112
References
(Not intended to be Exhaustive)
ESKIMO
[20] S. Kampa, T. Miles-Board, and L.Carr, \Hypertext in the Semantic Web," The ACM
Conference on Hypertext and Hypermedia, pp. 237-238, 2001.

Scholarly Ontology Project
[21] V. Uren, S. Shum, C. Mancini, and G. Li, “Modelling naturalistic argumentation in
research literatures," in Proceedings of the 4th Workshop on Computational Models of
Natural Argument, (Valencia, Spain), 2004.

OAI
[22] http://www.openarchives.org/

FCA
[23] B. Ganter and R. Wille, Formal Concept Analysis: Mathematical Foundations.

[email protected]
113