Download Representation and validation of domain and range

Document related concepts

Relational model wikipedia , lookup

Database model wikipedia , lookup

Object-relational impedance mismatch wikipedia , lookup

Transcript
Scholars' Mine
Masters Theses
Student Research & Creative Works
Fall 2010
Representation and validation of domain and range
restrictions in a relational database driven ontology
maintenance system
Patrick Garrett. Edgett
Follow this and additional works at: http://scholarsmine.mst.edu/masters_theses
Part of the Computer Sciences Commons
Department: Computer Science
Recommended Citation
Edgett, Patrick Garrett., "Representation and validation of domain and range restrictions in a relational database driven ontology
maintenance system" (2010). Masters Theses. 6730.
http://scholarsmine.mst.edu/masters_theses/6730
This Thesis - Open Access is brought to you for free and open access by Scholars' Mine. It has been accepted for inclusion in Masters Theses by an
authorized administrator of Scholars' Mine. This work is protected by U. S. Copyright Law. Unauthorized use including reproduction for redistribution
requires the permission of the copyright holder. For more information, please contact [email protected].
REPRESENTATION AND VALIDATION OF DOMAIN AND RANGE
RESTRICTIONS IN A RELATIONAL DATABASE DRIVEN ONTOLOGY
MAINTENANCE SYSTEM
by
PATRICK GARRETT EDGETT
A THESIS
Presented to the Faculty of the Graduate School of the
MISSOURI UNIVERSITY OF SCIENCE AND TECHNOLOGY
In Partial Fulfillment of the Requirements for the Degree
MASTER OF SCIENCE IN COMPUTER SCIENCE
2010
Approved by
Dr. Jennifer L. Leopold, Advisor
Dr. Dan Lin
Dr. Ronald L. Frank
iii
ABSTRACT
An ontology can be used to represent and organize the objects, properties, events,
processes, and relations that embody an area of reality [1]. These knowledge bases may
be created manually (by individuals or groups), and/or automatically using software
tools, such as those developed for information retrieval and data mining. Recently, the
National Science Foundation funded a large collaborative development project for the
semi-automated construction of an ontology of amphibian anatomy (AmphibAnat [2]).
To satisfy the extensive community curation requirements of that project, a generic, Webbased, multi-user, relational database ontology management system (RDBOM [3]) was
constructed, based upon a novel theoretical ontology model called an Ontology Abstract
Machine (OAM [4]). The need to support concurrent data entry by multiple users with
different levels of access privileges (as determined and assigned by the administrators),
made it critical to ensure that the entered data were semantically correct. In particular,
the ability to define and enforce restrictions on property characteristics such as the
domain and range of a relation provide several advantages. It helps to identify
inconsistencies in the ontology, maintain a higher level of overall integrity, and avoid
erroneous conclusions that could be made by automated reasoners. In this thesis a
modified OAM model is presented that includes definitions for property characteristics
and the associated validation algorithms. As proof of concept, it is shown how this
modified abstract model has been implemented for domain and range restrictions in
RDBOM.
iv
ACKNOWLEDGMENTS
I would like to thank Dr. Jennifer Leopold for getting me involved in research as
an undergraduate, and encouraging me to obtain a Master of Science Degree along the
same line of work. Additionally I want to thank Dr. Dan Lin and Dr. Ronald Frank for
serving as members of my committee. This work was supported by the National Science
Foundation under award DBI-0640053, and I am also extremely grateful for the
Chancellor's Fellowship provided by Missouri S&T. I also want to thank my cat Horace
and my sister's dog Floyd for serving as excellent examples and particularly Floyd for not
eating Horace. Lastly I want to thank Leong Lee and Alton Coalter for their support
throughout my work on RDBOM and designing the domain and range restrictions.
v
TABLE OF CONTENTS
Page
ABSTRACT ....................................................................................................................... iii
ACKNOWLEDGMENTS ................................................................................................. iv
LIST OF ALGORITHMS ................................................................................................. vii
LIST OF EXAMPLES ..................................................................................................... viii
LIST OF ILLUSTRATIONS ............................................................................................. ix
LIST OF SCHEMAS .......................................................................................................... x
NOMENCLATURE .......................................................................................................... xi
SECTION
1. INTRODUCTION ...................................................................................................... 1
2. BACKGROUND AND MOTIVATION.................................................................... 3
2.1. PROPERTY CHARACTERISTICS................................................................... 3
2.2. DOMAIN AND RANGE ................................................................................... 4
2.3. ONTOLOGY ABSTRACT MACHINE ............................................................. 7
2.4. RESTRICTION CAPABILITIES OF ONTOLOGIES REPRESENTED AS
RELATIONAL DATABASES ........................................................................ 10
2.5. SUMMARY ...................................................................................................... 11
3. MODIFIED ONTOLOGY ABSTRACT MACHINE .............................................. 12
3.1. MODIFIED OAM MODEL ............................................................................. 12
3.2. MODIFIED OAM MODEL EXAMPLES ....................................................... 13
3.3. ALGORITHMS TO VALIDATE RELATIONSHIP EDGES ......................... 15
3.4. SUMMARY ...................................................................................................... 22
4. RDBOM'S STRUCTURE ........................................................................................ 23
4.1. REPRESENTATION OF THE ONTOLOGY ................................................. 23
4.2. ONTOLOGY MODIFICATION ...................................................................... 25
4.3. USER AUTHORIZATION .............................................................................. 26
4.4. SUMMARY ...................................................................................................... 27
5. PROPOSED SOLUTION......................................................................................... 28
5.1. PROPERTY CHARACTERISTICS................................................................. 28
5.2. VALIDATING RESTRICTED PROPERTY CHARACTERISTIC USE ....... 29
vi
5.3. VIOLATING RESTRICTED PROPERTY CHARACTERISTICS ................ 32
5.4. SUMMARY ...................................................................................................... 35
6. IMPLEMENTATION .............................................................................................. 37
6.1. PROPERTY CHARACTERISTICS................................................................. 37
6.2. VALIDATING RESTRICTED PROPERTY CHARACTERISTICS ............. 38
6.2.1. Linking a Node to Another Node ........................................................... 38
6.2.2. Moving Nodes/Branches ........................................................................ 38
6.2.3. Linking to Additional Parents ................................................................ 40
6.3. RESTRICTED PROPERTY CHARACTERISTIC VIOLATIONS ................ 41
6.3.1. Node to Node Link Operations .............................................................. 42
6.3.2. Move/Cut and Link to Additional Parents ............................................. 43
6.4. ENTIRE ONTOLOGY VALIDATION ........................................................... 44
6.5. INFORMATIONAL PAGES ........................................................................... 46
6.6. SUMMARY ...................................................................................................... 49
7. FUTURE WORK ..................................................................................................... 50
8. SUMMARY ............................................................................................................. 51
BIBLIOGRAPHY ............................................................................................................. 52
VITA ................................................................................................................................ 54
vii
LIST OF ALGORITHMS
Item
Page
Algorithm 3.1. Create a New Domain ............................................................................ 16
Algorithm 3.2. Create a New Range............................................................................... 17
Algorithm 3.3. Perform a Validation Check Before Creating a New Edge ................... 19
viii
LIST OF EXAMPLES
Item
Page
Example 2.1. OAM Representation of the Ontology Presented In Figure 2.1 .................. 8
Example 3.1. Example of the Modified OAM Representation of an Ontology .............. 14
Example 3.2. Create a New Range (is defined by, literature)......................................... 18
Example 3.3. Perform a Validation Check (In Terms of Violation Detection) Before
Creating a New Edge ................................................................................ 20
Example 3.4. Perform a Validation Check (In Terms of Acceptance) Before
Creating a New Edge ................................................................................ 21
ix
LIST OF ILLUSTRATIONS
Item
Page
Figure 2.1.
Example of a Domain and Range Restriction in OWL............................... 6
Figure 2.2.
OAM Diagram of the Ontology in Example 2.1......................................... 9
Figure 3.1.
OAM Diagram of the Ontology in Example 3.1....................................... 15
Figure 3.2.
OAM Diagram of the Ontology in Example 3.4....................................... 22
Figure 4.1.
RDBOM Update Page for Horace ............................................................ 25
Figure 5.1.
Move/Cut Node Operation in RDBOM .................................................... 31
Figure 5.2.
Link to Additional Parent Operation in RDBOM ..................................... 31
Figure 6.1.
RDBOM Detection of Invalid Edge from Figure 2.1 ............................... 39
Figure 6.2.
RDBOM Attempting a Link Node to Additional Parent Operation ......... 41
Figure 6.3.
RDBOM Fixing a Violation via the Move Operation .............................. 45
Figure 6.4.
AmphibAnat Restricted Properties ........................................................... 46
Figure 6.5.
RDBOM Term Information Page for Floyd ............................................. 47
Figure 6.6.
RDBOM Violations Overview Page ......................................................... 48
Figure 6.7.
RDBOM Violations Overview Page for AmphibAnat ............................. 49
x
LIST OF SCHEMAS
Item
Schema 5.1.
Page
Restriction Violations Table Schema........................................................ 34
xi
NOMENCLATURE
Symbol
Description
M
Ontology Abstract Machine (OAM), (Q, ∑, δ, Q0, F)
Q
Set of nodes, Qc ∪ Qi ∪ Qv
Qc
Set of class terms in an ontology
Qi
Set of instance terms in an ontology
Qv
Set of values in an ontology
∑
Set of relationship types, ∑B ∪ ∑E
∑B
Set of base relationship types
∑E
Set of extended relationship types
δ
Set of triples representing relationships (edges) among nodes
Q0
Set of source nodes (no incoming ∑B edge, leaves)
F
Set of root nodes
U
Set of individual user IDs
δdomain
Set of domain restriction tuples
δrange
Set of range restrictions tuples
1. INTRODUCTION
Philosopher Barry Smith has defined an ontology as “…the science of what is, of
the kinds and structures of objects, properties, events, processes, and relations in every
area of reality. For an information system, an ontology is a representation of some preexisting domain of reality which: 1) reflects the properties of the objects within its
domain in such a way that there obtains a systematic correlation between reality and the
representation itself; 2) is intelligible to a domain expert; and 3) is formalized in a way
that allows it to support automatic information processing” [1].
A perusal of ontology-related literature in computer science clearly reflects that
the research focus and problems relating to ontologies in information systems have
changed over the years; the focal point has shifted from the more theoretical ontology
issues to problems associated with the actual development and use of ontologies in realworld, large-scale, collaborative applications. As an example of a large ontology project
that required the use of modularity and collaborative editing, in 2007 the National
Science Foundation funded a project to build a comprehensive ontology of amphibian
anatomy (AmphibAnat [2]). After determining that existing ontology editors/servers did
not meet all of their needs, the research team designed and constructed a Web-based,
multi-user, relational database ontology management system (RDBOM [3]) based on a
novel theoretical ontology model called an Ontology Abstract Machine (OAM) [4].
However, the need for multi-user, collaborative ontology development tools is
only part of the problem. The extent of user participation in developing an ontology (of
any size) also is an important consideration. Typically, ontology curation is performed
manually by a small number of users using a single-user-based ontology editor such as
OBO-Edit [5], Protégé [6], SWOOP [7], or COBrA [8], and the ontology files are
maintained with a version control system. When a very limited number of people are
modifying an ontology, it may be feasible for the entire process to be overseen by those
who are “in charge”; they can exercise some degree of control over the data entry, and
make sure that the data adhere to an agreed upon design.
Much larger community-curated ontology projects require automated control over
the data entry process, particularly in terms of data validation. As an example of such a
2
project, the AmphibAnat ontology of amphibian ontology contains over 22,000 terms and
approximately 44 relationship types, and is curated by a community of 21 active users
from various specific areas of anatomical expertise. Initially, the administrators created a
skeleton for the ontology hierarchy, and defined the relationships to be used to link terms.
Access to branches of the ontology was assigned to various members of the community
to allow for content to be added and updated. It made sense to utilize a database
management system to address several multi-user issues, including concurrency control
and the restriction of data access privileges.
The success of an extensive project like AmphibAnat was dependent upon the
participation of a fairly large number of users. Although the administrators were
responsible for major design decisions, they needed to have the ability to enforce such
decisions in the open development environment. One such requirement was the ability to
restrict the use of term relationships based on characteristics defined with respect to the
relationship, in particular domain and range. That would help prevent erroneous inputs,
both accidental and malicious, and thereby improve the overall integrity of the ontology.
Additionally, it would help clarify the relationship definitions.
Herein is presented a modified abstract ontology model that includes property
restrictions and the associated validation algorithms. As proof of concept, it is described
how this model has been implemented for domain and range restrictions in the RDBOM
relational database driven ontology maintenance system. The modified ontology abstract
machine model and its related algorithms are discussed in Section 3. An understanding
of the internal structure of RDBOM is covered in Section 4. In Section 5, discussion is
focused on introducing restricted property characteristics to RDBOM as well as the need
in some situations for the restrictions to be ignored and thus violated. Finally, Section 6
pertains to the actual implementation of restricted properties in RDBOM.
3
2. BACKGROUND AND MOTIVATION
There are several important concepts regarding ontologies that need to be
discussed before presenting the OAM and RDBOM modifications to support property
restrictions. The first such subject is what exactly property characteristics are, and how
they can be used to check if a property is being used correctly (and consistently)
throughout an ontology, as well as how additional information about terms can be
inferred through property characteristics. The second subject is how the concepts of data
constraints such as domain and range apply to an ontology; specifically, the validation
capabilities of ontology languages such as the Web Ontology Language (OWL) and the
Open Biological and Biomedical Ontology language (OBO) are compared to the desired
functionality of RDBOM's domain and range validation. Lastly, a formal definition and
several examples of the original Ontology Abstract Machine are given.
2.1. PROPERTY CHARACTERISTICS
There must be a clear understanding of how the concepts of domain and range
apply to ontologies. In an ontology, properties are used to describe the relationships
among terms. The relationship itself is regarded as a property of a term. A property
characteristic describes additional information about a specific property. OWL facilitates
the designation of several property characteristics including transitive and inverse
relations [10]. For example, the transitive property characteristic can be used to indicate
that if a transitive property relates term A to term B, and relates term B to term C, then
term A is related to term C.
A property characteristic such as transitivity can be used to infer additional
information about terms in the ontology. Often this is referred to as reasoning over
information from an ontology. Other types of property characteristics can be used to
restrict the values for a particular property. For example, restricting the minimum or
maximum numerical values of a property has_legs would ensure that every use of the
has_legs property lies within a certain range. In doing this, consistency checking of the
property's use ensures that the target value meets the restriction set in place on the
4
property. In the discussion that follows in this section, it is shown how property
characteristics such as domain and range can be used to perform both consistency
checking of an ontology as well as how they can be used for reasoning by OWL.
It should be noted that the objective of this work is to implement the restricted
form of property characteristics, a much more focused type of a property characteristic.
It requires additional functionality to account for the validation of property use that has a
restricted property characteristic applied to it. As will be discussed in subsequent
sections, this work also accommodates possible violations of the restricted property
characteristic in order to track and eliminate issues in existing ontologies. Implementing
non-restricted property characteristics in a relational database driven ontology
maintenance system such as RDBOM is not nearly as difficult, and does not satisfy the
functionality desired by the end users of RDBOM for the AmphibAnat project; hence,
this is the reason for focusing on restricted property characteristics. Additionally, the
mechanisms required for restricting a property characteristic can potentially be used to
infer additional information on non-restricted property characteristics, as will be
discussed in Section 7.
2.2. DOMAIN AND RANGE
Domain and range are both features of popular ontology languages such as OWL
(via RDFS) [9] and OBO [5]. Both of those languages use virtually the same definition
for domain and range. From [10], domain is defined as “if a property P has domain D,
then any term T that has a relationship of type P to another term is a subclass of D.” This
states that “any term that has a relationship of type P to another term is by definition a
subclass of D.” Also from [10], range is defined as “if a property P has range R, then any
term T that is the target of a relationship of type P is a subclass of R.” Equivalently, “any
term that is the target of a relationship of type P is by definition a subclass of R.” Other
relationships within the set of properties (such as disjointness) provide additional
restrictions that a reasoner could use to determine violations of the integrity of the data.
An example of using domain and range within OWL to restrict the values of a
property can be seen in Figure 2.1. The ontology consists of classes for Animals and
5
Food. Dog and Cat are both subclasses of Animals. An object property eats is defined
with the domain of Animals and the range of Food. An instance of Cat is created with
the name of Horace, and an instance of Dog is created with the name of Floyd. Floyd is
then assigned the relationship eats Horace. Since the Animals and Food classes are
disjoint, a reasoner could detect an error about the use of Horace as the range of the eats
relationship for Floyd. This implies the use of eats is inconsistent with its definition.
If the two terms, Animals and Food are not disjoint then the behavior is different.
For terms used as the domain of property eats, one could infer additional information
about the term, stating that it is a subclass of Animals which is the specified domain of
eats. Likewise, terms used as the range term of eats would be inferred to be subclasses of
Food. Applying this to the Floyd eats Horace relationship in Example 1 without the
disjointWith statements, Floyd is being used as the domain and is already a subclass of
Animals, so no additional information is inferred. However, Horace is only a subclass of
Animals. By using Horace as the range of a use of the eats property an entry could
automatically be inferred, stating that Horace is also a subclass of Food, the range term
for eats.
<owl:ObjectProperty rdf:about="#eats">
<rdfs:domain rdf:resource="#Animals"/>
<rdfs:range rdf:resource="#Food"/>
</owl:ObjectProperty>
<owl:Class rdf:about="#Animals">
<rdfs:subClassOf rdf:resource="&owl;Thing"/>
<owl:disjointWith rdf:resource="#Food"/>
</owl:Class>
<owl:Class rdf:about="#Cat">
<rdfs:subClassOf rdf:resource="#Animals"/>
</owl:Class>
<owl:Class rdf:about="#Dog">
<rdfs:subClassOf rdf:resource="#Animals"/>
</owl:Class>
<owl:Class rdf:about="#Food"/>
6
<owl:Class rdf:about="&owl;Thing"/>
<owl:Thing rdf:about="#Horace">
<rdf:type rdf:resource="#Cat"/>
</owl:Thing>
<owl:Thing rdf:about="#Floyd">
<rdf:type rdf:resource="#Dog"/>
<eats rdf:resource="#Horace"/>
</owl:Thing>
Figure 2.1. Example of a Domain and Range Restriction in OWL
In a multi-user environment, it can be a challenging task to enforce constraints
such as the domain and range restrictions on the values of relationships. In this case, the
domain and range concepts provide the restrictions to be placed on the data, unlike their
specification in OWL or OBO, where they instead are used to define the data. Using
similar data to that from Figure 2.1, if someone tries to assert that “Floyd eats Horace”,
instead of defining Horace to be a subclass of Food (as would be the case for OWL or
OBO), the system must detect a violation of the range of eats (since Horace is not
already a subclass of Food). Note the distinct differences between OBO/OWL and this
philosophy. For OBO/OWL, the object or relation is added to the existing set of data
(e.g., Horace becomes a subclass of Food) and is considered inference of additional
information for Horace. In a multi-user curation environment the data must already exist
(e.g., Horace must be defined as a subclass of Food) so that the specification of the
domain or range causes a validation to occur, and does not cause a new relation to be
added.
The functionality desired from the OAM and RDBOM modifications is to
accommodate checking for consistent property use throughout the ontology. The ability
to detect the invalid property use of eats as with OWL in Figure 2.1 should be able to be
performed without the explicit disjointWith statements existing in the class definitions in
RDBOM.
7
2.3. ONTOLOGY ABSTRACT MACHINE
As previously mentioned, specification and (automated) enforcement of domain
and range restrictions are necessary to ensure the data integrity of large communitycurated ontologies. One such ontology, the AmphibAnat project, is based on the OAM
model; hence restrictions such as domain and range need to be defined formally as part of
that model. The motivations for developing the OAM model, the model definition, and
the various algorithms associated with the model are discussed in detail in [4]. Here is
provided a brief overview of the basic OAM model before the incorporation of property
restrictions is addressed. The formal definition of the OAM is given as follows:
An Ontology Abstract Machine (OAM) is a 5-tuple representation of an ontology,
M = (Q, ∑, δ, Q0, F), where:
Q: set of nodes; Q = Qc ∪ Qi ∪ Qv
Qc = set of classes
Qi = set of instances
Qv = set of values
∑: set of relationship types
∑ = ∑B ∪ ∑E
∑B = set of base relationship types, e.g. {is_a, part_of}
∑E = set of extended relationship types, e.g. {is_from_literature, image_ contains,
is_from_image, …}
δ: set of relationships in the form of edges (node, relationship type, node), Q x ∑ → Q;
hence each element is a child node, a relationship type, or a parent node.
Q0: set of source nodes: These are nodes with no incoming ∑B edge. This set can be
identified from δ. Q0 is a subset of (Qc ∪ Qi). Source nodes can only be elements of
the set of classes or elements of the set of instances.
F: set of root nodes, i.e. nodes with no outgoing ∑B edge. e.g. F = {Concepts}, F is a
subset of Qc.
Another set U = {u1, u2, … ui … un} is used to represent individual user ids in
8
order to implement security features. Any elements of U can be associated with any node
in Q.
∑B is a set of base relationship types. Members of this set can be used as links
(among classes and instances) to generate the main graph view of an ontology; two of the
most common base relationship types used for this purpose are is_a and part_of. Here it
is assumed that the main graph view is acyclic.
∑E is a set of extended relationship types. The member elements can be used as
links between values and classes, or as links between values and instances. In other
words, they are used to link attributes to classes and instances. Another required function
is the ability to link classes and instances (without affecting the main structure of the
ontology).
To better understand the OAM model, an OAM will be developed based off the
ontology presented in Figure 2.1. The OAM is described in Example 2.1 and depicted in
Figure 2.2.
OAM instance M = (Q, ∑, δ, Q0, F):
Q = Qc ∪ Qi ∪ Qv = {Thing, Food, Animals, Dog, Cat, Floyd, Horace};
Qc = {Thing, Food, Animals, Dog, Cat}; Qi = {Floyd, Horace};
Qv = {}
∑ = ∑B ∪ ∑E = {is_a, eats}
∑B = {is_a}; ∑E = {eats}
δ = {(Food, is_a, Thing), (Animals, is_a, Thing), (Dog, is_a, Animals),
(Floyd, is_a, Dog), (Cat, is_a, Animals), (Horace, is_a, Cat)}
Q0 = {Food, Floyd, Horace}
F = {Thing}
Example 2.1. OAM Representation of the Ontology Presented In Figure 2.1
9
Thing
is_a
is_a
Animals
is_a
Dog
Floyd
is_a
Cat
Horace
Food
Figure 2.2. OAM Diagram of the Ontology in Example 2.1
As discussed in [4], the OAM model is a generic theoretical model developed to
provide a framework for constructing a collaborative, Web-based ontology management
system that supports modularity and distinct relationship classifications. For
collaboration purposes, it is useful to assign user access rights to a subset of the ontology;
for example (M, Animals) denotes a subset of M [4] (or informally a branch below the
Animals node). The RDBOM relational database driven ontology management system
uses the OAM model to implement the multi-user access control capability [3] [4]. For
example, administers can attach a user ui to the node Animals, thereby granting user ui
both access and update rights to the subset (M, Animals), but not to other parts of the
ontology M.
The original OAM model did not include the capability to identify the domain and
range of a relationship, and thereby restrict the classes that can be used with a
relationship. For example, referring to Example 2.1, the administrators should be able to
control the domain of “∑E = {eats}” and the range of “∑E = {eats}”; specifically, the
domain of eats should be nodes from (M, Animals), and the range of eats should be nodes
10
from (M, Food). If a user tries to create an edge (Floyd eats Horace), the OAM model,
and hence the RDBOM system, should give a domain/range violation warning.
2.4. RESTRICTION CAPABILITIES OF ONTOLOGIES REPRESENTED AS
RELATIONAL DATABASES
In [11], the importance of formally stating rules regarding relations is
emphasized. In an open, multi-user setting, rules (or restrictions) are needed to ensure
that the community follows the standards and designs that have been decided upon for the
particular ontology. The widely used Protégé application [6] user interface supports
consistency checking through FaCT [12], which can be used on the entire ontology or
only on a selected part of the ontology. There are also plugins for other reasoners such as
Pellet [13].
Ontology development project teams have started to recognize how valuable it is
to be able to check the consistency of an ontology. Members of the Gene Ontology
project [14] stated that as the size of their ontology increased, the curation of it became
much more challenging. As a result, they turned to the use of Protégé, in part for its
consistency checking features, to ensure that the knowledge model was being used
correctly by its classes and instances.
For some applications, it is useful to convert an ontology into a relational
database, as demonstrated in [15]. There exist systems such as Minerva [16] that support
persistent storage and inference of OWL ontologies in a relational database. Likewise,
DBOWL [17] consists of an OWL relational database storage system, and an OWL
reasoning system. Thus, it is possible to store an ontology as a relational database, and to
specify OWL-compatible domain and range restrictions, such as those illustrated in
Figure 2.1. However, these systems are designed for persistent storage of ontologies,
and not dynamic management of the data contained therein; if changes are made to the
data, the database must be reloaded and the consistency checker must be re-executed.
11
2.5. SUMMARY
This section has provided some background information about the differences
between property characteristics, restricted property characteristics, and different ways
they can be used within an ontology to perform consistency checking of properties used
in relationships as well as inference of additional term data. The OAM model and the
RDBOM implementation aim to incorporate restricted property characteristics such as
domain and range. This provides a means for checking the consistency of properties and
how they are used throughout the ontology, which has shown to be valuable in RDBOM's
multi-user curation environment. The original OAM model was introduced and applied
to Figure 2.1, as can be found in Example 2.1 and Figure 2.2, in order to provide a basic
understanding of how it represents an ontology. In the next section the OAM model will
be extended to support restrictions, using domain and range as a proof of concept.
12
3. MODIFIED ONTOLOGY ABSTRACT MACHINE
The original definition of the OAM model presented in the previous section must
be modified to take into consideration restrictions such as domain and range. Those
modifications, together with the supporting algorithms for maintenance of that
information, are provided in this section. Additionally, several examples are given using
a very small section of the Amphibanat ontology to enforce these concepts. Lastly, an
algorithm for validating a new edge (relationship) introduced to the OAM is provided
along with an example.
3.1. MODIFIED OAM MODEL
Introduced to the model are two new sets, δdomain and δrange. Both of these sets
contain tuples of a relationship type, and a node which represents the property to restrict
as well as the node to which to restrict the values. That is, all domain and range property
restrictions are specified in δdomain and δrange, respectively.
The (modified) Ontology Abstract Machine is a 7-tuple representation of an
ontology, M = (Q, ∑, δ, Q0, F, δdomain, δrange), where:
Q: set of nodes; Q = Qc ∪ Qi ∪ Qv
Qc = set of classes
Qi = set of instances
Qv = set of values
∑: set of relationship types
∑ = ∑B ∪ ∑E
∑B = set of base relationship types, e.g. {is_a, part_of}
∑E = set of extended relationship types, e.g. {is_from_literature, image_ contains,
is_from_image, …}
δ: set of relationships in the form of edges (node, relationship type, node),
13
Q x ∑ → Q; hence each element is a child node, a relationship type, or a parent node.
Q0: set of source nodes: These are nodes with no incoming ∑B edge. This set can
be identified from δ. Q0 is a subset of (Qc ∪ Qi). Source nodes can only be elements of
the set of classes or elements of the set of instances.
F: set of root nodes, i.e. nodes with no outgoing ∑B edge. F is a subset of Qc.
δdomain: set of meta data in the form of (relationship type, node), ∑ x Q. This is
used to define the domain of a relationship type; it means that the domain of this
relationship type can only be nodes from subset (M, node). Normally the relationship
type is an element of ∑E, but it also can be an element of ∑B.
δrange: set of meta data in the form of (relationship type, node), ∑ x Q. This is
used to describe the range of a relationship type; it means that the range of this
relationship type can only be nodes from subset (M, node). Normally the relationship
type is an element of ∑E, but it also can be an element of ∑B.
The descriptions of U = {u1, u2, … ui … un}, ∑B, and ∑E are the same as their
descriptions described in Section 2.3, the original OAM definition.
3.2. MODIFIED OAM MODEL EXAMPLES
Several examples will be given to demonstrate the modifications made to the
original OAM model. Example 3.1 illustrates how the (modified) OAM can be used to
represent domain and range restrictions.
Q = Qc ∪ Qi ∪ Qv
Qc = {Concepts, image, Gaupp_1896_Fig_11, synonym, sternum inferius,
literature, Maglia et al. 2007, amphibian anatomical entity, sternum};
Qi = {}; Qv = {}
∑ = ∑B ∪ ∑E = {is_a, is defined by, has related synonym, image contains}
∑B = {is_a}; ∑E = {has related synonym, image contains}
δ = {(image, is_a, Concepts), (Gaupp_1896_Fig_11, is_a, image),
14
(synonym, is_a, Concepts), (sternum inferius, is_a, synonym),
(literature, is_a, Concepts), (Maglia et al. 2007, is_a, literature),
(amphibian anatomical entity, is_a, Concepts),
(sternum, is_a, amphibian anatomical entity),
(Gaupp_1896_Fig_11, image contains, sternum),
(sternum, has related synonym, sternum inferius)}
Q0 = {Gaupp_1896_Fig_11, sternum inferius, Maglia et al. 2007, sternum}
F = {Concepts}
δdomain = {(image contains, image)}
δrange = {(has related synonym, synonym)}
Example 3.1. Example of the Modified OAM Representation of an Ontology
In Example 3.1, the domain specification of the relationship has_related_synonym
and range specification for the relationship image_contains were not included for brevity.
It is assumed that the image_contains relationship can be used with any term in the
ontology being a valid range term. Likewise, the has_related_synonym relationship can
correctly be used with any term in the ontology as its range.
Shown in Figure 3.1 is the graphical OAM representation of the ontology in
Example 3.1. Note that this is a very simple example with only a few nodes. In the
AmphibAnat project the OAM is used to represent ontology modules which each contain
thousands of nodes. For the RDBOM implementation, the OAM data are stored in a
relational database. However, the OAM model is implementation independent, and could
be stored as a flat file or simply stored in system memory.
In Example 3.1 it can be seen that δdomain = {(image contains, image)}, which
means the domain of relationship type image contains can only be nodes from subset (M,
image). The edge (Gaupp_1896_Fig_11, image contains, sternum) satisfies this
requirement because the node Gaupp_1896_Fig_11 is under (M, image).
Similarly, δrange = {(has related synonym, synonym)}, means the range of
relationship type has related synonym can only be nodes from subset (M, synonym.) The
15
edge (sternum, has related synonym, sternum inferius) satisfies this requirement because
sternum inferius is under (M, synonym).
Concepts
is_a
image
is_a
Gaupp_
1896_Fig_
11
is_a
synonym
is_a
sternum
inferius
is_a
literature
is_a
Maglia et
al. 2007
is_a
amphibian
anatomical
entity
is_a
sternum
has related
synonym
image contains
Figure 3.1. OAM Diagram of the Ontology in Example 3.1
3.3. ALGORITHMS TO VALIDATE RELATIONSHIP EDGES
Because the OAM model has been modified to store domain and range property
restrictions, an algorithm needs to be provided to enable the addition and enforcement of
the restrictions when adding new edges (relationships) to the OAM. To provide the
functionality required to specify domain and range restrictions, and perform validation
checks, Algorithm 3.1, Algorithm 3.2 and Algorithm 3.3 were developed. An application
of Algorithm 3.2 is given in Example 3.2 where a new range restriction "is defined by has
range literature" is created.
16
Input: An OAM instance, and a new domain definition in the form of (r, n),
where r is a relationship type and n is a class node.
Output: An updated OAM instance.
Algorithm:
if n is not an element of Qc then
exit and output the error message “class node n does not exist”
end if
if r is not an element of ∑ then
if r should be of base relation type then
add r to set ∑B
elseif r should be of extended relationship type then
add r to set ∑E
end if/elseif
else
δdomain = δdomain ∪ {(r, n)}
end if/else
Algorithm 3.1. Create a New Domain
Input: An OAM instance, and a new range definition in the form of (r, n),
where r is a relationship type and n is a class node.
Output: An updated OAM instance.
Algorithm:
if n is not an element of Qc then
exit and output the error message “class node n does not exist”
end if
if r is not an element of ∑ then
if r should be of base relation type then
add r to set ∑B
elseif r should be of extended relationship type then
add r to set ∑E
17
end if/elseif
end if
δrange = δrange ∪ {(r, n)}
Algorithm 3.2. Create a New Range
Input: The OAM instance in Example 3.1, and a new range definition in the
form of (is defined by, literature), where is defined by is an extended
relationship type, and literature is a class node.
Steps:
Since literature ∈ Qc; continue
is defined by ∉ ∑; so add is defined by to ∑E (is defined by is an extended
relationship type)
now ∑E = {has related synonym, image contains, is defined by}
δrange = δrange ∪ {(is defined by, literature)}
now δrange = {(has related synonym, synonym), ( is defined by, literature)}
Output: An updated OAM instance as shown below.
OAM instance M = (Q, ∑, δ, Q0, F, δdomain, δrange):
Q = Qc ∪ Qi ∪ Qv
Qc = {Concepts, image, Gaupp_1896_Fig_11, synonym,
sternum inferius, literature, Maglia et al. 2007, amphibian anatomical
entity, sternum};
Qi = {}; Qv = {}
∑ = ∑B ∪ ∑E = {is_a, is defined by, has related synonym, image contains}
∑B = {is_a};
∑E = {has related synonym, image contains, is defined by}
δ = {(image, is_a, Concepts), (Gaupp_1896_Fig_11, is_a, image),
(synonym, is_a, Concepts), (sternum inferius, is_a, synonym),
(literature, is_a, Concepts), (Maglia et al. 2007, is_a, literature),
(amphibian anatomical entity, is_a, Concepts), (sternum, is_a,
amphibian anatomical entity), (Gaupp_1896_Fig_11, image contains,
18
sternum), (sternum, has related synonym, sternum inferius)}
Q0 = {Gaupp_1896_Fig_11, sternum inferius, Maglia et al. 2007, sternum}
F = {Concepts}
δdomain = {(image contains, image)}
δrange = {(has related synonym, synonym), (is defined by, literature)}
Example 3.2. Create a New Range (is defined by, literature)
Note that the only differences between the OAM instances in Example 3.1 and
Example 3.2 are the sets ∑E and δrange. It does not affect the OAM diagram, although the
meta-data concerning the range of relationships are changed. As a result, Example 3.2
has the same OAM diagram as Example 3.1, shown in Figure 3.1.
Algorithm 3.3 can be used to validate the domain and range definitions. Here it is
assumed that such validation checks normally would be performed before an edge is
added to the ontology. Example 3.3 illustrates that the algorithm performs the validation
check and detects that this is a domain/range violation; consequently, the edge creation
request is rejected. Example 3.4 demonstrates that the algorithm allows the edge creation
request, as the request satisfies all domain/range definitions.
Input: An OAM instance, and a new edge in the form of (nd, r, nr), where r is a
relationship type, and nd and nr are nodes.
Output: An updated OAM instance.
Algorithm:
if (nd ∉ Q) or (nr ∉ Q) or (r ∉ ∑) then
exit and output error message “class node or relationship type does not exist”
end if
for each element (r’, n) in set δdomain do
if (r’ = r) then
if (check_up (nd, n) = “not_found”) then
//check_up is declared below
exit and output the error message “domain (r’, n) violation”
19
end if
end if
end for
for each element (r’, n) in set δrange do
if (r’ = r) then
if (check_up (nr, n) = “not_found”) then
exit and output the error message “range (r’, n) violation”
end if
end if
end for
δ = δ ∪ {(nd, r, nr)} //add edge to OAM
check_up(test_node, node)
begin
for each ancestor path p of test_node:
if node not in p then
return "not_found"
end if
end for loop
return "found"
end check_up
Algorithm 3.3. Perform a Validation Check Before Creating a New Edge
The check_up routine included in Algorithm 3.3 is used to validate any domain or
range restrictions that apply to a new edge. In OAM it is possible for nodes to have
multiple parents. When checking to see if a node can be used in a property with domain
or range restrictions, the node must be a descendent of the node to which the property is
restricted. Depending on the location of the node being validated, the restricted node
could be an ancestor in one path but not another. The restriction violations are used to
20
perform consistency checking of the relationships using properties, so it is important to
ensure that their use is correct in every path of the node.
Input: The OAM instance in Example 3.2, and a new edge (sternum, image contains,
Maglia et al. 2007) to be added, where image contains is a relationship type, and
sternum and Maglia et al. 2007 are nodes.
Steps:
(sternum ∈ Q), (Maglia et al. 2007 ∈ Q), and (image contains ∈ ∑)
for element (image contains, image) in set δdomain
since (r’ = image contains = r)
since (check_up (sternum, image) = “not found”)
exit and output error message “domain (image contains, image) violation”
Output: Since there is a violation, the OAM instance remains unchanged.
Example 3.3. Perform a Validation Check (In Terms of Violation Detection) Before
Creating a New Edge
Input: The OAM instance in Example 3.2, and a new edge (sternum, is defined by,
Maglia et al. 2007) to be added, where is defined by is a relationship type, and
sternum and Maglia et al. 2007 are nodes.
Steps:
(sternum ∈ Q) and (Maglia et al. 2007 ∈ Q) and (is defined by ∈ ∑)
no element (r’, n) in set δdomain satisfies (r’ = is defined by), so it passes the domain check
for element (is defined by, literature) in set δrange
since (r’ = is defined by = r)
(check_up (Maglia et al. 2007, literature) = “found”), so it passes range check
δ = δ ∪ {(sternum, is defined by, Maglia et al. 2007)} //add edge to the OAM instance
δ = {(image, is_a, Concepts), (Gaupp_1896_Fig_11, is_a, image), (synonym, is_a,
Concepts), (sternum inferius, is_a, synonym), (literature, is_a, Concepts), (Maglia et
al. 2007, is_a, literature), (amphibian anatomical entity, is_a, Concepts), (sternum,
21
is_a, amphibian anatomical entity), (Gaupp_1896_Fig_11, image contains, sternum),
(sternum, has related synonym, sternum inferius), {(sternum, is defined by, Maglia et
al. 2007)}
Output: An updated OAM instance shown below. Please refer to Figure 3.2 for the
graphical OAM representation of this ontology instance.
OAM instance M = (Q, ∑, δ, Q0, F, δdomain, δrange):
Q = Qc ∪ Qi ∪ Qv
Qc = {Concepts, image, Gaupp_1896_Fig_11, synonym, sternum inferius, literature,
Maglia et al. 2007, amphibian anatomical entity, sternum}; Qi = {}; Qv = {}
∑ = ∑B ∪ ∑E = {is_a, is defined by, has related synonym, image contains}
∑B = {is_a}; ∑E = { has related synonym, image contains, is defined by}
δ = {(image, is_a, Concepts), (Gaupp_1896_Fig_11, is_a, image), (synonym, is_a,
Concepts), (sternum inferius, is_a, synonym), (literature, is_a, Concepts), (Maglia et
al. 2007, is_a, literature), (amphibian anatomical entity, is_a, Concepts), (sternum,
is_a, amphibian anatomical entity), (Gaupp_1896_Fig_11, image contains, sternum),
(sternum, has related synonym, sternum inferius), {(sternum, is defined by, Maglia et
al. 2007)}
Q0 = {Gaupp_1896_Fig_11, sternum inferius, Maglia et al. 2007, sternum}
F = {Concepts}
δdomain = {(image contains, image)}
δrange = {(has related synonym, synonym), (is defined by, literature)}
Example 3.4. Perform a Validation Check (In Terms of Acceptance) Before Creating a
New Edge
22
Concepts
is_a
image
is_a
Gaupp_
1896_Fig_
11
is_a
synonym
is_a
sternum
inferius
is_a
literature
is_a
Maglia et
al. 2007
has related
synonym
is defined by
is_a
amphibian
anatomical
entity
is_a
sternum
image contains
Figure 3.2. OAM Diagram of the Ontology in Example 3.4
3.4. SUMMARY
This section has introduced the necessary modifications to the OAM model to
accommodate property restrictions. Domain and range restrictions were used as a proof
of concept. However, it is intended that the modified OAM model definition, and
restriction maintenance and validation algorithms could be applied to other types of
restrictions as well.
In the next section, RDBOM, an implementation of the (original) OAM is
presented. Section 5 explains how the OAM modifications for property restrictions were
incorporated into RDBOM.
23
4. RDBOM'S STRUCTURE
RDBOM is a relational database driven ontology maintenance system based on
the OAM. In order to efficiently implement the OAM modifications to support property
restrictions that were discussed in the previous section, the existing RDBOM relation
schemas had to be considered carefully. In this section, a brief overview of the RDBOM
system is provided with a focus on: (1) the storage features that needed to be expanded
upon to accommodate restricted property characteristics, and (2) the role of user
authorization, which plays a part in the enforcement of restrictions. Also briefly covered
are the various operations that can be performed to modify the ontology data in RDBOM,
and how property restrictions are involved in those functions.
4.1. REPRESENTATION OF THE ONTOLOGY
At the center of RDBOM's data storage is the terms table. The terms table assigns
a unique integer identifier to a piece of text. Every piece of information in an ontology is
stored as a term in RDBOM. It does not matter what the type of information is, whether
it is the name of a class, the name of a property, a definition of a term, the path to an
image representing a term, etc. Additionally, the terms table entry also provides the
identifier of the ontology with which it is associated.
The term types table gives definitions and identifiers for every type of term that is
used in RDBOM. The 'class' term type is used as in OWL, and can be seen in Figure 2.1
to represent the Food and Animal classes in the ontology. 'Value' term types identify
terms being used as class definitions or unique identifiers for classes within the ontology
so they can be referenced easily. Instances of classes are denoted with the 'instance' term
type (e.g., Horace and Floyd from Figure 2.1). Any properties (e.g., eats from Figure 2.1)
are specified with the 'property' term type.
In order to express how a term should be used in the ontology, one or more entries
are created for it in the term usages table. The term usages table uses foreign keys to
reference a terms table entry and a term types entry. Additionally, for terms defined to be
of type 'property' via the term usages table, a corresponding entry in the properties table
24
is made. The properties table was originally intended to store property characteristics for
every property. The table includes an indicator for whether or not the property is
considered a 'tree' property, (i.e., is a and part of), which subsequently determines if that
term gets displayed as a node in the hierarchical display of the ontology in the user
interface browsing mode. There are other attributes to denote if it a property is transitive,
reflexive, symmetric, antisymmetric, or cyclic, as well as an attribute to provide the
identifier for an inverse property term (if one exists).
In the original RDBOM schema, the property characteristics (attributes) are
nothing more than flags to indicate the characteristic. However, this does support the
more generic approach to maintaining property characteristics (namely, those that require
some other term to be the property characteristic's value). For example, if the domain
and range restricted property characteristics were to be implemented in the same manner,
then two more columns would have to be added to the properties table to indicate the
terms to which they are restricted. Herein, domain and range are being used as proof of
concept for a generic implementation of property characteristics. This means for any
other property characteristics that would be added to RDBOM in the future, the schema
for the properties table would need to be modified in the same manner (i.e., adding
additional attributes to the RDBOM tables). Additionally, it would not allow for the
same property characteristic to be applied to multiple terms for a single property. The
user could not restrict a property to multiple classes in the ontology since the properties
table maintains a 1-to-1 relation between property terms and the property characteristic
information.
Another consideration in the incorporation of property restrictions into RDBOM
is the existing term2terms table which allows properties to be used to link two terms
together in the form of "term1 relation_term term2." In Figure 2.1, the relationship of
Floyd eats Horace would be represented as term1 referring to Floyd, relation_term being
eats and term2 referencing Horace. The use of this table and additional term types
entries provides the ability to generically store both restricted and unrestricted property
characteristics in the RDBOM schema.
25
4.2. ONTOLOGY MODIFICATION
The ontology data stored in the RDBOM database is modified through a series of
Web pages. All of the commands are initiated by first selecting a class term to modify.
From there the user is presented with the list of commands that can be applied to the
selected term. The commands are split into two categories, with the first category
encompassing operations for modifying the ontology's structure. Figure 4.1 shows the
ontology from Figure 2.1 represented in RDBOM with the update page for the term
Horace selected.
Figure 4.1. RDBOM Update Page for Horace
26
Structure modifications allow for the creation and deletion of child terms, moving
(cutting and pasting) a term to a different parent, linking the term to an additional parent,
and also detaching the term from a parent. The move operation can be applied to both
leaf and non-leaf terms. When used on a non-leaf term, it serves the function of moving
a branch of the ontology. Linking a term to an additional parent causes multiple ancestry
paths for a term. This is important to note since domain and range restrictions of any
property applying to the term must be valid in each path.
The second category of modifications deals with the content of the term.
Definitions can be created for the term, and changes can be made to the term's name.
Relationships can be created to link two terms together by a property. Every creation of a
relationship begins with the domain term. From this term, the property is selected
followed by the range term. It also allows users to delete relationships. RDBOM
contains an administrative module that allows administrators to create and delete
properties in the ontology, grant specific users access to the update pages, and import
existing ontologies into RDBOM's database. These descriptions of the various structural
and content operations will be used to identify locations requiring changes to
accommodate the validation of domain and range restrictions in the next section.
4.3. USER AUTHORIZATION
User authorization is important to understand since ultimately some users will be
granted permission to violate restrictions. There are already existing mechanisms
allowing users to authenticate with RDBOM by logging in with a user name and
password. The only authorization performed prior to this work was used for controlling
access to ontology modifications. As previously discussed, ontology data in RDBOM are
maintained through a series of update operations.
User accounts and their information are stored in the users table. The
authorization levels table provides the various tasks for which the system must authorize
users. The 'update' authorization allows users to gain access to the data update operations
of an ontology. However, the user might not have access to the entire ontology.
Administrators use RDBOM's administrative module to grant the update permission to
27
users on a case by case basis, giving a user access to update only the parts (or “branches”)
of the ontology for which the user is regarded as an expert. This is done by granting
access to the some class term in the ontology, and allowing update operations to be
performed on any class that has the specified term as an ancestor (i.e., so that the user has
permission to modify a branch of the ontology).
The permissions granted are stored in the authorizations table which uses foreign
key references to identify a user from the users table, a particular permission from the
authorization levels table, and a term from the terms table. An entry in the authorizations
table can be used to specify a user, the permission the user has, and a branch of the
ontology to which the authorization levels entry applies. Multiple authorizations entries
are used to identify distinct branches in an ontology that a user has permission to update.
4.4. SUMMARY
The interactions between the main tables used for capturing the information in an
ontology in RDBOM have been identified in this section, as well as the mechanisms
involved with granting users permission to update various branches of the ontology.
Additionally, the operations used to modify the ontology data were identified. These
concepts are necessary to understand the design of the modifications to RDBOM that
were required in order to support domain and range validation, the topic of the next
section.
28
5. PROPOSED SOLUTION
This section discusses the design of the proposed solution and highlights the parts
of RDBOM that need to be modified in order to support restricted property
characteristics, specifically for domain and range as proof of concept of the more general
case. Details that are specific to each change in RDBOM are explained, including
database schema and update operation changes. First the changes required to generically
represent any restricted property characteristic are identified. Next discussed are the
steps required to validate property use in the ontology when there is a restricted property
characteristic specified. Finally, consideration is given for when restricted property
characteristics can knowingly be violated by users, and how to deal with recording such
violations. The actual implementation of these modifications in RDBOM will then be
covered in the following section.
5.1. PROPERTY CHARACTERISTICS
To accommodate restrictions such as domain and range, changes needed to be
made to the way property characteristics are stored in RDBOM. As identified in Section
4.1, property characteristics are tightly coupled with the properties table. It restricts the
number of property characteristics that can be applied to any given property to a single
term or value, and also requires the schema to be altered for any addition of property
characteristics. To allow for any arbitrary property characteristics to be represented in
RDBOM, two new term types of 'property characteristic' and 'restricted property
characteristic' were introduced to the term types table. Property characteristics could then
be applied to a property via the term2terms table.
The RDBOM user interface already has full support for defining new
relationships, and using them to link classes and instances in an ontology. This
functionality was exploited to define the domain and range properties of relationship
types. For simplicity, a class, instance, or relationship type in RDBOM will be referred
to as a term. Recall that in RDBOM, the relationships between terms (δ in the OAM) are
inserted into the term2terms table in the form of (term1, relationship type, term2), which
29
represents the relationship 'term1 has relationship type term2.' Referring to Example 3.1,
the domain description (image contains, image) is represented in RDBOM by the
term2terms entry (image contains, has domain, image). Similarly, the range description
(has related synonym, synonym) can be expressed with the term2terms entry (has related
synonym, has range, synonym). By doing this, the required information is available to
RDBOM to allow Algorithm 3.3 to be applied for validating a relationship. It also should
be noted that this can be used to represent other relationship characteristics, both
restrictive and non-restrictive, such as cardinalities, inverse of, and symmetric.
The addition of property characteristics necessitates a few modifications to the
RDBOM user interface. The update pages requiring validation will be discussed shortly.
But now the user must be informed of any property characteristics involved in the domain
and range restrictions. Also, the administrative module needs to now also include
management of the ontologies restricted property characteristics, domain, and range.
This requires providing an interface for administrators to enter any domain or range
restrictions that should be placed upon existing properties.
5.2. VALIDATING RESTRICTED PROPERTY CHARACTERISTIC USE
Once there was a way to store restricted property characteristics in the database, it
needed to determine which existing parts of RDBOM should be changed to accommodate
enforcing the domain and range restrictions. Several of the update pages mentioned in
the previous section would require validation. Validation must also occur in the
administrative module when a restricted property characteristic is created or an ontology
is imported into RDBOM. The method in which validation should occur needed to be
determined as well, with consideration of any mechanisms already provided by
RDBOM's relational database management system that could assist in the process.
One approach that was considered for doing this was the use of SQL triggers.
Triggers can be specified for relational database tables to control the actions taken
whenever an update, insert, or delete action occurs [18]. For a generic relational database
ontology implementation, triggers could be predefined to check transactions that occur
upon insertion of relationships between classes, thus performing validation of the
30
relationship usage. Additionally, they could be used upon updating the domain or range
of existing relationship types to make sure that the modified relationship types remain
valid. However, this could be a complicated and difficult task to perform on a nongeneric database implementation of an ontology.
Another problem in using triggers in this manner arises from the potential need to
be able to violate restrictions under certain circumstances. The trigger would become
much more complex to specify since it could not simply refuse the transaction, but
instead would need to manage the violations, with differing actions under differing
conditions.
Further, if the domain and range of relationship types are the only restrictions
being used in the ontology, not every insert or update transaction on the term
relationships may need to be checked. For example, actions such as creating child classes
for terms and updating term definitions do not involve domain and range restrictions, but
still might activate triggers on the associated tables.
An alternative to database triggers is to perform these tasks in the ontology
management system's application logic. In so doing, the actions for a particular
validation could more easily be controlled. Validation could still be checked as the
classes are accessed or updated, as well as upon demand for the entire ontology. This
would eliminate much of the automated management required by triggers (since there is
no longer a single point responsible for handling all validation cases). Furthermore,
being able to check the entire ontology also would allow for the validation of the domain
and range of relationships in imported ontologies.
The main disadvantage to implementing validation in the application logic is that
every location where data in an ontology can be modified in some way must be assessed
to determine if any validation needs to be performed. Several update operations
accessible through the user interface were identified as requiring modification. The first
was the Link Node to Node page which allows users to tie two classes together with a
property. The second was the Move/Cut Node page that can be used to change the
location of a single node or branch in the ontology. Figure 5.1 shows Figure 2.1 loaded
into RDBOM before a move operation, the dialog prompted to the user for moving
Horace to be located under the node Food, and then a view of the ontology after the
31
operation. The red boxes highlight changes in the ontology's structure where Cat no
longer contains Horace as a subclass and Food does.
Figure 5.1. Move/Cut Node Operation in RDBOM
The last operation requiring validation was the Link to Additional Parent. When a
term is linked to an additional parent a new ancestry path is created for the term since it is
present in a new location of the ontology. If the term is involved in any relationships
using properties with restricted domains or ranges, it may no longer be considered valid
in this new location. The new set of ancestors created by the path to the new parent
might not include the terms to which the properties are restricted. In this case, the new
path is invalid and should not be allowed. An example of a Link to Additional Parent
operation performed in RDBOM without the property restrictions in place can be seen in
Figure 5.2 where Horace is linked to an additional parent of Food. The red box
highlights the change to the ontology's structure in which Horace is now located under
the Food term as well as Cats.
Figure 5.2. Link to Additional Parent Operation in RDBOM
32
5.3. VIOLATING RESTRICTED PROPERTY CHARACTERISTICS
One issue with restricting the domain and range of a relation lies in giving the
users the ability to breach such restrictions. Any action that would violate what is
defined as the domain or range for a given relationship type should be considered
carefully to determine whether the current ontology design is adequate, especially if the
action was not inadvertent. If it is a violation that is intended to be persistent, this
suggests that there might be another solution for representing the information; either the
ontology design needs to be changed, or the violated relationship needs to be modified to
include other domain and/or range classes. Consider Figure 2.1 and 2.2, in which the
range of the relationship type eats is defined as Food. Introducing the relationship (Floyd
eats Horace) causes a violation of the specified domain for eats. Review of this violation
suggests that perhaps dogs can in fact eat cats, and that this was not a careless mistake
made by the user. By addressing this issue, and perhaps specifying an additional range
class of Cat to the relation eats, or changing the relationship type eats to be eats
nonliving, the clarity of the intended semantics of the ontology would be improved.
Some administrative tasks require the ability to identify all of the violations of the
restricted property characteristics. A motivating factor is the application of restriction
violations to existing properties that are actively being used for curation in the ontology.
There are ontologies already being developed in the RDBOM system, and other existing
ontologies easily can be imported into a RDBOM ontology from the OBO ontology file
format. Without a way to identify existing violations when a new domain or range
restriction is created, there is no easy way for administrators to create the restrictions. If
the restrictions were added, then the existing invalid property uses would not be captured,
or else the task of correcting all the errors would be entirely the responsibility of the
administrators. RDBOM was developed to facilitate community curation of ontologies.
Therefore, there needed to be a way for the violations to be presented to the users. This,
in turn, would help the administrators in adopting the use of restrictions such as the
domain and range since they could rely on the community to see invalid relationships and
resolve them; the task would not be solely the responsibility of the administrators.
There also are scenarios in which a restriction temporarily must be broken, such
as moving entire branches within an ontology; intermediate states may be in violation of
33
the rules, even though the final state would not contain those violations. An example of
such a situation would be restructuring items that are subclasses of Food in Examples 2.1
and 2.2. Suppose that a user wants to add new classes under Food such as Meat, Fruit,
and Vegetable. The relocation of the current classes and instances that have a parent of
Food to a more specific parent would result in their restriction to Food to be broken
temporarily.
In some cases, the user community must be able to perform these operations. And
perhaps there may be a situation in which an administrator should have the option of
intentionally breaking a relationship restricted by domain or range. By making
administrators the only users able to violate these restrictions, it allows them to screen the
actions of other users; that is, other users would be required to contact an administrator to
make any changes that violate a domain or range restriction. Administrators could then
either make the change if desired, or alternatively could make decisions on how better to
represent the information involved in the violation.
It is possible that some members of the community will be trusted enough by the
administrators to make these types of changes. In the current OAM model, users must be
assigned rights to a branch of the ontology before they are able to modify it. One
reasonable solution for giving a user the ability to violate a domain or range restriction is
to add another permission setting (in addition to the branch permission) that would
specify whether the user would be allowed to violate a restriction. Administrators then
could control not only the branches that a user is able to update, but also the ability of the
user to violate restrictions that have been set in place (perhaps based on the user's
experience and expertise).
To avoid constantly re-running the validation routines on terms in the ontology
each time they are accessed, a new table was introduced to persist the violated restriction
information. This reduced the number of times that the validation algorithm has to be
executed and improves the performance of the RDBOM system. Violated relationships
are initially determined based on Algorithm 3.3, and are then recorded in a dedicated
table. After an initial check to verify the data, this provides quicker access to invalid
relationship use. That is, upon visiting a term simply for display purposes, any domain or
range violations already will have been determined, and recorded in a table.
34
The schema for the table used to maintain violations is as follows:
TABLE restriction_violations (
id int IDENTITY(1,1),
violated_term2terms_id int,
violated_term_usage_id int,
restricted_term2terms_id int,
)
Schema 5.1. Restriction Violations Table Schema
Here id is a unique identifier for the given violation, and term2terms_id refers to
the unique identifier of the term2terms entry that violates the specified domain or range
of a relationship type (where a 'term2terms' entry is in the form of (term1, relationship
type, term2)). The violated_term_usage_id refers to the unique identifier for a term that
is stored in the 'term2terms' record. The term_usage_id is the identifier used to reference
a particular term, in this case term1 or term2 from a term2terms entry, in the ontology.
Thus, the violated_term_usage_id value of the restriction violations entry can be used to
determine whether it was the domain of the relationship type that is violated (in which
case it would be found in the term1 location of the term2terms entry), or the range of the
relationship type (in which case it would be found in the term2 location).
Also included in the table is a foreign key reference to the term2term entry that
contains the definition of the restricted property characteristic being violated,
restricted_term2terms_id. Relying solely upon violated_term2terms_id and
violated_term_usage_id provides enough information to determine if the violation
occurred due to a domain or range violation and allows for the retrieval of the violating
property usage; however, it does not provide enough information to say why the use is
invalid without re-running the validation algorithms. It would also make it difficult to
uniquely identify multiple domain or range violations that resulted from a single property
35
use. Including a way to immediately look up the violated restriction gives a quick way to
inform the user why the relationship is used incorrectly.
The foreign key constraints of RDBOM's relational database management system
are leveraged to take care of many of the situations where an invalid property use
becomes correct. The schema can be set so that any deletions of the referenced
violated_term2terms_id or violated_term_usage_id cascades to the restriction violations
table. As a result, all violated entries referencing the id of either a deleted term usage id
(which happens when a term is deleted) or the id of a term2term entry that is deleted are
automatically removed from the restriction violations table. Because of this, the only
situations in which entries must be manually deleted are when the violations are corrected
by restricting properties to additional terms (which could make them valid) and when a
user moves the class within the ontology.
The cascading deletion cannot apply to the restricted_term2terms_id column
because it references the term2terms table, as is the case with the violated_term2terms_id
column. The database engine being used for RDBOM, MS SQL Server, will not allow
cascade actions to be performed by two individual references to the same table since it
can potentially introduce cycles or multiple cascade paths. Because of this particular
implementation limitation, when an administrator deletes domain and range restrictions,
the corresponding restriction violations entries also will need to be removed.
5.4. SUMMARY
In this section a design was proposed for representing, checking, and allowing the
violation one type of property restriction (namely, domain and range) in RDBOM. In
summary, the new term types entry 'restricted property characteristic' was introduced to
record the restricted property characteristics domain and range. This would allow for two
terms entries, has domain and has range, to be created and identified as restricted
property characteristics. To specify the domain or range of a property p to be some term
t, a term2terms entry of (p, has domain/range, t) was created. The locations requiring
validation to be performed in RDBOM's update pages were identified, and it was decided
that the best way to implement the validation was through RDBOM's application logic.
36
A new table, restriction violations, was also proposed to record any violations that occur,
allowing quick access to the details of the violation.
In the next section, the actual implementation of this design in RDBOM is
discussed.
37
6. IMPLEMENTATION
The actual implementation of the domain and range restricted property
characteristics in RDBOM is discussed in this section. Discussion is focused on some of
the more interesting details involved in the implementation, as well as the algorithms
developed to handle the modification of various parts of RDBOM functionality based on
the discussion in the preceding section. First discussed is how the administrative module
was modified so domain and range could be applied to restrict existing properties. Then
changes to the various update pages are addressed to account for validation of property
use. Finally, the intricacies of allowing restrictions to be violated are covered, as well as
methods for validating the entire ontology and locations in the RDBOM user interface
where the violation information is displayed to the user. Figure 2.1 presented in Section
2 will be used to demonstrate the various implementations in RDBOM.
6.1. PROPERTY CHARACTERISTICS
Implementation of the property characteristics mainly dealt with modifying some
administrative pages and creating the two new term types, 'restricted property
characteristic' and 'property characteristic'. The existing property administration page
was updated to allow administrators to create and delete restricted property
characteristics, and apply them to existing properties and remove property characteristics.
When creating the required SQL queries, several inconsistencies in the RDBOM data
were identified. Seven terms being used as properties were lacking term usage entries to
identify the term's type as a property. This had not been noticed yet during the RDBOM
development because queries relating to properties performed joins between the terms
table and properties table (which is sufficient since each property is required to have a
properties table entry). For the sake of consistency with the RDBOM data and to help
avoid future development problems that could arise due to looking for properties by a
term's usage, corresponding term usage entries were created for the seven missing terms.
38
6.2. VALIDATING RESTRICTED PROPERTY CHARACTERISTICS
The core validation routines occur in the link node to node operations since only a
single relationship is being validated. This routine was developed incrementally by
others, starting with a way to validate a relationship, then validation of a term based on its
relationships that use restricted properties, and finally the application to branches which
validate all of the constituent terms. The following subsections describe how the
validation was performed for various RDBOM operations.
6.2.1. Linking a Node to Another Node. Using a property to relate two terms
takes part in several steps. First, the user selects the domain term and is presented with a
list of properties. The next step involves selecting a property. Upon selecting a property,
the user should now be presented with a list of the range restrictions in place on that
property as well as any error messages due to the selection of the domain term. After
selecting the range term, a confirmation page is displayed summarizing the relationship
to be created. On this page it is first determined if the range of the property is valid in the
context of any domain or range restrictions in place on the property, and, if not, display a
meaningful error message to the user. Otherwise, the user can authorize the creation of
the relationship. Implementing a routine validate_relationship was very straightforward
and followed Algorithm 3.3 with no difficulties. Recall that in Figure 2.1 the edge Floyd
eats Horace was attempted to be created. Figure 6.1 demonstrates how RDBOM utilizes
Algorithm 3.3 to detect that was an invalid edge and prevents the user from creating it.
6.2.2. Moving Nodes/Branches. The move operation allows the user to relocate
an individual term or the root term of entire branch in the ontology. If the selected term
is a leaf node, then the validation process needs to evaluate the new ancestry path that
would be created upon moving the term. Every relationship the term is involved with
that contains restricted properties must be validated. For terms selected to be moved that
are actually a root of a branch, this validation must be done for every term in the branch.
In order to allow a user to relocate nodes in the ontology, the validity of the properties
with domain and range restrictions must be upheld in the new location.
39
Figure 6.1. RDBOM Detection of Invalid Edge from Figure 2.1
Two methods were considered for performing the validation. The first involved
carrying out the move by changing the data in the database. Immediately after this, a
validation routine would be executed on every term involved with the move which would
validate the domain and range of every property applied to every involved term. If any
invalid property use occurred in the new location then the root term could just be moved
back to its old location. However, this produces an intermediate state in RDBOM's data
where it is not known if the relationships are being used correctly or not. RDBOM's
main purpose is to facilitate community-based curation, so it is possible that other users
could be working with the data during this event. The advantage is that this method
would not require any changes to the way check_up is performed to validate restrictions
from Algorithm 3.3 since the data will already be present in the location requiring
validation.
The other option, which was ultimately chosen, involves leaving the selected term
in place, not moving it, but instead using a modified version of Algorithm 3.3 to validate
domain and range terms of relationship edges that already exist. In the modified
algorithm, the check_up routine is changed to take information about the current root
40
term and the new root term. The current root term corresponds to the term selected to be
moved in the ontology, and the new root term refers to the new parent the selected term
will have. When performing the check_up on each term in the branch, check_up will
search through each ancestor path of the term that is being tested until it reaches the
current root term. Upon testing the current root term, if the relationship is still invalid it
will switch to checking the ancestry of the new root term. It can be thought of as creating
a temporary ancestry list corresponding to the term's new location that consists of the
term's parents up through the root term of the branch being moved and then appended to
the parents of the new root term. This allows any invalid domain and range restrictions
to be identified without temporarily moving data in the RDBOM database.
To demonstrate this, imagine moving the branch Animals in Figure 2.1 to be a
child of the term Food. Each term inside the branch (Cat, Dog, Horace, Floyd) must be
checked in the new location to make sure that properties making use of these terms are
still valid. When performing any check_up routines for Horace, it begins searching
through the ancestry path checking Horace, Cat, and then Animals. The routine stops
searching this ancestry path upon reaching Animals since this is the root of the branch
being moved. However, it continues searching by starting at the end of Food's ancestry
path since this is the location to which the branch will be moved.
6.2.3. Linking to Additional Parents. Validation for linking a node to an
additional parent can be done using the same technique as moving a branch. Since only
valid paths are allowed, there is no need to validate them again, only the new path
introduced by the new parent. The algorithm for checking a branch before it is moved is
used with the current root term corresponding to the term being linked to another parent
and the new root term being the parent to which to link the term. This will, in turn,
validate any necessary relationships of the selected term in the context of the new parent's
ancestry path.
Recall in Figure 2.1 how a new edge, Floyd eats Horace, was attempted to be
introduced to the ontology. It failed because Horace is not a subclass of Food. Suppose
it is determined by the administrators that the ontology really needs to support this
relationship. Since RDBOM supports linking nodes to multiple parents, a quick potential
solution to address this issue could be to have Horace link to an additional parent, Food.
41
Figure 6.2 shows the results. The property eats has a domain restriction of Animals and
the relationship Horace eats Chicken of the Sea exists in the ontology. Therefore, the
new path to Horace that would be introduced (Concepts->Food->Horace) contains a
domain violation on the eats property and is not allowed. This is a good demonstration
of how the consistency checking shows that the current class structure of the ontology
does not provide the ability to express particular relationships that the administrators feel
it should include. It should suggest to the administrators that a new structure be designed
to accommodate this type of relationship.
Figure 6.2. RDBOM Attempting a Link Node to Additional Parent Operation
6.3. RESTRICTED PROPERTY CHARACTERISTIC VIOLATIONS
As identified in the previous section, there is a need in some situations for users to
violate the restricted domain and range property characteristics that have been created on
properties. Aside from the creation of the restriction violations table discussed in Section
5.3, the only other database work involved creating an authorization permission to
42
identify users capable of violating restrictions. By referring to the layout of RDBOM's
authorization section (in Section 4.3), it is easy to see how this is accomplished by
creating an authorization levels entry 'violate restrictions'. The subsections presented
regarding various update pages only apply to users that have this authorization level.
The most considerable change to the existing validation routines involved the
confirmation a user must provide before RDBOM commits the user's action to the
database. For example, when linking two terms together, the term2terms entry is not
created until the user's actions are reviewed, which takes place after the validation has
been performed. Any violations of the domain and range restrictions that occur due to
some action by a user cannot be stored immediately in the restriction violations table
because the invalid data has not been created yet. This disrupted the flow of many pages
in the Web-based RDBOM user interface that had followed a rigid, step by step process
for carrying out actions, and resulted in the need to incorporate bookkeeping throughout
every validation process. For every violation, the terms used as both the domain and
range needs to be recorded as well as the restricted property and the violated restriction's
term2terms entry. This provides enough information to uniquely identify a relationship
stored in RDBOM's term2terms table as well as generate a detailed error message for the
user.
The validation routine for the link node to node operation only needed to identify
new domain and range violations that occurred due to a newly defined relationship. On
the other hand, the Move/Cut update page is capable of not only introducing domain and
range violations, but also fixing existing ones. Because of this, some validation routines
have to not only check for new violations, but also check all the relationships that are
validated to see if they are the cause of existing entries in the restriction violations table.
6.3.1. Node to Node Link Operations. All of the domain and range violations
that occurred due to a new relationship need to be recorded. The implemented version of
Algorithm 3.3 required some modification to keep lists of the specific properties that
were violated and which specific property restriction caused the violation to occur. It
also required information about the user to determine if they were able to violate
restrictions or not, and thus, if their recording was necessary. After checking the
relationship, any resulting errors get presented to the user. If the user chooses to go
43
ahead and make the relationship regardless of the errors, the term2term entry is created
and there is now enough information to record each violation in the restriction violations
table. The RDBOM interface would look identical to that shown in Figure 6.1, except
there would be an additional button allowing the user to create the relationship regardless
of the violations.
Nothing else needed to be implemented for the RDBOM operations affecting
single relationships which involved deleting leaf nodes and deleting node to node links.
Any violations caused by a leaf node were removed automatically from the restriction
violations table due to the foreign key constraints specified on the violated_term2term_id
column. Deleting a leaf deletes its relationships and thus, cascades to the violation
entries. Similarly, a node to node link that gets deleted which also causes a violation
which cascades through the restricted_term2terms_id of the restriction violations table.
6.3.2. Move/Cut and Link to Additional Parents. As mentioned earlier, the
move term operation is capable of introducing new violations as well as fixing existing
violations. The validation of a branch relies on validating each term of the branch with a
new ancestry path. Individual relationship validation is capable of recording domain and
range violations, so the term validation process can leverage this by retrieving a list of
restricted properties a specific term is involved with and running each of these
relationships through the already existing validation routine.
Moving a term to a new location in the tree could cause new violations to occur
by either that term itself or its children. However, this action also is capable of fixing
existing violations. The validate relationship routine only records the violated domain
and range restrictions. In order to identify new violations that would result from a move
and violations that are corrected, the branch validation routine is executed twice. The
first time it is used to generate all of the current violations that exist in the branch. The
second time it generates the violations that would occur by placing the branch under a
new root, as described in Section 6.2.1. Comparing the two sets of violations produces
the new violations which are not present in the first violation set, and the fixed violations
which are those found in the first, but not second, violation set.
This process of determining the validity of a branch of the ontology before and
after an operation is used in two other places besides the move/cut operation. Linking a
44
term to an additional parent can introduce new violations which can be detected in this
manner, with the first violation set representing the violations of the term with its current
ancestry paths and the second violation set representing only the new ancestry path of the
term after the operation. Similarly, detaching a node from one of its parents can compare
the first set of its current violations with the second set (which is generated after the node
is detached from its parent). The second set contains the validations for all of the paths
except for the deleted one so the comparison is used to identify any violations that are
now valid (since the extra path has been deleted).
To demonstrate this, consider the modified Figure 2.1 that was presented at the
beginning of this section. Assume those data had been imported into RDBOM, including
the specification of the term Chicken of the Sea as a subclass of Animal instead of Food.
This would be identified as a range violation of the property eats in the relationship
Horace eats Chicken of the Sea. Figure 6.3 demonstrates how the move operation can be
used to correct this violation.
6.4. ENTIRE ONTOLOGY VALIDATION
Performing validation of the entire ontology is necessary for reasons covered in
Section 5, including checking the consistency of property use in imported ontologies, and
defining new domain and range restrictions on properties. Thus, it could be the case that
validation of an entire ontology is being performed which fully populates the restriction
violations table, or that new property restriction is being added to an existing ontology.
The implementation is similar to that for detaching a node from a parent (previously
discussed in Section 6.3.3). First, a list of all the restriction violations entries are
retrieved for the ontology. Next, the ontology is traversed, starting with the root term of
the ontology and making a list of terms that have been visited. As each term is visited, it
is validated in the same manner as terms are validated when checking branches. For each
violation identified, the corresponding restriction violations entry is also identified and
mark is as having been processed. If no entry is found, an appropriate entry in the table is
created for it. After every term has been visited, the restriction violations entries that
were not found are deleted and the user is informed that the violations no longer exist.
45
Figure 6.3. RDBOM Fixing a Violation via the Move Operation
Having the ability to validate an entire ontology allowed for the consistency
checking of the AmphibAnat ontology. The AmphibAnat [2] ontology is being developed
in RDBOM with 9,790 class terms and 44 properties. Excluding the basic relationships
types (is a, part of) as well as properties used to describe unique RDBOM identifiers
there are 15,145 relationships. The structure of the ontology is larger than the numbers
portray due to many class terms having multiple parents which results in their entire
branch being included in several places throughout the ontology. Several domain and
range restrictions were put in place on the ontology's properties, as show in Figure 6.4.
These restrictions mostly consist of properties involving terms located under the
"Administrator's Node" which contains dbxrefs, images and synonyms. Other
relationships involve citing pieces of literature. From these 12 property restrictions, 28
domain violations were discovered along with 58 range violations.
Reviewing the violations shows some improper use of several properties. For
example, the is discussed by property is meant to relate a term to a piece of literature that
contains a discussion of the term. The piece of literature should be the range of any is
46
Figure 6.4. AmphibAnat Restricted Properties
discussed by relationships. However, when a user was annotating one journal article
term, Dunlap 1960, the journal article was improperly used as the domain of is discussed
by relationships stating the journal article is discussed by several mussels. The intended
property to use in the relationships was discussed. It identifies an inconsistent use of the
is discussed by property. Figure 6.7 shows the report of violations detected.
6.5. INFORMATIONAL PAGES
A considerable amount of work went into managing violations of domain and
range restrictions throughout the RDBOM system. But another aspect of this project
concerned the display of relevant restriction violation information to the user. This
affected two locations in the RDBOM user interface. Every class term has an
information page that is displayed to the user which contains: relationships the term has
with others, a term definition, and paths to the term in the ontology. Checking a
relationship of a term to see if it is invalid when rendering this page to the user can be
done easily since the violations are stored in a separate table. The invalid term in the
47
relationship (i.e., domain or range) is displayed in red and the restriction causing the
entry is placed next to it. This provides an easy way for users to tell if a property has
been applied correctly when browsing a term's information. As an example, Figure 6.5
shows RDBOM's information page for Floyd with the invalid relationship Floyd eats
Horace. In the listing for eats relationships, Horace can be seen in red with the statement
"range restricted to Food" following it.
Figure 6.5. RDBOM Term Information Page for Floyd
Particularly in a large ontology such as AmphibAnat, it is not very practical for a
user to check every single term in the ontology for invalid property use. Because of this,
a page containing all of the relationships violating property restrictions was created. The
page first shows every property restriction that exists in the ontology. It presents the user
with a list of domain violations and a list of range violations. Links are provided to allow
the user to view the terms' information pages. Additionally, if the user has the ability to
48
update the term to which the relationship applies, then a link is provided that allows
him/her to delete node to node relationships for the term. An example of this
functionality is shown in Figure 6.6, which is the violations overview page for Figure 2.1
(that has been used throughout this section). A listing of some of the violations found
from the ontology validation performed on AmphibAnat in 6.4.1 can be seen in Figure
6.7.
Figure 6.6. RDBOM Violations Overview Page
49
Figure 6.7. RDBOM Violations Overview Page for AmphibAnat
6.6. SUMMARY
This section addressed various implementation details associated with the domain
and range restricted property characteristics in RDBOM. This discussion included
functionality that allows administrators to define domain and range restrictions of
properties, the detection and enforcement of the restrictions throughout the operations of
RDBOM that allow modification of the ontology data, and the routines for performing
the validation of the restrictions. Also discussed were the mechanisms that would allow
users to violate restrictions under certain conditions, and a display of information about
violated restrictions.
50
7. FUTURE WORK
The incorporation of property restrictions into the OAM, and the subsequent
design and implementation of domain and range restrictions in RDBOM, suggested some
interesting ideas for further work in this area.
As an example, Section 2 discussed how this implementation was focused on the
need to check the consistency of property use throughout ontologies. But, property
characteristics could also be used to reason over the information associated with terms.
One extension to this work would be to turn the restriction violations table introduced in
Section 5 into a table filled with facts inferred from reasoning. To demonstrate how this
would work, recall the discussion in Section 2 regarding the difference in how RDBOM
was to handle domain and range by restricting the properties to only those values, versus
the OWL implementation which used domain and range to infer information about
classes. OWL uses domain and range to infer that Horace is a subclass of Food based on
the relationship of Floyd eats Horace and eats has range Food.
Now consider the violations stored in RDBOM's restriction violations as stored
facts. Expressing this example in RDBOM, an entry would be created stating that Floyd
eats Horace is invalid because of the range restriction on eats to the term Food. This
means there is an entry in the restriction violations table stating Floyd eats Horace is
invalid because of the restriction "eats has range Food." If range was considered to be a
non-restricted property characteristic in RDBOM, this record could be used instead to
state the relationship Floyd eats Horace is not invalid because of the restriction; rather it
can be inferred that, since Horace is not explicitly stated to be a subclass of Food in the
ontology, the relationship Horace is_a Food exists because of the use of Horace as the
range term of the eats property.
In summary, future work on this project could focus more on using property
characteristics to reason over the data, rather than check the consistency of the data. The
ability to perform consistency checking is an excellent precursor to reasoning as it
ensures valid facts would be inferred. In general, this is an area of research that currently
is lacking for all ontologies, not just those based on the OAM, those implemented as a
relational database, or those designed for multi-user curation.
51
8. SUMMARY
The research focus and problems relating to ontologies in information systems
have changed over the years; namely, the focal point has shifted from the more
theoretical ontology issues to problems associated with the actual development and use of
ontologies in real-world, large-scale, collaborative applications. One step in this
direction is the ability to specify and automatically enforce restrictions such as domain
and range on relations and terms in the ontology. This would aid in the identification of
inconsistencies in the ontology and ultimately help to maintain a higher level of data
integrity.
OAM is a generic, abstract model that provides a framework for constructing a
collaborative, Web-based ontology management system. In this paper, a modified
Ontology Abstract Machine (OAM) model and its related algorithms for various domain
and range maintenance tasks were presented. A modified OAM was proposed and
implemented for the domain and range property restrictions. Detailed discussion of the
RDBOM implementation was given for the validation of individual relationships, terms,
and branches, as well as the techniques for managing information about relationships that
violated the restrictions. It should be noted that domain and range were selected as a
proof of concept; the proposed solution could be applied to other property restrictions as
well.
It is hoped that future work on this project will explore the use of property
characteristic restrictions to reason over the data, rather than just simply check the
consistency of the data. In general, this is an area of research that currently is lacking for
all ontologies, not just those based on the OAM, those implemented as a relational
database, or those designed for multi-user curation. Realization of this goal ultimately
will contribute significantly to the value of the Semantic Web.
52
BIBLIOGRAPHY
[1]
http://ontology.buffalo.edu/. Buffalo Ontology Site, October 2010.
[2]
http://www.amphibanat.org/. AmphibAnat, October 2010.
[3]
J. Leopold, A. Coalter, and L. Lee, "A Generic, Functionally Comprehensive
Approach to Maintaining an Ontology as a Relational Database," ICOSE 2009:
International Conference on Ontological and Semantic Engineering, Rome, Italy,
2009.
[4]
L. Lee, J. Leopold, J. Albath, and A. Coalter, "An Ontology Abstract Machine,"
ICOSE 2009: International Conference on Ontological and Semantic
Engineering, Rome, Italy, 2009.
[5]
http://www.oboedit.org/. OBO Edit, October 2010.
[6]
http://protege.stanford.edu/. Protégé, October 2010.
[7]
http://code.google.com/p/swoop/. SWOOP, October 2010.
[8]
http://www.xspan.org/applications/cobra/. COBrA, October 2010.
[9]
http://www.w3.org/TR/2004/REC-owl-guide-20040210/. OWL Web Ontology
Language Guide, October 2010.
[10]
http://www.geneontology.org/GO.format.obo-1_2.shtml. The OBO Flat File
Format Specification, October 2010.
[11]
H. Knublauch, R. W. Fergerson, N. F. Noy, and M. A. Musen, "The Protégé
OWL Plugin: An Open Development Environment for Semantic Web
Applications," The Semantic Web - ISWC 2004, pp. 229-243, 2004.
[12]
http://www.cs.man.ac.uk/~horrocks/FaCT/. FaCT, October 2010.
[13]
http://clarkparsia.com/pellet. Pellet, October 2010.
[14]
I. Yeh, P. D. Karp, N. F. Noy, and R. B. Altman, "Knowledge acquisition,
consistency checking and concurrency control for Gene Ontology (GO),"
Bioinformatics, Jan 22;19(2):241-8, 2003.
[15]
I. Astrova, N. Korda, and A. Kalja, "Storing OWL Ontologies in SQL Relational
Databases," Proceedings of World Academy of Science, Engineering and
Technology, Vol 29, pp. 167-172, 2007.
53
[16]
J. Zhou, L. Ma, Q. Liu, L. Zhang, Y. Yu, and Y. Pan, "Minerva A Scalable OWL
Ontology Storage and Inference System," Asian Semantic Web Conference
(ASWC), pp. 429-443, 2006.
[17]
M. del Roldan-Garcia and J. F. Aldana-Montes, "DBOWL: Towards a Scalable
and Persistent OWL Reasoner," Internet and Web Applications and Services
(ICIW '08), 2008.
[18]
J. Lee, R. Goodwin, "Ontology Management for Large-Scale Enterprise
Systems," Electronic Commerce Research and Applications, vol. 5, Iss. 1, Spring
2006, pp. 2–15.
54
VITA
Patrick Garrett Edgett was born and raised in Clinton, Missouri. After graduating
from Clinton High School in May 2005 he started college the following August at
Missouri University of Science and Technology in Rolla, Missouri. In May of 2009 he
received the degree of Bachelor of Science in Computer Science. He continued his
education at Missouri S&T and in December of 2010, completed the requirements for his
Master of Science Degree in Computer Science.